AI Agent Failure Loops

AI agents do not only fail by giving a wrong answer.

Sometimes they fail by continuing.

I was reminded of this while working through a custom Cyrillic font project. The task looked bounded: adapt a Latin typographic voice into Cyrillic, export webfonts, generate specimens, compare mixed English and Russian lines, and iterate until the result felt native.

The agent did iterate.

Too well.

The same visible defects came back again and again. A glyph looked too short. A tail was too heavy. A lower line looked like a block instead of a serif detail. The gate passed, the screenshot failed, and the next attempt often changed the symptom without changing the underlying construction problem.

That is the failure loop.

Not "the model is dumb." Not "the prompt needs one more sentence." A failure loop is what happens when an autonomous workflow has enough permission to keep working, but not enough structure to know that its current method has stopped producing evidence.

This matters far beyond type design.

It applies to Claude Code, Codex, Cursor, Windsurf, marketing agents, ContentOS pipelines, browser agents, and any workflow where a model can edit, test, judge, and retry.

Agentic engineering

The dangerous moment is not the first mistake

The first mistake is normal.

The dangerous moment is when the same class of mistake survives the second review.

At that point, the agent has revealed something important: its validation loop cannot see the defect that the user can see. Continuing with the same loop is not diligence. It is a process bug.

In our font case, the repeated defects were visual:

A gate measured bounding boxes while the eye judged construction.
A glyph passed vertical metrics while its lower detail still looked wrong.
A generated proof showed the same problem, but the agent summarized the run as progress.
The same author designed the test, changed the glyph, and declared the result better.

That last point is the quiet killer.

When the same agent writes the gate and grades its own output, it can create a self-consistent system that still fails reality. OpenAI's eval guidance says evals need task-specific tests and human calibration, not only generic metrics or "it seems like it works" judgment. Anthropic's eval guidance similarly separates code-based, human, and LLM-based grading and recommends detailed rubrics for subjective cases. The principle is simple: the metric has to see the thing the user cares about.

If it cannot, the metric is not a gate.

It is a ritual.
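
A toy sketch of that blind spot, with everything in it invented for illustration: two small bitmaps stand in for a Ц with a real tail and a Ц whose lower line is a block. The names `bbox_height` and `bottom_row_ink_ratio` are hypothetical, not the gates from the font project.

```python
# Toy bitmaps, invented for this sketch: "1" is ink, "0" is empty.
SERIF_TAIL = [
    "10001",
    "10001",
    "11111",
    "00001",  # a small tail under the right stem
]
BLOCK_TAIL = [
    "10001",
    "10001",
    "11111",
    "11111",  # a thick patch instead of a tail
]


def bbox_height(bitmap):
    """Proxy metric: rows spanned from the first inked row to the last."""
    inked = [i for i, row in enumerate(bitmap) if "1" in row]
    return inked[-1] - inked[0] + 1


def bottom_row_ink_ratio(bitmap):
    """Crude construction signal: share of the bottom row that is inked."""
    bottom = bitmap[-1]
    return bottom.count("1") / len(bottom)
```

Both shapes report the same bounding-box height, so a gate built on that number passes both. Only a check that looks at how the bottom row is built separates them.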

Why agents get stuck in quality loops

There are five common causes.

First, the success condition is too broad.

"Make it look right" is a real human instruction, but it is not yet an agent contract. It must be decomposed into observable checks: alignment, stroke weight, spacing, contrast, fallback behavior, raster proof at target sizes, mixed-script consistency, and blind review.

Second, the available measurements are proxy measurements.

Bounding boxes, coverage, line height, compile success, and screenshot generation are useful. They are not acceptance. A font can compile and still look broken. A deck can export and still feel off-brand. A page can pass Lighthouse and still fail the buyer's question.

Third, the agent optimizes the last complaint rather than the defect class.

If the user says "Ц is short," the agent may make the bottom bar thicker because that improves a crude pixel metric. But the real fix may be to redraw the glyph from a better construction base. Local patching produces the appearance of work while preserving the wrong architecture.

Fourth, the author is contaminated.

Once the agent has seen the expected answer, the previous failed attempts, and the user's highlighted screenshots, its next "independent" review is not independent. It knows what it intended to fix. That makes it worse at noticing what it actually broke.

Fifth, the workflow has no stop rule.

"Keep going" is useful when the method is working. It is harmful when the method is not. Without a stop rule, an agent can spend 30 iterations producing increasingly elaborate proof artifacts while still missing the obvious defect.
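
The first cause, the overly broad success condition, is the easiest one to make concrete. A minimal sketch, assuming the measured evidence arrives as a plain dictionary: the `Check` type, the evidence keys, and the thresholds are all hypothetical placeholders, not values from the font project.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Check:
    name: str                        # one observable property
    passes: Callable[[dict], bool]   # decides from measured evidence only


def evaluate(contract: list, evidence: dict) -> list:
    """Return the names of failed checks; an empty list means the contract holds."""
    return [check.name for check in contract if not check.passes(evidence)]


# Hypothetical evidence keys and thresholds, invented for this sketch.
CONTRACT = [
    Check("alignment", lambda e: abs(e["baseline_offset_px"]) <= 1),
    Check("stroke weight", lambda e: 0.9 <= e["stem_ratio_vs_latin"] <= 1.1),
    Check("blind review", lambda e: e["blind_reviewer_accepts"]),
]
```

The point of the shape is that "make it look right" never appears as a check. Every entry names something that can be observed, and the contract reports which entries failed instead of a single verdict.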

The failure-loop breaker

A reliable agent workflow needs a hard mode switch.

My rule now is:

If the same visible defect class is rejected twice, stop implementation.

Do not make one more candidate.

Do not relax the threshold.

Do not add a reassuring paragraph.

Switch into failure-loop breaker mode.

That mode has five steps.
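
The trigger itself is small. A minimal sketch of the two-rejection rule, assuming rejections are reported as defect-class strings; `StopRule` and `Mode` are hypothetical names, not taken from any published tool.

```python
from collections import Counter
from enum import Enum


class Mode(Enum):
    IMPLEMENT = "implement"
    BREAKER = "failure-loop breaker"


class StopRule:
    """Counts human rejections per defect class and switches mode at a limit."""

    def __init__(self, limit: int = 2):
        self.limit = limit
        self.rejections = Counter()

    def record_rejection(self, defect_class: str) -> Mode:
        self.rejections[defect_class] += 1
        return self.mode()

    def mode(self) -> Mode:
        # One defect class reaching the limit is enough to stop implementation.
        if any(count >= self.limit for count in self.rejections.values()):
            return Mode.BREAKER
        return Mode.IMPLEMENT


rule = StopRule()
rule.record_rejection("Ц tail construction")   # first rejection: keep implementing
rule.record_rejection("italic д counter")      # different class: keep implementing
rule.record_rejection("Ц tail construction")   # same class twice: breaker mode
```

Counting by defect class rather than by attempt is the design choice that matters here: three different complaints are normal iteration, while the same complaint twice is the signal.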

Downloadable guardrail

Can this become an agent skill?

Yes. The practical output from this failure is now packaged as a public Humanswith.ai devtool:

Download the Agent Failure Loop Breaker skill on GitHub.com

The repo turns the lesson into installable agent instructions and local guardrails for Claude Code, Codex, Cursor, and Windsurf. The goal is simple: when an agent repeats the same defect class, it should stop pretending the next small patch is enough and switch into a failure-loop breaker workflow.

I also published the shorter professional version on LinkedIn:

Read the LinkedIn.com discussion

And the developer-oriented cross-post on DEV.to:

Read the DEV.to cross-post

And the short X pointer:

Read the X.com pointer

Failure-loop breaker

1. Name the repeated defect class

The agent must say what is repeating.

Not "the font still needs polish." Something sharper:

The capital Cyrillic Ц and Щ tails are structurally wrong.
The italic д counter is too small and its top stroke is too light.
The spacing gate accepts words where the eye sees inconsistent rhythm.
The lower alignment gate treats a thick patch as equivalent to a real serif shelf.

Naming the class prevents the agent from treating each screenshot as a new unrelated complaint.

2. Build a rejected-build corpus

Every failed attempt becomes data.

For a font, that means the build version, the proof image, the target glyphs, the user's visual complaint, and the metric that falsely passed.

For a website, it might be screenshots, route smoke tests, visual diffs, and the exact acceptance criterion that failed.

For a content system, it might be rejected drafts, failed claims, weak citations, and publish-readiness reports.

This is where the loop becomes useful. The failed attempts are no longer waste. They become the red set.
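
One way to hold that red set for the font case is a small append-only file. This is a sketch under assumptions: the field names mirror the items listed above, while the JSON Lines layout and the helpers `append_to_corpus` and `load_corpus` are hypothetical choices.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass(frozen=True)
class RejectedBuild:
    build_id: str            # build version that was rejected
    defect_class: str        # the named repeating defect
    proof_image: str         # path to the proof the user looked at
    target_glyphs: list      # glyphs or words the complaint was about
    complaint: str           # the user's visual complaint, verbatim
    false_pass_metric: str   # the metric that passed while the eye failed


def append_to_corpus(path: Path, build: RejectedBuild) -> None:
    """Append one record as a JSON line so earlier entries are never rewritten."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(build), ensure_ascii=False) + "\n")


def load_corpus(path: Path) -> list:
    """Read every non-empty line back into a RejectedBuild record."""
    with path.open(encoding="utf-8") as f:
        return [RejectedBuild(**json.loads(line)) for line in f if line.strip()]
```

Recording the metric that falsely passed next to the complaint is what makes the corpus usable later: it tells the next gate exactly which measurement it must not repeat.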

3. Write a red-first gate

Before editing again, the new gate must go red on at least two known bad builds. In other words, it has to reject work the user has already rejected.

This is the most important rule.

If the gate cannot catch the failures already visible to the user, it has no right to judge the next candidate.

For visual work, a red-first gate may combine deterministic checks with a human checklist:

Does the proof include the exact problem word or glyph pair?
Does the check compare the glyph to the correct neighbor, not only to a global metric?
Does the gate distinguish construction from padding?
Does it run at the real target size?
Does it fail previous rejected examples?

The gate does not have to be perfect. It has to be honest.
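
The deterministic half of that rule can be enforced in code. A minimal sketch, assuming a gate is any function that returns True for "pass"; `require_red_first`, `judge`, and `GateNotTrusted` are hypothetical names. Note that this version is stricter than the rule as stated: it wants at least two rejected builds and requires the gate to fail every one of them.

```python
from typing import Callable, Iterable


class GateNotTrusted(Exception):
    """Raised when a gate has not earned the right to judge new candidates."""


def require_red_first(gate: Callable[[object], bool],
                      rejected_builds: Iterable[object],
                      min_red: int = 2) -> None:
    """Check the gate against known bad builds before it scores anything new."""
    builds = list(rejected_builds)
    if len(builds) < min_red:
        raise GateNotTrusted(
            f"need at least {min_red} rejected builds, got {len(builds)}")
    false_passes = [b for b in builds if gate(b)]
    if false_passes:
        raise GateNotTrusted(
            f"gate passed {len(false_passes)} known bad build(s)")


def judge(gate, rejected_builds, candidate) -> bool:
    require_red_first(gate, rejected_builds)  # red first
    return gate(candidate)                    # only then score the candidate
```

A gate that passes everything never reaches the candidate, which is the intended behavior. The human checklist still sits outside this function; the sketch only guards the part a machine can check.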

4. Separate author and reviewer

Blind validation is not bureaucracy. It is how you stop self-deception.

The author can write the general guard and deterministic grader. A separate reviewer, whether another model or a human, should inspect the output without being told what was changed.

For code, this can be a second agent reading the diff against a checklist.

For design, it can be a screenshot review where the reviewer only sees the target and candidate.

For content, it can be a source-integrity pass where the reviewer checks claims without seeing the intended narrative.

LangChain's human-in-the-loop pattern formalizes this idea at the tool level: when a risky action needs review, execution pauses and a human can approve, edit, reject, or respond. The same idea applies to quality gates. When the agent's gate has already failed twice, continuing without an independent reviewer is not autonomy. It is drift.
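
For the blind part, the simplest guard is to control what the reviewer receives. A sketch under assumptions: a submission is a dictionary, and the field names `target`, `candidate`, and `checklist` are hypothetical. It does not use LangChain; it only illustrates the separation.

```python
# The only fields a blind reviewer is allowed to see (hypothetical names).
REVIEWER_FIELDS = ("target", "candidate", "checklist")


def blind_packet(submission: dict) -> dict:
    """Build the reviewer's view from an allowlist, dropping all author context."""
    missing = [key for key in REVIEWER_FIELDS if key not in submission]
    if missing:
        raise ValueError(f"blind packet is missing: {missing}")
    return {key: submission[key] for key in REVIEWER_FIELDS}
```

An allowlist is used instead of a list of fields to strip, so a new author-side field such as a change summary cannot leak to the reviewer by default.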

5. Change strategy or close the task

A failure-loop breaker must allow an uncomfortable answer:

This method is not working.

That answer is valuable.

It may mean choosing a different donor font, using a real font editor, reducing the scope, accepting a fallback, or closing the experiment as a failed attempt. In agentic systems, "stop" is not weakness. It is a valid terminal state.

The worst outcome is not a failed task.

The worst outcome is pretending the task is almost done for another 40 turns.

What this means for agentic engineering

The useful lesson is not about typography.

It is about quality ownership.

Agents are strong at producing candidate work, transforming files, running tests, and preserving evidence. They are weak when they are asked to be the sole judge of a subjective result that their current measurements cannot see.

So the operating model changes:

Use agents for production work while the proof loop is reducing uncertainty.
Switch to diagnostic mode when the same defect repeats.
Convert user complaints into rejected examples.
Make gates fail known bad outputs before trusting them.
Use blind validation when the author is contaminated.
Treat "close as failed" as a legitimate outcome.

This is the difference between an AI workflow and an AI slot machine.

The conclusion

Agentic work does not become reliable because the model tries harder.

It becomes reliable when the workflow knows when trying harder has stopped being evidence.

The strongest agent teams will not be separated by who has the sharpest prompt. They will be separated by who has the best failure memory: rejected corpora, red-first gates, blind review, stop rules, and clean handoffs.

That is the article I wish I had before spending dozens of iterations on a font that still did not pass the eye.

The failure was expensive.

The process it forced us to write down is worth keeping.

FAQ

Common questions about AI agent failure loops

What is an AI agent failure loop?

An AI agent failure loop is a repeated cycle where the agent keeps editing, testing, or generating artifacts while the same defect class survives. The important signal is not one bad output; it is repeated action without new evidence.

When should an agent stop instead of trying again?

If the same visible defect class is rejected twice, the agent should stop implementation and switch modes: collect rejected examples, write a red-first gate, or hand the task to a blind reviewer.

Why is another prompt not enough?

Another prompt can help when the task is underspecified. It does not solve a loop where the validator cannot detect the defect. In that case the missing piece is a deterministic gate or a better review protocol.

What is a rejected-build corpus?

A rejected-build corpus is a small set of known bad outputs that already failed human review. The next gate must reject those examples before it is allowed to score a new candidate.

How does this apply to content and marketing agents?

The same pattern applies when an agent keeps producing weak drafts, unsupported claims, bad screenshots, or broken layouts. The fix is the same: preserve rejected examples, turn them into gates, and stop rewarding blind persistence.

Sources

1. OpenAI, "Evaluation best practices"

https://developers.openai.com/api/docs/guides/evaluation-best-practices

2. OpenAI, "Working with evals"

https://developers.openai.com/api/docs/guides/evals

3. Anthropic, "Best practices for Claude Code"

https://code.claude.com/docs/en/best-practices

4. Anthropic, "Define success criteria and build evaluations"

https://platform.claude.com/docs/en/test-and-evaluate/develop-tests

5. LangChain, "Human-in-the-loop"

https://docs.langchain.com/oss/python/langchain/human-in-the-loop

6. Chen et al., "TraceSafe: A Systematic Assessment of LLM Guardrails on Multi-Step Tool-Calling Trajectories"

https://arxiv.org/abs/2604.07223

Republished on Medium

Read and share the Medium.com version
