Founder note · Updated 24 May 2026

AI agent failure loops

AI agents do not only fail by giving a wrong answer. Sometimes they fail by continuing: changing files, generating new proof artifacts, and summarizing progress while the same visible defect survives every loop.

This note turns a failed custom-font project into a practical failure-loop breaker for Claude Code, Codex, Cursor, Windsurf, ContentOS pipelines, browser agents, and any workflow where an agent can edit, test, judge, and retry.

Core rule
If the same visible defect class is rejected twice, stop implementation and switch to failure-loop breaker mode.
Public skill
The guardrail is packaged for Claude Code, Codex, Cursor, and Windsurf in a public GitHub repo.
Distribution
Canonical page here, professional discussion on LinkedIn, developer cross-post on DEV.to, and a short pointer on X.

What to cite from this page

  • A failure loop begins when the validation loop cannot see a defect the user can see.
  • Known bad outputs must become a rejected-build corpus and a red-first gate before the next implementation attempt.
  • The same agent should not write the gate, change the artifact, and declare the result acceptable after repeated failures.
  • Stopping a task as failed is a valid terminal state when the current strategy no longer produces evidence.

AI agents do not only fail by giving a wrong answer.

Sometimes they fail by continuing.

I was reminded of this while working through a custom Cyrillic font project. The task looked bounded: adapt a Latin typographic voice into Cyrillic, export webfonts, generate specimens, compare mixed English and Russian lines, and iterate until the result felt native.

The agent did iterate.

Too well.

The same visible defects came back again and again. A glyph looked too short. A tail was too heavy. A lower line looked like a block instead of a serif detail. The gate passed, the screenshot failed, and the next attempt often changed the symptom without changing the underlying construction problem.

That is the failure loop.

Not "the model is dumb." Not "the prompt needs one more sentence." A failure loop is what happens when an autonomous workflow has enough permission to keep working, but not enough structure to know that its current method has stopped producing evidence.

This matters far beyond type design.

It applies to Claude Code, Codex, Cursor, Windsurf, marketing agents, ContentOS pipelines, browser agents, and any workflow where a model can edit, test, judge, and retry.

Agentic engineering

The dangerous moment is not the first mistake

The first mistake is normal.

The dangerous moment is when the same class of mistake survives the second review.

At that point, the agent has revealed something important: its validation loop cannot see the defect that the user can see. Continuing with the same loop is not diligence. It is a process bug.

In our font case, the repeated defects were visual:

  1. A gate measured bounding boxes while the eye judged construction.
  2. A glyph passed vertical metrics while its lower detail still looked wrong.
  3. A generated proof showed the same problem, but the agent summarized the run as progress.
  4. The same author designed the test, changed the glyph, and declared the result better.

That last point is the quiet killer.

When the same agent writes the gate and grades its own output, it can create a self-consistent system that still fails reality. OpenAI's eval guidance says evals need task-specific tests and human calibration, not only generic metrics or "it seems like it works" judgment. Anthropic's eval guidance similarly separates code-based, human, and LLM-based grading and recommends detailed rubrics for subjective cases. The principle is simple: the metric has to see the thing the user cares about.

If it cannot, the metric is not a gate.

It is a ritual.

Why agents get stuck in quality loops

There are five common causes.

First, the success condition is too broad.

"Make it look right" is a real human instruction, but it is not yet an agent contract. It must be decomposed into observable checks: alignment, stroke weight, spacing, contrast, fallback behavior, raster proof at target sizes, mixed-script consistency, and blind review.

Second, the available measurements are proxy measurements.

Bounding boxes, coverage, line height, compile success, and screenshot generation are useful. They are not acceptance. A font can compile and still look broken. A deck can export and still feel off-brand. A page can pass Lighthouse and still fail the buyer's question.
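
One way to keep proxies from standing in for acceptance is to hold the decomposed checks as data and refuse to accept while any of them is unrun or failing. The following is a minimal Python sketch under that assumption; `Check`, the `kind` labels, and `accepted` are illustrative names, not part of the published skill.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Check:
    name: str
    kind: str                      # "deterministic" or "human" (illustrative labels)
    passed: Optional[bool] = None  # None means the check has not been run yet


def accepted(checks: list) -> bool:
    """Accept only if there is at least one check and every check has passed."""
    return bool(checks) and all(c.passed is True for c in checks)


# Proxy checks pass, but the human raster review has not happened yet.
contract = [
    Check("alignment", "deterministic", True),
    Check("stroke weight", "deterministic", True),
    Check("raster proof at target sizes", "human"),
]
```

Here `accepted(contract)` is `False`: two green proxies do not outvote one missing review.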

Third, the agent optimizes the last complaint rather than the defect class.

If the user says "Ц is short," the agent may make the bottom bar thicker because that improves a crude pixel metric. But the real fix may be to redraw the glyph from a better construction base. Local patching produces the appearance of work while preserving the wrong architecture.

Fourth, the author is contaminated.

Once the agent has seen the expected answer, the previous failed attempts, and the user's highlighted screenshots, its next "independent" review is not independent. It knows what it intended to fix. That makes it worse at noticing what it actually broke.

Fifth, the workflow has no stop rule.

"Keep going" is useful when the method is working. It is harmful when the method is not. Without a stop rule, an agent can spend 30 iterations producing increasingly elaborate proof artifacts while still missing the obvious defect.

The failure-loop breaker

A reliable agent workflow needs a hard mode switch.

My rule now is:

If the same visible defect class is rejected twice, stop implementation.

Do not make one more candidate.

Do not relax the threshold.

Do not add a reassuring paragraph.

Switch into failure-loop breaker mode.
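
As a rough illustration, the rule can be enforced with a counter keyed by defect class rather than by individual complaint. This is a sketch, not the code shipped in the repo; `RejectionTracker`, `Mode`, and the default limit of two are assumptions taken from the wording of the rule.

```python
from collections import Counter
from enum import Enum


class Mode(Enum):
    IMPLEMENT = "implement"
    BREAKER = "failure_loop_breaker"


class RejectionTracker:
    """Counts user rejections per named defect class."""

    def __init__(self, limit: int = 2) -> None:
        self.limit = limit
        self.rejections = Counter()

    def reject(self, defect_class: str) -> Mode:
        """Record one rejection and return the mode the agent must now be in."""
        self.rejections[defect_class] += 1
        return self.mode()

    def mode(self) -> Mode:
        """Breaker mode as soon as any single class reaches the limit."""
        if any(count >= self.limit for count in self.rejections.values()):
            return Mode.BREAKER
        return Mode.IMPLEMENT


tracker = RejectionTracker()
first = tracker.reject("Ц/Щ tail construction")
other = tracker.reject("italic д counter")
second = tracker.reject("Ц/Щ tail construction")
```

Two different complaints leave the agent in `Mode.IMPLEMENT`; the second rejection of the same class returns `Mode.BREAKER`. The counter only works if rejections are filed under a stable class name, which is why naming the class is the first step.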

That mode has five steps.

Downloadable guardrail

Can this become an agent skill?

Yes. The practical output from this failure is now packaged as a public Humanswith.ai devtool:

Download the Agent Failure Loop Breaker skill on GitHub.com

The repo turns the lesson into installable agent instructions and local guardrails for Claude Code, Codex, Cursor, and Windsurf. The goal is simple: when an agent repeats the same defect class, it should stop pretending the next small patch is enough and switch into a failure-loop breaker workflow.

I also published the shorter professional version on LinkedIn:

Read the LinkedIn.com discussion

And the developer-oriented cross-post on DEV.to:

Read the DEV.to cross-post

And the short X pointer:

Read the X.com pointer

Failure-loop breaker

1. Name the repeated defect class

The agent must say what is repeating.

Not "the font still needs polish." Something sharper:

  1. The capital Cyrillic Ц and Щ tails are structurally wrong.
  2. The italic д counter is too small and its top stroke is too light.
  3. The spacing gate accepts words where the eye sees inconsistent rhythm.
  4. The lower alignment gate treats a thick patch as equivalent to a real serif shelf.

Naming the class prevents the agent from treating each screenshot as a new unrelated complaint.

2. Build a rejected-build corpus

Every failed attempt becomes data.

For a font, that means the build version, the proof image, the target glyphs, the user's visual complaint, and the metric that falsely passed.

For a website, it might be screenshots, route smoke tests, visual diffs, and the exact acceptance criterion that failed.

For a content system, it might be rejected drafts, failed claims, weak citations, and publish-readiness reports.

This is where the loop becomes useful. The failed attempts are no longer waste. They become the red set.
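
A red set is easier to reuse when each rejection is stored as a record instead of a chat message. The sketch below assumes a JSON Lines file and uses the five font fields listed above; the type name, field names, and the sample entry are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class RejectedBuild:
    build: str                      # build version
    proof_image: str                # path to the proof image
    target_glyphs: Tuple[str, ...]  # glyphs the complaint is about
    complaint: str                  # the user's visual complaint
    false_pass_metric: str          # the metric that passed anyway


def dump_corpus(builds) -> str:
    """Serialize the red set as JSON Lines, one rejected build per line."""
    return "\n".join(json.dumps(asdict(b), ensure_ascii=False) for b in builds)


def load_corpus(text: str) -> list:
    """Parse JSON Lines back into RejectedBuild records."""
    builds = []
    for line in text.splitlines():
        if line.strip():
            row = json.loads(line)
            row["target_glyphs"] = tuple(row["target_glyphs"])  # JSON has no tuples
            builds.append(RejectedBuild(**row))
    return builds


# Hypothetical entry; the build name and path are invented for illustration.
red_set = [
    RejectedBuild("build-07", "proofs/build-07.png", ("Ц", "Щ"),
                  "tail too heavy", "bounding-box height"),
]
```

Recording the metric that falsely passed matters as much as recording the complaint: it is the part the next gate has to improve on.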

3. Write a red-first gate

Before editing again, the new gate must fail at least two known bad builds.

This is the most important rule.

If the gate cannot catch the failures already visible to the user, it has no right to judge the next candidate.

For visual work, a red-first gate may combine deterministic checks with a human checklist:

  1. Does the proof include the exact problem word or glyph pair?
  2. Does the check compare the glyph to the correct neighbor, not only to a global metric?
  3. Does the gate distinguish construction from padding?
  4. Does it run at the real target size?
  5. Does it fail previous rejected examples?

The gate does not have to be perfect. It has to be honest.
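
The deterministic half of that rule fits in a few lines. This is a minimal sketch: `is_red_first`, the dictionary keys, and every number in it are invented for illustration, and a real gate would read measurements from the actual font or screenshot.

```python
from typing import Callable, Iterable

Gate = Callable[[dict], bool]  # returns True when a build passes the gate


def is_red_first(gate: Gate, known_bad: Iterable[dict], min_failures: int = 2) -> bool:
    """True only if the gate rejects at least `min_failures` known bad builds."""
    failures = sum(1 for build in known_bad if not gate(build))
    return failures >= min_failures


# Invented measurements for two builds the user already rejected.
known_bad = [
    {"bbox_height_ok": True, "tail_depth": 0.04},
    {"bbox_height_ok": True, "tail_depth": 0.05},
]


def bbox_gate(build: dict) -> bool:
    """Old proxy gate: only looks at the bounding box."""
    return build["bbox_height_ok"]


def tail_gate(build: dict) -> bool:
    """Candidate gate: looks at the construction detail the eye rejected."""
    return build["tail_depth"] >= 0.10
```

With these values `is_red_first(bbox_gate, known_bad)` is `False` and `is_red_first(tail_gate, known_bad)` is `True`, so only the second gate has earned the right to judge a new candidate.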

4. Separate author and reviewer

Blind validation is not bureaucracy. It is how you stop self-deception.

The author can write the general guard and deterministic grader. A separate reviewer, whether a second model or a human, should inspect the output without being told what was changed.

For code, this can be a second agent reading the diff against a checklist.

For design, it can be a screenshot review where the reviewer only sees the target and candidate.

For content, it can be a source-integrity pass where the reviewer checks claims without seeing the intended narrative.

LangChain's human-in-the-loop pattern formalizes this idea at the tool level: when a risky action needs review, execution pauses and a human can approve, edit, reject, or respond. The same idea applies to quality gates. When the agent's gate has already failed twice, continuing without an independent reviewer is not autonomy. It is drift.
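
For the screenshot case, blindness can be enforced mechanically by building the reviewer's packet from an allow-list, so the author's notes never reach the reviewer. A hedged sketch follows; the field names and file paths are assumptions, and it does not use any LangChain API.

```python
REVIEWER_FIELDS = ("target", "candidate", "checklist")


def blind_packet(submission: dict) -> dict:
    """Return only the fields a blind reviewer may see; drop everything else."""
    missing = [key for key in REVIEWER_FIELDS if key not in submission]
    if missing:
        raise KeyError(f"submission is missing: {missing}")
    return {key: submission[key] for key in REVIEWER_FIELDS}


# Hypothetical submission; paths and notes are invented for illustration.
submission = {
    "target": "proofs/target.png",
    "candidate": "proofs/candidate.png",
    "checklist": ["Does the problem word appear?", "Does it hold at target size?"],
    "change_notes": "thickened the lower bar",  # author-only
    "previous_complaints": ["tail too heavy"],  # author-only
}
packet = blind_packet(submission)
```

An allow-list is used instead of a block-list so that any new author-side field stays hidden by default.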

5. Change strategy or close the task

A failure-loop breaker must allow an uncomfortable answer:

This method is not working.

That answer is valuable.

It may mean choosing a different donor font, using a real font editor, reducing the scope, accepting a fallback, or closing the experiment as a failed attempt. In agentic systems, "stop" is not weakness. It is a valid terminal state.

The worst outcome is not a failed task.

The worst outcome is pretending the task is almost done for another 40 turns.

What this means for agentic engineering

The useful lesson is not about typography.

It is about quality ownership.

Agents are strong at producing candidate work, transforming files, running tests, and preserving evidence. They are weak when they are asked to be the sole judge of a subjective result that their current measurements cannot see.

So the operating model changes:

  1. Use agents for production work while the proof loop is reducing uncertainty.
  2. Switch to diagnostic mode when the same defect repeats.
  3. Convert user complaints into rejected examples.
  4. Make gates fail known bad outputs before trusting them.
  5. Use blind validation when the author is contaminated.
  6. Treat "close as failed" as a legitimate outcome.
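
Read as a routing decision, the list above might look like the sketch below. The ordering is one interpretation of this note, not a specification; `Action`, `next_action`, and its three inputs are hypothetical names.

```python
from enum import Enum


class Action(Enum):
    PRODUCE = "keep producing candidates"
    BUILD_RED_GATE = "build the rejected corpus and a red-first gate"
    CHANGE_STRATEGY = "try a different strategy under blind review"
    CLOSE_AS_FAILED = "close the task as failed"


def next_action(defect_repeated: bool, gate_is_red_first: bool,
                strategies_left: int) -> Action:
    """Pick the next step for the workflow, not the next patch for the artifact."""
    if not defect_repeated:
        return Action.PRODUCE          # the proof loop is still reducing uncertainty
    if not gate_is_red_first:
        return Action.BUILD_RED_GATE   # diagnose before any new candidate
    if strategies_left > 0:
        return Action.CHANGE_STRATEGY
    return Action.CLOSE_AS_FAILED      # a legitimate terminal state
```

Note that no branch returns "try the same patch again": once the defect repeats, the only ways forward are a better gate, a different strategy, or closing the task.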

This is the difference between an AI workflow and an AI slot machine.

The conclusion

Agentic work does not become reliable because the model tries harder.

It becomes reliable when the workflow can tell that trying harder has stopped producing evidence.

The next generation of agent teams will not be separated by who has the best prompt. They will be separated by who has the best failure memory: rejected corpora, red-first gates, blind review, stop rules, and clean handoffs.

This is the article I wish I had read before spending dozens of iterations on a font that still did not pass the eye.

The failure was expensive.

The process it forced us to write down is worth keeping.

Sources
