Method
Why retesting is different from monitoring
Monitoring tells you what changed. Retesting tells you what to do because something changed.
Most AI visibility pages stop at a dashboard problem: build a prompt library, run it across ChatGPT, Perplexity, Gemini, Google AI Overviews, Claude, and Copilot, then watch citation frequency, share of voice, sentiment, source URLs, and platform variance. That work matters. Otterly.ai frames citation tracking as a repeatable workflow with weekly pulse checks, monthly analysis, and quarterly review, and it separates mentions from linked citations [1]. Semrush defines AI visibility across mentions, citations, and recommendations in AI-generated answers [4].
But a dashboard does not repair a source graph by itself. A report can tell you "mentioned but not cited." It cannot decide whether the next task is a title/meta repair, a source-pack update, a third-party correction, a distribution push, or a new canonical page.
That is the missing operating layer. Retesting should convert observations into backlog states.
The 30-day retesting cadence
Use a 30-day cadence for newly published source pages, repaired citation-gap pages, and high-value updates. The exact timing will vary by crawl speed, engine, and category volatility, but the decision logic should stay stable.
| Checkpoint | What to prove | What can change | Backlog decision |
|---|---|---|---|
| 24-48 hours | Live URL, canonical, robots, sitemap, feed, llms.txt, schema, and extractor output. | Technical eligibility. | Fix crawl/discovery before judging content. |
| Day 7 | Target prompts rerun across engines with baseline comparison. | Answer state, mention, citation, cited URL, sentiment. | Mark early movement, but avoid strategic overreaction. |
| Day 14 | Source dominance and competitor/source-type pattern. | Owned page, competitor page, third-party source, stale source, no source. | Choose page repair, distribution, third-party work, or new source pack. |
| Day 30 | Trend across two or more checks plus downstream signals. | Stable citation, unstable citation, no citation, negative context, wrong source. | Keep, refresh, create, distribute, correct, or escalate to ContentOS brief. |
The cadence protects the team from two opposite mistakes: declaring success too early, or rewriting the page before the page has had a fair chance to be discovered and tested.
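The checkpoint dates can be derived from the publish date. The sketch below is a minimal illustration, not a prescribed tool: the `CHECKPOINTS` names and the `retest_schedule` function are hypothetical, and the 24-48 hour window is represented by its upper bound (day 2) as a simplifying assumption.

```python
from datetime import date, timedelta

# Offsets in days after publication, taken from the cadence table.
# Assumption: the 24-48 hour window is scheduled at its upper bound (day 2).
CHECKPOINTS = {
    "technical_proof": 2,
    "day_7_prompt_retest": 7,
    "day_14_source_review": 14,
    "day_30_backlog_decision": 30,
}


def retest_schedule(published: date) -> dict[str, date]:
    """Return the due date of every checkpoint for a page published on `published`."""
    return {
        name: published + timedelta(days=offset)
        for name, offset in CHECKPOINTS.items()
    }


for name, due in retest_schedule(date(2026, 3, 2)).items():
    print(f"{name}: {due.isoformat()}")
```

If crawl speed or category volatility calls for different timing, only the offsets change. The checkpoint names, and the decision attached to each, stay the same.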
Start with a baseline before the page changes
The retest loop starts before publication. Save the baseline answer for every target prompt while the page is still unchanged or unpublished.
Baseline capture should include the prompt, engine, market, language, date, answer text, cited URLs, visible source titles, brand mention, competitor mentions, answer state, and screenshot or export. If the answer is personalized or account-dependent, record the account or environment class. Do not mix logged-in and logged-out checks in the same trend line.
The baseline has two jobs. First, it prevents false positives. If Perplexity already cited the brand before the repair, the new page did not cause that citation. Second, it prevents false negatives. If Google AI Overviews did not trigger for the prompt before publication and still does not trigger at day 7, the issue may be answer availability, not page quality.
For a source-pack repair, the baseline row should also include the claim being tested. That claim is the bridge between AI visibility and editorial work. "Does the answer cite us?" is too broad. "Does the answer cite the canonical page for the 30-day citation repair cadence?" is actionable.
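One possible shape for a baseline row is sketched below. The field names are illustrative, chosen to mirror the capture list above. `BaselineRow`, `same_trend_line`, and the example values (including the `example.com` URL and the export path) are assumptions, not an existing schema.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BaselineRow:
    prompt: str
    engine: str
    market: str
    language: str
    captured_on: str                      # ISO date of the check
    answer_text: str
    cited_urls: tuple[str, ...]
    source_titles: tuple[str, ...]
    brand_mentioned: bool
    competitor_mentions: tuple[str, ...]
    answer_state: str
    evidence_ref: str                     # screenshot or export location
    environment: str                      # e.g. "logged_out" or "logged_in"
    claim_tested: str = ""                # filled in for source-pack repairs


def _trend_key(row: BaselineRow) -> tuple[str, ...]:
    # Rows are comparable only when all of these match.
    return (row.prompt, row.engine, row.market, row.language, row.environment)


def same_trend_line(a: BaselineRow, b: BaselineRow) -> bool:
    """True when two checks may be compared in one trend line."""
    return _trend_key(a) == _trend_key(b)


baseline = BaselineRow(
    prompt="How often should I retest AI Search citations?",
    engine="Perplexity", market="US", language="en", captured_on="2026-03-01",
    answer_text="(answer text as captured)",
    cited_urls=("https://example.com/tool-guide",),
    source_titles=("Tool guide",), brand_mentioned=False,
    competitor_mentions=(), answer_state="competitor-cited",
    evidence_ref="exports/baseline-001.json", environment="logged_out",
    claim_tested="30-day citation repair cadence",
)
day_7 = replace(baseline, captured_on="2026-03-08", answer_state="cited")
logged_in = replace(day_7, environment="logged_in")

print(same_trend_line(baseline, day_7))      # True
print(same_trend_line(baseline, logged_in))  # False
```

Putting `environment` in the comparison key is what enforces the rule about not mixing logged-in and logged-out checks.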
Use thresholds, not vibes
The retest loop needs thresholds because AI answers are variable. A single miss can be noise. A repeated miss can be work.
Use conservative thresholds for early-stage pages:
| Signal | Threshold | Decision |
|---|---|---|
| Technical discovery failure | 1 confirmed failure | Fix immediately before prompt retesting. |
| Brand mentioned but not cited | 2 or more target prompts by day 14 | Create a source-ownership task. |
| Competitor cited | Same competitor or source type appears in 2 or more engines | Run competitor/source comparison. |
| Stale source cited | Any high-risk outdated factual source | Correct or supersede the stale source. |
| Cited, not absorbed | Owned URL cited in 2 checks without answer-language uptake | Add stronger extractable answer units. |
| No movement | 30 days with no state improvement on strategic prompts | Escalate to source-pack rebuild or new page decision. |
The exact numbers can change for a large brand or a high-volume category. The principle should not. Decide what pattern is strong enough to create work before you look at the answer.
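Writing the thresholds down as code is one way to fix them before anyone looks at an answer. The following is a minimal sketch under stated assumptions: the `Signals` counters and the `decisions` function are hypothetical names, and the numeric limits are copied from the table above and should be tuned for larger brands or higher-volume categories.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Counts an operator would tally for one page (hypothetical field names)."""
    technical_failures: int = 0
    mentioned_not_cited_prompts: int = 0     # target prompts, by day 14
    engines_citing_same_competitor: int = 0
    stale_high_risk_sources: int = 0
    cited_not_absorbed_checks: int = 0
    days_without_improvement: int = 0


def decisions(s: Signals) -> list[str]:
    """Map signals to backlog decisions using the early-stage thresholds."""
    # A confirmed technical failure blocks every other judgment.
    if s.technical_failures >= 1:
        return ["Fix technical discovery before prompt retesting"]
    out = []
    if s.mentioned_not_cited_prompts >= 2:
        out.append("Create a source-ownership task")
    if s.engines_citing_same_competitor >= 2:
        out.append("Run competitor/source comparison")
    if s.stale_high_risk_sources >= 1:
        out.append("Correct or supersede the stale source")
    if s.cited_not_absorbed_checks >= 2:
        out.append("Add stronger extractable answer units")
    if s.days_without_improvement >= 30:
        out.append("Escalate to source-pack rebuild or new page decision")
    return out


print(decisions(Signals(mentioned_not_cited_prompts=1)))
print(decisions(Signals(mentioned_not_cited_prompts=2, cited_not_absorbed_checks=2)))
```

A single miss produces an empty list, which is the intended behavior: noise creates no work.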
The retest log schema
The retest row is the smallest useful unit of AI visibility work.
| Field | Why it matters |
|---|---|
| Prompt | The exact query or buyer question being tested. |
| Engine | ChatGPT, Perplexity, Google AI Overviews, Gemini, Claude, or another answer engine. |
| Region and language | AI answers vary by market and language. |
| Baseline answer state | Absent, mentioned, cited, wrong-source, competitor-cited, stale-source, cited-not-absorbed. |
| Current answer state | The new state after retest. |
| Cited URL | The actual URL selected by the answer. |
| Source type | First-party page, competitor page, third-party article, directory, forum, research report, social profile, documentation. |
| Answer absorption | Whether the answer used the page's definition, numbers, comparison, or process, not only linked it. |
| Competitor source | Which competitor or third-party source is winning. |
| Sentiment | Positive, neutral, negative, or inaccurate. |
| Action | Keep, refresh, create, distribute, correct, or escalate. |
| Owner | Content, technical SEO, PR, product marketing, partner/source owner, or ContentOS. |
| Next retest date | The next scheduled measurement. |
If a row has no action field, it is not a retest log. It is a measurement archive.
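The action rule can be enforced at write time. This sketch assumes rows are stored as plain dictionaries with hypothetical snake_case keys. Fields such as the cited URL, source type, or competitor source are treated as optional here because they depend on the answer state, which is an assumption of the example rather than part of the schema above.

```python
ACTIONS = {"keep", "refresh", "create", "distribute", "correct", "escalate"}

# Fields every row needs regardless of answer state (assumed key names).
REQUIRED_FIELDS = (
    "prompt", "engine", "region_language", "baseline_state",
    "current_state", "action", "owner", "next_retest_date",
)


def row_problems(row: dict) -> list[str]:
    """Return the reasons a row is a measurement archive, not a retest log."""
    problems = [
        f"missing {name}"
        for name in REQUIRED_FIELDS
        if row.get(name) in (None, "")
    ]
    action = row.get("action")
    if action not in (None, "") and action not in ACTIONS:
        problems.append(f"unknown action: {action}")
    return problems


row = {
    "prompt": "How often should I retest AI Search citations?",
    "engine": "Gemini",
    "region_language": "US/en",
    "baseline_state": "wrong-source",
    "current_state": "wrong-source",
    "owner": "Distribution",
    "next_retest_date": "2026-04-01",
}
print(row_problems(row))                              # ['missing action']
print(row_problems({**row, "action": "distribute"}))  # []
```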
Group prompts by job, not only by keyword
AI Search prompts should be grouped by the job the answer has to perform.
Use five prompt groups:
- Definition prompts: "What is X?" or "What does this method mean?"
- How-to prompts: "How do I do X?"
- Comparison prompts: "X vs Y" or "best tools for X."
- Troubleshooting prompts: "Why is X not working?"
- Decision prompts: "When should I do X instead of Y?"
This article's primary job is how-to and troubleshooting. It should answer how often to retest, what to check, and what to do when the result is weak. If a comparison prompt asks for the best AI visibility tools, this page may not need to win as the only cited page. It may need to be cited as the workflow source that explains what the tools should feed.
That distinction keeps the backlog sane. Not every prompt should become a new page. Some prompts should map to a tool page, some to a methodology page, some to a glossary note, and some to a third-party corroboration push.
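A first-pass grouping can be automated before a human reviews it. The rules below are a rough keyword heuristic built only from the example phrasings in the list above. The `prompt_group` function, the patterns, and their order are assumptions, and real prompt libraries will need manual correction.

```python
import re

# Ordered rules: the first matching pattern wins (order is an assumption).
RULES = [
    ("comparison", re.compile(r"\bvs\.?\b|\bversus\b|\bbest\b", re.I)),
    ("troubleshooting", re.compile(r"^why\b|\bnot working\b", re.I)),
    ("decision", re.compile(r"^when should\b|\binstead of\b", re.I)),
    ("how-to", re.compile(r"^how\b", re.I)),
    ("definition", re.compile(r"^what (is|are|does)\b", re.I)),
]


def prompt_group(prompt: str) -> str:
    """Assign a prompt to one of the five job groups, or 'unclassified'."""
    text = prompt.strip()
    for group, pattern in RULES:
        if pattern.search(text):
            return group
    return "unclassified"


for p in ["What is X?", "How do I do X?", "X vs Y",
          "Why is X not working?", "When should I do X instead of Y?"]:
    print(f"{p!r} -> {prompt_group(p)}")
```

Anything that lands in `unclassified` is a prompt to label by hand, not a prompt to drop.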
What to do in the first 24-48 hours
The first check is not about whether ChatGPT cites the page. It is about whether the page is technically eligible to become a source.
Check the live URL. Check status code. Check canonical. Check robots and noindex. Check whether the page appears in sitemap, feed, and LLM discovery surfaces. Check whether Article and FAQPage schema parse. Check whether the first screen answers the primary prompt. Check whether the visible source links are actually visible to users, not only hidden in JSON-LD.
For gregshevchenko.com, this means proving the canonical page, sitemap, feed.xml, llms.txt, llms-full.txt, visible sources, FAQ parity, and extracted article text before treating an AI answer as a content-quality signal.
If the page fails this step, do not rewrite the article. Fix the discovery problem.
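Part of this check can be scripted against the page's HTML. The sketch below uses only the Python standard library and covers three of the items: canonical, robots noindex, and whether Article and FAQPage JSON-LD parse. Status code, sitemap, feed, and llms.txt checks are out of scope here. `HeadSignals`, `eligibility_problems`, and the sample markup are hypothetical, and the JSON-LD handling assumes a top-level `@type` string (no `@graph` nesting).

```python
import json
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collect canonical URL, robots directives, and JSON-LD @type values."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.robots = ""
        self.schema_types = set()
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and "canonical" in (a.get("rel") or "").lower().split():
            self.canonical = a.get("href")
        elif tag == "meta" and (a.get("name") or "").lower() == "robots":
            self.robots = (a.get("content") or "").lower()
        elif tag == "script" and a.get("type") == "application/ld+json":
            self._in_jsonld, self._buffer = True, []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                doc = json.loads("".join(self._buffer))
            except json.JSONDecodeError:
                return  # unparseable schema is reported as missing
            for item in doc if isinstance(doc, list) else [doc]:
                if isinstance(item, dict) and isinstance(item.get("@type"), str):
                    self.schema_types.add(item["@type"])


def eligibility_problems(html: str, expected_canonical: str) -> list[str]:
    parser = HeadSignals()
    parser.feed(html)
    parser.close()
    problems = []
    if parser.canonical != expected_canonical:
        problems.append(f"canonical is {parser.canonical!r}")
    if "noindex" in parser.robots:
        problems.append("robots meta contains noindex")
    for needed in ("Article", "FAQPage"):
        if needed not in parser.schema_types:
            problems.append(f"{needed} schema not found or not parseable")
    return problems


SAMPLE = """<html><head>
<link rel="canonical" href="https://example.com/retest">
<meta name="robots" content="index,follow">
<script type="application/ld+json">{"@type": "Article"}</script>
</head><body></body></html>"""
print(eligibility_problems(SAMPLE, "https://example.com/retest"))
```

An empty list means the HTML-level checks passed. It says nothing about whether the page is discoverable through the sitemap, feed, or llms.txt, which still need their own proof.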
What to do at day 7
Day 7 is the first prompt retest.
Rerun the target prompts across the target engines. Do not use a single prompt. A page can move for the direct prompt and stay invisible for comparison prompts. A page can be cited in Perplexity and absent in Gemini. A page can be mentioned by ChatGPT but cited through a third-party page.
Classify each prompt that is not yet cleanly cited into one of six gap states:
- Absent: the brand or page does not appear.
- Mentioned, not cited: the brand appears, but no owned or accepted source is linked.
- Wrong-source cited: an old page, directory, or third-party summary wins.
- Competitor-cited: the answer uses a competitor as the evidence anchor.
- Stale-source cited: the answer cites outdated information.
- Cited, not absorbed: the URL appears, but the answer does not use the page's evidence or framing.
Day 7 is early. Treat it as movement detection, not final judgment.
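The classification can be made repeatable with a small rule function. This is a sketch under explicit assumptions: the operator has already labeled each cited URL with one of the hypothetical kinds `"canonical"`, `"stale"`, `"competitor"`, or `"other"`, and the precedence between states (for example, stale before competitor) is a choice made for this example, not something the method prescribes. The extra `CITED` value comes from the log schema's answer-state field.

```python
from enum import Enum


class AnswerState(Enum):
    ABSENT = "absent"
    MENTIONED_NOT_CITED = "mentioned, not cited"
    WRONG_SOURCE = "wrong-source cited"
    COMPETITOR_CITED = "competitor-cited"
    STALE_SOURCE = "stale-source cited"
    CITED_NOT_ABSORBED = "cited, not absorbed"
    CITED = "cited"  # success state, listed in the log schema


def classify(brand_mentioned: bool, cited_kinds: set[str], absorbed: bool) -> AnswerState:
    """Pick one state per prompt/engine check (precedence is an assumption)."""
    if "canonical" in cited_kinds:
        return AnswerState.CITED if absorbed else AnswerState.CITED_NOT_ABSORBED
    if "stale" in cited_kinds:
        return AnswerState.STALE_SOURCE
    if "competitor" in cited_kinds:
        return AnswerState.COMPETITOR_CITED
    if "other" in cited_kinds:
        return AnswerState.WRONG_SOURCE
    # Nothing relevant was cited: fall back to mention status.
    return AnswerState.MENTIONED_NOT_CITED if brand_mentioned else AnswerState.ABSENT


print(classify(True, set(), False).value)
print(classify(True, {"canonical"}, False).value)
print(classify(False, {"competitor", "other"}, False).value)
```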
How to handle engine-specific results
Do not average engines too early.
ChatGPT, Perplexity, Gemini, Claude, and Google AI Overviews do not behave like five identical search results pages. They differ in source availability, citation UX, retrieval paths, freshness, and how much they expose. A prompt can improve in one engine and stay flat in another for reasons that are not purely editorial.
Use engine-specific notes:
- ChatGPT: track whether the answer uses a definition or framework from the page, even when citations are inconsistent by interface or mode.
- Perplexity: inspect the exact cited URLs and whether the answer prefers article, forum, directory, or first-party pages.
- Google AI Overviews: separate whether an overview triggers from whether the target page is selected as a source.
- Gemini: compare answer wording and source mix across repeated runs before deciding that a page failed.
- Claude: track source use when browsing/search is available, but avoid treating a no-source answer as the same signal as a cited-source answer.
The output should not be "AI visibility score went up." The output should be "Perplexity moved from competitor-cited to owned-cited for the how-to prompt; Gemini still cites a third-party explainer for the comparison prompt; Google AI Overviews has no stable trigger yet."
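A report in that shape can be generated from the retest rows without ever computing an average. The snippet below is illustrative only: `engine_notes` and the row keys (`engine`, `group`, `baseline`, `current`) are hypothetical names, and the state labels are the ones used in this article.

```python
from collections import defaultdict


def engine_notes(rows: list[dict]) -> list[str]:
    """Produce one plain-language note per engine and prompt group."""
    by_engine = defaultdict(list)
    for row in rows:
        by_engine[row["engine"]].append(row)

    notes = []
    for engine in sorted(by_engine):
        for row in by_engine[engine]:
            if row["baseline"] == row["current"]:
                notes.append(
                    f'{engine}: still {row["current"]} for the {row["group"]} prompt'
                )
            else:
                notes.append(
                    f'{engine}: moved from {row["baseline"]} to {row["current"]} '
                    f'for the {row["group"]} prompt'
                )
    return notes


rows = [
    {"engine": "Perplexity", "group": "how-to",
     "baseline": "competitor-cited", "current": "cited"},
    {"engine": "Gemini", "group": "comparison",
     "baseline": "wrong-source cited", "current": "wrong-source cited"},
]
for note in engine_notes(rows):
    print(note)
```

Keeping one note per engine and prompt group preserves exactly the variance that an aggregate score would hide.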
Why day 14 matters
Day 14 is when the source pattern starts to matter more than the individual answer.
If the same competitor domain appears across several engines, you have a source-dominance problem. If third-party pages win repeatedly, you may need corroboration or external source work. If the page is cited but not absorbed, your article may have enough authority to be selected but not enough extractable structure to shape the answer.
Profound's platform citation research is useful here because it reminds teams that answer engines source information differently [2]. A source that wins in Perplexity may not be the source that wins in Google AI Overviews. That is why the day-14 review should group failures by source type and engine, not by one aggregate score.
Use this matrix:
| Day-14 pattern | Diagnosis | Next action |
|---|---|---|
| Owned page cited and absorbed | The source page is working. | Keep monitoring; add internal links if needed. |
| Owned page cited but not absorbed | The page is selected but not shaping the answer. | Add clearer definitions, comparison rows, numbers, and procedural snippets. |
| Brand mentioned but not cited | Recognition exists, source ownership is weak. | Build a stronger source unit and reinforce discovery. |
| Third-party source wins | The engine trusts independent corroboration. | Improve or influence the third-party source and link it back to canonical facts. |
| Competitor wins | The competitor owns the answer's evidence shape. | Compare source depth, freshness, proof type, and page structure. |
| Stale source wins | The answer is anchored to old evidence. | Refresh dated facts and distribute the newer source. |
| No stable pattern | The prompt may be volatile. | Increase sample size before rewriting. |
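Source dominance can be checked mechanically before the matrix is applied. The following sketch counts how many distinct engines cite each non-owned domain. `dominant_domains`, the `.example` domains, and the two-engine default (borrowed from the competitor-cited threshold earlier in this article) are assumptions. It requires Python 3.9 or later for `str.removeprefix`.

```python
from collections import defaultdict
from urllib.parse import urlparse


def dominant_domains(
    citations: list[tuple[str, str]], owned_domain: str, min_engines: int = 2
) -> dict[str, list[str]]:
    """Return non-owned domains cited by at least `min_engines` distinct engines.

    `citations` is a list of (engine, cited_url) pairs from the day-14 reruns.
    """
    engines_by_domain = defaultdict(set)
    for engine, url in citations:
        host = (urlparse(url).hostname or "").removeprefix("www.")
        if host and host != owned_domain:
            engines_by_domain[host].add(engine)
    return {
        domain: sorted(engines)
        for domain, engines in engines_by_domain.items()
        if len(engines) >= min_engines
    }


citations = [
    ("Perplexity", "https://www.competitor.example/guide"),
    ("Gemini", "https://competitor.example/blog/post"),
    ("ChatGPT", "https://owned.example/retest"),
    ("Gemini", "https://forum.example/t/1"),
]
print(dominant_domains(citations, "owned.example"))
# {'competitor.example': ['Gemini', 'Perplexity']}
```

The result names the domain and the engines, which is enough to pick a row in the matrix. It does not say whether the domain is a competitor or an independent third party, so that label still comes from the operator.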
The day-30 decision
At day 30, make a backlog decision. Do not leave the row in a permanent "monitoring" state.
The day-30 options are:
- Keep: the page is cited or answer state improved enough; continue monthly monitoring.
- Refresh: the page is eligible but lacks extractable answer units, fresh facts, or stronger structure.
- Create: the prompt needs its own page instead of being buried inside a broader article.
- Distribute: the source exists, but engines still prefer external corroboration or recent references.
- Correct: a third-party or stale source is wrong and needs outreach, profile repair, or replacement.
- Escalate: the pattern needs a ContentOS brief, source-pack rebuild, technical audit, or PR/source strategy.
ZipTie's freshness framing supports the refresh question, but refresh is only one possible decision [5]. A page can be fresh and still lose because the source type is wrong. A page can be old and still win because it has the clearest answer unit.
The day-30 decision should answer: what is the smallest source-graph change likely to move the next retest?
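The rule against permanent monitoring can be enforced with a simple overdue check. This is a minimal sketch: `undecided_rows` and the row keys (`prompt`, `published`, `decision`) are hypothetical, and the function only flags rows. Choosing among the six options remains a human decision.

```python
from datetime import date

DECISIONS = {"keep", "refresh", "create", "distribute", "correct", "escalate"}


def undecided_rows(rows: list[dict], today: date) -> list[str]:
    """List prompts that passed day 30 without one of the six decisions."""
    overdue = []
    for row in rows:
        age_days = (today - row["published"]).days
        if age_days >= 30 and row.get("decision") not in DECISIONS:
            overdue.append(row["prompt"])
    return overdue


rows = [
    {"prompt": "how-to prompt", "published": date(2026, 3, 2), "decision": "keep"},
    {"prompt": "comparison prompt", "published": date(2026, 3, 2), "decision": "monitoring"},
    {"prompt": "new prompt", "published": date(2026, 3, 25), "decision": None},
]
print(undecided_rows(rows, today=date(2026, 4, 1)))  # ['comparison prompt']
```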
Example backlog after a 30-day retest
Imagine the canonical page was published for the prompt: "How often should I retest AI Search citations?"
At baseline, ChatGPT mentioned the concept but cited no owned page. Perplexity cited a tool guide. Google AI Overviews did not trigger. Gemini cited a general AI SEO article. Claude answered without sources.
At 24-48 hours, the page is live, canonical, listed in the sitemap, and included in the feed and llms.txt, and its Article/FAQPage schema parses. No technical task remains.
At day 7, Perplexity cites the canonical page for one prompt, but not for comparison prompts. ChatGPT uses the "30-day loop" phrasing without citation. Gemini still cites the third-party article.
At day 14, the source pattern is clear: the owned page is strong for the exact how-to prompt, but the comparison prompt still wants broader third-party validation. The backlog should not be "rewrite the whole page." It should be:
| Backlog item | Owner | Reason |
|---|---|---|
| Add one comparison table that names what tracking tools provide vs what the repair workflow requires. | ContentOS/editor | The comparison prompt needs a clearer bridge from tools to workflow. |
| Add internal links from measurement and citation-gap pages to the new retesting page. | Site/content | The source graph should show this as the next step after repair. |
| Create one external LinkedIn/Medium adaptation that cites the canonical framework. | Distribution | Engines may need corroborating offsite references. |
| Retest comparison prompts at day 30. | AI Visibility operator | Check whether source dominance moved. |
That is a productive retest. It created narrow work.
When a retest becomes a ContentOS brief
A retest should become a ContentOS brief when the row contains a repeatable pattern, not just a weird answer.
Good triggers:
- Three or more target prompts show the same answer-state failure.
- A competitor is repeatedly cited for the claim you want to own.
- The brand is mentioned but not linked across multiple engines.
- The answer cites your page but does not absorb the definition, table, or proof.
- A third-party source wins with stale or incomplete facts.
- The prompt clearly needs a page that does not exist.
The ContentOS brief should not say "write an article about AI visibility." It should include the prompt, failed answer state, winning source, source type, target claim, missing evidence, required page surface, required third-party corroboration, and next retest date.
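The first trigger, three or more target prompts with the same failure, is straightforward to detect from the log. The sketch below is an assumption-laden illustration: `brief_triggers` and the row keys are hypothetical, and a prompt that fails the same way in several engines is counted once, which is one reasonable reading of "three or more target prompts".

```python
from collections import Counter

FAILURE_STATES = {
    "absent", "mentioned, not cited", "wrong-source cited",
    "competitor-cited", "stale-source cited", "cited, not absorbed",
}


def brief_triggers(rows: list[dict], min_prompts: int = 3) -> list[str]:
    """Return failure states shared by at least `min_prompts` distinct prompts."""
    # Deduplicate so one prompt failing in several engines counts once.
    pairs = {
        (row["prompt"], row["state"])
        for row in rows
        if row["state"] in FAILURE_STATES
    }
    counts = Counter(state for _, state in pairs)
    return sorted(state for state, n in counts.items() if n >= min_prompts)


rows = [
    {"prompt": "p1", "engine": "ChatGPT", "state": "mentioned, not cited"},
    {"prompt": "p1", "engine": "Gemini", "state": "mentioned, not cited"},
    {"prompt": "p2", "engine": "ChatGPT", "state": "mentioned, not cited"},
    {"prompt": "p3", "engine": "Claude", "state": "mentioned, not cited"},
    {"prompt": "p4", "engine": "Perplexity", "state": "competitor-cited"},
]
print(brief_triggers(rows))  # ['mentioned, not cited']
```

Each returned state is a candidate brief. The brief itself still needs the fields listed above, starting with the prompt, the winning source, and the target claim.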
That is how the measurement loop becomes production work.
What not to do after a weak retest
Do not rewrite the page because one answer did not cite it.
Do not add generic "AI search" keywords if the failure is third-party trust.
Do not treat a brand mention as a citation win.
Do not treat a citation as a complete win if the answer ignores the page's definition, table, or evidence.
Do not collapse all engines into one score before source-type analysis.
Do not publish a new page for every prompt. Some prompt gaps need internal links, source refresh, offsite corroboration, schema fixes, or third-party source repair.
And do not leave old retest rows open forever. A row that never becomes a decision teaches the team to ignore the measurement system.
How this connects to the source-pack workflow
The citation-gap repair workflow explains how to diagnose answer states and rebuild the evidence graph. This retesting cadence explains how to know whether the repair worked.
The source pack should travel with the retest row. It should include:
- Primary prompt and target prompt cluster.
- Desired canonical URL.
- Current winning sources.
- Claim being supported.
- Approved first-party evidence.
- Approved third-party corroboration.
- Rejected or stale evidence.
- Required snippets, FAQ, schema, and internal links.
- Retest schedule and owner.
Without that packet, teams tend to re-open the article and make unfocused edits. With it, the next action is obvious.
The operating rhythm
For a small team, the simplest rhythm is weekly review and monthly decisions.
Every week, rerun the watched prompt set and mark state changes. Every month, make backlog decisions for rows that crossed thresholds. Do not let the weekly meeting become a discussion about every interesting answer. Keep it to changed states, source dominance, negative or inaccurate answers, and rows that now require work.
The recurring agenda can be short:
- Which prompts changed state?
- Which prompts now cite owned sources?
- Which prompts cite competitors or stale third-party sources?
- Which owned citations are not absorbed into the answer?
- Which rows crossed a threshold for ContentOS, technical SEO, distribution, or third-party work?
- Which rows should be closed, watched, or escalated?
That is enough. AI Search retesting should not become a second analytics department. It should become the source of the next useful repair.
FAQ
How often should AI Search citations be retested?
For a newly published or repaired source page, use a 30-day loop: a 24-48 hour technical proof, a day-7 prompt retest, a day-14 source-dominance review, and a day-30 backlog decision. After that, move strategic prompts to weekly or monthly monitoring depending on value and volatility.
What should be checked in the first 48 hours?
Check live status, canonical, robots, noindex, sitemap, feed, llms.txt, schema, FAQ parity, visible source links, and extracted first-screen answer. Do not judge citation performance until technical eligibility is proven.
What changes at day 7?
Day 7 is the first prompt-level comparison. Rerun the target prompts across engines and compare answer state, cited URL, source type, sentiment, and whether the answer absorbed the page's evidence.
Why does day 14 matter?
Day 14 is where source patterns become visible. You can see whether owned pages, competitors, third-party sources, stale pages, or no stable source dominate the prompt cluster.
What is the day-30 decision?
The day-30 decision is the backlog action: keep, refresh, create, distribute, correct, or escalate to a ContentOS/source-pack repair brief.
How are mentions different from citations?
A mention names the brand. A citation links or attributes information to a source. A brand can be mentioned often and still have a source ownership problem if the answer cites competitors or third-party summaries.
When should a retest become a ContentOS brief?
Create a ContentOS brief when multiple prompts show the same failure, a competitor repeatedly wins the claim, a third-party source dominates, or the page is cited but not absorbed into the answer.
Sources
1. How to Track AI Search Engine Citations & Sources: The Complete Guide for 2026. Mention-vs-citation distinction, prompt library, weekly/monthly/quarterly monitoring cadence, platform variance, source URL distribution, competitor tracking.
2. AI Platform Citation Patterns: How ChatGPT, Google AI Overviews, and Perplexity Source Information. Engine-specific source behavior and the need to compare citation patterns across answer engines.
3. The 10 Best AI Visibility Tools in 2026. Competitor/tool landscape and the key distinction between tools that only alert and tools that help fix the gap.
4. AI visibility: What it is and how to grow yours in 2026. Definition of AI visibility across mentions, citations, and recommendations.
5. Content Refresh Strategy for AI Citations. Freshness and refresh-cadence language for day-30 backlog decisions.
6. Best AI search monitoring tools for 2026. Monitoring-tool market framing: inaccurate claims, competitor presence, citation patterns, and blind spots.
7. How to measure AI Search visibility with prompts, citations, traffic, and revenue signals. Canonical measurement vocabulary: prompts, citations, recommendation context, source surfaces, traffic, and downstream demand.
8. AI visibility measurement is a weekly operating rhythm. Parent operating model; this article should extend it with 30-day post-publish decision gates.
9. AI Search Citation Gap Repair Workflow. Immediate predecessor article; this retesting article should continue from the source-pack repair workflow.
10. How to build a source pack for AI Search content. Source-pack artifact shape and evidence discipline.
11. Prompt-page map for AI Search site architecture. Prompt-to-page mapping and required page surfaces.
12. ContentOS first-party and third-party evidence scoring for AI Search. How retest failures become evidence scoring and repair tasks.