Research synthesis · Updated 17 May 2026

What AI systems cite, according to our 158-publication audit and a 150M-link Runet dataset

AI systems do not cite the best-written article by default. In Gregory Shevchenko's 158-publication audit, platform authority, topic framing, publication age, question-based headings, and answer-ready structure explained citation better than generic writing quality alone.1, 2 The public English adaptation makes the retrieval logic explicit for an international audience, while a 150M-link Runet dataset shows that AI traffic is already large enough to matter for founders and CMOs.3

Author
Gregory Shevchenko
Source base
3 public source assets, 2 original research pages, 1 partner-dataset caveat
Main claim
Topic, platform, age, and extractable structure beat generic quality scores.1, 2
Best use
Definition, methodology, findings, implications, and caveats for AI citation work

What to cite from this page

Use this page when you need one founder-authored explanation of what AI systems cite, how the 158-publication audit was structured, and why the 150M-link Runet dataset matters without turning a partner dataset into a product pitch.

  • The audit covered 158 publications, with 78 audited across 28 criteria and checked against multiple AI systems.1, 2
  • The sharpest sample result was platform dependence: vc.ru reached a 52% citation rate in the sample, while same-author corporate-blog and Medium copies hit 0%.1, 2
  • Article age and structure mattered: pages older than two months were cited 43% of the time versus 7% for fresh pages, while question-form H2s (a 19% lift) and tables or checklists (an 18% lift) were the strongest text-level signals.2
  • The Runet market dataset should be cited as partner evidence about AI traffic distribution, not as a Humanswith.ai product endorsement.3

Definition

What does this page mean by "what AI systems cite"?

This page is a synthesis of three public assets: the original Russian-language citation audit on vc.ru,1 the English LinkedIn version for an international AEO and GEO audience,2 and a separate Runet market analysis built on 150 million links collected from six AI services.3 Together they answer a founder-level question: what makes a page retrievable, quotable, and reusable inside ChatGPT, Alice, Perplexity, Google AI Overviews, Claude, and Gemini?

The short answer is practical. AI systems reward pages that solve a broad commercial question on a trusted surface, state the answer early, and package the answer into reusable chunks such as question-led sections, tables, checklists, and source-backed statistics.1, 2 This is a different optimization target from classic SEO alone. The English source makes that explicit by framing retrieval as chunk selection rather than site-wide favoritism: the model picks dense, relevant chunks, not the prettiest domain homepage.2

| Signal | What the source showed | Why it matters |
| --- | --- | --- |
| Audit size | 158 publications reviewed, with 78 audited across 28 criteria.1, 2 | Large enough to compare platform, topic, age, and structure patterns instead of one-off anecdotes. |
| Platform effect | vc.ru reached a 52% citation rate in the sample; same-author copies on a corporate blog and Medium were 0%.1, 2 | Off-site authority can matter more than generic "good writing" on a weak surface. |
| Age effect | Articles older than two months were cited 43% of the time, versus 15% at one month and 7% for fresh pages.2 | AI visibility needs indexing and trust accumulation time. |
| Structure effect | Question-form H2s lifted citation by 19%, and comparison tables or checklists added another 18% in the sample.2 | Reusable answer units are easier for retrieval systems to extract and reuse. |
| Market scale | The partner dataset processed 150 million links from six AI services.3 | AI traffic is not a toy signal. It is large enough to shape where brands publish and measure visibility. |

Methodology

What did the 158-publication audit actually test?

The original audit came from an internal Humanswith.ai experiment that asked a concrete question: why would one article get cited by ChatGPT, Perplexity, Claude, Google AI Overviews, and other systems while a near-duplicate on another surface remained invisible?1, 2 The team reviewed 158 publications, audited 78 of them across 28 criteria, and then compared citation behavior across the major systems in the sample.1, 2

The audit did not treat "citation" as a vague vibe. It compared topic framing, surface authority, publication age, section structure, use of facts, table or checklist presence, and heading format. That matters because it separates two different problems. One is writing clarity. The other is retrieval fitness. The sources argue that AI citation depends more on the second problem than most content teams assume.1, 2

What was measured

Topic breadth, platform authority, age, heading structure, FAQ presence, tables or checklists, fact density, and cross-engine citation behavior in commercial-query contexts.1, 2

What it was not

It was not a universal law for every niche, language, or engine. It was a field study built from a real publishing corpus and then explained further in the English adaptation.1, 2

For a founder or CMO, that distinction matters. You should treat the audit as directional evidence about what makes a business page reusable inside AI answers, not as proof that one platform will always beat another in every market.

Findings

What patterns explained citation better than generic writing quality?

The most important finding in both the Russian and English sources is almost awkwardly simple: writing quality alone did not explain citation outcomes well.1, 2 In the sample, cited vc.ru articles averaged a quality score of 60.4, while uncited articles averaged 62.5.1, 2 The ignored articles were not dramatically worse. In formal terms they were slightly better. Citation still went elsewhere.

The sources reduce that result to a five-level hierarchy: topic first, platform second, age third, high-impact text structure fourth, and minimum structural threshold fifth.1, 2 Remove a high-level multiplier and the lower-level craft improvements do not rescue the page.

Working formula from the public English adaptation: right topic × authoritative platform × 2+ months of age × question-based headings × minimum structure threshold (FAQ + lists + tables + 1500+ words) = LLM citation.2
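One way to read the multiplicative formula is as a chain of gates: if any factor is zero, the product is zero. The sketch below assumes each factor can be reduced to a yes/no check, which is a simplification; the sources report lifts and rates, not a scoring function, and the names `PageSignals` and `citation_ready` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PageSignals:
    # Hypothetical yes/no inputs; the audit did not publish a scoring function.
    broad_topic: bool                 # right topic: a broad business question
    authoritative_platform: bool      # a surface that already carries trust
    age_months: float                 # time since publication
    question_h2_count: int            # H2s phrased as user questions
    meets_structure_threshold: bool   # FAQ + lists + tables + 1500+ words


def citation_ready(p: PageSignals) -> bool:
    """Treat each factor of the working formula as a 0/1 gate.

    The factors multiply, so one zero (for example, a weak platform)
    zeroes the product no matter how strong the other factors are.
    """
    factors = [
        int(p.broad_topic),
        int(p.authoritative_platform),
        int(p.age_months >= 2),
        int(p.question_h2_count > 0),
        int(p.meets_structure_threshold),
    ]
    return math.prod(factors) == 1


# The same article on two surfaces: only the platform factor differs.
same_article = dict(broad_topic=True, age_months=3,
                    question_h2_count=5, meets_structure_threshold=True)
print(citation_ready(PageSignals(authoritative_platform=True, **same_article)))   # True
print(citation_ready(PageSignals(authoritative_platform=False, **same_article)))  # False
```

A `True` here only means no factor is zero. Because the audit reports rates, a page that clears every gate is more likely to be cited, not certain to be.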

  • Broad business framing beats narrow headline targeting. Narrow vertical headlines such as bank-only or clinic-only framing collapsed the likely query surface in the cited sample.1, 2
  • Question-led H2 blocks create retrieval-friendly chunks. The audit measured a 19% lift from question-form headings, and the English version explains why: the next paragraph can act as the direct answer chunk.2
  • Tables and checklists improve extractability. Comparison tables or checklists produced an 18% lift in the sample because they package facts into machine-reusable units.2
  • Fresh publication is rarely enough. The age curve in the English version shows why teams should not panic after one week of silence: 43% for 2+ month pages versus 7% for fresh pages.2
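The text-level factors can be checked mechanically before publication. The sketch below assumes the draft is written in Markdown, treats an H2 ending in "?" as question-form, and uses the formula's threshold of FAQ + lists + tables + 1500+ words. `audit_markdown` and `StructureReport` are hypothetical names, and the detection rules are rough heuristics, not the audit's 28 criteria.

```python
import re
from dataclasses import dataclass

MIN_WORDS = 1500  # "1500+ words" from the working formula


@dataclass
class StructureReport:
    h2_total: int
    h2_questions: int
    has_faq: bool
    has_list: bool
    has_table: bool
    word_count: int

    @property
    def meets_threshold(self) -> bool:
        # Minimum structure threshold: FAQ + lists + tables + 1500+ words.
        return (self.has_faq and self.has_list and self.has_table
                and self.word_count >= MIN_WORDS)


def audit_markdown(text: str) -> StructureReport:
    lines = text.splitlines()
    h2s = [ln[3:].strip() for ln in lines if ln.startswith("## ")]
    headings = [ln for ln in lines if ln.startswith("#")]
    return StructureReport(
        h2_total=len(h2s),
        # Heuristic: a question-form H2 ends with a question mark.
        h2_questions=sum(h.endswith("?") for h in h2s),
        # Heuristic: any heading that names an FAQ section.
        has_faq=any(re.search(r"\b(faq|frequently asked questions)\b", h, re.I)
                    for h in headings),
        # Bulleted ("-", "*") or numbered ("1.") list items, including checklists.
        has_list=any(re.match(r"\s*([-*]|\d+\.)\s", ln) for ln in lines),
        # Markdown pipe-table rows.
        has_table=any(ln.lstrip().startswith("|") for ln in lines),
        word_count=len(text.split()),
    )


draft = "\n".join([
    "## What does AI cite?",
    "A direct answer paragraph.",
    "## Checklist",
    "- Broad topic",
    "| Signal | Lift |",
    "|---|---|",
    "## FAQ",
])
report = audit_markdown(draft)
print(report.h2_questions, report.has_table, report.meets_threshold)  # 1 True False
```

The short draft has every structural element but fails the threshold on length alone, which is the kind of gap this check is meant to surface.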

Cross-engine reading

What did the English LinkedIn version add for non-Russian readers?

The English adaptation turned the field study into a clearer model of retrieval behavior. It states that the LLM does not pick websites in the abstract; it picks chunks that survive retrieval and Top-K selection.2 That framing helps explain why broad commercial topics, question-form headings, and self-contained sections outperform dense but poorly packaged material.
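A toy version of chunk-level retrieval makes the point concrete. The sketch below assumes a Markdown page split at each H2 and scores chunks by word overlap with the query (Jaccard similarity). Real engines use their own embedding models and rankers, which the sources do not describe, so treat `split_into_chunks` and `top_k_chunks` as hypothetical illustrations only.

```python
import heapq
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for real embeddings."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def split_into_chunks(page: str) -> list[str]:
    """Split a Markdown page at each H2 so every chunk is heading + answer."""
    chunks, current = [], []
    for line in page.splitlines():
        if line.startswith("## ") and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks


def top_k_chunks(query: str, page: str, k: int = 2) -> list[str]:
    """Return the k chunks with the highest word overlap with the query."""
    q = tokenize(query)
    scored = []
    for chunk in split_into_chunks(page):
        words = tokenize(chunk)
        union = q | words
        # Jaccard similarity: shared words / all distinct words.
        score = len(q & words) / len(union) if union else 0.0
        scored.append((score, chunk))
    return [chunk for _, chunk in heapq.nlargest(k, scored, key=lambda s: s[0])]


page = "\n".join([
    "## How long before AI systems cite a page?",
    "Pages older than two months were cited far more often.",
    "## Our company history",
    "We were founded in a small office.",
])
best = top_k_chunks("how long before ai cites a page", page, k=1)[0]
print(best.splitlines()[0])  # ## How long before AI systems cite a page?
```

The question-form heading plus its direct answer wins because the heading itself shares vocabulary with the query, which is the same intuition the English source gives for question-led H2s.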

The English version also adds an engine-by-engine reading that the Russian article only hints at. Google AI Overviews still inherit part of classic Google authority, citing roughly 62% of domains that already rank in traditional results.2 Perplexity is described as stricter about sources and primary references, while Claude is portrayed as more conservative and more likely to lean toward established publishers and well-structured documentation.2 Those differences do not erase the core pattern. They show where the same article structure needs stronger sourcing or a different distribution surface.

For this page, that matters because the target audience is not only a Russian-language GEO operator. It is also an English-speaking founder, CMO, or operator who needs one coherent explanation of why citation readiness depends on structure, not on "AI-ready" buzzwords.

Market picture

What does the 150M-link Runet dataset change about the market picture?

The 150M-link source is different from the two citation-audit assets and should be treated differently.3 It is a partner-dataset market analysis, not a first-party product claim. Gregory Shevchenko introduces it as work from GPTfox and explicitly frames it as a dataset Humanswith.ai uses in Russian-market projects rather than a tool to promote on the personal site.3

Its value is scale. The dataset covers 150 million links extracted from real answers across six AI services, which means it can show which categories and surfaces already absorb AI traffic at market level.3 In the public write-up, financial aggregators, review platforms, and strong UGC surfaces appear repeatedly. The article also argues that vc.ru keeps outrunning classic editorial media in many GEO-style comparisons, which supports the platform-authority conclusion from the 158-publication audit.3

The correct use of this dataset is strategic. It helps answer where Russian-language AI traffic already clusters and which third-party surfaces deserve distribution effort. The incorrect use is to cite it as if it were a neutral product benchmark for Humanswith.ai itself. This page keeps that caveat visible on purpose.3
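A team can ask the same "where does AI traffic cluster" question at a much smaller scale on its own prompt set. The sketch below assumes you have already collected the URLs that AI answers cited for your tracked prompts (the collection step is not shown) and simply tallies each domain's share. `domain_share` is a hypothetical helper, not the GPTfox methodology.

```python
from collections import Counter
from urllib.parse import urlsplit


def domain_share(cited_urls: list[str]) -> list[tuple[str, float]]:
    """Return (domain, share of all cited links), most frequent first."""
    hosts = (urlsplit(u).hostname for u in cited_urls)
    # Drop unparseable URLs and fold "www." into the bare domain.
    counts = Counter(h.removeprefix("www.") for h in hosts if h)
    total = sum(counts.values())
    return [(domain, n / total) for domain, n in counts.most_common()]


links = [
    "https://vc.ru/marketing/1",
    "https://www.vc.ru/marketing/2",
    "https://example.com/review",
    "https://vc.ru/ai/3",
]
print(domain_share(links))  # [('vc.ru', 0.75), ('example.com', 0.25)]
```

A small tally like this is only a local view of your own prompts; it cannot stand in for a market-level dataset of 150 million links.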

Implications

What should founders and CMOs do with these findings?

The immediate lesson is not "write more content." It is "publish the right answer on the right surface, then give it time to become retrievable." The sources point to five operating rules that are practical enough to reuse.1, 2, 3

  1. Pick a broad business question before you pick a narrow vertical angle.
  2. Publish first on surfaces that already carry trust in your market, then connect that authority back to the first-party site.
  3. Make each H2 behave like a user question and make the next paragraph answer it directly.
  4. Package evidence into tables, checklists, and dated source-backed statements instead of burying it in narrative paragraphs.
  5. Measure citation, not only traffic, and allow at least a two-month window before calling the page dead.2
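Rule 5 implies tracking citation rate by page age rather than running a single pass/fail check. A minimal sketch, assuming you log one row per prompt check as `(page_id, age_in_days, cited)`. The bucket boundaries (under 30 days, 30 to 59 days, 60+ days) are an assumption chosen to mirror the fresh / one-month / 2+ month curve in the English source, and the sample rows are made up.

```python
from collections import defaultdict


def age_bucket(age_days: int) -> str:
    """Assumed buckets mirroring the source's fresh / one-month / 2+ month curve."""
    if age_days < 30:
        return "fresh"
    if age_days < 60:
        return "1 month"
    return "2+ months"


def citation_rate_by_age(checks: list[tuple[str, int, bool]]) -> dict[str, float]:
    """checks: one (page_id, age_in_days, cited) row per prompt check."""
    totals: dict[str, int] = defaultdict(int)
    cited: dict[str, int] = defaultdict(int)
    for _page_id, age_days, was_cited in checks:
        bucket = age_bucket(age_days)
        totals[bucket] += 1
        cited[bucket] += int(was_cited)
    return {bucket: cited[bucket] / totals[bucket] for bucket in totals}


# Made-up rows: page "a" checked at three ages, page "b" once.
checks = [("a", 10, False), ("a", 45, False), ("a", 70, True), ("b", 75, False)]
print(citation_rate_by_age(checks))
# {'fresh': 0.0, '1 month': 0.0, '2+ months': 0.5}
```

Under this scheme, a page is not judged until it has rows in the 2+ month bucket.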

On this site, that is why the research archive and the writing archive exist as first-party citation targets, while public platform assets on vc.ru and LinkedIn remain visible as supporting authority surfaces rather than hidden "off-site" work.

Caveats

Which caveats matter before anyone turns this into a universal law?

Three caveats belong next to every citation claim on this topic. First, the 158-publication audit is a real field study, but it is still sample-bound.1, 2 Second, the English adaptation adds cross-engine interpretation, which is useful but should not be mistaken for a fully controlled benchmark.2 Third, the 150M-link Runet article is a partner-dataset market analysis, so its strategic value is strongest when you use it to understand category-level traffic patterns, not to borrow someone else's product positioning.3

Those caveats do not weaken the case for AEO and GEO. They make the case cleaner. A useful research page should tell readers which claims are first-party findings, which claims are synthesis, and which claims depend on partner data.

Research method

How this page was assembled

This page is a founder-authored synthesis, not a net-new experiment. I used three source files from the local authority knowledge base, checked each numerical claim against those files, and kept the public-source caveats visible inside the body and source list. The goal was to create one page that can answer five recurring questions: definition, findings, methodology, implications, and caveats.

  • Primary source 1: the original Russian-language vc.ru publication on the 158-publication citation audit.
  • Primary source 2: the public English LinkedIn adaptation that expands the retrieval explanation and adds cross-engine commentary.
  • Supporting source 3: the vc.ru article on 150 million Runet AI links, clearly labeled here as partner-dataset evidence.

Sources

References and source notes

FAQ

Frequently asked questions

Q: Does better writing alone get a page cited by AI systems?

A: No. The audit's core surprise is that generic writing quality did not explain citation as well as topic framing, platform authority, publication age, and answer-ready structure.1, 2

Q: How long should a team wait before judging a new page?

A: The public English source reports 43% citation for 2+ month pages, versus 15% at one month and 7% for fresh pages.2 That does not mean every page needs two months to get cited, but it does mean that one week is the wrong evaluation window.

Q: Should founders treat the 150M-link article as a product proof page?

A: No. Treat it as partner-dataset market evidence about where AI traffic clusters in the Russian-language market.3 That is why this page repeats the caveat instead of turning the source into a sales proxy.

Q: What internal pages should I read next on this site?

A: Start with the research archive for the authority layer and the writing archive for the broader editorial map. The speaking archive adds public talks and decks that reinforce the same topic set.

Q: Why do you recommend six FAQ questions?

A: Six is a practical baseline: it gives you multiple reusable answer chunks, covers objections, and increases the odds that one answer matches a prompt. Use fewer if you genuinely have fewer questions; do not pad with filler.

Q: Should FAQ answers cite sources?

A: When you make factual or comparative claims, yes. Keep a visible Sources section with links to the exact pages behind the claims, and keep the visible FAQ aligned with the FAQ schema when you update the page.
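One way to keep the visible FAQ and the FAQ schema aligned is to generate both from a single list. The sketch below emits schema.org `FAQPage` JSON-LD (`Question` items with an `acceptedAnswer` of type `Answer`). The helper names `faq_jsonld` and `render_visible_faq` are hypothetical, and the answers are shortened versions of the ones on this page.

```python
import json

# Single source of truth for both the visible FAQ and the structured data.
FAQ = [
    ("Does better writing alone get a page cited by AI systems?",
     "No. Generic writing quality did not explain citation as well as topic "
     "framing, platform authority, publication age, and answer-ready structure."),
    ("Should FAQ answers cite sources?",
     "When you make factual or comparative claims, yes."),
]


def faq_jsonld(pairs: list[tuple[str, str]]) -> dict:
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def render_visible_faq(pairs: list[tuple[str, str]]) -> str:
    """Render the same pairs in this page's plain-text Q/A style."""
    return "\n\n".join(f"Q: {q}\n\nA: {a}" for q, a in pairs)


# Embed the JSON output in a <script type="application/ld+json"> tag.
print(json.dumps(faq_jsonld(FAQ), indent=2, ensure_ascii=False))
print(render_visible_faq(FAQ))
```

Because both outputs come from `FAQ`, editing an answer in one place updates the visible text and the schema together.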

Related pages