Open-source AI Search visibility audit stack

Operating principle

Why I started with deterministic gates

AI Search visibility can get vague very quickly. People jump from "we need to appear in ChatGPT" to prompts, dashboards, and content factories before the site has stable title tags, canonical URLs, JSON-LD, crawlable routes, or clean social previews.

That order is backwards. If the page layer is messy, every later signal becomes harder to interpret. A low citation rate might be a content problem, a crawl problem, a schema problem, an entity problem, or simply a page that is not represented clearly enough for machines to reuse. The same pattern showed up in my AI citation research: source clarity and repeatable structure matter before any distribution trick.

So the first public layer of the stack is intentionally boring. It asks whether the site can be fetched, whether key routes expose the expected head tags, whether JSON-LD exists and parses, whether the page has a visible H1, and whether there are obvious noindex or schema gaps. Only after that does it make sense to add LLM brand-mention checks, ContentOS readiness gates, distribution mapping, and weekly AI Search measurement.⁵⁶
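The first-layer questions above map to a handful of deterministic checks. Below is a minimal sketch in Python that uses only the standard library's html.parser and json. It is not geo-audit's implementation, and the issue strings are hypothetical. It also assumes the HTML has already been fetched (for example with urllib.request), so the fetch and status-code step is left out.

```python
import json
from html.parser import HTMLParser


class HeadSignalParser(HTMLParser):
    """Collects the head and page signals a deterministic gate looks at."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.meta_description = None
        self.canonical = None
        self.robots = ""
        self.h1_count = 0
        self.jsonld_blocks: list[str] = []
        self._in_title = False
        self._in_jsonld = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Attribute names arrive lowercased; valueless attributes arrive as None.
        a = {k: (v or "") for k, v in attrs}
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and a.get("name", "").lower() == "description":
            self.meta_description = a.get("content", "").strip()
        elif tag == "meta" and a.get("name", "").lower() == "robots":
            self.robots = a.get("content", "").lower()
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonical = a.get("href", "").strip()
        elif tag == "h1":
            self.h1_count += 1
        elif tag == "script" and a.get("type", "").lower() == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "script" and self._in_jsonld:
            self.jsonld_blocks.append("".join(self._buffer))
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_jsonld:
            self._buffer.append(data)


def page_gate(html: str) -> list[str]:
    """Return human-readable issues; an empty list means the page passed."""
    p = HeadSignalParser()
    p.feed(html)
    p.close()
    issues = []
    if not p.title.strip():
        issues.append("missing <title>")
    if not p.meta_description:
        issues.append("missing meta description")
    if not p.canonical:
        issues.append("missing canonical URL")
    if p.h1_count == 0:
        issues.append("no H1")
    if "noindex" in p.robots:
        issues.append("robots meta contains noindex")
    if not p.jsonld_blocks:
        issues.append("no JSON-LD")
    # JSON-LD must not only exist but also parse.
    for i, block in enumerate(p.jsonld_blocks):
        try:
            json.loads(block)
        except json.JSONDecodeError as exc:
            issues.append(f"JSON-LD block {i} does not parse: {exc.msg}")
    return issues
```

Every check is a plain presence or parse test, so two runs on the same HTML always return the same list. That repeatability is what makes the gate a stable baseline before any LLM-based scoring.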

Public release

What went public

The new public slice adds two modules to geo-audit on GitHub: site-crawl-lite and head-schema-gate.²

Module	What it checks	Why it matters
`site-crawl-lite`	Status, final URL, title, meta description, canonical, H1, word count, JSON-LD count/types, links, image alt counts, and noindex.	It gives a small-site route inventory before deeper SEO/GEO work starts.
`head-schema-gate`	Title, description, canonical, H1, Open Graph, JSON-LD parse errors, Article author sameAs, BreadcrumbList, and FAQPage signals.	It catches deterministic head/social/schema issues before the team treats the problem as a content or LLM issue.
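The schema side of head-schema-gate is mostly about walking parsed JSON-LD and looking for specific types and properties. Below is a hypothetical Python sketch of that walk. It flattens top-level lists and @graph, accepts @type as either a string or a list, and treats BlogPosting and NewsArticle as Article subtypes. The note strings are illustrative, and the code is not the module's actual logic. FAQPage is only worth flagging when the page has a visible FAQ, so the sketch reports a missing FAQPage as a note rather than a failure.

```python
from typing import Any, Iterator

ARTICLE_TYPES = {"Article", "BlogPosting", "NewsArticle"}


def iter_nodes(data: Any) -> Iterator[dict]:
    """Yield every top-level JSON-LD object, flattening lists and @graph."""
    if isinstance(data, list):
        for item in data:
            yield from iter_nodes(item)
    elif isinstance(data, dict):
        yield data
        if "@graph" in data:
            yield from iter_nodes(data["@graph"])


def types_of(node: dict) -> set[str]:
    # @type may be a single string or a list of strings.
    t = node.get("@type", [])
    return {t} if isinstance(t, str) else set(t)


def schema_signals(blocks: list[Any]) -> list[str]:
    """blocks are already-parsed JSON-LD documents (e.g. json.loads output)."""
    nodes = [n for b in blocks for n in iter_nodes(b)]
    types = set().union(*(types_of(n) for n in nodes))
    notes = []
    for node in nodes:
        if types_of(node) & ARTICLE_TYPES:
            authors = node.get("author", [])
            authors = authors if isinstance(authors, list) else [authors]
            # An author string or an author object without sameAs gives no identity link.
            if not any(isinstance(a, dict) and a.get("sameAs") for a in authors):
                notes.append("Article author has no sameAs")
    if "BreadcrumbList" not in types:
        notes.append("no BreadcrumbList")
    if "FAQPage" not in types:
        notes.append("no FAQPage signal")
    return notes
```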

These modules are informational gates for now. They produce scores and action items, but they do not silently change the existing composite GEO methodology. That was deliberate: a public audit tool should not move scoring goalposts in the same PR that adds new checks.
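One way to keep new gates informational is to carry a flag on each module result and exclude flagged modules from the composite. Below is a minimal Python sketch under assumptions. The weighted-mean formula is a stand-in for the real composite GEO methodology, and module names such as "technical" and "content" are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModuleResult:
    name: str
    score: float  # 0-100
    weight: float = 1.0
    informational: bool = False  # reported, but excluded from the composite


def composite_score(results: list[ModuleResult]) -> float:
    """Weighted mean over scoring modules only; informational gates never move it."""
    scoring = [r for r in results if not r.informational]
    total_weight = sum(r.weight for r in scoring)
    if total_weight == 0:
        raise ValueError("no scoring modules")
    return sum(r.score * r.weight for r in scoring) / total_weight


def report(results: list[ModuleResult]) -> dict:
    return {
        "composite": round(composite_score(results), 1),
        "informational": {r.name: r.score for r in results if r.informational},
    }
```

Adding an informational result leaves the composite unchanged. A later PR can flip the flag and document the methodology change on its own.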

Security boundary

How the secrets boundary works

The public GitHub repo must never contain real API keys, private credentials, internal hostnames, or personal secrets. Users who clone the repo bring their own keys. They can run the deterministic modules with no keys, then copy .env.example into a local, gitignored .env file if they want optional provider checks.³⁴
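The split can be enforced in code: the deterministic modules never read a key, and provider checks switch on only when a key is present locally. Below is a minimal Python sketch under assumptions. The variable names PAGESPEED_API_KEY and OPENAI_API_KEY and the `provider:` labels are hypothetical, since the real names are whatever .env.example lists. The small .env reader stands in for a library such as python-dotenv.

```python
import os
from pathlib import Path

# Hypothetical provider variables; the real list lives in .env.example.
OPTIONAL_KEYS = ("PAGESPEED_API_KEY", "OPENAI_API_KEY")


def load_dotenv_file(path: Path) -> dict[str, str]:
    """Minimal KEY=VALUE reader for a local, gitignored .env (no interpolation)."""
    values: dict[str, str] = {}
    if not path.exists():
        return values
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_env(dotenv: Path = Path(".env")) -> dict[str, str]:
    # Real environment variables win over values read from the .env file.
    return {**load_dotenv_file(dotenv), **os.environ}


def enabled_checks(env: dict[str, str]) -> list[str]:
    checks = ["site-crawl-lite", "head-schema-gate"]  # always on, need no keys
    checks += [f"provider:{k}" for k in OPTIONAL_KEYS if env.get(k)]
    return checks
```

Calling `enabled_checks(resolve_env())` in a fresh clone with no .env returns only the two deterministic modules. Letting real environment variables override the file keeps a private workspace or CI in control without editing the local .env.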

Public repo = safe code, docs, placeholders, tests, and trust checks. Private workspace = local keys, team credentials, configured providers, and deployment-specific proof.

That split matters for agentic work. If an agent can safely update public code without touching secrets, we can ship improvements openly. If the same stack runs in a private environment with configured keys, it can do richer audits. The boundary is not a convenience detail. It is part of the product architecture.

System map

Where this fits in the AI Search visibility stack

My working model is not "one tool replaces everything." It is a layered operating stack.

1. Crawl and head/schema gate

Can the site be fetched and represented cleanly enough for search engines, answer engines, and social surfaces?

2. ContentOS readiness

Does the page have a source pack, claims, evidence, answer units, FAQs, and human review before publication?

3. Distribution surfaces

Do Medium, LinkedIn, Habr, VC.ru, X, Substack, GitHub, and profile pages route authority back to the canonical URL?

4. Measurement loop

Across a fixed prompt set, do citations, source context, competitor mentions, and downstream traffic change after the work ships?
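One way to read the four layers is as gates that run in order, where a later layer runs only after the earlier ones report no blocking issues. Below is a hypothetical Python sketch of that ordering. The layer names and the toy checks are placeholders, not a description of how geo-audit or ContentOS are actually wired together.

```python
from typing import Callable

# Each layer returns a list of blocking issues; an empty list lets the next layer run.
Layer = tuple[str, Callable[[dict], list[str]]]


def run_stack(site: dict, layers: list[Layer]) -> dict:
    ran = []
    for name, check in layers:
        issues = check(site)
        ran.append(name)
        if issues:
            # Stop here: later signals would be hard to interpret on a messy base.
            return {"stopped_at": name, "issues": issues, "ran": ran}
    return {"stopped_at": None, "issues": [], "ran": ran}


# Toy placeholder checks for illustration only.
layers: list[Layer] = [
    ("crawl-and-head-schema", lambda s: [] if s.get("canonical") else ["missing canonical"]),
    ("contentos-readiness", lambda s: [] if s.get("source_pack") else ["no source pack"]),
    ("distribution", lambda s: []),
    ("measurement", lambda s: []),
]
```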

geo-audit is strongest at the first layer today. The roadmap is to connect it more tightly with the other layers, especially the cross-case patterns in my AI visibility case-study synthesis, but I do not want to hide the base under a private dashboard. The base should be inspectable.

Proof loop

What the first live proof found

I ran the two new gates against gregshevchenko.com before publishing this note. The result was not a dramatic failure, which is exactly what a baseline gate should show when recent technical cleanup worked: site-crawl-lite returned 99/100 across 19 checked routes, and head-schema-gate returned 94/100 on the homepage.²

The remaining notes were small follow-ups: one route without JSON-LD, and a recommendation to add BreadcrumbList JSON-LD where breadcrumb-like markup already exists. That is useful signal. It says the next improvement is a schema consistency pass, not a panic rewrite.

Boundary

What this does not replace

geo-audit does not replace Screaming Frog, Sitebulb, Oncrawl, log-file analysis, enterprise crawls, full keyword suites, or paid brand-monitoring products. Those tools are still useful.

The point is different. I want an install-first, agent-friendly, testable stack that can run inside a repo workflow, produce proof artifacts, keep secrets local, and explain exactly what it checked. That makes it easier to improve the process in public and then write about the improvement with the source code attached.

Roadmap

What I will build next

The next useful modules are not glamorous. I want an internal-link graph, a route-readiness runner, an image-alt gate, a sitemap/feed/llms.txt consistency checker, and a stronger bridge from ContentOS source packs into publish-readiness scoring.
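The internal-link graph and the sitemap consistency check can share one data structure: each crawled route mapped to the links found on it. Below is a hypothetical Python sketch of both checks. It finds orphan routes (no inbound internal links) and set differences between the sitemap and the crawl. The normalization rules are assumptions: relative links are resolved, #fragments are dropped, and trailing slashes are ignored. The same set comparison could be extended to feed and llms.txt URLs.

```python
from urllib.parse import urldefrag, urljoin


def normalize(url: str, base: str) -> str:
    # Resolve relative links and drop #fragments so /a and /a#x count as one route.
    absolute, _ = urldefrag(urljoin(base, url))
    return absolute.rstrip("/") or absolute


def link_graph_report(links: dict[str, list[str]], sitemap: set[str], home: str) -> dict:
    """links maps each crawled route URL to the hrefs found on that page."""
    routes = {normalize(r, r) for r in links}
    inbound = {r: 0 for r in routes}
    for src, hrefs in links.items():
        src_n = normalize(src, src)
        for href in hrefs:
            dst = normalize(href, src)
            # Count only links to other crawled routes; self-links and external URLs are ignored.
            if dst in inbound and dst != src_n:
                inbound[dst] += 1
    sitemap_n = {normalize(u, u) for u in sitemap}
    home_n = normalize(home, home)
    return {
        "orphans": sorted(r for r, n in inbound.items() if n == 0 and r != home_n),
        "in_sitemap_not_crawled": sorted(sitemap_n - routes),
        "crawled_not_in_sitemap": sorted(routes - sitemap_n),
    }
```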

That is the pattern I want to keep: build a small deterministic layer, prove it on my own site, publish the code, write the canonical note, and then distribute the idea through Medium, LinkedIn, Habr, VC.ru, X, and Substack only after gregshevchenko.com is the source of record.
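Before distribution, a cheap check is whether each syndicated copy declares the canonical note as its rel=canonical. Below is a hypothetical Python sketch. The normalization rules are assumptions: scheme and host are compared case-insensitively, a trailing slash is ignored, and the query string is kept. The copy's canonical value would come from the same kind of head parsing the first gate already does.

```python
from typing import Optional
from urllib.parse import urlsplit


def canonical_key(url: str) -> tuple[str, str, str, str]:
    """Normalize the parts that should not make two canonical URLs differ."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return (parts.scheme.lower(), parts.netloc.lower(), path, parts.query)


def syndication_ok(copy_canonical: Optional[str], source_url: str) -> bool:
    """True if a syndicated copy points its canonical at the source of record."""
    if not copy_canonical:
        return False  # no canonical declared: authority is not routed back
    return canonical_key(copy_canonical) == canonical_key(source_url)
```

Keeping the scheme in the key is deliberate in this sketch: a copy that points at the http:// version still needs fixing, even though it names the right page.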

FAQ

Questions I expect

Is geo-audit a replacement for Screaming Frog or Sitebulb?

No. It is an inspectable AI Search visibility audit layer. Enterprise crawlers still matter for large-scale crawling, log-file analysis, and advanced technical SEO workflows.

Does the public repo contain API keys?

No. Public users bring their own keys through local environment variables or a gitignored .env file. The public repository should contain placeholders and documentation, not real credentials.

Can the tool run without paid APIs?

Yes. The deterministic modules run without paid API keys. Optional keys unlock richer brand-mention, PageSpeed, and provider-specific checks.

Why start with crawl and head/schema gates?

Because LLM scoring is less useful when the page is missing canonical tags, titles, descriptions, JSON-LD, or crawlable routes. Deterministic checks remove preventable noise first.

How does this connect to ContentOS?

ContentOS governs the content production corridor. geo-audit checks whether the published site layer is technically clean enough for measurement, distribution, and AI Search visibility work.