Platform / Indexing

what AI knows about you, before it ever searches.

Centium is the answer engine optimization (AEO) platform built for AI indexing. Centium tracks your indexing footprint, checks which AI crawlers can reach your site, and cross-references it all with the training window of every major model. You see what each model already knows about you, and how that shapes its responses.

01 / Modes

closed book, open book.

AEO is not SEO with new rules. The mechanics are different because the machine reading your content is different. Every AI answer is shaped by two kinds of bots: training scrapers that built the model’s memory, and live-retrieval agents that fetch fresh data at query time. Indexing is how you show up in the first one.

Closed Book Exam

Intelligence only.

The model answers from what it learned during training. The pages it indexed, the citations baked into its weights, the reinforcement it received before launch. No internet, no live lookups. If your brand was not in the training data, you do not exist for the closed-book answer.

Indexing optimizes for the brain.

Open Book Exam

Intelligence plus live search.

The model still has its trained opinions, but it can also reach out to the web at query time. PerplexityBot, ChatGPT-User, Gemini-Deep-Research and others fetch fresh pages to ground the answer. If your site is reachable and well-organized, it can contribute to the response even if it was not in the training data.

Search optimizes for the eyes.

Indexing decides what AI knows. Search decides what it can find. Centium measures both. See Sources for the search side.

02 / Crawl History

every crawl, on a calendar.

Common Crawl is the largest open dataset on the internet and the foundational training source for ChatGPT, Claude, Gemini, Perplexity, and Grok. Centium tracks how often your site is indexed there, month after month.

The dashboard builds a calendar of every crawl, flags any cycle you missed, and lists your most-crawled and most-recently-indexed pages so you know exactly which content AI has had a chance to learn from.
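A calendar like this can be approximated from Common Crawl's public URL index. The sketch below is illustrative only and is not Centium's implementation. It assumes index records arrive as JSON lines carrying a `timestamp` field in `YYYYMMDDhhmmss` form and a `url` field, and that one crawl cycle is expected per month. The function names and sample records are hypothetical.

```python
import json
from collections import Counter
from datetime import date


def months_between(start: date, end: date) -> list[str]:
    """Return every month from start to end, inclusive, as 'YYYY-MM'."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def crawl_calendar(cdx_lines: list[str], start: date, end: date) -> dict[str, int]:
    """Count captures per month; a month with 0 captures counts as missed."""
    counts = Counter()
    for line in cdx_lines:
        record = json.loads(line)
        ts = record["timestamp"]  # assumed format: "20260114093000"
        counts[f"{ts[:4]}-{ts[4:6]}"] += 1
    return {m: counts.get(m, 0) for m in months_between(start, end)}


# Hypothetical sample records, not real crawl data.
sample = [
    '{"url": "https://example.com/", "timestamp": "20260114093000"}',
    '{"url": "https://example.com/pricing", "timestamp": "20260115101500"}',
    '{"url": "https://example.com/", "timestamp": "20260320080000"}',
]

by_month = crawl_calendar(sample, date(2026, 1, 1), date(2026, 3, 1))
missed = [m for m, n in by_month.items() if n == 0]
print(by_month)
print(missed)
```

With this sample, February has no captures and is the only month flagged as missed. A month in which Common Crawl published no crawl at all would also show as missed under this simple assumption.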

Crawl History
Pages indexed by Common Crawl, the largest training source for every major AI model.
Pages Indexed: 1,247
Coverage: 89%
Last Crawled: 04.2026
Last 12 Months (Jun '25 to May '26), each month marked Indexed or Missed
Crawler Access
Which AI crawlers can reach your site, based on your robots.txt rules.
1 Blocked
GPTBot: Allowed
ClaudeBot: Allowed
Google-Extended: Allowed
PerplexityBot: Allowed
CCBot: Allowed
Meta-ExternalAgent: Blocked

Showing 6 of 21 tracked crawlers

03 / Crawler Access

every bot, on the list.

21 AI crawlers across 11 companies want access to your site, split between training scrapers and live-retrieval agents. GPTBot, ClaudeBot, Google-Extended, Applebot-Extended, PerplexityBot, Bytespider, Meta-ExternalAgent, and 14 more. Centium runs a full robots.txt audit and reports exactly who can get in.

Most sites accidentally block at least one critical AI crawler. Any restriction that quietly removes you from training data, live answers, or AI-integrated search gets flagged at the top of the dashboard.
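A robots.txt audit of this kind can be sketched with Python's standard `urllib.robotparser`. This is a minimal illustration, not Centium's audit. The crawler names come from this page, while the robots.txt body, the `audit` function, and the `example.com` URL are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Crawler names as listed on this page; only a subset of the 21 tracked.
CRAWLERS = [
    "GPTBot",
    "ClaudeBot",
    "Google-Extended",
    "PerplexityBot",
    "CCBot",
    "Meta-ExternalAgent",
]

# Hypothetical robots.txt: one crawler blocked, everything else allowed.
SAMPLE_ROBOTS_TXT = """\
User-agent: Meta-ExternalAgent
Disallow: /

User-agent: *
Allow: /
"""


def audit(robots_txt: str, url: str = "https://example.com/") -> dict[str, str]:
    """Report whether each crawler may fetch the given URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        bot: "Allowed" if parser.can_fetch(bot, url) else "Blocked"
        for bot in CRAWLERS
    }


report = audit(SAMPLE_ROBOTS_TXT)
blocked = [bot for bot, status in report.items() if status == "Blocked"]
for bot, status in report.items():
    print(f"{bot}: {status}")
```

The sketch checks only the site root. A fuller audit would test several representative paths, since a rule can block one directory while leaving the rest of the site open.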

04 / Training Coverage

training data, by model.

Every AI model has a knowledge cutoff. Pages published after that date cannot be remembered. They have to be found, live, through search.

Centium cross-references your Common Crawl history with the published training window of each model. You see which of your pages were captured before each model finished training, which were published after, and which of your most important pages a model has to search to find.
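The cross-reference itself reduces to a date comparison. The sketch below assumes you already have the first capture date of each page and a cutoff date for one model. The function name, the page paths, and the choice to treat "Apr 2025" as April 30, 2025 are all assumptions made for illustration.

```python
from datetime import date


def split_by_cutoff(
    first_captured: dict[str, date], cutoff: date
) -> tuple[list[str], list[str]]:
    """Split pages into those captured on or before the cutoff and those after."""
    in_training = sorted(u for u, d in first_captured.items() if d <= cutoff)
    after_cutoff = sorted(u for u, d in first_captured.items() if d > cutoff)
    return in_training, after_cutoff


# Hypothetical pages and capture dates, not real data.
pages = {
    "/": date(2024, 11, 3),
    "/pricing": date(2025, 2, 18),
    "/blog/launch": date(2025, 6, 9),
}

# Assumption: a cutoff stated as "Apr 2025" is read as the last day of that month.
in_training, after_cutoff = split_by_cutoff(pages, date(2025, 4, 30))
print(in_training)
print(after_cutoff)
```

A capture before the cutoff means the page was available to be learned from, not that it was definitely used. Pages in the second list can only reach an answer through live search.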

AI Training Coverage
Which pages were in the training data for each AI model, and which were published after the cutoff.
Knowledge Cutoff: Apr 2025
In Training: 412 pages
After Cutoff: 38 pages
FAQ

questions, answered.

AI indexing is the process of large language models capturing your website during the training crawl phase. The pages a model has indexed are the pages it can describe from memory. Centium tracks your indexing footprint across Common Crawl, the open dataset that trains every major AI model, so you know what AI already knows about you before it ever searches.

Common Crawl is a free, open repository of web crawl data that has been collected continuously since 2007. It contains over 300 billion pages spanning 19 years and is the largest single training source for every major AI model: ChatGPT, Claude, Gemini, Perplexity, and Grok. If your pages are not in Common Crawl, the chance that an AI model learned about your brand during training is very low.

AI crawlers fall into three categories. Training crawlers like GPTBot, ClaudeBot, Google-Extended, Applebot-Extended, CCBot, and Meta-ExternalAgent collect your content, or control its use, to teach the next generation of AI models. Live browsing crawlers like ChatGPT-User, PerplexityBot, and Gemini-Deep-Research visit your site in real time when a user asks an AI a question. Search indexing crawlers like Googlebot, Bingbot, and Applebot power AI-integrated search results. Blocking the wrong ones quietly removes your brand from where AI is looking.

Centium queries Common Crawl directly for every domain on the platform and cross-references the crawl history against the published training window of each AI model. You see exactly which pages were captured before each model finished training, which were published after the cutoff, and how recently your site has been re-crawled.

Training crawlers index your content so AI models can learn from it during their next training cycle. What they learn stays in the model. Live search crawlers visit your site at query time to ground an AI answer in fresh information. What they fetch is used only for that single response. Indexing optimizes for the brain. Search optimizes for the eyes. See Sources for how Centium tracks the live-search side.

See your own indexing footprint

see what AI already knows.

Strong indexing means AI models recognize you, cite you, and recommend you. Weak indexing means they default to your competitors. Centium tracks your indexing footprint across every major AI model so you know exactly what is shaping each answer about your brand.