How to Structure Your Website for AI Crawlers
AI crawlers read HTML, not JavaScript. They need static content, clean heading hierarchy, and machine-readable structure. Here's how to build a website AI platforms can actually read.
AI crawlers (GPTBot, ClaudeBot, PerplexityBot, Google-Extended) read HTML source, not rendered JavaScript. Websites that rely on client-side rendering are effectively invisible to AI search. Static HTML, clean heading hierarchy, answer capsule formatting, and proper robots.txt configuration are the foundations of AI-visible website structure.
The fundamental problem: JavaScript rendering
Most AI crawlers do not execute JavaScript. They read raw HTML source code. This creates a massive visibility gap:
| Rendering approach | AI crawler visibility | Common platforms |
|---|---|---|
| Static HTML / SSG | Full visibility | Astro, Hugo, Eleventy, Jekyll |
| Server-side rendered (SSR) | Full visibility | Next.js (SSR mode), Nuxt, Astro |
| Static export from SSR | Full visibility | Next.js (static export), Gatsby |
| Client-side rendered (CSR) | Minimal to zero | React SPA, Vue SPA, Angular SPA |
| Heavy JS WordPress themes | Partial — depends on theme | WordPress with Elementor, Divi, WPBakery |
If your website content only appears after JavaScript executes in the browser, AI crawlers cannot see it. This applies to GPTBot (ChatGPT), ClaudeBot (Claude), PerplexityBot (Perplexity), and most other AI crawlers. Static Site Generation (SSG) or Server-Side Rendering (SSR) are required for AI visibility.
How to test what AI crawlers see
- View page source (not inspect element) — this is what crawlers read
- Disable JavaScript in your browser and reload — this is what crawlers see
- Run curl https://yoursite.com/page in a terminal — this returns the raw HTML
- If your content disappears in any of these tests, AI crawlers cannot see it
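These checks can be scripted. Here is a minimal sketch in Python (the URLs and test phrase are placeholders): it fetches the raw HTML the way a non-rendering crawler would, with no JavaScript execution, and checks whether a phrase from your page is present in the source.

```python
import urllib.request

def fetch_raw_html(url: str, user_agent: str = "GPTBot") -> str:
    """Fetch the raw HTML source, as a non-rendering crawler would."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

def content_visible(raw_html: str, phrase: str) -> bool:
    """True only if the phrase exists in the HTML source itself."""
    return phrase.lower() in raw_html.lower()

# A client-side-rendered page ships an empty shell, so its visible
# text is missing from the source; a static page contains it directly.
csr_shell = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'
ssg_page = "<html><body><h1>Pricing</h1><p>Plans start at £500/month.</p></body></html>"

print(content_visible(csr_shell, "Plans start at £500/month"))  # False
print(content_visible(ssg_page, "Plans start at £500/month"))   # True
```

Run the same check against your own pages with `fetch_raw_html("https://yoursite.com/page")` and a phrase you know should appear on the page.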
Robots.txt configuration for AI crawlers
Your robots.txt file controls which AI crawlers can access your content. Many websites block AI crawlers without realising it — either through broad wildcard rules or security plugins.
# Allow all AI search crawlers
User-agent: GPTBot
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: anthropic-ai
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Google-Extended
Allow: /
# Block sensitive directories from all bots
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /staging/
Common robots.txt mistakes
- Wildcard blocking: a User-agent: * entry with Disallow: / blocks everything, including AI crawlers
- Security plugin defaults: WordPress security plugins often block unknown user agents
- Forgetting OAI-SearchBot: GPTBot is OpenAI's training crawler, but OAI-SearchBot powers real-time ChatGPT search
- Blocking ClaudeBot: Some sites block ClaudeBot specifically — this keeps your content out of Claude's training data and its answers
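You can verify your rules before deploying with Python's standard-library robots.txt parser. A sketch using a shortened version of the example file above (the URLs are illustrative):

```python
import urllib.robotparser

ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /admin/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# GPTBot matches its own entry, so the wildcard /admin/ block
# does not apply to it; unlisted bots fall through to the * entry.
print(rp.can_fetch("GPTBot", "https://example.com/blog/post"))     # True
print(rp.can_fetch("SomeOtherBot", "https://example.com/admin/"))  # False
```

Swapping in your live file (via rp.set_url and rp.read) lets you confirm each AI crawler's access without waiting for a visit in your server logs.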
See also: What should my robots.txt look like for AI search?
The answer capsule format
An answer capsule is a 40-60 word factual paragraph placed immediately after a heading. It contains a direct, specific answer to the question implied by the heading. AI platforms extract these capsules as citation-ready content. Pages using this format see significantly higher citation rates across ChatGPT, Gemini, and AI Overviews.
Answer capsule structure
- Placement: Immediately after the H2 or H3 heading
- Length: 40-60 words (concise enough for extraction)
- Content: Direct factual answer with specific data points
- Formatting: Bold the first sentence or the entire capsule
- CSS class: Use .answer-capsule for Speakable schema targeting
Example
After a heading "How much does AI search optimisation cost?", the answer capsule would be:
"AI search optimisation costs between £500-£5,000 per month from specialist agencies. The price depends on scope, competition, and the number of AI platforms targeted. Most UK agencies charge separately for audit, implementation, and ongoing monitoring."
Heading hierarchy for AI extraction
AI crawlers use heading hierarchy to understand content structure and extract relevant sections. Follow these rules:
| Rule | Why it matters |
|---|---|
| One H1 per page | Defines the primary topic for AI extraction |
| H2 for major sections | Each H2 should be independently answerable |
| H3 for subsections | Provides granular extraction targets |
| No skipped levels | Don't jump from H2 to H4 — breaks hierarchy logic |
| Question-format headings | Match user queries directly for citation matching |
| Answer capsule after each H2 | Gives AI a citation-ready extract per section |
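The "no skipped levels" and "one H1" rules above can be checked automatically. A sketch using Python's built-in HTML parser, which flags multiple (or missing) H1s and any jump of more than one heading level:

```python
from html.parser import HTMLParser

class HeadingAudit(HTMLParser):
    """Collect heading levels and report hierarchy problems."""
    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # Match h1..h9 tags only (ignores <hr> etc.).
        if len(tag) == 2 and tag[0] == "h" and tag[1].isdigit():
            self.levels.append(int(tag[1]))

    def problems(self):
        issues = []
        if self.levels.count(1) != 1:
            issues.append(f"expected exactly one h1, found {self.levels.count(1)}")
        for prev, cur in zip(self.levels, self.levels[1:]):
            if cur > prev + 1:
                issues.append(f"skipped level: h{prev} followed by h{cur}")
        return issues

audit = HeadingAudit()
audit.feed("<h1>Title</h1><h2>Section</h2><h4>Oops</h4>")
print(audit.problems())  # ['skipped level: h2 followed by h4']
```

Feed it the raw HTML of a page (for example, the output of the fetch test earlier in this guide) to audit a live URL.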
One idea per paragraph
AI models process content at the paragraph level. Long paragraphs that cover multiple ideas create extraction confusion. Keep paragraphs focused:
- One claim per paragraph — don't bundle multiple statistics or facts
- 2-4 sentences maximum — shorter is easier to extract
- Lead with the fact — put the key information in the first sentence
- Avoid transition fluff — "As we discussed earlier" adds nothing for AI crawlers
Content freshness signals
76.4% of ChatGPT-cited pages were updated within 30 days. Freshness is a real citation factor. Implement these:
- dateModified in schema — update this whenever you revise content
- Visible "Last updated" date on the page — AI crawlers read this
- Genuine content updates — don't just change the date, actually revise the content
- Regular content audits — review and update key pages at least monthly
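In schema terms, the freshness signal is the dateModified property on your Article markup. A minimal JSON-LD sketch generated in Python (the headline and publication date are placeholders):

```python
import json
from datetime import date

article_schema = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "How to Structure Your Website for AI Crawlers",
    "datePublished": "2026-01-15",  # placeholder: your original publish date
    # Update this field whenever the content genuinely changes,
    # alongside the visible "Last updated" date on the page.
    "dateModified": date.today().isoformat(),
}

print(json.dumps(article_schema, indent=2))
```

The resulting JSON goes in a script type="application/ld+json" tag in the page head, and the dateModified value should match the visible date shown to readers.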
llms.txt — the machine-readable index
llms.txt is an emerging standard that provides AI models with a machine-readable index of your most important content. Similar to how robots.txt tells crawlers what they can access, llms.txt tells AI models what they should prioritise. Place it at your domain root alongside robots.txt and sitemap.xml.
# Example llms.txt
# Your Company Name
# https://example.com
## About
> Brief description of your company and what you do.
## Key Pages
- [Homepage](https://example.com/)
- [About Us](https://example.com/about/)
- [Services](https://example.com/services/)
- [Contact](https://example.com/contact/)
## Expertise Areas
- [Topic Area 1](https://example.com/topic-1/)
- [Topic Area 2](https://example.com/topic-2/)
## FAQs
- [Common Questions](https://example.com/faq/)
IndexNow protocol
IndexNow notifies Bing (and therefore ChatGPT) immediately when you publish or update content. Without IndexNow, you're waiting for Bing to discover changes through normal crawling.
- Supported by: Bing, Yandex, Seznam, Naver
- Not supported by: Google (uses its own systems)
- Impact: Near-instant Bing indexation, which feeds ChatGPT and Copilot
- Implementation: API call or plugin (WordPress, Cloudflare Workers)
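Submission is a single POST to the IndexNow endpoint. A hedged sketch (the host, key, and URLs are placeholders; you generate the key yourself and host it as a text file at the keyLocation):

```python
import json
import urllib.request

def build_indexnow_payload(host, key, urls):
    """Assemble the JSON body the IndexNow endpoint expects."""
    return {
        "host": host,
        "key": key,
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": urls,
    }

def submit(payload):
    """POST the payload to the shared IndexNow endpoint."""
    req = urllib.request.Request(
        "https://api.indexnow.org/indexnow",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status  # 200/202 indicate the submission was accepted

payload = build_indexnow_payload(
    "example.com",
    "your-indexnow-key",
    ["https://example.com/updated-page/"],
)
# submit(payload)  # uncomment once you have a real key in place
```

Calling this from your publish or deploy hook means Bing (and therefore ChatGPT and Copilot) hears about updates within seconds rather than waiting for a recrawl.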
Bing Webmaster Tools submission
Since ChatGPT uses Bing's index, submitting your sitemap to Bing Webmaster Tools is essential. Many businesses only submit to Google Search Console and miss Bing entirely.
- Go to bing.com/webmasters
- Add your site and verify ownership
- Submit your XML sitemap
- Enable IndexNow for instant update notifications
- Monitor crawl errors and coverage
The Astro + Cloudflare advantage
Static site generators like Astro, combined with edge deployment on Cloudflare, create the ideal architecture for AI visibility:
- Pre-rendered HTML — every page is static, fully readable by all crawlers
- No JavaScript dependency — content exists in the HTML source
- Edge caching — fast response times from global CDN
- Markdown for Agents — Cloudflare's feature that serves clean markdown to AI crawlers
- Lighthouse scores 95+ — compared to WordPress average of 40-70
This site is built on Astro and deployed to Cloudflare — you can read about our methodology.
Technical checklist
| Item | Priority | Status check |
|---|---|---|
| Static HTML or SSR rendering | Critical | View source — is content visible? |
| Allow AI crawlers in robots.txt | Critical | Check for GPTBot, ClaudeBot, PerplexityBot |
| Submit sitemap to Bing | High | Bing Webmaster Tools dashboard |
| Implement IndexNow | High | Test with Bing URL Submission API |
| Answer capsules after headings | High | 40-60 word factual paragraphs |
| Clean heading hierarchy | High | H1 > H2 > H3, no skipped levels |
| One idea per paragraph | Medium | 2-4 sentences, lead with the fact |
| Schema markup | High | Google Rich Results Test |
| Create llms.txt | Medium | File at domain root |
| Content freshness dates | Medium | dateModified in schema + visible date |
Oliver Mackman
AI Search Analyst, SEOCompare
Oliver leads SEOCompare's editorial and comparison research. With over a decade in digital marketing, he oversees agency evaluation, tool testing, and AI search data analysis.
Last reviewed: 7 April 2026
Need help with AI search visibility?
Get a free AI visibility audit to see how your business appears across ChatGPT, Gemini, Perplexity, and AI Overviews.
Request your free audit