Is Your Website Invisible to AI Search? A Developer's Checklist


I run an AI-powered test prep platform (LearnQ.ai) and a SaaS product for educational institutions (VEGA AI). When AI search engines like ChatGPT, Perplexity, and Google AI Overviews started becoming real traffic sources, I went back to basics and audited both sites for AI crawlability.
What I found surprised me. Several settings that were perfectly fine for Google search were actively blocking AI crawlers. This checklist is what I now run on every site before doing any AI search optimization work.

1. Check Your robots.txt for AI Crawler Blocks

Most developers set up robots.txt for Google and Bing and forget about it. AI crawlers use different user-agent strings.

Open your robots.txt file and check for blocks on these agents:

  • GPTBot (OpenAI)
  • ClaudeBot (Anthropic)
  • PerplexityBot
  • Google-Extended (Google's robots.txt token controlling use of content
    for AI training and Gemini features)
  • CCBot (Common Crawl, used by many LLM training pipelines)

A blanket Disallow: / on User-agent: * blocks all of them. If you added aggressive bot protection after a traffic spike or a security scare, there is a good chance you blocked AI crawlers along with the bad bots.

Fix: Add explicit Allow rules for AI crawlers you want to permit, or remove overly broad Disallow rules.
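You can spot-check this programmatically with the standard library's `urllib.robotparser`. A minimal sketch; the robots.txt text and test URL below are placeholders to swap for your own:

```python
# Spot-check: which AI crawlers does this robots.txt let through?
# The robots.txt text and test URL are placeholders -- swap in your own.
from urllib import robotparser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Allow: /
"""

AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "CCBot"]

def check_ai_access(robots_text: str, url: str = "https://example.com/blog/post") -> dict:
    """Map each AI user-agent to True/False for whether it may fetch `url`."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_text.splitlines())
    return {agent: rp.can_fetch(agent, url) for agent in AI_AGENTS}

print(check_ai_access(ROBOTS_TXT))
```

Run it once against your live robots.txt text and any agent that comes back False is a crawler you are turning away.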

2. Create and Publish an llms.txt File

llms.txt is an emerging standard (think robots.txt but designed for LLMs). It sits at yourdomain.com/llms.txt and tells AI systems what your site is about, which pages are most important, and how the content should be used.

It is not yet universally supported, but ChatGPT, Perplexity, and several AI agents are beginning to respect it. Early adoption gives you an edge.

A basic llms.txt looks like this:


# llms.txt for proaisearch.com

> Pro AI Search is India's first AI search optimization resource.

## Key Pages
- [Home](https://proaisearch.com/)
- [GEO Guide](https://proaisearch.com/generative-engine-optimization/)
- [AEO Guide](https://proaisearch.com/answer-engine-optimization/)

Keep it simple. List your most important pages with clean descriptions.
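If you would rather generate the file than hand-edit it, a few lines of Python can build it from a dict of key pages. The function name, site name, and summary here are placeholders of mine, not part of any standard:

```python
# Hypothetical llms.txt generator; all values shown are placeholders.
def build_llms_txt(site_name: str, summary: str, pages: dict) -> str:
    """Render a minimal llms.txt: H1 title, blockquote summary, key-page links."""
    lines = [f"# {site_name}", "", f"> {summary}", "", "## Key Pages"]
    lines += [f"- [{title}]({url})" for title, url in pages.items()]
    return "\n".join(lines) + "\n"

print(build_llms_txt(
    "Pro AI Search",
    "AI search optimization resource.",
    {"Home": "https://proaisearch.com/"},
))
```

Write the result to `/llms.txt` at your web root and you are done.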

3. Audit Your Bot Protection Settings

Cloudflare, Wordfence, Sucuri, and similar tools are aggressive by default. Check these specific settings:

Cloudflare: Under Security > Bots, check if "Block AI Scrapers and Crawlers" is enabled. This is a relatively new toggle that blocks GPTBot, ClaudeBot, and others by default when switched on.

Wordfence: Check the rate limiting rules. If you set aggressive thresholds after a brute force attack, legitimate AI crawlers hitting multiple pages during a crawl session can get auto-blocked.

Sucuri: Check the "Block All Bots" option under Security > Hardening. It does exactly what it says.

Review your server access logs if you have them. Look for 403 or 429 responses against GPTBot or ClaudeBot. Those are your AI crawlers getting turned away at the door.
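That log check can be scripted. A sketch that assumes a combined-format access log, where the status code sits right after the quoted request line; the regex and bot list are mine to adjust:

```python
# Sketch: scan combined-format access log lines for AI crawlers being refused.
# Assumes the status code follows the quoted request; adjust for your log format.
import re

AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "CCBot")
STATUS = re.compile(r'" (\d{3}) ')  # status code right after the quoted request

def blocked_hits(log_lines):
    """Return (bot, status) pairs where an AI crawler got a 403 or 429."""
    hits = []
    for line in log_lines:
        m = STATUS.search(line)
        if not m or m.group(1) not in ("403", "429"):
            continue
        for bot in AI_BOTS:
            if bot in line:
                hits.append((bot, m.group(1)))
    return hits

sample = [
    '1.2.3.4 - - [10/May/2025] "GET /guide HTTP/1.1" 403 0 "-" "GPTBot/1.1"',
    '5.6.7.8 - - [10/May/2025] "GET /guide HTTP/1.1" 200 512 "-" "ClaudeBot"',
]
print(blocked_hits(sample))
```

Feed it your real log file line by line; any hits it prints are crawlers your stack is rejecting.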

4. Validate Your Schema Markup

AI engines do not just crawl text. They parse structured data to understand entities, relationships, and content type. Missing or broken schema is a silent visibility killer.

Check for these schema types depending on your site:

  • Organization schema on your homepage (name, URL, logo, social
    profiles)
  • Article or BlogPosting schema on content pages
  • FAQPage schema on pages with Q&A content
  • BreadcrumbList schema for site structure signals
  • Person schema for author pages (critical for E-E-A-T in AI citations)

Use Google's Rich Results Test and the Schema Markup Validator (validator.schema.org). Fix any errors before doing anything else. AI engines rely heavily on structured data when deciding whether to cite your content.
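As a concrete example, here is a minimal Organization snippet built with Python's `json` module. Every field value is a placeholder to replace with your own before pasting the output into your homepage's `<head>`:

```python
# Minimal Organization JSON-LD, built in Python for illustration.
# All field values are placeholders -- replace with your real details.
import json

org = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Co",
    "url": "https://example.com/",
    "logo": "https://example.com/logo.png",
    "sameAs": ["https://www.linkedin.com/company/example"],
}
snippet = f'<script type="application/ld+json">{json.dumps(org)}</script>'
print(snippet)
```

The same pattern works for Article, FAQPage, and Person types: build the dict, dump it, wrap it in the script tag.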

5. Check Page Rendering: Is Your Content Actually in the HTML?

If your site is JavaScript-heavy and content renders client-side, AI crawlers may be seeing blank pages. Most AI crawlers do not execute JavaScript the way Googlebot does.

Quick test: Disable JavaScript in your browser and load your key pages. If the main content disappears, AI crawlers likely see the same empty page.
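The same test can be automated: fetch the raw HTML without executing any JavaScript (approximating what most AI crawlers see) and check for content you know should be there. The URL, user-agent string, and markers below are placeholders:

```python
# Sketch: fetch raw HTML (no JS execution) and check for expected content.
# URL, user-agent string, and markers are placeholders for your own pages.
from urllib.request import Request, urlopen

def fetch_raw_html(url: str) -> str:
    """Fetch a page the way a non-JS crawler would: raw HTML only."""
    req = Request(url, headers={"User-Agent": "render-check/0.1"})
    with urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

def looks_client_rendered(html: str, markers) -> bool:
    """True if none of the expected content markers appear in the raw HTML."""
    return not any(marker in html for marker in markers)

# Offline example (no network): an empty single-page-app shell.
spa_shell = '<html><body><div id="root"></div></body></html>'
print(looks_client_rendered(spa_shell, ["Your headline", "Key fact"]))  # True
```

If `looks_client_rendered` comes back True for your key pages, the content is being injected client-side and most AI crawlers never see it.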

Fix options:

  • Implement server-side rendering (SSR) or static site generation (SSG)
  • Use pre-rendering services for specific pages
  • At minimum, ensure critical content (headings, body text, key facts)
    is in the initial HTML response

6. Check Canonical Tags and Indexability

This one catches people off guard. If you have:

  • <meta name="robots" content="noindex"> on important pages
  • Canonical tags pointing to a different URL
  • Pages blocked in robots.txt that you actually want crawled

...AI search engines will either ignore those pages or attribute the content to the wrong URL.

Run a crawl with Screaming Frog or Sitebulb. Filter for noindex pages and mismatched canonicals. Fix anything that should be crawlable and citable.
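A crawler does this at scale, but for a single page you can sketch the same check with the stdlib HTML parser. The class and function names here are mine, not from Screaming Frog or Sitebulb:

```python
# Single-page indexability sketch using only the stdlib HTML parser.
# Class and helper names are illustrative, not from any audit tool.
from html.parser import HTMLParser

class IndexabilityCheck(HTMLParser):
    """Collect the robots meta directive and canonical link from a page."""
    def __init__(self):
        super().__init__()
        self.noindex = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "robots":
            self.noindex = "noindex" in (a.get("content") or "").lower()
        if tag == "link" and (a.get("rel") or "").lower() == "canonical":
            self.canonical = a.get("href")

def audit(html: str, page_url: str) -> list:
    """Return a list of indexability issues for the page at page_url."""
    parser = IndexabilityCheck()
    parser.feed(html)
    issues = []
    if parser.noindex:
        issues.append("noindex")
    if parser.canonical and parser.canonical.rstrip("/") != page_url.rstrip("/"):
        issues.append(f"canonical points elsewhere: {parser.canonical}")
    return issues

print(audit('<meta name="robots" content="noindex">', "https://example.com/page"))
```

An empty list means the page is crawlable and self-canonical; anything else is a page AI engines will skip or misattribute.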

Running This Checklist

I go through these six checks in this order on any new site before touching content or link building. Technical access comes first. There is no point optimizing content that AI crawlers cannot reach in the first place.

For WordPress sites specifically, WPCode is useful for adding llms.txt content via a snippet rather than manually creating a file on the server. That is how I manage it on proaisearch.com.

If you find issues, fix robots.txt and bot protection first since those are the most common blockers. Then llms.txt, then schema, then rendering.

If you want to go deeper on any of these, I cover the full AI search optimization framework at proaisearch.com.


Amit Kumar is the founder of Pro AI Search and Growth Manager at LearnQ.ai.
