Cloudflare's AI Crawler Blockade: A New Web Power Play
Cloudflare's decision to automatically block mixed-use web crawlers on ad-supported pages is a direct answer to a fundamental question: who controls the data that fuels AI? Starting September 15, 2026, new Cloudflare customers and existing subscribers adding new sites will default to blocking crawlers that both index for search and train AI models. Free account users will also be switched unless they opt out. This is not a minor policy tweak—it is a structural shift in the economics of web content.
According to Cloudflare CEO Matthew Prince, 'the majority of traffic on the Internet is non-human.' The company's new default settings target mixed-use crawlers—bots that serve dual purposes, like Google's Googlebot, which indexes for search and trains Gemini. By forcing a separation, Cloudflare aims to create a sustainable ecosystem where publishers can monetize AI use without sacrificing search visibility.
For executives, this matters because it redefines the cost of AI training data. If your business relies on scraping web content for AI models, you now face a paywall on a significant portion of the internet. If you run a website, you gain a default shield against unauthorized AI training—but you must understand the opt-out mechanics to avoid unintended blocking of legitimate search crawlers.
The Mechanics of the Blockade
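Cloudflare's new policy applies to 'pages with ads.' For such pages, the default is to allow search indexing but block AI training and agent use. Mixed-use crawlers that do not offer site owners a choice between search and AI will also be blocked. This effectively forces AI companies to separate their search and AI crawling, to offer site owners a control that does so, or to negotiate paid access. Google-Extended is an example of such a control: it is a robots.txt token, not a separate crawler, that lets a site opt out of Gemini training without affecting search indexing.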
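The default described above can be read as a small decision rule. The sketch below is one interpretation of the policy as this article states it, not Cloudflare's implementation: the Purpose, Crawler, and default_decision names are hypothetical, and the handling of pages without ads (left unaffected) is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Purpose(Enum):
    SEARCH = "search"
    AI_TRAINING = "ai_training"
    AGENT = "agent"


@dataclass(frozen=True)
class Crawler:
    name: str                    # illustrative label, not a real bot registry
    purposes: frozenset          # Purpose values this crawler serves
    offers_choice: bool = False  # can a site allow search but refuse AI use?


def default_decision(page_has_ads: bool, crawler: Crawler) -> str:
    """Return "allow" or "block" under the defaults as described in the article."""
    # Assumption: the new defaults only touch pages that carry ads.
    if not page_has_ads:
        return "allow"

    ai_purposes = crawler.purposes & {Purpose.AI_TRAINING, Purpose.AGENT}
    if not ai_purposes:
        return "allow"  # search-only crawler: indexing stays open

    is_mixed_use = Purpose.SEARCH in crawler.purposes
    if is_mixed_use and crawler.offers_choice:
        # Mixed-use, but the site can decline AI use separately,
        # so the crawler is still admitted for search.
        return "allow"

    # AI-only crawlers, and mixed-use crawlers that offer no choice.
    return "block"


if __name__ == "__main__":
    bots = [
        Crawler("search-only", frozenset({Purpose.SEARCH})),
        Crawler("ai-only", frozenset({Purpose.AI_TRAINING})),
        Crawler("mixed-no-choice", frozenset({Purpose.SEARCH, Purpose.AI_TRAINING})),
        Crawler("mixed-with-choice", frozenset({Purpose.SEARCH, Purpose.AI_TRAINING}), True),
    ]
    for bot in bots:
        print(bot.name, default_decision(True, bot))
```

The one branch that matters commercially is the last one: a mixed-use crawler is blocked only when it gives site owners no way to accept search while refusing AI use.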
The company is also evolving its Pay Per Crawl feature into Pay Per Use. Instead of paying per crawl, site owners get paid when their content appears in AI chatbot answers. Cloudflare has announced partnerships with Ceramic.AI and You.com, but the success of this model depends on attracting major AI players like OpenAI, Anthropic, and Google.
Strategic Winners and Losers
Who Gains
Website publishers with ad-supported content gain default protection. They no longer need to manually block AI crawlers or rely on robots.txt, which is often ignored. This reduces content theft and creates a potential revenue stream via Pay Per Use.
Cloudflare strengthens its moat. By acting as the arbiter of AI crawler access, it increases switching costs for customers and opens a new revenue line. It also positions itself as the guardian of publisher rights, which could attract new customers fleeing less protective CDNs.
Ceramic.AI and You.com gain exclusive early access to Pay Per Use content, giving them a competitive advantage in delivering AI answers with licensed data.
Who Loses
AI companies relying on mixed-use crawlers—including OpenAI, Anthropic, and others—face reduced access to fresh web content. They must either develop separate search crawlers or pay for data, increasing costs.
Google is indirectly targeted. Googlebot is a mixed-use crawler, but it is not blocked by default, because Cloudflare's policy only blocks mixed-use crawlers that offer site owners no choice, and Google offers Google-Extended, a robots.txt token that lets sites opt out of Gemini training without affecting search indexing. Even so, the precedent pressures Google to separate its crawlers more cleanly or risk future restrictions. Google's dual-use crawler gives it a data advantage; Cloudflare's move erodes that.
Small AI startups without resources to negotiate separate agreements or build compliant crawlers will struggle to access the same breadth of web data as larger competitors.
Market Impact: The Two-Tier Web
Cloudflare's policy accelerates the emergence of a two-tier internet: free access for search, paid access for AI training. This could become an industry standard, with other CDNs and hosting providers following suit. The implications are profound:
- Data costs rise for AI companies. Training data from the open web becomes more expensive, potentially slowing model improvements and favoring incumbents with deep pockets.
- Publisher revenue diversifies. Websites can monetize AI use directly, reducing reliance on ad revenue, which is itself under pressure from AI-generated content.
- Search engines face pressure. Google's ability to index the web for AI features like AI Overviews may be constrained if publishers opt out of mixed-use crawling. This could degrade the quality of Google's AI products.
Outlook and Next Steps
Over the next 30 days, watch for:
- Google's response. Will Google announce a clearer separation of its crawlers or challenge Cloudflare's policy? A legal or technical countermove is likely.
- Adoption by other CDNs. Akamai, Fastly, and others may introduce similar defaults, amplifying the effect.
- Pay Per Use expansion. If Cloudflare signs deals with OpenAI or Anthropic, the model gains legitimacy. If not, it remains niche.
- Opt-out rates. If many free users opt out, the policy's impact diminishes. Cloudflare's communication will be key.
Executives should audit their Cloudflare settings before September 15 to ensure desired crawler access. AI companies should accelerate development of separate search crawlers and prepare for a world where web data is no longer free.
Final Take
Cloudflare is betting that the web's future depends on clear rules of engagement between content creators and AI consumers. By defaulting to block mixed-use crawlers, it forces a conversation that many have avoided. The move is bold, but its success hinges on execution—partnerships, user education, and the ability to withstand pushback from powerful AI players. For now, the balance of power has shifted slightly back to publishers.
Intelligence FAQ
Google's main crawler, Googlebot, is a mixed-use crawler, but Cloudflare's policy only blocks those that don't offer a choice. Google offers Google-Extended, a robots.txt token that lets sites opt out of Gemini training without affecting search indexing, so Googlebot may still be allowed. However, if Google doesn't clearly separate its search and AI uses, it could be blocked on ad pages by default.
Review your Cloudflare settings. If you want to allow AI training or agent use on ad pages, you must opt out of the new defaults. If you rely on search traffic, ensure Googlebot and other search crawlers are not blocked. Consider enabling Pay Per Use to monetize AI content use.
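One part of that review can be scripted. The sketch below uses Python's standard urllib.robotparser to report which crawler tokens a robots.txt file admits for a given URL. The sample rules, the example.com URL, and the audit_robots helper are illustrative assumptions. The check covers robots.txt only, which is advisory, and says nothing about what Cloudflare's own settings block at the edge.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep search indexing open, opt out of AI training.
# Google-Extended is a control token rather than a crawler; GPTBot is
# OpenAI's training crawler token.
SAMPLE_ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def audit_robots(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Map each user-agent token to whether robots.txt permits it to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    report = audit_robots(
        SAMPLE_ROBOTS_TXT,
        "https://example.com/article",
        ["Googlebot", "Google-Extended", "GPTBot"],
    )
    for agent, allowed in report.items():
        print(f"{agent}: {'allowed' if allowed else 'disallowed'}")
```

With the sample rules, Googlebot falls through to the wildcard entry and is allowed, while the two AI-related tokens are disallowed. To audit a live site, replace the sample text with the contents of the site's own robots.txt.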



