The Structural Shift in Content Distribution
AI companies have established a content extraction economy in which traffic that once reached publishers is diverted to AI platforms. Akamai's analysis found that 63% of AI bot activity targeting media in the second half of 2025 came from training crawlers, while 24% came from fetcher bots that answer user queries directly without driving traffic to publisher sites. This represents a fundamental restructuring of how information flows from creators to consumers.
The publishing sector accounted for 40% of all AI bot activity in media, significantly ahead of broadcast and OTT at 29%. This concentration indicates that text-based content remains the primary fuel for AI systems. OpenAI alone directed 40% of its media requests at publishing companies, which shows how heavily its content acquisition relies on publishers.
The Strategic Implications of Bot Differentiation
Training crawlers and fetcher bots represent two distinct strategic threats to publishers. Training crawlers, which made up 63% of AI bot activity, collect content to build language models, a long-term investment in AI capabilities. Fetcher bots, while only 24% of activity, present an immediate revenue threat: they pull specific pages in real time when users ask AI chatbots questions, effectively bypassing publisher monetization channels.
Publishing accounted for 43% of fetcher bot activity, indicating that text-based content is particularly vulnerable to this immediate displacement. When a fetcher bot pulls an article to answer a chatbot query, the user receives the information without visiting the publisher's site, eliminating advertising revenue, subscription opportunities, and brand engagement.
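The training/fetcher distinction can be approximated in server logs by matching the User-Agent header. The following is a minimal sketch, not a method described in Akamai's report. The names BotKind, BOT_TOKENS, and classify_user_agent are hypothetical, and the token-to-category mapping is an assumption that should be verified against each operator's published crawler documentation. User-Agent strings can also be spoofed, so a match is a hint rather than proof.

```python
from enum import Enum


class BotKind(Enum):
    TRAINING = "training"  # collects content to build models
    FETCHER = "fetcher"    # pulls a page in response to a user query
    UNKNOWN = "unknown"    # no known AI bot token found


# Assumed mapping of lowercase User-Agent tokens to categories.
# Illustrative only; check each operator's own documentation.
BOT_TOKENS = {
    "chatgpt-user": BotKind.FETCHER,
    "perplexity-user": BotKind.FETCHER,
    "gptbot": BotKind.TRAINING,
    "claudebot": BotKind.TRAINING,
    "bytespider": BotKind.TRAINING,
}


def classify_user_agent(user_agent: str) -> BotKind:
    """Return the bot category for a raw User-Agent header value."""
    ua = user_agent.lower()
    for token, kind in BOT_TOKENS.items():
        if token in ua:
            return kind
    return BotKind.UNKNOWN
```

A publisher could run this over access logs to estimate how much of its AI bot traffic falls into each category before deciding on a response.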
The Power Dynamics of AI Bot Operators
OpenAI sends 40% of its media requests to publishing companies, a share that reflects its multi-bot strategy. Meta and ByteDance, the second- and third-largest operators, show how social media and short-form video companies are expanding into text-based content acquisition. Anthropic and Perplexity round out the top five at lower volumes, which suggests a tiered market in which a few dominant players account for the majority of traffic.
The concentration of power among OpenAI, Meta, and ByteDance creates significant bargaining asymmetry in licensing negotiations, as publishers face a consolidated buyer market for their content.
The Publisher Response Matrix
Akamai's report describes three primary publisher responses: deny (blocking requests outright), tarpit (holding connections open to waste bot resources), and delay (adding a pause before responding). One unnamed publisher chose tarpitting over blocking and controlled 97% of AI bot requests while keeping the door open to potential licensing deals.
The report's argument against blanket blocking—that some AI companies are willing to pay for content access—highlights the strategic dilemma publishers face. Complete protection could mean missing out on potential revenue streams, while complete openness risks commoditization of content.
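The three responses named in the report (deny, tarpit, delay) can be sketched as a single asynchronous handler. This is an illustration under stated assumptions, not the unnamed publisher's implementation. The Response type, the respond function, the status codes, and the default durations are all hypothetical. A production tarpit would typically trickle bytes over an open socket, which this sketch simplifies to a long wait.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    status: int
    body: str


async def respond(action: str, page: str,
                  delay_s: float = 2.0, tarpit_s: float = 30.0) -> Response:
    """Apply one of the mitigation actions to a single request."""
    if action == "allow":
        return Response(200, page)
    if action == "deny":
        # Block outright: the bot learns immediately that it is refused.
        return Response(403, "")
    if action == "delay":
        # Pause, then serve the page: raises the bot's cost per page.
        await asyncio.sleep(delay_s)
        return Response(200, page)
    if action == "tarpit":
        # Hold the connection open, then return nothing useful.
        # Simplification: a real tarpit would drip bytes slowly instead.
        await asyncio.sleep(tarpit_s)
        return Response(503, "")
    raise ValueError(f"unknown action: {action}")
```

For example, asyncio.run(respond("deny", "article text")) returns a 403 with an empty body, while the "delay" action still returns the page after the pause. Because the waits are awaited rather than blocking, stalled bots do not tie up the event loop for ordinary visitors.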
The Revenue Model Disruption
The distinction between training crawlers and fetcher bots has direct financial implications. Blocking a training crawler influences how content helps build future AI models—a strategic decision about long-term positioning. Blocking a fetcher bot affects whether content appears in AI responses right now—an immediate revenue protection decision.
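Because the two bot types call for different decisions, a publisher's rules can be written as a small policy table with a licensing override. The sketch below is hypothetical: DEFAULT_POLICY, choose_action, the operator name "ExampleAI", and the specific action assigned to each category are assumptions chosen for illustration, not recommendations from the report.

```python
# Hypothetical default action per bot category (an assumption for illustration).
DEFAULT_POLICY = {
    "training": "tarpit",  # long-term positioning decision
    "fetcher": "deny",     # immediate revenue protection decision
}


def choose_action(bot_kind: str, operator: str,
                  licensed: frozenset = frozenset()) -> str:
    """Pick an action for a request; licensed operators are always allowed."""
    if operator in licensed:
        return "allow"
    # Traffic with no recognized bot category is treated as an ordinary visitor.
    return DEFAULT_POLICY.get(bot_kind, "allow")


# Usage: an unlicensed fetcher is denied, a licensed one is served.
print(choose_action("fetcher", "ExampleAI"))
print(choose_action("fetcher", "ExampleAI", frozenset({"ExampleAI"})))
```

Keeping the licensed set separate from the default table means a new licensing deal changes one entry instead of the whole policy.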
The commerce sector drew 48% of AI bot traffic, compared with 13% for media. This suggests that transactional content may be more valuable to AI systems, potentially creating different valuation models for different content types.
The Global Scale of Content Extraction
The diverse currency values—$10.5 billion, £50 million, ¥1.2 trillion, €1.1 billion, ₹1.1 trillion—indicate that this is a global phenomenon with significant financial stakes across major economies. The rapid monthly progression from January through December 2025 shows how quickly this market is evolving.
ByteDance's position as third-largest operator demonstrates how Chinese tech companies are actively participating in this content extraction economy, creating additional complexity for global publishers who must navigate technological challenges and geopolitical considerations.
The Future of Content Valuation
As AI bots become more sophisticated in content extraction, the fundamental question becomes: what is the value of content when it can be efficiently extracted, processed, and redistributed by AI systems? The 97% control rate achieved by one publisher through tarpitting suggests that technical solutions exist, but they require significant investment and expertise.
The extremely low percentages (0.2%, 0.01%) for some AI bot traffic sources indicate that not all extraction attempts are equally effective or valuable. This creates opportunities for publishers to differentiate between high-value and low-value extraction attempts.
Source: Search Engine Journal
Intelligence FAQ
Fetcher bots that pull specific pages in real-time when users ask AI questions—bypassing publisher sites entirely and eliminating advertising, subscription, and engagement revenue.
They collect content to build AI language models that could eventually generate similar content independently, potentially making publishers redundant in the content creation chain.
OpenAI leads with 40% of media requests to publishing, followed by Meta and ByteDance—three companies control the majority of extraction traffic.
Tarpitting (holding connections open) controlled 97% of AI bot requests for one publisher while keeping licensing options open—more strategic than outright blocking.
Diverse currency values indicate global scale extraction, with Chinese company ByteDance as third-largest operator despite different regulatory environments.

