    Google’s AI Advantage: Why Separating Crawlers is Essential for a Fair Internet

    By Samuel Alejandro, February 8, 2026

    The UK’s Competition and Markets Authority (CMA) recently opened a consultation on proposed conduct requirements for Google. The consultation seeks feedback on measures designed to give publishers more choice and transparency over how Google uses search data to power its generative AI services. These are the first consultations on conduct requirements under the UK’s digital markets competition framework.

    The CMA’s recognition that publishers need a more equitable arrangement is a positive development, and the proposed rules represent progress. Publishers should have tools to manage whether their content is included in generative AI services, and AI companies should operate on a level playing field. However, these proposals may not fully protect the UK’s creative sector or adequately promote competition in the generative and agentic AI markets.

    CMA Designates Google with Strategic Market Status

    January 2025 marked a significant legal change in the UK’s regulatory environment with the introduction of the Digital Markets, Competition and Consumers Act 2024 (DMCC). This act enables the CMA to designate companies with Strategic Market Status (SMS) if they possess substantial, entrenched market power, moving beyond traditional antitrust investigations. This designation facilitates targeted CMA interventions, including specific conduct requirements, to enhance competition in digital markets.

    In October 2025, Google was designated by the CMA as having SMS in general search and search advertising, reflecting its over 90 percent market share in the UK. This designation notably includes AI Overviews and AI Mode, granting the CMA authority to enforce conduct requirements within Google’s search ecosystem. These final requirements are legally binding and can specifically address AI crawling, with substantial penalties to ensure Google’s fair operation.

    Publishers Require Effective Opt-Out Mechanisms for Generative AI Content Use

    The CMA’s designation is particularly relevant given the current need for clear guidelines on AI crawling behavior across the Internet.

    As the CMA accurately observes, publishers are compelled to permit Googlebot to crawl their content for general search due to Google’s dominant market position. However, Google currently utilizes this same content for both its search-integrated generative AI features and its broader generative AI services.

    This means content scraped for search indexing is also employed for inference and grounding in applications like AI Overviews and AI Mode, which retrieve real-time information for user queries. This dual use presents a significant challenge for publishers and competition.

    Since publishers cannot realistically block Googlebot without jeopardizing traffic, they must accept that their content will be used in generative AI applications within Google Search. These applications, such as AI Overviews and AI Mode, often generate minimal, if any, referral traffic to their websites. This practice undermines the advertising-supported business models that have sustained digital publishing for decades, as Google Search is crucial for directing human traffic to online ads. Furthermore, Google’s generative AI applications directly compete with publishers by reproducing their content, frequently without proper attribution or compensation.

    Publishers’ inability to block Google due to its search dominance grants Google an unfair competitive edge in the generative and agentic AI markets. Unlike other AI bot operators, Google can leverage its search crawler to collect data for various AI functions with little risk of access restrictions. This minimizes its incentive to compensate publishers for data it obtains freely.

    Such a scenario hinders the development of a fair marketplace where AI developers can negotiate content value. Other AI companies are discouraged from participating, as they face a structural disadvantage where a dominant player can bypass compensation entirely. The CMA acknowledges this, stating that by not offering adequate control over content usage, Google can restrict publishers’ ability to monetize their content while accessing it for AI-generated results in a manner unmatched by competitors.

    Google’s Market Advantage

    Data indicates Google’s significant competitive advantage. Googlebot accesses substantially more Internet content than its closest counterparts.

    Over a two-month observation period, Googlebot successfully accessed individual pages nearly twice as often as ClaudeBot and GPTBot, three times more often than Meta-ExternalAgent, and over three times more often than Bingbot. The disparity was even greater for other prominent AI crawlers; for example, Googlebot observed 167 times more unique pages than PerplexityBot. Approximately 8% of unique URLs sampled across the network were crawled by Googlebot during this period. Relative to other crawlers, Googlebot saw:

    • ~1.70x the unique URLs seen by ClaudeBot;
    • ~1.76x the unique URLs seen by GPTBot;
    • ~2.99x the unique URLs seen by Meta-ExternalAgent;
    • ~3.26x the unique URLs seen by Bingbot;
    • ~5.09x the unique URLs seen by Amazonbot;
    • ~14.87x the unique URLs seen by Applebot;
    • ~23.73x the unique URLs seen by Bytespider;
    • ~166.98x the unique URLs seen by PerplexityBot;
    • ~714.48x the unique URLs seen by CCBot; and
    • ~1801.97x the unique URLs seen by archive.org_bot.

    Googlebot’s prominence is also evident in other datasets.

    Although Googlebot is the most active bot by overall traffic, publishers are considerably less likely to disallow or block it in their robots.txt files than other crawlers. This is likely due to its critical role in directing human traffic, and consequently advertising revenue, to their content through search.

    Few websites explicitly disallow the dual-purpose Googlebot entirely, underscoring its importance for search referrals. Partial disallows typically affect website sections irrelevant for search engine optimization (SEO), such as login pages.

    Robots.txt files merely express crawling preferences and are not an enforcement mechanism; publishers rely on “good bots” to comply. For more effective, independent management of crawler access, publishers can implement a Web Application Firewall (WAF) with rules that technically prevent unwanted crawlers from fetching content. Consistent with the robots.txt pattern, websites that deploy such rules tend to block most other AI crawlers but not Googlebot.
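
    As a minimal sketch of that preference mechanism, the following Python snippet parses a hypothetical robots.txt that follows the pattern described above: AI-only crawlers disallowed entirely, the dual-purpose Googlebot excluded only from SEO-irrelevant sections. The file contents and URLs are illustrative assumptions, not any real site’s policy.

```python
from urllib import robotparser

# Hypothetical robots.txt following the pattern described above:
# AI-only crawlers are disallowed entirely, while the dual-purpose
# Googlebot is excluded only from SEO-irrelevant sections like /login/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Disallow: /login/
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("GPTBot", "ClaudeBot", "Googlebot"):
    ok = parser.can_fetch(agent, "https://example.com/articles/story")
    print(f"{agent:10} may fetch articles: {ok}")

# Crucially, this is only a request: a crawler that ignores robots.txt
# can still fetch every page, which is why WAF-level blocking exists.
print("Googlebot may fetch /login/:",
      parser.can_fetch("Googlebot", "https://example.com/login/"))
```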

    Indeed, data from customers using AI Crawl Control, an AI-crawler blocking tool integrated into an Application Security suite, shows that between July 2025 and January 2026, nearly seven times as many websites actively blocked other popular AI crawlers (e.g., GPTBot, ClaudeBot) as blocked Googlebot and Bingbot. Bingbot, like Googlebot, combines search and AI crawling and drives traffic, but its smaller search market share makes its impact less significant.

    The CMA’s problem statement is sound. The open question is how publishers can effectively opt out of Google using their content for generative AI applications. The CMA concludes that publishers require the ability to effectively opt their search content out of both Google’s search generative AI features and its broader generative AI services in order to make informed decisions. However, the CMA’s current proposal may be insufficient.

    CMA’s Proposed Publisher Conduct Requirements

    On January 28, 2026, the CMA released four sets of proposed conduct requirements for Google, which included specific rules for publishers. The CMA states these proposed rules aim to address publisher concerns regarding (1) insufficient choice over Google’s use of their content in AI-generated responses, (2) limited transparency into this content usage, and (3) inadequate attribution for their content. The CMA acknowledged the significance of these issues given Google Search’s role in online content discovery.

    These conduct requirements would compel Google to provide publishers with “meaningful and effective” control over whether their content is used for AI features such as AI Overviews. Google would also be forbidden from taking actions that undermine these controls, such as intentionally downranking content in search results.

    To facilitate informed decisions, the CMA’s proposal also mandates Google to enhance transparency. This involves publishing clear documentation on how it uses crawled content for generative AI and precisely what its various publisher controls encompass. Additionally, the proposal requires Google to ensure effective attribution for publisher content and to supply publishers with detailed, disaggregated engagement data—including metrics for impressions, clicks, and “click quality”—to assist them in assessing the commercial value of allowing their content to be used in AI-generated search summaries.
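
    To illustrate how such disaggregated engagement data could inform publisher decisions, here is a minimal sketch that compares click-through rates across search surfaces. The surface names, field names, and figures are entirely hypothetical assumptions; the CMA has not specified a reporting format.

```python
# Hypothetical engagement report: all surfaces, metrics, and numbers
# are illustrative assumptions, not a format the CMA has specified.
engagement = {
    "classic_results": {"impressions": 120_000, "clicks": 4_800},
    "ai_overviews":    {"impressions":  95_000, "clicks":    600},
    "ai_mode":         {"impressions":  30_000, "clicks":    150},
}

for surface, m in engagement.items():
    ctr = m["clicks"] / m["impressions"]  # click-through rate
    print(f"{surface:16} CTR: {ctr:.2%}")

# Comparing CTRs across surfaces would let a publisher estimate the
# referral value of opting content in or out of each AI feature.
```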

    CMA’s Proposed Remedies Deemed Insufficient

    While the CMA’s efforts to enhance publisher options are commendable, concerns remain that the proposed requirements do not fully address the core issue of fostering fair, transparent choice regarding Google’s content usage. Publishers are effectively compelled to use Google’s proprietary opt-out mechanisms, which are platform-specific and subject to Google’s terms, rather than being granted direct, autonomous control. A system where the platform dictates rules, manages technical controls, and defines scope does not provide “effective control” to content creators or encourage competitive market innovation; instead, it perpetuates dependency.

    Such a framework also limits publisher choice. Even with the new opt-out controls, publishers still could not use external tools to block Googlebot without risking their visibility in search results. Under the current proposal, content creators would still need to permit Googlebot to scrape their websites, with no independent enforcement mechanism and limited transparency if Google disregards their preferences. Enforcing these requirements would also be burdensome for the CMA, without any guarantee that publishers would trust the solution.

    Feedback from Cloudflare customers indicates that Google’s existing proprietary opt-out mechanisms, including Google-Extended and ‘nosnippet’, have not effectively prevented content from being used in ways publishers cannot control. Furthermore, these tools do not facilitate fair compensation for publishers.

    More broadly, consistent with responsible AI bot principles, all AI bots should have a declared, distinct purpose, enabling website owners to make informed decisions about who accesses their content and why. Googlebot, unlike crawlers from leading competitors like OpenAI and Anthropic, does not adhere to this principle, as it serves multiple purposes (search indexing, AI training, and inference/grounding). Simply mandating a new opt-out mechanism from Google would not grant publishers meaningful control over their content’s usage.
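
    For comparison, the sketch below summarizes, in rough paraphrase from public crawler documentation, how some operators declare a purpose per user agent. The purpose strings are this article’s shorthand, not official vendor labels, and the list is not exhaustive.

```python
# Rough paraphrase of publicly documented crawler purposes; the purpose
# strings are shorthand for illustration, not official vendor labels.
DECLARED_PURPOSES = {
    "GPTBot":        "AI training (OpenAI)",
    "OAI-SearchBot": "search indexing (OpenAI)",
    "ChatGPT-User":  "user-initiated fetches (OpenAI)",
    "ClaudeBot":     "AI training (Anthropic)",
    "Claude-User":   "user-initiated fetches (Anthropic)",
    "Googlebot":     "search indexing + AI training + inference/grounding",
}

for agent, purpose in DECLARED_PURPOSES.items():
    # Flag agents whose single token bundles several purposes together,
    # which prevents publishers from blocking one use but not another.
    flag = "  <-- multi-purpose" if "+" in purpose else ""
    print(f"{agent:14} {purpose}{flag}")
```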

    The most effective method to provide publishers with this essential control is to mandate the separation of Googlebot into distinct crawlers. This would allow publishers to permit crawling for traditional search indexing, which is vital for attracting site traffic, while simultaneously blocking access for undesired use of their content in generative AI services and features.

    Crawler Separation: The Only Effective Solution

    To establish a fair digital ecosystem, the CMA should empower content owners to prevent Google from accessing their data for specific purposes initially, rather than relying on Google-managed workarounds after content has already been accessed. This approach would also enable creators to define conditions for content access.

    Although the CMA considered crawler separation an “equally effective intervention,” it ultimately rejected mandating it, citing Google’s assertion that it would be too burdensome. This perspective is debatable.

    Requiring Google to separate Googlebot by purpose—a practice Google already follows for its nearly 20 other crawlers—is not only technically feasible but also a necessary and proportionate solution. It would provide website operators with the granular control they currently lack, potentially even reducing traffic load from crawlers if they choose to block AI crawling.
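
    As a concrete sketch of what separation would enable, the snippet below assumes hypothetical “Googlebot-Search” and “Googlebot-AI” user-agent tokens, which do not exist today, and shows a publisher permitting search indexing while refusing AI access.

```python
from urllib import robotparser

# Hypothetical robots.txt under a crawler-separation remedy. The tokens
# "Googlebot-Search" and "Googlebot-AI" do NOT exist today; they are
# assumptions used to illustrate what separation would enable.
ROBOTS_TXT = """\
User-agent: Googlebot-Search
Allow: /

User-agent: Googlebot-AI
Disallow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/article"
print(parser.can_fetch("Googlebot-Search", url))  # True: search indexing allowed
print(parser.can_fetch("Googlebot-AI", url))      # False: generative AI use refused
```

    Under this arrangement, the publisher keeps the search referrals it depends on while withholding consent for generative AI use, which is exactly the granular control the current dual-purpose Googlebot makes impossible.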

    A crawler separation remedy would benefit AI companies by leveling the competitive landscape with Google, in addition to granting UK-based publishers greater control over their content. This approach has garnered widespread public support from entities like Daily Mail Group, The Guardian, and the News Media Association. Mandatory crawler separation does not disadvantage Google or hinder AI investment. Instead, it acts as a pro-competitive safeguard, preventing Google from using its search monopoly to gain an unfair advantage in the AI market. Decoupling these functions ensures that AI development is driven by fair-market competition, not by the exploitation of a single dominant platform.

    The UK has a unique opportunity to lead globally in safeguarding the value of original, high-quality Internet content. However, current proposals may not go far enough. Rules are needed to ensure Google operates under the same content access conditions as other AI developers, thereby genuinely restoring agency to publishers and fostering new business models for content monetization.

    Engagement with the CMA and other stakeholders during upcoming consultations is crucial to provide data-driven insights that can shape a final decision on conduct requirements that are targeted, proportional, and effective. The CMA still has the chance to ensure the Internet evolves into a fair marketplace for content creators and smaller AI players, rather than remaining dominated by a select few tech giants.
