
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built and the technologies we use, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/rss/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Wed, 15 Apr 2026 19:39:50 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Google’s AI advantage: why crawler separation is the only path to a fair Internet]]></title>
            <link>https://blog.cloudflare.com/uk-google-ai-crawler-policy/</link>
            <pubDate>Fri, 30 Jan 2026 17:01:04 GMT</pubDate>
            <description><![CDATA[ Google's dual-purpose crawler creates an unfair AI advantage. To protect publishers and foster competition, the UK’s Competition and Markets Authority must mandate crawler separation for search and AI. ]]></description>
            <content:encoded><![CDATA[ <p>Earlier this week, the UK’s Competition and Markets Authority (CMA) <a href="https://www.gov.uk/government/news/cma-proposes-package-of-measures-to-improve-google-search-services-in-uk"><u>opened its consultation</u></a> on a package of proposed conduct requirements for Google. The consultation invites comments on the proposed requirements before the CMA imposes any final measures. These new rules aim to address the lack of choice and transparency that publishers (broadly defined as “any party that makes content available on the web”) face over how Google uses search to fuel its generative AI services and features. These are the first consultations on conduct requirements launched under the digital markets competition regime in the UK. </p><p>We welcome the CMA’s recognition that publishers need a fairer deal and believe the proposed rules are a step in the right direction. Publishers should have access to tools that enable them to control the inclusion of their content in generative AI services, and AI companies should have a level playing field on which to compete. </p><p>But we believe the CMA has not gone far enough and should do more to safeguard the UK’s creative sector and foster healthy competition in the market for generative and agentic AI. </p>
    <div>
      <h2>CMA designation of Google as having Strategic Market Status </h2>
      <a href="#cma-designation-of-google-as-having-strategic-market-status">
        
      </a>
    </div>
    <p>In January 2025, the UK’s regulatory landscape underwent a significant legal shift with the implementation of the Digital Markets, Competition and Consumers Act 2024 (DMCC). Rather than relying on antitrust investigations to address risks to competition, the CMA can now designate firms as having Strategic Market Status (SMS) when they hold substantial, entrenched market power. This designation allows for targeted CMA interventions in digital markets, such as imposing detailed conduct requirements, to improve competition. </p><p>In October 2025, the CMA <a href="https://assets.publishing.service.gov.uk/media/68e8b643cf65bd04bad76724/Final_decision_-_strategic_market_status_investigation_into_Google_s_general_search_services.pdf"><u>designated Google</u></a> as having SMS in general search and search advertising, given its 90 percent share of the search market in the UK. Crucially, this designation encompasses AI Overviews and AI Mode, with the CMA now having the authority to impose conduct requirements on Google’s search ecosystem. Final requirements imposed by the CMA are not merely suggestions but legally enforceable rules, backed by significant sanctions, that can relate specifically to AI crawling and ensure Google operates fairly. </p>
    <div>
      <h2>Publishers need a meaningful way to opt out of Google’s use of their content for generative AI</h2>
      <a href="#publishers-need-a-meaningful-way-to-opt-out-of-googles-use-of-their-content-for-generative-ai">
        
      </a>
    </div>
    <p>The CMA’s designation could not be more timely. As we’ve <a href="https://blog.cloudflare.com/building-a-better-internet-with-responsible-ai-bot-principles/"><u>said before</u></a>, we are indisputably in a time when the Internet needs clear “rules of the road” for AI crawling behavior. </p><p>As the CMA rightly <a href="https://assets.publishing.service.gov.uk/media/6979d0bf75d44370965520a0/Publisher_conduct_requirement.pdf"><u>states</u></a>, “publishers have no realistic option but to allow their content to be crawled for Google’s general search because of the market power Google holds in general search. However, Google currently uses that content in both its search generative AI features and in its broader generative AI services.” </p><p>In other words: the same content that Google scrapes for search indexing is also used for inference/grounding purposes, like AI Overviews and AI Mode, which rely on fetching live information from the Internet in response to real-time user queries. And that creates a big problem for publishers—and for competition.</p><p>Because publishers cannot afford to disallow or block Googlebot, Google’s search crawler, on their website, they have to accept that their content will be used in generative AI applications such as AI Overviews and AI Mode within Google Search that <a href="https://blog.cloudflare.com/crawlers-click-ai-bots-training/"><u>return very little, if any, traffic to their websites</u></a>. This undermines the ad-supported business models that have sustained digital publishing for decades, given the critical role of Google Search in driving human traffic to online advertising. It also means that Google’s generative AI applications enter into direct competition with publishers by reproducing their content, most often without attribution or compensation. </p><p>Publishers’ reluctance to block Google because of its dominance in search gives Google an unfair competitive advantage in the market for generative and agentic AI. Unlike other AI bot operators, Google can use its search crawler to gather data for a variety of AI functions with little fear that its access will be restricted. It has minimal incentive to pay publishers for that data, which it is already getting for free. </p><p>This prevents the emergence of a well-functioning marketplace where AI developers negotiate fair value for content. Instead, other AI companies are disincentivized from coming to the table, as they are structurally disadvantaged by a system that allows one dominant player to bypass compensation entirely. As the CMA itself <a href="https://assets.publishing.service.gov.uk/media/6979d05275d443709655209f/Introduction_to_the_consultation.pdf"><u>recognizes</u></a>, "[b]y not providing sufficient control over how this content is used, Google can limit the ability of publishers to monetise their content, while accessing content for AI-generated results in a way that its competitors cannot match”. </p>
    <div>
      <h2>Google’s advantage</h2>
      <a href="#googles-advantage">
        
      </a>
    </div>
    <p>Cloudflare data validates the concern about Google’s competitive advantage. Based on our data, Googlebot sees significantly more Internet content than its closest peers. </p><p>Over an observed two-month period, Googlebot successfully accessed almost twice as many individual pages as ClaudeBot and GPTBot, three times as many as Meta-ExternalAgent, and more than three times as many as Bingbot. The difference was even more extreme for other popular AI crawlers: for instance, Googlebot saw 167 times more unique pages than PerplexityBot. Of the unique URLs we sampled across our network over those two months, Googlebot crawled roughly 8%.</p><p><b>In rounded multiple terms, Googlebot sees:</b></p><ul><li><p>~1.70x the number of unique URLs seen by ClaudeBot;</p></li><li><p>~1.76x the number of unique URLs seen by GPTBot;</p></li><li><p>~2.99x the number of unique URLs seen by Meta-ExternalAgent;</p></li><li><p>~3.26x the number of unique URLs seen by Bingbot;</p></li><li><p>~5.09x the number of unique URLs seen by Amazonbot;</p></li><li><p>~14.87x the number of unique URLs seen by Applebot;</p></li><li><p>~23.73x the number of unique URLs seen by Bytespider;</p></li><li><p>~166.98x the number of unique URLs seen by PerplexityBot;</p></li><li><p>~714.48x the number of unique URLs seen by CCBot; and</p></li><li><p>~1801.97x the number of unique URLs seen by archive.org_bot.</p></li></ul><p>Googlebot also stands out in other Cloudflare datasets. </p><p>Even though it ranks as the most active bot by overall traffic, publishers are far less likely to disallow or block Googlebot in their <a href="https://www.cloudflare.com/learning/bots/what-is-robots-txt/"><u>robots.txt file</u></a> compared to other crawlers. This is likely due to its importance in driving human traffic to their content—and, as a result, ad revenue—through search. </p><p>As shown below, almost no website explicitly disallows the dual-purpose Googlebot in full, reflecting how important this bot is to driving traffic via search referrals. (Note that partial disallows often impact certain parts of a website that are irrelevant for search engine optimization, or SEO, such as login endpoints.)</p>
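    <p>To make this pattern concrete, here is a purely illustrative (hypothetical) robots.txt: the dedicated AI crawlers are disallowed in full, while Googlebot is only partially restricted on an SEO-irrelevant path:</p>
    <pre><code>User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Googlebot
Disallow: /login/
</code></pre>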
<p>Robots.txt merely allows the expression of crawling preferences; it is not an enforcement mechanism. Publishers rely on “good bots” to comply. To manage crawler access to their sites more effectively—and independently of a given bot’s compliance—publishers can set up a Web Application Firewall (WAF) with specific rules, technically preventing undesired crawlers from accessing their sites. Following the same logic as with robots.txt above, we would expect websites to mostly block other AI crawlers but not Googlebot. </p><p>Indeed, when comparing the numbers for customers using <a href="https://www.cloudflare.com/lp/pg-ai-crawl-control/"><u>AI Crawl Control</u></a>, Cloudflare’s own <a href="https://developers.cloudflare.com/ai-crawl-control/configuration/ai-crawl-control-with-waf/"><u>AI crawler blocking tool</u></a> that is integrated into our Application Security suite, between July 2025 and January 2026, one can see that the number of websites actively blocking other popular AI crawlers (e.g., GPTBot, ClaudeBot) was nearly seven times as high as the number of websites that blocked Googlebot and Bingbot. (Like Googlebot, Bingbot combines search and AI crawling and drives traffic to these sites, but given its small market share in search, its impact is less significant.)</p>
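          <p>For illustration only (a hypothetical custom rule, not the actual implementation behind AI Crawl Control), a WAF rule that blocks dedicated AI crawlers by user agent while leaving Googlebot untouched could pair a Block action with an expression like:</p>
          <pre><code>(http.user_agent contains "GPTBot") or (http.user_agent contains "ClaudeBot")
</code></pre>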
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/344ATKpYmJHsSRlEtxQen5/2fc5da1211b4fd0189e026f0ec19548f/BLOG-3170_3.png" />
          </figure><p>So we agree with the CMA on the problem statement. But how can publishers be enabled to effectively opt out of Google using their content for its generative AI applications? We share the CMA’s conclusion that “in order to be able to make meaningful decisions about how Google uses their Search Content, (...) publishers need the ability effectively to opt their Search Content out of both Google’s search generative AI features and Google’s broader generative AI services.” </p><p>But we’re concerned that the CMA’s proposal is insufficient.</p>
    <div>
      <h2>CMA’s proposed publisher conduct requirements</h2>
      <a href="#cmas-proposed-publisher-conduct-requirements">
        
      </a>
    </div>
    <p>On January 28, 2026, the CMA published four sets of proposed conduct requirements for Google, including <a href="https://assets.publishing.service.gov.uk/media/6979ceae75d443709655209c/Publisher_conduct_requirement.pdf"><u>conduct requirements related to publishers</u></a>. According to the CMA, the proposed publisher rules are designed to address concerns that publishers (1) lack sufficient choice over how Google uses their content in its AI-generated responses, (2) have limited transparency into Google’s use of that content, and (3) do not get effective attribution for Google’s use of their content. The CMA recognized the importance of these concerns because of the role that Google search plays in finding content online. </p><p>The conduct requirements would mandate that Google grant publishers <a href="https://assets.publishing.service.gov.uk/media/6979d05275d443709655209f/Introduction_to_the_consultation.pdf"><u>"meaningful and effective"</u></a> control over whether their content is used for AI features, like AI Overviews. Google would be prohibited from taking any action that negatively impacts the effectiveness of those control options, such as intentionally downranking the content in search. </p><p>To support informed decision-making, the CMA proposal also requires Google to increase transparency by publishing clear documentation on how it uses crawled content for generative AI and on exactly what its various publisher controls cover in practice. Finally, the proposal would require Google to ensure effective attribution of publisher content and to provide publishers with detailed, disaggregated engagement data—including specific metrics for impressions, clicks, and "click quality"—to help them evaluate the commercial value of allowing their content to be used in AI-generated search summaries.</p>
    <div>
      <h2>The CMA’s proposed remedies are insufficient</h2>
      <a href="#the-cmas-proposed-remedies-are-insufficient">
        
      </a>
    </div>
    <p>Although we support the CMA’s efforts to improve options for publishers, we are concerned that the proposed requirements do not solve the underlying issue of promoting fair, transparent choice over how publishers’ content is used by Google. Publishers are effectively forced to use Google’s proprietary opt-out mechanisms, tied specifically to the Google platform and under the conditions set by Google, rather than granting them direct, autonomous control. <b>A framework where the platform dictates the rules, manages the technical controls, and defines the scope of application does not offer “effective control” to content creators or encourage competitive innovation in the market. Instead, it reinforces a state of permanent dependency.</b> </p><p>Such a framework also reduces choice for publishers. Because the proposal relies on new Google-managed opt-out controls, publishers cannot use external tools to block Googlebot from accessing their content without jeopardizing their appearance in Search results. Instead, under the current proposal, content creators will still have to allow Googlebot to scrape their websites, with no enforcement mechanisms to deploy and limited visibility if Google does not respect their signalled preferences. Enforcement of these requirements by the CMA, if done properly, will be very onerous, with no guarantee that publishers will trust the solution.</p><p>In fact, Cloudflare has received feedback from its customers that Google’s current proprietary opt-out mechanisms, including Google-Extended and ‘nosnippet’, have failed to prevent content from being utilized in ways that publishers cannot control. These opt-out tools also do not enable mechanisms for fair compensation for publishers. </p><p>More broadly, as reflected in our proposed <a href="https://blog.cloudflare.com/building-a-better-internet-with-responsible-ai-bot-principles/"><u>responsible AI bot principles</u></a>, we believe that all AI bots should have one distinct purpose and declare it, so that website owners can make clear decisions over who can access their content and why. Unlike its leading competitors, such as OpenAI and Anthropic, Google does not comply with this principle for Googlebot, which is used for multiple purposes (search indexing, AI training, and inference/grounding). Simply requiring Google to develop a new opt-out mechanism would not allow publishers to achieve meaningful control over the use of their content. </p><p>The most effective way to give publishers that necessary control is to require Googlebot to be split up into separate crawlers. That way, publishers could allow crawling for traditional search indexing, which they need to attract traffic to their sites, but block access for unwanted use of their content in generative AI services and features. </p>
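    <p>To illustrate what that separation would enable (the second user agent token below is hypothetical, purely for illustration), a publisher could then express in robots.txt the split preference that is impossible today:</p>
    <pre><code># Allow traditional search indexing
User-agent: Googlebot
Allow: /

# Hypothetical dedicated AI crawler under a separation remedy
User-agent: Googlebot-AI
Disallow: /
</code></pre>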
    <div>
      <h2>Requiring crawler separation is the only effective solution </h2>
      <a href="#requiring-crawler-separation-is-the-only-effective-solution">
        
      </a>
    </div>
    <p>To ensure a fair digital ecosystem, the CMA must instead empower content owners to prevent Google from accessing their data for particular purposes in the first place, rather than relying on Google-managed workarounds after the crawler has already accessed the content for other purposes. That approach also enables creators to set conditions for access to their content. </p><p>Although the CMA described crawler separation as an “equally effective intervention”, it ultimately rejected mandating separation based on Google’s input that it would be too onerous. We disagree.</p><p>Requiring Google to split up Googlebot by purpose — just like Google already does for its <a href="https://developers.google.com/crawling/docs/crawlers-fetchers/overview-google-crawlers"><u>nearly 20 other crawlers</u></a> — is not only technically feasible, but also a necessary and proportionate remedy that empowers website operators to have the granular control they currently lack, without increasing traffic load from crawlers to their websites (and in fact, perhaps even decreasing it, should they choose to block AI crawling).</p><p>To be clear, a crawler separation remedy benefits AI companies, by leveling the playing field between them and Google, in addition to giving UK-based publishers more control over their content. (There has been widespread public support for a crawler separation remedy from Daily Mail Group, the Guardian, and the News Media Association.) Mandatory crawler separation is not a disadvantage to Google, nor does it undermine investment in AI. On the contrary, it is a pro-competitive safeguard that prevents Google from leveraging its search monopoly to gain an unfair advantage in the AI market. By decoupling these functions, we ensure that AI development is driven by fair-market competition rather than the exploitation of a single hyperscaler’s dominance.</p><p>******</p><p>The UK has a unique chance to lead the world in protecting the value of original and high-quality content on the Internet. However, we worry that the current proposals fall short. We would encourage rules that ensure that Google operates under the same conditions for content access as other AI developers, meaningfully restoring agency to publishers and paving the way for new business models promoting content monetization.</p><p>Cloudflare remains committed to engaging with the CMA and other partners during upcoming consultations to provide evidence-based data to help shape a final decision on conduct requirements that are targeted, proportional, and effective. The CMA still has an opportunity to ensure that the Internet becomes a fair marketplace for content creators and smaller AI players—not just a select few tech giants.</p> ]]></content:encoded>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[AI Bots]]></category>
            <category><![CDATA[Google]]></category>
            <category><![CDATA[Legal]]></category>
            <category><![CDATA[Policy & Legal]]></category>
            <guid isPermaLink="false">1csdasmGFE5gWnYFDBbN9j</guid>
            <dc:creator>Maria Palmieri</dc:creator>
            <dc:creator>Sebastian Hufnagel</dc:creator>
        </item>
        <item>
            <title><![CDATA[To build a better Internet in the age of AI, we need responsible AI bot principles. Here’s our proposal.]]></title>
            <link>https://blog.cloudflare.com/building-a-better-internet-with-responsible-ai-bot-principles/</link>
            <pubDate>Wed, 24 Sep 2025 13:00:00 GMT</pubDate>
            <description><![CDATA[ We are proposing—as starting points—responsible AI bot principles that emphasize transparency, accountability, and respect for content access and use preferences. ]]></description>
            <content:encoded><![CDATA[ <p>Cloudflare has a unique vantage point: we see not only how changes in technology shape the Internet, but also how new technologies can unintentionally impact different stakeholders. Take, for instance, the increasing reliance by everyday Internet users on AI-powered <a href="https://www.cloudflare.com/learning/bots/what-is-a-chatbot/"><u>chatbots</u></a> and <a href="https://www.pewresearch.org/short-reads/2025/07/22/google-users-are-less-likely-to-click-on-links-when-an-ai-summary-appears-in-the-results/sr_25-07-22_ai_summaries_1/"><u>search summaries</u></a>. On the one hand, end users are getting information faster than ever before. On the other hand, web publishers, who have historically relied on drawing human eyeballs to their websites to support their businesses, are seeing a <a href="https://www.forbes.com/sites/torconstantino/2025/04/14/the-60-problem---how-ai-search-is-draining-your-traffic/"><u>dramatic</u></a> <a href="https://blog.cloudflare.com/ai-search-crawl-refer-ratio-on-radar/"><u>decrease</u></a> in those eyeballs, which can reduce their ability to create original high-quality content. This cycle will ultimately hurt end users and AI companies (whose success relies on fresh, high-quality content to train models and provide services) alike.</p><p>We are indisputably at a point in time when the Internet needs clear “rules of the road” for AI bot behavior (a note on terminology: throughout this blog we refer to AI bots and crawlers interchangeably). We have had ongoing cross-functional conversations, both internally and with stakeholders and partners across the world, and it’s clear to us that the Internet at large needs key groups — publishers and content creators, bot operators, and Internet infrastructure and cybersecurity companies — to reach a consensus on certain principles that AI bots should follow.</p><p>Of course, agreeing on what exactly those principles are will take time and require continued discussion and collaboration, and a policy framework can’t perfectly capture every technical concern. Nevertheless, we think it’s important to start a conversation that we hope others will join. After all, a rough draft is better than a blank page.</p><p>That is why we are proposing the following responsible AI bot principles as starting points:</p><ol><li><p><b>Public disclosure: </b>Companies should publicly disclose information about their AI bots;</p></li><li><p><b>Self-identification: </b>AI bots should truthfully self-identify, eventually replacing less reliable methods, like user agent and IP address verification, with cryptographic verification;</p></li><li><p><b>Declared single purpose:</b> AI bots should have one distinct purpose and declare it;</p></li><li><p><b>Respect preferences: </b>AI bots should respect and comply with preferences expressed by website operators where proportionate and technically feasible;</p></li><li><p><b>Act with good intent:</b> AI bots must not flood sites with excessive traffic or engage in deceptive behavior.</p></li></ol><p>Each principle is discussed in greater detail <a href="#responsible-ai-bot-principles"><u>below</u></a>. These principles focus on AI bots because of the impact <a href="https://www.cloudflare.com/learning/ai/what-is-generative-ai/"><u>generative AI</u></a> is having on the Internet, but we have already seen these practices in action with other types of (non-AI) bots as well. We believe these principles will help move the Internet in a better direction. 
That said, we acknowledge that they are a starting point for this conversation, which requires input from other stakeholders. The Internet has always been a collaborative place for innovation, and these principles should be seen as equally dynamic and evolving. </p>
    <div>
      <h2>Why Cloudflare is encouraging this conversation</h2>
      <a href="#why-cloudflare-is-encouraging-this-conversation">
        
      </a>
    </div>
    <p>Since declaring July 1st <a href="https://blog.cloudflare.com/content-independence-day-no-ai-crawl-without-compensation/"><u>Content Independence Day</u></a>, Cloudflare has strived to play a balanced and effective role in safeguarding the future of the Internet in the age of generative AI. We have enabled customers to <a href="https://blog.cloudflare.com/introducing-pay-per-crawl/"><u>charge AI crawlers for access</u></a> or <a href="https://www.cloudflare.com/learning/ai/how-to-block-ai-crawlers/"><u>block them with one click</u></a>, published and enforced our <a href="https://developers.cloudflare.com/bots/concepts/bot/verified-bots/policy/"><u>verified bots policy</u></a> and developed the <a href="https://developers.cloudflare.com/bots/reference/bot-verification/web-bot-auth/"><u>Web Bot Auth</u></a> proposal, and unapologetically <a href="https://blog.cloudflare.com/perplexity-is-using-stealth-undeclared-crawlers-to-evade-website-no-crawl-directives/#how-well-meaning-bot-operators-respect-website-preferences"><u>called out and stopped bad behavior</u></a>.</p><p>While we have recently focused our attention on AI crawlers, Cloudflare has long been a leader in the bot management space, helping our customers protect their websites from unwanted — and even malicious — traffic. We also want to make sure that anyone — whether they’re our customer or not — can see <a href="https://radar.cloudflare.com/ai-insights#ai-bot-best-practices"><u>which AI bots are abiding by all, some, or none of these best practices</u></a>. </p><p>But we aren’t ignorant of the fact that companies operating crawlers are also adapting to a new Internet landscape — and we genuinely believe that most players in this space want to do the right thing, while continuing to innovate and propel the Internet in an exciting direction. Our hope is that we can use our expertise and unique vantage point on the Internet to help bring seemingly incompatible parties together and find a path forward — continuing our mission of helping to build a better Internet for everyone.</p>
    <div>
      <h2>Responsible AI bot principles</h2>
      <a href="#responsible-ai-bot-principles">
        
      </a>
    </div>
    <p>The following principles are a launchpad for a larger conversation, and we recognize that there is work to be done to address many nuanced perspectives. We envision these principles applying to AI bots but understand that technical complexity may require flexibility. <b>Ultimately, our goal is to emphasize transparency, accountability, and respect for content access and use preferences.</b> If these principles fall short of that — or fail to consider other important priorities — we want to know.</p>
    <div>
      <h3>Principle #1: Public disclosure</h3>
      <a href="#principle-1-public-disclosure">
        
      </a>
    </div>
    <p><b>Companies should publicly disclose information about their AI bots.</b> The following information should be publicly available and easy to find:</p><ul><li><p><b>Identity:</b> information that helps external parties identify a bot, <i>e.g.</i>, user agent, relevant IP address(es), and/or individual cryptographic identification (more on this below, in <a href="#principle-2-self-identification"><i><u>Principle #2: Self-identification</u></i></a>);</p></li><li><p><b>Operator:</b> the legal entity responsible for the AI bot, including a point of contact (<i>e.g.</i>, for reporting abuse);</p></li><li><p><b>Purpose:</b> the purpose for which the accessed data will be used, <i>i.e.</i>, search, AI-input, or training (more on this below, in <a href="#principle-3-declared-single-purpose"><i><u>Principle #3: Declared Single Purpose</u></i></a>).</p></li></ul><p>OpenAI is an example of a leading AI company that clearly <a href="https://platform.openai.com/docs/bots"><u>discloses their bots</u></a>, complete with detailed explanations of each bot’s purpose. The benefits of this disclosure are apparent in the subsequent principles. It helps website operators validate that a given request is in fact coming from OpenAI and what its purpose is (<i>e.g.</i>, search indexing or AI model training). This, in turn, enables website operators to control access to and use of their content through preference expression mechanisms, like <a href="https://www.cloudflare.com/learning/bots/what-is-robots-txt/"><u>robots.txt files</u></a>.</p>
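    <p>No standard machine-readable disclosure format exists today. Purely as a sketch of the information this principle calls for (every field name and value here is hypothetical), a bot directory entry might look like:</p>
    <pre><code>{
  "user_agent": "ExampleBot/1.0",
  "operator": "Example AI, Inc.",
  "contact": "abuse@example.com",
  "purpose": "search",
  "ip_ranges_url": "https://example.com/examplebot-ranges.json",
  "verification": "web-bot-auth"
}
</code></pre>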
    <div>
      <h3>Principle #2: Self-identification</h3>
      <a href="#principle-2-self-identification">
        
      </a>
    </div>
    <p><b>AI bots should truthfully self-identify.</b> Not only should information about bots be disclosed in a publicly accessible location; it should also be clearly communicated by bots themselves, <i>e.g.,</i> through an HTTP request that conveys the bot’s official user agent and comes from an IP address that the bot claims to send traffic from. Admittedly, this current approach is flawed, as we discuss in <a href="#a-note-on-cryptographic-verification-and-the-future-of-principle-2"><u>more detail below</u></a>. But until cryptographic verification is more widely adopted, we think relying on user agent and IP verification is better than nothing.</p><p>OpenAI’s <a href="https://radar.cloudflare.com/bots/directory/gptbot"><u>GPTBot</u></a> is an example of this principle in action. OpenAI <a href="https://platform.openai.com/docs/bots"><u>publicly shares</u></a> the expected full user-agent string for this bot and includes it in its requests. OpenAI also explains this bot’s purpose (“used to make [OpenAI’s] generative AI foundation models more useful and safe” and “to crawl content that may be used in training [their] generative AI foundation models”). And we have observed this bot sending traffic from IP addresses reported by OpenAI. Because site operators see GPTBot’s user agent and IP addresses matching what is publicly disclosed and expected, and they know information about the bot is publicly documented, they can confidently recognize the bot. This enables them to make informed decisions about whether they want to allow traffic from it.</p><p>Unfortunately, not all bots uphold this principle, making it difficult for website owners to know exactly which bot operators respect their crawl preferences, much less enforce them. For example, Anthropic publishes only its user agent; absent other verifiable information, it’s unclear which requests are truly from Anthropic. And xAI’s bot, grok, does not self-identify at all, making it impossible for website operators to block it. Anthropic and xAI’s lack of identification undermines trust between them and website owners, yet this could be fixed with minimal effort on their parts.</p>
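    <p>A minimal sketch of what this check looks like in practice today, assuming a website operator has loaded the bot operator’s published CIDR ranges (the ranges below are documentation placeholders, not real bot IPs):</p>
    <pre><code>import ipaddress

# Placeholder ranges; in practice, load the list the bot operator
# publishes (e.g., the IP addresses OpenAI documents for GPTBot).
PUBLISHED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def appears_to_be_declared_bot(user_agent: str, remote_ip: str) -> bool:
    """Pre-cryptographic check: the user agent carries the disclosed
    token AND the source IP falls within a published range."""
    if "GPTBot" not in user_agent:
        return False
    ip = ipaddress.ip_address(remote_ip)
    return any(ip in net for net in PUBLISHED_RANGES)

# A spoofed user agent sent from an unlisted IP fails the check:
assert not appears_to_be_declared_bot("GPTBot/1.1", "203.0.113.7")
</code></pre>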
    <div>
      <h2>A note on cryptographic verification and the future of Principle #2</h2>
      <a href="#a-note-on-cryptographic-verification-and-the-future-of-principle-2">
        
      </a>
    </div>
    <p>Truthful declaration of user agent and dedicated IP lists have historically been a functional way to verify bots. But in today’s rapidly evolving bot climate, bots are increasingly vulnerable to being spoofed by bad actors. These bad actors, in turn, ignore robots.txt, which communicates allow/disallow preferences only on a user agent basis (so, a bad bot could spoof a permitted user agent and circumvent that domain’s preferences).</p><p><b>Ultimately, every AI bot should be cryptographically verified using an accepted standard.</b> This would protect them against spoofing and ensure website operators have the accurate and reliable information they need to properly evaluate access by AI bots. At this time, we believe that <a href="https://datatracker.ietf.org/doc/html/draft-meunier-web-bot-auth-architecture"><u>Web Bot Auth</u></a> is sufficient proof of compliance with Principle #2. We recognize that this standard is still in development, and, as a result, this principle may evolve accordingly.</p><p>Web Bot Auth <a href="https://blog.cloudflare.com/web-bot-auth/"><u>uses cryptography to verify bot traffic</u></a>; cryptographic signatures in HTTP messages are used as verification that a given request came from an automated bot. Our implementation relies on proposed IETF <a href="https://datatracker.ietf.org/doc/html/draft-meunier-http-message-signatures-directory"><u>directory</u></a> and <a href="https://datatracker.ietf.org/doc/html/draft-meunier-web-bot-auth-architecture"><u>protocol</u></a> drafts. Initial reception of Web Bot Auth has been very positive, and we expect even more adoption. For example, a little over a month ago, Vercel <a href="https://vercel.com/changelog/vercels-bot-verification-now-supports-web-bot-auth"><u>announced</u></a> that its bot verification now supports Web Bot Auth. And OpenAI’s <a href="https://help.openai.com/en/articles/11845367-chatgpt-agent-allowlisting"><u>ChatGPT agent now signs its requests using Web Bot Auth</u></a>, in addition to using the HTTP Message Signatures <a href="https://datatracker.ietf.org/doc/html/rfc9421"><u>standard</u></a>.</p><p>We envision a future where cryptographic authentication becomes the norm, as we believe this will further strengthen the trustworthiness of bots.</p>
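    <p>To give a feel for the mechanism (all values below are placeholders, and the exact parameters may change as the drafts evolve), a request signed under the Web Bot Auth drafts carries the RFC 9421 signature headers plus a Signature-Agent header pointing at the bot’s public key directory, roughly like this:</p>
    <pre><code>GET /article HTTP/1.1
Host: example.com
User-Agent: ExampleBot/1.0
Signature-Agent: "https://bot-directory.example.com"
Signature-Input: sig1=("@authority" "signature-agent");created=1735689600;expires=1735693200;keyid="ba3e64...";tag="web-bot-auth"
Signature: sig1=:ZXhhbXBsZS1zaWduYXR1cmUtYnl0ZXM=:
</code></pre>
    <p>The receiving site resolves the key directory, checks the signature over the covered components, and can then trust the bot’s identity independently of its source IP.</p>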
    <div>
      <h3>Principle #3: Declared single purpose </h3>
      <a href="#principle-3-declared-single-purpose">
        
      </a>
    </div>
    <p><b>AI bots should have one distinct purpose and declare it. </b>Today, <a href="https://blog.cloudflare.com/ai-crawler-traffic-by-purpose-and-industry"><u>some</u></a> bots self-identify their purpose as Training, Search, or User Action (<i>i.e.</i>, accessing a web page in response to a user’s query).</p><p>However, these purposes are sometimes combined without clear distinction. For example, content accessed for search purposes might also be used to train the AI model powering the search engine. When a bot’s purpose is unclear, website operators face a difficult decision: block it and risk undermining search engine optimization (SEO), or allow it and risk content being used in unwanted ways.</p><p>When operators deploy bots with distinct purposes, website owners are able to make clear decisions over who can access their content. What those purposes should be is up for debate, but we think the following breakdown is a starting point based on bot activity we see. We recognize this is an evolving space and changes may be required as innovation continues:</p><ul><li><p><b>Search:</b> building a search index and providing search results (<i>e.g.</i>, returning hyperlinks and short excerpts from your website’s contents). Search does <u>not</u> include providing AI-generated search summaries;</p></li><li><p><b>AI-input:</b> inputting content into one or more AI models, <i>e.g.</i>, retrieval-augmented generation (RAG), grounding, or other real-time taking of content for generative AI search answers; and</p></li><li><p><b>Training:</b> training or fine-tuning AI models.</p></li></ul><p>Relatedly, bots should not combine purposes in a way that prevents web operators from deliberately and effectively deciding whether to allow crawling.</p><p>Let’s consider two AI bots, OAI-SearchBot and Googlebot, from the perspective of Vinny, a website operator trying to make a living on the Internet. OAI-SearchBot has a single purpose: linking to and surfacing websites in ChatGPT’s search features. If Vinny takes OpenAI at face value (which we think it makes sense to do), he can trust that OAI-SearchBot does not crawl his content for training OpenAI’s generative AI models; rather, a separate bot (GPTBot, as discussed in <a href="#principle-2-self-identification"><i><u>Principle #2: Self-identification</u></i></a>) does. Vinny can decide how he wants his content used by OpenAI, <i>e.g.</i>, permitting its use for search but not for AI training, and feel confident that his choices are respected because OAI-SearchBot <i>only</i> crawls for search purposes, while GPTBot is not granted access to the content in the first place (and therefore cannot use it).</p><p>On the other hand, while Googlebot scrapes content for traditional search indexing (not model training), it also uses that content for inference purposes, such as for AI Overviews and AI Mode. Why is this a problem for Vinny? While he almost certainly wants his content appearing in search results, which drive the human eyeballs that fund his site, Vinny is forced to also accept that his content will appear in Google’s AI-generated summaries. If eyeballs are satisfied by the summary, then they never visit Vinny’s website, which leads to <a href="https://www.bain.com/insights/goodbye-clicks-hello-ai-zero-click-search-redefines-marketing/"><u>“zero-click” searches</u></a> and undermines Vinny’s ability to financially benefit from his content.</p><p>This is a vicious cycle: creating high-quality content, which typically leads to higher search rankings, now inadvertently also reduces the chances an eyeball will visit the site because that same valuable content is surfaced in an AI Overview (if it is even referenced as a source in the summary). To prevent this, Vinny must either opt out of search completely or use snippet controls (which risks degrading how his content appears in search results). This is because the only available signal to opt out of AI, disallowing <a href="https://developers.google.com/search/docs/crawling-indexing/google-common-crawlers#google-extended"><u>Google-Extended</u></a>, is limited to training and does not apply to AI Overviews, which are attached to search. Whether by accident or by design, this setup forces an impossible choice onto website owners.</p><p>Finally, the prominent technical argument in favor of combining multiple purposes — that this reduces the crawler operator’s costs — needs to be debunked. To reason by analogy: it’s like arguing that placing one call to order two pizzas is cheaper than placing two calls to order two pizzas. In reality, the cost of the two pizzas (both of which take time and effort to make) remains the same. The extra phone call may be annoying, but its costs are negligible.</p><p>Similarly, whether one bot request is made for two purposes (<i>e.g.</i>, search indexing and AI model training) or a separate bot request is made for each of two purposes, the costs basically remain the same. For the crawler, the cost of compute is the same because the content still needs to be processed for each purpose. And the cost of two connections (<i>i.e.</i>, for two requests) is virtually the same as one. We know this because Cloudflare runs one of the largest networks in the world, handling on average 84 million requests per second, so we understand the cost of requests at Internet scale. (As an aside, while additional crawls incur costs on website operators, they have the ability to choose whether the crawl is worth the cost, especially when bots have a single purpose.)</p>
    <div>
      <h3>Principle # 4: Respect preferences</h3>
      <a href="#principle-4-respect-preferences">
        
      </a>
    </div>
    <p><b>AI bots should respect and comply with preferences expressed by website operators where proportionate and technically feasible.</b> There are multiple options for expressing preferences. Prominent examples include the longstanding and familiar robots.txt, as well as newly emerging HTTP headers.</p><p>Given the widespread use of robots.txt files, bots should make a good faith attempt to fetch a robots.txt file first, in accordance with <a href="https://datatracker.ietf.org/doc/html/rfc9309"><u>RFC 9309</u></a>, and abide by both the access and use preferences specified therein. AI bot operators should also stay up to date on how those preferences evolve as a result of a <a href="https://ietf-wg-aipref.github.io/drafts/draft-ietf-aipref-vocab.html"><u>draft vocabulary</u></a> currently under development by an IETF working group. The goal of the proposed vocabulary is to improve granularity in robots.txt files, so that website operators are empowered to control how their assets are used. </p><p>At the same time, new industry standards under discussion may involve the attachment of machine-readable preferences to different formats, such as individual files. AI bot operators should eventually be prepared to comply with these standards, too. One idea currently being explored is a way for site owners to list preferences via HTTP headers, which offer a server-level method of declaring how content should be used.</p>
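    <p>A minimal sketch of this good-faith flow, using Python’s standard-library parser (which implements the longstanding robots.txt rules, though not yet the emerging AI-preferences vocabulary), with a placeholder user agent token:</p>
    <pre><code>from urllib import robotparser

# Fetch and parse the site's robots.txt before crawling anything,
# in the spirit of RFC 9309. "ExampleBot" is a placeholder token.
rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

url = "https://example.com/some-article"
if rp.can_fetch("ExampleBot", url):
    pass  # allowed: proceed with the request
else:
    pass  # disallowed: honor the operator's preference and skip
</code></pre>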
    <div>
      <h3>Principle #5: Act with good intent</h3>
      <a href="#principle-5-act-with-good-intent">
        
      </a>
    </div>
    <p><b>AI bots must not flood sites with excessive traffic or engage in deceptive behavior.</b> AI bot behavior should be benign or helpful to website operators and their users. It is also incumbent on companies that operate AI bots to monitor their networks and resources for breaches and patch vulnerabilities. Jeopardizing a website’s security or performance or engaging in harmful tactics is unacceptable.</p><p>Nor is it appropriate to appear to comply with the principles, only to secretly circumvent them. Reaffirming a long-standing principle of acceptable bot behavior, AI bots must never engage in <a href="https://blog.cloudflare.com/perplexity-is-using-stealth-undeclared-crawlers-to-evade-website-no-crawl-directives/"><u>stealth crawling</u></a> or use other stealth tactics to try to dodge detection, such as modifying their user agent, changing their source <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/"><u>ASNs</u></a> to hide their crawling activity, or ignoring robots.txt files. Doing so would undermine the preceding four principles, hurting website operators and worsening the Internet for all.</p>
    <div>
      <h2>The road ahead: multi-stakeholder efforts to bring these principles to life</h2>
      <a href="#the-road-ahead-multi-stakeholder-efforts-to-bring-these-principles-to-life">
        
      </a>
    </div>
    <p>As we continue working on these principles and soliciting feedback, we strive to find a balance: we want the wishes of content creators respected while still encouraging AI innovation. It’s a privilege to sit at the intersection of these important interests and to play a crucial role in developing an agreeable path forward.</p><p>We are continuing to engage with rights holders, AI companies, policy-makers, and regulators to shape global industry standards and regulatory frameworks accordingly. We believe that the rise of generative AI need not threaten the Internet’s place as an open source of quality content. Protecting its integrity requires agreement on workable technical standards that reflect the interests of web publishers, content creators, and AI companies alike. </p><p>The whole ecosystem must continue to come together and collaborate towards a better Internet that truly works for everyone. Cloudflare advocates for neutral forums where all affected parties can discuss the impact of AI developments on the Internet. One such example is the IETF, which has current work focused on some of the technical aspects being considered. Those efforts attempt to address some, but not all, of the issues in an area that deserves holistic consideration. We believe the principles we have proposed are a step in the right direction — but we hope others will join this complex and important conversation, so that norms and behavior on the Internet can successfully adapt to this exciting new technological age.</p> ]]></content:encoded>
            <category><![CDATA[AI Bots]]></category>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Better Internet]]></category>
            <category><![CDATA[Generative AI]]></category>
            <category><![CDATA[Policy & Legal]]></category>
            <guid isPermaLink="false">1sZkiH7eUUcU8zs4jpo6F8</guid>
            <dc:creator>Leah Romm</dc:creator>
            <dc:creator>Sebastian Hufnagel</dc:creator>
        </item>
        <item>
            <title><![CDATA[Towards a global framework for cross-border data flows and privacy protection]]></title>
            <link>https://blog.cloudflare.com/towards-a-global-framework-for-cross-border-data-flows-and-privacy-protection/</link>
            <pubDate>Fri, 27 Jan 2023 14:00:00 GMT</pubDate>
            <description><![CDATA[ In our third and final blog post leading up to Data Privacy Day, we drill down into the challenges for cross-border data flows, in particular personal data transfers from the EU to the US. ]]></description>
            <content:encoded><![CDATA[
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2rmTrPONyF9zs6TgRW7w4s/7bd0c47ea7698d9e23fdc3b70f297f7d/image1-54.png" />
            
            </figure><p>As our societies and economies rely more and more on digital technologies, there is an increased need to share and transfer data, including personal data, over the Internet. Cross-border data flows have become essential to international trade and global economic development. In fact, the digital transformation of the global economy could never have happened as it did without the open and global architecture of the Internet and the ability for data to transcend national borders. As we described in our <a href="/investing-in-security-to-protect-data-privacy/">blog post</a> yesterday, data localization doesn’t necessarily improve data privacy. Actually, there can be real benefits to data security and - by extension - privacy if we are able to transfer data across borders. So with Data Privacy Day coming up tomorrow, we wanted to take this opportunity to drill down into the current environment for the transfer of personal data from the EU to the US, which is governed by the EU’s General Data Protection Regulation (GDPR). Looking to the future, we will make the case for a more stable, global cross-border data transfer framework, which will be critical for an open, more secure and more private Internet.</p>
    <div>
      <h3>The privacy challenge to cross-border data flows</h3>
      <a href="#the-privacy-challenge-to-cross-border-data-flows">
        
      </a>
    </div>
    <p>In the last decade, we have observed a growing tendency around the world to ring-fence the Internet and erect new barriers to international data flows, especially personal data. In some cases this has resulted in less choice and poorer performance for users of digital products and services. In other cases it has limited free access to information, and - paradoxically - in some cases this has resulted in even less data security and privacy, which is contrary to the very rationale of data protection regulations. The motives for these concerning developments are manifold, ranging from a lack of trust with regard to privacy protection in third countries, to asserting national security, to seeking economic self-determination.</p><p>In the European Union, for the last few years, even the most privacy-focused companies (like Cloudflare) have faced a drumbeat of speculation and concerns from some hardliner data protection authorities, privacy activists and others about whether data processed by US cloud service providers could really be processed in a manner that complies with the GDPR. Often, these concerns are purely legalistic and fail to take into account the actual risks associated with a specific data transfer, and, in Cloudflare’s case, the essential contribution of our services to the security and privacy of millions of European Internet users. In fact, official guidance from the European Data Protection Board (EDPB) has confirmed that EU personal data can still be processed in the US, but this has become quite complicated since the suspension of the Privacy Shield framework by the European Court of Justice with its 2020 Schrems II judgment: data controllers must use legal transfer mechanisms such as EU standard contractual clauses as well as a host of additional legal, technical and organizational safeguards.</p><p>However, it is ultimately up to the competent data protection authorities to decide whether such measures are sufficient in a case-by-case interpretation. Since these cases are often quite complex, since every case is different, and since there are <a href="https://ec.europa.eu/justice/article-29/structure/data-protection-authorities/index_en.htm">45 data protection authorities across Europe</a> alone, this approach simply doesn’t scale. Further, DPAs - sometimes even within the same EU country (Germany) - have disagreed in their interpretation of the law when it comes to third country transfers. And when it comes to an actual court ruling, it is our experience that the courts tend to be more pragmatic and balanced about data protection than the DPAs are. But it takes a long time and many resources before a data protection case ends up before a court. This is particularly problematic for small businesses that can’t afford lengthy legal battles. As a result, the theoretical threat of a hefty fine from a DPA may create enough of a deterrent for them to stop using services involving third-country data transfers altogether, even if those services provide greater security and privacy for the personal data they process, and make them more productive. This is clearly not in the interest of the European economy and most likely was not the intention of policy-makers when adopting the GDPR back in 2016.</p>
    <div>
      <h3>The good news: there is hope on the horizon</h3>
      <a href="#the-good-news-there-is-hope-on-the-horizon">
        
      </a>
    </div>
    <p>While recent developments will not resolve all the challenges mentioned above, last December, after years of complex negotiations, international policy-makers took two important steps towards restoring legal certainty and trust relating to cross-border flows of personal data.</p><p>On December 13, 2022, the European Commission published its long-awaited <a href="https://commission.europa.eu/document/download/e5a39b3c-6e7c-4c89-9dc7-016d719e3d12_en?filename=Draft%20adequacy%20decision%20on%20EU-US%20Data%20Privacy%20Framework_0.pdf">preliminary assessment</a> that the EU would consider that personal data transferred from the EU to the US under the future EU-US Data Privacy Framework (DPF) enjoys an adequate level of protection in the United States. The assessment follows the recent signing of Executive Order 14086 by US President Biden, which comprehensively addressed the concerns expressed by the European Court of Justice (ECJ) in its 2020 Schrems II decision. Notably, the US government will impose additional limits on US authorities’ use of bulk surveillance methods against non-US citizens and create an independent redress mechanism in the US that allows EU data subjects to exercise their data protection rights. While the Commission’s initial assessment is only the start of an EU ratification process that is expected to take about 4-6 months, experts are very optimistic that it will be adopted in the end.</p><p>Just one day later, the US, along with the 37 other OECD countries and the European Union, adopted a first-of-its-kind <a href="https://legalinstruments.oecd.org/en/instruments/OECD-LEGAL-0487">agreement</a> to enhance trust in cross-border data flows between rule-of-law democratic systems, by articulating joint principles for safeguards to protect privacy and other human rights and freedoms when governments access personal data held by private entities on grounds of national security and law enforcement. Where legal frameworks require that transborder data flows are subject to safeguards, like in the case of GDPR in the EU, participants agreed to “take into account a destination country’s effective implementation of the principles as a positive contribution towards facilitating transborder data flows in the application of those rules.” (It’s also good to note that, in line with Cloudflare’s mission to help build a better Internet, the OECD declaration recalls members’ shared commitment to a “global, open, accessible, interconnected, interoperable, reliable and secure Internet”).</p>
    <div>
      <h3>The future: a truly global privacy framework</h3>
      <a href="#the-future-a-truly-global-privacy-framework">
        
      </a>
    </div>
    <p>The EU-US DPF and the OECD Declaration are complementary to each other and both mark important steps to restore trust in cross-border data flows between countries that share common values like democracy and the rule of law, protecting privacy and other human rights and freedoms. However, both approaches come with their own limitations: the DPF is limited to personal data transfers from the EU to the US. In addition, it cannot be excluded that it will be invalidated by the ECJ in a few years’ time, as privacy activists have already announced that they will legally challenge it again. The OECD Declaration, on the other hand, is global in scope, but limited to general principles for governments, which can be interpreted quite differently in practice.</p><p>This is why, in addition to these efforts, we need a stable, multilateral framework with specific privacy protection requirements, which cannot be invalidated unilaterally. One single global certification should suffice for participating companies to safely transfer personal data between participating countries worldwide. The emerging Global Cross Border Privacy Rules (CBPR) certification, which is already supported by several governments from North America and Asia, looks very promising in this regard.</p><p>European policy-makers will ultimately need to decide whether they want to continue on the present path, which risks leaving Europe behind as an isolated data island. Alternatively, the EU could revise its privacy regulation with a view to preventing Europe’s many national and regional data protection authorities from interpreting it in a way that is out of touch with reality. It could also make it interoperable with a global framework for cross-border data flows based on shared values and mutual trust.</p><p>Cloudflare will continue to actively engage with policy-makers globally to create awareness for the practical challenges our industry is facing and to work on sustainable policy solutions for an open and interconnected Internet that is more private and secure.</p><p>Data Privacy Day tomorrow provides a unique occasion for us all to celebrate the significant progress achieved so far to protect users’ privacy online. At the same time, we should use this day to reflect on how regulations can be adapted or enforced in a way that more meaningfully protects privacy, notably by prioritizing the use of security and privacy-enhancing technologies over prohibitive approaches that harm the economy without tangible privacy benefits.</p> ]]></content:encoded>
            <category><![CDATA[Data Privacy Day]]></category>
            <category><![CDATA[Privacy]]></category>
            <category><![CDATA[Policy & Legal]]></category>
            <guid isPermaLink="false">1T8I1MOiYEbs4xcZkHqdRB</guid>
            <dc:creator>Sebastian Hufnagel</dc:creator>
        </item>
        <item>
            <title><![CDATA[Project Safekeeping – protecting the world’s most vulnerable infrastructure with Zero Trust]]></title>
            <link>https://blog.cloudflare.com/project-safekeeping/</link>
            <pubDate>Tue, 13 Dec 2022 14:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare’s mission is to help make a better Internet. Starting December 13, 2022, we will help support these vulnerable organizations by providing our enterprise-level Zero Trust cybersecurity solution at no cost, with no time limit. ]]></description>
<content:encoded><![CDATA[
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5sIUC7Eomfk4IefybTctr5/17a7ab901edfa31c85d447475292e91e/image3-8.png" />
            
</figure><p>Under-resourced organizations that are vital to the basic functioning of our global communities face relentless cyber attacks that threaten basic needs for health, safety, and security.</p><p>Cloudflare’s mission is to help make a better Internet. Starting December 13, 2022, we will help support this vulnerable infrastructure by providing our enterprise-level <a href="https://www.cloudflare.com/learning/security/glossary/what-is-zero-trust/">Zero Trust</a> cybersecurity solution at no cost, with no time limit.</p><p>It is our pleasure to introduce our newest <a href="https://www.cloudflare.com/impact/">Impact</a> initiative: Project Safekeeping.</p>
    <div>
      <h3>Small targets, devastating impacts</h3>
      <a href="#small-targets-devastating-impacts">
        
      </a>
    </div>
    <p><a href="https://www.cloudflare.com/the-net/government/critical-infrastructure/">Critical infrastructure</a> is an obvious target for cyber attack: by its very definition, these are the organizations and systems that are crucial for the functioning of our society and economy. As such, these organizations cannot have prolonged interruptions in service, or risk having sensitive data exposed.</p><p>Our conversations over the past few months with government officials in Australia, Germany, Japan, Portugal, and the United Kingdom show that they are focused on the threat to critical infrastructure, but resource constraints mean that their attention is on protecting large organizations – immense financial institutions, hospital networks, oil pipelines, and airports. Yet, the small critical infrastructure organizations that are the foundation of our communities are also at risk: the neighborhood hospital, water treatment facility, and local energy provider that fulfill our fundamental needs. We tend to ignore the small-yet-vitally-important companies that form the supply chains of our nationwide critical systems.</p><p>Unlike large organizations, smaller organizations typically do not have the capacity to manage relentless cyber attacks – usually operating on shoestring budgets, they do not have security personnel, threat insight teams, or the latest technology to keep their organizations secure. The numerous real life examples of cyber attacks against these small but vital organizations best illustrate the devastating impacts: in Japan, ransomware shut down a hospital’s access to patient records for nearly two months, halting the hospital’s ability to accept any new patients, including emergency patients; and in Germany, ransomware compromised a local county’s IT systems and no local public services could be provided to citizens for weeks, while the county is still struggling with the aftermath of the attack one year on.</p>
    <div>
      <h3>Project Safekeeping: protecting global vulnerable critical infrastructure with Zero Trust</h3>
      <a href="#project-safekeeping-protecting-global-vulnerable-critical-infrastructure-with-zero-trust">
        
      </a>
    </div>
<p>We at Cloudflare believe in helping to build a better Internet, for <i>everyone</i>. The welfare of our local communities should not be put at risk by the budget and operational constraints of these small and vulnerable entities. And we are particularly well-suited to help: Cloudflare is a global cybersecurity provider that blocked an average of 126 billion cyber threats <i>each day</i> in Q3 2022. With <a href="https://www.cloudflare.com/galileo/">Project Galileo</a> and the <a href="https://www.cloudflare.com/athenian/">Athenian Project</a>, we have rich experience supporting organizations that are particularly vulnerable to cyber threats and lack the resources to protect themselves.</p><p>We want our support to be meaningful, so that these entities can focus on what they do best: meeting our communities’ basic needs. As explained in this <a href="/zero-trust-not-a-buzzword/">blog post</a>, Cloudflare provides an innovative and elegant approach to security: Zero Trust. Zero Trust is a radical change in the approach to <a href="https://www.cloudflare.com/learning/security/what-is-cyber-security/">cybersecurity</a> that is both effective and effortless, something a resource-strapped organization will certainly appreciate.</p><p>Earlier this year, in response to the increase in cyber attacks on critical infrastructure following Russia’s invasion of Ukraine, we provided our Zero Trust solution to critical infrastructure in the United States via the <a href="https://criticalinfrastructuredefense.org/">Critical Infrastructure Defense Project</a>. Now we are expanding our support to the global community, initially focusing our efforts on Australia, Germany, Japan, Portugal, and the United Kingdom.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/BqrXQA3GZ8JSFqDGkIQk6/c535d2ca296d14d2df8fb8b3d2de21e1/image2-17.png" />
            
            </figure>
    <div>
      <h3>What Zero Trust services are available?</h3>
      <a href="#what-zero-trust-services-are-available">
        
      </a>
    </div>
<p>Depending on their specific needs, eligible entities in these regions will receive our enterprise-level <a href="https://zerotrustroadmap.org/">Zero Trust</a> cybersecurity services for free and with no time limit – there is no catch and no underlying obligation. Eligible organizations will benefit from the full range of our Zero Trust services:</p><ul><li><p><b>Connecting users to applications</b>: Real-time verification of every user to every protected application, protecting internal resources and defending against potential data breaches (see the sketch after this list).</p></li><li><p><b>Filtering traffic</b>: A <a href="https://www.cloudflare.com/learning/access-management/what-is-a-secure-web-gateway/">Secure Web Gateway (SWG)</a> prevents cyber threats and data breaches by filtering unwanted content from web traffic, blocking unauthorized user behavior, and enforcing company security policies.</p></li><li><p><b>Securing cloud applications</b>: A Cloud Access Security Broker, or <a href="https://www.cloudflare.com/learning/access-management/what-is-a-casb/">CASB</a>, performs several security functions for cloud-hosted services (e.g. SaaS, IaaS, and PaaS applications). Standard CASBs secure confidential data through <a href="https://www.cloudflare.com/learning/access-management/what-is-access-control/">access control</a> and data loss prevention, reveal shadow IT, and ensure compliance with data privacy regulations.</p></li><li><p><b>Protecting sensitive data</b>: Data Loss Prevention (DLP) secures your organization’s most sensitive data in transit.</p></li><li><p><b>Email security</b>: Area 1 preemptively blocks phishing, Business Email Compromise attacks, <a href="https://www.cloudflare.com/learning/email-security/what-is-email-fraud/">malware-less fraud</a>, and other incessant attacks coming through email.</p></li><li><p><b>Safer web browsing</b>: <a href="https://www.cloudflare.com/learning/access-management/what-is-browser-isolation/">Remote Browser Isolation (RBI)</a> insulates users from untrusted web content and protects data in browser interactions from untrusted users and devices.</p></li></ul><p>In addition to the Zero Trust services above, eligible entities will receive our world-class <a href="https://www.cloudflare.com/application-services/solutions/">application security products</a>: DDoS protection and a <a href="https://www.cloudflare.com/learning/ddos/glossary/web-application-firewall-waf/">Web Application Firewall (WAF)</a>.</p>
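<p>To make the first item on this list concrete, here is a minimal sketch of what putting an internal application behind Cloudflare Access and restricting it to staff might look like using the Cloudflare v4 API. This is an illustration rather than official onboarding guidance: the account ID, API token, hostname, and email domain below are all placeholders, and participants would adapt the equivalent configuration to their own environment.</p>
<pre><code>import requests

ACCOUNT_ID = "your-account-id"  # placeholder
API_TOKEN = "your-api-token"    # placeholder
BASE = f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/access/apps"
HEADERS = {"Authorization": f"Bearer {API_TOKEN}"}

# Register an internal app behind Access so every request must be authenticated.
app = requests.post(BASE, headers=HEADERS, json={
    "name": "Patient records portal",          # illustrative application name
    "domain": "records.example-hospital.org",  # placeholder hostname
    "session_duration": "24h",
}).json()["result"]

# Attach an allow policy: only users with a verified staff email domain get in;
# every other identity is denied by default.
requests.post(f"{BASE}/{app['id']}/policies", headers=HEADERS, json={
    "name": "Staff only",
    "decision": "allow",
    "precedence": 1,
    "include": [{"email_domain": {"domain": "example-hospital.org"}}],
})</code></pre>
<p>Once a policy like this is in place, every request to the protected hostname is evaluated at Cloudflare’s edge before it reaches the application, which is what lets a small team enforce per-user verification without running its own identity infrastructure.</p>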
    <div>
      <h3>Who can apply?</h3>
      <a href="#who-can-apply">
        
      </a>
    </div>
<p>To be eligible, Project Safekeeping participants must be:</p><ul><li><p>Located in Australia, Germany, Japan, Portugal, or the United Kingdom.</p></li><li><p>Considered critical infrastructure by the government in their respective locality.</p></li><li><p>Small: approximately 50 employees or fewer, and/or less than USD $10 million in annual revenue or balance sheet total.</p></li></ul><p>If you think your organization may be eligible, we welcome you to learn more and apply at <a href="https://www.cloudflare.com/lp/project-safekeeping/">https://www.cloudflare.com/lp/project-safekeeping/</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7cM1KqECzbZb3o9wjwW7bF/1aa8508c3a0677766e9bc7e40c28b54e/image1-15.png" />
            
</figure> ]]></content:encoded>
            <category><![CDATA[Impact Week]]></category>
            <category><![CDATA[Project Safekeeping]]></category>
            <category><![CDATA[Policy & Legal]]></category>
            <guid isPermaLink="false">5urmib79q1JGc6YXhS8Tb3</guid>
            <dc:creator>Carly Ramsey</dc:creator>
            <dc:creator>Christiaan Smits</dc:creator>
            <dc:creator>Sebastian Hufnagel</dc:creator>
        </item>
    </channel>
</rss>