    Cloudflare outage on February 20, 2026

    By Samuel Alejandro · February 22, 2026 · 11 Mins Read

    On February 20, 2026, at 17:48 UTC, Cloudflare experienced a service outage. This incident affected a subset of customers utilizing Cloudflare’s Bring Your Own IP (BYOIP) service, as their Internet routes were withdrawn via Border Gateway Protocol (BGP).

    The outage was not the result of a cyberattack or malicious activity. Instead, it stemmed from a change to how Cloudflare’s network manages IP addresses onboarded through the BYOIP pipeline. This modification inadvertently caused the withdrawal of customer prefixes.

    Consequently, some BYOIP customers found their services and applications unreachable from the Internet, leading to connection timeouts and failures across their Cloudflare deployments that relied on BYOIP. The website for Cloudflare’s recursive DNS resolver (1.1.1.1) also displayed 403 errors. The incident lasted 6 hours and 7 minutes, with the majority of this time dedicated to restoring prefix configurations to their original state.

    Cloudflare engineers initiated a rollback of the change, and prefix withdrawals ceased upon detection of failures. Approximately 1,100 BYOIP prefixes were withdrawn from the Cloudflare network before the change could be fully reverted. Some customers managed to restore their services by using the Cloudflare dashboard to re-advertise their IP addresses. The incident concluded once all prefix configurations were restored.

    Cloudflare expressed regret for the impact on its customers. This article provides a detailed account of the event, identifying system and process failures, and outlines steps being taken to prevent similar outages in the future.

    How the Outage Impacted Customers

    The graph below illustrates the number of prefixes advertised by Cloudflare to a BGP neighbor during the incident. A reduction in advertised prefixes indicates unreachability on the Internet.

    Of the 6,500 prefixes advertised to this peer, 4,306 were BYOIP prefixes. These BYOIP prefixes are advertised globally to all peers.

    During the incident, 1,100 out of 6,500 total prefixes were withdrawn between 17:56 and 18:46 UTC. This meant 25% of the 4,306 BYOIP prefixes were unintentionally withdrawn. Impact on one.one.one.one was detected, allowing the problematic change to be reverted before further prefixes were affected. At 19:19 UTC, customers received guidance on self-remediation by re-advertising their prefixes via the Cloudflare dashboard.

    Cloudflare successfully reverted many advertisement changes around 20:20 UTC, restoring 800 prefixes. Approximately 300 prefixes remained un-remediated through the dashboard because a software bug had removed their service configurations from the edge. Cloudflare engineers manually restored these prefixes by 23:03 UTC.

    The incident did not affect all BYOIP customers, as the configuration change was applied iteratively rather than instantaneously. The change was reverted once its impact became apparent, preventing a broader customer impact.

    Traffic destined for affected BYOIP prefixes initially underwent BGP path hunting: user connections kept attempting to find a route to the destination IP until the connection timed out. This failure mode continued until the prefix was re-advertised, impacting any product using BYOIP for Internet advertisement. Additionally, visitors to one.one.one.one, Cloudflare’s recursive DNS resolver website, encountered HTTP 403 errors with an “Edge IP Restricted” message. DNS resolution via the 1.1.1.1 Public Resolver, including DNS over HTTPS, remained unaffected. A detailed breakdown of impacted services is provided below:

    Service Impact Summary

    • Core CDN and Security Services: Traffic was not directed to Cloudflare, resulting in connection failures for users accessing websites advertised on the affected ranges.

    • Spectrum: Spectrum applications on BYOIP failed to proxy traffic because traffic was not being attracted to Cloudflare.

    • Dedicated Egress: Customers using Gateway Dedicated Egress with BYOIP or Dedicated IPs for CDN Egress with BYOIP could not send traffic to their destinations.

    • Magic Transit: End users attempting to connect to applications protected by Magic Transit found them unadvertised on the Internet, leading to connection timeouts and failures.

    Some customers could not restore service by toggling prefixes on the Cloudflare dashboard. Even after engineers re-announced prefixes for these customers, those customers might still have observed increased latency and failures despite their IP addresses being advertised. This was because a software issue had removed addressing settings for some users from edge servers, and that state had to be propagated back to the edge.

    To understand the root cause, an overview of Cloudflare’s Addressing API, the authoritative source for customer IP addresses, is necessary.

    Cloudflare’s Addressing API

    The Addressing API serves as the authoritative dataset for addresses within the Cloudflare network, and modifications to this dataset are reflected instantly across Cloudflare’s global network. While work under Code Orange: Fail Small is underway to improve how these systems deploy changes, today customers configure IP addresses through public APIs. These APIs write to databases that trigger operational workflows, propagating changes to Cloudflare’s edge. This is why Addressing API changes reach the Cloudflare edge immediately.

    The process of advertising and configuring IP addresses on Cloudflare includes several stages:

    • Customers communicate advertisement or withdrawal of IP addresses through the Addressing API or BGP Control.

    • The Addressing API then directs machines to modify prefix advertisements.

    • BGP is updated on routers once a sufficient number of machines have received the prefix update notification.

    • Finally, customers can configure Cloudflare products to use BYOIP addresses via service bindings, which assign products to these ranges.
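    The staged flow above can be sketched in Go. Everything here — the types, the quorum threshold, the function names — is an illustrative assumption, not Cloudflare’s actual internals; the point is the ordering: machines are notified first, and BGP on the routers only changes once enough machines have acknowledged the update.

    ```go
    package main

    import "fmt"

    // Action is an advertisement change requested through the Addressing API.
    type Action string

    const (
    	Advertise Action = "advertise"
    	Withdraw  Action = "withdraw"
    )

    // PrefixUpdate is a notification pushed from the Addressing API to edge machines.
    type PrefixUpdate struct {
    	Prefix string
    	Action Action
    }

    // pushToMachines notifies each machine and reports how many acknowledged.
    func pushToMachines(machines []string, upd PrefixUpdate) int {
    	acked := 0
    	for range machines {
    		// In reality this would be an RPC per machine; here every machine acks.
    		acked++
    	}
    	return acked
    }

    // updateBGP changes router state only once a sufficient fraction of
    // machines have received the prefix update notification (step 3 above).
    func updateBGP(machines []string, upd PrefixUpdate, quorum float64) bool {
    	acked := pushToMachines(machines, upd)
    	if float64(acked) < quorum*float64(len(machines)) {
    		return false // not enough machines ready; hold the BGP change
    	}
    	fmt.Printf("BGP: %s %s\n", upd.Action, upd.Prefix)
    	return true
    }

    func main() {
    	machines := []string{"edge-1", "edge-2", "edge-3"}
    	updateBGP(machines, PrefixUpdate{"203.0.113.0/24", Advertise}, 0.8)
    }
    ```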

    While the Addressing API automates most processes related to address advertisement or withdrawal, some steps still require manual intervention. These manual processes carry inherent risks due to their direct interaction with Production environments. A key objective of the Code Orange: Fail Small initiative is to eliminate manual actions within the Addressing API, replacing them with secure, automated workflows.

    How the Incident Occurred

    The configuration failure originated from a modification designed to automate the removal of prefixes from Cloudflare’s BYOIP service, a task previously performed manually by customers. This automation was part of the Code Orange: Fail Small initiative, aiming to transition all changes to safe, automated, and health-mediated deployments. Given the potentially large number of related BYOIP prefix objects, this automation was implemented as a recurring sub-task that identified and removed prefixes slated for deletion. However, this cleanup sub-task contained a bug in its API query.

    The API query from the cleanup sub-task is shown below:

     resp, err := d.doRequest(ctx, http.MethodGet, `/v1/prefixes?pending_delete`, nil)
    

    The relevant section of the API implementation is:

    	if v := req.URL.Query().Get("pending_delete"); v != "" {
    		// ignore other behavior and fetch pending objects from the ip_prefixes_deleted table
    		prefixes, err := c.RO().IPPrefixes().FetchPrefixesPendingDeletion(ctx)
    		if err != nil {
    			api.RenderError(ctx, w, ErrInternalError)
    			return
    		}
    
    		api.Render(ctx, w, http.StatusOK, renderIPPrefixAPIResponse(prefixes, nil))
    		return
    	}
    

    Since the client passed pending_delete without a value, Query().Get("pending_delete") returned an empty string. The API server consequently interpreted this as a request for all BYOIP prefixes, rather than only those intended for removal. The system then queued all returned prefixes for deletion. The new sub-task proceeded to systematically delete all BYOIP prefixes and all of their associated dependent objects, including service bindings, until the impact was recognized and an engineer terminated the sub-task.
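    The behavior at the heart of the bug can be reproduced with Go’s standard net/url package. Only the request path is taken from the snippet above; the rest is a standalone demonstration, not Cloudflare code.

    ```go
    package main

    import (
    	"fmt"
    	"net/url"
    )

    func main() {
    	// The request that triggered the incident: the flag is present
    	// in the query string but carries no value.
    	u, _ := url.Parse("/v1/prefixes?pending_delete")
    	q := u.Query()

    	// Get returns "" both when the key is absent and when it is
    	// present without a value, so a guard like `v != ""` fails here
    	// and the handler falls through to returning every prefix.
    	fmt.Printf("Get: %q\n", q.Get("pending_delete")) // Get: ""

    	// Has (Go 1.17+) distinguishes "present without a value" from
    	// "absent entirely", which would have routed this request to the
    	// pending-deletion branch as the client intended.
    	fmt.Println("Has:", q.Has("pending_delete")) // Has: true
    }
    ```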

    Why the Bug Was Not Detected in Staging or Testing

    Cloudflare’s staging environment aims to mirror Production data as closely as possible, but in this instance, the mock data used for simulation proved inadequate.

    Although tests existed for this functionality, the testing process and environment had incomplete coverage for this specific scenario. Initial testing and code review focused on the BYOIP self-service API journey. Engineers tested the customer-facing flow successfully, but testing did not cover a scenario in which the task-runner service would autonomously execute changes to user data without explicit input.
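    A check covering the missed scenario might look like the following sketch. listPrefixes is a hypothetical, simplified stand-in for the real handler’s branching, with illustrative data; the second case — the flag present with no value — is exactly the one that was never exercised.

    ```go
    package main

    import (
    	"fmt"
    	"net/url"
    )

    // listPrefixes mirrors the handler's guard in simplified form: it should
    // return only pending-deletion prefixes when the flag is set, and all
    // prefixes otherwise.
    func listPrefixes(rawQuery string, all, pending []string) []string {
    	q, _ := url.ParseQuery(rawQuery)
    	if v := q.Get("pending_delete"); v != "" { // the buggy guard
    		return pending
    	}
    	return all
    }

    func main() {
    	all := []string{"192.0.2.0/24", "198.51.100.0/24"}
    	pending := []string{"198.51.100.0/24"}

    	// The case exercised during review: an explicit value behaves as intended.
    	fmt.Println(len(listPrefixes("pending_delete=true", all, pending))) // 1

    	// The untested case: a bare flag falls through the guard and returns
    	// every prefix — the failure mode of the incident.
    	fmt.Println(len(listPrefixes("pending_delete", all, pending))) // 2
    }
    ```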

    Reasons for Non-Immediate Recovery

    The recovery process was not immediate because affected BYOIP prefixes experienced varying degrees of impact, requiring more complex data recovery procedures. As part of Code Orange: Fail Small, a system is being developed to enable safe, health-mediated rollouts of operational state snapshots. This system would allow for rapid rollback to a known-good state if unexpected behavior occurs, but it is not yet in Production.

    During the incident, BYOIP prefixes were in different states of impact, each demanding distinct actions:

    • Most affected customers only had their prefixes withdrawn. These customers could restore service by toggling their advertisements in the dashboard.

    • Some customers experienced both prefix withdrawal and the removal of certain bindings, leading to a partial recovery state where some prefixes could be toggled, but not others.

    • A third group of customers had their prefixes withdrawn and all service bindings removed. They were unable to toggle their prefixes in the dashboard due to the absence of a service (such as Magic Transit, Spectrum, or CDN) bound to them. Mitigating issues for these customers took the longest, as it required a global configuration update to reapply service bindings to every machine on Cloudflare’s edge.

    Connection to Code Orange: Fail Small

    The change being implemented at the time of the incident was part of the Code Orange: Fail Small initiative, which focuses on enhancing the resilience of Cloudflare’s code and configurations. This initiative comprises three main areas:

    • Implementing controlled rollouts for all network configuration changes, mirroring the current practice for software binary releases.

    • Modifying internal “break glass” procedures and eliminating circular dependencies to ensure rapid access and action across all systems during an incident.

    • Thoroughly reviewing, improving, and testing the failure modes of all systems managing network traffic to guarantee predictable behavior under all conditions, including unexpected error states.

    The deployment attempt that led to the incident falls under the first category. The goal is to improve service reliability by transitioning risky, manual changes to secure, automated, health-mediated configuration updates.

    Crucial work was already in progress to bolster the Addressing API’s configuration change support with staged test mediation and improved correctness checks. This work proceeded concurrently with the problematic deployment. Although preventative measures were not fully deployed before the outage, teams were actively developing these systems when the incident occurred. The Code Orange: Fail Small commitment to controlled rollouts for any change into Production has led engineering teams to meticulously examine all layers of the stack for issues. While this outage was not global, its unacceptable blast radius and impact underscore the continued priority of Code Orange: Fail Small until confidence is fully restored in the gradual and safe deployment of all network changes. Further details on system improvements are discussed below.

    Remediation and Follow-up Steps

    API Schema Standardization

    A contributing factor to the incident was the interpretation of the pending_delete flag as a string, which complicated value validation for both client and server. The API schema will be improved to enhance standardization, simplifying validation of API calls for testing and for the systems that consume them. This effort is part of the third Code Orange workstream, which aims to establish well-defined behavior under all conditions.
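    One possible shape for such stricter validation, shown as a sketch: parseBoolFlag is a hypothetical helper, not part of Cloudflare’s API. The key may be absent (false) or carry a parseable boolean, but a bare key with no value is rejected outright instead of being silently treated as unset.

    ```go
    package main

    import (
    	"fmt"
    	"net/url"
    	"strconv"
    )

    // parseBoolFlag validates a boolean query flag strictly: absence means
    // false, an explicit value must parse as a boolean, and a valueless
    // flag is an error rather than a silent no-op.
    func parseBoolFlag(q url.Values, key string) (bool, error) {
    	if !q.Has(key) {
    		return false, nil
    	}
    	v, err := strconv.ParseBool(q.Get(key))
    	if err != nil {
    		return false, fmt.Errorf("%s: expected true/false, got %q", key, q.Get(key))
    	}
    	return v, nil
    }

    func main() {
    	for _, raw := range []string{"pending_delete=true", "pending_delete", ""} {
    		q, _ := url.ParseQuery(raw)
    		v, err := parseBoolFlag(q, "pending_delete")
    		fmt.Println(v, err)
    	}
    }
    ```

    Under this scheme, the request that caused the incident would have been rejected with a 4xx error instead of returning every prefix.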

    Improved Separation Between Operational and Configured State

    Currently, customer changes to the addressing schema are stored in an authoritative database, which also serves operational actions. This setup complicates manual rollback processes, as engineers must rely on database snapshots rather than reconciling desired versus actual states. The rollback mechanism and database configuration will be redesigned to facilitate quick rollbacks and introduce layers between customer configuration and Production environments.

    Data read from the database and applied to Production will be snapshotted. These snapshots will be deployed using the same health-mediated process as other Production changes, allowing automatic deployment halts if issues arise. This approach means that in future database corruption scenarios, individual customers (or all customers) can be near-instantly reverted to a working version.

    While this might temporarily prevent customers from making direct API updates during an outage, it ensures continued traffic serving while the database is fixed, avoiding downtime. This work aligns with the first and second Code Orange workstreams, focusing on rapid rollback and safe, health-mediated configuration deployment.
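    The snapshot-and-rollback idea might be sketched as follows; all types and names here are illustrative assumptions. Configuration read from the database is captured as immutable versions, and Production can be pointed back at a known-good version without touching the possibly corrupted database.

    ```go
    package main

    import "fmt"

    // Snapshot is an immutable capture of configuration applied to Production.
    type Snapshot struct {
    	Version int
    	Config  map[string]string // prefix -> service binding
    }

    // Deployer tracks snapshot history and which version is live.
    type Deployer struct {
    	history []Snapshot
    	active  int // index into history
    }

    // Capture records a new snapshot and makes it the active version.
    func (d *Deployer) Capture(cfg map[string]string) {
    	d.history = append(d.history, Snapshot{Version: len(d.history) + 1, Config: cfg})
    	d.active = len(d.history) - 1
    }

    // Rollback re-activates the previous snapshot near-instantly, without
    // reading from the (possibly corrupted) database.
    func (d *Deployer) Rollback() {
    	if d.active > 0 {
    		d.active--
    	}
    }

    // Active returns the snapshot currently serving Production.
    func (d *Deployer) Active() Snapshot { return d.history[d.active] }

    func main() {
    	var d Deployer
    	d.Capture(map[string]string{"203.0.113.0/24": "magic-transit"}) // known good
    	d.Capture(map[string]string{})                                  // corrupted: bindings wiped
    	d.Rollback()
    	fmt.Println(d.Active().Version, d.Active().Config)
    }
    ```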

    Enhanced Arbitration for Large Withdrawal Actions

    Monitoring will be enhanced to detect overly rapid or broad changes, such as quick BGP prefix withdrawals or deletions. This will trigger a disablement of snapshot deployments, acting as a circuit breaker to prevent any uncontrolled process manipulating the database from causing a large-scale impact, as seen in this incident.

    Ongoing efforts also include direct monitoring of customer service behavior. Signals from these monitors can activate the circuit breaker, halting potentially dangerous changes until investigations are complete. This work aligns with the first Code Orange workstream, which emphasizes safe deployment of changes.
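    A minimal sketch of such a rate-based circuit breaker, with illustrative thresholds and names: withdrawals are counted within a sliding window, and once the rate exceeds the threshold, further snapshot deployments are halted until an investigation clears them.

    ```go
    package main

    import (
    	"fmt"
    	"time"
    )

    // Breaker halts deployments when too many withdrawals land in a short window.
    type Breaker struct {
    	window    time.Duration
    	threshold int
    	events    []time.Time
    	tripped   bool
    }

    // RecordWithdrawal notes a prefix withdrawal and trips the breaker if
    // the count inside the sliding window reaches the threshold.
    func (b *Breaker) RecordWithdrawal(now time.Time) {
    	b.events = append(b.events, now)
    	cutoff := now.Add(-b.window)
    	// Drop events that have aged out of the window.
    	for len(b.events) > 0 && b.events[0].Before(cutoff) {
    		b.events = b.events[1:]
    	}
    	if len(b.events) >= b.threshold {
    		b.tripped = true
    	}
    }

    // Allow reports whether snapshot deployments may proceed.
    func (b *Breaker) Allow() bool { return !b.tripped }

    func main() {
    	b := &Breaker{window: time.Minute, threshold: 100}
    	now := time.Now()
    	for i := 0; i < 150; i++ {
    		b.RecordWithdrawal(now.Add(time.Duration(i) * 100 * time.Millisecond))
    	}
    	fmt.Println("deployments allowed:", b.Allow()) // deployments allowed: false
    }
    ```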

    The timeline of events, including the change deployment and remediation steps, is detailed below:

    • 2026-02-05 21:53 UTC: Broken sub-process code merged into the system.

    • 2026-02-20 17:46 UTC: Addressing API release with the broken sub-process completes deployment.

    • 2026-02-20 17:56 UTC: Impact Start - Broken sub-process begins execution. Prefix advertisement updates propagate, and prefixes start to be withdrawn.

    • 2026-02-20 18:13 UTC: Cloudflare engaged due to failures on one.one.one.one.

    • 2026-02-20 18:18 UTC: Internal incident declared; Cloudflare engineers continue investigation.

    • 2026-02-20 18:21 UTC: Addressing API engineering team paged; debugging commences.

    • 2026-02-20 18:46 UTC: Issue identified. Broken sub-process terminated by an engineer, and regular execution disabled; remediation begins.

    • 2026-02-20 19:11 UTC: Mitigation begins. Cloudflare engineers start restoring serviceability for withdrawn prefixes, while others focus on removed prefixes.

    • 2026-02-20 19:19 UTC: Some Prefixes Mitigated - Customers begin re-advertising prefixes via the dashboard to restore service.

    • 2026-02-20 19:44 UTC: Additional mitigation continues. Engineers initiate database recovery methods for removed prefixes.

    • 2026-02-20 20:30 UTC: Final Mitigation Process Begins - Engineers complete release to restore withdrawn prefixes with existing service bindings. Work continues on removed prefixes.

    • 2026-02-20 21:08 UTC: Configuration Update Deploys - Engineering begins global machine configuration rollout to restore prefixes not self-mitigated or mitigated by previous efforts.

    • 2026-02-20 23:03 UTC: Configuration Update Completed - Global machine configuration deployment to restore remaining prefixes is finalized. Impact Ends.

    Cloudflare expressed profound apologies for the incident and its impact on customer services and the broader Internet. The company aims to provide a network resilient to change and acknowledges falling short of this commitment. Active improvements are being implemented to enhance stability and prevent recurrence.
