A Critical Patch, a Brief Outage — and Bigger Questions for Internet Resilience
On the morning of December 5, 2025, at 08:47 UTC, a substantial portion of Cloudflare’s global network faltered. Within minutes, thousands of websites, including major online services such as LinkedIn, Zoom and Downdetector, began returning HTTP 500 errors. Roughly 28% of all HTTP traffic carried by Cloudflare was affected in this window, according to the company’s post-mortem.
By 09:12 UTC the same morning, the error-inducing configuration change had been reverted and traffic had returned to normal, meaning the disruption lasted 25 minutes. But the short timespan belied the scope of the impact: globally significant services experienced failures or sluggishness, prompting renewed scrutiny of the fragility of digital infrastructure.
Cloudflare’s CTO, Dane Knecht, later acknowledged in a blog post that the outage was “unacceptable,” particularly so soon after a previous incident on November 18.
What Actually Went Wrong
The root cause of the outage was not an external cyber-attack, a point Cloudflare emphasized. Instead, the disruption stemmed from internal changes made in response to a severe security vulnerability in the server-side component implementation of the React web framework. The flaw, tracked as CVE-2025-55182 (widely referred to as “React2Shell”), had been disclosed earlier in the week and was reportedly already being actively exploited by threat actors.
To shield customers who had not yet patched their applications, Cloudflare began rolling out an emergency mitigation: increasing the buffer size for HTTP request bodies in its Web Application Firewall (WAF) from 128 KB to 1 MB — aligning with the default buffer size used by modern React/Next.js apps.
During this rollout, Cloudflare discovered that an internal WAF testing tool did not support the larger buffer size. That tool was subsequently disabled — via Cloudflare’s global configuration mechanism, which propagates changes rapidly across its entire network. Unlike the buffer size change, this disabling was not deployed gradually.
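The contrast between the gradual buffer-size rollout and the instantly propagated disabling of the testing tool can be illustrated with a small sketch. The Python below is a hypothetical model, not Cloudflare’s deployment tooling: `staged_rollout`, `FakeFleet`, the stage fractions, and the error threshold are all assumptions made for illustration.

```python
from typing import Callable, Sequence


def staged_rollout(
    stages: Sequence[float],
    apply_to_fraction: Callable[[float], None],
    error_rate: Callable[[], float],
    rollback: Callable[[], None],
    max_error_rate: float = 0.01,
) -> bool:
    """Apply a change to growing fractions of a fleet, checking health after each step.

    Returns True if the change reached the final stage, False if it was rolled back.
    """
    for fraction in stages:
        apply_to_fraction(fraction)
        # Stop and undo the change as soon as the observed error rate is too high.
        if error_rate() > max_error_rate:
            rollback()
            return False
    return True


class FakeFleet:
    """Hypothetical fleet model: a bad change breaks every server it reaches."""

    def __init__(self, change_is_bad: bool) -> None:
        self.change_is_bad = change_is_bad
        self.exposed = 0.0  # fraction of servers currently running the change
        self.peak = 0.0     # largest fraction that ever ran the change

    def apply(self, fraction: float) -> None:
        self.exposed = fraction
        self.peak = max(self.peak, fraction)

    def error_rate(self) -> float:
        # Fleet-wide error rate equals the exposed fraction when the change is bad.
        return self.exposed if self.change_is_bad else 0.0

    def rollback(self) -> None:
        self.exposed = 0.0
```

With stages such as `[0.01, 0.1, 1.0]`, a bad change in this model is caught after reaching 1% of the fleet. A global push behaves like the single stage `[1.0]`: the first health check happens only after every server has the change.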
Unfortunately, in Cloudflare’s older “FL1” proxy, still in use in parts of its infrastructure, disabling the testing tool triggered a dormant bug. A block of Lua code used to handle request routing failed because it assumed that a field (the “execute” object in a rule result) was always present, when in this configuration it was not. The result: the proxy threw an exception and returned HTTP 500 errors for every request it attempted to handle under those configurations.
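The class of bug described here, reading through a field that is assumed to exist, is easy to reproduce in any dynamic language. The sketch below is a Python analogy, not Cloudflare’s Lua code: only the “execute” field name comes from the description above, while the `results` key and both function names are hypothetical.

```python
from typing import Any, Mapping, Optional


def sub_results_unsafe(rule_result: Mapping[str, Any]) -> Any:
    # Assumes "execute" is always present. When it is missing this raises
    # KeyError, much as indexing a nil field raises an error in Lua.
    return rule_result["execute"]["results"]


def sub_results_safe(rule_result: Mapping[str, Any]) -> Optional[Any]:
    # Treats a missing "execute" object as "nothing was executed"
    # instead of letting the whole request fail.
    execute = rule_result.get("execute")
    if execute is None:
        return None
    return execute.get("results")
```

In the unsafe variant, one unexpected rule shape turns into an unhandled exception on the request path; the safe variant degrades to an empty result and lets the caller decide what to do.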
Figure: HTTP 500 errors served by Cloudflare’s network during the incident timeframe (red line at the bottom), compared with unaffected total Cloudflare traffic (green line at the top).

As soon as Cloudflare engineers identified the problem, they reverted the change. By 09:12 UTC, normal traffic flow had been restored.
The Broader Context — November’s Outage and a Pattern of Risk
This is not the first time Cloudflare has destabilized large swathes of the internet in recent weeks. On November 18, 2025, the company suffered a far longer outage — lasting around three hours — after a bug in the generation logic for a “Bot Management” configuration file caused widespread HTTP errors and service disruptions across many major platforms.
At the time, websites including social networks, AI-powered services, and content platforms went offline or became unreachable for some period, illustrating how deeply modern digital services depend on relatively few infrastructure providers.
The December 5 incident — though shorter — echoes the same structural weakness: a single misconfigured update triggers cascading failures for a large fraction of internet traffic. Industry observers, already unsettled by the November outage, warned that repeated incidents like this risk eroding trust in large centralized infrastructure providers.
Why Cloudflare Chose, or Felt It Had, to Take the Risk
To some, it might seem surprising that a cybersecurity firm like Cloudflare would risk downtime in the name of defending against an upstream vulnerability. But those familiar with the pace of exploitation insisted that action was urgent. Security reporting indicates that the React2Shell bug was not just theoretical: within hours of its public disclosure, multiple “China-nexus” threat groups reportedly began scanning the internet and launching attacks against unpatched React/Next.js web apps.
From Cloudflare’s perspective, failing to act rapidly would have exposed potentially thousands of customer sites to remote code execution attacks, possibly far more damaging than a brief outage. Indeed, the firm chose to apply mitigations globally, presumably under the assumption that a short disruption was a manageable trade-off. As one security analysis put it, “Cloudflare determined that the React2Shell was so dangerous that it was willing to sustain a short 25-minute outage to fix the bug.”
Still, the decision underscores the tension between security and stability in modern web infrastructure: patching quickly is often essential — but so is ensuring that patching doesn’t itself break things.
What Happens Next — Fixes, Transparency, and Resilience
In the official post-mortem, Cloudflare promised to share detailed plans aimed at preventing single updates from causing widespread impact. Among the measures outlined:
- Improved rollout mechanisms and versioning for configuration updates and security-related changes, to match the care taken when deploying software.
- More robust “break-glass” procedures for critical operations, so that failures in one subsystem don’t cascade network-wide.
- A shift toward a more “fail-open” error-handling model in critical data-plane components: where feasible, mis-configurations or corrupt configuration should default to a safe baseline (e.g., pass traffic without scoring) rather than dropping all requests.
In short: Cloudflare committed to bolstering its internal safeguards, but acknowledged that some of these changes had already been planned and were not yet fully deployed. The December 5 outage, then, is a stark demonstration of what can go wrong in the middle of that transition.
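The “fail-open” model mentioned in the list above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Cloudflare’s implementation: the JSON format, the `SAFE_BASELINE` contents, the `scoring_enabled` and `rules` keys, and the function name are all hypothetical.

```python
import copy
import json
from typing import Any, Dict

# Hypothetical safe baseline: pass traffic without scoring.
SAFE_BASELINE: Dict[str, Any] = {"scoring_enabled": False, "rules": []}


def load_config_fail_open(raw: str) -> Dict[str, Any]:
    """Parse a configuration string, falling back to the baseline if it is unusable."""
    try:
        config = json.loads(raw)
    except json.JSONDecodeError:
        # Corrupt input: keep serving with the baseline instead of failing requests.
        return copy.deepcopy(SAFE_BASELINE)
    # Structurally invalid input is treated the same way as corrupt input.
    if not isinstance(config, dict) or not isinstance(config.get("rules"), list):
        return copy.deepcopy(SAFE_BASELINE)
    return config
```

The design choice is a trade-off rather than a free win: falling back to a baseline keeps traffic flowing, but for a security product it also means requests may pass unfiltered until a valid configuration arrives.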
What It Means for the Internet — and What Users Can Do
For many internet users, the morning disruption may have felt like a fleeting flash of downtime — a handful of “500 Internal Server Error” messages, maybe a pause, then everything back to normal. But the broader significance is far less trivial.
As analysts and commentators point out, outages like these highlight a growing structural vulnerability: a high degree of concentration in global web infrastructure. When a single vendor — even one as large as Cloudflare — missteps, thousands of websites and services can suffer simultaneously.
For businesses that rely on web services, content delivery, or third-party APIs — particularly over the long term — this raises urgent questions about resilience. Do you trust a single vendor? Do you build in fallback paths or a multi-cloud approach? Do you demand greater transparency from your infrastructure providers?
Meanwhile, for infrastructure providers themselves, the December 5 incident is a wake-up call: protecting against external threats (in this case, a critical vulnerability) is essential, but so are internal governance, code hygiene, and rollout discipline.
For now, Cloudflare’s quick fix and pledge of reforms may restore confidence, but the real test will come when future updates are deployed under the revised, hardened systems it has promised.



