November 19, 2025

An intermittent outage at Cloudflare on Tuesday briefly knocked many of the Internet’s top destinations offline. Some affected Cloudflare customers were able to pivot away from the platform temporarily so that visitors could still access their websites. But security experts say doing so may have also triggered an impromptu network penetration test for organizations that have come to rely on Cloudflare to block many types of abusive and malicious traffic.

At around 6:30 a.m. EST / 11:30 UTC on Nov. 18, Cloudflare’s status page acknowledged the company was experiencing “an internal service degradation.” After several hours of Cloudflare services coming back up and failing again, many organizations with websites behind Cloudflare found they could not migrate away from the company’s services because the Cloudflare portal was unreachable and/or because they also got their domain name system (DNS) services from Cloudflare.

However, some customers did manage to pivot their domains away from Cloudflare during the outage. And many of those organizations probably need to take a closer look at their web application firewall (WAF) logs during that time, said Aaron Turner, a faculty member at IANS Research.

Turner said Cloudflare’s WAF does a good job filtering out malicious traffic that matches any one of the top ten types of application-layer attacks, including credential stuffing, cross-site scripting, SQL injection, bot attacks and API abuse. But he said this outage might be a good opportunity for Cloudflare customers to better understand how their own app and website defenses may be failing without Cloudflare’s help.

“Your developers could have been lazy in the past for SQL injection because Cloudflare stopped that stuff at the edge,” Turner said. “Maybe you didn’t have the best security QA [quality assurance] for certain things because Cloudflare was the control layer to compensate for that.”
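
Turner’s point about SQL injection is easy to see in code. The minimal sketch below (the table and column names are hypothetical) shows why: the first function is exploitable the moment nothing upstream is filtering requests, while the parameterized version holds up with or without an edge WAF.

```python
import sqlite3  # stand-in for any SQL backend; the schema here is hypothetical

def find_user_unsafe(conn, username):
    # Vulnerable: attacker-controlled input is spliced into the SQL text.
    # An input like  ' OR '1'='1  changes the query's meaning, which is the
    # class of request an edge WAF may have been quietly absorbing.
    query = f"SELECT id, email FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()

def find_user_safe(conn, username):
    # Parameterized: the driver passes the value separately from the SQL text,
    # so the input is never interpreted as SQL, WAF or no WAF.
    return conn.execute(
        "SELECT id, email FROM users WHERE username = ?", (username,)
    ).fetchall()
```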

Turner said one company he’s working with saw a huge increase in log volume during the outage and is still trying to figure out what was “legit malicious” versus just noise.

“It looks like there was about an eight hour window when several high-profile sites decided to bypass Cloudflare for the sake of availability,” Turner said. “Many companies have essentially relied on Cloudflare for the OWASP Top Ten [web application vulnerabilities] and a whole range of bot blocking. How much badness could have happened in that window? Any organization that made that decision needs to look closely at any exposed infrastructure to see if they have someone persisting after they’ve switched back to Cloudflare protections.”
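
For teams doing that review, even a crude first pass over ordinary web server access logs from the bypass window can separate obvious probe traffic from background noise. The sketch below is illustrative only: the log path, the time window, and the handful of signatures are assumptions, and it is no substitute for proper WAF or SIEM analysis.

```python
import re
from datetime import datetime, timezone

# Hypothetical bypass window (UTC) and log path; adjust to your environment.
WINDOW_START = datetime(2025, 11, 18, 11, 30, tzinfo=timezone.utc)
WINDOW_END = datetime(2025, 11, 18, 19, 30, tzinfo=timezone.utc)
LOG_PATH = "access.log"

# Deliberately coarse signatures for common application-layer probes.
SIGNATURES = {
    "sqli": re.compile(r"(union\s+select|'\s*or\s+'1'\s*=\s*'1|sleep\()", re.I),
    "xss": re.compile(r"(<script|%3cscript|onerror\s*=)", re.I),
    "traversal": re.compile(r"(\.\./|%2e%2e%2f)", re.I),
}

# Matches combined-log lines such as:
# 203.0.113.9 - - [18/Nov/2025:12:01:33 +0000] "GET /login?user=admin HTTP/1.1" ...
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)')

hits = []
with open(LOG_PATH, encoding="utf-8", errors="replace") as fh:
    for line in fh:
        m = LINE_RE.match(line)
        if not m:
            continue
        ip, ts, method, path = m.groups()
        when = datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z")
        if not WINDOW_START <= when <= WINDOW_END:
            continue
        for label, pattern in SIGNATURES.items():
            if pattern.search(path):
                hits.append((when.isoformat(), ip, label, method, path))

for hit in hits:
    print(*hit)
```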

Turner said some cybercrime groups likely noticed when an online merchant they normally stalk stopped using Cloudflare’s services during the outage.

“Let’s say you were an attacker, trying to grind your way into a target, but you felt that Cloudflare was in the way in the past,” he said. “Then you see through DNS changes that the target has eliminated Cloudflare from their web stack due to the outage. You’re now going to launch a whole bunch of new attacks because the protective layer is no longer in place.”
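
That kind of reconnaissance is easy to automate, which also means defenders can run the same check against their own domains. The sketch below is a rough illustration, not a complete detector: it resolves a hostname and tests whether its IPv4 addresses fall inside Cloudflare’s published ranges (Cloudflare lists the authoritative set at cloudflare.com/ips; only a small hard-coded subset is used here). A domain that suddenly stops matching is the same signal an attacker would be watching for.

```python
import ipaddress
import socket

# A few of Cloudflare's published IPv4 ranges; see cloudflare.com/ips for the
# authoritative, current list. This subset is only for illustration.
CLOUDFLARE_V4 = [
    ipaddress.ip_network(n)
    for n in ("104.16.0.0/13", "172.64.0.0/13", "131.0.72.0/22", "188.114.96.0/20")
]

def behind_cloudflare(hostname: str) -> bool:
    """Return True if every resolved IPv4 address sits in a known Cloudflare range."""
    infos = socket.getaddrinfo(hostname, 443, socket.AF_INET, socket.SOCK_STREAM)
    addrs = {ipaddress.ip_address(info[4][0]) for info in infos}
    return bool(addrs) and all(
        any(addr in net for net in CLOUDFLARE_V4) for addr in addrs
    )

if __name__ == "__main__":
    for host in ("example.com",):  # replace with your own domains
        print(host, "behind Cloudflare:", behind_cloudflare(host))
```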

Nicole Scott, senior product marketing manager at the McLean, Va.-based Replica Cyber, called yesterday’s outage “a free tabletop exercise, whether you meant to run one or not.”

“That few-hour window was a live stress test of how your organization routes around its own control plane and shadow IT blossoms under the sunlamp of time pressure,” Scott said in a post on LinkedIn. “Yes, look at the traffic that hit you while protections were weakened. But also look hard at the behavior inside your org.”

Scott said organizations seeking security insights from the Cloudflare outage should ask themselves:

1. What was turned off or bypassed (WAF, bot protections, geo blocks), and for how long?
2. What emergency DNS or routing changes were made, and who approved them?
3. Did people shift work to personal devices, home Wi-Fi, or unsanctioned Software-as-a-Service providers to get around the outage?
4. Did anyone stand up new services, tunnels, or vendor accounts “just for now”?
5. Is there a plan to unwind those changes, or are they now permanent workarounds?
6. For the next incident, what’s the intentional fallback plan, instead of decentralized improvisation?

In a postmortem published Tuesday evening, Cloudflare said the disruption was not caused, directly or indirectly, by a cyberattack or malicious activity of any kind.

“Instead, it was triggered by a change to one of our database systems’ permissions which caused the database to output multiple entries into a ‘feature file’ used by our Bot Management system,” Cloudflare CEO Matthew Prince wrote. “That feature file, in turn, doubled in size. The larger-than-expected feature file was then propagated to all the machines that make up our network.”
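
Cloudflare’s write-up also notes that the software consuming that file had a limit on how large it could be, and the doubled file exceeded it. The sketch below is a loose illustration of that general failure mode, with invented names and limits rather than anything from Cloudflare’s code: a loader that treats overflow of a fixed ceiling as fatal takes the whole process down the moment an oversized file arrives, and propagation turns one bad file into a fleet-wide incident.

```python
# A loose sketch of the failure mode described in the postmortem; the names
# and the limit below are invented for illustration, not Cloudflare's values.
MAX_FEATURES = 200  # hard ceiling the loader was built around

def load_feature_file(lines):
    features = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if len(features) >= MAX_FEATURES:
            # Treating overflow as fatal is what turns one bad config push
            # into a fleet-wide failure once the file propagates everywhere.
            raise RuntimeError("feature file exceeds preallocated limit")
        features.append(line)
    return features

normal = [f"feature_{i}" for i in range(150)]
doubled = normal * 2  # duplicated query output roughly doubles the entry count

print(len(load_feature_file(normal)))   # loads fine
try:
    load_feature_file(doubled)
except RuntimeError as err:
    print("loader failed:", err)        # the doubled file trips the limit
```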

Cloudflare estimates that roughly 20 percent of websites use its services. With much of the modern web also relying heavily on a handful of other cloud providers, including AWS and Azure, even a brief outage at one of these platforms can become a single point of failure for many organizations.

Martin Greenfield, CEO at the IT consultancy Quod Orbis, said Tuesday’s outage was another reminder that many organizations may be putting too many of their eggs in one basket.

“There are several practical and overdue fixes,” Greenfield advised. “Split your estate. Spread WAF and DDoS protection across multiple zones. Use multi-vendor DNS. Segment applications so a single provider outage doesn’t cascade. And continuously monitor controls to detect single-vendor dependency.”
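
One low-effort way to start on that last point is to inventory which DNS provider each of your domains actually depends on. The sketch below is only a starting point under stated assumptions: it uses the third-party dnspython package, a placeholder domain list, and a crude grouping of nameservers by their registered domain to flag a single-provider dependency.

```python
from collections import defaultdict

import dns.resolver  # third-party "dnspython" package (pip install dnspython)

# Hypothetical inventory; replace with your organization's domains.
DOMAINS = ["example.com", "example.org"]

providers = defaultdict(set)
for domain in DOMAINS:
    try:
        answers = dns.resolver.resolve(domain, "NS")
    except Exception as exc:
        print(f"{domain}: NS lookup failed ({exc})")
        continue
    for record in answers:
        ns = str(record.target).rstrip(".")
        # Crude provider grouping by the nameserver's last two labels.
        providers[".".join(ns.split(".")[-2:])].add(domain)

for provider, domains in sorted(providers.items()):
    print(f"{provider}: {', '.join(sorted(domains))}")

if len(providers) == 1:
    print("All domains resolve through a single DNS provider; consider a secondary.")
```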


19 thoughts on “The Cloudflare Outage May Be a Security Roadmap”

  1. DadJokes78

    I’m torn about their postmortem. I applaud their candor and transparency, and I wish more companies would do that. Overall this was a refreshing change from the kind of ‘root cause analysis’ that most companies put out that are nothing of the sort.

    And then they had to ruin that perfect 10/10 with pitching their products at the end, including talking about how Cloudflare helps customers on their journey to ‘Zero Trust’ in complete deafness to the fact that their own post-mortem says they don’t practice what they preach re: they trusted the developer input and didn’t sanitize it like they would user-generated input, which is the opposite of zero-trust. They should have just stopped on the high note.

    1. ramalamadingdong

      I think ‘Zero Trust’ happens to be the perfect phrase for the times; I suspect we’ll be in this milieu forever, there, Phil.

      1. DadJokes78

        To be fair, the technical analysis didn’t try to hide their mistakes, or handwave it away, or make excuses. It was beautiful. My comment really was more about keeping your marketing folks away from your post-mortems.

    1. lowly pipette

      Hey, Catwhisperer, did you know IP addresses are declarations of love or being hunted from strangers? Ask me about a great deal on a 46.

  2. OldFuddyDuddy

    In my experience, app developers and operations groups don’t do a great job at protecting their apps. It’s very hard to do comprehensively. They can pass PCI audits, do many things right, and still be at risk. External WAFs are essential. Automated bots would thus probably get some penetration in the window with Cloudflare down. I wonder how many criminal orgs would be able to see Cloudflare down, and pivot to take advantage of that window?

  3. mark

    Why was there a permission change? The permissions on that should *not* be changing.

    Then there’s the software issue – when it hit 200, it should have either rolled the list or cut it; it should *not* have crashed.

  4. Jason Reed

    Good post — this outage really shines a light on how much we rely on Cloudflare as a security layer. As Krebs points out, it’s a wake-up call: companies need to audit what protections they’ve let Cloudflare handle, and build fallback plans.

    1. mealy

      Well Greenfield’s suggestions about diversifying ops are good, obvious-on-their-face observations about eggs in single baskets, but the last line is key: “And continuously monitor.” The more you diversify, the more you have to monitor. More is not always better! Cloudflare as a single-service solution is appealing for this reason. You don’t have to have (and maintain) the diversified on-point IT team with expertise in all the various aspects; you have a few core people and a streamlined firewatch. Companies that can’t/won’t/don’t maintain that sizeable in-house expert team for cost or other personnel reasons aren’t even going to be able to diversify AND keep things buttoned up effectively. Auditing is outsourced too. So it’s a big wake-up call for a patient under heavy sedation. But what if there were a competitor to CF that did basically all the things they do, differentiated on the backend for exactly this type of backup/redundancy? Or a few of them? If there’s only one basket that can ‘handle’ all the eggs, it’s a problem that I’d guess most medium-sized orgs are not going to effectively step up and solve on their own. A basket with holes is not much better than no baskets at all.

      1. gonna sell that monitor 25 years ago soon, there, piper

        Cloudflare is about as amazing as crowdsourcing. How we doing at our progress towards dystopia, there, Mother Earth?

        I’d respond more but apparently Mono on my resume is making me sick.

        Good job!

  5. Joe

    As an Akamai customer, I am just sorry to read that most people assume outages are the new trend. Like it’s normal.

    Hope you enjoyed Cloudflare being down for the fourth time in 2025. You guys level down (including your business).

    1. Mike H.

      I think the outages in 2018/2019 were more memorable to me. I kinda remember them from around when Krakatau was forming.

  6. Hurricane Andrew

    I also applaud their candor and quick turnaround for this postmortem. What I still don’t grasp, however, is how a single file doubling in size could create such a cascade. If the file were constantly doubling, that’s one thing, but that’s not what they said. I still think there’s a bit more to this than what’s been disclosed so far, and if that’s for opsec reasons, I’m totally fine with that. Just strikes me as a bit of a head scratcher at this point.

    1. mealy

      It said it was larger than expected, maybe so large it crashed the service – or concatenated into looping etc. When things start to snowball they can lead to all kinds of unpredictable failure models. This was in their bot-sniffer protections so if looking for the ‘wrong’ things it’s tons of false positives. That can trigger automatic mitigations… etc.

      1. mealy

        “The explanation was that the file was being generated every five minutes by a query running on a ClickHouse database cluster, which was being gradually updated to improve permissions management. Bad data was only generated if the query ran on a part of the cluster which had been updated. As a result, every five minutes there was a chance of either a good or a bad set of configuration files being generated and rapidly propagated across the network.

        This fluctuation made it unclear what was happening as the entire system would recover and then fail again as sometimes good, sometimes bad configuration files were distributed to our network.”

        Best laid plans.

    2. mealy

      Perhaps the doubling is not the critical flaw; maybe it was what was IN the doubled file that killed things.

      1. mealy

        “The software had a limit on the size of the feature file that was below its doubled size. That caused the software to fail.” Reading > Guessing. Failure point: me.

    3. ramalamadingdong

      How’s your subscription to Mark Monitor going? Mine seems to be having issues with MDNS.

  7. Impossibly Stupid

    > “Your developers could have been lazy in the past for SQL injection because Cloudflare stopped that stuff at the edge,” Turner said. “Maybe you didn’t have the best security QA [quality assurance] for certain things because Cloudflare was the control layer to compensate for that.”

    In my experience, companies with hires like that simply don’t have anybody in-house to undo the knock-on effects of Cloudflare failures. When management fosters a corporate culture of externalizing those kinds of *basic* job practices, incidents like this aren’t a wake up call, they are a CYA scramble.

    > Turner said one company he’s working with saw a huge increase in log volume and they are still trying to figure out what was “legit malicious” versus just noise.

    It’s *all* legit! I see PHP requests in my web server logs every day, but I don’t use PHP for anything. That’s *not* noise, that is a network signaling me that it is an active source of attacks! Cloudflare’s greatest sin is that it does *nothing* to remove those bad actors from the Internet, and instead seeks to profit from sweeping that malicious traffic under the rug.

    > “Yes, look at the traffic that hit you while protections were weakened. But also look hard at the behavior inside your org.”

    Absolutely right. But, again, these are unlikely to be organizations that are doing *anything* that would attract people with that skill set to work for them. Odds are, they aggressively acted to *purge* anybody like that when they decided to go with Cloudflare in the first place, patting themselves on the back and getting a big bonus.
