That's an unwrap: thoughts on Cloudflare's FL2 outage

On November 18th, 2025, the global internet faced turmoil when Cloudflare’s network went down for a few hours.

Although I now work on a product that has no direct dependency on Cloudflare, it was interesting to see how the outage exposed a long-distance fragility in our internal systems, simply because so many of our third-party partners went down with it. At some point, our main product experience was so degraded that we found ourselves facing a P1 incident as well. Another lesson in chaos engineering, I suppose!

Cloudflare published a post-mortem covering what happened in detail; I definitely recommend reading it. A couple of factors drove the outage, but the subject of this writing is how poor error handling in Rust-powered code played a key role in what happened.

A semantic footgun

One of the things that I truly love about Rust is its take on error handling, heavily inspired by functional programming. In my opinion, Rust gets a lot of things right on this topic, and in particular, I now find myself struggling when I have to work with programming languages that do not natively support errors as values.

Obviously, “getting a lot of things right” does not imply perfection. Rust’s take on error handling has some footguns of its own, especially when considering the semantics of methods associated with the Option and Result types.

In this case, the culprit of the outage was the unwrap method of the Result type. Below I share a version adapted from the snippet one can find in Cloudflare’s post-mortem:

///
/// Adapted from: https://blog.cloudflare.com/18-november-2025-outage
///
pub fn fetch_features(
    &mut self,
    input: &dyn BotsInput,
    features: &mut Features,
) -> Result<(), (ErrorFlags, i32)> {

    features.checksum &= 0xFFFF_FFFF_0000_0000;
    features.checksum |= u64::from(self.config.checksum);

    let (feature_values, _) = features
        .append_with_names(&self.config.feature_names)
        .unwrap(); // <- Should we do this?

    // more code here
}

According to the terminology used in the Rust book, Rust groups errors into two distinct categories: recoverable and unrecoverable. Recoverable errors are modeled with the aforementioned Result type, while unrecoverable errors terminate the program execution, e.g. when using the panic! macro.

However, some operations over a recoverable error might produce an unrecoverable outcome, and that was exactly what happened in the aforementioned outage.

The Result::unwrap and Result::expect methods are shortcuts intended to access the value wrapped in the Ok variant of a Result, but they trigger an unrecoverable error (a panic) if called on an Err variant instead. In the case of Cloudflare’s outage, the programmer trusted input data whose constraints were defined externally, and assumed it would be safe to unwrap whatever the Features::append_with_names method returned. Therefore, the aforementioned code reveals a validation issue, not a Rust issue per se.
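To make the jump from recoverable to unrecoverable more concrete, here is a minimal sketch of what unwrap boils down to. The function name unwrap_by_hand is mine and purely illustrative; the standard library implementation differs in its details, but the observable behavior should be roughly the same:

```rust
use std::fmt::Debug;

/// Hand-written approximation of `Result::unwrap` (illustrative only,
/// not the actual standard library source).
fn unwrap_by_hand<T, E: Debug>(result: Result<T, E>) -> T {
    match result {
        // Happy path: hand the wrapped value back to the caller.
        Ok(value) => value,
        // Unhappy path: the recoverable error is turned into a panic,
        // so the caller never gets a chance to handle it.
        Err(err) => panic!("called `Result::unwrap()` on an `Err` value: {err:?}"),
    }
}
```

The Err arm is the whole story: the moment it is reached, a value that could have been handled becomes a panic.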

Personally, after starting with Rust some years ago, I began avoiding Result::unwrap and defaulted to using Result::expect in the very few situations where I decided that crashing the program was the correct course of action, primarily because the latter allows a meaningful message to be attached to the program termination.
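As a small sketch of why that message matters, the snippet below captures the panic raised by expect and extracts its text. Both load_port and panic_message are hypothetical helpers I made up for this illustration, and the approach assumes the default unwinding panic strategy:

```rust
use std::panic;

/// Hypothetical startup helper: crashing is deliberate here, and the
/// message tells whoever reads the crash log what was expected.
fn load_port(raw: &str) -> u16 {
    raw.parse::<u16>()
        .expect("PORT must be a number between 0 and 65535")
}

/// Runs `f` and returns the panic message, if `f` panicked.
fn panic_message<F: FnOnce() + panic::UnwindSafe>(f: F) -> Option<String> {
    let payload = panic::catch_unwind(f).err()?;
    // Panic payloads are usually a `String` or a `&'static str`.
    payload
        .downcast_ref::<String>()
        .cloned()
        .or_else(|| payload.downcast_ref::<&str>().map(|s| s.to_string()))
}
```

Calling panic_message(|| { load_port("eighty"); }) should yield a message starting with the text passed to expect, followed by the debug representation of the underlying parse error.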

I now consider the use of Result::unwrap and Option::unwrap code smells in Rust codebases, with a single exceptional (and debatable) case to discuss later.

The reality is that both unwrap and expect suffer from a fundamental naming problem, as these function names do not capture the outcome when execution follows the unhappy path. This opinion is also held by well-respected individuals within the Rust community.

Since unwrap is a convenient - and naive! - way to circumvent pattern matching when handling errors, and because in many scenarios naivety or lack of attention will not cause an internet outage anyway, the semantic footgun is always waiting around the corner, and it is ultimately up to the Rust programmer to avoid it.
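For contrast, here is a minimal sketch of the pattern-matching route applied to a configuration reload. Everything in it is an assumption made for illustration (FeatureConfig, ConfigError, validate, reload and the arbitrary FEATURE_LIMIT value); it is not Cloudflare's code, and I am not claiming this is how they fixed the issue:

```rust
/// Arbitrary cap chosen for this sketch.
const FEATURE_LIMIT: usize = 200;

#[derive(Debug, Clone, PartialEq)]
struct FeatureConfig {
    names: Vec<String>,
}

#[derive(Debug, PartialEq)]
enum ConfigError {
    TooManyFeatures { got: usize, limit: usize },
}

/// Validates externally produced data instead of trusting it.
fn validate(candidate: FeatureConfig) -> Result<FeatureConfig, ConfigError> {
    if candidate.names.len() > FEATURE_LIMIT {
        return Err(ConfigError::TooManyFeatures {
            got: candidate.names.len(),
            limit: FEATURE_LIMIT,
        });
    }
    Ok(candidate)
}

/// Keeps the last known-good configuration when the candidate is rejected,
/// and hands the error back so the caller can log or alert on it.
fn reload(
    current: FeatureConfig,
    candidate: FeatureConfig,
) -> (FeatureConfig, Option<ConfigError>) {
    match validate(candidate) {
        Ok(valid) => (valid, None),
        Err(err) => (current, Some(err)),
    }
}
```

The design choice worth noticing is that the error stays a value all the way up: the process keeps serving with the previous configuration, and the decision about how loudly to complain is left to the caller.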

Linting to the rescue

Unsurprisingly, Rust’s tooling already has an answer to this issue.

Clippy, the official linter, supports a way to automatically flag this code smell and break a build when one forgets about it. I’ve been using this setup (or some variation of it) in all my Rust projects.

In your root Cargo.toml, add:

[workspace.lints.clippy]
unwrap_used = "deny"

Then, in the Cargo.toml of each package (either the root one or any nested within your workspace), add:

[lints]
workspace = true

That’s it. Easy and simple. Furthermore, you can tweak clippy even more, especially when aiming for panic-free code:

[workspace.lints.clippy]
expect_used = "deny"
unwrap_used = "deny"

# just deny more things
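With both lints denied, code has to express its fallbacks explicitly. The sketch below shows the kind of rewrite I usually end up with; SettingError, parse_workers and workers_or_default are hypothetical names for illustration. It relies on the ? operator, ok_or, map_err and unwrap_or, none of which are flagged by unwrap_used or expect_used:

```rust
#[derive(Debug, PartialEq)]
enum SettingError {
    Missing(&'static str),
    Invalid(String),
}

/// Parses an optional raw setting without ever panicking.
fn parse_workers(raw: Option<&str>) -> Result<u16, SettingError> {
    // `ok_or` turns a missing value into an error, and `?` propagates it.
    let text = raw.ok_or(SettingError::Missing("workers"))?;
    text.trim()
        .parse::<u16>()
        .map_err(|err| SettingError::Invalid(err.to_string()))
}

/// Falls back to an explicit default instead of crashing.
fn workers_or_default(raw: Option<&str>) -> u16 {
    parse_workers(raw).unwrap_or(4)
}
```

Whether a silent default is acceptable depends on the setting, of course; the point is that the fallback becomes a visible decision in the code rather than an implicit crash.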

The only situation where I believe explicit usage of Result::unwrap or Option::unwrap is tolerable is within test code, as it can make statements shorter to read. Furthermore, a panic within the arrange or act phases of the test might flag an error in the test code itself, and failing the test eagerly may be useful.

Personally, I also prefer to rely on expect rather than unwrap in test code, mainly because I want meaningful descriptions if my test aborts before reaching my assertions. However, if one wants to allow unwrap only in tests, this can be done using a specific entry in the clippy.toml file:

allow-unwrap-in-tests = true
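As a sketch of what I mean by preferring expect in tests, consider the hypothetical parse_pair function below together with its test module. If the fixture stops parsing, the test aborts with a message that points at the arrange phase rather than at a bare unwrap:

```rust
/// Hypothetical parser for `name=weight` lines, used only for illustration.
fn parse_pair(line: &str) -> Result<(String, u32), String> {
    let (name, value) = line
        .split_once('=')
        .ok_or_else(|| format!("missing '=' in {line:?}"))?;
    let weight = value
        .trim()
        .parse::<u32>()
        .map_err(|err| format!("invalid weight in {line:?}: {err}"))?;
    Ok((name.trim().to_string(), weight))
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn parses_a_well_formed_pair() {
        // Arrange/act: a failure here means the fixture itself is broken.
        let (name, weight) =
            parse_pair("bot_score=30").expect("fixture line should parse");

        // Assert
        assert_eq!(name, "bot_score");
        assert_eq!(weight, 30);
    }
}
```

Note that with expect_used set to deny, this test module would still need its own allowance (for instance a scoped allow attribute on the module), since the clippy.toml entry above only covers unwrap.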

Final remarks

Rust is more present in critical infrastructure today than it was yesterday. This fact sometimes annoys people or makes them envious, and some spend time writing nonsense like “Rust safety guarantees are just marketing”.

As a programming language, Rust never promised bug-free code. There are plenty of ways one can shoot oneself in the foot, and it is up to the Software Engineer to make the most of existing tools to avoid common mistakes.

That being said, I believe the unwrap and expect methods associated with Result and Option types could be deprecated in favor of better-named replacements and potentially removed from std in a future edition of Rust. Meanwhile, more opinionated clippy configurations could be made the default when generating new Rust projects with cargo, making people aware of well-known issues as soon as possible.

In any case, Cloudflare’s outage is already settled as part of the history of the internet. I appreciate the level of detail Cloudflare provided in the post-mortem, but the question I see no one asking about it, and which I ask here, is: was this silly Rust mistake - which is so common among people learning Rust, and so easy to copy from all sorts of related content available on the public internet - made by an AI Agent who codes like a 10x Junior Engineer?

Some food for thought, I guess…

Although I find it funny that the outage helped OpenAI and Anthropic, since both stopped losing money for a few hours, if my business had been harmed by such an outage, I would expect a clear statement from Cloudflare on this particular topic.

After all, when paying customers lose money, they may decide to enforce contractual agreements. And, just like that, one more market advantage prophesied by AI preachers just vanishes, at the speed of millions of requests per second.
