What backpressure actually means
Backpressure is a flow-control technique. When a downstream component — a database, a queue consumer, a microservice — is processing work more slowly than it is receiving it, backpressure gives it a mechanism to signal that fact upstream, so the producer can slow down, pause, or shed load rather than continuing to flood the system.
The term comes from fluid dynamics, where it describes resistance in a pipe that pushes back against flow. In computing, the analogy holds: without resistance, a fast producer and a slow consumer will eventually exhaust memory, saturate a queue, or trigger cascading failures across dependent services.
Why this keeps being rediscovered
Backpressure is not a new idea. TCP's receive window, the flow-control mechanism that prevents a sender from overwhelming a receiver, is a form of backpressure baked into one of the internet's foundational protocols. (The congestion window plays the analogous role for the network path between the two endpoints.) Unix pipes implement it at the operating system level: by default, a writer blocks when the pipe buffer is full until the reader catches up.
Reactive programming frameworks, including those following the Reactive Streams specification (a standard for asynchronous stream processing with non-blocking backpressure, formalized in 2015 and later incorporated into Java 9), made backpressure a first-class concept at the application layer.
And yet, as Lucas F. Costa argues in a detailed technical post, the pattern is routinely absent from the service architectures engineers build today. The reasons are structural: microservices communicate over HTTP or message queues that, by default, do not propagate capacity signals backward through a call chain. A service that is overwhelmed returns errors or times out; it does not, unless explicitly designed to, tell its callers to slow down.
The failure modes backpressure prevents
The most common failure pattern in its absence is the retry storm. A slow downstream service begins returning errors. Callers, following standard reliability guidance, retry. Retries increase load on an already-struggling service. The service degrades further. More retries follow. The cycle is self-reinforcing and can take down services that were healthy before the incident began.
Unbounded queues create a related problem. A queue that accepts work indefinitely appears to absorb load spikes gracefully — until it doesn't. When the consumer finally falls far enough behind, the queue grows until memory is exhausted, or the latency of processing any given item becomes so high that it is effectively useless. The queue provided the illusion of resilience without the substance of it.
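The difference between a bounded and an unbounded queue can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the helper name `offer` and the capacity of 3 are assumptions chosen for the example. The point is that a full queue produces an explicit signal the producer has to handle, rather than silently growing.

```python
import queue


def offer(q: queue.Queue, item, timeout: float = 0.0) -> bool:
    """Try to enqueue an item; return False when the queue is full.

    `offer` is a hypothetical helper for this sketch, not a standard API.
    """
    try:
        if timeout > 0:
            # Block for a bounded time: the producer is slowed down.
            q.put(item, timeout=timeout)
        else:
            # Fail fast: the producer must shed the item or try again later.
            q.put_nowait(item)
        return True
    except queue.Full:
        return False


# Capacity of 3 is an arbitrary value for illustration.
work = queue.Queue(maxsize=3)

# With no consumer draining the queue, only the first three offers succeed.
accepted = [offer(work, n) for n in range(5)]
```

Whether the producer blocks briefly or is rejected immediately is a policy decision; either way the queue's limit is visible to the producer instead of being discovered when memory runs out.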
Autoscaling, a common proposed remedy, addresses neither problem directly. It adds capacity reactively, after a signal (CPU utilization, queue depth, request latency) crosses a threshold. The lag between signal and new capacity coming online — typically measured in minutes — is long enough for a spike to cause significant damage. Backpressure, by contrast, shapes load in real time.
What implementing it actually requires
Backpressure is not a library you add; it is a design decision that propagates through a system. A service that wants to apply backpressure must first be able to measure its own capacity — how much work it can accept without degrading. It must then expose that signal in a form callers can act on: an HTTP 429 (Too Many Requests) response, a queue that stops accepting messages, a gRPC flow-control signal.
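One simple way to approximate "measure capacity, then expose the signal" is a cap on in-flight work. The sketch below is framework-agnostic and makes several assumptions: the class name `AdmissionGate`, the `(status, headers, body)` return shape, and the fixed `Retry-After` value are all hypothetical, and a real service would likely derive its limit from observed latency or resource usage rather than a constant.

```python
import threading


class AdmissionGate:
    """Reject work with a 429-style response once a concurrency limit is hit.

    Hypothetical helper for illustration; not part of any framework.
    """

    def __init__(self, max_in_flight: int, retry_after_s: int = 1):
        self._slots = threading.BoundedSemaphore(max_in_flight)
        self._retry_after_s = retry_after_s

    def handle(self, work):
        # Non-blocking acquire: if no slot is free, signal upstream
        # immediately instead of queueing the request internally.
        if not self._slots.acquire(blocking=False):
            headers = {"Retry-After": str(self._retry_after_s)}
            return 429, headers, None
        try:
            return 200, {}, work()
        finally:
            # Free the slot whether the work succeeded or raised.
            self._slots.release()


gate = AdmissionGate(max_in_flight=1)

# While the outer call holds the only slot, the nested call is rejected.
status, headers, body = gate.handle(lambda: gate.handle(lambda: "inner"))
```

Here the outer call succeeds, and its body is the rejected inner response carrying a `Retry-After` hint. The hint is only useful if callers read it, which is the subject of the next paragraph.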
Callers must be designed to receive and respect that signal. This is the part that is most often missing. A service that returns 429 to a caller that simply retries immediately has not implemented backpressure; it has implemented a more expensive version of the original problem.
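A caller that respects the signal might look roughly like the following. This is a sketch under stated assumptions: `call_with_backoff` and its parameters are hypothetical names, `send` is assumed to be any callable returning `(status, headers, body)`, and only the delay-in-seconds form of `Retry-After` is handled (the header may also carry an HTTP date, which is ignored here). When the server gives no hint, the caller falls back to exponential backoff with random jitter so that many clients do not retry in lockstep.

```python
import random
import time


def call_with_backoff(send, max_attempts=5, base_delay_s=0.1,
                      max_delay_s=5.0, sleep=time.sleep):
    """Call send() and back off on 429 instead of retrying immediately.

    Hypothetical helper: send() must return (status, headers, body).
    Returns (status, body); gives up with (429, None) after max_attempts.
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # Out of attempts: surface the rejection to our own caller.

        retry_after = headers.get("Retry-After")
        if retry_after is not None and retry_after.isdigit():
            # The server told us how long to wait; honor it.
            delay = float(retry_after)
        else:
            # No usable hint: exponential backoff with full jitter.
            ceiling = min(max_delay_s, base_delay_s * 2 ** attempt)
            delay = random.uniform(0, ceiling)
        sleep(delay)
    return 429, None


# Usage with a scripted fake server and a recording sleep function.
responses = iter([
    (429, {"Retry-After": "2"}, None),
    (429, {}, None),
    (200, {}, "ok"),
])
slept = []
result = call_with_backoff(lambda: next(responses), sleep=slept.append)
```

The attempt cap matters as much as the delay: a caller that gives up and propagates the 429 passes the signal one hop further upstream instead of absorbing it in an endless retry loop.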
Finally, there is the question of what happens at the edge of the system — the point where load cannot be pushed further upstream. At that boundary, a system must make an explicit choice: shed load (drop requests, return errors to end users), buffer (accept the work but delay it), or surface the constraint visibly so that operators can respond. Each choice has tradeoffs. None of them is free. The value of backpressure is not that it eliminates that choice, but that it makes the choice deliberate rather than accidental.
The broader argument
Costa's framing — that backpressure is 'all you need' — is deliberately provocative. The stronger version of the claim is that a large proportion of distributed systems reliability failures are, at their root, backpressure failures: systems that lacked a mechanism to say 'not yet' and paid for it when load exceeded capacity.
That framing is speculative in its generality, and should be read as such. What is less speculative is the narrower point: backpressure is a well-understood, well-proven mechanism that remains underimplemented in application-layer service design, and the cost of that gap shows up reliably in incident reports.