Designing a Distributed Rate Limiter
Token bucket vs sliding window, where to store the counters, and how to keep a rate limiter correct when it's spread across many servers.
The Problem
You need to cap each API client at, say, 100 requests per minute. On a single server this is a counter in memory. Across a fleet of servers behind a load balancer, that counter has to be shared — and now you have a distributed systems problem hiding inside a one-line requirement.
Why It Matters
Rate limiting protects you from abuse, runaway clients, and accidental self-inflicted load. But a limiter that's wrong under concurrency either lets too much through (no protection) or rejects legitimate traffic (angry users). Correctness under races is the whole game.
Core Concepts
Two algorithms dominate:
- Token bucket: a bucket refills at a fixed rate up to a cap; each request removes a token. It allows bursts up to the bucket size, which is usually what you want.
- Sliding window: count requests in the trailing time window. More precise, but requires tracking timestamps or weighting two fixed windows.
The hard part isn't the algorithm — it's making the counter update atomic across many servers.
Implementation
Redis is the usual home for the shared counter, and a Lua script makes the read-modify-write atomic so concurrent servers can't race:
-- token_bucket.lua: KEYS[1]=bucket, ARGV: rate, capacity, now, requested
local tokens = tonumber(redis.call("HGET", KEYS[1], "tokens") or ARGV[2])
local last = tonumber(redis.call("HGET", KEYS[1], "ts") or ARGV[3])
local elapsed = math.max(0, tonumber(ARGV[3]) - last)
local refill = elapsed * tonumber(ARGV[1])
tokens = math.min(tonumber(ARGV[2]), tokens + refill)
local allowed = tokens >= tonumber(ARGV[4])
if allowed then tokens = tokens - tonumber(ARGV[4]) end
redis.call("HSET", KEYS[1], "tokens", tokens, "ts", ARGV[3])
redis.call("EXPIRE", KEYS[1], 60)
return allowed and 1 or 0
Because the whole script runs atomically inside Redis, two servers checking the same key at the same instant can't both spend the last token.
Common Mistakes
- Read-then-write in app code.
GETthe count, increment,SETit back — two servers interleave and both think they're under the limit. Use an atomic operation or a Lua script. - Per-server counters. Limiting per instance means the real limit is your cap times the number of servers.
- Fixed windows at the boundary. A naive per-minute counter allows a double burst straddling the minute boundary. Sliding windows smooth this out.
Production Considerations
Decide what happens when the limiter store is unavailable. Fail open (allow traffic) keeps the product working but drops protection; fail closed (reject) protects the backend but turns a Redis blip into an outage. Most user-facing APIs fail open with an alert; abuse-sensitive endpoints fail closed.
Security
Key the limiter on something the client can't trivially rotate — an authenticated account id beats a raw IP, which is shared behind NAT and easy to spoof at the edge. Layer limits: per-key, per-IP, and a global ceiling.
Performance
A single Redis round-trip per request adds well under a millisecond and scales to tens of thousands of checks per second on one node. If that round-trip becomes the bottleneck, shard by key prefix or push a coarse first check to the edge.
Summary
Pick token bucket for burst-friendly limits, keep the counter in a shared store, and make every update atomic with a Lua script so concurrency can't corrupt it. Decide your fail-open vs fail-closed policy deliberately, and key limits on identity rather than IP. The algorithm is simple; correctness under races is the engineering.
The weekly engineering digest
Production-grade engineering writing in your inbox. No spam, unsubscribe anytime.