Skip to content
$EngineeringAtlas

Designing a Distributed Rate Limiter

Token bucket vs sliding window, where to store the counters, and how to keep a rate limiter correct when it's spread across many servers.

Amit Kumar Singh3 min read

The Problem

You need to cap each API client at, say, 100 requests per minute. On a single server this is a counter in memory. Across a fleet of servers behind a load balancer, that counter has to be shared — and now you have a distributed systems problem hiding inside a one-line requirement.

Why It Matters

Rate limiting protects you from abuse, runaway clients, and accidental self-inflicted load. But a limiter that's wrong under concurrency either lets too much through (no protection) or rejects legitimate traffic (angry users). Correctness under races is the whole game.

Core Concepts

Two algorithms dominate:

  • Token bucket: a bucket refills at a fixed rate up to a cap; each request removes a token. It allows bursts up to the bucket size, which is usually what you want.
  • Sliding window: count requests in the trailing time window. More precise, but requires tracking timestamps or weighting two fixed windows.

The hard part isn't the algorithm — it's making the counter update atomic across many servers.

Implementation

Redis is the usual home for the shared counter, and a Lua script makes the read-modify-write atomic so concurrent servers can't race:

-- token_bucket.lua: KEYS[1]=bucket, ARGV: rate, capacity, now, requested
local tokens = tonumber(redis.call("HGET", KEYS[1], "tokens") or ARGV[2])
local last   = tonumber(redis.call("HGET", KEYS[1], "ts") or ARGV[3])

local elapsed = math.max(0, tonumber(ARGV[3]) - last)
local refill  = elapsed * tonumber(ARGV[1])
tokens = math.min(tonumber(ARGV[2]), tokens + refill)

local allowed = tokens >= tonumber(ARGV[4])
if allowed then tokens = tokens - tonumber(ARGV[4]) end

redis.call("HSET", KEYS[1], "tokens", tokens, "ts", ARGV[3])
redis.call("EXPIRE", KEYS[1], 60)
return allowed and 1 or 0

Because the whole script runs atomically inside Redis, two servers checking the same key at the same instant can't both spend the last token.

Common Mistakes

  • Read-then-write in app code. GET the count, increment, SET it back — two servers interleave and both think they're under the limit. Use an atomic operation or a Lua script.
  • Per-server counters. Limiting per instance means the real limit is your cap times the number of servers.
  • Fixed windows at the boundary. A naive per-minute counter allows a double burst straddling the minute boundary. Sliding windows smooth this out.

Production Considerations

Decide what happens when the limiter store is unavailable. Fail open (allow traffic) keeps the product working but drops protection; fail closed (reject) protects the backend but turns a Redis blip into an outage. Most user-facing APIs fail open with an alert; abuse-sensitive endpoints fail closed.

Security

Key the limiter on something the client can't trivially rotate — an authenticated account id beats a raw IP, which is shared behind NAT and easy to spoof at the edge. Layer limits: per-key, per-IP, and a global ceiling.

Performance

A single Redis round-trip per request adds well under a millisecond and scales to tens of thousands of checks per second on one node. If that round-trip becomes the bottleneck, shard by key prefix or push a coarse first check to the edge.

Summary

Pick token bucket for burst-friendly limits, keep the counter in a shared store, and make every update atomic with a Lua script so concurrency can't corrupt it. Decide your fail-open vs fail-closed policy deliberately, and key limits on identity rather than IP. The algorithm is simple; correctness under races is the engineering.

Amit Kumar Singh

// written by

Amit Kumar Singh

Software engineer writing about backend systems, cloud, and the realities of running code in production.

$ subscribe --weekly

The weekly engineering digest

Production-grade engineering writing in your inbox. No spam, unsubscribe anytime.

## related

[Backend]▲ trending

Designing Idempotent APIs That Survive Retries

Networks fail, clients retry, and duplicate requests happen. Here's how to design write endpoints that produce the same result no matter how many times they're called.

Amit Kumar Singh3 min read