Rate limiting in payment systems is different from rate limiting in a typical web API. A false positive – rejecting a legitimate authorization – is a failed transaction. A customer’s card gets declined at checkout. That is not an acceptable failure mode.
This article walks through the design of a rate limiter that protects infrastructure without creating false declines.
## The Architecture
A payment authorization pipeline typically looks like this:
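The original figure did not survive; a minimal sketch based on the component names used in this article (the surrounding stages are illustrative):

```
requests ──> Parser API ──> [ Rate Limiter ] ──> Distributor API ──> ...
                                   │
                                   └──> reject (decline)
```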
The rate limiter sits between the Parser API and the Distributor API. It must make a decision in under 1ms – any longer and it becomes the bottleneck in a sub-second pipeline.
## Choosing an Algorithm
There are three common approaches, each with different trade-offs:
### Token Bucket
The token bucket algorithm maintains a counter that refills at a fixed rate. Each request consumes one token. When tokens are exhausted, requests are rejected.
The math is straightforward. Given a bucket capacity $C$ and a refill rate $r$ tokens per second, the maximum burst size equals $C$, and the sustained throughput equals $r$.
$$\text{tokens}(t) = \min\left(C,\ \text{tokens}(t_0) + r \cdot (t - t_0)\right)$$
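As a quick worked example (numbers chosen for illustration): with $C = 100$ tokens and $r = 50$ tokens/s, a bucket fully drained at $t_0$ has recovered

$$\text{tokens}(t_0 + 1) = \min\left(100,\ 0 + 50 \cdot 1\right) = 50$$

tokens after one second, and is full again after $C / r = 2$ seconds.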
### Sliding Window Log
Tracks the exact timestamp of every request in a time window. Precise, but memory-intensive.
For $n$ requests in the current window of duration $W$:
$$\text{rate} = \frac{n}{W}$$
### Sliding Window Counter
A hybrid: divides time into fixed slots and interpolates between the current and previous slot.
$$\text{count} = \text{prev} \times \left(1 - \frac{t_{\text{elapsed}}}{W}\right) + \text{curr}$$
| | Token Bucket | Sliding Window |
|---|---|---|
| Pros | O(1) memory, O(1) time, allows bursts | Precise rate tracking, smooth distribution |
| Cons | No per-client fairness without separate buckets | O(n) memory for log variant, interpolation error for counter variant |
| Best for | Global throughput protection | Per-client or per-issuer fairness |
## The Distributed Coordination Problem
In a multi-region deployment, each instance of the rate limiter sees only local traffic. Without coordination, a client can exceed the global limit by spreading requests across regions.
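One common way to bound that overshoot is local-first enforcement with asynchronous reconciliation: each region enforces a share of the global limit on the hot path and periodically learns the other regions' counts. The sketch below is an assumption for illustration (the even split, the 10% slack, and all names are not from this article):

```go
package main

// RegionalLimiter enforces a local share of a global limit and
// reconciles with other regions asynchronously, off the hot path.
type RegionalLimiter struct {
	globalLimit int
	regions     int
	localCount  int
	remoteCount int // sum of other regions' counts, updated by Sync
}

// localShare is an even split plus 10% slack; the async sync
// corrects any temporary overshoot.
func (r *RegionalLimiter) localShare() int {
	return r.globalLimit/r.regions + r.globalLimit/10
}

func (r *RegionalLimiter) Allow() bool {
	// Reject if the last-known global total is exhausted...
	if r.localCount+r.remoteCount >= r.globalLimit {
		return false
	}
	// ...or if this region has used up its local share.
	if r.localCount >= r.localShare() {
		return false
	}
	r.localCount++
	return true
}

// Sync ingests the combined count reported by the other regions
// (e.g. via gossip or a shared store); called periodically.
func (r *RegionalLimiter) Sync(othersTotal int) {
	r.remoteCount = othersTotal
}
```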
## Implementation in Go
The token bucket is the right choice for our use case: O(1) operations, burst tolerance, and simple distributed coordination via atomic counters.
```go
import (
	"math"
	"sync"
	"time"
)

type TokenBucket struct {
	mu       sync.Mutex
	tokens   float64   // current token count
	capacity float64   // maximum burst size C
	rate     float64   // refill rate r, tokens per second
	lastTime time.Time // last refill timestamp
}

func (tb *TokenBucket) Allow() bool {
	tb.mu.Lock()
	defer tb.mu.Unlock()

	// Lazy refill: add rate * elapsed tokens, capped at capacity.
	now := time.Now()
	elapsed := now.Sub(tb.lastTime).Seconds()
	tb.tokens = math.Min(tb.capacity, tb.tokens+tb.rate*elapsed)
	tb.lastTime = now

	if tb.tokens >= 1 {
		tb.tokens--
		return true
	}
	return false
}
```
If the mutex becomes a contention point at high throughput, two common alternatives are sync/atomic with CAS operations for lock-free performance, or a sharded bucket per goroutine with periodic merging.
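The sync/atomic route can be sketched with `atomic.Pointer` (Go 1.19+), swapping an immutable snapshot so the token count and refill timestamp always change together. The names and the snapshot-per-CAS design are assumptions, not the article's production code:

```go
package main

import (
	"math"
	"sync/atomic"
	"time"
)

// bucketState is an immutable snapshot; CAS replaces whole snapshots
// so tokens and the refill timestamp stay consistent without a lock.
type bucketState struct {
	tokens float64
	last   time.Time
}

type AtomicTokenBucket struct {
	capacity float64
	rate     float64 // tokens per second
	state    atomic.Pointer[bucketState]
}

func NewAtomicTokenBucket(capacity, rate float64) *AtomicTokenBucket {
	b := &AtomicTokenBucket{capacity: capacity, rate: rate}
	b.state.Store(&bucketState{tokens: capacity, last: time.Now()})
	return b
}

func (b *AtomicTokenBucket) Allow() bool {
	for {
		old := b.state.Load()
		now := time.Now()
		elapsed := now.Sub(old.last).Seconds()
		tokens := math.Min(b.capacity, old.tokens+b.rate*elapsed)
		if tokens < 1 {
			return false // no token available; skip the CAS entirely
		}
		next := &bucketState{tokens: tokens - 1, last: now}
		// Retry the loop if another goroutine won the race.
		if b.state.CompareAndSwap(old, next) {
			return true
		}
	}
}
```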
## Capacity Planning
For a system processing $\lambda$ transactions per second with target rejection rate below $\epsilon$:
$$C \geq \lambda \cdot T_{\text{burst}}$$
where $T_{\text{burst}}$ is the expected burst duration.
$$r \geq \lambda \cdot (1 + \sigma)$$
where $\sigma$ is the traffic variance coefficient.
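The two bounds are trivial to encode. The numbers in the comment below are illustrative assumptions, not measurements from this article:

```go
package main

// requiredCapacity applies C >= lambda * T_burst.
func requiredCapacity(lambda, tBurst float64) float64 {
	return lambda * tBurst
}

// requiredRate applies r >= lambda * (1 + sigma).
func requiredRate(lambda, sigma float64) float64 {
	return lambda * (1 + sigma)
}

// Example: lambda = 5000 tx/s, T_burst = 2 s, sigma = 0.25
// gives C >= 10000 tokens and r >= 6250 tokens/s.
```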
If your p99 latency budget is $L$ milliseconds and the rate limiter check takes $\delta$ ms:
$$L_{\text{remaining}} = L - \delta$$
For our authorization pipeline where $L = 200\text{ms}$ and the rate limiter adds $\delta = 0.3\text{ms}$:
$$L_{\text{remaining}} = 200 - 0.3 = 199.7\text{ms}$$
The overhead is negligible – which is exactly the point. A rate limiter that measurably impacts latency is a rate limiter that needs to be redesigned.
## Key Takeaways
Design decisions for payment rate limiters:
- Token bucket for global protection, sliding window for per-client fairness
- Local-first with async sync beats centralized coordination for latency
- Set capacity from measured burst patterns, not theoretical maximums
- Monitor rejection rate as a business metric, not just an infra metric
| | Typical web API | Payment authorization |
|---|---|---|
| Rejection response | Return 429 Too Many Requests | Return a decline response code in ISO 8583 |
| Client behavior | Retries with backoff | No retry; the transaction is failed |
| Cost of a false positive | Annoying but recoverable | A declined card at checkout |
The difference between rate limiting a REST API and rate limiting an authorization pipeline is the cost of a false positive. In payments, you are not protecting a server – you are deciding whether someone’s groceries get paid for.