Rate Limiter
Traffic
Controls request throughput using algorithms like token bucket and sliding window to protect services from overload and abuse.
Overview
A Rate Limiter is a traffic control component that restricts the number of requests a client or service can make within a defined time window. It is a fundamental defense mechanism in distributed systems, protecting backend services from being overwhelmed by excessive traffic — whether from legitimate spikes, misbehaving clients, or malicious DDoS attacks. In Vetora's simulator, the Rate Limiter component models different limiting algorithms, burst handling, and the interaction between rate limits and upstream retry behavior.
The token bucket algorithm is the most widely used rate limiting approach. Imagine a bucket that holds a fixed number of tokens (the burst capacity). Tokens are added at a constant rate (the sustained rate). Each request consumes one token. If the bucket is empty, the request is rejected or queued. This algorithm elegantly handles bursts — a client can send a burst of requests up to the bucket capacity, then must slow down to the token refill rate. Vetora models token bucket behavior and shows how burst capacity affects tail latency.
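The refill-and-consume logic described above fits in a few lines of Python. This is a minimal single-process sketch, not Vetora's implementation. The class name `TokenBucket` and its `capacity`/`rate` parameters are hypothetical. The current time is passed in explicitly so the behavior is easy to trace.

```python
import time
from typing import Optional


class TokenBucket:
    """Holds up to `capacity` tokens and refills them at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, now: Optional[float] = None) -> None:
        self.capacity = capacity  # burst capacity
        self.rate = rate  # sustained refill rate, tokens per second
        self.tokens = capacity  # start full so an idle client can burst immediately
        self.last_refill = time.monotonic() if now is None else now

    def _refill(self, now: float) -> None:
        # Credit tokens for the time elapsed since the last refill, capped at capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now

    def allow(self, now: Optional[float] = None) -> bool:
        """Consume one token if available; False means reject (or queue) the request."""
        now = time.monotonic() if now is None else now
        self._refill(now)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(capacity=5, rate=1.0, now=0.0)
    print([bucket.allow(now=0.0) for _ in range(7)])  # five True, then two False
    print(bucket.allow(now=1.0))  # True: one token refilled after one second
```

Starting the bucket full is a design choice that lets an idle client burst immediately. Starting it empty would instead force even the first requests to wait for refills.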
The sliding window algorithm counts requests within a rolling time window (e.g., the last 60 seconds). Fixed window counters reset at interval boundaries, which lets a client send up to 2x the limit in a short span straddling two adjacent windows. A sliding window avoids this and enforces the limit smoothly and continuously. The exact variant, a sliding window log, pays for this with higher memory usage because it must store a timestamp for every request in the window. In practice, the sliding window counter approximates the log by combining the current and previous fixed-window counts, balancing accuracy with efficiency.
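A sliding window log can be sketched with a deque of timestamps. This is an illustrative sketch, and the names `SlidingWindowLog`, `limit`, and `window` are hypothetical. It stores only accepted requests, so its memory grows with the limit, which is the tradeoff described above.

```python
from collections import deque
from typing import Deque


class SlidingWindowLog:
    """Exact sliding window: keeps the timestamp of every accepted request in the window."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit  # max requests in any `window`-second interval
        self.window = window  # window length in seconds
        self.log: Deque[float] = deque()  # accepted timestamps, oldest first

    def allow(self, now: float) -> bool:
        # Drop timestamps that have slid out of the interval (now - window, now].
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)  # memory use grows with the limit
            return True
        return False


if __name__ == "__main__":
    limiter = SlidingWindowLog(limit=3, window=60.0)
    print([limiter.allow(t) for t in (0.0, 10.0, 20.0, 30.0)])  # [True, True, True, False]
    print(limiter.allow(60.0))  # True: the request at 0.0 has left the window
```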
The fixed window algorithm divides time into fixed intervals (e.g., per minute) and counts requests in each interval. It is the simplest to implement but has the boundary spike problem: if the limit is 100/minute and a client sends 100 requests at 0:59 and 100 more at 1:01, they effectively send 200 requests in 2 seconds. Despite this weakness, fixed window is popular because it is trivially implementable with atomic counters in Redis.
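The boundary spike can be reproduced with a fixed window counter that uses a dictionary in place of Redis counters. `FixedWindowCounter` is a hypothetical in-memory sketch. Unlike a Redis version that relies on EXPIRE, it never cleans up old windows.

```python
from collections import defaultdict
from typing import DefaultDict, Tuple


class FixedWindowCounter:
    """One counter per (client, window index); counts reset at every window boundary."""

    def __init__(self, limit: int, window: int) -> None:
        self.limit = limit
        self.window = window  # window length in seconds
        # Stand-in for Redis keys such as "<client>:<window index>"; old windows are never
        # evicted here, whereas a Redis-backed version would rely on EXPIRE.
        self.counts: DefaultDict[Tuple[str, int], int] = defaultdict(int)

    def allow(self, client: str, now: float) -> bool:
        key = (client, int(now // self.window))
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True


if __name__ == "__main__":
    limiter = FixedWindowCounter(limit=100, window=60)
    at_0_59 = sum(limiter.allow("alice", 59.0) for _ in range(100))  # window 0
    at_1_01 = sum(limiter.allow("alice", 61.0) for _ in range(100))  # window 1
    print(at_0_59 + at_1_01)  # 200 accepted within two seconds
```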
Distributed rate limiting adds another dimension of complexity. When your rate limiter runs across multiple instances, they must share state. Redis is the most common solution: all instances increment the same Redis counter using atomic INCR operations. Vetora models this shared-state latency and shows how distributed counter synchronization affects limiting accuracy under high concurrency.
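The shared-state idea can be shown without a Redis server. Several limiter replicas share one store whose increment is atomic. `SharedCounterStore` and `LimiterReplica` are hypothetical. A lock plays the role of Redis's atomic command execution, and the sketch covers only counter semantics, not network latency.

```python
import threading
from typing import Dict, List, Tuple


class SharedCounterStore:
    """Hypothetical in-memory stand-in for Redis: atomic increment with an EXPIRE-style TTL."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # plays the role of Redis's atomic command execution
        self._data: Dict[str, Tuple[int, float]] = {}  # key -> (count, expires_at)

    def incr_with_ttl(self, key: str, ttl: float, now: float) -> int:
        # Comparable to INCR plus EXPIRE on the first increment, executed as one atomic step.
        with self._lock:
            count, expires_at = self._data.get(key, (0, now + ttl))
            if now >= expires_at:  # expired key: start counting from zero again
                count, expires_at = 0, now + ttl
            count += 1
            self._data[key] = (count, expires_at)
            return count


class LimiterReplica:
    """One of several rate limiter instances; every replica shares the same store."""

    def __init__(self, store: SharedCounterStore, limit: int, window: int) -> None:
        self.store, self.limit, self.window = store, limit, window

    def allow(self, client: str, now: float) -> bool:
        key = f"rl:{client}:{int(now // self.window)}"  # hypothetical key scheme
        # Rejected requests still increment the counter, as with a plain INCR-then-compare.
        return self.store.incr_with_ttl(key, self.window, now) <= self.limit


if __name__ == "__main__":
    store = SharedCounterStore()
    replicas = [LimiterReplica(store, limit=100, window=60) for _ in range(4)]
    accepted: List[int] = []
    accepted_lock = threading.Lock()

    def serve(replica: LimiterReplica) -> None:
        n = sum(replica.allow("alice", now=0.0) for _ in range(50))  # 50 requests per replica
        with accepted_lock:
            accepted.append(n)

    threads = [threading.Thread(target=serve, args=(r,)) for r in replicas]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(sum(accepted))  # 100: the shared counter enforces one global limit
```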
When to Use
Recommended
- +Protecting backend services from traffic spikes that exceed their capacity — especially during viral events or flash sales
- +Enforcing fair usage policies across multi-tenant platforms — preventing one tenant from monopolizing shared resources
- +DDoS mitigation by limiting request rates per IP address, API key, or user identity
- +Preventing retry storms — limiting how quickly failed requests can be retried after a partial outage
- +API monetization — enforcing different rate limits for free, standard, and premium API tiers
Not Recommended
- -Internal trusted service-to-service calls where all services are capacity-planned together — rate limiting adds latency without benefit
- -Real-time systems where rejecting any request is unacceptable; use backpressure instead, which slows producers down rather than dropping their requests
- -Batch processing pipelines where throttling defeats the purpose of maximizing throughput
Key Parameters in Vetora
Real-World Examples
Redis-based Rate Limiters
Many production rate limiters use Redis INCR with EXPIRE for distributed counters. Libraries like rate-limiter-flexible (Node.js) and limits (Python) provide battle-tested Redis-backed implementations.
Stripe API Rate Limiting
Stripe enforces 100 read requests/second and 100 write requests/second per API key using a token bucket algorithm, with graceful 429 responses and Retry-After headers.
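A 429 response with a Retry-After header can be derived directly from token bucket state: a rejected client needs to wait until one whole token has refilled. The sketch below is hypothetical. It is not Stripe's implementation, and `TokenBucketWithRetryAfter` is an illustrative name. It rounds the wait up to whole seconds because Retry-After carries an integer number of seconds.

```python
import math
from typing import Dict, Tuple


class TokenBucketWithRetryAfter:
    """Token bucket that also reports how long a rejected client should wait."""

    def __init__(self, capacity: float, rate: float, now: float) -> None:
        self.capacity, self.rate = capacity, rate  # burst size, tokens per second
        self.tokens, self.last = capacity, now

    def handle(self, now: float) -> Tuple[int, Dict[str, str]]:
        """Return (status, headers): 200 if a token was available, else 429 with Retry-After."""
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return 200, {}
        # Time until one whole token has accumulated, rounded up to whole seconds.
        wait = math.ceil((1 - self.tokens) / self.rate)
        return 429, {"Retry-After": str(wait)}


if __name__ == "__main__":
    bucket = TokenBucketWithRetryAfter(capacity=2, rate=0.5, now=0.0)  # one token every 2 s
    print([bucket.handle(0.0) for _ in range(3)])  # 200, 200, then 429 with Retry-After "2"
```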
GitHub API Rate Limiting
GitHub enforces 5,000 requests/hour for authenticated users and 60/hour for unauthenticated requests. Rate limit state is communicated via X-RateLimit headers on every response.
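A client can use these headers to decide how long to pause. The function below is a hypothetical sketch that assumes GitHub-style semantics: `X-RateLimit-Remaining` is the number of calls left, and `X-RateLimit-Reset` is the reset time in UTC epoch seconds. The header values in the example are made up.

```python
from typing import Mapping


def seconds_to_wait(headers: Mapping[str, str], now: float) -> float:
    """Return how long to pause before the next call, based on X-RateLimit-* headers."""
    lower = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    remaining = int(lower.get("x-ratelimit-remaining", "1"))  # missing header: assume quota
    if remaining > 0:
        return 0.0
    reset_at = float(lower.get("x-ratelimit-reset", str(now)))  # UTC epoch seconds
    return max(0.0, reset_at - now)


if __name__ == "__main__":
    example = {  # made-up values for illustration
        "X-RateLimit-Limit": "5000",
        "X-RateLimit-Remaining": "0",
        "X-RateLimit-Reset": "1700003600",
    }
    print(seconds_to_wait(example, now=1700000000.0))  # 3600.0
```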
Frequently Asked Questions
What is a Rate Limiter in system design?
A Rate Limiter is a component that controls the rate of incoming requests to protect backend services from overload. It enforces maximum request rates per client, IP, or API key using algorithms like token bucket, sliding window, or fixed window. When a client exceeds the limit, requests are rejected with HTTP 429 (Too Many Requests) responses. Rate limiters are essential in system design for DDoS protection, fair multi-tenant resource sharing, and preventing cascade failures during traffic spikes.
What is the difference between token bucket and sliding window rate limiting?
Token bucket allows controlled bursts: tokens accumulate up to a maximum (the burst capacity), and each request consumes one. A client can send a burst of requests equal to the bucket size, then must slow to the refill rate. Sliding window counts requests in a continuously rolling time window and has no separate burst parameter. It only guarantees that no window-length interval contains more than the limit, which gives smoother enforcement than fixed windows. Token bucket is better for APIs where short bursts are acceptable; sliding window is better when strict, uniform rate enforcement is required.
How do you implement distributed rate limiting?
Distributed rate limiting requires shared state across all rate limiter instances. The standard approach uses Redis INCR, which is atomic, together with EXPIRE. Each request increments a counter keyed by client ID and time window, and EXPIRE ensures automatic cleanup. For token bucket, Redis stores the token count and last refill timestamp. Because INCR and EXPIRE (or a read-then-write token update) are separate commands, Lua scripting in Redis is used to make such multi-step operations atomic. Alternative approaches include local rate limiting with gossip protocol synchronization, though these sacrifice accuracy for lower latency.
What is the fixed window boundary problem in rate limiting?
The fixed window boundary problem occurs because counters reset at interval boundaries. If the limit is 100 requests per minute, a client could send 100 requests at 0:59 and 100 more at 1:01 — effectively 200 requests in 2 seconds while technically respecting the per-minute limit. Sliding window algorithms solve this by tracking request counts across a continuously moving time window. A common compromise is the sliding window counter, which interpolates between the current and previous window counts.
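The sliding window counter estimates the rolling count as previous_count * (1 - elapsed / window) + current_count, where elapsed is the time since the current fixed window began. The sketch below applies that estimate to the 0:59 / 1:01 scenario. `SlidingWindowCounter` is a hypothetical name, and the code assumes timestamps never decrease.

```python
class SlidingWindowCounter:
    """Approximate sliding window built from two fixed-window counters."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window  # window length in seconds
        self.current_index = 0  # index of the fixed window currently being counted
        self.current_count = 0
        self.previous_count = 0

    def _roll(self, now: float) -> None:
        # Assumes non-decreasing `now`; shift counters when a new fixed window starts.
        index = int(now // self.window)
        if index == self.current_index + 1:
            self.previous_count, self.current_count = self.current_count, 0
        elif index > self.current_index + 1:  # idle for at least one full window
            self.previous_count, self.current_count = 0, 0
        self.current_index = index

    def estimate(self, now: float) -> float:
        self._roll(now)
        elapsed_fraction = (now % self.window) / self.window
        # Weight the previous window by the share of it still inside the rolling window.
        return self.previous_count * (1 - elapsed_fraction) + self.current_count

    def allow(self, now: float) -> bool:
        if self.estimate(now) + 1 > self.limit:
            return False
        self.current_count += 1
        return True


if __name__ == "__main__":
    limiter = SlidingWindowCounter(limit=100, window=60.0)
    at_0_59 = sum(limiter.allow(59.0) for _ in range(100))
    at_1_01 = sum(limiter.allow(61.0) for _ in range(100))
    print(at_0_59, at_1_01)  # 100 1
```

With a limit of 100 per minute, all 100 requests at 0:59 are accepted. At 1:01 the previous window still contributes about 98 to the estimate, so only one more request fits, instead of the 100 a fixed window would allow.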
How do rate limiters interact with retry logic?
Rate limiters and client retries can create dangerous feedback loops. When a rate limiter rejects requests with 429 responses, clients that retry immediately amplify the overload. Best practice is for the rate limiter to include a Retry-After header indicating when the client should retry. Well-behaved clients honor this header and, when it is absent, fall back to exponential backoff with jitter. In Vetora's simulator, you can model this interaction to observe how aggressive retry policies cause sustained overload while conservative backoff allows graceful recovery.
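A client-side retry loop that follows this advice might look like the Python sketch below. `call_with_retries` and `backoff_delay` are hypothetical helpers. The sketch honors only the delta-seconds form of Retry-After, not the HTTP-date form. Otherwise it uses "full jitter": a uniform random delay up to the capped exponential value.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str]]  # hypothetical minimal shape: (status, headers)


def backoff_delay(attempt: int, base: float, cap: float, retry_after: Optional[str],
                  rng: random.Random) -> float:
    """Seconds to wait before retrying: Retry-After if usable, else capped full-jitter backoff."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))  # delta-seconds form only, not HTTP-date
        except ValueError:
            pass  # unparseable header: fall back to exponential backoff
    return rng.uniform(0.0, min(cap, base * 2 ** attempt))


def call_with_retries(send: Callable[[], Response], max_attempts: int = 5, base: float = 0.1,
                      cap: float = 10.0, sleep: Callable[[float], None] = time.sleep,
                      rng: Optional[random.Random] = None) -> Response:
    """Call `send`, retrying only on HTTP 429, and return the last response."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429 or attempt == max_attempts - 1:
            return status, headers
        retry_after = next((v for k, v in headers.items() if k.lower() == "retry-after"), None)
        sleep(backoff_delay(attempt, base, cap, retry_after, rng))
    raise AssertionError("unreachable: the loop always returns")


if __name__ == "__main__":
    replies = iter([(429, {"Retry-After": "2"}), (429, {}), (200, {})])
    slept = []
    status, _ = call_with_retries(lambda: next(replies), sleep=slept.append,
                                  rng=random.Random(0))
    print(status, slept)  # 200, then [2.0, <random delay between 0 and 0.2>]
```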
Related Components
Centralized entry point that handles authentication, rate limiting, request routing, and protocol tr...
Distributes incoming traffic across multiple server instances using algorithms like round-robin, lea...
Application server or microservice that processes requests, runs business logic, and communicates wi...
Traffic source representing end users or external systems that generate requests to your architectur...
Try Rate Limiter in the Simulator
Build architectures with Rate Limiter and 13 other component types. Run discrete event simulations and get AI-powered feedback.