Industry-standard URL shortener using Redis cache-aside for redirect lookups (95% hit rate), an atomic counter for collision-free ID generation, and Base62 encoding. Handles 20K RPS with 6 pods.
The counter-based URL shortener is the standard production answer to the TinyURL interview question. It addresses every bottleneck identified in the naive variant: a Redis cache absorbs 95% of redirect reads, an atomic counter eliminates collision risk during ID generation, a load balancer distributes traffic across multiple service pods, and Base62 encoding produces compact 7-character short codes from sequential integers.
The core insight is the 100:1 read-to-write ratio. URL shorteners are massively read-heavy — for every URL created, it is redirected through 100 or more times. The naive variant forces every one of those reads through PostgreSQL. Adding a Redis cache changes the economics dramatically: at a 95% cache hit rate, only 5% of reads reach the database. At 10K redirect RPS, that means Redis handles 9,500 reads/sec (~2ms each) while PostgreSQL handles only 500 reads/sec (~10ms each). Database utilization drops from 95% to under 15%.
ID generation uses Redis INCR to maintain a global atomic counter. Each new URL gets the next integer (e.g., 1000001), which is Base62-encoded to produce a 7-character short code (e.g., "4c92"→"4c92" in Base62). This guarantees uniqueness without collision checks — every integer maps to a unique Base62 string, and INCR is atomic. The trade-off is that sequential IDs are guessable: observing two short codes reveals the total URL count between their creation times. For most use cases this is acceptable; for privacy-sensitive applications, a bit-shuffle or XOR mask can obscure the sequence.
This template introduces the API Gateway and Load Balancer layers. The API Gateway handles JWT authentication and rate limiting, protecting the service tier from abuse. The ALB distributes traffic across 6 UrlService pods using round-robin, providing horizontal scalability and redundancy — losing one pod reduces capacity by 17% instead of 100%.
The comparison with the naive variant is instructive. Same workload, dramatically different results: p99 redirect latency drops from 50ms (DB-bound) to 12ms (cache-bound), maximum throughput increases from 500 RPS to 20K+ RPS, and database utilization drops from 95% to 15%. The only additions are a cache, a counter, a load balancer, and a gateway — each solving a specific bottleneck identified in the naive approach.
The counter-based URL shortener adds three components to the naive architecture: an API Gateway for authentication and rate limiting, a Load Balancer for traffic distribution, and a Redis Cache that serves dual duty as both the read cache and the atomic counter for ID generation.
All traffic enters through the API Gateway (Amazon API Gateway REST), which validates JWT tokens (~3ms), enforces rate limits (30K RPS cap with headroom), and routes requests to the ALB. The Gateway adds security without burdening the service tier. The ALB distributes requests across 6 UrlService pods using round-robin — stateless routing since no session affinity is needed.
The UrlService (6 pods, 100 threads each) handles both URL creation and redirect logic. For URL creation (POST /api/v1/shorten), the service calls Redis INCR on a global counter key to get the next sequential integer, Base62-encodes it into a 7-character short code, writes the mapping to PostgreSQL (durable storage), and writes it to Redis (cache population). For redirect reads (GET /api/v1/redirect/{code}), the service checks Redis first — at a 95% hit rate, most redirects are served from cache in ~2ms. On cache miss (~5% of reads), the service falls back to PostgreSQL (~10ms indexed read) and writes the result back to Redis for future requests.
Redis serves dual purpose: (1) the INCR command maintains the global atomic counter for ID generation — O(1), sub-millisecond, and guaranteed unique; (2) the cache-aside pattern stores URL mappings (key: url:{short_code}, value: original_url) with 24-hour TTL and LRU eviction. URL mappings are immutable after creation, making cache invalidation trivial — there is none. The 95% hit rate emerges naturally from the Zipfian access pattern: a small fraction of URLs (viral links) receive the vast majority of redirect traffic, and these stay hot in cache.
PostgreSQL serves as the durable store with 2 read replicas. At 95% cache hit rate, only ~500 reads/sec reach the database (5% of 10K redirect RPS), plus ~2K writes/sec for URL creation. The database operates comfortably at 15-20% utilization — a dramatic improvement from the naive variant's 95%.
The architecture handles 20K+ RPS at peak with comfortable headroom at every layer. The service tier (6 pods x 100 threads = 600 concurrent slots at 5ms processing time) sustains ~120K RPS. The ALB is rated for 50K RPS. Redis handles 100K+ operations/sec. The bottleneck at extreme scale is the atomic counter — Redis INCR is a single-point contention at 2K writes/sec, well within Redis capacity but would require counter sharding at 100K+ writes/sec.
The cache-aside pattern fundamentally changes the read path economics. Instead of every redirect hitting PostgreSQL (naive variant), 95% of reads are served from Redis in ~2ms. The counter-based ID generation eliminates collision risk entirely — Redis INCR returns the next unique integer atomically, and Base62 encoding produces a compact short code. The write path populates both the database (durable storage) and cache (fast reads) simultaneously.
Step-by-Step Walkthrough
Pseudocode
// URL Creation — counter-based, zero collision risk
async function createShortUrl(long_url):
counter = await redis.incr("url_counter") // Atomic, O(1), ~1ms
short_code = base62_encode(counter) // Deterministic, O(1)
await db.execute(
"INSERT INTO urls (short_code, original_url, created_at)
VALUES ($1, $2, now())",
[short_code, long_url]
) // ~50ms durable write
await redis.setex("url:" + short_code, 86400, long_url) // Cache populate, ~2ms
return BASE_URL + "/" + short_code
// Redirect — cache-aside pattern
async function redirect(short_code):
// Check cache first (95% hit rate)
cached = await redis.get("url:" + short_code) // ~2ms
if cached: return 301, Location: cached
// Cache miss — fall back to DB
row = await db.execute(
"SELECT original_url FROM urls WHERE short_code = $1",
[short_code]
) // ~10ms
if not row: return 404
// Populate cache for future reads
await redis.setex("url:" + short_code, 86400, row.original_url)
return 301, Location: row.original_urlThe storage layer has two tiers: Redis for fast reads and counter state, PostgreSQL for durable persistence. Redis stores URL mappings as simple key-value pairs with 24-hour TTL, plus the global counter for ID generation. PostgreSQL stores the same mappings durably with B-tree indexing. The cache-aside pattern means Redis is populated on writes (cache-through) and on cache-miss reads (cache-populate).
Step-by-Step Walkthrough
Choice
Redis INCR + Base62 encoding
Rationale
Counter-based IDs guarantee uniqueness without collision checks. Redis INCR is atomic and O(1) — each URL gets the next integer, which is Base62-encoded into a 7-character short code. 62^7 = 3.5 trillion unique codes, sufficient for decades of operation. The trade-off is sequential IDs are guessable, leaking creation volume. For privacy, apply a bit-shuffle or XOR mask.
Choice
Redis cache-aside with 24h TTL, LRU eviction
Rationale
URL mappings are immutable — a short code always maps to the same long URL. This makes cache invalidation trivial (no invalidation needed). Cache-aside with LRU naturally keeps hot URLs in cache while cold URLs expire. 95% hit rate drops database utilization from 95% to 15%.
Choice
Same Redis cluster for counter + cache
Rationale
Redis INCR for the counter and GET/SET for cache lookups use the same Redis cluster, eliminating one component. Both operations are sub-millisecond. Operational simplicity outweighs the minor risk of a single Redis failure affecting both functions.
Choice
ALB with 6 UrlService pods (round-robin)
Rationale
6 pods provide both horizontal throughput (120K sustained RPS) and redundancy (losing 1 pod = 17% capacity reduction). Round-robin is sufficient since the service is stateless. No sticky sessions needed because every pod can serve any request.
Choice
Amazon API Gateway with JWT auth + rate limiting
Rationale
Centralizes authentication and rate limiting at the edge, preventing abuse (bulk URL creation, redirect scraping) before requests reach the service tier. Adds ~3ms latency but eliminates the need for auth logic in every service pod.
Target RPS
~20K sustained
Latency (p99)
<100ms p99 (12ms cache hit)
Storage
~50 GB/year at 100K URLs/day
Availability
~99.9% (6 pods, Redis replication)
| Operation | Time | Space | Notes |
|---|---|---|---|
| Create short URL | O(1) Redis INCR + O(1) Base62 encode + O(log n) DB INSERT | O(1) per URL — one DB row + one cache entry | INCR is O(1) and atomic. Base62 encoding of a 64-bit integer is O(1). DB INSERT is dominated by B-tree index update. |
| Redirect (cache hit, 95%) | O(1) Redis GET | O(1) | Sub-2ms. Hash table lookup in Redis. No database involvement. |
| Redirect (cache miss, 5%) | O(log n) PostgreSQL SELECT + O(1) Redis SET | O(1) | B-tree index lookup + cache population. ~15ms total. |
Durable store for all URL mappings. Same schema as the naive variant but with 2 read replicas. At 95% cache hit rate, the database handles only 5% of reads (~500 RPS at 10K total) plus all writes (~2K RPS). Primary key on short_code with B-tree index for fast cache-miss lookups.
Indexes: PK B-tree on short_code
2 read replicas absorb cache-miss reads. Replication lag ~5ms. Write path goes to primary only.
In-memory URL mapping cache in Redis. Key pattern: url:{short_code}. Stores the original_url with 24-hour TTL. 95% hit rate from Zipfian access pattern — viral links stay hot in cache. Also stores the global counter key for INCR-based ID generation. LRU eviction ensures the cache stays within memory limits.
Working set: ~1 GB for 1M active URLs at ~1KB per entry. Cache size: 950 MB. Hit rate: ~95% from Zipfian access.
Redis cluster fails completely
Impact
URL creation fails (counter unavailable). Redirect reads degrade to database-only (naive variant behavior). p99 latency jumps from 12ms to 50ms+. DB utilization spikes from 15% to 95%.
Mitigation
Redis Cluster with automatic failover (2-3s switchover). Fallback to UUID-based IDs for creation during outage. Circuit breaker on cache calls to fail fast.
Counter reaches 62^7 (3.5 trillion)
Impact
Base62 encoding produces 8-character codes instead of 7. Short URLs become slightly longer but still functional. No data corruption or service failure.
Mitigation
Monitor counter value. At 100K URLs/day, reaching 3.5T takes ~96,000 years. This is a non-issue in practice.
Cache eviction storm from burst traffic pattern change
Impact
If traffic shifts from viral URLs (high cache hit) to long-tail URLs (low cache hit), cache hit rate drops from 95% to 50-60%. Database utilization spikes. p99 latency increases.
Mitigation
Monitor cache hit rate. If it drops below 80%, either increase cache size or add read replicas to handle the increased DB load.
One of 6 service pods enters a crash loop
Impact
Capacity drops by 17%. ALB health checks detect the failure in ~10 seconds and stop routing to the unhealthy pod. Remaining 5 pods handle the load with reduced headroom.
Mitigation
ECS auto-restart replaces the failed task. Auto-scaling adds a replacement pod within 2 minutes. ALB health checks prevent traffic from reaching unhealthy pods.
| Component | Failure | Impact | Mitigation |
|---|---|---|---|
| Redis Cache | Node failure | If the primary Redis node fails, reads fall back to PostgreSQL (increased latency). Counter INCR fails, blocking URL creation. Automatic failover to replica takes 2-3 seconds. | Redis Cluster with 2 replicas per shard. Automatic failover. Application-level fallback to UUID generation during Redis outage. |
| API Gateway | Rate limit misconfiguration | If rate limit is set too low, legitimate traffic is rejected with 429 errors. If too high, abuse traffic reaches the service tier. | Set rate limit at 1.5x expected peak. Monitor 429 error rate. Use adaptive rate limiting that adjusts based on service health. |
| PostgreSQL | Replication lag spike | Read replicas serve stale data for a few seconds. A newly created URL might not be found on a cache-miss read that hits a lagging replica. | Read-after-write consistency: after URL creation, write to cache immediately. Subsequent reads hit cache (not replica). Replication lag is only visible for cache-miss reads of very recently created URLs. |
Horizontal scaling at the service tier: ECS auto-scaling adds pods based on CPU utilization (target: 60%) or ALB request count. Redis can be scaled vertically (larger node type) or horizontally (Redis Cluster with more shards). PostgreSQL scales reads via additional read replicas and writes via vertical scaling. The scaling ceiling is approximately 20K RPS, limited by: (1) the atomic counter's single-point contention at high write rates, and (2) the single Redis cluster's memory capacity for the URL cache. Beyond 20K RPS, upgrade to the Production variant (v3) with CDN, ZooKeeper range allocation, and sharded databases.
Key metrics to monitor: Redis cache hit rate (target: >90%, alert at <85%), Redis memory utilization (alert at 80%), Redis INCR counter value (informational — track growth rate), PostgreSQL active connections (alert at 160/200), DB replication lag (alert at >10ms), ALB request count and error rate, API Gateway 4xx/5xx rates, service pod CPU and memory utilization, and end-to-end redirect latency p99. The most important single metric is cache hit rate — a drop from 95% to 80% doubles database load. Set up a correlation dashboard showing cache hit rate vs. DB utilization vs. redirect latency to catch cascading degradation early.
Monthly infrastructure cost at moderate traffic (10K RPS): API Gateway (~$35 for 26B requests/month), ALB (~$20 + $8/million requests), 6x ECS Fargate pods (2 vCPU, 4 GB each, ~$360/month), ElastiCache Redis (cache.r7g.large 2-node, ~$200/month), RDS PostgreSQL primary + 2 read replicas (~$360/month). Total: approximately $400-450/month. Compared to the naive variant at $180/month, you get 40x the throughput (20K vs 500 RPS) for 2.5x the cost. Cost per 1M requests: ~$0.60 (vs $12.00 for naive). This is the sweet spot for most production URL shorteners — simple enough to operate, cost-effective up to 20K RPS.
The API Gateway adds JWT authentication, preventing anonymous abuse. Rate limiting (30K RPS cap) protects against DDoS and bulk URL creation. Sequential Base62 short codes are guessable — an attacker can enumerate URLs by incrementing the counter. Mitigations: (1) apply a bit-shuffle to the counter before Base62 encoding, (2) require authentication for URL creation, (3) implement URL scanning to block malicious redirect targets (phishing, malware). Redis should be deployed in a private subnet with security group rules limiting access to the service tier only. PostgreSQL credentials should be stored in AWS Secrets Manager and rotated on a schedule.
Rolling deployment via ECS service update. The ALB drains connections from old tasks (30-second drain period) before terminating them. With 6 pods, a rolling update replaces one pod at a time, maintaining 83% capacity throughout. Blue/green deployment is supported via ALB target group switching for zero-downtime releases. Database migrations run before code deployment using a separate migration task. Redis cache is populated on demand (cache-aside), so no pre-warming is needed after deployment — the first requests after deploy will have cache misses that resolve on subsequent access.
| Variant | Tier | Latency | Throughput | Cost | Complexity | Reliability |
|---|---|---|---|---|---|---|
| Naive (Single Server) | T1 | ~50ms p99 | ~500 RPS | ~$180/mo | 3 components | ~99% (single pod) |
| Counter-Based (Base62) | T2 | <100ms p99 | ~20K RPS | ~$400/mo | 6 components | ~99.9% |
| Production Multi-Region | T3 | ~2ms CDN hit | 100K+ RPS | ~$3,000/mo | 11 components | ~99.99% |
| Serverless (Lambda + DynamoDB) | T4 | <30ms warm | 10K+ RPS (auto) | $0-800/mo | 4 components | ~99.99% |
This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.
URL shorteners have a 100:1 read-to-write ratio. At 10K redirect RPS, 95% cache hit rate means Redis serves 9,500 reads/sec while PostgreSQL handles only 500 reads/sec. The database goes from handling 100% of reads (95% utilization) to handling 5% of reads (15% utilization). The cache absorbs the vast majority of the read workload.
Sequential IDs leak information: observing short codes 'abc' and 'def' reveals how many URLs were created between them. For most URL shorteners this is acceptable (Bitly uses sequential IDs). For privacy-sensitive applications, apply a reversible bit-shuffle: XOR the counter value with a secret key before Base62 encoding. The codes appear random but are still deterministically generated from the counter.
If Redis fails, two things break: (1) new URL creation fails because the counter is unavailable, and (2) redirect reads fall back to PostgreSQL for every request. The system degrades to naive-variant behavior until Redis recovers. Mitigation: Redis Cluster with automatic failover (2-3 second switchover), or a fallback to UUID-based IDs during Redis outage.
Base62 uses only alphanumeric characters (0-9, a-z, A-Z). Base64 adds '+' and '/' which require URL encoding (becoming '%2B' and '%2F'), making short URLs longer and less readable. Base62 produces clean URLs that work in any context — emails, SMS, social media — without encoding issues.
The counter-based approach handles up to ~20K RPS comfortably. Beyond that, two bottlenecks emerge: (1) the atomic counter becomes contention-hot at 10K+ writes/sec (needs range-based allocation as in v3), and (2) geographic latency becomes a factor for global users (needs CDN as in v3). Below 20K RPS, this is the right architecture — simple, well-understood, and cost-effective.
Sign in to join the discussion.
Ready to design your own TinyURL?
Open the simulator, place components on the canvas, wire them up, and run a traffic simulation to see how your architecture performs under real load.
Open Simulator