A URL shortening service like TinyURL maps long URLs to 7-character Base62 codes, serving 100K+ writes/day and 10M+ redirects/day at sub-50ms p99 latency. This 5-component architecture uses Redis caching (93% hit rate), sharded PostgreSQL, and async Kafka analytics to achieve 99.99% availability. The read-to-write ratio of 100:1 fundamentally shapes every design decision, from LRU eviction policy to range-based database partitioning.
A URL shortening service is one of the most commonly asked system design interview questions because it touches on several fundamental distributed systems concepts while remaining approachable for candidates at all levels. The core requirement is deceptively simple: given a long URL, generate a short alias that redirects users to the original destination. In practice, this problem expands into a rich set of trade-offs around data modeling, caching strategy, and horizontal scalability.
At production scale, a service like TinyURL or Bitly handles hundreds of thousands of URL creation requests per day and millions of redirect lookups. The read-to-write ratio is extremely skewed — typically 100:1 or higher — which fundamentally shapes the architecture. Every redirect must complete in under 100 milliseconds to avoid perceptible delay for end users, meaning the hot path must avoid database round-trips whenever possible.
Beyond basic functionality, interviewers expect candidates to address URL expiration and cleanup, analytics tracking (click counts, referrer data, geographic distribution), custom alias support, and abuse prevention. The system must also handle hash collisions gracefully, ensure uniqueness across a distributed fleet of application servers, and provide durability guarantees so that shortened URLs remain valid for years.
This template models the complete architecture: an API gateway for rate limiting and authentication, an application service for encoding logic, a distributed cache layer that absorbs most read traffic, a partitioned database for persistent storage, and an analytics pipeline for click tracking. Running the simulation reveals how cache hit rate directly impacts p99 latency and how database partitioning strategy affects write throughput during traffic spikes.
## How the URL Shortener Routes Requests
The URL shortener architecture follows a classic read-heavy pattern optimized for sub-100ms redirect latency. The data flow begins at the API Gateway, which handles rate limiting, authentication for the creation endpoint, and routes requests to the appropriate service. For write operations (creating a short URL), the Application Service generates a unique short code using Base62 encoding of an auto-incremented ID or a hash-based approach, then persists the mapping to the database and populates the cache. The gateway also serves as the single entry point for both public redirect traffic and authenticated API calls, simplifying TLS termination and observability.
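The Base62 encoding step described above can be sketched as follows. This is a minimal illustration, not the service's actual encoder: the alphabet ordering and the function names `base62Encode` and `base62Decode` are assumptions, and plain `number` is used because 62^7 stays well below 2^53.

```typescript
// Alphabet order is an assumption; any fixed permutation of these 62 characters works.
const ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
const BASE = ALPHABET.length; // 62

// Encode a non-negative integer ID as a Base62 string.
function base62Encode(id: number): string {
  if (id < 0 || Math.floor(id) !== id) {
    throw new Error("id must be a non-negative integer");
  }
  if (id === 0) return ALPHABET.charAt(0);
  let code = "";
  let n = id;
  while (n > 0) {
    code = ALPHABET.charAt(n % BASE) + code; // prepend the least significant digit
    n = Math.floor(n / BASE);
  }
  return code;
}

// Decode a Base62 string back to the numeric ID it was derived from.
function base62Decode(code: string): number {
  let n = 0;
  for (let i = 0; i < code.length; i++) {
    const digit = ALPHABET.indexOf(code.charAt(i));
    if (digit < 0) throw new Error("invalid Base62 character: " + code.charAt(i));
    n = n * BASE + digit;
  }
  return n;
}
```

Because the mapping is a bijection between IDs and codes, two distinct IDs can never yield the same short code, which is why the counter-based approach needs no collision check.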
## Redis Caching Layer and Hit Rate Optimization
For read operations (redirecting a short URL to the original), the Application Service first checks the distributed cache (Redis or Memcached). On a cache hit, which should occur 90-95% of the time in a well-tuned system, the redirect completes without touching the database at all. Cache misses fall through to the database, and the result is backfilled into the cache with a TTL. LRU eviction discards the least recently used entries first, so URLs that are requested often stay resident. The cache is sized to hold the working set of popular URLs, typically the top 20% by access frequency, which accounts for 95% of redirect traffic. Cache hit rate is the single most important operational metric to monitor.
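The cache-aside read path can be sketched in a few lines. This is an in-process stand-in under stated assumptions: `LruCache` only imitates an LRU-evicting cache such as Redis, the `store` map plays the role of the database, and `lookup` and `stats` are hypothetical names introduced for the example.

```typescript
// Minimal in-process LRU cache; a stand-in for a Redis instance with LRU eviction.
class LruCache {
  private entries = new Map<string, string>();
  private capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(key: string): string | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    // Re-insert so the key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: string): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // Map iterates in insertion order, so the first key is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}

// Cache-aside lookup: try the cache, fall back to the store, then backfill.
function lookup(
  code: string,
  cache: LruCache,
  store: Map<string, string>, // hypothetical stand-in for the database
  stats: { hits: number; misses: number }
): string | undefined {
  const cached = cache.get(code);
  if (cached !== undefined) {
    stats.hits++;
    return cached;
  }
  stats.misses++;
  const url = store.get(code);
  if (url !== undefined) cache.set(code, url);
  return url;
}
```

Dividing `stats.hits` by the total number of lookups gives the hit rate that the paragraph above singles out as the key metric.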
## Database Sharding Strategy for URL Storage
The database layer uses range-based partitioning on the short code to distribute writes across shards. Each shard maintains a local auto-increment counter, and a ZooKeeper or etcd cluster assigns non-overlapping ID ranges to each shard, which guarantees global uniqueness without cross-shard coordination on every write. Range-based sharding enables efficient batch reads for analytics and simplifies shard routing, since the short code prefix itself encodes the target shard. Shard rebalancing is predictable because ranges can be split without rehashing existing keys.
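A rough sketch of the range-allocation idea follows. The `RangeAllocator` class stands in for the coordination service (in a real deployment the "next range" pointer would be advanced atomically in ZooKeeper or etcd), and `ShardCounter`, `IdRange`, and the range size are hypothetical names and values chosen for illustration.

```typescript
// Half-open ID range: [start, end)
interface IdRange {
  start: number;
  end: number;
}

// Stand-in for the coordination service: hands out consecutive, non-overlapping ranges.
class RangeAllocator {
  private next = 0;
  private size: number;

  constructor(size: number) {
    this.size = size;
  }

  allocate(): IdRange {
    const range = { start: this.next, end: this.next + this.size };
    this.next = range.end;
    return range;
  }
}

// Per-shard counter: issues IDs locally and only contacts the allocator
// when its current range is exhausted.
class ShardCounter {
  private allocator: RangeAllocator;
  private current: IdRange;
  private cursor: number;

  constructor(allocator: RangeAllocator) {
    this.allocator = allocator;
    this.current = allocator.allocate();
    this.cursor = this.current.start;
  }

  nextId(): number {
    if (this.cursor >= this.current.end) {
      this.current = this.allocator.allocate();
      this.cursor = this.current.start;
    }
    return this.cursor++;
  }
}
```

With a range size of 3, the first shard would issue 0, 1, 2 and then jump to whatever range the allocator hands out next, while a second shard issues from its own range in parallel. Larger ranges mean fewer trips to the coordinator at the cost of more unused IDs if a shard is retired.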
## Async Analytics Pipeline and Click Tracking
An asynchronous analytics pipeline captures click events via a message queue (Kafka), processes them in batch windows, and stores aggregated metrics in a time-series database. This decouples analytics from the latency-critical redirect path. Click events include the short code, timestamp, referrer, user agent, and geographic data derived from the client IP. The topology demonstrates how read-heavy workloads benefit enormously from caching, and how separating read and write paths enables independent scaling of each tier.
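The batch-window aggregation step can be illustrated with a small in-memory sketch. It assumes fixed (tumbling) windows keyed by short code; the `ClickRecord` shape, the `aggregateClicks` name, and the `code@windowStart` key format are inventions for this example rather than part of any Kafka or time-series database API.

```typescript
// Subset of the click event fields listed above; referrer, user agent, and geo are omitted.
interface ClickRecord {
  shortCode: string;
  timestampMs: number;
}

// Count clicks per short code per fixed-size window.
// The result key is "<shortCode>@<windowStartMs>".
function aggregateClicks(events: ClickRecord[], windowMs: number): Map<string, number> {
  const counts = new Map<string, number>();
  for (const event of events) {
    const windowStart = Math.floor(event.timestampMs / windowMs) * windowMs;
    const key = event.shortCode + "@" + windowStart;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}
```

A consumer would run this over each batch pulled from the queue and write the per-window counts to the metrics store, so the redirect path never waits on aggregation.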
The URL shortener serves two fundamentally different workloads: a low-volume write path (URL creation) and a high-volume read path (URL redirect). This diagram traces the redirect path — the latency-critical hot path that must complete in under 50ms at p99. The cache layer is the linchpin of the entire design: a 93% cache hit rate means only 7% of redirects touch the database, reducing DB load by 14× compared to a cacheless design.
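The relationship between hit rate and database load is simple arithmetic, sketched below. The function names are hypothetical, and the 1,200 reads/s figure is the target read rate quoted elsewhere in this template.

```typescript
// Reads per second that miss the cache and reach the database.
function dbReadsPerSec(readsPerSec: number, hitRate: number): number {
  return readsPerSec * (1 - hitRate);
}

// Factor by which the cache reduces database read load versus no cache at all.
function loadReductionFactor(hitRate: number): number {
  return 1 / (1 - hitRate);
}

const atTarget = dbReadsPerSec(1200, 0.93);   // about 84 reads/s reach the database
const degraded = dbReadsPerSec(1200, 0.85);   // about 180 reads/s reach the database
const reduction = loadReductionFactor(0.93);  // about 14.3
console.log(atTarget.toFixed(0), degraded.toFixed(0), reduction.toFixed(1));
```

Dropping from a 93% to an 85% hit rate more than doubles the read load on the database, which is why a falling hit rate tends to show up first as database connection pressure.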
The write path (URL creation) follows a different route: the Application Service generates a Base62-encoded short code and writes the mapping to both the database and the cache (write-through). Click-tracking events are not produced here; they are published to Kafka from the redirect path. The write path tolerates higher latency (200-300ms) because a user creates a link once but every visitor waits on the redirect.
Under load, the simulation reveals that cache eviction rate is the critical metric. When the working set exceeds cache capacity, hit rate drops and database connections saturate, creating a cascading latency spike. The solution is to size the Redis cluster for the hot set (top 20% of URLs by access frequency account for 95% of traffic) rather than the entire URL corpus.
Step-by-Step Walkthrough
Pseudocode
```
// Redirect: the latency-critical read path
async function handleRedirect(shortCode: string):
    // 1. Cache lookup (fast path, 93% hit rate)
    cached = await redis.get(`url:${shortCode}`)
    if cached:
        // Note: a cached entry can outlive expires_at by up to the cache TTL
        originalUrl = cached.originalUrl
    else:
        // 2. Database fallback (slow path, 7% of requests)
        shard = resolveShard(shortCode)  // prefix-based routing
        row = await db[shard].query(
            "SELECT original_url, expires_at FROM urls WHERE short_code = $1",
            [shortCode]
        )  // ~12ms (index seek on short_code PK)
        if !row: return 404
        if row.expires_at < now(): return 410 Gone
        originalUrl = row.original_url
        // 3. Backfill cache for future requests, in the same shape the write path stores
        await redis.set(`url:${shortCode}`, { originalUrl }, { ttl: 86400 })

    // 4. Async analytics (fire-and-forget, <1ms)
    kafka.produce("click-events", {
        shortCode, timestamp: now(), referrer, userAgent, geo
    })
    // 302 rather than 301: browsers may cache a 301, so repeat clicks
    // would bypass the service and never be counted
    return redirect(302, originalUrl)  // ~35ms total (hit), ~50ms (miss)

// URL creation: write path
async function createShortUrl(originalUrl: string, customAlias?: string):
    shortCode = customAlias ?? base62Encode(nextId())
    shard = resolveShard(shortCode)
    // A custom alias that is already taken fails here on the short_code primary key
    await db[shard].execute(
        "INSERT INTO urls (short_code, original_url, created_at, expires_at) VALUES ($1,$2,now(),$3)",
        [shortCode, originalUrl, defaultExpiry()]
    )  // ~50ms
    await redis.set(`url:${shortCode}`, { originalUrl }, { ttl: 86400 })
    return { shortUrl: `https://tny.url/r/${shortCode}` }
```
Choice
Base62 encoding of auto-incremented IDs with range-based allocation
Rationale
Hash-based approaches (MD5, SHA-256 truncation) risk collisions and require collision detection loops. Auto-incremented IDs with Base62 encoding guarantee uniqueness without retries. Range-based allocation across shards avoids a single-point-of-failure counter service while maintaining global uniqueness.
Choice
Redis cluster with LRU eviction and 24-hour TTL
Rationale
With a 100:1 read-to-write ratio, the cache absorbs the vast majority of traffic. Redis provides sub-millisecond reads and supports cluster mode for horizontal scaling. LRU eviction naturally keeps the most popular URLs hot, and a 24-hour TTL ensures stale entries are eventually refreshed.
Choice
Range-based sharding on the short code prefix
Rationale
Range-based partitioning enables efficient batch reads for analytics and simplifies shard routing since the short code itself encodes the target shard. Unlike hash-based partitioning, range-based allows ordered scans and makes shard rebalancing more predictable.
Choice
Async event streaming via Kafka with batch aggregation
Rationale
Tracking every click synchronously would add 10-50ms to the redirect path and create a tight coupling between the redirect service and analytics storage. Async streaming via Kafka decouples these concerns, provides durability for click events, and enables downstream consumers (real-time dashboards, batch ETL) to process independently.
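| Metric | Target |
|---|---|
| Target RPS | 1,200 reads/s, 12 writes/s |
| Latency (p99) | <50ms (redirect) |
| Storage | ~50 GB/year (100K URLs/day) |
| Availability | 99.99% |

The RPS targets are roughly ten times the averages implied by 10M redirects and 100K writes per day (about 116 reads/s and 1.2 writes/s).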
This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.
If using hash-based generation (e.g., MD5 truncation), you must check the database for existing entries with the same short code and retry with a different salt or append a counter. The better approach is to avoid collisions entirely by using auto-incremented IDs encoded in Base62, which guarantees uniqueness by construction and eliminates the collision-detection overhead.
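For contrast, the hash-and-retry loop that the counter-based approach avoids might look like the sketch below. Several things here are assumptions made to keep the example dependency-free: FNV-1a stands in for MD5 or SHA-256 truncation, the attempt number is appended as the salt, base-36 is used for the code string, the `table` map plays the role of the database, and re-submitting the same URL returns its existing code.

```typescript
// FNV-1a 32-bit hash; a stand-in for a truncated cryptographic hash.
function fnv1a(input: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

// Derive a short code from the URL, retrying with a new salt on collision.
function assignShortCode(
  url: string,
  table: Map<string, string>,   // hypothetical code -> URL store
  hash: (s: string) => number,  // injected so collisions can be simulated
  maxAttempts: number
): string {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const code = hash(url + "#" + attempt).toString(36);
    const existing = table.get(code);
    if (existing === undefined) {
      table.set(code, url);
      return code;
    }
    if (existing === url) return code; // same URL already stored under this code
    // Otherwise a different URL owns this code: retry with the next salt.
  }
  throw new Error("could not find a free short code");
}
```

Every creation request pays for at least one existence check, and in a distributed deployment the check and the insert would need to be atomic, which is the overhead the paragraph above refers to.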
A well-tuned URL shortener should achieve a 90-95% cache hit rate. This is achievable because URL access follows a power-law distribution — a small percentage of URLs receive the vast majority of clicks. With LRU eviction and sufficient cache memory, the hot set stays resident. Below 85%, database load becomes a bottleneck; above 95% offers diminishing returns relative to cache cost.
Horizontal scaling involves three tiers: (1) Stateless application servers behind a load balancer — add instances as read traffic grows. (2) A distributed cache cluster (Redis Cluster) partitioned by short code — scales read throughput linearly with nodes. (3) A sharded database with range-based partitioning — each shard handles a subset of the key space. The key insight is that reads and writes scale independently because they have different resource profiles.
Either a relational database or a NoSQL key-value store works; the choice depends on query patterns. A key-value store (DynamoDB, Cassandra) is a natural fit because the primary access pattern is a point lookup by short code. If you need analytics queries (top URLs by click count, URLs created per day), a relational database with proper indexing provides richer query capabilities. Many production systems use a KV store for the redirect path and a relational database for analytics.
Abuse prevention requires multiple layers: (1) Rate limiting per API key and per IP address at the gateway. (2) URL scanning against malware and phishing blocklists before creation. (3) CAPTCHA or proof-of-work for unauthenticated creation requests. (4) Monitoring for bulk creation patterns and automated takedown of flagged URLs. (5) Reporting mechanisms that allow users to flag malicious short links for review.
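The first layer, per-key rate limiting, is often implemented as a token bucket. The sketch below is one possible shape under stated assumptions: the class names, the capacity and refill parameters, and the explicit `nowMs` argument (passed in so the behaviour is deterministic) are all choices made for this example rather than features of any particular gateway.

```typescript
// Token bucket: holds up to `capacity` tokens and refills at `refillPerSec`.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private capacity: number;
  private refillPerSec: number;

  constructor(capacity: number, refillPerSec: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity;
    this.lastRefillMs = nowMs;
  }

  // Returns true and spends one token if the request is allowed.
  tryConsume(nowMs: number): boolean {
    const elapsedSec = (nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// One bucket per API key or IP address.
class RateLimiter {
  private buckets = new Map<string, TokenBucket>();
  private capacity: number;
  private refillPerSec: number;

  constructor(capacity: number, refillPerSec: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
  }

  allow(key: string, nowMs: number): boolean {
    let bucket = this.buckets.get(key);
    if (bucket === undefined) {
      bucket = new TokenBucket(this.capacity, this.refillPerSec, nowMs);
      this.buckets.set(key, bucket);
    }
    return bucket.tryConsume(nowMs);
  }
}
```

The bucket capacity sets how large a burst a client may send, while the refill rate sets the sustained request rate. In a multi-instance gateway the bucket state would have to live in shared storage rather than in process memory.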
Base62 uses characters [a-z, A-Z, 0-9] to encode numeric IDs into short alphanumeric strings. A 7-character Base62 code supports 62^7 (approximately 3.5 trillion) unique URLs, far exceeding most production needs. The key trade-off versus Base64 is URL-safety: Base62 avoids '+' and '/' characters that require percent-encoding in URLs. Compared to UUID-based approaches, Base62 of auto-incremented IDs produces shorter codes and guarantees uniqueness without collision detection, but it does leak creation ordering information, which some services consider a privacy concern.
Start with the write volume: 100K new URLs/day equals roughly 36.5M URLs/year. Each URL record contains the short code (7 bytes), original URL (average 200 bytes), creation timestamp (8 bytes), expiry (8 bytes), and metadata (roughly 100 bytes), for approximately 323 bytes per record. At 36.5M records/year, that is about 11.8 GB/year of raw data. Adding indexes (B-tree on short code, secondary index on expiry) roughly doubles storage to about 24 GB/year, which fits within the ~50 GB/year storage figure quoted for this design. The Redis cache, storing the hot set of 10M URLs at roughly 250 bytes each, requires about 2.5 GB of memory.
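The estimate can be reproduced with a few lines of arithmetic. The constant names are hypothetical, the per-field sizes are the ones listed above, and the 2x index multiplier is the rough assumption stated in the text.

```typescript
const URLS_PER_DAY = 100000;
// short code + original URL + created_at + expires_at + metadata
const BYTES_PER_RECORD = 7 + 200 + 8 + 8 + 100; // 323

const recordsPerYear = URLS_PER_DAY * 365;                // 36,500,000
const rawBytesPerYear = recordsPerYear * BYTES_PER_RECORD; // 11,789,500,000
const withIndexesBytes = rawBytesPerYear * 2;              // assumed 2x index overhead

const HOT_SET_URLS = 10000000;
const BYTES_PER_CACHE_ENTRY = 250;
const cacheBytes = HOT_SET_URLS * BYTES_PER_CACHE_ENTRY;   // 2,500,000,000

const GB = 1e9; // decimal gigabytes
console.log((rawBytesPerYear / GB).toFixed(1), (withIndexesBytes / GB).toFixed(1), (cacheBytes / GB).toFixed(1));
```

Changing the average URL length or the metadata size shifts the result roughly linearly, so the record layout is the main lever in this estimate.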
With 7-character Base62 codes, the theoretical limit is 3.5 trillion URLs, which would take roughly 96,000 years to exhaust at 100K writes/day. In practice, the risk is shard-level exhaustion: with range-based ID allocation, a single shard can use up its pre-assigned range. The solution is dynamic range extension via a coordination service like ZooKeeper, which allocates new ranges to shards on demand. If you genuinely need more capacity, extending to 8 characters multiplies the address space by 62x. Interviewers ask this to test whether you understand the difference between theoretical limits and operational constraints.
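The keyspace figures above follow from a short calculation, sketched here with hypothetical helper names. The keyspace is built by repeated multiplication so the result stays an exact integer.

```typescript
// Number of distinct codes of a given length over a 62-character alphabet.
function keyspace(codeLength: number): number {
  let total = 1;
  for (let i = 0; i < codeLength; i++) total *= 62;
  return total;
}

// Years until the keyspace is used up at a constant write rate.
function yearsToExhaust(totalCodes: number, writesPerDay: number): number {
  return totalCodes / writesPerDay / 365;
}

const sevenChars = keyspace(7);                   // 3,521,614,606,208
const years = yearsToExhaust(sevenChars, 100000); // a little over 96,000
const eightChars = keyspace(8);                   // 62 times the 7-character keyspace
console.log(sevenChars, Math.round(years), eightChars / sevenChars);
```

The same function applied to a per-shard range size, instead of the global keyspace, gives the time until a shard needs a fresh range.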