The simplest possible URL shortener: one service pod, one database, no cache. Demonstrates why caching, load balancing, and horizontal scaling become essential as traffic grows beyond 500 RPS.
URL shortening is the canonical easy system design interview question, and the naive single-server approach is where every discussion begins. The question sounds deceptively simple: given a long URL, generate a short alias; given the short alias, redirect to the original URL. Three components — a client, a service, and a database — are the absolute minimum to make this work. The naive approach deliberately omits every optimization to establish a measurable baseline.
This architecture runs a single ECS Fargate task (1 pod, 5 threads) that talks directly to a PostgreSQL RDS instance with 50 max connections. Short codes are derived from random v4 UUIDs. The UUID's random bits are encoded as 7 Base62 characters, which avoids any need for a coordinated counter service or distributed lock. Taking the first 7 characters of the UUID's usual hexadecimal string form instead would give only 16^7 ≈ 268 million possible codes rather than 62^7 ≈ 3.5 trillion. Every redirect lookup hits the database directly: there is no cache between the service and PostgreSQL, no CDN in front of the service, and no load balancer distributing traffic across replicas.
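Here is a minimal Python sketch of the code-generation step. It assumes the UUID's 128-bit integer value is reduced modulo 62^7 and written as exactly 7 Base62 digits. The function name `short_code_from_uuid` and the alphabet ordering are illustrative assumptions, not the template's implementation. The two keyspace lines show why the encoding matters.

```python
import uuid
from typing import Optional

# Alphabet order is an illustrative choice; any fixed 62-symbol ordering works.
BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
CODE_LEN = 7

def short_code_from_uuid(u: Optional[uuid.UUID] = None) -> str:
    """Derive a fixed-width 7-character Base62 code from a random v4 UUID."""
    if u is None:
        u = uuid.uuid4()
    # A v4 UUID carries 122 random bits, far more than the ~41.7 bits that
    # 62**7 needs, so reducing modulo 62**7 leaves only negligible bias.
    n = u.int % (62 ** CODE_LEN)
    digits = []
    for _ in range(CODE_LEN):          # always emit exactly 7 digits (zero-padded)
        n, r = divmod(n, 62)
        digits.append(BASE62[r])
    return "".join(reversed(digits))

print(62 ** CODE_LEN)   # 3521614606208 (~3.5 trillion Base62 codes)
print(16 ** CODE_LEN)   # 268435456 (~268 million: a 7-char prefix of str(uuid4()) is hex)
```

Using a fixed width keeps every code exactly 7 characters, so the keyspace really is 62^7. Without padding, small values would produce shorter codes.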
The educational value emerges under load. At 100 RPS, redirect latency is a comfortable 25ms and the database runs at 20% utilization. At 500 RPS, the service's 5-thread pool begins saturating (theoretical max is ~625 sustained RPS at 8ms processing time), and p99 latency climbs to 80ms. At 1,000 RPS, the system crosses a critical threshold: the database connection pool fills up, thread pool exhaustion causes request queuing, and p99 latency exceeds the 100ms SLO. At 10,000 RPS — a realistic peak for even a moderately popular shortener — the database is at 95%+ utilization and the system is effectively non-functional.
This is precisely the inflection point that interviewers want candidates to identify. The 100:1 read-to-write ratio in URL shortening means the database handles 100 reads for every write. Without a cache to absorb repeated reads for popular URLs, the database is the sole bottleneck for the entire read path. Run this variant side-by-side with the Counter-Based (v1) variant to see the dramatic difference: adding a Redis cache drops database utilization from 95% to under 15% at the same traffic level, because 95% of reads are served from cache instead of hitting PostgreSQL.
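A few lines of arithmetic make the read/write split and the effect of a cache concrete. This Python sketch assumes that database load is simply the number of queries reaching PostgreSQL. `db_queries_per_sec` is a hypothetical helper; the 100:1 ratio and the 95% hit rate are the figures from the text.

```python
def db_queries_per_sec(total_rps: float, reads_per_write: float = 100.0,
                       cache_hit_rate: float = 0.0) -> float:
    """Queries per second that reach PostgreSQL.

    Writes always go to the database; only reads that miss the cache do.
    """
    read_share = reads_per_write / (reads_per_write + 1)   # 100:1 -> ~99% reads
    reads = total_rps * read_share
    writes = total_rps - reads
    return reads * (1 - cache_hit_rate) + writes

no_cache = db_queries_per_sec(10_000)
with_cache = db_queries_per_sec(10_000, cache_hit_rate=0.95)
print(round(no_cache), round(with_cache))   # 10000 594
print(round(no_cache / with_cache, 1))      # 16.8
```

At 10,000 RPS the cache cuts database queries by roughly 17x. If utilization scales with query volume, 95% falls to about 6%, which fits the "under 15%" figure.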
The naive URL shortener is a straight-line three-component architecture with zero branching, zero caching, and zero redundancy. Every request follows the same path: Client to UrlService to UrlDatabase.
For URL creation (POST /api/v1/shorten), the UrlService generates a random UUID and encodes it as a 7-character Base62 short code. This requires no coordination: no counter service, no distributed lock, no consensus protocol. The service performs an INSERT into the urls table with a UNIQUE constraint on short_code. If the new code collides with an existing one, the INSERT fails and the service retries with a new UUID. With n codes already stored, each new code collides with probability n / 62^7 (62^7 ≈ 3.5 trillion). That is negligible while the table is small but grows linearly with its size. The write path takes approximately 63ms: 8ms CPU processing + 5ms network + 50ms durable DB write.
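The create path with its collision retry might look like the following. In this sketch, SQLite's in-memory database stands in for PostgreSQL so that only the standard library is needed. `MAX_RETRIES` and the `create_short_url` signature are assumptions. The injectable `gen_code` makes the collision path easy to exercise.

```python
import sqlite3
from typing import Callable

MAX_RETRIES = 3  # hypothetical retry budget

def create_short_url(conn: sqlite3.Connection, long_url: str,
                     gen_code: Callable[[], str]) -> str:
    """Insert a mapping; on a UNIQUE violation, retry with a freshly generated code."""
    for _ in range(MAX_RETRIES):
        code = gen_code()
        try:
            with conn:  # commits on success, rolls back on exception
                conn.execute(
                    "INSERT INTO urls (short_code, original_url, created_at) "
                    "VALUES (?, ?, CURRENT_TIMESTAMP)",
                    (code, long_url),
                )
            return code
        except sqlite3.IntegrityError:
            continue  # collision: the constraint rejected a duplicate short_code
    raise RuntimeError("no unique short code after retries")

# SQLite in memory stands in for PostgreSQL; the PRIMARY KEY enforces uniqueness.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE urls (short_code TEXT PRIMARY KEY, "
    "original_url TEXT NOT NULL, created_at TEXT NOT NULL)"
)
codes = iter(["abc1234", "abc1234", "xyz9876"])  # forces one collision
print(create_short_url(conn, "https://example.com/a", lambda: next(codes)))  # abc1234
print(create_short_url(conn, "https://example.com/b", lambda: next(codes)))  # xyz9876
```

The second call draws `abc1234` again, hits the constraint, and succeeds on its retry. The database constraint, not application logic, is what detects the collision.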
For redirect reads (GET /api/v1/redirect/{code}), the UrlService performs a SELECT query on PostgreSQL using the short_code primary key. The B-tree index ensures O(log n) lookup time, completing in approximately 10ms at low load. However, every single redirect hits the database; there is no cache to absorb repeated reads for viral links. With a 100:1 read-to-write ratio, about 99% of all requests (100 of every 101) are redirect reads, and every one of them goes to PostgreSQL.
The service runs on ECS Fargate with 2 vCPU, 4 GB memory, and only 5 threads. This deliberately constrained sizing creates a clear throughput ceiling at approximately 625 sustained RPS (5 threads ÷ 0.008 s of processing per request). The PostgreSQL instance allows 50 maximum connections. Once all are in use, new requests wait in a connection queue, adding queuing delay on top of base query latency.
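Both ceilings can be checked numerically. The thread ceiling counts only the 8ms of processing per request, as the text does. For the connection pool, this sketch applies Little's law (L = λW, a standard queueing identity rather than something the template states). The helper names are illustrative.

```python
def thread_ceiling_rps(threads: int, cpu_seconds_per_request: float) -> float:
    """Max sustained RPS when a request holds a thread only for its CPU time."""
    return threads / cpu_seconds_per_request

def busy_connections(rps: float, avg_query_seconds: float) -> float:
    """Little's law (L = lambda * W): average DB connections held by in-flight queries."""
    return rps * avg_query_seconds

def query_latency_that_fills_pool(pool_size: int, rps: float) -> float:
    """Average query time (seconds) at which every pooled connection is busy."""
    return pool_size / rps

print(thread_ceiling_rps(5, 0.008))                # 625.0
print(busy_connections(1_000, 0.010))              # 10.0 at the 10ms base read latency
print(query_latency_that_fills_pool(50, 1_000))    # 0.05 -> 50ms average query time
```

At the 10ms base latency, 1K RPS keeps only about 10 connections busy. The pool fills once rising database load pushes average query time toward 50ms, which is how database saturation and connection-pool exhaustion compound each other.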
There is zero redundancy at any layer. A service pod crash takes down 100% of capacity (not 17% as in the v1 variant with 6 pods). A database failure loses all data (no replicas). This simplicity is the point: three components, three failure modes, one obvious bottleneck.
Every request follows a single linear path with no branching. Redirect reads and URL creation both go directly to PostgreSQL — the only difference is the query type (SELECT vs INSERT). There is no cache to short-circuit reads, no load balancer to distribute traffic, and no CDN to serve at the edge. This simplicity makes bottleneck identification trivial: if the system is slow, the database is the bottleneck.
Step-by-Step Walkthrough
Pseudocode
// URL creation: 7-character Base62 short code derived from a random v4 UUID
async function createShortUrl(long_url):
  repeat MAX_RETRIES times:  // MAX_RETRIES: hypothetical retry budget
    short_code = base62(uuid_v4() mod 62^7, width = 7)  // Random, no coordination
    try:
      await db.execute(
        "INSERT INTO urls (short_code, original_url, created_at)
         VALUES ($1, $2, now())",
        [short_code, long_url]
      )  // ~50ms durable write
      return BASE_URL + "/" + short_code
    catch unique_violation:
      continue  // Collision on short_code: retry with a new UUID
  return 503  // Repeated collisions: give up
// Redirect: direct DB lookup every time
async function redirect(short_code):
  row = await db.execute(
    "SELECT original_url FROM urls WHERE short_code = $1",
    [short_code]
  )  // ~10ms indexed read, NO CACHE
  if not row: return 404
  return 301, Location: row.original_url
A single table with a B-tree primary key index. The schema is minimal: short_code as the lookup key, original_url as the redirect target, and timestamps for management. With no partitioning or sharding, the entire dataset lives on one PostgreSQL instance. At 100K URLs/day, the table grows approximately 1 GB/month.
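Here is a back-of-the-envelope check of the ~1 GB/month growth figure. It assumes ~200 bytes per row, the per-mapping figure from the complexity table. `raw_row_bytes` is an illustrative helper.

```python
def raw_row_bytes(urls_per_day: int, days: int, bytes_per_row: int = 200) -> int:
    """Raw row data only; excludes row headers, the short_code index, and page slack."""
    return urls_per_day * days * bytes_per_row

print(raw_row_bytes(100_000, 30) / 1e9)   # 0.6 GB of raw rows per month
```

Raw rows come to roughly 0.6 GB/month. PostgreSQL's per-row overhead and the B-tree index on short_code add to this, which plausibly accounts for the rest of the ~1 GB/month estimate.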
| Choice | Rationale |
|---|---|
| Random short codes from UUIDs (7 Base62 characters) | Random UUIDs require zero coordination: no counter service, no atomic increment, no distributed lock. The collision space of 62^7 (3.5 trillion) makes collisions negligible at low volume. The trade-off versus counter-based approaches is slightly longer codes and non-sequential IDs, which prevent key-range partitioning optimizations in the database. |
| Every read hits PostgreSQL directly | At low traffic (under 100 RPS), the database handles reads without issue. Adding a cache introduces complexity: invalidation logic, an additional component to deploy and monitor, and cache warming strategies. The deliberate omission makes the database bottleneck visible under load. Run at 1K RPS and watch DB utilization spike to see exactly when a cache becomes essential. |
| One ECS Fargate task with 5 threads | With only one service instance, there is nothing to load-balance. A load balancer adds latency (~1.5ms) and cost without benefit. This keeps the architecture minimal so the simulation focuses on the database bottleneck rather than load distribution. |
| Fixed capacity: 1 pod, 50 DB connections | Horizontal scaling requires a load balancer, health checks, connection pooling, and potentially database read replicas. Each adds a component and configuration surface area. Fixed capacity at 1 pod and 50 DB connections creates a clear ceiling that traffic will hit during simulation, making the scaling conversation concrete and measurable. |
| Metric | Value |
|---|---|
| Target RPS | ~500 sustained (ceiling) |
| Latency (p99) | ~50ms p99 redirect (low load) |
| Storage | ~12 GB/year (~1 GB/month) at 100K URLs/day |
| Availability | ~99% (single pod, no redundancy) |
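The availability target translates directly into a yearly downtime budget. This sketch uses the availability tiers that appear in the variant comparison (99%, 99.9%, 99.99%). `downtime_hours_per_year` is an illustrative helper that assumes a 365-day year.

```python
HOURS_PER_YEAR = 365 * 24  # 8760

def downtime_hours_per_year(availability: float) -> float:
    """Hours of allowed downtime per year at a given availability fraction."""
    return (1 - availability) * HOURS_PER_YEAR

for a in (0.99, 0.999, 0.9999):
    print(a, round(downtime_hours_per_year(a), 2))
# 0.99 87.6
# 0.999 8.76
# 0.9999 0.88
```

At ~99%, the single-pod design can be down for about 87.6 hours a year, over three and a half days, before it misses its target.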
| Operation | Time | Space | Notes |
|---|---|---|---|
| Create short URL (POST /shorten) | O(1) UUID generation + O(log n) DB INSERT with B-tree index update | O(1) per URL — one row (~200 bytes) per mapping | UUID generation is constant-time. DB write is dominated by index maintenance at large table sizes. |
| Redirect lookup (GET /redirect/{code}) | O(log n) B-tree index lookup on short_code PK | O(1) — returns a single row | No cache means every lookup is a full DB round trip. At 10K RPS, this saturates the connection pool. |
Single table storing all URL mappings. short_code is the primary key with a B-tree index for O(log n) lookups. No partitioning, no sharding — the entire dataset lives on one PostgreSQL instance. At 100K URLs/day, the table grows ~1 GB/month. Without a cache, every redirect is a SELECT by primary key against this table.
Indexes: PK B-tree on short_code
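UNIQUE constraint on short_code catches collisions between random codes. With n rows already stored, a new code collides with probability n / 62^7 (62^7 ≈ 3.5 trillion), which is negligible while the table is small. The constraint is the safety net that turns a collision into a retry rather than a duplicate mapping.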
| Scenario | Impact | Mitigation |
|---|---|---|
| A viral tweet generates 50K RPS to a single short URL | The database receives 50K reads/sec for one row. The connection pool is exhausted within seconds, requests for every other URL queue behind it, and the service returns 503 errors. | Add a Redis cache (v1 variant). The hot URL is cached after the first read, and subsequent reads are served from Redis in ~2ms with no further database load for that URL. |
| The single service pod crashes | 100% downtime. No requests can be served until ECS launches a replacement task, which takes roughly 30-60 seconds. | Add a load balancer with multiple pods (v1 variant). Losing 1 of 6 pods reduces capacity by ~17% instead of 100%. |
| PostgreSQL disk fills up | All writes (URL creation) fail. Reads continue until the connection pool is exhausted by error-handling overhead, and then a full outage follows. | Monitor disk usage and alert at 80% capacity. In production (v3 variant), sharded databases distribute storage across instances. |
| Short-code collision on generation | The INSERT fails with a UNIQUE constraint violation and the service retries with a new UUID. At low volume this adds ~10ms per collision. As the table grows, collisions become more frequent because the per-insert collision probability rises with the number of stored codes. | Counter-based IDs (v1 variant) eliminate collisions entirely. Each ID is unique by construction, so no retry logic is needed. |
| Component | Failure | Impact | Mitigation |
|---|---|---|---|
| UrlService | Thread pool exhaustion | All 5 threads occupied — new requests queue indefinitely. Latency spikes from 25ms to seconds. Cascading timeouts on client side. | Increase thread count, add pods behind a load balancer, or implement request shedding (reject with 503 when queue depth exceeds threshold). |
| UrlDatabase (PostgreSQL) | Connection pool exhaustion | 50 connections all in use — new queries wait for a free connection. Adds 50-500ms of queuing delay on top of query latency. | Increase max_connections (limited by PostgreSQL memory), add PgBouncer for connection pooling, or add a cache to reduce DB query volume. |
| UrlDatabase (PostgreSQL) | Instance crash | Total data loss if no backups. All reads and writes fail simultaneously. Complete system outage. | Enable automated backups, add a standby replica for failover. In production (v3), sharded databases with replicas provide redundancy. |
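The request-shedding mitigation for thread-pool exhaustion can be sketched as a bounded counter of in-flight requests. Once the limit is reached, new requests get an immediate 503 instead of joining an unbounded queue. `LoadShedder`, `handle`, and counting in-flight requests (rather than measuring queue depth directly) are assumptions of this sketch.

```python
import threading
from typing import Callable, Tuple

class LoadShedder:
    """Admission control: refuse new work once max_in_flight requests are active."""

    def __init__(self, max_in_flight: int):
        self._max = max_in_flight
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        with self._lock:
            if self._in_flight >= self._max:
                return False
            self._in_flight += 1
            return True

    def release(self) -> None:
        with self._lock:
            self._in_flight -= 1

def handle(shedder: LoadShedder, work: Callable[[], str]) -> Tuple[int, str]:
    if not shedder.try_acquire():
        return 503, "Service Unavailable"   # shed load instead of queuing indefinitely
    try:
        return 200, work()
    finally:
        shedder.release()                   # always free the slot, even if work() raises
```

Failing fast keeps latency bounded for the requests that are admitted, at the cost of explicit errors for the rest. That is usually preferable to the cascading client timeouts described for thread-pool exhaustion.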
Vertical scaling only. Upgrade the ECS task to more vCPUs and memory to increase service throughput. Upgrade the RDS instance to a larger class for more connections and I/O bandwidth. Both have hard ceilings: the largest ECS task is 16 vCPU / 120 GB, and the largest RDS instance handles ~10K connections. Horizontal scaling (adding pods) requires a load balancer, which fundamentally changes the architecture to the v1 variant. The naive design hits its scaling wall at approximately 500 RPS, after which you must add architectural components (cache, LB, replicas) rather than just scaling existing ones.
For the naive variant, monitoring focuses on the single database as the primary bottleneck. Key metrics: PostgreSQL active connections (alert at 40/50, critical at 48/50), query latency p99 (alert at 50ms, critical at 100ms), CPU utilization (alert at 70%, critical at 90%), disk I/O wait time, and WAL write throughput. Service-level metrics: thread pool utilization (active_threads / 5), request queue depth, error rate (5xx responses). Set up a dashboard showing database utilization vs. traffic RPS to identify the exact inflection point where the database becomes the bottleneck. At low traffic, all metrics are green — the value is running the simulation at increasing RPS to see when they turn yellow and red.
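The database alert thresholds can be encoded as a small classifier that maps each reading to green, yellow, or red. The threshold values come from the text. The metric names and the `classify` helper are hypothetical.

```python
from typing import Dict, Tuple

# (alert, critical) thresholds from the monitoring guidance; metric names are hypothetical.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "pg_active_connections": (40, 48),   # out of 50 max connections
    "query_latency_p99_ms": (50, 100),
    "db_cpu_percent": (70, 90),
}

def classify(metric: str, value: float) -> str:
    """Return 'green', 'yellow' (alert), or 'red' (critical) for one reading."""
    alert, critical = THRESHOLDS[metric]
    if value >= critical:
        return "red"
    if value >= alert:
        return "yellow"
    return "green"
```

Running this over each simulation step shows the point where metrics turn yellow and then red as RPS increases.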
The naive architecture is the cheapest option at low traffic: one ECS Fargate task (2 vCPU, 4 GB) at ~$60/month + one RDS PostgreSQL instance (db.r7g.large) at ~$120/month = ~$180/month total. No load balancer, no cache, no CDN overhead. However, that fixed cost buys a hard ceiling. At 500 RPS you are paying $180/month for a system already near its limits, and serving more traffic requires bigger instances or new components. The v1 variant (counter + cache) costs ~$400/month but handles 20K RPS, a 40x throughput improvement for roughly 2x the cost. The naive approach is cost-effective only below ~200 RPS sustained.
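Dividing each variant's throughput ceiling by its monthly cost makes the comparison concrete. The figures are the ones quoted in this section. `rps_per_dollar` is an illustrative helper.

```python
def rps_per_dollar(max_rps: float, monthly_cost: float) -> float:
    """Throughput ceiling bought per dollar of monthly spend."""
    return max_rps / monthly_cost

naive = rps_per_dollar(500, 180)      # naive single server
v1 = rps_per_dollar(20_000, 400)      # counter + cache variant
print(round(naive, 1), v1, round(v1 / naive))  # 2.8 50.0 18
```

Per dollar, the v1 variant buys about 18x more throughput ceiling than the naive design.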
The naive variant has no authentication layer (no API Gateway). All endpoints are publicly accessible, making them vulnerable to abuse: automated URL creation to exhaust storage, redirect-based phishing, and open-redirect attacks (using the shortener to disguise malicious URLs). Short codes are random, so they are hard to guess and, unlike sequential counters, do not leak creation volume. For production, add rate limiting (v1's API Gateway), URL validation (block known malicious domains), and abuse detection. HTTPS should be enforced at the service level since there is no CDN or gateway to terminate TLS.
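URL validation could start with a check like the one below, run before the INSERT. The blocklist contents, the 2048-character cap, and allowing only `http`/`https` are assumptions of this sketch. A static set stands in for whatever malicious-domain source a deployment would actually consult.

```python
from urllib.parse import urlsplit

BLOCKED_HOSTS = {"malware.example", "phish.example"}  # hypothetical blocklist

def validate_long_url(url: str, max_len: int = 2048) -> bool:
    """Accept only reasonably sized http(s) URLs whose host is not blocklisted."""
    if len(url) > max_len:
        return False
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False                      # rejects javascript:, data:, ftp:, etc.
    host = parts.hostname                 # urllib lowercases the hostname
    # Block listed domains and any of their subdomains.
    return not any(host == h or host.endswith("." + h) for h in BLOCKED_HOSTS)
```

Rejecting non-web schemes also closes off `javascript:` and `data:` URLs, which a shortener would otherwise hand to browsers inside a Location header.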
Deployment is straightforward: a single ECS task definition with one container. Rolling updates replace the old task with a new one, causing 30-60 seconds of downtime during the swap (no load balancer means no graceful draining). For zero-downtime deploys, you need at least two pods behind a load balancer (v1 variant). Database schema changes require careful handling — run migrations before deploying new code since there is only one database instance. Rollback: revert the ECS task definition to the previous version. There is no blue/green capability without a load balancer.
| Variant | Tier | Latency | Throughput | Cost | Complexity | Reliability |
|---|---|---|---|---|---|---|
| Naive (Single Server) | T1 | ~50ms p99 | ~500 RPS | ~$180/mo | 3 components | ~99% (single pod) |
| Counter-Based (Base62) | T2 | <100ms p99 | ~20K RPS | ~$400/mo | 6 components | ~99.9% |
| Production Multi-Region | T3 | ~2ms CDN hit | 100K+ RPS | ~$3,000/mo | 11 components | ~99.99% |
| Serverless (Lambda + DynamoDB) | T4 | <30ms warm | 10K+ RPS (auto) | $0-800/mo | 4 components | ~99.99% |
This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.
For low-traffic internal tools, the simplicity trade-off is worth it. A cache adds deployment, monitoring, and invalidation complexity. At under 100 RPS, PostgreSQL handles reads comfortably with 10ms latency. The cache becomes essential when traffic exceeds the database's capacity — this simulation helps identify that inflection point precisely.
At 10K RPS, the system fails in multiple ways simultaneously. The service's 5-thread pool is exhausted (max ~625 RPS sustained), causing request queuing. The database's 50-connection pool fills up, adding connection-wait latency. DB utilization exceeds 95%, query latency spikes from 10ms to 200ms+, and the 100ms SLO is violated for most traffic.
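With 7-character Base62 codes derived from UUIDs, the code space is ~3.5 trillion. By the birthday paradox, the chance of at least one collision reaches 50% after ~2.2 million URLs (~22 days at 100K/day). The UNIQUE constraint and retry keep those collisions from corrupting data, but each one costs a failed INSERT, and the per-insert collision rate keeps rising as the table grows. Counter-based approaches have zero collision risk since each ID is unique by construction, which makes them the better fit at scale.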
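The collision figures follow from the standard birthday-bound approximations. This sketch assumes codes are uniformly random over 62^7 values. The helper names are illustrative.

```python
import math

def birthday_50pct(keyspace: int) -> float:
    """Codes generated before P(at least one collision) reaches ~50%: sqrt(2 ln 2 * N)."""
    return math.sqrt(2 * math.log(2) * keyspace)

def p_any_collision(n: int, keyspace: int) -> float:
    """Standard approximation of P(at least one collision) among n random codes."""
    return 1 - math.exp(-n * (n - 1) / (2 * keyspace))

def p_new_code_collides(existing: int, keyspace: int) -> float:
    """Chance that the next random code hits one of `existing` stored codes."""
    return existing / keyspace

N = 62 ** 7
print(round(birthday_50pct(N) / 1e6, 1))              # 2.2 (million URLs)
print(round(birthday_50pct(N) / 100_000, 1))          # 22.1 (days at 100K URLs/day)
print(round(1 / p_new_code_collides(36_500_000, N)))  # 96483 (1 in ~96K per insert after a year)
```

After a year at 100K URLs/day, about 1 in 96,000 inserts would need a retry. The retry handles this, but it is never zero.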
Graduate when the simulation shows your SLOs are at risk: database utilization exceeding 80%, p99 latency exceeding your SLO, or single-pod failure causing unacceptable downtime. The comparison tool quantifies exactly how much headroom each improvement (cache, load balancer, replicas) provides.
A Redis cache. URL mappings are immutable (a short code always maps to the same long URL), making cache invalidation trivial — there is none. Adding a cache with 95% hit rate drops database utilization from 95% to under 15% at 1K RPS. This is the single highest-impact change you can make to this architecture.
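Because mappings never change, a read-through cache needs no invalidation. This sketch uses an in-process dict as a stand-in for Redis so it runs on its own. The class name and the hit/miss counters are illustrative, and a real Redis deployment would still need a memory limit and an eviction policy.

```python
from typing import Callable, Dict, List, Optional

class ReadThroughCache:
    """Dict-based stand-in for Redis; entries are never invalidated (mappings are immutable)."""

    def __init__(self, load: Callable[[str], Optional[str]]):
        self._load = load                 # e.g. the SELECT by short_code
        self._entries: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get(self, short_code: str) -> Optional[str]:
        if short_code in self._entries:
            self.hits += 1
            return self._entries[short_code]
        self.misses += 1
        url = self._load(short_code)
        if url is not None:               # this sketch does not cache 404s
            self._entries[short_code] = url
        return url

# Simulated viral link: 1,000 redirects for one code.
db = {"abc1234": "https://example.com/viral"}
db_reads: List[str] = []

def load(code: str) -> Optional[str]:
    db_reads.append(code)                 # count every query that reaches the "database"
    return db.get(code)

cache = ReadThroughCache(load)
for _ in range(1000):
    cache.get("abc1234")
print(cache.hits, cache.misses, len(db_reads))   # 999 1 1
```

Only the first redirect reaches the database; the other 999 are served from the cache, which is the hot-key relief described in the viral-link failure scenario.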