Easy · 3 components · Interview: High

TinyURL — Naive Single-Server

Q: Why would anyone build a URL shortener without a cache?

For low-traffic internal tools, the simplicity trade-off is worth it. A cache adds deployment, monitoring, and invalidation complexity. At under 100 RPS, PostgreSQL handles reads comfortably with 10ms latency. The cache becomes essential when traffic exceeds the database's capacity — this simulation helps identify that inflection point precisely.

Q: What happens at 10,000 RPS?

The system fails in multiple ways simultaneously. The service's 5-thread pool is exhausted (max ~625 RPS sustained), causing request queuing. The database's 50-connection pool fills up, adding connection-wait latency. DB utilization exceeds 95%, query latency spikes from 10ms to 200ms+, and the 100ms SLO is violated for most traffic.

Q: How does UUID collision risk compare to counter-based codes?

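With 7-character Base62 codes, the code space is 62^7, about 3.5 trillion. By the birthday bound, the chance that at least one collision has occurred reaches 50% after roughly 2.2 million URLs (about 22 days at 100K/day). That figure assumes the UUID is Base62-encoded before truncation; the first 7 characters of a UUID's standard hex form cover only 16^7, about 268 million codes, which brings the 50% point down to roughly 19,000 URLs. Counter-based approaches have zero collision risk since each ID is unique by construction. At scale, counters are strictly superior.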

Q: When should you graduate from naive to a production architecture?

Graduate when the simulation shows your SLOs are at risk: database utilization exceeding 80%, p99 latency exceeding your SLO, or single-pod failure causing unacceptable downtime. The comparison tool quantifies exactly how much headroom each improvement (cache, load balancer, replicas) provides.

Q: What is the first optimization you should add?

A Redis cache. URL mappings are immutable (a short code always maps to the same long URL), making cache invalidation trivial — there is none. Adding a cache with 95% hit rate drops database utilization from 95% to under 15% at 1K RPS. This is the single highest-impact change you can make to this architecture.

The simplest possible URL shortener: one service pod, one database, no cache. Demonstrates why caching, load balancing, and horizontal scaling become essential as traffic grows beyond 500 RPS.

Storage · Beginner · Bottleneck Analysis

Problem Statement

URL shortening is the canonical easy system design interview question, and the naive single-server approach is where every discussion begins. The question sounds deceptively simple: given a long URL, generate a short alias; given the short alias, redirect to the original URL. Three components — a client, a service, and a database — are the absolute minimum to make this work. The naive approach deliberately omits every optimization to establish a measurable baseline.

This architecture runs a single ECS Fargate task (1 pod, 5 threads) that talks directly to a PostgreSQL RDS instance with 50 max connections. Short codes are generated using random UUIDs (first 7 characters of a v4 UUID), which avoids any need for a coordinated counter service or distributed lock. Every redirect lookup hits the database directly — there is no cache between the service and PostgreSQL, no CDN in front of the service, and no load balancer distributing traffic across replicas.

The educational value emerges under load. At 100 RPS, redirect latency is a comfortable 25ms and the database runs at 20% utilization. At 500 RPS, the service's 5-thread pool begins saturating (theoretical max is ~625 sustained RPS at 8ms processing time), and p99 latency climbs to 80ms. At 1,000 RPS, the system crosses a critical threshold: the database connection pool fills up, thread pool exhaustion causes request queuing, and p99 latency exceeds the 100ms SLO. At 10,000 RPS — a realistic peak for even a moderately popular shortener — the database is at 95%+ utilization and the system is effectively non-functional.

This is precisely the inflection point that interviewers want candidates to identify. The 100:1 read-to-write ratio in URL shortening means the database handles 100 reads for every write. Without a cache to absorb repeated reads for popular URLs, the database is the sole bottleneck for the entire read path. Run this variant side-by-side with the Counter-Based (v1) variant to see the dramatic difference: adding a Redis cache drops database utilization from 95% to under 15% at the same traffic level, because 95% of reads are served from cache instead of hitting PostgreSQL.
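The effect of the read-to-write ratio and the cache hit rate on database load can be worked through numerically. The sketch below is a back-of-the-envelope model, not the simulator's own formula: it assumes every cache miss and every write costs exactly one database query, and the function names are illustrative.

```python
def read_fraction(reads_per_write: float) -> float:
    """Share of all requests that are reads for a given read:write ratio."""
    return reads_per_write / (reads_per_write + 1)


def db_queries_per_second(total_rps: float,
                          reads_per_write: float,
                          cache_hit_rate: float) -> float:
    """Queries reaching the database: cache misses plus all writes."""
    reads = total_rps * read_fraction(reads_per_write)
    writes = total_rps - reads
    return reads * (1 - cache_hit_rate) + writes


if __name__ == "__main__":
    print(f"read share at 100:1: {read_fraction(100):.1%}")
    for hit_rate in (0.0, 0.95):
        qps = db_queries_per_second(1_000, 100, hit_rate)
        print(f"hit rate {hit_rate:.0%}: {qps:.0f} DB queries/s")
```

Under these assumptions, 1,000 RPS with no cache sends all 1,000 requests to the database, while a 95% hit rate leaves about 59 queries per second: the misses plus the writes.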

Architecture Overview

The naive URL shortener is a straight-line three-component architecture with zero branching, zero caching, and zero redundancy. Every request follows the same path: Client to UrlService to UrlDatabase.

For URL creation (POST /api/v1/shorten), the UrlService generates a random UUID and takes the first 7 characters as the short code. This requires no coordination: no counter service, no distributed lock, no consensus protocol. The service performs an INSERT into the urls table, where the primary key on short_code enforces uniqueness. If a collision occurs (for a table already holding n codes, the chance that a new 7-character Base62 code is taken is about n in 3.5 trillion), the INSERT fails and the service retries with a new UUID. The write path takes approximately 63ms: 8ms CPU processing + 5ms network + 50ms durable DB write.

For redirect reads (GET /api/v1/redirect/{code}), the UrlService performs a SELECT query on PostgreSQL using the short_code primary key. The B-tree index ensures O(log n) lookup time, completing in approximately 10ms at low load. However, every single redirect hits the database; there is no cache to absorb repeated reads for viral links. With a 100:1 read-to-write ratio, about 99% of all requests (100 out of every 101) are redirect reads, and every one of them goes to PostgreSQL.

The service runs on ECS Fargate with 2 vCPU, 4 GB memory, and only 5 threads. This deliberately constrained sizing creates a clear throughput ceiling at approximately 625 sustained RPS (5 threads / 0.008s per request). The PostgreSQL instance allows 50 maximum connections — once all are in use, new requests wait in a connection queue, adding queuing delay on top of base query latency.
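The throughput ceiling quoted above is a simple capacity calculation. The sketch below reproduces it, assuming each request occupies one thread for the 8ms processing time only; if threads also block for the duration of the database call, the real ceiling would be lower. The function names are illustrative.

```python
def max_sustained_rps(threads: int, service_time_s: float) -> float:
    """Upper bound on throughput when each request occupies one thread."""
    return threads / service_time_s


def thread_utilization(rps: float, threads: int, service_time_s: float) -> float:
    """Fraction of thread capacity consumed at a given arrival rate."""
    return rps / max_sustained_rps(threads, service_time_s)


if __name__ == "__main__":
    print(f"ceiling: {max_sustained_rps(5, 0.008):.0f} RPS")
    for rps in (100, 500, 1_000, 10_000):
        share = thread_utilization(rps, 5, 0.008)
        print(f"{rps:>6} RPS -> {share:.0%} of thread capacity")
```

Any value above 100% means requests arrive faster than the pool can drain them, so the queue grows without bound until traffic drops.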

There is zero redundancy at any layer. A service pod crash takes down 100% of capacity (not 17% as in the v1 variant with 6 pods). A database failure loses all data (no replicas). This simplicity is the point: three components, three failure modes, one obvious bottleneck.

Request Flow — Direct Database Access

Every request follows a single linear path with no branching. Redirect reads and URL creation both go directly to PostgreSQL — the only difference is the query type (SELECT vs INSERT). There is no cache to short-circuit reads, no load balancer to distribute traffic, and no CDN to serve at the edge. This simplicity makes bottleneck identification trivial: if the system is slow, the database is the bottleneck.

Step-by-Step Walkthrough

1. Client sends POST /api/v1/shorten with the long URL
2. UrlService generates a random UUID and takes the first 7 characters as the short code (~1ms)
3. UrlService INSERTs the mapping into PostgreSQL's urls table (~50ms durable write with index update)
4. UrlService returns the short URL to the client (~63ms total)
5. Client visits the short URL via GET /api/v1/redirect/{code}
6. UrlService performs a SELECT by short_code primary key on PostgreSQL (~10ms indexed read)
7. UrlService returns HTTP 301 with the original URL in the Location header (~23ms total)
8. Every single redirect hits the database; there is no cache to absorb repeated reads for popular URLs
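The end-to-end totals in the walkthrough can be reproduced by summing the per-stage figures. The write-path breakdown (8ms CPU, 5ms network, 50ms write) comes from the architecture overview; applying the same CPU and network costs to the read path is an assumption, although it does match the ~23ms total given in step 7.

```python
# Per-stage latencies in milliseconds at low load.
CPU_MS = 8
NETWORK_MS = 5
DB_WRITE_MS = 50
DB_READ_MS = 10
SLO_MS = 100  # p99 latency objective used throughout this template


def path_latency_ms(*stages_ms: int) -> int:
    """Total latency of a strictly sequential request path."""
    return sum(stages_ms)


WRITE_PATH_MS = path_latency_ms(CPU_MS, NETWORK_MS, DB_WRITE_MS)
READ_PATH_MS = path_latency_ms(CPU_MS, NETWORK_MS, DB_READ_MS)

if __name__ == "__main__":
    print(f"write path: {WRITE_PATH_MS} ms")
    print(f"read path:  {READ_PATH_MS} ms")
    print(f"read headroom before the SLO: {SLO_MS - READ_PATH_MS} ms")
```

The headroom figure is the amount of queuing delay the read path can absorb before the 100ms SLO is breached.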

Pseudocode

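// URL creation: UUID-based short code generation
// base62() and MAX_RETRIES are illustrative helper names
async function createShortUrl(long_url):
    for attempt in 1..MAX_RETRIES:
        // Random, no coordination. Base62-encode before truncating:
        // a raw hex prefix would give only 16^7 (~268M) possible codes.
        short_code = base62(uuid_v4()).substring(0, 7)
        try:
            await db.execute(
                "INSERT INTO urls (short_code, original_url, created_at)
                 VALUES ($1, $2, now())",
                [short_code, long_url]
            )   // ~50ms durable write
            return BASE_URL + "/" + short_code
        catch UniqueViolation:
            continue   // collision on short_code: retry with a new UUID
    throw Error("could not allocate a unique short code")

// Redirect: direct DB lookup every time
async function redirect(short_code):
    row = await db.execute(
        "SELECT original_url FROM urls WHERE short_code = $1",
        [short_code]
    )   // ~10ms indexed read, NO CACHE
    if not row: return 404
    return 301, Location: row.original_url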
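For readers who want something executable, the pseudocode above can be approximated in Python. This is a minimal sketch under stated assumptions rather than the template's actual service: it swaps PostgreSQL for an in-memory SQLite database so it runs anywhere, Base62-encodes the UUID before truncating, and retries on a primary-key collision. `BASE_URL`, `new_short_code`, `open_db`, and the `code_factory` parameter are hypothetical names introduced for illustration.

```python
import sqlite3
import string
import uuid
from typing import Callable, Optional, Tuple

BASE_URL = "https://sho.rt"  # hypothetical domain
ALPHABET = string.digits + string.ascii_letters  # 62 symbols


def new_short_code(length: int = 7) -> str:
    """Base62-encode a random UUID's integer value and keep `length` digits."""
    n = uuid.uuid4().int
    chars = []
    for _ in range(length):
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(chars)


def open_db() -> sqlite3.Connection:
    """In-memory SQLite stand-in for the PostgreSQL urls table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE urls ("
        " short_code TEXT PRIMARY KEY,"
        " original_url TEXT NOT NULL,"
        " created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def create_short_url(conn: sqlite3.Connection,
                     long_url: str,
                     max_attempts: int = 3,
                     code_factory: Callable[[], str] = new_short_code) -> str:
    for _ in range(max_attempts):
        code = code_factory()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO urls (short_code, original_url) VALUES (?, ?)",
                    (code, long_url),
                )
            return f"{BASE_URL}/{code}"
        except sqlite3.IntegrityError:
            continue  # primary-key collision: retry with a fresh code
    raise RuntimeError("could not allocate a unique short code")


def redirect(conn: sqlite3.Connection,
             short_code: str) -> Tuple[int, Optional[str]]:
    """Return (status, location); every call goes to the database."""
    row = conn.execute(
        "SELECT original_url FROM urls WHERE short_code = ?", (short_code,)
    ).fetchone()
    if row is None:
        return 404, None
    return 301, row[0]
```

The `code_factory` parameter exists only so that a collision can be forced deterministically in a test; a production service would not need it.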

Database Schema

A single table with a B-tree primary key index. The schema is minimal: short_code as the lookup key, original_url as the redirect target, and timestamps for management. With no partitioning or sharding, the entire dataset lives on one PostgreSQL instance. At 100K URLs/day, the table grows approximately 1 GB/month.

Step-by-Step Walkthrough

1. short_code is the primary key: a 7-character string from a UUID v4 prefix, indexed via B-tree for O(log n) lookups
2. original_url stores the full long URL (up to ~2KB typical). TEXT type allows arbitrary length
3. created_at records when the mapping was created. Used for analytics and TTL-based cleanup
4. expires_at is optional; when set, a background job or query filter can skip expired URLs
5. No additional indexes beyond the PK; the only access pattern is lookup by short_code

Key Design Decisions

Short Code Generation

Choice

Random UUID (first 7 characters)

Rationale

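Random UUIDs require zero coordination: no counter service, no atomic increment, no distributed lock. A code space of 62^7 (about 3.5 trillion) makes collisions rare at low volume, provided the UUID is Base62-encoded before truncation; the first 7 characters of the standard hex representation would span only 16^7 (about 268 million) values. The trade-off versus counter-based approaches is slightly longer codes, the need for collision handling, and non-sequential IDs, which prevent key-range partitioning optimizations in the database.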
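The collision risk can be quantified with the standard birthday approximation, P(collision) ≈ 1 - exp(-n(n-1) / 2N) for n random codes drawn from a space of size N. The sketch below evaluates it for a 7-character Base62 space and, for comparison, a 7-character hex space; the function names are illustrative.

```python
import math


def code_space(alphabet_size: int, length: int) -> int:
    """Number of distinct codes of the given length."""
    return alphabet_size ** length


def collision_probability(n: int, space: int) -> float:
    """Approximate chance of at least one collision among n random codes."""
    return 1 - math.exp(-n * (n - 1) / (2 * space))


def codes_for_collision_probability(space: int, p: float = 0.5) -> float:
    """Approximate number of codes at which the collision chance reaches p."""
    return math.sqrt(2 * space * math.log(1 / (1 - p)))


if __name__ == "__main__":
    for name, alphabet in (("Base62", 62), ("hex", 16)):
        space = code_space(alphabet, 7)
        n_half = codes_for_collision_probability(space)
        print(f"{name}: space={space:,}  50% point ~ {n_half:,.0f} codes")
```

For Base62 the 50% point lands at roughly 2.2 million codes; for a hex prefix it is roughly 19,000, which is why the encoding step matters.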

No Caching Layer

Choice

Every read hits PostgreSQL directly

Rationale

At low traffic (under 100 RPS), the database handles reads without issue. Adding a cache introduces complexity: invalidation logic, an additional component to deploy and monitor, and cache warming strategies. The deliberate omission makes the database bottleneck visible under load — run at 1K RPS and watch DB utilization spike to show exactly when a cache becomes essential.

Single Pod, No Load Balancer

Choice

One ECS Fargate task with 5 threads

Rationale

With only one service instance, there is nothing to load-balance. A load balancer adds latency (~1.5ms) and cost without benefit. This keeps the architecture minimal so the simulation focuses on the database bottleneck rather than load distribution.

No Horizontal Scaling

Choice

Fixed capacity — 1 pod, 50 DB connections

Rationale

Horizontal scaling requires a load balancer, health checks, connection pooling, and potentially database read replicas. Each adds a component and configuration surface area. Fixed capacity at 1 pod and 50 DB connections creates a clear ceiling that traffic will hit during simulation, making the scaling conversation concrete and measurable.

Scale & Performance

Target RPS

~500 sustained (ceiling)

Latency (p99)

~50ms p99 redirect (low load)

Storage

~12 GB/year at 100K URLs/day

Availability

~99% (single pod, no redundancy)

Time & Space Complexity

Operation	Time	Space	Notes
Create short URL (POST /shorten)	O(1) UUID generation + O(log n) DB INSERT with B-tree index update	O(1) per URL — one row (~200 bytes) per mapping	UUID generation is constant-time. DB write is dominated by index maintenance at large table sizes.
Redirect lookup (GET /redirect/{code})	O(log n) B-tree index lookup on short_code PK	O(1) — returns a single row	No cache means every lookup is a full DB round trip. At 10K RPS, this saturates the connection pool.

Database Schema (HLD)

urls

Single table storing all URL mappings. short_code is the primary key with a B-tree index for O(log n) lookups. No partitioning, no sharding — the entire dataset lives on one PostgreSQL instance. At 100K URLs/day, the table grows ~1 GB/month. Without a cache, every redirect is a SELECT by primary key against this table.

short_code VARCHAR(7) PK
original_url TEXT NOT NULL
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
expires_at TIMESTAMPTZ

Indexes: PK B-tree on short_code

The primary key on short_code already enforces uniqueness, so a UUID collision surfaces as a failed INSERT. At low volume the per-insert collision probability is negligible (about n in 3.5 trillion for a table of n codes), but the constraint provides a safety net.
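The growth estimate for the urls table can be sanity-checked with simple arithmetic. The sketch below uses the ~200 bytes per row quoted in the complexity table and a 30-day month; the gap between the raw figure and the quoted ~1 GB/month is presumably index and storage overhead, which is an assumption rather than something the template states.

```python
def rows_per_month(urls_per_day: int, days: int = 30) -> int:
    """Rows added to the urls table in one month."""
    return urls_per_day * days


def raw_growth_gb(urls_per_day: int, bytes_per_row: int, days: int = 30) -> float:
    """Monthly growth in GB counting row payload only (no index overhead)."""
    return rows_per_month(urls_per_day, days) * bytes_per_row / 1e9


def implied_bytes_per_row(monthly_growth_gb: float, urls_per_day: int) -> float:
    """All-in bytes per row implied by a quoted monthly growth figure."""
    return monthly_growth_gb * 1e9 / rows_per_month(urls_per_day)


if __name__ == "__main__":
    print(f"rows/month: {rows_per_month(100_000):,}")
    print(f"raw payload: {raw_growth_gb(100_000, 200):.1f} GB/month")
    print(f"implied all-in row size: {implied_bytes_per_row(1.0, 100_000):.0f} bytes")
    print(f"yearly growth at 1 GB/month: {1.0 * 12:.0f} GB")
```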

What-If Scenarios

A viral tweet generates 50K RPS to a single short URL

Impact

The database receives 50K reads/sec for one row. Connection pool exhaustion within seconds, all other URLs affected by queuing. Service returns 503 errors.

Mitigation

Add a Redis cache (v1 variant). The hot URL would be cached after the first read, and subsequent reads served from Redis in 2ms — zero database load for that URL.
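The mitigation relies on the cache-aside read pattern. The sketch below illustrates the idea with a plain dictionary standing in for Redis, so it is not the v1 variant's actual code: `CacheAsideResolver` and `db_lookup` are hypothetical names, and there is no TTL, eviction, or protection against many concurrent misses for the same key.

```python
from typing import Callable, Dict, Optional


class CacheAsideResolver:
    """Resolve short codes, consulting the cache before the database."""

    def __init__(self, db_lookup: Callable[[str], Optional[str]]) -> None:
        self._db_lookup = db_lookup
        self._cache: Dict[str, str] = {}  # stand-in for Redis
        self.db_reads = 0

    def resolve(self, short_code: str) -> Optional[str]:
        cached = self._cache.get(short_code)
        if cached is not None:
            return cached  # cache hit: the database is not touched
        self.db_reads += 1  # cache miss: fall through to the database
        url = self._db_lookup(short_code)
        if url is not None:
            self._cache[short_code] = url  # populate for later reads
        return url


if __name__ == "__main__":
    table = {"abc1234": "https://example.com/viral"}
    resolver = CacheAsideResolver(table.get)
    for _ in range(50_000):
        resolver.resolve("abc1234")
    print(f"database reads for 50,000 redirects: {resolver.db_reads}")
```

In this single-threaded sketch only the first request reaches the database. A real deployment would see a brief burst of concurrent misses before the entry is populated.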

The single service pod crashes

Impact

100% downtime. No requests can be served until ECS detects the failed task and starts a replacement, which takes 30-60 seconds.

Mitigation

Add a load balancer with multiple pods (v1 variant). Losing 1 of 6 pods reduces capacity by 17% instead of 100%.

PostgreSQL disk fills up

Impact

All writes fail (URL creation). Reads continue until the connection pool is exhausted by error-handling overhead. Full outage follows.

Mitigation

Monitor disk usage, set up alerts at 80% capacity. In production (v3 variant), sharded databases distribute storage across instances.

UUID collision on short code generation

Impact

The INSERT fails with a UNIQUE constraint violation. The service retries with a new UUID. At low volume this adds ~10ms per collision; at high volume with a large table, collisions become more frequent.

Mitigation

Counter-based IDs (v1 variant) eliminate collisions entirely. Each ID is unique by construction — no retry logic needed.
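Counter-based codes are typically produced by Base62-encoding a monotonically increasing integer. The sketch below shows one plausible encoding; the alphabet order and the function name are assumptions, since the template does not specify how the v1 variant encodes its counter.

```python
import string

ALPHABET = string.digits + string.ascii_letters  # 0-9, a-z, A-Z


def base62_encode(n: int) -> str:
    """Encode a non-negative integer as a Base62 string."""
    if n < 0:
        raise ValueError("counter values must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


if __name__ == "__main__":
    for counter in (0, 61, 62, 62**7 - 1):
        print(counter, "->", base62_encode(counter))
```

Because the encoding is a bijection on non-negative integers, two different counter values can never yield the same code, and codes stay within 7 characters until the counter reaches 62^7.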

Failure Modes & Resilience

Component	Failure	Impact	Mitigation
UrlService	Thread pool exhaustion	All 5 threads occupied — new requests queue indefinitely. Latency spikes from 25ms to seconds. Cascading timeouts on client side.	Increase thread count, add pods behind a load balancer, or implement request shedding (reject with 503 when queue depth exceeds threshold).
UrlDatabase (PostgreSQL)	Connection pool exhaustion	50 connections all in use — new queries wait for a free connection. Adds 50-500ms of queuing delay on top of query latency.	Increase max_connections (limited by PostgreSQL memory), add PgBouncer for connection pooling, or add a cache to reduce DB query volume.
UrlDatabase (PostgreSQL)	Instance crash	Total data loss if no backups. All reads and writes fail simultaneously. Complete system outage.	Enable automated backups, add a standby replica for failover. In production (v3), sharded databases with replicas provide redundancy.
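The request-shedding mitigation listed for thread pool exhaustion can be sketched with a bounded queue: once the queue is full, new requests are rejected immediately instead of waiting indefinitely. This is an illustrative sketch only; `SheddingQueue`, the depth limit, and the 202 status for accepted work are hypothetical choices.

```python
import queue
from typing import Any


class SheddingQueue:
    """Bounded request queue that rejects work once it is full."""

    def __init__(self, max_depth: int) -> None:
        self._pending: "queue.Queue[Any]" = queue.Queue(maxsize=max_depth)

    def submit(self, request: Any) -> int:
        """Return 202 if the request was queued, 503 if it was shed."""
        try:
            self._pending.put_nowait(request)
        except queue.Full:
            return 503  # fail fast rather than queue indefinitely
        return 202

    def take(self) -> Any:
        """Called by a worker thread to pull the next request."""
        return self._pending.get_nowait()


if __name__ == "__main__":
    q = SheddingQueue(max_depth=2)
    print([q.submit(i) for i in range(3)])
```

Shedding trades a small number of fast failures for bounded latency on the requests that are accepted.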

Scaling Strategy

Vertical scaling only. Upgrade the ECS task to more vCPUs and memory to increase service throughput. Upgrade the RDS instance to a larger class for more connections and I/O bandwidth. Both have hard ceilings: the largest ECS task is 16 vCPU / 120 GB, and the largest RDS instance handles ~10K connections. Horizontal scaling (adding pods) requires a load balancer, which fundamentally changes the architecture to the v1 variant. The naive design hits its scaling wall at approximately 500 RPS, after which you must add architectural components (cache, LB, replicas) rather than just scaling existing ones.

Monitoring & Alerting

For the naive variant, monitoring focuses on the single database as the primary bottleneck. Key metrics: PostgreSQL active connections (alert at 40/50, critical at 48/50), query latency p99 (alert at 50ms, critical at 100ms), CPU utilization (alert at 70%, critical at 90%), disk I/O wait time, and WAL write throughput. Service-level metrics: thread pool utilization (active_threads / 5), request queue depth, error rate (5xx responses). Set up a dashboard showing database utilization vs. traffic RPS to identify the exact inflection point where the database becomes the bottleneck. At low traffic, all metrics are green — the value is running the simulation at increasing RPS to see when they turn yellow and red.
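The alert thresholds above translate directly into a small classification table. The sketch below encodes them in Python; the metric names are illustrative and do not correspond to CloudWatch or PostgreSQL metric identifiers, and treating a value equal to the threshold as breaching it is an assumption.

```python
from typing import Dict, Tuple

# metric name -> (alert threshold, critical threshold)
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "db_active_connections": (40, 48),    # out of 50 max connections
    "db_query_latency_p99_ms": (50, 100),
    "db_cpu_percent": (70, 90),
}


def classify(metric: str, value: float) -> str:
    """Map a metric reading to 'ok', 'alert', or 'critical'."""
    alert_at, critical_at = THRESHOLDS[metric]
    if value >= critical_at:
        return "critical"
    if value >= alert_at:
        return "alert"
    return "ok"


def thread_pool_utilization(active_threads: int, pool_size: int = 5) -> float:
    """Service-level metric: active_threads / pool size."""
    return active_threads / pool_size


if __name__ == "__main__":
    for connections in (20, 42, 49):
        print(connections, classify("db_active_connections", connections))
```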

Cost Analysis

The naive architecture is the cheapest option at low traffic: one ECS Fargate task (2 vCPU, 4 GB) at ~$60/month + one RDS PostgreSQL instance (db.r7g.large) at ~$120/month = ~$180/month total. No load balancer, no cache, no CDN overhead. However, cost-per-request increases non-linearly as the database becomes the bottleneck: at 500 RPS, you are paying $180/month for a system near its limits. The v1 variant (counter + cache) costs ~$400/month but handles 20K RPS, a 40x throughput improvement for about 2.2x the cost. The naive approach is cost-effective only below ~200 RPS sustained.
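The comparison becomes clearer when expressed as cost per million requests. The sketch below assumes each variant runs continuously at its quoted throughput ceiling for a 30-day month, which is an idealization; real traffic is bursty, so actual per-request costs would be higher for both.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000


def cost_per_million_requests(monthly_cost_usd: float, sustained_rps: float) -> float:
    """Monthly cost divided by millions of requests served in that month."""
    requests_millions = sustained_rps * SECONDS_PER_MONTH / 1_000_000
    return monthly_cost_usd / requests_millions


if __name__ == "__main__":
    naive = cost_per_million_requests(180, 500)
    v1 = cost_per_million_requests(400, 20_000)
    print(f"naive: ${naive:.3f} per million requests")
    print(f"v1:    ${v1:.4f} per million requests")
    print(f"ratio: {naive / v1:.0f}x")
```

Under these assumptions the naive variant costs about $0.14 per million requests against about $0.008 for v1, roughly an 18x difference at full utilization.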

Security Considerations

The naive variant has no authentication layer (no API Gateway). All endpoints are publicly accessible, making them vulnerable to abuse: automated URL creation to exhaust storage, redirect-based phishing, and open redirector attacks (using the shortener to disguise malicious URLs). Short codes are random UUIDs, which are not guessable — unlike sequential counters that leak creation volume. For production, add rate limiting (v1's API Gateway), URL validation (block known malicious domains), and abuse detection. HTTPS should be enforced at the service level since there is no CDN or gateway to terminate TLS.

Deployment Strategy

Deployment is straightforward: a single ECS task definition with one container. Rolling updates replace the old task with a new one, causing 30-60 seconds of downtime during the swap (no load balancer means no graceful draining). For zero-downtime deploys, you need at least two pods behind a load balancer (v1 variant). Database schema changes require careful handling — run migrations before deploying new code since there is only one database instance. Rollback: revert the ECS task definition to the previous version. There is no blue/green capability without a load balancer.

Real-World Examples

• Internal corporate link shorteners (< 100 users, low traffic)
• Hackathon prototypes and MVPs for URL shortening startups
• Academic projects demonstrating basic web service architecture
• Small-business marketing tools with manual link creation

Solution Comparison

Variant	Tier	Latency	Throughput	Cost	Complexity	Reliability
Naive (Single Server)	T1	~50ms p99	~500 RPS	~$180/mo	3 components	~99% (single pod)
Counter-Based (Base62)	T2	<100ms p99	~20K RPS	~$400/mo	6 components	~99.9%
Production Multi-Region	T3	~2ms CDN hit	100K+ RPS	~$3,000/mo	11 components	~99.99%
Serverless (Lambda + DynamoDB)	T4	<30ms warm	10K+ RPS (auto)	$0-800/mo	4 components	~99.99%

This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.

Related Templates

• TinyURL — Counter-Based (Base62)
• TinyURL — Production Multi-Region
• TinyURL — Serverless (Lambda + DynamoDB)
