A fully serverless URL shortener using AWS Lambda for compute and DynamoDB for storage. Zero idle cost, automatic scaling from 0 to 10K RPS, with cold-start latency as the primary trade-off.
The serverless URL shortener represents a fundamentally different design philosophy: instead of choosing pod counts, thread pools, and connection limits, you delegate capacity planning entirely to the cloud provider. AWS Lambda handles compute, DynamoDB handles storage, and API Gateway handles edge routing. The system scales from zero traffic (zero cost) to 10K RPS (automatically provisioned) without any manual intervention or capacity planning.
This architecture is increasingly relevant in system design interviews, especially at companies that heavily use AWS. Interviewers want to see that candidates can reason beyond traditional container-based architectures and articulate the serverless trade-offs: cold-start latency (100-500ms on first invocation after idle), per-invocation pricing (cost scales linearly with traffic), vendor lock-in (three AWS-specific services), and the cost crossover point where serverless becomes more expensive than dedicated infrastructure.
The core insight is that DynamoDB's native read performance (5ms for a key-value GetItem) eliminates the need for a separate caching layer. In the counter-based (v1) variant, Redis is essential because PostgreSQL reads take 10-20ms and the database has a finite connection pool. DynamoDB has no connection pool to exhaust and delivers cache-like latency for point reads. This removes an entire component from the architecture, reducing operational complexity while maintaining acceptable performance.
The four-component architecture (Client, API Gateway, Lambda, DynamoDB) is the minimum viable production deployment. API Gateway handles rate limiting and request validation. Lambda contains the URL shortening logic — generating short codes, reading/writing DynamoDB. DynamoDB stores URL mappings as key-value pairs with the short_code as the partition key, delivering consistent single-digit millisecond performance at any scale.
The trade-offs define when serverless is the right choice. Because every layer bills per request, cost is $0 when idle and grows linearly with average traffic. At the per-request rates listed below (API Gateway REST at $3.50 per million, Lambda at $0.20 per million plus compute, DynamoDB per request unit), each request per second of sustained traffic costs roughly $10/month, most of it API Gateway. A fixed deployment costing a few hundred dollars a month therefore breaks even at a few dozen sustained RPS, not thousands. The crossover depends heavily on traffic variability: workloads that sit idle most of the time and burst briefly strongly favor serverless, while sustained throughput favors containers such as ECS Fargate tasks.
Compare this variant with the Production (v3) variant to see the ultimate trade-off: serverless trades p99 latency (cold starts) and high-traffic cost efficiency for dramatic simplicity (4 vs 11 components) and large savings when average traffic is low and bursty.
The serverless URL shortener uses four managed AWS services, each scaling independently without manual intervention. There are no servers to provision, patch, or monitor at the OS level.
All traffic enters through AWS API Gateway (REST), which serves as the managed edge layer. API Gateway handles rate limiting (10K RPS cap), request validation, CORS headers, and SSL termination. It adds approximately 3ms of latency per request but provides a stable endpoint and abuse protection. API Gateway routes all requests matching /api/v1/* to the Lambda function.
The Lambda function (512 MB memory) contains the URL shortening logic. For URL creation (POST /api/v1/shorten), it generates a random 7-character short code, calls DynamoDB PutItem to store the mapping, and returns the short URL. For redirects (GET /api/v1/redirect/{code}), it calls DynamoDB GetItem with the short_code as the partition key and returns HTTP 301 with the original URL. Processing time is approximately 10ms per invocation (slightly higher than ECS due to Lambda runtime overhead).
Lambda's scaling model differs fundamentally from containers. Instead of pre-provisioned pods with thread pools, each Lambda execution environment handles one request at a time, and Lambda creates a new environment whenever all existing ones are busy, up to the account's regional concurrency limit (1,000 by default, raisable on request). A burst of 5,000 simultaneous (in-flight) requests therefore needs 5,000 environments, provided the concurrency limit allows it. The trade-off is cold-start latency: creating a fresh environment adds 100-500ms for initialization. Warm invocations (reusing an existing environment) add only ~5ms overhead.
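How many environments a given request rate needs follows from Little's law: concurrency ≈ request rate × average duration. The sketch below is an estimate, not a model of how Lambda schedules environments, and it assumes the latency figures used in this section (~18ms warm end-to-end, up to 500ms when a request pays a cold start).

```javascript
// Little's law: average in-flight requests = arrival rate (req/s) x time in system (s).
// Each Lambda execution environment serves one request at a time, so this
// approximates how many environments must exist concurrently.
function requiredConcurrency(rps, avgDurationMs) {
  return Math.ceil((rps * avgDurationMs) / 1000);
}

const steadyState = requiredConcurrency(10_000, 18);     // all environments warm
const worstCaseBurst = requiredConcurrency(10_000, 500); // every request pays a cold start

console.log({ steadyState, worstCaseBurst }); // { steadyState: 180, worstCaseBurst: 5000 }
```

At steady state, 10K RPS fits comfortably inside a 1,000-environment limit; it is the cold-start window, when request durations balloon, that pushes concurrency (and throttling risk) up.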
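DynamoDB stores URL mappings in a single table with short_code as the partition key. GetItem queries return in approximately 5ms with eventual consistency, which suits redirects because a mapping never changes after creation (a read issued within about a second of creation may briefly miss the new item). PutItem writes complete in approximately 10ms. On-demand capacity mode auto-scales without provisioned throughput settings: you pay per request unit, with no idle cost. Write request units are priced about five times higher than read request units, and an eventually consistent GetItem of an item up to 4 KB consumes only half a read request unit.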
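DynamoDB on-demand bills in request units rather than raw requests. A minimal sketch of the sizing rules for standard (non-transactional) reads and writes, assuming the ~200-byte item size used in this section; transactional requests, which consume twice as many units, are left out.

```javascript
const KB = 1024;

// Read request units for one GetItem: 1 unit per 4 KB (rounded up) if strongly
// consistent, half that if eventually consistent (the GetItem default).
function readRequestUnits(itemBytes, { consistent = false } = {}) {
  const units = Math.max(1, Math.ceil(itemBytes / (4 * KB)));
  return consistent ? units : units / 2;
}

// Write request units for one standard PutItem: 1 unit per 1 KB (rounded up).
function writeRequestUnits(itemBytes) {
  return Math.max(1, Math.ceil(itemBytes / KB));
}

// A ~200-byte URL mapping
console.log(readRequestUnits(200));                      // 0.5
console.log(readRequestUnits(200, { consistent: true })); // 1
console.log(writeRequestUnits(200));                     // 1
```

For redirects this means half a read request unit per request; only items larger than 4 KB, or strongly consistent reads, cost more.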
The absence of a caching layer is deliberate. DynamoDB's 5ms read latency for key-value lookups is comparable to Redis (2-3ms) but without the operational overhead of managing a cache cluster, sizing memory, handling eviction, or dealing with cache invalidation. For a cost-optimized architecture, eliminating components is as important as optimizing individual component performance. If sub-5ms reads become a hard requirement, DynamoDB Accelerator (DAX) adds a managed, API-compatible cache in front of the table. It is still a provisioned cluster that runs inside a VPC (so the Lambda function must be VPC-attached), but it avoids building and operating a separate Redis tier.
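The pay-per-request cost model operates at every layer: API Gateway ($3.50 per million REST API requests), Lambda ($0.20 per million invocations plus compute time), and DynamoDB (per request unit). At zero traffic, monthly cost is $0. Cost then scales with request volume: 1K RPS sustained is about 2.6 billion requests per month, so at the listed rates API Gateway alone comes to roughly $9,000/month and Lambda invocation charges to roughly $500/month, before volume-tier discounts. At 10K RPS sustained, the same arithmetic gives roughly ten times that.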
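A quick way to sanity-check monthly cost is to multiply request volume by the per-request rates. In this minimal sketch the API Gateway and Lambda request prices are the ones quoted above; the Lambda GB-second price ($0.0000166667, x86) and the DynamoDB read price ($0.125 per million read request units) are assumed us-east-1 list prices that should be checked against current pricing. Volume-tier discounts and write traffic are ignored.

```javascript
const SECONDS_PER_MONTH = 30 * 24 * 3600; // 2,592,000

// Request prices as quoted in this section; the GB-second and DynamoDB read
// prices are ASSUMED list prices (check current regional pricing).
const PRICES = {
  apiGatewayPerMillion: 3.5,
  lambdaPerMillion: 0.2,
  lambdaPerGbSecond: 0.0000166667,
  dynamoReadUnitPerMillion: 0.125,
};

// Monthly cost for a sustained average request rate (redirect-dominated traffic).
function monthlyCost(avgRps, { memoryGb = 0.5, billedMs = 10, readUnitsPerRequest = 0.5 } = {}) {
  const millions = (avgRps * SECONDS_PER_MONTH) / 1e6;
  const apiGateway = millions * PRICES.apiGatewayPerMillion;
  const lambda =
    millions * PRICES.lambdaPerMillion +
    millions * 1e6 * memoryGb * (billedMs / 1000) * PRICES.lambdaPerGbSecond;
  const dynamodb = millions * readUnitsPerRequest * PRICES.dynamoReadUnitPerMillion;
  return { apiGateway, lambda, dynamodb, total: apiGateway + lambda + dynamodb };
}

for (const rps of [0, 1, 1000]) {
  const c = monthlyCost(rps);
  console.log(`${rps} RPS -> ~$${Math.round(c.total)}/month`);
}
```

The model makes the structure of the bill visible: roughly $10 per sustained RPS per month, about 90% of it API Gateway, so the API Gateway tier is the first place to look when optimizing cost.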
The serverless data flow is the simplest of all four variants. Every request follows the same path: API Gateway validates and routes, Lambda processes, DynamoDB reads or writes. There is no cache to check, no load balancer to route through, no event stream to produce to. The simplicity is the design: fewer components means fewer failure modes, fewer configuration knobs, and fewer things to monitor. The trade-off is cold-start latency and the lack of multi-tier caching that the production variant provides.
Step-by-Step Walkthrough
Pseudocode
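```javascript
// Lambda handler (Node.js 18+, AWS SDK v3): single function for both endpoints
const { randomInt } = require("crypto");
const { DynamoDBClient, PutItemCommand, GetItemCommand } = require("@aws-sdk/client-dynamodb");

const dynamodb = new DynamoDBClient({}); // created once, reused by warm invocations
const BASE_URL = process.env.BASE_URL;   // assumed env var, e.g. "https://sho.rt"
const ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

// 7 base62 characters: 62^7 (~3.5 trillion) codes. A 7-char UUID prefix is hex: only 16^7.
const newShortCode = () =>
  Array.from({ length: 7 }, () => ALPHABET[randomInt(ALPHABET.length)]).join("");

exports.handler = async (event) => {
  const { httpMethod, pathParameters, body } = event;

  if (httpMethod === "POST" && event.path === "/api/v1/shorten") {
    let url;
    try {
      url = new URL(JSON.parse(body ?? "{}").url); // throws on bad JSON or a bad URL
    } catch {
      return { statusCode: 400, body: JSON.stringify({ error: "invalid url" }) };
    }
    if (url.protocol !== "http:" && url.protocol !== "https:") return { statusCode: 400 };

    for (let attempt = 0; attempt < 3; attempt++) {
      const short_code = newShortCode();
      try {
        await dynamodb.send(new PutItemCommand({
          TableName: "urls",
          Item: {
            short_code: { S: short_code },
            original_url: { S: url.href },
            created_at: { S: new Date().toISOString() },
            ttl: { N: String(Math.floor(Date.now() / 1000) + 86400 * 365) }, // 1 year
          },
          ConditionExpression: "attribute_not_exists(short_code)", // never overwrite
        })); // ~10ms
        return { statusCode: 200, body: JSON.stringify({ short_url: `${BASE_URL}/${short_code}` }) };
      } catch (err) {
        if (err.name !== "ConditionalCheckFailedException") throw err; // collision: retry
      }
    }
    return { statusCode: 500, body: JSON.stringify({ error: "could not allocate a code" }) };
  }

  if (httpMethod === "GET" && pathParameters?.code) {
    const { Item } = await dynamodb.send(new GetItemCommand({
      TableName: "urls",
      Key: { short_code: { S: pathParameters.code } },
    })); // ~5ms, eventually consistent by default
    // TTL deletion is asynchronous, so an expired item may still be returned
    const expired = Item?.ttl && Number(Item.ttl.N) < Date.now() / 1000;
    if (!Item || expired) return { statusCode: 404 };
    return { statusCode: 301, headers: { Location: Item.original_url.S } };
  }

  return { statusCode: 404 }; // unknown route
};

// Cost model (per 30-day month): 1 RPS sustained = ~2.59M requests.
// $3.50/M (API Gateway REST) + $0.20/M (Lambda) + ~$0.22 Lambda compute
// (512 MB x ~10ms at an assumed x86 GB-second list price) + DynamoDB reads
// comes to roughly $10/month per sustained RPS, before volume tiers:
//   Zero traffic: $0
//   1K RPS avg:   ~$10,000/month (API Gateway ~$9,100 of it)
//   10K RPS avg:  ~$100,000/month at list price (less with volume tiers)
```
DynamoDB's schema is intentionally minimal. A single table with short_code as the partition key serves all access patterns. There are no sort keys, no global secondary indexes, and no complex queries. This simplicity aligns with the serverless philosophy: the URL shortener's access pattern is a pure key-value lookup, and DynamoDB is optimized exactly for this. The TTL attribute enables automatic cleanup of expired URLs without a background job; because TTL deletion is asynchronous, reads should still treat an item whose ttl has passed as expired.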
Design Decisions
Choice
AWS Lambda (512 MB, pay-per-invocation)
Rationale
Lambda eliminates capacity planning. No pod counts, no thread pools, no auto-scaling rules. It scales from 0 to 10K concurrent invocations automatically once the account's concurrency limit (1,000 by default) has been raised. The trade-off is cold-start latency (100-500ms for the first request after idle) and higher per-request cost at sustained high traffic.
Choice
DynamoDB (on-demand mode) instead of PostgreSQL
Rationale
DynamoDB is a natural fit for URL mappings: the access pattern is a pure key-value lookup by partition key (short_code) with no joins or complex queries. GetItem returns in 5ms at any scale. No connection pool management, no max_connections tuning, no read replicas needed.
Choice
DynamoDB serves as both storage and fast-read layer
Rationale
DynamoDB's 5ms read latency is close enough to Redis (2-3ms) that a separate cache adds marginal value at significant operational cost. Eliminating components is a core serverless principle. DAX is available as an API-compatible managed cache if latency requirements tighten.
Choice
AWS API Gateway (REST) for rate limiting and routing
Rationale
Even serverless architectures need rate limiting and a stable endpoint. API Gateway provides these fully managed, adding ~3ms of latency but eliminating custom gateway code. It also handles CORS, API keys, and usage plans — features that would require significant Lambda code otherwise.
Choice
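Random 7-character short code instead of counter
Rationale
In a serverless architecture, maintaining a global counter requires an additional service (Redis or a DynamoDB conditional write). Random codes require no coordination: each Lambda invocation generates a short code independently. The characters should come from a 62-character alphabet ([0-9A-Za-z]): the first 7 characters of a UUID are hexadecimal, which allows only 16^7 ≈ 268 million codes, versus 62^7 ≈ 3.5 trillion for base62. Even with 3.5 trillion codes, the birthday bound makes at least one collision likely once a few million URLs exist, so the PutItem should be conditional (attribute_not_exists(short_code)) and retried with a fresh code on conflict. That retry is far cheaper operationally than running a counter service.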
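The collision arithmetic behind this choice can be sketched with the standard birthday approximation. The keyspace sizes follow directly from the alphabets; the URL counts below are illustrative, not measured.

```javascript
// Birthday approximation: P(at least one collision among n codes drawn
// uniformly at random from `keyspace` values) ~= 1 - exp(-n(n-1) / (2 * keyspace))
function collisionProbability(n, keyspace) {
  return 1 - Math.exp(-(n * (n - 1)) / (2 * keyspace));
}

const HEX7 = 16 ** 7;     // 268,435,456: first 7 characters of a UUID (hex digits)
const BASE62_7 = 62 ** 7; // 3,521,614,606,208: 7 characters from [0-9A-Za-z]

console.log(collisionProbability(20_000, HEX7).toFixed(2));        // "0.53"
console.log(collisionProbability(1_000_000, BASE62_7).toFixed(2)); // "0.13"
```

With a hex prefix, a collision is more likely than not after only ~20K URLs; with base62, it is still a real possibility at a million URLs. Collisions therefore have to be handled rather than ruled out: a conditional PutItem turns a collision into a retryable error instead of a silent overwrite of someone else's link.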
Target RPS
10K+ (auto-scales from zero)
Latency (p99)
<30ms warm, 200-500ms cold start
Storage
Unlimited (DynamoDB scales transparently)
Availability
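~99.9% SLA-backed (every request needs API Gateway and Lambda, each with a 99.95% regional SLA, plus DynamoDB at 99.99% for standard tables)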
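Because every request passes through API Gateway, Lambda, and DynamoDB in series, the SLA-backed availability is roughly the product of the three, treating failures as independent. A minimal sketch, using the regional monthly uptime commitments as we understand them (99.95% for API Gateway and Lambda, 99.99% for DynamoDB standard tables); check the current SLA pages before relying on them.

```javascript
// Serial dependency: a request succeeds only if every component is available
function serialAvailability(...availabilities) {
  return availabilities.reduce((acc, a) => acc * a, 1);
}

const composite = serialAvailability(0.9995, 0.9995, 0.9999); // API GW, Lambda, DynamoDB
const downtimeMinutesPerMonth = (1 - composite) * 30 * 24 * 60;

console.log((composite * 100).toFixed(2) + "%");                 // "99.89%"
console.log(Math.round(downtimeMinutesPerMonth) + " min/month"); // "48 min/month"
```

About 48 minutes of permitted monthly downtime is the SLA envelope, not a prediction of actual availability; multi-region failover is the lever for going beyond it.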
| Operation | Time | Space | Notes |
|---|---|---|---|
| Create short URL | O(1) short-code generation + O(1) DynamoDB PutItem | O(1) per URL: one DynamoDB item (~200 bytes) | ~23ms total (Lambda 10ms + DynamoDB PutItem 10ms + network 3ms). No secondary-index maintenance. |
| Redirect (warm Lambda) | O(1) DynamoDB GetItem by partition key | O(1) | ~18ms total (Lambda 10ms + DynamoDB GetItem 5ms + network 3ms). The partition key is hashed to locate the item's partition directly; no scan or query planning. |
| Redirect (cold Lambda) | O(1) DynamoDB GetItem + one-time environment init | O(1) | 200-500ms total. Init time depends on the runtime and deployment package size, not on data volume. Cold start is a one-time cost per Lambda environment; subsequent requests to the same environment are warm. |
Single DynamoDB table storing all URL mappings. short_code is the partition key; all access is by primary key (GetItem/PutItem). No sort key is needed since each short_code maps to exactly one URL. On-demand capacity mode auto-scales read and write throughput. The TTL attribute enables automatic expiration of old URLs without explicit cleanup jobs; deletion is asynchronous (typically within a few days of expiry), so expired items can still be read until they are removed.
Indexes: Partition key: short_code (hash index)
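DynamoDB pricing (on-demand): billed per request unit, with write request units priced about five times higher than read request units. A ~200-byte item costs one write request unit per PutItem and half a read request unit per eventually consistent GetItem, so 10K RPS of redirects consumes about 13 billion read request units a month, small next to the API Gateway bill at the same traffic. No connection pool: DynamoDB is an HTTP API, and the SDK retries automatically.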
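The single-table design can be expressed as the input objects for the DynamoDB CreateTable and UpdateTimeToLive operations. This sketch only builds plain objects and makes no AWS call; in practice they would be passed to `CreateTableCommand` and `UpdateTimeToLiveCommand` from `@aws-sdk/client-dynamodb`, or the same settings would be declared in CDK or Terraform.

```javascript
// Input for CreateTable: only key attributes are declared up front;
// original_url, created_at and ttl are schemaless attributes written by PutItem.
const createTableInput = {
  TableName: "urls",
  AttributeDefinitions: [{ AttributeName: "short_code", AttributeType: "S" }],
  KeySchema: [{ AttributeName: "short_code", KeyType: "HASH" }], // partition key, no sort key
  BillingMode: "PAY_PER_REQUEST", // on-demand capacity
};

// TTL is enabled separately, once the table exists, and points at the epoch-seconds attribute
const updateTtlInput = {
  TableName: "urls",
  TimeToLiveSpecification: { Enabled: true, AttributeName: "ttl" },
};

console.log(JSON.stringify(createTableInput.KeySchema));
```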
Traffic spikes from 100 RPS to 10K RPS in 10 seconds
Impact
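Lambda needs roughly RPS × request duration concurrent environments: about 200 at 10K RPS once warm (~18ms per request), but while new environments cold-start (100-500ms), in-flight requests pile up and concurrency can briefly reach several thousand, each new environment paying a cold start. The first wave of requests experiences high p99 latency; subsequent requests use warm environments and return to <30ms latency.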
Mitigation
Provisioned concurrency: pre-warm N Lambda environments to handle expected spikes. Trade-off: you pay for provisioned capacity even during idle periods, partially negating the serverless cost advantage.
DynamoDB throttling during extreme write burst
Impact
DynamoDB on-demand mode instantly accommodates up to double the table's previous traffic peak; a burst beyond that within about 30 minutes may be throttled while capacity adjusts. Throttled writes return ProvisionedThroughputExceededException, and URL creation fails temporarily if retries are exhausted.
Mitigation
The AWS SDK has built-in exponential backoff and retry. For predictable spikes, switch to provisioned capacity set above the expected peak ahead of the event. The application handles DynamoDB errors gracefully with retry logic.
AWS region outage (us-east-1 down)
Impact
All components (API Gateway, Lambda, DynamoDB) are in one region. Complete service outage until the region recovers.
Mitigation
Multi-region deployment with DynamoDB Global Tables (cross-region replication). API Gateway custom domain with Route 53 health-check failover. Adds cost and complexity, but provides regional failover.
Lambda concurrency limit reached (default 1K concurrent)
Impact
New invocations are throttled with 429 errors. URL creation and redirects fail for excess traffic.
Mitigation
Request a concurrency limit increase from AWS (up to 10K+). Monitor concurrent executions and set CloudWatch alarms at 80% of limit.
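The SDK already retries throttled calls with exponential backoff; an application-level retry around a whole operation uses the same idea. A minimal "full jitter" backoff sketch, where the base delay, cap, attempt count, and the set of retryable errors are illustrative assumptions rather than SDK defaults.

```javascript
// Full-jitter exponential backoff: wait a random time in [0, min(cap, base * 2^attempt))
function backoffDelayMs(attempt, { baseMs = 50, capMs = 2000, random = Math.random } = {}) {
  return Math.floor(random() * Math.min(capMs, baseMs * 2 ** attempt));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls `fn`, retrying while `isRetryable(err)` holds, up to `maxAttempts` attempts in total
async function withRetry(fn, isRetryable, maxAttempts = 4) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt + 1 >= maxAttempts || !isRetryable(err)) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}

// In this sketch only DynamoDB throughput throttling is treated as retryable
const isThrottle = (err) => err?.name === "ProvisionedThroughputExceededException";
```

Randomizing the whole delay, rather than adding a little jitter to a fixed schedule, keeps thousands of concurrent Lambda environments from retrying in lockstep and re-triggering the same throttle.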
| Component | Failure | Impact | Mitigation |
|---|---|---|---|
| AWS Lambda | Cold start during traffic spike | First requests after idle experience 100-500ms latency. Affects p99 but not p50. No data loss or errors — just higher latency. | Provisioned concurrency keeps N environments warm. Monitor cold start rate. Schedule periodic keep-alive invocations for critical paths. |
| DynamoDB | Throughput throttling | Requests return ProvisionedThroughputExceededException. SDK retries with backoff. Some requests fail if retries are exhausted. | On-demand mode absorbs up to double the previous traffic peak immediately and adapts to larger increases over time. Built-in SDK retry with exponential backoff. Monitor ThrottledRequests, ConsumedReadCapacityUnits, and ConsumedWriteCapacityUnits. |
| API Gateway | Rate limit exceeded | Requests beyond the rate limit cap receive 429 Too Many Requests. Legitimate traffic may be rejected during unexpected spikes. | Set rate limit with 2x headroom above expected peak. Monitor 429 error rate. Use usage plans with per-client rate limits for better granularity. |
Scaling is fully automatic at every layer. Lambda: scales by creating new execution environments on demand (up to account concurrency limit). No configuration needed. DynamoDB: on-demand mode auto-scales read and write capacity based on traffic. No provisioned capacity to manage. API Gateway: managed service, scales automatically. The only manual scaling action is requesting Lambda concurrency limit increases from AWS. For predictable high-traffic events (marketing campaign launches), pre-warm Lambda with provisioned concurrency and optionally switch DynamoDB to provisioned capacity with auto-scaling for more predictable performance.
Serverless monitoring uses CloudWatch metrics natively since all components are AWS managed. Lambda: invocation count, duration (p50/p99), error count, throttle count, and concurrent executions. Cold-start percentage is not among Lambda's standard CloudWatch metrics; it can be derived from the Init Duration field, which appears in a function's REPORT log line only for invocations that initialized a new environment. DynamoDB: ConsumedReadCapacityUnits, ConsumedWriteCapacityUnits, ThrottledRequests, SuccessfulRequestLatency, SystemErrors. API Gateway: 4xx/5xx error rates, latency (p50/p99), request count, integration latency. Key dashboard: correlate Lambda cold start rate with p99 latency to quantify cold start impact. Alert on throttle counts at any layer. The most important operational metric is Lambda concurrent executions: it determines when you need to request a limit increase or add provisioned concurrency.
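A minimal sketch of deriving the cold-start rate from Lambda's per-invocation REPORT log lines. The sample lines are made up for illustration (request IDs shortened); in practice they would be read from the function's CloudWatch Logs log group.

```javascript
// Lambda writes one REPORT line per invocation; "Init Duration" appears only
// when the invocation had to initialize a new execution environment (cold start).
function coldStartStats(reportLines) {
  let invocations = 0;
  let coldStarts = 0;
  for (const line of reportLines) {
    if (!line.startsWith("REPORT ")) continue; // skip START/END and application logs
    invocations++;
    if (/Init Duration: [\d.]+ ms/.test(line)) coldStarts++;
  }
  return { invocations, coldStarts, rate: invocations ? coldStarts / invocations : 0 };
}

const sample = [
  "REPORT RequestId: a1\tDuration: 9.87 ms\tBilled Duration: 10 ms\tMemory Size: 512 MB\tMax Memory Used: 80 MB\tInit Duration: 312.45 ms",
  "REPORT RequestId: b2\tDuration: 8.12 ms\tBilled Duration: 9 ms\tMemory Size: 512 MB\tMax Memory Used: 81 MB",
  "START RequestId: c3 Version: $LATEST",
  "REPORT RequestId: c3\tDuration: 7.90 ms\tBilled Duration: 8 ms\tMemory Size: 512 MB\tMax Memory Used: 81 MB",
];
console.log(coldStartStats(sample)); // { invocations: 3, coldStarts: 1, rate: 0.3333333333333333 }
```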
The serverless model is pay-per-request at every layer. At zero traffic it costs $0/month: the entire stack costs nothing when idle. Above that, cost is proportional to average traffic. At the list rates used in this section (API Gateway REST $3.50 per million requests, Lambda $0.20 per million invocations plus roughly $0.0000167 per GB-second, DynamoDB per request unit), one request per second sustained for a 30-day month (about 2.6 million requests) costs roughly $10, about $9 of it API Gateway. That puts 1K RPS sustained near $10,000/month and 10K RPS near $100,000/month at list price; API Gateway volume tiers, and HTTP APIs (priced well below REST APIs), reduce this. The crossover with dedicated infrastructure (v1 at ~$400/month) therefore occurs at roughly 40 sustained RPS. Serverless wins when traffic is idle most of the time and bursts briefly: you pay only for the requests actually served, whereas a fixed deployment is sized for the peak and billed around the clock.
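Because pay-per-request cost follows average traffic while a fixed deployment is sized for the peak, the comparison reduces to a break-even average rate. This sketch assumes roughly $10/month per sustained RPS (the per-request arithmetic in this section) and uses the ~$400/month and ~$3,000/month figures for the other variants; the hourly traffic profile is hypothetical.

```javascript
// Average RPS at which pay-per-request spend equals a fixed monthly bill.
// perRpsMonth: ASSUMED monthly cost of 1 RPS of sustained serverless traffic.
function breakEvenAvgRps(fixedMonthly, perRpsMonth = 10) {
  return fixedMonthly / perRpsMonth;
}

// Average of an hourly traffic profile (requests per second during each hour)
function averageRps(hourlyRps) {
  return hourlyRps.reduce((sum, r) => sum + r, 0) / hourlyRps.length;
}

// Hypothetical day: idle, one 1-hour burst at 2K RPS, a light evening tail
const profile = Array(24).fill(0);
profile[12] = 2000;
profile.fill(20, 18, 23); // 18:00-22:59 at 20 RPS

const avg = averageRps(profile);
console.log(avg.toFixed(1), breakEvenAvgRps(400), breakEvenAvgRps(3000)); // 87.5 40 300
```

For this made-up profile (87.5 RPS average, 2K RPS peak), serverless would cost roughly $875/month: more than the ~$400/month counter-based deployment, but well under the production variant.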
API Gateway provides the security perimeter: rate limiting prevents abuse, request validation blocks malformed payloads. For authentication, API Gateway supports API keys, IAM authorization, or Cognito authorizers. Lambda execution roles follow least privilege: the function only has DynamoDB read/write permissions (GetItem, PutItem) for the urls table. DynamoDB encryption at rest is always enabled, using an AWS owned key by default, with AWS managed or customer managed KMS keys as options. All data in transit is encrypted via HTTPS (API Gateway enforces TLS). URL validation in the Lambda function blocks known malicious domains and open redirector patterns. The serverless model eliminates OS-level security concerns (patching, hardening) since AWS manages the underlying infrastructure.
Lambda deployment uses versioning and aliases. A new version is deployed alongside the current version. An alias (e.g., 'prod') points to the current version. After testing, the alias is updated to point to the new version (instant traffic shift). Canary deployments route a percentage of traffic to the new version using weighted aliases (e.g., 10% new, 90% old). Rollback: point the alias back to the previous version (instant, no redeployment). DynamoDB table changes are backward-compatible — add new attributes freely; removing or renaming attributes requires a phased migration. Infrastructure-as-code (CDK or Terraform) manages all resources for reproducible deployments.
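Weighted aliases are configured through the alias's routing configuration. The sketch below only builds the request object for `UpdateAliasCommand` from `@aws-sdk/client-lambda` and makes no AWS call; the function name `url-shortener` and the version numbers are hypothetical.

```javascript
// Keeps `stableVersion` as the alias's primary version and routes `weight`
// (0..1) of the alias's invocations to `canaryVersion`.
function canaryAliasInput(functionName, stableVersion, canaryVersion, weight) {
  if (!(weight >= 0 && weight <= 1)) throw new RangeError("weight must be in [0, 1]");
  return {
    FunctionName: functionName,
    Name: "prod",
    FunctionVersion: stableVersion,
    RoutingConfig: { AdditionalVersionWeights: { [canaryVersion]: weight } },
  };
}

// 10% of invocations through "prod" go to version 8, 90% stay on version 7
const canary = canaryAliasInput("url-shortener", "7", "8", 0.1);

// Promotion: point the alias at version 8 with no additional weights (empty map)
const promote = {
  FunctionName: "url-shortener",
  Name: "prod",
  FunctionVersion: "8",
  RoutingConfig: { AdditionalVersionWeights: {} },
};
```

Sending `new UpdateAliasCommand(canary)` through a `LambdaClient` would start the canary; promotion and rollback are the same call with a different `FunctionVersion`, which is why rollback needs no redeployment.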
| Variant | Tier | Latency | Throughput | Cost | Complexity | Reliability |
|---|---|---|---|---|---|---|
| Naive (Single Server) | T1 | ~50ms p99 | ~500 RPS | ~$180/mo | 3 components | ~99% (single pod) |
| Counter-Based (Base62) | T2 | <100ms p99 | ~20K RPS | ~$400/mo | 6 components | ~99.9% |
| Production Multi-Region | T3 | ~2ms CDN hit | 100K+ RPS | ~$3,000/mo | 11 components | ~99.99% |
| Serverless (Lambda + DynamoDB) | T4 | <30ms warm | 10K+ RPS (auto) | $0 idle, ~$10/mo per sustained RPS | 4 components | ~99.9% (SLA-backed) |
This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.
A cold start occurs when Lambda creates a new execution environment for a function that hasn't been invoked recently (typically 5-15 minutes of inactivity). It adds 100-500ms for container initialization, runtime bootstrapping, and code loading. Subsequent invocations reuse the warm environment (~5ms overhead). Cold starts are most visible after idle periods (overnight, weekends) and during sudden traffic spikes.
The cost crossover comes at a much lower average rate than intuition suggests. At list prices, API Gateway (REST), Lambda, and DynamoDB together cost roughly $10/month for each request per second of sustained traffic, so a ~$400/month deployment of equivalent ECS Fargate tasks breaks even at roughly 40 sustained RPS, and it also delivers better p99 latency. Below that, or when traffic is idle most of the day with short bursts, Lambda is cheaper because you pay nothing during idle periods.
DynamoDB is preferred because the access pattern is a pure key-value lookup with no joins or aggregations. DynamoDB GetItem returns in 5ms at any scale. Aurora Serverless adds SQL capabilities but with connection management overhead, scaling latency (~30 seconds for Aurora auto-scaling), and higher per-request costs for simple key-value reads.
The URL shortening business logic itself is portable: it is a small stateless function with nothing AWS-specific beyond its data access. The lock-in is in infrastructure glue: API Gateway routing, Lambda deployment, DynamoDB table schema. Mitigations: use infrastructure-as-code (CDK/Terraform), abstract the data access layer behind an interface, and keep business logic separate from AWS SDK calls.
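A cache can still be added if latency requirements tighten. DynamoDB Accelerator (DAX) provides a fully managed in-memory cache in front of DynamoDB and reduces read latency from ~5ms to sub-millisecond for cache hits. Application changes are small because the DAX client is API-compatible with the DynamoDB client, but DAX runs as a provisioned cluster inside a VPC, so the Lambda function must be VPC-attached. DAX costs ~$0.04/hour per node, justified only when volume or latency requirements demand it.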
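The serverless variant trades p99 latency (cold starts) and high-traffic cost efficiency for dramatic simplicity (4 vs 11 components) and large savings at low, variable traffic. For a workload averaging a few RPS with occasional bursts, serverless costs tens of dollars per month versus ~$3,000/month for the production variant. Because per-request charges come to roughly $10/month per sustained RPS, the production variant becomes cheaper somewhere around 300 sustained RPS, and at 100K sustained RPS it wins decisively on both cost and latency.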