Easy | 4 components | Interview: High

TinyURL — Serverless (Lambda + DynamoDB)

A fully serverless URL shortener using AWS Lambda for compute and DynamoDB for storage. Zero idle cost, automatic scaling from 0 to 10K RPS, with cold-start latency as the primary trade-off.

Tags: Serverless, DynamoDB, Cost-Optimized, Lambda
Problem Statement

The serverless URL shortener represents a fundamentally different design philosophy: instead of choosing pod counts, thread pools, and connection limits, you delegate capacity planning entirely to the cloud provider. AWS Lambda handles compute, DynamoDB handles storage, and API Gateway handles edge routing. The system scales from zero traffic (zero cost) to 10K RPS (automatically provisioned) without any manual intervention or capacity planning.

This architecture is increasingly relevant in system design interviews, especially at companies that heavily use AWS. Interviewers want to see that candidates can reason beyond traditional container-based architectures and articulate the serverless trade-offs: cold-start latency (100-500ms on first invocation after idle), per-invocation pricing (cost scales linearly with traffic), vendor lock-in (three AWS-specific services), and the cost crossover point where serverless becomes more expensive than dedicated infrastructure.

The core insight is that DynamoDB's native read performance (5ms for a key-value GetItem) eliminates the need for a separate caching layer. In the counter-based (v1) variant, Redis is essential because PostgreSQL reads take 10-20ms and the database has a finite connection pool. DynamoDB has no connection pool to exhaust and delivers cache-like latency for point reads. This removes an entire component from the architecture, reducing operational complexity while maintaining acceptable performance.

The four-component architecture (Client, API Gateway, Lambda, DynamoDB) is the minimum viable production deployment. API Gateway handles rate limiting and request validation. Lambda contains the URL shortening logic — generating short codes, reading/writing DynamoDB. DynamoDB stores URL mappings as key-value pairs with the short_code as the partition key, delivering consistent single-digit millisecond performance at any scale.

The trade-offs define when serverless is the right choice. Below 5K RPS average, serverless is 40-60% cheaper than dedicated infrastructure because you pay nothing during idle periods. Above 10K sustained RPS, Lambda's per-invocation pricing ($0.20 per million + compute time) exceeds the cost of equivalent ECS Fargate tasks. The cost crossover depends heavily on traffic variability: bursty workloads with long idle periods strongly favor serverless, while sustained high throughput favors containers.

Compare this variant with the Production (v3) variant to see the ultimate trade-off: serverless trades p99 latency (cold starts) and high-traffic cost efficiency for dramatic simplicity (4 vs 11 components) and ~80% cost reduction at spiky/variable traffic patterns.

Architecture Overview

The serverless URL shortener uses four managed AWS services, each scaling independently without manual intervention. There are no servers to provision, patch, or monitor at the OS level.

All traffic enters through AWS API Gateway (REST), which serves as the managed edge layer. API Gateway handles rate limiting (10K RPS cap), request validation, CORS headers, and SSL termination. It adds approximately 3ms of latency per request but provides a stable endpoint and abuse protection. API Gateway routes all requests matching /api/v1/* to the Lambda function.

The Lambda function (512 MB memory) contains the URL shortening logic. For URL creation (POST /api/v1/shorten), it generates a random 7-character short code from a UUID, calls DynamoDB PutItem to store the mapping, and returns the short URL. For redirects (GET /api/v1/redirect/{code}), it calls DynamoDB GetItem with the short_code as the partition key and returns HTTP 301 with the original URL. Processing time is approximately 10ms per invocation (slightly higher than ECS due to Lambda runtime overhead).

Lambda's scaling model differs fundamentally from containers. Instead of pre-provisioned pods with thread pools, Lambda runs each concurrent request in its own execution environment, up to the account concurrency limit (1,000 by default, raisable to 10K or more on request). A burst of 5,000 simultaneous requests therefore needs 5,000 Lambda environments, which requires a raised limit. The trade-off is cold-start latency: creating a fresh environment (no recent invocations) adds 100-500ms for container initialization. Warm invocations (reusing an existing environment) add only ~5ms overhead.
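
The cold/warm distinction can be observed from inside the function, because module-level code runs once per execution environment. The following is a minimal sketch under that assumption; `withColdStartFlag` and the `X-Cold-Start` header are hypothetical names introduced for illustration, not part of the original design.

```javascript
// Module scope runs once per execution environment, i.e. once per cold start.
let coldStart = true;

// Hypothetical wrapper: tags each response as cold or warm before returning it.
function withColdStartFlag(innerHandler) {
    return async (event, context) => {
        const wasCold = coldStart;
        coldStart = false; // every later invocation in this environment is warm
        const response = await innerHandler(event, context);
        return {
            ...response,
            headers: { ...(response.headers || {}), "X-Cold-Start": String(wasCold) },
        };
    };
}

// Placeholder inner handler; the real shortening logic would go here.
const handler = withColdStartFlag(async () => ({ statusCode: 200 }));

module.exports = { handler, withColdStartFlag };
```

Logging the same flag instead of returning it in a header would let a dashboard correlate cold starts with p99 latency.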

DynamoDB stores URL mappings in a single table with short_code as the partition key. GetItem queries return in approximately 5ms with eventual consistency (sufficient for URL redirects where the mapping is immutable). PutItem writes complete in approximately 10ms. On-demand capacity mode auto-scales without provisioned throughput settings — you pay per request ($1.25 per million reads, $1.25 per million writes) with no idle cost.

The absence of a caching layer is deliberate. DynamoDB's 5ms read latency for key-value lookups is comparable to Redis (2-3ms) but without the operational overhead of managing a cache cluster, sizing memory, handling eviction, or dealing with cache invalidation. For a cost-optimized architecture, eliminating components is as important as optimizing individual component performance. If sub-5ms reads become a hard requirement, DynamoDB Accelerator (DAX) provides an integrated caching layer without adding a separate component.

The pay-per-request cost model operates at every layer: API Gateway ($3.50 per million requests), Lambda ($0.20 per million invocations + compute time), DynamoDB (per-request pricing). At zero traffic, monthly cost is $0. This template's cost model estimates approximately $170/month at 1K RPS sustained and approximately $1,500/month at 10K RPS sustained.
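
How per-request pricing adds up across the three layers can be sketched with simple arithmetic. The unit prices below are the ones quoted in this section and are treated as assumptions (actual AWS pricing varies by region and over time). The sketch covers request fees only and leaves out Lambda compute time, data transfer, and free-tier allowances. The workload figures are hypothetical.

```javascript
// USD per million requests, as quoted in this section (assumptions).
const PRICE_PER_MILLION = {
    apiGateway: 3.5,
    lambdaInvocation: 0.2,
    dynamoRead: 1.25,
    dynamoWrite: 1.25,
};

// Request fees only; Lambda compute time (GB-seconds) is deliberately excluded.
function monthlyRequestFees({ requests, writeFraction }) {
    const millions = requests / 1e6;
    const writes = millions * writeFraction;
    const reads = millions - writes;
    const breakdown = {
        apiGateway: millions * PRICE_PER_MILLION.apiGateway,
        lambda: millions * PRICE_PER_MILLION.lambdaInvocation,
        dynamoDb: reads * PRICE_PER_MILLION.dynamoRead + writes * PRICE_PER_MILLION.dynamoWrite,
    };
    return { ...breakdown, total: breakdown.apiGateway + breakdown.lambda + breakdown.dynamoDb };
}

// Hypothetical workload: 10 million requests in a month, 1% of them writes.
console.log(monthlyRequestFees({ requests: 10_000_000, writeFraction: 0.01 }));
```

For this workload the sketch yields roughly $49.50 in request fees, most of it from API Gateway.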

Request Flow — Serverless Execution

The serverless data flow is the simplest of all four variants. Every request follows the same path: API Gateway validates and routes, Lambda processes, DynamoDB reads or writes. There is no cache to check, no load balancer to route through, no event stream to produce to. The simplicity is the design: fewer components means fewer failure modes, fewer configuration knobs, and fewer things to monitor. The trade-off is cold-start latency and the lack of multi-tier caching that the production variant provides.


Step-by-Step Walkthrough

  1. Browser sends request to API Gateway. Gateway checks rate limits (10K RPS cap), validates the request, and handles CORS.
  2. API Gateway invokes the Lambda function. If the function is warm (recent invocation), overhead is ~5ms. If cold (no recent invocation), initialization adds 100-500ms.
  3. For URL creation: Lambda generates a short code from the first 7 characters of a random UUID. No counter service is needed because each invocation generates its code independently.
  4. Lambda calls DynamoDB PutItem to store the mapping. DynamoDB writes typically complete in about 10ms. There is no connection pool: the SDK talks to an HTTP API and retries automatically.
  5. For redirect: Lambda calls DynamoDB GetItem with the short_code as partition key. Hash-based lookup returns in ~5ms regardless of table size.
  6. Lambda returns the response to API Gateway, which forwards it to the browser. Total latency: 18ms warm, 200-500ms cold.
  7. DynamoDB on-demand mode auto-scales with no capacity planning. Costs $1.25 per million reads and $1.25 per million writes.
  8. The entire stack costs $0/month at zero traffic, scaling linearly with usage.

Pseudocode

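// Lambda handler: single function for both endpoints (Node.js, AWS SDK v3)
const { randomUUID } = require("node:crypto")
const { DynamoDBClient, GetItemCommand, PutItemCommand } = require("@aws-sdk/client-dynamodb")

// Created once per execution environment and reused by warm invocations
const dynamodb = new DynamoDBClient({})
const BASE_URL = process.env.BASE_URL   // assumed to be configured on the function

exports.handler = async (event) => {
    const { httpMethod, pathParameters, body } = event

    if (httpMethod === "POST" && event.path === "/api/v1/shorten") {
        let url
        try {
            url = JSON.parse(body).url
        } catch {
            url = undefined   // missing or malformed JSON body
        }
        if (typeof url !== "string") {
            return { statusCode: 400, body: JSON.stringify({ error: "url is required" }) }
        }

        // First 7 hex characters of a random UUID; no counter service needed
        const short_code = randomUUID().substring(0, 7)

        try {
            await dynamodb.send(new PutItemCommand({
                TableName: "urls",
                Item: {
                    short_code: { S: short_code },
                    original_url: { S: url },
                    created_at: { S: new Date().toISOString() },
                    ttl: { N: String(Math.floor(Date.now()/1000) + 86400*365) }   // expires in one year
                },
                // Reject the write instead of overwriting an existing code
                ConditionExpression: "attribute_not_exists(short_code)"
            }))   // ~10ms
        } catch (err) {
            if (err.name !== "ConditionalCheckFailedException") throw err
            // Code collision: ask the client to retry, which generates a new code
            return { statusCode: 503, body: JSON.stringify({ error: "short code collision, retry" }) }
        }

        return { statusCode: 200, body: JSON.stringify({ short_url: BASE_URL + "/" + short_code }) }
    }

    if (httpMethod === "GET" && pathParameters?.code) {
        const result = await dynamodb.send(new GetItemCommand({
            TableName: "urls",
            Key: { short_code: { S: pathParameters.code } }
        }))   // ~5ms

        if (!result.Item) return { statusCode: 404, body: JSON.stringify({ error: "not found" }) }

        return {
            statusCode: 301,
            headers: { Location: result.Item.original_url.S }
        }
    }

    // No route matched
    return { statusCode: 404, body: JSON.stringify({ error: "unknown route" }) }
}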

// Cost model (per month):
// Zero traffic:  $0
// 1K RPS avg:    Lambda $60 + DynamoDB $100 + APIGW $10 = $170
// 5K RPS avg:    Lambda $300 + DynamoDB $400 + APIGW $50 = $750
// 10K RPS avg:   Lambda $600 + DynamoDB $800 + APIGW $100 = $1,500
DynamoDB Table Schema

DynamoDB's schema is intentionally minimal. A single table with short_code as the partition key serves all access patterns. There are no sort keys, no global secondary indexes, and no complex queries. This simplicity aligns with the serverless philosophy: the URL shortener's access pattern is a pure key-value lookup, and DynamoDB is optimized exactly for this. The TTL attribute enables automatic cleanup of expired URLs without a background job.


Step-by-Step Walkthrough

  1. short_code is the partition key. DynamoDB hashes it to determine which storage partition holds the item. All reads and writes are by this key.
  2. original_url stores the full redirect target URL. Size is not a practical concern because DynamoDB items can be up to 400 KB.
  3. created_at is an ISO8601 timestamp string. Used for analytics and audit but not indexed (no query by time needed).
  4. ttl is a DynamoDB Time-To-Live attribute (epoch seconds). DynamoDB deletes expired items in the background at no extra cost, so no Lambda cron job is needed. Deletion is not immediate (it can lag expiry by days), so a redirect that must honor expiry exactly should also compare ttl with the current time.
  5. No global secondary indexes: the only access pattern is GetItem/PutItem by partition key. Adding GSIs would increase cost and complexity without benefit for this use case.
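
The four attributes above can be assembled and checked with two small helpers. This is a sketch under stated assumptions: `buildUrlItem` and `isExpired` are hypothetical helper names, the one-year default mirrors the pseudocode handler, and the item uses the same attribute-value format as that handler.

```javascript
const SECONDS_PER_DAY = 86400;

// Builds an item in DynamoDB's attribute-value format.
function buildUrlItem(shortCode, originalUrl, now = new Date(), ttlDays = 365) {
    const expiresAt = Math.floor(now.getTime() / 1000) + ttlDays * SECONDS_PER_DAY;
    return {
        short_code: { S: shortCode },
        original_url: { S: originalUrl },
        created_at: { S: now.toISOString() },
        ttl: { N: String(expiresAt) }, // numbers are sent as strings
    };
}

// TTL deletion runs in the background, so GetItem can still return an expired
// item for a while; this check lets the caller treat it as missing.
function isExpired(item, now = new Date()) {
    return Number(item.ttl.N) <= Math.floor(now.getTime() / 1000);
}

module.exports = { buildUrlItem, isExpired };
```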
Key Design Decisions
Compute Layer

Choice

AWS Lambda (512 MB, pay-per-invocation)

Rationale

Lambda eliminates capacity planning. No pod counts, no thread pools, no auto-scaling rules. Scales automatically from zero up to the account concurrency limit (1,000 by default, 10K+ after a limit increase). The trade-off is cold-start latency (100-500ms first request after idle) and higher per-request cost at sustained high traffic.

Database Choice

Choice

DynamoDB (on-demand mode) instead of PostgreSQL

Rationale

DynamoDB is a natural fit for URL mappings: the access pattern is a pure key-value lookup by partition key (short_code) with no joins or complex queries. GetItem returns in 5ms at any scale. No connection pool management, no max_connections tuning, no read replicas needed.

No Separate Cache

Choice

DynamoDB serves as both storage and fast-read layer

Rationale

DynamoDB's 5ms read latency is close enough to Redis (2-3ms) that a separate cache adds marginal value at significant operational cost. Eliminating components is a core serverless principle. DAX is available as a drop-in integrated cache if latency requirements tighten.

API Gateway as Edge Layer

Choice

AWS API Gateway (REST) for rate limiting and routing

Rationale

Even serverless architectures need rate limiting and a stable endpoint. API Gateway provides these fully managed, adding ~3ms of latency but eliminating custom gateway code. It also handles CORS, API keys, and usage plans — features that would require significant Lambda code otherwise.

UUID-Based Short Codes

Choice

Random UUID (first 7 characters) instead of counter

Rationale

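In a serverless architecture, maintaining a global counter requires an additional service (Redis or DynamoDB conditional write). Random UUIDs require no coordination: each Lambda invocation generates a short code independently. Note that the first 7 characters of a UUID are hexadecimal, giving 16^7 (about 268 million) possible codes; the 3.5 trillion figure (62^7) applies only to 7-character base62 codes. In either case collisions become likely well before the code space fills up, so the write should be conditional on the code not already existing, with a retry on collision. That cost is acceptable for the operational simplicity gained.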
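
The size of the code space drives the collision risk discussed above. The following sketch, which assumes uniformly random codes, compares 7 hexadecimal characters with 7 base62 characters using the standard birthday-bound approximation. `randomBase62Code` is a hypothetical alternative generator shown for comparison, not part of the original design.

```javascript
const { randomInt } = require("node:crypto");

const BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

// Number of distinct codes for an alphabet size and code length (exact integer math).
function codeSpace(alphabetSize, length) {
    let total = 1;
    for (let i = 0; i < length; i++) total *= alphabetSize;
    return total;
}

// Birthday-bound approximation: probability that at least two of `stored`
// uniformly random codes collide within `space` possible codes.
function collisionProbability(stored, space) {
    return 1 - Math.exp(-(stored * (stored - 1)) / (2 * space));
}

// Hypothetical generator: `length` random base62 characters.
function randomBase62Code(length = 7) {
    let code = "";
    for (let i = 0; i < length; i++) code += BASE62[randomInt(BASE62.length)];
    return code;
}

console.log(codeSpace(16, 7)); // 268435456 (7 hex characters)
console.log(codeSpace(62, 7)); // 3521614606208 (7 base62 characters)
```

With one million stored URLs, the approximation puts at least one collision at near certainty for the hex space and at roughly 13% for the base62 space, which is why a conditional write with retry matters in both cases.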

Scale & Performance

Target RPS

10K+ (auto-scales from zero)

Latency (p99)

<30ms warm, 200-500ms cold start

Storage

Unlimited (DynamoDB scales transparently)

Availability

99.99% (AWS managed services SLA)

Time & Space Complexity
| Operation | Time | Space | Notes |
|---|---|---|---|
| Create short URL | O(1) UUID generation + O(1) DynamoDB PutItem | O(1) per URL: one DynamoDB item (~200 bytes) | ~23ms total (Lambda 10ms + DynamoDB PutItem 10ms + network 3ms). No index maintenance cost. |
| Redirect (warm Lambda) | O(1) DynamoDB GetItem by partition key | O(1) | ~18ms total (Lambda 10ms + DynamoDB GetItem 5ms + network 3ms). Hash-based lookup, no B-tree traversal. |
| Redirect (cold Lambda) | O(1) DynamoDB GetItem + one-time environment initialization | O(1) | 200-500ms total. Cold start is a one-time cost per Lambda environment; subsequent requests to the same environment are warm. |
Database Schema (HLD)
urls (DynamoDB)

Single DynamoDB table storing all URL mappings. short_code is the partition key — all access is by primary key (GetItem/PutItem). No sort key needed since each short_code maps to exactly one URL. On-demand capacity mode auto-scales read and write throughput. TTL attribute enables automatic expiration of old URLs without explicit cleanup jobs.

  • short_code STRING (Partition Key)
  • original_url STRING
  • created_at STRING (ISO8601)
  • ttl NUMBER (DynamoDB TTL, epoch seconds)

Indexes: Partition key: short_code (hash index)

DynamoDB pricing: $1.25 per million read units, $1.25 per million write units (on-demand). At 10K RPS, DynamoDB accounts for ~$800/month in this template's cost model. There is no connection pool: the SDK talks to an HTTP API and retries automatically.

What-If Scenarios

Traffic spikes from 100 RPS to 10K RPS in 10 seconds

Impact

Lambda creates new execution environments as fast as requests arrive, potentially thousands at once, because each in-flight cold start holds its environment for 100-500ms. The first wave of requests experiences high p99 latency. Subsequent requests reuse warm environments and return to <30ms latency.

Mitigation

Provisioned concurrency: pre-warm N Lambda environments to handle expected spikes. Trade-off: you pay for provisioned capacity even during idle periods, partially negating the serverless cost advantage.

DynamoDB throttling during extreme write burst

Impact

DynamoDB on-demand mode auto-scales, but it takes 1-2 minutes to adjust capacity. During that window, some writes may return ProvisionedThroughputExceededException. URL creation fails temporarily.

Mitigation

DynamoDB SDK has built-in exponential backoff and retry. For predictable spikes, switch to provisioned capacity with auto-scaling. The application handles DynamoDB errors gracefully with retry logic.
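
The SDK already retries throttled calls internally. The sketch below only illustrates the mechanism (exponential backoff with full jitter) and how an application-level wrapper might look once SDK retries are exhausted. `withBackoff`, its default values, and the set of error names treated as retryable are assumptions made for illustration.

```javascript
// Error names treated as retryable throttling (assumption for this sketch).
const THROTTLE_ERRORS = new Set([
    "ProvisionedThroughputExceededException",
    "ThrottlingException",
]);

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retries `operation` on throttling errors; other errors are rethrown immediately.
async function withBackoff(operation, { maxAttempts = 5, baseDelayMs = 50 } = {}) {
    for (let attempt = 1; ; attempt++) {
        try {
            return await operation();
        } catch (err) {
            const retryable = THROTTLE_ERRORS.has(err.name);
            if (!retryable || attempt >= maxAttempts) throw err;
            const ceiling = baseDelayMs * 2 ** (attempt - 1); // 50, 100, 200, ...
            await sleep(Math.random() * ceiling); // full jitter spreads out retries
        }
    }
}

module.exports = { withBackoff };
```

A caller would wrap a single DynamoDB call, for example `withBackoff(() => client.send(command))`, where `client` and `command` are the caller's own objects.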

AWS region outage (us-east-1 down)

Impact

All components (API Gateway, Lambda, DynamoDB) are in one region. Complete service outage until the region recovers.

Mitigation

Multi-region deployment with DynamoDB Global Tables (cross-region replication). API Gateway custom domain with Route 53 health-check failover. Adds cost and complexity, but provides regional failover.

Lambda concurrency limit reached (default 1K concurrent)

Impact

New invocations are throttled with 429 errors. URL creation and redirects fail for excess traffic.

Mitigation

Request a concurrency limit increase from AWS (up to 10K+). Monitor concurrent executions and set CloudWatch alarms at 80% of limit.
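
Whether the limit is at risk can be estimated with a rule of thumb: steady-state concurrency is roughly requests per second multiplied by average duration in seconds. The sketch below applies that estimate and derives the 80% alarm threshold mentioned above. The function names are hypothetical, and the durations are the warm and cold figures used elsewhere in this template.

```javascript
// Steady-state concurrency is approximately RPS x average duration (in seconds).
function requiredConcurrency(requestsPerSecond, avgDurationMs) {
    return Math.ceil((requestsPerSecond * avgDurationMs) / 1000);
}

// Alarm threshold as a fraction of the account concurrency limit.
function alarmThreshold(concurrencyLimit, fraction = 0.8) {
    return Math.floor(concurrencyLimit * fraction);
}

console.log(requiredConcurrency(10000, 10));  // 100  (warm, ~10ms per invocation)
console.log(requiredConcurrency(10000, 500)); // 5000 (every request cold, ~500ms)
console.log(alarmThreshold(1000));            // 800  (80% of the default limit)
```

The gap between the two estimates shows why a cold-start-heavy spike can hit the default limit even when warm steady-state traffic sits far below it.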

Failure Modes & Resilience
| Component | Failure | Impact | Mitigation |
|---|---|---|---|
| AWS Lambda | Cold start during traffic spike | First requests after idle experience 100-500ms latency. Affects p99 but not p50. No data loss or errors, just higher latency. | Provisioned concurrency keeps N environments warm. Monitor cold start rate. Schedule periodic keep-alive invocations for critical paths. |
| DynamoDB | Throughput throttling | Requests return ProvisionedThroughputExceededException. SDK retries with backoff. Some requests fail if retries are exhausted. | On-demand mode auto-scales within 1-2 minutes. Built-in SDK retry with exponential backoff. Monitor ConsumedReadCapacityUnits and ConsumedWriteCapacityUnits. |
| API Gateway | Rate limit exceeded | Requests beyond the rate limit cap receive 429 Too Many Requests. Legitimate traffic may be rejected during unexpected spikes. | Set rate limit with 2x headroom above expected peak. Monitor 429 error rate. Use usage plans with per-client rate limits for better granularity. |
Scaling Strategy

Scaling is fully automatic at every layer. Lambda: scales by creating new execution environments on demand (up to account concurrency limit). No configuration needed. DynamoDB: on-demand mode auto-scales read and write capacity based on traffic. No provisioned capacity to manage. API Gateway: managed service, scales automatically. The only manual scaling action is requesting Lambda concurrency limit increases from AWS. For predictable high-traffic events (marketing campaign launches), pre-warm Lambda with provisioned concurrency and optionally switch DynamoDB to provisioned capacity with auto-scaling for more predictable performance.

Monitoring & Alerting

Serverless monitoring uses CloudWatch metrics natively since all components are AWS managed.

  • Lambda: invocation count, duration (p50/p99), error count, throttle count, concurrent executions, cold start percentage.
  • DynamoDB: ConsumedReadCapacityUnits, ConsumedWriteCapacityUnits, ThrottledRequests, SuccessfulRequestLatency, SystemErrors.
  • API Gateway: 4xx/5xx error rates, latency (p50/p99), request count, integration latency.

Key dashboard: correlate Lambda cold start rate with p99 latency to quantify cold start impact. Alert on throttle counts at any layer. The most important operational metric is Lambda concurrent executions, because it determines when you need to request a limit increase or add provisioned concurrency.

Cost Analysis

The serverless model is pay-per-request at every layer.

  • Zero traffic: $0/month (the entire stack costs nothing when idle).
  • 1K RPS sustained: Lambda (~$60/month), DynamoDB (~$100/month), API Gateway (~$10/month) = ~$170/month total.
  • 5K RPS sustained: Lambda (~$300/month), DynamoDB (~$400/month), API Gateway (~$50/month) = ~$750/month.
  • 10K RPS sustained: Lambda (~$600/month), DynamoDB (~$800/month), API Gateway (~$100/month) = ~$1,500/month.

The crossover with dedicated infrastructure (v1 at ~$400/month) occurs around 3-5K sustained RPS. However, for bursty traffic (e.g., 500 RPS average with 10K RPS peaks), serverless is 60-80% cheaper because you only pay for actual invocations, not idle capacity.

Security Considerations

API Gateway provides the security perimeter: rate limiting prevents abuse, request validation blocks malformed payloads. For authentication, API Gateway supports API keys, IAM authorization, or Cognito authorizers. Lambda execution roles follow least privilege — the function only has DynamoDB read/write permissions for the urls table. DynamoDB encryption at rest is enabled by default (AWS-managed keys or customer-managed KMS keys). All data in transit is encrypted via HTTPS (API Gateway enforces TLS). URL validation in the Lambda function blocks known malicious domains and open redirector patterns. The serverless model eliminates OS-level security concerns (patching, hardening) since AWS manages the underlying infrastructure.
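
The URL validation step mentioned above might look like the following sketch. The blocklist entries and the service hostname `sho.rt` are hypothetical placeholders. A real deployment would load them from configuration or a reputation service, and would likely apply further checks.

```javascript
// Hypothetical values for illustration only.
const BLOCKED_HOSTS = new Set(["malware.example", "phishing.example"]);
const OWN_HOST = "sho.rt";

// Returns { ok: true, url } for acceptable targets, or { ok: false, reason }.
function validateTargetUrl(input) {
    let url;
    try {
        url = new URL(input); // throws on anything that is not an absolute URL
    } catch {
        return { ok: false, reason: "not a valid absolute URL" };
    }
    if (url.protocol !== "http:" && url.protocol !== "https:") {
        return { ok: false, reason: "only http and https are allowed" };
    }
    const host = url.hostname.toLowerCase();
    if (host === OWN_HOST) {
        return { ok: false, reason: "target points back at this service" };
    }
    if (BLOCKED_HOSTS.has(host)) {
        return { ok: false, reason: "blocked domain" };
    }
    return { ok: true, url: url.href };
}

module.exports = { validateTargetUrl };
```

Rejecting non-HTTP schemes up front keeps `javascript:` and `data:` targets out of the Location header.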

Deployment Strategy

Lambda deployment uses versioning and aliases. A new version is deployed alongside the current version. An alias (e.g., 'prod') points to the current version. After testing, the alias is updated to point to the new version (instant traffic shift). Canary deployments route a percentage of traffic to the new version using weighted aliases (e.g., 10% new, 90% old). Rollback: point the alias back to the previous version (instant, no redeployment). DynamoDB table changes are backward-compatible — add new attributes freely; removing or renaming attributes requires a phased migration. Infrastructure-as-code (CDK or Terraform) manages all resources for reproducible deployments.
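
A weighted alias can be expressed as the parameter object for Lambda's UpdateAlias operation. The sketch below only builds that object. The function name `url-shortener`, the alias `prod`, and the version numbers are hypothetical, and sending the request (for example with `UpdateAliasCommand` from `@aws-sdk/client-lambda`) is left as a comment.

```javascript
// Builds UpdateAlias parameters that keep most traffic on the stable version
// and route `canaryWeight` (0..1) of invocations to the canary version.
function canaryAliasParams({ functionName, alias, stableVersion, canaryVersion, canaryWeight }) {
    if (!(canaryWeight >= 0 && canaryWeight <= 1)) {
        throw new RangeError("canaryWeight must be between 0 and 1");
    }
    return {
        FunctionName: functionName,
        Name: alias,
        FunctionVersion: stableVersion, // receives the remaining traffic
        RoutingConfig: { AdditionalVersionWeights: { [canaryVersion]: canaryWeight } },
    };
}

const params = canaryAliasParams({
    functionName: "url-shortener", // hypothetical
    alias: "prod",
    stableVersion: "1",
    canaryVersion: "2",
    canaryWeight: 0.1, // 10% new, 90% old
});
console.log(JSON.stringify(params, null, 2));
// To apply: await lambdaClient.send(new UpdateAliasCommand(params))
```

Promoting the canary means pointing the alias at the new version with no additional weights. Rolling back means removing the additional weight while leaving the alias on the old version.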

Real-World Examples
  • AWS URL Shortener reference architecture (Lambda + DynamoDB)
  • Serverless Framework showcase applications
  • Internal tools at startups using AWS serverless stack
  • Event-driven short link generators for marketing campaigns (zero idle cost)
Solution Comparison
| Variant | Tier | Latency | Throughput | Cost | Complexity | Reliability |
|---|---|---|---|---|---|---|
| Naive (Single Server) | T1 | ~50ms p99 | ~500 RPS | ~$180/mo | 3 components | ~99% (single pod) |
| Counter-Based (Base62) | T2 | <100ms p99 | ~20K RPS | ~$400/mo | 6 components | ~99.9% |
| Production Multi-Region | T3 | ~2ms CDN hit | 100K+ RPS | ~$3,000/mo | 11 components | ~99.99% |
| Serverless (Lambda + DynamoDB) | T4 | <30ms warm | 10K+ RPS (auto) | $0-800/mo | 4 components | ~99.99% |

This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.

Frequently Asked Questions
What is Lambda cold-start latency and when does it occur?

A cold start occurs when Lambda creates a new execution environment for a function that hasn't been invoked recently (typically 5-15 minutes of inactivity). It adds 100-500ms for container initialization, runtime bootstrapping, and code loading. Subsequent invocations reuse the warm environment (~5ms overhead). Cold starts are most visible after idle periods (overnight, weekends) and during sudden traffic spikes.

When does serverless become more expensive than dedicated containers?

The cost crossover is between 5K and 10K sustained RPS. Below 5K RPS, Lambda is 40-60% cheaper because you pay nothing during idle periods. Above 10K sustained RPS, Lambda costs ~$600/month for compute alone, while equivalent ECS Fargate tasks cost ~$400/month with better p99 latency. Bursty workloads favor serverless even at higher average RPS due to zero idle cost.

Why DynamoDB instead of Aurora Serverless?

DynamoDB is preferred because the access pattern is a pure key-value lookup with no joins or aggregations. DynamoDB GetItem returns in 5ms at any scale. Aurora Serverless adds SQL capabilities but with connection management overhead, scaling latency (~30 seconds for Aurora auto-scaling), and higher per-request costs for simple key-value reads.

How do you mitigate vendor lock-in?

The URL shortening business logic is portable: it is ordinary code that happens to run inside a Lambda function. The lock-in is in the infrastructure glue: API Gateway routing, Lambda deployment, DynamoDB table schema. Mitigations: use infrastructure-as-code (CDK/Terraform), abstract the data access layer behind an interface, and keep business logic separate from AWS SDK calls.
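
Abstracting the data access layer could look like the sketch below: the business logic depends only on a two-method store contract (`put` and `get`), so a DynamoDB-backed store can be swapped for another backend. All names here are hypothetical, and the in-memory store stands in for a real implementation.

```javascript
// In-memory store with the same contract a DynamoDB-backed store would expose.
class InMemoryUrlStore {
    constructor() {
        this.items = new Map();
    }
    // Resolves to false when the code is already taken (like a failed conditional write).
    async put(shortCode, originalUrl) {
        if (this.items.has(shortCode)) return false;
        this.items.set(shortCode, originalUrl);
        return true;
    }
    async get(shortCode) {
        return this.items.get(shortCode) ?? null;
    }
}

// Business logic: knows nothing about AWS, only the store and a code generator.
async function shorten(store, generateCode, url, maxAttempts = 3) {
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
        const code = generateCode();
        if (await store.put(code, url)) return code;
    }
    throw new Error("could not allocate a unique short code");
}

async function resolveCode(store, code) {
    return store.get(code);
}

module.exports = { InMemoryUrlStore, shorten, resolveCode };
```

The same `shorten` and `resolveCode` functions can then be unit-tested without AWS and reused if the storage layer changes.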

Can you add caching without adding a component?

Yes. DynamoDB Accelerator (DAX) provides a fully managed in-memory cache in front of DynamoDB. DAX reduces read latency from 5ms to sub-millisecond with minimal application changes: swap the DynamoDB client for the API-compatible DAX client pointed at the cluster endpoint. DAX runs inside a VPC, so the Lambda function must be attached to that VPC. DAX costs ~$0.04/hour per node, justified only when volume or latency requirements demand it.

How does serverless compare to the Production (v3) variant?

The serverless variant trades p99 latency (cold starts) and high-traffic cost efficiency for dramatic simplicity (4 vs 11 components) and ~80% cost reduction at variable traffic. At 5K average RPS with bursty patterns, serverless costs ~$400/month vs ~$3,000/month for production. At 100K sustained RPS, the production variant wins on both cost and latency.
