Medium | 5 components | Interview: Very High

Video Streaming — Naive (Single Bitrate, No CDN)

The simplest possible video streaming architecture: a single service handles upload (proxied through the API server), synchronous transcoding (user waits 30+ minutes), single bitrate MP4 output (no adaptive streaming), and direct S3 origin delivery (no CDN). Demonstrates why async transcoding, adaptive bitrate, and CDN are non-negotiable at any meaningful viewer count.

Problem Statement

Video streaming is one of the most commonly asked system design interview questions because it combines massive bandwidth engineering, async processing pipelines, content delivery networks, and adaptive bitrate protocols into a single problem. Companies like YouTube, Netflix, Hulu, Twitch, and TikTok all ask variants of this question because it directly maps to their core engineering challenges: how do you deliver video to 100M+ concurrent viewers at 100+ Tbps of egress while keeping playback start latency under 1 second and buffering rate below 0.5%?

The naive approach uses the simplest possible architecture: a single monolithic service backed by PostgreSQL and S3. Uploaders send the entire video file body through the API server via a standard multipart form POST. The API server receives all bytes into memory (or streams to disk), then writes the file to S3. For a 2GB video, this means 2GB of server memory is consumed for the entire upload duration. At 100 concurrent uploads averaging 1GB each, the API tier needs 100GB of memory just for proxying bytes — and the server is doing nothing useful with those bytes, just relaying them to S3.

After the upload completes, the API server runs FFmpeg synchronously in the request path. The HTTP connection remains open while FFmpeg converts the raw video to a single 1080p MP4 file. A 1-hour video takes 15-45 minutes of CPU time to transcode, depending on the source format and resolution. During this time, the API thread is completely blocked — it cannot serve any other request. With 4 pods at 10 threads each, only 40 concurrent transcodes are possible before the entire system stops accepting requests, including video playback and metadata queries from existing viewers.

The transcoded output is a single 1080p MP4 file — no adaptive bitrate, no quality tiers, no HLS segments. Viewers on slow connections (3G mobile, congested Wi-Fi) must download the full 5 Mbps 1080p stream or buffer endlessly. There is no quality fallback: the player cannot switch to 480p or 240p because those variants do not exist. This makes the platform unusable for a significant fraction of the audience — anyone without a stable 5+ Mbps connection experiences constant buffering.

Viewer playback is served directly from S3 origin. The API server looks up the video's S3 URL in PostgreSQL and returns an HTTP 302 redirect. Every viewer then streams the full MP4 directly from S3. At 1,000 concurrent viewers watching 5 Mbps streams, S3 serves 5 Gbps of egress. At 10,000 viewers, 50 Gbps. At 100,000 viewers, 500 Gbps — far beyond S3's practical throughput limits. Popular videos trigger S3 503 SlowDown throttling errors. And the cost is catastrophic: S3 data transfer out is $0.09/GB, so 10,000 viewers watching for 2 hours costs $4,000 in egress alone.
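The egress arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch that uses only the figures quoted in the text (5 Mbps per stream, $0.09/GB); the function names are illustrative, and decimal gigabytes (1 GB = 1000 MB) are assumed.

```python
S3_EGRESS_USD_PER_GB = 0.09   # price quoted in the text
STREAM_MBPS = 5.0             # single 1080p bitrate from the text


def origin_egress_gbps(viewers: int, stream_mbps: float = STREAM_MBPS) -> float:
    """Aggregate origin bandwidth in Gbps when every viewer streams from S3."""
    return viewers * stream_mbps / 1000.0


def egress_cost_usd(viewers: int, hours: float,
                    stream_mbps: float = STREAM_MBPS,
                    usd_per_gb: float = S3_EGRESS_USD_PER_GB) -> float:
    """Data-transfer cost for `viewers` each watching `hours` of video."""
    # Mbps -> MB/s (divide by 8) -> MB/hour (x 3600) -> GB/hour (divide by 1000)
    gb_per_viewer_hour = stream_mbps / 8.0 * 3600.0 / 1000.0
    return viewers * hours * gb_per_viewer_hour * usd_per_gb


if __name__ == "__main__":
    for viewers in (1_000, 10_000, 100_000):
        gbps = origin_egress_gbps(viewers)
        cost = egress_cost_usd(viewers, hours=2)
        print(f"{viewers:>7} viewers: {gbps:g} Gbps, ${cost:,.0f} for 2 hours")
```

Both quantities are linear in viewer count, which is the core of the problem: without a cache in front of the origin, nothing amortizes the cost of a popular video.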

This template exists to make three fatal bottlenecks visible and measurable: (1) API bandwidth saturation from proxied uploads, (2) thread starvation from synchronous transcoding blocking all workers, and (3) catastrophic origin egress cost without CDN. Run the simulation at increasing viewer counts and watch S3 egress costs grow linearly while the API tier collapses under transcoding load. The comparison with V1 (CDN + async transcode) quantifies the improvement: presigned URLs eliminate upload proxying, Kafka workers free the API thread in seconds, and CloudFront absorbs 95% of viewer traffic at 5-9x lower cost per GB ($0.01-0.02/GB versus $0.09/GB).

Interviewers expect candidates to identify these three bottlenecks, propose async transcoding via a message queue, explain why CDN is non-negotiable for video delivery, and discuss adaptive bitrate streaming (HLS/DASH) as the solution to the single-bitrate problem.

Architecture Overview

The naive video streaming system is a four-component architecture: Client, VideoLB (Load Balancer), VideoService (monolith), and VideoDatabase (PostgreSQL), with VideoStorage (S3) as an attached object store. There is no CDN, no message queue, no async workers, no cache layer, and no separation between upload, transcoding, and playback paths.

All traffic arrives at VideoLB (AWS ALB), which distributes requests across VideoService pods using round-robin. The LB handles up to 15K RPS — well above the system's actual limits, which are constrained by thread exhaustion in the service tier and egress bandwidth from S3. The load balancer is never the bottleneck.

VideoService is a monolithic service handling three types of requests. Upload requests (5% of traffic): the client POSTs the full video body to the API server. The server buffers the entire file in memory, writes it to S3, then runs FFmpeg synchronously to transcode to a single 1080p MP4. The HTTP response is not sent until transcoding completes — 15 to 45 minutes later. Each upload consumes one thread for the entire duration plus the full file size in server memory. Metadata queries (15%): GET requests for video title, description, view count — simple PostgreSQL reads at 12ms per query. Playback redirects (80%): the server looks up the video's S3 URL in PostgreSQL and returns an HTTP 302 redirect. The viewer's player then streams the MP4 directly from S3.

VideoDatabase (PostgreSQL) is a single primary instance with no read replicas. It stores video metadata (title, description, S3 URL, status, view count) and user accounts. Every metadata query and playback redirect requires a database read. At 10K RPS peak, the database handles approximately 8,000 redirect lookups/sec + 1,500 metadata reads/sec + 500 upload writes/sec. The 200-connection pool can sustain this load, but transcoding threads hold connections open for 30+ minutes, gradually exhausting the pool.

VideoStorage (S3) serves dual duty: receiving uploads from the API server and serving all viewer playback traffic. There is no CDN in front of S3. Every viewer request hits the S3 origin directly. S3 can handle high request rates for small objects, but video files are large (500MB-5GB), and the combined bandwidth of thousands of concurrent streams quickly exceeds practical limits. S3 responds with 503 SlowDown errors when a single prefix receives too many concurrent GET requests.

The total end-to-end flow for an upload is: Client sends POST with video body (minutes to hours depending on file size and connection speed) -> VideoLB routes to VideoService -> VideoService buffers entire file in memory -> VideoService writes to S3 (minutes) -> VideoService runs FFmpeg synchronously (15-45 minutes) -> VideoService writes transcoded MP4 to S3 -> VideoService INSERTs video record in PostgreSQL -> VideoService returns 200 to client. The total response time for a 1-hour video upload is 20-60 minutes. During this entire time, the thread and its memory allocation are unavailable for any other request.

Request Flow — Upload, Transcode, and Playback

This sequence diagram traces the three primary flows: video upload with synchronous transcoding, metadata queries, and playback redirects. The critical insight is the 15-45 minute blocking period during transcoding — the API thread is completely unavailable. The second insight is that every viewer streams directly from S3 origin, with egress scaling linearly with viewer count.

Step-by-Step Walkthrough

1. Client uploads the full video body to the API server via POST. The server buffers the entire file in memory (2GB for a typical video). This consumes server memory and bandwidth for the upload duration.
2. The server writes the raw file to S3 and then runs FFmpeg synchronously. The thread is blocked for 15-45 minutes. No other requests can be processed on this thread during transcoding.
3. After transcoding, the server writes the 1080p MP4 to S3, inserts a video record in PostgreSQL, and finally returns 200 to the client. The user has been waiting 15-45 minutes for this response.
4. Viewers request playback. The server looks up the S3 URL in PostgreSQL (12ms) and returns a 302 redirect. The viewer's player then streams the MP4 directly from S3 origin, with no CDN and no edge caching.
5. Every concurrent viewer adds 5 Mbps of S3 origin egress. At 1,000 viewers: 5 Gbps. At 10,000: 50 Gbps. S3 throttles popular objects and the cost is $0.09/GB.

Pseudocode

// UPLOAD: synchronous transcoding (THE BOTTLENECK)
async function uploadVideo(title, description, fileBody):
    videoId = uuid()  // one ID shared by the S3 keys, temp files, and DB row

    // Step 1: Buffer entire file in memory (2GB for a typical video)
    videoData = await readRequestBody(fileBody)  // O(N) memory

    // Step 2: Write raw file to S3, plus a local copy for FFmpeg to read
    rawKey = "raw/" + videoId + ".mp4"
    rawPath = "/tmp/" + videoId + "_raw.mp4"
    outputPath = "/tmp/" + videoId + "_1080p.mp4"
    await s3.putObject(rawKey, videoData)  // minutes for large files
    await writeFile(rawPath, videoData)

    // Step 3: Synchronous transcoding, BLOCKS THREAD FOR 15-45 MINUTES
    outputKey = "transcoded/" + videoId + "_1080p.mp4"
    await exec("ffmpeg -i " + rawPath + " -vf scale=1920:1080 " + outputPath)
    // ^^ 15-45 MINUTES OF CPU TIME. The request cannot complete until FFmpeg exits.
    await s3.putObject(outputKey, await readFile(outputPath))

    // Step 4: Store metadata
    await db.execute(
        "INSERT INTO videos (video_id, title, description, s3_url, status) VALUES (?, ?, ?, ?, 'ready')",
        [videoId, title, description, s3Origin + outputKey]
    )
    return { video_id: videoId, status: "ready" }  // After 15-45 min wait

// PLAYBACK: redirect to S3 origin (no CDN)
async function streamVideo(video_id):
    row = await db.execute("SELECT s3_url FROM videos WHERE video_id = ?", [video_id])
    if row is null:
        return notFound(404)  // unknown video_id
    return redirect(302, row.s3_url)
    // Every viewer follows this redirect to S3 origin
    // 1000 viewers x 5 Mbps = 5 Gbps origin egress

Database Schema (ER Diagram)

The schema reflects the naive approach's single-database design. The videos table is the only performance-relevant table — read on every playback redirect (80% of traffic) and every metadata query (15%). No caching layer means all reads hit PostgreSQL directly.

Key Design Decisions

API-Proxied Upload (No Presigned URLs)

Choice

POST full video body to the API server instead of direct-to-S3 upload

Rationale

The naive approach uses standard multipart form upload because it requires no S3 SDK on the client, no presigned URL generation, and no multipart upload coordination. The server receives the file and writes it to S3 — simple and familiar. The catastrophic cost is that every byte of video passes through the API server's memory. A 2GB video consumes 2GB of RAM for the upload duration. At 100 concurrent uploads, the API tier needs 100GB+ of memory just for proxying. The V1 variant solves this with presigned URLs: the client uploads directly to S3, and the API server only signs the URL (2ms, 512 bytes of memory).

Synchronous FFmpeg Transcoding

Choice

Run FFmpeg in the API request handler, blocking the HTTP response

Rationale

Synchronous transcoding keeps the implementation trivially simple — one function call: receive file, transcode, store, respond. No Kafka, no worker pools, no job tracking, no status polling. But the HTTP request stays open for 15-45 minutes while FFmpeg runs on a single CPU core. The thread is completely blocked. With 40 total threads (4 pods x 10), 40 concurrent uploads consume every thread, and the system stops serving playback and metadata requests. The V1 variant publishes a transcode-job to Kafka and returns immediately — the upload response takes seconds, not minutes.

Single 1080p MP4 Output

Choice

One resolution, one file, no adaptive streaming

Rationale

Transcoding to a single MP4 is one FFmpeg command with default settings. HLS adaptive bitrate requires producing 5 resolution variants, segmenting each into 6-10 second chunks, generating per-variant playlists, and creating a master manifest — roughly 3x the transcoding time, 5x the storage, and significant pipeline complexity. The naive approach avoids all of this at the cost of viewer experience: anyone without a stable 5+ Mbps connection buffers constantly because there is no lower-quality fallback.

Direct S3 Origin Delivery (No CDN)

Choice

Serve all viewer traffic directly from S3 with no edge caching

Rationale

S3 serves files over HTTPS natively — no CDN configuration, no edge location management, no cache invalidation. The fatal flaw is linear egress scaling: every viewer request hits the origin. At 1,000 viewers x 5 Mbps = 5 Gbps from S3. At 100,000 viewers = 500 Gbps — physically impossible for a single S3 bucket. CDN (CloudFront) caches content at 400+ edge locations, absorbing 95%+ of requests and reducing origin load to a fraction of total traffic. CDN egress also costs $0.01-0.02/GB versus S3's $0.09/GB — a 5-9x cost reduction.

No Metadata Cache

Choice

Every metadata and redirect query reads from PostgreSQL directly

Rationale

Without a Redis cache layer, every GET /videos/{id} and GET /videos/{id}/stream query hits PostgreSQL. At 9,500 reads/sec (80% redirects + 15% metadata at 10K peak), PostgreSQL handles the load adequately on a single db.r7g.xlarge instance. The lack of caching is not the primary bottleneck — thread starvation and origin egress are far more critical. However, adding Redis would reduce database load by 85-90% and free connection pool capacity for upload writes.
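The cache-aside pattern mentioned above can be sketched as follows. This is an illustration only: a plain dictionary stands in for Redis, the `MetadataCache` class and its 60-second TTL are assumptions not taken from the original design, and `load_from_db` represents whatever function performs the PostgreSQL read.

```python
import time


class MetadataCache:
    """Cache-aside wrapper. A dict stands in for Redis in this sketch."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db      # called only on a miss or expiry
        self._ttl = ttl_seconds        # hypothetical TTL, tune per workload
        self._clock = clock
        self._entries = {}             # video_id -> (expires_at, value)
        self.db_reads = 0              # counts reads that reached the database

    def get(self, video_id):
        now = self._clock()
        entry = self._entries.get(video_id)
        if entry is not None and entry[0] > now:
            return entry[1]            # cache hit: no database round trip
        self.db_reads += 1
        value = self._load(video_id)   # cache miss: read through to the database
        self._entries[video_id] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    def fake_db_lookup(video_id):
        return {"video_id": video_id, "s3_url": "https://bucket.example/" + video_id}

    cache = MetadataCache(fake_db_lookup)
    for _ in range(1000):
        cache.get("popular-video")
    print("database reads:", cache.db_reads)
```

For a popular video, repeated redirects within the TTL collapse into a single database read, which is where the quoted 85-90% load reduction would come from.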

Scale & Performance

Target RPS

~10K RPS (theoretical; thread starvation limits actual throughput)

Latency (p99)

15-45 min upload (synchronous transcode), 50-200ms playback redirect

Storage

~50 GB/year per video (raw + single 1080p transcode, no multi-resolution)

Availability

~99% (single DB, no CDN, no redundancy)

Time & Space Complexity

Operation	Time	Space	Notes
Video upload (API-proxied)	O(N) — linear in file size (all bytes proxied through API server)	O(N) — file size buffered in server memory	A 2GB video consumes 2GB of server memory for the upload duration. At 100 concurrent uploads: 100GB of memory consumed for proxying alone. Presigned URLs reduce this to O(1) — 2ms and 512 bytes per upload.
Synchronous transcoding (FFmpeg)	O(N x M) — N = video duration, M = output resolution pixels	O(N) — intermediate frames buffered during encoding	1-hour 1080p video: ~30 minutes of CPU time. Blocks the API thread for the entire duration. With 40 threads total, limits concurrent transcodes to 40.
Playback redirect (GET /stream)	O(1) — indexed PK lookup in PostgreSQL	O(1) — returns a URL string	Fast per-query (12ms) but at 8,000 QPS peak. The redirect itself is cheap — the problem is that every viewer then streams the full video from S3 origin.
S3 origin delivery (no CDN)	O(1) per request — S3 serves the object	O(N) bandwidth — file size x concurrent viewers	1,000 viewers x 5 Mbps = 5 Gbps origin egress. S3 throttles at high request rates for popular objects. No caching, no edge distribution, no bandwidth amortization.

Database Schema (HLD)

videos

Video metadata including upload status, S3 origin URL, and view counts. Written on upload initiation, updated after synchronous transcoding completion. Read on every playback redirect and metadata query. No caching — all reads hit PostgreSQL directly.

video_id UUID PK
title TEXT
description TEXT
s3_url TEXT (null until transcoding completes)
status TEXT (uploading | transcoding | ready | failed)
view_count INTEGER DEFAULT 0
file_size_bytes BIGINT
duration_seconds INTEGER
created_at TIMESTAMPTZ

Indexes: PK on video_id, idx_videos_status ON (status, created_at), idx_videos_created ON (created_at DESC) for browse pagination

The s3_url column is null during upload and transcoding. It is set to the S3 origin URL after synchronous transcoding completes. Every playback redirect reads this column — at 8,000 reads/sec peak, this is the hottest query path. Without a cache layer, all reads hit PostgreSQL directly.
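To make the hot query path concrete, here is a small sketch of the videos table and the redirect lookup. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL, so the UUID and TIMESTAMPTZ columns are stored as TEXT. The CHECK constraint and the `playback_redirect_url` helper are assumptions added for illustration.

```python
import sqlite3
import uuid

DDL = """
CREATE TABLE videos (
    video_id         TEXT PRIMARY KEY,              -- UUID in PostgreSQL
    title            TEXT,
    description      TEXT,
    s3_url           TEXT,                          -- NULL until transcoding completes
    status           TEXT CHECK (status IN ('uploading', 'transcoding', 'ready', 'failed')),
    view_count       INTEGER DEFAULT 0,
    file_size_bytes  INTEGER,                       -- BIGINT in PostgreSQL
    duration_seconds INTEGER,
    created_at       TEXT DEFAULT CURRENT_TIMESTAMP -- TIMESTAMPTZ in PostgreSQL
);
CREATE INDEX idx_videos_status  ON videos (status, created_at);
CREATE INDEX idx_videos_created ON videos (created_at DESC);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    return conn


def playback_redirect_url(conn: sqlite3.Connection, video_id: str):
    """Primary-key lookup behind the 302 redirect. Returns None if not playable."""
    row = conn.execute(
        "SELECT s3_url FROM videos WHERE video_id = ? AND status = 'ready'",
        (video_id,),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = open_db()
    vid = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO videos (video_id, title, status) VALUES (?, ?, 'transcoding')",
        (vid, "demo"),
    )
    print(playback_redirect_url(conn, vid))  # not playable yet
```

Filtering on status keeps the handler from redirecting to a null URL while a video is still uploading or transcoding.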

users

User account records for uploaders and viewers. Low write volume. Read on authentication. Small table fully cached in PostgreSQL buffer pool.

user_id UUID PK
username TEXT UNIQUE
email TEXT UNIQUE
created_at TIMESTAMPTZ

Indexes: PK on user_id, UNIQUE on username, UNIQUE on email

Small table (~100K rows). Not a performance concern — fully cached in buffer pool.

What-If Scenarios

Viral video with 100K concurrent viewers (no CDN to absorb traffic)

Impact

S3 origin faces 100K concurrent GET requests for the same object at 5 Mbps each = 500 Gbps. S3 returns 503 SlowDown errors for the majority of requests. Viewers experience constant buffering, connection timeouts, and complete playback failure. The API server's redirect endpoint continues to work, but every redirect leads to a failing S3 download.

Mitigation

Add CloudFront CDN with the S3 bucket as origin. CloudFront caches the video at 400+ edge locations. At 95% cache hit rate, only 5K of 100K requests reach S3. Edge locations serve video with sub-100ms latency globally. This is the single most impactful change — the V1 variant adds CDN as its primary improvement.

40 simultaneous uploads exhaust all API threads (total outage)

Impact

Each upload holds a thread for 15-45 minutes during synchronous transcoding. At 40 concurrent uploads (4 pods x 10 threads), every thread is blocked by FFmpeg. New requests, including playback redirects and metadata queries, queue until they time out or are rejected with 503 errors. Viewers who are already streaming from S3 keep playing, but nobody can start a new video or load a video page. Effectively a total outage caused by just 40 uploaders.

Mitigation

Move transcoding to async Kafka workers. The upload endpoint writes the raw file to S3, publishes a transcode-job event to Kafka, and returns immediately (seconds, not minutes). Dedicated TranscodeWorker pods consume jobs and process them independently. The API thread is freed for playback and metadata requests.
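The shape of this mitigation can be sketched without any infrastructure. In the example below an in-process `queue.Queue` stands in for the Kafka topic, a dictionary stands in for the videos.status column, and `fake_transcode` is a hypothetical placeholder for the FFmpeg run. None of these names come from the original design; the point is only that the handler returns before transcoding starts.

```python
import queue
import threading
import uuid

jobs = queue.Queue()   # stand-in for a Kafka "transcode-job" topic
status = {}            # stand-in for the videos.status column


def fake_transcode(video_id, raw_key):
    """Hypothetical placeholder for the long-running FFmpeg job."""
    return None


def handle_upload(raw_key):
    """API handler: record the job, enqueue it, and return right away."""
    video_id = str(uuid.uuid4())
    status[video_id] = "transcoding"
    jobs.put((video_id, raw_key))
    return {"video_id": video_id, "status": "transcoding"}


def transcode_worker():
    """Dedicated worker loop, running outside the API request path."""
    while True:
        job = jobs.get()
        if job is None:            # sentinel: shut the worker down
            jobs.task_done()
            return
        video_id, raw_key = job
        try:
            fake_transcode(video_id, raw_key)
            status[video_id] = "ready"
        except Exception:
            status[video_id] = "failed"
        finally:
            jobs.task_done()


if __name__ == "__main__":
    worker = threading.Thread(target=transcode_worker, daemon=True)
    worker.start()
    response = handle_upload("raw/example.mp4")
    print("response status:", response["status"])
    jobs.join()                    # the demo waits; a real client would poll
    print("final status:", status[response["video_id"]])
    jobs.put(None)
    worker.join()
```

The client learns the final state by polling the video's status rather than by holding an HTTP connection open, so API threads are only ever held for the enqueue step.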

Viewer on 1 Mbps 3G connection attempting to watch 5 Mbps 1080p video

Impact

The player attempts to download the 1080p stream at 5 Mbps over a 1 Mbps connection. The video buffers for approximately 4 seconds for every 1 second of playback (5:1 ratio). The viewer experiences constant start-stop buffering, making the video unwatchable. There is no lower-quality alternative because only a single 1080p variant was transcoded.

Mitigation

Implement HLS adaptive bitrate streaming. Transcode to 5 resolution tiers: 240p (0.5 Mbps), 480p (1.5 Mbps), 720p (3 Mbps), 1080p (5 Mbps), 4K (15 Mbps). The HLS player selects 240p (0.5 Mbps) on a 1 Mbps connection, because 480p at 1.5 Mbps already exceeds the available bandwidth. The result is smooth playback at lower quality rather than constant buffering at high quality.
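Tier selection can be illustrated with a simple throughput-based rule: pick the highest tier whose bitrate fits within a fraction of the measured bandwidth. This is a simplified sketch, not how any particular player works. Real players also weigh buffer level and recent throughput history, and the 0.8 safety factor is an assumption.

```python
# (name, bitrate in Mbps) for the five tiers listed in the text, lowest first
TIERS_MBPS = [
    ("240p", 0.5),
    ("480p", 1.5),
    ("720p", 3.0),
    ("1080p", 5.0),
    ("4K", 15.0),
]


def select_tier(measured_mbps: float, safety: float = 0.8) -> str:
    """Return the highest tier whose bitrate fits within the bandwidth budget."""
    budget = measured_mbps * safety    # leave headroom for throughput dips
    chosen = TIERS_MBPS[0][0]          # always fall back to the lowest tier
    for name, bitrate in TIERS_MBPS:
        if bitrate <= budget:
            chosen = name
    return chosen


if __name__ == "__main__":
    for mbps in (0.3, 1.0, 4.0, 8.0, 25.0):
        print(f"{mbps} Mbps -> {select_tier(mbps)}")
```

On a 1 Mbps link the rule lands on 240p, since the next tier up (1.5 Mbps) would not download in real time.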

Upload of a 10GB video file causes API server out-of-memory crash

Impact

The API server buffers the entire 10GB file in memory before writing to S3. A single 10GB upload consumes most of the pod's 16GB memory allocation; together with any other in-flight upload on the same pod, it pushes the pod past its limit and triggers an OOM kill. The pod restarts, dropping all in-flight requests (including other uploads and playback redirects). If multiple large uploads hit the same pod, cascading OOM kills can take down the entire service tier.

Mitigation

Use presigned URLs for direct-to-S3 multipart upload. The API server generates signed URLs (2ms, 512 bytes) and the client uploads 100MB chunks directly to S3. The API server never touches video bytes. A 10GB upload becomes 100 chunks of 100MB each, with per-chunk retry on failure.
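The chunking step is simple arithmetic. The sketch below plans byte ranges for a multipart upload and checks them against the S3 multipart limits (at most 10,000 parts, and at least 5 MiB per part except the last). `plan_parts` is an illustrative helper rather than an SDK call, and the 100MB default is read as decimal megabytes.

```python
MIN_PART_BYTES = 5 * 1024 * 1024   # S3 minimum part size (last part may be smaller)
MAX_PARTS = 10_000                 # S3 maximum number of parts per upload


def plan_parts(file_size: int, part_size: int = 100 * 1000 * 1000):
    """Return a list of (part_number, first_byte, last_byte) tuples."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    if part_size < MIN_PART_BYTES:
        raise ValueError("part_size is below the S3 minimum of 5 MiB")
    parts = []
    start = 0
    number = 1                     # S3 part numbers start at 1
    while start < file_size:
        end = min(start + part_size, file_size) - 1
        parts.append((number, start, end))
        start = end + 1
        number += 1
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; increase part_size")
    return parts


if __name__ == "__main__":
    parts = plan_parts(10 * 1000**3)   # the 10GB upload from the scenario
    print(len(parts), "parts; first:", parts[0], "last:", parts[-1])
```

Each tuple maps to one presigned part URL, so a failed chunk is retried on its own instead of restarting the whole upload.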

Failure Modes & Resilience

Component	Failure	Impact	Mitigation
VideoService (Monolith)	Thread starvation from synchronous transcoding	40 concurrent uploads block all 40 threads for 15-45 minutes each. All traffic fails — playback redirects, metadata queries, and new uploads return 503. Total system outage caused by normal upload activity.	Async transcoding via Kafka workers. Upload endpoint returns in seconds. Dedicated TranscodeWorker pods handle CPU-intensive work independently of the API tier.
VideoService (Monolith)	OOM crash from proxying large video files	A 10GB upload, combined with other in-flight uploads on the same pod, exceeds the pod's 16GB memory. OOM kill restarts the pod, dropping all in-flight requests. Cascading OOM kills can take down the entire service tier.	Presigned URLs for direct-to-S3 upload. API server never buffers video bytes. Memory usage per upload drops from gigabytes to kilobytes.
VideoStorage (S3 Origin)	503 SlowDown throttling from high concurrent access to popular objects	S3 throttles requests to frequently accessed objects. Viewers of popular videos experience connection timeouts and playback failure. Unpopular videos continue to play normally.	CloudFront CDN as S3 origin shield. CDN absorbs 95%+ of requests. S3 sees only cache misses — dramatically reducing per-object request rate.
VideoDatabase (PostgreSQL)	Connection pool exhaustion from long-held transcoding connections	Transcoding threads hold database connections open for 30+ minutes (for status updates during transcoding). 200-connection pool gradually fills with long-held connections, blocking short-lived metadata and redirect queries.	Separate connection pools for upload (long-held) and read (short-lived) paths. Or better: move transcoding off the API server entirely (async workers), eliminating long-held connections.

Scaling Strategy

Vertical scaling only for PostgreSQL (upgrade instance size). Horizontal scaling for VideoService via pod count (4 -> 8 -> 16). But adding pods only increases concurrent transcoding capacity linearly — 40 -> 80 -> 160 concurrent uploads. Each upload still blocks a thread for 30+ minutes. The ceiling is approximately 500 concurrent viewers before S3 egress becomes unmanageable, regardless of service pod count. Beyond this, architectural changes are required: CDN for viewer delivery (V1) and async transcoding for upload capacity (V1).
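Little's law (average concurrency = arrival rate x time in system) makes the limit of horizontal scaling explicit. The sketch below assumes the 30-minute hold time and 10 threads per pod from the text; the function names are illustrative.

```python
def busy_threads(uploads_per_hour: float, transcode_minutes: float) -> float:
    """Average number of threads held by transcodes (Little's law)."""
    return uploads_per_hour * (transcode_minutes / 60.0)


def saturation_rate_per_hour(pods: int, threads_per_pod: int,
                             transcode_minutes: float) -> float:
    """Upload rate at which every thread in the tier is held by a transcode."""
    total_threads = pods * threads_per_pod
    return total_threads / (transcode_minutes / 60.0)


if __name__ == "__main__":
    for pods in (4, 8, 16):
        rate = saturation_rate_per_hour(pods, threads_per_pod=10, transcode_minutes=30)
        print(f"{pods} pods: saturated at {rate:g} uploads/hour")
```

With 4 pods the tier saturates at 80 uploads per hour, and quadrupling the pod count only moves that to 320 per hour. Every thread spent on a transcode is also a thread taken from playback redirects, so the read path degrades well before full saturation.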

Monitoring & Alerting

Key metrics to monitor: (1) Active transcoding threads — the primary capacity indicator. Alert at 30/40 (75%), critical at 38/40 (95%). (2) S3 origin egress bandwidth — alert at 1 Gbps, critical at 5 Gbps. Without CDN, this scales linearly with viewer count and is the dominant cost driver. (3) API server memory usage — alert at 70% of pod memory. Each proxied upload consumes file-size memory. (4) PostgreSQL active connections — alert at 150/200 (75%). Long-held transcoding connections compete with short-lived read queries. (5) S3 503 SlowDown error rate — indicates object-level throttling from too many concurrent viewers on popular videos. (6) Upload response time — should be 15-45 minutes per video (synchronous transcoding). Anything longer indicates CPU contention or S3 write throttling.
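These thresholds could be encoded as data and evaluated by a small classifier. The sketch below uses the numbers quoted above; the metric names are hypothetical, and treating each threshold as inclusive (>=) is an assumption.

```python
# metric name -> (alert threshold, critical threshold or None if not specified)
THRESHOLDS = {
    "active_transcode_threads": (30, 38),   # out of 40 threads
    "s3_egress_gbps": (1.0, 5.0),
    "pod_memory_pct": (70, None),
    "pg_active_connections": (150, None),   # out of a 200-connection pool
}


def classify(metric: str, value: float) -> str:
    """Map a metric reading to 'ok', 'alert', or 'critical'."""
    alert_at, critical_at = THRESHOLDS[metric]
    if critical_at is not None and value >= critical_at:
        return "critical"
    if value >= alert_at:
        return "alert"
    return "ok"


if __name__ == "__main__":
    sample = {
        "active_transcode_threads": 31,
        "s3_egress_gbps": 5.2,
        "pod_memory_pct": 55,
        "pg_active_connections": 160,
    }
    for name, value in sample.items():
        print(f"{name}={value}: {classify(name, value)}")
```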

Cost Analysis

At 1,000 concurrent viewers: VideoService 4 pods ECS Fargate (~$400/month), PostgreSQL db.r7g.xlarge (~$350/month), S3 storage (~$50/month for moderate catalog), S3 egress at 5 Gbps continuous = 1,620 TB/month x $0.09/GB = ~$145,800/month in egress alone. Total: ~$146,600/month — dominated entirely by S3 egress. The V1 variant with CloudFront CDN reduces egress cost by 5-9x: CloudFront absorbs 95% of traffic at $0.01-0.02/GB, reducing the egress bill to approximately $20,000/month. CDN is not just a performance optimization — it is the primary cost optimization.
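The monthly figures can be checked with a short calculation. This sketch assumes a 30-day month and decimal units, and applies a single per-GB rate to every viewer byte. Applying the CDN rate to all bytes is a simplification for comparison, not a billing model.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600
FIXED_MONTHLY_USD = 400 + 350 + 50   # Fargate pods + PostgreSQL + S3 storage, from the text


def monthly_egress_gb(gbps: float) -> float:
    """Convert a sustained rate in Gbps to GB transferred per 30-day month."""
    return gbps / 8.0 * SECONDS_PER_MONTH


def monthly_total_usd(gbps: float, usd_per_gb: float) -> float:
    """Fixed infrastructure cost plus egress billed at one per-GB rate."""
    return FIXED_MONTHLY_USD + monthly_egress_gb(gbps) * usd_per_gb


if __name__ == "__main__":
    print(f"egress volume: {monthly_egress_gb(5.0):,.0f} GB/month")
    for label, rate in (("S3 origin", 0.09), ("CDN low", 0.01), ("CDN high", 0.02)):
        print(f"{label} at ${rate}/GB: ${monthly_total_usd(5.0, rate):,.0f}/month")
```

At $0.01-0.02/GB the egress portion comes to roughly $16,200-$32,400 per month, which brackets the approximate $20,000 figure quoted above.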

Security Considerations

Upload validation: the API server must validate video file headers (magic bytes) to prevent non-video files from being uploaded and consuming transcoding resources. File size limits (10GB max) prevent storage abuse. Rate limiting on uploads (10/minute per user) prevents bulk upload attacks. No content moderation in the naive approach — all uploaded content is immediately accessible after transcoding. Video URLs are S3 origin URLs with no access control — anyone with the URL can stream the video. The V1 variant uses CloudFront signed URLs for time-limited access control.
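A magic-byte check might look like the sketch below. It recognizes only two container families as an illustration: ISO base media files such as MP4 and MOV, which carry the ASCII tag "ftyp" at byte offset 4, and Matroska/WebM, which start with the EBML signature 1A 45 DF A3. The function names are hypothetical. A header check is a cheap first filter, not proof that the file is a decodable video.

```python
MAX_UPLOAD_BYTES = 10 * 1000**3          # 10GB limit from the text
EBML_MAGIC = b"\x1a\x45\xdf\xa3"         # Matroska / WebM signature


def looks_like_video(header: bytes) -> bool:
    """Inspect the first bytes of an upload for a known container signature."""
    if len(header) >= 8 and header[4:8] == b"ftyp":   # MP4 / MOV family
        return True
    if header[:4] == EBML_MAGIC:                      # MKV / WebM
        return True
    return False


def validate_upload(header: bytes, declared_size: int) -> None:
    """Raise ValueError if the upload should be rejected before transcoding."""
    if declared_size <= 0 or declared_size > MAX_UPLOAD_BYTES:
        raise ValueError("file size outside the allowed range")
    if not looks_like_video(header):
        raise ValueError("unrecognized container format")


if __name__ == "__main__":
    mp4_header = b"\x00\x00\x00\x18ftypmp42"
    print(looks_like_video(mp4_header))
    print(looks_like_video(b"%PDF-1.7"))
```

Rejecting a file at this stage costs a few bytes of inspection, whereas letting it through costs a transcoding slot.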

Deployment Strategy

Rolling deployment for VideoService: replace one pod at a time. In-flight transcoding jobs on the old pod are lost on replacement (no checkpointing). Database migrations require a maintenance window for schema changes. S3 is a managed service and needs no deployment step of its own. There is no blue/green or canary rollout; the naive architecture does not support sophisticated deployment strategies.

Real-World Examples

• Early YouTube (2005-2006) used a similar architecture, with a monolithic web application handling upload and transcoding, before migrating to a distributed transcoding pipeline
• Small video hosting startups (Wistia, Vimeo early days) launched with synchronous transcoding and S3 origin delivery before adding CDN as their first scaling investment
• Internal corporate video platforms often use this pattern because viewer counts are low (under 1,000) and the simplicity outweighs the scaling limitations

Solution Comparison

Variant	Tier	Latency	Throughput	Cost	Complexity	Reliability
V0: Naive (Single Bitrate, No CDN)	T1	15-45 min upload, 50-200ms playback redirect	~10K RPS (limited by 40 threads)	$146K/month at 1K viewers (S3 egress dominated)	Low	99% (single DB, no CDN)
V1: CDN + Async Transcode (HLS + CloudFront)	T2	~2s upload init, 5-30 min async transcode, <1s playback	100K RPS peak	$25K/month at 10M viewers (CDN egress)	Medium	99.9% (multi-AZ, CDN)
V2: Adaptive Multi-Region (HLS + Edge + Cassandra)	T3	<1s upload init, 3-20 min GPU transcode, <800ms playback	200K RPS peak	$80K/month at 100M viewers (multi-region CDN + GPU)	Very High	99.99% (multi-region, origin failover)

This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.

Frequently Asked Questions

Why is video streaming one of the most common system design interview questions?

Video streaming combines five hard distributed systems challenges: (1) massive bandwidth engineering — 100M viewers x 5 Mbps = 500 Tbps of egress, (2) async processing pipelines — transcoding takes minutes and must not block the upload response, (3) content delivery networks — edge caching is non-negotiable for global low-latency delivery, (4) adaptive bitrate protocols — HLS/DASH enable quality switching for heterogeneous network conditions, and (5) storage at petabyte scale — 500 hours uploaded per minute at YouTube scale. YouTube, Netflix, Twitch, and TikTok ask this question because it is their core business. Google, Amazon, and Meta ask it because it tests CDN architecture, async pipelines, and bandwidth cost optimization.

Why does synchronous transcoding kill the system?

FFmpeg transcoding of a 1-hour video takes 15-45 minutes of CPU time. During this time, the API thread is completely blocked — it cannot serve any other request. With 4 pods x 10 threads = 40 total threads, 40 concurrent uploads exhaust all capacity. The system stops serving video playback (80% of traffic), metadata queries (15%), and new uploads (5%). This is equivalent to a total outage triggered by just 40 users uploading simultaneously. Async transcoding via Kafka workers frees the upload thread in seconds — the upload response returns immediately while transcoding runs in the background on dedicated workers.

At what viewer count does the lack of CDN become catastrophic?

At approximately 100-500 concurrent viewers. A single 1080p stream at 5 Mbps is 2.25 GB/hour per viewer. At 100 viewers, S3 serves 500 Mbps, which is manageable. At 1,000 viewers (5 Gbps), S3 may begin throttling popular objects. At 10,000 viewers (50 Gbps), S3 returns 503 SlowDown errors and egress costs roughly $2,000/hour (22,500 GB/hour at $0.09/GB). CloudFront CDN absorbs 95%+ of this traffic at edge locations, reducing origin load to under 2.5 Gbps and cutting per-GB cost by 5-9x. CDN is the single most impactful architectural change for video delivery.

Why not just add more API server pods instead of async transcoding?

Adding pods increases the thread count but does not solve the fundamental problem: each transcoding job holds a thread for 30+ minutes. With 20 pods x 10 threads = 200 threads, the system can handle 200 concurrent uploads — but those 200 threads are blocked for 30 minutes each, consuming 200 CPU cores continuously. The cost of running 200 CPU cores 24/7 for transcoding is approximately $15,000/month, versus $2,000/month for 16 dedicated GPU workers that transcode 5x faster. Async transcoding on specialized hardware is both cheaper and more responsive.

How does single bitrate affect the viewer experience?

A single 1080p stream requires a stable 5 Mbps connection. Viewers on 3G (1 Mbps), congested public Wi-Fi, or bandwidth-limited mobile plans experience constant buffering because the player has no lower-quality fallback. The video either plays at 1080p or buffers — there is no middle ground. HLS adaptive bitrate provides 5 quality tiers (240p at 0.5 Mbps through 4K at 15 Mbps). The player monitors download speed per segment and switches quality tiers to prevent buffering while maximizing visual quality. This is why every production video platform uses adaptive bitrate.
