The simplest possible Instagram architecture: a single monolith service that proxies all image uploads through the API server, generates thumbnails synchronously, and keeps all metadata in PostgreSQL while image bytes go to S3. No CDN, no cache, no async processing. Demonstrates why presigned URL uploads and CDN delivery are essential for any media-heavy application.
Instagram is one of the most frequently asked system design interview questions because it combines media upload and processing, social graph traversal, feed generation, and global content delivery into a single problem. Companies like Meta, Pinterest, TikTok, Snapchat, and Twitter ask variants of this question because it tests a candidate's ability to reason about bandwidth constraints, storage scaling, caching strategies, and the fundamental trade-offs between consistency and latency in media-heavy systems.
The naive approach uses the simplest possible architecture: a single MediaService backed by PostgreSQL and basic S3 storage. When a user uploads a photo, the entire 2MB image is sent as a multipart POST request to the API server. The MediaService receives the full payload in memory, validates the file type (JPEG, PNG, HEIC) and size, generates a 150px thumbnail synchronously using libvips (taking 300-500ms of CPU time), uploads both the original and thumbnail to S3 via the AWS SDK PutObject API, inserts a media record into PostgreSQL, and finally returns 201 Created with the media_id. Every byte of the image passes through the API server — at 100 uploads/sec, this means 200MB/sec of image data flowing through the service tier, consuming network bandwidth and memory that could otherwise serve feed reads.
Feeds are constructed using a pure pull model. When a user opens their feed, the MediaService executes a JOIN query: SELECT posts.* FROM posts JOIN follows ON posts.author_id = follows.followee_id WHERE follows.follower_id = ? ORDER BY posts.created_at DESC LIMIT 20. This query runs on every single feed request; there is no pre-computed feed cache. At 600 feed reads/sec, the database handles 600 complex JOIN queries per second, each scanning the follows index to find the user's followed accounts and then the posts index to find recent content. The query cost grows with the number of followed accounts: a user following 1,000 accounts makes the JOIN match 1,000 followee_ids, forcing PostgreSQL to scan thousands of index entries before it can sort and return the top 20 rows.
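The pull-model query can be tried end to end with a small sketch. It uses Python's built-in sqlite3 in place of PostgreSQL, and the schema is a hypothetical minimum (only the columns the query touches), so it illustrates the shape of the read path rather than its production performance.

```python
import sqlite3

# Hypothetical minimal schema: only the columns the feed query needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (
        post_id    INTEGER PRIMARY KEY,
        author_id  INTEGER NOT NULL,
        caption    TEXT,
        created_at INTEGER NOT NULL
    );
    CREATE TABLE follows (
        follower_id INTEGER NOT NULL,
        followee_id INTEGER NOT NULL,
        UNIQUE (follower_id, followee_id)
    );
    CREATE INDEX idx_posts_author_time ON posts (author_id, created_at DESC);
""")

# User 1 follows users 2 and 3, but not user 4.
conn.executemany("INSERT INTO follows VALUES (?, ?)", [(1, 2), (1, 3)])
conn.executemany(
    "INSERT INTO posts (post_id, author_id, caption, created_at) VALUES (?, ?, ?, ?)",
    [(10, 2, "beach", 100), (11, 3, "lunch", 200),
     (12, 4, "unfollowed", 300), (13, 2, "sunset", 400)],
)

FEED_QUERY = """
    SELECT posts.post_id, posts.author_id, posts.caption
    FROM posts
    JOIN follows ON posts.author_id = follows.followee_id
    WHERE follows.follower_id = ?
    ORDER BY posts.created_at DESC
    LIMIT 20
"""

def read_feed(follower_id):
    """Run the pull-model JOIN; nothing is cached between calls."""
    return conn.execute(FEED_QUERY, (follower_id,)).fetchall()

feed = read_feed(1)  # newest first, posts from followed authors only
```

Every call to `read_feed` repeats the full JOIN, which is the per-request cost the text describes.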
There is no CDN. All image reads hit the S3 origin directly. A user in Tokyo loading a feed with 5 images makes 5 separate requests to S3 in us-east-1, each adding 150-200ms of network latency. If the images load one after another, the total image loading time for a single feed page is 750-1000ms, dramatically worse than the 25-50ms that a CDN edge location would provide. At 600 feed reads/sec with 5 images per feed, the system generates 3,000 S3 GET requests/sec for images alone, all hitting the origin.
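The two figures above are simple products, sketched here so the inputs can be varied. The function names are hypothetical, and the latency function assumes the images are fetched one after another, which is a worst case since clients often fetch in parallel.

```python
def s3_get_rate(feed_reads_per_sec, images_per_feed):
    """Origin GET requests per second when every image read reaches S3."""
    return feed_reads_per_sec * images_per_feed

def sequential_image_load_ms(images_per_feed, per_image_latency_ms):
    """Total image load time if images are fetched strictly in sequence."""
    return images_per_feed * per_image_latency_ms

origin_gets = s3_get_rate(600, 5)
best_case_ms = sequential_image_load_ms(5, 150)
worst_case_ms = sequential_image_load_ms(5, 200)
```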
The architecture has no async processing pipeline. Thumbnail generation is synchronous — the user waits 300-500ms for libvips to resize the image before getting a response. There is no Kafka event stream, no worker pool, no multiple image variants (only original + thumbnail). Medium-resolution (600px) and high-resolution (1080px) variants that modern Instagram requires are simply not available.
Interviewers expect candidates to identify three critical bottlenecks: (1) API server bandwidth saturation from proxying image bytes, (2) pull-model feed queries that create O(followed_accounts) JOIN complexity on every read, and (3) absence of CDN causing global latency penalties. The progression from this naive approach to presigned URL uploads (V1) and then to async media pipelines with hybrid feed (V2) demonstrates the candidate's ability to evolve an architecture as requirements scale.
The naive Instagram system is a five-component architecture: Client, Load Balancer, MediaService, MediaDatabase (PostgreSQL), and MediaStorage (S3). There is no CDN, no cache, no event stream, no async workers, and no separation between upload handling and feed serving.
All traffic enters through the Load Balancer (AWS ALB), which distributes requests to MediaService pods using round-robin. The ALB handles up to 10K RPS — well above the system's actual limits, which are constrained by API server bandwidth and database query capacity. The ALB adds approximately 1.5ms of routing latency and must be configured for large request bodies (up to 10MB for image uploads) by increasing the default body size limit.
The MediaService is a monolithic application handling five distinct operations. Image upload (10% of traffic by count, 90%+ by bytes): receives the full 2MB image as multipart form data, buffers it in memory, validates format and size, generates a 150px thumbnail synchronously using libvips (~400ms CPU), uploads both original and thumbnail to S3 via PutObject, and inserts a media record into PostgreSQL. Post creation (10%): validates the media_id, inserts a post record with author_id, caption, and S3 URLs. Feed read (60%): executes the JOIN query across follows and posts tables to construct a reverse-chronological feed. Post detail (15%): simple indexed PK lookup. Like toggle (5%): atomic counter increment on the posts table.
The MediaService runs on 3 ECS Fargate pods with 4 vCPU and 8 GB memory each. At 50 threads per pod (150 total), the service can handle approximately 250 uploads/sec (each upload holds a thread for ~600ms, of which ~400ms is CPU-bound thumbnail generation) and 10K reads/sec (limited by the 15ms average processing time). The bottleneck shifts depending on traffic mix: during upload spikes, CPU is saturated by thumbnail generation; during read spikes, database connections are saturated by JOIN queries.
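Both ceilings can be reproduced by dividing the thread count by the time each request holds a thread. This is a sketch under the assumption that threads are the only limit; the ~600ms per upload comes from the complexity table, and the function name is hypothetical.

```python
def max_throughput_per_sec(threads, service_time_ms):
    """Upper bound on requests/sec when each request holds one thread for service_time_ms."""
    return threads * 1000 / service_time_ms

TOTAL_THREADS = 3 * 50  # 3 pods x 50 threads per pod

upload_ceiling = max_throughput_per_sec(TOTAL_THREADS, 600)  # ~600ms per upload
read_ceiling = max_throughput_per_sec(TOTAL_THREADS, 15)     # ~15ms per read
```

These are thread-occupancy bounds only. CPU, bandwidth, and database connections can cap throughput well before the thread pool does.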
PostgreSQL stores four tables: users (profile data), posts (content records with S3 URLs), follows (social graph edges), and media (upload metadata). All four tables share a single RDS instance with 100 max connections. The feed query is the heaviest operation: it JOINs follows with posts, filters by follower_id, sorts by created_at DESC, and returns 20 rows. At 600 feed reads/sec, this JOIN dominates database CPU. There are no read replicas — all reads and writes hit the same primary instance.
MediaStorage (S3) stores original images and thumbnails. There are no presigned URLs — all uploads go through the MediaService using the AWS SDK. Images are served to clients via direct S3 origin URLs returned in API responses. Without CDN, every image read hits S3 in us-east-1, adding 50-200ms latency depending on the client's geographic location. S3 handles the throughput adequately (S3 supports thousands of GET requests/sec per prefix), but the latency penalty for global users is severe.
This sequence diagram traces three primary flows: image upload with synchronous thumbnail generation, post creation, and feed reading via pull-model JOIN query. The critical bottleneck is the upload path — the full 2MB image payload flows through the API server, consuming network bandwidth and memory. The second bottleneck is the feed query — every feed request triggers a JOIN across follows and posts tables, with no caching layer.
The key insight is that the API server acts as a proxy for image bytes it does not need to process (beyond thumbnail generation). Presigned URLs (V1) eliminate this proxy entirely, and async workers (V1/V2) move thumbnail generation off the hot path.
Step-by-Step Walkthrough
Choice
Multipart POST through the API server instead of presigned URL direct-to-S3
Rationale
The simplest upload pattern: any HTTP client can send a multipart form POST. No presigned URL generation, no client-side S3 SDK, no CORS configuration. The devastating trade-off is bandwidth: at 500 uploads/sec, 2MB per image means 1GB/sec of image data flowing through the API tier. The service pods need massive bandwidth and memory to buffer these payloads. Presigned URLs (V1 variant) eliminate this by letting the client upload directly to S3, so the API server only signs a URL (~2ms).
Choice
Generate thumbnail in the upload request path using libvips
Rationale
The thumbnail is available immediately after upload — no eventual consistency, no loading placeholders. The cost is 300-500ms of CPU-intensive libvips processing added to every upload response. At 100 uploads/sec, this consumes ~40% of the service tier's total CPU capacity (3 pods x 4 cores = 12 cores, thumbnail uses ~0.5 cores for 400ms per image). The V1 variant moves this to async Kafka workers, reducing upload latency to under 200ms.
Choice
SELECT ... JOIN follows ON ... WHERE follower_id = ? ORDER BY created_at DESC
Rationale
No background processing required — no fanout workers, no feed cache invalidation, no write amplification on post creation. The feed is always perfectly fresh and consistent. The cost is O(followed_accounts) JOIN query on every feed request. At 600 reads/sec with users following 200 accounts each, the database executes 600 complex JOINs/sec. Pre-computed feeds in Redis (V1 variant) reduce this to O(1) cache lookups at the cost of eventual consistency.
Choice
Serve images directly from S3 origin URLs
Rationale
Zero CDN configuration, no cache invalidation logic, no origin shield setup. One fewer infrastructure component to manage. The cost is that every image read hits S3 in us-east-1 with full internet latency. A user in Singapore adds ~200ms per image request. A feed page with 5 images takes 1+ seconds just for image loading. CDN (V1 variant) reduces this to 5-20ms by serving from edge locations.
Choice
One database instance for users, posts, follows, and media metadata
Rationale
Simple operations: one connection string, one backup schedule, one set of credentials, ACID transactions across all tables. The cost is resource contention: feed JOIN queries (60% of traffic), upload INSERTs (10%), post detail SELECTs (15%), and like UPDATEs (5%) all compete for the same 100-connection pool. Adding read replicas would offload feed queries but does not help with write contention from uploads and likes.
Target RPS: ~1K sustained (ceiling at API bandwidth + DB)
Latency (p99): 500-800ms upload, 200-500ms feed read, 50-200ms image fetch
Storage: ~200MB/sec growth at 100 uploads/sec
Availability: ~99% (single DB instance, no redundancy)
| Operation | Time | Space | Notes |
|---|---|---|---|
| Feed read (pull model JOIN query) | O(F) where F = number of followed accounts | O(F): intermediate result set from follows JOIN | PostgreSQL executes a nested-loop or hash join between follows and posts. At F=200, the query probes the posts index for 200 authors and sorts the matches by created_at. At F=1000, the number of followee_ids to match grows proportionally. P99 latency: ~50ms at F=200, ~200ms at F=1000. |
| Image upload (API-proxied + sync thumbnail) | O(1) per upload, but 400ms wall-clock for thumbnail generation | O(image_size) — full image buffered in service memory (~2MB) | CPU-bound on libvips thumbnail generation. Each upload occupies one thread for ~600ms total (transfer + resize + S3 upload + DB insert). At 150 threads, max throughput is ~250 uploads/sec. |
| Post creation | O(1) — single INSERT | O(1) | Fast operation (~50ms DB write). Not a bottleneck. No fanout — just a single row INSERT. |
| Like toggle | O(1) — atomic UPDATE on like_count | O(1) | Uses PostgreSQL atomic increment: UPDATE posts SET like_count = like_count + 1 WHERE post_id = ?. Fast (~5ms) but creates row-level lock contention on viral posts. |
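The like increment from the table can be sketched with Python's built-in sqlite3 standing in for PostgreSQL. The table is a hypothetical minimum, and `add_like` is an illustrative helper name. The point is that the increment happens inside one UPDATE statement instead of a read followed by a write.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (post_id INTEGER PRIMARY KEY, like_count INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO posts (post_id) VALUES (42)")

def add_like(post_id):
    """Increment in a single statement so the database performs the read-modify-write."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE posts SET like_count = like_count + 1 WHERE post_id = ?",
            (post_id,),
        )
    return cur.rowcount == 1  # False when no such post exists

for _ in range(3):
    add_like(42)

like_count = conn.execute(
    "SELECT like_count FROM posts WHERE post_id = 42"
).fetchone()[0]
```

Because every like on a post updates the same row, concurrent likes on a viral post serialize on that row's lock, which is the contention the table notes.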
Post records linking authors to uploaded media. Written on post creation (100/sec), read on every feed request (600/sec) via JOIN with follows table. Indexed by (author_id, created_at) for feed construction. S3 origin URLs stored directly — no CDN URL transformation.
Indexes: PK on post_id, idx_posts_author_time ON (author_id, created_at DESC), idx_posts_created ON (created_at DESC)
The feed query JOINs this table with follows: SELECT posts.* FROM posts JOIN follows ON posts.author_id = follows.followee_id WHERE follows.follower_id = ? ORDER BY posts.created_at DESC LIMIT 20. The compound index on (author_id, created_at) supports this query but the JOIN across the two tables still requires index merge operations at scale.
Social graph edges. Read on every feed request (600/sec) to determine which accounts the user follows. Write-sparse: updated only on follow/unfollow actions. Indexed on both follower_id and followee_id for bidirectional lookups.
Indexes: PK on follow_id, idx_follows_follower ON (follower_id), idx_follows_followee ON (followee_id), UNIQUE ON (follower_id, followee_id)
Queried on every feed request to find followed accounts. At 600 reads/sec with users following 200 accounts on average, the follower_id index returns 200 rows per query. These 200 followee_ids are then used to filter the posts table — creating a multi-key lookup that PostgreSQL handles via index scan but cannot optimize beyond O(followed_accounts) per query.
Upload metadata records. Written once during image upload, read on post creation for validation. Stores S3 keys for original and thumbnail. One record per uploaded image.
Indexes: PK on media_id, idx_media_user ON (user_id, created_at DESC)
Low-traffic table — write-once on upload, read-once on post creation. Not a performance concern. Grows at the upload rate (100 rows/sec baseline).
User profile records. Low write volume (account creation only). Read for display names in feed responses and on post creation for validation.
Indexes: PK on user_id, UNIQUE on username
Small table — fully cached in PostgreSQL buffer pool. Not a performance concern at any scale this architecture supports.
Celebrity uploads a photo and 1M followers request their feed simultaneously
Impact
All 1M feed requests trigger the JOIN query against PostgreSQL. The database handles 1M queries that each scan the follows table and join against posts. Connection pool (100 connections) is exhausted within seconds. Feed read latency spikes from 200ms to 5+ seconds. Upload and post creation requests fail with connection timeout errors.
Mitigation
Add Redis feed cache (V1 variant): pre-compute feeds so reads are O(1) LRANGE operations. For celebrities specifically, the V2 variant uses hybrid fanout — pull celebrity posts on read instead of pushing to 1M caches.
Image upload rate doubles from 100/sec to 200/sec during a viral event
Impact
API server bandwidth consumption doubles from 200MB/sec to 400MB/sec. CPU usage doubles with it: thumbnail generation that consumed ~40% of the 12 available cores at 100 uploads/sec now pushes the tier toward saturation. Feed read requests queue behind upload requests, with feed latency increasing from 200ms to 2+ seconds as threads are occupied by slow uploads.
Mitigation
Presigned URL uploads (V1 variant) eliminate the bandwidth bottleneck entirely. Async thumbnail generation via Kafka workers (V1 variant) moves CPU-intensive work off the API server.
Database connection pool exhaustion during peak hours
Impact
100 max connections saturate when feed JOINs (600/sec x 50ms avg hold = 30 concurrent) plus uploads (100/sec x 60ms = 6 concurrent) plus other queries fill the pool. New requests receive 'connection timeout' errors. All functionality degrades simultaneously because all operations share one database.
Mitigation
Increase max_connections to 300 as a stopgap. Add read replicas for feed queries. Long-term: separate feed reads into a cached path (Redis) and move post storage to Cassandra (V2 variant) for linear-scale writes.
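The concurrency figures in this scenario follow Little's law: connections in use equal the arrival rate multiplied by the time each request holds a connection. The sketch below uses the rates and hold times from the text; the 500ms degraded hold time is an illustrative assumption, not a figure from the scenario.

```python
POOL_SIZE = 100  # max connections on the single RDS instance

def concurrent_connections(requests_per_sec, hold_time_ms):
    """Little's law: average connections in use = arrival rate x hold time."""
    return requests_per_sec * hold_time_ms / 1000

feed_conns = concurrent_connections(600, 50)     # feed JOINs
upload_conns = concurrent_connections(100, 60)   # upload writes
steady_state = feed_conns + upload_conns

# Assumed degradation: feed queries slow to 500ms under load.
degraded_feed_conns = concurrent_connections(600, 500)
pool_exhausted = degraded_feed_conns > POOL_SIZE
```

At the stated hold times the pool has headroom. Exhaustion appears once query latency rises, because the same arrival rate then keeps each connection busy for longer.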
S3 origin becomes slow (degraded performance in us-east-1)
Impact
All image reads are affected — there is no CDN to serve cached copies. Feed pages load with broken or slow images. Upload requests that write to S3 also slow down, causing the synchronous thumbnail upload to take 500ms+ instead of 50ms. Total upload latency exceeds 1 second.
Mitigation
CDN (V1 variant) would absorb 95% of image reads from edge cache, making the architecture resilient to S3 origin slowdowns. For uploads, implement retry with exponential backoff on S3 PutObject failures.
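Retry with exponential backoff can be sketched as a generic wrapper. This is a minimal illustration, not an AWS SDK feature: `retry_with_backoff` and its parameters are hypothetical names, `operation` stands for any callable that performs the S3 write, and real code would catch only errors known to be transient instead of every exception.

```python
import random
import time

def retry_with_backoff(operation, max_attempts=5, base_delay_s=0.1,
                       max_delay_s=5.0, sleep=time.sleep, rng=random.random):
    """Call operation() until it succeeds or max_attempts is reached.

    The wait before retry n is min(max_delay_s, base_delay_s * 2**n),
    scaled by a random factor in [0, 1) so clients do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay_s, base_delay_s * (2 ** attempt))
            sleep(delay * rng())
```

The `sleep` and `rng` parameters exist so the wrapper can be tested without real waiting. In use, a call might look like `retry_with_backoff(lambda: upload_fn(data))`, where `upload_fn` is whatever function issues the PutObject request.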
| Component | Failure | Impact | Mitigation |
|---|---|---|---|
| PostgreSQL (MediaDatabase) | Connection pool exhaustion from concurrent feed JOINs + upload writes | All operations fail — no feed reads, no uploads, no post creation. Total system outage because every operation depends on the single database instance. | PgBouncer connection pooling (transaction mode). Increase max_connections from 100 to 300. Long-term: Redis feed cache for reads (V1) + Cassandra for posts (V2). |
| MediaService | CPU saturation from concurrent thumbnail generation | Upload requests queue and eventually timeout. Feed reads and other operations are starved of CPU. Thread pool exhaustion causes all request types to fail. | Separate thread pools for uploads vs reads (bulkhead pattern). Set CPU-based auto-scaling triggers. Long-term: async thumbnail generation via Kafka workers (V1 variant). |
| S3 (MediaStorage) | S3 PUT/GET failures or elevated latency | Uploads fail (cannot store images). Feed reads succeed but images fail to load (broken image links in the client). No CDN fallback: S3 is the only image source. | S3 Standard is designed for 99.99% availability (the SLA commitment is 99.9%). Implement retry with jitter for transient failures. CDN (V1 variant) caches images at edge, providing resilience against origin failures for reads. |
| Load Balancer | All MediaService health checks fail | ALB returns 502 Bad Gateway. All traffic fails. Users cannot upload photos or view feeds. | Multi-AZ deployment with at least 2 pods per AZ. Configure health check thresholds to tolerate transient failures (3 consecutive failures before marking unhealthy). |
Vertical scaling for PostgreSQL (upgrade instance size). Horizontal scaling for MediaService via pod count increase (3 -> 6 -> 12 pods). Auto-scaling trigger: CPU utilization > 70% for 3 consecutive minutes. The ceiling is approximately 200 uploads/sec regardless of pod count because the bandwidth bottleneck is fundamental — each upload proxies 2MB through the API tier. Beyond this ceiling, architectural changes are required: presigned URL uploads (V1) to eliminate API proxying, and CDN (V1) to offload image reads from S3 origin.
Key metrics to monitor: (1) API server network throughput (bytes/sec), the primary bandwidth indicator. Alert at 500MB/sec (about 40% of a 10Gbps link, which carries ~1.25GB/sec). (2) Thumbnail generation duration (p50, p99), which should be 300-500ms. Alert at p99 > 800ms, indicating CPU contention. (3) Feed JOIN query latency (p50, p99). Alert at p99 > 500ms. (4) PostgreSQL active connections. Alert at 70/100, critical at 85/100. (5) S3 GET latency. Alert at p99 > 300ms, indicating origin degradation. (6) Upload success rate. Alert if it drops below 99%. Dashboard: Grafana with panels for upload throughput (images/sec), feed latency histogram, DB connection pool usage, S3 latency, and API server bandwidth consumption.
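Several of these thresholds can be expressed as a small rule check. The sketch below is illustrative only: the metric names and the `evaluate_alerts` function are hypothetical, and a real deployment would define such rules in its monitoring system instead of application code.

```python
# Latency thresholds from the text; the metric names are hypothetical.
LATENCY_LIMITS_MS = {
    "thumbnail_p99_ms": 800,
    "feed_join_p99_ms": 500,
    "s3_get_p99_ms": 300,
}

def evaluate_alerts(metrics):
    """Return the list of alert messages triggered by a metrics snapshot."""
    alerts = []
    for name, limit in LATENCY_LIMITS_MS.items():
        if metrics[name] > limit:
            alerts.append(f"{name} > {limit}ms")
    connections = metrics["db_active_connections"]
    if connections >= 85:
        alerts.append("db_active_connections critical")
    elif connections >= 70:
        alerts.append("db_active_connections alert")
    if metrics["upload_success_rate"] < 0.99:
        alerts.append("upload_success_rate < 99%")
    return alerts

healthy = {
    "thumbnail_p99_ms": 450, "feed_join_p99_ms": 200, "s3_get_p99_ms": 120,
    "db_active_connections": 36, "upload_success_rate": 0.999,
}
degraded = dict(healthy, thumbnail_p99_ms=900,
                db_active_connections=72, upload_success_rate=0.985)
```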
At 100 uploads/sec baseline: RDS db.r7g.xlarge (~$350/month), S3 Standard (~$500/month, which at list pricing of roughly $0.023/GB-month covers on the order of 20TB rather than 500TB), ECS Fargate 3 pods 4vCPU/8GB (~$250/month), ALB (~$30/month). Total: ~$1,130/month. This is the cheapest variant but breaks down beyond 200 uploads/sec due to API bandwidth saturation. The V1 CDN variant at 5K uploads/sec costs approximately $3,000/month but handles 25x the upload throughput: the monthly cost per unit of upload capacity drops from about $11.30 per upload/sec ($1,130 / 100) to about $0.60 per upload/sec ($3,000 / 5,000) as you scale past the naive approach's ceiling.
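The totals and ratios can be checked with a few lines of arithmetic. The sketch divides each variant's monthly cost by its upload rate, giving dollars per month per upload/sec of capacity; that unit and the names used are assumptions made here for the comparison.

```python
# Monthly line items for the naive variant, in USD, as listed in the text.
NAIVE_COSTS = {"rds": 350, "s3": 500, "fargate": 250, "alb": 30}

def cost_per_upload_rate(monthly_cost_usd, uploads_per_sec):
    """Dollars per month for each upload/sec of sustained capacity."""
    return monthly_cost_usd / uploads_per_sec

naive_total = sum(NAIVE_COSTS.values())
naive_unit_cost = cost_per_upload_rate(naive_total, 100)
v1_unit_cost = cost_per_upload_rate(3000, 5000)

# Throughput gain of V1 over the naive ceiling of 200 uploads/sec.
throughput_gain = 5000 / 200
```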
Image validation: all uploaded images are validated for file type (magic bytes, not just extension) and scanned for malware before storage. Size limits are enforced at both the ALB (10MB body limit) and the application level. User authentication uses JWT tokens validated on every request (~3ms overhead). No presigned URLs means all S3 writes are server-side, so clients never hold S3 credentials; reads still go straight to S3 origin URLs. Rate limiting is set at 1K RPS per user to prevent upload abuse. Content moderation is not implemented in the naive approach; production systems require NSFW detection and content policy enforcement.
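Magic-byte validation can be sketched as follows. The JPEG and PNG signatures are the standard ones. The HEIC check is a simplification that inspects only the major brand in the `ftyp` box, and the brand set is an assumption that a real service would likely extend. The function names and the 10MB constant mirror the text but are otherwise hypothetical.

```python
JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
# Assumed set of HEIC major brands; real files may use others.
HEIC_BRANDS = {b"heic", b"heix", b"hevc", b"hevx"}
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # 10MB limit from the text

def sniff_image_type(data):
    """Identify the format from leading bytes, ignoring any file extension."""
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    if data.startswith(PNG_MAGIC):
        return "png"
    # ISO base media files put 'ftyp' at offset 4 and the major brand at offset 8.
    if len(data) >= 12 and data[4:8] == b"ftyp" and data[8:12] in HEIC_BRANDS:
        return "heic"
    return None

def validate_upload(data):
    """Return the detected type, or raise ValueError for oversized or unknown input."""
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("file too large")
    kind = sniff_image_type(data)
    if kind is None:
        raise ValueError("unsupported image type")
    return kind
```

A check like this only confirms the header. It does not prove the rest of the file decodes, which is one reason the text pairs it with malware scanning.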
Rolling deployment for MediaService — replace one pod at a time while the ALB routes traffic to remaining pods. Database migrations run during low-traffic windows with brief maintenance for schema changes requiring table locks. S3 requires no deployment — objects are immutable once written. Zero-downtime deployment achievable for service code changes but not for PostgreSQL schema changes that add indexes on large tables (posts, follows).
| Variant | Tier | Latency | Throughput | Cost | Complexity | Reliability |
|---|---|---|---|---|---|---|
| V0: Naive (API-Proxied Upload + No CDN) | T1 | 500-800ms upload, 200-500ms feed, 50-200ms image | ~1K RPS total | $1,130/month | Low | 99% (single DB) |
| V1: CDN + Presigned Upload (Async Processing) | T2 | <200ms upload, <500ms feed, <100ms image (CDN) | 100K RPS peak | $3,000/month | Medium | 99.9% (multi-AZ) |
| V2: Async Media Pipeline (Hybrid Feed + Whale Detection) | T3 | <150ms upload, <200ms feed, <50ms image (CDN) | 1M RPS peak | $15,000/month | Very High | 99.95% (multi-AZ, tiered CDN) |
This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.
Instagram combines five distinct distributed systems challenges: (1) media upload and processing — handling 2MB image payloads at scale, generating multiple resized variants, (2) social graph — follow relationships driving feed construction, (3) feed generation — the fanout-on-write vs pull-on-read trade-off, (4) global content delivery — CDN strategy for billions of image reads, and (5) engagement systems — likes, comments, and notifications at scale. Meta, Pinterest, TikTok, and Snapchat ask this question because it directly maps to their core product. Google, Amazon, and other companies ask it because it tests fundamental distributed systems reasoning: bandwidth constraints, caching strategies, async processing, and consistency trade-offs.
At 500 uploads/sec, 2MB per image means 1GB/sec of image data flowing through the API tier. Each service pod needs to buffer the entire image in memory (2MB per concurrent request x 50 threads = 100MB buffer memory per pod) and transmit it to S3 via the internal network. Because every byte is received and then forwarded, 1GB/sec in (8Gbps) plus 1GB/sec out totals roughly 16Gbps through the tier, more than a single 10Gbps instance interface can carry. Meanwhile, the same pods must serve feed reads, post creation, and likes, but their network and memory are consumed by image proxying. Presigned URLs solve this completely: the API server generates a URL (~2ms, ~500 bytes) and the client uploads directly to S3, bypassing the API tier entirely.
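The bandwidth and memory figures can be reproduced with a short calculation. This sketch uses decimal units (1GB = 1000MB) and hypothetical function names. It counts inbound traffic only, so forwarding the same bytes to S3 roughly doubles what the tier handles.

```python
def proxied_mb_per_sec(uploads_per_sec, image_mb):
    """Inbound image bytes per second when every upload passes through the API tier."""
    return uploads_per_sec * image_mb

def mb_per_sec_to_gbps(mb_per_sec):
    """Convert megabytes/sec to gigabits/sec (8 bits per byte, decimal units)."""
    return mb_per_sec * 8 / 1000

def buffer_mb_per_pod(threads_per_pod, image_mb):
    """Memory held by in-flight uploads if every thread buffers one full image."""
    return threads_per_pod * image_mb

baseline_mb = proxied_mb_per_sec(100, 2)   # baseline upload rate
peak_mb = proxied_mb_per_sec(500, 2)       # the 500 uploads/sec case
peak_gbps_inbound = mb_per_sec_to_gbps(peak_mb)
pod_buffer_mb = buffer_mb_per_pod(50, 2)
```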
Add CDN as soon as you have users outside your primary cloud region, even at 10 users. The latency improvement is so significant (200ms to 5ms for cached images) that CDN is justified at any scale where user experience matters. In this simulation, the inflection point for cost justification is around 1,000 image reads/sec: CloudFront costs approximately $0.085/GB for the first 10TB, and lower latency improves user engagement and retention. Instagram added CDN before they had 1 million users.
Pull model: O(1) write (just insert the post), O(followed_accounts) read (JOIN query on every feed request). Fanout-on-write: O(followers) write (write to every follower's cache), O(1) read (LRANGE from cache). For a typical user following 200 accounts, the pull model executes a JOIN over 200 author_ids on every read — at 600 reads/sec, this is 600 x 200 = 120K index lookups/sec. Fanout-on-write has zero read-time database cost but requires a background worker to push posts to followers' caches. The trade-off is write amplification: a user with 1M followers triggers 1M cache writes per post. The V2 hybrid approach combines both models based on follower count.
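The two models can be compared with a small in-memory sketch. `FeedStore` and its methods are hypothetical names, and plain dictionaries stand in for PostgreSQL and Redis. It is meant to show where the work lands in each model, not how either would be implemented in production.

```python
from collections import defaultdict

class FeedStore:
    """Toy store that supports both pull-on-read and fanout-on-write feeds."""

    def __init__(self):
        self.following = defaultdict(set)         # follower -> authors followed
        self.followers = defaultdict(set)         # author -> followers
        self.posts_by_author = defaultdict(list)  # author -> [(ts, post_id)]
        self.cached_feeds = defaultdict(list)     # follower -> [(ts, post_id)]
        self.cache_writes = 0                     # counts fanout write amplification

    def follow(self, follower, author):
        self.following[follower].add(author)
        self.followers[author].add(follower)

    def publish(self, author, post_id, ts):
        # Pull model: a single write, regardless of follower count.
        self.posts_by_author[author].append((ts, post_id))
        # Fanout-on-write: one extra cache write per follower.
        for follower in self.followers[author]:
            self.cached_feeds[follower].append((ts, post_id))
            self.cache_writes += 1

    def feed_pull(self, user, limit=20):
        # Read cost grows with the number of followed accounts.
        candidates = [post for author in self.following[user]
                      for post in self.posts_by_author[author]]
        return [post_id for _, post_id in sorted(candidates, reverse=True)[:limit]]

    def feed_push(self, user, limit=20):
        # Read touches only the user's own precomputed list.
        return [post_id for _, post_id in sorted(self.cached_feeds[user], reverse=True)[:limit]]

store = FeedStore()
for fan in ("ann", "bob", "cy"):
    store.follow(fan, "celeb")
store.follow("ann", "dan")
store.publish("celeb", "p1", ts=1)
store.publish("dan", "p2", ts=2)
```

Both read paths return the same feed here, but `cache_writes` grows with follower count on every publish. The sketch does not backfill a cached feed when a user follows someone after they have posted, which is one of the consistency gaps a push model has to handle.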
More pods provide more CPU and threads but do not solve the bandwidth bottleneck. Each pod still proxies 2MB per upload through its network interface. At 500 uploads/sec across 10 pods, each pod handles 50 uploads/sec = 100MB/sec per pod — feasible for a single pod's network, but the aggregate 1GB/sec between the ALB and the pod fleet requires significant internal bandwidth. More critically, this approach wastes expensive compute resources on what is essentially a proxy operation. Presigned URLs eliminate the proxy entirely — the API server's only upload-related work is generating a signed URL (~2ms, ~500 bytes response).