Design a cloud file sync service with block-level chunked uploads, content-addressed dedup, and near-real-time cross-device propagation.
Dropbox is one of the most commonly asked system design interview questions because it touches nearly every distributed systems concept: chunked file transfer, content-addressed storage, metadata management, real-time synchronization, and cross-device consistency. Interviewers use it to gauge whether candidates understand bandwidth optimization, deduplication, and the trade-offs between consistency and availability in a file sync context.
At production scale, a Dropbox-like service manages hundreds of petabytes of user files across hundreds of millions of accounts. Files range from tiny configuration files to multi-gigabyte video projects. Re-uploading an entire 2GB file because a single byte changed is unacceptable from both a bandwidth and user experience perspective. The system must support resumable uploads so that interrupted transfers do not start from scratch, and it must propagate changes to all connected devices within seconds.
The key challenges include designing an efficient chunking strategy that minimizes bandwidth for incremental edits, implementing content-addressed deduplication to avoid storing identical blocks multiple times across users, building a metadata layer that tracks file-to-block mappings with full version history, and orchestrating near-real-time sync notifications so devices stay in lockstep. Additional concerns include garbage collection of orphaned blocks, conflict resolution when two devices edit the same file offline, and maintaining eleven-nines storage durability for user data that may be irreplaceable.
The architecture follows a block-level chunked sync model where the client splits every file into 4MB blocks and computes a SHA-256 hash for each block. On upload, the client sends the list of block hashes to the SyncService, which checks MetadataCache (Redis) and MetadataDB (PostgreSQL) to determine which blocks already exist. Only genuinely new blocks are uploaded to ObjectStorage (S3), achieving 90% or greater bandwidth savings for typical edits. This content-addressed approach also enables cross-user deduplication since identical blocks share a single S3 object.
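The client-side chunking and the upload-init dedup check described above can be sketched as follows. This is a minimal illustration, not the service's actual code: the function names and the `known` set (standing in for the MetadataCache/MetadataDB lookup) are assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4MB blocks, as in the design above


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return the SHA-256 hex digest of each block, in file order."""
    return [hashlib.sha256(b).hexdigest() for b in split_into_blocks(data, block_size)]


def missing_blocks(hashes: list[str], known: set[str]) -> list[str]:
    """Hypothetical upload-init check: hashes the server does not store yet.

    `known` stands in for the metadata lookup; duplicates within the same
    file are reported only once.
    """
    seen: set[str] = set()
    missing = []
    for h in hashes:
        if h not in known and h not in seen:
            seen.add(h)
            missing.append(h)
    return missing
```

With a tiny block size for demonstration, `block_hashes(b"a" * 10 + b"b" * 10 + b"a" * 10, block_size=10)` yields three hashes of which the first and third are equal, so `missing_blocks` against an empty server asks for only two uploads.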
The request flow begins at the UploadClient or DownloadClient, passes through an API Gateway for authentication and rate limiting, then hits a load balancer that distributes traffic across stateless SyncService pods. The SyncService coordinates all file operations: upload initialization with dedup checks, individual chunk uploads to S3, upload finalization that commits the new file version, metadata reads for downloads, and sync change polling. MetadataCache provides sub-2ms lookups for hot file metadata with an 85% hit rate, while MetadataDB serves as the durable source of truth for file-to-block mappings, block reference counts, and version history.
For real-time synchronization, the SyncService publishes file_changed events to a Kafka-based SyncStream partitioned by user ID. All of a user's connected devices subscribe to their sync channel and receive push notifications when files change, pulling only the blocks they do not already have locally. A separate DeduplicationWorker consumes events from SyncStream to handle garbage collection: it decrements block reference counts when files are deleted and marks zero-reference blocks for removal from S3 after a 24-72 hour grace period, which prevents race conditions with concurrent uploads.
Choice
4MB content-addressed blocks with SHA-256 hashing
Rationale
Without chunking, a single-byte edit to a 1GB file would require re-uploading the entire file. With 4MB blocks, only the changed block is uploaded: 4MB instead of 1GB, a 250x bandwidth reduction for an in-place edit. Block-level sync is the core idea behind Dropbox-style services and is widely used by file sync products.
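The 250x figure follows from 1GB / 4MB = 250 blocks in decimal units. The sketch below checks that arithmetic and shows, on a small stand-in file, which fixed-size blocks change after an edit. The helper name and the 16-byte demo block size are assumptions for illustration. Note that the one-block result holds for in-place overwrites; with fixed-size blocks, an insertion shifts every later block boundary, so all subsequent blocks hash differently.

```python
import hashlib


def changed_block_indexes(old: bytes, new: bytes, block_size: int) -> list[int]:
    """Indexes of blocks in `new` whose SHA-256 differs from `old` at that index.

    Blocks that exist only in `new` (the file grew) also count as changed.
    """
    def digests(data: bytes) -> list[bytes]:
        return [hashlib.sha256(data[i:i + block_size]).digest()
                for i in range(0, len(data), block_size)]

    old_d, new_d = digests(old), digests(new)
    return [i for i, d in enumerate(new_d) if i >= len(old_d) or old_d[i] != d]


# Decimal units: 1 GB = 1000 MB, so a 1GB file holds 250 blocks of 4MB.
blocks_per_file = 1000 // 4

# Small stand-in for a large file: 8 blocks of 16 bytes each.
old = bytes(range(128))

overwrite = bytearray(old)
overwrite[40] = 0xFF                      # in-place edit inside block 2
insert = old[:40] + b"\xff" + old[40:]    # insertion shifts all later bytes

print(changed_block_indexes(old, bytes(overwrite), 16))  # [2]
print(changed_block_indexes(old, insert, 16))            # [2, 3, 4, 5, 6, 7, 8]
```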
Choice
SHA-256 hash as the S3 object key for each block
Rationale
Content addressing enables dedup at every level: within a user's files across versions, across different users who share common content, and across file copies. In practice this reduces storage costs by 30-50% because identical blocks are stored exactly once regardless of how many files reference them.
Choice
ElastiCache Redis with 85% hit rate for file metadata
Rationale
Every upload-init and download operation requires a metadata lookup to resolve block lists. Redis provides sub-2ms responses for hot files. At an 85% hit rate only the 15% of lookups that miss reach MetadataDB, which cuts its read load by roughly 6-7x. Without this cache layer, PostgreSQL would need to serve every metadata read at peak directly, requiring significantly more database capacity.
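A cache-aside read path and the load-reduction arithmetic can be sketched as below. Plain dictionaries stand in for Redis and PostgreSQL, and the class and function names are assumptions made for this example rather than part of the design.

```python
class MetadataReader:
    """Cache-aside reader: try the cache first, fall back to the database."""

    def __init__(self, cache: dict, db: dict):
        self.cache = cache      # stand-in for MetadataCache (Redis)
        self.db = db            # stand-in for MetadataDB (PostgreSQL)
        self.db_reads = 0       # counts lookups that reached the database

    def get_block_list(self, file_id: str):
        if file_id in self.cache:
            return self.cache[file_id]
        self.db_reads += 1
        blocks = self.db.get(file_id)
        if blocks is not None:
            self.cache[file_id] = blocks   # populate for later reads
        return blocks


def db_load_reduction(hit_rate: float) -> float:
    """Factor by which cache hits shrink database read traffic."""
    return 1.0 / (1.0 - hit_rate)
```

For an 85% hit rate, `db_load_reduction(0.85)` is about 6.7, since 1 / 0.15 ≈ 6.67.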
Choice
MSK (Managed Kafka) partitioned by user ID with 7-day retention
Rationale
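Sync events must fan out reliably to all of a user's connected devices, including those that may be temporarily offline. Kafka retains events for the configured retention window (7 days here) whether or not they have been consumed, and each consumer tracks its own offset, so a device that reconnects within that window can catch up from where it left off. Partitioning by user ID sends all of a user's events to the same partition, which provides strict ordering of file changes within a single user's account.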
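The ordering and catch-up properties can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not Kafka client code: `partition_for` uses SHA-256 purely as a stable stand-in hash (Kafka's own default partitioner uses a different hash), and `PartitionedLog` is a hypothetical class invented for the example.

```python
import hashlib


def partition_for(user_id: str, num_partitions: int) -> int:
    """Map a user ID to a partition with a stable hash (stand-in partitioner)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class PartitionedLog:
    """Append-only log per partition; readers resume from a saved offset."""

    def __init__(self, num_partitions: int):
        self.partitions: list[list[tuple[str, str]]] = [[] for _ in range(num_partitions)]

    def publish(self, user_id: str, event: str) -> tuple[int, int]:
        """Append an event and return its (partition, offset)."""
        p = partition_for(user_id, len(self.partitions))
        self.partitions[p].append((user_id, event))
        return p, len(self.partitions[p]) - 1

    def read_from(self, partition: int, offset: int) -> list[tuple[str, str]]:
        """Return every event at or after `offset`, in publish order."""
        return self.partitions[partition][offset:]
```

Because every event for one user lands in the same partition, a device that stored its last offset before going offline can call `read_from` with that offset and replay the user's changes in the order they were published.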
Target RPS
25,000 peak (including 5K chunk uploads, 12.5K reads, and 5K sync polls)
Latency (p99)
< 5s sync latency for small changes; < 100ms metadata reads
Storage
Petabyte-scale object storage (S3) with 30-50% dedup savings
Availability
99.99% service uptime; 99.999999999% storage durability (S3)
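To make the availability and durability targets concrete, the short sketch below converts them into a yearly downtime budget and an expected object-loss rate. It assumes the durability figure is an annual per-object probability, which is how S3 states it; the function names and the 10-million-object example are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability: float) -> float:
    """Yearly downtime budget implied by an availability target."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def expected_objects_lost_per_year(num_objects: int, durability: float) -> float:
    """Expected yearly losses if each object survives the year with `durability`."""
    return num_objects * (1.0 - durability)


# 99.99% uptime leaves roughly 52.6 minutes of downtime per year.
budget = downtime_minutes_per_year(0.9999)

# Eleven nines over 10 million objects: about 0.0001 expected losses per year,
# i.e. roughly one object every 10,000 years.
losses = expected_objects_lost_per_year(10_000_000, 0.99999999999)
```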
This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.
Files are split into 4MB blocks on the client side, and each block is hashed with SHA-256. The client sends the full list of block hashes to the server, which identifies which blocks already exist in storage. Only genuinely new blocks are uploaded, meaning a small edit to a large file only transfers the changed block rather than the entire file. Uploads are also resumable at the block level: an interrupted transfer retries only the blocks that were not yet acknowledged instead of starting over.
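Block-level resumption can be sketched as a loop that skips blocks already acknowledged. The `upload_blocks` function, the `uploaded` set that persists between attempts, and the `send` callback are all hypothetical names for this example; a real client would persist the acknowledged set to disk and add backoff between attempts.

```python
def upload_blocks(blocks: dict[str, bytes], uploaded: set[str], send) -> set[str]:
    """Send every block not yet acknowledged; return the hashes that failed.

    `blocks` maps block hash -> block bytes. `uploaded` is mutated in place so
    that a later call resumes where this one stopped. `send(hash, data)` is
    assumed to raise ConnectionError when the transfer is interrupted.
    """
    failed: set[str] = set()
    for block_hash, data in blocks.items():
        if block_hash in uploaded:
            continue                      # acknowledged in an earlier attempt
        try:
            send(block_hash, data)
            uploaded.add(block_hash)
        except ConnectionError:
            failed.add(block_hash)        # retry this block on the next call
    return failed
```

Calling `upload_blocks` again with the same `uploaded` set re-sends only the blocks returned as failed, so the cost of an interruption is bounded by the blocks that did not complete.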
Content-addressed storage uses the cryptographic hash of the data as its storage key rather than an arbitrary identifier. Two blocks with identical content always produce the same hash and therefore map to the same storage object. Deduplication across all users and all file versions then follows from the key scheme itself: the only extra step is checking whether a key already exists before uploading. This design targets a 30-50% reduction in storage costs, since common files like OS libraries, default documents, and shared attachments are stored only once.
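A minimal in-memory model of a content-addressed store is shown below. The `BlockStore` class is a hypothetical stand-in for the S3 bucket, with the SHA-256 hex digest playing the role of the object key; the byte counters exist only to make the dedup effect visible.

```python
import hashlib


class BlockStore:
    """In-memory content-addressed store keyed by SHA-256 hex digest."""

    def __init__(self):
        self._objects: dict[str, bytes] = {}
        self.bytes_received = 0   # total bytes callers asked us to store

    def put(self, data: bytes) -> str:
        """Store a block and return its key; identical content is kept once."""
        key = hashlib.sha256(data).hexdigest()
        self.bytes_received += len(data)
        self._objects.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        """Fetch a block and verify that its content still matches its key."""
        data = self._objects[key]
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("stored block does not match its key")
        return data

    def bytes_stored(self) -> int:
        """Bytes actually held after deduplication."""
        return sum(len(v) for v in self._objects.values())
```

If two users each `put` the same block, both receive the same key and `bytes_stored()` grows only once, while `bytes_received` counts both submissions. Because the key is derived from the content, `get` can also detect corruption for free.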
When a file upload is finalized, the SyncService publishes a file_changed event to a Kafka stream partitioned by user ID. Each of the user's connected devices maintains a long-lived subscription to their sync channel. Upon receiving the event, a device fetches the updated block list and downloads only the blocks it does not already have locally. This push-based notification model achieves sync latency under 5 seconds for typical single-block changes.
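On the receiving device, handling a file_changed event amounts to fetching the blocks that are not yet local and reassembling the file. The sketch below assumes a hypothetical event shape with a `block_hashes` field and a `fetch_block` callback standing in for the download from object storage; neither is specified by the design above.

```python
def handle_file_changed(event: dict, local_blocks: dict[str, bytes], fetch_block) -> bytes:
    """Download only the missing blocks, then rebuild the file in order.

    `event["block_hashes"]` is the new version's ordered block list.
    `local_blocks` is the device's block cache (hash -> bytes), updated in place.
    `fetch_block(hash)` returns the bytes of one block from remote storage.
    """
    for block_hash in event["block_hashes"]:
        if block_hash not in local_blocks:
            local_blocks[block_hash] = fetch_block(block_hash)
    return b"".join(local_blocks[h] for h in event["block_hashes"])
```

For a three-block file where the device already holds the first and last blocks, this makes exactly one `fetch_block` call, which is what keeps sync latency low for single-block changes.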
The metadata layer tracks file versions using incrementing version numbers. When a device attempts to finalize an upload, the SyncService checks whether the base version matches the current version. If another device has already committed a newer version, the system detects a conflict and creates a conflict copy rather than silently overwriting changes. The user is then prompted to manually resolve the conflicting versions, similar to how Dropbox surfaces conflict files in the sync folder.
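The version check at finalize time is a form of optimistic concurrency control. The sketch below models it in memory; `MetadataStore`, `FileRecord`, and the conflict-copy naming pattern are assumptions for illustration. In a real deployment the compare-and-update would need to run atomically, for example inside a database transaction.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    version: int = 0
    block_hashes: list[str] = field(default_factory=list)


class MetadataStore:
    """In-memory stand-in for the file-version table in MetadataDB."""

    def __init__(self):
        self.files: dict[str, FileRecord] = {}

    def finalize(self, path: str, base_version: int,
                 block_hashes: list[str], device: str) -> str:
        """Commit a new version, or write a conflict copy if the base is stale.

        Returns the path that actually received the upload.
        """
        current = self.files.setdefault(path, FileRecord())
        if base_version == current.version:
            current.version += 1
            current.block_hashes = list(block_hashes)
            return path
        # Another device committed first: keep both versions side by side.
        conflict_path = f"{path} ({device}'s conflicted copy)"
        self.files[conflict_path] = FileRecord(1, list(block_hashes))
        return conflict_path
```

If a laptop and a phone both edit version 0 of `notes.txt` offline, the first finalize produces version 1 and the second lands in `notes.txt (phone's conflicted copy)` without touching the laptop's changes.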
A dedicated DeduplicationWorker processes file deletion events asynchronously from Kafka. It decrements the reference count for each block in the deleted file. When a block's reference count reaches zero, it is not immediately deleted from S3. Instead, it is marked for removal after a 24-72 hour grace period. This grace period prevents race conditions where a concurrent upload is referencing the same block, ensuring that no data loss occurs from premature garbage collection.
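Reference counting with a deferred sweep can be sketched as follows. The `BlockGC` class and its method names are hypothetical, timestamps are passed in explicitly so the behavior is easy to follow, and the in-memory dictionaries stand in for reference-count rows in MetadataDB.

```python
class BlockGC:
    """Reference counts plus a grace period before zero-reference blocks go."""

    def __init__(self, grace_seconds: float):
        self.grace_seconds = grace_seconds
        self.ref_counts: dict[str, int] = {}
        self.zero_since: dict[str, float] = {}   # hash -> time count hit zero

    def add_reference(self, block_hash: str) -> None:
        """A file version now points at this block."""
        self.ref_counts[block_hash] = self.ref_counts.get(block_hash, 0) + 1
        self.zero_since.pop(block_hash, None)    # a new reference rescues it

    def remove_reference(self, block_hash: str, now: float) -> None:
        """A file pointing at this block was deleted."""
        self.ref_counts[block_hash] -= 1
        if self.ref_counts[block_hash] == 0:
            self.zero_since[block_hash] = now    # mark, but do not delete yet

    def sweep(self, now: float) -> list[str]:
        """Return blocks that have been unreferenced for the full grace period."""
        expired = [h for h, t in self.zero_since.items()
                   if now - t >= self.grace_seconds]
        for h in expired:
            del self.zero_since[h]
            del self.ref_counts[h]
        return expired
```

A block whose count drops to zero and is then referenced again by a concurrent upload before the grace period ends is removed from `zero_since`, so `sweep` never returns it. Only the hashes returned by `sweep` would be deleted from object storage.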