1What is the primary disadvantage of hash partitioning compared to range partitioning?
Hash partitioning applies a hash function to each key and uses the hash value to determine which partition owns the data. This produces uniform data distribution regardless of key patterns, eliminating the hotspot problem inherent in range partitioning, but sacrifices the ability to perform efficient range queries.
Hash partitioning is the most widely used data distribution strategy in distributed key-value stores and wide-column databases. The idea is simple: apply a deterministic hash function to each record's partition key, then use the resulting hash value to assign the record to one of N partitions -- typically via hash(key) mod N or by mapping the hash to a position on a hash ring. Because good hash functions produce uniformly distributed outputs regardless of input patterns, hash partitioning eliminates the hotspot problem that plagues range partitioning when keys are sequential or skewed.
The choice of hash function matters for both distribution quality and performance. MD5 produces a 128-bit hash with excellent uniformity but is cryptographically heavy and slower than necessary for partitioning. MurmurHash3, used by Apache Cassandra, is a non-cryptographic hash that provides excellent distribution with significantly lower CPU overhead. xxHash is even faster and is used in systems where hashing throughput is critical. The hash function does not need to be cryptographically secure for partitioning -- it only needs to produce a uniform distribution and be deterministic (the same key always produces the same hash).
A key innovation in hash partitioning is the compound partition key, pioneered by Cassandra. In Cassandra's data model, a primary key consists of a partition key (one or more columns hashed for placement) and optional clustering columns (used for sort order within the partition). For example, a primary key of ((user_id), timestamp) hashes user_id to determine the partition, then sorts all of that user's records by timestamp within the partition. This gives the benefits of hash partitioning (even distribution across nodes) while preserving range query capability within a single partition (all events for user X are sorted by time). This compound key pattern is now standard in most distributed databases.
Virtual nodes (vnodes) address the imbalance problem that arises with simple modular hashing when the number of partitions changes. Instead of assigning each physical node one contiguous range of the hash space, vnodes assign each node many small, non-contiguous ranges (e.g., 256 vnodes per physical node). This means that when a node is added or removed, only a fraction of each existing node's data needs to move, rather than one node dumping its entire range onto the next. Cassandra uses vnodes by default (num_tokens=256), and DynamoDB uses a similar virtual partition concept internally. The tradeoff is increased metadata overhead -- the system must track thousands of token assignments instead of a handful of range boundaries.
The Restaurant Coat Check
Imagine a restaurant coat check with 4 numbered racks. Instead of assigning racks alphabetically (which would overload rack 1 with all the S-name guests at a Smith family reunion), the attendant assigns each coat by adding up the letters in the guest's name and taking the remainder when divided by 4. 'Alice' (1+12+9+3+5=30, 30 mod 4 = 2) goes to rack 2. 'Bob' (2+15+2=19, 19 mod 4 = 3) goes to rack 3. Even if 50 guests arrive from the same family, their coats spread evenly across all 4 racks. The downside: if a guest asks 'give me all the coats for the Smith family,' the attendant must check all 4 racks because family members are scattered -- there is no way to find them without looking everywhere.
Apache Cassandra
Cassandra hashes the partition key using MurmurHash3 to produce a 64-bit token that maps to a position on a token ring. Each node owns multiple token ranges (vnodes, default 256 per node). Data is replicated to the next N-1 nodes on the ring (where N is the replication factor). Cassandra's compound primary key -- PRIMARY KEY ((partition_key), clustering_col1, clustering_col2) -- hashes the partition key for placement and uses clustering columns for sorted storage within the partition, enabling efficient time-range queries within a single partition.
Amazon DynamoDB
DynamoDB uses hash-based partitioning internally, applying an undisclosed hash function to the partition key attribute to assign items to partitions. Each partition supports up to 3,000 read capacity units and 1,000 write capacity units, and DynamoDB automatically splits partitions that exceed these limits. DynamoDB also supports composite primary keys (partition key + sort key), where the sort key enables range queries within a single partition -- for example, querying all orders for a customer within a date range.
Redis Cluster
Redis Cluster uses CRC16 hashing to map each key to one of 16,384 hash slots. Each node in the cluster is assigned a subset of these slots. When a key is accessed, Redis computes CRC16(key) mod 16384 to determine the slot, then routes the request to the node owning that slot. The fixed 16,384 slots act as virtual partitions, making it straightforward to rebalance by migrating slots between nodes. Redis also supports hash tags -- {user123}.profile and {user123}.settings hash to the same slot, enabling multi-key operations on related data.
| Aspect | Description |
|---|---|
| Uniform Distribution vs Range Query Efficiency | Hash partitioning guarantees even data and load distribution regardless of key patterns, but destroys key ordering. Range queries must scatter to all partitions and gather results, increasing latency proportionally to the number of partitions. For workloads dominated by range scans, this overhead can be prohibitive. |
| Simplicity vs Rebalancing Cost | Simple hash(key) mod N is easy to implement but requires rehashing nearly all data when N changes. Consistent hashing or virtual nodes reduce the rebalancing cost to K/N keys per topology change but add metadata complexity and require a distributed coordination mechanism to track partition assignments. |
| Hash Function Speed vs Distribution Quality | Faster hash functions (xxHash, MurmurHash3) reduce per-operation CPU overhead but may have marginally less uniform distribution than cryptographic hashes. In practice, MurmurHash3 and xxHash provide excellent uniformity for partitioning workloads, and the speed advantage is significant at millions of operations per second. |
| Vnode Count: Balance vs Metadata Overhead | More vnodes per physical node produce more uniform distribution and faster rebalancing, but increase the metadata that must be stored and replicated (token maps, gossip protocol overhead). Cassandra's default of 256 vnodes per node works well for clusters under 100 nodes, but very large clusters may reduce this to 16-32 vnodes to limit gossip traffic. |
Cassandra at Instagram: Scaling to Billions of Rows
Scenario
Instagram needed to store billions of user-generated media entries with two dominant access patterns: (1) insert a new photo/video for a user, and (2) retrieve the most recent N media items for a user's profile feed. Using a simple timestamp-based partition key would have created hotspots during peak upload times (everyone posting simultaneously), while a random partition key would have made profile feed queries (recent media for one user) require scatter-gather across all nodes.
Solution
Instagram used Cassandra's compound primary key with user_id as the partition key and upload_timestamp as the clustering column: PRIMARY KEY ((user_id), upload_timestamp). This hashed user_id via MurmurHash3 for even distribution across the cluster -- no single node was overloaded regardless of how many photos were uploaded simultaneously. Within each user's partition, media entries were stored sorted by timestamp (descending), so fetching a user's 20 most recent photos was a single-partition range scan -- no scatter-gather required.
Outcome
The schema supported 400+ million users with sub-5ms P99 read latency for profile feeds. Write throughput scaled linearly with cluster size because user IDs are uniformly distributed by hash. The compound key pattern became a canonical example in Cassandra data modeling and is now taught in every Cassandra training course.
See Hash Partitioning in action
Explore system design templates that use hash partitioning and run traffic simulations to see how these concepts perform under real load.
Browse Templates1What is the primary disadvantage of hash partitioning compared to range partitioning?
2In Cassandra's PRIMARY KEY ((user_id), timestamp), what determines which node stores the data?
3Redis Cluster maps keys to how many hash slots?