Vetora logo
#️⃣Partitioning

Hash Partitioning

Hash partitioning applies a hash function to each key and uses the hash value to determine which partition owns the data. This produces uniform data distribution regardless of key patterns, eliminating the hotspot problem inherent in range partitioning, but sacrifices the ability to perform efficient range queries.

Overview

Hash partitioning is the most widely used data distribution strategy in distributed key-value stores and wide-column databases. The idea is simple: apply a deterministic hash function to each record's partition key, then use the resulting hash value to assign the record to one of N partitions -- typically via hash(key) mod N or by mapping the hash to a position on a hash ring. Because good hash functions produce uniformly distributed outputs regardless of input patterns, hash partitioning eliminates the hotspot problem that plagues range partitioning when keys are sequential or skewed.

The choice of hash function matters for both distribution quality and performance. MD5 produces a 128-bit hash with excellent uniformity but is cryptographically heavy and slower than necessary for partitioning. MurmurHash3, used by Apache Cassandra, is a non-cryptographic hash that provides excellent distribution with significantly lower CPU overhead. xxHash is even faster and is used in systems where hashing throughput is critical. The hash function does not need to be cryptographically secure for partitioning -- it only needs to produce a uniform distribution and be deterministic (the same key always produces the same hash).

A key innovation in hash partitioning is the compound partition key, pioneered by Cassandra. In Cassandra's data model, a primary key consists of a partition key (one or more columns hashed for placement) and optional clustering columns (used for sort order within the partition). For example, a primary key of ((user_id), timestamp) hashes user_id to determine the partition, then sorts all of that user's records by timestamp within the partition. This gives the benefits of hash partitioning (even distribution across nodes) while preserving range query capability within a single partition (all events for user X are sorted by time). This compound key pattern is now standard in most distributed databases.

Virtual nodes (vnodes) address the imbalance problem that arises with simple modular hashing when the number of partitions changes. Instead of assigning each physical node one contiguous range of the hash space, vnodes assign each node many small, non-contiguous ranges (e.g., 256 vnodes per physical node). This means that when a node is added or removed, only a fraction of each existing node's data needs to move, rather than one node dumping its entire range onto the next. Cassandra uses vnodes by default (num_tokens=256), and DynamoDB uses a similar virtual partition concept internally. The tradeoff is increased metadata overhead -- the system must track thousands of token assignments instead of a handful of range boundaries.

Key Points
  • 1Hash partitioning applies a deterministic hash function to the partition key and maps the result to a partition, typically via modular arithmetic (hash mod N) or a hash ring. This produces uniform distribution regardless of key patterns, eliminating sequential-key hotspots.
  • 2Common hash functions for partitioning include MurmurHash3 (used by Cassandra for speed and uniformity), xxHash (fastest non-cryptographic hash, used in high-throughput systems), and MD5 (uniform but unnecessarily slow for non-security use). The function must be deterministic but need not be cryptographically secure.
  • 3Range queries require scatter-gather: because adjacent keys hash to different partitions, a range scan like 'all users with IDs 1000-2000' must query every partition and merge results. This is the fundamental tradeoff of hash partitioning compared to range partitioning.
  • 4Compound partition keys (Cassandra model) combine hash-based placement with sorted order within a partition. The partition key columns are hashed for node assignment, while clustering columns define sort order within the partition, enabling efficient range scans within a single entity's data.
  • 5Virtual nodes (vnodes) assign each physical node many small token ranges (e.g., 256 per node) instead of one large range. This improves load balancing, speeds up rebalancing when nodes join or leave, and reduces the variance in data distribution across nodes.
  • 6The 'mod N' problem occurs when N (number of partitions) changes: hash(key) mod N produces different assignments than hash(key) mod (N+1) for most keys, requiring massive data movement. Consistent hashing solves this by only redistributing K/N keys when a node is added or removed.
Simple Example

The Restaurant Coat Check

Imagine a restaurant coat check with 4 numbered racks. Instead of assigning racks alphabetically (which would overload rack 1 with all the S-name guests at a Smith family reunion), the attendant assigns each coat by adding up the letters in the guest's name and taking the remainder when divided by 4. 'Alice' (1+12+9+3+5=30, 30 mod 4 = 2) goes to rack 2. 'Bob' (2+15+2=19, 19 mod 4 = 3) goes to rack 3. Even if 50 guests arrive from the same family, their coats spread evenly across all 4 racks. The downside: if a guest asks 'give me all the coats for the Smith family,' the attendant must check all 4 racks because family members are scattered -- there is no way to find them without looking everywhere.

Real-World Examples

Apache Cassandra

Cassandra hashes the partition key using MurmurHash3 to produce a 64-bit token that maps to a position on a token ring. Each node owns multiple token ranges (vnodes, default 256 per node). Data is replicated to the next N-1 nodes on the ring (where N is the replication factor). Cassandra's compound primary key -- PRIMARY KEY ((partition_key), clustering_col1, clustering_col2) -- hashes the partition key for placement and uses clustering columns for sorted storage within the partition, enabling efficient time-range queries within a single partition.

Amazon DynamoDB

DynamoDB uses hash-based partitioning internally, applying an undisclosed hash function to the partition key attribute to assign items to partitions. Each partition supports up to 3,000 read capacity units and 1,000 write capacity units, and DynamoDB automatically splits partitions that exceed these limits. DynamoDB also supports composite primary keys (partition key + sort key), where the sort key enables range queries within a single partition -- for example, querying all orders for a customer within a date range.

Redis Cluster

Redis Cluster uses CRC16 hashing to map each key to one of 16,384 hash slots. Each node in the cluster is assigned a subset of these slots. When a key is accessed, Redis computes CRC16(key) mod 16384 to determine the slot, then routes the request to the node owning that slot. The fixed 16,384 slots act as virtual partitions, making it straightforward to rebalance by migrating slots between nodes. Redis also supports hash tags -- {user123}.profile and {user123}.settings hash to the same slot, enabling multi-key operations on related data.

Trade-Offs
AspectDescription
Uniform Distribution vs Range Query EfficiencyHash partitioning guarantees even data and load distribution regardless of key patterns, but destroys key ordering. Range queries must scatter to all partitions and gather results, increasing latency proportionally to the number of partitions. For workloads dominated by range scans, this overhead can be prohibitive.
Simplicity vs Rebalancing CostSimple hash(key) mod N is easy to implement but requires rehashing nearly all data when N changes. Consistent hashing or virtual nodes reduce the rebalancing cost to K/N keys per topology change but add metadata complexity and require a distributed coordination mechanism to track partition assignments.
Hash Function Speed vs Distribution QualityFaster hash functions (xxHash, MurmurHash3) reduce per-operation CPU overhead but may have marginally less uniform distribution than cryptographic hashes. In practice, MurmurHash3 and xxHash provide excellent uniformity for partitioning workloads, and the speed advantage is significant at millions of operations per second.
Vnode Count: Balance vs Metadata OverheadMore vnodes per physical node produce more uniform distribution and faster rebalancing, but increase the metadata that must be stored and replicated (token maps, gossip protocol overhead). Cassandra's default of 256 vnodes per node works well for clusters under 100 nodes, but very large clusters may reduce this to 16-32 vnodes to limit gossip traffic.
Case Study

Cassandra at Instagram: Scaling to Billions of Rows

Scenario

Instagram needed to store billions of user-generated media entries with two dominant access patterns: (1) insert a new photo/video for a user, and (2) retrieve the most recent N media items for a user's profile feed. Using a simple timestamp-based partition key would have created hotspots during peak upload times (everyone posting simultaneously), while a random partition key would have made profile feed queries (recent media for one user) require scatter-gather across all nodes.

Solution

Instagram used Cassandra's compound primary key with user_id as the partition key and upload_timestamp as the clustering column: PRIMARY KEY ((user_id), upload_timestamp). This hashed user_id via MurmurHash3 for even distribution across the cluster -- no single node was overloaded regardless of how many photos were uploaded simultaneously. Within each user's partition, media entries were stored sorted by timestamp (descending), so fetching a user's 20 most recent photos was a single-partition range scan -- no scatter-gather required.

Outcome

The schema supported 400+ million users with sub-5ms P99 read latency for profile feeds. Write throughput scaled linearly with cluster size because user IDs are uniformly distributed by hash. The compound key pattern became a canonical example in Cassandra data modeling and is now taught in every Cassandra training course.

Common Mistakes
  • Using hash(key) mod N without considering what happens when N changes. Adding or removing a partition reshuffles nearly all data. Use consistent hashing or virtual nodes to limit data movement to approximately 1/N of total data per topology change.
  • Choosing a cryptographic hash function (SHA-256, bcrypt) for partitioning. Cryptographic hashes are designed to be computationally expensive to resist attacks, but partitioning only needs uniform distribution and determinism. MurmurHash3 or xxHash are 10-100x faster with equally good distribution for this use case.
  • Forgetting that hash partitioning destroys key ordering. If your application needs range scans across entities (e.g., all orders between two dates across all users), hash partitioning forces an expensive scatter-gather. Consider compound keys or a hybrid approach with a secondary index.
  • Setting too many or too few virtual nodes. Too many vnodes (512+) cause excessive gossip protocol traffic and metadata overhead in large clusters. Too few vnodes (4-8) produce uneven distribution, especially in heterogeneous clusters where nodes have different capacities.
Related Concepts

See Hash Partitioning in action

Explore system design templates that use hash partitioning and run traffic simulations to see how these concepts perform under real load.

Browse Templates

Compare hash vs range partitioning under different write patterns

Metrics to watch
partition_balance_ratioscatter_gather_latencythroughput_per_node
Run Simulation
Test Your Understanding

1What is the primary disadvantage of hash partitioning compared to range partitioning?

2In Cassandra's PRIMARY KEY ((user_id), timestamp), what determines which node stores the data?

3Redis Cluster maps keys to how many hash slots?

Deeper Reading