Partitioning

Splitting data across nodes to scale beyond the limits of a single machine.

Concepts

Range Partitioning

Range partitioning divides the key space into contiguous ranges, with each partition owning all keys between a lower and an upper boundary. Keys stay sorted within each partition, which makes range queries efficient, but sequential keys such as timestamps or auto-incrementing IDs can create hotspots because all new writes land in the partition that owns the end of the key space.
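
As a minimal sketch of the lookup described above, the snippet below finds the owning partition with a binary search over sorted split points. The `BOUNDARIES` values and both function names are hypothetical, chosen only for illustration.

```python
from bisect import bisect_right

# Hypothetical split points (an assumption for this sketch).
# Partition 0 owns keys below "g", partition 1 owns ["g", "n"),
# partition 2 owns ["n", "t"), and partition 3 owns "t" and above.
BOUNDARIES = ["g", "n", "t"]


def partition_for(key: str) -> int:
    """Return the index of the partition whose range contains key."""
    return bisect_right(BOUNDARIES, key)


def partitions_for_range(lo: str, hi: str) -> range:
    """Return the contiguous partitions that hold keys in [lo, hi]."""
    return range(partition_for(lo), partition_for(hi) + 1)
```

Because ranges are contiguous, a scan from "h" to "k" touches only partition 1 here. The same property explains the hotspot: keys that only ever increase always resolve to the last partition.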

Hash Partitioning

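Hash partitioning applies a hash function to each key and uses the hash value to determine which partition owns the data. With a well-distributed hash function, keys spread close to uniformly across partitions regardless of key patterns, which avoids the sequential-key hotspots that affect range partitioning. The cost is efficient range queries: adjacent keys land on unrelated partitions, so a range scan has to query every partition. Hashing also cannot spread load for a single very hot key, since every request for that key still hashes to the same partition.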
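
A minimal sketch of this idea, assuming a fixed partition count and MD5 as a stable hash (both are illustrative choices, not something the text prescribes). Python's built-in `hash()` is avoided for strings because it is randomized per process by default.

```python
import hashlib

NUM_PARTITIONS = 4  # assumed fixed partition count for this sketch


def hash_partition(key: str) -> int:
    """Map a key to a partition using a hash that is stable across processes."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce mod N.
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS
```

Sequential keys such as "order-1", "order-2", and so on end up scattered across all partitions, which is what removes the write hotspot and also what makes range scans expensive.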

Consistent Hashing

Consistent hashing maps both keys and nodes onto a hash ring and assigns each key to the first node found moving clockwise from the key's position. When a node is added or removed, only about K/N keys on average (where K is the total number of keys and N is the number of nodes) need to be redistributed, compared to nearly all keys with traditional mod-N hashing. This makes consistent hashing well suited to distributed caches, CDNs, and databases that need elastic scaling.
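
The ring can be sketched as a sorted list of positions searched with `bisect`. The `HashRing` class, its method names, and the use of virtual nodes (`vnodes`, several ring positions per physical node to even out load) are assumptions added for illustration and are not described in the text above.

```python
import bisect
import hashlib


def _ring_hash(value: str) -> int:
    """Stable 64-bit position on the ring (MD5 is an illustrative choice)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes  # ring positions per physical node (assumption)
        self._points = []     # sorted ring positions
        self._owner = {}      # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            point = _ring_hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring has no nodes")
        # First position clockwise from the key, wrapping past the end.
        idx = bisect.bisect_right(self._points, _ring_hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

With this sketch, adding a fifth node to a four-node ring moves only the keys that the new node now owns, roughly one fifth of them. By contrast, switching from `hash % 4` to `hash % 5` reassigns about 80% of uniformly hashed keys.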

Hotspot Mitigation

Hotspot mitigation addresses the problem of one partition receiving disproportionate traffic compared to others. Hotspots arise from celebrity-problem access patterns (a few extremely popular keys), keys that cluster in time, or poor shard key selection. Solutions include key salting, dynamic partition splitting, read caching, and application-level routing, each trading added complexity for more uniform load distribution.
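
Of the solutions listed, key salting is the simplest to sketch. The example below assumes writes to a known hot key are spread over a fixed number of salted sub-keys and that reads fan out over all of them. `SALT_BUCKETS`, the `#` separator, and the function names are hypothetical.

```python
import random

SALT_BUCKETS = 4  # assumed fan-out for keys known to be hot


def salted_write_key(key: str) -> str:
    """Pick one of the salted sub-keys at random for a write."""
    return f"{key}#{random.randrange(SALT_BUCKETS)}"


def salted_read_keys(key: str) -> list[str]:
    """Return every sub-key a read must fetch and merge."""
    return [f"{key}#{i}" for i in range(SALT_BUCKETS)]
```

The trade-off is visible in the two functions: writes become cheap to spread, while each read costs `SALT_BUCKETS` lookups plus a merge. Salted sub-keys can still hash to the same partition, so the spread is probabilistic rather than guaranteed.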