Partition assignment schemes

Partition assignment, or sharding, is a basic problem in distributed system design. When resourcesResources might be computation processing time, or storage, or network bandwidth. need to be divided across multiple physical computers, partition assignment tells us which computed to address for any given request.

One way to address this might be to use a centralized data store to maintain state about partition assignments (Such as Zookeeper), but in some cases it's also possible for individual servers and clients to determine consistent shard assignments individually.

Above: How items (left) map to servers (right) with different partitioning schemes Code on Github

One approach is to treat the available servers like buckets in a hash table, and use a hash function keyed on the data to determine which server to address. The simplest hashing approach to distribute data across k buckets is probably modular hashing. Given an integer h derived from the data, the host to address is h mod k. By selecting the mod-K scheme in the visualization above, we can see how individual items (left column) are assigned to each server (right columns). If we add or remove servers from the pool, we can see how the partition assignments change.

A problem with this approach is that if the pool of servers changes, a large fraction of the items end up being reassigned to a new server. When a new server is added, not only are some items moved from the existing servers onto it, but many items swap places on the existing servers! If each item represents a parcel of data stored on the server, this implies that a lot of copying over the network needs to happen whenever the cluster size changes.

To address this problem, Highest Random Weight also called Rendezvous hashingpartitioning gives us more steady set of partition assignments. When we add a server, the only items that are reassigned are those that are moved to the new server. When we remove a server, the number of items moved is likewise small.

The Highest Random WeightThaler and Ravishankar (1996) scheme works by choosing a common weighting function, f(server, item), agreed to by all users of the clusters (servers and potentially clients). When an item needs to be hashed, the weighting function is called for all of the available servers, and the item is assigned to the one with the highest weight. Since the weighting function is shared by all users of the cluster, any two systems would come to the same conclusion of where the item should be assigned. If the weighting function is uniform, the variance in the number of items assigned to each server with converge to zero as the number of items grows.

If there are a lot of computers in the cluster, a potential drawback of the strategy above is that computing an assignment for a new item requires O(k) operations (k is the server count). Although there are some workarounds that come with increases implementation complexity, a related partitioning startegy is Consistent Hashing, which only requires O(log k) operations.

Consistent Hashing works in a way that is convenient to visualize geometrically. Each server is hashed to a point on the edge of a circle. When an item requires assignment, it is hashed to its own point on the circle's edge, and then assigned to the first server found searching (counter-)clockwise. By storing a sorted list of the server hashes, the correct server for each item can be found by bisection.

A complication with Consistent Hashing is that when a server is removed, all items assigned to it will fall onto the next server along the edge. In some scenarios, this could even create cascading failures. A workaround for this problem (implemented in the example above), is to create a large number of hashes for each server. As a result, the items assigned to that server are likely to come from many parts along the edge of the circle, and if it is removed, they will fall more evenly onto the remaining servers.

Neither Highest Random Weight nor Consistent Hashing schemes guarantee an even distrubition of resources, although they both converge toward that probabilistically. But if the number of items to be assigned is small, this may become an issue. If the items can be arbitrarily subdivided, it may help with balancing.