What is consistent hashing, and what problems is it usually used to solve?
This is the back-end mini class of the Academy. Each article looks at its topic from eight angles:
[background introduction] [knowledge analysis] [common problems] [solutions] [coding practice] [extended thinking] [more discussion] [References]
Together they give an in-depth analysis of back-end knowledge and skills. This article shares:
[What is consistent hashing? What problems is it commonly used to solve?]
Hello, I'm an honest, pure and kind Java programmer from the Beijing branch of the IT Academy. Today I'd like to share one of the deep-thinking knowledge points with you: what is consistent hashing, and what problems is it usually used to solve?
1. Background introduction
Before looking at the consistent hashing algorithm itself, first look at where it is applied. To relieve the pressure on any single server, multiple cache servers are deployed and the data is spread evenly across them. A distributed database must first solve the problem of mapping the entire data set onto multiple nodes according to a partitioning rule. That is, the data set is divided among the nodes, and each node is responsible for a subset of the overall data.
There are usually two ways of distributing data: hash partitioning and sequential partitioning.
Sequential distribution: data dispersion is easily skewed, key placement is tied to the business, sequential access is possible, and batch operations are supported.
Hash distribution: data dispersion is high, key placement is independent of the business, sequential access is not possible, and batch operations are supported.
2. Knowledge analysis
Node remainder partition
An ordinary hash algorithm takes a specific piece of data, such as a Redis key or a user ID, and applies the formula hash(key) % N, where N is the number of nodes. The result determines which node the data is mapped to.
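The formula above can be sketched in a few lines of Java. This is only a minimal illustration: the class name ModuloPartitioner and the choice of the JDK's CRC32 as the hash function are assumptions made for the example, not something the article prescribes.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper illustrating hash(key) % N.
final class ModuloPartitioner {

    // Assumption: CRC32 is used only because it is deterministic and ships with the JDK.
    static long hash(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return crc.getValue(); // unsigned 32-bit value stored in a long, never negative
    }

    // Returns the index (0 .. nodeCount-1) of the node responsible for the key.
    static int nodeFor(String key, int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        return (int) (hash(key) % nodeCount);
    }

    public static void main(String[] args) {
        for (String key : new String[] {"user:1001", "user:1002", "order:42"}) {
            System.out.println(key + " -> node " + nodeFor(key, 4));
        }
    }
}
```

The same key always lands on the same node as long as the node count stays the same, which is exactly why changing the node count is a problem.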
Advantage
The outstanding advantage of this method is its simplicity, and it is often used in database and table sharding rules. Pre-partitioning is generally adopted: the number of partitions is planned in advance according to the amount of data.
Shortcoming
When the number of nodes changes, for example when nodes are added or removed, the mapping between data and nodes must be recalculated, which forces data to be migrated. For this reason, capacity is usually expanded by doubling the number of nodes, which avoids disrupting every mapping and causing a full migration. With doubling, only about 50% of the data has to be migrated.
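The migration cost can be checked with a small experiment. The sketch below is an idealized assumption: instead of hashing real keys, it treats the integers 0 to sampleSize-1 as perfectly uniform hash values and counts how many of them map to a different node after the node count changes. The class name RemapDemo is hypothetical.

```java
// Hypothetical experiment: how many hash values change node when N changes?
final class RemapDemo {

    // Fraction of hash values whose node index differs between oldN and newN nodes.
    static double movedFraction(int oldN, int newN, int sampleSize) {
        int moved = 0;
        for (int h = 0; h < sampleSize; h++) {
            if (h % oldN != h % newN) {
                moved++;
            }
        }
        return (double) moved / sampleSize;
    }

    public static void main(String[] args) {
        System.out.println(movedFraction(4, 5, 100_000)); // prints 0.8
        System.out.println(movedFraction(4, 8, 100_000)); // prints 0.5
    }
}
```

Going from 4 to 5 nodes moves 80% of the values, while doubling from 4 to 8 moves 50%. With a real hash function the numbers would only be approximately these.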
Consistent hash partition
The purpose of consistent hashing is to migrate as little data as possible when the number of nodes changes. All storage nodes are placed on a hash ring whose end joins its beginning. After its hash is calculated, each key is assigned to the first storage node found clockwise from that position. When a node joins or leaves, only the node that is clockwise adjacent to it on the hash ring is affected.
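The ring lookup described above might be sketched as follows, using a sorted map as the ring. The class name HashRing, the node names and the use of CRC32 as the hash function are all assumptions for illustration. A production implementation would typically choose a better-distributed hash.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical minimal hash ring: position on the ring -> node name.
final class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    // Assumption: CRC32 places both nodes and keys on a ring of size 2^32.
    static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    void addNode(String node) {
        ring.put(hash(node), node);
    }

    void removeNode(String node) {
        ring.remove(hash(node));
    }

    // Walk clockwise: first node at or after the key's position, wrapping to the start.
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing();
        ring.addNode("node-A");
        ring.addNode("node-B");
        ring.addNode("node-C");
        String before = ring.nodeFor("user:1001");
        ring.addNode("node-D"); // only keys between node-D and its predecessor move
        System.out.println(before + " -> " + ring.nodeFor("user:1001"));
    }
}
```

After node-D is added, every key either stays on its previous node or moves to node-D. No key moves between the old nodes.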
Advantage
Adding or removing a node affects only the clockwise-adjacent node on the hash ring and has no impact on the other nodes.
Shortcoming
The distribution of data depends on where the nodes sit on the ring. Because the nodes are not necessarily evenly spaced on the hash ring, data is not distributed uniformly when it is stored.
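To see the unevenness, one can measure how much of the ring each node owns. The sketch below is an illustration under assumptions: the class name RingShare and the node names are hypothetical, and CRC32 stands in for whatever hash a real system uses. Each node's share is the arc from its predecessor up to its own position.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical helper: fraction of the hash ring owned by each node.
final class RingShare {
    static final long RING_SIZE = 1L << 32; // CRC32 values lie in [0, 2^32)

    static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Requires at least one node; shares are listed in ring order and sum to 1.
    static Map<String, Double> shares(String... nodes) {
        TreeMap<Long, String> ring = new TreeMap<>();
        for (String node : nodes) {
            ring.put(hash(node), node);
        }
        Map<String, Double> result = new LinkedHashMap<>();
        // The predecessor of the first node is the last node, one full turn earlier.
        long previous = ring.lastKey() - RING_SIZE;
        for (Map.Entry<Long, String> entry : ring.entrySet()) {
            result.put(entry.getValue(), (double) (entry.getKey() - previous) / RING_SIZE);
            previous = entry.getKey();
        }
        return result;
    }

    public static void main(String[] args) {
        shares("node-A", "node-B", "node-C")
                .forEach((node, share) -> System.out.printf("%s owns %.1f%% of the ring%n", node, share * 100));
    }
}
```

With only a few nodes the shares are usually far from equal, which is the skew described above.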
Virtual slot partition
In essence, this approach first applies an ordinary hash algorithm that maps all data onto a fixed number of hash slots, and then divides those slots among the nodes. Because the number of hash slots is fixed, adding a node does not require rehashing data into new slots. Existing slots are simply moved from one node to another. This keeps the data evenly distributed and also avoids migrating too much data when nodes are added.
Redis's cluster mode uses virtual slot partitioning. There are 16384 slots in total (numbered 0 to 16383), distributed evenly among the nodes.
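As a rough sketch of the slot calculation, Redis Cluster maps a key to a slot with CRC16(key) mod 16384, using the XMODEM variant of CRC16. The class name SlotPartitioner and the nodeForSlot method are assumptions for illustration. In particular, the even split of slots over nodes shown here is a simplification, and Redis's hash-tag rule is not handled.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of virtual slot partitioning in the style of Redis Cluster.
final class SlotPartitioner {
    static final int SLOT_COUNT = 16384;

    // CRC16/XMODEM: polynomial 0x1021, initial value 0, no reflection.
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF; // keep the register at 16 bits
            }
        }
        return crc;
    }

    // The slot depends only on the key, never on how many nodes exist.
    static int keySlot(String key) {
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % SLOT_COUNT;
    }

    // Assumption: slots are split into contiguous, nearly equal ranges, one per node.
    static int nodeForSlot(int slot, int nodeCount) {
        return slot * nodeCount / SLOT_COUNT;
    }

    public static void main(String[] args) {
        int slot = keySlot("123456789");
        System.out.println(slot);                 // prints 12739
        System.out.println(nodeForSlot(slot, 3)); // prints 2
    }
}
```

Adding a node changes only the slot-to-node table. The key-to-slot mapping stays the same, so data moves slot by slot between nodes.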
3. Frequently asked questions
4. Solutions
5. Coding practice
General hash algorithm