What is a consistent hash
This is the back-end small class of the monastery. Each article is shared from
[background introduction] [knowledge analysis] [common problems] [solutions] [coding practice] [extended thinking] [more discussion] [References]
Eight aspects of in-depth analysis of back-end knowledge / skills. This article shares:
[what is a consistent hash]
[java small class of Xiuzhen academy] what is consistency hash
Opening remarks:
Hello, I'm Liao you, a student of the 32nd session of the Beijing Branch of the IT academy. I'm an honest, pure and kind java programmer. Today, I'd like to share with you the knowledge points in deep thinking on Java task 5 on the official website of the Academy - what is consistency hash
1、 Background:
The consistent hash algorithm was proposed by MIT in 1997 (see extended reading [1]). Its design goal is to solve the hot spot problem in the Internet. Its original intention is very similar to that of carp. Consistent hash corrects the problem caused by the simple hash algorithm used in car, so that DHT can be really applied in P2P environment.
2、 Knowledge analysis:
1. Principle of consistent hash algorithm
Everyone who has studied the memcached cache database knows that the memcached server does not provide distributed cache consistency, but the client provides it. The specific steps are as follows when calculating the consistency hash:
(1) First, get the hash value of memcached server (node) and configure it on the circle of 0 ~ 2 '32.
(2) Then the hash value of the key storing the data is obtained in the same way and mapped to the same circle.
(3) Then, start searching clockwise from the location where the data is mapped, and save the data to the first server found. If the server is still not found after 232, it will be saved to the first memcached server.
3. Balance of characteristics of consistent hash algorithm
Balance means that the hash results can be distributed to all buffers as much as possible, so that all buffer spaces can be used. Many hash algorithms can meet this condition.
4. Monotonicity of properties of consistent hash algorithm
Monotonicity means that if some content has been allocated to the corresponding buffer through hash and a new buffer is added to the system, the hash result should ensure that the original allocated content can be mapped to the new buffer instead of other buffers in the old buffer set. Simple hash algorithms often cannot meet the monotonicity requirements, such as the simplest linear hash: x = (AX + b) mod (P). In the above formula, P represents the size of all buffers. It is not difficult to see that when the buffer size changes (from P1 to P2), all the original hash results will change, so it does not meet the monotonicity requirements. The change of hash results means that when the buffer space changes, all mapping relationships need to be updated in the system. In P2P system, the change of buffer is equivalent to peer joining or exiting the system. This situation will occur frequently in P2P system, so it will bring great computing and transmission load. Monotonicity requires that the hash algorithm can deal with this situation.
5. Dispersion of properties of consistent hash algorithm
In a distributed environment, the terminal may not see all the buffers, but only some of them. When the terminal wants to map the content to the buffer through the hash process, the buffer range seen by different terminals may be different, resulting in inconsistent hash results. The final result is that the same content is mapped to different buffers by different terminals. This situation should be avoided obviously, because it causes the same content to be stored in different buffers, reducing the efficiency of system storage. Dispersion is defined as the severity of the above situation. A good hash algorithm should avoid inconsistency as much as possible, that is, reduce the dispersion as much as possible.
6. Load of consistent hash algorithm characteristics
The load problem is actually looking at decentralization from another perspective. Since different terminals may map the same content to different buffers, a specific buffer may also be mapped to different content by different users. Like decentralization, this situation should be avoided, so a good hash algorithm should be able to minimize the load of buffer.
3、 Frequently asked questions:
Xmemcached itself does not have distributed functions, so how does it implement distributed applications?
4、 Solution:
The distributed implementation of xmemcached is implemented by the client. The client uses a consistent hash algorithm to calculate the storage location of each record.
5、 Coding practice
6、 Expand thinking:
Common mapping methods for data distribution: hash mapping:
The key is mapped to a finite value through a hash algorithm, such as CRC16 (key)% 16384 range mapping: the value space of the key is divided into ranges and cached in the corresponding area according to the data ID. hash is combined with the range: the typical algorithm is a consistent hash algorithm, which uniformly hashes the key to obtain the hash value, and divides the range through the whole hash space, The divided nodes are cache nodes used to store data, and then store the data in the corresponding storage space.
7、 References:
[1] http://www.zsythink.net/archives/1182/
[2] Deep distributed cache
[3] https://www.cnblogs.com/lpfuture/p/5796398.html
8、 More discussion:
Q1: why do I need a consistent hash
A1: in solving the problem of data distribution, if a simple hash modulus is used, because the modulus is bound to the data volume of the cache server, the modulus will also change when the cache server is increased or decreased, which will lead to the invalidation of all previous data; If the consistent hash algorithm is adopted, because the modulus is fixed, the increase or decrease of the server will not cause all data failure, and only a small part will be affected.
Q2: what are the application scenarios of consistent hash
A2: the xmemcached client uses the consistent hash algorithm to solve the data distribution problem. Redis uses CRC16 (key)% 16384 by default to solve the data distribution problem. However, we can also apply the consistent hash to the data distribution of redis. The specific choice depends on the application situation.
Q3: if virtual node technology is used to solve the balance problem, does the data must be evenly distributed on each server?
A3: This is not necessarily true. Using virtual node technology can only make the nodes evenly distributed on the hash ring, but not absolutely uniform. If you want to improve the balance, you can virtual several more nodes.
9、 Conclusion:
That's all for today's sharing. You are welcome to like, forward, leave messages and make bricks~
Ppt link video link