JDK 1.8: A Detailed Introduction to the Implementation Principles of HashMap

HashMap overview

HashMap is a non-synchronized, hash-table-based implementation of the Map interface. This implementation provides all of the optional map operations and permits null values and the null key. This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.

Data structure of HashMap

In the Java programming language there are two basic storage structures: the array and the reference (Java's analog of a pointer). All data structures can be built from these two, and HashMap is no exception. HashMap is in fact a "linked-list hash" structure, that is, a combination of array and linked list; in JDK 1.8, a red-black tree implementation is added on top of this: when the length of a linked list grows beyond 8, it is converted into a red-black tree.

HashMap in Java adopts the chained addressing (separate chaining) method: in short, a combination of array and linked list, where each array element carries a linked-list structure. When a key is hashed, an array index is obtained and the entry is placed on the linked list at that index.

Node is an inner class of HashMap that implements the Map.Entry interface; it is essentially a mapping (a key-value pair).
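As a sketch, the core of the Node class (abridged from the OpenJDK 8 source; the accessor methods, equals(), and hashCode() are omitted) looks like this:

    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;     // the key's hash, cached so it is computed only once
        final K key;
        V value;
        Node<K,V> next;     // the next node on the same bucket's chain

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }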

Sometimes two keys hash to the same position, which means a hash collision has occurred. Naturally, the more uniformly the hash algorithm distributes its results, the smaller the probability of collision and the higher the map's access efficiency.

A very important field in the HashMap class is Node[] table, the hash bucket array. It is, obviously, an array of Node objects.

If the hash bucket array is large, even a poor hash algorithm will spread the elements out; if it is small, even a good hash algorithm will produce many collisions. Space cost must therefore be balanced against time cost: choose the size of the hash bucket array according to the actual situation, and on that basis design a good hash algorithm to reduce collisions. How, then, does HashMap keep the probability of hash collision small while keeping the hash bucket array (Node[] table) from occupying too much space? The answer: a good hash algorithm plus a good expansion (resize) mechanism.

Before understanding the hash and expansion processes, we must first look at several fields of HashMap. The default constructor of HashMap initializes the following fields. The source code is as follows:
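A minimal sketch of those fields and their defaults, slightly abridged from the OpenJDK 8 source:

    // The default initial capacity: 16 (must be a power of two).
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
    // The default load factor.
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    // The chain length at which a list is converted to a red-black tree.
    static final int TREEIFY_THRESHOLD = 8;

    transient Node<K,V>[] table;  // the hash bucket array, resized as necessary
    transient int size;           // the number of key-value pairs actually stored
    transient int modCount;       // structural modification count, for fail-fast iteration
    int threshold;                // next size at which to resize: capacity * load factor
    final float loadFactor;       // the load factor for this hash table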

First, the Node[] table is initialized with length 16 by default; the load factor (loadFactor) defaults to 0.75; and threshold is the maximum number of nodes (key-value pairs) the HashMap can hold before resizing: threshold = length * loadFactor. With the defaults, for example, threshold = 16 * 0.75 = 12. In other words, once the array length is fixed, the larger the load factor, the more key-value pairs the map can accommodate.

By the defining formula of the load factor, threshold is the maximum number of elements allowed for a given loadFactor and length (array length). When this number is exceeded, the map is resized (expanded); after expansion, the capacity of the HashMap is twice its previous capacity. The default load factor of 0.75 is a balanced choice between space and time efficiency, and it is recommended not to modify it except in special cases: if memory is plentiful and time efficiency matters most, the load factor can be reduced; conversely, if memory is tight and time efficiency is less critical, it can be increased, even above 1.

The size field is easy to understand: it is the actual number of key-value pairs in the HashMap. Note the differences among size, the table length, and threshold (the maximum number of key-value pairs the table can hold before resizing). The modCount field records the number of changes to the internal structure of the HashMap and is used mainly for fail-fast iteration. Note that "changes to the internal structure" means structural changes, such as putting a new key-value pair; overwriting the value of an existing key is not a structural change.

In HashMap, the length of the hash bucket array table must be a power of 2 (hence a composite number). This is an unconventional design; the conventional approach is to make the bucket count a prime number, since, relatively speaking, primes cause fewer collisions than composites (for a proof see https://www.oudahe.com/p/11989/ ). Hashtable's initial bucket size of 11 is an example of the prime-number design (though Hashtable cannot guarantee the size remains prime after expansion). HashMap adopts the power-of-2 design mainly to optimize the modulo operation and the expansion process; in addition, to reduce collisions, HashMap mixes the high-order bits into the computation when locating a hash bucket index.
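Relatedly, when a caller passes an arbitrary initial capacity to the constructor, JDK 1.8 rounds it up to the next power of 2 using tableSizeFor (from the OpenJDK 8 source):

    // Returns a power of two >= cap, by smearing the highest set bit rightward.
    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }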

There remains a problem. Even with a reasonably designed load factor and hash algorithm, chains may inevitably become too long, and once a chain is too long it seriously degrades HashMap's performance. So JDK 1.8 further optimizes the data structure by introducing the red-black tree: when a chain grows too long (more than 8 nodes by default), the linked list is transformed into a red-black tree, exploiting the red-black tree's fast insertion, deletion, and lookup to improve HashMap's performance.

Determining the index position in the hash bucket array

Code implementation:
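The two methods below sketch this; method 1 is JDK 1.8's hash() verbatim, while method 2 is shown as JDK 1.7's indexFor() for clarity (in JDK 1.8 the same expression appears inline, e.g. tab[(n - 1) & hash]):

    // Method 1: perturb the hashCode by XOR-ing its high 16 bits into its low 16 bits.
    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Method 2: map the hash onto a bucket index. Because length is a power of 2,
    // h & (length - 1) is equivalent to h % length but much cheaper.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }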

The hash algorithm here essentially consists of three steps: taking the key's hashCode value, a high-order bit operation, and a modulo operation.

For any given key, as long as its hashCode() return value is the same, the hash value computed by method 1 is always the same. The first idea that comes to mind is to take the hash value modulo the array length, which distributes elements fairly uniformly; however, the modulo operation is relatively expensive. HashMap instead uses method 2 to compute at which index of the table array the object should be stored.

This method is very ingenious: it obtains the storage position via h & (table.length - 1), and the length of HashMap's underlying array is always a power of 2, which is a speed optimization. When length is a power of 2, h & (length - 1) is equivalent to taking the hash modulo length, i.e. h % length, but & is more efficient than %.

In the JDK 1.8 implementation, the high-order bit operation is optimized: the high 16 bits of the hashCode are XOR-ed with the low 16 bits, (h = key.hashCode()) ^ (h >>> 16), a trade-off of speed, utility, and quality. This way, even when the table array is relatively small, both the high and low bits take part in the hash computation without too much overhead.

In the following example, n is the length of the table.
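Here is a tiny runnable illustration; the hashCode value 0x12345678 is purely hypothetical:

    public class IndexDemo {
        public static void main(String[] args) {
            int h = 0x12345678;            // a made-up hashCode() result
            int hash = h ^ (h >>> 16);     // 0x12345678 ^ 0x00001234 = 0x1234444C
            int n = 16;                    // the table length
            int index = (n - 1) & hash;    // 0x1234444C & 0xF = 12
            System.out.printf("hash=0x%08X, index=%d%n", hash, index);
        }
    }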

Implementation of HashMap's put method

The general idea of the put function is:

1. Hash the key's hashCode(), then compute the bucket index from the hash;

2. If there is no collision, put the node straight into the bucket;

3. If there is a collision, append the node to the linked list behind the bucket;

4. If a collision makes the list too long (length >= TREEIFY_THRESHOLD, 8 by default), convert the list into a red-black tree;

5. If the key already exists, replace the old value (guaranteeing key uniqueness);

6. If the map is full (size exceeds threshold = loadFactor * current capacity), resize.

The specific code is as follows:
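The putVal method below is abridged from the OpenJDK 8 source, with comments added; the public put method simply calls putVal(hash(key), key, value, false, true):

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // Step 1: lazily create the table on the first insertion.
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // Step 2: the bucket is empty -- place the new node there directly.
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            // Step 3: the first node matches the key -- remember it for overwrite.
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // Step 4: the bucket holds a red-black tree -- delegate to the tree.
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                // Step 5: walk the linked list.
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        // The list is now longer than TREEIFY_THRESHOLD (8): treeify.
                        if (binCount >= TREEIFY_THRESHOLD - 1)
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // an existing mapping for the key: replace the value
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        // Step 6: grow the table once size exceeds threshold.
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }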

Implementation of HashMap's get method

The idea is as follows:

1. If the first node in the bucket matches the key, it is hit directly;

2. If there is a collision, look for the corresponding entry via key.equals(k):

If the bucket holds a tree, search the tree via key.equals(k) in O(log n); if it holds a linked list, search the list via key.equals(k) in O(n).
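This is getNode, abridged from the OpenJDK 8 source; get itself is just return (e = getNode(hash(key), key)) == null ? null : e.value;

    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            // Check the first node in the bucket -- the common case.
            if (first.hash == hash &&
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) {
                // A tree bin: O(log n) lookup.
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                // A linked list: O(n) scan.
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }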

Capacity expansion mechanism

Resizing means recomputing the capacity. As elements are continually added to a HashMap and the array inside it can no longer hold more of them, the object must enlarge the array so that more elements fit. Of course, an array in Java cannot grow automatically; the approach is to replace the existing small-capacity array with a new, larger one, just as when carrying water we swap a small bucket for a big one to hold more.

Let us analyze the resize source code. Since JDK 1.8 integrates the red-black tree and is rather complex, for ease of understanding we still use the JDK 1.7 code, which is easier to follow; in essence there is little difference, and the specific differences are discussed later.
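Below is a simplified JDK 1.7-style resize and transfer, close to the historical source (the hash-seed/rehash flag of the real JDK 1.7 code is omitted here for clarity):

    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) { // already at the cap: stop growing
            threshold = Integer.MAX_VALUE;
            return;
        }
        Entry[] newTable = new Entry[newCapacity];
        transfer(newTable);                    // copy all entries to the new array
        table = newTable;
        threshold = (int) (newCapacity * loadFactor);
    }

    void transfer(Entry[] newTable) {
        Entry[] src = table;
        int newCapacity = newTable.length;
        for (int j = 0; j < src.length; j++) { // walk every bucket of the old table
            Entry<K,V> e = src[j];
            if (e != null) {
                src[j] = null;                 // help the garbage collector
                do {
                    Entry<K,V> next = e.next;
                    int i = indexFor(e.hash, newCapacity); // recompute the index
                    e.next = newTable[i];      // head insertion into the new bucket
                    newTable[i] = e;
                    e = next;
                } while (e != null);
            }
        }
    }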

The idea is simply to replace the existing, smaller array with one of larger capacity; the transfer() method copies the elements of the original Entry array into the new one.

The reference newTable[i] is assigned to e.next, i.e. head insertion into a singly linked list is used: a new element is always placed at the head of the list at its position. Thus elements placed earlier at a given index end up at the tail of the entry chain (if a hash collision occurs); this differs from JDK 1.8, as explained in detail below. After the index is recomputed, elements that were on the same entry chain in the old array may end up at different positions in the new array.

The following example illustrates the expansion process. Suppose our hash algorithm is simply key mod table size (the array length). The hash bucket array table has size 2, the keys are 3, 7, and 5, and the put order is 5, 7, 3. After mod 2 they all collide in table[1]. Assume the load factor loadFactor = 1, i.e. expansion happens when the actual number of key-value pairs exceeds the table size. The next steps are to resize the hash bucket array to 4 and then rehash all nodes.

Now let us explain what optimizations JDK 1.8 makes. Observe that we always expand by a power of 2 (the length is doubled), so each element ends up either at its original index or at the original index plus the old capacity. (In the original figure, n is the length of the table: part (a) shows how keys key1 and key2 determine their index positions before expansion, and part (b) after expansion, where hash1 is the hash and high-bit operation result for key1.)

After the hash is recomputed for an element, because n has doubled, the mask n - 1 covers one additional high-order bit of the hash, so the new index changes as follows:

Therefore, when expanding a HashMap, there is no need to recompute the hash as the JDK 1.7 implementation does; it is enough to look at whether the newly covered bit of the original hash value is 1 or 0: if it is 0 the index is unchanged, and if it is 1 the index becomes "original index + oldCap". (The original article illustrates this with a resize diagram expanding the table from 16 to 32.)

This design is truly ingenious: it saves the time of recomputing hash values, and since the newly covered bit can be regarded as randomly 0 or 1, the resize process spreads previously colliding nodes evenly across the new buckets. This is a new optimization point in JDK 1.8. Note one difference: when JDK 1.7 rehashes, migrating an old list to the new table inverts the list whenever elements map to the same new index (a consequence of head insertion), whereas JDK 1.8 does not invert the list. Interested readers can study JDK 1.8's resize source code, which is very well written:
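Here is an abridged version of JDK 1.8's resize; the capacity/threshold bookkeeping for edge cases (initial allocation, maximum capacity) and the tree-splitting details are simplified:

    final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int newCap = oldCap << 1;               // double the capacity...
        threshold = threshold << 1;             // ...and the threshold (simplified)
        @SuppressWarnings({"rawtypes", "unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[]) new Node[newCap];
        table = newTab;
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)             // a single node: place it directly
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode) // a tree bin: split it in two
                    ((TreeNode<K,V>) e).split(this, newTab, j, oldCap);
                else {
                    // Split the list into a "lo" list (index unchanged) and a
                    // "hi" list (index = j + oldCap), preserving node order.
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {  // the new high bit is 0
                            if (loTail == null) loHead = e;
                            else                loTail.next = e;
                            loTail = e;
                        } else {                       // the new high bit is 1
                            if (hiTail == null) hiHead = e;
                            else                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) { loTail.next = null; newTab[j] = loHead; }
                    if (hiTail != null) { hiTail.next = null; newTab[j + oldCap] = hiHead; }
                }
            }
        }
        return newTab;
    }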

Summary

We can now answer the following questions to deepen our understanding of HashMap:

1. When would you use a HashMap? What are its characteristics?

2. Do you know how HashMap works?

3. Do you know the principles of get and put? What roles do equals() and hashCode() play?

4. Do you know the implementation of hash? Why is it done this way?

5. What if the size of the HashMap exceeds the capacity defined by the load factor?

6. What happens when two objects have the same hashCode?

7. If the hashCodes of two keys are the same, how do you get the value object?

8. Do you understand the problems involved in resizing a HashMap?

9. Why are String and wrapper classes such as Integer suitable as keys?

Thank you for reading; I hope this has helped you, and thank you for your support of this site!
