[introduction to Java] day22 Java container class (V) HashMap source code analysis (I)

2020-12-17 • Java

The first part mainly explains the structure, important parameters and methods in HashMap, as well as the places and application scenarios that need attention in use.

The second part mainly explains the hash algorithm, disturbance function and expansion function in HashMap. The more in-depth analysis of ordinary nodes is the beauty of the algorithm.

The third chapter mainly explains entryset, keyset and values in HashMap.

The fourth chapter mainly explains the treenode structure in HashMap and the structure adjustment method when increasing or decreasing elements. (explain with the red black tree in jdk8).

Because there are too many points to talk about in HashMap, here are some important points to explain. The angles and depths of the four articles are different, so students at different stages can also choose different parts to read. The first part belongs to the elementary part that is easy to understand, and the second, third and fourth parts belong to the advanced part of HashMap. If reading is difficult, You can skip it first and read it later.

Well, let's move on to our topic.

This article will abandon the previous statement and directly put hundreds of lines of source code. It is too dry. We have to wet it a little to digest (funny). Next, we will explain it with pictures and text.

Through this article, you will understand the following questions:

　　1. What is the structure of HashMap?

　　2. What are the advantages and disadvantages of HashMap?

　　3. When should I use HashMap?

　　4. What are the common methods in HashMap?

　　5. How do the get () and put () methods of HashMap work?

　　6. What is collision detection and collision resolution in HashMap?

　　7. What if the size of the HashMap exceeds the capacity defined by the load factor?

　　8. What should I pay attention to when setting the key of HashMap? Can I use custom objects as keys?

Well, so there's still a lot of content and a lot of dry goods. Let's first look at a small chestnut in HashMap:

The output is also simple:

It can be seen that the order of storage in HashMap is somewhat different from the order we put it in, but the results of each run are the same and output in a magical order. Why? Don't worry, let's make a breakpoint first.

You can see that there is a table field in the HashMap object. It can be seen that it is an array. The score information of our put is in this guy. Look, is this order very similar to the output order above?

But wait a minute. Did you find that Xiao Li, Xiao Zi and Xiao Chi are missing.. Don't worry about this problem. We'll find them later.

The data structure in HashMap stores nodes in the form of array + linked list. Each node is stored in the form of key value pairs (node < K, V >). The table seen above is where the values are stored in HashMap. Its data structure is as follows: node < K, V > [] table; what exactly is this node? Let's look at its code:

Node class inherits from map The entry interface. If you have no impression of this interface, you can turn back to the contents of the map interface. The contents of the node are very simple. Hash, key value information and the reference of the next node are connected through such references to form a chain. Think about the structure of table, node < K, V > []. Now do you understand the storage method of array + linked list?

What? Is that not enough? Well, a picture is worth a thousand words. I said I should explain it with pictures and text, so let's look at some pictures together:

Well, the data we store is like this in memory. Let's take a look at the data in the breakpoint:

By comparing the two pictures, we can see that the arrays are not stored in order, and there are many empty buckets in the middle (each grid is called a bin, which is translated into buckets

From here, we can see the logic when adding elements to HashMap:

Maybe you want to know what this table is. Let's look at some important member variables by the way:

The table field holds our data. The type is the node array. The structure of the node is also very simple. It simply stores the key and value, as well as the hash of the key and the reference to the next node.

The set of entries is cached in the member variable entryset (in fact, if you look carefully, you will find that the storage element logic of entryset is not simple, which will be explained in Chapter 3). Threshold indicates the threshold for the next readjustment (capacity * loading factor). The reprint factor indicates the maximum filling degree of the table, which is 0.75 by default, that is, when 75% of the capacity is used up, the capacity expansion will be triggered, because when there are enough elements in the table, the probability of conflict will greatly increase, and the increase of conflict will lead to the increase of the number of elements in each bucket, which will make the efficiency of finding elements inefficient. When the same bucket When the number of elements in the bucket reaches 8, the element structure in the bucket will be converted into a red black tree.

So, the question is, why is it 8 instead of 6 or 7,10???

It roughly means that, ideally, hashcode is randomly distributed. When the load factor is set to 0.75, the probability of the number of elements in the bucket roughly conforms to the Poisson distribution of 0.5, and the probability of the number of elements in the bucket reaching 8 is less than one in ten million. Because the conversion to red black tree is still a more time-consuming and labor-consuming operation, it is naturally not expected to be carried out frequently, but if it is set too large, The meaning of setting this value will be lost.

Well, the problem comes again.. Why 0.75 instead of 0.5,0.8???

Another poor translation:

Moreover, the load factor and capacity can be specified in the constructor:

If you do not specify the initial size and load factor, the default load factor and default capacity will be used, and the HashMap uses lazy loading. The table will be initialized only when you really add elements to it. We have seen the implementation of the put method above. Let's see how the get method is implemented:

The idea of get is very simple. The general idea is as follows:

Well, the most important methods have been introduced. It's time to save Grandpa

　　1. What is the structure of HashMap?

HashMap is an array + linked list storage form. The default initial capacity is 16 and the default loading factor is 0.75. When the length of the linked list reaches 8, it will be transformed into a red black tree to improve the search efficiency.

　　2. What are the advantages and disadvantages of HashMap?

The advantage of HashMap is that the search speed is very fast. We can quickly locate a bucket and the object to be found in a constant time. The disadvantage is that it is good at hashing - the hashing algorithm depends on the hashcode of the key, so if the hashcode of the key is poorly designed, it will seriously affect the performance. In extreme cases, if the hash value is the same every time, the length in the linked list will be too long and then transformed into a tree. The effect of hashing during capacity expansion is also very poor. The other extreme Each time the hash value is calculated, it is different, so the array in the HashMap will continue to expand, resulting in the increasing capacity of the HashMap.

On the other hand, HashMap is thread unsafe. If you want to use HashMap in concurrent programming, you need to use its synchronization class, collections The synchronizedmap () method converts a normal HashMap into a thread safe one, or replaces it with the concurrenthashmap under the concurrent package.

　　3. When should I use HashMap?

Because the HashMap search speed is very fast, it is used in scenes that often need to access elements. For example, if you want to sort the elements in one list B according to the elements in another list a, you need to often search the elements in B in A. generally, the search is carried out in the way of traversal. If the list is large, Efficiency still needs to be considered. At this time, if the elements in a are stored in the map and the elements in B are used as keys, the search efficiency will be greatly improved. This is a strategy of exchanging space for time.

　　4. What are the common methods in HashMap?

Common methods include put, get, putall, remove, clear, replace and size.

　　5. How do the get () and put () methods of HashMap work?

By hashing the hashcode () of the key and calculating the subscript (n-1 & hash), the position of the bucket is obtained. If a hash conflict occurs, use key The equals () method goes to the linked list or tree to find the corresponding node.

　　6. What is collision detection and collision resolution in HashMap?

When the hashcodes of two keys are the same, a collision will occur, like Xiao Ming and Xiao Li above. At this time, the added elements will be hung behind the bucket in the form of linked list or tree nodes.

　　7. What if the size of the HashMap exceeds the capacity defined by the load factor?

If the size of the HashMap exceeds the load factor * capacity, the capacity will be expanded to twice the original capacity.

　　8. What should I pay attention to when setting the key of HashMap? Can I use custom objects as keys?

Use immutable objects as much as possible for the set key, such as value and string, so as to ensure the immutability of the key. You can also use custom class objects, but you need to have a good design for hashcode method and equals method.

There should be applause here. The explanation of this article is finally completed. I haven't had much time to write a blog for two weeks because of other time. I'm sorry, you watchers, but writing a blog really takes time. I've thought about this article for several days. There are too many things in HashMap, and it's impossible to cover all the details, Moreover, if you introduce everything in detail, it is obviously not friendly enough for beginners, so you decided to divide it into four articles, so that you can read selectively according to the degree you can understand.

I also hope you will appreciate it. If you are interested in my article, you are also welcome to pay attention to it. It is still updated continuously. You are also welcome to put forward good suggestions. If there are important knowledge points missing or wrong, please point them out in time and communicate more.

The content of this article comes from the network collection of netizens. It is used as a learning reference. The copyright belongs to the original author.

THE END

Java

二维码

Java Basics (02): special string classes and related extension APIs

< <上一篇

Detailed explanation and misunderstanding of Java 8 stream reduce

下一篇>>

搜索内容

[introduction to Java] day22 Java container class (V) HashMap source code analysis (I)

热门文章