HashMap
This is the back-end small class of the monastery. Each article is shared from
[background introduction] [knowledge analysis] [common problems] [solutions] [coding practice] [extended thinking] [more discussion] [References]
Eight aspects of in-depth analysis of back-end knowledge / skills. This article shares:
【HashMap 】
(1) Background:
Do not talk about the use of HashMap, take a look at the underlying source code?
Think: HashMap uses key and value for storage. What is the data structure used?
We know array and linked list data structures
Array:
Advantages: fast query speed
Disadvantages: slow addition and deletion
Linked list:
Advantages: fast addition and deletion
Disadvantages: slow query speed
Can we combine the advantages of the two to achieve very fast query, addition and deletion efficiency?
So let's guess whether the underlying data structure of HashMap source code is in the form of linked list + array?
(2) Knowledge analysis:
1. What is the underlying storage of HashMap data?
Java is an object-oriented development language. Everything can be regarded as objects.
From the source code, we can see that the key and value in the map are saved in the node object.
class Node {
private K key; // Used to locate the array index position
private V value;
private Node next; // The next node of the linked list
}
2. How to represent an array?
transient Node< K,V>[] table;
Each list is called a bucket, or hash bucket array, which is an array of nodes.
HashMap uses a hash table for storage. How to get the index position of the object in the table?
key. Hashcode() ---- > hashcode ----- > high bit operation and modulo operation of hash algorithm ----- > get the storage location
Sometimes two keys will be located at the same position, and a hash collision occurs. The more distributed and uniform the results of hash algorithm, the smaller the probability of hash collision and the higher the access efficiency of map.
If the hash bucket array is large, even the poor hash algorithm will be scattered. If the hash bucket array is small, even the good hash algorithm will have more collisions.
3. Should the array have a size?
After learning array, we all know that if we want to give an initialization size to the array, does HashMap have a size?
Defalutsize = ... Initial size???
Maxmumsize = ... Maximum size???
There must be.
4. What is the initial size of the array?
You can see from the source code
static final int DEFAULT_ INITIAL_ CAPACITY = 1 << 4; // aka 16
The initial size is 16
If the initial size is not enough or reaches a certain value, do you want to expand the array size?
final Node [] resize()
5. Is there a basis for capacity expansion?
int threshold = 16; Initialization length of node [] table
transient int size; Number of key value pairs actually existing in HashMap
Size > threshold * decimal less than 1
The actual size cannot be expanded until it reaches 16, which will lose performance. It must be expanded when it reaches a point, which is 0.25, 0.5, 0.75?
A capacity expansion factor is defined in the source code
static final float DEFAULT_ LOAD_ FACTOR = 0.75f
size = threshold * Load factor ----->16 * 0.75 =12
Therefore, the capacity will be expanded at 12.
Why choose 0.75?
The default load factor of 0.75 is a balanced choice for space and time efficiency. It is recommended that you do not modify it, unless in the case of special time and space, if there is a lot of memory space and high requirements for time efficiency, you can reduce the value of load factor; On the contrary, if memory space is tight and time efficiency is not required, you can increase the value of load factor, which can be greater than 1.
6. Do you need a length limit for linked lists?
No matter how reasonable the design of load factor and hash algorithm is, it is inevitable that the zipper is too long. Once the zipper is too long, it will seriously affect the performance of HashMap. After the linked list reaches a limit value, do you need to make a change?
At jdk1 In version 8, the data structure is further optimized and the red black tree is introduced.
When the length of the linked list is too long, the linked list will be transformed into a red black tree. The performance of HashMap will be improved by using the characteristics of rapid addition, deletion, modification and query of red black tree
As defined in the source code, when the linked list exceeds 8, it will be transformed into a red black tree
static final int TREEIFY_ THRESHOLD = 8;
7. Where does the node exist?
Where there is a position, this position is calculated.
Use this key to perform a calculation to calculate where this node should exist.
To know where key and value are stored in the HashMap structure, use the hash function to realize the source code (method 1 + method 2):
Method 1:
static final int hash(Object key) { //jdk1.8 & jdk1.7
int h;
// h = key. Hashcode () is the first step to get the hashcode value
//H ^ (H > > > 16) is the high-order participation operation in the second step
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
Method 2:
Static int indexfor (int h, int length) {/ / the source code of jdk1.7. JDK1.8 does not have this method, but the implementation principle is the same
return h & (leng
(3) Frequently asked questions:
Is HashMap thread safe?
(4) Solution:
HashMap is not thread safe, that is, multiple threads can write HashMap at any time, which may lead to inconsistent data. If you need to meet thread safety, in the multithreading scenario, you should try to avoid using thread unsafe HashMap and use thread safe concurrent HashMap.
(5) Coding practice:
For example, the hash value of one of my keys
h=hashCode():
11111111111111111111000011101010
h>>>16:
00000000000000001111111111111111
hash=h^(h>>>16):
11111111111111110000111100010101
(n-1)&hash:
11111111111111110000111100010101
Finally, take the last four digits 0101 and convert it to decimal, which is 5
(6) Expand thinking:
JDK1. 8 and jdk1 7 performance comparison?
JDK1. 8. The performance of HashMap is greatly optimized by introducing red black tree.
(7) References:
Re understanding of Java 8 series HashMap -- meituan technical team
(8) More discussion:
Q1. What other implementation classes do map collections have?
A1: LinkedHashMap, hashtable and treemap
Q2. Why should HashMap ensure that the capacity of HashMap is to the power of 2?
A2: the HashMap capacity is to the power of 2 to reduce hash conflicts, because if there are hash conflicts, multiple entries in one location will occur when storing entries, which will reduce the reading and writing speed; When using the capacity to the power of 2, bit operations will be performed with the HashMap (capacity-1) according to the hash value of the key. Such operations will minimize the hash conflict when the binary numbers of (capacity-1) are all 1.
Q3. Advantages and disadvantages of HashMap?
A3: it stores data according to the hashcode value of the key. In most cases, it can directly locate its value, so it has fast access speed, but the traversal order is uncertain. HashMap only allows the key of one record to be null at most, and the value of multiple records to be null. HashMap is not thread safe, that is, multiple threads can write HashMap at any time, which may lead to inconsistent data. If you need to meet thread safety, you can use the synchronized map method of collections to make HashMap thread safe, or use concurrenthashmap.
(9) Thanks:
(10) Conclusion:
That's all for today's sharing. You are welcome to like, forward, leave messages and make bricks~
Ppt link video link