Java collection source code analysis (6): HashMap

summary

HashMap is a hash-table-based implementation of the Map interface, and it is not thread-safe. Because it resolves hash conflicts with separate chaining (the "zipper" method), its data structure is array + linked list. Since JDK 8, when hash conflicts become severe, a bucket's linked list is converted to a red-black tree under certain conditions to optimize query performance. So since JDK 8, its data structure is array + linked list + red-black tree.

For HashMap, as a collection container, we need to pay attention to its data storage structure, its iteration methods, and whether it can store null values. As a collection backed by an array, we also need to pay attention to how it resizes. And, given the nature of a hash table, we need to pay attention to how it quickly locates an index by using the hash value as a modulus.

1、 Data structure of HashMap

Before JDK 8, the data structure of HashMap was array + linked list. Since JDK 8 it is array + linked list + red-black tree.

In HashMap, each value is stored in a Node or TreeNode instance. The container has a Node<K,V>[] table member variable, and each cell in the array is called a "bucket". When an element is added, the index is computed from the hash of its key, and the Node is stored in that bucket. If the table's capacity is insufficient, the table is resized and the elements inside the container are rehashed.

When a hash conflict occurs, that is, when different keys yield the same index, the new node is linked to the elements already in the bucket, and subsequent conflicts are handled the same way, eventually forming a linked list.

Since JDK 8, considering that a long linked list in a "bucket" hurts query efficiency when hash conflicts are severe, once a bucket's list grows long enough, under certain conditions the Nodes are converted to TreeNodes, that is, the linked list is transformed into a red-black tree.

A red-black tree can be understood simply as a balanced binary search tree that does not require strict balance: it guarantees search efficiency while keeping the number of rotations low. This data structure preserves lookup performance even under severe hash conflicts.

2、 Member variables of HashMap

Because HashMap inherits member variables from the AbstractMap abstract class, and adds its own variables plus the parameters needed for rehashing during resizing, its member variables are fairly complex. By origin and purpose, we can divide them into three categories:

1. Variables from parent class

/**
 * 1. Set view of the keys, obtained via keySet()
 */
transient Set<K> keySet;

/**
 * 2. Collection view of the values, obtained via values()
 */
transient Collection<V> values;

2. Own variables

/**
 * 1. Number of structural modifications. Used for the fast-fail mechanism under
 *    concurrent modification, as in AbstractList
 */
transient int modCount;

/**
 * 2. Number of key-value mappings in the map
 */
transient int size;

/**
 * 3. Set view of the key-value pair objects (Entry), obtained via entrySet()
 */
transient Set<Map.Entry<K,V>> entrySet;

/**
 * 4. The bucket array. Each bucket holds a linked list or a red-black tree
 */
transient Node<K,V>[] table;

3. Variables and constants related to capacity expansion

/**
 * 1. Default initial capacity. Must be a power of 2; defaults to 16
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;

/**
 * 2. Maximum capacity: 1073741824, i.e. about half of Integer.MAX_VALUE
 */
static final int MAXIMUM_CAPACITY = 1 << 30;

/**
 * 3. Resize threshold: capacity * load factor. The table is resized once the
 *    element count exceeds this value. Defaults to 0 before initialization
 */
int threshold;

/**
 * 4. Load factor. Resizing happens when element count / capacity reaches this value
 */
final float loadFactor;

/**
 * 5. Default load factor: 0.75 when none is specified in the constructor
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

/**
 * 6. Minimum table capacity for treeification. Buckets are treeified only when
 *    the table capacity is at least this value
 */
static final int MIN_TREEIFY_CAPACITY = 64;

/**
 * 7. Treeify threshold for a bucket. When the table capacity is at least
 *    MIN_TREEIFY_CAPACITY and a bucket holds at least this many nodes,
 *    its linked list is converted to a red-black tree
 */
static final int TREEIFY_THRESHOLD = 8;

/**
 * 8. Untreeify threshold for a bucket. When a bucket's node count, i.e. its list
 *    length, drops to this value or below, the red-black tree is converted back
 *    to a linked list
 */
static final int UNTREEIFY_THRESHOLD = 6;

3、 Construction method

HashMap provides four construction methods:

1. Specify capacity and load factor

public HashMap(int initialCapacity, float loadFactor) {
    // reject a negative initial capacity
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    // cap the initial capacity at the maximum capacity
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    // reject a non-positive or NaN load factor
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    // record the load factor
    this.loadFactor = loadFactor;
    // threshold temporarily holds the initial table size:
    // the nearest power of 2 >= initialCapacity
    this.threshold = tableSizeFor(initialCapacity);
}

This involves a sizing method, tableSizeFor():

static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

This method returns the smallest power of 2 greater than or equal to the given capacity. For example, passing in 1 returns 1, passing in 7 returns 8, and passing in 9 returns 16.
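The behavior is easy to verify with a small sketch (the class name is ours; the method body is copied from the JDK source above):

public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(1));  // 1
        System.out.println(tableSizeFor(7));  // 8
        System.out.println(tableSizeFor(16)); // 16
        System.out.println(tableSizeFor(17)); // 32
    }
}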

2. Specify capacity only

public HashMap(int initialCapacity) {
    // use the default load factor of 0.75
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

3. Nothing specified

public HashMap() {
    // the load factor defaults to 0.75; the capacity will default to 16 on the first resize
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

4. Build from a given Map

public HashMap(Map<? extends K,? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m,false);
}

This involves putMapEntries(), a method for merging another map into this one; the putAll() method is also implemented on top of it. Since adding entries also involves resizing and other machinery, we will not introduce it here; we will come back to it later.
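As a quick usage example (the variable names are ours), the copy constructor behaves like creating an empty map and calling putAll():

import java.util.HashMap;
import java.util.Map;

public class CopyConstructorDemo {
    public static void main(String[] args) {
        Map<String, Integer> source = new HashMap<>();
        source.put("a", 1);
        source.put("b", 2);
        // copies all entries through putMapEntries()
        Map<String, Integer> copy = new HashMap<>(source);
        System.out.println(copy); // {a=1, b=2}
    }
}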

4、 Inner class of HashMap

Based on the earlier analysis of AbstractMap in part V, "AbstractMap views", of Java collection source code analysis (5): Map and AbstractMap, we know that HashMap, as a subclass of AbstractMap, inherits three collection views.

At the same time, it needs an inner class implementing the Entry interface to serve as the element type of the entry set.

Therefore, as a subclass of AbstractMap, HashMap needs at least 3 collection views + 3 iterators bound to those views + an Entry implementation class: 7 inner classes.

In fact, because of the red-black tree and parallel iteration introduced in JDK 8, it also adds a red-black tree node implementation of Entry and the spliterators corresponding to the three view containers.

Since an abstract base class is extracted for both the iterators and the spliterators, we have:

3 view containers + 1 abstract iterator class + 3 view iterators + 1 abstract spliterator class + 3 view spliterators + 1 Entry implementation class + 1 red-black tree node implementation of Entry

13 inner classes in total

1. Node / TreeNode

Node is the node class of HashMap. Before JDK 8 it was named Entry; it is the implementation class of the Map.Entry interface.

In HashMap, each position of the array is a "bucket", and a bucket stores the Node objects carrying the data. On hash conflict, multiple Nodes in the same bucket form a linked list.

static class Node<K,V> implements Map.Entry<K,V> {
    // hash of the node's key
    final int hash;
    // the key
    final K key;
    // the value
    V value;
    // next node in the bucket's list
    Node<K,V> next;
}

In JDK 8, when the table capacity is at least 64 and the number of nodes in a bucket reaches 8, treeification is triggered instead of a resize: the Node class is converted to TreeNode, and the linked list becomes:

static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    // parent node
    TreeNode<K,V> parent;
    // left child
    TreeNode<K,V> left;
    // right child
    TreeNode<K,V> right;
    // predecessor node (preserves the original list order)
    TreeNode<K,V> prev;
    // whether this node is red
    boolean red;
}

It is worth mentioning that TreeNode extends the LinkedHashMap.Entry class, while LinkedHashMap.Entry in turn extends HashMap.Node. So TreeNode is in fact also a subclass of Node, which is the structural basis for converting a Node into a TreeNode.

In addition, although TreeNode forms a tree, it still maintains an implicit linked-list structure through prev: each node can in principle reach the node inserted before it, so it can still be read as a linked list.

2. KeySet / KeyIterator

Set<K> keySet is a field already declared in AbstractMap; it is the Set holding the keys. HashMap's hash algorithm guarantees key uniqueness, which matches the semantics of a Set. HashMap provides the implementation class KeySet.

KeySet extends the AbstractSet abstract class and implements most of its abstract methods directly with HashMap's own methods. It is worth mentioning that its iterator() returns a HashMap inner class, KeyIterator.

3. Values / ValueIterator

Like KeySet, Values is the implementation class for the Collection<V> values field in AbstractMap. It extends the AbstractCollection abstract class and implements most of the abstract methods with HashMap's methods.

Similarly, its iterator() returns the HashMap inner class ValueIterator.

4. EntrySet / EntryIterator

AbstractMap leaves one core abstract method, entrySet(), for subclasses to implement, and EntrySet is the class created to implement it. It extends AbstractSet<Map.Entry<K,V>>, whose elements each represent one key-value pair in the container; in the comments the author calls it a view.

Through the EntrySet class we can express the Map as a Set view, much as Collection does with toArray().

Likewise, HashMap implements an inner iterator class for it, EntryIterator, which EntrySet's iterator() method returns.

5. HashIterator

HashIterator is an abstract iterator class used to iterate over Nodes. It is the parent class of the three inner iterator classes above: KeyIterator, ValueIterator and EntryIterator.

abstract class HashIterator {
    Node<K,V> next;        // next entry to return
    Node<K,V> current;     // current entry
    int expectedModCount;  // for fast-fail
    int index;             // current slot

    public final boolean hasNext() { return next != null; }

    final Node<K,V> nextNode() { /* shown in full in section 9 */ }

    public final void remove() { /* body omitted */ }
}

Although it is named HashIterator, it does not implement the Iterator interface itself; its subclasses do. It only provides the iteration and removal methods.

Its subclasses KeyIterator, ValueIterator and EntryIterator are also very simple: each just wraps nextNode() as its own next() method, which can be seen as a kind of adapter.

final class KeyIterator extends HashIterator
    implements Iterator<K> {
    public final K next() { return nextNode().key; }
}

final class ValueIterator extends HashIterator
    implements Iterator<V> {
    public final V next() { return nextNode().value; }
}

final class EntryIterator extends HashIterator
    implements Iterator<Map.Entry<K,V>> {
    public final Map.Entry<K,V> next() { return nextNode(); }
}

6. Spliterator

Like the iterators, HashMap also provides four spliterators for parallel iteration: HashMapSpliterator, KeySpliterator, ValueSpliterator and EntrySpliterator. The latter three are subclasses of HashMapSpliterator.

5、 How HashMap computes the insertion index

HashMap is implemented on a hash table, so when adding elements and resizing, computing a key's array index through the hash algorithm is the foundation of the whole class's insertion logic.

public V put(K key,V value) {
    return putVal(hash(key),key,value,false,true);
}

1. Calculate hash value

Two methods are involved here. The first is hash(), which computes the hash value:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

There is a very detailed answer about this on Zhihu: "JDK 源码中 HashMap 的 hash 方法原理是什么?".

Here is a brief summary:

This method is in fact a "perturbation function": it mixes the high and low halves of the value returned by Object.hashCode().

We can see that after the unsigned right shift by 16 bits, the high 16 bits of the shifted value are 0 and its low 16 bits are the high 16 bits of the original hashCode.

XORing the original hashCode with this shifted value leaves the high 16 bits unchanged (x ^ 0 = x), while the low 16 bits now blend the characteristics of both the high and low halves, further increasing randomness.
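We can visualize this mixing with a small sketch (the sample string and the helper are ours, chosen for illustration):

public class HashMixDemo {
    // left-pad to 32 binary digits so the two halves line up
    static String bits(int x) {
        return String.format("%32s", Integer.toBinaryString(x)).replace(' ', '0');
    }

    public static void main(String[] args) {
        int h = "hello world".hashCode();
        System.out.println(bits(h));              // original hashCode
        System.out.println(bits(h >>> 16));       // high 16 bits shifted into the low half
        System.out.println(bits(h ^ (h >>> 16))); // high half unchanged, low half mixed
    }
}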

Now we have the perturbed hashCode of the key, which is the basis for computing the index.

2. Calculate subscript

Next we enter the putVal() method. In fact, every method that adds or replaces elements, including putAll(), relies on putVal(). putVal() takes the key's hash as a parameter and, from the hash and the key, computes the index at which the value is actually inserted.

Ignoring everything except the index computation:

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    ... ...
    // n is the current table length
    if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
    // compute the insertion index from n and hash
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    
    ... ...
}

In other words, after the key's hashCode has been mixed high and low by the perturbation function hash(), it is ANDed with (array length - 1): (n - 1) & hash.

Taking the default capacity of 16 as an example: 16 in binary is 10000, and 16 - 1 = 15 in binary is 1111. Zero-extended, this mask is ANDed with the hash computed by hash().
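The process can be reproduced with a small sketch (the hash value is ours, chosen for illustration):

public class IndexDemo {
    public static void main(String[] args) {
        int n = 16;                       // table length, a power of 2
        int hash = 0b1011_0101_1011_0101; // an arbitrary perturbed hash
        // the mask 1111 keeps only the low 4 bits: ...0101 -> 5
        int index = (n - 1) & hash;
        System.out.println(index);        // 5
    }
}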

In this process, two questions left open earlier are answered:

Why does capacity need to be a power of 2?

We can see that bitwise AND yields 1 only for 1 & 1. Since the table length converted to binary gives n - 1 only 4 significant bits, every higher bit of the mask is 0, so every bit of the result above the 4th is also 0. This cleverly achieves the effect of a modulo operation, with (length - 1) acting as a low-order bit mask. That is why HashMap's capacity must be a power of 2.

Why should hash() mix the high and low bits?

Looking back at the hash() function, it blends the high-order and low-order features of the original hashCode, and we said this increases randomness. How should we understand that here?

Let's take an example:

In fact, the masking step only looks at as many low bits of the hash as the table length has significant bits: if the table length has 4 significant bits, only 4 bits of the key's hash matter; if it had 18, then 18 bits would matter.

If the array is long enough, enough hash bits take part and the distribution is good. But when only a few bits matter, say 4, the high bits cannot be distinguished at all: hashCodes that differ only in their high bits all mask down to the same low 4 bits, for example 1111, and land in the same bucket. After the high bits are mixed into the low bits, such values become distinguishable.

6、 Adding elements to HashMap

From the index example above we know that put() relies on putVal(); in fact all element-adding methods, including putAll(), depend on putVal().

Since adding an element can change the whole structure, putVal() not only computes the index but also covers resizing and treeification of the bucket's linked list.

1. putVal

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    // the current table array
    Node<K,V>[] tab;
    // the first node of the target bucket
    Node<K,V> p;
    // current table length, and the index of the bucket to insert into
    int n, i;
    // if the table has not been initialized, do the first resize
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    
    // locate the bucket for the key
    if ((p = tab[i = (n - 1) & hash]) == null)
        // the bucket is empty: add the first node
        tab[i] = newNode(hash, key, value, null);
    else {
        // the bucket already contains elements
        Node<K,V> e; K k;
        // 1. does the first element have the same key?
        if (p.hash == hash && ((k = p.key) == key ||
                               (key != null && key.equals(k))))
            e = p;
        // 2. has the bucket's list already been converted to a red-black tree?
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // 3. otherwise traverse the list and append at the tail
        else {
            for (int binCount = 0; ; ++binCount) {
                // append the node at the tail
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // treeify if the threshold is reached
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // stop if a node with the same key is found
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        
        // the key already has a mapping
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            // overwrite the value if allowed
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            
            // empty hook method, used as a callback by LinkedHashMap
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // resize if needed
    if (++size > threshold)
        resize();
    // empty hook method, used as an insertion callback by LinkedHashMap
    afterNodeInsertion(evict);
    return null;
}

2. Treeifying the linked list

The process above includes a check for whether the bucket has already been turned into a red-black tree:

else if (p instanceof TreeNode)
    // the Node is already a TreeNode: add it to the red-black tree
    e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);

And the operation of converting the linked list into a red black tree:

if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
    treeifyBin(tab,hash);

putTreeVal() adds a node to the red-black tree, and treeifyBin() turns a linked list into a red-black tree. Let's look only at how HashMap converts the linked list:

final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    // is the table smaller than the minimum treeify capacity of 64?
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        // if so, just resize instead of treeifying
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        // otherwise, if the bucket is not empty
        TreeNode<K,V> hd = null, tl = null;
        // convert every Node in the bucket's list to a TreeNode
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        
        // if the bucket is not empty, turn the list into a red-black tree
        if ((tab[index] = hd) != null)
            hd.treeify(tab);
    }
}

The code above converts the list's Nodes into TreeNodes. Next, TreeNode.treeify() actually turns the list into a red-black tree:

final void treeify(Node<K,V>[] tab) {
    TreeNode<K,V> root = null;
    // traverse the list
    for (TreeNode<K,V> x = this, next; x != null; x = next) {
        // fetch the next node
        next = (TreeNode<K,V>)x.next;
        x.left = x.right = null;
        // the head of the list becomes the root
        if (root == null) {
            x.parent = null;
            x.red = false;
            root = x;
        } else {
            // not the head: insert as a child node
            K k = x.key;
            int h = x.hash;
            Class<?> kc = null;
            for (TreeNode<K,V> p = root;;) {
                int dir, ph;
                K pk = p.key;
                // smaller hash: go left
                if ((ph = p.hash) > h)
                    dir = -1;
                // larger hash: go right
                else if (ph < h)
                    dir = 1;
                // equal hashes: fall back to comparators
                else if ((kc == null &&
                          (kc = comparableClassFor(k)) == null) ||
                         (dir = compareComparables(kc, k, pk)) == 0)
                    dir = tieBreakOrder(k, pk);

                // attach as a child
                TreeNode<K,V> xp = p;
                if ((p = (dir <= 0) ? p.left : p.right) == null) {
                    x.parent = xp;
                    if (dir <= 0)
                        xp.left = x;
                    else
                        xp.right = x;
                    
                    // rebalance
                    root = balanceInsertion(root, x);
                    break;
                }
            }
        }
    }
    // make sure the root of the tree is the node stored at the head of the bucket
    moveRootToFront(tab, root);
}

That is the treeification of the linked list. Although the implementation is not trivial, the idea is simple.

Conditions for converting the linked list into a red-black tree

To make the conditions explicit: the bucket's list must reach 8 nodes after the insertion, and the table capacity must be at least 64.

When the capacity condition is not met, adding more elements simply triggers a resize: the rehashing during resizing relieves the hash conflict instead of treeifying.

3. Why the key can be null

Let's review the hash() method:

(key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);

We can see that the key == null case is handled here: a null key is treated as an element with hash 0. When putVal() adds an element, it checks:

if (p.hash == hash && ((k = p.key) == key ||
                       (key != null && key.equals(k))))

Besides comparing hash values, the reference is compared (==) and equals() is called. A null key always hashes to 0 and is matched by the reference check, so at most one null key can exist.
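A minimal sketch of this behavior (the class and variable names are ours):

import java.util.HashMap;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put(null, "first");
        map.put(null, "second");           // hashes to 0 again: the value is overwritten
        System.out.println(map.get(null)); // second
        System.out.println(map.size());    // 1
    }
}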

7、 Capacity expansion of HashMap

Now that we know how HashMap computes indexes and adds elements, it is time to understand how resize() works during insertion.

1. resize

resize() is HashMap's resizing method:

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    // current capacity
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    // previous resize threshold (set during the last resize)
    int oldThr = threshold;
    // new capacity and new resize threshold
    int newCap, newThr = 0;
    
    //====== 1. compute the target capacity ======
    
    // 1. current capacity greater than 0 (the table has been initialized)
    if (oldCap > 0) {
        // already at the theoretical maximum?
        if (oldCap >= MAXIMUM_CAPACITY) {
            // set the threshold to Integer.MAX_VALUE; no further resize will ever trigger
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // below the theoretical maximum, and:
        // (1) the doubled capacity is below the theoretical maximum
        // (2) the current capacity is at least the default initial capacity 16
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            // the new threshold is double the current one
            newThr = oldThr << 1;
    }
    // 2. threshold greater than 0 (initial state with a specified capacity: first resize)
    else if (oldThr > 0)
        // the new capacity is the threshold stored by the constructor
        newCap = oldThr;
    // 3. both capacity and threshold are 0 (initial state, nothing specified)
    else {
        // use the defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    
     //====== 2. resize to the target capacity ======
    
    // validate the new threshold against the load factor
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        // if capacity * load factor exceeds the theoretical maximum, use Integer.MAX_VALUE
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // traverse the bucket array
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            // if the bucket is not empty
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    // single node: recompute its index in the new table
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    // red-black tree: split it
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else {
                    // linked list
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    
                    // traverse the list
                    do {
                        next = e.next;
                        // does this node need to move after the resize?
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

That is one large block of code, but it really does just two things: determine the new capacity, then migrate to it.

2. Determining the new capacity

Before understanding how HashMap determines the new size, we need to understand when HashMap decides it must resize.

As we saw earlier, a value's index is obtained by ANDing the high-low-mixed hash of the key with (array length - 1). If the array is not long enough, in other words if the capacity is too small, the range of computed indexes is limited, which makes hash conflicts likely.

Therefore HashMap introduces the load factor loadFactor, defaulting to 0.75 when unspecified, and with it a resize threshold: threshold = capacity * load factor. Resizing is triggered when the element count reaches the threshold, not when it reaches the capacity.

With the defaults, capacity 16 and load factor 0.75, once the number of elements reaches 16 * 0.75 = 12, the capacity is doubled to 32 and the threshold is updated to 32 * 0.75 = 24, and so on.
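The progression is easy to simulate (illustrative arithmetic only, not HashMap internals):

public class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 16;        // default initial capacity
        float loadFactor = 0.75f; // default load factor
        for (int i = 0; i < 4; i++) {
            System.out.println("capacity=" + capacity
                    + ", threshold=" + (int) (capacity * loadFactor));
            capacity <<= 1;       // each resize doubles the capacity
        }
        // capacity=16, threshold=12
        // capacity=32, threshold=24
        // capacity=64, threshold=48
        // capacity=128, threshold=96
    }
}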

Size of the new capacity

There are two cases: the table has already been resized before, and the initial resize.

The logic for computing the new capacity newCap and the new threshold newThr follows the first half of the code above: an initialized table doubles both its capacity and its threshold; a table constructed with a specified capacity uses the stored threshold as its first capacity; and a table constructed with no arguments uses the defaults of 16 and 12.

Note that when oldCap has already reached the theoretical maximum, threshold is set to Integer.MAX_VALUE and the method returns immediately, so the migration below is not executed.

Also, once the threshold has been set to Integer.MAX_VALUE, already the largest int value, no further resize will ever be triggered.

3. The rehashing process

We know that when the bucket array is resized its length changes, and the index computed by ANDing the hash with (length - 1) will differ. In JDK 7, when HashMap migrated the old table's data, it rehashed every element directly to a new index, scrambling the arrangement of all elements. JDK 8 optimizes this so that only some elements move.

Looking back at the migration part of the code, there are two branches:

// does this node need to move after the resize?
if ((e.hash & oldCap) == 0) {
    //... ...
} else {
    if (hiTail == null)
        hiHead = e;
    else
        hiTail.next = e;
    hiTail = e;
}

// move the high list by oldCap positions
if (hiTail != null) {
    hiTail.next = null;
    newTab[j + oldCap] = hiHead;
}

As mentioned earlier, HashMap's index is obtained by ANDing the key's hash with (length - 1), that is, (n - 1) & hash. Here, whether a node must move is determined by checking whether hash & oldCap is 0, i.e. whether the hash has the bit that the widened mask newly exposes.

The idea is as follows:

If the capacity grows from 16 to 32, (n - 1) & hash keeps the low 4 bits before the resize and the low 5 bits after it. Since 01111 and 1111 select the same bits, a hash whose extra bit is 0 yields the same index under the new length, so the node need not move. Otherwise, the extra bit contributes 10000 in binary, which is 16 in decimal, exactly the old table length, so the node can simply be moved forward by the old capacity.

Taking oldCap = 16 and newCap = 32 as an example, it is not difficult to see that a node moves only when oldCap & hash != 0. Since the capacity is always a power of 2, every resize doubles it, and an index that changes differs from the old one by exactly that one extra bit, so the distance moved is always the old capacity. In other words, when oldCap & hash != 0, new index = old index + oldCap, which corresponds to the line newTab[j + oldCap] = hiHead;.
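We can simulate this decision for a few nodes with a small sketch (the hash values are ours, chosen for illustration):

public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        // three hashes sharing the same low 4 bits (0101 = 5)
        int[] hashes = {0b0_0101, 0b1_0101, 0b1111_0101};
        for (int hash : hashes) {
            int oldIndex = (oldCap - 1) & hash;
            int newIndex = (newCap - 1) & hash;
            boolean moves = (hash & oldCap) != 0; // the bit HashMap actually tests
            System.out.printf("hash=%8s old=%d new=%d moves=%b%n",
                    Integer.toBinaryString(hash), oldIndex, newIndex, moves);
        }
        // hash=     101 old=5 new=5  moves=false
        // hash=   10101 old=5 new=21 moves=true  (21 = 5 + 16)
        // hash=11110101 old=5 new=21 moves=true  (21 = 5 + 16)
    }
}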

The benefit is obvious: moving fewer elements reduces the cost of resizing. Meanwhile, elements that shared a bucket may be split apart during rehashing, so hash conflicts are relieved and elements are spread more evenly after the resize.

8、 Getting elements from HashMap

Just as put() delegates to putVal(), the get() method and the other lookup methods ultimately rely on getNode().

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key),key)) == null ? null : e.value;
}

1. getNode

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // if the table is non-empty and the target bucket is not null
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // if the first element's key matches, return the first element
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // if there are further elements
        if ((e = first.next) != null) {
            // if the bucket is a red-black tree, search the tree
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                // otherwise walk the linked list
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

2. Why must keys override equals() and hashCode() together?

First, note what happens with the default hashCode() and equals().

Reviewing the code above, both put() and get() contain statements like this:

// putVal
p.hash == hash && ((k = p.key) == key ||
                   (key != null && key.equals(k)));
// getNode
first.hash == hash && ((k = first.key) == key ||
                       (key != null && key.equals(k)));

Because hash conflicts and null keys are possible, comparing hash values alone is not enough. When we add or look up a key, the method identifies a unique key by three checks: the hash values must match, and then either the references must be identical (==) or equals() must return true.

Why override the equals() and hashCode() methods?

When we use the default machinery of HashMap, these three checks are enough to guarantee key uniqueness. But this brings a problem: when we use as keys the instances of a class that does not override Object.hashCode() or Object.equals(), those default methods compare memory addresses, so a value can only be retrieved with the very instance that was originally used as the key.

Let's take an example:

Suppose we have a student class

public class Student {
    String name;
    Integer age;

    public Student(String name,Integer age) {
        this.name = name;
        this.age = age;
    }
}

Now let's use the student instance as the key:

Map<Object,Object> map = new HashMap<>(2);
map.put(new Student("xx", 16), "a");
map.put(new Student("xx", 16), "a");
for(Map.Entry<Object,Object> entry : map.entrySet()){
    System.out.print(entry.getValue());
};
// aa

Two logically identical students end up as two different keys. Therefore, if we want to use our own objects as keys, most of the time we need to override both equals() and hashCode().

Why override both methods at the same time?

This is also easy to understand: identifying a key requires both the hashCode() and the equals() results, so overriding only one of them still does not make two equal objects the same key.

Once we override equals() and hashCode() for Student as below, running the example prints only a single a.

@Override
public int hashCode() {
    return this.name.hashCode() + age;
}

@Override
public boolean equals(Object obj) {
    // compare the same fields used in hashCode() so the equals/hashCode contract holds
    return obj instanceof Student &&
        this.name.equals(((Student) obj).name) &&
        this.age.equals(((Student) obj).age);
}

9、 Iteration of HashMap

Because a Map essentially represents a set of key-value mappings, and HashMap's data structure is array + linked list/tree, a HashMap cannot be iterated directly the way implementations of the Collection interface can.

In part 4 of this article we learned about HashMap's main inner classes, among them the three collection views KeySet, Values and EntrySet, plus the key-value pair view Entry. To iterate a HashMap, we iterate one of the three views and receive the results as keys, values or Entry objects.

It is worth mentioning that, like ArrayList, HashMap implements the fast-fail mechanism, so it is best not to perform structural modifications during iteration; a sketch of this follows.
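A minimal sketch of both the fast-fail behavior and the safe alternative (the class and variable names are ours):

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class FailFastDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        // structural modification during iteration triggers fast-fail
        try {
            for (String key : map.keySet()) {
                map.remove(key); // modCount changes -> ConcurrentModificationException
            }
        } catch (java.util.ConcurrentModificationException e) {
            System.out.println("fail-fast triggered");
        }

        // removing through the iterator itself is safe
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            it.next();
            it.remove(); // keeps expectedModCount in sync
        }
        System.out.println(map.size()); // 0
    }
}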

1. Iterator iteration

All collections can be iterated through an Iterator (a collection's enhanced for loop also compiles down to iterator calls).

Therefore, in HashMap all three views can be iterated with an iterator or an enhanced for loop. HashMap itself, however, although it has iterator classes internally, has no iterator() method, so it cannot be iterated directly; iteration must go through the three views.

Take the entryset view as an example:

Map<Object,Object> map = new HashMap<>();

// iterator
Iterator<Map.Entry<Object,Object>> iterator = map.entrySet().iterator();
while (iterator.hasNext()) {
    Map.Entry<Object,Object> entry = iterator.next();
    System.out.print(entry.getValue());
}

// enhanced for loop, equivalent to iterator-based iteration
for(Map.Entry<Object,Object> entry : map.entrySet()){
    System.out.print(entry.getValue());
};

// forEach iteration
map.entrySet().forEach(s -> {
    System.out.println(s.getValue());
});

The keySet view works the same way, and so does values, although values returns a Collection rather than a Set, so the code differs slightly.
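For example, continuing with the map from the snippet above:

// keySet: iterate the keys
for (Object key : map.keySet()) {
    System.out.print(key);
}

// values: iterate the values; a Collection, so no Set semantics
for (Object value : map.values()) {
    System.out.print(value);
}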

2. Where does the data in the view set come from

Although we obtain the views through entrySet(), values() and keySet() and iterate them successfully, looking back at the source we find that it returns what is effectively an empty shell: there is no code filling the views with data. They can nevertheless be traversed, because of their iterators:

final class KeyIterator extends HashIterator
    implements Iterator<K> {
    // return the key of the node produced by the iterator
    public final K next() { return nextNode().key; }
}

final class ValueIterator extends HashIterator
    implements Iterator<V> {
    // return the value of the node produced by the iterator
    public final V next() { return nextNode().value; }
}

final class EntryIterator extends HashIterator
    implements Iterator<Map.Entry<K,V>> {
    // return the node produced by the iterator
    public final Map.Entry<K,V> next() { return nextNode(); }
}

The nextNode() method comes from their parent class, HashIterator. It should be read together with the constructor:

// constructor
HashIterator() {
    expectedModCount = modCount;
    Node<K,V>[] t = table;
    current = next = null;
    index = 0;
    // advance to the first node of the first non-empty bucket
    if (t != null && size > 0) { // advance to first entry
        do {} while (index < t.length && (next = t[index++]) == null);
    }
}

// nextNode
final Node<K,V> nextNode() {
    Node<K,V>[] t;
    Node<K,V> e = next;
    if (modCount != expectedModCount)
        throw new ConcurrentModificationException();
    if (e == null)
        throw new NoSuchElementException();
    // starting from the node found by the constructor, walk the list headed by it;
    // when the list is exhausted, move on to the next non-empty bucket
    if ((next = (current = e).next) == null && (t = table) != null) {
        do {} while (index < t.length && (next = t[index++]) == null);
    }
    return e;
}

So HashIterator is the real iterator of HashMap. Starting from the first node of the first non-empty bucket, it visits every node in the entire bucket array. It is not used directly, however, but serves as the parent class of the three view iterators.

The three view iterators adapt HashIterator's nextNode() into their own next(), returning the Node itself, the node's key, or the node's value respectively. This is the principle behind view iteration.

Because each Node carries its key, value and hash, removal during iteration can be delegated directly to HashMap's own methods; that is how the iterators support structural operations.

3. Foreach iteration

HashMap overrides the forEach() method, and the three views each override their own forEach().

public void forEach(BiConsumer<? super K,? super V> action) {
    Node<K,V>[] tab;
    if (action == null)
        throw new NullPointerException();
    // if the map is not empty
    if (size > 0 && (tab = table) != null) {
        int mc = modCount;
        // traverse the buckets
        for (int i = 0; i < tab.length; ++i) {
            // traverse each bucket's list
            for (Node<K,V> e = tab[i]; e != null; e = e.next)
                action.accept(e.key, e.value);
        }
        if (modCount != mc)
            throw new ConcurrentModificationException();
    }
}

The logic of forEach() is similar to the iterators', but written more directly: traverse the bucket array, then traverse the list in each bucket. The views' forEach() implementations are essentially the same as HashMap's and are not repeated here.

10、 Summary

Resizing and treeification

The underlying structure of HashMap is array + linked list / red-black tree.

When no initial capacity or load factor is specified, the capacity defaults to 16, the load factor defaults to 0.75, and the resize threshold is capacity * load factor. When the number of elements in the container reaches the threshold, the capacity is doubled.

When the table capacity is at least 64 and a bucket's list reaches length 8 after an insertion, the list is converted to a red-black tree instead of triggering a resize first.

Hash algorithm

Obtaining an index in HashMap takes two steps: the key's hashCode is first mixed by the perturbation function hash(), and the result is then ANDed with (length - 1).

The length is a power of 2 so that (length - 1) can act as a low-order bit mask on the hash value, cleverly achieving the modulo effect.

Resize rehashing

When the capacity grows from 16 to 32, (n - 1) & hash keeps the low 4 bits before the resize and the low 5 bits after it. Since 01111 and 1111 select the same bits, a node whose hash has 0 in the extra bit keeps its index and does not move. Otherwise, the extra bit contributes 10000 in binary, 16 in decimal, the old table length, so the node is simply moved forward by the old capacity.

Iteration

HashMap has its own iterator, HashIterator, but no iterator() method, so it cannot be iterated directly with an enhanced for loop or an iterator; iteration goes through the iterators (or enhanced for loops) of the three views. The view iterators are themselves subclasses of HashIterator: the views are only empty shells, and their iteration ability comes from that parent class.

HashMap and its three views override forEach(), so forEach() iteration is also available.

HashMap also implements the fast-fail mechanism, so it is best not to perform structural modifications during iteration.

equals() and hashCode()

During get() and put(), HashMap identifies a unique key through Object.equals() and Object.hashCode(). Since the Object defaults compare memory addresses, self-defined objects are inconvenient as keys unless these methods are overridden. And because both methods take part in the uniqueness check, equals() and hashCode() must be overridden together, never just one of them.
