[JDK1.8] Java Xiaobai’s source code Learning Series: HashMap

2020-12-26 • Java

Java Xiaobai's source code Learning Series: HashMap

The Spring Festival was cancelled. I spent many days at home gnawing the source code of HashMap. I also found a lot of information, including jdk1 7, jdk1 8. Of course, this article is based on jdk1 8。 Sort out what you have learned and hope to have deeper insights when you look back.

Interpretation of official documents

Let's take a look at the official introduction of the epic prefect screen

Basic data structure

In fact, in jdk1 In 8, the bottom layer of HashMap stores data according to the structure of array + single linked list + red black tree. What exactly is it?

HashMap implements the map interface and maintains a set of key value pairs, so that we can immediately obtain their corresponding values according to the keys. In addition, HashMap uses special techniques to optimize its performance. Let's learn and summarize in this article.

Interpretation of basic source code

Basic member variable

Take another look at some constants defined in HashMap:

    //序列号
    private static final long serialVersionUID = 362498820763181265L;
    //默认的初始容量为16（必须为2的幂）
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
    //允许的最大容量2的30次幂
    static final int MAXIMUM_CAPACITY = 1 << 30;
    //没有指定负载因子时，默认为0.75f
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    //链表转化为红黑树的阈值
    static final int TREEIFY_THRESHOLD = 8;
    //红黑树退化为链表的阈值
    static final int UNTREEIFY_THRESHOLD = 6;
    //数组的容量大于64时，桶才有可能转化为树形结构
    static final int MIN_TREEIFY_CAPACITY = 64;

There are also some member variables:

    //存储的元素的数组，数组容量一定时2的幂次
    transient Node<K,V>[] table;    
    //存放具体元素的集
    transient Set<Map.Entry<K,V>> entrySet;
    //存放元素的个数
    transient int size;
    //每次更改结构的计数器
    transient int modCount;
    //阈值，还没有分配数组时，阈值为默认容量或指定容量，之后该值等于容量*负载因子
    int threshold;
    //负载因子
    final float loadFactor;

constructor

According to the source code, let's take a look at jdk1 8, how these are realized and why they should be considered in this way. Let's look at three constructors first (ignore the last one for now):

    //无参构造器
    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }
    //指定容量的构造器
    public HashMap(int initialCapacity) {
        this(initialCapacity,DEFAULT_LOAD_FACTOR);
    }
    //两参构造器
    public HashMap(int initialCapacity,float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }
    //传入映射集的构造器
    public HashMap(Map<? extends K,? extends V> m) {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        putMapEntries(m,false);
    }

These are the four constructors provided in HashMap, from which we can detect some clues.

Ingenious tablesizefor

Speaking of this, let's take a look at the ingenious tablesizefor. We can know from the annotation that this method returns a power greater than or equal to the minimum 2 of the incoming value (when 1 is passed in, it is 1). How is it implemented? Let's take a look at the specific source Code:

    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

To tell the truth, after I saw the concrete implementation of this method, I sighed, good math! I read many articles and videos about this part by substituting specific numbers, and made a summary through simple examples.

At this point, I know why the capacity of the array is always the power of 2: it is because of the operation regulations, but this is basically not the reason. There must be convenience reasons for choosing the power of 2, which we will talk about later.

So when will the array be initialized? If you turn your head around, you should know that it's time to store elements in it. Let's take a look at the method of storing elements in HashMap.

Put method

    //联系指定的键Key和值Value，如果在这之前map包含相同的key，返回旧key对应的value
    public V put(K key,V value) {
        return putVal(hash(key),key,value,false,true);
    }

Ingenious hash method

The hash method is called to hash the incoming key key, and the details are as follows:

    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

Let's focus on the implementation of hash function when the key is not null. Why is it designed like this? We'll summarize later:

JDK1. Putval method of 8

The following is a key method putval.

    final V putVal(int hash,K key,V value,boolean onlyIfAbsent,boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n,i;
        //如果数组未初始化或者长度为0，则调用resize()初始化数组
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        //根据hash值计算数组中的桶位，如果为null，则在该桶位上新建节点
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash,null);
        else {
            Node<K,V> e; K k;
            //hash值相同，落入同一个桶中，且key相同
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            //判断是否为树形节点
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this,tab,hash,value);
            else {
                //在节点后面插入新节点，桶中链表最多有8个节点，再加就变成了树
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash,null);
                        //超过阈值，转为树形
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab,hash);
                        break;
                    }
                    //判断后面节点是否存在key相同的情况
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    //e=p.next;p=e;这两步完成遍历
                    p = e;
                }
            }
            //如果存在相同key值相同，新值替换旧值
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        //容量大于阈值，resize();
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

Before understanding the resize method, let's define it as an important method for capacity expansion and re hashing. Let's first summarize the putval method:

JDK1. 8. Resize method

Then, it's finally the turn of the resize method. Let's take a look at the implementation part of the code first. Wow, it took me a lot of effort. If I still have an incorrect understanding, I hope to criticize and correct it in the comment area:

    final Node<K,V>[] resize() {
        //oldTab存储的是扩容前的数组
        Node<K,V>[] oldTab = table;
        //oldCap存储的是扩容前的数组容量
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        //oldThr存储的是扩容前的阈值
        int oldThr = threshold;
        //newCap新数组容量，newThr新数组阈值
        int newCap,newThr = 0;
        if (oldCap > 0) {
            if (oldCap >= MAXIMUM_CAPACITY) {
                //如果老数组容量比数组最大容量还大，阈值变为Integer的最大值，返回老数组
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            //新数组容量变为老数组容量的两倍
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                //新阈值变为两倍需要上面的条件都成立（1、扩容两倍之后的数组容量小于最大容量2、老容量大于等于16）
                newThr = oldThr << 1; // double threshold
        }
        
        else if (oldThr > 0) // initial capacity was placed in threshold
            //使用带有初始容量构造器，让新容量变为通过initial capacity求得的threshold
            newCap = oldThr;
        else {               // zero initial threshold signifies using defaults
            //使用默认构造器，初始化容量为16
            newCap = DEFAULT_INITIAL_CAPACITY;
            //新容量变为16，新阈值变为0.75*16 = 12
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        //使用带有初始容量的构造器进行扩容
        if (newThr == 0) {
            //新阈值 = 新容量 * 指定的负载因子
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        //将newThr赋值给threshold表示阈值
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        //数组如果进行初始化的步骤，不用进入下面的代码段
        //判断老数组是否为空
        if (oldTab != null) {
            for (int j = 0; j < oldCap; ++j) {
                //创建临时节点存储老数组oldTab上的元素
                Node<K,V> e;
                //如果老数组上索引j的位置不为null
                if ((e = oldTab[j]) != null) {
                    //将该位置置空
                    oldTab[j] = null;
                    //判断下一位是否还有元素
                    if (e.next == null)
                        //下一位为空，则表明该桶位只有一个元素，搬移至新数组
                        newTab[e.hash & (newCap - 1)] = e;
                    //判断是否为树形节点
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this,newTab,j,oldCap);
                    //下一位不为空且为链表节点
                    else { // preserve order
                        
                        Node<K,V> loHead = null,loTail = null;
                        Node<K,V> hiHead = null,hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            //在原来索引位置新建链表
                            if ((e.hash & oldCap) == 0) {
                                //尾节点为空时
                                if (loTail == null)
                                    //头节点指向原头节点，不再变化
                                    loHead = e;
                                else
                                    //在尾部接上老数组中的当前节点
                                    loTail.next = e;
                                //尾节点指向当前节点
                                loTail = e;
                            }
                            //在原来索引位置+老数组容量的位置新建链表
                            else {
                                //与上述相同
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                            //while循环保证从到到尾遍历链表
                        } while ((e = next) != null);
                        //如果尾节点不为空，就让它的next指向空，链表完整
                        if (loTail != null) {
                            loTail.next = null;
                            //新数组的原索引位置指向链表头节点
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            //新数组的原索引加老数组容量的索引位置指向链表头节点
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }

Initialization part

Let's talk about the initialization part of the array first:

Let's focus on the basic part of array moving:

The most difficult thing is how to move the array in case of hash collision? We can find that the source code classifies and judges whether the value of e.hash & oldcap is 0 or 1. Why do you do this?

First of all, we must make it clear that the difference between the same hash value before and after capacity expansion is only the bit intercepted. For example, 26 (0001 1010), when 16 is the capacity, its effective index position is 1010, while when 32 is the capacity, its effective index is 11010, which is just 10000, that is, oldcap, as shown in the following figure:

After understanding this, let's start the analysis of the code for node movement during hash collision! For a pair of nodes with the same function defined for different e.hash & oldcap, let's take them out separately and study lohead and lotail. The other pair is actually the same.

//do……while循环
do{
    next = e.next;
}while((e = next)!=null);

Finally, there are many aspects of this article that need to be improved or modified. After that, new experiences will be uploaded one after another. Please comment and correct in the comment area.

reference resources:

Some questions about the hash algorithm in HashMap the hash function jdk1 in HashMap 8 HashMap working principle and capacity expansion mechanism (source code analysis) understanding of the capacity expansion part of the resize () method of HashMap in Java 1.8

The content of this article comes from the network collection of netizens. It is used as a learning reference. The copyright belongs to the original author.

THE END

Java

二维码

Java Web (5) – servlet details (Part 2)

< <上一篇

Millions of AdGuardHomeRules-ADH ad filtering interception rules”

下一篇>>

搜索内容