HashMap source code analysis

2021-03-12 • Java

HashMap source code analysis

1. Structure

1. Succession

this class inherits from abstractmap, which is similar to ArrayList

2. Realization

The details are as follows:

3. Main fields

1. Attribute field

   // 默认大小 16
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; 
    // 最大容量 2^30
    static final int MAXIMUM_CAPACITY = 1 << 30;
    // 负载因子
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    // 链表转成树，链表的长度 8
    static final int TREEIFY_THRESHOLD = 8;
    // 树转成链表时树大节点数 6
    static final int UNTREEIFY_THRESHOLD = 6;
    // Node 数组
    transient Node<K,V>[] table;
    // 大小
    transient int size;
    // 阈值 他等于容量乘以负载因子
    int threshold;

2. Node

this is actually in jdk1 We often refer to entry in 7, but in Java 8, entry is further abstracted and put into the map interface, which is the internal interface. There are no fields defined, only some public methods.

then, this node implements the entry interface, which defines four attributes, which are the key of HashMap, namely hash, key, value and next. Let's look at the code.

static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;
    }

3. TreeNode

treenode obviously, we mentioned the operation of converting a linked list into a tree in the above attribute field, so we need to wrap the node node into a treenode. An interesting thing here is that the treenode inherits from the LinkedHashMap entry, but it also inherits from the HashMap node, and the entry adds attributes before and after to the node. It's a little windy. To put it simply, treenode adds before, after and other red black tree information to the node. Let's take a specific look at the structure.

static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
        TreeNode<K,V> parent;  // red-black tree links
        TreeNode<K,V> left;
        TreeNode<K,V> right;
        TreeNode<K,V> prev;    // needed to unlink next upon deletion
        boolean red;
    	//before after inhert from Entry
}

4. Overview of main methods

2. Analysis of main methods

1. cotr-4

first, let's introduce the construction methods. Here we will see four construction methods. They do almost the same things in them, mainly setting the capacity and threshold of the container. We can see some constants in the above fields. Among them, the initial size is 16, and then the load factor is 0.75. The maximum capacity is 2 ^ 32.

there is an interesting method when setting the array size. The tablesizefor (int size) method can ensure that the last returned value is the smallest 2 ^ n larger than the size. This may be a little difficult to understand. Let's take an example. If we pass in an 18, then the return is 32, because 32 is a number such as the n-th power of 2, and then it is the n-th power of 2 closest to 18.

then you may find out why the node array is not initialized, because in jdk1 In 8, the implementation of HashMap is delayed allocation, that is, the node array will be created only when we really put the value in it. We will see this mechanism when we analyze the put method.

// 设置负载因子和初始大小
    public HashMap(int initialCapacity,float loadFactor) {
        // 参数判断
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        // 这个方法就是把一个不是 2 的次幂的数转换成一个大于当前数的最小的 2 的次幂的数
        this.threshold = tableSizeFor(initialCapacity);
    }

    // 大小设置一下，负载因子默认
    public HashMap(int initialCapacity) {
        this(initialCapacity,DEFAULT_LOAD_FACTOR);
    }

    // 设置负载因子为默认的 0.75
    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }

    // putMapEntries 这个方法就是 new 新数组然后 foreach 拷贝
    public HashMap(Map<? extends K,? extends V> m) {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        putMapEntries(m,false);
    }

2. put/putVal

The put method is today's highlight, because most of the code is actually concentrated here. The put method directly calls putval, which performs the real operation of storing elements.

I'll talk about the logic of putval first, and then look at the code, so it won't be so troublesome.

The above is the whole process of putval. There is one capacity expansion operation not mentioned. This method will be discussed separately later. Let's take a look at the source code of this method.

	public V put(K key,V value) {
        return putVal(hash(key),key,value,false,true);
    }


	// 第三个参数 onlyIfAbsent 如果是 true，那么只有在不存在该 key 时才会进行 put 操作
	// 第四个参数 evict 我们这里不关心
    final V putVal(int hash,K key,V value,boolean onlyIfAbsent,boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n,i;
        // 第一次 put 值的时候，会触发下面的 resize()。这是因为我们调用了默认的无参的构造方法导致的，无参的里面只设置了负载因子
        // 第一次 resize 和后续的扩容有些不一样，因为这次是数组从 null 初始化到默认的 16 或自定义的初始容量
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // 找到具体的数组下标，如果此位置没有值，那么直接初始化一下 Node 并放置在这个位置就可以了
        // 这个地方采用的 (n - 1) & hash 来寻找数组的下标，他和 hash%n 的效果一样的 但是仅仅限制在 n 是 2 的次幂
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash,null);//newNode 方法不用看  就直接 new

        else {// 数组该位置有数据
            Node<K,V> e; K k;
            // 首先，判断该位置的第一个数据和我们要插入的数据，key 是不是"相等"，如果是，把这个节点放到 e 里面
            // 最后还有一个判断 e 是不是 null 然后对这个节点的 value 进行替换也就是说，如果 key 一样的话直接替换 vaule key 不一样说明是碰撞
            if (p.hash == hash &&
                    ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // 下面两种情况都是碰撞处理
            // 如果该节点是代表红黑树的节点，调用红黑树的插值方法，本文不展开说红黑树
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this,tab,hash,value);
            // 数组该位置上是一个链表
            else {
                // 循环找链表的最后一个节点
                for (int binCount = 0; ; ++binCount) {
                    // 找到尾部就插入尾部 (Java7 是插入到链表的最前面)
                    if ((e = p.next) == null) {
                        p.next = newNode(hash,null);
                        // TREEIFY_THRESHOLD 为 8，所以，如果新插入的值是链表中的第 9 个，会触发下面的 treeifyBin，也就是将链表转换为红黑树
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab,hash);
                        break;
                    }
                    // 不可能发生，所以直接 break 了 ，这个条件在前面就过滤掉了，也就是 key 相同的情况应该进行 value 的替换
                    if (e.hash == hash &&
                            ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            // e!=null 说明存在旧值的key与要插入的key"相等"，不是碰撞情况而是一致的 key 需替换返回老值
            if (e != null) {
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);  //这个操作是空操作  模板设计模式   是给 LinkedHashMap 使用的
                return oldValue;
            }
        }
        ++modCount;
        // 如果 HashMap 由于新插入这个值导致 size 已经超过了阈值，需要进行扩容
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict); // 同 afterNodeAccess
        return null;
    }

3. resize

The capacity expansion method is also a little complex. The method body is a little long. It doesn't matter. Let's analyze it slowly. First understand the ideas and look at the source code.

therefore, if the process is clear, it depends on the source code, and the comments on the source code are relatively clear.

final Node<K,V>[] resize() {
        //前面这一大堆都是关于计算扩容的操作   不管他
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap,newThr = 0;
        // 对应数组扩容
        if (oldCap > 0) {
            //到极限了不扩容 修改阈值防止下一次又进入扩容操作
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            // 将数组大小扩大一倍 将阈值扩大一倍
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY && oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        // 对应使用两个有参的构造方法初始化后，第一次 put 的时候  也就是说 HashMap 在初始化的时候没有分配数组，空间是延时分配的
        else if (oldThr > 0)
            newCap = oldThr;
        // 对应使用 new HashMap() 初始化后，第一次 put 的时候
        else {
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }

        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                    (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;


        // 用新的数组大小初始化新的数组
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        // 如果是初始化数组，到这里就结束了，返回 newTab 即可  接下来的操作就是数据迁移
        table = newTab;


        if (oldTab != null) {
            // 开始遍历原数组，进行数据迁移。
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    // 如果该数组位置上只有单个元素，那就简单了，简单迁移这个元素就可以了
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    // 如果是红黑树，就进行分裂操作
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this,newTab,j,oldCap);
                    // 链表的话就要把这些数据迁移到对应的位置 ，注意不是直接把整个链表放到数组的新位置  而是拆成两个链表
                    else {
                        // 这块是处理链表的情况，需要将此链表拆成两个链表，放到新的数组中，并且保留原来的先后顺序
                        // loHead、loTail 对应一条链表，hiHead、hiTail 对应另一条链表，代码还是比较简单的
                        Node<K,V> loHead = null,loTail = null;
                        Node<K,V> hiHead = null,hiTail = null;
                        Node<K,V> next;

                        //一条链表拆成两条
                        do {
                            next = e.next;
                            //这里就是用了一个条件拆分成了两条链表
                            //他代码这样写的原因在于：oldCap 是一个2的次幂，那么也就是说 他是形如 "100000000...000" 这个格式的
                            //那么任何一个数和他相与必然只有两种结果 0 / 非0 就看最高位，其他位出来肯定是0  这样就区分了两个链表  巧妙！
                            if ((e.hash & oldCap) == 0) {
                                //这里面的逻辑其实就是链表按照原来的顺序连接  也就是说原来 a 在 c 前面只要 ac 在同一条链表上 a 就在 c 前面
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                // 同上
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);

                        //用来把两条链表分别挂在正确的数组位置
                        if (loTail != null) {
                            loTail.next = null;
                            // 第一条链表新位置就是原来的位置
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            // 第二条链表的新的位置是 j + oldCap，这个很好理解
                            newTab[j + oldCap] = hiHead;
                        }

                    }
                }
            }
        }
        return newTab;
    }

4. putAll/putMapEntries

after analyzing the difficult putval and resize methods, the next methods are very easy.

this putall method calls putmapentries, and this method is also called in the constructor. Its concrete is the foreach copy element.

5. get/getNode/containsKey

The bottom layer of these methods calls the getNode method. Its principle is to judge whether the first element is a red black tree or a single linked list, and then traverse to get the result.

public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key),key)) == null ? null : e.value;
}


final Node<K,V> getNode(int hash,Object key) {
        Node<K,V> first,e; int n; K k;
        if ((tab = table) != null && (n = tab.length) > 0 &&
                (first = tab[(n - 1) & hash]) != null) {
            // 判断第一个节点是不是就是需要的
            if (first.hash == hash && // always check first node
                    ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) {
                // 判断是否是红黑树
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash,key);

                // 链表遍历
                do {
                    if (e.hash == hash &&
                            ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
}

5. remove/removeNode/clear

These methods are also very simple. Remove is the underlying dependency. Removinode is to traverse to find the corresponding node, and then delete it.

The clear method, like the ArrayList described earlier, is to set the array element to null and let it go to GC

	public V remove(Object key) {
        Node<K,V> e;
        return (e = removeNode(hash(key),null,true)) == null ?
            null : e.value;
    }

    final Node<K,V> removeNode(int hash,Object key,Object value,boolean matchValue,boolean movable) {
        Node<K,index;
        //首先 hash 对应的位置是有东西的 否则直接返回 null
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (p = tab[index = (n - 1) & hash]) != null) {
            Node<K,V> node = null,e; K k; V v;
            //这个 if else 是用来寻找那个要删除的节点的
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                node = p;
            else if ((e = p.next) != null) {
                if (p instanceof TreeNode)
                    node = ((TreeNode<K,V>)p).getTreeNode(hash,key);
                else {
                    do {
                        if (e.hash == hash &&
                            ((k = e.key) == key ||
                             (key != null && key.equals(k)))) {
                            node = e;
                            break;
                        }
                        p = e;
                    } while ((e = e.next) != null);
                }
            }
            //这个 if 是用来删除上面找到的那个节点
            if (node != null && (!matchValue || (v = node.value) == value ||
                                 (value != null && value.equals(v)))) {
                if (node instanceof TreeNode)
                    ((TreeNode<K,V>)node).removeTreeNode(this,movable);
                else if (node == p)
                    tab[index] = node.next;
                else
                    p.next = node.next;
                ++modCount;
                --size;
                afterNodeRemoval(node);
                return node;
            }
        }
        return null;
    }

	public void clear() {
        Node<K,V>[] tab;
        modCount++;
        if ((tab = table) != null && size > 0) {
            size = 0;
            for (int i = 0; i < tab.length; ++i)
                tab[i] = null;
        }
    }

6. containsValue

This method still adopts the traversal method. It does not distinguish between the tree and the linked list. It uniformly adopts the next pointer because the key is used as the index condition of the red black tree, but the value is not, and there is a next in the treenode because it indirectly inherits the node.

public boolean containsValue(Object value) {
        Node<K,V>[] tab; V v;
        if ((tab = table) != null && size > 0) {
            for (int i = 0; i < tab.length; ++i) {
                for (Node<K,V> e = tab[i]; e != null; e = e.next) {
                    if ((v = e.value) == value ||
                        (value != null && value.equals(v)))
                        return true;
                }
            }
        }
        return false;
    }

7. read/writeObject

Finally, there is the problem of serialization. The node array does not adopt the default serialization. You can see that it adds the transient keyword. The manual serialization here only serializes the key value, and none of the others are stored. The reason is to save space.

private void writeObject(java.io.ObjectOutputStream s)
        throws IOException {
        int buckets = capacity();
        // Write out the threshold,loadfactor,and any hidden stuff
        s.defaultWriteObject();
        s.writeInt(buckets);
        s.writeInt(size);
        internalWriteEntries(s);
    }

 void internalWriteEntries(java.io.ObjectOutputStream s) throws IOException {
        Node<K,V>[] tab;
        if (size > 0 && (tab = table) != null) {
            for (int i = 0; i < tab.length; ++i) {
                for (Node<K,V> e = tab[i]; e != null; e = e.next) {
                    s.writeObject(e.key);
                    s.writeObject(e.value);
                }
            }
        }
    }

3. Hashtable

just like ArrayList and vector, we need to discuss the similarities and differences between hashtable and HashMap.

as for why &0x7ffffffffff is mentioned here, the key is that the hashcode of an object can be a negative number. After operation, it can be guaranteed to be a positive integer. 0x7fffffff is 0111 1111 1111 1111 1111 1111 1111 (hash & 0x7fffffffff) will get a positive integer. Because the hash is used as the index of the array, this can avoid exceptions when the subscript is negative.

The content of this article comes from the network collection of netizens. It is used as a learning reference. The copyright belongs to the original author.

THE END

Java

二维码

Java container

< <上一篇

CDN 2021 full attack Guide (2)

下一篇>>

搜索内容

HashMap source code analysis

HashMap source code analysis

1. Structure

1. Succession

2. Realization

3. Main fields

1. Attribute field

2. Node

3. TreeNode

4. Overview of main methods

2. Analysis of main methods

1. cotr-4

2. put/putVal

3. resize

4. putAll/putMapEntries

5. get/getNode/containsKey

5. remove/removeNode/clear

6. containsValue

7. read/writeObject

3. Hashtable

热门文章