13000 word HashMap interview must ask for detailed explanation of knowledge points

2021-01-07 • Java

an introduction to

Inheritance of hasmap

Principle of HashMap

Ways to resolve hash conflicts

Open addressing

This method is also called re hashing, The basic idea is: when the hash address P = H (key) of the keyword key conflicts, another hash address P1 is generated based on P. if P1 still conflicts, another hash address P2,... Is generated based on P until a non conflicting hash address Pi is found and the corresponding elements are stored in it. This method has a general form of re hash function:

Hi=（H（key）+di）% m i=1，2，…，n

Among them, H (key) is the hash function, M is the table length, and Di is called the incremental sequence. The value of the incremental sequence is different, and the corresponding re hashing methods are also different. There are mainly three kinds of linear detection re hashing, secondary detection re hashing, and pseudo-random detection re hashing

double hashing

This method is to construct multiple different hash functions at the same time

Hi=RH1（key） i=1，2，…，k

When the hash address hi = Rh1 (key) conflicts, calculate hi = Rh2 (key)... Until the conflict no longer occurs. This method is not easy to generate aggregation, but increases the calculation time

Chain address method

The basic idea of this method is that all elements with hash address I form a single linked list called synonym chain, and the head pointer of the single linked list is stored in the ith cell of the hash table. Therefore, the search, insertion and deletion are mainly carried out in the synonym chain.

The chain address method is suitable for frequent insertion and deletion.

Establish public overflow area

The basic idea of this method is to divide the hash table into two parts: the basic table and the overflow table. All elements that conflict with the basic table are filled in the overflow table.

The final form of HashMap

A meal was as fierce as a tiger. The original simple HashMap became so complex that it baffled countless heroes. Due to the length process of the linked list, the query will slow down, so the linked list slowly evolved into a red black tree

The HashMap body is an array structure. Each index position is called a bin in English. Let's call it a bucket first. For example, if you define a HashMap with a length of 8, it can be said that it is an array composed of 8 buckets.

When we insert data into an array, most of the time we save an element of type node, which is a static internal class defined in HashMap

Return value of HashMap

Many people think that HashMap does not have a return value or pay attention to the return value of HashMap. In fact, when you call the put (key, value) method of HashMap, it will return the existing value of the current key, and then put your new value in the position of the corresponding key

public class JavaHashMap {
    public static void main(String[] args) {
        HashMap<String,String> map = new HashMap<String,String>();
        String oldValue = map.put("java大数据","数据仓库");
        System.out.println(oldValue);
        oldValue = map.put("java大数据","实时数仓");
        System.out.println(oldValue);
    }
}

The running results are as follows. Since there is no value at the beginning, null is returned, followed by a value. When put, the old value is returned

There is a problem to note here. Because the key and value types of map are reference types, null must be returned when there is no value, rather than initial values such as 0.

Key internal elements of HashMap

Storage container table;

Because HashMap uses an array to store content, its definition is as follows

transient Node
[] table

If the hash bucket array is large, even the poor hash algorithm will be scattered. If the hash bucket array is small, even the good hash algorithm will have more collisions. Therefore, it is necessary to balance the space cost and time cost. In fact, it is to determine the size of the hash bucket array according to the actual situation, and design a good hash algorithm on this basis to reduce hash collisions. So how to control the map so that the probability of hash collision is small and the hash bucket array (node [] table) occupies less space? The answer is a good hash algorithm and capacity expansion mechanism.

In HashMap, the length and size of the hash bucket array table must be the nth power of 2 (must be a composite number). This is an unconventional design. The conventional design is to design the bucket size as a prime number. Relatively speaking, the probability of conflict caused by prime numbers is less than that of composite numbers

Number of size elements

The size field is actually easy to understand, which is the actual number of key value pairs in HashMap. Note the difference between the table length and the maximum number of key value pairs held threshold

Node

 static class Node<K,V> implements Map.Entry<K,V> {
     final int hash;
     final K key;
     V value;
     Node<K,V> next;

     Node(int hash,K key,V value,Node<K,V> next) {
         this.hash = hash;
         this.key = key;
         this.value = value;
         this.next = next;
     }
}

TreeNode

When the bucket linked list reaches 8, it will be converted into a red black tree, that is, the treenode type, which is also a static internal class defined in HashMap.

static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  // red-black tree links
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev;    // needed to unlink next upon deletion
    boolean red;
    TreeNode(int hash,V val,V> next) {
        super(hash,val,next);
}

Speaking of treenode, we have to say the other three related parameters treeify_ Threshold = 8 and untreeify_ Threshold = 6 and min_ TREEIFY_ CAPACITY=64

TREEIFY_ Threshold = 8 refers to treeing when the length of the linked list is greater than 8_ Threshold = 6 means that when the length of the deleted linked list is less than 6, the element is degraded, and the red black tree is degraded into a linked list

MIN_ TREEIFY_ Capability = 64 means that the number of elements in the array must be greater than or equal to 64 before it can be trealized

The modcount field is mainly used to record the number of changes in the internal structure of HashMap, and is mainly used for rapid failure of iteration. It should be emphasized that changes in internal structure refer to changes in structure, such as putting new key value pairs, but overwriting the value value corresponding to a key does not belong to structural changes.

Threshold

It is added by multiplying the factor by the size of the initial value. Subsequent capacity expansion is the same as the size of the array, and the capacity is expanded twice

threshold = (DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY)

Actual number of storage elements size

The default size of size is 0. It refers to the number of elements stored in the array, not the number of elements in the entire HashMap. For the following figure, it is 3 instead of 11

transient int size;

The process of inserting elements into the debug source code

public class JavaHashMap {
    public static void main(String[] args) {
        HashMap<String,"数据仓库");
    }
}

Call the put() method

There's nothing to say about this method. It's a simple method provided by HashMap for users to call

Call putval()

The put method actually calls the real putval () method

It can be seen that before entering the putval () method, the hash value of the key needs to be calculated with the help of the hash method, and then the hash value of the key and the key are passed in at the same time

Call the hash () method

Enter putval ()

After entering the putval method, the overall data flow is as follows, and each step will be described in detail below

final V putVal(int hash,boolean onlyIfAbsent,boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n,i;
    // 判断是否需要初始化数组
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
    		// 当前位置为空，则直接插入，同时意味着不走else 最后直接返回null
        tab[i] = newNode(hash,null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this,tab,hash,value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash,null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab,hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    // 可以看出只有当前key 的位置为空的时候才判断时候需要reszie 已经返回 null 其他情况下都走了else 的环节
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

Determine whether the array is empty and whether the resize method needs to be called

For the first call, the table here is null, so the resize method will be used

The resize method itself is also complex, because this is the first call, so it is simplified here

    final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap,newThr = 0;
        if (oldCap > 0) {
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else {               
        		//  首次初始化 zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if (oldTab != null) {
           // 因为 oldTab 为null 所以不会进来这个if 判断，所以将这里的代码省略了
        }
        return newTab;
    }

Table is empty. Initialization for the first time

If so, initialize the array size and threashold

newCap = DEFAULT_INITIAL_CAPACITY;
newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);

After initialization, the newly created array is returned, and the assignment of the variable table is completed before returning

Table is not empty. It is not initialized for the first time

If not, initialize the size of the new array with the information of the current array

Finally, complete the initialization of the table and return to the table. In fact, there is data migration here. However, in order to ensure the structure of the article, the detailed explanation of the resize method is proposed separately

table = newTab;

Determine whether there are elements in the current position

1 is not directly placed in the current position

2. Record the current node as P

The current node is marked as P, and then enters the else loop

else {
    Node<K,V> e; K k;
    if (p.hash == hash &&
        ((k = p.key) == key || (key != null && key.equals(k))))
        e = p;
    else if (p instanceof TreeNode)
        e = ((TreeNode<K,value);
    else {
        for (int binCount = 0; ; ++binCount) {
            if ((e = p.next) == null) {
                p.next = newNode(hash,null);
                if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                    treeifyBin(tab,hash);
                break;
            }
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                break;
            p = e;
        }
    }
    if (e != null) { // existing mapping for key
        V oldValue = e.value;
        if (!onlyIfAbsent || oldValue == null)
            e.value = value;
        afterNodeAccess(e);
        return oldValue;
    }
 }

Judge whether the new key and the old key are the same. Here, when the hash value and the actual value are equal, the assignment of E = P is directly completed. In fact, the replacement is completed because the key is the same.

If it is not the same key, the current element will be inserted into the linked list or red black tree, because it is a different key

If the current element is a treenode, put the current element into the red black tree, and then

if (e != null) { // existing mapping for key
    V oldValue = e.value;
    if (!onlyIfAbsent || oldValue == null)
        e.value = value;
    afterNodeAccess(e);
    return oldValue;
}

At this time, e points to the original position of P, because e = P, then overwrites the oldvalue with the new value, completes the insertion, and finally returns the oldvalue.

Finally, judge whether to expand the capacity and return null value

In fact, this step means that there is no element at the corresponding position of the key when putting in the element, so it is equivalent to adding a new element to the array. Therefore, it is necessary to judge whether to resize and return a null value.

 ++modCount;
 if (++size > threshold)
     resize();
 afterNodeInsertion(evict);
 return null;

Explain the resize method separately

First, remember that the resize method will return the expanded array

The first part initializes the new array

This part will exist whether the resize method is called for the first time or not, but the data migration part does not exist when it is called for the first time

Node<K,V>[] oldTab = table;
int oldCap = (oldTab == null) ? 0 : oldTab.length;
int oldThr = threshold;
int newCap,newThr = 0;
// 判断是oldCap 是否大于0 因为可能是首次resize，如果不是的话 oldCap
if (oldCap > 0) {
	 // 到达扩容上限
    if (oldCap >= MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return oldTab;
    }
    // 这里是正常的扩容
    else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
             oldCap >= DEFAULT_INITIAL_CAPACITY)
        newThr = oldThr << 1; // double threshold
}
else if (oldThr > 0) // initial capacity was placed in threshold
    newCap = oldThr;
//第一次调用resize 方法，然后使用默认值进行初始化
else {
		// zero initial threshold signifies using defaults
    newCap = DEFAULT_INITIAL_CAPACITY;
    newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
}

if (newThr == 0) {
    float ft = (float)newCap * loadFactor;
    newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
              (int)ft : Integer.MAX_VALUE);
}
// 创建新的数组，下面
threshold = newThr;
@SuppressWarnings({"rawtypes","unchecked"})
Node<K,V>[])new Node[newCap];
table = newTab;

Part II data migration

Oldtab is migrated when the old array is not empty

 if (oldTab != null) {
 			// 遍历oldTable，拿到每一个元素准备放入大新的数组中去
      for (int j = 0; j < oldCap; ++j) {
          Node<K,V> e;
          if ((e = oldTab[j]) != null) {
              oldTab[j] = null;
              // 当前元素只是单个元素，不是链表
              if (e.next == null)
                  // 重新计算每个元素在数组中的位置
                  newTab[e.hash & (newCap - 1)] = e;
              // 判断当前元素是否是树   
              else if (e instanceof TreeNode)
                  ((TreeNode<K,V>)e).split(this,newTab,j,oldCap);
              // 当前元素是链表，则遍历链表    
              else { // preserve order
                  Node<K,V> loHead = null,loTail = null;
                  Node<K,V> hiHead = null,hiTail = null;
                  Node<K,V> next;
                  do {
                      next = e.next;
                      if ((e.hash & oldCap) == 0) {
                          if (loTail == null)
                              loHead = e;
                          else
                              loTail.next = e;
                          loTail = e;
                      }
                      else {
                          if (hiTail == null)
                              hiHead = e;
                          else
                              hiTail.next = e;
                          hiTail = e;
                      }
                  } while ((e = next) != null);
                  if (loTail != null) {
                      loTail.next = null;
                      newTab[j] = loHead;
                  }
                  if (hiTail != null) {
                      hiTail.next = null;
                      newTab[j + oldCap] = hiHead;
                  }
              }
          }
      }
  }

The third part returns the new array

 return newTab;

As long as the capacity expansion limit is not reached, this part will certainly go. As for whether to go or not, pandan needs to resize () for the first time

Explain the treeifybin method separately

 for (int binCount = 0; ; ++binCount) {
     if ((e = p.next) == null) {
         p.next = newNode(hash,null);
         if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
             treeifyBin(tab,hash);
         break;
     }
     if (e.hash == hash &&
         ((k = e.key) == key || (key != null && key.equals(k))))
         break;
     p = e;
 }

final void treeifyBin(Node<K,V>[] tab,int hash) {
    int n,index; Node<K,V> e;
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        TreeNode<K,V> hd = null,tl = null;
        do {
            TreeNode<K,V> p = replacementTreeNode(e,null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        if ((tab[index] = hd) != null)
            hd.treeify(tab);
    }
}

The process of getting elements

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key),key)) == null ? null : e.value;
}

/**
 * Implements Map.get and related methods.
 *
 * @param hash hash for key
 * @param key the key
 * @return the node,or null if none
 */
final Node<K,V> getNode(int hash,Object key) {
    Node<K,V> first,e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash,key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

The content of this article comes from the network collection of netizens. It is used as a learning reference. The copyright belongs to the original author.

THE END

Java

二维码

Front end Foundation (1), HTML

< <上一篇

java. lang.IllegalAccessError: tried to access field org. slf4j. impl. StaticLoggerBinder. SINGLETON from class org. slf4j. LoggerFactory

下一篇>>

搜索内容