HashMap, Hashtable, and ConcurrentHashMap: thread safety

1. HashMap

HashMap is thread unsafe.

The JDK 1.7 HashMap uses an array + linked list structure. Under concurrent use, resizing the array can create an infinite loop in the entry chain and lose data.

The JDK 1.8 HashMap uses an array + linked list + red-black tree structure. It reworks the 1.7 resize scheme and eliminates both the infinite loop and the resize-time data loss. However, under concurrency its put method can still silently overwrite another thread's insertion.

Why resizing in JDK 1.7 is not thread safe

    void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        for (Entry<K,V> e : table) {
            while (null != e) {
                Entry<K,V> next = e.next;              // save the successor before relinking
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                int i = indexFor(e.hash, newCapacity); // recompute the bucket index
                e.next = newTable[i];                  // head insertion into the new bucket
                newTable[i] = e;
                e = next;
            }
        }
    }

This is HashMap's resize migration: it recomputes each node's bucket index and moves the nodes into the new array by head insertion. Head insertion reverses the order of each linked list, and that reversal is the key to the infinite loop.
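The reversal is easy to see in isolation. Below is a minimal single-threaded sketch (not JDK source; the `Node` class is a simplification) that migrates the bucket 3 → 7 → 5 by head insertion, the same relinking steps as `transfer()` above:

```java
import java.util.ArrayList;
import java.util.List;

public class HeadInsertDemo {
    static class Node {
        final int key;
        Node next;
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    public static void main(String[] args) {
        // Old bucket: 3 -> 7 -> 5
        Node oldBucket = new Node(3, new Node(7, new Node(5, null)));

        // Migrate by head insertion, as JDK 1.7's transfer() does
        // (for simplicity every node maps to the same new bucket).
        Node newBucket = null;
        for (Node e = oldBucket; e != null; ) {
            Node next = e.next;   // save the successor before relinking
            e.next = newBucket;   // head insertion: e becomes the new head
            newBucket = e;
            e = next;
        }

        // Collect the migrated order.
        List<Integer> order = new ArrayList<>();
        for (Node e = newBucket; e != null; e = e.next) order.add(e.key);
        System.out.println(order); // [5, 7, 3] -- order reversed
    }
}
```

A second thread replaying the same relinking against an already-reversed list is exactly what produces the cycle analyzed below.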

Suppose two threads, A and B, resize the following HashMap at the same time:

The result after a normal (single-threaded) resize is as follows:

However, suppose thread A's time slice runs out just after it executes `e.next = newTable[i];` in the transfer function above (and before `newTable[i] = e;`), so thread A is suspended. As shown in the following figure:

At this point, in thread A: e = 3, next = 7, e.next = null.

With thread A's time slice exhausted, the CPU schedules thread B, which completes the entire migration successfully:

Here is the key point. By the Java memory model, after thread B finishes the migration, the newTable and table in main memory are up to date, that is: 7.next = 3 and 3.next = null.

Then thread A regains the CPU and continues with `newTable[i] = e`, placing 3 at its slot in the new array. After this iteration, thread A's state is as follows:

The loop continues with e = 7. Reading e.next from main memory, thread A finds 7.next = 3, so next = 3; it head-inserts 7 into the new array and continues. The result:

In the next iteration, next = e.next = null, so this will be the last iteration. When `e.next = newTable[i]` executes, it sets 3.next = 7, so 3 and 7 now point at each other; then `newTable[i] = e` head-inserts 3 back into the list. The execution result is shown in the following figure:

As noted above, e.next = null at this point, i.e. next = null, so after e = null the loop exits and both threads finish resizing. But thread A has left a cycle in the HashMap: any later operation that walks this bucket will loop forever.

The figure above also shows that element 5 simply disappeared during the resize, which is the data-loss problem.

Why put can overwrite data in JDK 1.8

JDK 1.8 fixes the JDK 1.7 problems well. If you read the 1.8 source, you will not find a transfer function: the migration is done directly inside resize. In addition, JDK 1.8 inserts elements by tail insertion.

So why can JDK 1.8 still overwrite data? Look at the put code in JDK 1.8:

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null) // no hash collision: insert directly
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

The line `if ((p = tab[i = (n - 1) & hash]) == null)` checks for a hash collision. Suppose threads A and B are both putting, and their keys hash to the same bucket index. Thread A passes this null check and is then suspended because its time slice runs out; thread B is scheduled, inserts its element at that index, and completes normally. When thread A resumes, it has already concluded the bucket is empty, so it does not re-check and inserts directly, overwriting the entry thread B just wrote. Hence put is not thread safe.

There is also the `++size` near the end of the method. Again take threads A and B, both performing a put on a HashMap whose current size is 10. Thread A reads size = 10 from main memory and is about to add 1 when its time slice runs out, so it yields the CPU. Thread B then gets the CPU, also reads size = 10 from main memory, adds 1, completes its put, and writes size = 11 back to main memory. Thread A then regains the CPU (its local size value is still 10), finishes its put, and also writes size = 11 back. Two puts were performed, but size grew by only 1 — another lost update caused by an unsynchronized read-modify-write, and another way the class is thread unsafe.
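The `++size` race is the classic non-atomic increment. The sketch below (a standalone demo, not HashMap source) has two threads increment both a plain `int` field, as `++size` does, and an `AtomicInteger` for contrast; only the atomic counter is guaranteed to reach the full total:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LostUpdateDemo {
    static int plainSize = 0;                             // like HashMap's size field
    static final AtomicInteger atomicSize = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                plainSize++;                  // read, +1, write: three steps, not atomic
                atomicSize.incrementAndGet(); // CAS loop: atomic
            }
        };
        Thread a = new Thread(task), b = new Thread(task);
        a.start(); b.start();
        a.join();  b.join();

        // plainSize is often below 20000 (lost updates); atomicSize is always 20000
        System.out.println("atomic: " + atomicSize.get());
    }
}
```

Run it a few times: the plain counter's deficit varies from run to run, which is exactly why such races are hard to catch in testing.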

2. Hashtable

Hashtable is thread safe.

Hashtable guarantees thread safety with synchronized methods, but under heavy contention it is very inefficient: while one thread is inside a synchronized method of the Hashtable, every other thread calling any of its synchronized methods blocks (or spins). For example, while thread 1 is adding an element with put, thread 2 can neither put another element nor get one. The fiercer the contention, the lower the throughput.

    public synchronized V get(Object key) {  // the whole method locks `this`
        Entry<?,?> tab[] = table;
        int hash = key.hashCode();
        int index = (hash & 0x7FFFFFFF) % tab.length;
        for (Entry<?,?> e = tab[index]; e != null; e = e.next) {
            if ((e.hash == hash) && e.key.equals(key)) {
                return (V)e.value;
            }
        }
        return null;
    }
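Because put is likewise synchronized, concurrent inserts never lose entries — the safety that HashMap lacks. A quick check (a demo sketch, not part of any source above): two threads insert disjoint key ranges, and the final size is exact:

```java
import java.util.Hashtable;

public class HashtableSafetyDemo {
    public static void main(String[] args) throws InterruptedException {
        Hashtable<Integer, Integer> table = new Hashtable<>();

        // Each thread inserts 10_000 distinct keys; put() is synchronized,
        // so the writes serialize and none are lost.
        Thread a = new Thread(() -> { for (int i = 0;      i < 10_000; i++) table.put(i, i); });
        Thread b = new Thread(() -> { for (int i = 10_000; i < 20_000; i++) table.put(i, i); });
        a.start(); b.start();
        a.join();  b.join();

        System.out.println(table.size()); // 20000 -- no lost entries
    }
}
```

The price is that the two threads never actually run their puts in parallel, which motivates the segmented design of ConcurrentHashMap below.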

3. ConcurrentHashMap

ConcurrentHashMap is thread safe.

The JDK 1.7 ConcurrentHashMap is implemented as a Segment array plus HashEntry arrays. A Segment is a reentrant lock and plays the role of the lock in ConcurrentHashMap; a HashEntry stores the key-value data. A ConcurrentHashMap contains an array of Segments, each Segment contains an array of HashEntry buckets, and each HashEntry is a node in a linked list.

Segmented locking partitions the data into sections and assigns each section its own lock. While one thread holds the lock to access one section, the other sections remain accessible to other threads, enabling genuinely concurrent access.
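The idea can be sketched in a few lines. The toy class below (hypothetical, not JDK source; names like `StripedMap` are made up) hashes each key to one of 16 locks so that writes to different segments proceed in parallel while writes within a segment serialize:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

public class StripedMap<K, V> {
    private static final int SEGMENTS = 16;
    private final ReentrantLock[] locks = new ReentrantLock[SEGMENTS];
    @SuppressWarnings("unchecked")
    private final Map<K, V>[] segments = new Map[SEGMENTS];

    public StripedMap() {
        for (int i = 0; i < SEGMENTS; i++) {
            locks[i] = new ReentrantLock();
            segments[i] = new HashMap<>();
        }
    }

    private int segmentFor(Object key) {
        // first hash: pick the segment (strip the sign bit like Hashtable does)
        return (key.hashCode() & 0x7FFFFFFF) % SEGMENTS;
    }

    public V put(K key, V value) {
        int s = segmentFor(key);
        locks[s].lock();          // only this segment is locked
        try {
            return segments[s].put(key, value);
        } finally {
            locks[s].unlock();
        }
    }

    public V get(K key) {
        int s = segmentFor(key);
        locks[s].lock();
        try {
            return segments[s].get(key);
        } finally {
            locks[s].unlock();
        }
    }

    public static void main(String[] args) {
        StripedMap<String, Integer> m = new StripedMap<>();
        m.put("a", 1);
        System.out.println(m.get("a")); // 1
    }
}
```

The real JDK 1.7 Segment is more sophisticated (it *is* the lock, extending ReentrantLock, and reads are mostly lock-free), but the locking granularity is the same.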

As the structure above suggests, locating an element in ConcurrentHashMap takes two hash steps: the first locates the Segment, the second locates the head of the linked list within that Segment.

The benefit of this structure is that a write locks only the Segment the element lives in; the other Segments are unaffected. In the ideal case, ConcurrentHashMap can support as many concurrent writers as there are Segments (provided the writes are evenly distributed across them).

The side effect of this structure is that the hashing process is longer than in an ordinary HashMap.

The JDK 1.8 ConcurrentHashMap is implemented as array + linked list + red-black tree, essentially the same shape as the 1.8 HashMap, but it relies heavily on volatile reads, final fields, and CAS operations (plus fine-grained locking of individual bins) to reduce the impact of lock contention on performance while still guaranteeing thread safety.
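In practice this means the lost-update scenarios from section 1 disappear when the update itself goes through the map's atomic operations. A short usage sketch: `merge` performs the read-modify-write atomically (CAS on an empty bin, a per-bin lock otherwise), so two threads counting concurrently never lose an increment:

```java
import java.util.concurrent.ConcurrentHashMap;

public class MergeCounterDemo {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();

        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                // atomic read-modify-write: no external synchronization needed
                counts.merge("hits", 1, Integer::sum);
            }
        };
        Thread a = new Thread(task), b = new Thread(task);
        a.start(); b.start();
        a.join();  b.join();

        System.out.println(counts.get("hits")); // 20000 -- no lost updates
    }
}
```

Contrast this with the HashMap `++size` example earlier: the atomicity lives in the map operation itself, not in a lock held by the caller.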

Appendix

The content of this article was collected from the web and is provided for learning and reference; copyright belongs to the original authors.