Is the size() method of ConcurrentHashMap thread-safe?

Preface

During an interview I was asked: is the size() method of ConcurrentHashMap thread-safe? I didn't answer this question well, so this time let's walk through the actual implementation in the source code.

Principle and structure of ConcurrentHashMap

We all know that a hash table is an array of linked lists: each element of the array, often vividly called a "bucket", holds the head of a linked list. When inserting an element, the key is first run through a hash function to determine which bucket's list it should be stored in. This data structure appears in many languages; in Java, HashMap and ConcurrentHashMap are both built on it.
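As a minimal illustration of the bucket selection step (not JDK source): when the table capacity is a power of two, the bucket index can be computed with a bit mask instead of a modulo, which is exactly the `(tab.length - 1) & hash` pattern seen in the put() and get() code later in this article.

```java
public class BucketIndexDemo {
    // For a power-of-two capacity, (capacity - 1) & hash selects the
    // low bits of the hash, equivalent to hash % capacity for
    // non-negative hashes but cheaper than a division.
    static int bucketIndex(int hash, int capacity) {
        return (capacity - 1) & hash;
    }

    public static void main(String[] args) {
        // Hashes 1, 17, 33 all land in bucket 1 of a 16-bucket table,
        // i.e. they collide and end up in the same linked list.
        System.out.println(bucketIndex(1, 16));
        System.out.println(bucketIndex(17, 16));
        System.out.println(bucketIndex(33, 16));
    }
}
```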

However, HashMap is not thread-safe. When a HashMap is resized, the original linked lists are migrated to a new bucket array, and with multiple threads this migration could create a cycle in a linked list (due to head insertion, in JDK 1.7 and earlier); concurrent writes can also silently overwrite entries in a list, losing data.

Therefore there are thread-safe hash table collections, typically Hashtable and ConcurrentHashMap. Hashtable achieves thread safety at a high cost: synchronized is added to every method that might race, so under contention only one thread at a time can operate on the whole Hashtable while all other threads block until the lock holder finishes. This is very inefficient.

ConcurrentHashMap solves thread safety differently: it avoids locking the whole map, which greatly improves concurrency. Next we will look at the JDK 1.7 and JDK 1.8 implementations.

ConcurrentHashMap in JDK 1.7

The ConcurrentHashMap in JDK 1.7 uses segment locking. Each segment is an instance of the Segment class, whose internal structure is similar to a HashMap: it holds a HashEntry array whose elements are linked lists. The Segment class also extends ReentrantLock. Its put() implementation looks like this:

static final int MAX_SCAN_RETRIES =
        Runtime.getRuntime().availableProcessors() > 1 ? 64 : 1;

final V put(K key, int hash, V value, boolean onlyIfAbsent) {
    // First try to acquire the lock; if that fails, spin, and once the
    // spin count exceeds MAX_SCAN_RETRIES, block until the lock is acquired.
    HashEntry<K,V> node = tryLock() ? null :
        scanAndLockForPut(key, hash, value);
    V oldValue;
    try {
        HashEntry<K,V>[] tab = table;
        int index = (tab.length - 1) & hash;
        HashEntry<K,V> first = entryAt(tab, index);
        for (HashEntry<K,V> e = first;;) {
            if (e != null) {
                K k;
                if ((k = e.key) == key ||
                    (e.hash == hash && key.equals(k))) {
                    oldValue = e.value;
                    if (!onlyIfAbsent) {
                        e.value = value;
                        ++modCount;
                    }
                    break;
                }
                e = e.next;
            }
            else {
                if (node != null)
                    node.setNext(first);
                else
                    node = new HashEntry<K,V>(hash, key, value, first);
                int c = count + 1;
                if (c > threshold && tab.length < MAXIMUM_CAPACITY)
                    rehash(node);
                else
                    setEntryAt(tab, index, node);
                ++modCount;
                count = c;
                oldValue = null;
                break;
            }
        }
    } finally {
        unlock();
    }
    return oldValue;
}

ConcurrentHashMap since JDK 1.8

In JDK 1.8, segment locking was dropped in favor of CAS + synchronized to guarantee safe concurrent updates, using the same structure as HashMap: an array of buckets, each holding a linked list. When a linked list grows longer than 8 nodes, it is converted into a red-black tree to improve lookup efficiency (locating data in a linked list is O(n); in a red-black tree it is O(log n)). The code is also very similar to JDK 1.8's HashMap: the original HashEntry becomes the Node class, which still marks the value and next fields volatile to guarantee visibility when reading. The put() method of JDK 1.8's ConcurrentHashMap is somewhat involved, taking a series of measures to ensure thread safety.
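The CAS-plus-synchronized flow can be sketched roughly as follows. This is a simplified illustration, not the actual JDK source: the real putVal() also handles lazy table initialization, resizing, tree bins, and the counting logic discussed later. An AtomicReferenceArray stands in for the volatile table access (tabAt/casTabAt) that the JDK performs with Unsafe.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Sketch of the JDK 1.8 put strategy: CAS into an empty bucket without
// any lock; if the bucket is occupied, synchronize only on its head
// node instead of locking the whole map.
public class CasPutSketch<K, V> {
    static final class Node<K, V> {
        final K key;
        volatile V val;
        volatile Node<K, V> next;
        Node(K key, V val, Node<K, V> next) {
            this.key = key; this.val = val; this.next = next;
        }
    }

    private final AtomicReferenceArray<Node<K, V>> table =
            new AtomicReferenceArray<>(16);

    public V put(K key, V value) {
        int i = (table.length() - 1) & key.hashCode();
        for (;;) {
            Node<K, V> head = table.get(i);
            if (head == null) {
                // Empty bucket: lock-free insert via CAS; retry on failure.
                if (table.compareAndSet(i, null, new Node<>(key, value, null)))
                    return null;
            } else {
                // Occupied bucket: lock only this bucket's head node.
                synchronized (head) {
                    if (table.get(i) == head) { // head unchanged, safe to edit
                        for (Node<K, V> e = head;; e = e.next) {
                            if (e.key.equals(key)) {
                                V old = e.val;
                                e.val = value;
                                return old;
                            }
                            if (e.next == null) {
                                e.next = new Node<>(key, value, null);
                                return null;
                            }
                        }
                    }
                    // Head changed under us: fall through and retry.
                }
            }
        }
    }

    public V get(K key) {
        int i = (table.length() - 1) & key.hashCode();
        for (Node<K, V> e = table.get(i); e != null; e = e.next)
            if (e.key.equals(key)) return e.val;
        return null;
    }
}
```

The key design point survives the simplification: threads writing to different buckets never contend with each other, and readers take no lock at all.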

By contrast, the get() method of ConcurrentHashMap in JDK 1.8 is relatively simple:

public V get(Object key) {
    Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek;
    int h = spread(key.hashCode());
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (e = tabAt(tab, (n - 1) & h)) != null) {
        if ((eh = e.hash) == h) {
            if ((ek = e.key) == key || (ek != null && key.equals(ek)))
                return e.val;
        }
        else if (eh < 0)
            return (p = e.find(h, key)) != null ? p.val : null;
        while ((e = e.next) != null) {
            if (e.hash == h &&
                ((ek = e.key) == key || (ek != null && key.equals(ek))))
                return e.val;
        }
    }
    return null;
}

The size() method of ConcurrentHashMap

In JDK 1.7, size() first sums the per-segment counts without taking any lock, then sums them again, up to three times in total. If two consecutive passes see the same modification totals, no competing write happened in between and the sum is returned directly. If the totals keep changing, every segment is locked and the size of the ConcurrentHashMap is computed exactly.
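That retry-then-lock strategy can be sketched like this. This is an illustrative simplification, not the JDK 1.7 source: the real version also compares per-segment modCount snapshots between passes and handles overflow, but the shape is the same.

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the JDK 1.7 size() strategy: sum the segment counts
// without locking, retry a few times, and only lock every segment
// if the totals keep changing under concurrent writes.
public class SegmentSizeSketch {
    static final int RETRIES_BEFORE_LOCK = 2;

    static class Segment extends ReentrantLock {
        volatile int count;     // entries in this segment
        volatile int modCount;  // bumped on every structural change
    }

    final Segment[] segments;

    SegmentSizeSketch(int n) {
        segments = new Segment[n];
        for (int i = 0; i < n; i++) segments[i] = new Segment();
    }

    public int size() {
        long last = -1;
        for (int retries = 0; ; retries++) {
            if (retries > RETRIES_BEFORE_LOCK) {
                // Too much contention: lock all segments, count exactly.
                for (Segment s : segments) s.lock();
                try {
                    long sum = 0;
                    for (Segment s : segments) sum += s.count;
                    return (int) sum;
                } finally {
                    for (Segment s : segments) s.unlock();
                }
            }
            // Optimistic pass: no locks taken.
            long sum = 0, mods = 0;
            for (Segment s : segments) { sum += s.count; mods += s.modCount; }
            if (mods == last) return (int) sum; // two stable passes in a row
            last = mods;
        }
    }
}
```

In JDK 1.8 this segment bookkeeping disappears entirely; as the code below shows, size() simply converts the result of sumCount().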

/**
 * {@inheritDoc}
 */
public int size() {
    long n = sumCount();
    return ((n < 0L) ? 0 :
            (n > (long)Integer.MAX_VALUE) ? Integer.MAX_VALUE :
            (int)n);
}

This method caps its return value at Integer.MAX_VALUE, but a ConcurrentHashMap may hold more entries than int can represent. JDK 1.8 therefore added the mappingCount() method, whose return type is long, and from JDK 1.8 on it is the recommended way to obtain the number of mappings in the map.

/**
 * @return the number of mappings
 * @since 1.8
 */
 public long mappingCount() {
     long n = sumCount();
     return (n < 0L) ? 0L : n; // ignore transient negative values
 }

Whether you call size() or mappingCount(), the core is the sumCount() method. Its source is as follows:

final long sumCount() {
     CounterCell[] as = counterCells; CounterCell a;
     long sum = baseCount;
     if (as != null) {
         for (int i = 0; i < as.length; ++i) {
             if ((a = as[i]) != null)
                 sum += a.value;
         }
     }
     return sum;
 }

From sumCount() we can see that when counterCells is null, baseCount is returned directly; when it is not null, the cells are traversed and their values added to baseCount. First look at baseCount:

/**
 * Base counter value, used mainly when there is no contention,
 * but also as a fallback during table initialization
 * races. Updated via CAS.
 */
private transient volatile long baseCount;

baseCount is a volatile variable. Let's see how it is updated when put() executes: the last step of put() calls addCount(), which increments baseCount via CAS and, when that CAS fails under contention, spreads the update across CounterCell objects instead. The CounterCell class looks like this:

/**
 * A padded cell for distributing counts.  Adapted from LongAdder
 * and Striped64.  See their internal docs for explanation.
 */
@sun.misc.Contended static final class CounterCell {
    volatile long value;
    CounterCell(long x) { value = x; }
}
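The counting scheme that addCount() and sumCount() implement together can be sketched as follows. This is a simplified illustration, not the JDK source (the real addCount() initializes the cell array lazily, resizes it, and uses a per-thread probe from ThreadLocalRandom rather than a fresh random index), but it shows the same idea, borrowed from LongAdder/Striped64.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

// Sketch of striped counting: CAS on a base counter first, and spread
// to per-cell counters when the CAS fails under contention, so that
// concurrent writers hit different memory locations.
public class StripedCounterSketch {
    private final AtomicLong baseCount = new AtomicLong();
    private final AtomicLongArray cells = new AtomicLongArray(8); // stands in for counterCells

    public void add(long x) {
        long b = baseCount.get();
        if (!baseCount.compareAndSet(b, b + x)) {
            // Contended: add into a pseudo-randomly chosen cell instead
            // of fighting over baseCount.
            int i = ThreadLocalRandom.current().nextInt(cells.length());
            cells.addAndGet(i, x);
        }
    }

    public long sum() {
        // Same shape as sumCount(): the base value plus every cell.
        long sum = baseCount.get();
        for (int i = 0; i < cells.length(); i++) sum += cells.get(i);
        return sum;
    }
}
```

Note the trade-off this design accepts: sum() takes no lock while cells may still be changing, so the total is a moving snapshot, not an exact count at one instant. This is precisely why size() is safe to call concurrently yet only approximate under writes.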

The class is annotated with @sun.misc.Contended and contains a single volatile field. The @sun.misc.Contended annotation exists to prevent "false sharing". So what is false sharing? CPUs cache memory in fixed-size cache lines (typically 64 bytes), so several independent variables can land on the same line; when one thread modifies its variable, the whole line is invalidated in every other core's cache, forcing other threads to reload variables they never touched.

False sharing therefore does great harm to performance. The annotation did not exist before JDK 8; earlier code used manual padding instead, filling out the cache line so that writes to neighboring counters do not invalidate each other.
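The pre-JDK-8 padding workaround can be sketched like this. This is an illustrative example, assuming 64-byte cache lines; note that in practice a JVM may reorder or eliminate unused fields, which is exactly why @Contended (a JVM-recognized annotation) is more reliable than hand-written padding.

```java
// Manual cache-line padding: unused long fields after the hot value
// push the next object's data onto a different cache line, so two
// threads incrementing different counters stop invalidating each
// other's cache lines on every write.
public class PaddedCounters {
    static class PaddedLong {
        volatile long value;
        // 7 x 8 = 56 bytes of padding after the 8-byte value.
        long p1, p2, p3, p4, p5, p6, p7;
    }

    final PaddedLong a = new PaddedLong();
    final PaddedLong b = new PaddedLong();

    // Intended to be called from two different threads; with padding,
    // the writes no longer ping-pong a shared cache line.
    void incrementA() { a.value++; }
    void incrementB() { b.value++; }
}
```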
