On the underlying implementation of Java concurrency

The purpose of concurrent programming is to make a program run faster, but using concurrency does not automatically do so. Its advantages only show once the amount of concurrent work reaches a certain order of magnitude, so concurrent programming is mainly worth discussing when concurrency is high. Even if you never write a highly concurrent program, learning concurrency helps you understand some distributed architectures. When concurrency is low, a single-threaded program can actually run more efficiently than a multithreaded one. Why? Those familiar with operating systems know that the CPU implements multithreading by allocating a time slice to each thread. When the CPU switches from one task to another, it saves the state of the current task; when that task is scheduled again, the CPU restores the saved state and continues from where it left off. This process is called context switching, and it is not free.

In Java multithreading, the volatile and synchronized keywords play an important role. Both can be used for thread synchronization, but how are they implemented underneath?

volatile

volatile only guarantees that a write to a variable is visible to other threads; it does not guarantee atomicity. I won't say much about how to use volatile in the Java language. My suggestion is to use it only together with the class library in the java.util.concurrent.atomic package, and not in other cases. See this article for more explanation.

Introduction

See the following code
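A minimal sketch of the kind of program being discussed, assuming a volatile static counter that 10 threads each increment 1000 times. The class name VolatileCounter, the run() helper and the thread counts are illustrative assumptions, not the original listing.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a volatile counter that is still not thread-safe.
class VolatileCounter {
    static volatile int count = 0;

    // count++ is a read-modify-write sequence; volatile makes each read and
    // write visible to other threads but does not make the sequence atomic.
    static void inc() {
        count++;
    }

    // Starts `threads` threads that each call inc() `perThread` times,
    // waits for all of them, and returns the final value of count.
    static int run(int threads, int perThread) {
        count = 0;
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    inc();
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // At most 10000; lost updates usually make the printed value smaller.
        System.out.println(run(10, 1000));
    }
}
```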

Each time the above code is executed the result is different, and the output is typically less than 10000. This is because i++ is not an atomic operation when inc() runs. Some people might suggest using synchronized to synchronize inc(), or using a Lock from the java.util.concurrent.locks package to control thread synchronization. But neither is as good as the following solution:

At this point, if you don't know how the atomic classes are implemented, you may be skeptical: perhaps AtomicInteger is implemented with locks underneath, so it may not be efficient either. So how is it actually implemented? Let's take a look.
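A minimal sketch of such a solution, assuming the volatile int is replaced with a java.util.concurrent.atomic.AtomicInteger. The class name AtomicCounter and the run() helper are illustrative, not the original listing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: the same counter, backed by AtomicInteger.
class AtomicCounter {
    static final AtomicInteger count = new AtomicInteger();

    // getAndIncrement() performs the whole read-modify-write atomically.
    static void inc() {
        count.getAndIncrement();
    }

    // Starts `threads` threads that each call inc() `perThread` times,
    // waits for all of them, and returns the final value.
    static int run(int threads, int perThread) {
        count.set(0);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    inc();
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return count.get();
    }

    public static void main(String[] args) {
        System.out.println(run(10, 1000)); // no increments are lost
    }
}
```

With 10 threads and 1000 increments each, run() returns exactly 10000 because no update can be lost.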

Internal implementation of atomic class

Whether it is AtomicInteger or ConcurrentLinkedQueue's node class ConcurrentLinkedQueue.Node, they all have a static variable private static final sun.misc.Unsafe UNSAFE;. This class is a Java wrapper around the C++ object sun::misc::Unsafe, which implements the atomic semantics. I wanted to see the underlying implementation, and I happened to have the GCC 4.8 source code at hand; by comparing with the local path, it was easy to find the corresponding path on GitHub. See here.

Take the implementation of the getAndIncrement() method as an example.

AtomicInteger.java

Note that this for loop returns only when compareAndSet succeeds. Otherwise, it keeps retrying compareAndSet.

This calls the compareAndSet implementation. Here I noticed that the Oracle JDK implementation is slightly different. If you look at the src under the JDK, you can see that Oracle JDK delegates to a method on Unsafe (getAndAddInt in JDK 8) instead of looping itself, but I believe that Unsafe.java in Oracle JDK ultimately relies only on compare-and-set as well, because a single compareAndSet is enough to build atomic increment, decrement and set operations.
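The retry loop described here can be sketched against the public AtomicInteger API. This is an illustration of the pattern, not a copy of the GCC sources, and the class name CasLoop is an assumption.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: getAndIncrement expressed as a compare-and-set retry loop.
class CasLoop {
    static int getAndIncrement(AtomicInteger value) {
        for (;;) {
            int current = value.get();   // read the current value
            int next = current + 1;      // compute the new value
            // Succeeds only if nobody changed the value since we read it;
            // otherwise loop and try again with a fresh read.
            if (value.compareAndSet(current, next)) {
                return current;
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger v = new AtomicInteger(41);
        System.out.println(getAndIncrement(v)); // previous value
        System.out.println(v.get());            // incremented value
    }
}
```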
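To illustrate the claim that one compare-and-set primitive is enough, here is a hedged sketch that builds decrement and set on top of compareAndSet alone. The class CasUpdates and its helpers are hypothetical; AtomicInteger itself already offers getAndUpdate, getAndDecrement and getAndSet.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntUnaryOperator;

// Illustrative only: several atomic operations derived from compareAndSet.
class CasUpdates {
    // Generic retry loop: atomically applies `update` and returns the previous value.
    static int getAndUpdate(AtomicInteger value, IntUnaryOperator update) {
        for (;;) {
            int current = value.get();
            if (value.compareAndSet(current, update.applyAsInt(current))) {
                return current;
            }
        }
    }

    static int getAndDecrement(AtomicInteger value) {
        return getAndUpdate(value, x -> x - 1);
    }

    static int getAndSet(AtomicInteger value, int newValue) {
        return getAndUpdate(value, x -> newValue);
    }
}
```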

Unsafe.java

The C++ implementation is called through JNI.

natUnsafe.cc

Unsafe::compareAndSwapInt calls the static function compareAndSwap, which uses spinlock as its lock. The spinlock here acts as a lock guard: it acquires the lock in its constructor and releases it in its destructor.

So we need to focus on spinlock. It is what actually guarantees that the operations performed before it is released are atomic.

What is a spinlock

A spinlock, or spin lock, is a lock that waits in a loop until it acquires the resource. Unlike a mutex, which blocks the current thread and gives up the CPU while waiting, a spinlock does not go through the cycle of being suspended, waiting for the condition to be met and then competing for the CPU again. This means a spinlock is better than a mutex only when the cost of waiting for the lock is less than the cost of a thread context switch.

natUnsafe.cc

A static variable, static volatile obj_addr_t lock;, serves as the flag, and a guard is implemented on top of it through C++ RAII, so the so-called lock is really just the static member variable obj_addr_t lock. volatile in C++ does not guarantee synchronization; what guarantees it is the compare_and_swap that the constructor performs on the static variable lock. When lock is 1, the caller has to wait; when it is 0, it is set to 1 atomically, indicating that the lock has been acquired.

It is surprising that a static variable is used here, because it means that all lock-free structures share the same variable (actually a size_t) to tell whether the lock is held. While this variable is 1, everyone else using spinlock has to wait. Why not add a private member volatile obj_addr_t lock; to sun::misc::Unsafe and pass it to spinlock as a constructor argument? Each Unsafe would then have its own flag. Would that work better?

_Jv_ThreadYield, in the following file, gives up the CPU through the system call sched_yield (man 2 sched_yield). The macro HAVE_SCHED_YIELD is defined by configure, which means that if it is undefined at compile time, spinlock becomes a true spin lock that never yields.
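For Java readers, the same idea can be sketched as follows. This is a loose analogue and not a translation of natUnsafe.cc: the names SpinLockGuard and SpinLockDemo are hypothetical, try-with-resources stands in for the C++ constructor/destructor pair, and Thread.yield() is used as a rough counterpart of yielding the CPU while waiting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: a guard over one shared flag (0 = free, 1 = held).
class SpinLockGuard implements AutoCloseable {
    private static final AtomicInteger LOCK = new AtomicInteger(0);

    // Acquire in the constructor: spin until the flag flips from 0 to 1.
    SpinLockGuard() {
        while (!LOCK.compareAndSet(0, 1)) {
            Thread.yield(); // let other threads run while we wait
        }
    }

    // Release when the guard is closed.
    @Override
    public void close() {
        LOCK.set(0);
    }
}

class SpinLockDemo {
    private static int plainCounter = 0; // deliberately not volatile or atomic

    static int run(int threads, int perThread) {
        plainCounter = 0;
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    // The guard is held for exactly the duration of this block.
                    try (SpinLockGuard guard = new SpinLockGuard()) {
                        plainCounter++;
                    }
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return plainCounter;
    }
}
```

Because every guard contends on the same static flag, unrelated callers serialize against each other, which is the design concern raised about the shared static variable.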

posix-threads.h

locks.h has different implementations on different platforms. We take the ia64 (Intel Itanium) implementation as an example; the other implementations can be seen here.

ia64/locks.h

__sync_bool_compare_and_swap is a GCC built-in function, and the "memory" clobber in the inline assembly statement provides the memory barrier by preventing the compiler from reordering memory accesses across it.

In short, the hardware guarantees synchronization across multi-core CPUs, and the implementation of Unsafe is as efficient as it can be. GCC's Java implementation is fairly efficient, and I believe Oracle JDK and OpenJDK are no worse.

Atomic operation and GCC built-in atomic operation

Atomic operation

Expressions in Java and C++ are not atomic operations. Consider a statement such as i++ in your code.

In a multithreaded environment, access to i is not atomic. It actually breaks down into three operations: read the value of i from memory into a register, increment the register, and write the result back to memory.

The compiler may also change the order of execution, so the result may not be what you expect.
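The lost update this leads to can be replayed deterministically in a single thread. The sketch below is a hand-written simulation of one unlucky interleaving, with local variables standing in for the registers of two hypothetical threads A and B; the names are illustrative.

```java
// Illustrative only: two "threads" each perform i++ as read / add / write,
// interleaved by hand so that one update overwrites the other.
class LostUpdate {
    static int i = 0;

    static int simulate() {
        i = 0;
        int registerA = i;          // A: read i (sees 0)
        int registerB = i;          // B: read i (also sees 0, A has not written yet)
        registerA = registerA + 1;  // A: add 1
        registerB = registerB + 1;  // B: add 1
        i = registerA;              // A: write back 1
        i = registerB;              // B: write back 1, overwriting A's update
        return i;
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // prints 1, although two increments ran
    }
}
```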

GCC built-in atomic operation

GCC provides the following built-in atomic operations, available since the 4.1 series. Before that, they were implemented with inline assembly.
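For Java readers, the flavour of these built-ins can be sketched with AtomicInteger. The pairing with specific __sync_* names in the comments is a rough analogy assumed here, not something stated in the original, and memory-ordering details differ between the two worlds.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: Java calls that behave roughly like some GCC __sync_* built-ins.
class SyncBuiltinAnalogues {
    // Returns { old value, new value, 1 if the swap succeeded, final value }.
    static int[] demo() {
        AtomicInteger v = new AtomicInteger(10);
        int old = v.getAndAdd(5);                  // like __sync_fetch_and_add: returns the old value
        int fresh = v.addAndGet(5);                // like __sync_add_and_fetch: returns the new value
        boolean swapped = v.compareAndSet(20, 7);  // like __sync_bool_compare_and_swap
        return new int[] { old, fresh, swapped ? 1 : 0, v.get() };
    }
}
```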

It should be noted that:

OpenJDK related files

Here are some of the atomic operation implementations in OpenJDK 9 on GitHub, in the hope that they help those who need them. After all, OpenJDK is more widely used than GCC's Java implementation. There is no Oracle JDK source code available, though it is said that the gap between the OpenJDK and Oracle sources is very small.

AtomicInteger.java

Unsafe.java::compareAndExchangeObject

unsafe.cpp::Unsafe_CompareAndExchangeObject

oop.inline.hpp::oopDesc::atomic_compare_exchange_oop

atomic_linux_x86.hpp::Atomic::cmpxchg

Here is a hint for Java programmers unfamiliar with C/C++. The format of an inline assembly statement is as follows

%1, %3 and %4 in the assembly template correspond to the operands {"r" (exchange_value), "r" (dest), "r" (mp)} in the operand lists. The operands in each list are separated by commas and numbered from 0. Output operands are placed to the right of the first colon and input operands to the right of the second colon. "r" means the operand is placed in a general-purpose register, "a" means the eax register, and "=" means the operand is used for output (write-back). The cmpxchg instruction implicitly uses the eax register, which holds operand %2.

Other details are not listed here. The GCC implementation passes down the pointer to be exchanged and, once the comparison succeeds, assigns the value directly (the assignment itself is not atomic); atomicity is guaranteed by the spinlock.

The OpenJDK implementation passes down the pointer to be exchanged and assigns the value directly with the cmpxchgq assembly instruction, so atomicity is guaranteed by the instruction itself. Of course, GCC's spinlock is ultimately backed by cmpxchgq as well.
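The difference between the two approaches can be sketched in Java. This is an analogy under stated assumptions rather than either project's code: LockGuardedInt is a hypothetical class that guards a plain compare-then-assign with a spin flag, while hardwareCas simply delegates to AtomicInteger.compareAndSet, which HotSpot on x86 typically compiles to a single locked cmpxchg instruction.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: two ways to obtain compare-and-swap semantics.
class CasStrategies {
    // Style 1: a spin flag protects an ordinary, non-atomic compare and assignment.
    static final class LockGuardedInt {
        private final AtomicBoolean locked = new AtomicBoolean(false);
        private int value;

        LockGuardedInt(int initial) {
            value = initial;
        }

        private void lock() {
            while (!locked.compareAndSet(false, true)) {
                Thread.yield();
            }
        }

        private void unlock() {
            locked.set(false);
        }

        boolean compareAndSwap(int expected, int update) {
            lock();
            try {
                if (value != expected) {
                    return false;
                }
                value = update; // plain store, safe only because the flag is held
                return true;
            } finally {
                unlock();
            }
        }

        int get() {
            lock();
            try {
                return value;
            } finally {
                unlock();
            }
        }
    }

    // Style 2: the comparison and the store happen in one atomic operation.
    static boolean hardwareCas(AtomicInteger cell, int expected, int update) {
        return cell.compareAndSet(expected, update);
    }
}
```

Both styles give callers the same result; the first pays for acquiring and releasing a lock around every operation, while the second needs no lock at all.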

That is all for this article. I hope it will be helpful to your study.

The content of this article comes from the network collection of netizens. It is used as a learning reference. The copyright belongs to the original author.