Four Java virtual machine optimizations for internal locks

Since Java 6/Java 7, the Java virtual machine has optimized its implementation of internal locks. These optimizations mainly include lock elision, lock coarsening, biased locking, and adaptive locking. They take effect only in the Java virtual machine's server mode (that is, when running a Java program we may need to specify the JVM parameter "-server" on the command line to enable them).

1 Lock elision

Lock elimination is an optimization made by the JIT compiler for the specific implementation of internal locks.

Schematic diagram of lock elision

When dynamically compiling a synchronized block, the JIT compiler can use a technique called escape analysis to determine whether the lock object used by the block can only ever be accessed by one thread and is never published to other threads. If the analysis proves that this is the case, then when compiling the synchronized block the JIT compiler does not emit the machine code for the pair of lock acquisition and release operations that synchronized represents; it emits only the machine code for the critical-section code itself. The effect is as if the dynamically compiled code contained no monitorenter and monitorexit instructions at all, which eliminates the use of the lock. This compiler optimization is called lock elision, and it lets us completely avoid the cost of a lock under these specific circumstances.

Although some classes in the Java standard library (such as StringBuffer) are thread-safe, in practice we often do not share instances of these classes among multiple threads. Since these classes rely on internal locks to implement thread safety, they are common targets of lock elision.

Listing 12-1 Example code for lock elision optimization
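The listing itself is not reproduced in this excerpt. A minimal sketch consistent with the description below — a toJson method whose StringBuffer is a local variable, with hypothetical class and field names — might look like this:

```java
public class RequestStats {
    private final String name;
    private final int count;

    public RequestStats(String name, int count) {
        this.name = name;
        this.count = count;
    }

    public String toJson() {
        // sbf is a local variable: the StringBuffer it references never
        // escapes this method, so escape analysis can prove it is confined
        // to the executing thread, and the JIT compiler may elide the
        // internal locks of the inlined append/toString calls.
        StringBuffer sbf = new StringBuffer();
        sbf.append("{\"name\":\"").append(name).append("\",")
           .append("\"count\":").append(count).append('}');
        return sbf.toString();
    }
}
```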

In the example above, when compiling the toJson method the JIT compiler may inline the StringBuffer.append/toString calls into it, which is equivalent to copying the instructions from the bodies of those methods into the toJson method body. The StringBuffer instance sbf is a local variable, and the object it references is never published to other threads, so that object can only be accessed by the one thread currently executing toJson. Therefore the JIT compiler can eliminate the internal locks used by the instructions copied from the StringBuffer.append/toString method bodies into toJson. Note that the locks used by the StringBuffer.append/toString methods themselves are not eliminated, because other code in the system may also use StringBuffer, and that code may share StringBuffer instances across threads.

The escape analysis on which lock elision relies has been enabled by default since Java SE 6u23, but lock elision itself was introduced in Java 7.

As the example shows, lock elision may also depend on the JIT compiler's inlining optimization. Whether a method is inlined by the JIT compiler depends on the method's hotness and on the size of its bytecode. Therefore, whether lock elision can be applied also depends on whether the called synchronized method (or method containing a synchronized block) can be inlined.

Lock elision tells us to use locks wherever locks are logically required, without worrying too much about their cost: developers should decide at the level of code logic whether a lock is needed, while whether a lock is really necessary at run time is decided by the JIT compiler. That said, lock elision does not mean developers can sprinkle internal locks freely, because lock elision is performed by the JIT compiler rather than by javac, and a piece of code may be optimized by the JIT compiler only after it has executed frequently enough. In other words, before JIT compilation kicks in, any internal lock used in the source code still carries its full cost. In addition, the inlining, escape analysis, and lock elision performed by the JIT compiler have overhead of their own.

With lock elision in effect, using ThreadLocal to turn a thread-safe object (such as Random) into a thread-specific object not only avoids lock contention but may also completely eliminate the cost of the locks used inside such objects.
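A minimal sketch of this pattern — one Random per thread via ThreadLocal, with a hypothetical wrapper class name — could look like the following. Whether the internal lock of a synchronized method such as Random.nextGaussian() is actually elided still depends on the JIT compiler's analysis at run time:

```java
import java.util.Random;

public class ThreadSpecificRandom {
    // Each thread gets its own Random instance, so there is no lock
    // contention between threads; per the text, lock elision may then
    // remove the remaining cost of Random's internal locks.
    private static final ThreadLocal<Random> RND =
            new ThreadLocal<Random>() {
                @Override
                protected Random initialValue() {
                    return new Random();
                }
            };

    public static double gaussian() {
        return RND.get().nextGaussian();
    }
}
```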

2 Lock coarsening

Lock coarsening (also known as lock merging) is an optimization made by the JIT compiler for the specific implementation of internal locks.

Schematic diagram of lock coarsening

For several adjacent synchronized blocks that use the same lock instance, the JIT compiler may merge them into one large synchronized block, avoiding the overhead of a thread repeatedly acquiring and releasing the same lock. However, lock coarsening may cause a thread to hold a lock for longer, so other threads synchronizing on that lock may wait longer to acquire it. For example, in the figure above, in the gap between the end of the first synchronized block and the start of the second, other threads would have had a chance to acquire monitorX; after coarsening, the longer critical section also lengthens the time those threads must wait for monitorX. For this reason, lock coarsening is not applied to adjacent synchronized blocks inside a loop body.

Statements between two adjacent synchronized blocks do not necessarily prevent the JIT compiler from coarsening them. This is because, before coarsening, the JIT compiler may move those statements (i.e., reorder instructions) into the critical section of the following synchronized block (of course, the JIT compiler never moves code from inside a critical section to outside it).
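The shape the JIT compiler targets can be sketched as follows; the class, fields, and the merged form shown in the comment are illustrative, not the compiler's actual output:

```java
public class CoarseningShape {
    private final Object monitorX = new Object();
    private int a;
    private int b;

    // Before coarsening: two adjacent critical sections on the same monitor,
    // separated by an ordinary statement.
    public void update(int x, int y) {
        synchronized (monitorX) {
            a = x;
        }
        int local = x + y;   // the JIT may sink this statement into the
                             // next critical section before coarsening...
        synchronized (monitorX) {
            b = local;
        }
        // ...and then merge the two blocks, roughly equivalent to:
        // synchronized (monitorX) { a = x; int local = x + y; b = local; }
    }

    public int getA() { return a; }
    public int getB() { return b; }
}
```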

In fact, there may be few literally consecutive synchronized blocks in the code we write. Adjacent synchronized blocks guided by the same lock instance are often formed only after JIT compilation.

Consider the following example.

Listing 12-2 Example code for lock coarsening optimization
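The listing is not reproduced in this excerpt. A minimal sketch consistent with the description below — a simulate method calling randomIQ three times, with a hypothetical class name and hypothetical mean/standard deviation — might be:

```java
import java.util.Random;

public class IQSimulator {
    // rnd may be shared by multiple threads, so the JIT compiler cannot
    // elide the lock of the synchronized Random.nextGaussian() method.
    private final Random rnd = new Random();

    public int randomIQ() {
        // IQ ~ N(100, 15): mean and standard deviation chosen for
        // illustration only.
        return (int) Math.round(rnd.nextGaussian() * 15 + 100);
    }

    // After randomIQ() and nextGaussian() are inlined into simulate(),
    // three adjacent blocks synchronized on rnd appear here, which the
    // JIT compiler may coarsen into one.
    public int[] simulate() {
        int iq1 = randomIQ();
        int iq2 = randomIQ();
        int iq3 = randomIQ();
        return new int[] { iq1, iq2, iq3 };
    }
}
```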

The simulate method calls the randomIQ method three times in a row to generate three random IQ values following a normal (Gaussian) distribution. When simulate is executed frequently enough, the JIT compiler may perform a series of optimizations on it. First, it may inline randomIQ into simulate, which is equivalent to copying the instructions of the randomIQ method body into simulate. On that basis, the rnd.nextGaussian() call inside randomIQ may also be inlined, copying the instructions of the Random.nextGaussian() method body into simulate as well. Random.nextGaussian() is a synchronized method. Because the Random instance rnd may be shared by multiple threads (simulate itself may be executed by multiple threads), the JIT compiler cannot apply lock elision to Random.nextGaussian(); as a result, the body of Random.nextGaussian() inlined into simulate is equivalent to a synchronized block guided by rnd. After the above optimizations, the JIT compiler finds three adjacent synchronized blocks in simulate guided by rnd (a Random instance), and lock coarsening can then "take the stage".

Lock coarsening is enabled by default. To turn it off, add the virtual machine parameter "-XX:-EliminateLocks" to the Java program's startup command line (to turn it back on, use "-XX:+EliminateLocks").

3 Biased locking

Biased locking is a Java virtual machine optimization of the implementation of locks. It rests on the observation that most locks are never contended, and that such locks are held by at most one thread in their entire lifetime. However, the Java virtual machine's implementations of the monitorenter bytecode (lock acquisition) and monitorexit bytecode (lock release) rely on an atomic operation (a CAS operation), which is relatively expensive. Therefore, the Java virtual machine maintains a bias for each object: the first thread to acquire an object's internal lock is recorded as that object's biased thread. Whether that thread acquires the lock again or releases it, it no longer needs the expensive atomic operation that was required before biased locking was introduced, which reduces the cost of lock acquisition and release.

However, the fact that a lock is not contended does not mean that only one thread ever accesses it. When a thread other than an object's biased thread applies for the object's internal lock, the Java virtual machine must revoke the object's bias toward the original thread and rebias the object toward the new one. This bias revocation and reassignment is itself relatively expensive, so if many locks are contended while the program runs, that cost is amplified. Consequently, biased locking is suited only to systems in which the vast majority of locks are uncontended; if a system has many contended locks and only a small fraction of uncontended ones, we can consider turning biased locking off.

Biased locking is enabled by default. To turn it off, add the virtual machine parameter "-XX:-UseBiasedLocking" to the Java program's startup command line ("-XX:+UseBiasedLocking" turns it on).

4 Adaptive locking

Adaptive locking (also known as adaptive spinning) is an optimization of the internal lock implementation by the JIT compiler.

Under lock contention, when a thread applies for a lock that happens to be held by another thread, it must wait for the holder to release the lock. One conservative way to implement this wait is to suspend the thread (its lifecycle state becomes non-runnable). Since suspending a thread causes a context switch, this strategy suits lock instances that most threads in the system hold for a comparatively long time, so that the cost of the context switch is offset. The other implementation is busy-waiting: repeatedly testing the condition in a loop with an empty body, as shown in the following code.

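The code the text refers to is missing from this excerpt; the busy-wait it describes is essentially an empty-bodied loop over a condition. A small self-contained sketch (the class and its flag are illustrative, not the JVM's actual spin-lock code):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class BusyWait {
    private final AtomicBoolean flag = new AtomicBoolean(false);

    public void set() {
        flag.set(true);
    }

    public boolean isSet() {
        return flag.get();
    }

    // Busy-wait: repeatedly test the condition with an empty loop body.
    // No context switch occurs, but the spinning thread burns CPU time
    // for as long as the condition does not hold.
    public void awaitFlag() {
        while (!flag.get()) {
            // do nothing (busy wait)
        }
    }
}
```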
Busy-waiting thus waits for the required condition to hold by repeatedly doing nothing. Its advantage is that it causes no context switch; its disadvantage is that it consumes processor resources: if the condition does not hold for a long time, the busy loop keeps executing. Therefore, for a given lock instance, the busy-wait strategy suits locks that most threads hold only briefly, which avoids excessive processor-time overhead.

In fact, the Java virtual machine need not choose between these two strategies; it can combine them. For a particular lock instance, the Java virtual machine decides, based on information collected while the program runs, whether threads hold the lock for a "relatively long" or "relatively short" time. For locks held for a long time it chooses the suspend-and-wait strategy; for locks held briefly it chooses the busy-wait strategy. The Java virtual machine may also busy-wait first and fall back to suspending the thread if the busy wait fails. This optimization is called adaptive locking, and it too requires the involvement of the JIT compiler.

Adaptive locking operates per lock instance. In other words, the Java virtual machine may adopt the busy-wait strategy for one lock instance and the suspend-and-wait strategy for another.

Adaptive locking shows that using internal locks does not necessarily cause context switches; this is why we say that locks "may" (rather than always) lead to context switches.

This article is selected from the Practical Guide to Java Multithreaded Programming (Core).
