Java Memory Model (JMM): a detailed explanation
The Java Memory Model, abbreviated JMM, is the unified guarantee the Java virtual machine platform gives developers about memory visibility, reordering, and related issues in a multithreaded environment. (The term is easily confused with the Java runtime memory layout, which refers to memory areas such as the heap, method area, and thread stacks.) There are many styles of concurrent programming; besides CSP (Communicating Sequential Processes), the actor model, and others, the most familiar is the shared-memory model based on threads and locks. In multithreaded programming, three kinds of concurrency problems need attention:
・ Atomicity
・ Visibility
・ Reordering
Atomicity concerns whether other threads can observe an intermediate state, or interfere, while one thread performs a compound operation. The typical problem is i++. When two threads perform ++ on a shared field in heap memory at the same time, the ++ operation may be compound at the JVM, runtime, and CPU levels. From the perspective of JVM instructions, for example, the value of i is read from heap memory onto the operand stack, incremented, and written back to i in heap memory. If these steps are not correctly synchronized, other threads can execute concurrently in between, and updates may be lost. Common atomicity problems, also known as race conditions, are judged by a possible incorrect result, as in read-modify-write. Visibility and reordering problems both stem from system-level optimization.
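A minimal sketch of the i++ race (class, field, and iteration counts are illustrative): two threads increment a shared field; the plain counter can lose updates because ++ is a read-modify-write sequence, while AtomicInteger makes the update atomic.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class IncrementRace {
    static int unsafeCount = 0;                                 // plain int: ++ is not atomic
    static final AtomicInteger safeCount = new AtomicInteger(); // atomic read-modify-write

    static void run(int perThread) {
        Runnable task = () -> {
            for (int i = 0; i < perThread; i++) {
                unsafeCount++;               // read, add, write back: steps can interleave
                safeCount.incrementAndGet(); // single atomic operation
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        try {
            t1.join(); t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        run(100_000);
        // safeCount always ends at 200000; unsafeCount is frequently smaller
        System.out.println("unsafe=" + unsafeCount + " safe=" + safeCount.get());
    }
}
```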
Because the CPU's execution speed badly outpaces memory access speed, CPUs add multiple layers of cache between the processor and memory to optimize performance, exploiting locality principles such as temporal locality and spatial locality. When data is needed, the CPU first checks whether it is in the cache and returns it directly if so; otherwise it fetches the data from memory and saves it in the cache. Multi-core processors are now standard, and since each core has its own cache, this raises the problem of cache coherence. CPUs offer different consistency models: the strongest consistency is the safest and matches our sequential way of thinking, but in performance terms it carries a large overhead, because the different CPUs must coordinate and communicate.
A typical CPU cache structure: each core has its own private caches, with a shared last-level cache in front of main memory (original figure omitted).
A CPU's instruction cycle usually consists of fetching the instruction, decoding it, reading data, executing, and writing the result back to a register or memory. When instructions execute serially, reading and storing data takes a long time, so CPUs generally use instruction pipelining, like a factory assembly line, to execute multiple instructions at once and improve overall throughput.
Compared with writing data back to memory, the speed of executing instructions is not in the same order of magnitude, so the CPU uses registers and caches as buffers. When a store must be performed and the old data is not in the cache, the write-back module puts the store request into a store buffer and continues with the next stage of the instruction cycle; if the data is in the cache, the cache is updated, and cached data is flushed to memory according to certain policies.
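The paragraphs that follow discuss a code listing that does not appear in the text; a minimal reconstruction consistent with the discussion (the method names initCountAndStop, printResult, and doLoop are taken from the text; everything else is assumed) might look like:

```java
// Without synchronization, the two writes in initCountAndStop may be
// reordered, or remain invisible to other threads for an arbitrary time.
public class VisibilityExample {
    int count = 0;
    boolean stop = true;

    void initCountAndStop() { // run by thread T1
        count = 1;
        stop = false;         // no dependency on count = 1: may be reordered
    }

    void printResult() {      // run by thread T2
        System.out.println(count + " " + stop);
    }

    void doLoop() {           // may spin forever if the write to stop is never seen
        while (stop) { /* busy-wait */ }
    }
}
```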
When the code above executes, we might expect count = 1 to take effect before stop = false. That is correct in the idealized CPU execution shown above, but not once registers and cache buffers are considered. For example, if stop is already in the cache but count is not, the buffered write of count may be flushed to memory after the update of stop.
In addition, both the CPU and the compiler (for Java, usually the JIT) may change the instruction execution order. In the code above, count = 1 and stop = false have no dependency, so both the CPU and the compiler may swap them. From the viewpoint of a single-threaded program the result is the same; this is the as-if-serial guarantee provided by the CPU and compiler (however the execution order is changed, the result of a single thread stays unchanged). Since most program execution is single-threaded, such optimization is acceptable and brings a great performance improvement. In a multithreaded setting, however, unexpected results can occur without the necessary synchronization. For example, if thread T1 executes initCountAndStop and thread T2 then executes printResult, T2 may observe any combination of the two fields, such as count = 0 with stop = false. If thread T1 executes doLoop() first and thread T2 executes initCountAndStop a second later, T1 may exit the loop, or, because of compiler optimization, it may never see the modification of stop.
Because of these problems, in a multithreaded program the program order is no longer the execution order or result produced by the underlying machinery, so the programming language must give developers a guarantee: put simply, when a modification made by one thread becomes visible to other threads. For this, the Java language proposes the Java Memory Model; implementers such as JVM and compiler authors must follow the model's contract. Java provides volatile, synchronized, final, and other mechanisms to help developers guarantee the correctness of multithreaded programs on all processor platforms.
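As a sketch of the first of these mechanisms (class name assumed), declaring the stop flag volatile fixes both the visibility and the reordering problems in the count/stop example discussed above:

```java
// A volatile write is visible to any thread that subsequently reads the
// field, and earlier writes may not be reordered past it.
public class VolatileFlag {
    volatile boolean stop = true; // volatile: visibility + ordering guarantees
    int count = 0;

    void initCountAndStop() {
        count = 1;    // may not be reordered after the volatile write below
        stop = false; // volatile write: visible to threads that later read stop
    }
}
```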
Before JDK 1.5, Java's memory model had serious problems. For example, under the old model a thread could see the default value of a final field after the constructor had finished, and a write to a volatile field could be reordered with reads and writes of non-volatile fields.
So in JDK 1.5, a new memory model was introduced through JSR-133 to fix these problems.
Reordering rules
Volatile and monitor locks
A normal read is a getfield, getstatic, or array load of a non-volatile field or array; a normal write is a putfield, putstatic, or array store of a non-volatile field or array.
A volatile read or write is a getfield/getstatic or putfield/putstatic of a volatile field, respectively.
monitorenter means entering a synchronized block or method, and monitorexit means exiting one.
The reordering constraints between these operations (as given in the JSR-133 cookbook) are:

1st operation \ 2nd operation | Normal load/store | Volatile load, monitorenter | Volatile store, monitorexit
Normal load/store             |                   |                             | No
Volatile load, monitorenter   | No                | No                          | No
Volatile store, monitorexit   |                   | No                          | No

A No in the table means the two operations, in that order, must not be reordered. For example, (normal write, volatile write) is No: a write to a non-volatile field cannot be reordered with any subsequent write to a volatile field. Where there is no No, reordering is allowed, but the JVM must still guarantee minimal safety: a value read is either the default value or a value written by some thread. (Reads and writes of 64-bit double and long values are a special case: without volatile they are not guaranteed to be atomic, and the underlying layer may split them into two separate operations.)
Final field
Final fields have two additional special rules:
Neither a write to a final field (inside the constructor) nor a write to the reference of the object that the final field refers to may be reordered with a subsequent write, outside the constructor, of a reference to the object holding the final field. For example, the following statements cannot be reordered:
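The statements this rule refers to are missing from the text; the standard illustration (variable names assumed) is:

```
x.finalField = v;   // final-field write inside the constructor
...
sharedRef = x;      // publishing write outside the constructor -- must not move above the final write
```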
The first load of a final field cannot be reordered with the first load of the reference to the object holding the final field. For example, the following statements cannot be reordered:
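The statements this rule refers to are also missing; the standard illustration (variable names assumed) is:

```
x = sharedRef;      // first load of the object reference
...
i = x.finalField;   // first load of the final field -- must not move before the load of x
```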
Memory barrier
Processors provide memory barriers, or fences, to control reordering and data visibility across processors. For example, when the CPU writes data back, it puts the store request into the write buffer to wait to be flushed to memory; a barrier can be inserted to prevent that store request from being reordered with other requests and to ensure the data's visibility. A real-life analogy: on a subway escalator, people enter in order, but some walk around on the left, so the order at the exit differs; if one person with large luggage blocks the way (the barrier), the people behind cannot pass. Note that this barrier is different from the write barrier used in GC.
Classification of memory barriers
Almost all processors support some coarse-grained barrier instruction, usually called a fence, which guarantees that loads and stores issued before the fence are strictly ordered with respect to loads and stores issued after it. By purpose, there are generally four types of barriers:
LoadLoad Barriers
Load1; LoadLoad; Load2;
Ensures that the data of Load1 is loaded before Load2 and all subsequent loads.
StoreStore Barriers
Store1; StoreStore; Store2;
Ensures that the data of Store1 is visible to other processors (flushed to memory) before the data of Store2 and all subsequent stores.
LoadStore Barriers
Load1; LoadStore; Store2;
Ensures that the data of Load1 is loaded before the data of Store2 and subsequent stores is flushed.
StoreLoad Barriers
Store1; StoreLoad; Load2;
Ensures that the data of Store1 is visible to other processors (for example, flushed to memory) before Load2 and subsequent loads read their data. The StoreLoad barrier prevents a load from reading stale data instead of data recently written by another processor.
Almost all modern multiprocessors require StoreLoad. Its overhead is usually the largest, and it subsumes the effect of the other three barriers, so StoreLoad can serve as a general-purpose (but more expensive) barrier.
Using these memory barriers, the reordering rules in the table above can be implemented.
To support the final-field rules, a barrier must be added to the final write:
x.finalField = v; StoreStore; sharedRef = x;
Inserting memory barriers
Based on the rules above, barriers can be inserted around volatile fields and the synchronized keyword to satisfy the memory model:
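A rough sketch of the resulting barrier placement around volatile accesses (pseudocode; barrier names as in the JSR-133 cookbook):

```
StoreStore           // barrier before a volatile store
volatileField = 1    // volatile store
StoreLoad            // barrier after a volatile store

r = volatileField    // volatile load
LoadLoad             // barriers after a volatile load
LoadStore
```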
・ Insert a StoreStore barrier before a volatile store.
・ Insert a StoreStore barrier after all final-field writes, but before the constructor returns.
・ Insert a StoreLoad barrier after a volatile store.
・ Insert LoadLoad and LoadStore barriers after a volatile load.
・ monitorenter follows the same rules as a volatile load; monitorexit follows the same rules as a volatile store.

Happens-before
The memory barriers above are fairly complex for developers, so the JMM can instead be explained by a set of happens-before partial-order rules: to guarantee that a thread executing operation B sees the result of operation A (whether or not A and B execute in the same thread), a happens-before relationship must hold between A and B; otherwise, the JVM may reorder them arbitrarily.
Happens-before rule list
The happens-before rules include:
・ Program order rule: if operation A comes before operation B in the program, then A happens-before B within the same thread.
・ Monitor lock rule: an unlock of a monitor lock happens-before every subsequent lock of the same monitor lock.
・ Volatile variable rule: a write to a volatile variable happens-before every subsequent read of that variable.
・ Thread start rule: the call to Thread.start happens-before any operation in the started thread.
・ Thread termination rule: every operation in a thread happens-before another thread detects that the thread has ended.
・ Interruption rule: a thread calling interrupt on another thread happens-before the interrupted thread detects the interrupt.
・ Transitivity: if operation A happens-before operation B, and B happens-before C, then A happens-before C.

Explicit locks have the same memory semantics as monitor locks, and atomic variables have the same memory semantics as volatile. Acquiring and releasing locks, and reading and writing volatile variables, satisfy a total order, so a volatile write can be used to happen-before subsequent volatile reads.
The happens-before rules above can be combined.
For example, after thread A enters a monitor lock, by the program order rule the operations performed before releasing the lock happen-before the release; the release happens-before a subsequent acquisition of the same monitor lock by thread B; and that acquisition happens-before the operations in thread B. By transitivity, A's operations before the release are visible to B after the acquisition.
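This combination can be sketched as follows (class and field names are illustrative): thread A's writes before releasing a monitor become visible to thread B once B acquires the same monitor, by the program order rule, the monitor lock rule, and transitivity.

```java
public class HappensBeforeDemo {
    private final Object lock = new Object();
    private int data = 0;            // plain field, made safe by the lock
    private boolean ready = false;

    void writer() {                  // run by thread A
        synchronized (lock) {
            data = 42;               // happens-before the unlock (program order)
            ready = true;
        }                            // unlock happens-before B's later lock
    }

    int reader() {                   // run by thread B
        while (true) {
            synchronized (lock) {    // lock: establishes happens-before with A's unlock
                if (ready) {
                    return data;     // guaranteed to observe 42
                }
            }
            Thread.yield();          // back off while waiting
        }
    }

    public static void main(String[] args) throws InterruptedException {
        HappensBeforeDemo d = new HappensBeforeDemo();
        Thread b = new Thread(() -> System.out.println(d.reader()));
        Thread a = new Thread(d::writer);
        b.start();
        a.start();
        a.join();
        b.join();
    }
}
```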
Summary
That is all for this article's detailed explanation of the Java Memory Model (JMM). I hope it is helpful to you. If anything is lacking, please leave a comment to point it out. Thank you for your support!