Java Memory Model and garbage collection analysis of overflow of Java method area and runtime constant pool
1. Java Memory Model
When executing a program, the Java virtual machine divides the memory it manages into several data areas. The distribution of these data areas is shown in the following figure:
2. How are garbage objects determined
There are several object instances in the Java heap. Before recycling the heap, the garbage collector first needs to determine which objects are still "alive" and which are "dead", that is, objects that will not be used in any way.
Reference counting method
The implementation of reference counting method is simple and efficient. It is a good algorithm in most cases. Its principle is: add a reference counter to the object. Whenever there is a reference to the object, the counter will increase by 1. When the reference fails, the counter will decrease by 1. When the counter value is 0, it means that the object is no longer used. It should be noted that the reference counting method is difficult to solve the problem of circular references between objects. The mainstream Java virtual machines do not choose the reference counting method to manage memory.
Reachability analysis algorithm
The basic idea of this algorithm is to search down from these nodes through a series of objects called "GC roots" as the starting point, The search path is called reference chain. When an object is connected to GC roots without any reference chain (in graph theory, it means that the object is unreachable from GC roots). As shown in the figure, although objects 5, 6 and 7 are related to each other, they are unreachable to GC roots, so they will be judged as recyclable objects.
In the Java language, objects that can be used as GC roots include the following:
Now the question is, will there be a circular reference problem between objects in the reachability analysis algorithm? The answer is yes, that is, there will be no circular reference between objects. GC root is a specially defined "starting point" outside the object graph and cannot be referenced by objects in the object graph. For details, see: http://www.zhihu.com/question/29218534 。
To die or not to die
Even in the reachability analysis algorithm, unreachable objects are not "non dead". At this time, they are temporarily in the "Probation" stage. To really declare an object dead, it needs to go through at least two marking processes: if the object does not have a reference chain connected to GC roots after reachability analysis, Then it will be marked for the first time and filtered based on whether the object needs to execute the finalize () method. When the object does not overwrite the finalize () method, or the finalize () method has been called by the virtual machine, the virtual machine regards both cases as "unnecessary execution". In the program, you can override finalize() to a "thrilling" self rescue process, but this is only one chance.
The operation result is:
Go on to quote
Whether the number of references of an object is determined by reference counting algorithm, whether the reference chain of an object is reachable by reachability analysis algorithm, and whether the object survives are related to "reference". Before JDK 1.2, the definition of reference in java was very traditional: if the value stored in the data of reference type represents the starting address of another memory, it is said that this memory represents a reference. After JDK 1.2, Java expanded the concept of reference and divided it into four kinds: strong reference, soft reference, weak reference and phantom reference. The strength of these four kinds of references gradually weakened in turn.
Examples of using soft references:
The output result is:
3. Typical garbage collection algorithm
1. Mark sweep algorithm
This is the most basic garbage collection algorithm. The reason why it is the most basic is that it is the easiest to implement and the simplest idea. The mark clear algorithm is divided into two stages: Mark stage and clear stage. The task of the marking phase is to mark all objects that need to be recycled. The cleaning phase is to recycle the space occupied by the marked objects. The specific process is shown in the figure below:
It is easy to see from the figure that the mark clear algorithm is easy to implement, but there is a serious problem that it is easy to generate memory fragments. Too many fragments may lead to the inability to find enough space for large objects in the subsequent process, and trigger a new garbage collection action.
2. Copying algorithm
In order to solve the defects of mark sweep algorithm, copying algorithm is proposed. It divides the available memory into two equal sized blocks according to capacity, and only one of them is used at a time. When this block of memory runs out, copy the living objects to another block, and then clean up the used memory space at one time. In this way, it is not easy to have the problem of memory fragmentation. The specific process is shown in the figure below:
Although this algorithm is simple to implement, efficient and not easy to generate memory fragments, it makes a high price for the use of memory space, because the memory that can be used is reduced to half of the original.
Obviously, the efficiency of copying algorithm is closely related to the number of surviving objects. If there are many surviving objects, the efficiency of copying algorithm will be greatly reduced.
3. Mark compact algorithm
In order to solve the defects of copying algorithm and make full use of memory space, mark compact algorithm is proposed. The marking stage of the algorithm is the same as that of mark sweep, but after marking, it does not directly clean up recyclable objects, but moves all living objects to one end, and then cleans up the memory outside the end boundary. The specific process is shown in the figure below:
4. Generational collection algorithm
Generational collection algorithm is adopted by most JVM garbage collectors at present. Its core idea is to divide the memory into several different regions according to the life cycle of the object. Generally, the heap area is divided into the aged generation and the young generation. The characteristic of the old generation is that only a small number of objects need to be recycled during each garbage collection, while the characteristic of the new generation is that a large number of objects need to be recycled during each garbage collection, so the most suitable collection algorithm can be adopted according to the characteristics of different generations.
At present, most garbage collectors adopt the copying algorithm for the new generation, because most objects are recycled every time garbage collection in the new generation, that is, there are fewer operations to copy, but in practice, the space of the new generation is not divided according to the ratio of 1:1, Generally speaking, the Cenozoic era is divided into a larger Eden space and two smaller survivor spaces (generally 8:1:1). Eden space and one survivor space are used each time. When recycling, the surviving objects in Eden and survivor are copied to another survivor space, and then Eden and the survivor space just used are cleaned up.
Because of the characteristics of the old age, only a small number of objects are recycled each time, the mark compact algorithm is generally used.
reference material
1. In depth understanding of chapters 2 and 3 of Java virtual machine
2. Java garbage collection mechanism
3. Analysis of overflow of Java method area and runtime constant pool