Java garbage collection mechanism

When it comes to garbage collection (GC), many people naturally associate it with Java. In Java, programmers don't need to care about dynamic memory allocation and garbage collection, which are all handled by the JVM.

As the name suggests, garbage collection is to free up the space occupied by garbage. What kind of objects will be recognized as "garbage" in Java? Then, when some objects are identified as garbage, what strategies are used to recycle (free up space)? What are the typical garbage collectors in the current commercial virtual machine? Let's discuss these problems one by one. The following is the outline of this article:

How to determine that an object is "garbage"? Typical garbage collection algorithm typical garbage collector

I How to determine that an object is "garbage"?

In this section, we first understand the most basic problem: if an object is determined to be "garbage"? Since the task of the garbage collector is to reclaim the space occupied by garbage objects for new objects, how does the garbage collector determine that an object is "garbage"? That is, how to judge whether an object can be recycled.

In Java, objects are associated by reference, that is, if you want to manipulate objects, you must do so by reference. Obviously, a simple way is to judge whether an object can be recycled by reference counting. Without losing generality, if an object has no reference associated with it, it means that the object is basically unlikely to be used elsewhere, then the object will become a recyclable object. This method is called reference counting.

This method is characterized by simple implementation and high efficiency, but it cannot solve the problem of circular reference. Therefore, this method is not adopted in Java (Python adopts reference counting method). See the following code:

The last two sentences assign object1 and object2 null, that is, the objects pointed to by object1 and object2 can no longer be accessed, but because they refer to each other, their reference count is not 0, so the garbage collector will never recycle them.

In order to solve this problem, reachability analysis is adopted in Java. The basic idea of this method is to search through a series of "GC roots" objects as the starting point. If there is no reachable path between "GC roots" and an object, the object is said to be unreachable. However, it should be noted that the object determined to be unreachable does not necessarily become a recyclable object. The object determined to be unreachable must go through at least two marking processes to become a recyclable object. If it still does not escape the possibility of becoming a recyclable object in these two marking processes, it will basically become a recyclable object.

As for the specific operation of reachability analysis, I don't understand it at the moment. If any friend knows it better, please don't hesitate to give me advice.

Here is an example:

What line might make an object recyclable? The code in line 7 causes an object to become recyclable. As for why it is left to the readers to think for themselves.

Take another example:

Which of these three sentences will make the string object recyclable? In the second and third sentences, the second sentence will judge the string object as recyclable when the memory is insufficient, and the third sentence will judge the string object as recyclable under any circumstances.

Finally, summarize the common cases of judging objects as recyclable objects:

1) Explicitly assign a reference to null or point a reference that already points to an object to a new object, such as the following code:

2) The object pointed to by the local reference, such as the following code:

Each time the loop is executed, the generated object object will become a recyclable object.

3) Only weakly references the objects associated with it, such as:

II Typical garbage collection algorithms

After determining which garbage can be recycled, what the garbage collector has to do is start garbage collection, but there is a problem involved: how to recycle garbage efficiently. Because the Java virtual machine specification does not make clear provisions on how to implement the garbage collector, the virtual machines of various manufacturers can implement the garbage collector in different ways. Therefore, only the core ideas of several common garbage collection algorithms are discussed here.

1. Mark sweep algorithm

This is the most basic garbage collection algorithm. The reason why it is the most basic is that it is the easiest to implement and the simplest idea. The mark clear algorithm is divided into two stages: Mark stage and clear stage. The task of the marking phase is to mark all objects that need to be recycled. The cleaning phase is to recycle the space occupied by the marked objects. The specific process is shown in the figure below:

It is easy to see from the figure that the mark clear algorithm is easy to implement, but there is a serious problem that it is easy to generate memory fragments. Too many fragments may lead to the inability to find enough space for large objects in the subsequent process, and trigger a new garbage collection action.

2. Copying algorithm

In order to solve the defects of mark sweep algorithm, copying algorithm is proposed. It divides the available memory into two equal sized blocks according to capacity, and only one of them is used at a time. When this block of memory runs out, copy the living objects to another block, and then clean up the used memory space at one time. In this way, it is not easy to have the problem of memory fragmentation. The specific process is shown in the figure below:

Although this algorithm is simple to implement, efficient and not easy to generate memory fragments, it makes a high price for the use of memory space, because the memory that can be used is reduced to half of the original.

Obviously, the efficiency of copying algorithm is closely related to the number of surviving objects. If there are many surviving objects, the efficiency of copying algorithm will be greatly reduced.

3. Mark compact algorithm

In order to solve the defects of copying algorithm and make full use of memory space, mark compact algorithm is proposed. The marking stage of the algorithm is the same as that of mark sweep, but after marking, it does not directly clean up recyclable objects, but moves all living objects to one end, and then cleans up the memory outside the end boundary. The specific process is shown in the figure below:

4. Generational collection algorithm

Generational collection algorithm is adopted by most JVM garbage collectors at present. Its core idea is to divide the memory into several different regions according to the life cycle of the object. Generally, the heap area is divided into the aged generation and the young generation. The characteristic of the old generation is that only a small number of objects need to be recycled during each garbage collection, while the characteristic of the new generation is that a large number of objects need to be recycled during each garbage collection, so the most suitable collection algorithm can be adopted according to the characteristics of different generations.

At present, most garbage collectors adopt the copying algorithm for the new generation, because most objects are recycled every time garbage collection in the new generation, that is, there are fewer operations to copy, but in practice, the space of the new generation is not divided according to the ratio of 1:1. Generally speaking, the Cenozoic era is divided into a larger Eden space and two smaller survivor spaces. Eden space and one survivor space are used each time. When recycling, the surviving objects in Eden and survivor are copied to another survivor space, and then Eden and the survivor space just used are cleaned up.

Because of the characteristics of the old age, only a small number of objects are recycled each time, the mark compact algorithm is generally used.

Note that there is another generation outside the heap, namely the permanent generation, which is used to store class classes, constants, method descriptions, etc. the recycling of the permanent generation mainly recycles two parts: discarded constants and useless classes.

III Typical garbage collector

Garbage collection algorithm is the theoretical basis of memory recycling, and garbage collector is the specific implementation of memory recycling. The following describes several garbage collectors provided by the hotspot (JDK 7) virtual machine. Users can combine the collectors used in various years according to their own needs.

1.Serial/Serial Old

The serial / serial old collector is the most basic and oldest collector. It is a single threaded collector, and all user threads must be suspended during garbage collection. The serial collector is for the new generation of collectors, using the copying algorithm, and the serial old collector is for the old generation of collectors, using the mark compact algorithm. Its advantage is simple and efficient, but its disadvantage is that it will bring pause to users.

2.ParNew

The parnew collector is a multi-threaded version of the serial collector that uses multiple threads for garbage collection.

3.Parallel Scavenge

Parallel scavenge collector is a new generation of multi-threaded collector (parallel collector). It does not need to pause other user threads during recycling. It uses the copying algorithm. This collector is different from the first two collectors. It is mainly to achieve a controllable throughput.

4.Parallel Old

Parallel old is an older version of parallel scavenge collector (parallel collector), which uses multithreading and mark compact algorithm.

5.CMS

CMS (current mark sweep) collector is a collector aiming at obtaining the shortest recovery pause time. It is a concurrent collector using mark sweep algorithm.

6.G1

G1 collector is the most advanced achievement in the development of collector technology. It is a collector for server-side applications. It can make full use of multi CPU and multi-core environment. Therefore, it is a parallel and concurrent collector, and it can establish a predictable pause time model.

Here is a supplement about memory allocation:

Generally speaking, the memory allocation of objects is allocated on the heap. Objects are mainly allocated in the new generation of Eden space and from space. In a few cases, they are directly allocated to the old generation. If the space of the new generation Eden space and from space is insufficient, a GC will be initiated. If the Eden space and from space can accommodate the object after GC, it will be placed in Eden space and from space.

During GC, live objects in Eden space and from space are moved to to to space, and then Eden space and from space are cleaned up. If to space cannot store an object enough during cleanup, the object will be moved to the older generation. After GC, Eden space and to space are used. During the next GC, the living objects will be copied to from space and repeated. When an object evades GC once in the survivor area, its object age will be increased by 1. By default, if the object reaches the age of 15, it will be moved to the elderly generation.

Generally speaking, large objects are directly allocated to the old age. The so-called large objects refer to objects that require a large amount of continuous storage space. The most common large objects are large arrays, such as:

This generally allocates storage space directly to the older generation.

Of course, the allocation rules are not 100% fixed, which depends on which garbage collector combination is currently used and the relevant parameters of the JVM.

The above is the whole content of this article. I hope it will be helpful to your study, and I hope you can support programming tips.

The content of this article comes from the network collection of netizens. It is used as a learning reference. The copyright belongs to the original author.
THE END
分享
二维码
< <上一篇
下一篇>>