A summary of how the garbage collection (GC) mechanism works and how to make good use of it

1、 Why GC

An application's use of a resource generally goes through the following steps:

1. Allocate memory for the resource

2. Initialize the memory

3. Use the resource

4. Clean up the resource

5. Release the memory

Applications commonly manage resources (memory usage) in one of the following ways:

1. Manual management: C, C++

2. Reference counting: COM

3. Automatic management: .NET, Java, PHP, Go…

However, manual management and reference counting are complex, which easily leads to two typical problems:

1. The programmer forgets to free memory

2. The application accesses memory that has already been released

The consequences are serious: memory leaks, corrupted data, access violations, and, most of the time, program behavior that becomes strange and unpredictable.

.NET, Java, and similar platforms manage memory through an automatic garbage collection (GC) mechanism. With GC, problem 1 is solved naturally, and problem 2 no longer has any basis to occur.

Conclusion: memory management that is not automated is very prone to bugs and harms system stability, especially in an online, multi-server cluster environment. When such a bug appears at run time, it must first be traced to a specific server, and then the memory must be dumped and analyzed. This greatly undermines developers' enthusiasm, and a steady stream of similar bugs is exhausting.

2、 How does GC work

The work of GC is mainly divided into the following steps:

1. Mark

2. Plan

3. Sweep

4. Relocate (update references)

5. Compact

(I) Mark

Goal: find all live instances, that is, instances that are still referenced

Method: find all GC roots and put them in a queue, then recursively traverse every root and every child node it references, directly or indirectly, marking each traversed node as live. Weak references are not taken into account

(II) Plan and sweep

1. Plan

Goal: determine whether compaction is required

Method: walk the mark (live) flags of all objects in the generations being collected and decide according to specific algorithms

2. Sweep

Goal: reclaim all free space

Method: walk the mark (live or dead) flags of all objects in the generations being collected, and add the memory blocks that lie between live instances to the free list

(III) Relocate and compact

1. Relocate (update references)

Goal: update all references to their new addresses

Method: compute the address each instance will have after compaction, find all GC roots and put them in a queue, then recursively traverse every root and every child node it references, and update the addresses stored in all traversed nodes, including weak references

2. Compact

Goal: reduce memory fragmentation

Method: move each instance to its computed new address
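The five steps can be tried out on a toy heap. The following is a minimal sketch in Python, not CLR code: the `Obj` class, the address-keyed `heap` dictionary, and the compaction policy inside `plan` are all assumptions made for illustration, since the real decision algorithms are not described here.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    name: str
    addr: int                                  # current address in the toy heap
    size: int
    refs: list = field(default_factory=list)   # addresses of referenced objects


def mark(heap, roots):
    """Mark: return the addresses of all objects reachable from the roots."""
    live, pending = set(), list(roots)
    while pending:
        addr = pending.pop()
        if addr not in live:
            live.add(addr)
            pending.extend(heap[addr].refs)
    return live


def plan(heap, live):
    """Plan: decide whether to compact (toy policy: a dead block sits below a live object)."""
    highest_live = max(live, default=-1)
    return any(addr not in live and addr < highest_live for addr in heap)


def sweep(heap, live):
    """Sweep: list the (address, size) blocks occupied by dead objects."""
    return [(addr, heap[addr].size) for addr in sorted(heap) if addr not in live]


def new_addresses(heap, live):
    """Compute a forwarding table: old address -> address after compaction."""
    forwarding, next_free = {}, 0
    for addr in sorted(live):
        forwarding[addr] = next_free
        next_free += heap[addr].size
    return forwarding


def relocate_and_compact(heap, live, roots, forwarding):
    """Relocate: rewrite every reference. Compact: move objects to their new addresses."""
    new_heap = {}
    for addr in sorted(live):
        obj = heap[addr]
        obj.refs = [forwarding[r] for r in obj.refs]
        obj.addr = forwarding[addr]
        new_heap[obj.addr] = obj
    return new_heap, [forwarding[r] for r in roots]


def collect(heap, roots):
    """Run one full cycle and return (new heap, new roots, reclaimed blocks)."""
    live = mark(heap, roots)
    free_blocks = sweep(heap, live)
    if not plan(heap, live):
        return {a: heap[a] for a in live}, list(roots), free_blocks
    forwarding = new_addresses(heap, live)
    new_heap, new_roots = relocate_and_compact(heap, live, roots, forwarding)
    return new_heap, new_roots, free_blocks
```

In this model, an object that is referenced only by other dead objects is still reclaimed, because marking starts from the roots rather than from reference counts.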

3、 Root node of GC

What is the GC root node that appears repeatedly in this article?

Each application contains a set of roots. Each root is a storage location that contains a pointer to an object of reference type. The pointer either refers to an object in the managed heap or is null.

In an application, as long as an object becomes unreachable, that is, there is no root reference to the object, the object will become the target of the garbage collector.
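The reachability rule can be observed in any runtime that has a tracing collector. The sketch below uses CPython's `gc` and `weakref` modules as a stand-in, so it shows the principle rather than .NET behavior (CPython combines reference counting with a cycle collector). The `Node` class and `cycle_demo` function are hypothetical names.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


def cycle_demo():
    """Return (alive before collection, alive after collection) for an unreachable cycle."""
    was_enabled = gc.isenabled()
    gc.disable()                     # make the timing of the collection explicit
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a      # a and b reference each other
        probe = weakref.ref(a)       # weak reference: observes a without keeping it alive
        del a, b                     # no root refers to the pair any more
        alive_before = probe() is not None
        gc.collect()                 # the tracing pass finds the unreachable cycle
        alive_after = probe() is not None
        return alive_before, alive_after
    finally:
        if was_enabled:
            gc.enable()
```

The two objects still reference each other after `del`, yet neither is reachable from a root, so the collector reclaims both. The weak reference does not count as a reference that keeps an object alive.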

GC roots are not objects themselves but references to objects. Moreover, any object referenced by a GC root automatically survives the next garbage collection. In .NET, the following can serve as GC roots:

1. Global variables

2. Static variables

3. All local variables on the stack (JIT)

4. Parameter variables passed in on the stack

5. Variables in registers

Note that only variables of reference type are considered roots; variables of value type are never considered roots. Only with a solid understanding of how memory allocation and management differ between reference types and value types does it become clear why a root can only be a reference type.

Incidentally, in Java the following objects can serve as GC roots:

1. Objects referenced from the virtual machine (JVM) stack

2. Objects referenced by static class fields in the method area

3. Objects referenced by constants in the method area (mainly constant values declared as final)

4. Objects referenced by JNI in the native method stack

4、 When does GC happen

1. The application allocates a new object and the budget of a GC generation has reached its threshold, for example when generation 0 is full

2. Code explicitly calls System.GC.Collect()

3. Other special cases, such as Windows reporting low memory, the CLR unloading an AppDomain, the CLR shutting down, and, in some extreme cases, even changes to system parameter settings, may trigger a collection
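The first trigger, an allocation that exceeds the generation 0 budget, can be sketched as follows. This is a toy model in Python: the `Gen0Allocator` class, its budget value, and the assumption that a collection empties generation 0 completely are illustrative and do not reflect the CLR's actual budget tuning.

```python
class Gen0Allocator:
    """Toy allocator that collects generation 0 when its budget would be exceeded."""

    def __init__(self, budget):
        self.budget = budget       # bytes allowed in generation 0 between collections
        self.used = 0
        self.collections = 0

    def allocate(self, size):
        """Allocate size bytes; return True if this allocation triggered a collection."""
        triggered = False
        if self.used + size > self.budget:
            self.collections += 1
            self.used = 0          # toy assumption: gen 0 is fully reclaimed or promoted
            triggered = True
        self.used += size
        return triggered
```

With a budget of 100 bytes, three 40-byte allocations trigger one collection on the third call, because 80 + 40 exceeds the budget.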

5、 Generation in GC

Generations are introduced mainly to improve performance by avoiding collection of the whole heap. A generational garbage collector makes the following assumptions:

1. The newer an object is, the shorter its lifetime

2. The older an object is, the longer its lifetime

3. Collecting part of the heap is faster than collecting the whole heap

The .NET garbage collector divides objects into three generations (generation 0, generation 1, generation 2). Their contents are as follows:

1. G0: small objects (size < 85,000 bytes)

2. G1: G0 objects that survived a GC

3. G2: large objects (size >= 85,000 bytes) and G1 objects that survived a GC

PS: the CLR requires all resources to be allocated from the managed heap. The CLR manages two kinds of heap, the small object heap (SOH) and the large object heap (LOH). Every allocation of 85,000 bytes or more is made on the LOH. An interesting question is: why 85,000 bytes?

Generation collection rule: after generation n is collected, the objects in it that survive are marked as generation n + 1 objects (generation 2 survivors stay in generation 2). GC checks objects of different generations at different frequencies to optimize performance. Generation 0 objects are checked in every GC cycle. Roughly 1 in 10 GC cycles checks generation 0 and generation 1 objects. Roughly 1 in 100 GC cycles checks all objects.
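The promotion rule and the rough 1/10 and 1/100 ratios can be put into a small model. This is a sketch in Python under simplifying assumptions: the `ToyHeap` class is hypothetical, the ratios are treated as exact cycle counts, and liveness is passed in by the caller instead of being computed from roots.

```python
LOH_THRESHOLD = 85_000  # bytes; allocations at or above this size go to the LOH


def initial_generation(size):
    """Large objects are treated as generation 2 from the start."""
    return 2 if size >= LOH_THRESHOLD else 0


def generations_to_collect(cycle):
    """Generations examined by the cycle-th GC (1-based), using exact 1/10 and 1/100 ratios."""
    if cycle % 100 == 0:
        return [0, 1, 2]
    if cycle % 10 == 0:
        return [0, 1]
    return [0]


class ToyHeap:
    def __init__(self):
        self.objects = {}   # object name -> generation
        self.cycle = 0

    def allocate(self, name, size):
        self.objects[name] = initial_generation(size)

    def collect(self, live_names):
        """Collect the due generations; survivors move up one generation (capped at 2)."""
        self.cycle += 1
        due = generations_to_collect(self.cycle)
        for name, gen in list(self.objects.items()):
            if gen not in due:
                continue                      # older generations are not examined
            if name in live_names:
                self.objects[name] = min(gen + 1, 2)
            else:
                del self.objects[name]
        return due
```

The model also shows the cost of promotion: once an object reaches generation 1, it stays in memory through the following generation 0 collections even if nothing references it any more.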

6、 Caution: explicitly calling GC

GC is usually very expensive, and when it runs is non-deterministic. Microsoft's programming guidelines strongly recommend that you do not call GC explicitly. Even so, some of the framework's GC methods can still be used for manual collection in your own code, provided that you thoroughly understand how GC collection works. Otherwise, calling GC manually can easily interfere with normal collection and, in specific scenarios, even introduce unpredictable errors.

For example, consider a method that creates two objects, o1 and o2, and calls GC.Collect() while o2 is still in use.

Without GC.Collect(), both o1 and o2 would be reclaimed from generation 0 at the next automatic garbage collection. With GC.Collect(), o2 survives the forced collection and is marked as generation 1, so a generation 0 collection can no longer reclaim the memory occupied by o2.

In other cases, nonstandard programming may lead to a deadlock. A widely circulated example involves a class, MyClass, whose destructor uses lock (this), and client code that locks an instance with Monitor.Enter(instance) and then forces a collection while still holding the lock.

This code deadlocks. The cause is as follows:

1. The client's main thread calls Monitor.Enter(instance), which locks the instance

2. GC is then triggered manually, and the finalizer thread executes the MyClass destructor

3. Inside the MyClass destructor, lock (this) is used, but the main thread has not yet released instance (which is this here), so the finalizer thread can only wait. If the main thread in turn waits for finalization to complete before releasing the lock, neither thread can proceed

Strictly speaking, this problem is not the fault of GC. Although the code appears to have nothing to do with multithreading, the deadlock is caused by incorrect use of lock across the main thread and the finalizer thread.
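The lock ordering in the three steps can be reproduced without the CLR. The following Python sketch is only a model: a `threading.Lock` stands in for the instance's monitor, a worker thread stands in for the finalizer thread, and a 0.2 second timeout (an assumption added so the demonstration terminates) replaces the indefinite wait of a real deadlock.

```python
import threading


def run_scenario(release_before_finalize):
    """Return True if the simulated finalizer managed to take the instance lock."""
    instance_lock = threading.Lock()   # stands in for the monitor of `instance`
    outcome = []

    def finalizer():
        # Models the destructor's lock (this): it needs the same monitor.
        acquired = instance_lock.acquire(timeout=0.2)
        outcome.append(acquired)
        if acquired:
            instance_lock.release()

    instance_lock.acquire()            # main thread: Monitor.Enter(instance)
    if release_before_finalize:
        instance_lock.release()        # correct usage: release before finalization runs

    worker = threading.Thread(target=finalizer)
    worker.start()
    worker.join()                      # main thread waits for the finalizer to finish

    if not release_before_finalize:
        instance_lock.release()        # too late: the finalizer has already given up
    return outcome[0]
```

`run_scenario(False)` models the faulty code, where the finalizer cannot obtain the lock while the main thread holds it and waits. `run_scenario(True)` shows that releasing the lock first removes the problem.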

Also note that some GC behavior differs completely between debug and release builds (Jeffrey Richter uses a timer example in CLR via C# to illustrate this). For the code above, for example, you may find that it runs normally in debug mode but deadlocks in release mode.

7、 When GC encounters multithreading

The garbage collection algorithm discussed so far rests on a major assumption: only one thread is running. In real development, multiple threads often access the managed heap at the same time, or at least manipulate objects in the heap at the same time. When one thread triggers a garbage collection, the other threads must not access any objects, because the garbage collector may move those objects and change their memory locations. When the CLR wants to perform a garbage collection, it immediately suspends all threads that are executing managed code; threads executing unmanaged code are not suspended. The CLR then examines each thread's instruction pointer to determine where the thread is executing, and compares the instruction pointer with a table generated by the JIT compiler to determine what code the thread is running.

If a thread's instruction pointer is exactly at an offset marked in the table, the thread has reached a safe point. A thread can safely remain suspended at a safe point until the garbage collection ends. If the instruction pointer is not at an offset marked in the table, the thread is not at a safe point, and the CLR cannot start the garbage collection yet. In this case, the CLR hijacks the thread: it modifies the thread's stack so that the return address points to a special function inside the CLR, and then resumes the thread. When the current method returns, this special function runs and safely suspends the thread. However, a thread sometimes stays in the current method for a long time. So after resuming the thread, the CLR waits about 250 milliseconds for the hijack to take effect. After this time, the CLR suspends the thread again and checks its instruction pointer. If the thread has reached a safe point, garbage collection can begin. If it has not, the CLR checks whether the thread has called another method. If so, the CLR modifies the thread's stack again so that the thread is hijacked when it returns from the most recently called method. The CLR then resumes the thread and waits for the next hijacking attempt. Garbage collection cannot begin until every thread has reached a safe point or has been hijacked. After the garbage collection, all threads are resumed, the application continues to run, and the hijacked threads return to the methods that originally called them.

In practice, the CLR mostly suspends threads by hijacking them rather than by using the JIT-generated tables to determine whether a thread has reached a safe point. The reason is that the JIT-generated tables require a lot of memory, which enlarges the working set and seriously affects performance.
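The requirement that every thread be parked before collection starts can be sketched with cooperative safe points. This Python model is an assumption-laden simplification: worker threads poll a flag at chosen points and park themselves, which illustrates the "all threads stopped at a safe point" condition but not hijacking, since modifying return addresses has no equivalent here. `SafepointCoordinator` and `safepoint_demo` are hypothetical names.

```python
import threading
import time


class SafepointCoordinator:
    def __init__(self, n_threads):
        self.n_threads = n_threads
        self.cond = threading.Condition()
        self.gc_requested = False
        self.parked = 0

    def safepoint(self):
        """Called by workers at points where it is safe to be suspended."""
        with self.cond:
            if not self.gc_requested:
                return
            self.parked += 1
            self.cond.notify_all()         # tell the collector one more thread is parked
            while self.gc_requested:
                self.cond.wait()           # stay parked until the collection ends
            self.parked -= 1

    def stop_the_world(self, collect):
        """Wait until every worker is parked, run collect(), then resume the workers."""
        with self.cond:
            self.gc_requested = True
            while self.parked < self.n_threads:
                self.cond.wait()
            result = collect()             # no worker can run while this executes
            self.gc_requested = False
            self.cond.notify_all()
        return result


def safepoint_demo():
    """Return True if no worker changed shared state during the simulated collection."""
    coordinator = SafepointCoordinator(2)
    done = threading.Event()
    counters = [0, 0]

    def worker(index):
        while not done.is_set():
            counters[index] += 1           # "managed code" mutating the heap
            coordinator.safepoint()

    def collect():
        snapshot = list(counters)
        time.sleep(0.05)                   # pretend to mark, sweep and compact
        return snapshot == list(counters)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
    for thread in threads:
        thread.start()
    frozen = coordinator.stop_the_world(collect)
    done.set()
    for thread in threads:
        thread.join()
    return frozen
```

The collector in this sketch cannot begin until the parked count equals the number of worker threads, which mirrors the condition stated above.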

Here is another real case. After a web application began using a large number of tasks, inexplicable behavior appeared in the production environment: the program worked only intermittently. The database log (Event Tracing for Windows (ETW), IIS logs, and dump files could also have been used) showed unhandled exceptions occurring irregularly during task execution. After analysis, CLR garbage collection was suspected as the cause. Of course, this situation only shows up under high concurrency.

8、 Some suggestions and opinions in development

Because GC is expensive, good programming habits in everyday development can have a positive effect on GC, while bad habits can have an adverse one.

1. Try not to allocate large objects. Large objects (>= 85,000 bytes) go directly into generation 2. By default the GC does not compact the large object heap (LOH), because moving memory blocks of 85,000 bytes or more within the heap would cost too much CPU time

2. Do not frequently allocate objects with a short life cycle. Doing so causes frequent garbage collection and frequent compaction, and may leave a lot of memory fragmentation. A well-designed, stable object pool can avoid this problem

3. Use better programming techniques, such as better algorithms, better data structures, and better solutions

Update: .NET Framework 4.5.1 and later support compacting the large object heap. System.Runtime.GCSettings.LargeObjectHeapCompactionMode controls whether the LOH is compacted.
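Suggestion 2 mentions object pools. A minimal sketch of the idea in Python follows. The `ObjectPool` class, its `factory` parameter, and the `max_size` cap are illustrative assumptions, and the sketch is not thread-safe, so it is not a production design.

```python
class ObjectPool:
    """Reuse released objects instead of allocating a new one for every request."""

    def __init__(self, factory, max_size):
        self._factory = factory      # callable that creates a fresh object
        self._free = []              # released objects waiting to be reused
        self._max_size = max_size    # cap so the pool itself does not hoard memory
        self.created = 0

    def acquire(self):
        if self._free:
            return self._free.pop()
        self.created += 1
        return self._factory()

    def release(self, obj):
        # Callers must reset the object's state before or after reuse.
        if len(self._free) < self._max_size:
            self._free.append(obj)
```

A caller would acquire a buffer, use it, and release it. As long as demand stays within `max_size`, repeated requests create no new garbage.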

In my experience, the programming idea of trading space for time should not be applied indiscriminately. Used badly, it not only fails to guarantee system performance but may also lead to out-of-memory (OOM) errors. On OOM, see an earlier article of mine on effectively preventing OOM in .NET applications.

While maintaining a system some time ago, I found a lot of logic that processed large amounts of data without any batching or paging. As the data volume kept growing, the hidden problems kept surfacing. When I rewrote it, I designed and implemented the processing in batches over multiple passes. Combined with multithreading, multiple processes, and distributed clustering, however large the data volume grows it can be handled without performance declining, and the system becomes more stable and reliable.
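The batching idea can be sketched as a small generator that hands out fixed-size chunks, so that only one batch is held in memory at a time. The function name `batches` and the batch size are illustrative; the rewritten system's actual code is not shown in this article.

```python
from itertools import islice


def batches(items, batch_size):
    """Yield successive lists of at most batch_size items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Each batch becomes garbage as soon as it has been processed, which keeps the live set small regardless of the total data volume.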

9、 GC thread and finalizer thread

GC runs on a separate thread and reclaims memory that is no longer referenced.

Finalizers run on another dedicated (high-priority CLR) thread, which executes the finalizers of objects that have them.

An object's finalizer is executed at some unspecified time after the object is no longer referenced. Unlike in C++, the destructor is not executed immediately when the object reaches the end of its lifetime.

GC puts each object that needs finalization into a queue (moving it from the finalization list to the freachable queue), and a thread other than the GC thread then executes all of these finalizers. Meanwhile, the GC thread continues reclaiming the other objects to be collected.

In the next GC cycle, the memory of the objects whose finalizers have run is reclaimed. In other words, an object that implements the Finalize method needs two GCs before it is completely released. This also means that an object with a Finalize method (the default one on Object does not count) automatically has its lifetime "extended" by GC.
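The two-collection lifetime of a finalizable object can be traced in a toy model. This Python sketch rests on assumptions: the class `ToyFinalizingHeap` is hypothetical, objects are represented by names, and the finalizer thread is replaced by an explicit `run_finalizers` call.

```python
class ToyFinalizingHeap:
    def __init__(self):
        self.heap = set()                # names of objects that occupy memory
        self.finalization_list = set()   # objects whose finalizer has not been queued yet
        self.freachable = []             # queue drained by the finalizer thread
        self.finalized = []

    def allocate(self, name, has_finalizer=False):
        self.heap.add(name)
        if has_finalizer:
            self.finalization_list.add(name)

    def collect(self, live_names):
        # The freachable queue acts as a root, so queued objects survive.
        reachable = set(live_names) | set(self.freachable)
        for name in sorted(self.heap - reachable):
            if name in self.finalization_list:
                # First GC: queue the object for finalization instead of freeing it.
                self.finalization_list.remove(name)
                self.freachable.append(name)
            else:
                self.heap.remove(name)

    def run_finalizers(self):
        """Stands in for the finalizer thread draining the freachable queue."""
        while self.freachable:
            self.finalized.append(self.freachable.pop(0))
```

In this model a plain object is freed by the first collection, whereas an object with a finalizer is only queued by the first collection and freed by the second one, after its finalizer has run.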

Special note: the thread that calls Finalize does not guarantee the order in which the objects' Finalize methods are called, which can lead to subtle dependency problems (CLR via C# describes an interesting one).

Reproduced at: https://www.cnblogs.com/jeffwongishandsome/p/talk-about-GC-and-how-to-use-GC-better.html
