Deep understanding of garbage collection in Java
premise
Recently, due to the large volume of system business, from the production GC log (combined with pinpoint), it is necessary to perform GC tuning on some systems. However, in view of the fact that we did not specialize in this area in the past, but there has always been scattered accumulation, here is a relatively comprehensive summary. This article only focuses on the hotspot VM, that is, Oracle hotspot VM or openjdk hotspot VM. The version is java8, and other VMS are not necessarily applicable.
What is GC (garbage collection)
Garbage collection can be translated as "garbage collection" -- generally subjectively, it is thought that the practice is to find the garbage and throw it away. In VM, the implementation process of GC is just the opposite. The purpose of GC is to track all objects in use, mark the remaining objects as garbage, and then the objects marked as garbage will be cleared to reclaim the memory occupied by these garbage objects, so as to realize the automatic management of memory.
Generational hypothesis
The weak generational hypothesis has been confirmed in various types of programming paradigms or programming languages, but the evidence provided by the strong generational hypothesis is not sufficient, and the view is still controversial.
The main design purpose of the generational garbage collector is to reduce the pause time of the recycling process and improve the spatial throughput. If the replication algorithm is used to recycle young generation objects, the expected pause time largely depends on the total number of objects that survive after minor collection, and this value depends on the overall space of the young generation.
If the overall space of the younger generation is too small, although the process of one-time recycling is relatively fast, due to the short interval between two recycling, the younger generation objects may not have enough time to "reach death", resulting in a small amount of recycled memory, which may lead to the following situations:
The designer of generational garbage collector is an art of balancing the above aspects:
Based on the weak generation hypothesis, the JVM divides the heap memory into young generation and old generation, and the old generation is sometimes called tenured.
JVM provides different garbage collection algorithms for different generations. In fact, objects in different generations may refer to each other, and these referenced objects will also be regarded as GC roots during generation garbage collection (see the analysis in the next section). The weak generation hypothesis may not be applicable to some applications in specific scenarios; The GC algorithm is optimized for young or old objects. For objects with "medium" life expectancy, the garbage collection performance of the JVM is relatively inferior.
Object activation algorithm
In the JVM, reachability analysis is used to determine whether an object is alive or not. The basic idea of this algorithm is to use the active references of some columns called GC roots (GC root set) as the starting point, and start to search downward from these set nodes. The search path is called reference chain. When an object is not connected to GC roots by any reference chain, it indicates that the object is unreachable.
What exactly does GC roots mean? This can be summarized from the implementation of parallel scavenge source code of hotspot VM. Refer to pstasks.com of jdk9 branch HPP and pstasks cpp:
// psTasks.hpp
class ScavengeRootsTask : public GCTask {
public:
enum RootType {
universe = 1,jni_handles = 2,threads = 3,object_synchronizer = 4,flat_profiler = 5,system_dictionary = 6,class_loader_data = 7,management = 8,jvmti = 9,code_cache = 10
};
private:
RootType _root_type;
public:
ScavengeRootsTask(RootType value) : _root_type(value) {}
char* name() { return (char *)"scavenge-roots-task"; }
virtual void do_it(GCTaskManager* manager,uint which);
};
// psTasks.cpp
void ScavengeRootsTask::do_it(GCTaskManager* manager,uint which) {
assert(ParallelScavengeHeap::heap()->is_gc_active(),"called outside gc");
PSPromotionManager* pm = PSPromotionManager::gc_thread_promotion_manager(which);
PSScavengeRootsClosure roots_closure(pm);
PSPromoteRootsClosure roots_to_old_closure(pm);
switch (_root_type) {
case universe:
Universe::oops_do(&roots_closure);
break;
case jni_handles:
JNIHandles::oops_do(&roots_closure);
break;
case threads:
{
ResourceMark rm;
Threads::oops_do(&roots_closure,NULL);
}
break;
case object_synchronizer:
ObjectSynchronizer::oops_do(&roots_closure);
break;
case flat_profiler:
FlatProfiler::oops_do(&roots_closure);
break;
case system_dictionary:
SystemDictionary::oops_do(&roots_closure);
break;
case class_loader_data:
{
PSScavengeKlassClosure klass_closure(pm);
ClassLoaderDataGraph::oops_do(&roots_closure,&klass_closure,false);
}
break;
case management:
Management::oops_do(&roots_closure);
break;
case jvmti:
jvmtiExport::oops_do(&roots_closure);
break;
case code_cache:
{
MarkingCodeBlobClosure each_scavengable_code_blob(&roots_to_old_closure,CodeBlobToOopClosure::FixRelocations);
CodeCache::scavenge_root_nmethods_do(&each_scavengable_code_blob);
AOTLoader::oops_do(&roots_closure);
}
break;
default:
fatal("UnkNown root type");
}
// Do the real work
pm->drain_stacks(false);
}
Since there are few comments in the source code of hotspot VM, you can only refer to some materials and the specific implementation of the source code method to guess the specific composition of GC roots:
There are other possible references:
StringTable::oops_ Do: all resident strings (in stringtable).
Memory pool in JVM
The JVM divides the memory pool into multiple regions. The composition and basic functions of each region are described below to facilitate understanding how garbage collection plays its role in different memory pool spaces when introducing the GC algorithm.
Eden
Eden, also known as the garden of Eden, is an ordinary memory area for object allocation when creating objects. Eden is further divided into one or more thread local allocation buffers (TLAB) residing in Eden space. TLAB is thread exclusive. JVM allows threads to allocate directly in the corresponding TLAB when creating most objects, which can avoid the performance overhead caused by synchronization between multiple threads.
When object allocation cannot be performed in TLAB (generally, there is not enough space in the buffer), the object allocation operation will be performed in the shared space (common area) in Eden. If the entire Eden does not have enough space, YGC (young generation garbage collection) will be triggered to free up more space in Eden. If there is still not enough memory after triggering YGC, the object will be allocated in the older generation (this is generally called handle promotion, which has preconditions).
When the garbage collector collects Eden, it will traverse all objects reachable relative to GC roots and mark them as live objects. This stage is called marking stage.
Another thing to note here is that objects in the heap may be linked across generations, that is, objects in the younger generation may be held by objects in the older generation (Note: objects in the older generation are held by objects in the younger generation, which does not need to be considered in YGC). If objects in the older generation are not traversed at this time, Then it is impossible to analyze whether the objects of the younger generation held by the objects of the older generation are reachable through the reachability algorithm. The JVM uses card marking to solve this problem. The detailed implementation of card marking is not expanded here.
After the marking phase is completed, all surviving objects in Eden will be copied to one of the survivor spaces. After the replication phase is completed, the entire Eden is considered empty and can be reused to allocate more other objects. The GC algorithm used here is called Mark and copy algorithm: mark the surviving objects, and then copy them to one of the survivor spaces. Note that this is a copy, not a move.
So much about Eden. TLAB and card marking are relatively low-level implementations in the JVM. You can probably know them.
Survivor Spaces
Survivor spaces is the survivor space. The most common names of survivor spaces are from and to. The most important point is that one of the two areas in survivor space is always empty.
After the next YGC trigger, the free survivor space will enter the object. All surviving objects of the younger generation (including Eden and those in the non empty from survivor area) will be copied to the to survivor area. After this process is completed, the to survivor area will store active objects, and the from survivor area will be cleared. Next, the roles of the from survivor area and the to survivor area will be exchanged, that is, after the next round of YGC trigger, the surviving objects will be copied to the from survivor area, and the to survivor area will be emptied, and so on.
After the replication process of the surviving objects mentioned above goes back and forth between the two survivor spaces for many times, if some surviving objects are "old enough" (they survive after multiple replications), these "old" objects will be promoted to the old generation, these objects will move from the survivor space to the old age space, and then they will reside in the old generation, Until it becomes unreachable.
If the object is born in Eden and still survives after the first YGC and can be accommodated by survivor spaces, the object will be copied to survivor spaces and the object age is set to 1. If an object can survive every YGC in survivor spaces, the age of the object will increase by 1. When its age increases to the age threshold of promoting to the old age, it will be promoted to the old age, that is, it will be moved to the old generation. The JVM parameter of the age threshold for promotion to old age is - XX: maxtenuringthreshold = n:
It is worth noting that the default value of - XX: maxtenuringthreshold set in the JVM is the maximum optional value, that is, 15.
The JVM also has the function of judging the age of dynamic objects. The JVM does not always require that the age of surviving objects must reach maxtenuringthreshold before they can be promoted to the old age. If the total size of all objects of the same age in survivor spaces is greater than half of survivor spaces, objects older than or equal to this age can be directly promoted to the old age, There is no need to wait for the object age to reach maxtenuringthreshold, for example:
We can briefly summarize several situations of the object entering the old age:
Tenured
The old generation is often called tenured, and its implementation of memory space is generally more complex. The space in the old age is generally much larger than that in the young age. The objects carried in it are generally not "memory garbage". The side also shows that the recovery rate of objects in the old generation is generally low.
In general, the frequency of GC in the old generation is lower than that in the young generation, and most objects in the old generation are expected to be living objects (that is, the survival rate of objects after GC is relatively high). Therefore, the labeling and replication algorithm is not suitable for the old generation. GC algorithms in the old days generally moved objects to minimize memory fragmentation. The general rules of GC algorithm in the old age are as follows:
Metaspace
Before Java 8, a space called permanent generation was also defined in the JVM memory pool. This memory space is mainly used to store metadata, such as class information, and other data contents, such as resident strings (string constant pool). In fact, the permanent generation has brought a lot of trouble to Java developers, because in most cases, it is difficult to predict how much space the permanent generation needs to set, because it is also difficult for developers to predict the specific size of metadata or string constant pool. Once the allocated metadata fails, they will encounter Java Lang.outofmemoryerror: permgen space exception. Exclude Java. Net memory overflow Lang. outofmemoryerror exception. If the exception is caused under normal conditions, the only solution is to increase the memory of the permanent generation through the VM parameter - XX: maxpermsize = xxxmm. However, this is also a temporary solution.
Because metadata and other contents are unpredictable, Java 8 has removed the permanent generation, added a memory area Metaspace, and many other miscellaneous items (such as string constant pool) have been moved to the Java heap. Metadata such as class definition information is currently loaded directly into the meta space. Meta space is a memory area allocated in the machine's local memory, which is isolated from the heap memory that hosts Java objects. By default, the size of the meta space is only limited by the limit that the local memory of the machine can be allocated to the Java program, which can basically avoid the Java. Java. Java. Java. Java. Java. Java. Java. Java. Java. Java. Java. Java. Java. Java. Java. Java. Java. Java. Java. Java. Java. Java. Java. Java. Java Lang. outofmemoryerror: the scenario in which the permgen space exception occurs.
VM parameters related to common memory pools
GC type
Referring to the answer of rednexelafx, there are only two GC classifications in hotspot VM:
Because hotspot VM has been developed for many years, the external interpretation of GC terms has been confused, so minor GC, major GC and full GC appear.
Minor GC
Minor GC, also known as minor garbage collection, is directly translated as secondary garbage collection. Its definition is relatively clear: garbage collection in the younger generation is called minor GC. During the processing of minor garbage collection:
Major GC and full GC
Major GC (major garbage collection) and full GC are currently two terms that are not formally defined. Specifically, major GC or full GC are not clearly defined in the JVM specification or in the garbage collection research paper. However, according to folk customs or conventions, the differences between the two are as follows:
In fact, the GC process is very complex, and many major GCs are triggered by minor GC, so it is almost impossible to strictly split major GC or minor GC. On the other hand, the current garbage collection algorithm, like the G1 collection algorithm, provides some garbage collection functions. It is explained that the type of GC cannot be divided simply according to what area to collect.
Some of the above theories or data indicate that rather than discussing or worrying about whether GC is a major GC or a minor GC, it is better to pay more attention to whether the GC process will cause the application thread to pause or whether the GC process can be executed concurrently with the application thread.
Common GC algorithms
Let's analyze the common GC algorithms in the hotspot VM. Because the G1 algorithm is relatively complex, we don't have the ability to analyze it for the time being.
Purpose of GC algorithm
There are two main purposes of GC algorithm:
Finding surviving objects is mainly based on the reachability algorithm of GC roots. There are several notes about the marking stage:
The next stage after the marking stage is completed is to remove all useless objects. There are three common algorithms according to the processing methods:
Mark sweep algorithm
The mark sweep algorithm, that is, the mark sweep algorithm, is an indirect collection algorithm. It does not directly detect the garbage object itself, but first determines all the surviving objects, and then conversely determines that other objects are garbage objects. It mainly includes two stages: marking and cleaning. It is the simplest and most basic collection algorithm. It mainly includes two stages:
Mark sweep compact algorithm
Memory fragmentation is one of the problems that the non mobile collection algorithm cannot solve: Although there is free space in the heap, the memory manager cannot find a continuous memory block to meet the allocation requirements of large objects, or it takes a long time to find the appropriate free memory space.
Mark sweep compact algorithm, that is, mark sweep compact algorithm, is also an indirect collection algorithm, which mainly includes three stages:
Compression and defragmentation of heap memory can effectively reduce the problem of external fragmentation, which is an advantage of mark clean compression algorithm.
Mark copy algorithm
The mark copy algorithm, that is, the mark copy algorithm, is very similar to the mark clean compression algorithm. The important difference is that after marking and cleaning, all surviving objects in the mark copy algorithm will be copied to a different memory area - survivor space. It mainly includes three stages:
Mark copy algorithm can avoid the problem of memory fragmentation, but its cost is relatively high, because it uses half area replication and recycling, and the available memory of the area is half of the original.
Summary
JVM and GC are the contents that Java developers must master. In fact, they contain a lot of knowledge. This article only briefly introduces some basic concepts:
Later, we will analyze the GC collector matching, GC log viewing, tools provided by the JVM, and so on.
reference material:
Original link
The official account of Technology (Throwable Digest), which is not regularly pushed to the original technical article (never copied or copied):
Entertainment official account ("sand sculpture"), select interesting sand sculptures, videos and videos, push them to relieve life and work stress.