Author: Maoni Stephens (@maoni0) - 2015
Note: See The Garbage Collection Handbook to learn more about garbage collection topics in general; for specific knowledge on the CLR GC please refer to the Pro .NET Memory Management book. Both referenced in the resources section at the end of this document.
The 2 components that belong to GC are the allocator and the collector. The allocator is responsible for getting more memory and triggering the collector when appropriate. The collector reclaims garbage, or the memory of objects that are no longer in use by the program.
There are other ways that the collector can get called, such as manually calling GC.Collect or the finalizer thread receiving an asynchronous notification of the low memory (which triggers the collector).
The allocator gets called by the allocation helpers in the Execution Engine (EE), with the following information:
The GC does not have special treatment for different kinds of object types. It consults the EE to get the size of an object.
Based on the size, the GC divides objects into 2 categories: small objects (< 85,000 bytes) and large objects (>= 85,000 bytes). In principle, small and large objects can be treated the same way but since compacting large objects is more expensive GC makes this distinction.
When the GC gives out memory to the allocator, it does so in terms of allocation contexts. The size of an allocation context is defined by the allocation quantum.
Large objects do not use allocation contexts and quantums. A single large object can itself be larger than these smaller regions of memory. Also, the benefits (discussed below) of these regions are specific to smaller objects. Large objects are allocated directly to a heap segment.
The allocator is designed to achieve the following:
Object* GCHeap::Alloc(size_t size, DWORD flags);
Object* GCHeap::Alloc(alloc_context* acontext, size_t size, DWORD flags);
The above functions can be used to allocate both small objects and large objects. There is also a function to allocate directly on LOH:
Object* GCHeap::AllocLHeap(size_t size, DWORD flags);
The GC strives to manage memory extremely efficiently and require very little effort from people who write “managed code”. Efficient means:
The CLR GC is a generational collector which means objects are logically divided into generations. When a generation N is collected, the survived objects are now marked as belonging to generation N+1. This process is called promotion. There are exceptions to this when we decide to demote or not promote.
For small objects the heap is divided into 3 generations: gen0, gen1 and gen2. For large objects there’s one generation – gen3. Gen0 and gen1 are referred to as ephemeral (objects lasting for a short time) generations.
For the small object heap, the generation number represents the age – gen0 being the youngest generation. This doesn’t mean all objects in gen0 are younger than any objects in gen1 or gen2. There are exceptions which will be explained below. Collecting a generation means collecting objects in that generation and all of its younger generations.
In principle large objects can be handled the same way as small objects but since compacting large objects is very expensive, they are treated differently. There is only one generation for large objects and they are always collected with gen2 collections due to performance reasons. Both gen2 and gen3 can be big, and collecting ephemeral generations (gen0 and gen1) needs to have a bounded cost.
Allocations are made in the youngest generation – for small objects this means always gen0 and for large objects this means gen3 since there’s only one generation.
The managed heap is a set of managed heap segments. A heap segment is a contiguous block of memory that is acquired by the GC from the OS. Heap segments can be small, large, or pinned object segments depending on what they contain. On each heap the heap segments are chained together. There is at least one small object segment and one large segment - they are reserved when CLR is loaded. There is also a NonGC heap that contains ro (readonly) segments.
There’s always only one ephemeral segment in each small object heap, which is where gen0 and gen1 live. This segment may or may not include gen2 objects. In addition to the ephemeral segment, there can be zero, one or more additional segments, which will be gen2 segments since they only contain gen2 objects.
There are 1 or more segments on the large object heap.
A heap segment is consumed from the lower address to the higher address, which means objects of lower addresses on the segment are older than those of higher addresses. Again there are exceptions that will be described below.
Heap segments can be acquired as needed. They are deleted when they don’t contain any live objects, however the initial segment on the heap will always exist. For each heap, one segment at a time is acquired, which is done during a GC for small objects and during allocation time for large objects. This design provides better performance because large objects are only collected with gen2 collections (which are relatively expensive).
Heap segments are chained together in order of when they were acquired. The last segment in the chain is always the ephemeral segment. Collected segments (no live objects) can be reused instead of deleted and instead become the new ephemeral segment. Segment reuse is only implemented for small object heap. Each time a large object is allocated, the whole large object heap is considered. Small object allocations only consider the ephemeral segment.
The allocation budget is a logical concept associated with each generation. It is a size limit that triggers a GC for that generation when exceeded.
The budget is a property set on the generation mostly based on the survival rate of that generation. If the survival rate is high, the budget is made larger with the expectation that there will be a better ratio of dead to live objects next time there is a GC for that generation.
When a GC is triggered, the GC must first determine which generation to collect. Besides the allocation budget there are other factors that must be considered:
The goal of the mark phase is to find all live objects.
The benefit of a generational collector is the ability to collect just part of the heap instead of having to look at all of the objects all the time. When collecting the ephemeral generations, the GC needs to find out which objects are live in these generations, which is information reported by the EE. Besides the objects kept live by the EE, objects in older generations can also keep objects in younger generations live by making references to them.
The GC uses cards for the older generation marking. Cards are set by JIT helpers during assignment operations. If the JIT helper sees an object in the ephemeral range it will set the byte that contains the card representing the source location. During ephemeral collections, the GC can look at the set cards for the rest of the heap and only look at the objects that these cards correspond to.
The plan phase simulates a compaction to determine the effective result. If compaction is productive the GC starts an actual compaction; otherwise it sweeps.
If the GC decides to compact, which will result in moving objects, then references to these objects must be updated. The relocate phase needs to find all references that point to objects that are in the generations being collected. In contrast, the mark phase only consults live objects so it doesn’t need to consider weak references.
This phase is very straight forward since the plan phase already calculated the new addresses the objects should move to. The compact phase will copy the objects there.
The sweep phase looks for the dead space in between live objects. It creates free objects in place of these dead spaces. Adjacent dead objects are made into one free object. It places all of these free objects onto the freelist.
Terms:
This illustrates how a background GC is done.
This scenario is the same as WKS GC with concurrent GC on, except the non background GCs are done on SVR GC threads.
This section is meant to help you follow the code flow.
User thread runs out of space in an allocation context and gets a new one via try_allocate_more_space.
try_allocate_more_space calls GarbageCollectGeneration when it needs to trigger a GC.
Given WKS GC with concurrent GC off, GarbageCollectGeneration is done all on the user thread that triggered the GC. The code flow is:
GarbageCollectGeneration()
{
SuspendEE();
garbage_collect();
RestartEE();
}
garbage_collect()
{
generation_to_condemn();
gc1();
}
gc1()
{
mark_phase();
plan_phase();
}
plan_phase()
{
// actual plan phase work to decide to
// compact or not
if (compact)
{
relocate_phase();
compact_phase();
}
else
make_free_lists();
}
Given WKS GC with concurrent GC on (default case), the code flow for a background GC is
GarbageCollectGeneration()
{
SuspendEE();
garbage_collect();
RestartEE();
}
garbage_collect()
{
generation_to_condemn();
// decide to do a background GC
// wake up the background GC thread to do the work
do_background_gc();
}
do_background_gc()
{
init_background_gc();
start_c_gc ();
//wait until restarted by the BGC.
wait_to_proceed();
}
bgc_thread_function()
{
while (1)
{
// wait on an event
// wake up
gc1();
}
}
gc1()
{
background_mark_phase();
background_sweep();
}