GC Explained: Algorithms
As described in the previous post, we have four different garbage collectors available in HotSpot JVM. There are some significant differences between them, but the actual concepts behind the algorithms which are used to do the actual job are quite similar. In this short post, I will try to explain three basic algorithms:
- Mark-sweep
- Mark-sweep-compact
- Mark-copy
GC Roots
Before we move into the details, let’s make sure that we have a common understanding of what GC Roots are. These are the objects which are directly accessible from outside the heap. For example:
- Active threads
- Static variables
- Local variables (accessible via stack of a thread)
- JNI references
Mark
All of the algorithms discussed have the same mark phase. Marking phase is about traversing the whole object graph, starting from GC Roots. When GC visits the object, it marks it as accessible and thus alive. All the objects which are not reachable from GC Roots are garbage. Marking requires stop-the-world (STW) pauses, because the running application threads could interfere. How long the STW pause is, depends mostly on the number of visited objects.
Mark-sweep
After marking phase, we have the memory space which is occupied by visited (accessible via GC Roots) and unvisited objects. Sweep phase releases the memory fragments which contains unreachable objects. It is simple, but because the dead objects are not necessarily next to each other, we end up having a fragmented memory. That’s not bad per se, but trying to fit a too large object into the memory can lead to OutOfMemoryError.
Mark-sweep-compact
This algorithm fixes the problem with fragmented memory. After all alive objects are marked, they are moved to the beginning of the memory space. That helps to avoid OutOfMemoryError caused by too fragmented memory, but compacting the heap isn’t for free. Copying objects and updating all references to them take time and it all happens during STW pause.
Mark-copy
Mark-copy algorithm copies all alive objects to a new memory region. The previously occupied region is considered to be free. Next time mark-copy is executed, all the alive objects are moved back to the previous memory region. As you can imagine, this of course leads to a memory compaction. Unfortunately, it requires additional extra region large enough to fit all live objects at any given point in time.
Reference: | GC Explained: Algorithms from our JCG partner Grzegorz Mirek at the > performant code_ blog. |
Please let me know when this GC root is created and is there any way to visualize this?
GC roots are ‘created’ in many situations, for example when static fields are initialized, a new thread is created or JNI call is being executed. I can’t come up with any visualization that would make it easier to understand.