
GC tuning in practice

Tuning Garbage Collection is no different from any other performance-tuning activity.

Instead of giving in to the temptation of tweaking random parts of the application, you need to make sure you understand the current situation and the desired outcome. In general, it boils down to following this process:

  1. State your performance goals
  2. Run tests
  3. Measure
  4. Compare to goals
  5. Make a change and go back to running tests

It is important that the goals are set and measured across the three dimensions relevant to performance tuning: latency, throughput and capacity. To understand these dimensions better, I recommend taking a look at the corresponding chapter in the Garbage Collection Handbook.

Let's see what setting and hitting such goals looks like in practice. For this purpose, let's take a look at some example code:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class Producer implements Runnable {

  private static ScheduledExecutorService executorService = Executors.newScheduledThreadPool(2);

  private Deque<byte[]> deque;
  private int objectSize;
  private int queueSize;

  public Producer(int objectSize, int ttl) {
    this.deque = new ArrayDeque<byte[]>();
    this.objectSize = objectSize;
    // run() adds 100 objects every 100 ms, i.e. roughly 1,000 objects per second,
    // so a queue of ttl * 1000 objects keeps each object alive for about ttl seconds
    this.queueSize = ttl * 1000;
  }

  @Override
  public void run() {
    for (int i = 0; i < 100; i++) {
      // allocate a new object and drop the oldest one once the queue is full
      deque.add(new byte[objectSize]);
      if (deque.size() > queueSize) {
        deque.poll();
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // one job creating short-lived objects (~200 KB, ~5 s lifespan)
    // and one creating longer-lived objects (~50 KB, ~120 s lifespan)
    executorService.scheduleAtFixedRate(new Producer(200 * 1024 * 1024 / 1000, 5), 0, 100, TimeUnit.MILLISECONDS);
    executorService.scheduleAtFixedRate(new Producer(50 * 1024 * 1024 / 1000, 120), 0, 100, TimeUnit.MILLISECONDS);
    TimeUnit.MINUTES.sleep(10);
    executorService.shutdownNow();
  }
}

The code is submitting two jobs to run every 100 ms. Each job emulates objects with a specific lifespan: it creates objects, lets them live for a predetermined amount of time and then forgets about them, allowing GC to reclaim the memory.
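
To put these numbers in perspective, here is a rough back-of-the-envelope calculation of how much memory the two jobs allocate and keep alive. The class and figures below are only illustrative and are not part of the original example:

public class AllocationRates {
  public static void main(String[] args) {
    // Each run() call allocates 100 objects and fires every 100 ms => ~1,000 objects/second per job
    int objectsPerSecond = 100 * (1000 / 100);

    long shortLivedObject = 200 * 1024 * 1024 / 1000;   // ~200 KB per object, kept for ~5 s
    long longLivedObject  = 50 * 1024 * 1024 / 1000;    // ~50 KB per object, kept for ~120 s

    System.out.printf("Short-lived job: ~%d MB/s allocated, ~%d MB live%n",
        shortLivedObject * objectsPerSecond / (1024 * 1024),
        shortLivedObject * objectsPerSecond * 5 / (1024 * 1024));
    System.out.printf("Long-lived job:  ~%d MB/s allocated, ~%d MB live%n",
        longLivedObject * objectsPerSecond / (1024 * 1024),
        longLivedObject * objectsPerSecond * 120 / (1024 * 1024));
  }
}

If this estimate holds, the long-lived job alone keeps roughly 6 GB of data alive at steady state, which is worth keeping in mind when looking at the heap sizes used in the experiments below.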

When running the example with GC logging turned on using the following parameters

-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps

we start seeing the impact of GC immediately in the log files, similar to the following:

2015-06-04T13:34:16.119-0200: 1.723: [GC (Allocation Failure) [PSYoungGen: 114016K->73191K(234496K)] 421540K->421269K(745984K), 0.0858176 secs] [Times: user=0.04 sys=0.06, real=0.09 secs] 
2015-06-04T13:34:16.738-0200: 2.342: [GC (Allocation Failure) [PSYoungGen: 234462K->93677K(254976K)] 582540K->593275K(766464K), 0.2357086 secs] [Times: user=0.11 sys=0.14, real=0.24 secs] 
2015-06-04T13:34:16.974-0200: 2.578: [Full GC (Ergonomics) [PSYoungGen: 93677K->70109K(254976K)] [ParOldGen: 499597K->511230K(761856K)] 593275K->581339K(1016832K), [Metaspace: 2936K->2936K(1056768K)], 0.0713174 secs] [Times: user=0.21 sys=0.02, real=0.07 secs]

Based on the information in the log, we can start improving the situation with three different goals in mind (a simple way to extract the pause metrics for the first two from the log is sketched right after the list):

  1. Making sure the worst-case GC pause does not exceed a predetermined threshold
  2. Making sure the total time during which application threads are stopped does not exceed a predetermined threshold
  3. Reducing infrastructure costs while making sure we can still achieve reasonable latency and/or throughput targets.
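
For the first two goals, the longest and the total stop-the-world times have to be extracted from the GC log. As a minimal sketch, assuming the log is redirected to a file named gc.log (for example with -Xloggc:gc.log, which is not among the flags shown above), something along the following lines could be used; the class name and the regular expression are illustrative only and based on the log format shown earlier:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PauseTimes {

  // Matches the wall-clock duration reported per collection, e.g. "real=0.09 secs"
  private static final Pattern REAL_TIME = Pattern.compile("real=(\\d+\\.\\d+) secs");

  public static void main(String[] args) throws IOException {
    double total = 0;
    double longest = 0;
    for (String line : Files.readAllLines(Paths.get("gc.log"))) {
      Matcher matcher = REAL_TIME.matcher(line);
      if (matcher.find()) {
        double pause = Double.parseDouble(matcher.group(1));
        total += pause;
        longest = Math.max(longest, pause);
      }
    }
    // Note: with concurrent collectors some reported phases are not stop-the-world,
    // so treat these numbers as an approximation
    System.out.printf("Total reported GC time: %.2f s, longest pause: %.3f s%n", total, longest);
  }
}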

For this, the code above was run for 10 minutes on three different configurations, yielding very different results, summarized in the following table:

Heap    | GC Algorithm            | Useful work | Longest pause
--------|-------------------------|-------------|--------------
-Xmx12g | -XX:+UseConcMarkSweepGC | 89.8%       | 560 ms
-Xmx12g | -XX:+UseParallelGC      | 91.5%       | 1,104 ms
-Xmx8g  | -XX:+UseConcMarkSweepGC | 66.3%       | 1,610 ms

The experiment ran the same code with different GC algorithms and different heap sizes, measuring the duration of garbage collection pauses along with latency and throughput. The details of the experiments and the interpretation of the results are presented in our Garbage Collection Handbook. Take a look at the handbook for examples of how simple configuration changes make the example behave completely differently with regard to latency, throughput or capacity.
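
As an illustration of how such a configuration might be launched, the first row of the table roughly corresponds to a command along these lines (the class name, GC logging flags and log file name are assumptions carried over from the earlier sections, not taken from the original article):

java -Xmx12g -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -Xloggc:gc.log Producer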

Note that in order to keep the example as simple as possible, only a limited number of input parameters was changed; for example, the experiments do not test with a different number of cores or a different heap layout.

Reference: GC tuning in practice from our JCG partner Nikita Salnikov Tarnovski at the Plumbr blog.