Allocating memory for the JVM: a case study
This post is about a recent performance tuning exercise. As always, these start with vague statements about symptoms. This time the devil took the form of “The application is slow and we do not have access to the source code. What are our options to improve the situation?”
A closer look at the application revealed that it consisted of several batch jobs bundled together. Drilling down through the “performance” criteria revealed that one specific job was taking a bit too long to run. After some more scrutinising I was given a measurable target: I needed to shave two minutes off a particular job's runtime to fit into a pre-allocated time window.
The troubled application was a pretty innocent-looking small JAR file which, to my luck, also bundled the load tests.
Running the application with GC logging turned on (-XX:+PrintGCTimeStamps -Xloggc:/path-to/gc.log -XX:+PrintGCDetails) and visualizing the logs quickly revealed the first target for optimization: the accumulated GC pause times added up to three and a half minutes, hinting that I might stand a chance.
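For reference, the launch looked something along these lines; the JAR file name below is a placeholder, not the actual application:

```bash
# GC logging flags used in the exercise; application.jar stands in for the real JAR
java -XX:+PrintGCTimeStamps -XX:+PrintGCDetails \
     -Xloggc:/path-to/gc.log \
     -jar application.jar
```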
In a situation like this there are several tools at one's disposal, some of which are simple and straightforward (example flags for each are sketched after the list):
- Modify the heap/permgen size
- Change GC algorithm
- Configure the ratios between the memory regions
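To make these knobs concrete, here are the HotSpot flags each of them typically maps to. The values below are illustrative examples, not the settings used in this exercise:

```bash
# 1. Heap / permgen size (example values only)
-Xms512m -Xmx512m -XX:MaxPermSize=256m

# 2. GC algorithm (pick one)
-XX:+UseParallelGC         # throughput collector
-XX:+UseConcMarkSweepGC    # low-pause CMS collector
-XX:+UseG1GC               # G1 collector

# 3. Ratios between the memory regions
-XX:NewRatio=2             # old generation vs. young generation
-XX:SurvivorRatio=8        # eden vs. a single survivor space
```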
I took the path of altering the heap size. Rather than just a lucky guess, this was based on a recently learned lesson about the correlation between the live data set size and the recommended heap size: a good starting point for the heap is roughly three to four times the live data set. From the GC logs I noted that the live data set of the application was approximately 240m, so the sweet spot for this application's heap was somewhere between 720 and 960m.
But in the configuration I discovered -Xmx being set to just 300m, so I re-ran the bundled load tests with a range of heap sizes.
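The sweep went roughly along the lines of the sketch below; the JAR name is again a placeholder, and pinning -Xms to the same value as -Xmx is a common benchmarking habit rather than a detail from the original configuration:

```bash
# Hypothetical heap-size sweep over the bundled load tests
for heap in 300m 384m 720m 1440m; do
  java -Xms$heap -Xmx$heap \
       -XX:+PrintGCTimeStamps -XX:+PrintGCDetails \
       -Xloggc:gc-$heap.log \
       -jar application.jar
done
```

The results of the runs: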
| Heap size | Total GC pauses | Throughput |
|---|---|---|
| 300m | 207.48s | 92.25% |
| 384m | 54.13s | 97.97% |
| 720m | 20.52s | 99.11% |
| 1,440m | 11.37s* | 99.55%* |
\* indicates that this configuration did not trigger a Full GC during the run.
Now, looking at the results, it is tempting to conclude that “bigger is better”. You would indeed be correct if success were measured only in milliseconds. But if one of the success criteria is related to money, it might not be that simple: pile together hundreds or thousands of such machines in a large deployment and you might get a nasty surprise in the electricity bill alone.
Other than that, this exercise is a textbook case of performance tuning. You are set up for success if you have a measurable goal and can measure the results instead of guessing. Had I been forced to jump in without a clear goal or load testing capability, I would still be tweaking random bits of the configuration.