Reducing memory usage with String.intern()
Every now and then you have a dying production application at hand. And you know you need to patch it as fast as possible. So have we, and thought it would be interesting to share one of the recent war stories. In this case we had a chance to patch an application with something as simple as String.intern(). But let me start from the beginning.
The application at hand was suffering from lack of memory and not even starting up after its recent changes. The symptoms included high CPU usage after JVM restart and then a few minutes later the fatal OutOfMemoryError: heap space in the logs. A quick look into heap contents gave us a suspect – the application was loading millions of objects into a certain internal data structure.
Background check with the development team revealed that the number of objects loaded was recently multiplied by a factor of two – instead of ~five million objects the application now had to deal with approximately ten million instances in memory. This can indeed use up some heap space. But knowing the possible cause was not going to help us much – no way the business owners were willing to give up on the precious data they had just acquired.
Digging into the data structure at hand we found its excessive usage of Strings underneath. Which should come as no surprise to any of our readers. But some of these Strings contained a repeating representation content. You can think of address elements, such as street names and / or countries as being the equivalent cases.
And here a quick fix started to brew in our heads. What if we internalize those repeating Strings? After quickly checking with the developers of the application, we were given a green light. The developers warranted that the side effects of the interning, such as remembering to String.intern() all of the strings that were being compared to our internalized Strings, will be contained. Thank god for encapsulation.
Now we just had to understand how much CPU overhead we were going to introduce on internalizing. By our surprise, interning ~10M Strings took just a bit less than four minutes. And saved us exactly those ~500MB of memory we were short of. So the day was saved for the time.
Now, before you jump to your application and start internalizing all the Strings you are going to find, I must warn you beforehand. There are a lot of possible things that can go wrong:
- Your internalized Strings will disappear from the heap and relocate to the permanent generation. So make sure you have enough room in the permgen space.
- Be sure to internalize all Strings you are going to compare to your internalized Strings. Or you will be creating the nastiest types of bugs in your application.
- Make sure you can tolerate the CPU overhead on internalizing. It is a native method call, thus it will be completely dependent on your specific platform, so make sure you try it out before rolling out the changes in production
We admit that our case was rather rare – the data structure contained a lot of repeating String objects and was integrated with the application in a way that made it possible for us to isolate our quick fix. And even in our case, the fix was soon after removed by developers who reworked their data structures to a more reasonable graph representation.
But the warnings aside – there are interesting and helpful tools built into the Java Virtual Machine. Know how to use them and beware of their side effects, and they will become your friends. Use them without caution and you can easily kill your application. Your best friend will always be an actual test case, built on top of your very own application.
Thanks for sharing :)