Learn about JDK9 Compact Strings (Video review Charlie Hunt)

JDK 9 introduces a new feature called Compact Strings. Given the ubiquity of Strings in Java programs, I feel that this is a really important change that all Java developers need to understand.

In this video Charlie Hunt explains the history and implementation of this new feature. The video is not actually about Compact Strings; they are introduced only as a case study to show how, with a lot of work, all three legs of the stool (latency, throughput and memory footprint) can be improved together.

If you have the time, I definitely recommend watching the whole video, although the part on Compact Strings itself starts at 26:24.

If you want a 5-minute overview, here are the highlights:

  • String density (JEP 254 Compact Strings) is a feature of JDK 9.
  • The aims were to reduce memory footprint without hurting performance (latency or throughput), as well as maintaining full backward compatibility.
  • JDK 6 introduced compressed strings, but that feature was never carried forward into later JVMs. Compact Strings is a complete rewrite.
  • To work out how much memory could be saved, heap dumps from 960 disparate Java applications were analysed.
  • The live data size of the heap dumps was between 300MB and 2.5GB.
  • char[] consumed between 10% and 45% of the live data.
  • The vast majority of chars were only one byte in size (i.e. ASCII).
  • 75% of the char arrays were 35 chars or smaller.
  • On average the reduction in application footprint would be 5-15%. The reduction in char[] size is about 35-45% rather than a full 50%, because the array header does not shrink.
  • The way it is implemented is that if all chars in the String use only 1 byte (the higher byte is 0), then a byte[] is used rather than a char[], with ISO-8859-1/Latin-1 encoding. A byte flag indicates which encoding was used.
  • UTF-8 is not used because it is a variable-length encoding and is therefore not performant for random access.
  • A private final byte coder field on String indicates the encoding. Since it is a whole byte, note the room to support many more encodings in the future.
  • For all 64-bit JVMs no extra memory was needed for the extra field, because of the ‘dead’ space needed for 8-byte object alignment.
  • Throughput doesn’t suffer, as verified with 400 JMH benchmarks that are available online.
  • The reason for this is that String is highly optimized: there are 55 JVM features specific to String alone.
  • Latency also improved, as tested with the industry benchmark SPECjbb2015 and regression tested on SPECjbb2005.
  • The feature is enabled by default. It can be enabled explicitly with -XX:+CompactStrings and disabled with -XX:-CompactStrings.
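
To make the encoding rule in the highlights concrete, here is a minimal sketch of the decision it describes: use one byte per char when every char has a zero high byte, otherwise fall back to two bytes per char. This is not the JDK's code. The class name, the LATIN1/UTF16 constants and the helper methods are my own illustrative assumptions, and the byte counts cover only the character payload, ignoring object and array headers.

```java
// Illustrative sketch only: mimics the Latin-1 vs UTF-16 decision described above.
// The class, constants and methods are hypothetical, not the java.lang.String internals.
final class CompactStringSketch {

    // Hypothetical coder values standing in for the byte-sized coder field.
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    // Returns LATIN1 if every char fits in one byte (high byte is 0), else UTF16.
    static byte chooseCoder(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return UTF16;
            }
        }
        return LATIN1;
    }

    // Bytes needed for the character payload alone (headers are ignored).
    static int payloadBytes(String s) {
        return chooseCoder(s) == LATIN1 ? s.length() : 2 * s.length();
    }

    // Human-readable summary used by main.
    static String describe(String s) {
        String name = chooseCoder(s) == LATIN1 ? "LATIN1" : "UTF16";
        return name + ", " + payloadBytes(s) + " payload bytes";
    }

    public static void main(String[] args) {
        System.out.println(describe("hello"));      // plain ASCII
        System.out.println(describe("caf\u00e9"));  // U+00E9 still fits in one byte
        System.out.println(describe("\u20ac5"));    // U+20AC (euro sign) needs two bytes
    }
}
```

Note that a single char above 0xFF pushes the whole string to two bytes per char, which is why fixed-width Latin-1 keeps random access cheap in a way a variable-length encoding would not.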

Daniel Shaya

Daniel has been programming in Java since it was in beta. Working predominantly in the finance industry he has created real time trading and margin risk applications. He is currently a director at OpenHFT where we are building next generation Java low latency products.