Java: ChronicleMap Part 1, Go Off-Heap
Filling up a HashMap with millions of objects will quickly lead to problems such as inefficient memory usage, low performance and garbage collection problems. Learn how to use the off-heap ChronicleMap, which can contain billions of objects with little or no heap impact.
The built-in Map implementations, such as HashMap and ConcurrentHashMap, are excellent tools when we want to work with small to medium-sized data sets. However, as the amount of data grows, these Map implementations deteriorate and start to exhibit a number of unpleasant drawbacks, as shown in this first article in a series about the open-source ChronicleMap.
Heap Allocation
In the examples below, we will use Point objects. Point is a POJO with a public default constructor and getters and setters for its X and Y properties (int). The following snippet adds a million Point objects to a HashMap:
    final Map<Long, Point> m = LongStream.range(0, 1_000_000)
        .boxed()
        .collect(
            toMap(
                Function.identity(),
                FillMaps::pointFrom,
                (u, v) -> { throw new IllegalStateException(); },
                HashMap::new
            )
        );

    // Convenience method that creates a Point from
    // a long by applying modulo prime number operations
    private static Point pointFrom(long seed) {
        final Point point = new Point();
        point.setX((int) seed % 4517);
        point.setY((int) seed % 5011);
        return point;
    }
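The Point class itself is not listed in the article. A minimal sketch that matches the description above (default constructor plus getters and setters for two int properties) could look like the following. The field names and the toString() method are assumptions, and the class is declared package-private here only so that the snippet compiles in any file; in a real project it would normally be a public class in its own Point.java.

```java
// Hypothetical sketch of the Point POJO described in the text.
final class Point {

    private int x;
    private int y;

    // Public default constructor, as required by the description.
    public Point() {}

    public int getX() { return x; }

    public void setX(int x) { this.x = x; }

    public int getY() { return y; }

    public void setY(int y) { this.y = y; }

    @Override
    public String toString() {
        return "Point{x=" + x + ", y=" + y + "}";
    }
}
```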
We can easily see the number of objects allocated on the heap and how much heap memory these objects consume:
    Pers-MacBook-Pro:chronicle-test pemi$ jmap -histo 34366 | head
     num     #instances         #bytes  class name (module)
    -------------------------------------------------------
       1:       1002429       32077728  java.util.HashMap$Node (java.base@10)
       2:       1000128       24003072  java.lang.Long (java.base@10)
       3:       1000000       24000000  com.speedment.chronicle.test.map.Point
       4:           454        8434256  [Ljava.util.HashMap$Node; (java.base@10)
       5:          3427         870104  [B (java.base@10)
       6:           185         746312  [I (java.base@10)
       7:           839         102696  java.lang.Class (java.base@10)
       8:          1164          89088  [Ljava.lang.Object; (java.base@10)
For each Map entry, a Long, a HashMap$Node and a Point object need to be created on the heap. There are also a number of arrays with HashMap$Node objects created. In total, these objects and arrays consume 88,515,056 bytes of heap memory. Thus, each entry consumes on average 88.5 bytes.
NB: The extra 2429 HashMap$Node objects come from other HashMap objects used internally by Java.
Off-Heap Allocation
In contrast, a ChronicleMap uses very little heap memory, as can be observed when running the following code:
    final Map<Long, Point> m2 = LongStream.range(0, 1_000_000)
        .boxed()
        .collect(
            toMap(
                Function.identity(),
                FillMaps::pointFrom,
                (u, v) -> { throw new IllegalStateException(); },
                () -> ChronicleMap
                    .of(Long.class, Point.class)
                    .averageValueSize(8)
                    .valueMarshaller(PointSerializer.getInstance())
                    .entries(1_000_000)
                    .create()
            )
        );
    Pers-MacBook-Pro:chronicle-test pemi$ jmap -histo 34413 | head
     num     #instances         #bytes  class name (module)
    -------------------------------------------------------
       1:          6537        1017768  [B (java.base@10)
       2:           448         563936  [I (java.base@10)
       3:          1899         227480  java.lang.Class (java.base@10)
       4:          6294         151056  java.lang.String (java.base@10)
       5:          2456         145992  [Ljava.lang.Object; (java.base@10)
       6:          3351         107232  java.util.concurrent.ConcurrentHashMap$Node (java.base@10)
       7:          2537          81184  java.util.HashMap$Node (java.base@10)
       8:           512          49360  [Ljava.util.HashMap$Node; (java.base@10)
As can be seen, no Java heap objects are allocated for the ChronicleMap entries, and consequently no heap memory is used for them either.
Instead of allocating heap memory, ChronicleMap allocates its memory off-heap. Provided that we start our JVM with the flag -XX:NativeMemoryTracking=summary, we can retrieve the amount of off-heap memory being used by issuing the following command:
    Pers-MacBook-Pro:chronicle-test pemi$ jcmd 34413 VM.native_memory | grep Internal
    -                      Internal (reserved=30229KB, committed=30229KB)
Apparently, our one million objects were laid out in off-heap memory using a little more than 30 MB of off-heap RAM. This means that each entry in the ChronicleMap used above needs about 30 bytes on average.
This is much more memory-efficient than a HashMap, which required 88.5 bytes per entry. In fact, we saved about 66% of RAM and almost 100% of heap memory. The latter is important because the Java Garbage Collector only sees objects that are on the heap.
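The per-entry figures can be re-derived from the tool output shown earlier. The sketch below simply repeats that arithmetic; the class and method names are made up for illustration, and it assumes that the whole "Internal" category reported by jcmd belongs to the map, which slightly overstates its footprint. With the unrounded inputs, the result is about 31 bytes per entry and a saving of roughly 65%, in line with the rounded numbers in the text.

```java
// Hypothetical helper that repeats the article's memory arithmetic.
final class MemoryFootprint {

    static final long ENTRIES = 1_000_000L;

    // Byte counts copied from the first jmap histogram (HashMap run).
    static long hashMapHeapBytes() {
        return 32_077_728L   // java.util.HashMap$Node
             + 24_003_072L   // java.lang.Long
             + 24_000_000L   // Point
             +  8_434_256L;  // HashMap$Node[] arrays
    }

    // "Internal" committed memory from jcmd (30229 KB), converted to bytes.
    // Assumption: all of it is attributed to the ChronicleMap.
    static long chronicleMapOffHeapBytes() {
        return 30_229L * 1024L;
    }

    static double bytesPerEntry(long totalBytes) {
        return (double) totalBytes / ENTRIES;
    }

    public static void main(String[] args) {
        double heap = bytesPerEntry(hashMapHeapBytes());
        double offHeap = bytesPerEntry(chronicleMapOffHeapBytes());
        System.out.printf("HashMap:      %.1f bytes/entry%n", heap);
        System.out.printf("ChronicleMap: %.1f bytes/entry%n", offHeap);
        System.out.printf("Saving:       %.0f%%%n", 100.0 * (1.0 - offHeap / heap));
    }
}
```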
Note that we have to decide upon creation the maximum number of entries the ChronicleMap can hold. This is different from HashMap, which can grow dynamically as we add new associations. We also have to provide a serializer (i.e. PointSerializer.getInstance()), which will be discussed in detail later in this article.
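To get a feeling for what "off-heap with a capacity fixed at creation" means in principle, the following sketch stores point coordinates in a direct ByteBuffer from the JDK. It is only an illustration of the idea and not how ChronicleMap is implemented; the class OffHeapPointArray and its methods are hypothetical, and entries are addressed by index rather than by key.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration: fixed-capacity off-heap storage for (x, y) pairs.
final class OffHeapPointArray {

    private static final int ENTRY_BYTES = Integer.BYTES * 2; // x and y

    private final ByteBuffer buffer;
    private final int capacity;

    OffHeapPointArray(int capacity) {
        this.capacity = capacity;
        // allocateDirect requests memory outside the Java heap.
        // The size is decided here and cannot grow afterwards.
        this.buffer = ByteBuffer.allocateDirect(Math.multiplyExact(capacity, ENTRY_BYTES));
    }

    void put(int index, int x, int y) {
        checkIndex(index);
        int offset = index * ENTRY_BYTES;
        buffer.putInt(offset, x);
        buffer.putInt(offset + Integer.BYTES, y);
    }

    int getX(int index) {
        checkIndex(index);
        return buffer.getInt(index * ENTRY_BYTES);
    }

    int getY(int index) {
        checkIndex(index);
        return buffer.getInt(index * ENTRY_BYTES + Integer.BYTES);
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= capacity) {
            throw new IndexOutOfBoundsException(
                "index " + index + " is outside the fixed capacity " + capacity);
        }
    }

    public static void main(String[] args) {
        OffHeapPointArray points = new OffHeapPointArray(1_000_000);
        for (int i = 0; i < 1_000_000; i++) {
            points.put(i, i % 4517, i % 5011);
        }
        System.out.println(points.getX(999_999) + ", " + points.getY(999_999));
    }
}
```

Each entry occupies exactly 8 bytes here because no keys, hashes or bookkeeping are stored; a real map needs more than that per entry.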
Garbage Collection
Many Garbage Collection (GC) algorithms complete in a time that is proportional to the square of the number of objects that exist on the heap. So if we, for example, double the number of objects on the heap, we can expect that the GC would take four times longer to complete.
If we, on the other hand, create 64 times more objects, we can expect to suffer an agonizing 4,096-fold increase in expected GC time (64 squared). This effectively prevents us from ever being able to create really large HashMap objects.
With ChronicleMap, we can just put new associations without any concern about garbage collection times.
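How much GC work a given data set actually causes depends on the collector and the JVM settings, so it can be worth measuring. One way to do that, sketched below, is to read the accumulated collection counts and times from the JDK's GarbageCollectorMXBean before and after filling a map. The class name GcProbe and the int[] stand-in for Point are assumptions made to keep the example self-contained.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;

// Hypothetical probe that reports GC activity caused by filling a HashMap.
final class GcProbe {

    // Accumulated GC time in milliseconds across all collectors of this JVM.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Accumulated number of collections across all collectors of this JVM.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if the collector does not report it
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long millisBefore = totalGcMillis();
        long countBefore = totalGcCount();

        // Fill an on-heap map; an int[] pair stands in for a Point here.
        Map<Long, int[]> map = new HashMap<>();
        for (long i = 0; i < 1_000_000; i++) {
            map.put(i, new int[] {(int) (i % 4517), (int) (i % 5011)});
        }

        System.out.println("entries:      " + map.size());
        System.out.println("collections:  " + (totalGcCount() - countBefore));
        System.out.println("GC time (ms): " + (totalGcMillis() - millisBefore));
    }
}
```

Running the same probe with different entry counts gives a rough picture of how GC time scales for a particular setup.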
Serializer
The mediator between heap and off-heap memory is often called a serializer. ChronicleMap comes with a number of pre-configured serializers for most built-in Java types such as Integer, Long, String and many more.
In the example above, we used a custom serializer to convert a Point back and forth between heap and off-heap memory. The serializer class looks like this:
    public final class PointSerializer implements
        SizedReader<Point>,
        SizedWriter<Point> {

        private static final PointSerializer INSTANCE = new PointSerializer();

        public static PointSerializer getInstance() { return INSTANCE; }

        private PointSerializer() {}

        // A Point always serializes to two ints (x and y)
        @Override
        public long size(@NotNull Point toWrite) {
            return Integer.BYTES * 2;
        }

        @Override
        public void write(Bytes out, long size, @NotNull Point point) {
            out.writeInt(point.getX());
            out.writeInt(point.getY());
        }

        @NotNull
        @Override
        public Point read(Bytes in, long size, @Nullable Point using) {
            // No instance to reuse: create a new one
            if (using == null) {
                using = new Point();
            }
            using.setX(in.readInt());
            using.setY(in.readInt());
            return using;
        }

    }
The serializer above is implemented as a stateless singleton, and the actual serialization in the write() and read() methods is fairly straightforward. The only tricky part is the null check in the read() method: if the "using" argument is null, there is no existing instance to reuse, so a new Point must be created.
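The effect of the "using" parameter can be illustrated without ChronicleMap. The sketch below mirrors the write() and read() methods above but uses java.nio.ByteBuffer in place of Chronicle's Bytes, so it runs with the JDK alone. The class ReusePatternDemo and its nested Point are hypothetical stand-ins, not part of the library.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-alone illustration of the "using" reuse pattern.
final class ReusePatternDemo {

    // Minimal stand-in for the article's Point POJO.
    static final class Point {
        int x;
        int y;
    }

    static void write(ByteBuffer out, Point point) {
        out.putInt(point.x);
        out.putInt(point.y);
    }

    static Point read(ByteBuffer in, Point using) {
        if (using == null) {
            using = new Point(); // nothing to reuse, so allocate
        }
        using.x = in.getInt();
        using.y = in.getInt();
        return using;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(Integer.BYTES * 2);

        Point original = new Point();
        original.x = 42;
        original.y = 13;
        write(buffer, original);
        buffer.flip(); // switch the buffer from writing to reading

        // Passing null forces read() to allocate a new Point.
        Point fresh = read(buffer, null);
        System.out.println(fresh != original);

        // Passing an existing instance avoids the allocation.
        buffer.rewind();
        Point reusable = new Point();
        Point result = read(buffer, reusable);
        System.out.println(result == reusable);
    }
}
```

Reusing one instance across many reads keeps the number of short-lived heap objects down, which is the reason the read() contract accepts an optional instance.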
How to Install it?
When we want to use ChronicleMap in our project, we just add the following Maven dependency to our pom.xml file and we have access to the library.
    <dependency>
        <groupId>net.openhft</groupId>
        <artifactId>chronicle-map</artifactId>
        <version>3.17.3</version>
    </dependency>
If you are using another build tool, for example Gradle, you can depend on ChronicleMap using the same coordinates: net.openhft:chronicle-map:3.17.3.
The Short Story
Here are some properties of ChronicleMap:
Stores data off-heap
Is almost always more memory efficient than a HashMap
Implements ConcurrentMap
Does not affect garbage collection times
Sometimes needs a serializer
Has a fixed maximum number of entries
Can hold billions of associations
Is free and open-source
Published on Java Code Geeks with permission by Per Minborg, partner at our JCG program. See the original article here: Java: ChronicleMap Part 1, Go Off-Heap. Opinions expressed by Java Code Geeks contributors are their own.