Core Java

Creating Millions of Objects with Zero Garbage

As noted in First rule of performance optimisation, garbage is the enemy of fast code. Not only can it destroy any sort of deterministic performance by employing the services of the garbage collector but we start filling our CPU caches with garbage that will cause expensive cache misses for our program.

So, can we use Java without creating garbage? Is it possible, for example, in natural Java, to solve this problem:
 
 

Create 10m financial instrument objects, store them in a map, retrieve them and perform a calculation using each object without creating any garbage at all.

It is if you use Chronicle!  Chronicle provides libraries so that you can easily use off heap storage in the form off memory mapped files for your objects. (For full source code for this article see here.)

Let’s look implementing a solution for the above problem.

First let’s have a look at how you might do this in normal Java so that we make sure we understand the problem and what happens if we use the standard Java libraries for our implementation.

package zeroalloc;

import org.junit.Assert;
import org.junit.Test;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Class to demonstrate zero garbage creation.
 * Run with -verbose:gc -Xmx4G
 */
public class CreateOnHeapTest {
    private static final int ITERATIONS = 10_000_000;

    @Test
    public void testOnHeapMap() {
        System.out.println("----- HASHMAP ------------------------");
        Map<Integer, BondVOImpl> map = new ConcurrentHashMap<>(ITERATIONS);
        long actualQuantity = 0;
        long expectedQuantity = 0;
        long time = System.currentTimeMillis();

        System.out.println("*** Entering critical section ***");

        for (int i = 0; i < ITERATIONS; i++) {
            BondVOImpl bondVo = new BondVOImpl();
            bondVo.setQuantity(i);
            map.put(Integer.valueOf(i), bondVo);
            expectedQuantity += i;
        }


        long putTime = System.currentTimeMillis() - time;
        time = System.currentTimeMillis();
        System.out.println("************* STARTING GET *********************");
        for (int i = 0; i < map.size(); i++) {
            actualQuantity += map.get(i).getQuantity();
        }

        System.out.println("*** Exiting critical section ***");

        System.out.println("Time for putting " + putTime);
        System.out.println("Time for getting " + (System.currentTimeMillis() - time));

        Assert.assertEquals(expectedQuantity, actualQuantity);

        printMemUsage();
    }

    public static void printMemUsage() {
        System.gc();
        System.gc();
        System.out.println("Memory(heap) used " + humanReadableByteCount(Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory(), true));
    }

    public static String humanReadableByteCount(long bytes, boolean si) {
        int unit = si ? 1000 : 1024;
        if (bytes < unit) return bytes + " B";
        int exp = (int) (Math.log(bytes) / Math.log(unit));
        String pre = (si ? "kMGTPE" : "KMGTPE").charAt(exp - 1) + (si ? "" : "i");
        return String.format("%.1f %sB", bytes / Math.pow(unit, exp), pre);
    }
}

This is the out put from the program:

*** Entering critical section ***
[GC (Allocation Failure)  98816K->92120K(125952K), 0.0317021 secs]
[Full GC (Ergonomics)  92120K->91917K(216576K), 0.2510530 secs]
[GC (Allocation Failure)  125197K->125430K(224256K), 0.0449051 secs]
[GC (Allocation Failure)  166390K->166686K(244224K), 0.0504341 secs]
[Full GC (Ergonomics)  166686K->165777K(387072K), 0.6243385 secs]
[GC (Allocation Failure)  226705K->226513K(388096K), 0.0785121 secs]
[GC (Allocation Failure)  293073K->293497K(392704K), 0.0825828 secs]
[Full GC (Ergonomics)  293497K->292649K(591872K), 1.2479519 secs]
[GC (Allocation Failure)  359209K->359433K(689664K), 0.0666344 secs]
[GC (Allocation Failure)  449033K->449417K(695296K), 0.1759746 secs]
[GC (Allocation Failure)  539017K->539385K(747008K), 0.1907760 secs]
[GC (Allocation Failure)  632569K->633009K(786944K), 0.2293778 secs]
[Full GC (Ergonomics)  633009K->631584K(1085952K), 2.1328028 secs]
[GC (Allocation Failure)  724768K->723368K(1146368K), 0.3092297 secs]
[GC (Allocation Failure)  827816K->825088K(1174016K), 0.3156138 secs]
[GC (Allocation Failure)  929536K->929952K(1207296K), 0.3891754 secs]
[GC (Allocation Failure)  1008800K->1009560K(1273856K), 0.4149915 secs]
[Full GC (Ergonomics)  1009560K->1007636K(1650688K), 3.4521240 secs]
[GC (Allocation Failure)  1086484K->1087425K(1671680K), 0.3884906 secs]
[GC (Allocation Failure)  1195969K->1196129K(1694208K), 0.2905121 secs]
[GC (Allocation Failure)  1304673K->1305257K(1739776K), 0.4291658 secs]
[GC (Allocation Failure)  1432745K->1433137K(1766912K), 0.4470582 secs]
[GC (Allocation Failure)  1560625K->1561697K(1832960K), 0.6003558 secs]
[Full GC (Ergonomics)  1561697K->1558537K(2343936K), 4.9359721 secs]
[GC (Allocation Failure)  1728009K->1730019K(2343936K), 0.7616385 secs]
[GC (Allocation Failure)  1899491K->1901139K(2413056K), 0.5187234 secs]
[Full GC (Ergonomics)  1901139K->1897477K(3119616K), 5.7177263 secs]
[GC (Allocation Failure)  2113029K->2114505K(3119616K), 0.6768888 secs]
[GC (Allocation Failure)  2330057K->2331441K(3171840K), 0.4812436 secs]
[Full GC (Ergonomics)  2331441K->2328578K(3530240K), 6.3054896 secs]
[GC (Allocation Failure)  2600962K->2488834K(3528704K), 0.1580837 secs]
*** Exiting critical section ***
Time for putting 32088
Time for getting 454
[GC (System.gc())  2537859K->2488834K(3547136K), 0.1599314 secs]
[Full GC (System.gc())  2488834K->2488485K(3547136K), 6.2759293 secs]
[GC (System.gc())  2488485K->2488485K(3559936K), 0.0060901 secs]
[Full GC (System.gc())  2488485K->2488485K(3559936K), 6.0975322 secs]
Memory(heap) used 2.6 GB

The two main points that jump out of this issue are, one, the number and expense of the garbage collections (clearly this could be tuned) and two the amount of heap used 2.6GB. In short, there’s no getting away from it, this program produces masses of garbage.

Let’s try exactly the same thing, this time using ChronicleMap.

This is the code to solve the problem:

package zeroalloc;

import net.openhft.chronicle.map.ChronicleMap;
import net.openhft.chronicle.map.ChronicleMapBuilder;
import net.openhft.lang.values.IntValue;
import org.junit.Assert;
import org.junit.Test;

import java.io.File;
import java.io.IOException;

/**
 * Class to demonstrate zero garbage creation.
 * Run with -verbose:gc
 * To run in JFR use these options for best results
 * -XX:+UnlockCommercialFeatures -XX:+FlightRecorder
 */
public class CreateChronicleTest {
    private static final int ITERATIONS = 10_000_000;

    @Test
    public void demoChronicleMap() throws IOException, InterruptedException {
        System.out.println("----- CHRONICLE MAP ------------------------");
        File file = new File("/tmp/chronicle-map-" + System.nanoTime() + ".map");
        file.deleteOnExit();

        ChronicleMapBuilder<IntValue, BondVOInterface> builder =
                ChronicleMapBuilder.of(IntValue.class, BondVOInterface.class)
                        .entries(ITERATIONS);

        try (ChronicleMap<IntValue, BondVOInterface> map =
                     builder.createPersistedTo(file)) {
            final BondVOInterface value = map.newValueInstance();
            final IntValue key = map.newKeyInstance();
            long actualQuantity = 0;
            long expectedQuantity = 0;

            long time = System.currentTimeMillis();

            System.out.println("*** Entering critical section ***");

            for (int i = 0; i < ITERATIONS; i++) {
                value.setQuantity(i);
                key.setValue(i);
                map.put(key, value);
                expectedQuantity += i;
            }

            long putTime = (System.currentTimeMillis()-time);
            time = System.currentTimeMillis();

            for (int i = 0; i < ITERATIONS; i++) {
                key.setValue(i);
                actualQuantity += map.getUsing(key, value).getQuantity();
            }

            System.out.println("*** Exiting critical section ***");

            System.out.println("Time for putting " + putTime);
            System.out.println("Time for getting " + (System.currentTimeMillis()-time));

            Assert.assertEquals(expectedQuantity, actualQuantity);
            printMemUsage();

        } finally {
            file.delete();
        }
    }
    
    public static void printMemUsage(){
        System.gc();
        System.gc();
        System.out.println("Memory(heap) used " + humanReadableByteCount(Runtime.getRuntime().totalMemory() 
           - Runtime.getRuntime().freeMemory(), true));
    }

    public static String humanReadableByteCount(long bytes, boolean si) {
        int unit = si ? 1000 : 1024;
        if (bytes < unit) return bytes + " B";
        int exp = (int) (Math.log(bytes) / Math.log(unit));
        String pre = (si ? "kMGTPE" : "KMGTPE").charAt(exp-1) + (si ? "" : "i");
        return String.format("%.1f %sB", bytes / Math.pow(unit, exp), pre);
    }
}

This is the output from the program:

[GC (Allocation Failure)  33280K->6595K(125952K), 0.0072065 secs]
[GC (Allocation Failure)  39875K->12177K(125952K), 0.0106678 secs]
[GC (Allocation Failure)  45457K->15289K(125952K), 0.0068434 secs]
[GC (Allocation Failure)  48569K->18357K(159232K), 0.0098287 secs]
[GC (Allocation Failure)  84917K->21008K(159232K), 0.0156393 secs]
*** Entering critical section ***
*** Exiting critical section ***
Time for putting 8554
Time for getting 4351
[GC (System.gc())  36921K->21516K(230400K), 0.0331916 secs]
[Full GC (System.gc())  21516K->15209K(230400K), 0.0630483 secs]
[GC (System.gc())  15209K->15209K(230912K), 0.0006491 secs]
[Full GC (System.gc())  15209K->15209K(230912K), 0.0234045 secs]
Memory(heap) used 18.2 MB

The main point here is obviously that there were no GCs in the critical section and that the whole program only used 18MB of heap. We have managed to create a program that ordinarily would have produced gigabytes of garbage without producing any garbage at all.

A note on timings

ChronicleMap is clearly not a drop in replacement for ConcurrentHashMap, they have very different uses and it is beyond the scope of this post to go too much further into that line of discussion. But the main differences in functionality are  that ChronicleMap is persisted and can be shared amongst many JVMs. (ChronicleMap also has the ability to be tcp replicated.) Nevertheless it is interesting to quickly compare timings if nothing else than to make sure we are in the same ball park.  ChronicleMap was faster for putting, 8.5s compared to 32s.  But most of the time in ConcurrentHashMap was spent in GC and that might be tuned away to some extent.  ConcurrentHashMap was faster for getting, 0.5s compared to 4.3s.  Nevertheless on other runs I’ve seen ConcurrentHashMap taking over 7s because of a GC that occurred in that section. Even though ChronicleMap is doing significantly more work, the lack of garbage produced actually makes the timings comparable with ConcurrentHashMap.

Restarting the program

Where ChronicleMap really comes into its own is on a restart.  Let’s say your program goes down and you need to recalculate the same computation we did earlier.  In the case of ConcurrentHashMap we would have to repopulate the map in exactly the same we did earlier. With ChronicleMap, since the map is persistent it is just a matter of pointing the map at the existing file and rerunning the calculation to produce the totalQuantity.

Summary 

ConcurrentHashMapChronicleMap
gc pausesManyNone
update time32s8s
reads allowing gc7s4s
reads no gc0.5s4s
heap size2.6GB18MB
persistenceNoYes
fast restartNoYes

 

Reference: Creating Millions of Objects with Zero Garbage from our JCG partner Daniel Shaya at the Rational Java blog.

Daniel Shaya

Daniel has been programming in Java since it was in beta. Working predominantly in the finance industry he has created real time trading and margin risk applications. He is currently a director at OpenHFT where we are building next generation Java low latency products.
Subscribe
Notify of
guest

This site uses Akismet to reduce spam. Learn how your comment data is processed.

6 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Wojciech Oczkowski
Wojciech Oczkowski
9 years ago

For full picture You should mention that ChronicleMap example’s memory usage isn’t only 18MB and heap is not the only kind of memory we should care about. Also i would say that main difference in functionality of Chronicle and ConcurrentHashMap is it’s contract in referring to actual object since changes made to an Object put in ChronicleMap aren’t visible until that object is reinserted in map. For more reliable test’s it would be nice for tunning the xms options because that is why so many FGC’s are triggered. But i must admit that in some cases ChronicleMap is great tool… Read more »

Zoltan Juhasz
Zoltan Juhasz
9 years ago

CreateOnHeapTest doesn’t produce garbage, so it is a very bad candidate for testing garbage collection. You are storing the data in the ConcurrentHashMap, but you never remove it, so they never become eligible for collection. It is clearly visible in the GC log. The GC doesn’t remove these map entries, the used heap size is not reduced. And the overall heap size is growing continuously. It is a good example of memory leak, as nobody cleans up this collection, it just grows… The Java GC is based on the generational theory. it is about the most of the objects are… Read more »

Jacob Zimmerman
Jacob Zimmerman
9 years ago

I kind of skimmed through this article, so I may have missed it, but am I to understand that the ChronicleMap essentially uses a file instead of in-memory storage?

Daniel Shaya
9 years ago

Please see my latest article on http://www.rationaljava.com which explains in more detail.
http://www.rationaljava.com/2015/03/chroniclemap-java-architecture-with-off.html

It uses off heap memory backed by memory mapped files.

Niklas
Niklas
8 years ago

Talking on the GC pauses, it can be improved. Since all the HashMap data (10M elements) are long-lived and occupy about 2.8GB, If I set the minimum java heap to the same at the maximum size (4G), and the old generation of the heap is big enough to accommodate the map, I will get less garbage collection. Therefore, the performance will be so much better. Below is the proof. I ran the CreateOnHeapTest with this configuration -verbose:gc -Xmx4G -Xms4G -XX:NewRatio=3 And the result: —– HASHMAP ———————— *** Entering critical section *** [GC (Allocation Failure) 786432K->759184K(4063232K), 0,8160308 secs] [GC (Allocation Failure)… Read more »

Niklas
Niklas
8 years ago
Reply to  Niklas

Notice that if I ran the CreateOnHeapTest with this the same configuration as in the article -verbose:gc -Xms4G, I also got the similar time result:
Time for putting 30100
Time for getting 411

Back to top button