JLBH Examples 1 – Why Code Should be Benchmarked in Context
In this post:
- A side by side example using JMH and JLBH for Date serialisation
- Measuring Date serialisation in a microbenchmark
- Measuring Date serialisation as part of a proper application
- How to add a probe to your JLBH benchmark
- Understanding the importance of measuring code in context
In the last post, ‘Introducing JLBH’, we introduced JLBH, the latency testing tool that Chronicle used to test Chronicle-FIX and which is now available as open source.
In the next few posts we are going to look at some example applications.
All the code for the examples can be found here in my GitHub project:
One of the points I made in the introduction to JLBH was that it is important to benchmark code in context. That means benchmarking code in an environment as close to how it will be running in real life as possible. This post demonstrates this in practice.
Let’s look at a relatively expensive Java operation – Date Serialisation – and see how long it takes:
First, here’s a JMH benchmark:
package org.latency.serialisation.date;

import net.openhft.affinity.Affinity;
import net.openhft.chronicle.core.Jvm;
import net.openhft.chronicle.core.OS;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import org.openjdk.jmh.runner.options.TimeValue;

import java.io.*;
import java.lang.reflect.InvocationTargetException;
import java.util.Date;
import java.util.concurrent.TimeUnit;

/**
 * Created to show the effects of running code within more complex code.
 * Date serialisation as a micro benchmark vs date serialisation inside a TCP call.
 */
@State(Scope.Thread)
public class DateSerialiseJMH {
    private final Date date = new Date();

    public static void main(String[] args) throws InvocationTargetException, IllegalAccessException, RunnerException, IOException, ClassNotFoundException {
        if (OS.isLinux())
            Affinity.setAffinity(2);
        if (Jvm.isDebug()) {
            DateSerialiseJMH jmhParse = new DateSerialiseJMH();
            jmhParse.test();
        } else {
            Options opt = new OptionsBuilder()
                    .include(DateSerialiseJMH.class.getSimpleName())
                    .warmupIterations(6)
                    .forks(1)
                    .measurementIterations(5)
                    .mode(Mode.SampleTime)
                    .measurementTime(TimeValue.seconds(3))
                    .timeUnit(TimeUnit.MICROSECONDS)
                    .build();
            new Runner(opt).run();
        }
    }

    @Benchmark
    public Date test() throws IOException, ClassNotFoundException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(out);
        oos.writeObject(date);
        ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(out.toByteArray()));
        return (Date) ois.readObject();
    }
}
Running on my laptop (MBP i7), these are the results I get:
Result "test":
  4.578 ±(99.9%) 0.046 us/op [Average]
  (min, avg, max) = (3.664, 4.578, 975.872), stdev = 6.320
  CI (99.9%): [4.533, 4.624] (assumes normal distribution)
  Samples, N = 206803
        mean =      4.578 ±(99.9%) 0.046 us/op
         min =      3.664 us/op
  p( 0.0000) =      3.664 us/op
  p(50.0000) =      4.096 us/op
  p(90.0000) =      5.608 us/op
  p(95.0000) =      5.776 us/op
  p(99.0000) =      8.432 us/op
  p(99.9000) =     24.742 us/op
  p(99.9900) =    113.362 us/op
  p(99.9990) =    847.245 us/op
  p(99.9999) =    975.872 us/op
         max =    975.872 us/op

# Run complete. Total time: 00:00:21

Benchmark              Mode     Cnt    Score   Error  Units
DateSerialiseJMH.test  sample  206803  4.578 ± 0.046  us/op
A mean time of ~4.5us for the operation.
We get pretty much the same results running the test with JLBH:
package org.latency.serialisation.date;

import net.openhft.chronicle.core.jlbh.JLBHOptions;
import net.openhft.chronicle.core.jlbh.JLBHTask;
import net.openhft.chronicle.core.jlbh.JLBH;

import java.io.*;
import java.lang.reflect.InvocationTargetException;
import java.util.Date;

/**
 * Created to show the effects of running code within more complex code.
 * Date serialisation as a micro benchmark vs date serialisation inside a TCP call.
 */
public class DateSerialisedJLBHTask implements JLBHTask {
    private Date date = new Date();
    private JLBH lth;

    public static void main(String[] args) throws InvocationTargetException, IllegalAccessException, IOException, ClassNotFoundException {
        JLBHOptions jlbhOptions = new JLBHOptions()
                .warmUpIterations(400_000)
                .iterations(1_000_000)
                .throughput(100_000)
                .runs(3)
                .recordOSJitter(true)
                .accountForCoordinatedOmmission(true)
                .jlbhTask(new DateSerialisedJLBHTask());
        new JLBH(jlbhOptions).start();
    }

    @Override
    public void run(long startTimeNS) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(out);
            oos.writeObject(date);
            ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(out.toByteArray()));
            date = (Date) ois.readObject();
            lth.sample(System.nanoTime() - startTimeNS);
        } catch (IOException | ClassNotFoundException e) {
            e.printStackTrace();
        }
    }

    @Override
    public void init(JLBH lth) {
        this.lth = lth;
    }
}
These are the results:
Warm up complete (400000 iterations took 2.934s)
-------------------------------- BENCHMARK RESULTS (RUN 1) ---------
Run time: 10.0s
Correcting for co-ordinated:true
Target throughput:100000/s = 1 message every 10us
End to End: (1,000,000)  50/90 99/99.9 99.99/99.999 - worst was 4.2 / 5.8  352 / 672  803 / 901 - 934
OS Jitter (13,939)       50/90 99/99.9 99.99 - worst was 8.4 / 17  639 / 4,130  12,850 - 20,450
--------------------------------------------------------------------
-------------------------------- BENCHMARK RESULTS (RUN 2) ---------
Run time: 10.0s
Correcting for co-ordinated:true
Target throughput:100000/s = 1 message every 10us
End to End: (1,000,000)  50/90 99/99.9 99.99/99.999 - worst was 4.2 / 5.8  434 / 705  836 / 934 - 967
OS Jitter (11,016)       50/90 99/99.9 99.99 - worst was 8.4 / 17  606 / 770  868 - 1,340
--------------------------------------------------------------------
-------------------------------- BENCHMARK RESULTS (RUN 3) ---------
Run time: 10.0s
Correcting for co-ordinated:true
Target throughput:100000/s = 1 message every 10us
End to End: (1,000,000)  50/90 99/99.9 99.99/99.999 - worst was 4.2 / 5.8  434 / 737  901 / 999 - 1,030
OS Jitter (12,319)       50/90 99/99.9 99.99 - worst was 8.4 / 15  573 / 737  803 - 901
--------------------------------------------------------------------
-------------------------------- SUMMARY (end to end) --------------
Percentile     run1        run2        run3     % Variation
50:              4.22        4.22        4.22        0.00
90:              5.76        5.76        5.76        0.00
99:            352.26      434.18      434.18        0.00
99.9:          671.74      704.51      737.28        3.01
99.99:         802.82      835.58      901.12        4.97
worst:         901.12      933.89      999.42        4.47
--------------------------------------------------------------------
A 50th percentile time of 4.2us for the operation.
Note: this is a case where there is no advantage to using JLBH over JMH. I just include the code as a comparison.
Now we’re going to run exactly the same operation, but this time inside a TCP call. The code works like this:
- The client sends the server a FIX message over TCP loopback (localhost)
- The server reads the message
- The server performs the date serialisation
- The server returns a message to the client
As explained in the previous post, JLBH allows us to produce a latency profile for any part of the code. We will add a probe for stage 3.
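Before the full benchmark, here is a minimal sketch of the probe API in isolation, using only the addProbe() and sampleNanos() calls that also appear in the code below. The class name, probe label and the work being timed are placeholders:

import net.openhft.chronicle.core.jlbh.JLBH;
import net.openhft.chronicle.core.jlbh.JLBHOptions;
import net.openhft.chronicle.core.jlbh.JLBHTask;
import net.openhft.chronicle.core.util.NanoSampler;

// Minimal sketch: how a named probe is registered and sampled inside a JLBHTask.
public class ProbeSketchJLBHTask implements JLBHTask {
    private JLBH lth;
    private NanoSampler stageProbe;

    public static void main(String[] args) {
        JLBHOptions options = new JLBHOptions()
                .warmUpIterations(50_000)
                .iterations(100_000)
                .throughput(20_000)
                .runs(3)
                .jlbhTask(new ProbeSketchJLBHTask());
        new JLBH(options).start();
    }

    @Override
    public void init(JLBH lth) {
        this.lth = lth;
        // Register a named probe; JLBH reports its percentiles alongside the end-to-end figures.
        stageProbe = lth.addProbe("stage 3");
    }

    @Override
    public void run(long startTimeNS) {
        long stageStart = System.nanoTime();
        // ... the stage you want a separate latency profile for goes here ...
        stageProbe.sampleNanos(System.nanoTime() - stageStart); // probe sample
        lth.sample(System.nanoTime() - startTimeNS);            // end-to-end sample
    }
}

In the full benchmark below, the probe wraps the date serialisation inside the TCP server, so its latency profile is reported separately from the end-to-end round trip.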
package org.latency.serialisation.date;

import net.openhft.affinity.Affinity;
import net.openhft.chronicle.core.Jvm;
import net.openhft.chronicle.core.jlbh.JLBHOptions;
import net.openhft.chronicle.core.jlbh.JLBHTask;
import net.openhft.chronicle.core.jlbh.JLBH;
import net.openhft.chronicle.core.util.NanoSampler;

import java.io.*;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Date;

/**
 * Created to show the effects of running code within more complex code.
 * Date serialisation as a micro benchmark vs date serialisation inside a TCP call.
 */
public class DateSerialiseJLBHTcpTask implements JLBHTask {
    private final static int port = 8007;
    private static final boolean BLOCKING = false;
    private final int SERVER_CPU = Integer.getInteger("server.cpu", 0);
    private Date date = new Date();
    private JLBH lth;
    private ByteBuffer bb;
    private SocketChannel socket;
    private byte[] fixMessageBytes;
    private NanoSampler dateProbe;

    public static void main(String[] args) {
        JLBHOptions lth = new JLBHOptions()
                .warmUpIterations(50_000)
                .iterations(100_000)
                .throughput(20_000)
                .runs(3)
                .recordOSJitter(true)
                .accountForCoordinatedOmmission(true)
                .jlbhTask(new DateSerialiseJLBHTcpTask());
        new JLBH(lth).start();
    }

    @Override
    public void init(JLBH lth) {
        this.lth = lth;
        dateProbe = lth.addProbe("date serialisation ");
        try {
            runServer(port);
            Jvm.pause(200);

            socket = SocketChannel.open(new InetSocketAddress(port));
            socket.socket().setTcpNoDelay(true);
            socket.configureBlocking(BLOCKING);
        } catch (IOException e) {
            e.printStackTrace();
        }

        String fixMessage = "8=FIX.4.2\u00019=211\u000135=D\u000134=3\u000149=MY-INITIATOR-SERVICE\u000152=20160229-" +
                "09:04:14.459\u000156=MY-ACCEPTOR-SERVICE\u00011=ABCTEST1\u000111=863913604164909\u000121=3\u000122=5" +
                "\u000138=1\u000140=2\u000144=200\u000148=LCOM1\u000154=1\u000155=LCOM1\u000159=0\u000160=20160229-09:" +
                "04:14.459\u0001167=FUT\u0001200=201106\u000110=144\u0001\n";

        fixMessageBytes = fixMessage.getBytes();
        int length = fixMessageBytes.length;
        bb = ByteBuffer.allocateDirect(length).order(ByteOrder.nativeOrder());
        bb.put(fixMessageBytes);
    }

    private void runServer(int port) throws IOException {
        new Thread(() -> {
            if (SERVER_CPU > 0) {
                System.out.println("server cpu: " + SERVER_CPU);
                Affinity.setAffinity(SERVER_CPU);
            }
            ServerSocketChannel ssc = null;
            SocketChannel socket = null;
            try {
                ssc = ServerSocketChannel.open();
                ssc.bind(new InetSocketAddress(port));
                System.out.println("listening on " + ssc);

                socket = ssc.accept();
                socket.socket().setTcpNoDelay(true);
                socket.configureBlocking(BLOCKING);

                System.out.println("Connected " + socket);

                ByteBuffer bb = ByteBuffer.allocateDirect(32 * 1024).order(ByteOrder.nativeOrder());
                for (; ; ) {
                    bb.limit(12);
                    do {
                        if (socket.read(bb) < 0)
                            throw new EOFException();
                    } while (bb.remaining() > 0);

                    int length = bb.getInt(0);
                    bb.limit(length);
                    do {
                        if (socket.read(bb) < 0)
                            throw new EOFException();
                    } while (bb.remaining() > 0);

                    long now = System.nanoTime();
                    try {
                        // Running the date serialisation but this time inside the TCP callback.
                        ByteArrayOutputStream out = new ByteArrayOutputStream();
                        ObjectOutputStream oos = new ObjectOutputStream(out);
                        oos.writeObject(date);
                        ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(out.toByteArray()));
                        date = (Date) ois.readObject();
                        dateProbe.sampleNanos(System.nanoTime() - now);
                    } catch (IOException | ClassNotFoundException e) {
                        e.printStackTrace();
                    }

                    bb.flip();

                    if (socket.write(bb) < 0)
                        throw new EOFException();

                    bb.clear();
                }
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                System.out.println("... disconnected " + socket);
                try {
                    if (ssc != null)
                        ssc.close();
                } catch (IOException ignored) {
                }
                try {
                    if (socket != null)
                        socket.close();
                } catch (IOException ignored) {
                }
            }
        }, "server").start();
    }

    @Override
    public void run(long startTimeNs) {
        bb.position(0);
        bb.putInt(bb.remaining());
        bb.putLong(startTimeNs);
        bb.position(0);
        writeAll(socket, bb);

        bb.position(0);
        try {
            readAll(socket, bb);
        } catch (IOException e) {
            e.printStackTrace();
        }

        bb.flip();
        if (bb.getInt(0) != fixMessageBytes.length) {
            throw new AssertionError("read error");
        }
        lth.sample(System.nanoTime() - startTimeNs);
    }

    private static void readAll(SocketChannel socket, ByteBuffer bb) throws IOException {
        bb.clear();
        do {
            if (socket.read(bb) < 0)
                throw new EOFException();
        } while (bb.remaining() > 0);
    }

    private static void writeAll(SocketChannel socket, ByteBuffer bb) {
        try {
            while (bb.remaining() > 0 && socket.write(bb) >= 0) ;
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
This time the results look like this:
Warm up complete (50000 iterations took 3.83s)
-------------------------------- BENCHMARK RESULTS (RUN 1) ---------
Run time: 6.712s
Correcting for co-ordinated:true
Target throughput:20000/s = 1 message every 50us
End to End: (100,000)          50/90 99/99.9 99.99 - worst was 822,080 / 1,509,950  1,711,280 / 1,711,280  1,711,280 - 1,711,280
date serialisation  (100,000)  50/90 99/99.9 99.99 - worst was 11 / 19  31 / 50  901 - 2,420
OS Jitter (64,973)             50/90 99/99.9 99.99 - worst was 8.1 / 16  40 / 1,540  4,850 - 18,350
--------------------------------------------------------------------
-------------------------------- BENCHMARK RESULTS (RUN 2) ---------
Run time: 6.373s
Correcting for co-ordinated:true
Target throughput:20000/s = 1 message every 50us
End to End: (100,000)          50/90 99/99.9 99.99 - worst was 1,107,300 / 1,375,730  1,375,730 / 1,375,730  1,375,730 - 1,375,730
date serialisation  (100,000)  50/90 99/99.9 99.99 - worst was 11 / 19  29 / 52  901 - 1,670
OS Jitter (40,677)             50/90 99/99.9 99.99 - worst was 8.4 / 16  34 / 209  934 - 1,470
--------------------------------------------------------------------
-------------------------------- BENCHMARK RESULTS (RUN 3) ---------
Run time: 5.333s
Correcting for co-ordinated:true
Target throughput:20000/s = 1 message every 50us
End to End: (100,000)          50/90 99/99.9 99.99 - worst was 55,570 / 293,600  343,930 / 343,930  343,930 - 343,930
date serialisation  (100,000)  50/90 99/99.9 99.99 - worst was 9.0 / 16  26 / 38  770 - 1,030
OS Jitter (32,042)             50/90 99/99.9 99.99 - worst was 9.0 / 13  22 / 58  737 - 934
--------------------------------------------------------------------
-------------------------------- SUMMARY (end to end) --------------
Percentile     run1          run2          run3       % Variation
50:         822083.58    1107296.26      55574.53         92.66
90:        1509949.44    1375731.71     293601.28         71.07
99:        1711276.03    1375731.71     343932.93         66.67
99.9:      1711276.03    1375731.71     343932.93         66.67
99.99:     1711276.03    1375731.71     343932.93         66.67
worst:     1711276.03    1375731.71     343932.93         66.67
--------------------------------------------------------------------
-------------------------------- SUMMARY (date serialisation ) -----
Percentile     run1          run2          run3       % Variation
50:             11.01         11.01          8.96         13.22
90:             18.94         18.94         15.62         12.44
99:             31.23         29.18         26.11          7.27
99.9:           50.18         52.22         37.89         20.14
99.99:         901.12        901.12        770.05         10.19
worst:        2424.83       1671.17       1032.19         29.21
--------------------------------------------------------------------
As can be seen, the very same date serialisation takes over twice as long, going from ~4.5us to ~10us.
This isn’t the place to go into too much detail about why the code takes longer when run in context, but in short it comes down to the CPU caches being filled up between calls to the date serialisation.
When the date serialisation is all we are running (as in the microbenchmark), its code and data fit nicely into the CPU caches and never need to be evicted. However, when there is a gap between calls to the date serialisation, with other work running in the meantime, the cached code and data for the operation get evicted and have to be reloaded.
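To make this concrete, here is a rough, standalone sketch (not one of the benchmarks in this post, and no substitute for JLBH) that times the same serialisation round trip back-to-back and then again with a large array walk between calls to push code and data out of the caches; the expectation is that the "polluted" median comes out noticeably higher:

import java.io.*;
import java.util.Arrays;
import java.util.Date;

// Illustrative sketch of the cache effect described above: the same serialisation
// round trip is timed back-to-back, then with a large array walk between calls.
// Only the relative difference between the two medians is of interest.
public class CacheEvictionSketch {
    private static Date date = new Date();
    // ~32 MB, larger than most L3 caches
    private static final int[] POLLUTER = new int[8 * 1024 * 1024];

    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 100_000; i++) serialiseRoundTrip(); // warm up the JIT

        System.out.printf("back-to-back:   %.1f us%n", medianMicros(false));
        System.out.printf("cache-polluted: %.1f us%n", medianMicros(true));
    }

    private static double medianMicros(boolean pollute) throws Exception {
        long[] samples = new long[2_000];
        for (int i = 0; i < samples.length; i++) {
            if (pollute) {
                // Touch one int per 64-byte cache line across the whole array.
                for (int j = 0; j < POLLUTER.length; j += 16) POLLUTER[j]++;
            }
            long start = System.nanoTime();
            serialiseRoundTrip();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[samples.length / 2] / 1000.0;
    }

    private static void serialiseRoundTrip() throws IOException, ClassNotFoundException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(out);
        oos.writeObject(date);
        ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(out.toByteArray()));
        date = (Date) ois.readObject();
    }
}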
JLBH allows you to benchmark code in context and that’s an important part of latency benchmarking.
Reference: JLBH Examples 1 – Why Code Should be Benchmarked in Context from our JCG partner Daniel Shaya at the Rational Java blog.