The Effect of Co-ordinated Omission – Measure loopback latency using a simple NIO client/server
In this post I demonstrate a number of ideas and techniques:
- How to write a simple non-blocking NIO client/server
- The effects of co-ordinated omission
- How to measure latencies in percentiles (as opposed to a simple average)
- How to time loopback latency on your machine
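To make the percentile idea concrete, here is a minimal sketch of reporting percentiles from raw samples by sorting and indexing. The class and method names are my own, not from the benchmark below, and the indexing uses the simple "nearest rank" rule rather than interpolation:

```java
import java.util.Arrays;

public class Percentiles {
    // Returns the sample at the given percentile (0-100), nearest-rank method.
    static long percentile(long[] samples, double pct) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        // Index of the smallest sample at or above pct% of the distribution.
        int idx = (int) Math.ceil(pct / 100.0 * sorted.length) - 1;
        return sorted[Math.max(idx, 0)];
    }

    public static void main(String[] args) {
        long[] latenciesUs = {45, 44, 48, 50, 47, 46, 380, 44, 45, 49};
        // A single 380us outlier dominates the high percentiles but
        // barely moves the average - which is why percentiles matter.
        System.out.println("50th: " + percentile(latenciesUs, 50) + " us");
        System.out.println("99th: " + percentile(latenciesUs, 99) + " us");
    }
}
```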
I was recently developing a low latency benchmark for a client/server application. Initially I simulated the benchmark on a single machine using TCP over loopback. The first thing I wanted to quantify was how much of the recorded latency I should attribute to simple loopback latency. I would then be able to see more clearly the latency added by my actual application.
To do this I created a program (code at the end of the post) that transfers a single byte from the client to the server and back again. This is done repeatedly and the results are processed.
The program is written using non-blocking Java NIO to keep the loopback latencies as low as possible.
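The two channel calls that matter are shown below in a self-contained fragment (the class name and the throwaway local server are mine, added only so the sketch runs on its own):

```java
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class ChannelSetup {
    // Opens a client channel configured the same way as the benchmark's.
    // Returns the channel's blocking flag (false after configuration).
    static boolean setupDemo() throws Exception {
        // Throwaway local server so the fragment is self-contained.
        try (ServerSocketChannel ssc = ServerSocketChannel.open()) {
            ssc.bind(new InetSocketAddress("localhost", 0)); // any free port
            int port = ((InetSocketAddress) ssc.getLocalAddress()).getPort();
            try (SocketChannel socket = SocketChannel.open(
                    new InetSocketAddress("localhost", port))) {
                // Disable Nagle's algorithm: send each tiny message immediately.
                socket.socket().setTcpNoDelay(true);
                // Non-blocking mode: read()/write() return at once, so the
                // benchmark can busy-spin instead of parking the thread.
                socket.configureBlocking(false);
                return socket.isBlocking();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("blocking=" + setupDemo());
    }
}
```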
More important than the average time are the percentile latencies, which the program records. (See a previous post here for a discussion of how to measure latency.) Critically, the code corrects for co-ordinated omission. (To learn more about this see here from Gil Tene.) In a nutshell, you don't time a job from when it actually started but from when it should have started.
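The difference is easiest to see in the timing loop itself. Below is a simplified sketch, not the benchmark code itself: the 10 ms stall in `doWork` is artificial, injected so one slow operation visibly delays the operations queued behind it. Each operation's intended start time is derived from the target rate, and latency is measured from that intended start:

```java
public class CoordinatedOmission {
    // Hypothetical workload: iteration 50 stalls for 10 ms,
    // delaying every operation scheduled behind it.
    static void doWork(int i) throws InterruptedException {
        if (i == 50) Thread.sleep(10);
    }

    // Runs 100 operations at a target rate of 1,000/s and returns the
    // worst latency in microseconds, measured from the INTENDED start.
    static long measureWorstUs() throws InterruptedException {
        long intervalNs = 1_000_000_000L / 1_000; // 1 ms between sends
        long worst = 0;
        long next = System.nanoTime();
        for (int i = 0; i < 100; i++) {
            next += intervalNs;                // when this op SHOULD start
            while (System.nanoTime() < next) ; // busy-wait until then
            doWork(i);
            // Corrected timing: measure from the intended start, so the
            // stall also inflates the latency of the ops queued behind it.
            worst = Math.max(worst, (System.nanoTime() - next) / 1000);
        }
        return worst;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("worst latency: " + measureWorstUs() + " us");
    }
}
```

Timing from the actual start instead (`long start = System.nanoTime()` just before `doWork`) would report the stalled operation's 10 ms but hide the delay it imposed on every operation behind it.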
These are the results on my two-year-old MBP.
Starting latency test rate: 80000
Average time 2513852
Loop back echo latency was 2514247.3/3887258.6 4,196,487/4,226,913 4,229,987/4230294 4,230,294 us for 50/90 99/99.9 99.99/99.999 worst %tile
Starting latency test rate: 70000
Average time 2327041
Loop back echo latency was 2339701.6/3666542.5 3,957,860/3,986,626 3,989,404/3989763 3,989,763 us for 50/90 99/99.9 99.99/99.999 worst %tile
Starting latency test rate: 50000
Average time 1883303
Loop back echo latency was 1881621.0/2960104.0 3,203,771/3,229,260 3,231,809/3232046 3,232,046 us for 50/90 99/99.9 99.99/99.999 worst %tile
Starting latency test rate: 30000
Average time 1021576
Loop back echo latency was 1029566.5/1599881.0 1,726,326/1,739,626 1,741,098/1741233 1,741,233 us for 50/90 99/99.9 99.99/99.999 worst %tile
Starting latency test rate: 20000
Average time 304
Loop back echo latency was 65.6/831.2 3,632/4,559 4,690/4698 4,698 us for 50/90 99/99.9 99.99/99.999 worst %tile
Starting latency test rate: 10000
Average time 50
Loop back echo latency was 47.8/57.9 89/120 152/182 182 us for 50/90 99/99.9 99.99/99.999 worst %tile
Compare these results to when I did not correct for co-ordinated omission:
Starting latency test rate: 80000
Average time 45
Loop back echo latency was 44.1/48.8 71/105 124/374 374 us for 50/90 99/99.9 99.99/99.999 worst %tile
Starting latency test rate: 70000
Average time 45
Loop back echo latency was 44.1/48.9 76/106 145/358 358 us for 50/90 99/99.9 99.99/99.999 worst %tile
Starting latency test rate: 50000
Average time 45
Loop back echo latency was 43.9/48.8 74/105 123/162 162 us for 50/90 99/99.9 99.99/99.999 worst %tile
Starting latency test rate: 30000
Average time 45
Loop back echo latency was 44.0/48.8 73/104 129/147 147 us for 50/90 99/99.9 99.99/99.999 worst %tile
Starting latency test rate: 20000
Average time 45
Loop back echo latency was 44.7/49.6 78/107 135/311 311 us for 50/90 99/99.9 99.99/99.999 worst %tile
Starting latency test rate: 10000
Average time 46
Loop back echo latency was 45.1/50.8 81/112 144/184 184 us for 50/90 99/99.9 99.99/99.999 worst %tile
As you can see, the effects of throughput are completely ignored! It would appear that even at a rate of 80,000 messages a second your 99.99 percentile is 374us, when in fact it is vastly larger than that. At 80,000 messages a second the send interval is only 12.5us, well under a single ~45us round trip, so the client inevitably falls ever further behind its schedule.
In fact, you only hit your target latencies when your throughput drops to nearer 10,000 messages a second. As you would understand intuitively, there is a trade-off between throughput and latency.
The code for this test is below:
    package util;

    import java.io.EOFException;
    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;
    import java.util.Arrays;

    /**
     * Created by daniel on 02/07/2015.
     * Simple program to test loopback speeds and latencies.
     */
    public class LoopBackPingPong {
        public static final int PORT = 8007;

        public void runServer(int port) throws IOException {
            ServerSocketChannel ssc = ServerSocketChannel.open();
            ssc.bind(new InetSocketAddress(port));
            System.out.println("listening on " + ssc);
            final SocketChannel socket = ssc.accept();
            socket.socket().setTcpNoDelay(true);
            socket.configureBlocking(false);

            new Thread(() -> {
                long totalTime = 0;
                int count = 0;
                try {
                    System.out.println("Connected " + socket);
                    ByteBuffer bb = ByteBuffer.allocateDirect(1);
                    int length;
                    // Echo every byte straight back to the client.
                    while ((length = socket.read(bb)) >= 0) {
                        if (length > 0) {
                            long time = System.nanoTime();
                            bb.flip();
                            bb.position(0);
                            count++;
                            if (socket.write(bb) < 0)
                                throw new EOFException();
                            bb.clear();
                            totalTime += System.nanoTime() - time;
                        }
                    }
                } catch (IOException ignored) {
                } finally {
                    System.out.println("Total server time " + (totalTime / count) / 1000);
                    System.out.println("... disconnected " + socket);
                    try {
                        socket.close();
                    } catch (IOException ignored) {
                    }
                }
            }).start();
        }

        public void testLatency(int targetThroughput, SocketChannel socket) throws IOException {
            System.out.println("Starting latency test rate: " + targetThroughput);
            int tests = Math.min(18 * targetThroughput, 100_000);
            long[] times = new long[tests];
            int count = 0;
            long now = System.nanoTime();
            long rate = (long) (1e9 / targetThroughput);

            ByteBuffer bb = ByteBuffer.allocateDirect(4);
            bb.putInt(0, 0x12345678);

            for (int i = -20000; i < tests; i++) {
                // Correct for co-ordinated omission: pace the sends at the
                // target rate and time each echo from when it SHOULD have
                // been sent, not from when it actually was.
                now += rate;
                while (System.nanoTime() < now)
                    ;
                // To reproduce the uncorrected figures above, replace the
                // two lines above with:
                // now = System.nanoTime();

                bb.position(0);
                while (bb.remaining() > 0)
                    if (socket.write(bb) < 0)
                        throw new EOFException();

                bb.position(0);
                while (bb.remaining() > 0)
                    if (socket.read(bb) < 0)
                        throw new EOFException();

                if (bb.getInt(0) != 0x12345678)
                    throw new AssertionError("read error");

                // The first 20,000 iterations are a warm-up and are discarded.
                if (i >= 0)
                    times[count++] = System.nanoTime() - now;
            }
            System.out.println("Average time " + (Arrays.stream(times).sum() / times.length) / 1000);
            Arrays.sort(times);
            System.out.printf("Loop back echo latency was %.1f/%.1f %,d/%,d %,d/%d %,d us for 50/90 99/99.9 99.99/99.999 worst %%tile%n",
                    times[times.length / 2] / 1e3,
                    times[times.length * 9 / 10] / 1e3,
                    times[times.length - times.length / 100] / 1000,
                    times[times.length - times.length / 1000] / 1000,
                    times[times.length - times.length / 10000] / 1000,
                    times[times.length - times.length / 100000] / 1000,
                    times[times.length - 1] / 1000
            );
        }

        public static void main(String... args) throws Exception {
            int port = args.length < 1 ? PORT : Integer.parseInt(args[0]);
            LoopBackPingPong lbpp = new LoopBackPingPong();
            new Thread(() -> {
                try {
                    lbpp.runServer(port);
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            }).start();

            // give the server a chance to start
            Thread.sleep(1000);

            SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", port));
            socket.socket().setTcpNoDelay(true);
            socket.configureBlocking(false);

            for (int i : new int[]{80_000, 70_000, 50_000, 30_000, 20_000, 10_000})
                lbpp.testLatency(i, socket);
            System.exit(0);
        }
    }
Reference: The Effect of Co-ordinated Omission – Measure loopback latency using a simple NIO client/server from our JCG partner Daniel Shaya at the Rational Java blog.