JLBH Examples 3 – The Affects of Throughput on Latency
In this post:
- A discussion about the effects of thoughput on latency
- How use JLBH to measure TCP loopback
- Adding probes to test both halves of the TCP round trip
- Watching the effect of increasing throughput on latency
- Understanding that you have to drop throughput to achieve good latencies at high percentiles.
In the post we saw the effects of accounting for coordinated omission or measuring the effect a delay to one iteration will have an affect on subsequent iterations.
Intuitively we understand that throughput will affect latency. It seems natural that a
s we raise throughput we will also raise latency.
Going into a very crowded shop will affect how fast you can select and purchase your goods. On the other hand consider a very rarely frequented shop. It could be that in such a shop the shopkeeper is away from the till having a tea break and that your purchase will be delayed as you wait for him to put down his cup of tea and make his way to the counter where he can server you.
This is exactly what you find when running benchmarks and vary throughput.
In general as you raise throughput latency increases but at some point when throughput drops below a certain threshold latency can increase as well.
The code below times a round trip TCP call over loopback.
We add two probes:
- client2server – the amount of time taken to complete the first half of the round trip
- server2client – the amount of time taken to complete the second half of the trip
These probes do not take into account co-ordinated omission – it is only the end to end time that takes coordinated omission into account.
Here’s the code for the benchmark:
package org.latency.tcp; import net.openhft.affinity.Affinity; import net.openhft.chronicle.core.Jvm; import net.openhft.chronicle.core.jlbh.JLBHOptions; import net.openhft.chronicle.core.jlbh.JLBHTask; import net.openhft.chronicle.core.jlbh.JLBH; import net.openhft.chronicle.core.util.NanoSampler; import java.io.EOFException; import java.io.IOException; import java.net.InetSocketAddress; import java.nio.ByteBuffer; import java.nio.ByteOrder; import java.nio.channels.ServerSocketChannel; import java.nio.channels.SocketChannel; public class TcpBenchmark implements JLBHTask { private final static int port = 8007; private static final boolean BLOCKING = false; private final int SERVER_CPU = Integer.getInteger("server.cpu", 0); private JLBH jlbh; private ByteBuffer bb; private SocketChannel socket; private NanoSampler client2serverProbe; private NanoSampler server2clientProbe; public static void main(String[] args) { JLBHOptions jlbhOptions = new JLBHOptions() .warmUpIterations(50000) .iterations(50000) .throughput(20000) .runs(5) .jlbhTask(new TcpBenchmark()); new JLBH(jlbhOptions).start(); } @Override public void init(JLBH jlbh) { this.jlbh = jlbh; client2serverProbe = jlbh.addProbe("client2server"); server2clientProbe = jlbh.addProbe("server2clientProbe"); try { runServer(port); Jvm.pause(200); socket = SocketChannel.open(new InetSocketAddress(port)); socket.socket().setTcpNoDelay(true); socket.configureBlocking(BLOCKING); } catch (IOException e) { e.printStackTrace(); } bb = ByteBuffer.allocateDirect(8).order(ByteOrder.nativeOrder()); } private void runServer(int port) throws IOException { new Thread(() -> { if (SERVER_CPU > 0) { System.out.println("server cpu: " + SERVER_CPU); Affinity.setAffinity(SERVER_CPU); } ServerSocketChannel ssc = null; SocketChannel socket = null; try { ssc = ServerSocketChannel.open(); ssc.bind(new InetSocketAddress(port)); System.out.println("listening on " + ssc); socket = ssc.accept(); socket.socket().setTcpNoDelay(true); socket.configureBlocking(BLOCKING); System.out.println("Connected " + socket); ByteBuffer bb = ByteBuffer.allocateDirect(8).order(ByteOrder.nativeOrder()); while (true) { readAll(socket, bb); bb.flip(); long time = System.nanoTime(); client2serverProbe.sampleNanos(time - bb.getLong()); bb.clear(); bb.putLong(time); bb.flip(); writeAll(socket, bb); } } catch (IOException e) { e.printStackTrace(); } finally { System.out.println("... disconnected " + socket); try { if (ssc != null) ssc.close(); } catch (IOException ignored) { } try { if (socket != null) socket.close(); } catch (IOException ignored) { } } }, "server").start(); } private static void readAll(SocketChannel socket, ByteBuffer bb) throws IOException { bb.clear(); do { if (socket.read(bb) < 0) throw new EOFException(); } while (bb.remaining() > 0); } @Override public void run(long startTimeNs) { bb.position(0); bb.putLong(System.nanoTime()); bb.position(0); writeAll(socket, bb); bb.position(0); try { readAll(socket, bb); server2clientProbe.sampleNanos(System.nanoTime() - bb.getLong(0)); } catch (IOException e) { e.printStackTrace(); } jlbh.sample(System.nanoTime() - startTimeNs); } private static void writeAll(SocketChannel socket, ByteBuffer bb) { try { while (bb.remaining() > 0 && socket.write(bb) >= 0) ; } catch (IOException e) { e.printStackTrace(); } } @Override public void complete() { System.exit(0); } }
And here are the results when run at a throughput of 20,000 iterations / second:
Warm up complete (50000 iterations took 2.296s) -------------------------------- BENCHMARK RESULTS (RUN 1) ---------Run time: 2.5s Correcting for co-ordinated:true Target throughput:20000/s = 1 message every 50us End to End: (50,000) 50/90 99/99.9 99.99 - worst was 34 / 2,950 19,400 / 20,450 20,450 - 20,450 client2server (50,000) 50/90 99/99.9 99.99 - worst was 16 / 26 38 / 72 287 - 336 server2clientProbe (50,000) 50/90 99/99.9 99.99 - worst was 16 / 27 40 / 76 319 - 901 OS Jitter (26,960) 50/90 99/99.9 99.99 - worst was 9.0 / 16 44 / 1,340 10,220 - 11,800 -------------------------------------------------------------------- -------------------------------- BENCHMARK RESULTS (RUN 2) --------- Run time: 2.5s Correcting for co-ordinated:true Target throughput:20000/s = 1 message every 50us End to End: (50,000) 50/90 99/99.9 99.99 - worst was 42 / 868 4,590 / 5,110 5,370 - 5,370 client2server (50,000) 50/90 99/99.9 99.99 - worst was 20 / 27 38 / 92 573 - 2,560 server2clientProbe (50,000) 50/90 99/99.9 99.99 - worst was 19 / 27 38 / 72 868 - 1,740 OS Jitter (13,314) 50/90 99/99.9 99.99 - worst was 9.0 / 16 32 / 96 303 - 672 -------------------------------------------------------------------- -------------------------------- BENCHMARK RESULTS (RUN 3) --------- Run time: 2.5s Correcting for co-ordinated:true Target throughput:20000/s = 1 message every 50us End to End: (50,000) 50/90 99/99.9 99.99 - worst was 34 / 152 999 / 2,160 2,290 - 2,290 client2server (50,000) 50/90 99/99.9 99.99 - worst was 17 / 26 36 / 54 201 - 901 server2clientProbe (50,000) 50/90 99/99.9 99.99 - worst was 16 / 25 36 / 50 225 - 1,740 OS Jitter (14,306) 50/90 99/99.9 99.99 - worst was 9.0 / 15 23 / 44 160 - 184 ---------------------------------------------------------------------------------------------------- SUMMARY (end to end)--------------- Percentile run1 run2 run3 % Variation var(log) 50: 33.79 41.98 33.79 13.91 90: 2949.12 868.35 151.55 75.92 99: 19398.66 4587.52 999.42 70.53 99.9: 20447.23 5111.81 2162.69 47.62 99.99: 20447.23 5373.95 2293.76 47.24 worst: 20447.23 5373.95 2293.76 47.24 -------------------------------------------------------------------- -------------------------------- SUMMARY (client2server)------------ Percentile run1 run2 run3 % Variation 50: 16.13 19.97 16.90 10.81 90: 26.11 27.14 26.11 2.55 99: 37.89 37.89 35.84 3.67 99.9: 71.68 92.16 54.27 31.76 99.99: 286.72 573.44 200.70 55.32 worst: 335.87 2555.90 901.12 55.04 -------------------------------------------------------------------- -------------------------------- SUMMARY (server2clientProbe)------- Percentile run1 run2 run3 % Variation 50: 16.13 18.94 16.13 10.43 90: 27.14 27.14 25.09 5.16 99: 39.94 37.89 35.84 3.67 99.9: 75.78 71.68 50.18 22.22 99.99: 319.49 868.35 225.28 65.55 worst: 901.12 1736.70 1736.70 0.00 --------------------------------------------------------------------
What should happen is that:
client2server + server2client ~= endToEnd
And this more of less what happens at the 50th percentile
Take the second run for the purposes of this demonstration:
19.97 + 18.94 ~= 41.98
If that was all you were measuring you might say that there would be no problem running 20k/s messages through my machine.
However my laptop clearly can’t handle this volume and if we look at the second run again for the 90th percentile.
27.14 + 27.14 !~= 868.35
And as you move up the percentiles it just gets worse and worse…
If however I change the throughput to 5k messages / second I see this on the 90th percentile:
32.23 + 29.38 ~= 62.46
So we see that if you want to achieve low latencies at the high percentiles you have to drop your throughput to the correct level.
This is why it so important that we are able to vary throughputs with JLBH.
Reference: | JLBH Examples 3 – The Affects of Throughput on Latency from our JCG partner Daniel Shaya at the Rational Java blog. |