JLBH Examples 3 – The Effects of Throughput on Latency
In this post:
- A discussion about the effects of throughput on latency
- How to use JLBH to measure TCP loopback
- Adding probes to test both halves of the TCP round trip
- Watching the effect of increasing throughput on latency
- Understanding that you have to drop throughput to achieve good latencies at high percentiles.
In the last post we saw the effects of accounting for coordinated omission, i.e. measuring the effect that a delay to one iteration has on subsequent iterations.
Intuitively we understand that throughput will affect latency. It seems natural that as we raise throughput we also raise latency.
Going into a very crowded shop will affect how fast you can select and purchase your goods. On the other hand, consider a very rarely frequented shop: the shopkeeper may be away from the till taking a tea break, and your purchase will be delayed while you wait for him to put down his cup of tea and make his way to the counter to serve you.
This is exactly what you find when you run benchmarks and vary the throughput.
In general, latency increases as you raise throughput, but at some point, when throughput drops below a certain threshold, latency can increase as well.
The code below times a round trip TCP call over loopback.
We add two probes:
- client2server – the amount of time taken to complete the first half of the round trip
- server2client – the amount of time taken to complete the second half of the round trip
These probes do not take coordinated omission into account; only the end-to-end time does, because it alone is measured from the scheduled (rather than actual) start of each iteration.
Here’s the code for the benchmark:
```java
package org.latency.tcp;

import net.openhft.affinity.Affinity;
import net.openhft.chronicle.core.Jvm;
import net.openhft.chronicle.core.jlbh.JLBH;
import net.openhft.chronicle.core.jlbh.JLBHOptions;
import net.openhft.chronicle.core.jlbh.JLBHTask;
import net.openhft.chronicle.core.util.NanoSampler;

import java.io.EOFException;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class TcpBenchmark implements JLBHTask {
    private final static int port = 8007;
    private static final boolean BLOCKING = false;
    private final int SERVER_CPU = Integer.getInteger("server.cpu", 0);
    private JLBH jlbh;
    private ByteBuffer bb;
    private SocketChannel socket;
    private NanoSampler client2serverProbe;
    private NanoSampler server2clientProbe;

    public static void main(String[] args) {
        JLBHOptions jlbhOptions = new JLBHOptions()
                .warmUpIterations(50000)
                .iterations(50000)
                .throughput(20000)
                .runs(5)
                .jlbhTask(new TcpBenchmark());
        new JLBH(jlbhOptions).start();
    }

    @Override
    public void init(JLBH jlbh) {
        this.jlbh = jlbh;
        client2serverProbe = jlbh.addProbe("client2server");
        server2clientProbe = jlbh.addProbe("server2clientProbe");

        try {
            // Start the echo server, give it time to bind, then connect the client.
            runServer(port);
            Jvm.pause(200);

            socket = SocketChannel.open(new InetSocketAddress(port));
            socket.socket().setTcpNoDelay(true);
            socket.configureBlocking(BLOCKING);
        } catch (IOException e) {
            e.printStackTrace();
        }

        bb = ByteBuffer.allocateDirect(8).order(ByteOrder.nativeOrder());
    }

    private void runServer(int port) throws IOException {
        new Thread(() -> {
            if (SERVER_CPU > 0) {
                System.out.println("server cpu: " + SERVER_CPU);
                Affinity.setAffinity(SERVER_CPU);
            }
            ServerSocketChannel ssc = null;
            SocketChannel socket = null;
            try {
                ssc = ServerSocketChannel.open();
                ssc.bind(new InetSocketAddress(port));
                System.out.println("listening on " + ssc);

                socket = ssc.accept();
                socket.socket().setTcpNoDelay(true);
                socket.configureBlocking(BLOCKING);

                System.out.println("Connected " + socket);

                ByteBuffer bb = ByteBuffer.allocateDirect(8).order(ByteOrder.nativeOrder());
                while (true) {
                    // Read the client's send timestamp and sample the first half of the trip.
                    readAll(socket, bb);
                    bb.flip();
                    long time = System.nanoTime();
                    client2serverProbe.sampleNanos(time - bb.getLong());

                    // Echo the server timestamp back so the client can time the second half.
                    bb.clear();
                    bb.putLong(time);
                    bb.flip();
                    writeAll(socket, bb);
                }
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                System.out.println("... disconnected " + socket);
                try {
                    if (ssc != null)
                        ssc.close();
                } catch (IOException ignored) {
                }
                try {
                    if (socket != null)
                        socket.close();
                } catch (IOException ignored) {
                }
            }
        }, "server").start();
    }

    private static void readAll(SocketChannel socket, ByteBuffer bb) throws IOException {
        bb.clear();
        do {
            if (socket.read(bb) < 0)
                throw new EOFException();
        } while (bb.remaining() > 0);
    }

    @Override
    public void run(long startTimeNs) {
        // Send the current nanoTime to the server...
        bb.position(0);
        bb.putLong(System.nanoTime());
        bb.position(0);
        writeAll(socket, bb);

        // ...and read back the server's timestamp to sample the return half.
        bb.position(0);
        try {
            readAll(socket, bb);
            server2clientProbe.sampleNanos(System.nanoTime() - bb.getLong(0));
        } catch (IOException e) {
            e.printStackTrace();
        }

        // End-to-end time, measured from the (coordinated-omission corrected) start time.
        jlbh.sample(System.nanoTime() - startTimeNs);
    }

    private static void writeAll(SocketChannel socket, ByteBuffer bb) {
        try {
            while (bb.remaining() > 0 && socket.write(bb) >= 0)
                ;
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    @Override
    public void complete() {
        System.exit(0);
    }
}
```
And here are the results (times in microseconds) when run at a throughput of 20,000 iterations / second:
```
Warm up complete (50000 iterations took 2.296s)
-------------------------------- BENCHMARK RESULTS (RUN 1) ---------
Run time: 2.5s
Correcting for co-ordinated: true
Target throughput:20000/s = 1 message every 50us
End to End: (50,000)         50/90 99/99.9 99.99 - worst was 34 / 2,950  19,400 / 20,450  20,450 - 20,450
client2server (50,000)       50/90 99/99.9 99.99 - worst was 16 / 26  38 / 72  287 - 336
server2clientProbe (50,000)  50/90 99/99.9 99.99 - worst was 16 / 27  40 / 76  319 - 901
OS Jitter (26,960)           50/90 99/99.9 99.99 - worst was 9.0 / 16  44 / 1,340  10,220 - 11,800
--------------------------------------------------------------------
-------------------------------- BENCHMARK RESULTS (RUN 2) ---------
Run time: 2.5s
Correcting for co-ordinated: true
Target throughput:20000/s = 1 message every 50us
End to End: (50,000)         50/90 99/99.9 99.99 - worst was 42 / 868  4,590 / 5,110  5,370 - 5,370
client2server (50,000)       50/90 99/99.9 99.99 - worst was 20 / 27  38 / 92  573 - 2,560
server2clientProbe (50,000)  50/90 99/99.9 99.99 - worst was 19 / 27  38 / 72  868 - 1,740
OS Jitter (13,314)           50/90 99/99.9 99.99 - worst was 9.0 / 16  32 / 96  303 - 672
--------------------------------------------------------------------
-------------------------------- BENCHMARK RESULTS (RUN 3) ---------
Run time: 2.5s
Correcting for co-ordinated: true
Target throughput:20000/s = 1 message every 50us
End to End: (50,000)         50/90 99/99.9 99.99 - worst was 34 / 152  999 / 2,160  2,290 - 2,290
client2server (50,000)       50/90 99/99.9 99.99 - worst was 17 / 26  36 / 54  201 - 901
server2clientProbe (50,000)  50/90 99/99.9 99.99 - worst was 16 / 25  36 / 50  225 - 1,740
OS Jitter (14,306)           50/90 99/99.9 99.99 - worst was 9.0 / 15  23 / 44  160 - 184
--------------------------------------------------------------------
-------------------------------- SUMMARY (end to end)---------------
Percentile     run1         run2         run3      % Variation
50:               33.79        41.98        33.79        13.91
90:             2949.12       868.35       151.55        75.92
99:            19398.66      4587.52       999.42        70.53
99.9:          20447.23      5111.81      2162.69        47.62
99.99:         20447.23      5373.95      2293.76        47.24
worst:         20447.23      5373.95      2293.76        47.24
--------------------------------------------------------------------
-------------------------------- SUMMARY (client2server)------------
Percentile     run1         run2         run3      % Variation
50:               16.13        19.97        16.90        10.81
90:               26.11        27.14        26.11         2.55
99:               37.89        37.89        35.84         3.67
99.9:             71.68        92.16        54.27        31.76
99.99:           286.72       573.44       200.70        55.32
worst:           335.87      2555.90       901.12        55.04
--------------------------------------------------------------------
-------------------------------- SUMMARY (server2clientProbe)-------
Percentile     run1         run2         run3      % Variation
50:               16.13        18.94        16.13        10.43
90:               27.14        27.14        25.09         5.16
99:               39.94        37.89        35.84         3.67
99.9:             75.78        71.68        50.18        22.22
99.99:           319.49       868.35       225.28        65.55
worst:           901.12      1736.70      1736.70         0.00
--------------------------------------------------------------------
```
What should happen is that:
client2server + server2client ~= endToEnd
And this is more or less what happens at the 50th percentile.
Take the second run for the purposes of this demonstration:
19.97 + 18.94 ~= 41.98
If that were all you were measuring, you might conclude that there would be no problem running 20k messages/second through my machine.
However, my laptop clearly can't handle this volume. If we look at the second run again, this time at the 90th percentile:
27.14 + 27.14 !~= 868.35
And as you move up the percentiles it just gets worse and worse. Because only the end-to-end time is corrected for coordinated omission, the backlog that builds up when the machine falls behind the 50μs-per-message schedule shows up in the end-to-end numbers and not in the probes.
If, however, I change the throughput to 5k messages/second, I see this at the 90th percentile:
32.23 + 29.38 ~= 62.46
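For reference, that change is just the throughput option in the options builder; a minimal sketch of the adjusted main(), with everything else left exactly as in the listing above:

```java
JLBHOptions jlbhOptions = new JLBHOptions()
        .warmUpIterations(50000)
        .iterations(50000)
        .throughput(5000)   // dropped from 20,000 to 5,000 messages/second
        .runs(5)
        .jlbhTask(new TcpBenchmark());
new JLBH(jlbhOptions).start();
```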
So we see that if you want to achieve low latencies at the high percentiles, you have to drop your throughput to a level your machine can actually sustain.
This is why it is so important to be able to vary throughput with JLBH.
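As a rough sketch of how that might be automated (hypothetical glue code, not part of JLBH: it assumes complete() is changed not to call System.exit(0), and that each run can rebind the benchmark's fixed port; running a fresh JVM per rate is the safer variant):

```java
package org.latency.tcp;

import net.openhft.chronicle.core.jlbh.JLBH;
import net.openhft.chronicle.core.jlbh.JLBHOptions;

// Hypothetical harness that reruns the TCP benchmark at several target rates,
// to find the throughput at which the high percentiles start to blow out.
public class ThroughputSweep {
    public static void main(String[] args) {
        for (int throughput : new int[]{2_000, 5_000, 10_000, 20_000}) {
            System.out.println("=== target throughput: " + throughput + " msgs/sec ===");
            JLBHOptions options = new JLBHOptions()
                    .warmUpIterations(50_000)
                    .iterations(50_000)
                    .throughput(throughput)
                    .runs(3)
                    .jlbhTask(new TcpBenchmark()); // assumes complete() no longer exits the JVM
            new JLBH(options).start();
        }
    }
}
```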
Reference: JLBH Examples 3 – The Affects of Throughput on Latency from our JCG partner Daniel Shaya at the Rational Java blog.