Using YAML over the network
Overview
There are a number of popular text-based protocols for exchanging data over the network. These include XML, FIX, and JSON. Chronicle Engine uses YAML, which has some advantages and disadvantages.
Isn’t text slower than binary?
Text protocols are slower than binary protocols. The cost of encoding numbers, and even Unicode strings, adds overhead for the CPU.
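To make that overhead concrete, here is a minimal sketch using only the JDK (it is not how Chronicle Wire does it, and the class name `NumberEncodingCost` is just for illustration). It encodes the same `long` as decimal text and as fixed-width binary, and shows that parsing the text needs a multiply and add per digit while the binary form is a single read.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class NumberEncodingCost {
    /** Text form: one ASCII byte per character of the decimal representation. */
    static byte[] asText(long value) {
        return Long.toString(value).getBytes(StandardCharsets.US_ASCII);
    }

    /** Binary form: always 8 bytes, written without any per-digit work. */
    static byte[] asBinary(long value) {
        return ByteBuffer.allocate(Long.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(value)
                .array();
    }

    /** Parsing text costs a multiply and an add per digit (non-negative values only in this sketch). */
    static long parseText(byte[] asciiDigits) {
        long result = 0;
        for (byte digit : asciiDigits) {
            result = result * 10 + (digit - '0');
        }
        return result;
    }

    /** Parsing binary is a single fixed-width read. */
    static long parseBinary(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getLong();
    }

    public static void main(String[] args) {
        long value = 1234567890L;
        System.out.println("text bytes:   " + asText(value).length);
        System.out.println("binary bytes: " + asBinary(value).length);
    }
}
```

Note that text is not always larger: a small number such as 123 takes three bytes as text. The extra cost is mostly in the per-character work.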
While text is slower, it has one major advantage over binary: human readability. This makes it much easier to describe the protocol and to implement a solution for the interface without using a framework.
While binary is faster than text, you may find that text is fast enough, in which case you want a format which is as easy to work with as possible.
The following table lists the latency for serializing and deserializing an object with 6 fields of different types. TextWire is the YAML format, BinaryWire is a binary form, and RawWire and SBE are other binary formats.
For more details see Chronicle-Wire/microbenchmarks
All times are in microseconds:
Wire Format | Bytes | 99.9 %tile | 99.99 %tile | 99.999 %tile |
---|---|---|---|---|
YAML (TextWire) | 91 | 2.81 | 4.94 | 8.62 |
YAML (TextWire) | 91 | 2.59 | 4.70 | 8.58 |
JSONWire | 100 | 3.11 | 5.56 | 10.62 |
BinaryWire text fields | 70 | 1.57 | 3.42 | 7.14 |
BinaryWire number fields | 44 | 0.67 | 2.44 | 5.93 |
BinaryWire field less | 32 | 0.65 | 2.42 | 5.53 |
RawWire UTF-8 | 43 | 0.49 | 2.07 | 4.87 |
RawWire 8-bit | 43 | 0.40 | 0.57 | 2.90 |
BytesMarshallable | 39 | 0.17 | 0.21 | 2.13 |
BytesMarshallable + stop bit encoding | 28 | 0.21 | 0.25 | 2.40 |
It is usually in these high percentiles that common Java libraries show much higher results, usually due to GC pauses. Even at modest throughput this latency jitter starts to matter. For example, if you are processing 10,000 messages per second, the following jitter would delay 14-15 messages, not just one.
Format | Size in bytes | 99.99 %tile latency |
---|---|---|
Jackson | 100 | 8.3 μs |
BSON + C-Bytes | 96 | 15.1 μs |
Snake YAML | 88 | 4,067 μs |
Boon JSON | 99 | 32.5 μs |
Externalizable | 197 | 29.3 μs |
“+ C-Bytes” means the library was used with Chronicle Bytes to recycle the buffer.
While Jackson had a good result at the 99.99 %tile, its 99.999 %tile was 1,405 μs.
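One way to read the 14-15 figure, assuming it refers to Jackson's 99.999 %tile of 1,405 microseconds: at 10,000 messages per second a new message arrives every 100 microseconds, so a stall of that length overlaps about 14 arrivals. The small sketch below just does that arithmetic; the class and method names are ours.

```java
final class JitterImpact {
    /**
     * Estimates how many messages arrive while one message is stalled,
     * assuming evenly spaced arrivals.
     */
    static double messagesArrivingDuring(double pauseMicros, double messagesPerSecond) {
        double microsBetweenMessages = 1_000_000.0 / messagesPerSecond;
        return pauseMicros / microsBetweenMessages;
    }

    public static void main(String[] args) {
        // Jackson's 99.999 %tile and Snake YAML's 99.99 %tile from the text above.
        System.out.println(messagesArrivingDuring(1_405, 10_000));
        System.out.println(messagesArrivingDuring(4_067, 10_000));
    }
}
```

By the same arithmetic, the Snake YAML figure of 4,067 microseconds would overlap roughly 40 arrivals.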
What do these formats look like?
TextWire
This format has a 4-byte size prefix, which is shown decoded as the first line:
--- !!data
price: 1234
flag: true
text: Hello World!
side: Sell
smallInt: 123
longInt: 1234567890
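As a rough sketch of the framing, the code below prepends a 4-byte length to the YAML text and reads it back. It uses plain JDK classes rather than Chronicle Wire. The little-endian byte order is inferred from the hex dumps later in this article (for example `27 00 00 00` in front of 39 bytes of data), and any flag bits the real header may carry are ignored. The class name and the exact line layout of `SAMPLE` are assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class LengthPrefixedYaml {
    // The six fields from the example, one per line (layout assumed).
    static final String SAMPLE =
            "price: 1234\n"
          + "flag: true\n"
          + "text: Hello World!\n"
          + "side: Sell\n"
          + "smallInt: 123\n"
          + "longInt: 1234567890\n";

    /** Prepends a 4-byte little-endian length to the UTF-8 encoded YAML text. */
    static byte[] frame(String yaml) {
        byte[] payload = yaml.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Integer.BYTES + payload.length)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    /** Reads the length prefix and returns the YAML text that follows it. */
    static String unframe(byte[] message) {
        ByteBuffer buffer = ByteBuffer.wrap(message).order(ByteOrder.LITTLE_ENDIAN);
        byte[] payload = new byte[buffer.getInt()];
        buffer.get(payload);
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] message = frame(SAMPLE);
        System.out.println(message.length + " bytes on the wire");
        System.out.print(unframe(message));
    }
}
```

With this layout the payload is 87 bytes, so the framed message is 91 bytes, which agrees with the size listed for TextWire in the table above.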
BinaryWire with text fields
This is what the data looks like when automatically translated into text.
--- !!data #binary
price: 1234
flag: true
text: Hello World!
side: Sell
smallInt: 123
longInt: 1234567890
BinaryWire with number fields
This is what the data looks like when automatically translated into text.
--- !!data #binary
3: 1234
4: true
5: Hello World!
6: Sell
1: 123
2: 1234567890
RawWire without meta data
00000000 27 00 00 00 00 00 00 00 00 48 93 40 B1 0C 48 65 '······· ·H·@··He
00000010 6C 6C 6F 20 57 6F 72 6C 64 21 04 53 65 6C 6C 7B llo Worl d!·Sell{
00000020 00 00 00 D2 02 96 49 00 00 00 00 ······I· ···
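The dump above can be picked apart with a plain `ByteBuffer`. This is a sketch of our reading of the bytes, not Chronicle's decoder. It assumes little-endian values, the same field order as the TextWire example, text stored as a one-byte length followed by 8-bit characters, and that the single byte `B1` carries the `flag` value. The class name is illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class RawDumpReader {
    // The 43 bytes shown in the hex dump above.
    static final String HEX =
            "27 00 00 00 00 00 00 00 00 48 93 40 B1 0C 48 65 "
          + "6C 6C 6F 20 57 6F 72 6C 64 21 04 53 65 6C 6C 7B "
          + "00 00 00 D2 02 96 49 00 00 00 00";

    /** Assumed text encoding: a one-byte length followed by that many 8-bit characters. */
    static String readText(ByteBuffer in) {
        byte[] chars = new byte[in.get() & 0xFF];
        in.get(chars);
        return new String(chars, StandardCharsets.ISO_8859_1);
    }

    /** Decodes the dump, assuming the same field order as the TextWire example. */
    static Map<String, Object> decode() {
        String[] pairs = HEX.split(" ");
        ByteBuffer in = ByteBuffer.allocate(pairs.length).order(ByteOrder.LITTLE_ENDIAN);
        for (String pair : pairs) {
            in.put((byte) Integer.parseInt(pair, 16));
        }
        in.flip();

        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("length", in.getInt());    // 4-byte size prefix
        fields.put("price", in.getDouble());  // 8-byte IEEE 754 double
        fields.put("flag", in.get() & 0xFF);  // single byte, 0xB1 here; meaning assumed
        fields.put("text", readText(in));
        fields.put("side", readText(in));
        fields.put("smallInt", in.getInt());  // 4-byte int
        fields.put("longInt", in.getLong());  // 8-byte long
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(decode());
    }
}
```

Under those assumptions the prefix decodes to 39, which is the 43 bytes of the message minus the 4-byte prefix itself, and the remaining fields come out with the same values as in the text form.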
BytesMarshallable with stop bit encoding
00000000 18 00 00 00 A0 A4 69 D2 85 D8 CC 04 7B 59 00 0C ······i· ····{Y··
00000010 48 65 6C 6C 6F 20 57 6F 72 6C 64 21 Hello Wo rld!
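Stop bit encoding stores an integer 7 bits at a time, with the top bit of each byte saying whether more bytes follow. The sketch below is a minimal version for non-negative values only, written from that description rather than taken from Chronicle Bytes. It does not cover negative numbers or the encoding used for the `price` field. Encoding 1234567890 this way gives `D2 85 D8 CC 04`, and 123 gives `7B`, both of which appear in the dump above.

```java
import java.io.ByteArrayOutputStream;

final class StopBit {
    /** Encodes a non-negative value 7 bits at a time, least significant group first. */
    static byte[] encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("this sketch handles non-negative values only");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (value > 0x7F) {
            out.write((int) (value & 0x7F) | 0x80); // top bit set: more bytes follow
            value >>>= 7;
        }
        out.write((int) value); // top bit clear: this is the last byte
        return out.toByteArray();
    }

    /** Decodes a value written by encode, stopping at the first byte with the top bit clear. */
    static long decode(byte[] bytes) {
        long value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (long) (b & 0x7F) << shift;
            shift += 7;
            if ((b & 0x80) == 0) {
                break;
            }
        }
        return value;
    }

    public static void main(String[] args) {
        for (byte b : encode(1234567890L)) {
            System.out.printf("%02X ", b);
        }
        System.out.println();
    }
}
```

Small values take a single byte, which is why this form is the most compact in the table, at the cost of a little more work per number.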
Simple Binary Encoding
00000000 29 00 7B 00 00 00 D2 02 96 49 00 00 00 00 00 00 )·{····· ·I······
00000010 00 00 00 48 93 40 01 0C 48 65 6C 6C 6F 20 57 6F ···H·@·· Hello Wo
00000020 72 6C 64 21 00 00 00 00 01 00 00 rld!···· ···
Human readable
XML and JSON are derived text formats. They are reduced forms of SGML and JavaScript respectively. This means they can be read by humans but weren’t designed for this purpose. As we will see, not being specifically designed for human readability has some advantages.
YAML: A format designed for human readability.
The advantage of YAML, as we see it, is that it was specifically designed for human readability. This means it is less verbose, and has a richer set of constructs.
The main disadvantage is that it was designed for human readability (rather than machine readability). Its richer set of constructs means that you can arrange the data to taste, though writing a program to arrange data in a tasteful manner is much harder.
A related disadvantage is that different implementations can be incompatible with each other, as there are more options to support, some of which are left to interpretation. For example, symbols should be placed in quotes if needed, and different libraries have different ideas of whether a given symbol needs to be quoted. In the spec, there are examples where strings with quotes in them are not themselves quoted. There are also two kinds of quotes, single and double.
So why use YAML?
YAML has the advantage that it was at least designed for reading by humans. It is not the fastest format, though it can be more than fast enough, and is a very readable format. If you compare this with XML, JSON, or FIX, these were not designed for speed, nor are they particularly readable.
Protocol Documentation
Using YAML makes it easy to document what needs to be sent over the wire. We have unit tests for different functionality where we log what is sent and received in text. We add some meta data around those messages and have an output which can be directly included into our documentation. This means we have confidence it is correct.
As text, we can include what the message is expected to look like and easily detect when even minor changes have altered the message contents. This is as simple as checking that the strings match. The IDE can then show you a multi-line comparison so you can see the exact field which has been altered.
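A minimal sketch of that kind of check, using only the JDK. `toYaml` is a stand-in for logging what was really sent over the wire, and `firstDifferingLine` does roughly what an IDE comparison would highlight; both names are ours and not part of any Chronicle API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class WireTextCheck {
    /** Renders fields as one "key: value" line each; a stand-in for logged wire traffic. */
    static String toYaml(Map<String, Object> fields) {
        StringBuilder sb = new StringBuilder();
        fields.forEach((key, value) -> sb.append(key).append(": ").append(value).append('\n'));
        return sb.toString();
    }

    /** Returns the 1-based number of the first line that differs, or 0 if the texts match. */
    static int firstDifferingLine(String expected, String actual) {
        if (expected.equals(actual)) {
            return 0;
        }
        String[] expectedLines = expected.split("\n", -1);
        String[] actualLines = actual.split("\n", -1);
        int common = Math.min(expectedLines.length, actualLines.length);
        for (int i = 0; i < common; i++) {
            if (!expectedLines[i].equals(actualLines[i])) {
                return i + 1;
            }
        }
        return common + 1; // one text is a prefix of the other
    }

    public static void main(String[] args) {
        Map<String, Object> sent = new LinkedHashMap<>();
        sent.put("price", 1234);
        sent.put("side", "Buy");
        String expected = "price: 1234\nside: Sell\n";
        System.out.println("first differing line: " + firstDifferingLine(expected, toYaml(sent)));
    }
}
```

In a unit test the same idea is usually a single string equality assertion, with the expected text kept alongside the test so it can be copied into the documentation.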
What can we do about YAML being slow?
We use a high-level API, Chronicle Wire, where you can choose the exact wire format as an independent concern. This means we can switch to using a binary protocol once we have checked that the text protocol works. We use a binary translation of YAML, but we can also use a raw data format which strips away all the meta data for maximum speed.
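The idea of treating the wire format as an independent concern can be sketched as follows. To be clear, this is not the Chronicle Wire API: `FieldWriter`, `YamlFieldWriter`, `RawFieldWriter`, and `Order` are hypothetical names made up for this illustration. The point is only that the message describes its fields once and the format is picked by whoever supplies the writer.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a wire abstraction; NOT the Chronicle Wire API. */
interface FieldWriter {
    void writeInt(String name, int value);
    void writeText(String name, String value);
    byte[] toBytes();
}

/** Text form: keeps the field names, one "name: value" line per field. */
final class YamlFieldWriter implements FieldWriter {
    private final StringBuilder sb = new StringBuilder();
    public void writeInt(String name, int value) { sb.append(name).append(": ").append(value).append('\n'); }
    public void writeText(String name, String value) { sb.append(name).append(": ").append(value).append('\n'); }
    public byte[] toBytes() { return sb.toString().getBytes(StandardCharsets.UTF_8); }
}

/** Raw form: drops the field names and writes only the values. */
final class RawFieldWriter implements FieldWriter {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    public void writeInt(String name, int value) {
        for (int shift = 0; shift < 32; shift += 8) {
            out.write(value >>> shift); // 4 bytes, little-endian
        }
    }
    public void writeText(String name, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        out.write(utf8.length); // one-byte length: up to 255 bytes in this sketch
        out.write(utf8, 0, utf8.length);
    }
    public byte[] toBytes() { return out.toByteArray(); }
}

/** The message describes its fields once; it does not know which format is in use. */
final class Order {
    final int smallInt = 123;
    final String side = "Sell";

    byte[] encode(FieldWriter writer) {
        writer.writeInt("smallInt", smallInt);
        writer.writeText("side", side);
        return writer.toBytes();
    }

    public static void main(String[] args) {
        Order order = new Order();
        System.out.print(new String(order.encode(new YamlFieldWriter()), StandardCharsets.UTF_8));
        System.out.println(order.encode(new RawFieldWriter()).length + " bytes in raw form");
    }
}
```

Swapping `YamlFieldWriter` for `RawFieldWriter` changes what goes over the network without touching `Order`, which is what makes it practical to develop and test in text and deploy in binary.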
We also have tools to convert “Binary YAML” automatically to YAML for logging and debugging purposes.
Conclusion
Being able to use YAML for testing and development is a productive way to develop new solutions, and having the option to switch to a binary form for speed is a good way to get a combination of readability and speed.
Reference: Using YAML over the network from our JCG partner Peter Lawrey at the Vanilla Java blog.