Writing 2 Characters into a Single Java char
Here’s another nice trick we used when creating the ultra low latency Chronicle FIX-Engine.
When it comes to reading data off a stream of bytes it’s far more efficient, where possible, to store the data in a char rather than having to read it into a String. (At the very least you avoid creating a String object, although this can be mitigated by using a cache or by working with CharSequence rather than String, but that’s the subject of another post.)
Using JMH benchmarks I’ve found the timings below. (I haven’t included the source code, as this is going to be the subject of another post where I describe the different methodologies in more detail.)
Reading 2 ASCII characters off a byte stream into:
String - 34.48 ns
Pooled String - 28.57 ns
StringBuilder - 21.27 ns
char (using the 2 chars method) - 6.75 ns
The point is that it takes at least 3 times longer to read data into a String than into a char, and that doesn’t even take into account the garbage created.
So it goes without saying that when you know the data you are expecting is always a single character, you should read it into a char rather than into a String variable.
Now what if you know that the data you are expecting on the stream is no more than 2 characters? (You find this situation, for example, in FIX 5.0 tag 35, MsgType.) Do you have to use a String so that you can accommodate the extra character? At first thought it appears so; after all, a char can only contain a single character.
Or can it?
A Java char is made up of 2 bytes, not one. Therefore if you know that your data is made up of ASCII characters, you know that only a single byte (of the 2 bytes in the char) will be used. For example, ‘A’ is 65, through to ‘z’, which is 122.
You can print out the values that fit into a single byte with this simple loop:
for (int i = 0; i < 256; i++) {
    char c = (char) i;
    System.out.println(i + ":" + c);
}
You are now free to use the other byte of the char to hold the second ASCII character.
This is the way to do it:
In this example you have read 2 bytes ‘a’ and ‘b’ and want to store them in a single char.
byte a = (byte) 'a';
byte b = (byte) 'b';
// Now place a and b into a single char
char ab = (char) ((a << 8) + b);
// To retrieve the bytes individually see the line below
System.out.println((char) (ab >> 8) + "" + (char) (ab & 0xff));
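The snippet above relies on both bytes being ASCII, so neither is negative. If a byte outside the ASCII range ever turns up, Java sign-extends it before the addition and the top byte of the result gets corrupted. The following is a minimal sketch of a more defensive variant that masks each byte first. The class and method names (`TwoCharPacker`, `pack`, `first`, `second`) are made up for illustration and are not taken from Chronicle FIX.

```java
// Hypothetical helper, for illustration only.
final class TwoCharPacker {
    private TwoCharPacker() {}

    // Pack two bytes into one char: 'first' in the top 8 bits, 'second' in the bottom 8.
    // Masking with 0xFF stops a negative byte from being sign-extended into the other half.
    static char pack(byte first, byte second) {
        return (char) (((first & 0xFF) << 8) | (second & 0xFF));
    }

    // Recover the character stored in the top 8 bits.
    static char first(char packed) {
        return (char) (packed >> 8);
    }

    // Recover the character stored in the bottom 8 bits.
    static char second(char packed) {
        return (char) (packed & 0xFF);
    }

    public static void main(String[] args) {
        char ab = pack((byte) 'a', (byte) 'b');
        System.out.println(Integer.toHexString(ab));     // prints 6162
        System.out.println("" + first(ab) + second(ab)); // prints ab
    }
}
```

For pure ASCII input this gives exactly the same result as the `(a << 8) + b` form; the masks only matter once a byte has its top bit set.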
To better understand this let’s look at the binary:
byte a = (byte) 'a'; // 01100001
byte b = (byte) 'b'; // 01100010

As you can see below, when viewed as a char, the top 8 bits are not being used:

char ca = 'a'; // 00000000 01100001
char cb = 'b'; // 00000000 01100010

Combine the characters, with a taking the top 8 bits and b the bottom 8 bits:

char ab = (char) ((a << 8) + b); // 01100001 01100010
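To connect this back to the FIX tag 35 case mentioned earlier, here is a rough sketch of how a one- or two-character value might be read straight into a char and then dispatched on, with no String involved. It assumes the buffer is positioned at the first byte of the value and that the value is terminated by the SOH delimiter (0x01). The class, method and constant names are hypothetical, and this is not how Chronicle FIX itself is implemented.

```java
import java.nio.ByteBuffer;

// Hypothetical reader, for illustration only.
final class MsgTypeReader {
    private MsgTypeReader() {}

    static final byte SOH = 0x01; // FIX field delimiter

    // A single-character type is just that character; a two-character type
    // keeps its first character in the top 8 bits and its second in the bottom 8.
    static final char NEW_ORDER_SINGLE = 'D';
    static final char EXECUTION_REPORT = '8';
    static final char TRADE_CAPTURE_REPORT = (char) (('A' << 8) | 'E');

    // Reads bytes up to the next SOH and packs them into one char.
    // Assumption: the value is at most 2 ASCII characters long.
    static char readMsgType(ByteBuffer buf) {
        char packed = 0;
        while (buf.hasRemaining()) {
            byte b = buf.get();
            if (b == SOH) {
                break;
            }
            packed = (char) ((packed << 8) | (b & 0xFF));
        }
        return packed;
    }

    // The packed values are compile-time constants, so they work as switch labels.
    static String describe(char msgType) {
        switch (msgType) {
            case NEW_ORDER_SINGLE:     return "NewOrderSingle";
            case EXECUTION_REPORT:     return "ExecutionReport";
            case TRADE_CAPTURE_REPORT: return "TradeCaptureReport";
            default:                   return "Unknown";
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {'A', 'E', SOH, 'D', SOH});
        System.out.println(describe(readMsgType(buf))); // prints TradeCaptureReport
        System.out.println(describe(readMsgType(buf))); // prints NewOrderSingle
    }
}
```

Because a lone character ends up with a zero top byte, one- and two-character values can never collide, and comparing message types becomes a plain integer comparison.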
Summary
It’s more efficient to read data into a char than into a String. If you know that you have a maximum of 2 ASCII characters, they can be combined into a single Java char. Of course, only use this technique if you really are worried about ultra low latency!
Reference: Writing 2 Characters into a Single Java char from our JCG partner Daniel Shaya at the Rational Java blog.
What kind of real-life application would require that?
My daily life as a Java ecosystem user is full of applications that rely heavily on databases and whose performance is mainly database driven.
Dividing the access time to a character stream by 3 is simply meaningless …
Welcome to the world of ultra low latency messaging systems and high frequency trading :)
Nice to know!
Thanks for info :D
The option -XX:+UseCompressedStrings is easier to use.