A down side of durable messaging
Overview
Durable messaging can be very fast, as fast as non-durable messaging up to a point.
Limitations of durable messaging
Durable messaging is dependent on the size of your main memory and the speed of your hard drive. A single HDD can sustain as little as 20 MB/s and as much as 60 MB/s. A RAID set of HDDs can support between 100 and 300 MB/s. A SATA SSD can support between 100 and 500 MB/s, and a PCI SSD can support up to 1.5 GB/s.
Case study
Say you have 8 GB of memory, you are writing two million 100-byte messages per second, and you have an HDD which supports 25 MB/s. This works fine at full speed in bursts, but you reach a point where your disk cache is full. Depending on your OS, the disk cache can be between 20% and 80% of your main memory size. In my experience, Windows tends to be closer to 20% even if you have plenty of free memory, whereas Linux tends to allow in the region of 30% of your memory in uncommitted writes.
Say you are writing two million 100-byte messages per second, or 200 MB/s, and you have 1600 MB of disk cache (20% of 8 GB). The difference between the rate at which you are generating data and the rate at which the disk can write it is 175 MB/s, so in just 9 seconds you have filled the cache. At this point your performance plummets to the write speed of your disk, which is 25 MB/s. With each message being 100 bytes, you are now writing 250,000 messages per second, or 8x slower.
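The arithmetic in this case study can be sketched as a small Java model. This is only a back-of-envelope illustration: the class and method names are made up for this example, 1 MB is taken as 10^6 bytes, and the model assumes the disk drains the cache at a constant rate while the burst is running.

```java
/** Back-of-envelope model of a write burst filling the OS disk cache (illustrative only). */
final class BurstModel {

    /** Bytes per second generated by the writer. */
    static double produceRate(long messagesPerSecond, int messageBytes) {
        return (double) messagesPerSecond * messageBytes;
    }

    /** Seconds until the cache is full, or infinity if the disk keeps up with the writer. */
    static double secondsToFillCache(double cacheBytes, double produceBytesPerSec, double diskBytesPerSec) {
        double surplus = produceBytesPerSec - diskBytesPerSec;
        return surplus <= 0 ? Double.POSITIVE_INFINITY : cacheBytes / surplus;
    }

    /** Message rate once the cache is full: the disk is now the limit. */
    static double sustainedMessagesPerSecond(double diskBytesPerSec, int messageBytes) {
        return diskBytesPerSec / messageBytes;
    }

    public static void main(String[] args) {
        int messageBytes = 100;
        long messagesPerSecond = 2_000_000;
        double diskRate = 25e6;   // 25 MB/s HDD from the case study
        double cache = 1600e6;    // 1600 MB of disk cache

        double produce = produceRate(messagesPerSecond, messageBytes);
        double seconds = secondsToFillCache(cache, produce, diskRate);
        double sustained = sustainedMessagesPerSecond(diskRate, messageBytes);

        System.out.printf("Produce rate: %.0f MB/s%n", produce / 1e6);
        System.out.printf("Cache full after: %.1f s%n", seconds);
        System.out.printf("Messages in the burst: %.1f million%n", seconds * messagesPerSecond / 1e6);
        System.out.printf("Sustained rate: %.0f msg/s (%.0fx slower)%n",
                sustained, messagesPerSecond / sustained);
    }
}
```

With the case-study numbers, the model gives a little over 9 seconds before the cache fills and a sustained rate of 250,000 messages per second. Plugging in your own cache size and disk throughput shows how long a burst your system can absorb.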
What is the solution?
- Keep your micro-bursts smaller than what fits in the disk cache, e.g. in the above case this would be about 18 million messages.
- Increase the amount of memory you have. While memory is cheap and you can buy 32 GB for about £150, all this does is increase the duration of the micro-burst you can support.
- Increase the speed of your drive. With SSDs you can support much higher bandwidths. SATA SSDs support up to 500 MB/s, which is higher than the rate at which Chronicle can typically serialize messages, i.e. more than enough. The downside is that SSDs typically have less capacity, which reduces the total number of messages you can store. A 500 GB SSD can store 5 billion 100-byte messages. A 6×4 TB RAID-5 set can support a transfer rate of over 200 MB/s, which would be enough for the above case study, and can store 200 billion messages.
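The capacity trade-off in the last option can be checked with a few lines of Java. This is a rough sketch with hypothetical names: it assumes a 500 GB SSD, decimal units (1 GB = 10^9 bytes), a RAID-5 set that gives up one drive's worth of space to parity, and no file system or message framing overhead.

```java
/** Rough storage-capacity estimate for fixed-size messages (illustrative only). */
final class CapacityModel {

    /** Number of whole messages that fit in the given number of bytes. */
    static long messagesThatFit(long capacityBytes, int messageBytes) {
        return capacityBytes / messageBytes;
    }

    /** Usable bytes of a RAID-5 set: one drive's worth of space is used for parity. */
    static long raid5UsableBytes(int drives, long bytesPerDrive) {
        return (drives - 1) * bytesPerDrive;
    }

    /** Hours until the store is full at a constant message rate. */
    static double hoursUntilFull(long messages, long messagesPerSecond) {
        return (double) messages / messagesPerSecond / 3600.0;
    }

    public static void main(String[] args) {
        int messageBytes = 100;
        long rate = 2_000_000;   // messages per second, as in the case study

        long ssdMessages = messagesThatFit(500_000_000_000L, messageBytes);
        long raidMessages = messagesThatFit(
                raid5UsableBytes(6, 4_000_000_000_000L), messageBytes);

        System.out.printf("500 GB SSD: %,d messages, full in %.1f hours%n",
                ssdMessages, hoursUntilFull(ssdMessages, rate));
        System.out.printf("6x4 TB RAID-5: %,d messages, full in %.1f hours%n",
                raidMessages, hoursUntilFull(raidMessages, rate));
    }
}
```

At a sustained two million messages per second, the SSD in this sketch fills in well under an hour, while the RAID-5 set lasts a little over a day. That is the sense in which the faster drive reduces the total number of messages you can write.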
Conclusion
If you see any durable messaging solution suddenly slow down under load, you need to look at the size of your buffers and the throughput of your disk sub-system.
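One way to watch the buffers on Linux is to read the `Dirty` and `Writeback` fields of `/proc/meminfo`, which report the amount of memory waiting to be written, or currently being written, to disk. The sketch below is a minimal illustration and an assumption on my part rather than anything specific to a messaging library: the class name is invented, it only works on Linux, and it reports system-wide figures rather than per-process ones.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

/** Reads the amount of unwritten (dirty) page cache on Linux (illustrative only). */
final class DirtyPages {

    /** Returns the value in kB of a /proc/meminfo field such as "Dirty", or -1 if absent. */
    static long fieldKb(List<String> meminfoLines, String field) {
        String prefix = field + ":";
        for (String line : meminfoLines) {
            if (line.startsWith(prefix)) {
                // Lines look like "Dirty:              1234 kB"
                String[] parts = line.substring(prefix.length()).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        Path meminfo = Paths.get("/proc/meminfo");
        if (!Files.isReadable(meminfo)) {
            System.out.println("/proc/meminfo is not available on this system");
            return;
        }
        List<String> lines = Files.readAllLines(meminfo);
        System.out.println("Dirty:     " + fieldKb(lines, "Dirty") + " kB");
        System.out.println("Writeback: " + fieldKb(lines, "Writeback") + " kB");
    }
}
```

If the `Dirty` figure climbs steadily during a burst and then levels off just as throughput drops, the cache has most likely filled and the disk has become the bottleneck.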
Reference: A down side of durable messaging from our JCG partner Peter Lawrey at the Vanilla Java blog.