Java 8 LongAdders: The Right Way To Manage Concurrent Counters
I just lOvE new toys, and Java 8 has a bunch of them. This time around I want to talk about one of my favourites – concurrent adders. This is a new set of classes for managing counters written and read by multiple threads. The new API promises significant performance gains, while still keeping things simple and straightforward.
As people have been managing concurrent counters since the dawn of multi-core architectures, let’s take a look and see what are some of the options Java offered up until now, and how they perform compared to this new API.
Dirty counters – this approach means you’re writing / reading from a regular object or static field across multiple threads. Unfortunately, this doesn’t work for two reasons. The first is that in Java, an A += B operation isn’t Atomic. If you open up the output bytecode, you’ll see at least four instructions – one for loading the field value from the heap into the thread stack, a second for loading the delta, a third to add them and the fourth to set the result into the field.
If more than one thread is doing this at the same time for the same memory location, you run a high chance of missing out on a write operation, as one thread can override the value of another (AKA “read-modify-write”). There’s also another nasty angle to this which has to do with the volatility of the value. More on that below.
This is such a rookie mistake, and one that’s super hard to debug. If you do run across anybody doing this in your app, I’d like to ask a small favor. Run a search across your database for “Tal Weiss”. If you see me there – delete my records. I’ll feel safer.
Synchronized – the most basic of concurrency idioms, this blocks all other threads while reading or writing the value. While it works, it’s a sure-fire way of turning your code into a DMV line.
RWLock – this slightly more sophisticated version of the basic Java lock enables you to discern between threads that change the value and need to block others vs. ones that only read and don’t require a critical section. While this can be more efficient (assuming the number of writers is low) it’s a pretty meh approach, as you’re blocking the execution of all other threads when acquiring the write lock.
Volatile – this fairly misunderstood keyword essentially instructs the JIT compiler to de-optimize the run-time machine code, so that any modification to the field is immediately seen by other threads.
This invalidates some of the JIT compiler favorite optimizations of playing with the order in which assignments are applied to memory. Come again you say? You heard me. The JIT compiler may change the order in which assignments to fields are made. This arcane little strategy (also known as happens-before) allows it to minimize the number of times the program needs to access global heap, while still making sure your code is unaffected by it. Pretty sneaky…
So when should I use volatile counters? If you have just one thread updating a value and multiple threads consuming it, this is a really good strategy – no contention at all.
So why not use it always you ask? Because this doesn’t work well when more than one thread is updating the field. Since A += B is not atomic, you’re running a risk of overriding somebody else’s write. Up until Java 8, what you needed to do for this was use an AtomicInteger.
AtomicInteger – this set of classes uses CAS (compare-and-swap) processor instructions to update the value of the counter. Sounds great, doesn’t it? Well, yes and no. This works well as it utilizes a direct machine code instruction to set the value with minimum effect on the execution of other threads. The downside is that if it fails to set the value due to a race with another thread, it has to try again. Under high contention this can turn into a spin lock, where the thread has to continuously try and set the value in an infinite loop, until it succeeds. This isn’t quite what we were looking for. Enter Java 8 with LongAdders.
Java 8 Adders – this is such a cool new API I just can’t stop gushing about it! From a usage perspective it’s very similar to an AtomicInteger. Simply create a LongAdder and use intValue() and add() to get / set the value. The magic happens behind the scenes.
What this class does is when a straight CAS fails due to contention, it stores the delta in an internal cell object allocated for that thread. It then adds the value of pending cells to the sum when intValue() is called. This reduces the need to go back and CAS or block other threads. Pretty smart stuff!
So alright enough talking – let’s see this puppy in action. We’ve set up the following benchmark: reset a counter to zero and start to read and increment it using multiple threads. Stop when the counter reaches 10^8. We ran the benchmark on an i7 processor with 4 cores.
We ran the benchmark with a total of ten threads – five for writing and five for reading so we were bound to have some serious contention here:
- Notice that both dirty and volatile risk value overwrites.
- The code is available here
The Bottom Line
- Concurrent Adders clean house with a 60-100% performance boost over atomic integers.
- Adding threads didn’t make much of a difference, except when locking.
- Notice the huge performance penalty you get for using synchronized or RW-locks – an order of magnitude slower!
If you’ve already had the chance to use these classes in your code – I’d love to hear about it.
- Additional reading – Brian Goetz on Java concurrency.
Reference: | Java 8 LongAdders: The Right Way To Manage Concurrent Counters from our JCG partner Tal Weiss at the Takipi blog. |