Java performance tutorial – How fast are the Java 8 streams?
In this JAX Magazine sneak preview, JAX London speaker Angelika Langer answers the most important question for anyone using Java streams: are they really faster?
Java 8 came with a major addition to the JDK collection framework, namely the stream API. Similar to collections, streams represent sequences of elements. Collections support operations such as add(), remove(), and contains() that work on a single element. Streams, in contrast, have bulk operations such as forEach(), filter(), map(), and reduce() that access all elements in a sequence. The notion of a Java stream is inspired by functional programming languages, where the corresponding abstraction is typically called a sequence, which also has filter-map-reduce operations. Due to this similarity, Java 8 – at least to some extent – permits a functional programming style in addition to the object-oriented paradigm that it supported all along.
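To get a feel for the style, here is a minimal made-up example of such a filter-map-reduce pipeline; the word list and the length threshold are chosen purely for illustration:

```java
import java.util.Arrays;
import java.util.List;

public class FilterMapReduce {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("stream", "map", "filter", "reduce");

        // Keep words longer than 3 characters, map each to its length,
        // and sum the lengths: a classic filter-map-reduce pipeline.
        int totalLength = words.stream()
                               .filter(w -> w.length() > 3)
                               .mapToInt(String::length)
                               .sum();

        System.out.println(totalLength); // prints 18 (6 + 6 + 6)
    }
}
```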
Perhaps contrary to widespread belief, the designers of the Java programming language did not extend Java and its JDK to allow functional programming in Java or to turn Java into a hybrid “object-oriented & functional” programming language. The actual motivation for inventing streams for Java was performance or – more precisely – making parallelism more accessible to software developers (see Brian Goetz, State of the Lambda). This goal makes a lot of sense to me, considering the way in which hardware evolves. Our hardware has dozens of CPU cores today and will probably have hundreds some time in the future. In order to effectively utilize the hardware capabilities and thereby achieve state-of-the-art execution performance we must parallelize. After all, what is the point of running a single thread on a multicore platform? At the same time, multithreaded programming is considered hard and error-prone, and rightly so. Streams, which come in two flavours (as sequential and parallel streams), are designed to hide the complexity of running multiple threads. Parallel streams make it extremely easy to execute bulk operations in parallel – magically, effortlessly, and in a way that is accessible to every Java developer.
So, let’s talk about performance. How fast are the Java 8 streams? A common expectation is that parallel execution of stream operations is faster than sequential execution with only a single thread. Is it true? Do streams improve performance?
In order to answer questions regarding performance we must measure, that is, run a micro-benchmark. Benchmarking is hard and error-prone, too. You need to perform a proper warm-up, watch out for all kinds of distorting effects from optimizations applied by the virtual machine’s JIT compiler (dead code elimination being a notorious one) up to hardware optimizations (such as increasing one core’s cpu frequency if the other cores are idle). In general, benchmark results must be taken with a grain of salt. Every benchmark is an experiment. Its results are context-dependent. Never trust benchmark figures that you haven’t produced yourself in your context on your hardware. This said, let us experiment.
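To make the discussion concrete, here is a sketch of what a hand-rolled measurement with a warm-up phase might look like; for serious work a harness such as JMH is preferable, and the run counts and array size below are arbitrary:

```java
import java.util.Random;

public class NaiveBenchmark {
    public static void main(String[] args) {
        int[] ints = new Random(42).ints(500_000).toArray();

        // Warm-up: give the JIT compiler a chance to compile the hot code.
        int sink = 0;
        for (int i = 0; i < 1_000; i++) {
            sink += maxOf(ints);
        }

        // Measurement: average over many runs.
        int runs = 1_000;
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            sink += maxOf(ints);
        }
        long nanosPerRun = (System.nanoTime() - start) / runs;

        // Print the sink so the JIT cannot dead-code-eliminate the work.
        System.out.println(sink + " -> " + nanosPerRun / 1_000_000.0 + " ms per run");
    }

    static int maxOf(int[] a) {
        int m = Integer.MIN_VALUE;
        for (int v : a) if (v > m) m = v;
        return m;
    }
}
```

Printing the sink value is a crude guard against dead code elimination: the JIT cannot drop a computation whose result is observably used.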
Comparing streams to loops
First, we want to find out how a stream’s bulk operation compares to a regular, traditional for-loop. Is it worth using streams in the first place (for performance reasons)?
The sequence which we will use for the benchmark is an int-array filled with 500,000 random integral values. In this array we will search for the maximum value.
Here is the traditional solution with a for-loop:

int[] a = ints;
int e = ints.length;
int m = Integer.MIN_VALUE;
for (int i = 0; i < e; i++)
    if (a[i] > m) m = a[i];
Here is the solution with a sequential IntStream:
int m = Arrays.stream(ints)
              .reduce(Integer.MIN_VALUE, Math::max);
We measured on outdated hardware (dual core, no dynamic overclocking) with proper warm-up and all it takes to produce halfway reliable benchmark figures. This was the result in that particular context:
int-array, for-loop   : 0.36 ms
int-array, seq. stream: 5.35 ms
The result is sobering: the good old for-loop is 15 times faster than the sequential stream. How disappointing! Years of development effort spent on building streams for Java 8, and then this?! But wait! Before we conclude that streams are abysmally slow, let us see what happens if we replace the int-array by an ArrayList<Integer>.
Here is the for-loop:

int m = Integer.MIN_VALUE;
for (int i : myList)
    if (i > m) m = i;
Here is the stream-based solution:
int m = myList.stream()
              .reduce(Integer.MIN_VALUE, Math::max);
These are the results:
ArrayList, for-loop   : 6.55 ms
ArrayList, seq. stream: 8.33 ms
Again, the for-loop is faster than the sequential stream operation, but the difference on an ArrayList is not nearly as significant as it was on an array.
Let’s think about it. Why do the results differ that much? There are several aspects to consider.
First, access to array elements is very fast. It is an index-based memory access with no overhead whatsoever. In other words, it is a plain down-to-the-metal memory access. Elements in a collection such as ArrayList, on the other hand, are accessed via an iterator, and the iterator inevitably adds overhead. Plus, there is the overhead of boxing and unboxing collection elements, whereas int-arrays use plain primitive ints. Essentially, the measurements for the ArrayList are dominated by the iteration and boxing overhead, whereas the figures for the int-array illustrate the advantage of for-loops.
Secondly, had we seriously expected that streams would be faster than plain for-loops? Compilers have 40+ years of experience optimizing loops, and the virtual machine’s JIT compiler is especially apt at optimizing for-loops over arrays with a constant stride, like the one in our benchmark. Streams, on the other hand, are a very recent addition to Java, and the JIT compiler does not (yet) perform any particularly sophisticated optimizations on them.
Thirdly, we must keep in mind that we are not doing much with the sequence elements once we have gotten hold of them. We spend a lot of effort trying to get access to an element and then we don’t do much with it. We just compare two integers, which after JIT compilation is barely more than one assembly instruction. For this reason, our benchmarks illustrate the cost of element access, which need not necessarily be a typical situation. The performance figures change substantially if the functionality applied to each element in the sequence is CPU-intensive. You will find that there is no measurable difference any more between for-loop and sequential stream if the functionality is heavily CPU-bound.
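As a sketch of what “CPU-bound” means here, the following made-up benchmark applies a deliberately expensive function to each element; slowOp and its iteration count are invented purely for illustration:

```java
import java.util.Arrays;
import java.util.Random;

public class CpuBoundComparison {
    // A deliberately expensive per-element function (illustrative only).
    static double slowOp(int x) {
        double r = x;
        for (int i = 0; i < 1_000; i++) {
            r = Math.sin(r) + Math.cos(r);
        }
        return r;
    }

    public static void main(String[] args) {
        int[] ints = new Random(42).ints(10_000).toArray();

        // For-loop applying the expensive function to every element.
        long t = System.nanoTime();
        double sum1 = 0;
        for (int v : ints) sum1 += slowOp(v);
        System.out.println("for-loop : " + (System.nanoTime() - t) / 1_000_000 + " ms");

        // Sequential stream doing the same work per element.
        t = System.nanoTime();
        double sum2 = Arrays.stream(ints)
                            .mapToDouble(CpuBoundComparison::slowOp)
                            .sum();
        System.out.println("stream   : " + (System.nanoTime() - t) / 1_000_000 + " ms");
    }
}
```

With per-element work this heavy, the element-access overhead of the stream becomes noise, and both variants should take roughly the same time.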
The ultimate conclusion to draw from this benchmark experiment is NOT that streams are always slower than loops. Yes, streams are sometimes slower than loops, but they can also be equally fast; it depends on the circumstances. The point to take home is that sequential streams are no faster than loops. If you use sequential streams then you don’t do it for performance reasons; you do it because you like the functional programming style.
So, where is the performance improvement streams were invented for? So far we have only compared loops to streams. How about parallelization? The point of streams is easy parallelization for better performance.
Comparing sequential streams to parallel streams
As a second experiment, we want to figure out how a sequential stream compares to a parallel stream performance-wise. Are parallel stream operations faster than sequential ones?
We use the same int-array filled with 500,000 integral values. Here is the sequential stream operation:
int m = Arrays.stream(ints)
              .reduce(Integer.MIN_VALUE, Math::max);
This is the parallel stream operation:
int m = Arrays.stream(ints).parallel()
              .reduce(Integer.MIN_VALUE, Math::max);
Our expectation is that parallel execution should be faster than sequential execution. Since the measurements were made on a dual-core platform, parallel execution can be at most twice as fast as sequential execution. Ideally, the ratio of sequential to parallel performance should be 2.0. Naturally, parallel execution does introduce some overhead for splitting the problem, creating subtasks, running them in multiple threads, gathering their partial results, and producing the overall result. The ratio will be less than 2.0, but it should come close.
These are the actual benchmark results:
           sequential   parallel   seq./par.
int-array    5.35 ms    3.35 ms      1.60
The reality check via our benchmark yields a ratio (sequential / parallel) of only 1.6 instead of 2.0, which illustrates the amount of overhead involved in going parallel and how well (or poorly) it is compensated for (on this particular platform).
You might be tempted to generalise these figures and conclude that parallel streams are always faster than sequential streams, perhaps not twice as fast (on dual-core hardware) as one might hope for, but at least faster. However, this is not true. Again, there are numerous aspects that contribute to the performance of a parallel stream operation.
One of them is the splittability of the stream source. An array splits nicely; it just takes an index calculation to figure out the mid element and split the array into halves. There is no traversal involved and thus barely any cost of splitting. How easily do collections split compared to an array? What does it take to split a binary tree or a linked list? In certain situations you will observe vastly different performance results for different types of collections.
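A quick way to see splittability in action is to run the same parallel reduction over an int-array and over a LinkedList; the code below is a sketch with timings omitted, since the actual numbers depend heavily on the platform:

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.Random;

public class SplittabilityDemo {
    public static void main(String[] args) {
        int[] ints = new Random(42).ints(500_000).toArray();
        List<Integer> linked = new LinkedList<>();
        for (int v : ints) linked.add(v);

        // An array splits in O(1) via index arithmetic ...
        int m1 = Arrays.stream(ints).parallel()
                       .reduce(Integer.MIN_VALUE, Math::max);

        // ... whereas a LinkedList must be traversed in order to be split,
        // which typically erodes the benefit of parallel execution.
        int m2 = linked.parallelStream()
                       .reduce(Integer.MIN_VALUE, Math::max);

        System.out.println(m1 == m2); // prints true: both find the same maximum
    }
}
```

Wrap each reduction in the kind of timing harness shown earlier to compare them; on most machines the array variant wins clearly.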
Another aspect is statefulness. Some stream operations maintain state. An example is the distinct() operation. It is an intermediate operation that eliminates duplicates from the input sequence, i.e., it returns an output sequence with distinct elements. In order to decide whether the next element is a duplicate or not, the operation must compare it to all elements it has already encountered. For this purpose it maintains some sort of data structure as its state. If you call distinct() on a parallel stream, its state will be accessed concurrently by multiple worker threads, which requires some form of coordination or synchronisation, which adds overhead, which slows down parallel execution, possibly to the extent that parallel execution is significantly slower than sequential execution.
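The effect can be sketched as follows; the value range and element count are arbitrary, and the point is only that distinct() must maintain shared state in the parallel case:

```java
import java.util.Random;
import java.util.stream.IntStream;

public class DistinctDemo {
    public static void main(String[] args) {
        // 500,000 values drawn from a small range, so duplicates abound.
        int[] ints = new Random(42).ints(500_000, 0, 1_000).toArray();

        // distinct() is a stateful intermediate operation: it must remember
        // every element seen so far. On a parallel stream that state is
        // shared between worker threads and must be coordinated.
        long sequentialCount = IntStream.of(ints).distinct().count();
        long parallelCount   = IntStream.of(ints).parallel().distinct().count();

        System.out.println(sequentialCount == parallelCount); // prints true
    }
}
```

Both variants produce the same result; timing them (as in the earlier harness) shows whether the coordination overhead eats the parallel speed-up on your machine.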
With this in mind it is fair to say that the performance model of streams is not a trivial one. Expecting that parallel stream operations are always faster than sequential stream operations is naive. The performance gain, if any, depends on numerous factors, some of which I briefly mentioned above. If you are familiar with the inner workings of streams, you will be capable of coming up with an informed guess regarding the performance of a parallel stream operation. Yet, you need to benchmark a lot in order to find out, for a given context, whether going parallel is worth doing or not. There are indeed situations in which parallel execution is slower than sequential execution, and blindly using parallel streams in all cases can be downright counterproductive.
The realisation is: Yes, parallel stream operations are easy to use and often they run faster than sequential operations, but don’t expect miracles. Also, don’t guess; instead, benchmark a lot.
This was a sneak preview from the JAX Magazine – sign up here for more free developer tips, trends and tutorials.
Look up fork join, while you’re at it, and the system property overriding the default parallelism degree.
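For reference, parallel streams execute on the common fork/join pool, whose parallelism can be inspected at run time and overridden via a system property; a minimal sketch:

```java
import java.util.concurrent.ForkJoinPool;

public class ParallelismDemo {
    public static void main(String[] args) {
        // The common pool defaults to (available cores - 1) worker threads.
        // It can be overridden on the command line, e.g.:
        //   java -Djava.util.concurrent.ForkJoinPool.common.parallelism=4 ParallelismDemo
        System.out.println("common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());
        System.out.println("available processors   : "
                + Runtime.getRuntime().availableProcessors());
    }
}
```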
Hello,
the stream-based solution to find the max value in an array of ints is:

int max = Arrays.stream(ints).max().getAsInt();
The max() methods of streams are implemented as reductions. For instance, this is the implementation of IntStream’s max() method (see class java.util.stream.IntPipeline):
public final OptionalInt max() {
return reduce(Math::max);
}
Not to say I am an expert at Java, but I wrote up the code to test @Albi’s theory. She is right: it is actually close to the int for-loop and much faster than using the code from the example. I am not able to understand why you would use the approach you did. That is obviously slow.
Hello!
I suggest you learn JMH ( http://openjdk.java.net/projects/code-tools/jmh/ ) before giving any performance advice.
I cannot reproduce your first test, see my JMH benchmark and results:
https://gist.github.com/amaembo/19445fbf2419bf04a635
Sequential stream version is only 5x slower, not 15x. Have you used 64 bit Java or 32 bit? There might be a very big difference in performance. I tested on 64 bit Quad-core machine with Hyper-threading (8 hardware threads).
Did you notice that I was saying: “This was the result in that particular context.” ? Of course, you cannot reproduce the figures. That’s the nature of a benchmark. Benchmark figures are only valid in the context in which they were produced. This is entirely independent of the benchmark harness that is used. We received different results on different machines as well. In fact, we use the benchmarks mentioned in the post in our Java 8 seminars. They were run by hundreds of developers on hundreds of computers. The individual figures differed from situation to situation (pretty much like your…
Firstly, thank you for this informative post, I learned plenty of things. It is predictable that Java 8 streams are slower than old-fashioned array loops; however, it was very surprising for me to read that the old-fashioned array loop is five times faster than Java 8 streams. And I have just run a benchmark with your test cases (I used JMH, most probably you have heard of it before), and I got very similar results to yours. (https://github.com/cenkakin/jmh-benchmarks) Hopefully, streams will be more efficient in the future like you said. Also, as you know, premature optimization is the root…
I did my own tests with this:

for (int j = 0; j < 10; j++) {
    Arrays.setAll(values, i -> rand.nextInt());
    long t1 = System.nanoTime();
    int min = Integer.MIN_VALUE;
    for (int i = 0; i < values.length; i++) {
        int value = values[i];
        if (value > min) { min = value; }
    }
    long t2 = System.nanoTime();
    System.out.println("array for = " + (t2 - t1));
    t1 = System.nanoTime();
    min = Arrays.stream(values).reduce(Integer.MIN_VALUE, Math::max);
    t2 = System.nanoTime();
    System.out.println("Arrays.stream = " + (t2 - t1));
    t1 = System.nanoTime();
    min = IntStream.of(values).max().getAsInt();
    t2 = System.nanoTime();
    System.out.println("Intstream.of = " + (t2 - t1));
}

Here were the results:

array for = 185277
Arrays.stream = 6194937
Intstream.of = 1872787
array for = 172540
Arrays.stream…
I think one of the points that some are missing is the fact that streams didn’t appear to replace for loops … because those were slow. Streams were introduced in Java 8 in the context of also introducing lambda expressions. That is, making Java look more functional, more beautiful and expressive, and as syntactic sugar to avoid writing that cumbersome code based on anonymous inner classes. The fact that they are quite slow, while interesting, is not that relevant. At any point in time, outputting HTML code by hand would be faster than using JSP, iterating in…
That’s exactly the point. If I use sequential streams then I don’t do it to improve execution performance; I use them because I like the style. Note, I am talking about _sequential_ streams – not parallel streams. There are various conceivable reasons for using streams. One of them is performance, namely the easy way in which streams enable parallel execution of bulk operations on sequences. If I don’t have idle cores because my application is already nicely parallelized, or it runs on hardware that does not have dozens of cores, or my application does not do any CPU-intensive work, then parallelism…
This was actually something I evaluated when JDK 8 first came out (http://www.micromux.com/2014/03/23/swimming-java-stream/); and I entirely expected streams to be inefficient (unnecessary) for these kinds of operations.
With that said, over the past year or so I have started to find more opportunities for parallel streams where I/O latency is involved in map-reduce functions. So if you are iterating over an array or a map that requires a database query for every element, then parallel streams can help you with the latency of the database calls.