Introduction to writing custom collectors in Java 8
Java 8 introduced the concept of collectors. Most of the time we barely use factory methods from Collectors
class, e.g. collect(toList())
, toSet()
or maybe something more fancy like counting()
or groupingBy()
. Not many of us actually bother to look how collectors are defined and implemented. Let’s start from analysing what Collector<T, A, R>
really is and how it works.
Collector<T, A, R>
works as a “sink” for streams – stream pushes items (one after another) into a collector, which should produce some “collected” value in the end. Most of the time it means building a collection (like toList()
) by accumulating elements or reducing stream into something smaller (e.g. counting()
collector that barely counts elements). Every collector accepts items of type T
and produces aggregated (accumulated) value of type R
(e.g. R = List<T>
). Generic type A
simply defines the type of intermediate mutable data structure that we are going to use to accumulate items of type T
in the meantime. Type A
can, but doesn’t have to be the same as R
– in simple words the mutable data structure that we use to collect items from input Stream<T>
can be different than the actual output collection/value. That being said, every collector must implement the following methods:
interface Collector<T,A,R> { Supplier<A> supplier() BiConsumer<A,T> acumulator() BinaryOperator<A> combiner() Function<A,R> finisher() Set<Characteristics> characteristics() }
supplier()
returns a function that creates an instance of accumulator – mutable data structure that we will use to accumulate input elements of typeT
.accumulator()
returns a function that will take accumulator and one item of typeT
, mutating accumulator.combiner()
is used to join two accumulators together into one. It is used when collector is executed in parallel, splitting inputStream<T>
and collecting parts independently first.finisher()
takes an accumulatorA
and turns it into a result value, e.g. collection, of typeR
. All of this sounds quite abstract, so let’s do a simple example.
Obviously Java 8 doesn’t provide a built-in collector for ImmutableSet<T>
from Guava. However creating one is very simple. Remember that in order to iteratively build ImmutableSet
we use ImmutableSet.Builder<T>
– this is going to be our accumulator.
import com.google.common.collect.ImmutableSet; public class ImmutableSetCollector<T> implements Collector<T, ImmutableSet.Builder<T>, ImmutableSet<T>> { @Override public Supplier<ImmutableSet.Builder<T>> supplier() { return ImmutableSet::builder; } @Override public BiConsumer<ImmutableSet.Builder<T>, T> accumulator() { return (builder, t) -> builder.add(t); } @Override public BinaryOperator<ImmutableSet.Builder<T>> combiner() { return (left, right) -> { left.addAll(right.build()); return left; }; } @Override public Function<ImmutableSet.Builder<T>, ImmutableSet<T>> finisher() { return ImmutableSet.Builder::build; } @Override public Set<Characteristics> characteristics() { return EnumSet.of(Characteristics.UNORDERED); } }
First of all look carefully at generic types. Our ImmutableSetCollector
takes input elements of type T
, so it works for any Stream<T>
. In the end it will produce ImmutableSet<T>
– as expected. ImmutableSet.Builder<T>
is going to be our intermediate data structure.
supplier()
returns a function that creates newImmutableSet.Builder<T>
. If you are not that familiar with lambdas in Java 8,ImmutableSet::builder
is a shorthand for() -> ImmutableSet.builder()
.accumulator()
returns a function that takesbuilder
and one element of typeT
. It simply adds said element to the builder.combiner()
returns a function that will accept two builders and turn them into one by adding all elements from one of them into the other – and returning the latter. Finallyfinisher()
returns a function that will turnImmutableSet.Builder<T>
intoImmutableSet<T>
. Again this is a shorthand syntax for:builder -> builder.build()
.- Last but not least,
characteristics()
informs JDK what capabilities our collector has. For example ifImmutableSet.Builder<T>
was thread-safe (it isn’t), we could sayCharacteristics.CONCURRENT
as well.
We can now use our custom collector everywhere using collect()
:
final ImmutableSet<Integer> set = Arrays .asList(1, 2, 3, 4) .stream() .collect(new ImmutableSetCollector<>());
However creating new instance is slightly verbose so I suggest creating static factory method, similar to what JDK does:
public class ImmutableSetCollector<T> implements Collector<T, ImmutableSet.Builder<T>, ImmutableSet<T>> { //... public static <T> Collector<T, ?, ImmutableSet<T>> toImmutableSet() { return new ImmutableSetCollector<>(); } }
From now on we can take full advantage of our custom collector by simply typing: collect(toImmutableSet())
. In the second part we will learn how to write more complex and useful collectors.
Update
@akarazniewicz pointed out that collectors are just verbose implementation of folding. With my love and hate relationship with folds, I have to comment on that. Collectors in Java 8 are basically object-oriented encapsulation of the most complex type of fold found in Scala, namely GenTraversableOnce.aggregate[B](z: ⇒ B)(seqop: (B, A) ⇒ B, combop: (B, B) ⇒ B): B
. aggregate()
is like fold()
, but requires extra combop
to combine two accumulators (of type B
) into one. Comparing this to collectors, parameter z
comes from a supplier()
, seqop()
reduction operation is an accumulator()
and combop
is a combiner()
. In pseudo-code we can write:
finisher( seq.aggregate(collector.supplier()) (collector.accumulator(), collector.combiner()))
GenTraversableOnce.aggregate()
is used when concurrent reduction is possible – just like with collectors.
Reference: | Introduction to writing custom collectors in Java 8 from our JCG partner Tomasz Nurkiewicz at the Java and neighbourhood blog. |