Accumulative: Custom Java Collectors Made Easy
Accumulative is an interface proposed for the intermediate accumulation type A of Collector<T, A, R> in order to make defining custom Java Collectors easier.
Introduction
If you’ve ever used Java Streams, you most likely used some Collectors, e.g.:
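A few typical built-in ones, as a minimal sketch (the class name `BuiltInCollectorsDemo` and the sample words are hypothetical, chosen only for illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class; only the Collectors.* calls are real JDK API.
class BuiltInCollectorsDemo {

    static final List<String> WORDS = Arrays.asList("stream", "collector", "java", "sum");

    // Collectors.toList(): gather the elements into a List
    static List<String> shortWords() {
        return WORDS.stream()
                .filter(w -> w.length() <= 4)
                .collect(Collectors.toList());
    }

    // Collectors.joining(): concatenate the elements with a separator
    static String joined() {
        return WORDS.stream().collect(Collectors.joining(", "));
    }

    // Collectors.groupingBy(): group the elements by a key (here: word length)
    static Map<Integer, List<String>> byLength() {
        return WORDS.stream().collect(Collectors.groupingBy(String::length));
    }
}
```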
But have you ever used…
- A composed Collector?
  - It takes another Collector as a parameter, e.g.: Collectors.collectingAndThen.
- A custom Collector?
  - Its functions are specified explicitly in Collector.of.
This post is about custom Collectors.
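To make the distinction concrete, here is a minimal sketch of both kinds producing the same result, an unmodifiable list. The class and method names are hypothetical; only `Collectors.collectingAndThen`, `Collectors.toList` and `Collector.of` come from the JDK.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical demo class contrasting a composed and a custom Collector.
class ComposedVsCustomDemo {

    // Composed: wraps another Collector (toList) and post-processes its result.
    static <T> Collector<T, ?, List<T>> toUnmodifiableListComposed() {
        return Collectors.collectingAndThen(Collectors.toList(), Collections::unmodifiableList);
    }

    // Custom: all four functions are passed explicitly to Collector.of.
    static <T> Collector<T, List<T>, List<T>> toUnmodifiableListCustom() {
        return Collector.of(
                ArrayList::new,                 // supplier: create a container
                List::add,                      // accumulator: add to the container
                (left, right) -> {              // combiner: merge two containers
                    left.addAll(right);
                    return left;
                },
                Collections::unmodifiableList   // finisher: produce the final result
        );
    }
}
```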
Collector
Let’s recall the essence of the Collector contract (comments mine):
/**
 * @param <T> (input) element type
 * @param <A> (intermediate) mutable accumulation type (container)
 * @param <R> (output) result type
 */
public interface Collector<T, A, R> {

    Supplier<A> supplier(); // create a container

    BiConsumer<A, T> accumulator(); // add to the container

    BinaryOperator<A> combiner(); // combine two containers

    Function<A, R> finisher(); // get the final result from the container

    Set<Characteristics> characteristics(); // irrelevant here
}
The above contract is functional in nature, and that’s very good! This lets us create Collectors using arbitrary accumulation types (A), e.g.:
- A: StringBuilder (Collectors.joining)
- A: OptionalBox (Collectors.reducing)
- A: long[] (Collectors.averagingLong)
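As an illustration of that freedom, here is a minimal sketch of a Collector whose accumulation type is StringBuilder. It is a simplified stand-in, not the actual implementation of Collectors.joining, and the names `AccumulationTypeDemo` and `concatenating` are hypothetical.

```java
import java.util.stream.Collector;

// Hypothetical demo: T = String, A = StringBuilder, R = String.
class AccumulationTypeDemo {

    static Collector<String, StringBuilder, String> concatenating() {
        return Collector.of(
                StringBuilder::new,                    // create an empty container
                (sb, s) -> sb.append(s),               // add one element to the container
                (left, right) -> left.append(right),   // merge two containers
                StringBuilder::toString                // convert the container to the result
        );
    }
}
```

The container never leaves the Collector: callers only see T going in and R coming out, which is why A can be any mutable type.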
Proposal
Before I provide any rationale, I’ll present the proposal, because it’s brief. Full source code of this proposal is available as a GitHub gist.
Accumulative Interface
I propose to add the following interface dubbed Accumulative (name to be discussed) to the JDK:
public interface Accumulative<T, A extends Accumulative<T, A, R>, R> {

    void accumulate(T t); // target for Collector.accumulator()

    A combine(A other); // target for Collector.combiner()

    R finish(); // target for Collector.finisher()
}
This interface, as opposed to Collector, is object-oriented in nature, and classes implementing it must represent some mutable state.
Collector.of Overload
Having Accumulative, we can add the following Collector.of overload:
public static <T, A extends Accumulative<T, A, R>, R> Collector<T, ?, R> of(
        Supplier<A> supplier, Collector.Characteristics... characteristics) {
    return Collector.of(supplier, A::accumulate, A::combine, A::finish, characteristics);
}
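Since Collector is a JDK interface, the overload can’t be tried out as-is today. Below is a minimal sketch of an equivalent static helper that works outside the JDK. The utility class `Accumulatives`, its method `collector`, and the example container `LongestString` are all hypothetical and not part of the proposal; lambdas with explicit type arguments are used instead of the method references above purely to keep type inference obvious.

```java
import java.util.function.Supplier;
import java.util.stream.Collector;

// The proposed interface, repeated so that the sketch is self-contained.
interface Accumulative<T, A extends Accumulative<T, A, R>, R> {
    void accumulate(T t);
    A combine(A other);
    R finish();
}

// Hypothetical helper class standing in for the proposed Collector.of overload.
final class Accumulatives {
    private Accumulatives() { }

    static <T, A extends Accumulative<T, A, R>, R> Collector<T, ?, R> collector(
            Supplier<A> supplier, Collector.Characteristics... characteristics) {
        return Collector.<T, A, R>of(
                supplier,
                (a, t) -> a.accumulate(t),   // delegates to Accumulative.accumulate
                (a, b) -> a.combine(b),      // delegates to Accumulative.combine
                a -> a.finish(),             // delegates to Accumulative.finish
                characteristics);
    }
}

// Hypothetical example container: remembers the longest String seen so far.
final class LongestString implements Accumulative<String, LongestString, String> {
    private String longest = "";

    @Override
    public void accumulate(String s) {
        if (s.length() > longest.length()) {
            longest = s;
        }
    }

    @Override
    public LongestString combine(LongestString other) {
        accumulate(other.longest);
        return this;
    }

    @Override
    public String finish() {
        return longest;
    }
}
```

With this in place, a Collector for the container is obtained from `Accumulatives.collector(LongestString::new)`, mirroring the single-argument call the proposal aims for.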
Average-Developer Story
In this section, I show how the proposal may impact an average developer, who knows only the basics of the Collector API. If you know this API well, please do your best to imagine you don’t before reading on…
Example
Let’s reuse the example from my latest post (simplified even further). Assume that we have a Stream of:
interface IssueWiseText {
    int issueLength();
    int textLength();
}
and that we need to calculate issue coverage:
total issue length
─────────────
total text length
This requirement translates to the following signature:
Collector<IssueWiseText, ?, Double> toIssueCoverage();
Solution
An average developer may decide to use a custom accumulation type A to solve this (other solutions are possible, though). Let’s say the developer names it CoverageContainer so that:
- T: IssueWiseText
- A: CoverageContainer
- R: Double
Below, I’ll show how such a developer may arrive at the structure of CoverageContainer.
Structure Without Accumulative
Note: This section is long to illustrate how complex the procedure may be for a developer inexperienced with Collectors. You may skip it if you realize this already.
Without Accumulative, the developer will look at Collector.of, and see four main parameters:
- Supplier<A> supplier
- BiConsumer<A, T> accumulator
- BinaryOperator<A> combiner
- Function<A, R> finisher
To handle Supplier<A> supplier, the developer should:
- mentally substitute A in Supplier<A> to get Supplier<CoverageContainer>
- mentally resolve the signature to CoverageContainer get()
- recall the JavaDoc for Collector.supplier()
- recall method reference of the 4th kind (reference to a constructor)
- realize that supplier = CoverageContainer::new
To handle BiConsumer<A, T> accumulator, the developer should:
- substitute the types to get BiConsumer<CoverageContainer, IssueWiseText>
- resolve the signature to void accept(CoverageContainer a, IssueWiseText t)
- mentally transform the signature to an instance-method one: void accumulate(IssueWiseText t)
- recall method reference of the 3rd kind (reference to an instance method of an arbitrary object of a particular type)
- realize that accumulator = CoverageContainer::accumulate
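To handle BinaryOperator<A> combiner:
- substitute the types to get BinaryOperator<CoverageContainer>
- resolve the signature to CoverageContainer apply(CoverageContainer a, CoverageContainer b)
- transform the signature to an instance-method one: CoverageContainer combine(CoverageContainer other)
- realize that combiner = CoverageContainer::combine
To handle Function<A, R> finisher:
- substitute the types to get Function<CoverageContainer, Double>
- resolve the signature to Double apply(CoverageContainer a)
- transform the signature to an instance-method one: double issueCoverage()
- realize that finisher = CoverageContainer::issueCoverage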
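The two kinds of method reference recalled in this procedure can be sketched on a small stand-in container. `Tally` and `MethodReferenceKindsDemo` are hypothetical names used only here; the equivalent lambda is noted next to each reference.

```java
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical container with the same method shapes as in the procedure above.
class Tally {
    private int total;

    void accumulate(Integer value) { total += value; }

    Tally combine(Tally other) { total += other.total; return this; }

    Integer total() { return total; }
}

class MethodReferenceKindsDemo {

    // Reference to a constructor; same as: () -> new Tally()
    static final Supplier<Tally> SUPPLIER = Tally::new;

    // Reference to an instance method of an arbitrary object of a particular type:
    // the first argument becomes the receiver; same as: (a, t) -> a.accumulate(t)
    static final BiConsumer<Tally, Integer> ACCUMULATOR = Tally::accumulate;

    // Same kind; same as: (a, b) -> a.combine(b)
    static final BinaryOperator<Tally> COMBINER = Tally::combine;

    // Same kind, with the receiver as the only argument; same as: a -> a.total()
    static final Function<Tally, Integer> FINISHER = Tally::total;
}
```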
This long procedure results in:
class CoverageContainer {

    void accumulate(IssueWiseText t) { }

    CoverageContainer combine(CoverageContainer other) { }

    double issueCoverage() { }
}
And the developer can define toIssueCoverage() (having to provide the arguments in proper order):
Collector<IssueWiseText, ?, Double> toIssueCoverage() {
    return Collector.of(
            CoverageContainer::new, CoverageContainer::accumulate,
            CoverageContainer::combine, CoverageContainer::issueCoverage
    );
}
Structure With Accumulative
Now, with Accumulative, the developer will look at the new Collector.of overload and will see only one main parameter:
- Supplier<A> supplier
and one bounded type parameter:
- A extends Accumulative<T, A, R>
So the developer will start with the natural thing: implementing Accumulative<T, A, R> and resolving T, A, R for the first and last time:
class CoverageContainer implements Accumulative<IssueWiseText, CoverageContainer, Double> {

}
At this point, a decent IDE will complain that the class must implement all abstract methods. What’s more — and that’s the most beautiful part — it will offer a quick fix. In IntelliJ, you hit “Alt+Enter” → “Implement methods”, and… you’re done!
class CoverageContainer implements Accumulative<IssueWiseText, CoverageContainer, Double> {

    @Override
    public void accumulate(IssueWiseText issueWiseText) {

    }

    @Override
    public CoverageContainer combine(CoverageContainer other) {
        return null;
    }

    @Override
    public Double finish() {
        return null;
    }
}
So… you don’t have to juggle the types, write anything manually, or name anything!
Oh, yes: you still need to define toIssueCoverage(), but it’s simple now:
Collector<IssueWiseText, ?, Double> toIssueCoverage() {
    return Collector.of(CoverageContainer::new);
}
Isn’t that nice?
Implementation
The implementation isn’t relevant here, as it’s nearly the same for both cases (diff).
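For readers who want something to run, here is a minimal sketch of how the variant without Accumulative might be filled in. The field names, the `CoverageDemo.text` sample-data factory, and the choice to leave an empty input unguarded are assumptions of this sketch, not taken from the linked diff.

```java
import java.util.stream.Collector;

interface IssueWiseText {
    int issueLength();
    int textLength();
}

class CoverageContainer {
    private int totalIssueLength; // assumed field name
    private int totalTextLength;  // assumed field name

    void accumulate(IssueWiseText t) {
        totalIssueLength += t.issueLength();
        totalTextLength += t.textLength();
    }

    CoverageContainer combine(CoverageContainer other) {
        totalIssueLength += other.totalIssueLength;
        totalTextLength += other.totalTextLength;
        return this;
    }

    // Assumption: no guard for an empty input, so 0.0 / 0 yields NaN in that case.
    double issueCoverage() {
        return (double) totalIssueLength / totalTextLength;
    }

    static Collector<IssueWiseText, ?, Double> toIssueCoverage() {
        return Collector.of(
                CoverageContainer::new, CoverageContainer::accumulate,
                CoverageContainer::combine, CoverageContainer::issueCoverage);
    }
}

// Hypothetical helper that builds sample data for trying the Collector out.
class CoverageDemo {
    static IssueWiseText text(int issueLength, int textLength) {
        return new IssueWiseText() {
            @Override public int issueLength() { return issueLength; }
            @Override public int textLength() { return textLength; }
        };
    }
}
```

With Accumulative, the class would additionally declare `implements Accumulative<IssueWiseText, CoverageContainer, Double>` and expose the result through `finish()`; the accumulation logic stays the same.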
Rationale
Too Complex Procedure
I hope I’ve demonstrated how defining a custom Collector can be a challenge. I must say that even I always feel reluctant about defining one. However, I also feel that, with Accumulative, this reluctance would go away, because the procedure would shrink to two steps:
- Implement Accumulative<T, A, R>
- Call Collector.of(YourContainer::new)
Drive to Implement
JetBrains coined “the drive to develop”, and I’d like to twist it to “the drive to implement”.
Since a Collector is simply a box of functions, there’s usually no point (as far as I can tell) in implementing it (there are exceptions). However, a Google search for “implements Collector” shows (~5000 results) that people do it.
And it’s natural, because to create a “custom” TYPE in Java, one usually extends/implements TYPE. In fact, it’s so natural that even experienced developers (like Tomasz Nurkiewicz, a Java Champion) may do it.
To sum up, people feel the drive to implement, but in this case the JDK provides them with nothing to implement. And Accumulative could fill this gap…
Relevant Examples
Finally, I searched for examples where it’d be straightforward to implement Accumulative.
In OpenJDK (which is not the target place, though), I found two:
On Stack Overflow, though, I found plenty: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53.
I also found a few array-based examples that could be refactored to Accumulative for better readability: a, b, c.
Naming
Accumulative is not the best name, mainly because it’s an adjective. However, I chose it because:
- I wanted the name to start with A (as in <T, A, R>),
- my best candidate (Accumulator) was already taken by BiConsumer<A, T> accumulator(),
- AccumulativeContainer seemed too long.
In OpenJDK, A is called:
which prompts the following alternatives:
- AccumulatingBox
- AccumulationState
- Collector.Container
- MutableResultContainer
Of course, if the idea were accepted, the name would go through the “traditional” name bikeshedding.
Summary
In this post, I proposed to add the Accumulative interface and a new Collector.of overload to the JDK. With them, creating a custom Collector would no longer be associated by developers with a lot of effort. Instead, it’d simply become “implement the contract” & “reference the constructor”.
In other words, this proposal aims at lowering the bar of entering the custom-Collector world!
Appendix
Optional reading below.
Example Solution: JDK 12+
In JDK 12+, we’ll be able to define toIssueCoverage() as a composed Collector, thanks to Collectors.teeing (JDK-8209685):
static Collector<IssueWiseText, ?, Double> toIssueCoverage() {
    return Collectors.teeing(
            Collectors.summingInt(IssueWiseText::issueLength),
            Collectors.summingInt(IssueWiseText::textLength),
            (totalIssueLength, totalTextLength) -> (double) totalIssueLength / totalTextLength
    );
}
The above is concise, but it may be somewhat hard to follow for a Collector API newbie.
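One way to read it: teeing feeds every element to both downstream Collectors and then merges their two results. The sketch below (JDK 12+ required) puts the teeing version next to an equivalent two-pass computation. The class `TeeingDemo`, the `text` sample-data factory and `issueCoverageInTwoPasses` are hypothetical names introduced for this comparison.

```java
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

interface IssueWiseText {
    int issueLength();
    int textLength();
}

class TeeingDemo {

    // Hypothetical factory for sample data.
    static IssueWiseText text(int issueLength, int textLength) {
        return new IssueWiseText() {
            @Override public int issueLength() { return issueLength; }
            @Override public int textLength() { return textLength; }
        };
    }

    // Single pass: both sums are accumulated side by side, then merged.
    static Collector<IssueWiseText, ?, Double> toIssueCoverage() {
        return Collectors.teeing(
                Collectors.summingInt(IssueWiseText::issueLength),
                Collectors.summingInt(IssueWiseText::textLength),
                (totalIssueLength, totalTextLength) -> (double) totalIssueLength / totalTextLength
        );
    }

    // The same computation spelled out as two separate passes plus a merge step.
    static double issueCoverageInTwoPasses(List<IssueWiseText> texts) {
        int totalIssueLength = texts.stream()
                .collect(Collectors.summingInt(IssueWiseText::issueLength));
        int totalTextLength = texts.stream()
                .collect(Collectors.summingInt(IssueWiseText::textLength));
        return (double) totalIssueLength / totalTextLength;
    }
}
```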
Example Solution: the JDK Way
Alternatively, toIssueCoverage() could be defined as:
static Collector<IssueWiseText, ?, Double> toIssueCoverage() {
    return Collector.of(
            () -> new int[2],
            (a, t) -> { a[0] += t.issueLength(); a[1] += t.textLength(); },
            (a, b) -> { a[0] += b[0]; a[1] += b[1]; return a; },
            a -> (double) a[0] / a[1]
    );
}
I dubbed this the “JDK way”, because some Collectors are implemented like that in OpenJDK (e.g. Collectors.averagingInt).
Yet, while such terse code may be suitable for OpenJDK, it’s certainly not suitable for business logic because of its level of readability (which is low to the point that I call it cryptic).
Published on Java Code Geeks with permission by Tomasz Linkowski, partner at our JCG program. See the original article here: Accumulative: Custom Java Collectors Made Easy
Opinions expressed by Java Code Geeks contributors are their own.