Filterer Pattern in 10 Steps
Filterer is a pattern that should be applied only in certain cases. In the original post, I presented a very simple example intended to show how to apply it. In this post, I present a much more detailed example that’s intended to also explain when and why to apply it.
Introduction
The post consists of the following 10 short steps. In each step, I introduce requirements of the following two types:
- B-*: business requirements (given by the product owner → indisputable)
- S-*: solution requirements (resulting from the choice of solutions → disputable)
and I present a Java model meeting the requirements introduced so far. I do this until Filterer emerges as the preferable solution.
So, let me take you on this journey…
Step 1: Issue Detector
Requirements #1
Let’s assume business asks for an algorithm to detect grammatical and spelling issues in English texts.
For example:
- text: You migth know it. → issues to detect:
  - migth (type: spelling)
- text: I have noting to loose. → issues to detect:
  - noting (type: spelling)
  - to loose (type: grammar)
- text: I kept noting it’s loose. → issues to detect: ∅
This is our first business requirement (B-1).
The simplest model meeting B-1 could be:
- input: plain text
- output: a list of issues, where each issue provides:
  - offsets within the input text
  - a type (grammar / spelling)
This is our first solution requirement (S-1).
Java Model #1
We can model S-1 as:
interface IssueDetector {
    // e.g. text: "You migth know it."
    List<Issue> detect(String text);
}
where:
interface Issue {
    int startOffset(); // e.g. 4 (start of "migth")
    int endOffset(); // e.g. 9 (end of "migth")
    IssueType type(); // e.g. SPELLING
}
enum IssueType { GRAMMAR, SPELLING }
It’s commit 1.
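To make S-1 more tangible, here is a minimal sketch of a toy detector. The names SimpleIssue and DictionaryIssueDetector, as well as the lookup-table approach, are my assumptions for illustration only (they are not part of the commits), and records require Java 16+.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

enum IssueType { GRAMMAR, SPELLING }

interface Issue {
    int startOffset();
    int endOffset();
    IssueType type();
}

interface IssueDetector {
    List<Issue> detect(String text);
}

// Hypothetical value type implementing Issue.
record SimpleIssue(int startOffset, int endOffset, IssueType type) implements Issue {}

// Hypothetical toy detector: flags every word found in a fixed lookup table.
final class DictionaryIssueDetector implements IssueDetector {
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");
    private final Map<String, IssueType> knownIssues;

    DictionaryIssueDetector(Map<String, IssueType> knownIssues) {
        this.knownIssues = knownIssues;
    }

    @Override
    public List<Issue> detect(String text) {
        List<Issue> issues = new ArrayList<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            IssueType type = knownIssues.get(matcher.group().toLowerCase(Locale.ROOT));
            if (type != null) {
                // Matcher.end() is exclusive, so "migth" yields offsets 4 and 9.
                issues.add(new SimpleIssue(matcher.start(), matcher.end(), type));
            }
        }
        return issues;
    }
}
```

Note that such a context-free lookup could not satisfy B-1 fully: it has no way to tell the misspelled "noting" from the legitimate one in "I kept noting it’s loose."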
Step 2: Probability
Requirements #2
However, it’d be rather hard to implement a real IssueDetector that worked in such a deterministic way:
- issue (probability P=100%)
- non-issue (probability P=0%)
Instead, IssueDetector should rather be probabilistic:
- probable issue (probability P=?)
We can keep the issue/non-issue distinction by introducing a probability threshold (PT):
- issue (probability P ≥ PT),
- non-issue (probability P < PT).
Still, it’s worth adapting the model to keep the probability (P), which is useful e.g. in rendering (higher probability → more prominent rendering).
To sum up, our extra solution requirements are:
- S-2: Support issue probability (P);
- S-3: Support probability threshold (PT).
Java Model #2
We can meet S-2 by adding probability() to Issue:
interface Issue {
    // ...
    double probability();
}
We can meet S-3 by adding probabilityThreshold to IssueDetector:
interface IssueDetector {
    List<Issue> detect(String text, double probabilityThreshold);
}
It’s commit 2.
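As a rough illustration of how PT could be applied, the sketch below keeps only the issues with P ≥ PT. ScoredIssue, Scorer and ThresholdingIssueDetector are hypothetical names, and the actual scoring is deliberately left abstract.

```java
import java.util.List;
import java.util.stream.Collectors;

enum IssueType { GRAMMAR, SPELLING }

interface Issue {
    int startOffset();
    int endOffset();
    IssueType type();
    double probability();
}

interface IssueDetector {
    List<Issue> detect(String text, double probabilityThreshold);
}

// Hypothetical value type implementing Issue.
record ScoredIssue(int startOffset, int endOffset, IssueType type, double probability)
        implements Issue {}

// Hypothetical scoring step: yields every probable issue, whatever its probability.
interface Scorer {
    List<Issue> score(String text);
}

// Hypothetical detector that applies the probability threshold on top of a Scorer.
final class ThresholdingIssueDetector implements IssueDetector {
    private final Scorer scorer;

    ThresholdingIssueDetector(Scorer scorer) {
        this.scorer = scorer;
    }

    @Override
    public List<Issue> detect(String text, double probabilityThreshold) {
        return scorer.score(text).stream()
                // issue: P >= PT, non-issue: P < PT
                .filter(issue -> issue.probability() >= probabilityThreshold)
                .collect(Collectors.toList());
    }
}
```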
Step 3: Probable Issue
Requirements #3
Assume business requires:
- B-3: Test all issue detectors using texts proofread by an English linguist (= no probabilities).
Such a proofread text (or: a test case) can be defined as:
- text, e.g. You shuold know it.
- expected issues, e.g.
  - shuold (type: spelling)
So, our solution requirement is:
- S-4: Support expected issues (= no probability).
Java Model #3
We can meet S-4 by extracting a subinterface (ProbableIssue):
interface ProbableIssue extends Issue {
    double probability();
}
and by returning ProbableIssues from IssueDetector:
interface IssueDetector {
    List<ProbableIssue> detect(...);
}
It’s commit 3.
Step 4: Issue-wise Text
Requirements #4
Assume that:
- All test cases are defined externally (e.g. in XML files);
- We want to create a parametrized JUnit test where parameters are test cases provided as a Stream.
Generally, a test case represents something we could call an issue-wise text (a text + its issues).
In order to avoid modeling issue-wise text as Map.Entry<String, List<Issue>> (which is vague, and signifies insufficient abstraction), let’s introduce another solution requirement:
- S-5: Support issue-wise texts.
Java Model #4
We can model S-5 as:
interface IssueWiseText {
    String text(); // e.g. "You migth know it."
    List<Issue> issues(); // e.g. ["migth"]
}
This lets us define a Stream of test cases simply as Stream<IssueWiseText> instead of Stream<Map.Entry<String, List<Issue>>>.
It’s commit 4.
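The following sketch shows one way a Stream of test cases could be checked against a detector without JUnit. The matching rule (same offsets and type, probability ignored), the detect(String, double) signature, and the names ExpectedIssue, DetectedIssue, TestCase and DetectorCheck are all assumptions of mine, not something the commits prescribe.

```java
import java.util.List;
import java.util.stream.Stream;

enum IssueType { GRAMMAR, SPELLING }

interface Issue {
    int startOffset();
    int endOffset();
    IssueType type();
}

interface ProbableIssue extends Issue {
    double probability();
}

interface IssueDetector {
    List<ProbableIssue> detect(String text, double probabilityThreshold);
}

interface IssueWiseText {
    String text();
    List<Issue> issues();
}

// Hypothetical value types used only in this sketch.
record ExpectedIssue(int startOffset, int endOffset, IssueType type) implements Issue {}

record DetectedIssue(int startOffset, int endOffset, IssueType type, double probability)
        implements ProbableIssue {}

record TestCase(String text, List<Issue> issues) implements IssueWiseText {}

final class DetectorCheck {
    private DetectorCheck() {}

    // Assumed matching rule: offsets and type must agree, probability is ignored.
    static boolean sameIssue(Issue a, Issue b) {
        return a.startOffset() == b.startOffset()
                && a.endOffset() == b.endOffset()
                && a.type() == b.type();
    }

    // Assumes that neither list contains duplicate issues.
    static boolean passes(IssueDetector detector, IssueWiseText testCase, double threshold) {
        List<ProbableIssue> detected = detector.detect(testCase.text(), threshold);
        List<Issue> expected = testCase.issues();
        return detected.size() == expected.size()
                && expected.stream().allMatch(
                        e -> detected.stream().anyMatch(d -> sameIssue(e, d)));
    }

    static long countFailures(
            IssueDetector detector, Stream<? extends IssueWiseText> testCases, double threshold) {
        return testCases.filter(testCase -> !passes(detector, testCase, threshold)).count();
    }
}
```

In a real parametrized test, each IssueWiseText from the Stream would become one test invocation instead of being counted.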
Step 5: Expected Coverage
Requirements #5
Assume business requires:
- B-4: Report expected issue coverage for a stream of test cases;
where issue coverage — for the sake of simplicity — is defined as:
total issue length
─────────────
total text length
In reality, issue coverage could represent some very complex business logic.
Java Model #5
We can handle B-4 with a Collector-based method:
static double issueCoverage(Stream<? extends IssueWiseText> textStream) {
    return textStream.collect(IssueCoverage.collector());
}
The Collector is based on an Accumulator having two mutable fields:
int totalIssueLength = 0;
int totalTextLength = 0;
which, for each IssueWiseText, we increment:
totalIssueLength += issueWiseText.issues().stream().mapToInt(Issue::length).sum();
totalTextLength += issueWiseText.text().length();
and then we calculate issue coverage as:
(double) totalIssueLength / totalTextLength
It’s commit 5.
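Put together, the pieces above could look roughly like the sketch below. It is built with Collector.of; the default Issue.length() (end offset minus start offset), the merge step, and the SimpleIssue and SimpleText types are assumptions of mine, since the post shows only the fragments above.

```java
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Stream;

interface Issue {
    int startOffset();
    int endOffset();

    // Assumed definition: the post uses Issue::length without showing it.
    default int length() {
        return endOffset() - startOffset();
    }
}

interface IssueWiseText {
    String text();
    List<Issue> issues();
}

// Hypothetical value types used only in this sketch.
record SimpleIssue(int startOffset, int endOffset) implements Issue {}

record SimpleText(String text, List<Issue> issues) implements IssueWiseText {}

final class IssueCoverage {
    private IssueCoverage() {}

    static Collector<IssueWiseText, ?, Double> collector() {
        return Collector.of(
                Accumulator::new, Accumulator::add, Accumulator::merge, Accumulator::coverage);
    }

    static double issueCoverage(Stream<? extends IssueWiseText> textStream) {
        return textStream.collect(collector());
    }

    private static final class Accumulator {
        int totalIssueLength = 0;
        int totalTextLength = 0;

        void add(IssueWiseText issueWiseText) {
            totalIssueLength += issueWiseText.issues().stream().mapToInt(Issue::length).sum();
            totalTextLength += issueWiseText.text().length();
        }

        // Needed only when the stream is processed in parallel.
        Accumulator merge(Accumulator other) {
            totalIssueLength += other.totalIssueLength;
            totalTextLength += other.totalTextLength;
            return this;
        }

        // Yields NaN for an empty stream (0.0 / 0).
        double coverage() {
            return (double) totalIssueLength / totalTextLength;
        }
    }
}
```

For the first two example texts from Step 1, this gives (5 + 6 + 8) / (18 + 23) = 19 / 41.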
Step 6: Obtained Coverage
Requirements #6
Assume business requires:
- B-5: Report obtained issue coverage for the entire test set.
where “obtained” means “calculated using detected issues”. Now things start to get interesting!
First of all, since IssueCoverage represents business logic, we shouldn’t duplicate it:
- S-6: Reuse issue coverage code.
Secondly, since the method takes a Stream<? extends IssueWiseText>, we need to model an IssueWiseText for ProbableIssues:
- S-7: Support probabilistic issue-wise texts.
I see only two choices here:
- Parametrization: IssueWiseText<I extends Issue>;
- Subtyping: ProbabilisticIssueWiseText extends IssueWiseText.
Parametric Java Model #6
The parametric model of S-7 is simple: we need <I extends Issue> (a bounded type parameter) in IssueWiseText:
interface IssueWiseText<I extends Issue> {
    String text();
    List<I> issues();
}
This model has drawbacks (like type erasure), but it’s concise.
We can also adapt IssueDetector to return IssueWiseText<ProbableIssue>.
What’s more, our Stream of test cases may turn into Stream<IssueWiseText<Issue>> (although IssueWiseText<Issue> is somewhat controversial).
It’s commit 6a.
Subtyping Java Model #6
The other option is to choose subtyping (which has its own drawbacks, the greatest of which is perhaps duplication).
A subtyping model of S-7 employs return type covariance:
interface ProbabilisticIssueWiseText extends IssueWiseText {
    @Override
    List<? extends ProbableIssue> issues();
}
where issues() in IssueWiseText has to become upper bounded (List<? extends Issue>).
We can also adapt IssueDetector to return ProbabilisticIssueWiseText.
It’s commit 6b.
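Here is a small sketch of why the covariant return type is enough for S-6 and S-7: code written against IssueWiseText accepts the probabilistic subtype, while code written against ProbabilisticIssueWiseText still sees ProbableIssues. DetectedIssue, DetectedText and CoverageSketch are hypothetical, and the coverage method is a simplified stand-in for the Collector from Step 5.

```java
import java.util.List;
import java.util.stream.Stream;

interface Issue {
    int startOffset();
    int endOffset();
}

interface ProbableIssue extends Issue {
    double probability();
}

interface IssueWiseText {
    String text();
    List<? extends Issue> issues();
}

interface ProbabilisticIssueWiseText extends IssueWiseText {
    @Override
    List<? extends ProbableIssue> issues();
}

// Hypothetical value types used only in this sketch.
record DetectedIssue(int startOffset, int endOffset, double probability)
        implements ProbableIssue {}

// The generated accessor returns List<DetectedIssue>, a valid covariant override.
record DetectedText(String text, List<DetectedIssue> issues)
        implements ProbabilisticIssueWiseText {}

final class CoverageSketch {
    private CoverageSketch() {}

    // Simplified stand-in for IssueCoverage: it knows only about IssueWiseText.
    static double issueCoverage(Stream<? extends IssueWiseText> textStream) {
        long[] totals = new long[2]; // [0] = issue length, [1] = text length
        textStream.forEach(text -> {
            for (Issue issue : text.issues()) {
                totals[0] += issue.endOffset() - issue.startOffset();
            }
            totals[1] += text.text().length();
        });
        return (double) totals[0] / totals[1];
    }

    // Code that needs probabilities works on the subtype directly.
    static double maxProbability(ProbabilisticIssueWiseText text) {
        return text.issues().stream()
                .mapToDouble(ProbableIssue::probability)
                .max()
                .orElse(0.0);
    }
}
```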
Step 7: Filtering by Issue Type
Requirements #7
Assume business requires:
- B-6: Report issue coverage per issue type.
We could support it by accepting an extra parameter of type Predicate<? super Issue> (an IssueType parameter would be too narrow, in general).
However, supporting it directly in IssueCoverage would complicate business logic (commit 7a’). Instead, we’d rather feed the filtered instances of IssueWiseText to IssueCoverage.
How do we do the filtering? Doing it “manually” (calling new ourselves) would introduce unnecessary coupling to the implementations (we don’t even know them yet). That’s why we’ll let IssueWiseText do the filtering (I feel this logic belongs there):
- S-8: Support filtering by Issue in IssueWiseText.
In other words, we want to be able to say:
Hey IssueWiseText, filter yourself by Issue!
Parametric Java Model #7
In the parametric model, we add the following filtered method to IssueWiseText<I>:
IssueWiseText<I> filtered(Predicate<? super I> issueFilter);
This lets us meet B-6 as:
return textStream
    .map(text -> text.filtered(issue -> issue.type() == issueType))
    .collect(IssueCoverage.collector());
It’s commit 7a.
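An implementation of filtered could be as simple as the sketch below, which returns a new instance and leaves the original untouched. SimpleIssue and SimpleText are hypothetical implementations, not the ones from the commit.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

enum IssueType { GRAMMAR, SPELLING }

interface Issue {
    int startOffset();
    int endOffset();
    IssueType type();
}

interface IssueWiseText<I extends Issue> {
    String text();
    List<I> issues();
    IssueWiseText<I> filtered(Predicate<? super I> issueFilter);
}

// Hypothetical value types used only in this sketch.
record SimpleIssue(int startOffset, int endOffset, IssueType type) implements Issue {}

record SimpleText<I extends Issue>(String text, List<I> issues) implements IssueWiseText<I> {
    @Override
    public IssueWiseText<I> filtered(Predicate<? super I> issueFilter) {
        // Same text, only the issues accepted by the filter.
        List<I> kept = issues.stream().filter(issueFilter).collect(Collectors.toList());
        return new SimpleText<>(text, kept);
    }
}
```

Callers never touch SimpleText's constructor when filtering, which is exactly the decoupling S-8 asks for.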
Subtyping Java Model #7
In the subtyping model, we also add a filtered method (very similar to the one above):
IssueWiseText filtered(Predicate<? super Issue> issueFilter);
This lets us meet B-6 in the same way as above.
It’s commit 7b.
Step 8: Filtering by Probability
Requirements #8
Assume business requires:
- B-7: Report issue coverage per minimum probability.
In other words, business wants to know how the probability distribution affects issue coverage.
Now, we don’t want to run IssueDetector with many different probability thresholds (PT), because it’d be very inefficient. Instead, we’ll run it just once (with PT=0), and then keep discarding issues with the lowest probability to recalculate issue coverage.
Yet, in order to be able to filter by probabilities, we need to:
- S-9: Support filtering by ProbableIssue in probabilistic issue-wise text.
Parametric Java Model #8
In the parametric model, we don’t need to change anything. We can meet B-7 as:
return textStream
    .map(text -> text.filtered(issue -> issue.probability() >= minProbability))
    .collect(IssueCoverage.collector());
It’s commit 8a.
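The "detect once with PT=0, then filter repeatedly" idea can be sketched as follows for the parametric model. ProbabilitySweep, its coverage method (a simplified stand-in for IssueCoverage), DetectedIssue and SimpleText are hypothetical names introduced for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

interface Issue {
    int startOffset();
    int endOffset();
}

interface ProbableIssue extends Issue {
    double probability();
}

interface IssueWiseText<I extends Issue> {
    String text();
    List<I> issues();
    IssueWiseText<I> filtered(Predicate<? super I> issueFilter);
}

// Hypothetical value types used only in this sketch.
record DetectedIssue(int startOffset, int endOffset, double probability)
        implements ProbableIssue {}

record SimpleText<I extends Issue>(String text, List<I> issues) implements IssueWiseText<I> {
    @Override
    public IssueWiseText<I> filtered(Predicate<? super I> issueFilter) {
        return new SimpleText<>(
                text, issues.stream().filter(issueFilter).collect(Collectors.toList()));
    }
}

final class ProbabilitySweep {
    private ProbabilitySweep() {}

    // Simplified stand-in for IssueCoverage.
    static double coverage(List<? extends IssueWiseText<?>> texts) {
        int issueLength = 0;
        int textLength = 0;
        for (IssueWiseText<?> text : texts) {
            for (Issue issue : text.issues()) {
                issueLength += issue.endOffset() - issue.startOffset();
            }
            textLength += text.text().length();
        }
        return (double) issueLength / textLength;
    }

    // detectedOnce is assumed to come from a single detector run with PT=0.
    static Map<Double, Double> coverageByMinProbability(
            List<IssueWiseText<ProbableIssue>> detectedOnce, List<Double> minProbabilities) {
        Map<Double, Double> result = new LinkedHashMap<>();
        for (double minProbability : minProbabilities) {
            List<IssueWiseText<ProbableIssue>> filtered = detectedOnce.stream()
                    .map(text -> text.filtered(issue -> issue.probability() >= minProbability))
                    .collect(Collectors.toList());
            result.put(minProbability, coverage(filtered));
        }
        return result;
    }
}
```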
Subtyping Java Model #8
In the subtyping model, it’s harder, because we need an extra method in ProbabilisticIssueWiseText:
ProbabilisticIssueWiseText filteredProbabilistic(Predicate<? super ProbableIssue> issueFilter);
which lets us meet B-7 as:
return textStream
    .map(text -> text.filteredProbabilistic(issue -> issue.probability() >= minProbability))
    .collect(IssueCoverage.collector());
It’s commit 8b.
To me, this extra method in ProbabilisticIssueWiseText is quite disturbing, though (see here). That’s why I propose…
Step 9: Filterer
Requirements #9
Since regular filtering in the subtyping model is so “non-uniform”, let’s make it uniform:
- S-10: Support uniform filtering in the subtyping model of issue-wise text.
In other words, we want to be able to say:
Hey ProbabilisticIssueWiseText, filter yourself by ProbableIssue (but in the same way as IssueWiseText filters itself by Issue)!
To the best of my knowledge, this can be achieved only with the Filterer Pattern.
Subtyping Java Model #9
So we apply a generic Filterer to IssueWiseText:
Filterer<? extends IssueWiseText, ? extends Issue> filtered();
and to ProbabilisticIssueWiseText:
@Override
Filterer<? extends ProbabilisticIssueWiseText, ? extends ProbableIssue> filtered();
Now, we can filter uniformly by calling:
text.filtered().by(issue -> ...)
It’s commit 9.
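Since this post doesn’t repeat the definition of Filterer, the sketch below assumes its simplest possible shape (a functional interface with a single by method, which matches the call above). DetectedIssue, DetectedText and FiltererDemo are hypothetical names.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

interface Issue {
    int startOffset();
    int endOffset();
}

interface ProbableIssue extends Issue {
    double probability();
}

// Assumed shape of Filterer: G is the filtered container type, E is its element type.
@FunctionalInterface
interface Filterer<G, E> {
    G by(Predicate<? super E> predicate);
}

interface IssueWiseText {
    String text();
    List<? extends Issue> issues();
    Filterer<? extends IssueWiseText, ? extends Issue> filtered();
}

interface ProbabilisticIssueWiseText extends IssueWiseText {
    @Override
    List<? extends ProbableIssue> issues();

    @Override
    Filterer<? extends ProbabilisticIssueWiseText, ? extends ProbableIssue> filtered();
}

// Hypothetical value types used only in this sketch.
record DetectedIssue(int startOffset, int endOffset, double probability)
        implements ProbableIssue {}

record DetectedText(String text, List<DetectedIssue> issues)
        implements ProbabilisticIssueWiseText {
    @Override
    public Filterer<DetectedText, DetectedIssue> filtered() {
        return predicate -> new DetectedText(
                text, issues.stream().filter(predicate).collect(Collectors.toList()));
    }
}

final class FiltererDemo {
    private FiltererDemo() {}

    // Through the supertype, the lambda sees an Issue.
    static IssueWiseText longIssuesOnly(IssueWiseText text, int minLength) {
        return text.filtered().by(issue -> issue.endOffset() - issue.startOffset() >= minLength);
    }

    // Through the subtype, the same call shape sees a ProbableIssue.
    static ProbabilisticIssueWiseText probableOnly(
            ProbabilisticIssueWiseText text, double minProbability) {
        return text.filtered().by(issue -> issue.probability() >= minProbability);
    }
}
```

Both helper methods use the same filtered().by(...) call; only the static type of the text decides what the predicate may inspect and what type comes back.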
Step 10: Detection Time
By this time, you must wonder why I bother with the subtyping model if the parametric one is so much easier.
So, for the last time, let’s assume that business requires:
- B-8: Report detection time (= time it takes to detect all issues in a given text).
Parametric Java Model #10
I see only two ways of incorporating B-8 into the parametric model: 1) composition, 2) subtyping.
Composition for Parametric Java Model #10
Applying composition is easy. We introduce IssueDetectionResult:
interface IssueDetectionResult {
    IssueWiseText<ProbableIssue> probabilisticIssueWiseText();
    Duration detectionTime();
}
and modify IssueDetector to return it.
It’s commit 10a.
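One way to produce such a result is to wrap an untimed detection step and measure it with System.nanoTime(), as in the sketch below. UntimedDetector, TimedIssueDetector, SimpleText and SimpleResult are hypothetical names, and the detect signature is assumed from Step 2.

```java
import java.time.Duration;
import java.util.List;

interface Issue {
    int startOffset();
    int endOffset();
}

interface ProbableIssue extends Issue {
    double probability();
}

interface IssueWiseText<I extends Issue> {
    String text();
    List<I> issues();
}

interface IssueDetectionResult {
    IssueWiseText<ProbableIssue> probabilisticIssueWiseText();
    Duration detectionTime();
}

interface IssueDetector {
    IssueDetectionResult detect(String text, double probabilityThreshold);
}

// Hypothetical: the detection logic itself, unaware of timing.
interface UntimedDetector {
    List<ProbableIssue> detect(String text, double probabilityThreshold);
}

// Hypothetical value types used only in this sketch.
record SimpleText<I extends Issue>(String text, List<I> issues) implements IssueWiseText<I> {}

record SimpleResult(
        IssueWiseText<ProbableIssue> probabilisticIssueWiseText, Duration detectionTime)
        implements IssueDetectionResult {}

// Hypothetical decorator that adds the detection time to the result.
final class TimedIssueDetector implements IssueDetector {
    private final UntimedDetector delegate;

    TimedIssueDetector(UntimedDetector delegate) {
        this.delegate = delegate;
    }

    @Override
    public IssueDetectionResult detect(String text, double probabilityThreshold) {
        long start = System.nanoTime();
        List<ProbableIssue> issues = delegate.detect(text, probabilityThreshold);
        Duration elapsed = Duration.ofNanos(System.nanoTime() - start);
        return new SimpleResult(new SimpleText<>(text, issues), elapsed);
    }
}
```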
Subtyping for Parametric Java Model #10
Applying subtyping requires a bit more work. We need to add ProbabilisticIssueWiseText<I>*:
interface ProbabilisticIssueWiseText<I extends ProbableIssue> extends IssueWiseText<I> {
    Duration detectionTime();
    // ...
}
and modify IssueDetector to return ProbabilisticIssueWiseText<?>.
It’s commit 10a’.
* Note that I left <I> on ProbabilisticIssueWiseText in order not to correlate parametrization with subtyping in a dangerous way.
Subtyping Java Model #10
With the purely subtyping model, incorporating B-8 is very easy. We just add detectionTime() to ProbabilisticIssueWiseText:
interface ProbabilisticIssueWiseText extends IssueWiseText {
    Duration detectionTime();
    // ...
}
It’s commit 10b.
Conclusions
There’s no time left to go into details (the post is already way longer than I expected).
However, I prefer pure subtyping (and hence Filterer) over other solutions because:
- Parametrization with composition leaves me without a common supertype (in certain cases, it’s a problem);
- Parametrization with subtyping has too many degrees of freedom.
By “too many degrees of freedom”, I mean I only need:
- IssueWiseText<?>
- ProbabilisticIssueWiseText<?>
- IssueWiseText<Issue> (controversial)
but in code, I’ll also encounter (speaking from experience!):
- IssueWiseText<? extends Issue> (redundant upper bound)
- IssueWiseText<ProbableIssue>
- IssueWiseText<? extends ProbableIssue> (why not ProbabilisticIssueWiseText<?>?)
- ProbabilisticIssueWiseText<? extends ProbableIssue> (redundant upper bound)
- ProbabilisticIssueWiseText<ProbableIssue>
so it’s just too confusing for me. But if you’re really interested in this topic, check out Complex Subtyping vs. Parametrization (be warned, though: it’s even longer than this post!).
Thank you for reading!
Published on Java Code Geeks with permission by Tomasz Linkowski, partner at our JCG program. See the original article here: Filterer Pattern in 10 Steps. Opinions expressed by Java Code Geeks contributors are their own.