How to Pattern-Match Files and Display Adjacent Lines in Java
We recently published an article about the awesome window function support in jOOλ 0.9.9, which I believe is one of the best additions to the library that we’ve ever made.
Today, we’ll look into an awesome application of window functions in a use case inspired by this Stack Overflow question by Sean Nguyen:
How to get lines before and after matching from java 8 stream like grep?
I have a text files that have a lot of string lines in there. If I want to find lines before and after a matching in grep, I will do like this:
grep -A 10 -B 10 "ABC" myfile.txt
How can I implement the equivalent in java 8 using streams?
So the question is:
How can I implement the equivalent in Java 8 using streams?
Well, the Unix shell and its various “pipable” commands are about the only things that are even more awesome (and mysterious) than window functions. Being able to grep for a certain string in a file, and then display a “window” of a couple of lines around each match, is quite useful.
With jOOλ 0.9.9, however, we can do that very easily in Java 8 as well. Consider this little snippet:
Seq.seq(Files.readAllLines(Paths.get(
        new File("/path/to/Example.java").toURI())))
   .window()
   .filter(w -> w.value().contains("ABC"))
   .forEach(w -> {
       System.out.println();
       System.out.println("-1:" + w.lag().orElse(""));
       System.out.println(" 0:" + w.value());
       System.out.println("+1:" + w.lead().orElse(""));
       // ABC: Just checking
   });
This program will output:
-1:   .window()
 0:   .filter(w -> w.value().contains("ABC"))
+1:   .forEach(w -> {

-1:       System.out.println("+1:" + w.lead().orElse(""));
 0:       // ABC: Just checking
+1:   });
So, I’ve run the program on itself and I’ve found all the lines that match “ABC”, plus the previous lines (“lagging” / lag()) and the following lines (“leading” / lead()). These lead() and lag() functions work just like their SQL equivalents.
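To make the lag / lead idea concrete without the library, here is a minimal JDK-only sketch. It is not how jOOλ implements its windows; the class name LagLeadSketch and the helpers lag() and lead() are hypothetical names made up for this illustration. They simply return the neighbouring element of a list as an Optional, which is empty at either end of the list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class LagLeadSketch {

    // Hypothetical helper: the element one position before index i, if any.
    static <T> Optional<T> lag(List<T> list, int i) {
        return i > 0 ? Optional.of(list.get(i - 1)) : Optional.empty();
    }

    // Hypothetical helper: the element one position after index i, if any.
    static <T> Optional<T> lead(List<T> list, int i) {
        return i < list.size() - 1 ? Optional.of(list.get(i + 1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in data; a real program would read the lines from a file.
        List<String> lines = Arrays.asList("first", "has ABC", "last ABC");

        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).contains("ABC")) {
                System.out.println();
                System.out.println("-1:" + lag(lines, i).orElse(""));
                System.out.println(" 0:" + lines.get(i));
                System.out.println("+1:" + lead(lines, i).orElse(""));
            }
        }
    }
}
```

The index bookkeeping that the helpers do by hand here is what the window abstraction takes care of in the jOOλ version.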
But unlike SQL, composing functions in Java (or other general-purpose languages) is a bit simpler as there is less syntax clutter involved. We can easily do aggregations over a window frame to collect an arbitrary number of lines “lagging” and “leading” a match. Consider the following alternative:
int lower = -5;
int upper = 5;

Seq.seq(Files.readAllLines(Paths.get(
        new File("/path/to/Example.java").toURI())))
   .window(lower, upper)
   .filter(w -> w.value().contains("ABC"))
   .map(w -> w.window()
              .zipWithIndex()
              .map(t -> tuple(t.v1, t.v2 + lower))
              .map(t -> (t.v2 > 0
                       ? "+"
                       : t.v2 == 0
                       ? " " : "")
                       + t.v2 + ":" + t.v1)
              .toString("\n"))
   .forEach(System.out::println);
And the output that we’re getting is this:
-5:int upper = 5;
-4:
-3:Seq.seq(Files.readAllLines(Paths.get(
-2:        new File("/path/to/Example.java").toURI())))
-1:   .window(lower, upper)
 0:   .filter(w -> w.value().contains("ABC"))
+1:   .map(w -> w.window()
+2:              .zipWithIndex()
+3:              .map(t -> tuple(t.v1, t.v2 + lower))
+4:              .map(t -> (t.v2 > 0
+5:                       ? "+"
Could it get any more concise? I don’t think so. Most of the logic above was just generating the index next to the line.
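For comparison, here is roughly what the same “context window” looks like with plain JDK 8 streams. This is only a sketch under a few assumptions: the class GrepContextSketch and the methods context() and label() are hypothetical names, the lines are already in memory, and offsets are computed from absolute line indexes so that the labels stay correct when the window is clipped at the start or end of the input. Unlike grep, overlapping windows are not merged.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class GrepContextSketch {

    // Hypothetical helper: formats an offset as "-2", " 0" or "+3".
    static String label(int offset) {
        return (offset > 0 ? "+" : offset == 0 ? " " : "") + offset;
    }

    // Hypothetical helper: one text block per line containing `needle`, holding
    // up to `before` preceding and `after` following lines, each prefixed with
    // its offset relative to the matching line.
    static List<String> context(List<String> lines, String needle, int before, int after) {
        return IntStream.range(0, lines.size())
            .filter(i -> lines.get(i).contains(needle))
            .mapToObj(i -> IntStream
                // Clamp the window to the bounds of the input.
                .rangeClosed(Math.max(0, i - before),
                             Math.min(lines.size() - 1, i + after))
                .mapToObj(j -> label(j - i) + ":" + lines.get(j))
                .collect(Collectors.joining("\n")))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stand-in data; a real program would read the lines from a file.
        List<String> lines = Arrays.asList("one", "two ABC", "three", "four");
        context(lines, "ABC", 2, 1).forEach(System.out::println);
    }
}
```

It works, but the nested index ranges are exactly the kind of plumbing that window(lower, upper) hides.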
Conclusion
Window functions are extremely powerful. The recent discussion on Reddit about our previous article on jOOλ’s window function support has shown that other languages also offer primitives for building similar functionality. But usually, these building blocks aren’t as concise as the ones exposed in jOOλ, which are inspired by SQL.
With jOOλ mimicking SQL’s window functions, there is little cognitive friction when composing powerful operations on in-memory data streams.
Learn more about window functions in these articles:
- Probably the Coolest SQL Feature: Window Functions
- Use this Neat Window Function Trick to Calculate Time Differences in a Time Series
- How to Find the Longest Consecutive Series of Events in SQL
- Don’t Miss out on Awesome SQL Power with FIRST_VALUE(), LAST_VALUE(), LEAD(), and LAG()
- The Difference Between ROW_NUMBER(), RANK(), and DENSE_RANK()
Reference: How to Pattern-Match Files and Display Adjacent Lines in Java from our JCG partner Lukas Eder at the JAVA, SQL, AND JOOQ blog.