Core Java

How to Pattern-Match Files and Display Adjacent Lines in Java

Recently, we’ve published our article about the awesome window function support in jOOλ 0.9.9, which I believe is some of the best additions to the library that we’ve ever done.

Today, we’ll look into an awesome application of window functions in a use-case that is inspired by this Stack Overflow question Sean Nguyen:

How to get lines before and after matching from java 8 stream like grep?

I have a text files that have a lot of string lines in there. If I want to find lines before and after a matching in grep, I will do like this:

grep -A 10 -B 10 "ABC" myfile.txt

How can I implement the equivalent in java 8 using streams?

So the question is:

How can I implement the equivalent in Java 8 using streams?

jool-logoWell, the unix shell and its various “pipable” commands are about the only thing that are even more awesome (and mysterious) than window functions. Being able to grep for a certain string in a file, and then display a “window” of a couple of lines is quite useful.

With jOOλ 0.9.9, however, we can do that very easily in Java 8 as well. Consider this little snippet:

Seq.seq(Files.readAllLines(Paths.get(
        new File("/path/to/Example.java").toURI())))
   .window()
   .filter(w -> w.value().contains("ABC"))
   .forEach(w -> {
       System.out.println();
       System.out.println("-1:" + w.lag().orElse(""));
       System.out.println(" 0:" + w.value());
       System.out.println("+1:" + w.lead().orElse(""));
       // ABC: Just checking
   });

This program will output:

-1: .window()
 0: .filter(w -> w.value().contains("ABC"))
+1: .forEach(w -> {

-1:     System.out.println("+1:" + w.lead().orElse(""));
 0:     // ABC: Just checking
+1: });

So, I’ve run the program on itself and I’ve found all the lines that match “ABC”, plus the previous lines (“lagging” / lag()) and the following lines (leading / lead()). These lead() and lag() functions work just like their SQL equivalents.

But unlike SQL, composing functions in Java (or other general purpose languages) is a bit simpler as there is less syntax clutter involved. We can easily do aggregations over a window frame to collect a generic amount of lines “lagging” and “leading” a match. Consider the following alternative:

int lower = -5;
int upper =  5;
        
Seq.seq(Files.readAllLines(Paths.get(
        new File("/path/to/Example.java").toURI())))
   .window(lower, upper)
   .filter(w -> w.value().contains("ABC"))
   .map(w -> w.window()
              .zipWithIndex()
              .map(t -> tuple(t.v1, t.v2 + lower))
              .map(t -> (t.v2 > 0 
                       ? "+" 
                       : t.v2 == 0 
                       ? " " : "") 
                       + t.v2 + ":" + t.v1)
              .toString("\n"))

And the output that we’re getting is this:

-5:int upper =  5;
-4:        
-3:Seq.seq(Files.readAllLines(Paths.get(
-2:        new File("/path/to/Example.java").toURI())))
-1:   .window(lower, upper)
 0:   .filter(w -> w.value().contains("ABC"))
+1:   .map(w -> w.window()
+2:              .zipWithIndex()
+3:              .map(t -> tuple(t.v1, t.v2 + lower))
+4:              .map(t -> (t.v2 > 0 
+5:                       ? "+"

Could it get any more concise? I don’t think so. Most of the logic above was just generating the index next to the line.

Conclusion

Window functions are extremely powerful. The recent discussion on reddit about our previous article on jOOλ’s window function support has shown that other languages also support primitives to build similar functionality. But usually, these building blocks aren’t as concise as the ones exposed in jOOλ, which are inspired by SQL.

With jOOλ mimicking SQL’s window functions, there is only little cognitive friction when composing powerful operations on in memory data streams.

Learn more about window functions in these articles here:

Lukas Eder

Lukas is a Java and SQL enthusiast developer. He created the Data Geekery GmbH. He is the creator of jOOQ, a comprehensive SQL library for Java, and he is blogging mostly about these three topics: Java, SQL and jOOQ.
Subscribe
Notify of
guest

This site uses Akismet to reduce spam. Learn how your comment data is processed.

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Back to top button