Filtering a List with Regular Expressions in Java

In Java, filtering a list based on certain criteria is a common operation. When the criterion is a text pattern, Regular Expressions (Regex) are a powerful way to express it: they let you keep only the strings that match a pattern and discard the rest. Let us delve into understanding how to apply a Java list regex filter to match and extract specific elements based on patterns.

1. Regex Overview

Regular Expressions (Regex) are sequences of characters that form search patterns. In Java, the java.util.regex package provides the tools to work with Regex. Regex is commonly used for tasks such as searching, matching, and replacing text based on patterns. The key components of Regex include:

  • Literal Characters: These represent the actual characters we want to match, e.g., “a”, “1”, etc.
  • Metacharacters: Special characters like ., *, and +, which represent more complex patterns.
  • Character Classes: Denoted by square brackets, like [a-z] for lowercase letters.
  • Quantifiers: Define how many instances of a character or group of characters should be matched, e.g., *, +, or {n}.

To perform Regex operations in Java, we typically use the Pattern and Matcher classes from the java.util.regex package.
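The components listed above can be tried out with a few lines of code. The following is a minimal sketch, not part of the original examples; the class name RegexBasicsSketch and the helper fullyMatches are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexBasicsSketch {

    // Hypothetical helper: returns true only if the ENTIRE input matches the regex.
    public static boolean fullyMatches(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);   // compile the regex text
        Matcher matcher = pattern.matcher(input);   // bind it to one input
        return matcher.matches();                   // whole-string match
    }

    public static void main(String[] args) {
        // Literal characters are matched exactly as written.
        System.out.println(fullyMatches("cat", "cat"));        // true
        // The metacharacter '.' stands for any single character.
        System.out.println(fullyMatches("c.t", "cut"));        // true
        // A character class matches one character from a set.
        System.out.println(fullyMatches("[a-z]", "q"));        // true
        // The quantifier {3} requires exactly three digits; "2024" has four.
        System.out.println(fullyMatches("[0-9]{3}", "2024"));  // false
    }
}
```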

2. Different Ways to Filter a List in Java Using a Regex

Let’s explore different ways to filter a list in Java using Regular Expressions. We’ll start by defining a list of strings and applying various filtering techniques.

2.1 Using Java 8 Streams and Regex

Java 8 introduced the Stream API, which makes it easier to perform operations on collections like filtering, mapping, and reducing. We can use streams with a Regex to filter a list of strings. Here’s an example of filtering a list of strings that contain the word “Java” using a Regex pattern:

import java.util.*;
import java.util.regex.*;
import java.util.stream.*;

public class RegexFilterExample {
    public static void main(String[] args) {
        // Typed list; the raw List type would lose the String element type
        List<String> items = Arrays.asList("Java", "Python", "JavaScript", "Ruby", "JavaFX");

        // Regex pattern to match strings containing "Java"
        String pattern = ".*Java.*";

        List<String> filteredItems = items.stream()
            .filter(item -> Pattern.matches(pattern, item)) // Filtering using Regex
            .collect(Collectors.toList());

        System.out.println(filteredItems);
    }
}

2.1.1 Code Explanation and Output

In the provided Java code, a List of strings is created with the names of several programming languages. The goal is to filter this list to only include those strings that contain the word “Java”.

The code first imports the classes it needs, such as List, Arrays, Pattern, and Collectors, through wildcard imports. It initializes a List called items with the strings “Java”, “Python”, “JavaScript”, “Ruby”, and “JavaFX”.

A String variable called pattern is defined with the regular expression .*Java.*. This Regex pattern is designed to match any string that contains the substring “Java” at any position in the string. The .* before and after “Java” allows for any characters to precede or follow the word “Java”.

The code then uses the stream() method to convert the items list into a stream. The filter() method is applied to the stream, and it filters the elements by checking each element against the regular expression using the Pattern.matches() method. The matches() method returns true only when the entire string matches the pattern, and false otherwise, which is why the pattern wraps “Java” in .* on both sides. Note that Pattern.matches() compiles the regular expression on every call.
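The difference between a whole-string match and a substring match is easy to overlook. The sketch below contrasts the two; the class MatchesSemanticsSketch and both helper methods are hypothetical names used only for illustration.

```java
import java.util.regex.Pattern;

public class MatchesSemanticsSketch {

    // Whole-string match: the regex must account for every character of the input.
    public static boolean wholeStringMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Substring match: the regex may match anywhere inside the input.
    public static boolean substringMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeStringMatch("Java", "JavaScript"));     // false
        System.out.println(wholeStringMatch(".*Java.*", "JavaScript")); // true
        System.out.println(substringMatch("Java", "JavaScript"));       // true
    }
}
```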

After filtering, the collect() method is called to collect the matching elements into a new list called filteredItems using Collectors.toList().

Finally, the System.out.println(filteredItems) statement prints the filtered list to the console.

The output of this code will be a list containing the strings “Java”, “JavaScript”, and “JavaFX” because these are the strings that contain the word “Java”.
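A possible variation, shown here as a sketch rather than as a replacement for the example above, is to compile the pattern once and test each element with find(). This avoids recompiling the regular expression for every element and removes the need for the surrounding .* in the pattern. The class name ContainsFilterSketch and the helper keepContaining are assumptions made for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ContainsFilterSketch {

    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern JAVA = Pattern.compile("Java");

    // Hypothetical helper: keeps the items in which the pattern occurs anywhere.
    public static List<String> keepContaining(List<String> items, Pattern pattern) {
        return items.stream()
            .filter(item -> pattern.matcher(item).find()) // find() looks for a substring match
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("Java", "Python", "JavaScript", "Ruby", "JavaFX");
        System.out.println(keepContaining(items, JAVA)); // [Java, JavaScript, JavaFX]
    }
}
```

For a handful of elements the cost of recompiling is negligible; the precompiled form mainly pays off when the list is large or the filter runs often.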

2.2 Using a Loop and Matcher Class

Another way to filter a list is by using a for loop and the Matcher class. This allows for more control over the matching process. Here’s an example that filters out all strings that don’t start with “Java”:

import java.util.*;
import java.util.regex.*;

public class RegexFilterWithMatcher {
    public static void main(String[] args) {
        // Typed list; a raw List would not compile in the for-each loop below
        List<String> items = Arrays.asList("Java", "Python", "JavaScript", "Ruby", "JavaFX");

        // Regex pattern to match strings starting with "Java"
        String pattern = "^Java.*";

        List<String> filteredItems = new ArrayList<>();

        for (String item : items) {
            Matcher matcher = Pattern.compile(pattern).matcher(item);
            if (matcher.matches()) {
                filteredItems.add(item); // Adding matching items to the list
            }
        }

        System.out.println(filteredItems);
    }
}

2.2.1 Code Explanation and Output

In this Java code, a List of strings is created with various programming languages, including “Java”, “Python”, “JavaScript”, “Ruby”, and “JavaFX”. The goal is to filter this list to only include strings that start with the word “Java”.

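The code defines a String variable named pattern with the regular expression ^Java.*. This Regex pattern is designed to match any string that begins with “Java”. The ^ at the start of the pattern indicates the beginning of the string, and .* means any characters can follow after “Java”. Because matches() already requires the whole string to match, the ^ anchor is redundant here, but it is harmless and makes the intent explicit.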

An empty ArrayList called filteredItems is created to hold the strings that match the pattern.

The code then enters a for loop, iterating over each string in the items list. Inside the loop, a Matcher object is created using the Pattern.compile(pattern) method to compile the Regex pattern, followed by the matcher(item) method to apply the pattern to the current string item.

The matcher.matches() method is called to check if the current string matches the pattern. If the string matches (i.e., it starts with “Java”), the string is added to the filteredItems list using filteredItems.add(item).

After the loop completes, the filtered list of strings is printed to the console with System.out.println(filteredItems).

The output will be a list containing the strings “Java”, “JavaScript”, and “JavaFX”, as these are the strings that start with “Java”.
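The loop-based approach can also be written so that the pattern is compiled only once and a single Matcher is reused. The following sketch assumes that variation; it uses Matcher.lookingAt(), which anchors the match at the start of the input without requiring the rest of the string to match, so the pattern can be just Java. The class name PrefixFilterSketch and the helper keepStartingWith are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrefixFilterSketch {

    // Hypothetical helper: keeps the items whose beginning matches the regex.
    public static List<String> keepStartingWith(List<String> items, String regex) {
        Pattern pattern = Pattern.compile(regex); // compiled once, outside the loop
        Matcher matcher = pattern.matcher("");    // one Matcher, reset for each item
        List<String> result = new ArrayList<>();
        for (String item : items) {
            // lookingAt() must match at the start but need not consume the whole string
            if (matcher.reset(item).lookingAt()) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("Java", "Python", "JavaScript", "Ruby", "JavaFX");
        System.out.println(keepStartingWith(items, "Java")); // [Java, JavaScript, JavaFX]
    }
}
```

A Matcher is not safe to share between threads, so the reuse shown here is only appropriate inside a single-threaded loop like this one.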

2.3 Using the Predicate Interface

If you’re working with Java 8 and above, you can also use the Predicate interface to filter a list with a Regex. Here’s an example that filters the list of strings based on a Regex condition:

import java.util.*;
import java.util.regex.*;
import java.util.function.*;

public class RegexFilterWithPredicate {
    public static void main(String[] args) {
        List<String> items = Arrays.asList("Java", "Python", "JavaScript", "Ruby", "JavaFX");

        // Regex pattern to match strings containing "Java"
        String pattern = ".*Java.*";

        // Creating a Predicate to filter using the Regex pattern
        // (Predicate<String> is required: with a raw Predicate the lambda parameter is an Object)
        Predicate<String> matchesPattern = item -> Pattern.compile(pattern).matcher(item).matches();

        List<String> filteredItems = new ArrayList<>();

        for (String item : items) {
            if (matchesPattern.test(item)) {
                filteredItems.add(item); // Adding matching items to the list
            }
        }

        System.out.println(filteredItems);
    }
}

2.3.1 Code Explanation and Output

In this Java code, a list of strings is created containing several programming language names: “Java”, “Python”, “JavaScript”, “Ruby”, and “JavaFX”. The goal of the program is to keep only the strings that contain the word “Java” using Regular Expressions.

The code defines a regular expression pattern .*Java.* to match any string that contains “Java” at any position within the string. The .* on either side of “Java” allows for any characters before or after the word “Java”.

Next, the code creates a Predicate named matchesPattern using a lambda expression. This Predicate takes each string in the list and applies the regex pattern to it by compiling the pattern with Pattern.compile(pattern) and then matching it using the matcher(item).matches() method. The matches() method returns true if the string matches the pattern and false if it doesn’t.

An empty ArrayList named filteredItems is then created to store the strings that match the pattern. The program iterates over the items list using a for loop, applying the test() method of the matchesPattern predicate to each string. If the string matches the pattern, it is added to the filteredItems list.

Finally, the code prints the filtered list of strings, which will include “Java”, “JavaScript”, and “JavaFX” because they contain the word “Java”.

This approach demonstrates how to use the Predicate interface in combination with a regular expression to filter a list concisely and functionally.
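The Pattern class can also produce a Predicate directly. Pattern.asPredicate(), available since Java 8, returns a Predicate<String> that tests with find(), so the pattern does not need the surrounding .*; Java 11 added asMatchPredicate() for whole-string matching. The sketch below uses only asPredicate(); the class name PatternPredicateSketch and the helpers keepMatching and dropMatching are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class PatternPredicateSketch {

    // Hypothetical helper: keeps the items in which the regex matches somewhere.
    public static List<String> keepMatching(List<String> items, String regex) {
        Predicate<String> containsMatch = Pattern.compile(regex).asPredicate();
        return items.stream().filter(containsMatch).collect(Collectors.toList());
    }

    // Hypothetical helper: returns a copy of the list without the matching items.
    public static List<String> dropMatching(List<String> items, String regex) {
        Predicate<String> containsMatch = Pattern.compile(regex).asPredicate();
        List<String> copy = new ArrayList<>(items); // copy so the input list is left untouched
        copy.removeIf(containsMatch);
        return copy;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("Java", "Python", "JavaScript", "Ruby", "JavaFX");
        System.out.println(keepMatching(items, "Java")); // [Java, JavaScript, JavaFX]
        System.out.println(dropMatching(items, "Java")); // [Python, Ruby]
    }
}
```

Because the predicate is an ordinary Predicate<String>, it can be combined with negate(), and(), or or() when a filter depends on more than one pattern.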

3. Conclusion

Filtering a list using Regular Expressions in Java provides a powerful mechanism for processing and filtering data based on patterns. By combining Regex with Java’s Stream API, Matcher class, and Predicate interface, developers can implement various filtering strategies efficiently.

Yatin Batra

An experienced full-stack engineer well versed in Core Java, Spring/Springboot, MVC, Security, AOP, Frontend (Angular & React), and cloud technologies (such as AWS, GCP, Jenkins, Docker, K8s).