Remove Non-alphabetic Characters From String Array Example
1. Introduction
Removing non-alphabetic characters from a string is useful in applications that perform text search, matching, and analysis. In this example, I will show four ways to remove non-alphabetic characters from a string:
- via the String.replaceAll method with a regular expression.
- via character filtering with a stream (String.chars and java.util.stream.IntStream.filter).
- via StringBuilder from the java.lang package to append the alphabetic characters.
- via RegExUtils.replaceAll from Apache Commons Lang.
2. Setup
In this step, I will create a Maven project that depends on both the Apache Commons Lang and JUnit 5 libraries.
pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.zheng</groupId>
    <artifactId>t</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.commons/commons-lang3 -->
        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-lang3</artifactId>
            <version>3.17.0</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.junit.jupiter/junit-jupiter-api -->
        <dependency>
            <groupId>org.junit.jupiter</groupId>
            <artifactId>junit-jupiter-api</artifactId>
            <version>5.11.4</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>
3. Remove All Non-alphabetic Characters From a String
In this step, I will create a RemoveNonAlphabeticUtil.java class that includes four methods to remove non-alphabetic characters.
- viaCharacter utilizes a StringBuilder to append the alphabetic characters from the array returned by toCharArray.
- viaRegExUtils_replaceAll uses the replaceAll method from org.apache.commons.lang3.RegExUtils.
- viaString_replaceAll_RegEx uses the replaceAll method from java.lang.String.
- viaStream filters out any non-alphabetic character with a stream.
RemoveNonAlphabeticUtil.java
package org.zheng.demo;

import java.util.Arrays;

import org.apache.commons.lang3.RegExUtils;

public class RemoveNonAlphabeticUtil {
    private static final String NON_ALPHA_REGEX = "[^a-zA-Z]"; // anything that is not an ASCII letter

    public String viaCharacter(final String testStr) {
        StringBuilder sb = new StringBuilder();
        for (char c : testStr.toCharArray()) {
            if (Character.isLetter(c)) { // keep letters only
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public String viaRegExUtils_replaceAll(final String testMsgs) {
        return RegExUtils.replaceAll(testMsgs, NON_ALPHA_REGEX, "");
    }

    // Same regular expression, applied with the JDK's own String.replaceAll.
    public String viaString_replaceAll_RegEx(final String testMsgs) {
        return testMsgs.replaceAll(NON_ALPHA_REGEX, "");
    }

    // Filters each string's characters and rebuilds the string from the letters that remain.
    public String[] viaStream(final String[] stringArray) {
        return Arrays.stream(stringArray)
                .map(str -> str.chars().filter(Character::isLetter)
                        .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append).toString())
                .toArray(String[]::new);
    }
}
- Line 8: the regular expression [^a-zA-Z] matches any non-alphabetic character, meaning any character that is neither a lowercase nor an uppercase English letter.
- Line 13: append the character only if Character.isLetter(c) is true.
- Line 21: replace every non-alphabetic character with an empty string ("") via the org.apache.commons.lang3.RegExUtils.replaceAll method.
- Line 26: replace every non-alphabetic character with an empty string ("") via the java.lang.String.replaceAll method.
- Line 32: remove any non-alphabetic character by passing Character::isLetter to the java.util.stream.IntStream.filter method.
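One subtlety worth keeping in mind: the regex-based methods and the Character.isLetter-based methods agree only on ASCII input. Character.isLetter accepts any Unicode letter, whereas [^a-zA-Z] strips everything outside the 26 English letters. The following is a minimal sketch, not part of the original project, that illustrates the difference. The class name LetterFilterComparison and its method names are hypothetical; \P{L} is the java.util.regex property class for "not a Unicode letter", and codePoints() is used instead of chars() so that letters outside the Basic Multilingual Plane are tested as whole code points.

```java
public class LetterFilterComparison {

    // ASCII-only: removes every character outside a-z and A-Z.
    public static String asciiOnly(String input) {
        return input.replaceAll("[^a-zA-Z]", "");
    }

    // Unicode-aware regex: \P{L} matches any code point that is not a letter.
    public static String unicodeRegex(String input) {
        return input.replaceAll("\\P{L}", "");
    }

    // Unicode-aware filtering by code point rather than by UTF-16 unit,
    // so a supplementary letter is not split into two surrogate halves.
    public static String unicodeCodePoints(String input) {
        return input.codePoints()
                .filter(Character::isLetter)
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        // "Caf" followed by U+00E9 (e with acute accent), a space, digits, and "!"
        String sample = "Caf\u00e9 42!";
        System.out.println(asciiOnly(sample));         // Caf
        System.out.println(unicodeRegex(sample));      // keeps the accented letter
        System.out.println(unicodeCodePoints(sample)); // keeps the accented letter
    }
}
```

Which behavior is correct depends on the application: a search index for English text may want the ASCII-only result, while user-facing names usually should keep accented letters.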
4. JUnit Test
In this step, I will create a RemoveNonAlphabeticUtilTest.java class to test the four methods defined in step 3.
RemoveNonAlphabeticUtilTest.java
package org.zheng.demo;

import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

class RemoveNonAlphabeticUtilTest {

    String[] stringArray = { "this ", "is:", "some odd!~12323", "characters!" };
    RemoveNonAlphabeticUtil testClass = new RemoveNonAlphabeticUtil();

    // Cleans each element in place with viaCharacter, then checks the result.
    @Test
    void test_viaCharacter() {
        for (int idx = 0; idx < stringArray.length; idx++) {
            stringArray[idx] = testClass.viaCharacter(stringArray[idx]);
        }
        verifyData(stringArray);
    }

    // Cleans each element in place with the Apache Commons Lang variant.
    @Test
    void test_viaRegExUtils_replaceAll() {
        for (int idx = 0; idx < stringArray.length; idx++) {
            stringArray[idx] = testClass.viaRegExUtils_replaceAll(stringArray[idx]);
        }
        verifyData(stringArray);
    }

    // Cleans each element in place with String.replaceAll.
    @Test
    void test_viaString_replaceAll_RegEx() {
        for (int idx = 0; idx < stringArray.length; idx++) {
            stringArray[idx] = testClass.viaString_replaceAll_RegEx(stringArray[idx]);
        }
        verifyData(stringArray);
    }

    // viaStream takes the whole array and returns a new, cleaned array.
    @Test
    void test_viaStream() {
        String[] updatedStrs = testClass.viaStream(stringArray);
        verifyData(updatedStrs);
    }

    // Shared assertions: only the letters of each element should remain.
    private void verifyData(final String[] stringArray) {
        assertEquals("this", stringArray[0]);
        assertEquals("is", stringArray[1]);
        assertEquals("someodd", stringArray[2]);
        assertEquals("characters", stringArray[3]);
    }
}
- Line 9: defines a test string array { "this ", "is:", "some odd!~12323", "characters!" }. Note that it contains several non-alphabetic characters: white space, a colon (:), exclamation marks (!), a tilde (~), and numeric digits (12323).
- Line 48: verifies that the trailing white space is removed.
- Line 49: verifies that the colon is removed.
- Line 50: verifies that the white space, "!", "~", and digits are removed.
- Line 51: verifies that the "!" is removed.
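The first three tests overwrite the elements of stringArray one by one. As a sketch of a non-mutating alternative, the helper below returns a cleaned copy of the array and reuses one precompiled java.util.regex.Pattern rather than passing the regex string to replaceAll for every element. The class name ArrayCleaner and the method name cleanAll are hypothetical and are not part of the original project.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class ArrayCleaner {

    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern NON_ALPHA = Pattern.compile("[^a-zA-Z]");

    // Returns a new array; the input array is left unchanged.
    public static String[] cleanAll(String[] input) {
        return Arrays.stream(input)
                .map(s -> NON_ALPHA.matcher(s).replaceAll(""))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] raw = { "this ", "is:", "some odd!~12323", "characters!" };
        String[] cleaned = cleanAll(raw);
        System.out.println(Arrays.toString(cleaned)); // [this, is, someodd, characters]
        System.out.println(raw[0].length());          // 5, the trailing space is still there
    }
}
```

Keeping the input untouched makes it possible to run several cleaning strategies against the same data and compare their results.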
5. Demonstration
In this step, I will run the JUnit tests and capture the test results.
6. Conclusion
In this example, I created a simple Maven project that included a Java class with four methods to remove non-alphabetic characters from a string. The viaString_replaceAll_RegEx and viaRegExUtils_replaceAll methods both call replaceAll with a regular expression argument, from the java.lang.String and org.apache.commons.lang3.RegExUtils classes respectively. The viaCharacter and viaStream methods instead keep only the characters accepted by Character.isLetter.
7. Download
This was an example of a Maven project that removes non-alphabetic characters from a string.
You can download the full source code of this example here: Remove Non-alphabetic Characters From String Array Example