Remove Non-alphabetic Characters From String Array Example
1. Introduction
Removing non-alphabetic characters from a string is useful in applications that perform text search, matching, and analysis. In this example, I will show four ways to remove non-alphabetic characters from a string:
- via the String.replaceAll method with a regular expression.
- via character filtering with the java.util.stream API.
- via a StringBuilder (from the java.lang package) that appends only the alphabetic characters.
- via RegExUtils.replaceAll from Apache Commons Lang.
2. Setup
In this step, I will create a Maven project with the Apache Commons Lang and JUnit 5 libraries.
pom.xml
```xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.zheng</groupId>
    <artifactId>t</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-lang3</artifactId>
            <version>3.17.0</version>
        </dependency>
        <dependency>
            <groupId>org.junit.jupiter</groupId>
            <artifactId>junit-jupiter-api</artifactId>
            <version>5.11.4</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>
```
3. Remove All Non-alphabetic Characters From a String
In this step, I will create a RemoveNonAlphabeticUtil.java class that includes four methods to remove non-alphabetic characters.
- viaCharacter uses a StringBuilder to append each letter from the character array returned by toCharArray.
- viaRegExUtils_replaceAll uses the replaceAll method from org.apache.commons.lang3.RegExUtils.
- viaString_replaceAll_RegEx uses the replaceAll method from java.lang.String.
- viaStream uses a stream to filter out every non-alphabetic character from each string in an array.
RemoveNonAlphabeticUtil.java
```java
package org.zheng.demo;

import java.util.Arrays;

import org.apache.commons.lang3.RegExUtils;

public class RemoveNonAlphabeticUtil {

    // Matches any character that is not an ASCII letter (a-z or A-Z).
    private static final String NON_ALPHA_REGEX = "[^a-zA-Z]";

    // Keeps only the chars for which Character.isLetter returns true.
    public String viaCharacter(final String testStr) {
        StringBuilder sb = new StringBuilder();
        for (char c : testStr.toCharArray()) {
            if (Character.isLetter(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Apache Commons Lang: replaces every regex match with "".
    public String viaRegExUtils_replaceAll(final String testMsgs) {
        return RegExUtils.replaceAll(testMsgs, NON_ALPHA_REGEX, "");
    }

    // JDK: String.replaceAll replaces every regex match with "".
    public String viaString_replaceAll_RegEx(final String testMsgs) {
        return testMsgs.replaceAll(NON_ALPHA_REGEX, "");
    }

    // For each string in the array, keeps only letter chars and rebuilds the string.
    public String[] viaStream(final String[] stringArray) {
        return Arrays.stream(stringArray)
                .map(str -> str.chars()
                        .filter(Character::isLetter)
                        .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                        .toString())
                .toArray(String[]::new);
    }
}
```
- NON_ALPHA_REGEX: the regular expression [^a-zA-Z] matches any non-alphabetic character, meaning any character that is neither a lowercase nor an uppercase ASCII letter.
- viaCharacter: appends a character only if Character.isLetter(c) returns true.
- viaRegExUtils_replaceAll: replaces every non-alphabetic character with "" via the org.apache.commons.lang3.RegExUtils.replaceAll method.
- viaString_replaceAll_RegEx: replaces every non-alphabetic character with "" via the java.lang.String.replaceAll method.
- viaStream: removes every non-alphabetic character by passing Character::isLetter to IntStream.filter on each string's chars(), then collects the remaining characters into a new string.
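The regex methods and the Character.isLetter methods produce the same results for ASCII input, but not for every input: [^a-zA-Z] keeps only the 52 ASCII letters, whereas Character.isLetter also accepts letters such as é or ß. The following sketch compares the two and adds a Unicode-aware regex, \P{L} (any code point that is not a Unicode letter), precompiled into a java.util.regex.Pattern because String.replaceAll compiles its regex on each call. The class UnicodeLetterFilter and its method names are hypothetical and not part of the article's project.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the article's project.
class UnicodeLetterFilter {

    // Same ASCII-only class as NON_ALPHA_REGEX: removes everything except a-z and A-Z.
    private static final Pattern NON_ASCII_ALPHA = Pattern.compile("[^a-zA-Z]");

    // \P{L} matches any code point outside the Unicode letter categories.
    private static final Pattern NON_LETTER = Pattern.compile("\\P{L}");

    // Both Patterns are compiled once; String.replaceAll would compile its regex on every call.
    static String asciiOnly(String s) {
        return NON_ASCII_ALPHA.matcher(s).replaceAll("");
    }

    static String unicodeLetters(String s) {
        return NON_LETTER.matcher(s).replaceAll("");
    }

    // Character.isLetter applied per code point; chars() would split a supplementary
    // letter into two surrogate units, neither of which passes isLetter.
    static String viaCodePoints(String s) {
        return s.codePoints()
                .filter(Character::isLetter)
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        // "Cafe Strasse 42!" spelled with an accented e and a sharp s.
        String input = "Caf\u00e9 Stra\u00dfe 42!";
        System.out.println(asciiOnly(input));      // CafStrae
        System.out.println(unicodeLetters(input)); // keeps the accented e and the sharp s
        System.out.println(viaCodePoints(input));  // same result as unicodeLetters
    }
}
```

Which variant fits depends on whether accented and non-Latin letters should count as alphabetic in the application.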
4. JUnit Test
In this step, I will create a RemoveNonAlphabeticUtilTest.java class to test the four methods defined in step 3.
RemoveNonAlphabeticUtilTest.java
```java
package org.zheng.demo;

import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

class RemoveNonAlphabeticUtilTest {

    // JUnit 5 creates a new test instance per test method, so every test starts with fresh data.
    String[] stringArray = { "this ", "is:", "some odd!~12323", "characters!" };

    RemoveNonAlphabeticUtil testClass = new RemoveNonAlphabeticUtil();

    @Test
    void test_viaCharacter() {
        for (int idx = 0; idx < stringArray.length; idx++) {
            stringArray[idx] = testClass.viaCharacter(stringArray[idx]);
        }
        verifyData(stringArray);
    }

    @Test
    void test_viaRegExUtils_replaceAll() {
        for (int idx = 0; idx < stringArray.length; idx++) {
            stringArray[idx] = testClass.viaRegExUtils_replaceAll(stringArray[idx]);
        }
        verifyData(stringArray);
    }

    @Test
    void test_viaString_replaceAll_RegEx() {
        for (int idx = 0; idx < stringArray.length; idx++) {
            stringArray[idx] = testClass.viaString_replaceAll_RegEx(stringArray[idx]);
        }
        verifyData(stringArray);
    }

    @Test
    void test_viaStream() {
        String[] updatedStrs = testClass.viaStream(stringArray);
        verifyData(updatedStrs);
    }

    // Checks the expected result for each input string.
    private void verifyData(final String[] stringArray) {
        assertEquals("this", stringArray[0]);
        assertEquals("is", stringArray[1]);
        assertEquals("someodd", stringArray[2]);
        assertEquals("characters", stringArray[3]);
    }
}
```
- stringArray defines the test input { "this ", "is:", "some odd!~12323", "characters!" }. Note that it contains several non-alphabetic characters: spaces, a colon (:), exclamation marks (!), a tilde (~), and the digits 12323.
- assertEquals("this", stringArray[0]) verifies that the trailing space is removed.
- assertEquals("is", stringArray[1]) verifies that the colon is removed.
- assertEquals("someodd", stringArray[2]) verifies that the space, "!", "~", and digits are removed.
- assertEquals("characters", stringArray[3]) verifies that the "!" is removed.
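Every input in stringArray is lowercase, so verifyData cannot distinguish [^a-zA-Z] from a character class that mishandles uppercase letters; [^a-zZ-Z], for example, also strips A through Y. A minimal sketch of a mixed-case check follows. It is plain Java rather than a JUnit test so that it runs on its own, and the class name MixedCaseCheck and its sample inputs are hypothetical; in the project, the same expectations could go into an extra @Test method that calls RemoveNonAlphabeticUtil.

```java
import java.util.Arrays;

// Hypothetical stand-alone check; not part of the article's project.
class MixedCaseCheck {

    // Removes every match of the given character class, as String.replaceAll does in the article.
    static String strip(String s, String nonAlphaRegex) {
        return s.replaceAll(nonAlphaRegex, "");
    }

    // Applies strip to each element and returns a new array.
    static String[] stripAll(String[] input, String nonAlphaRegex) {
        return Arrays.stream(input)
                .map(s -> strip(s, nonAlphaRegex))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] input = {"Hello, World!", "JUnit 5", "ABC-xyz"};
        String[] expected = {"HelloWorld", "JUnit", "ABCxyz"};

        // ASCII-letter class: every expectation holds.
        System.out.println(Arrays.equals(expected, stripAll(input, "[^a-zA-Z]"))); // true

        // Mistyped class: uppercase A-Y are stripped along with punctuation and digits.
        System.out.println(Arrays.toString(stripAll(input, "[^a-zZ-Z]")));       // [elloorld, nit, xyz]
    }
}
```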
5. Demonstration
In this step, I will run the JUnit tests and capture the test results.
6. Conclusion
In this example, I created a simple Maven project that includes a Java class with four methods to remove non-alphabetic characters from a string. The viaString_replaceAll_RegEx and viaRegExUtils_replaceAll methods both pass a regular expression to a replaceAll method, from java.lang.String and org.apache.commons.lang3.RegExUtils respectively. The other two methods, viaCharacter and viaStream, avoid regular expressions and test each character with Character.isLetter.
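One practical difference between the two replaceAll variants is null handling. The Commons Lang Javadoc documents that RegExUtils.replaceAll returns null when the input text is null, whereas String.replaceAll is an instance method, so invoking it on a null reference throws a NullPointerException. If that null tolerance is wanted without the Commons Lang dependency, a JDK-only wrapper could look like the following minimal sketch; NullSafeStrip and removeNonAlphabetic are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical JDK-only wrapper; not part of the article's project.
class NullSafeStrip {

    // ASCII-letter class, compiled once.
    private static final Pattern NON_ALPHA = Pattern.compile("[^a-zA-Z]");

    // Returns null for null input (like RegExUtils.replaceAll) instead of throwing a NullPointerException.
    static String removeNonAlphabetic(String text) {
        if (text == null) {
            return null;
        }
        return NON_ALPHA.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(removeNonAlphabetic("some odd!~12323")); // someodd
        System.out.println(removeNonAlphabetic(null));              // null
    }
}
```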
7. Download
This was an example of a Maven project that removes non-alphabetic characters from a string.
You can download the full source code of this example here: Remove Non-alphabetic Characters From String Array Example