Normalize End Of Line Character
1. Introduction
An end-of-line (EOL) character is a special character that marks the end of a line in a text file or a string. Historically, different operating systems have used different EOL characters. For example, UNIX systems (including Linux and modern macOS) use "\n" (newline, also called line feed or LF), classic Mac OS (before Mac OS X) used "\r" (carriage return, CR), and Microsoft Windows uses "\r\n" (CRLF). Of these three, "\n" is the most widely used. When Java programs process text data that comes from different operating systems with different EOL characters, normalizing the EOL characters keeps data processing consistent and avoids unexpected behavior. In this example, I will demonstrate how to normalize end-of-line characters to the value returned by the java.lang.System.lineSeparator method, which is the system-dependent line separator string.
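To make the three conventions concrete, here is a minimal sketch that counts how often each one appears in a string. EolCounter and its nested Eol enum are hypothetical names for illustration, not part of the demo project. The key detail is that a "\r" immediately followed by "\n" must be counted as one Windows line ending, not as a Mac one plus a UNIX one.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical helper (not part of the demo project): counts each EOL convention in a string. */
final class EolCounter {

    enum Eol { CRLF, CR, LF }

    static Map<Eol, Integer> count(String text) {
        Map<Eol, Integer> counts = new EnumMap<>(Eol.class);
        for (Eol eol : Eol.values()) {
            counts.put(eol, 0);
        }
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                counts.merge(Eol.CRLF, 1, Integer::sum);
                i++; // the '\n' belongs to this CRLF pair, so skip it
            } else if (c == '\r') {
                counts.merge(Eol.CR, 1, Integer::sum);
            } else if (c == '\n') {
                counts.merge(Eol.LF, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("a\r\nb\rc\nd\n")); // prints {CRLF=1, CR=1, LF=2}
    }
}
```

A string whose counts are spread across more than one key mixes conventions, which is exactly the kind of input the normalization below is meant to clean up.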
2. Maven Setup
In this step, I will set up a Maven project that prints out three text lines.
pom.xml
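<project xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.zheng.demo</groupId>
    <artifactId>demoEOL</artifactId>
    <version>0.0.1-SNAPSHOT</version>

    <properties>
        <!-- String.lines(), used in NormalizeEOL, requires Java 11 or later -->
        <maven.compiler.source>11</maven.compiler.source>
        <maven.compiler.target>11</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.commons/commons-lang3 -->
        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-lang3</artifactId>
            <version>3.14.0</version>
        </dependency>
        <!-- JUnit 5, required by the @Test classes in this example -->
        <dependency>
            <groupId>org.junit.jupiter</groupId>
            <artifactId>junit-jupiter</artifactId>
            <version>5.10.2</version>
        </dependency>
    </dependencies>
</project>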
I will create a TestData class whose buildAStringWithEol method builds three text lines, each terminated by the given EOL character.
TestData.java
package demoEOL;

public class TestData {

    // EOL used by classic Mac OS
    public static final String EOL_MAC = "\r";
    // EOL used by UNIX, Linux, and modern macOS
    public static final String EOL_UNIX = "\n";
    // EOL used by Windows
    public static final String EOL_WIN = "\r\n";

    // Builds three text lines, each terminated by the given EOL character(s)
    public static String buildAStringWithEol(String eolChar) {
        StringBuilder sb = new StringBuilder();
        sb.append("Line 1Mary");
        sb.append(eolChar);
        sb.append("Line 2Zheng");
        sb.append(eolChar);
        sb.append("Line 3Joe");
        sb.append(eolChar);
        return sb.toString();
    }

    // Prints a blank line, a header, and then the normalized string
    public static void printout(String replacedString) {
        System.out.print(System.lineSeparator());
        System.out.println("*** Replaced String should have 3 lines:");
        System.out.print(replacedString);
    }
}
- EOL_MAC: the EOL character for classic Mac OS ("\r").
- EOL_UNIX: the EOL character for UNIX ("\n").
- EOL_WIN: the EOL characters for Windows ("\r\n").
- System.lineSeparator() in printout: the built-in Java method that returns the platform line separator, i.e. the initial value of the line.separator system property. Its value depends on the underlying operating system.
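Because "\r" and "\n" are invisible when printed, it can be hard to tell which separator System.lineSeparator() actually returns. The following sketch, using a hypothetical LineSeparatorInspector class that is not part of the demo project, escapes those characters so the value can be printed. It also reads the line.separator property directly for comparison; the two values agree unless the program has changed that property at runtime.

```java
/** Hypothetical helper (not part of the demo project) that makes CR and LF visible. */
final class LineSeparatorInspector {

    /** Replaces '\r' and '\n' with the two-character escapes "\\r" and "\\n". */
    static String escape(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        // Prints \r\n on Windows and \n on Linux and macOS
        System.out.println("System.lineSeparator() = " + escape(System.lineSeparator()));
        System.out.println("line.separator         = " + escape(System.getProperty("line.separator")));
    }
}
```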
3. Show the Issue
In this step, I will demonstrate the potential issue caused by unnormalized EOL characters.
DemoEOLIssue.java
package demoEOL;

import org.junit.jupiter.api.Test;

class DemoEOLIssue {

    @Test
    void test_eol_macs() {
        System.out.println("Mac \\r with 3 lines:");
        // Three lines, each terminated by the classic Mac OS EOL "\r"
        System.out.print(TestData.buildAStringWithEol(TestData.EOL_MAC));
    }

    @Test
    void test_eol_macs_console() {
        // "\r" returns the cursor to the start of the line without starting a new one
        System.out.print("Line1LongSentenceWillbeTruncated\r");
        System.out.print("Line2MaryZheng");
    }
}
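- test_eol_macs: constructs three text lines terminated by the classic Mac OS EOL character "\r".
- test_eol_macs_console: prints a line ending with "\r". A console that interprets "\r" as a control character, as the Windows console does, moves the cursor back to the beginning of the current line instead of starting a new line the way classic Mac OS did.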
Execute the test_eol_macs test and capture the output.
test_eol_macs output
Mac \r with 3 lines:
Line 3Joeng
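Note: as the output shows, the first two lines are missing and the third line becomes "Line 3Joeng". Each "\r" returns the cursor to the start of the same console line, so every line overwrites the previous one. Because "Line 3Joe" is two characters shorter than "Line 2Zheng", the trailing "ng" of the second line remains visible.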
Run the test_eol_macs_console test and capture the output:
test_eol_macs_console output
Line2MaryZhengnceWillbeTruncated
As you can see, the printed line is not what we expected. The console treats "\r" as a carriage return rather than a line break, so "Line2MaryZheng" overwrites the first 14 characters of the previous text instead of appearing on a new line.
Note: to reproduce this output in the Eclipse IDE, the console setting "Interpret Carriage Return (\r) as control character" must be enabled.
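The overwriting effect in both outputs can be reproduced without a console. The sketch below is a hypothetical CarriageReturnSimulator (not part of the demo project) that models a single console line: "\r" moves the cursor back to column 0, and later characters overwrite whatever is already there. It assumes the input contains no "\n", since it only renders one line.

```java
/** Hypothetical sketch of how a console that honors '\r' renders one output line. */
final class CarriageReturnSimulator {

    static String render(String output) {
        StringBuilder line = new StringBuilder();
        int cursor = 0;
        for (char c : output.toCharArray()) {
            if (c == '\r') {
                cursor = 0; // carriage return: back to column 0, no new line
            } else if (cursor < line.length()) {
                line.setCharAt(cursor++, c); // overwrite an existing character
            } else {
                line.append(c); // write past the current end of the line
                cursor++;
            }
        }
        return line.toString();
    }

    public static void main(String[] args) {
        System.out.println(render("Line 1Mary\rLine 2Zheng\rLine 3Joe\r"));             // Line 3Joeng
        System.out.println(render("Line1LongSentenceWillbeTruncated\rLine2MaryZheng")); // Line2MaryZhengnceWillbeTruncated
    }
}
```

Both results match the captured outputs above, which supports the explanation that the text is not lost but overwritten in place.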
4. Normalize EOL Character via Java Built-in Library
In this step, I will normalize the EOL characters into System.lineSeparator with two Java built-in approaches:
- String.replaceAll: replaces each substring of this string that matches the given regular expression with the given replacement.
- Stream API: uses the String.lines method (available since Java 11) to split the text into lines and the map method to append System.lineSeparator to each line.
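A possible alternative to writing out the three EOL sequences by hand is the "\R" linebreak matcher, supported by java.util.regex since Java 8. The sketch below uses a hypothetical LinebreakMatcherNormalizer class (not part of the demo project). Be aware that "\R" is broader than the explicit pattern: besides "\r\n", "\r", and "\n" it also matches vertical tab, form feed, U+0085, U+2028, and U+2029, so results can differ for text containing those characters.

```java
import java.util.regex.Matcher;

/** Hypothetical sketch: normalize line breaks with the "\R" linebreak matcher (Java 8+). */
final class LinebreakMatcherNormalizer {

    static String normalize(String text, String separator) {
        // "\\R" matches "\r\n" as one unit, or any single Unicode line-break character.
        // quoteReplacement guards against '$' or '\' in the replacement string.
        return text.replaceAll("\\R", Matcher.quoteReplacement(separator));
    }

    public static void main(String[] args) {
        String mixed = "Line 1Mary\r\nLine 2Zheng\rLine 3Joe\n";
        System.out.print(normalize(mixed, System.lineSeparator()));
    }
}
```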
NormalizeEOL.java
package demoEOL;

import java.util.stream.Collectors;

import org.junit.jupiter.api.Test;

class NormalizeEOL {

    // "\r\n" must come first so a Windows line ending is matched as one unit
    private static final String NEW_LINE_REG = "\r\n|\r|\n";

    @Test
    void test_eol_mac_stream() {
        String replacedString = TestData.buildAStringWithEol(TestData.EOL_MAC).lines()
                .map(line -> line + System.lineSeparator())
                .collect(Collectors.joining());
        TestData.printout(replacedString);
    }

    @Test
    void test_eol_macs() {
        String replacedString = TestData.buildAStringWithEol(TestData.EOL_MAC)
                .replaceAll(NEW_LINE_REG, System.lineSeparator());
        TestData.printout(replacedString);
    }

    @Test
    void test_eol_unix() {
        String replacedString = TestData.buildAStringWithEol(TestData.EOL_UNIX)
                .replaceAll(NEW_LINE_REG, System.lineSeparator());
        TestData.printout(replacedString);
    }

    @Test
    void test_eol_unix_stream() {
        String replacedString = TestData.buildAStringWithEol(TestData.EOL_UNIX).lines()
                .map(line -> line + System.lineSeparator())
                .collect(Collectors.joining());
        TestData.printout(replacedString);
    }

    @Test
    void test_eol_windows() {
        String replacedString = TestData.buildAStringWithEol(TestData.EOL_WIN)
                .replaceAll(NEW_LINE_REG, System.lineSeparator());
        TestData.printout(replacedString);
    }

    @Test
    void test_eol_windows_stream() {
        String replacedString = TestData.buildAStringWithEol(TestData.EOL_WIN).lines()
                .map(line -> line + System.lineSeparator())
                .collect(Collectors.joining());
        TestData.printout(replacedString);
    }
}
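- NEW_LINE_REG: a regular expression matching the three EOL sequences used by Windows, classic Mac OS, and UNIX. "\r\n" is listed first because regex alternatives are tried from left to right; otherwise a Windows line ending would be matched as "\r" followed by "\n" and replaced with two line separators.
- test_eol_mac_stream, test_eol_unix_stream, test_eol_windows_stream: use the Stream map method to change each EOL to System.lineSeparator.
- test_eol_macs, test_eol_unix, test_eol_windows: use replaceAll to replace each EOL with System.lineSeparator.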
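The ordering rule for the alternation is easy to verify. This sketch, using a hypothetical AlternationOrderDemo class (not part of the demo project), applies the article's pattern and a deliberately mis-ordered one to Windows text and counts the resulting lines with String.lines().

```java
/** Hypothetical sketch showing why "\r\n" must come first in the EOL alternation. */
final class AlternationOrderDemo {

    // Same pattern as NEW_LINE_REG in NormalizeEOL
    static final String CORRECT_ORDER = "\r\n|\r|\n";
    // Deliberately wrong: "\r" matches before "\r\n" is ever tried
    static final String WRONG_ORDER = "\r|\n|\r\n";

    public static void main(String[] args) {
        String windowsText = "Line 1\r\nLine 2\r\n";
        System.out.println(windowsText.replaceAll(CORRECT_ORDER, "\n").lines().count()); // 2
        System.out.println(windowsText.replaceAll(WRONG_ORDER, "\n").lines().count());   // 4, an empty line after each
    }
}
```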
Execute the JUnit tests and capture the output. Each test prints three text lines as expected.
JUnit Output
*** Replaced String should have 3 lines:
Line 1Mary
Line 2Zheng
Line 3Joe
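One difference between the two built-in approaches does not show up in this output, because every test string ends with an EOL. The stream version appends a separator after every line, so it always produces a trailing line ending, while replaceAll keeps whatever the input had. The sketch below uses a hypothetical TrailingSeparatorDemo class (not part of the demo project); viaStreamJoin is an illustrative variant using Collectors.joining(separator), which puts the separator only between lines.

```java
import java.util.stream.Collectors;

/** Hypothetical sketch comparing how each approach treats a trailing line ending. */
final class TrailingSeparatorDemo {

    private static final String NEW_LINE_REG = "\r\n|\r|\n";

    static String viaReplaceAll(String text, String sep) {
        return text.replaceAll(NEW_LINE_REG, sep); // keeps the input's trailing EOL, or its absence
    }

    static String viaStreamAppend(String text, String sep) {
        // As in NormalizeEOL: every line, including the last one, gets a separator
        return text.lines().map(line -> line + sep).collect(Collectors.joining());
    }

    static String viaStreamJoin(String text, String sep) {
        // Separator only between lines: a trailing EOL in the input is dropped
        return text.lines().collect(Collectors.joining(sep));
    }

    public static void main(String[] args) {
        String noTrailingEol = "Line 1Mary\r\nLine 2Zheng";
        System.out.println(viaReplaceAll(noTrailingEol, "\n").endsWith("\n"));   // false
        System.out.println(viaStreamAppend(noTrailingEol, "\n").endsWith("\n")); // true
    }
}
```

If the exact shape of the input must be preserved, replaceAll is the safer choice of the two.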
5. Normalize EOL Character via Apache Library
In this step, I will normalize the EOL characters into System.lineSeparator with the Apache Commons Lang library:
- StringUtils.replaceEach: replaces all occurrences of each string in a search list with the corresponding string in a replacement list, within another string.
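For comparison, the same normalization can be written without a third-party dependency as a single pass over the characters. The sketch below is a hypothetical ManualEolNormalizer (not Apache's implementation and not part of the demo project); for inputs like the test strings here, it should produce the same result as the replaceEach call below, turning each "\r\n", "\r", or "\n" into one separator.

```java
/** Hypothetical dependency-free sketch: single-pass EOL normalization. */
final class ManualEolNormalizer {

    static String normalize(String text, String separator) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r') {
                // Treat "\r\n" as a single Windows line ending
                if (i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
                out.append(separator);
            } else if (c == '\n') {
                out.append(separator);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(normalize("Line 1Mary\r\nLine 2Zheng\rLine 3Joe\n", System.lineSeparator()));
    }
}
```

The trade-off is the usual one: the library call is shorter and well tested, while the loop avoids the extra dependency and makes the "\r\n" handling explicit.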
NormalizeEOLViaApache.java
package demoEOL;

import org.apache.commons.lang3.StringUtils;
import org.junit.jupiter.api.Test;

public class NormalizeEOLViaApache {

    // Search list: Windows "\r\n" comes first so it is matched before its "\r" prefix
    private static final String[] EOLS = { TestData.EOL_WIN, TestData.EOL_MAC, TestData.EOL_UNIX };

    // Every EOL is replaced with the platform line separator
    private static final String[] SEPARATORS = { System.lineSeparator(), System.lineSeparator(),
            System.lineSeparator() };

    @Test
    void test_eol_mac_apacheCommon() {
        String replacedString = StringUtils.replaceEach(TestData.buildAStringWithEol(TestData.EOL_MAC), EOLS,
                SEPARATORS);
        TestData.printout(replacedString);
    }

    @Test
    void test_eol_unix_apacheCommon() {
        String replacedString = StringUtils.replaceEach(TestData.buildAStringWithEol(TestData.EOL_UNIX), EOLS,
                SEPARATORS);
        TestData.printout(replacedString);
    }

    @Test
    void test_eol_windows_apacheCommon() {
        String replacedString = StringUtils.replaceEach(TestData.buildAStringWithEol(TestData.EOL_WIN), EOLS,
                SEPARATORS);
        TestData.printout(replacedString);
    }
}
Execute the JUnit tests and capture the output. Each test prints three lines as expected.
JUnit Output
*** Replaced String should have 3 lines:
Line 1Mary
Line 2Zheng
Line 3Joe
6. Conclusion
In this example, I created four Java classes to demonstrate the importance of normalizing the EOL characters and how to normalize end of line characters in Java.
- The TestData class builds the three basic text lines.
- The DemoEOLIssue class demonstrates the issue that occurs when a string uses an EOL character that the console does not treat as a line break.
- The NormalizeEOL class normalizes the EOL characters into System.lineSeparator with Java built-in methods.
- The NormalizeEOLViaApache class normalizes the EOL characters into System.lineSeparator with the Apache Commons Lang library.
7. Download
This was an example of a Java Maven project that normalizes the EOL character.
You can download the full source code of this example here: Normalize the EOL Character