Converting UTF-8 to ISO-8859-1

1. Introduction

ISO 8859 is a family of eight-bit extensions to ASCII developed by the International Organization for Standardization (ISO). Each part of ISO 8859 keeps the 128 ASCII characters and adds up to 128 more. ISO-8859-1 (Latin-1) is the first part of ISO 8859 and covers most Western European languages, including Afrikaans, Basque, Catalan, Danish, Dutch, English, Faroese, Finnish, French, Galician, German, Icelandic, Irish, Italian, Norwegian, Portuguese, Spanish, and Swedish.

UTF-8 (Unicode Transformation Format, 8-bit) is a variable-length character encoding in which each Unicode code point is encoded as 1 to 4 bytes. The first 128 Unicode code points are encoded as a single byte with the same values as in ASCII. Therefore, both ISO-8859-1 and UTF-8 are backward compatible with ASCII. For characters outside the ASCII range, ISO-8859-1 is more compact than UTF-8 because it always uses a single byte per character, but it can represent only 256 characters. If an application supports only Western European languages and doesn’t require characters or symbols outside that range, ISO-8859-1 can be a reasonable choice. In this example, I will demonstrate UTF-8 to ISO-8859-1 conversion in Java applications.
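To make the size difference concrete, here is a minimal sketch that counts the bytes each encoding needs for the same text. The class name EncodingSizeDemo and its helper methods are hypothetical names chosen for this illustration; the non-ASCII characters are written as Unicode escapes so the result does not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;

class EncodingSizeDemo {

	// Number of bytes needed to encode the text as UTF-8
	public static int utf8Length(String text) {
		return text.getBytes(StandardCharsets.UTF_8).length;
	}

	// Number of bytes needed to encode the text as ISO-8859-1
	public static int latin1Length(String text) {
		return text.getBytes(StandardCharsets.ISO_8859_1).length;
	}

	public static void main(String[] args) {
		String ascii = "Mary";
		String german = "\u00E4\u00F6\u00FC\u00DF"; // äöüß

		// ASCII text takes one byte per character in both encodings
		System.out.println("ASCII:  " + utf8Length(ascii) + " vs " + latin1Length(ascii));
		// Each of these German characters takes two bytes in UTF-8, one in ISO-8859-1
		System.out.println("German: " + utf8Length(german) + " vs " + latin1Length(german));
	}
}
```

For ASCII-only text the two counts are equal; the saving only appears for characters between U+0080 and U+00FF.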

2. Set up Java Project

In this step, I will create a simple Java project in the Eclipse IDE. To display UTF-8 characters in the console window, select “UTF-8” from the “Other:” option under the “Text file encoding” section, as shown in the screenshot below.
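The IDE setting controls how the console interprets the bytes it receives. As a complement, a program can also write its output explicitly as UTF-8 instead of relying on the default charset. The following is only a sketch under that assumption; Utf8Console and utf8Printer are hypothetical names, and the console itself still has to be configured for UTF-8 to render the characters.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8Console {

	// Wrap an output stream in a PrintStream that always encodes text as UTF-8
	public static PrintStream utf8Printer(OutputStream out) {
		try {
			return new PrintStream(out, true, "UTF-8");
		} catch (UnsupportedEncodingException e) {
			// UTF-8 is one of the charsets every Java platform must support
			throw new IllegalStateException(e);
		}
	}

	public static void main(String[] args) {
		PrintStream console = utf8Printer(System.out);
		console.println("\u00E4\u00F6\u00FC\u00DF"); // äöüß
	}
}
```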

Figure 1 Eclipse IDE Text File Encoding Setting

3. UTF-8 to ISO-8859-1 Conversion via getBytes

In this step, I will create a ConvertViaBytes class which encodes the original string into a byte array using UTF-8 and then creates a new string by decoding those bytes as ISO-8859-1.

ConvertViaBytes.java

package org.zheng.demo;

import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class ConvertViaBytes {

	private static final String ISO_8859_1 = "ISO-8859-1";
	private static final String UTF_8 = "UTF-8";

	public static void main(String[] args) {
		System.out.println("Java default Charset: " + Charset.defaultCharset());

		Charset.availableCharsets().entrySet().stream()
				.filter(c -> c.getKey().startsWith(UTF_8) || c.getKey().startsWith(ISO_8859_1))
				.forEach(c -> System.out.println("Found Charset: " + c.getKey()));

		try {
			String utf8String = "UTF-8 Text: MaryZhengäöüß测试";

			// Encode the string into a byte array using UTF-8
			byte[] utf8Bytes = utf8String.getBytes(UTF_8);

			// Decode the byte array into a string as if it were ISO-8859-1 encoded
			String iso88591String = new String(utf8Bytes, ISO_8859_1);

			System.out.println("Original UTF-8 string: " + utf8String);
			System.out.println("Converted ISO-8859-1 string: " + iso88591String);
		} catch (UnsupportedEncodingException e) {
			System.out.println("Unsupported encoding: " + e.getMessage());
		}
	}

}
  • line 12: prints the default charset. For this example, it should print “UTF-8”.
  • line 15, 16: prints the supported charsets whose names start with “UTF-8” or “ISO-8859-1”. The prefix match also picks up ISO-8859-13, ISO-8859-15, and ISO-8859-16, which are separate parts of ISO 8859 rather than versions of ISO-8859-1.
  • line 19: defines a string which includes ASCII characters, four German characters, and two Chinese characters.
  • line 22: returns the UTF-8 encoded byte array of the string.
  • line 25: creates a new string by decoding the above byte array as ISO-8859-1.
  • line 27, 28: prints the original string and the converted string.

Execute the main program and capture the output.

ConvertViaBytes output

Java default Charset: UTF-8
Found Charset: ISO-8859-1
Found Charset: ISO-8859-13
Found Charset: ISO-8859-15
Found Charset: ISO-8859-16
Found Charset: UTF-8
Original UTF-8 string: UTF-8 Text: MaryZhengäöüß测试
Converted ISO-8859-1 string: UTF-8 Text: MaryZhengäöüß测è¯

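Note: as you can see in the last line, the converted string doesn’t display the Chinese characters correctly. Decoding UTF-8 bytes as ISO-8859-1 turns every byte into a separate character, so a multi-byte UTF-8 sequence no longer reads as the original character.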
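The difference between reinterpreting bytes and transcoding them can be sketched as follows. This is an illustration rather than part of the original example: the class Utf8ToLatin1Demo and its methods misdecode and transcode are hypothetical names. The transcode method decodes the bytes as UTF-8 first and only then encodes the text as ISO-8859-1, relying on the documented behavior of String.getBytes(Charset) to substitute the charset's replacement byte (a question mark for ISO-8859-1) for unmappable characters.

```java
import java.nio.charset.StandardCharsets;

class Utf8ToLatin1Demo {

	// Reinterpretation: UTF-8 bytes are read back as if they were ISO-8859-1
	public static String misdecode(String text) {
		byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
		return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
	}

	// Transcoding: decode the bytes as UTF-8, then encode the text as ISO-8859-1.
	// Characters that ISO-8859-1 cannot represent come out as '?'.
	public static byte[] transcode(byte[] utf8Bytes) {
		String text = new String(utf8Bytes, StandardCharsets.UTF_8);
		return text.getBytes(StandardCharsets.ISO_8859_1);
	}

	public static void main(String[] args) {
		String umlaut = "\u00E4"; // ä

		// The two UTF-8 bytes of the character become two separate characters
		System.out.println("misdecoded length: " + misdecode(umlaut).length());

		// The same character fits into a single ISO-8859-1 byte
		byte[] latin1 = transcode(umlaut.getBytes(StandardCharsets.UTF_8));
		System.out.println("transcoded bytes:  " + latin1.length);
	}
}
```

With transcoding, the German characters survive the conversion because they exist in ISO-8859-1, while the Chinese characters cannot be represented and are replaced.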

4. UTF-8 to ISO-8859-1 Conversion via charArray

In this step, I will create a ConvertViaCharArray class which converts the original string to a char array, maps each char to an ISO-8859-1 byte, and then creates a string from the byte[] with ISO-8859-1 encoding.

ConvertViaCharArray.java

package org.zheng.demo;

import java.nio.charset.Charset;

public class ConvertViaCharArray {

	private static final int LAST_CHAR = 0xFF;
	private static final String ISO_8859_1 = "ISO-8859-1";

	public static void main(String[] args) {

		String utf8String = "UTF-8 Text: MaryZhengäöüß测试";

		// Get the characters (UTF-16 code units) of the string
		char[] utf8Chars = utf8String.toCharArray();

		// Encode characters to ISO-8859-1 bytes
		byte[] iso88591Bytes = new byte[utf8Chars.length];
		for (int i = 0; i < utf8Chars.length; i++) {
			char c = utf8Chars[i];
			
			if (c <= LAST_CHAR) {
				iso88591Bytes[i] = (byte) c;
			} else {
				iso88591Bytes[i] = '?'; // Replace characters not representable in ISO-8859-1
			}
		}

		// Create ISO-8859-1 string from bytes
		String iso88591String = new String(iso88591Bytes, Charset.forName(ISO_8859_1));

		System.out.println("Original UTF-8 string: " + utf8String);
		System.out.println("Converted ISO-8859-1 string: " + iso88591String);
	}

}
  • line 12: defines a string with some Chinese characters.
  • line 15: returns a char array from the above string.
  • line 18: creates a new byte array with the same length as the char array.
  • line 22, 23: casts the character to a byte if it is not greater than 0xFF, the last code point of ISO-8859-1.
  • line 25: replaces characters that are not representable in ISO-8859-1 with ?.
  • line 30: creates a new string by decoding the byte array as ISO-8859-1.
  • line 32, 33: prints the original and converted strings.

Execute the main program and capture the output:

ConvertViaCharArray output

Original UTF-8 string: UTF-8 Text: MaryZhengäöüß测试
Converted ISO-8859-1 string: UTF-8 Text: MaryZhengäöüß??

Note: as you can see from the output, the Chinese characters were replaced with the ? symbol.
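Silent replacement with ? loses information, so in some cases it may be preferable to detect unsupported characters up front. One possible approach, sketched here with the standard java.nio.charset.CharsetEncoder API, is to check the text with canEncode or to encode it with the REPORT error action so that an exception is thrown instead. The class StrictLatin1Encoder and its method names are hypothetical and not part of the original example.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictLatin1Encoder {

	// True if every character of the text is representable in ISO-8859-1
	public static boolean fitsLatin1(String text) {
		return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
	}

	// Encode to ISO-8859-1, throwing instead of substituting '?'
	public static byte[] encodeStrict(String text) throws CharacterCodingException {
		CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder()
				.onMalformedInput(CodingErrorAction.REPORT)
				.onUnmappableCharacter(CodingErrorAction.REPORT);
		ByteBuffer buffer = encoder.encode(CharBuffer.wrap(text));
		byte[] bytes = new byte[buffer.remaining()];
		buffer.get(bytes);
		return bytes;
	}

	public static void main(String[] args) {
		String german = "\u00E4\u00F6\u00FC\u00DF"; // äöüß
		String chinese = "\u6D4B\u8BD5";            // 测试

		System.out.println("German fits:  " + fitsLatin1(german));
		System.out.println("Chinese fits: " + fitsLatin1(chinese));
		try {
			encodeStrict(chinese);
		} catch (CharacterCodingException e) {
			System.out.println("Cannot encode: " + e);
		}
	}
}
```

Whether to replace, reject, or skip unsupported characters depends on the application; the manual loop above corresponds to the replace strategy.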

5. Conclusion

Different operating systems use different default character encodings. For example, Microsoft Windows uses UTF-16 internally, while Linux and macOS typically default to UTF-8. Sometimes, character encoding conversion is necessary to ensure that text data is properly interpreted and processed. In this example, I demonstrated UTF-8 to ISO-8859-1 conversion with two Java applications. The ConvertViaBytes class encodes a string as UTF-8 with the getBytes method and decodes the resulting bytes as ISO-8859-1, which does not display the Chinese characters correctly. The ConvertViaCharArray class maps each character of a string to an ISO-8859-1 byte and masks unsupported characters with a question mark (?).

6. Download

This was a Java example of converting UTF-8 to ISO-8859-1.

Download
You can download the full source code of this example here: Converting UTF-8 to ISO-8859-1

Mary Zheng

Mary graduated from the Mechanical Engineering department at ShangHai JiaoTong University. She also holds a Master's degree in Computer Science from Webster University. During her studies she has been involved with a large number of projects ranging from programming to software engineering. She worked as a lead Software Engineer, where she led and worked with others to design, implement, and monitor software solutions.