Converting UTF-8 to ISO-8859-1

Mary Zheng, May 17th, 2024 (Last Updated: May 17th, 2024)


1. Introduction

ISO 8859 is a family of eight-bit extensions to ASCII developed by the International Organization for Standardization (ISO). Each part includes the 128 ASCII characters and up to 128 additional characters. ISO-8859-1 (Latin-1) is the first part of ISO 8859 and supports most Western European languages, including Afrikaans, Basque, Catalan, Danish, Dutch, English, Faeroese, Finnish, French, Galician, German, Icelandic, Irish, Italian, Norwegian, Portuguese, Spanish, and Swedish.

Unicode Transformation Format 8-bit (UTF-8) is a variable-length character encoding in which each Unicode code point is encoded as 1 to 4 bytes. The first 128 Unicode code points are encoded as a single byte, identical to ASCII. Therefore, both ISO-8859-1 and UTF-8 are backwards compatible with ASCII. For text that contains non-ASCII Latin-1 characters, ISO-8859-1 is more compact than UTF-8 because it uses a single byte for every character, whereas UTF-8 needs two bytes for a character such as ä. If an application supports only Western European languages and doesn’t require characters from other languages or special symbols, ISO-8859-1 can be a reasonable choice. In this example, I will demonstrate UTF-8 to ISO-8859-1 conversion with Java applications.
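
To make the size difference concrete, here is a minimal sketch that counts the bytes each encoding needs for the same text. The class name EncodedSizeDemo and its two helper methods are hypothetical names chosen for illustration; they are not part of the downloadable example. The non-ASCII characters are written as Unicode escapes so the result does not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class EncodedSizeDemo {

    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the text occupies when encoded as ISO-8859-1.
    static int latin1Length(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        String ascii = "Mary";
        String german = "\u00e4\u00f6\u00fc\u00df"; // the four characters äöüß

        // ASCII text has the same size in both encodings: 4 and 4.
        System.out.println(utf8Length(ascii) + " / " + latin1Length(ascii));
        // Each of these German characters takes 2 bytes in UTF-8 and 1 in ISO-8859-1: 8 and 4.
        System.out.println(utf8Length(german) + " / " + latin1Length(german));
    }
}
```

The saving only applies to characters in the range 0x80 to 0xFF. Pure ASCII text is the same size in both encodings.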

2. Set up Java Project

In this step, I will create a simple Java project in the Eclipse IDE. To display UTF-8 characters in the console window, select “UTF-8” from the “Other:” options under the “Text file encoding” section, as shown in the screenshot below.

Figure 1 Eclipse IDE Text File Encoding Setting
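
If changing the IDE setting is not an option, one possible alternative is to wrap the output stream in a PrintStream with an explicit charset. This is only a sketch: the class Utf8Console and the method utf8Stream are hypothetical names, and the console or terminal must still be able to render UTF-8 for the characters to appear correctly.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper class, for illustration only.
final class Utf8Console {

    // Wraps a stream in an auto-flushing PrintStream that always encodes text as UTF-8,
    // regardless of the JVM default charset.
    static PrintStream utf8Stream(OutputStream target) throws UnsupportedEncodingException {
        return new PrintStream(target, true, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        PrintStream out = utf8Stream(System.out);
        // German and Chinese characters written as escapes to avoid source-encoding issues.
        out.println("\u00e4\u00f6\u00fc\u00df\u6d4b\u8bd5");
    }
}
```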

3. UTF-8 to ISO-8859-1 Conversion via getBytes

In this step, I will create a ConvertViaBytes class which encodes the original string into bytes using UTF-8 and then decodes those bytes into a new string using ISO-8859-1. Because the bytes are UTF-8 but are read as ISO-8859-1, every byte becomes one character, so multi-byte (non-ASCII) characters come out garbled.

ConvertViaBytes.java

package org.zheng.demo;
 
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
 
public class ConvertViaBytes {
 
    private static final String ISO_8859_1 = "ISO-8859-1";
    private static final String UTF_8 = "UTF-8";
 
    public static void main(String[] args) {
        System.out.println("Java default Charset: " + Charset.defaultCharset());
 
        Charset.availableCharsets().entrySet().stream()
                .filter(c -> c.getKey().startsWith(UTF_8) || c.getKey().startsWith(ISO_8859_1))
                .forEach(c -> System.out.println("Found Charset: " + c.getKey()));
 
        try {
            String utf8String = "UTF-8 Text: MaryZhengäöüß测试";
 
            // Convert UTF-8 string to byte array using UTF-8 encoding
            byte[] utf8Bytes = utf8String.getBytes(UTF_8);
 
            // Decode the UTF-8 bytes as ISO-8859-1 (each byte becomes one character)
            String iso88591String = new String(utf8Bytes, ISO_8859_1);
 
            System.out.println("Original UTF-8 string: " + utf8String);
            System.out.println("Converted ISO-8859-1 string: " + iso88591String);
        } catch (UnsupportedEncodingException e) {
            System.out.println("Unsupported encoding: " + e.getMessage());
        }
    }
 
}

line 12: prints the default charset. For this example, it should print “UTF-8”.
line 15, 16: prints the supported charsets whose names start with “UTF-8” or “ISO-8859-1”. Because the filter uses startsWith, it also matches other parts of ISO 8859 such as ISO-8859-13, ISO-8859-15, and ISO-8859-16.
line 19: defines a string which includes ASCII characters, four German characters (äöüß), and two Chinese characters.
line 22: encodes the string into a UTF-8 byte array.
line 25: creates a new string by decoding the above byte array as ISO-8859-1.
line 27, 28: prints the original string and the converted string.

Execute the main program and capture the output.

ConvertViaBytes output

Java default Charset: UTF-8
Found Charset: ISO-8859-1
Found Charset: ISO-8859-13
Found Charset: ISO-8859-15
Found Charset: ISO-8859-16
Found Charset: UTF-8
Original UTF-8 string: UTF-8 Text: MaryZhengäöüß测试
Converted ISO-8859-1 string: UTF-8 Text: MaryZhengÃ¤Ã¶Ã¼Ãæµè¯

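Note: as you can see in the last line, neither the German nor the Chinese characters are displayed correctly. Each byte of the UTF-8 encoding was decoded as a separate ISO-8859-1 character, so ä (two bytes in UTF-8) became Ã¤.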
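
The garbled output comes from decoding UTF-8 bytes with the wrong charset. A minimal sketch of the alternative, assuming the goal is to obtain ISO-8859-1 bytes, is to decode with the source charset first and then encode with the target charset. The class TranscodeDemo and its methods are hypothetical names used only for this illustration. Note that String.getBytes(Charset) substitutes ? for characters the target charset cannot represent.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class TranscodeDemo {

    // Reproduces the effect seen above: UTF-8 bytes read back as ISO-8859-1.
    static String misdecode(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Re-encodes UTF-8 bytes as ISO-8859-1 bytes: decode with the source
    // charset first, then encode with the target charset.
    static byte[] utf8ToLatin1(byte[] utf8Bytes) {
        String text = new String(utf8Bytes, StandardCharsets.UTF_8);
        // getBytes(Charset) replaces unmappable characters with '?'.
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The characters äöüß followed by the two Chinese characters, as escapes.
        String original = "\u00e4\u00f6\u00fc\u00df\u6d4b\u8bd5";
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        byte[] latin1Bytes = utf8ToLatin1(utf8Bytes);

        System.out.println("UTF-8 bytes: " + utf8Bytes.length);       // 14
        System.out.println("ISO-8859-1 bytes: " + latin1Bytes.length); // 6
        System.out.println("Mis-decoded chars: " + misdecode(original).length()); // 14
    }
}
```

With this approach the German characters survive as single bytes (0xE4, 0xF6, 0xFC, 0xDF) and only the Chinese characters are replaced.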

4. UTF-8 to ISO-8859-1 Conversion via charArray

In this step, I will create a ConvertViaCharArray class which converts the original string to a char array, maps each char to an ISO-8859-1 byte (or to ? if the character is not representable), and then creates a string from the byte array with ISO-8859-1 encoding.

ConvertViaCharArray.java

package org.zheng.demo;
 
import java.nio.charset.Charset;
 
public class ConvertViaCharArray {
 
    private static final int LAST_CHAR = 0xFF;
    private static final String ISO_8859_1 = "ISO-8859-1";
 
    public static void main(String[] args) {
 
        String utf8String = "UTF-8 Text: MaryZhengäöüß测试";
 
        // Decode UTF-8 string to characters
        char[] utf8Chars = utf8String.toCharArray();
 
        // Encode characters to ISO-8859-1 bytes
        byte[] iso88591Bytes = new byte[utf8Chars.length];
        for (int i = 0; i < utf8Chars.length; i++) {
            char c = utf8Chars[i];
             
            if (c <= LAST_CHAR) {
                iso88591Bytes[i] = (byte) c;
            } else {
                iso88591Bytes[i] = '?'; // Replace characters not representable in ISO-8859-1
            }
        }
 
        // Create ISO-8859-1 string from bytes
        String iso88591String = new String(iso88591Bytes, Charset.forName(ISO_8859_1));
 
        System.out.println("Original UTF-8 string: " + utf8String);
        System.out.println("Converted ISO-8859-1 string: " + iso88591String);
    }
 
}

line 12: defines a string with some German and Chinese characters.
line 15: returns a char array from the above string.
line 18: creates a new byte array with the same length as the char array.
line 22, 23: casts the character to a single byte if it is not greater than 0xFF, the last ISO-8859-1 code point. This works because the first 256 Unicode code points match ISO-8859-1.
line 25: replaces the character with ? when it is not representable in ISO-8859-1.
line 30: creates a new string by decoding the byte array as ISO-8859-1.
line 32, 33: prints the original string and the converted string.

Execute the main program and capture the output:

ConvertViaCharArray output

Original UTF-8 string: UTF-8 Text: MaryZhengäöüß测试
Converted ISO-8859-1 string: UTF-8 Text: MaryZhengäöüß??

Note: as you can see from the output, the Chinese characters changed to the ? symbol, while the German characters were preserved.
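
Silently replacing characters with ? loses data, which may not always be acceptable. As a hedged alternative to the manual loop, the sketch below uses the JDK's CharsetEncoder to check whether a string fits into ISO-8859-1 and to fail loudly when it does not. The class StrictLatin1Encoder and its method names are hypothetical and chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class StrictLatin1Encoder {

    // True if every character of the text has an ISO-8859-1 representation.
    static boolean isLatin1(String text) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    // Encodes to ISO-8859-1 and throws instead of substituting '?'.
    static byte[] encodeStrict(String text) throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buffer = encoder.encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        String german = "\u00e4\u00f6\u00fc\u00df"; // äöüß
        String chinese = "\u6d4b\u8bd5";            // the two Chinese characters

        System.out.println(isLatin1(german));  // true
        System.out.println(isLatin1(chinese)); // false
        try {
            encodeStrict(chinese);
        } catch (CharacterCodingException e) {
            System.out.println("Not representable in ISO-8859-1: " + e);
        }
    }
}
```

Whether to replace, skip, or reject unmappable characters depends on the application, so this is one option rather than a recommendation.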

5. Conclusion

Different operating systems choose different default character encodings. For example, Microsoft Windows uses UTF-16 internally and has traditionally used a legacy code page such as windows-1252 as the default for applications, while Linux and macOS typically use UTF-8. Sometimes, character encoding conversion is necessary to ensure that text data is properly interpreted and processed. In this example, I demonstrated UTF-8 to ISO-8859-1 conversion with two Java applications. The ConvertViaCharArray class converts a string to ISO-8859-1 and masks unsupported characters with a question mark (?). The ConvertViaBytes class encodes a string into UTF-8 bytes with the getBytes method and decodes them as ISO-8859-1, which garbles every non-ASCII character.

6. Download

This was a Java example of converting UTF-8 to ISO-8859-1.

Download
You can download the full source code of this example here: Converting UTF-8 to ISO-8859-1
