
Read a .gz File via GZIPInputStream

1. Introduction

GZIP, short for GNU Zip, is a compression format and tool widely used to shrink files for storage and for transfer over the internet. Java's standard library includes the java.util.zip.GZIPInputStream class, which decompresses data in the GZIP file format. In this example, I'll demonstrate how to use GZIPInputStream to read a .gz file line by line with the following steps.

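  • Obtain an InputStream for the compressed .gz file. The examples below load it from the classpath with getResourceAsStream; a FileInputStream works the same way for a file on disk.
  • Construct a GZIPInputStream from that InputStream.
  • Construct a BufferedReader (via an InputStreamReader) from the GZIPInputStream.
  • Use the BufferedReader's readLine or lines method to process the file line by line. Both read the decompressed data incrementally; lines additionally exposes it as a Stream for further processing.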
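The steps above can be sketched for a .gz file on disk as follows. This is a minimal sketch, assuming the file contains UTF-8 text; it opens the file with FileInputStream instead of the classpath lookup used in the examples below. The class name ReadGzipFromDisk and the helper writeSampleGz (which creates a small temporary .gz file so the sketch runs on its own) are hypothetical.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class ReadGzipFromDisk {

    // Reads a .gz file from disk line by line, one resource per step.
    static List<String> readLines(Path gzFile) throws IOException {
        List<String> lines = new ArrayList<>();
        try (InputStream fileIn = new FileInputStream(gzFile.toFile());       // step 1: raw compressed bytes
             GZIPInputStream gzipIn = new GZIPInputStream(fileIn);              // step 2: decompress
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(gzipIn, StandardCharsets.UTF_8))) {  // step 3: decode (assumes UTF-8)
            String line;
            while ((line = reader.readLine()) != null) {                        // step 4: one line at a time
                lines.add(line);
            }
        }
        return lines;
    }

    // Hypothetical helper: writes a small gzip-compressed CSV so the sketch is self-contained.
    static Path writeSampleGz() throws IOException {
        Path gzFile = Files.createTempFile("sample", ".csv.gz");
        try (Writer writer = new OutputStreamWriter(
                new GZIPOutputStream(new FileOutputStream(gzFile.toFile())), StandardCharsets.UTF_8)) {
            writer.write("id,name\n1,alpha\n2,beta\n");
        }
        return gzFile;
    }

    public static void main(String[] args) throws IOException {
        Path gzFile = writeSampleGz();
        readLines(gzFile).forEach(System.out::println);
        Files.delete(gzFile);
    }
}
```

Running main prints the three sample lines id,name, 1,alpha, and 2,beta.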

2. Read .gz file via BufferedReader.readLine

In this step, I will use a try-with-resources statement to create three AutoCloseable resources (an InputStream, a GZIPInputStream, and a BufferedReader) and then loop through the BufferedReader's lines to print the decompressed contents of the sample.csv.gz file.

ReadGZipViaBufferReader.java

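package readgzipfile;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Objects;
import java.util.zip.GZIPInputStream;

public class ReadGZipViaBufferReader {
	public static void main(String[] args) {
		String fileName = "sample.csv.gz";

		// getResourceAsStream returns null if the file is not on the classpath, so fail with a clear message
		try (InputStream resourceStream = Objects.requireNonNull(
				ReadGZipViaBufferReader.class.getClassLoader().getResourceAsStream(fileName),
				"Resource not found: " + fileName);
				// decompresses the bytes read from resourceStream
				GZIPInputStream gzipInputStream = new GZIPInputStream(resourceStream);
				// decodes the decompressed bytes as UTF-8 text and buffers them for line reads
				BufferedReader bufferedReader = new BufferedReader(
						new InputStreamReader(gzipInputStream, StandardCharsets.UTF_8))) {

			String line;
			// readLine returns null at the end of the decompressed data
			while ((line = bufferedReader.readLine()) != null) {
				System.out.println(line);
			}
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
}
  • The try-with-resources statement declares three resources. The first, resourceStream, opens sample.csv.gz from the classpath; Objects.requireNonNull turns a missing file into a NullPointerException that names the resource.
  • The second resource, gzipInputStream, decompresses the bytes read from resourceStream.
  • The third resource, bufferedReader, wraps the GZIPInputStream in an InputStreamReader (assuming the CSV is UTF-8 encoded; adjust the charset otherwise) and buffers it so the text can be read one line at a time.
  • The while loop calls readLine until it returns null and prints each line.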

Note: the try-with-resources statement closes the resources automatically, in the reverse order of their declaration.
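To see the closing order in action, the sketch below uses hypothetical Tracked resources that only record their names when closed; they stand in for the three streams above and do no I/O. The names CloseOrderDemo and Tracked are illustrative, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;

class CloseOrderDemo {

    // Hypothetical stand-in for a stream: appends its name to a shared log when closed.
    static final class Tracked implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Tracked(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add(name);
        }
    }

    // Declares the resources in the same order as the example and returns the order they were closed in.
    static List<String> closeOrder() {
        List<String> log = new ArrayList<>();
        try (Tracked input = new Tracked("InputStream", log);
             Tracked gzip = new Tracked("GZIPInputStream", log);
             Tracked reader = new Tracked("BufferedReader", log)) {
            // Body intentionally empty: only the closing order matters here.
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(closeOrder()); // prints [BufferedReader, GZIPInputStream, InputStream]
    }
}
```

With the real streams, closing the BufferedReader already closes the GZIPInputStream and the underlying InputStream it wraps. The later close calls are harmless, because the Closeable contract specifies that closing an already-closed stream has no effect.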

3. Read .gz file via BufferedReader.lines

In this step, I will use a try-with-resources statement to create the same three resources (an InputStream, a GZIPInputStream, and a BufferedReader) and then use Stream.forEach to print the decompressed contents of the sample.csv.gz file.

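This is the same as step 2, except that the lines are processed with BufferedReader.lines and Stream.forEach. BufferedReader.lines returns a lazily populated Stream, so, like the readLine loop, it reads the decompressed data incrementally instead of loading the whole file into memory. Neither approach is inherently more memory-efficient for large .gz files; the advantage of the Stream version is that filtering, mapping, and other Stream operations can be chained onto it.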

ReadGZipViaStream.java

package readgzipfile;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Objects;
import java.util.zip.GZIPInputStream;

public class ReadGZipViaStream {
	public static void main(String[] args) {
		String fileName = "sample.csv.gz";

		// same three resources as in step 2
		try (InputStream resourceStream = Objects.requireNonNull(
				ReadGZipViaStream.class.getClassLoader().getResourceAsStream(fileName),
				"Resource not found: " + fileName);
				GZIPInputStream gzipInputStream = new GZIPInputStream(resourceStream);
				BufferedReader bufferedReader = new BufferedReader(
						new InputStreamReader(gzipInputStream, StandardCharsets.UTF_8))) {

			// lines() returns a lazily populated Stream; forEach prints each line as it is read
			bufferedReader.lines().forEach(System.out::println);

		} catch (IOException | UncheckedIOException e) {
			// lines() wraps read errors in UncheckedIOException
			e.printStackTrace();
		}
	}
}
  • The try-with-resources statement declares the same three resources as in step 2: resourceStream, gzipInputStream, and bufferedReader.
  • bufferedReader.lines() returns a Stream<String> that reads lines lazily from the reader, and forEach prints each one.
  • Because Stream methods cannot throw checked exceptions, an I/O error while reading through lines() is wrapped in an UncheckedIOException, so the catch clause handles it alongside IOException.
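As a sketch of what the Stream version makes convenient, the hypothetical class below compresses a few CSV lines in memory with GZIPOutputStream, then reads them back through BufferedReader.lines, skipping the header and blank lines and keeping only the first column. The sample data and the names GzipCsvStreamDemo, gzip, and firstColumn are illustrative; the sketch assumes UTF-8 text and a simple comma-separated format without quoted fields.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipCsvStreamDemo {

    // Compresses text in memory; stands in for the sample.csv.gz resource (hypothetical data).
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzipOut = new GZIPOutputStream(bytes)) {
            gzipOut.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    // Skips the CSV header and blank lines, then collects the first column of each row.
    static List<String> firstColumn(byte[] gzData) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(gzData)), StandardCharsets.UTF_8))) {
            return reader.lines()
                    .skip(1)                                // drop the header row
                    .filter(line -> !line.isEmpty())        // ignore empty lines
                    .map(line -> line.split(",", -1)[0])    // keep the first field
                    .collect(Collectors.toList());          // consume the stream before the reader closes
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] gzData = gzip("id,name\n1,alpha\n\n2,beta\n");
        System.out.println(firstColumn(gzData)); // prints [1, 2]
    }
}
```

The terminal collect operation runs inside the try block on purpose: returning the lazy Stream itself would fail with an UncheckedIOException on first use, because the reader is already closed once the block exits.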

4. Read .gz file via BufferedReader.lines with Nested Resources

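In this step, I will use a try-with-resources statement to create a single BufferedReader, which is built from a GZIPInputStream, which in turn is built from the InputStream of the compressed sample.csv.gz file. This step is the same as step 3, except that it uses nested constructors to create the BufferedReader. Closing the BufferedReader closes the whole chain, so nothing leaks on the normal path. The trade-off is that if the GZIPInputStream constructor throws (for example, a ZipException because the file is not in GZIP format), the underlying InputStream was never assigned to a resource variable and is not closed; declaring each stream separately, as in steps 2 and 3, avoids this.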

ReadGZipViaStream2.java

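package readgzipfile;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Objects;
import java.util.zip.GZIPInputStream;

public class ReadGZipViaStream2 {
	public static void main(String[] args) {
		String fileName = "sample.csv.gz";

		// only the outermost BufferedReader is a declared resource; closing it closes the streams it wraps
		try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(
				new GZIPInputStream(Objects.requireNonNull(
						ReadGZipViaStream2.class.getClassLoader().getResourceAsStream(fileName),
						"Resource not found: " + fileName)),
				StandardCharsets.UTF_8))) {

			bufferedReader.lines().forEach(System.out::println);

		} catch (IOException | UncheckedIOException e) {
			e.printStackTrace();
		}
	}
}
  • The try-with-resources statement declares a single resource, bufferedReader, built with nested constructors: the classpath stream is wrapped in a GZIPInputStream, which is wrapped in a UTF-8 InputStreamReader and then a BufferedReader.
  • bufferedReader.lines().forEach prints each line, as in step 3.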
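The sketch below illustrates the resource-handling trade-off of nested constructors. When the GZIPInputStream constructor rejects its input (here with a ZipException, because the bytes are plain uncompressed text), the nested form never assigned the source stream to a resource variable, so it stays open; with separate declarations, try-with-resources closes the source stream that was already opened. TrackingStream and the other names are hypothetical, and an in-memory stream stands in for the classpath resource.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

class NestedResourceCaveat {

    // Hypothetical in-memory stream that remembers whether close() was called.
    static final class TrackingStream extends ByteArrayInputStream {
        boolean closed;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Nested constructors: only the BufferedReader is a declared resource.
    static boolean nestedClosesSource(TrackingStream source) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(source), StandardCharsets.UTF_8))) {
            reader.lines().forEach(System.out::println);
        } catch (IOException e) {
            // ZipException (a subclass of IOException) is thrown for non-GZIP input.
        }
        return source.closed;
    }

    // Separate declarations: "in" is already a resource when GZIPInputStream fails, so it gets closed.
    static boolean separateClosesSource(TrackingStream source) {
        try (InputStream in = source;
             GZIPInputStream gzipIn = new GZIPInputStream(in);
             BufferedReader reader = new BufferedReader(new InputStreamReader(gzipIn, StandardCharsets.UTF_8))) {
            reader.lines().forEach(System.out::println);
        } catch (IOException e) {
            // Same ZipException, but try-with-resources closes "in" before reaching this catch.
        }
        return source.closed;
    }

    public static void main(String[] args) {
        byte[] notGzip = "plain text, not compressed".getBytes(StandardCharsets.UTF_8);
        System.out.println("nested closes source:   " + nestedClosesSource(new TrackingStream(notGzip)));
        System.out.println("separate closes source: " + separateClosesSource(new TrackingStream(notGzip)));
    }
}
```

The nested version reports false and the separate version reports true. For a classpath resource or a file, the unclosed stream would hold a file handle until it is garbage collected.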

I ran the program and captured the output in the following screenshot. Note that the sample.csv.gz file is under the src/main/resources folder, so the build places it on the classpath where getResourceAsStream can find it.

Figure 1. Read .gz File

5. Conclusion

In this example, I created three classes that use Java's GZIPInputStream to read a .gz file line by line with the following steps:

  • Obtain an InputStream for the compressed .gz file (from the classpath here; a FileInputStream works for a file on disk).
  • Construct a GZIPInputStream from that InputStream.
  • Construct a BufferedReader from the GZIPInputStream via an InputStreamReader.
  • Invoke the BufferedReader.readLine method to process the file line by line.
  • Alternatively, invoke BufferedReader.lines and the Stream.forEach method to process the file line by line.
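The steps above can also be packaged into a reusable method. The sketch below follows the same idea as java.nio.file.Files.lines, which hands its caller a Stream that should be closed with try-with-resources: the hypothetical gzipLines method returns a lazy Stream<String> and registers an onClose handler, so closing the Stream closes the whole reader chain. It assumes UTF-8 text, and main writes a temporary .gz file only so the sketch can run on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipLines {

    // Hypothetical helper: returns the decompressed lines of a .gz file as a lazy Stream.
    // Closing the returned Stream closes the BufferedReader and every stream it wraps.
    static Stream<String> gzipLines(Path gzFile) throws IOException {
        InputStream fileIn = Files.newInputStream(gzFile);
        try {
            BufferedReader reader = new BufferedReader(new InputStreamReader(
                    new GZIPInputStream(fileIn), StandardCharsets.UTF_8));
            return reader.lines().onClose(() -> {
                try {
                    reader.close();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException | RuntimeException e) {
            // e.g. ZipException for a file that is not GZIP: don't leak the file handle.
            try {
                fileIn.close();
            } catch (IOException closeFailure) {
                e.addSuppressed(closeFailure);
            }
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        Path gzFile = Files.createTempFile("sample", ".csv.gz");
        try (OutputStream fileOut = Files.newOutputStream(gzFile);
             OutputStream gzipOut = new GZIPOutputStream(fileOut)) {
            gzipOut.write("id,name\n1,alpha\n2,beta\n".getBytes(StandardCharsets.UTF_8));
        }
        // The caller owns the returned Stream, so it goes in try-with-resources.
        try (Stream<String> lines = gzipLines(gzFile)) {
            lines.forEach(System.out::println);
        }
        Files.delete(gzFile);
    }
}
```

Returning a Stream keeps the method lazy like BufferedReader.lines, at the cost of making the caller responsible for closing it, which is why main wraps it in try-with-resources.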

6. Download

This was an example of reading a .gz file via GZIPInputStream.

You can download the full source code of this example here: Read a .gz File via GZIPInputStream

Mary Zheng

Mary graduated from the Mechanical Engineering department at ShangHai JiaoTong University. She also holds a Master's degree in Computer Science from Webster University. During her studies she was involved in a large number of projects ranging from programming to software engineering. She worked as a lead Software Engineer, where she led and worked with others to design, implement, and monitor software solutions.