Read a .gz File via GZIPInputStream
1. Introduction
GZIP, short for GNU Zip, is a compression format commonly used to shrink files and data transferred over the internet. Java's built-in library includes the GZIPInputStream class, which reads compressed data in the GZIP file format. In this example, I'll demonstrate how to use GZIPInputStream to read a .gz file line by line, performing the following steps.
- Construct a FileInputStream (or any other InputStream) from the compressed .gz file.
- Construct a GZIPInputStream from the FileInputStream.
- Construct a BufferedReader from the GZIPInputStream, wrapped in an InputStreamReader.
- Use the BufferedReader's readLine or lines method to process the file line by line. Both read lazily, so neither loads the whole file into memory; the Stream returned by lines offers a more declarative style.
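The listings in this article load the file from the classpath, so the FileInputStream variant of these steps never appears in code. Here is a minimal sketch of it, assuming the data is UTF-8 text. The class name ReadGZipFromPathSketch, the helper names writeGzip and readLines, and the temporary file are my own illustrations, not part of the downloadable source.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ReadGZipFromPathSketch {

    // Hypothetical helper: writes the given text to a gzip-compressed file.
    static void writeGzip(Path target, String text) throws IOException {
        try (OutputStream fileOut = Files.newOutputStream(target);
                GZIPOutputStream gzipOut = new GZIPOutputStream(fileOut)) {
            gzipOut.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }

    // FileInputStream -> GZIPInputStream -> BufferedReader, then read line by line.
    // UTF-8 is an assumption about the file's encoding.
    static List<String> readLines(Path source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (FileInputStream fileInputStream = new FileInputStream(source.toFile());
                GZIPInputStream gzipInputStream = new GZIPInputStream(fileInputStream);
                BufferedReader bufferedReader = new BufferedReader(
                        new InputStreamReader(gzipInputStream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for a real .gz file on disk.
        Path tempFile = Files.createTempFile("sample", ".csv.gz");
        try {
            writeGzip(tempFile, "id,name\n1,Alice\n2,Bob\n");
            readLines(tempFile).forEach(System.out::println);
        } finally {
            Files.deleteIfExists(tempFile);
        }
    }
}
```

Collecting the lines into a list is only for demonstration; for a large file you would process each line inside the loop instead.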
2. Read .gz file via BufferedReader.readLine
In this step, I will use a try-with-resources statement to create three AutoCloseable resources (an InputStream, a GZIPInputStream, and a BufferedReader) and then loop through the BufferedReader's lines to print the compressed sample.csv.gz file.
ReadGZipViaBufferReader.java
package readgzipfile;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.zip.GZIPInputStream;

public class ReadGZipViaBufferReader {

    public static void main(String[] args) {
        String fileName = "sample.csv.gz";
        try (InputStream fileInputStream = ReadGZipViaBufferReader.class.getClassLoader().getResourceAsStream(fileName);
                GZIPInputStream gzipInputStream = new GZIPInputStream(fileInputStream);
                BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(gzipInputStream))) {

            String line;
            while ((line = bufferedReader.readLine()) != null) {
                System.out.println(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
- line 13: open a try-with-resources statement. The first resource is fileInputStream, opened from the sample.csv.gz file.
- line 14: create a GZIPInputStream from the fileInputStream created at line 13.
- line 15: create a BufferedReader from the GZIPInputStream created at line 14.
- line 18: loop through the bufferedReader created at line 15 and print out each line.
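One caveat with this pattern: getResourceAsStream returns null when the resource is not on the classpath, and passing null to the GZIPInputStream constructor raises a NullPointerException that the catch (IOException e) block does not handle. The following is a minimal sketch of a guard, not the article's original code. The class name ReadGZipResourceSafelySketch and the helper openResource are hypothetical, and UTF-8 is an assumption about the file's encoding.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

public class ReadGZipResourceSafelySketch {

    // Hypothetical helper: opens a classpath resource, or fails with a checked
    // exception instead of returning null.
    static InputStream openResource(String fileName) throws FileNotFoundException {
        InputStream in = ReadGZipResourceSafelySketch.class.getClassLoader().getResourceAsStream(fileName);
        if (in == null) {
            throw new FileNotFoundException("Resource not found on classpath: " + fileName);
        }
        return in;
    }

    public static void main(String[] args) {
        String fileName = "sample.csv.gz";
        try (InputStream resourceStream = openResource(fileName);
                GZIPInputStream gzipInputStream = new GZIPInputStream(resourceStream);
                // Explicit charset (assumed UTF-8) instead of the platform default.
                BufferedReader bufferedReader = new BufferedReader(
                        new InputStreamReader(gzipInputStream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                System.out.println(line);
            }
        } catch (IOException e) {
            // A missing resource now lands here as a FileNotFoundException.
            System.err.println("Could not read " + fileName + ": " + e.getMessage());
        }
    }
}
```

Because FileNotFoundException is an IOException, the existing catch block covers the missing-file case without further changes.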
Note: the try-with-resources statement automatically closes the resources in the reverse order of their declaration.
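To see the closing order for yourself, here is a small sketch that needs no file at all. NamedResource is a hypothetical stand-in for the three streams; it only records its name when close is called.

```java
import java.util.ArrayList;
import java.util.List;

public class CloseOrderSketch {

    // Hypothetical resource that appends its name to a shared log when closed.
    static final class NamedResource implements AutoCloseable {
        private final String name;
        private final List<String> closeLog;

        NamedResource(String name, List<String> closeLog) {
            this.name = name;
            this.closeLog = closeLog;
        }

        @Override
        public void close() {
            closeLog.add(name);
        }
    }

    // Declares three resources in the same order as the listing and returns
    // the names in the order in which they were closed.
    static List<String> closeOrder() {
        List<String> closeLog = new ArrayList<>();
        try (NamedResource first = new NamedResource("fileInputStream", closeLog);
                NamedResource second = new NamedResource("gzipInputStream", closeLog);
                NamedResource third = new NamedResource("bufferedReader", closeLog)) {
            // Nothing to do here; we only care about what happens on exit.
        }
        return closeLog;
    }

    public static void main(String[] args) {
        // Prints [bufferedReader, gzipInputStream, fileInputStream]
        System.out.println(closeOrder());
    }
}
```

The outermost wrapper is closed first, which matters for buffered or compressing streams that may need to flush into the stream beneath them.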
3. Read .gz file via BufferedReader.lines
In this step, I will use a try-with-resources statement to create three resources (an InputStream, a GZIPInputStream, and a BufferedReader) and then use Stream.forEach to print the compressed sample.csv.gz file.
Please note that this is the same as step 2 except that it uses the Stream.forEach method to process the data. Like the readLine loop, BufferedReader.lines reads lazily, so neither approach loads the entire contents into memory. The Stream API's advantage is a more concise, composable style, which is convenient when filtering or transforming the lines of large compressed .gz files.
ReadGZipViaStream.java
package readgzipfile;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.zip.GZIPInputStream;

public class ReadGZipViaStream {

    public static void main(String[] args) {
        String fileName = "sample.csv.gz";
        try (InputStream fileInputStream = ReadGZipViaStream.class.getClassLoader().getResourceAsStream(fileName);
                GZIPInputStream gzipInputStream = new GZIPInputStream(fileInputStream);
                BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(gzipInputStream))) {

            bufferedReader.lines().forEach(System.out::println);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
- line 13: open a try-with-resources statement. The first resource is fileInputStream, opened from the sample.csv.gz file.
- line 14: create a GZIPInputStream from the fileInputStream created at line 13.
- line 15: create a BufferedReader from the GZIPInputStream created at line 14.
- line 17: iterate over the lines of the bufferedReader via the Stream.forEach method, which pulls them lazily from the reader.
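Where the Stream API tends to pay off is in chaining operations on the lines. The sketch below skips a CSV header, drops blank rows, and stops after a fixed number of rows. It compresses a small string in memory so that it runs without a file. The class GZipStreamPipelineSketch, the helpers gzip and firstDataRows, and the sample rows are all hypothetical, and UTF-8 is an assumed encoding.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GZipStreamPipelineSketch {

    // Hypothetical helper: gzip-compresses text in memory so the sketch needs no file.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzipOut = new GZIPOutputStream(buffer)) {
            gzipOut.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    // Skips the header line, keeps non-blank rows, and stops after maxRows rows.
    // limit() short-circuits, so the rest of the input is not decompressed.
    static List<String> firstDataRows(InputStream compressed, int maxRows) throws IOException {
        try (GZIPInputStream gzipInputStream = new GZIPInputStream(compressed);
                BufferedReader bufferedReader = new BufferedReader(
                        new InputStreamReader(gzipInputStream, StandardCharsets.UTF_8))) {
            return bufferedReader.lines()
                    .skip(1)
                    .filter(line -> !line.trim().isEmpty())
                    .limit(maxRows)
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("id,name\n1,Alice\n\n2,Bob\n3,Carol\n");
        List<String> rows = firstDataRows(new ByteArrayInputStream(compressed), 2);
        rows.forEach(System.out::println);
    }
}
```

Note that the stream must be consumed inside the try block, because the reader is closed as soon as the block exits.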
4. Read .gz file via BufferedReader.lines with Nested Resources
In this step, I will use a try-with-resources statement to create a BufferedReader which is built from a GZIPInputStream, and the GZIPInputStream is built from the InputStream of the compressed sample.csv.gz file. Note: this step reads the file exactly as step 3 does, but uses nested constructors to create the BufferedReader object, so only the BufferedReader is declared as a resource.
ReadGZipViaStream2.java
package readgzipfile;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.zip.GZIPInputStream;

public class ReadGZipViaStream2 {

    public static void main(String[] args) {
        String fileName = "sample.csv.gz";
        try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(ReadGZipViaStream2.class.getClassLoader().getResourceAsStream(fileName))))) {

            bufferedReader.lines().forEach(System.out::println);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
- lines 12 and 13: create a BufferedReader resource in a try-with-resources statement using nested constructors.
- line 15: iterate over the lines of the bufferedReader via the Stream.forEach method, which pulls them lazily from the reader.
I ran the program and it printed the contents of the sample.csv.gz file. Please note that the sample.csv.gz file is under the src/main/resources folder.
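One design consideration with the nested constructors: only the outermost BufferedReader is a declared resource. If the GZIPInputStream constructor throws (for example, because the input is not in GZIP format), the BufferedReader is never created, and as far as I know nothing closes the underlying stream. The sketch below illustrates this under that assumption. TrackingInputStream and both method names are hypothetical and exist only to observe whether close was called.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.zip.GZIPInputStream;

public class NestedResourceSketch {

    // Hypothetical stream that remembers whether close() was called.
    static final class TrackingInputStream extends ByteArrayInputStream {
        boolean closed;

        TrackingInputStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Nested style (step 4): only the BufferedReader is a declared resource.
    static boolean closedAfterNestedFailure(byte[] notGzip) {
        TrackingInputStream source = new TrackingInputStream(notGzip);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(source)))) {
            reader.lines().forEach(System.out::println);
        } catch (IOException e) {
            // Expected here: the bytes are not in GZIP format.
        }
        return source.closed;
    }

    // Separate style (steps 2 and 3): every stream is a declared resource.
    static boolean closedAfterSeparateFailure(byte[] notGzip) {
        TrackingInputStream source = new TrackingInputStream(notGzip);
        try (InputStream in = source;
                GZIPInputStream gzipInputStream = new GZIPInputStream(in);
                BufferedReader reader = new BufferedReader(new InputStreamReader(gzipInputStream))) {
            reader.lines().forEach(System.out::println);
        } catch (IOException e) {
            // Expected here: the bytes are not in GZIP format.
        }
        return source.closed;
    }

    public static void main(String[] args) {
        byte[] notGzip = {1, 2, 3, 4};
        System.out.println("nested closed source:   " + closedAfterNestedFailure(notGzip));
        System.out.println("separate closed source: " + closedAfterSeparateFailure(notGzip));
    }
}
```

For a classpath resource the practical impact is usually small, but for files or sockets the separate declarations of steps 2 and 3 are the safer default.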
5. Conclusion
In this example, I created three classes that use GZIPInputStream to read a .gz file line by line with the following steps:
- Construct an InputStream (such as a FileInputStream) from the compressed .gz file.
- Construct a GZIPInputStream from that InputStream.
- Construct a BufferedReader from the GZIPInputStream.
- Invoke the BufferedReader.readLine method to process the file line by line.
- Alternatively, invoke BufferedReader.lines and the Stream.forEach method to process the file line by line.
6. Download
This was an example of reading a .gz file via GZIPInputStream.
You can download the full source code of this example here: Read a .gz File via GZIPInputStream