Convert Avro File to JSON File in Java

In today’s data-driven world, effective data serialization formats are crucial for efficient storage and transmission of information. Apache Avro and JSON (JavaScript Object Notation) are two widely used formats that cater to different needs. Let us dive into understanding how Java can be utilized to convert Avro files into JSON format.

1. Introduction to Apache Avro and JSON

Apache Avro is a data serialization framework developed within the Apache Hadoop project. It uses JSON to define data types and protocols while storing the data itself in a compact binary format. This makes Avro efficient for both storage and transmission. Avro files include a schema that describes the structure of the data, supporting complex data types such as nested records, arrays, and maps. One of its key features is schema evolution: a schema can change over time while remaining compatible with data written under older versions.
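To make the schema evolution point concrete, here is a minimal sketch. It assumes a hypothetical first version of a User schema with only a name field and a second version that adds an age field with a default of 0; the class name, the schemas, and the method name are illustrative and not part of the article's examples. A record written with the old schema is read back with the new one, and the missing field is filled from the default.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical class, used only to illustrate schema resolution
public class SchemaEvolutionSketch {
    // Assumed "old" schema: only a name field
    private static final String WRITER_SCHEMA =
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
        + "{\"name\":\"name\",\"type\":\"string\"}]}";

    // Assumed "new" schema: adds age with a default so old data stays readable
    private static final String READER_SCHEMA =
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
        + "{\"name\":\"name\",\"type\":\"string\"},"
        + "{\"name\":\"age\",\"type\":\"int\",\"default\":0}]}";

    public static GenericRecord readWithNewSchema(String name) throws IOException {
        // Separate parsers, because one parser cannot define "User" twice
        Schema writerSchema = new Schema.Parser().parse(WRITER_SCHEMA);
        Schema readerSchema = new Schema.Parser().parse(READER_SCHEMA);

        // Write a record in binary form using the old schema
        GenericRecord oldUser = new GenericData.Record(writerSchema);
        oldUser.put("name", name);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(writerSchema).write(oldUser, encoder);
        encoder.flush();

        // Read the same bytes back, resolving the old schema against the new one
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        return new GenericDatumReader<GenericRecord>(writerSchema, readerSchema)
            .read(null, decoder);
    }

    public static void main(String[] args) throws IOException {
        GenericRecord user = readWithNewSchema("Ada");
        System.out.println(user.get("name") + " / " + user.get("age"));
    }
}
```

The reader is given both schemas: the one the data was written with and the one the application expects. Without the default on the new field, resolution would fail for old data.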

JSON (JavaScript Object Notation) is a lightweight, text-based format used for data interchange. It is easy for humans to read and write and simple for machines to parse and generate. JSON structures data in key-value pairs within objects and lists within arrays, making it widely used in web applications for data exchange between clients and servers.

2. Converting Avro Object to JSON

To convert an Avro object to JSON in Java, you can use the Apache Avro library. Below is an example demonstrating how to perform this conversion:

// Import necessary classes

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;

public class AvroToJsonExample {
    public static void main(String[] args) throws Exception {
        // Define your Avro schema
        String schemaString = "{\"type\":\"record\",\"name\":\"User\",\"fields\":[{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"age\",\"type\":\"int\"}]}";
        Schema schema = new Schema.Parser().parse(schemaString);

        // Create a GenericRecord
        GenericData.Record user = new GenericData.Record(schema);
        user.put("name", "John Doe");
        user.put("age", 30);

        // Convert to JSON
        String jsonOutput = user.toString();
        System.out.println("JSON Output: " + jsonOutput);
    }
}

2.1 Code Explanation and Output

The provided Java code defines a class named AvroToJsonExample, which demonstrates how to create an Avro record and convert it into a JSON string. The program begins with the main method, which serves as the entry point for execution.

Within the main method, the first step is to define an Avro schema in JSON format. This schema describes a data structure called User, which contains two fields: name, a string, and age, an integer. The schema is represented as a JSON string and is parsed using the Schema.Parser class to create a Schema object.
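As an alternative to an escaped JSON string, the same schema can be assembled with Avro's SchemaBuilder. The sketch below is only an illustration; the class and method names are made up for this example, and it assumes the Avro library is on the classpath.

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

// Hypothetical class, used only for this sketch
public class UserSchemaBuilderSketch {
    // Builds a User record with a required string "name" and a required int "age"
    public static Schema buildUserSchema() {
        return SchemaBuilder.record("User")
            .fields()
            .requiredString("name")
            .requiredInt("age")
            .endRecord();
    }

    public static void main(String[] args) {
        Schema schema = buildUserSchema();
        // Pretty-print the schema as JSON to compare it with the hand-written string
        System.out.println(schema.toString(true));
    }
}
```

The builder avoids quoting mistakes in long schema strings, while the JSON form remains handy when the schema is stored in an .avsc file.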

Next, a GenericRecord object named user is created using the defined schema. This object acts as a container for the data that adheres to the specified schema. The code then populates this record by calling the put method twice: first to set the name field to “John Doe” and second to set the age field to 30.

Finally, the code converts the populated Avro record into a JSON string by invoking the toString method on the user object, which renders the record as JSON text. This representation is stored in the variable jsonOutput and printed to the console. Keep in mind that toString is a convenience rendering, well suited to logging and simple records; it is not the same as Avro's JSON encoding, which, for example, tags non-null union values with their type name.

2.1.1 Code Output

If everything goes well, the following output will be shown on the IDE console.

JSON Output: {"name": "John Doe", "age": 30}
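When the output has to follow Avro's JSON encoding rather than the toString rendering, a JsonEncoder can be used for a single record as well. The following is a minimal sketch under that assumption; the class name and the helper methods toJson and userJson are hypothetical, and the schema is the same User schema as in the example above.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical class, used only for this sketch
public class AvroRecordJsonEncoderSketch {
    private static final String USER_SCHEMA =
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
        + "{\"name\":\"name\",\"type\":\"string\"},"
        + "{\"name\":\"age\",\"type\":\"int\"}]}";

    // Encodes one record with Avro's JSON encoding and returns it as a String
    public static String toJson(GenericRecord record) throws IOException {
        Schema schema = record.getSchema();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema, out);
        new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
        encoder.flush(); // push buffered JSON into the byte array
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Builds a User record like the one in the example above and encodes it
    public static String userJson(String name, int age) throws IOException {
        Schema schema = new Schema.Parser().parse(USER_SCHEMA);
        GenericRecord user = new GenericData.Record(schema);
        user.put("name", name);
        user.put("age", age);
        return toJson(user);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(userJson("John Doe", 30));
    }
}
```

For a flat record like User the two approaches produce equivalent JSON, apart from whitespace. The difference starts to matter once the schema contains unions, bytes, or other types where Avro's JSON encoding has its own rules.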

3. Converting Avro File to JSON File

To convert an entire Avro file to a JSON file, you can either use Apache Avro's command-line tools (the avro-tools utility provides a tojson command) or do it programmatically in Java. The following example takes the programmatic route:

// Import necessary classes

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;

import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

public class AvroFileToJson {
    public static void main(String[] args) throws Exception {
        File avroFile = new File("/path_to_avro_file/users.avro");
        File jsonFile = new File("/path_to_json_file/users.json");

        // Create a reader for the Avro file; the schema is taken from the file header
        DatumReader<GenericRecord> datumReader = new GenericDatumReader<>();

        // try-with-resources closes the reader and the output stream even on failure
        try (DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(avroFile, datumReader);
             OutputStream outputStream = new FileOutputStream(jsonFile)) {

            Schema schema = dataFileReader.getSchema();

            // Create a writer and a JSON encoder for the output file
            // (jsonEncoder expects an OutputStream, not a java.io.Writer)
            DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(schema);
            JsonEncoder jsonEncoder = EncoderFactory.get().jsonEncoder(schema, outputStream);

            // Read each record from the Avro file and write it as JSON;
            // the encoder separates top-level records with a line break
            while (dataFileReader.hasNext()) {
                GenericRecord record = dataFileReader.next();
                datumWriter.write(record, jsonEncoder);
            }

            // Flush buffered JSON to the file before the stream is closed
            jsonEncoder.flush();
        }

        System.out.println("Conversion complete: " + jsonFile.getAbsolutePath());
    }
}

3.1 Code Explanation and Output Understanding

The Java code provided defines a class named AvroFileToJson, which is designed to convert an Avro file into a JSON file. The program begins execution from the main method, where it handles file operations and data conversions.

Initially, two File objects are created: one for the input Avro file named users.avro and another for the output JSON file named users.json. These files are referenced using their respective paths.

Next, the code sets up a reader for the Avro file. A DatumReader of type GenericRecord is instantiated using GenericDatumReader. This reader is then used to create a DataFileReader, which is responsible for reading the contents of the specified Avro file.

Following this, the program prepares to write to the JSON file. It opens the output file for writing and initializes a JsonEncoder with the schema obtained from the Avro file reader. Note that EncoderFactory.jsonEncoder() accepts an OutputStream rather than a java.io.Writer, so the output should be opened as a stream (for example, a FileOutputStream). This encoder converts Avro records into JSON format.

The core of the conversion is a while loop that runs as long as the Avro file has more records. Each record returned by dataFileReader.next() is passed, together with the JSON encoder, to datumWriter.write(), where datumWriter is a GenericDatumWriter created with the same schema. The encoder is flushed so that buffered data reaches the file, and each record ends up on its own line in the JSON output.

Once all records have been processed, the program flushes any remaining data in the encoder and closes the output file and the data file reader. Finally, a message indicating that the conversion is complete is printed to the console, along with the absolute path of the generated JSON file.
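The converter needs an existing Avro file as input. If none is at hand, a small sample file can be produced with DataFileWriter. The sketch below is one way to do that; the class name, the helper methods, the second sample user, and the temporary file location are all assumptions made for illustration.

```java
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

import java.io.File;
import java.io.IOException;

// Hypothetical class, used only to create test input for the converter
public class SampleAvroFileSketch {
    private static final String USER_SCHEMA =
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
        + "{\"name\":\"name\",\"type\":\"string\"},"
        + "{\"name\":\"age\",\"type\":\"int\"}]}";

    private static GenericRecord user(Schema schema, String name, int age) {
        GenericRecord record = new GenericData.Record(schema);
        record.put("name", name);
        record.put("age", age);
        return record;
    }

    // Writes two sample users into an Avro container file (schema is stored in the header)
    public static void writeSampleFile(File avroFile) throws IOException {
        Schema schema = new Schema.Parser().parse(USER_SCHEMA);
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            writer.create(schema, avroFile);
            writer.append(user(schema, "John Doe", 30));
            writer.append(user(schema, "Jane Roe", 25));
        }
    }

    // Reads the file back and counts its records as a quick sanity check
    public static int countRecords(File avroFile) throws IOException {
        int count = 0;
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(avroFile, new GenericDatumReader<GenericRecord>())) {
            while (reader.hasNext()) {
                reader.next();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        File avroFile = File.createTempFile("users", ".avro");
        writeSampleFile(avroFile);
        System.out.println(countRecords(avroFile) + " records written to " + avroFile);
    }
}
```

Pointing the avroFile path in AvroFileToJson at the generated file should then yield a JSON file with one line per user.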

4. Conclusion

Converting Avro files to JSON format in Java is straightforward with the use of the Apache Avro library. This process allows developers to leverage the efficiency of binary storage while providing flexibility through JSON for data interchange. The examples provided illustrate how to convert both individual Avro objects and entire files into JSON format, facilitating integration with various systems that utilize JSON.

Yatin Batra

An experienced full-stack engineer well versed in Core Java, Spring/Spring Boot, MVC, Security, AOP, Frontend (Angular & React), and cloud technologies (such as AWS, GCP, Jenkins, Docker, K8s).