Convert Avro File to JSON File in Java
In today’s data-driven world, effective data serialization formats are crucial for efficient storage and transmission of information. Apache Avro and JSON (JavaScript Object Notation) are two widely used formats that cater to different needs. Let us dive into understanding how Java can be utilized to convert Avro files into JSON format.
1. Introduction to Apache Avro and JSON
Apache Avro is a data serialization framework that uses JSON to define data types and protocols while storing the data itself in a compact binary format. This makes Avro efficient for both storage and transmission. Avro files include a schema that describes the structure of the data, supporting complex data types such as nested records, arrays, and maps. One of its key features is schema evolution, which allows a schema to change over time while maintaining backward compatibility.
JSON (JavaScript Object Notation) is a lightweight, text-based format used for data interchange. It is easy for humans to read and write and simple for machines to parse and generate. JSON structures data in key-value pairs within objects and lists within arrays, making it widely used in web applications for data exchange between clients and servers.
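To make the size difference between the two formats concrete, here is a minimal sketch that hand-encodes one record the way Avro's binary encoding is specified (zig-zag varints for numbers, length-prefixed UTF-8 for strings) and compares it with the equivalent JSON text. It uses only the JDK, not the Avro library. The class name AvroSizeSketch, its helper methods, and the record with a string name and an int age are assumptions made for illustration. Only the record body is counted; a real Avro data file also carries a header containing the schema.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class AvroSizeSketch {

    // Avro writes int and long values as zig-zag varints,
    // so small magnitudes (positive or negative) fit in one byte.
    static void writeLong(long n, ByteArrayOutputStream out) {
        long v = (n << 1) ^ (n >> 63);            // zig-zag: 0, -1, 1, -2 -> 0, 1, 2, 3
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80)); // low 7 bits plus continuation bit
            v >>>= 7;
        }
        out.write((int) v);
    }

    // Avro writes a string as its UTF-8 byte length (a long) followed by the bytes.
    static void writeString(String s, ByteArrayOutputStream out) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(utf8.length, out);
        out.write(utf8, 0, utf8.length);
    }

    // Hypothetical record (name: string, age: int); fields are written in schema order
    // with no field names or separators.
    static byte[] encodeUser(String name, int age) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(name, out);
        writeLong(age, out);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] binary = encodeUser("John Doe", 30);
        String json = "{\"name\":\"John Doe\",\"age\":30}";
        System.out.println("Avro binary body: " + binary.length + " bytes");
        System.out.println("JSON text: " + json.getBytes(StandardCharsets.UTF_8).length + " bytes");
    }
}
```

For this record the binary body is 10 bytes while the JSON text is 28 bytes, because the binary form leaves the field names to the schema instead of repeating them in every record.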
2. Converting Avro Object to JSON
To convert an Avro object to JSON in Java, you can use the Apache Avro library. Below is an example demonstrating how to perform this conversion:
// Import necessary classes
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;

public class AvroToJsonExample {
    public static void main(String[] args) throws Exception {
        // Define your Avro schema
        String schemaString = "{\"type\":\"record\",\"name\":\"User\",\"fields\":[{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"age\",\"type\":\"int\"}]}";
        Schema schema = new Schema.Parser().parse(schemaString);

        // Create a GenericRecord
        GenericData.Record user = new GenericData.Record(schema);
        user.put("name", "John Doe");
        user.put("age", 30);

        // Convert to JSON
        String jsonOutput = user.toString();
        System.out.println("JSON Output: " + jsonOutput);
    }
}
2.1 Code Explanation and Output
The provided Java code defines a class named AvroToJsonExample, which demonstrates how to create an Avro record and convert it into a JSON string. The program begins with the main method, which serves as the entry point for execution.
Within the main method, the first step is to define an Avro schema in JSON format. This schema describes a data structure called User, which contains two fields: name, a string, and age, an integer. The schema is represented as a JSON string and is parsed using the Schema.Parser class to create a Schema object.
Next, a GenericRecord object named user is created using the defined schema. This object acts as a container for the data that adheres to the specified schema. The code then populates this record by calling the put method twice: first to set the name field to “John Doe” and second to set the age field to 30.
Finally, the code converts the populated Avro record into a JSON string by invoking the toString method on the user object. This JSON representation is stored in the variable jsonOutput. The program then prints this output to the console, displaying the resulting JSON string that represents the user’s data.
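Calling toString is the shortest route, but the Avro library also provides a dedicated JSON encoder. The sketch below shows one way to serialize the same kind of record with JsonEncoder and GenericDatumWriter into an in-memory buffer. The class name AvroRecordJsonEncoderSketch and the toJson helper are hypothetical names chosen for this illustration, and the code assumes the Avro library is on the classpath.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class AvroRecordJsonEncoderSketch {

    // Hypothetical helper: serializes one record using Avro's JSON encoding
    // for the record's own schema.
    static String toJson(GenericRecord record) throws IOException {
        Schema schema = record.getSchema();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema, out);
        DatumWriter<GenericRecord> writer = new GenericDatumWriter<>(schema);
        writer.write(record, encoder);
        encoder.flush(); // the encoder buffers output, so flush before reading the bytes
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Same User schema as in the example above
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"age\",\"type\":\"int\"}]}");
        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "John Doe");
        user.put("age", 30);
        System.out.println(toJson(user));
    }
}
```

The two approaches can differ for some schemas. The JSON encoder follows the JSON encoding defined in the Avro specification, in which a non-null union value is wrapped in an object keyed by its type name. That matters if the JSON is later read back with Avro's JsonDecoder.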
2.1.1 Code Output
If everything goes well, the following output is printed to the console:
JSON Output: {"name": "John Doe", "age": 30}
3. Converting Avro File to JSON File
To convert an entire Avro file to a JSON file, you can use Apache Avro’s command-line tools (the avro-tools jar includes a tojson command) or do it programmatically in Java. The following example takes the programmatic approach:
// Import necessary classes
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;

import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

public class AvroFileToJson {
    public static void main(String[] args) throws Exception {
        File avroFile = new File("/path_to_avro_file/users.avro");
        File jsonFile = new File("/path_to_json_file/users.json");

        // Create a reader for the Avro file; the schema is taken from the file header
        DatumReader<GenericRecord> datumReader = new GenericDatumReader<>();

        // try-with-resources closes both the reader and the output file
        try (DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(avroFile, datumReader);
             OutputStream jsonOut = new FileOutputStream(jsonFile)) {

            // Create a JSON encoder and a matching writer for the file's schema.
            // jsonEncoder() takes an OutputStream, not a Writer.
            Schema schema = dataFileReader.getSchema();
            JsonEncoder jsonEncoder = EncoderFactory.get().jsonEncoder(schema, jsonOut);
            DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(schema);

            // Read each record from the Avro file and write it as JSON.
            // The encoder puts a line separator between consecutive records.
            while (dataFileReader.hasNext()) {
                GenericRecord record = dataFileReader.next();
                datumWriter.write(record, jsonEncoder);
            }

            // Push any buffered output to the file before it is closed
            jsonEncoder.flush();
        }

        System.out.println("Conversion complete: " + jsonFile.getAbsolutePath());
    }
}
3.1 Code Explanation and Output Understanding
The Java code provided defines a class named AvroFileToJson, which is designed to convert an Avro file into a JSON file. The program begins execution from the main method, where it handles file operations and data conversions.
Initially, two File objects are created: one for the input Avro file named users.avro and another for the output JSON file named users.json. These files are referenced using their respective paths.
Next, the code sets up a reader for the Avro file. A DatumReader for GenericRecord values is instantiated using GenericDatumReader. This reader is then used to create a DataFileReader, which is responsible for reading the contents of the specified Avro file.
Following this, the program prepares to write to the JSON file. It opens the output file for writing and initializes a JsonEncoder using the schema obtained from the Avro file reader. This encoder converts Avro records into JSON format.
The core of the conversion process occurs within a while loop that checks if there are more records to read from the Avro file. For each record retrieved using dataFileReader.next(), the code writes it to the JSON encoder using the datumWriter.write() method. The encoder is flushed so that buffered output reaches the file, and the records are separated by newlines in the JSON output.
Once all records have been processed, any data remaining in the encoder is flushed, and the output file and the data file reader are closed. Finally, a message indicating that the conversion is complete is printed to the console, along with the absolute path of the generated JSON file.
4. Conclusion
Converting Avro files to JSON format in Java is straightforward with the use of the Apache Avro library. This process allows developers to leverage the efficiency of binary storage while providing flexibility through JSON for data interchange. The examples provided illustrate how to convert both individual Avro objects and entire files into JSON format, facilitating integration with various systems that utilize JSON.