Create Avro Schema With List of Objects

Apache Avro is a data serialization framework widely used in big data applications, particularly within the Hadoop ecosystem, because of its compact binary format and schema-based structure. Avro schemas, written in JSON, define the data types and structure so that different systems can exchange data reliably. This article shows how to create an Avro schema that contains a list of objects and how to use it from Java.

1. Introduction

Apache Avro is a data serialization framework that provides a compact, fast binary format for data exchange. It is particularly useful in big data applications, such as those using Apache Hadoop. Avro is schema-based: the structure of the data is defined in a schema, which is written in JSON.

An Avro schema describes the data types and structure of the data being serialized. Avro's binary encoding stores field values without field names or type tags, so a reader needs the writer's schema to interpret the bytes. The schema is therefore essential for both serialization and deserialization, and it is what keeps data compatible between different systems.
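To make the role of the schema concrete, the following sketch hand-encodes one record the way the Avro specification describes binary encoding: ints and longs as zig-zag variable-length integers, strings as a length followed by UTF-8 bytes, and arrays as a counted block of items terminated by a zero count. It is only an illustration using the Java standard library. The class name AvroEncodingSketch and the record layout (a string name, an int age, and an array of city strings) are assumptions made for this example, and real applications should rely on the Avro library's own encoders.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class AvroEncodingSketch {

    // Avro writes int and long values as zig-zag encoded variable-length integers.
    static void writeLong(ByteArrayOutputStream out, long value) {
        long zigzag = (value << 1) ^ (value >> 63);
        while ((zigzag & ~0x7FL) != 0) {
            out.write((int) ((zigzag & 0x7F) | 0x80)); // low 7 bits, continuation bit set
            zigzag >>>= 7;
        }
        out.write((int) zigzag);
    }

    // A string is its UTF-8 byte length (encoded as a long) followed by the bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Hypothetical record: fields are written in schema order, with no names or type tags.
    static byte[] encodeRecord(String name, int age, List<String> cities) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, name);
        writeLong(out, age);
        if (!cities.isEmpty()) {
            writeLong(out, cities.size()); // a single block holding every item
            for (String city : cities) {
                writeString(out, city);
            }
        }
        writeLong(out, 0); // a zero count marks the end of the array
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encodeRecord("John Doe", 30, Arrays.asList("New York", "Boston"));
        System.out.println(bytes.length + " bytes");
    }
}
```

For the sample values the encoded record is 28 bytes long. None of the field names appear in it, which is why the bytes cannot be decoded without the schema.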

2. Creating the Avro Schema

To define an Avro schema that includes a list of objects, you can use the following JSON structure. Suppose we want to create a schema for a Person that includes a list of Address objects.

{
  "type": "record",
  "name": "Person",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "age",
      "type": "int"
    },
    {
      "name": "addresses",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "Address",
          "fields": [
            {
              "name": "street",
              "type": "string"
            },
            {
              "name": "city",
              "type": "string"
            },
            {
              "name": "zipCode",
              "type": "string"
            }
          ]
        }
      }
    }
  ]
}

In this schema, the addresses field of the Person record has the type array, and its items attribute defines the nested Address record inline. Each Address has street, city, and zipCode fields, all of type string.
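One way to check that the schema has the intended shape is to parse it with Avro's Schema.Parser and walk down to the element type of the array. The sketch below assumes the Avro library is on the classpath and embeds a compact copy of the schema above as a string. The class name SchemaInspection and the helper parse() are hypothetical names chosen for this example.

```java
import org.apache.avro.Schema;

public class SchemaInspection {

    // Compact copy of the Person schema shown above.
    static final String PERSON_SCHEMA = "{"
        + "\"type\":\"record\",\"name\":\"Person\",\"fields\":["
        + "{\"name\":\"name\",\"type\":\"string\"},"
        + "{\"name\":\"age\",\"type\":\"int\"},"
        + "{\"name\":\"addresses\",\"type\":{\"type\":\"array\",\"items\":{"
        + "\"type\":\"record\",\"name\":\"Address\",\"fields\":["
        + "{\"name\":\"street\",\"type\":\"string\"},"
        + "{\"name\":\"city\",\"type\":\"string\"},"
        + "{\"name\":\"zipCode\",\"type\":\"string\"}]}}}]}";

    // Parsing fails with an exception if the JSON is not a valid Avro schema.
    static Schema parse() {
        return new Schema.Parser().parse(PERSON_SCHEMA);
    }

    public static void main(String[] args) {
        Schema person = parse();

        // The field's schema is the array type; its element type is the Address record.
        Schema addresses = person.getField("addresses").schema();
        Schema address = addresses.getElementType();

        System.out.println("addresses type: " + addresses.getType());
        System.out.println("element record: " + address.getName());
        for (Schema.Field field : address.getFields()) {
            System.out.println("  " + field.name() + " -> " + field.schema().getType());
        }
    }
}
```

Parsing the schema at startup or in a unit test like this can surface a malformed definition before any data is written with it.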

3. Using the Schema in Java

To use the Avro schema in a Java application, you need to include the Avro library in your project. You can do this by adding the following dependency to your pom.xml if you are using Maven:

<dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro</artifactId>
    <version>1.10.2</version>
</dependency>

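After adding the dependency, you can generate Java classes from the schema above (saved as person.avsc) using the Avro tools JAR. Because the schema declares no namespace, the generated Person and Address classes are placed directly in the output directory, in the default package. The following command writes them to the current directory: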

java -jar avro-tools-1.10.2.jar compile schema person.avsc .

4. Working With Generated Classes

Once you have generated the Java classes from your Avro schema, you can use them in your application to serialize and deserialize data. Here’s an example of how to create a Person object and serialize it to a file:

import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumWriter;
import java.io.File;
import java.util.Arrays;

public class AvroExample {
    public static void main(String[] args) throws Exception {
        // Create a Person object
        Person person = Person.newBuilder()
            .setName("John Doe")
            .setAge(30)
            .setAddresses(Arrays.asList(
                Address.newBuilder().setStreet("123 Main St").setCity("New York").setZipCode("10001").build(),
                Address.newBuilder().setStreet("456 Elm St").setCity("Boston").setZipCode("02110").build()
            ))
            .build();

        // Create a DatumWriter for the Person class
        DatumWriter<Person> personDatumWriter = new SpecificDatumWriter<>(Person.class);

        // Create a DataFileWriter to write the Person object to a file.
        // try-with-resources closes the writer, flushing the file, even if a write fails.
        try (DataFileWriter<Person> dataFileWriter = new DataFileWriter<>(personDatumWriter)) {
            dataFileWriter.create(person.getSchema(), new File("person.avro")); // Create the file with schema
            dataFileWriter.append(person); // Append the Person object to the file
        }
    }
}

4.1 Code Explanation

The given code defines a Java class named AvroExample that demonstrates how to serialize a Person object using Apache Avro. The main method serves as the entry point for the application.

Inside the main method, a new Person object is created using the builder pattern. The Person.newBuilder() method initializes the builder, allowing us to set various attributes. The setName method assigns the name “John Doe” to the Person object, while the setAge method sets the age to 30.

The setAddresses method is used to assign a list of Address objects to the Person. Each Address object is created using its builder, which allows us to specify the street, city, and zip code. In this example, two addresses are provided: one for “123 Main St, New York, 10001” and another for “456 Elm St, Boston, 02110”. After setting these values, the build() method is called to finalize the creation of the Person object.

Next, a DatumWriter for the Person class is created using SpecificDatumWriter. This writer knows how to encode a Person object according to its Avro schema. A DataFileWriter is then instantiated with this DatumWriter and handles writing the encoded records to an Avro container file.

The create method of the DataFileWriter initializes a new file named person.avro using the schema retrieved from the Person object. This file will store the serialized data according to the defined schema. Following this, the append method is called to write the person object to the file.

Finally, the DataFileWriter is closed, which flushes any buffered data to the file and releases the underlying resources, completing the serialization process.

4.2 Code Output

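The generated person.avro file is a binary Avro container file, so it is not directly human-readable. Converting it to JSON, for example with the tojson command of Avro tools (java -jar avro-tools-1.10.2.jar tojson person.avro), shows the following record, reformatted here for readability: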

{"name": "John Doe", "age": 30, "addresses": [
    {"street": "123 Main St", "city": "New York", "zipCode": "10001"},
    {"street": "456 Elm St", "city": "Boston", "zipCode": "02110"}
]}
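To verify the contents of an Avro container file from Java, the records can be read back with DataFileReader. The sketch below is a self-contained round trip under a few assumptions: it uses Avro's generic API (GenericRecord) so that it runs without the generated Person and Address classes, it embeds a compact copy of the schema as a string, and the class name AvroReadBackSketch and the method writeAndReadCities are hypothetical names chosen for this example.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AvroReadBackSketch {

    // Compact copy of the Person schema from section 2.
    static final String PERSON_SCHEMA = "{"
        + "\"type\":\"record\",\"name\":\"Person\",\"fields\":["
        + "{\"name\":\"name\",\"type\":\"string\"},"
        + "{\"name\":\"age\",\"type\":\"int\"},"
        + "{\"name\":\"addresses\",\"type\":{\"type\":\"array\",\"items\":{"
        + "\"type\":\"record\",\"name\":\"Address\",\"fields\":["
        + "{\"name\":\"street\",\"type\":\"string\"},"
        + "{\"name\":\"city\",\"type\":\"string\"},"
        + "{\"name\":\"zipCode\",\"type\":\"string\"}]}}}]}";

    // Writes one Person with a single Address, then reads the file back
    // and returns the city of every address found in it.
    static List<String> writeAndReadCities(File file) throws IOException {
        Schema personSchema = new Schema.Parser().parse(PERSON_SCHEMA);
        Schema addressSchema = personSchema.getField("addresses").schema().getElementType();

        GenericRecord address = new GenericData.Record(addressSchema);
        address.put("street", "123 Main St");
        address.put("city", "New York");
        address.put("zipCode", "10001");

        List<GenericRecord> addresses = new ArrayList<>();
        addresses.add(address);

        GenericRecord person = new GenericData.Record(personSchema);
        person.put("name", "John Doe");
        person.put("age", 30);
        person.put("addresses", addresses);

        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(personSchema))) {
            writer.create(personSchema, file);
            writer.append(person);
        }

        // The reader picks up the writer's schema from the file header.
        List<String> cities = new ArrayList<>();
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(file, new GenericDatumReader<GenericRecord>())) {
            for (GenericRecord p : reader) {
                for (Object item : (Iterable<?>) p.get("addresses")) {
                    // Strings come back as Avro Utf8 objects, so convert explicitly.
                    cities.add(((GenericRecord) item).get("city").toString());
                }
            }
        }
        return cities;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeAndReadCities(new File("person.avro")));
    }
}
```

With the generated classes available, the same read loop can use SpecificDatumReader with Person.class instead, which returns typed Person objects rather than GenericRecord instances.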

5. Conclusion

Apache Avro provides a robust and efficient way to work with data serialization in Java applications. By defining an Avro schema that includes a list of objects, you can easily manage complex data structures. This article covered the basics of Avro, how to create an Avro schema, use it in Java, and work with the generated classes. With Avro, you can ensure that your data is structured, consistent, and easily transferable across different systems.

Yatin Batra

An experienced full-stack engineer well versed in Core Java, Spring/Spring Boot, MVC, Security, AOP, Frontend (Angular & React), and cloud technologies (such as AWS, GCP, Jenkins, Docker, K8s).