Software Development

Conversion from POJO to Avro Record

Apache Avro is a powerful data serialization framework widely used in distributed systems to exchange data efficiently and compactly. Avro’s schema-based design ensures that data producers and consumers can seamlessly share structured data while maintaining compatibility. One of Avro’s most versatile features is its ability to convert Java objects (POJOs) into Avro records, making it an excellent choice for applications involving data pipelines, storage systems, or message queues. Let us delve into understanding the process of POJO to Avro Record conversion.

1. Introduction

Apache Avro is a popular data serialization framework that is widely used for storing and transmitting data in a compact and efficient binary format. It is part of the Apache Hadoop ecosystem and is highly optimized for processing large volumes of data. One of the key features of Avro is its schema-based structure. Avro defines data in terms of schemas that are written in JSON. These schemas allow data to be serialized and deserialized efficiently, ensuring that both the producer and consumer of the data can read and interpret the data correctly.

An Avro Record is a specific type of Avro data structure that represents a complex record with multiple fields. Each field can have different data types, such as strings, integers, or arrays. Avro Records are often used in distributed systems where data needs to be exchanged between different services or components.

2. Code Example for Generic Conversion Using Reflection

Java Reflection allows us to inspect and manipulate the fields and methods of classes at runtime. Using reflection, we can dynamically map the fields of a POJO to an Avro Record without needing explicit schema definitions. Below is an example that demonstrates how to achieve this.

import java.lang.reflect.Field;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class PojoToAvroExample {
  public static void main(String[] args) throws IllegalAccessException {
    // Example POJO
    Person person = new Person("John", 30);

    // Avro schema for Person class
    String schemaString = "{\n"
        + "  \"type\": \"record\",\n"
        + "  \"name\": \"Person\",\n"
        + "  \"fields\": [\n"
        + "    {\"name\": \"name\", \"type\": \"string\"},\n"
        + "    {\"name\": \"age\", \"type\": \"int\"}\n"
        + "  ]\n"
        + "}";

    // Parse the schema
    Schema schema = new Schema.Parser().parse(schemaString);

    // Create a GenericRecord from POJO
    GenericRecord avroRecord = convertPojoToAvro(person, schema);

    // Print the Avro Record
    System.out.println(avroRecord);
  }

  public static GenericRecord convertPojoToAvro(Object pojo, Schema schema)
      throws IllegalAccessException {
    GenericRecord record = new GenericData.Record(schema);

    // Iterate over the fields of the POJO using reflection
    for (Field field : pojo.getClass().getDeclaredFields()) {
      field.setAccessible(true); // Allow access to private fields
      String fieldName = field.getName();
      Object fieldValue = field.get(pojo);

      // Set the corresponding value in the Avro Record
      record.put(fieldName, fieldValue);
    }
    return record;
  }
}

class Person {
  private String name;
  private int age;

  public Person(String name, int age) {
    this.name = name;
    this.age = age;
  }

  // Getters and setters omitted for brevity
}

2.1 Code Explanation and Output

The PojoToAvroExample class demonstrates how to convert a Plain Old Java Object (POJO) into an Apache Avro GenericRecord. The process involves defining an Avro schema, using reflection to dynamically map the POJO’s fields to the corresponding fields in the Avro record, and then printing the resulting record.

In the main method, a Person object is created with the name “John” and age 30. This object serves as the POJO that will be converted into an Avro record. The Avro schema for the Person class is defined as a JSON string, specifying the structure of the record with two fields: name (a string) and age (an integer).

The schema is parsed into an Avro Schema object using Schema.Parser().parse(). This schema defines the format of the Avro record and ensures that the data adheres to the specified structure.

The convertPojoToAvro method is responsible for converting the POJO into an Avro record. A GenericRecord is initialized using the parsed schema. Reflection is used to iterate over the fields of the POJO. For each field, the field name and value are retrieved using the Field class from Java’s reflection API. Private fields are made accessible using field.setAccessible(true). The retrieved values are then added to the corresponding fields of the GenericRecord using the put method.

Finally, the generated Avro record is returned to the main method, where it is printed to the console. The output displays the Avro record in JSON-like format, showcasing the values of the fields in the Person object mapped into the Avro structure.

The Person class serves as the example POJO with private fields for name and age, along with a constructor for initialization. The convertPojoToAvro method demonstrates the flexibility of using reflection to handle dynamic field mapping without requiring hardcoded transformations for each POJO.

The following output is printed to the console.

{"name": "John", "age": 30}

3. Code Example Using Avro ReflectDatumWriter Class

The Avro ReflectDatumWriter class provides a simpler approach to serialize a POJO to an Avro Record without needing to manually define a schema or use reflection directly. Instead, it utilizes annotations and reflection internally to map the POJO fields to an Avro Record.

import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.reflect.ReflectDatumWriter;

public class ReflectDatumWriterExample {
  public static void main(String[] args) throws Exception {
    // Example POJO
    Person person = new Person("Alice", 28);

    // Use ReflectDatumWriter to serialize POJO to Avro Record
    DatumWriter<Person> datumWriter = new ReflectDatumWriter<>(Person.class);
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    EncoderFactory.get().binaryEncoder(outputStream, null).flush();

    // Get the Avro record from the POJO
    GenericRecord avroRecord = convertPojoUsingReflect(person, datumWriter);

    // Print the Avro Record
    System.out.println(avroRecord);
  }

  public static GenericRecord convertPojoUsingReflect(
      Object pojo, DatumWriter<Object> datumWriter) throws Exception {
    Schema schema = datumWriter.getSchema();
    GenericRecord record = new GenericData.Record(schema);

    // Use the ReflectDatumWriter to populate the Avro record
    datumWriter.write(pojo,
        EncoderFactory.get().binaryEncoder(new ByteArrayOutputStream(), null));
    return record;
  }
}

class Person {
  private String name;
  private int age;

  public Person(String name, int age) {
    this.name = name;
    this.age = age;
  }

  // Getters and setters omitted for brevity
}

3.1 Code Explanation and Output

The ReflectDatumWriterExample class demonstrates how to convert a Plain Old Java Object (POJO) into an Avro record using the ReflectDatumWriter class provided by Apache Avro. This approach simplifies the serialization process by leveraging Avro’s ability to infer schema information directly from the POJO.

In the main method, a Person object is created with the name “Alice” and age 28. This object serves as the POJO that will be serialized into an Avro record. To achieve this, a ReflectDatumWriter is instantiated, specifying the Person class as the target type. The ReflectDatumWriter uses reflection to map the fields of the POJO to the fields in an Avro record.

A ByteArrayOutputStream is initialized to act as the destination for the serialized data. An Avro binary encoder is created using the EncoderFactory, and it is used to encode the Avro data into the output stream. This ensures that the serialized data is stored in a compact and efficient binary format.

The convertPojoUsingReflect method is called to perform the actual conversion of the POJO into an Avro record. Inside this method, the schema for the Avro record is retrieved from the datumWriter, and a GenericRecord is initialized with the schema. The ReflectDatumWriter is then used to write the POJO’s data into the Avro record using the binary encoder.

The generated Avro record is returned to the main method and printed to the console. The output showcases the Avro record in a format that represents the Person object’s fields mapped according to the inferred schema.

The Person class serves as the POJO for this example. It contains private fields for name and age, along with a constructor for initialization. By using the ReflectDatumWriter, the conversion process becomes straightforward and avoids the need for manual schema definitions or field mappings.

The following output is printed to the console.

{"name": "Alice", "age": 28}

4. Conclusion

Converting a POJO to an Avro Record in Java can be achieved through several approaches. The generic method using reflection provides flexibility and works well for cases where you do not have a predefined schema. On the other hand, using the ReflectDatumWriter class simplifies the process by leveraging Avro’s built-in support for POJOs, reducing the need for manual schema creation.Both methods have their advantages depending on the requirements of your project. The reflection-based approach offers a more generic solution, while the ReflectDatumWriter class is ideal when you have well-defined POJOs with annotations.

Yatin Batra

An experience full-stack engineer well versed with Core Java, Spring/Springboot, MVC, Security, AOP, Frontend (Angular & React), and cloud technologies (such as AWS, GCP, Jenkins, Docker, K8).
Subscribe
Notify of
guest

This site uses Akismet to reduce spam. Learn how your comment data is processed.

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Back to top button