
Avro Schema From Java Class Example

Apache Avro is a data serialization system that is compact, fast, and well suited to distributed applications. Let us explore how to generate an Avro schema from a Java class.

1. What is Avro?

Apache Avro is a widely used data serialization format that allows efficient serialization and deserialization of structured data. It is commonly used in big data frameworks such as Apache Kafka, Hadoop, and Spark. Avro’s schema-driven nature ensures compatibility across different systems, making it a preferred choice for data serialization. Avro consists of the following key components:

  • Schema: Defines the structure of the data in the JSON format.
  • Serializer: Converts Java objects into Avro’s compact binary format.
  • Deserializer: Reads Avro binary data and converts it back to Java objects.
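To make the first component concrete, the sketch below parses a hand-written JSON schema in Java. It assumes the Apache Avro dependency from section 1.4 is on the classpath; the schema string itself is only illustrative:

```java
import org.apache.avro.Schema;

public class SchemaJsonExample {
    public static void main(String[] args) {
        // An Avro schema is an ordinary JSON document
        String json = "{"
            + "\"type\": \"record\","
            + "\"name\": \"Employee\","
            + "\"fields\": ["
            + "  {\"name\": \"name\", \"type\": \"string\"},"
            + "  {\"name\": \"age\", \"type\": \"int\"}"
            + "]}";

        // Schema.Parser turns the JSON definition into a Schema object
        Schema schema = new Schema.Parser().parse(json);
        System.out.println(schema.getName()); // Employee
    }
}
```

The serializer and deserializer components then use such a Schema object to write and read the binary representation.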

1.1 Benefits of Avro

Avro offers several advantages, including:

  • Compact and efficient binary format: Reduces storage and transmission overhead.
  • Supports schema evolution: Allows changes to the schema while maintaining backward and forward compatibility.
  • Interoperability between multiple languages: Supports multiple programming languages, including Java, Python, and Scala.
  • Optimized for big data frameworks: Designed for fast read/write operations in distributed environments.
  • Self-describing format with embedded schema: Embeds schema information within the data, eliminating the need for external schema definitions.

1.2 Use cases of Avro

Avro is widely used in various scenarios, including:

  • Streaming data pipelines: Used in message brokers like Apache Kafka to serialize event data efficiently.
  • Big data storage: Integrated with Hadoop and HDFS for structured data storage.
  • Inter-process communication: Used in Remote Procedure Calls (RPC) for seamless data exchange.
  • Schema evolution management: Helps manage schema changes in distributed applications.

1.3 Schema Evolution in Avro

One of the powerful features of Avro is schema evolution, which allows modifying the schema without breaking existing data. This flexibility is crucial for long-term data compatibility and smooth updates in distributed systems. Avro supports the following schema changes:

  • Adding new fields with default values: New fields can be introduced without affecting existing records, as default values ensure backward compatibility.
  • Removing existing fields: Fields can be removed if they are optional, ensuring that older data remains readable.
  • Changing data types: Type modifications are allowed only in a compatible way, such as widening numeric types (e.g., int to long) or using logical types.

With these capabilities, Avro ensures seamless data evolution, making it a preferred choice for data storage and streaming applications.
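The compatibility rules above can also be checked programmatically. The sketch below, assuming the Apache Avro dependency is on the classpath and using two illustrative schema strings, uses Avro’s SchemaCompatibility utility to confirm that adding a field with a default value keeps old data readable:

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;

public class SchemaEvolutionCheck {
    public static void main(String[] args) {
        // Writer schema: the original version of the record
        Schema writer = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Employee\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"}]}");

        // Reader schema: adds an "age" field with a default value,
        // so records written with the old schema remain readable
        Schema reader = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Employee\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"age\",\"type\":\"int\",\"default\":0}]}");

        SchemaCompatibility.SchemaPairCompatibility result =
            SchemaCompatibility.checkReaderWriterCompatibility(reader, writer);
        System.out.println(result.getType()); // COMPATIBLE
    }
}
```

Removing the default value from the new field would make the same check report an incompatibility.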

1.4 Setting up Avro in a Maven project

To use Avro in a Java project using Maven, add the following dependencies to your pom.xml file:

<dependencies>
    <dependency>
        <groupId>org.apache.avro</groupId>
        <artifactId>avro</artifactId>
        <version>your_jar_version</version>
    </dependency>
    <dependency>
        <groupId>com.fasterxml.jackson.dataformat</groupId>
        <artifactId>jackson-dataformat-avro</artifactId>
        <version>your_jar_version</version>
    </dependency>
</dependencies>

2. Code Example

2.1 Define a POJO Class for Avro Schema

Let’s define a simple POJO class that we will use to generate an Avro schema.

public class Employee {
    private String name;
    private int age;
    private String department;
 
    public Employee() {} // Default constructor required for Avro serialization
 
    public Employee(String name, int age, String department) {
        this.name = name;
        this.age = age;
        this.department = department;
    }
 
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
 
    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
 
    public String getDepartment() { return department; }
    public void setDepartment(String department) { this.department = department; }
}

2.2 Creating Avro Schemas with the Avro Reflection API

The Avro Reflection API allows us to generate an Avro schema from a Java class dynamically.

The ReflectData class, part of the Avro Reflection API, generates Avro schemas dynamically from Java classes, with no need for manually written schema files (.avsc). Because the schema is derived at runtime from the structure of the class, this is useful when you want to avoid defining Avro schemas separately. It offers several key features:

  • Schema Generation – Automatically derives an Avro schema from a Java class using reflection.
  • Supports POJOs (Plain Old Java Objects) – Works with standard Java classes without requiring Avro-specific annotations.
  • Handles Complex Types – Supports Java objects containing nested classes, lists, and maps.
  • Useful for Serialization – Converts Java objects to Avro records for serialization.
  • Facilitates Schema Evolution – Works well with evolving schemas when combined with default values.

Below is an example of how to generate an Avro schema using the Avro Reflection API.

import org.apache.avro.Schema;
import org.apache.avro.reflect.ReflectData;
 
public class AvroReflectionExample {
    public static void main(String[] args) {
        Schema schema = ReflectData.get().getSchema(Employee.class);
        System.out.println(schema.toString(true));
    }
}

2.2.1 Code Explanation and Output

The given Java code demonstrates how to generate an Avro schema using the Avro Reflection API. It imports the necessary Avro classes and defines a main method that retrieves the Avro schema of the Employee class using ReflectData.get().getSchema(Employee.class). The schema is then printed in a human-readable JSON format using schema.toString(true).

{
  "type" : "record",
  "name" : "Employee",
  "namespace" : "com.example",
  "fields" : [
    { "name" : "name", "type" : "string" },
    { "name" : "age", "type" : "int" },
    { "name" : "department", "type" : "string" }
  ]
}

This approach allows automatic schema generation based on the structure of the Employee class without manually defining an Avro schema file.
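To see the reflected schema in action, the following sketch serializes an Employee to Avro binary and reads it back using ReflectDatumWriter and ReflectDatumReader. It reuses the Employee class from section 2.1; the class and variable names here are illustrative:

```java
import java.io.ByteArrayOutputStream;

import org.apache.avro.Schema;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.reflect.ReflectData;
import org.apache.avro.reflect.ReflectDatumReader;
import org.apache.avro.reflect.ReflectDatumWriter;

public class ReflectRoundTrip {
    public static void main(String[] args) throws Exception {
        Schema schema = ReflectData.get().getSchema(Employee.class);

        // Serialize an Employee to Avro binary using the reflected schema
        Employee original = new Employee("Alice", 30, "Engineering");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new ReflectDatumWriter<Employee>(schema).write(original, encoder);
        encoder.flush();

        // Deserialize the bytes back into an Employee instance
        BinaryDecoder decoder =
            DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        Employee copy = new ReflectDatumReader<Employee>(schema).read(null, decoder);
        System.out.println(copy.getName() + ", " + copy.getAge());
    }
}
```

The same reflected schema thus drives both schema generation and the binary round trip, with no hand-written .avsc file involved.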

2.3 Creating Avro Schemas with the Jackson Library

Jackson is a widely used library for JSON processing. It also supports Avro schema generation. Below is an example of how to generate an Avro schema using Jackson’s Avro module.

import com.fasterxml.jackson.dataformat.avro.AvroMapper;
import com.fasterxml.jackson.dataformat.avro.AvroSchema;
 
public class JacksonAvroExample {
    public static void main(String[] args) throws Exception {
        AvroMapper mapper = new AvroMapper(); // Avro-aware subclass of ObjectMapper
        AvroSchema schema = mapper.schemaFor(Employee.class);
        System.out.println(schema.getAvroSchema().toString(true));
    }
}

2.3.1 Code Explanation and Output

The given Java code demonstrates how to generate an Avro schema using Jackson’s Avro module. It initializes an AvroMapper, a subclass of ObjectMapper that handles Avro data binding. The schemaFor(Employee.class) method generates an Avro schema for the Employee class, which is then retrieved via schema.getAvroSchema().toString(true) and printed in a human-readable JSON format. For the Employee class defined earlier, the output schema defines a record type with fields for name, age, and department.

{
  "type" : "record",
  "name" : "Employee",
  "namespace" : "com.example",
  "fields" : [
    { "name" : "name", "type" : "string" },
    { "name" : "age", "type" : "int" },
    { "name" : "department", "type" : "string" }
  ]
}
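The generated AvroSchema can also drive serialization directly. The sketch below, again reusing the Employee class from section 2.1 (class and variable names are illustrative), writes an Employee to Avro binary and reads it back with Jackson’s AvroMapper:

```java
import com.fasterxml.jackson.dataformat.avro.AvroMapper;
import com.fasterxml.jackson.dataformat.avro.AvroSchema;

public class JacksonAvroRoundTrip {
    public static void main(String[] args) throws Exception {
        AvroMapper mapper = new AvroMapper();
        AvroSchema schema = mapper.schemaFor(Employee.class);

        // Serialize the POJO to Avro binary using the generated schema
        Employee original = new Employee("Alice", 30, "Engineering");
        byte[] avroBytes = mapper.writer(schema).writeValueAsBytes(original);

        // Deserialize the bytes back into an Employee
        Employee copy = mapper.readerFor(Employee.class)
                              .with(schema)
                              .readValue(avroBytes);
        System.out.println(copy.getName());
    }
}
```

Note that the same schema object must be supplied on both the write and the read side, since Avro binary data does not carry field names.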

3. Comparison: Avro Reflection API vs. Jackson Avro Module

  • Schema Generation: the Avro Reflection API automatically derives a schema from Java POJOs at runtime using reflection, while the Jackson Avro module generates a schema through Jackson’s AvroMapper and JSON-based annotations.
  • Dependency Requirements: the Reflection API needs only the org.apache.avro library, whereas the Jackson module requires com.fasterxml.jackson.dataformat.avro along with the core Jackson libraries.
  • Ease of Use: the Reflection API is simple to use, requiring no annotations or predefined schema files; the Jackson module is flexible and integrates well with JSON-based serialization workflows.
  • Customization: the Reflection API offers limited customization, as it strictly follows the Java class structure; the Jackson module provides more control via annotations such as @JsonProperty and @JsonIgnore.
  • Performance: the Reflection API generates schemas faster, since it reflects directly on class metadata; the Jackson module may add slight overhead due to its processing layers.
  • Support for Complex Types: both handle nested Java objects, lists, and maps, but the Jackson module may need additional configuration for nested structures.
  • Schema Evolution Support: the Reflection API supports evolution through default values for new fields; the Jackson module supports it as well but requires careful use of annotations.
  • Interoperability: the Reflection API works best within Avro’s native ecosystem (Hadoop, Kafka, etc.); the Jackson module suits applications already using Jackson for JSON processing.
  • Use Case Suitability: the Reflection API is best for projects that rely on Avro as the primary serialization mechanism; the Jackson module is ideal where JSON and Avro coexist, such as microservices combining REST with Avro-based event streaming.

4. Conclusion

In this article, we explored how to generate an Avro schema from a Java class using two approaches: the Avro Reflection API and Jackson’s Avro module. Additionally, we covered Avro’s architecture, benefits, use cases, and schema evolution techniques. Avro is widely used in data-intensive applications, offering an efficient way to handle structured data serialization.

Yatin Batra

An experienced full-stack engineer well versed in Core Java, Spring/Spring Boot, MVC, Security, AOP, frontend (Angular & React), and cloud technologies (such as AWS, GCP, Jenkins, Docker, K8s).