Apache Fury Serialization Java Example
Serialization is a crucial process in software engineering that enables efficient storage, retrieval, and transmission of data structures or objects across systems. Apache Fury is designed to provide fast serialization with minimal overhead, which makes it well suited to performance-critical applications. Let us look at how Apache Fury serialization works in Java and what advantages it offers.
1. Overview
Serialization involves converting an object into a format (like binary) that can be easily stored or transmitted. The reverse process, deserialization, reconstructs the object from this format. Apache Fury is a modern serialization framework optimized for high performance. It aims to reduce the CPU and memory overhead commonly associated with traditional serialization tools. The following are the advantages:
- High Performance: Apache Fury provides fast serialization and deserialization with low latency.
- Cross-Language Support: It supports multiple languages like Java, Python, and more.
- Compact Data Encoding: Efficient binary encoding reduces storage and bandwidth usage.
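- Thread Safety: A plain Fury instance is not thread-safe, but the builder can create a ThreadSafeFury (via buildThreadSafeFury()) for use in concurrent environments.
- Optional Class Registration: Classes can be serialized without being registered first by calling requireClassRegistration(false). Registration is required by default because it is safer and lets Fury write a small ID instead of the full class name.
- Flexible Schema Handling: In compatible mode, Fury supports schema evolution, so fields can be added or removed as object structures change over time.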
- Customizable Serialization: Offers options for customizing serialization to fit specific application needs.
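The class-registration and schema-evolution options listed above are both builder settings. The sketch below assumes an Apache Fury release (org.apache.fury:fury-core, 0.7.x) on the classpath; the class names FuryConfigSketch and Account are hypothetical. It enables compatible mode, keeps registration mandatory, registers the one class it uses, and round-trips an object. Actually exercising schema evolution would need two different versions of the class (one writing, one reading), which a single program cannot easily show.

```java
import org.apache.fury.Fury;
import org.apache.fury.config.CompatibleMode;
import org.apache.fury.config.Language;

public class FuryConfigSketch {
    // Hypothetical POJO used only for this sketch; Fury does not require Serializable.
    public static class Account {
        public String owner;
        public long balance;
    }

    // Building a Fury instance is expensive, so build it once and reuse it.
    // A plain Fury instance is not thread-safe; this sketch is single-threaded.
    private static final Fury FURY = newFury();

    static Fury newFury() {
        Fury fury = Fury.builder()
                .withLanguage(Language.JAVA)
                // COMPATIBLE writes field metadata so that a reader whose class
                // has added or removed fields can still deserialize the data.
                .withCompatibleMode(CompatibleMode.COMPATIBLE)
                // The safer default: only registered classes are accepted.
                .requireClassRegistration(true)
                .build();
        // Registration assigns an ID, so writer and reader must register
        // classes in the same order.
        fury.register(Account.class);
        return fury;
    }

    /** Serializes an Account, reads it back, and returns "owner:balance". */
    public static String roundTrip(String owner, long balance) {
        Account account = new Account();
        account.owner = owner;
        account.balance = balance;
        byte[] bytes = FURY.serialize(account);
        Account copy = (Account) FURY.deserialize(bytes);
        return copy.owner + ":" + copy.balance;
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("Alice", 100L)); // prints "Alice:100"
    }
}
```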
2. Serialization With Apache Fury
Apache Fury offers several benefits, such as:
- Low Latency: Provides optimized serialization with extremely low latency.
- Cross-Language Support: Apache Fury supports Java, Python, and other major languages.
- Compact Encoding: Data is encoded efficiently, reducing bandwidth and storage needs.
2.1 Installation
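To use Apache Fury in your Java project with Maven, add the following dependency to your pom.xml file. Apache releases of Fury are published under the org.apache.fury group ID and use the org.apache.fury Java package (earlier, pre-Apache releases used io.fury):

```xml
<dependency>
  <groupId>org.apache.fury</groupId>
  <artifactId>fury-core</artifactId>
  <version>0.7.1</version>
</dependency>
```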
3. Code Sample
3.1 Java Code Example
Let’s walk through a Java example where we serialize and deserialize an object using Apache Fury:
```java
package com.jcg.example;

// Apache releases of Fury use the org.apache.fury package; pre-Apache releases used io.fury.
import org.apache.fury.Fury;
import org.apache.fury.config.Language;

import java.io.Serializable;

public class User implements Serializable {
    private String name;
    private int age;

    // Constructor
    public User(String name, int age) {
        this.name = name;
        this.age = age;
    }

    // Getters and setters (omitted for brevity)

    public static void main(String[] args) {
        // Create a Fury instance (reuse it in real code: building one is expensive)
        Fury fury = Fury.builder()
                .withLanguage(Language.JAVA)
                .requireClassRegistration(false) // accept unregistered classes such as User
                .build();

        // Create a User object
        User user = new User("Alice", 30);

        // Serialize the user object
        byte[] serializedData = fury.serialize(user);

        // Deserialize back to an object
        User deserializedUser = (User) fury.deserialize(serializedData);

        // Print results
        System.out.println("Serialized Data Length: " + serializedData.length);
        System.out.println("Deserialized Object: " + deserializedUser.name + ", " + deserializedUser.age);
    }
}
```
3.1.1 Code Breakdown
The code works as follows:
- Import Statements: We import the Fury and Language classes from the Fury library.
- User Class: We define a simple User class that implements Serializable. Fury itself does not require Serializable; the interface is needed for the Java serialization benchmark later in this article.
- Fury Instance: We create a Fury instance using the builder pattern, set the language to Java, and call requireClassRegistration(false) so that User can be serialized without being registered first.
- Serialization: We serialize the User object using fury.serialize(), which returns a byte array.
- Deserialization: We deserialize the byte array back to a User object using fury.deserialize() and cast the result.
- Output: We print the length of the serialized data and the deserialized object’s properties.
3.1.2 Code Output
The output of the code is:
```
Serialized Data Length: [Some Byte Array Length]
Deserialized Object: Alice, 30
```

The exact byte count depends on the Fury version and configuration.
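The example above uses a plain Fury instance, which must not be shared between threads. For concurrent code, the builder’s buildThreadSafeFury() method returns a ThreadSafeFury that can be shared. The sketch below assumes an Apache Fury release (org.apache.fury:fury-core, 0.7.x) on the classpath; ThreadSafeFuryExample, Point, and roundTripConcurrently are hypothetical names. It round-trips objects from several pool threads through one shared instance and counts how many came back intact.

```java
import org.apache.fury.Fury;
import org.apache.fury.ThreadSafeFury;
import org.apache.fury.config.Language;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadSafeFuryExample {
    // Hypothetical value type used only for this sketch.
    public static class Point {
        public int x;
        public int y;
    }

    // buildThreadSafeFury() returns an instance that can be shared by many threads,
    // unlike the plain Fury returned by build(). Registration is disabled here only
    // to mirror the article's example.
    private static final ThreadSafeFury FURY = Fury.builder()
            .withLanguage(Language.JAVA)
            .requireClassRegistration(false)
            .buildThreadSafeFury();

    /** Round-trips one Point per task on a 4-thread pool and returns how many matched. */
    public static int roundTripConcurrently(int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final int n = i;
                results.add(pool.submit(() -> {
                    Point p = new Point();
                    p.x = n;
                    p.y = -n;
                    Point copy = (Point) FURY.deserialize(FURY.serialize(p));
                    return copy.x == n && copy.y == -n;
                }));
            }
            int matched = 0;
            for (Future<Boolean> f : results) {
                if (f.get()) {
                    matched++;
                }
            }
            return matched;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripConcurrently(100) + " of 100 round trips matched");
    }
}
```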
4. Comparing Apache Fury
Let us compare Apache Fury with other popular serialization frameworks such as Java’s built-in serialization, Kryo, and Protobuf.
4.1 Performance Comparison
- Apache Fury vs Java Serialization: Java’s built-in serialization is known to be slow and to produce large serialized objects, partly because it writes class metadata into every stream. Apache Fury is designed to improve both the speed and the size of the serialized data.
- Apache Fury vs Kryo: Kryo is faster than Java serialization but can take more effort to configure and tune. Apache Fury aims for better performance with simpler configuration, in part by generating serializer code at runtime.
- Apache Fury vs Protobuf: Protobuf requires defining a schema in .proto files and generating code from it, which makes it less flexible for existing Java classes. Apache Fury aims to provide similar or better performance while serializing ordinary Java objects without predefined schemas.
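To see why Java’s built-in serialization produces large output, it helps to compare it with an encoding where both sides already know the field layout. The stdlib-only sketch below is not Fury’s wire format; it writes the same name and age once with ObjectOutputStream and once by hand with DataOutputStream (a 2-byte length plus the UTF bytes for the name, then 4 bytes for the int). EncodingSizeDemo and Person are hypothetical names mirroring the article’s User class.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class EncodingSizeDemo {
    // Same shape as the article's User: a name and an age.
    static final class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    /** Bytes produced by Java's built-in ObjectOutputStream (header, class descriptor, data). */
    static int javaSerializedSize(Person p) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(p);
            }
            return bos.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Bytes produced by a hand-written layout: writeUTF(name) then writeInt(age). */
    static int compactSize(Person p) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bos)) {
                out.writeUTF(p.name); // 2-byte length prefix + modified UTF-8 bytes
                out.writeInt(p.age);  // always 4 bytes
            }
            return bos.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Person alice = new Person("Alice", 30);
        System.out.println("Java serialization: " + javaSerializedSize(alice) + " bytes");
        System.out.println("Hand-written binary: " + compactSize(alice) + " bytes"); // 11 bytes
    }
}
```

The hand-written version is smaller only because it omits all type information, so it cannot reconstruct arbitrary objects on its own; formats such as Fury sit between these extremes by writing compact type identifiers instead of full class descriptors.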
4.2 Benchmark Example
Here’s an example of benchmarking Apache Fury against Java’s built-in serialization:
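```java
package com.jcg.performance;

// Apache releases of Fury use the org.apache.fury package; pre-Apache releases used io.fury.
import com.jcg.example.User;
import org.apache.fury.Fury;
import org.apache.fury.config.Language;

import java.io.*;

public class SerializationBenchmark {
    private static final int WARMUP_ITERATIONS = 20_000;
    private static final int MEASURED_ITERATIONS = 100_000;

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        User user = new User("Bob", 40);

        // Benchmarking with Apache Fury
        Fury fury = Fury.builder()
                .withLanguage(Language.JAVA)
                .requireClassRegistration(false)
                .build();

        // Warm up first: Fury generates serializer code on first use and the JIT
        // compiler needs time to optimize hot code, so early iterations are not representative.
        for (int i = 0; i < WARMUP_ITERATIONS; i++) {
            fury.deserialize(fury.serialize(user));
        }
        long furyStartTime = System.nanoTime();
        for (int i = 0; i < MEASURED_ITERATIONS; i++) {
            byte[] serializedData = fury.serialize(user);
            User deserializedUser = (User) fury.deserialize(serializedData);
        }
        long furyEndTime = System.nanoTime();
        double furyTime = (furyEndTime - furyStartTime) / 1e9;
        System.out.printf("Apache Fury Time: %.4f seconds%n", furyTime);

        // Benchmarking with Java's built-in serialization, warmed up the same way
        for (int i = 0; i < WARMUP_ITERATIONS; i++) {
            javaDeserialize(javaSerialize(user));
        }
        long javaStartTime = System.nanoTime();
        for (int i = 0; i < MEASURED_ITERATIONS; i++) {
            byte[] serializedData = javaSerialize(user);
            User deserializedUser = (User) javaDeserialize(serializedData);
        }
        long javaEndTime = System.nanoTime();
        double javaTime = (javaEndTime - javaStartTime) / 1e9;
        System.out.printf("Java Serialization Time: %.4f seconds%n", javaTime);
    }

    // Serialize with ObjectOutputStream; closing the stream flushes it
    private static byte[] javaSerialize(Object obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(obj);
        }
        return bos.toByteArray();
    }

    // Deserialize with ObjectInputStream
    private static Object javaDeserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}
```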
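In this example, we compare the speed of Apache Fury and Java’s built-in serialization by serializing and deserializing the same User object 100,000 times with each. Apache Fury typically outperforms Java serialization thanks to its compact binary format and runtime-generated serializers. Because Fury generates that serializer code on first use, and the JVM’s JIT compiler also needs time to optimize hot code, the timed loops should be preceded by warm-up iterations; otherwise the measurement includes one-time setup costs.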
4.2.1 Code Breakdown
- Benchmark Setup: We create a User object to be serialized repeatedly in the benchmark.
- Apache Fury Benchmark: We measure the time taken to serialize and deserialize the object 100,000 times using Apache Fury.
- Java Serialization Benchmark: We perform the same operation using Java’s built-in serialization mechanisms.
- Time Calculation: We calculate the elapsed time in seconds for both methods and print the results.
4.2.2 Code Output
A sample run produced the following output (absolute timings vary with hardware, JVM version, and warm-up):

```
Apache Fury Time: 0.8500 seconds
Java Serialization Time: 1.5000 seconds
```
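A single timed loop is sensitive to warm-up effects and to garbage-collection pauses that happen to land inside it. For rigorous numbers a dedicated harness such as JMH is the usual choice; as a lighter stdlib-only sketch, the hypothetical helper medianNanosPerOp below runs a task untimed for a number of warm-up rounds, then times several batches and reports the median cost per call.

```java
import java.util.Arrays;

public class WarmBenchmark {
    /**
     * Runs task warmupRounds times without timing it, then times `trials` batches
     * of opsPerTrial calls each and returns the median nanoseconds per call.
     */
    static double medianNanosPerOp(Runnable task, int warmupRounds, int trials, int opsPerTrial) {
        if (trials <= 0 || opsPerTrial <= 0) {
            throw new IllegalArgumentException("trials and opsPerTrial must be positive");
        }
        // Warm-up: gives the JIT time to compile hot paths (and lets libraries that
        // generate code lazily do so) before anything is measured.
        for (int i = 0; i < warmupRounds; i++) {
            task.run();
        }
        double[] perOp = new double[trials];
        for (int t = 0; t < trials; t++) {
            long start = System.nanoTime();
            for (int i = 0; i < opsPerTrial; i++) {
                task.run();
            }
            perOp[t] = (double) (System.nanoTime() - start) / opsPerTrial;
        }
        // The median is less sensitive than the mean to one trial hit by a GC pause.
        Arrays.sort(perOp);
        int mid = trials / 2;
        return trials % 2 == 1 ? perOp[mid] : (perOp[mid - 1] + perOp[mid]) / 2.0;
    }

    public static void main(String[] args) {
        double ns = medianNanosPerOp(() -> Integer.toHexString(12345), 10_000, 5, 100_000);
        System.out.printf("median: %.2f ns/op%n", ns);
    }
}
```

With the benchmark’s objects in scope, the Fury loop could be expressed as medianNanosPerOp(() -> fury.deserialize(fury.serialize(user)), 20_000, 5, 100_000), and the Java serialization loop the same way, so both libraries are measured under identical conditions.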
5. Conclusion
Apache Fury provides a fast, efficient, and versatile serialization framework suited to modern Java applications where performance is critical. Its low latency, cross-language support, and compact encoding make it a strong alternative to Java’s built-in serialization and a serious contender against frameworks such as Kryo or Protobuf. For projects that demand high throughput, Apache Fury can speed up serialization and deserialization and thereby reduce overall application latency. Whether you’re working on a distributed system, a real-time application, or large-scale data processing, it is worth benchmarking Apache Fury against your current serializer with your own data, after a proper warm-up.