XML File Processing with Spring Batch

Spring Batch provides essential functionalities such as transaction management, job processing statistics, job restart capabilities, and more. One of its key features is the ability to handle large volumes of data efficiently. In this article, we’ll delve into using Spring Batch for reading from and writing to XML files with StaxEventItemReader and StaxEventItemWriter.

1. Introduction

When it comes to XML file processing, Spring Batch makes it straightforward to read XML records, map them to Java objects, and write Java objects back as XML records. This is accomplished using StaxEventItemReader for reading and StaxEventItemWriter for writing, with the help of Jakarta XML Binding (still commonly abbreviated JAXB, after its former name, Java Architecture for XML Binding) for marshalling and unmarshalling XML data.

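The StaxEventItemReader is built on StAX (the Streaming API for XML), so it reads a file one fragment at a time rather than loading the whole document into memory, which makes it suitable for large XML files. It delegates to a Spring OXM unmarshaller, here backed by JAXB, to turn each fragment into a Java object. Similarly, the StaxEventItemWriter uses a marshaller, again backed by JAXB, to turn Java objects back into XML and writes them to an XML file.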

In the following sections, we will demonstrate how to set up a Spring Batch project with Maven, define our XML schema and corresponding Java model classes, configure the reader and writer, and execute a complete batch job.

2. Project Setup

Maven pom.xml Configuration

First, let’s set up our Maven pom.xml file with the necessary dependencies:

    <dependencies>       
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-batch</artifactId>
        </dependency>

        <!-- JAXB dependencies -->
        <dependency>
            <groupId>jakarta.xml.bind</groupId>
            <artifactId>jakarta.xml.bind-api</artifactId>
        </dependency>
        <dependency>
            <groupId>org.glassfish.jaxb</groupId>
            <artifactId>jaxb-runtime</artifactId>
        </dependency>

        <!-- Spring OXM (Object/XML Mapping) -->
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-oxm</artifactId>
        </dependency>

        <dependency>
            <groupId>com.h2database</groupId>
            <artifactId>h2</artifactId>
            <scope>runtime</scope>
        </dependency>
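    </dependencies>

  • Spring Batch: The spring-boot-starter-batch starter brings in the core and infrastructure libraries required for Spring Batch functionality.
  • JAXB Dependencies: These dependencies provide the JAXB API and a runtime implementation, enabling the marshalling and unmarshalling of XML data to and from Java objects.
  • Spring OXM: This dependency provides the Spring Object/XML Mapping module, including the Marshaller and Unmarshaller abstractions and the Jaxb2Marshaller class that the reader and writer use.
  • H2: This in-memory database holds the job metadata tables that the JobRepository needs at runtime.

No versions are declared for these dependencies; this assumes they are managed by the Spring Boot parent POM or its dependency-management BOM.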

Example XML File

Let’s define a sample XML file (input.xml) that we will read and process:

<?xml version="1.0" encoding="UTF-8"?>
<employees>
    <employee>
        <id>1</id>
        <name>John Franklin</name>
        <department>Sales</department>
    </employee>
    <employee>
        <id>2</id>
        <name>Thomas Smith</name>
        <department>HR</department>
    </employee>
    <employee>
        <id>3</id>
        <name>Adams Jefferson</name>
        <department>Accounts</department>
    </employee>
</employees>

Next, define the data model class and the Jakarta Binding (JAXB) annotations for XML binding.

import jakarta.xml.bind.annotation.XmlElement;
import jakarta.xml.bind.annotation.XmlRootElement;

@XmlRootElement(name = "employee")
public class Employee {

    private int id;
    private String name;
    private String department;

    public Employee() {
    }

    public Employee(int id, String name, String department) {
        this.id = id;
        this.name = name;
        this.department = department;
    }

    @XmlElement(name = "id")
    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    @XmlElement(name = "name")
    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    @XmlElement(name = "department")
    public String getDepartment() {
        return department;
    }

    public void setDepartment(String department) {
        this.department = department;
    }

    @Override
    public String toString() {
        return "Employee [id=" + id + ", name=" + name + ", department=" + department + "]";
    }
}

The above Employee class uses JAXB annotations to map its properties to XML elements. Here’s a breakdown of the annotations used:

  • @XmlRootElement: This annotation names the XML element that the class maps to. Here it is employee, which matches each record inside the employees document rather than the document root itself.
  • @XmlElement: This annotation is placed on each getter method (getId, getName, getDepartment) so that the corresponding property is mapped to an XML element with the given name: id, name, and department.

3. StaxEventItemReader Configuration

Before we dive into the full batch job configuration, let’s separate and focus on the configuration of the XML reader. This configuration ensures that our application can efficiently read and map XML records to Java objects.

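import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.xml.StaxEventItemReader;
import org.springframework.batch.item.xml.builder.StaxEventItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;
import org.springframework.oxm.jaxb.Jaxb2Marshaller;

@Configuration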
public class ReaderConfig {
    
    @Bean
    @StepScope
    public StaxEventItemReader<Employee> reader() {
        
        Jaxb2Marshaller unmarshaller = new Jaxb2Marshaller();
        unmarshaller.setClassesToBeBound(Employee.class);

        return new StaxEventItemReaderBuilder<Employee>()
                .name("employeeReader")
                .resource(new ClassPathResource("input.xml"))
                .addFragmentRootElements("employee")
                .unmarshaller(unmarshaller)
                .build();
    }
}

The above ReaderConfig class configures the StaxEventItemReader for reading XML files. It uses Jaxb2Marshaller to unmarshal XML data into Employee objects. Here is an explanation of what the reader method does:

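  • @StepScope: This creates the reader bean lazily for each step execution instead of once at application startup.
  • Jaxb2Marshaller: This is configured with the Employee class to handle the unmarshalling process.
  • StaxEventItemReaderBuilder: This builds the StaxEventItemReader with the specified name (used as a key prefix when the reader saves its state for restarts), resource (input.xml on the classpath), fragment root element (employee), and the configured unmarshaller. Each employee element found in the file becomes one item.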
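
To make the idea of a fragment root element more concrete, here is a minimal sketch that uses only the JDK's StAX API, not Spring Batch itself. It streams a document and builds one object per employee element, which is roughly the work that the reader and its unmarshaller share. EmployeeRecord and FragmentReaderSketch are hypothetical names introduced for this illustration, and the hand-written field mapping is an assumption standing in for JAXB.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical stand-in for the article's Employee class.
record EmployeeRecord(int id, String name, String department) {}

class FragmentReaderSketch {

    // Streams the XML and returns one EmployeeRecord per <employee> fragment.
    static List<EmployeeRecord> readEmployees(String xml) {
        List<EmployeeRecord> result = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newFactory();
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
            XMLEventReader events = factory.createXMLEventReader(new StringReader(xml));

            int id = 0;
            String name = null;
            String department = null;
            while (events.hasNext()) {
                XMLEvent event = events.nextEvent();
                if (event.isStartElement()) {
                    // Hand-written mapping; JAXB would do this from the annotations.
                    switch (event.asStartElement().getName().getLocalPart()) {
                        case "id" -> id = Integer.parseInt(events.getElementText().trim());
                        case "name" -> name = events.getElementText();
                        case "department" -> department = events.getElementText();
                        default -> { }
                    }
                } else if (event.isEndElement()
                        && event.asEndElement().getName().getLocalPart().equals("employee")) {
                    // End of one fragment: emit the item that was just assembled.
                    result.add(new EmployeeRecord(id, name, department));
                }
            }
            events.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return result;
    }
}
```

Calling FragmentReaderSketch.readEmployees with the contents of input.xml would return three records. The real reader differs in that it hands each fragment to the configured unmarshaller and returns a single item per read() call instead of collecting a list.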

4. StaxEventItemWriter Configuration

Next, let’s configure the XML writer. This setup allows our application to marshal Java objects back into an XML format and write them to a specified file.

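import org.springframework.batch.item.xml.StaxEventItemWriter;
import org.springframework.batch.item.xml.builder.StaxEventItemWriterBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;
import org.springframework.oxm.jaxb.Jaxb2Marshaller;

@Configuration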
public class WriterConfig {
    
    @Bean
    public StaxEventItemWriter<Employee> writer() {
        Jaxb2Marshaller marshaller = new Jaxb2Marshaller();
        marshaller.setClassesToBeBound(Employee.class);

        return new StaxEventItemWriterBuilder<Employee>()
                .name("employeeWriter")
                .resource(new FileSystemResource("output.xml"))
                .marshaller(marshaller)
                .rootTagName("employees")
                .build();
    }
}

The above WriterConfig class configures the StaxEventItemWriter for writing XML files. It uses Jaxb2Marshaller to marshal Employee objects into XML data. Here is an explanation of what the writer method does:

  • Jaxb2Marshaller: This is configured with the Employee class to handle the marshalling process.
  • StaxEventItemWriterBuilder: This builds the StaxEventItemWriter with the specified name, resource (output XML file), marshaller, and the root tag name (employees).
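
The role of the root tag name can be illustrated with a small sketch that uses only the JDK's XMLStreamWriter rather than Spring Batch. The root element is opened once, each item is written inside it as its own employee element, and the root is closed at the end. EmployeeItem and RootTagWriterSketch are hypothetical names for this illustration, and the manual element writing is an assumption standing in for the JAXB marshaller.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical stand-in for the article's Employee class.
record EmployeeItem(int id, String name, String department) {}

class RootTagWriterSketch {

    // Writes all items as <employee> elements wrapped in a single root element.
    static String write(String rootTagName, List<EmployeeItem> items) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newFactory().createXMLStreamWriter(out);
            xml.writeStartElement(rootTagName);   // opened once, before the first item
            for (EmployeeItem item : items) {
                xml.writeStartElement("employee");
                writeTextElement(xml, "id", String.valueOf(item.id()));
                writeTextElement(xml, "name", item.name());
                writeTextElement(xml, "department", item.department());
                xml.writeEndElement();            // </employee>
            }
            xml.writeEndElement();                // closes the root after the last item
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
        return out.toString();
    }

    // Writes <tag>text</tag>; special characters in text are escaped by the writer.
    private static void writeTextElement(XMLStreamWriter xml, String tag, String text)
            throws XMLStreamException {
        xml.writeStartElement(tag);
        xml.writeCharacters(text);
        xml.writeEndElement();
    }
}
```

In the actual writer the root element stays open across chunks for the duration of the step, which is why the root tag name is configured on the writer rather than on the Employee class.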

5. Full Batch Configuration

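Now that we have looked at the reader and writer configurations separately, let’s integrate these components into a complete batch job. This configuration defines the job, the step, and the processor needed to process the data from start to finish. The reader and writer beans are repeated here so that the class is self-contained; a real project should keep only one definition of each bean, since Spring Boot rejects duplicate bean names by default.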

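import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.xml.StaxEventItemReader;
import org.springframework.batch.item.xml.StaxEventItemWriter;
import org.springframework.batch.item.xml.builder.StaxEventItemReaderBuilder;
import org.springframework.batch.item.xml.builder.StaxEventItemWriterBuilder;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.ClassPathResource;
import org.springframework.core.io.FileSystemResource;
import org.springframework.oxm.jaxb.Jaxb2Marshaller;
import org.springframework.transaction.PlatformTransactionManager;

@SpringBootApplication
public class SpringBatchXmlApplication {

    @Bean
    @StepScope
    public StaxEventItemReader<Employee> reader() {

        Jaxb2Marshaller unmarshaller = new Jaxb2Marshaller();
        unmarshaller.setClassesToBeBound(Employee.class);

        return new StaxEventItemReaderBuilder<Employee>()
                .name("employeeReader")
                .resource(new ClassPathResource("input.xml"))
                .addFragmentRootElements("employee")
                .unmarshaller(unmarshaller)
                .build();
    }

    @Bean
    public StaxEventItemWriter<Employee> writer() {
        Jaxb2Marshaller marshaller = new Jaxb2Marshaller();
        marshaller.setClassesToBeBound(Employee.class);

        return new StaxEventItemWriterBuilder<Employee>()
                .name("employeeWriter")
                .resource(new FileSystemResource("output.xml"))
                .marshaller(marshaller)
                .rootTagName("employees")
                .build();
    }

    @Bean
    public ItemProcessor<Employee, Employee> processor() {
        return employee -> {
            // Example processor logic: upper-case the name and print the record
            employee.setName(employee.getName().toUpperCase());
            System.out.println("Name: " + employee.getName() + ", Department: " + employee.getDepartment());
            return employee;
        };
    }

    @Bean
    Job job(Step step1, JobRepository jobRepository) {

        var builder = new JobBuilder("job", jobRepository);
        return builder
                .start(step1)
                .build();
    }

    @Bean
    public Step step1(StaxEventItemReader<Employee> reader,
            StaxEventItemWriter<Employee> writer,
            JobRepository jobRepository,
            PlatformTransactionManager transactionManager) {

        var builder = new StepBuilder("step1", jobRepository);
        return builder
                // Explicit type arguments fix the input and output item types of the chunk
                .<Employee, Employee>chunk(1, transactionManager)
                .reader(reader)
                .processor(processor())
                .writer(writer)
                .build();
    }

    public static void main(String[] args) {
        SpringApplication.run(SpringBatchXmlApplication.class, args);
    }

}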

This block of code configures a batch job that reads from an XML file, processes the data, and writes the results to another XML file. In addition to the StaxEventItemReader and StaxEventItemWriter, we define an ItemProcessor bean that converts each employee’s name to uppercase and prints the employee’s name and department.

The job method defines a batch job that starts with step1. The job builder uses a JobRepository to manage job execution details.

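The step1 method defines a step named step1. It uses the reader, processor, and writer beans with a chunk size of 1, so each employee is read, processed, and written in its own transaction. The StepBuilder records step execution details in the JobRepository, and the PlatformTransactionManager provides the transaction that wraps each chunk.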
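
The effect of the chunk size can be sketched in plain Java. The following is a simplified model of chunk-oriented processing, not Spring Batch's actual implementation: it leaves out transactions, restart state, and error handling. ChunkLoopSketch and its run method are hypothetical names, and the JDK's Iterator, UnaryOperator, and Consumer interfaces are assumptions standing in for the reader, processor, and writer.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

class ChunkLoopSketch {

    // Runs a read-process-write loop and returns the number of chunks written.
    static <T> int run(Iterator<T> reader, UnaryOperator<T> processor,
                       Consumer<List<T>> writer, int chunkSize) {
        int chunks = 0;
        while (reader.hasNext()) {
            // 1. Read up to chunkSize items.
            List<T> inputs = new ArrayList<>(chunkSize);
            while (inputs.size() < chunkSize && reader.hasNext()) {
                inputs.add(reader.next());
            }
            // 2. Process each item that was read.
            List<T> outputs = new ArrayList<>(inputs.size());
            for (T item : inputs) {
                outputs.add(processor.apply(item));
            }
            // 3. Write the whole chunk at once; in Spring Batch the
            //    chunk's transaction would be committed after this call.
            writer.accept(outputs);
            chunks++;
        }
        return chunks;
    }
}
```

With three input items, a chunk size of 1 results in three separate writes, while a chunk size of 2 results in two writes (two items, then one). Larger chunks mean fewer commits, at the cost of redoing more work if a chunk fails.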

Log Output

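When the application runs, the processor prints the upper-cased name and the department of each employee to the console, and the job writes the transformed records to output.xml.

[Figure 1: Example Output from XML Item Reader and Writer]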

6. Conclusion

In this article, we explored how to leverage Spring Batch for XML file processing using StaxEventItemReader and StaxEventItemWriter. We started by configuring our Maven project with the necessary dependencies, and then we defined our data model class with JAXB annotations for XML binding.

We demonstrated how to set up StaxEventItemReader to read XML files and map them to Java objects, and the StaxEventItemWriter to marshal Java objects back into XML format. The provided batch configuration integrated these components into a complete Spring Batch job, including a simple processor for data transformation.

7. Download the Source Code

This was an article on XML item reader and writer.

You can download the full source code of this example here: XML Item Reader and Writer

Omozegie Aziegbe

Omos holds a Master’s degree in Information Engineering with Network Management from the Robert Gordon University, Aberdeen. Omos is a freelance web/application developer currently focused on developing Java enterprise applications with the Jakarta EE framework.