Spring Batch Composite Item Reader Example

Batch processing plays a crucial role in applications that handle large datasets, ensuring efficient data ingestion, transformation, and storage. In Spring Batch, the ItemReader component is responsible for reading data from various sources before processing. In scenarios where data needs to be fetched from multiple sources or formats, however, managing multiple readers individually can be cumbersome. The CompositeItemReader simplifies this by combining multiple readers into a single logical unit, allowing seamless data retrieval from multiple sources in a structured way. This article explores the CompositeItemReader in Spring Batch and demonstrates its use cases with practical examples.

1. Setting Up a Spring Boot Project with Spring Batch

Before diving into the Composite Item Reader, we need a basic Spring Boot project configured with Spring Batch. Spring Boot simplifies batch job configurations, allowing us to focus on our reader implementations. To configure the project, we include the following dependencies in the pom.xml:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-batch</artifactId>
</dependency>
<dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
    <scope>runtime</scope>
</dependency>

These dependencies include Spring Batch and an H2 in-memory database for the job repository. With this setup, we can implement our composite reader.
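Since the Customer model shown later is annotated with JPA annotations (@Entity, @Id, @Column), the jakarta.persistence API must also be available on the classpath. One way to provide it (an addition not listed in the original setup; the batch readers themselves only need JDBC) is the JPA starter:

```xml
<!-- Optional: provides the jakarta.persistence annotations used on the Customer entity -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-jpa</artifactId>
</dependency>
```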

2. Understanding the Composite Item Reader

The CompositeItemReader<T> in Spring Batch allows chaining multiple ItemReader implementations together. It reads from its delegates one after another, treating them as a single data source. This is useful when combining data from different formats and origins, such as CSV files, databases, and APIs. Introduced in Spring Batch 5.2, CompositeItemReader<T> takes a list of readers and iterates through them sequentially, ensuring each data source is exhausted before it moves on to the next reader in the list.
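Conceptually, the composite reader delegates each read() call to the current reader and advances to the next delegate once the current one is exhausted. The following framework-free sketch illustrates the delegation logic using plain iterators rather than Spring Batch's real ItemReader interface (class and method names here are illustrative, not part of the Spring Batch API):

```java
import java.util.Iterator;
import java.util.List;

// Simplified illustration of composite-reader delegation: not the real
// Spring Batch class, just the core idea of draining delegates in order.
class SimpleCompositeReader<T> {
    private final Iterator<Iterator<T>> delegates;
    private Iterator<T> current;

    SimpleCompositeReader(List<Iterator<T>> readers) {
        this.delegates = readers.iterator();
        this.current = delegates.hasNext() ? delegates.next() : null;
    }

    /** Returns the next item, or null when every delegate is exhausted. */
    T read() {
        while (current != null) {
            if (current.hasNext()) {
                return current.next();
            }
            // Current source exhausted: move on to the next delegate.
            current = delegates.hasNext() ? delegates.next() : null;
        }
        return null;
    }
}
```

The real CompositeItemReader additionally participates in the stream lifecycle, propagating open, update, and close callbacks to stream-aware delegates so that state is tracked correctly across the step.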

3. Merging Data from a CSV File and a Database

In this article’s example, we will read customer records from a CSV file and a database, then merge them using CompositeItemReader.

3.1 Define the Customer Model

We need a simple Customer model to represent our data:

import jakarta.persistence.Column;
import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.GenerationType;
import jakarta.persistence.Id;
import jakarta.persistence.Table;

@Entity
@Table(name = "customers")
public class Customer {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @Column(name = "name")
    private String name;

    @Column(name = "email")
    private String email;

    public Customer() {
    }

    public Customer(Long id, String name, String email) {
        this.id = id;
        this.name = name;
        this.email = email;
    }

    // Getters and Setters
}

This class represents the customer data structure, which we will populate from both sources. The readers map rows to Customer through its fields and setters, so the JPA annotations are not strictly required for the batch job itself.

3.2 Implement a CSV Item Reader

To integrate data from multiple sources, we first need to read customer records from a CSV file. In Spring Batch, we use FlatFileItemReader for this purpose. The following configuration reads a CSV file named customers.csv, mapping its fields to the Customer class.

@Bean
public FlatFileItemReader<Customer> fileReader() {
    return new FlatFileItemReaderBuilder<Customer>()
            .name("customerFileItemReader")
            .resource(new ClassPathResource("customers.csv"))
            .delimited()
            .names("id", "name", "email")
            .linesToSkip(1)
            .targetType(Customer.class)
            .build();
}

The fileReader bean defines a FlatFileItemReader<Customer> to read customer data from a CSV file named customers.csv, which is located in the classpath. Using FlatFileItemReaderBuilder, it specifies the reader’s name as "customerFileItemReader" and sets the file as the input resource. The .delimited() method indicates that the file uses a delimiter (such as a comma) to separate values, and .names("id", "name", "email") maps these columns to the corresponding fields in the Customer class.

The .linesToSkip(1) directive ensures that the first row (typically the header) is ignored, and .targetType(Customer.class) instructs Spring Batch to automatically convert each row into a Customer object.

3.3 Implement a Database Item Reader

In addition to reading customer records from a CSV file, we also need to retrieve customer data from a database. Spring Batch provides JdbcCursorItemReader, which allows us to read data efficiently using SQL queries. Below, we define a database reader that fetches customer details from an in-memory H2 database.

Customer Table Structure

The customer data is stored in a database table named customers with three columns: id (BIGINT, primary key), name, and email (both VARCHAR). The following JdbcCursorItemReader reads customer data from this table.

@Bean
public JdbcCursorItemReader<Customer> databaseCustomerReader() {
    String sql = "select * from customers";
    return new JdbcCursorItemReaderBuilder<Customer>()
            .name("customerTableItemReader")
            .dataSource(dataSource())
            .sql(sql)
            .beanRowMapper(Customer.class)
            .build();
}
 
@Bean
public DataSource dataSource() {
    return DataSourceBuilder.create()
            .driverClassName("org.h2.Driver")
            .url("jdbc:h2:mem:batchdb;DB_CLOSE_DELAY=-1;")
            .username("sa")
            .password("")
            .build();
}

The above databaseCustomerReader bean defines a JdbcCursorItemReader<Customer> that retrieves customer records from our relational database. It uses JdbcCursorItemReaderBuilder to configure the reader, setting its name as "customerTableItemReader". The dataSource() method provides the database connection details, ensuring that the reader can interact with the database. The .beanRowMapper(Customer.class) method automatically maps the retrieved rows to instances of the Customer class, allowing Spring Batch to process them as structured objects.

The dataSource bean configures an in-memory H2 database for testing and batch processing. With this setup, our batch job can now read customer records directly from the database and process them alongside CSV data.

3.4 Creating the Composite Item Reader

Now that we have set up separate readers for the CSV file and the database, we need to combine them into a single reader. This ensures that our batch job can read customer records from both sources. Spring Batch 5.2 provides CompositeItemReader, which allows us to merge multiple readers into one logical unit.

@Bean
public CompositeItemReader<Customer> itemReader() {
    return new CompositeItemReader<>(Arrays.asList(fileReader(), databaseCustomerReader()));
}

This configuration defines a CompositeItemReader<Customer> that merges data from both the CSV file reader and the database reader, allowing customer records from both sources to be processed as if they originated from a single reader. By combining these readers, the batch job can efficiently handle customer data from multiple sources without requiring distinct processing logic for each, ensuring a unified approach to data reading.

4. Configuring the Batch Job

Now that we have defined the item readers, we need to configure the batch job to process the customer data. A batch job in Spring Batch consists of steps, where each step includes an ItemReader, an ItemProcessor (optional), and an ItemWriter. Here, we define a simple batch job that reads customer records from the composite reader and writes them to the console.

@Bean
public Step customerStep(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
    return new StepBuilder("customerStep", jobRepository)
            .<Customer, Customer>chunk(10, transactionManager)
            .reader(itemReader())
            .writer(customerWriter())
            .build();
}
 
@Bean
public Job customerJob(JobRepository jobRepository, Step customerStep) {
    return new JobBuilder("customerJob", jobRepository)
            .start(customerStep)
            .build();
}

This configuration defines a Spring Batch job that processes customer data in steps. The customerStep bean creates a batch step named "customerStep", which reads and writes Customer objects in chunks of 10 using an itemReader() and customerWriter(). It is managed by the provided JobRepository and PlatformTransactionManager, ensuring proper execution and transaction handling.

The customerJob bean sets up a batch job called "customerJob" that starts with customerStep, so the step runs as part of the job's execution. Together, the step and job definitions give Spring Batch everything it needs to run the pipeline and process the customer data.
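The step above references a customerWriter() bean that the listings so far have not defined. A minimal console writer might look like the following sketch (the bean name customerWriter matches the reference in customerStep; readable output assumes a toString() implementation on Customer, which is not shown in the model above):

```java
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;

// Minimal console writer; in Spring Batch 5, ItemWriter.write receives a
// Chunk of items, which is Iterable, so a lambda can simply loop over it.
@Bean
public ItemWriter<Customer> customerWriter() {
    return items -> items.forEach(System.out::println);
}
```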

5. Executing the Batch Job

After setting up the batch job, we need a way to trigger its execution. In a Spring Boot application, we can use a CommandLineRunner to start the job when the application runs. This ensures that our batch process begins automatically and executes according to the defined steps. The following code configures the execution of the batch job:

@Bean
public CommandLineRunner run(JobLauncher jobLauncher, Job customerJob) {
    return args -> {
        JobParameters jobParameters = new JobParametersBuilder()
                .addLong("startTime", System.currentTimeMillis())
                .toJobParameters();
 
        JobExecution jobExecution = jobLauncher.run(customerJob, jobParameters);
        System.out.println("Job Status: " + jobExecution.getStatus());
    };
}

In this code, the run method is defined as a Spring bean using CommandLineRunner, which executes when the application starts. It creates JobParameters with the current timestamp to ensure a fresh execution each time. The jobLauncher.run() method is then used to trigger the customerJob, and its execution status is printed to the console. This setup ensures that the batch job runs automatically whenever the application starts, making it easy to test and deploy.
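One caveat: by default, Spring Boot's batch auto-configuration also launches every Job bean at startup, so pairing it with a CommandLineRunner can run the job twice. A common remedy (an assumption about your setup; adjust as needed) is to disable the automatic launch in application.properties:

```properties
# Prevent Spring Boot from auto-launching the job at startup,
# leaving the CommandLineRunner as the single trigger.
spring.batch.job.enabled=false
```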

6. Running the Spring Batch Program and Confirming the Results

Now that we have implemented our Spring Batch job, let’s see how to run it and confirm that the CompositeItemReader correctly reads customer records from both the CSV file and the database.

Sample CSV File (customers.csv)

Before running the application, let’s define a sample CSV file that contains customer data. This file serves as one of the data sources for our CompositeItemReader.

id,name,email
1,John Fish,john.fish@jcg.com
2,Dane Smith,dane.smith@jcg.com

Each line represents a customer with their ID, name, and email address.

Inserting Sample Data into the Database

To ensure that our batch job processes customers from the database, we need to insert some sample records into the customers table before execution. We can create a data.sql file inside the src/main/resources directory; Spring Boot executes it automatically at startup. For an embedded H2 database this happens by default, and it can be forced for other datasources by setting spring.sql.init.mode=always in application.properties (this property replaced the older spring.datasource.initialization-mode).

Create src/main/resources/data.sql:

INSERT INTO customers (id, name, email) VALUES (101, 'Alice Johnson', 'alice.johnson@jcg.com');
INSERT INTO customers (id, name, email) VALUES (102, 'Bob Williams', 'bob.williams@jcg.com');
INSERT INTO customers (id, name, email) VALUES (103, 'Charlie Brown', 'charlie.brown@jcg.com');

We also define the table itself in a schema.sql file in the same directory, which Spring Boot executes before data.sql:

CREATE TABLE IF NOT EXISTS customers (
    id BIGINT PRIMARY KEY,
    name VARCHAR(255),
    email VARCHAR(255)
);

Build and Run the Application

To build and run the application with Maven, use the command mvn clean spring-boot:run. Spring Boot starts up, triggers the batch job, and the writer prints the customer records to the console: the two customers from customers.csv (John Fish and Dane Smith) followed by the three pre-inserted database customers (Alice Johnson, Bob Williams, and Charlie Brown).

This confirms that the job successfully read customer data from the customers.csv file, retrieved the pre-inserted customers from the database, and merged records from both sources using CompositeItemReader, processing them as a single data stream. The printed job status of COMPLETED indicates that the execution finished successfully.

7. Conclusion

In this article, we explored how to use CompositeItemReader in Spring Batch to read and process customer data from multiple sources, including a CSV file and a database. We started by setting up the necessary FlatFileItemReader and JdbcCursorItemReader, combined them using CompositeItemReader, and configured a batch job to process the merged data stream. We also demonstrated how to execute the batch job and verify that records from both sources were correctly read and processed.

8. Download the Source Code

You can download the full source code of this example here: spring batch composite item reader

Omozegie Aziegbe

Omos Aziegbe is a technical writer and web/application developer with a BSc in Computer Science and Software Engineering from the University of Bedfordshire. Specializing in Java enterprise applications with the Jakarta EE framework, Omos also works with HTML5, CSS, and JavaScript for web development. As a freelance web developer, Omos combines technical expertise with research and writing on topics such as software engineering, programming, web application development, computer science, and technology.