Spring Batch Composite Item Reader Example
Batch processing plays a crucial role in applications that handle large datasets, ensuring efficient data ingestion, transformation, and storage. In Spring Batch, the ItemReader
component is responsible for reading data from various sources before processing. However, in scenarios where data needs to be fetched from multiple sources or different formats, managing multiple readers individually can be cumbersome. The CompositeItemReader
simplifies this by combining multiple readers into a single logical unit, allowing seamless data retrieval from multiple sources in a structured way. This article explores the Composite Item Reader in Spring Batch and demonstrates its use cases with practical examples.
1. Setting Up a Spring Boot Project with Spring Batch
Before diving into the Composite Item Reader, we need a basic Spring Boot project configured with Spring Batch. Spring Boot simplifies batch job configurations, allowing us to focus on our reader implementations. To configure the project, we include the following dependencies in the pom.xml
:
```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-batch</artifactId>
</dependency>
<dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
    <scope>runtime</scope>
</dependency>
```
These dependencies include Spring Batch and an H2 in-memory database for the job repository. With this setup, we can implement our composite reader.
2. Understanding the Composite Item Reader
The CompositeItemReader<T>
in Spring Batch allows chaining multiple ItemReader
implementations together. It reads from multiple sources one after another, treating them as a single data source. This is useful when combining data from different formats like CSV files, databases, and APIs. Spring Batch provides CompositeItemReader<T>
that takes a list of readers and iterates through them sequentially. It ensures all data sources are exhausted before moving to the next step in the batch job.
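To make the sequential-delegation idea concrete outside of Spring Batch, here is a minimal framework-free sketch. The `SimpleReader` interface and `SequentialCompositeReader` class are simplified stand-ins for illustration, not Spring Batch's actual types; the real `CompositeItemReader` additionally participates in step lifecycle and state management.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Simplified stand-in for an ItemReader-style contract:
// read() returns the next item, or null when the source is exhausted.
interface SimpleReader<T> {
    T read();
}

// Reads each delegate in order; moves on to the next delegate only after
// the current one returns null, so all sources appear as one stream.
class SequentialCompositeReader<T> implements SimpleReader<T> {
    private final Iterator<SimpleReader<T>> delegates;
    private SimpleReader<T> current;

    SequentialCompositeReader(List<SimpleReader<T>> readers) {
        this.delegates = readers.iterator();
        this.current = delegates.hasNext() ? delegates.next() : null;
    }

    @Override
    public T read() {
        while (current != null) {
            T item = current.read();
            if (item != null) {
                return item;
            }
            // Current source exhausted; advance to the next delegate.
            current = delegates.hasNext() ? delegates.next() : null;
        }
        return null; // every delegate is exhausted
    }
}

public class CompositeSketch {
    // Helper: exposes a fixed list through the SimpleReader contract.
    static <T> SimpleReader<T> of(List<T> items) {
        Iterator<T> it = items.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    public static void main(String[] args) {
        SimpleReader<String> csv = of(Arrays.asList("John", "Dane"));
        SimpleReader<String> db = of(Arrays.asList("Alice", "Bob"));
        SimpleReader<String> composite =
                new SequentialCompositeReader<>(Arrays.asList(csv, db));
        // Prints the CSV names first, then the database names.
        for (String s = composite.read(); s != null; s = composite.read()) {
            System.out.println(s);
        }
    }
}
```

Note that the composite only advances to the next source after the current one signals exhaustion by returning `null`, which is exactly the "exhaust all sources before the step ends" behavior described above.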
3. Merging Data from a CSV File and a Database
In this article’s example, we will read customer records from a CSV file and a database, then merge them using CompositeItemReader
.
3.1 Define the Customer Model
We need a simple Customer
model to represent our data:
```java
@Entity
@Table(name = "customers")
public class Customer {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @Column(name = "name")
    private String name;

    @Column(name = "email")
    private String email;

    public Customer() {
    }

    public Customer(Long id, String name, String email) {
        this.id = id;
        this.name = name;
        this.email = email;
    }

    // Getters and Setters
}
```
This class represents the customer data structure, which we will populate from different sources.
3.2 Implement a CSV Item Reader
To integrate data from multiple sources, we first need to read customer records from a CSV file. In Spring Batch, we use FlatFileItemReader
for this purpose. The following configuration reads a CSV file named customers.csv
, mapping its fields to the Customer
class.
```java
@Bean
public FlatFileItemReader<Customer> fileReader() {
    return new FlatFileItemReaderBuilder<Customer>()
            .name("customerFileItemReader")
            .resource(new ClassPathResource("customers.csv"))
            .delimited()
            .names("id", "name", "email")
            .linesToSkip(1)
            .targetType(Customer.class)
            .build();
}
```
The fileReader
bean defines a FlatFileItemReader<Customer>
to read customer data from a CSV file named customers.csv
, which is located in the classpath. Using FlatFileItemReaderBuilder
, it specifies the reader’s name as "customerFileItemReader"
and sets the file as the input resource. The .delimited()
method indicates that the file uses a delimiter (such as a comma) to separate values, and .names("id", "name", "email")
maps these columns to the corresponding fields in the Customer
class.
The .linesToSkip(1)
directive ensures that the first row (typically the header) is ignored, and .targetType(Customer.class)
instructs Spring Batch to automatically convert each row into a Customer
object.
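Conceptually, the reader tokenizes each delimited line and binds the named columns to the target type's fields. The following framework-free sketch mimics that header-skip-and-map logic; the `Customer` record and `parse` method here are simplified stand-ins for illustration, not what `FlatFileItemReader` actually does internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CsvSketch {
    // Simplified customer holder for this sketch.
    record Customer(long id, String name, String email) {}

    // Mimics linesToSkip(1) + delimited() + names("id", "name", "email"):
    // drops the header row, then maps each remaining line to a Customer.
    static List<Customer> parse(List<String> lines) {
        List<Customer> customers = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) { // skip header
            String[] cols = line.split(",");
            customers.add(new Customer(
                    Long.parseLong(cols[0].trim()),
                    cols[1].trim(),
                    cols[2].trim()));
        }
        return customers;
    }

    public static void main(String[] args) {
        List<String> file = Arrays.asList(
                "id,name,email",
                "1,John Fish,john.fish@jcg.com",
                "2,Dane Smith,dane.smith@jcg.com");
        parse(file).forEach(System.out::println);
    }
}
```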
3.3 Implement a Database Item Reader
In addition to reading customer records from a CSV file, we also need to retrieve customer data from a database. Spring Batch provides JdbcCursorItemReader, which allows us to read data efficiently using SQL queries. Below, we define a database reader that fetches customer details from an in-memory H2 database.
Customer Table Structure
The customer data is stored in a database table named customers with three columns: id (BIGINT, primary key), name (VARCHAR(255)), and email (VARCHAR(255)).
The following JdbcCursorItemReader
reads customer data from this table.
```java
@Bean
public JdbcCursorItemReader<Customer> databaseCustomerReader() {
    String sql = "select * from customers";
    return new JdbcCursorItemReaderBuilder<Customer>()
            .name("customerTableItemReader")
            .dataSource(dataSource())
            .sql(sql)
            .beanRowMapper(Customer.class)
            .build();
}

@Bean
public DataSource dataSource() {
    return DataSourceBuilder.create()
            .driverClassName("org.h2.Driver")
            .url("jdbc:h2:mem:batchdb;DB_CLOSE_DELAY=-1;")
            .username("sa")
            .password("")
            .build();
}
```
The above databaseCustomerReader
bean defines a JdbcCursorItemReader<Customer>
that retrieves customer records from our relational database. It uses JdbcCursorItemReaderBuilder
to configure the reader, setting its name as "customerTableItemReader"
. The dataSource()
method provides the database connection details, ensuring that the reader can interact with the database. The .beanRowMapper(Customer.class)
method automatically maps the retrieved rows to instances of the Customer
class, allowing Spring Batch to process them as structured objects.
The dataSource
bean configures an in-memory H2 database for testing and batch processing. With this setup, our batch job can now read customer records directly from the database and process them alongside CSV data.
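The essence of `.beanRowMapper(Customer.class)` is matching column names to bean properties for each row of the result set. As a rough illustration, here is the same idea hand-rolled over a plain `Map` row; the `Customer` record and `mapRow` method are hypothetical stand-ins with no JDBC involved, not Spring's actual `BeanPropertyRowMapper`.

```java
import java.util.List;
import java.util.Map;

public class RowMapperSketch {
    record Customer(long id, String name, String email) {}

    // Simplified analogue of a row mapper: pull each named column out of
    // the row and populate the target object's fields.
    static Customer mapRow(Map<String, Object> row) {
        return new Customer(
                ((Number) row.get("id")).longValue(),
                (String) row.get("name"),
                (String) row.get("email"));
    }

    public static void main(String[] args) {
        // Stand-in for a JDBC result set: one Map per row.
        List<Map<String, Object>> resultSet = List.of(
                Map.of("id", 101, "name", "Alice Johnson",
                        "email", "alice.johnson@jcg.com"),
                Map.of("id", 102, "name", "Bob Williams",
                        "email", "bob.williams@jcg.com"));
        resultSet.stream().map(RowMapperSketch::mapRow)
                .forEach(System.out::println);
    }
}
```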
3.4 Creating the Composite Item Reader
Now that we have set up separate readers for the CSV file and the database, we need to combine them into a single reader. This ensures that our batch job can read customer records from both sources. Spring Batch 5.2 provides CompositeItemReader
, which allows us to merge multiple readers into one logical unit.
```java
@Bean
public CompositeItemReader<Customer> itemReader() {
    return new CompositeItemReader<>(Arrays.asList(fileReader(), databaseCustomerReader()));
}
```
This configuration defines a CompositeItemReader<Customer>
that merges data from both the CSV file reader and the database reader, allowing customer records from both sources to be processed as if they originated from a single reader. By combining these readers, the batch job can efficiently handle customer data from multiple sources without requiring distinct processing logic for each, ensuring a unified approach to data reading.
4. Configuring the Batch Job
Now that we have defined the item readers, we need to configure the batch job to process the customer data. A batch job in Spring Batch consists of steps, where each step includes an ItemReader, an ItemProcessor (optional), and an ItemWriter. Here, we define a simple batch job that reads customer records from the composite reader and writes them to the console.
```java
@Bean
public Step customerStep(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
    return new StepBuilder("customerStep", jobRepository)
            .<Customer, Customer>chunk(10, transactionManager)
            .reader(itemReader())
            .writer(customerWriter())
            .build();
}

@Bean
public Job customerJob(JobRepository jobRepository, Step customerStep) {
    return new JobBuilder("customerJob", jobRepository)
            .start(customerStep)
            .build();
}
```
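The step references a customerWriter() bean that prints the records to the console, but the article does not show its definition. One minimal possibility, using Spring Batch 5's Chunk-based ItemWriter contract, is sketched below; this is an assumed implementation, not the article's original code.

```java
@Bean
public ItemWriter<Customer> customerWriter() {
    // Print each item in the chunk; a real job would typically write
    // to a file, queue, or database table instead.
    return chunk -> chunk.getItems().forEach(System.out::println);
}
```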
This configuration defines a Spring Batch job that processes customer data in steps. The customerStep
bean creates a batch step named "customerStep"
, which reads and writes Customer
objects in chunks of 10 using an itemReader()
and customerWriter()
. It is managed by the provided JobRepository
and PlatformTransactionManager
, ensuring proper execution and transaction handling.
The customerJob
bean sets up a batch job called "customerJob"
that begins with customerStep
, so the step executes as part of the job's run. Together, these two beans define how the batch job reads and processes the customer data.
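Chunk-oriented processing means the step repeatedly calls the reader until it has accumulated chunk-size items (10 here), then hands the whole group to the writer in a single transaction. The loop below is a framework-free sketch of that batching behavior; the `runChunks` method is an illustration, not Spring Batch's actual step implementation (which also handles transactions, restarts, and skips).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ChunkSketch {
    // Reads items in groups of chunkSize and "writes" each full group at
    // once, mirroring how a chunk-oriented step batches its work.
    static <T> List<List<T>> runChunks(Iterator<T> reader, int chunkSize) {
        List<List<T>> written = new ArrayList<>();
        List<T> chunk = new ArrayList<>();
        while (reader.hasNext()) {
            chunk.add(reader.next());
            if (chunk.size() == chunkSize) {
                written.add(chunk);        // writer invoked per full chunk
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) {
            written.add(chunk);            // final, partially filled chunk
        }
        return written;
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 1; i <= 25; i++) {
            items.add(i);
        }
        // 25 items with a chunk size of 10 -> writes of 10, 10, and 5 items.
        runChunks(items.iterator(), 10)
                .forEach(c -> System.out.println("wrote " + c.size() + " items"));
    }
}
```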
5. Executing the Batch Job
After setting up the batch job, we need a way to trigger its execution. In a Spring Boot application, we can use a CommandLineRunner
to start the job when the application runs. This ensures that our batch process begins automatically and executes according to the defined steps. The following code configures the execution of the batch job:
```java
@Bean
public CommandLineRunner run(JobLauncher jobLauncher, Job customerJob) {
    return args -> {
        JobParameters jobParameters = new JobParametersBuilder()
                .addLong("startTime", System.currentTimeMillis())
                .toJobParameters();
        JobExecution jobExecution = jobLauncher.run(customerJob, jobParameters);
        System.out.println("Job Status: " + jobExecution.getStatus());
    };
}
```
In this code, the run
method is defined as a Spring bean using CommandLineRunner
, which executes when the application starts. It creates JobParameters
with the current timestamp to ensure a fresh execution each time. The jobLauncher.run()
method is then used to trigger the customerJob
, and its execution status is printed to the console. This setup ensures that the batch job runs automatically whenever the application starts, making it easy to test and deploy.
6. Running the Spring Batch Program and Confirming the Results
Now that we have implemented our Spring Batch job, let’s see how to run it and confirm that the CompositeItemReader correctly reads customer records from both the CSV file and the database.
Sample CSV File (customers.csv
)
Before running the application, let’s define a sample CSV file that contains customer data. This file serves as one of the data sources for our CompositeItemReader
.
```
id,name,email
1,John Fish,john.fish@jcg.com
2,Dane Smith,dane.smith@jcg.com
```
Each line represents a customer with their ID, name, and email address.
Inserting Sample Data into the Database
To ensure that our batch job processes customers from the database, we need to insert some sample records into the customers table before execution. We can create a data.sql
file inside the src/main/resources
directory. Because the job uses an embedded H2 database, Spring Boot executes this script automatically at startup by default; for a non-embedded database, set spring.sql.init.mode=always
(or, on Spring Boot versions before 2.5, spring.datasource.initialization-mode=always
) in application.properties
.
Create src/main/resources/data.sql
:
```sql
INSERT INTO customers (id, name, email) VALUES (101, 'Alice Johnson', 'alice.johnson@jcg.com');
INSERT INTO customers (id, name, email) VALUES (102, 'Bob Williams', 'bob.williams@jcg.com');
INSERT INTO customers (id, name, email) VALUES (103, 'Charlie Brown', 'charlie.brown@jcg.com');
```
Example schema.sql
file to define the table:
```sql
CREATE TABLE IF NOT EXISTS customers (
    id BIGINT PRIMARY KEY,
    name VARCHAR(255),
    email VARCHAR(255)
);
```
Build and Run the Application
To build and run the application with Maven, use the command mvn clean spring-boot:run
, which triggers Spring Batch to execute the job and prints customer records from both the CSV file and the database in the console. You should see customer records from both the CSV file and the database, printed as follows:
This confirms that the job successfully read customer data from the customers.csv
file, retrieved pre-inserted customers from the database, and merged records from both sources using CompositeItemReader to process them as a single data stream. Additionally, the job execution was completed successfully.
7. Conclusion
In this article, we explored how to use CompositeItemReader
in Spring Batch to read and process customer data from multiple sources, including a CSV file and a database. We started by setting up the necessary FlatFileItemReader
and JdbcCursorItemReader
, combined them using CompositeItemReader
, combined them using CompositeItemReader
, and configured a batch job to process the merged data stream. We also demonstrated how to execute the batch job and confirm that records from both sources were correctly read and processed.
8. Download the Source Code
You can download the full source code of this example here: spring batch composite item reader