Excel File Processing with Spring Batch and JPA

Omozegie AziegbeJuly 17th, 2024Last Updated: July 17th, 2024

0 1,091 5 minutes read

In this article, we will demonstrate how to read all the rows from an Excel (.xls or .xlsx) file using Spring Batch and save them into a database in a Spring Boot application. We will cover the entire process from reading the Excel file, converting the rows into Entity instances, saving these instances into the database, logging the progress, and scheduling the batch job with a cron expression.

1. Prerequisites

Ensure you have the following:

JDK 17 or later
Maven or Gradle
Spring Boot 3.2.7 or later
H2 or any other database
Apache POI for reading Excel files

2. Setting Up the Project

Here is the pom.xml configuration for a Maven project:

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-batch</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-jpa</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.batch.extensions</groupId>
            <artifactId>spring-batch-excel</artifactId>
            <version>0.1.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-ooxml</artifactId>
            <version>5.2.5</version>
        </dependency>
        <dependency>
            <groupId>com.h2database</groupId>
            <artifactId>h2</artifactId>
            <scope>runtime</scope>
        </dependency>
   </dependencies>

Sample Excel File

Below is a sample Excel file named employees.xlsx used in this article:

This Excel file contains a list of employees, with each row representing an individual employee’s details. The columns in the Excel file correspond directly to the fields in the Employee entity class:

Name: This column maps to the name field in the Employee entity.
Department: This column maps to the department field in the Employee entity.
Email: This column maps to the email field in the Employee entity.

Define the Employee Entity

The Employee entity class is defined as follows:

@Entity
public class Employee {
    
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    private String name;
    private String department;
    private String email;

    public Long getId() {
        return id;
    }

    public void setId(Long id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getDepartment() {
        return department;
    }

    public void setDepartment(String department) {
        this.department = department;
    }

    public String getEmail() {
        return email;
    }

    public void setEmail(String email) {
        this.email = email;
    }

    @Override
    public String toString() {
        return "Employee{" + "id=" + id + ", name=" + name + ", department=" + department + ", email=" + email + '}';
    }    
}

Create the EmployeeRepository

Create a repository interface for the Employee entity.

@Repository
public interface EmployeeRepository extends JpaRepository{
    
}

3. Implement the Excel Reader with PoiItemReader

Spring Batch Extension provides PoiItemReader for reading Excel files. Here, we will configure it to process the employee data:

@Configuration
public class PoiReader {
    
    @Value("${excel.file.path}")
    private String filePath;

    @Bean
    public ItemReader<Employee> employeeReader() {
        PoiItemReader<Employee> reader = new PoiItemReader<>();
        reader.setResource(new ClassPathResource(filePath));
        reader.setLinesToSkip(1);
        reader.setRowMapper(new BeanWrapperRowMapper<Employee>() {
            {
                setTargetType(Employee.class);
            }
        });
        reader.setName("employeeReader");
        return reader;
    }
    
}

@Value annotation injects the file path from application properties (e.g., excel.file.path=employees.xlsx).
PoiItemReader is configured with the Excel file location as a ClasspathResource.
BeanWrapperRowMapper automatically maps column names to the corresponding fields in the Employee class.
setLinesToSkip(1) ensures the header row is skipped.

4. Saving to Database with JpaRepository

Next, we will leverage the Spring Data JPA EmployeeRepository to save the processed employee data to the database.

    @Bean
    public ItemWriter employeeWriter(EmployeeRepository repository) {
        return items -> {
            for (Employee employee : items) {
                repository.save(employee);
                System.out.println("Employee saved: " + employee.getName());
            }
        };
    }

This ItemWriter iterates through a list of Employee objects and saves them using the EmployeeRepository.

5. Configure Spring Batch Job

Here is the full example code used in this article to configure a Spring Batch job that reads employee data from an Excel file, processes the data, and writes it into a database. Each step has be broken down into smaller components – ItemReader, ItemProcessor, and ItemWriter.

@Configuration
public class PoiReader {

    @Value("${excel.file.path}")
    private String filePath;

    private final JobRepository jobRepository;

    private final PlatformTransactionManager transactionManager;

    private final EmployeeRepository employeeRepository;

    public PoiReader(JobRepository jobRepository, EmployeeRepository employeeRepository, PlatformTransactionManager transactionManager) {
        this.jobRepository = jobRepository;
        this.transactionManager = transactionManager;
        this.employeeRepository = employeeRepository;
    }

    @Bean
    public ItemReader<Employee> employeeReader() {
        PoiItemReader<Employee> reader = new PoiItemReader<>();
        reader.setResource(new ClassPathResource(filePath));
        reader.setLinesToSkip(1);
        reader.setRowMapper(new BeanWrapperRowMapper<Employee>() {
            {
                setTargetType(Employee.class);
            }
        });
        reader.setName("employeeReader");
        return reader;
    }

    @Bean
    public ItemWriter<Employee> employeeWriter() {
        return items -> {
            for (Employee employee : items) {
                employeeRepository.save(employee);
                System.out.println("Employee saved: " + employee.getName());
            }
        };
    }

    @Bean
    public ItemProcessor<Employee, Employee> processor() {
        return employee -> {
            // Example processor logic
            employee.setName(employee.getName());
            System.out.println("Name: " + employee.getName() + ", Department: " + employee.getDepartment());
            return employee;
        };
    }

    @Bean
    public Step chunkProcessingStep() {

        var builder = new StepBuilder("chunkProcessingStep", jobRepository);
        return builder
                .<Employee, Employee>chunk(1, transactionManager)
                .reader(employeeReader())
                .processor(processor())
                .writer(employeeWriter())
                .build();
    }

    @Bean
    Job importUserJob(Step step1) {

        var builder = new JobBuilder("importUserJob", jobRepository);
        return builder
                .incrementer(new RunIdIncrementer())
                .start(step1)
                .build();
    }

}

The above code defines a Spring Batch configuration that uses @Value to inject the path to the Excel file, and it includes constructor injection for JobRepository, EmployeeRepository, and PlatformTransactionManager, which are used for managing job execution, database operations, and transaction management, respectively.

The class defines several beans crucial for batch processing:

employeeReader(): This bean uses PoiItemReader to read data from an Excel file. It skips the header row and maps each row to an Employee object using BeanWrapperRowMapper.
employeeWriter(): This bean writes the processed Employee objects to the database using the EmployeeRepository.
processor(): This bean processes each Employee object. In this example, it prints the employee’s name and department.
chunkProcessingStep(): This bean defines a step in the batch job, specifying a chunk size of 1. It combines the reader, processor, and writer to handle the chunk of data.
importUserJob(): This bean defines the batch job itself, starting with the defined step and using RunIdIncrementer to ensure unique job runs.

6. Scheduling the Batch Job

Once the batch job is configured, the next step is to schedule it to run at specific intervals. Spring Boot provides scheduling capabilities through the @EnableScheduling annotation and the @Scheduled annotation. By using these annotations, we can easily schedule batch jobs to run at fixed intervals, such as hourly, daily, or based on more complex cron expressions.

In our example, we will schedule the batch job to run at the top of every hour. This ensures that our employee data is read from the Excel file and written to the database on an hourly basis.

@Configuration
@EnableScheduling
public class BatchScheduler {

    @Autowired
    private JobLauncher jobLauncher;

    @Autowired
    private Job importUserJob;

    @Scheduled(cron = "0 0 * * * ?") // Run at the top of every hour
    public void perform() throws Exception {
        jobLauncher.run(importUserJob, new JobParameters());
    }
       
}

The code snippet above schedules a batch job to run at the top of every hour. The class is annotated with @EnableScheduling to enable Spring’s scheduled task execution capability. It autowires JobLauncher and Job beans to launch and execute the batch job named importUserJob.

The @Scheduled(cron = "0 0 * * * ?") annotation specifies that the perform() method should run at the start of every hour, triggering the batch job.

Output

When we run the application with the configured Spring Batch job, it initializes all configured beans and sets up the batch job and scheduler and upon successful completion of the job, a summary log entry is created as shown below:

7. Conclusion

In this article, we explored how to implement a Spring Batch Excel reader to efficiently read employee data from Excel files and store it in a database. We covered each step, from setting up the project and defining an entity to configuring the batch job and scheduling it with cron expressions.