CSV Import into Elasticsearch with Spring Boot
Elasticsearch is a powerful search and analytics engine used in various applications requiring fast retrieval of structured and unstructured data. Importing CSV data into Elasticsearch is a common use case, and Spring Boot makes this process seamless. This article will guide you through how to import CSV data into Elasticsearch using Spring Boot.
1. Setting Up Elasticsearch
To install Elasticsearch, follow the official installation guide provided by Elastic: Elasticsearch Installation Guide. The commands below create a Docker network and pull the Elasticsearch Docker image.
docker network create elastic
docker pull docker.elastic.co/elasticsearch/elasticsearch:8.17.1
Next, run the following command to start an Elasticsearch container:
docker run --name elasticsearch --net elastic -p 9200:9200 -it -m 1GB docker.elastic.co/elasticsearch/elasticsearch:8.17.1
This starts an Elasticsearch instance on port 9200.
Alternatively, if you have Homebrew installed on your system, you can run a few simple brew commands to quickly install Elasticsearch along with all the dependencies required for it to function properly. This ensures a smooth setup process without manually handling configurations.
brew tap elastic/tap
brew install elastic/tap/elasticsearch-full
After completing the installation, you can start Elasticsearch directly from the terminal by running the elasticsearch command.
$ elasticsearch
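Once the instance is up (whether started via Docker or Homebrew), you can confirm it is reachable with a quick request to port 9200. This is an optional sanity check, assuming security is disabled or you supply the credentials printed at first startup:

curl http://localhost:9200

A successful response contains cluster metadata such as the cluster name and Elasticsearch version.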
2. Create a Spring Boot Project and Add Dependencies
2.1 Creating the Spring Boot Project
Go to Spring Initializr and generate a Spring Boot project with the following dependencies:
- Spring Web – to create REST APIs
- Spring Data Elasticsearch – to integrate with Elasticsearch
Download and unzip the project, then open it in your IDE.
2.2 Adding Required Dependencies
Open the pom.xml file and add the following dependencies:
<!-- Elasticsearch Rest High-Level Client -->
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.17.27</version>
</dependency>

<!-- Apache Commons CSV for parsing CSV files -->
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-csv</artifactId>
    <version>1.13.0</version>
</dependency>
3. Configuring Elasticsearch Rest High-Level Client
Create a configuration class to set up the Elasticsearch client:
@Configuration
public class ElasticsearchConfig {

    @Bean
    public RestHighLevelClient client() {
        return new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http"))
        );
    }
}
This configuration creates a RestHighLevelClient bean that connects to Elasticsearch on localhost:9200.
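If your Elasticsearch instance runs with security enabled (the default for 8.x distributions), the client also needs credentials. The following is a minimal sketch using the Apache HttpClient classes bundled with the REST client, assuming a hypothetical elastic/changeme user; adjust it to match your own cluster:

@Bean
public RestHighLevelClient client() {
    // Hypothetical credentials; replace with the values for your cluster
    BasicCredentialsProvider credentialsProvider = new BasicCredentialsProvider();
    credentialsProvider.setCredentials(AuthScope.ANY,
            new UsernamePasswordCredentials("elastic", "changeme"));

    return new RestHighLevelClient(
            RestClient.builder(new HttpHost("localhost", 9200, "http"))
                    .setHttpClientConfigCallback(httpClientBuilder ->
                            httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider))
    );
}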
4. Defining a Model for CSV Data
Assuming our CSV file (books.csv) contains the following data:
id,title,author,genre
1,Spring Boot in Action,Craig Walls,Technology
2,Effective Java,Joshua Bloch,Programming
3,Clean Code,Robert C. Martin,Software Engineering
Ensure this file is placed in the src/main/resources directory of your Spring Boot application.
Create a Book.java model class:
@Document(indexName = "books")
public class Book {

    @Id
    private String id;
    private String title;
    private String author;
    private String genre;

    public Book() {
    }

    public Book(String id, String title, String author, String genre) {
        this.id = id;
        this.title = title;
        this.author = author;
        this.genre = genre;
    }

    // Standard Getters and Setters
}
- @Document(indexName = "books") tells Spring Data Elasticsearch to treat this class as an Elasticsearch document.
- @Id marks the id field as the primary identifier in Elasticsearch.
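For illustration, once imported, each row of the CSV becomes a JSON document in the books index, roughly like the following (shown here for the first record):

{
  "id": "1",
  "title": "Spring Boot in Action",
  "author": "Craig Walls",
  "genre": "Technology"
}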
5. Reading and Parsing CSV Data
We will use Apache Commons CSV to read the CSV file. Create CSVService.java:
@Service
public class CSVService {

    public List<Book> parseCSV(String filePath) {
        List<Book> books = new ArrayList<>();
        try (Reader reader = new InputStreamReader(
                new ClassPathResource(filePath).getInputStream(), StandardCharsets.UTF_8);
             CSVParser csvParser = new CSVParser(reader,
                     CSVFormat.DEFAULT.withFirstRecordAsHeader())) {

            for (CSVRecord csvRecord : csvParser) {
                Book book = new Book(
                        csvRecord.get("id"),
                        csvRecord.get("title"),
                        csvRecord.get("author"),
                        csvRecord.get("genre")
                );
                books.add(book);
            }
        } catch (IOException e) {
            // Log the failure instead of silently swallowing it
            System.out.println("Failed to read CSV file: " + e.getMessage());
        }
        return books;
    }
}
This method, parseCSV(String filePath), is responsible for reading and parsing a CSV file containing book data. It takes the file path as an argument and returns a list of Book objects. The method initializes an empty ArrayList to store the books and uses a try-with-resources block to handle file reading safely.
Inside the try block, it creates a Reader to read the CSV file from the classpath using ClassPathResource. A CSVParser is then used to parse the file, with CSVFormat.DEFAULT.withFirstRecordAsHeader() ensuring that the first row is treated as column headers rather than data.
The method iterates over each CSVRecord in the parsed CSV file. For each record, it extracts values from the “id”, “title”, “author”, and “genre” columns and constructs a new Book object, which is added to the list. Finally, the method returns the list of parsed Book objects, making them available for further processing, such as indexing into Elasticsearch.
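Note that withFirstRecordAsHeader() is marked deprecated in recent Commons CSV releases such as 1.13.0. The builder-style configuration below is the suggested replacement and behaves the same way for this use case (a sketch, not required for the example to work):

CSVFormat format = CSVFormat.DEFAULT.builder()
        .setHeader()                 // use the first record as the header
        .setSkipHeaderRecord(true)   // do not return the header row as data
        .build();
CSVParser csvParser = new CSVParser(reader, format);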
5.1 Indexing Data into Elasticsearch
Create ElasticsearchService.java to index data into Elasticsearch.
@Service
public class ElasticsearchService {

    @Autowired
    private RestHighLevelClient client;

    @Autowired
    private ObjectMapper objectMapper;

    public void indexBooks(List<Book> books) throws IOException {
        for (Book book : books) {
            IndexRequest indexRequest = new IndexRequest("books")
                    .id(book.getId())
                    .source(objectMapper.convertValue(book, Map.class));
            client.index(indexRequest, RequestOptions.DEFAULT);
        }
    }
}
The ElasticsearchService class is responsible for interacting with Elasticsearch through the Elasticsearch Rest High-Level Client. It provides a method to index a list of Book objects into an Elasticsearch index named "books". The class has two dependencies: RestHighLevelClient, which is used to communicate with Elasticsearch, and ObjectMapper, which converts Java objects into a format suitable for indexing.
The indexBooks(List<Book> books) method takes a list of books as input and iterates over them. For each Book object, it creates an IndexRequest for the "books" index, using the book’s id as the document ID. The source method converts the Book object into a Map, ensuring it is stored in a JSON-compatible format. The request is then sent to Elasticsearch using the client.index() method with default request options.
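Indexing one document per request works fine for small files, but each call is a separate network round trip. For larger imports, the same client can group documents into a single BulkRequest; here is a minimal sketch of an alternative method (not part of the original example):

public void bulkIndexBooks(List<Book> books) throws IOException {
    BulkRequest bulkRequest = new BulkRequest();
    for (Book book : books) {
        // Add each document to the bulk request instead of sending it immediately
        bulkRequest.add(new IndexRequest("books")
                .id(book.getId())
                .source(objectMapper.convertValue(book, Map.class)));
    }
    // Send all documents to Elasticsearch in a single round trip
    client.bulk(bulkRequest, RequestOptions.DEFAULT);
}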
5.2 Creating a Controller to Trigger Import
Now that we have set up our CSV parsing and Elasticsearch indexing logic, we need a way to trigger the import process. A REST controller will expose an API endpoint that reads the CSV file, converts the data into Book objects, and indexes them into Elasticsearch. This approach lets us initiate the import with a simple HTTP request.
Here’s the implementation of the controller:
@RestController
@RequestMapping("/books")
public class BookController {

    @Autowired
    private CSVService csvService;

    @Autowired
    private ElasticsearchService elasticsearchService;

    @PostMapping("/import")
    public String importCSV() throws IOException {
        List<Book> books = csvService.parseCSV("books.csv");
        elasticsearchService.indexBooks(books);
        return "CSV Imported Successfully!";
    }
}
The importCSV() method is mapped to a POST request at /books/import. When this endpoint is called, it first invokes csvService.parseCSV("books.csv"), which reads and parses the CSV file. The returned list of Book objects is then passed to elasticsearchService.indexBooks(books), which indexes the books into Elasticsearch. If the process completes without errors, the method returns "CSV Imported Successfully!" as a response.
Now, start the application and trigger the CSV import using curl, with the following command in your terminal:
curl -X POST http://localhost:8080/books/import
Example response:

CSV Imported Successfully!
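To confirm the documents were actually indexed, you can query the books index directly with a standard Elasticsearch search request (assuming the instance is still reachable on port 9200):

curl http://localhost:9200/books/_search?pretty

The hits section of the response should list the three book documents from the CSV file.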
6. Importing CSV into Elasticsearch Using Spring Batch
In addition to using a simple CSV parsing and indexing approach, we can leverage Spring Batch to efficiently handle large CSV files and import them into Elasticsearch. Spring Batch provides a scalable and robust way to process large datasets in chunks, ensuring reliability, fault tolerance, and better performance.
Spring Batch operates in three main steps:
- Reading the CSV file
- Processing and transforming data
- Writing the data to Elasticsearch
This method is useful when dealing with large CSV files that may not fit into memory at once. First, let us add the necessary dependency to our pom.xml file:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-batch</artifactId>
    <version>3.4.2</version>
</dependency>
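One practical note not shown in the original setup: Spring Batch persists job metadata in a relational database through its JobRepository, so the application also needs a DataSource on the classpath. For a quick local run, an embedded H2 database is a common choice; this is an assumption about your environment, and any JDBC-compatible database will do:

<dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
    <scope>runtime</scope>
</dependency>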
6.1 Implement CSV Reader
This block of code reads book records from the CSV file and maps them to Book objects.
@Bean
public FlatFileItemReader<Book> reader() {
    return new FlatFileItemReaderBuilder<Book>()
            .name("bookItemReader")
            .resource(new ClassPathResource("data.csv"))
            .delimited()
            .names("id", "title", "author", "genre")
            .targetType(Book.class)
            .build();
}
This code defines a FlatFileItemReader<Book> bean using FlatFileItemReaderBuilder for reading a CSV file (data.csv) and mapping its contents to the Book class. The reader is named "bookItemReader" and loads the file from the classpath. It expects a comma-separated format, mapping the columns (id, title, author, genre) to the corresponding fields in Book. The .targetType(Book.class) method automatically converts each row into a Book object. Finally, .build() constructs the reader, which will be used in the batch job to process CSV data.
6.2 Implement Processor (ItemProcessor, Optional)
We can add a processor to modify data before saving it. An ItemProcessor allows us to transform or validate data before writing it to Elasticsearch.
@Component
public class BookItemProcessor implements ItemProcessor<Book, Book> {

    @Override
    public Book process(Book book) {
        book.setTitle(book.getTitle().toUpperCase()); // Convert title to uppercase
        return book;
    }
}
The BookItemProcessor is an optional step that allows us to transform the data before writing it to Elasticsearch. In this example, the book title is converted to uppercase. If additional processing is needed, such as data validation or enrichment, it can be done here.
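As an aside, returning null from an ItemProcessor filters the item out of the chunk entirely, which is a convenient way to drop malformed rows. Here is a minimal sketch with a hypothetical validation rule, not part of the original example:

@Component
public class BookValidationProcessor implements ItemProcessor<Book, Book> {

    @Override
    public Book process(Book book) {
        // Returning null tells Spring Batch to skip this item
        if (book.getTitle() == null || book.getTitle().isBlank()) {
            return null;
        }
        return book;
    }
}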
6.3 Implement Elasticsearch Writer
This writer takes the processed Book objects and indexes them into Elasticsearch.
@Bean
public ItemWriter<Book> writer(RestHighLevelClient restHighLevelClient) {
    return books -> {
        for (Book book : books) {
            IndexRequest request = new IndexRequest("book")
                    .id(book.getId())
                    .source(Map.of(
                            "id", book.getId(),
                            "title", book.getTitle(),
                            "author", book.getAuthor(),
                            "genre", book.getGenre()
                    ));
            restHighLevelClient.index(request, RequestOptions.DEFAULT);
        }
    };
}
This ItemWriter<Book> bean writes Book objects to Elasticsearch using RestHighLevelClient. It iterates through the list of books, creating an IndexRequest for each one with the index name "book", setting the document ID from book.getId(), and mapping the book fields (id, title, author, genre) to a Map. Each request is then sent using restHighLevelClient.index(request, RequestOptions.DEFAULT), ensuring each book is stored in Elasticsearch.
6.4 Configure Spring Batch Job
This class defines the workflow and the chunk size for efficient batch processing.
@Bean
public Job importBooksJob(Step step, JobRepository jobRepository) {
    var builder = new JobBuilder("importBooksJob", jobRepository);
    return builder
            .start(step)
            .build();
}

@Bean
public Step jsonstep(
        JobRepository jobRepository,
        PlatformTransactionManager transactionManager,
        RestHighLevelClient restHighLevelClient) {
    var builder = new StepBuilder("batch-step", jobRepository);
    return builder
            .<Book, Book>chunk(4, transactionManager)
            .reader(reader())
            .writer(writer(restHighLevelClient))
            .faultTolerant()
            .skip(FlatFileParseException.class)
            .skipLimit(10)
            .build();
}
The first bean, importBooksJob, defines a Spring Batch job named "importBooksJob". It takes a Step and a JobRepository as parameters. The JobBuilder is used to create the job, specifying the jobRepository for managing job execution metadata. The job starts with the provided Step and is then built, making it ready for execution.
The second bean, jsonstep, defines a batch step named "batch-step" using StepBuilder. It takes a JobRepository, a PlatformTransactionManager, and a RestHighLevelClient as dependencies. The step processes data in chunks of 4 records at a time, using reader() to read the CSV data and writer(restHighLevelClient) to store the books in Elasticsearch. The .faultTolerant() configuration allows the step to skip up to 10 FlatFileParseException errors, ensuring minor issues don’t halt the entire batch job.
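Note that the step above does not reference the optional BookItemProcessor. If you want it applied, add a .processor(...) call between the reader and the writer; a small sketch, assuming the processor is injected into the step bean:

return builder
        .<Book, Book>chunk(4, transactionManager)
        .reader(reader())
        .processor(bookItemProcessor)   // apply the optional transformation step
        .writer(writer(restHighLevelClient))
        .faultTolerant()
        .skip(FlatFileParseException.class)
        .skipLimit(10)
        .build();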
6.5 Run the Job from a Controller
This controller exposes an API endpoint to manually start the batch import.
@RestController
@RequestMapping("/batch")
public class BatchController {

    @Autowired
    private JobLauncher jobLauncher;

    @Autowired
    private Job job;

    @GetMapping("/import")
    public String importCSV() throws Exception {
        jobLauncher.run(job, new JobParameters());
        return "Batch Import Started!";
    }
}
This BatchController is a REST controller that provides an endpoint to trigger the batch job. It is mapped to "/batch" using @RequestMapping. The JobLauncher and Job are autowired to facilitate launching the batch job. JobLauncher is responsible for executing jobs, while Job represents the defined import process.
The importCSV() method is mapped to GET /batch/import. When called, it triggers the batch job using jobLauncher.run(job, new JobParameters()), starting the CSV import process.
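One caveat worth knowing: Spring Batch treats a job plus its parameters as a unique job instance, so calling this endpoint a second time with an empty JobParameters object will fail with a JobInstanceAlreadyCompleteException once the first run succeeds. A common workaround, an addition to the original example, is to pass a unique parameter such as a timestamp:

JobParameters params = new JobParametersBuilder()
        .addLong("startedAt", System.currentTimeMillis()) // makes each run a new job instance
        .toJobParameters();
jobLauncher.run(job, params);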
7. Conclusion
In this article, we explored two methods for importing CSV data into Elasticsearch using Spring Boot: Apache Commons CSV and Spring Batch. The Apache Commons CSV approach provided a simple way to read CSV files, manually parse records, and index them into Elasticsearch using RestHighLevelClient. This method is lightweight and useful for small-scale imports or applications that don’t require advanced job management.
The Spring Batch approach offered a more structured and scalable solution for handling large CSV imports. It leveraged FlatFileItemReader for reading data, ItemProcessor for optional transformations, and ItemWriter for writing to Elasticsearch. Spring Batch’s fault tolerance, chunk-based processing, and job execution tracking make it ideal for handling large datasets efficiently.
Both methods are effective depending on our needs. Apache Commons CSV is great for quick imports, while Spring Batch provides better scalability and resilience. By implementing these techniques, we can efficiently manage and index CSV data in Elasticsearch, making it accessible for search and analytics.
8. Download the Source Code
This article covered how to import CSV data into Elasticsearch using Spring Boot.
You can download the full source code of this example here: import CSV data into Elasticsearch using Spring Boot