Implementing an Elasticsearch Aggregation Query in Java
Elasticsearch is a distributed search engine that is designed for scalability, flexibility, and high performance. It allows us to store, search, and analyze large volumes of data quickly. One of its key features is the capability to perform aggregations, allowing us to conduct complex data analysis and extract statistical insights from our indexed data. In this article, we will walk through how to add an aggregation to an Elasticsearch query using Spring Data Elasticsearch.
1. Understanding Aggregations
Aggregations in Elasticsearch are tools that enable us to perform complex data analysis and generate insightful metrics on our indexed data. They work by processing and grouping our data in various ways, allowing us to compute statistics, histograms, and other advanced analytics.
1.1 Sample Application Overview
The sample application used in this article is designed to manage a collection of Product entities, which include fields such as id
, name
, category
, and price
. The application’s primary focus is to demonstrate how to perform aggregation queries in Elasticsearch using Spring Data, specifically how to count the number of documents per category.
To create an index named product-items
in Elasticsearch and add a few documents to it, we will use the curl
command to interact with Elasticsearch’s REST API.
First, create the product-items
index.
curl -X PUT "localhost:9200/product-items" -H 'Content-Type: application/json' -d' { "settings": { "number_of_shards": 1, "number_of_replicas": 1 } } '
This command will create an index named product-items
with one shard and one replica.
Next, add a few documents to the product-items
index. Each document will represent a product item, with fields like name
, price
, and category
.
Document 1
curl -X POST "localhost:9200/product-items/_doc/1" -H 'Content-Type: application/json' -d' { "name": "Laptop", "price": 999.99, "category": "Electronics" } '
Document 2
curl -X POST "localhost:9200/product-items/_doc/2" -H 'Content-Type: application/json' -d' { "name": "Smartphone", "price": 699.99, "category": "Electronics" } '
Document 3
curl -X POST "localhost:9200/product-items/_doc/3" -H 'Content-Type: application/json' -d' { "name": "Vans", "price": 399.99, "category": "Clothing" } '
To verify that the documents were added successfully, we can retrieve them using a search query:
curl -X GET "localhost:9200/product-items/_search" -H 'Content-Type: application/json' -d' { "query": { "match_all": {} } } '
1.2 How Aggregations Work
Aggregations operate on the documents that match a query. Aggregations allow us to perform statistical operations on groups of documents. For instance, we can count the number of documents matching a specific criteria, calculate averages, find minimum and maximum values, or group data based on specific fields.
1.2.1 Basic Structure of an Elasticsearch Query with Aggregations
An Elasticsearch query with aggregations has the following structure:
{ "query": { "match_all": {} }, "aggs": { "by_category": { "terms": { "field": "category.keyword" }, "aggs": { "average_price": { "avg": { "field": "price" } } } } } }
In this query:
- The
match_all
query matches all documents. - The
terms
aggregationby_category
groups documents by thecategory
field. - Within each category bucket, an
avg
aggregationaverage_price
calculates the average price of products.
2. Project Setup
First, we set up a Spring Boot project with the necessary dependencies. In the pom.xml
file, include the following dependencies:
<dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-data-elasticsearch</artifactId> </dependency> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-data-jpa</artifactId> </dependency> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-web</artifactId> </dependency> <dependency> <groupId>org.elasticsearch.client</groupId> <artifactId>elasticsearch-rest-high-level-client</artifactId> <version>7.17.11</version> </dependency>
2.1 Configure Elasticsearch
Next, we set up Elasticsearch by creating a configuration class.
@Configuration @EnableElasticsearchRepositories public class ElasticsearchConfig { @Bean public RestClient restHighLevelClient() { return RestClient.builder( new HttpHost("localhost", 9200, "http"), new HttpHost("localhost", 9201, "http")) .build(); } @Bean public ElasticsearchClient elasticsearchClient(RestClient restClient) { return ElasticsearchClients.createImperative(restClient); } @Bean(name = { "elasticsearchOperations", "elasticsearchTemplate" }) public ElasticsearchOperations elasticsearchOperations(ElasticsearchClient searchClient) { ElasticsearchTemplate template = new ElasticsearchTemplate(searchClient); template.setRefreshPolicy(RefreshPolicy.NONE); return template; } }
RestClient
Bean: TheRestClient
is the main client that interacts with Elasticsearch. It is set up to connect to an Elasticsearch cluster running onlocalhost
at ports9200
and9201
. You can adjust theHttpHost
values to match your specific Elasticsearch setup.- The
ElasticsearchClient
is a client provided by Elasticsearch. We create theElasticsearchClient
by wrapping theRestClient
using theElasticsearchClients.createImperative()
method. ElasticsearchOperations
is an abstraction provided by Spring Data Elasticsearch, which simplifies the interaction with Elasticsearch.ElasticsearchTemplate
is an implementation ofElasticsearchOperations
, leveraging theElasticsearchClient
for its operations.
2.2 Create the Model
Next, let’s define the Product
class. This class will represent the documents stored in our Elasticsearch index.
@Document(indexName = "product-items") public class Product { @Id private String id; @Field(type = Keyword) private String name; @Field(type = Keyword) private String category; @Field(type = Keyword) private float price; public String getId() { return id; } public void setId(String id) { this.id = id; } public String getName() { return name; } public void setName(String name) { this.name = name; } public String getCategory() { return category; } public void setCategory(String category) { this.category = category; } public float getPrice() { return price; } public void setPrice(float price) { this.price = price; } }
This code snippet defines a Java class Product
that is used to map documents in an Elasticsearch index named product-items
using Spring Data Elasticsearch. The @Document
annotation specifies that instances of the Product
class will be indexed in Elasticsearch under the index name product-items
.
Within the Product
class, the @Field
annotations are used to define the type of each field in the Elasticsearch index. In this case, all fields (name
, category
, price
) are of type Keyword
. The Keyword
type is suitable for exact matches and aggregations, and is typically used for fields where the value should not be analyzed, such as categories.
2.3 Create the Repository
Next, create an interface to return the number of products grouped by category.
import org.springframework.data.domain.Pageable; import org.springframework.data.elasticsearch.core.SearchPage; public interface ProductItemRepository { SearchPage<Product> getProductsCountByCategory(Pageable pageable); }
This interface declares a method that will return the number of products grouped by category, with results paginated according to the Pageable
parameter.
Next, create a repository interface for Product
:
@Repository public interface ProductRepository extends ElasticsearchRepository<Product, String>, ProductItemRepository { }
Here, we are extending ElasticsearchRepository<Product, String>
which is a Spring Data Elasticsearch interface that provides standard CRUD (Create, Read, Update, Delete) operations for the Product
entity. By extending ProductItemRepository
interface, the ProductRepository
inherits the custom method to perform the query on our Product
entity.
2.4 Add Aggregations
Now, let’s focus on adding aggregations. We will add a simple aggregation to count the number of products in each category. We will create a service class that uses ElasticsearchOperations
to execute the aggregation queries.
@Service public class ProductService implements ProductItemRepository { @Autowired private ElasticsearchOperations elasticsearchOperations; @Override public SearchPage<Product> getProductsCountByCategory(Pageable pageable) { Query query = NativeQuery.builder() .withAggregation("category_aggregation", Aggregation.of(b -> b.terms(t -> t.field("category.keyword")))) .build(); SearchHits<Product> response = elasticsearchOperations.search(query, Product.class); return SearchHitSupport.searchPageFor(response, pageable); } public Map<String, Long> countByCategory() { Query query = NativeQuery.builder() .withAggregation("category_aggregation", Aggregation.of(b -> b.terms(t -> t.field("category.keyword")))) .build(); SearchHits<Product> searchHits = elasticsearchOperations.search(query, Product.class); Map<String, Long> aggregatedData = ((ElasticsearchAggregations) searchHits .getAggregations()) .get("category_aggregation") .aggregation() .getAggregate() .sterms() .buckets() .array() .stream() .collect(Collectors.toMap(bucket -> bucket.key().stringValue(), MultiBucketBase::docCount)); return aggregatedData; } }
Here, we use NativeQuery.builder()
to build a query with a terms aggregation. In the getProductsCountByCategory(Pageable pageable)
method, the aggregation, named "category_aggregation"
, counts the number of documents for each unique value in the category
field. The query is executed using elasticsearchOperations.search()
, and the result is converted to a SearchPage
for pagination.
The countByCategory
method executes a terms aggregation query that counts the number of documents in each category.
2.5 Expose the Aggregation Result via REST
Finally, expose the aggregation results via a REST controller:
@RestController public class ProductController { @Autowired private ProductService productService; @GetMapping("/products-count-by-category") public List<Product> getProductsCountByCategory(@RequestParam(defaultValue = "2") int size) { SearchHits<Product> searchHits = productService.getProductsCountByCategory(Pageable.ofSize(size)) .getSearchHits(); return searchHits.getSearchHits() .stream() .map(SearchHit::getContent) .collect(Collectors.toList()); } @GetMapping("/count-by-category") public Map<String, Long> countByCategory() { return productService.countByCategory(); }
The ProductController
class exposes two endpoints.
- The
/products-count-by-category
endpoint takessize
as a query parameter for pagination. It calls thegetProductsCountByCategory()
method from theProductService
to fetch the aggregated data. - The
/count-by-category
endpoint calls thecountByCategory
method and returns the counts as a JSON object.
3. Running and Testing the Application
With everything set up, we can run the application. The endpoint /products-count-by-category
will return a list of Product
objects, grouped by category with the number of documents (products) in each category.
We can test the /products-count-by-category
endpoint using the following curl
command:
curl -X GET "http://localhost:8080/products-count-by-category?size=2" -H "Content-Type: application/json"
We can test the new /count-by-category
endpoint using the following curl
command:
curl -X GET "http://localhost:8080/count-by-category" -H "Content-Type: application/json"
The response will be a JSON object where the keys are the category names and the values are the number of documents in each category. The screenshot below shows the output when viewed from a web browser.
This output shows that there are 2 products in the “Electronics” category and 1 in “Clothing.” The exact counts will depend on the data in your Elasticsearch index.
4. Conclusion
In this article, we’ve explored how to integrate and configure Elasticsearch in a Spring Boot application. We set up a basic configuration using a configuration class, which included defining essential beans like RestClient
, ElasticsearchClient
, and ElasticsearchOperations
. These components are crucial for interacting with Elasticsearch leveraging advanced features like aggregations.
5. Download the Source Code
This was an article on how to perform an Elasticsearch aggregation query.
You can download the full source code of this example here: elasticsearch aggregation query