Enterprise Java

Implementing an Elasticsearch Aggregation Query in Java

Elasticsearch is a distributed search engine that is designed for scalability, flexibility, and high performance. It allows us to store, search, and analyze large volumes of data quickly. One of its key features is the capability to perform aggregations, allowing us to conduct complex data analysis and extract statistical insights from our indexed data. In this article, we will walk through how to add an aggregation to an Elasticsearch query using Spring Data Elasticsearch.

1. Understanding Aggregations

Aggregations in Elasticsearch are tools that enable us to perform complex data analysis and generate insightful metrics on our indexed data. They work by processing and grouping our data in various ways, allowing us to compute statistics, histograms, and other advanced analytics.

1.1 Sample Application Overview

The sample application used in this article is designed to manage a collection of Product entities, which include fields such as id, name, category, and price. The application’s primary focus is to demonstrate how to perform aggregation queries in Elasticsearch using Spring Data, specifically how to count the number of documents per category.

To create an index named product-items in Elasticsearch and add a few documents to it, we will use the curl command to interact with Elasticsearch’s REST API.

First, create the product-items index.

curl -X PUT "localhost:9200/product-items" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}
'

This command will create an index named product-items with one shard and one replica.

Next, add a few documents to the product-items index. Each document will represent a product item, with fields like name, price, and category.

Document 1

curl -X POST "localhost:9200/product-items/_doc/1" -H 'Content-Type: application/json' -d'
{
  "name": "Laptop",
  "price": 999.99,
  "category": "Electronics"
}
'

Document 2

curl -X POST "localhost:9200/product-items/_doc/2" -H 'Content-Type: application/json' -d'
{
  "name": "Smartphone",
  "price": 699.99,
  "category": "Electronics"
}
'

Document 3

curl -X POST "localhost:9200/product-items/_doc/3" -H 'Content-Type: application/json' -d'
{
  "name": "Vans",
  "price": 399.99,
  "category": "Clothing"
}
'

To verify that the documents were added successfully, we can retrieve them using a search query:

curl -X GET "localhost:9200/product-items/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match_all": {}
  }
}
'

1.2 How Aggregations Work

Aggregations operate on the documents that match a query. Aggregations allow us to perform statistical operations on groups of documents. For instance, we can count the number of documents matching a specific criteria, calculate averages, find minimum and maximum values, or group data based on specific fields.

1.2.1 Basic Structure of an Elasticsearch Query with Aggregations

An Elasticsearch query with aggregations has the following structure:

{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "by_category": {
      "terms": {
        "field": "category.keyword"
      },
      "aggs": {
        "average_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}

In this query:

  • The match_all query matches all documents.
  • The terms aggregation by_category groups documents by the category field.
  • Within each category bucket, an avg aggregation average_price calculates the average price of products.

2. Project Setup

First, we set up a Spring Boot project with the necessary dependencies. In the pom.xml file, include the following dependencies:

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-jpa</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-high-level-client</artifactId>
            <version>7.17.11</version>
        </dependency>

2.1 Configure Elasticsearch

Next, we set up Elasticsearch by creating a configuration class.

@Configuration
@EnableElasticsearchRepositories
public class ElasticsearchConfig {

    @Bean
    public RestClient restHighLevelClient() {
        return RestClient.builder(
                new HttpHost("localhost", 9200, "http"),
                new HttpHost("localhost", 9201, "http"))
                .build();
    }

    @Bean
    public ElasticsearchClient elasticsearchClient(RestClient restClient) {
        return ElasticsearchClients.createImperative(restClient);
    }

    @Bean(name = { "elasticsearchOperations", "elasticsearchTemplate" })
    public ElasticsearchOperations elasticsearchOperations(ElasticsearchClient searchClient) {

        ElasticsearchTemplate template = new ElasticsearchTemplate(searchClient);
        template.setRefreshPolicy(RefreshPolicy.NONE);
        return template;
    }
}

  • RestClient Bean: The RestClient is the main client that interacts with Elasticsearch. It is set up to connect to an Elasticsearch cluster running on localhost at ports 9200 and 9201. You can adjust the HttpHost values to match your specific Elasticsearch setup.
  • The ElasticsearchClient is a client provided by Elasticsearch. We create the ElasticsearchClient by wrapping the RestClient using the ElasticsearchClients.createImperative() method.
  • ElasticsearchOperations is an abstraction provided by Spring Data Elasticsearch, which simplifies the interaction with Elasticsearch. ElasticsearchTemplate is an implementation of ElasticsearchOperations, leveraging the ElasticsearchClient for its operations.

2.2 Create the Model

Next, let’s define the Product class. This class will represent the documents stored in our Elasticsearch index.

@Document(indexName = "product-items")
public class Product {
    
    @Id
    private String id;
    
    @Field(type = Keyword)
    private String name;
    
    @Field(type = Keyword)
    private String category;
    
    @Field(type = Keyword)
    private float price;

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getCategory() {
        return category;
    }

    public void setCategory(String category) {
        this.category = category;
    }

    public float getPrice() {
        return price;
    }

    public void setPrice(float price) {
        this.price = price;
    }   
}

This code snippet defines a Java class Product that is used to map documents in an Elasticsearch index named product-items using Spring Data Elasticsearch. The @Document annotation specifies that instances of the Product class will be indexed in Elasticsearch under the index name product-items.

Within the Product class, the @Field annotations are used to define the type of each field in the Elasticsearch index. In this case, all fields (name, category, price) are of type Keyword. The Keyword type is suitable for exact matches and aggregations, and is typically used for fields where the value should not be analyzed, such as categories.

2.3 Create the Repository

Next, create an interface to return the number of products grouped by category.

import org.springframework.data.domain.Pageable;
import org.springframework.data.elasticsearch.core.SearchPage;

public interface ProductItemRepository {

    SearchPage<Product> getProductsCountByCategory(Pageable pageable);
}

This interface declares a method that will return the number of products grouped by category, with results paginated according to the Pageable parameter.

Next, create a repository interface for Product:

@Repository
public interface ProductRepository extends ElasticsearchRepository<Product, String>, ProductItemRepository {

}

Here, we are extending ElasticsearchRepository<Product, String> which is a Spring Data Elasticsearch interface that provides standard CRUD (Create, Read, Update, Delete) operations for the Product entity. By extending ProductItemRepository interface, the ProductRepository inherits the custom method to perform the query on our Product entity.

2.4 Add Aggregations

Now, let’s focus on adding aggregations. We will add a simple aggregation to count the number of products in each category. We will create a service class that uses ElasticsearchOperations to execute the aggregation queries.

@Service
public class ProductService implements ProductItemRepository {

    @Autowired
    private ElasticsearchOperations elasticsearchOperations;

    @Override
    public SearchPage<Product> getProductsCountByCategory(Pageable pageable) {
        Query query = NativeQuery.builder()
                .withAggregation("category_aggregation",
                        Aggregation.of(b -> b.terms(t -> t.field("category.keyword"))))
                .build();

        SearchHits<Product> response = elasticsearchOperations.search(query, Product.class);
        return SearchHitSupport.searchPageFor(response, pageable);
    }

        public Map<String, Long> countByCategory() {
        Query query = NativeQuery.builder()
                .withAggregation("category_aggregation",
                        Aggregation.of(b -> b.terms(t -> t.field("category.keyword"))))
                .build();

        SearchHits<Product> searchHits = elasticsearchOperations.search(query, Product.class);

        Map<String, Long> aggregatedData = ((ElasticsearchAggregations) searchHits
                .getAggregations())
                .get("category_aggregation")
                .aggregation()
                .getAggregate()
                .sterms()
                .buckets()
                .array()
                .stream()
                .collect(Collectors.toMap(bucket -> bucket.key().stringValue(), MultiBucketBase::docCount));

        return aggregatedData;
    }
}

Here, we use NativeQuery.builder() to build a query with a terms aggregation. In the getProductsCountByCategory(Pageable pageable) method, the aggregation, named "category_aggregation", counts the number of documents for each unique value in the category field. The query is executed using elasticsearchOperations.search(), and the result is converted to a SearchPage for pagination.

The countByCategory method executes a terms aggregation query that counts the number of documents in each category.

2.5 Expose the Aggregation Result via REST

Finally, expose the aggregation results via a REST controller:

@RestController
public class ProductController {

    @Autowired
    private ProductService productService;

    @GetMapping("/products-count-by-category")
    public List<Product> getProductsCountByCategory(@RequestParam(defaultValue = "2") int size) {
        SearchHits<Product> searchHits = productService.getProductsCountByCategory(Pageable.ofSize(size))
                .getSearchHits();

        return searchHits.getSearchHits()
                .stream()
                .map(SearchHit::getContent)
                .collect(Collectors.toList());
    }

    @GetMapping("/count-by-category")
    public Map<String, Long> countByCategory() {
        return productService.countByCategory();
    }


The ProductController class exposes two endpoints.

  • The /products-count-by-category endpoint takes size as a query parameter for pagination. It calls the getProductsCountByCategory() method from the ProductService to fetch the aggregated data.
  • The /count-by-category endpoint calls the countByCategory method and returns the counts as a JSON object.

3. Running and Testing the Application

With everything set up, we can run the application. The endpoint /products-count-by-category will return a list of Product objects, grouped by category with the number of documents (products) in each category.

We can test the /products-count-by-category endpoint using the following curl command:

curl -X GET "http://localhost:8080/products-count-by-category?size=2" -H "Content-Type: application/json"

We can test the new /count-by-category endpoint using the following curl command:

curl -X GET "http://localhost:8080/count-by-category" -H "Content-Type: application/json"

The response will be a JSON object where the keys are the category names and the values are the number of documents in each category. The screenshot below shows the output when viewed from a web browser.

Elasticsearch aggregation query example results displayed in a web browser.

This output shows that there are 2 products in the “Electronics” category and 1 in “Clothing.” The exact counts will depend on the data in your Elasticsearch index.

4. Conclusion

In this article, we’ve explored how to integrate and configure Elasticsearch in a Spring Boot application. We set up a basic configuration using a configuration class, which included defining essential beans like RestClient, ElasticsearchClient, and ElasticsearchOperations. These components are crucial for interacting with Elasticsearch leveraging advanced features like aggregations.

5. Download the Source Code

This was an article on how to perform an Elasticsearch aggregation query.

Download
You can download the full source code of this example here: elasticsearch aggregation query

Omozegie Aziegbe

Omos holds a Master degree in Information Engineering with Network Management from the Robert Gordon University, Aberdeen. Omos is currently a freelance web/application developer who is currently focused on developing Java enterprise applications with the Jakarta EE framework.
Subscribe
Notify of
guest

This site uses Akismet to reduce spam. Learn how your comment data is processed.

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Back to top button