Spring AI Testing: AI Evaluators Example

Large Language Models (LLMs) have become an integral part of modern applications, but ensuring the accuracy and reliability of their output is crucial. Spring AI provides tools to evaluate LLM responses effectively. This article walks through Spring AI testing and the role of AI evaluators in building reliable LLM-based applications.

1. Introduction to Ollama and Testcontainers

1.1 What is Ollama?

Ollama is a platform designed to run large language models (LLMs) efficiently on local machines. It provides an easy-to-use interface for loading, running, and managing AI models without requiring cloud dependencies. Ollama is optimized for speed and performance, making it ideal for developers and researchers who want to experiment with AI models directly on their hardware.

One of the key features of Ollama is its ability to work offline, ensuring data privacy and reducing reliance on external servers. It supports various open-source models, allowing users to choose the best fit for their applications. Additionally, Ollama simplifies model deployment by providing a streamlined installation process and a user-friendly API for integration with other applications.

Overall, Ollama is a powerful tool for AI enthusiasts, developers, and businesses looking to harness the potential of large language models without the complexities of cloud-based solutions.

1.2 What is Testcontainers?

Testcontainers is an open-source library that enables developers to create lightweight, throwaway instances of databases, message brokers, and other services in Docker containers for integration testing. It simplifies the process of setting up and managing dependencies, ensuring consistent test environments. The key benefits of Testcontainers include improved test reliability, faster feedback loops, and easier setup of complex dependencies. It is widely used for testing microservices, database interactions, and cloud-native applications without requiring dedicated infrastructure. By leveraging Docker, Testcontainers provides a clean, isolated environment, reducing flakiness in tests and making continuous integration pipelines more robust.

By combining Ollama with TestContainers, developers can create a controlled environment for evaluating AI models. This approach not only improves the reliability of AI responses but also enhances the overall testing and deployment workflow of LLM-based applications.

2. Code Example

Create a Spring Boot application using Spring Initializr. The application integrates with Ollama running in a Testcontainers-managed container and evaluates LLM responses using RelevanceEvaluator and FactCheckingEvaluator.

2.1 Adding Spring Dependencies

Add the following dependencies in pom.xml file:

<dependencies>
    <!-- Spring Boot Web -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
 
    <!-- Spring AI Core -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-core</artifactId>
        <version>0.8.0</version>
    </dependency>
 
    <!-- TestContainers for Ollama -->
    <dependency>
        <groupId>org.testcontainers</groupId>
        <artifactId>testcontainers</artifactId>
        <version>1.19.3</version>
    </dependency>
 
    <!-- JUnit for Testing -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
    </dependency>
</dependencies>

2.2 Running Ollama in a TestContainer

We use TestContainers to run Ollama in an isolated environment. Ensure that Docker is installed on your machine before proceeding.

package com.example.ollama;
 
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.utility.DockerImageName;
 
/**
 * This class sets up and runs an Ollama container using TestContainers.
 */
public class OllamaContainer {
 
    // Create a GenericContainer for the latest Ollama image. The image's
    // entrypoint is already the ollama binary, so the command is just "serve".
    private static final GenericContainer<?> ollama =
        new GenericContainer<>(DockerImageName.parse("ollama/ollama:latest"))
            .withExposedPorts(11434) // Expose port 11434 for API access.
            .withCommand("serve"); // Run the Ollama server inside the container.
 
    // Static block to start the container when the class is loaded.
    static {
        ollama.start();
    }
 
    /**
     * Returns the base URL for accessing the Ollama instance.
     *
     * @return Base URL as a string in the format http://host:port
     */
    public static String getBaseUrl() {
        return "http://" + ollama.getHost() + ":" + ollama.getMappedPort(11434);
    }
}

2.2.1 Code Explanation

The OllamaContainer class runs an Ollama instance in an isolated environment using Testcontainers. It leverages the GenericContainer class to pull and run the ollama/ollama:latest Docker image. The .withExposedPorts(11434) call makes the Ollama server reachable on port 11434, while the command passed to the container starts the server process inside it. The static block starts the container automatically when the class is loaded. The getBaseUrl() method builds the base URL for accessing Ollama by resolving the container's dynamically mapped host port. This setup allows seamless testing and integration of Ollama within a Spring Boot or Java-based application.

2.3 LLM Service (Queries Ollama)

This service sends prompts to Ollama and retrieves responses.

package com.example.ollama.service;
 
import org.springframework.ai.client.OpenAiClient;
import org.springframework.ai.client.generative.GenerativeAiResponse;
import org.springframework.stereotype.Service;
 
/**
 * Service class for interacting with the AI model using OpenAI's client.
 */
@Service
public class LlmService {
 
    // OpenAiClient instance to communicate with the AI model.
    private final OpenAiClient client;
 
    /**
     * Constructor initializes the OpenAiClient with the base URL
     * retrieved from the OllamaContainer.
     */
    public LlmService() {
        this.client = new OpenAiClient(OllamaContainer.getBaseUrl());
    }
 
    /**
     * Sends a prompt to the AI model and returns the generated response.
     *
     * @param prompt The user-provided input for AI processing.
     * @return The AI-generated response as a string.
     */
    public String getResponse(String prompt) {
        GenerativeAiResponse response = client.complete(prompt);
        return response.getResult();
    }
}

2.3.1 Code Explanation

The LlmService class is a Spring service that interacts with an AI model using OpenAI’s client. It initializes an OpenAiClient instance, connecting to the locally running Ollama instance via the base URL retrieved from OllamaContainer.getBaseUrl(). The getResponse(String prompt) method takes a user-provided prompt and sends it to the AI model using the client.complete(prompt) method, and returns the generated response. This setup allows seamless communication between the Spring Boot application and the AI model hosted in a TestContainers-managed environment.
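
Spring AI's client abstracts the wire format away, but it can help to see what a call to a locally running Ollama server boils down to: an HTTP POST to Ollama's /api/generate endpoint with a small JSON body. The stand-alone sketch below (plain java.net.http, not Spring AI code; the model name llama3 is an assumption) builds such a request without sending it:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class OllamaRequestSketch {

    // JSON body for Ollama's native /api/generate endpoint (model + prompt).
    static String body(String model, String prompt) {
        return "{\"model\":\"" + model + "\",\"prompt\":\""
                + prompt.replace("\"", "\\\"") + "\",\"stream\":false}";
    }

    // Build the HTTP request a client would send; nothing is sent here.
    static HttpRequest request(String baseUrl, String model, String prompt) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/generate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(model, prompt)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = request("http://localhost:11434", "llama3", "What is Spring AI?");
        System.out.println(req.method() + " " + req.uri());
    }
}
```

Sending this request with HttpClient.newHttpClient().send(...) against the Testcontainers base URL would return the model's JSON response; the Spring AI client performs an equivalent exchange on our behalf.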

2.4 Evaluators for Relevance and Fact-Checking

We integrate Spring AI’s RelevanceEvaluator and FactCheckingEvaluator for AI model assessment.

2.4.1 RelevanceEvaluator

The RelevanceEvaluator is a component in Spring AI designed to assess how well an AI-generated response aligns with the given prompt. It analyzes the semantic similarity between the input and output, providing a relevance score that indicates the accuracy and contextual appropriateness of the response. This evaluator is useful for applications that require quality control over AI-generated content, ensuring meaningful and relevant outputs in conversational AI, chatbots, and automated text generation systems. For more details, visit the official Spring AI documentation.

package com.example.ollama.evaluation;
 
import org.springframework.ai.evaluation.RelevanceEvaluator;
import org.springframework.stereotype.Service;
 
@Service
public class ResponseEvaluator {
    private final RelevanceEvaluator evaluator = new RelevanceEvaluator();
 
    public double getRelevanceScore(String prompt, String response) {
        return evaluator.evaluate(prompt, response).getScore();
    }
}

2.4.1.1 Code Explanation

The ResponseEvaluator class is a Spring service that utilizes the RelevanceEvaluator to assess the relevance of an AI-generated response based on a given prompt. It initializes an instance of RelevanceEvaluator and provides the getRelevanceScore(String prompt, String response) method, which evaluates the AI’s response against the prompt and returns a relevance score. This helps in measuring the accuracy and appropriateness of AI-generated content within applications integrating AI models.
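
Spring AI does not expose the evaluator's internals in this example, but a toy scorer helps build intuition for what a relevance score measures. The hypothetical sketch below uses simple token-overlap (Jaccard) similarity between prompt and response; it is an illustration only, not Spring AI's actual algorithm:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

public class JaccardRelevance {

    // Lower-case the text and split on non-word characters to get a token set.
    static Set<String> tokens(String text) {
        return Arrays.stream(text.toLowerCase().split("\\W+"))
                     .filter(t -> !t.isEmpty())
                     .collect(Collectors.toSet());
    }

    // Jaccard similarity: |A ∩ B| / |A ∪ B|, always in [0, 1].
    static double score(String prompt, String response) {
        Set<String> a = tokens(prompt), b = tokens(response);
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        System.out.println(score("What is Spring AI?",
                "Spring AI is a framework that integrates AI into Java applications."));
    }
}
```

A real relevance evaluator uses semantic similarity rather than raw token overlap, so it can score a paraphrased but on-topic answer highly where this sketch would not.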

2.4.2 FactCheckingEvaluator

The FactCheckingEvaluator is a component in Spring AI that assesses the factual accuracy of AI-generated responses by comparing them against the given prompt. It assigns a fact-checking score, helping to determine how truthful and reliable the response is. This evaluator is particularly useful for applications that require high accuracy, such as news verification, research support, and knowledge-based AI systems. For more details, visit the official Spring AI documentation.

package com.example.ollama.evaluation;
 
import org.springframework.ai.evaluation.FactCheckingEvaluator;
import org.springframework.stereotype.Service;
 
@Service
public class FactCheckService {
    private final FactCheckingEvaluator evaluator = new FactCheckingEvaluator();
 
    public double getFactCheckScore(String prompt, String response) {
        return evaluator.evaluate(prompt, response).getScore();
    }
}

2.4.2.1 Code Explanation

The FactCheckService class is a Spring service that leverages the FactCheckingEvaluator to verify the factual accuracy of an AI-generated response. It initializes an instance of FactCheckingEvaluator and provides the getFactCheckScore(String prompt, String response) method, which evaluates the response against the given prompt and returns a fact-checking score. This helps ensure that AI-generated content is reliable and factually correct, making it useful for applications where content accuracy is critical, such as news verification, research assistance, and knowledge-based AI systems.
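
Fact-checking evaluators of this kind commonly follow an LLM-as-judge pattern: the evaluator asks a model whether a claim is supported by a reference text and maps the verdict to a score. The minimal sketch below illustrates that pattern only; it is hypothetical, not the actual FactCheckingEvaluator implementation:

```java
public class FactCheckPromptSketch {

    // Build the kind of judge prompt a fact-checking evaluator sends to a model:
    // present the reference document and the claim, then ask for a yes/no verdict.
    static String judgePrompt(String document, String claim) {
        return "Document: " + document + "\n"
             + "Claim: " + claim + "\n"
             + "Is the claim supported by the document? Answer yes or no.";
    }

    // Map the judge model's free-text answer to a binary score.
    static double score(String judgeAnswer) {
        return judgeAnswer.trim().toLowerCase().startsWith("yes") ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        System.out.println(judgePrompt("Spring AI integrates AI into Java apps.",
                                       "Spring AI is a Java framework."));
        System.out.println(score("Yes"));
    }
}
```

In a real evaluator the judge call goes to an LLM (here, the Ollama-hosted model), and the score may be graded rather than binary; the structure of the exchange is the same.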

2.5 Controller (Integrating LLM & Evaluators)

Develop a controller that generates an LLM response, evaluates its relevance and factual accuracy, and returns the results in a well-structured JSON format.

package com.example.ollama.controller;
 
import com.example.ollama.evaluation.FactCheckService;
import com.example.ollama.evaluation.ResponseEvaluator;
import com.example.ollama.service.LlmService;
import org.springframework.web.bind.annotation.*;
 
import java.util.HashMap;
import java.util.Map;
 
@RestController
@RequestMapping("/api/llm")
public class LlmController {
    private final LlmService llmService;
    private final ResponseEvaluator responseEvaluator;
    private final FactCheckService factCheckService;
 
    public LlmController(LlmService llmService, ResponseEvaluator responseEvaluator, FactCheckService factCheckService) {
        this.llmService = llmService;
        this.responseEvaluator = responseEvaluator;
        this.factCheckService = factCheckService;
    }
 
    @GetMapping("/generate")
    public Map<String, Object> generate(@RequestParam String prompt) {
        String response = llmService.getResponse(prompt);
        double relevanceScore = responseEvaluator.getRelevanceScore(prompt, response);
        double factCheckScore = factCheckService.getFactCheckScore(prompt, response);
 
        // Creating a structured response
        Map<String, Object> result = new HashMap<>();
        result.put("prompt", prompt);
        result.put("response", response);
        result.put("relevanceScore", relevanceScore);
        result.put("factCheckScore", factCheckScore);
 
        return result;
    }
}

2.5.1 Code Explanation

The LlmController class is a Spring Boot REST controller that handles AI-generated responses while evaluating their relevance and factual accuracy. It is mapped to the /api/llm endpoint and depends on three services: LlmService for generating responses, ResponseEvaluator for assessing relevance, and FactCheckService for verifying factual accuracy. The /generate endpoint accepts a prompt as a query parameter, fetches the AI-generated response, calculates the relevance and fact-checking scores, and returns a structured JSON response containing the prompt, generated response, and evaluation scores. This approach ensures that AI responses are assessed for quality and reliability before being presented to the user.

2.6 Setting up the properties

Include the following properties in the application.properties file located at src/main/resources/application.properties.

# Server Configuration
server.port=8080
# Ollama API (Dynamic URL from TestContainers)
# ollama.api.base-url=${OLLAMA_BASE_URL:http://localhost:11434}
# Spring AI Configurations
spring.ai.model=gpt-3.5-turbo
spring.ai.temperature=0.7
  • server.port=8080 – Specifies that the application will run on port 8080.
  • Spring AI Configurations:
    • spring.ai.model=gpt-3.5-turbo – Defines the AI model to be used, in this case, gpt-3.5-turbo.
    • spring.ai.temperature=0.7 – Adjusts the AI response randomness, with higher values making outputs more creative and lower values making them more deterministic.
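
To build intuition for the temperature setting: sampling temperature divides the model's logits before the softmax, so lower values sharpen the distribution (more deterministic picks) while higher values flatten it (more varied output). This stand-alone illustration (not Spring AI code) shows the effect:

```java
import java.util.Arrays;

public class TemperatureDemo {

    // Softmax over logits divided by temperature.
    // Lower temperature sharpens the distribution; higher temperature flattens it.
    static double[] softmax(double[] logits, double temperature) {
        double[] scaled = Arrays.stream(logits).map(l -> l / temperature).toArray();
        double max = Arrays.stream(scaled).max().orElse(0.0); // for numerical stability
        double[] exp = Arrays.stream(scaled).map(v -> Math.exp(v - max)).toArray();
        double sum = Arrays.stream(exp).sum();
        return Arrays.stream(exp).map(v -> v / sum).toArray();
    }

    public static void main(String[] args) {
        double[] logits = {2.0, 1.0, 0.5};
        System.out.println(Arrays.toString(softmax(logits, 0.2))); // nearly one-hot
        System.out.println(Arrays.toString(softmax(logits, 0.7)));
        System.out.println(Arrays.toString(softmax(logits, 2.0))); // closer to uniform
    }
}
```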

2.7 Running the Application

To start the application, run the main class from your IDE or use the following command:

mvn spring-boot:run

2.7.1 Testing the API

You can test the API using cURL or Postman.

Execute the following cURL command (with the prompt URL-encoded):

curl "http://localhost:8080/api/llm/generate?prompt=What%20is%20Spring%20AI%3F"

The API will respond with a structured JSON output similar to the following:

{
    "prompt": "What is Spring AI?",
    "response": "Spring AI is a framework that integrates AI into Java applications.",
    "relevanceScore": 0.95,
    "factCheckScore": 0.87
}

3. Testing (Validating the LLM Response)

The LlmEvaluatorTest class validates the functionality of the LlmController in a Spring Boot application. Since it is annotated with @SpringBootTest, it is an integration-style test rather than a strict unit test.

package com.example.ollama;
 
import com.example.ollama.controller.LlmController;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
 
import java.util.Map;
 
import static org.junit.jupiter.api.Assertions.*;
 
@SpringBootTest
public class LlmEvaluatorTest {
     
    @Autowired
    private LlmController llmController;
 
    @Test
    void testLlmResponse() {
        String prompt = "What is Spring AI?";
        Map<String, Object> result = llmController.generate(prompt);
 
        assertNotNull(result);
        assertTrue((double) result.get("relevanceScore") > 0.7, "Response is not relevant!");
        assertTrue((double) result.get("factCheckScore") > 0.8, "Response contains incorrect information!");
    }
}

The test class is annotated with @SpringBootTest to load the application context, and it uses @Autowired to inject an instance of LlmController. The testLlmResponse() method sends a test prompt, “What is Spring AI?”, to the controller and retrieves the AI-generated response. It then asserts that the response is not null and verifies that the relevanceScore is above 0.7 and the factCheckScore is above 0.8. If these conditions are not met, appropriate failure messages are displayed.

The test case passes if all conditions are met and no assertions fail during execution:

All Tests passed.

Otherwise, the corresponding assertion failure message is displayed.

4. Conclusion

By using Spring AI Evaluators along with Ollama in TestContainers, we can effectively test LLM responses in a reproducible environment. This approach ensures that the generated responses meet expectations, improving the reliability of AI-driven applications.


Yatin Batra

An experienced full-stack engineer well-versed in Core Java, Spring/Spring Boot, MVC, Security, AOP, frontend (Angular & React), and cloud technologies (such as AWS, GCP, Jenkins, Docker, and Kubernetes).