Hugging Face Models With Spring AI and Ollama Example

Ollama provides a lightweight way to run large language models (LLMs) locally, and Spring AI enables seamless integration with AI models in Java applications. Let us delve into Spring AI, Ollama, and Hugging Face models.

1. Introduction

1.1 What is Ollama?

Ollama is a lightweight platform for running large language models (LLMs) locally on your machine. It provides a simple way to download and serve models such as Mistral, Llama, and others with minimal setup.

1.1.1 Benefits of Ollama

  • Local AI Processing: Run AI models on your system without relying on cloud services.
  • Fast and Efficient: Optimized for low-latency responses, making it ideal for real-time applications.
  • Easy Model Management: Download, update, and switch between models seamlessly.
  • Privacy & Security: No data leaves your machine, ensuring a secure AI experience.

1.1.2 Use Cases of Ollama

  • Chatbots: Deploy AI assistants that run locally for improved response time and privacy.
  • Content Generation: Use AI models to generate text, summarize content, or rewrite documents.
  • Embedding Generation: Generate high-quality embeddings for search and recommendation systems.

1.2 What is Testcontainers?

Testcontainers is a Java library that enables integration testing with real dependencies such as databases, message brokers, and application services by running them in lightweight, disposable Docker containers.

1.2.1 Benefits of Testcontainers

  • Reliable Integration Testing: Test with real databases and services instead of mocks or in-memory databases.
  • Lightweight & Disposable: Containers are spun up for tests and automatically removed afterward.
  • Parallel Execution: Each test instance can run in an isolated container, avoiding conflicts.
  • Easy CI/CD Integration: Works well in continuous integration pipelines without external dependencies.

1.2.2 Use Cases of Testcontainers

  • Database Testing: Run PostgreSQL, MySQL, or MongoDB containers to test database interactions.
  • Microservices Testing: Simulate full-service dependencies for end-to-end testing.
  • AI Model Testing: Deploy AI models in a containerized environment for testing and validation.

By combining Ollama with Testcontainers, developers can easily set up AI-driven applications with real-world testing scenarios, ensuring reliability and scalability.

2. Code Example

2.1 Dependencies

To use Spring AI with Ollama and Testcontainers, add the required dependencies to your pom.xml file.

<dependencies>
    <!-- Spring AI starter for Ollama (artifact id as of the Spring AI 1.0 GA release) -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-model-ollama</artifactId>
        <version>1.0.0</version>
    </dependency>

    <!-- Needed for the REST controllers shown later in this example -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>

    <dependency>
        <groupId>org.testcontainers</groupId>
        <artifactId>testcontainers</artifactId>
        <version>1.19.3</version>
        <scope>test</scope>
    </dependency>
</dependencies>

2.2 Setting up Ollama With Testcontainers

To start an Ollama container using Testcontainers, add the following configuration in a test class:

import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.BeforeAll;
import org.testcontainers.containers.GenericContainer;
 
public class OllamaContainerTest {
    private static GenericContainer<?> ollamaContainer;
 
    @BeforeAll
    static void startContainer() {
        // The official image exposes the Ollama API on port 11434;
        // "serve" starts the API server inside the container.
        ollamaContainer = new GenericContainer<>("ollama/ollama:latest")
                .withExposedPorts(11434)
                .withCommand("serve");
        ollamaContainer.start();
    }
 
    @AfterAll
    static void stopContainer() {
        ollamaContainer.stop();
    }
}

2.2.1 Code Explanation

The given Java code defines a test class, OllamaContainerTest, which uses Testcontainers to run an Ollama container for testing purposes. It declares a GenericContainer instance named ollamaContainer, which is initialized in the startContainer() method annotated with @BeforeAll, meaning it runs once before all tests. The container is configured to use the ollama/ollama:latest image, expose port 11434, and execute the serve command on startup. When ollamaContainer.start() is invoked, Testcontainers pulls and runs the image, making the Ollama service available for integration testing, and the @AfterAll hook stops the container once all tests have finished.
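
A freshly started container has no models downloaded, and Spring AI must be told where the container is listening. A minimal sketch of both steps, added at the end of startContainer() (assuming mistral as the example model; the method must then declare throws Exception, since execInContainer is a checked call):

// Pull the model the tests will use; any tag from the Ollama library
// (or an hf.co/... GGUF reference) works the same way.
ollamaContainer.execInContainer("ollama", "pull", "mistral");

// Point the Spring AI starter at the container's mapped port.
String baseUrl = "http://" + ollamaContainer.getHost()
        + ":" + ollamaContainer.getMappedPort(11434);
System.setProperty("spring.ai.ollama.base-url", baseUrl);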

2.3 Using a Chat Completion Model

Now, let’s integrate a Hugging Face model using Spring AI’s Ollama support.

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;
 
@Service
public class ChatService {
    private final ChatClient chatClient;
 
    // The ChatClient.Builder is auto-configured by the Ollama starter;
    // the model itself (e.g. mistral) is selected via application properties.
    public ChatService(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder.build();
    }
 
    public String chatWithModel(String prompt) {
        return chatClient.prompt()
                .user(prompt)
                .call()
                .content();
    }
}

2.3.1 Code Explanation

The given Java code defines a Spring Boot service class named ChatService, which integrates with the Ollama-served model using Spring AI. It is annotated with @Service, making it a Spring-managed bean. The constructor receives the auto-configured ChatClient.Builder provided by the Ollama starter and builds a ChatClient from it; the model itself (here, mistral) is selected through application properties rather than in code. The method chatWithModel(String prompt) takes a user prompt as input, sends it to the chat model via the fluent chatClient.prompt().user(prompt).call() chain, and returns the generated text with content(). This service acts as an interface to interact with the AI model and generate responses based on user queries.
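
For this to work, the starter needs to know which model to use and where Ollama is listening. A minimal application.properties sketch (property names as exposed by the Spring AI Ollama starter; the hf.co line is an illustrative example of referencing a Hugging Face GGUF model through Ollama):

spring.ai.ollama.base-url=http://localhost:11434
spring.ai.ollama.chat.options.model=mistral
# Hugging Face GGUF models can be referenced directly, for example:
# spring.ai.ollama.chat.options.model=hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF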

2.4 REST Controller

Create an API endpoint to invoke the model and receive a response based on the given prompt.

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
 
@RestController
public class ChatController {
    private final ChatService chatService;
 
    public ChatController(ChatService chatService) {
        this.chatService = chatService;
    }
 
    @GetMapping("/chat")
    public String chat(@RequestParam String prompt) {
        return chatService.chatWithModel(prompt);
    }
}

2.4.1 Code Explanation

The given Java code defines a Spring Boot REST controller named ChatController that provides an API endpoint for interacting with the AI model. It is annotated with @RestController, indicating that it handles HTTP requests and writes return values directly to the response body. The class has a dependency on ChatService, which is injected via constructor-based dependency injection. The method chat(@RequestParam String prompt) is mapped to the /chat endpoint using @GetMapping. When a user sends a GET request with a prompt parameter, it calls the chatWithModel method from ChatService and returns the generated response from the AI model. This controller acts as an interface for clients to interact with the AI chatbot via HTTP requests.

2.5 Code Output

When the Spring Boot application is started and a GET request such as /chat?prompt=Hello is sent to the endpoint, the model's reply is returned as a plain-text response body, for example:

Hello! How can I assist you today?

3. Using an Embedding Model

3.1 Service Class

To generate embeddings, use the following service:

import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.stereotype.Service;
 
@Service
public class EmbeddingService {
    private final EmbeddingModel embeddingModel;
 
    // The EmbeddingModel is auto-configured by the Ollama starter;
    // the embedding model tag is selected via application properties.
    public EmbeddingService(EmbeddingModel embeddingModel) {
        this.embeddingModel = embeddingModel;
    }
 
    public float[] getEmbedding(String text) {
        return embeddingModel.embed(text);
    }
}

3.1.1 Code Explanation

The given Java code defines a Spring Boot service class named EmbeddingService, which is responsible for generating embeddings from text using an Ollama-served model. Annotated with @Service, it is a Spring-managed component that can be injected into other parts of the application. The constructor receives the auto-configured EmbeddingModel provided by the Ollama starter; the concrete model, for example all-minilm (Ollama's packaging of the widely used all-MiniLM-L6-v2 sentence transformer), is selected through application properties. The method getEmbedding(String text) takes a text input, passes it to the embedding model, and returns an array of floating-point values representing the text's embedding vector. This service enables applications to generate numerical representations of text, useful for tasks such as semantic search, similarity comparison, and other natural language processing tasks.
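
As with the chat model, the embedding model is configured in application.properties (property name as exposed by the Spring AI Ollama starter; all-minilm is an example tag from the Ollama library):

spring.ai.ollama.embedding.options.model=all-minilm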

If, for any reason, the model fails to return a response (i.e., the call yields null), a 500 HTTP error (Internal Server Error) is surfaced to the client. In most cases, a default-value approach like the one below is adopted: if the returned embedding is null, an empty array is returned instead. You can modify the above code like this:

float[] embedding = embeddingModel.embed(text);
return (embedding != null) ? embedding : new float[0];

3.2 REST Endpoint for Embeddings

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
 
@RestController
public class EmbeddingController {
    private final EmbeddingService embeddingService;
 
    public EmbeddingController(EmbeddingService embeddingService) {
        this.embeddingService = embeddingService;
    }
 
    @GetMapping("/embedding")
    public float[] getEmbedding(@RequestParam String text) {
        return embeddingService.getEmbedding(text);
    }
}

3.2.1 Code Explanation

The given Java code defines a Spring Boot REST controller class named EmbeddingController, which provides an API endpoint for generating text embeddings. The class is annotated with @RestController, indicating it handles HTTP requests. It has a dependency on EmbeddingService, which is injected via constructor-based dependency injection. The method getEmbedding(@RequestParam String text) is mapped to the /embedding endpoint using @GetMapping. When a GET request is made with a text parameter, the method calls getEmbedding from EmbeddingService and returns the embedding for the provided text as an array of floating-point values, which Spring serializes to a JSON array. This controller enables clients to access text embedding functionality via HTTP requests.
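
Embedding vectors are typically compared with cosine similarity. A minimal, illustrative helper (not part of the article's code) that scores how close two embeddings are:

// Computes the cosine similarity of two equal-length embedding vectors.
// Returns a value in [-1, 1]; values near 1 indicate similar texts.
static double cosineSimilarity(float[] a, float[] b) {
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (int i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}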

3.3 Code Output

Redeploy the Spring Boot application and trigger the /embedding?text=hello%20world endpoint; output similar to the following is returned (truncated here for readability, as MiniLM-class models typically produce 384-dimensional vectors).

[0.1234, -0.5678, 0.9101, -0.1121, 0.3141]

4. Conclusion

In this article, we explored integrating Hugging Face models with Spring AI and Ollama, covering topics such as setting up Ollama with Testcontainers, utilizing a chat completion model, and generating embeddings with an embedding model. These techniques enable seamless integration of advanced AI capabilities into Java applications, enhancing their functionality efficiently.


Yatin Batra

An experienced full-stack engineer well versed in Core Java, Spring/Spring Boot, MVC, Security, AOP, frontend (Angular & React), and cloud technologies (such as AWS, GCP, Jenkins, Docker, and Kubernetes).