Hugging Face Models With Spring AI and Ollama Example
Ollama provides a lightweight way to run large language models locally, and Spring AI enables seamless integration with AI models in Java applications. Let us delve into understanding Spring AI, Ollama, and Hugging Face models.
1. Introduction
1.1 What is Ollama?
Ollama is a lightweight platform for running large language models (LLMs) locally on your machine. It provides a simple way to download and serve models such as Mistral, Llama, and others with minimal setup.
1.1.1 Benefits of Ollama
- Local AI Processing: Run AI models on your system without relying on cloud services.
- Fast and Efficient: Optimized for low-latency responses, making it ideal for real-time applications.
- Easy Model Management: Download, update, and switch between models seamlessly.
- Privacy & Security: No data leaves your machine, ensuring a secure AI experience.
1.1.2 Use Cases of Ollama
- Chatbots: Deploy AI assistants that run locally for improved response time and privacy (a minimal local-call sketch follows this list).
- Content Generation: Use AI models to generate text, summarize content, or rewrite documents.
- Embedding Generation: Generate high-quality embeddings for search and recommendation systems.
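Because Ollama serves models over a plain HTTP API on its default port 11434, local processing can be exercised with nothing more than the JDK's built-in HTTP client. The snippet below is a minimal sketch, assuming an Ollama instance is already running locally and the `mistral` model has been pulled:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OllamaLocalExample {
    public static void main(String[] args) throws Exception {
        // Ollama's generate endpoint; "stream": false returns one complete JSON response.
        String body = "{\"model\": \"mistral\", \"prompt\": \"Say hello\", \"stream\": false}";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:11434/api/generate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // JSON containing the model's reply
    }
}
```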
1.2 What is Testcontainers?
Testcontainers is a Java library that enables integration testing with real dependencies such as databases, message brokers, and application services by running them in lightweight, disposable Docker containers.
1.2.1 Benefits of Testcontainers
- Reliable Integration Testing: Test with real databases and services instead of mocks or in-memory databases.
- Lightweight & Disposable: Containers are spun up for tests and automatically removed afterward.
- Parallel Execution: Each test instance can run in an isolated container, avoiding conflicts.
- Easy CI/CD Integration: Works well in continuous integration pipelines without external dependencies.
1.2.2 Use Cases of Testcontainers
- Database Testing: Run PostgreSQL, MySQL, or MongoDB containers to test database interactions.
- Microservices Testing: Simulate full-service dependencies for end-to-end testing.
- AI Model Testing: Deploy AI models in a containerized environment for testing and validation.
By combining Ollama with Testcontainers, developers can easily set up AI-driven applications with real-world testing scenarios, ensuring reliability and scalability.
2. Code Example
2.1 Dependencies
To use Spring AI with Ollama and Testcontainers, add the required dependencies to your pom.xml file.
```xml
<dependencies>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
        <version>1.0.0</version>
    </dependency>
    <dependency>
        <groupId>org.testcontainers</groupId>
        <artifactId>testcontainers</artifactId>
        <version>1.19.3</version>
        <scope>test</scope>
    </dependency>
</dependencies>
```
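If you also want the JUnit 5 extension annotations used in some of the test sketches later in this article (`@Testcontainers` and `@Container`), the Testcontainers JUnit Jupiter module is typically added as well. This extra dependency is an assumption, not part of the original setup:

```xml
<!-- Assumed addition: JUnit 5 integration for Testcontainers -->
<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>junit-jupiter</artifactId>
    <version>1.19.3</version>
    <scope>test</scope>
</dependency>
```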
2.2 Setting up Ollama With Testcontainers
To start an Ollama container using Testcontainers, add the following configuration in a test class:
```java
import org.junit.jupiter.api.BeforeAll;
import org.testcontainers.containers.GenericContainer;

public class OllamaContainerTest {

    private static GenericContainer<?> ollamaContainer;

    @BeforeAll
    static void startContainer() {
        ollamaContainer = new GenericContainer<>("ollama/ollama:latest")
                .withExposedPorts(11434)
                .withCommand("serve");
        ollamaContainer.start();
    }
}
```
2.2.1 Code Explanation
The given Java code defines a test class, `OllamaContainerTest`, which uses Testcontainers to run an Ollama container for testing purposes. It declares a `GenericContainer` instance named `ollamaContainer`, which is initialized in the `startContainer()` method annotated with `@BeforeAll`, meaning it runs once before all tests. The container is configured to use the `ollama/ollama:latest` image, expose port `11434`, and execute the `serve` command on startup. When `ollamaContainer.start()` is invoked, it pulls and runs the container, making the Ollama service available for integration testing. Note that a freshly started container contains no models; a sketch for pulling one is shown below.
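Since the container starts empty, a test usually pulls a model before exercising any AI functionality. The helper below is a minimal sketch assuming the `mistral` model; `execInContainer` runs `ollama pull` inside the already-running container, and `getMappedPort` resolves the randomly mapped host port:

```java
import org.testcontainers.containers.Container;
import org.testcontainers.containers.GenericContainer;

public class OllamaTestSupport {

    // Hypothetical helper: pull a model into the running container before tests.
    // The "mistral" model name is an assumption; any model in the Ollama registry works.
    public static String prepareModel(GenericContainer<?> container) throws Exception {
        Container.ExecResult result = container.execInContainer("ollama", "pull", "mistral");
        if (result.getExitCode() != 0) {
            throw new IllegalStateException("Model pull failed: " + result.getStderr());
        }
        // The Ollama HTTP API is reachable at the host-mapped port.
        return "http://" + container.getHost() + ":" + container.getMappedPort(11434);
    }
}
```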
2.3 Using a Chat Completion Model
Now, let’s integrate a Hugging Face model using Spring AI’s Ollama support.
```java
import org.springframework.ai.ollama.OllamaChatClient;
import org.springframework.ai.ollama.api.OllamaChatModel;
import org.springframework.stereotype.Service;

@Service
public class ChatService {

    private final OllamaChatClient chatClient;

    public ChatService() {
        this.chatClient = new OllamaChatClient(new OllamaChatModel("mistral"));
    }

    public String chatWithModel(String prompt) {
        return chatClient.chat(prompt).getResult();
    }
}
```
2.3.1 Code Explanation
The given Java code defines a Spring Boot service class named `ChatService`, which integrates with the Ollama AI model using Spring AI. It imports `OllamaChatClient` and `OllamaChatModel` from the Spring AI Ollama package and is annotated with `@Service`, making it a Spring-managed bean. The constructor initializes the `chatClient` by creating a new instance of `OllamaChatClient` with the `OllamaChatModel` set to "mistral". The method `chatWithModel(String prompt)` takes a user prompt as input, sends it to the chat model, and returns the generated response using `chatClient.chat(prompt).getResult()`. This service acts as an interface to interact with the AI model and generate responses based on user queries; a sketch of wiring it to the Testcontainers-managed Ollama instance follows.
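In an integration test, the service must talk to the Testcontainers-managed Ollama instance rather than a fixed localhost port. A common pattern is to publish the container's mapped port as a dynamic property. The following is a sketch, assuming the starter reads the standard `spring.ai.ollama.base-url` property and that the `org.testcontainers:junit-jupiter` dependency shown earlier is on the test classpath:

```java
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.DynamicPropertyRegistry;
import org.springframework.test.context.DynamicPropertySource;
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;

@SpringBootTest
@Testcontainers
class ChatServiceIT {

    // Managed by the Testcontainers JUnit 5 extension: started before tests, stopped after.
    @Container
    static GenericContainer<?> ollama = new GenericContainer<>("ollama/ollama:latest")
            .withExposedPorts(11434)
            .withCommand("serve");

    // Point the Spring AI Ollama starter at the container's mapped port
    // before the application context is created.
    @DynamicPropertySource
    static void ollamaProperties(DynamicPropertyRegistry registry) {
        registry.add("spring.ai.ollama.base-url",
                () -> "http://" + ollama.getHost() + ":" + ollama.getMappedPort(11434));
    }
}
```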
2.4 REST Controller
Create an API endpoint to invoke the model and receive a response based on the given prompt.
```java
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ChatController {

    private final ChatService chatService;

    public ChatController(ChatService chatService) {
        this.chatService = chatService;
    }

    @GetMapping("/chat")
    public String chat(@RequestParam String prompt) {
        return chatService.chatWithModel(prompt);
    }
}
```
2.4.1 Code Explanation
The given Java code defines a Spring Boot REST controller named `ChatController` that provides an API endpoint for interacting with the AI model. It is annotated with `@RestController`, indicating that it handles HTTP requests and returns JSON responses. The class has a dependency on `ChatService`, which is injected via constructor-based dependency injection. The method `chat(@RequestParam String prompt)` is mapped to the `/chat` endpoint using `@GetMapping`. When a user sends a GET request with a `prompt` parameter, it calls the `chatWithModel` method from `ChatService` and returns the generated response from the AI model. This controller acts as an interface for clients to interact with the AI chatbot via HTTP requests; an example client call is shown below.
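For illustration, here is one way to call the endpoint from plain Java using the JDK's built-in HTTP client (a sketch; it assumes the application is running on `localhost:8080`):

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class ChatClientExample {
    public static void main(String[] args) throws Exception {
        // URL-encode the prompt so spaces and special characters survive the query string.
        String prompt = URLEncoder.encode("Hello", StandardCharsets.UTF_8);
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/chat?prompt=" + prompt))
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```

Equivalently, opening `http://localhost:8080/chat?prompt=Hello` in a browser triggers the same GET request.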
2.5 Code Output
When the Spring Boot application is started and the `/chat` endpoint is triggered, the following output is returned.
```json
{
  "response": "Hello! How can I assist you today?"
}
```
3. Using an Embedding Model
3.1 Service Class
To generate embeddings, use the following service:
```java
import org.springframework.ai.ollama.OllamaEmbeddingClient;
import org.springframework.ai.ollama.api.OllamaEmbeddingModel;
import org.springframework.stereotype.Service;

import java.util.List;

@Service
public class EmbeddingService {

    private final OllamaEmbeddingClient embeddingClient;

    public EmbeddingService() {
        this.embeddingClient = new OllamaEmbeddingClient(
                new OllamaEmbeddingModel("all-MiniLM-L6-v2"));
    }

    public List<Float> getEmbedding(String text) {
        return embeddingClient.embed(text).getEmbedding();
    }
}
```
3.1.1 Code Explanation
The given Java code defines a Spring Boot service class named `EmbeddingService`, which is responsible for generating embeddings from text using the Ollama AI model. Annotated with `@Service`, it is a Spring-managed component that can be injected into other parts of the application. The class initializes an instance of `OllamaEmbeddingClient` with the `OllamaEmbeddingModel` set to `"all-MiniLM-L6-v2"`, a commonly used model for text embeddings. The method `getEmbedding(String text)` takes a text input, processes it through the embedding client, and returns a list of floating-point values representing the text's embedding vector. This service enables applications to generate numerical representations of text, useful for tasks such as semantic search and similarity comparisons.
If, for any reason, the model fails to return a response (i.e., the LLM returns `null`), a 500 HTTP error (`Internal Server Error`) is thrown. In most cases, however, a default-value approach like the one below is adopted: if the `embeddingResponse` is `null`, an empty list is returned instead. You can modify the above code as follows:
```java
var embeddingResponse = embeddingClient.embed(text);
return (embeddingResponse != null && embeddingResponse.getEmbedding() != null)
        ? embeddingResponse.getEmbedding()
        : Collections.emptyList();
```
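Once embeddings are available, the typical downstream step is a similarity comparison. The following self-contained sketch (not part of the article's service classes) shows the standard cosine similarity computation over two embedding vectors:

```java
import java.util.List;

public class EmbeddingMath {

    // Cosine similarity: dot(a, b) / (||a|| * ||b||), in [-1, 1].
    // Values closer to 1 indicate more semantically similar texts.
    public static double cosineSimilarity(List<Float> a, List<Float> b) {
        if (a.size() != b.size()) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.size(); i++) {
            dot += a.get(i) * b.get(i);
            normA += a.get(i) * a.get(i);
            normB += b.get(i) * b.get(i);
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```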
3.2 REST Endpoint for Embeddings
```java
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

import java.util.List;

@RestController
public class EmbeddingController {

    private final EmbeddingService embeddingService;

    public EmbeddingController(EmbeddingService embeddingService) {
        this.embeddingService = embeddingService;
    }

    @GetMapping("/embedding")
    public List<Float> getEmbedding(@RequestParam String text) {
        return embeddingService.getEmbedding(text);
    }
}
```
3.2.1 Code Explanation
The given Java code defines a Spring Boot REST controller class named `EmbeddingController`, which provides an API endpoint for generating text embeddings. The class is annotated with `@RestController`, indicating it handles HTTP requests and returns JSON responses. It has a dependency on `EmbeddingService`, which is injected via constructor-based dependency injection. The method `getEmbedding(@RequestParam String text)` is mapped to the `/embedding` endpoint using `@GetMapping`. When a GET request is made with a `text` parameter, the method calls `getEmbedding` from `EmbeddingService`, generating and returning the embedding for the provided text as a list of floating-point values. This controller enables clients to access text embedding functionality via HTTP requests.
3.3 Code Output
Redeploy the Spring Boot application; once the `/embedding?text=hello%20world` endpoint is triggered, the following output is returned.
```json
[0.1234, -0.5678, 0.9101, -0.1121, 0.3141]
```
4. Conclusion
In this article, we explored integrating Hugging Face models with Spring AI and Ollama, covering how to set up Ollama with Testcontainers, use a chat completion model, and generate embeddings with an embedding model. Together, these techniques make it possible to bring advanced AI capabilities into Java applications while keeping development and testing entirely local.