Create a Retrieval-Augmented Generation (RAG) App with Vector Stores and Spring AI
Vector databases often serve as memory for AI applications, especially those built on large language models (LLMs). They enable semantic search, which finds relevant information to enrich the prompts sent to an LLM. This leads to more accurate, context-aware responses and makes vector databases a natural fit for Retrieval Augmented Generation (RAG) applications. In this article, we will build a Spring AI RAG app using SimpleVectorStore. Spring AI simplifies integrating AI models with vector stores. By using SimpleVectorStore to store and retrieve embeddings, we will demonstrate how to build a system that generates context-driven responses to user queries.
1. Introduction to RAG
Retrieval Augmented Generation (RAG) is a smart way to improve how AI systems answer questions or create content by combining two steps: retrieving useful information and generating responses. Instead of just relying on what the AI knows, RAG pulls in extra data that helps the system understand the question better and provide more accurate, context-aware answers.
1.1 Vector Databases in RAG
A key component in making the Retrieval Augmented Generation (RAG) pattern effective is the vector database. A vector database is specifically designed to store and manage vector embeddings: numeric representations of text in which semantically similar content ends up close together. Several vector databases can be integrated with RAG systems:
- Redis: Redis, especially with Redis Stack, supports vector similarity search, making it a great option for embedding retrieval.
- Weaviate: Another open-source option, Weaviate is tailored for managing and searching through vectors.
- Pinecone: A cloud-native vector database specifically designed for machine learning workloads.
- Milvus: An open-source vector database that specializes in managing massive quantities of unstructured data and provides fast, scalable search.
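To make the idea of similarity search concrete, here is a minimal plain-Java sketch of what a vector store does when it looks up embeddings: it scores every stored vector against the query vector and keeps the best matches. The class name CosineSearchSketch, the three-dimensional vectors, and the use of cosine similarity are assumptions chosen for illustration; this is not Spring AI code, and production databases use indexing structures rather than a full scan.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative only: a toy in-memory similarity search (hypothetical class, not a library API).
final class CosineSearchSketch {

    // Cosine similarity: dot(a, b) / (|a| * |b|). A value of 1.0 means "same direction".
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0; // a zero vector has no direction, so treat it as unrelated
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the ids of the k stored vectors most similar to the query, best match first.
    static List<String> topK(Map<String, double[]> store, double[] query, int k) {
        return store.entrySet().stream()
                // Negate the score so that the ascending sort puts the highest similarity first.
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, double[]> e) -> -cosine(e.getValue(), query)))
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hand-made 3-dimensional "embeddings"; real embeddings have far more dimensions.
        Map<String, double[]> store = new LinkedHashMap<>();
        store.put("Inception", new double[] {0.9, 0.1, 0.0});
        store.put("The Matrix", new double[] {0.8, 0.2, 0.1});
        store.put("The Lion King", new double[] {0.0, 0.1, 0.9});

        double[] query = {1.0, 0.0, 0.0}; // stand-in for the embedding of a sci-fi question
        System.out.println(topK(store, query, 2)); // prints [Inception, The Matrix]
    }
}
```

The full scan is fine for a handful of records, which is also why an in-memory store suits small demos while dedicated databases are preferred for large collections.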
1.2 Benefits and Applications of RAG
RAG helps in a few important ways:
- Improved privacy: You can use data that the AI wasn’t trained on, meaning you don’t have to worry about the AI knowing sensitive information beforehand.
- Better context: The system can pull in relevant information to understand the user’s question more deeply.
- Higher accuracy: By looking up information at query time, RAG reduces hallucinations (cases where the model makes things up) by grounding answers in real facts.
- Flexible applications: It can be used for various tasks like answering questions, creating summaries, or powering chatbots.
By combining the strengths of retrieving information and generating responses, RAG is great for:
- Customer Support: Quickly answers customer queries by pulling from relevant documents or FAQs.
- Personalized Recommendations: Offers tailored product or content suggestions using real-time data.
- Knowledge Base Search: Enhances search accuracy by synthesizing answers from multiple sources.
- Content Creation: Helps generate summaries or articles by combining information from various sources.
2. Implementing RAG in Java Spring
In a RAG workflow, data is first loaded into a vector store. When a query is received, the store retrieves the documents most relevant to it, and these are passed to an AI model as context for generating the response. We will use a dataset containing information about movies, including attributes such as title, genre, release year, and a brief description. This dataset will be loaded into SimpleVectorStore to illustrate the RAG process.
To build a Retrieval Augmented Generation (RAG) application, create a Spring Boot project using Spring Initializr or any IDE and add the following dependencies to the pom.xml:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>
Make sure that application.properties is set up like this:
spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.embedding.api-key=${OPENAI_API_KEY}
Next, we need to set up a Vector Store to implement a RAG example. The Vector Store holds our data along with its vector embeddings, allowing us to perform semantic searches and find the most relevant information for a user’s query.
2.1 Application Configuration
Spring AI offers a VectorStore interface designed for storing and retrieving embeddings. It comes with various implementations, including SimpleVectorStore, RedisVectorStore, ChromaVectorStore, Neo4jVectorStore, and PgVectorStore, among others. Each implementation is tailored for specific use cases. For instance, SimpleVectorStore is suited to smaller, in-memory applications, while RedisVectorStore targets distributed environments that need to scale.
In this article, we will use SimpleVectorStore to handle the storage and retrieval of embeddings, which keeps the setup simple. Begin by creating a class named AppConfig and add the following to it.
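import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.ai.vectorstore.SimpleVectorStore;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class AppConfig {

    // Registers an in-memory vector store that uses the auto-configured embedding model
    @Bean
    VectorStore vectorStore(EmbeddingModel embeddingModel) {
        return new SimpleVectorStore(embeddingModel);
    }
}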
2.2 Loader Service
Now that everything is set up, we can embed some data into our Vector Store and test the model's responses. We will start by embedding a sample dataset. For this, we will import a JSON document containing a movie catalogue. Spring AI provides a range of DocumentReader implementations to handle different file formats, allowing for easy data extraction and processing. These include JsonReader for parsing JSON files, TextReader for reading plain text documents, and PagePdfDocumentReader for extracting content from PDF files.
Create a directory named data inside the resources folder and place the movies.json file in it. The application will work with a movie dataset that follows this JSON structure:
Dataset
[
  { "id": "01ab23", "title": "Inception", "genre": "Science Fiction", "releaseYear": 2010, "description": "A skilled thief is given a chance at redemption if he can successfully perform an inception: implanting an idea into a target's subconscious." },
  { "id": "02bc34", "title": "The Dark Knight", "genre": "Action", "releaseYear": 2008, "description": "Batman raises the stakes in his war on crime with the help of Lieutenant Jim Gordon and District Attorney Harvey Dent. Together, they dismantle Gotham's criminal organizations, but soon find themselves prey to a reign of chaos unleashed by the Joker." },
  { "id": "04de56", "title": "The Matrix", "genre": "Science Fiction", "releaseYear": 1999, "description": "A computer hacker learns from mysterious rebels about the true nature of his reality and his role in the war against its controllers." },
  { "id": "05ef67", "title": "The Godfather", "genre": "Crime", "releaseYear": 1972, "description": "The aging patriarch of an organized crime dynasty transfers control of his clandestine empire to his reluctant son." },
  { "id": "07gh89", "title": "The Lord of the Rings: The Fellowship of the Ring", "genre": "Fantasy", "releaseYear": 2001, "description": "A meek Hobbit from the Shire and eight companions set out on a journey to destroy the powerful One Ring and save Middle-earth from the Dark Lord Sauron." },
  { "id": "13qr45", "title": "The Lion King", "genre": "Animation", "releaseYear": 1994, "description": "Lion prince Simba and his father are targeted by his bitter uncle, who wants to ascend the throne himself." },
  { "id": "14st56", "title": "Titanic", "genre": "Drama", "releaseYear": 1997, "description": "A seventeen-year-old aristocrat falls in love with a kind but poor artist aboard the luxurious, ill-fated R.M.S. Titanic." },
  { "id": "16wx78", "title": "Jurassic Park", "genre": "Science Fiction", "releaseYear": 1993, "description": "During a preview tour, a theme park suffers a major power breakdown that allows its cloned dinosaur exhibits to run amok." }
]
Next, create a reference to the JSON file and use JsonReader to read the document and store its contents in the VectorStore.
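import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.reader.JsonReader;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Component;

@Component
public class MovieDataLoader {

    // The dataset file under src/main/resources/data
    @Value("classpath:/data/movies.json")
    private Resource resource;

    // JSON fields that become part of each document's content
    private static final String[] KEYS = {"title", "genre", "releaseYear", "description"};

    @Autowired
    private VectorStore vectorStore;

    public void uploadMovies() {
        // Parse the JSON array into one Document per movie
        JsonReader jsonReader = new JsonReader(resource, KEYS);
        List<Document> docs = jsonReader.get();
        // Embed the documents and store them
        vectorStore.add(docs);
    }
}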
This class, MovieDataLoader, includes a method that uses JsonReader to parse the dataset and then inserts the parsed documents into the SimpleVectorStore.
- @Value("classpath:/data/movies.json") is used to inject the JSON file (located in the resources/data directory) as a Resource object. This file contains the movie dataset.
- The KEYS array specifies the fields we're interested in from the JSON file (title, genre, releaseYear, and description).
- The JsonReader object is instantiated with the dataset file and the defined keys, and is responsible for parsing the JSON file into a list of Document objects.
- The parsed documents are added to the VectorStore using the vectorStore.add(docs) method.
Note that the listing does not show where uploadMovies() is called. It has to run once before the first query is served, for example from a CommandLineRunner bean at application startup.
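The effect of the KEYS array can be sketched in plain Java: only the selected fields contribute to the text that is later embedded, so the id field never influences search results. The class KeySelectionSketch and the "key: value" line format are assumptions made for illustration; the exact text JsonReader produces may differ.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative only: mimics the idea of key selection, not JsonReader's actual implementation.
final class KeySelectionSketch {

    // Builds the text for one record from the chosen keys, in the order the keys are listed.
    static String toContent(Map<String, Object> record, List<String> keys) {
        return keys.stream()
                .filter(record::containsKey) // silently skip keys the record does not have
                .map(key -> key + ": " + record.get(key))
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        Map<String, Object> movie = new LinkedHashMap<>();
        movie.put("id", "04de56");
        movie.put("title", "The Matrix");
        movie.put("genre", "Science Fiction");
        movie.put("releaseYear", 1999);

        // "id" is not in the key list, so it never reaches the embedded text.
        System.out.println(toContent(movie, List.of("title", "genre", "releaseYear")));
    }
}
```

Leaving out identifiers and other fields without semantic meaning keeps the embedded text focused on what users are likely to ask about.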
2.3 RAG Service
The MovieRagService class implements the RAG workflow. It computes the vector for the user's query, retrieves relevant documents from SimpleVectorStore, and generates a response using a chat client.
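import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

import org.springframework.ai.chat.messages.Message;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.model.Generation;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.prompt.SystemPromptTemplate;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class MovieRagService {

    @Autowired
    private VectorStore vectorStore;

    @Autowired
    private ChatModel chatClient;

    private final int topK = 5; // Number of top documents to retrieve

    private final String PROMPT_TEMPLATE = """
            You're assisting with questions about movies in a catalog.
            Use the information from the DOCUMENTS section to provide accurate answers.
            If the question involves referring to the release year or genre of a movie, include the movie name in the response.
            If unsure, simply state that you don't know.

            DOCUMENTS:
            {documents}
            """;

    public Generation retrieve(String message) {
        // Build the search request; the store embeds the query text during the search
        SearchRequest searchRequest = SearchRequest.query(message).withTopK(topK);
        // Retrieve the most relevant documents from the vector store
        List<Document> documents = vectorStore.similaritySearch(searchRequest);
        Message systemMessage = createPrompt(documents);
        UserMessage userMessage = new UserMessage(message);
        // Construct the prompt for the AI model
        Prompt prompt = new Prompt(List.of(systemMessage, userMessage));
        // Generate a response using the AI model
        ChatResponse response = chatClient.call(prompt);
        return response.getResult();
    }

    // Joins the retrieved documents and injects them into the system prompt template
    private Message createPrompt(List<Document> similarDocuments) {
        String documents = similarDocuments.stream()
                .map(Document::getContent)
                .collect(Collectors.joining("\n"));
        SystemPromptTemplate systemPromptTemplate = new SystemPromptTemplate(PROMPT_TEMPLATE);
        return systemPromptTemplate.createMessage(Map.of("documents", documents));
    }
}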
The retrieve method creates a SearchRequest object with the user's message and a specified number of top results (topK). This request is used to fetch the most relevant Document objects from the VectorStore via the similaritySearch method, which is also where the query text is turned into an embedding.
The createPrompt method then constructs a system Message containing the concatenated content of these documents. A UserMessage is created with the original user query, and both messages are combined into a Prompt object. This prompt is sent to the ChatModel (chatClient) to generate a ChatResponse, whose first result is returned as a Generation.
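The prompt assembly step is essentially placeholder substitution, which the following plain-Java sketch shows without any Spring AI types. The class PromptAssemblySketch, its shortened template, and the use of String.replace are assumptions for illustration; SystemPromptTemplate uses its own template engine internally.

```java
import java.util.List;

// Illustrative only: shows how retrieved text ends up inside the system prompt.
final class PromptAssemblySketch {

    // Shortened stand-in for the article's PROMPT_TEMPLATE.
    static final String TEMPLATE =
            "Use the information from the DOCUMENTS section to provide accurate answers.\n"
          + "If unsure, simply state that you don't know.\n"
          + "\n"
          + "DOCUMENTS:\n"
          + "{documents}\n";

    // Joins the retrieved snippets with newlines and substitutes them for the placeholder.
    static String systemText(List<String> retrieved) {
        return TEMPLATE.replace("{documents}", String.join("\n", retrieved));
    }

    public static void main(String[] args) {
        List<String> retrieved = List.of(
                "title: Inception, genre: Science Fiction, releaseYear: 2010",
                "title: The Matrix, genre: Science Fiction, releaseYear: 1999");
        System.out.println(systemText(retrieved));
    }
}
```

Because the model only sees what is placed in this text, the value of topK directly controls how much context, and how many prompt tokens, each request carries.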
2.4 Controller
Finally, we expose the RAG service through a REST API:
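import java.util.Map;

import org.springframework.ai.chat.model.Generation;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class MovieRAGController {

    @Autowired
    private MovieRagService movieService;

    // The "title" parameter carries the user's question and is passed to the RAG service as-is
    @GetMapping("/ai/rag/movies")
    public Map<String, Generation> chat(@RequestParam(name = "title") String title) {
        return Map.of("answer", movieService.retrieve(title));
    }
}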
Testing via API
To manually test the endpoint, we can use tools like Postman or curl. Here is how we can test using curl.
curl --location 'http://localhost:8080/ai/rag/movies?title=Inception'
The controller wraps the model's output in a JSON object under the answer key. The generated text can vary between runs, but given the JSON data provided it should draw on the “Inception” entry from our dataset, shown here for reference:
{ "title": "Inception", "genre": "Science Fiction", "releaseYear": 2010, "description": "A skilled thief is given a chance at redemption if he can successfully perform an inception: implanting an idea into a target's subconscious." }
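Since the title parameter is passed to the model as a free-form question, a query containing spaces or punctuation needs to be URL-encoded before it is sent. The sketch below builds such a request URI with the JDK alone; the class RagRequestSketch, the method buildUri, and the sample question are hypothetical names chosen for illustration.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Illustrative only: builds the request URI for the endpoint shown above.
final class RagRequestSketch {

    // Encodes the question so that spaces and punctuation survive as a query parameter.
    static URI buildUri(String baseUrl, String question) {
        String encoded = URLEncoder.encode(question, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/ai/rag/movies?title=" + encoded);
    }

    public static void main(String[] args) {
        URI uri = buildUri("http://localhost:8080", "Which movies are science fiction?");
        // prints http://localhost:8080/ai/rag/movies?title=Which+movies+are+science+fiction%3F
        System.out.println(uri);
    }
}
```

The resulting URI can be passed to curl in quotes or sent with java.net.http.HttpClient while the application is running.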
3. Conclusion
In this article, we explored how to build a Retrieval Augmented Generation (RAG) application using Spring AI and SimpleVectorStore. We demonstrated how to integrate SimpleVectorStore as a vector database for storing and retrieving contextual data, and how to leverage an AI model to generate responses based on user queries.
4. Download the Source Code
This article covered building a Spring AI RAG app.
You can download the full source code of this example here: Spring AI RAG app