Create a Retrieval-Augmented Generation (RAG) App with Vector Stores and Spring AI

Omozegie Aziegbe | September 23rd, 2024 | Last Updated: September 20th, 2024


Vector databases often serve as memory for AI applications, especially those built on large language models (LLMs). They enable semantic search, which finds relevant information to enrich the prompts sent to an LLM. This leads to more accurate and context-aware responses, making vector databases a natural fit for Retrieval Augmented Generation (RAG) applications. In this article, we will build a Spring AI RAG app using SimpleVectorStore. Spring AI simplifies the process of integrating AI models with vector stores. By using SimpleVectorStore to store and retrieve embeddings, we will show how to build a system that generates context-driven responses to user queries.

1. Introduction to RAG

Retrieval Augmented Generation (RAG) is a smart way to improve how AI systems answer questions or create content by combining two steps: retrieving useful information and generating responses. Instead of just relying on what the AI knows, RAG pulls in extra data that helps the system understand the question better and provide more accurate, context-aware answers.

1.1 Vector Databases in RAG

A key component in making the Retrieval Augmented Generation (RAG) pattern effective is the vector database. A vector database is specifically designed to store and manage vector embeddings. Several vector databases can be integrated with RAG systems:

Redis: Redis, especially with Redis Stack, supports vector similarity search, making it a great option for embedding retrieval.
Weaviate: Another open-source option, Weaviate is tailored for managing and searching through vectors.
Pinecone: A cloud-native vector database specifically designed for machine learning workloads.
Milvus: An open-source vector database that specializes in managing massive quantities of unstructured data and provides fast, scalable search.
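
To make "vector similarity search" a little more concrete, here is a minimal sketch in plain Java of cosine similarity, one common way to compare two embeddings. The class name CosineSimilarity is hypothetical, and the short arrays stand in for real embeddings, which typically have hundreds or thousands of dimensions. It is an illustration of the idea, not how any of the databases above is implemented.

```java
// Toy illustration of vector similarity; not tied to any particular database.
final class CosineSimilarity {

    private CosineSimilarity() {
    }

    /** Returns the cosine of the angle between two equal-length, non-zero vectors. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("Vectors must be non-zero");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Vectors pointing in the same direction score close to 1, while unrelated (orthogonal) vectors score close to 0, which is what lets a store rank documents by how close they are to a query.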

1.2 Benefits and Applications of RAG

RAG helps in a few important ways:

Improved privacy: You can use data that the AI wasn’t trained on, meaning you don’t have to worry about the AI knowing sensitive information beforehand.
Better context: The system can pull in relevant information to understand the user’s question more deeply.
Higher accuracy: By grounding answers in retrieved facts, RAG helps reduce hallucinations (cases where the AI makes things up).
Flexible applications: It can be used for various tasks like answering questions, creating summaries, or powering chatbots.

By combining the strengths of retrieving information and generating responses, RAG is great for:

Customer Support: Quickly answers customer queries by pulling from relevant documents or FAQs.
Personalized Recommendations: Offers tailored product or content suggestions using real-time data.
Knowledge Base Search: Enhances search accuracy by synthesizing answers from multiple sources.
Content Creation: Helps generate summaries or articles by combining information from various sources.

2. Implementing RAG in Java Spring

In a RAG workflow, data is loaded into a vector store. When a query is received, the store retrieves the documents most relevant to the query, which are then used to generate a response with the help of an AI model. We will use a dataset containing information about movies, including attributes such as title, genre, release year, and a brief description. This dataset will be loaded into SimpleVectorStore to illustrate the RAG process.

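To build a Retrieval Augmented Generation (RAG) application, create a Spring Boot project using Spring Initializr or any IDE and add the following dependencies to the pom.xml. The Spring AI starter is declared without a version, which assumes the Spring AI BOM (spring-ai-bom) is imported in the dependencyManagement section, as Spring Initializr does when Spring AI is selected: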

		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-web</artifactId>
		</dependency>
		<dependency>
			<groupId>org.springframework.ai</groupId>
			<artifactId>spring-ai-openai-spring-boot-starter</artifactId>
		</dependency>

Make sure that the application.properties is set up like this:

spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.embedding.api-key=${OPENAI_API_KEY}

Next, we need to set up a Vector Store to implement a RAG example. The Vector Store holds our data along with its vector embeddings, allowing us to perform semantic searches and find the most relevant information for a user’s query.

2.1 Application Configuration

Spring AI offers a VectorStore interface designed for storing and retrieving embeddings. It offers various implementations, including SimpleVectorStore, RedisVectorStore, ChromaVectorStore, Neo4jVectorStore, and PgVectorStore, among others. Each implementation is tailored for specific use cases. For instance, SimpleVectorStore is ideal for smaller, in-memory applications, while RedisVectorStore excels in distributed environments requiring scalability.

In this article, we will use the SimpleVectorStore to handle the storage and retrieval of embeddings, providing a straightforward approach for managing vector data. Begin by creating a class named AppConfig, and then add the following to your AppConfig class.

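import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.ai.vectorstore.SimpleVectorStore;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class AppConfig {

    // In-memory vector store; uses the auto-configured EmbeddingModel to embed documents and queries
    @Bean
    VectorStore vectorStore(EmbeddingModel embeddingModel) {
        return new SimpleVectorStore(embeddingModel);
    }

}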
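
To give a feel for what an in-memory vector store does conceptually, here is a toy sketch in plain Java. It is not Spring AI's SimpleVectorStore: the class ToyVectorStore and its methods are hypothetical, and it assumes the embeddings have already been computed and are non-zero. It simply keeps entries in a list and ranks them by cosine similarity to the query vector.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Toy in-memory store; a conceptual sketch, NOT Spring AI's SimpleVectorStore.
final class ToyVectorStore {

    /** A stored text together with its precomputed embedding. */
    private static final class Entry {
        final String text;
        final double[] embedding;

        Entry(String text, double[] embedding) {
            this.text = text;
            this.embedding = embedding.clone();
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void add(String text, double[] embedding) {
        entries.add(new Entry(text, embedding));
    }

    /** Returns the texts of the topK entries most similar to the query vector, best match first. */
    List<String> similaritySearch(double[] query, int topK) {
        return entries.stream()
                .sorted(Comparator.comparingDouble((Entry e) -> cosine(query, e.embedding)).reversed())
                .limit(topK)
                .map(e -> e.text)
                .collect(Collectors.toList());
    }

    // Assumes equal-length, non-zero vectors.
    private static double cosine(double[] a, double[] b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

A linear scan like this is adequate for a handful of documents, which is the kind of workload an in-memory store suits; dedicated vector databases use indexes to avoid comparing the query against every entry.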

2.2 Loader Service

Now that everything is set up, we can embed some data into our Vector Store and test the model’s responses. We will start by embedding a sample dataset. For this, we will import a JSON document containing a movie catalogue. Spring AI provides a range of DocumentReader implementations to handle different file formats, allowing for easy data extraction and processing. These include JsonReader for parsing JSON files, TextReader for reading plain text documents, and PagePdfDocumentReader for extracting content from PDF files.

A directory named data should be created inside the resources folder, and the movies.json file should be placed within it. The application will work with a movie dataset that follows this JSON structure:

Dataset

[
  {
    "id": "01ab23",
    "title": "Inception",
    "genre": "Science Fiction",
    "releaseYear": 2010,
    "description": "A skilled thief is given a chance at redemption if he can successfully perform an inception: implanting an idea into a target's subconscious."
  },
  {
    "id": "02bc34",
    "title": "The Dark Knight",
    "genre": "Action",
    "releaseYear": 2008,
    "description": "Batman raises the stakes in his war on crime with the help of Lieutenant Jim Gordon and District Attorney Harvey Dent. Together, they dismantle Gotham's criminal organizations, but soon find themselves prey to a reign of chaos unleashed by the Joker."
  },
  {
    "id": "04de56",
    "title": "The Matrix",
    "genre": "Science Fiction",
    "releaseYear": 1999,
    "description": "A computer hacker learns from mysterious rebels about the true nature of his reality and his role in the war against its controllers."
  },
  {
    "id": "05ef67",
    "title": "The Godfather",
    "genre": "Crime",
    "releaseYear": 1972,
    "description": "The aging patriarch of an organized crime dynasty transfers control of his clandestine empire to his reluctant son."
  },
  {
    "id": "07gh89",
    "title": "The Lord of the Rings: The Fellowship of the Ring",
    "genre": "Fantasy",
    "releaseYear": 2001,
    "description": "A meek Hobbit from the Shire and eight companions set out on a journey to destroy the powerful One Ring and save Middle-earth from the Dark Lord Sauron."
  },
  {
    "id": "13qr45",
    "title": "The Lion King",
    "genre": "Animation",
    "releaseYear": 1994,
    "description": "Lion prince Simba and his father are targeted by his bitter uncle, who wants to ascend the throne himself."
  },
  {
    "id": "14st56",
    "title": "Titanic",
    "genre": "Drama",
    "releaseYear": 1997,
    "description": "A seventeen-year-old aristocrat falls in love with a kind but poor artist aboard the luxurious, ill-fated R.M.S. Titanic."
  },
  {
    "id": "16wx78",
    "title": "Jurassic Park",
    "genre": "Science Fiction",
    "releaseYear": 1993,
    "description": "During a preview tour, a theme park suffers a major power breakdown that allows its cloned dinosaur exhibits to run amok."
  }
]

Next, create a reference to the JSON file and use JsonReader to read the document and store its contents in the VectorStore.

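import java.util.List;

import jakarta.annotation.PostConstruct;
import org.springframework.ai.document.Document;
import org.springframework.ai.reader.JsonReader;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Component;

@Component
public class MovieDataLoader {

    @Value("classpath:/data/movies.json")
    private Resource resource;

    // JSON fields whose values become the text of each Document
    private static final String[] KEYS = {"title", "genre", "releaseYear", "description"};

    @Autowired
    private VectorStore vectorStore;

    // Runs once after dependency injection so the store is populated before the first query
    @PostConstruct
    public void uploadMovies() {
        JsonReader jsonReader = new JsonReader(resource, KEYS);
        List<Document> docs = jsonReader.get();
        vectorStore.add(docs);
    }

}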

This class, MovieDataLoader, includes a method that uses JsonReader to parse the dataset and then inserts the parsed documents into the SimpleVectorStore.

@Value("classpath:/data/movies.json") is used to inject the JSON file (located in the resources/data directory) as a Resource object. This file contains the movie dataset.
The KEYS array specifies the fields we’re interested in from the JSON file (e.g., title, genre, releaseYear, and description).
A JsonReader is instantiated with the dataset file and the defined keys; it parses the JSON file into a list of Document objects.
The parsed documents are added to the VectorStore using the vectorStore.add(docs) method.
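
The effect of selecting keys can be sketched in plain Java. The class KeySelectionSketch and the "key: value" layout below are assumptions made for illustration; the exact text that JsonReader produces may differ. The point is only that fields outside KEYS, such as id, do not reach the text that gets embedded.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of key selection; the output layout is an assumption, not JsonReader's contract.
final class KeySelectionSketch {

    private KeySelectionSketch() {
    }

    /** Builds one text block from the chosen keys of a record, skipping every other field. */
    static String toDocumentText(Map<String, Object> record, List<String> keys) {
        StringBuilder sb = new StringBuilder();
        for (String key : keys) {
            if (record.containsKey(key)) {
                sb.append(key).append(": ").append(record.get(key)).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> movie = new LinkedHashMap<>();
        movie.put("id", "01ab23");
        movie.put("title", "Inception");
        movie.put("genre", "Science Fiction");
        movie.put("releaseYear", 2010);
        // The id field is left out because it is not among the selected keys.
        System.out.print(toDocumentText(movie, List.of("title", "genre", "releaseYear")));
    }
}
```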

2.3 RAG Service

The MovieRagService class implements the RAG workflow. It builds a search request from the user’s query, retrieves relevant documents from SimpleVectorStore (which embeds the query during the search), and generates a response using a chat model.

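import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

import org.springframework.ai.chat.messages.Message;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.model.Generation;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.prompt.SystemPromptTemplate;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class MovieRagService {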

    @Autowired
    private VectorStore vectorStore;

    @Autowired
    private ChatModel chatClient;

    private final int topK = 5; // Number of top documents to retrieve

    public Generation retrieve(String message) {
        // Build a search request; the vector store embeds the query text during the search
        SearchRequest searchRequest = SearchRequest.query(message).withTopK(topK);
        // Retrieve the most relevant documents from the vector store
        List<Document> documents = vectorStore.similaritySearch(searchRequest);
        Message systemMessage = createPrompt(documents);
        UserMessage userMessage = new UserMessage(message);
        // Construct the prompt for the AI model
        Prompt prompt = new Prompt(List.of(systemMessage, userMessage));
        // Generate a response using the AI model
        ChatResponse response = chatClient.call(prompt);
        return response.getResult();
    }

    private Message createPrompt(List<Document> similarDocuments) {
        String documents = similarDocuments.stream().map(Document::getContent).collect(Collectors.joining("\n"));
        SystemPromptTemplate systemPromptTemplate = new SystemPromptTemplate(PROMPT_TEMPLATE);
        return systemPromptTemplate.createMessage(Map.of("documents", documents));
    }

    private static final String PROMPT_TEMPLATE = """
You're assisting with questions about movies in a catalog.
Use the information from the DOCUMENTS section to provide accurate answers.
If the question involves referring to the release year or genre of a movie, include the movie name in the response.
If unsure, simply state that you don't know.

DOCUMENTS:
{documents}
""";
}

The retrieve method creates a SearchRequest object with the user’s message and a specified number of top results (topK). This request is used to fetch the most relevant Document objects from the VectorStore via the similaritySearch method.

The createPrompt method then constructs a Message containing the concatenated content of these documents. A UserMessage is created with the original user query, and both messages are combined into a Prompt object. This prompt is sent to the ChatModel (chatClient) to generate a ChatResponse, which is returned as the result.
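
To see what the model actually receives, the prompt assembly can be sketched without any Spring AI types. In this plain-Java illustration, the hypothetical class PromptStuffingSketch uses String.replace as a stand-in for SystemPromptTemplate, and the template is an abbreviated version of the one above.

```java
import java.util.List;

// Plain-Java stand-in for the template step; String.replace substitutes for SystemPromptTemplate.
final class PromptStuffingSketch {

    // Abbreviated version of the article's template, with the same {documents} placeholder.
    static final String TEMPLATE =
            "You're assisting with questions about movies in a catalog.\n"
            + "Use the information from the DOCUMENTS section to provide accurate answers.\n"
            + "If unsure, simply state that you don't know.\n"
            + "\n"
            + "DOCUMENTS:\n"
            + "{documents}\n";

    private PromptStuffingSketch() {
    }

    /** Joins the retrieved texts with newlines and places them in the DOCUMENTS section. */
    static String render(List<String> retrievedTexts) {
        String documents = String.join("\n", retrievedTexts);
        return TEMPLATE.replace("{documents}", documents);
    }

    public static void main(String[] args) {
        System.out.print(render(List.of(
                "title: Inception genre: Science Fiction",
                "title: The Matrix genre: Science Fiction")));
    }
}
```

The rendered text becomes the system message, and the user's question travels alongside it as a separate user message, so the model answers with the retrieved records in view.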

2.4 Controller

Finally, we expose the RAG service through a REST API:

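import java.util.Map;

import org.springframework.ai.chat.model.Generation;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class MovieRAGController {

    @Autowired
    private MovieRagService movieService;

    // The "title" parameter carries the user's question or movie title
    @GetMapping("/ai/rag/movies")
    public Map<String, Generation> chat(@RequestParam(name = "title") String title) {
        return Map.of("answer", movieService.retrieve(title));
    }

}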

Testing via API

To manually test the endpoint, we can use tools like Postman or curl. Here is how we can test using curl.

curl --location 'http://localhost:8080/ai/rag/movies?title=Inception'

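Because the controller wraps the model's output in a map, the actual response is a JSON object whose answer field contains the generated reply, and the exact wording will vary between runs. For this query, the reply should draw on the “Inception” record retrieved from our dataset: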

{
  "title": "Inception",
  "genre": "Science Fiction",
  "releaseYear": 2010,
  "description": "A skilled thief is given a chance at redemption if he can successfully perform an inception: implanting an idea into a target's subconscious."
}
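
The same request can also be issued from Java. The sketch below uses the JDK's built-in HttpClient; the class name MovieRagClient is hypothetical, and it assumes the application is running at the base URL passed in. Because the title parameter can carry a free-form question, the value is URL-encoded before being appended to the query string.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical client for the /ai/rag/movies endpoint defined in the controller.
final class MovieRagClient {

    private MovieRagClient() {
    }

    /** Builds the request URI, URL-encoding the question so spaces and punctuation are safe. */
    static URI buildUri(String baseUrl, String question) {
        String encoded = URLEncoder.encode(question, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/ai/rag/movies?title=" + encoded);
    }

    /** Sends a GET request and returns the raw JSON body; requires the application to be running. */
    static String ask(String baseUrl, String question) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(buildUri(baseUrl, question)).GET().build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

For example, MovieRagClient.ask("http://localhost:8080", "Which movies are science fiction?") would return the JSON body as a string, assuming the server is up and the OpenAI key is configured.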

3. Conclusion

In this article, we explored how to build a Retrieval Augmented Generation (RAG) application using Spring AI and SimpleVectorStore. We demonstrated how to integrate SimpleVectorStore as a vector database for storing and retrieving contextual data, and how to leverage an AI model to generate responses based on user queries.

4. Download the Source Code

This article covers building a Spring AI RAG app.

Download
You can download the full source code of this example here: Spring AI RAG app
