Retrieval-Augmented Generation (RAG) Agent
Large language models (LLMs) are trained on a finite dataset and can become outdated over time. Retrieval-augmented generation (RAG) is an architecture that allows LLMs to access external information, such as internal organizational data, scholarly publications, and specialized datasets, to enhance their responses and ensure they remain accurate and up to date [1].
Figure 1 illustrates the graph architecture of a RAG agent, which consists of a start state, two agent nodes, namely an LLM agent and a retriever agent, and an end state. The LLM agent invokes the LLM to generate responses. The retriever agent, in turn, is responsible for retrieving relevant information from designated external resources [2].
In this scenario, the objective is to query the RAG agent about the stock market in 2025. To achieve this, we first select a relevant PDF document and save it locally. We then define the appropriate loaders and functions to read the PDF and divide its content into chunks, ensuring compatibility with the LLM's token-processing limits. Additionally, we introduce overlap between consecutive chunks, which improves the LLM's ability to understand the document's context and maintain coherence across sections [2].
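A minimal sketch of this step, assuming LangChain's PyPDFLoader and RecursiveCharacterTextSplitter, is shown below; the file name and the chunking parameters (1,000 characters per chunk with a 200-character overlap) are illustrative choices rather than prescribed values.

```python
# Load the locally saved PDF and split it into overlapping chunks.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

pdf_path = "stock_market_2025.pdf"    # hypothetical local file name
pages = PyPDFLoader(pdf_path).load()  # one Document per PDF page

# Overlap between consecutive chunks preserves context across boundaries.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(pages)
```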
The next step is to create a Chroma (ChromaDB) instance, an open-source vector database used to store and retrieve vector embeddings. ChromaDB serves as an external knowledge base that provides the LLM with up-to-date information, thereby reducing hallucinations and improving the factual accuracy of generated responses [2].
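The sketch below creates the vector store; the sentence-transformers embedding model, the persistence directory, and the collection name are assumptions introduced here for illustration.

```python
# Embed the chunks and store them in a persistent ChromaDB collection.
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"  # assumed model
)
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="chroma_db",        # hypothetical storage path
    collection_name="stock_market_2025",  # hypothetical collection name
)
```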
Accordingly, a retriever is created from the vector database to extract the most relevant information from the document. In this setup, the retriever performs a similarity search and returns a specified number of chunks, in this case, the top five most relevant chunks, which are then passed to the LLM agent as context for generating accurate responses [2].
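In code, this corresponds to exposing the vector store as a retriever configured for similarity search with k = 5:

```python
# Return the five most similar chunks for a given query.
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 5},
)
```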
Then, we define a shared data structure, called AgentState, to manage and maintain the application's state during execution. Within this structure, the messages field is defined as an Annotated type. Sequence[BaseMessage] represents an ordered collection of messages, ensuring that the conversational history is preserved chronologically. Each message in the sequence is an instance of BaseMessage, the abstract class that serves as the foundation for all message types in LangChain (e.g., HumanMessage, AIMessage, SystemMessage). The use of Annotated allows us to attach metadata (in this case, the add_messages reducer function), which instructs LangGraph to append new messages to the existing state rather than replacing them. This design ensures that the agent can maintain a complete conversation history while updating its state dynamically [2].
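A minimal definition of this state, using LangGraph's add_messages reducer:

```python
from typing import Annotated, Sequence, TypedDict
from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    # add_messages appends new messages to the history instead of
    # overwriting it, preserving the full conversation chronologically.
    messages: Annotated[Sequence[BaseMessage], add_messages]
```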
The tool is defined as a function that takes the query as input, invokes the retriever, and returns the retrieved information. To integrate this tool with the LLM, a list of tools is created and passed to the model, enabling the LLM to invoke the appropriate tool when required [2].
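A sketch of the tool definition follows; the name retriever_tool is hypothetical, and the binding of the tool list to the model appears with the node definitions below.

```python
from langchain_core.tools import tool

@tool
def retriever_tool(query: str) -> str:
    """Search the stock-market document and return the most relevant passages."""
    docs = retriever.invoke(query)  # top-5 chunks from the vector store
    return "\n\n".join(doc.page_content for doc in docs)

tools = [retriever_tool]
```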
As illustrated in Fig. 1, a conditional edge connects the two agent nodes. If additional information is required, the LLM agent queries the retriever agent to obtain relevant data. This decision is made by checking whether the use of a tool (the retriever) is necessary. If no retrieval is needed, the LLM agent considers the query resolved and the graph proceeds to the end state [2].
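One way to express this decision is a routing function that inspects the last message for pending tool calls; the labels retriever_agent and end are illustrative and must match the graph wiring shown later.

```python
def should_continue(state: AgentState) -> str:
    """Route to the retriever agent if the LLM requested a tool call."""
    last_message = state["messages"][-1]
    if getattr(last_message, "tool_calls", None):
        return "retriever_agent"
    return "end"
```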
Next, two agent nodes are defined within the system: (i) the LLM agent is assigned an LLM to handle response generation. While advanced models such as GPT-4o are available, we adopt the pre-trained "Qwen/Qwen2.5-7B-Instruct" model from the Hugging Face API [3] due to practical constraints. This agent node is configured to invoke the LLM with the provided input. (ii) The retriever agent is responsible for invoking tools to retrieve relevant information from external resources [2].
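A sketch of both nodes, assuming LangChain's Hugging Face integration (a valid HUGGINGFACEHUB_API_TOKEN is required) and LangGraph's prebuilt ToolNode as one convenient way to implement the retriever agent:

```python
from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace
from langgraph.prebuilt import ToolNode

# (i) LLM agent: Qwen2.5-7B-Instruct served via the Hugging Face API,
# with the retriever tool bound so the model can request retrieval.
llm = ChatHuggingFace(llm=HuggingFaceEndpoint(repo_id="Qwen/Qwen2.5-7B-Instruct"))
llm_with_tools = llm.bind_tools(tools)

def call_llm(state: AgentState) -> AgentState:
    """Invoke the LLM on the accumulated message history."""
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response]}

# (ii) Retriever agent: executes the tool calls issued by the LLM.
retriever_agent = ToolNode(tools)
```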
The next step involves constructing the graph. We begin by initializing an empty graph in LangGraph, specifying AgentState as its state schema. Next, the LLM agent and retriever agent nodes are added to the graph. The retriever agent node is connected to the LLM agent node via a deterministic edge, while the LLM agent node is linked to both the retriever agent node and the end state through conditional edges. Finally, the graph is compiled and stored in a variable for subsequent execution [2].
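The corresponding assembly, reusing the nodes and routing function defined above:

```python
from langgraph.graph import StateGraph, END

graph = StateGraph(AgentState)      # state schema: AgentState
graph.add_node("llm_agent", call_llm)
graph.add_node("retriever_agent", retriever_agent)
graph.set_entry_point("llm_agent")  # start state -> LLM agent

# Deterministic edge: after retrieval, control returns to the LLM agent.
graph.add_edge("retriever_agent", "llm_agent")

# Conditional edges: the LLM agent either calls the retriever or finishes.
graph.add_conditional_edges(
    "llm_agent",
    should_continue,
    {"retriever_agent": "retriever_agent", "end": END},
)

app = graph.compile()
```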
Lastly, we invoke the compiled graph by passing a query about the stock market, as sketched below. The results demonstrate that the RAG agent successfully retrieves the required information using the designated retriever [2]. The complete implementation script is available on RAG.
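An invocation sketch, with an illustrative query:

```python
from langchain_core.messages import HumanMessage

result = app.invoke(
    {"messages": [HumanMessage(content="How did the stock market perform in 2025?")]}
)
print(result["messages"][-1].content)
```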
References
[1] I. Belcic, "What is RAG (retrieval-augmented generation)?", International Business Machines (IBM), accessed 2025, https://www.ibm.com/think/topics/retrieval-augmented-generation.
[2] freeCodeCamp.org, YouTube, https://youtu.be/jGg_1h0qzaM?si=69DsFmR2TMN259HC.
[3] Hugging Face, https://huggingface.co/.
