Reasoning and Acting (ReAct) Agent
When faced with a complex problem, humans typically decompose it into a series of smaller, manageable steps (reasoning) and take actions by leveraging both internal knowledge and external information to solve each step. Similarly, an agent can utilize the reasoning approach to break down a problem into multiple sub-problems and then employ appropriate tools to address each sub-problem effectively. The corresponding agents are called reasoning and acting (ReAct) agents [1].
Figure 1 illustrates the graph architecture of a ReAct agent, which consists of
a start state, two nodes (namely agent and tool), and an end state. The agent
node invokes the large language model (LLM) to perform reasoning and determines
when to leverage the tool node to execute the required actions [2].
Single Tool
In a simplified scenario, the agent utilizes a single tool to calculate the sum of two numbers. Upon receiving a user query, the agent invokes the tool to perform the computation and return the result [2].
We first define a shared data structure, called AgentState, to manage and maintain the application's state during execution. Within this structure, the messages field is defined as an Annotated type. Sequence represents an ordered collection of messages, ensuring that the conversational history is preserved chronologically. Each message in the sequence is an instance of BaseMessage, the base class for all message types in LangChain (e.g., HumanMessage, AIMessage, SystemMessage). The use of Annotated allows us to attach metadata (in this case, the add_messages reducer function), which instructs LangGraph to append new messages to the existing state rather than replacing them. This design ensures that the agent can maintain a complete conversation history while updating its state dynamically [2].
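A minimal sketch of this state definition is shown below, assuming the langgraph and langchain_core packages; the exact imports may differ from the original script.

```python
from typing import Annotated, Sequence
from typing_extensions import TypedDict

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages


class AgentState(TypedDict):
    # add_messages appends new messages to the existing history
    # instead of overwriting it on every state update.
    messages: Annotated[Sequence[BaseMessage], add_messages]
```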
Next, the tool is defined as a function that takes two numbers as input and returns their sum. To integrate this tool with the LLM, a list of tools is created and bound to the model, which enables the LLM to invoke the appropriate tool when required [2].
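A sketch of the addition tool, using LangChain's @tool decorator, might look as follows (the function name is illustrative, not taken from the original script); binding the tool list to the model is shown together with the model setup in the next snippet.

```python
from langchain_core.tools import tool


@tool
def add(a: int, b: int) -> int:
    """Add two integers and return their sum."""
    return a + b


# The tool list that will be bound to the model.
tools = [add]
```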
Thereafter, the agent node is defined within the system by assigning an LLM to handle reasoning and decision-making. While advanced models such as GPT-4o are available, we adopt the pre-trained "Qwen/Qwen2.5-7B-Instruct" model from the Hugging Face API [3] due to certain practical constraints. The agent node is then configured to invoke this LLM with the provided input. It is worth noting that, since the Qwen model does not inherently employ external tools, the system prompt must be carefully crafted to explicitly instruct the model to use the designated tool. Otherwise, the model may default to its built-in capabilities, such as performing calculations internally, rather than invoking the external tool as intended [2].
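One possible way to wire this up with the Hugging Face integration is sketched below; the endpoint setup, node name, and prompt wording are assumptions rather than the original script's code.

```python
from langchain_core.messages import SystemMessage
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint

# Assumed model setup via the Hugging Face Inference API,
# reusing the `tools` list defined above.
llm = ChatHuggingFace(
    llm=HuggingFaceEndpoint(repo_id="Qwen/Qwen2.5-7B-Instruct")
).bind_tools(tools)


def agent_node(state: AgentState) -> AgentState:
    # The system prompt explicitly forces tool use, since the model
    # would otherwise tend to do the arithmetic internally.
    system_prompt = SystemMessage(
        content="You are a helpful assistant. Always use the provided tools "
                "for arithmetic; never compute the result yourself."
    )
    response = llm.invoke([system_prompt] + list(state["messages"]))
    return {"messages": [response]}
```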
As illustrated in Fig. 1, a conditional edge connects the agent node to the tool node. After performing reasoning and decomposing an input query into multiple sub-tasks, the agent evaluates whether a tool is required to address the next sub-task. If no tool is needed, the agent considers the problem solved and terminates by returning its final response to the query [2].
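This routing decision can be expressed as a small function that inspects the last message for pending tool calls; the function and label names below are a sketch and may differ from the original implementation.

```python
def should_continue(state: AgentState) -> str:
    """Route to the tool node if the last LLM response requested a tool call."""
    last_message = state["messages"][-1]
    if getattr(last_message, "tool_calls", None):
        return "continue"  # a tool call is pending
    return "end"           # no tool needed; finish the run
```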
The next step involves constructing the graph. We begin by initializing an empty graph in LangGraph, specifying AgentState as its state schema. Next, the agent and tool nodes are added to the graph. The tool node is connected to the agent via a deterministic edge, while the agent node is linked to both the tool node and the end state through conditional edges. Finally, the graph is compiled and stored in a variable for subsequent execution [2].
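Putting the pieces together, the graph could be assembled roughly as follows; the node names are illustrative.

```python
from langgraph.graph import StateGraph, END
from langgraph.prebuilt import ToolNode

graph = StateGraph(AgentState)
graph.add_node("agent", agent_node)
graph.add_node("tools", ToolNode(tools))

graph.set_entry_point("agent")
# Conditional edges: keep calling tools until the agent stops requesting them.
graph.add_conditional_edges(
    "agent", should_continue, {"continue": "tools", "end": END}
)
# Deterministic edge: tool output always flows back to the agent.
graph.add_edge("tools", "agent")

app = graph.compile()
```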
Lastly, we invoke the compiled graph by providing a conversation as input. The results demonstrate that the ReAct agent successfully reasons over the query and utilizes the designated tool to respond to the user’s request [2]. The complete implementation script is available on ReAct - Single Tool.
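A minimal invocation might look like this; the query text is illustrative.

```python
from langchain_core.messages import HumanMessage

result = app.invoke({"messages": [HumanMessage(content="What is 12 + 30?")]})
print(result["messages"][-1].content)
```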
Multiple Tools
In the second scenario, we increase the system's complexity by incorporating additional tools. The AgentState, LLM, and overall graph architecture remain largely the same as in the previous single-tool setup. The primary modification involves the tool set, where two new tools, namely subtract tool and multiply tool, are added to the system. These tools are responsible for computing the difference and the product of two numbers, respectively [2].
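The two additional tools could be defined in the same style as the addition tool, as sketched below with assumed names.

```python
@tool
def subtract(a: int, b: int) -> int:
    """Return the difference a - b."""
    return a - b


@tool
def multiply(a: int, b: int) -> int:
    """Return the product of a and b."""
    return a * b


# Extended tool set, re-bound to the model as before.
tools = [add, subtract, multiply]
```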
Finally, we invoke the compiled graph with a conversation that includes three distinct tasks: summing two numbers, multiplying the result by 6, and subtracting 10 from the result. The detailed execution steps demonstrate that the ReAct agent successfully decomposes the query into individual sub-tasks, selects the most appropriate tool for each step, and produces both the intermediate results and the final outcome [2]. The complete implementation script is available on ReAct - Multiple Tools.
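To inspect the intermediate steps, the compiled graph can be streamed instead of invoked once; the query wording and streaming mode below are assumptions.

```python
from langchain_core.messages import HumanMessage

query = HumanMessage(
    content="Add 3 and 4, multiply the result by 6, then subtract 10."
)
# stream_mode="values" yields the full state after each node, so every
# intermediate tool call and result is visible as the agent works.
for state in app.stream({"messages": [query]}, stream_mode="values"):
    state["messages"][-1].pretty_print()
```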
References
[1] S. Yao et al., “ReAct: Synergizing reasoning and acting in language models,” 2023, https://arxiv.org/abs/2210.03629.
[2] freeCodeCamp.org, https://youtu.be/jGg_1h0qzaM?si=69DsFmR2TMN259HC.
[3] Hugging Face, http://huggingface.co/
