# ReAct Agent Explained Simply | Generative AI | AI Agents

## Metadata

- **Channel:** Siddhardhan
- **YouTube:** https://www.youtube.com/watch?v=bOhK-FRR-Ac
- **Date:** 27.04.2026
- **Duration:** 31:25
- **Views:** 606
- **Source:** https://ekstraktznaniy.ru/video/50874

## Description

🤖 My end-to-end Machine Learning & Generative AI Course - Udemy: https://linktr.ee/siddhardhan

In this video, we will understand the conceptual idea of a ReAct Agent in AI.
ReAct stands for Reasoning + Action. It is one of the most important patterns used in AI agents, where an LLM does not just generate an answer directly, but follows a loop of thinking, taking action, observing the result, and then producing the final answer.

Handwritten notes link: https://drive.google.com/file/d/18nz2_k-ZE5rPNoJiGa20iYVrXWoxD3DB/view?usp=sharing

#generativeai #genai #artificialintelligence #ai

## Transcript

### Segment 1 (00:00 - 05:00) []

Hello everyone, I'm Siddharthan. In this video, we are going to understand the conceptual idea of a react agent. Now, when we use a normal LLM, it can reason very well. It can understand the question, break it down, and generate an answer. But the problem is it cannot actually take actions by itself. For example, it cannot search for latest information, it cannot use a calculator, it cannot call an API or interact with external tools unless we design a system around it. That is where the idea of react comes in. So, in this video, we will understand what a react agent is, why it is useful, how the thought, action, and observation loop works, when we should use it, and when it might be a overkill. And once we understand all these conceptual aspects, in the next video, we will take this idea forward and build a react agent from scratch without any agentic frameworks in Python. So, this will be the agenda for today's video, and let's get started. Before moving on to today's topic, let me quickly show you my Udemy courses. Currently, I have a complete generative AI course and a complete machine learning course. In these courses, I have started from the very basics and covered all the way up to the advanced topics, and you will also have a lot of hands-on practice in these courses. We also have capstone projects towards the end that you can work on and add in your resume. So, if you are a beginner or an intermediate learner, these courses will be a perfect fit for you. I'll give the link for these courses in the video description, you can check this out. And I'm also planning to add more courses in this series like deep learning course, ML ops, AI agents, etc. So, look out for that as well. With that being said, let's get started with today's video. So, the first thing that we can discuss about this react agent is what it is and why it is useful. So, let's try to answer these two questions to understand this better. So, here I can say LLMs reason well. 
But the limitation is that they can't act. So, what do I mean by this? Let's say that we are building an LLM-based chatbot with an open-source LLM like Llama 3 70 billion, which is a pretty strong model. If a user asks what machine learning is or what deep learning is, the LLM reasons well, it understands what the user is asking, and it generates an answer. Whereas if a user asks a question like who won the player of the match award in yesterday's game of Punjab Kings versus Delhi Capitals, the LLM cannot answer this. The same powerful LLM cannot answer this question. This is because each LLM has a cutoff date. Let's assume that the cutoff date for this Llama 3 model is December 2025. The actual date might be different, but let's assume this. So, it doesn't know whatever has happened after December 2025, and right now we are in April of 2026, so it doesn't have the information about what happened in yesterday's game. To answer this user's question, the LLM or the system that we are building should have the ability to search the internet. Here, searching the internet is the action part. So, we can conclude by saying that LLMs have an extraordinary ability to reason, they can understand user questions and answer the questions given to them, but they cannot take actions like searching the internet, searching a database, or calling an external API. So, this is where the ReAct agent comes into the picture. So, here I'll write this as ReAct, and ReAct stands for reasoning plus action. So, the idea of a ReAct agent is that we combine two things: one is the reasoning ability of the LLM, and the other is a system that can take actions. So, how do we let the LLM take this action? That is where entities called tools come into the picture. 
So, for an agent, think about the LLM as the brain and the tools as the hands. So, these are the systems that are actually going to take the action. Examples of these tools can be an internet search tool, or a DB tool, or a weather tool where, given a city, it's going to find what the weather is in that particular city. And let's say we are building a tool with external APIs, etc. So, by default, an LLM cannot call these tools

### Segment 2 (05:00 - 10:00) [5:00]

cannot use these tools, but we are building an agent that uses this reasoning power and also has the ability to call these tools. So, this is what a ReAct agent is. Okay, so that is the core idea. Now, with this understanding, let's move on to understand what the flow looks like. So, how do we start with a user query, and how would this ReAct agent answer it? That would involve taking some actions. It doesn't always have to be an internet search or a DB search, it can also be something like booking a flight ticket or scheduling a meeting. These kinds of agents can take any such action. So, now let's understand the flow of ReAct agents. So, I'll write here ReAct agent flow. So, in this flow, first we start with a user question. Let's assume that the question is: find the cheapest flight from Chennai to Mumbai. In this case, we need real-time information, right? As flight ticket prices may fluctuate. So, this is the user's question, which again is trying to find some real-time information, and we know that by default an LLM cannot do it. Now, once we get this user question, we are going to build a prompt. Here we will be building a specific type of prompt called a ReAct prompt. Basically, in this ReAct prompt, we provide the list of tools that are available. So, I'll call this the tools list. Plus, we have a plan of action. Plus, the output format that is required. This prompt is later going to be sent to the LLM. Okay? So, first we start with the question, and with this question, we create the ReAct prompt. It would be like: this is the user question, and this is the list of tools that are available. So, let's say that the tools available are an internet search tool, a weather tool, and a calculator tool. 
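The three parts of the ReAct prompt described above (tools list, plan of action, output format) can be sketched as a plain-text template. This is only an illustration: the template wording, the `Action: tool[input]` format, and the names `REACT_TEMPLATE` and `build_react_prompt` are assumptions, not something shown in the video.

```python
# Hypothetical ReAct prompt template: tools list + plan of action + output format.
REACT_TEMPLATE = """You are an assistant that can use tools.

Available tools:
{tool_list}

Plan of action:
1. Write a Thought explaining what you need to do next.
2. If you need a tool, write an Action line naming the tool and its input.
3. Wait for the Observation, then continue.
4. When you know the answer, write a Final Answer line instead of an Action.

Output format:
Thought: <your reasoning>
Action: <tool_name>[<tool input>]
or
Thought: <your reasoning>
Final Answer: <answer for the user>

Question: {question}
"""


def build_react_prompt(question: str, tools: dict[str, str]) -> str:
    """Fill the template with the tool descriptions and the user question."""
    tool_list = "\n".join(f"- {name}: {desc}" for name, desc in tools.items())
    return REACT_TEMPLATE.format(tool_list=tool_list, question=question)


prompt = build_react_prompt(
    "Find the cheapest flight from Chennai to Mumbai",
    {
        "internet_search": "search the web for up-to-date information",
        "weather": "get the current weather for a city",
        "calculator": "evaluate an arithmetic expression",
    },
)
print(prompt)
```

The resulting string is what gets sent to the LLM, just like any other prompt.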
And let's say the question is about finding the cheapest flight price. For this case, it doesn't need the weather tool or the calculator tool, right? It needs the internet search tool. So, we would say: this is the user question, this is the list of tools that are available, now tell me which tool I have to use, and this is the output format in which you have to answer. So, we frame this prompt and then send it to the LLM, just like any other prompt. Now, let's see what happens next. In the third step, the LLM looks at this prompt, and we know that this prompt also has the user query, right? So, the LLM writes a thought and an action. I'll call this thought plus action. In the thought, it would say that for this user query, I need to use the internet search tool so that I can search for the list of flights that are available and their prices. And the action is the tool call. Tools are nothing but Python functions. So, here, the LLM tells us which function to call, whether it is the calculator function or the internet search function, and it also tells us the input that we have to give to this function. So, as I said, just think about which function we have to call and what input parameters we have to pass. That will be provided by the LLM in the action, because it knows which tools are available and what type of input and output these functions require. So, the LLM writes a thought saying that now I have to use the internet search tool, and the action is that I have to call the internet_search function, and the input should be, let's say, a flight from Chennai to Mumbai. That particular function and its input parameters will be provided by the LLM as plain text. This is important, so remember this point. 
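To make "tools are nothing but Python functions" concrete, here is a small sketch of a tool registry next to an example of the plain text an LLM might write. The function bodies are stubs, and the names `TOOLS`, `describe_tools`, and the `Action: tool[input]` format are assumptions made for illustration.

```python
import inspect


def internet_search(query: str) -> str:
    """Search the web and return a short text summary of the results."""
    # Stub: a real tool would call a search API here.
    return f"(stub) results for: {query}"


def weather(city: str) -> str:
    """Return the current weather for a city."""
    # Stub: a real tool would call a weather API here.
    return f"(stub) weather for: {city}"


def calculator(expression: str) -> str:
    """Evaluate a simple arithmetic expression such as '23 * 47'."""
    # Stub: a real tool would compute the result here.
    return "(stub) calculator result"


# The registry maps the tool name the LLM will write to the actual function.
TOOLS = {fn.__name__: fn for fn in (internet_search, weather, calculator)}


def describe_tools(tools) -> str:
    """Build the tools list for the prompt from each function's docstring."""
    return "\n".join(f"- {name}: {inspect.getdoc(fn)}" for name, fn in tools.items())


# What comes back from the LLM is only text. Nothing has been executed yet.
llm_reply = (
    "Thought: I need current flight prices, so I should search the internet.\n"
    "Action: internet_search[cheapest flight Chennai to Mumbai]"
)

print(describe_tools(TOOLS))
print(type(llm_reply))
```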
Now comes the next step, the fourth step. And this is a common misconception that people have. There is some misunderstanding about where the tool execution happens, whether we provide the tool code to the LLM and the LLM executes it. That's not the case. What happens is that the LLM only tells you which tool you have to call and what input you have to provide to that function. That is all the LLM gives you. So, we have backend code. I'll call this the backend code, or basically a parser. This parser parses the text output that the LLM has given us, which has the action of which tool to call with what input, and then calls the tool. Calling the tool is just like invoking a function. So, here I'll say we have a back-end

### Segment 3 (10:00 - 15:00) [10:00]

code that would parse the text. Text is this uh you know, the one that I have written over here. So, we have this thought and action, right? So, this output the LLM saying that you have to call this particular tool. So, that is being parsed and then this back-end code would call the tool. And from this tool call I'll write it as saying tool returns a result. And in this case, the result can be it would say that book this particular Indigo I mean, it won't wouldn't say that book this flight. It would just say that Indigo flight this particular number the price is this. So, this is the lowest flight ticket that is available between Chennai to Mumbai. So, the tool would return that result. Now, what we would add here is this result that we got would be added to the chat history. Just like the way we would send this chat history to any LLM. So, we would add this tool result to chat history and then loop back to LLM. Basically, again have another call to this LLM. Now, the LLM knows that it has the required information and it can answer this. And the other key idea is that this process is going to be repeated multiple times. So, here I'll say repeat until LLM writes final answer. So, why do we need to repeat this? Let's try to understand this. So, here the result is added to chat history and then we loop back this to the LLM and we would repeat this until we get a final answer. So, in some cases one tool call is not enough. Sometimes it has to search the internet multiple times. In that case, we do this exact same process of like you know, building this react prompt, sending it to the LLM and LLM mainly writing this thought and action and we have to call this back-end code. So, these two are the important steps that we have. Step number three and four, which is LLM telling us what's the tool that we have to call, what is the input and output for it, right? 
And here the back-end code actually identifying that given tool call and calling the tool, getting the result and this result is then sent to this LLM. So, this is the process and as I said, sometimes you have to call multiple tools or you have to call the same tool multiple times. So, this happens in a loop. So, this is the overall process. So, user asks a question like the user asks the question of who won the player of the match in yesterday's game. We have already let's say I've built up react prompt template saying that these are the tools list that are available to you. This is how uh you have to pass the input and output to the tool and this is how I'm expecting my output uh to be. So, we build this prompt, add this user question and send it to the LLM and the LLM would say that to identify yesterday's player of the match, I have to use uh internet search tool and the action would be player of the match DC versus PBKS. Let's say that is the function call that the LLM is suggesting us. And as I said, it's only going to tell you what tool you have to call, what is the input should be. Now, in the back-end code as I said, we have to parse this text, identify the given function call and call that function or call that tool. And this tool would return a result saying that player of the match is KL Rahul and this result is then passed on to this LLM and now LLM would say that now I have the answer, I can answer my user's question. So, it would think that again it would frame this thought and instead of giving this action, it would wouldn't give this action because it has the final answer now. It would say that yesterday's game the player of the match was KL Rahul. So, this is how it's going to work. Whereas in some cases as I said, it has to call this tool multiple time or multiple tools being called multiple times. So, this run in a loop. So, that's why I've given this as repeat until LLM writes the final answer. So, this is how the flow is going to look like. 
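Steps three and four can be sketched as a small parser plus a dispatcher. This assumes the LLM was told to write `Action: tool_name[input]` or `Final Answer: ...`; that format, the regular expressions, and the names `parse_llm_output` and `run_action` are illustrative assumptions, and the search tool is a stub.

```python
import re

# Assumed output format: "Action: tool_name[input]" or "Final Answer: text".
ACTION_RE = re.compile(r"^Action:\s*(\w+)\[(.*)\]\s*$", re.MULTILINE)
FINAL_RE = re.compile(r"^Final Answer:\s*(.+)$", re.MULTILINE | re.DOTALL)


def parse_llm_output(text):
    """Turn the LLM's plain text into a final answer or a tool call."""
    final = FINAL_RE.search(text)
    if final:
        return ("final", final.group(1).strip())
    action = ACTION_RE.search(text)
    if action:
        return ("action", action.group(1), action.group(2).strip())
    return ("unparsed", text)


def run_action(tool_name, tool_input, tools):
    """The back-end code, not the LLM, invokes the Python function."""
    if tool_name not in tools:
        return f"Error: unknown tool '{tool_name}'"
    return tools[tool_name](tool_input)


# Stub tool standing in for a real internet search.
tools = {"internet_search": lambda query: f"(stub) results for: {query}"}

reply = (
    "Thought: I need yesterday's result, so I should search the internet.\n"
    "Action: internet_search[player of the match DC vs PBKS]"
)
kind, name, tool_input = parse_llm_output(reply)
observation = run_action(name, tool_input, tools)
print(kind, name, observation)
```

The `observation` string is what gets appended to the chat history before the next call to the LLM.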
So, this is the entire flow for this. And I hope that you have understood what we have discussed here. The next thing we can discuss to in order to reinforce what we have learned is understanding this three stage loop that happens. So, I'll write this as the three stage loop. Basically, what we have discussed in the above flow, but just like putting it in some concrete terms. There are three components to it. So, first I'll maybe uh you know, draw those little boxes. These are the three steps that happen. Let me use a different color. So, first we have this first box. Let me draw a second box and then the third one.

### Segment 4 (15:00 - 20:00) [15:00]

So, whenever you are building an agent, you have to remember these three things. Right. So, the first one is thought. The next step is an action. And the third one is observation. Okay? Let's try to understand this. So, what is meant by this? Sorry for that weird sound, I have my dog sleeping under my desk. Okay. So, let's say this is about a calculation that the LLM has to do. The thought would be the LLM saying that I need to multiply the given input. So, this is an example of a thought: I need to multiply. The LLM thinks this, and it would say that for this the action is calling the CALC tool or CALC function, which is the calculator function. And let's say the given numbers are 23 into 47. Okay? Let me write this 23 into 47 as a string. Now, this tool would give an output, and let's say the output is 1081. So, let's understand what happens here. Let's say that the user asks: what is the product of 23 and 47? So, this is the user question. Once this is given to the LLM, first it would think that to answer this user question, I have to multiply the given numbers. And it should not multiply them directly, because with more complex calculations the LLM can make mistakes. So, we are telling the LLM: don't rely on your own knowledge of calculation, use a calculator tool. That we would have framed in our ReAct prompt itself. So, it would say that I need to multiply, and for that I need to use the calculator tool. So, it would say: call this calculator tool and provide 23 into 47 as the input. Now, as I said earlier, in this step we have back-end code that parses the entire text of thought, action and all that, and it identifies that the action is calculator and the input is 23 into 47. 
So, this is going to be called and this tool would provide an answer of let's say 1081. So, this 1081 the tool output is basically observation. This observation is again then passed on to the LLM just like the way we have mentioned over here. So, it's added to the chat history. So, it would also see that previous steps of thought, action, etc., the user query and all that and now it also have this observation. Now, it would say that I have the final answer, I can tell the user my final answer. Now, the LLM would say that 23 into 47 is 1081 and this is being sent to the user. So, by default LLM may not work well with like complex multiplications or some mathematical calculations, but we are providing this ability to call this calculator tool, which is the action in this case and now it can answer this better. Similarly, we can provide a DB tool where LLM can call this tool and in turn this tool is going to connect to a database, pull the relevant data and then provide it to the LLM and the LLM can frame the final answer. So, this is the overall idea. You have a thought, action, observation loop and we call this loop because this happens repeatedly. So, here I'll say So, I'll use this loop symbol say that repeat until you get the final answer. So, this is what we basically call like a react loop. So, just like keep that in mind. Thought, action and observation. So, this is the idea. So, these are the core concepts that I wanted to teach you about today about react agents. Let's just like discuss one example and just like understand when to use react and when this is like a overkill. And in the next video as I said earlier, we can also try building this react agent from scratch, which would give us like much more clear depth and knowledge around this. So, let's understand with this simple example. So, here I'll say the user comes up with a query saying what is 23 into 47? 
So, this is the user query and let's say that we have decided that any mathematical calculation the LLM should not answer from its own training knowledge, rather it should use a tool. So, that we would have framed it in our react prompt. Now, once we put this question to this react prompt and we send this to the LLM, the LLM first is going to come up with a thought. So, here I'll write this as thought.
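A calculator tool of the kind used in this example could look like the sketch below. It takes the expression as a string, as in the transcript, and returns the result as a string. Using the `ast` module to allow only basic arithmetic is a design choice made here for safety, not something described in the video.

```python
import ast
import operator

# Only these arithmetic operators are allowed.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str) -> str:
    """Evaluate a basic arithmetic expression given as text."""

    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError(f"unsupported expression: {expression!r}")

    tree = ast.parse(expression, mode="eval")
    return str(_eval(tree.body))


print(calculator("23 * 47"))  # 1081
```

The string returned by the tool is the observation that goes back to the LLM.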

### Segment 5 (20:00 - 25:00) [20:00]

And the LLM would think something like this: I need to multiply these two numbers. So, it would say, use the calculator tool. And it would also tell you what the action is. Let's say I'm writing the action as: call the calc tool, and the input for this would be 23 into 47. So, the LLM only tells you which function or tool you have to call and what input parameters you have to provide. Now, as I said, in the background, we parse the text that is provided by the LLM, we call the actual tool, and we get the output of 23 into 47, which is basically my observation. So, here I'll write this as observation. The calculator tool is just going to give you the output. Let's say in this case the output is 1081. And now again, we send this observation of 1081 to the LLM along with the previous chat history of the question, the thought, the action, etc. And here, again, the LLM would think, and the thought is: I got the final answer to the user query. Now, it wouldn't ask you to call any tool because it doesn't need one. Sometimes it might need to call the tool a second or a third time, or it might need to call a different tool, right? In that case, again, an action would be there. But here the LLM has got the final answer, so it would say: I got the final answer to the user query, now I can provide the final answer. So, now it would give the final answer, saying that 23 into 47 is 1081. So, this is the overall idea. I hope this entire flow is clear. Just understand what happens here, where the tool gets called and what the LLM provides you. If you understand that, that is sufficient. Maybe I'll add it as a note as well, so that you will remember it. 
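The whole 23 into 47 walkthrough can be traced with a scripted stand-in for the LLM. Everything here is hypothetical scaffolding: `fake_llm` returns canned text instead of calling a model, `calc` only handles a single multiplication, and the `Action: calc[...]` format is an assumption.

```python
def fake_llm(history):
    """Scripted stand-in for a real model: asks for the tool once, then answers."""
    if not any(entry.startswith("Observation:") for entry in history):
        return (
            "Thought: I need to multiply these two numbers, so I should use the calculator tool.\n"
            "Action: calc[23 * 47]"
        )
    return (
        "Thought: I got the final answer to the user query.\n"
        "Final Answer: 23 * 47 is 1081."
    )


def calc(expression):
    """Toy calculator tool: handles a single 'a * b' multiplication only."""
    left, right = expression.split("*")
    return str(int(left) * int(right))


def run(question):
    history = [f"Question: {question}"]
    for _ in range(5):  # small cap so the loop cannot run forever
        reply = fake_llm(history)
        history.append(reply)
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip(), history
        # Back-end code: pull the tool input out of the text and call the tool.
        tool_input = reply.split("Action: calc[", 1)[1].rsplit("]", 1)[0]
        history.append(f"Observation: {calc(tool_input)}")
    raise RuntimeError("no final answer within the step limit")


answer, history = run("What is 23 into 47?")
for entry in history:
    print(entry)
    print("---")
print(answer)
```

The chat history ends up with four entries: the question, the first thought plus action, the observation, and the final thought plus answer.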
So, here I'll write it as saying, only writes the action as a text. Action basically like what tool to call, what is the input that you have to provide. Our code, our back-end code basically does the actual tool call and return the output or return the tool result to the LLM, which in turn is going to frame the final answer, okay? So, I'm insisting because as I said, people would kind of confuse this like where this exact tool execution happens, so that is why just remember that we provide the list of tools available, what's the input and output format for this tool, and LLM only tells us that for this question call tool A with this input, B with this input, etc. So, that is all this LLM is going to tell us. So, just like keep that in mind. And finally, let's discuss about when to use this react agent and when it is a overkill. So, here I'll write when to use react. And you can even like think about this react as a type of prompting. I mean, it is an agent as it kind of do stuff, but it's actually a prompting technique, I can say that. Right. So, first is like when it is like a good fit. When we can use this. So, let's try to answer that first. So, you can use this react-based systems if you are building a let's say a multi-step research task. You might have seen this deep research and this thing that are available in chat GPT where once you give this question, enable that and ask it to do a research, it might take like 10, 15 minutes, uh search the internet, search like different articles, papers, etc. and it would come up with this research. So, these are examples of the agent that is working internally that would repeatedly call several tools, finally come up with this research report. So, such kind of multi-step research and multi-step tasks, we can use a react-based systems as we have a loop that is running on continuously

### Segment 6 (25:00 - 30:00) [25:00]

until the final answer is reached. And let's say that you uh need this Let's say you need external data or real-time data. In that cases, we can use a react-based system. As I said, if you want to use a internet search tool to identify uh latest information, in that cases you can use this. Or if you want to build a rag-based system, you can also create this rag as a tool. So, this tool basically, once the LLM calls this, it's going to look at your vector database, return the result. So, this rag systems can also be built as a tool. And let's say that you have tasks with branching logic. So, in that case also, you can use this. Say, for example, you have a agent and uh the agent can answer from local database, let's say a rag database, or for real-time information, it should look for internet search. Now, you can provide these two tools for your agent. One is a rag tool and the other one is a internet search tool. So, there is a branching logic here. And now, the agent, using the LLM's reasoning power, it would decide whether I have to call the rag tool or the internet search tool. So, these kind of where branching logic is present, it's like a good candidate of where you can use this react-based systems. And these are the ones that are like overkill for us. Or basically, in these cases, we might not have to use like a react-based systems. So, let's try to understand that. So, if you are building a simple chat or question and answering systems, then you don't need like a complex agent with like a lot of tools and all that. You can simply build a single prompt-based chatbots. And when it's like pure creative writing, you don't need like a kind of agent to do that as it doesn't require any tools, etc. You can simply use like a default LLM or even you can fine-tune it, but a agent is not required. Let's say you have a simple summarization task. 
So, even this can be done with simple prompting. With zero-shot or even few-shot prompting, you can build a simple summarization bot. So, in these cases, you don't have to build a complex agent for it. And let's say there is another case where you have a strict latency budget. Latency is how quickly your chatbot can answer a user question. Agents call these tools multiple times, so there is increased latency in agent-based systems. In some tasks that latency is okay, but in other tasks we cannot allow this higher latency. In those cases, an agent is probably not a good idea. So, these are the things that I wanted to talk about regarding the ReAct agent. I hope you have understood whatever we have discussed. First we discussed what a ReAct agent is and why we need it. The main reason is that the LLM reasons well, but it cannot take actions. That is where the ReAct agent comes into play: it uses the reasoning ability of the LLM and is also given the capability to take some actions. It can be searching the internet or a database, it can be booking a flight ticket, etc. So, here we build these tools, and these tools act as the hands for the agent and the LLM acts as the brain. And this is the flow that happens. We get a user question, and we already have a ReAct prompt template. To this, we add the user question. This ReAct prompt has the list of tools that the LLM can access, the plan of action, the output format required, etc. And the LLM writes a thought plus action saying that I have to call tool A, tool B, or tool C. And it would say that the action is: call this particular tool with this particular input. 
And our back-end code parses the text, identifies the tool call that is provided by the LLM, and calls the actual tool. As I said, tools are nothing but Python functions that we can write. So, this function returns a result, and this result is added to the chat history and then sent back to the LLM. Now, if the LLM has the final answer, it writes the final answer; otherwise it repeats in a loop until it reaches the final answer. And then we have the three-stage loop. One is thought, which is the LLM identifying which tool to call. Then the action, which is it saying: call this exact tool and provide this exact function input. We then execute it, get the tool result, and provide that as the observation to the LLM, and the LLM repeats until it gets the final answer. One thing to keep in mind is that we always have to provide a condition on the maximum number of retries, or times this loop can run. Otherwise, you

### Segment 7 (30:00 - 31:00) [30:00]

know, the system can go into an infinite loop. That we can discuss in the next video that's coming up. But yeah, that is the overall idea. We have seen this with an example. If the user asks what is 23 into 47, the LLM would have a thought and action saying that for this I need to multiply the two numbers, so I have to use the calculator tool, and this is how you can call the calculator tool. In the back-end code, we execute it and provide the observation to the LLM, and it would think: I got the final answer, and now I can answer the user query. And it would say 23 into 47 is 1081. The main thing is that the LLM only writes the action as plain text. We parse that, we do the actual tool call, and then we send the output to the LLM. And these are the cases for when to use it and when not to. When you want to do multi-step research or multi-step actions, or when you want external data or real-time information, you can use this. Or when you have branching logic, you can use this. For simple systems that can be built with single prompts, we don't need this. And it's also not a good candidate when the latency has to be very low. So, these are the overall concepts. Just go through this and understand the principle and the conceptual aspects. And in the next video, let's try to build a ReAct agent from scratch, so that you can appreciate and understand how all these agentic frameworks do these steps under the hood. So, that is all from my side. I'll meet you in the next lesson.
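The point about capping the number of loop iterations can be sketched as follows. The `max_steps` parameter, the function names, and the fallback message are assumptions for illustration, and `stuck_llm` is a scripted stand-in that never produces a final answer.

```python
def run_with_limit(llm, question, max_steps=5):
    """Run the thought/action/observation loop at most max_steps times."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        reply = llm(history)
        history.append(reply)
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        # A real agent would parse the action and call the tool here.
        history.append("Observation: (tool result would go here)")
    # Without this cap, a model that never finishes would loop forever.
    return f"Stopped after {max_steps} steps without a final answer."


def stuck_llm(history):
    """Scripted stand-in for a model that keeps asking for the same tool."""
    return "Thought: I should search again.\nAction: internet_search[same query]"


print(run_with_limit(stuck_llm, "Who won yesterday's game?", max_steps=3))
```

Returning a fallback message is one option; raising an error or returning the best partial result would also work, depending on the application.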