# Build a ReAct AI Agent from Scratch in Python | No LangChain, No Frameworks

## Метаданные

- **Канал:** Siddhardhan
- **YouTube:** https://www.youtube.com/watch?v=5hnt-bWeeOM
- **Дата:** 30.04.2026
- **Длительность:** 1:05:44
- **Просмотры:** 1,217

## Описание

🤖 My end-to-end Machine Learning  & Generative AI Course - Udemy: https://linktr.ee/siddhardhan

Learn how AI agents really work by building a ReAct (Reasoning + Acting) agent from scratch in pure Python — no LangChain, no LlamaIndex, no frameworks. Just you, an LLM, and a simple loop.

In this hands-on tutorial, we break down the ReAct pattern into its core components and implement an agent that can think, call tools, observe results, and reason its way to a final answer

Colab file link: https://drive.google.com/file/d/1A2LqtfgQvsUI3fO4e-Ys2Ve5CsRPPF5b/view?usp=sharing

#generativeai #genai #artificialintelligence #ai

## Содержание

### [0:00](https://www.youtube.com/watch?v=5hnt-bWeeOM) Segment 1 (00:00 - 05:00)

Hello everyone. I'm Sidharthan. In the previous lesson, we understood the core idea behind a react agent. We have seen how it thinks, takes an action, observes the result, and continues this loop until it reaches the final answer. Now, in this lesson, we will take that concept and build it from scratch using Python. You might ask, when we already have agentic frameworks like LangChain, CrewAI, LangGraph, etc., why build this manually? The reason is simple. When we understand the internals, we can debug better, we can customize better, and use these frameworks more confidently, and we can have more control on how we build these agents. We will not directly depend on any agent framework. Instead, we will understand the internal flow step by step, like how the prompt is structured, how the model decides which tool to call, how we execute that tool, and how we feed the observation back, and finally produce the answer. By the end of this video, you will clearly understand how a react agent actually works behind the scenes, not just theoretically, but by building one yourself. So, this will be the agenda for this video, and let's get started. Before moving on to today's topic, let me quickly show you my Udemy courses. Currently, I have a complete generative AI course and a complete machine learning course. In these courses, I've started from the very basics and covered all the way up to the advanced topics, and you will also have a lot of hands-on practice in these courses. We also have capstone projects towards the end that you can work on and add in your resume. So, if you are a beginner or an intermediate learner, these courses will be a perfect fit for you. I'll give the link for these courses in the video description. You can check this out. And I'm also planning to add more courses in this series like deep learning course, ML Ops, AI agents, etc. So, look out for that as well. With that being said, let's get started with today's video. Let us build this react agent in a Google Colab environment. I'm connected to my session over here, as you can see. And in the previous lesson, we have seen that we have a three-stage loop, think, act, and observe, where the LLM thinks what it has to do, and what tool it has to call, what is the input for that tool, and action is calling that tool, executing that tool, getting the output from that tool, and providing this output or this result as observation to the LLM back, and the LLM would finally provide us the answer. So, this is the agent loop that we have discussed in terms of a react agent system. So, now let's try to build one. The first step that I'm going to do is just like install the required libraries. So, I'll call this as setup. All right. So, here I'm going to install Grok. And I'll put a hyphen Q over here. So, here we are going to use Grok as our LLM provider. So, we have discussed that in our agent, especially in a react agent, LLM is like the brain and tools are like hands, right? The hands are the ones that actually do the work, and the brain tells what action we have to do and how to do it. Now, as I said, I'll be using Grok as they should provide us some open-source LLMs that could give us responses quickly, but you can simply replace this Grok LLM with, you know, OpenAI LLMs or Gemini, Ollama LLMs. So, feel free to do that. I'm just like going with Grok and maybe use a Llama 3 LLM. So, that's the plan. So, we have installed the required libraries here. Grok, as I said, if you want to use OpenAI or Ollama, install all those required libraries here. Now, I'm going to import the required dependencies. So, first, I'll say import OS, import RE, and import getpass. So, all these three things I'm importing in one line. This is same as saying import OS, import RE, and import getpass. So, instead of writing it in three lines, we can just like write this in a single line in a simple way. And from Grok, I'm going to import Grok with upper case G. So, this is my Grok client. So, let's run this two lines of code. So, these are our import lines. Right. Next step is connecting to Grok client. So, I'll create this text cell and say that get a free key at console. grok. com. And I've explained several times how you can get this key. Simply go to the site, sign up, and you can get a free tier key from it, and you can access the Llama 3 models. Again, I'll be using a 70 billion Llama 3. 3 model, but you can use other models as well. Some models may not work that well with, you know, tool calling, but this Llama 3. 3 works well. So, just like have that in mind as well. So, I'll say client is equal to call this Grok that we have imported. And within this, we have to pass Grok API key. So, what I'll say is API key is equal to os. environ. And here, we have to mention

### [5:00](https://www.youtube.com/watch?v=5hnt-bWeeOM&t=300s) Segment 2 (05:00 - 10:00)

grok_api_key. So, basically, the idea is that instead of providing or hardcoding the API key directly in our code, which is not a secure way, we can get this Grok API key from the environment variable. Now, before doing this step, we have to save this API key in the environment variable that we have provided over here. So, I'll create another code cell above this and say os. environ. Same thing that we have mentioned earlier, and say grok_api_key. Right. Now, what I'll do is I'll say is equal to getpass, which is the library that we have imported over here, and we can say getpass. getpass, which is the actual method that would help us to get a password or API kind of a data. And here, I'll say provide your Grok API key. So, basically, the idea here is when I run this, this is going to prompt the person who is running this notebook to paste their Grok API key. And when I paste this here, it the key won't be visible here. So, that's just like how you would type a password or paste a password. That's the purpose of using this getpass to just make sure that we are securely putting this API key in the notebook. So, let me quickly run this and paste my API key. So, I already have my Grok API key ready. So, I'll quickly copy that. And let me paste it here. Right. So, in this input box, I can paste it. You can see that I have pasted it, but it is not visible. So, this API key that I pasted will be saved in this environment variable called as grok_api_key, as we have used this os. environ. Now, I can share this notebook. Still, my API key is not visible. When a person is running this notebook, they can again run this particular cell. They can provide their own API key, but it won't be visible here. So, this is like a kind of secured way to do this. But in actual production environments, we might use a. env file. So, that's the idea. So, first, we have imported the required libraries, and then we have imported this Grok client. And to use this Grok client, we need this Grok API key to be saved in the environment variable. So, this step is saving the key, and this step is calling it. So, this os. environ will load the saved API key from the environment variable and provide this to this parameter called as API key for this Grok client. And in the next line, I'm going to create a variable called as model and mention the exact model that I'm going to use. So, I'm going to use a Llama 3 Llama 3. 3 70 billion versatile model, as I mentioned earlier. You can just like go through Grok docs in order to, you know, check the other models that are available. So, it would have like a kind of like several open-source models that are hosted there. So, you can use like model of your choice, but this 3. 3 70 billion, which is a pretty larger model, is a good candidate for us. Right. This is the first step, which is like accessing this Grok client. Next step is creating these tools. So, I'll create another text cell over here. So, plain Python functions registered in a dict. So, this is the next step that we are going to do. And as I've mentioned in the previous video as well, tools are nothing complex. Think about these tools as simple Python functions that we can execute. Here, for this particular lesson, we are going to build three tools. So, we will create three functions, basically. So, one is a calculator third one is a word count function. So, let's see how we can do that. So, I'll say d e f calculator. And the input for this function would be an expression, which will be of data type string, and the output type will be a string. So, this is the input parameter, we know, right? So, when we call this calculator function, we have to provide an expression. I'll talk about what this expression is, but just like look at this colon string that we have mentioned over here. So, what this is it's like way of telling that this function needs this input parameter expression, and it is of type string. And this symbol represents the output data type is also going to be a string. So, whenever we are creating these tools for agents, it's always recommended to provide these types clearly, the data types, the input data types, and the output data types clearly, because some, you know, agentic frameworks would look at these tool functions, get these data types, get the docstrings, etc., and provide it to the LLM, so that the LLM basically knows how it should call the tool, etc. Maybe I'll just like explain it after providing this docstring and the code for this particular tool. So, I'll quickly copy and paste it and explain what is basically happening over here. Right. So, this is all good. One second. So, this

### [10:00](https://www.youtube.com/watch?v=5hnt-bWeeOM&t=600s) Segment 3 (10:00 - 15:00)

All right. So, as I said, when you use this agentic frameworks, we would build tools exactly like this and we would add a decorator, something like adding, you know, at tool or something like that. So, basically what happens here is when we send this user query to the prompt, the tools that are available will also be sent to the LLM and the LLM would look at the list of tools that are available and say that for this query you have to use a calculator tool, for the second get weather tool, etc. Now, we need to provide information on what this particular calculator tool do, what is the input parameter required, what is the data type, etc. So, that's why we provide all these information and these agentic frameworks would take all these informations as metadata for this function and provide it to the LLM. Say, for example, it not only gives the name of the tool that is calculator, but it can also provide this docstring that we have mentioned here. Now, the LLM has a more clear idea on what a tool does by looking at this docstring. It has a more clear idea on what's the input data type that it has to provide because we have mentioned it as string and all that. So, that is the idea. So, to put it simply, whenever you are creating tools for your agent, always clearly mention the purpose of the tool in your docstring, mention the data type of input and the output clearly. So, that's an important step. Now, in this calculator function that we have created, the important part is this eval that we have mentioned over here. Now, I'll explain you what is this eval. This is like a built-in method or a function that we have in Python. So, inside this eval, if I provide something like 30 asterisk symbol 10, which is like 30 into 10 in the language of Python, if I run this, this is going to execute this string that I have provided. So, basically, it evaluates the string that we have provided and executes this in Python. Any Python code given in string can be evaluated and basically executed using this eval. So, here I'm providing this mathematical expression. I can also like provide something like 30 minus five, which is now in the form of a string, right? So, this entire thing is provided in the string. Now, when I run this, it's going to do that computation. So, it's basically a way of evaluating and executing Python code or Python commands. We can just like leverage that and use this as our calculator tool. Instead of creating like, you know, separate addition, subtraction, division in different if-else condition, we can simply use this eval. So, this is what we are basically using. This string that we are providing, right? So, this is basically our expression. So, the idea is if a user asks a question like, "What is this 23 into 47? " LLM should not answer it by itself, rather it should call this calculator tool and it should tell us that you have to call something like 23 into 47. So, this is what the LLM would do. So, it would ask us to call this calculator tool with this input. And then we are like passing this expression. So, this is basically your expression or the string that we are providing, okay? Now, we are executing this expression that the LLM is suggesting us. And these are just like some additional, you know, things that we can add it. So, what we say here is, as this is like executing a Python code, some malicious code can also come here. Some people can actually try to command or the LLM get confused and pass a command saying that delete a particular file or something like that. So, here we are blocking such inbuilt Python commands and functions. So, that's what it represents. So, this is just like saying this empty braces with this built-in represents, you know, don't execute any built-in commands. So, to put it simply, this is like a safer version of executing some mathematical expression. So, you can just like remember that. A simple calculator function, a simple calculator tool. So, you give a mathematical expression like 23 into 47, 100 plus five divided by three, etc. And you have a safe way of executing it without executing any built-in Python commands. That's the idea. If there are any errors, we would just like return an exception over here. So, that it's kind of like easily we can debug it. So, that's the uh thing over here, right? Now, let me quickly copy the other two tools as well and I'll explain what we are doing here, right? And then we have this get weather tool and this is like a mocked-up tool, it's not like a actual tool. So, the purpose of this tool is that, given a city, it's going to return the weather of a particular city. So, return current weather for a city and this is mocked. Mocked as in, if you give Chennai, it's going to give the temperature as 32° C, humidity partly cloudy, for Bangalore it's going to give the temperature like this, etc. So, the idea is just instead of putting some time on building this tool, we are creating this mocked-up tool. But when you are actually building a weather forecasting agent, we would call an external API like OpenWeatherMap API and some kind of API that would give us. But to put it simply, we have like a

### [15:00](https://www.youtube.com/watch?v=5hnt-bWeeOM&t=900s) Segment 4 (15:00 - 20:00)

mocked-up version of this. So, we have a dictionary, Chennai, Bangalore, Delhi, Mumbai, etc. And when you put the city name, we access this particular value from this dictionary and we would return it simply, just like as simple as that. So, data. get city. lower. So, basically, when we provide a city name, Chennai, Bangalore, Delhi, etc., so first make sure that we convert it into lowercase letter as all the keys are lowercase over here. And just like get this value. So, data. get. So, here basically, from this data dictionary, we are getting the value of specific city. So, to give an example, if we call this get weather and put the city is equal to Chennai, it's going to say, "From this data, get the value of this city. " In this case, it is Chennai, so it's going to give you the value of 32° C, humidity, etc. Now, we have the second part, right? What it means is, if we are providing some city that is not part of this dictionary, then it would say no weather data for this city. So, that's the idea. A simple mocked-up tool. As I said, later we can also replace this with an actual weather tool. And then we have a simple word count tool where, given a sentence, it's going to count the number of words in it. So, we basically use this text. split and use this length. So, it's a way of splitting the sentence using white spaces and counting the number of components. So, it's basically split a sentence word by word and count how many words are there. So, three simple tools. So, the idea is that now we are building a agent that can use a calculator if the user asks any mathematical calculation-related questions. It can use this get weather tool if the user is asking any weather forecast-related question. And if the user want to count, let's say, a large paragraph, then it's going to use this word count. So, this is basically the idea. So, we have these three tools. Similarly, you can have a DB tool or a rag tool, etc. You can build like build any kind of tools over here. Now, we are going to register these tools within a dictionary. So, let's see what do I mean by that. So, I'll say tools. I'm creating this variable in uppercase letters and creating a dictionary. And I'm saying calculator is calculator, the function that we have created. And similarly, get weather. Here we are calling this get weather function. And then finally, I'll call this word count. So, put this key as word count. So, what is this? Let's understand this. As we have discussed earlier, the LLM won't execute the tool for you in its own host machine. Let's say OpenAI model is hosted on OpenAI server. Let's say GPT 4. 1 is the model that we are using and we know that GPT model is hosted in OpenAI server. When we send this tool information and user query prompt and everything to this OpenAI model, that is the GPT model, the tool won't get executed on the OpenAI server, okay? So, this is a misunderstanding that people would have. What actually happens is we only send the information about the tool. So, we would say that you have access to a calculator tool that would basically evaluate a math expression, etc. And then you have a get weather tool, etc. So, basically, we tell the LLM that you have these three tools available. This is the purpose of each of this tool and this is the input required and the output data type for it. We would only send this information and the LLM would tell us, "For this query, you have to call calculator tool. For get weather tool, etc. " Now, it's just going to return in plain text, right? It's not going to return like Python data types. Python strings or Python dictionaries, etc. So, it would just like return a string saying that for this query, it might say that you have to use the tool calculator. So, this is coming as a string. Now, based on the string value that the LLM is providing, we can just say, let's say, if the LLM is saying that you have to, you know, use the tool word count. In that case, it would give it as a string. So, this string value, we are going to put it over here. Basically, access this tools dictionary and provide the key, which is word count. And now this will Let me just run this first. Oh, I forgot to execute this cell. Right. So, basically, the idea is LLM would just say that you have to call this word count tool, but it would just give this as a string. Now, from this string, we have to access the actual function from this tools dictionary. So, if you look at it, this is the tools dictionary that we have defined. And here, this is the data that the LLM has provided me, the name of the tool in string. So, this I would pass in this tool and this would return the actual function. Okay? And now to this we can pass a, you know, input and get the output. Say for example, what happens is this is how the LLM is going

### [20:00](https://www.youtube.com/watch?v=5hnt-bWeeOM&t=1200s) Segment 5 (20:00 - 25:00)

to respond. So it would say, you have to call calculator tool. So LLM would say, action is calculator. And it would say, input for this is, let's say, 23 into 47. So this is what the LLM is going to tell us with like thought saying that you have to call calculator tool etc. And this will be coming as a string, a multi-line string. So this is how it's going to give it. So we would pass this, we would capture the term calculator and then use it over here. So now I would say, tools and within that we would pass this calculator. So when I run this, this is going to access the function. So previously, the LLM gives me the text or the name of the tool. From that name we are accessing the actual function. And to that I can, you know, provide the actual input. Let's say I'm providing this input as 23 47 like this. And this is going to call that tool. So this is basically the idea. So the LLM gives you the tool name and the input that you have to pass. And in the back end we have to get the exact tool or the exact function, provide the input that the LLM has suggested and executed executed. So that's it. So this dictionary is basically a registry from which we can This tools is the provide the key, which is the name of the tool and get the actual function. So that's what I have mentioned over here. So plain Python functions as our tools and we have registered it in a dictionary. So this is basically the process of registering in a dictionary, nothing complex here. So you're just like putting it in dictionaries with the tool names as the keys. That is all, nothing complex here. Now I'm going to create a tools description and this is what I'm going to provide in the system prompt for this LLM. So here I have a description saying that calculator is one tool, get weather is second tool, word count is the third tool. And calculator takes in this expression as input, which is of data type string, output is string. And similarly we have get weather, city is string and etc. We have mentioned what it does. And we have also providing this example kind of as a one-shot prompting where we would provide one example so that the LLM knows better on how you have to call this tool etc. So this is again an example of one-shot prompting. As I said earlier, here we manually give this tool descriptions. In agentic frameworks, it would automatically capture this, uh, you know, data type information, docstring information from the tool and it would pass that to the model. So that is the idea, right? So we have created the tool description. Now comes one of the important parts, which is framing the system prompt. So here I'll create a text cell saying that system prompt defines the thought action observation format the parser depends on. It's okay if some parts are not clear for you, that's completely fine. Once we create this prompt, agent loop, then everything will make sense. Just hang in there. Right now we have just created three tools and we have registered that in a dictionary and we have created a description for each of this tool. Now I'm going to create a system prompt and in that we have to [clears throat] mention how the agent has to think, provide an action and what is the observation. So it's basically saying that when a user gives a query you have to first think and tell me what tool that I have to use, what is the input pass to this tool. And we would say that then I would execute this tool from my end and I'll give this tool result as an observation. So look at it. If you got your final answer, respond that this is my final answer. If you have to call some other tool, then you have to call this tool again. So this runs in a loop. So this loop stops when we get a final answer or we can also set up like maximum criteria etc. We will come to it. So first I'll maybe copy and paste this prompt and I'll explain this. So here we are saying, you are a react agent. So this is like system prompt, you might have also built, you know, chatbots using these APIs where you would say that you are a full assistant, answer user queries, right? That is for a chatbot and this is for a react agent, we have to frame this prompt. This is why we can also say that react agent is also kind of like a prompting technique. It's also like part of prompt engineering technique where you have one-shot prompting, you have zero-shot prompting, right? And you also have like this direct prompting, chain of thought etc. Similarly, react agents are also kind of like something that is derived from this react prompting. Now look at this. So here we are saying, you are a react agent that solves problems step by step. So this is the actual prompt that we will send to the LLM. Think about we would paste this text to ChatGPT and ChatGPT would respond to this. That's basically the idea. So similarly, we are going to pass this to the LLM that is running on Grok server. You are a react agent that solves problems step by step. These are the tools available and these are the descriptions. So basically, we are replacing this variable with the tool descriptions that we have got. So this is a F-string, so this will be replaced later. And we are saying that this is the format in which you have to answer me. One is follow this exact format. You have two options. Either you can respond with thought, action and action input or you can answer me with thought and final

### [25:00](https://www.youtube.com/watch?v=5hnt-bWeeOM&t=1500s) Segment 6 (25:00 - 30:00)

answer. Now, when these two happens, understand that. When we don't have the final answer, when we have to call a tool, then the LLM should say that you have to call this particular tool, this particular, uh, input you have to give the give to the tool etc. Now let's say that the LLM has the final information that it need from this, uh, tool. Let's say that it has this, uh, you know, uh, output as 1081 for the mathematical calculation. So this is the output given by the calculator tool, right? Now we pass this observation to this and now it would say that I now know the final answer and then it would give the final answer saying that 23 into 47 is 1081. So this is how it's going to work. So either it can tell us to call a tool. If it's telling tool, this is the output it would respond it respond with. If the LLM got the final answer, then it would return this thought as well as the final answer. So that is the overall idea. You are a react agent that solves problems step by step. These are the tools that are available. You have to answer in thought, action, action input. After action stop, the system replies with observation. So this is saying that after uh, this action, stop what you're doing, you send me this output. After that I would send this observation as a result. So basically, we would add this observation as chat history. Again, we would just like send this, uh, like a follow-up question. So it's like first you have a system prompt which is this prompt that you're seeing over here and then you would have a One second. So first we would have a system prompt like this, the one that you're seeing over here. And then we would have a user query. So now the user query is, let's say, it's 23 into 47. Now the LLM would say that you have to call a calculator output. So this is my, uh, user query. So I'll call this as U as user. And the assistant would say that you have to call this calculator tool with this input of, uh, 23 into 47. It would tell us. And now we would execute this tool in our back end because we got this exact tool name and the input that we have to pass. And then we would, uh, pass this observation along with this previous, uh, chat history observation. We would say 1081. And this is coming from our calculator tool that we have executed based on the suggestion from the LLM. And now once this the assistant, A represents assistant, U represents user. Once the, uh, you know, LLM got this output, it would say that now I have the final answer. It would say final answer is 23 into 47 is equal to 1081, which is our final answer. So this is the flow that we would frame this. So first comes the system prompt, which is mentioned over here. So this is just like a way of telling LLM that what we are expecting, what's the output format that we are expecting, how it should work. And based on that it would say that call this particular tool because we have provided that tool information in this tool description. And once we provide this observation, it would know that, you know, I have to give the final answer etc. But there can be like a multi-step process. In that case, instead of like calling or providing this multiple answer, it would say that now call your get weather tool or call your calculator tool again. So it can take that route as well. So that is the overall flow. So let's execute the system prompt. And now, uh, we have the system prompt, the tool description will be like replaced by that. Maybe I can print and show that to you as well to so that you can understand this better. System prompt. So this is the exact system prompt that we will be sending. We are saying that you are a react agent, solve this step by step. And this is the important part. So we mentioned that these are the tools that you are have available. This is what this each of this tool do. So this is what calculator tool do, this is what get weather tool do etc. Word count tool work etc. And you have to respond like this. So this is the overall system prompt that we are sending. Right. The next step that we have to do is pass the output. So I'll create this the fourth step and call this as parser. Right. Now let me quickly copy and paste this code and I'll explain you what is happening over here so that we can save some time. Right. The LLM is going to respond with thought, action and action input. And then it would respond with thought and final answer, right? So these are the two kind of output that it's going to give us. Now, you wouldn't get this thought action reasoning uh, in kind of like a text format, right? Maybe I'll just like give an example. So this is how it's going to respond. Let me quickly copy an example and paste it over here. This is how, like, it might work in a example. So let's say that we the user is asking, what is the weather in Sydney? Maybe I'll just like give that question as well so that it's easier for understanding. Let's assume that the user query is this. So, the user asks, "What's the weather in Chennai? " Now, once we pass this user query along with the system prompt. So, first goes the system

### [30:00](https://www.youtube.com/watch?v=5hnt-bWeeOM&t=1800s) Segment 7 (30:00 - 35:00)

prompt, and then we would add this user query in our messages list, and then we would pass this. And now the LLM, this is the thought of the LLM. It would say that to find the current weather in Chennai, I need to use the tool that provides weather information for a given city. So, this is the reasoning part of this react agent. And now in the action, it would say, "We have to call get weather. " And the input for this get weather is Chennai. Now, what we have to do in our back end is the LLM would give this as output. So, this is my user query. query, and this is my LLM's response. So, it which will be a string, right? Let's say it is a multi-line string. From the string, we need to get the exact name of the tool and the exact input parameter for this. That's what we are parsing over here. And let's say that the LLM is uh if it has got the final answer, then it would just like say thought and final answer. So, basically from this output, we have to identify the action and action input. If it's final answer, we have to identify the final answer part. So, that's what we are trying to do in this parse response. So, this parse response can give us three types of output. So, it's just going to return us tuple. And the tuple can be of three types. One is if it's a final answer, it's going to have a flag called as final, and it would just give us a answer, the exact answer the LLM has given us. So, this happens at the end of this loop. Once the LLM has like performed like all this tool calls and all that, we got the final answer. So, this is the final step. If it's suggesting us to perform an action, this parse response function. So, basically like whatever output that we get from the LLM, we put this through this parser and get the exact final answer. Or if it is asking us to perform a tool call, then we would add a flag of action. Provide the name of the tool, and then the input over here. And then if there is a error in somewhere in between, some server error or some uh you know, any errors that can happen, that will be parsed as error and the exact messages. So, these are the three types of output that can come from a parse response. So, basically, if uh we have Let's say the LLM is saying that you have to call this get weather in Chennai. So, it Let's say that this is what the LLM is suggesting, then this parse response would say create a tuple. The first value is action. The second is the name, which is get weather. Third is input, which is this Chennai. If the LLM is giving the final answer in the second call. So, first call, it would say that call this get weather thing, what you're seeing over here. Second step, it would say that uh 32° C, humid, etc. So, here instead of this action, we would have this final over here. And here it we would have something like you know, 32° C, uh you know, humid, etc. And you wouldn't have any action input over here. If this is the case of the final answer, then we would respond with final flag and the answer. For error, we would have a error flag and a message. So, basically, a simple parser that would parse whatever LLM response that we are create we are getting. So, there are three scenarios over here. So, this is like a simple regular expression string matching step that we are doing here. So, basically, we are saying this text is once we get the output from the LLM, we put this through this parse response, and the LLM output is passed on to this text parameter that we have over here. So, that's what we are later passing. So, here basically, we identify whether there is a term called as final answer and a space. If you see that, capture whatever comes next after that. So, that's what we are doing here. So, whatever comes after this final answer, capture that and give that as final answer. And then we are returning it. So, this is that flag that we have talked about over here. So, it would return the flag as final, and this m. group. 1. strip is basically the answer part that we have identified over here. And next is this A represents action, and this I represents input. So, this part of the code runs if your LLM output has final answer in its output. Okay? So, this happens at the end of this agent loop. Whereas, at the initial step, it wouldn't have this final answer. Rather, it would have a tool call. And I've told you that tool call would have a action and action input, right? So, we are trying to identify and extract that. So, if your LLM output has action and colon, then extract that part. And this part would be the action. So, A is nothing but the name of the tool. And similarly, if you have action input, then extract that line, and that is my input, which is It can be this A would be get weather, and this I would be Chennai. Okay? This I is this input. And from this A and I, we are just like stripping it out and getting this. So, this is an example of how it would look like. If the LLM is suggesting us to call get weather tool with Chennai as the input, then it would be this action. So, the output would be like from this parse response. So, it would say action, and it would say get underscore weather

### [35:00](https://www.youtube.com/watch?v=5hnt-bWeeOM&t=2100s) Segment 8 (35:00 - 40:00)

and Chennai. If the LLM has the final answer, then this parser would give the output as final. And then it would say 32° C, humid, and whatever we have configured earlier. So, this is the kind of output that you would get from it. So, it's basically from this raw output that we are getting from the LLM, we are extracting only the name of the tool, the key information that we need. So, we are trying to extract that. So, this is this parser. Right. Now comes the final step, which is creating this agent loop. So, I'll create a text here. Call this as fifth step is agent loop. Right. And now I'll give you a flow diagram that's going to explain the code that's going to come below. So, basically, we are implementing a think, act, observe loop. Repeat until final answer or maximum step. Right. The idea here is the LLM can suggest us to call the tool multiple times. It can tell us that call calculator two times or call this weather tool two times. Let's say the user is asking the weather for Chennai and London. So, we have to call this tool two times, right? So, it's There are like two steps over here. But we cannot give this kind of like open-ended scenario. Basically, to put it simply, we can limit how many steps that it can take, mainly to prohibit any infinite loop. So, if we don't frame this prompt properly, right? The LLM keeps on you know, giving us action input, and we would end up executing this action again and again, executing this let's say tool again and again. It might not give this final answer if it's not like configured correctly. So, let's just assume that we have given this that you have to give me action input, and once you have the enough information, you have to give me the final answer. Let's assume that the agent is not able to get the final answer. Let's assume that uh a particular mathematical expression, our calculator tool is not capable to give that answer. So, the LLM would repeatedly try to call this calculator tool again and again. So, there is a you know, problem for infinite loop. So, that's why we can add a condition saying that stop when you get the final answer, or stop after running this loop five times, this action loop five times. So, we can say something like this. So, this is how the flow is going to work. So, the user gives a user question. We append it to this react prompt that we have built, pass it to LLM. The LLM would think. And once it has given its output, okay? So, the LLM has thought, and it has given us that for this you have to call this calculator tool, and this is the input. So, this output is being passed by the parser that we have created over here. So, this is that component. There are two scenarios. One is it can give you a final answer, or tool call. Let's say that LLM is saying you have to call calculator tool, and this is the input. Now, we call this tool call. This act is basically the executing that calculator tool with the input that the LLM has suggested, and the output is observation. Okay? This observation is then fed back to this LLM, and now it says that, "Okay, the user asked the question of 23 47. I suggested you to use the calculator tool, and you have got the output as 1081, and you have looped me back. " So, it's basically calling this LLM again with all this chat history and all that. And the LLM would say, "Right, now I have the final answer. " So, the output would contain the term final, and then we would stop there and give the final answer. So, this is the overall idea. Parse a user question. LLM would either ask you to provide a tool call, or it would say that I have a final answer. If it is a tool call, execute the tool, get the output as observation, put it back to the LLM, and now it would have the final answer. Now, this tool call can be one, or sometimes the agent can tell us to first calculate 23 into 47. And let's say the user has also asked us to uh divide it by three. So, 23 into 47 divided by three, do it step by step. If the user has given that, then it would say that first call this calculator tool for 23 47. So, that is like the first step. Observation goes in 1081. Now, it would say that, "Now, divide 1081 divided by let's say three or some number. " So, this is the second tool call. So, this happens in a loop until a final answer is reached. Again, as I said, instead of there is a risk of infinite loop that can happen over here. So, we add another condition. You can stop once you get a final answer or stop once you have performed this you know action thing. Let's say five times or six times. So we can add this maximum number of steps that something can happen. So that is the overall idea. Now let's look at this exact run agent function. So this is the actual part where the

### [40:00](https://www.youtube.com/watch?v=5hnt-bWeeOM&t=2400s) Segment 9 (40:00 - 45:00)

agent code runs. I mean, the code may look complex but it actually is not. It's a simple for loop of what we have discussed over here. So what I've mentioned here in this flow diagram is basically what is happening in this code. So we call this run agent function later. Maybe I'll give you an example as well to try it out. So I'll create a tick cell saying that try it. So that we can easily understand this. I'll provide some examples over here. So one example we are 234 multiplied by 17 and then divided by three and the other one is about like weather related question, okay? So I'll first run this. Maybe run this as well. So now let's try to understand what is happening, okay? So first the user gives a query. So this question the user gives as a query so in the form of string. So this is an example. So we call this run agent. We provided this question, right? So that goes into this question parameter. And then we have some default parameters. So if the user doesn't provide any value for this max steps or this verbose we are going to use this max step as six, the verbose as true. If they provide let's say max step as three, then we would use maximum number of steps as three. Verbose true is just to give us those intermediate results and all that. If you turn this verbose as false, you would only see this final output. This is just for our debugging purposes. If you want to debug better, add this verbose as true. Or let's say you don't want to look at any verbose, just say that verbose is equal to false. As I said, this is a default parameter so you can just like skip providing it. If you skip it, it's going to take the default value as true and then we have the maximum steps as six. Now let's understand what's happening here. So the purpose of this run agent function is to run the react loop until final answer or max step is reached as we have discussed, okay? This is the messages list that we are constructing. This is similar to how you would send a chat request to a LLM. So we define this role as system, provide the content the system prompt. As I said earlier, for a chatbot here we would give the system message as you are a helpful chat assistant. In this case, this is a react agent so the system prompt is what we have constructed over here. You are a react agent. This is how we have to work, the tools available, etc. So that goes into the system prompt. And now comes the user message. This question is basically what the user has provided over here. So how it would look like is first we would have a system prompt and the user query that you see over here will go as the next you know element over here, okay? If verbose is true, that means if we are requesting to look at the verbose, we would just like print the user query over here. So if you look at it, instead of just providing the final answer, we have printed it so that we can look at the intermediate results, what the agent has thought, what's the output that it has given, etc., okay? So we have constructed the messages with system prompt and added the user query and we have also added a verbose condition. If it's true, print the user query, whatever that they have given us. Here comes the loop. We have been talk talking about this think act and observe loop, right? This is that exact loop that we are building using this for loop over here. For step in range one comma max steps plus one. So basically it's saying range of one comma six if I put it over here. For I in range one comma so if six uh if it's six then this would be six plus one, right? Six plus one, seven. Let's print I over here inside this for loop. Oh sorry, I have to print this. I. So count how many times the loop has executed, one two three four five six, okay? This is basically what we are doing here. When we say that maximum steps is six, here we are saying the maximum number of time this for loop can run is six times. Even if the LLM was not able to provide a final answer no problem, don't run it beyond that. You can run a maximum number of six times. But if the LLM was able to give a give the final answer at third step, we don't have to run the extra four five six. So that's the idea, okay? So there are two stop conditions. One is stop once you get the final answer or stop once you know reach this maximum number of steps. So we are saying for step in range one comma max steps plus one. So this is just like a fancy way of saying run this maximum of six times. If the user has given maximum steps of three, it would be one comma three plus one four four. So one comma four which means the for loop would run three times. Now in the first step we are sending this uh you know system prompt and this uh question that the user has provided. So basically we send the system prompt and what is 234 multiplied by 17, etc. to this LLM.

### [45:00](https://www.youtube.com/watch?v=5hnt-bWeeOM&t=2700s) Segment 10 (45:00 - 50:00)

So this is the actual part of calling this LLM. So here we have this client which is the Grok client that we have created over here earlier. So we have this client, right? Block Grok client and the model is Llama 3. 3 70 billion versatile. So that's what we are providing over here. So we are calling this client. client. chat. completions. create. So mention what's the model that you want to use. Messages is this messages list that we have created and temperature zero mainly for reproducibility. So reproducibility as in for a given input every time you should get like the similar output as kind of LLM is like a probabilistic model. So again to put it simply, for a given input we just like want consistent output. So we can have this temperature value closer to zero. If it's kind of closer to 1. 52 then it's like every time you might get slightly different creative answer. But for this agent kind of thing we can have this temperature value as low, okay? So we have model, messages, temperature value and then stop is observation. So this is again a kind of like a safety measure. So we know that this observation should come from the tool, right? But sometimes the LLM kind of can hallucinate. It can provide its own observation instead of calling the tool. Instead of saying that basically instead of tool generating this observation, the LLM might generate it as using basically mainly by this hallucinations. Let's say that here we have this 101326, right? So this 1326 is coming from the tool. But it can hallucinate and provide its own observation. So here we are saying if the LLM is generating a token like observation. So LLM generates token by token, right? If it starts to generate a token starting with observation, then stop the generation right there itself. So it's that stop condition. Again, it might not happen always but just like a condition so that to make sure that it's not hallucinating and providing its own observation. So if you don't do this, right? So it can say that observation is 2586. It would hallucinate and provide it. So here we are saying when it starts to type in this observation, stop generating the output from it. That is all. To just remember this quickly, remember that if it starts generating an observation then I'm just going to stop it. That is all. So we get a response from this. The ideal responses two cases. One is it can give you a final answer or tool call. So those are the two things that can happen from here. So we get this final output, access this from this choice. So this response would have information on the number of tokens, time taken and these metadata information would be present. choices zero messages. content is the final output for the user query that we have sent, okay? So from that we are removing any empty spaces. Basically identify the LLM call, save it in this output. Now save this in this role of assistant messages. So basically we are adding this to the chat history that we have over here. System prompt question and then we add this assistant response over here. Now we put this through this parser. So we check if this term final is present. Uh maybe I missed this part. So once we get this final output that is saved over here, put this through this parse response function that we have created over here. One second. Right. So parse response, it's going to give us either final, action, error. These three flags. Now we say that if you see this final flag return the parsed output. That means we know that the final response has been reached. If it's error, then return the error message. In case if we don't get final message or this error message, then the one that is remaining is action name and input. Now we say that if like whatever that particular tool is, we have to execute it. So basically let's say that we call this run agent, okay? With this particular question. So let's understand this with this particular example. What is 234 multiplied by 17 then divided by three? So this line, this one is basically getting printed by this line number eight, okay? Now we have a response. This response is basically this. So the agent would say thought to find the result of the given mathematical operation I first need to multiply 234 by 17 then divided by three, etc. Now it would say action is calculator and it's saying that uh you have to do this 234 into 17 divided by 3. So this is the output response that is coming from the LLM this part. So there's thought, action, and action input. So this three things are coming from the LLM. So if you look at it, we have this robot symbol over here, right? So whatever is present with this human face, it's coming from the user. Whatever is getting printed over here

### [50:00](https://www.youtube.com/watch?v=5hnt-bWeeOM&t=3000s) Segment 11 (50:00 - 55:00)

with this robot face is the LLM output. And finally, we have this tool symbol. So this tool symbol if it's present, then it is like a validation for us that this is coming from the tool. It's not being hallucinated or given by the LLM. This value is coming from the tool, so that's what we are printing over here. So basically, we provide this question, what is 234 multiplied by this particular thing. So we add this to the system prompt, and then we have this question, we send it to this LLM. LLM would come up with this response, we extract this in the output, and then we are printing it. So first, it would come up with this thought that I have to use this calculator tool. Action is calculator, the name of the tool, the action input. Now, what we do is we parse this using this parse response. Now we know that it can either have final action or error. So if you look at it, this doesn't have final, but it has action and action input, right? So what happens in this parse response message, this part of the code will be executed, and then we would get flag action. The name of the tool is provided by this part, a. group. So a is action. a. group. So here from this, we extract only the action input, basically the function input. So that is the output of this parsed, okay? So you have action, calculator, and the 23 234 into 17 divided by 3. So these are the three things that the parser would give. Now, we have this if condition check. We don't have this final in this parse, right? So we only have action, so we have this action over here. We didn't get this final because this is not the final answer. We have this action. So this if condition won't be satisfied as this parser won't give us final answer it. Again, we don't have any errors. The last condition is that it's part of a action. Now we run this. underscore, comma, name, comma, argument is equal to parse. Basically, when you get it like this, so you this probably would be action, and it would say calculator, and then, as you can see, it would say this action input as 234 into 17 divided by 3. So this is the output of the parser that we would get in this particular step, okay? So yeah, so this is the output of this particular line. Action, calculator, 234. Now, look at the first element of this parse. parse is action. It's not final, so this condition is not satisfied. Next, we are checking if this is error. The first value is error. It is not. Now, automatically, this part of the code runs. Now, we are saying underscore, name, comma, argument is equal to parse. So it's basically like saying underscore, name, and arg is equal to this. So what this is going to do is it's like it would say underscore value is action. Similarly, calculator, this name value is calculator. And then, arg, the argument values 234 into 17 divided by 3. So basically, we are extracting the values from this tuple and saving this in this underscore name and argument. Now, if name in tools, if this name is calculator, that means if this calculator is present in tools, then observation is tool name and argument. So this is what I've shown you like earlier. This tools is the tools registry that we have created over here. Okay? So first, it checks whether a given tool by the LLM is present in the tools because sometimes the LLM can hallucinate and it can give a different tool name. We don't want that. So first, we are checking if that is present in this tools dictionary. If it's present, then I have told you earlier, right? So to call this tool, we would say tool. If it's calculator, we would pass this calculator, and then uh provide this input argument over here, which is 234 into 17 divided by 3. So this is going to call this calculator function that's present in this dictionary, pass on this input argument, and get the final result. That is basically what is happening over here. So the LLM would give you this thought, action, and the action input. From this, we are parsing it and getting this particular output, which is action, calculator, and this particular value, and then calling the tool, saving this value in this observation, and then we are printing it in this uh you know, tool symbol. So now it's getting printed in this tool symbol. If we face any errors, we also have some exception block over here. If everything works fine, then we append this to this messages, role user, and then we add this observation over here. And then, this loop runs again. So look at this clearly. If it's a final response, we add this return statement. So whatever comes below that after this won't run if this if condition is satisfied. If a error message is there, again we

### [55:00](https://www.youtube.com/watch?v=5hnt-bWeeOM&t=3300s) Segment 12 (55:00 - 60:00)

are returning it. It this part of the code won't run. only run if these two returns are not executed. Basically, if it's this part of the code runs if we don't get a final answer. error, this part is going to run, right? Now, first step, we got this action, calculator, and this input. We have identified that we have got the observation, so everything worked. Now, the LLM would say that thought, the calculator has given me the result of the mathematical operation, which is 1326. Since this is the final result of the operation, I now know the final answer of the problem. Now it says action is none, final answer is 1326. Now, look at this. Now, in the next iteration of the for loop, again, we pass this So to this original messages, we have appended the agent's response, observation. Now, looking at it, it's saying that I now have all the information, and this is your final answer. Now comes this next parsed. This parsed in the second iteration won't have action. Instead, this would have final, and this would have the exact output that the LLM is providing, which is 1326. 0. This is the output of this parser function that we have created because it has final answer in its output. Now, this if condition will be satisfied here. Check the first element of this parse whether if it's final. So if you look at it, this is what parsed is. Right? So this if condition is satisfied because the first element is parsed. If it's verbose, we would print it this verbose is not an important thing. This is just like printing that final answer. Look at this return statement. This is important. Now, we are returning the first Sorry, index one of this parse. So zero is my first element, one is my second element. So I'm checking whether the first element of this parsed variable is final. If it's final, then I have to return the second element, which is 1326. So if the first element is final, then all good, return the second element, which is the actual output. And now, we are like printing the final answer. So this is the overall idea. So I'll just remove this. I'll just like give a recap of what is happening. This would like maybe reinforce what we have discussed right now. So the idea is add the system prompt to the role system, add the user question what whatever they are providing, send this to LLM over here. And the LLM would either come up with this action and action input with thought, or it might come with thought, action, and final answer. Now, if you got a final answer, then this part of the code is going to run. You get a final answer, you don't go to the other part of the code as a return statement is already present. In a function, if a value has been returned, then the other part of the code won't run, right? So that's like basic Python. Whereas, if you don't give up get a final answer, if the LLM has responded with thought, action, and action input, these two if conditions are not satisfied. Final is not satisfied, error is not satisfied. This is the part where we actually call the tool, get the result from the tool. So we are accessing the tool from this tools dictionary. This is like calling Let's say if it's calculator, it's calling calculator and providing the input arguments for this function, get the output, save this in the observation, append this to this messages, and then run this again. Now, in the second step also, the LLM may say that no, I still didn't get the final answer. Run the calculator tool again. It might say that Now, first, in this case, first iteration of this loop is tool call, second iteration is also tool call, third iteration may have the final answer, so this return statement would be executed, and we got the final answer. So this is the overall idea for it, okay? So I hope you have understood whatever we have discussed about this run agent. So this is the idea. We mainly frame the system prompt, and this is basically the prompt that's going to guide our LLM on what we are expecting, what is the tool that the LLM has access to, et cetera. So we frame our system prompt, we send a user question, LLM would say that call tool A, or it would say call tool B with this particular input, okay? If it's a tool call, then we would execute the tool in this part of the code. And in the next step, it may come up with a final answer. And in that case, this condition would be satisfied, we return the final answer. In between, if we got any error, we will just like give a error message over here, which is again a error case scenario, but that is the overall idea. So this is the whole idea of this react agent where we have discussed that it's going to combine the reasoning power of LLMs and then in the next step we are going to perform action basically by calling this tool. So that is the overall idea. So now let me quickly go you know go through this entire code and give you a overall recap so that you can kind of like remember whatever we have

### [1:00:00](https://www.youtube.com/watch?v=5hnt-bWeeOM&t=3600s) Segment 13 (60:00 - 65:00)

discussed. So here we are building this react agent from scratch and we have this think act and observe loop and we are not going to use any libraries, just use Grok LLMs that are hosted in Grok, the open source LLMs as basically the brain for our react agent. We have imported all the required libraries. We have configured our Grok API key and we have said that we are going to use Llama 3. 3 70 billion model. And we have created three tools, calculator, get weather and word count and we have created a registry for this tools inside a dictionary. So later when the LLM provides the name of the tool as calculator, get weather or word count, we would access the function from this dictionary. Now we have this tools description mentioning what is the purpose of each of this tool, what is the data type for this input, the output etc. Finally we come up with the system prompt with the tools description, how it should answer mainly output format that we are expecting from the LLM. So either it would answer with thought, action, action input or thought and final answer. We have printed it parser that would give us either the final answer, action or error message. Basically from the raw output that the agent is providing us, we are identifying only the key information that we need and then we have this agent loop where we add the system prompt along with this user question, pass on to this LLM. LLM would either ask us to perform a tool call or it would give us a final answer. After parsing it, we can clearly know whether it is a tool call or final call. If it's a tool call, call the tool. That's basically our action given as act over here. Observation is the tool result, put it back to the messages list, send it to the LLM again. So this is the second iteration of this loop. Again it would parse this. Again if it suggests us to do a tool call, this part would again run. Whereas if it has given the final answer, then we would return the final answer over here. Or if it has reached the maximum number of steps, we finally would return saying that agent stopped because maximum steps has reached. So this is the overall process for it. Now look at this. So we call this run agent, what is 234 multiplied by 17 then divided by three? Step one. So this step one basically represents the this number represents the iteration of this for loop. So that we have a printed over here. So this step that you can see over here, right? All this verboses you will see only if you set this verbose as true. You can also try by sending you know verbose as false, you wouldn't see this intermediate results. You would only see the final output that you are receiving or yeah. Right, step one. Yeah, the LLM has told us that we have to call this calculator tool with this particular input and in the next iteration it said that once we have given this observation, it had said that I have the required output and this is your final answer and then we are printing it. Similarly when I call this uh what's the weather in Chennai? Step one it says that we have to call this uh you know get weather tool. This is the input that we have to provide. And as we can see this tool symbol, we are printing it from the tool output. Not this is not something that's hallucinated by the LLM. Step two it says that you have given me the observation, which is you know 32° C, humidity etc. and this is my final answer and then we would print it. So this is how every agent, every react agent would work. So let's say if it's a rag based agent, instead of these tools we would just have like a rag tool over here. So the idea is that the LLM reasons and identifies what's the next step, whether it has to call tool A, tool B or tool C. Or it would say that I have enough information, I'm going to return the final answer. So here the action part is calling the tool. Now you can also have a tool that's going to book a particular ticket, an external API that's let's say going to book a flight ticket or you have an external API that's going to connect to a database and return the information. In this case uh let's say we have a mocked up API that's going to find the real time weather information of a city, right? Similarly let's say that the user is asking a question on you know in plain language find uh you know the number of employees in my company who are more than 35 years of age. So this is like a plain language question. We can create a SQL query that's going to uh connect to a database, execute the query. So what the LLM would do is it would say that we have to call SQL tool and it would say that this is a query that you have to execute. Now in the back end just like the way that we have executed our calculator or our get weather tool, in the back end we would execute this SQL tool along with the SQL query that the LLM has given us. Return the output for this query in the LLM response and the LLM would respond with plain language to the user saying like you know there are about 500 employees in your company who are 35 plus in your company and so on. So that's the idea. So a user query comes in. The LLM is not just responding. Now it has access to this tools using which it can do some actions, get some additional information so that it can provide grounded response or do stuff as I said. It can be like booking tickets or it can be looking at uh looking at a particular database, rag database etc. But the main

### [1:05:00](https://www.youtube.com/watch?v=5hnt-bWeeOM&t=3900s) Segment 14 (65:00 - 65:00)

key thing is that in the back end we are executing this tools. LLM is just like only telling us what's the tool that we have to call, what's the input etc. So this is what all these agent frameworks also do underneath. So they rely on this LLM to know what is the next step. Once we identify the next step, these frameworks would internally call these tools and they would execute it. So that's like the important takeaway. So I hope you have understood whatever we have discussed today. If some part of this code or some part of the explanation is not clear, let me know in the Q& A sections. I'll be happy to reply to that with maybe some more like detailed steps or like detailed explanations and all that. So that is all from my side. I'll see you in the next lesson.

---
*Источник: https://ekstraktznaniy.ru/video/50873*