# Build Hour: Agentic Tool Calling

## Metadata

- **Channel:** OpenAI
- **YouTube:** https://www.youtube.com/watch?v=7E-qdsVEoB8
- **Date:** 03.09.2025
- **Duration:** 55:14
- **Views:** 15,746

## Description

In 2025, agents don’t just think — they run code, call tools, and complete tasks.

This Build Hour is a hands-on walkthrough of how to design agentic systems that reason and act using OpenAI’s latest APIs and SDKs. 

Ilan Bigio (Developer Experience) covers:
- What’s new in 2025: Responses API, Agents SDK, Hosted Tools, Codex, and more
- Chain of thought concepts: reasoning, tool calling, and long-horizon tasks
- Live demo: building an agentic task system to process a backlog of tickets
- Delegation: directional guidance for evals
- Live Q&A

👉 Follow along with the code repo: https://github.com/openai/build-hours
👉 Check out additional resources: https://developers.openai.com/
👉 Sign up for upcoming live Build Hours: https://webinar.openai.com/buildhours

## Contents

### [0:00](https://www.youtube.com/watch?v=7E-qdsVEoB8) Segment 1 (00:00 - 05:00)

Hey everyone, welcome back for another Build Hour. This is actually our first one of 2025, and we're really excited to be with you here today. My name is Sarah Urbonus and I lead startup marketing here at OpenAI, and I'm joined by Ilan. — Yeah, I'm Ilan. I'm on the developer experience team. So, we always like to start Build Hour with the goal of why we're here, and it's really to empower you with the best practices, tools, and AI expertise to scale your company using our APIs and models. This series is really for all of you, so we take your feedback on what you want to hear more of and what you're building with, and hopefully this is a really valuable hour of your week, after which you can accelerate what you're building and creating with OpenAI. We have a new page, webinar.openai.com/buildhours. We heard your feedback that you wanted a centralized place for all upcoming Build Hours, so now you can actually see all of the topics we have coming up. So, we've been busy in 2025. As the kids say, we've been cooking. We've shipped a lot this year, and we wanted to show a quick high-level view of what we shipped, what it does, and what it might mean for you: the Responses API, GPT-4.1, o3, o4-mini, the Codex CLI, Codex. You can take a screenshot of this. We unfortunately don't have time to go through all of it in this hour, but we're going to try to show you as many new features and models as possible today. Obviously, there's one big thing missing from this list, which is image gen, and we know that got a lot of attention from you all. Next week we're actually doing a Build Hour on image gen, where we'll talk about how you can leverage the API for whatever you're building. And yes, we might make a Studio Ghibli image or two during that Build Hour.
But today is really going to focus on the flagship models, increased context, and also Codex and how you can start building with it today. For our agenda, we're first going to talk about what's new, maybe set up a task with Codex, and then go into some core concepts: reasoning, agentic tool calling, and tasks. As always with Build Hour, we really want this to be hands-on and get you into our codebase, so we're going to anchor most of today's session on demos. We shared the code repo in the chat, so you can follow along with what Ilan is building and hopefully build it yourself after this. We'll first do a demo on how to implement tasks and live-build a task system interface and backend, then talk a little bit about delegation and share some directional guidance for evals. There's a lot we could dig into there, but hopefully we'll give you a few tips to get started. Finally, we'll end with Q&A as always. This is a new platform we're using today, so just to share: if you click Q&A, a text field should pop up where you can ask questions. We have a couple of folks in the room who will be answering, and then Ilan will answer live at the end. — Yeah, absolutely. — All right, you ready to do some coding? — Let's do it. — All right, over to you. — Cool. Hey everyone. So, we have a lot of stuff that's new, and to start off, we can go over a few of the agents that we launched this year. Earlier this year, we launched Deep Research, o3, and Codex. 2025 is the year of agents, and we're really seeing it happen. To start, let's do a quick demo of Codex; I know some of you might be interested in using it soon. So, here I actually have the repo for the Codex CLI, which is not the demo I'll be showing.
I will instead be showing how to use Codex on ChatGPT. The reason I have this open is just to grab a sample task and show you how you could use Codex. So here we have an issue someone pointed out: Codex does not respect API keys set through environment variables for other vendors. That isn't great, so what I'm going to do is highlight the whole issue and just drop it into Codex. I'll say, "can you fix this," give it a bit of context, give it the title, and as you can see, I have Codex enabled. What it'll do is use a local copy of this repo, where it can run tests and actually write code. So I'm just going to kick this off, and off it goes. In the background, it's going to go through all the different parts of this issue and make the changes it can. If we take a look at what's happening in this log, it's downloading the repo. We're not going to watch the whole thing happen, but this is the interface you can expect with Codex, where you have this long-term task where you just

### [5:00](https://www.youtube.com/watch?v=7E-qdsVEoB8&t=300s) Segment 2 (05:00 - 10:00)

want to give it the end state. You just say, this is what I want you to do, figure out a way, and off it goes, using the environment. We'll get back to it at the very end. But this is just a sneak peek of Codex; you can already use it right now. Now, the reason this is relevant to our Build Hour today is that you'll be implementing your own agents soon, and part of the reason for bringing these agents up is to give you a background for what yours can look like, because the technology we used internally for o1, o3, Codex, everything, is actually mostly out in the open for you to build with as well. So this is how you can build your agents. The Responses API is an incredibly powerful API, especially as of yesterday, when we launched hosted tools, MCP, and much more. Unfortunately we don't have a lot of time to get into everything we launched, but essentially, with a single API call you can now set off an entire sequence of events where the agent can query files, call MCP servers, and so on. The Agents SDK is a really handy way to implement this same loop, but where you can do local function calling, and it has a couple of other features like handoffs, and of course hosted tools and MCP. These are some of the tools you can use to build. So today we're going to talk about agentic tool calling, and how all the things we've talked about come together in this one idea. What is agentic tool calling? It really comes down to reasoning and tools. You can think of something like Deep Research, Codex, or o3 as reasoning with tools. But why is this interesting? The big part of this is really reasoning. Last year we trained o1, where we really taught models to reason for the first time.
What this meant is that instead of showing a model, here's how you do a task step by step, and hoping it learns from our examples, we let it figure out how to get to solutions as it reasoned, and we just graded whether it was correct or incorrect. Through reinforcement learning, o1 learned to hone its own chain of thought and refine the strategies it uses. So this is the first component of agentic tool calling: reasoning. We train the models on solutions, not the steps that make them up; they figure out the steps, and reasoning emerges. Then you can take function calling, or tool calling, where the model takes actions and fetches information, and when you combine the two, that's when you get agentic tool calling. What you can see in this diagram is that within this reasoning, this chain of thought the model is doing, where it figures out how to think about things, we now also give it access to tools, so it can figure out not just how to think about things but how to do things as well. The paradigm is very similar: we didn't train it on specific steps we wanted it to take; we just trained it on results, and it learns to take the actions to actually achieve those results. That's the really powerful part of RL that we're now bringing into models, not just for thinking but for doing as well. What you get is long-term agency, which is really powerful. So this is what we're calling agentic tool calling, and what results is a model that is goal-oriented and resourceful; it'll figure out a way to get you what you ask for. It's very robust at recovery, so if it gets failures from the tools, it can course-correct and still get to the end. And it's really consistent over long-horizon tasks; you can actually have tens or hundreds of function calls in a row.
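The loop just described, where the model reasons, emits a tool call, folds the result back into its chain of thought, and repeats until the goal is met, can be sketched in a few lines. This is an illustration only: a scripted stub stands in for the reasoning model, and the tool name `get_order` and the message format are invented for the example.

```python
# Minimal sketch of an agentic tool-calling loop. A scripted stub stands in
# for the reasoning model so the control flow is runnable offline; in a real
# system each step would come from a reasoning model given tool schemas.
import json

def get_order(order_id: str) -> dict:
    """Hypothetical tool: look up a (mock) order."""
    return {"id": order_id, "status": "delivered", "total": 42.0}

TOOLS = {"get_order": get_order}

# Scripted "model": first it asks for a tool call, then it answers.
SCRIPTED_STEPS = [
    {"type": "tool_call", "name": "get_order", "arguments": {"order_id": "A1"}},
    {"type": "final", "content": "Order A1 was delivered; total was $42.00."},
]

def run_agent(steps):
    history = []
    for step in steps:
        if step["type"] == "tool_call":
            # Execute the requested tool and feed the result back into the
            # history, as a real loop would append a tool-output item.
            result = TOOLS[step["name"]](**step["arguments"])
            history.append({"role": "tool", "name": step["name"],
                            "output": json.dumps(result)})
        else:
            history.append({"role": "assistant", "content": step["content"]})
            return step["content"], history

answer, history = run_agent(SCRIPTED_STEPS)
print(answer)
```

The key point the sketch preserves is that the caller only supplies the goal and the tools; the sequence of steps is chosen by the model, not scripted by the developer.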
Codex, I think, does on the order of tens to hundreds of calls, and it stays consistent. So that's the power of agentic tool calling. With this capability, we can start thinking about a new primitive, a new abstraction: tasks. Everything used to be very chat-based, and now we're entering this world of long-horizon tasks. So what goes into a long-horizon task? This is where we start to go from theory into the practical territory of what it takes to put one together. You have the agent, which describes what the task does; you have infrastructure, which is how you actually run it; you have your product, which is how a user uses or interacts with it; and then evaluation: did it do what you wanted it to do, and how well did it do it? For agents, you're going to be thinking about goal specification, which is different from before. Instead of specifying step by step what you want to

### [10:00](https://www.youtube.com/watch?v=7E-qdsVEoB8&t=600s) Segment 3 (10:00 - 15:00)

happen, you have to specify the end state you want. You also want to specify the tools, which give the agent access to the different resources you have, so it can interact with your own systems. This is also where you might want to use delegation, where you're no longer talking to just one agent and waiting for it to be done; it might be able to kick off other tasks as well. And you choose how to handle async, long-running function calls, and human-in-the-loop. Infrastructure is where you start to think about parallel tasks and parallel execution, how you manage state between an agent, your product, and your backend, as well as the runtime environment. Codex has a runtime environment for each of the repos it works with, and you might want to set one up for yours. This is also where you choose how to handle failures and retries. It's the nitty-gritty of: okay, we have agents, we have these concepts, how do you actually run them, and what does that look like in code and in your backend architecture? Then you have your product, where you choose how the user interacts and what they get to see. This is where you surface progress: how do you keep users informed of what the agents are doing? It's also where you work with the user to provide the agent all the context it needs to actually accomplish the task. That might be explicit, by asking the user, or it might be implicit, as we'll do in a second, where the context is gathered from the application itself. And this is also where you choose how to visualize tasks. Codex, for example, has a list of tasks below, and when you open one, you can see everything it's doing, with these nice animations.
And this is where you can actually have fun with it: how do you keep the user entertained and give them insight into what's going on? I forget off the top of my head where this is from, but I think it's pretty intuitive: a user who doesn't know what's happening behind the scenes and is just waiting will (a) get more stressed and impatient and (b) actually trust your output less than if they could see what's going on. Finally, you have evaluation; this is where evals come in. You want to collect examples and define how you want to grade. This is a little different from before, where you really wanted to evaluate turn by turn in a chat conversation; that was the most common way we used to do it. With tasks, you're more interested in the end result, and maybe less interested in each turn the model took. So you might set up graders, now often LLM-based graders, where you give the grader a rubric and describe the criteria you're looking for. When we've worked with companies, I like to sit with people and just ask them: okay, is this good? Have them say yes or no, then ask why, and go through and really break down a task and what makes something good. A little sneak peek here: if you have a few examples, you can actually fine-tune a model to be your grader, and this is a great case for the reinforcement fine-tuning that we launched; if you have a good golden set of examples, you can train on that. And finally, tracing. Evaluation is not just about running evals; it's also about monitoring, evaluating the system online during its interactions with the user. So these are the parts that really go into tasks and how you build them.
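The end-state grading idea can be made concrete with a small sketch. Here each rubric criterion is a plain Python predicate over the task's final result; in practice each criterion would typically be a question posed to an LLM-based grader along with the rubric text. Every name and field below is illustrative, not from the demo's code.

```python
# End-state grading sketch: instead of judging every turn, grade only the
# task's final result against a rubric. Predicates stand in for an
# LLM-based grader here; all field names are invented for the example.

RUBRIC = {
    "refund_issued": lambda result: result.get("refund_issued") is True,
    "user_notified": lambda result: bool(result.get("message_to_user")),
    "under_refund_cap": lambda result: result.get("refund_amount", 0) <= 100,
}

def grade_end_state(result: dict) -> dict:
    """Return per-criterion pass/fail plus an overall verdict."""
    marks = {name: check(result) for name, check in RUBRIC.items()}
    return {"criteria": marks, "passed": all(marks.values())}

report = grade_end_state({
    "refund_issued": True,
    "refund_amount": 19.99,
    "message_to_user": "Your duplicate charge has been refunded.",
})
print(report["passed"])
```

The per-criterion breakdown is what makes the "ask why, then break it down" exercise actionable: each "no" from a domain expert becomes one more entry in the rubric.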
One more note: I put things in different groups here, but everything is really connected. Async and human-in-the-loop might actually have to do with continuing long-running tasks on the infrastructure side, and the product might include delegation. So these are, let's say, the four corners, but there are no clear lines between them. With that said, it's time to start coding. We're going to take a stab at implementing tasks, and today's prompt is: let's say we have a big backlog of tickets, customer feedback, and so on. Let's build an agentic task system that can actually take them on and resolve them. This is kind of broad, but that's to show how broadly these agents can work in practice. Ignoring the issue on screen, this is roughly what we're aiming for: let's say we have this customer service portal, this feedback portal, with a few different tasks.

### [15:00](https://www.youtube.com/watch?v=7E-qdsVEoB8&t=900s) Segment 4 (15:00 - 20:00)

What we want to implement is a system that can actually take these tasks and operate on them. This is going to require defining your agent, defining some infrastructure to run it, and also defining what interactions the user has and how they see it. So let's get started with the agent in a pretty simple way. This is actually a very powerful API call already: when you create a response, you can give it hosted tools and have it run in the background. I added this because yesterday we launched background mode, which takes a lot of this and makes it a little easier if you're not using local function calls and all you're doing is MCP and hosted tools. I'm going to skip it for now because there's a lot to get through. So first, we're going to use the Assistants, sorry, the Agents SDK. What this SDK does is wrap the loop we actually implemented in my first Build Hour, which was implementing an agent. It's very similar, and it's based on Swarm, if you're familiar with that. Here we have the simplest agent; let's run it real quick to see what the interaction is. So: "hi, can you say hi to everyone watching today?" As you can see, we can stream the results and see the reasoning. Great, o3 is pretty friendly. It's funny, because getting to this point last time actually took a lot of work, but now, because we have the Agents SDK, it really is just throwing together a quick agent. Now, let's start adding some tools. If we go back to this, we might want to take a look at one of the tasks, and it says: customer reports being charged twice for their monthly subscription. So let's start building some tools for the agent to use here. We're going to have to think about the tools and also the prompt, and how we specify the goal.
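The mock tools typed up in the demo might look roughly like the following. The data is fabricated, and with the Agents SDK installed, each function would be wrapped with the `function_tool` decorator (from the `agents` package) so the SDK derives a schema from the signature; the decorators are shown commented out here so the sketch runs standalone.

```python
# Mock tools in the spirit of the live demo. With the Agents SDK, each would
# be decorated with @function_tool so the SDK turns the signature into a
# JSON schema for the model; decorators are commented out so this sketch
# runs without the SDK installed. All data is fake.

# @function_tool
def get_user_data(username: str) -> dict:
    """Look up a (mock) user, including recent order history."""
    return {
        "username": username,
        "email": f"{username}@example.com",
        "recent_orders": ["ord_1001", "ord_1002"],
    }

# @function_tool
def get_order_details(order_id: str) -> dict:
    """Return (mock) details for one order, including its price."""
    return {"order_id": order_id, "item": "Monthly subscription", "price": 9.99}

# @function_tool
def refund(order_id: str) -> dict:
    """Issue a (mock) refund for an order."""
    return {"order_id": order_id, "refunded": True}

user = get_user_data("ilan")
print(refund(user["recent_orders"][-1]))
```

Note that the functions only return data; the decision of which to call, and in what order, is left entirely to the model once the goal is specified.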
I have some code ready, but I like doing this live with you all, so let's just start typing up some functions. Cursor is very helpful here; it sort of already knows what I want to do. We're not going to search the web; instead, let's say we want to give it access to, say, get_user_data, which takes a username and returns some mock data, and let's ask it to include recent order history. Cool. So now, when it queries for a user, similar to Swarm, just by specifying a Python function, the SDK will take it, turn it into the right schema, give it to the model, and execute it. A super nice way to do this. Cool, so we have this first function. Let's try it out really fast. Oh, maybe let's give it some instructions first. Great, thanks Cursor. We could say, "I'm Ilan, and I'm upset because..." Actually, this isn't quite a goal. Let's pick a goal. We have the user data, so maybe let's add another function for a refund. Cool, let's mock it out. And let's make one for orders: get_order_details. Okay, Cursor is actually amazing. So what we set up here is the ability to get recent orders; we can then take those orders and return some details on them, including the price. And for this flow, if I want the model to be able to perform a refund, I want to be able to specify the functions and my problem, and have it figure it all out. So

### [20:00](https://www.youtube.com/watch?v=7E-qdsVEoB8&t=1200s) Segment 5 (20:00 - 25:00)

this is all live, so anything can happen. But let's see. I can say something like, "I really didn't like the last thing I got. I want a refund." The model is reasoning. "Can you just check my recent orders?" Now it uses the first function to get the orders and shows them to me. What we're seeing here is still pretty interactive; this is the normal approach. But let's add an instruction, and this is why specifying the end state is more important than specifying the steps. Let's say: "Get all the context you need up front, then execute the task to completion without asking for more." And let's add one more: "If you have everything you need, just go." This isn't exactly what you might want to do in production, but it's just to show that I can now say, "I'm Ilan and I want a refund on my last order." Let's see. Okay: it's getting the user data, it's getting the order details, it's submitting the refund, and that's the end. So this shows that by specifying the end state, you can have the model figure out the steps required to get there based just on the tools. Cool. That was a simple start. Now let's move on to product and actually integrating it. If we have this front end, we actually want to connect it to a back end. And in order to keep using the Agents SDK, which right now is in Python (a JavaScript one is coming soon), we'll want to set up a very simple Flask setup. I'll do this one by hand just to show you how. Let's do a quick server. I guess I have one right here. I'll start off with "from flask import ...". Okay, a very basic Flask server. Now let's say we have a task endpoint; here's where we're going to want to run our agent to actually perform the task and then stream back the results.
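The streaming shape being described can be sketched independently of the web framework. A scripted list stands in for the agent's event stream, and the SSE wire encoding is the real format; in Flask you would return this generator from a route with mimetype `text/event-stream`. The event fields are invented for the example.

```python
# Sketch of the "foreground task" pattern: the HTTP response body is a
# stream of SSE-encoded events, and the open connection keeps the task
# alive. A scripted list stands in for the agent run; the web framework
# is omitted so the sketch runs standalone.

import json

def sse_encode(data: dict) -> str:
    """Encode one event in Server-Sent Events wire format."""
    return f"data: {json.dumps(data)}\n\n"

FAKE_AGENT_EVENTS = [  # stand-in for a streamed agent run
    {"type": "reasoning", "text": "Looking up the user..."},
    {"type": "tool_call", "name": "get_user_data"},
    {"type": "message", "text": "Done."},
]

def event_stream():
    for event in FAKE_AGENT_EVENTS:
        yield sse_encode(event)
    yield sse_encode({"type": "done"})  # tell the client the task finished

body = "".join(event_stream())
print(body.count("data: "))
```

Because the generator only runs while the response is being consumed, closing the connection abandons the task, which is exactly the limitation the background-task architecture later in the session addresses.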
I don't really know a good name for this pattern, maybe a "foreground task," but essentially it means that when a front end connects, the connection is what keeps the task alive. If we go back to our slides, this is what that would be: each new connection starts its own task, and the connections are managed from the front end. This is useful because, first of all, it's very simple to implement, and second, if you implement it this way, then when you connect, you're essentially getting the stream of events for that task. So let's start with the simple server here; I'll walk through it, which I think is better than watching me type everything. This is slightly more code than I usually do for these live Build Hours, so let's take it easy, let's take it slow. First we define our endpoint, which is going to be an SSE endpoint. We grab the input items and the previous response ID from the body, and we use the Agents SDK's Runner.run_streamed, which gives us a stream of all the events the agent is producing. We're passing in an agent that we can import from the one we defined. If we defined it... where was I? I think it was in one_agent, so I'll import it from there. Yeah, it's 50/50. Oh, it's not happy with that import name; that's fine, I have another one implemented and we'll be using that one, I'll show you in a second. But essentially, we have this runner, we can run it, and for each of the events (we actually get different kinds of events from the Agents SDK), we want the raw response events, which represent the actual events coming from

### [25:00](https://www.youtube.com/watch?v=7E-qdsVEoB8&t=1500s) Segment 6 (25:00 - 30:00)

the Responses API, because the Agents SDK is backed by the Responses API by default. In this event stream we're defining, we run the agent, yield the events back, and encode them with SSE. This is mostly just piping, but I think it's important to show what kind of thing might go into it. At the end, we yield a "done" and return a stream. Already, with this, we've implemented something where, from a front end, we can make a connection to a back end, run the agent, and wait for everything to come back. This is a good place to start if you're just prototyping or making something. But in reality, you might want to be able to handle a task that is created in the background and then disconnected from. You might see this in ChatGPT, where you can type something, completely close the app, come back later, and it'll be finished. This is implemented with a different approach: a background task view. Let's take a second to walk through this architecture, because it's what we're going to use in practice for the front end we just created. We have our front end, which needs to be kept in sync with the back end that has the tasks. What we can do is open an events connection to the back end that just receives all of the new task events. What are these events? Things like adding new items to tasks, updating to-dos, pretty much anything that modifies the state of these tasks. Now, how do we actually kick this off? We can use a task endpoint. Here I've labeled in yellow what is an SSE endpoint that is meant to stay open.
And I've labeled in gray anything that is just a POST request. This has the shape of a background task: you make a POST request, which starts the stream, and that's it, you disconnect. You could actually start this task from anywhere, so this architecture scales nicely. What you can implement this way is: once you have your back end, you can send tasks to a task queue, which is responsible for running your agents, and then any events that come out of that get associated with the task they relate to, and you stream them back. So that's the main architecture. Let me walk you through what the implementation can actually look like. Okie dokie, we have a lot going on, so I'll go slowly. This is the task object we're defining. It has an ID, and this items list represents the items that make up the conversation history, or really the action history, of an agent. We'll get to to-dos in a second; let me delete that for now. And we have this status. The global variables we have here, which you might actually want to persist somewhere (but we're not, for this demo), are the actual tasks mapped by their ID, this asyncio event queue, which lets us run what we're considering background work, and then our FastAPI app. Now, let's go all the way to the bottom, and I'll show you the two routes we have, which I showed you in the diagram earlier. The first one is the events route. It's a pretty simple one, maybe a little hard to follow, but essentially what it does is: as soon as you start a connection, it will forever stream these events; it waits for the events queue to receive something and then forwards it to the front end. And that's all it's doing, right?
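That forwarding loop can be sketched in a few lines of asyncio. The names mirror the walkthrough's description but are reconstructions, not the repo's actual code, and a sentinel replaces "forever" so the sketch terminates on its own.

```python
# Sketch of the events side: a single asyncio queue that the /events route
# drains in a loop, forwarding anything published to the connected front
# end. A None sentinel stands in for "forever" so the demo can stop.

import asyncio, json

events_queue: asyncio.Queue = asyncio.Queue()

async def publish(event: dict) -> None:
    """Encode an event as SSE and hand it to the events queue."""
    await events_queue.put(f"data: {json.dumps(event)}\n\n")

async def events_route(sink: list) -> None:
    """Stand-in for the SSE endpoint: loop, await the queue, forward."""
    while True:
        frame = await events_queue.get()
        if frame is None:  # sentinel so this demo can terminate
            break
        sink.append(frame)  # a real route would yield the frame to the client

async def main() -> list:
    received: list = []
    consumer = asyncio.create_task(events_route(received))
    await publish({"type": "task.created", "task_id": "t1"})
    await publish({"type": "task.done", "task_id": "t1"})
    await events_queue.put(None)  # stop the consumer
    await consumer
    return received

frames = asyncio.run(main())
print(len(frames))
```

The point of funneling everything through one queue is that producers (workers, endpoints) never need to know anything about the connection; they just publish.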
It's just taking from this events queue and sending it to the front end, and you'll see why this is important. Because when we have this events queue, with this endpoint that a front end can connect to, it's guaranteed that anything we push to the events queue gets routed to the front end, and we can push updates from any of the agents up to this events queue. So that's the events endpoint. Now, let's take a look at the tasks endpoint. Just like before, we take the body, we parse it, we grab out the items and the previous response ID, which, if you're not familiar, is a very convenient way to reference the previous requests so that you don't

### [30:00](https://www.youtube.com/watch?v=7E-qdsVEoB8&t=1800s) Segment 7 (30:00 - 35:00)

have to pass the context each time. This becomes really important when you want to do this chain-of-thought tool calling, because you want to make sure that when the model is calling a function, it actually has the rest of the chain of thought in its history. It really is function calls within a chain of thought. Now we can create this task object and save it to our tasks. And here is the important part: we can publish (I'll show you this function in a second) that we have created a task, and what the task ID is. If we scroll up to publish, all we're doing is taking the events queue we were talking about before and giving it the event itself, encoded as SSE so that the front end can decode it. Cool. Finally, once we create the task, the last thing left to do is to actually kick off our worker, our agent, our task. The word "task" is a bit overloaded here: asyncio.create_task is a function from the asyncio library that treats a coroutine as a background task, which actually fits very nicely with our analogy. So we call it, give it a worker coroutine that we'll get into in a second, kick it off in the background, and then return the task ID so the front end can have it. And this is where the juicy bit happens: what's in the worker? Well, it's very similar to what we had before. All we're doing is taking our runner and giving it the agent, which includes the prompt and the tools that are defined.
We give it the input items, which are the input from the user, and the previous response ID so it stays consistent with the previous conversation. We also give it the task in a context variable, and this is important later, so the agent can modify its own task. This is a nice pattern in the Agents SDK, where you can supply objects or context to an agent runner so that, while the agent is running, they're not in the LLM's context but are accessible from function calls. The model doesn't see the object by default, but it can use it like a memory bank: it can't see it, but it can make changes to it. It's a convenient way to represent a closure, or any kind of state that's tracked along with the run. Then we have what we had before: for all the events, we filter, and for those that are response events, we publish them down to the front end so it can render them. We also set max_turns to 100; you can set this to any value you want. Great. This will run until the agent is finished, and then we set the task to done and push down an update. You'll see this pattern a lot: I make an update to the task object and then push that same update down to the front end, just to keep them synchronized, so the tasks we represent in the back end are the same as the ones we represent in the front end. This was a lot, but it's actually everything we have. We have the tasks endpoint, which actually kicks off a task and returns the task ID.
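Putting the tasks endpoint and the worker together, a compressed sketch of the flow might look like this. All names are invented, a stubbed async generator stands in for the streamed agent run, and the demo-only wait loop exists just so the script can observe the worker finishing.

```python
# Sketch of the background-task flow: a POST handler records the task,
# kicks off a worker with asyncio.create_task, and returns the task ID
# immediately; the worker streams (here: fake) agent events, publishes
# them, and marks the task done. A reconstruction with invented names.

import asyncio, itertools, json

tasks: dict = {}                          # task_id -> task record
events_queue: asyncio.Queue = asyncio.Queue()
_ids = itertools.count(1)

async def publish(event: dict) -> None:
    await events_queue.put(f"data: {json.dumps(event)}\n\n")

async def fake_agent_events():
    """Stand-in for a streamed agent run: a short scripted sequence."""
    for e in ({"type": "tool_call", "name": "search_tickets"},
              {"type": "message", "text": "Ticket resolved."}):
        yield e

async def worker(task_id: str) -> None:
    async for event in fake_agent_events():
        tasks[task_id]["items"].append(event)          # update backend state...
        await publish({"task_id": task_id, **event})   # ...and mirror it down
    tasks[task_id]["status"] = "done"
    await publish({"task_id": task_id, "type": "task.done"})

async def create_task_endpoint(input_items: list) -> str:
    """POST /tasks: record the task, start the worker, return the ID."""
    task_id = f"task_{next(_ids)}"
    tasks[task_id] = {"id": task_id, "items": list(input_items),
                      "status": "running"}
    asyncio.create_task(worker(task_id))  # fire and forget; request returns now
    return task_id

async def main() -> str:
    task_id = await create_task_endpoint(
        [{"role": "user", "content": "fix billing"}])
    while tasks[task_id]["status"] != "done":  # demo-only wait loop
        await asyncio.sleep(0.01)
    return task_id

done_id = asyncio.run(main())
print(tasks[done_id]["status"])
```

Every mutation happens twice on purpose: once to the backend task record and once as a published event, which is the synchronization pattern the walkthrough keeps returning to.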
We have the events endpoint, which connects the front end to the back end and streams all of the events from all the tasks, and we have the actual worker, which takes the runner, runs it in streaming mode, and publishes all the events so the front end can have them. So these are the components. I'm going to go back to the diagram so we can see what this looks like. This is what we've made: a system where you can start tasks, hand them to the task view, it runs the agent, and then forwards the events back down. So, let's take a look. I'm just going to refresh everything so we have the best shot. Here on the left I'm running the front end, so I'm going to run it again, and here we have the back end. Let me just make sure I'm not lying to you; I'm running the actual server. Right now I'm importing an agent, but since we already defined one that was, you know, so nice, let's just use that one. I can really just grab it and pull it in. This isn't the best practice, but I'll change the name to, you know, my_agent, and now from here I can import my_agent. Amazing. This is all live, so it might work, it might not. But essentially, with the agent that we defined earlier, we're

### [35:00](https://www.youtube.com/watch?v=7E-qdsVEoB8&t=2100s) Segment 8 (35:00 - 40:00)

going to run and see in the front end. So, did I also bring in the... ah, let's kill this. Let's rerun that. Go back to our front end. Rerun this. Okay, now if everything happened correctly... I wired up the front end separately; I'm not going to be going through the front end, because that code is a little bit all over the place and I don't want to be switching languages. But essentially, when I connected to the website, we can actually see in the servers that it sent a get-events call from the website to the back end. So it starts receiving the events, and right now we haven't streamed anything. But now, when I hit start investigation, it'll supply the context, start a task, and you can see that the events are actually streaming from the back end. So we're going to get user data, because we supplied everything together. And then it's probably going to tell me it can't do something, because we didn't give it the right tools. Yeah, it gave me some response because, again, we didn't design this agent for this website. So let's actually design one. For this, I'm going to go into this agent that I've defined earlier. If we just go through the prompt: you're a helpful assistant. We're going to ignore the to-dos for now. Running in non-interactive mode. Final output. Great. What we're saying here is just: keep going until you're finished. And we're going to delete anything about to-dos; I'll get to that in a second. But now, if we refresh everything (sorry for all the motion sickness going back and forth), then we should actually be using this new agent. And so, what can this agent do? It can get weather, search open tickets, read documents, get runbooks by category, search policies, get emails, add tickets, write documents. We specified these functions up here.
And I made this mock API that essentially just returns mock data for all of these. But this is just an extension of what we were doing before, and in reality, obviously, instead of hitting a mock API, you'd hit a real API; everything would work just the same. So, let's give this a shot. If we start here, it starts a task. Ignore the progress bar for now; that's the next thing we're going to be doing. But we should be able to see it reasoning. And since we're streaming all the events, we can actually run multiple tasks in parallel. And so here we can see the reasoning happening: get user data, it's loading it. And for this one, it's going to go through a different process. So, is it going to give up immediately? It probably will. Oh, did I not switch it out? Oops, my bad. Let's go here and switch back from using this to the server agents once again. I'm just going to do this one. Great. Okay. So now it's actually going through the functions that we defined, step by step, and we can see some progress, right? But it would be nice to have more of an insight. So here's where we get back into the product sense: how do we surface progress to the user? The chain-of-thought events are a nice way to do it, and there are other ways to render it. But there's this cool pattern where you can make to-dos: essentially, give the model a function to surface progress to the user. So how would we implement this? Well, what we can do is take the task object and add this to-dos field. Once we add this to-dos field, for now it just does nothing. But the magic happens when we give the model functions to update this to-dos field. So this is a new function, just like the ones that we were implementing before, but it's a little meta: it actually uses the task in the context we passed before, and the model can supply the actual to-dos for the task that we wanted to run.
And so, for each of the texts in the to-dos, we create a new to-do, add it to the actual task, and then publish it back down. And then we also give it a function to just check off to-dos. So what have we done? We've declared these two additional functions, and then we're going to add them to our agent, to add

### [40:00](https://www.youtube.com/watch?v=7E-qdsVEoB8&t=2400s) Segment 9 (40:00 - 45:00)

to-dos and set to-dos. And this is a very tiny thing, but it actually feels very magical when it works. So let's take a look at that. And maybe we want to add back the thing I removed here. Cool. Now everything about to-dos is there: always create a plan with to-dos, always start by setting the to-dos, and then check them off as you go. Cool. Fingers crossed. So let's try the same task again. And now, if we're lucky and the model does what we wanted it to do... the suspense is killing me... ta-da! Now we have some to-dos. And we can see them in the front end; they can actually drive our progress indicator. And as the model goes on, we get this kind of magical insight into how far along it is, without having to build any monitoring system or any other pieces; we can just have the model go through and check them off. And just like before, we can kick off multiple in parallel. I'm actually not the biggest fan of chat interfaces; I love just clicking. And so this way of just grabbing all the context and shoving it in, so the model has everything it needs, is, in my opinion, a very delightful user experience. Anyway, so now we can see that it's gone through. Now, you may be wondering why you don't see the function calls that check off the to-dos, and the answer is that I am explicitly filtering them out in the front end. And this is how you keep some of the magic, right? If you just don't show the user how it's doing it, it'll just keep going and look more natural, without visibly checking off the to-dos. But yeah, I want you to just take a beat and take this in: what does this mean for your product?
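The to-do pattern just demonstrated can be sketched as two "meta" tools that mutate a shared task object and mirror every change down to the front end. All shapes here (`Task`, `publish`, the two tool functions) are illustrative assumptions; the real demo pushes updates over a server-sent-events stream instead of appending to a list.

```python
from dataclasses import dataclass, field

@dataclass
class Todo:
    text: str
    done: bool = False

@dataclass
class Task:
    id: str
    todos: list[Todo] = field(default_factory=list)

published: list[dict] = []

def publish(task: Task) -> None:
    # Keep the front end in sync by mirroring every back-end mutation.
    published.append({"task": task.id,
                      "todos": [(t.text, t.done) for t in task.todos]})

# The two "meta" tools the model gets: plan first, then check items off.
def set_todos(task: Task, items: list[str]) -> str:
    task.todos = [Todo(text) for text in items]
    publish(task)
    return f"created {len(items)} to-dos"

def check_todo(task: Task, index: int) -> str:
    task.todos[index].done = True
    publish(task)
    return f"checked off: {task.todos[index].text}"

task = Task(id="task-1")
set_todos(task, ["read runbook", "draft reply"])
check_todo(task, 0)
print(published[-1])
```

The point of the design is that progress reporting costs nothing extra: the model's own plan doubles as the progress UI.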
If you have anywhere that would benefit from these long-horizon tasks, or anywhere you want goals fulfilled, or anything that requires multiple, more open-ended steps, this is something that is actually quite useful, and these patterns of running tasks in the background and showing progress really come in handy. And if we go back, there's a whole other bit about delegation. I'm not going to go into it too much; maybe we can talk through it really fast. So delegation is the notion that, right now, I am kicking off the tasks by hand. What if we want the model to kick them off? Instead of clicking ourselves (which I actually love, don't get me wrong), what if we want ChatGPT, or your agent, to actually kick off tasks while we keep talking to it? We can implement this pattern where you front-load the context gathering: it asks you follow-up questions. If you've used deep research, this is what it feels like before it starts off on a long task; it makes sure it has enough information. It starts off that task with a function call, which actually returns immediately, and you can keep talking to it. Meanwhile, the task is running in the background with an architecture similar to the one that we built; you can keep chatting with it, it's non-blocking. This is optional, but it's an approach I quite like. And then, once it's done, it can come back and actually update you on it. I included an example of delegation, and I think we have enough time to go through it really fast, so why don't we do that? This one is, once again, text-based; I didn't build out the UI for this. But (I said I wasn't going to talk about it; maybe I will) we need a system to run this in the background, right? Like, if I just do a function call that calls o3, for example, and maybe on the front end I'm talking to GPT-4.1 mini. If I don't have this enabled, for example, I go here and, just to remind you what I have on screen: I have this agent with no tools set right now; that is just 4.1 mini. Maybe let's set these two tools. Right? This is all I have. What are these tools doing? This function just calls the Responses API with our input, and then get tasks can retrieve the result. So right now, this is not super interesting, because I can say, you know, I'm

### [45:00](https://www.youtube.com/watch?v=7E-qdsVEoB8&t=2700s) Segment 10 (45:00 - 50:00)

going to ask you to do something hard, like write a poem where every word starts with the next, I don't know, prime number, if it was indexing into the alphabet, something like that. If I ask this of 4.1 mini, it's probably not going to be able to do it. But I'll say, you know, start a task with this. And by starting a task with this, if we go back, it's actually going to call o3. So, let's see if this works. Oh, it really just tried. Okay, I'm going to add to the instructions: if the user asks you something too hard, start a task with it. Cool. So, let's try that again. I don't want to type this out, so I'm going to copy it. Oh, it actually... maybe it did a good job. Wow, 4.1 mini is a good model. But for this example, we really just wanted to call the function. So it starts the task. And now, what is this task doing? We are waiting for a response from o3, and o3 is going to do this very carefully. Uh oh, it actually did some of the work for it. Wow, 4.1 is a really good model. But we're blocking: we're just sitting here, and we can't keep talking to it until o3 is done. However, if I enable background tasks, then I can do a very similar thing, but now it should return immediately. And so I'll say this... it should start the task, and then we can... why is it not happy with me? A 500. Okay, we launched yesterday; we're going to ignore this for now. But I think what I want to show you here is: if you have a background system, whether you're depending on OpenAI for it or you implement it yourself like we just did with the Agents SDK, then you can use function calls to essentially delegate a task (not hand off, delegate) that will happen in the background, and then just check up on it or push updates to your main agent. And so the experience for the user can be non-blocking: they can just keep talking with it, and you can serve them. Sadly, I couldn't show you right now. Maybe, let's see.
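The delegation pattern just described can be sketched with nothing but the standard library: a `start_task` tool that hands work to a background thread and returns immediately, plus a `get_task` tool for checking up on it. Everything here is a hypothetical stand-in; `slow_model` plays the role of a long o3 call, and a production system would use a real task queue rather than bare threads.

```python
import threading
import time

# Shared registry of delegated tasks (a stand-in for a real task store).
tasks: dict[str, dict] = {}

def slow_model(prompt: str) -> str:
    time.sleep(0.1)  # pretend this is a long o3 call
    return f"answer to: {prompt}"

def start_task(task_id: str, prompt: str) -> str:
    tasks[task_id] = {"status": "running", "result": None}
    def worker():
        tasks[task_id]["result"] = slow_model(prompt)
        tasks[task_id]["status"] = "done"
    threading.Thread(target=worker, daemon=True).start()
    return f"{task_id} started"  # returns immediately; chat stays unblocked

def get_task(task_id: str) -> dict:
    return tasks[task_id]

print(start_task("t1", "hard poem"))
print(get_task("t1")["status"])  # probably still "running" at this point
time.sleep(0.3)
print(get_task("t1"))
```

Because `start_task` returns right away, the front-line model can keep the conversation going and poll `get_task` (or receive a pushed update) when the heavy work finishes.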
Maybe I do have the simple server. Should we try this? Let's get a little... oh, no. Sorry, this one blocks. We could do it with the other one, but I don't really want to fumble around too much live; I've done that a bit too much. We would have to implement the task. It's fine, we'll skip it. But hopefully you can believe me, right? We have this back-end system, and so what you can do is have a function to start a task, which returns immediately, and you'll be off and running. So I want to end this here, so we can get straight to questions. There are a couple of things we wanted to talk about that we didn't get to; we'll be sharing more resources after. But yeah, shall we jump into questions? — Let's do it. If you bring back over the deck, I actually dropped some in for you. — Can you update it? Refresh. — Yeah. So, what are the most efficient ways to orchestrate sequential and conditional tool calling? I get this question sometimes, and it's always interesting, because the whole point of agentic calling is that you don't have an expectation of something sequential or conditional; you want the model to figure out how to get to a solution on its own. That being said, if you do want something sequential or conditional, just use Python, right? Code is an amazing way to express sequential and conditional things. If you want o3 to always call three functions in a row, instead of hoping it calls those three functions (which is also higher latency and more expensive for you), just put them all in one function and then

### [50:00](https://www.youtube.com/watch?v=7E-qdsVEoB8&t=3000s) Segment 11 (50:00 - 55:00)

have that function do the three things that you wanted. So that's kind of my answer there: Python is very powerful; use code where you want this sort of flow. You can also enumerate it in the prompt; o3 is really good at following those instructions. How should memory be managed in agents handling long-horizon tasks? Another good question. There are many different ways. The model's own context is a pretty decent memory bank. You can also have an explicit memory system: in a very similar way to how we were creating to-dos and checking them off in external state that the model can't actually see (it's not in context), you might want to implement a way for the model to remember things, to save facts and then recall them later if necessary, either with a vector store or something similar. We actually did something very similar in the original version of ChatGPT's memory, where it would explicitly choose to save a fact based on what you said, and then at a later time bring similar facts into the prompt. And the way you can do that is just by doing these vector comparisons, embedding comparisons. I won't get too much into it, but there are many ways; an external memory store is useful. How many tools is too many tools? You can take it up a lot. I try to stay under twenty-ish; that's a very rough heuristic. But at that point it's not so much about whether the model can even handle it; it's more: what are you really describing? And so the Agents SDK (I didn't get into this here) has this notion of handoffs. If you check out the agents and assistants build hours, we get into it more, but essentially it lets you specify multiple different agents that can pass off or hand off a conversation between each other, where the one that has the most appropriate tools will get it, in a way that doesn't have to be rerouted each time.
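The explicit-memory idea from the answer above can be sketched as a save/recall tool pair. This is a toy: plain word overlap stands in for the embedding comparisons mentioned, and all names here are hypothetical.

```python
# Minimal sketch of an explicit agent memory: save_fact / recall_facts tools
# backed by a plain list. A real system would embed facts and query a
# vector store; word overlap stands in for similarity here.
memory: list[str] = []

def save_fact(fact: str) -> str:
    memory.append(fact)
    return f"saved ({len(memory)} facts total)"

def recall_facts(query: str, top_k: int = 2) -> list[str]:
    q = set(query.lower().split())
    # Score each fact by word overlap with the query, highest first.
    scored = sorted(memory,
                    key=lambda f: len(q & set(f.lower().split())),
                    reverse=True)
    return scored[:top_k]

save_fact("user prefers dark mode")
save_fact("user deploy target is eu-west-1")
print(recall_facts("which mode does the user prefer"))
```

The shape is the important part: the model decides when to call `save_fact`, and the runtime surfaces relevant memories back into later prompts via `recall_facts`.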
So, twenty-ish, maybe. Can you use OpenAI hosted tools together with your own custom functions? Yeah, of course. This is actually a really great pattern. I didn't implement it here, but if you want to, for example, use code interpreter to analyze some results from database queries, you can absolutely do that. You can have functions that retrieve certain information, or load in certain numbers or some table, and then the model will choose to do that and then use code interpreter to run, and give you results. So yes, it's actually highly recommended to use these together. Does the Responses API natively support MCP? As of yesterday, yes. As of yesterday, you can actually add any remote MCP server to the Responses API, and it'll make those remote calls. And this is where the magic of background mode actually comes in: usually, the only reason the Responses API has to come back to you before the agent is done is when it wants you to run your own local functions. But if you have no local functions, if you've implemented everything with a remote MCP server, then you can just run it, set background to true, and kind of forget it and check in later. And that can be essentially one very long Responses API call that can do all these different things: MCP, file search, image generation, and so on. — Okie dokie. Just realized this guy. This is cute. Cool. — I had to add that in. — This is really sweet, and everybody in the room that's answering questions, this made them smile. So, I'm glad that this has been helpful, and I really appreciate you taking the time to teach us how to agent. I loved that. — This is cute. I like it. Are there any more? Or was that the last one? — No, that was the last one. And we are running perfectly on time. So, if you go to the next slide there on resources.
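The "just use Python" advice from the Q&A above can be made concrete: rather than hoping the model chains three tool calls in order (three round trips, three chances to drift), expose one composite tool that performs the sequence deterministically. The three steps below are hypothetical examples, not functions from the demo.

```python
# Three steps that must always run in order (hypothetical examples).
def fetch_user(user_id: str) -> dict:
    return {"id": user_id, "name": "Ada"}

def fetch_orders(user: dict) -> list[str]:
    return [f"order-1-{user['id']}", f"order-2-{user['id']}"]

def summarize(user: dict, orders: list[str]) -> str:
    return f"{user['name']} has {len(orders)} orders"

# One composite tool: one model round trip, deterministic ordering,
# and conditionals live in ordinary code instead of in the prompt.
def user_report(user_id: str) -> str:
    user = fetch_user(user_id)       # step 1
    orders = fetch_orders(user)      # step 2
    return summarize(user, orders)   # step 3

print(user_report("42"))  # -> "Ada has 2 orders"
```

Exposing `user_report` as the tool also cuts latency and cost, since the model emits one function call instead of three.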
So, thank you, Ilan, for blessing us with this wonderful hour of building and doing lots of things live that we haven't built with before. — No, we're blessed by everyone building all of this. This is all possible because of everyone at OpenAI. So, yeah. — Yeah, this is true. What a happy note to end on. We're going to follow up with some resources. We're going to share the GitHub, which has all of the repos from prior build hours, and you can see upcoming build hours on the landing page, as well as recorded build hours. If you want to spend more time with Ilan, you can watch his previous build hours, where he talks about assistants and agents. And then we'll also send out a link to a practical guide for building agents; there were a few questions coming in from the chat for which I actually think that would be a really good starting point. Our next build hour is going to be next week, and it's all going to be about image gen and the API. We have lots of really exciting

### [55:00](https://www.youtube.com/watch?v=7E-qdsVEoB8&t=3300s) Segment 12 (55:00 - 55:00)

demos to go through, and a customer story that we will share, and we're looking forward to seeing you at more build hours. So, thanks for tuning in, and happy building.

---
*Source: https://ekstraktznaniy.ru/video/11273*