Build Hour: Responses API
51:15


OpenAI · 14.10.2025 · 22,665 views · 459 likes

Video description
The Responses API is our flagship API for building agents. This Build Hour shows how the Responses API unlocks persistent reasoning, hosted tools, and multimodal workflows for GPT-5. Steve Coffey (API Engineering) covers:
• How to use built-in tools, call multiple tools in one API request, and preserve reasoning tokens across turns
• How the Responses API is faster and more cost-efficient
• Live demo: building a simulator to bring a day in the life of an OpenAI engineer to life
• Live Q&A
👉 Follow along with the code repo: https://github.com/openai/build-hours
👉 Responses API Docs: https://platform.openai.com/docs/api-reference/responses
👉 Migrate to Responses API Guide: https://platform.openai.com/docs/guides/migrate-to-responses
👉 Sign up for upcoming live Build Hours: https://webinar.openai.com/buildhours/

Table of contents (11 segments)

Segment 1 (00:00 - 05:00)

Hi everyone, I'm Christine. Welcome back to another build hour. I'm on the startup marketing team, and today I'm joined by Steve. — Yeah. — Hi, my name is Steve. I'm an engineer on the API team. — Great. So today we're talking all about the Responses API. If this is your first build hour, just a quick reminder that the goal of this hour is to empower you to build with OpenAI APIs and models, with live demos and ready-to-use code repos. I'll drop that link in the chat on the right side of your screen. And you can now find all of our upcoming build hours on our homepage as well as on YouTube. We heard your feedback — you wanted these to be easily searchable — so if you go to the OpenAI YouTube channel, you can see a playlist with all seven of our last build hours, including GPT-5, voice agents, Codex, built-in tools, and many more. As I mentioned, this is the first build hour we're having right after Dev Day. We saw some really exciting product launches like the Apps SDK and AgentKit, and we saw this great theme around building agents — hence why we're talking about the Responses API today. This wouldn't be a build hour without a custom meme, and this one just goes to show why the Responses API is so important to talk about, especially as it relates to building agents. So here's what you can expect for today. First, we'll give you a quick brief on why the Responses API. Then we'll show you with a live demo how to migrate to the Responses API if you haven't already. This demo is going to be really fun — it's going to peel back the curtain on a day in the life of an OpenAI engineer. And since this is the first build hour after Dev Day, we'll give you a little preview of AgentKit before the October 29th build hour, where we'll do a deeper dive. And then my favorite part is always the Q&A.
On the right side of your screen, you'll see our chat as well as the Q&A function. In the Q&A function you can submit questions; we have our team in the room who will be answering them via chat, as well as saving some to answer live at the end. So over to you, Steve. — Cool. Awesome. So yeah, we wanted to talk a little bit today about why we built the Responses API — why we felt the need to evolve our core API primitive from what we previously had, chat completions, to something brand new that enables a lot of new functionality and, we hope, fixes a lot of the design paper cuts people were experiencing with the chat completions API. Before I really get into it, I want to start with a little bit of history. Back in 2020, we launched our first API, v1 completions. This was built for an era where models just finished your thought: you would have a prompt, and the model would start exactly where you left off and continue until it was done or it ran out of tokens. And we had an API for this — v1 completions. If you were an OpenAI builder in this era, you might remember the models of this time: GPT-3, text-davinci, ada — models like these were really the frontier of LLMs at the time. But then in 2022 we launched ChatGPT, and in the API we launched GPT-3.5 Turbo, the first model that was post-trained on a conversational format. So instead of picking up where you left off and just continuing, it was trained to respond to you like a conversational partner and a lot less like a sentence finisher. This API we famously designed on a Friday and shipped on a Tuesday, but it really quickly became the de facto standard for LLM APIs, and soon afterward we shipped features like tool calling and vision in the chat completions API that really helped it level up as the models became more advanced.
But starting earlier this year with the release of o1, o3, and now GPT-5, we have models that are very different. They're agentic and highly multimodal, and we needed an API that would enable everything from simple text-in, text-out requests to highly agentic long rollouts that can last for minutes at a time. So the Responses API really combines the simplicity of chat completions with the ability to do more agentic tasks. The Responses API can simplify workflows including tool use, code execution, and state management, and as model capabilities evolve, we hope the Responses API will be a flexible platform for building agentic applications. Really the core piece of this is a lot of the built-in tools that we've shipped. If you used to be an Assistants API user, you might remember a couple of these from back in the day — we shipped file search and code interpreter in the Assistants API, and we've brought these to the Responses API in addition to a bunch of new tools like web search, computer use, remote MCP, image gen, and of course

Segment 2 (05:00 - 10:00)

function calling, which everyone knows and loves. We believe the Responses API will help us effectively enhance the OpenAI platform into the future as models evolve. Cool. So I want to talk about a few things that really set the Responses API apart from chat completions and make it different. The first is that the Responses API, at its core, is what we call an agentic loop. The core philosophy of responses is that it's an agentic primitive: it needs to be able to do multiple things in the span of one API request. By comparison, chat completions has a design with n choices, one message each — it only allows us to sample from the model one time per request. But in responses we can sample from the model multiple times. What's an example of that? Let's say we want the model to be able to write some code and then use that code to give us a final answer. I can give the model access to the code interpreter tool and say, "What's the square root of [some large number]?" The model can write some code, we can execute that code server-side, we can show the model what the output of that code was, and then it can sample again to give us a final answer based on what the code interpreter gave it. This is what I mean when I say it's an agentic loop: we can do multiple things in a loop until the model finally says, hey, I'm done, here's your final answer. The second big thing is this concept of items in and items out. In the Responses API, everything is called an item. So what's an item? An item is a union of types that represent things the model can do as well as say.
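The agentic-loop request just described can be sketched as a payload builder. This is a minimal sketch, not an SDK call: actually sending it would use the official `openai` Python library (roughly `client.responses.create(**payload)`) with a real API key, and the function name here is ours. The tool shape — a `code_interpreter` tool with an auto-provisioned container — follows the documented Responses API format.

```python
# Sketch of one agentic-loop request: a single Responses API call that
# lets the model write code, run it server-side, and then answer.
# This only builds the request payload; sending it is left to the SDK.

def build_code_interpreter_request(question: str) -> dict:
    """Build a Responses API payload with the hosted code interpreter tool."""
    return {
        "model": "gpt-5",
        "input": question,
        "tools": [
            # "auto" asks the platform to provision a sandbox container
            {"type": "code_interpreter", "container": {"type": "auto"}},
        ],
    }

payload = build_code_interpreter_request("What's the square root of 5,276?")
```

The point of the design is that this one request can cover several model samples: write code, observe the execution output, then answer — no client-side loop required.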
By comparison, in the chat completions API everything was a message. Concepts like function calling were kind of bolted onto the concept of a message, so handling the cases where the model is doing something instead of saying something was a bit tough to reason about, and the code didn't quite look amazing. In the Responses API we've broken these out as separate types: a message is a type of item, a function call is an item, an MCP call is an item, and so on. This makes it much easier to code around. When you get these multiple output items, it's really easy to write a for loop and then a switch statement that lets you do different things with items — populate them in the UI, persist them in your backend, or whatever your application desires. I want to show a quick example of what this looks like. I'm going to pop open my terminal, and on the left here I'm going to make a call to responses with GPT-5 nano — the prompt is just "tell me a joke" — and on the right I'm going to make a call to chat completions with the same prompt. On the left we can see that we have an output key with two items in it. The first thing in here is a reasoning item, denoted by type "reasoning"; essentially it's a receipt that the model thought a little bit before it emitted the joke, which is our classic "why don't scientists trust atoms." Then on the right, in the chat completions world, we don't have any receipt that the model did any reasoning — the design doesn't really allow us to rehydrate these kinds of things from step to step — so we just get this one message with content. If the model were calling tools, that would be represented inline here. So that's a little example of what I mean when we say the Responses API is items in, items out.
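The for-loop-plus-switch pattern, and the idea of passing items straight back as the next turn's input, can be sketched like this. The item dicts are simplified mocks of the documented output-item shapes; the function names are ours.

```python
# "Items in, items out": every output item has a type, so a loop plus a
# dispatch on type is enough to route model output — and the same items
# can be sent back verbatim as input on the next turn.

def route_output_items(items: list) -> dict:
    """Sort response output items into buckets by item type."""
    ui_messages, reasoning, tool_calls = [], [], []
    for item in items:
        if item["type"] == "message":
            # message items carry a list of content parts
            ui_messages.append(item["content"][0]["text"])
        elif item["type"] == "reasoning":
            reasoning.append(item)   # keep, to replay the chain of thought
        elif item["type"] in ("function_call", "mcp_call"):
            tool_calls.append(item)  # execute or display the tool call
    return {"ui": ui_messages, "reasoning": reasoning, "tools": tool_calls}

def build_followup_request(prev_output: list, user_text: str) -> dict:
    """Previous output items (reasoning included) become the next input."""
    return {
        "model": "gpt-5",
        "input": prev_output + [{"role": "user", "content": user_text}],
    }

mock_output = [
    {"type": "reasoning", "summary": []},
    {"type": "message", "content": [{"type": "output_text",
                                     "text": "Why don't scientists trust atoms?"}]},
]
buckets = route_output_items(mock_output)
```

Because reasoning items ride along in `prev_output`, the follow-up request rehydrates the prior chain of thought exactly as described in the talk.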
You can take these items and pass them straight back into your next request, and everything will get rehydrated. So moving on: the Responses API is also purpose-built for reasoning models. It allows you to preserve reasoning from request to request. In that previous example, we saw that the Responses API emitted a reasoning item. If you were to call the Responses API again and pass that same item back, we would be able to rehydrate the chain of thought from the previous request and ensure that the model is able to see it and use it in the subsequent request. This works either statelessly or statefully — the Responses API is stateful by default. So if you're just using it out of the box and passing these items back, we're able to rehydrate this chain of thought out of the database and pass it back to the model. And what we see is that this really boosts tool-calling performance: for example, in our primary tool-calling eval, tau-bench, we see a 5% performance increase on the Responses API compared to the chat completions API. The next thing is multimodal workflows. We've really updated the design to make working with images and other kinds of multimodal content much easier. If you want to do things with vision, it's really easy to pass base64 data or external URLs to the model with the Responses API. We also added support for context stuffing, so you can pass files — PDFs — to the API. For example, if I have my PG&E bill and I ask, "Why was my PG&E bill so high in September?", I can pass that PDF directly to the Responses API, we'll extract the content and show it to the model, and the model can help me figure out what was going on in my house that month. Which is a really cool way to design these multimodal

Segment 3 (10:00 - 15:00)

workflows. The next thing is that we've really rethought streaming from the ground up. The chat completions API emitted what we call object deltas — a pattern that forces you as a developer to accumulate every event that comes out of the API and stack them all up to get a full picture at the end of what happened. The Responses API, however, emits a finite number of strongly typed events, and you don't have to look at every one to understand what happened. If you're familiar with the Responses API, you might recognize a couple of the really common ones: things like output text deltas, if you just want to see the incremental tokens the model is emitting, and events for when the response started, when it finished, or when it failed. Those are events you can listen for, so it's really easy to write a switch statement around them — the code is very nice to work with and easy to reason about, and we'll see an example of what this looks like in a little bit. The last point is that because we're able to rehydrate context from request to request, we actually see that at P50 these long multi-turn rollouts with the Responses API — where the model is calling multiple functions and then eventually giving you a final answer — are actually 20% faster, and they're also less expensive because the model just has to emit fewer tokens. Why is that? In the Responses API, the model will plan once and then call a function; you'll respond, we're able to rehydrate that original chain of thought, and then it can move straight on to the next function, and so on.
We're able to preserve that chain so the model can move quickly through the rollout and then finish. In chat completions, where we have no way to preserve this chain of thought from request to request, the model is forced to think again at every step, which results in many more output tokens. It also results in a worse cache hit rate, because we're dropping the chain of thought at every step and so you lose that common prefix. So those are a few of the things we've thought differently about in the Responses API. At the end of the day, we've realized that developers need a way to simplify how they build these agentic applications, so I want to look at how we're changing deployment with our agent platform. At the center are the Responses API and the Agents SDK — these are our core building blocks that allow you to build embeddable, customizable UIs in your application. If you tuned in for Dev Day or were there in person, you saw that we launched Agent Builder and ChatKit, which make it really easy to build these workflows into your application and drop them in with just a little bit of work. These are built on the Responses API. We also see this as the core of what we call the improvement flywheel: if you're using responses statefully, you can build on a corpus of data you already have to do things like distillation and reinforcement fine-tuning, and you can also build evals for your tasks on top of this data — it puts all of this stuff right at the center. And in conjunction with all this, we have all of our really great tools that enhance the model's abilities, things like web search, file search, and so on. Cool.
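The strongly typed streaming model described above can be consumed with a plain dispatch, rather than accumulating every delta. This is a sketch over simplified mock event dicts (the real SDK yields typed event objects); the event type strings match the documented Responses API streaming events.

```python
# Consuming a Responses API stream: match on a few well-known event
# types and ignore the rest, instead of stacking up object deltas.

def consume_stream(events: list) -> dict:
    """Accumulate output text and track terminal status from typed events."""
    text, status = [], "unknown"
    for event in events:
        etype = event["type"]
        if etype == "response.created":
            status = "in_progress"
        elif etype == "response.output_text.delta":
            text.append(event["delta"])   # incremental output tokens
        elif etype == "response.completed":
            status = "completed"
        elif etype == "response.failed":
            status = "failed"
        # any other event types can simply be ignored
    return {"text": "".join(text), "status": status}

mock = [
    {"type": "response.created"},
    {"type": "response.output_text.delta", "delta": "Hello"},
    {"type": "response.output_text.delta", "delta": " world"},
    {"type": "response.completed"},
]
result = consume_stream(mock)
```

Because each event is self-describing, a handler like this stays correct even when new event types are added: unknown types just fall through.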
So I want to show a little demo of how you might migrate to the Responses API if you're still on chat completions. We know that migrating APIs is not a super fun task, so I just want to show how this can actually be really easy. I'm going to flip over to Cursor, where I have a really simple chat application that I've built — just kind of a worse-looking ChatGPT. I can say, "Hey, tell me a joke." And if I flip back to Cursor, we can see that we're using the chat completions API to power this. If you've ever built a chat application on our APIs before, a lot of this will look really familiar — you're using the built-in SDK methods, and this is just a single React file. So what we've done to make migrating your application to responses really easy is build a migration pack. It's a collection of prompts and guides that we use on top of Codex, our CLI for agentic coding, to actually go and migrate your app from one API to the other. The simplest way to get started is just to copy this bash command. I'll get this kicked off and then show what it looks like. So I'm going to paste this in and hit enter. It's going to ask me what repository I want to migrate — we're just going to do the current one. Do we want to migrate model references to GPT-5? Sure, why not. Branch name? Sure, looks good. And do we want to proceed with dangerous full access? Because this is a live demo, we obviously do. What this is going to do is kick off a run — it's going to run Codex in a headless mode here. Then we'll go back to the browser and show some of the different things that

Segment 4 (15:00 - 20:00)

we've built into this pack to help Codex effectively migrate your integration from one API to the other. We've baked in a lot of these great prompts — things like the high-level differences between the two APIs, some of the guardrails, some of the acceptance criteria. We're also providing it with some docs. These are migration notes: common things like, if you were familiar with one concept in chat completions, what does that concept look like in responses; some of the formatting differences; content items; some of the philosophical changes I talked about earlier. So it's going to feed all these things into Codex and let it really go and cook, and then come back to you when it's done. Even though this is a really simple application that should only take a few minutes, it actually scales really well to larger applications as well. The first few times I ran this, it took about 10 minutes, so I'm going to stop here and take the fresh one out of the oven, right? I'm just going to switch to a branch where I've already finished this, and then we'll do a quick diff and we can see — oops, that was backwards — yeah, cool — what Codex actually did. It switched our conversation mapping to input items instead of messages. It added a couple of extra fields here. It of course switched our model to GPT-5. It included reasoning item encrypted content — this is what I was talking about earlier when I said you can rehydrate chain of thought from request to request, even if you're a ZDR (zero data retention) customer or want to work with the Responses API statelessly.
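The stateless / ZDR pattern the diff enables can be sketched as follows. This is a hedged payload sketch, not the migration pack's output: `store=False` and `include=["reasoning.encrypted_content"]` are the documented Responses API parameters for opting out of server-side storage while getting encrypted reasoning content back to replay yourself; the function name is ours.

```python
# Stateless Responses API usage: nothing is persisted server-side, and
# reasoning items come back with encrypted_content that the client can
# pass into the next request to rehydrate the chain of thought.

def build_stateless_request(input_items: list) -> dict:
    return {
        "model": "gpt-5",
        "input": input_items,
        "store": False,  # opt out of server-side persistence (ZDR-friendly)
        # ask for reasoning items with encrypted content you can replay later
        "include": ["reasoning.encrypted_content"],
    }

req = build_stateless_request([{"role": "user", "content": "tell me a joke"}])
```

On the next turn, the client sends the returned items (including the encrypted reasoning) back as input, getting the stateful benefits without server-side storage.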
And then it changed the streaming handling to work with the responses streaming events instead of the chat completions ones. If I go back to my app, we get a very similar experience, but now we're built on the Responses API with GPT-5. So I can say "tell me a joke" again, the model will think for a little bit, and then come back and tell me the joke. This is a really easy way to at least get started on migrating your application to the Responses API if you have a deep chat completions integration today. We hope the migration is very easy, but we want to provide as many tools and guides as possible, because we really think there are a ton of benefits to migrating over. With that, I want to move on to my next demo — a little game that we made — and talk about how we can use the Responses API to add some cool agentic capabilities to it. Over the weekend I built this little game; I'm calling it OpenAI Simulator. It simulates a day in the life of an OpenAI engineer. I won't tell you exactly how long I spent building this map — I'm a little embarrassed to — but it's a pretty faithful representation of what our floor, the API floor at OpenAI, looks like. There are a bunch of characters, and there are two main characters we want to be able to interact with. We have Wendy J, who's an engineer on my team — she built some of the great tools that you know and love, like image gen and file search. And of course we have Sam Altman, CEO of OpenAI. He's really interested in building AGI, helping his employees get there, and guiding them to success.
So let's flip back over to Cursor and look at how this is configured, and we'll walk through some of the code. We have two agents, one is Sam and one is Wendy. We have some request options here — we're feeding them both the model, which is GPT-5. Sam has some pretty basic instructions, and Wendy has some basic instructions — things like a little bit of backstory, how you should act, how you should behave. If we go back to our game and say, "Hey Sam, what's on the critical path to AGI?", Sam will think for a little bit and then eventually respond to us. If we pop open our developer tools, we can see the different streaming events that come back to us: response.created at the start, in_progress, output_item.added. But the problem is that it takes a little while for Sam to actually respond, and before he starts talking about frontier model architecture, agent tools, and memory, we're just left hanging — it's not that exciting an experience. So what we want to do is give Sam the ability to emit his reasoning summary, so we can see what Sam's thinking before he actually starts talking. Let's go back to our code and update it: we'll add the reasoning block and say effort is medium and summary is auto. What this does is enable a reasoning summarizer in the API, which looks at the chain of thought coming out of the model, decides if it's worth summarizing, and if it is,

Segment 5 (20:00 - 25:00)

it will start a sideline sampling process to summarize the chain of thought in a way that's consumable for a user, and then start streaming that back. If we go over to the right, we can see our for loop where we're handling our different streaming events. I'll minimize this one to start, but we have just three handlers right now. We're looking to see when output items are done — we'll get back to this in a second. We want to see when we get text deltas — this is our final message from the model; when it emits a new token, we want to emit that to the UI, and we're saying it's type "text" to give it that visual treatment. And we have another one here so we can log the final response when it's completed. So let's go ahead and add a new case here: case response.reasoning_summary_text.delta. And we'll change the type to "reasoning" to give it a different visual treatment. Then we'll go back to our game and say, "Hey Sam, what do you think is on the critical path to building AGI?" Sam's going to think; in the background he's going to emit some reasoning tokens, and hopefully we'll start summarizing those tokens and see the reasoning summary in a second. Okay, cool. So he's thinking about the critical path priorities. Our UI is not amazing here — because there's only a little bit of chain of thought to summarize, we get a little bit of that summary and then he launches right into his final answer. So anyway, that's a little preview of how you can use reasoning summaries to make your UIs a bit more interactive while you're waiting for the model to think, especially if you're using GPT-5. Now, no good agent is complete without tools and things it can do in the real world, and because Sam is so focused on building AGI, we wanted to make sure he was guided and knew what to do.
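The reasoning-summary setup just described can be sketched in two parts: enabling the summarizer on the request (`reasoning: {effort, summary}` is the documented shape), and tagging summary deltas differently from answer deltas for the UI. Function names are ours, and the event dicts are simplified mocks.

```python
# Enable the reasoning summarizer, then give summary text a different
# visual treatment ("reasoning") than final-answer text ("text").

def build_summary_request(prompt: str) -> dict:
    return {
        "model": "gpt-5",
        "input": prompt,
        "reasoning": {"effort": "medium", "summary": "auto"},
        "stream": True,
    }

def tag_stream_for_ui(events: list) -> list:
    """Return (kind, text) pairs: 'reasoning' for summary, 'text' for answer."""
    tagged = []
    for event in events:
        if event["type"] == "response.reasoning_summary_text.delta":
            tagged.append(("reasoning", event["delta"]))
        elif event["type"] == "response.output_text.delta":
            tagged.append(("text", event["delta"]))
    return tagged
```

In the demo, the "reasoning" kind gets the muted treatment while the model is still thinking, and "text" renders as the character's actual reply.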
So we created a Linear board with some of the tasks you might think are on the critical path to AGI — things like a memory leak when AGI role-plays as a toaster for too long, support for writing breakup texts, please recognition — you know, just some of the basic stuff. We want to give Sam access to this Linear board so that he can pull from it and give me things to do in the game. Let's go back to our IDE and add tools. We'll give him access to this MCP tool — a simple tool definition describing how we want our API servers to connect to this MCP server. We give it a type — type is "mcp" — and a server label, which is how the functions in the MCP server are namespaced. We give it a server URL, a little description, and of course an authorization token that identifies me as the owner of the project we just looked at. Then we can give it some allowed tools — things like get issue, list issues, create issue. This is an allow-list of tools we want the model to be able to call. There are many tools, but we don't always trust our agents to work without supervision, so sometimes you want to limit the tools they can call. Then we'll say always require approval. Over on the right side, we'll look at some code we already wrote to handle this: we're looking at our output_item.done event, and if the type of item we got back is an MCP approval request, we pop open a window to confirm — yes, I want to run that task — and then we basically auto-approve it and keep sampling. So let's go back. Actually, what we want to do is add a couple more items to our switch statement for our streaming events. We'll say response.
mcp_list_tools.in_progress, and we'll emit something to the UI that says "listing tools." Great. Then we want to add another one here so we can emit an event when we're actually calling the tool. So if event.item.type is "mcp_call", we again emit an event to our UI that describes what tool is being called. We'll say "calling," and then `event.item.server_label` dot

Segment 6 (25:00 - 30:00)

`event.item.name` — so we're printing out the name of the function the model is calling. We added this so we can know what the API is doing in the back end. When you first make a request to responses with MCP enabled, the first thing we do is list the tools the server exposes. That's because MCP servers can be dynamic — the tools can change from request to request, and depending on what level of authorization you have: if I'm a limited-privilege user, I might have access to fewer tools than a more privileged admin user. So we want to know when this is happening, to keep our UI fresh and know what's going on. Let's make sure we save that, and we'll add some specific instructions to Sam about how to use the MCP server. We'll go back to our game, go down to Sam, and say, "Hey, Sam, I'm really excited to work on AGI. Can you list a few of the issues in the AGI Linear board I might be able to work on?" We'll see that he's listing tools. He's going to think a little about how to go about fetching stuff. We get a little popup that says "run list issues," and we'll say yes. Sam is thinking again, so he's going to run this tool in a second — he's calling the Linear MCP server's list issues. It's a little hard to see, but you might be able to see right down there: he's got a list of whimsical issues. I think they're actually pretty serious, but that's fine. Okay, cool. So he's able to pull from the board. He said, yep, there's an issue with AGI coordinating Skyrim NPC unions, there's the memory leak for the toaster thing, please recognition, etc. And I'm going to say, hey Sam, I'm actually really interested in making sure AGI can tell the difference between MacBooks and Windows PCs.
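The MCP tool definition and the approval handshake described above can be sketched like this. The field names (`server_label`, `server_url`, `authorization`, `allowed_tools`, `require_approval`) follow the documented remote MCP tool shape; the server URL, token, and helper names are placeholders of ours.

```python
# Remote MCP tool definition plus the approval round-trip: a
# require_approval="always" server pauses before every tool call and
# emits an mcp_approval_request item, which we answer with an
# mcp_approval_response input item.

def build_mcp_tool(server_url: str, token: str) -> dict:
    return {
        "type": "mcp",
        "server_label": "linear",        # namespaces the server's functions
        "server_url": server_url,
        "authorization": token,          # identifies us to the MCP server
        "allowed_tools": ["get_issue", "list_issues", "create_issue"],
        "require_approval": "always",    # every call pauses for a yes/no
    }

def approve(request_item: dict) -> dict:
    """Turn an mcp_approval_request output item into the reply input item."""
    assert request_item["type"] == "mcp_approval_request"
    return {
        "type": "mcp_approval_response",
        "approval_request_id": request_item["id"],
        "approve": True,
    }
```

In the game, `approve` is what runs after the confirmation popup: the response item goes back as input and sampling continues inside the same agentic loop.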
Can you add a task in the board for this? And Sam's going to think again. The whole thing is kind of covered up by the existing text, but he'll probably think a little and then we should get a prompt for him to actually create an issue in our board — he's able to interact with this board kind of in real time. So he's thinking about how to create a concise description for the project — how to distinguish MacBooks versus Windows PCs — and how to let me know. He wants to run create issue, and I'll say okay. Now he's calling the Linear MCP server's create issue, and when he's done, he should summarize what he did and tell us. Okay, great — he was able to create issue 32, "Distinguish MacBooks versus Windows PCs." And if we come over to our Linear board, we see that he actually was able to do this, and if we click in, he's given us a pretty thorough description of exactly how to do this. So we'll say, "That's great, I can go off and work on this." This is an example of how you can bring really rich information from other parts of the internet — other services that you know and love — into your applications using something like MCP, and this has first-class support in the Responses API, as we saw. We also want to show a little bit about how we do these multi-turn, multi-tool rollouts in one API request — the ability for the model to go off and do multiple things and then finally come back and give you a final answer. To do that, we'll talk to our other character in the game, Wendy. Let's flip back to our code and give Wendy access to a couple of tools, and we'll copy some of these settings over. We'll give Wendy access to two tools.
We'll give her access to the web search tool and the image generation tool — two cool tools that allow the model to search the internet and also have access to the great image gen model that everybody loves so much. Web search is pretty simple; you can also add things here like where the user is based — their

Segment 7 (30:00 - 35:00)

city, their state, their time zone — if you want really localized results. Then we'll give her access to the image generation tool. We're using our GPT Image 1 model — a small square image at low quality, just for speed. We'll save this and — oh, actually, we probably want to know when these tool calls are happening. So we'll add a couple more case statements to our switch statement. We'll say case response.web_search_call.searching, and — I'm going to copy some stuff — we'll say "searching the web." Cool. Then we want another one for generating an image: case response.image_generation_call.in_progress, and we'll do a similar thing there. We're just using the reasoning type here to give it that blue visual treatment in the UI — "generating image." We'll save, save over there, go back to our game, and go talk to Wendy. We'll say, "Hey Wendy, I've never seen a French bulldog before. Can you search the web to find out what they look like and then draw me a picture of one?" Cool. So Wendy's going to think for a little bit, and what I want to do is pop open our developer tools so we can see the streaming events as they happen. Wendy is thinking about image generation tools — she's thinking and planning about what she wants to do. She's going to keep it to three to four queries, which should hopefully be enough. She's searching the web. Back in our dev tools, we can see records of this: we have an output_item.done, we have a reasoning call here, we have web_search_call in_progress, web_search_call searching. So we get these state-machine events that tell us what is happening with the tool call at any given time. Web_search_call completed.
And if we click into one of these events that represents the web search call, we can actually see what the model searched for. It's looking for "AKC" (I think that's American Kennel Club) "French bulldog breed standard appearance ears compact size colors," and so on. Wendy's going to do this a few times to gather everything she needs to give me an accurate picture of what a French bulldog looks like, and at the end she should hopefully be able to show it to me. So she searches the web multiple times. We added some code in our handler: we're looking for when an output item is done, and specifically for image generation calls, which have the base64 data representing the image streamed right back to you. The code we've already written will open that image in a new tab when the full call is done. Let's give Wendy a second to think through everything that needs to happen. Okay, great, she's starting to think about the image prompt. And she's gone ahead and drawn me a photo of a French bulldog, which is very cute. So this is a brief example of how the Responses API can really level up your applications, help you take advantage of multiple hosted tools and information from outside your application, and bring your characters and your apps to life. The last thing I want to show is a preview of our Agent Builder product. If you tuned in to Dev Day, or were at Dev Day in person, you saw Christina give an overview of how to do things with Agent Builder, and I just want to show a simple example of how you might recreate some of what we just built there.
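The base64 handling mentioned above can be sketched like this. The item shape is simplified; the `result` field carrying base64 image data is how a completed image_generation_call output item delivers its image, but treat the surrounding details as illustrative.

```python
import base64

def decode_generated_image(item: dict) -> bytes:
    """Decode the base64 payload from a completed image_generation_call
    output item into raw image bytes (which you could then write to a
    file or open in a new tab, as the demo does)."""
    return base64.b64decode(item["result"])

# Stand-in payload; a real call returns base64-encoded PNG data here.
fake_item = {
    "type": "image_generation_call",
    "result": base64.b64encode(b"\x89PNG...").decode("ascii"),
}
```

In a streaming loop you would call this when a response.output_item.done event arrives whose item type is image_generation_call.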
What we have here is a really simple workflow, and I'll click through the nodes and explain what's happening. The first node is a web-search decision agent. It has some of the same backstory, and its instructions are to decide whether the query being asked requires a web search and, if so, to emit some structured data saying yes, this user wants to search the web, along with the search query. Then we have an if/else statement that looks at the output from the last node and will either call this agent, which has the web search tool enabled and some of the localized settings we set earlier, or call a different agent whose job is just to respond conversationally. So we can go ahead and try this out. We'll click preview and say, "Hey, what's the weather near me?"

Segment 8 (35:00 - 40:00)

We can see the nodes activate as we go through the workflow. The first one classified my query as wanting web search, with the query being "current weather near me." The if/else statement passed, and we moved on to the agent that can actually search the web. It knows where I'm located and tells me it's actually a pretty warm day in San Francisco, all things considered: 67 degrees Fahrenheit, about 20 Celsius for folks not in the US. Then we can go back and ask, "Hey, actually, can you just tell me a joke?" and watch the inverse happen. The decision node returns false, we flow into the conversational agent, and we get a few jokes, including the Kit Kat ads one. I haven't seen that one; that's good. Okay, cool. So that's a little preview of what our Agent Builder product looks like. The next build hour will go really deep on this, but this product will be really cool for building these drag-and-drop workflows and dropping them right into your applications without the need to write so much code. With that, I think we can go to Q&A. — Awesome. — Yeah. — So you can just do a quick refresh here. Okay, cool. This person asks, "What's the best way to pass example outputs to the model using GPT-5 mini? I want to return structured JSON, but I often find that it can hallucinate." This is interesting. We find that people really have a lot of success with few-shot prompting the model. Even though this is a technique that dates back to the first generation of these chat models, it still works really well for the current generation too.
So if you're finding that the model hallucinates and you want to give it really clear instructions on how to behave, give it a few varied examples: here's a user message, here's the input, and here's the assistant message that's a good canonical example of what you would want it to say. Give it a few different examples of this, so it's varied, and that gives the model a really clear idea of what you actually want it to do in that scenario. So we recommend trying few-shot prompting. And if you find that the hallucinations involve the model making up data that it could find outside its own context, try adding in tools like web search to bring in data from the outside web, so it doesn't just make something up. Great question. Cool. Are there performance differences between the Chat Completions and Responses APIs? Yeah, this is a great question. We find that the Responses API really thrives in these long tool-calling rollouts, where the model is going to think for a while and then call a bunch of functions in a row. The end-to-end performance of these rollouts over many requests is actually a lot shorter than it would be in Chat Completions, where the model has to think again between every step because it can't preserve its reasoning from the first step. In Responses, the model can think for a little bit and then call a tool; when you respond, we rehydrate that original thinking content, and the model knows, "Okay, that's what my plan was, I can just proceed with the next tool call." Whereas in Chat Completions we have no way to preserve that original content, so it gets dropped and the model has to think again before it can continue.
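A hypothetical few-shot input list along the lines described above: example user/assistant pairs precede the real query so the model sees the expected JSON shape, including a null case so it doesn't invent data. The task, field names, and examples are all made up for illustration.

```python
# Hypothetical few-shot prompt for a structured-JSON extraction task.
few_shot_input = [
    {"role": "developer",
     "content": "Extract the city and country as JSON. "
                "Use null when a field is absent."},
    # Example 1: the happy path.
    {"role": "user", "content": "I flew into Tokyo last week."},
    {"role": "assistant", "content": '{"city": "Tokyo", "country": "Japan"}'},
    # Example 2: the null case, so the model doesn't hallucinate values.
    {"role": "user", "content": "The weather was lovely."},
    {"role": "assistant", "content": '{"city": null, "country": null}'},
    # The actual query comes last.
    {"role": "user", "content": "We spent a weekend in Lisbon."},
]
# This list would be passed as `input` to client.responses.create(...).
```

Varying the examples (positive, negative, edge cases) is what makes the pattern effective, per the advice above.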
And the process of thinking means emitting more tokens, which takes time. By not having to do that, we save, at the median, about 20% of time; it's also a little bit cheaper, and you get better cache hit rates. We've also found that the Responses API enables stateful queries. Say you previously had a function call that did something one of our hosted tools can implement, like a RAG function that looks something up from a corpus of files. Round-tripping that means you get the function call that says "search the files," you go do something on your own server, and then you have to send the result back to the API. That round trip incurs additional latency, whereas the hosted tools in the Responses API can all run in a tight loop, so we save time there, and of course our retrieval stack is really finely tuned. So those are a couple of examples of where the Responses API might get you better performance than Chat Completions. Okay, cool. How are previous items passed to a new response request? Is this the conversation the docs mention, or something else? I'm interested in rehydrating my chain of thought over many requests. Okay, this is a really good question, and we actually didn't

Segment 9 (40:00 - 45:00)

touch on the different ways to do this. There are actually a few. The simplest is the one you're probably familiar with if you've used Chat Completions in the past: take the whole list of items that represents your conversation and pass it to the next request. You can just keep appending things to that list and passing it back. That's how you would have used Chat Completions. If you don't want to do that, we also have a helper called previous_response_id. This lets you chain off of a previous response and just add one or two incremental input items. It will load the previous response, fetch all the context from it, append the things you passed in, and continue from there. It's an easy way to keep a pointer to the head of the conversation without having to manage the context state yourself. A couple of months ago we also launched the Conversations API. If you were an Assistants API user, you're probably familiar with the thread object; the conversation object is the reimagination of that for the Responses API. You create a conversation by doing POST /v1/conversations and get an object ID back. You pass that to the Responses API, and as the conversation progresses you just pass incremental input items with each call to /v1/responses. You might say the conversation ID is this, and the input items are A and B, and we will append those items to the conversation. The conversation grows and mutates over time, and you only have to keep a reference to the one conversation object you created. It's really great if you're building a chat UI: you can just list the items in the conversation if you want to render a UI out of it.
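The three context-management patterns above can be sketched as request payloads. This is a hedged sketch: the IDs are placeholders, and the payloads are shown as plain dicts rather than live client calls.

```python
# Three hypothetical ways to carry context between Responses API requests.

# 1. Manage the item list yourself: append every turn and resend the list.
history = [{"role": "user", "content": "Tell me a joke."}]
request_manual = {"model": "gpt-5", "input": history}

# 2. Chain off the previous response: the API reloads that response's
#    context and appends only the new items you pass in.
request_chained = {
    "model": "gpt-5",
    "previous_response_id": "resp_abc123",  # placeholder ID
    "input": [{"role": "user", "content": "Another one, please."}],
}

# 3. Conversations API: create a conversation once (POST /v1/conversations),
#    then reference it; the server appends items and grows the state for you.
request_conversation = {
    "model": "gpt-5",
    "conversation": "conv_xyz789",  # placeholder ID
    "input": [{"role": "user", "content": "One more!"}],
}
```

Each dict would be passed as keyword arguments to client.responses.create(...); which pattern fits depends on how much context state you want to manage yourself.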
So there are a few different ways to rehydrate context from request to request, depending on your specific application and how much of the context you want to manage yourself. Again, if you're using Responses statefully, so you're using the conversation object and things are being persisted on our servers, the chain-of-thought rehydration happens automatically. If you're using it statelessly, with the store parameter set to false, you can include the encrypted content to round-trip that encrypted reasoning, so you still get the benefits of rehydrating the chain of thought. If you're a ZDR customer, same thing: you can use that include parameter to rehydrate it. Cool. Will the template be available as OSS? I think all the stuff we showed today will go up on our build hours GitHub, so yes, hopefully everything you saw today will be available there. Cool. "I'm working on a paramedic simulator using multiple agents and roles: patient, bystander, dispatcher, etc. Any chance of you posting this code? It seems like a great jump start." Yes, exactly. This game we put together will be available on the build hours GitHub, so feel free to fork it and do whatever you like. — I added those because it sounded like people were really interested in also playing a day in the life. — Nice. Yeah, exactly. It's pretty fun, I would say. It's a good time. — You can make it your own. — Exactly. — I know we have some time left, so if you want to do a quick refresh, I just added two more questions. — Awesome. Okay, cool. How does prompt caching work, and how can I edit my prompts to take advantage of it?
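The stateless pattern described above might look roughly like this; the include value matches the encrypted reasoning content discussed in the transcript, while the model name and input are placeholders.

```python
# Hypothetical stateless request: store=False means nothing is persisted
# server-side, so we explicitly ask for the encrypted reasoning content
# and round-trip it ourselves on the next turn (the ZDR pattern above).
stateless_request = {
    "model": "gpt-5",
    "store": False,
    "include": ["reasoning.encrypted_content"],
    "input": [{"role": "user", "content": "Plan the next step."}],
}
```

On the following turn you would append the returned reasoning items (with their encrypted content) back into the input list, so the model can pick up its original plan.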
So, yeah, it's pretty straightforward. Once you pass in all of your context, we construct an underlying representation in tokens and send those to the model. At the model level there's a cache that essentially does a prefix match on the tokens you passed in against what may already be in the cache. Say you made a call to the Responses API and said, "Hey, tell me a joke." We will tokenize that conversation and send it to the model, and if we find that exact prefix in the model's cache, those count as cached tokens and you pay a discounted price for those input tokens. The way to really take advantage of prompt caching is to not change the earlier parts of your context between requests, because if you remove something, say, three items up, you change the prefix, and anything that comes after that point will be lost because the set of tokens will be different. So the way to use the cache most effectively is to treat your context as an append-only list and just keep appending things to it. If you change things earlier in the flow, you'll lose out on some of the benefit of prompt caching. Cool. What are some of the most common mistakes you see with the Responses API? This is a good one. We
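The prefix-matching behavior described above can be illustrated with a toy simulation. This is a simplification: real prompt caching operates on the underlying token stream (and in chunks), not on whole conversation items, but the append-only intuition is the same.

```python
def cached_prefix_len(cached: list[str], request_items: list[str]) -> int:
    """Toy model of prompt caching: count how many leading items of the
    new request match what's already cached. Only that shared prefix
    would be billed at the discounted cached-token rate."""
    n = 0
    for a, b in zip(cached, request_items):
        if a != b:
            break
        n += 1
    return n

# Appending to the end keeps the whole old prefix cached:
old = ["system prompt", "user turn 1", "assistant turn 1"]
appended = old + ["user turn 2"]

# Editing an earlier item invalidates everything after the change:
edited = ["system prompt", "user turn 1 (edited)", "assistant turn 1"]
```

Running cached_prefix_len on these shows the append-only list keeps all three old items cached, while the edit leaves only the system prompt matching.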

Segment 10 (45:00 - 50:00)

definitely see that the biggest thing the Responses API offers is this advantage of rehydrating chain of thought from request to request, and a common pitfall for ZDR customers, who are stateless by default, is missing the additional opt-in step of requesting the encrypted content so you can rehydrate it and pass it back. If you don't do that, you won't be able to take advantage of it. So we definitely recommend, if you're using Responses statelessly or if you're a ZDR customer, always requesting that encrypted content so you can pass it back and get the full abilities of the reasoning models. Another thing we see is folks trying to roll their own versions of the hosted tools we offer. We think we offer some really great hosted tools, and they make it very easy to get started, especially if you're building something like a RAG pipeline where you want the model to search over a corpus of documents. That's a really hard thing to do well, and our team has spent a ton of time getting it right. So we'd definitely recommend, where possible, especially if you're just getting started and don't need so many knobs, taking advantage of the built-in tools to get all the power out of the platform and ramp up quickly. The other thing we really want to encourage people to do is try out some of the other objects in the Responses API ecosystem. We talked about the conversation object a little; it's a really easy way to get started, especially if you're building a chat interface. The other really great object we shipped a few months ago is the prompt object.
So, if you have a task in your application, say "translate this," or the outline for your character in the game example we went through, you can create a prompt object in the dashboard. It's a saved object that describes a task: you can give it instructions, like "you are character XYZ" or "your job is to translate the content below from English to Spanish," and you can give it tools. This lets you define a task and then just reference it in the Responses API with the prompt ID. You can version these prompts and iterate on them, and it really helps you hill-climb on your tasks. Maybe you have an eval for a specific thing and you want to make it better: you can change the prompt, save it, take the new version, drop it into the Responses API, and all your calls get the benefits without you having to hardcode all that into your application. So, a few common pitfalls, and a few ideas if you've never tried those objects before. — Awesome. These are some of our resources. Our Q&A time is winding down; do you have time for one last question? This one asks: would you explain how and when the MCP tool calls happen in the Responses API, and what have you seen folks doing with that functionality? — Totally. Yeah. MCP is really cool; we were really excited to support it in the Responses API. The way it works is, if you enable an MCP server in Responses, the first thing we do is reach out to the MCP server and say, "Hey, what are the tools you have available for my user to use?" If you've used function calling, it's very analogous; we almost refer to it as remote function calling. It returns a bunch of function definitions, essentially.
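A request referencing a saved prompt object might look roughly like this; this is a hedged sketch, and the prompt ID, version, and variable names are all placeholders rather than real dashboard values.

```python
# Hypothetical request referencing a saved prompt object by ID.
# The ID, version, and variables are illustrative placeholders.
request = {
    "model": "gpt-5",
    "prompt": {
        "id": "pmpt_translate_en_es",  # placeholder prompt ID
        "version": "3",                # pin a version, or omit for latest
        "variables": {"content": "Good morning, friends."},
    },
}
```

Updating the prompt in the dashboard and bumping the referenced version is what lets you iterate, as described above, without redeploying application code.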
We feed those into the model and say, here's the list of functions you have. They're all namespaced, so we can tell the difference between a function you provided, where we yield control back to you to handle it, and an MCP function, where we reach back out to that server to execute it. In the Linear example there are a bunch of functions, something like 40, and we only looked at a few of them. The canonical flow would be: we make a request, and the Responses API calls out to the MCP server saying, "Here's my user's auth token; what functions does this user have available?" It returns the list, and we show that to the model. Depending on my prompt, if I say, "Hey, create an issue on the XYZ board with this description," the model can identify the right tool and emit some JSON saying, here's the tool I want to call and here are the arguments. We take that and send it back to the MCP server. So we're reaching back out to Linear with that JSON, and the Linear MCP server does something with it, in this case maybe creating an issue in the project, and returns an acknowledgement back to us: "Okay, great, here's a representation of the issue I created." The model can then look at that and summarize what it did in its final answer. So that's a little bit about how MCP works under the hood. As for cool things we've seen people do with MCP: people really like giving their agentic coding tools, like the Codex CLI, access to the other things they might have to pull from. Linear is a great example again. Let's say I spin up Codex and I say, hey,
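An MCP tool configuration along the lines described might look like this. This is a hedged sketch: the server URL, label, and token are placeholders, and the exact field set should be checked against the Responses API docs for the MCP tool.

```python
# Hypothetical MCP tool config. The Responses API first lists the
# server's tools, then routes the model's namespaced tool calls back
# to the server for execution, as described above.
tools = [{
    "type": "mcp",
    "server_label": "linear",
    "server_url": "https://mcp.example.com/sse",       # placeholder URL
    "headers": {"Authorization": "Bearer <user-token>"},  # placeholder token
    "require_approval": "never",  # or gate each call behind an approval step
}]

request = {
    "model": "gpt-5",
    "tools": tools,
    "input": "Create an issue for the login bug on the XYZ board.",
}
```

The namespacing mentioned in the transcript is what lets the platform distinguish these remote MCP functions from your own function tools in the same request.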

Segment 11 (50:00 - 51:00)

can you just pull this issue from Linear, it's already got the details in it, or I want you to work through these five issues. It can work with Linear to pull a few things and then work on them independently, without you having to stop and prompt it every few turns. The Codex CLI is built on the Responses API, and you get all that cool functionality by virtue of that. — Awesome. Thanks so much, Steve. This was really helpful. Love seeing all of the questions come in. We're trying to answer them all in the chat as well as live, but if you didn't get your questions answered this time around, we have more build hours. If you want to just hit the next slide. Awesome. So, October 29th is all about Agent Kit; it's a deeper dive into what you've seen today, so bring your remaining questions about the Responses API there as well. Then November 5th is all about agent RFT: once you've built your agent, how do you make it better? And December 3rd is about agent memory patterns; super excited for that one. All of the build hours are available on YouTube, as well as on demand on our homepage, and that's where you can sign up for all the future build hours. And with that, we'll wrap up. Thanks so much for attending, and we'll see you at the next one.
