Build Hour: Built-In Tools
1:01:00


OpenAI · 03.09.2025 · 7,318 views · 125 likes


Video description
Built-in tools let you extend models out of the box without writing custom functions. This Build Hour shows you how to use web search, file search, code interpreter, MCP, and image generation directly with the Responses API, with demos of adding these tools to real applications.

Katia Gil Guzman (Developer Experience) covers:
- What are built-in tools? How do they compare to function calling?
- Available tools: web search, file search, MCP, code interpreter, computer use, image generation
- Playground demo: experimenting with tools at https://platform.openai.com/chat
- Live demo: building a data exploration dashboard using MCP, web search, and code interpreter
- Why use built-in tools? Minimal coding, out-of-the-box functionality, and the ability to combine tools
- Customer spotlight: Hebbia's use of web search for finance and legal workflows (https://www.hebbia.com/)
- Live Q&A

👉 Follow along with the code repo: https://github.com/openai/build-hours
👉 Playground: https://platform.openai.com/chat
👉 Built-In Tools Guide: https://platform.openai.com/docs/guides/tools
👉 Sign up for upcoming live Build Hours: https://webinar.openai.com/buildhours

Table of contents (13 segments)

Segment 1 (00:00 - 05:00)

Welcome back to OpenAI Build Hours. I'm Christine, I'm on the startup marketing team, and today I'm joined by Katia. — Hi everyone. I'm Katia, and I'm on the developer experience team. — Katia is actually back for her second Build Hour, and she's going to be in San Francisco all summer, so we're really excited to have her. We have a really fun topic today: it's all about built-in tools. But before we start, I wanted to revisit the goal of Build Hours for anyone new joining us. The goal of Build Hours is to empower you with the best practices, tools, and AI expertise to scale your company using OpenAI APIs and models. As you may be familiar, here is our homepage where you can register for all future Build Hours. We have some exciting topics coming up, especially in August, so stay tuned and be sure to check it out. Here's a snapshot of what you can expect today. First, we're going to explain what built-in tools are, go over some concepts, and cover all the tools available as of today. I'm really excited for the next part: we're going to show you the playground, where Katia will do a run-through of how to experiment and iterate quickly with the tools. Then we're going to bring built-in tools into your app; we'll be building a data exploration dashboard today. After that, we'll save some time for our customer spotlight: we'll be joined by the technical lead at Hebbia to go over how they use web search for finance and legal workflows. And we always have time for Q&A. Our team is in the room with us to answer questions live, and Katia will take over and answer a few of them toward the end. So with that, Katia, feel free to jump right in. — Great, thank you Christine. Hi everyone, again, I'm really excited to talk about built-in tools today. But first things first: let's discuss what they are.
There are actually two parts in that name: built-in, and tools. The tools part refers to the ability to add capabilities to the models, because LLMs are great, they can do a lot of things, but when you want to build apps that can interact with your own data, or build agents that can take actions for you, LLMs can't do that on their own. They need tools to be able to perform those actions, and that's what we're going to talk about today. And the built-in part, the out-of-the-box part, means that you don't have to code or do anything on your side; you can just use it. You can pick from a range of hosted tools, and we'll see which ones in a second. You can add them to your app or agent, give the models access to them, and very easily add these capabilities, for example to search data, either live data or data in your systems, or to perform complex tasks. And again, since it's out of the box, you don't have to code anything; you can just leverage these tools whenever you need them. Okay. So, what's the difference with function calling? Some of you might be familiar with the concept of function calling; we actually did a Build Hour on it a few months ago. If you don't know what it is, very quickly: it's the ability to tell the model, okay, you can use these functions that you define in your code. The way it works is that you tell the model how it should use those functions. It will give you parameters that you can then use to call the functions on your side, and then you tell the model what the result is. So if you want to do function calling, there are actually three steps that are important. We have a little diagram here, and we're going to focus on the three steps in the middle.
First, you need to tell the model: these are the functions that are available to you. For example, you can have a get_weather function, and this is how you should call it, these are the parameters it needs. Once the model actually decides to call the function, it will tell you: okay, I think you should call this function now, with these parameters. So that's the first step. The second step is that you take that and execute the function on your side. For example, you have some code defined, or you need to call an API, so you take those parameters, execute the function, and get a result. And the third step is to tell the model the result: you add it to the conversation history, to the conversation context, so that the model knows what the result is and it can then

Segment 2 (05:00 - 10:00)

finally generate a final response. So that's function calling. If we look at built-in tools now, they work differently, except for one tool, computer use, which we'll cover in a second. They work differently because you don't have to execute the code on your side: it's executed directly on our infrastructure. You can just tell the model, I want you to use these tools when you need them, and when the model thinks it's right, it will call them. But instead of telling you, okay, you need to call this tool now, it will actually execute the tool automatically, add the result to the conversation history, to the context, and automatically generate a response as well. So on your side, you don't need that step in the middle. You just let the model know it can use these tools, and once it decides to use one, you go directly to the final response. It's all transparent to you. That's the beauty of built-in tools: you don't have to handle them yourself, and in addition you benefit from our expertise. You don't have to think about how the tools are built or what's behind them, and you can use our infrastructure; you don't have to execute anything on your own. Okay, so which tools are available? We have six tools available as of today. To give you a little background, the first tools were introduced at the same time as the Responses API, which launched earlier this year, around March, and a couple of months later we added a few more. The one I think everyone's most excited about is web search, because as you know, LLMs have a knowledge cutoff. They're not aware of what's going on in the world right now; their training data stops at a certain point. For our current models, that cutoff date is May 2024.
So that means they're not aware of anything that happened after that. You can't ask them what happened last month, because they have no idea; it's not in their training data. So if you want the models to know what's happening right now, you need to give them access to the internet, the ability to search the web, or connect them to a news source. And that's pretty cumbersome to build. With web search, we're giving the models the capability to search the web without any effort on your part. We're using our own services, our own index, so that the models can know what's happening in real time, and again, you don't have to build anything. The second tool, which is very exciting as well, is file search. File search gives the models the ability to search your knowledge base, your own database of files. Some of you might be familiar with the concept of RAG, retrieval-augmented generation. That's the practice of giving the models, in their context, some internal knowledge they can use to generate the response. If you want the models to know about something that is not in their training data, for example your own internal data, your own sources, you don't want to train or fine-tune the model on it, because that won't do much. You want to find the right information, the right context, at the right time, and just use it in the system prompt or in the user prompt: give the model the context it needs to generate a relevant answer. That's called RAG, and it's also pretty cumbersome to build: you need to pre-process all of that data, then retrieve the right information at the right time. All of that can be automated with the file search tool, and we'll come back to it in a second so you can get a sense of how it works.
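To make the contrast with function calling concrete, here's a sketch of the two styles side by side: a function tool you have to execute yourself, versus hosted tool entries that run on OpenAI's side. The item shapes follow the Responses API format as publicly documented, but the get_weather function, the call ID, and the vector store ID below are made up for illustration:

```python
import json

# Function calling: you define the schema, and when the model emits a
# function_call item you execute it yourself and send the result back.
function_tool = {
    "type": "function",
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# A stubbed function_call item, shaped like what the model returns.
function_call = {
    "type": "function_call",
    "call_id": "call_123",
    "name": "get_weather",
    "arguments": json.dumps({"city": "Paris"}),
}

# Your side: execute the function, then append a function_call_output
# item to the conversation context before the model's final answer.
args = json.loads(function_call["arguments"])
tool_output = {
    "type": "function_call_output",
    "call_id": function_call["call_id"],
    "output": f"18°C and sunny in {args['city']}",
}

# Built-in tools: you just declare them; search, execution, and context
# updates all happen on OpenAI's infrastructure in a single turn.
hosted_tools = [
    {"type": "web_search"},
    {"type": "file_search", "vector_store_ids": ["vs_abc123"]},
]
```

The middle block is the step that disappears entirely when you use a hosted tool instead of your own function.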
The third tool I want to talk about is the MCP tool, and that one is actually extremely exciting because it's not just one tool: it gives the models access to hundreds and hundreds of tools, because with the MCP tool you can connect to any remote MCP server. For those who don't know, MCP means Model Context Protocol. It's a kind of convention that was introduced recently where you can tell LLMs how to use your own

Segment 3 (10:00 - 15:00)

functions. Let's say you have an API and you want LLMs to interact with it. You can build an MCP server to let the LLMs know: these are the functions you have access to, these are the parameters you can use, just give me those and I'll give you the answer. So it's similar to function calling, except that anyone who defines an MCP server can give any LLM access to their own functions. With the MCP tool, and we'll see a couple of examples of MCP servers, you can really give your models any tool you connect to it. So it's pretty powerful. The next one I want to talk about is pretty powerful as well; they're all great. It's the code interpreter tool. Code interpreter can come up with code to solve problems, which is really powerful for things like data analysis, and then it can execute that code. We run the code, again on our infrastructure, and give you an answer. That answer can be just text, or it can even include generated charts; we'll see an example of that as well. The last two tools, which we won't talk about today but which you can find resources on if you're interested, are computer use and image generation. Computer use is a little different because it lets the models use computer interfaces visually, as if they were human. They can tell you: I need to click on that button, or I need to scroll down the page, and then you still need to execute that in your own computer environment. So there's an extra step, you execute it in your own environment, but it's still pretty straightforward, and you can find resources on it; Christine will probably share them in the chat. And the last tool we have is image generation, but that's a whole other topic, so we won't cover it during this Build Hour. Okay. Now that we've introduced all of these tools, let's actually see them in action.
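Before jumping into the playground, here's roughly what one of these remote MCP tool entries looks like in a request. The field names follow the Responses API's MCP tool as documented; the label, server URL, and tool names below are placeholders, not a real endpoint:

```python
# Hypothetical MCP tool entry. "always" surfaces an approval request
# before each tool call; "never" lets the model call tools directly.
mcp_tool = {
    "type": "mcp",
    "server_label": "my_store",
    "server_url": "https://example.com/mcp",
    "require_approval": "always",
    # Optionally restrict which of the server's tools the model may call.
    "allowed_tools": ["search_shop_catalog", "get_cart"],
}
```

One entry like this is all it takes to expose every allowed tool on that server to the model.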
For that, I'm going to use the playground, which is our experimentation lab on the platform. We haven't really talked about it during Build Hours before, and that's why I wanted to cover it now, because it's a super helpful resource, very convenient when you just want to try things out. You don't want to build a whole app or write a script; you just want to try things visually. And for trying out tools, it's the perfect interface. So we'll first try out the tools in there, and then we'll move on to another demo. In the playground, I'm going to start with a very simple prompt. Let me switch to the playground here, and I'm going to add this very simple system prompt. This is the playground; if you're not familiar with it, you can find it in the platform under Dashboard. It just moved: we used to have a top-level Playground nav item, and now it's under Dashboard. You can use it for chat, audio, images, etc., but we'll use the chat one today. And here you can add any tools you want. For example, I can start with the file search tool; I'm going to add it here. The way file search works is that it connects to a vector store where you have previously uploaded your files. Here, I don't need to create the vector store myself: I can just upload the files, and the playground will create the vector store automatically for me. That's all I have to do to prepare for using the file search tool. Now that I've done this, my files were automatically pre-processed. What that means is that they were chunked, so split into multiple little pieces, and embedded, so converted into vectors that could be uploaded to the vector store.
That's something you would normally have to do yourself, and it can be pretty daunting if you don't know what you're doing, because there are ways to optimize the chunk size, and many other things to think about when you build your own RAG pipeline. Here, we take care of that for you, so you don't have to think about it; you just upload your files to the vector store. So here I'm just attaching them. To give you an idea, these are the two files I uploaded: blog posts about the tools we shipped, which I just exported as PDFs, and those are the two files I'm attaching here. So now I can ask any
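What the playground just automated, chunking and embedding, looks conceptually like this. This is a toy illustration, not OpenAI's actual pipeline: real chunkers split on token boundaries with tuned overlap, and the embeddings come from an embedding model rather than anything you write by hand:

```python
def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Toy fixed-size chunker with character overlap between pieces."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "New tools in the Responses API: web search, file search, and remote MCP servers."
pieces = chunk(doc)
# Each piece would then be embedded into a vector and uploaded to the
# vector store; at query time, the chunks whose vectors are closest to
# the query's vector are retrieved and placed into the model's context.
```

The file search tool does all of this (plus result ranking) for you; the sketch is only meant to show what "chunked and embedded" means.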

Segment 4 (15:00 - 20:00)

question about these files. For example, I can ask: what are the tools OpenAI shipped in the Responses API? Give me a concise answer with bullet points. You can see that it actually searched through my files, and it's giving me annotations, references to which files it found the information in, and generating this answer. From the two blog posts, it got the six tools we just discussed, because three of them were in the first blog post and the others were in the second. All of that is something the model doesn't know on its own, because it's not public information that was shared before May 2024; it really used our internal knowledge to answer. And again, the RAG part isn't just the pre-processing, it's also the retrieval, and we optimize that for you too. You don't have to think about finding the most relevant chunks for your user input, or about how to rerank those results, rearranging them to use what's most helpful; you can offload that to us. So that's the file search tool. Now I want to try web search, so let me remove that one and add the web search tool. You can configure it a little bit; I'm just going to select the country here. What that means is that the results will use those settings. If you set a user location, for example, and you ask for a great restaurant in the area, it will know where the user is. It's like when you search the web from your laptop and Google knows where you are; it's the same concept here, but it's optional. So now I can ask anything that would be public data on the web. For example: when is GPT-5 expected to launch? Again, this is not internal knowledge.
This is going to search the web for an answer. And apparently the web says that GPT-5 is set to launch in August 2025, so, I don't know, we'll see; I cannot confirm nor deny that information. Okay. Now that we've seen how the model can access either your own internal knowledge base or public information on the web, we're going to see how it can access functions from other applications, other providers, with the MCP tool. I'm going to start by adding an MCP server for Shopify. This is a list of example MCP servers: you have Zapier here, Shopify, Intercom, etc. You can also create your own MCP server: if you have an API, you can interface it with an MCP server and host that yourself, and then through the Responses API, through the MCP tool, you can connect to that custom MCP server. But these are just the examples we have in the playground. So I'm going to use the Shopify one, with this store, which is an ebike store, Cowboy ebikes. When I connect to it, it first fetches all of the tools available on that MCP server. If we look at them, these are all the tools Shopify exposes on all of their stores: search shop catalog, get cart, update cart. That means you can interact with any store on Shopify through LLMs via this Shopify MCP tool. So I'm going to add it here. As Christine mentioned, I'm in SF for the summer, and I was taking Waymo, because it's SF, but it's pretty expensive, so I want to find another option to commute. I'm going to ask the model here to help me find a suitable ebike: I just want something not too expensive, please. What it's doing here is it wants to call an MCP tool from the Shopify MCP server, and it's asking me to approve or decline.
That's an option you can use with the MCP tool: you can either ask the user for confirmation, or you can bypass

Segment 5 (20:00 - 25:00)

that, so you don't show the approval at all, you always approve. Here I'm going to approve, and let's see. So it's listing the bikes directly from the Cowboy Shopify store and showing me the results. Again, I didn't have to do anything; as you've seen, I just clicked a few buttons. We'll see what that means in code, but just by adding this MCP tool, I get access to all of that information on the Shopify site. Great. That looks great; I think I'm going to get one of these later on. But let's try something else now: the code interpreter tool. As I mentioned, code interpreter can execute code to solve problems, and it can generate files, like charts, to answer your questions. I'm going to combine it with the web search tool. What's great is that you can actually combine these tools so they work in tandem. First I'm going to ask what the weather is in SF again, to check that the web search tool works correctly. So here we're searching the web and we get some data. Oh, apparently we're experiencing one of the chilliest summers. Yeah, now that I'm here, I was pretty surprised by the weather in San Francisco; I thought it was going to be a little better. Okay, so we can ask the code interpreter tool to actually compare this data, because it says it's one of the chilliest summers in a decade. We can ask: can you find the weather data on the same day for the past 10 years and generate a comparison chart? So here we're again searching the web for the historical data, and, okay, one thing to note that happens from time to time, and it's good to be aware of it: the web search tool can actually spiral a little bit when it doesn't find what it wants.
So I would include something in your system prompt telling it to search just once and stop, because otherwise it will keep trying to search for more. When you combine the web search tool with other tools especially, that's something that can happen once in a while. So it's good that it happened here: if you get that issue on your side, you'll know why. I'm going to ask again: what's the weather in San Francisco? And normally, it should work; it doesn't happen all the time. I'm just going to add something like: search for it once and then stop. Okay, so it's searching the web; hopefully it doesn't spiral this time. And then the code interpreter tool should be able to analyze this data. So this is the code interpreter tool: it's running Python code, and then it generated a chart for us. Again, we didn't have to do anything here. I just added the code interpreter tool, and it was able to say: okay, to answer this question, this is the code I need to execute. It's Python code, so it's robust; it's not LLM-based, because LLMs are not deterministic, and when they look at data they can sometimes get things wrong, whereas here you actually execute code, so you know the answer you get is the right one. Code interpreter can generate files for you, like charts, but you can also give it files. For example, if I click the code interpreter tool to configure it, I can give it access to a file. We're going to use this file in the next demo; I'm just uploading it, and now code interpreter will be aware of it. The next thing we're going to show in the demo, so I'm going to add it here as well, is the Stripe MCP server. I'm going to add it from here, and all I need to do is add my Stripe API key.
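As an aside, the code interpreter configuration just shown, a sandboxed container plus an attached file, reduces to one entry in the tools array. A sketch following the documented shape; the file ID is a placeholder for a file previously uploaded via the Files API:

```python
# Hypothetical code interpreter tool entry. "auto" asks the API to
# provision a sandbox container; file_ids lets the sandbox read files
# you uploaded earlier (the ID below is a placeholder).
code_interpreter_tool = {
    "type": "code_interpreter",
    "container": {"type": "auto", "file_ids": ["file_abc123"]},
}
```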
This is a test key. And it's connecting to the tool. It's also offering a lot of

Segment 6 (25:00 - 30:00)

different tools I can have access to. I don't need all of them; all I need is the retrieve balance tool and list payment intents. That's it, so I'm adding this tool. If I try it here, I can say: what's my Stripe balance? And let's wait a second; it's connecting to the Stripe MCP server. Actually, I'm going to redo that and configure the tool to never require approval, so I can try again, and it's retrieving the data from my Stripe account. Okay, great. Now that we've tested these tools, we can actually use them in our own application. We're going to imagine that we own a store, an in-person and online store that sells wellness products: things like yoga classes and zen kits. As the store owner, the issue is that I have these different sources of data. I have data in my Stripe account, and data for my offline store in an Excel file. It's not easy to understand at a glance what my sales data looks like; I need to look at different tools. I also don't have an easy way to query my data; I don't have out-of-the-box analytics. I'd love to be able to just ask questions of my data and see the results. So we're going to build a data exploration dashboard using the tools we just discussed: the MCP tool for the Stripe data, the web search tool, and the code interpreter tool. With that, we'll be able to visualize and talk to our data in a much easier way. Just to give you an idea, this is what my sales data looks like: one export for January to July, containing things like the customer names and the items I sold offline. Everything I sold online is in my Stripe account. So let's start simple, just to get a sense of how you would actually implement the tools in your application.
Let's start with a simple Python script. I already have some boilerplate code for a command-line interface. What happens right now is that if I run it, it just provides an interface for me to input something and then says hello world. That's where we are right now, and now we want to integrate with the Responses API. What we're going to do is replace that code here and call the Responses API with the Python SDK. Cursor already knows what it wants to do, so that makes my life too easy; I'm going to actually type it, just for the sake of it. You can create a new response with client.responses.create: you specify the model you want, you provide an input, and that's all you need. Then we want to print the whole response, to see what's going on, and also print the output text. Okay, let's try that: I'm going to run the script again, and when I say hi, this time we actually hit the Responses API. We get this response: Hello, how can I assist you today? But that's all it can do right now, because we didn't add any tools. Now we can gradually start adding our tools. Again, Cursor is giving me a hint: we're going to try adding the web search tool. And when I told you it was as easy as clicking a button, I wasn't lying: this is literally all you need to add web search capabilities to your app. It's that simple. Of course, you can customize it. As I mentioned, you can give a user location, and you can also set how you want to search, whether you want a very thorough search or a lighter one. But for now we're just going to add it like that. Okay. So let's

Segment 7 (30:00 - 35:00)

run that again. And I can try: what's the weather in Paris this time? Here the model will call the web search tool and then give us the answer. And as I was telling you, I didn't have to handle the web search tool myself; it was executed automatically, and the result was automatically added to the conversation context. If we look at the response, we see that the model decided to do a web search call with this query, current weather in Paris, and this is what it actually got from the internet; it then automatically added the answer we see here. With the data it got, it was able to generate an answer, all of that in one turn. Okay, so now we can add all of the tools we tested in the playground. One note: when you use the Stripe MCP tool, you might want to mention in your system prompt that the values are in cents, because the model doesn't necessarily know that, and your figures might look a little overblown otherwise; we'll add that in our next app. So I want to integrate those same tools I tested here into my app. All I have to do is click "code" here, and I can see the code needed to make that API call. I could copy-paste all of it, but I already have most of it, so I'm just going to copy the tools array. Here we go. Then in my tools list here, I'm going to paste this, and I'm just going to replace this by the environment variable I have defined. And that's it. I'm going to import this tools list in the other file here, so I'm going to replace that with tools_list and import it. Perfect. So if I run the script again, we now have access to all of these tools.
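Putting the pieces of this demo together, here is a sketch of what the finished script might look like: a tools list assembled from the playground's "code" view, plus the command-line loop calling the Responses API. The Stripe MCP server URL, the allowed tool names, and the environment variable names are assumptions for this demo, and main() only works if the openai SDK is installed and OPENAI_API_KEY is set, so it is defined but not invoked here:

```python
import os

# Tools copied from the playground, with the hardcoded Stripe key
# swapped for an environment variable (names are placeholders).
tools_list = [
    {"type": "web_search"},
    {"type": "code_interpreter", "container": {"type": "auto"}},
    {
        "type": "mcp",
        "server_label": "stripe",
        "server_url": "https://mcp.stripe.com",
        "require_approval": "never",
        "allowed_tools": ["retrieve_balance", "list_payment_intents"],
        "headers": {"Authorization": f"Bearer {os.environ.get('STRIPE_API_KEY', '')}"},
    },
]

def build_request(user_input: str) -> dict:
    """Assemble the kwargs passed to client.responses.create."""
    return {"model": "gpt-4.1", "input": user_input, "tools": tools_list}

def main() -> None:
    from openai import OpenAI  # lazy import so the sketch loads without the SDK
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    while True:
        response = client.responses.create(**build_request(input("> ")))
        print(response.output_text)
```

Call main() to start the loop; every tool call, MCP round-trip, and context update then happens inside that single responses.create call.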
So if I say, for example, what are my last five payment intents... yeah, what did I do? Sorry, an error; copy-pasting, probably. Let's try that again. Okay, let's say: what's my Stripe balance? That's easier. So it's first listing the tools that are available, and now giving me back the answer I want. Again, it called the MCP tool. If we look at the response here, it called the retrieve balance function from that MCP server and then used it: this was the output of the retrieve balance function, and it used that to generate the final answer. Great. So now that we've seen how to implement that in Python in a very easy way, we're going to integrate it into a real app. For the sake of time, I already prepared this front end, because we don't have all day. I want to show you what it looks like, and then we'll explain the code a little bit and add the tools gradually. This is the demo app; to show you what it does right now, it just takes an input and calls the model, but it can't do anything else. If I say, what's my Stripe balance, it's going to tell me: what are you talking about? Okay, great. If I show you what the code looks like, the most important parts are in this route.ts file here. As for all of this boilerplate code, first, we're going to share the code in our build hours repo,

Segment 8 (35:00 - 40:00)

but you also have all of that boilerplate code in the Responses sample app, an open-source app we have that you can use to build with the Responses API. Here is what we're sending to the API: we're sending the input messages and the tools, which for now are empty (this is our tools array, so we don't have anything there yet), and we're streaming the result to the front end; we'll talk about what's happening in the front end in a second. We also have this system prompt here, where we say what I was telling you about earlier: when looking at Stripe values, keep in mind that you need to divide them, and so on, the currency is USD, and only use web search for external data. That's a good tip for when you're building with multiple tools: you should specify to the model when it should use each tool. For example: when we talk about online sales, look at the Stripe data; when we talk about in-person sales, look at the sales data export; use web search only when we ask about external data. That's helpful for the model to know, because otherwise it might try different things, and it might not pick the right tool. Okay, so now let's add the tools array we had just before in our Python script to that tools list. I'm just going to replace it, and I also need to update it so that it's compatible with JavaScript. Now I can refresh, and I can just ask: what's my Stripe balance, again, not very original. This time it should have access to this information, because we just added the tools. So it's calling retrieve balance, and we can see the right answer. Perfect. But now we'd like to see the result in the dashboard we have here; we'd like to visualize it. And for that, we're actually going to define our own custom function.
And that's one example of a time where you still might need function calling and can't rely on the hosted tools. But as we've seen, for most tasks you can actually now just use the hosted tools instead of having to define your own functions. For this one, since it's connecting to our own interface, we want to add a function that the model can use to populate the data in the dashboard. So we're going to create a function called generate_component. We don't have much time, so I'm just going to copy-paste it from here and tell you what it does — otherwise it might take five minutes for me to write everything. This generate_component function uses structured outputs — by the way, if you don't know about structured outputs, check out the Build Hour on that — to define the schema of the components that we might want to use. We're telling the model: you can use this function to generate a UI component, either a card, a chart, or a table. And depending on the type of component — that's the anyOf here — you can provide these parameters. For the card component, we want a title, a value, and a description; for the chart component, we want the values that we should show in the chart; and for the table, we want the row data. Now that we've added that, we're also going to update our prompt here to say — let me also copy-paste that — that as soon as you have something that can be displayed, you should use the generate_component tool. The app is already set up to handle any tools that we want, but we have this handle-tool function here where, if it's a function call to generate_component that the model wants to make, we just use the arguments from that call to set the items that should be in the card in our dashboard, or set the chart data, or set the table data.
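A sketch of what that function schema might look like. The field names (title, value, description, labelled chart values, row data) are reconstructed from the narration and are assumptions, not the repo's exact schema:

```python
# Hypothetical generate_component function tool; field names are assumptions.
generate_component_tool = {
    "type": "function",
    "name": "generate_component",
    "description": "Display data in the dashboard as a card, chart, or table.",
    "parameters": {
        "type": "object",
        "properties": {
            "component": {
                "anyOf": [
                    {  # card: one headline number
                        "type": "object",
                        "properties": {
                            "type": {"type": "string", "enum": ["card"]},
                            "title": {"type": "string"},
                            "value": {"type": "string"},
                            "description": {"type": "string"},
                        },
                        "required": ["type", "title", "value", "description"],
                        "additionalProperties": False,
                    },
                    {  # chart: labelled numeric series
                        "type": "object",
                        "properties": {
                            "type": {"type": "string", "enum": ["chart"]},
                            "title": {"type": "string"},
                            "values": {
                                "type": "array",
                                "items": {
                                    "type": "object",
                                    "properties": {
                                        "label": {"type": "string"},
                                        "value": {"type": "number"},
                                    },
                                    "required": ["label", "value"],
                                    "additionalProperties": False,
                                },
                            },
                        },
                        "required": ["type", "title", "values"],
                        "additionalProperties": False,
                    },
                    {  # table: row data (simplified here to strings)
                        "type": "object",
                        "properties": {
                            "type": {"type": "string", "enum": ["table"]},
                            "title": {"type": "string"},
                            "rows": {"type": "array", "items": {"type": "string"}},
                        },
                        "required": ["type", "title", "rows"],
                        "additionalProperties": False,
                    },
                ]
            }
        },
        "required": ["component"],
        "additionalProperties": False,
    },
    "strict": True,  # structured outputs: arguments must match this schema
}
```

With `strict` set, the model's arguments are guaranteed to match one of the three component shapes, so the front end can dispatch on the `type` field without defensive parsing.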
Again, this code is provided, so if you want to see how it works under the hood, you can check it out. But the main idea is that the data we show in our dashboard here is just hooked up to a store, and when the model uses the generate_component tool, it gives us the parameters — the data we need to update the store. Okay. So now that we've added that, let's try it again. So let's say, what's my

Segment 9 (40:00 - 45:00)

Stripe balance again? And this time, not only will the model call the retrieve balance tool on the MCP server, it will also call the generate_component function so that we can update our card here. And now we can try other questions. So for example, I'm going to ask: can you search for my competitors doing yoga classes in SF and show them to me? This time it's using the web search tool that we gave the model — we just had to add that to the tools array — and it can search for the yoga studios, again calling generate_component, this time with the table, to show us the results here. Okay. So now let's actually use code interpreter. As you may remember, we added a file in the code interpreter environment when we configured it in the playground, so now the model knows: okay, I have access to that file, I can run code on it to actually get the data that I need. And here it's again generate_component, and this time it's a chart component — chart data that was updated with the data computed by the code interpreter tool. But the best part about using this in the Responses API is that it's multi-turn, and the model knows how to use multiple tools in a row and combine them. So for example, if I ask: can you list my Stripe payments from the last week, and show me cards with the number of customers, number of failed payments, and amount of failed payments — it's actually listing the data. And here it can figure it out on its own, but it might also have used the code interpreter tool to compute data if needed. It's generating all the information that we want, and now we can visualize all of this neatly in a nice dashboard.
And all of that — as you've seen, I only had the dashboard, the front end, but all of the capabilities that allow the model to fetch the data, whether it's on the internet, in our Stripe account, or in our own CSV file — I just had to add with a few lines of code, by copy-pasting the tools from the playground, and that's the result I can get in just a few minutes. Okay. So, if I come back to the slides for a little bit to wrap up: why would you want to use built-in tools? First of all, as you've seen, you can add them with minimal code. It's really simple to just turn this on — it's as easy as pasting in some code that you can find in the playground or in our documentation, and that's all you need to do. You don't have any complexity overhead, and you don't have to worry about what's happening under the hood; we take care of that for you. So the complexity of finding information on the web, optimizing a RAG pipeline, or finding the right code to execute to solve a problem — you don't have to worry about it. And as I mentioned, the best thing about the Responses API, used over multiple turns, is that you can let us manage the conversation state — the history — and the model can decide to combine tools together and solve complex problems by using all of the tools at its disposal, multiple times in a row if needed. So if you want to build powerful agentic experiences — an agent that can do a lot of things, especially with MCP servers that can give you access to anything on the internet that has an MCP server exposed — all with minimal effort, then you can just use built-in tools and build something really powerful in a few minutes or hours, depending on what you build. It's a way to go much faster when you're building. Okay.
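The "let us manage the conversation state" point can be sketched like this. `build_request` is a made-up helper that only assembles the payload; `previous_response_id` is the Responses API parameter that links a turn to the prior response so the API, not your app, carries the history:

```python
# Sketch of chaining turns with previous_response_id. build_request is a
# hypothetical helper that only builds the payload; the real call is
# commented out.
def build_request(user_text, previous_response_id=None, tools=None,
                  model="gpt-4.1"):
    payload = {
        "model": model,
        "input": [{"role": "user", "content": user_text}],
        "tools": tools or [],
    }
    if previous_response_id is not None:
        # Link this turn to the last response; the API replays the history,
        # including earlier tool calls and their outputs.
        payload["previous_response_id"] = previous_response_id
    return payload

first = build_request("What's my Stripe balance?")
# response = client.responses.create(**first)
follow_up = build_request(
    "Now chart failed payments from the last week",
    previous_response_id="resp_123",  # placeholder; use response.id in practice
)
```

Because each follow-up turn sees the earlier tool outputs, the model can chain MCP, web search, and code interpreter results across turns without the app re-sending them.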
So I think now we are done with the demos and — yeah, we'll welcome Hebbia's technical lead on stage. So Will, we'd love for you to join us now.

Segment 10 (45:00 - 50:00)

— Awesome. Hey Will, how's it going? — Good. Thank you for having me. — Of course. So for everyone tuning in, we'd love it if you could give a quick intro: what you do, and also what Hebbia does. — Sure. Yeah, so my name is Will. I've been at Hebbia for about a year now, and I'm a software engineer, tech lead on our agents team. So I'm working on all things agents — from chat and deep research to our Matrix co-pilots. For those who don't know, Hebbia is a Series B startup based in New York, selling to financial and legal services. Really, at its core, Hebbia is a search company, and we like to say we're solving information retrieval. We have a few different products, a couple of which I'll demo for you all today, but every product does the same thing, sort of with a different flavor: finding the right information and presenting that information to the user with the appropriate context. And the video playing on the slide right here is a little sizzle video of our Matrix product, which I'll also demo for you live. Notably, that product is meant to do large-scale data analysis over unstructured data, so users can see insights very quickly. Before Hebbia, I was working on a startup — went through Y Combinator, did the whole founder journey for a few years, which was a fun and wild ride — and I'm happy to be building AI agents at this point. — Awesome. — So, yeah. At this point, I guess I can talk a little bit about how we use some of these built-in tools. Does that sound good? — Yeah. Let's just go on to this next slide, and then let us know when you want to take over and start your demo. — Sure. So, I guess, thank you all for giving this presentation on built-in tools — a lot of what we do wouldn't be possible without them. I think that demo was great.
I think it shows a lot of the really powerful things you can do with them, and with the right primitive tooling, as you add more complexity on top — especially with the rise of AI agents and multi-agent systems — you can build really powerful workflows. We do that for financial and legal services, and it's providing a lot of real value and saving people a lot of time. Once again, that wouldn't be possible without these built-in tools, so kudos to the whole OpenAI team for all the hard work here. The one I'll talk most about is web search. We use it in many different parts of our product, and two of those uses are particularly important. One is just finding information. Two: one of the limitations of LLMs is that they don't know everything, right? Context matters — that's why people often have RAG pipelines or provide context to LLMs in general, because the models don't know everything. And one of the other limitations that people don't often talk about is the knowledge cutoff. There are often times in our product where users ask about things that the models haven't actually been trained on at all. So we use web search as a way to get up-to-date information before going forth and doing long-running searches — and I'll give a demo of that as well. But to quickly explain what you're looking at here: our chat and deep research products are often thought about through the lens of the explore-exploit trade-off, where we want to explore the space fairly broadly first to get up to date on all the information, and then we exploit the different areas we think are useful and relevant to the user's question. To illustrate why web search is important here, one of the questions that was a gotcha for us was: explain how Trump's tariffs are affecting my portfolio.
The LLM only knew about Trump's first presidency and the tariffs that came with that, and wasn't aware of the new developments you needed to glean. But if you can do a web search and get up-to-date information first, you can then make more informed searches — on the web or in other private knowledge bases — to answer that question appropriately. That's the explore-exploit trade-off: get up-to-date information, understand the space, and then go deep. So that's how we use web search. MCPs as well. We have our own multi-agent framework that we've built in-house, and we try to write blog posts for everything exciting we do — if you want, you can read about it and how we use MCP and web search; all of that's public. I think MCPs are great — a real step in the right direction of bringing offline data online, especially to LLMs. Hebbia tends to be pretty opinionated in the way we like to index data, so we actually don't use a ton

Segment 11 (50:00 - 55:00)

of MCP servers at this point. But we're always looking for ways to use them, because when you have the right MCP servers, it can save you a lot of time. So, they're great. Okay. At this point, I can turn it over to a demo, if that works. — Yeah, we'd love to see this live. So feel free to share your screen and take over. — Perfect. So I'll walk you all through two of our products and explain how we use web search in both of them. This first product is what you all saw in that animated video on the slide. It's called Matrix, and visually it's a mix of ChatGPT meets Excel. On the left side we have what we call our Matrix agent, which is almost a co-pilot that can drive the matrix and get insights across all of it. And then we have the actual Matrix product, which is a grid-like product that fundamentally structures information from unstructured sources. Here we have a company-first matrix, where every row is all the information about a company. If you want to add a column, you can — I'll just show you roughly what that looks like. Every column is sort of a request for information about a specific company. Here there's a context button — we also have document matrices, where you can contextualize with a document or with the web, but here you contextualize with the web. So you have your prompt that you send off to the web, contextualized with other columns within the matrix — this would be the company name. We go off, we search the web for the appropriate information, then we translate it into the format that's necessary for the matrix. There's actually an example from the demo where it didn't know — whoops — the right format for a currency, right? One of the things our Matrix does very well is data formatting; we call these typed columns. So you can specify the exact currency and the formatting you'd like for any single column.
There's also a translation layer, therefore, between the raw web output and what we actually show to our user. But ultimately, the web here is used to get up-to-date factual information about specific entities. And you can do this at scale as well, which I think is the most interesting part, especially for our customers. Imagine you had a matrix with 10,000 rows — 10,000 companies you wanted information on. All you have to do is add a column, and we will do 10,000 web searches in parallel across all of these different companies, getting those insights for you and allowing you to make informed decisions downstream of them. So once again, thank you for building web search — it allows our product to be even more powerful. I'm going to switch over to a different tab at this point — I assume you all can see this now. This is our Chat and Deep Research product, which I'm sure looks like products you've all seen before; it looks and feels like chat and deep research. The biggest nuances for us are the data sources we have available. We have the web, which you'll see here and which I'll demo; SEC filings; private company data like PitchBook; structured financial data, in the form of Snowflake tables as well as other structured information; and then private sources as well. What I want to highlight here with deep research, once again, is that explore-exploit trade-off. I'm just going to quickly ask: can you tell me how the latest US tariffs are affecting EV companies? The sources I've enabled here are web and SEC filings. So if I kick this off, we'll see what it does. The first thing it does is assess the sources, and then it's actually going to ask me some questions, which I'll answer — I'm just going to say yes to all, to speed things up.
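The column fan-out described above — one web lookup per row, run concurrently — might be sketched like this. `search_web` here is a stand-in for a real web-search call (for example a Responses API request with the web search tool), not Hebbia's actual implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def search_web(company: str, column_prompt: str) -> str:
    # Stand-in for a real call (e.g. a Responses API request with the web
    # search tool, contextualized with the company-name column).
    return f"{company}: result for '{column_prompt}'"

def fill_column(companies, column_prompt, max_workers=32):
    # Fan out one search per row; pool.map preserves row order, so the
    # results line up with the matrix rows.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda c: search_web(c, column_prompt), companies))

rows = fill_column(["Acme Corp", "Globex"], "latest funding round")
```

At 10,000 rows, the interesting engineering lives in rate limiting and retries around `search_web`, but the fan-out shape stays the same.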
Once again, it'll assess some sources, and hopefully my answer was enough. Okay. So what the user is seeing, underneath the hood of this multi-agent system, are the actual executions that every multi-agent subsystem is doing. Our web search agents are in this explore phase, looking at very broad concepts — once again, just to beat that LLM training cutoff date — so we can then make more informed decisions. And here, now, we're generating a research plan based off of this broad search we've done. The research plan is informed by what we found on the web. It will include things like what to look for in future web searches as well as what to look for in SEC filings, including both

Segment 12 (55:00 - 60:00)

companies and information. And then, finally, that plan is executed over pretty long time horizons — upwards of an hour. So I'll spare you the details here, but for every search we do, you can also see the status: you see web searches, you see our public filings index being searched for the exact documents — company, filing type, etc. And that's all on web search. Maybe I'll quickly take a minute to talk about MCP as well, and why we don't leverage it as much as we could — though given the opportunity, we'll continue to leverage it in the future. We're pretty opinionated in how we search for information, and what we found is that some common off-the-shelf knowledge-base MCPs don't quite meet the bar we need to serve our enterprise customers. So we built our own data indexing, and then we effectively wrap an MCP server around our own data index and use that. So we don't use off-the-shelf MCPs very often these days; we use our own flavor of it internally, with our multi-agent framework. Great. I think that's everything I have to share, so I can turn it back over to you all. — Yeah, thanks Will. That was awesome. We love seeing web search in the wild, being used for real-life applications. Thank you so much for joining us. — Yeah, of course. — Awesome. We'll see you next time. Okay, cool. So we really only have three minutes for Q&A, so we're going to do a super fast speed round. Any questions we don't get to, we'll send via email as a follow-up. So let's get started on the first one. — Okay. And I know we've already answered some questions in chat — perfect. But again, if you have follow-ups, we'll be happy to help. Okay. So: I ran into a situation where I have too many tools bloating the context window. What are the best practices for managing that type of situation? That's actually a great question.
Indeed, when the model has access to many tools — whether custom tools or built-in tools — sometimes it can get confused about which tool it should use. I would always start with prompting the model very thoroughly, telling it: you should use this tool when this situation presents itself, and that tool in this other situation. Sometimes that might not be enough, in which case you might want several agents that work together, which you can orchestrate so that each one is focused on a specific subset of tasks. Each one can have its own smaller set of tools, and then you can have a sort of triage agent that hands over to the right agent for the task. You can do that easily with the Agents SDK, for example, which is an easy way to build these networks of agents and handle that orchestration very simply. I think we did have a Build Hour on the Agents SDK, but if you're not familiar with it, we have some content around it — it's in our docs, and we have sample apps for it. So I encourage you to use it if you're in a situation where you have too many tools and the model confuses some of them. Okay, next one. I've heard that the file search tool can be useful to embed a description and URL of the actual document rather than chunking the document — do we have any control over this chunking strategy? Not as of now, but you do have some control with file search. For example, you can do metadata filtering. You can also, if you want, just embed the description and let the model find the most relevant documents, and then complete that with your own retrieval. You can also use retrieval as a standalone API, so you can combine that however you want.
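The triage pattern from the first answer can be sketched in plain Python. The Agents SDK provides this as first-class handoffs (with an LLM deciding the route), so the keyword router and agent names below are purely illustrative:

```python
# Illustrative triage: each specialist agent gets a small tool subset, and a
# simple keyword router (an Agents SDK handoff would use the model instead)
# picks which agent handles the request.
AGENTS = {
    "payments": {"tools": ["stripe_mcp", "code_interpreter"],
                 "keywords": ["balance", "payment", "refund"]},
    "research": {"tools": ["web_search"],
                 "keywords": ["competitor", "market", "news"]},
}

def triage(user_text: str):
    """Return the agent name and its small tool set for this request."""
    text = user_text.lower()
    for name, agent in AGENTS.items():
        if any(k in text for k in agent["keywords"]):
            return name, agent["tools"]
    return "general", []  # fallback agent with no special tools
```

The point of the pattern is that no single model call ever sees the full tool catalog — each specialist's context stays small, which is exactly what the question about bloated context windows is after.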
I would say, if you have a very complex RAG pipeline and you want fine-grained control, maybe you want to build your own RAG pipeline at that point. But if you don't have very specific needs, or if you want to rely on our expertise — because we're already doing that optimization process and optimizing the pre-processing of the files — you can use the file search tool and offload that to us. So it's really up to you whether you want to focus on that or not. I think we have — yeah, just —

Segment 13 (60:00 - 61:00)

yeah, we're kind of over time. — We can take the rest of the questions offline. Will also got some questions in the chat, so we'll compile some answers and send them over in a follow-up email. Okay. So let's just move on to — — Just to answer that one, since it's a quick one: the answer is just Responses. — And we also got a few questions on the shell setup you're running for the animation. — Oh, yeah. — So look out for that — we'll give you the full tutorial on that. But here are some resources. Feel free to look at the follow-up email for these links, or take a screenshot. And I did want to plug that we have more Build Hours coming up on August 20th and August 26th. Visit the homepage, where you'll find all the details on the next one. Thanks again everyone for joining us, and we'll see you August 20th. — Thank you.
