AI Show LIVE | Voice Live API & GPT-5 vs GPT-4.1 vs o3
53:43

Seth Juarez · 14.10.2025

Video description
Join us from 8:30 to 9:30 am PT for two exciting sessions: Upgrade your voice agent with Azure AI Voice Live API with Deb Adeogba and AI Evolution Explained: GPT-5 vs GPT-4.1 vs o3 – Smarter Choices for Smarter Business with Alexander Hughes. We look forward to seeing you there!

Table of contents (11 segments)

Segment 1 (00:00 - 05:00)

Hello, my friends, and welcome to this episode of the AI show. This week I am in Lisbon, Portugal, so reach out if you want to say hello. I am at the Azure Dev Summit in Lisbon, Portugal. I'm pretty excited about that. I'm going to be talking all about agentic AI. So how about them apples? I'm excited to be here. I'm so glad that you're spending some time with us. So I thought we'd jump right in. All right, let me tell you what we're doing today. So here we go. We have this handy dandy writing thing. So, for the first half of the hour, we're going to be talking about the voice... wow, that was... oh, that was kind of fun... the Voice Live API with Deb. Now, this thing is really cool. Over the last year or so, since the Ignite of the year before, I've been actually using the GPT-4o Realtime API to do really, really cool demos, and I love them. And the cool thing about this is that Deb's going to come on and show us the expanded version of all of this stuff, so that I can actually do more. So you're going to love what she has to say, and hopefully I can start to incorporate that and you'll see it in some later shows. And then, part two, oh, oops, wrong slide. Part two, we're going to look at the new GPT-5 model, but then do a little contrast with 4.1 and o3 so we can get a

Segment 2 (05:00 - 10:00)

sense for why these things are different and how we can use them. And this is going to be with Alex. So those are the two things that we're doing today. Now here's the thing. I'm pretty excited because we have coming up, in about, I think, six-ish weeks, it's like the week of the 16th of November, we're going to be going to San Francisco for Microsoft Ignite. So let's actually take a look at some of this stuff. I have a browser here. Let's see if I can pull up Microsoft Ignite 2025 and see if there's some stuff out there. You're probably thinking, "What the heck is Seth looking at?" There you go. It's right here. Okay. Right here. Here's all the... oh, look at that. Ignite the possibilities. I mean, that's here. Let me turn my sound on. Ignite the possibilities. I need to turn that up. Ignite the possibilities. Yeah, I mean, that's what we're doing. Igniting the possibilities. November 18th. I said 16th; it's the 18th to the 21st. My wife's birthday is on the 21st, so I'm going to leave the day before. I'm going to be honest with y'all. Let's see if they have any sessions in there. Session catalog. This is like the first time I'm checking this. So let's see here. Let's see if I'm in here. I probably am. Here we are. Agentic identity and other enterprise superpowers. Oh, that's not me. AI defense. So there are some other Seth people. How cool is this? Cool. So this is the only session that I think I'm in. I might be in more. I don't know. I'm not going to lie; sometimes I don't know what's going to happen. This is cool. So, agentic identity and other enterprise superpowers: that is what I'm doing at Ignite. There's a bunch of other stuff. I'm working on some new demos; hopefully they'll show up in some sessions. And I want to do voice again, so that's why I'm excited for Deb's Voice Live thing. So that's that. So this is the session catalog. Make sure you tune in. I think I'm also going to be helping with hosting the live show at Ignite. That's in six weeks. So, let's look at the Azure Dev... there we go. Azure Dev Summit in Lisbon. This is what's happening this week. So we have some really cool people in there, and then I'm there, too. So let me find myself here. Let me find myself. Am I the only Seth? Oh, no. Have I been cancelled? Let's scroll down here. Maybe they're just pictures. John Neil. Oh, I'm not on here. Maybe. Oh, there I am. They did not cancel me. So I have a talk. Here's the talk: Practical Guide to Agentic AI, AI-powered agents, blah blah, learn why. And that's the first thing. And then the second thing is this one. Oh man. See, here's the thing. What happened was the world changed from when I wrote this to when this is happening this week. And so I'm actually working on looking at the Microsoft Agent Framework, which is brand new, and I'm learning that one real fast. And then we're going to have a full day of learning how to do that. So, Agentic Ops: insights on creating a robust agent ops plan, real-world examples, and hands-on practice. Yes. Here's a secret on this one: by the way, anytime you see "star-ops" in front of anything, it's literally just DevOps but with other stuff. So yeah, that's what we're doing. That's what we're doing with that.
So this is this week, and I'm still working on this one because the stuff is just literally brand spanking new. So that's the Azure Dev Summit in Lisbon, Portugal. All right, so without further ado, let's go to the Voice Live API with Deb. Take it away. You're not going to want to miss this episode of the AI show. We talk all about the Azure AI Voice Live API with my friend Deb. Make sure you tune in.

Segment 3 (10:00 - 15:00)

Hello, welcome to this episode of the AI show. We're talking all about the Azure AI Voice Live API with my friend Deb. How you doing, my friend? — I'm good. How are you? — Fantastic. So, tell us who you are and what you do. — Yeah. So, my name is Deb Adeogba, and I'm the lead for speech services for applied engineering under the AI platform. — Well, looks like your dog is not interested at all in anything we have to say. — You know what? Their loss. — I'm just saying, because there's a lot of cool stuff when it comes to building voice agents. Primarily, it's hard, right? — Totally. Yeah, I mean, there are a lot of challenges when it comes to building voice agents. There are things that you have to really think about. Accuracy is probably the top one: you want high accuracy on your input and your transcripts, especially if you have domain-specific vocabulary. Naturalness: you want to make sure it's a natural conversation between your application and the user. And then, as a company, you want customized audio outputs, because you want to maintain that brand image and have personality consistency. The other thing, now with the introduction of large language models, is that you want a choice of your generative AI model, and that includes both large language models and small language models. And then you want robustness, in terms of interruptions and end-of-turn detection in different scenarios, especially in noisy environments. You may also want to embody your agent, have a visual identity, and that can really help improve engagement. But really, it comes down to three things that you want to keep on the lower end: engineering cost, time to market, and latency. — All right, so this is awesome. So tell us about how the Voice Live API helps us with all of these things. — Yeah, so we took all of that stuff into account, and what we did is we created a unified, real-time speech-to-speech interface. What it does is it takes speech to text, an LLM, and text to speech, along with some conversational enhancements and also an avatar if you'd like, and it puts it into a single, low-latency pipeline that you can use out of the box, where you get your own choice of GenAI model, you can use our unparalleled Azure speech recognition and synthesis, and you can use customizable voices and customizable avatars as well. — This is cool. And I'll put our faces up for a second: we did a demo at Build last time that used the GPT-4o Realtime API, and it was a hoot to do, but I could sense that there was more stuff that needed to be added to make it more robust. Tell us about the more stuff. — Right. Yeah. And actually, GPT Realtime is available in this service, so you can use GPT Realtime with the Voice Live API as well. But yeah, it's full of features. The first thing that's really important is that you can use the voices that are included in, for example, some of the real-time models, but you can also use our Azure speech.
So we have over 140 locales for speech to text, 600-plus pre-built voices across more than 150 locales, and we actually just released neural HD v2 voices. Those voices are contextually aware. So for example, if I write a sentence that says "I'm very sad today," a lot of text-to-speech services will read it out flatly. But what our service does is read it out with emotion, so it'll say "I'm very sad today" the way a sad person would. So that's a big game changer. And then you have a choice of generative AI models, like I mentioned: GPT Realtime, and we also have GPT-5, 4.1, a bunch. We have our own Phi series as well that you can use. And we also have bring-your-own Foundry model, which is now in preview. Customization is a big thing that you can utilize with the Voice Live API: you can do custom speech, custom voice, custom avatar, and we also just introduced 4K support for custom avatar. Another big thing is that it integrates seamlessly with our Foundry agents, so if you're already building a Foundry agent, you can add voice to it very easily. In addition, we have conversational enhancements. When you're piecing this type of voice agent together, a lot of times you have to add a third party on to handle stuff like echo cancellation or noise suppression or interruption detection, but we've actually included all of that in here for you. We've already architected it for you. And then I mentioned the avatars as well. We have a whole library of avatars, but you can also do a custom avatar.
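The "I'm very sad today" example above is about the plain Azure text-to-speech layer that sits underneath Voice Live. If you want to try that synthesis step on its own, a minimal sketch with the Azure Speech SDK for Python might look like the following; the key, region, and voice name are placeholders chosen for illustration, not values from the episode.

```python
# Minimal sketch: standalone Azure text-to-speech with the Speech SDK.
# The key, region, and voice name below are placeholders; swap in your own
# resource values and a voice from the Azure voice gallery.
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription="YOUR_SPEECH_KEY",   # placeholder
    region="eastus",                  # placeholder
)
# Voice name is illustrative only.
speech_config.speech_synthesis_voice_name = "en-US-AvaNeural"

synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

# A contextually aware (HD) voice should render this with matching emotion
# rather than a flat read-out, which is the point Deb makes above.
result = synthesizer.speak_text_async("I'm very sad today.").get()
if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Synthesized", len(result.audio_data), "bytes of audio")
else:
    print("Synthesis did not complete:", result.reason)
```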

Segment 4 (15:00 - 20:00)

Because nowadays a lot of people want to use a visual embodiment, right, to have a more personalized experience with their users. And then, if you're doing call center work, we made it easy to build call center voice agents with telephony, and you can do that through Azure Communication Services, and there's more to come on that as well. And then, for developers, we have rich development resources. So we support C#, Python, and that's on top of the WebSocket protocol, and we have additional languages coming soon as well. — And this is near to my heart, because the last year's worth of demos I've been doing has all been with voice, starting at Ignite of last year where we did a telephony version of this, and I'm excited to get my hands on it to see all of the new stuff, because I've been using sort of older APIs and older models, but now I have a whole plethora of choices. And I understand you brought something to show us what that looks like. Can we go to that now? — Yeah, let's do it. Today we're going to build a shopping assistant agent that recommends popular Microsoft products based on customer needs. But first we'll need to create an agent in the Azure AI Foundry agents playground, and then we'll enable audio input and output with the Voice Live API. So first, let's navigate to the agents playground, and then we'll click on the Microsoft device shopping agent. You can see that we chose GPT-4o for this, but you can choose any model you'd like, even Phi. Another thing to point out is that for the knowledge section, we've prepared a catalog of popular Microsoft products. We've uploaded the file as knowledge for RAG purposes. So at this stage we have a fully functional shopping assistant agent, and it's well-versed in Microsoft products. However, it's text only. So now let's voice-enable the agent, to give this agent a voice. Let's go to the Voice Live playground. In the Voice Live playground there are also some configurations that you can change to meet your needs. You can see, over on the right here, that you can choose your generative AI model; we chose "bring my Foundry agent" as the model. We also have several different ways to change or utilize some of the features with speech inputs: we have auto-detect language, VAD, end of utterance; we also have audio enhancements such as noise suppression and echo cancellation; and we have a phrase list as well, so you can put in domain-specific words or industry jargon that you want the speech to text to recognize. For speech output, we have a ton of library voices, neural HD voices that you can use, but you can also use custom neural voice. And if you want to add an embodiment to this agent, you can also use an avatar. We have a bunch of avatars in our library, but you can also do a custom avatar. For this demonstration, we're going to use an avatar. So, we'll use this avatar, and we'll hit apply. Okay. So, let's test this out. — Hello. How may I help you today? — Hello. I'd like to buy a laptop for my brother who's in high school. Can you recommend a Microsoft device that's super high performance? I don't have any budget constraints. — For your brother who's in high school and needs a high-performance laptop, I recommend the Surface Laptop Studio 2. Here's why it might be a great fit. Surface Laptop Studio 2: one, high performance. It features the 13th Gen Intel Core i7 processor.
— Sorry to interrupt you, but I want to ask another question: I want to buy a gaming device for my nephew, but I have a very small budget. Can you recommend a device for me? — No problem at all. For a budget-friendly gaming device, I recommend the Xbox Series S. Here's why it would be perfect for your nephew. — That's what I need. Thanks so much for the recommendation. — So, as you can see from the demo, it just takes a few minutes to create a shopping assistant agent, enable RAG for it, and add voice input and output capabilities through the Voice Live API. So, just a quick summary of the Voice Live API features we've walked through. For large language models, it allows full flexibility on model selection, including any of the large language models that we support. On the speech-to-text side, the Voice Live API has automatic language detection, noise suppression, and echo cancellation capabilities, and it also supports phrase lists to customize speech-to-text behavior for domain-specific vocabulary. On the TTS and avatar side, the Voice Live API

Segment 5 (20:00 - 25:00)

supports a wide variety of pre-built voices and avatars, and we also support custom voices and custom avatars. So conversations powered by the Voice Live API feel natural and authentic, with elegant handling of interruptions. And all of these make the Azure Voice Live API the go-to solution to add voice input and output to any agent. — This is cool. I was looking at that. It's kind of cool that that's all in there, because when I did this stuff, I had an API back end, and I needed to write sockets, and it was a lot of stuff. So, what does this actually look like in the code? I know you brought a slide for that. Let's take a look at that. — Yeah. So, you saw in the demo that adding voice to agents is pretty simple, right? If you've already built the agent, or you're trying to build a Foundry agent, it's really simple: you just add the agent connection string, the agent ID, and the authentication token. You can see that on the left. So it's literally just three lines of code. — I see. And so, here, let me put my face up here: when you actually build an agent and you just want that agent to have voice, you literally just add those three things, and now you can interact with it over the WebSocket, and it uses whatever intelligence that agent has. Correct? — Exactly. — But you could still use the previous approach if you're building a more custom, multi-turn thing as well, right? Is that right? — Yes, you can still use that. — Okay. Cool. So, what are we looking at on this? What is this telling us? — Yeah. So, one of the things that's really a cool feature is that you have so many choices for voices, right? You can use the built-in voices from the GenAI model that you've chosen, but you can also use Azure's standard voices, and that's where we have the 600 standard voices across 150 locales. But you can also use custom voice. So if I wanted to create the voice of Seth, I could actually add that in as well. And it's very simple to add. — And how does one go about doing that? Is it just like you give some utterances and some text with it, and that's what it does? — Yeah, for custom neural voice, basically we require 300 to 2,000 sentences. So you would record 300 to 2,000 sentences, we have a self-serve portal, you would upload your voice data to the self-serve portal, and it takes about 20 to 30 hours to get a voice back. — I'll be honest with you, I actually have done this before. I actually went through and recorded with this microphone, and it's a very good one, and the thing was speaking like me. This was years ago, so I know this technology is fairly mature. It's very good. — You should do another one, because now we have HD voices. The HD custom neural voices are contextually aware, so it would actually match your tone and your emotion as well. — Oh, maybe. And you're saying, what, 3,000 sentences? So it's like an afternoon of reading stuff? — 300 to 2,000 sentences, or about 30 minutes, basically. — About 30 minutes. Yeah. So it's not bad. — Now, here's a question. I know we're kind of going off the rails here, but is it something in the portal that says "read this thing," and then you read it and hit next? How does it work? — Yeah. So, for custom neural voice, which we also call professional voice,
basically, in GitHub we actually have sentences that you can read. So we offer sentences for you to read. But let's say you own, like, Seth's Bagels or something like that, so maybe you need domain-specific words for your bagel shop. What we usually recommend is that if you have domain-specific words, you should write some of your own scripts as well and put those words into sentences the way you would say them. — That's amazing. And then it's just, I'm guessing, a bunch of wave files in some huge data set, and you upload it and it's like, go do your work. — Put it in the oven and then wait for it to cook. — I love it. All right. So the other thing, and the last thing, that I wanted to ask you about was this notion of the avatar. Tell us about that. — Yeah. So avatars, or visual embodiment, are becoming more and more popular nowadays. A lot of people are really opting in to add this visual embodiment to their agents or to their voice agents. So we actually offer standard avatars that are in our library, and we have quite a few of those you can choose from, and those will use our standard voices. But you can also create a custom avatar, and it's kind of along the same vein as custom neural voice: you would record video of yourself and upload it to a self-serve portal, and you get an avatar back that looks like you. And then you can also create a custom neural voice with that, so you can have not only your custom avatar but also a voice that matches it.
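Deb's slide earlier in this segment boils the Foundry-agent hookup down to three values: the agent connection string (project), the agent ID, and an authentication token. The episode doesn't show the wire format, so the sketch below is only illustrative: the WebSocket endpoint shape and the session field names are assumptions, and only those three values, plus the general "configure a session over a WebSocket" idea, come from the discussion.

```python
# Illustrative sketch only: wiring an existing Foundry agent into a Voice Live
# session over a WebSocket. The endpoint path and the session field names are
# assumptions for illustration -- check the Voice Live docs linked in the next
# segment for the real schema. Only the three values Deb calls out
# (project/connection string, agent id, access token) come from the episode.
import json
import websocket  # pip install websocket-client

ENDPOINT = "wss://<your-ai-foundry-resource>.cognitiveservices.azure.com/voice-live/realtime"  # assumed shape
ACCESS_TOKEN = "<entra-id-bearer-token>"          # 1) authentication token
AGENT_PROJECT = "<foundry-project-connection>"    # 2) agent connection string / project
AGENT_ID = "<your-agent-id>"                      # 3) agent id

ws = websocket.create_connection(
    ENDPOINT,
    header=[f"Authorization: Bearer {ACCESS_TOKEN}"],
)

# Hypothetical session-configuration message: point the voice pipeline at the
# agent and pick an output voice. All field names are illustrative.
ws.send(json.dumps({
    "type": "session.update",
    "session": {
        "agent": {"project": AGENT_PROJECT, "agent_id": AGENT_ID},
        "voice": {"name": "en-US-AvaNeural", "type": "azure-standard"},
        "input_audio": {"noise_suppression": True, "echo_cancellation": True},
    },
}))
print(ws.recv())  # server acknowledgement event
ws.close()
```

In practice you would keep the socket open, stream microphone audio up, and play synthesized audio back down; the Voice Live developer docs mentioned later in the episode have the actual message schema.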

Segment 6 (25:00 - 30:00)

And as usual, you can see on the right side of the screen that it's very easy to invoke that avatar as well. — Well, I'm excited, because I'm literally writing the next set of demos for Ignite and beyond, for AI Tours, and I'm going to use all of this stuff. So Deb, I might be bugging you every once in a while, if that's okay. This is amazing. Where can people go to find out more? — Yeah. So, you can go to aka... I don't even know. Hold on. — I put it down below so people can see it, because, you know what, I'm trying to be helpful here. — Yeah, you go to aka.ms/voicelivega. — Fantastic. And that's just a blog post that talks about the Voice Live GA. And is this generally available now? — It is. It was generally available October 1st. — Oh, like I said, I'm rewriting the entire voice pipeline of my thing, and I'm totally going to do that. And then obviously, if you want to play with it, you go to the Foundry playground for Voice Live as well, right? — Yes, exactly. — And then what is this link here, Voice Live docs? What does this do? — Yeah, so if you want to go check out how to use the API and SDK, this is where you'd go. This has all the developer documentation. — Amazing. Well, thank you so much for spending some time with us, my friend. — Thank you so much. — And thank you so much for watching. We're learning all about upgrading your voice agent with the Azure AI Voice Live API with my friend Deb. Thank you so much for watching, and hopefully we'll see you next time. Take care. — All right, how about them apples? We should give her a round of applause. That was awesome. Thank you so much, Deb. Yeah, I'm excited to use the Voice Live API. There's a lot of new stuff that just wasn't there before. One of the things that I'm maybe thinking about doing is recording some of my voice so that the voice API talks like me. But would that be... I don't know. What do you think, is that uncanny valley stuff? The other thing is, what do you think about the avatar? Should I make an avatar of myself, or is that too uncanny? Put it in the comments. I'm interested in what you have to say. Think about that. All right. So, that's the Voice Live API with Deb. Now let's talk about GPT-5 with Alex Hughes. Take it away. You're not going to want to miss this episode of the AI show. We talk all about AI Evolution Explained: GPT-5 versus 4.1 versus o3, smarter choices for smarter business, with my friend Alex. Make sure you tune in. — Hello and welcome to this episode of the AI show. We're talking all about GPT-5 versus GPT-4.1 versus o3, and how you make the right choices, with my friend Alex. How you doing, bud? — Hey, Seth. Great to be here. I'm doing pretty good. How are you doing? — Fantastic. So, tell us who you are and what you do. — Awesome. Yeah. So, my name is Alex. I'm a product manager on the Azure AI Foundry models team, and what I do is I help bring the newest, latest, and greatest models from our amazing set of model providers, companies like OpenAI, xAI, Meta, and we bring those to Foundry so that our customers can use them in their agents and in building their applications. — So, this is all awesome, but I feel like once you have a lot of choices, it's hard to make a decision. So I'm hoping you can help us out with that. How do you know when to use what? Can you give us a little primer? — Absolutely. So choosing the right model is a big challenge, because there are so many different models.
We're bringing new models, and our providers are creating new models all the time and releasing them. So to pick the right model, you really have to understand what your use case is and then have a clear understanding of what the models' trade-offs are: what their pros are, and what their trade-offs might be. Which models are more expensive than others? Which models are more performant in different categories, like reasoning, than others? That'll help you pick the right one. And I'm happy to help with that with GPT-5 and some of the other models we have. — All right, let's start with GPT-5. How do you explain this one? — Yeah, so GPT-5 is the latest sort of model family from OpenAI. So when we say GPT-5, we're not just referring to a single model; we're referring to a family of models. And in this case, GPT-5 contains both reasoning models, like GPT-5, GPT-5 mini, and nano, and then also a chat model. So picking GPT-5 for your use case really depends on what specifically you're looking to do. But since we're bringing reasoning and conversation into one family, you can really do anything with GPT-5, anything you can think of within your organization. What GPT-5 really excels at is code generation. And then you also have a

Segment 7 (30:00 - 35:00)

ton of flexibility with things like tool calling, and I'm going to talk more about that today. So that's really what GPT-5 is great at. — So it's more of, like, a collection of models. GPT-5 is sort of a family of models, which is something that I didn't realize beforehand. Is that a good assessment? — Yeah, exactly, Seth. So it's a family of models, a collection of four models to be specific. — All right, so let's go to the next slide. Tell us about each one of those. — Yeah. So we have three reasoning models. And as we've seen with other models that have been released by OpenAI, they use this naming convention of mini and nano for smaller and faster models that are also reasoning models. So we've sort of shifted over from the o-series naming to, in this case, just GPT-5. So we have GPT-5, which is the mainline reasoning model. It's going to be best suited for those highly complex queries that require deeper chain-of-thought reasoning. It's really good for things like analytics and complex decision-making. GPT-5 mini is a little bit smaller than GPT-5: lower latency, but still great for reasoning. And then nano is the smallest, fastest reasoning model. And then of course we have chat, which is best optimized for things like multi-turn chat scenarios. — So, here's a question, because we had Jennifer Marsman talk about GPT-5, but maybe I didn't understand it correctly. Is GPT-5, like, the one all the way at the top? Does that one decide to do, like, model routing to different models, or am I mistaken here? — Yeah, great question, Seth. So, GPT-5 is this collection of models. Within the OpenAI API directly, they use a routing service that essentially will pick the best model based on the query. We also have, on Azure AI Foundry, a model router, and our model router is really great at helping you pick the best model based on the input. It does it seamlessly. It does it for you. So you input your query, your prompt, and then the model router will route it to the best model based on the prompt. — Awesome. All right. So let's take a look at this here, because there's a lot of stuff to think about when we start to compare these models, correct? So it's good to get the framework of the four models, well, the family of four models, but can you explain the differences here? Because there's a lot to think about. — Yeah. So GPT-5 is the latest set of reasoning models from OpenAI. We've had other models in the past that have excelled in different categories. We know that o3 was really good at, like, math-heavy tasks, for example, but o3 is being phased out now in favor of GPT-5, since GPT-5, in terms of the performance benchmarks, as you can see, does perform better in those categories. So we're seeing GPT-5, in particular the mainline reasoning model, outperforming o3 in those areas where o3 was the strongest. We also see with GPT-5 that we have a larger context window: we have 400,000 input tokens compared to o3 with 200,000 input tokens. So GPT-5 not only supports more input tokens, it's also going to perform better across those benchmarks. We do see that its output cost is slightly more expensive, only by $2 for the output, but the input is considerably less expensive there too. So GPT-5 is really kind of the evolution of o3. I just want to emphasize that point: from a reasoning model standpoint, GPT-5 is the next evolution there.
And then, compared to 4.1, GPT-5 chat is sort of what the next step there would be. But I want to emphasize that GPT-4.1 still excels in certain categories, in particular categories that require a large input context, so large document analysis, things like that. — I see. So o3 is, like, you could still use that one, but that one's going to be deprecated in favor of five, because five does the things that o3 does but better. But then if you have a need for a larger context window, or are concerned with cost, though it's not really off by that much, you would go with GPT-4.1. Is that what I'm understanding? — Exactly. Yeah. So GPT-5 chat does a really good job at handling those low-latency, real-time chat scenarios that 4.1 is also really good at. But 4.1 does have a bigger context window. — Got it. — So if you need a ton of context, 4.1's great for that. — Yeah. And to me that's kind of a different task, right? Because it feels like that would be more of a process automation thing, where you're looking at very large documents. Someone's not going to chat a million tokens into a context window, is my guess, but I don't know, maybe there is someone who can do that. — No, you're right. It's probably an unlikely thing. Yeah. — All right. So, let's take a look at the next slide here.
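Alex mentions the model router in Azure AI Foundry a moment ago: you call the router like any other chat deployment, and it picks the underlying model per prompt. A minimal sketch, assuming the openai Python SDK and a router deployment named "model-router" (the deployment name, endpoint, and API version here are assumptions, not values from the episode):

```python
# Sketch: calling the Foundry model router like a normal chat deployment.
# Endpoint, API version, and the "model-router" deployment name are
# assumptions -- use whatever name you gave your router deployment.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-21",
)

response = client.chat.completions.create(
    model="model-router",  # assumed deployment name for the router
    messages=[{"role": "user",
               "content": "Summarize why parallel tool calls reduce latency."}],
)

# The response records which underlying model the router actually picked.
print("Routed to:", response.model)
print(response.choices[0].message.content)
```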

Segment 8 (35:00 - 40:00)

Tell us about what makes GPT-5 a little bit different. — Exactly. Yeah. So we took a little look at the benchmarks there and some of the improvements, but in terms of functionality, GPT-5 supports some new things that are pretty cool. The first is free-form tool calling. In the past, with tool calling, we would need to specify those tools using structured JSON. With GPT-5, you can basically just send raw text payloads. So if you have some Python code, or you have a SQL query, you can just send that directly to the tool, and the tool will be able to handle it with GPT-5, without actually needing to structurally define that tool with JSON. And this is really great for making your applications more developer-centric; getting started with tools is a lot more seamless. GPT-5 also supports parallel tool calling. This allows it to call multiple tools essentially in parallel in a single model turn. So it makes using the model faster, and in turn it makes your agents run faster. A user doesn't have to wait for one tool call to happen, then another, then another. If they're not dependent on each other, then they can all just happen at the same time. So this makes your agents run a lot faster. — So help me understand this from a mechanical perspective, how it works. So effectively, you put something in the chat, and if you get the raw response, it'll tell you to call three function calls, for example, and then you can run those at the same time and then put them back on the response, back on the thread, at the same time. Is that what that means? — Yes, essentially, yeah. So what we're doing is we define our functions within our application, and I'm going to show you how to do this in just a minute here. So you define your functions, your tools, what you want them to do, maybe calling some sort of weather API or time API or whatever the tool might do. And within your application, maybe you're prompting your model to do something, and if the model determines it has to use multiple tools to accomplish what you're asking it to do, what it's going to do is call those tools in parallel. So it's going to use different threads to call them at the same time, and then it's going to get those responses back and integrate them before it returns the final response to the user. — Ah, I see. So that's a function of the actual GPT-5 service that's doing that on your behalf, because, I mean, most people maybe don't know this, but the LLM itself doesn't call functions. It's a service that calls the functions; the LLM just recommends the functions to call. The service just makes it much easier to call multiple of them at the same time. Am I getting this right? — Yeah, that's exactly right. — All right. So let's take a look at some of the parameters. So effectively, to make this actually work, you've got to set parallel tool calls to true. One of the things that maybe I did not get is the trade-off or the difference: what's the difference between reasoning effort and verbosity? Help me understand the difference. — Yeah. So for reasoning effort, there are four values. There's one new value for GPT-5 called minimal, which is the lowest possible level of reasoning, and then we have low, medium, and high, which are the same values as for previous models.
Essentially, what this reasoning effort parameter controls is how deeply the model is going to think before calling various tools or answering the prompt from the user. So if you want the model to think a lot, you can set reasoning effort to high. This is going to make latency longer, because the model is going to take more time to think, and it's going to allow it to produce deeper analysis, to think deeper before doing those various actions. If you set it to minimal, it's going to think the lowest possible amount while still being a reasoning model and still doing that chain-of-thought thinking. Verbosity, on the other hand, isn't actually controlling how deeply the model is thinking. It's simply adjusting the level of detail the model provides in its response. So if the model is more verbose, it's going to give more detail in its responses, and if it's less verbose, it's going to give less detail. — This is really handy, because I've had models kind of say a lot of stuff, and it's like, whoa, I wish I could control that. And this is a control for that sort of thing. — Exactly. Yeah. — So you can have a model that's super thoughtful, but maybe you don't want it to give you paragraph-long responses. — I love it. And so here's the diagram that I should have pulled up when we were asking about function calls. What are we looking at here? — Yeah. So here we've got just a sort of example diagram of what parallel function calling essentially is. Before parallel function calling, we would do sequential tool calls. So up at the top we can see, on the various model turns, we're calling some function, in this case read file, with three various files

Segment 9 (40:00 - 45:00)

and then we're getting those responses. With parallel tool calls, it's just one model turn, and then we're actually doing all three of those reads at the same time and then getting that response. So it's all happening at the same time instead of over three separate model turns. And what this enables is essentially just faster agents, faster thinking, and a better interaction with users. — So, this is awesome. Is there any way to show us what this looks like in code? — Yeah, absolutely. So what you see here, Seth, is two files that I've created, two Python scripts. And what I'm going to be showing you here is some regular function calling, which I'm sure you're very familiar with, and then I'm also going to show you parallel function calling: how can we parallelize these function calls when we have multiple function calls to make? So first, with regular function calling, what I've done here is initialize my Azure OpenAI service client, define my deployment, and set up some time zone data. So we have some various cities around the world, Tokyo, San Francisco, Paris, and I have the associated time zone for those cities. And then I've defined a function here called get current time, and this function takes in a location, a city, and it's going to return the current time at that location; I'm using datetime here to do that. So that's what this function here is doing. And then what I'm doing here is defining that tool to use with my model. So I'm actually using GPT-5 nano in this case. And GPT-5 nano, as Seth and I talked about, is a great model for low-latency scenarios. So Seth, it's really good for building applications that need some reasoning but need to do it really fast. Nano's really good at that. — That's cool. — Yeah. So I've defined the tool here. In this case I'm defining the get current time tool, so I'm describing it as getting the current time in a given location. I've defined the properties and then what's required here, which is just the location. So what I'm essentially prompting the GPT-5 nano model with is: what's the current time in San Francisco? So I'm asking GPT-5 to call the tool and then give me the current time in San Francisco. So down here I'm defining what the API call looks like, processing the model response, and then here is where I'm actually handling that particular function call. So it's a pretty standard use case for a tool. And if I run this file, we can see that we're going to get a response with what the current time is in San Francisco. So we see that it's 11:56 a.m. in San Francisco right now. — Yeah. And could you scroll up? Because I want people to see this. This part is the part that's kind of the magical bit here. When you get the response on line 83, if it's asking for a tool call, you look for the name, and then you execute the actual function with the arguments, and then you append the response of the actual function call to the thread. Is that right? — That's exactly right. Yes. — Perfect. And then you get a good response. Okay, cool. That was really fast, too. — Yeah, exactly. So GPT-5 nano is super quick, super low latency, but it can still do really strong reasoning. It's not as strong in terms of its reasoning capabilities as the mainline GPT-5 or GPT-5 mini, but it's still very strong. So it's really good at those scenarios where you need something that's quick, but also thoughtful.
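The demo file itself isn't reproduced in the transcript, so here is a minimal sketch of the single-tool flow Alex just walked through: define get_current_time, describe it to the model as a JSON tool, let the model request it, execute it, append the result to the thread, and ask again. The endpoint, key, and the "gpt-5-nano" deployment name are assumptions; the rest follows the steps described above.

```python
# Sketch of the regular (single) function-calling flow described in the demo.
# Deployment name, endpoint, and key are assumptions.
import json, os
from datetime import datetime
from zoneinfo import ZoneInfo
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-21",
)
DEPLOYMENT = "gpt-5-nano"  # assumed deployment name

TIMEZONES = {
    "tokyo": "Asia/Tokyo",
    "san francisco": "America/Los_Angeles",
    "paris": "Europe/Paris",
}

def get_current_time(location: str) -> str:
    """Return the current time in a known city as a JSON string."""
    tz = TIMEZONES.get(location.lower())
    if tz is None:
        return json.dumps({"error": f"unknown location: {location}"})
    now = datetime.now(ZoneInfo(tz))
    return json.dumps({"location": location, "current_time": now.strftime("%I:%M %p")})

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_time",
        "description": "Get the current time in a given location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string", "description": "City name"}},
            "required": ["location"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the current time in San Francisco?"}]
first = client.chat.completions.create(model=DEPLOYMENT, messages=messages, tools=tools)
msg = first.choices[0].message

if msg.tool_calls:
    # The model asked for the tool: run it, append the result to the thread,
    # then ask the model again for the final answer.
    messages.append(msg)
    for call in msg.tool_calls:
        if call.function.name == "get_current_time":
            args = json.loads(call.function.arguments)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": get_current_time(**args),
            })
    final = client.chat.completions.create(model=DEPLOYMENT, messages=messages, tools=tools)
    print(final.choices[0].message.content)
else:
    print(msg.content)
```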
— Got it. All right. So, let's take a look at GPT-5 parallel tool calling to see the difference. — Yeah. So, let's dive into this one here a little bit. We're setting things up the same way, using the same model, GPT-5 nano. But in this case, we've got a little bit more data to work with. So instead of just time zone data, which we still have here, we've got some new data for the weather in these various cities as well. So we have the temperature and then the unit of temperature in these cities. — Got it. — So I've got the same function here, right here, get current time. This is doing exactly the same thing it was doing in the regular function-calling file. But I've got this new function as well, called get current weather, and this one's going to take in the location and the unit. Let's not do that. And then it's actually going to return the current weather in that location. So we have the data, and then here's a function that can basically check what the weather's like in that location. So I'm going to define the tools the same way I did before. So get current time

Segment 10 (45:00 - 50:00)

is defined the same way, and then get current weather is defined here as well. I'm requiring the location as input, and what this function is doing is getting the current weather in that location. And then I am prompting the model with: what's the weather and current time in San Francisco, in Tokyo, and in Paris? And one thing I'd love to call out here as well is that here I'm defining tools using JSON. With GPT-5, you don't have to do this. So you can just call the function directly with Python or with SQL, for example. — What does that look like? — Yeah. So essentially, if you have a Python script that does something, or let's say you have a SQL query that calls a database, and in this case it's getting, say, employee data, what you could do is just have that SQL query defined in plain text. So it's just a raw text string, a SQL query string. — As part of the prompt. — Exactly. — Makes sense. — Yeah. You have that string defined, you know what it's going to do, and then you can basically have the model use that string and call a tool directly with it. So in those cases you don't actually need to define the tool with structured JSON. You can just send the raw text SQL query payload to the model, and the various tools will get called with that raw text payload. — I love it. All right, let's take a look at how we parallel-call these things. — Yes, absolutely. So if we scroll down a bit here, we've defined those tools. Here's our first API call; here we're asking the model to use the functions. And then here's where we're actually processing those responses. So as you can see, Seth, it's really the same handling as we saw before with regular function calling. In the response, we're essentially just going through and seeing whether the tool is in the set of tools that we have, and if so, we call it, and then we append those results to the messages the model uses to return its answer to the user. — Got it. So, I see, in the for loop there, it might be returning more than one of these function calls, and you're just executing them and putting them on the thread as they come in. — That's right. — Okay. Cool. All right. Let's see what it looks like when we run it. — Yeah. So let's go ahead and run this one here. So first we see a really fast response there, and then what we're seeing is all of the function calls that were made. So we have this all printed out here. We can see get current weather is called with San Francisco, and then we see get current time is called for San Francisco. And then we see that it happens for Tokyo, the two functions, and then it happens with Paris, with the two functions. So essentially GPT-5 is calling both of those tools for all three of the cities, and it's doing this in parallel. So it's not just calling one, then another, then another. It's actually doing it all at the same time, and then it's printing those results. And we can see that for the three cities, we have the weather and the time for all three.
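Here is a hedged sketch of the parallel version just described: two tools and one user turn asking about all three cities, with every tool call the model returned handled before the next model turn. It also shows where the parallel_tool_calls and reasoning_effort settings from the previous segment would go. As before, the deployment name, endpoint, key, and the canned weather data are assumptions, and whether those parameters are accepted depends on the model and API version you deploy.

```python
# Sketch of the parallel tool-calling flow: two tools, one user turn, all
# requested calls handled in a single pass. Deployment name, endpoint, key,
# and the canned weather data are assumptions/illustrative.
import json, os
from datetime import datetime
from zoneinfo import ZoneInfo
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-21",
)
DEPLOYMENT = "gpt-5-nano"  # assumed deployment name

TIMEZONES = {"tokyo": "Asia/Tokyo", "san francisco": "America/Los_Angeles", "paris": "Europe/Paris"}
WEATHER = {"tokyo": (22, "C"), "san francisco": (17, "C"), "paris": (19, "C")}  # canned demo data

def get_current_time(location: str) -> str:
    now = datetime.now(ZoneInfo(TIMEZONES[location.lower()]))
    return json.dumps({"location": location, "current_time": now.strftime("%I:%M %p")})

def get_current_weather(location: str) -> str:
    temp, unit = WEATHER[location.lower()]
    return json.dumps({"location": location, "temperature": temp, "unit": unit})

AVAILABLE = {"get_current_time": get_current_time, "get_current_weather": get_current_weather}

def tool_schema(name, description):
    return {"type": "function", "function": {
        "name": name, "description": description,
        "parameters": {"type": "object",
                       "properties": {"location": {"type": "string"}},
                       "required": ["location"]}}}

tools = [tool_schema("get_current_time", "Get the current time in a city"),
         tool_schema("get_current_weather", "Get the current weather in a city")]

messages = [{"role": "user",
             "content": "What's the weather and current time in San Francisco, Tokyo, and Paris?"}]

first = client.chat.completions.create(
    model=DEPLOYMENT,
    messages=messages,
    tools=tools,
    parallel_tool_calls=True,      # allow several tool calls in one model turn
    reasoning_effort="minimal",    # GPT-5 reasoning knob discussed in the previous segment
)
msg = first.choices[0].message
messages.append(msg)

# One pass over every tool call the model returned for this single turn
# (run sequentially here; independent calls could also be threaded).
for call in (msg.tool_calls or []):
    fn = AVAILABLE.get(call.function.name)
    args = json.loads(call.function.arguments)
    messages.append({"role": "tool", "tool_call_id": call.id,
                     "content": fn(**args) if fn else json.dumps({"error": "unknown tool"})})

final = client.chat.completions.create(model=DEPLOYMENT, messages=messages, tools=tools)
print(final.choices[0].message.content)
```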
— Well, this is awesome. Where can people go to find out more about this? I have a couple of links. Tell us about this one. — Yeah, so... — The blog post, tell us about the blog post. Sorry, you're probably not seeing it; I'm pointing at something you're not looking at. So, this is a blog post. Tell us what's there. — Yeah. I highly recommend you check out this blog. We highlight what's new with GPT-5, its advantages, and the new features we're enabling on Foundry with it. So, highly recommend you check that out. — Awesome. And then on Learn, there's some information similar to what you just showed. Is that correct? — Yeah, we show all of our OpenAI models, the reasoning models, what you can do with them, and then of course examples of how you can use them. — Well, this has been amazing. Thank you so much for your time, Alex. — Thanks so much, Seth. I appreciate it. — And thank you so much for watching. We're learning all about AI Evolution Explained:

Segment 11 (50:00 - 53:00)

GPT-5 versus 4.1 versus o3, smarter choices for smarter business, with my friend Alex. Thank you so much for watching, and hopefully we'll see you next time. Take care. — Awesome. Thank you so much, Alex. Let's give him a round of applause here as well, and another air horn for you. Thank you so much. That was super informative. It was cool that he spent so much time answering my questions. So yeah, that's pretty exciting. Thank you so much for that, Alex Hughes. Next time on the AI show... let me see if I can pull this up here, because I always forget. I always forget, and I never know. You know how you're working on stuff and you're like, "What's the next thing?" Let me go in here. Hold on one second. One second. I'm finding it. The next episode of the AI show, I think we have... hold on. It's okay. Okay. Here we go. The next episode of the AI show: Azure AI Translator with Krishna. And this is an awesome one. It's all about machine translation, something that we absolutely need. That's the first half of the show. And the second half of the show says TBD, so I'm going to just show you stuff that I've been working on. So make sure you join us next week, October 20th, for the AI show. Hey, thanks so much for spending some time with us. I know your time is valuable, and we appreciate every second that you spend here with us on the AI show. Thank you so much for watching, my friends, and hopefully we'll see you next time on this, the AI show live. Take care.
