AI Show LIVE | Foundry Local Integration & New Agent Framework
1:02:25


Seth Juarez · 07.10.2025 · 78 views · 2 likes

Video description
Join us from 8:30 to 9:30 am PT for two exciting sessions: "Unlocking AI Potential with Foundry Local Integration for GitHub Copilot in VS Code" with Maanav Dalal, and "New Agent Framework for Next-Gen Multi-Agent Solutions" with Elijah Straight. We look forward to seeing you there!

Table of contents (13 segments)

Segment 1 (00:00 - 05:00)

Hey. How do you feel? Hello and welcome to this episode of the AI Show. I'm excited to be here. Your friendly host here, Seth Juarez. We've got a really good show for you today. I'm excited for that. Before we get to that, tell us where you're coming from, say hello. Next week? Oh, I'll tell you about that next week. But while I side-eye the chat, tell us where you're coming from. I'm excited to be here. Like I said, we've got two good sessions. Interestingly enough, let me actually bring up my screen so I can tell you what we're doing today. Entire screen. Here we go. Boom. Today on the AI Show, as you can see on the little screen up there: Foundry Local with Maanav. I want to make sure I say his name right. Last time I did the show, I said his name wrong. I felt so bad. Maanav. Now he's watching this like, "Wow, Seth. Wow." Oh, shoot, I just dropped my pen. Wow, Seth. Wow. How did you do that, Maanav? So, he's here. Awesome dude. We've got some good stuff with him. Foundry Local — we'll talk about it when he's on, but it's kind of going to unlock some scenarios. Not going to lie. And then number two: we released this video earlier on, but I wanted to bring it on here and show it live. Hopefully Elijah will come. Is Elijah coming? I don't know. Rian, she's our producer; maybe she'll tell me. We have something called the new — yes, he should be. I love it. Look at this. The new agent — hold on, I used to have a sound effect for "the new" — oh, I'm pushing — the new agent framework. The new agent framework.

Segment 2 (05:00 - 10:00)

Sorry, I pushed the wrong button. The new agent framework. What button did I push? I, like, disappeared for a second. This is with Elijah, also an awesome dude. Wow, I just disappeared for a second. Rian, our producer, is like, "What is going on, Seth? What are you doing?" Maybe I pushed mute or something. Gosh, I'm so bad at this. So bad. No, I'm back. I'm sorry. Yeah, I pushed a button and must have fat-fingered something. Sorry about that. Okay, so yeah, the new agent framework with Elijah. Hopefully it was just for a second. Gosh, Rian, did I disappear for a long time, or was it like half a second? It wasn't long. This could have been embarrassing for all of us, you know what I'm saying? The new agent framework. And then, if we have some time left over, I'll show you what I'm working on. Okay. Here we go. Hello. Hello. Hello, my friend. Oh, let me change something. There we go. Hello. Hello, my friend. Indeed. I don't know, maybe I like this one. My face isn't covered up, but maybe that's a good thing, you know? Okay, so we have that. And sorry if I'm looking to the side — I'm side-eyeing this thing because I've got another screen here. Let's see, chat. Let's go to the chat. Hello, my friend Richard from Canada. I think I remember last time Richard and I had a discussion. By the way, I learn really well when I make a dogmatic statement that I don't 100% believe and then someone corrects me. So we had one of those last time. Richard, thank you for coming. Hello from Austin and from Dallas. Hi, Austin; hi, Dallas. I got confused. It's like Forrest Gump, right? Austin from Dallas. I was going to say something dumb, like he's in Austin and Dallas — Austin is a multiphasic being. "Mi mamá es de México." If you didn't understand that, I said my mom is from Mexico. I lived there when I was a kid, and then I also lived in Spain for a couple of years, two years in southern Spain. So my Spanish is weird. Here's a question: do you use a mouse or a pen to write? During the pandemic, I made a pandemic purchase: a Wacom tablet. So I can do really cool stuff, write like that. It's amazing. Okay, Seth Janiscu number seven — he's the one holding up the room over in Twitch. Thank you for coming. Hello from India. Yes, I know, Austin, I got it. These are the jokes; apologies for them. All right. I think Maanav is here, but I don't see his camera on. Let's just try it. YOLO. All right, so let's bring him up. Maanav, can you hear me? — He's typing. I pressed the wrong one. He's typing. He's here. Okay. Can you speak? All right, this is what I'm going to do. We'll figure this out. He's here to answer questions, so if you have questions, just put them in the chat. But without further ado, let's get into it. Foundry Local with Maanav, my friend. Let's take it away. You're not going to want to miss this episode of the AI Show. We talk all about unlocking AI potential with Foundry Local integration for GitHub Copilot in VS Code. Make sure you tune in.

Segment 3 (10:00 - 15:00)

Hello and welcome to this episode of the AI Show. We're talking all about unlocking AI potential with Foundry Local integration for GitHub Copilot in VS Code. My friend Maanav. How you doing, my friend? — Hey Seth, I'm doing great. How are you? — Fantastic. So tell us who you are and what you do. — Yeah, so I'm a PM on the Foundry Local team and I'm really excited to be here. The AI Show is amazing, and I'm going to talk a little bit about Foundry Local. — So, before we get into the GitHub stuff, can you give us a little introduction to Foundry Local, why it's important, and why people should think about it? — Yeah, totally. Local AI has been top of mind for me forever, but I feel like it's important for everyone else to understand it too. So I brought some slides to talk about that. Local AI can be used in so many different places, but let's talk about why you might want to use it first. Privacy and security is paramount. Whether you're a bank, working with taxes, or anything else, a lot of people care about their data a lot, and a lot of these public services are not really privacy and security focused; they're focused on providing you information quickly. With local AI, running your own models, you get to choose what that privacy and security looks like. Latency and low bandwidth are also relevant, but I think what that really comes down to at the end of the day is that you can run these models offline. That's really important, because maybe you don't always have access to the internet. I've wanted that a lot when I'm on a flight and I'm like, great, I want to use LLMs to do things for me, but I can't, because I don't have internet. Well, with local AI, I can. And then cost efficiency is also really important, because these can get expensive. I don't know about you all, but I'm subscribed to a couple of AI services, and it can be really challenging when you have so many. I'm sure you've maybe had the same experience. — Yes. — So with local AI, as long as my hardware is good enough, I can run whatever I want completely free, outside of electricity costs. — And the last one is that I'm looking at all of these apps where, if I want to use an LLM in offline mode, I can see me wanting to do that here. So the question is: how does Foundry Local help? — Yeah. So we provide an API interface to access these LLMs on your device. We spin up a server for you, and we have a lot of great things going on in the back end here. You can see ONNX Runtime is what we're using, and if you're familiar with the open-source model space, you know that's already set up for other models everywhere. And the beauty of it is really that it's cross-platform, but also that it really accelerates everything you're doing in terms of inference. So when it comes to these models — whether you have NVIDIA hardware, AMD hardware, Intel hardware, or Qualcomm hardware, whether you're on Windows or macOS, it doesn't really matter — we provide really good acceleration, and we work with these hardware vendors directly. So using that goodness of ONNX Runtime baked into Foundry Local, we ensure that you get great inference performance, and that's really important. — Yeah.
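Since the interview stays high level here, a concrete sketch may help. Foundry Local exposes an OpenAI-compatible endpoint on your machine, so once the service is up you can talk to a local model with the stock OpenAI client. This follows the documented foundry-local-sdk pattern; the model alias and prompt are illustrative, and real aliases depend on what `foundry model list` shows for your hardware.

```python
# A sketch of chatting with a Foundry Local model from Python, following the
# documented foundry-local-sdk pattern. The alias is illustrative.
import openai
from foundry_local import FoundryLocalManager

alias = "phi-4-mini"

# Starts the Foundry Local service if needed, and downloads/loads the model.
manager = FoundryLocalManager(alias)

# The local server speaks the OpenAI chat-completions protocol, so the stock
# OpenAI client works; the api_key is a placeholder since inference is local.
client = openai.OpenAI(base_url=manager.endpoint, api_key=manager.api_key)

response = client.chat.completions.create(
    model=manager.get_model_info(alias).id,
    messages=[{"role": "user", "content": "Turn these bullet points into a paragraph: ..."}],
)
print(response.choices[0].message.content)
```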
And I've played with Foundry Local, and it's really cool. It's almost like you have your own mini LLM running on an endpoint on your local device, which is super cool. All right, so now let's talk about the GitHub Copilot in VS Code integration. What is this all about? Because this is the part I have not heard about. Why don't you tell us about that? — Yeah, so this is new. I did bring a demo, but before we jump into that, I just want to quickly summarize what's going on. In GitHub Copilot, you can choose models, just like everyone else, based on what you like. With Foundry Local, we can bring the models that we want to use into GitHub Copilot. And unlike the subscription that you typically have to pay, these are completely free, because you run them on your own device. So without further ado, let's just jump into the demo. — All right, let's do that. — Hi, today I'll be covering running local models using AI Toolkit for Visual Studio Code via Foundry Local. To get started, if you haven't already, install the extension: search for AI Toolkit in the extension store, download it, and install it in VS Code. Once you've done that, you're going to have two more prerequisites: GitHub Copilot and GitHub Copilot Chat. They should install together, and you'll have this little button that shows up. Go ahead and toggle the chat and open it. We're going to be in ask mode for this. Then we're going to click manage models. If it's your first time, it may take a second to load. Click "Foundry Local via AI Toolkit" and it'll load a set of models that are compatible with your device. Now, I'm going to choose the Qwen models, a 1.5B model, since I don't have the greatest processor right now, and we'll go ahead and add those. Now, in the pane, we should see these. I already have these installed, but if you don't, it'll prompt you when you click the models to start running them. Now, as expected, we can use them with VS Code and GitHub Copilot Chat to answer questions about our code. So, I'm going

Segment 4 (15:00 - 20:00)

to take a look at this payload. I'll use the CUDA version, since I have a CUDA device. And I'll go ahead and ask, "tell me about this file." It'll take a second to load the model into memory, and then we should be off to the races. As you can see over here, while it's working, my GPU is loading it into memory and being used. And we got some information about this file. That's wonderful. Now, let's go ahead and look at edit mode and try again. Let's have it make a quick change. Again, using the GPU, it'll make the edit to the temperature, and we can keep that change. Now I've headed back into ask mode for one last sample. I'm going to have it generate another sample JSON, and we can use this to write a function similarly, and it'll go ahead and do that; in this case it has questions for Tokyo. I hope this was a great intro to using AI Toolkit and Foundry Local to run local models on device. Just as a reminder, they're completely free, and we have models for CPU, GPU, and NPU. So regardless of what device you're using, you can use the power of local models. — That was awesome. — Holy cow. — So let me get to this, because I think you had a couple more slides that I want to get to, because I want people to understand what's actually going on. Effectively, there is a model running on your box, and now, whenever you do GitHub Copilot stuff in VS Code, it's just using that model. — Exactly. So on device, you know, you're not pulling anything else. You're not paying for any credits. It's just your own device. — Now here's a question, and maybe you can lay the cards out. Some of these models are not going to be as good as the bigger models, right? So in your experience, when would you use these models versus the other ones, and what have you been feeling out with the smaller models running on local device? — Yeah, that's an amazing question. I think for me personally, there are certain situations where you do want to use big models, right? When you're doing in-depth research — deep research is a really great use for those large models. For up-to-date information — we will have tool-calling support very shortly, but in the meantime, for accessing the internet and using tools of that sort, you do want the latest models. But for anything that you want done quickly, this is a great example. Whether you're building it into apps — because I think that's the main key use for Foundry Local — you're doing short searches, summarization; there are so many great AI things from the GPT-4 era, or the GPT-4o era — and we even support GPT-OSS, that's a great example. For these smaller models that can do really smart things, we don't need to use the largest model in the world. — I feel you. So you're saying, hey, for regular, standard text things, like "summarize this so it's more concise" — things you would use a tiny model for — so if you're on the plane editing your documentation, use the smaller models, because those would totally work. — I mean, I'll give you a great example. I was filling out some government documentation, some paperwork, and I was literally on a plane, and I was just like, great, I do not want to write all of this information, but I'll write bullet points for you. Great purpose, right?
I just spun up Foundry Local, had a chat window open, gave it my bullet points, and it turned them into a large paragraph for me, because I just didn't want to go through the effort of writing. — That is amazing. So, there were two more slides, because I want people to know how this is working underneath. Tell us about what we're looking at here. — Yeah, so this is just a quick view of what's going on. Azure AI Foundry is where we actually host all these models, and we're able to pull those in. Then we have the local management service happening on device, as well as ONNX Runtime all the way underneath, which, based on your hardware, makes sure to optimize. We have this all baked into Windows AI Foundry. So, from 24H2 onwards, we're using whatever execution providers you have, whether that be CUDA, Vitis AI for AMD, or Qualcomm as well. So even if you have an NPU on device, we're just solving that whole problem for you. You don't have to figure out what drivers to install or which packages to install. Once you have Foundry Local installed, you're just good to go, whether that's through AI Toolkit or locally as a package via winget or brew. — And here's the cool thing that most people don't know, because I heard about it — and I'll bring up the next slide so people get context — Windows ML. I heard about Windows ML probably two or three

Segment 5 (20:00 - 25:00)

years ago. And so it's a really baked-in thing in Windows. How does Windows make this better? Tell us about Windows ML. — Yeah, so with Windows ML — I think the GA was very recent — what that allows us to do is have the acceleration just baked in. I think that's the simplest way to put it; I don't want to complicate it too much. It just makes it so easy to deliver the packages that we need, because a lot of them are just going to be on device already now. — Yeah. And the thing that I learned about before — because maybe I shouldn't have said it was years ago, but I saw the internals of this stuff years ago — I remember the engineers telling me: you know what's cool about this, Seth? The fact that you can just use the ONNX Runtime, and as an operating system, we will choose the best hardware that's actually sitting in your box. Because sometimes, if I have to write a CUDA thing versus a CPU thing, it's a completely different thing. And now, same for the NPU. The ONNX Runtime in Windows ML helps you do this without thinking about it. — Exactly. We're just removing that difficulty from the users. You just don't have to deal with it, and it just works the way that you need it to. Actually, if you could pull up that slide really quickly — — Yeah, let's do it. — People can understand that there are many different ways to interface with Foundry Local. You can use the CLI itself, we have SDK options, and you can also just call the REST API straight at the endpoint that the server spins up for you. So there are a lot of ways to work with this, and we have C, C++, C#, Python, JavaScript — whatever you want — and I think the community's even been contributing Rust bindings. So there are lots of options, and you can use this however you want to. — I think I missed this part, so if you don't mind, maybe I can pull that thread a little bit. There's a Foundry Local management SDK that allows you to execute these models without having to call an endpoint. — Yeah. It's effectively wrapping that, just so it's a little bit easier for you. Whether you want to download models, run the service, or actually do inference, all of that is enabled a little more easily with the SDK — so that we're not calling direct endpoints, and because we dynamically allocate these ports, sometimes for security and safety reasons, you don't have to figure out what that is on the user's device. — I see. — So if you're running this on your device, you basically start Foundry Local. It's a service. It's running, and it has models that are loaded and unloaded, depending on the ones that you want. And then, if you use the SDK, it just knows how to talk to that natively, without you having to specify the endpoint or put in a fake key — because, I remember, there's no key for this, because it's local. You don't have to do any of that stuff. — Exactly. It just makes things that much easier. And we want this to be really easy to ship with applications. Being able to do that without the harder effort of development makes it really easy to package this with anything we need to send off, and then all of a sudden your local applications are just using it out of the box, for free, for your users. — Oh, this is amazing.
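To make the CLI/SDK/REST picture concrete: because the service's port is dynamically allocated, the usual move is to ask the management SDK for the endpoint and then hit the OpenAI-compatible REST route directly — no real API key needed, since everything stays on the box. A rough sketch, with the alias illustrative:

```python
# A sketch of the raw REST route: ask the management SDK for the dynamically
# allocated endpoint, then POST straight to the OpenAI-compatible path.
import requests
from foundry_local import FoundryLocalManager

manager = FoundryLocalManager("phi-4-mini")  # illustrative alias
model_id = manager.get_model_info("phi-4-mini").id

resp = requests.post(
    f"{manager.endpoint}/chat/completions",  # manager.endpoint already ends in /v1
    json={
        "model": model_id,
        "messages": [{"role": "user", "content": "Hello from the REST route"}],
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```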
For folks: I have a workshop coming up in a couple of weeks, and I've got to get people to use LLMs and play with them. I could literally just have them all install Foundry Local, and irrespective of what the internet's like in that workshop, this will all still just work. — Exactly. — This is amazing. Where can people go to find out more? I know there's a blog here. Tell us about this. — Yeah, this is a great way to get started. We cover the AI Toolkit and GitHub Copilot integration in this blog, so that's a great place to start. Otherwise, on GitHub we have Foundry Local — just go ahead and search for that — and we have Learn docs as well, if you want to get started there. There are amazing links there, and you can just get going. It really is a one-line install, whether you're on Windows or macOS, and you're off to the races. — This is amazing, Maanav. Thank you so much for spending some time with us, my friend. — Yeah, thank you so much for the time, Seth. I was really glad to be here. — And thank you so much for watching and learning all about how to unlock AI potential with Foundry Local integration for GitHub Copilot in VS Code — plus, we talked about a lot more Foundry Local stuff that I did not know about. Thank you so much for watching, and hopefully we'll see you next time. Take care. Dude, that was awesome. — Yeah, thanks Seth. I had a great time. — Let's just — that's how we know it was good. And then some clapping here for you. — So, that was the evergreen one. Tell me, have you seen people use some of this stuff? I know it was Foundry Local for GitHub Copilot, I know that's what it was all about, but I think it's much larger than that. What's the reaction been to Foundry Local? — I think people are really excited about specific use cases that local AI is really good for. I can't say names, but people like banks or tax firms love the concept, right? Because people want to have AI use in tax, for example, but they also don't want all their

Segment 6 (25:00 - 30:00)

information, you know, used in training, or just given to other companies. And the beauty of local AI is: I can have the power of AI but not lose my information or my privacy that I really care about, and then make taxes easier, for example, or banking. So customers like that have been really interested. Even stuff like searching through your chats — there are people we're talking to where it's like, I want to use AI to search through my chats, but I don't necessarily want, you know, insert big company here, having access to all of my chat information. — I can do that with local AI. — Yeah. And the other thing is, even when you do computation — JavaScript in particular has been moving a lot of stuff that used to be done at the back end to the front end, right? There's a way to start separating the simpler tasks with LLMs and then moving some of the harder, bigger tasks to the back end, which to me is absolutely what has happened in computing since the beginning of time. — Yeah. I remember back in the day having to write little web scrapers or something, for anything, right? It takes a long time to figure out how to use this package or do this crazy thing, and now I can ask a smaller model to do all this stuff really easily for me and not have to worry about figuring that out myself. I think that's really where local AI is really useful. — Yeah. Now, here's the thing. The weird thing about that video is it looks like I'm wearing the same shirt. I only have one shirt — I have not come so prepared. No, no, that was not on purpose. I was like, "Oh no, now everyone's going to know that I have the same shirt on Mondays." — Do you have a schedule? Do you have a reason? — No, I don't. Maanav is like, "What the heck? You have a schedule for your shirt?" No, I don't. — That would be insane, right? — Yeah. — So, here's a comment from Janice number seven: Windows ML with ONNX Runtime and hybrid compute in one. Yeah. No, that's not the one I wanted. Here it is: Windows ML is so good. Yeah, it is. — It's really cool. We GA'd recently. We had a big celebration for it internally, and we're just really happy that this long, like you mentioned, this long thought-out pipeline is finally where we want it to be, in GA for all devices, 24H2 onwards. — Yeah. And I know I said I had seen Windows ML — it's been in the works for a while. But the thing about it that's cool is that using this stuff on device is cool. In fact, man, I was so inspired by it, I literally downloaded it the minute after we did everything, and I started on it. So I'm pretty excited, trying to get it to work with Agent Framework. There's still some mismatch. — Some engineering that we're working on. — Yeah, sure. But maybe I'll do some pull requests, because, like I said, I have a workshop next week in Lisbon where I'm going to try to do Foundry Local and Agent Framework and put them together. So, I'm still — — That's so cool. — All right, dude. Well, thank you so much for being with us, bud. — Yeah, I can't wait for the Agent Framework demo. — I know. I have to finish it this week. So, there is that. All right, bud. It's good seeing you. — Yep. Have a good one.
Thank you. — All right, my friends. That was Maanav. What a good dude. I work with some good folks, some seriously really smart people. I'm working on some features right now for the next version of whatever we put out, and our engineers are just absolutely stellar. So I get to work with some really, really cool people. Okay, so coming up next we have Elijah. He'll probably be here in a little bit. But I thought maybe we'll take a minute, because we have some time, and I wanted to show you Foundry Local, if that's okay. So let's go back to our screen share here. Let's open this up over here, and let me go to Foundry Local. Let me not show that thing. I know you can't see; I just need to move. Okay, here we go. Foundry Local. What is Foundry Local? Indeed. Foundry Local is available — public preview release, early access, use cases. "Do I need an Azure subscription?" And you're probably wondering, why would a company like Microsoft do this? Well, it's because we really want you to use LLMs. The reality is, if you want to put an LLM thing into production — to me, LLMs are like the new primitive for developers. "Do I need special drivers for NPU acceleration?" I guess so. But I think if you have, like, a Surface device, it just works. I know I have a Surface device that's got a Qualcomm chip, and it literally works. "Follow the Get Started with Foundry guide to help get started." So, get started. "New NPUs are supported only on systems running Windows 24H2 or higher."

Segment 7 (30:00 - 35:00)

Yeah. "If you see a service connection error after installation..." So, yeah, this is what I did: I did a winget install, and it looks like it works on macOS too. And then do a `foundry`. So here we go: `foundry model list`. I downloaded a couple of models already. So, these are all the models that are available, right? Looks like we have a DeepSeek R1. We have Phi-4. And notice that you have different versions of them, right? Like a CUDA one, a generic GPU one. I installed Phi-4-mini. So how do I know which one? `foundry --help` — and this is still preview stuff, so take a look at it. So there we go: there's cache; discover, run, and manage models. `foundry model --help`. So I installed a couple; I don't remember which ones. `foundry model list`. Yeah, I think I installed Qwen. Oh, I did not install GPT-OSS. So let me show you how to do this. `foundry model` — load, or install, or download? Download. Oh jeez, `foundry model --help`. There it is: `foundry model download`. And then, boom, here we go. I wonder how big this is. Oh, it's big. So we've got a minute to kill here, friends. What should we talk about? By the way, I'm downloading the actual GPT-OSS, 20 billion parameters, the CUDA GPU one. I think there are certain models that only work on a GPU. So let me — I've got my other screen here, sorry. I want to show you this actually working. Make sure I'm not revealing anything I shouldn't be. Where is this? Here we go. So I have a lot of GPU. You can see right here I have a GeForce 4080, which is not brand new, but it's not old either, right? I think the 5090 or the 5080 is the new one. And I have quite a bit of RAM. So while this model is big — it's 9 GB — I can fit it safely in RAM, which is nice. Okay. So, did we finish? Yes. So, `foundry cache ls` — these are all the models I have installed. `foundry model load gpt-oss-20b`. Here we go. Loading the model. There we go. Let's see our RAM and stuff go up here in a hot second. `foundry model run gpt-oss-20b`. Hi. Here we go. Let's put this down here so we can see all the happenings. We'll do this, and then I'll put this down so it looks — oh. What can you do? And then here — do you see this GPU thing? What can you do? It's thinking.
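For anyone who'd rather script this than type CLI commands, the same list → download → load flow Seth walks through is exposed by the Python management SDK. A sketch based on the published foundry-local-sdk; method names may shift while the product is in preview, and the model alias is illustrative:

```python
# The CLI flow above (list -> download -> load -> cache ls), sketched against
# the Python management SDK.
from foundry_local import FoundryLocalManager

manager = FoundryLocalManager()  # attach to (or start) the local service

# Roughly `foundry model list`: the catalog filtered for this machine.
for model in manager.list_catalog_models():
    print(model.alias, model.id)

# Roughly `foundry model download` + `foundry model load`.
manager.download_model("gpt-oss-20b")  # illustrative alias; ~9 GB in the demo
manager.load_model("gpt-oss-20b")

# Roughly `foundry cache ls`: what's already on disk.
for model in manager.list_cached_models():
    print(model.id)
```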

Segment 8 (35:00 - 40:00)

Whoa, look at this. Oh my gosh, this is so cool. "Write a limerick about amazing GPUs." Oh, some of these things are so chatty. By the way, look at this — it's pegging my GPU right now. Wow. Some models are super chatty; I should have prompted it better. Wow. Okay. GPT-OSS, you do you. All right. Do you see that? All the way down. All right. Let's get back to the show. Coming up we have Elijah and the agent framework, and then maybe we'll try to see if we can get them both to work together. What do you think? Should we do that? But without further ado, let's turn it over to Elijah and Agent Framework. Take it away. You're not going to want to miss this episode of the AI Show, where we talk all about the new agent framework for next-gen multi-agent solutions. My friend Elijah — make sure you tune in. — Hello and welcome to this episode of the AI Show, talking all about the new agent framework for next-gen multi-agent solutions with my friend Elijah. Elijah, how you doing, my friend? — Seth, good to be with you, my man. I'm doing really well. — We did a thing at the last event where we got to work on demos, so Elijah and I are what we call BFFs now. So tell everyone what you do at Microsoft. — Yeah, awesome. I'm a PM on the Azure AI Foundry team, specifically working on Semantic Kernel, AutoGen, and now the Agent Framework, which I'm super excited to be showing with you guys today. This is kind of a first-look access to it, so it's exciting stuff. — Oh, I'm excited. So why don't we start first with what it is, because I know there are a lot of options out there for doing agentic stuff, and there are a lot of options even just from Microsoft. Can you tell us about the Agent Framework and how we should start to think about this? — Would love to, Seth. So, yeah, to get into it a little bit: we're living in an agentic world nowadays. In San Francisco you can't go five minutes without slamming your head into an agent billboard, advertisement, etc. And at Foundry, we're proud to have our own Azure AI Foundry Agent Service. We worked on this heavily alongside Seth and some of our other colleagues to launch it back at Build in May. But now we've kind of gotten past just the need for a single agent and more into the realm of multi-agents. Single agents are really good for very specific individual tasks, whereas multi-agent systems can accomplish a wide variety of tasks that require a diverse set of skills. If you think about a company: I'm really good at product management, we have great engineers on our team, great marketing folks, and everyone has their own individual skills. When we all come together as a team, that's when we're able to accomplish and ship cool products like the ones you're about to see today. And the same is true for multi-agent systems. An individual agent, if you give it too much scope, could get confused, but when you have multiple agents working together, you're able to accomplish these more interesting and — complex tasks. — Yeah. I absolutely love this slide.
Is it okay if I steal it for other things? Because the fact is that we started with no agent — well, it's kind of an agent, but not really; its agency is limited to just "here's your completion." The single agent with tools and so on is super cool, but with this multi-agent thing you have here, there are so many options for how to string all the things together, right? How do people decide what to do, and then what should they be doing once they've decided? — You know, that's a great question, Seth, and I'm going to talk about that here in a second, because I think there are a lot of different ways that you can fit multi-agent systems into the work that you're doing day-to-day. But I think the question that you posed so eloquently at the beginning is: okay, but how do I even get started with multi-agent systems? And that's where this awesome agent

Segment 9 (40:00 - 45:00)

framework comes into place. So, for a little bit of historical context: here at Microsoft, we had two different agent frameworks previously, AutoGen and Semantic Kernel. AutoGen came out of the Microsoft Research side of the house; Semantic Kernel came out of the product side of the house. Both had some great pros and cons. AutoGen was very lightweight; Semantic Kernel was really good for production workloads; but it was oftentimes confusing which one you should use and when. So what we've tried to do is bring the best of both worlds together into this Microsoft Agent Framework that you see here today. We're super excited to announce that and to be rolling it out to public preview this week. — I see. So the Agent Framework — if you could go back to the previous slide — is a combination of both AutoGen and Semantic Kernel together, like the good bits smooshed together in a sandwich. — Absolutely. I think that's a perfect way to describe it. We're trying to bring the lightweight prototyping nature of AutoGen — we're working really closely with some folks at Microsoft Research to bring the latest and greatest into Microsoft Agent Framework — but we also know that, especially in the enterprise, you need that heavy-duty production workhorse that is Semantic Kernel and all of the customization features that exist in there. So yeah: bringing the best of both worlds, to be able to prototype quickly but then also deploy to production. — Now, is this going to be available in, like, C# and Python, for both languages? — Yes, great question. Fully open source. We are 100% committed to open source, and on day one it will be in Python and C#, with hopefully other languages coming soon. — I love it. All right. What do we got next? — Awesome. So then, to answer the question that you brought up earlier, Seth — okay, but when do I use multi-agents, and where do I fit this into my job day-to-day? We have two different schools of thinking on it. One is agent orchestration, which is: hey, I just want a bunch of purely non-deterministic agents to work together to accomplish some sort of task. This can involve, as it says here, creative reasoning and decision-making, leveraging those agents to accomplish a task like we were talking about earlier. However, what we're really excited for here with Agent Framework is this workflow orchestration. If you're familiar at all with the Semantic Kernel process framework — I know we had a lot of fans of it — this allows you to bake multi-agents into existing workflows, into that process, to combine traditional business processes with these multi-agent solutions. And for me, this is really exciting because it gives you the best of both worlds, Seth: you're able to take whatever workflows you already have in your job — and I'm going to show an example of that here in a second — and then plug in agents where agents are really good. Because, as we know, LLMs are really good at coding, natural language, research, etc., but they struggle at other things, like maybe trying to string together too many data sources, or whatever it may be.
So this allows you to really tailor your solution to leverage the agents where they fit best, and also not have to completely reinvent the wheel and reinvent your business workflow. — Let me see if this makes sense — I've got to mute myself here. So what you're saying is that I can now look at a business process that I may have at work and actually model it as a collection of agents, in a process framework that allows you to literally get a bunch of agents to do this kind of work. Am I getting this right? — Yes, absolutely. And it also lets the agents interface with — hey, maybe I don't want to use an LLM for a certain task; I just want to use good old NumPy, or a Python package, or a C# package. Then you can just use that for the regular coding that doesn't invoke all the token costs. You can use your business processes, and then, when it's like, hey, I actually need some more thinking here, or a nice report written, you can bring in the LLM to help you do that. — And I love this. I mean, you know me, we work together pretty closely, and I'm a fan of not using LLMs when they're not needed. If you have already-existing business processes, like "file a claim", where you just need to call an endpoint, just do that — but allow an LLM to take unstructured text and, for example, put it into the form of a claim. Which is a really good idea. This is cool. So, do you have anything to show us that does something like this? — Absolutely. Let's jump in. And I know Seth is the king of demos and taught me everything I know about demos, so we're going to jump into some demos here. What we're going to do here is a workflow that I've put together to help venture capitalists. As I mentioned earlier, I live in San Francisco — a lot of VCs floating around. And so I thought, what's an existing event in the VC space that I could help automate, bringing agents into an existing workflow? What we're going to be looking at here — and this was actually an SVG file that I was able to generate using our Agent Framework — is a pitch

Segment 10 (45:00 - 50:00)

deck parser. Basically, I'm going to take a pitch deck that a startup has sent me as a venture capitalist and then do a little bit of research and analysis, plus a little "do I actually want to invest in this company, yes or no" based on the suggestion of the AI. And this is going to involve, as you can see here, both traditional processes — like getting the research prepped and analyzing a slide deck — and also some market and financial analyst agents, which you can see here in this sub-workflow. I'm going to run these at the same time to get optimal performance, if that makes sense. So here I'm actually going to go ahead. — There you go. So effectively you're modeling: what do I usually do when I look at, like, a thousand pitch decks? I always go through this process. This is the process — but this diagram is drawn by the process framework after you've set it up. And you might also pre-draw one and then have the process framework validate that the one you drew by pencil works as well. — Absolutely. You can do it either way. And I think that's what's so cool about LLMs these days too: you can really take images, give them to the model, and say, "Hey, turn this into code." We have these great visualization features, though. We also have Mermaid diagrams — I know a lot of people are big fans of Mermaid diagrams — and the framework will automatically generate those based on the steps you have in your framework. So that's really cool. — Let's look at the code. — Let's do it. And I'm actually, Seth, going to go ahead and kick it off here for folks. So here — this is the little UI I built. While it's running in the background, I'm going to jump into the code. I'm going to go ahead and upload this deck for Tri AI. Tri AI is kind of an Ironman training company: it uses AI to track your Strava and your Oura Ring and all your various devices, to give you the best training plan for your Ironman or whatever triathlon you've got coming up. — And so this is going to go through that entire process. — Exactly. So while that's running in the background, I'm going to show you guys exactly what's happening, step by step. And this is really simple. What I did is: I'm uploading a pitch deck file, and there's a Python library that lets you just pull all of the information out of a PowerPoint slide deck and turn it into JSON. And we know LLMs love JSON. So it moves super quickly, and then we feed it to an LLM in the next step. So you can see: extract all text content from a pitch deck. It verifies that there must be a PowerPoint file, does all that good stuff, and then hands it over to step two, which is our research prep agent. — So each of these steps, then, is modeled as a function inside Python, inside the Agent Framework. — Yes. And there are actually two ways to approach this — a simple way and a more complex way. The simple way is you can just model each of these as a function. If you want more customization and more control, you're also able to define classes to determine each step in your process. — I see.
Okay, this is cool. How do you wire these together? Where's the thing that says "start here" and then calls all these other things? — That's a great question. So I'll jump down to the end here, so we can see what that looks like in action. And this is very simple — very similar to a directed graph in a workflow process that people may be familiar with — where I'm just saying, hey, here's the start executor. An executor is basically our term for each individual node. And then, hey, I'm going to add this edge to go to the next step. Then I'm going to add an edge between this one and this one. And you can see I've got a whole bunch of edges here. And then, what's also interesting: depending on whether or not I like this company, I have these conditional edges. So if something is triggered — in this case, if I say, hey, yes, I want to invest in this company — it'll route to a specific part of the workflow, and if I say, no, I don't want to invest, it'll do completely different behavior. So it gives the user a lot of control. You can see that here in the diagram as well. I'll zoom in a little bit — it's kind of zoomed out. Oh, that was a little too far. It's like, hey, I have this human approver, and then I'm able to say yes — and then it goes to the approved handler — or no, and it drafts me a rejection email to nicely let them know. — Can you show me the human approver bit? How does it shell back out to the human in this case? — Yeah. So I'll go up, and let's find that human-in-the-loop stuff. — This is cool. By the way, if people are like, "Wow, there's a lot of code" — I am okay with this, because the reality is that these functions are pointing at things — here, I'll put our faces up while you find it — these functions could ostensibly be pointed at things that already exist in your systems, right? This is code that you have already written to do certain parts of the job. All right, so what are we looking at here?
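To ground the wiring Elijah describes, here is a minimal, self-contained sketch in the spirit of the preview's Python API: executors as decorated functions, a `WorkflowBuilder` with a start executor, plain edges, and conditional edges for the yes/no branch. The pitch-deck steps and the routing predicate are stand-ins invented for illustration, and preview names may still shift:

```python
# A minimal sketch of the demo's shape, modeled on the Agent Framework
# preview's Python workflow API. All executor bodies here are stubs.
import asyncio
from typing import Never

from agent_framework import WorkflowBuilder, WorkflowContext, executor

@executor(id="extract_deck")
async def extract_deck(path: str, ctx: WorkflowContext[str]) -> None:
    # Step 1 is plain deterministic Python (pptx -> text); no LLM tokens spent.
    await ctx.send_message(f"pitch text pulled from {path}")

@executor(id="analyze")
async def analyze(text: str, ctx: WorkflowContext[str]) -> None:
    # Stand-in for the research-prep / analyst agents in the demo.
    await ctx.send_message("INVEST" if "ai" in text.lower() else "PASS")

@executor(id="approved_handler")
async def approved_handler(verdict: str, ctx: WorkflowContext[Never, str]) -> None:
    await ctx.yield_output("drafting a term sheet")

@executor(id="rejection_email")
async def rejection_email(verdict: str, ctx: WorkflowContext[Never, str]) -> None:
    await ctx.yield_output("drafting a polite rejection email")

# The wiring Elijah shows: a start executor, plain edges, then conditional
# edges that route on the yes/no decision.
workflow = (
    WorkflowBuilder()
    .set_start_executor(extract_deck)
    .add_edge(extract_deck, analyze)
    .add_edge(analyze, approved_handler, condition=lambda v: v == "INVEST")
    .add_edge(analyze, rejection_email, condition=lambda v: v != "INVEST")
    .build()
)

async def main() -> None:
    events = await workflow.run("tri_ai_pitch.pptx")
    print(events.get_outputs())

asyncio.run(main())
```

In the real demo, the human approver sits between the analysts and the two handlers; that piece is elided here, since the exact human-in-the-loop request/response classes are shown only briefly on screen.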

Segment 11 (50:00 - 55:00)

— So yeah, exactly right, Seth — great call-out. So then we actually have this special class called an approval request, which is able to say, hey, do you approve this analysis, yes or no? And then there's also a routing decision. In my case it's very simple — just a simple yes/no; you could add some more complex logic here — but it sends this approval to the human and asks, hey, do you approve the startup analysis, yes or no? Then it sends that message back to this approval manager, which lets it know which of the next routes to take. And if we want to see this in action, I'm going to go back over to our demo. So it gives me this little "hey, do you approve the startup analysis?" — this is the report that the LLM actually generated. It keeps going: here's the overview; if I scroll down, it's got some competitors, recommendations for success, and then a conclusion that says it has the potential to redefine triathlon training, etc. So then, based on that, I can be like: you know what, actually, our fund is investing in fit tech right now, so I'm actually going to deny. Sad moment for our founders. I'm going to say no. — And then from there, the human decision — it'll run some logic and then hopefully give me — — a little note to the founders to let them know, like, hey, maybe not right now, but maybe in the future. — So, I have a sound effect for this. Hold on. Here we go. They do not get funding. Denied. This is cool. — And it's actually nice, because then I have a little rejection email: "Dear founders, after careful consideration, we've decided not to move forward right now." And I can just copy and paste that into my email and send it over to the founders. So, pretty cool stuff. — Dude, this is fantastic. By the way, I was looking at that graph stuff — I love computer-sciency things, and I don't know what the right word is, so I'm going to use this word: it looks delicious. Because it really puts into context: hey, you can make these agentic decisions coupled with decisions that you already have in your current processes, and you can even shell out to a user to make sure: hey, does this make sense? This is fantastic, dude. So where can people go to find out more? — So, we have an open-source repo on GitHub that everyone should go check out — Microsoft Agent Framework is what it's called. Blog posts and other developer content are coming really soon. And we also have a Discord, a pretty active Discord, that we'll link as well. So, a lot of great places for people to access more information about this and interact directly with the team.
I think that's the fun thing about working on an open-source project: the team is very much involved with user feedback. So if there's stuff that you like or don't like, please let us know in GitHub issues, on the Discord, etc., and that feedback will get baked right into the Agent Framework. — You know what, I am excited to get my fingers pruning in all of this goodness, because this is the stuff I think is really going to set development shops apart from people that are not using agentic AI. That thing you showed — I know it went really fast, but effectively you are literally automating a process that takes forever: I've got to open the pitch deck, I've got to create a summary. If there's a way to do that with your own processes, I think it's amazing. Thank you so much for being with us, my friend. — Of course. Thanks for having me, Seth. Pleasure being here. — And thank you so much for watching. Learn all about the new agent framework for next-gen multi-agent solutions with my friend Elijah. Thank you so much for watching, and hopefully we'll see you next time. Take care. How about them apples? That was cool. All right. There are some questions. Elijah couldn't come on, but Elijah's awesome — we'll have him on for some other stuff. Let me turn that down; my audio is a little hot. Okay. So, here's a question: is Agent Framework working with Foundry Local? I don't know. I think there are some rough edges, but I'm going to figure it out this week, because I actually have to. For those that don't know, I'm going to be at Azure DevSummit next week in Lisbon. Make sure you sign up — look it up. And this technology is also new, right? I have a talk on Monday at 3, Lisbon time, and then I have a workshop on Thursday. It's an all-day workshop where I'm just going to be explaining how I think about agentic

Segment 12 (55:00 - 60:00)

AI, and then we're going to get some practical stuff in there. So: hour talk, hour practical, hour talk, hour practical, for eight hours — with breaks, obviously, in between. And I wanted to get to the basic principles, because I don't know what the internet's going to be like, and I don't know how I'm going to give people access to LLMs. So I'm going to try to get Foundry Local to work with Agent Framework, so that if the internet's completely bad and nothing's working, people can work on their own boxes. Well, they'll have to install the Agent Framework package, right? They'll install the package, and then it's off to the races, working on this thing locally. And I feel bad, right? Because people will bring their laptops — hopefully they bring good laptops. But I know that my Surface laptop NPU with Phi-4 is really fast. It's actually insane how fast it is with an NPU, and I've got, like, a Qualcomm chip, right? It's insane. So I don't know, but I have to figure it out this week, amongst other things. There are two things I need to do this week: one is finish some product work for Ignite, which is coming up, and the other one is this thing. So we're going to get it done. Richard is asking: is this a live show? So, we record it. The way it works on the AI Show is that we actually record stuff beforehand and then we play it as live, but then we have people come on and answer questions. Generally, when you see the recording, it's the first time we've ever played it. In this case, because Agent Framework was newly released, we actually pre-released the video so people could watch it, and then I played it here to get some questions. Another question: a typical workflow is deterministic — how is that mapped to an AI workflow? The way I like to think about it is: the non-deterministic, language-to-process part you handle through the workflow. Anything that's deterministic, you handle through function calling, right? That calls the deterministic functions — and hopefully the LLM has decided to call the right function. And anything that needs approval has a human-approval step; that's what you saw Elijah do. So it's a combination of deterministic, stochastic, and human validation, all put together, which should make things a lot faster. I'll give you an example of something I hate doing. I think I'm an engineer more than a PM — I'm a PM dressed as an engineer. One of the things I hate doing is that we need to provide updates, like, almost every day, or once a week, or all the time. And I hate writing those things. You have no idea. I have to write updates once a week. It's very sad, and I don't want to write them. So I could actually talk to a thing, and hopefully it can write the updates for me. That's what I want. It's a deterministic process that I don't want to do. So maybe I just say: hey, read my calendar, look at my team's things, and write the update for me, please. I don't want to do it. Don't make me. Okay, that was the end of my little rant. Okay — "why is there a live show?" That's why: because we released a video. Another question: will this new framework be covered in some certification? I don't know. Oh my goodness — Early Dev, hello! How are you, my friend?
"Write durable functions with it." Yeah — here's the cool thing. Jan's got it exactly: that's the beauty of all of this stuff. And holy cow, we are almost out of time. I'm going to put on my walk-off music. The cool thing about this stuff — and something I've always preached, by the way, in the religion of LLMs — is that you should use them to enhance your code, not throw out your code to use them. That's the thing I've always said, and now we're getting to a place where we're having those things. Here's another one: would it be useful to have some example spec-kit

Segment 13 (60:00 - 62:00)

preconfigured sample projects using Agent Framework? Yes, Pablo — everyone's a barrel of good ideas today. "We are not out of time; internet time is unlimited." Yeah, you know, not going to lie, I think you may be right. You literally may be right. Okay. So, next time on the AI Show — by the way, we have new stuff; here, let me close all this other stuff, because I keep looking to the side, and I don't want to do that. Next time on the AI Show — and we are excited, because we want to grow up and look more professional. So look at this. Look at this beautifulness. You see that little AI drawing thing? Join us next time on the AI Show, October 13th: "Upgrade your voice agent with Azure AI Voice Live API" with Deb. She's awesome. She has her own channel, too — make sure you go watch that, because she puts out stuff faster than we do. We're more of a generalist AI show. And then "GPT-4.1 versus GPT-5" with Alexander Hughes. Alex has been on the show before; he's a good dude. So make sure you tune in for that. Let's see — any last-minute ones? "Creating a workflow with a mix of tools is not easy." No — that's what the workflow thing is supposed to help with. For example, things like n8n do a really good job of doing that kind of mixture, so maybe this is a good framework for doing that in code. But anyway, thank you so much for spending some time with us. We know your time is valuable, and we appreciate you being here with us on the AI Show. Thank you so much for watching, and hopefully we'll see you next time on the AI Show next week. Thank you so much. We'll see you next time. Yeah.
