# The COMPLETE TRUTH About AI Agents (2024)

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=HGS5LfyxLUU
- **Date:** 09.07.2024
- **Duration:** 45:46
- **Views:** 76,867

## Описание

Prepare For AGI With me - https://www.skool.com/postagiprepardness 
🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Checkout My website - https://theaigrid.com/

00:00 - Intro to AI agents
01:21 - Generic agent workflows explained
03:06 - Generic workflows for complex tasks
04:08 - Generic workflows improve GPT-3.5 & GPT-4
05:10 - Mixture of agents paper examples
08:13 - Open-source LLMs beat GPT-4 with mixture of agents
09:16 - Few real-world agent use cases currently
09:46 - No-code agent workflow demo in Cassidy AI
13:54 - Multi-On web browsing agent overview
16:27 - Rabbit AI device agent & early issues
19:22 - Google customer service agent demo
21:14 - Devin AI coding agent from Cognition Labs
22:20 - OpenAI agent to control user's computer
23:32 - Meta's engineering & monetization agents
24:59 - Imbue's reasoning-based coding agents
26:26 - Challenges making reliable real-world agents
29:38 - Hard for agents to do multi-step tasks with few errors
32:45 - Google: 1-2 years to multi-modal AI assistants
34:26 - Bill Gates on very knowledgeable AI agents
36:18 - Nvidia CEO on future AI agent teams in business
39:07 - Risks of fully autonomous AI agents
41:01 - Key agent parts: tools, memory, planning, actions
42:27 - Better language models vs prompts to improve agents
43:48 - User interface design key for agent workflows
44:13 - New agent workflows for coding tasks

Links From Today's Video:


Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos.

Was there anything I missed?

(For Business Enquiries)  contact@theaigrid.com

#LLM #Largelanguagemodel #chatgpt
#AI
#ArtificialIntelligence
#MachineLearning
#DeepLearning
#NeuralNetworks
#Robotics
#DataScience

## Contents

### [0:00](https://www.youtube.com/watch?v=HGS5LfyxLUU) Intro to AI agents

There's an insane buzz going on about the phrase "AI agent" right now, and I wanted to make this video because it's going to give you a complete rundown of what AI agents are and the actual truth about them, because a lot of people don't know this and the buzzword is getting a little out of hand. So let's take a look at what agents are, what they can do, which projects you can use today, and where we are in the grand scheme of real-world autonomous agent applications. To make this video easier, I want to define the term simply: an AI agent is basically just an advanced AI assistant. Think of it as someone you give a task to and say, "I need you to go out and autonomously execute this task, either by yourself, observing your environment and using certain tools, or by working in a team," and then accomplish the goal. The reason I'm making this video is that there's a lot of information most people don't know, some good and some bad, and this is the true rundown of actual agents. I'm going to start with this video of Andrew Ng describing AI agentic workflows. "Agentic workflow" is another term used to describe an AI that is allowed to follow a certain workflow to achieve a certain goal, and this

### [1:21](https://www.youtube.com/watch?v=HGS5LfyxLUU&t=81s) Generic agent workflows explained

is pivotal in terms of how you can use AI today. I think this is one of the most important pieces of content you'll ever see regarding AI agentic workflows that are applicable right now: "Many of us are used to using large language models with what's called zero-shot prompting, which means asking it to write an essay or a response to a prompt. That's a bit like going to a person and saying, 'Could you please write an essay on topic X by typing from start to finish, all in one go, without ever using backspace?' Despite the difficulty of writing this way (I can't write that way), LLMs do pretty well. In contrast, an agentic workflow is much more iterative. You may ask the LLM to write an essay outline, then ask whether it needs to do any web research and, if so, to search the web and fetch some information, then write the first draft, then read its own draft to see if it can be improved, and then revise the draft. With an agentic workflow, the LLM may do some thinking, do some research, then revise and do some more thinking, and this iterative loop actually results in a much better work product. If you think about using agents to write code as well, today we tend to prompt an LLM to write code, and that's like asking a developer to type out the program from the first character to the last and have it just run." So what Andrew Ng just described was the difference between a non-agentic workflow and an agentic workflow, and I'm going to reiterate: many people using ChatGPT just zero-shot prompt it. For the average user, that's going to be their main experience with LLMs, and that's completely fine. But if you're someone
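The draft, critique, revise loop Ng describes can be sketched in a few lines of Python. The `llm` function here is a toy stand-in for a real model call (an assumption for illustration; swap in any LLM API of your choice):

```python
def llm(prompt: str) -> str:
    """Toy stand-in for a real LLM call (replace with an actual API)."""
    if prompt.startswith("Draft:"):
        return prompt.split(":", 1)[1].strip()              # bare first draft
    if prompt.startswith("Critique:"):
        return "ok" if "detail" in prompt else "add detail"  # toy critic
    return prompt.split(":", 1)[1].strip() + " [detail added]"  # Revise:

def agentic_write(task: str, max_rounds: int = 3) -> str:
    draft = llm(f"Draft: {task}")         # first pass, like zero-shot
    for _ in range(max_rounds):           # iterative refinement loop
        critique = llm(f"Critique: {draft}")
        if critique == "ok":              # critic is satisfied; stop
            break
        draft = llm(f"Revise: {draft}")   # revise based on the critique
    return draft
```

With a real model behind `llm`, the same loop structure (draft, critique, revise until the critic is satisfied) is the whole "agentic workflow" idea.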

### [3:06](https://www.youtube.com/watch?v=HGS5LfyxLUU&t=186s) Generic workflows for complex tasks

who's watching this channel, you're definitely going to want to benefit from an agentic workflow. If you're trying to achieve any difficult or complex task, you're always going to want multiple steps so the AI system performs at a much higher rate. The reason you do this is that it dramatically improves the AI's output and allows it to achieve a higher degree of reasoning. If you don't believe me, take a look at Andrew Ng doing this with GPT-3.5, where it works surprisingly well, and agentic workflows make it work much better: "My team collected some data based on the coding benchmark called HumanEval, a standard benchmark released by OpenAI a few years ago that gives coding puzzles like this: given a non-empty list of integers, return the sum. It turns out that GPT-3.5, on the pass@k evaluation metric, got 48% right with zero-shot prompting: they just

### [4:08](https://www.youtube.com/watch?v=HGS5LfyxLUU&t=248s) Generic workflows improve GPT-3.5 & GPT-4

write out the answer. GPT-4 does way better, around 67% accurate. But it turns out that if you take GPT-3.5 and wrap it in an agentic workflow, it does much better, and GPT-4 with an agentic workflow also does very well. One thing I hope you take away from this is that while there was a huge improvement from GPT-3.5 to GPT-4, that improvement is actually dwarfed by the improvement from GPT-3.5 to GPT-3.5 with an agentic workflow. To all of you building applications, I think this suggests how much promise agentic workflows have." So overall, we saw on the benchmarks that using an agentic workflow helps, and all of these small dots describe different types of agentic workflows. Zero-shot is just prompting the model with a single prompt, and then there are different methods like Language Agent Tree Search, LDB, and Reflexion, all of these
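Most of that jump comes from wrapping the model in a generate, test, fix loop in the style of Reflexion: run the candidate code against checks and feed any failure back into the next attempt. A minimal sketch, with a toy `toy_generate` model invented for illustration:

```python
def solve_with_feedback(generate, checks, max_attempts=4):
    """Reflexion-style wrapper: generate code, run the checks,
    and feed any failure text back into the next attempt."""
    feedback = ""
    for _ in range(max_attempts):
        code = generate(feedback)           # model call; feedback may be empty
        try:
            namespace = {}
            exec(code, namespace)           # load the candidate solution
            for arg, expected in checks:
                assert namespace["f"](arg) == expected
            return code                     # all checks passed
        except Exception as e:
            feedback = f"Previous attempt failed: {e!r}"
    return None                             # gave up after max_attempts

# Toy "model": first attempt is buggy; feedback prompts the fixed version.
def toy_generate(feedback):
    return "def f(x):\n    return x * 2" if feedback else "def f(x):\n    return x + 2"

good = solve_with_feedback(toy_generate, checks=[(3, 6), (5, 10)])
```

The zero-shot baseline is one pass through `generate` with no feedback; the agentic version gets as many tries as the loop allows, which is why the measured accuracy climbs.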

### [5:10](https://www.youtube.com/watch?v=HGS5LfyxLUU&t=310s) Mixture of agents paper examples

different things being ways to prompt an LLM in some kind of feedback loop that gives you greater results, and these are things you can try today. There are papers detailing how to do them, ranging from quite hard to easy. For example, there was a recent paper I really want to show you, and I think it was one of the most interesting papers because of what it actually showed us: a paper called Mixture of Agents. If you've been paying attention within the AI community, this paper has been popping up on your radar. Basically, this is an agentic workflow that utilizes different AI models to refine your response many times, using different layers of agents. You can see there are essentially three layers, and for those of you who are non-technical, myself included, just think of it as using three different LLM systems and saying, "I want you to judge the response and evaluate it three different times." Think of it like a competition: you input your prompt, and the LLM agents evaluate it in round one, round two, and round three. Now, you might be thinking, okay, Mixture of Agents sounds cool, the judges refine the response each round and it continuously gets better, but the best part is the actual effect of this agentic workflow. The paper notes that, remarkably, this improvement occurs even when the auxiliary responses provided by the other models are of lower quality than what an individual LLM could generate independently. In other words, Mixture of Agents improves dramatically on a base model even if the models collaborating are weaker than the initial big model. The single models shown are different open-source models, such as WizardLM 8x22B and Llama 3, and with responses from other models they're able to improve on a single model that has more parameters and more raw capability. There was also something in this paper that I forgot to include but is really important: they state that Mixture of Agents, using only open-source LLMs, leads the AlpacaEval 2.0 leaderboard by a substantial gap, achieving a score of 65.1% compared to 57.5% by GPT-4o. You know how OpenAI touted GPT-4 as the smartest model? They managed to get a better benchmark result and increased reasoning capability using open-source LLMs that were substantially weaker than GPT-4. So overall, using this
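The layered structure of Mixture of Agents can be sketched generically: each layer's agents answer the prompt while seeing the previous layer's responses, and a final aggregator synthesizes the last layer's answers. The toy agents below are stand-ins for real models, not the paper's implementation:

```python
def mixture_of_agents(prompt, layers, aggregate):
    """Each layer's agents answer the prompt, seeing the previous layer's
    responses as context; a final aggregator synthesizes the last layer."""
    previous = []
    for agents in layers:                      # e.g. three layers of agents
        context = "\n".join(previous)          # what the prior round produced
        previous = [agent(prompt, context) for agent in agents]
    return aggregate(prompt, "\n".join(previous))

# Toy agents: each tags its name and how much context it saw.
def make_agent(name):
    return lambda prompt, ctx: f"{name}({prompt}|{len(ctx)})"

layers = [[make_agent("a"), make_agent("b")],
          [make_agent("c"), make_agent("d")]]
result = mixture_of_agents("q", layers, lambda p, ctx: f"final:{ctx}")
```

The key design point the paper makes is visible in the structure: later layers condition on the earlier answers, so even weak proposers contribute useful material for the aggregator to combine.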

### [8:13](https://www.youtube.com/watch?v=HGS5LfyxLUU&t=493s) Open-source LLMs beat GPT-4 with mixture of agents

agentic workflow managed to outcompete GPT-4o as a single model, and this is why the collaboration effect of agentic workflows is remarkably important. Next, we have AI agents you can actually use today. This is a product called Crew AI. Crew is a collaborative working system designed to enable various AI agents to work together as a team, efficiently accomplishing complex tasks. Each agent has a specific role, resembling a team composed of researchers, writers, and planners, and the main features of Crew AI include role-based agents, giving each agent a clearly defined role, and teamwork capabilities that enable agents to communicate, share task information, and assist each other. This is a tool you can run locally and many people use it, but it's something I don't like to recommend because it's not easy to teach to non-technical users; sometimes there are bugs, and there are different issues you can run into. Some people do use
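The role-based pattern Crew AI implements, distinct roles plus shared task information, can be sketched without the library itself. This is the pattern only, not Crew AI's actual API:

```python
class RoleAgent:
    """Minimal role-based agent: a role name plus a model function."""
    def __init__(self, role, model):
        self.role, self.model = role, model

    def work(self, task, shared_notes):
        # The agent sees the task plus everything teammates have shared.
        out = self.model(f"As {self.role}: {task}\nNotes: {shared_notes}")
        shared_notes.append(f"{self.role}: {out}")   # share with the team
        return out

def run_crew(task, agents):
    notes = []                      # shared memory the whole team can read
    last = None
    for agent in agents:            # e.g. researcher -> writer -> planner
        last = agent.work(task, notes)
    return last, notes

# Toy model just acknowledges; a real crew would call an LLM here.
toy = lambda prompt: "done"
last, notes = run_crew("draft a report", [RoleAgent("researcher", toy),
                                          RoleAgent("writer", toy)])
```

The shared `notes` list is the "teamwork" part: each agent's output becomes context for the next, which is what distinguishes a crew from running the same prompt through several independent models.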

### [9:16](https://www.youtube.com/watch?v=HGS5LfyxLUU&t=556s) Few real-world agent use cases currently

it, but I don't really see anyone using Crew AI for many real-world applications or use cases, and that's one of the main things about agents (I won't call it an issue): there aren't many real-world use cases you can use them for today. I personally use Cassidy AI, and I'm going to show you how, with no code and a simple prompt, you can get a simple agentic workflow you can use on a day-to-day basis. For example, I built this thing

### [9:46](https://www.youtube.com/watch?v=HGS5LfyxLUU&t=586s) No-code agent workflow demo in Cassidy AI

that discusses business ideas. My workflow input was "discuss a business idea"; agent one considers the business ideas and their strengths, agent two discusses them, and then the agents come to a conclusion. You might be thinking: how on earth did you build this whole thing, how did you chain together this workflow, how hard was it? This is why I love Cassidy AI. I'm not promoting it for any particular reason; it's just the only thing I've found on the internet that lets you build instant no-code agentic workflows. For example, I can simply describe the workflow I want in my own words: "I want an agentic workflow that utilizes three agents that discuss my business ideas and then come to a conclusion as to which one is the best long-term and why." Then I can simply click "create workflow." There are no fancy coding steps, I don't have to install anything, and it just takes a minute. The reason I like this so much is that we can use it in natural language, and I believe this is how the majority of software applications are going to be built in the future. While that sounds pretty crazy, getting better at prompting with natural language is not a bad idea at all. Within a couple of seconds it built the agentic workflow: agent one's analysis, agent two's analysis, agent three's analysis, and then a final discussion handled by GPT-4 that gives the best business idea long-term and why. So I can test this workflow right now with a business idea, and it will analyze it for long-term success. Let's say I have a YouTube channel in AI and I'm thinking about launching a private community that helps people prepare for AGI (I do have this community; I'm just using it as an example), or should I launch a skincare line instead? I'm wondering which business idea is best. The workflow has now finished, so I'm basically asking this AI system: do I launch the AGI community or the skincare line? For most of you the answer is obvious, you should launch a business related to AI, but in certain cases the answer might not be so obvious, and that's why we use an agentic workflow. Agent one analyzed the problem, agent two analyzed it and brought up many different points, agent three also analyzed it, and then we have the agent discussion. After carefully considering market demand, brand alignment, innovation, scalability, and so on, the conclusion is to focus on the AGI community. It says the best business idea for long-term success seems to be launching a private AGI preparation community, based on the listed reasons, and we now have all of the agents' different opinions and advice very quickly. This is why I like this software and why I'm telling you about it: it allows non-technical individuals to immediately start building AI agent workflows and using them day-to-day. That was Cassidy AI; you can click the link in the description if you want, but watch the rest of this video, because there's a lot more to cover. Now let's take a look

### [13:54](https://www.youtube.com/watch?v=HGS5LfyxLUU&t=834s) Multi-On web browsing agent overview

at some of the other AI agents that are actually autonomous and, I guess you could say, do things on your behalf, and this is where things start to get interesting. If you look at the AI agents you can actually use today, you'll see things aren't as good as you might think. MultiOn is an AI agent that can essentially be used in your browser to do very basic tasks. This isn't just an LLM chatbot; it's an agent that is able to browse the web and do many different things. On the right-hand side you can see an agent booking a restaurant reservation. These kinds of agents are essentially the most futuristic, because they can go out into the real world and achieve real tasks based on what a user wants. It's not a fancy demo, but the problem is that MultiOn doesn't have that much usability in terms of what it's able to do; it only handles really niche tasks, and it doesn't seem like they've made too much progress in that area. I'm not saying that to hate on MultiOn; I'm saying it because what you're about to see later in the video shows just how difficult AI agents are. What they've achieved is actually amazing, but in the grand scheme of things there are fundamental hurdles to getting AI agents to run around the web, clicking and completing tasks. You can also see the MultiOn browser here: you click the extension, and if you ask the agent in the chat, for example, "book a one-way flight to NYC from SFO for June 10th for the MultiOn hackathon," it is able to use Google and sometimes do that. It's your AI web copilot, and it's able to perform a Google search. There's also a button that lets you nudge the AI along, to say "yep, that's correct, go ahead," and sometimes it's able to act completely autonomously. But I do want to say that this is as far as it goes, because sometimes the AI will make mistakes, and that isn't really MultiOn's fault; there's a huge infrastructure problem that, like I said, I'll get to later in the video. This is basically going to be the future of AI, and I do think

### [16:27](https://www.youtube.com/watch?v=HGS5LfyxLUU&t=987s) Rabbit AI device agent & early issues

that it will change things. In terms of AI agents, we also have AI agent devices, which are a little stranger because they're basically AI assistants wrapped in hardware. I'm pretty sure everybody knows the Rabbit R1, a device heralded as currently the best AI agent. I do have to say hats off to Jesse Lyu, because when he created this he faced a remarkable level of scrutiny, albeit largely warranted, but the point here is that this AI agent is not without its faults. The launch video showed the agent doing a remarkable number of tasks in different scenarios; the only problem was that the reality of the device at release didn't live up to the hype. From the demo: "For example, if I want to plan a trip to London, I can just describe what I wish in full to R1: I want to take my family to London. It's going to be two of us and a child of age 12. We're thinking of January 30th to February 5th. Can you plan the entire trip for me? We like cheap non-stop flights, grouped seats, a cool SUV, and a nice hotel that has Wi-Fi." "Exploring ticketing options to make your trip a reality. For your trip I found various flight options, a range of hotels to choose from, and car rentals available. Please confirm each option individually for further details and booking." "So it's all been planned out; I just confirm, confirm, and that's it. Could you come up with a schedule for fun things to do while I'm over there?" "On it. I have prepared a detailed travel plan for your trip. In summary, you will be exploring London's iconic landmarks, visiting museums, enjoying delicious meals at local restaurants, and experiencing the vibrant atmosphere of different neighborhoods. Navigation details are also prepared, and I can help you book tickets for any events you'd like to attend." "R1 just planned the entire trip for me. That's awesome." I'm not going to play the entire demo; you've probably already seen the Rabbit R1 numerous times, and like I said, the only problem with the R1 device was the ton of early criticism. If you're going to start a business, that's something you of course have to be open to, but this shows us what AI agents are going to be like in the future as hardware devices. Now we get to the point where big companies are actually starting to implement AI agents in the real world. One of the questions I've seen floating around on Twitter is that nobody can seem to find a real-world use case of AI agents doing something reliably, so here is Google's recent demo showcasing their customer service agent helping a customer with certain inquiries

### [19:22](https://www.youtube.com/watch?v=HGS5LfyxLUU&t=1162s) Google customer service agent demo

in real time. From the demo: "I'm so close to having this shirt for the concert. Let's give the store a call." "Hi there, this is the Cymbal Fashion customer agent at South Las Vegas Boulevard. Am I speaking with Amanda?" "Yes, this is Amanda." "Great, thanks for reaching out, Amanda. I see you had a session on another device. I've sent you an SMS message with a link to our live chat companion; if you would like to switch to chat, please click the link. How can I help you today?" "I'd like to purchase the shirt in my cart with the card I have on file." "Absolutely. I see you're also a Cymbal Fashion Rewards member, and it looks like you have a 20% off voucher available. Would you like to apply it to this purchase?" "Yes please, that would be great." "The shirt you're purchasing goes well with these items, also available for pickup in your preferred size. Would any of these be interesting to you?" "Absolutely, please add the white shirt and the boots to my cart." "Great, your total is $23.76. Okay to proceed with the card on file?" "Yes." "Your purchase is confirmed. Do you need anything else today?" "No, I'm all set, thank you." "Incredible. Shopping with Cymbal Fashion, you'll get a text when the items are ready for you. In less than five minutes I was able to find and order the shirt I wanted and an outfit to match. I'm ready for the concert." So this is what Google is currently cooking up; there's a longer video that showcases all of Google's agents. But let's take a look at what some other companies are doing. One of the main demos you may have seen before is Devin, the software engineer agent. This is essentially an agentic workflow wrapped around GPT-4 that lets you benefit from the cognitive capabilities of GPT-4 and get hands-on with an AI software engineer

### [21:14](https://www.youtube.com/watch?v=HGS5LfyxLUU&t=1274s) Devin AI coding agent from Cognition Labs

agent. I know they don't use the term "agent," but that's essentially what it is. You can see the workflow: it has a planner, a shell and workspace, and it's able to go off and do tasks autonomously. Take a look, because this keeps coming up. From the demo: "I'm going to ask Devin to benchmark the performance of Llama on a couple of different API providers. From now on, Devin is in the driver's seat. First, Devin makes a step-by-step plan of how to tackle the problem. After that, it builds the whole project using all the same tools a human software engineer would: Devin has its own command line, its own code editor, and even its own browser. In this case Devin decides to use the browser to pull up API documentation, so it can read up and learn how to plug into each of these APIs. Here Devin runs into an unexpected error. Devin decides to add a debugging print statement, reruns the code with it, and then uses the error in the logs to figure out the fix. Finally, Devin decides to build and deploy a website with full styling as the visualization; you can see the website here. All of this is possible today because of the advancements we've made in both reasoning and long-term planning. It's a really hard problem, and we've only just started." That was Devin, Cognition Labs' software agent. Other companies are working on really big stuff too; for example, OpenAI is

### [22:20](https://www.youtube.com/watch?v=HGS5LfyxLUU&t=1340s) OpenAI agent to control user's computer

working on a form of agent to automate complex tasks by effectively taking over a customer's device. The customer could ask the ChatGPT agent to transfer data from a document to a spreadsheet for analysis, for instance, or to automatically fill out expense reports and enter them into accounting software. Those kinds of requests would trigger the agent to perform the clicks, cursor movements, text typing, and other actions humans take as they work with different apps, according to a person with knowledge of the effort. Essentially, OpenAI is working on an agent that can control your computer and do your work for you; that's basically what's going on here. So far we haven't actually seen OpenAI ship anything here, because it is very hard to get this done, as I'll show you in a minute, but this is what they're working on. OpenAI recently shared that their investment areas include multimodal AI agents: "Lastly, we'll continue to invest in enabling agents. We're extremely excited about the future of agents; we shared a little bit of that vision back in November at DevDay. Agents will be able to perceive and interact with the world using all of these modalities, just like human

### [23:32](https://www.youtube.com/watch?v=HGS5LfyxLUU&t=1412s) Meta's engineering & monetization agents

beings, and once again that's where the multimodality story comes into play. Imagine an agent being able to coordinate with multiple AI systems, but also securely access your data and, yes, even manage your calendar and things like that. We're very excited about agents. Devin, of course, is an amazing example of what agents can become: Cognition Labs has built this awesome software engineer that can code alongside you. It's able to break down complex tasks, browse documentation online, submit pull requests, and so on. It's really a glimpse into what we can expect from the future of agents." If you're wondering what Meta is up to, Meta is also developing AI agents that can complete tasks without human supervision, according to an internal post. They include an engineering agent to assist with coding and software development, similar to GitHub Copilot, according to the internal post and to current employees. The post also cites monetization agents, which one current employee said would help businesses advertise on Meta's apps, and these agents could be for both internal use and for customers, the employee said. So essentially Meta is working on the same thing. There's also a private company called Imbue that's working on AI agents, and they just recently raised another funding round to pursue this complex task. From Imbue: "When we're writing an essay, we're not just generating an essay and then turning it in. That's not

### [24:59](https://www.youtube.com/watch?v=HGS5LfyxLUU&t=1499s) Imbue's reasoning-based coding agents

how our best essays get written. We look at our essay, we critique it, we say, hmm, this section needs more research, I'm going to go do some research and come back and then rewrite it. That iterative process of looking at something, reasoning through what the issues might be, figuring out where I need to ask questions, what the goal is, maybe changing my approach: those are all required to actually accomplish goals in the world. At Imbue we're training large foundation models optimized for reasoning. On top of those models we build agents that we use to accelerate our own research, and these do more than just output something: they also iterate and reflect, figure out the next step to take, and then take it. We're starting with agents that code, because coding well requires complex reasoning and because that's the work we do every day, and it's only through serious use of these systems that we can really deeply understand how to improve the underlying reasoning models. What we're trying to do is get to reasoning models on top of which we can ultimately build agents we can trust, all sorts of agents. It's hard to see because we're so in it today, but AI models are like the very first electronic computer, which was just a calculator. What we're going to see is an explosion of enablement over the next 50 to 100 years. If we do things well, we'll have a world where we don't have to be glued to our screens anymore, where computers can help us thin the barrier between idea and execution." Now, after Imbue, I do want to talk about the difficulty of agents, because many people are wondering when AI agents are going to arrive, and by the

### [26:26](https://www.youtube.com/watch?v=HGS5LfyxLUU&t=1586s) Challenges making reliable real-world agents

looks of things, AI agents are a lot further away than people think. This is a recent clip from Dario Amodei speaking about the difficulty of AI agents and why they probably won't be here anytime soon: "If you want an agent to act in the world, that acting usually requires you to engage in a series of actions. You talk to a chatbot, it only answers, and maybe there's a little follow-up, but with agents you might need to take a bunch of actions, see what happens in the world or with a human, and then take more actions. You need to do a long sequence of things, and for that to actually work, the error rate on each of the individual steps has to be pretty low. If I'm a robot and I'm thinking, okay, I'm going to pick up this thing and walk over there and pick up that, say I'm building a house or something, there are probably thousands of actions that go into that. All of this is to say the models need to get more reliable, because the individual steps need very low error rates, and I think part of that will come from scale; we need another generation or two of scale before agents will really work." There's also Mustafa Suleyman, head of Microsoft AI and co-founder of Google DeepMind, saying the same thing: "It's still pretty hard to get these models to follow instructions with subtlety and nuance over extended periods of time. I think they can do it, and there are a lot of cherry-picked examples on Twitter and elsewhere that are impressive, but to really get them to do it consistently in novel environments is pretty hard. I think it's going to take not one but two orders of magnitude more training computation, so not GPT-5-scale but more like GPT-6-scale models. I think we're talking about two years before we have systems that can really take action." This is why I was saying earlier that companies like MultiOn and Imbue face a very difficult problem: while yes, you can code an agent as a startup, it's relatively hard to do, considering that a lot of the foundation models you're using simply don't have the scale to support the long-term agentic workflows and task planning needed to do these things successfully. He spoke about GPT-6 being when we could get agents, and this is further supported by OpenAI's GPT-6 trademark filing, which lists "testing artificial intelligence agents" among its purposes, something that is not in the trademark description for GPT-5. This further bolsters the claim that it could be around two and a half years before we get very effective agents that work at scale, and the reason this is most likely the case is that only a few companies can train models big enough to compete in an agent space that really works. So this is

### [29:38](https://www.youtube.com/watch?v=HGS5LfyxLUU&t=1778s) Hard for agents to do multi-step tasks with few errors

going to be a really interesting area for growth, because this is quite difficult to do. Suleyman continues: "When you ask a model to complete a sequence of actions, say, book a restaurant that you and I can go to on a certain day, the first action is to check availability in both of our calendars, so that's one function call. Reconciling the right moment is the second action. Making sure the restaurant has availability is another check. Then sign in so you can use the correct tool to book the right restaurant at the right time and put your credit card details down, having also checked that it's a place we both like, and so on. So that's four, five, six different sub-components just to produce that one quote-unquote action. To get that right, the model has to produce a perfect function call for each element, and do so in sequence; it can't be arbitrary. That's like saying it has to write a four-page document in response to one question that is exactly that document, not something approximate or similar. We all think these models are magic at the moment: they write beautiful poetry and creative copy, and they give you good answers that are sometimes grounded. But for each of those answers there is a wide range of correct answers it could have picked, tens, hundreds, maybe thousands, so it isn't producing one specific output where every single token is the correct answer. It's not there yet. To get that level of precision we have to scale up two more orders of magnitude. That's what has happened so far: over the last five orders of magnitude of Transformers, with every 10x of compute and data we get more precision. It's not just emergent capabilities; people say 'it's surprising we got these emergent capabilities,' but that's an anthropomorphic projection. They aren't surprising emergent capabilities; it's just more precise attention to the correct mapping between a prompt and an output. You're honing in on something more specific." Asked whether we will get narrow forms of actions in specific domains before GPT-6, he answers: "Yeah, definitely. There are some good actions today: you can see these orchestrators making good API calls at the right time. The question is whether they can do it with 99% accuracy, because if it's 80%, then one in five times it gets it wrong, and that's not usable for a consumer." I included that clip because it's the first time I saw it truly broken down how difficult it actually is to make AI agents work in different scenarios. Now we also need to discuss other companies' visions for the future of agents, starting with Demis Hassabis.
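The arithmetic behind Suleyman's 99% vs. 80% point is worth making explicit: per-step success probabilities multiply, so even a modest per-step error rate compounds quickly over a multi-step workflow. A minimal sketch, assuming independent steps with equal accuracy (the step counts are illustrative):

```python
def chain_success(per_step_accuracy: float, num_steps: int) -> float:
    """Probability that every step of a sequential agent workflow succeeds,
    assuming steps are independent and equally reliable."""
    return per_step_accuracy ** num_steps

# A 5-step booking flow like the one Suleyman describes:
print(f"{chain_success(0.99, 5):.3f}")   # ~0.951 -> usable
print(f"{chain_success(0.80, 5):.3f}")   # ~0.328 -> fails most of the time

# A "thousands of actions" robotics task, even at 99% per step:
print(f"{chain_success(0.99, 1000):.5f}")  # effectively never succeeds
```

This is why "80% accurate function calling" and "a usable consumer agent" are very different claims: reliability has to be measured against the whole chain, not a single call.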

### [32:45](https://www.youtube.com/watch?v=HGS5LfyxLUU&t=1965s) Google: 1-2 years to multi-modal AI assistants

Hassabis talks about Google's vision for AI agents, which I think will probably arrive a little earlier than OpenAI's: "For me the highlight was probably this thing called Project Astra that we showed, which is our vision for what I'm calling a kind of universal assistant or universal AI agent, and how it could help in your everyday life. The main key thing about it is its multimodality: the ability to understand all the different modalities we operate in, and being in the context. I think that's what was missing from the language agents: they didn't understand the spatial context and the environment you were in, and that limits their use. We always had this vision of something that could understand the world around you through vision, and eventually audio and all the other sensors as well. That's why we built Gemini, our series of large models, to be multimodal from the beginning, able to cope with any input. That's the vision we have for maybe the next year or two of smart assistants." You can see Google talking about a year or two, which is much shorter than the GPT-6 timeline. Bill Gates also speaks about agents in a recent interview where he discusses how this future is going to work; I previously did a 30-minute video on Gates's blog post about how AI agents are truly going to change everything. If, in around two and a half years, we have fundamentally different agents that can do things for you long-term on your computer, that is truly going to change the way things go. Agent mental

### [34:26](https://www.youtube.com/watch?v=HGS5LfyxLUU&t=2066s) Bill Gates on very knowledgeable AI agents

therapy, agent friends, agent girlfriends, agent experts, all driven by deep AI. The interviewer asks: "It seems like it will be useful in proportion to how much it knows about us, and I imagine at some point in the not-too-distant future all four of us will be asked if we want to turn on audio so our AI assistant can effectively listen to our whole life. I would think there'll be benefits to doing that, because we'll get good counsel, good advice. Do you think that's true, and will you turn it on when invited to?" Gates: "Well, computers today see every email message that I write, and digital channels are seeing all my online meetings and phone calls, so you're already disclosing a lot about yourself into digital systems. So yes, the value added of the agent, in terms of 'summarize that meeting' or 'help me with those follow-ups,' will be phenomenal. The agent will have different modes in terms of which of your information it's able to operate with, so there will be partitions, but for your executive-assistant agent, essentially, you won't exclude much at all from that partition." So basically Bill Gates is saying there won't be many details your main AI agent lacks, because the more detail you give it, the more useful it becomes. We also have Nvidia's CEO Jensen Huang laying out his vision for AI agents; I think his is one of the most insightful, because he has a lot of connections, and I can only imagine some of the conversations he's had. Huang: "All of these NIMs are experts that are now assembled as a team."
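Gates's "partitions" can be pictured as per-agent access scopes over your personal data: each agent is only allowed to read the categories you have granted it. A hypothetical toy, with the data categories and agent names invented for illustration:

```python
# Hypothetical sketch of Gates's "partitions": each agent has a scope
# listing which categories of personal data it may read.
# Categories, agents, and scopes here are invented for illustration.

USER_DATA = {
    "email":    ["inbox..."],
    "meetings": ["notes..."],
    "health":   ["records..."],
}

AGENT_SCOPES = {
    "executive_assistant": {"email", "meetings", "health"},  # "won't exclude much"
    "shopping_agent":      {"email"},                        # narrow partition
}

def read_data(agent: str, category: str):
    """Return the requested data only if it lies inside the agent's partition."""
    if category not in AGENT_SCOPES.get(agent, set()):
        raise PermissionError(f"{agent} may not read {category!r}")
    return USER_DATA[category]

print(read_data("executive_assistant", "health"))
try:
    read_data("shopping_agent", "health")
except PermissionError as e:
    print("blocked:", e)
```

The point of the sketch is just that "different modes" implies an explicit access-control layer between agents and your data, rather than one agent seeing everything by default.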

### [36:18](https://www.youtube.com/watch?v=HGS5LfyxLUU&t=2178s) Nvidia CEO on future AI agent teams in business

"So what's happening is that the application layer has changed: what used to be applications written with instructions are now applications that assemble teams of AIs. Very few people know how to write programs, but almost everybody knows how to break down a problem and assemble a team. Every company, I believe, will in the future have a large collection of NIMs. You bring down the experts you want and connect them into a team, and you don't even have to figure out exactly how to connect them: you just give the mission to an agent, to a NIM, and it figures out how to break the task down and who to give it to. That central agent, the leader of the application team if you will, breaks down the task and gives it to the various team members; the team members perform their tasks and bring them back to the team leader, who reasons about them and presents information back to you, just like humans do. This is in our near future." That is Nvidia's future vision for agents, and in another video Huang speaks about how agents are going to be collaborative in the workplace, working with other companies' agents: "Today most AIs are one-shot: you prompt them with something and they generate a recommendation instantly. In the future, AI will be multi-shot; it will be a reasoning-based system. Just as we plan through various complicated scenarios, it will do some planning itself. So you're going to have fast-thinking AIs like we currently have, and multi-step reasoning AIs coming along. These AIs will become increasingly agentic: AIs that use tools, AIs that work with other AIs, AIs that access information they have privileged, access-controlled access to. In the future, AIs are going to be like employees in our companies: our employees work with your employees, so our AIs will work with your AIs. We'll have consulting AIs and specialist AIs as well as generalist AIs." So that is basically a comprehensive look at the kinds of AI agents that exist and the holistic picture of AI agents. There's one last thing I want to show you, because it is a controversial opinion.
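The team-leader pattern Huang describes can be sketched in a few lines: one orchestrator decomposes a mission into sub-tasks, delegates each to a specialist "expert" agent, and aggregates the results. This is a toy under loose assumptions: the experts are stubs, and the decomposition is hard-coded where a real leader would ask a model to produce it.

```python
# Toy sketch of the "team leader" agent pattern: a leader breaks a mission
# into sub-tasks, hands each to a specialist expert, then aggregates results.
# In a real system each expert would wrap a model or microservice; here
# they are stubs, and the plan is fixed rather than model-generated.

def chip_design_expert(task: str) -> str:
    return f"[chip-design] completed: {task}"

def supply_chain_expert(task: str) -> str:
    return f"[supply-chain] completed: {task}"

EXPERTS = {
    "chip-design": chip_design_expert,
    "supply-chain": supply_chain_expert,
}

def team_leader(mission: str) -> str:
    # A real leader would use an LLM to decompose the mission and route it;
    # this fixed plan just illustrates the delegate-then-aggregate shape.
    plan = [
        ("chip-design", f"draft architecture for: {mission}"),
        ("supply-chain", f"source components for: {mission}"),
    ]
    results = [EXPERTS[expert](task) for expert, task in plan]
    return "\n".join(results)

print(team_leader("next-gen accelerator"))
```

The structural point is that the application is the roster plus the routing, not a program of instructions, which is exactly the shift Huang is describing.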

### [39:07](https://www.youtube.com/watch?v=HGS5LfyxLUU&t=2347s) Risks of fully autonomous AI agents

Well, not really a controversial opinion, but it goes against what the public believes about how we're going to get AI agents. One of the main things many people have discussed is autonomous agents: agents that do things on their own, over extended periods, for whatever purpose. Here Mustafa Suleyman breaks down why fully autonomous agents aren't exactly a good idea: "First of all, I don't think we're on a path towards fully autonomous, and I think that's actually quite undesirable; I think fully autonomous is quite dangerous. I got a lot of stick after my TED talk because I said the autonomous capability was dangerous and should be regulated, and I don't really care; I still think that if you have an AI that can formulate its own plans, come up with its own goals, acquire its own resources, and act completely independently of humans, then objectively speaking that is going to be more potentially risky than not. So I think about it as these narrow veins of autonomy, where you give it a specific goal and it has limited degrees of freedom to go off and act in some specific environment: automatically calling some API to check some registry for information, to observe some state, maybe writing something into a third-party API that is not yours, but again restricted to specific degrees of freedom, because I think the security risks here are significant. So I just think we should tread carefully on the autonomous piece, as opposed to the actions piece." That essentially gives you a decent overview of where we're going with agents. One of the most common things we see being built today is agents, and we've heard about them from a variety of speakers already.
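Suleyman's "narrow veins of autonomy" maps naturally onto a tool allowlist: the agent may only invoke actions that have been explicitly permitted, and everything else is refused. A hypothetical sketch; the tool names are invented for illustration, not from the video:

```python
# Hypothetical sketch of "narrow veins of autonomy": the agent can only
# invoke tools on an explicit allowlist; anything else is refused.
# Tool names are invented for illustration.

ALLOWED_TOOLS = {
    "check_registry",   # read-only lookup, as in Suleyman's example
    "observe_state",
}

class ToolNotPermitted(Exception):
    pass

def invoke_tool(name: str, **kwargs):
    """Dispatch a tool call only if it lies within the agent's degrees of freedom."""
    if name not in ALLOWED_TOOLS:
        raise ToolNotPermitted(f"agent is not permitted to call {name!r}")
    # A real implementation would dispatch to the actual tool here.
    return {"tool": name, "args": kwargs, "status": "ok"}

print(invoke_tool("check_registry", entry="dns-record"))
try:
    invoke_tool("transfer_funds", amount=1000)  # outside the allowed degrees of freedom
except ToolNotPermitted as e:
    print("refused:", e)
```

The design choice is that autonomy is bounded structurally, by what the agent *can* call, rather than behaviorally, by what it is asked not to do.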

### [41:01](https://www.youtube.com/watch?v=HGS5LfyxLUU&t=2461s) Key agent parts: tools, memory, planning, actions

So I'm not going to go into too much of a deep overview, but at a high level an agent is using a language model to interact with the external world in a variety of forms. Tool usage, memory, planning, and taking actions: that's the high-level gist. The simple form of this you can think of as just running an LLM in a for loop: you ask the LLM what to do, you go execute that, then you ask it what to do again, and you keep doing that until it decides it's done. Today I want to talk about some of the areas I'm really excited about, where we see developers spending a lot of time taking this idea of an agent and making it production-ready and real-world: really the future of agents, as the title suggests. There are three main things I want to talk about, and we've touched on all of them in some capacity already, so it's a great roundup: planning, the user experience, and memory. For planning, Andrew covered this really nicely in his talk, but the basic idea is that when you run the LLM in a for loop, there are often multiple steps it needs to take, so you're implicitly asking it to reason and plan about what the best next step is, see the observation, resume from there, and think about what the next best step is.
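The "LLM in a for loop" framing can be written out directly. A toy sketch with a stubbed model: `fake_llm` stands in for a real chat-completion call, and the tool execution is a placeholder.

```python
# Toy sketch of "an agent is an LLM in a for loop": ask the model for the
# next action, execute it, feed the observation back, stop when it's done.
# `fake_llm` is a stand-in for a real model call; tools are placeholders.

def fake_llm(history: list[str]) -> str:
    # A real implementation would send the history to a chat model and
    # return its chosen action; this stub follows a fixed script.
    script = ["search: agent papers", "summarize", "done"]
    return script[len(history)]

def execute(action: str) -> str:
    # Placeholder for actually running a tool and returning an observation.
    return f"observation for {action!r}"

def run_agent() -> list[str]:
    history: list[str] = []
    while True:
        action = fake_llm(history)      # ask the LLM what to do next
        if action == "done":            # the model decides when to stop
            return history
        history.append(execute(action)) # execute, record the observation

print(run_agent())
```

Everything that follows in the talk (planning, UX, memory) is about hardening exactly this loop for real-world use.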

### [42:27](https://www.youtube.com/watch?v=HGS5LfyxLUU&t=2547s) Better language models vs prompts to improve agents

Right now, language models aren't really good enough to do that reliably, so we see a lot of external papers and prompting strategies enforcing planning in some way, whether that's explicit planning steps up front or reflection steps at the end to check that everything was done correctly. The interesting question, thinking about the future, is whether these prompting strategies and architectures continue to be things developers build, or whether they get built into the model APIs, as we heard Sam talk a little about. The reason that was so interesting to me is that these papers are essentially Tree of Thoughts, which is basically a way to get an AI to think through a problem better, and reflection, where you get an AI to reflect back on its own output: it writes something, then you say "you just wrote this; what do you think about it, and how could you improve it?" He's saying that future models might have these systems baked into the actual model, so it wouldn't be something we have to prompt the AI to do ourselves, which is why how they manage to improve the reasoning capabilities is going to be really intriguing. Further on in the video he also talks about how important the user interface is going to be.
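The reflection step described above is just a second model call that critiques and revises the first draft. A minimal sketch, where `fake_llm` is a stub standing in for a real model call:

```python
# Sketch of a draft-then-reflect prompting strategy: generate an answer,
# then ask the model to critique and improve its own output.
# `fake_llm` is a stub; a real version would call a chat model twice.

def fake_llm(prompt: str) -> str:
    if prompt.startswith("Critique and improve"):
        return "Revised draft (clearer, with the missing edge case handled)."
    return "First draft."

def answer_with_reflection(question: str) -> str:
    draft = fake_llm(question)  # first pass: produce a draft answer
    reflection_prompt = (
        "Critique and improve the following answer.\n"
        f"Question: {question}\nAnswer: {draft}"
    )
    return fake_llm(reflection_prompt)  # second pass: revise the draft

print(answer_with_reflection("Explain agent planning."))
```

The open question from the talk is whether this second pass stays in developer code like this, or gets absorbed into the model API itself.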

### [43:48](https://www.youtube.com/watch?v=HGS5LfyxLUU&t=2628s) User interface design key for agent workflows

He talks about Devin, which is why I said Devin would be brought up quite a lot in this video, and he shows the proposed AlphaCodium flow, a new workflow designed to get better coding results. What he explains here is that this new agentic workflow delivers much better results than what came before.

### [44:13](https://www.youtube.com/watch?v=HGS5LfyxLUU&t=2653s) New agent workflows for coding tasks

He then talks about future agents having memory, and this workflow right here is pretty crazy stuff: "Another aspect of this is the importance of flow engineering, a term I heard come out of the AlphaCodium paper. It achieves state-of-the-art coding performance not necessarily through better models or better prompting strategies, but through better flow engineering: explicitly designing this kind of graph or state-machine-type thing. One way to think about it is that you're offloading the planning of what to do onto the human engineers, who do it at the beginning, and relying on that as a bit of a crutch." So basically they're asking: how do we get the AI to think in steps, and what are the best steps it can take to generate the most efficient code? You can see the flow has about ten different steps before it generates a final solution, and it's really interesting how different steps and different ways of interacting with a base model can increase its capabilities, which is why a lot of people think future models are going to get so much better. Let me know what you thought of this video: whether it updated your timelines on agents, and whether it helped you understand fundamentally what AI agents are and what they can do. There's a lot of "this AI agent, that AI agent" information out there, and I wanted to clear the air, because there's a lot of hype but also a lot of substance, and it's important to differentiate between the two and understand what's coming in the future.
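Flow engineering as described, a hand-designed graph of stages rather than a free-running loop, can be sketched as an explicit pipeline. The stage names below are invented for illustration and are not the actual AlphaCodium stages; `fake_llm` is a stub for a real model call:

```python
# Sketch of "flow engineering": the human engineer fixes the sequence of
# stages up front, and the model is only asked to fill in each stage.
# Stage names are illustrative, not the actual AlphaCodium flow;
# `fake_llm` is a stub for a real model call.

def fake_llm(stage: str, context: dict) -> str:
    return f"<{stage} output>"

STAGES = [
    "restate_problem",
    "generate_tests",
    "draft_solution",
    "run_tests_and_repair",
    "final_solution",
]

def run_flow(problem: str) -> dict:
    context = {"problem": problem}
    for stage in STAGES:
        # Each stage sees the accumulated context. The planning lives in
        # the fixed STAGES list, not in the model -- that is the "crutch"
        # of offloading planning to human engineers.
        context[stage] = fake_llm(stage, context)
    return context

result = run_flow("two-sum")
print(result["final_solution"])
```

Contrast this with the for-loop agent earlier in the talk: there the model chooses the next step at runtime; here the humans chose the steps at design time, and the model only fills them in.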

---
*Source: https://ekstraktznaniy.ru/video/14196*