The First Look At Google's AI Agents Is Here... (Google Jarvis AI Agent)

TheAIGRID · 28.10.2024 · 35,840 views · 650 likes


Video description
Prepare for AGI with me - https://www.skool.com/postagiprepardness
🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Checkout My website - https://theaigrid.com/

0:00 Project Introduction
0:55 Jarvis Revealed
2:15 Agent Development
3:44 Reasoning Capabilities
7:10 Return Demo
9:54 Virtual Teammate
12:37 Trip Planning
14:25 Vacation Assistant
17:23 Technical Details
19:05 Privacy Concerns

Links From Today's Video: https://www.theinformation.com/articles/google-preps-ai-that-takes-over-computers?rc=0g0zvw

Welcome to my channel where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos. Was there anything I missed?

(For Business Enquiries) contact@theaigrid.com

#LLM #Largelanguagemodel #chatgpt #AI #ArtificialIntelligence #MachineLearning #DeepLearning #NeuralNetworks #Robotics #DataScience

Table of contents (9 segments)

Project Introduction

So, in this video, I'm going to be diving into the real details of Google's AI agents. As many of you guys know, 2025 is pretty much going to be all about AI agents. And Google has genuinely been on fire when it comes to developing multiple different products. And it seems like the majority of people have forgotten just how good Google are. So, in this video, I'm going to be showing you guys exactly what Google's AI agents are going to do, some actual demos of what Google showcased earlier this year, and what Google Gemini's agents will probably look like. So earlier today, we got this article that basically says Google is developing artificial intelligence that takes over a person's web browser to complete tasks such as gathering research, purchasing a product, or booking a flight, according to people with direct knowledge of the

Jarvis Revealed

product. Now, interestingly enough, it says that the product, code-named Jarvis, is similar to one Anthropic announced earlier this week, these people said. So, essentially, we have gotten confirmation that Google is now working on AI agents. And I think this is rather fascinating, considering that it was early last year we got the news that, unfortunately, some people had actually left Google to start their own AI agent startup. But we can see here that one of the key distinctions of this AI agent is that it's going to be doing research in the browser, purchasing products, booking flights, and it's going to be largely embedded directly within the Google ecosystem, quite likely with Google Gemini. Now, of course, it says Google plans to preview the product, also known as a computer-using agent, as early as December, alongside the release of its next flagship Gemini large language model, which would help power the product, two of the people said, and those plans are tentative and could change. Now, of course, this is a leak, which means the information could be subject to change, but Google have released a ton of different videos in which they actually showcase what agents are going to be like. I'm going to show you guys that later in the video, because I think a lot of people did miss

Agent Development

that. Now, as early as December I think could definitely happen, but I do also think it will depend on what other companies manage to release. The thing is, right now in the AI space there isn't a definitive leader in AI agents. Anthropic did recently release theirs, which was really good in terms of showing us what the future of agents is going to look like. But right now there isn't a definitive product that takes the cake in terms of AI agents. Right now with AI agents, we are roughly at the stage of GPT-2, where the models can barely do things that well. This isn't to take away credit from what these companies have done, just to show that we're really at the beginning of this entire development cycle. And of course, it also states that this would be alongside the release of the next Gemini large language model. So, it will be interesting to see if they choose to release this at the same time as Gemini or delay it to a later date. I wouldn't mind if Google delayed this or released it earlier. All I would hope is that Google release this in a state that actually gives them good PR, because one of the things I've seen time and time again is that news organizations are quick to run headlines that simply exaggerate issues with Google's AI. Now, it also talks about how Google is still developing an AI with so-called reasoning capabilities, which OpenAI launched in September after hiring a researcher who helped invent the reasoning method at Google in 2022. If

Reasoning Capabilities

you're not familiar with what they're referring to: if you remember, the recent paradigm we have shifted to is one of test-time compute. And I would bet that Google are now staking their chips on this paradigm because it seems really, really promising. If you've ever used OpenAI's o1 model, it takes a little bit longer to respond, but those responses are far more in-depth and contain a lot more knowledge. It's something that I personally think is underrated, considering that most people don't really have use cases for these models. But when doing advanced research and advanced brainstorming with these models, they can prove to be truly incredible. Now, with the multiple breakthroughs Google have managed before, I think if Google can tackle this problem and develop an AI reasoning model that is really, really good, they can definitely get to the level of OpenAI or potentially even surpass them. So now I can see here that it says that Google's agent, similar to the one Anthropic launched, responds to a person's commands by capturing frequent screenshots of what's on their computer screen and interpreting the screenshots before taking an action like clicking on a button or typing into a text field, two of the people said. So, if you haven't seen Anthropic's recent demo where they showcase these agents (most people would have seen it by now), it's basically got this system where it screenshots anything on your computer and then takes the next step. So, it'll screenshot what's going on on your computer and then decide, okay, I need to click here. And then from that screenshot, it will make the next step. And that's how the agent works. Sometimes you'll have to go in and manually change certain things. Of course, sometimes these models can still make mistakes, but I do think these will be ironed out in the future.
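The loop described above is: capture a screenshot, have the model interpret it, take one action, repeat. Here's a rough sketch in Python of what such a screenshot-driven loop could look like. To be clear, every name here is my own invention for illustration; this is not Google's or Anthropic's actual code, and the "screenshots" are just placeholder strings standing in for real screen captures and a real model call.

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # e.g. "click", "type", "done"
    target: str = ""   # e.g. a button label or text-field name
    text: str = ""     # text to type, if kind == "type"

def next_action(screenshot: str, goal: str) -> Action:
    """Stand-in for the model call: given the current screenshot and the
    user's goal, decide the next UI action. Faked here for the sketch."""
    if "Return form" in screenshot:
        return Action("done")
    return Action("click", target="Return item")

def run_agent(goal: str, max_steps: int = 10) -> list[Action]:
    """The core loop: observe screen -> decide action -> act -> re-observe."""
    history = []
    screen = "Order page"           # stand-in for a real screenshot
    for _ in range(max_steps):
        action = next_action(screen, goal)
        history.append(action)
        if action.kind == "done":
            break
        screen = "Return form"      # stand-in for re-capturing the screen
    return history
```

The important part is the shape of the loop, not the fake decisions: the agent only ever sees the latest screenshot, so each step is decided from scratch, which is also why mistakes mid-task can require manual correction.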
So, this basically shows you how exactly this is going to work. Now, I am really intrigued to see how this kind of agent works in combination with the other agents that Google is already building, because, like I said before, when I show you some of the demos Google have already showcased earlier this year at Google I/O, they seem a lot smoother and a lot more integrated with the traditional browser. So you can see right here that it says the key difference between the two companies' agents is that Anthropic has said its product can operate different applications installed on a person's computer, while Jarvis can only operate a web browser and has been tailored to Google's Chrome browser, the two people said. So I think this is something that is going to fundamentally change the internet, as we're able to get individual agents that can do work in your browser. Of course, having them eventually extend to your computer is going to be really crazy, but having them in your browser does seem like it would be a little bit easier to do because, of course, there are fewer things that could go wrong. And if you have something that works in the browser, it's like a contained environment where you can control certain variables. So, that key difference should be noted for those of you thinking you're going to use this Jarvis AI agent across your whole computer. Unfortunately, that won't be the case. Now, it also gives us more details about Jarvis by stating that Jarvis, at least for now, targets primarily consumers who want to automate everyday web-based tasks. And of course, this is where it refers to the example earlier this year where Sundar Pichai suggested that a future version of Google's Gemini could take several actions on its own to help someone return a pair of shoes. And

Return Demo

this is the demo that I'm going to show you guys now, because this is the one that I personally think, if we did get this in December, would mark a real change. Imagine if Gemini could do all the steps for you: searching your inbox for the receipt, locating the order number from your email, filling out a return form, and even scheduling a pickup. That's much easier, right? Let's take another example that's a bit more complex. Say you just moved to Chicago. You can imagine Gemini and Chrome working together to help you do a number of things to get ready: organizing, reasoning, synthesizing on your behalf. For example, you will want to explore the city and find services nearby, from dry cleaners to dog walkers. You'll have to update your new address across dozens of websites. Gemini can work across these tasks and will prompt you for more information when needed, so you're always in control. That part is really important. As we prototype these experiences, we are thinking hard about how to do it in a way that's private, secure, and works for everyone. These are simple use cases, but they give you a good sense of the types of problems we want to solve by building intelligent systems that think ahead, reason, and plan, all on your behalf. The power of Gemini, with multimodality, long context, and agents, brings us closer to our ultimate goal: making AI helpful for everyone. So when we look at this kind of thing that Google is able to do, I've got to be honest with you guys, it does seem like Google's going to be able to execute immediately. People are forgetting that Google have this entire ecosystem: Gmail, a lot of different products, and the Chrome browser, which is pretty much what everybody uses. So, if Google can actually manage to leverage their existing platform and embed an AI agent that works flawlessly, as we're seeing here, it's going to be something that works tremendously well for many existing customers.
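The shoe-return example above decomposes into a fixed chain of steps: search the inbox, extract the order number, fill the return form, schedule a pickup. As a toy illustration of that kind of plan-then-execute structure (the step names and handler interface are entirely invented; this is not a Google API), it could be driven like this:

```python
# The shoe-return demo, written as an explicit task plan. Stopping on the
# first failed step mirrors the "you're always in control" point: the agent
# hands back to the user instead of pushing through a broken step.
RETURN_PLAN = [
    ("search_inbox", "find the purchase receipt"),
    ("extract", "locate the order number in the email"),
    ("fill_form", "complete the retailer's return form"),
    ("schedule", "book a pickup for the package"),
]

def execute_plan(plan, handlers):
    """Run each step via a caller-supplied handler function; record results
    and stop as soon as a handler reports failure."""
    results = []
    for step, description in plan:
        ok = handlers[step](description)
        results.append((step, ok))
        if not ok:
            break
    return results
```

A real agent would derive the plan dynamically from the request rather than hard-code it, but the control flow, a plan of small verifiable steps with an early exit back to the user, is the core idea the demo is selling.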
Everyone I know pretty much uses Google Chrome, and it's going to be really easy to implement this into the entire ecosystem. So, I wouldn't be surprised if Google manages to gain some ground in this area. Now, of course, we do have this example, which I think is also really cool. This was something that Google demoed earlier this year as well, which is of course your AI teammate. Now, this I think is incredible because it's essentially an AI teammate that you can plug and play, much like how OpenAI had their autonomous agents that were able to do different tasks for you in certain cycles. And I think, considering the fact that Google, like I said before, has this AI ecosystem, it's going to be really easy to add team members that you can chat with and have do different things

Virtual Teammate

with. Well, here's one way. We're prototyping a virtual Gemini-powered teammate. This teammate has an identity, a Workspace account, along with a specific role and objective. Let me bring Tony up to show you what I mean. As you can see, the teammate has his very own account, and we can go ahead and give it a name. We'll do something fun, like Chip. Chip's been given a specific job role with a set of descriptions on how to be helpful for the team. You can see that here. And some of the jobs are to monitor and track projects, to organize information and provide context, and a few more things we've listed out. Now that we've configured our virtual teammate, let's go ahead and see Chip in action. To do that, I'll switch us over here to Google Chat. First, when planning for an event like I/O, we have a ton of chat rooms for various purposes. Luckily for me, Chip is in all of them. To quickly catch up, I might ask a question like, "Anyone know if our I/O storyboards are approved?" Because we've instructed Chip to track this project, Chip searches across all the conversations and knows to respond with an answer. There it is. Simple, but very helpful. Now, as the team adds Chip to more group chats, more files, more email threads, Chip builds a collective memory of our work together. Let's look at an example to show you. I'll switch over to a different room. How about Project Sapphire over here? And here we are discussing a product release coming up. And as usual, many pieces are still in flight. So, I can go ahead and ask, are we on track for launch? Chip gets to work, not only searching through everything it has access to, but also synthesizing what's found and coming back with an up-to-date response. There it is: a clear timeline, a nice summary. And notice, even in this first message here, Chip flags a potential issue the team should be aware of. Because we're in a group space, everyone can follow along.
Anyone can jump in at any time, as you see someone just did, asking Chip to help create a doc to help address the issue. A task like this could take me hours, dozens of hours. Chip can get it all done in just a few minutes, sending the doc over right when it's ready. And so much of this practical helpfulness comes from how we've customized Chip to our team's needs and how seamlessly this AI is integrated directly into where we're already working. Back to you, Aparna.
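The two ingredients the demo highlights are a configuration (name, role, list of jobs) and a collective memory that grows as Chip is added to more rooms. A minimal sketch of that idea, with field names and the toy keyword-overlap retrieval entirely my own invention (the real product would search Workspace data and synthesize an answer with an LLM), might look like:

```python
# Hypothetical configuration for a Chip-like virtual teammate.
chip = {
    "name": "Chip",
    "role": "Project tracker for the I/O team",
    "jobs": [
        "monitor and track projects",
        "organize information and provide context",
    ],
    "memory": [],  # grows as Chip is added to more chats, files, threads
}

def remember(teammate, room, message):
    """Record a message the teammate has seen, tagged by room."""
    teammate["memory"].append((room, message))

def answer(teammate, question):
    """Toy retrieval: return remembered messages sharing any word with the
    question. A stand-in for real search plus LLM synthesis."""
    words = set(question.lower().split())
    return [msg for room, msg in teammate["memory"]
            if words & set((room + " " + msg).lower().split())]
```

The design point the demo makes is that the value comes from scope: because the teammate sits in every room, a question asked in one room can be answered from context gathered in all of them.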

Trip Planning

Thanks. And of course, considering the fact that most people already use Chrome to plan different journeys and trips, how hard do you think it's going to be for Google to say, if you want to plan a trip, just try it with Google's Gemini agent? Or, for example, someone might even search, "I want to book a trip to Miami for Labor Day," and then Google just gives them an automated suggestion based on the user details on their account. So you can tailor things like: okay, I only eat vegan food, I like being out in the sun, I like these kinds of restaurants. You could already plug and play those things, and your agent would be able to tailor responses that are really specific to you, saving you so much time. Your AI agent could have every single detail, every single dietary requirement, everything about your entire life, so that it could pick the perfect personalized journey for you. I personally think that is going to be something immeasurable in terms of its value. Now, we all know that chatbots can give you ideas for your next vacation, but there's a lot more that goes into planning a great trip. It requires reasoning that considers space, time, logistics, and the intelligence to prioritize and make decisions. That reasoning and intelligence all come together in the new trip planning experience in Gemini Advanced. Now, it all starts with a prompt. Okay, so here we go: we're going to Miami. My son loves art. My husband loves seafood. And our flight and hotel details are already in my Gmail inbox. Now, there's a lot going on in that prompt. Everyone has their own things that they want to do. To make sense of these variables, Gemini starts by gathering all kinds of information from Search and helpful extensions like Maps and Gmail. It uses that data to create a

Vacation Assistant

dynamic graph of possible travel options, taking into account all of my priorities and constraints. The end result is a personalized vacation plan presented in Gemini's new dynamic UI. Now, based on my flight information, Gemini created a two-and-a-half-day itinerary, and you can see how Gemini uses spatial data to make decisions. Our flight lands in the late afternoon, so Gemini skips a big activity that day and finds a highly rated seafood restaurant close to our hotel. Now, on Sunday, we have a jam-packed day. I like these recommendations, but my family likes to sleep in, so I tap to change the start time. And just like that, Gemini adjusted my itinerary for the rest of the trip. It moved our walking tour to the next day and added lunch options near the street art museum to make the most of our Sunday afternoon. This looks great. It would have taken me hours of work checking multiple sources and figuring out schedules, and Gemini did this in a fraction of the time. Now, there's also a little bit more information about this AI agent. They state that the agent currently operates relatively slowly because the model needs to think for a few seconds before taking each action, according to two people with direct knowledge of the product. Now, essentially, what this tells me is something I didn't realize at first, but reading it a second time, it's become more apparent. It says that the agent operates relatively slowly because the model needs to think for a few seconds before taking each action. So if this leak is true (and remember, it's a leak, it's speculation; we have no idea if this is factual or not), then considering what OpenAI has done and what other companies are going to do, I would err on the side of this being relatively true. But that would mean that Google have developed a custom model to run in the browser that is a thinking model, essentially a reasoning model, built for this task.
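The itinerary behavior in the demo above is essentially constraint-based selection: score candidate activities against the family's stated preferences and fit them into the time actually available. As a toy sketch of that idea (the data, scoring, and greedy selection are all invented; the real product builds a much richer graph with spatial and scheduling data), it could look like:

```python
# Pick activities for one day: prefer those matching the family's interests,
# and only keep what fits in the hours available. A late-landing flight means
# few hours, so a big activity gets skipped in favor of a nearby dinner,
# mirroring the demo's arrival-day behavior.
def plan_day(candidates, preferences, hours_available):
    """candidates: list of (name, tags, duration_hours).
    Greedily take the highest-scoring activities that still fit."""
    scored = sorted(candidates, key=lambda c: -len(set(c[1]) & preferences))
    day, used = [], 0.0
    for name, tags, duration in scored:
        if used + duration <= hours_available and set(tags) & preferences:
            day.append(name)
            used += duration
    return day

# Example inputs: son loves art, husband loves seafood.
prefs = {"art", "seafood"}
options = [
    ("street art museum", {"art"}, 3.0),
    ("seafood restaurant near hotel", {"seafood", "dinner"}, 2.0),
    ("walking tour", {"history"}, 4.0),
]
```

With only two hours left on arrival day, the three-hour museum visit doesn't fit and the nearby restaurant does; give it a full day and both preferred activities get scheduled. That is the kind of prioritize-and-fit reasoning the demo is showing off.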
I don't think you'd be able to use the standard Gemini for this task, but they probably have a specialized version of Gemini working on it. I mean, if the model is thinking and it's operating relatively slowly, then this would be that kind of model. So it would mean that maybe Google could potentially use this model for other use cases as well, but it does show us that Google isn't completely behind, as many people would have you believe. And of course, lastly, this article talks about the fact that Google will need to convince people that its AI agent will safely handle their personal data, including login passwords and credit card information, which it requires so it can visit sites to complete tasks or make purchases based on a customer's request. Of course, it references the fact that LLMs have been

Technical Details

making errors in their answers. We saw with Google's search engine that there were a few bad answers here and there that were unfortunate due to, you know, a few Reddit comments. But I think these are small issues that will get ironed out over time. So, let me know what you think about Google's agents. Are you excited? I certainly am, because I think it's going to be something really lightweight that works really well, considering Google's user interface choices. So, that being said, if you guys have enjoyed the video, let me know what you think.
