# Google Just WON The A.I Race.. (Wow)

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=6GO7bPb5cTA
- **Date:** 21.05.2025
- **Duration:** 49:44
- **Views:** 486,647

## Description

Join my AI Academy - https://www.skool.com/postagiprepardness
🐤 Follow me on Twitter: https://twitter.com/TheAiGrid
🌐 Check out my website: https://theaigrid.com/

Links From Today's Video:
https://www.youtube.com/watch?v=LxvErFkBXPk&pp=ygUJZ29vZ2VsIGlv

Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos.

Was there anything I missed?

(For Business Enquiries) contact@theaigrid.com

Music Used

LEMMiNO - Cipher
https://www.youtube.com/watch?v=b0q5PR1xpA0
CC BY-SA 4.0
LEMMiNO - Encounters
https://www.youtube.com/watch?v=xdwWCl_5x2s

#LLM #Largelanguagemodel #chatgpt
#AI
#ArtificialIntelligence
#MachineLearning
#DeepLearning
#NeuralNetworks
#Robotics
#DataScience

## Contents

### [0:00](https://www.youtube.com/watch?v=6GO7bPb5cTA) Segment 1 (00:00 - 05:00)

So, Google just took over AI completely. In today's video, I'll go over the key announcements that you want to be aware of, because there were quite a lot. One of the first announcements that was absolutely incredible is Google Beam. This is a video communication platform that uses multiple cameras to provide you with a 3D visualization of the person you're on a video call with. It's deeply immersive and, honestly, it might become the future of communication.

And today we are ready to announce our next chapter. Introducing Google Beam, a new AI-first video communications platform. Beam uses a new state-of-the-art video model to transform 2D video streams into a realistic 3D experience. Behind the scenes, an array of six cameras captures you from different angles. And with AI, we can merge these video streams together and render you on a 3D light field display with near-perfect head tracking, down to the millimeter, and at 60 frames per second, all in real time. The result: a much more natural and deeply immersive conversational experience. We are so excited to bring this technology to others. In collaboration with HP, the first Google Beam devices will be available for early customers later this year. HP will have a lot more to share a few weeks from now. Stay tuned.

Next, this is something that I think is absolutely incredible. We all know just how crazy AI has gotten when it comes to AI voices, but Google is taking that one step further by introducing real-time speech translation. Now with Gemini, what you'll be able to do on meetings and on calls is enable speech translation and understand someone else even if they're speaking in another language. I think this is going to work wonders for breaking down borders.

We've been bringing underlying technology from Starline into Google Meet. That includes real-time speech translation to help break down language barriers. Here's an example of how this could be useful when booking a vacation rental in South America and you don't speak the language. Let's take a look. Hi, Camila. Let me turn on speech translation. It's nice to finally talk to you. You're going to love visiting the city. The house is in a very nice neighborhood and overlooks the mountains. That sounds wonderful. There's a bus nearby, but I would recommend renting a car so you can visit the nature and enjoy it. That sounds great. You can see how well it matches the speaker's tone patterns and even their expressions. We are even closer to having a natural and free-flowing conversation across languages. And today we are introducing this real-time speech translation directly in Google Meet. English and Spanish translation is now available for subscribers, with more languages rolling out in the next few weeks.

Next, we have the absolutely incredible AI assistant. This is something that I was super excited for, and this is Project Astra. So, this is basically where you can have an AI that can view what you view using your mobile phone, and you can literally get it to help you do anything. It's absolutely incredible. So, check out the real demo.

An early research project that debuted on the I/O stage was Project Astra. It explores the future capabilities of a universal AI assistant that can understand the world around you. We are starting to bring it to our products today. Gemini Live has Project Astra's camera and screen sharing capabilities, so you can talk about anything you see.
People are using it in so many ways whether practicing for a job interview or training for a marathon. We've been appreciating the feedback from our trusted testers and some who are a little less trusted. Take a look. That's a pretty nice convertible. I think you might have mistaken the garbage truck for a convertible. Is there anything else I can help you with? What's this skinny building doing in my neighborhood? It's a street light, not a building. Why are these palm trees so short? I'm worried about them. They're not short. They're actually pretty tall. Sick convertible. Garbage truck again. Anything else? Why do people keep delivering packages to my lawn? It's not a package. It's a utility box. Why is

### [5:00](https://www.youtube.com/watch?v=6GO7bPb5cTA&t=300s) Segment 2 (05:00 - 10:00)

this person following me wherever I walk? No one's following you. That's just your shadow. Gemini is pretty good at telling you when you're wrong. We are rolling this out to everyone on Android and iOS starting today. For example, we've upgraded voice output to be more natural with native audio. We've improved memory and added computer control. Let's take a look.

I'm here with the latest research prototype of Project Astra, and we're going to see how it can help me fix my bike. Hey, go online and look for a user's manual for a Huffy mountain bike. I have found a user manual for you. What would you like to know about it? Okay, cool. Now, scroll until you find the section about the brakes. Sure thing. Looking for that page now. I think I stripped the screw. Can you go on YouTube and find a video for how to fix that? Of course. I'm opening YouTube now. This looks like a good video. Place a rubber band over the head of the screw and press down firmly. Can you go through my emails with the bike shop and find which size hex nut I need? Sure. According to the email from Bicycle Habitat, you need a 3/8-inch hex nut for your sprocket. I've highlighted the correct bin for you. It seems like I need a spare tension screw. Can you call the nearest bike shop and see what they have in stock? Yep, calling them now. I'll get back to you with what they have in stock. I think my brake pads are wearing down. Can you check the manual again and see if it talks about them anywhere? According to the manual, if you look on page 24... Do you want to get lunch? Yeah, give me 5 minutes. Can you finish what you were saying? As I was saying, if you look on page 24, section two, you'll see how to replace the brake pads. Hey, uh, any updates on that call? Yep, I just got off the phone with the bike shop. They confirmed they have your tension screw in stock. Would you like me to place a pickup order? Could you show me some examples of dog baskets I can put on my bike? Sure, I can help you with that. Just give me a moment. Here are some options. I think Zuka would look really great in these.

Next, we have computer use, which gets into the agent area. And this one is super interesting. Project Mariner is basically an advanced AI agent that can work with your computer or browser. So it's going to be doing things for you, and it can run up to 10 simultaneous tasks. Now, this is actually going to be rolled out to developers first, so it's quite likely that they will find the best use cases for it. But this is super interesting, because Google has been pushing the AI agent space forward.

Stepping back, we think of agents as systems that combine the intelligence of advanced AI models with access to tools. They can take actions on your behalf and under your control. Computer use is an important agentic capability. It's what enables agents to interact with and operate browsers and other software. Project Mariner was an early step forward in testing computer use capabilities. We released it as an early research prototype in December, and we've made a lot of progress since. First, we are introducing multitasking, and it can now oversee up to 10 simultaneous tasks. Second, it's using a feature called teach and repeat. This is where you can show it a task once and it learns a plan for similar tasks in the future. We are bringing Project Mariner's computer use capabilities to developers via the Gemini API. Trusted testers like Automation Anywhere and UiPath are already starting to build with it, and it will be available more broadly this summer.
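To make the multitasking idea concrete, here is a minimal sketch of the concurrency pattern an agent runner like this might use: a semaphore capping the system at 10 simultaneous tasks. Google hasn't published Mariner's internals, so every name and detail below is a hypothetical illustration in Python.

```python
# Conceptual sketch only: Google has not published Project Mariner's internals.
# This shows the general pattern of capping an agent at 10 concurrent tasks
# with an asyncio semaphore; all names here are hypothetical.
import asyncio

MAX_CONCURRENT_TASKS = 10
semaphore = asyncio.Semaphore(MAX_CONCURRENT_TASKS)

async def run_browser_task(description: str) -> str:
    """Stand-in for an agent driving a browser; real work would go here."""
    async with semaphore:  # at most 10 tasks make progress at once
        await asyncio.sleep(1)  # placeholder for navigation, clicks, etc.
        return f"done: {description}"

async def main() -> None:
    tasks = [f"task {i}" for i in range(25)]
    results = await asyncio.gather(*(run_browser_task(t) for t in tasks))
    print(results)

asyncio.run(main())
```

Teach and repeat would plausibly sit on top of a runner like this, with the plan recorded from one demonstration becoming the task description for future runs.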
Computer use is part of a broader set of tools we will need to build for an agent ecosystem to flourish, like our open Agent2Agent (A2A) protocol so that agents can talk to each other. We launched this at Cloud Next with the support of over 60 technology partners, and we hope to see that number grow. Then there is the Model Context Protocol (MCP), introduced by Anthropic, so agents can access other services. And today we are excited to announce that our Gemini SDK is now compatible with MCP tools.

And speaking of AI agents, we actually do have Agent Mode. This is basically browsing on steroids. If you need to get something done, your browser is going to whip up a bunch of AI agents, and they're going to execute the task for you behind the scenes. Think of it as needing to Google around for some tickets, except the agent does that for you in a much more efficient way. These technologies will work together to make agents even more useful, and we are starting to bring agentic capabilities to Chrome, Search, and the

### [10:00](https://www.youtube.com/watch?v=6GO7bPb5cTA&t=600s) Segment 3 (10:00 - 15:00)

Gemini app. Let me show you what we are excited about in the Gemini app. We call it Agent Mode. Say you want to find an apartment for you and two roommates in Austin. You've each got a budget of $1,200 a month. You want a washer/dryer, or at least a laundromat nearby. Normally, you'd have to spend a lot of time scrolling through endless listings. Using Agent Mode, the Gemini app goes to work behind the scenes. It finds listings from sites like Zillow that match your criteria and uses Project Mariner when needed to adjust very specific filters. If there's an apartment you want to check out, Gemini uses MCP to access the listings and even schedule a tour on your behalf. And it'll keep browsing for new listings for as long as you need, freeing you up to do the stuff you want to do, like plan the housewarming party. It's great for companies like Zillow too, bringing in new customers and improving conversion rates. An experimental version of Agent Mode in the Gemini app will be coming soon to subscribers. This is a new and emerging area, and we are excited to explore how best to bring the benefits of agents to users and the ecosystem more broadly.

Next up is something that I'm really glad Google is adding: the personal memory feature. One of the key standout features of ChatGPT is having that rich context of past conversations, knowing exactly who I am, and providing personalized recommendations. And Google is going all in on this front. In fact, they reference it quite a bit in the presentation.

We are working to bring this to life with something we call personal context. With your permission, Gemini models can use relevant context across your Google apps in a way that is private, transparent, and fully under your control. Let me show you an example in Gmail. You might be familiar with our AI-powered smart reply features. It's amazing how popular they are. Now, imagine if those responses could sound like you. That's the idea behind personalized smart replies. Let's say my friend wrote to me looking for advice. He's taking a road trip to Utah, and he remembers I did this trip before. Now, if I'm being honest, I would probably reply something short and unhelpful. Sorry, Felix. But with personalized smart replies, I can be a better friend. That's because Gemini can do almost all the work for me: looking up my notes in Drive, scanning past emails for reservations, and finding my itinerary in Google Docs for the trip to Zion National Park. Gemini matches my typical greetings from past emails, captures my tone, style, and favorite word choices, and then it automatically generates a reply. I love how it included details like keeping driving time under 5 hours per day, and it uses my favorite adjective, exciting. Looks great. Maybe you want to make a couple of changes to it and hit send. This will be available in Gmail this summer for subscribers.

Now, the new LLMs are crazy: they actually released an updated large language model called Gemini 2.5 Flash, and it's better in nearly every dimension at a fraction of the cost and several times the speed. It's absolutely wild.

Gemini Flash is our most efficient workhorse model. It's been incredibly popular with developers, who love its speed and low cost. Today, I'm thrilled to announce that we're releasing an updated version of 2.5 Flash. The new Flash is better in nearly every dimension, improving across key benchmarks for reasoning, code, and long context. In fact, it's second only to
2.5 Pro on the LMArena leaderboard. I'm excited to say that Flash will be generally available in early June, with Pro soon after.

So, one thing that I was genuinely surprised by was Gemini Deep Think. I knew that models that think for longer turn out to be smarter, but Gemini 2.5 Pro was already plenty smart. It turns out they let it think for even more time, and it crushes the benchmarks. So, this was a shocker for me.

We've been busy exploring the frontiers of thinking capabilities in Gemini 2.5. As we know from our experience with AlphaGo, responses improve when we give these models more time to think. Today, we're making 2.5 Pro even better by introducing a new mode we're calling Deep Think. It pushes model performance to its limits, delivering groundbreaking results. Deep Think uses our latest cutting-edge research in thinking and

### [15:00](https://www.youtube.com/watch?v=6GO7bPb5cTA&t=900s) Segment 4 (15:00 - 20:00)

reasoning, including parallel techniques. So far, we've seen incredible performance. It gets an impressive score on USAMO 2025, currently one of the hardest math benchmarks. It leads on LiveCodeBench, a difficult benchmark for competition-level coding. And since Gemini has been natively multimodal from the start, it's no surprise that it also excels on MMMU, the main benchmark measuring multimodal understanding. Because we're defining the frontier with 2.5 Pro Deep Think, we're taking a little bit of extra time to conduct more frontier safety evaluations and get further input from safety experts. As part of that, we're going to make it available to trusted testers via the Gemini API to get their feedback before making it widely available.

Next, we have Gemini Diffusion, which actually took the Twitter world by storm. This is an experimental research model from Google DeepMind that applies diffusion modeling, a technique popularized in image and video generation, to text and code generation. So unlike traditional LLMs, which generate text one token at a time, Gemini Diffusion generates text by iteratively refining random noise into coherent outputs, and it processes entire sequences in parallel, which makes it way faster.

And like Demis said, we're always innovating on new approaches to improve our models, including making them more efficient and performant. We first revolutionized image and video generation by pioneering diffusion techniques. A diffusion model learns to generate outputs by refining noise step by step. Today, we're bringing the power of diffusion to text with our newest research model. This helps it excel at tasks like editing, including in the context of math and code, because it doesn't just generate left to right; it can iterate on a solution very quickly and error-correct during the generation process. Gemini Diffusion is a state-of-the-art experimental text diffusion model that leverages this parallel generation to achieve extremely low latency. For example, the version of Gemini Diffusion we're releasing today generates five times faster than even 2.0 Flash-Lite, our fastest model so far, while matching its coding performance. So, take this math example. Ready? Go. If you blinked, you missed it. Now, earlier we sped things up, but this time we're going to slow it down a little bit. It's pretty cool to see the process of how the model gets to the answer of 39. This model is currently being tested with a small group, and we'll continue our work on different approaches to lowering latency in all of our Gemini models, with a faster 2.5 Flash-Lite coming soon.

Now, we also got native audio output, which is absolutely outstanding. I knew Google's audio output was going to be really good, but hearing it here for the first time made me realize just how on the ball Google is.

First, in addition to the new 2.5 Flash that Demis mentioned, we are also introducing new previews for text-to-speech. These now have first-of-its-kind multi-speaker support for two voices, built on native audio output. This means the model can converse in more expressive ways. It can capture the really subtle nuances of how we speak. It can even seamlessly switch to a whisper, like this. This works in over 24 languages, and it can even easily go between languages. So the model can begin speaking in English, switch to another language, and switch back, all with the same voice.

Now, when it comes to coding, Google's Gemini 2.5 Pro models are actually really incredible. And here they actually showcase a really nice demo.
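Quick aside before the coding demo: to see how the diffusion-style decoding described a moment ago differs from left-to-right generation, here is a toy sketch. It is not Gemini Diffusion's actual algorithm, which is unpublished; the "model" is a stub, but the control flow shows the key idea: every position is proposed in parallel at each step, and the decoder commits the most confident ones instead of marching token by token.

```python
# Toy illustration of text-diffusion-style decoding: the whole sequence is
# refined in parallel over a few steps instead of generated token by token.
# NOT Gemini Diffusion's real algorithm; the "denoiser" below is a stub that
# simply proposes the target tokens with random confidence scores.
import random

TARGET = "the answer is 39".split()
MASK = "_"

def stub_denoiser(seq):
    """Pretend model: propose a token for every position at once,
    each with a confidence score (random here for demonstration)."""
    return [(tok, random.random()) for tok in TARGET]

def diffusion_decode(steps=4, per_step=1):
    seq = [MASK] * len(TARGET)  # start from pure "noise" (all masks)
    for step in range(steps):
        proposals = stub_denoiser(seq)
        # Commit the highest-confidence still-masked positions this step.
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        masked.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in masked[:per_step]:
            seq[i] = proposals[i][0]
        print(f"step {step}: {' '.join(seq)}")
    return " ".join(seq)

diffusion_decode()
```

Because all positions are proposed in parallel, a real model of this kind can also revisit and error-correct earlier choices between steps, which is what the keynote credits for its editing strength.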
Now, as you heard from Demis, Gemini 2.5 Pro is incredible at coding. So now let me show you how you can take any idea you have and bring it to life. If you've ever been to the American Museum of Natural History in New York City, you know it has a set of amazing exhibits. So to bring that to you today, I got 2.5 Pro to code me a simple web app in Google AI Studio to share some photos and learn more. So, here's what I have so far, but I want to make it more interactive. And I'm still brainstorming the design, but I've got some ideas. You've seen something like this before, right? Someone comes to you with

### [20:00](https://www.youtube.com/watch?v=6GO7bPb5cTA&t=1200s) Segment 5 (20:00 - 25:00)

a brilliant idea sketched on a napkin. As a PM, I'm often that someone. Now, standard two-dimensional web design is one thing, but I wanted to make it 3D. And I learned that jumping into 3D isn't easy. It requires learning about all kinds of new things: setting up a scene, camera, lighting, and more. Luckily for me, 2.5 Pro can help. So here's what I'm going to do: I am going to add the image I just showed you of the sphere, and I'm going to add a prompt that asks 2.5 Pro to update my code based on the image. We'll let 2.5 Pro get to work. As you can see, it's starting to think, creating a plan based on what I asked for, and it'll apply it to my existing codebase. Because Gemini is multimodal, it can understand the abstract sphere sketch and code beautiful 3D animations, applying them to my existing app. This takes about 2 minutes, so for the purpose of time, we're going to do this baking-show style: I'm going to jump to another tab that I ran right before this keynote with the same prompt. And here's what Gemini generates. Whoa. We went from that rough sketch directly to code, updating several of my files. You can see it thought for 37 seconds, the changes it thought through, and then the files it updated. We did all of this in AI Studio, so once I've finished prototyping, I can simply deploy the code along with my Gemini API key. So, here's our final app in Chrome. Look at these animations. I didn't need advanced knowledge of three.js libraries, or to figure out the complex 3D math, to build this. I know it would have taken forever to do this by hand, and instead, I was able to create it just from a sketch. I can make this experience even richer with multimodality. So, I used 2.5 Flash to add a question to each photo, inviting you to learn a little more about it. But what if it talked? That's where Gemini's native audio comes in. That's a pangolin, and its scales are made of keratin, just like your fingernails. Wow. Now we're talking. You can hear how you can add expressive audio right into your apps. And before I share more, I'll leave this demo with another fun layout that 2.5 Pro coded just for us.

And if we're talking about coding, we can also take a look at Jules, Google's coding agent. Just submit a task and Jules takes care of the rest: fixing bugs, making updates. It integrates with GitHub and works on its own. Jules can tackle complex tasks in large codebases that used to take hours, like updating an older version of Node.js. It can plan the steps, modify files, and more, in minutes. So today, I'm delighted to announce that Jules is now in public beta, so anyone can sign up at jules.google.

Now, I know AI Overviews haven't always been great from Google, but they've actually been updated to be a lot better than they were previously. And Google has rolled out something called AI Mode, where you basically use AI to reimagine the search engine.

Our Gemini models are helping to make Google Search more intelligent, agentic, and personalized. One great example of progress is our AI Overviews. Since launching at I/O last year, they've scaled up to over 1.5 billion users every month in more than 200 countries and territories. As people use AI Overviews, we see they are happier with their results and they search more often. In our biggest markets like the US and India, AI Overviews are driving over 10% growth in the types of queries that show them.
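Looping back to the AI Studio coding demo for a second: the sketch-plus-prompt flow maps onto the public Gemini API roughly as below, using the google-genai Python SDK. The model id and file name are assumptions for illustration; the demo itself ran inside AI Studio rather than through this code.

```python
# A minimal sketch of the demo's flow via the google-genai Python SDK:
# send a sketch image plus an instruction and get updated code back.
# The model id and file path are assumptions for illustration.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Load the napkin-style sphere sketch as an image part.
with open("sphere_sketch.png", "rb") as f:
    sketch = types.Part.from_bytes(data=f.read(), mime_type="image/png")

response = client.models.generate_content(
    model="gemini-2.5-pro",  # assumed model id; check current docs
    contents=[
        sketch,
        "Update my three.js gallery app so the photos orbit a 3D sphere "
        "like in this sketch. Return the modified files.",
    ],
)
print(response.text)
```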
What's particularly exciting is that this growth increases over time. It's one of the most successful launches in Search in the past decade. AI Overviews are also one of the strongest drivers of growth for visual searches in Google Lens. Lens grew 65% year-over-year, with more than 100 billion visual searches already this year. So, people are asking more queries, and they're also asking more complex

### [25:00](https://www.youtube.com/watch?v=6GO7bPb5cTA&t=1500s) Segment 6 (25:00 - 30:00)

queries. With our latest Gemini models, our AI Overviews are at the quality and accuracy you've come to expect from Search, and they are the fastest in the industry. For those who want an end-to-end AI search experience, we are introducing an all-new AI Mode. It's a total reimagining of Search. With more advanced reasoning, you can ask AI Mode longer and more complex queries like this. In fact, users have been asking much longer queries, two to three times the length of traditional searches, and you can go further with follow-up questions. All of this is available today as a new tab right in Search. I've been using it a lot, and it's completely changed how I use Search, and I'm excited to share that AI Mode is coming to everyone in the US starting today.

AI Mode is Search transformed, with Gemini 2.5 at its core. It's our most powerful AI search, able to tackle any question. And as Sundar announced, we're excited to start rolling out AI Mode for everyone in the US starting today. You'll find it as a new tab directly in Search, or right from your search bar. AI Mode will be loaded up with all of our best AI features and capabilities, but it's even more than that: it's a glimpse of what's to come. Over time, we'll graduate many of AI Mode's cutting-edge features and capabilities directly into the core Search experience. That starts today as we bring the same models that power AI Mode to power AI Overviews, so you can bring your hardest questions right to the search box. Today we'll give you a tour of AI Mode, and you'll see how it works and how it's getting even better with personal context, deeper research, complex analysis and visualization, live multimodality, and new ways to shop. Now, that's a lot, because AI Mode can do a lot. So let's dive in.

First, with AI Mode, you can ask whatever is on your mind. And as you can see here, Search gets to work. It generates your response, putting everything together for you, including links to content and creators you might not have otherwise discovered, and merchants and businesses with useful information like ratings. Search uses AI to dynamically adapt the entire UI, the combination of text, images, links, even this map, just for your question. And you can follow up conversationally.

Now, AI Mode isn't just giving you information. It's bringing a whole new level of intelligence to Search. What makes this possible is something we call our query fan-out technique. Under the hood, Search recognizes when a question needs advanced reasoning. It calls on our custom version of Gemini to break the question into different subtopics, and it issues a multitude of queries simultaneously on your behalf. It searches across the entire web, going way deeper than a traditional search. And it taps into all of our data sets of real-time information, like the Knowledge Graph, the Shopping Graph, and, in this case, local data, including insights from our Maps community of over 500 million contributors. Search pulls together a response, and it checks its work to make sure it meets our high bar for information quality. And if it detects any gaps, it issues even more searches to fill them in. That means with AI Mode, you get all of this from just a single search. And you get it fast.

Now, let's take a look at what's coming next to AI Mode, starting in Labs. Soon, AI Mode will be able to make your responses even more helpful with personalized suggestions based on your past searches. You can also opt in to connect other Google apps, starting with Gmail.
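Stepping back to the query fan-out technique for a moment, here is a conceptual sketch of the pattern, including the gap-filling pass: decompose the question into subtopics, search them concurrently, then issue follow-up searches for anything that came back thin. The decomposer and search backend are stubs standing in for Google's unpublished internals.

```python
# Conceptual sketch of the query fan-out pattern described above: break a
# question into subtopics, search them all concurrently, then fill gaps.
# The decomposer and search backend are stubs, not Google's internals.
import asyncio

def decompose(question: str) -> list[str]:
    """Stub for the model step that splits a question into subqueries."""
    return [f"{question} - aspect {i}" for i in range(1, 4)]

async def search(query: str) -> dict:
    await asyncio.sleep(0.1)  # placeholder for a real search call
    return {"query": query, "results": [f"result for {query}"]}

async def fan_out(question: str) -> list[dict]:
    # Issue all subqueries simultaneously.
    answers = await asyncio.gather(*(search(q) for q in decompose(question)))
    # Gap check: if any subquery came back empty, issue follow-up searches.
    gaps = [a["query"] for a in answers if not a["results"]]
    if gaps:
        answers += await asyncio.gather(*(search(g + " (retry)") for g in gaps))
    return list(answers)

print(asyncio.run(fan_out("best coffee shops near the river in Austin")))
```

Deep Search, described later, is essentially this same loop run much wider and deeper, with the results synthesized into a cited report.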
We call this personal context, and you'll see when AI Mode is bringing yours in to help. So now, based on your recent restaurant bookings and searches, it gets that you prefer outdoor seating. And since you subscribe to those gallery newsletters, it suggests some cool art exhibits to check out while you're in town. But that's not all: because your flight and hotel confirmations are in your inbox, you get event ideas that sync up with when you'll actually be in Nashville, with many of them near where you're staying. You can see how personal context in AI Mode makes Search really yours, with recommendations customized just for you. Now, this is always under your control, and you can choose to connect or disconnect at any

### [30:00](https://www.youtube.com/watch?v=6GO7bPb5cTA&t=1800s) Segment 7 (30:00 - 35:00)

time. Personal context is coming to AI Mode this summer.

Next, for questions where you want an even more thorough response, we're bringing deep research capabilities into AI Mode. You already come to Search today to really unpack a topic, but this brings it to a much deeper level. So much so that we're calling it Deep Search. Deep Search uses the same query fan-out technique you just heard about, but multiplied. It can issue dozens or even hundreds of searches on your behalf. It reasons across all those disparate pieces of information to create an expert-level, fully cited report in just minutes. It includes links to the web throughout, so you can easily explore and take action. Now, that's a core part of how we've built AI Mode overall and how we've always thought about AI in Search, because we believe AI will be the most powerful engine for discovery that the web has ever seen, helping people discover even more of what the web has to offer and find incredible, hyper-relevant content. You're starting to see how Search is becoming more intelligent, and we've got more to show you.

So, I'm a huge baseball fan, and lately there's been a lot of buzz about these new torpedo bats. If you don't follow baseball, it's a new bat design where more of the weight of the bat is in the sweet spot. As you can see, I've been digging in on whether it's making a real impact on the game. And now I'm wondering what the numbers say. So, I'll ask: show the batting average and on-base percentage for this season and last for notable players who currently use a torpedo bat. Think about it. There are so many parts to that question. Search needs to understand who the notable players are, which ones are using torpedo bats, and their stats. I get this helpful response, including this easy-to-read table. And I know that this is fresh and accurate, since it uses our sports data that's continuously updated down to the last strike. Search even brings in important context, like the fact that it's still early in the season. I can follow up and ask, "How many home runs have these players hit this season?" And just like that, I get this graph. This goes back to what Liz mentioned about AI Mode dynamically generating the right UI for each response. Search figured out that the best way to present this information is a graph, and it created it. It's like having my very own sports analyst right in Search. Complex analysis and data visualization is coming this summer for sports and financial questions.

So, all this talk about baseball made me want to get closer to the game, like at-the-next-game close. But finding the perfect tickets can be a chore. So, I'm excited to share that we're bringing Project Mariner's agentic capabilities into AI Mode. You've already seen how AI Mode is becoming more intelligent and personalized, and here's where you start to see Search getting more agentic. Search can take work off my plate while still staying under my control. I'll say, "Find two affordable tickets for this Saturday's Reds game in the lower level." Search kicks off a query fan-out, looking across several sites to analyze hundreds of potential ticket options, doing the tedious work of filling in forms with all the criteria I asked for. And it puts it all together, reasoning across the results to analyze real-time pricing and inventory. Then, right here, task complete. I get great ticket options with helpful context so I can make an informed decision. Looks like these seats have a good view at a reasonable price.
Search helps me skip a bunch of steps, linking me right to the final checkout. Ticket secured.

Now, something really cool that they announced was something called Search Live. Search Live is basically like Project Astra, except it uses Google Search to look through the internet and make sure that whatever information you're getting is up to date.

Next, let's talk about multimodality. We've been blazing the trail for multimodal search since before it was really even a thing. We introduced Google Lens on this very stage back in 2017, and since then, we've made it even easier to search what you see. Snap a picture with Google Lens, or simply Circle to Search, and you can get an AI Overview instantly. Like Sundar mentioned, visual search is on fire, and today I'm excited to share that Lens has over 1.5 billion users every month. Now, we're taking the next big leap in multimodality by bringing Project

### [35:00](https://www.youtube.com/watch?v=6GO7bPb5cTA&t=2100s) Segment 8 (35:00 - 40:00)

Astra's live capabilities into AI Mode. Think about all those questions that are so much simpler to just talk through and actually show what you mean, like a DIY home repair, a tricky school assignment, or learning a new skill. We call this Search Live. Now, using your camera, Search can see what you see and give you helpful information as you go back and forth in real time. It's like hopping on a video call with Search. I have three kids, and they ask about a million questions a minute. And with summer right around the corner, the team and I decided to put Search Live to the ultimate test: helping us and our kids tackle something new. We recorded at home with our families just this past weekend. Let's take a look.

It looks like you're about to do a fun science experiment. All right. Ready, Anna? Ready, AI? Yep. Okay. Are you ready for your science experiment? Yeah. Can you guess which experiment I'm trying to do? I have hydrogen peroxide and dish soap and yeast. You're likely going to make elephant toothpaste. How do I know which one of these strawberries is ready to eat? I'm trying to get this remote to work. It looks like someone is ready to get their hands dirty and plant a green bean seedling. Pump it to about 50 pounds per square inch, but don't go over 90 PSI. I mixed the baking soda. What do I do next? It looks like someone tried dipping a finger into the cinnamon water. What should I do to make this even more impressive? You could try using a different catalyst. Potassium iodide is a good option. Whoa, why is it doing that? A ripe strawberry will also have a sweet smell. The green leaves at the top should look fresh and vibrant. The chemical reaction is going well. This is awesome. Thanks. Can we do it again? Go. Oh boy. How do I get strawberry stains out of a shirt? Try using a mix of 1 tablespoon white vinegar and half a teaspoon liquid laundry detergent.

Now, something that was also really cool, and I'm glad it finally made it to production, is Google's AI try-on. One thing that generative AI is really good at is virtually putting clothes on a body it sees and adjusting them to fit. And Google is implementing this into various e-commerce websites. So if you are doing some online shopping, you'll quickly be able to apply the clothes onto your body and see whether or not you like them. That kind of interactive shopping is going to be upgraded with Gemini. They include a full demo here, and I really can't wait for this stuff to arrive, because I think it's going to change shopping. I mean, how many times have you ordered something and it's just not what you expected?

All right, more on how it works. It brings in advanced 3D shape understanding, which allows us to perceive shapes and depth more accurately, helping us better visualize the human body. Our try-on experience works with your photo. It's not some pre-captured image or a model that doesn't look like you. And when it comes to clothes that you're interested in, the AI model is able to show how the material will fold and stretch and drape on people. This technology is the most state-of-the-art in the industry at scale, and it allows us to visualize how billions of apparel products look on a wide variety of people. And you can see here how it really gives me a feel for how this dress might look on me. All right. So, I'm now set on the dress, and Search can help me find it at the price that I want and buy it for me with our new agentic checkout feature. So, let me get back here to the dress.
I'm going to click this thing to track price. I pick my size. Then I have to set a target price; I'm going to set it to about $50. And tracking is happening. Search will now continuously check websites where the dress is available and let me know if the price drops. So now let's switch out of our live demo mode, and I'm going to sprinkle some I/O magic. Let's assume the price has now dropped. When that happens, I get a notification just like this. And if I want to buy, my checkout agent will add the right size and color to my cart. I can choose to review all my payment and shipping information, or just let the agent buy it for me.

Now, Gemini Live, which I don't think most people even know exists, is basically an AI assistant that you can just talk to via voice. If you don't want to just type to Gemini in the chat interface, you can strike up a quick conversation. But now they've upgraded it so that you can actually screen share on mobile, and they've added the camera function to this as well. So, it's going to be super interesting to see how talking with an AI changes over time.

So, let's talk more about how all this is coming to life in the Gemini app. We're launching five things today.
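Circling back to that price-tracking demo: as a rough skeleton, the flow (poll the listing, compare against the target, notify, then hand off to an agentic checkout step) could be structured like this. Every function here is a hypothetical stand-in, not Google's implementation.

```python
# Skeleton of the price-tracking flow described above: poll a product page,
# compare against a target price, then hand off to an agentic checkout step.
# Every function here is a hypothetical stand-in, not Google's implementation.
import time

TARGET_PRICE = 50.00

def fetch_current_price(url: str) -> float:
    """Stand-in for scraping or querying the retailer's page."""
    return 48.99  # pretend the price just dropped

def notify(msg: str) -> None:
    print(f"[notification] {msg}")

def agentic_checkout(url: str, size: str, color: str) -> None:
    """Stand-in: add the right variant to the cart, pause for user review."""
    print(f"Added {size}/{color} item from {url} to cart; awaiting approval.")

def track(url: str, size: str, color: str, poll_seconds: int = 3600) -> None:
    while True:
        price = fetch_current_price(url)
        if price <= TARGET_PRICE:
            notify(f"Price dropped to ${price:.2f} (target ${TARGET_PRICE:.2f})")
            agentic_checkout(url, size, color)
            break
        time.sleep(poll_seconds)

track("https://example.com/dress", size="M", color="blue")
```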

### [40:00](https://www.youtube.com/watch?v=6GO7bPb5cTA&t=2400s) Segment 9 (40:00 - 45:00)

First, let's talk about Gemini Live. People are blown away by how interactive and natural the conversations are, and it works in over 45 languages in more than 150 countries. It's so intuitive, so engaging. The conversations, in fact, are five times longer than the text conversations in the app. And I can tell you from personal experience, it's great for talking through things on the drive in to work in the morning. Now, as Sundar mentioned, Gemini Live now includes camera and screen sharing, both of which are incredible. All of it is rolling out free of charge in the Gemini app on Android and iOS today. And in the coming weeks, you'll be able to connect Gemini Live to some of your favorite apps, like Calendar, Maps, Keep, and Tasks. So soon, you can just point your camera and ask it to add an invite to your calendar, and it'll be done. Or if you need to decipher your roommate's handwriting for the shopping list, Gemini Live can turn those scribbles into a neat list in Google Keep. Our Gemini Live roadmap is overflowing with exciting things. They're all being prototyped in Project Astra, like you saw earlier, and as those ideas mature, we'll graduate them into Gemini Live for everyone. And since Gemini and Android work so closely together, many of those experiences will work great on Android across the entire ecosystem. So stay tuned for more.

Next, when I want to really visually conceptualize something, Imagen 4 is really good for high-quality images, and now it's been upgraded so that it also does good text generation.

Starting today, we're bringing our latest and most capable image generation model into the Gemini app. It's called Imagen 4, and it's a big leap forward. The images are richer, with more nuanced colors and fine-grained details: the shadows in the different shots, the water droplets that come through in the photos. I've spent a lot of time around these models, and I can say the progression has gone from good to great to stunning. And Imagen 4 is so much better at text and typography. In the past, you might have created something that looked good, but adding words didn't always work just right. So, check this out. Maybe I want to create a poster for a music festival. We'll make the Chrome Dino the big headliner. Imagen 4 doesn't just get the text and spelling right; it's actually making creative choices, like using dinosaur bones in the font, or figuring out the spacing, the font size, and the layout that make it look like this great poster. So, the image quality is higher, the speed is faster, and the text is better. All of this lets you make posters, party invites, and anything else. And with Gemini's native image generation, you can easily edit these images too, right in the app. We've also made a super-fast variant of Imagen 4. We can't wait for you to get your hands on it. In fact, it's 10 times faster than our previous model, so you can iterate through many ideas quickly.

And probably one of the most insane things released this Google I/O was Veo 3. Veo 3 was not only released with upgraded physics; they also added native sound effects into the final video production. It's pretty incredible when you see the demo, because I still can't believe it. There are just so many different videos online that basically break reality now. But I think it's really incredible how it works.

All right, I want to show you one last thing. Images are incredible, but sometimes you need motion and sound to tell the whole story.
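One quick note on Imagen before the video stuff: for developers, generating an image like that festival poster through the google-genai Python SDK looks roughly like this. The Imagen 4 model id is an assumption; check the current model list before running it.

```python
# A minimal sketch of image generation with the google-genai Python SDK.
# The exact Imagen 4 model id is an assumption; check the current docs.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

result = client.models.generate_images(
    model="imagen-4.0-generate-001",  # assumed model id
    prompt=("Music festival poster headlined by the Chrome Dino, "
            "with the lineup text set in a bold dinosaur-bone font"),
    config=types.GenerateImagesConfig(number_of_images=1),
)

# Write the returned image bytes to disk.
with open("poster.png", "wb") as f:
    f.write(result.generated_images[0].image.image_bytes)
```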
Last December, Veo 2 came out, and it redefined video generation for the industry. And if you saw Demis' sizzling-onions post yesterday, you know that we've been cooking something else. Today, I'm excited to announce our new state-of-the-art model, Veo 3. And like a lot of other things you've heard about from the stage today, it's available today. The visual quality is even better. Its understanding of physics is stronger. But here's the leap forward: Veo 3 comes with native audio generation. That means Veo 3 can generate sound effects, background sounds, and dialogue. Now you prompt it, and your

### [45:00](https://www.youtube.com/watch?v=6GO7bPb5cTA&t=2700s) Segment 10 (45:00 - 49:00)

characters can speak. They left behind a ball today. It bounced higher than I can jump. What manner of magic is that? This ocean, it's a force, a wild, untamed might. And she commands your awe with every breaking light. The microfilm is in your ticket. They're watching the north exit. Use the service tunnel.

Based on our collaboration with the creative community, we've been building a new AI filmmaking tool for creatives, one that combines the best of Veo, Imagen, and Gemini. A tool built for creatives, by creatives. It's inspired by that magical feeling you get when you get lost in the creative zone and time slows down. We're calling it Flow, and it's launching today. Let me show you how it works. Let's drop in on our hero: the grandpa is building a flying car with help from a feathered friend. These are my ingredients, the old man and his car. We make it easy to upload your own images into the tool, or you can generate them on the fly using Imagen, which is built right in. We can create a custom gold gear shift just by describing it. There it is. Pretty cool. Next, you can start to assemble all of those clips together with a single prompt. You can describe what you want, including very precise camera controls. Flow puts everything in place, and I can keep iterating in the scene builder. Now, here's where it gets really exciting. If I want to capture the next shot of the scene, I can just hit the plus icon to create the next shot. I can describe what I want to happen next, like adding a 10-foot-tall chicken in the back seat, and Flow will do the rest. The character consistency, the scene consistency, it just works. And if something isn't quite right, no problem. You can just go back in, like in any other video tool, and trim it up if it's not working for you. But Flow works in the other direction as well. It lets you extend a clip too, so I can get the perfect ending that I've been working toward. Once I've got all the clips I need, I can download the files, bring them into my favorite editing software, add some music from Lyria, and now the old man finally has his flying car.

Now, remember how we were just talking about Veo 3? They actually introduced this so that if you want to make a short AI film, you can do so rather easily. And the AI films that I've been seeing on Twitter have been nothing short of outstanding. So, it really makes me wonder where the field of entertainment is going to head next.

I don't know if I'm on the right path, but I'm trying to find it. I'm questioning, searching, and then something shifts and I'm not trying anymore. I'm just doing, and all of the pieces start falling into place. It all feels pretty clear in my head. I see these flashes of possibilities, almost like I'm traveling through dimensions. I'm looking down at myself and my characters in these different worlds, and it's almost coming to life on its own. Even though I know I'm in control of the narrative, it feels like it's almost building upon itself at some point. You know, you could have an infinite amount of endings to your story. So, the work isn't built brick by brick. It blooms like a spontaneous garden. It grows naturally, fully vibrant and complete. I'm not forcing it. I'm just finding it. And that's when I know I'm in the right place.

---
*Source: https://ekstraktznaniy.ru/video/12747*