generation models before. We have a very colorful landscape there, with Suno, Udio, and Lyria by Google, which was introduced recently only as a research preview, but those only generate audio. Then we have video generation models like Veo and all the Chinese models like Kling, and then we have Runway, which always used to lead the pack, and many more players there. Then we have the image generation models, right? With Google's Imagen 3, which also used to be one of the best. Then GPT-4o image generation came and kind of dethroned that. But they updated all of that. Okay. And not just that they updated it: the new Veo, Veo 3, now also does audio.

So I'm going to put on my headphones now and listen closely, because this is a first. You just put in a prompt. There are many improved things about the model, like improved physics and realism, but mainly just check out this little video of a cat with a hat typing. This came straight out of Veo 3. That's just great. That's just perfect in terms of sound design. The sound effects are spot-on, there's ambience behind it, and even the video movement with the slight dolly-in. Excellent.

Okay, many more examples here. I actually played around a bunch, so I can show you a few more. That one is with music. Or this one is a little weird, but still good. Really impressive stuff here with the audio. I mean, okay, look, there are three arms. It's not perfect, right? But it's way better than the ones before, and this is the first one that does the sounds. Okay, check this one out. I really like this one. The sound just brings it to life, doesn't it? Okay, one more and then we'll round out this little segment. I mean, it generates music there for me. Do you hear the crowd there? The crowd looks realistic. The cat looks like something out of an animated movie. It's kind of perfect. I prompted for a cat with a hat and a monocle DJing at a party or something, and this is it.

Okay, that's the new model that we covered. I'm not even going to really go into the image model here, because I think this is the big news. Yes, they have a new image model. It has better prompt adherence when it comes to text, but I think that's sort of on par with GPT-4o now, from what it seems. This video model, though, just blows everything else on the market out of the water, because they also added what previously used to be my personal favorite interface for working with AI video on the web, which was Sora. It's not the best model, but I thought the interface was really good: you could edit things together really easily, extend clips, things like that. Well, they have that too. They have the Scene Builder. You can essentially take multiple clips and add them to a scene. So, I think I could just click this. Yeah, there you go. There's one scene, and then I could take another one, and you can kind of stitch these things together. You can extend, you can work within that interface, and you can, I don't know, create a little monocle-cat-with-hat dance party, like so. And as you can see, you can trim right in here. You can rearrange things. It's very simplistic, but it allows you to create a quick little sequence.

There's one more thing that I wanted to show you. Besides the Scene Builder (again, this is Ultra plan only, by the way), if you're using this, you can use text-to-video. So you can say "a cat with a hat" and then it will, you know, generate that.
As you're doing that, there are these settings that you need to pay attention to, because by default they're usually not set to the high setting and they're on the old model. Okay, so you need to switch to highest quality, and then it's going to generate two of them at the same time. So, here I'm going to do another one with actually the highest settings. Then you can pick between two scenes when it generates them in here, rearrange, do things like that.

But I think the key feature is inside this new Flow application that they shipped. Okay, the key feature is that you can actually pick different ingredients. Now, this is not fully functional with the new models yet. They just shipped this base version and they're going to update it over time, but you can essentially use the new Imagen to generate images like this, or like this, whatever you want to do, and then you can use them as reference images. Okay? And then you can mix and match these different reference images. So you could create an image like this, of, I don't know, this beautiful Garden of Eden type scene, then pick a character that you have over here, and then prompt on top of it, and it uses those things in combination to generate videos. Now, this does not work well with Veo 3 yet. I tested it a bunch. You can see from my early results here that some of these videos were generated with this mixing. I suppose it works, but look, they're just not that good.

There's actually one more I need to show you before we move on, and I think that was the little break-dancing one. Oh yeah. Okay. Full-screen this one. What's even going on? I don't know. It is so much fun to play with. And let me tell you, the audio adds another layer to it that really sells the whole thing. You know, sure, it was fun to play with AI video before, but with this, I don't know, it just feels different. It feels more immersive. And then you can build little sequences, and you can mix and match different characters that you generate into scenes, and you can turn them into videos with audio right away. I mean, I'm doing cats with hats, but you could do anything. You could generate images of humans, characters, whatever you want, and make things like this.

So this is the new Flow. Okay. And they have some great video demos there of how artists and filmmakers are using this to create extra footage for what they're doing. They had an app called Whisk before this. I thought that was the best interface to generate images, especially for newcomers. That was exactly like this, but just for images. You could pick ingredients, pick styles (that's still grayed out here), and you can mix and match things and use all of these generative tools in a super simple interface to actually create unique things, now with audio too. Now, as you can see, here are the results. See, it does it all. It comes up with the music for it, the sounds. Okay, so that's amazing. So, that kind of covers the image gen and the video gen over here.
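Quick aside for the developers watching: Flow is the consumer way into this, but Veo-style video generation is also reachable programmatically through the Gemini API. Below is a minimal sketch of a text-to-video call using Google's google-genai Python SDK. The model id is a placeholder for illustration (check the docs for the current Veo identifier), and whether the new audio-capable Veo 3 is served through the public API yet depends on rollout.

```python
# Rough sketch of text-to-video via the google-genai SDK.
# Assumptions: an API key with video access, and a placeholder
# model id -- check Google's docs for the current Veo identifier.
import time

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Kick off generation; this returns a long-running operation.
operation = client.models.generate_videos(
    model="veo-3.0-generate-preview",  # placeholder id, verify in the docs
    prompt="A cat with a hat and a monocle DJing at a party, crowd cheering",
    config=types.GenerateVideosConfig(number_of_videos=1),
)

# Video generation runs asynchronously, so poll until it finishes.
while not operation.done:
    time.sleep(20)
    operation = client.operations.get(operation)

# Download and save each generated clip.
for i, generated in enumerate(operation.response.generated_videos):
    client.files.download(file=generated.video)
    generated.video.save(f"clip_{i}.mp4")
```

Note that the ingredients-style mixing of reference images I showed lives in the Flow UI; the sketch above is plain text-to-video.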
This is essentially a competitor to OpenAI's Codex, which we haven't talked about in depth on the channel yet. Essentially, Codex is a developer product. It connects to your GitHub, you pick a branch, and you can spawn like ten agents, like ten junior coders, or actually senior coders, because it's that damn good, and then it works on the different tasks you give it. Jules is sort of their competitor to this. Time will tell; I won't judge it yet, but it looks like an exact competitor to that. The reviews of Codex in the developer community have been absolutely insane, though. Everybody's just blown away by how good it is. This is their competitor to that. So they just kind of dropped it as a side note: hey, we have Jules here, an AI agent that integrates with GitHub and can then asynchronously code for you. So that's a developer-focused thing they released.

Then there's Project Mariner. This is actually available right now. I haven't managed to get it to work on my machine, but the team is already trying it out, so we'll follow up on this. It's essentially like OpenAI's Operator, which, for anybody who doesn't know, is a computer-use agent: basically using mouse and keyboard to operate your computer to get things done. The most interesting thing to me here is one feature that I've been missing from all of the computer-use agents we've tested so far. And we've tested them all; we actually have a set of test cases, test prompts, that we run on them. They're not reliable. They're not good. Even Operator loses the cookies after a while. It's only trained on the few partner sites where it works reliably, like Airbnb and DoorDash and those sites. Beyond that, it's really not reliable enough to be a product. And the one thing I've always wished for is: why don't they let me train it? Let me do the thing manually while it looks at my screen. Heck, I'll do the thing manually five times; just take my behavior and then replicate it for the next 20 times I would be doing the task. Well, Project Mariner has this feature: Teach a task. So you can actually teach it specific tasks. Now, this is just getting started. It's coming in the form of a Chrome extension, and this one is also only under the Ultra plan. Okay, so this is the $250 plan. I mean, obviously you won't be getting computer-use agents for a low price here. But as you can see, Project Mariner is on this side right here, in early access. It's just getting started, but the extension is out now. So if you're in the US, you can download and test this right now. I'll follow up on Friday with more practical use of this, but it looks super impressive.

A few more things here, also for the AI Pro plan. By the way, just to follow up: the Flow app I talked about, that was the Ultra plan, right? This here is available for everybody. Just to make the pricing clear: the AI Mode in Google should be available for everybody for free. This will be, I think, available for free; don't quote me on that, though. Jules is, I think, pay-per-use here, and Mariner is in the Ultra plan, the $250 plan, right?

And then we get to this. Okay, this is honestly worth a dedicated video, going through each and every one of these. I've loved featuring Google Labs products over the past few months already. Now they've actually added so much. So Jules is in here as one of them.
Another one is the SynthID Detector, which analyzes AI-generated footage and tells you if it's AI-generated. They said they've already run it on 10 million images or something; it's this new tech that analyzes whether something is AI-generated. So you can basically upload things to it and try it out. And then there are another 20 things, like 15 of which are new. One of them is Project Mariner; Flow, which we talked about, is in there too, but then there are so many more. I would just recommend you check this out. There are really fun ones that have been out for a while, like GenType, where you can just type in a phrase and it generates it in a custom font that it creates for you. That's really cool, but it's been around for a while. There's a bunch of experimental stuff like Project Astra, too, which is now sort of built into the mobile app, as I talked about with the video features. NotebookLM has a desktop app now too, which we'll talk about and have a look at in Friday's video, because NotebookLM is great. So, Mariner isn't free, but most of this is. If you're looking for free stuff to try, just go to labs.google/experiments and you can try some of the experiments available in here. Especially here at the bottom: we had a lot of fun with GenChess. You can kind of generate custom chess pieces. If you're into chess, check this out. Also free.

A lot of these things are also immediately available in Google AI Studio, their developer interface. So again, Gemini right here is their consumer interface and Google AI Studio is their developer interface. They're kind of unifying everything. Actually, they're doing well on the naming and everything; I like it a lot. Here they're supposed to have the new Imagen model, which is not available yet. I think as of now, nothing really new is available in here yet, except for the new Gemini 2.5 Pro model, which has been out for two weeks already. The new Flash that released today, though, is available here immediately; I'll put a rough example of calling it at the end of this section. Okay, so that's AI Studio.

And then let's round out the video with the last few points. This is the Live feature that I talked about. That's the interface where it's the voice assistant, but you can also use video to interface with it. Wonderful thing. Then we have Gemini in Chrome. This is also going to be available to everybody soon. It's not out yet, but they're basically altering the Chrome browser. As I said, they're redoing their entire product lineup with AI. Now the Chrome browser is going to have an AI button at the top. And a lot of those extensions you might have seen over the past years, well, it's going to cannibalize those, right? You're going to be able to click the button and have an AI assistant on every site. I imagine this would also work on YouTube videos and all across the web; you could just use Gemini right there and chat with the website. So, that's coming.

And then there's a page on Imagen, if you care about the new image model: more realistic, more detail, but the main update is that it does text really well, which catches it up to GPT-4o. And then it's just a product page with an overview of everything. Oh yeah, and they also have a small model with 4 billion parameters for all you technical nerds out there, a 4-billion-parameter model that is almost on par with some of the top models out there. That's just an insane one, too. There is so much to talk about here, but I think this does it for the initial overview. So, I don't need to recap. You've seen it all.
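One last developer aside, as promised above, since the new Flash model is in AI Studio and the API right away: here's roughly what a call looks like with the google-genai Python SDK. The exact model string is my assumption; pick the current id from the model list in Google AI Studio.

```python
# Minimal sketch of calling the new Flash model through the Gemini API.
# The model string is an assumption -- grab the exact id from the
# model list in Google AI Studio.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash",  # placeholder; at launch this was a dated preview id
    contents="Summarize the Veo 3 announcement from Google I/O in one sentence.",
)
print(response.text)
```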
I do recommend you check out the Google I/O keynote. And on Friday's News You Can Use, I'm going to follow up on some of these stories. We're going to have a look at Project Mariner, and we're hopefully going to get access to Gemini Ultra and the new Deep Research and compare those. Until then, follow us on social media to check out more live updates. If you enjoyed this, don't forget to leave a like and subscribe for the Friday videos. And yeah, that's all I got today. Frankly, world-changing releases. All I'll say now is: let's see what OpenAI does to answer this, because they've got to make a big move to keep up. Good job, Google. I'm impressed.