Google's Secret Image Editing AI & More AI Use Cases

The AI Advantage · 22.08.2025 · 11,816 views · 462 likes · updated 18.02.2026
Video Description
Save time & headaches on your next project with Jam.dev! Get started for free today 👉 http://bit.ly/41k8esj

In this video, Igor showcases the image editing capabilities of the mysterious Nano Banana model and Qwen-Image-Edit, tests the new Eleven Video feature from ElevenLabs, discusses the new AGENTS.md format for unifying agent instructions, and more. Enjoy!

Free AI Resources:
🔑 Free ChatGPT Templates: http://bit.ly/4fUlFFs
🌟 Tailored AI Prompts & Workflows: http://bit.ly/3UE03Dx

Go Deeper with AI:
🎓 Join the AI Advantage Courses + Community: http://bit.ly/4mNtW0J
🛒 Shop Work-Focused Presets: http://bit.ly/3UK9scJ

Links:
https://x.com/romainhuet/status/1957924964105179455
https://agents.md/
https://x.com/elevenlabsio/status/1956406489356333225
https://elevenlabs.io/studio/video-to-music
https://elevenlabs.io/blog/eleven-music-now-available-in-the-api
https://elevenlabs.io/docs/best-practices/prompting/eleven-music
https://x.com/Clad3815/status/1955882035320365480/photo/1

Chapters:
0:00 What’s New?
0:18 Nano Banana
2:34 Qwen-Image-Edit
4:05 Jam
6:33 11Labs Video-to-Music
8:34 Pixel 10 AI Features
9:19 AGENTS.md
10:36 ChatGPT Typos
11:22 GPT-5 is a Pokémon Champ
12:01 ChatGPT Go

Connect with Me:
💼 AI Advantage on LinkedIn: http://bit.ly/47bqlV4
🧑‍💻 Igor Pogany on LinkedIn: http://bit.ly/3HO0kAO
🐦 Twitter/X: http://bit.ly/41fg3j6
📸 Instagram: http://bit.ly/463ZGbF

#ai #nanobanana #jamdev
This video is sponsored by Jam.dev.

Table of Contents (10 segments)

  1. 0:00 What’s New? 78 words
  2. 0:18 Nano Banana 494 words
  3. 2:34 Qwen-Image-Edit 373 words
  4. 4:05 Jam 555 words
  5. 6:33 11Labs Video-to-Music 437 words
  6. 8:34 Pixel 10 AI Features 172 words
  7. 9:19 AGENTS.md 274 words
  8. 10:36 ChatGPT Typos 184 words
  9. 11:22 GPT-5 is a Pokémon Champ 127 words
  10. 12:01 ChatGPT Go 269 words
0:00

What’s New?

This week in generative AI, we have several apps that can do Photoshop-level edits just by speaking to them, and some interesting standardization coming to how you provide agents with context. That and a few more stories in a lightweight edition of AI news you can use, the show that looks at all the releases in generative AI from the last week and highlights the ones that you can use or that matter.
0:18

Nano Banana

Starting off with the story that everybody has been talking about over the past few days: Nano Banana. That's the name of a tool that has been going around on LM Arena, and if you're not familiar, this is a platform that ranks various models on their capabilities. In this case, Nano Banana is an image editing AI model. So you give it an image and words, and it edits the image. You might have experienced this concept in ChatGPT: when you upload a picture and ask for some edits, it can do that too. But as you might know, if you try to edit with ChatGPT's image generation tool, it will change everything about the picture, including the look of the person you're trying to edit, making it not even an option for most use cases. Nano Banana is different, but it's not a fully publicly available thing. Rumor has it that Google is testing this tool against other models on LM Arena. And right now, the only way to really try it yourself is to have a little patience and cycle through various tools while comparing two of them in LM Arena's interface; eventually, you're going to find one with the name Nano Banana. We did this for you and tested the image editing capabilities with a prompt that you can now see compared to some of the competition. Now, here's the thing about the model: it's very good at editing stuff locally and adhering to your prompt. Meaning, if you tell it you only want to edit the jacket and you want to make it a certain material, it will actually do it, whereas most other models are kind of hit and miss, as you can see in these examples. It has its weaknesses with text, but I was actually kind of impressed by how good it is at, for example, replacing a piece of clothing like we did in this demo here. And this is giving Photoshop a real run for its money, because that classic tool to alter images has a learning curve.
Not everybody can just use that intuitively, whereas everybody can type what they need into a little text box and get results. And changing jackets is just one use case if you really have the power to accurately edit images with just words. So you could do image restoration, like turning these historical photos into something current; look at the accuracy between those two. You can change the lighting in a scene, or take one base image and recreate it in many other copies. I mean, look at the character reference here between the various shots. Over here they look surprisingly similar, whereas with all other models they usually just look like a different person, ChatGPT actually being one of the worst ones when it comes to image reference. So this is really pushing what's possible with
2:34

Qwen-Image-Edit

these image models. And there was even a second model that I want to follow the story up with, released out of Qwen this week: Qwen-Image-Edit. They lead with this example of one character turned into all of these variations, and again, the character reference on this is becoming so strong; we've never really seen it at this level before. Look, this is the input image, and then you can create all of these variations. They have a lot more examples on their blog post. And if you actually want to try one of these tools, this is the one that's easy to access: if you go to chat.qwen.ai, you can use the image editing tool right in here. My team and I have actually been playing with these tools. In particular, this Qwen model is really good at text. You can do things like giving it a thumbnail like this one, of me riding a camel looking for the best AI search engine, and then telling it to change the text to "best burrito in town"; just don't forget to enable image edit right there. Let's have a look. And that is not perfect. Let's give it one more try. Meanwhile, I'll show you an example that came up during our testing here. It just nailed it, right? The gradient stayed the same; it got the font exactly right. Let's see if we can replicate that here in our live testing. No, it's really having trouble with the word burrito. One final try, maybe. No, it actually did not work flawlessly. So, it looks like it might work for a word or two, but four is too much. And Nano Banana isn't even good at text; it's just good at everything else. So, let's try one more: replace the map with a smartphone. Okay, the hands are kind of good, but not great either. So, yeah, clearly some of these Nano Banana examples are the most impressive thing we've seen out there, but it's kind of hard to access for now. Still, it's interesting as an example of how quickly even these image models are progressing these days. One thing that
4:05

Jam

all top AI models like GPT-5, Gemini 2.5 Pro, and Opus 4.1 have in common is that they're incredible at writing code. But just producing good code isn't what truly makes a developer or a dev team great. You need code that works, code that you can actively improve on, and that's often more important than simply writing it. When you're testing, the problem is often that if you run the code, you don't exactly see what happened when it ran. And it doesn't matter how smart the model is if it has to guess what happened; it's missing crucial context. That's why today I want to show you a tool that gives you that missing context: Jam, the sponsor of today's video. So here's how Jam works. If you're in the process of testing a new feature and something doesn't go as expected, instead of having to do the classic AI debugging thing where you take screenshots, copy console output, or sometimes even try to reproduce the bug yourself, you can simply use Jam's instant replay. And then here's what happens, and this is really interesting: Jam takes the last few seconds of what happened on your screen, and it fills out the entire ticket for you, including technical details, a step-by-step reproduction, and even a suggested fix. It also automatically adds all network requests and console outputs, all the things you would usually be manually pulling together to give the AI model a chance to debug efficiently. And it packages all of that in one shareable link. They even packaged all of this in an MCP server, so you can easily call it from within Cursor or Claude Code, giving all these apps the ability to essentially look at everything that matters, from the screen recording to the logs, and really understand what went wrong so the AI model has a chance to fix it. Need that context anywhere else? With one click you can send it to Jira, Linear, Asana, Notion, or GitHub, you name it.
Honestly, I can't stress how much time this actually saves, even if you're not doing this work yourself. If you're working with a developer, rather than just saying "hey, I tried the form and it doesn't work" and then having them follow up with multiple questions, it's so much easier if whoever is testing the app has this at their disposal, so you can hand it off to a developer or an AI and really paint the full picture. I've been sitting here copying console logs and taking screenshots since forever, and this is just a better way to do it. Recently I worked with a developer, and we went through this entire process many times; it was such a tedious process. When I learned about Jam, I was genuinely upset that I didn't have it for that project. So if you're a developer, project manager, or QA, save time and headaches on your next development project and go check out Jam today. You can get started on a free plan by clicking the link at the top of this video's description. Thanks again to Jam for sponsoring this video. And now on to the next piece of AI news that you can use.
6:33

11Labs Video-to-Music

Okay, so this one, as a passionate video creator, is incredible. This is something I could have used back in the day when I was manually editing videos. If you're not familiar, in my early 20s I started out by doing cold outreach to various nightclubs in the city I lived in, Vienna, and then from around 21 to 23 I was creating two free event videos a week. Then I moved on to corporate events, weddings, course production, and so on. Music was an essential part all along the way, and the only way to get music for my videos back then was licensing it from various sites. Now, you might know that with AI you can generate it, but there are also all these new tools like the one we're seeing now. And to be clear, this is not a first, but this ElevenLabs music model has been just so good that I can't wait to try this out. It's a video-to-music tool, and it does exactly that. We're going to try it now: upload a video and see what kind of music it generates for it. I just have some random AI videos lying around here; let's see what it does with them. This is one smooth animation that I kind of like. Going to upload this, and without even logging in, this should give us a song for it. Ooh. Yeah, that is so fitting. I wasn't even sure what I would put there myself, but this type of tool makes me want to create unique videos again. Okay, let's try one more. There's this little video of, I don't know, an animated me waving; let's see what it does here. I suppose this vibe should be more of an elevator music or something friendly. Yeah, exactly right, like a corporate jingle. Okay, I have one more clip right here. Want to see what it does? I mean, it's the Pulp Fiction scene. That is so fitting. It kind of nailed all three of those. And you can try that without even logging in, which is kind of nice. Relating to that story, I'll just quickly point out that they also made their Eleven Music product available via the API now, which wasn't the case last week.
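For developers who want to plug that into a pipeline, here is a minimal sketch of what a call to the Eleven Music API could look like. To be clear, the endpoint path and the payload field names below are assumptions based on ElevenLabs' usual REST conventions (api.elevenlabs.io, `xi-api-key` header), not lifted from their docs, so verify them against the documentation linked in the description before relying on this.

```python
# Hedged sketch: requesting a generated music track from the ElevenLabs API.
# ASSUMPTIONS: the "/v1/music" path and the "music_length_ms" field name are
# guesses based on ElevenLabs' API conventions; check the official docs.
import json
import os
import urllib.request

API_KEY = os.environ.get("ELEVENLABS_API_KEY", "")


def build_music_request(prompt: str, duration_ms: int = 30_000) -> urllib.request.Request:
    """Build (but do not send) a POST request describing the track we want."""
    payload = {
        "prompt": prompt,                # plain-text description of the music
        "music_length_ms": duration_ms,  # assumed field name for track length
    }
    return urllib.request.Request(
        "https://api.elevenlabs.io/v1/music",  # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
        method="POST",
    )


req = build_music_request("Upbeat corporate jingle, light and friendly")
print(req.get_method(), req.full_url)
# Sending would be: urllib.request.urlopen(req) -> audio bytes to save to disk.
```

The request is built separately from being sent, so you can inspect or log the exact payload before spending credits.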
They also dropped a bunch of guides in their documentation. So if you wonder how to prompt it and how to really get the most out of this tool, they shared various best practices there. All right, let's move on. Google made a bunch of
8:34

Pixel 10 AI Features

announcements. None of it is really available yet, but there's one thing in particular that I wanted to highlight in this video. They announced their Pixel 10 with a bunch of AI features and an AI-focused chip; it's basically an AI-focused phone. They also had some Google Home announcements. But there was one thing in here that I really wanted you to see, so you can prepare for a future where you can make phone calls that are live-translated into any language. I mean, this phone is going to be coming as soon as October. To do low-latency live translation, you need a lot of computing power on the device, and apparently the new chip makes this possible. You can translate between all of these languages right here. This one in particular I just can't wait to try as soon as it's available. I could use that in Portugal, dealing with the super slow bureaucracy where I don't even understand it many
9:19

AGENTS.md

times. Okay, next up we've got to talk about a trend that I absolutely love, and that's the unification of standards across the entire generative AI space. In this case, I'm talking about the AGENTS.md format, which is looking to be the universal format for all agent context files. I think this Twitter post perfectly outlines the utility of the idea. shadcn, the creator of the popular front-end library, shares that this is what agentic projects commonly look like: you have a bunch of different rules files, you have instructions, you have a CLAUDE.md if you're working with Claude Code. This new AGENTS.md format is looking to solve that, and the OpenAI developer account even acknowledged it this week. And that's basically the story: it's just looking to unify all of these context documents that agents use to maintain their memory and give you more consistent results over time. They're an essential part of the context these agents work with. I hope to see this adopted by all the coding platforms, because then it's going to make a lot of stuff interoperable. With this one file, you could easily transfer the different tools and context that you have in certain apps and pick the work up with a different tool, which is just not the case right now; there's quite a bit of lock-in, as every system functions a bit differently. I just love this idea of a future where everything is interoperable and open, and the power lies in the user's hands. This is a step in that direction. And that's why I
10:36

ChatGPT Typos

really wanted to cover this. And now on to this week's quick hits, a few stories that I thought were worth your attention. The first one is ChatGPT typos. Matt Wolf was tweeting about this, and it was a hot topic in our community this week; multiple members confirmed that they have this same issue of ChatGPT making typos, something we've never seen before. What this really means is that they're tinkering with the model while it's live. They're not saying what they're changing, but they're switching it up, just like they have since the release of GPT-5. But this is a really weird one, because we've never seen ChatGPT make typos before. I mean, what kind of training data are they giving it that has typos in it? If I think about that, it's got to be conversational data, you know, like chats or Discord server rooms. You're not going to get typos in books or video transcripts or even blog posts. Hm. Not sure what to really make of that, but I just wanted to tell you about it.
11:22

GPT-5 is a Pokémon Champ

Another story that I always love is these gaming benchmarks for LLMs, and there's been a change: GPT-5 is the new king, because it actually managed to finish Pokémon Red. For some fun facts around this, the entire 162-hour run cost about $3,500 in API credits. We did some research and couldn't find exact times on this exact benchmark to compare to GPT-5, but I remember that the Gemini model got comparable results to o3, and there is this graph that compares it to the o3 performance. As you can see, it took 17,000 steps for o3 to complete this, whereas GPT-5 did it in 6,000. Just another way to look at AI progress and how quickly things are coming along. And
12:01

ChatGPT Go

then also, I thought it was super interesting that OpenAI launched this initiative to bring ChatGPT to India. They have a brand new subscription tier that costs $4 for 10 times higher message limits than the free version. I want to see the entire world benefit from this technology and these use cases, and I can tell you already from my YouTube analytics that India is the country consuming English AI content the most outside of the Western world. Sure, there are a lot of people who speak English there, but I also think there's a tech literacy and a curiosity. I hope the big companies keep serving up plans like this to create more opportunity, especially in the software space. And that right there is pretty much everything we have for this week. It was a bit of a shorter one, but hey, it's summer; this is how I expected things to go. But then we got the week in early August where everything dropped at the same time. At this point, I want to share one thing about the channel, and that is that we're refocusing on new YouTube content. We have several very interesting and higher production value videos in production right now, and I can't wait to share some high-value tutorials and super long-form content that teaches at a level of depth we haven't covered before on this channel. So, keep your eyes peeled for that. Other than that, we'll keep doing this show every Friday. My name is Igor, and I hope you have a wonderful day.
