Googles New AI Is STUNNING (You can create anything!) Native Image Generation

18:37

TheAIGRID · 14.03.2025 · 36,420 views · 1,099 likes

Video description

00:23 Consistent Characters
01:04 Image Editing
02:04 Deadpool Suit
02:52 Recipe Demo
04:31 Game Character
06:31 Gameplay Interaction
07:13 AI Games
07:36 Text Rendering
08:15 Gemini vs GPT-4
09:15 Style Changes
10:56 3D Modeling
12:06 Selfie Editing
13:34 Post-Truth Era
14:09 Passport Photo
15:31 Art Styles
16:34 Image Colorization
17:20 Pose Conversion
18:31 Closing Thoughts

Join my AI Academy - https://www.skool.com/postagiprepardness
🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Check out my website - https://theaigrid.com/

Links From Today's Video:
https://x.com/ilumine_ai/status/1900017235898622025/photo/3
https://x.com/linaqruf_/status/1899977818563633466
https://x.com/OriolVinyalsML/status/1899853815056085062
https://x.com/emollick/status/1900056829683462234

Welcome to my channel where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos. Was there anything I missed?

(For Business Enquiries) contact@theaigrid.com

Music Used
LEMMiNO - Cipher https://www.youtube.com/watch?v=b0q5PR1xpA0 CC BY-SA 4.0
LEMMiNO - Encounters https://www.youtube.com/watch?v=xdwWCl_5x2s

#LLM #Largelanguagemodel #chatgpt #AI #ArtificialIntelligence #MachineLearning #DeepLearning #NeuralNetworks #Robotics #DataScience

Table of contents (18 segments)

Consistent Characters

show you guys why this is so crazy. So one of the things that we actually have is consistent characters being solved. One of the demos that they showed us was that Gemini 2.0 Flash can essentially tell a story and illustrate it with pictures. In this example, you can see that there is a goat there, and then you can essentially change the goat to be doing different things. Of course, there were different models in which you could do this, but the really impressive thing is that with Gemini 2.0 Flash the accuracy is truly impressive: this is the exact same animal. You might be thinking, okay, it can make images that are just slightly good in a story way, but trust me, I will show you guys exactly what I mean. So one

Image Editing

of the things I wanted to try was to see just how good this model was at editing images. So I put "create a pic of Deadpool with a white background". I wanted a white background so that I could potentially avoid any kind of mistakes that the AI image model might create, and you can see right here it generated a very simple image of Deadpool. Then I said "make him fold his hands", and you can see it immediately managed to make Deadpool fold his hands, and it maintains absolutely everything. If I flip back, you can see that it's the exact same picture. The crazy thing about this is that it doesn't regenerate the entire picture; it only regenerates the parts that it needs to, while leaving the rest of the image untouched. Now you can see right here I said "now make him stand on one leg", and once again we have Deadpool, and his pose has changed to standing on one leg. I can already think of a million different ways to use this, but there are still so many more use cases, so you're going to want to continue watching. Then I said, okay, make him wear

Deadpool Suit

a suit, and you can see it managed to immediately put a suit on him in this amount of detail. This is something that I personally think is super useful, because it shows us how things may change once AI editing gets here. The AI was able to do this in around 5 seconds every single time, and it managed to maintain character consistency. Going back to the first one, when we talk about character consistency here, I'm not sure what kind of character consistency they're using, because it doesn't just regenerate the same character with the same description. I really can't tell any differences between the character I created and the second image. So I'm not sure how it does it, but it has a really accurate image generation tool that I'm pretty sure a lot of people will be using, especially since the API

Recipe Demo

is out now. And now this is where we talk about the incredible world model of Gemini. You can see right here we ask it for a recipe for a chocolate chip cookie, "and please include images for each step". Now this is crazy, because it understands exactly what each part of the image should look like, and then it manages to generate exactly the next image after. You can see right here the eggs are being added, then the ingredients are whisked together, and you can see exactly what that looks like. Then you can see what the fifth step looks like and the sixth step, and it all looks just incredibly perfect. So this is going to be something that is absolutely insane for a variety of different use cases. You can see right here that everything looks pretty much perfect. I'm not sure Photoshop could do a better job than this; maybe someone is out there with better Photoshop skills than me. But just having it be able to edit an image and, for example, generate a recipe where you can see all the different stages of exactly what that's supposed to look like, I think that is really fascinating. The world model of this system is really good because it understands the next states of certain things. Let's say, for example, you took a picture of your baking tray or your bowl and asked, okay, what is it supposed to look like next? It's able to show you exactly what your bowl is supposed to look like next if you're completing the steps right. I think that is really interesting because it's going to open up a variety of different use cases. And of course, like I said before, this isn't behind some kind of paywall or some long-winded software. This is something that

Game Character

you can use, and it is incredibly fast. Now the crazy thing here is that there was one Twitter user who did something that I think is one of the most creative things ever: essentially, they created a game with an AI model. What they had here was a hero character in the style of Ghibli. I'm not actually sure what that is, but you can see right here that they generated the character. Then you can see it says: put this character in game; the image needs to be like a typical screenshot of the gameplay, and the game style like Genshin Impact. Then you can see it generated this screenshot on the bottom right here, an incredibly realistic screenshot of the game. Honestly, I cannot believe that this is coming out of an AI model, because it looks like a screenshot from a mobile phone, and you can already see where this is going. This first screenshot really did surprise me, because we can see the same exact character in the game pose, and not only that: the background, the perspective, the HUD icons, everything was pretty much perfect. But the next thing shocked me even more. The user wanted to take things even further, so let's take a look at what they did. They said, okay, "move the character forward", and you can see that the character moves forward and their legs start to run. Then, when they moved closer towards the building, you can literally see that it managed to move forward right here. And the thing is, there don't seem to be many distortions. When we look at the perspective, we can see that the background still matches up: this mountain here and the clouds are still in the exact same place, they just look a little bit bigger, and the details look seemingly the same. All of this looks very consistent, and that is super impressive, because I never thought that we would get this level of controllability and this granular level of detail, simply with text prompts, so early on. And then you can see he

Gameplay Interaction

decided to take it even a step further and said, okay, "get even closer to the wall", and then he said "climb that wall", and we can see that the character actually managed to climb that wall. This was something that I didn't expect people to try, but when you have software out there, there's going to be a million different ways that people try to use it. For me, this was completely out of this world. And when we think about AI-generated video games, imagine an AI that is literally just generating these frames and doing some kind of diffusion generation in between those frames. Just imagine how crazy that is going to be with this kind of consistency. I can already see that AI

AI Games

games might be a big thing in the future. Now, another good thing that this model has is really impressive text rendering. In this demo right here, we can see that someone wanted an old, detailed vintage 35 mm photograph, from a front view, of a vintage computer monitor setup, and you can see that the text looks very accurate. I don't

Text Rendering

think there are many models out there that can actually complete the entire text without some minor mistakes. It seems like Google is probably the only company right now that can render text correctly pretty much 100% of the time, bar one or two mistakes, and it is really surprising just how accurate this model is. So overall this is truly impressive, considering the consistency and its capabilities, because it allows for many different use cases. Now, in addition, with text rendering we can also see this prompt: generate an image of a classroom with the teacher writing the following words on the blackboard with yellow chalk: we got

Gemini vs GPT-4

Gemini Flash native image output before GPT-4o. This was kind of throwing shade at OpenAI. I don't remember who tweeted this, but I will have links to everything in the description. Of course, there was this, released by OpenAI some time ago, so I'm going to show you guys those slides now. You can see right here that we actually had this a year ago from OpenAI, so I don't know why OpenAI didn't release it. Maybe they shifted their compute to something that was more important at the time, but it seems that Google has the compute to offer this. I do remember this tweet went viral and a lot of people were excited for it; it had 2.3 million views, and this was something that was inside the GPT-4o system. Now, additionally, if we take a look at what Google is able to do, we can see that the same kind of image was prompted, and someone said "Checkmate". This is where we can see the image that they generated, and I think they

Style Changes

both look pretty accurate. Now, another thing that this allows you to do is to switch styles easily. You can have a sketch, and then you can say "make a line art out of this sketch". Then you can see that someone said "give it some base color", which was really cool, and then "add some soft shading, the source of low light is on the left upper corner", and you can see it managed to do that in a really accurate way. Overall I think this is just so impressive, because you can get really creative with it. For example, after that you can add some background, some indoors, and then they managed to make it monochrome grayscale for some kind of illustration. Think about the use cases here: for those of you who are creative, I think there are going to be tons and tons of different use cases where you can use this to do many different things. And of course, I will say there are some meme capabilities here, because you can see right now that we had this image and someone said "make them look more relaxed and happy and have them holding an ice cream and don't change anything else", and it immediately used the exact same art style and put two ice creams in their hands. Honestly, if you were to ask me whether this image was changed or not, I wouldn't have been able to guess. Now, there have been more and more examples that have been really impressive, one of them being this example where someone takes a picture of Morgan Freeman and uses it to create a 3D model of the character. This is super impressive and not something that I would have expected the model to be able to do at all. And since I was making the video

3D Modeling

I've just seen multiple different tweets about various ways that people are using this new software to fund their creative endeavors, and the ways that people are using this are really impressive, because I never thought that this was even possible. I'm genuinely not sure how this is even done at a base level. I know diffusion models are there, but to be able to take this image and then recreate it like this, I'm not sure what kind of sorcery Google is doing here. This is definitely a really crazy cook, and I would expect this to have a wide variety of use cases, especially for individuals that are trying to visualize different things. Imagine you are trying to 3D model his head and you want to see whether it is going to look like the version that the AI has already managed to cook up. So you can already start to see why this tool is incredibly powerful. Remember, this was done in around 4 to 5 seconds, versus someone using a tool like ZBrush or any other creative software to 3D model and taking hours on end. Now, another example of someone using

Selfie Editing

this for a creative endeavor was this tweet right here, which talks about how someone was using this to pretend that they are no longer late for work. It says: "POV: you're already late for work and you haven't even left home yet. You have no excuse. You snap a picture of today's fit, and then you open Gemini 2.0 Flash Experimental." Here we can see that the woman has her selfie, and then this changes that selfie to show her at the tube station or the train station giving a thumbs up. This is really interesting, because let's say you were late and you needed to take a picture somewhere: this is something that you could easily use to fool your boss that, look, I'm 5 minutes away. But at the same time it does make me wonder whether we are moving to an era of these images. I don't think they're AI-generated in the usual sense: an AI didn't just create it all by itself; it took the initial image data and then merged it with AI to create something hyperrealistic. So this is quite powerful, but at the same time it's very dangerous, because we now have a situation where how do we know what to believe when it's blended with someone from reality? And

Post-Truth Era

considering the text is basically perfect, you wouldn't realize that it is fake either. So I do think we may enter the post-truth era, where a lot of the things people see online they probably won't even believe are real. I'm guessing as well that famous people may be excluded from this application, in terms of, like, if you put their face in, because I'm pretty sure you could alter pictures of famous people and have them, I don't know, take a selfie with you and do a variety of different things. So of course

Passport Photo

right now AI is powerful but also very risky. Now, another example was where someone said Gemini Flash 2.0 Experimental will save you a trip to the Walmart portrait studio. Here you can see we have a photo of a woman and another woman sitting at a table having some coffee, and then it says "create a square passport photo of the blond woman on the right with a neutral face expression on a white background". I'm pretty sure you can see why this is super powerful: imagine using one half-taken picture of you to immediately get a picture that you could use for passport verification. This is something that I personally believe, once again, is super powerful. Like I said, Google really did cook with this model, and you're essentially able to change things just based on what it's seeing. So even if you didn't take the picture at the perfect angle, you're able to get down to the granular level of exactly what you want, including the background, your facial expression, the lighting, and all of those things combined, to create whatever kind of output you wish. Another example here is of Gemini being able to copy art styles. You can see

Art Styles

right here we have this art style. I'm not exactly sure what it is called, but Gemini, using this same art style, is able to generate an image of a dog. So if you ever see something online and think, I have no idea what that art style is and I'd love to recreate it, you can use Gemini Flash 2.0 to essentially copy that art style, get inspiration, and create images for yourself in any way or shape that you want. This is truly powerful, because a lot of the time we don't know the names for things. I see certain brands with color schemes, color palettes, and assets that they use in a way I haven't seen before, and I'm just thinking, how on earth did they manage to do that, and how do I even begin to use that art style? So this is of course good for people that want to copy art styles, but maybe not for those of you who want to preserve your own. Either way, overall I do find this to be an incredibly powerful tool. Another one

Image Colorization

that you can see here is that this also allows for instant colorization of images. If you have an old image of something, you could use this to colorize it. Maybe you are on the older end of my audience and have a few pictures in black and white that you want to colorize: Gemini Flash 2.0 is going to easily allow you to do that with one simple prompt, "just colorize this". So far, looking at this image, everything seems to be pretty accurate. Of course, in some images there are going to be certain things that are a specific color, but overall I think this is an incredibly powerful tool, especially for revitalizing those

Pose Conversion

pictures. Now, something that I also found to be really important was the fact that you can convert poses, and the reason for converting poses is to use them for 3D models. You can see right here you had this potato dressed as Wonder Woman, and it says "make it stand in a T-pose". Most people don't understand that a T-pose is really useful, because it allows you to create 3D models from that pose. You can see this user managed to do that, and if you have access to AI-to-3D tools, you're literally able to take these images and get a 3D character. So I'm pretty sure that you can now make any simple AI character into a 3D model, considering this workflow allows you to change the poses. This wasn't a big issue, but T-posing was something that was pretty hard to do. I do remember sometimes trying to do it manually in Photoshop and other software, and the AI just wouldn't recognize certain body parts, so this definitely solves a giant issue for creatives. So with that

Closing Thoughts

being said, hopefully you guys have enjoyed this video. Definitely try this out, and I'll have more videos covering this soon.
