# Googles New Text To Video AI "VEO" Is Actually AMAZING! (Googles SORA KILLER!)

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=Y68giShiKfQ
- **Date:** 03.06.2024
- **Duration:** 24:19
- **Views:** 18,016
- **Source:** https://ekstraktznaniy.ru/video/14272

## Description

Join My Private Community - https://www.patreon.com/TheAIGRID
🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Checkout My website - https://theaigrid.com/


Links From Today's Video:
https://deepmind.google/technologies/veo/

Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos.

Was there anything I missed?

(For Business Enquiries)  contact@theaigrid.com

#LLM #Largelanguagemodel #chatgpt
#AI
#ArtificialIntelligence
#MachineLearning
#DeepLearning
#NeuralNetworks
#Robotics
#DataScience

## Transcript

### Segment 1 (00:00 - 05:00)

Google have finally announced their Sora competitor, and I must say things are really starting to heat up, because this model is really good. Now, I've got to be honest with you guys: this is not exactly new news, but they have seemingly updated the model since it was announced at Google I/O, because the demo back then wasn't as impressive. So take a look at this. This new demo of a simple photo-to-video model looks absolutely incredible. If it weren't for Sora being released a couple of months earlier, in February, and this were something just coming out, we would say it was state-of-the-art and very much incredible.

So this is Google's Veo, their most capable video generation model to date. It generates high-quality 1080p videos that can go beyond a minute, in a wide range of cinematic and visual styles, and it accurately captures the nuance and tone of a prompt, providing an unprecedented level of creative control: it understands prompts for all kinds of cinematic effects, time-lapses, aerial shots, landscapes. Google have said that the model is going to be available very shortly, and that it's there to help create tools that make video production accessible to everyone.

Now I'm going to show you guys more of the demos, because they are rather impressive. For example, right here we have this image of a woman opening what seems to be some kind of crystal ball, not really a crystal ball, but some kind of rock that contains another piece of rock inside, and then we can see the video generated from it: a pretty stable video showing her opening the rock and giving us a look inside. That is pretty effective, if you ask me, given just the source image. Here we have another example. This is simply an image, and I'm not sure if it's AI-generated by their image generation software, but from that image we get this generated video, and we can see that the woman completely turns her head from left to right, and then the dog starts to blink and moves the other way. I think this shows the kind of examples we can have: that woman turning from there to there looks very consistent and very good. Another thing I think you ought to pay attention to is the lighting. The sunlight is coming from the right-hand side of the image, with rays beaming down this way, which looks really cool, and it's kind of hard to see, but as she turns around, the shadows still look pretty consistent as the sun comes down, which gives it that real effect. So it's pretty interesting that this model is able to accurately carry those details and, I guess you could say, truly understand exactly what's going on, with very much consistency.

Right here you can see a woman who seems to be in the middle of laughter, and when we click play, she opens her eyes, looks around, and blinks in a very realistic way. I would say this is definitely a state-of-the-art model for what it is: the quality looks amazing, the character consistency looks really good, and if we're being honest, we have to say this is at least on Sora's level, because of the way it looks.

Here we have the prompt "a lone cowboy rides his horse across an open plain at a beautiful sunset, soft light, warm colors". That looks as realistic as it's going to get. The legs look pretty normal, and the lighting, which is very, very hard to do, is super realistic. I'm honestly not even sure how the AI system has managed that, because lighting is something that is so hard to get right, but you can see that the character consistency holds as he moves with the horse: the sun remains the same, and then there's that ray of light, which is really impressive. You might be wondering why they show certain demos. For those of you who have ever shot film or analyzed these kinds of videos, I think you can truly start to grasp that even though this is AI-generated, there is a deep level of understanding of exactly what is going on in the image: the sun remains right there, the sunlight comes back around, and the entire lighting changes with it. There's literally a remarkable degree of accuracy from this model, and I have to say Google have done a very good

### Segment 2 (05:00 - 10:00)

job, and the shadows at the bottom look very realistic too. Now, there are also a few more examples. This one is quite like the Sora example: "an aerial shot of a lighthouse standing tall on a rocky cliff, its beacon cutting through the early dawn, waves crash against the rocks below". The waves look very realistic, and all of the waves along here look really nice, consistent, and coherent: there's one entire wave coming across, and then it literally manages to crash into the rocks. As I look at this, it's genuinely some remarkable footage, because it's truly impressive what these teams are able to do when they put their minds together.

Another example we have is "a time lapse of the northern lights dancing across the Arctic sky, stars twinkling, snow-covered landscape", and this one looks really nice. It's actually managed to capture the fact that the landscape is not moving at all while everything else is, and of course the northern lights across the entire time lapse, which is genuinely pretty incredible. The thing is, with a lot of these clips, it might be your first time seeing this kind of dynamic range, but I've seen a lot of clips like these before: time-lapses of the northern lights, footage of how sunlight behaves. What I'm trying to tell you guys is that this model is genuinely really good. I wouldn't be hyping it up if it were just a mediocre model, but this model is surprising me with the consistency of certain things.

Now, there's also this right here: "a fast tracking shot down a suburban residential street lined with trees, daytime, with a clear blue sky, saturated colors, high contrast". This is one where I'm guessing Google is probably trying to test how the consistency among the houses holds up as you move forward, because with any standard video model, what would happen is that as you move forward, the houses on the right-hand side, the trees, the grass, everything just kind of meshes and merges together. But what you're about to see is a little bit different. Let me show you guys this one right here: the consistency is remarkably impressive. For an AI-generated video this is just so strange, because you can literally see the trees staying in place, same as the grass, as you drive down the street, and everything looks very, very good. The only problem I'd say with this clip is that it isn't long enough, because I would have loved to see how it handled going further down the street in the same clip. Now, this is only a few of the actual clips they have, and we're about to get into some more of the demos, because honestly this is really impressive.

We also have "many spotted jellyfish pulsating underwater, their bodies are transparent and glowing, deep ocean". Here we can see this ocean clip, with the jellyfish being transparent and moving, and once again the character consistency is really good. I think what Google wanted to show is that even with complex characters and images it manages to remain intact, because the anatomy of a jellyfish is not easy to do. It's not an easy creature because it's so wiggly; it doesn't have any bones, it's basically like a piece of cloth, and simulating cloth physics with a generative video model is pretty hard, especially with the way things move underwater. So I'm guessing this example shows us how effective the model is.

There's also this right here, a time lapse of a sunflower opening with a dark background, and I truly like this example, because not only is it a time lapse, which means the model has to go from literally here to being able to predict exactly what's going to happen, and also predict exactly what's on the inside, it also has to do it in a fashion that looks like a time lapse, and that's exactly what I'm seeing here. We're seeing the model consistently render all the leaves; I'm not seeing any of the leaves actually meshing together, they look pretty individual, which is remarkably good for Google. The only thing about this is the way the center bit looks, but yeah, I think this also looks really

### Segment 3 (10:00 - 15:00)

good in terms of being able to do that. Now, there's another example that I really do want to show you guys, and this is "extreme close-up with shallow depth of field of a puddle in a street, reflecting a busy futuristic Tokyo city with bright neon lights, night, lens flare". The reason this one is so cool is that one of the things that's really hard to do, and one of the things older video games don't really have, is this kind of reflection. Effective reflections in puddles and surfaces were, if not a recent breakthrough, at least a feature games didn't really have until recently, which is why Nvidia actually made their new RTX graphics cards. So let me show you guys why this is so cool: you can see the different lights flashing in the background, and the lights are actually reflecting exactly what's going on, which is remarkably impressive. Now, I'd say this probably isn't the best example of the effect, because with so many colors going on it's kind of hard to see, but pay attention to this area over here: as the background goes blue, this area goes blue, and as it goes red, this area goes a bit red.

And I'll show you guys why this is so crazy. Three years ago there was the whole "RTX on versus RTX off" comparison, which is basically just about the kind of reflections that can make a game a lot more realistic. I think Cyberpunk 2077 is going to be the best example: if we look at how things are rendered with RTX off, it looks pretty standard, and then with RTX on there's much more realism in the actual rendering. You can literally see, side by side, that one is blurry and one is super realistic. The point is that this kind of rendering is being done by this model, and I'm not entirely sure how, but it's truly, truly impressive, and that's why this entire Veo model surprised me: a lot of the stuff being done here, like this close-up, is really done to a high degree. Looking at the RTX comparison just goes to show that Google had been working on this model, and when they wanted to release it and show us a demo, they really did want it to be good.

Now, this wasn't the only thing Veo is capable of. In the blog post they actually talk about controls for filmmaking. One of the things we can see here is that you're able to edit exactly what's going on, and I'm guessing that what they want to do is make this available not just in a film setting, but really good for you if you're using it in your video editing process. So this is actually going to be useful for those of you creating content, or any kind of video that requires inpainting or outpainting. You can see right here: if you click this, you get "drone shot along the Hawaii jungle coastline, sunny day", and that looks really nice and really good. But if you wanted to edit something into it, with the next prompt it's the same "drone shot along the Hawaii jungle coastline, sunny day", and the only thing they've added is "kayaks in the water". Then you can literally see, on the right-hand side, that along with the kayaks there is a decent reflection there, and it's really good because it allows you to edit things in with a simple text prompt. So I think this is really cool, because it shows what's possible when you're able to really control the output. This is something we'd have to see more of, because the only thing we didn't get to see is more examples of it. Now, one thing Google actually showed us was this one-minute-long video. When they initially released Veo, showing it at the Google I/O event, this was the clip they showed us, and I've got to be honest, it wasn't as good as some of the ones we've recently seen, which is why I say they may potentially have changed

### Segment 4 (15:00 - 20:00)

some things in the model somehow. But the reason I did like this clip, the only things I liked about it, were that it did maintain a little bit of consistency for quite some time, and there was this cool bit at the end that you're about to see, where the car literally goes through a tunnel and then it shows this whole new city, which is really cool. What's weird about it is that the video started out in one location, then moved to a completely different location, and just ended up on this strange trajectory. But I think it's kind of cool that they showed us this, because like I said, this last bit actually looks a lot more realistic than some of the other clips. It's very interesting, because what you're able to do is, I guess you could say, craft a storyline. With the controls here you can see there were literally four areas, one, two, three, and four: "a fast tracking shot through a dystopian sprawl", then "futuristic dystopian sprawl", then "a neon hologram of a car driving at top speed", and then "the cars leave the tunnel back into the real world city in Hong Kong". So for those of you who might have been confused the first time you saw it, that is where the video shows it's literally able to stick to whatever the user requests. And you can see they've literally stated that the video was not modified. I'm glad Google actually added this disclaimer, because previously they did get some flack for editing a demo video a little bit.

And I think, for whatever reason, Google's models haven't trained on enough data to generate really artificial stuff; for whatever reason they've only trained on realistic stuff, not these kinds of cities. This is a weird, niche thing you'd only notice if you use AI every day, but if you've ever used Google's image generation model, you'll know that when you're generating images of futuristic cities, the quality just isn't there. I was trying to generate a steampunk city, and I can guarantee you guys it was one of the worst images I've ever seen from an AI tool; it literally looked like something from DALL·E 2. That's not to bash Google, but the point is that once Google upgrades this model to have trained on that kind of data, I can promise you guys it's going to be really cool. The point here is that Google's model, whatever it is they've used, works really well with whatever is actually realistic, which is why in this area, when we go back to the real city, you can see just how much the quality increases.

Now, there's also this right here: "a moody shot of a central European alley, film noir, cinematic, black and white, high contrast and detail". This is a really nice one, because it shows it's able to get the European alley, the noir cinematic look, and of course the moody shot. To those of you who might not understand, what this shows is that it's able to get the theme, the location, the high contrast, and the camera angle, the whole moody feel of it; it really captures the essence well. Of course, there was this example as well, which is really nice: "a crocheted elephant in intricate patterns walking on the savannah". The reason I like this so much is that it's something you would just never see, but Google's model shows us exactly how it would turn out, and it's really cool because, for whatever reason, this elephant just kind of looks somewhat realistic. It is in slow motion, but it still looks very effective.

So I think, overall, what we have here with Google's Veo is truly a stunning model, because even the way the rabbit kind of looks around as it's being held, even though it is a little bit slow motion, you can see the rabbit move right there. And there's the example where the woman literally turns from right all the way over to left and blinks instinctively, which strangely enough looks really good. And here, where we've got the light flares and the man still on the horse, this just completely looks realistic; it doesn't even look AI-generated at all. I think overall what we can see here, in terms of the consistency, being able to consistently generate scenes, accurately represent flickering lights, change what's going on in the image (for example, editing kayaks in), and do one-minute-long videos where you're able to

### Segment 5 (20:00 - 24:00)

literally prompt it four different times, I guess it means that in the future, maybe in 10 years, we're going to be making movies with this kind of thing. It just goes to show how great Google are when they truly do work on something. So in a moment I'm going to show you this video from Google, and I think it's really cool, because in it they show you a little bit more of what the software is going to look like. You can see right here, this is Veo, with something like "a shot of a 916 convertible driving up to a Spanish Mediterranean" kind of place, which looks really cool. You've got all these people working with it, and you've got this other clip right here, which you can see is the unedited raw output. Another thing I thought was really cool was that in Veo you could see four outputs at a time, so I'm guessing it's probably going to be, in some respects, kind of like Midjourney or other AI tools that give you the ability to create multiple clips at the same time. And from the consistency of these clips, they do look really good.

Like I did say before, one thing I am noticing is this kind of slow-motion look, so I am wondering if that is just a characteristic of their model. It doesn't really matter that much; honestly, you'd be surprised. When I used to edit videos, I remember just finding something that was slowed down and speeding it up, so that's not something that's too bad to deal with at all. If you're watching this clip, I think it's pretty cool how much detail you can actually add, and honestly I can't wait for Google to just finally release this, because it would be really nice for us to finally have it.

"Well, I've been interested in AI for a couple of years now. We got in contact with some of the people at Google, and they had been working on something of their own, so we're all meeting here at Gilga Farm to make a short film. The core technology is Google DeepMind's generative video model that has been trained to convert input text into output video. It looks good. We are able to bring ideas to life that were otherwise not possible; we can visualize things on a timescale that's 10 or 100 times faster than before. When you're shooting, you can reiterate as much as you wish, and the feedback we've been hearing is that it allows for more optionality, more iteration, more improvisation. But that's what's cool about it: you can make a mistake faster. That's all you really want at the end of the day, at least in art, just to make mistakes fast. Using Gemini's multimodal capabilities to optimize the model training process, Veo is able to better capture the nuance from prompts, and this includes cinematic techniques and visual effects, giving you total creative control. Everybody's going to become a director, and everybody should be a director, because at the heart of all of this is just storytelling. The closer we are to being able to tell each other our stories, the more we'll understand each other. These models are really enabling us to be more creative and to share that creativity with each other."

Google basically said this is going to be available soon through a waitlist, so you can sign up here, and maybe you'll get to use it fairly soon. I think this is something that's really effective and looks really cool. So let me know what you think about Veo: do you think Veo is better than Sora? Do you think it's worse? Do you think Google should just hurry up and release this? Let me know your thoughts, because I would love to see them. One of the only things I didn't see too much of from this demo, and I'm not trying to bash Google at all, is that a lot of these clips were in kind of slow motion, so I'm wondering if there are any that are going to be in much faster motion. Of course, you could always add 1.5 times speed, but maybe that's just what the native output of Veo is. So I think, overall, competition is good, and this shows us that Google is not out of the AI race yet; they've got a lot of cool things they're going to be releasing that truly will shake up the game.
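As an aside, the "1.5 times speed" fix mentioned above is easy to script. Here's a minimal sketch using ffmpeg's `setpts` (video) and `atempo` (audio) filters; the file names are placeholders, and this is just one common way to retime a clip, not anything Veo-specific:

```python
import subprocess  # only needed if you actually run the command


def speedup_cmd(src: str, dst: str, factor: float = 1.5) -> list[str]:
    """Build an ffmpeg command that speeds a clip up by `factor`.

    setpts=PTS/factor retimes the video frames; atempo=factor speeds the
    audio to match (a single atempo instance accepts factors 0.5-2.0).
    """
    return [
        "ffmpeg", "-i", src,
        "-filter:v", f"setpts=PTS/{factor}",
        "-filter:a", f"atempo={factor}",
        dst,
    ]


# To actually run it (requires ffmpeg on your PATH):
# subprocess.run(speedup_cmd("veo_clip.mp4", "veo_clip_1_5x.mp4"), check=True)
```

Dropping every other frame in an editor works too, but the filter approach keeps all frames and simply re-times them, which tends to look smoother.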
