Google's VEO-2 Just SHOCKED The ENTIRE INDUSTRY! (OpenAI SORA Beaten) Full Breakdown
10:48


TheAIGRID · 16.12.2024 · 54,889 views · 1,072 likes

Video description
Join my AI Academy - https://www.skool.com/postagiprepardness
🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Check out My website - https://theaigrid.com/

Links From Today's Video: https://deepmind.google/technologies/veo/veo-2/

Welcome to my channel where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos. Was there anything I missed?

(For Business Enquiries) contact@theaigrid.com

Music Used:
LEMMiNO - Cipher https://www.youtube.com/watch?v=b0q5PR1xpA0 CC BY-SA 4.0
LEMMiNO - Encounters https://www.youtube.com/watch?v=xdwWCl_5x2s

#LLM #Largelanguagemodel #chatgpt #AI #ArtificialIntelligence #MachineLearning #DeepLearning #NeuralNetworks #Robotics #DataScience

Contents (3 segments)

Segment 1 (00:00 - 05:00)

This is Veo 2, Google's second iteration, and somehow it manages to surpass every single video model currently available, and that includes the recently released Sora, too, from OpenAI. That's really surprising considering Google haven't had the best track record, but it seems that this December Google have surpassed everyone's expectations of what they are capable of in AI development. We've had update after update showing us that Google are clearly now the industry leader when it comes to AI. This marks a historic moment, because AI is more competitive than ever, and Google are topping the leaderboards with software that is not only the best but the best by a decent margin. This means Google is clearly back on their AI game and is setting new standards for other industry leaders to live up to.

Now, like I said before, not only is this visually appealing and visually striking; when you take a look at Veo's actual benchmarks, you can see how it compares to the others and how those other models performed. You can see right here that Meta's Movie Gen, a 1080p video generator that is remarkably good, doesn't manage to match up: it is only preferred around 30% of the time, while over 50% of the time Google's model takes the cake. The same goes for Kling 1.5, a model that I know a lot of people in the creative industry are using for various projects and that has been highly touted as outstanding. We can also see one of the very best models, Minimax, coming in at just 30% preference. And of course Sora Turbo, OpenAI's recent iteration, is actually the least preferred of all of them. So across the board, Google's model is clearly better than every other industry-standard text-to-video model, which, like I said before, is very surprising considering that this is their second iteration and they didn't release the first one publicly for quite some time. It would have seemed like Google were behind, since a lot of people were using other models. So overall, in the benchmarks you can see that over 50% of the time, nearly 60% of the time, people choose Google's model for full video generation. When we also look at prompt adherence (I do apologize for the quality here), we can see that Google's model outperforms there as well.

One of the things I really like about this model is its incredible physics capabilities. One of the large issues with video generation models is that they are generative AI systems, meaning things aren't really understood at the physical/granular level, which leads to inconsistent/hallucinated outputs. But Google have managed to cook up something new here with Veo 2. They don't give us details on the entirety of the architecture, but whatever they have done shows us that this model really and truly understands the physical world. When the tomato is cut, not only does it look really juicy and plump as the knife goes through it, but all those subtle movements, changes, and vibrations are visible in the tomato. Then, when the tomato is laid onto the other one, its positioning looks incredibly perfect. So overall we have something with a really good understanding of how objects interact with each other. I'm not sure how they've done it, but whatever they have done is working really well.

Now, one of the things that can be really hard to nail with a text-to-video generator is liquids. Liquids are highly unpredictable, and they move in ways that take traditional machines hours and hours to compute, with all of those little particles. So when we see accurate fluid simulations, it shows us that the model they've developed is really coherent. If you've ever tried to generate your own water simulations, you'll know just how tedious that can be. In these examples we can see two kinds of liquid. First, the coffee being poured: there are all sorts of subtle nuances in the coffee, those small details I really like to see. Then the syrup, which also looks really detailed, flowing at exactly the right speed and layering on top of itself. So overall, I think the physics is completely incredible. I also came across this demo in which someone pours coffee, and it looks really incredible because we actually get to see how the liquids manage to move from

Segment 2 (05:00 - 10:00)

one object to another. I found this completely remarkable. For example, when he puts the glass back down, you can see a slight jiggle in the water, and all these subtle details add to the realism this model is able to generate. This gives me a large amount of confidence in Google's ability to get the job done when it comes to producing really effective models, and that should be no surprise, because Google are often the ones who innovate in this kind of technology. I'm not sure exactly which piece of technology it was, but when it comes to the current state of video generation, there was a breakthrough Google made that led to many of the efficient models we have now, such as Sora and others like Minimax. We can also see another example here of someone pouring a very cold drink, again with accurate fluid simulation.

But enough with the incredible physics: what about the strange and wonderful examples of characters being generated? What about a sitcom TV show with potatoes? You can see right here exactly what that looks like. I'm not sure why this person generated potatoes, but what we should take away from this is that the character consistency looks really effective.

Another interesting example, if we're leaving the potatoes for a moment, is this prompt of a car going at top speed along a road until it reaches a waterfall, and then it jumps off a mountain. It's a really striking example: you can see the car driving through the waterfall, and the physics of what we're seeing hold up very well. The waterfall looks really nice, the splashdowns look really nice, and overall it's remarkable what the model is able to do. I don't have any doubt that the people who get access to these models are going to produce some really creative outputs. Prompts this difficult are hard to pull off over such a long clip: the character consistency has to hold, and the car must not deform or mangle. One of the problems I've seen from many video generators is that the car will morph into a completely different brand or a completely different object. The crazy thing is that those of you who don't use these video models regularly may not truly understand how big an advancement this is, but I'm really trying to drive home the point that this model is a step ahead of the competition in simply all areas.

And if you want to go back to potatoes, here is an example of a cinematic action kung fu movie in which the protagonist is a potato wearing a long black leather coat while it rains heavily. I think this shows multiple things coming together successfully, such as prompt adherence and really good-looking rain. Honestly, I wouldn't have believed this was AI-generated unless someone told me, because I truly don't see any issues with it at all.

Now, Veo, Google's text-to-video generator, isn't the only thing they launched. They also launched their frontier text-to-image model, Imagen 3, which surpasses absolutely everything on the leaderboard; you can see here where it ranks by overall Elo rating. Like I said already, I think this shows us once again that Google is taking back the reins, reminding us all that these are the people who invented most of the technology these AI models are powered by anyway. So it shouldn't be a big surprise that they are coming back, and doing so with full force. It's not just a video model they've released today; they've also released an image model that seems to be the leader in outright coherence. If you're wondering how this model does on prompt adherence, it's absolutely incredible. If we take a look at the image area in Google Labs, where you actually create these images, the UI is not only easy for beginners but, even for advanced users, simply the best at letting you construct a prompt that controls exactly what you want to see. Credit to Ethan Mollick for these photos. You can see right here that Imagen 3 really understands exactly what you want, and with the drop-down options on certain words, you can control certain parts of the image and swap them in and out. One of the prompts I found rather fascinating was this close-up of a man's eye, with the eye reflecting some garlic bread, which is a really difficult prompt to pull off. So this is once again another example of Google's models pulling ahead. We also get to see a photorealistic

Segment 3 (10:00 - 10:00)

image of a potato fighting a vampire on the moon, which is rather interesting. I'm not sure why anyone would prompt this, but that is the point of AI: to embrace your creativity and explore all of your ideas. So overall, I think it is absolutely incredible that Google have managed to produce Veo 2 and Imagen 3, which are state-of-the-art in performance compared to any other company, and by a long shot, it seems. With that being said, this week Google have simply reclaimed the throne in video, in text-to-image, and of course in their overall AI models and the future. So, are you guys more bullish on Google now? I certainly am. I'm looking forward to anything else they have to offer, especially in January, and I'll be watching to see if Google have any more releases left for the end of the year.
