# DeepMind’s Veo2 AI - The New King Is Here!

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=_qQwSVzYNpA
- **Date:** 12.01.2025
- **Duration:** 6:30
- **Views:** 107,035

## Description

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambdalabs.com/papers

Try Veo2 here (Notes: likely USA only so far and there may be a waitlist):
https://deepmind.google/technologies/veo/veo-2/

📝 My paper on simulations that look almost like reality is available for free here:
https://rdcu.be/cWPfD 

Or this is the orig. Nature Physics link with clickable citations:
https://www.nature.com/articles/s41567-022-01788-5

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Alex Balfanz, Alex Haro, B Shang, Benji Rabhan, Gaston Ingaramo, Gordon Child, John Le, Juan Benet, Kyle Davis, Loyal Alchemist, Lukas Biewald, Martin, Michael Albrecht, Michael Tedder, Owen Skarpness, Richard Sundvall, Taras Bobrovytsky, Thomas Krcmar, Tybie Fitzhugh, Ueli Gallizzi.
If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers

My research: https://cg.tuwien.ac.at/~zsolnai/
X/Twitter: https://twitter.com/twominutepapers
Thumbnail design: Felícia Zsolnai-Fehér - http://felicia.hu

## Contents

### [0:00](https://www.youtube.com/watch?v=_qQwSVzYNpA) Segment 1 (00:00 - 05:00)

Okay, due to popular request from you Fellow Scholars, let's talk about Google DeepMind's new AI video generator called Veo 2. Now, look at this. Is this Veo 2? It is not. So what is it then? This is called VideoPoet, which was one of the state-of-the-art AI video generators back in the day. Now, what do I mean by "back in the day"? Like five-plus years ago? No, absolutely not. This was a state-of-the-art video AI less than a year ago, and today we can do this. Whoa. Just look at the difference. Holy mother of papers! The new Veo 2 can create videos up to 4K resolution. That is stunning, and I have to say, I was very surprised by the results that I am going to show you now.

Okay, but we are Fellow Scholars here, so we have questions. What can it do? What can't it do? How does it work? And of course, how does it compare to its competitors, like OpenAI's Sora? Can this compete? You will get answers to all of these questions. Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér.

Now, first, if you are one of our OG Fellow Scholars, I hope this says something to you. Oh yes, I am a computer graphics researcher by trade, and we have tons and tons of simulation programs like this, where we have to write the laws of physics into a computer program, and this comes out. Yes, these are all computer simulations. However, today we don't even need to do that: an AI video generator can create all of these pixels by itself, and we only need to add a text prompt. Just look at that quality. Incredible. Honey, much better than the company with the same name. Also, when creating humans, this Fellow Scholar is super lifelike. I see barely any or no flickering at all; super coherent video. So cool. The usual suspects, like a drifting car or animated movies, are working superbly, and of course, you can also use this AI to let your imagination run wild and dream up worlds that don't even exist, or maybe can't even exist. So good. I am almost ready to believe that this can do absolutely anything.

So, can it? Well, no, it is not perfect; it has its own limitations. For instance, with high-frequency motion like skateboarding, we get lots and lots of temporal coherence issues, which I will explain in a moment. But there are cases with other high-frequency movement, with tons and tons of bees here, and this seems really well done to me. Really interesting. Also, sometimes humans come out really coarse and low-resolution, and it gets worse. Look at that: the face turning away from us is more like morphing into the back of the head. And if we look behind the shoulder of this woman, oh yes, the Two Minute Papers special: those are object permanence issues. Something appears, and when we see it again, it becomes something else.

Okay, but we have two more questions to go. How does all this wizardry work? Well, the architecture they use for this is called a diffusion Transformer model. Okay, but what does that mean? Well, when you do text-to-image, the model starts out from a bunch of noise and, over time, reorganizes this noise to make it resemble your text prompt. However, here we are talking about video, not just one image. So, easy: just do it many times, one after another, right? No, no, not at all. You see, if you do that, you get something like this: a flickering effect, because the neural network does not remember well enough what images it made previously, and the differences show up as flickering. You can't do that. What you need to do is create not just one bunch of noise, but many bunches, and refine them at the same time, while taking into consideration not just their neighbors, but every single image. This is how you get long-term temporal coherence, and this is also why we have the consistency problems we just talked about. It is getting better and better over time, but it is still there.

Now, it is good to look at these results, but how does it compare to its competitors? And now, hold on to your papers, Fellow Scholars, and... oh my goodness, look at that. I would say it is heavily favored against its competitors, especially against the amazing Sora. That is a stunning result. Wow. But it gets better: this was about overall quality.
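The per-frame versus joint denoising argument above can be sketched in a toy NumPy example. This is only an illustration of the idea, not Veo 2's actual method: the "denoiser" here is a simple pull toward a target signal, and pulling every frame toward the mean of all frames stands in for attention across the whole clip. All function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(frames, target, step=0.1):
    # One toy refinement step: nudge the noisy frames toward the target
    # signal (standing in for "reorganize the noise to match the prompt").
    return frames + step * (target - frames)

def denoise_independent(num_frames, target, steps=50):
    # Per-frame generation: each frame starts from its OWN noise and is
    # refined with no knowledge of the other frames. Leftover noise
    # differs frame to frame, which shows up as flicker.
    out = []
    for _ in range(num_frames):
        frame = rng.normal(size=target.shape)
        for _ in range(steps):
            frame = denoise_step(frame, target)
        out.append(frame)
    return np.stack(out)

def denoise_jointly(num_frames, target, steps=50):
    # Joint generation: many bunches of noise are refined together; each
    # step also pulls every frame toward the mean of ALL frames, a crude
    # stand-in for attending to every other image in the clip.
    frames = rng.normal(size=(num_frames,) + target.shape)
    for _ in range(steps):
        frames = denoise_step(frames, target)
        frames = 0.5 * frames + 0.5 * frames.mean(axis=0)
    return frames

def flicker(frames):
    # Mean frame-to-frame difference: a crude temporal-coherence metric.
    return float(np.abs(np.diff(frames, axis=0)).mean())

target = rng.normal(size=(8, 8))  # toy "scene" every frame should depict
indep = denoise_independent(16, target)
joint = denoise_jointly(16, target)
print(flicker(indep) > flicker(joint))  # joint denoising flickers less
```

Running this, the independently denoised frames keep uncorrelated residual noise and so show a larger frame-to-frame difference, while the jointly refined frames converge to an almost identical sequence, mirroring why video diffusion models denoise all frames together.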

### [5:00](https://www.youtube.com/watch?v=_qQwSVzYNpA&t=300s) Segment 2 (05:00 - 06:00)

But in terms of prompt adherence, wow, it is also doing extraordinarily well. This is where scientists test how closely the videos follow your text prompts. Super important, because something can look amazing but not be what you asked for at all. That is not the case here. It is also interesting how these results correlate with the quality results as well. Now, note that this is not from a peer-reviewed study, so of course, every wise Fellow Scholar like you applies a small grain of salt here. I wanted to try it out too, and... ah yes, maybe you, Fellow Scholar, will have better luck with it. There is a link in the description, so please let me know in the comments below. And to think that this was possible less than a year ago, and now, this. It is unbelievable how far we have come. There is so much to be excited about in the near future. What a time to be alive! What do you Fellow Scholars think? Let me know in the comments below.

To run your own experiments on an NVIDIA GPU, check out Lambda. I use it myself regularly for these videos. Look at that: you can generate high-quality images in less than a second per image. I did a ton more of them and paid less than a dollar for all this. Crazy. Seriously, try it out now at lambdalabs.com/papers or click the link in the description.

---
*Source: https://ekstraktznaniy.ru/video/17184*