# NVIDIA’s New Video AI: Game Changer!

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=3A3OuTdsPEk
- **Date:** 06.05.2023
- **Duration:** 7:00
- **Views:** 140,417
- **Source:** https://ekstraktznaniy.ru/video/13188

## Description

❤️ Check out Weights & Biases and sign up for a free demo here: https://wandb.com/papers 
❤️ Get more than $50 off from an upcoming W&B event in San Francisco! - https://shorturl.at/brtIQ

📝 The #NVIDIA paper "Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models" is available here:
https://research.nvidia.com/labs/toronto-ai/VideoLDM/

My latest paper on simulations that look almost like reality is available for free here:
https://rdcu.be/cWPfD 

Or here is the original Nature Physics link with clickable citations:
https://www.nature.com/articles/s41567-022-01788-5

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Aleksandr Mashrabov, Alex Balfanz, Alex Haro, Andrew Melnychuk, Benji Rabhan, Bryan Learn, B Shang, Christian Ahlin, Eric Martel, Geronimo Moralez, Gordon Child, Jace O'Brien, Jack Lukic, John Le, Jonathan, Kenneth Davis, Klaus Busse, Kyle Davis, Lorin Atzberger, Lukas Biewald, Martin, Matthew Valle, Micha

## Transcript

### Intro [0:00]

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Finally! As everyone knows, text-to-image AIs are capable of creating incredible photos, digital art, whatever you wish, and now we seem to be conquering video too.

What? So we just write something, and exactly that video comes out? Doesn't that sound impossible? That is so much harder! You see, images are one thing, but videos require a much greater understanding of the world around us, so much more computation, and one more secret ingredient that I am going to tell you about in a moment.

So, after Google published Imagen Video, now NVIDIA is here with their text-to-video AI too, and it's not just really good, it can do things that other systems can't do yet.

So first, examples. Of course, we start out immediately with a scholarly example of two pandas reading a paper. Approved. It is also excellent at time-lapse videos, I love this one. We can

### Examples [1:05]

also let our imagination run wild with this one; for instance, we can even ask for a stormtrooper vacuum cleaning a beach. Very productive. Artistic scenes work well too: here is the look out of a rainy car window, but in the style of a Van Gogh painting. Apart from a little flicker, this is excellent.

Now, what about natural phenomena? You know that my favorite is fluid simulations, so we have to look at one of those too. This is easily good enough to make me crave coffee. This swimming turtle is also a delight to look at. And these are not some tiny, stamp-sized videos; we are getting a sequence of approximately 2000×1000-resolution images. Wow. That is incredible.

I also liked how well this new paper can deal with camera movements. Flying into a fantasy landscape worked really well, and I feel that this rotating camera around the grapes is just one paper away from near perfection. Wow.

And these results are already very impressive, but hold on to your papers, because you have seen nothing yet. This can do so much more than just text to video.

With this, we can use our own characters to create a movie out of thin air. You see, here is a bunch of images of the test subject. Kermit, is that you? Alright, now let's ask the AI to make it play the guitar. That is fantastic, but that's not the way of the True Scholar. A True Scholar writes research papers. So, can it do that too? Oh yes, yes it can. Good job, little AI!

It can also generate these driving sequences. So what is that good for? Well, of course, to feed hypothetical situations to a self-driving AI, so it can have a look at the footage and practice safely within a simulation before bringing its knowledge into the real world. Loving it.

But wait. These are all quite short sequences. So what about video length? Well, worry not a second about that one! It can generate videos up to 5 minutes in length. By the end of their example, I saw at most a tiny bit of quality degradation, or even less.

And it can predict an entire video sequence from a single image. That is also excellent for self-driving cars. Why? Well, just imagine giving it a starting scenario, and it can simulate hundreds and hundreds of potential variants of this situation to train these AIs. What a time to be alive!

So how does all this black magic work? Well, let's pop the hood and look inside. Oh yes,

### Demo [4:15]

this is going to be excellent. I mean, not this, the one after this. You see, this is a diffusion-based technique, which means that it starts out from a piece of noise, and gradually reorders these pixels to form an image. But there is a problem. Do you see the problem? These images are completely fine, but they don't form a coherent video. However, look at that! After the newly proposed temporal video fine-tuning step, look. Now we're talking! This is truly a coherent video in the making. Loving it.

And that temporal coherence is the secret ingredient that I promised, the one that makes all this work so well. So good.

Now, not even this technique is perfect. Asking a koala to play the piano is a bit too much. However, I bet that the First Law of Papers applies here. The First Law of Papers says that research is a process. Do not look at where we are, look at where we will be two more papers down the line.

By the way, I just came across this. This shows the length of Two Minute Papers videos over time, from the first episode to 700 and beyond. It used to be shorter. Can't argue with that; at this point it's a running joke that Two Minute Papers is never two minutes. But I wanted to ask you. What do you think? Should these be shorter? Let me know in the comments below.

Now, I haven't found the source code for this paper yet; however, there will soon be an episode about a way for you to try to create videos like this. If you're interested, make sure to subscribe and hit the bell icon to not miss it.

Thanks for watching and for your generous support, and I'll see you next time!
