# NVIDIA's New AI: Better AI Videos Are Here!

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=NMfqlscAU3M
- **Date:** 12.03.2023
- **Duration:** 7:31
- **Views:** 142,578
- **Source:** https://ekstraktznaniy.ru/video/13252

## Description

❤️ Check out Weights & Biases and sign up for a free demo here: https://wandb.com/papers

📝 The paper "Generating Long Videos of Dynamic Scenes" is available here:
https://www.timothybrooks.com/tech/long-videos/
Source code, datasets and pretrained models: https://github.com/NVlabs/long-video-gan

My latest paper on simulations that look almost like reality is available for free here:
https://rdcu.be/cWPfD 

Or this is the orig. Nature Physics link with clickable citations:
https://www.nature.com/articles/s41567-022-01788-5

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Aleksandr Mashrabov, Alex Balfanz, Alex Haro, Andrew Melnychuk, Benji Rabhan, Bryan Learn, B Shang, Christian Ahlin, Edward Unthank, Eric Martel, Geronimo Moralez, Gordon Child, Jace O'Brien, Jack Lukic, John Le, Jonas, Jonathan, Kenneth Davis, Klaus Busse, Kyle Davis, Lorin Atzberger, Lukas Biewald, Matthew Allen Fisher, Matthew Valle, Michael Albrecht, Michael Tedder, Ne

## Transcript

### Intro [0:00]

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. In 2021, scientists at Google had an absolutely insane idea. Do you see this photo? Yes? They said, now, let’s give this to the AI, and ask it to fly into this photo. Insanity. And yet, it worked like this. Absolutely amazing. It was not given a video, and not even a collection of photos. Just this one photo.

However, there is a problem. Do you see the problem? Well, this is the problem. Yes, if we look for a while, it has to come up with newer and newer content, and over time, the consistency of the results suffers. We don’t get one coherent landscape, but something that morphs into something else not connected to it.

And today, this is why I am so excited to show you NVIDIA’s new paper on video generation. Let’s see… yes, they promise exactly what we are looking for. Better long-term consistency!

So first, let’s see what previous techniques can do, and then, what this new one can.

### Previous techniques [1:13]

This is an excellent test case. Now, you may be asking, Károly, what in the world is this? And I say that is exactly the point. MoCoGAN-HD is a technique from 2021, from just 2 years ago, and we cannot even tell what it is trying to do.

Here is a later work from 2022, the time-agnostic, time-sensitive transformer, TATS in short. This isn’t great either, but now, at least we see what it is trying to do. And that is rendering clouds. It starts out reasonably well, but as you see, the results quickly start degrading, and that is the point of today’s video. Long-term consistency is super difficult. DiGAN, another work from last year, does way better than these previous two, great progress, but as you see, the video is full of artifacts.

And now, hold on to your papers, Fellow Scholars, and let’s see the new technique!

### New techniques [2:22]

Holy mother of papers! Do you see that? It is leaps and bounds beyond any of these techniques. And all this just one more paper down the line. How cool is that?

I don’t know about you, but I am itching to see more examples. Let’s compare against StyleGAN-V, one of the state-of-the-art techniques out there for this task. First, horseback riding. I must say it is pretty incredible that today, an AI is capable of creating a video like this. However, there are issues. Things morph in and out of existence, and did you notice? This doesn’t really synthesize new scenery, it just repeats this jump over and over. Now let’s see the new technique. Wow! Now we’re talking! After that first jump, we get new scenery as we enter the forest.

But it has even more advantages.

### Super resolution [3:25]

Look, when synthesizing a landscape, I have a much better sense of camera motion and rotation with the new technique than with the previous one.

And it can do something even better too. Yes, this is going to be everyone’s favorite. Super resolution. What is that? The enhance thing. First, it generates a coarse video, and then, adds more and more detail to it until it becomes a much sharper, higher-resolution video. Just look at the difference. It takes no more than a heap of pixels that looks like a computer game from 25 years ago, and it understands what this scene is meant to portray, and it creates this detailed version of it, and I would argue it would do at least as well as a human would. Wow.

So, clearly, the new technique seems way better, but this is a research paper, in which scientists at NVIDIA also have to show mathematically that it is better. So how do they do it? How do they measure the differences? Well, first, at the risk of simplifying the measurement, we show these videos to a neural network that somewhat mimics the perception of humans. And the lower the score it gives, the better. And… oh my goodness. Look at that. Apart from a few super low-resolution cases, this new technique is so much better than the previous methods. Incredible.
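To make the coarse-then-sharpen idea concrete, here is a minimal PyTorch sketch of a two-stage pipeline in the same spirit: a low-resolution generator first lays down the long-range structure of the video, and a separate super-resolution network adds detail afterwards. The module names, layer choices, and resolutions (64×64 refined to 256×256) are illustrative assumptions for this sketch, not the actual architecture from NVIDIA’s paper or the NVlabs/long-video-gan code.

```python
import torch
import torch.nn as nn

class LowResGenerator(nn.Module):
    """Hypothetical stand-in: maps per-frame latent codes to a coarse 64x64 video."""
    def __init__(self, latent_dim=128, channels=3):
        super().__init__()
        self.channels = channels
        self.net = nn.Sequential(
            nn.Linear(latent_dim, channels * 64 * 64),
            nn.Tanh(),
        )

    def forward(self, z):                       # z: (frames, latent_dim)
        frames = self.net(z)                    # (frames, channels * 64 * 64)
        return frames.view(-1, self.channels, 64, 64)

class SuperResolver(nn.Module):
    """Hypothetical stand-in: upsamples and sharpens the coarse frames to 256x256."""
    def __init__(self, channels=3):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, coarse):                  # (frames, channels, 64, 64)
        return self.refine(coarse)              # (frames, channels, 256, 256)

# Two-stage sampling: long-range structure first, per-frame detail second.
z = torch.randn(16, 128)                        # latent codes for 16 frames
coarse_video = LowResGenerator()(z)             # cheap low-res video, easier to keep consistent
sharp_video = SuperResolver()(coarse_video)     # detail added on top of the coarse frames
print(sharp_video.shape)                        # torch.Size([16, 3, 256, 256])
```

The appeal of this kind of split is that long-term consistency only has to be maintained at the cheap low-resolution stage, while the expensive fine detail is filled in afterwards.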
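The “neural network that somewhat mimics the perception of humans” refers to a feature-based score: real and generated videos are embedded by a pretrained network, and the two feature distributions are compared, with lower meaning better. Below is a small sketch of a Fréchet-style distance of that kind, assuming the feature vectors (one row per video) have already been extracted by some pretrained video model; it illustrates the general idea rather than the exact evaluation script used in the paper.

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_real, feats_gen):
    """Fréchet distance between Gaussian fits of two feature sets (rows = videos)."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_g)        # matrix square root of the covariance product
    if np.iscomplexobj(covmean):
        covmean = covmean.real                   # discard tiny imaginary parts from numerics
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(100, 32))            # stand-in features for real videos
    fake = rng.normal(loc=0.5, size=(100, 32))   # stand-in features for generated videos
    print(frechet_distance(real, fake))          # larger when the distributions differ more
```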

### Conclusion [5:00]

But there is another way of measuring which one is better, and even more importantly, by how much. And that is, of course, showing these videos to humans and asking them. In the user study, 500 people were asked which is more realistic, and about 80-85% of the people chose the new technique over StyleGAN-V, the previous one. That is incredible. And once again, all this progress in AI research in less than a year.

Now, clearly the new technique is not perfect either, it has artifacts, morphing, and other issues as well. However, this is incredible progress in just one paper. And, as always, as a wise Fellow Scholar, please invoke the First Law of Papers, which says that research is a process. Do not look at where we are, look at where we will be two more papers down the line.

So, finally, better long-term consistency for AI-generated videos. What a time to be alive! Also, good news! The source code, datasets and pretrained models for this paper are available, so let the experiments begin! So, what video would you generate using this technique? Let me know in the comments below!

Thanks for watching and for your generous support, and I'll see you next time!
