# NVIDIA’s New AI: Virtual Worlds From Nothing! + Gemini Update!

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=-LhxuyevVFg
- **Date:** 14.12.2023
- **Duration:** 9:40
- **Views:** 81,875

## Description

❤️ Check out Weights & Biases and sign up for a free demo here: https://wandb.me/papers

📝 The papers are available here:
SceneScape: https://scenescape.github.io/
Image to 3D: https://mrtornado24.github.io/DreamCraft3D/
Dehaze: https://algolzw.github.io/daclip-uir/

AI film festival: https://aiff.runwayml.com/

📝 My latest paper on simulations that look almost like reality is available for free here:
https://rdcu.be/cWPfD 

Or this is the orig. Nature Physics link with clickable citations:
https://www.nature.com/articles/s41567-022-01788-5

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Alex Balfanz, Alex Haro, B Shang, Benji Rabhan, Bret Brizzee, Gaston Ingaramo, Gordon Child, Jace O'Brien, Jie Yu, John Le, Kyle Davis, Lukas Biewald, Martin, Michael Albrecht, Michael Tedder, Owen Skarpness, Richard Putra Iskandar, Richard Sundvall, Taras Bobrovytsky, Ted Johnson, Thomas Krcmar, Tybie Fitzhugh, Ueli Gallizzi.
If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers

Thumbnail background design: Felícia Zsolnai-Fehér - http://felicia.hu
Károly Zsolnai-Fehér's research works: https://cg.tuwien.ac.at/~zsolnai/
Twitter: https://twitter.com/twominutepapers

#nvidia #gemini

## Contents

### [0:00](https://www.youtube.com/watch?v=-LhxuyevVFg) Intro

Today is a good day, Fellow Scholars. Why is that? Because we are going to talk about this, this, and this: three incredible new papers that will help you unleash your creativity like never before. And a word about Google DeepMind’s Gemini AI at the end of the video.

### [0:21](https://www.youtube.com/watch?v=-LhxuyevVFg&t=21s) Virtual Film Director

First, let’s try to become a virtual film director and create videos. But wait, one of our previous episodes was exactly on that. Yes… but here, you get to be a director, though not in the way you think. So, how? Well, first, let’s start with a photograph. Now get this: Google’s earlier AI helps us fly into this photo. This is incredible; it even supports curved camera motions and long-term videos. However, there is a problem. What is the problem? Long-term coherence is not the greatest. The scene seems to deviate further and further from the original photo, of course, because it has to make up a lot of stuff. There is a follow-up paper on this, which is better, but the issues remain the same. But we learned something important here: whatever the new solution is going to look like, we now know that we should be evaluating it on quality and coherence.

Now let’s have a look at a new text-to-video technique. It promises zero-shot scene generation. This is incredible. What is that? It means that it is able to generate scenes from these prompts, kind of like an artist painting a landscape that they have never seen before. It did not have access to similar videos in its training data, and yet, here it is. I love it. And, back to quality and coherence: in my opinion, we have higher-quality results, and coherence is really good too. All we need is a piece of text input. So, how? How is this even possible? Well, I will tell you in a moment.

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér.

Now let’s compare it to previous techniques; this is against Runway’s GEN-1. We showcased this in an earlier video too. Hmm, do you see what I see here? Yes, it is not your eyes: the output seems a little slower, and GEN-1 indeed generates a few duplicate frames. Otherwise, not bad. This is a tool that is already being used out there to create movie projects. Not feature-length movies, but do not let that fool you. AI film festivals are already being held where people put together videos 1–10 minutes in length. Not 5-second videos anymore. Prizes are also given out, and if that fires you up, you can submit your own work too. Everybody can become an artist and take part in this. What an incredible time to be alive, right? So that was GEN-1. I will note that GEN-2 also exists, to the best of my knowledge without a paper. Now, let’s compare GEN-1 to the new technique… and the new technique, wow, this is an incredible jump in just one paper. And GEN-1 appeared not years ago, but less than a year ago. Such improvement in less than one year. And as always, just imagine what we will be capable of just two more papers down the line. My goodness.

But we are not done yet. Not even close! Now hold on to your papers, Fellow Scholars, and let’s see how it performs against another technique from this year. And for me, this is once again, wow. Really, this kind of improvement in just one paper is very impressive.

Now, look here, because this is super important. What is this? Well, Fellow Scholars, this is the mesh representation of the scene. Digital 3D geometry. And I hear you asking: great, now we can put it into our video games, that is important, but Károly, is that super important? Yes, it is. You see, meshes are essential to understanding this work. And here comes one of my favorite things about this paper. First, it starts out with a mesh structure, digital geometry, and over time, this AI is like a sculptor working on a clay model. Yes, as the camera moves around, it “grows” this mesh outwards. Loving it!

Now, this was text to video. And also, text to 3D geometry. But now, wait a second. Text prompts are great, but what if we already have an image of our scene instead,

### [5:27](https://www.youtube.com/watch?v=-LhxuyevVFg&t=327s) Two Minute Papers

but that’s just a 2D image, and we would like 3D geometry instead? Well, have a look at this paper. Oh yeah! It promises that it takes just this one image and guesses what is around it and behind it, in a coherent manner. Well, it promises that. But I will believe it when I see it. Whoa, look at that! It’s not perfect by any means, but it is really good. And “good” is an important keyword. Why? Well, take it from the legendary chip engineer, Jim Keller. That was an incredibly good summary of what you see here. And a huge honor. Thank you.

### [6:26](https://www.youtube.com/watch?v=-LhxuyevVFg&t=386s) Conclusion

So, yes: if it’s good, two more papers down the line it will be great. So we can now create these videos from text or a photograph. But get this: you might not even need a photograph to perform all this. Yes, that’s right. With the third paper for today, you can even take a photo that is almost completely destroyed, and it will be able to restore it really well. This can then go into the photo-to-3D techniques. Can you imagine, before watching Two Minute Papers, someone telling you that from this photo and two AI research papers, it would be restored with amazingly high quality, and then you could even put it into a video game? No one would have believed it. And yet, here we are. Wow. Now note that, of course, image inpainting techniques already exist; there are heaps and heaps of papers on it. For instance, here is a legendary paper, PatchMatch, from 14 years ago. All handcrafted, human ingenuity. A graphics paper, of course. That was really impressive. But compared to what we can do today… wow. I am out of words. What a time to be alive!

Now, as promised, a word about Gemini. We discussed it in detail a few videos ago, and I made sure that we mainly discussed results not from the marketing materials, but from the research paper itself. We are Fellow Scholars here, and paper results are more detailed and easier to verify. However, while I was talking, I also showed you some materials like this. In the meantime, Google discussed how this footage was made. And it was made not with the AI’s real-time feedback on a video; instead, they used still images from the video and wrote a prompt to give to the AI. The answers were then shown here. These prompts range from simple, like “what do you see here?”, to something more detailed. However, we did not see these prompts. I think the best would have been if it were presented in the way you see here, and second best would have been adding a disclaimer to the footage that still images and additional prompting were used. The paper results that span the majority of our videos are, to the best of my knowledge, accurate. I wanted to revisit this just to make sure that you always get accurate information about these works. That is The Way of The Scholar.

---
*Source: https://ekstraktznaniy.ru/video/12857*