# NVIDIA’s New AI: Next Level Games Are Coming!

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=76VYzhs-0FE
- **Date:** 09.06.2025
- **Duration:** 7:12
- **Views:** 70,341
- **Source:** https://ekstraktznaniy.ru/video/12327

## Description

❤️ Check out Weights & Biases and sign up for a free demo here: https://wandb.me/papers

📝 The papers are available here:
https://research.nvidia.com/labs/toronto-ai/difix3d/
https://sites.google.com/view/cast4
https://syntec-research.github.io/UVGA/

📝 My paper on simulations that look almost like reality is available for free here:
https://rdcu.be/cWPfD 

Or here is the original Nature Physics link with clickable citations:
https://www.nature.com/articles/s41567-022-01788-5

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Benji Rabhan, B Shang, Christian Ahlin, Gordon Child, John Le, Juan Benet, Kyle Davis, Loyal Alchemist, Lukas Biewald, Michael Tedder, Owen Skarpness, Richard Sundvall, Steef, Sven Pfiffner, Taras Bobrovytsky, Thomas Krcmar, Tybie Fitzhugh, Ueli Gallizzi
If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers

My research: https://cg.tuwien.ac.at/~zsolnai/
X/Twitter: https://tw

## Transcript

### Intro [0:00]

I want to create amazing virtual worlds where we can talk and play together efficiently. In the age of AI, this shouldn't be a problem at all. Except that it is still impossible. You see, when we try to render a virtual copy of the real world efficiently, it's not great. Breathing life into it by populating it with objects? Not a chance.

### New AI [0:23]

And it gets even worse when we try putting real-looking humans in it. Oh my goodness. I don't want to talk to this person. So this seems completely hopeless. So why is everyone talking about AI this, AI that, if it can't pull this off? Well, to get the answer to that, all you have to do is look at these papers. Yes, luckily, we have three amazing works that might solve all three of these problems.

Let's start with this one. First, rendering worlds. In goes a bunch of images of the scene, but not everything, so we need a technique that learns the scene and can draw it from viewpoints we have never seen it from. That is really tough. It is kind of possible with NeRFs and Gaussian splatting, two of the go-to techniques for this these days. However, not so fast: if we don't have enough information, they can still introduce lots of noise and visual artifacts. And some of the results are just criminally bad.

So, I don't think a single new paper could fix all of that, of course, so let's see… goodness. It can! These suddenly look almost perfect. Absolutely amazing. So how is this even possible? What the heck happened here? Well, a genius idea happened. This AI technique is trained not to give us the perfect answer immediately, but to take an imperfect one and learn to clean it up. That is nearly as good as giving the perfect answer; however, it is much simpler to pull off.

And when I look at the results from previous techniques, I'm thinking, I don't want to use any of these. And in just one paper, we go from that to… wow, let's start using this right now!
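To make that idea a little more concrete, here is a minimal, hypothetical sketch of the training setup in PyTorch. This is not the actual method from the first paper linked above (Difix3D); it only illustrates the principle: render a novel view with an existing technique, then train an image-to-image network to turn that imperfect render into the reference photo.

```python
# Minimal sketch of the "render first, then learn to fix it" idea:
# instead of training a model to synthesize perfect novel views directly,
# train a small image-to-image network that maps an artifact-ridden render
# to the ground-truth photo. Random tensors stand in for real training data.
import torch
import torch.nn as nn

class ArtifactFixer(nn.Module):
    """Tiny image-to-image network: imperfect render in, cleaned image out."""
    def __init__(self, ch: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 3, 3, padding=1),
        )

    def forward(self, render: torch.Tensor) -> torch.Tensor:
        # Predict a correction and add it to the input render (residual
        # learning), so the network only learns what is wrong with the image.
        return render + self.net(render)

fixer = ArtifactFixer()
opt = torch.optim.Adam(fixer.parameters(), lr=1e-4)

# Training pairs: (noisy novel-view render, real photo from that viewpoint).
for step in range(100):
    noisy_render = torch.rand(4, 3, 64, 64)   # stand-in for NeRF/splat output
    ground_truth = torch.rand(4, 3, 64, 64)   # stand-in for reference photo
    loss = (fixer(noisy_render) - ground_truth).abs().mean()  # L1 loss
    opt.zero_grad(); loss.backward(); opt.step()
```

The appeal of the residual design is that the network never has to paint the whole image from scratch, only what is wrong with it, which is exactly why fixing an imperfect answer is so much simpler than producing a perfect one.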
So, worlds are working okay now. Great, but remember, that is just 1 out of 3. What if we want to put new things into this virtual world? Well, previous techniques are not great at reconstructing 3D information from a photo or a video of something.

And this one was from just 3 years ago, and everything is so coarse here. I don't want to play in a world like that. Now, things have gotten a bit better since: for singular objects, newer AI methods can get pretty good results. But wait until you try an entire scene of objects. They completely fall apart. Even the better ones have trouble understanding object alignment and scale.

Now check this out. This one is from a different research lab. Wow! A new AI technique can do what none of the previous ones could: take just one image, not of an object but of an entire scene, and create a digital 3D version of it.

So let's go back to that alignment scene and see the new one. Wow, so cool! The whole scene, with the correct scales, and nothing intersecting anything else.

So, how? Well, it has two incredible ideas to make this happen. One, it is infused with a GPT-like AI model that is meant to understand the relations between these objects. And it is doing a glorious job at that.

And now let me show you the second one, my favorite. Look at this reconstruction. As we expected, positions and scales are correct in this scene; they are true to the input photos. That is excellent. But… come on, man. The guitar is poking through the box. That is really difficult to guess correctly, but here is the second genius idea: you don't need to. The scene is generally good, but it does not obey the laws of physics: floating, poking things. So now, hold on to your papers, Fellow Scholars, and just run a simple correction step that is inspired by physics simulations, and let it sort out all of these issues. Can it? Oh my, look at that beauty!
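Here is a toy, hypothetical version of such a physics-inspired correction step, just to show the flavor of the idea. It assumes objects are axis-aligned 2D boxes; the actual paper (CAST, linked above) works with full reconstructed 3D scenes. The sketch alternates a gravity-like settling pass with a pass that pushes interpenetrating objects apart.

```python
# Toy physics-inspired cleanup: take a roughly correct scene layout and
# iteratively remove floating objects and interpenetrations. Axis-aligned
# boxes are a simplifying assumption for illustration only.
from dataclasses import dataclass

@dataclass
class Box:
    name: str
    x: float; y: float   # position of the box center (y points up)
    w: float; h: float   # width and height

def correct_scene(boxes: list[Box], iters: int = 50) -> None:
    for _ in range(iters):
        # 1) Gravity-like step: nothing should hover above the ground plane.
        for b in boxes:
            rest = b.h / 2                     # center height when grounded
            if b.y > rest:
                b.y = max(rest, b.y - 0.05)    # settle down gradually
        # 2) Collision step: push overlapping boxes apart along x.
        for a in boxes:
            for b in boxes:
                if a is b:
                    continue
                overlap = (a.w + b.w) / 2 - abs(a.x - b.x)
                if overlap > 0 and abs(a.y - b.y) < (a.h + b.h) / 2:
                    push = overlap / 2
                    if a.x < b.x:
                        a.x -= push; b.x += push
                    else:
                        a.x += push; b.x -= push

scene = [Box("guitar", 0.0, 1.0, 0.4, 1.2), Box("box", 0.1, 0.3, 0.6, 0.6)]
correct_scene(scene)
print(scene)  # the guitar no longer floats mid-air or pokes through the box
```

The point of the design is the same as in the video: the reconstruction only needs to be roughly right, because a cheap simulation-style relaxation can sort out the physically impossible leftovers.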

### Virtual Humans [4:45]

Fantastic, so, worlds and things are good. But the last puzzle piece, people? Not so much. That is the grand challenge. Previous techniques are not great at creating digital versions of real humans. Do you want to talk to someone who looks like this? I think not. Unfortunately, this problem may just be too tough. Why? You see, we are wired to look at and understand each other's faces and gestures, so if something is off by just a tiny bit, the game is over. But it turns out, there is a solution. Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér.

Here is the new technique. Well, this is just so much better than the previous ones, I don't even know what to say. Let's look into the paper and see how they did it. Oh yes. Take a bunch of deformable Gaussians, deformable little bumps, attach them cleverly onto the geometry of the face, and this can finally capture detailed facial motion, even up to 4K resolution. (A toy sketch of this attachment idea follows at the end of the transcript.)

And you can throw some really strong gesturing at it, and all of those deformations are now present in your virtual version. So good!

Now, not perfect. There are still some missing details; also, the teeth and eye movements are not great yet. There is still a little twitching going on, too. But now let's invoke the First Law of Papers, which says: do not look at where we are, look at where we will be two more papers down the line.

So, near-perfect virtual worlds are in the works, and there is incredible progress on all three fronts. What a time to be alive! Once again, these are papers that very few people are talking about, and I am worried that if we don't talk about them here on Two Minute Papers, no one will know about them. If you appreciate that, subscribe and hit the bell icon, and you'll see a lot more stuff like this here.
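As promised above, here is a minimal, hypothetical sketch of the deformable-Gaussian attachment idea in NumPy. The actual system (UVGA, linked above) learns the Gaussians' parameters, colors, and covariances from video; this only shows the core trick: bind each Gaussian to a mesh triangle via barycentric coordinates plus a normal offset, so the Gaussians automatically ride along with any facial deformation.

```python
# Core trick: each Gaussian stores which triangle it sits on, barycentric
# coordinates inside that triangle, and an offset along the surface normal.
# Deform the face mesh, and the Gaussians follow for free. Toy data only.
import numpy as np

def gaussian_positions(vertices, triangles, attachment):
    """Place each Gaussian on the (possibly deformed) mesh surface."""
    tri_idx, bary, offset = attachment
    corners = vertices[triangles[tri_idx]]            # (N, 3, 3) triangle corners
    surface = np.einsum("nk,nkd->nd", bary, corners)  # barycentric interpolation
    normals = np.cross(corners[:, 1] - corners[:, 0],
                       corners[:, 2] - corners[:, 0])
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    return surface + offset[:, None] * normals        # lift off the surface

# One triangle, two Gaussians attached to it.
vertices = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])
triangles = np.array([[0, 1, 2]])
attachment = (
    np.array([0, 0]),                              # triangle index per Gaussian
    np.array([[0.3, 0.3, 0.4], [1/3, 1/3, 1/3]]),  # barycentric coordinates
    np.array([0.01, 0.02]),                        # offset along surface normal
)
print(gaussian_positions(vertices, triangles, attachment))

# Deform the mesh (say, a smile moves a vertex): the Gaussians follow.
vertices[2] += [0.0, 0.2, 0.1]
print(gaussian_positions(vertices, triangles, attachment))
```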
