# NVIDIA’s New AI: Erasing Reality

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=RaNay3x0Fmk
- **Date:** 06.02.2026
- **Duration:** 9:14
- **Views:** 67,078
- **Source:** https://ekstraktznaniy.ru/video/11394

## Description

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

📝 The paper and the code are now available here:
https://dvirsamuel.github.io/omnimattezero.github.io/
https://github.com/dvirsamuel/OmnimatteZero

Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi
 
My research: https://cg.tuwien.ac.at/~zsolnai/

#nvidia

## Transcript

### Segment 1 (00:00 - 05:00) []

Insane paper today. This new work can delete stuff from your videos, but it does it in the right way. While many previous techniques fail at it, we have a bunch of puppies here. Now, let's keep just one and remove the rest, highlighted with blue. Previous technique from 2023: this is not great. This is a blurry mess, and it didn't do the removal at all. Now, a previous technique from 2025. Oh, yes. Did it work? Yes and no. That's a classic. True, the dogs are gone. But are we done? Nope. Because not only do the dogs need to be removed, but the secondary effects like shadows, too.

Now, let's see the new technique. Look, now we're talking. It removed the shadows too, for all of the frames. And it has done it well in each frame. Mess up just one and we will easily find it. And by the end of this video, you will understand exactly how it does it. I'll try my best.

And if you're a cat person, can it do that, too? Wait, wait. The interesting part here is not the cat, it's the movement of the grass blades. Look, the cat has some secondary effects, too, which need to be cancelled out. Can it do that? Let's see. Oh my, it can. This is glorious.

Okay, let's make it even harder. The blinking column is very easy to remove, but you wise fellow scholars already know what is going on. Oh, baby. Glossy reflections. You have to remove those, too. Can it? Yep. I am pixel peeping here and I am not finding anything. Incredible.

This new technique is called OmnimatteZero, and it is a collaboration between NVIDIA and other labs. So, OmnimatteZero. Why zero? Is this the number of frames per second we get? No. I'll tell you how slow or quick it is in a moment. It is going to be pretty incredible.

So, what about shadows that merge? The shadow of the bench has to remain, but the person has to go. There is no way anyone can pull that off, right? Well, this one can do that, too. That is kind of insane. However, look, you pay for this by giving up some sharpness in the results.
The new one is a bit blurrier, and I think if I look closely, I can see some artifacts. Not perfect. Now, here are three incredible bombshells that almost made me fall out of the chair. One, it uses already existing diffusion models. Two, it requires no additional AI training. And now hold on to your papers, fellow scholars, for three: it runs in real time, at 25 frames per second. Wow, I didn't even think this could ever be possible.

So how is all this even possible? How do you do that? Let me try to explain. Dear fellow scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. And while we talk about how it works, I'll put up some use cases that are not perfect, so you can accurately see when it's good and when it's not so good. Very important to not overstate things here.

Okay. So, imagine your video is a stack of jigsaw puzzles. Each frame is one full puzzle. When we remove the dog, we are essentially removing a few puzzle pieces from the middle of the board. In the past, AI tried to paint a brand new piece to fill that hole. That is slow and often looks wrong. But this method, OmnimatteZero, realizes that the video is a sequence. Since we have a stack of puzzles, you can look at the one below or above yours, which still has the pieces you are missing. That is the video from just one second ago or one second later. You don't have to guess what's missing. Just find the exact piece that is missing now and use that. Genius.

And what is even more genius is that this explains all three of the bombshells we talked about. One, because it is finding and copying existing pieces rather than painting new ones from scratch, it doesn't need to go to art school. That is the zero-training part. Two, it uses a standard puzzle builder that already exists. That is a pre-trained AI model that can create videos. Just take one off the shelf and plug that in. And three, most important of all, because copying a piece is instant compared to painting one, it runs in real time. It is beautiful in its simplicity.
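The copy-instead-of-paint idea above can be sketched as a toy pixel-space example. To be clear, this is only an illustration of the principle, not the paper's method: OmnimatteZero works on the latents of a pre-trained video diffusion model with attention, while the `fill_from_neighbors` function below (my own naming) simply copies raw pixels from the temporally nearest frame where the masked spot shows background.

```python
import numpy as np

def fill_from_neighbors(frames, masks):
    """Fill masked (object) pixels in each frame by copying the same
    pixel from the nearest-in-time frame where it is NOT masked.

    frames: (T, H, W, C) float array, the video
    masks:  (T, H, W) bool array, True where the object (and its
            shadow/reflection) should be removed

    Toy stand-in for the "borrow the missing puzzle piece from a
    neighboring puzzle" idea: no training, no painting from scratch.
    """
    T = len(frames)
    out = frames.copy()
    for t in range(T):
        ys, xs = np.nonzero(masks[t])
        for y, x in zip(ys, xs):
            # Search outward in time for a frame where this pixel is background.
            for dt in range(1, T):
                for s in (t - dt, t + dt):
                    if 0 <= s < T and not masks[s, y, x]:
                        out[t, y, x] = frames[s, y, x]
                        break
                else:
                    continue  # neither neighbor at this distance worked
                break  # found a donor frame, stop searching
    return out
```

On a clip where the object moves, every hole it leaves behind is visible as background in some nearby frame, so the copy is exact rather than hallucinated; that is also why no extra training is needed.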
Absolutely loving this. Okay, but I hear you asking: then why is the output footage blurrier than the input? Let's pull up this equation and try to see the problem through that. This is a mathematical trick called mean

### Segment 2 (05:00 - 09:00) [5:00]

temporal attention. Think of this as a magnet. The empty hole in our puzzle becomes magnetic. It pulls information only from the background pieces of the other puzzles in the stack. Now, I just said mean temporal attention. Mean means average. So it averages those puzzles out to ensure the colors and lines match perfectly. It forces the AI to look at a timeline, not just a single picture.

So why the blur on the bench and elsewhere? It comes back to that averaging part. Imagine the pieces the magnet pulls from the other puzzles are not perfectly aligned. Maybe the camera moved a tiny bit since, or the compression algorithm added some noise or artifacts. This means that the pixels change a bit over time, even on the same object. When you take five puzzle pieces that are just slightly offset from each other and smash them together to create one average piece, the sharp lines will get a little soft. If you average one and nine, you get five. The extremes get blurred out. The fine texture also gets smoothed out. This is the price we pay for stability. We trade razor-sharp details for a video that doesn't flicker. A fair trade if you ask me, and one that is almost guaranteed to be solved two more papers down the line. And that is the First Law of Papers.

Now, we are still not done yet. Not even close. We talked about how it would copy in pieces of the bench from earlier puzzles. That's okay. But how does it know which shadow to keep and which to remove? Or how does it identify which blades of grass the cat has stepped on? That still sounds like witchcraft. Now, there is another wonderful idea. You see, in a single photo, a single puzzle, a shadow is just a dark patch of grass. But in a stack of puzzles, in a video, the shadow moves with the dog. Yes, the AI realizes these pieces are magnetically stuck together. So, just remove things that move together. Love it. A really advanced paper explained in really simple words.
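The "average one and nine, you get five" point can be demonstrated with a tiny NumPy sketch. This is my own toy example, not anything from the paper: we average several copies of a sharp edge that are each misaligned by a pixel or two, standing in for small camera motion between frames, and watch the hard step turn into a soft ramp.

```python
import numpy as np

def shifted_edge(shift, length=10):
    """A 1-D scanline with a hard step edge: dark (1) on the left,
    bright (9) on the right, with the edge position jittered by
    `shift` pixels (a stand-in for slight camera motion)."""
    line = np.ones(length)
    line[length // 2 + shift:] = 9.0
    return line

# Five "frames" of the same edge, each slightly misaligned.
frames = np.stack([shifted_edge(s) for s in (-2, -1, 0, 1, 2)])

# The "mean" in mean temporal attention: averaging over time.
# Near the edge, dark and bright pixels get mixed, so extremes
# like 1 and 9 average toward the middle and the step softens.
avg = frames.mean(axis=0)
print(avg)  # 1s and 9s far from the edge, a soft ramp in between
```

Far from the edge, every frame agrees and the average is still exactly 1 or 9, which is why the blur shows up only around fine detail and object boundaries; flat regions stay stable and flicker-free.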
And whatever the test metric is, it sweeps all of these previous techniques like it wasn't even a challenge. These are the two off-the-shelf open systems it is built on. You can plug in anything here, and that is a real hallmark of this new technique. Look, the scores highlight that the choice does not make a huge difference. Just plug in anything and it will make it work. The scientists behind it also promised that the source code of this will also be available over time. I reached out and asked them for comment, and they said most likely early February. So we all get this for free. Not just the research paper, but the source code too. Ah, thank you so much. What a time to be alive.

So, what this can do is absolutely incredible. But I don't see a lot of people talking about this paper. It is like treasure that no one is digging for because it doesn't pay well. But this is super important. These are the works that push humanity forward. I feel like Indiana Jones for finding these amazing works and sharing them with you fellow scholars. But I am worried, because if I don't do it here, I am not sure if anyone else will. So save a paper today: subscribe, hit the bell, and leave a really kind comment. The algorithm will also reward you for it.

Here you see me running the full DeepSeek AI model through Lambda GPU Cloud. 671 billion parameters running super fast and super reliably. This is insane. I love it and I use it on a regular basis. Lambda provides you with powerful NVIDIA GPUs to run your own chatbots and experiments. Seriously, try it out now at lambda.ai/papers
