# DeepMind’s AI Just Solved Video Generation In A Way Nobody Expected

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=spn_eTODPg8
- **Date:** 13.10.2025
- **Duration:** 7:47
- **Views:** 257,818

## Description

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

Guide:
1. Rent one of their GPUs with over 16 GB of VRAM
2. Open a terminal
3. Get Ollama with this command - https://ollama.com/download/linux
4. Then run `ollama run gpt-oss:120b` - https://ollama.com/library/gpt-oss:120b

📝 The paper is available here:
https://video-zero-shot.github.io/

📝 My paper on simulations that look almost like reality is available for free here:
https://rdcu.be/cWPfD 

Or this is the original Nature Physics link with clickable citations:
https://www.nature.com/articles/s41567-022-01788-5

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Benji Rabhan, B Shang, Christian Ahlin, Gordon Child, John Le, Juan Benet, Kyle Davis, Loyal Alchemist, Lukas Biewald, Michael Tedder, Owen Skarpness, Richard Sundvall, Steef, Sven Pfiffner, Taras Bobrovytsky, Thomas Krcmar, Tybie Fitzhugh, Ueli Gallizzi
If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers

My research: https://cg.tuwien.ac.at/~zsolnai/
X/Twitter: https://twitter.com/twominutepapers
Thumbnail design: Felícia Zsolnai-Fehér - http://felicia.hu

## Contents

### [0:00](https://www.youtube.com/watch?v=spn_eTODPg8) Segment 1 (00:00 - 05:00)

What is going on here? Video generator AIs are not supposed to be able to do this. This is just too much. Look at this fidelity. Wow. I spent years and years of my life writing physics and light simulations, but I'm never going to write anything that is this realistic. And it just pops out of this new AI. This is insanity. Okay, let me try to explain.

One of the scientists from Google DeepMind just reached out to me to look at this work. But I thought, you know, so many papers, so little time. But I couldn't resist, and when I looked into it, well, I almost fell off my chair. My only saving grace was, of course, holding on to my papers strongly enough.

I recommend that you Fellow Scholars do that, because this work is changing how we should be thinking about AI. Okay, this is us peering into the mind of the Veo 3 AI. Veo 3 is Google DeepMind's latest generative video model. Text goes in, video comes out. It is super good, but man, super expensive too. But we already know that. Now scientists tried something absolutely incredible with it.

First, little AI, here is an image. And now you get a text prompt saying: you have to roll a burrito. Then it creates this video. This is unbelievable! But this is still nothing compared to what it is capable of.

For instance, it also seems to understand a lot of advanced concepts in our world. It understands color mixing: add two kinds of paint together, and man, it knows. Try to write a simulation that is this good.

Now, transfiguration. Make this teacup into a mouse. And I gotta say, here I am going to be merciless: slow it down and look at every single frame. And, look at that. Wow. Let's slow down the transformation: it even retains the motifs and the overall style of the original teacup. That is amazing. But I am not a normal person, so that's not what I look at. I am a light transport researcher by trade, that is, ray tracing, so what I am looking at is this.
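To get a feel for why paint mixing is nontrivial for a simulation, here is a toy sketch of my own (nothing from the paper): a crude subtractive model that represents each paint by its RGB reflectance in [0, 1] and multiplies channel-wise. Real pigment mixing follows far richer physics (e.g. Kubelka-Munk theory), but even this toy shows why blue and yellow paint drift toward green.

```python
def mix_paints(a, b):
    """Toy subtractive mix: channel-wise product of RGB reflectances
    in [0, 1]. A crude illustration, not a physical pigment model."""
    return tuple(ca * cb for ca, cb in zip(a, b))

# Yellow paint reflects mostly red and green; a cyan-ish blue paint
# reflects mostly green and blue. Their product keeps only green.
yellow = (1.0, 1.0, 0.1)
blue = (0.1, 0.9, 1.0)

print(mix_paints(yellow, blue))  # green channel dominates
```

Note the design choice: multiplying reflectances models light surviving both pigments, whereas naively averaging RGB values (additive mixing) would give a muddy gray-blue instead.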
Wow, check this out! Even the specular highlights on the golden spoon change realistically.

Okay, let's do some more of that. If you have this 3D model, and you just say: make it drop onto one knee and raise the shield. Easy peasy. But that's not even the point. Look at the reflections on the armor. Completely consistent throughout the whole video. Wow.

You can even subject it to a psychological Rorschach test. So what do you see in this inkblot, little AI? Wow. You're clearly of two minds, and both are being photobombed by crabs.

But that's still nothing. Refractions? Not a problem. Goodness. Soft body simulations? Why are you even asking? It even understands material properties and sees what would happen if we burned this paper. But please, Fellow Scholars, never burn the papers. Read them instead.

Now, I'll show you the next few ones, and I'll tell you why they are super surprising.

Image inpainting? You know, when we are missing half the image, so fill it in? Doesn't even break a sweat. Now let's do outpainting. Yes, that's exactly what you think it means: imagine the world around it. And if you think this is already impressive, look at this. Wow, it just zooms out and out, and I think I believe every single pixel of it.

Image edge detection, segmentation, super resolution, denoising, you name it. It can also turn a low-light image into a much more presentable one.

Now, look, I did my doctorate in computer graphics, so I know that many of these techniques are taught in undergrad classes now. We can program a machine to do them. However, here this is completely different. Okay, why? Because this AI was not programmed to do any of this. All of these things it can do are emergent capabilities. What does that mean? It means that it has looked at a large amount of videos on the internet, and learned these concepts by itself. No one asked it to learn these. It learns like a little child.
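Edge detection is a good example of the kind of technique we program by hand in those undergrad classes. Here is a minimal NumPy sketch of the classical Sobel filter (my own illustration of the hand-coded approach, not anything from the paper or from Veo 3):

```python
import numpy as np

def sobel_edges(img):
    """Classical Sobel edge detection: convolve a grayscale image with
    horizontal and vertical gradient kernels, return gradient magnitude."""
    kx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)  # horizontal gradient
    ky = kx.T                                 # vertical gradient
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(patch * kx)
            gy[i, j] = np.sum(patch * ky)
    return np.hypot(gx, gy)  # gradient magnitude per pixel

# A tiny image with a vertical edge: zeros on the left, ones on the right.
img = np.zeros((5, 6))
img[:, 3:] = 1.0
edges = sobel_edges(img)
print(edges)  # large values only where the intensity jumps
```

The contrast the video draws is exactly this: every line above had to be written and tuned by a person, whereas Veo 3 was never given any such kernel and picked the behavior up from watching videos.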
That is absolutely incredible. But it's not perfect. No sir! So what's the catch?

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér.

### [5:00](https://www.youtube.com/watch?v=spn_eTODPg8&t=300s) Segment 2 (05:00 - 07:00)

Okay, limitations. Yes, you can get it to solve your water puzzle. Ha! But no one said it will do that correctly! Sometimes it gets a bit confused, like a magician who pulls the rabbit out of the hat before even putting it in. Entertaining, yes; reliable? Nope. You can even subject it to an IQ test, which it fails. So it still makes plenty of mistakes, and the paper discusses them in detail. Do you not have it open through the link in the video description already? 5 pushups, right now. Now note that this is just Veo 3, and it is a huge jump forward from version 2, and now imagine what Veo 5 could possibly be capable of. My goodness. What a time to be alive!

And I wanted to add something from the paper, which you now have open already, right? 5 pushups. The authors call this thing "chain of frames". Yes, that's pretty much like how ChatGPT is now thinking step by step: the video model now shows its reasoning step by step too, in moving pictures. It's like watching a cartoon character think out loud, but instead of words in a thought bubble, each new frame is the next step of its reasoning. So cool!

So thank you so much for reaching out, I would have completely missed this paper. Especially since, as of the making of this video, it is not even linked from the official DeepMind website that collects all their papers. Really weird. By the way, this video is not sponsored by Google or DeepMind, I have no business ties with them, and they don't even know if this video is coming or not.

And if you like this kind of content, please like, subscribe, hit the bell icon, and leave a really kind comment. That way, you'll get more of this in the future, and I'll be honest, your kindness really makes my day. So, AI for mazes, AI for burrito wrappers, subscribe to Two Minute Papers.
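To make the "chain of frames" idea concrete, here is a toy sketch of my own (not code from the paper): a solver that finds a path through a tiny maze and then replays it as one grid per step. The sequence of frames is the visible reasoning trace, playing the same role that chain-of-thought tokens play in a language model.

```python
from collections import deque

def solve_maze_frames(grid, start, goal):
    """Breadth-first search on a grid maze ('#' = wall, '.' = free),
    then replay the found path as a list of frames. Each frame is the
    maze with the agent's position marked 'A': a "chain of frames"."""
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            break
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != '#' and (nr, nc) not in prev):
                prev[(nr, nc)] = (r, c)
                queue.append((nr, nc))
    # Reconstruct the path by walking the predecessor links backwards.
    path, cell = [], goal
    while cell is not None:
        path.append(cell)
        cell = prev[cell]
    path.reverse()
    # One frame per step: the reasoning laid out in "moving pictures".
    frames = []
    for r, c in path:
        marked = [row[:c] + 'A' + row[c + 1:] if i == r else row
                  for i, row in enumerate(grid)]
        frames.append('\n'.join(marked))
    return frames

maze = ["..#",
        ".##",
        "..."]
frames = solve_maze_frames(maze, (0, 0), (2, 2))
for f in frames:
    print(f, end='\n\n')  # five frames, 'A' stepping toward the corner
```

Of course, a video model does nothing like explicit BFS internally; the analogy is only that each emitted frame, like each step above, commits to one intermediate state of the solution.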

---
*Source: https://ekstraktznaniy.ru/video/12072*