❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers
Guide:
Rent one of their GPUs with over 16GB of VRAM
Open a terminal
Install Ollama using the command on this page - https://ollama.com/download/linux
Then run ollama run gpt-oss:120b - https://ollama.com/library/gpt-oss:120b
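Put together, the guide above is just two commands on a fresh Linux GPU instance (the install one-liner is the one shown on the linked Ollama download page — check that page in case it changes):

```shell
# Install Ollama on Linux (one-liner from https://ollama.com/download/linux)
curl -fsSL https://ollama.com/install.sh | sh

# Pull the gpt-oss:120b model and start an interactive chat
# (needs a GPU with enough VRAM, hence the Lambda instance above)
ollama run gpt-oss:120b
```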
📝 The paper is available here:
https://video-zero-shot.github.io/
📝 My paper on simulations that look almost like reality is available for free here:
https://rdcu.be/cWPfD
Or this is the orig. Nature Physics link with clickable citations:
https://www.nature.com/articles/s41567-022-01788-5
🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Benji Rabhan, B Shang, Christian Ahlin, Gordon Child, John Le, Juan Benet, Kyle Davis, Loyal Alchemist, Lukas Biewald, Michael Tedder, Owen Skarpness, Richard Sundvall, Steef, Sven Pfiffner, Taras Bobrovytsky, Thomas Krcmar, Tybie Fitzhugh, Ueli Gallizzi
If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers
My research: https://cg.tuwien.ac.at/~zsolnai/
X/Twitter: https://twitter.com/twominutepapers
Thumbnail design: Felícia Zsolnai-Fehér - http://felicia.hu
Table of Contents (2 segments)
Segment 1 (00:00 - 05:00)
What is going on here? Video generator AIs are not supposed to be able to do this. This is just too much. Look at this fidelity. Wow. I spent years and years of my life writing physics and light simulations, but I’m never going to write anything that is this realistic. And it just pops out of this new AI. This is insanity. Okay, let me try to explain.

One of the scientists from Google DeepMind just reached out to me to look at this work. But I thought, you know, so many papers, so little time. But I couldn’t resist, and when I looked into it, well, I almost fell off my chair. My only saving grace was, of course, holding on to my papers strongly enough. I recommend that you Fellow Scholars do that too, because this work is changing how we should be thinking about AI.

Okay, this is us peering into the mind of the Veo 3 AI. Veo 3 is Google DeepMind’s latest generative video model. Text goes in, video comes out. It is super good, but man, super expensive too. But we already know that. Now scientists tried something absolutely incredible with it. First, little AI, here is an image. And now you get a text prompt saying: you have to roll a burrito. Then, it creates this video. This is unbelievable! But this is still nothing compared to what it is capable of.

For instance, it also seems to understand a lot of advanced concepts in our world. It understands color mixing: add two kinds of paint together, and man, it knows. Try to write a simulation that is this good. Now, transfiguration. Make this teacup into a mouse. And I gotta say, here I am going to be merciless: slow it down and look at every single frame. And, look at that. Wow. Let’s slow down the transformation; it even retains the motifs and the overall style of the original teacup. That is amazing. But I am not a normal person, so that’s not what I look at. I am a light transport researcher by trade, that is, ray tracing, so what I am looking at is this. Wow, check this out!
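That paint-mixing intuition, by the way, is subtractive color: one deliberately crude way to approximate it in code is to multiply per-channel reflectances (a toy sketch for illustration only — this has nothing to do with how Veo 3 works internally):

```python
def mix_paints(c1, c2):
    """Crude subtractive paint mix: multiply per-channel reflectances in [0, 1]."""
    return tuple(a * b for a, b in zip(c1, c2))

# Yellow paint reflects red and green; cyan paint reflects green and blue.
yellow = (1.0, 1.0, 0.0)
cyan = (0.0, 1.0, 1.0)

# Only the light both paints reflect survives: green.
green = mix_paints(yellow, cyan)
```

Real pigment mixing is far messier (spectra, scattering, Kubelka-Munk models), which is exactly why it is impressive that a video model gets it right just from watching videos.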
Even the specular highlights on the golden spoon change realistically. Okay, let’s do some more of that. If you have this 3D model, and you just say: make it drop onto one knee and raise the shield. Easy peasy. But that’s not even the point. Look at the reflections on the armor. Completely consistent throughout the whole video. Wow. You can even subject it to a psychological Rorschach test. So what do you see in this inkblot, little AI? Wow. You're clearly of two minds, and both are being photobombed by crabs.

But that’s still nothing. Refractions? Not a problem. Goodness. Soft body simulations? Why are you even asking? It even understands material properties and sees what would happen if we burned this paper. But please, Fellow Scholars, never burn the papers. Read them instead.

Now, I’ll show you the next few ones, and I’ll tell you why they are super surprising. Image inpainting? You know, when we are missing half the image, so fill it in? Doesn’t even break a sweat. Now let’s do outpainting. Yes, that’s exactly what you think it means: imagine the world around it. And if you think this is already impressive, look at this. Wow, it just zooms out and out, and I think I believe every single pixel of it. Image edge detection, segmentation, super resolution, denoising, you name it. It can also make a low-light image into a much more presentable one.

Now, look, I did my doctorate in computer graphics, so I know that many of these techniques are taught in undergrad classes now. We can program a machine to do them. However, here this is completely different. Okay, why? Because this AI was not programmed to do any of this. All of these things it can do are emergent capabilities. What does that mean? It means that it has looked at a large amount of videos on the internet, and learned these concepts by itself. No one asked it to learn these. It learns like a little child. That is absolutely incredible. But it’s not perfect. No sir! So what’s the catch?
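To see why "we can program a machine to do them" is the key contrast, here is what one of those undergrad techniques looks like when programmed by hand — a minimal Sobel edge detector in plain Python/NumPy (a classic textbook method, not anything Veo 3 runs internally):

```python
import numpy as np

def sobel_edges(img):
    """Classic Sobel edge detection on a 2D grayscale array."""
    kx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)  # horizontal gradient kernel
    ky = kx.T                                 # vertical gradient kernel
    h, w = img.shape
    out = np.zeros((h, w))
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            patch = img[y - 1:y + 2, x - 1:x + 2]
            gx = np.sum(kx * patch)           # horizontal intensity change
            gy = np.sum(ky * patch)           # vertical intensity change
            out[y, x] = np.hypot(gx, gy)      # gradient magnitude = edge strength
    return out

# A tiny image with a sharp vertical edge down the middle:
img = np.zeros((5, 6))
img[:, 3:] = 1.0
edges = sobel_edges(img)  # strongest response right at the boundary columns
```

The point of the video is exactly that nobody wrote anything like this into Veo 3 — the capability just emerged from watching videos.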
Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér.
Segment 2 (05:00 - 07:00)
Okay, limitations. Yes, you can get it to solve your water puzzle. Ha! But no one said it will do that correctly! Sometimes it gets a bit confused, like a magician who pulls the rabbit out of the hat before even putting it in. Entertaining, yes; reliable? Nope. You can even subject it to an IQ test, which it fails. So it still makes plenty of mistakes; the paper discusses them in detail. Do you not have it open through the link in the video description already? 5 pushups, right now.

Now note that this is just Veo 3, and it is a huge jump forward from version 2, so now imagine what Veo 5 could possibly be capable of. My goodness. What a time to be alive!

And I wanted to add something from the paper, which you now have open already, right? 5 pushups. The authors call this thing “chain of frames”. Yes, much like how ChatGPT now thinks step by step, the video model shows its reasoning step by step too, in moving pictures. It’s like watching a cartoon character think out loud, but instead of words in a thought bubble, each new frame is the next step of its reasoning. So cool!

So thank you so much for reaching out; I would have completely missed this paper. Especially since, as of the making of this video, it is not even linked from the official DeepMind website that collects all their papers. Really weird. By the way, this video is not sponsored by Google or DeepMind, I have no business ties with them, and they don’t even know whether this video is coming or not.

And if you like this kind of content, please like, subscribe, hit the bell icon, and leave a really kind comment. That way, you’ll get more of this in the future, and I’ll be honest, your kindness really makes my day. So, AI for mazes, AI for burrito wrappers, subscribe to Two Minute Papers.