DeepMind’s Veo3 AI - The New King Is Here!


Two Minute Papers · 22.05.2025 · 85,667 views · 3,493 likes


Video description
❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers
Guide for using DeepSeek on Lambda: https://docs.lambdalabs.com/education/large-language-models/deepseek-r1-ollama/?utm_source=two-minute-papers&utm_campaign=relevant-videos&utm_medium=video
📝 More on Veo 3 available here: https://deepmind.google/models/veo/
📝 My paper on simulations that look almost like reality is available for free here: https://rdcu.be/cWPfD
Or this is the original Nature Physics link with clickable citations: https://www.nature.com/articles/s41567-022-01788-5
🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Benji Rabhan, B Shang, Christian Ahlin, Gordon Child, John Le, Juan Benet, Kyle Davis, Loyal Alchemist, Lukas Biewald, Michael Tedder, Owen Skarpness, Richard Sundvall, Steef, Taras Bobrovytsky, Thomas Krcmar, Tybie Fitzhugh, Ueli Gallizzi
If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers
My research: https://cg.tuwien.ac.at/~zsolnai/
X/Twitter: https://twitter.com/twominutepapers
Thumbnail design: Felícia Zsolnai-Fehér - http://felicia.hu

Contents (2 segments)

Segment 1 (00:00 - 05:00)

This was the state-of-the-art in AI video generation two years ago. And now, check this out. Look at what just arrived. But also listen: "This ocean, it's a force. A wild untamed might. And she commands your awe with every breaking light." Oh my goodness. We have a new king. Yes, Google DeepMind just announced their new AI video generation technique, Veo 3, where you write a small piece of text and out comes a video. And it synthesized the sounds as well. Not only that, but speech, too. "Where were you on the night of the bubble bath?" Wow, that is one of the hardest things to get right, because as humans, we are wired to closely watch faces and detect even the slightest emotional cues while someone is speaking. In short, they have really been cooking. This is incredible. But it gets even better, because it can do 10 more amazing things.

One, scene changes. Did you notice that in most of the AI video generators out there, you write a little text prompt and you get one scene, but no meaningful changes? Well, not here. These scenes really tell stories, like a feather getting stuck in a spiderweb, us meeting the mosquitoes of the future in a futuristic hive. Kidding. Or a paper boat getting lost. All of these come out from one prompt each. Game changer.

Two, reference-powered videos. You take a photo of a character, perhaps yourself, and bam, you are immediately there doing what you specified in the text prompt. With this, you can appear in glorious places you've never been to, perhaps places that don't even exist. Loving this one.

Three, style matching. To create a video of this amazingly creative origami world, you don't need to do much. Just fold something as a style reference, give it as an image, write a piece of text, and you can create a whole feature-length movie.

Four, finally, character consistency for video.
This is barely solved well enough for still images, and they did it for complete videos, and you really get the same character, or can even create interesting variants of it. My head is still spinning from this. But this is nothing compared to what is coming now. Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér.

Five, specify the first and the last frame. Start: a block of marble or stone. Last frame: this griffin. Now, little AI, you do the hard part, everything in between. So, can it do it? No way, right? And wow, this is breathtaking.

Six. If you get a scene where you wish to zoom in, that is easy, you just do it yourself. But zooming out, now that's a nearly impossible problem, because you would have to synthesize all the missing information, and all this for video. Can it do that too? Look at that. It seems perfect to me. I can't even point out a single seam anywhere.

Seven. Add an object to an already existing scene, or even a human. And here they really know how to make Károly happy. Look, indirect illumination is there. That is the colors of the burning torch painting its surroundings. Absolutely beautiful.

And eight, character control. You record a video of yourself, add a target image of the subject, and there you go. Now, I am a little more excited about this variant: "Hmm, this equation regarding quantum entanglement in birdseed particles is proving rather elusive." Oh yes, making virtual characters come alive. What a time to be alive.

Nine. You can also mark up an image with movement directions. And now, try to imagine what a good result would look like. Now, Veo 3... I am out of words. This is low-key amazing. Why? Well, it does it exactly as we specified, but the blocks don't collide, and the whole scene just makes sense. You could never create a computer program without AI that can pull off anything like this. Now, 10 is coming in a moment. In the meantime, I'll note that not even this technique is perfect.
The sounds of this keyboard are still a bit off. And don't forget, not all, but most of these videos have sound too, which you can check out through the link in the video description. And 10, the scholars who

Segment 2 (05:00 - 06:00)

know. You see, if you watched Two Minute Papers all this time, you knew this was coming. Six years and 600 episodes ago, we talked about a paper about an AI system that was able to guess the sound of pixels. And then, 10 months ago, Google DeepMind published a follow-up work on it as well that we talked about. So all you need to do is watch Two Minute Papers, and you will know the future. Bravo, Google DeepMind. And now, step number two: we might get fully open solutions that will be able to do similar things. Imagine running this at home for free. I love being alive today. What a time to be alive.

Here you see me running the full DeepSeek AI model through Lambda GPU Cloud. 671 billion parameters, running super fast and super reliably. This is insane. I love it, and I use it on a regular basis. Lambda provides you with powerful NVIDIA GPUs to run your own chatbots and experiments. Seriously, try it out now at lambda.ai/papers or click the link in the description.
