DeepMind’s New AI Beats OpenAI With 100x Less Data

8:25

Two Minute Papers 18.11.2025 76,632 views 3,538 likes

Video description

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

Guide:
Rent one of their GPUs with over 16GB of VRAM
Open a terminal
Just get Ollama following the command from here - https://ollama.com/download/linux
Then run ollama run gpt-oss:120b - https://ollama.com/library/gpt-oss:120b

📝 The paper is available here: https://danijar.com/project/dreamer4/
Source: https://www.youtube.com/watch?v=6bnM84xGxbg

📝 My paper on simulations that look almost like reality is available for free here: https://rdcu.be/cWPfD
Or this is the orig. Nature Physics link with clickable citations: https://www.nature.com/articles/s41567-022-01788-5

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Benji Rabhan, B Shang, Christian Ahlin, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Steef, Taras Bobrovytsky, Tybie Fitzhugh, Ueli Gallizzi
If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers

My research: https://cg.tuwien.ac.at/~zsolnai/
X/Twitter: https://twitter.com/twominutepapers
Thumbnail design: Felícia Zsolnai-Fehér - http://felicia

#minecraft

Contents (2 segments)

Segment 1 (00:00 - 05:00)

This is a mind-blowing work. Yes, this is not a human playing Minecraft. This is an AI playing. But that's not really interesting, is it? I mean, why should I care? There are plenty of AIs playing Minecraft. This earlier one looked at 1 million hours of YouTube tutorials and got pretty good. So, what's this one about?

Well, this is Google DeepMind's new technique. You see it playing right now. And it has an ace up the sleeve. It was not allowed to browse YouTube to its heart's content. No, it has never even tried Minecraft at all before. So, no more YouTube. You're grounded, little AI. So, what happened? Something crazy happened.

Ever have the feeling when you are in the heat of an argument and you don't have a good comeback? But by the time you get back home to think, you know exactly what you would say to that jerk. Well, that's exactly what this technique does. And now hold on to your papers, Fellow Scholars, because here comes the crazy part. The AI was given a tiny bit of footage of humans playing the game, but then that's it. No more, not even access to the game itself. And it was not nearly enough material to master anything. So, get this. Instead, the AI built an internal world model, a little neural simulator of how Minecraft works, and then it started practicing within that simulation. Did not even touch the real game. Wowzers.

Earlier, OpenAI's Video PreTraining technique, VPT, not to be confused with GPT, trained from 250,000 hours of Michelin-star annotated footage. Not just footage: labels of what is happening, everything. But this new technique learned from 100 times less data. So our question today is: is it as good as OpenAI's technique that had 100x more data? That would be very respectable. So let's have a look. What? Are you kidding me? Are you saying that by the time OpenAI's success rate for what I think is a stone pickaxe is down to 0%, completely failing, this one can succeed 90% of the time?
What is this insanity? Wow. But it gets better. It even gets to the iron pickaxe and finally can obtain a diamond. Rarely, though, but it's now possible. And it used to be impossible even with BC and VLA. That is incredibly impressive. Why? Well, BC is the behavioral cloning paper. This just copies what humans did without too much thought. While VLA is the vision-language-action model, that kind of cheats, too. If BC is like a parrot repeating moves it saw, then VLA is like a student who is allowed to read all the rules before playing. So, how on earth is it possible that this imagination training outperforms all of them like crazy? Well, the research paper is available for free for all of us and it explains it all, if you know where to look. I think I got it. Let me try to explain.

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. It works in three phases.

Phase one: world model pre-training. First, it watches the videos to build an inner movie set of how Minecraft works.

Phase two: learn what matters. So here magic happens. It starts training in its own imagination. And when it mines a block, it gets plus one point. This is instant feedback. And now it starts assigning value to the actions that it does. That is how it begins to form expectations about what's truly important to do next.

And now phase three. These dreams are now accurate and informative. So it now practices in its dreams millions and millions of times. Here it learns from imagined success and failure. And that is how it is capable of executing more than 20,000 actions in a row to get that piece of diamond.

And this is really tough to get right. For instance, this formula describes how it should replay these imagined games, how to choose which action helped to get to that diamond. For instance, it learns when it is sufficient to just copy the seen human gameplay. Sometimes that works. But if you want to chop that tree

Segment 2 (05:00 - 08:00)

without an axe in your hand, copying doesn't help. Then you have to learn for yourself. And this AI does exactly that. Absolutely incredible.

But wait a second. What does this whole thing have to do with Minecraft? I mean, you look at humans doing things and you learn how the world works. Who said that this has to be limited to Minecraft? Well, that's the best part. No one. It can also dream about the real world and can simulate what-if scenarios with objects dropping, friction, and more. This shows that the same imagination that helped it mine diamonds in Minecraft can be used to teach robots to safely practice in their own simulated worlds before acting in ours. What a time to be alive!

Now, I'll tell you in a moment about limitations, what it cannot do. But now, please like, subscribe, and hit the bell icon to not miss future papers. I have so many I don't even know where to start. And please leave a really kind comment. This will help you too with the YouTube algorithm.

So the main limitation of this AI is that it can predict things, but only in the short term. Now I hear you saying, Dr. Károly, what? How is that possible? You just said that it strings 20,000 actions together to get that heavenly diamond. Is that not long-term? Yes. Absolutely right. It did just that. But it did not do that with one super long dream. No, but with many short, tiny dreams that are stitched together. So it's not one long flawless dream. It is many small ones. Okay. Does that matter? Oh, big time. You see, because every short dream is accurate for just a few seconds, the agent doesn't understand long-term cause and effect. For example, in its imagination, it chops down a tree, but later the tree might pop back into existence. So, over time, there will be a small mistake that snowballs into a big one. So, yes, it can get that diamond, but not always. The longer the run, the more unreliable it gets. But I mean, getting a jump this big in just one research paper is beyond amazing.
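The snowballing described above can be illustrated with a small back-of-the-envelope sketch. It assumes, purely for illustration, that each imagined step is correct with a fixed probability and that errors are independent; the function name and the 0.999 figure are hypothetical and do not come from the paper.

```python
def error_free_probability(per_step_accuracy: float, horizon: int) -> float:
    """Chance that all `horizon` imagined steps are correct.

    Assumes independent per-step errors, which is a simplification.
    """
    return per_step_accuracy ** horizon


if __name__ == "__main__":
    accuracy = 0.999  # hypothetical per-step accuracy, not a figure from the paper
    # Short dreams stay mostly reliable; one very long dream almost never does.
    for horizon in (10, 100, 1_000, 20_000):
        print(horizon, f"{error_free_probability(accuracy, horizon):.3g}")
```

Under these assumptions, a short rollout is far more likely to stay consistent than a single rollout of 20,000 steps, which is one way to read why many short dreams are stitched together instead.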
100x less data than OpenAI. No access to the real game. Still does it. This is beyond amazing. And just imagine what we will be capable of just two more papers down the line. My goodness.

And I got to say, I can't stop playing with OpenAI's open GPT model through Lambda GPU Cloud. And yes, this is actual speed. I can't believe that I can have more than 100 billion parameters running super fast here. Many of you Fellow Scholars are using it. And if you don't, make sure to check it out. It costs only a couple dollars per hour. Insanity. You can rent an Nvidia GPU through lambda.ai/papers or click the link in the video description.
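The three phases described in Segment 1 can be sketched on a toy problem. This is not the paper's method: it swaps the neural world model for a lookup table, Minecraft for a hypothetical six-cell corridor, and the paper's training objective for plain tabular Q-learning. It also collapses phases one and two into a single table. All names (`real_step`, `record_dataset`, `fit_world_model`, `train_in_imagination`) are made up for illustration. The one idea it keeps is that the agent learns only from a small recorded dataset and then practices inside its own model, in short rollouts, without touching the real game.

```python
import random
from collections import defaultdict

N_STATES, GOAL = 6, 5   # hypothetical toy corridor, not Minecraft
ACTIONS = (-1, +1)      # step left / step right


def real_step(state, action):
    """The 'real game'. Used only to record the offline dataset."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)


def record_dataset(n, rng):
    """Stand-in for recorded human play: random (s, a, s', r) tuples."""
    data = []
    for _ in range(n):
        s, a = rng.randrange(N_STATES), rng.choice(ACTIONS)
        s2, r = real_step(s, a)
        data.append((s, a, s2, r))
    return data


def fit_world_model(data):
    """Phases 1 and 2 (collapsed): predict next state and reward from data."""
    model = {}
    for s, a, s2, r in data:
        model[(s, a)] = (s2, r)  # the toy is deterministic, so a table suffices
    return model


def train_in_imagination(model, rng, episodes=2000, horizon=8, gamma=0.9, lr=0.5):
    """Phase 3: Q-learning on short imagined rollouts; never calls real_step."""
    q = defaultdict(float)
    for _ in range(episodes):
        s = rng.randrange(N_STATES)
        for _ in range(horizon):          # one short "dream"
            a = rng.choice(ACTIONS)
            if (s, a) not in model:       # the model never saw this move
                break
            s2, r = model[(s, a)]
            best_next = max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += lr * (r + gamma * best_next - q[(s, a)])
            s = s2
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES)}


if __name__ == "__main__":
    rng = random.Random(0)
    model = fit_world_model(record_dataset(200, rng))
    print(greedy_policy(train_in_imagination(model, rng)))
```

In this toy, the learned policy is expected to step toward the goal from every cell, even though the real environment was only used to record the initial dataset.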
