# OpenAI’s New AI: Crushing Games! 🎮

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=jZT7yHVgcOo
- **Date:** 20.06.2025
- **Duration:** 7:26
- **Views:** 66,687

## Description

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

Guide for using DeepSeek on Lambda:
https://docs.lambdalabs.com/education/large-language-models/deepseek-r1-ollama/?utm_source=two-minute-papers&utm_campaign=relevant-videos&utm_medium=video

📝 Paper+code: https://github.com/lmgame-org/GamingAgent
Some results: https://huggingface.co/spaces/lmgame/lmgame_bench
Try it out: https://lmgame.org

📝 My paper on simulations that look almost like reality is available for free here:
https://rdcu.be/cWPfD 

Or this is the orig. Nature Physics link with clickable citations:
https://www.nature.com/articles/s41567-022-01788-5

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Benji Rabhan, B Shang, Christian Ahlin, Gordon Child, John Le, Juan Benet, Kyle Davis, Loyal Alchemist, Lukas Biewald, Michael Tedder, Owen Skarpness, Richard Sundvall, Steef, Sven Pfiffner, Taras Bobrovytsky, Thomas Krcmar, Tybie Fitzhugh, Ueli Gallizzi
If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers

My research: https://cg.tuwien.ac.at/~zsolnai/
X/Twitter: https://twitter.com/twominutepapers
Thumbnail design: Felícia Zsolnai-Fehér - http://felicia.hu

## Contents

### [0:00](https://www.youtube.com/watch?v=jZT7yHVgcOo) Segment 1 (00:00 - 05:00)

I had so much fun with this work. Goodness. Finally, today you are going to see nearly all major AIs put to the test, not on the usual benchmarks, but on gaming. Oh yes, there are some incredible findings here that I'll tell you about. So, which is the most intelligent AI out there? We keep saying PhD-level AI, this and that. But can they play Tetris, Super Mario, or Sokoban? Well, let's try Llama 4 and see what happens. We know it's good at benchmarks, but unfortunately it struggles with gameplay. Which gameplay, you ask? The answer is yes.

And then, trying Tetris. Previous models leave a lot of gaps; lines barely form and then collapse. Enter OpenAI's o4-mini. Looking better, holding on longer, but still not a single line cleared. Will the popular DeepSeek R1 change this? A good start. And wait, yes, a line is formed. But hold on. Oh no. The promising start unravels quickly. Heartbreaking. Now, while we look at Claude 4 Opus, I feel like we are just looking at AIs that outcompete each other in losing later than the others, but not quite winning. And that is further underlined by the fact that this point system gives you one point for each piece you put down before losing the game. Now, I have high hopes for OpenAI's amazing shiny new o3 Pro. Now, hold on to your papers, fellow scholars, as it starts out a bit weird. But if we keep looking a bit longer, wow, it is clearing line after line. And I am starting to think that this is the first one that is actually planning ahead. We'll see that in action in a later game soon. And goodness, it did not fail by the end of the experiment, which is admittedly not super long, but very impressive indeed. Bravo.

Now, Super Mario. Dear fellow scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. GPT-4o is not going to win any awards here, I will tell you that right now. But Claude 3.5 almost seems smart, finding a hidden block, but then inexplicably diving into the abyss. 3.7 looks better.
Crushing Goombas, leaping bravely over pits, uncovering the star and rushing towards it recklessly. Can it survive the thrill? And just when it seems destined to reach the finish line, ooh, disaster strikes. And when we try again, it is on track to go all the way and finish the level. And then, oh my. That was the first AI technique that felt a bit like watching a human play: doing a godlike run and then messing up the easiest thing. Really cool. Final results: o3 is best, often by quite a bit. It is crushing Super Mario, Sokoban, and Candy Crush. o3 Pro does not have every game yet, but the ones it has been tested on show a quantum leap compared to everything else. Amazing.

Now, a little logic game, Sokoban. Just push the boxes onto the X marks. Here is Gemini 2.5 Flash. Okay, I am seeing something here. It successfully finishes the first level. Let's look at the second one. I already feel some impending doom. The first box quickly goes where it thinks it needs to. And you will see later that this is not quite where it needs to go. Then we go back, and come on, man. Don't do it. Do not do it. Oh, brutal. Okay. OpenAI o3, I have trust in you. First level: easy peasy. Then the dreaded second level. Will it fall into the same trap? It knows that if you push the box onto this mark, you won't be able to push the second box in here. So, nice planning in advance. And then the level almost solves itself afterwards. Good job, little AI. But after level four, it, too, falls short. A good leap forward.

Now, we have fresh new footage of the amazing o3 Pro as well, which I will show you in a moment. The code is available, too. Make sure to check it out in the video description. And I can't wait to fire up a Lambda instance and play with these. Now, have you noticed this? Oh, yes. It is really slow. I mean, really, really slow. Each of these moves takes forever. Why? Because these tasks are not quite what these AI techniques are meant to do.
Thanks to the researchers for creating something that they call a

### [5:00](https://www.youtube.com/watch?v=jZT7yHVgcOo&t=300s) Segment 2 (05:00 - 07:00)

harness, which is a textual representation of the game itself. It is fed to the AIs at each step, and then they are asked: okay, what do we do now? By the way, this way they can also play the amazing Ace Attorney, for some of you connoisseurs out there. And as promised, while you're looking at the amazing o3 Pro and how far it can push this game, note again that this is sped up. These moves really take a while, but that is not the important part.

We have three key lessons here. You've got to hear these. One: perhaps for the first time ever, we are starting to see genuine planning and strategic thinking emerge in these large models, even if it's slow. That is absolutely incredible. And lesson number two: previous benchmarks don't tell us the whole story. However, games are an incredibly rich and challenging test bed for evaluating core AI capabilities. They demand long-term planning and adaptation in a way few other benchmarks can. Super useful, and this is how we get to truly understand their weaknesses and strengths. And third, this one is wild: after training on Sokoban, the AIs improve their spatial reasoning skills. And when they play the previously unseen Tetris, they do better, up to 8% better, just from reusing the knowledge they learned in Sokoban, a game that is quite different. I think that is absolutely incredible. Perhaps a sign of some kind of intelligence arising from basically sand, that is, silicon. What a time to be alive. And in the meantime, o3 Pro has demonstrated these lessons and finished all six levels in the game. Bravo.

Here you see me running the full DeepSeek AI model through Lambda GPU Cloud. 671 billion parameters running super fast and super reliably. This is insane. I love it, and I use it on a regular basis. Lambda provides you with powerful NVIDIA GPUs to run your own chatbots and experiments. Seriously, try it out now at lambda.ai/papers or click the link in the description.
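The harness idea described above can be sketched in a few lines: serialize the game state as plain text, build a prompt, and read back a single move from the model. This is only an illustrative sketch with hypothetical names (`render_board`, `ask_model`, and a toy stand-in for the LLM); the actual GamingAgent code linked in the description is structured differently.

```python
# Illustrative sketch of a text "harness" loop for an LLM game agent.
# All function names here are made up for the example; see the real
# GamingAgent repo (github.com/lmgame-org/GamingAgent) for the actual code.

def render_board(board):
    """Serialize a grid-based game state into the text form an LLM can read."""
    return "\n".join("".join(row) for row in board)

def ask_model(llm, board, legal_moves):
    """Build the per-step prompt and return the model's chosen move."""
    prompt = (
        "You are playing Sokoban. '#'=wall, '$'=box, '.'=goal, '@'=player.\n"
        f"Board:\n{render_board(board)}\n"
        f"Legal moves: {', '.join(legal_moves)}.\n"
        "Reply with exactly one move."
    )
    return llm(prompt).strip()

# Toy stand-in for a real LLM call, just to make the sketch runnable:
# it always answers "right".
def toy_llm(prompt):
    return "right"

board = [list("#####"),
         list("#@$.#"),
         list("#####")]
move = ask_model(toy_llm, board, ["up", "down", "left", "right"])
print(move)  # -> right
```

Serializing the state anew at every step is also why each move takes so long: the model re-reads the whole board and reasons from scratch before answering, which is exactly the slowness the video points out.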

---
*Source: https://ekstraktznaniy.ru/video/12298*