# MuZero: DeepMind’s New AI Mastered More Than 50 Games

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=hYV4-m7_SK8
- **Date:** 07.01.2020
- **Duration:** 5:27
- **Views:** 125,697
- **Source:** https://ekstraktznaniy.ru/video/14199

## Description

❤️ Check out Linode here and get $20 free credit on your account: https://www.linode.com/papers

📝 The paper "Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model" is available here:
https://arxiv.org/abs/1911.08265

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Alex Haro, Andrew Melnychuk, Angelos Evripiotis, Anthony Vdovitchenko, Benji Rabhan, Brian Gilman, Bryan Learn, Christian Ahlin, Claudio Fernandes, Daniel Hasegan, Dan Kennedy, Dennis Abts, Eric Haddad, Eric Martel, Evan Breznyik, Geronimo Moralez, James Watt, Javier Bustamante, John De Witt, Kaiesh Vohra, Kasia Hayden, Kjartan Olason, Levente Szabo, Lorin Atzberger, Lukas Biewald, Marcin Dukaczewski, Marten Rauschenberg, Maurits van Mastrigt, Michael Albrecht, Michael Jensen, Nader Shakerin, Owen Campbell-Moore, Owen Skarpness, Raul Araújo da Silva, Rob Rowe, Robin Graham, Ryan Monsurate, Shawn Azman, Steef, Steve Messina, Sunil Kim, Taras Bobrovytsky, Thomas Krc

## Transcript

### Segment 1 (00:00 - 05:00)

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. Some papers come with an intense media campaign and a lot of nice videos, and some other amazing papers are at risk of slipping under the radar because of the lack of such a media presence. This new work from DeepMind is indeed absolutely amazing, you’ll see in a moment why, and it is not really talked about. So in this video, let’s try to reward such work. In many episodes, you get ice cream for your eyes, but today, it’s for your mind. Buckle up. In the last few years, we have seen DeepMind’s AI defeat the best Go players in the world, and after OpenAI’s venture in the game of DOTA 2, DeepMind embarked on a journey to defeat pro players in Starcraft 2, a real-time strategy game. This is a game that requires a great deal of mechanical skill and split-second decision making, and we have imperfect information, as we only see what our units can see. A nightmare situation for any AI. You see some footage of its previous games here on the screen. And, in my opinion, people seem to pay too much attention to how well a given algorithm performs, and too little to how general it is. Let me explain. DeepMind has developed a new technique that tries to rely more on its predictions of the future, and generalizes to many more games than previous techniques. This includes AlphaZero, a previous technique also from them that was able to play Go, Chess, and Japanese Chess, or Shogi, as well, and beat any human player at these games confidently. This new method is so general that it does as well as AlphaZero at these games; however, it can also play a wide variety of Atari games. And that is the key here: writing an algorithm that plays chess well has been a possibility for decades. For instance, if you wish to know more, make sure to check out Stockfish, which is an incredible open-source project and a very potent algorithm.
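The core idea behind "relying on its predictions of the future" is that MuZero plans by unrolling a *learned* latent-space model instead of the real game rules. A minimal sketch of that idea follows; the function names, shapes, and the toy random "networks" are illustrative assumptions, not the paper's actual architecture (the real system uses deep networks and Monte Carlo tree search rather than the greedy lookahead shown here):

```python
# Sketch of MuZero-style planning with a learned model.
# The agent never simulates the real game during planning; it encodes the
# observation into a latent state, then imagines futures with a learned
# dynamics function and scores them with a learned value function.
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the three learned networks (here: fixed random linear maps).
H = rng.standard_normal((4, 8)) * 0.1      # representation: observation -> latent
G = rng.standard_normal((5, 4)) * 0.1      # dynamics: (latent, action) -> next latent
V = rng.standard_normal(4) * 0.1           # prediction: latent -> value estimate

def represent(observation):
    """Encode a raw observation into a latent state."""
    return np.tanh(H @ observation)

def dynamics(latent, action):
    """Predict the next latent state after taking `action`."""
    return np.tanh(np.concatenate([latent, [action]]) @ G)

def value(latent):
    """Predict how good a latent state is."""
    return float(V @ latent)

def plan(observation, actions=(0, 1), depth=3):
    """Greedy lookahead entirely in latent space: pick the first action
    whose imagined rollout looks best according to the value network."""
    def rollout(latent, d):
        if d == 0:
            return value(latent)
        return max(rollout(dynamics(latent, a), d - 1) for a in actions)
    root = represent(observation)
    return max(actions, key=lambda a: rollout(dynamics(root, a), depth - 1))

obs = np.ones(8)
best = plan(obs)
print("chosen action:", best)
```

Because planning only ever touches the latent model, the same algorithm applies whether the environment is Go, Chess, Shogi, or an Atari game: nothing game-specific is hard-coded, which is exactly the generalization property discussed here.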
However, Stockfish cannot play anything else - whenever we look at a new game, we have to derive a new algorithm that solves it. Not so with these learning methods, which can generalize to a wide variety of games! This is why I would like to argue that the generalization capability of these AIs is just as important as their performance. In other words, if I could choose between a narrow algorithm that is the best possible Chess algorithm that ever existed, or a somewhat below world-champion level AI that can play any game we can possibly imagine, I would take the latter in a heartbeat. Now, speaking about generalization, let’s see how well it does at these Atari games, shall we? After 30 minutes on each game, it significantly outperforms humans on nearly all of these games; the percentages here show you by what margin. In many cases, the algorithm outperforms us several times over, and up to several hundred times. Absolutely incredible. As you see, it has a more than formidable score on almost all of these games, and therefore it generalizes quite well. I’ll tell you in a moment about the games it falters at, but for now, let’s compare it to three other competing algorithms. You see one bold number per row, which always highlights the best performing algorithm for your convenience. The new technique beats the others on about 66% of the games, including the Recurrent Experience Replay technique, in short, R2D2. Yes, this is another one of those crazy paper names. And even when it falls short, it is typically very close. As a reference, humans triumphed on less than 10% of the games. We still have a big fat zero on Pitfall and Montezuma’s Revenge. So why is that? Well, these games require long-term planning, which is one of the more difficult cases for reinforcement learning algorithms. In an earlier episode, we discussed how we can infuse an AI agent with curiosity to go out there and explore more, with some success.
However, note that those algorithms are more narrow than the one we’ve been talking about today. So there is still plenty of work to be done, but I hope you see that this is incredibly nimble progress in AI research. Bravo DeepMind! What a time to be alive! This episode has been supported by Linode. Linode is the world’s largest independent cloud computing provider. They offer affordable GPU instances featuring the Quadro RTX 6000, which is tailor-made for AI, scientific computing, and computer graphics projects. Exactly the kind of work you see here in this series. If you feel inspired by these works and you wish to run your experiments or deploy your existing projects through a simple and reliable hosting service, make sure to join over 800,000 other happy customers and choose Linode. To spin up your own GPU instance and

### Segment 2 (05:00 - 05:27)

receive a $20 free credit, visit linode.com/papers or click the link in the description and use the promo code “papers20” during signup. Give it a try today! Our thanks to Linode for supporting the series and helping us make better videos for you. Thanks for watching and for your generous support, and I'll see you next time!
