# DeepMind Has A Superhuman Level Quake 3 AI Team! 🚀

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=MvFABFWPBrw
- **Date:** 05.08.2018
- **Duration:** 5:13
- **Views:** 146,935
- **Source:** https://ekstraktznaniy.ru/video/14431

## Description

Pick up cool perks on our Patreon page: https://www.patreon.com/TwoMinutePapers

The paper "Human-level performance in first-person multiplayer games with population-based deep reinforcement learning" and its corresponding blog post are available here:
1. https://arxiv.org/abs/1807.01281
2. https://deepmind.com/blog/capture-the-flag/

Crypto and PayPal links are available below. Thank you very much for your generous support!
Bitcoin: 13hhmJnLEzwXgmgJN7RB6bWVdT7WkrFAHh
PayPal: https://www.paypal.me/TwoMinutePapers
Ethereum: 0x002BB163DfE89B7aD0712846F1a1E53ba6136b5A
LTC: LM8AUh5bGcNgzq6HaV1jeaJrFvmKxxgiXg

We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
313V, Andrew Melnychuk, Angelos Evripiotis, Brian Gilman, Christian Ahlin, Christoph Jadanowski, Dennis Abts, Emmanuel, Eric Haddad, Esa Turkulainen, Geronimo Moralez, Kjartan Olason, Lorin Atzberger, Marten Rauschenberg, Michael Albrecht, Michael Jensen, Morten Punnerud Engelstad, Nader Shaker

## Transcript

### Segment 1 (00:00 - 05:00)

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. After having a look at OpenAI's effort to master the DOTA 2 game, of course, we all know that scientists at DeepMind are also hard at work on an AI that beats the Capture the Flag game mode in Quake 3. Quake III Arena is an iconic first-person shooter game, and Capture the Flag is a fun game mode where each team tries to take the other team's flag and carry it to their own base while protecting their own. This game mode requires good aiming skills, map presence, reading the opponents well, and tons of strategy. A nightmare situation for any kind of AI. Not only that, but in this version, the map changes from game to game, so the AI has to learn general concepts and be able to pull them off in a variety of previously unseen conditions. This hardly seems within the realm of possibility.

The minimaps here always show the locations of the players, each color-coded blue or red to indicate their team. Much like humans, these AI agents learned by looking at the video output of the game and were never told anything about the game or its rules.

The scientists at DeepMind ran a tournament with 40 human players who were matched up with these agents randomly, both as opponents and as teammates. In this tournament, a team of average human players had a win probability of 43%, while a team of strong players won slightly more than half of their games, 52%. And now, hold on to your papers, because the agents were able to win 74% of their games. So the difference between the average and strong human players' win rates is 9 percentage points, and the difference between the strongest humans and the AI is more than twice that margin, 22 percentage points. This is insanity. And as you see, it barely matters what the size or layout of the map is or how many teammates there are; the AI's win rate is always remarkably high.
These agents showcase many human-like behaviors, such as staying at their own base to defend it, camping within the opponent's base, or following teammates. This builds on a new architecture by the name of For The Win, or FTW in short. Good work, folks! Instead of training one agent, it uses a population of agents that train against and evolve from each other to make sure that a diverse set of playstyles is discovered.

It uses recurrent neural networks, which are neural network variants that are able to learn and produce sequences of data. Here, two of these are used: a fast one and a slow one that operate on different timescales but share a memory module. This means that one of them has a very accurate look at the recent past, while the other has a coarser view but can look further back into the past in return. If these two work together correctly, decisions can be made that are both good locally, at this point in time, and globally, to maximize the probability of winning the whole game. This is really huge, because this algorithm can perform long-term planning, which is one of the key reasons why many difficult games and tasks still remain unsolved. Well, as it seems now, not for long.

An additional challenge is that the game score is not maximized directly, as it is in most games; instead, there is a mapping from the scores into an internal reward, which means that the algorithm has to be able to predict its own progress towards winning. And note that even though Quake 3 Capture the Flag is an excellent way to demonstrate the capabilities of this algorithm, the architecture can be generalized to other problems.

I am going to give you a few more tidbits that I have found super interesting, but before that, if you are enjoying this episode and would like to pick up some cool perks, like early access, deciding the topic of future episodes, or getting your name listed in the video description as a key supporter, why not support the show on Patreon?
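The fast/slow idea described above can be illustrated with a toy sketch. This is a minimal, hypothetical illustration in plain NumPy, not the paper's actual architecture: the dimensions, weights, and update period are made up, and the slow core's hidden state simply stands in for the shared memory that the fast core reads. The key point it demonstrates is that the slow core only ticks every few frames, so its state summarizes a longer stretch of the past.

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_step(x, h, Wx, Wh):
    """One step of a simple Elman-style recurrent cell."""
    return np.tanh(Wx @ x + Wh @ h)

# Hypothetical sizes: observation features, hidden units, slow-core period.
OBS, HID, SLOW_PERIOD = 8, 16, 4  # slow core updates once every 4 frames

# Randomly initialized (untrained) weights for the fast and slow cores.
Wx_fast, Wh_fast = rng.normal(size=(HID, OBS)), rng.normal(size=(HID, HID))
Wx_slow, Wh_slow = rng.normal(size=(HID, HID)), rng.normal(size=(HID, HID))

h_fast = np.zeros(HID)
h_slow = np.zeros(HID)  # stands in for the shared memory module

for t in range(20):  # 20 hypothetical frames of game observations
    obs = rng.normal(size=OBS)
    # Fast core: fine-grained view of the recent past, conditioned on
    # the slow core's state via the shared memory.
    h_fast = rnn_step(obs, h_fast + h_slow, Wx_fast, Wh_fast)
    # Slow core: updated only every SLOW_PERIOD frames, so each of its
    # steps spans several frames and reaches further back in time.
    if t % SLOW_PERIOD == 0:
        h_slow = rnn_step(h_fast, h_slow, Wx_slow, Wh_slow)

print(h_fast.shape, h_slow.shape)
```

In a trained agent, both states would feed a policy head that picks actions, so the local (fast) and global (slow) views jointly shape each decision.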
With this, you also help us make better videos in the future. You can find us at Patreon.com/TwoMinutePapers, and we also support Bitcoin and other cryptocurrencies; the addresses are available in the video description. And now, onwards to the cool tidbits:

- A human+agent team has been able to defeat an agent+agent team 5% of the time, indicating that these AIs are able to coordinate and play together with anyone they are given. I get goosebumps from this. Love it.
- The reaction time and accuracy of the agents are better than those of humans, but not nearly as perfect as many people would think. However, they outclass humans even when their accuracy is artificially reduced and their reactions artificially delayed.
- In another experiment, two agents were paired up against two professional game-tester humans who could freely communicate and train against the same agents for 12 hours, to see if they could learn their patterns and force them to make mistakes. Even with this, the humans won only 25% of these games. Given the other numbers we have, it is very likely that this unfair advantage made no difference whatsoever.

### Segment 2 (05:00 - 05:13)

How about that! If there are any more questions, make sure to have a look at the paper, which describes every tidbit you can possibly imagine. Thanks for watching and for your generous support, and I'll see you next time!
