# This Curious AI Beats Many Games...and Gets Addicted to the TV

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=fzuYEStsQxc
- **Date:** 17.11.2018
- **Duration:** 4:46
- **Views:** 288,826

## Description

Pick up cool perks on our Patreon page:
› https://www.patreon.com/TwoMinutePapers

Crypto and PayPal links are available below. Thank you very much for your generous support!
› PayPal: https://www.paypal.me/TwoMinutePapers
› Bitcoin: 13hhmJnLEzwXgmgJN7RB6bWVdT7WkrFAHh
› Ethereum: 0x002BB163DfE89B7aD0712846F1a1E53ba6136b5A
› LTC: LM8AUh5bGcNgzq6HaV1jeaJrFvmKxxgiXg

The paper "Large-Scale Study of Curiosity-Driven Learning" is available here:
Paper - https://pathak22.github.io/large-scale-curiosity/
Blog post - https://blog.openai.com/reinforcement-learning-with-prediction-based-rewards/

We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
313V, Andrew Melnychuk, Angelos Evripiotis, Anthony Vdovitchenko, Brian Gilman, Christian Ahlin, Christoph Jadanowski, Dennis Abts, Emmanuel, Eric Haddad, Eric Martel, Evan Breznyik, Geronimo Moralez, John De Witt, Kjartan Olason, Lorin Atzberger, Marten Rauschenberg, Maurits van Mastrigt, Michael Albrecht, Michael Jensen, Morten Punnerud Engelstad, Nader Shakerin, Owen Skarpness, Raul Araújo da Silva, Rob Rowe, Robin Graham, Ryan Monsurate, Shawn Azman, Steef, Steve Messina, Sunil Kim, Thomas Krcmar, Torsten Reil, Zach Boldyga.
https://www.patreon.com/TwoMinutePapers

Thumbnail background image credit: https://pixabay.com/photo-3774381/
Splash screen/thumbnail design: Felícia Fehér - http://felicia.hu

Károly Zsolnai-Fehér's links:
Facebook: https://www.facebook.com/TwoMinutePapers/
Twitter: https://twitter.com/karoly_zsolnai
Web: https://cg.tuwien.ac.at/~zsolnai/

## Contents

### [0:00](https://www.youtube.com/watch?v=fzuYEStsQxc) Segment 1 (00:00 - 04:00)

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. Reinforcement learning is a class of learning algorithms in which an agent chooses a series of actions in an environment to maximize a score. This class of techniques enables us to train an AI to master video games, avoid obstacles with a drone, or clean up a table with a robot arm, and it has many more really cool applications. We use the words score and reward interchangeably, and the goal is that over time, the agent has to learn to maximize a prescribed reward.

So where should the rewards come from? Most techniques work by using extrinsic rewards. Extrinsic rewards are only a half-solution, as they need to come from somewhere, for example from the game in the form of a game score, which simply isn't present in every game. And even if it is present in a game, it is very different for Atari Breakout and, for instance, a strategy game. Intrinsic rewards are designed to come to the rescue, so the AI would be able to completely ignore the in-game score and somehow have some sort of inner motivation that drives it to complete a level.

But what could possibly be a good intrinsic reward that would work well on a variety of tasks? Shouldn't this be different from problem to problem? If so, we are back to square one. If we are to call our learner intelligent, then we need one algorithm that is able to solve a large number of different problems. If we need to reprogram it for every game, that's just a narrow intelligence.

So, a key finding of this paper is that we can endow the AI with a very human-like property: curiosity. Human babies also explore the world out of curiosity and, as a happy side effect, learn a lot of useful skills to navigate in this world later. However, as the everyday definition of curiosity is a little nebulous, we have to provide a mathematical definition for it. In this work, this is defined as trying to maximize the number of surprises.
This will drive the learner to favor actions that lead to unexplored regions and complex dynamics in a game.

So, how do these curious agents fare? Well, they do quite well! In Pong, when the agent plays against itself, it will end up in long matches passing the ball between the two paddles. How about bowling? Well, I cannot resist quoting the authors on this one: "The agent learned to play the game better than agents trained to maximize the (clipped) extrinsic reward directly. We think this is because the agent gets attracted to the difficult-to-predict flashing of the scoreboard occurring after the strikes." With a little stretch, one could perhaps say that this AI is showing signs of addiction. I wonder how it would do with modern mobile games with loot boxes? But we'll leave that for future work for now.

How about Super Mario? Well, the agent is very curious to see how the levels continue, so it learns all the necessary skills to beat the game. Incredible.

However, the more seasoned Fellow Scholars will immediately find that there is a catch. What if we sit the AI down in front of a TV that constantly plays new material? You may think this is some kind of a joke, but it's not. It is a perfectly valid issue, because due to its curiosity, the AI would have to stay there forever and not start exploring the level. This is the good old definition of TV addiction. Talk about human-like properties. And sure enough, as soon as we turn off the TV, the agent gets to work immediately. Who would have thought! The paper notes that this challenge needs to be dealt with over time; however, the algorithm was tested on a large variety of problems, and it did not come up in practice.

And the key insight is that curiosity is not only a great replacement for extrinsic rewards, with which it is often aligned, but in some cases it is even superior to them. That is an amazing value proposition for something that we can run on any problem without any additional work.
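
To make the "maximize surprise" idea and the TV problem above a little more concrete, here is a minimal Python sketch. It is not the paper's implementation: the tabular running-mean predictor, the two toy environments, and every name in it (`RunningMeanForwardModel`, `curiosity_reward`, `deterministic_step`, `noisy_tv_step`, `average_late_surprise`) are assumptions made purely for illustration. The only idea taken from the episode is that the intrinsic reward is the agent's prediction error, so predictable transitions stop being rewarding while an unpredictable screen keeps paying out.

```python
import random
from collections import defaultdict


class RunningMeanForwardModel:
    """Hypothetical toy forward model: predicts the next observation for
    each (state, action) pair as the running mean of what it has seen."""

    def __init__(self):
        self.mean = defaultdict(float)
        self.count = defaultdict(int)

    def predict(self, state, action):
        return self.mean[(state, action)]

    def update(self, state, action, next_obs):
        key = (state, action)
        self.count[key] += 1
        self.mean[key] += (next_obs - self.mean[key]) / self.count[key]


def curiosity_reward(model, state, action, next_obs):
    """Intrinsic reward = squared prediction error (the "surprise")."""
    error = next_obs - model.predict(state, action)
    model.update(state, action, next_obs)  # learn from what just happened
    return error * error


def deterministic_step(state, action, rng):
    """Assumed toy world: five positions on a ring, fully predictable."""
    next_state = (state + action) % 5
    return next_state, float(next_state)


def noisy_tv_step(state, action, rng):
    """Assumed toy "TV": the agent stays put and sees fresh random noise."""
    return state, rng.uniform(0.0, 4.0)


def average_late_surprise(step_fn, steps=2000, seed=0):
    """Run a random policy and average the curiosity reward over the
    last 200 steps, after the model has had time to learn."""
    rng = random.Random(seed)
    model = RunningMeanForwardModel()
    state = 0
    rewards = []
    for _ in range(steps):
        action = rng.choice([0, 1])
        next_state, next_obs = step_fn(state, action, rng)
        rewards.append(curiosity_reward(model, state, action, next_obs))
        state = next_state
    late = rewards[-200:]
    return sum(late) / len(late)


if __name__ == "__main__":
    print("deterministic world:", average_late_surprise(deterministic_step))
    print("noisy TV:           ", average_late_surprise(noisy_tv_step))
```

In this toy setting, the surprise in the deterministic world drops to zero once every transition has been seen, whereas the noisy TV stays surprising no matter how long the agent watches, which is why a purely curious agent has no reason to leave it.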
So, curious agents that are addicted to flashing score screens and TVs. What a time to be alive! And, if you enjoyed this episode and you wish to help us on our quest to inform even more people about these amazing stories, please consider supporting us on Patreon.com/TwoMinutePapers. You can pick up cool perks there to keep your papers addiction in check. As always, there is a link to it and to the paper in the video description. Thanks for watching and for your generous support, and I'll see you next time!

---
*Source: https://ekstraktznaniy.ru/video/14389*