# Curiosity-Driven AI: How Effective Is It? | Two Minute Papers #257

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=9S2g7iixB9c
- **Date:** 19.06.2018
- **Duration:** 2:41
- **Views:** 41,208
- **Source:** https://ekstraktznaniy.ru/video/14455

## Description

The paper "Curiosity-driven Exploration by Self-supervised Prediction" and its source code are available here:
https://pathak22.github.io/noreward-rl/

Pick up cool perks on our Patreon page: https://www.patreon.com/TwoMinutePapers

We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
313V, Andrew Melnychuk, Brian Gilman, Christian Ahlin, Christoph Jadanowski, Dennis Abts, Emmanuel, Eric Haddad, Esa Turkulainen, Geronimo Moralez, Kjartan Olason, Lorin Atzberger, Malek Cellier, Marten Rauschenberg, Michael Albrecht, Michael Jensen, Morten Punnerud Engelstad, Nader Shakerin, Rafael Harutyuynyan, Raul Araújo da Silva, Rob Rowe, Robin Graham, Ryan Monsurate, Shawn Azman, Steef, Steve Messina, Sunil Kim, Torsten Reil.
https://www.patreon.com/TwoMinutePapers

One-time payment links are available below. Thank you very much for your generous support!
PayPal: https://www.paypal.me/TwoMinutePapers
Bitcoin: 13hhmJnLEzwXgmgJN7RB6bWVdT7WkrFAHh
Ethereum: 0x002BB

## Transcript


Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. There are many research projects about teaching an AI to play video games well. We have seen some amazing results from DeepMind's Deep Q Learning algorithm that performed on a superhuman level on many games, but faltered on others. What really made the difference is the sparsity of rewards and the lack of longer-term planning. What this means is that the more often we see the score change on our screen, the faster we know how well we are doing and change our strategy if needed. For instance, if we make a mistake in Atari Breakout, we lose a life almost immediately, but in a strategy game, a bad decision may come back to haunt us up to an hour after committing it. So, what can we do to build an AI that can deal with these cases?

### Our agent discovers how to play Mario Bros. without any rewards from the game [0:43]

So far, we have talked about extrinsic rewards that come from the environment, for instance, our score in a video game, and most existing AIs are, for all intents and purposes,

### Let's see what the same policy does on a different level [0:55]

extrinsic score-maximizing machines. And this work is about introducing an intrinsic reward by endowing an AI with one of the most humanlike attributes: curiosity. But hold on right there: how can a machine possibly become curious? Well, curiosity is defined by whatever mathematical definition we attach to it. In this work, curiosity is defined in terms of the AI's ability to predict the results of its own actions: the worse the AI's prediction of what happens next, the larger its intrinsic reward for going there. This is big, because it gives the AI tools to pre-emptively start learning skills that

### Discovering how to navigate in VizDoom [1:26]

don't seem useful now, but might be useful in the future. In short, this AI is driven to explore, even if it hasn't been told how well it is doing. It will naturally start exploring levels in Super Mario, even without seeing the score. And now comes the great part: this curiosity really teaches the AI new skills, and when we drop it into a new, previously unseen level, it will perform much better than a non-curious one. When playing Doom, the legendary first-person shooter game, it will also start exploring the level and is able to rapidly solve hard exploration tasks. The comparisons reveal that an AI infused with curiosity performs significantly better on easier tasks, but the even cooler part is that with curiosity, we can further increase the difficulty of the games and the sparsity of external rewards and can still expect the agent to do well, even where previous algorithms failed. Such an agent will be able to play much harder games than previous works could. And remember, games are only used to demonstrate the concept here; this technique will be able to do so much more. Love it. Thanks for watching and for your generous support, and I'll see you next time!
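The core idea described above — rewarding the agent for transitions its own forward model cannot yet predict — can be sketched in a few lines. This is a minimal illustrative toy, not the authors' actual Intrinsic Curiosity Module: the linear "environment", the model, and all names here are assumptions for demonstration, and the real method learns the forward model in a learned feature space with neural networks.

```python
# Toy sketch of curiosity as forward-model prediction error.
# Everything here (linear dynamics, SGD model) is an illustrative
# assumption, not the paper's implementation.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM = 4, 2

# Toy "environment": next state is a fixed linear function of (state, action).
A = rng.normal(size=(STATE_DIM, STATE_DIM + ACTION_DIM))

def env_step(state, action):
    return A @ np.concatenate([state, action])

# Learned forward model, initialized to know nothing.
W = np.zeros((STATE_DIM, STATE_DIM + ACTION_DIM))

def intrinsic_reward(state, action, next_state):
    """Curiosity bonus = squared error of the forward model's prediction."""
    pred = W @ np.concatenate([state, action])
    return float(np.sum((pred - next_state) ** 2))

def train_forward_model(state, action, next_state, lr=0.01):
    """One SGD step reducing the forward model's prediction error."""
    global W
    x = np.concatenate([state, action])
    pred = W @ x
    W -= lr * np.outer(pred - next_state, x)

# As the model learns the dynamics, familiar transitions yield less
# intrinsic reward, pushing the agent toward states it cannot yet predict.
rewards = []
for _ in range(2000):
    s = rng.normal(size=STATE_DIM)
    a = rng.normal(size=ACTION_DIM)
    s_next = env_step(s, a)
    rewards.append(intrinsic_reward(s, a, s_next))
    train_forward_model(s, a, s_next)

print(f"early curiosity reward: {np.mean(rewards[:100]):.3f}")
print(f"late curiosity reward:  {np.mean(rewards[-100:]):.3f}")
```

The key behavior to notice is that the curiosity signal decays exactly where the world has become predictable, which is what drives the agent in the video to keep moving into unexplored parts of a Mario or Doom level even with no game score in sight.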
