# Building a Curious AI With Random Network Distillation

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=CIDRdLOWrXQ
- **Date:** 02.12.2018
- **Duration:** 3:31
- **Views:** 30,861
- **Source:** https://ekstraktznaniy.ru/video/14385

## Description

This episode was supported by insilico.com. "Anything outside life extension is a complete waste of time". See their papers:
- Papers: https://www.ncbi.nlm.nih.gov/pubmed/?term=Zhavoronkov%2Ba
- Website: http://insilico.com/

The paper "Exploration by Random Network Distillation" is available here:
Blog post: https://blog.openai.com/reinforcement-learning-with-prediction-based-rewards/
Paper: https://arxiv.org/abs/1810.12894
Code: https://github.com/openai/random-network-distillation

We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
313V, Andrew Melnychuk, Angelos Evripiotis, Anthony Vdovitchenko, Brian Gilman, Christian Ahlin, Christoph Jadanowski, Dennis Abts, Emmanuel, Eric Haddad, Eric Martel, Evan Breznyik, Geronimo Moralez, Javier Bustamante, John De Witt, Kaiesh Vohra, Kjartan Olason, Lorin Atzberger, Marcin Dukaczewski, Marten Rauschenberg, Maurits van Mastrigt, Michael Albrecht, Michael Jensen, Morten Punnerud Engelstad, Nader Shakeri

## Transcript

### Segment 1 (00:00 - 03:00)

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. In a previous episode, we talked about a class of learning algorithms that were endowed with curiosity. This new work also showcases a curious AI that aims to solve Montezuma's Revenge, a notoriously difficult platform game for an AI to finish. The main part of the difficulty arises from the fact that the AI needs to be able to plan over long time horizons, and, interestingly, it also needs to learn that short-term rewards don't necessarily mean long-term success.

Let's have a look at an example, quoting the authors: "There are four keys and six doors spread throughout the level. Any of the four keys can open any of the six doors, but are consumed in the process. To open the final two doors the agent must therefore forego opening two of the doors that are easier to find and that would immediately reward it for opening them." So what this means is that we have a tricky situation: the agent has to disregard the fact that it gets a nice score for opening doors, and understand that these keys can be saved for later. This temptation is very hard for an AI to resist, and, again, curiosity comes to the rescue.

Curiosity, or at least this particular definition of it, works as follows: the harder it is for the AI to predict what will happen next, the more excited it gets to perform an action. This drives the agent to finish the game and explore as much as possible, because it is curious to see what the next level holds. You can see in the animation that the big reward spikes appear when the AI has found something new and meaningful, like losing a life or narrowly avoiding an adversary. As you also see, climbing a ladder is a predictable, boring mechanic that the AI is not very excited about. Later, it becomes able to predict the result even better the second and third time around, so it gets even less excited about ladders.
This other animation shows how the curious agent explores adjacent rooms over time. The work also introduces a technique the authors call random network distillation. Here, a randomly initialized, untrained neural network is fixed as a target, and a second predictor network is trained to match the target's outputs on the states the agent visits. For familiar states the predictor gets good at this, while for novel states the prediction error is high, and that error serves as the curiosity bonus. This formulation also makes the agent immune to the noisy TV problem from our previous episode, where a curious, unassuming agent would get stuck in front of a TV that continually plays new content: since the target's output is a deterministic function of the current observation, randomness in the environment cannot keep the prediction error high forever. The agent also takes into consideration the score reported by the game, so it combines this external reward with its internal motivation to explore.

And hold on to your papers, because this AI can not only perform well in the game, it performs better than the average human. And again, remember that no ground-truth knowledge is required: it was never demonstrated to the AI how one should play this game. Very impressive results indeed, and you see, the pace of progress in machine learning research is nothing short of incredible. Make sure to have a look at the paper in the video description for more details.

We'd also like to send a big thank you to Insilico Medicine for supporting this video. They use AI for research on preventing aging, believe it or not, and are doing absolutely amazing work. Make sure to check them out in the video description. Thanks for watching and for your generous support, and I'll see you next time!
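The core mechanic described above, that a predictor chasing a fixed random target yields a novelty bonus which decays for familiar states, can be sketched in a few lines. This is a minimal, hypothetical NumPy illustration, not the authors' implementation: both "networks" are reduced to single linear maps, the dimensions and learning rate are made up, and the predictor is trained with plain SGD on one repeatedly visited observation (the "ladder").

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, FEAT_DIM = 8, 16  # arbitrary sizes for this toy example

# Fixed, randomly initialized target network. It is never trained.
W_target = rng.normal(size=(OBS_DIM, FEAT_DIM))

# Predictor network, trained to match the target's output.
W_pred = np.zeros((OBS_DIM, FEAT_DIM))

def intrinsic_reward(obs):
    """Curiosity bonus: squared error between predictor and target features."""
    err = obs @ W_pred - obs @ W_target
    return float(np.mean(err ** 2))

def train_predictor(obs, lr=0.05):
    """One SGD step pulling the predictor toward the target's features."""
    global W_pred
    err = obs @ W_pred - obs @ W_target   # residual, shape (FEAT_DIM,)
    W_pred -= lr * np.outer(obs, err)     # gradient of the squared error (up to a constant)

familiar = rng.normal(size=OBS_DIM)  # a state the agent visits repeatedly
novel = rng.normal(size=OBS_DIM)     # a state it has never seen

before = intrinsic_reward(familiar)
for _ in range(200):
    train_predictor(familiar)
after = intrinsic_reward(familiar)

# The bonus for the familiar state shrinks with repetition,
# while the unseen state still looks "exciting".
print(after < before)                    # True
print(intrinsic_reward(novel) > after)   # True
```

In the full method, the agent would add this bonus to the game's own score, so both the external reward and the curiosity signal drive its behavior; here we only show why the bonus fades for ladders but stays high for unexplored rooms.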
