# DeepMind's AI Learns Imagination-Based Planning | Two Minute Papers #178

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=xp-YOPcjkFw
- **Date:** 09.08.2017
- **Duration:** 4:21
- **Views:** 88,137

## Description

The paper "Imagination-Augmented Agents for Deep Reinforcement Learning" is available here:
https://arxiv.org/abs/1707.06203

Our Patreon page with the details:
https://www.patreon.com/TwoMinutePapers

WE WOULD LIKE TO THANK OUR GENEROUS PATREON SUPPORTERS WHO MAKE TWO MINUTE PAPERS POSSIBLE:
Andrew Melnychuk, Christian Lawson, Dave Rushton-Smith, Dennis Abts, e, Eric Swenson, Esa Turkulainen, Kaben Gabriel Nanlohy, Michael Albrecht, Michael Orenstein, Steef, Sunil Kim, Torsten Reil.
https://www.patreon.com/TwoMinutePapers

Two Minute Papers Merch:
US: http://twominutepapers.com/
EU/Worldwide: https://shop.spreadshirt.net/TwoMinutePapers/

Music: Antarctica by Audionautix is licensed under a Creative Commons Attribution license (https://creativecommons.org/licenses/by/4.0/)
Artist: http://audionautix.com/ 

Thumbnail background image credit: https://pixabay.com/photo-767781/
Splash screen/thumbnail design: Felícia Fehér - http://felicia.hu

Károly Zsolnai-Fehér's links:
Facebook → https://www.facebook.com/TwoMinutePapers/
Twitter → https://twitter.com/karoly_zsolnai
Web → https://cg.tuwien.ac.at/~zsolnai/

## Contents

### [0:00](https://www.youtube.com/watch?v=xp-YOPcjkFw) Segment 1 (00:00 - 04:00)

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. A bit more than two years ago, the DeepMind guys implemented an algorithm that could play Atari Breakout at a superhuman level by looking at the video feed that you see here. The news immediately took the world by storm. This original paper is a bit more than two years old and has already been referenced in well over a thousand other research papers. That is one powerful paper.

This algorithm was based on a combination of a neural network and reinforcement learning: the neural network was used to understand the video feed, and reinforcement learning was there to come up with the appropriate actions. This is the part that plays the game. Reinforcement learning is very well suited to tasks where we are in a changing environment and need to choose an appropriate action based on our surroundings to maximize some sort of score. This score can be, for instance, how far we've gotten in a labyrinth, how many collisions we have avoided with a helicopter, or any sort of score that reflects how well we are currently doing. The algorithm works similarly to how an animal learns new things: it observes the environment, tries different actions, and sees if they worked well. If yes, it will keep doing that; if not, well, let's try something else. Pavlov's dog with the bell is an excellent example of this.

There are many existing works in this area, and the approach performs remarkably well for a number of problems and computer games, but only if the reward comes relatively quickly after the action. For instance, in Breakout, if we miss the ball, we lose a life immediately; but if we hit it, we'll almost immediately break some bricks and increase our score. This is more than suitable for a well-built reinforcement learning algorithm. However, this earlier work didn't perform well on games that required long-term planning.
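The trial-and-error loop described above can be sketched in a few lines. Below is a minimal, illustrative tabular Q-learning agent on a toy corridor where the reward arrives immediately at the goal; the environment, reward values, and hyperparameters are assumptions for illustration, not details from the paper or from DeepMind's Atari system (which used a deep network instead of a table):

```python
import random

# A toy 5-cell corridor: states 0..4; reaching state 4 yields reward 1.
# All values below are illustrative assumptions, not from the paper.
N_STATES = 5
ACTIONS = [1, -1]            # move right or left
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

# Q-table: expected long-term score of taking action a in state s.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Environment dynamics: move in the corridor; reward comes
    immediately upon reaching the goal (the easy case for RL)."""
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

random.seed(0)
for episode in range(200):
    s, done = 0, False
    while not done:
        # Try something; if it worked well, the update below makes it
        # more likely to be chosen again (like Pavlov's dog).
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        nxt, r, done = step(s, a)
        best_next = max(Q[(nxt, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = nxt

# After training, the greedy policy heads right, toward the goal.
policy = [max(ACTIONS, key=lambda act: Q[(s, act)])
          for s in range(N_STATES - 1)]
print("Greedy policy:", policy)
```

Because the reward follows the decisive action within a step or two, the Q-values propagate back quickly; it is exactly this immediacy that breaks down in games needing long-term planning.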
If Pavlov gave his dog a treat for something it did two days ago, the animal would have no clue which action led to the tasty reward. The subject of this work is the puzzle game Sokoban, where we control this green character and our goal is to push the boxes onto the red dots. This game is particularly difficult, not only for algorithms but even for humans, for two important reasons. One, it requires long-term planning, which, as we know, is a huge issue for reinforcement learning algorithms. Just because a box is next to a dot doesn't mean that it is the one that belongs there. This is a particularly nasty property of the game. And two, some mistakes we make are irreversible: for instance, pushing a box into a corner can make it impossible to complete the level. An algorithm that simply tries a bunch of actions and sees if they stick is not going to work here. It is now hopefully easy to see that this is an obscenely difficult problem.

And the DeepMind guys just came up with Imagination-Augmented Agents as a solution for it. So, what is behind this really cool name? The interesting part about this novel architecture is that it uses imagination: a routine that cooks up not only one action but entire plans consisting of several steps, and finally chooses the one that has the greatest expected reward over the long term. It takes information about the present, imagines possible futures, and chooses the one with the most handsome reward. And as you can see, this is only the first paper on this new architecture, and it can already solve a problem with seven boxes. This is just unreal. Absolutely amazing work. And please note that this is a fairly general algorithm that can be used for a number of different problems; this particular game was just one way of demonstrating the attractive properties of the new technique. The paper contains more results and is a great read. Make sure to have a look.
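The core idea of imagining whole plans before acting can be illustrated with a toy model-based planner. Everything here — the one-dimensional track, the trap, the reward values — is an invented stand-in for Sokoban, not the paper's architecture (which learns its environment model with neural networks rather than being handed a perfect one):

```python
import itertools

# A toy track with an irreversible mistake: stepping on the trap ends the
# episode with nothing, just like wedging a box into a corner in Sokoban.
# All constants are illustrative assumptions, not from the paper.
TRACK_LEN = 7
TRAP = 3          # landing here is irreversible failure
GOAL = 6          # reaching here yields reward 10
ACTIONS = {"step": 1, "jump": 2}

def model(state, action):
    """The agent's internal (here: perfect) model of the environment."""
    nxt = min(TRACK_LEN - 1, state + ACTIONS[action])
    if nxt == TRAP:
        return nxt, 0.0, True      # irreversible mistake
    if nxt == GOAL:
        return nxt, 10.0, True     # success
    return nxt, -1.0, False        # small cost for every move

def imagine(plan, state=0):
    """Roll an entire multi-step plan forward in the model and return
    its total imagined reward, without touching the real environment."""
    total = 0.0
    for action in plan:
        state, reward, done = model(state, action)
        total += reward
        if done:
            break
    return total

# Imagine every possible future up to 6 steps; commit to the best one.
plans = [p for n in range(1, 7)
         for p in itertools.product(ACTIONS, repeat=n)]
best_plan = max(plans, key=imagine)
print("Best plan:", best_plan, "imagined reward:", imagine(best_plan))
```

A reactive agent that only evaluates its next move would happily walk into the trap; the planner rejects that future because it simulates the plan to its end before committing, which is the intuition behind choosing the future with the greatest expected reward.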
Also, if you've enjoyed this episode, please consider supporting Two Minute Papers on Patreon. Details are available in the video description. Have a look. Thanks for watching and for your generous support, and I'll see you next time.

---
*Source: https://ekstraktznaniy.ru/video/14611*