# Hindsight Experience Replay | Two Minute Papers #192

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=Dvd1jQe3pq0
- **Date:** 27.09.2017
- **Duration:** 5:03
- **Views:** 27,620
- **Source:** https://ekstraktznaniy.ru/video/14584

## Description

The paper "Hindsight Experience Replay" is available here:
https://arxiv.org/pdf/1707.01495.pdf

Our Patreon page with the details:
https://www.patreon.com/TwoMinutePapers

Recommended for you:
Deep Reinforcement Terrain Learning - https://www.youtube.com/watch?v=wBrwN4dS-DA&t=109s
Digital Creatures Learn To Walk - https://www.youtube.com/watch?v=kQ2bqz3HPJE
Task-based Animation of Virtual Characters - https://www.youtube.com/watch?v=ZHoNpxUHewQ
Real-Time Character Control With Phase-Functioned Neural Networks - https://www.youtube.com/watch?v=wlndIQHtiFw
DeepMind's AI Learns Locomotion From Scratch - https://www.youtube.com/watch?v=14zkfDTN_qo

We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Andrew Melnychuk, Brian Gilman, Dave Rushton-Smith, Dennis Abts, Esa Turkulainen, Evan Breznyik, Kaben Gabriel Nanlohy, Michael Albrecht, Michael Jensen, Michael Orenstein, Steef, Sunil Kim, Torsten Reil.
https://www.patreon.com/TwoMinutePapers


## Transcript

### <Untitled Chapter 1> []

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. Reinforcement learning is an awesome algorithm that is able to play computer games, navigate helicopters, hit a baseball, or even defeat Go champions when combined with a neural network and Monte Carlo tree search. It is a quite general algorithm that is able to take on a variety of difficult problems that involve observing an environment and coming up with a series of actions to maximize a score.

In a previous episode, we had a look at DeepMind's algorithm, where a set of movement actions had to be chosen to navigate a difficult 3D environment efficiently. The score to be maximized was the distance measured from the starting point: the further our character went, the higher the score it was given. And it successfully learned the concept of locomotion. Really cool.

A prerequisite for a reinforcement learner to work properly is that it has to be given informative reward signals. For instance, if we take a written exam, as an output we would like to get a detailed breakdown of the number of points we got for each problem. This way, we know where we did well and which kinds of problems need some more work. However, imagine having a really careless teacher who never tells us the points, but only tells us whether we have failed or passed. No explanation, no points for individual tasks, no telling whether we failed by a lot or just a tiny bit. Nothing. First attempt: we failed. Next time: we failed again. And again, and again. This would be a dreadful learning experience, because we would have absolutely no idea what to improve. Clearly, this teacher would have to be fired.

However, when formulating a reinforcement learning problem, instead of using more informative scores, it is much easier to just tell whether the algorithm was successful or not. It is very convenient for us to be this careless teacher. Otherwise, what score would make sense for a helicopter control problem when we almost crash into a tree? This part is called reward engineering, and the main issue is that we have to adapt the problem to the algorithm, when it would be best if the algorithm adapted to the problem. This has been a long-standing problem in reinforcement learning research, and a potential solution would open up the possibility of solving even harder and more interesting problems with learning algorithms. And this is exactly what researchers at OpenAI try to solve by introducing Hindsight

### Hindsight Experience Replay [2:30]

Experience Replay, or HER in short. Very apt. This algorithm takes on problems where the scores are binary, which means that it either passed or failed the prescribed task. A classic careless teacher scenario. And these rewards are not only binary, but very

### Pushing task [2:48]

sparse as well, which further exacerbates the difficulty of the problem. In the video, you can see a comparison with a previous algorithm, with and without the HER extension. The higher the number of epochs you see above, the longer the algorithm was able to train. The incredible thing here is that it is able to achieve a goal even if it had never been able to reach it during training. The key idea is that we can learn just

### Sliding task [3:14]

as much from undesirable outcomes as from desirable ones. Let me quote the authors: "Imagine that you are learning how to play hockey and are trying to shoot a puck into a net. You hit the puck, but it misses the net on the right side. The conclusion drawn by a standard reinforcement learning algorithm in such a situation would be that the performed sequence of actions does not lead to a successful shot, and little if anything would be learned. It is, however, possible to draw another conclusion, namely that this sequence of actions would be successful if the net had been placed further to the right." They have achieved this by storing and replaying previous experiences with different potential goals. As always, the details are available in the paper; make sure to have a look.

Now, it is always good to test whether the whole system works well in software; however, its usefulness has also been demonstrated by deploying it on a real robot arm. You can see the goal written on the screen alongside the results. A really cool piece of work that can potentially open up new ways of thinking about reinforcement learning. After all, it's great to have learning algorithms that are so good they can solve problems that we formulate in such a lazy way that we ought to be fired.

And here's a quick question: do you think eight of these episodes a month is worth a dollar? If you have enjoyed this episode and your answer is yes, please consider supporting us on Patreon; details are available in the video description. Thanks for watching and for your generous support, and I'll see you next time!
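The goal-relabeling idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the 1-D toy environment, the integer state encoding, and the use of the episode's final achieved state as the substitute goal (the "final" strategy; the paper also evaluates "future" and "episode" variants) are all illustrative assumptions.

```python
def binary_reward(achieved, goal):
    """Sparse binary reward: 0 if the goal is achieved, -1 otherwise."""
    return 0.0 if achieved == goal else -1.0

def hindsight_relabel(episode, buffer):
    """Store each transition twice: once with the original goal, and once
    with the episode's final achieved state substituted as the goal,
    recomputing the sparse reward for the substituted goal."""
    achieved_final = episode[-1]["next_state"]
    for t in episode:
        # Original transition: the intended goal was missed, so reward is -1.
        buffer.append((t["state"], t["action"], t["next_state"],
                       t["goal"], binary_reward(t["next_state"], t["goal"])))
        # Hindsight transition: pretend we were aiming for where we ended up.
        buffer.append((t["state"], t["action"], t["next_state"],
                       achieved_final,
                       binary_reward(t["next_state"], achieved_final)))

# Toy 1-D episode: the agent tried to reach goal 5 but only got to state 3.
episode = [
    {"state": 0, "action": 1, "next_state": 1, "goal": 5},
    {"state": 1, "action": 1, "next_state": 2, "goal": 5},
    {"state": 2, "action": 1, "next_state": 3, "goal": 5},
]
buffer = []
hindsight_relabel(episode, buffer)
```

Every original transition carries a reward of -1 (the net was missed), so a standard learner gets no signal. The relabeled copies turn the same trajectory into a success story for the goal "reach state 3", and the final relabeled transition earns a reward of 0, which is exactly the hockey-net intuition from the quote.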
