# DeepMind’s AI Watches YouTube and Learns To Play! ▶️🤖

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=jjfDO2pWpys
- **Date:** 27.03.2021
- **Duration:** 8:17
- **Views:** 225,473

## Description

❤️ Check out Weights & Biases and sign up for a free demo here: https://www.wandb.com/papers 
❤️ Their mentioned post is available here: https://wandb.ai/latentspace/published-work/The-Science-of-Debugging-with-W-B-Reports--Vmlldzo4OTI3Ng

📝 The paper "Playing hard exploration games by watching YouTube" is available here:
Paper: https://papers.nips.cc/paper/7557-playing-hard-exploration-games-by-watching-youtube.pdf
Gameplay videos: https://www.youtube.com/playlist?list=PLZuOGGtntKlaOoq_8wk5aKgE_u_Qcpqhu

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Aleksandr Mashrabov, Alex Haro, Alex Serban, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bruno Mikuš, Bryan Learn, Christian Ahlin, Eric Haddad, Eric Martel, Gordon Child, Haris Husic,  Ivo Galic, Jace O'Brien, Javier Bustamante, John Le, Jonas, Joshua Goller, Kenneth Davis, Lorin Atzberger, Lukas Biewald, Matthew Allen Fisher, Mark Oates, Michael Albrecht, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Ramsey Elbasheer, Robin Graham, Steef, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh, Ueli Gallizzi.
If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers

Károly Zsolnai-Fehér's links:
Instagram: https://www.instagram.com/twominutepapers/
Twitter: https://twitter.com/twominutepapers
Web: https://cg.tuwien.ac.at/~zsolnai/

## Contents

### [0:00](https://www.youtube.com/watch?v=jjfDO2pWpys) Introduction

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Between 2013 and 2015, DeepMind worked on an incredible learning algorithm by the name of deep reinforcement learning. This technique looked at the pixels of the game, was given a controller, and played much like a human would, with the exception that it learned to play some Atari games at a superhuman level. I tried to train it a few years ago, and would like to invite you on a marvelous journey to see what happened when it started learning to play an old game, Atari Breakout. At first, the algorithm loses all of its lives without any sign of intelligent action. If we wait a bit, it becomes better at

### [0:45](https://www.youtube.com/watch?v=jjfDO2pWpys&t=45s) The Catch

playing the game, roughly matching the skill level of an adept player. But here's the catch: if we wait for longer, we get something absolutely spectacular. Over time, it learns to play like a pro,

### [1:00](https://www.youtube.com/watch?v=jjfDO2pWpys&t=60s) Over Time

and finds out that the best way to win the game is digging a tunnel through the bricks and hitting them from behind. This technique is a combination of a neural network that processes the visual data that we see on the screen, and a reinforcement learner that makes the gameplay-related decisions. This is an amazing algorithm, a true breakthrough in AI research. A key point in this work was that the problem formulation enabled us to measure our progress easily: we hit one brick, we get some points, so do a lot of that; lose a few lives, the game ends, so don't do that. Easy enough. But there are other exploration-based games, like Montezuma's Revenge or Pitfall, that it was not good at. And, man, these games are a nightmare for any AI, because there is no score, or at the very least, it is hard to define how well we are doing. Because there are no scores, it is hard to motivate an AI to do anything at all other than wander around aimlessly. If no one tells us whether we are doing well or not, which way do we go? Explore this space, or go to the next one? How do we solve all this? And with that, let's discuss the state of play in AIs playing difficult exploration-based computer games, and I think you will love to see how far we have come. First, there is a previous line of

### [2:30](https://www.youtube.com/watch?v=jjfDO2pWpys&t=150s) State of play

work that infused these agents with a very human-like property: curiosity. That agent was able to do much better at these games, and then got addicted to the TV, but that's a different story. Note that the TV problem has been remedied since.
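The curiosity idea mentioned here is commonly implemented as an intrinsic reward proportional to a forward model's prediction error. The sketch below is a minimal illustration, not DeepMind's code: the feature vectors and the linear forward model are placeholders. It also shows why a "noisy TV" traps such an agent — its transitions never become predictable, so the curiosity bonus never decays.

```python
import numpy as np

rng = np.random.default_rng(1)

D = 32  # size of the state-feature vector (a stand-in for learned features)

# Forward model: predicts the next feature vector from current features + action.
Wf = rng.normal(scale=0.1, size=(D, D + 1))

def predict_next(features, action):
    """Linear forward model: a placeholder for a learned dynamics network."""
    return Wf @ np.concatenate([features, [action]])

def curiosity_bonus(features, action, next_features):
    """Intrinsic reward = how badly the forward model predicted the next state."""
    err = next_features - predict_next(features, action)
    return float(np.mean(err ** 2))

s = rng.random(D)

# Fully predictable transition: the model nails it, so the bonus is zero.
bonus_seen = curiosity_bonus(s, 0, predict_next(s, 0))

# Unpredictable transition (the "noisy TV"): the prediction error, and hence
# the curiosity bonus, stays large no matter how long the agent watches.
bonus_tv = curiosity_bonus(s, 0, rng.random(D))
```

The agent adds this bonus to the (possibly absent) game score, which is what pulls it toward unexplored states even when no score exists.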

### [2:50](https://www.youtube.com/watch?v=jjfDO2pWpys&t=170s) Curiosity

And this new method attempts to solve hard exploration games by watching YouTube videos of humans playing the game, and learning from that. As you see, it

### [3:00](https://www.youtube.com/watch?v=jjfDO2pWpys&t=180s) How Does It Work

just rips through these levels in Montezuma's Revenge, and other games too. So I wonder, how does all this magic happen? How did this agent learn to explore? Well, it has three things going for it that really make this work. One, the skeptical scholar would say that

### [3:20](https://www.youtube.com/watch?v=jjfDO2pWpys&t=200s) Why It Works

all this takes is just copy-pasting what it saw from the human player. Also, imitation learning is not new, which is a point that we will address in a moment. So why bother with this? Now, hold on to your papers and observe, as it seems noticeably less efficient than the human teacher was,

### [3:40](https://www.youtube.com/watch?v=jjfDO2pWpys&t=220s) Observations

until we realize that this is not the human player and the AI, but the other way around. Look, it was so observant and took away so much from the human demonstrations that in the end, it became even more efficient than its human teacher. Whoa. Absolutely amazing. And while we are here, I would like to

### [4:05](https://www.youtube.com/watch?v=jjfDO2pWpys&t=245s) Copypaste

dissect this copy-paste argument. You see, it has an understanding of the game and does not just copy the human demonstrator. But even if it just copied what it saw, it would not be so easy, because the AI only sees images, and it has to translate how the images change in response to us pressing buttons on the controller. We might also encounter the same level, but at a different time, and we have to understand how to vanquish an opponent and how to perform that. Two, nobody hooked the agent into the game

### [4:40](https://www.youtube.com/watch?v=jjfDO2pWpys&t=280s) Game Information

information, which is huge. This means that it doesn't know what buttons are pressed on the controller, no internal numbers or game states are given to it, and most importantly, it is also not given the score of the game. We discussed how difficult this makes everything. Unfortunately, this means that there is no easy way out: it really has to understand what it sees and mine out the relevant information from each of these videos. And as you see, it does that with flying colors. Loving it. And three, it can handle the domain gap. Previous imitation learning methods did not deal with that too well. So what does that mean? Let's look at this latent space together and find out.

This is what a latent space looks like if we just embed the pixels that we see in the videos. Don't worry, I'll tell you in a moment what that is. Here, the clusters are nicely clumped up, away from each other, so that's probably good, right? Well, in this problem, not so much. The latent space means a place where similar kinds of data are meant to be close to each other, and these are the snippets of the demonstration videos that the clusters relate to. Let's test that together: do you think these images are similar? Yes, most of us humans would say that these are quite similar; in fact, they are nearly the same. So, is this a good latent space embedding? No, not in the slightest. This data is similar, therefore these should be close to each other, but this previous technique did not recognize that, because these images have slightly different colors and aspect ratios, and this one has a text overlay. But we all understand that despite all that, we are looking at the same game through different windows. So, does the new technique recognize that? Oh yes. Beautiful. Praise the papers! Similar game states are now close to

### [6:45](https://www.youtube.com/watch?v=jjfDO2pWpys&t=405s) Similar Game States

each other, we can align them properly, and therefore, we can learn more easily from them. This is one of the reasons why it can play so well.
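To make the domain-gap point concrete, here is a minimal sketch with synthetic numbers. Treating frames as flat vectors, raw pixel distance confuses "same state, different video" with "genuinely different state," while an embedding that is invariant to brightness and contrast — a crude stand-in for the learned, self-supervised embedding the paper uses — puts the two copies of the same state right next to each other. The exaggerated brightness shift below is purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def embed(frame):
    """Toy domain-invariant embedding: remove per-frame brightness (mean)
    and contrast (scale). The real system learns invariance instead."""
    f = frame - frame.mean()
    return f / (np.linalg.norm(f) + 1e-8)

state_a = rng.random(64)                  # a game state from one YouTube video
state_a_shifted = state_a * 0.8 + 2.0     # same state, heavy color/brightness shift
state_b = rng.random(64)                  # a genuinely different game state

# In raw pixel space, the shifted copy looks *farther* from the original
# than a completely different state does.
pixel_gap = np.linalg.norm(state_a - state_a_shifted)
pixel_diff = np.linalg.norm(state_a - state_b)

# In the invariant embedding, the two copies of the same state collapse
# together, while different states stay apart.
emb_gap = np.linalg.norm(embed(state_a) - embed(state_a_shifted))
emb_diff = np.linalg.norm(embed(state_a) - embed(state_b))
```

This is exactly the property the video points at: "the same game through different windows" should land at the same point in the latent space.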

### [6:55](https://www.youtube.com/watch?v=jjfDO2pWpys&t=415s) Conclusion

So there you go, these new AI agents can look at how we perform complex exploration games, and learn so well from us that in the end, they do even better than we do. And now, to get them to write some amazing papers for us, or, you know, Two Minute Papers episodes.
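In broad strokes, the way the paper turns a demonstration video into a learning signal is by placing checkpoints along the demo's embedded trajectory and paying the agent a small reward each time its own embedded observation reaches the next one in order. The sketch below is a toy version of that idea, with 2-D "embeddings" and a made-up distance threshold.

```python
import numpy as np

def checkpoint_reward(agent_embedding, checkpoints, next_idx, threshold=0.5):
    """Sparse imitation reward: pay out when the agent's embedded observation
    comes close to the next unvisited checkpoint along the demo trajectory."""
    if next_idx >= len(checkpoints):
        return 0.0, next_idx                  # demo fully traversed
    dist = np.linalg.norm(agent_embedding - checkpoints[next_idx])
    if dist < threshold:
        return 1.0, next_idx + 1              # checkpoint reached: reward, advance
    return 0.0, next_idx

# Checkpoints: embeddings of periodically sampled frames of one demonstration.
demo = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([1.0, 1.0])]

idx, total = 0, 0.0
trajectory = [np.array([0.1, 0.0]),   # near checkpoint 0 -> reward
              np.array([0.5, 0.5]),   # near nothing     -> no reward
              np.array([0.9, 0.1])]   # near checkpoint 1 -> reward
for obs in trajectory:
    r, idx = checkpoint_reward(obs, demo, idx)
    total += r
```

With this shaped reward in place, an ordinary reinforcement learner can be trained even in games that give no score of their own, which is what lets the agent clear levels of Montezuma's Revenge.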

### [7:15](https://www.youtube.com/watch?v=jjfDO2pWpys&t=435s) Sponsor

What a time to be alive! This episode has been supported by Weights & Biases. In this post, they show you how to use their tool to check and visualize what your neural network is learning, and even more importantly, a case study on how to find bugs in your system and fix them. Weights & Biases provides tools to track your experiments in your deep learning projects. Their system is designed to save you a ton of time and money, and it is actively used in projects at prestigious labs such as OpenAI, Toyota Research, GitHub, and more. And the best part is that Weights & Biases is free for all individuals, academics, and open source projects. It really is as good as it gets. Make sure to visit them through wandb.com/papers, or just click the link in the video description, and you can get a free demo today. Our thanks to Weights & Biases for their long-standing support, and for helping us make better videos for you. Thanks for watching and for your generous support, and I'll see you next time!

---
*Source: https://ekstraktznaniy.ru/video/13950*