❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambdalabs.com/papers
📝 The paper "Agent57: Outperforming the Atari Human Benchmark" is available here:
https://deepmind.com/blog/article/Agent57-Outperforming-the-human-Atari-benchmark
https://arxiv.org/abs/2003.13350
❤️ Watch these videos in early access on our Patreon page or join us here on YouTube:
- https://www.patreon.com/TwoMinutePapers
- https://www.youtube.com/channel/UCbfYPyITQ-7l4upoX8nvctg/join
Apologies and special thanks to Owen Skarpness!
🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Alex Haro, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bruno Mikuš, Bryan Learn, Christian Ahlin, Daniel Hasegan, Eric Haddad, Eric Martel, Javier Bustamante, Lorin Atzberger, Lukas Biewald, Marcin Dukaczewski, Michael Albrecht, Nader S., Owen Campbell-Moore, Owen Skarpness, Rob Rowe, Robin Graham, Steef, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh
More info if you would like to appear here: https://www.patreon.com/TwoMinutePapers
Meet and discuss your ideas with other Fellow Scholars on the Two Minute Papers Discord: https://discordapp.com/invite/hbcTJu2
Károly Zsolnai-Fehér's links:
Instagram: https://www.instagram.com/twominutepapers/
Twitter: https://twitter.com/twominutepapers
Web: https://cg.tuwien.ac.at/~zsolnai/
#Agent57 #DeepMind
Table of contents (2 segments)
Segment 1 (00:00 - 05:00)
Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Between 2013 and 2015, DeepMind worked on an incredible learning algorithm by the name of Deep Reinforcement Learning. This technique looked at the pixels of the game and was given a controller, and it played much like a human would… with the exception that it learned to play some Atari games at a superhuman level. I tried training it a few years ago and would like to invite you on a marvelous journey to see what happened. When it starts learning to play an old game, Atari Breakout, at first, the algorithm loses all of its lives without any sign of intelligent action. If we wait a bit, it becomes better at playing the game, roughly matching the skill level of an adept player. But here's the catch: if we wait longer, we get something absolutely spectacular. Over time, it learns to play like a pro, and finds out that the best way to win the game is to dig a tunnel through the bricks and hit them from behind. This technique is a combination of a neural network that processes the visual data we see on the screen, and a reinforcement learner that makes the gameplay-related decisions. This is an amazing algorithm, a true breakthrough in AI research. However, it had its own issues. For instance, it did not do well on Montezuma's Revenge or Pitfall, because these games require more long-term planning. Believe it or not, the solution in a follow-up work was to infuse these agents with a very human-like property… curiosity. That agent was able to do much, much better at these games… and then got addicted to the TV. But that's a different story. Note that this has since been remedied. And believe it or not, as impossible as it may sound, all of this has now been improved significantly. This new work is called Agent57, and it plays better than humans on all 57 Atari games. Absolute insanity. Let's have a look at it in action, and then, in a moment, I'll try to explain how it does what it does.
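The "neural network plus reinforcement learner" loop described above can be sketched in a few lines. To be clear, this is a minimal, hedged illustration, not DeepMind's actual system: the `ToyGame` environment is a made-up stand-in for an Atari emulator, and a tabular Q-learner replaces the deep network, but the observe → act → reward → update loop is the same shape.

```python
import random

# Toy stand-in for an Atari game: a 5-position corridor where moving
# "right" eventually reaches the goal. Not a real emulator -- just
# enough state to drive the learning loop below.
class ToyGame:
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):  # action: 0 = left, 1 = right
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        done = self.pos == 4
        reward = 1.0 if done else 0.0
        return self.pos, reward, done

# Tabular Q-learning: the reinforcement learner that maps states to
# action values. (The deep-RL agents in the video learn these values
# with a neural network reading raw pixels instead of this table.)
def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1):
    q = {(s, a): 0.0 for s in range(5) for a in (0, 1)}
    env = ToyGame()
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy: mostly exploit the best-known action,
            # occasionally explore a random one.
            if random.random() < epsilon:
                a = random.choice((0, 1))
            else:
                a = max((0, 1), key=lambda act: q[(s, act)])
            s2, r, done = env.step(a)
            # Temporal-difference update toward reward + discounted future value.
            best_next = max(q[(s2, 0)], q[(s2, 1)])
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s2
    return q

random.seed(0)
q = train()
# After training, "right" should have the higher value in every
# non-terminal state on the way to the goal.
print(all(q[(s, 1)] > q[(s, 0)] for s in range(4)))
```

Swapping the table for a convolutional network over game frames is, at a very high level, what turned this classic loop into the Deep Reinforcement Learning agent discussed here.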
You see Agent57 doing really well at the Solaris game here. This space battle game is one of the most impressive games on the Atari, as it contains 16 quadrants, 48 sectors, space battles, warp mechanics, pirate ships, fuel management, and much more, you name it. This game is not only quite complex, but it is also a credit assignment nightmare for an AI to play. The credit assignment problem means that we may choose an action and only win or lose hundreds of actions later, leaving us with no idea which of our actions led to that win or loss, thus making it difficult to learn from them. Let me try to bring this point to life by talking about school. In school, when we take an exam, we hand it in, and the teacher gives us feedback on every single one of our solutions and tells us whether we were correct or not. We know exactly where we did well and what we need to practice to do better next time. Clear, simple, easy. Solaris, on the other hand, not so much! If this were a school project, the Solaris game would be a brutal, merciless teacher. Would you like to know your grade? No grades, but he tells you that you failed. Well, that's weird, okay. Where did we fail? He won't say. What should we do better next time to improve? You'll figure it out, bucko! Also, we wrote this exam 10 weeks ago, so why do we only get to know the results now? No answer. I think we can conclude that this would be a challenging learning environment even for a motivated human, so just imagine how hard it is for an AI! Hopefully this puts into perspective how incredible it is that Agent57 performs well on this game. It truly looks like science fiction. To deal with this, Agent57 was given something called a meta-controller that can decide when to prioritize short- and long-term planning.
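The brutal, merciless teacher can be made concrete with a tiny sketch. The discounted return is one standard way reinforcement learners spread credit from a delayed reward back over earlier actions; the 200-step episode below is an invented stand-in for a Solaris match, not data from the paper.

```python
# Credit assignment in miniature: a 200-step episode where every reward
# is zero until the final step. Which of the 200 actions deserves credit?
# The discounted return flows the final reward backwards, shrinking it by
# a factor gamma per step, so earlier actions receive exponentially less.
def discounted_returns(rewards, gamma=0.99):
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# A "Solaris-like" episode: 199 steps of silence, then one win signal.
rewards = [0.0] * 199 + [1.0]
returns = discounted_returns(rewards)

print(round(returns[199], 3))  # → 1.0   (the final action gets full credit)
print(round(returns[100], 3))  # → 0.37  (0.99**99, 99 steps before the reward)
print(round(returns[0], 3))    # → 0.135 (0.99**199, the very first action)
```

With rewards this sparse and delayed, the learning signal for early actions is faint, which is part of why games like Solaris, Montezuma's Revenge, and Pitfall resisted earlier agents.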
In the short term, we typically have mechanical challenges, like avoiding a skull in Montezuma's Revenge or dodging the shots of an enemy ship in Solaris. The long-term part is also necessary to explore new parts of the game and have a good strategic plan to eventually win. This is great, because this new technique can now deal with the brutal and merciless teacher whom we just introduced. Alternatively, this agent can be thought of as someone who has the motivation to explore the
Segment 2 (05:00 - 07:00)
game and do well at mechanical tasks at the same time, and can also prioritize these tasks. With this, for the first time, scientists at DeepMind have found a learning algorithm that exceeds human performance on all 57 Atari games. And please, do not forget that DeepMind is trying to solve general intelligence, and then use general intelligence to solve everything else. This is their holy grail. In other words, they are seeking an algorithm that can learn by itself and achieve human-like performance on a wide variety of tasks. There is still plenty to do, but we are now one step closer to that. If you learn only one thing from this video, let it be the fact that there are not 57 different methods, but one general learning algorithm that plays 57 games better than humans. What a time to be alive! I would like to show you a short message from a few days ago that melted my heart. It came from Nathan, who has been inspired by these incredible works and decided to turn his life around and go back to study more. I love my job, and reading messages like this is one of the absolute best parts of it. Congratulations, Nathan! Note that you can take this inspiration with you, and greatness can materialize in every aspect of life, not only in computer graphics or machine learning research. Good luck! If you're a researcher or a startup looking for cheap GPU compute to run these algorithms, check out Lambda GPU Cloud. I've talked about Lambda's GPU workstations in other videos and am happy to tell you that they're offering GPU cloud services as well. The Lambda GPU Cloud can train ImageNet to 93% accuracy for less than $19! Lambda's web-based IDE lets you easily access your instance right in your browser. And finally, hold on to your papers, because the Lambda GPU Cloud costs less than half as much as AWS and Azure. Make sure to go to lambdalabs.com/papers and sign up for one of their amazing GPU instances today. Our thanks to Lambda for helping us make better videos for you.
Thanks for watching and for your generous support, and I'll see you next time!