Reinforcement Learning With Noise (OpenAI) | Two Minute Papers #225
3:48

Two Minute Papers · 04.02.2018 · 24,524 views · 852 likes

Video description
The paper "Better Exploration with Parameter Noise" and its source code are available here:
https://arxiv.org/abs/1706.01905
https://github.com/openai/baselines

The write-up and our Patreon page with the details:
https://www.patreon.com/posts/technical-for-16738692
https://www.patreon.com/TwoMinutePapers

We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Andrew Melnychuk, Brian Gilman, Christian Ahlin, Christoph Jadanowski, Dave Rushton-Smith, Dennis Abts, Emmanuel, Eric Haddad, Esa Turkulainen, Evan Breznyik, Frank Goertzen, Kaben Gabriel Nanlohy, Malek Cellier, Marten Rauschenberg, Michael Albrecht, Michael Jensen, Michael Orenstein, Raul Araújo da Silva, Robin Graham, Shawn Azman, Steef, Steve Messina, Sunil Kim, Torsten Reil.
https://www.patreon.com/TwoMinutePapers

One-time payment links are available below. Thank you very much for your generous support!
PayPal: https://www.paypal.me/TwoMinutePapers
Bitcoin: 13hhmJnLEzwXgmgJN7RB6bWVdT7WkrFAHh
Ethereum: 0x002BB163DfE89B7aD0712846F1a1E53ba6136b5A
LTC: LM8AUh5bGcNgzq6HaV1jeaJrFvmKxxgiXg

Music: Antarctica by Audionautix is licensed under a Creative Commons Attribution license (https://creativecommons.org/licenses/by/4.0/)
Artist: http://audionautix.com/

Thumbnail background image credit: https://pixabay.com/photo-2560006/
Splash screen/thumbnail design: Felícia Fehér - http://felicia.hu

Károly Zsolnai-Fehér's links:
Facebook: https://www.facebook.com/TwoMinutePapers/
Twitter: https://twitter.com/karoly_zsolnai
Web: https://cg.tuwien.ac.at/~zsolnai/

Table of contents (1 segment)

Segment 1 (00:00 - 03:00)

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. This work is about improving reinforcement learning. Reinforcement learning is a learning algorithm that we can use to choose a set of actions in an environment to maximize a score. Classical example applications include helicopter control, where the score to be maximized would be proportional to the distance traveled safely, or any computer game of your choice, where a score can describe how well we are doing. For instance, in Frostbite, our score describes how many jumps we have survived without dying, and this score is subject to maximization. Earlier, scientists at DeepMind combined a reinforcement learner with a deep neural network so the algorithm could look at the screen and play the game much like a human player would.

This problem is especially difficult when the rewards are sparse. This is similar to what a confused student would experience after a written exam when only one grade is given, but the results for the individual problems are not shown. It is quite hard to know where we did well and where we missed the mark, and it is much more challenging to choose the appropriate topics to study to do better next time.

When starting out, the learner explores the environment and performs crazy, seemingly nonsensical actions until it finds a few scenarios where it is able to do well. This can be thought of as adding noise to the actions of the agent. Scientists at OpenAI proposed an approach where they add noise not directly to the actions, but to the parameters of the agent, which results in perturbations that depend on the information that the agent senses. This leads to less flailing and a more systematic exploration that substantially decreases the time taken to learn tasks with sparse rewards. For instance, it makes a profound difference if we use it in the walker game.
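The difference between action-space noise and parameter-space noise described above can be sketched in a few lines of NumPy. This is a hedged illustration, not the paper's or the baselines repo's actual implementation: the tiny linear policy, the dimensions, and the noise scale `sigma` are all invented for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy linear policy: action = W @ observation (illustrative only;
# the paper uses deep neural network policies).
obs_dim, act_dim = 4, 2
W = rng.normal(size=(act_dim, obs_dim))

def action_space_noise(obs, sigma=0.1):
    # Classic exploration: compute the policy's action, then add
    # i.i.d. noise directly to it. The perturbation ignores the state.
    return W @ obs + rng.normal(scale=sigma, size=act_dim)

def parameter_space_noise(obs, sigma=0.1):
    # The approach discussed here: perturb the policy's parameters
    # instead. The resulting action perturbation now depends on what
    # the agent observes, giving more systematic exploration.
    W_noisy = W + rng.normal(scale=sigma, size=W.shape)
    return W_noisy @ obs

obs = rng.normal(size=obs_dim)
a_action_noise = action_space_noise(obs)
a_param_noise = parameter_space_noise(obs)
```

In practice the parameter perturbation is typically sampled once and held fixed for a whole episode, so the agent's exploratory behavior stays consistent rather than jittering at every step.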
As you can see here, the algorithm with the parameter space noise is able to learn the concept of galloping, while the traditional method does, well, I am not sure what it is doing to be honest, but it is significantly less efficient.

The solution does not come without challenges. For instance, different layers respond differently to this added noise, and the effect of the noise on the outputs grows over time, which requires changing the amount of noise to be added depending on its expected effect on the output. This technique is called adaptive noise scaling. There are plenty of comparisons and other cool details in the paper, make sure to have a look, it is available in the video description.

DeepMind's deep reinforcement learning was published in 2015 with some breathtaking results and superhuman play on a number of different games, and it has already been improved leaps and bounds beyond its initial version. And we are talking about OpenAI, so of course, the source code of this project is available under the permissive MIT license.

In the meantime, we have recently been able to upgrade the entirety of our sound recording pipeline through your support on Patreon. I have been yearning for this for a long, long time now, and not only that, but we could also extend our software pipeline with sound processing units that use AI and work like magic. Quite fitting for the series, right? Next up is a recording room or recording corner with acoustic treatment, depending on our budget. Again, thank you for your support, it makes a huge difference. A more detailed write-up on this is available in the video description, have a look. Thanks for watching and for your generous support, and I'll see you next time!
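The adaptive noise scaling idea mentioned above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's exact rule: the threshold `target`, the multiplier `alpha`, and the function name `adapt_sigma` are invented here; the core idea, measuring the noise's effect in action space and nudging the parameter noise scale up or down accordingly, follows the technique described in the video.

```python
import numpy as np

def adapt_sigma(sigma, perturbed_actions, clean_actions, target=0.2, alpha=1.01):
    # Measure how far the perturbed policy's actions drift from the
    # unperturbed policy's actions on the same observations.
    distance = np.sqrt(np.mean((perturbed_actions - clean_actions) ** 2))
    # If the noise affects the output more than desired, shrink sigma;
    # if it affects it less, grow sigma. This keeps the *effect* of the
    # parameter noise roughly constant even as the network changes.
    if distance > target:
        return sigma / alpha
    return sigma * alpha
```

Because the adjustment is driven by the measured output distance rather than a fixed schedule, the same rule copes with layers that respond differently to noise and with the noise's effect growing during training.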
