OpenAI Safety Gym: A Safe Place For AIs To Learn 💪

Two Minute Papers · 20.12.2019 · 4:14 · 124,573 views · 3,734 likes

Video description
❤️ Check out Linode here and get $20 free credit on your account: https://www.linode.com/papers

📝 The paper "Benchmarking Safe Exploration in Deep Reinforcement Learning" is available here: https://openai.com/blog/safety-gym/

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Alex Haro, Anastasia Marchenkova, Andrew Melnychuk, Angelos Evripiotis, Anthony Vdovitchenko, Benji Rabhan, Brian Gilman, Bryan Learn, Christian Ahlin, Claudio Fernandes, Daniel Hasegan, Dan Kennedy, Dennis Abts, Eric Haddad, Eric Martel, Evan Breznyik, Geronimo Moralez, James Watt, Javier Bustamante, John De Witt, Kaiesh Vohra, Kasia Hayden, Kjartan Olason, Levente Szabo, Lorin Atzberger, Lukas Biewald, Marcin Dukaczewski, Marten Rauschenberg, Maurits van Mastrigt, Michael Albrecht, Michael Jensen, Nader Shakerin, Owen Campbell-Moore, Owen Skarpness, Raul Araújo da Silva, Rob Rowe, Robin Graham, Ryan Monsurate, Shawn Azman, Steef, Steve Messina, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh. https://www.patreon.com/TwoMinutePapers

Splash screen/thumbnail design: Felícia Fehér - http://felicia.hu

Károly Zsolnai-Fehér's links:
Instagram: https://www.instagram.com/twominutepapers/
Twitter: https://twitter.com/karoly_zsolnai
Web: https://cg.tuwien.ac.at/~zsolnai/

Table of contents (1 segment)

Segment 1 (00:00 - 04:00)

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. Reinforcement learning is a machine learning technique for learning how to navigate a labyrinth, play a video game, or teach a digital creature to walk. Usually, we are interested in a series of actions that are, in some sense, optimal in a given environment. Despite the fact that many enormous tomes exist to discuss the mathematical details, the intuition behind the algorithm itself is remarkably simple: choose an action, and if you get rewarded for it, try to find out which series of actions led to that reward and keep doing it. If the rewards are not coming, try something else. The reward can be, for instance, our score in a computer game, or how far our digital creature could walk.

Approximately 300 episodes ago, OpenAI published one of their first major works by the name Gym, where anyone could submit their solutions and compete against each other on the same games. It was like Disney World for reinforcement learning researchers. A moment ago, I noted that in reinforcement learning, if the rewards are not coming, we have to try something else. Hmm... is that so? Because there are cases where trying crazy new actions is downright dangerous. For instance, imagine that during the training of this robot arm, it would initially try random actions and start flailing about, where it may damage itself, some other equipment, or, even worse, humans may come to harm. Here you see an amusing example of DeepMind's reinforcement learning agent from 2017 that liked to engage in similar flailing activities.

So, what could be a possible solution for this? Well, have a look at this new work from OpenAI by the name Safety Gym. In this paper, they introduce what they call the constrained reinforcement learning formulation, in which agents can be discouraged from performing actions that are deemed potentially dangerous in an environment. You can see an example here where the AI has to navigate through these environments and achieve a task, such as reaching the green goal signs, pushing buttons, or moving a box to a prescribed position. The constrained part comes in whenever some sort of safety violation happens, which, in these environments, means colliding with the boxes or entering the blue regions. Each of these events is highlighted with a red sphere, and a good learning algorithm should be instructed to avoid them.
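For readers who want to poke at this themselves, here is a minimal sketch of what interacting with one of these environments looks like. It assumes the open-source `safety_gym` package and the Gym API of that era (where `step` returns four values); the key point is that the safety signal arrives as a cost in the `info` dictionary, kept separate from the reward:

```python
import gym
import safety_gym  # registers the Safexp-* environments with Gym

# One of the benchmark tasks: a point robot navigating to goals
# while trying to avoid hazards along the way.
env = gym.make('Safexp-PointGoal1-v0')

obs = env.reset()
done = False
total_reward, total_cost = 0.0, 0.0

while not done:
    action = env.action_space.sample()         # random policy, for illustration only
    obs, reward, done, info = env.step(action)
    total_reward += reward                     # task progress
    total_cost += info.get('cost', 0.0)        # safety violations, reported separately

print(f"episode return: {total_reward:.2f}, episode cost: {total_cost:.2f}")
```

Keeping the cost separate from the reward is what makes the benchmark interesting: instead of hand-tuning a penalty into the reward, an algorithm is asked to maximize reward subject to a limit on the accumulated cost.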
The goal of this project is that, in the future, reinforcement learning algorithms are measured not only on their efficiency, but on their safety scores as well. This way, a self-driving AI would be incentivized not only to reach the finish line, but also to respect our safety standards along the journey, rather than driving recklessly. And while, clearly, self-driving cars may be achieved with other kinds of algorithms, many of which have been in the works for years, there are many additional applications for this work: for instance, the paper discusses the case of incentivizing recommender systems not to show psychologically harmful content to their users, or making sure that a medical question answering system does not mislead us with false information.

This episode has been supported by Linode. Linode is the world's largest independent cloud computing provider. They offer you virtual servers that make it easy and affordable to host your own app, site, project, or anything else in the cloud. Whether you're a Linux expert or just starting to tinker with your own code, Linode will be useful for you. A few episodes ago, we played with an implementation of OpenAI's GPT-2 where our excited viewers accidentally overloaded the system. With Linode's load balancing technology, and instances ranging from shared Nanodes all the way up to dedicated GPUs, you don't have to worry about your project being overloaded. To get $20 of free credit, make sure to head over to linode.com/papers and sign up today using the promo code "papers20". Our thanks to Linode for supporting the series and helping us make better videos for you.

Thanks for watching and for your generous support, and I'll see you next time!
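As a closing note for practitioners: the baselines in the paper include Lagrangian variants of standard algorithms, which turn the cost limit into an adaptive penalty. The toy example below is not the paper's code, just a sketch of that idea on a made-up two-armed bandit, where the risky arm pays more reward but incurs cost:

```python
import random

# Toy constrained bandit, with made-up numbers: arm 0 pays reward 1.0
# but incurs cost 1.0 (a "safety violation"); arm 1 pays reward 0.6 at
# zero cost. Plain reward maximization would always pick arm 0; the
# Lagrangian penalty steers the agent toward arm 1 once the cost
# constraint (expected cost <= 0.1 per step) starts to bind.
REWARD = [1.0, 0.6]
COST = [1.0, 0.0]
COST_LIMIT = 0.1

q = [0.0, 0.0]                # value estimates of the *penalized* return
lam = 0.0                     # Lagrange multiplier on the cost constraint
alpha, lam_lr, eps = 0.1, 0.05, 0.1

for step in range(5000):
    # Epsilon-greedy choice on the penalized values.
    if random.random() < eps:
        a = random.randrange(2)
    else:
        a = max(range(2), key=lambda i: q[i])
    r, c = REWARD[a], COST[a]

    # Learn on reward minus the lambda-weighted cost.
    q[a] += alpha * ((r - lam * c) - q[a])

    # Dual ascent: raise lambda while the incurred cost exceeds the
    # limit, relax it (down to zero) while the agent stays within it.
    lam = max(0.0, lam + lam_lr * (c - COST_LIMIT))

print(f"lambda = {lam:.2f}, penalized values = {[round(v, 2) for v in q]}")
```

The multiplier rises while the constraint is violated and relaxes once it is satisfied, so the penalty settles near the smallest value that makes the safe behavior competitive.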
