# DeepMind Control Suite | Two Minute Papers #226

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=WhaRsrlaXLk
- **Date:** 07.02.2018
- **Duration:** 3:10
- **Views:** 20,465
- **Source:** https://ekstraktznaniy.ru/video/14517

## Description

The paper "DeepMind Control Suite" and its source code is available here:
https://arxiv.org/pdf/1801.00690v1.pdf
https://github.com/deepmind/dm_control

We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Andrew Melnychuk, Brian Gilman, Christian Ahlin, Christoph Jadanowski, Dave Rushton-Smith, Dennis Abts, Emmanuel, Eric Haddad, Esa Turkulainen, Evan Breznyik, Frank Goertzen, Kaben Gabriel Nanlohy, Malek Cellier, Marten Rauschenberg, Michael Albrecht, Michael Jensen, Michael Orenstein, Raul Araújo da Silva, Robin Graham, Shawn Azman, Steef, Steve Messina, Sunil Kim, Torsten Reil.
https://www.patreon.com/TwoMinutePapers

One-time payment links are available below. Thank you very much for your generous support!
PayPal: https://www.paypal.me/TwoMinutePapers
Bitcoin: 13hhmJnLEzwXgmgJN7RB6bWVdT7WkrFAHh
Ethereum: 0x002BB163DfE89B7aD0712846F1a1E53ba6136b5A
LTC: LM8AUh5bGcNgzq6HaV1jeaJrFvmKxxgiXg

Music: Antarctica by Audionautix is licensed under a Creative Commons license.

## Transcript

### Intro [0:00]

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. This footage that you see here came freshly from Google DeepMind's lab, and is about benchmarking

### pendulum: swingup [0:06]

reinforcement learning algorithms. Here, you see the classical cartpole swing-up task from this package. As the algorithm starts to play, a score is recorded that indicates how well it is doing, and the learner has to choose the appropriate actions depending on the state of the environment

### reacher: hard [0:27]

to maximize this score. Reinforcement learning is an established research subfield within machine learning with hundreds

### swimmer: 6 links [0:32]

of papers appearing every year. However, we see that most of them cherry-pick a few problems and test against previous works on this very particular selection of tasks. This paper describes a package that is not about the algorithm itself, but about helping

### fish: swim [0:46]

future research projects to be able to test their results against previous works on an

### cheetah: random actions [0:52]

equal footing. This is a great idea, which has been addressed earlier by OpenAI with their learning environment

### hopper: random actions [1:00]

by the name Gym. So the first question is, why do we need a new one? The DeepMind Control Suite provides a few differentiating features.

### hopper: hop [1:06]

One, Gym contains both discrete and continuous tasks, whereas this one focuses on continuous

### walker: random actions [1:12]

problems only. This means that state, time, and action are all continuous, which is usually the hallmark

### walker: run [1:22]

of more challenging and life-like problems. For an algorithm to do well, it has to be able to learn the concept of velocity, acceleration

### humanoid: random actions [1:26]

and other meaningful physical concepts and understand their evolution over time.
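The loop described here — observe a continuous state, choose a continuous action, collect a per-step score — can be sketched as a minimal toy. This is an illustrative stand-in, not the actual dm_control API: the environment (`ToyCart`), its dynamics, and the random policy are all invented for the example.

```python
import random

class ToyCart:
    """Invented 1-D continuous environment: keep the cart near the origin."""

    def __init__(self):
        self.position = 1.0   # continuous state
        self.velocity = 0.0

    def step(self, action):
        # A continuous action nudges the velocity; simple Euler integration,
        # so doing well requires reasoning about velocity over time.
        self.velocity += 0.1 * action
        self.position += 0.1 * self.velocity
        # Per-step reward: closer to the origin scores higher (max 1.0).
        return 1.0 / (1.0 + self.position ** 2)

env = ToyCart()
random.seed(0)
score = 0.0
for _ in range(100):
    action = random.uniform(-1.0, 1.0)  # a "learner" that acts at random
    score += env.step(action)
print(round(score, 2))  # total episode score for the random policy
```

A real learner would replace the random draw with a policy conditioned on the state; the score accumulated over the episode is what the benchmark records.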

### humanoid: stand [1:33]

Two, there are domains where the new control suite is a superset of Gym, meaning that it

### humanoid: walk [1:39]

offers equivalent tasks, and then some more. And three, the action and reward structures are standardized. This means that the results and learning curves are much more informative and easier to read. This is crucial because research scientists read hundreds of papers every year, and this means that they don't necessarily have to look at videos; they immediately have an intuition of how an algorithm works and how it relates to previous techniques just by looking at the learning curve plots. Many tasks also include a much more challenging variant with sparser rewards. We discussed these sparse rewards in a bit more detail in the previous episode; if you are interested, make sure to click the card on the lower right at the end of this

### cart-pole: swingup sparse [2:25]

video. The paper also contains an exciting roadmap for future development, including quadruped

### acrobot: sparse [2:30]

locomotion, multithreaded dynamics, and more. Of course, the whole suite is available free of charge for everyone.
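One consequence of the standardized reward structure mentioned earlier: the paper bounds every per-step reward to the unit interval, so an episode of n steps has a return in [0, n], and learning curves from different tasks share a common scale. A small sketch of this idea, with made-up reward sequences and a hypothetical helper function:

```python
def normalized_return(rewards, episode_length):
    """Hypothetical helper: fraction of the maximum possible return achieved."""
    assert all(0.0 <= r <= 1.0 for r in rewards), "suite rewards are in [0, 1]"
    return sum(rewards) / episode_length

# Two made-up episodes from different tasks, directly comparable
# because both returns live on the same 0-to-1 scale:
print(round(normalized_return([0.2, 0.5, 0.9, 1.0], 4), 2))  # → 0.65
print(round(normalized_return([0.5] * 10, 10), 2))           # → 0.5
```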

### CMU Motion Capture (subject 5, clip 20) [2:35]

The link is available in the description. Super excited to see a deluge of upcoming AI papers and see how they beat the living hell out of each other in 2018. Thanks for watching and for your generous support, and I'll see you next time!
