# How To Train Your Virtual Dragon

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=Wxb0jN0X7cs
- **Date:** 2019-05-04
- **Duration:** 4:16
- **Views:** 44,553

## Description

Patreon: https://www.patreon.com/TwoMinutePapers

₿ Crypto and PayPal links are available below. Thank you very much for your generous support!
› PayPal: https://www.paypal.me/TwoMinutePapers
› Bitcoin: 1a5ttKiVQiDcr9j8JT2DoHGzLG7XTJccX
› Ethereum: 0xbBD767C0e14be1886c6610bf3F592A91D866d380
› LTC: LM8AUh5bGcNgzq6HaV1jeaJrFvmKxxgiXg

📝 The paper "Aerobatics Control of Flying Creatures via Self-Regulated Learning" is available here:
http://mrl.snu.ac.kr/research/ProjectAerobatics/Aerobatics.htm

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
313V, Alex Haro, Andrew Melnychuk, Angelos Evripiotis, Anthony Vdovitchenko, Brian Gilman, Bruno Brito, Bryan Learn, Christian Ahlin, Christoph Jadanowski, Claudio Fernandes, Dennis Abts, Eric Haddad, Eric Martel, Evan Breznyik, Geronimo Moralez, James Watt, Javier Bustamante, John De Witt, Kaiesh Vohra, Kasia Hayden, Kjartan Olason, Levente Szabo, Lorin Atzberger, Marcin Dukaczewski, Marten Rauschenberg, Maurits van Mastrigt, Michael Albrecht, Michael Jensen, Nader Shakerin, Owen Campbell-Moore, Owen Skarpness, Raul Araújo da Silva, Richard Reis, Rob Rowe, Robin Graham, Ryan Monsurate, Shawn Azman, Steef, Steve Messina, Sunil Kim, Thomas Krcmar, Torsten Reil, Zach Boldyga.
https://www.patreon.com/TwoMinutePapers

Splash screen/thumbnail design: Felícia Fehér - http://felicia.hu

Károly Zsolnai-Fehér's links:
Facebook: https://www.facebook.com/TwoMinutePapers/
Twitter: https://twitter.com/karoly_zsolnai
Web: https://cg.tuwien.ac.at/~zsolnai/

## Contents

### [0:00](https://www.youtube.com/watch?v=Wxb0jN0X7cs) <Untitled Chapter 1>

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. Scientists at Seoul National University in South Korea wrote a great paper on teaching an imaginary dragon all kinds of really cool aerobatic maneuvers, like sharp turning, rapid winding, rolling, soaring, and diving. This is all done with a reinforcement learning variant, where the problem is formulated so that the AI has to continuously choose the character’s actions to maximize a reward. Here, the reward is tied to a trajectory that we can draw in advance; these are the lines that the dragon seems to follow quite well.

However, what you see here is the finished product. Curious to see how the dragon falters as it learns to maneuver properly? Well, we are in luck. Buckle up. You see the ideal trajectory here in black, and initially, the dragon was too clumsy to navigate in a way that even resembles this path. Later, it learned to start the first turn properly, but as you see here, it was unable to avoid the obstacle and likely needs to fly to the emergency room. But it would probably miss that building too, of course. After more learning, it was able to finish the first loop, but was still too inaccurate to perform the second. And finally, it became adept at performing this difficult maneuver. Applause!

One of the main difficulties of this problem is that the dragon is always in motion and has a lot of momentum, so anything we do has an effect later on. We not only have to find one good action, but whole sequences of actions that will lead us to

### [1:45](https://www.youtube.com/watch?v=Wxb0jN0X7cs&t=105s) Generalization

victory. This is quite difficult. So how do we do that? To accomplish this, this work not only uses a reinforcement learning variant, but also adds something called self-regulated learning to it: instead of presenting the AI with a fixed curriculum, we put the learner in charge of its own learning. This also means that it is able to take a big, complex goal and subdivide it into new, smaller goals. In this case, the big goal is following the trajectory with some additional constraints

### [2:18](https://www.youtube.com/watch?v=Wxb0jN0X7cs&t=138s) Intermediate Level: Ribbon

which, by itself, turned out to be too difficult to learn with traditional techniques. Instead, the agent realizes that if it tracks its own progress on a set of separate but smaller subgoals, such as tracking its own orientation, position, and rotation against

### [2:40](https://www.youtube.com/watch?v=Wxb0jN0X7cs&t=160s) Intermediate Level: XY-turn

the desired target states separately, it can finally learn to perform these amazing stunts. That sounds great, but how exactly is this done? Through a series of three steps. Step one is generation, where the learner creates a few alternative solutions for itself. Step two is evaluation, where it judges these individual alternatives and finds the best ones.
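The video does not say how the paper represents the dragon's state or how it measures each subgoal, so the following is only a minimal sketch under made-up assumptions: the `FlightState` fields, the exponential error-to-reward mapping, the per-subgoal `scales`, and the `ProgressTracker` smoothing are all hypothetical. It illustrates the idea of scoring orientation, position, and rotation against a target state separately, so the learner can see which subgoal it is currently worst at.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class FlightState:
    # Hypothetical state layout, not taken from the paper.
    position: Vec3     # x, y, z in meters
    orientation: Vec3  # roll, pitch, yaw in radians
    rotation: Vec3     # angular velocity in rad/s


def _dist(a: Vec3, b: Vec3) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _angle_dist(a: Vec3, b: Vec3) -> float:
    # Wrap each angle difference into [-pi, pi] so that 0 and 2*pi count as equal.
    return math.sqrt(sum(math.remainder(x - y, 2 * math.pi) ** 2 for x, y in zip(a, b)))


def subgoal_errors(state: FlightState, target: FlightState) -> Dict[str, float]:
    """Measure each subgoal separately against the desired target state."""
    return {
        "position": _dist(state.position, target.position),
        "orientation": _angle_dist(state.orientation, target.orientation),
        "rotation": _dist(state.rotation, target.rotation),
    }


def subgoal_rewards(state: FlightState, target: FlightState,
                    scales: Dict[str, float]) -> Dict[str, float]:
    # Map each error to (0, 1]; 1.0 means that subgoal is met exactly.
    errors = subgoal_errors(state, target)
    return {name: math.exp(-err / scales[name]) for name, err in errors.items()}


class ProgressTracker:
    """Running average of each subgoal's reward, so the learner can spot its weakest one."""

    def __init__(self, names: Iterable[str], smoothing: float = 0.9):
        self.avg = {name: 0.0 for name in names}
        self.smoothing = smoothing

    def update(self, rewards: Dict[str, float]) -> None:
        for name, r in rewards.items():
            self.avg[name] = self.smoothing * self.avg[name] + (1 - self.smoothing) * r

    def weakest(self) -> str:
        return min(self.avg, key=self.avg.get)
```

For example, a dragon that is 5 m off target in position but matches the target orientation (a yaw of 2π counts as 0) and rotation gets a position reward of about exp(-5) with a scale of 1.0 and full reward on the other two subgoals, so `weakest()` points at position. Keeping the subgoals separate, rather than folding them into one number, is what lets the learner tell which part of the maneuver needs work.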

### [3:02](https://www.youtube.com/watch?v=Wxb0jN0X7cs&t=182s) Expert Level: Z-turn

And third, learning, which means looking back and recording whether these judgments indeed put the learner in a better position. By iterating these three steps, this virtual dragon learned to fly properly. Isn’t this amazing? I mentioned earlier that this kind of problem formulation is intractable without self-regulated learning, and you can see here how a previous work fares at following these trajectories. There is indeed a world of difference between the two. So there you go: in case you enter a virtual world where you need to train your own dragon, you’ll know what to do. But just in case, also read the paper in the video description!

If you enjoyed this episode and you wish to watch our other videos in early access, or get your name immortalized in the video description, please consider supporting us on Patreon through Patreon.com/TwoMinutePapers. The link is available in the video description, and this way, we can make better videos for you. We also support cryptocurrencies; the addresses are also available in the video description. Thanks for watching and for your generous support, and I'll see you next time!
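The generation, evaluation, and learning cycle is only described in words here, so what follows is a toy Python sketch of that cycle, not the paper's algorithm. Everything concrete in it is invented for illustration: the one-dimensional `ability`, the hidden `true_reach` of the toy environment, the candidate fractions, and the 1.2 and 0.5 update factors. The learner proposes a few subgoals, picks the most ambitious one it believes it can handle, practices it, and then records whether that judgment actually left it better off, adjusting its self-assessment accordingly.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ToyLearner:
    # All numbers are hypothetical; the video gives no concrete values.
    goal: float = 10.0       # the "big goal", too far to reach in one step
    ability: float = 0.0     # how far the learner can currently get
    est_reach: float = 3.0   # learner's own (initially overconfident) guess of its step size
    history: List[Tuple[float, bool]] = field(default_factory=list)

    def generate(self) -> List[float]:
        """Step 1: propose a few alternative subgoals between ability and goal."""
        return sorted({min(self.goal, self.ability + self.est_reach * f) for f in (0.5, 1.0, 2.0)})

    def evaluate(self, candidates: List[float]) -> float:
        """Step 2: pick the most ambitious candidate the learner believes is feasible."""
        feasible = [c for c in candidates if c - self.ability <= self.est_reach]
        return max(feasible)  # never empty: the 0.5 candidate is always within est_reach

    def learn(self, subgoal: float, true_reach: float) -> bool:
        """Step 3: practice the subgoal, then record whether the judgment paid off."""
        before = self.ability
        if subgoal - self.ability <= true_reach:  # toy environment: only small enough steps succeed
            self.ability = subgoal
        improved = self.ability > before
        # Self-regulation: be bolder after a success, more cautious after a failure.
        self.est_reach *= 1.2 if improved else 0.5
        self.history.append((subgoal, improved))
        return improved


def train(learner: ToyLearner, true_reach: float = 1.5, max_iters: int = 100) -> ToyLearner:
    """Iterate generate -> evaluate -> learn until the big goal is reached."""
    for _ in range(max_iters):
        if learner.ability >= learner.goal:
            break
        subgoal = learner.evaluate(learner.generate())
        learner.learn(subgoal, true_reach)
    return learner
```

Running `train(ToyLearner())` mirrors the pattern in miniature: the first, overconfident subgoal (3.0) fails, the learner becomes more cautious, succeeds at 1.5, and then reaches the full goal through a series of smaller subgoals. The point of the design is that the learner, not a fixed curriculum, decides how big each next step should be.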

---
*Source: https://ekstraktznaniy.ru/video/14320*