# This AI Learns Acrobatics by Watching YouTube

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=ozUzomVQsWc
- **Date:** 22.11.2018
- **Duration:** 3:41
- **Views:** 89,460

## Description

This episode was supported by Insilico Medicine (insilico.com). "Anything outside life extension is a complete waste of time." See their papers:
- Papers: https://www.ncbi.nlm.nih.gov/pubmed/?term=Zhavoronkov%2Ba
- Website: http://insilico.com/

The paper "SFV: Reinforcement Learning of Physical Skills from Videos" is available here:
1. https://xbpeng.github.io/projects/SFV/index.html
2. https://bair.berkeley.edu/blog/2018/10/09/sfv/

Pick up cool perks on our Patreon page:
https://www.patreon.com/TwoMinutePapers

We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
313V, Andrew Melnychuk, Angelos Evripiotis, Anthony Vdovitchenko, Brian Gilman, Christian Ahlin, Christoph Jadanowski, Dennis Abts, Emmanuel, Eric Haddad, Eric Martel, Evan Breznyik, Geronimo Moralez, John De Witt, Kjartan Olason, Lorin Atzberger, Marten Rauschenberg, Maurits van Mastrigt, Michael Albrecht, Michael Jensen, Morten Punnerud Engelstad, Nader Shakerin, Owen Skarpness, Raul Araújo da Silva, Rob Rowe, Robin Graham, Ryan Monsurate, Shawn Azman, Steef, Steve Messina, Sunil Kim, Thomas Krcmar, Torsten Reil, Zach Boldyga.

Splash screen/thumbnail design: Felícia Fehér - http://felicia.hu

Károly Zsolnai-Fehér's links:
Facebook: https://www.facebook.com/TwoMinutePapers/
Twitter: https://twitter.com/karoly_zsolnai
Web: https://cg.tuwien.ac.at/~zsolnai/

## Contents

### [0:00](https://www.youtube.com/watch?v=ozUzomVQsWc) Introduction

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. If we have an animated movie or a computer game where, as in any other digital medium, we wish to have high-quality, lifelike animations for our characters, we likely have to use motion capture. Motion capture means that we put an actor in a studio and ask this person to perform cartwheels and other motions that we wish to transfer to our virtual characters. This works really well, but recording and cleaning all this data is a very expensive and laborious process. As we enter the age of AI, of course, I wonder whether there is a better way to do this. Just think about it... we have no shortage of videos here on YouTube of people performing cartwheels and other moves, and we have a bunch of learning algorithms that can estimate the pose a person is taking in each frame of a video. Surely we can make something happen here, right? Well, yes and no.

### [0:58](https://www.youtube.com/watch?v=ozUzomVQsWc&t=58s) Motion Reconstruction

A few methods already exist to perform this, but all of them have deal-breaking drawbacks. For instance, this previous work predicts the body pose for each frame, but each of these per-frame predictions has small inaccuracies, which produces this annoying flickering effect. Researchers refer to this as a lack of temporal coherence. But this new technique is able to remedy it. Great result! This new work also boasts a long list of other incredible improvements.
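To build intuition for temporal coherence, here is a minimal sketch of one simple way to smooth jittery per-frame pose estimates: a least-squares fit that penalizes frame-to-frame acceleration. This is an illustrative stand-in, not the paper's actual motion reconstruction method; the array shapes, function name, and `weight` parameter are assumptions.

```python
import numpy as np

def smooth_poses(raw_poses, weight=10.0):
    """Temporally smooth noisy per-frame pose estimates.

    raw_poses: (T, D) array, one row of D joint parameters per frame.
    Solves   min_x ||x - y||^2 + weight * ||D2 @ x||^2
    where D2 is the second-order finite-difference operator, so the
    penalty suppresses frame-to-frame acceleration (the "flicker").
    """
    T = raw_poses.shape[0]
    # Second-difference operator, shape (T-2, T): rows of [1, -2, 1].
    D2 = np.zeros((T - 2, T))
    for t in range(T - 2):
        D2[t, t:t + 3] = [1.0, -2.0, 1.0]
    # Normal equations: (I + weight * D2^T D2) x = y, solved per joint.
    A = np.eye(T) + weight * D2.T @ D2
    return np.linalg.solve(A, raw_poses)

# Toy example: a single joint angle following a noisy sine wave.
t = np.linspace(0.0, 2.0 * np.pi, 60)
noisy = np.sin(t)[:, None] + 0.1 * np.random.randn(60, 1)
smoothed = smooth_poses(noisy)  # visibly less flicker, same trajectory
```

Raising `weight` trades fidelity to the per-frame estimates for smoothness, which is exactly the tension any motion reconstruction step has to balance.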

### [1:26](https://www.youtube.com/watch?v=ozUzomVQsWc&t=86s) Adaptive State Initialization

For instance, the resulting motions are also simulated in a virtual environment, and it is shown that they are quite robust - so much so that we can throw a bunch of boxes against

### [1:38](https://www.youtube.com/watch?v=ozUzomVQsWc&t=98s) Robustness

the AI, and it can still adjust. Kind of. These motions can be retargeted to different body shapes, as demonstrated here quite aptly with this neat little nod to Boston Dynamics. It can also adapt to challenging new environments, or, get this, it can even work from a single

### [2:07](https://www.youtube.com/watch?v=ozUzomVQsWc&t=127s) Motion Completion

photo instead of a video by completing the motion seen within. What kind of wizardry is that? How could it possibly do that? It works as follows. First, we take an input photo or video and perform pose estimation on it. But this is still a per-frame computation, and as you remember, that doesn't give us temporal consistency. The motion reconstruction step ensures that we have smooth transitions between the poses. And now comes the best part: we simulate a virtual environment in which a digital character tries to move its body parts to perform these actions. If we do this, we can not only reproduce these motions but also continue them. This is where the wizardry lies. If you read the paper, which you should absolutely do, you will see that it uses OpenAI's amazing

### [2:55](https://www.youtube.com/watch?v=ozUzomVQsWc&t=175s) Environment Retargeting

Proximal Policy Optimization algorithm to find the best motions. Absolutely amazing. So this can perform and complete a variety of motions

### [3:07](https://www.youtube.com/watch?v=ozUzomVQsWc&t=187s) Character Retargeting

adapt to more challenging landscapes, and do all this in a temporally smooth manner. However, the Gangnam Style dance still proves to be too hard.
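For the reinforcement learning stage, the narration names OpenAI's Proximal Policy Optimization (PPO). Its core is the clipped surrogate objective from Schulman et al. (2017); below is a minimal NumPy sketch of that loss, assuming the advantages and per-action log-probabilities have already been computed elsewhere (the function and variable names are illustrative, not taken from the paper's code).

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective from Schulman et al.,
    'Proximal Policy Optimization Algorithms' (2017).

    logp_new:   log pi_theta(a_t | s_t) under the current policy
    logp_old:   log-probs under the policy that collected the data
    advantages: advantage estimates A_t, one per timestep
    Returns the loss to *minimize* (negative clipped objective).
    """
    ratio = np.exp(logp_new - logp_old)  # probability ratio r_t(theta)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Take the pessimistic (smaller) of the two surrogate terms.
    objective = np.minimum(ratio * advantages, clipped * advantages)
    return -objective.mean()
```

In this setting, the reward behind the advantages would measure how closely the simulated character's pose tracks the reconstructed reference motion at each timestep, which is what lets the policy both reproduce and continue the motion.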

### [3:17](https://www.youtube.com/watch?v=ozUzomVQsWc&t=197s) Failure Cases

The technology is not there yet. We also thank Insilico Medicine for supporting this video. They work on AI-based drug discovery and aging research, and they have some unbelievable papers on these topics. Make sure to check them out, along with this paper, in the video description. Thanks for watching and for your generous support, and I'll see you next time!

---
*Source: https://ekstraktznaniy.ru/video/14387*