# This Neural Network Makes Virtual Humans Dance! 🕺

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=mb6WJ34xQXg
- **Date:** 02.02.2021
- **Duration:** 6:52
- **Views:** 89,563
- **Source:** https://ekstraktznaniy.ru/video/13986

## Description

❤️ Check out Weights & Biases and sign up for a free demo here: https://www.wandb.com/papers 
❤️ Their mentioned post is available here: https://wandb.ai/wandb/in-between/reports/-Overview-Robust-Motion-In-betweening---Vmlldzo0MzkzMzA

📝 The paper "Robust Motion In-betweening" is available here:
- https://static-wordpress.akamaized.net/montreal.ubisoft.com/wp-content/uploads/2020/07/09155337/RobustMotionInbetweening.pdf
- https://montreal.ubisoft.com/en/automatic-in-betweening-for-faster-animation-authoring/

Dataset: https://github.com/XefPatterson/Ubisoft-LaForge-Animation-Dataset

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Aleksandr Mashrabov, Alex Haro, Alex Serban, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bruno Mikuš, Bryan Learn, Christian Ahlin, Eric Haddad, Eric Lau, Eric Martel, Gordon Child, Haris Husic, Jace O'Brien, Javier Bustamante, Joshua Goller, Kenneth Davis, Lorin Atzberger, Lukas Biewald, Matth

## Transcript

### Segment 1 (00:00 - 05:00)

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Most people think that if we have a piece of camera footage that is a little choppy, then there is nothing we can do with it, and we had better throw it away. Is that true? No, not at all! Earlier, we discussed two potential techniques to remedy this common problem. The problem statement is simple: in goes a choppy video, something happens, and then, out comes a smooth and creamy video. This process is often referred to as frame interpolation, or frame in-betweening. And of course, it’s easier said than done. If it works well, it really looks like magic - much like in the science fiction movies.

So what are the potential somethings that we can use to make this happen? One, optical flow. This is an originally handcrafted method that tries to predict the motion that takes place between these frames. This can, in a sense, produce new information, and I use it in these videos on a regular basis, but the output footage also has to be carefully inspected for unwanted artifacts, which are a relatively common occurrence. A toy version of this flow-based warping is sketched below. Two, we can also try to give a bunch of training data to a neural network and teach it to perform this frame in-betweening. And if we do, the results are magnificent. We can do so much with this!

But wait a second… if we can do this for video frames… here is a crazy idea - how about a similar kind of in-betweening… for animating humanoids? That would really be something else, and it would save us so much time and work! Let’s see what this new method can do in this area!

The value proposition of this technique is as simple as it gets: we set up a bunch of keyframes, these are the transparent figures, and the neural network creates realistic motion that transitions from one stage to the next one. A naive interpolation baseline for this task is also sketched below, for contrast. Look, it really seems to be able to do it all: it can perform twists and turns, brisk walks and runs, and, as you will see in a minute, even dance moves. Hmm… this in-betweening idea for animating humanoid motion may not be so crazy after all! What’s more, this could be super useful for artists working in the industry, who can not only do all this, but also set up movement variations by moving the keyframes around spatially. Or we can even set up temporal variations to create different timings for the movement. Excellent.

Of course, it cannot do everything. If we set up the intermediate stages in a way that uncommon motions would be required to fill in, we might end up with one of these failure cases. And all these results depend on how much training data we have with the kinds of motions we need to fill in. Let’s have a look at a more detailed example! This smooth chap has been given lots of training data with dancing moves, and… look! And when we pull out these dance moves from his training data, he becomes a drunkard.

So, talking about training data: how much motion capture footage was given to this algorithm? It used the Ubisoft LaForge Animation Dataset. This contains 5 subjects, 77 sequences, and about 4.5 hours of footage in total. Wow, that is not that much. For instance, it only has 8 movement sequences for dancing. That is not that much at all. And we’ve already seen that the model can dance. That is some serious data efficiency, especially given that it can even climb through obstacles. So much knowledge has been extracted from so little data. It truly feels like we are living in a science fiction world. What a time to be alive!
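As an aside for the technically curious: the flow-based interpolation mentioned above can be sketched in a few lines. This is a minimal toy version, assuming OpenCV’s Farneback flow and a half-step backward warp; real interpolators also reason about occlusions and blend warps from both frames. The function name and parameter choices here are illustrative, not the pipeline actually used for the channel’s videos.

```python
import cv2
import numpy as np

def midpoint_frame(frame_a, frame_b):
    """Synthesize a rough in-between frame via dense optical flow (toy sketch).

    Estimate the motion from frame_a to frame_b, then backward-warp
    frame_a by half of that motion to approximate the midpoint frame.
    """
    gray_a = cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY)

    # Dense Farneback flow: flow[y, x] is the (dx, dy) motion of pixel (x, y).
    flow = cv2.calcOpticalFlowFarneback(gray_a, gray_b, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)

    h, w = gray_a.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))

    # Sample frame_a half a step back along the flow. Using the flow at the
    # target pixel instead of the true source is a common approximation.
    map_x = (grid_x - 0.5 * flow[..., 0]).astype(np.float32)
    map_y = (grid_y - 0.5 * flow[..., 1]).astype(np.float32)
    return cv2.remap(frame_a, map_x, map_y, cv2.INTER_LINEAR)
```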
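And to make the keyframe in-betweening task concrete, here is the naive baseline that learned methods improve upon: linearly interpolate the root position and spherically interpolate (slerp) the joint rotations between two keyframes. This is a hedged sketch with hypothetical pose structures; the paper’s method instead predicts the transition frames with a recurrent network trained on motion capture, which is what produces lifelike rather than robotic motion.

```python
import numpy as np

def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions q0 and q1."""
    dot = np.dot(q0, q1)
    if dot < 0.0:            # take the short way around the 4-D sphere
        q1, dot = -q1, -dot
    if dot > 0.9995:         # nearly parallel: fall back to normalized lerp
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(np.clip(dot, -1.0, 1.0))
    return (np.sin((1 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)

def naive_inbetween(key_a, key_b, num_frames):
    """Fill the gap between two keyframes by interpolation alone.

    key_a, key_b: hypothetical pose dicts with 'root' (3,) positions and
    'quats' (joints, 4) unit quaternions per joint. Returns num_frames
    in-between poses. The result looks robotic precisely because no motion
    model is involved; that gap is what the neural network fills.
    """
    frames = []
    for i in range(1, num_frames + 1):
        t = i / (num_frames + 1)
        root = (1 - t) * key_a['root'] + t * key_b['root']
        quats = np.array([slerp(qa, qb, t)
                          for qa, qb in zip(key_a['quats'], key_b['quats'])])
        frames.append({'root': root, 'quats': quats})
    return frames
```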
So, when we write a paper like this, how do we compare the results to previous techniques? How can we decide which technique is better? Well, the level 1 solution is a user study. We call some folks in, show them the footage, and ask which one they liked best - the previous method, or this one? That would work, but of course, it is quite laborious. Fortunately, there is a level 2 solution, and it is called the Normalized Power Spectrum Similarity, NPSS in short. This is a number that we can produce with a computer, no humans required, and it measures how believable these motions are. And the key property of NPSS is that it correlates with human judgement; in other words, if it says that a technique is better, then it is likely that humans would come to the same conclusion. So let’s see. Here are the previous methods. NPSS is subject to minimization; in other words, the lower, the better.
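For reference, NPSS originates in earlier motion-prediction work; one common formulation compares the per-feature power spectra of ground-truth and generated motion with an earth mover’s distance, weighted by each feature’s share of spectral power. The sketch below is an illustrative NumPy version under those assumptions, not the paper’s exact evaluation code.

```python
import numpy as np

def npss(gt, pred):
    """Normalized Power Spectrum Similarity (lower is better) - a sketch.

    gt, pred: arrays of shape (frames, features), e.g. flattened joint angles.
    """
    # Squared FFT magnitude spectrum of each feature over time.
    gt_power = np.abs(np.fft.fft(gt, axis=0)) ** 2       # (frames, features)
    pred_power = np.abs(np.fft.fft(pred, axis=0)) ** 2

    # Normalize each feature's spectrum into a probability distribution.
    gt_total = gt_power.sum(axis=0, keepdims=True)
    pred_total = pred_power.sum(axis=0, keepdims=True)
    gt_dist = gt_power / np.maximum(gt_total, 1e-8)
    pred_dist = pred_power / np.maximum(pred_total, 1e-8)

    # 1-D earth mover's distance between spectra equals the L1 distance
    # between their cumulative distributions.
    emd = np.abs(np.cumsum(gt_dist, axis=0)
                 - np.cumsum(pred_dist, axis=0)).sum(axis=0)

    # Weight each feature by its share of the ground-truth spectral power.
    weights = gt_total.flatten() / np.maximum(gt_total.sum(), 1e-8)
    return float((emd * weights).sum())
```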

### Segment 2 (05:00 - 06:00)

And, let’s see the new method… oh yes, it indeed outpaces the competition. So, it is no wonder that this incredible paper was accepted to the SIGGRAPH Asia conference. What does that mean exactly? If research were the Olympics, a SIGGRAPH or SIGGRAPH Asia paper would be the gold medal. And this was one of Mr. Felix Harvey’s first few papers. Huge congratulations! And as an additional goodie, it can create an animation of me when I lost my papers, and this is me when I found them. Do you have some more ideas on how we could put such an amazing technique to use? Let me know in the comments below. Thanks for watching and for your generous support, and I’ll see you next time!
