# AI Makes Video Game After Watching Tennis Matches!

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=YCur6ir6wmw
- **Date:** 19.09.2020
- **Duration:** 5:47
- **Views:** 214,165
- **Source:** https://ekstraktznaniy.ru/video/14068

## Description

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambdalabs.com/papers

📝 The paper "Vid2Player: Controllable Video Sprites that Behave and Appear like Professional Tennis Players" is available here:
https://cs.stanford.edu/~haotianz/research/vid2player/

❤️ Watch these videos in early access on our Patreon page or join us here on YouTube: 
- https://www.patreon.com/TwoMinutePapers
- https://www.youtube.com/channel/UCbfYPyITQ-7l4upoX8nvctg/join

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Aleksandr Mashrabov, Alex Haro, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bruno Mikuš, Bryan Learn, Christian Ahlin, Daniel Hasegan, Eric Haddad, Eric Martel, Gordon Child, Javier Bustamante, Joshua Goller, Lorin Atzberger, Lukas Biewald, Michael Albrecht, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Ramsey Elbasheer, Robin Graham, Steef, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fi

## Transcript

### Segment 1 (00:00 - 05:00)

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Approximately a year ago, we talked about an absolutely amazing paper by the name vid2game, in which we could grab a controller and become video game characters. It was among the first introductory papers to tackle this problem, and in this series, we always say that two more papers down the line, it will be improved significantly. So let’s see what’s in store, and this time, just one more paper down the line. This new work offers an impressive value proposition: it transforms a real tennis match into a realistic-looking video game that is controllable. This includes synthesizing not only the movements, but also the effect those movements have on the ball.

So, how do we control this? Now, hold on to your papers, because we can specify where the next shot should land with just one click. For instance, we can place this red dot here. And now, just think about the fact that this doesn’t just change where the ball should go: the trajectory of the ball has to be computed using a physical model, along with the kind of shot the tennis player has to perform for the resulting ball trajectory to look believable. This physical model even contains the ball’s spin velocity and the Magnus effect created by this spin. The entire chain of animations has to be correct, and that’s exactly what happens here. Bravo! With blue, we can also specify the position where the player should wait to hit the ball next.

And these virtual characters don’t just look like their real counterparts, they also play like them. You see, the authors analyzed the playstyle of these athletes and built a heatmap that contains information about their usual shot placements, separately for forehand and backhand shots, the average velocities of these shots, and even their favored recovery positions.
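The physical model mentioned above can be illustrated with a minimal sketch: a ball in flight feels gravity, air drag, and a Magnus force proportional to the cross product of its spin and velocity. The constants below (drag and lift coefficients, ball mass) are rough ballpark assumptions for illustration, not values from the paper.

```python
import math

# Rough tennis-ball constants (assumptions, not taken from the paper)
MASS = 0.057          # kg
RADIUS = 0.033        # m
RHO = 1.2             # air density, kg/m^3
AREA = math.pi * RADIUS ** 2
CD = 0.5              # drag coefficient
CL = 0.25             # lift (Magnus) coefficient
G = 9.81

def step(pos, vel, spin, dt=0.001):
    """One Euler step of ball flight with drag and Magnus force.
    pos, vel: (x, y, z) tuples in metres and m/s; spin: angular velocity (rad/s)."""
    speed = math.sqrt(sum(v * v for v in vel))
    # Drag opposes the velocity vector
    drag = tuple(-0.5 * RHO * CD * AREA * speed * v for v in vel)
    # Magnus force points along spin x velocity
    cross = (spin[1] * vel[2] - spin[2] * vel[1],
             spin[2] * vel[0] - spin[0] * vel[2],
             spin[0] * vel[1] - spin[1] * vel[0])
    magnus = tuple(0.5 * RHO * CL * AREA * RADIUS * c for c in cross)
    acc = [(drag[i] + magnus[i]) / MASS for i in range(3)]
    acc[2] -= G  # gravity acts along -z
    new_vel = tuple(vel[i] + acc[i] * dt for i in range(3))
    new_pos = tuple(pos[i] + new_vel[i] * dt for i in range(3))
    return new_pos, new_vel

def landing_point(pos, vel, spin):
    """Integrate until the ball reaches the court plane (z = 0)."""
    while pos[2] > 0:
        pos, vel = step(pos, vel, spin)
    return pos[0], pos[1]
```

With this kind of model, topspin (spin whose Magnus force points downward) makes the ball dip earlier, which is exactly why the system must pick a matching shot type for a chosen landing spot.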
If you have a closer look at the paper, you will see that they not only include this kind of statistical knowledge in their system, but they really went the extra mile and included common tennis strategies as well. So, how does it work? Let’s look under the hood. First, it looks at broadcast footage, from which annotated clips are extracted that contain the movement of these players. If you look carefully, you can see this red line on the spine of the player, and some more: these are annotations that tell the AI about the pose of the players. It builds a database from these clips and chooses the appropriate piece of footage for the action that is about to happen, which sounds great in theory, but in a moment, you will see that this is not nearly enough to produce a believable animation. For instance, we also need a rendering step, which has to adjust this footage to the appropriate perspective, as you see here.

But we have to do way more to make this work. Look! Without additional considerations, we get something like this. Not good. So, what happened here? Well, the source datasets contain matches that are several hours long, and therefore many different lighting conditions. With this, visual glitches are practically guaranteed to happen. To address this, the paper describes a normalization step that can even out these changes. How well does it do its job? Let’s have a look. This is the unnormalized case. This short sequence appears to contain at least four of these glitches, all of which are quite apparent. And now, let’s see the new system after the normalization step. Yup. That’s what I am talking about!

But these are not the only considerations the authors had to make to produce these amazing results. You see, oftentimes, quite a bit of information is missing from these frames. Our seasoned Fellow Scholars know not to despair, because we can reach out to image inpainting methods to address this.
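The normalization step described above is, in spirit, a matter of matching the brightness statistics of each clip to a common reference. This sketch is my simplification, not the paper's actual method: it shifts a frame's mean intensity and rescales its spread to reference values, so clips cut from differently lit footage blend together.

```python
def mean_std(pixels):
    """Mean and standard deviation of a flat list of intensities."""
    m = sum(pixels) / len(pixels)
    var = sum((p - m) ** 2 for p in pixels) / len(pixels)
    return m, var ** 0.5

def normalize_frame(frame, ref_mean, ref_std):
    """Map a frame's intensity statistics onto reference statistics.
    frame: flat list of 0-255 intensities. Values are clamped to [0, 255]."""
    m, s = mean_std(frame)
    scale = ref_std / s if s > 0 else 1.0
    return [min(255.0, max(0.0, (p - m) * scale + ref_mean)) for p in frame]
```

A real system would do this per color channel and likely smooth the correction over time, but the idea is the same: after normalization, every frame shares the same lighting statistics, so the visible jumps between clips disappear.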
These can fill in missing details in images with sensible information. You can see NVIDIA’s work from two years ago that could do this reliably for a great variety of images. This new work uses a learning-based technique called image-to-image translation to fill in these details. Of course, the advantages of this new system are visible right away, and so are its limitations. For instance, temporal coherence could be improved, meaning that the tennis rackets can appear or disappear from one frame to another. The sprites are not as detailed as they could be, but none of this really matters. What matters is that what was previously impossible is now possible, and two more papers down the line, it is very likely that all of these issues will be
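To give a feel for what inpainting does, here is a classical diffusion-style fill, not the learned image-to-image translation the paper uses: each missing pixel is repeatedly replaced by the average of its neighbors until the hole is filled with smoothly interpolated values.

```python
def inpaint(grid, mask, iters=200):
    """Fill masked cells of a 2-D intensity grid by neighbor averaging.
    grid: list of rows of floats; mask: same shape, True = missing pixel."""
    h, w = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for _ in range(iters):
        nxt = [row[:] for row in out]
        for y in range(h):
            for x in range(w):
                if mask[y][x]:
                    nbrs = [out[ny][nx]
                            for ny, nx in ((y - 1, x), (y + 1, x),
                                           (y, x - 1), (y, x + 1))
                            if 0 <= ny < h and 0 <= nx < w]
                    nxt[y][x] = sum(nbrs) / len(nbrs)
        out = nxt
    return out
```

Diffusion like this only produces smooth fills; learned methods go further and hallucinate plausible texture and structure, which is why they are used for sprites of this kind.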

### Segment 2 (05:00 - 05:47)

ironed out. What a time to be alive! Thanks for watching and for your generous support, and I'll see you next time!
