# Ubisoft’s New AI: Breathing Life Into Games!

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=Dt0cA2phKfU
- **Date:** 26.11.2022
- **Duration:** 5:58
- **Views:** 251,186

## Description

❤️ Check out Weights & Biases and sign up for a free demo here: https://wandb.com/papers 

📝 The paper "ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech" is available here:
https://github.com/ubisoft/ubisoft-laforge-ZeroEGGS

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Aleksandr Mashrabov, Alex Balfanz, Alex Haro, Andrew Melnychuk, Benji Rabhan, Bryan Learn, B Shang, Christian Ahlin, Eric Martel, Geronimo Moralez, Gordon Child, Jace O'Brien, Jack Lukic, John Le, Jonas, Jonathan, Kenneth Davis, Klaus Busse, Kyle Davis, Lorin Atzberger, Lukas Biewald, Luke Dominique Warner, Matthew Allen Fisher, Matthew Valle, Michael Albrecht, Michael Tedder, Nevin Spoljaric, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Rajarshi Nigam, Ramsey Elbasheer, Steef, Taras Bobrovytsky, Ted Johnson, Thomas Krcmar, Timothy Sum Hon Mun, Torsten Reil, Tybie Fitzhugh, Ueli Gallizzi.
If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers

Thumbnail background design: Felícia Zsolnai-Fehér - http://felicia.hu

Károly Zsolnai-Fehér's links:
Instagram: https://www.instagram.com/twominutepapers/
Twitter: https://twitter.com/twominutepapers
Web: https://cg.tuwien.ac.at/~zsolnai/

## Contents

### [0:00](https://www.youtube.com/watch?v=Dt0cA2phKfU) Segment 1 (00:00 - 05:00)

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Earlier we talked about AI-based techniques that can learn to clone your voice, and then we can perform text-to-speech. So, it would learn this. And then, the AI would generate this. Yes, this means that we can write something, and the AI says it in our voice. So that is text-to-speech. But get this: what about speech-to-gesture? What is that?

Well, scientists at Ubisoft had a crazy idea. They said, let's create a dataset of characters where the AI can see gestures for agreement, disagreement, being relaxed, neutral, scared, and more, learn from it, and apply it to new virtual characters intelligently.

Sounds good, right? But not so fast. Ubisoft is not the first to try this idea; here is a technique from just two years ago. Look. There is something here, but we are humans, at least most of us anyway, and our eyes are highly attuned to the gestures of other humans, which means that if even the smallest part of the animation is off, we will immediately notice. And I think that every single one of you Fellow Scholars indeed noticed that something is off here. And based on the quality of animation that is required today to keep the illusion up, I am not sure if this is coming to fruition anytime soon. Just look at these examples. We are so far away.

But in any case, let's have a look at the new technique together. The concept is the same: first, in goes a speech sample, and it will be able to generate long sequences using a style of our choosing. Like this. That looks incredible. Let's compare it to the previous method. Yes, my goodness! A night and day difference!

Such incredible progress in just two years. Yes, this is from only two years ago. And that would already be pretty cool, but it gets crazier than that. Much crazier.

For instance, we can plug in a new speaker the AI hasn't heard about yet, and this happens. This is really good. But that's still nothing. It generalized not only to new speakers but, hold on to your papers, to new languages too. Listen.

Wow, and the list of features just keeps on going. For instance, we can also exert some artistic control here. If we feel that the hands are too high up, we can lower them. Or we can even ask for more or less hip movement during the monologue.

And I have to say, it does a lot of things really well, from the lowest-energy gestures up to the highest. And it still doesn't stop there: it is not only better than the previous technique, not only more controllable than the previous technique, but it is also about 7 times faster, and even better, it is done with a neural network that is significantly simpler than that one.

How many hours of these gestures did the neural network get access to in order to learn this? How big was the training set? Actually, it requires very little information. All it was given was about 2 hours of these gestures, and it is not copy-pasting; it can continue these gestures for a long, long monologue seamlessly. And as a cherry on top, even some creative control is allowed. So, these amazing virtual worlds are going to be even more full of lifelike characters, and with this work, this process is going to be even easier and more accessible for all of us. What a time to be alive!
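To make the example-based, zero-shot idea above concrete, here is a minimal Python sketch of the pipeline described in the video: speech features plus a short example motion clip go in, and a pose sequence comes out in the style of that example. Every name and both "networks" below are hypothetical toy stand-ins for illustration; this is not the ZeroEGGS repository API or the paper's actual architecture.

```python
import numpy as np

def style_encoder(style_clip: np.ndarray) -> np.ndarray:
    """Toy stand-in for a learned style encoder: summarize an example motion
    clip (frames x joint channels) into a fixed-size style embedding.
    In a real system this would be a trained network; zero-shot means the
    example clip can come from a speaker or language never seen in training."""
    return np.concatenate([style_clip.mean(axis=0), style_clip.std(axis=0)])

def gesture_decoder(speech_feats: np.ndarray, style_emb: np.ndarray) -> np.ndarray:
    """Toy stand-in for a learned decoder: emit one pose per speech frame,
    modulated by the style embedding, so the motion can continue seamlessly
    for an arbitrarily long monologue instead of copy-pasting clips."""
    n_frames = speech_feats.shape[0]
    n_joints = style_emb.shape[0] // 2
    base, spread = style_emb[:n_joints], style_emb[n_joints:]
    noise = np.random.default_rng(0).standard_normal((n_frames, n_joints))
    return base + 0.1 * spread * noise  # (frames x joint channels)

def generate_gestures(speech_feats: np.ndarray, style_clip: np.ndarray) -> np.ndarray:
    """Speech-to-gesture, example-based: condition the decoder on the style
    embedding of the example clip. Artistic control (e.g. lower hands, more
    hip movement) could be exposed by nudging this embedding before decoding."""
    style_emb = style_encoder(style_clip)
    return gesture_decoder(speech_feats, style_emb)

# Usage with random stand-in data:
speech = np.random.default_rng(1).standard_normal((300, 26))   # 300 audio frames of speech features
example = np.random.default_rng(2).standard_normal((120, 75))  # short style clip, 75 joint channels
poses = generate_gestures(speech, example)
print(poses.shape)  # (300, 75): one skeleton pose per speech frame
```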

### [5:00](https://www.youtube.com/watch?v=Dt0cA2phKfU&t=300s) Segment 2 (05:00 - 05:58)

Thanks for watching and for your generous support, and I'll see you next time!

---
*Source: https://ekstraktznaniy.ru/video/13378*