# NVIDIA's Magical AI Speaks Using Your Voice! 🙊

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=KeepnvtICWo
- **Date:** 12.03.2022
- **Duration:** 5:36
- **Views:** 249,161

## Description

❤️ Check out Cohere and sign up for free today: https://cohere.ai/papers

📝 The paper "Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion" is available here:
https://research.nvidia.com/publication/2017-07_Audio-Driven-Facial-Animation

Details about #Audio2Face are available here:
https://www.nvidia.com/en-us/omniverse/apps/audio2face/
https://docs.omniverse.nvidia.com/app_audio2face/app_audio2face/overview.html

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Aleksandr Mashrabov, Alex Balfanz, Alex Haro, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bryan Learn, Christian Ahlin, Eric Martel, Gordon Child, Ivo Galic, Jace O'Brien, Javier Bustamante, John Le, Jonas, Jonathan, Kenneth Davis, Klaus Busse, Lorin Atzberger, Lukas Biewald, Matthew Allen Fisher, Michael Albrecht, Michael Tedder, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Rajarshi Nigam, Ramsey Elbasheer, Steef, Taras Bobrovytsky, Thomas Krcmar, Timothy Sum Hon Mun, Torsten Reil, Tybie Fitzhugh, Ueli Gallizzi.
If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers

Thumbnail background image credit:
https://pixabay.com/vectors/triangles-polygon-color-pink-1430105/
Thumbnail background design: Felícia Zsolnai-Fehér - http://felicia.hu

Meet and discuss your ideas with other Fellow Scholars on the Two Minute Papers Discord: https://discordapp.com/invite/hbcTJu2

Károly Zsolnai-Fehér's links:
Instagram: https://www.instagram.com/twominutepapers/
Twitter: https://twitter.com/twominutepapers
Web: https://cg.tuwien.ac.at/~zsolnai/

#nvidia

## Contents

### [0:00](https://www.youtube.com/watch?v=KeepnvtICWo) Intro

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Today, through the power of AI research, we are going to see how easily we can ask virtual characters not only to say what we say, but we can even become art directors and ask them to add some emotion to it.

### [0:21](https://www.youtube.com/watch?v=KeepnvtICWo&t=21s) Art Director

So, how does this happen?

### [0:41](https://www.youtube.com/watch?v=KeepnvtICWo&t=41s) How it Works

Well, in goes what we say, for instance, me uttering "Dear Fellow Scholars", or anything else. And, here is the key: we can also specify the emotional state of the character. And this AI does the rest. That is absolutely amazing. But it gets even more amazing. Now, hold on to your papers, and look. Yes, that's right, this was possible back in 2017, approximately 400 Two Minute Papers episodes ago. And whenever I showcase results like this, I always get the same question from you Fellow Scholars: "Yes, this all looks great, but when do I get to use this?" And the answer is: right now. Why? Because NVIDIA has released Audio2Face, a collection of AI techniques that we can use to perform this quickly and easily. Look, we can record our voice live and have a virtual character say what we are saying. But it does not stop there; it also has three amazing features.
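To make the pipeline concrete, here is a minimal sketch of what such an audio-driven facial animation model can look like: a short window of audio features and an emotion vector go in, and per-vertex offsets for a neutral face mesh come out, so the same speech can be rendered with different expressions. All layer names, feature sizes, and the mesh resolution are illustrative assumptions, not NVIDIA's actual architecture.

```python
import torch
import torch.nn as nn

class AudioToFace(nn.Module):
    """Maps a short window of audio features plus an emotion vector
    to per-vertex offsets that deform a neutral face mesh."""

    def __init__(self, n_audio_features=32, n_frames=64,
                 emotion_dim=16, n_vertices=5000):
        super().__init__()
        self.n_vertices = n_vertices
        # Convolutions over the time axis extract articulation cues
        # (how the mouth should move) from the audio window.
        self.audio_net = nn.Sequential(
            nn.Conv1d(n_audio_features, 72, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv1d(72, 108, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # collapse time into one feature vector
        )
        # The emotion vector is concatenated with the audio features,
        # so the same speech decodes to different facial expressions.
        self.decoder = nn.Sequential(
            nn.Linear(108 + emotion_dim, 256),
            nn.ReLU(),
            nn.Linear(256, n_vertices * 3),
        )

    def forward(self, audio_window, emotion):
        # audio_window: (batch, n_audio_features, n_frames)
        # emotion:      (batch, emotion_dim)
        h = self.audio_net(audio_window).squeeze(-1)
        offsets = self.decoder(torch.cat([h, emotion], dim=1))
        return offsets.view(-1, self.n_vertices, 3)

model = AudioToFace()
audio = torch.randn(1, 32, 64)      # e.g. one window of spectral features
emotion = torch.randn(1, 16)        # learned or hand-picked emotion state
print(model(audio, emotion).shape)  # torch.Size([1, 5000, 3])
```

Feeding consecutive overlapping audio windows through such a model yields one mesh deformation per animation frame, which is what makes live, microphone-driven animation possible.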

### [1:55](https://www.youtube.com/watch?v=KeepnvtICWo&t=115s) Features

One, we can even perform a face swap, not only between humanoid characters, but, my goodness, even from a humanoid to, for instance, a rhino! Now that's something. I love it. But wait, there is more. There is this. And this too.

Two, we can still specify emotions, like anger, sadness, and excitement, and the virtual character will perform them for us. We only need to provide our voice; no acting skills are required. In my opinion, this will be a godsend in any kind of digital media, computer games, or even when meeting our friends in a virtual space.

Three, the usability of this technique is out of this world. For instance, it does not eat up a great deal of resources, so we can run multiple instances of it at the same time. This is a wonderful usability feature, one of many that can make or break whether a new technique gets adopted by the industry. An aspect not to be underestimated. And here is another usability feature: it works well with Unreal Engine's MetaHuman, a piece of software that creates virtual humans. And with that, we can not only create these virtual humans, but become the voice actors for them, without having to hire a bunch of animators. How cool is that? Now, I believe this is an earlier version of MetaHuman; here is the newer one.
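The face swap in feature one is easiest to picture as retargeting: the model's output drives a set of blendshapes, and any character that exposes a matching set of blendshapes, human or rhino, can be animated by the same weights. Below is a minimal sketch of that idea; the blendshape setup and all sizes are illustrative assumptions, not how Audio2Face is wired internally.

```python
import numpy as np

def apply_blendshapes(neutral, deltas, weights):
    """neutral: (V, 3) rest-pose mesh; deltas: (K, V, 3) per-shape
    vertex offsets; weights: (K,) activation of each blendshape."""
    # Each animated frame is the rest pose plus a weighted sum of shapes.
    return neutral + np.tensordot(weights, deltas, axes=1)

# The same weight vector animates two different meshes: this is why a
# humanoid performance can drive, say, a rhino, as long as both rigs
# expose matching blendshapes (jaw open, smile, blink, ...).
rng = np.random.default_rng(0)
weights = rng.uniform(0.0, 1.0, size=8)
human = apply_blendshapes(rng.normal(size=(500, 3)),
                          0.1 * rng.normal(size=(8, 500, 3)), weights)
rhino = apply_blendshapes(rng.normal(size=(900, 3)),
                          0.1 * rng.normal(size=(8, 900, 3)), weights)
print(human.shape, rhino.shape)  # (500, 3) (900, 3)
```

Because applying blendshapes is just a small weighted sum per frame, it is also cheap, which is consistent with the point above about running multiple instances at once.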

### [3:42](https://www.youtube.com/watch?v=KeepnvtICWo&t=222s) Demo

Wow, way better. Just imagine how cool it would be to voice these characters automatically. Now, the important lesson is that this was possible in a paper in 2017, and within just a few years it has improved so much that it is now deployed in a real product that we can use right now. That is a powerful democratizing force for computer animation. So, yes, the papers that you see here are real. As real as it gets, and this tech transfer can often occur in just a few years' time. In some other cases, even quicker.

### [4:25](https://www.youtube.com/watch?v=KeepnvtICWo&t=265s) Outro

What a time to be alive! So, what would you use this for? I’d love to know what you think. Let me know in the comments below! Thanks for watching and for your generous support, and I'll see you next time!

---
*Source: https://ekstraktznaniy.ru/video/13630*