# Audio To Obama: AI Learns Lip Sync from Audio | Two Minute Papers #194

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=nsuAQcvafCs
- **Date:** 04.10.2017
- **Duration:** 5:04
- **Views:** 37,611
- **Source:** https://ekstraktznaniy.ru/video/14580

## Description

The paper "Synthesizing Obama: Learning Lip Sync from Audio" is available here:
https://grail.cs.washington.edu/projects/AudioToObama/

Our Patreon page with the details:
https://www.patreon.com/TwoMinutePapers

Patreon notes:
https://www.patreon.com/TwoMinutePapers/posts?tag=what%27s%20new

Recommended for you:
WaveNet: https://www.youtube.com/watch?v=CqFIVCD1WWo
Face2face: https://www.youtube.com/watch?v=_S1lyQbbJM4

We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Andrew Melnychuk, Brian Gilman, Dave Rushton-Smith, Dennis Abts, Esa Turkulainen, Evan Breznyik, Kaben Gabriel Nanlohy, Michael Albrecht, Michael Jensen, Michael Orenstein, Steef, Sunil Kim, Torsten Reil.
https://www.patreon.com/TwoMinutePapers

Two Minute Papers Merch:
US: http://twominutepapers.com/
EU/Worldwide: https://shop.spreadshirt.net/TwoMinutePapers/

Music: Antarctica by Audionautix is licensed under a Creative Commons Attribution license (https://creativecommons.org/li

## Transcript

### Segment 1 (00:00 - 05:00)

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. This work is doing something truly remarkable: if we have a piece of audio of a real person speaking, and a target video footage, it will retime and change the video so that the target person appears to be uttering these words. Whoa! This is different from what we've seen a few episodes ago, where scientists at NVIDIA worked on synthesizing lip sync geometry for digital characters solely relying on audio footage. The results were quite amazing, have a look. This was great for animating digital characters when all we have is sound. But this time around, we're interested in reanimating the footage of real, existing people.

A prerequisite to do this with a learning algorithm is to have a ton of data to train on - which we have in our possession, as there are many hours of footage of the former president speaking during his weekly address. This is done using a recurrent neural network. Recurrent neural networks are learning algorithms where the inputs and outputs can be sequences of data. So here, in the first part, the input can be a piece of audio with the person saying something, and it is able to synthesize the appropriate mouth shapes and their evolution over time to match the audio.

The next step is creating an actual mouth texture from this rough shape that comes from the learning algorithm, which is then used as an input to the synthesizer. Furthermore, the algorithm is also endowed with an additional pose matching module to make sure that the synthesized mouth texture aligns with the posture of the head properly. The final retiming step makes sure that the head motions follow the speech correctly. If you have any doubts whether this is required, here are some results with and without the retiming step. You can see that this indeed substantially enhances the realism of the final footage.
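To make the sequence-to-sequence idea above concrete, here is a minimal sketch of a recurrent network that maps a sequence of per-frame audio features to a sequence of mouth-shape parameters. All names, dimensions, and the plain-tanh recurrence are illustrative assumptions for exposition; the paper's actual architecture and feature choices differ and are described in the publication.

```python
import numpy as np

# Illustrative sizes (assumptions, not the paper's values):
AUDIO_DIM = 13    # e.g. an MFCC-style feature vector per audio frame
HIDDEN_DIM = 32   # recurrent state size
MOUTH_DIM = 18    # e.g. a small set of mouth landmark coordinates

rng = np.random.default_rng(0)

# Randomly initialized weights stand in for trained parameters.
W_xh = rng.standard_normal((HIDDEN_DIM, AUDIO_DIM)) * 0.1
W_hh = rng.standard_normal((HIDDEN_DIM, HIDDEN_DIM)) * 0.1
W_hy = rng.standard_normal((MOUTH_DIM, HIDDEN_DIM)) * 0.1
b_h = np.zeros(HIDDEN_DIM)
b_y = np.zeros(MOUTH_DIM)

def audio_to_mouth_shapes(audio_frames):
    """Map a (T, AUDIO_DIM) sequence of audio features to a
    (T, MOUTH_DIM) sequence of mouth-shape parameters."""
    h = np.zeros(HIDDEN_DIM)
    outputs = []
    for x in audio_frames:
        # The hidden state h carries context across time, so the
        # predicted mouth shape depends on preceding audio, too.
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        outputs.append(W_hy @ h + b_y)
    return np.array(outputs)

# One second of audio at an assumed 100 feature frames per second.
frames = rng.standard_normal((100, AUDIO_DIM))
shapes = audio_to_mouth_shapes(frames)
print(shapes.shape)  # one mouth-shape vector per audio frame
```

The point of the sketch is only the shape of the problem: a time series of audio features goes in, a time series of mouth shapes comes out, and the recurrent state lets each output depend on what was said before, not just the current frame.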
Even better, when combined with Google DeepMind's WaveNet, given enough training data, we could skip the audio footage altogether and just write a piece of text, making Obama, or someone else, say what we've written. There are also a ton of other details to be worked out; for instance, there are cases where the mouth moves before the person starts to speak, which has to be taken into consideration. The dreaded "umm"s and "ahh"s are classic examples of that. There is also an important jaw correction step, and more. This is a brilliant piece of work with many non-trivial decisions that are described in the paper - make sure to have a look at it for the details; as always, a link to it is available in the video description. The results are also compared to the Face2face paper from last year that we also covered in the series. It is absolutely insane to see this rate of progress over the span of only one year.

If you have enjoyed this episode and you feel that eight of these videos a month is worth a dollar, please consider supporting us on Patreon. You can pick up some really cool perks there, and it is also a great deal of help for us to make better videos for you in the future. Earlier I also wrote a few words about the changes we were able to make because of your amazing support. Details are available in the description. Thanks for watching and for your generous support, and I'll see you next time!
