# Making Talking Memes With Voice DeepFakes!

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=lLa9DUiJICk
- **Date:** 10.11.2020
- **Duration:** 6:36
- **Views:** 120,757

## Description

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambdalabs.com/papers

📝 The paper "Wav2Lip: Accurately Lip-syncing Videos In The Wild" is available here:
- Paper: https://arxiv.org/abs/2008.10010
- Try it out! - https://github.com/Rudrabha/Wav2Lip
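
If you want to try the model on your own clips, here is a minimal sketch of invoking the repo's inference script from Python. The flag names and the pretrained checkpoint path follow the Wav2Lip README (https://github.com/Rudrabha/Wav2Lip) as published; the input file names are placeholders, and you should verify the flags against the current README before running.

```python
# Hedged sketch: running Wav2Lip inference on a video + new audio track.
# Flags follow the repo README; "input_video.mp4" and "new_speech.wav"
# are placeholder file names for your own inputs.
import subprocess

subprocess.run(
    [
        "python", "inference.py",
        "--checkpoint_path", "checkpoints/wav2lip_gan.pth",  # pretrained weights from the repo
        "--face", "input_video.mp4",   # video (or still image) of the target face
        "--audio", "new_speech.wav",   # speech the lips should be synced to
    ],
    check=True,
)
# Per the README, the lip-synced result is written to results/result_voice.mp4.
```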

More results are available on our Instagram page! - https://www.instagram.com/twominutepapers/

❤️ Watch these videos in early access on our Patreon page or join us here on YouTube: 
- https://www.patreon.com/TwoMinutePapers
- https://www.youtube.com/channel/UCbfYPyITQ-7l4upoX8nvctg/join

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Aleksandr Mashrabov, Alex Haro, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bruno Mikuš, Bryan Learn, Christian Ahlin, Eric Haddad, Eric Lau, Eric Martel, Gordon Child, Haris Husic, Javier Bustamante, Joshua Goller, Lorin Atzberger, Lukas Biewald, Michael Albrecht, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Ramsey Elbasheer, Robin Graham, Steef, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh.
If you wish to support the series, click here: https://www.patreon.com/TwoMinutePapers

Károly Zsolnai-Fehér's links:
Instagram: https://www.instagram.com/twominutepapers/
Twitter: https://twitter.com/twominutepapers
Web: https://cg.tuwien.ac.at/~zsolnai/

#deepfake

## Contents

### [0:00](https://www.youtube.com/watch?v=lLa9DUiJICk) Segment 1 (00:00 - 05:00)

Dear Fellow Scholars, hold on to your papers, because we have an emergency situation. Everybody can make deepfakes now by recording a voice sample, such as this one, and the lips of the target subject will move as if they themselves were saying it. Not bad, huh?

And now, to see what else this new method can do, let's first watch this short clip of a speech. Pay attention to the fact that the louder voice is the English translator; if you listen closely, you can hear the chancellor's original voice in the background too. So what is the problem here? Strictly speaking, there is no problem; this is just the way the speech was recorded. However, what if we could recreate this video so that the chancellor's lips were synced not to her own voice, but to the voice of the English interpreter? This would give the impression that the speech was given in English, and the video would follow what we hear. That sounds like something straight out of a science fiction movie, perhaps even with today's advanced machine learning techniques, but let's see if it's possible.

This is a state-of-the-art technique from last year that attempts to do this. Hmm… there are extraneous lip movements, remnants of the original video, so much so that she seems to be giving two speeches at the same time. Not too convincing. So, is this not possible to pull off? Well, now hold on to your papers, and let's see how this new paper does on the same problem. Wow, that's significantly better! The remnants of the previous speech are still there, but the footage is much, much more convincing. What's even better is that the previous technique was published just one year ago by the same research group. Such a great leap in just one year, my goodness! So apparently, this is possible. But I would like to see another example, just to make sure. Checkmark.

So far, this is an amazing leap, but believe it or not, this is just one of the easier applications of the new model, so let's see what else it can do. For instance, many of us are sitting at home, yearning for some learning materials, but the vast majority of these were recorded in only one language. What if we could redub famous lectures into many other languages? Look at that! Any lecture could be available in any language and look as if it had originally been recorded in that language, as long as someone says the words. And that part can also be largely automated through speech synthesis these days.

So, it clearly works well on real people… but are you thinking what I am thinking? Three, what about lip-syncing animated characters? Imagine that a line has to be changed in a Disney movie. Can we synthesize new video footage without calling in the animators for yet another all-nighter? Let's give it a try! Indeed we can! Loving it! Let's do one more. Four, of course, we have a lot of these meme GIFs on the internet. What about redubbing those with an arbitrary line of our choice? Yup, that is indeed also possible. Well done!

And imagine: this is such a leap just one work down the line from the 2019 paper; I can only imagine what results we will see one more paper down the line. It not only does what it does better, but it can also be applied to a multitude of problems. What a time to be alive!

When we look under the hood, we see that the two key components that enable this wizardry are here and here. So what does this mean exactly? It means that we jointly improve the quality of the lip syncing and the visual quality of the video. These two modules curate the results offered by the main generator neural network, and reject solutions that don't have enough detail or don't match the speech that we

### [5:00](https://www.youtube.com/watch?v=lLa9DUiJICk&t=300s) Segment 2 (05:00 - 06:00)

hear, and thereby steer it towards much higher-quality solutions. If we continue this training process for 29 hours for the lip-sync discriminator, we get these incredible results. Now, let's have a quick look at the user study: humans almost never prefer the older method over this one. I tend to agree. If you consider these forgeries to be deepfakes, then… there you go! Useful deepfakes that can potentially help people around the world stranded at home to study and improve themselves. Imagine what good this could do! Well done! Thanks for watching and for your generous support, and I'll see you next time!
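
Schematically, that "curation" amounts to a weighted training objective: a pixel reconstruction term, a penalty from a frozen lip-sync expert, and a penalty from a visual quality discriminator. Below is a minimal, hypothetical PyTorch-style sketch of such an objective; the module names (`generator`, `sync_expert`, `quality_disc`) and the loss weights are illustrative placeholders, not the authors' code — see the paper for the exact formulation.

```python
# Hypothetical sketch of a Wav2Lip-style combined objective: reconstruction
# plus penalties from the two "curating" modules described above.
# All module names and weights here are illustrative, not the paper's code.
import torch
import torch.nn.functional as F

def training_step(generator, sync_expert, quality_disc,
                  face_frames, mel_chunks, target_frames,
                  w_sync=0.03, w_vis=0.07):
    # Generate talking-face frames conditioned on the audio.
    fake_frames = generator(face_frames, mel_chunks)

    # Pixel-level reconstruction keeps the output close to the target video.
    l1 = F.l1_loss(fake_frames, target_frames)

    # The frozen lip-sync expert scores how well lips match the audio;
    # a low "in sync" probability is penalized, rejecting mismatched frames.
    sync_prob = sync_expert(mel_chunks, fake_frames)
    sync_loss = -torch.log(sync_prob + 1e-8).mean()

    # The visual quality discriminator penalizes blurry, low-detail faces.
    vis_loss = -torch.log(quality_disc(fake_frames) + 1e-8).mean()

    # Weighted sum: the discriminators steer the generator toward frames
    # that are both sharp and in sync with the speech.
    return (1 - w_sync - w_vis) * l1 + w_sync * sync_loss + w_vis * vis_loss
```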

---
*Source: https://ekstraktznaniy.ru/video/14040*