This AI Makes "Audio Deepfakes"!

5:38

Two Minute Papers · 8 April 2020 · 620,454 views · 16,019 likes


Video description

❤️ Check out Weights & Biases and sign up for a free demo here: https://www.wandb.com/papers
Their blog post on #deepfakes is available here: https://www.wandb.com/articles/improving-deepfake-performance-with-data

📝 The paper "Neural Voice Puppetry: Audio-driven Facial Reenactment" and its online demo are available here:
Paper: https://justusthies.github.io/posts/neural-voice-puppetry/
Demo - **Update: seems to have been disabled in the meantime, apologies!** : http://kaldir.vc.in.tum.de:9000/

❤️ Watch these videos in early access on our Patreon page or join us here on YouTube:
- https://www.patreon.com/TwoMinutePapers
- https://www.youtube.com/channel/UCbfYPyITQ-7l4upoX8nvctg/join

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Alex Haro, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Anthony Vdovitchenko, Benji Rabhan, Brian Gilman, Bryan Learn, Daniel Hasegan, Dennis Abts, Eric Haddad, Eric Martel, Evan Breznyik, Geronimo Moralez, James Watt, Javier Bustamante, Kaiesh Vohra, Kasia Hayden, Kjartan Olason, Levente Szabo, Lorin Atzberger, Lukas Biewald, Marcin Dukaczewski, Marten Rauschenberg, Maurits van Mastrigt, Michael Albrecht, Michael Jensen, Nader Shakerin, Owen Campbell-Moore, Owen Skarpness, Raul Araújo da Silva, Rob Rowe, Robin Graham, Ryan Monsurate, Shawn Azman, Steef, Steve Messina, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh.
https://www.patreon.com/TwoMinutePapers

Meet and discuss your ideas with other Fellow Scholars on the Two Minute Papers Discord: https://discordapp.com/invite/hbcTJu2

Károly Zsolnai-Fehér's links:
Instagram: https://www.instagram.com/twominutepapers/
Twitter: https://twitter.com/karoly_zsolnai
Web: https://cg.tuwien.ac.at/~zsolnai/

#audiodeepfake #voicedeepfake #deepfake

Contents (4 segments)

<Untitled Chapter 1>

Dear Fellow Scholars, this is Two Minute Papers with this guy's name that is impossible to pronounce. My name is Dr. Károly Zsolnai-Fehér, and indeed, it seems that pronouncing my name requires some advanced technology. So what was this? I promise to tell you in a moment, but to understand what happened here, first, let’s have a look at this deepfake technique we showcased a few videos ago. As you see, we are at the point where our mouth, head, and eye movements are also realistically translated to a chosen target subject, and perhaps the most remarkable part of this work was that we don’t even need a video of this target person, just one photograph. However, these deepfake techniques mainly help us in transferring video content. So what about voice synthesis? Is it also as advanced as this technique we’re looking at? Well, let’s have a look at an example, and you can decide for yourself. This is a recent work that goes by the name Tacotron 2, and it performs AI-based voice cloning. All this technique requires is a 5-second sound sample of us, and it is able to synthesize new sentences in our voice, as if we uttered these words ourselves. Let’s listen to a couple of examples. Wow, these are truly incredible. The timbre of the voice is very similar, and it is able to synthesize sounds and consonants that have to be inferred because they were not heard in the original voice sample. And now, let’s jump to the next level, and use a new technique that takes a sound sample

Neural Voice Puppetry

and animates the video footage as if the target subject said it themselves. This technique is called Neural Voice Puppetry, and even though the voices here are synthesized by this previous Tacotron 2 method that you heard a moment ago, we shouldn’t judge this technique by its audio quality, but how well the video follows these given sounds. Let’s go! If you decide to stay until the end of this video, there will be another fun video sample waiting for you there. Now, note that this is not the first technique to achieve results like this, so I can’t wait to look under the hood and see what’s new here. After processing the incoming audio, the gestures are applied to an intermediate 3D model, which is specific to each person since each speaker has their own way of expressing themselves. You can see this intermediate 3D model here, but we are not done yet, we feed it through

Comparison to VOCA - Winston Churchill

a neural renderer, and what this does is apply this motion to the particular face model shown in the video. You can imagine the intermediate 3D model as a crude mask that models the gestures well, but does not look like the face of anyone, where the neural renderer adapts this mask

Comparison to Deferred Neural Rendering

to our target subject. This includes adapting it to the current resolution, lighting, face position and more, all of which are specific to what is seen in the video. What is even cooler is that this neural rendering part runs in real time. So, what do we get from all this? Well, one, superior quality, but at the same time, it also generalizes to multiple targets. Have a look here! And the list of great news is not over yet, you can try it yourself, the link is available in the video description. Make sure to leave a comment with your results! To sum up, by combining multiple existing techniques, we can perform joint video and audio synthesis for a target subject, and it is important that everyone knows about this. This episode has been supported by Weights & Biases. Here, they show you how to use their tool to perform faceswapping and improve your model that performs it. Weights & Biases provides tools to track your experiments in your deep learning projects. Their system is designed to save you a ton of time and money, and it is actively used in projects at prestigious labs, such as OpenAI, Toyota Research, GitHub, and more. And, the best part is that if you are an academic or have an open source project, you can use their tools for free. It really is as good as it gets. Make sure to visit them through wandb.com/papers or just click the link in the video description and you can get a free demo today. Our thanks to Weights & Biases for their long-standing support and for helping us make better videos for you. Thanks for watching and for your generous support, and I'll see you next time!
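
Editor's sketch: the data flow described in the transcript (incoming audio, then a person-specific intermediate 3D model, then a neural renderer that adapts it to the target video) can be illustrated with a toy example. This is only a rough illustration under stated assumptions, not the paper's method: every name here (`Speaker`, `audio_to_expression`, `expression_to_mesh`, `render`, `puppeteer`) is hypothetical, the learned networks are replaced by small hand-written linear maps, and the "renderer" is a plain averaging placeholder.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(matrix: Matrix, vector: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


@dataclass
class Speaker:
    """Hypothetical per-person data for the intermediate 3D model."""
    style: Matrix         # maps a generic expression to this person's blendshape weights
    neutral_face: Vector  # flattened vertex coordinates of the resting face
    blendshapes: Matrix   # one vertex-displacement vector per blendshape weight


def audio_to_expression(audio_features: Vector, audio_weights: Matrix) -> Vector:
    """Stage 1 (assumed linear here): one frame of audio features -> expression."""
    return matvec(audio_weights, audio_features)


def expression_to_mesh(expression: Vector, speaker: Speaker) -> Vector:
    """Stage 2: apply the expression to the person-specific 3D model."""
    weights = matvec(speaker.style, expression)
    mesh = list(speaker.neutral_face)
    for weight, displacement in zip(weights, speaker.blendshapes):
        for i, delta in enumerate(displacement):
            mesh[i] += weight * delta
    return mesh


def render(mesh: Vector, target_frame: Vector) -> Vector:
    """Stage 3 placeholder: the real system uses a learned neural renderer.
    Here the mesh is only averaged into the frame to show the data flow."""
    return [0.5 * pixel + 0.5 * vertex for pixel, vertex in zip(target_frame, mesh)]


def puppeteer(audio_features: Vector, audio_weights: Matrix,
              speaker: Speaker, target_frame: Vector) -> Vector:
    """Chain the three stages for a single video frame."""
    expression = audio_to_expression(audio_features, audio_weights)
    mesh = expression_to_mesh(expression, speaker)
    return render(mesh, target_frame)


if __name__ == "__main__":
    speaker = Speaker(
        style=[[2.0, 0.0], [0.0, 0.5]],
        neutral_face=[0.0, 0.0, 0.0],
        blendshapes=[[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]],
    )
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(puppeteer([1.0, 2.0], identity, speaker, [0.0, 1.0, 3.0]))
```

The point of the split is that only `Speaker` and the rendering stage depend on the target person, which matches the transcript's remark that the intermediate model is specific to each speaker while the approach generalizes to multiple targets.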
