Rewrite Videos By Editing Text


Two Minute Papers · 11.07.2019 · 73,290 views · 3,673 likes


Video description

📝 The paper "Text-based Editing of Talking-head Video" is available here: https://www.ohadf.com/projects/text-based-editing/

❤️ Pick up cool perks on our Patreon page: https://www.patreon.com/TwoMinutePapers

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: 313V, Alex Haro, Andrew Melnychuk, Angelos Evripiotis, Anthony Vdovitchenko, Brian Gilman, Bruno Brito, Bryan Learn, Christian Ahlin, Christoph Jadanowski, Claudio Fernandes, Daniel Hasegan, Dennis Abts, Eric Haddad, Eric Martel, Evan Breznyik, Geronimo Moralez, Ivelin Ivanov, James Watt, Javier Bustamante, John De Witt, Kaiesh Vohra, Kasia Hayden, Kjartan Olason, Levente Szabo, Lorin Atzberger, Lukas Biewald, Marcin Dukaczewski, Marten Rauschenberg, Maurits van Mastrigt, Michael Albrecht, Michael Jensen, Nader Shakerin, Owen Campbell-Moore, Owen Skarpness, Raul Araújo da Silva, Rob Rowe, Robin Graham, Ryan Monsurate, Shawn Azman, Steef, Steve Messina, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Zach Boldyga.

Splash screen/thumbnail design: Felícia Fehér - http://felicia.hu

Károly Zsolnai-Fehér's links:
Facebook: https://www.facebook.com/TwoMinutePapers/
Twitter: https://twitter.com/karoly_zsolnai
Web: https://cg.tuwien.ac.at/~zsolnai/

#DeepFake

Table of contents (1 segment)

Segment 1 (00:00 - 04:00)

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. The last few years have been an amazing ride when it comes to research on facial reenactment for real characters. Beyond just transferring our gestures to video footage of an existing talking head, controlling their gestures like video game characters and full-body movement transfer are also a possibility. With WaveNet and its many variants, we can also learn someone’s way of speaking, write a piece of text and make an audio waveform where we impersonate them using their own voice. So, what else is there to do in this domain? Are we done? No-no, not at all!

Hold on to your papers, because with this amazing new technique, what we can do is look at the transcript of a talking head video, remove parts of it or add to it, just as we would edit any piece of text - and this technique produces both the audio and a matching video of this person uttering these words. Check this out.

It works by looking through the video and collecting small sounds that can be used to piece together the new word that we’ve added to the transcript. The authors demonstrate this by adding the word “fox” to the transcript. This can be pieced together from the “v” which appears in the word “viper”, and by taking “ox” as a part of another word found in the footage. As a result, one can make the character say “fox” even though we never hear her utter this word in the footage. Then, we can look for not only the audio occurrences of these sounds, but also the video footage of how they are being said, and in the paper, a technique is proposed to blend these video assets together. Finally, we can provide all this information to a neural renderer that synthesizes a smooth video of this talking head. This is a beautiful architecture with lots of contributions, so make sure to have a look at the paper in the description for more details.
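To make the "piece together a new word from existing sounds" step above more concrete, here is a minimal sketch of what such a snippet search could look like. Everything in it is an assumption for illustration: the phoneme labels, the timings, the tiny `VISEME_OF` grouping (sounds such as "f" and "v" that look alike on the lips), the word "box" standing in for the unnamed second word, and the names `Phone`, `Snippet` and `find_snippets`. The greedy longest-match strategy is a simple stand-in and is not claimed to be the search used in the paper.

```python
from dataclasses import dataclass

# Hypothetical viseme classes: phonemes that look alike on the lips share a class.
VISEME_OF = {"F": "FV", "V": "FV", "P": "PBM", "B": "PBM", "M": "PBM"}


def viseme(phoneme: str) -> str:
    """Map a phoneme to its (assumed) viseme class; unknown phonemes map to themselves."""
    return VISEME_OF.get(phoneme, phoneme)


@dataclass(frozen=True)
class Phone:
    label: str    # phoneme label from a (hypothetical) forced alignment
    start: float  # start time in seconds
    end: float    # end time in seconds
    word: str     # word the phoneme was spoken in


@dataclass(frozen=True)
class Snippet:
    start: float
    end: float
    labels: tuple


def find_snippets(target, footage):
    """Greedily cover the target phoneme sequence with the longest
    viseme-matching runs of consecutive phonemes found in the footage."""
    target_v = [viseme(p) for p in target]
    footage_v = [viseme(p.label) for p in footage]
    snippets = []
    i = 0
    while i < len(target_v):
        best_len, best_pos = 0, -1
        for pos in range(len(footage_v)):
            k = 0
            while (i + k < len(target_v) and pos + k < len(footage_v)
                   and footage_v[pos + k] == target_v[i + k]):
                k += 1
            if k > best_len:
                best_len, best_pos = k, pos
        if best_len == 0:
            raise ValueError(f"no footage covers phoneme {target[i]!r}")
        run = footage[best_pos:best_pos + best_len]
        snippets.append(Snippet(run[0].start, run[-1].end,
                                tuple(p.label for p in run)))
        i += best_len
    return snippets


# Made-up alignment for two words assumed to occur in the footage.
footage = [
    Phone("V", 0.00, 0.08, "viper"), Phone("AY", 0.08, 0.20, "viper"),
    Phone("P", 0.20, 0.27, "viper"), Phone("ER", 0.27, 0.40, "viper"),
    Phone("B", 1.42, 1.50, "box"), Phone("AA", 1.50, 1.62, "box"),
    Phone("K", 1.62, 1.70, "box"), Phone("S", 1.70, 1.82, "box"),
]

# "fox" as a phoneme sequence; the speaker never says it in the footage.
snippets = find_snippets(["F", "AA", "K", "S"], footage)
for s in snippets:
    print(s)
```

In this toy setup the "f" of "fox" is covered by the "v" of "viper" because both fall into the same assumed viseme class, and the rest comes from the tail of "box". A real system would still need to blend the retrieved video snippets and synthesize the matching audio, which this sketch does not attempt.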
And of course, as it is not easy to measure the quality of these results in a mathematical manner, a user study was conducted where they asked some fellow humans which footage was real and which was edited. You will see the footage edited by this algorithm on the right. And, hm, it’s not easy to tell which one is which, and it also shows in the numbers, which are not perfect, but they clearly show that the fake video is very often confused with the real one. Did you find any artifacts there that give the trick away? Perhaps the sentence was said a touch faster than expected. Found anything else? Let me know in the comments below. The paper also contains tons of comparisons against previous works.

So in the last few years, the trend seems clear: the bar is getting lower, it is getting easier and easier to produce these kinds of videos, and it is getting harder and harder to catch them with our naked eyes. And now, we can edit the transcript of what is being said, which is super convenient. I would like to note that AIs also exist that can detect these edited videos with high confidence. I put up the ethical considerations of the authors here. They are definitely worthy of your attention, as they discuss how the authors think about these techniques. The motivation for this work was mainly to enhance digital storytelling by removing filler words and potentially flubbed phrases, or by retiming sentences in talking head videos. There is much more to it, so make sure to pause the video and read their full statement. Thanks for watching and for your generous support, and I'll see you next time!
