DeepMind's WaveNet, 1000 Times Faster | Two Minute Papers #232



Two Minute Papers · 01.03.2018 · 48,964 views · 1,927 likes


Video description

The paper "Parallel WaveNet: Fast High-Fidelity Speech Synthesis" is available here: https://arxiv.org/abs/1711.10433
Our Patreon page: https://www.patreon.com/TwoMinutePapers
DeepMind's Blog: https://deepmind.com/blog/wavenet-launches-google-assistant/

We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Andrew Melnychuk, Brian Gilman, Christian Ahlin, Christoph Jadanowski, Dennis Abts, Emmanuel, Eric Haddad, Esa Turkulainen, Evan Breznyik, Frank Goertzen, Malek Cellier, Marten Rauschenberg, Michael Albrecht, Michael Jensen, Nader Shakerin, Raul Araújo da Silva, Robin Graham, Shawn Azman, Steef, Steve Messina, Sunil Kim, Torsten Reil.
https://www.patreon.com/TwoMinutePapers

One-time payment links are available below. Thank you very much for your generous support!
PayPal: https://www.paypal.me/TwoMinutePapers
Bitcoin: 13hhmJnLEzwXgmgJN7RB6bWVdT7WkrFAHh
Ethereum: 0x002BB163DfE89B7aD0712846F1a1E53ba6136b5A
LTC: LM8AUh5bGcNgzq6HaV1jeaJrFvmKxxgiXg

Music: Antarctica by Audionautix is licensed under a Creative Commons Attribution license (https://creativecommons.org/licenses/by/4.0/)
Artist: http://audionautix.com/
Thumbnail background image credit: https://pixabay.com/photo-3172471/
Splash screen/thumbnail design: Felícia Fehér - http://felicia.hu

Károly Zsolnai-Fehér's links:
Facebook: https://www.facebook.com/TwoMinutePapers/
Twitter: https://twitter.com/karoly_zsolnai
Web: https://cg.tuwien.ac.at/~zsolnai/

Contents (1 segment)

Segment 1 (00:00 - 04:00)

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. Due to popular demand, here is the new DeepMind paper on WaveNet. WaveNet is a text-to-speech algorithm that takes a sentence as an input and gives us audio footage of these words being uttered by a person of our choice. Let's listen to some results from the original algorithm. Note that these are all synthesized by the AI.

"The Blue Lagoon is a 1980 American romance and adventure film directed by Randal Kleiser." "The Blue Lagoon is a 1980 American..." "Aspects of the Sublime in English Poetry and Painting, 1770 to 1850."

All this requires is some training data from this person's voice, typically ten to thirty hours, and a ton of computational power. The computational power part is especially of interest because we have to produce 16 to 24 thousand samples for each second of continuous audio footage. And unfortunately, as you can see here, these new samples are generated one by one. Since today's graphics cards are highly parallel, this means that it is a waste to have one compute unit do all the work while the others are sitting there twiddling their thumbs. We need to make this more parallel somehow.

So the solution is simple: instead of one, we can just simply make more samples in parallel. No, no, no, it doesn't work like that. The reason for this is that speech is not like random noise. It is highly coherent, where the new samples are highly dependent on the previous ones. We can only create one new sample at a time. So how can we create the new waveform in one go, using these many compute units in parallel?

This new WaveNet variant starts out from white noise and applies changes to it over time to morph it into the output speech waveform. The changes take place in parallel over the entirety of the signal, so that's a good sign. It works by creating a reference network that is slow but correct. Let's call this the teacher network. The new algorithm arises as a student network, which tries to mimic what the teacher does, but the student tries to be more efficient at that. This has a similar vibe to generative adversarial networks, where we have two networks: one is actively trying to fool the other one, while this other one tries to better distinguish fake inputs from real ones. However, it is fundamentally different because the student does not try to fool the teacher, but to mimic it while being more efficient.

This yields a blisteringly fast version of WaveNet that is over a thousand times faster than its predecessor. It is not real time; it is 20 times faster than real time. And you know what the best part is? Usually there are heavy trade-offs for this, but this time the validation section of the paper reveals that there is no perceived difference in the outputs from the original algorithm. [inaudible]

So where can we try it? Well, it is already deployed online in Google Assistant, in multiple English and Japanese voices. So, as you see, I was wrong. I said that a few papers down the line it will definitely be done in real time. Apparently, with this new work, it is not a few papers down the line, it is one, and it is not a bit faster, but a thousand times faster. Things are getting out of hand real quick, and I mean this in the best possible way. What a time to be alive! This is one incredible and highly inspiring work. Make sure to have a look at the paper, perfect training for the mind. As always, it is available in the video description. Thanks for watching and for your generous support, and I'll see you next time!
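
The contrast drawn in the transcript, samples produced one by one versus a signal shaped from white noise all at once, can be illustrated with a toy example. This is only a sketch under loud assumptions: the "model" is a hypothetical linear recurrence (`COEFF` is made up), not the WaveNet network, and the function names are invented for illustration. It shows one idea only: if every output sample can be written as a function of the noise input alone, no output has to wait for another output, so all time steps could be computed independently.

```python
import random

SAMPLE_RATE = 16_000  # samples per second of audio (low end quoted in the video)
COEFF = 0.9           # hypothetical strength of dependence on the previous sample


def sequential_generate(noise, coeff=COEFF):
    """Autoregressive toy: sample t needs sample t-1, so the loop is inherently serial."""
    samples = []
    previous = 0.0
    for z in noise:
        previous = coeff * previous + z
        samples.append(previous)
    return samples


def parallel_sample(noise, t, coeff=COEFF):
    """Output t expressed purely through noise inputs 0..t.

    No other output sample is read here, so each t could run on its own compute unit.
    """
    return sum(coeff ** (t - k) * noise[k] for k in range(t + 1))


def parallel_generate(noise, coeff=COEFF):
    # The list comprehension stands in for a parallel map over all time steps.
    return [parallel_sample(noise, t, coeff) for t in range(len(noise))]


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(200)]  # the white-noise input
    slow = sequential_generate(noise)
    fast = parallel_generate(noise)
    max_diff = max(abs(a - b) for a, b in zip(slow, fast))
    print(f"largest difference between the two versions: {max_diff:.2e}")
    print(f"serial steps needed for one second of audio: {SAMPLE_RATE}")
```

Both functions produce the same waveform up to floating-point rounding; the difference is only in which values each step has to wait for. In this toy the parallel form does more total arithmetic, which is a fair trade when thousands of compute units would otherwise sit idle.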
