# This AI Produces Binaural (2.5D) Audio

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=t_7qpPOmsME
- **Date:** 19.01.2019
- **Duration:** 3:57
- **Views:** 33,561
- **Source:** https://ekstraktznaniy.ru/video/14370

## Description

The paper "2.5D Visual Sound" is available here:
https://arxiv.org/abs/1812.04204

Pick up cool perks on our Patreon page:
https://www.patreon.com/TwoMinutePapers

We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
313V, Alex Haro, Andrew Melnychuk, Angelos Evripiotis, Anthony Vdovitchenko, Brian Gilman, Christian Ahlin, Christoph Jadanowski, Dennis Abts, Eric Haddad, Eric Martel, Evan Breznyik, Geronimo Moralez, Jason Rollins, Javier Bustamante, John De Witt, Kaiesh Vohra, Kjartan Olason, Lorin Atzberger, Marcin Dukaczewski, Marten Rauschenberg, Maurits van Mastrigt, Michael Albrecht, Michael Jensen, Morten Punnerud Engelstad, Nader Shakerin, Owen Campbell-Moore, Owen Skarpness, Raul Araújo da Silva, Richard Reis, Rob Rowe, Robin Graham, Ryan Monsurate, Shawn Azman, Steef, Steve Messina, Sunil Kim, Thomas Krcmar, Torsten Reil, Zach Boldyga, Zach Doty.

Splash screen/thumbnail design: Felícia Fehér - htt

## Transcript

### <Untitled Chapter 1> [0:00]

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. 2.5D audio means a sound recording that provides the listener with an amazing 3D sound sensation. It produces sound that feels highly realistic when listened to through headphones, so using a pair is highly recommended for this episode. It sounds way more immersive than regular mono or even stereo audio signals, but it also requires more expertise to produce and is therefore quite scarce on the internet. Let's listen to the difference.
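What makes binaural audio feel spatial is the between-ear cues it encodes. As a rough illustration only (this is not the paper's method, and every name here is made up for the example), the toy numpy sketch below fakes the two simplest cues, the interaural time difference (ITD, the far ear hears the sound slightly later) and the interaural level difference (ILD, the far ear hears it slightly quieter), for a source at a given azimuth. Real binaural recordings also carry head-related spectral filtering (HRTF cues), which this sketch ignores entirely.

```python
import numpy as np

def render_binaural(mono, sr, azimuth_deg, head_radius=0.0875, c=343.0):
    """Toy binaural renderer: applies an interaural time difference (ITD)
    and a crude interaural level difference (ILD) to a mono signal."""
    az = np.deg2rad(azimuth_deg)
    # Woodworth's spherical-head approximation of the ITD, in seconds.
    itd = (head_radius / c) * (abs(az) + np.sin(abs(az)))
    delay = int(round(itd * sr))  # ITD expressed in whole samples
    # Crude ILD: attenuate the far ear by up to about 6 dB.
    gain_far = 10 ** (-6 * abs(np.sin(az)) / 20)
    # The far-ear signal starts `delay` samples late.
    delayed = np.concatenate([np.zeros(delay), mono])[: len(mono)]
    if azimuth_deg >= 0:  # source on the right: left ear is late and quieter
        left, right = gain_far * delayed, mono
    else:                 # source on the left: mirror image
        left, right = mono, gain_far * delayed
    return np.stack([left, right])  # shape: (2, n_samples)

sr = 16000
t = np.arange(sr) / sr
mono = np.sin(2 * np.pi * 440 * t)        # one second of a 440 Hz tone
stereo = render_binaural(mono, sr, 60.0)  # source 60 degrees to the right
```

Even these two crude cues are enough to make the tone appear to come from the right when played over headphones, which is the sensation the episode is about.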

### monaural audio (for comparison) [0:44]

Together, we have not only heard sound samples here, but you could also see the accompanying video content, which reveals the position of the players and the composition of the scene in which the recording was made. This sounds like a perfect fit for an AI: take a piece of mono audio and use this additional visual information to convert it to make it sound binaural. This project is exactly about that: a deep convolutional neural network looks at both the video and the single-channel audio content in our footage, and then predicts what it would have sounded like were it recorded as a binaural signal. Let's listen to a few results.

### MONO2BINAURAL (our method) [2:19]

The fact that we can use the visual content as well as the audio with this neural network also enables us to separate the sound of an individual instrument within the mix. Let's listen.

### separated sound for cello [2:49]

To validate the results, the authors used a quantitative, mathematical way of comparing their results to the ground truth, and not only that, they also carried out two user studies. In the first one, the ground truth was shown to the users, who were asked to judge which of two techniques came closer to it; in this study, the new method performed better than previous methods. In the second setup, users were asked to name the directions they heard the different instrument sounds coming from; here, the new method outperformed the previous techniques by a significant margin. And if we keep progressing like this, we may be at most a couple of papers away from 2.5D audio synthesis that sounds indistinguishable from the real deal. I am looking forward to a future where we can enjoy all kinds of video content with this kind of immersion. Thanks for watching and for your generous support, and I'll see you next time!
