# This AI Shows Us the Sound of Pixels

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=o-LU_Dja6Ks
- **Date:** 08.11.2018
- **Duration:** 3:27
- **Views:** 31,552

## Description

The paper "The Sound of Pixels" is available here:
http://sound-of-pixels.csail.mit.edu/

Pick up cool perks on our Patreon page:
› https://www.patreon.com/TwoMinutePapers

We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
313V, Andrew Melnychuk, Angelos Evripiotis, Anthony Vdovitchenko, Brian Gilman, Christian Ahlin, Christoph Jadanowski, Dennis Abts, Emmanuel, Eric Haddad, Eric Martel, Evan Breznyik, Geronimo Moralez, John De Witt, Kjartan Olason, Lorin Atzberger, Marten Rauschenberg, Maurits van Mastrigt, Michael Albrecht, Michael Jensen, Morten Punnerud Engelstad, Nader Shakerin, Owen Skarpness, Raul Araújo da Silva, Rob Rowe, Robin Graham, Ryan Monsurate, Shawn Azman, Steef, Steve Messina, Sunil Kim, Thomas Krcmar, Torsten Reil, Zach Boldyga.
https://www.patreon.com/TwoMinutePapers

Thumbnail background image credit: https://pixabay.com/photo-1606337/

YouTube video credits:
Old Wine - https://www.youtube.com/watch?v=bLjS6E6c0IA
Fabian Rivero - https://www.youtube.com/watch?v=-HLTNgdajqw
Michael Mikulka - https://www.youtube.com/watch?v=n8-2q4dheyU
Splash screen/thumbnail design: Felícia Fehér - http://felicia.hu

Crypto and PayPal links are available below. Thank you very much for your generous support!
› PayPal: https://www.paypal.me/TwoMinutePapers
› Bitcoin: 13hhmJnLEzwXgmgJN7RB6bWVdT7WkrFAHh
› Ethereum: 0x002BB163DfE89B7aD0712846F1a1E53ba6136b5A
› LTC: LM8AUh5bGcNgzq6HaV1jeaJrFvmKxxgiXg

Károly Zsolnai-Fehér's links:
Facebook: https://www.facebook.com/TwoMinutePapers/
Twitter: https://twitter.com/karoly_zsolnai
Web: https://cg.tuwien.ac.at/~zsolnai/

## Contents

### [0:00](https://www.youtube.com/watch?v=o-LU_Dja6Ks) Segment 1 (00:00 - 03:00)

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. This is a neural network-based method that is able to show us the sound of pixels. What this means is that it separates and localizes audio signals in videos. The two keywords are separation and localization, so let's take a look at these one by one. Localization means that we can pick a pixel in the image and it shows us the sound that comes from that location, and the separation part means that ideally, we will only hear that particular sound source.

Let's have a look at an example. Here is an input video. And now, let's try to separate the sound of the cello and see if it knows where it comes from. Same with the guitar. Now for a trickier question... there are sound reverberations off the walls, but the walls don't directly emit sound themselves, so I am hoping to hear nothing now. Let's see... flat signal, great!

So, how does this work? It is a neural network-based solution that has watched 60 hours of musical performances to be able to pull this off, and it learns that a change in sound can often be traced back to a change in the video footage as a musician is playing an instrument. As a result, get this, no supervision is required. This means that we don't need to label this data, or in other words, we don't need to specify how each pixel sounds; it learns to infer all this information from the video and sound signals by itself. This is huge; otherwise, just imagine how many work-hours it would take to annotate all this data.

And another cool application is that if we can separate these signals, then we can also independently adjust the sound of these instruments. Have a look. Now, clearly, it is not perfect, as some frequencies may bleed over from one instrument to the other, and there are also other methods to separate audio signals, but this particular one does not require any expertise, so I see a great value proposition there.
If you wish to create a separate version of a video clip and use it for karaoke, or just subtract the guitar and play it yourself, I would look no further. Also, you know the drill, this will be way better a couple papers down the line. So, what do you think? What possible applications do you envision for this? Where could it be improved? Let me know below in the comments. Thanks for watching and for your generous support, and I'll see you next time!
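
The independent volume adjustment and the "subtract the guitar" idea mentioned above can be pictured as time-frequency masking. What follows is a minimal Python sketch under stated assumptions, not the paper's implementation: it assumes that some separation model has already produced one soft mask per instrument over the magnitude spectrogram of the mixed audio. The function name `remix`, the instrument labels, and every number are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# A spectrogram is indexed as [frequency_bin][time_frame] and holds magnitudes.
Spectrogram = List[List[float]]


def remix(mixture: Spectrogram,
          masks: Dict[str, Spectrogram],
          gains: Dict[str, float]) -> Spectrogram:
    """Scale each masked source by its gain and sum the results."""
    out = [[0.0 for _ in row] for row in mixture]
    for name, mask in masks.items():
        gain = gains.get(name, 1.0)  # sources without a gain keep their level
        for f, row in enumerate(mixture):
            for t, value in enumerate(row):
                out[f][t] += gain * mask[f][t] * value
    return out


# Toy mixture with 3 frequency bins and 2 time frames (made-up values).
mixture = [[4.0, 2.0],
           [1.0, 3.0],
           [2.0, 2.0]]

# Hypothetical soft masks. They sum to 1 in every cell, and the middle bin
# is shared between both sources, which mimics frequency bleed.
masks = {
    "cello":  [[1.0, 1.0], [0.5, 0.5], [0.0, 0.0]],
    "guitar": [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]],
}

karaoke = remix(mixture, masks, {"guitar": 0.0})  # mute the guitar
unchanged = remix(mixture, masks, {})             # every gain stays at 1.0

print(karaoke)  # [[4.0, 2.0], [0.5, 1.5], [0.0, 0.0]]
```

Because the two toy masks sum to 1 in every cell, leaving all gains at 1.0 gives back the original mixture. Setting the guitar gain to 0.0 keeps only the cello's share, including its half of the shared middle bin.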

---
*Source: https://ekstraktznaniy.ru/video/14393*