DeepMind's AI Learns Object Sounds | Two Minute Papers #224
4:03

Two Minute Papers · 30.01.2018 · 20,417 views · 861 likes

Video description
The paper "Objects that Sound" is available here:
https://arxiv.org/abs/1712.06651
https://www.youtube.com/watch?v=TFyohksFd48
https://www.youtube.com/watch?v=x_qusr58ruU

Our Patreon page with the details: https://www.patreon.com/TwoMinutePapers
One-time payment links are available below. Thank you very much for your generous support!
PayPal: https://www.paypal.me/TwoMinutePapers
Bitcoin: 13hhmJnLEzwXgmgJN7RB6bWVdT7WkrFAHh
Ethereum: 0x002BB163DfE89B7aD0712846F1a1E53ba6136b5A
LTC: LM8AUh5bGcNgzq6HaV1jeaJrFvmKxxgiXg

Recommended for you:
Look, Listen & Learn - https://www.youtube.com/watch?v=mL3CzZcBJZU

We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Andrew Melnychuk, Brian Gilman, Christian Ahlin, Christoph Jadanowski, Dave Rushton-Smith, Dennis Abts, Emmanuel, Eric Haddad, Esa Turkulainen, Evan Breznyik, Frank Goertzen, Kaben Gabriel Nanlohy, Malek Cellier, Marten Rauschenberg, Michael Albrecht, Michael Jensen, Michael Orenstein, Raul Araújo da Silva, Robin Graham, Shawn Azman, Steef, Steve Messina, Sunil Kim, Torsten Reil.
https://www.patreon.com/TwoMinutePapers

Music: Antarctica by Audionautix is licensed under a Creative Commons Attribution license (https://creativecommons.org/licenses/by/4.0/)
Artist: http://audionautix.com/

Thumbnail background image credit: https://pixabay.com/photo-756326/
Splash screen/thumbnail design: Felícia Fehér - http://felicia.hu

Károly Zsolnai-Fehér's links:
Facebook: https://www.facebook.com/TwoMinutePapers/
Twitter: https://twitter.com/karoly_zsolnai
Web: https://cg.tuwien.ac.at/~zsolnai/

Contents (1 segment)

Segment 1 (00:00 - 04:00)

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. This work is about creating an AI that can perform audio-visual correspondence. This means two really cool tasks: One, when given a piece of video and audio, it can guess whether they match each other. And two, it can localize the source of the sounds heard in the video. Hm-hmm! And wait, because this gets even better! As opposed to previous works, here, the entire network is trained from scratch and is able to perform cross-modal retrieval. Cross-modal retrieval means that we are able to give it an input sound and it will be able to find pictures that would produce similar sounds. Or vice versa. For instance, here, the input is the sound of a guitar, note the loudspeaker icon in the corner, and it shows us a bunch of either images or sounds that are similar. Marvelous.

The training is unsupervised, which means that the algorithm is given a bunch of data and learns without additional labels or instructions. The architecture and results are compared to a previous work called Look, Listen & Learn that we covered earlier in the series; the link is available in the video description. As you can see, both of them run a convolutional neural network. This is one of my favorite parts about deep learning: the very same algorithm is able to process and understand signals of very different kinds, video and audio.

The old work concatenates this information and produces a binary yes/no decision on whether it thinks the two streams match. This new work tries to produce a number that encodes the distance between the video and the audio. Kind of like the distance between two countries on a map, but both video and audio signals are embedded in the same map. And the output decision always depends on how small or big this distance is.
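
To make the "same map" idea concrete, here is a minimal sketch of a distance-based match decision. It is not the paper's network: the embedding vectors are made-up toy numbers, and `l2_normalize`, `streams_match` and the fixed `threshold` are hypothetical names and choices introduced only for illustration. The video does not say how the distance is turned into a decision, so a plain threshold is assumed here.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so distances are comparable."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def streams_match(image_embedding, audio_embedding, threshold=1.0):
    """Return (decision, distance) for one image/audio embedding pair.

    Assumption: a small distance in the shared space means the streams
    correspond. The threshold value is arbitrary, not taken from the paper.
    """
    distance = math.dist(l2_normalize(image_embedding),
                         l2_normalize(audio_embedding))
    return distance < threshold, distance


# Toy embeddings (invented for this sketch, not produced by a real model).
guitar_frame = [0.9, 0.1, 0.0]
guitar_audio = [0.8, 0.2, 0.1]
drum_audio = [0.0, 0.1, 0.9]

print(streams_match(guitar_frame, guitar_audio))  # small distance, match
print(streams_match(guitar_frame, drum_audio))    # large distance, no match
```

In a real system the two embeddings would come from the vision and audio networks mentioned above; only the final comparison step is shown.
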
This distance metric is quite useful: if we have an input video or audio signal, choosing other video and audio snippets that have a low distance is one of the important steps that opens up the door to this magical cross-modal retrieval. What a time to be alive!

Some results are very easy to verify, others may spark some more debate. For instance, it is quite interesting to see that the algorithm highlights the entirety of the guitar string as a sound source. If you are curious about this mysterious blue image here, make sure to have a look at the paper for an explanation.

Now this is a story that we would like to tell as many people as possible. Everyone needs to hear about this. If you would like to help us with our quest, please consider supporting us on Patreon. You can also pick up some cool perks, like getting early access to these videos or deciding the order of upcoming episodes. Details are available in the video description. Thanks for watching and for your generous support, and I'll see you next time!
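
The retrieval step described in this segment, picking the snippets with the lowest distance to a query, can be sketched as a simple nearest-neighbor ranking. This is an illustration under stated assumptions, not the authors' implementation: the `retrieve` helper, the `library` dictionary and all embedding values are hypothetical and invented for the example.

```python
import math


def retrieve(query_embedding, library, top_k=2):
    """Rank library items by distance to the query and return the closest names.

    Because images and sounds are assumed to live in the same embedding
    space, the query can be a sound and the library can hold images,
    or the other way around.
    """
    ranked = sorted(
        library.items(),
        key=lambda item: math.dist(query_embedding, item[1]),
    )
    return [name for name, _ in ranked[:top_k]]


# Toy image embeddings (made-up numbers standing in for network outputs).
library = {
    "guitar_photo": [0.9, 0.1],
    "piano_photo": [0.5, 0.5],
    "drum_photo": [0.1, 0.9],
}

# Toy embedding of a guitar sound used as the query.
guitar_sound = [0.8, 0.2]

print(retrieve(guitar_sound, library))
```

With real embeddings the library would be far larger, and an approximate nearest-neighbor index would likely replace the full sort.
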
