# Visually Indicated Sounds | Two Minute Papers #79

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=flOevlA9RyQ
- **Date:** 17.07.2016
- **Duration:** 4:12
- **Views:** 7,721

## Description

The Scholarly Store is available here: https://shop.spreadshirt.net/TwoMinutePapers

Using the power of deep learning, it is now possible to create a technique that looks at a silent video and synthesizes appropriate sound effects for it. Usage is, at the moment, limited to hitting objects with a drumstick.

Note: The authors seem to lean on a database of sounds, i.e., the synthesis does not happen from scratch. However, they are not merely fetching the database entry for a given sound; they perform example-based synthesis (Section 5.2 in the paper below). Both the video and the paper use the terms "synthesized sound" and "predicted sound", and it may be a bit unclear what degree of synthesis qualifies as a "synthesized sound". I think this is definitely worthy of further scrutiny.
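The retrieval flavor of this example-based step can be sketched in a few lines. Everything below is a toy stand-in, not the paper's method: the database size, the 8-dimensional "sound feature" vectors, and the clip lengths are all invented for illustration, and the real system works with learned features and a more involved matching procedure.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical database of 100 recorded impact sounds, each summarized
# by an 8-dim feature vector (think: a compressed spectrogram summary).
db_features = rng.standard_normal((100, 8))
db_waveforms = [rng.standard_normal(2205) for _ in range(100)]  # ~0.05 s clips

def example_based_synthesis(predicted_features):
    """Return the stored waveform whose features best match the
    network's predicted sound features (L2 nearest neighbour)."""
    dists = np.linalg.norm(db_features - predicted_features, axis=1)
    best = int(np.argmin(dists))
    return best, db_waveforms[best]

# What the network might output for one drumstick impact (dummy values).
pred = rng.standard_normal(8)
idx, wave = example_based_synthesis(pred)
print(idx, wave.shape)
```

The point of the sketch is only the control flow: the network predicts sound features, and the output audio is assembled from real recorded examples rather than generated sample-by-sample from scratch.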

_____________________________________

The paper "Visually Indicated Sounds" is available here:
https://arxiv.org/abs/1512.08512

Recommended for you:
What Do Virtual Objects Sound Like? - https://www.youtube.com/watch?v=ZaFqvM1IsP8&index=37&list=PLujxSBD-JXgnqDD1n-V30pKtp6Q886x7e
Synthesizing Sound From Collisions - https://www.youtube.com/watch?v=rskdLEl05KI&index=51&list=PLujxSBD-JXgnqDD1n-V30pKtp6Q886x7e
Reconstructing Sound From Vibrations - https://www.youtube.com/watch?v=2i1hrywDwPo&index=83&list=PLujxSBD-JXgnqDD1n-V30pKtp6Q886x7e

Our deep learning-related videos are available here (including convolutional neural networks and recurrent neural networks):
https://www.youtube.com/playlist?list=PLujxSBD-JXglGL3ERdDOhthD3jTlfudC2

WE WOULD LIKE TO THANK OUR GENEROUS PATREON SUPPORTERS WHO MAKE TWO MINUTE PAPERS POSSIBLE:
David Jaenisch, Sunil Kim, Julian Josephs, Daniel John Benton.
https://www.patreon.com/TwoMinutePapers

We also thank Experiment for sponsoring our series. - https://experiment.com/

Subscribe if you would like to see more of these! - http://www.youtube.com/subscription_center?add_user=keeroyz

The thumbnail background image was created by slgckgc - https://flic.kr/p/9x93qE
Splash screen/thumbnail design: Felícia Fehér - http://felicia.hu

Károly Zsolnai-Fehér's links:
Facebook → https://www.facebook.com/TwoMinutePapers/
Twitter → https://twitter.com/karoly_zsolnai
Web → https://cg.tuwien.ac.at/~zsolnai/

## Contents

### [0:00](https://www.youtube.com/watch?v=flOevlA9RyQ) Segment 1 (00:00 - 04:00)

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. This name is not getting any easier, is it? It used to be Károly Zsolnai, which was hard enough, and now this... haha. Anyway, let's get started. This technique simulates how different objects in a video sound when struck. We have showcased some marvelous previous techniques that were mostly limited to wooden and plastic materials. Needless to say, there are links to these episodes in the video description box.

A convolutional neural network takes care of understanding what is seen in the video. This technique is known to be particularly well suited to processing image and video content. It works by looking at the silent video directly and trying to understand what is going on, just like a human would. We train these networks with input and output pairs - the input is a video of us beating the hell out of some object with a drumstick. The joys of research! And the output is the sound this object emits.

However, the output sound is something that changes in time. It is a sequence, and therefore it cannot be handled by a simple classical neural network. It is learned by a recurrent neural network that can take care of learning such sequences. If you haven't heard these terms before, no worries, we have previous episodes on all of them in the video description box, make sure to check them out!

This piece of work is a nice showcase of combining two quite powerful techniques: the convolutional neural network tries to understand what happens in the input video, and the recurrent neural network seals the deal by learning and guessing the correct sound that the objects shown in the video would emit when struck. The synthesized outputs were compared to the real world results both mathematically and by asking humans to try to tell, from the two samples, which one is the real deal.
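The two-stage pipeline described above can be sketched with plain numpy. This is only a shape-level toy, not the paper's model: the "CNN" here is just a fixed random projection of each frame, the recurrent part is a vanilla RNN rather than the LSTM the authors use, and all dimensions (32x32 frames, 64 visual features, 16 sound features) are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

FRAME_PIX, FEAT, HID, SOUND = 32 * 32, 64, 32, 16

# Stand-in "CNN": a fixed random linear projection of each flattened frame.
W_cnn = rng.standard_normal((FEAT, FRAME_PIX)) * 0.05

# Toy vanilla RNN parameters (the real system uses an LSTM).
W_xh = rng.standard_normal((HID, FEAT)) * 0.05
W_hh = rng.standard_normal((HID, HID)) * 0.05
W_hy = rng.standard_normal((SOUND, HID)) * 0.05

def predict_sound_features(video):
    """video: (T, 32, 32) grayscale frames -> (T, SOUND) sound features."""
    h = np.zeros(HID)
    out = []
    for frame in video:
        x = W_cnn @ frame.ravel()          # per-frame visual features
        h = np.tanh(W_xh @ x + W_hh @ h)   # recurrent state carries context
        out.append(W_hy @ h)               # per-frame sound-feature guess
    return np.stack(out)

video = rng.standard_normal((10, 32, 32))  # a 10-frame dummy "silent video"
sound = predict_sound_features(video)
print(sound.shape)  # (10, 16)
```

The structural takeaway matches the narration: one network summarizes what is visible in each frame, and a recurrent network turns that sequence of summaries into a time-varying sound prediction.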
These people were fooled by the algorithm around 40% of the time, which I find to be a really amazing result, considering two things: first, the baseline is not 50%, but 0%, because people don't pick choices at random - we cannot reasonably expect a synthesized sound to fool humans at any time. Like nice little neural networks, we've been trained to recognize these sounds all our lives, after all. And second, this is one of the first papers from a machine learning angle on sound synthesis. Before reading the paper, I expected at most 10 or 20 percent, if that.

The tidal wave of machine learning runs through a number of different scientific fields. Will deep learning techniques establish supremacy in these areas? Hard to say yet, but what we know for sure is that great strides are made literally every week. There are so many works out there, sometimes I don't even know where to start. Good times indeed!

Before we go, some delightful news for you Fellow Scholars! The Scholarly Two Minute Papers store is now open! There are two different kinds of men's T-shirts available, and a nice sleek design version that we made for the Fellow Scholar ladies out there! We also have The Scholarly Mug to get your day started in the most scientific way possible. We have tested the quality of these products and were really happy with what we got. If you ordered anything, please provide us feedback on how you liked the quality of the delivery and the products themselves. If you can send us an image of yourself wearing or using any of these, we'd love to have a look. Just leave them in the comments section or tweet at us! If you don't like what you got, within 30 days, you can exchange it or get your product cost refunded. Thanks for watching, and for your generous support, and I'll see you next time!

---
*Source: https://ekstraktznaniy.ru/video/14803*