Semantic Scene Completion From One Depth Image | Two Minute Papers #147

Two Minute Papers · 23.04.2017 · 13,032 views · 473 likes


Video description
The paper "Semantic Scene Completion from a Single Depth Image" is available here: http://sscnet.cs.princeton.edu/

Recommended for you:
How Does Deep Learning Work? - https://www.youtube.com/watch?v=He4t7Zekob0
Artificial Neural Networks and Deep Learning - https://www.youtube.com/watch?v=rCWTOOgVXyE

WE WOULD LIKE TO THANK OUR GENEROUS PATREON SUPPORTERS WHO MAKE TWO MINUTE PAPERS POSSIBLE: Andrew Melnychuk, Christian Lawson, Daniel John Benton, Dave Rushton-Smith, Sunil Kim, VR Wizard. https://www.patreon.com/TwoMinutePapers

Awesome Two Minute Papers merch: http://twominutepapers.com/

Music: Antarctica by Audionautix is licensed under a Creative Commons Attribution license (https://creativecommons.org/licenses/by/4.0/) Artist: http://audionautix.com/

Thumbnail background image credit: https://pixabay.com/photo-2225414/
Splash screen/thumbnail design: Felícia Fehér - http://felicia.hu

Károly Zsolnai-Fehér's links:
Facebook → https://www.facebook.com/TwoMinutePapers/
Twitter → https://twitter.com/karoly_zsolnai
Web → https://cg.tuwien.ac.at/~zsolnai/

Table of contents (6 segments)

<Untitled Chapter 1>

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. This piece of work is an amazing application of deep neural networks that performs semantic

Semantic scene completion

scene completion from only one depth image. This depth image is the colorful image that you see here, where the colors denote how

Problem definition

far away different objects are from our camera. We can create these images inexpensively with commodity hardware; for instance, Microsoft's Kinect has a depth sensor that is suitable for this task. The scene completion part means that from this highly incomplete depth information, the algorithm reconstructs the geometry for the entirety of the room. Even parts that are completely missing from our images, or things that are occluded! The output is what computer graphics researchers like to call a volumetric representation, or a voxel array, which is essentially a large collection of tiny Lego pieces that build up the scene. But this is not all, because the semantic part means that the algorithm actually understands what we're looking at, and thus is able to classify different parts of the scene. These classes include walls, windows, floors, sofas, and other furniture. Previous works were able to do scene completion and geometry classification, but the coolest part of this algorithm is that it not only does these steps way better, but it does them both at the very same time. This work uses a 3D convolutional neural network to accomplish this task. The 3D part is required for this learning algorithm to be able to operate on this kind of volumetric data. As you can see, the results are excellent, and are remarkably close to the ground truth data. If you remember, not so long ago, I flipped out when I saw the first neural network-based techniques that understood 3D geometry from 2D images. That technique used a much more complicated architecture, a generative adversarial network, which also didn't do scene completion, and on top of that, the resolution of the output

SUNCG dataset: over 40K houses

was way lower, which intuitively means that the Lego pieces were much larger. This is insanity. The rate of progress in machine learning research is just stunning, probably even for you seasoned Fellow Scholars who watch Two Minute Papers and have high expectations. We've had plenty of previous episodes about the inner workings of different kinds of neural networks. I've put some links to them in the video description; make sure to have a look if you wish to brush

Synthesizing training data

up on your machine learning kung fu a bit. The authors also published a new dataset to solve these kinds of problems in future research works, and it is also super useful because the output of their technique can be compared to ground truth data. When new solutions pop up in the future, this dataset can be used as a yardstick to compare

Testing

results with. The source code for this project is also available. Tinkerers rejoice! Thanks for watching and for your generous support, and I'll see you next time!
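For a concrete picture of what the "3D part" of the convolutional network means, here is a minimal illustrative sketch of a single 3D convolution sliding over a voxel occupancy grid. This is pure NumPy with a made-up grid and kernel for the example; it is not the paper's architecture, only the core operation such a network repeats with many learned kernels:

```python
import numpy as np

def conv3d_valid(volume, kernel):
    """Naive 'valid' 3D convolution: slide a kernel over a voxel grid.

    volume: (D, H, W) array, e.g. an occupancy grid of the scene.
    kernel: (kd, kh, kw) array of weights (learned, in a real network).
    Returns the (D-kd+1, H-kh+1, W-kw+1) response volume.
    """
    kd, kh, kw = kernel.shape
    D, H, W = volume.shape
    out = np.zeros((D - kd + 1, H - kh + 1, W - kw + 1))
    for z in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                # Weighted sum over a 3D window of "Lego pieces".
                patch = volume[z:z + kd, y:y + kh, x:x + kw]
                out[z, y, x] = np.sum(patch * kernel)
    return out

# Toy voxel grid: a 2x2x2 block of occupied voxels inside empty space.
grid = np.zeros((6, 6, 6))
grid[2:4, 2:4, 2:4] = 1.0

# An averaging kernel responds most strongly where the window is fullest.
kernel = np.ones((3, 3, 3)) / 27.0
response = conv3d_valid(grid, kernel)
print(response.shape)  # (4, 4, 4)
print(response.max())  # peak where the whole block fits in one window
```

A real scene completion network stacks many such kernels with learned weights and nonlinearities; the point here is only that the sliding window is three-dimensional, matching the voxel data.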
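As for using the dataset as a yardstick: a common way to score a predicted voxel labeling against ground truth is per-class intersection-over-union (IoU). Below is a minimal sketch with a toy 4x4x4 scene and made-up class ids; it illustrates the idea of the comparison, not the paper's exact evaluation protocol:

```python
import numpy as np

def per_class_iou(pred, gt, num_classes):
    """Voxel-level intersection-over-union for each semantic class.

    pred, gt: integer label volumes of the same shape (0 = empty space).
    Returns a dict mapping class id -> IoU, skipping classes that are
    absent from both volumes.
    """
    ious = {}
    for c in range(1, num_classes):  # skip class 0 (empty space)
        p = (pred == c)
        g = (gt == c)
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue
        ious[c] = np.logical_and(p, g).sum() / union
    return ious

# Toy ground truth: class 1 = floor slab, class 2 = a boxy "sofa".
gt = np.zeros((4, 4, 4), dtype=int)
gt[0] = 1                # floor occupies the bottom layer
gt[1:3, 1:3, 1:3] = 2    # a 2x2x2 piece of furniture

pred = gt.copy()
pred[1, 1, 1] = 0        # the prediction misses one sofa voxel

print(per_class_iou(pred, gt, num_classes=3))
```

A perfect completion scores 1.0 for every class; missing or hallucinated voxels pull the score down, which is what makes a shared ground-truth dataset so useful as a benchmark.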
