# An AI Learned To See Through Obstructions! 👀

## Метаданные

- **Канал:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=ICr6xi9wA94
- **Дата:** 28.07.2020
- **Длительность:** 5:24
- **Просмотры:** 108,797

## Описание

❤️ Check out Snap's Residency Program and apply here: https://lensstudio.snapchat.com/snap-ar-creator-residency-program/?utm_source=twominutepapers&utm_medium=video&utm_campaign=tmp_ml_residency
❤️ Try Snap's Lens Studio here: https://lensstudio.snapchat.com/

📝 The paper "Learning to See Through Obstructions" is available here:
https://alex04072000.github.io/ObstructionRemoval/
https://github.com/alex04072000/ObstructionRemoval

📝 Try it out here: https://colab.research.google.com/drive/1iOKknc0dePekUH2TEh28EhcRPCS1mgwz

 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Aleksandr Mashrabov, Alex Haro, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bruno Mikuš, Bryan Learn, Christian Ahlin, Daniel Hasegan, Eric Haddad, Eric Martel, Gordon Child, Javier Bustamante, Lorin Atzberger, Lukas Biewald, Michael Albrecht, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Ramsey Elbasheer, Robin Graham, Steef, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh.
More info if you would like to appear here: https://www.patreon.com/TwoMinutePapers

Thumbnail background image credit: https://pixabay.com/images/id-451982/

Károly Zsolnai-Fehér's links:
Instagram: https://www.instagram.com/twominutepapers/
Twitter: https://twitter.com/twominutepapers
Web: https://cg.tuwien.ac.at/~zsolnai/

## Содержание

### [0:00](https://www.youtube.com/watch?v=ICr6xi9wA94) Segment 1 (00:00 - 05:00)

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Approximately two years ago, we covered a work where a learning-based algorithm was able to read the wifi signals in a room to not only locate a person in a building, but even estimate their pose. An additional property of this method was that, as you see here, it does not look at images, but radio signals, which also traverse in the dark, and therefore, this pose estimation also works well in poor lighting conditions. Today’s paper offers a learning-based method for a different, more practical data completion problem, and it mainly works on image sequences. You see, we can give this one a short image sequence with obstructions, for instance, the fence here. And it is able to find and remove this obstruction, and not only that, but it can also show us what is exactly behind the fence! How is that even possible? Well, note that we mentioned that the input is not an image, but an image sequence, a short video if you will. This contains the scene from different viewpoints, and is one of the typical cases where if we would give all this data to a human, this human would take a long-long time, but would be able to reconstruct what is exactly behind the fence, because this data was visible from other viewpoints. But of course, clearly, this approach would be prohibitively slow and expensive. The cool thing here is that this learning-based method is capable of doing this, automatically! But it does not stop there. I was really surprised to find out that it even works for video outputs as well, so if you did not have a clear sight of that tiger in the zoo, do not despair! Just use this method, and there you go. When looking at the results of techniques like this, I always try to only look at the output, and try to guess where the fence was obstructing it. With many simpler, image inpainting techniques, this is easy to tell if you look for it, but here, I can’t see a trace. Can you? Let me know in the comments. Admittedly, the resolution of this video is not very high, but the results look very reassuring. It can also perform reflection removal, and some of the input images are highly contaminated by these reflected objects. Let’s have a look at some results! You can see here how the technique decomposes the input into two images, one with the reflection, and one without. The results are clearly not perfect, but they are easily good enough to make my brain focus on the real background without being distracted by the reflections. This was not the case with the input at all. Bravo! This use case can also be extended for videos, and I wonder how much temporal coherence I can expect in the output. In other words, if the technique solves the adjacent frames too differently, flickering is introduced to the video, and this effect is the bane of many techniques that are otherwise really good on still images. Let’s have a look! There is a tiny bit of flickering, but the results are surprisingly consistent. It also does quite well when compared to previous methods, especially when we are able to provide multiple images as an input. Now note that it says “ours, without online optimization”. What could that mean? This online optimization step is a computationally expensive way to further improve separation in the outputs, and with that, the authors propose a quicker and a slower version of the technique. The one without the on-line optimization step runs in just a few seconds, and if we add this step, we will have to wait approximately 15 minutes. I had to read the table several times, because researchers typically bring the best version of their technique to the comparisons, and it is not the case here. Even the quicker version smokes the competition! Loving it. Note if you have a look at the paper in the video description, there are, of course, more detailed comparisons against other methods as well. If these AR glasses that we hear so much about come to fruition in the next few years, having an algorithm for real-time glare, reflection and obstruction removal would be beyond amazing. We truly live in a science fiction world. What a time

### [5:00](https://www.youtube.com/watch?v=ICr6xi9wA94&t=300s) Segment 2 (05:00 - 05:00)

to be alive! Thanks for watching and for your generous support, and I'll see you next time!

---
*Источник: https://ekstraktznaniy.ru/video/14096*