# AI Learns Real-Time Defocus Effects in VR

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=Do_00r8NGMY
- **Date:** 30.01.2019
- **Duration:** 4:20
- **Views:** 36,583

## Description

The paper "DeepFocus: Learned Image Synthesis for Computational Displays" and its source code are available here:
https://research.fb.com/publications/deepfocus-siggraph-asia-2018/
https://www.oculus.com/blog/introducing-deepfocus-the-ai-rendering-system-powering-half-dome/
https://github.com/facebookresearch/DeepFocus

Pick up cool perks on our Patreon page:
https://www.patreon.com/TwoMinutePapers

We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
313V, Alex Haro, Andrew Melnychuk, Angelos Evripiotis, Anthony Vdovitchenko, Brian Gilman, Christian Ahlin, Christoph Jadanowski, Dennis Abts, Eric Haddad, Eric Martel, Evan Breznyik, Geronimo Moralez, Jason Rollins, Javier Bustamante, John De Witt, Kaiesh Vohra, Kjartan Olason, Lorin Atzberger, Marcin Dukaczewski, Marten Rauschenberg, Maurits van Mastrigt, Michael Albrecht, Michael Jensen, Morten Punnerud Engelstad, Nader Shakerin, Owen Campbell-Moore, Owen Skarpness, Raul Araújo da Silva, Richard Reis, Rob Rowe, Robin Graham, Ryan Monsurate, Shawn Azman, Steef, Steve Messina, Sunil Kim, Thomas Krcmar, Torsten Reil, Zach Boldyga, Zach Doty.

Splash screen/thumbnail design: Felícia Fehér - http://felicia.hu

Károly Zsolnai-Fehér's links:
Facebook: https://www.facebook.com/TwoMinutePapers/
Twitter: https://twitter.com/karoly_zsolnai
Web: https://cg.tuwien.ac.at/~zsolnai/

## Contents

### [0:00](https://www.youtube.com/watch?v=Do_00r8NGMY) Segment 1 (00:00 - 04:00)

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. If we are to write a sophisticated light simulation program and make a list of features that we really wish to have, defocus effects should definitely be on it. This is what they look like, and to render them, our simulation program has to take into consideration the geometry and thickness of the lenses within our virtual camera. Even though the result looks absolutely amazing, it is very costly to simulate properly. This particular technique attempts to do it in real time, for specialized display types, typically the ones found in head-mounted displays for virtual reality applications. So here we go: due to popular request, a little VR in Two Minute Papers.

In virtual reality, defocus effects are especially important because they mimic how the human visual system works. Only the tiny region that we're focusing on looks sharp, and everything else should be blurry; but not just any kind of blurry, it has to look physically plausible. If we can pull this off just right, we'll get a great and immersive VR experience. (The first sketch after this segment illustrates how blur size depends on depth and lens geometry.)

The heart of this problem is looking at a 2D image and being able to estimate how far away different objects are from the camera lens. This is a task that is relatively easy for humans, because we have an intuitive understanding of depth and geometry, but of course, it is no easy task for a machine. To accomplish this, a convolutional neural network is used here, and our seasoned Fellow Scholars know that this means we need a ton of training data. The input should be a bunch of images and their corresponding depth maps for the neural network to learn from. The authors implemented this with a random scene generator, which creates a bunch of these crazy scenes with a lot of occlusions and computes, via simulation, the appropriate depth map for each of them. On the right, you see these depth maps, or in other words, images that describe to the computer how far away these objects are. (A toy version of this kind of data generation appears in the second sketch below.) The incredible thing is that the neural network was able to learn the concept of occlusions and create super high quality defocus effects.

Not only that, but this technique can also be reconfigured to fit different use cases: if we are okay with spending up to 50 milliseconds to render an image, which is 20 frames per second, we can get super high-quality images; or, if we only have a budget of 5 milliseconds per image, which is 200 frames per second, we can do that too, and the quality of the outputs degrades just a tiny bit.

While we are talking about image quality, let's have a closer look at the paper, where we see a ton of comparisons against previous works and, of course, against the baseline ground truth. You see two metrics here: PSNR, the peak signal-to-noise ratio, and SSIM, the structural similarity index. Both are used here to measure how close the output of these techniques is to the ground truth footage, and for both, higher is better. For instance, here you see that the second best technique has a peak signal-to-noise ratio of around 40, and this new method scores 45. Some may think that's just a 12-ish percent difference, right? No. PSNR works on a logarithmic scale, which means that even a tiny difference in these numbers translates to a huge difference in terms of visuals. You can see in the closeups that the output of this new method is close to indistinguishable from the ground truth. (The third sketch below makes the logarithmic-scale point concrete.)
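To build intuition for why physically plausible blur depends on depth and lens geometry, here is a minimal sketch of the textbook thin-lens circle-of-confusion model. This is standard optics used as a stand-in, not the paper's exact formulation, and all parameter names and numbers are illustrative.

```python
import numpy as np

def circle_of_confusion(depth, focus_depth, focal_length, aperture):
    """Thin-lens circle-of-confusion diameter (all lengths in the same unit).

    Standard textbook model, not DeepFocus' formulation:
        CoC = A * f * |d - d_f| / (d * (d_f - f))
    """
    return (aperture * focal_length * np.abs(depth - focus_depth)
            / (depth * (focus_depth - focal_length)))

# Hypothetical example: a 50 mm lens with a 25 mm aperture, focused at 2 m.
depths_mm = np.array([500.0, 1000.0, 2000.0, 4000.0, 8000.0])
print(circle_of_confusion(depths_mm, focus_depth=2000.0,
                          focal_length=50.0, aperture=25.0))
# The blur diameter is zero at the focus depth and grows away from it.
# Because it varies per pixel with depth, simulating it naively is
# expensive -- this is the behavior the network has to reproduce.
```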
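The authors' actual random scene generator ships with the source code linked above; as a hedged illustration of the idea, here is a toy numpy version that stamps random rectangles at random depths and resolves occlusions by keeping the nearest surface, producing the kind of (image, depth map) pairs described in the segment. Everything in it is a simplified stand-in, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_scene(size=128, n_objects=12, near=0.2, far=10.0):
    """Toy stand-in for a random scene generator: returns an RGB image and
    its depth map, with occlusions resolved by keeping the nearest surface."""
    image = np.zeros((size, size, 3), dtype=np.float32)
    depth = np.full((size, size), far, dtype=np.float32)  # background depth
    for _ in range(n_objects):
        x0, y0 = rng.integers(0, size - 8, size=2)
        w, h = rng.integers(8, size // 2, size=2)
        d = rng.uniform(near, far)              # this rectangle's depth
        color = rng.uniform(0.0, 1.0, size=3)   # random flat albedo
        region = (slice(int(y0), min(int(y0) + int(h), size)),
                  slice(int(x0), min(int(x0) + int(w), size)))
        closer = depth[region] > d              # occlusion test
        image[region][closer] = color           # nearest surface wins
        depth[region][closer] = d
    return image, depth

img, dmap = random_scene()    # one (input, target) training pair
print(img.shape, dmap.shape)  # (128, 128, 3) (128, 128)
```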
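To make the logarithmic-scale argument concrete, here is a small numpy sketch of PSNR (SSIM has a standard implementation in scikit-image as `skimage.metrics.structural_similarity`). A jump from 40 dB to 45 dB means the mean squared error dropped by a factor of 10^(5/10), roughly 3.16x, which is far more than the 12-ish percent the raw numbers suggest.

```python
import numpy as np

def psnr(reference, test, max_val=1.0):
    """Peak signal-to-noise ratio in decibels: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((np.asarray(reference) - np.asarray(test)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

# Equal dB steps are equal *ratios* of error, not equal differences:
# +5 dB (40 -> 45) means the MSE shrank by a factor of 10^(5/10).
print(10.0 ** (5.0 / 10.0))  # ~3.162

# Tiny sanity check on a random "image" in [0, 1]:
rng = np.random.default_rng(1)
ref = rng.uniform(size=(64, 64))
print(psnr(ref, ref + 0.01))  # constant 0.01 error -> MSE 1e-4 -> 40 dB
```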
A neural network that successfully learned the concept of occlusions and depth by looking at random scenes. Bravo. As virtual reality applications are on the rise these days, this technique will be useful for providing a more immersive experience for users. And to make sure that this method sees more widespread use, the authors also made the source code and the training datasets available for everyone, free of charge, so make sure to have a look and run your own experiments if you're interested. I'll be doing that in the meantime. Thanks for watching and for your generous support, and I'll see you next time!

---
*Source: https://ekstraktznaniy.ru/video/14364*