❤️ Check out Weights & Biases and sign up for a free demo here: https://www.wandb.com/papers
The post mentioned in the video is available here:
https://app.wandb.ai/latentspace/published-work/The-Science-of-Debugging-with-W%26B-Reports--Vmlldzo4OTI3Ng
📝 The paper "Consistent Video Depth Estimation" is available here:
https://roxanneluo.github.io/Consistent-Video-Depth-Estimation/
🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Aleksandr Mashrabov, Alex Haro, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bruno Mikuš, Bryan Learn, Christian Ahlin, Daniel Hasegan, Eric Haddad, Eric Martel, Gordon Child, Javier Bustamante, Lorin Atzberger, Lukas Biewald, Michael Albrecht, Nader S., Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Robin Graham, Steef, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh.
More info if you would like to appear here: https://www.patreon.com/TwoMinutePapers
Károly Zsolnai-Fehér's links:
Instagram: https://www.instagram.com/twominutepapers/
Twitter: https://twitter.com/twominutepapers
Web: https://cg.tuwien.ac.at/~zsolnai/
Table of Contents (1 segment)
Segment 1 (00:00 - 04:00)
Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. When we humans look at an image or a piece of video footage, we understand the geometry of the objects in it so well that, given the time and patience, we could draw a depth map that describes the distance of each object from the camera. This goes without saying. What does not go without saying is that if we could teach computers to do the same, we could do incredible things.

For instance, this learning-based technique creates real-time defocus effects for virtual reality and computer games, and this one performs the Ken Burns effect in 3D, or in other words, zooms and pans around in a photograph, but with a beautiful twist, because in the meantime, it also reveals the depth of the image. With this data, we can even try to teach self-driving cars about depth perception to enhance their ability to navigate around safely.

However, if you look here, you see two key problems: one, it is a little blurry, and there are lots of fine details that it couldn't resolve, and two, it is flickering. In other words, there are abrupt changes from one image to the next, which shouldn't be there, as the objects in the video feed are moving smoothly. Smooth motion should mean smooth depth maps, and it is getting there, but that is still not the case here. So I wonder: could we teach a machine to perform this task better? And more importantly, what new, wondrous things could we do if we pulled this off?

This new technique is called Consistent Video Depth Estimation, and it promises smooth and detailed depth maps of much higher quality than what previous works offer. And now, hold on to your papers, because finally, these maps contain enough detail to open up the possibility of adding new objects to the scene, or even flooding the room with water, or adding many other really cool video effects. All of these take the geometry of the existing real-world objects, for instance, cats, into consideration. Very cool!

The reason why we need such a consistent technique to pull this off is that if we have the flickering in time that we've seen here, then the depth of different objects suddenly bounces around over time, even for a stationary object. This means that in one frame, the ball would be in front of the person, while in the next one, the method would suddenly think it has to put the ball behind them, and then, in the next one, in front again, creating an animation that is not only jarring but quite unconvincing. What is really remarkable is that, due to the consistency of the technique, none of that happens here. Love it!

Here are some more results, where you can see that the outlines of the objects in the depth map are really crisp and follow the changes really well over time. The snowing example here is one of my favorites, and it is really convincing. However, there are still a few spots where we can find some visual artifacts. For instance, as the subject is waving, there is lots of fine, high-frequency detail around the fingers, and if you look closely at the region behind the head, you will find some more issues, or you may find that some balls are flickering on the table as we move the camera around. Compare that to previous methods, which could not do nearly as well as this, and now we have something that is quite satisfactory.

I can only imagine how good this will get two more papers down the line. And in the meantime, we'll be able to run these amazing effects even without having a real depth camera.
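The occlusion problem described above is easy to illustrate in code. Below is a minimal, hypothetical sketch (not the paper's actual method) of depth-based compositing: a virtual object is drawn only where its depth is smaller than the estimated scene depth, which is exactly the per-pixel test that breaks down when the depth map flickers between frames. All function names, array shapes, and values here are assumptions chosen purely for illustration.

import numpy as np

def composite_with_depth(frame, frame_depth, obj_rgba, obj_depth):
    """Insert a rendered virtual object into a frame using per-pixel depth.

    frame:       (H, W, 3) float array, the original video frame.
    frame_depth: (H, W)    float array, estimated scene depth
                 (distance from the camera; smaller = closer).
    obj_rgba:    (H, W, 4) float array, the rendered virtual object,
                 with an alpha channel that is 0 where the object is absent.
    obj_depth:   (H, W)    float array, depth of the virtual object.
    """
    # The object is visible only where it is closer to the camera than
    # the real scene. This is the occlusion test that flips back and
    # forth when the estimated depth flickers from frame to frame.
    visible = (obj_depth < frame_depth) & (obj_rgba[..., 3] > 0)
    alpha = np.where(visible, obj_rgba[..., 3], 0.0)[..., None]
    return alpha * obj_rgba[..., :3] + (1.0 - alpha) * frame

# Toy usage: a fully opaque object 1 unit away, scene 2 units away,
# so the object should be visible everywhere.
H, W = 4, 4
frame = np.zeros((H, W, 3))
frame_depth = np.full((H, W), 2.0)
obj_rgba = np.ones((H, W, 4))
obj_depth = np.full((H, W), 1.0)
out = composite_with_depth(frame, frame_depth, obj_rgba, obj_depth)

If frame_depth bounces above and below obj_depth across consecutive frames, the visibility mask flips on and off, producing exactly the jarring popping artifact that the paper's temporal consistency is designed to remove.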
What a time to be alive! Thanks for watching and for your generous support, and I'll see you next time!