This Neural Network Turns Videos Into 60 FPS!

6:02

Two Minute Papers 18.02.2020 266,768 views 10,019 likes

Video description

❤️ Check out Weights & Biases here and sign up for a free demo here: https://www.wandb.com/papers

Their blog post on hyperparameter optimization is available here: https://www.wandb.com/articles/find-the-most-important-hyperparameters-in-seconds

📝 The paper "Depth-Aware Video Frame Interpolation" and its source code are available here: https://sites.google.com/view/wenbobao/dain

The promised playlist with a TON of interpolated videos: https://www.youtube.com/playlist?list=PLDi8wAVyouYNDl7gGdSbWKdRxIogfeD3H

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Alex Haro, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Anthony Vdovitchenko, Benji Rabhan, Brian Gilman, Bryan Learn, Claudio Fernandes, Daniel Hasegan, Dan Kennedy, Dennis Abts, Eric Haddad, Eric Martel, Evan Breznyik, Geronimo Moralez, James Watt, Javier Bustamante, John De Witt, Kaiesh Vohra, Kasia Hayden, Kjartan Olason, Levente Szabo, Lorin Atzberger, Lukas Biewald, Marcin Dukaczewski, Marten Rauschenberg, Maurits van Mastrigt, Michael Albrecht, Michael Jensen, Nader Shakerin, Owen Campbell-Moore, Owen Skarpness, Raul Araújo da Silva, Rob Rowe, Robin Graham, Ryan Monsurate, Shawn Azman, Steef, Steve Messina, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh. https://www.patreon.com/TwoMinutePapers

Far Cry video source by N00MKRAD: https://www.youtube.com/watch?v=tW0cvyut7Gk&list=PLDi8wAVyouYNDl7gGdSbWKdRxIogfeD3H&index=20

Splash screen/thumbnail design: Felícia Fehér - http://felicia.hu

Károly Zsolnai-Fehér's links:
Instagram: https://www.instagram.com/twominutepapers/
Twitter: https://twitter.com/karoly_zsolnai
Web: https://cg.tuwien.ac.at/~zsolnai/

#DainApp

Table of contents (2 segments)

Segment 1 (00:00 - 05:00)

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. With today's camera and graphics technology, we can enjoy smooth and creamy videos on our devices that were created with 60 frames per second. I also make each of these videos using 60 frames per second, however, it almost always happens that I encounter paper videos that have anything from 24 to 30 frames per second. In this case, I put them in my video editor that has a 60 fps timeline, so half or even more of these frames will not provide any new information. As we try to slow down the videos for some nice slow-motion action, this ratio is even worse, creating an extremely choppy output video because we have huge gaps between these frames. So, does this mean that there is nothing we can do and have to put up with this choppy footage? No, not at all! Earlier, we discussed two potential techniques to remedy this issue. One was frame blending, which simply computes the average of two consecutive images and presents that as a solution. This helps a little for simpler cases, but this technique is unable to produce new information. Optical flow is a much more sophisticated method that is very capable as it tries to predict the motion that takes place between these frames. This can kind of produce new information and I use this in the video series on a regular basis, but the output footage also has to be carefully inspected for unwanted artifacts, which are a relatively common occurrence. Now, our seasoned Fellow Scholars will immediately note that we have a lot of high-framerate videos on the internet, so why not delete some of the in-between frames, give the choppy and the smooth videos to a neural network, and teach it to fill in the gaps? After the lengthy training process, it should be able to complete these choppy videos properly. So, is that true? Yes, but note that there are plenty of techniques out there that already do this, so what is new in this paper?
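For Fellow Scholars who like to see things in code, here is a minimal sketch of two ideas mentioned above: frame blending as a per-pixel average, and building training examples by deleting in-between frames from a high-framerate video. This is an illustration only, not the paper's code. The names `blend_frames` and `make_training_triplets` are hypothetical, and frames are assumed to be tiny grayscale images stored as nested Python lists.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = List[List[float]]  # assumed layout: rows of grayscale pixel intensities
T = TypeVar("T")


def blend_frames(prev_frame: Frame, next_frame: Frame, t: float = 0.5) -> Frame:
    """Frame blending: a per-pixel weighted average of two frames.

    With t = 0.5 this is the plain average of the two consecutive images.
    """
    same_shape = len(prev_frame) == len(next_frame) and all(
        len(a) == len(b) for a, b in zip(prev_frame, next_frame)
    )
    if not same_shape:
        raise ValueError("frames must have the same shape")
    return [
        [(1.0 - t) * a + t * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(prev_frame, next_frame)
    ]


def make_training_triplets(frames: Sequence[T]) -> List[Tuple[T, T, T]]:
    """Delete every second frame of a smooth video to get training data.

    Each triplet is (earlier kept frame, later kept frame, deleted frame),
    so a model can be shown the two neighbors and asked for the middle one.
    """
    return [
        (frames[i], frames[i + 2], frames[i + 1])
        for i in range(0, len(frames) - 2, 2)
    ]


# A bright pixel moves from the left column to the right column.
before = [[255.0, 0.0, 0.0]]
after = [[0.0, 0.0, 255.0]]
blended = blend_frames(before, after)
# Blending yields two half-bright ghosts at the old and new positions,
# not a single pixel in the middle column.
print(blended)
```

The blended result shows why averaging cannot produce new information: the moving pixel never appears at its in-between position, it only fades out in one place and fades in at another.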
Well, this work does that, …and… much more! We will have a look at the results, which are absolutely incredible, but to be able to appreciate what is going on, let me quickly show you this. The design of this neural network tries to produce four different kinds of data to fill in these images. One is optical flow, which is part of previous solutions too, but two, it also produces a depth map that tells us how far different parts of the image are from the camera. This is of utmost importance, because if we rotate the camera around, previously occluded objects suddenly become visible, and we need proper intelligence to be able to recognize this and to fill in this kind of missing information. This is what the contextual extraction step is for, which drastically improves the quality of the reconstruction, and finally, the interpolation kernels are also learned, which gives it more knowledge as to what data to take from the previous and the next frame. Since it also has a contextual understanding of these images, one would think that it needs a ton of neighboring frames to understand what is going on, which, surprisingly, is not the case at all! All it needs is just the two neighboring images. So, after doing all this work, it better be worth it, right? Let’s have a look at some results! Hold on to your papers, and in the meantime, look at how smooth and creamy the outputs are! Love it! Because it also deals with contextual information, if you wish to feel like a real Scholar, you can gaze at regions where the occlusion situation changes rapidly and see how well it fills in this kind of information. Unreal. So how does one show that the technique is quite robust? Well, by producing and showing it off on tons and tons of footage - and that is exactly what the authors did! I put a link to a huge playlist with 33 different videos in the description so you can have a look at how well this works on a wide variety of genres.
Now, of course, this is not the first technique for learning-based frame interpolation, so let’s see how it stacks up against the competition! Wow, this is quite a value proposition, because depending on the dataset, it comes out in first or second place on most examples. The PSNR is the peak signal-to-noise ratio, while the SSIM is the structural similarity metric, both of which measure how well the algorithm reconstructs these details compared to the ground truth, and both are subject to maximization. Note that neither of them is linear, therefore even a small difference in these numbers can mean a significant difference. I think we are now at a point where these tools are getting so much better than their

Segment 2 (05:00 - 06:00)

handcrafted optical flow rivals that I think they will quickly find their way to production software. I cannot wait. What a time to be alive! This episode has been supported by Weights & Biases. In this post, they show you which hyperparameters to tweak to improve your model performance. Weights & Biases provides tools to track your experiments in your deep learning projects. Their system is designed to save you a ton of time and money, and it is actively used in projects at prestigious labs, such as OpenAI, Toyota Research, GitHub, and more. They don’t lock you in, and if you are an academic or have an open source project, you can use their tools for free. It really is as good as it gets. Make sure to visit them through wandb.com/papers or just click the link in the video description and you can get a free demo today. Our thanks to Weights & Biases for their long-standing support and for helping us make better videos for you. Thanks for watching and for your generous support, and I'll see you next time!
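As a footnote to the comparison discussed earlier, the PSNR metric and the remark that it is not linear can be made concrete with a small sketch. This uses the standard definition, 10 · log10(MAX² / MSE), and is not the paper's evaluation code. The function name `psnr` and the flat-list image layout are assumptions made for illustration.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], reconstruction: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in decibels (higher is better).

    Both images are assumed to be flat sequences of pixel intensities
    of equal length, with max_value as the largest possible intensity.
    """
    if len(reference) != len(reconstruction) or len(reference) == 0:
        raise ValueError("images must be non-empty and of equal size")
    # Mean squared error between the ground truth and the reconstruction.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10.0 * math.log10(max_value ** 2 / mse)


ground_truth = [0.0, 0.0, 0.0, 0.0]
worse = [2.0, 2.0, 2.0, 2.0]   # mean squared error of 4
better = [2.0, 2.0, 0.0, 0.0]  # mean squared error of 2

# Halving the squared error adds 10 * log10(2) decibels to the score.
print(psnr(ground_truth, better) - psnr(ground_truth, worse))
```

Because the scale is logarithmic, a gap of roughly 3 dB between two methods corresponds to one of them having half the mean squared error of the other, which is why seemingly small differences in a results table can matter.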
