Deep Learning Program Hallucinates Videos | Two Minute Papers #120 (5:33)

Two Minute Papers · 17.01.2017 · 61,109 views · 1,314 likes


Video description
The paper "Generating Videos with Scene Dynamics", its source code, and a pre-trained network are available here: http://web.mit.edu/vondrick/tinyvideo/

Recommended for you:
- Image Synthesis From Text With Deep Learning - https://www.youtube.com/watch?v=rAbhypxs1qQ
- What is an Autoencoder? - https://www.youtube.com/watch?v=Rdpbnd0pCiI
- Hallucinating Images With Deep Learning - https://www.youtube.com/watch?v=hnT-P3aALVE

We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Sunil Kim, Daniel John Benton, Dave Rushton-Smith, Benjamin Kang. https://www.patreon.com/TwoMinutePapers

Subscribe if you would like to see more of these! - http://www.youtube.com/subscription_center?add_user=keeroyz

Music: Dat Groove by Audionautix is licensed under a Creative Commons Attribution license (https://creativecommons.org/licenses/by/4.0/). Artist: http://audionautix.com/

Thumbnail image credit: https://pixabay.com/photo-1751455/
Splash screen/thumbnail design: Felícia Fehér - http://felicia.hu

Károly Zsolnai-Fehér's links:
Facebook → https://www.facebook.com/TwoMinutePapers/
Twitter → https://twitter.com/karoly_zsolnai
Web → https://cg.tuwien.ac.at/~zsolnai/

Contents (2 segments)

Segment 1 (00:00 - 05:00)

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. Ever thought about the fact that we have a stupendously large amount of unlabeled videos on the internet? And of course, with the ascendancy of machine learning algorithms that can learn by themselves, it would be a huge missed opportunity not to make use of all this free data. This is a crazy piece of work where the idea is to unleash a neural network on a large number of publicly uploaded videos on the internet and see how well it does if we ask it to generate new videos from scratch. Here, unlabeled means that there is no information as to what we see in these videos; they are provided as is. Machine learning methods that work on this kind of data we like to call unsupervised learning techniques.

This work is based on a generative adversarial network. Wait, what does this mean exactly? It means that we have two neural networks that compete with each other: one tries to generate more and more real-looking animations and passes them over to the other, which learns to tell real footage from fake. The first we call the generator network, and the second is the discriminator network. They try to outperform each other, and this rivalry goes on for quite a while, improving the quality of the output for both neural networks, hence the name generative adversarial networks.

We first covered this concept when it was used to generate images from written text descriptions. The shortcoming of that approach was the slow training time, which led to extremely tiny, low-resolution output images. This was remedied by a follow-up work that proposed a two-stage version of this architecture; we covered it in an earlier Two Minute Papers episode, and as always, the link is available in the video description. It would not be an overstatement to say that I nearly fell off the chair when seeing these incredible results. So where do we go from here? What should be the next step? Well, of course: video.

However, the implementation of such a technique is far from trivial. In this piece of work, the generator network learns not on the original representation of the videos, but on the foreground and background video streams separately, and it also has to learn what combination of these yields realistic footage. This two-stream architecture is particularly useful in modeling real-world videos where the background is mostly stationary and there is animated movement in the foreground. A train passing the station, or people playing golf on a field, are excellent examples of this kind of separation.

We definitely need a high-quality discriminator network as well: in the final synthesized footage, not only must the foreground and background go well together, but the synthesized animations also have to be believable for human beings. This human being, in our case, is represented by the discriminator network. Needless to say, this problem is extremely difficult, and the quality of the discriminator network makes or breaks this magic trick.

And of course, the all-important question immediately arises: if there are multiple algorithms performing this task, how do we decide which one is the best? Generally, we get a few people, show them a piece of footage synthesized with this algorithm and with previous works, and have them decide which they deem more realistic. This is still the first step; I expect these techniques to improve so rapidly that we'll soon find ourselves testing against real-world footage, and who knows, sometimes perhaps failing to recognize which is which. The results in the paper show that this new technique beats the previous techniques by a significant margin, and that users have a strong preference for the two-stream architecture. The previous technique they compare against is an autoencoder, which we discussed in a previous Two Minute Papers episode; check it out, it is available in the video description.

The disadvantages of this approach are quite easy to identify this time around: we have a very limited resolution for these output video streams, 64×64 pixels for 32 frames, which even at a modest frame rate is just slightly over one second of footage. The synthesized results vary greatly in quality, but it is remarkable to see that the machine can have a rough understanding of a large variety of movement and animation types. It is really incredible to see that the neural network learns representations of these objects and how they move, even when it wasn't explicitly instructed to do so.

We can also visualize what the neural network has learned. This is done by finding different image inputs that make a particular neuron extremely excited. Here we see a collection of inputs, including these activations, for images of people and trains. The authors' website is definitely worth checking out, as some of the submenus are quite ample in results: some amazing, some, well, a bit

Segment 2 (05:00 - 05:33)

horrifying. But what is sure is that all of them are quite interesting. And before we go, a huge shout-out to llo chandes, who helped us quite a bit in sorting out a number of technical issues with the series. Thanks for watching and for your generous support, and I'll see you next time!
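The generator-versus-discriminator rivalry described in the transcript can be sketched in miniature. Below is a toy one-dimensional GAN, an illustration only: the "generator" is a hypothetical linear map g(z) = a·z + b, the "discriminator" a logistic unit, and the real data a Gaussian around 4.0. The actual paper trains deep convolutional networks on video clips; nothing here comes from its code.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

a, b = 0.5, 0.0   # generator parameters: fake = a*z + b
w, c = 0.0, 0.0   # discriminator parameters: D(x) = sigmoid(w*x + c)
lr, batch = 0.05, 64

for step in range(3000):
    # --- discriminator step: push D(real) up and D(fake) down ---
    real = rng.normal(4.0, 0.5, batch)
    z = rng.normal(0.0, 1.0, batch)
    fake = a * z + b
    d_real = sigmoid(w * real + c)
    d_fake = sigmoid(w * fake + c)
    # gradients of -log D(real) - log(1 - D(fake))
    gw = np.mean(-(1 - d_real) * real + d_fake * fake)
    gc = np.mean(-(1 - d_real) + d_fake)
    w -= lr * gw
    c -= lr * gc

    # --- generator step: push D(fake) up, i.e. fool the discriminator ---
    z = rng.normal(0.0, 1.0, batch)
    fake = a * z + b
    d_fake = sigmoid(w * fake + c)
    gfake = -(1 - d_fake) * w          # gradient of -log D(fake) w.r.t. fake
    a -= lr * np.mean(gfake * z)
    b -= lr * np.mean(gfake)

print(float(b))  # the generator's output mean drifts toward the real data's mean
```

The alternating updates are the "rivalry": the discriminator's gradient separates real from fake samples, and the generator's gradient moves its fakes toward the region the discriminator currently calls real.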
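The two-stream combination mentioned above, a static background plus an animated foreground, can be sketched as a per-pixel blend. This is a shape-level illustration with made-up arrays (the real mask and streams are produced by the generator network, and the names are ours), using the output size quoted in the episode: 32 frames of 64×64 pixels.

```python
import numpy as np

T, H, W = 32, 64, 64  # the paper's output size: 32 frames of 64x64 pixels

rng = np.random.default_rng(1)
foreground = rng.random((T, H, W))  # animated foreground stream
background = rng.random((H, W))     # a single, static background image
mask = rng.random((T, H, W))        # per-pixel blending weights in [0, 1]

# Where the mask is near 1 the foreground shows; where it is near 0
# the static background shows through, in every frame.
video = mask * foreground + (1.0 - mask) * background[None, :, :]

assert video.shape == (T, H, W)
```

At a modest frame rate such as 25 fps, those 32 frames last 32 / 25 = 1.28 seconds, which matches the transcript's "just slightly over one second of footage".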
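The visualization trick mentioned near the end, finding inputs that make a particular neuron extremely excited, is commonly done by gradient ascent on the input. A minimal sketch, assuming a stand-in linear "neuron" over a flattened input (the paper inspects units of a deep convolutional network; all names here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in "neuron": a fixed linear filter over a flattened 8x8 input.
w = rng.normal(size=64)

def activation(x):
    return float(w @ x)

# Gradient ascent on the input: nudge x in the direction that increases
# the activation, keeping the input norm fixed so it cannot blow up.
x = rng.normal(size=64)
x /= np.linalg.norm(x)
for _ in range(100):
    x += 0.1 * w                # gradient of (w @ x) with respect to x is w
    x /= np.linalg.norm(x)      # project back onto the unit sphere

# The optimized input aligns with the filter: it is the pattern this
# neuron responds to most strongly (cosine similarity approaches 1).
cos = activation(x) / np.linalg.norm(w)
print(round(cos, 3))
```

For a deep network the gradient is obtained by backpropagation rather than read off directly, but the loop is the same: start from noise, repeatedly push the input toward higher activation, and look at what emerges.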
