Google’s PlaNet AI Learns Planning from Pixels
3:22

Two Minute Papers · 23.03.2019 · 45,896 views · 1,976 likes


Video description
Errata: https://twitter.com/arjunbazinga/status/1114497224174497793

📝 The paper "Learning Latent Dynamics for Planning from Pixels" and its source code are available here:
https://planetrl.github.io/
https://arxiv.org/abs/1811.04551
https://github.com/google-research/planet

❤️ Pick up cool perks on our Patreon page: https://www.patreon.com/TwoMinutePapers

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: 313V, Alex Haro, Andrew Melnychuk, Angelos Evripiotis, Anthony Vdovitchenko, Brian Gilman, Christian Ahlin, Christoph Jadanowski, Claudio Fernandes, Dennis Abts, Eric Haddad, Eric Martel, Evan Breznyik, Geronimo Moralez, Javier Bustamante, John De Witt, Kaiesh Vohra, Kasia Hayden, Kjartan Olason, Levente Szabo, Lorin Atzberger, Marcin Dukaczewski, Marten Rauschenberg, Maurits van Mastrigt, Michael Albrecht, Michael Jensen, Morten Punnerud Engelstad, Nader Shakerin, Owen Campbell-Moore, Owen Skarpness, Raul Araújo da Silva, Richard Reis, Rob Rowe, Robin Graham, Ryan Monsurate, Shawn Azman, Steef, Steve Messina, Sunil Kim, Thomas Krcmar, Torsten Reil, Zach Boldyga, Zach Doty. https://www.patreon.com/TwoMinutePapers

Splash screen/thumbnail design: Felícia Fehér - http://felicia.hu

Károly Zsolnai-Fehér's links:
Facebook: https://www.facebook.com/TwoMinutePapers/
Twitter: https://twitter.com/karoly_zsolnai
Web: https://cg.tuwien.ac.at/~zsolnai/

#Google #PlaNet #PlaNetAI

Table of contents (1 segment)

Segment 1 (00:00 - 03:00)

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. Today we are going to talk about PlaNet, a technique that is meant to solve challenging image-based planning tasks with sparse rewards. OK, that sounds great, but what do all of these terms mean? The planning part is simple: it means that the AI has to come up with a sequence of actions to achieve a goal, like pole balancing with a cart, teaching a virtual human or a cheetah to walk, or hitting this box the right way to make sure it keeps rotating. The image-based part is big - this means that the AI has to learn the same way as a human, and that is by looking at the pixels of the images. This is a huge difficulty bump because the AI not only has to learn to defeat the game itself, but also has to build an understanding of the visual concepts within the game. DeepMind's legendary Deep Q-Learning algorithm was also able to learn from pixel inputs, but it was mighty inefficient at doing that, and no wonder: this problem formulation is immensely hard, and it is a miracle that we can muster any solution at all that can figure it out. The sparse reward part means that we rarely get feedback as to how well we are doing at these tasks, which is a nightmare situation for any learning algorithm. A key difference between this technique and classical reinforcement learning, which is what most researchers reach for to solve similar tasks, is that this one uses learned models for planning. This means that it does not learn every new task from scratch: after the first game, whichever it may be, it will have a rudimentary understanding of gravity and dynamics, and it will be able to reuse this knowledge in the next games. As a result, it gets a head start when learning a new game and is therefore often 50 times more efficient than the previous technique that learns from scratch. And not only that - it has other really cool advantages as well, which I will tell you about in just a moment.
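The model-based planning idea above can be sketched in a few lines. This is a hypothetical toy, not the paper's implementation: `dynamics` and `reward` are hand-written placeholders standing in for the learned models, and the planner is a simple cross-entropy method (CEM) search over action sequences, which is the general flavor of planner the paper uses.

```python
# Toy sketch: planning with a learned model via the cross-entropy method (CEM).
# `dynamics` and `reward` are placeholders for the models learned from pixels.
import numpy as np

def dynamics(state, action):
    # Placeholder dynamics: the state drifts in the direction of the action.
    return state + 0.1 * action

def reward(state):
    # Placeholder reward: stay close to the origin.
    return -np.sum(state ** 2)

def cem_plan(state, horizon=12, candidates=100, top_k=10, iters=5, act_dim=2, seed=None):
    """Search for an action sequence that maximizes predicted total reward."""
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, act_dim))
    std = np.ones((horizon, act_dim))
    for _ in range(iters):
        # Sample candidate action sequences from the current search distribution.
        seqs = mean + std * rng.standard_normal((candidates, horizon, act_dim))
        returns = np.empty(candidates)
        for i, seq in enumerate(seqs):
            s, total = state, 0.0
            for a in seq:
                s = dynamics(s, a)   # imagine forward with the model
                total += reward(s)   # accumulate predicted reward
            returns[i] = total
        elite = seqs[np.argsort(returns)[-top_k:]]  # keep the best sequences
        mean, std = elite.mean(axis=0), elite.std(axis=0)
    return mean[0]  # execute only the first action, then re-plan

first_action = cem_plan(np.array([1.0, -1.0]), seed=0)
print(first_action.shape)  # (2,)
```

Because all the imagining happens inside the model, no real environment steps are spent during the search - this is where the sample-efficiency gain over learning from scratch comes from.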
Here you can see that indeed, the blue lines significantly outperform the previous techniques, shown in red and green, for each of these tasks. I like how this plot is organized in the same grid as the tasks, as it makes it much more readable when juxtaposed with the video footage. As promised, here are the two really cool additional advantages of this model-based agent. The first is that we don't have to train six separate AIs for all of these tasks; finally, we can get one AI that is able to solve all six of these tasks efficiently. And second, it can look at as little as five frames of an animation, which is approximately one fifth of a second worth of footage - that is barely anything - and it is able to predict how the sequence would continue with remarkably high accuracy, and over a long time frame, which is quite a challenge. This is an excellent paper with beautiful mathematical formulations; I recommend that you have a look in the video description. The source code is also available free of charge for everyone, so I bet this will be an exciting direction for future research works, and I'll be here to report on it to you. Make sure to subscribe and hit the bell icon to not miss future episodes. Thanks for watching and for your generous support, and I'll see you next time!
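The "predict from five frames" ability described above is open-loop prediction: encode a few observed frames into a compact latent state, then roll the learned dynamics forward and decode, without looking at any further frames. The sketch below illustrates only the mechanism; the encoder, decoder, and transition here are hypothetical linear placeholders, not the paper's learned networks.

```python
# Toy sketch of open-loop video prediction: absorb a few context frames,
# then predict far into the future purely in latent space.
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.standard_normal((8, 64)) * 0.1   # toy encoder: 64 pixels -> 8-dim latent
W_dec = rng.standard_normal((64, 8)) * 0.1   # toy decoder: latent -> 64 pixels
A = np.eye(8) * 0.95                         # toy latent transition matrix

def encode(frame):
    return W_enc @ frame

def step(latent):
    # One step of the (stand-in) learned latent dynamics.
    return A @ latent

def decode(latent):
    return W_dec @ latent

# Roughly "five frames of context", as in the video.
context = [rng.standard_normal(64) for _ in range(5)]
latent = np.zeros(8)
for frame in context:
    # Filter the context frames into the latent state.
    latent = 0.5 * latent + 0.5 * encode(frame)

# Predict many steps ahead without seeing any new observations.
predictions = []
for _ in range(50):
    latent = step(latent)
    predictions.append(decode(latent))
print(len(predictions), predictions[0].shape)  # 50 (64,)
```

The long prediction horizon works because the dynamics are rolled forward in the small latent space rather than in pixel space, which is also what makes the planning search above cheap.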
