❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambdalabs.com/papers
📝 The paper "Improving Multimodal Interactive Agents with Reinforcement Learning from Human Feedback" is available here:
https://www.deepmind.com/blog/building-interactive-agents-in-video-game-worlds
🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Aleksandr Mashrabov, Alex Balfanz, Alex Haro, Andrew Melnychuk, Benji Rabhan, Bryan Learn, B Shang, Christian Ahlin, Edward Unthank, Eric Martel, Geronimo Moralez, Gordon Child, Jace O'Brien, Jack Lukic, John Le, Jonas, Jonathan, Kenneth Davis, Klaus Busse, Kyle Davis, Lorin Atzberger, Lukas Biewald, Luke Dominique Warner, Matthew Allen Fisher, Matthew Valle, Michael Albrecht, Michael Tedder, Nevin Spoljaric, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Rajarshi Nigam, Ramsey Elbasheer, Richard Sundvall, Steef, Taras Bobrovytsky, Ted Johnson, Thomas Krcmar, Timothy Sum Hon Mun, Torsten Reil, Tybie Fitzhugh, Ueli Gallizzi.
If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers
Thumbnail background design: Felícia Zsolnai-Fehér - http://felicia.hu
Károly Zsolnai-Fehér's links:
Mastodon: https://sigmoid.social/@twominutepapers
Twitter: https://twitter.com/twominutepapers
Web: https://cg.tuwien.ac.at/~zsolnai/
Table of Contents (2 segments)
Segment 1 (00:00 - 05:00)
Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Today we are going to have a peek into DeepMind’s journey into creating an AI that behaves like a human. How? Well, by dropping it into a virtual playhouse, asking humans to give it tasks, judging it rather harshly, and hoping that it learns something from it.

But, not so fast! First, we have to start with an AI agent, a little assistant, that already knows a few things. Just dropping a randomly acting agent into this game would do no good. So, how do we do that? Well, scientists at DeepMind say that first, we should ask it to look at humans interacting with previous AI agents. And have it copy them. Let’s call this behavioral cloning.

So, how good is behavioral cloning? Well, the answer is not good at all. Let’s have a look together. This agent has looked at humans building towers from books, and now, it’s your turn, little AI. And it does… not too well, I am afraid. Oh my. That is not good. Look. It has learned a bit from humans, it is trying so hard, but unfortunately it has no idea about how physics works, and does not have the finesse required to build this tower. So, is that it? We can’t even write an AI that clones what humans can already perform in virtual worlds?

Well, do not despair, because here comes the second best part of the paper. In this work, this agent is used just as a starting point, and now, we can ask it to do a bunch of tasks, and have humans give it a score. For instance, here you see it trying to arrange these fruits in a row. And, whenever it progresses with this task, it gets a nice spike in the score. And whenever it does not do well, we can tell it that this is not the way. Alright, but what does this process do for us? Well, it creates an agent that can learn not the behavior of humans, but the reward model itself. What does that mean? Well, it tries to learn what the human teacher would think about what it is doing right now. And, is that useful?
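The "copy the humans" idea above can be sketched in a few lines. This is a deliberately tiny, tabular toy, not the paper's neural-network agent: the state names and actions are hypothetical, and real behavioral cloning trains a policy network on raw observations instead of a lookup table. The core idea is the same: fit a policy to imitate the actions humans took.

```python
from collections import Counter, defaultdict

def fit_bc_policy(demos):
    """Behavioral cloning in its simplest tabular form: for each
    observed state, remember the action humans took most often."""
    counts = defaultdict(Counter)
    for state, action in demos:
        counts[state][action] += 1
    # The "policy" just replays the majority human action per state.
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

# Hypothetical demonstration data: (state, human action) pairs.
demos = [
    ("book_on_floor", "pick_up"),
    ("book_on_floor", "pick_up"),
    ("book_on_floor", "look_around"),
    ("holding_book", "place_on_stack"),
]
policy = fit_bc_policy(demos)
print(policy["book_on_floor"])  # → pick_up
```

Note the limitation the video points out: a cloned policy can only be as good as its demonstrations, and it has no notion of whether an action actually succeeds — which is exactly why the tower-building attempt fails.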
Well, have a look at it yourself. Here is the agent after it started to learn the reward model, and, let’s build that tower again. First try is not too bad, but the second one, now look at that. That is a proper tower of books right there. Very proud of you, little AI! So, how well did it do? Here is the score for a human player. Here it is for the not too great first try with the behavioral cloning AI. And now, hold on to your papers, and look at that! This is the new agent. It does almost as well as a real human does. Wow.

This is an amazing chart that shows that these agents can learn tasks that require walking around, understanding a bit of physics, and a bit of finesse as well. I love it. And, it can learn not just building towers, but it also learned to build triangles of objects, and of course, everyone’s favorite. What good would an AI be if we could not teach it to clean our room? So, can it do that? Oh yes, and with flying colors. Loving it.

But wait, I said that this was the second best part of the paper. What is the best part? It is this. Remember, first, we start out from the behavioral cloning agent, and start learning the reward model. This does nearly as well as a human does, so that is absolutely amazing, we can teach an AI to behave like a human does in these tasks. This is incredible. But that’s not it. Get this, if we repeat this process and let it learn for a bit longer, we get this. Whoa! Are you seeing what I am seeing? Yes, that is right, if we let it train for a bit longer, this happens. It can even surpass humans at building that tower. An AI that learns from humans, and over time, even surpasses humans. And, wait, it gets better, it surpasses humans not just on this one particular task, but on a set of over 150 standardized tasks. Here they are. Generally, it greatly outperforms in instruction-following-type tasks, while it is
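The second step the segment describes — learning a reward model from human scores, then letting the agent optimize against it — can also be sketched as a toy. Again, everything here is hypothetical scaffolding (state names, actions, and scores are invented), and the paper uses learned neural reward models and reinforcement learning rather than a table; this only illustrates the shape of the loop.

```python
from collections import defaultdict

def fit_reward_model(feedback):
    """Average the score humans gave to each (state, action) pair —
    a toy stand-in for the learned reward model in the paper."""
    totals, counts = defaultdict(float), defaultdict(int)
    for state, action, score in feedback:
        totals[(state, action)] += score
        counts[(state, action)] += 1
    return {k: totals[k] / counts[k] for k in totals}

def improve_policy(reward_model):
    """Per state, pick the action the learned reward model rates highest —
    the agent now optimizes predicted human approval, not imitation."""
    best = {}
    for (state, action), r in reward_model.items():
        if state not in best or r > reward_model[(state, best[state])]:
            best[state] = action
    return best

# Hypothetical human feedback: (state, action, score from the human judge).
feedback = [
    ("fruit_scattered", "line_up_fruit", 1.0),
    ("fruit_scattered", "line_up_fruit", 0.8),
    ("fruit_scattered", "throw_fruit", -1.0),
]
reward_model = fit_reward_model(feedback)
policy = improve_policy(reward_model)
print(policy["fruit_scattered"])  # → line_up_fruit
```

Repeating this loop — collect feedback, refit the reward model, re-optimize the policy — is what the video credits for the agent eventually surpassing the human demonstrators it started from.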
Segment 2 (05:00 - 06:00)
a bit worse than humans at answering questions, but it still does really well. Wow. Perhaps this is one more step towards general intelligence. And also, welcome to Two Minute Papers, Land of the Fellow Scholars who look at bar charts and make happy noises. Yeah! What a time to be alive! Thanks for watching and for your generous support, and I'll see you next time!