# DeepMind’s New AI Surpasses Humans At Some Things!

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=VvzZG-HP4DA
- **Date:** 03.01.2023
- **Duration:** 6:28
- **Views:** 104,549
- **Source:** https://ekstraktznaniy.ru/video/13339

## Description

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambdalabs.com/papers

📝 The paper "Improving Multimodal Interactive Agents with Reinforcement Learning from Human Feedback" is available here:
https://www.deepmind.com/blog/building-interactive-agents-in-video-game-worlds

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Aleksandr Mashrabov, Alex Balfanz, Alex Haro, Andrew Melnychuk, Benji Rabhan, Bryan Learn, B Shang, Christian Ahlin, Edward Unthank, Eric Martel, Geronimo Moralez, Gordon Child, Jace O'Brien, Jack Lukic, John Le, Jonas, Jonathan, Kenneth Davis, Klaus Busse, Kyle Davis, Lorin Atzberger, Lukas Biewald, Luke Dominique Warner, Matthew Allen Fisher, Matthew Valle, Michael Albrecht, Michael Tedder, Nevin Spoljaric, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Rajarshi Nigam, Ramsey Elbasheer, Richard Sundvall, Steef, Taras Bobrovytsky, Ted Johnson, Thomas Krcmar, Timothy Sum Hon Mun, Torsten Reil, Tybie Fi

## Transcript

### Segment 1 (00:00 - 05:00)

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Today we are going to have a peek into DeepMind's journey into creating an AI that behaves like a human. How? Well, by dropping it into a virtual playhouse, asking humans to give it tasks, judging it rather harshly, and hoping that it learns something from it.

But, not so fast! First, we have to start with an AI agent, a little assistant, that already knows a few things. Just dropping a randomly acting agent into this game would do no good. So, how do we do that? Well, scientists at DeepMind say that first, we should ask it to look at humans interacting with previous AI agents, and have it copy them. Let's call this behavioral cloning.

So, how good is behavioral cloning? Well, the answer is: not good at all. Let's have a look together. This agent has looked at humans building towers from books, and now, it's your turn, little AI. And it does… not too well, I am afraid. Oh my. That is not good. Look. It has learned a bit from humans, it is trying so hard, but unfortunately it has no idea about how physics works, and does not have the finesse required to build this tower.

So, is that it? We can't even write an AI that clones what humans can already perform in virtual worlds? Well, do not despair, because here comes the second best part of the paper. In this work, this agent is used just as a starting point, and now, we can ask it to do a bunch of tasks, and have humans give it a score. For instance, here you see it trying to arrange these fruits in a row. And, whenever it progresses with this task, it gets a nice spike in the score. And whenever it does not do well, we can tell it that this is not the way.

Alright, but what does this process do for us? Well, it creates an agent that can learn not the behavior of humans, but the reward model itself. What does that mean?
Well, it tries to learn what the human teacher would think about what it is doing right now. And, is that useful?

Well, have a look at it yourself. Here is the agent after it started to learn the reward model, and, let's build that tower again. The first try is not too bad, but the second one, now look at that. That is a proper tower of books right there. Very proud of you, little AI!

So, how well did it do? Here is the score for a human player. Here it is for the not-too-great first try with the behavioral cloning AI. And now, hold on to your papers, and look at that! This is the new agent. It does almost as well as a real human does. Wow. This is an amazing chart that shows that these agents can learn tasks that require walking around, understanding a bit of physics, and a bit of finesse as well. I love it. And, it can learn not just building towers; it also learned to build triangles of objects, and of course, everyone's favorite. What good would an AI be if we could not teach it to clean our room? So, can it do that? Oh yes, and with flying colors. Loving it.

But wait, I said that this was the second best part of the paper. What is the best part?

It is this. Remember, first, we start out from the behavioral cloning agent, and start learning the reward model. This does nearly as well as a human does, so that is absolutely amazing: we can teach an AI to behave like a human does in these tasks. This is incredible. But that's not it. Get this: if we repeat this process and let it learn for a bit longer, we get this. Whoa! Are you seeing what I am seeing? Yes, that is right, if we let it train for a bit longer, this happens. It can even surpass humans at building that tower. An AI that learns from humans, and over time, even surpasses humans. And, wait, it gets better: it surpasses humans not just on this one particular task, but on a set of over 150 standardized tasks. Here they are.
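The three-stage recipe the video walks through, behavioral cloning from human demonstrations, then a reward model fitted to human scores, then improving the agent against that learned reward, can be sketched as a toy in a few lines of Python. Everything here (the observations, actions, scores, and function names) is made up purely for illustration; it is not DeepMind's environment, data, or code:

```python
# --- Stage 1: behavioral cloning -------------------------------------
# Hypothetical human demonstrations as (observation, action) pairs.
# The cloned policy simply copies the most frequent human action.
demos = [
    ("block_on_floor", "stack"),
    ("block_on_floor", "stack"),
    ("block_on_floor", "push"),
    ("tower_wobbly", "stack"),   # humans are sloppy here...
    ("tower_wobbly", "stack"),
    ("tower_wobbly", "steady"),  # ...and only sometimes steady the tower
]

def clone_policy(demos):
    counts = {}
    for obs, act in demos:
        counts.setdefault(obs, {}).setdefault(act, 0)
        counts[obs][act] += 1
    # For each observation, imitate the most common human action.
    return {obs: max(acts, key=acts.get) for obs, acts in counts.items()}

policy = clone_policy(demos)  # tower_wobbly -> "stack" (the sloppy habit)

# --- Stage 2: learn a reward model from human judgments --------------
# Humans score short clips of behavior; the reward model learns to
# predict that score, here just as a per-(observation, action) average.
judged = [
    (("block_on_floor", "stack"), 1.0),
    (("block_on_floor", "push"), 0.0),
    (("tower_wobbly", "stack"), 0.2),
    (("tower_wobbly", "steady"), 1.0),
]

def fit_reward_model(judged):
    sums, counts = {}, {}
    for key, score in judged:
        sums[key] = sums.get(key, 0.0) + score
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}

reward_model = fit_reward_model(judged)

# --- Stage 3: improve the policy against the learned reward ----------
# Instead of copying humans, pick whichever action the reward model
# predicts the human judge would score highest.
def improve(policy, reward_model, actions=("stack", "push", "steady")):
    return {
        obs: max(actions, key=lambda a: reward_model.get((obs, a), 0.0))
        for obs in policy
    }

improved = improve(policy, reward_model)
# improved["tower_wobbly"] is now "steady": the agent has moved past
# the imperfect demonstrations it was cloned from.
```

The design choice mirrors the point of the video: behavioral cloning can only ever match its demonstrations, while a policy optimized against a learned reward model can exceed them, because the reward model captures what the human judge wants rather than what the demonstrators happened to do.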
Generally, it greatly outperforms humans in instruction-following tasks, while it is

### Segment 2 (05:00 - 06:00)

a bit worse than humans at answering questions, but it still does really well. Wow. Perhaps this is one more step towards general intelligence. And also, welcome to Two Minute Papers, Land of the Fellow Scholars who look at bar charts and make happy noises. Yeah! What a time to be alive!

Thanks for watching and for your generous support, and I'll see you next time!
