OpenAI Plays Hide and Seek…and Breaks The Game! 🤖

6:02

Two Minute Papers 22.10.2019 10,843,724 views 367,271 likes

Video description

❤️ Check out Weights & Biases here and sign up for a free demo: https://www.wandb.com/papers
❤️ Their blog post is available here: https://www.wandb.com/articles/better-paths-through-idea-space

📝 The paper "Emergent Tool Use from Multi-Agent Interaction" is available here: https://openai.com/blog/emergent-tool-use/

My latest paper on simulations that look almost like reality is available for free here: https://rdcu.be/cWPfD
Or this is the orig. Nature Physics link with clickable citations: https://www.nature.com/articles/s41567-022-01788-5

❤️ Watch these videos in early access on our Patreon page or join us here on YouTube:
- https://www.patreon.com/TwoMinutePapers
- https://www.youtube.com/channel/UCbfYPyITQ-7l4upoX8nvctg/join

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Alex Haro, Andrew Melnychuk, Angelos Evripiotis, Anthony Vdovitchenko, Brian Gilman, Bryan Learn, Christian Ahlin, Claudio Fernandes, Daniel Hasegan, Dennis Abts, Eric Haddad, Eric Martel, Evan Breznyik, Geronimo Moralez, James Watt, Javier Bustamante, John De Witt, Kaiesh Vohra, Kasia Hayden, Kjartan Olason, Levente Szabo, Lorin Atzberger, Lukas Biewald, Marcin Dukaczewski, Marten Rauschenberg, Matthias Jost, Maurits van Mastrigt, Michael Albrecht, Michael Jensen, Nader Shakerin, Owen Campbell-Moore, Owen Skarpness, Raul Araújo da Silva, Rob Rowe, Robin Graham, Ryan Monsurate, Shawn Azman, Steef, Steve Messina, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil.
https://www.patreon.com/TwoMinutePapers

Splash screen/thumbnail design: Felícia Fehér - http://felicia.hu

00:00 Intro
00:44 Start - Pandemonium!
01:06 A little learning
01:33 But then - something happened!
02:08 They learned what?!
02:32 It gets even weirder
03:16 Amazing teamwork
04:02 More interesting behaviors
04:33 Extensions
05:02 More stuff from the paper

Károly Zsolnai-Fehér's links:
Instagram: https://www.instagram.com/twominutepapers/
Twitter: https://twitter.com/twominutepapers
Web: https://cg.tuwien.ac.at/~zsolnai/

#OpenAI

Contents (10 segments)

Intro

OpenAI built a hide-and-seek game for their AI agents to play. While we look at the exact rules here, I will note that the goal of the project was to pit two AI teams against each other and hopefully see some interesting emergent behaviors. And boy, did they do some crazy stuff! The coolest part is that the two teams compete against each other, and whenever one team discovers a new strategy, the other one has to adapt. Kind of like an arms race situation, and it also resembles generative adversarial networks a little. And the results are magnificent, amusing, weird... you'll see in a moment. These agents learn from previous experiences, and to the surprise

Start - Pandemonium!

of no one, for the first few million rounds we start out with pandemonium: everyone just running around aimlessly, without proper strategy and with semi-random movements. The seekers are favored and hence win the majority of the games. Nothing to see here. Then, over time, the hiders learned to lock out the seekers

A little learning

by blocking the doors off with these boxes, and started winning consistently. I think the coolest part about this is that the map was deliberately designed by the OpenAI scientists in a way that the hiders can only succeed through collaboration. They cannot win alone, and hence they are forced to learn to work together, which they did quite well. But then, something happened. Did you notice this pointy, doorstop-shaped

But then - something happened!

object? Are you thinking what I'm thinking? Well, probably, and not only that, but about 10 million rounds later, the AI also discovered that it can be pushed near a wall and be used as a ramp, and ta-da! Got 'em! The seekers started winning more again. So the ball is now back in the court of the hiders. Can you defend this? If so, how? Well, these resourceful little critters learned that since there is a little time at the start of the game when the seekers are frozen, apparently, during

They learned what?!

this time they cannot see them, so why not just sneak out, steal the ramp, and lock it away from them? Absolutely incredible. Look at those happy eyes as they are carrying that ramp! And you think it all ends here? No, no, not even close. It gets weirder. Much weirder. When playing a different map, a seeker

It gets even weirder

has noticed that it can use a ramp to climb on the top of a box, and this happens. Do you think couch surfing is cool? Give me a break! This is box surfing! And the scientists were quite surprised by this move, as this was one of the first cases where the seeker AI seems to have broken the game. What happens here is that the physics system is coded in a way that they are able to move around by exerting force on themselves, but there is no additional check whether they are on the floor or not, because who in their right mind would think about that? As a result, something that shouldn't ever happen does happen here. And we are still not done yet. This paper just keeps on

Amazing teamwork

giving. A few hundred million rounds later, the hiders learned to separate all the ramps from the boxes. Dear Fellow Scholars, this is proper box surfing defense! Then, lock down the remaining tools and build a shelter. Note how well rehearsed and executed this strategy is: there is not a second of time left until the seekers take off. I also love this cheeky move where they set up the shelter right next to the seekers, and I almost feel like they are saying, "Yeah, see this here? There's not a single thing you can do about it." In a few isolated cases, other interesting behaviors also emerged. For instance, the hiders learned to exploit the physics

More interesting behaviors

system and just chuck the ramp away. After that, the seekers go, "What just happened?" But don't despair, and at this point I would also recommend that you hold on to your papers, because there was also a crazy case where a seeker also learned to abuse a similar physics issue and launch itself exactly onto the top of the hiders. Man, what a paper! This system can be extended and modded for many other tasks too, so expect to see more of these fun

Extensions

experiments in the future. We get to do this for a living, and we are even being paid for this. I can't believe it. In this series, my mission is to showcase beautiful works that light a fire in people, and this is no doubt one of those works. Great idea, interesting, unexpected results, crisp presentation. Bravo, OpenAI! Love it. So, did you enjoy this? What do you think? Make sure to leave a comment

More stuff from the paper

below. Also, if you look at the paper, it contains comparisons to an earlier work we covered about intrinsic motivation, shows how to implement circular convolutions for the agents to detect their environment around them, and more. This episode has been supported by Weights & Biases, which provides tools to track your experiments in your deep learning projects. It can save you a ton of time and money in these projects and is being used by OpenAI, Toyota Research, Stanford, and Berkeley. In this blog post, they show you how to use their system to find clues and steer your research into more promising areas. Make sure to visit them through wandb.com/papers, or just click the link in the video description and sign up for a free demo today. Our thanks to Weights & Biases for helping us make better videos for you. Thanks for watching and for your generous support, and I'll see you next time!
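A note on the circular convolutions mentioned above: the transcript does not say how the paper implements them, so what follows is only a minimal sketch of the general idea, not the authors' code. It assumes an agent with a ring of range sensors around it (the `ranges` array, the kernel values, and the function name `circular_conv1d` are all made up for illustration) and shows a 1D filter that wraps around at the ends, so the last sensor is treated as a neighbor of the first.

```python
import numpy as np

def circular_conv1d(readings, kernel):
    """Slide a kernel over a ring of readings, wrapping around at the ends.

    Hypothetical helper for illustration; not taken from the paper.
    Like most deep learning "convolutions", this is a cross-correlation.
    """
    readings = np.asarray(readings, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    if kernel.size % 2 == 0:
        raise ValueError("kernel length must be odd")
    half = kernel.size // 2
    # Wrap-pad so the first and last sensors see each other as neighbors.
    padded = np.pad(readings, half, mode="wrap")
    # 'valid' mode returns exactly one output per original sensor.
    return np.correlate(padded, kernel, mode="valid")

# Eight made-up range readings around the agent; one sensor sees something close.
ranges = np.array([5.0, 5.0, 1.0, 5.0, 5.0, 5.0, 5.0, 5.0])
smooth = np.array([1.0, 1.0, 1.0]) / 3.0

features = circular_conv1d(ranges, smooth)
# Turning the agent (rotating the ring) should only rotate the features.
rotated = circular_conv1d(np.roll(ranges, 3), smooth)
print(np.allclose(rotated, np.roll(features, 3)))
```

The point of the wrap-around padding is that there is no "edge" in a ring of sensors: an obstacle directly behind the agent is handled the same way as one directly in front of it.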
