AI Found Loopholes: How OpenAI Played Hide-and-Seek and Broke the Game
A look at how OpenAI's AI agents found unexpected and even game-breaking strategies in a game of hide-and-seek, and what this shows about machine learning and emergent behavior.
💡 Key points (5)
1. The AI teams learn from each other in an arms-race fashion
2. The hiders learned to lock the seekers out with boxes
3. The seekers learned to use a suspicious-looking object as a ramp
4. The hiders countered by stealing the ramp from the seekers
5. A seeker learned to "surf" on boxes by exploiting the physics, breaking the game
Speaker: Two Minute Papers | Duration: 6:02
Transcript
Intro
OpenAI built a hide-and-seek game for their AI agents to play. While we look at the exact rules here, I will note that the goal of the project was to pit two AI teams against each other and hopefully see some interesting emergent behaviors. And boy, did they do some crazy stuff! The coolest part is that the two teams compete against each other, and whenever one team discovers a new strategy, the other one has to adapt, kind of like an arms race situation. It also resembles generative adversarial networks a little. And the results are magnificent, amusing, weird; you'll see in a moment. These agents learn from previous experiences, and to the surprise
Start - Pandemonium!
of no one, for the first few million rounds we start out with pandemonium: everyone just running around aimlessly, without proper strategy, with semi-random movements. The seekers are favored and hence win the majority of the games. Nothing to see here. Then, over time, the hiders learned to lock out the seekers
A little learning
by blocking the doors off with these boxes, and started winning consistently. I think the coolest part about this is that the map was deliberately designed by the OpenAI scientists in a way that the hiders can only succeed through collaboration. They cannot win alone, and hence they are forced to learn to work together, which they did quite well. But then something happened. Did you notice this pointy, doorstop-shaped
But then - something happened!
object? Are you thinking what I'm thinking? Well, probably, and not only that, but about 10 million rounds later the AI also discovered that it can be pushed near a wall and be used as a ramp, and ta-da, got 'em! The seekers started winning more again. So the ball is now back in the court of the hiders. Can you defend this? If so, how? Well, these resourceful little critters learned that, since there is a little time at the start of the game when the seekers are frozen, apparently during
They learned what?!
this time they cannot see them, so why not just sneak out, steal the ramp, and lock it away from them? Absolutely incredible. Look at those happy eyes as they are carrying that ramp. And you think it all ends here? No, no, not even close. It gets weirder, much weirder. When playing a different map, a seeker
It gets even weirder
has noticed that it can use a ramp to climb on top of a box, and this happens. Do you think couch surfing is cool? Give me a break, this is box surfing! And the scientists were quite surprised by this move, as this was one of the first cases where the seeker AI seems to have broken the game. What happens here is that the physics system is coded in a way that the agents are able to move around by exerting force on themselves, but there is no additional check whether they are on the floor or not, because who in their right mind would think about that? As a result, something that shouldn't ever happen does happen here. And we are still not done yet: this paper just keeps on
Amazing teamwork
giving. A few hundred million rounds later, the hiders learned to separate all the ramps from the boxes. Dear Fellow Scholars, this is proper box surfing defense. Then they lock down the remaining tools and build a shelter. Note how well rehearsed and executed this strategy is: there is not a second of time left until the seekers take off. I also love this cheeky move where they set up the shelter right next to the seekers, and I almost feel like they are saying, "Yeah, see this here? There's not a single thing you can do about it." In a few isolated cases, other interesting behaviors also emerged. For instance, the hiders learned to exploit the physics
More interesting behaviors
system and just chuck the ramp away. After that, the seekers go, "What just happened?" But don't despair, and at this point I would also recommend that you hold on to your papers, because there was also a crazy case where a seeker also learned to abuse a similar physics issue and launch itself exactly onto the top of the hiders. Man, what a paper! This system can be extended and modded for many other tasks too, so expect to see more of these fun
Extensions
experiments in the future. We get to do this for a living, and we are even being paid for this. I can't believe it. In this series, my mission is to showcase beautiful works that light a fire in people, and this is no doubt one of those works. Great idea, interesting, unexpected results, crisp presentation. Bravo, OpenAI! Love it. So, did you enjoy this? What do you think? Make sure to leave a comment
More stuff from the paper
below. Also, if you look at the paper, it contains comparisons to an earlier work we covered about intrinsic motivation, shows how to implement circular convolutions for the agents to detect their environment around them, and more. This episode has been supported by Weights & Biases, which provides tools to track your experiments in your deep learning projects. It can save you a ton of time and money in these projects and is being used by OpenAI, Toyota Research, Stanford, and Berkeley. In this blog post, they show you how to use their system to find clues and steer your research into more promising areas. Make sure to visit them through wandb.com/papers, or just click the link in the video description and sign up for a free demo today. Our thanks to Weights & Biases for helping us make better videos for you. Thanks for watching and for your generous support, and I'll see you next time!
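The circular convolutions mentioned near the end of the transcript can be illustrated with a small sketch. It assumes, purely for illustration, that an agent senses its surroundings as a ring of range readings, one per direction, so that the first and last readings are neighbors. The function name `circular_conv1d`, the kernel, and the sample readings are hypothetical and are not taken from the paper.

```python
def circular_conv1d(signal, kernel):
    """Slide `kernel` around a ring-shaped signal, wrapping indices at the ends.

    Like most deep learning "convolutions", this is a cross-correlation:
    the kernel is not flipped.
    """
    n, k = len(signal), len(kernel)
    half = k // 2
    out = []
    for i in range(n):
        acc = 0.0
        for j in range(k):
            # Modulo indexing makes the last reading a neighbor of the first.
            acc += kernel[j] * signal[(i + j - half) % n]
        out.append(acc)
    return out


# Hypothetical ring of 8 range readings around an agent.
readings = [1.0, 1.0, 5.0, 5.0, 5.0, 1.0, 1.0, 1.0]
smoothed = circular_conv1d(readings, [1 / 3, 1 / 3, 1 / 3])
```

The point of the wrap-around is that rotating the readings only rotates the output, so a feature is detected the same way no matter which direction the agent happens to face.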
Practical tasks
Task 1: An experiment with AI collaboration
Study how AI agents learn to cooperate in a simple environment. Try implementing a simplified version of hide-and-seek with two AI teams. Use a simulator (for example, PyBullet or Unity ML-Agents). Steps: 1. Define the goals for each team. 2. Set up an environment with several "shelters". 3. Run training for 10,000 rounds and analyze whether cooperative strategies emerge.
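As a starting point for this task, here is a minimal sketch of an environment skeleton in plain Python rather than PyBullet or Unity ML-Agents. Everything in it is an assumption made for illustration: the grid size, the shelter cells, the sight radius, the zero-sum win rule, and the names `play_round` and `run`. Both teams act randomly, which only reproduces the "pandemonium" baseline; a learning algorithm would replace the `rng.choice(MOVES)` calls.

```python
import random

MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]  # four directions plus "stay"


def step(pos, move, size):
    """Move one cell on a size x size grid, clamping at the borders."""
    x = max(0, min(size - 1, pos[0] + move[0]))
    y = max(0, min(size - 1, pos[1] + move[1]))
    return (x, y)


def seen(hider, seekers, shelters, radius):
    """Assumed rule: a hider is seen if it is outside a shelter and within
    Manhattan distance `radius` of any seeker."""
    if hider in shelters:
        return False
    return any(abs(hider[0] - s[0]) + abs(hider[1] - s[1]) <= radius
               for s in seekers)


def play_round(rng, size=8, shelters=frozenset({(0, 0), (7, 7)}),
               prep_steps=5, total_steps=30, radius=2):
    """Return +1 if the hiders win the round, -1 if the seekers win."""
    hiders = [(rng.randrange(size), rng.randrange(size)) for _ in range(2)]
    seekers = [(rng.randrange(size), rng.randrange(size)) for _ in range(2)]
    for t in range(total_steps):
        hiders = [step(h, rng.choice(MOVES), size) for h in hiders]
        if t >= prep_steps:  # seekers stay frozen during the preparation phase
            seekers = [step(s, rng.choice(MOVES), size) for s in seekers]
            if any(seen(h, seekers, shelters, radius) for h in hiders):
                return -1
    return 1


def run(rounds=10_000, seed=0):
    """Play `rounds` rounds with random policies; return (hider, seeker) wins."""
    rng = random.Random(seed)
    results = [play_round(rng) for _ in range(rounds)]
    hider_wins = results.count(1)
    return hider_wins, rounds - hider_wins


if __name__ == "__main__":
    print(run())
```

Tracking the two win counts over the course of training is one simple way to check whether one team has discovered a new strategy and the other has adapted to it.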
Best quotes
"The seekers are favored and hence win the majority of the games. Nothing to see here. Then, over time, the hiders learned to lock out the seekers by blocking the doors off with these boxes, and started winning consistently." - Two Minute Papers
"Well, these resourceful little critters learned that, since there is a little time at the start of the game when the seekers are frozen, apparently during this time they cannot see them, so why not just sneak out, steal the ramp, and lock it away from them? Absolutely incredible. Look at those happy eyes as they are carrying that ramp." - Two Minute Papers
"This was one of the first cases where the seeker AI seems to have broken the game. What happens here is that the physics system is coded in a way that the agents are able to move around by exerting force on themselves, but there is no additional check whether they are on the floor or not, because who in their right mind would think about that?" - Two Minute Papers
Key takeaways
- AI can find unexpected strategies that break the rules of the game.
- Collaboration between AI agents leads to emergent behavior that is hard to predict.
- Physics engines and game rules can contain loopholes that AI agents exploit.
- AI systems can learn from their own mistakes and from their competitors' adaptations.
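The box surfing exploit described in the transcript comes down to a missing "is the agent on the floor?" check. The toy sketch below shows what such a check changes. It is not the physics code of the OpenAI environment: the `Body` class, the `apply_self_force` function, and the `require_floor` flag are hypothetical names, and the motion is reduced to one dimension.

```python
from dataclasses import dataclass


@dataclass
class Body:
    x: float = 0.0         # position along one axis
    vx: float = 0.0        # velocity along that axis
    mass: float = 1.0
    on_floor: bool = True  # False while the agent stands on top of a box


def apply_self_force(body, force, dt, require_floor):
    """Advance one semi-implicit Euler step for a self-applied force."""
    if require_floor and not body.on_floor:
        # With the check in place, an agent that is not standing on the
        # floor has nothing to push against, so the force is ignored.
        force = 0.0
    body.vx += (force / body.mass) * dt
    body.x += body.vx * dt
    return body


# An agent standing on a box pushes itself sideways for 10 steps.
unchecked = Body(on_floor=False)
checked = Body(on_floor=False)
for _ in range(10):
    apply_self_force(unchecked, force=1.0, dt=0.1, require_floor=False)
    apply_self_force(checked, force=1.0, dt=0.1, require_floor=True)
```

Without the check, the agent on the box drifts away from its starting position even though it is not touching the floor; with the check, it stays where it is.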