Multi-Agent Hide and Seek

Duration: 2:57

OpenAI · 17 September 2019 · 11,062,550 views · 166,945 likes

Video description

We’ve observed agents discovering progressively more complex tool use while playing a simple game of hide-and-seek. Through training in our new simulated hide-and-seek environment, agents build a series of six distinct strategies and counterstrategies, some of which we did not know our environment supported. The self-supervised emergent complexity in this simple environment further suggests that multi-agent co-adaptation may one day produce extremely complex and intelligent behavior. Learn more: https://openai.com/blog/emergent-tool-use/

Contents (7 segments)

<Untitled Chapter 1>

On Earth, the simple rules of natural selection and competition led to the evolution of increasingly intelligent life-forms. Today we ask if comparably simple rules of multi-agent competition can also lead to intelligent behavior. In a new virtual world, these agents are playing hide-and-seek. These agents have just begun learning, but they've already learned to chase and run away. This is a hard world for a hider who has only learned to flee. However, after training over millions of rounds of hide-and-seek, the hiders find a solution: they learn to use rudimentary tools to their advantage. By grabbing and locking these blocks, they can create their own shelter. The seekers are locked in place for a brief period at the start

Multiple Door Blocking

of the game, giving the hiders a chance to prepare. Even so, the hiders must learn to collaborate, accomplishing tasks that would be impossible for any single individual. The hiders are not the only ones who can learn to use tools. After

Ramp Use

many generations of failing to break into the shelter, the seekers learned to jump over obstacles using ramps. However, after many millions of rounds of having

Ramp Defense

their shelter breached, the hiders learned to take away the primary tool the seekers have at their disposal. Note that we did not explicitly incentivize any of these behaviors. As each team learns a new skill, it implicitly changes the challenges the other team faces, creating a new pressure to adapt. We've also put these agents into a more

Shelter Construction

open-ended environment, randomizing the objects, team sizes, and walls. In this world, they learn to construct their own shelter from scratch, which requires arranging multiple objects into precise structures. To prevent the seekers from using the ramps, the hiders move them to the

Box Surfing

edge of the play area and lock them in place. We originally believed this would be the final strategy the agents learned. However, we found that after more training, the seekers discovered that they can jump on top of boxes and surf them to the hiders' shelter. In the last stage of emergent strategy that we observed, the hiders learn to lock

Surf Defense

as many boxes as they can before constructing their fort, in order to defend against box surfing. So how do the agents acquire these skills? They're trained using reinforcement learning, an algorithm inspired by the way animals on Earth learn. The agents play thousands of rounds of hide-and-seek in parallel for many days. They train against each other, as well as against past versions of themselves, using an algorithm called self-play. Coevolution and competition on Earth led to the only generally intelligent species known to date: humans. While this world is far less complex than Earth, we have found evidence that simple rules can lead to increasingly intelligent behavior from multi-agent interaction. We hope that, with a much larger and more diverse environment, truly complex and intelligent agents will one day emerge.
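
The transcript says the agents are trained with reinforcement learning and self-play, playing against each other and against past versions of themselves. The sketch below illustrates that idea only in miniature. It replaces the physics world with a hypothetical zero-sum "pick a spot" game. It uses plain REINFORCE as a swapped-in learning rule, not the algorithm used in the original system. It also assumes opponents are drawn uniformly from a pool of frozen past snapshots plus the current policy. Every name here (`Policy`, `play`, `self_play`, `snapshot_every`) is illustrative.

```python
import math
import random
from dataclasses import dataclass


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


@dataclass
class Policy:
    """Hypothetical tabular softmax policy over K hiding/search spots."""
    logits: list

    def probs(self):
        return softmax(self.logits)

    def act(self, rng):
        # Sample a spot index according to the current action probabilities.
        return rng.choices(range(len(self.logits)), weights=self.probs())[0]

    def reinforce(self, action, reward, lr):
        # REINFORCE for a softmax policy: d log pi(a) / d logit_i = 1[i == a] - pi_i.
        p = self.probs()
        for i in range(len(self.logits)):
            grad = (1.0 if i == action else 0.0) - p[i]
            self.logits[i] += lr * reward * grad

    def snapshot(self):
        # Frozen copy used as a "past version" opponent.
        return Policy(list(self.logits))


def play(hider, seeker, rng):
    """One toy round: seeker gets +1 if it searches the hider's spot, else -1 (zero-sum)."""
    h, s = hider.act(rng), seeker.act(rng)
    seeker_reward = 1.0 if h == s else -1.0
    return h, s, -seeker_reward, seeker_reward


def self_play(k=4, iterations=2000, snapshot_every=100, lr=0.05, seed=0):
    rng = random.Random(seed)
    hider = Policy([rng.uniform(-1, 1) for _ in range(k)])
    seeker = Policy([rng.uniform(-1, 1) for _ in range(k)])
    # Assumption: each side keeps a pool of frozen past versions of itself.
    hider_pool, seeker_pool = [hider.snapshot()], [seeker.snapshot()]
    for it in range(1, iterations + 1):
        # Hider learns against the current seeker or a past seeker.
        opponent = rng.choice(seeker_pool + [seeker])
        h, _, r_hider, _ = play(hider, opponent, rng)
        hider.reinforce(h, r_hider, lr)
        # Seeker learns against the current hider or a past hider.
        opponent = rng.choice(hider_pool + [hider])
        _, s, _, r_seeker = play(opponent, seeker, rng)
        seeker.reinforce(s, r_seeker, lr)
        if it % snapshot_every == 0:
            hider_pool.append(hider.snapshot())
            seeker_pool.append(seeker.snapshot())
    return hider, seeker, hider_pool, seeker_pool


if __name__ == "__main__":
    hider, seeker, _, _ = self_play()
    print("hider spot probabilities:", [round(p, 3) for p in hider.probs()])
    print("seeker spot probabilities:", [round(p, 3) for p in seeker.probs()])
```

Training against snapshots as well as the current opponent makes it harder for a policy to overfit to whatever its rival happens to be doing right now. This is the intuition behind playing against past versions. In this tiny game, no tool use can emerge; the example only shows the training loop's structure.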
