AI boxers: from clumsy falls to knockouts in a week
Speaker: Two Minute Papers | Duration: 6:02
Transcript
You shall not pass!
Now, in an earlier work, we saw a few examples of AI agents playing two-player sports, for instance, this is the “You Shall Not Pass” game, where the red agent is trying to hold back the blue character and not let it cross the line. Here you see two regular AIs duking it out, sometimes the red wins, sometimes the blue is able to get through. Nothing too crazy here. Until…this happens. Look. What is happening? It seems that this agent started to do nothing…and still won. Not only that, but it suddenly started winning almost all the games.
Does nothing - still wins! [0:49]
How is this even possible? Well, what the agent did is perhaps the AI equivalent of hypnotizing the opponent, if you will. The more rigorous term for this is that it induces off-distribution activations in its opponent. This adversarial agent is really doing nothing, but that’s not enough - it is doing nothing in a way that reprograms its opponent to make mistakes and behave close to a completely randomly acting agent! Now, this new paper showcases AI agents that can learn boxing. The AI is asked to control these joint-actuated characters which are embedded in a physics simulation.
Boxing - but not so well [1:30]
Well, that is quite a challenge - look, for quite a while after 130 million steps of training, it cannot even hold it together. And, yes…these folks collapse. But this is not the good kind of hypnotic adversarial collapsing. I am afraid this is just passing out without any particular benefits. That was quite a bit of training, and all this for nearly nothing. Right? Well, maybe…let’s see what they did after 200 million training steps. Look!
Learning is happening [2:13]
They can not only hold it together, but they have a little footwork going on, and can circle each other and try to take the middle of the ring. Improvements. Good. But this is not dancing practice, this is boxing. I would really like to see some boxing today and it doesn’t seem to happen. Until we wait for a little longer…which is 250 million training steps.
After 250 million training steps [2:39]
Now, is this boxing? Not quite, this is more like two drunkards trying to duke it out, where neither of them knows how to throw a real punch…but! Their gloves are starting to touch the opponent, and they start getting rewards for it. What does that mean for an intelligent agent? Well, it means that over time, it will learn to do that a little better. And hold on to your papers and see what they do after 420 million steps.
Drunkards no more! [3:10]
Oh wow! Look at that! I am seeing some punches, and not only that, but I also see some body and head movement to evade the punches, very cool. And if we keep going for longer, whoa!
Serious knockout power! [3:29]
These guys can fight! They now learned to perform feints, jabs, and have some proper knockout power too. And if you have been holding on to your papers, now, squeeze that paper, because all they looked at before starting the training was 90 seconds of motion capture data. This is a general framework that also works for fencing as well. Look!
It works for fencing too [4:00]
The agents learned to lunge, deflect, evade attacks, and more. Absolutely amazing. What a time to be alive! So, this was approximately a billion training steps, right. So how long did that take to compute? It took approximately a week. And, you know what’s coming.
First Law of Papers [4:20]
Of course, we invoke the First Law Of Papers, which says that research is a process. Do not look at where we are, look at where we will be two more papers down the line. And two more papers down the line, I bet this will be possible in a matter of hours. This is the part with the gorillas.
An important lesson [4:43]
It is also interesting that even though there were plenty of reasons to, the researchers didn’t quit after 130 million steps. They just kept on going, and eventually succeeded. Especially in the presence of not so trivial training curves where the blocking of the other player can worsen the performance, and it’s often not as easy to tell where we are. That is a great life lesson right there. Thanks for watching and for your generous support, and I'll see you next time!
Practice exercises
Exercise 1: Analyzing an AI agent's learning curve
Review the training-step milestones mentioned in the video (130 million, 200 million, 250 million, 420 million, 1 billion) and match them to the description of the AI boxers' progress. Note the stage at which each meaningful improvement first appeared (for example, keeping balance, first punches, evasion). Consider which metrics could be tracked to measure progress more precisely.
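One way to start the exercise is to write the milestones down as data and query them. The sketch below is only an illustration: the step counts come from the narration, but the skill labels (`balance`, `glove_contact`, and so on) and the helper names are hypothetical tags invented here, and placing feints, jabs, and knockouts at the 1 billion mark is an assumption, since the video only says they appear some time after 420 million steps.

```python
# Milestones as narrated in the video; step counts are in millions.
# The skill labels are hypothetical tags chosen for this exercise.
MILESTONES = [
    (130, {"collapsing"}),
    (200, {"balance", "footwork"}),
    (250, {"glove_contact"}),
    (420, {"punches", "evasion"}),
    (1000, {"feints", "jabs", "knockouts"}),  # assumed position, see lead-in
]


def first_appearance(skill):
    """Return the earliest step count (in millions) that lists `skill`, or None."""
    for steps, skills in MILESTONES:
        if skill in skills:
            return steps
    return None


def steps_between(skill_a, skill_b):
    """Return how many million steps separate two skills, or None if either is unknown."""
    a = first_appearance(skill_a)
    b = first_appearance(skill_b)
    if a is None or b is None:
        return None
    return b - a


if __name__ == "__main__":
    for skill in ("balance", "glove_contact", "evasion"):
        print(skill, "first seen at", first_appearance(skill), "million steps")
    print("balance -> punches:", steps_between("balance", "punches"), "million steps")
```

A table like this only records what a viewer can see. Quantitative metrics such as time spent upright per episode or landed hits per round would need logs from the training run itself, which the video does not provide.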
Best quotes
"What does that mean for an intelligent agent? Well, it means that over time, it will learn to do that a little better." - Two Minute Papers
"Especially in the presence of not so trivial training curves where the blocking of the other player can worsen the performance, and it’s often not as easy to tell where we are. That is a great life lesson right there." - Two Minute Papers
"And if you have been holding on to your papers, now, squeeze that paper, because all they looked at before starting the training was 90 seconds of motion capture data." - Two Minute Papers
Key takeaways
- The AI boxers progress from clumsy collapses to knockout punches over roughly a billion training steps.
- That result took 90 seconds of motion capture data and about a week of computation.
- The same framework for training agents in a physics simulation carries over to other combat sports, such as fencing.
- Researcher persistence matters when the learning curves are hard to read.
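The "billion steps in a week" figure can be turned into a rough throughput number. The arithmetic below is a back-of-the-envelope sketch: it takes both quantities as exactly 1 billion steps and exactly 7 days, although the video calls both approximate, and the 3-hour target used for "a matter of hours" is a hypothetical value picked for illustration.

```python
TOTAL_STEPS = 1_000_000_000          # "approximately a billion" in the video
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604800 seconds


def steps_per_second(total_steps, seconds):
    """Average simulation steps processed per wall-clock second."""
    return total_steps / seconds


def speedup_needed(current_seconds, target_seconds):
    """Factor by which training must get faster to fit the target time."""
    return current_seconds / target_seconds


if __name__ == "__main__":
    rate = steps_per_second(TOTAL_STEPS, SECONDS_PER_WEEK)
    print(f"average throughput: {rate:.1f} steps/s")  # about 1653.4 steps/s

    # Hypothetical target: finish the same run in 3 hours.
    factor = speedup_needed(SECONDS_PER_WEEK, 3 * 60 * 60)
    print(f"speedup needed for a 3-hour run: {factor:.0f}x")  # 56x
```

Under these assumptions the run averages about 1,650 steps per second, and bringing it down to three hours would need roughly a 56-fold speedup.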
Difficulty level
beginner
Reason for difficulty
Concepts are explained in accessible language using analogies, without going deep into technical detail.
Prerequisites
[]
Related topics
["artificial intelligence", "machine learning", "reinforcement learning", "computer vision", "physics simulation", "robotics"]
How to apply this
- Define the training objective for the AI agent in the simulation.
- Choose a suitable reinforcement learning algorithm.
- Prepare the initial data (for example, motion capture).
- Run training while tracking performance metrics.
- Analyze the learning curves and identify problem areas.
- Iterate: adjust the parameters or the model architecture.
- Scale up the compute to reach the desired performance level.
- Test the AI agent in a variety of scenarios.
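The "analyze the learning curves" step above can be sketched in a few lines. This is a minimal illustration, not the method used in the paper: it assumes a list of logged mean rewards, smooths it with a trailing moving average, and flags the points where the smoothed value drops. The function names, the window size, and the sample numbers are all hypothetical.

```python
def moving_average(values, window):
    """Trailing moving average; early entries average over what is available."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def find_regressions(rewards, window=3, tolerance=0.0):
    """Return indices where the smoothed reward drops by more than `tolerance`."""
    smooth = moving_average(rewards, window)
    return [
        i for i in range(1, len(smooth))
        if smooth[i] < smooth[i - 1] - tolerance
    ]


if __name__ == "__main__":
    # Made-up reward log: it dips in the middle before improving again,
    # the kind of non-trivial curve the video warns about.
    logged_rewards = [0.0, 0.1, 0.2, 0.1, 0.05, 0.3, 0.5]
    print("raw dips at:", find_regressions(logged_rewards, window=1))
    print("smoothed dips at:", find_regressions(logged_rewards, window=3))
```

A larger window hides short dips and reports only sustained ones, which is usually the more useful signal when deciding whether a run has stalled or is merely noisy.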