This Curious Robot Should Be Impossible!

6:57

Two Minute Papers · 01.01.2024 · 258,899 views · 8,577 likes

Video description

❤️ Check out Weights & Biases and sign up for a free demo here: https://wandb.me/papers

📝 The paper "Curiosity-Driven Learning of Joint Locomotion and Manipulation Tasks" is available here: https://openreview.net/pdf?id=QG_ERxtDAP-

📝 My latest paper on simulations that look almost like reality is available for free here: https://rdcu.be/cWPfD
Or this is the orig. Nature Physics link with clickable citations: https://www.nature.com/articles/s41567-022-01788-5

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Alex Balfanz, Alex Haro, B Shang, Benji Rabhan, Bret Brizzee, Gaston Ingaramo, Gordon Child, Jace O'Brien, Jie Yu, John Le, Kyle Davis, Lukas Biewald, Martin, Michael Albrecht, Michael Tedder, Owen Skarpness, Richard Putra Iskandar, Richard Sundvall, Taras Bobrovytsky, Ted Johnson, Thomas Krcmar, Tybie Fitzhugh, Ueli Gallizzi.
If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers

Thumbnail background design: Felícia Zsolnai-Fehér - http://felicia.hu
Károly Zsolnai-Fehér's research works: https://cg.tuwien.ac.at/~zsolnai/
Twitter: https://twitter.com/twominutepapers

Table of contents (2 segments)

Segment 1 (00:00 - 05:00)

Here you see an incredible new robot that learned to explore, stand up, and even handle packages, and more. Now wait a second. What you see here should be impossible. Why is that? Why is this impossible? Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér.

Well, to understand what is going on here, let's look into the popular large language models first. You see, when training a large language model to understand English and be a good assistant, we can give it tons and tons of data from the internet to learn on. For instance, the early GPT-2 read Amazon reviews and learned to write new ones. This was possible because tons and tons of training data was available. However, this should be impossible for a real robot in the real world. Why is that?

Well, have a look at this paper where scientists at NVIDIA tried to teach a software agent to run. And it kind of learned to run okay, but after quite a bit of training. At first, it looked like this. Now, letting a robot loose in a lab while flailing around would not work too well, because it would injure itself and its environment before any meaningful learning could happen. So in software, there is plenty of training data to learn from, but for robotics, there is no data. None. Or even if there is, just not enough to learn anything.

So how can we solve this? Well, the solution is my favorite, which is learning inside a simulation: letting the robot play around in a video game. Here they are doing reinforcement learning, which means truly playing a video game. Here, you do something, and if you do well, you get a score, a reward. If you don't do so well, you get nothing, or a negative score.

Now, we also need a little magic sauce here to make sure that the robot is curious and wants to explore and understand the world around it. Normally, that is not that difficult. For instance, DeepMind's agents can play video games as they are yearning for a high score, so they will go around and explore. However, ha, there is a huge problem. A TV problem. Yes, you heard it right, the AI has a TV problem. Agents like these can encounter a TV in a virtual world, and then this happens. Yes, they get addicted to it and they never want to leave. Now that is real humanlike behavior if I've ever seen one. This happens because the agent is getting new information all the time, and it's much more interesting than these otherwise boring levels.

So how did they make this little robot curious? Well, by engineering the rewards of the game in a way that, for instance, if the angle or the velocity of the door changes, it gets a reward. Thus, look, it starts experimenting with it. Now, new task: if we wish to get it to move boxes around to where they belong, we can craft a reward involving the velocity of the box and the distance from the container.

Now hold on to your papers, Fellow Scholars, because this is the moment of truth. Let's jump from the video game world into the real world and see how this little robot fares. And whoa, look at that! It can navigate around in the real world competently, and it can even stand up. It now opens those doors and can handle packages. Well, as you see, it handles them at least as gently as some of the delivery services out there. Another humanlike trait. Perfect. Now, jokes aside, by further tailoring these reward functions, I would imagine that it could be much gentler with those packages and even doors just one more paper down the line.

And also, bravo! What has been achieved here is extremely hard. Why? Well, because we cannot just make a simpler video game for this robot to learn on. It has to be good enough to even work in the real world after all. And if the knowledge from here does not translate properly to here, things can break really quickly. This was an amazing feat. And just imagine that now we have better and better tools to create virtual worlds. A lot of our episodes are

Segment 2 (05:00 - 06:00)

on new papers that improve these potential simulation environments, and every single one of those could be one step closer for us to have these little AIs train in these virtual worlds and then come out into the real world, help us out, and do all this safely. For instance, these little robots could help us with last mile delivery. The idea behind this paper could also help us create better self-driving cars that can safely train in an environment where we can make crazy hard levels for it, and with a powerful computer, have it train for years and years in computer simulation time, not in real time, and then come out as a safe and competent AI agent. What a time to be alive! And this is not science fiction. Not at all. In an earlier paper, we saw NVIDIA's AI train for 33 years in simulation time to play in Minecraft, and boy, did it learn a great deal in there.

Now, this new work, of course, does not come without limitations. Hand engineering reward functions is a bit of a limitation here, which means that we have to write these rewards ourselves. So if we need it to perform a new task, it needs a new reward function for that, limiting the generality of the agent. An earlier paper we talked about here could potentially remedy that, as it is about teaching these AI agents to find out what a good score would be.

Experiment tracking, model evaluation, and production monitoring for your deep learning projects and LLM apps. This is what Weights & Biases does, and it is the best. Everyone is using it. Try it out now at wandb.me/papers or click the link in the description below.
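
Editor's sketch: the hand-engineered rewards mentioned in the video (a reward when the door's angle or velocity changes, and a box reward involving the box's velocity and its distance from the container) might look roughly like the Python below. This is only an illustration under stated assumptions, not the paper's actual reward. The function names, the weights, the use of absolute differences, and the choice to penalize distance are all hypothetical; the video only says which quantities are involved.

```python
import math


def door_curiosity_reward(prev_angle, angle, prev_velocity, velocity,
                          angle_weight=1.0, velocity_weight=0.1):
    """Hypothetical reward that is positive whenever the door's state changes.

    The weights and the absolute-difference form are assumptions made for
    this sketch; the video only states that a change in the door's angle
    or velocity earns a reward.
    """
    angle_change = abs(angle - prev_angle)
    velocity_change = abs(velocity - prev_velocity)
    return angle_weight * angle_change + velocity_weight * velocity_change


def box_task_reward(box_velocity, box_position, container_position,
                    velocity_weight=0.1, distance_weight=1.0):
    """Hypothetical reward combining box speed and distance to the container.

    Assumption: a moving box is rewarded (the robot is interacting with it)
    and distance from the container is penalized. The real combination used
    in the paper is not given in the video.
    """
    speed = math.hypot(*box_velocity)                       # magnitude of the velocity vector
    distance = math.dist(box_position, container_position)  # Euclidean distance
    return velocity_weight * speed - distance_weight * distance
```

With these assumed forms, a door that does not move yields zero reward, and a box closer to the container scores higher than one farther away. Writing a separate function per task also makes the limitation from the video concrete: every new task needs its own hand-written reward.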
