❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambdalabs.com/papers
📝 The paper "Eureka: Human-Level Reward Design via Coding Large Language Models" is available here:
https://eureka-research.github.io/
📝 My latest paper on simulations that look almost like reality is available for free here:
https://rdcu.be/cWPfD
Or here is the original Nature Physics link with clickable citations:
https://www.nature.com/articles/s41567-022-01788-5
🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Aleksandr Mashrabov, Alex Balfanz, Alex Haro, Andrew Melnychuk, Benji Rabhan, Bret Brizzee, Bryan Learn, B Shang, Christian Ahlin, Gaston Ingaramo, Geronimo Moralez, Gordon Child, Jace O'Brien, Jack Lukic, John Le, Kenneth Davis, Klaus Busse, Kyle Davis, Lukas Biewald, Martin, Matthew Valle, Michael Albrecht, Michael Tedder, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Rajarshi Nigam, Ramsey Elbasheer, Richard Sundvall, Steef, Taras Bobrovytsky, Ted Johnson, Thomas Krcmar, Timothy Sum Hon Mun, Torsten Reil, Tybie Fitzhugh, Ueli Gallizzi.
If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers
Thumbnail background design: Felícia Zsolnai-Fehér - http://felicia.hu
Károly Zsolnai-Fehér's research works: https://cg.tuwien.ac.at/~zsolnai/
Twitter: https://twitter.com/twominutepapers
#nvidia #openai #chatgpt
Table of contents (2 segments)
Segment 1 (00:00 - 05:00)
Here is an incredible paper that teaches virtual humans to run in the most fabulous ways. So good! So, what is this? How did we get here? These are large language models, text-based AI systems that can be our smart assistants, or even draw images for us. We all know that. However, fewer Fellow Scholars know that they are also excellent at reading and writing computer code, and can thus even learn to play Minecraft. That is incredible. Just think about it: this is a text-based AI, how could it possibly control a graphical game like this? Really amazing. So, at this point, we know what these large language models can do. The age of surprises is over…or so I thought!
Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér.
Previous AI-based techniques can play Atari games, and other games, at a superhuman level. They typically do it through reinforcement learning, which means that they get a controller, look at the screen, and play like a human would. DeepMind’s techniques are excellent at this, and thus, they already do extremely well on these games. However, there is a huge problem. What is the problem? Well, look. Scores. Games have scores. And these scores can be used as feedback for the AI to understand whether it is doing well or not. High score, doing well; lower score, not so much. And whatever other task we wish it to perform, there needs to be a score. There needs to be feedback. And therein lies the problem. Each game has a different scoring mechanism, and thus, each task needs a different scoring mechanism. If we are aiming to create an intelligence of sorts, any kind of intelligence requires generality. So it has to be able to perform tasks it hasn’t seen before. But how?
Scientists at NVIDIA had a crazy idea. Let’s use these ChatGPT-ish large language models to write the code that calculates the score for these tasks by themselves. Wow. I love it. But, does it work?
I think that cannot possibly work, there are just too many kinds of tasks out there. Well, let’s have a look together. Now, little AI, write code for the score for this humanoid to run and then, train it. Well, what can I say… it works. Well, look, no one said it has to be beautiful. We run. Running is making progress moving forward. It recognized that, and this already works, and the whole thing was written by an AI.
Now, get this. Let’s try this some other way. First, little AI, design a task where forward movement is necessary. Wow. That is a fabulous way of moving forward. Now let’s give it some feedback: this looks like squat jumps, make the movement resemble running a little more. So the AI says, okay, got it, so you mean a duck walk! Well, not quite! New feedback: the torso has to be a little higher. Okay, good. But, it is using mostly one leg to hop on. Please use both legs. Goodness! Both legs are now being used, but look. Hoo boy! Back to square one. Now the torso is too low again, let’s ask it to penalize that, and…there we go. Finally, something that resembles running. It is not an easy task at all, it turns out. And this human feedback-driven solution is, I think, way better than what the AI did on its own. This intelligence of sorts is wonderful, but human intelligence is where it’s at. What a fantastic paper!
But, it gets better. In fact, here comes the best part! This concept generalizes to so many tasks, from passing balls to balancing them, teaching robots to move, and even the crowd’s favorite: spinning pens. Very impressive. But, how does this train so well and so quickly? The key is that this learns within a computer simulation, which has two advantages: one, it can’t really harm itself, and two, my favorite: in the real world, one second means one second,
Segment 2 (05:00 - 07:00)
but in a virtual world, one real second can be simulated much quicker, given a powerful computer. A little game can be simulated quicker. How much quicker? Well, in this case, this little AI is learning a thousand times quicker in a simulation than it would in the real world. So cool!
And it gets even better. Hold on to your papers, Fellow Scholars, because it can not only match the level of humans playing these games, but the evolution-based variant can even showcase superhuman performance. Wow! It matches or exceeds the level of humans on 75% of the dexterity-based tasks. That is insanity. Finally, an AI that can do these tasks and have some sort of generality. What we are seeing here is perhaps the early days of true intelligence being born. And all this within a little piece of silicon, and lots of human ingenuity. What a time to be alive!
However, it is a research work, and it is not perfect. Not even close. For instance, please don’t ask it to close your doors. Ouch.
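To make the running example from the transcript a little more concrete, here is a minimal sketch of what a score (reward) function for it might look like. This is not the code from the Eureka paper or anything the language model actually wrote. The `HumanoidState` fields, the `running_reward` name, the target torso height, and the weights are all hypothetical, chosen only to mirror the three pieces of feedback mentioned above: move forward, keep the torso higher, and use both legs.

```python
from dataclasses import dataclass


@dataclass
class HumanoidState:
    """Hypothetical snapshot of the simulated humanoid (all fields are assumptions)."""
    forward_velocity: float    # metres per second along the running direction
    torso_height: float        # metres above the ground
    left_foot_contact: float   # fraction of recent steps taken on the left foot
    right_foot_contact: float  # fraction of recent steps taken on the right foot


def running_reward(state: HumanoidState,
                   target_torso_height: float = 1.1,  # assumed value
                   w_height: float = 2.0,             # assumed weight
                   w_legs: float = 1.0) -> float:     # assumed weight
    """Score one time step: higher means 'more like running'."""
    # "Running is making progress moving forward": reward forward speed.
    progress = state.forward_velocity

    # "The torso has to be a little higher": penalize only when the torso
    # drops below the target height, never when it is above it.
    height_penalty = w_height * max(0.0, target_torso_height - state.torso_height)

    # "Please use both legs": penalize an imbalance between the two feet.
    leg_penalty = w_legs * abs(state.left_foot_contact - state.right_foot_contact)

    return progress - height_penalty - leg_penalty


if __name__ == "__main__":
    upright = HumanoidState(2.0, 1.2, 0.5, 0.5)    # balanced, torso high
    duck_walk = HumanoidState(2.0, 0.6, 0.5, 0.5)  # same speed, torso too low
    one_leg = HumanoidState(2.0, 1.2, 1.0, 0.0)    # same speed, hopping on one leg
    for name, s in [("upright", upright), ("duck walk", duck_walk), ("one leg", one_leg)]:
        print(name, running_reward(s))
```

With these made-up numbers, the upright, balanced runner scores higher than both the duck walk and the one-legged hop at the same forward speed, which is the kind of ordering the human feedback in the video is asking for. Each new piece of feedback simply becomes another term in the score.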