# NVIDIA’s New AI: Training 10,000x Faster!

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=QCllgrnk8So
- **Date:** 15.12.2024
- **Duration:** 6:23
- **Views:** 77,834
- **Source:** https://ekstraktznaniy.ru/video/17214

## Description

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambdalabs.com/papers

📝 The papers are available here:
https://hover-versatile-humanoid.github.io/
https://blogs.nvidia.com/blog/robot-learning-humanoid-development/

📝 My paper on simulations that look almost like reality is available for free here:
https://rdcu.be/cWPfD 

Or use the original Nature Physics link with clickable citations:
https://www.nature.com/articles/s41567-022-01788-5

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Alex Balfanz, Alex Haro, B Shang, Benji Rabhan, Gaston Ingaramo, Gordon Child, John Le, Juan Benet, Kyle Davis, Loyal Alchemist, Lukas Biewald, Martin, Michael Albrecht, Michael Tedder, Owen Skarpness, Richard Sundvall, Taras Bobrovytsky, Thomas Krcmar, Tybie Fitzhugh, Ueli Gallizzi.
If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers

My research: https://cg.tuwien.ac.at/~zsolnai/
X/Twitter

## Transcript

### Segment 1 (00:00 - 05:00)

Robotics is not really working. Yet. And one of the reasons why is that there is simply not enough data for robots to learn from. The other problem is that we humans don’t have time. But no matter, because this work speeds up time by 10,000 times. Yes, really, but not in the way you think.

To teach an AI English, it can read the whole internet. There is tons of data. But for robot arms and humanoid robots, not so much. Since data is the engine of all AI applications, this is not great news. Now here are two amazing new, unexpected research works that might make them work sooner than we think.

You see, one very good way to teach robots is to have them learn directly from humans. Human demonstrations, that is. After one human demonstration, the robot kind of appears to be doing the task: things are being grabbed here, but it moves slowly and isn’t too confident. And after a thousand demonstrations, let’s see... now we are talking! This works so much better.

Okay, so the number of demonstrations really matters. But there is a problem. What is the problem? Well, the problem is that nobody is going to do the same task a thousand times to teach a robot something this simple.

So here is a crazy idea, from a new paper called SkillGen. What does it do? Well, it takes a look at 10 human demonstrations and generates 200 or even more out of them. So cool! But does it work? Let’s have a look.

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér.

When learning from 200 demonstrations, the robot can be successful roughly 3 times out of 10, but after 5,000 demonstrations, about 8 times out of 10. Huge difference. We just went from unusable to promising with just one idea. Loving it.

And here comes the best part that I did not expect at all: these synthetic demonstrations give us comparable results to actually having hundreds of human demonstrations.
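To make the "10 demonstrations in, 200 out" idea concrete, here is a deliberately toy sketch of demonstration augmentation: jitter the waypoints of a few seed trajectories to mass-produce variants. All names here are hypothetical, and this is *not* SkillGen's actual method (which adapts demonstration segments in simulation); it only illustrates the general shape of the data-multiplication trick.

```python
import numpy as np

def augment_demos(demos, n_out, noise_std=0.01, seed=0):
    """Toy demonstration augmentation: produce n_out synthetic
    trajectories by adding small Gaussian jitter to a handful of
    seed demonstrations. Illustrative only, not SkillGen's
    actual segment-adaptation pipeline."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for i in range(n_out):
        base = demos[i % len(demos)]               # cycle through seed demos
        jitter = rng.normal(0.0, noise_std, size=base.shape)
        synthetic.append(base + jitter)            # perturbed copy
    return synthetic

# 10 seed demonstrations, each a trajectory of 50 3-D waypoints
seed_demos = [np.zeros((50, 3)) for _ in range(10)]
out = augment_demos(seed_demos, n_out=200)
print(len(out))  # 200 synthetic trajectories from 10 seeds
```

In practice, naive jitter like this often produces physically invalid motions; the point of the paper is precisely that smarter generation keeps the synthetic data almost as useful as real demonstrations.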
So the synthetic generated demonstrations can be close to as good as the real ones. I think that is unbelievable. Wow.

Now, the other problem is that we humans don’t really have time to wait for these robots to learn these tasks. So, how could that be addressed? We can’t just speed up time. Except that we can. Here’s how.

We can build a simulation environment for robots and run it on a powerful computer that runs the simulation quicker than real time. How much quicker? Well, for every one second in real time, this can simulate about 10,000 seconds. Wow. It can get a year’s worth of learning done in one hour!

That is great; however, it is still not good enough. We have more data problems. You see, human demonstrations are nice, but there are so many ways to record them. For instance, you can have a virtual reality headset that tracks your head and, if you are lucky, your hands too; or just a camera put down somewhere that records the motion; or an exoskeleton that gives you the movement of most of the joints in the body; or maybe only robot arm data with no lower-body info. This is a huge mess. How are we supposed to learn from this huge soup of data?

Well, this new paper, called HOVER, gives us a way to do exactly that. Yes, it can take all of these control modes, and it is able to train one unified controller to move virtual, and even real, humanoid robots around. That is absolutely incredible.

But we are not done yet. We said that speeding up time, which is basically a superpower, needs a powerful computer. And unfortunately, that alone is not enough.

If we used heavyweight neural networks like ChatGPT, we wouldn’t stand a chance, because computing the next move would just take too much time. These networks have hundreds of billions of parameters inside, and even a small neural network has about 500 million.
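Before moving on, the 10,000x speed-up claim mentioned above is easy to sanity-check with back-of-the-envelope arithmetic: one wall-clock hour at that rate is about 10,000 simulated hours, which is indeed a bit more than a year of experience.

```python
SPEEDUP = 10_000                       # simulated seconds per real second
real_seconds = 3_600                   # one wall-clock hour
sim_seconds = real_seconds * SPEEDUP   # 36,000,000 simulated seconds

SECONDS_PER_YEAR = 365.25 * 24 * 3_600
sim_years = sim_seconds / SECONDS_PER_YEAR
print(f"{sim_years:.2f} simulated years per real hour")  # ~1.14
```

So "a year's worth of learning in one hour" is not a figure of speech; it falls straight out of the multiplier.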
Yes,  now hold on to your papers Fellow Scholars,   because all this needs is not  hundreds of billions of parameters,   and not even 500 million parameters. Only  1. 5 million. It’s almost unbelievable

### Segment 2 (05:00 - 06:00)

how small that is by today’s standards. Your phone can run that easily; it eats it up like a lollipop, and probably even your smart watch could too. And yet it can pull off these control tasks. I am completely stunned by this. Wow. What a paper!

So the robots of the future will be able to learn from very few human demonstrations, generate their own demonstrations to learn from, enter a world where time is sped up 10,000 times compared to ours, and they will be able to learn from absolutely anything. So I hope that this will lead to those helpful little robots that will finally fold my laundry while I read my papers. What a time to be alive!

So, what do you think? What would you Fellow Scholars use this for? Let me know in the comments below.
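As a closing sanity check on the parameter counts from the video: stored as 32-bit floats, a 1.5-million-parameter controller weighs in at only a few megabytes, which is why phone- or watch-class hardware is plausible. (The 100-billion and 500-million figures are rough orders of magnitude from the narration, not measured model sizes.)

```python
BYTES_PER_PARAM = 4  # fp32 weight storage

def model_mb(n_params):
    """Approximate fp32 weight storage in megabytes."""
    return n_params * BYTES_PER_PARAM / 1e6

for name, n in [("LLM-scale (100 B params)", 100e9),
                ("small net (500 M params)", 500e6),
                ("HOVER-scale (1.5 M params)", 1.5e6)]:
    print(f"{name}: {model_mb(n):,.0f} MB")
# LLM-scale: 400,000 MB; small net: 2,000 MB; HOVER-scale: 6 MB
```

Six megabytes of weights versus hundreds of gigabytes is the gap between "needs a datacenter" and "runs on anything," which is the point the video is making.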
