OpenAI's NEW "AGI Robot" STUNS The ENTIRE INDUSTRY (Figure 01 Breakthrough)
19:49


TheAIGRID · 13.03.2024 · 2,559,569 views · 34,740 likes


Video description
✉️ Join My Weekly Newsletter - https://mailchi.mp/6cff54ad7e2e/theaigrid 🐤 Follow Me on Twitter https://twitter.com/TheAiGrid 🌐 Check out My website - https://theaigrid.com/ Links From Today's Video: https://twitter.com/Figure_robot/status/1767913661253984474 Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos. Was there anything I missed? (For Business Enquiries) contact@theaigrid.com #LLM #Largelanguagemodel #chatgpt #AI #ArtificialIntelligence #MachineLearning #DeepLearning #NeuralNetworks #Robotics #DataScience

Study guide for this video

Structured summary

The OpenAI Figure 01 robot: a revolution in autonomy and communication

We break down the breakthrough demonstration of the Figure 01 robot built with OpenAI. Learn about its autonomy, multimodal AI, speech capabilities, and the technical details that stunned the industry.

Table of contents (9 segments)

Intro

This is genuinely, by far, the most surprising AI demo I've ever seen in my entire life, and you're about to see exactly why. OpenAI's new humanoid robot, built in partnership with Figure, is absolutely incredible. Take a look at the demo, and then we can dive into all the technical details of how this really works.

Demo

"Hey Figure 01, what do you see right now?" "I see a red apple on a plate in the center of the table, a drying rack with cups and a plate, and you standing nearby with your hand on the table." "Great, can I have something to eat?" "Sure thing." "Great, can you explain why you did what you just did while you pick up this trash?" "On it. So, I gave you the apple because it's the only, uh, edible item I could provide you with from the table." "Great, so based on the scene right now, where do you think the dishes in front of you go next?" "The dishes on the table, like that plate and cup, are likely to go into the drying rack next." "Great, can you put them there?" "Of course." "All right, so how do you think you did?" "I think I did pretty well. The apple found its new owner, the trash is gone, and the tableware is right where it belongs." "I agree. Thanks, Figure 01." "You're welcome. If there's anything else you need help with, just let me know."

So there we have it, ladies and gentlemen. That was the demo that actually did shake the entire industry, or shock the entire industry, whichever word you want to use. And I have to be honest with you guys, this is by far the most surprising thing I've seen all year, and by far the most impressive demo I've seen.

Company Progress

Now, one thing you do have to give hats off to is the company Figure, because they have moved at lightning speed. This company is only 18 months old, meaning it's been one year and six months since its inception, and my oh my, just look at the amount of progress they've made. They've gone from having absolutely nothing to building a working humanoid robot that is able to complete these tasks using its vision model with an end-to-end neural network, all while being able to talk to you as it completes them. Now, before I give my entire reaction to this, because I have a lot to say about it, let's dive into some of the technicals of this robot that's being developed in this AGI lab.

Technicals

The first thing that was tweeted, by someone who works there, said: let's break down this video; all behaviors are learned, not teleoperated, and run at normal speed. One thing you might not have known is that when robot demos have been done in the past, the footage has often been sped up, because robots are kind of slow, and they have been slow in the past because making them fast is quite a difficult task, so the video is accelerated just to show you what the robot is capable of doing. However, this recent demo was all done at 1x speed; that means what you saw wasn't sped up at all, and everything was done in real time. Secondly, the tweet also states that all behaviors are learned and not teleoperated. Teleoperation is the process of having a human control the robot via a VR controller, or it could be a VR headset, with that movement mapped onto the robot in a demo to show what the robot is physically capable of. However, this is a robot working with an end-to-end neural network, meaning that 100% of the behavior of this robot is entirely autonomous.

They also state that they feed images from the robot's cameras, and transcribed text from speech captured by onboard microphones, to a large multimodal model trained by OpenAI that understands both images and text. The model processes the entire history of the conversation, including past images, to come up with language responses, which are spoken back to the human via text-to-speech. The same model is responsible for deciding which learned, closed-loop behavior to run on the robot to fulfill a given command, loading particular neural network weights onto the GPU, and executing a policy. So essentially, this AI system is able to recognize what's going on in the environment; once it works out from the human's speech what the human wants, it can decide on a policy from its library of existing policies, and then execute that policy with its reasoning. It's really, really impressive, because the image processing actually does involve common-sense reasoning: the robot, with its vision, is able to make sense of its surroundings using the camera, so it doesn't just see images, it understands them in a way that lets it reason about what is happening or what it needs to do next.

And of course, one of the really cool features was the text-to-speech. The robot can respond to humans by converting the text it outputs, its reasoning, into spoken words, so it can carry on a conversation with a person. Now, one thing I really did like about the text-to-speech, and I just want to touch on it quickly, was how coherent and how human it sounded. In fact, it sounded so human that most people might be thinking the voices you heard from this Figure robot were just a human recording, and it could well be that, but I want to tell you guys, and I'll show you a demo later, that having humanoid robots that sound that human is completely possible. I've experimented with some software in the past that sounds 100% human-realistic, as in the robots say "uh", they say "um", and this is something that can actually be done. I know it sounds so realistic, but it is possible; it could be a model that OpenAI hasn't released yet. But let's continue with the technicals before we dive into some speculation.

In addition, one thing they state here is that it has a whole-body controller, so the robot can move in a controlled, stable way, ensuring it doesn't topple over or make unsafe movements; it's like having an inner sense of balance and an understanding of how to move all of its body parts in harmony. They also state that it has 200 Hz actions and 1 kHz joint torques, meaning the robot's actions are updated 200 times per second and the forces at its joints are updated a thousand times per second. This means the robot can make very smooth and precise movements, reacting quickly to changes. The whole system is designed to operate seamlessly: the robot can understand both the visual and spoken aspects of the environment, decide how to respond in both speech and action, and execute those responses in real time without being controlled by a human while it's operating autonomously. The behaviors it shows in the video are learned from training, not programmed for each specific interaction, and that allows it to process and react to information very quickly.
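To make that description a little more concrete, here is a minimal, hypothetical Python sketch of the kind of loop the tweet describes. Every name in it (ConversationState, control_loop, policy_library, and so on) is invented for illustration; this is not Figure's or OpenAI's actual code or API. It only shows the flow: camera frames and transcribed speech go into one multimodal model, which produces a spoken reply and chooses which learned closed-loop behavior to load onto the GPU and execute.

```python
# Hypothetical sketch of the pipeline described above (all names are invented,
# not Figure's or OpenAI's actual interfaces): camera frames and transcribed
# speech feed one multimodal model, which replies in text and picks a learned
# closed-loop policy whose weights are then loaded onto the GPU and executed.

from dataclasses import dataclass, field


@dataclass
class ConversationState:
    # Full history of images and utterances, so the model can reason over
    # everything that has been seen and said so far.
    images: list = field(default_factory=list)
    utterances: list = field(default_factory=list)


def control_loop(camera, microphone, speaker, multimodal_model, policy_library, gpu):
    state = ConversationState()
    while True:
        state.images.append(camera.latest_frame())
        heard = microphone.transcribe_if_speech()          # speech -> text
        if heard:
            state.utterances.append(("human", heard))

        # One model does both jobs: produce a spoken reply and choose which
        # learned closed-loop behavior should run next.
        reply_text, behavior_name = multimodal_model.respond(state)

        if reply_text:
            speaker.say(reply_text)                        # text-to-speech
            state.utterances.append(("robot", reply_text))

        if behavior_name:
            policy = policy_library[behavior_name]         # e.g. "place_on_rack"
            gpu.load_weights(policy.weights)
            policy.execute()                               # runs closed-loop on the robot
```

The key point the tweet makes is that a single model handles both the language response and the choice of behavior; the behaviors themselves are separate, pre-trained policies that run closed-loop once selected.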

New Capabilities

There's another tweet here breaking it down, from someone who is working on the robot. They said that connecting Figure 01 to a large multimodal model gives it some interesting new capabilities. Figure 01 can now describe its surroundings, use common-sense reasoning when making decisions (for example, "the dishes on the table, like that plate and cup, are likely to go into the drying rack next"), translate ambiguous, high-level requests like "I'm hungry" into some context-appropriate behavior, like handing the person an apple, and then describe why it executed that particular action in plain English, for example, "it was the only edible item I could provide you with from the table."

So I think this demo of, you know, being able to hand someone an apple might seem pretty basic, but it does show us advanced reasoning capabilities, because if someone says "I'm hungry", you have to realize: okay, if this person is hungry, that means they want food; I have to scan the environment for any sources of food; I see an apple; then I can select the policy to hand that person an apple; and then I have to grab the apple and hand it over. So it's common-sense reasoning when making decisions, which is a key step up from what we've seen before; it's not just, you know, showing off the dexterity and the torque of the robot. And the implications of this are really cool, because now we have a robot that can make educated guesses about what should happen next based on what it sees. For instance, as we saw, if it sees dishes on a table, it can infer that they are likely to be placed in a drying rack afterwards, similar to how you might see dirty dishes and think that they need to be washed.

Another tweet here, from Corey, says that a large pre-trained model that understands conversation history gives Figure 01 a powerful short-term memory. Consider the question "can you put them there?" What does "them" refer to, and where is "there"? Answering correctly requires the ability to reflect on memory. With the pre-trained model analyzing the conversation's image and text history, Figure 01 quickly forms and carries out a plan: one, place the cup on the drying rack; two, place the plate on the drying rack. So essentially, this tweet is once again describing how the common-sense reasoning works, because with that memory it's able to, you know, recall what has happened before and then make educated guesses based on that.
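As a small illustration of why that history matters, here is a purely hypothetical Python sketch; the frame file names and the prompt format are invented, not Figure's or OpenAI's actual interface. It just shows that when the whole image and text history is packed into the model's input, "them" can be resolved to the plate and cup, and "there" to the drying rack, from earlier turns.

```python
# Hypothetical illustration (not Figure's actual prompt format) of how the
# conversation history acts as short-term memory: "them" and "there" can only
# be resolved by looking back at earlier turns and images.

history = [
    {"image": "frame_001.jpg", "speaker": "human",
     "text": "Based on the scene right now, where do you think the dishes "
             "in front of you go next?"},
    {"image": "frame_002.jpg", "speaker": "robot",
     "text": "The dishes on the table, like that plate and cup, are likely "
             "to go into the drying rack next."},
    {"image": "frame_003.jpg", "speaker": "human",
     "text": "Great, can you put them there?"},
]


def build_prompt(history):
    # The entire image/text history is handed to the multimodal model, so
    # "them" resolves to the plate and cup, and "there" to the drying rack.
    lines = []
    for turn in history:
        lines.append(f"[image: {turn['image']}] {turn['speaker']}: {turn['text']}")
    return "\n".join(lines)


print(build_prompt(history))
```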

How It Works

Here's where we take a look at some of these things in more detail. There is a tweet by Corey, who works on the AI at Figure, and I'm going to break it down for you. This tweet is essentially about how the robot carries out complex tasks with its hands, which involves handling, holding and manipulating objects in a very refined way; that is what we call bimanual manipulation, because it uses both hands.

First things first, you have the neural network visuomotor transformer policy. You can basically imagine that the robot's brain has a special part that takes the pictures it sees through its cameras and turns them directly into actions, like moving its arms or its fingers, and this part uses something called a transformer, a type of neural network that is really good at handling sequences of data, in this case sequences of images over time. Essentially we've got mapping pixels to actions: the robot doesn't just see images, it interprets the visual information, the pixels, to decide which actions its hands and fingers should take. Then there are the onboard images at 10 Hz and actions at 200 Hz: the robot's cameras capture images ten times per second, and based on those images the robot updates what it's doing 200 times per second, so there is a slight delay between seeing and acting, but it's very quick, much faster than a human can perceive. And of course we've got the 24-degrees-of-freedom actions: in robotics this refers to how many different ways a robot can move, and here it means the robot can adjust the position of its wrists and the angles of its fingers in 24 unique ways to grasp and manipulate objects. Then there are the high-rate set points for the whole-body controller: the actions the robot decides to take are like targets or goals that the robot aims to reach, and those set points are used by the whole-body controller, which operates at an even higher speed, to make sure the robot's entire body moves in coordination with the actions of the hands.

And of course, there's the separation of concerns, a term that means dividing a complex problem into smaller, more manageable parts. In this case, the internet-pre-trained models act like the robot's high-level thinking, using common sense to make plans based on what it sees and hears; the learned visuomotor policies are the robot's reflexes, allowing it to perform tasks that are too complex to program by hand, like adapting to the unpredictable movement of a squishy bag; and the whole-body controller is like the robot's sense of balance and self-preservation, ensuring that whatever actions the hands are performing, the robot stays stable and doesn't fall over or move unsafely. So in simpler terms, what we have here is a robot designed to perform complex tasks with its hands by seeing and acting in a very rapid and sophisticated manner, with different parts of its brain focusing on different aspects of the task to ensure it is done smoothly and safely.
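To make those rates concrete, here is a small, hypothetical Python sketch of the hierarchy described above; all of the names are invented for illustration and none of this is Figure's actual controller code. It shows a 10 Hz camera feed, a policy emitting 24-DoF set points at 200 Hz, and a 1 kHz whole-body loop tracking those set points.

```python
# Hypothetical sketch (invented names, not Figure's code) of the rate
# hierarchy described above: images at 10 Hz, policy actions at 200 Hz,
# whole-body control at 1 kHz, all driven from one 1 kHz tick.

import time

VISION_HZ = 10       # camera frames per second
POLICY_HZ = 200      # action (set point) updates per second
CONTROL_HZ = 1000    # whole-body / joint-torque updates per second


def run(camera, visuomotor_policy, whole_body_controller, duration_s=5.0):
    """Illustrative only: three nested update rates in a single loop."""
    latest_image = camera.read()
    setpoints = [0.0] * 24                       # 24-DoF wrist + finger targets
    start = time.monotonic()
    tick = 0
    while time.monotonic() - start < duration_s:
        # 10 Hz: grab a fresh camera frame every 100th tick.
        if tick % (CONTROL_HZ // VISION_HZ) == 0:
            latest_image = camera.read()

        # 200 Hz: every 5th tick, the visuomotor policy maps pixels to
        # new 24-DoF set points for the wrists and fingers.
        if tick % (CONTROL_HZ // POLICY_HZ) == 0:
            setpoints = visuomotor_policy.act(latest_image)

        # 1 kHz: the whole-body controller tracks the set points while
        # keeping the robot balanced and stable.
        whole_body_controller.track(setpoints)

        tick += 1
        time.sleep(1.0 / CONTROL_HZ)             # crude pacing, illustration only
```

The separation of concerns the tweet describes falls out of this structure: the slow, pre-trained model plans, the 200 Hz policy handles dexterity, and the 1 kHz controller keeps the robot stable regardless of what the hands are doing.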

What's Going On

Now that we've broken down most of the technical aspects of this story, I think it's rather important to discuss what's going on here, because, like I've said, this is by far one of the most shocking things I've seen this year in terms of being able to surprise me. I did know that OpenAI and Figure were working on this, but I didn't think that when it arrived it would be so realistic. One of the first things I noticed, of course, was the voice, and a lot of people were commenting on it. I saw comments saying, you know, "this is VR piloted and scripted, ChatGPT doesn't use pause words", and someone said "it's using OpenAI's text-to-speech, which you can hear if you use the conversation mode in the ChatGPT app", and someone else responded, "I use this all the time and it doesn't include pauses." Here's the thing: I can probably guarantee, well, I can't guarantee of course, I don't work at OpenAI, but I can probably bet that OpenAI are using a different model for this, because previously, when they discussed future updates for GPT-5 and GPT-4.5, if that even is a thing, they spoke about how they wanted to decrease the latency, which means the AI system talking back to you is going to get a lot faster and a lot more human-like. So what they are probably using is an updated version that is potentially specific to this kind of robot, because the type of system we might be getting in future updates to ChatGPT likely won't be the same as this one; this one needs to be really quick, and it's dealing with, you know, an entirely separate system. So I do think that is 100% possible, and I think it's rather striking that people are this skeptical, you know, someone saying "VR piloted", "VR teleoperated and scripted", and of course "ChatGPT doesn't use pause words." If you've ever used an AI voice generator, if you've used ElevenLabs or a different kind of technology, you'll know that sometimes the AIs can actually glitch out and use pauses; if I do manage to find an example of an AI doing this, I will include it, but it is something I have seen happen multiple times, and it wouldn't be that hard for OpenAI to get this done, because OpenAI has a very good team, and whatever they work on, they are going to excel at. We've seen them excel at video, and video isn't even their main focus; of course they're an AI company, and one of the main things they did focus on was large language models, but it goes to show that when OpenAI puts their mind to something, they're able to execute in a way that shows us they really are the market leaders.

Now, another thing I noticed that was really fluent and fluid was how well the robot moved in terms of placing items down. When it placed the plate down, I found it so incredible that it was able to slot the plate in smoothly just like that, and not only that, when it was moving the trash away, I thought the movement was very human-like; it didn't really seem like a robotic kind of grasp, it seemed as if someone was, you know, throwing things away in this very fluid, dynamic, human-like way, and then it even manages to move the basket back over to the person. So I thought that bit was very human-like. Those were the two main things that made me think, wow, this is really, really surprising, and the future is going to be shocking, because it was only a couple of months ago, in fact it wasn't that long ago, that we had some very basic updates on this robot.

What's Next

You know, this was autonomous, and you can see speed versus human: 16.7%. That is where this robot was. Now, as for what's next for this company, in terms of where they're heading, right now they are standing at a very good point in terms of development, but I think what we'll probably see next is this robot working on its movement a little bit more. Whilst, yes, it now has this vision system on board and it's able to talk with natural language, which is a major step up because it's able to perceive the environment and reason about it, I think what will happen now is that they'll probably work on the speed of the legs, because that does seem to be something that was quite slow. Now, there was an update since then which did show a dynamic walking improvement in speed, and you can see it was a little bit faster; of course, humans walk a lot faster than this, and we've seen that the Tesla bot can walk a lot faster than this, but I do think this team is moving so quickly that, by the end of the year, if we have this thing, I wouldn't say running, but moving at 100% human speed and holding real-time conversations like it does, but even faster, I genuinely would not be surprised. So that is where I think this is heading next, because

Final Thoughts

whilst these, you know, demos here are really cool, I think watching the robot walk around in an environment and then potentially, you know, update its policies in real time for a dynamic environment will be super fascinating, because with a lot of demos what we see is a preset environment; everything seems like it's preset, but what would really be something is for us to see a robot go into a kitchen it has never seen before and have it dynamically adjust its policies in that new environment. But to be honest with you guys, this is by far the most impressive robotics thing we've all seen, and considering this lab only came out 18 months ago, this shows us the rate of acceleration that is possible when you truly put your mind to something. Boston Dynamics have some real competition on their hands, and Tesla's Optimus really does too, because this is, you know, the first embodied AGI system that we've seen, and honestly, guys, this looks really good. If this company continues at the rate they're going, they could definitely dominate the market, because they have something that actually works: we're seeing continuous demos, and we're seeing something that, you know, sounds realistic and is able to reason very well.

Another thing I did see was that the people working on this model didn't actually state that it uses GPT-4, which means what we could have here is a situation where they're using a completely different model. It could be an updated version of GPT-4, GPT-4.5, it could be GPT-5, but the tweet just says that connecting Figure 01 to a large pre-trained multimodal model gives it new capabilities, and that it has advanced, uh, common-sense reasoning. So of course that is speculation, but maybe it is a different kind of model, because it just says an OpenAI model, it doesn't say GPT-4; if it was GPT-4, they would have said it's GPT-4. So I'm guessing the OpenAI model they're referring to here might be a model that is specific to robotics, probably fine-tuned for robotics and, you know, understanding how everything works, or it could be an advanced system that OpenAI has been working on for the last year and a bit.

So with that being said, if you did enjoy this video, let me know if you think this is crazy, whether this is something that is immediately going to replace bartenders and, you know, people who bag groceries or people who work in certain stores. I have no idea what the future is going to look like, because what we're seeing here is remarkable and shocking and stunning and just all sorts of incredible. So with that being said, let me know what you thought about this entire demo, whether this is something that you're excited about, and I'll see you guys in the next one.
