# Googles's NEW INSANE PALM-E SHOCKS The Entire Industry! (PaLM-E Google ANNOUNCED!)(Multimodal)

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=2BYC4_MMs8I
- **Date:** 12.03.2023
- **Duration:** 9:09
- **Views:** 309,731
- **Source:** https://ekstraktznaniy.ru/video/14945

## Description

Googles's NEW INSANE PALM-E SHOCKS The Entire Industry! (PaLM-E Google ANNOUNCED!)(Multimodal)

On Monday, a group of AI researchers from Google and the Technical University of Berlin unveiled PaLM-E, a multimodal embodied visual-language model (VLM) with 562 billion parameters that integrates vision and language for robotic control. They claim it is the largest VLM ever developed and that it can perform a variety of tasks without the need for retraining.

According to Google, when given a high-level command, such as "bring me the rice chips from the drawer," PaLM-E can generate a plan of action for a mobile robot platform with an arm (developed by Google Robotics) and execute the actions by itself. (Benj Edwards)

https://ai.googleblog.com/2023/03/palm-e-embodied-multimodal-language.html
https://palm-e.github.io/

Business enquiries: sponorships@theaigrid.com

## Transcript

### Segment 1 (00:00 - 05:00) []

Google's AI is absolutely insane. They announced PaLM-E just three days ago, and the researchers have built a robot driven by a model with 562 billion parameters, a robot that doesn't even need constant retraining. Take a look at this footage, where they actually ask the robot to bring them the rice chips from the drawer. This is insane, and you might not understand exactly why, but trust me, this is a next-level robot. You can see right here that they even decide to add some minor disturbances just to see how well the robot copes with real-world scenarios, and it handles the task very well. It's honestly so interesting how far we've come with AI, and Google's new robot here is truly groundbreaking, because it can handle these tasks without further input and without being trained again, which means you can actually give it new tasks. This kind of robot is definitely going to shake up a lot of industries if we get it at scale.

If we go to Google AI's research blog, you can see that they say: "Today we introduce PaLM-E, a new generalist robotics model that overcomes these issues by transferring knowledge from varied visual and language domains to a robotic system." Basically, they are transferring all that great knowledge they have from text data into an actual physical robot. This is the project page where you can read more about it, and it says: "In the first video we execute a long-horizon instruction, 'bring me the rice chips from the drawer,' which includes multiple planning steps as well as incorporating visual feedback from the robot's camera." They also show another example where the instruction is "bring me a green star," and the green star is an object the robot wasn't directly exposed to. The robot hadn't really seen it before; it generalizes from the broad data the underlying model was originally trained on. This is the clip we just looked at, and it's definitely very interesting, but remember that this footage is at four times speed, so the robot is actually quite slow here.

The second example is also very interesting: "In the following part we show PaLM-E controlling a tabletop robot arranging blocks. We show that PaLM-E can successfully plan over multiple stages based on visual and language input; our model is able to successfully plan a long-horizon task, 'sort blocks by colors into different corners.'" You can see that it performs this task very well, which is really interesting, because these AI robots are advancing very fast in terms of the rate of progress. It truly is a little bit scary, because it feels like we were only just thrust into the world of AI last year with the rise of ChatGPT, DALL-E, Stable Diffusion, and many other things.

You can also see that the first instruction here is "move the remaining blocks to the group," and the model then sequences step-by-step commands to the low-level policy, such as "move the yellow hexagon to the green star" and "move the blue triangle to the group." So it's definitely very interesting to see exactly how the model breaks a high-level instruction down into individual commands.
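To make that loop a bit more concrete, here is a rough sketch of the behavior being described: look at the latest camera frame, ask the model for the next low-level command, execute it, and repeat. PaLM-E's code is not public, so every name in this sketch (`palm_e`, `camera`, `low_level_policy`) is a hypothetical placeholder, not Google's actual API.

```python
# Rough sketch only: all of these objects and method names are hypothetical.

def run_instruction(instruction, palm_e, camera, low_level_policy, max_steps=20):
    """Turn a high-level instruction into low-level commands, one step at a time."""
    history = []  # commands issued so far, fed back to the model as context
    for _ in range(max_steps):
        frame = camera.capture()  # visual feedback from the robot's camera
        prompt = {
            "images": [frame],
            "text": (f"Instruction: {instruction}\n"
                     f"Completed steps: {history}\n"
                     "Next low-level command (or 'done'):"),
        }
        command = palm_e.generate(prompt)
        if command.strip().lower() == "done":
            break
        # e.g. "move the yellow hexagon to the green star"
        low_level_policy.execute(command)
        history.append(command)
    return history
```

Re-planning from the newest camera frame after every step is what would let a robot recover from the kind of disturbances shown in the clip.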
Of course, it doesn't seem like these robots are too quick right now; the clips are labeled two times speed and four times speed, so they aren't moving lightning fast. But there are some robots which can move with lightning speed and accuracy, which I will show you. I'm wondering what happens when you combine them: let's say, for a quick moment, we take this robot right here and manage to combine it with this other robot, which responds in real time to table tennis players and moves very quickly. Now, I know that many different robots perform many different functions, but it just goes to show how quickly we can move with AI. Imagine just asking this robot to do anything.

Next, it says here, they demonstrate two examples of generalization. In the case below, the instruction is "push the red blocks to the coffee cup," and the dataset only contains three demonstrations with a coffee cup in them, none of which included red blocks. So essentially, it's able to complete new tasks that it hasn't really been trained on, and you have to understand that this is insane. When we look at the robots we've had before, like the ones in Tesla factories, those are robots that have already been trained a million times on the specific tasks they need to do. The reason Google's one is so interesting is that these are tasks it wasn't previously trained on, so you could theoretically tell it to do something new and it could simply go ahead and do it.

Now here is the demo of the language side.

### Segment 2 (05:00 - 09:00) [5:00]

It says the examples below are all completions, shown in orange, from PaLM-E, and the prompt is one or more images with the text shown in gray. I can click these and it will show me exactly what's going on. Given this image, the prompt asks who the two teams playing in this photo are, which one was the last to win a championship, in which year they won, and who their star player was that year. You can see that it gets this response very quickly: the team in white is the New York Knicks, the team in green is the Boston Celtics, and it has all the information you'd want. Now, I don't play basketball and I don't really know much about this, but for someone on the internet who sees an image and wants to know everything they can about it, something like this would be a complete game changer, because you'd instantly have tons of data from that image.

You can also see here it says, "what flavor is the pink donut on the right?" and it answers "blueberry," reading that information straight from the image, which is really cool. There's also this example that asks, "what will the robot do next?" and of course the answer is that it's going to fall. It just goes to show how much this model truly understands, because I could argue that some humans might not even get this right, since people perceive things in different ways.

This is another interesting example: "I am a robot operating in a kitchen. Given this image, when a human asks me to do a task, I will respond with the sequence of actions I would do to accomplish this task with only the items I see." The task here is to use all of the ingredients you see to make a cake batter, and you can see that it gives the whole sequence of steps immediately, which is really cool.

This one I thought was truly interesting, and a little bit scary if I'm being honest: "I am getting just two custom pizzas for me and my friend. How much should I pay in total? Let's think step by step." I think Google may want to go ahead and look at this again, because it should say "pay" but it actually says "play" there, which is a little spelling mistake, but what's also cool is that it works through this really quickly. I can imagine you'll simply be able to take these images, upload them onto maybe a Google AI app on your phone, and ask it any question you want, and it's going to give you that answer, which is going to be absolutely insane. Imagine you're in a foreign country, you don't speak the native language, and you want to know exactly how much your meal is going to cost or what's in the food. It's going to be applicable to so many use cases.

You can also see "what's in this image? Answer in emojis," and this one right here: "if a robot wanted to be useful here, what steps should it take?" Clean the table, pick up the trash, pick up the chairs, wipe the chairs, and put the chairs down. That is really crazy, and remember, this is a very crazy model because it is combined with an actual robot that can interact with the real world, so it's actually going to be able to do this. I mean, if you told the robot "you need to be the most helpful robot ever," it's going to be crazy, because the robot will be acting on the data being fed into its sensors and cameras. So it's definitely really cool as well.
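As a rough illustration of what those "image plus gray text, completion in orange" prompts could look like in code, here is a simple image question-answering helper, including the optional "let's think step by step" suffix used in the pizza example. This is purely a sketch: PaLM-E isn't publicly available, and `palm_e.generate` is a hypothetical placeholder rather than a real API.

```python
# Sketch only: `palm_e` is a hypothetical model object, not a real library.

def ask_about_image(palm_e, image_bytes: bytes, question: str,
                    step_by_step: bool = False) -> str:
    """Return the model's text completion for a single-image question."""
    text = question
    if step_by_step:
        # The demos append this phrase to trigger multi-step reasoning,
        # as in the pizza-price example.
        text += " Let's think step by step."
    # The prompt interleaves one or more images with text; the answer
    # comes back as the text completion (shown in orange in the demo).
    return palm_e.generate({"images": [image_bytes], "text": text})


# Usage mirroring one of the examples above (the file name is made up):
# answer = ask_about_image(palm_e, open("donuts.jpg", "rb").read(),
#                          "What flavor is the pink donut on the right?")
```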
You can also see here that it has face recognition, because it says Kobe Bryant is on the left and that Kobe Bryant has won five championship rings. There's another example which asks, "can I go down the street on a bicycle? Yes or no. Let's think step by step," and the model reads the sign, "do not enter, except bicycles," and works out the answer. It's also able to differentiate between similar images, as you can see here: it says photo one has sunglasses on top of the folded clothes, and photo two does not have sunglasses on top of the folded clothes.

So it's definitely really interesting how far this AI has come, and honestly I didn't expect this. Yes, AI is moving quickly, and we'd expect that, but the fact that we're now at a stage where robots can literally interact with the environment based on a simple prompt is truly interesting. How quickly is this technology going to progress? Imagine these robots engineered by Boston Dynamics, or by the many other robot makers out there, combined with this technology: how quickly and how widely are these going to be deployed in the real world?
