# Massive AI News: Google TAKES THE LEAD! Llama 4 Details Revealed, Humanoid Robots Get Better

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=WfDDo3hDtik
- **Date:** 03.08.2024
- **Duration:** 29:20
- **Views:** 41,589
- **Source:** https://ekstraktznaniy.ru/video/14150

## Description

Prepare for AGI with me - https://www.skool.com/postagiprepardness 
🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Checkout My website - https://theaigrid.com/

00:00 - Introduction and overview of recent AI developments
00:23 - Meta's plans for Llama 4 in 2025
03:27 - OpenAI director's prediction on AGI timeline (5-15 years)
05:32 - Google DeepMind's Gemini Pro 0801 outperforming GPT-4
09:47 - Gen3 Alpha adding image-to-video capability
13:48 - Demo of Mentee Robotics humanoid robot
17:07 - Nvidia's Project GR00T for humanoid robot development
22:07 - German company Neura's humanoid robot video
23:29 - Introduction of "Plan Like a Graph" method for AI planning
25:34 - Mark Zuckerberg discussing AI agent capabilities
27:27 - Demo of AI wearable device

Links From Today's Video:
https://x.com/TheHumanoidHub/status/1818024278656434425 
https://x.com/florianhoenicke/status/1818176766130676035 
https://x.com/TheHumanoidHub/status/1818726046633804184 
https://x.com/azed_ai/status/1818

## Transcript

### Introduction and overview of recent AI developments [0:00]

So, with another incredible few days in AI, let's take a look at some of the stories you really do want to know about, because although people think the industry is slowing down, a few key stories from today's video will show you that things are not slowing down at all. Now, one of the things we got from Meta's 2024 earnings call was that they're aiming for Llama 4 to be the most advanced AI model in the industry in 2025. It's going to be

### Meta's plans for Llama 4 in 2025 [0:23]

trained on 10 times as much compute as Llama 3, and Zuckerberg said he would rather build too much compute than not enough. Now, the reason this is crazy is that previously, when Mark Zuckerberg spoke about Llama 3, he said that when they were training the model and running their tests, the model was still getting better; essentially they said, look, we just decided to release the model, but we still could have made it better. Now, I do want to point out that if this is to be the most advanced model in the industry in 2025, that would mean this model would have to out-compete Google, out-compete Claude/Anthropic, and of course dethrone OpenAI, who are going to be releasing GPT-5. So it seems the AI race is truly heating up, because if that is their goal (and remember, previously they were just saying, okay, we're going to release AI models, but now they're saying, look, we want to have the most advanced AI model in the industry) that is a completely different thing to be stating. [Zuckerberg, from the call:] "...figuring out the right level of infra capacity to support training more and more advanced models. Llama 3 is already competitive with the most advanced models, and we're already starting to work on Llama 4, which we're aiming to be the most advanced in the industry next year. We are planning for the compute clusters and data we'll need for the next several years. The amount of compute needed to train Llama 4 will likely be almost 10 times more than what we used to train Llama 3, and future models will continue to grow beyond that. It's hard to predict how this will trend multiple generations out into the future, but at this point I'd rather risk building capacity before it is needed rather than too late, given the long lead times for spinning up new infrastructure." So there you have it: they are focusing on building Llama 4 and continuing AI development. So for those of you who think, okay, this is
slowing down, it clearly is not. I truly do believe they can get it done, but I'm wondering how on Earth they are going to achieve having the most advanced AI model in the industry in 2025. Not that Meta can't get it done, but recently there has almost been this sort of convergence around the GPT-4 level, and I'm really wondering: if GPT-5 and, say, Gemini 2 and Claude 4 manage to pull ahead, can Meta's Llama 4 manage to once again up the stakes? We've seen them catch up before, so maybe this time they're going to pull out some new stops, some new architectures, some new ways to reason, to make this possible. Now, I'm not sure if this is going to be open source; I do remember that previously there were some internal disagreements about open-sourcing this new model, so I do wonder if that powerful model will actually be open source at all. Now, in more prediction news, an OpenAI director says that artificial general intelligence may be 5 to 15 years away. If you aren't familiar with OpenAI's board of directors, one of the board directors is Adam D'Angelo; he's the CEO of Quora, and

### OpenAI director's prediction on AGI timeline (5-15 years) [3:27]

he made a prediction during an event last week, saying that the advent of AGI will be a very important change for the world when we get there, hence the reason I've created my post-AGI economics community, because if there's going to be some crazy event in the future where a powerful technology changes the world, it seems like you might want to prepare for that scenario. This of course follows the earlier reports that OpenAI developed a new way to track its progress towards building AGI, the company sharing a new five-level classification system with its employees. If you aren't familiar with that, internally OpenAI had this: level one, chatbots, AI with conversational language; level two, reasoners, with human-level problem solving; and level three, agents, systems that can take actions. I've only spoken about the first three here because we're quite far away from levels four and five, and I think what you can truly see is that even before we get to AGI, having human-level problem solving is going to be a complete game-changer, and having systems that can actually take actions, these are going to be really game-changing milestones when they're reached, because having systems that can take actions, and even on the latter end of things, innovators that can aid in invention and an AI that can do the work of an organization, this is going to be absolutely incredible stuff that most people just aren't comprehending, given the current nature of generative AI, where there are some hallucinations here and there. So I do think that once these levels are surpassed, even before AGI, things are going to get really crazy. Now, I do think this prediction is actually within the realm of possibility, because it's not one that is too crazy, nor too safe in terms of bets, because he does say within 5 to 15 years, and
you have to understand the window of 5 to 15 years is basically between roughly 2029 and 2039, and if

### Google DeepMind's Gemini Pro 0801 outperforming GPT-4 [5:32]

we've seen anything, you have to understand the AI development coming within the next 5 years: considering what some people are predicting for 2030, the next 5 years of AI development are likely to be twice as fast as the prior 5 years, given that we've not just got one little San Francisco startup working on this stuff; we've got literally all of the world's top labs working on research, searching for better ways to build intelligence, and there's a lot at stake here. So I would argue: could this happen within 5 years? It most certainly could, but it will be interesting to see if there are any major roadblocks on the way to AGI and how those come about. Now, in terms of actual developments regarding new forms of intelligence, there is exciting news from the Chatbot Arena. Google DeepMind's new Gemini 1.5 Pro Experimental 0801 has been tested in the arena for the past week, gathering over 20,000 community votes, and for the first time Gemini has claimed the number one spot, surpassing GPT-4o and Claude 3.5 with an impressive score of around 1,300, and also achieving number one on the vision leaderboard. Gemini 1.5 Pro excels in multilingual tasks and delivers robust performance in technical areas, hard prompts, and coding. So this is incredible; you can see Google's new model there, 0801, and notice they're not calling it Gemini 2, although it very well could be; it's just Gemini 1.5 Pro Experimental. I'm guessing they're doing some additional reasoning on top of the already-made model, because maybe they learned from what Anthropic did; I'm not sure if Anthropic did some kind of post-training or added architecture, but one thing we know for sure Claude 3.5 Sonnet does is think really hard about its problem before generating a response. If you weren't paying attention at the time Claude 3.
5 Sonnet was released, there was basically a prompt where you could ask Claude to show its thinking, and it would show its internal thoughts before actually emitting a response, and I'm not sure if it was that alone that allowed Claude 3.5 Sonnet to deliver so much better performance in terms of its reasoning ability, but we do know that Google has done something very similar, as they've not gone ahead and trained an entirely new model from scratch; they've experimented with Gemini 1.5 Pro, and for the first time we've seen GPT-4o truly defeated by a decent margin. Now, I do wonder how long it's going to stay there, because sometimes models get to the top and then don't stay there for long, and usually what we see from OpenAI is that they manage to release a model within, like, the next two weeks and try to immediately reclaim the number one spot. But I do have to be honest with you guys: I've been testing this in the Google AI Studio playground, and it doesn't fall for the crazy tricks that traditional LLMs might fall for. For example, one of the common things these AI systems fall for is: what's bigger, 9.9 or 9.11? You can see that when I ask this model, it says 9.9 is bigger than 9.
11, and it explains: think of it like this, both numbers have the same whole-number part, 9; 9.9 has a 9 in the tenths place while 9.11 has a 1, and since 9 is greater than 1, 9.9 is larger; therefore 9.9 is greater than 9.11. Now, of course, some people could argue this may have been in the training data, because this was, on Twitter at least, a widely discussed benchmark for whatever reason, but I do think this model is actually a lot better, because I've tried a lot of trick questions and, not unfortunately but surprisingly, they just haven't worked. So I'll be surprised to see where this model does get stumped in terms of its reasoning ability, because I even had certain questions from personal projects that I would only use Claude 3.5 on, and this model, whatever it is, now actually manages to complete those questions really easily, so I would
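As an aside, the decimal comparison the model walks through can be checked directly. This is a quick illustration in plain Python (nothing from the video, just the arithmetic):

```python
# The classic LLM trick question: which is bigger, 9.9 or 9.11?
# Read as decimal numbers, 9.9 means 9.90, and 0.90 > 0.11; the
# "11" only looks bigger because it has more digits.
from decimal import Decimal

a, b = Decimal("9.9"), Decimal("9.11")
print(a > b)       # True: 9.90 > 9.11
print(max(a, b))   # 9.9
```

The same holds for plain floats (`9.9 > 9.11` is `True`); `Decimal` is used here only to make the place-value reading explicit.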

### Gen3 Alpha adding image-to-video capability [9:47]

definitely say that if you have any really hard questions that you struggle with, or that you normally use Claude for, try them on this experimental version in the Google AI Studio and see if it does even better than Claude 3.5 Sonnet, or at least on par. Now, one of the craziest things most people also missed, and I nearly missed it because I only looked at the main update, was that Gen-3 Alpha actually added image-to-video. I think text-to-video is incredible because it's just a text prompt and you're getting a video at the end of it, but the reason image-to-video is so crazy is that it enables so many different use cases, because now you're using images to actually steer what the video generation model produces. This has remarkable use cases, because so many people are now taking images from Midjourney and animating them, and one thing I've seen time and time again from Runway is that this model, for whatever reason, is able to simulate certain liquids and other materials in a very coherent way. I'm going to show you guys a few examples of Runway's image-to-video, and it's remarkable how good the implied physics is; I'm not sure how it's being done, since it's completely generative, but I've got to be honest, for example, this one right here, you can see exactly what this looks like, and it's really effective. But I'm going to show you some of the more popular examples that were floating around Twitter. For example, if we take a look at this image on the left, you can see that someone has probably taken it from a window or on top of a rooftop; for all I know it could be AI-generated, but if we look at the Runway generation, you can see that the simulation of fluid coming from behind, whether it be a tsunami or a wave, is remarkably accurate in terms of how the fluid comes over the building. Now,
the reason I find this so impressive is that one of the common mistakes of generative AI was that things usually don't mesh together well: you see objects passing through objects, you see objects simply not obeying the laws of physics, and this was something many critics of AI said generative AI would never get right. So I'm not sure what architecture they're using, but this is actually quite similar to, and quite on par with, Sora in terms of how the water manages to bend around certain objects. What's also interesting is that a few industry people speaking about this have said this kind of tool is essentially a new kind of VFX, because, as I said in previous videos, if you've ever worked on fluid simulation in CGI, you know that rendering something like this using traditional CGI methods takes serious compute, and it's not to say that the AI didn't take compute to train, but if you wanted to do something like this on a standard computer, you'd probably have to leave your machine on for a few days to get good-looking fluid simulations, or put it on a render farm, because simulating all of these particles is really difficult and compute-intensive for a single system. But with things like Runway's Gen-3 Alpha, where you can literally just generate image-to-video, I think this is going to let people explore new forms of VFX almost immediately, and Runway even mentioned that they're making this model twice as fast with a recent update they're rolling out. So I would definitely test this on certain things; some people are putting liquid over various images, and so far it looks really amazing. This is some remarkable stuff. [Robot demo:] Hello, my name is MenteeBot, how can I

### Demo of Mentee Robotics humanoid robot [13:48]

help you? "Hey MenteeBot, please get the shopping cart and come with me." "I got your instructions, I'm going to take the cart." "Thank you, MenteeBot." "You are welcome, it was my pleasure." So that was the demo of the Mentee Robotics robot, a short demo showing us how humans and robots can potentially collaborate in the future. I think this one was rather nice because it actually shows what a lot of people miss about this technology. A lot of the time, people forget that there are individuals who are less fortunate in terms of their abilities or disabilities, and humanoid robots will actually help these people quite a lot more than you might think; it's definitely going to give them a new lease on life in terms of being able to do tasks that they may not have previously enjoyed or may have found difficult. So I always find it interesting that a lot of people hate on AI, and it's understandable, it's a technology that for the most part might just replace your entire career, but in certain cases, for example using GPT-4o with vision, or having a humanoid robot able to do difficult tasks for people who are not especially mobile, and as people get older, sometimes you develop a chronic issue, I think humanoid robots that can jump up, grab things, and run around are going to make people's lives a lot easier. [Nvidia clip:] Project GR00T is Nvidia's moonshot initiative at building a universal AI brain for all kinds of different humanoid robot platforms, and at this SIGGRAPH we are introducing a set of tools for developers in the humanoid robot ecosystem to build their AI models better and more efficiently. This time we introduce a new synthetic data generation pipeline. We start with human-

### Nvidia's Project GR00T for humanoid robot development [17:07]

collected demonstrations using a mixed reality device like Apple Vision Pro, and then we multiply that by a thousand times or more using Nvidia's suite of simulation tools like RoboCasa and MimicGen. At Jensen's SIGGRAPH keynote he introduced the three-computer problem: the DGX, OVX, and AGX. Basically, we use DGX, the main workhorse computer, to process lots and lots of videos and text to train the multimodal foundation model for robots, and we use OVX to run the Nvidia simulation stack, such as RoboCasa, Isaac Lab, and MimicGen; in OVX we can multiply the real-world data by at least a thousand times using our simulation tools. Then, once the model is trained, we deploy it to the AGX computer so that we can test the model on the real robot and on edge computing devices. With these three computers we will be able to enable developers around the world to build better AI models for humanoid robot hardware platforms. I think this year is the year for humanoid robots: we have seen lots of new hardware springing up in the ecosystem, and we have seen the emergence of multimodal foundation models that can form the AI brain for these humanoid robots, and with Nvidia's exciting developer tools and simulation suite, I believe we are one step closer to solving the AI brain for humanoid robots. The era of physical AI is here. Physical AI models that can understand and interact with the physical world will embody robots; many will be humanoid robots. Developing these advanced robots is complex, requiring vast amounts of data and workload orchestration across diverse computing infrastructures. Nvidia is working to simplify and accelerate developer workflows with three computing platforms, Nvidia AI, Omniverse, and Jetson Thor, plus generative-AI-enabled developer tools, to accelerate Project GR00T, a general humanoid robot foundation model. Nvidia researchers capture human demonstrations, seeing the robot's hands in spatial overlay over the physical world. They then use RoboCasa, a generative simulation framework integrated
into Nvidia Isaac Lab, to produce a massive number of environments and layouts. They increase their data size using the MimicGen NIM, which helps them generate large-scale synthetic motion datasets based on the small number of original captures. They train the GR00T model on Nvidia DGX Cloud with the combined real and synthetic datasets. Next, they perform software-in-the-loop testing in Isaac Sim in the cloud and hardware-in-the-loop validation on Jetson Thor before deploying the improved model to the real robots. Nvidia OSMO, a robotics cloud compute orchestration service, manages job assignment and scaling across distributed resources throughout the workflow. Together, these computing platforms are empowering developers worldwide to bring us into the age of physical-AI-powered humanoid robots. So that was Nvidia's video, where they basically talk about how we're going to scale humanoid robots with the Omniverse, and by the looks of things this is pretty crazy: we have an entire system set up to collect data with the Omniverse, and of course you can see the Omniverse Cloud, the DGX Cloud, we've got Project GR00T, and with all these systems, all these NIMs and autonomous pipelines going on, we do have a situation where the humanoid robots are most certainly coming. I mean, if you watched my previous video, I spoke about how Figure's new robot is coming in a few days and they're going to demonstrate exactly how effective it is in terms of its agility, manipulation, and end-to-end ability. So this is definitely one of the areas I'm personally most excited for, because seeing a real humanoid robot interact with the real physical world, I think this is where people are really going to start to see that, okay, maybe this AI robotics thing is coming sooner than they thought. Now, in more robotics news, we did have the German robotics company Neura release a video of their humanoid robot 4NE-1 performing tasks. The robot's

### German company Neura's humanoid robot video [22:07]

concept was unveiled in 2022, and it's one of the first to join the Early Access Nvidia Humanoid Robot Developer Program. If you've been paying attention to exactly what's going on, you will have seen that China has been developing an insane number of robots, and I know that 4NE-1 is of course not a Chinese robot at all, it's actually German, so it's a little bit different there. But I do think people cannot fathom how many humanoid robot platforms are being built, and just how many robots there will be in this world doing things in the physical realm that robots simply didn't do before, and it's quite shocking that every time I see a new humanoid robot platform I'm just like, wow, another one, another one capable of doing different things. I think it's pretty clear now where we're going; it's just a matter of maybe 10 or 20 years, but I think the future is going to look completely different when we have these robots just walking around and doing a million different things. Then we also had some more unhobbling, and this is an easy trick to improve your LLM results without any fine-tuning. Many people know few-shot prompting or chain-of-thought prompting, and a new, better method was presented by these researchers at ICML 2024. It's called Plan Like a Graph, and I'll let them explain it, because they're going to do a better job than me: our thought is

### Introduction of "Plan Like a Graph" method for AI planning [23:29]

basically benchmarking how well large language models can do asynchronous planning. Say you want to make breakfast, and you want to make a coffee, fry an egg, and make a toast; maybe the best strategy for you is to do these different steps at the same time. But to make a coffee you have to grind the coffee beans and then brew the coffee, so you can see that there are dependencies for some of the sub-tasks. For our setting, we are mostly interested in this: given a complicated task like this, where you have to parallelize some steps and sequentially execute others, how can you derive the shortest possible time for the whole task? We proposed a method which we call Plan Like a Graph, which is basically telling the language model to do complicated planning like a graph, by generating a graph first and then solving the task, and we find that our method actually outperforms the baseline methods across all the models we evaluated, and is also a Pareto improvement over the baselines. So in general our method is really good as an off-the-shelf prompt-engineering method: no training at all, only prompt engineering, so if you're interested you should definitely try it. Now, the reason this is rather important is that current LLMs do struggle with planning; planning was one of the things they just don't do very well, and allowing models to plan a lot better is an insight into the fact that we are still unlocking new methods to increase the capabilities of these models, which means, once again, there's most likely still a huge reservoir of untapped potential in these models in ways we don't understand. The paper basically shows that these models initially struggle with asynchronous planning, where, for example, if you're going to bake a cake, a naive plan would just preheat the oven and then
wait 10 minutes and then move on to rolling the dough or whatever, but a human would preheat the oven, they'd roll the dough,
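The scheduling arithmetic behind this idea can be sketched in a few lines of Python. The task names, durations, and dependencies below are invented for illustration (the actual PLaG method has the LLM generate such a dependency graph in its prompt and then reason over it; this just shows why graph-aware planning beats sequential planning):

```python
from functools import lru_cache

# Hypothetical breakfast plan: each task has a duration in minutes
# and a list of prerequisite tasks, forming a dependency graph (DAG).
tasks = {
    "grind_beans":  (3,  []),
    "brew_coffee":  (5,  ["grind_beans"]),
    "preheat_oven": (10, []),
    "roll_dough":   (5,  []),
    "bake":         (30, ["preheat_oven", "roll_dough"]),
}

@lru_cache(maxsize=None)
def finish_time(task: str) -> int:
    """Earliest finish time of `task`, assuming independent tasks
    can run in parallel (asynchronous planning)."""
    duration, deps = tasks[task]
    return duration + max((finish_time(d) for d in deps), default=0)

# Sequential planning: do every step one after another.
sequential = sum(duration for duration, _ in tasks.values())
# Asynchronous planning: total time is the longest dependency chain.
asynchronous = max(finish_time(t) for t in tasks)

print(sequential)    # 53
print(asynchronous)  # 40
```

Here the sequential plan takes 53 minutes while the asynchronous schedule finishes in 40, because baking can overlap with the coffee chain; this is the same kind of gap as the paper's 65-minute sequential example discussed below the next heading.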

### Mark Zuckerberg discussing AI agent capabilities [25:34]

they'd wash their hands, do this and that, planning all of these things that we pretty much take for granted, in the sense that we know how to do it almost as second nature, but these models struggle when thinking about how to do it. In fact, I'm going to show you guys a screenshot from the paper, but basically you can see here: this is just sequential planning, which would take 65 minutes by simply adding up how long every single step takes; then of course we've got parallel planning, where it's like, okay, I can do everything at once, which just isn't true; and then this is asynchronous planning, where you turn the oven on, then roll the dough, and then of course you get your result at the end. So I think it's fascinating that we're still finding new ways to unlock capabilities from these systems, and here is Mark Zuckerberg stating something rather similar: "...better, it's not just about what you wanted to say; I mean, I think generally creators and businesses have topics that they want to stay away from too, right, so just getting better at all this stuff. I think the platonic version of this is not just text, right; you almost want to just be able to, and this is sort of an intersection with some of the Codec Avatars work that we're doing, over time you want to basically be able to have almost like a video chat with the agent, and I think we'll get there over time. I don't think this stuff is that far off, but the flywheel is spinning really quickly, so it's exciting. There is a lot of new stuff to build, and I think even if the progress on the foundation models kind of stopped now, which I don't think it will, I think we'd have like five years of product innovation for the industry to basically figure out how to most effectively use all the stuff that's gotten built so far. But I actually just think the foundation models
and the progress on the fundamental research are accelerating, so it's pretty wild." And of course, if you did miss it, this was the Friend product that actually broke the internet; this is an AI wearable. I'm going to show you guys this

### Demo of AI wearable device [27:27]

I think it's pretty cool; at least people are outside, not staring down at their phone, but some people hate it, some people say it's dystopian, so let me know your thoughts about this as well. [Friend ad audio:] Out of breath... we made it... I don't know what to... very good, that's fair. All right, let's go. Let me show you how to game, bro. Okay. Oh, come on. Oh, let's go. Are you serious? Come on, man, I hate this game. Take notes, baby. Oh man, you guys suck, bro. Let's go, dude. What? How did you do that? I know, the effects are crazy. It's dank. I could eat one of these every day. Sorry, I got you messy. It's really nice up here, how'd you find this place? I don't know, I just kind of like to come up here to be by myself; I've never brought anybody else. Right, besides her, she goes everywhere with you, right? Mhm. Guess I must be doing something right, though. I guess so, we'll see.
