Check out the Free Community: https://www.skool.com/theaigridcommunity
🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Interested in AI Business: https://www.youtube.com/@TheAIGRIDAcademy
Links From Today's Video:
https://x.com/TheHumanoidHub/status/2015836108668731723
Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos.
Was there anything I missed?
(For Business Enquiries) contact@theaigrid.com
Music Used
LEMMiNO - Cipher
https://www.youtube.com/watch?v=b0q5PR1xpA0
CC BY-SA 4.0
LEMMiNO - Encounters
https://www.youtube.com/watch?v=xdwWCl_5x2s
#LLM #Largelanguagemodel #chatgpt
#AI
#ArtificialIntelligence
#MachineLearning
#DeepLearning
#NeuralNetworks
#Robotics
#DataScience
Table of Contents (3 segments)
Segment 1 (00:00 - 05:00)
So, did Yann LeCun just expose the robotics industry? We need to dive into this video, so let's talk about it. Welcome back to the video. This clip has been going absolutely viral: Yann LeCun said something in a recent interview where he essentially exposes the entire robotics industry. Take a listen to this one-minute clip, and then I'm going to dive into every single thing, because we have to talk about this. "There's a lot of companies building humanoid robots, and they do those kinds of, you know, they play kung fu and, you know, impressive things. This is all precomputed. Okay? None of those companies, absolutely none of them, has any idea how to make those robots smart enough to be useful. Right? That's a big secret of the robotics industry. You can train them on very narrow tasks, but you have to collect lots and lots of data, the same way, you know, people thought they would build self-driving cars. And it's expensive, and it's only practical for a small number of narrow tasks, and you don't have robots that have nearly as good common sense as your house cat, let alone human intelligence, right? So that's, you know, that's the challenge for the next few years: getting systems to really understand the real world. And the problem is that the approaches that have been successful for language do not work for high-dimensional, continuous, noisy data. You have to use something else." So yeah, this was a pretty crazy statement, because he basically called out the entire humanoid robotics industry, and you have to understand that there are many major players in this space. But he does have a point. So let's get started with one of the first things he said: that a lot of these robotics demos are precomputed. And he's right about this. Most people don't realize that when you're seeing these Unitree G1 demos, these are precomputed actions that aren't fully autonomous. What I mean by that is that yes, it's autonomous in its ability to judge the environment and move around, but it isn't a robot that walks around, knows how to pick up a plant, and reasons about the environment it's in. It's a flashy, precomputed demo. Most people also don't realize how often these robots actually fail in practice. This is no hate to Unitree at all, but you'd be surprised how many times these robots fail off-screen. You have to understand that this is a video presentation produced by the company, and I'm only using Unitree here because they're one of the most widely known robotics companies and pretty much the gold standard when you see videos like this. So when these demos are being filmed, there may be plenty of failure modes and edge cases where the robot just fails, but it's not in those companies' incentive to show those to the wider population. Their goal is to drive sales, drive up hype, and of course push the industry forward by getting more people interested in humanoid robotics. So Yann LeCun is essentially saying: look, while these robot companies look super impressive, they're not as impressive as they seem, because it's all precomputed.
Now, here's the crazy thing: I actually did some digging, because I wanted to look into certain companies. I was like, "Okay, Unitree, that's fair enough, but what about Boston Dynamics?" So I actually Googled this, and I looked at CES 2026, where we first got to see the new version of Boston Dynamics' Atlas. What was actually crazy is that the CES 2026 Atlas demonstration was remotely controlled, although the product version is intended to function fully autonomously. And that's crazy, because the big unveiling of the production-ready Atlas at the world's biggest tech show was teleoperated. Now, forgive me if I'm wrong, but I've used a lot of research agents, and every one of them has come back saying this was teleoperated, and that wasn't widely known. I think most people did believe this was fully autonomous. Now, that's not to say it couldn't have done it fully autonomously, but you have to remember there is a huge inherent risk in putting your robot on stage, fully autonomous, for the world to see at one of the most visible tech conferences of the year. I think it makes sense to have it teleoperated so you reduce the risk of any failure. So this is essentially what Yann LeCun is saying: demos look incredible, but if you strip away the choreography and the remote control, you don't have general intelligence. Now, what's crazy is that if you actually dive into Boston Dynamics' timeline, they're deploying these robots in 2026 with the goal of performing industrial tasks, and apparently they plan to deploy Atlas in part-sourcing operations in 2028 and to do more in 2030. The timeline is actually interesting here, because they're saying 2028 for
Segment 2 (05:00 - 10:00)
basic part sorting and 2030 for more complex parts. So these companies are not really claiming general intelligence right now, and this CES demo illustrates Yann LeCun's point. He's basically saying: look, hardware is 100% what these guys have. They have impressive mechanical capabilities, but none of them have general intelligence that can figure out novel tasks, and many of them still need a human operator. Now, this entire thing had Twitter in a frenzy. People were saying, "Oh, here goes Yann LeCun again." Some people were saying, "Why is Yann LeCun such a negative Nancy all the time?" And of course, Elon Musk decided to give his response, saying he thinks that if he can't do it, then no one can. Elon Musk is of course referring to Tesla's Optimus humanoid robots; he's working on humanoids at Tesla. So he's saying that if Yann LeCun isn't able to do it, he thinks no other company is able to do it. Now, LeCun immediately responds: quite the opposite, "I know I can do it and I know how to do it." Which is a bold statement; saying you know how to solve an issue that an entire industry is trying to solve and simply hasn't is pretty bold. And he's saying it's not with the current techniques everyone is betting on: "My bet is famously on JEPA and world models and planning, and at some point you'll realize that I'm right." Now, for those of you who don't know what V-JEPA is, this is basically where you teach an AI to understand videos by making it predict what's missing, not by filling in pixels, but by understanding the underlying concepts. So this is super different from current AI. Typical generative video models see a video frame and then predict the exact next pixels, which is like memorizing "a red ball at position X moves to position Y." Now, V-JEPA, which is Yann LeCun's answer to this entire problem in the robotics industry he's trying to expose, works like this: if you see a video with certain chunks masked out, you predict what's in the masked parts conceptually, not pixel by pixel. It's like understanding that the ball has momentum and gravity pulls it down, and therefore the ball will arc. So say, for example, you showed someone a video with the middle 5 seconds blacked out. Pixel-by-pixel prediction is going to try to guess every pixel color, but V-JEPA understands that the person was throwing the ball, so the middle part probably shows the ball flying through the air and landing. Okay, so that's the key difference between V-JEPA and other approaches. V-JEPA learns by predicting the missing parts of a video in an abstract representation space, with the flexibility to discard unpredictable information. It's not trying to memorize every detail; it's trying to learn the underlying concepts, such as the physics and the patterns of how the actual world works. And the predictor in V-JEPA can be seen as a primitive world model that's able to model spatial uncertainty and predict high-level information about unseen regions rather than pixel-level details. So think about it like this: if a robot actually has V-JEPA and it works, it means it could watch you pour water once and understand that concept, that liquid flows downward and fills the container from the bottom up, and then apply that entire concept to pour liquids into completely different containers.
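To make that concrete, here's a minimal sketch in Python (PyTorch) of the masked latent prediction idea behind V-JEPA. This is purely illustrative, not Meta's actual architecture: the module sizes, names, patch dimensions, and masking scheme are all my own assumptions. The key thing to notice is that the loss is computed in representation space, never in pixel space.

# Minimal sketch of a V-JEPA-style masked latent prediction objective.
# Illustrative only: sizes, names, and the masking scheme are assumptions,
# not Meta's actual implementation.
import torch
import torch.nn as nn

EMBED_DIM = 256
PATCH_DIM = 16 * 16 * 3  # one flattened 16x16 RGB patch (assumed size)

class Encoder(nn.Module):
    """Maps flattened video patches to latent vectors."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(PATCH_DIM, EMBED_DIM), nn.GELU(),
            nn.Linear(EMBED_DIM, EMBED_DIM))

    def forward(self, patches):   # (batch, num_patches, PATCH_DIM)
        return self.proj(patches) # (batch, num_patches, EMBED_DIM)

class Predictor(nn.Module):
    """Predicts the latents of masked patches from the visible context."""
    def __init__(self):
        super().__init__()
        layer = nn.TransformerEncoderLayer(EMBED_DIM, nhead=4, batch_first=True)
        self.net = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, z):
        return self.net(z)

encoder, predictor = Encoder(), Predictor()
target_encoder = Encoder()  # stands in for the EMA copy used in practice
target_encoder.load_state_dict(encoder.state_dict())
for p in target_encoder.parameters():
    p.requires_grad = False

patches = torch.randn(8, 64, PATCH_DIM)  # dummy batch: 8 clips, 64 patches
mask = torch.rand(8, 64) < 0.5           # mask out roughly half the patches

context = patches * (~mask).unsqueeze(-1)  # zero out the masked patches
pred = predictor(encoder(context))         # predict latents at every position
with torch.no_grad():
    target = target_encoder(patches)       # targets come from the full video

# The loss lives in representation space, only at masked positions. The model
# never reconstructs pixels, so it can discard unpredictable low-level detail.
loss = (pred - target).abs()[mask].mean()
loss.backward()

In the published JEPA work, the targets come from an exponential-moving-average copy of the encoder so the representations can't collapse to something trivial; the frozen copy above is just a stand-in for that.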
And if that works, it would prevent the models from needing 10,000 examples of pouring water, then 10,000 of pouring juice, then 10,000 of pouring into cups of every shape, a tall cup, a short cup, and so on. Basically, V-JEPA is Yann LeCun's attempt to build an AI that learns principles and physics from video, not just memorizing patterns like generative AI. It's still early research, but this is the structure and the kind of approach LeCun is saying will actually work. Now, you can see there were some very harsh public opinions here. Someone said: "This is the type of BS that makes many people in the field dislike him so much. He doesn't just have a contrarian opinion. He actively thinks everyone in the field on these domains is stupid and not good enough, while he, with his great JEPA, will save the world with cat-level AGI." And another counterargument said: "A 17-year-old doesn't learn to drive in 10 to 20 hours. It's more like 10 to 20 hours plus 17 years of reinforcement learning in a very robust physics environment, on top of an insane amount of pre-training via evolution over millions of years." Essentially, and he makes a very interesting point, he's saying that humans don't learn to drive from scratch in 10 to 20 hours. They come preloaded with 17 years of embodied experience, which is your physics intuition, your object permanence, your spatial reasoning, plus millions of years of evolutionary optimization for visual processing, motor control, and threat detection. So when a teen learns to drive, they're actually fine-tuning on a massive foundation of world understanding, not learning from zero. Compare that to what AI is doing. A lot of people would say, okay, teens can be 16 or 17 and immediately learn to drive like that, so how is it that robots need millions of examples? And when you think about it, this is an argument that actually supports Yann LeCun's theory,
Segment 3 (10:00 - 13:00)
because he's admitting that you need a world model foundation. The only debate, and I think this is what people are missing, is about how we get that foundation. This guy is basically arguing that we get it through evolution-style pre-training on massive amounts of data: we just need more examples. That's the standard industry position. But Yann LeCun's position is that you can't implicitly learn world models through pattern matching on demonstrations; you need an explicit world model architecture. The critical question that's going to keep coming up is: can robots actually develop intuitive physics and common sense implicitly from millions of hours of demonstration data, or do they need explicit mechanisms to build those predictive world models? LeCun is essentially betting on the latter: explicit world modeling is required, and you won't be able to just scale your way there with more demonstration data using current methods, because they're fundamentally the wrong structure. And of course, like I said, there were some pretty rough counterarguments, with one saying: "This guy is going to look like an idiot in 5 years when we train robots end to end with world models. He's using the same arguments as he did for LLMs, and they all failed. Yann is one of those people who are spiritually correct, that LLMs and whatever might not be AGI, but practically this just falls apart. You don't need AGI. You need human sample efficiency." And I think he's right: you just need to learn from data efficiently enough that you need 10 examples rather than 10,000. Now, of course, LeCun once again responds; he's not having any of this nonsense. He says: "You do realize that I've been advocating for end-to-end self-supervised training of world models and planning for about 10 years. I've made a lot of progress over the last five, actually made it work for simple robotics tasks over the last two, and I just started a company to make it practical. Not sure who's going to look like an idiot." So that's LeCun's rebuttal, and remember, I've just explained V-JEPA 2 to you guys. He additionally says here that a few robotics companies are working on world models and planning, but the vast majority are using LLM-derived methods like VLAs (vision-language-action models) or diffusion policies with RL fine-tuning in simulation. Those are good for narrow tasks, but the companies building humanoid hardware don't tend to be working on innovative robotics AI. And I think that's a key point. He's basically saying that all these companies working on flashy humanoid robotics demos aren't working on the innovative robotics AI that actually powers them; they're just making the humanoids look really good. Now, to summarize this entire thing: on one side, you've got the industry saying, "Look, we're going to need hours and hours of training data to basically get these things to pattern match." And LeCun is saying that's not the way to do it. We need to let these things understand how reality works, so that even if they run into some strange situation they've never seen before, they can still cope, because they understand the underlying concept and don't need to have seen every familiar detail. So I think it's going to be super interesting.
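Since "world models and planning" keeps coming up, here's a toy sketch of what planning with a learned world model can look like, using simple random-shooting model predictive control. Again, this is an illustration under my own assumptions, not LeCun's actual method: the world model and cost function here are untrained stand-ins, and real systems use stronger planners than random sampling.

# Toy sketch of planning with a learned world model via random-shooting MPC.
# Everything here is a stand-in: a real system would have a trained world
# model, a learned cost, and a more sophisticated planner.
import torch
import torch.nn as nn

LATENT_DIM, ACTION_DIM = 32, 4

class WorldModel(nn.Module):
    """Predicts the next latent state from the current latent and an action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + ACTION_DIM, 128), nn.ReLU(),
            nn.Linear(128, LATENT_DIM))

    def forward(self, z, a):
        return self.net(torch.cat([z, a], dim=-1))

def cost(z, goal):
    """Toy task cost: squared distance between a latent state and the goal."""
    return ((z - goal) ** 2).sum(dim=-1)

def plan(model, z0, goal, horizon=10, n_candidates=256):
    """Imagine candidate action sequences in latent space and return the
    first action of the cheapest one (then replan after every real step)."""
    actions = torch.randn(n_candidates, horizon, ACTION_DIM)
    z = z0.expand(n_candidates, LATENT_DIM)
    total_cost = torch.zeros(n_candidates)
    with torch.no_grad():
        for t in range(horizon):
            z = model(z, actions[:, t])  # roll the world model forward
            total_cost += cost(z, goal)
    return actions[total_cost.argmin(), 0]

model = WorldModel()  # in reality this would be trained on experience
z0, goal = torch.randn(1, LATENT_DIM), torch.randn(LATENT_DIM)
print(plan(model, z0, goal))  # the action the robot would execute next

The design point is that the robot never needs a demonstration of the exact task: it imagines candidate futures with its world model and picks actions that move it toward the goal, which is exactly the "scale up demonstrations" versus "model the world" split being argued about here.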
Yann LeCun is already attracting talent to his new lab, so it's going to be really interesting when he has the actual resources to pursue this with millions and millions of dollars. It's going to be super interesting to see how his bet actually plays out, and I'm really eager to see where he goes with this. But let me know what you guys think about robotics. Do you think humanoid robotics is completely going to fail? Do you think Yann LeCun is right? It's super interesting, and I'll see you guys in the next one.