# Open AI's SECRET AGI Breakthrough Has Everyone STUNNED! (SORAS Secret Breakthrough!)

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=6Xv6uSeXbJ4
- **Date:** 18.02.2024
- **Duration:** 28:32
- **Views:** 122,065
- **Source:** https://ekstraktznaniy.ru/video/14514

## Description

✉️ Join Our Weekly Newsletter - https://mailchi.mp/6cff54ad7e2e/theaigrid
🐤 Follow us on Twitter https://twitter.com/TheAiGrid
🌐 Checkout Our website - https://theaigrid.com/

Links From Today's Video:

https://twitter.com/DrJimFan/status/1758549500585808071 
https://twitter.com/ricburton/status/1758378835395932643 
https://twitter.com/stephenbalaban/status/1758375545744642275 
https://research.runwayml.com/introducing-general-world-models 
https://twitter.com/yumidiot/status/1759179217483485442 
https://twitter.com/NandoDF 
https://openai.com/sora

Welcome to our channel where we bring you the latest breakthroughs in AI. From deep learning to robotics, we cover it all. Our videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on our latest videos.

Was there anything we missed?

(For Business Enquiries)  contact@theaigrid.com

#LLM #Largelanguagemodel #chatgpt
#AI
#Artifici

## Transcript

### Segment 1 (00:00 - 05:00) [0:00]

So there was a recent AGI-level breakthrough at OpenAI, and many people missed it because it was caught up in the news around Sora and hidden in the research paper / blog post where they went into more detail on their new astounding technology. There were some key details in there on the AGI breakthrough that OpenAI has made. So let's dive into exactly how this is going to lead to AGI, and the recent breakthrough in Sora behind it. One of the first things we saw from Sora was this: they titled the report "Video generation models as world simulators." They state: "We explore large-scale training of generative models on video data. Specifically, we train text-conditional diffusion models jointly on videos and images of variable durations, resolutions and aspect ratios. We leverage a transformer architecture that operates on spacetime patches of video and image latent codes. Our largest model, Sora," and here's where it gets interesting, "is capable of generating a minute of high fidelity video. Our results suggest that scaling video generation models is a promising path towards building general purpose simulators of the physical world." Now, essentially, what that statement means is that the world models inside these AI systems can really understand how the physical world works, how physics works, and how everything relates to everything else in three-dimensional space, and this is a key component of AGI. They essentially state that, based on what they've done with Sora and presumably on other experiments they've run while scaling it up further, scaling these video generation models is the path towards building general-purpose simulators of the physical world.
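To make the "spacetime patches" idea concrete, here is a minimal, illustrative sketch, not OpenAI's actual code: the patch sizes, the plain-NumPy tensor layout, and the fact that we patch raw pixels rather than latents are all assumptions made purely for demonstration. It shows how a video tensor can be cut into flattened spacetime tokens for a transformer:

```python
import numpy as np

def video_to_spacetime_patches(video, pt=2, ph=16, pw=16):
    """Split a video tensor (T, H, W, C) into flattened spacetime patches.

    Each patch spans pt frames and a ph x pw spatial region; the result is
    a (num_patches, pt*ph*pw*C) token sequence, analogous to how Sora's
    transformer reportedly operates on spacetime patches.
    """
    T, H, W, C = video.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    v = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Reorder so each patch's voxels are contiguous: (nT, nH, nW, pt, ph, pw, C)
    v = v.transpose(0, 2, 4, 1, 3, 5, 6)
    return v.reshape(-1, pt * ph * pw * C)

video = np.random.rand(8, 64, 64, 3)   # 8 frames of 64x64 RGB
tokens = video_to_spacetime_patches(video)
# (8/2) * (64/16) * (64/16) = 64 patches, each 2*16*16*3 = 1536 values
print(tokens.shape)  # (64, 1536)
```

The key property is that images and videos of any duration or resolution become one flat sequence of tokens, which is what lets a single transformer handle variable durations and aspect ratios.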
Now, if you don't know what world models are, which is essentially how these AI systems internally represent the world, you might want to take a look at this video by Runway, because they did a very good job of explaining how these models are essentially going to need a world model in their head to truly understand how to generate videos, and why that is needed. So I'm going to show you around 60 seconds of this clip so you can understand that, and then I'll dive back into why this is so crazy. "And this is where my dog comes in. Hi, Reuben. Reuben also has an internal model of the world based on things he knows. And it looks something like this. He learned that if we go this way, there's a higher chance of us going to the park, and if we go that way, there's a dog that kind of looks at him funny. He knows that there are usually sidewalk chicken scraps right here, and that these stores, known as pet stores, have the treats that he likes. Once we get to the park, this is where all the butt sniffing happens. With all of this data, the sights, sounds, and relationships of things, Reuben has figured out how to predict certain outcomes and adjust his behaviors. Just like Reuben, general world models have the ability to generalize their understanding to new and unseen data. They know how to imagine the future based on their knowledge of the world. It's kind of like how Reuben knows to avoid other dogs that also look at him funny, and how he knows to drag us into pet stores he's never been to. We believe that by training these models to predict the next frame or token in a sequence, they will learn a much more detailed understanding of the world, including the whys and the hows, than large language models. So what does this all mean? It means that pretty soon, general world models will allow us to simulate worlds that more closely reflect our own."
That video I included was a small section of Runway's introduction to general world models, and I think it goes to show exactly where we are headed. It was released a couple of months ago, and it's clear that they were looking in the right direction. Now, in addition to AGI, which is of course the path that OpenAI is pursuing, here is someone who works at OpenAI. This is Bill Peebles, and he works on Sora and on AGI at OpenAI, which shows us that these two sections of the company are very intertwined. Now, of course, there was a

### Segment 2 (05:00 - 10:00) [5:00]

recent tweet here by him on the day that Sora was released, where he stated: "Sora is here. Tim Brooks and I have been working on this at OpenAI for a year and we're pumped about pursuing AGI by simulating everything." It's clear that they're essentially saying that a key component of AGI is being able to simulate the real physical world, to truly understand how things interact with one another, so that you can generalize and predict things with striking accuracy. And the crazy thing about this is that I don't think we are too far away from it. I'm going to show you why their AGI breakthrough, and the Sora breakthrough, show us that AGI is not that far away in terms of capabilities, because I think all that OpenAI truly needs now is the compute that they are asking for. So let's take a look at this, because it is potentially proof that AGI, or at least a key component of AGI, could just come down to scale. The clip that you're seeing now is from OpenAI's research page. Essentially, what we see here is the base compute of the text-to-video model, and you can see that it's rather bad: you can't really understand what's going on with the dog. Then, once it's scaled up four times, you can see that we start to get a pretty decent video. This looks like some of the early text-to-video models we've had, and the dog prancing around in the snow with the person behind it doesn't look too bad. This looks a lot more coherent, and that's comparing the base compute against four times more compute. Then we scale it up four more times again, and you can see that 16 times more compute results in a much more coherent video, which clearly demonstrates the ability to grasp exactly what's going on in the scene with a high level of fidelity.
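The base, 4x, and 16x comparison is qualitative, but it echoes the power-law behavior reported for language models. The sketch below uses the standard saturating power-law form from the neural scaling-law literature, L(C) = L_inf + a * C^(-b); the constants are entirely made up for illustration and do not describe Sora's actual training:

```python
# Illustrative only: a saturating power law of the kind reported in
# neural scaling-law work. All constants here are invented.
def loss(compute, l_inf=1.7, a=2.5, b=0.3):
    """Hypothetical training loss as a function of relative compute."""
    return l_inf + a * compute ** (-b)

for c in (1, 4, 16):  # base compute, 4x, 16x, as in the Sora comparison
    print(f"{c:>2}x compute -> loss {loss(c):.2f}")
```

The shape of the curve is the point: each 4x step still buys a visible improvement, but the gains shrink as you approach the floor l_inf, which is why "just add an order of magnitude or two of compute" is a coherent bet rather than a guarantee.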
So what they're showing is that compute is a key factor in determining how we get to AGI. As we saw going from GPT-3 to GPT-3.5 to GPT-4 and other more capable systems, compute was a huge factor and allowed for much greater breakthroughs. In addition, there was also this tweet, where someone stated that this is the strongest evidence he has seen so far that we will achieve AGI by just scaling compute, and that it's genuinely starting to concern him. He used to think we would run into roadblocks: the end of scaling laws, maybe not having the right model architecture, power density walls, the end of Moore's law, problems related to the dimensionality of multimodal data, and researchers being out of ideas, since they're already using mixture of experts, among other things. However, he's saying it increasingly looks like we will build an AGI by just scaling things up an order of magnitude or so, maybe two. And it also seems clear that Sam Altman and others at OpenAI have possibly already come to the same conclusion, given their public statements and their chip and scale ambitions; it's genuinely starting to concern him. He states that he's not concerned because he is an AI doomer. No, he's wholeheartedly on the side of computational and scientific freedom, and he thinks the risks are far from existential. He's concerned because, number one, post-AGI, the world is about to change immensely. And number two, he cannot see this wild new future clearly, and finds it difficult to predict exactly what will change with the advent of AGI. This change is unknown, and because of this he's concerned and even fears it. And this does make sense.
Ladies and gentlemen, we've already seen that Sam Altman and OpenAI have potentially come to the same conclusion. Given their public statements, and the previous videos in which we've done deep dives on where OpenAI is heading, we know that the recent push by Sam Altman to raise up to $7 trillion to boost his GPU and chip supply is absolutely incredible. Now, something most people kind of missed when looking at the whole $7 trillion chip supply story is that Sam Altman didn't ask for $7 trillion just for GPUs. He asked for it to increase compute overall: data centers and pretty much everything else involved in scaling compute up. So it wasn't just GPUs; it was about truly trying to scale everything, and I think some people missed that. So it's clear from this tweet and from a bunch of other people that if we start to tie certain things together, we can start to realize that potentially a proof of concept of AGI already exists at the OpenAI headquarters, but maybe all they need is just enough compute to

### Segment 3 (10:00 - 15:00) [10:00]

where potentially they've got something like four times the compute and it's a really good system, and maybe they've realized: look, we don't have enough compute, and once we scale it up to a level that is unfathomable for most people, we could potentially have AGI or even the birth of ASI. I do think ASI is really far off, but I don't think the birth of AGI is going to be that far away. Now, in addition, something I wanted to add quickly is the image generation capability of OpenAI's Sora. It is actually capable of generating images, and it does this by arranging patches of Gaussian noise in a spatial grid with a temporal extent of one frame, and it can generate images up to 2048x2048 resolution. You can see that these images are really, really good and are actually on the level of Midjourney. One thing I like about OpenAI's DALL-E 3 is that it is actually pretty good at understanding exactly what you mean when you type a prompt, whereas Midjourney is more focused on, I guess you could say, image quality. Something else I wanted to add that was rather fascinating is that Yann LeCun, a couple of days ago, actually made a statement saying that we don't know how to do this. And whilst this is taken out of context, I think it's important to see that if some of the most esteemed AI researchers, people directly involved in developing this technology, are surprised by the capabilities of the system, it definitely means there is a lot going on here. In the clip he says: predicting the next frame is too easy; you have to ask it to predict multiple frames, and basically we don't know how to do this properly. What works for text doesn't work for video. The only technique so far that has a chance of working for video is a new architecture that he calls JEPA, meaning joint embedding predictive architecture. "I'm not going to explain to you what it is."
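The one-line trick behind Sora's image generation is worth spelling out: an image is treated as a video whose temporal extent is a single frame. Here is a toy sketch of just the starting point of that process; the grid dimensions, the channel count, and the stand-in "denoiser" are all invented for illustration, since OpenAI has not released the actual sampler:

```python
import numpy as np

def init_image_latents(h_patches, w_patches, patch_dim, seed=0):
    """Arrange Gaussian noise patches in a spatial grid whose temporal
    extent is one frame, the starting latent for image generation.

    A diffusion model trained on videos would then iteratively denoise
    this (T=1, H, W, D) grid exactly as it denoises multi-frame grids.
    """
    rng = np.random.default_rng(seed)
    return rng.standard_normal((1, h_patches, w_patches, patch_dim))

latents = init_image_latents(128, 128, 16)
print(latents.shape)  # (1, 128, 128, 16)
```

The appeal of this design is that image and video generation share one model and one code path; images are just the degenerate single-frame case.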
But here is a funny thing. It's not a generative architecture. So the joke I'm making is not a joke at all; I really believe this: the future of AI is not generative. A lot of people are now talking about generative AI like it's, you know, the new thing. I think if we find ways to get machines to learn how the world works, they're not going to be generative. So, you know, it's new architectures, right? Getting machines to understand how the world works. I do think as well that what's crazy about this is Yann saying that we don't know how to do this, and the takeaway for most people should be that I don't think anyone can claim to understand AI much better than Yann LeCun. So if he's surprised, we should all be reconsidering how much we truly understand, and can predict, about the future of AI ourselves. Now, regarding the JEPA architecture he was referring to, Meta actually released a video on how it works. I'm going to show you that now, and you should pay attention, because this is a rapid development that was genuinely overshadowed by OpenAI's Sora release, and it is another step towards AGI that most people simply aren't talking about today. Machines require thousands of examples and hours of training to learn a single concept. The goal with JEPA, which means joint embedding predictive architecture, is to create highly intelligent machines that can learn as efficiently as humans. V-JEPA is pre-trained on video data, allowing it to efficiently learn concepts about the physical world, similar to how a baby learns by observing its parents. It's able to learn new concepts and solve new tasks using only a few examples, without full fine-tuning. V-JEPA is a non-generative model that learns by predicting missing or masked parts of a video in an abstract representation space.
Unlike generative approaches that try to fill in every missing pixel, V-JEPA has the flexibility to discard irrelevant information, which leads to more efficient training. To allow fellow researchers to build upon this work, Meta is publicly releasing V-JEPA, which they believe is another important step towards AI that's able to understand the world, plan, reason, predict and accomplish complex tasks. There was also a very interesting part of the Sora report, because this is where we get to AGI and emergent capabilities. They state, and this is by far the most fascinating piece of the paper/blog post, that video models exhibit a number of interesting emergent capabilities when trained at scale. These capabilities enable Sora to simulate some aspects of people, animals and environments from the physical world, and these properties emerge without any explicit inductive biases for 3D objects; they are purely phenomena of scale. So you can clearly see that their statement that video models exhibit emergent capabilities when trained at scale shows us, like I said in another video, that when the next successor system, whether it be GPT-5, GPT-6 or whatever AI system, is trained at a ridiculous level of scale, emergent capabilities are going to occur that will lead to massive breakthroughs. And this is what we're seeing here, because if we have a system that is able to get real physical aspects and
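To see why predicting in representation space can be cheaper than predicting pixels, here is a toy numerical sketch. Everything here is invented for illustration: the linear map standing in for V-JEPA's encoder (the real one is a vision transformer), the patch and latent dimensions, and the masking pattern.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, w):
    """Toy encoder: a linear map standing in for a learned network."""
    return x @ w

# A "video" flattened to 8 patch vectors of 32 values; mask out half of them.
patches = rng.standard_normal((8, 32))
mask = np.arange(8) % 2 == 0          # 4 masked patches

w_enc = rng.standard_normal((32, 16)) * 0.1

# Generative objective: the target is the missing *pixels* themselves.
pixel_target = patches[mask]                   # shape (4, 32)

# JEPA objective: the target is the missing patches' *representations*.
latent_target = encode(patches[mask], w_enc)   # shape (4, 16)

print(pixel_target.shape, latent_target.shape)
```

The latent target is lower-dimensional, and because the encoder is learned, it can discard unpredictable pixel detail (film grain, exact leaf positions) rather than forcing the predictor to model it, which is the efficiency argument the Meta video makes.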

### Segment 4 (15:00 - 20:00) [15:00]

physics simulations correct to a high degree, then we could essentially get a truly comprehensive system that leads to AGI, in the sense that not only are we getting high-quality data, we could be getting a system that understands the entire world better than we do in certain aspects and can predict certain things. These emergent capabilities are, of course, things we can't predict. But when we have smaller systems where we can kind of see where things will go with more compute, that's why this is crazy. And remember, guys, in this part of the paper/blog post they state that these properties emerge without any explicit inductive biases for 3D objects; they are purely phenomena of scale. That goes to show that once we scale things up even more, can you imagine the kinds of AI systems we're going to get, and the emergent capabilities that could come out of that kind of scale? It's pretty crazy, and we know things like this have happened in the past with GPT-4 and the theory-of-mind abilities that we got. The reason I'm so excited, and also a little bit scared, is that if this is all we get from scale, in the sense that Sora can generate videos with dynamic camera motion where, as the camera shifts and rotates, people and scene elements move consistently through three-dimensional space, then we are heading towards high-fidelity simulation. The ability of Sora to generate high-fidelity videos up to a minute long literally introduces the possibility of creating detailed simulations of the physical world. And for AGI, being able to simulate complex dynamic environments and interactions is essential for understanding causality, physics, and social dynamics, and all of this is going to enable more sophisticated reasoning and prediction. And this was by far the craziest video I saw out of the entire paper.
You can see that someone is literally painting, and they are painting what appears to be a kind of pink tree. I'm not exactly sure what kind of tree it is, but the point is that someone's doing a watercolor painting and it looks fascinating. In terms of how real this looks, this is something I didn't think we would get for literally a couple of years, and it has shattered my timelines and expectations. Now, what I've titled this here is, of course, real-world simulation. Are we going to get to a stage in the future where Sora is able to generate things so well that it could effectively generate the real world? Remember, Sora is not just, I guess you could say, a model that generates pixels on a screen randomly. It's essentially simulating what would happen. Right now it's not 100% accurate, but could we get to the stage where we have a text-to-video generator that simulates the real world with remarkable physical accuracy? I think that is something we need to watch for in the future, and potentially what could lead us to AGI. Now, there was also something else; I'm not sure if I already covered this. They also stated that the capabilities Sora has today demonstrate that continued scaling of video models is a promising path towards the development of highly capable simulators of the physical and digital world, and of the objects, people and animals that live within them. What you're seeing now is Minecraft gameplay, and you can see that it manages to simulate Minecraft very, very well. Now, of course, there are physics elements in Minecraft that this video didn't get quite right, because you have this creature just sliding across the ground and moving randomly.
But the point is that in the future, things are about to get absolutely insane once, you know, open source manages to catch up to this kind of level. And we are definitely moving very quickly, even quicker than I thought. Now, there was also a very interesting statement from Jim Fan, and this is a point I was trying to make. He says: I see some vocal objections: "Sora is not learning physics, it's just manipulating pixels in 2D." I respectfully disagree with this reductionist view. It's similar to saying "GPT-4 doesn't learn coding, it's just sampling strings." Well, what transformers do is just manipulate a sequence of integers (token IDs), and what neural networks do is just manipulate floating-point numbers. That's not the right argument. Sora's soft physics simulation is an emergent property as you scale up text-to-video training massively. So understand that what we're seeing here is an emergent property of this kind of system. And the thing is, in order to generate the right kinds of things, you do have to have some understanding of physics, or else it's going to look really weird. If you've ever done 3D animation, you'll know that to animate certain things, you need to understand how physics works to make them look realistic. And the point as well is that many people say AI doesn't understand this or that. But do humans? That is a real question: do humans really understand how the physics of certain things works? Because when we compare AI to humans, I think a lot of our comparisons are unfair. It's

### Segment 5 (20:00 - 25:00) [20:00]

like stating that all it's doing is predicting the next token. But if you asked a human to write, say, 2,000 words in a couple of seconds, they literally can't do that, whereas AI really can. So when we make these comparisons, it's important to realize that although we compare ourselves to AI quite a bit, these are completely different systems that can also produce the same kind of output in a vastly more efficient way. And the tweet continued, stating that GPT-4 must learn some form of syntax, semantics, and data structures internally in order to generate executable Python code. GPT-4 does not store Python syntax trees explicitly; it doesn't store that data in its head, but it has to learn certain concepts in order to satisfy the objective. Very similarly, Sora must learn some implicit forms of text-to-3D, 3D transformations, ray-traced rendering, and physical rules in order to model the video pixels as accurately as possible. It has to learn concepts of the game in order to satisfy the objective. And that is true: when you're a kid growing up and learning, with all this data going in through your eyes, you have to learn certain concepts of reality intuitively, literally by intuition, in order to satisfy the objective of being able to walk, to look, to catch things. You're not writing things down and studying; it's all intuitive, as more and more data goes into you as you grow and learn. And this is exactly what these systems are doing, because they aren't just storing data and then recalling it. They're essentially learning it intuitively, which is an implicit form of learning, and which is why this is such a fascinating thing.
And he essentially states that the difference is that Unreal Engine 5 is handcrafted and precise, but Sora is purely learned through data and is intuitive. Now, there was another fascinating tweet by Nando de Freitas, who leads a team at Google DeepMind. He essentially says that given this intuition about the way Sora learns, he cannot find any reason to justify disagreeing with Dr. Jim Fan: with more data, higher data quality, electricity, feedback, fine-tuning, grounding, and parallel neural net models that can efficiently absorb data to reduce entropy, we will likely have machines that reason about physics better than humans and, of course, teach us new things. And that is where some incredible breakthroughs could happen. Imagine if we have a system that can think about physics in a way that we can't, and can simulate what would happen in a billion different scenarios in order to generate some kind of new information. That is something we could just be starting to experience. Of course, in the future, with more scale, we don't know whether things will break or keep working; that's how experiments go, and there are a lot of disagreements about this. But I do think the ability to understand and predict the world, including physical laws, is seen as an emergent property of neural networks as they process more data and refine their internal models. And what's fascinating, which is what he stated, is that this actually parallels how intelligence emerged in biological life. With enough high-quality data and enough feedback, neural networks might one day reason about physics and the world better than humans, which could potentially teach us new things.
Now, there was a fascinating comment, which was of course "this has got to be what Ilya saw." If you don't know what that's referring to: Ilya Sutskever held a very senior position at OpenAI, is regarded as a genius, and is seen as someone very important to OpenAI's success in developing new models and new AI breakthroughs. And of course, he was involved in the situation at OpenAI where the board tried to remove Sam Altman. People are speculating that Ilya Sutskever potentially saw something along the lines of, quite similar to, or maybe even more advanced than Sora, something that could lead to an AI system that could potentially harm humanity, which is why he moved against Sam Altman. Now, I just want to state that this is complete speculation. We haven't really heard anything from Ilya Sutskever for quite some time, other than that his position at OpenAI is quite confusing, judging from a few articles I found online. So currently we don't know what Ilya saw, or what the breakthrough was that led to the firing of Sam Altman and the subsequent rehiring. But we do know that what OpenAI has shown us today, their mini demo, is quite shocking. Now, the way they ended the research blog was rather fascinating. They basically stated at the end that they believe the capabilities Sora has today demonstrate that continued scaling of video models is a promising path towards the development of capable simulators of the physical and digital world, and of the objects and people that live within them. So basically, what they're saying is that continued scaling, remember, continued scaling of video models is a promising path towards capable simulators of the physical and digital worlds and the objects within them. And of course, this shows us that all they

### Segment 6 (25:00 - 28:00) [25:00]

need, again, is more compute. And that leads us back to this point, which is the $7 trillion. We do know that $7 trillion is quite a lot, but I'm guessing they need more compute because they clearly have something which, once they scale it up, is going to result in an advanced AI system that most people simply aren't ready for. And if you remember, we also know that the compute is going elsewhere. They are surely able to increase the GPT-4 message limit, probably to unlimited if they really wanted to, but I'm guessing that the compute for GPT-4 is going elsewhere. Think about it like this: if you have, say, 100% of your compute and you're working on new AI systems, you're not going to divert 90% of it to GPT-4, even whilst it serves a large percentage of your customer base. You're going to divert, I think, more than 50% of it to the advanced successor systems, because those are what will eventually replace the old ones. And I think the fact that we haven't seen any increase in the message limits in over a year shows us that the compute is probably going to more advanced AI systems they are already training and working on. So with that said, what do you think happens once they scale up more? Because that is what I want to know, and I think once they do, things are going to get absolutely interesting. Now, there was another tweet I saw that was absolutely crazy: an interesting thread that suggests AGI is just on the horizon. This Twitter thread essentially states that after Sora, it became very difficult for the author not to connect the dots and come to the astounding conclusion that OpenAI already has AGI. He says one dot is obviously the existence of Jimmy Apples and his leaks, and he states that Jimmy Apples scores very well.
On March 14th, he got the GPT-4 release date right. He got the Gobi and Arrakis names before they were officially leaked. He also, incredibly, called Sam Altman's firing around a month or two before it actually happened. And on February 15th, he got the release date for OpenAI's Sora correct. But the above dots may only qualify him as a legendary leaker. What makes him more than that, and one of the key tweets that exemplifies this, is that Jimmy Apples tweeted that there had been a vibe change at OpenAI and that they risked losing some key ride-or-die employees. And of course, Sam Altman was fired on November the 17th. It's crazy, because Sam Altman didn't know about the firing, ordinary people who worked at OpenAI didn't know about it, and even Mira Murati didn't know until the day before. And yet Jimmy Apples knew about the vibe change leading to Sam Altman's firing, which suggests he clearly has sources at the top level of OpenAI. And Jimmy Apples says that whilst he loves trolling occasionally, he has never lied once. He once said that AGI has been achieved internally, has mentioned AGI in 2025 multiple times, and confirmed at one point that he is serious, not trolling in the slightest. In addition, they were apparently sitting on Sora for almost a year, and GPT-4 finished training in July of 2022. And according to the leaks, there are Gobi, Arrakis, of course Q*, "4 Pete," and Orion, and it sounds like there are much more advanced systems than GPT-4 that are all not yet released. All of this is pretty much to ask: does OpenAI have AGI? I'm giving them a huge benefit of the doubt. But with all the leaks going on, and this person I think is an AI leaker as well, it's clear that this AI breakthrough is something fascinating and will lead to a truly interesting future. Yeah.
