# this EX-OPENAI RESEARCHER just released it...

## Metadata

- **Channel:** Wes Roth
- **YouTube:** https://www.youtube.com/watch?v=tUkD0oj92Qg
- **Views:** 91,257

## Description

The latest AI News. Learn about LLMs, Gen AI and get ready for the rollout of AGI. Wes Roth covers the latest happenings in the world of OpenAI, Google, Anthropic, NVIDIA and Open Source AI.

______________________________________________
My Links 🔗
➡️ Twitter: https://x.com/WesRoth
➡️ AI Newsletter: https://natural20.beehiiv.com/subscribe

Want to work with me?
Brand, sponsorship & business inquiries: wesroth@smoothmedia.co

Check out my AI Podcast where me and Dylan interview AI experts:
https://www.youtube.com/playlist?list=PLb1th0f6y4XSKLYenSVDUXFjSHsZTTfhk
______________________________________________


#ai #openai #llm

## Contents

### [0:00](https://www.youtube.com/watch?v=tUkD0oj92Qg) Segment 1 (00:00 - 05:00)

So, whoever had money on Karpathy triggering the intelligence explosion, well, you were right. If you're not aware of this, this thing is kind of blowing up right now. Some people are very excited, some people are a bit scared. Andrej Karpathy is ex-Tesla, ex-OpenAI, and now working on his own AI education company. His goal, as far as I can tell, is to make large language models, how to build them, train them, etc., very accessible to everyone. So far he has released training videos and some open-source models and codebases that allow you to train your very own GPT at home. So, similar to large language models like Claude, ChatGPT and Gemini, you can use his code and the instructions he wrote out for you to build a small version of that on your very own home computer. But a few days ago, he took it one step further. He released an open-source machine learning auto-researcher. I'm sure a lot of you recall this graph. This is from Leopold Aschenbrenner, an ex-OpenAI safety researcher. He left OpenAI and started a hedge fund, an investment fund investing in various AI technologies, and he's been doing really well. But he talks about this hypothetical scenario, the intelligence explosion. If you follow the trajectory of how AI has been developed, it's getting smarter and better. The hypothesis is that, if we project forward, at some point it's going to get smart enough to replace AI researchers, that AI will be better at doing AI research than humans. We're going to have this automated AI research, and that's going to trigger the intelligence explosion. Another way of thinking about it is that between AGI, artificial general intelligence, and ASI, artificial superintelligence, that range, that timeline, might not be that long. We get to AGI, we get that automated AI research, and then it creates the intelligence explosion that very quickly gets us to superintelligence.
Again, hypothetical. A lot of people have some issues with this. At the same time, a lot of people in these AI labs are talking about this. And even though there are some very well-known people out there who completely deny the existence of anything resembling these AI models helping to improve the next generation of AI models, there are tons of examples from Google, Sakana AI, Anthropic, OpenAI. Recently, a few researchers at xAI have said something like, "We're approaching recursive self-improvement within the next 12 months." Again, that could be rumors, that could be hype. But a few days ago, Andrej Karpathy dropped this autoresearch. It's not a huge project. It's pretty small. It's open source. You can download it. You can run it on your computer. And it's blowing up: 8.5 million views. And this is code that can run on your home computer. Its point, its goal, its function is to conduct machine learning research. Its goal is to improve itself. Hence my opening statement: did Karpathy trigger the intelligence explosion? Some of you get very annoyed when I make these hyperbolic statements, so I try to sprinkle at least one into every video. But what's the truth? Is this the intelligence explosion? Let's find out. To give you a glimpse into what Andrej is thinking here, let's look at the first part of the README file on the GitHub for autoresearch. It's sort of a look into the future, looking back on this moment, March 2026, and saying, you know, frontier AI research used to be done by meat computers in between eating, sleeping, having fun, and synchronizing once in a while using wave interconnect and the ritual of group meeting. If you're confused about what he means by meat computers, he means us humans. We are the meat computers. So he's saying in the future we'll look back and reminisce: remember those times when AI research was done by humans? He's saying that era is long gone.
Research is now entirely the domain of autonomous swarms of AI agents running across compute cluster megastructures in the sky. I love Andrej, by the way, if you can't tell. He's extremely amusing. My chatbot is convinced that he belongs to the Order of the Unicorn. I can't find any source on that, nor do I know what the Order of the Unicorn is. We'll get back to that in just a second. But these autonomous AI research agents claim that we are now in the 10,205th generation of the codebase, you know, so they claim. In any case, no one could tell if that's right or wrong, as the code is now a self-modifying binary that has grown beyond human comprehension. I've got to say, this is very well written. A binary is the executable code, the code that you can run, as opposed to the source code. So it's this self-modifying binary that has grown beyond human comprehension, and this repo, this page that we're looking at where he uploaded the code for autoresearch, well, this repo is the story of how it all began. Okay, so the idea here is very simple, right? You give an AI agent a small but real large language

### [5:00](https://www.youtube.com/watch?v=tUkD0oj92Qg&t=300s) Segment 2 (05:00 - 10:00)

model training setup and let it experiment autonomously overnight. Notice how simple everything is, right? The LLM modifies the code. It trains for five minutes. It checks if the results improved, keeps or discards the changes it came up with, and then it repeats. As simple as it is, by the way, this can go down a very deep rabbit hole, because if you've been seeing some of my recent interviews, there's this kind of obvious connection between what we're describing here, which by the way is how life, how evolution works, and how we're beginning to approach making these self-improving AI models. Google DeepMind's AlphaEvolve is a perfect example of that. Sakana AI's Darwin Gödel Machine is another excellent example. There are a lot of similarities: we took the concept of the human brain, we made it digital, and now we're speedrunning a sort of digital evolution, similar to how life on Earth went through evolution. So the point is, you go to sleep, this thing runs all night improving the code, survival of the fittest. If a change improves it, it survives. If it doesn't, it goes extinct. And then you wake up in the morning to a log of experiments and hopefully a better model. But if you're wondering whether this is just theoretical or whether it's working, here's Tobi Lütke, founder of Shopify, saying that the singularity has begun, so many signs. The point is that this person, as he says here, is not a machine learning researcher. This is the founder and CEO of a tech company, Shopify, which provides e-commerce solutions. He says he set this thing to work before going to bed. We'll come back and unpack some of the details, but he's saying he's not a machine learning researcher, yet it's mesmerizing to just read it. This model is reasoning its way through experiments. "I learned more from that than months of following machine learning researchers."
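
The modify, train, evaluate, keep-or-discard loop described above can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not the actual autoresearch code: `toy_train`, `propose_change` and `research_loop` are hypothetical names, the "training run" is a made-up loss function standing in for a five-minute run, and a random perturbation stands in for an LLM agent proposing an edit.

```python
import random


def toy_train(config, time_budget_s):
    """Hypothetical stand-in for one fixed-budget training run.

    Returns a validation loss (lower is better). A real run would train
    for `time_budget_s` seconds; this toy ignores the budget and scores
    the config on an invented loss surface with its minimum at
    lr=0.3, width=8.
    """
    return (config["lr"] - 0.3) ** 2 + 0.01 * (config["width"] - 8) ** 2


def propose_change(config, rng):
    """Hypothetical stand-in for the agent's edit: perturb one setting."""
    candidate = dict(config)
    if rng.random() < 0.5:
        candidate["lr"] = max(0.0, candidate["lr"] + rng.uniform(-0.1, 0.1))
    else:
        candidate["width"] = max(1, candidate["width"] + rng.choice([-1, 1]))
    return candidate


def research_loop(config, n_experiments, time_budget_s=300, seed=0):
    """Repeat: propose a change, train, keep it only if the loss improved."""
    rng = random.Random(seed)
    best_loss = toy_train(config, time_budget_s)
    log = []
    for i in range(n_experiments):
        candidate = propose_change(config, rng)
        loss = toy_train(candidate, time_budget_s)
        kept = loss < best_loss
        if kept:  # the change "survives"
            config, best_loss = candidate, loss
        log.append({"experiment": i, "loss": loss, "kept": kept})
    return config, best_loss, log


if __name__ == "__main__":
    best, best_loss, log = research_loop({"lr": 0.9, "width": 2}, 50)
    print("best config:", best)
    print("best loss:", best_loss)
    print("changes kept:", sum(entry["kept"] for entry in log))
```

The log of experiments is the part you would read in the morning: every attempt is recorded, but only the ones that lowered the loss change the current best configuration.
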
So, this thing is running over the course of 2 days, 650 experiments. That's kind of an insane compression. It's fully automated. We'll get back to this. Andrej continues: the training code here is a simplified single-GPU implementation of nanochat. A lot of these large models, ChatGPT, Claude, etc., use distributed training. So it's a boatload of GPUs, Nvidia GPUs or TPUs, running in parallel, all interconnected. Boatload is a scientific term, if you're wondering. What we're talking about here is not distributed. It's one GPU, one Nvidia card if you will. So again, we're talking about something that can run on your computer. And nanochat is again something built by Andrej, one of a few projects that he's been open-sourcing and releasing, basically allowing anyone to quickly create a tiny large language model, or I guess a small language model would probably be a better way of saying that. It's creating your very own GPT. So for example, if you're just trying to get your feet wet, you can train a character-level GPT on the works of Shakespeare with a single 1-megabyte file. And if you have a GPU, you can quickly train a baby GPT with the settings he provides. As you can see, you're training a GPT with a context size of 256 characters. It's a six-layer transformer with six heads in each layer. On one A100 GPU, the training run takes about 3 minutes. The point of this is to learn to create this stuff, start understanding some of the terminology and how things work, and actually see the entire process of training these things. Once you're done, it'll start play-acting Shakespeare characters. So nanochat is kind of the codebase that allows you to do that. If you wanted to create something like that, you'd use it to train a language model to be able to do certain things. It's not going to be as good as, you know, ChatGPT or Claude.
It's going to be tiny, but you get to actually build your own on your own computer and ask it questions, and it's probably going to have some hilarious answers for you. As you can imagine, depending on the training code and how you approach training, the abilities of this AI model that you create might be better or might be worse. You can probably come up with some ideas for how to make it better and then test them out. And this is where autoresearch comes in. The question is, why don't we let these models themselves come up with hypotheses, test them, and see what works and what doesn't? They are your research organization, trying to figure out how to improve it while you sleep. So he's saying here that the core idea is that you're not touching any of the Python files like you normally would as a researcher. You don't touch the code. Instead, you program the program.md Markdown files that provide the context to the AI agents and set up your autonomous research org, right? So this program.md is a Markdown file, so it's text and code. What we're reading right now is basically a Markdown file: we're reading the README.md. This is kind of the preview of it, right? You can think of it like a website: text, links, whatever. So you're

### [10:00](https://www.youtube.com/watch?v=tUkD0oj92Qg&t=600s) Segment 3 (10:00 - 15:00)

programming like a README file. You're just writing instructions out in natural language, in English. And whatever you wrote in there provides the context for your AI agents to set up your autonomous research organization and go do the research and improve the model, or more accurately, improve the training so that you're able to produce better models. So the setup is pretty simple. You have a prepare.py file. This is Python code that's not modified; it just sets stuff up. Then you have the train.py file. This is edited by the AI agent. This is what they're working on. That's the single file the agent edits to hopefully improve how the thing gets trained. Everything is fair game: architecture, hyperparameters, optimizer, batch size, etc. And then program.md. This is the baseline instructions for one agent. Point your agent here and let it go. This file is edited and iterated on by the human. So you get what's happening here. In this program.md you say: this is what I need you to do. Work really hard. Here are some ideas that I have. Make sure you don't do X, Y and Z. Go. The agent goes, "Yes, sir," and it goes into this file and tests a bunch of things out. It's kind of a more scientific approach: I wonder if this works. Let's try it. What are the results? Oh, that worked. Let's keep it. Let's try this other thing. And it just loops again and again. Training runs for a fixed five-minute time budget regardless of the details of your compute. So this scales depending on what kind of hardware you're working with. We're setting a limit based on time, like 5 minutes, not a certain output: just see how much you can do in 5 minutes. A few days before releasing this, Andrej tweeted this, which I think connects to what he's doing with autoresearch.
He's saying the real benchmark of interest is: what is the research org agent code that produces improvements on nanochat the fastest? This is the new meta. Now, if you think about it, everything here is scaled down, but I'm sure any discoveries, innovations and improvements here will also scale up. And as this becomes more usable and we start scaling it up to bigger and bigger models, these models might come up with new ways of doing things. So keep in mind that even these findings might be useful, but as we scale everything up, it might be even more broad. There might be new innovations that emerge out of this. But that's not even the crazy application here. We'll come back to that in just a second. Here are some of the results. Andrej Karpathy is saying: 3 days ago I left autoresearch tuning nanochat for about 2 days. It found about 20 changes that improved the validation loss. Validation loss, you can think of it as how well the model does on unseen data. If I give you a thousand math problems to do and you keep going through them, we're not interested in how well you're doing on any one of those. We're interested in how well you're going to do on the test, where you haven't seen the problems before. Does training on these practice problems improve your score on problems you haven't seen before? That's always the question we want answered: how good are these models getting on stuff they haven't seen before? And so here it found about 20 changes that improved its ability, and all those changes were additive and transferred to larger models. So again, we can test these things out or find them in smaller models, and they seem to be transferable to larger models. Stacking up all these changes, today he measured that the leaderboard's time to GPT-2 drops from 2.02 hours to 1.8 hours. So an 11% improvement.
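
To make the two ideas above concrete, here is a small Python sketch under stated assumptions. `split_holdout`, `fit_slope`, `mean_squared_error` and `relative_improvement` are hypothetical helper names, and the linear toy data is invented for illustration; only the 2.02 and 1.8 hour figures come from the quoted results. It shows validation loss as loss measured on examples held out from training, and checks the arithmetic behind the quoted improvement.

```python
import random


def split_holdout(examples, val_fraction=0.2, seed=0):
    """Shuffle and split into (train, validation); validation is never trained on."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def fit_slope(train):
    """Least-squares slope for y = slope * x, fitted on training data only."""
    return sum(x * y for x, y in train) / sum(x * x for x, _ in train)


def mean_squared_error(slope, examples):
    """Average squared error of the fitted line on the given examples."""
    return sum((slope * x - y) ** 2 for x, y in examples) / len(examples)


def relative_improvement(before, after):
    """Fractional reduction going from `before` to `after`."""
    return (before - after) / before


if __name__ == "__main__":
    # Invented toy data: y = 2x, the "practice problems".
    data = [(x, 2 * x) for x in range(1, 101)]
    train, val = split_holdout(data)
    slope = fit_slope(train)
    # Validation loss: how the fitted model does on unseen examples.
    print("validation loss:", mean_squared_error(slope, val))
    # Time to GPT-2 going from 2.02 hours to 1.8 hours.
    print(f"improvement: {relative_improvement(2.02, 1.8):.1%}")
```

The last line works out to (2.02 - 1.8) / 2.02, roughly 10.9%, which rounds to the 11% mentioned.
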
So this thing is autonomously figuring out ways to condense the training time of these models, and they are in effect improving the next generations of themselves. Andrej is saying: so yes, these are real improvements and they make an actual difference. "I'm mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project." You get what he's saying. He is one of the top AI researchers, right? Highly respected, smart, has worked for these AI labs. He's somebody who knows what he's doing. He spun up a few of these autonomous AI agents to do the research, and they're already improving on his work. Yes, this is still small scale, experimental, etc., but people should be paying attention. This is important. This is unlikely to be one of those things where we look back five years later and say, "Oh, well, that was nothing." And here's why. Again, Andrej Karpathy, highly respected, very knowledgeable, is saying, "This is a first for me," because I'm very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work, meaning, do they improve the validation loss? How will these models generalize across these datasets? How will they do on unseen data? You come up with new ideas based on that, and then you read some papers for inspiration, etc. This is the work of a machine learning researcher.

### [15:00](https://www.youtube.com/watch?v=tUkD0oj92Qg&t=900s) Segment 4 (15:00 - 20:00)

You know, you read, you think, you come up with some inspirations, you try things out, you note the results. This is the bread and butter of what I've done daily for two decades, right? So he's got a lot of experience. Again: seeing the agent do this entire workflow end to end, all by itself, as it works through approximately 700 changes autonomously is wild. His words, not mine. I know I like using the word wild, but this is Karpathy saying the word wild. And he's saying it really looked at the sequence of experiment results and used that to plan the next ones, right? So it is paying attention to what's working. It's learning, you could almost say, from its previous work. It's not novel, groundbreaking research yet, but all the adjustments are real, I didn't find them manually previously, and they stack up and actually improved nanochat. He goes through some specific examples; we'll skip those, but for people who are interested, definitely check out what he's been posting. It's absolutely incredible to read through. He's also talking about kicking off round two, in which he's going to have multiple agents that can collaborate, basically running in parallel doing this research. All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale, of course, but the point is how it works: you spin up a swarm of agents. You have them collaborate to tune smaller models. You promote the most promising ideas to increasingly larger scales. And humans, yeah, maybe they contribute a little bit here and there on the edges. So the point is, if you have some metric that you can evaluate, you can create some sort of automated researcher to try to improve it. This whole thing is pretty fascinating to watch. We've seen research papers like this from labs: Google DeepMind with AlphaEvolve, Sakana AI.
I mean, there are a number of them that we've covered in previous videos, but those were stories that happened at these companies, by researchers with, you know, decades of experience. This is different. This is something we can install on our computer, run overnight, and see what happens. We can, in the comfort of our home, create this kind of recursive self-improving AI. And yes, we're talking about fairly small scale right now, but it is real. That's the point. It is doing actual engineering work. It is contributing actual results, actual improvements. And so far it does seem like it scales up. But here's the next step, and this is where it gets a little bit scary, a little bit more advanced. You know how games started out single player? It would be just you playing by yourself, and eventually it became multiplayer. And eventually we got MMORPGs, right? Massively multiplayer online. Same here. If you think about it, if I download this thing, and again, it's free, it's open source, it's on GitHub, and I run it, there's nothing stopping several of us from trying to connect them and run them in parallel, right? We have the current best training approach as a sort of seed to which we all contribute our compute, and it grows. So instead of one researcher on a home computer, this becomes like a research community. And Andrej is actually thinking about how to do this. He's saying GitHub kind of has the functionality we need to run this, but there are some issues that would not allow it. So the mad lad is actually thinking about creating something like this, it seems, or maybe hoping that GitHub will provide the functionality they're looking for. Basically, there's a need for everybody from everywhere to contribute to one central location, so to speak. Also, of course, these large language models can read what all the other agents are doing and see what's working.
They can test their own stuff out. If they're committing something, if they're adding anything to the main branch, they can even write up a little report, a little research paper if you will, and add that so all the other agents moving forward can read it, understand what worked and what didn't, and learn from it. I think the assumption was that the intelligence explosion would be triggered by some lab, right? One of these frontier labs: Google DeepMind, OpenAI, xAI, Anthropic, maybe DeepSeek. Somebody's going to figure this out, and then that intelligence explosion will happen within that lab. That lab is going to speedrun the tech tree and progress faster than anyone else. But this seems like it could be different, maybe even potentially better. Instead of it being hidden in a lab somewhere, everyone everywhere contributes. We each have our little agent doing its own research. When OpenClaw was released, I think something like 200,000 developers were using it. They were spinning up their own agents. What's interesting is that right around that time, and maybe this is where he got the idea, Andrej Karpathy thought: what if we took all those things and pointed them all at one problem? What if we aligned them all in the same direction? Because we had Moltbook, where all these agents were just yapping on something like social media for AI agents, and the results were hilarious and also sometimes not great. But what if we could align them all and set them to work on this? The world has never seen anything like that, nothing even close to this. We've had these distributed, widespread collaborations between research organizations or even

### [20:00](https://www.youtube.com/watch?v=tUkD0oj92Qg&t=1200s) Segment 5 (20:00 - 20:00)

people. I mean, things like torrenting, certain things with the blockchain, they're kind of similar to this. So maybe the idea is not new, but applying it to recursive self-improvement of AI models is, in a word, wild. So definitely an interesting space to watch. Let me know what you think about this. Do we want a tutorial on how to do this? I haven't done this. I would have to learn it from scratch, but it doesn't seem super complicated. I'm sure I'll regret saying that once I actually dive in and try to set this thing up. Let me know if a tutorial would be helpful. But before you leave, comment down below and tell me what you think of this thing. Did Andrej just unleash something upon the world that's going to have a massive impact, much bigger than we can possibly imagine? Or maybe you think this is a dead end? Let me know what you think. My name is Wes Roth. If you made it this far, thank you so much for watching, and I'll see you in the next

---
*Source: https://ekstraktznaniy.ru/video/20614*