its own job. Okay, so that was kind of step two, more exciting, right? We're getting kind of a step up. So this is more exciting, more interesting. But now we get to step three. So this is the autonomous scaffold optimization. So it ran this for a 100 plus rounds. So it took that early version of M2. 7 with the scaffolding that it had and said okay think of a way like a hypothesis how could you improve yourself okay design the experiment modify the code uh then commit those changes then run whatever benchmark test that you want to see did you get better or did you get worse then compare those results with you know the control group how you were previously right if you're getting better then just commit those changes if you're getting worse revert back to the previous step where you were better and try something else all right so it's running this 100 plus times with zero human input of any kind. Right? So it autonomously does what basically is the scientific method. Right? So hypothesis experiment, you know, compared to the control group. Then here's the conclusion, right? So that's one loop and it just keeps looping through that with no human supervision. I mean, I'm sure there's a person in the room or maybe not. Who knows? By the way, you might have ran into this idea at some point in your life where people would say, "Oh, one day the AGI or whatever some AI model like it does these experiments autonomously and somewhere in the middle of the night like wakes up and there's this massive explosion of intelligence, right? It hits some wild curve and it just became infinitely smarter. " Right? Not that long ago, this was just pure science fiction speculation. Think about right now how many people are running Andre Karpathy's Auto Researcher on their own home computers now. Is that thing gonna become aware in the middle of the night while you're sleeping and then escape onto the internet? Probably not, right? But as you'll see with this Miniax, we might be getting closer to that. And I say that specifically because of how it performed on OpenAI's MLE bench, I believe it was called. We'll come back to that in just a second. Okay, so it's running a 100 plus different things. What is it testing? What kind of knobs and dials is it turning? Well, one of them that I thought was interesting was the temperature. it changed the temperature. I don't mean like on the thermostat. I mean the model's own temperature. So temperature you can think of as we're sort of asking the model how creative does it need to be. Right? So these models are kind of like guessing at what the output should be. And if we turn up a knob we're saying take a wild guess, you know, just there's no wrong answers. See what you can come up with. And if we turn it down, it's a little bit more predetermined, so to speak. within narrower guidelines, narrower bounds of whatever statistically likely. I'll give you a quick example. So, I opened up GPT 4. 1 in the playground. You can actually play around with these settings. If you haven't tried it out, I encourage you to kind of fun and kind of gives you an idea about how these models work. So, I told it to write a very short poem about cats and the temperature set at one. That's kind of the default. It said, "Soft pause in the sun, whiskers twitch, then done a leap and nap, life is simple for cats, right? " So, it's kind of what you would expect, right? But you see how most people would say yes, that's the correct answer to a short poem about a cat. No one's going to go, "What was that? " You know, that's not what I asked for. That sort of is within some normal precision of how wild the guesses are as to a poem about a cat. Let's turn up the temperature to two, which is the maximum, and uh see what happens. So, here's what it got. Moonlight shimmers. Soft footfalls along the sill. Now I miss my cat, right? Okay. So, a lot weirder. Stuff got weird there, right? It's Yeah, it's a poem. Yeah, I guess it's about a cat, but it's kind of dark. And is the cat dead in this one? Okay, so that's what happens when you turn up the temperature. By the way, when I just put in that same prompt again with temperature of two, just to see kind of how different it would be. Here's the second attempt at a poem about a cat. As you can see, it starts with something like a poem about a cat, but it proceeds into what I can only surmise is some sort of instructions for demon summoning. And there's like pages and pages of this. It cycles through what I think is pretty much every single language that existed in the world. And it ends with greetings, spirits, which come confirms my suspicion that you're summoning something cuz at the end of it, you like greetings, you know, thing that I had just summoned. Okay, I better log out and clear the cookies on this one. Okay, so that's turning up the temperature. That's what that does. And as you can imagine, from one to two, like just in that range, there's so many different possibilities for what could be produced. And depending on what you're trying to achieve, there might be some special range or some number in there that just like works much better than others. And so that was just one of many different things that it tried to test and optimize and see if it can prove itself and its abilities by tweaking that one variable. It also improved some of the work guidelines. So, for example, if it found a bug, it