M2.7 just BROKE the Entire Industry...

Wes Roth · 63,108 views · 1,586 likes

Video description

Try SerpApi https://serpapi.link/wes-roth
Click the link above to get 250 free credits to start building right now.
______________________________________________
My Links 🔗
➡️ Twitter: https://x.com/WesRoth
➡️ AI Newsletter: https://natural20.beehiiv.com/subscribe
Want to work with me? Brand, sponsorship & business inquiries: wesroth@smoothmedia.co
Check out my AI Podcast where me and Dylan interview AI experts: https://www.youtube.com/playlist?list=PLb1th0f6y4XSKLYenSVDUXFjSHsZTTfhk
______________________________________________
00:00 Minimax M2.7
03:50 Step #1
04:35 SerpApi (sponsor)
05:53 Step #2
07:02 Step #3
#ai #openai #llm

Contents (5 segments)

Minimax M2.7

So, MiniMax releases M2.7. They're calling this the "early echoes of self-evolution." First of all, what is MiniMax? It's a Chinese company founded in 2022. They have hundreds of millions of users globally, as well as investments from Alibaba, Tencent, and many, many others. They're claiming that this model helped evolve itself ("self-evolution," as they put it), and they describe a specific technical process by which the model helped improve itself. So, is this just marketing fluff, or is there something to it? Well, first of all, this isn't the first model to claim something like this. Google DeepMind's AlphaEvolve did something similar. Those models, with their harness, helped improve future versions of Gemini. AlphaEvolve was driven by Gemini, and it helped improve future versions of Gemini as well as some of Google's data centers and some of the hardware on which Gemini was trained. Recently, Andrej Karpathy released Auto Researcher, which is this idea of self-evolution on a very small scale. It's something you can run on your local machine, and a lot of people are impressed with it. It shows that this is possible. Now, not to overblow it, we're still in the early stages. Again, here they're saying these are the echoes of self-evolution. Sam Altman referred to it as being in the larval stages of recursive self-improvement. So keep that in mind as we talk about this. We're ascending that mountain where models are going to be able to do more and more of machine learning research. We're not on the vertical side of the mountain yet, nor do we know what that might look like, or even whether we're going to get there. But for the time being, we have MiniMax M2.7. So, really fast, we're going to be talking about these models and their harness. The harness is all the stuff around the model that helps it do whatever it needs to do.
It's data, it's tools, it's code that shapes what the model is doing. The analogy I use is that the model is the pilot and the harness is the vehicle or airplane it's driving or flying, whatever works for you. But in this case, it's like the driver of, let's say, a Formula 1 car who is driving the car and is also the head of the engineering lab that's improving the car, all at the same time. That's a fitting analogy here. So first and foremost, the team at MiniMax built an internal research-agent harness using M2.7. They took an early checkpoint, an early version of this model, and tasked it with building that harness: building the Formula 1 car that it would drive. It needed to support data pipelines and training environments, manage work across different teams, and keep all the memory it needed to track experiments. So in the beginning, it was like a research assistant to the machine learning team. It would do literature review. It would analyze some of the experiments the researchers proposed. It would pipeline the data, launch the experiments, fix bugs, analyze logs, handle merge requests, run smoke tests, as they say, and monitor the results. So it handled most of that end to end. Okay. This sounds impressive, but we're not into the crazy aspects yet, because if you're using something like OpenClaw, Claude Code, OpenAI's Codex, or anything like that (and there are many more; every time I mention one of them, a lot of people in the comments say, "What about this one?" Hermes is the next one that's bubbling up), my point is that we've seen this, right? It's an AI model within some scaffolding that makes it more agentic. These are the AI agents we're talking about.
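To make the pilot-and-car analogy concrete, here is a minimal Python sketch of a harness, under the assumption that a harness is at minimum a tool registry plus a memory of what has been run. The "model" picks a tool and an argument; the harness executes the tool and logs the result. Every name here (`toy_model`, `run_smoke_test`, `analyze_logs`, `Harness`) is hypothetical, and a real harness would call an LLM and real infrastructure instead of these stubs.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical stub tools standing in for real research infrastructure.
def run_smoke_test(config: dict) -> str:
    # Toy check: a config "passes" if it has a positive learning rate.
    return "pass" if config.get("lr", 0) > 0 else "fail"


def analyze_logs(log: str) -> str:
    # Toy log analysis: flag any line containing "ERROR".
    return "error found" if "ERROR" in log else "clean"


@dataclass
class Harness:
    """The 'car': tools the model can use plus a memory of past calls."""
    tools: dict[str, Callable] = field(default_factory=dict)
    memory: list[dict] = field(default_factory=list)

    def call(self, tool_name: str, arg):
        result = self.tools[tool_name](arg)
        # Record every call so later steps can consult the experiment history.
        self.memory.append({"tool": tool_name, "arg": arg, "result": result})
        return result


def toy_model(task: str) -> tuple[str, object]:
    """The 'pilot': a stand-in for the LLM that maps a task to (tool, argument)."""
    if task.startswith("smoke"):
        return "run_smoke_test", {"lr": 3e-4}
    return "analyze_logs", "step 10 loss=2.3\nERROR: NaN in grad"


harness = Harness(tools={"run_smoke_test": run_smoke_test, "analyze_logs": analyze_logs})
for task in ["smoke test the new config", "check last night's logs"]:
    tool, arg = toy_model(task)
    print(task, "->", harness.call(tool, arg))
```

Keeping the tools and memory outside the model is what makes the harness something that can be edited independently of the model's weights.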
So, this is what they're doing with an early checkpoint of this M2.7 model.

Step #1

It's working very well. So, that's impressive, but we're not into the crazy part yet. According to the team at MiniMax, M2.7 (I'm always going to struggle with that name for some reason) is handling 30 to 50% of the reinforcement learning team's workflow. So for the AI researchers working on this, it handles a third to half of everything they have to do to keep training and improving this model. All right, so that's step one. Again, impressive, right? If we stopped there, this would have been pretty cool and an interesting read. Although at this point, a lot of us have tried this at home. This isn't cutting edge. Maybe we didn't try it for machine learning, but this approach is becoming more widespread.

SerpApi (sponsor)

So, if you've ever tried to pull data from Google search programmatically, you know it's a nightmare. You write a scraper, it works for a day, then Google changes its layout, throws a CAPTCHA at you, and the whole thing breaks. SerpApi solves that. You make one API call to Google, YouTube, Bing, or whatever search engine you want, and you get back clean, structured JSON data: organic results, images, shopping, news, Google Scholar, all of it. And here's why this matters for AI specifically. Every model you build needs real-world data. SerpApi gives you a pipeline to pull live search data at scale. Need pre-classified images for training? Google Images API. Need academic papers? Google Scholar API. Need to track what the internet's saying about a hot topic right now? Google News API. They handle the CAPTCHAs, the proxy rotation, and all the scraping infrastructure, with 99.9% uptime and responses in about a second. You just focus on building. Click the link in the description or scan the QR code on the screen to get 250 free credits to start building right now. Thank you, SerpApi, for sponsoring this video. And now, let's get back to the video.

Step #2

Step two: this is where M2.7 recursively improves its own harness, right? It improves the car that it's driving. The harness plus model starts tracking its own performance and seeing which of its tweaks and changes affect the final outcome. Is it improving? Is it getting better? So it's collecting feedback on its own performance, getting some sort of data back about how it's doing: "Please fill out my little sheet, one to five. Would you rate me highly or poorly?" At the same time, it's building evaluations for various internal tasks, and then it begins iterating on its own architecture and its own skills. Those are like the SKILL.md files that I'm sure some of you are familiar with, which we use for Claude Code, etc. Skills are used when an AI agent has to do something often. Instead of coming up with some fresh and exciting way to approach the task every single time, we turn it into a skill and the agent just follows that skill. It's like a recipe: if you want to do this particular research, here's a recipe for how to do that. If you want to make a blog post, here's a step-by-step recipe for that. So the point is that this agent is rewriting its own tools to get better at its own job.

Step #3

Okay, so that was step two: more exciting, more interesting, right? We're getting a step up. But now we get to step three: autonomous scaffold optimization. They ran this for 100-plus rounds. They took that early version of M2.7 with the scaffolding it had and said: okay, come up with a hypothesis for how you could improve yourself. Design the experiment, modify the code, commit those changes, then run whatever benchmark you want to see whether you got better or worse. Then compare those results with the control group, meaning how you performed previously. If you're getting better, keep those changes; if you're getting worse, revert to the previous version where you were better and try something else. And it ran this 100-plus times with zero human input of any kind. So it autonomously carries out what is basically the scientific method: hypothesis, experiment, comparison against the control group, conclusion. That's one loop, and it just keeps looping through that with no human supervision. I mean, I'm sure there's a person in the room, or maybe not. Who knows?
By the way, you might have run into this idea at some point, where people would say, "One day AGI, or some AI model, will run these experiments autonomously, and somewhere in the middle of the night there will be this massive explosion of intelligence. It hits some wild curve and becomes infinitely smarter." Not that long ago, this was pure science-fiction speculation. Think about how many people are running Andrej Karpathy's Auto Researcher on their home computers right now. Is that thing going to become aware in the middle of the night while you're sleeping and then escape onto the internet? Probably not, right?
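The keep-or-revert loop in step three is essentially greedy hill climbing against a benchmark. Below is a minimal Python sketch of that loop. The benchmark function, the two knobs (`temperature` and a hypothetical `max_retries`), and the random "hypothesis" generator are all illustrative assumptions rather than MiniMax's actual setup, where the model itself would propose and implement each change.

```python
import copy
import random


def benchmark(config: dict) -> float:
    """Toy stand-in for a real eval suite (higher is better).
    Assumes, purely for illustration, that the ideal settings are
    temperature 0.7 and 3 retries."""
    return -(config["temperature"] - 0.7) ** 2 - 0.01 * (config["max_retries"] - 3) ** 2


def propose_change(config: dict, rng: random.Random) -> dict:
    """'Hypothesis' step: tweak one knob on a copy of the current config."""
    candidate = copy.deepcopy(config)
    if rng.random() < 0.5:
        new_t = candidate["temperature"] + rng.uniform(-0.2, 0.2)
        candidate["temperature"] = min(2.0, max(0.0, new_t))  # clamp to [0, 2]
    else:
        candidate["max_retries"] = max(0, candidate["max_retries"] + rng.choice([-1, 1]))
    return candidate


def optimize(config: dict, rounds: int = 100, seed: int = 0):
    rng = random.Random(seed)
    best, best_score = config, benchmark(config)  # the control group
    history = []
    for r in range(rounds):
        candidate = propose_change(best, rng)  # hypothesis + modify the code
        score = benchmark(candidate)           # run the experiment
        kept = score > best_score              # compare against the control
        if kept:
            best, best_score = candidate, score  # "commit" the change
        # Otherwise "revert": keep the previous best and try something else.
        history.append((r, score, kept))
    return best, best_score, history


start = {"temperature": 1.0, "max_retries": 0}
best, score, history = optimize(start)
print(best, round(score, 4), sum(k for _, _, k in history), "changes kept")
```

Because a change is only kept when it beats the current best, the recorded score never gets worse; the trade-off is that a pure greedy loop like this can stall in a local optimum.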
But as you'll see with this MiniMax model, we might be getting closer to that kind of scenario. I say that specifically because of how it performed on OpenAI's MLE-bench, I believe it was called. We'll come back to that in just a second. Okay, so it's running 100-plus different experiments. What is it testing? What knobs and dials is it turning? One that I thought was interesting was temperature. It changed the temperature. I don't mean on the thermostat; I mean the model's own sampling temperature. You can think of temperature as telling the model how creative it needs to be. These models are essentially guessing at what the output should be. If we turn the knob up, we're saying, take a wild guess, there are no wrong answers, see what you can come up with. If we turn it down, the output is more predetermined, so to speak, staying within narrower bounds of whatever is statistically likely. I'll give you a quick example. I opened up GPT-4.1 in the playground, where you can actually play around with these settings. If you haven't tried it, I encourage you to; it's kind of fun and gives you an idea of how these models work. I told it to write a very short poem about cats with the temperature set at 1, which is the default. It said, "Soft paws in the sun, whiskers twitch, then done; a leap and nap, life is simple for cats." So, it's what you would expect, right? Most people would say yes, that's a reasonable answer to "a short poem about a cat." No one's going to go, "What was that? That's not what I asked for." The guesses stay within a normal range of wildness for a poem about a cat. Let's turn the temperature up to 2, which is the maximum, and see what happens. Here's what it got: "Moonlight shimmers. Soft footfalls along the sill. Now I miss my cat." Okay. So, a lot weirder. Stuff got weird there, right?
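Mechanically, the temperature knob is usually implemented by dividing the model's raw next-token scores (logits) by T before the softmax, so that p_i = exp(z_i / T) / Σ_j exp(z_j / T). Here is a minimal Python sketch of that standard formulation; the candidate words and logit values are made up for illustration, and hosted APIs may layer other sampling controls (such as top-p) on top of it.

```python
import math
import random


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores to probabilities after scaling them by 1/temperature."""
    if temperature <= 0:
        # T = 0 is conventionally handled separately as greedy (argmax) decoding.
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens: list[str], logits: list[float], temperature: float, rng: random.Random) -> str:
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical next-word candidates after "Soft paws in the ..."
tokens = ["sun", "grass", "moonlight", "abyss"]
logits = [4.0, 2.5, 1.0, -1.0]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: " + ", ".join(f"{tok}={p:.2f}" for tok, p in zip(tokens, probs)))

rng = random.Random(42)
print([sample_token(tokens, logits, 2.0, rng) for _ in range(5)])
```

Raising T flattens the distribution: the top word loses probability and low-ranked words like "abyss" gain it, which is one plausible reason high-temperature outputs drift off-topic the way the poems in this demo do.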
The moonlight one is still a poem. Yeah, I guess it's about a cat, but it's kind of dark. And is the cat dead in this one? Okay, so that's what happens when you turn up the temperature. By the way, I put in that same prompt again with a temperature of 2, just to see how different it would be. Here's the second attempt at a poem about a cat. As you can see, it starts with something like a poem about a cat, but it proceeds into what I can only surmise is some sort of instructions for demon summoning. There are pages and pages of this. It cycles through what I think is pretty much every language that has ever existed, and it ends with "greetings, spirits," which confirms my suspicion that it's summoning something, because at the end it's like, greetings, you know, thing that I had just summoned. Okay, I'd better log out and clear the cookies on this one. So that's what turning up the temperature does. And as you can imagine, just in the range from 1 to 2, there are so many different possibilities for what could be produced. Depending on what you're trying to achieve, there might be some special range, or some number in there, that works much better than others. That was just one of many different things it tried to test and optimize, to see whether it could improve itself and its abilities by tweaking that one variable. It also improved some of its work guidelines. So, for example, if it found a bug, it
