This video breaks down Anthropic's new Claude Opus 4.5 announcement and shows you the biggest updates to coding, agents, and computer use. If you want to stay on top of the latest AI model updates, you need to see how Opus 4.5 stacks up against Gemini 3 Pro and GPT-5.1, and what Anthropic just shipped across Claude Code, the apps, and the developer platform.
So, everybody's still talking about Gemini 3 Pro, but Anthropic said, "No, no, no, no. We need to change the conversation and talk about Claude Opus 4.5, not 4.1." And somebody I know who's happy this update came out is Sam Altman, because, man, y'all took the pressure off of us. Nobody's talking about Gemini 3 crushing ChatGPT now; people are going to be talking about Opus 4.5 taking some shots at Gemini 3 Pro.

First things first, let's talk about what's probably going to be the most important thing to people, and that's the price of this model. Not only is 4.5 better than 4.1 (obviously, they wouldn't have released it if it wasn't), but you can see that it's $5 per million tokens in and $25 per million tokens out. That's something very important to understand: with these models, it's cheaper to put tokens in (the messages you send) and more expensive for the tokens that come out (the responses from the large language model).

But that got me thinking. I've been using Gemini 3 Pro all week since it came out, and something I've noticed is that all of its responses are short. I don't care how long the prompt is or how long you tell the response to be, it's going to give you a response like that. And I started thinking: I would not be surprised if Gemini 3 Pro hit all of these benchmarks because Google put a lot of compute behind it, and then they released a softer, friendlier version in the chat, because we definitely do not have benchmark-breaking Gemini 3 Pro in the app. That's just my personal opinion. The responses are way too short, with no detail at all. But I digress.

If you look right here at this benchmark table, and this is not every benchmark exhaustively, yes, every platform is going to choose the benchmarks that make them look the best. And you know, Claude is no different.
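That input/output pricing split is easy to see with a little arithmetic. Here's a minimal sketch of estimating a request's cost from the announced rates ($5 per million input tokens, $25 per million output tokens); the function name and the token counts in the example are hypothetical, not from Anthropic's docs.

```python
# Announced Claude Opus 4.5 per-token rates (USD per 1M tokens).
INPUT_PRICE_PER_M = 5.00
OUTPUT_PRICE_PER_M = 25.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Hypothetical example: a 2,000-token prompt with an 800-token response.
# The output side dominates the bill even though it's fewer tokens.
cost = estimate_cost(2_000, 800)
print(f"${cost:.4f}")
```

Notice the asymmetry: in this example the 800 output tokens cost twice as much as the 2,000 input tokens, which is exactly the "in is cheap, out is expensive" point above.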
But you can see that they are beating Gemini 3 Pro in everything except graduate-level reasoning and multilingual Q&A. When it comes to agentic coding, agentic terminal coding, agentic tool use, skilled tool use, computer use, and novel problem solving (that's the ARC-AGI-2 verified test), you can see that Claude Opus 4.5 is clearing the room, and they are basically dubbing Claude Opus 4.5 the coder's language model, the developer AI.

Then there's this one test where the model has to role-play as customer service for an airline, and it can't change a certain flight. Claude Opus 4.5 came up with a unique method of actually solving the problem. It says, "Let me think about what options I have within the policy. Number one, modify flights. Basic economy cannot be modified. But number two, change cabin. Wait, let me check this option." What it figured out was: I can't change the basic economy flight, but what I can do is change the cabin, and once the booking is in a different cabin, I can change the flight from there. Technically, it failed, because the test wasn't expecting that, but it was a creative solution. And I think that's what people actually want from AI, because a lot of what we've been getting from ChatGPT, Gemini 3 Pro, and so forth is just something regurgitating back to us what we've already said, when what we're really looking for is something that tells us something we don't know.

But Anthropic says their model isn't just smart, it's actually safe. Of all the models, Claude's have the least concerning behavior, meaning they're safe; they're less likely to go out and do something crazy. But Gemini and GPT-5.1? They're on go. Then we have susceptibility to prompt-injection-style attacks at k queries, where lower is better. And once again, Opus 4.5 (thinking) is the least susceptible to prompt-injection-style attacks, meaning it's the strongest model, the most difficult to actually jailbreak, even though there was just that story a few days ago about Claude being used in cyberattacks against tech corporations.

Last but not least, this is probably the biggest update of all: context management and memory capabilities. Instead of your conversations hitting that brick wall at the bottom, kicking over to a new chat, and you having to start all over, that's not going to happen anymore. What Claude is going to do when it gets close to the limit is use compaction, or some other tool they've got, to make a summary of the conversation, and it's going to use its memory capabilities to capture and save the important information, then continue the conversation so that you don't hit those walls anymore.

So, these are all the latest updates for Claude Opus 4.5. But the real news is going to be when people start using it and actually testing it against Gemini 3 Pro and GPT-5.1. Let me know what you guys think about this video. If you want more updates
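The compaction idea described above can be sketched roughly: when the conversation nears the context limit, fold the older turns into a summary and keep only the recent ones. This is a toy illustration under my own assumptions, not Anthropic's actual implementation; the `summarize` stand-in and the token budget here are hypothetical (a real system would ask the model itself to write the summary).

```python
# Hypothetical token budget standing in for a model's context limit.
CONTEXT_BUDGET = 100

def count_tokens(messages):
    # Crude stand-in for a real tokenizer: whitespace word count.
    return sum(len(m.split()) for m in messages)

def summarize(messages):
    # Placeholder summary; a real system would generate this with the model.
    return f"[summary of earlier conversation: {len(messages)} turns]"

def compact(messages, keep_recent=2):
    """If the history is over budget, fold older turns into one summary
    so the conversation can continue instead of hitting a hard wall."""
    if count_tokens(messages) <= CONTEXT_BUDGET:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(older)] + recent
```

The key design point is that nothing is thrown away blindly: the recent turns survive verbatim, and the older ones survive in compressed form, which is what lets the chat keep going past the old brick wall.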