Want to get more customers, make more money & save 100s of hours with AI? Join me in the AI Profit Boardroom: https://juliangoldieai.com/GmuA8a
Get a FREE AI Automation Session 👉 https://juliangoldieai.com/70E3Gs
Testing the New Claude Opus 4.1 AI - Is It the Most Powerful AI Model?
In this video, we rigorously test Claude Opus 4.1, the latest upgrade released in August 2025 from one of the most powerful AI models. We compare its performance against predecessor Opus 4, Sonet 3.7, Gemini 2.5 Pro, and OpenAI O3, focusing on agent tasks, real-world coding, and reasoning. We'll show comparative benchmarks, API utilization in Visual Studio Code, and live in-chat tests. This episode also highlights the AI Profit Boardroom community and resources. Watch to see if Claude Opus 4.1 lives up to its hype as the top AI model.
00:00 Introduction to Claude Opus 4.1
00:30 Benchmark Performance and Key Features
01:47 Using Claude Opus 4.1 for Development
02:33 Testing Prompts and Comparing Outputs
03:09 Comparing Claude Opus 4.1 with Other Models
03:27 Addressing Limitations and Costs
10:18 Community and Resources
13:13 Creating AI-Powered Applications
21:07 Conclusion and Final Thoughts
Today we're going to be testing out the new release from Claude Opus 4. 1 which is a powerful new upgrade to Claude where essentially you've got one of the most powerful AI models in the world for agented tasks real world coding and you can see how it performs right here. Obviously Opus 4 was just released in May 2025. This is Opus 4. 1 which has been released in August and you can see the benchmarks here. It's actually outperforming it by quite a long way. So
we're going to be testing out today seeing how it performs etc. This literally just got announced like literally not long ago at all. And they said today we're releasing Claude Opus 4 want an upgrade to Opus 4 including aic tasks real world coding and res. So the main thing that you want to be focusing on here I think is a task. If you look at for example chat GPT agent that's come out. If you look at for example AI super agents like GenSpark that are actually using Opus 4. 1 this is the way the world is going. Manis have even just released a new wide research model that basically has a team of AI employees. All right. So if we have a look here what we can see is on the software engineering benchmarks Opus 4. 1 is absolutely destroying Sonnet 3. 7 and it's outranking Opus 4 as well. You can see how it performs on the benchmarks here. So there's never been a model before that's so powerful with agentic coding. Agentic. So it's good for example if you use this in combination with something like clawed code as well. Gentic tool use pretty powerful right there. Visual read multilingual Q& A. It's almost at 100% at this point. And then high school maths as well. It's destroying pretty much all the other models. I think the only one that's competing on maths is open AAI 03 which is super slow to use to be honest. I love the model but it's way too slow to
use. Now, if you're a developer, you can use it via the API just by using Claude Opus 4. 1. And then, additionally, what you can do is just use it directly inside the chat. So, if we go over to Claude here, you do need to be on the pro plan. And then what you're going to do is just select Claude Opus 4. 1 powerful large model for complex challenges. And then you can combine it with other tools as well. So, you can add all your other AI tools here. Cool thing about this is like if you combine Opus 4 in one with all the tools built in and all the MCP connectors inside Claude, then essentially what you've got is basically an AI super agent that can run through your emails or your PayPal, whatever app you want to connect it to, even Zappia for example. All right, you can see the latest news about here as well. So you can see what people are building, that sort of thing, and how you can use it. So let's give it a test.
What we're going to do is we're going to grab some of the prompts from the AI profit boardroom. If you want to get access to the all the resources from today and everything else, just check out the AI profit boardroom link in the comments description. It's an awesome community. What we're going to do is take the first prompt and we'll copy and paste that into Claude. So, let's go over here. Paste that in. Boom. Shack. Let's hit enter. What I want to do as well is I want to compare it to the outputs of something that's probably slightly comparable, but I wouldn't say it's anywhere near as powerful at this point, which is Gemini 2. 5 Pro, but let's see. I don't know, maybe one of them's going to outperform each other. Okay. So, what we can actually do is just pull these side by side and we'll just see, okay, which one does the best here. There we go. All right. So, we've
got 2. 5 Pro versus Claude Opus 4. 1. Let's see which one performs. In the meantime, whilst these are running, what we can also do is have a look on Open Router and we can check the stats on Opus. I assume it'll be on there. So, Open Router Opus 4. 1. Let's have a look what we got. This is one of the biggest
issues with Claude Opus. Right. As much as I love clawed products, I think for writing it absolutely destroys chat GPT. Honestly, if chat GPT wasn't so personalized, then I would be using Opus 4. 1. But you see here, this is the biggest issue by far, which is context window. What that essentially means is if you're going back and forth in the chat, if it's got a short context window, then the problem with this is that you're going to run out of tokens very quickly and you've got a very limited window in terms of how much you can go back and forth from Claude before you have to start a new chat. So, that's one of the biggest issues here. So, I want to be fully transparent on what the downsides as well. Let's see what we got. Let's see where Gemini is at. So, Gemini has actually completed before Opus 4. 1, which makes sense because it's not going to reason for as long. Actually, just going to mute this tab and then we'll test it out. See how it goes. So, this is the game that we got back from Gemini. Let's see what we got here. The only thing that I would say here is you can't even lose this game, right? Look how big the Pong paddles are. And then also, why is the game so small? So, it's a little bit weird, but I don't mind the output. It's just like it should be big. Why are you making a tiny little game, mate? Now, let's have a look at the hyperdopamine pong game from Opus 4. 1. And this just shows you the power, right? So, already it's looking good. It's full screen, etc. Let's give it a go. We'll start the game. That's more like it's sunshine. So, we got a full screen game here. If you look at the outputs of Opus 4. 1 versus the outputs of GE, it's just this is so much better, right? better. And that's what I'm talking about is Gemini 2. 5 Pro was awesome when it got first released. But if you compare these side by side, which one's going to release more dopamine? Which one's more fun to play with? Which one is just actually full screen versus this tiny little screen? Right. So, Opus is absolutely destroying it right here. Karay says API. Yeah. So, you can use the API inside open router or you can just go to claude as well. Maybe we'll build something out in Visual Studio Code as well in a second as well. Why has that disappeared, mate? There we go. And then it's pretty expensive to be fair. Look at that. $15 per million input tokens. $75 per output token. It's like the average Jack Johnson coding on the street these days. VI coding everything. Probably not going to be using this because it's too expensive, right? But if you use it inside the chat, then it's going to be a lot better. AB Dam says, "Incredible crashed Gemini. " That's what we're talking about. And then it says, "GPT5 is dropping tomorrow. I will be waiting for your review. " Not sure about. I've not seen any news on that, but I hope it is dropping tomorrow. That'll be a lot of fun. All right, so just to recap here, Claude Opus 4. 1, super powerful. One-shotted this bad boy. It's obviously a lot better than the Gemini 2. 5 Pro output. You can get it inside the chat and then you're not coding with the API, which is going to be a lot cheaper as well. And yeah, there we go. Karai says, "A fair comparison will be open since it is paid. " All right, mate. So, let's test OpenAI if you want. Let's go to 03. We'll test them here and we'll see which one performs the best. All right. So, we're going to do exactly the same prompt. In fact, now we'll move on to a new prompt because Claude has already completed that and I want them to code side by side. So, let's start a new chat here. We got 03 versus Claude AI. Apparently, that's a fair test. Although, I would say honestly Gemini 2. 5 Pro is better than 03. But if you want to test it, we can. And then what we're going to do here is we're gonna take this one. All right. And then we're going to plug this both into Claude Opus 4. 1. And just bear in mind as well, like you have to be careful this if you start a new chat. Second ago, it actually switched back to Sonet. So just make sure you've always selected Opus 4. 1. Otherwise, you'll be using the wrong thing and wondering why it doesn't work. So we've got 03 over here. We have 4. 1 over here. And let's test out. All right. So we'll give 03 the head start and we'll test this out. Now the first thing that I'm going to say even before this is finished coding out right is that inside claude you actually get artifacts which means that you can embed and see the canvas directly inside here whereas with 03 you don't get the canvas option. Obviously with 40 or something like that inside chat GBT you will but let's test it out. Cas dash says hey Julian do you know when GPT6 is coming out? I need that AI wifey. Shout out to Casra Dash and his AI girlfriend. He's having a great time and but I hope GPT6 comes out. Maybe it will come out before GBT5 cuz we've been waiting a long time already to be fair. Jihad says, "When is it possible? Give me a clue on which AI is good for generating full HTML website with images automatically. " So like actually if you go inside the AR profit boardroom like this, it changes all the time, right? This stuff changes all the time. But if you actually go inside the AR profit boardroom community, what we actually do is we release each week an updated list of tools that we use all the time, right? And we keep this updated every week. So if you want to know, okay, what tools am I using right now? Join the a profit boardroom, go to the welcome post and you'll see all the tools that I'm currently using and then you can use that to figure out, okay, what's the best for you right now? But yeah, inside there I show how to create websites and that sort of thing. GPT5 is a new GPT6 100% right. That's what I heard down the grape vine. And then we've got the code back from chat GBT now. So, Space Invaders on Chat GBT dopamine Space Invaders has coded out way faster there than it has directly on Claude. But we can run the code here. So, let's run the code. I'm just going to mute this tab in case it starts blasting out crazy noises. What just happened? What even is that? Did anyone just see that? So, we can move around. We can shoot stuff, but there's literally no invaders. That's the whole point of the game, mate. It's other Otherwise, you might as well just call it space, not invaders, right? There's no one around to shoot. That game is pretty much unplayable for a chat chitty. All right, now let's have a look at Claude Opus and see who wins. Maybe Claude's messed it up as well. There's only one way to find out, isn't there? So, I'm going to click the button and we'll have a cheeky ganda and see what's popping. All right, that's what we're talking about. That is what we're talking about. Look at that. You want dopamine? There you go. All right. That's way more fun to play than the chat GBT one. It's a bit overkill to be honest. It's a bit mad. But if you want to have a great time and you want sensory overload, then that's probably the best way. All right, so again, you can see Claude Opus absolutely smashed Gemini 2. 5 Pro. We've already proven that. Then someone told me, Julian, you're not making a fair test. You need to compare it against Chachio3. No problem, mate. All right, but when we use Chachi3, let's run that code again. This is what we get back. And you're trying to tell me that's a fair test. So honestly, like Claude Opus 4. 1 is OP. It's OP. I don't think most people are going to use the API just because it's so expensive. But if you want to have a great time building some cool stuff out, then you can use the artifacts and then you can actually just use the chat directly inside Opus 4. 1 on the pro plan inside Claude and I would say that's the most affordable way to do it. Now what we can do from here is I'm just going to grab the artifact that I just created. All right, so we'll copy the link there and then I'll plug that inside the AI profit boardroom. So, just go to the SAP section and then claude opens 4. 1. You'll find it down there. And then if you want to add a little cheeky play on that, there you go. Links right there.
By the way, if you haven't joined this already, what we're doing here is we're doing five live coaching calls per week. We've got NA10 automation master classes. We have learn and interact sessions on Friday, weekly Q& As's. And then we also have Dr. Jeff who's an absolute legend and he's actually a PhD scientist right that teaches AI inside the AR profit bordm right so if you want to get this stuff you can get it and then if you go inside the community you can ask any questions you have plus you get access to all these different training modules right these are all different courses I would recommend sticking to that one if you're a bit of a newbie but if you want for example like my YouTube AI courses if you want to learn how to grow your own AI agency based on how we make six to seven figures with AI automation per year at this point yeah we're getting to seven figures is, but we just started selling it a few months ago. So, yeah, you can get all my stuff inside there. Nothing is held back, my friends. All right, so let's keep going now. App does. Sorry, OpenAI. My young daughter can do better. Shots fired. What a savage. All right, so we're going to go back to the SPS now and we'll grab some more prompts. Test this out. All right, so what I want to see is compared to Claude, what's going to happen? Delvy says, "I'm really glad to see you streaming again, man. I understand why you do AI avatar videos, but combining the two of them is perfect. Thanks so much. That's exactly what we're trying to do is just combine the best of both worlds. All right, so let's test out another game here. What can we build here? We'll build a racer game like this, right? And what I want to do here is just compare Opus Claude Opus 4. 1 versus Claude. All right. So, we're going to select 4. 1 over here. And we'll select Sonet over here. All right. Now, it's not quite as powerful for coding. So, I'm interested to see, okay, is there a visual difference? Is there a tangible difference between these two or can you just get away with coding with Sonic 4 which is a lot cheaper? All right, so we're going to plug that in. Exactly the same prompt. Start them at the same time. See which one performs the best. And slot sport says, "Are you using the think mode? " Now, this is just like the normal mode, right? So, this is not even using the reasoning mode. We can switch to extended thinking as well if we want to. In fact, why don't we do that? that on the next one? Let's change the 4. 1 over here. We'll select the prompt and then we're going to go to extended thinking mode and then we'll hit enter. And Jihad says, "I'll join that profit boardroom. I hope it helps me answer all my wonders around AI abilities, bro. " Absolutely. Yeah, like every question gets answered in there. You can jump on the live coaching course. Like, it's absolutely awesome for this stuff. Some people asking about AI for bot trading. Honestly, I'm not going to be here. That is not my specialty at all. So, I wouldn't say I'm like the best in the world for that to ask to be honest with you. All right. So you can see here it finished a lot faster than Opus 4. 1. Let's test it out. See what we got. Not bad. Yeah, to be honest, like that even that beats what you get back from 03 on chat GPT and that sort of thing. Again, the canvas is just it's just smashing it, isn't it? It's just absolutely smashing it. And then we've got the output from Claude 4. 1 that's still going out right now. Right now, if you
want to, let's say you want to build a website or something like that with it, what you can do is you can go over to Visual Studio Code. I've got a training on this inside the AR profit boardroom and if you go to visual studio code over here then you're going to go to something for example like client and inside client go to settings all right you can select open router if you want or you can use anthropic directly but we'll just select opus 4. 1 here and then you can build whatever you want directly here right so let's say for example okay build a beautiful landing page for an SEO agency for example we can go into plan mode first and then once That's done. Then we can go over to act mode. All right. So it's using the API request here. And there you go. All right. But it is quite slow to code with. It's not going to be fast, but the output's usually going to be better quality, I would expect. So now we have the output from clonet. Not too bad. But let's have a look. Opus 4. 1 using the thinking mode. Extended thinking. Let's see what we got here. To be honest, I would say like Sonic's output is better. There's nothing to dodge here. There's no obstacles, etc. So that's quite interesting really. is sometimes being too clever is not an advantage and to be fair like the output from Sonic was better, right? Just being 100% transparent with you. Let's have a look what we got on the studio code. So, it's planning out the typography and all that sort of thing. So, we're going to switch to act mode. Now, cool thing about this, right, is that you've got an agentic workflow where it's going to plan everything out and as soon as you click act mode, then it's just going to go off into build mode, right? So you can see here now it's using the API request to start building out the HTML and we've got an error which is not great. Oh, that's why cuz we need to switch over inside act mode to 4. 1. There we go. Resume. Hopefully that works. Abd says if GPT5 isn't a banger openi just send all your co-workers to meta. Yeah, something like that. And then Allesio says never use extended think for coding. I would agree with you there. I think the outputs were pretty bad weren't they for extended mode. I think like just using the normal mode is probably good enough. Oliver says, "Are you a fan of Trey IDE? " I don't use it that much just because like quite often you run into limits or it doesn't breaks and that sort of thing. So I don't use it that much. Honestly, I still prefer using Visual Studio Code for coding stuff out. Let's create a file here. Client is taking absolutely ages. Let's switch over to Rico. See, like Rode is slightly different cuz what it does, it gives you a to-do list first. But you can see it just went off and started coding straight away. itself. Someone was asking in the chat which one do I prefer. Today I prefer root code. Honestly, it changes. Like they're constantly releasing updates. It always changes, but today my friend, I'm going with R code. They're both free extensions, right? And the only thing you're paying for here is the API request. And Ken says, "I use Claude code within the terminal of VS Code, too. " Yeah, that's a good shout, too. I really like Claw Code to be, but I don't like the costs. So, it's running through it. Now the other thing you can do just whilst waiting for this if we switch to Opus now you can use the AI inside the artifact that you build right so let's test this out we'll say so essentially what you can do is you can create AI powered apps inside the chat right so let's test this out and this was a new feature that just came out in June is you can create these artifacts like these apps using AI right and then it has the ability to use the AI inside the app that you create so let me show you an example so like you've got a rhythm machine here which is AI powered so if we go inside here and we say build an AI powered AI SEO content writer and we'll use Claude Opus 4. 1. I'm not going to use the extended thinking mode cuz that was pretty average what we got back over here. But essentially you can use the AI inside the artifact that you build like a SAS tool that's AI generated and then you can share that as an artifact. So wait for that to load. In the meantime, let's go back to Visual Studio Code now. See what we got. And Ken says, "Can you try that prompt again but turn off extended think mode to see if it's better. " Yeah. Okay. All right. So let's start a new chat. So this is with extended think mode. We'll duplicate the tab, run a new chat, opus 4. 1, make sure we don't have extended thinking on. So let's run it without and just see if the output is better. All right. So what we're testing here is can it code better without extended thinking mode. So this is extended thinking mode. This is none. And in the meantime, we've got the AI SEO content writer being created over here. SEO landing page as well still being generated. A lot of people saying they're not a fan of the extended thinking mode. I can see why with the One Nation says, "Thank you, Julian. Thanks to you, I'm benefiting from the latest AI innovations and SEO strategies. Wishing you continued success. " Happy. Thanks very much. We are posting more and more SEO content on the channel just because I think it is pretty fascinating. Even if there's less demand for it, I think like it's still an interesting industry and we're still running a lot of experiments in the background. All right, so it's generating the CSS now. It's already done the HTML. I'm assuming it's got to do the JavaScript as well later. It's getting expensive though, I'll tell you that for free. We already spent $160 without seeing anything back so far. So I hope the outputs are good for you. But let's see. So you got the AI powered content generator here. All right. So this is the artifact that we build. So essentially like back in the day you would pay for something like Jasper to create SEO content for you. It's going to get expensive. Blah blah. So let's type in a keyword. You can choose content type of voice. All right. Generate SEO content. And then it's created the content. Should be using AI. Yeah, I think it will be using AI because it wouldn't say in today's digital landscape if it wasn't. Let's just test this out. I just want to make sure it's not using the code instead of creating the outputs with AI. So, let's change this now to SEO agency and see if it just generates something completely different cuz it created the output. So, so we're going to change that and that create a new article. We'll X off that. And then it gives you like the words, the keyword density, readability score, and SEO score. Now, obviously that requires a lot more work. Like I would go back and forth in the chat way more, but you can see that basically you can build your own SAS tool in the space of a couple of minutes using Claude Opus 4. 1 and then you can actually share that with your team or with whoever you want. They're not going to get charged for using it. You're it and everybody's winning, right? So you can create your own SAS tool in like the space of 2 minutes and it's really powerful, right? You can download it, copy the text, reoptimize it, etc. Let's click on optimize. So it will just reoptimize the text right there and there you go. All right, now let's compare the output. So, we got Claude Opus 4. 1 thinking mode versus non-thinking mode. Let's see if this is better. Ah, look at that. That's way better. So, how interesting is that? Opus 4. 1 with extended reasoning. Pretty trash compared to non-reasoning. So, just something to bear in mind there is when you're using it, maybe just turn off the thinking mode. Pretty crazy game though. I mean, what that was a one shot as well. Work perfectly. All right, let's come back to Visual Studio Code here. So, now it's going to create the JS. Allesio says, "I've done a lot of coding tests with Gemini and both with and without extended thinking and the results are better without. It's likely this could also happen with anthropic models as well. " And Davey says, "Julian, is Sonic 4 better than Opus 3? " Yeah, I would say so. So, there we go. And we've got the preview of the page right there, which is pretty nice. It's a nice design right there. Pretty cool the way it's done, etc. What we can also do is if we open this up, so let's open that up. Then we'll go to Google Chrome. Grab that. And there's our website. And that was a one shot. Looks pretty nice, right? So, just to be clear here, Opus 4. 1, I've shown you how to create SAS tools. They're AI powered. You can share them with people. Claude Opus 4. 1 outperforms chat 03 as well as Gemini 2. 5 Pro. I've demonstrated the differences between thinking mode and non-thinking mode. And honestly, I would say non-thinking mode is better by a country mile. And then finally, what I've shown you is how to use the API directly inside Visual Studio Code to create something beautiful like you can see right here. This looks really nice, right? Like this super nice design. We just one-shoted that bad boy. And you could launch that, right? You could customize it to your brand and then launch it. Much better than the previous outputs I've seen on Visual Studio Code. To be fair, expensive to use the API, but if you want the best, you got to pay
for the best, don't you? So, thanks so much for watching. If you want to get access to all the video notes and prompts from today along with all of my best training and coaching on AI, feel free to get that inside the AI profit boardroom. We lally show all the automations that I use for my business including like avatar videos, email content automation, social media automation, a AI agents and workflows with tons of NA10 templates. And additionally, we have a YouTube AI roadmap. So, how you can grow your business with YouTube along with an agency course that shows you how to grow your own AI agency and also land more clients. Additionally, inside here, you can post inside the community. Ask any questions that you have and we always get back to you. Along with that, we do five live coaching calls a week. Now, the prices going up soon, so make sure you sign up now before you miss out. And inside here as well, you get like weekly Q& A, NA10 automation master classes. Everything is recorded as well, so you can watch it back. And yeah, feel free to get that inside the AI profit board. If you're like, "All right, Julian, I like everything you're showing, but I just don't have the time to implement it myself. " Then you can book in an AI automation session completely free. And basically on that call, we're going to discuss you becoming the client, right? So on that call, you're going to book it in. We're going to look at where you're spending your time as a business and then also what you can auto and how we can implement that for you. So, if you just want to pay someone else to implement all this stuff and you think, "Let's work with Julian Gold's agency," then feel free to book in an AI automation session, link in the comments description. And I appreciate you watching. As always, I just want to give a massive shout out to everyone who posted a comment and a question inside the chat today. That was awesome. And if you want to jump on the live streams, maybe you missed this one, then just join the AI Profit Boardroom and then go to the calendar and we do daily live streams and you can see the schedule right in there. All right, appreciate you watching. See you on next one. Cheers.