tested it live and the code just didn't function. But when I ran the exact same prompt on Claude Opus 4.1, it worked perfectly. It created a beautiful, playable game right away. Then I tried Sam Altman's example, where he said to ask GPT-5 to use Beatbot to make a sick beat to celebrate GPT-5. This time GPT-5...best at everything. In my head-to-head test, Claude Opus 4.1 still won on several tasks. For content creation, I'm sticking with Claude; it feels more natural and human. For coding, Opus 4.1 is definitely on the same level, sometimes better. There were certain tasks that GPT-5 couldn't do that Opus 4.1 handled perfectly
Cursor is $20 a month; this is $10, and on top of that you're getting unlimited access to GPT-4o and Claude Sonnet, which is, in my opinion, the most powerful coding AI model right now. That is pretty amazing, so you're doing it at half the price. I've been using Cursor to build
test results matter for you. Whether you're running an agency, building SaaS products, or just want to create better digital experiences for your customers, Claude Opus 4.1 isn't just writing code. It's thinking like a developer. It understands architecture. It knows how to make components work together. It can build complex interactive experiences that would take...these strategies. Remember, Julian Goldie reads every comment, so make sure you share your thoughts below. What impressed you most about these Claude Opus 4.1 test results? What are you planning to build with these AI coding tools? And if you found this valuable, smash that subscribe button, because I'm testing new AI tools every week and sharing
break a sweat. The benchmarks are insane, too. On the AIME 2024 math competition, MiniMax M1 scored 86%, beating OpenAI's o3 model, Claude 4, and Gemini 2.5 Pro. In coding tests, it scored 65% on LiveCodeBench. And in software engineering tasks, it hit 56% on SWE-bench. These aren't just good scores. These are crushing
piece of advice, and one that I consider pretty valuable, is the explanatory output style. Basically, inside of the Claude Code config, there's the ability to enable an explanatory or learning output style. That's where, every time Claude does something, it will give you some sort of reason or justification why. Additionally, you can ask Claude...your work. That's to be understood. Just like the calculator removed our need to do manual mental math, so too has Claude removed our need to do smaller little coding operations, and so on and so forth. Alternatively, you can also go directly to whatever Claude instance just built you something and say, create me a simple
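(For reference, here's a minimal sketch of how that's typically enabled. The slash command and settings file reflect Claude Code's documented output-style feature, but treat the exact details as assumptions rather than something shown in the video.)

```bash
# Start Claude Code in your project directory (assumes it's installed):
claude

# Then, at the Claude Code prompt, switch styles with the slash command:
#   /output-style explanatory   # Claude explains the reasoning behind each action
#   /output-style learning      # collaborative mode that hands small tasks back to you
# The chosen style is saved to the project's .claude/settings.local.json.
```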
going to click on Terminal, then click on New Terminal. Then from here, just click on the command line and type in claude, as long as you've got Claude Code installed. And you can actually use Claude Code directly inside Antigravity here, right? So you could say, okay, use teams to improve this website. Use teams to make this
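(In concrete terms, the flow being described is roughly this; the npm package name is Claude Code's published install path, and the prompt is just the example from the video.)

```bash
# One-time install of the Claude Code CLI (assumes Node.js/npm is available):
npm install -g @anthropic-ai/claude-code

# In Antigravity (or any VS Code-style editor): Terminal > New Terminal, then:
claude

# At the prompt, give it instructions in plain English, e.g.:
#   > use teams to improve this website
```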
Microsoft, who just introduced Copilot Cowork, which is a new Microsoft 365 feature built directly on Anthropic's Claude system. If you're not familiar, Claude Code was very cool, but far too technical and complicated for most people. Then Anthropic made Claude Cowork as the next step, and it was more useful for more people
launched. The big one is Claude Haiku 4.5. A few weeks ago, we launched Claude Sonnet 4.5, which became the world's best coding model, and just last week we followed it up with Claude Haiku 4.5. And here's the really remarkable thing: five months ago, Sonnet 4 was state
your website. So, for example, for me, I could set up a website right here, and we could put this in the code section, for example, and just add a new folder right there that Claude Cowork can run from. Right? So, let me do that. We're going to go to code. Then, we're going...game is fun, dopamine-inducing, crazy colors, etc. Then we're going to hit let's go. And what you can see here is that Claude Cowork is going to begin to code that out and build it out. Right, whilst we're waiting for that to load, someone asks, is there an offer on right
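(If you're following along outside the Cowork UI, the setup step amounts to giving Cowork an empty working directory; the folder path below is just an illustration, not something from the video.)

```bash
# Create a dedicated folder for Claude Cowork to run from
# (the path is an arbitrary example):
mkdir -p ~/projects/cowork-site
```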
customers, I'm here to help you get the latest AI updates. Here's how it works. You open VS Code, install the Kilo Code extension, connect it to an AI model like Claude or GPT-4, then you start chatting with it. You can say, "Build me a to-do app with a database and user authentication." Kilo Code will
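(A rough sketch of that setup from the command line; the extension ID is an assumption, and searching "Kilo Code" in the Extensions panel works just as well.)

```bash
# Install the Kilo Code extension (extension ID assumed, not verified):
code --install-extension kilocode.kilo-code

# Then open the Kilo Code panel, connect it to a model (Claude, GPT-4, ...),
# and describe what you want in plain English, e.g.:
#   "Build me a to-do app with a database and user authentication."
```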
properly, so you can take advantage of all the advanced features. One thing people keep asking is how this compares to other models like Claude, GPT, and Gemini. Here's the truth: for coding tasks, especially visual coding tasks, Kimi K2.5 is right up there with the best of them. And in some cases, it's actually better
Japanese market. The content feels quite human, to be fair. It's not that bad. Not as bad as I was expecting. I would still use Claude, obviously, but for writing content it's not bad at all, and really fast to respond. Let's have a look through the white paper as well and see if there's anything interesting here...probably go with, I would say, I think Gemini is pretty good. Gemini is probably the best, or you can use Claude Opus as well. Claude Opus is very powerful, but if you're coding with it using the API, then it's going to cost a fair amount. It's quite expensive to use Opus. So let's test
around 70.9% on reasoning tasks. Gemini 3 is at 53.3%. Claude 4.5 is at 59.6%. That's a massive gap. For coding, garlic scored 94.2%. Gemini got 89.1%. Claude hit 91.5%. Again, garlic is ahead. Now, I know what you're thinking: benchmarks don't always match real-world
programming tasks. And the early feedback is incredible. According to reports from The Information, GPT-5 shows significantly improved programming skills, especially with complex, realistic code in large software projects. Testers say it outperforms Claude 4 Sonnet in head-to-head comparisons. But it's not just coding. GPT-5 shows higher performance in scientific disciplines such as mathematics, physics, and technical tasks
going to create the HTML for that. So we've got two different windows coding over there. In the meantime, do you know what? Let's start building out with Antigravity. So this is Google Antigravity, where you can get free access to using Claude Opus 4.5. So if we click on a new window over here...actually work, which is great. And it's looking pretty clean and nice, right? It's written all the copy, designed it, hosted it, coded it locally, and we've done that all using Claude Opus 4.5 and the terminal, right? And I don't code. I can't code. I don't know what I'm doing
calling is how AI actually uses tools and gets things done. On reasoning benchmarks, it's competing with the best models from OpenAI and Google. On coding benchmarks, it's showing performance that rivals Claude Sonnet 4. But here's what's really exciting about this. The team behind it used something called hybrid reasoning. The model can switch between...that thinks more like a human expert. Now, let me show you some specific performance comparisons. Against Claude Opus 4, GLM-4.5 performs better on agentic tasks. Against GPT-4.1, it performs better on several coding benchmarks. Against Gemini 2.5 Pro, it's competitive across most metrics. And remember, this is an open-source model