Why Everyone’s Freaking Out About Claude 4 (With Examples)
25:08

Why Everyone’s Freaking Out About Claude 4 (With Examples)

The AI Advantage 22.05.2025 113 182 просмотров 2 808 лайков обн. 18.02.2026
Поделиться Telegram VK Бот
Транскрипт Скачать .md
Анализ с AI
Описание видео
Try out the new model capabilities through Claude.ai & Claude Code on the Max plan: http://clau.de/aiadvantage Links: https://www.anthropic.com/news/claude-4 https://www.anthropic.com/news/agent-capabilities-api Examples From The Video: https://claude.ai/public/artifacts/de510a29-6d65-48c9-ac1f-29060cb303e5 https://claude.ai/public/artifacts/b080bd1d-52c0-4526-86ac-df25f6e247a5 https://claude.ai/public/artifacts/d5403a7b-f6b2-4c37-b568-dfebd81d9750 Chapters: 0:00 Overview 5:30 Writing 9:14 Agentic Upgrades 17:19 Example Apps #claude #claudeopus4 #ClaudePartner Free AI Resources: 🔑 Get My Free ChatGPT Templates: https://myaiadvantage.com/newsletter 🌟 Receive Tailored AI Prompts + Workflows: https://v82nacfupwr.typeform.com/to/cINgYlm0 👑 Explore Curated AI Tool Rankings: https://community.myaiadvantage.com/c/ai-app-ranking/ 💼 AI Advantage LinkedIn: https://www.linkedin.com/company/the-ai-advantage 🧑‍💻 Igor's Personal LinkedIn: https://www.linkedin.com/in/igorpogany/ 🐦 Twitter: https://x.com/IgorPogany 📸 Instagram: https://www.instagram.com/ai.advantage/ Premium Options: 🎓 Join the AI Advantage Courses + Community: https://myaiadvantage.com/community 🛒 Discover Work Focused Presets in the Shop: https://shop.myaiadvantage.com/

Оглавление (4 сегментов)

  1. 0:00 Overview 995 сл.
  2. 5:30 Writing 743 сл.
  3. 9:14 Agentic Upgrades 1526 сл.
  4. 17:19 Example Apps 1554 сл.
0:00

Overview

Okay, so we just got the brand new Claude models from Enthropic. The internet is going crazy. Concretely, we're talking about Claude for Opus and Claude for Sonnet. And I think the hype right now is actually justified because we're going to be looking at various ways to use this thing and how it performs in comparison to some of the competition in this very video. But let me just tell you, writing style unmatched. Coding ability, well, it matches the demos that we just saw on Google IO. even with the web interface. But not just that, it goes further. It builds these things that are way more complex in one shot, no problem. It's smarter. It performs better on benchmarks. It just looks very solid overall. But the real question isn't if the benchmarks are good. The real question is, is this thing actually any good in practice? Should this become your new daily driver? The short answer to that is probably. And at the very least, you should consider and test this thoroughly yourself because this thing goes above and beyond what we've seen from all other models on things like writing, coding, context retention. We'll talk about the developer side of it too, but mainly we'll be focusing on examples for the web interface here. And before we get into it, which I'm very excited about, and this is going to be grounded in multiple examples and experiences with these models, and we're going to talk about this as we do with new launches, but I want to point out one thing, which is Anthropic actually gave me access to this a few days in advance before this launch. And that also makes this video actually sponsored by Antropic. But I told them straight up like, I'm only doing these videos if I can kind of give my opinion. And if the opinion would not uh confirm with what they want, then too bad. Like, we're not going to be able to do this because I got to keep these reviews honest and grounded in examples. So, I hope that this video is going to show you various examples and use cases of this because they're seriously impressive. But first, we got to go over what has been released here. As I said, um, this has been shipped all across their platforms, the two models that I mentioned. So, Claude for Opus, Claude for Sonnet. If you look at the web platform, Claude. ai, then you'll see that immediately these are available to you if you're on one of the paid plans. Right now, I'm on the Max plan over here. If you use clot code, you can just uh type slashmodel and switch to opus right here. So that's available too and they're available through the API today, which is amazing. No fuss. Rollout went super uh super smoothly. Okay, so what's new? Well, this part is actually pretty quick. It's really the model performance that is the most impressive thing here. But essentially, they released these new models. There's some examples here, but mostly it performs better on coding. It performs better on writing. They make all of these claims, but they hold. Okay, so first up, I want to look at one benchmark that they lead with here. This is the um SWE bench. These are practical software engineering examples from the real world. And this is just one thing that all of these new models, new frontier models that can think measure themselves on. Okay, here are basically different coding problems that coders would um face in the real world. Okay, so what does this look like? Well, they're the best in the world on them just straight up. The Sonnet model actually being slightly better, which is interesting because it's way cheaper. But basically, they just smash these benchmarks completely. For reference, I want to pull up um where we were six months ago with 01. Remember the first thinking model out of OpenAI and what was it? November 2024, I believe. Well, if you pull these up side by side, you will quickly realize how far we've come in such a short amount of time. Back then, cracking 30 to 40% of these problems was considered groundbreaking. Like, incredible that it can do most of these tasks that software engineers uh face. Well, right now we're looking at 72 up to 80% um with these caveats here. So, incredible. Okay, it beats 03 and it's just the best model available on the market today. Now, at this point, I do have to point out that um we Google IO presented the new um you know, deep search model that is not available as of today. And on some of the benchmarks, they didn't even release anything. They were very selective with what they showed. Um, and we don't even have direct comparisons, but that might be a good competitor to this, as might be future releases, but who knows what the future holds. Right now, this smashes, but as you know, it's not even about the benchmarks. It's really about how it performs in practice. feels. It's really about the vibes, as the people say. Okay, so I prepared a few examples here in my early testing and reran them on the platform today just to make sure that you know some of these things hold and that the final production models that are shipped now kind of hold up to what my experience previously with them was. And let me tell you, this thing performs like a charm. So I want to go into some examples, then we're going to talk some more about developer tooling, and then we're going to talk um about other examples. Okay, but first up, I just want to talk about tone because this is one of the things that I personally care the most about actually writing with
5:30

Writing

these models. Not writing code, but just writing text. So, if I tab over here and I go over to the most basic prompt on planet Earth that I often run, write me an email to my boss about the broken coffee machine. Okay, if you run that in here and you read this email, let's just take a second to appreciate how nonaiish this sounds. And you might be familiar with the fact that Claude previously already has been kind of crowned by many the king of tone. Many people use Claude just because they like the tone so much. Okay, Opus 4 kind of trumps that, I would say. Like honestly, like, okay, just have a look. This does not sound like AI, but you be the judge. That's just my opinion right here. So, look at that. Hi, boss. I wanted to bring your attention um to your attention that the office coffee machine stopped working this morning. It's not powering on at all. I've checked that it's properly plugged in and try different outlets. It's like a human would speak no since many of us rely on it. Blah blah blah. Would you like me to? And then three options. Contact the manufacturer about the warranty repair options. Research replacements. Call a local appliance repair service. In the meantime, I've set up the back office. It just reads like a human. There's none no weird words. It's just, hey, this is the problem. I tried a solution. Here's a few suggested solutions. This just reads like a human. There's no weirdness with it. Okay. And I ran many examples. It just like con and this is not even like prompting for any sort of style. It's just the default model. It's way more human, which is wonderful. Um, sonnet 4 note is not as good as for Opus 4 at actually this writing tone. This is a unique ability with Opus 4. Okay, what else? I have this little prompt that I actually use with a fine-tuned model that I have that I fine-tuned on a lot of my video scripts. And it just gives it a little bit of, you know, context for how it should behave, although it's already trained on many examples. And right here, I just want to look at the first two sentences and then we can wrap up this section on style. You can see that it sounds great in my opinion. Okay, so this is a YouTube video intro for the best and worst free AI art generators. And then it just starts out. You know that feeling when you have this incredible image in your head, but you can't draw it to save your life? That was me six months ago staring at a blank canvas app, frustrated as hell, wondering why my stick figures looked more like abstract disasters than actual people. Then I discovered AI art generators. I don't know about you, but to me, this reads like a human. This just sounds natural. Like I can't help myself. I pulled up uh the comparison here with GPT4. 5, which I also used to really like for writing. But just looking at this side by side, I'm going to have a very hard time um going back from the Opus 4 model for writing because look at this. You know, when I first discovered AI art, I generally thought I'd found magic. Imagine taking your wildest ideas and seeing them come to life right before your eyes. Nobody speaks like that. I mean, and this is the best model they have for writing. It is really good, but still it has But look at that. Um, some of them gave me stunning images that left me speechless. Others, well, they made me question if I was even speaking the right language. And then, uh, look at that. So, today we're diving deep into the absolute best and just diving deep. Again, really, this just doesn't do that. It sounds natural and it sounds great without even any specialty prompting. I think this is really this is kind of really shifting what is perceived as AI text. And yeah, if you write with AI, very strongly recommend you give this a shot at least. Okay, what else we got? Well, there's a bunch of changes in the API and to what it can actually do with code. Okay, obviously I
9:14

Agentic Upgrades

mean it led with the software engineering benchmark as kind of one of the big selling points. And even if you're a nondev, this really matters because the ability to create little applications, little scripts, just little functional tools to help you is severely underrated by most people. And the reason why most people that even would have the curiosity and the technical or like the computer usability to kind of make these things happen has been severely hindered by the incompetence of many of these models. That's just the truth. You heard about vibe coning. You tried to create something, but then it creates it, but it doesn't really work. Like there's bugs in there and the button doesn't work and you kind of have to tinker with it. And sure, you can make it eventually work and you can make kind of a bad version of something that you would need to pay $5 a month for or $10 a month for, but that's just not worth it for most people. I think here for the first time even beyond 03 even beyond de Gemini 2. 5 Pro I just have this feeling that changed and I have a few examples here. I didn't run these examples multiple times. Okay, these examples I just run once and they just work exactly as expected. Okay, little games, little dashboards. We'll look at these in a second. Before we do that, I just gonna want to go over this blog post very briefly to show you what changed here. Because what they essentially shipped is they shipped a bunch of developer tools. So, not just that the model is really good for developers as we talked about here and it's really good at Pokemon now and it's available in clawed code and all that which is a completely different beast. Um, we can talk about that later on. But in the API, they shipped Claude, Opus 4, and Sonnet 4 with all of these different tools like data analysis. So it cannot just write code, it can also execute code through the API. And these are the things that made O3 so powerful. If you're not familiar, OpenAI shipped 03 a while back and it just blew people away because it had all of these tools. It could write code, execute code, it could um it could can generate images now, which is actually not that relevant. That's something you don't have in here. But the most essential thing is that it can actually like solve these math problems. It can do data analysis by itself when it considers it worth it. Okay, these the thinking models. And now I'm going to get to my favorite part. They changed two things in here. Okay, so in cloud code, which is essentially a command line interface for um claude, so you run it in your terminal like this, they change the length that these tasks can run at. Okay. And this video is not focused on that, but I ran a few projects and try to build a few things. Before it was like 1 to five minutes. Now it routinely runs for like 15, 20 minutes. And in the keynote today, they stated it can run for up to seven hours. through the API if you let it. Okay, so we literally went from chat GPT two years ago solving problems that might take you I don't know 10 minutes to do by hand and it did it instantly, right? Like in maybe like 10 seconds. Amazing. Then we moved to these thinking models which sort of fought for a minute or two and solved harder problems. They could do math. They could code things that were unthinkable before. All of a sudden on s uh swb bench they were performing really highly and it was like wow these are becoming really useful for real world tasks and real world problems and hard problems not just for inspiration or as an assistant or a co-pilot and they could save you tens of minutes. Then deep research came around and like that was the mind-blow. If you follow the channel, I praise the deep research functionalities across all apps cuz that was like hours saved. In some cases, maybe even dozens of hours depending on your tech literacy skills. But with this, if the agent runs for seven damn hours, that's not saving you hours. That's saving you dozens, maybe even hundreds of hours. And there was this interesting point in the keynote which I do recommend you check out from Enthropic where the CEO was asked, "Hey, what do you think? When is the first $1 billion company going to be built by just one person? " And he actually said 2026. Now, if that holds, I don't know. But with agents running for seven hours in May 2025 already, nobody would have predicted that. Nobody thought that we're going to get scores like this early. Okay? Nobody thought that it would be creating plans on how to solve Pokemon and these complex problems and have like a getting unstuck protocol that it just self-codes and then follows. Nobody thought that we could like initiate these agents in cloud code, let them run for multiple hours while it problem solves everything it stumbles into by itself. It needs to run code. Cool. It does that. It needs to search the web. Okay, it can do that. it needs access to the YouTube API or to your Slack channels. Well, that's what model context protocol is there for MCP, the kind of universal connector that has now been by the way, um, if you're not familiar, it's kind of like the HTTP protocol was for the internet. This is the same thing for agents. So, you can just connect anything into it. And it has now been adopted by pretty much everybody. I mean, if you're not aware, Google adopted it this week at IO. OpenAI adopted it this week, too, into their APIs, too. It's kind of everywhere. Microsoft announced that it's going to be built into Windows MCP. So, it's kind of like universal. So, you can plug all of these other tools into it. Now, the API has um web search, it has um prompt caching, which we'll talk about next here. All these MCP servers, it can execute code and more. Okay. So, last thing we got to talk about here in the context of this running forever and doing amazing stuff is prompt caching. So, if you're not familiar, this is essentially a technique to save money. So, when you run the API and you run it for a while, that's the reason that this hasn't been able to work. Like, most agents weren't working for like 50 minutes. They were working for like 3 minutes and then they kind of ended or for 10 minutes in some cases, right? Or a bit more. But the problem is it starts getting really expensive because you need to start passing the entire history of the interaction back to the agent, right? So it needs to remember what it did 40 minutes ago if it's going to work for 40 minutes and then that just starts stacking up. The list starts getting longer, right? Prompt caching what it does is you give it a specific area of context or a specific amount of context and it remembers it once and then it lives in the cache of the agent and you don't have to keep paying for those tokens. So it's a way to save money and a way to enable longer term workflows. and they extended prompt caching from up until now. Up until now it was five minutes and now it's up to one hour with several other improvements. Insane. Like these agents will be just running everywhere now. This is available through the API today. Okay, so people are going to be building this into stuff. So punchline before we look into some of these examples here is that it writes really well. It codes really well. But mainly these agents can not just run for minutes as they have. And by the way, if you're not familiar, all of these vibe coding apps like Cursor and Lovable and all of these others, they're built on Claude's models. They're just Claude wrappers with a bunch of logic and fine-tunes and you know, proprietary data and stuff in the background, but they run on these models. And now the models got massive upgrades and capabilities and you can run them for cheap, not for five minutes, but for an hour. Gamechanging things really. And we'll see the repercussions of this over time. To give you a more concrete feeling, I want to round out this video by actually showing you a few examples. Okay, so we talked about some of these uh things like the writing style and that's really great, but really I want to focus on kind of the coding abilities here that you can immediately experience in the web
17:19

Example Apps

interface by just prompting something and then seeing it come to life and the artifacts here. So this is this I ran this for comparison. I'm going to show you the prompts. Okay, they're super simplistic. So, anybody can recreate these. And also, it's also kind of hard to communicate like these just super long prompts in here. And it doesn't even make that big of a difference because these models are just so damn good at figuring out what you might mean and creating like to-do lists in the background while it thinks that you can kind of just do simple prompts like this. Okay, so I prompted for essentially the same thing that I tested in the Google that was shown off in the Google IO video and that I also tested in O3. Of didn't do that well with their canvas. This does it super well. Plus, it actually animates the different uh planets, which is amazing. So, can actually go here. I can click Saturn. Um, and it should be able Yeah, it shows you the details and you can speed it up and whatever. Nice. Simple little web app. I think I followed up one time. Did I? Yeah. Show all the planets at once. Um, that's the only follow-up prompt I had to do to make it better. before it looked like this uh with just one planet at once and in the first version I didn't tell it to be 3D so it just looked like this but yeah about I don't know 20 words and you get to this okay next up I built a game okay and you might be saying Eigor you know like game who's who needs a simple you know stupid game I think they make great examples actually because they're quite complex there's a lot of logic there and they're just great to show off so let's do exactly that I started build a simple 3D RPG. Okay, boom. I got this. All of a sudden, you could move around. You could attack and stuff. All right, let's move on. Add enemies, combat, and a golden shovel as the weapon. I don't know why golden shovel. Maybe that's a little, you know, like we're in the golden era of AI and like these tools are the picks and shovels. So maybe that's what that means, but I don't know. I just kind of came up with that. So it works, right? Not bad. Now, how can we make this better? Must be really complicated, right? Make it so that Q and E make me look around and make it more aesthetic. Wow, I'm such a gifted coder here. Make it more aesthetic. And it came up with this. Nice. And now I can spin around and yeah, looking good. Okay, but we can do better than that. No, make it brighter and add more decoration. Okay, it fought for a bunch. I'm doing all of this with plot for Opus, by the way, with extended thinking enabled. And yeah, in here, that's exactly what it did. Excellent. Looking good to me. There's more decoration. Look at those chandeliers. We have a simple little game, but we can do better than that than that. Set swing shovel to R. Okay, that was I just wanted everything on the keyboard. So, I did that. Let me extend it a little more. Can I do that? Okay, great. And now, let's actually play the game. Let's run around. These enemies are getting the best of me. I can kind of like see that all works. And here's the thing. There's no quirks. Defeated. Okay, I need to refresh for it to work. Like, sure. Could this ball be a little higher up? Yes. But like, it all works. The experience, the quality, uh, the vitality, the enemies, everything just works as expected. expected, which when I first tried this, I was like, that's a little sus. I'm not used to that. I'm used to these things coming up and you having to bug fix for 20 minutes before you even get a working version. That's just the case with every other model. Not with this one. It kind of just works. Um, and I think, yeah, make the enemies into tacos. So, I ran this like a minute before hitting record. So, let's have a look at actually did that. Okay, I literally did not check this. Ah, okay. Yes, we have the taco enemies. Look at that. Oh, tactical number one. They even have eyes. I kind of love it. But you might be saying, "Okay, Eigor, you know, tacos, golden shovels. I'm not sure that's really that practical. " Okay, how about something actually convenient? something simple that you might actually need like a finance tracker. Okay, these dashboards were some of the first things people have been building with claw 3. 5 sonnet, which even 3. 5 sonnet and 3. 7 sonnet were already some of the favorites. But now when it actually builds a dashboard, the damn thing works. And I'm not just saying it kind of works or most of it works. All of it works. Okay, I just said build an interactive finance dashboard for a student that is intuitive yet functional. include logging, cash flow management, and budgeting functionality. Super simple prompt. Lot of room to interpret. That's what these models are really good at. I just give it the goal and that's it. By the way, that's that would be my prompting tip. We talked about this before. If I reset the site, okay, I have login management um with the login info. Here I sign in. Oops, I must have mistyped demo 123. Perfect. We're in. If you built these apps before, you will know that there's just quirky stuff like usually the graphs were a bit off, you know, like the green line kind of like ended out here or the dollar sign wasn't in place or it look something looked off. Show me on here what looks off. And we could rerun this 10 times. It just works. That's what I'm saying. So, you can add transactions. Okay. example whatever amount 50 date look at that the entire interface there's not a single quirk here and that's just something that me using these tools so much I'm not used to I'm surpris I'm kind of like weirded out by it to be honest like how does this just how does it work like why does it all work this is not supposed to work I'm supposed to like follow up with another 10 prompts and bug fix all the stuff and then it's going to get to the state I just one-shoted To be fair, I twoshoted it. I gave it this message and then I told it here, add an extra tab for the budget and make it editable and add visualizations on the homepage with mock data. It did that. So, I have my budgeting tab over here. I could publish this to the web if I wanted to share this with you. I will actually do that. I will publish these put the links into the description below if you kind of want to, you know, stress test this yourself, see if it works. But yeah, essentially I can change my monthly budget, add a zero, that's the good life. And then um let's see. Entertainment, transport, all of that just adds up. Just works. Okay, so to round things out, I built this little application recently. There's going to be a tutorial on the channel soon, like a step-by-step thing, including like a GitHub tutorial and stuff. The point is, I was building this application purely HTML and it was, you'll hear about it soon. It's very cool. It takes your voice, you can prompt on top of it. It uses the transcription API. And for the life of me, I haven't been able to turn that thing into a Chrome extension. It just worked as a web app and I really wanted as a Chrome extension. Guess what? Claude Opus oneshotted the thing. I just told it turn it into a Chrome extension. It did it. It works. I don't understand exactly how or why, but the results are there. So, this is impressive. You know, if you've been following the channel, it's been a while since I really made a video that is like, wow, this like this really changes like everything. Not just in the software industry, but al also for all the apps, the subscriptions app you have. I mean, we could talk about this forever and we will, but you can think of the various like online services like Netflix just spawn these dashboards when you need them on demand. Check it out. Thank you for Claude for giving me early access to this so I could play around a little more than usual with this. And yeah, go create stuff, go write, go code, go build apps, have fun. And yeah, I hope you'll have the same experience as me where it will surprise you by how reliable it is. That's all I got for today. I hope you have a wonderful day.

Ещё от The AI Advantage

Ctrl+V

Экстракт Знаний в Telegram

Транскрипты, идеи, методички — всё самое полезное из лучших YouTube-каналов.

Подписаться