Is Sonnet 4.5 All Hype? (Industry Reacts)

12:06

Ray Amjad · 03.10.2025 · 3,518 views · 67 likes · updated 18.02.2026

Video description

Level up with my Claude Code Masterclass 👉 https://www.masterclaudecode.com/
Learn the AI I'm learning with my newsletter 👉 https://newsletter.rayamjad.com/
Got any questions? DM me on Instagram 👉 https://www.instagram.com/theramjad/
🎙️ Sign up to the HyperWhisper Windows Waitlist 👉 https://forms.gle/yCuqmEUrfKKnd6sN7

Since I've never accepted a sponsor, my videos are made possible by...

—— MY CLASSES ——
🚀 Claude Code Masterclass: https://www.masterclaudecode.com/?utm_source=youtube&utm_campaign=_O9m7ALrF5A - Use coupon code YEAR2026 for 35% off

—— MY APPS ——
🎙️ HyperWhisper, write 5x faster with your voice: https://www.hyperwhisper.com/?utm_source=youtube&utm_campaign=_O9m7ALrF5A - Use coupon code YEAR2026 for 35% off
📲 Tensor AI: Never Miss the AI News
- on iOS: https://apps.apple.com/us/app/ai-news-tensor-ai/id6746403746
- on Android: https://play.google.com/store/apps/details?id=app.tensorai.tensorai
- 100% FREE
📹 VidTempla, Manage YouTube Descriptions at Scale: http://vidtempla.com/?utm_source=youtube&utm_campaign=_O9m7ALrF5A
💬 AgentStack, AI agents for customer support and sales: https://www.agentstack.build/?utm_source=youtube&utm_campaign=_O9m7ALrF5A - Request private beta by emailing r@rayamjad.com

————— CONNECT WITH ME
🐦 X: https://x.com/@theramjad
👥 LinkedIn: https://www.linkedin.com/in/rayamjad/
📸 Instagram: https://www.instagram.com/theramjad/
🌍 My website/blog: https://www.rayamjad.com/

————— Links (in order):
- https://x.com/claudeai/status/1972706807345725773
- https://x.com/sam_paech/status/1973203851458256995?s=12
- https://x.com/_abhaysinghal/status/1973476034864631916?s=12
- https://x.com/rayfernando1337/status/1973062312664895800?s=12
- https://x.com/slow_developer/status/1973406298352881720?s=12
- https://x.com/deedydas/status/1973574408599200146?s=12
- https://x.com/scaling01/status/1972728819409895649?s=12
- https://cognition.ai/blog/devin-sonnet-4-5-lessons-and-challenges#the-model-is-aware-of-its-context-window
- https://x.com/SevDeutsch/status/1973782487777374654
- https://x.com/finbarrtimbers/status/1973922679418974298

Timestamps:
00:00 - Intro
00:21 - Task 1
02:12 - Design Taste
02:54 - Task 2
05:40 - Sycophancy
06:15 - Writing Evals
06:52 - Droid's Testing
07:14 - Complaints
07:40 - Cognition Labs' Thoughts
09:40 - More Tweets
10:19 - My Conclusion

Table of contents (11 segments)

Intro

All right, so a couple of days ago, Claude Sonnet 4.5 came out, and it claimed to be the best coding model in the world. A bunch of people have tried it: some thought it was really great, others thought it was a bunch of crap, and there have been a bunch of mixed reviews in the middle. I'll be sharing my own thoughts about Claude Sonnet 4.5, having used it in Claude Code for about 20 to 30 hours, and also sharing the experiences of other people online. So the first thing I got it to work on is my application, HyperWhisper. There's a coupon

Task 1

code down below if you are interested. I got it to add a cloud version. So if you go to settings and go to credits, you can top up your account with credits, and you only pay for the number of minutes you actually transcribe. Claude Sonnet 4.5 made this UI over here, and it also made the Cloudflare Workers for the application. You don't have to use HyperWhisper Cloud, because you can use all these other providers over here with your own API keys, or use local models from this list, so you're not using the cloud version. Basically, it made the Cloudflare Workers endpoint. As you can see over here, the requests are forwarded to Grok. I shipped this feature to production about two days ago and some people have started using it already, but most people are still using their own API keys.

Anyways, I thought it did a pretty good job at planning this and also at coming up with a design for this credits page. It also came up with a design for the page where you buy credits to top up your account. And that's one of the strong suits of Claude Sonnet 4.5, I found. Claude models are generally better at design and better at preserving design throughout the entire application. Whenever I use GPT-5 high in Codex CLI, it just kind of slams in a random design that doesn't really feel like it belongs in the application, whereas this design definitely does feel like it belongs. I did have to iterate on the design a tiny bit, because the very first draft that it came up with was this over here, but now it looks much better after about two iterations.

Hey, so as a short aside, if you're more interested in making money from vibe coding and vibe marketing, then you can learn how to do that in my AI startup school over here. A bunch of people have already joined, as you can see right over here, and made a bunch of money from their own applications. There are a bunch of classes on vibe coding, vibe marketing, coming up with ideas, and distributing with AI, a bunch of case studies as well, and classes on using Claude Code and Codex CLI. I've covered every single feature in them. And of course, as you can see over here, there are a lot of posts in the community where people are sharing knowledge with each other. And there will be a link down below if you are interested in

Design Taste

joining it. A couple of people online have also said Claude Sonnet 4.5 is so much better at designing UIs, as you can see over here. Some people had similar experiences and other people had different experiences. And this is Claude Sonnet 4.5 when it comes to making vector images. So this is a lone tree in a desert, and this is the New York City skyline. I thought this part was pretty impressive, with the sun rays coming out from behind the building over here. It seems to be really good at understanding visually how things should be pieced together, because SVGs are vector images and it's basically writing the code for the vectors to create this image. And then you can see some other models over here that are good at coding, but not so good at design, right over here with their own SVGs. I also got Claude Sonnet 4.5 to make something much more

Task 2

complicated for my application MindDeck. There's a coupon code down below if you are interested as well. It basically made a cloud storage version where you can back up your files to the cloud: everything is encrypted locally, then backed up, and it goes to Cloudflare R2. As you can see over here, I can go to any of the files, like chats, and everything is backed up and encrypted. This was a pretty huge task because it had to make its own sync engine, taking into account all the different factors involved in syncing local data with the cloud version. It also made some Cloudflare Workers, as you can see over here, to help with the syncing, validating license keys and so forth. And you can see right over here, I was doing a whole bunch of tests throughout the entire day when making the sync engine.

Overall it did a pretty good job, in the sense that it can continuously develop and improve on a system that it made without suddenly going crazy and messing everything up. This was over the course of six or seven hours of vibe coding, and I found its planning abilities in Claude Code's planning mode to be on par with Opus 4.1. Usually they'd recommend using Opus 4.1 for planning, but I think Sonnet 4.5 just matches Opus 4.1 on that level, which may make Claude Code cheaper as well, because it probably means you don't have to use Opus 4.1 nearly as much. But it did still have some blind spots. One of them is that every time you press sync, it syncs the entire local collection to the cloud version instead of only syncing what's changed. So you can see it's syncing everything again over here.
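The difference between re-uploading the whole collection and syncing only what changed can be sketched roughly as follows. This is a minimal illustration, not the sync engine from the video: the names `content_hash`, `plan_sync`, and `mark_synced` are hypothetical, files are modelled as in-memory bytes, and change detection by SHA-256 content hash is an assumption (deletions and the encryption step are left out).

```python
import hashlib
from typing import Dict, List


def content_hash(data: bytes) -> str:
    """Fingerprint of a file's contents (assumed change detector)."""
    return hashlib.sha256(data).hexdigest()


def plan_sync(local_files: Dict[str, bytes], sync_state: Dict[str, str]) -> List[str]:
    """Return the paths whose contents differ from the last synced hash."""
    return sorted(
        path
        for path, data in local_files.items()
        if sync_state.get(path) != content_hash(data)
    )


def mark_synced(local_files: Dict[str, bytes], sync_state: Dict[str, str], uploaded: List[str]) -> None:
    """Record the hash of every file that was just uploaded."""
    for path in uploaded:
        sync_state[path] = content_hash(local_files[path])


# Hypothetical local collection and an empty sync state (nothing uploaded yet).
files = {"chats/a.json": b"hello", "chats/b.json": b"world"}
state: Dict[str, str] = {}

first = plan_sync(files, state)    # first sync: every file is new
mark_synced(files, state, first)

files["chats/b.json"] = b"world!"  # edit one file locally
second = plan_sync(files, state)   # second sync: only the edited file
```

With a persisted `state`, pressing sync again after no local edits would plan zero uploads instead of the whole collection.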
So what I'm now vibe coding is a new sync state tracking system that keeps track of what has already been synced and what hasn't, and only syncs the differences. This is something the model completely missed when it planned out this new feature for the very first time. I think I would probably combine the plans that Sonnet 4.5 produces with the plans that GPT-5 produces in Codex CLI, especially for really big features like this.

Another pretty big difference that I found compared to Claude Sonnet 4 is that Sonnet 4.5 actually continues with execution until what you told it, or what it planned, is completed. There were often times when I was using Sonnet 4 where it came up with a plan, executed on the plan, and claimed to be done. Then I got GPT-5 to check through Sonnet 4's code and it was like, hey, you're missing this, this and this. And then I would tell Sonnet 4 and it's like, you're absolutely right, I did miss such and such. Whereas with Sonnet 4.5, when I get GPT-5 in Codex CLI to check whether the plan has been successfully executed, most of the time it has done almost everything exactly as it planned. So I do find that it is better at executing on plans without missing details or claiming it implemented a feature when it didn't implement a

Sycophancy

feature. And I think that's pretty related to what people are saying about Claude Sonnet 4.5 showing a major drop in "sycophancy". They say that, according to the system card, Sonnet 4.5 is less needlessly agreeable; earlier Claude models often over-agreed even when wrong, much like GPT-4o. And you can see right over here that Sonnet 4.5 now gets a score of 6.5%, and 11% over here. So there's a huge drop. I'm not exactly sure what this benchmark involves, but it's good that there's a drop. And I have found this to be true in my experience as well. Whilst it still does say "you're absolutely right", it does seem to say it less often than Sonnet 4 did. Another important thing to note is

Writing Evals

that Sonnet 4.5 tops writing evals, as you can see over here. When it comes to long-form creative writing, you can see it scores pretty highly right over here. And on this other creative writing benchmark, it scores the best out of all these models as well. I think this is generally true of Claude models. Many of my friends who do startups that involve some kind of writing element, like AI-generated text that will be read by other people, usually stick to Claude Sonnet for that purpose, because it is just better at writing and better at writing things that seem less AI-written. But that may also just be because most people aren't as used to Claude Sonnet's writing in the wild as they are to GPT's writing in the wild. And also the Factory

Droid's Testing

team, who made Droid, which is a platform-agnostic, model-agnostic coding agent, did some testing with Sonnet 4.5. They found it to be on par with Opus 4.1 when using it in Droid, right over here. I think Opus 4.1 is five times more expensive, so you can save a lot of money by using Sonnet 4.5 instead. Other people

Complaints

have complained about some things in Sonnet 4.5. Another Ray over here said: a medical student gets three chances to pass their boards, a pilot gets one shot at their license, Claude Sonnet 4.5, unlimited attempts, cherry-picks the best answer, claims 82%. In real-world testing, it fell apart in my livestream; the "world's best" coding agent couldn't even beat a junior developer who only gets one try. And I think they're referring to this part of the system card or blog post

Cognition Labs' Thoughts

over here basically. Cognition Labs, who made Devin, did a pretty good write-up about Claude Sonnet 4.5, with some lessons and challenges, and basically they had to rebuild Devin for Claude Sonnet 4.5. So they say: why rebuild instead of just dropping the new Sonnet in place and calling it a day? Because this model works differently in ways that broke our assumptions about how agents should be architected.

They say that Sonnet 4.5 is the first model we've seen that's aware of its own context window, and this shapes how it behaves. As it approaches context limits, we've observed it proactively summarizing its progress and becoming more decisive about implementing fixes to close out tasks. This context anxiety can actually hurt performance. We found the model taking shortcuts and leaving tasks incomplete when it believed it was near the end of its window, even when it had plenty of room left. They say that we ended up prompting pretty aggressively to override this behavior; even then, we found that prompts at the start of the conversation weren't enough. We had to add reminders both at the beginning and at the end of the prompt to keep it from prematurely wrapping up.

They also found that enabling the 1 million token context beta, which I think is a beta flag that you have to put in the request header or something, but capping usage at 200,000 tokens does seem to get around this behavior, because it makes the model think that it has plenty of runway and prevents it from taking as many shortcuts.

They also notice the model likes to take a lot of notes, so the model treats the file system as its own memory without prompting. It frequently writes notes and summaries, e.g. changelog.md, summary.md, but not CLAUDE.md nor agents.md, both for the user and for its own future reference. They also notice that Sonnet 4.5 is efficient at maximizing actions per context window through parallel tool execution, e.g. running multiple bash commands at once or reading multiple files simultaneously. That being said, there are trade-offs. Parallelism burns through context faster, which leads to the context anxiety that they mentioned earlier. And then they mention a couple of things that they're exploring next. I would recommend reading through the article, basically everything that I mentioned

More Tweets

will be linked down below. Someone over here on Twitter said that GPT-5 Codex respects structure and Sonnet 4.5 doesn't. It's good for PRDs if you're pivoting your product or taking a new direction. In these cases, GPT-5 is too autistic and will cling to existing structures too much. I haven't really found this to be my experience. I think that Sonnet 4.5 does seem to respect structure, especially when it comes to design, but I think it's an interesting take. Someone else said that their review of Sonnet 4.5, based on 30 hours of Claude Code use, is that it's basically the same as Opus 4.1, which is quite good but not as good as GPT-5 Codex with thinking=high. And I think this is kind of like what I mentioned earlier in the video: Sonnet 4.5 is equal to Opus 4.1 when it comes to planning. And I think for my own personal conclusion,

My Conclusion

I will continue to use GPT-5 Codex with thinking set to medium most of the time in Codex CLI when it comes to making any new features. But I will continue to use Sonnet 4.5 for design-related work at least. I think one of the things that will happen over the next couple of weeks is that many people who make these agents, such as Devin, the Claude Code team, the Factory Droid team, and other people who build agents on top of Sonnet 4.5, will learn many of the quirks and weird behaviors of Sonnet 4.5, such as some of the things listed in the blog post over here, and then change their agents to better fit Sonnet 4.5 so that performance is better overall. I think many of these developers will make their own agents much better in the coming weeks, and I will probably have another look at Sonnet 4.5 two to three weeks from now.

I think this was pretty similar when GPT-5 came out as well. A lot of people said that GPT-5 was pretty bad in the first couple of days after it came out. And then a couple of weeks later, everyone changed their mind and started using Codex CLI and so forth. And then they made it even better with GPT-5 Codex, a fine-tuned version for Codex. So I think it's a general trend that whenever a new model comes out, it doesn't seem to perform as well at first, because many people are just swapping out the model parameter but keeping all the system prompts the same. People have to slowly learn the quirks of the model and what makes it different, and then slowly adjust their system prompts, and a couple of weeks after release it certainly performs much better than it did on the day of release.

Anyways, as I mentioned earlier in the video, do remember to check out the AI startup school. It will be linked down below in the description. And if you do join, then you can ask me any question that you have about development or marketing when it comes to vibe coding or vibe marketing your own applications.
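The point in this conclusion about swapping the model parameter while keeping the same system prompt can be made concrete with a small sketch. It assumes an agent keeps one prompt profile per model, and that a model prone to wrapping up early gets a reminder at both the beginning and the end of the prompt, as in the Cognition write-up. The profile names `model-a` and `model-b`, the `PromptProfile` class, and the reminder text are all hypothetical; no real agent is claimed to be built this way.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical reminder text for a model that tends to wrap up prematurely.
REMINDER = "Do not wrap up early; finish every planned step."


@dataclass(frozen=True)
class PromptProfile:
    """Per-model prompt settings (illustrative structure only)."""
    system_prompt: str
    reminder: str = ""  # empty string means no reminder is injected


# Placeholder model names; the second profile carries a model-specific quirk fix.
PROFILES: Dict[str, PromptProfile] = {
    "model-a": PromptProfile(system_prompt="You are a coding agent."),
    "model-b": PromptProfile(system_prompt="You are a coding agent.", reminder=REMINDER),
}


def build_prompt(model: str, task: str) -> str:
    """Assemble the prompt, placing the reminder both before and after the task."""
    profile = PROFILES[model]
    parts = [profile.system_prompt]
    if profile.reminder:
        parts.append(profile.reminder)
    parts.append(task)
    if profile.reminder:
        parts.append(profile.reminder)
    return "\n\n".join(parts)


plain = build_prompt("model-a", "Fix the failing test.")
tuned = build_prompt("model-b", "Fix the failing test.")
```

Keeping the quirk handling in the profile rather than in the task text means swapping models changes the prompt along with the model name.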
