Is Sonnet 4.5 All Hype? (Industry Reacts)
12:06

Ray Amjad | 03.10.2025 | 3,518 views | 67 likes | updated 18.02.2026
Video description
Level up with my Claude Code Masterclass 👉 https://www.masterclaudecode.com/
Learn the AI I'm learning with my newsletter 👉 https://newsletter.rayamjad.com/
Got any questions? DM me on Instagram 👉 https://www.instagram.com/theramjad/
🎙️ Sign up to the HyperWhisper Windows Waitlist 👉 https://forms.gle/yCuqmEUrfKKnd6sN7

Since I've never accepted a sponsor, my videos are made possible by...

MY CLASSES
🚀 Claude Code Masterclass: https://www.masterclaudecode.com/?utm_source=youtube&utm_campaign=_O9m7ALrF5A - Use coupon code YEAR2026 for 35% off

MY APPS
🎙️ HyperWhisper, write 5x faster with your voice: https://www.hyperwhisper.com/?utm_source=youtube&utm_campaign=_O9m7ALrF5A - Use coupon code YEAR2026 for 35% off
📲 Tensor AI: Never Miss the AI News - on iOS: https://apps.apple.com/us/app/ai-news-tensor-ai/id6746403746 - on Android: https://play.google.com/store/apps/details?id=app.tensorai.tensorai - 100% FREE
📹 VidTempla, Manage YouTube Descriptions at Scale: http://vidtempla.com/?utm_source=youtube&utm_campaign=_O9m7ALrF5A
💬 AgentStack, AI agents for customer support and sales: https://www.agentstack.build/?utm_source=youtube&utm_campaign=_O9m7ALrF5A - Request private beta by emailing r@rayamjad.com

CONNECT WITH ME
🐦 X: https://x.com/@theramjad
👥 LinkedIn: https://www.linkedin.com/in/rayamjad/
📸 Instagram: https://www.instagram.com/theramjad/
🌍 My website/blog: https://www.rayamjad.com/

Links (in order):
- https://x.com/claudeai/status/1972706807345725773
- https://x.com/sam_paech/status/1973203851458256995?s=12
- https://x.com/_abhaysinghal/status/1973476034864631916?s=12
- https://x.com/rayfernando1337/status/1973062312664895800?s=12
- https://x.com/slow_developer/status/1973406298352881720?s=12
- https://x.com/deedydas/status/1973574408599200146?s=12
- https://x.com/scaling01/status/1972728819409895649?s=12
- https://cognition.ai/blog/devin-sonnet-4-5-lessons-and-challenges#the-model-is-aware-of-its-context-window
- https://x.com/SevDeutsch/status/1973782487777374654
- https://x.com/finbarrtimbers/status/1973922679418974298

Timestamps:
00:00 - Intro
00:21 - Task 1
02:12 - Design Taste
02:54 - Task 2
05:40 - Sycophancy
06:15 - Writing Evals
06:52 - Droid's Testing
07:14 - Complaints
07:40 - Cognition Labs' Thoughts
09:40 - More Tweets
10:19 - My Conclusion

Contents (11 segments)

  1. 0:00 Intro
  2. 0:21 Task 1
  3. 2:12 Design Taste
  4. 2:54 Task 2
  5. 5:40 Sycophancy
  6. 6:15 Writing Evals
  7. 6:52 Droid's Testing
  8. 7:14 Complaints
  9. 7:40 Cognition Labs' Thoughts
  10. 9:40 More Tweets
  11. 10:19 My Conclusion
0:00

Intro

All right, so a couple of days ago, Claude Sonnet 4.5 came out, and it claimed to be the best coding model in the world. A bunch of people have tried it: some thought it was really great, others thought it was a bunch of crap, and there have been a bunch of mixed reviews in the middle. I'll be sharing my own thoughts about Claude Sonnet 4.5, having used it in Claude Code for about 20 to 30 hours, and also sharing the experiences of other people online. So the first thing I got it to do was for my application, HyperWhisper. There's a coupon
0:21

Task 1

code down below if you are interested. I got it to add a cloud version. So if you go to settings and then credits, you can top up your account with credits, and you basically only pay for the minutes you actually transcribe. Claude Sonnet 4.5 made this UI over here, and it also made the Cloudflare Workers for the application. You don't have to use HyperWhisper Cloud, because you can use all these other providers over here with your own API keys, or use local models from this list instead. Basically, it made the Cloudflare Workers endpoint and, as you can see over here, the requests are forwarded to Groq. I shipped this feature to production about two days ago, and some people have started using it already, but most people are still using their own API keys. Anyways, I thought it did a pretty good job at planning this and at coming up with a design for this credits page. It also came up with a design for this page where you buy credits to top up your account. And yeah, that's one of the strong suits of Claude Sonnet 4.5, I found. Claude models are generally better at design and better at preserving design throughout the entire application. Whenever I use GPT-5 high in Codex CLI, it just kind of slams in this random design that doesn't really feel like it belongs in the application, whereas this design definitely does feel like it belongs. I did have to iterate on the design a tiny bit, because the very first draft that it came up with was this over here, but now it looks much better after about two iterations. Hey, so as a short aside, if you're more interested in making money from vibe coding and vibe marketing, then you can learn how to do that in my AI startup school over here.
A bunch of people have already joined, as you can see right over here, and made a bunch of money from their own applications. There are a bunch of classes on vibe coding, vibe marketing, coming up with ideas, and distributing with AI, a bunch of case studies as well, and classes on using Claude Code and Codex CLI. I've covered every single feature in them. And of course, as you can see over here, there are a lot of posts in the community where people are sharing knowledge with each other. There will be a link down below if you are interested in
2:12

Design Taste

joining it. A couple of people online have also said Claude Sonnet 4.5 is so much better at designing UIs, as you can see over here. Some people had similar experiences, and other people had different experiences. And yeah, this is Claude Sonnet 4.5 when it comes to making vector images. So this is a lone tree in a desert, and this is the New York City skyline. I thought this part was pretty impressive, with the sun rays coming out of the building over here. It seems to be really good at understanding visually how things should be pieced together, because SVGs are vector images and it's basically writing the code for the vectors to create this image. And then you can see some other models over here that are good at coding but not so good at design, with their own SVGs. I also got Claude Sonnet 4.5 to make something much more
2:54

Task 2

complicated for my application MindDeck. There's a coupon code down below if you are interested as well. It basically made a cloud storage version where you can back up your files to the cloud: everything is encrypted locally, then backed up, and it goes to Cloudflare R2. As you can see over here, I can go to any of the files, like chats, and everything is backed up and encrypted. This was a pretty huge task because it had to make its own sync engine and take into account all the different factors that come up when syncing local data with the cloud version. It also made some Cloudflare Workers, as you can see over here, to help with the syncing, validating license keys, and so forth. And you can see right over here, I was doing a whole bunch of tests throughout the entire day when making the sync engine. Overall it did a pretty good job, in the sense that it can continuously develop and improve on a system that it made without suddenly going crazy and messing everything up. This was over the course of six or seven hours of vibe coding, and I found its planning abilities in Claude Code's planning mode to be on par with Opus 4.1. Usually they'd recommend using Opus 4.1 for planning, but I think Sonnet 4.5 just matches Opus 4.1 on that level, which may make Claude Code cheaper as well, because it probably means you don't have to use Opus 4.1 nearly as much. But it did still have some blind spots. One of the blind spots it had over here is that every time you press sync, it syncs the entire local collection to the cloud version instead of only syncing what's changed. So you can see it's syncing everything again over here.
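To make the difference between "sync everything" and "sync only what changed" concrete, here is a minimal Python sketch. It assumes change detection by content hash; the function names, the path-to-bytes mapping, and the hashing approach are all illustrative assumptions, not MindDeck's actual sync engine.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Fingerprint file contents so a change can be detected without re-uploading."""
    return hashlib.sha256(data).hexdigest()


def plan_sync(
    local_files: dict[str, bytes],
    synced_state: dict[str, str],
) -> tuple[list[str], list[str]]:
    """Decide what a sync pass actually needs to do.

    local_files:  hypothetical mapping of path -> current local contents
    synced_state: hypothetical mapping of path -> hash recorded at the last sync
    Returns (paths to upload, paths to delete remotely).
    """
    # Upload only files that are new or whose contents differ from the last sync.
    to_upload = sorted(
        path
        for path, data in local_files.items()
        if synced_state.get(path) != content_hash(data)
    )
    # Files that were synced before but no longer exist locally should be removed.
    to_delete = sorted(path for path in synced_state if path not in local_files)
    return to_upload, to_delete


def mark_synced(
    local_files: dict[str, bytes],
    synced_state: dict[str, str],
    uploaded: list[str],
    deleted: list[str],
) -> None:
    """Record the outcome of a sync pass so the next pass can skip unchanged files."""
    for path in uploaded:
        synced_state[path] = content_hash(local_files[path])
    for path in deleted:
        synced_state.pop(path, None)
```

With an empty `synced_state`, the first pass uploads every file; after `mark_synced`, a second pass with no local edits returns two empty lists, which is the behavior the naive "sync everything on every press" version lacks.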
So now what I'm vibe coding with it is a new sync state tracking system, which keeps track of what has already been synced and what hasn't, and only syncs the differences. This is something the model completely missed when it planned out this new feature for the very first time. I think I would probably combine the plans that this has with the plans that GPT-5 has in Codex CLI, especially for really big features like this. Another pretty big difference that I found compared to Claude Sonnet 4 is that Sonnet 4.5 actually continues with execution until what you told it, or what it planned, is completed. There were often times when I was using Sonnet 4 where it came up with a plan, executed on the plan, and claimed to be done. Then I got GPT-5 to check through Sonnet 4's code, and it was like, hey, you're missing this, this, and this. And then I would tell Sonnet 4, and it's like, "You're absolutely right, I did miss such and such." Whereas with Sonnet 4.5, when I get GPT-5 in Codex CLI to check whether the plan has been successfully executed, most of the time it's done almost everything exactly as it planned. So I do find that it is better at executing on plans without missing details or claiming it implemented a feature when it didn't implement a
5:40

Sycophancy

feature. And I think that's pretty related to what people are saying about Claude Sonnet 4.5 showing a major drop in sycophancy. They say in their system card that Sonnet 4.5 is less needlessly agreeable; earlier Claude models often over-agreed even when wrong, much like GPT-4o. And you can see right over here that Sonnet 4.5 now gets a score of 6.5% and 11% over here, so there's a huge drop. I'm not exactly sure what this benchmark involves, but it's good that there's a drop. And I have found this to be true in my experience as well. Whilst it still does say "you're absolutely right," it does seem to say it less often than Sonnet 4 did. Another important thing to note is
6:15

Writing Evals

that Sonnet 4.5 tops writing evals, as you can see over here. When it comes to long-form creative writing, you can see it scores pretty highly right over here. And on this other creative writing benchmark, it scores the best out of all these models as well. I think this is generally true of Claude models. Many of my friends who do startups that involve some kind of writing element, like AI-generated text that will be read by other people, usually stick to Claude Sonnet for that purpose, because it is just better at writing and better at writing things that seem less AI-written. But that may also just be because most people aren't as used to Claude Sonnet's writing in the wild as they are to GPT's writing in the wild. And also the Factory
6:52

Droid's Testing

team, who made Droid, which is a platform-agnostic, model-agnostic coding agent: when they did some testing with Sonnet 4.5, they found it to be on par with Opus 4.1 when it came to using it in Droid, right over here. I think Opus 4.1 is five times more expensive, which means you can save a lot of money by using Sonnet 4.5 instead. Other people
7:14

Complaints

have complained about some things with Sonnet 4.5. Another Ray over here said: a medical student gets three chances to pass their boards, a pilot gets one shot at their license; Claude Sonnet 4.5 gets unlimited attempts, cherry-picks the best answer, and claims 82%. In real-world testing, it fell apart in my livestream; the "world's" best coding agent couldn't even beat a junior developer who only gets one try. And I think they're referring to this part of the system card or blog post
7:40

Cognition Labs' Thoughts

over here, basically. Cognition Labs, who made Devin, did a pretty good write-up about Claude Sonnet 4.5, with some lessons and challenges, and basically they had to rebuild Devin for Claude Sonnet 4.5. So they say: why rebuild instead of just dropping the new Sonnet in place and calling it a day? Because this model works differently, in ways that broke our assumptions about how agents should be architected. And they say that Sonnet 4.5 is the first model we've seen that's aware of its own context window, and this shapes how it behaves. As it approaches context limits, we've observed it proactively summarizing its progress and becoming more decisive about implementing fixes to close out tasks. This context anxiety can actually hurt performance. We found the model taking shortcuts and leaving tasks incomplete when it believed it was near the end of its window, even when it had plenty of room left. They say that we ended up prompting pretty aggressively to override this behavior, and even then we found that prompts at the start of the conversation weren't enough. We had to add reminders both at the beginning and at the end of the prompt to keep it from prematurely wrapping up. And they say that enabling the 1 million token context beta (which I think is a beta flag that you have to put in the request header or something) while capping usage at 200,000 tokens does seem to get around this behavior, because it basically makes the model think it has plenty of runway and prevents it from taking as many shortcuts. They also noticed the model likes to take a lot of notes: the model treats the file system as its own memory without prompting. It frequently writes notes and summaries (e.g. changelog.md, summary.md, but not CLAUDE.md nor agents.md), both for the user and for its own future reference. They also noticed that Sonnet 4.5 is efficient at maximizing actions per context window through parallel tool execution, e.g. running multiple bash commands at once or reading multiple files simultaneously. That being said, there are trade-offs: parallelism burns through context faster, which leads to the context anxiety they mentioned earlier. And then they mention a couple of things that they're exploring next. I would recommend reading through the article; basically everything that I mentioned
9:40

More Tweets

will be linked down below. Someone over here on Twitter said that GPT-5 Codex respects structure and Sonnet 4.5 doesn't. It's good for PRDs if you're pivoting your product or taking a new direction; in these cases, GPT-5 is too autistic and will cling to existing structures too much. I haven't really found this to be my experience. I think that Sonnet 4.5 does seem to respect structure, especially when it comes to design, but I think it's an interesting take. Someone else said their review of Sonnet 4.5, based on 30 hours of Claude Code use, is that it's basically the same as Opus 4.1, which is quite good but not as good as GPT-5 Codex with thinking set to high. And I think this is kind of what I mentioned earlier in the video: I think that Sonnet 4.5 is equal to Opus 4.1 when it comes to planning. And for my own personal conclusion,
10:19

My Conclusion

I will continue to use GPT-5 Codex with thinking set to medium most of the time in Codex CLI when it comes to making any new features, but I will continue to use Sonnet 4.5 for design-related work at least. I think one of the things that will happen over the next couple of weeks is that many people who make these agents, such as the Devin team, the Claude Code team, the Factory Droid team, and other people who build agents on top of Sonnet 4.5, will learn many of the quirks and weird behaviors of Sonnet 4.5, such as some of the things listed in the blog post over here, and then change their agents to better fit Sonnet 4.5 so that its performance is better overall. I think many of the developers building agents on top of the models will make their agents much better in the coming weeks, and I will probably have another look at Sonnet 4.5 two to three weeks from now. I think this was pretty similar when GPT-5 came out as well. A lot of people said that GPT-5 was pretty bad the first couple of days after it came out, and then a couple of weeks later, everyone changed their mind and started using Codex CLI and so forth. And then they made it even better with GPT-5 Codex, a fine-tuned version for Codex. So yeah, I think it's a general trend that whenever a new model comes out, it doesn't seem to perform as well at first, because many people are just swapping out the model parameter but keeping all the system prompts the same. People then have to slowly learn the quirks of the model and what makes it different, and slowly adjust their system prompts, and then a couple of weeks after release it certainly performs much better than it did on the day of release. Anyways, as I mentioned earlier in the video, do remember to check out the AI startup school. It will be linked down below in the description.
And if you do join, then you can basically ask me any question that you have about development or marketing when it comes to vibe coding or vibe marketing your own applications.
