# The "Token Muncher" Problem: Is Sonnet 4.6 Actually Cheaper?

## Metadata

- **Channel:** Sam Witteveen
- **YouTube:** https://www.youtube.com/watch?v=iyLwNnU6BMA
- **Source:** https://ekstraktznaniy.ru/video/22380

## Transcript

### Segment 1 (00:00 - 05:00)

Okay, so Anthropic has released Claude Sonnet 4.6, and undoubtedly you've been seeing this all across your feeds, and you've probably seen everyone rave about it. But in this video, I want to take an even look at what is good here, but also why you may not want to use this model for many of the tasks you're thinking of using it for. Okay, so let me start off by saying that Claude Opus 4.6 is my daily driver model. I use a lot of tokens in this every day, both in Claude Code but also for other uses. And so, for a long time, I've been very partial to the Claude models. I like the tone of them, I like the way they respond generally, and I like how they're very good at tool use. So, just like many of you, I was definitely excited when I heard the rumors that Claude Sonnet 5.0 was coming. Now, clearly, it wasn't 5.0. We've got this incremental naming bump here to Sonnet 4.6. And if we go through their blog post, there are obviously a number of things where this is quite a bit better. One of them is computer use. Computer use and Claude in Chrome have definitely been getting better, and having a model that's better at this is going to help with how that actually goes. And if you come back to their blog post here, they're saying that people with early access are saying this may be at a human level. I don't know about you, but while I am super impressed that they've gone from well under 20% when they first launched this back in October 2024 up to 72% on these OSWorld benchmarks, I'm not sure I would call getting something right in a browser 72% of the time human level. Now, that brings us to the other benchmarks, and the other benchmarks are very impressive here. We can see that while it is behind Opus 4.6, Sonnet 4.6 is definitely catching up to the level that is being set by the Opus model.
And personally, I think a lot of the goal here is to build a model that's reasonably cheap for Cowork. So, Anthropic introduced Claude Cowork, which they describe as Claude Code for the rest of your work. And this kind of plays out when we look at the knowledge-work benchmarks of office tasks and things like this: this model is clearly being built for those kinds of tasks. The model has also gotten a number of really interesting updates, where things that were just Opus things before, like adaptive thinking and extended thinking, are now better supported. Remember, adaptive thinking is basically where Claude can determine when and how much to use the extended thinking, the long chain-of-thought thinking that it does, and there are also things like context compaction. We're now starting to see more solid support for these in the Sonnet 4.6 range of models. So, the big question should become: why did I say at the start of this that you may not want to use it? And this is something I just haven't seen a lot of people talk about as they jump out and claim that this is the best model ever, that it's insane, all those sorts of things. It does sound really good when we look at the pricing of Sonnet 4.6, that it's 40% cheaper than Opus 4.6. And that cheaper price and ratio holds true both for the sub-200,000-token version of the model and for the version that goes up to a million tokens, which is definitely impressive. So again, why am I saying that you might not want to use this? If we come in here and we look at the independent benchmarks and evaluations, one of the most reputable ones I see nowadays is run by Artificial Analysis. And they basically say that this is a really good model, right?
That it's much smarter, that its intelligence is high and so on, except for one issue, and this is that the adaptive thinking that helps make it smarter represented a substantial improvement over Sonnet 4.5, but to achieve this result, Sonnet 4.6 used more than 4x the total tokens of its predecessor. So on their benchmarks, this went from Sonnet 4.5 using 58 million tokens to the new one using 280 million tokens with adaptive thinking. By the same comparison, Opus 4.6 only used 160 million tokens. So it really does seem that Sonnet 4.6 may have a token muncher problem: while it's going to be faster and cheaper per token, overall, if it ends up using four times the number of tokens as the previous model, you're

### Segment 2 (05:00 - 08:00)

going to have to wonder whether you're just better off sticking with Opus 4.6. Now, we can see more details about this when we come in and look at the Artificial Analysis website. We can see that, okay, this is right up here: the only model that is actually spending more tokens is one of the Xiaomi flash models, which I think is actually a very cheap model to use, and obviously it doesn't have the intelligence that Sonnet 4.6 has. If you remember back, some of the early GPT-5 models had a similar kind of problem: they were basically big token munchers, and people started reporting that the real cost of the API was a lot more when you factored in that it was just using so many more tokens to do certain tasks. Another issue that you want to think about if you're using this via the API is that it seems very clear now that the APIs are no longer equal. In the past, it used to be that basically all the APIs, whether you used it from Anthropic directly, on GCP, AWS, etc., gave you the same model and all the same features. That doesn't appear to be the case anymore. A good example of this is programmatic tool calling. This is definitely one of the killer features that a number of models are starting to use nowadays. The idea was introduced way back with OpenAI and then later with the Gemini models, where you've got a sandbox or a sort of code execution environment, and the model can write code, run it server side, and execute it. That actually means things like tool calls can be done server side, be much faster, and use fewer tokens. The only challenge with this is that this code execution isn't available on all the platforms. You can see here it's basically saying that it's available via the Claude API and Microsoft Foundry. There are similar issues when you start looking at the APIs for other features as well.
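To make the programmatic tool calling idea concrete, here is a minimal sketch of what a Messages API request enabling server-side code execution might look like. This is illustrative only: the model id, the beta tool type string, and the payload shape are assumptions based on Anthropic's published code-execution beta, not confirmed by the video, so check the current API docs before relying on them. The sketch only builds the payload and never sends it.

```python
# Hedged sketch: building (not sending) a Messages API payload that
# enables server-side code execution, so the model's tool loops run on
# the provider's side instead of round-tripping tokens to the client.
# The model id and tool type string below are ASSUMPTIONS, not verified.

def build_code_execution_request(prompt: str) -> dict:
    """Return a request payload with the code-execution tool enabled."""
    return {
        "model": "claude-sonnet-4-6",           # assumed model id
        "max_tokens": 1024,
        "tools": [
            {
                "type": "code_execution_20250522",  # assumed beta tool type
                "name": "code_execution",
            }
        ],
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_code_execution_request("Sum the first 100 integers.")
print(req["tools"][0]["name"])
```

The point of the feature, as described in the video, is that the model's intermediate tool calls execute in a sandbox server side, which is faster and uses fewer tokens than shipping every intermediate step back through the API.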
For a long time, skills have not been supported evenly on Anthropic's API versus the other APIs out there. Now, for most people, if you're using this via a Claude Code subscription or something like that, that's not going to affect you at all. But I would say then, probably for most people doing that, you're going to stay on the Opus model anyway if you're paying the flat-fee buffet model where you can eat all the tokens that you want over a certain period of time, etc. So, just to finish up, I would say this model definitely looks really nice. It's definitely got some bumps over what we've seen before, but you really want to check it on your own evals to see whether this is actually going to be cheaper than Opus 4.6 or not. My guess is that for things where you're not using the adaptive thinking, the answer is going to be yes, and for certain tasks, perhaps where you are doing really long chain of thought with adaptive thinking, the answer is going to be no: stick to Opus 4.6. But overall, a good solid bump. Unfortunately, not the sort of bump that we were hoping to get with a Sonnet 5.0. And perhaps this just sets the stage for an Opus 4.7 or an Opus 5.0 in the not-too-distant future. Anyway, as always, let me know in the comments what you think and where you plan to use this model yourself. Do you see yourself changing from Opus 4.6 to this model, or are you most likely going to stick with Opus 4.6? I think for most of the things I'm going to be doing, sort of Claude Code-wise, I'm going to stick to Opus 4.6. But for certain agent tasks and stuff like that, I will definitely try Sonnet 4.6. Okay, if you found the video useful, please click like and subscribe, and I will talk to you in the next video. Bye for now.
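The "is it actually cheaper" question from the video can be sketched as back-of-the-envelope arithmetic, using only the numbers quoted in the transcript: 58M benchmark tokens for Sonnet 4.5, 280M for Sonnet 4.6 with adaptive thinking, 160M for Opus 4.6, and the "40% cheaper than Opus" per-token price. The token totals are from the Artificial Analysis run as reported in the video, so treat the result as illustrative, not a pricing guarantee.

```python
# Back-of-the-envelope "token muncher" check using the benchmark token
# counts quoted in the video (Artificial Analysis run) and the stated
# "40% cheaper than Opus" per-token price for Sonnet 4.6.

SONNET_46_TOKENS = 280e6   # Sonnet 4.6 with adaptive thinking
SONNET_45_TOKENS = 58e6    # Sonnet 4.5 on the same benchmarks
OPUS_46_TOKENS = 160e6     # Opus 4.6 on the same benchmarks

OPUS_PRICE = 1.0                   # normalized per-token price
SONNET_PRICE = 0.6 * OPUS_PRICE    # "40% cheaper"

# How many times more tokens than its predecessor? (~4.8x, i.e. "more than 4x")
token_ratio_vs_45 = SONNET_46_TOKENS / SONNET_45_TOKENS

# Total benchmark spend relative to Opus 4.6 at its normalized price.
relative_cost_vs_opus = (SONNET_46_TOKENS * SONNET_PRICE) / (
    OPUS_46_TOKENS * OPUS_PRICE
)

print(f"Sonnet 4.6 uses {token_ratio_vs_45:.1f}x the tokens of Sonnet 4.5")
print(f"Total cost vs Opus 4.6 on this run: {relative_cost_vs_opus:.2f}x")
```

On these numbers the lower per-token price is almost exactly cancelled out by the extra tokens: Sonnet 4.6's total benchmark spend comes out at about 1.05x Opus 4.6's, which is why the per-token discount alone doesn't settle the question and your own evals have to.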
