Qwen 3.5 - The next NEXT model


Table of contents (3 segments)

Segment 1 (00:00 - 05:00)

Okay, so up until recently, making AI smarter has usually meant making it slower and bigger. But the Qwen 3.5 models have just dropped, and they've really changed that equation. Not only is this model matching what Qwen Max was able to do, and I'll talk about sizes in a second, but it's doing it with up to a 19x speed boost. So in this video I'm going to break down what they've released, talk about some of the key changes they've made and why this family of models looks like it's going to be a real winner, and speculate on what we might see coming from the Qwen team over the next few weeks or months.

It seems pretty obvious that this release was timed to coincide with Chinese New Year; we've even got the little Qwen plushie dressed in its Chinese New Year outfit. But don't be fooled: this is a model the Qwen team have been working on for quite a while. I made a whole video about the Qwen 3 Next models, and if you haven't seen it, check that out, because this model is really the successor to that one. I have to think they were probably already training this when they released those models. Going back to Qwen 3 Next, they had an 80B model with 3B active parameters, and they also went on to release a coding version, Qwen 3 Coder Next. Qwen 3.5 is the continuation of that, but with a much bigger model that has probably been trained for quite a lot longer as well.

The Qwen 3.5 model they've released, and currently it's just the one model, is a 397-billion-parameter model with 17 billion parameters active. If you go back to the Qwen 3 models, you'll remember the biggest one they released publicly was 235 billion parameters with 22 billion active. But that hides the story. Because this is a mixture-of-experts model, what we really want to judge it on is partly the number of experts. The Qwen 3 model had 128 experts; this one has 512. They're used somewhat differently across the two architectures, but this model is definitely continuing the direction Qwen 3 Next was taking, with a lot more experts than the earlier Qwen 3 mixture-of-experts models had.

Now, the size of this model is not going to be friendly for people who want to run it locally. Realistically, even if you're running quantized versions, you're probably going to need around 256 GB of RAM, perhaps even 512 GB, and a pretty fast machine (I'll sketch the rough arithmetic just below). That said, if you're working at a company where you've actually got a node of GPUs, you'll be able to run this on a reasonably sized node and have a fully self-contained model that gets you very close to some of the proprietary models out there, without needing a trillion parameters.

I'm not going to dwell on the benchmarks too much, but jumping in here, we can already see that when we compare this model to the Qwen 3 Max Thinking model, which the Qwen team themselves have said was over a trillion parameters, this model is already beating it.
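Quick aside on those local-serving numbers: here's a rough back-of-the-envelope sketch, assuming the headline parameter count is accurate. The bit-widths shown and the reminder about KV-cache overhead are illustrative assumptions on my part, not published figures for this model.

```python
# Rough memory estimate for serving a 397B-parameter MoE model at different
# quantization levels. These are back-of-the-envelope assumptions, not
# official figures: real deployments also need room for the KV cache,
# activations, and runtime overhead on top of the raw weights.

TOTAL_PARAMS = 397e9   # total parameters across all experts
ACTIVE_PARAMS = 17e9   # parameters actually used per token (routed experts + shared layers)

def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Memory needed just for the weights, in GiB."""
    return num_params * bits_per_param / 8 / 2**30

for label, bits in [("FP16/BF16", 16), ("INT8", 8), ("4-bit", 4)]:
    print(f"{label:>10}: ~{weight_memory_gib(TOTAL_PARAMS, bits):,.0f} GiB of weights")

# Even at 4-bit (~185 GiB of weights alone), a 256 GiB machine is tight once
# the KV cache and overhead are added -- which is why 256-512 GB of RAM, or a
# multi-GPU node, is the realistic floor for running this locally.
```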
And not only is it beating their previous models, it's also very competitive with GPT 5.2, Claude Opus 4.5, and Gemini 3 Pro. The other benchmarks they've really focused on here are the vision benchmarks, and that points to one of the really cool things about this model. In the past, Qwen tended to make language models and then make a VL model, where they would essentially bolt a vision encoder onto one of their previous language models. Not any more: this model is multimodal out of the gate, meaning it's been trained from scratch on both text and images. That improves the results across the board for visual question answering and anything else that involves images. Multimodality is something we know Gemini 3 is very strong at, and Qwen is generally behind it on some of the main vision benchmarks, but they're already surpassing the Claude Opus models, which we know are not as strong at multimodal tasks, and getting scores very respectable next to GPT 5.2. Let me remind you again: this is a model under 400 billion parameters with only 17 billion parameters active.

They also talk here about the training of the model, specifically the pre-training, and a number of things stand out. Obviously there's the new architecture, which builds on the Qwen 3 Next architecture with an attention system that basically allows it to not need as

Segment 2 (05:00 - 10:00)

much RAM when you go to very long context lengths. That, plus one other thing I'll talk about in a second, is what gets them this speed boost. You can see that when decoding at around 256K context, this model is 19 times faster than their Qwen 3 Max model, and, amazingly, it's even 7.2 times faster than the much smaller, almost-half-the-size Qwen 3 235B model.

The other big win is that they've moved from single-token autoregressive prediction to multi-token prediction. This is something a number of the proprietary models are doing, and doing very well. From what I understand from talking to researchers about this, the big advantage is that in pre-training your model just tends to learn a lot faster.

Another thing that's great to see is that their multilingual coverage has gone from 119 languages to over 200 languages and dialects, and the tokenizer they're using has grown to a vocabulary of around 250K, roughly matching the Gemini/Google tokenizer at around 256K tokens. If you're not sure why that matters for multilingual use, I did a whole video comparing some of the old tokenizers against the new ones, showing how inefficient a 32K-vocabulary tokenizer is for languages other than English, Chinese, or the Western European languages (there's a small illustration of this further down). So it's good to see they've made a very clear, conscious decision to expand the number of supported languages, and I have to think that makes a lot of sense for them. I've been seeing lots of companies around the world starting to use the Qwen models, not so much the big ones as the smaller ones; lots of people have been distilling and fine-tuning things like the 600-million-parameter and 1.7B-parameter models.

Okay, another interesting thing we're seeing here is a continuation of something I brought up in the MiniMax 2.5 video last week: the use of RL training environments to make the model better at reasoning and at different tasks. Just like the other companies, the Qwen team has clearly decided to scale up their RL environment training. It is interesting that the MiniMax people were claiming hundreds of thousands of different environments, while here we can see the Qwen team maxing out, with this model at least, at around 15,000 environments. That makes me wonder whether those hundreds of thousands of environments are really just variations of the same thing. My guess is that MiniMax is not training on an order of magnitude more unique environments, but until people actually show us what they're doing, we can't really tell. The cool thing is that they do mention an upcoming technical report; hopefully that will have a lot more of this information in it.

If you want to play with the models, they're out on Qwen Chat, so you can go and try them for free. They've got some nice demos of using it for different kinds of tasks, whether that's vision tasks or vibe-coding 3D games. Some of the games in there are pretty impressive for what they're actually doing, and obviously there are a lot of other coding tasks you can try as well.
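Circling back to the tokenizer point for a moment, here's a minimal sketch of how you could see the effect yourself. The model IDs below are just examples of publicly hosted tokenizers with a small and a large vocabulary; they are not necessarily the exact tokenizers being compared in the video, so swap in whichever checkpoints you want to test.

```python
# Minimal sketch: why vocabulary size matters for non-English text.
# A small (~32K) vocabulary falls back to many short sub-word pieces for
# scripts it hasn't allocated tokens to, while a large (~150K-250K)
# vocabulary covers them far more compactly.

from transformers import AutoTokenizer

text = "नमस्ते, आप कैसे हैं? मुझे उम्मीद है कि आपका दिन अच्छा चल रहा है।"  # Hindi sample sentence

small_vocab = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # ~32K vocab
large_vocab = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")                      # ~150K vocab

for name, tok in [("32K-vocab tokenizer", small_vocab), ("150K-vocab tokenizer", large_vocab)]:
    ids = tok.encode(text, add_special_tokens=False)
    print(f"{name}: {len(ids)} tokens for {len(text)} characters")

# Fewer tokens per sentence means more effective context length and lower
# inference cost for the same amount of non-English text.
```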
Interestingly, they have a whole section mentioning that this can be used with OpenClaw, so it's certainly going to be interesting to see the inference prices for this model as it gets rolled out to different providers, and to see how competitive it is against MiniMax 2.5. At this stage, I think you have to consider the Qwen team a frontier lab: not only are they producing these really high-quality models, they're also building things like Qwen Code and Qwen Agent, and they're jumping into a lot of the areas we see the proprietary labs really focusing on.

If you want to try it yourself, you can just go to chat.qwen.ai. They actually have a 3.5 Plus version of the model in there; from what I understand, that's basically the model set up so it can go to a full million-token context window. You'll see that you can set it to Thinking, Fast, or Auto. We can then watch it work through a query; in this case, you can see it doing a number of web searches as it finds different models that have been released, and sure enough it picks up Kimi K2.5, GLM-5 (which we covered last week), and MiniMax 2.5. And over on the right side, you actually get the URLs

Segment 3 (10:00 - 11:00)

for things, but you'll also get a lot of the thinking output over there as well.

So, building on this, I think we're going to see distilled versions of this model, and perhaps some smaller models in sizes similar to the 80B Qwen 3 Next. If you remember back to the video I did about that, it looked like it was only about half trained, so it will be interesting to see whether they release a 3.5 version of it. Hopefully they'll also release 3.5 distilled versions of the really small models that got so popular in the Qwen 3 series. My guess is those will start coming in the next few weeks, if not within a month or so.

If you're interested in trying this out, I think a lot of the inference companies will be serving it, but be careful about where you pick your provider. We had a really interesting discussion in the comments on one of the models last week, where someone said the model sucked, only for other people to ask where they were using it and point out that it was likely a quantized version that wasn't being served very well. When the original commenter retried it somewhere else, their prompts actually worked out really well compared to where they had been running them. This is a common thing I see: you really want to make sure you're either serving the model yourself with the right config, or using someone who serves it in a way that actually gets you all the intelligence out of the model (see the small sketch below for one way to sanity-check a provider).

So let me know in the comments what you think about this. If you want, we can have a guessing game about what sizes of Qwen 3.5 we'll see over the next month or so. And as always, if you found the video useful, please click like and subscribe, and I will talk to you in the next video. Bye for now.
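A practical footnote to that serving-quality point: here's a minimal sketch of running the same prompt against two providers and comparing the answers, assuming both expose an OpenAI-compatible endpoint. The base URLs, API keys, and model identifier are all placeholders, not real provider details; fill in whichever services you are actually evaluating.

```python
# Sanity-check sketch: send an identical prompt to two hosted endpoints and
# eyeball the answers before committing to a provider. Everything below the
# import is a placeholder -- substitute real base URLs, keys, and the model
# name each provider actually lists.

from openai import OpenAI

PROMPT = "Explain the difference between total and active parameters in a mixture-of-experts model."
MODEL = "qwen-3.5"  # placeholder identifier; check each provider's model catalog

providers = {
    "provider_a": {"base_url": "https://provider-a.example/v1", "api_key": "KEY_A"},
    "provider_b": {"base_url": "https://provider-b.example/v1", "api_key": "KEY_B"},
}

for name, cfg in providers.items():
    client = OpenAI(base_url=cfg["base_url"], api_key=cfg["api_key"])
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": PROMPT}],
        temperature=0,  # low temperature makes the two outputs easier to compare
    )
    print(f"--- {name} ---\n{resp.choices[0].message.content}\n")
```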

More videos from this author — Sam Witteveen
