# This New OpenSource AI Crushes Everything...(Qwen 3) Beats DeepSeek...

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=OJGemzH46qg
- **Date:** 29.04.2025
- **Duration:** 20:17
- **Views:** 13,801
- **Source:** https://ekstraktznaniy.ru/video/12909

## Description

Join my AI Academy - https://www.skool.com/postagiprepardness 
🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Checkout My website - https://theaigrid.com/

00:00 Qwen 3 Revealed
00:41 Benchmark Surprise
01:22 Smaller Models Shown
02:00 Thinking Mode Explained
02:54 Hands-on Demo
03:34 Artifact Feature Test
04:20 Visual Comparison
05:18 Multimodal Test
06:16 Real-World Coding
07:14 Ant Colony Sim
08:25 Open Model Pros
09:49 Free HTML Creation
11:13 Open vs Paid
12:01 Deep Research Demo
13:03 Checking Accuracy
14:11 Free Deep Research
14:46 Image Generation
15:39 Video Generation
16:20 Voice Chat Test
17:11 Voice Features
18:14 YouTube Tips
19:13 Overall Verdict


Links From Today's Video:
https://qwenlm.github.io/blog/qwen3/

Welcome to my channel where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be su

## Transcript

### Qwen 3 Revealed [0:00]

So, we just had another DeepSeek moment that I personally think is probably even better than DeepSeek. Alibaba just introduced Qwen 3, their open-weight models, the latest large language models from Alibaba Cloud. And I promise you guys, this entire AI system is largely better than DeepSeek for a variety of reasons that I will get into. I'm not going to waste you guys' time on the benchmarks, because that stuff is honestly not always indicative of what the model can actually do. But you're going to be really surprised by the actual models and just how good they are. So, if we actually take a look at the benchmarks very quickly, I'm only going to spend 60 seconds on this, we

### Benchmark Surprise [0:41]

can see that when we look at the largest model right here, the 235 billion parameter mixture-of-experts model, it actually does perform on par with DeepSeek-R1, honestly edging it out in almost every benchmark that really does matter. It is comparable to o3-mini. We can see that on Arena Hard, it manages to get top two. On the AIME '24, it manages to get above OpenAI's o3-mini, above DeepSeek-R1, Grok 3 Beta, OpenAI o1, and the same on the AIME '25. All in all, this benchmarking is really, really indicative of just how good the model is. It really doesn't get beat here, which is remarkably surprising for an open-weight model. Usually, they're just

### Smaller Models Shown [1:22]

behind the state of the art, but this model beating it is, you know, really incredible. Now, they also do have smaller models right here that are on the level of GPT-4o at a much smaller size. So, we can see here once again Arena Hard looking very, very good, the AIME benchmarks, the math benchmarks, the coding benchmarks, Codeforces, GPQA, LiveBench. Honestly guys, it's super surprising at 30 billion parameters. And this is probably going to be absolutely incredible when you guys see what the model is able to do. Now, interestingly, they also introduced a thinking mode to the model. So, you can see the non-thinking mode right here just, you know, is stagnant. There's no improvement. But as

### Thinking Mode Explained [2:00]

the model begins to think, the abilities and intelligence of the model continue to increase. Now, I actually spent some time testing the Qwen platform. It is absolutely outstanding. And this is not a sponsored video. Honestly, when I used the platform, I was pleasantly surprised by everything it had to offer. And this is a clear example that China is set to disrupt the AI industry. So, watch this all the way to the end, because you're about to see an incredible AI assistant that really took me off guard. Okay, so if we go on over to Qwen, we can actually see that the model has a variety of different options. And honestly, I've got to say, the user interface for Qwen Chat is refreshing. And we actually have a variety of different things here that I was pleasantly surprised by. So, you can see right here that we have many different models that we can choose from. The one I'm going to be benchmarking here is the largest model, the 235B, the one that is actually on par with other state-of-the-art models. So

### Hands-on Demo [2:54]

you can see also here that we do have thinking. And so, with this dynamic thinking, what I really like about this is that it allows you to control how much it thinks. And I've seen that the option for this might be available in OpenAI's models soon, but currently that just isn't there yet. So I'm going to go ahead and benchmark this so you guys can see exactly how the model does. So with artifacts, it already comes with some pre-built ones. So for example, creating a graphical workflow overview. So I'm just going to use this prompt. I'm actually going to say: turn this into an interactive infographic. Make it visually appealing. Use CSS, JS, and HTML in a single HTML file. And so basically I just have a table here, and I basically just want to see if it can remake this. I've seen, you know, Sonnet

### Artifact Feature Test [3:34]

and Anthropic make this. So right now we can also see what the model is thinking about. And so far it seems that this model is going to be able to tackle this well. So I'm actually glad that we do have thinking, because that's going to increase the reasoning capabilities. So all I have to do now is wait for this to pop up with the artifacts feature, and then we'll actually see how this pops up right here. So I'm just going to wait for this to come up. And so here we have the artifacts feature. We can see that this one is really nice. It's basically given us some really nice flash cards. I do want to say that it did take a little bit longer than expected. I didn't expect it to take that long. You can see it did think for quite a long time about how to do this. But if I'm being honest with you guys, I don't mind if a model takes longer to think, because that just means that, you know, it's at least going to be accurate. So, you know

### Visual Comparison [4:20]

speed for me is no longer an issue when factoring in accuracy. So accuracy is the last metric, and we can see here that it managed to build this visually appealing infographic. This looks really good. Now, the thing about this that I think is most surprising is that I actually asked Claude to build this as well. And surprisingly, this is actually more visually appealing than Claude 3.7 Sonnet. Let me actually show you guys what I mean. So I actually put the same prompt into Claude. And you can see here that this infographic actually doesn't look as good as the previous one. I mean, depending on where you do want to look. I mean, this one is kind of nice. You know, it actually does have this loading up. I guess Claude's one is probably a little bit more complex in terms of, you know, the overall design. I guess you can see when you've got the elephant right there, you've got all of these different things. I mean, of course, you know, you probably could say this one is a little bit more complex, but if we're talking about purely visual aesthetics, then I don't know about you guys, but I do really like this one, because I'm easily able to see the animals and it looks

### Multimodal Test [5:18]

really, really simple to, you know, read and interact with and get that data. I mean, it's all the way green for some of them, red for others. I mean, to be honest, I would have to say that is a point for Qwen. Now, let me ask it a more interesting question. So, I'm actually going to take a look at its multimodal capabilities and I'm going to ask it to rebuild this web page in a single standalone HTML file. So, I found a random, you know, HTML web page and I'm just going to ask it if it can rebuild it. And let's just see if that actually works. And I will see exactly how this model does manage to go and how it manages to reason about this. So, it's going to be super interesting to see what it manages to come up with. And so, yeah, the model is now reasoning and I'm really intrigued to see where the model manages to end up. I think it's going to be really interesting, because this model currently is really surprising me with how good it is. Oftentimes, when there are models that are quite similar to GPT-4o or Claude 3.5 Sonnet, they often hold up in benchmark territory but not in real-world use cases. But here, this model is

### Real-World Coding [6:16]

surprising me. Now, once again, I do think this isn't the greatest design, but this is once again just HTML, so it isn't that bad all things considered. I mean, it of course doesn't look the best, but nonetheless, it still shows that the model tried to do this, which is really nice. But I do have to be honest, when we do test this with Claude, this is what Claude actually managed to come up with. So, this is pretty surprising here. And Claude's one actually works. So, I can actually enter commands and then I can actually clear the terminal with Claude's code here. So, it's actually pretty impressive for Claude, if I'm being honest with you guys. Now, I can still use the commands, actually. I actually can still use this terminal, so I can put things in. Visually, of course, it's not that appealing. And the clear terminal button doesn't actually work. So, I will say that is probably one point for Claude 3.7. So, now let's use another prompt to see what it's able to do. So, I'm going to say: write a p5.js script that simulates an ant colony searching

### Ant Colony Sim [7:14]

for food. Use pheromone trails and basic AI rules to show the ants exploring and optimizing paths. Include controls that the user can, you know, use in real time. So, I'm going to input that in. We're going to see what it's able to do. And of course, I'm going to put the same thing into Claude so we can judge between the models. And this is pretty crazy. So, here is Claude for reference. This is actually absolutely insane, what Claude has been able to do. So, it simulates this ant colony with all the pheromones and stuff. It's actually kind of making my computer lag a little bit. Uh, which is pretty crazy. So, if we refresh that, you guys can see that all these ants are going around. If we want to reduce the ant count, I can reduce it to like 10 ants. If I want to reduce the speed, I can do that. The sensor angle. I mean, there are so many different things that are really in here. Like, I can add food, you know, I can cancel food. I mean, Claude is honestly, you know, I don't want to say it's super surprising, but the fact that you can now build this with a simple prompt is absolutely insane. Like, I can add a bunch of different ants here. And you know, this just serves as a benchmark so you guys can see how it compares to Qwen 3 in terms of simple coding tasks. Of course, you know, people probably aren't using it for this on a day-to-day basis, but nevertheless, it's useful to see how models compare on creative tasks that you may use them for. Now, with Qwen 3

### Open Model Pros [8:25]

for some reason, when I actually go to full screen, it actually does mess up for some reason. I'm not sure why that is, but the artifacts feature doesn't really work that well here. I mean, if I hit preview, it should try and load, but we can see here that, you know, it just doesn't get like any details. I'm not sure why that is in this case. Maybe it's my mistake, but uh, yeah, on this example, once again, I would have to give the edge to Claude, because the examples here don't really look that good. This one is on me. I actually did reload the Qwen simulation, and this is what it proposed. So, this one was really cool. I'm not going to lie, this one was actually pretty decent. I don't know why the artifacts feature was glitching. It clearly is a little bit more bare-bones. We can see that of course we can change the number of ants. We can put, you know, 200 ants if we completely wanted to. Like, that's a ton of ants. Uh, we can put just 22 ants and it will actually just do that as well. We can, you know, increase their turn chance, so they're going to be doing a lot of different things. And then we can increase the evaporation rate, which is of course pretty nice. But um, yeah, this is uh still pretty interesting, if I'm being completely honest with you guys, in terms of the simulation factors. Of course, we just saw that Claude was, you know, monumentally better, but we do have to remember that this model is basically free and you can run this for free on your home device. So, having access to these open-source models that are this good, I think that is one of the key things that we do have to, you know, talk about here. Of course, you know, Claude is going to be a little bit, you know, higher above in terms of the, you know, quality. But for an open

### Free HTML Creation [9:49]

source model, being able to, you know, code this on demand, honestly, I think it's absolutely worth it. Now, of course, for the portfolio, this is what the website was. The only thing that I don't think it was able to generate was, unfortunately, images. So this is the only thing that it uh, you know, did have a, I guess, slight issue with. But nonetheless, for an HTML website, it is pretty basic. It's not that bad. I did actually ask Claude to do this, but I think it did get rate limited. Like, my Claude right now is currently glitched. So I will try that again. But so far, looking at this, like I said before, for a website, I mean, all of the links are currently working. It does seem like a pretty basic website, but nonetheless, for an open-source model that is able to reason effectively, I think this is worth it. I mean, you're not really paying anything. I just signed up completely for free. And with Claude, you do get rate limited pretty quickly, even in cases where, you know, this model is a little bit better. So, let's take a look at what Claude was able to do. This is Claude's design. Got to be honest, the only thing once again is it doesn't have images, but this definitely does look a lot cleaner. It looks a lot better in terms of front-end design. And I've got to be honest, what I'm currently testing now is like a mix of, you know, the ability for the code to actually work and a mix of the actual front-end design. So, I do have to say that since the model probably isn't designed with that in mind, then, you know, it does get a bit of points off for that. But so far, I have to say that it does hold its own in terms of being able to code rather effectively. And remember, this

### Open vs Paid [11:13]

model is basically free for you to run. So yeah, Claude definitely edges it out a bit here, but we're going to go on to some other benchmarks now so we can test how the model actually does perform. So I know that the model isn't really designed for this, but I actually want to come over to my AI community, come on over to the prompts area, and I'm actually going to go to the Google deep research area, because I really do like these prompts that I've made, and they're really good. So one of the prompts I have here is, you know, an emerging market gaps finder: create a comprehensive industry research report analyzing untapped market opportunities in whatever, you know, market that you're looking for. Research products and services, common customer complaints, trending problems people are willing to pay to solve, and present findings in an actual Google Doc with verifiable data sources. So, I'm going to take this prompt. And it does have a search feature. It doesn't actually have a deep research feature. It does have thinking. So, I am actually just going to try turning on thinking and search to sort of act as a deep research. And then let's actually

### Deep Research Demo [12:01]

see what this model is able to do. So, currently, you can see it is searching many different sources. So, it's also searching and thinking, and so far it does seem to be doing a lot of what I would be doing anyway. So, yeah, so far it looks pretty good. So, we can see here that it managed to search through 10 sources pretty quickly and then it gave me an executive summary. It spoke about the AI agents market that's projected to grow to $7 billion in 2025. You know, it talks about findings: niche vertical solutions, like healthcare AI agents; multimodal capabilities, where AI agents integrating text, voice, and visual input exceed supply, particularly in retail and logistics; AI financial planners, which increased 45% year-over-year, but few platforms offered this. It talks about common customer complaints, like lack of autonomy. Let's actually see if the model hallucinated anything. So I'm going to go and click through these links, because one of the things that is a key issue with LLMs is hallucinations. When you're testing a model, you really want to see where the model does trip up. And hallucinations are by far one of the most common ones. So, I want to just test maybe four of these links and see if it's currently correct. So, lack of

### Checking Accuracy [13:03]

autonomy. Let's click this link and see if it does mention lack of autonomy. So, we can see right here that it does say increased autonomy. And this is a Medium article. And so, yeah, it does actually have the relevant information right here. It does talk about, you know, um, autonomy and stuff like that. So, so far looking pretty good. Let's go ahead and check the privacy concerns. Let's see if there are any privacy concerns, because I always want to check if, you know, the model does get it right. We can see here it does actually say that, yes, data privacy, bias, and ethics are still major worries, but of course AI agents aren't going away. So far, this does look pretty good. So now, in this one, it did say that, you know, some do spend $50K per month on AI agents and others spend $500. And I don't actually see that figure here anywhere. Maybe it's somewhere in the HTML. Maybe it's in like another part of this, you know, entire thing. Maybe it's somehow managed to download the PDF. So that might be one hallucination. So of course, always do double-check whatever it is that you do click, because you do have to make sure that these values are completely right. But I will have to say that, you know, the verifiable data sources are here and it does look really nice in terms of the way that it's output the data. Now, for a free, you know, system, I keep saying that this is

### Free Deep Research [14:11]

really incredible for what I'm getting. Like, I just signed up with a random account and I'm here being able to do basically deep research for free, being able to generate code front ends, just do amazing things. Now, not only that, when we actually look at the model as well, it also embeds image generation. So I can use this prompt of a chrysanthemum flower manipulated to resemble the head of an eagle. So let's see what happens here. I mean, this animation is pretty nice so far as well. So far, that looks pretty nice. I'm not sure what image generation model they are using. I would have to say that image generation doesn't really matter that much anymore, because, you know, image generation is mainly open source now.
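As an aside on the earlier ant colony demo: the "pheromone trails and basic AI rules" that prompt asks for come down to two small mechanics — ants deposit pheromone where they walk, and the whole field decays each tick, so only paths that keep getting reinforced survive. Here is a minimal sketch in plain JavaScript (the p5.js rendering is omitted, and the grid size, decay rate, and function names are illustrative assumptions, not the model's actual output):

```javascript
const SIZE = 8;            // pheromone grid is SIZE x SIZE cells
const EVAPORATION = 0.1;   // 10% of each cell's pheromone decays per tick

// 2D pheromone field, initialised to zero
const grid = Array.from({ length: SIZE }, () => new Array(SIZE).fill(0));

// An ant standing on (x, y) deposits pheromone there
function deposit(x, y, amount = 1) {
  grid[y][x] += amount;
}

// One simulation tick: every cell loses a fraction of its pheromone
function evaporate() {
  for (let y = 0; y < SIZE; y++) {
    for (let x = 0; x < SIZE; x++) {
      grid[y][x] *= 1 - EVAPORATION;
    }
  }
}

// The "basic AI rule": an ant moves toward the neighbouring cell
// with the strongest pheromone (ties broken by scan order)
function bestNeighbour(x, y) {
  let best = null;
  let bestLevel = -1;
  for (const [dx, dy] of [[1, 0], [-1, 0], [0, 1], [0, -1]]) {
    const nx = x + dx, ny = y + dy;
    if (nx < 0 || ny < 0 || nx >= SIZE || ny >= SIZE) continue;
    if (grid[ny][nx] > bestLevel) {
      bestLevel = grid[ny][nx];
      best = [nx, ny];
    }
  }
  return best;
}

deposit(3, 3);   // an ant marks a trail cell
evaporate();     // one tick of decay leaves 0.9 there
```

Layering p5.js on top is then mostly a `draw()` loop that calls `evaporate()` once per frame, steps each ant toward `bestNeighbour(...)`, and paints each cell's brightness by its pheromone level — which is roughly what both models generated, plus the sliders.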

### Image Generation [14:46]

We've got tools like Flux, and, you know, they can just embed that into the model. So it's not really that hard. So now I've tested with a much harder prompt. Uh, people handshake, text in background says handshake. So handshake is pretty hard, because traditional AI image models struggle with hands. They struggle even harder with handshakes. And then of course, uh, behind, you know, the text, I wanted to see if it gets that right. So H A N D S H. Yeah, the text behind doesn't really look right, but um, I do want to say the handshake does look pretty fine. So, so far that one is not bad, but remember, it's free image generation, so I can't complain. You've also got different dimensions here, which I really do like. And you've also got this gallery right here, which is really nice as well. So, it gives you, you know, instant image prompts. And surprisingly, there is also video generation as well. So, surprisingly, I can use this prompt of a huge elephant leisurely drinking water by the clear river, using its long trunk to scoop up water and spray it on its body to ward off the heat. So, I'm

### Video Generation [15:39]

really surprised that they do actually have video generation here. I was not expecting that from a free open-source model. So, the video just finished. We got a 5-second video here. Let's actually play this. Uh, yeah. Okay. So, it's pretty low resolution. A little bit laggy, but at the end of the day, once again, like I'm saying, you can't really complain about something that is completely free to use, and it took around, I would say, maybe 5 minutes to generate. So, it's not that bad in comparison with other things. So, once again, decent; it gets a like from me. Now, there are many other things that you can use this model for. You know, you can, you know, have thinking, you can have artifacts, analyze an image and make a plan, news, code, video generation, search. There are just so many different things. You can even use your voice in video chat, which is pretty crazy. Like

### Voice Chat Test [16:20]

to talk about, or ask me a question. Hey, this is crazy. I didn't know there was voice chat on, you know, uh, Qwen. Yeah, it's really cool. You can have real-time conversations with others through voice chat. It makes chatting more interactive and engaging. Have you tried using it yet? Yeah, this is my first time using the uh, voice chat. So, I'm actually really surprised that you guys had this feature. It wasn't in the blog post at all. Oh, yeah. Sometimes things get missed in blog posts, but don't worry, voice chat is great for those who prefer talking over typing. If you need any tips on how to use it better, just let me know. Okay. I was wondering, can you search the internet through voice chat or no? Mm, no, I'm not able to search the internet directly through voice chat, but if you want to find something online, we could try looking it up together here. What do you think? Okay, that sounds like a good plan. I was also wondering, can you laugh or do any kind

### Voice Features [17:11]

of emotions? Well, I can't actually laugh or show emotions in the traditional sense, since I'm an AI, but I can understand when someone might be feeling happy or sad based on what they say. So, feel free to share how you're feeling. Hey, uh, what's going on? What can you do as a voice model? Well, I can do quite a few things. I can answer your questions about all kinds of topics, like work, hobbies, or general knowledge. I can also have conversations with you, just like this one we're having right now. And if you want to share something interesting, I'm here to listen. If you've got any specific questions or ideas in mind, feel free to let me know. Yeah, I was wondering, how can I make my videos better for my subscribers? I really want to make the best and most engaging content, the most thoughtful, you know, just the most insightful content surrounding AI videos ever. Well, first off, you should focus on creating high-quality visuals. Make sure they look good and are relevant to what you're talking about. For example, if it's an AI video, show some cool examples of how AI is being used. Then, write really engaging captions that give people a taste of what your video is about. And don't forget to add some background music that fits the mood.

### YouTube Tips [18:14]

Also, try to interact with your subscribers in the comment section. Reply to their questions and comments. It makes them feel more involved. And keep your videos short but sweet. People have short attention spans these days. You could also collaborate with other YouTubers who have similar audiences. That way you can reach even more people. If you want more tips, just let me know. There is always more we can talk about. So that was the voice chat. And apparently there is also a video chat. Now, I currently don't have access to this, because I don't have my phone on me at the moment. But this is something that I think is really interesting. And remember, there are other models here. We've got the 30B models that are just as good as GPT-4o and that are probably much faster in terms of being able to use them. So overall, I've got to be honest, guys, this is, you know, absolutely amazing. You've also got, you know, image analysis. Surprisingly, you've also got video analysis and audio analysis too, which is really surprising, because we don't even have audio analysis and video analysis for other key models. So, I've got to be honest, guys, this is the model that truly has surprised me. I

### Overall Verdict [19:13]

would argue that this is probably better than DeepSeek in a variety of different use cases. And I mean, when we actually come to the DeepSeek area, it only really has a few things. I mean, you've got search, you've got thinking, which is really nice, but they don't really have much other than that. That's not to hate on whatever they've done. But compared to Qwen, with uploading documents, images, videos, audio, generating images and video, having a real-time voice chat, real-time video chat, summarizing text, video generation, search, this is a really comprehensive system that is 100% free. So, if you haven't, you know, started using the model, definitely take advantage of this, because I don't know how long this model is going to be here for free. I mean, of course, you could definitely run it. Some, you know, people are going to start offering this as a service. Of course, sites like Poe, OpenRouter, they're going to, you know, offer this as a service. But I'm talking about the fact that, you know, having a voice chat currently is completely free. I just do think that once this does go viral, these servers may actually get overloaded. So, definitely try this out while you still can, and let me know what you guys think about this. Do you guys think I'm overreacting or do you think this is a really good
