# DeepSeek's New V3 UPGRADE Just Changed Everything... (DeepSeek-V3-0324)

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=xk9Tsbr5ZDg
- **Date:** 26.03.2025
- **Duration:** 13:08
- **Views:** 16,750
- **Source:** https://ekstraktznaniy.ru/video/13159

## Description

00:00 DeepSeek V3 Released
00:22 Update & Benchmarks
01:11 MMLU & GPQA
02:04 Math Benchmark Leader
03:09 Coding Benchmark Insights
05:03 Community Testing Results
06:38 Intelligence Index Results
09:20 Web Development Boost
10:00 Impressive 3D Game
12:03 Industry Impact

Join my AI Academy - https://www.skool.com/postagiprepardness 
🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Checkout My website - https://theaigrid.com/


Links From Today's Video:


Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos.

Was there anything I missed?

(For Business Enquiries)  contact@theaigrid.com

Music Used

LEMMiNO - Cipher
https://www.youtube.com/watch?v=b0q5PR1xpA0
CC BY-SA 4.0
LEMMiNO - Encounters
https://www.youtube.com/watch?v=xdwWC

## Transcript

### DeepSeek V3 Released [0:00]

So, China has done it again. They've released a new model, or really a new update to DeepSeek, that completely changes the game. What I'm referring to is the DeepSeek V3 minor upgrade. There was this message I saw floating around on Twitter, and apparently it's from the DeepSeek WeChat, which is of course the app they use in China to communicate. Now, crazily, we haven't

### Update & Benchmarks [0:22]

seen any real announcement from the DeepSeek team, as they usually tweet all of their updates on Twitter. This time, however, with the update to DeepSeek V3, they haven't posted anything, leading some to speculate that this isn't actually DeepSeek. However, we're pretty sure that it is, because the update is truly incredible and the improvements are one of a kind. When you take a look at the benchmarks, you'll start to understand that AI is rapidly moving in the direction of decreasing prices yet increasing model performance, which is a great thing for consumers, but not for these giant AI tech companies. So, of course, many people will want to look at the benchmarks. And immediately upon looking, you can instantly see that between DeepSeek V3 and the updated

### MMLU & GPQA [1:11]

version, DeepSeek V3 0324, there is a stark difference. Now, they've actually spoken about which things they've managed to update, such as the MMLU, where they've added about five points. You can see that on the MMLU Pro they went from 75 all the way up to 81, which is really good. There's also the GPQA, where there was a very nice jump from 59.1 all the way to 68.4, putting it on par with GPT-4.5. The same goes for the MMLU, where it's very near GPT-4.5 and, if not surpassing, at least matching Claude 3.7 Sonnet. Now, the math benchmark is pretty crazy, because this is the first time we see a model from DeepSeek absolutely surpass everything else on the market. Right

### Math Benchmark Leader [2:04]

here, it gets a 94, compared to the other models, which simply don't come near it. And on the AIME benchmark, we can see there's a plus-19% gain, which is absolutely huge. It's important to know that what you're looking at here is, of course, non-reasoning models. So you won't see models like R1, Claude 3.7 thinking, or OpenAI's o3. You're just seeing models that are frozen, I guess you could say, because it would be unrealistic to compare a model that thinks for extra time against those frozen models. That's why you don't see them in this benchmark. On that AIME run, you can also see that China, or should I say DeepSeek, is focusing heavily on the math area. Interestingly enough, I did see that LiveCodeBench is up to 49.2. But I would say that with that benchmark there may be some caveats, because one thing I've realized about the coding benchmarks is that although some AI models have managed to get better on certain benchmarks, for the longest time it doesn't seem that any AI model has surpassed Claude in terms of its coding

### Coding Benchmark Insights [3:09]

ability. So, I will be intrigued to see if this actually does stack up and if people do switch to this. I think that would be a real incredible moment, because Claude 3.7 Sonnet, and Claude 3.5 Sonnet before it, has such a chokehold on the AI community when it comes to coding. Now, if you don't believe the official numbers, one of the things I do like about the AI community is that they often run their own benchmarks to see what a model is doing for them. Personally, I think this is the most useful thing you can do, because you can really gauge exactly where a model lies for your own use cases. Someone did this themselves, and I will get into that in a moment. But if we take a look at the benchmarks released by other organizations, we can see the Aider Polyglot benchmark right here. DeepSeek V3 is actually number two in terms of the model being able to get a high score relative to the cost. And this is really important, because once again this small update pushes the model just that little bit further, to the point where people may now use this model over other models. You can see it surpasses o3-mini (medium) and the previous version of V3, and the only models in front of it are DeepSeek R1, which is of course a thinking model, Claude 3.7 Sonnet without thinking, and Claude 3.7 Sonnet with thinking. So really only one non-thinking model is ahead of it on this benchmark. And if you're wondering what on earth the Aider Polyglot benchmark is, I didn't even explain that. Well, it consists of 225 of the hardest coding exercises from Exercism, specifically selected to provide a strong coding challenge to LLMs, and it covers six popular programming languages: C++, Go, Java, JavaScript, Python, and Rust. So

### Community Testing Results [5:03]

this one is designed to be more representative of real-world programming scenarios, and we can see it actually manages to perform pretty well. Now, like I said before, some individuals have opted to look beyond the hype and see if the models are actually good for their own use cases. One user said it has a huge jump on all metrics in all tests, and it's now the best non-reasoning model, dethroning Sonnet 3.5. We can see right here that DeepSeek V3 0324 managed to get higher scores than the previous models in their tests. But I also see that Qwen 32B managed to get a higher score, so it will be interesting to learn what kind of model that was, whether it was some fine-tuned version, or whether there were things on that benchmark that DeepSeek V3 did excel at. Either way, the model is performing really well here on all the tests. They also did some coding tests, and the model performs decently well compared to these other models. So, it is quite surprising that the model is doing so well. Now, another benchmark you do want to see is the Cursor LLM arena, the real-world coding benchmark. We can see that it's number two once again, just behind Claude 3.7 thinking and Claude 3.5 Sonnet, and it actually edges out Claude 3.7 Sonnet. One of the most surprising things about this model is that it's actually quite good at coding. Now, like I said already, I don't know if this is going to dethrone Claude in Cursor just yet, but it will be interesting to see what people have

### Intelligence Index Results [6:38]

created. Now, we can also see that Artificial Analysis ran their own benchmark, the Intelligence Index, which incorporates seven evaluations spanning reasoning, knowledge, math, and coding. This model has jumped pretty much to first place among non-reasoning models, which, like I said already, is a huge surprise considering the size of models like GPT-4.5. GPT-4.5, as many of you may know, was referred to as a chunky model, given its huge size in terms of the data it contains and potentially the number of parameters it uses. So having a much smaller model surpass it may be an indication that the model was trained on some of the benchmarks, maybe. Or maybe it's just a more effective model. Either way, it's going to be really interesting to see what this does on the LM Arena, because I think that is going to be one of the most useful benchmarks for judging real-world use cases. Artificial Analysis actually said this is the craziest launch, because it's more impressive than R1 and potentially indicates that R2 is going to be a significant leap forward. You can see they said DeepSeek are not just releasing the best open-source model, they're now driving the frontier of non-reasoning open-weights models, eclipsing all proprietary non-reasoning models, including Gemini 2.0 Pro, Claude 3.7 Sonnet, and Llama 3.3 70B. And when we look at the index broken down by model type, we can see that the best non-reasoning model according to several benchmarks is now probably Claude 3.7 and, of course, DeepSeek V3.
So, this one is super interesting, and I think it's really impressive for this company to come out of the gate and do this, considering that previous iterations of models would take months, if not years, to produce and release, and they would be a lot more expensive than what we're getting right now. There are many inference providers that are actually providing access to this model for free. And once again, the age-old saying of intelligence too cheap to meter is ringing very true. Now, when it comes to reasoning models, we can see that R1 is actually surpassing Claude Sonnet thinking in several areas, though of course we know that Claude is focused on coding. Nonetheless, if this is just V3, many people are speculating that R2 is going to be even more impressive, and that the next thinking model could genuinely leapfrog the Western AI industry as a whole. And I do really wonder what the

### Web Development Boost [9:20]

AI industry would think if that is the case. Now, they also talked about the fact that they've introduced some front-end web development improvements, such as improving the executability of the code and producing more aesthetically pleasing web pages and game front ends. I looked at a few tweets, and we can see that one person here said this has made a world of difference. If we look at the web pages here, they look really, really smooth and really, really effective. And of course, this model managed to generate better code. We can see that this user here, who tweets a lot about AI papers, said to DeepSeek V3, "Make a cool Three.js game." So, we can see exactly what this looks

### Impressive 3D Game [10:00]

like when we play this video. And we can see it makes a really cool, interesting 3D game where you shoot objects that are coming at you. So, overall, the model's coding ability, being able to actually write code that performs and runs well in certain scenarios, is something the model can do, and the code isn't just for show. This is certainly a game-changing thing, because it means that many individuals who previously may not have been able to afford the frontier models will now have access to a frontier model for a fraction of the cost. And like I said before, the main question is: when people start switching to these models, what will happen to these other companies, and how will they integrate this into their future plans?

There was also this one, where someone managed to generate a Hugging Face code generator. This user, Paul Pandandy, said, "Do a water molecule simulation with DeepSeek V3 0324 via any chat," prompting it to create an interactive simulation that shows water molecules forming and breaking hydrogen bonds, complete with a temperature slider. And we can see that it manages to do this rather effectively. There are some minor caveats, but overall it does it pretty well.

Another person said it's so over: DeepSeek V3 just dropped and created this website in one shot. It wrote 800-plus lines of code without breaking once. And honestly, if you asked me who created this website, I would not have guessed it was created by AI in a one-shot scenario. So, this is something that, once again, I do think is rather surprising.

Overall, like I said already, this is going to have some far-reaching implications. And in my next video on OpenAI, I'm actually going to talk about how OpenAI have kind of secretly revealed that they're going to be making some changes at the company thanks to the DeepSeek models.
And I think it's going to change the AI industry as a whole, especially for us customers, in a way that people just haven't seen yet.
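The water-molecule prompt mentioned above boils down to one simple rule: hydrogen bonds break more often as temperature rises and form more often as it falls. The model's actual generated code isn't shown in the video, but as a rough illustration of that rule (all names and probabilities here are hypothetical, not taken from the demo), a minimal sketch might look like:

```python
import random

def step_bonds(bonded, temperature, rng):
    """Advance the toy simulation by one step.

    bonded: list of booleans, one per molecule pair (True = H-bonded).
    temperature: 0.0 (cold) .. 1.0 (hot). Hotter means bonds break
    more often; colder means bonds form more often. The probabilities
    are purely illustrative, not physical.
    """
    p_break = 0.1 + 0.8 * temperature  # hot -> bonds break often
    p_form = 0.9 - 0.8 * temperature   # cold -> bonds form often
    out = []
    for is_bonded in bonded:
        if is_bonded:
            out.append(rng.random() >= p_break)  # bond may break
        else:
            out.append(rng.random() < p_form)    # bond may form
    return out

def bond_fraction(n_pairs, temperature, steps=200, seed=0):
    """Run the toy simulation and return the final bonded fraction."""
    rng = random.Random(seed)
    bonded = [False] * n_pairs
    for _ in range(steps):
        bonded = step_bonds(bonded, temperature, rng)
    return sum(bonded) / n_pairs
```

Dragging the demo's temperature slider corresponds to changing the `temperature` argument here: at low temperature most pairs settle into the bonded state, and at high temperature most end up unbonded, which is the behavior the prompt asked the model to visualize.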

### Industry Impact [12:03]

Now, there was also this one, where you can see that the model is significantly better at front-end coding. In the previous version, when you wanted to create a simple game, the pages just did not look that good in terms of the HTML. On this side, however, we can see they look a ton better. And overall, if you guys do want to use the model, you can use a website like Poe. This is definitely becoming one of my favorite sites for trying new models, because they don't impose hard limits; the only limit is credits, which I can easily top up, and they usually deploy state-of-the-art models within just a few hours. So, I'm actually really excited to be using this website. And this is not a sponsored video or anything; it's just an easy way to use these models in a simple chat user interface. So, with that being said, if you guys have enjoyed the video, let me know what you think about DeepSeek. Do you think OpenAI are panicking? I actually know that they aren't; they're doing a smart pivot, which I'll talk about in the next video. But what do you guys think about this model? Have you used it? And the last thing I'll say is that I cannot wait to see the LMSYS Chatbot Arena leaderboard, because that is probably going to be the clearest indication of where the model ranks in terms of actual day-to-day usability.
