DeepSeek V3.2 Just Beat Gemini 3.0 Pro… Seriously!
8:22


Universe of AI · 06.12.2025 · 5,663 views · 118 likes · updated 18.02.2026
Video description
DeepSeek V3.2 is here, and in several benchmarks it actually beats Gemini 3.0. In this breakdown, I walk through what makes this model such a big leap for open-source AI, show real demos of reasoning and agent behavior, and go over the benchmark results that put V3.2 ahead in multiple categories! For hands-on demos, tools, workflows, and dev-focused content, check out World of AI, our channel dedicated to building with these models: @intheworldofai

🔗 My Links:
📩 Sponsor a Video or Feature Your Product: intheuniverseofaiz@gmail.com
🔥 Become a Patron (Private Discord): /worldofai
🧠 Follow me on Twitter: /intheworldofai
🌐 Website: https://www.worldzofai.com
🚨 Subscribe To The FREE AI Newsletter For Regular AI Updates: https://intheworldofai.com/
🔔 Subscribe for simple, high-quality AI breakdowns every week.

#deepseek #deepseekai #gemini3 #gemini3pro #gpt5 #gpt51 #ai #artificialintelligence #opensourceai #aimodels #machinelearning #technews #aiupdate

DeepSeek V3.2, DeepSeek AI, DeepSeek vs Gemini, DeepSeek vs GPT, DeepSeek V3.2 demo, DeepSeek V3.2 benchmark, Gemini 3.0, Gemini 3 Pro, Gemini AI, Gemini vs DeepSeek, Gemini vs GPT, GPT 5, GPT-5 High, GPT vs DeepSeek, GPT vs Gemini, AI model comparison, AI benchmark, AIME benchmark, HMMT benchmark, GPQA, ICPC AI, open source AI models

Table of contents (2 segments)

  1. 0:00 Segment 1 (00:00 - 05:00) 855 words
  2. 5:00 Segment 2 (05:00 - 08:00) 632 words
0:00

Segment 1 (00:00 - 05:00)

Today, we're breaking down DeepSeek V3.2, the newest open-source model that's making a serious attempt to catch up with models like GPT-5, Claude 4.5, and Gemini 3 Pro. And here's the thing: this release isn't just a bit better. DeepSeek built a new attention system, scaled reinforcement learning massively, and even created over 18,000 custom environments to teach the model how to use tools like a real agent. The paper is extremely technical, but I'm going to make it super simple, and later I'll show you a live demo of how the model handles both complex reasoning and tool use.

Let's start with why DeepSeek built this in the first place. According to the paper's introduction, open models were improving, but closed models like GPT-5, Gemini 3 Pro, and Claude 4.5 were improving faster. Instead of closing, the gap was actually getting wider, and DeepSeek identified three reasons why. Number one: attention was getting too expensive. Long-context models pay a huge cost because they look at every token every time. Number two: not enough RL compute. Open models weren't receiving enough reinforcement learning to develop deep reasoning like the closed-source models. Number three: weak agent behavior. Open models struggle with tool use, multi-step tasks, debugging, searching, planning, and so on. What this means for you and me is that you couldn't rely on open models for things like researching online, fixing code, verifying their own answers, solving Olympiad-level math, or planning step-by-step tasks. So DeepSeek V3.2 is an attempt to fix all of that at once.

The big idea behind DeepSeek V3.2 is how they've changed the architecture to include something called DeepSeek Sparse Attention, or DSA for short. To understand it in the simplest way: most models treat every part of the input as equally important. DeepSeek asked, what if the model could figure out what really matters and focus only on that?
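To make that concrete, here's a toy sketch of top-k sparse attention in NumPy: score every token cheaply, then run full attention over only the k highest-scoring ones. This illustrates the general technique only; it is not DeepSeek's actual DSA implementation, which uses a trained indexer inside the model.

```python
import numpy as np

def topk_sparse_attention(q, K, V, k=8):
    """Toy sparse attention: attend to only the k most relevant tokens.
    Illustrative sketch of the idea, not DeepSeek's DSA kernel."""
    scores = K @ q                              # cheap relevance score per token
    keep = np.argsort(scores)[-k:]              # indices of the k best tokens
    s = (K[keep] @ q) / np.sqrt(q.shape[0])     # scaled attention logits
    w = np.exp(s - s.max())
    w /= w.sum()                                # softmax over selected tokens only
    return w @ V[keep]                          # weighted sum of their values

rng = np.random.default_rng(0)
K = rng.normal(size=(300, 16))                  # 300 "pages" of context
V = rng.normal(size=(300, 16))
q = rng.normal(size=16)
out = topk_sparse_attention(q, K, V, k=8)       # reads 8 tokens instead of 300
```

The payoff is the cost of the softmax and value mixing: it scales with k instead of with the full context length, which is the "flip to the highlighted paragraphs" effect described above.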
Instead of reading everything, the model learns to prioritize. Imagine you ask someone to summarize a book. One person rereads all 300 pages every time; the other flips straight to the highlighted paragraphs. V3.2 is the second person. This one idea makes the model faster, more efficient, better at long context, and more capable of tackling multi-step reasoning. You don't need to understand the mechanics, just the outcome: the model now wastes less time and brain power, so it can spend more energy on actually solving the problem at hand.

DeepSeek didn't just train the model normally, they coached it. They created specialist versions of the model: one trained heavily on math, one on coding, one on logic, one on search, and one on agent tasks. Basically, a team of tutors, each teaching the model a different skill. Then DeepSeek merged the best parts of all of these specialists into a single unified model. This is why V3.2 feels more logical, more structured, and better at explaining its reasoning. It wasn't just trained, it was coached like a student preparing for an exam.

Older models had a terrible habit: every time they used a tool, ran code, searched the web, whatever, they forgot what they were thinking, so they had to restart their reasoning from scratch over and over again. DeepSeek fixed that. Now the model keeps its reasoning across tool calls, remembers previous steps, doesn't repeat itself, and doesn't lose track halfway through a task. Think of it like solving a math problem: you don't throw away your notes every time you pick up a calculator. That's what V3.2 does, and that single change makes multi-step tasks dramatically smoother.

So, now that we've covered the major upgrades at a high level, let's jump into what you really want to see: how this model actually performs. We're going to run two demos, one on reasoning and one on tool use, which ties directly into the agent improvements we just discussed.
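The reasoning prompt used in the demo is a standard kinematics problem, so its answer can be checked independently before watching the model attempt it. A quick SymPy verification (my own sanity check, not output from the video): integrate acceleration twice and fix the constants from the two given conditions.

```python
import sympy as sp

t, C1, C2 = sp.symbols('t C1 C2')
a = 3*t**2 - 12*t + 9                  # given acceleration a(t)
v = sp.integrate(a, t) + C1            # velocity is the integral of acceleration
x = sp.integrate(v, t) + C2            # position is the integral of velocity
sol = sp.solve([v.subs(t, 0) - 4,      # condition: v(0) = 4
                x.subs(t, 2) - 10],    # condition: x(2) = 10
               [C1, C2])
x = x.subs(sol)
print(x.subs(t, 4))                    # -> 20
```

Integrating gives v(t) = t³ − 6t² + 9t + 4 and x(t) = t⁴/4 − 2t³ + 9t²/2 + 4t − 4, so the boxed answer the model should reach is x(4) = 20.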
Here's a reasoning-style problem that matches the kind of evaluation DeepSeek used in the paper. The prompt is: a particle moves along a line with acceleration 3t² − 12t + 9. At time 0, the velocity is 4. At time 2, the position is 10. Find its position at time 4, show your steps, and put the final answer in a box. What I want you to pay attention to is how the model breaks the problem into steps, structures the solution clearly, and finishes with a clearly boxed final answer.

Now that you've seen how DeepSeek V3.2 handles reasoning, let's look at where this model really stands out: its ability to use tools like a proper AI agent. Earlier in the video, we talked about how V3.2 finally stops forgetting its thought process every time it calls a tool. This demo is a perfect example of what that unlocks. The prompt I'm giving the model is to plan a 3-day itinerary for Chicago: budget about $350 per day, include at least one museum and at least one restaurant each day, every restaurant must have a rating above 4.2, and no repeating restaurants or attractions. Use tools when needed and explain each step as you go. Let's see it in action.
5:00

Segment 2 (05:00 - 08:00)

You can see that the model is currently searching for attractions, looking at dining options and museums, using its online-search tool. The model starts by searching online based on the constraints I gave it, which is already a good sign: it's setting up the rules it needs to follow before beginning the search. The tool-calling ability of DeepSeek is genuinely impressive here. It read 10 web pages and summarized the information in a clear and interesting way. It respected my daily budget constraint and made sure every restaurant it highlighted was above the 4.2-star benchmark. It also included a museum to visit on each day and gave me an estimated daily cost: $310 for the first day, $345 for the second, and $340 for the third. It also gave me a detailed daily itinerary depending on what I want to do: day one is about downtown icons and art, day two is about history, parks, and city views, and day three is about architecture and scenic neighborhoods. It also tells me I can maximize my budget by using public transit and looking for free days, since many museums offer free admission on certain days. And that's the finished itinerary: three days, each one within budget, with unique attractions and at least one high-rated restaurant, all built through multiple tool calls without breaking its reasoning chain. This is exactly the kind of agent behavior DeepSeek was targeting with the thousands of practice environments we saw earlier in the paper, and it's noticeably smoother, more organized, and more reliable than older open models that tried to act like agents. Before we end off, let's take a quick look at the benchmarks.
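One last note on the agent demo: the prompt's constraints are mechanical enough to validate in code. Here's a hypothetical checker for them; the day structure, venue names, and ratings below are invented for illustration (only the $310/$345/$340 daily costs come from the demo), and this is not what DeepSeek actually runs.

```python
def check_itinerary(days, budget=350.0, min_rating=4.2):
    """Validate the demo's constraints: daily budget, a museum every day,
    restaurant ratings above the threshold, and no repeated venues."""
    seen = set()
    for day in days:
        assert day["cost"] <= budget, "daily budget exceeded"
        assert any(s["type"] == "museum" for s in day["stops"]), "no museum"
        for stop in day["stops"]:
            if stop["type"] == "restaurant":
                assert stop["rating"] > min_rating, "restaurant rated too low"
            assert stop["name"] not in seen, "repeated attraction/restaurant"
            seen.add(stop["name"])
    return True

# Daily costs from the demo; venues and ratings are made up for illustration.
demo = [
    {"cost": 310, "stops": [{"type": "museum", "name": "Museum A"},
                            {"type": "restaurant", "name": "Restaurant A", "rating": 4.4}]},
    {"cost": 345, "stops": [{"type": "museum", "name": "Museum B"},
                            {"type": "restaurant", "name": "Restaurant B", "rating": 4.5}]},
    {"cost": 340, "stops": [{"type": "museum", "name": "Museum C"},
                            {"type": "restaurant", "name": "Restaurant C", "rating": 4.6}]},
]
assert check_itinerary(demo)
```

An agent harness could run a checker like this after each plan revision, turning the soft prompt constraints into a hard pass/fail signal.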
On the left, you've got GPT-5 High and Gemini 3 Pro, the top frontier models, but look at the two DeepSeek columns on the right. DeepSeek V3.2 Thinking is already competitive across almost every reasoning task. But the real story is the Special version: on AIME, HMMT, and IMO-Bench, and even Codeforces ratings, Special actually matches or beats Gemini 3 Pro and gets surprisingly close to GPT-5 High. For an open model, this is wild; these are scores we've never seen before from anything outside the closed labs. And it gets even crazier when you look at competition performance. DeepSeek V3.2 Special won gold across the board: gold on the International Math Olympiad benchmark, gold on the Chinese Math Olympiad, gold on the Informatics Olympiad, and even gold-level performance on ICPC, which is basically the World Cup of competitive programming. This is the first open model to ever hit gold-medal level across all these categories. So the takeaway is simple: DeepSeek V3.2 isn't just good, it's historically good for open source.

If you enjoyed this video, this is what we do here: fast, clear updates on the biggest moves in AI. If you want to stay ahead of everything happening in this space, make sure you're subscribed. And if you want the hands-on side, demos, tools, workflows, and everything developers can actually build, check out World of AI. We also run a simple, no-noise newsletter that gives you the most important AI tools and updates in just a couple of minutes. Subscribe here, follow World of AI, and join the newsletter.
