# DeepSeek V4 LEAKED: A Coding-First Model That Changes Everything!

## Metadata

- **Channel:** Universe of AI
- **YouTube:** https://www.youtube.com/watch?v=GOrih2V9DUM
- **Date:** 13.01.2026
- **Duration:** 9:11
- **Views:** 15,553
- **Source:** https://ekstraktznaniy.ru/video/9520

## Description

DeepSeek V4 is rumoured to launch around the Spring Festival — and it may introduce a new architecture built for coding-first performance.

In this video, I break down the leaks, the Engram memory system, what’s confirmed vs reported, and why DeepSeek V4 could change how AI models handle reasoning and long-form code.

Sources:
1) https://www.theinformation.com/articles/deepseek-release-next-flagship-ai-model-strong-coding-ability
2) https://news.aibase.com/news/24467
3) https://x.com/chetaslua/status/2011013228130943165/photo/1

For hands-on demos, tools, workflows, and dev-focused content, check out World of AI, our channel dedicated to building with these models: @intheworldofai

🔗 My Links:
📩 Sponsor a Video or Feature Your Product: intheuniverseofaiz@gmail.com
🔥 Become a Patron (Private Discord): /worldofai
🧠 Follow me on Twitter: https://x.com/UniverseofAIz
🌐 Website: https://www.worldzofai.com
🚨 Subscribe To The FREE AI Newsletter For Regular AI Updates: https://inth

## Transcript

### Segment 1 (00:00 - 05:00)

DeepSeek is back in the conversation. According to multiple leaks and insiders, DeepSeek is preparing to release DeepSeek version 4 around the Spring Festival, likely mid-February, and internal tests suggest it could outperform GPT and Claude in coding. But here's the thing: this isn't just about benchmarks. This looks like a fundamental architectural shift. Today, I want to walk you through how DeepSeek got here, what's actually leaked about version 4, the new Engram architecture, and why this could quietly be one of the most important model releases of 2026. So, let's get into it.

Before talking about version 4, it's important to understand DeepSeek's pattern, because they don't release models randomly. They've been very deliberate. It really starts with DeepSeek version 2. Version 2 didn't really shock people by beating GPT-4. What surprised everyone was efficiency. The version 2 model came with something called MLA, multi-head latent attention, which showed that you could get strong reasoning performance without brute-forcing scale. That was the first sign DeepSeek cared less about headlines and more about how models actually work.

Then came DeepSeek version 3. This is where things started to get a bit more serious. Version 3 leaned heavily into mixture of experts, or MoE, not as a research demo but as something practical. The standout moment here was simple: excellent coding and reasoning at a fraction of the cost compared to the large-scale models. Quietly, version 3 became one of the best open coding models available, especially for long sessions. A lot of developers started using it without even talking about it.

Then, just before last Spring Festival, DeepSeek dropped R1. R1 was different. It wasn't a general model. It was a reasoning-first model: long chains of thought, structured problem solving, deeper logic, almost like DeepSeek saying, before we scale further, let's understand reasoning. That's important, because version 4 doesn't feel like version 3 but bigger. It feels like everything converging.

So what do we actually know about DeepSeek version 4? First, the timing. Multiple insiders point to a Spring Festival release, likely by mid-February. That lines up perfectly with DeepSeek's history. When they release around this time, it's usually intentional, and it's kind of a statement. Second is the structure. Leaks suggest two versions: the first being the version 4 flagship model, optimized for long, heavy coding sessions, and the second, version 4 Lite, focused on speed and responsiveness. That alone is telling. It suggests that DeepSeek is designing around real usage patterns, long-form builders versus fast interactive users, not just chasing one benchmark number alone. Third, and this is the headline: coding-first performance. Internal tests reportedly show version 4 outperforming Claude and ChatGPT in certain coding dimensions, especially long code generation, multi-file reasoning, and maintaining structure over time. If that is true, the implications are big, because an open, cost-efficient, coding-first model changes who can build serious software with AI. And there's one more rumor that matters a lot. There are strong signals that version 4 may no longer separate reasoning and general models, meaning what DeepSeek learned from R1 could now be baked directly into the flagship model. If that's the case, version 4 isn't just better at coding, it's better at thinking while coding. And that brings us to the architecture behind all of this.
Recently, DeepSeek released a paper called Conditional Memory via Scalable Lookup. This paper introduces something called Engram, and this is likely the secret sauce behind version 4. Here's the core idea in plain language: stop forcing models to memorize everything. Most modern models, especially MoE models, mix logic, reasoning, and factual knowledge inside the same expert layers, and that creates tension. The model is constantly balancing remembering facts versus actually reasoning. Engram separates those roles. You could think of it like a cyborg brain. One part of the system handles dynamic computation: logic, semantics, planning, reasoning, code structure. The other part handles static memory: massive knowledge storage, retrieved only when needed, no reasoning, just recall. And here's where it gets wild. Engram allows O(1) lookup into massive memory tables, even billion-parameter ones, stored in CPU RAM, not GPU VRAM. That means almost zero extra GPU cost, huge knowledge capacity, faster inference, and cheaper deployment. It's like giving the model an external hard drive, and finally letting the GPU do

### Segment 2 (05:00 - 09:00)

what it does best. I think this architecture is especially powerful for coding. Most coding models struggle with two things: staying coherent over long sessions and not getting overloaded by memorized syntax and APIs. Engram changes that dynamic. Instead of memorizing everything, the model retrieves facts, reasons about structure, and plans before writing. That's exactly what you want for multi-file projects, refactoring, long coding prompts, and complex logic. This also explains why DeepSeek is rumored to be positioning version 4 as a coding-first whale: not just flashy demos, but raw, sustained capability.

Now, let's talk about benchmarks. And I want to be very clear about what's confirmed versus what's reported. First, the confirmed part. In DeepSeek's newly published Engram paper, they test long-context performance head-to-head against a standard 27-billion-parameter baseline model, and the result is consistent: Engram matches or beats the baseline while using less training compute, and when training conditions are equal, it outperforms across nearly every long-context metric. On document-level perplexity, books, papers, and code, Engram holds par or improves. On RULER, which stresses long-context reasoning, memory, and structure, Engram shows clear gains across multi-hop reasoning, symbolic tasks, and long-range question answering. This is the real difference: it's published, and it already shows that separating memory from reasoning works.

Now, the second part is reported internal testing. These aren't public leaderboard results, but according to multiple industry sources and Chinese AI community summaries, DeepSeek has seen meaningful internal improvements when comparing Engram-integrated models against previous DeepSeek baselines. Those reported gains come in the form of noticeable improvements on reasoning-style evaluations similar to BBH, moderate but consistent gains on math-focused tasks, and stronger performance on coding evaluations, particularly in long-context and multi-file settings. These aren't official benchmarks yet, but what they do indicate is that the model is getting better. And when you line those reports up with the published Engram results, the story stays consistent: longer context holds together better, reasoning stays stable deeper into the prompt, and coding benefits from not overloading the model with memorized facts. That's why these leaks are being taken seriously. Not because of one number, but because the evidence and the rumors point in the same direction.

Now, let's be real for a second. Not everyone is convinced yet. Some of the leak language is vague, and there's still no official confirmation from DeepSeek. But here's why the story still matters. DeepSeek has a very clear pattern, especially around the Spring Festival. They also just published a real, technically serious paper that lines up almost too well with what's being rumored. And historically, DeepSeek doesn't overhype. They ship, and then people realize what just happened. So, this doesn't feel like random noise. If DeepSeek version 4 actually lands with Engram-style memory, integrated reasoning, and a coding-first focus, this isn't just another strong model. It's a completely different way of thinking about how models should work. You stop forcing one network to memorize everything. You separate memory from reasoning, and you get longer coherence, better planning, and cheaper inference all at once. And if that scales, it doesn't just make V4 interesting. It forces everyone else, OpenAI, Anthropic, Google, to respond.
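To make the lookup idea described above concrete, here is a minimal PyTorch sketch of the general pattern: a large, static knowledge table kept in CPU RAM and queried by index, with only the retrieved rows copied to the GPU, where the reasoning path runs. This is not DeepSeek's actual implementation (which is not public); the class names, table size, modulo "hashing", and shapes are illustrative assumptions only.

```python
# Sketch of the "conditional memory via scalable lookup" pattern described above.
# NOT DeepSeek's implementation: names, sizes, and the slot-hashing scheme are assumptions.
import torch
import torch.nn as nn


class EngramStyleMemory(nn.Module):
    """A static knowledge table kept in CPU RAM; only retrieved rows move to the GPU."""

    def __init__(self, num_slots: int = 1_000_000, dim: int = 256):
        super().__init__()
        # The table never leaves the CPU, so it costs no GPU VRAM.
        self.table = nn.Embedding(num_slots, dim)
        self.table.weight.requires_grad_(False)  # pure recall: no gradients, no reasoning
        self.num_slots = num_slots

    def forward(self, token_ids: torch.Tensor, device: torch.device) -> torch.Tensor:
        # O(1) lookup per token: map ids to table slots (modulo stands in for a real hash),
        # gather rows on the CPU, then copy only those rows to the GPU.
        slots = token_ids.to("cpu") % self.num_slots
        recalled = self.table(slots)                   # CPU gather
        return recalled.to(device, non_blocking=True)  # small transfer, never the whole table


class ReasoningBlock(nn.Module):
    """Stand-in for the dynamic path: logic, planning, code structure."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.mix = nn.Sequential(nn.Linear(2 * dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, hidden: torch.Tensor, recalled: torch.Tensor) -> torch.Tensor:
        # Reasoning consumes recalled facts instead of having to memorize them itself.
        return self.mix(torch.cat([hidden, recalled], dim=-1))


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    memory = EngramStyleMemory()            # lives in CPU RAM
    reason = ReasoningBlock().to(device)    # lives on the GPU (if available)

    token_ids = torch.randint(0, 50_000, (2, 16))       # toy batch of token ids
    hidden = torch.randn(2, 16, 256, device=device)     # toy hidden states
    out = reason(hidden, memory(token_ids, device))
    print(out.shape)  # torch.Size([2, 16, 256])
```

The point of the split is that the GPU only ever sees the handful of rows actually recalled for the current tokens, which is why the video frames the memory side as an "external hard drive" for the model.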
The exciting part: we don't have to wait long. If version 4 drops mid-February, I'll run the real benchmarks, test the coding claims, compare costs, and see whether this actually lives up to the hype or not. If you want that breakdown the moment it drops, you know what to do: subscribe to the channel. We do real tests, not just headlines. Make sure you're also subscribed to World of AI, and don't forget to check out our newsletter for deeper breakdowns you won't see on YouTube. And I'm growing my Twitter following, so make sure you follow me on Twitter as well. Hope you guys enjoyed today's video, and I'll see you in the next one.
