# New DeepSeek Research - The Future Is Here!

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=fFL7la73RO4
- **Date:** 04.02.2026
- **Duration:** 12:35
- **Views:** 289,930

## Description

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers
I use DeepSeek there by running an instance with enough GPU VRAM and using Ollama.

📝 The #DeepSeek paper is available here:
https://arxiv.org/abs/2501.12948

Sources:
https://x.com/awnihannun/status/1883276535643455790
https://x.com/bcjordan/status/1886825587097878826
https://x.com/izag82161/status/1906347576204640514

Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi
 
My research: https://cg.tuwien.ac.at/~zsolnai/

## Contents

### [0:00](https://www.youtube.com/watch?v=fFL7la73RO4) Segment 1 (00:00 - 05:00)

Another long video here, so you fellow scholars know something is going on. Okay, so DeepSeek did something huge. I think for the first time ever, we might have the full recipe to create ChatGPT-like intelligence, and it is out there in the open, for free, for everyone. Their new work, I think, might be the gold standard for open-source releases. Look, a year ago they published a 20-page paper, and now, a year later, they have extended it to 80 pages. And this is not some filler material. This is gold.

Why does that matter? You see, OpenAI keeps important parts of the ChatGPT recipe secret. We all know that it can do incredible things: get a gold medal at the Biology Olympiad, or study with you. It passes the bar exam, then looks at a screenshot and writes an app for you that looks just like it. It's a crazy world. Now, OpenAI publishes some research papers about their techniques, but for me, some of these feel more like marketing documents. They don't contain nearly enough information to be reproduced. Check this out: "Given the competitive landscape, this report contains no further details about the architecture, hardware, training compute, dataset construction or training method." And this is not a media article criticizing OpenAI. These are their own words in their GPT-4 paper. So, OpenAI in general is not very open. But, you know, here I am with Two Minute Papers, and it's never two minutes. So, who am I to say anything here? Although I also don't sell shares to shareholders for billions of dollars either.

Now, finally, the folks at DeepSeek gave us the secret sauce to create such a model. Science is supposed to be open and reproducible for the benefit of humanity. This is a great step towards that. So, DeepSeek is a smart and free AI model that you can run yourself. It needs lots of hardware power, so I usually just rent a GPU on Lambda and do it. It is super fast, reliable and private. I note that I don't have any relationship with the DeepSeek people whatsoever.

Now I'll tell you about five things from the paper that really surprised me, and I think will surprise you too. I apologize, as I don't have footage for everything here. So in the video part, you'll see some LLM things and some simulation stuff, not just figures from the paper, because I can't make a long video with just those.

One: generate options. This is going to be nasty, but efficient. Traditional AI assistants are often trained with a technique called PPO. It's like an expensive private teacher who grades every single sentence a student writes. The student is the AI being trained, and the teacher is a second, equally huge AI model that critiques it. That's good, but it is also incredibly expensive and slow. But DeepSeek does not do that. DeepSeek fires the teacher. No more teacher. Bye. And here is where it gets brutal. Then the student gets one question and writes 16 different answers. Okay, that's crazy. Now what? Well, now we don't grade every sentence. We grade these answers against each other. We ask: did the code run? Was the answer correct? And the best one gets a medal, and the crappy ones get discarded. Okay, so why does this work? Well, it works because this process can be made very cheap, so you can run it on a massive scale. This technique is called GRPO, group relative policy optimization. No expensive teacher needed anymore.
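If you want to see the core idea in code form, here is a minimal sketch of the group-relative grading step, assuming a toy rule-based verifier and a stand-in answer sampler. This illustrates the GRPO advantage computation described in the paper; it is not DeepSeek's actual training code.

```python
import random
import statistics

def sample_answer(question: str) -> str:
    """Stand-in for sampling one answer from the model being trained."""
    return str(random.choice([408, 398, 408, 381]))

def rule_based_reward(answer: str, reference: str) -> float:
    """Toy verifier: was the final answer correct? DeepSeek's real rewards
    also check things like formatting and whether generated code runs."""
    return 1.0 if answer.strip() == reference.strip() else 0.0

def grpo_advantages(rewards: list[float]) -> list[float]:
    """Grade each answer against its own group, no learned critic needed:
    advantage = (reward - group mean) / group standard deviation."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard: all rewards equal
    return [(r - mean) / std for r in rewards]

# One question, a group of 16 sampled answers.
question, reference = "What is 17 * 24?", "408"
answers = [sample_answer(question) for _ in range(16)]
rewards = [rule_based_reward(a, reference) for a in answers]
advantages = grpo_advantages(rewards)
# The policy-gradient update then pushes up the tokens of above-average
# answers and pushes down the below-average ones.
```

The point is that the grading signal comes from comparing the group members to each other, which is exactly why the expensive second "teacher" model can be dropped.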
Two: pause to think. Imagine a child taking a math test, always rushing through it and failing. Then she suddenly realizes: "Wait. Let's stop, take a deep breath, double-check everything, and... wait a second. Aha! Now I am doing way better." Yes, an AI can have an aha moment. Amazing. But here's the kicker: no one taught the child to do that. It learned it by itself. For the first time, I think, researchers watched an AI naturally learn to think before speaking. Something some human beings could also learn from. So, it started generating words like "wait" or "let me recalculate". And over time, check this out: it realizes that spending more time thinking leads to a higher score. So it started thinking longer and longer, by itself. Absolutely amazing.

Three: patience over theory.

### [5:00](https://www.youtube.com/watch?v=fFL7la73RO4&t=300s) Segment 2 (05:00 - 10:00)

In more technical terms: pure reinforcement learning. Here's a question: how do you get better at chess? By reading a textbook, or by learning through playing millions of games? Well, I think the textbook is limited by human knowledge, but if you keep improving, playing against yourself has no limit. And DeepSeek here proved that you don't need the textbook. You don't need human examples to teach an AI to reason. Just give it the rules and let it play against itself. And wow, it evolved from a stuttering mess into a math genius completely on its own. And it discovered new strategies humans never even taught it. Here you see how rapidly it improves over time using this scheme. And goodness, it quickly got better than humans at solving tough competition math problems. It started at around a 15% success rate and went up to nearly 80% in not that much time. And now, hold on to your papers, fellow scholars, because it was given zero examples of how to solve these. It found it out by itself. I think this is an absolute breakthrough.

Four: find a flashlight. But wait, it still benefits greatly from a gentle nudge in the right direction. Yes, you can start it from zero knowledge. It is absolutely possible. But in that case, weird things can happen. If you do that, sometimes the model starts speaking gibberish, or switches between languages like crazy. That's insane. So scientists at DeepSeek say that if you give it just a couple of examples as a guide, it will head off in the right direction immediately. If you want to find the treasure in a dark forest, you could wander randomly until you find it, but it's much faster with a flashlight. So look at R1-Zero versus R1. Interestingly, this concept did not help its mathematical knowledge that much: up two-ish percentage points, in one case even down, compared to the zero-knowledge model. Why? Well, because mathematics needs abstract thinking. You can think in whatever language you want; as long as the answer is correct, the test doesn't care. But look at this: AlpacaEval. This contains natural language questions you need to answer. Now here, if you start switching between French and English, that's going to be a problem, bro. In these cases, a good kick in the ass at the start of the learning process does wonders. It more than tripled its performance. Insane.

And now, five: learn from giants. The star of the show. This is where we get incredible value: distillation. Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Imagine a Nobel Prize-winning physicist writing a "physics for dummies" book. You need the genius to write it, but not to read it. And that is the key. DeepSeek applied this exact logic to AI. They took their huge R1 AI and had it write 800,000 examples of how it thinks. Essentially, a textbook. And then this textbook can be used to teach small and cheap models how to think similarly. Okay, now the question is: how smart are the small guys? Have a look. This is kind of shocking. It beats the huge previous GPT-4o model, scoring nearly six times better on competition-level math questions. Nearly six times. And this is just 7 billion parameters. This thing is tiny. This runs easily on many laptops, or even on your phone in the next couple of years. This used to be the state of the art one and a half years ago. It needed billions and billions of dollars to train. And now you get something for free that is almost six times smarter on this data set. What a time to be alive.
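To make the textbook analogy concrete, here is a minimal sketch of how this kind of distillation typically works: the small student is fine-tuned with an ordinary supervised loss on reasoning traces written by the big model. The example trace, the student model name, and the training loop below are illustrative stand-ins, not DeepSeek's actual pipeline.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Reasoning traces written by the big teacher model. In the paper this is
# a curated set of ~800,000 samples; here, a single stand-in example.
textbook = [
    "Q: What is 17 * 24? <think> 17*24 = 17*20 + 17*4 = 340 + 68 = 408 "
    "</think> A: 408",
]

# Student: a small open model (the name here is an example, not prescribed).
name = "Qwen/Qwen2.5-0.5B"
tokenizer = AutoTokenizer.from_pretrained(name)
student = AutoModelForCausalLM.from_pretrained(name)
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)

# Plain supervised fine-tuning: the student learns to imitate the teacher's
# reasoning text token by token. No reinforcement learning is needed here.
for trace in textbook:
    batch = tokenizer(trace, return_tensors="pt")
    loss = student(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

You need the genius to write the textbook, but once it is written, a tiny model can read it with nothing fancier than standard fine-tuning.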
Now, here comes the kicker. These concepts we learned here are great for making an AI, yes, but you can also use them to improve yourself, too. Isn't that incredible? Just think about it. One: group policy learning. Don't just settle for your first idea. Generate five different solutions to your problem, then grade them against each other. Pick a winner. Two: pause to think when you face a hard question. Don't rush. Force yourself to say "wait" and double-check your logic. The extra time pays off. You saw it with the AI, too. Three: practice over theory. Stop reading those endless tutorials. Yes, maybe read a bit; you need to learn the fundamentals, but you often learn a lot faster by doing the task and failing.

### [10:00](https://www.youtube.com/watch?v=fFL7la73RO4&t=600s) Segment 3 (10:00 - 12:00)

This way, you can self-correct and learn a lot. So, we can learn so much more than just AI stuff from this paper. It's incredible. And you can supercharge your own thinking this way. And scientists at DeepSeek put all this incredible wisdom into an AI, and look at what incredible things it can do. And they put all this knowledge out there in the open. I hope the other labs are taking notes. And whatever model you have from the giant labs today, you are going to be able to run yourself, for free, forever, privately, in hopefully about one or two years. That is... I am out of words. These things cost billions to train, and we get them for free soon after. We are absolutely spoiled here.

Now, of course, many of you wise fellow scholars know that we are not the first to come out with this video. I am sure that if you type anything about DeepSeek, you will see dozens of videos published day one, often even day minus one. What are they about? That doesn't matter. Just be first and put a video out there. Just say some stuff and get all the ad revenue. Maybe do a video for things that don't even exist. That's day minus one. And then when it appears, you get all the views, and in 24 hours there's going to be some other drama or release. So just rinse and repeat. Yes, people do that. But we don't do that here. I took some more time to cook this, so you fellow scholars get a better, higher-quality video. This is not great for getting views and money. But we are not maximizing money here. We are maximizing meaning.

So, if you think that this is the right way to do things, please subscribe, hit the bell, and leave a really kind comment. This helps the YouTube algorithm get you the good stuff, too. And check out Lambda in the description, because it's amazing. And if you sign up through it, it helps us exist. Here you see me running the full DeepSeek AI model through Lambda GPU Cloud: 671 billion parameters, running super fast and super reliably. This is insane. I love it, and I use it on a regular basis. Lambda provides you with powerful NVIDIA GPUs to run your own chatbots and experiments. Seriously, try it out now at lambda.ai/papers or click the link in the description.

---
*Source: https://ekstraktznaniy.ru/video/11397*