# The “Biggest” AI That Came Out Of Nowhere!

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=4bFDPVe6BHs
- **Date:** 15.07.2025
- **Duration:** 3:59
- **Views:** 142,068
- **Source:** https://ekstraktznaniy.ru/video/12247

## Description

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

Guide for using DeepSeek on Lambda:
https://docs.lambdalabs.com/education/large-language-models/deepseek-r1-ollama/?utm_source=two-minute-papers&utm_campaign=relevant-videos&utm_medium=video

Kimi K2:
https://moonshotai.github.io/Kimi-K2/
API: https://platform.moonshot.ai

Run it yourself locally: https://x.com/unslothai/status/1944780685409165589

Sources:
https://x.com/chetaslua/status/1943681568549052458
https://x.com/satvikps/status/1944861384573169929

📝 My paper on simulations that look almost like reality is available for free here:
https://rdcu.be/cWPfD 

Or this is the original Nature Physics link with clickable citations:
https://www.nature.com/articles/s41567-022-01788-5

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Benji Rabhan, B Shang, Christian Ahlin, Gordon Child, John Le, Juan Benet, Kyle Davis, Loyal Alchemist, Lukas Biewald, Michael Tedder

## Transcript

### Segment 1 (00:00 - 03:00)

This is Kimi K2, and it's a bit like a Swiss army knife the size of a building. Huge, but somehow still handy, and it does useful things. Oh yes, give me that. And for me, I feel that it just came out of nowhere. This is the biggest open language model AI, and perhaps the most surprising one, because it might be the smartest non-thinking model out there.

The numbers certainly indicate that, but we are Fellow Scholars here, so we like to look a bit more closely. This one-trillion, yes, trillion-parameter model can code up a cool interactive 3D mountain scene for you, and can create a visual analysis of remote work trends. And remember coding up the classic bouncing ball experiment from earlier? It passes with flying colors. It can even give you motion trails and lets you play with a couple of parameters to change the game.

Now, it can run commands and edit files. Thus, hold on to your papers, Fellow Scholars, because if you ask it to create a Minecraft-like game, this is what happens. And at the end, you'll see the game. It has some obvious problems, but otherwise, it is super impressive from just one tiny prompt. Everyone can become a coder. What a time to be alive!

Okay, so how is this even possible? How does it do all this magic? Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér.

Well, as far as we know, it uses fewer heads and more experts than DeepSeek. Okay, what does that mean? Well, it is a bit like a well-run hospital. It's less like one brilliant general doctor trying to diagnose everything, and more like a huge hospital that instantly routes you to the best specialist for your specific issue.

What does that mean in practice? More compute efficiency overall: fewer parameters are activated at the same time when you use it. And it works extremely well. However, wait, there is a tradeoff here. As a result, it is a bit thin on a tough academic benchmark like Humanity's Last Exam: a 4.7% success rate. The thinking DeepSeek can get up to about 14%, while the best closed models are at 21-25%, with more results coming soon.

Note that, of course, this is meant to be a relatively speedy model while being really smart. It competes really well against those competitors. Plus, it offers really cheap pricing for API access.

And it has a secret ace up its sleeve too. It uses something that they call the MuonClip optimizer, which is more robust when building incredibly huge AI models than the previous Adam optimizer that basically everyone uses. It makes these training curves less spiky, and behaves a bit like a surge protector to make sure there are fewer spikes here, and that the curve does not blow up. What does all that mean? Well, MuonClip is the surge protector that helps run this little hospital smoothly. This would be the Two Minute Papers explanation. And I think this idea might be one of the important puzzle pieces in training the largest AI models in the world.

The link is in the description, try it out.
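The "hospital routing" idea from the transcript is mixture-of-experts routing: a small gating network scores all experts, but only the top few actually run for a given input. Here is a minimal toy sketch of that idea in plain Python. All names (`topk_moe_forward`, the gate as a plain dot product, `k=2`) are illustrative assumptions, not Kimi K2's actual implementation, which has many more experts and learned gates.

```python
import math

def topk_moe_forward(x, gate_w, experts, k=2):
    """Toy top-k mixture-of-experts routing (illustrative sketch only).

    x       : input vector (list of floats)
    gate_w  : one gating weight vector per expert
    experts : list of callables, one per expert
    k       : how many experts each input is routed to
    """
    # Gating scores: one dot product per expert.
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_w]
    # Pick the k highest-scoring experts.
    topk = sorted(range(len(logits)), key=lambda i: logits[i])[-k:]
    # Softmax over just the selected experts' scores.
    m = max(logits[i] for i in topk)
    weights = [math.exp(logits[i] - m) for i in topk]
    total = sum(weights)
    # Only the chosen experts run -- the rest stay idle, which is why
    # far fewer parameters are active per token than the model holds.
    out = [0.0] * len(x)
    for w, i in zip(weights, topk):
        y = experts[i](x)
        out = [o + (w / total) * y_j for o, y_j in zip(out, y)]
    return out
```

The compute saving is the key point: a trillion-parameter model only pays for the `k` experts it routes each token through, not for every specialist in the hospital.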
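The "surge protector" metaphor for MuonClip can also be sketched. A heavily simplified reading is that, alongside the Muon update, the optimizer caps attention logits by shrinking the query and key projection weights whenever the largest logit passes a threshold. The sketch below is a toy version of that clipping step only; the function name `qk_clip`, the threshold `tau`, and the even split of the correction between the two matrices are all assumptions for illustration, not Moonshot AI's published algorithm.

```python
import numpy as np

def qk_clip(W_q, W_k, X, tau=100.0):
    """Toy attention-logit clipping in the spirit of MuonClip (sketch only).

    If the largest attention logit produced by the current weights exceeds
    tau, shrink both projection matrices so the logits cannot blow up --
    the "surge protector" that keeps training curves from spiking.
    """
    Q, K = X @ W_q, X @ W_k
    max_logit = np.abs(Q @ K.T).max()
    if max_logit > tau:
        # Logits are bilinear in W_q and W_k, so scaling each matrix by
        # sqrt(tau / max_logit) scales every logit by exactly tau / max_logit.
        gamma = np.sqrt(tau / max_logit)
        W_q = W_q * gamma
        W_k = W_k * gamma
    return W_q, W_k
```

Run after each optimizer step, a cap like this bounds the attention logits directly instead of waiting for the loss curve to spike, which is one plausible reading of why the training curves shown in the video stay smooth.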
