The “Biggest” AI That Came Out Of Nowhere!
Duration: 3:59


Two Minute Papers · 15.07.2025 · 142,068 views · 4,995 likes


Video description
❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers
Guide for using DeepSeek on Lambda: https://docs.lambdalabs.com/education/large-language-models/deepseek-r1-ollama/?utm_source=two-minute-papers&utm_campaign=relevant-videos&utm_medium=video
Kimi K2: https://moonshotai.github.io/Kimi-K2/
API: https://platform.moonshot.ai
Run it yourself locally: https://x.com/unslothai/status/1944780685409165589
Sources:
https://x.com/chetaslua/status/1943681568549052458
https://x.com/satvikps/status/1944861384573169929
📝 My paper on simulations that look almost like reality is available for free here: https://rdcu.be/cWPfD
Or this is the orig. Nature Physics link with clickable citations: https://www.nature.com/articles/s41567-022-01788-5
🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Benji Rabhan, B Shang, Christian Ahlin, Gordon Child, John Le, Juan Benet, Kyle Davis, Loyal Alchemist, Lukas Biewald, Michael Tedder, Owen Skarpness, Richard Sundvall, Steef, Sven Pfiffner, Taras Bobrovytsky, Thomas Krcmar, Tybie Fitzhugh, Ueli Gallizzi
If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers
My research: https://cg.tuwien.ac.at/~zsolnai/
X/Twitter: https://twitter.com/twominutepapers
Thumbnail design: Felícia Zsolnai-Fehér - http://felicia.hu

Table of contents (1 segment)

Segment 1 (00:00 - 03:00)

This is Kimi K2, and it's a bit like a Swiss army knife the size of a building. Huge, but somehow still handy, and it does useful things. Oh yes, give me that. And for me, I feel that it just came out of nowhere. This is the biggest open language model AI, and perhaps the most surprising one, because it might be the smartest non-thinking model out there.

The numbers certainly indicate that, but we are Fellow Scholars here, so we like to look a bit more closely. This one trillion, yes, trillion-parameter model can code up a cool interactive 3D mountain scene for you, and can create a visual analysis of remote work trends. And remember the classic bouncing ball coding experiment from earlier? It passes with flying colors. It can even give you motion trails, and it lets you play with a couple of parameters to change the game.

Now, it can run commands and edit files. Thus, hold on to your papers, Fellow Scholars, because if you ask it to create a Minecraft-like game, this is what happens. And at the end, you'll see the game. It has some obvious problems, but otherwise, it is super impressive from just one tiny prompt. Everyone can become a coder. What a time to be alive!

Okay, so how is this even possible? How does it do all this magic? Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér.

Well, as far as we know, it uses fewer attention heads and more experts than DeepSeek. Okay, what does that mean? Well, it is a bit like a well-run hospital. It's less like one brilliant general doctor trying to diagnose everything, and more like a huge hospital that instantly routes you to the best specialist for your specific issue.

What does that mean in practice? More compute efficiency overall: fewer parameters are activated at the same time when you use it. And it works extremely well. However, wait, there is a tradeoff here. As a result, it is a bit thin on a tough academic benchmark like Humanity's Last Exam: a 4.7% success rate. The thinking DeepSeek can get up to about 14%, while the best closed models are at 21-25%, with more results coming soon.

Note that, of course, this is meant to be a relatively speedy model while being really smart. It competes really well against those competitors. Plus, it offers really cheap pricing for API access.

And it has a secret ace up its sleeve too. It uses something that they call the MuonClip optimizer, which is more robust when building incredibly huge AI models than the previous Adam optimizer that basically everyone uses. It makes the training curves less spiky, and behaves a bit like a surge protector that makes sure the loss curve does not blow up. What does all that mean? Well, MuonClip is the surge protector that helps run this little hospital smoothly. That would be the Two Minute Papers explanation. And I think this idea might be one of the important puzzle pieces in training the largest AI models in the world.

The link is in the description, try it out.
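The hospital analogy above describes mixture-of-experts routing: a small "router" picks a handful of specialist sub-networks per token, so most of the model's weights stay idle. Here is a minimal NumPy sketch of top-k routing for a single token; the dimensions, expert count, and gating details are illustrative toy values, not Kimi K2's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes -- K2's real model is vastly larger, with hundreds of
# experts and only a few active per token.
d_model, n_experts, top_k = 16, 8, 2

# Router: one weight column per expert.
router_w = rng.standard_normal((d_model, n_experts))
# Each "expert" is just a single weight matrix in this sketch.
experts = rng.standard_normal((n_experts, d_model, d_model))

def moe_forward(x):
    """Route one token vector to its top-k experts and mix the results."""
    logits = x @ router_w                  # affinity with each expert
    topk = np.argsort(logits)[-top_k:]     # indices of the k best experts
    gates = np.exp(logits[topk])
    gates /= gates.sum()                   # softmax over the chosen k only
    # Only the selected experts run -- the rest of the weights stay idle,
    # which is where the compute efficiency comes from.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, topk))

token = rng.standard_normal(d_model)
out = moe_forward(token)
print(out.shape)  # (16,)
```

The design point the video makes is visible here: the parameter count grows with `n_experts`, but the per-token compute grows only with `top_k`.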
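The "surge protector" intuition can also be sketched. MuonClip reportedly pairs the Muon optimizer with a rescaling of the query/key projection weights whenever attention logits grow too large, which is one known source of training spikes. The toy below shows only that clipping idea, not the full optimizer; the threshold `tau`, the sizes, and the name `qk_clip` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8          # head dimension (toy value)
tau = 30.0     # hypothetical cap on scaled attention logits

W_q = rng.standard_normal((d, d)) * 3.0  # deliberately oversized weights
W_k = rng.standard_normal((d, d)) * 3.0

def qk_clip(X, W_q, W_k, tau):
    """After an update step, shrink W_q and W_k in place if any scaled
    attention logit exceeds tau -- the 'surge protector' idea."""
    Q, K = X @ W_q, X @ W_k
    s_max = np.abs(Q @ K.T).max() / np.sqrt(d)  # largest scaled logit
    if s_max > tau:
        gamma = np.sqrt(tau / s_max)  # split the shrink between Q and K
        W_q *= gamma
        W_k *= gamma
    return W_q, W_k

X = rng.standard_normal((4, d))  # a few token activations
qk_clip(X, W_q, W_k, tau)
# The max scaled logit is now capped at tau (up to rounding),
# so the next step cannot start from an exploded attention score.
```

Because both projections shrink by `gamma`, the logits shrink by `gamma**2 = tau / s_max`, landing exactly at the cap rather than zeroing the weights out.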
