The “Biggest” AI That Came Out Of Nowhere!
Duration: 3:59


Two Minute Papers · 15.07.2025 · 142,068 views · 4,995 likes


Video description
❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers
Guide for using DeepSeek on Lambda: https://docs.lambdalabs.com/education/large-language-models/deepseek-r1-ollama/?utm_source=two-minute-papers&utm_campaign=relevant-videos&utm_medium=video
Kimi K2: https://moonshotai.github.io/Kimi-K2/
API: https://platform.moonshot.ai
Run it yourself locally: https://x.com/unslothai/status/1944780685409165589
Sources:
https://x.com/chetaslua/status/1943681568549052458
https://x.com/satvikps/status/1944861384573169929
📝 My paper on simulations that look almost like reality is available for free here: https://rdcu.be/cWPfD
Or this is the orig. Nature Physics link with clickable citations: https://www.nature.com/articles/s41567-022-01788-5
🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Benji Rabhan, B Shang, Christian Ahlin, Gordon Child, John Le, Juan Benet, Kyle Davis, Loyal Alchemist, Lukas Biewald, Michael Tedder, Owen Skarpness, Richard Sundvall, Steef, Sven Pfiffner, Taras Bobrovytsky, Thomas Krcmar, Tybie Fitzhugh, Ueli Gallizzi
If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers
My research: https://cg.tuwien.ac.at/~zsolnai/
X/Twitter: https://twitter.com/twominutepapers
Thumbnail design: Felícia Zsolnai-Fehér - http://felicia.hu

Table of contents (1 segment)

Segment 1 (00:00 - 03:00)

This is Kimi K2, and it's a bit like a Swiss army knife the size of a building. Huge, but somehow still handy, and it does useful things. Oh yes, give me that. And for me, I feel that it just came out of nowhere. This is the biggest open language model AI, and perhaps the most surprising one, because it might be the smartest non-thinking model out there.

The numbers certainly indicate that, but we are Fellow Scholars here, so we like to look a bit more closely. This one trillion, yes, trillion-parameter model can code up a cool interactive 3D mountain scene for you, and can create a visual analysis of remote work trends. And remember the classic bouncing ball coding experiment from earlier? It passes with flying colors. It can even give you motion trails, and it lets you play with a couple of parameters to change the game.

Now, it can run commands and edit files. Thus, hold on to your papers, Fellow Scholars, because if you ask it to create a Minecraft-like game, this is what happens. And at the end, you'll see the game. It has some obvious problems, but otherwise, it is super impressive from just one tiny prompt. Everyone can become a coder. What a time to be alive!

Okay, so how is this even possible? How does it do all this magic? Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér.

Well, as far as we know, it uses fewer attention heads and more experts than DeepSeek. Okay, what does that mean? Well, it is a bit like a well-run hospital. It's less like one brilliant general doctor trying to diagnose everything, and more like a huge hospital that instantly routes you to the best specialist for your specific issue.

What does that mean in practice? More compute efficiency overall: fewer parameters are activated at the same time when you use it. And it works extremely well. However, wait, there is a tradeoff here. As a result, it is a bit thin on a tough academic benchmark like Humanity's Last Exam: a 4.7% success rate. The thinking DeepSeek can get up to about 14%, while the best closed models are at 21-25%, with more results coming soon.

Note that, of course, this is meant to be a relatively speedy model while being really smart. It competes really well against those competitors. Plus, it offers really cheap pricing for API access.

And it has a secret ace up its sleeve too. It uses something that they call the MuonClip optimizer, which is more robust when building incredibly huge AI models than the previous Adam optimizer that basically everyone uses. It makes the training curves less spiky, and behaves a bit like a surge protector that makes sure the loss curve does not blow up. What does all that mean? Well, MuonClip is the surge protector that helps run this little hospital smoothly. That would be the Two Minute Papers explanation. And I think this idea might be one of the important puzzle pieces in training the largest AI models in the world.

The link is in the description, try it out.
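The hospital analogy above describes mixture-of-experts routing: a small "router" picks a handful of specialist sub-networks per token, so most of the model's weights stay idle. Here is a minimal NumPy sketch of top-k routing for a single token; the dimensions, expert count, and gating details are illustrative toy values, not Kimi K2's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes -- K2's real model is vastly larger, with hundreds of
# experts and only a few active per token.
d_model, n_experts, top_k = 16, 8, 2

# Router: one weight column per expert.
router_w = rng.standard_normal((d_model, n_experts))
# Each "expert" is just a single weight matrix in this sketch.
experts = rng.standard_normal((n_experts, d_model, d_model))

def moe_forward(x):
    """Route one token vector to its top-k experts and mix the results."""
    logits = x @ router_w                  # affinity with each expert
    topk = np.argsort(logits)[-top_k:]     # indices of the k best experts
    gates = np.exp(logits[topk])
    gates /= gates.sum()                   # softmax over the chosen k only
    # Only the selected experts run -- the rest of the weights stay idle,
    # which is where the compute efficiency comes from.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, topk))

token = rng.standard_normal(d_model)
out = moe_forward(token)
print(out.shape)  # (16,)
```

The design point the video makes is visible here: the parameter count grows with `n_experts`, but the per-token compute grows only with `top_k`.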
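The "surge protector" intuition can also be sketched. MuonClip reportedly pairs the Muon optimizer with a rescaling of the query/key projection weights whenever attention logits grow too large, which is one known source of training spikes. The toy below shows only that clipping idea, not the full optimizer; the threshold `tau`, the sizes, and the name `qk_clip` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8          # head dimension (toy value)
tau = 30.0     # hypothetical cap on scaled attention logits

W_q = rng.standard_normal((d, d)) * 3.0  # deliberately oversized weights
W_k = rng.standard_normal((d, d)) * 3.0

def qk_clip(X, W_q, W_k, tau):
    """After an update step, shrink W_q and W_k in place if any scaled
    attention logit exceeds tau -- the 'surge protector' idea."""
    Q, K = X @ W_q, X @ W_k
    s_max = np.abs(Q @ K.T).max() / np.sqrt(d)  # largest scaled logit
    if s_max > tau:
        gamma = np.sqrt(tau / s_max)  # split the shrink between Q and K
        W_q *= gamma
        W_k *= gamma
    return W_q, W_k

X = rng.standard_normal((4, d))  # a few token activations
qk_clip(X, W_q, W_k, tau)
# The max scaled logit is now capped at tau (up to rounding),
# so the next step cannot start from an exploded attention score.
```

Because both projections shrink by `gamma`, the logits shrink by `gamma**2 = tau / s_max`, landing exactly at the cap rather than zeroing the weights out.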
