# Kimi drops Open Claude? | AI workflows to Supercharge your Setup!

## Metadata

- **Channel:** MattVidPro
- **YouTube:** https://www.youtube.com/watch?v=CjdHj85zVI0
- **Date:** 30.01.2026
- **Duration:** 14:02
- **Views:** 6,430
- **Source:** https://ekstraktznaniy.ru/video/11366

## Description

Huge thanks to RunwayML for sponsoring today’s video! Check out Gen 4.5 here: https://runwayml.com/?utm_source=creator&utm_medium=sponsored&utm_campaign=gen45&utm_content=mattvidpro

▼ Link(s) From Today’s Video:
Krea Edit: https://x.com/viccpoes/status/2015166624362627090 https://x.com/EsotericCofe/status/2015580812692074521
Hunyuan Image Instruct: https://x.com/TencentHunyuan/status/2015635861833167074
Qwen 3 TTS Demo: https://x.com/HuggingModels/status/2015467450473840858 https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice
BlenderMCP: https://x.com/hughmfer/status/2015439996010824035 https://github.com/ahujasid/blender-mcp
Kimi K2.5: https://x.com/Kimi_Moonshot/status/2016024049869324599 https://huggingface.co/moonshotai/Kimi-K2.5/tree/main https://x.com/KimiProduct/status/2016081756206846255 https://x.com/KimiProduct/status/2016150324789309709
Z Image: https://modelscope.cn/models/Tongyi-MAI/Z-Image/summary?version=master 
LTX-2 Workflows: https://x.com/wildmindai/status/20

## Transcript

### Segment 1 (00:00 - 05:00)

What's up everybody? Welcome back to the MattVidPro AI YouTube channel. AI is evolving so quickly that we are constantly inventing brand new workflows using brand new AI technology to break down brand new barriers. Addie here is showing off the Krea Realtime Edit. This is a Nano Banana-style editor that works in real time, and as you can see, he's hooked it up to his webcam. This is a big upgrade for real-time AI. You can see he's drawing a tattoo right on his arm based off of the webcam. His face actually looks sharper in the AI-generated output; the webcam quality actually isn't all that great. Here he is whipping around a lightsaber. You can see it's also adjusting the lighting in real time: it's obviously during the day, but this makes it look like it's completely at night. Is it perfectly consistent every time, no matter what? No. But just the capability that this grants you is, I think, a really huge leap forward. The grills are honestly pretty funny, but they show off how intelligent the AI truly is. It's handling exactly what he's asking for and leaving everything else untouched, in real time.

Here, Vic is showing off a similar but different workflow. He's created several loose, idea-style concepts that can be moved around and dragged inside of the editor to adjust the shape on the fly. You can effectively style your scene in real time. We've had tech like this for a long time, but now it is being done with a Nano Banana-style architecture. The model is much smarter, and I think you can tell just by the output, the way it gets things a lot better instantly. There's less hallucination, and you can prompt and iterate faster. Nucleus shows off how a really simple 3D scene can be turned into a realistic render, automatically adding reflections onto the pavement. This is a great point: we're not too far away from diffusion-based game engines or game graphics rendering pipelines.

Tencent Hunyuan also released Hunyuan Image 3.0 Instruct. This is natively multimodal, focused on image editing, at 80 billion parameters, and it's designed to be completely state-of-the-art. They do appear to have a website where you can actually try this. I don't know about you guys, but I do not have WeChat or QQ; if you want to put your email in, that's going to be up to you. Absolutely a Nano Banana Pro-style competitor from China. This little demo video shows off some pretty awesome examples. Honestly, though, being closed source, it doesn't entice me too much over Google's actual Nano Banana Pro, which I use pretty much every day at this point.

You might recall we recently talked about Qwen 3 text-to-speech. This thing can clone voices instantly; I even tried it on myself. Hugging Models has a pretty awesome demo here showing off some familiar voices so you can get an idea of how incredible this thing is. It's only 1.7 billion parameters, and it's this good. And if you guys didn't see the last video, this one's also completely open source. — This is the best text-to-speech generator you can use right now. — You can easily clone anyone's voice. — It's so good [snorts] at handling emotions. It can even do accents and different languages. — You can even prompt the exact voice you want. — It works with low VRAM and is super fast. — Good one. Okay, fine. I'm just going to leave this sock monkey here. Goodbye. — Okay. Yeah, I resent you. I love you. I respect you. But you know what? You blew it. And thanks to you. — Hey, you dropped your, uh, calculus notebook. I mean, I think it's yours, maybe. — Oh, wow. My mortal enemy, Mr. Thompson's problem sets. Thanks for rescuing me from that F. — No problem. I actually kind of finished those already, if you want to compare answers or something. — This thing is absurdly capable and runs on a lot of consumer-grade GPUs. I recommend using Google's Antigravity to install open source projects like this and get them running locally on your own machine, but I would be interested in doing a deeper dive tutorial on installing this one. It's way too good and way too much fun.
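For anyone who wants to try the local route before a full tutorial lands, here is a minimal sketch of driving the Hugging Face checkpoint from Python. Note that the processor arguments and the `generate()` call below are assumptions rather than the model's documented interface; the actual loading code ships with the model card, so treat this as a shape-of-the-workflow sketch only:

```python
# Minimal sketch of local voice cloning with Qwen3-TTS (1.7B CustomVoice).
# The processor/generate interface here is an assumption; consult the
# model card at huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice.
import soundfile as sf
from transformers import AutoModel, AutoProcessor

MODEL_ID = "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice"
SAMPLE_RATE = 24_000  # assumption; check the model card for the real rate

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModel.from_pretrained(
    MODEL_ID, trust_remote_code=True, device_map="auto"
)

# A short clip of your own voice serves as the cloning reference.
inputs = processor(
    text="Thanks for rescuing me from that F.",
    voice_sample="my_voice_reference.wav",  # hypothetical parameter name
    return_tensors="pt",
).to(model.device)

waveform = model.generate(**inputs)  # assumed to return audio samples
sf.write("cloned_voice.wav", waveform[0].cpu().numpy(), SAMPLE_RATE)
```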
Before we dive any deeper, I've got a quick word from today's sponsor, and honestly, they are a fantastic fit. You ever look at a still image and wish you could just hit play? Well, that's exactly what Runway's new Gen 4.5 image-to-video feels like. I'm not talking about morphing; I'm talking about bringing a frozen moment to life. I've been testing this new model and the fidelity is ridiculous. It understands physics now. Nailed my one-armed man test. Drop in a concept and it doesn't just animate it, it simulates it. The lighting, weight,

### Segment 2 (05:00 - 10:00)

texture, all stay consistent. If you want to try hitting play on your own images, use the link down in the description below. They actually just bumped the discount for us: it's now 20% off for any new subscriber. Go try Gen 4.5. Seeing is believing. Huge thanks to Runway ML for sponsoring today's video. Now, back to your regularly scheduled content.

Welcome back, folks. There is a Blender MCP floating around that integrates with LLMs to essentially vibe code 3D assets. Blender is an open 3D modeling software platform, very capable, and it's been around for a long time. Siddharth Ahuja (hopefully I'm pronouncing that correctly) is the creator of this Blender MCP. Hugh is showing off an entire farm set that they built with this MCP and Claude, and my oh my, is it ever impressive. I haven't tried this one personally, but I am very eager to set it up. I don't do 3D modeling and I don't have Blender installed, but this is so awesome. As Hugh points out, if you are trying to build video games and you don't know anything about modeling or design, this is a game-changer. I mean, solo dev teams are able to accomplish so much more than they used to by utilizing AI tools. It is a completely new world we are living in. Using Google's Antigravity alone, you can install GitHub projects such as this one and then utilize them to create a full 3D farm asset scene like you see here. This is something that used to take months of work but is now compressed into maybe half a day's time. I mean, this is a lot; maybe this is like a day and a half at the most, right? But seriously, this used to take multiple people weeks. A stunning example, and another one that I would be interested in doing a full deep dive video on. Of course, guys, as always, everything is linked down in the description below.
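To make the vibe-coding idea concrete: the MCP bridge ultimately drives Blender's own Python API, `bpy`, so a request like "block out a farm" turns into generated scripts roughly along these lines. This is an illustrative sketch of the kind of code an assistant might emit, not output captured from BlenderMCP itself:

```python
# Illustrative bpy script of the sort an LLM might send through BlenderMCP:
# lay out a tiny "farm" blockout of cylinder hay bales on a ground plane.
import math
import random

import bpy

# Ground plane for the farm scene.
bpy.ops.mesh.primitive_plane_add(size=40, location=(0, 0, 0))
bpy.context.active_object.name = "Ground"

random.seed(7)
for i in range(12):
    x, y = random.uniform(-15, 15), random.uniform(-15, 15)
    # A cylinder rotated onto its side reads as a hay bale at blockout stage.
    bpy.ops.mesh.primitive_cylinder_add(
        radius=0.8,
        depth=1.4,
        location=(x, y, 0.8),
        rotation=(math.pi / 2, 0, random.uniform(0, math.pi)),
    )
    bpy.context.active_object.name = f"HayBale_{i:02d}"
```

Pasting something like this into Blender's scripting tab reproduces by hand the loop that the MCP workflow automates: the model proposes code, Blender executes it, and the assistant iterates on what it sees.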
And speaking of open source, this is Kimi K2.5. This is an open-source visual agentic intelligence language model, built to compete with the greats. Moonshot really seems to have cooked here. Not only does this natively intake images, but it also natively intakes video, and it scores fantastic benchmarks. Feel free to pause right here and take a look at these suckers. You can see in agentic tasks it is smoking Claude, ChatGPT, and Gemini (not by very much, mind you), and everywhere else it's either trading blows or a little bit worse. The highlights for this one really are in the workflows that they demonstrate and in the fact that it is fully open-source weights and code. Check it out on Hugging Face. Absolutely beautiful. You know the community is going to get to work to see how they built this thing, and of course to maneuver it and customize it for all kinds of different workflows.

Here they show off a one-shot video-to-code result from Kimi K2.5. They upload a video of an existing website, basically just a screen capture scrolling through it. Very simple, but the website itself is complex. It not only clones the entire website from the video input, but also the specific visual interactions and UI design that it entails. It interpreted video tokens into actual output code tokens that fully make sense and run perfectly. And it's open source. Like, whoa. Take a good look for yourselves, guys. This is such an interesting benchmark to test multimodality with video and code. As the user scrolls down, you can see the text start to pull away a little bit, and it zooms in through the aircraft window. Now, is the real, handcrafted-by-a-human one much better? Of course; it has the text fading in as well. You can see the clone isn't getting every single aspect and piece, but this is easily a graduate-level replication, right? Especially using only code: this plane is probably a real image asset in the original, where here it is being created entirely with code. But yeah, a pretty incredible demo, I have to say. Anthropic, OpenAI, and Google have to stay on their toes, because open-source competitors like this hang around longer and up everybody's game. They typically end up becoming the cheaper options if the big closed-source companies don't compete.

Kimi also has the agent swarm: you can parallelize your work. If you remember, Elon's Grok tried something similar with four agents running in parallel; this seems to use a lot more. With one prompt, they got a 100-megabyte Excel file generated with imagery, a total of 55 scenes to cover an in-depth 10-minute story. That is freaking cool, man. This is the kind of stuff I really want to try, and I wish Anthropic, Google, and OpenAI implemented an agent swarm, sending out multiple parallel agents to complete work. So fascinating.
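The swarm pattern itself is easy to prototype against any OpenAI-compatible endpoint, which Moonshot exposes for Kimi. Below is a rough sketch of the fan-out half of the idea; the base URL and model name are placeholders to check against Moonshot's docs, and a real swarm would add planning, tool use, and result merging on top:

```python
# Sketch of the fan-out half of an agent swarm: one task is split into
# independent subtasks, each handled by a parallel model call.
# Assumes an OpenAI-compatible endpoint; base_url/model are placeholders.
import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://api.moonshot.ai/v1",  # assumed endpoint; check docs
    api_key="YOUR_KEY",
)

SUBTASKS = [
    "Outline scene 1 of the story as a short paragraph.",
    "Outline scene 2 of the story as a short paragraph.",
    "Outline scene 3 of the story as a short paragraph.",
]

async def run_worker(task: str) -> str:
    resp = await client.chat.completions.create(
        model="kimi-k2.5",  # placeholder model name
        messages=[{"role": "user", "content": task}],
    )
    return resp.choices[0].message.content

async def main() -> None:
    # gather() runs every worker concurrently: the core of the swarm idea.
    results = await asyncio.gather(*(run_worker(t) for t in SUBTASKS))
    for i, text in enumerate(results, 1):
        print(f"--- scene {i} ---\n{text}\n")

asyncio.run(main())
```

The whole trick is that the subtasks are independent, so fifty-five scenes cost roughly the wall-clock time of one call rather than fifty-five in sequence.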

### Segment 3 (10:00 - 14:00)

So yeah, Kimi K2.5 is available today, and it is a potent LLM. It's competing with other open-source LLMs and with closed-source ones. The space as a whole in early 2026 is clearly full steam ahead; there is no slowing down. The amount of time digital work takes is shrinking drastically, with all of these little computer agents working in parallel to complete tasks. It's nuts.

The full, non-distilled version of Z-Image is here. In a way somewhat similar in principle to the Tencent model we looked at earlier, this is customizable image generation and editing at a high quality level. I'm happy to say that the full version of this is also open source. The quality level is totally up there with the Tencent model that we looked at earlier, along with, you know, Nano Banana Pro or GPT Image 2. It does text, it does design, photorealism, anime. What's also interesting is that they dropped an image-to-LoRA: it takes a single image as input and instantly outputs a custom LoRA tailored to that specific style or feature. Very, very cool. I haven't heard of that before.

The LTX-2 community has been on such a roll since it released fully open source. Here, WildMind AI is sharing a complete list of LTX-2 workflows: first and last frame, talking avatar, image-to-video, text-to-video, and Qwen TTS included for consistent voice cloning. If you want to try making more narrative pieces with LTX-2, now is absolutely your time, and this is going to be your first resource for the LTX-2 community. At this point, it seems trivial to make stuff like this. — I keep trying to make 1080p videos, but kept running [snorts and music] out of VRAM, but finally Lord Ki delivered. So what are you waiting for? Go make some films. — See what comes. Amazing.

Finally, I figured I would leave you guys with this. Qwen 3 Max Thinking outperforms all other state-of-the-art models, including Gemini 3 Pro, GPT-5.2, etc., on Humanity's Last Exam with search tools: it achieves nearly 60%, which is very high. Less than a year ago, above 30% would be considered very good, and now we have doubled that. The hype train is already really starting to pick up again. AI winter? I don't think so, guys. Out of all of the benchmarks that we're even looking at here, Humanity's Last Exam is the only one that's not starting to reach full saturation. We aren't even out of January yet, and I can really start to feel the pull. This is my fifth or sixth year following AI closely. There are a lot of folks who want to say it's purely a bubble, but real-world work is getting done a lot faster right now, with advancements being shipped every single day from all corners of the globe. I'm not saying there aren't any real financial bubbles out there, but sitting at home as one person with a computer, the amount of work that I can do in an hour is already greater than it was just 6 months ago, let alone a year or 2 years ago. Crazy times, though. Thanks for riding the wave and staying on the pulse with me. If you want to be even more up to date, I recommend you check out my Discord server; the AI news leaks channel is constantly distributing the latest happenings and things you can try today. Thanks so much for watching. I'll see you in the next video, and goodbye.
