at DeepSeek and their new release, DeepSeek 3.2. If you're not familiar, they were the first big model out of China, the one where everybody was freaking out that, hey, the Chinese have frontier models now and they're open sourcing them. Ever since then, the attention on them has been declining, and now they've put out a big new release to compete with some of the frontier models out there. And the thing is, some of these benchmarks are actually on par with some of the top model makers. I mean, look at this: the thinking model scoring 93 on benchmarks where Google's Gemini 3.0 Pro, which came out two weeks ago, is at 95. Agentic capabilities on par with some of the top models.
Really impressive stats, but what matters at the end of the day is how it performs in the real world. So, I want to show you: if you pull up their website, go to their platform, their application, and just log in with your Google account, you can use this for free, as you can with many of the competitors. Sure, there's a usage limit after a few dozen messages, but this is completely free. They don't even sell you a plan. You can kind of just use this. Now, as you'll see, my latest chats are from November and January, when they had their latest releases. I kind of just tried it out a little bit and didn't really go back to it. And I'm going to do the same thing here. I think the selling point of the big platforms isn't just the models, it's also the feature set they bring. And here, it's very limited. But let's give this a fair shot anyway.
And what I wanted to do today is pull up examples from the Opus 4.5 video last week. Opus 4.5, if you're not familiar, is the biggest and baddest model in the entire space, and one that actually kept its initial hype. A lot of people are using it every day for a variety of tasks and they're absolutely loving it. And in that video, I compared it to the big release before that, which was Google's Gemini 3.0.
And I ran two prompts, and I want to try the same two prompts so we can compare Gemini versus Opus versus DeepSeek. Now, again, I just want to highlight that this model is actually open source, so people can download it, use it locally, and build it into their apps. But let's see how it performs. This rather open-ended prompt, "Create a visually stunning design website for a studio that will impress web front-end developers," will show us what kind of front end it creates on the first try. Obviously, this is no comprehensive test, but it's interesting nonetheless. While it's doing that, I'm also going to run the second prompt I tried there, which creates a visual: "Create an SVG of the Death Star in the sky above Los Angeles." Nice. Let's run this and see what we get.
Okay, so we got our Death Star over LA. I'm really curious to see what this will look like. There's a quote: "The more you tighten your grip, the more traffic will slip through your fingers." There are some moving pieces. Are these cars? The Death Star. Does that even look like a Death Star? I'd say Opus is the winner on this benchmark, with this probably being third. Hey, not a perfect test, but it's worth something.
All right, it took quite a while to write this code, but let's see what it did here. I can see it did all of it in one HTML file. Okay, that's not bad. I mean, I like the particles, but honestly, compared to what both Gemini and Opus did, this is kindergarten level. Sure, it did all of it in one HTML file, but hey, I gave it free rein over how to do it. And yeah, compared to what the other models did, this is not even close. So, this is why benchmarks aren't everything. Honestly, after one week of usage, trying all the different models, and now seeing this, my personal recommendation right now would really be Opus 4.5. It's just so damn good and reliable. But that would just be my recommendation right now.
DeepSeek I'm going to close out here and probably not touch for a good while until they release something new like