This NEW Chinese AI Agent beats Gemini 3 Flash! 🤯
8:01

Julian Goldie SEO · 05.02.2026 · 1,600 views · 38 likes · updated 18.02.2026
Video description
Want to make money and save time with AI? Get AI Coaching, Support & Courses 👉 https://www.skool.com/ai-profit-lab-7462/about
Get a FREE AI Course + 1000 NEW AI Agents + Video Notes 👉 https://www.skool.com/ai-seo-with-julian-goldie-1553/about
Want to know how I make videos like these? Join the AI Profit Boardroom → https://www.skool.com/ai-profit-lab-7462/about
Get a FREE AI SEO Strategy Session: https://go.juliangoldie.com/strategy-session?utm=julian
Sponsorship inquiries: https://docs.google.com/document/d/1EgcoLtqJFF9s9MfJ2OtWzUe0UyKu1WeIryMiA_cs7AU/edit?tab=t.0

Step 3.5 Flash: This New Open Source AI Beats Gemini 3 Flash
Discover Step 3.5 Flash, a revolutionary open-source AI model from China that uses Mixture of Experts to achieve incredible speeds on local hardware. Learn why this 196B parameter model is optimized for AI agents and how it stacks up against Google's Gemini 3 Flash in real-world benchmarks.

00:00 - Intro
01:00 - What is Step 3.5 Flash?
01:51 - Mixture of Experts (MoE) Explained
03:31 - Performance & Speed Benchmarks
04:05 - 256K Context Window & Attention
04:58 - Step 3.5 Flash vs Gemini 3 Flash
05:47 - How to Run It Locally
06:33 - Best Use Cases for AI Agents

Table of contents (8 segments)

  1. 0:00 Intro (210 words)
  2. 1:00 What is Step 3.5 Flash? (159 words)
  3. 1:51 Mixture of Experts (MoE) Explained (317 words)
  4. 3:31 Performance & Speed Benchmarks (111 words)
  5. 4:05 256K Context Window & Attention (179 words)
  6. 4:58 Step 3.5 Flash vs Gemini 3 Flash (147 words)
  7. 5:47 How to Run It Locally (164 words)
  8. 6:33 Best Use Cases for AI Agents (252 words)
0:00

Intro

A brand new AI model just dropped out of China. It's open source. It's free. Anyone can use it. And in certain benchmarks, it actually beats Gemini 3 Flash. But here's the wild part: it doesn't even use most of its own brain when it works. 196 billion parameters, but it only turns on 11 billion at a time. That's the trick, and that's why it's so fast. It runs on your own machine. No cloud, no big server. This is the model everyone in AI is talking about right now, and by the end of this video, you're going to know exactly why. Let's get into it. Hey, if we haven't met already, I'm the digital avatar of Julian Goldie, CEO of the SEO agency Goldie Agency. Whilst he's helping clients get more leads and customers, I'm here to help you get the latest AI updates. Julian Goldie reads every comment, so make sure you drop one below. Something just happened in the AI world. A Chinese company called Stepfun released a brand new model called Step 3.5 Flash, and people are losing their minds over it. By the end of this video, you're going to understand exactly why, because this one is different. So, let's start
1:00

What is Step 3.5 Flash?

at the beginning. What actually is this model? Step 3.5 Flash is an open-source AI model. That means the code, the weights, everything is out there for anyone to use. No paywall, no subscription. You just download it and go. It was released in February 2026 by a company called Stepfun. And the big thing about this model, the reason people care, is that it was built specifically for AI agents. Now, what does that mean? An AI agent isn't just a chatbot that answers questions. An agent takes action. It plans steps. It uses tools. It can work through a task on its own without you holding its hand the whole time. Think about that. An AI that doesn't just talk. It actually does things. And Step 3.5 Flash was designed from the ground up to be that kind of model. That's what makes it different from a lot of what's out there. How it
1:51

Mixture of Experts (MoE) Explained

works? Here's where it gets interesting, because the way this model is built is actually really clever. Step 3.5 Flash uses something called mixture of experts. People call it MoE. Here's the simple version. The model has 196 billion parameters in total. That's a massive model. But when you actually use it, when you ask it to do something, it only switches on about 11 billion of those parameters. The rest stay off. So imagine you have a team of 200 experts, but for any single task, you only need 11 of them to show up. You don't waste time or energy on the other 189. That's exactly what's happening here. It's using just the right amount of brain for the job. No more, no less. And that is why this model is so fast without being dumb. You get the intelligence of a huge model, but you only pay the speed cost of a much smaller one. On top of that, it uses something called multi-token prediction, MTP3. Every time it runs, it predicts three tokens at once instead of one. So, right out of the box, the speed gets a massive boost. And this isn't a one-time trick. It compounds: every single generation step is faster because of how MTP3 is built into the core of the model. It's not something bolted on afterwards. It's baked in from the start. Now, before I keep going, if you want to start building AI agents like this for your own business, check out AI Profit Boardroom. That's where we actually build and test these kinds of AI workflows. Things like using agents to handle your research, automate your outreach, or run tasks without you being there every second. No fluff, just real setups that work. The link is in the description and in the comments. Go check it out. Now, let's get back into
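The expert-routing idea described in this segment can be sketched in a few lines. This is a toy illustration, not Step 3.5 Flash's actual architecture: the sizes, router design, and top-k value here are made-up assumptions, chosen only to show how a router can score every expert but run just a few of them.

```python
import numpy as np

# Toy mixture-of-experts layer: a router scores all experts per token,
# but only the top-k highest-scoring experts actually run. All sizes
# here are illustrative, far smaller than any real model.

rng = np.random.default_rng(0)
NUM_EXPERTS, TOP_K, DIM = 8, 2, 4

router_w = rng.normal(size=(DIM, NUM_EXPERTS))       # router weights
expert_w = rng.normal(size=(NUM_EXPERTS, DIM, DIM))  # one matrix per expert

def moe_forward(x: np.ndarray) -> np.ndarray:
    scores = x @ router_w                      # score every expert (cheap)
    top = np.argsort(scores)[-TOP_K:]          # keep only the top-k experts
    gate = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax weights
    # Only TOP_K of the NUM_EXPERTS weight matrices are touched here --
    # the other experts stay "off", which is where the speed comes from.
    return sum(g * (x @ expert_w[i]) for g, i in zip(gate, top))

y = moe_forward(rng.normal(size=DIM))
print(y.shape)  # (4,)
```

The expensive part, the expert matrix multiplies, scales with TOP_K rather than NUM_EXPERTS, which is the same reason a 196B-parameter MoE model can run at the cost of an 11B dense one.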
3:31

Performance & Speed Benchmarks

it. Let's talk about how fast this thing actually is. Step 3.5 Flash can hit up to 350 tokens per second on coding tasks. That is genuinely fast. To put that in perspective, most models you've used before are nowhere near that number when they're doing real work. And because of that MTP3 trick I mentioned earlier, it's not slowing down anytime soon. This model is built to move quickly. It's built for tasks that need answers now, not in a few seconds. Now, if you're building anything where speed matters, agents that need to act fast, workflows that have multiple steps, this model is made for that. Here's another
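A quick back-of-envelope calculation shows what multi-token prediction does to that 350 tokens-per-second figure. The two numbers come from the video; the framing (one forward pass per step, three tokens per pass) is the simplified MTP story, ignoring real-world details like verification overhead.

```python
# Back-of-envelope: MTP3 predicts 3 tokens per forward pass, so hitting
# 350 tok/s needs far fewer passes than one-token-at-a-time decoding would.

tokens_per_second = 350   # figure quoted in the video, coding tasks
tokens_per_step = 3       # MTP3: three tokens per forward pass

steps_per_second = tokens_per_second / tokens_per_step
print(f"forward passes per second: {steps_per_second:.0f}")  # ~117

# The same number of passes predicting one token each would yield only:
print(f"1-token-per-pass rate: {steps_per_second:.0f} tok/s")
```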
4:05

256K Context Window & Attention

thing that matters. Step 3.5 Flash can hold 256,000 tokens in memory at one time. To put that in real terms, that's like handing the model an entire book and having it remember every single page. So, if you're working on something big, a long document, a huge code base, a project with a lot of moving parts, this model can keep track of all of it. It doesn't forget. It doesn't lose the thread. For AI agents especially, this is a game-changer, because agents need to hold a lot of context while they work through a task, and 256K tokens gives them a lot of room to do that. It also uses a hybrid attention system. That's a fancy way of saying it can handle short things and long things at the same time without slowing down. Sliding window attention for the quick stuff, full attention when it needs to look at everything. So whether your task is two sentences or 200 pages, the model adjusts. It doesn't waste power on things it doesn't need to process.
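The difference between the two attention patterns mentioned here can be made concrete with masks. This is a generic sketch of full causal attention versus sliding-window attention, not Step 3.5 Flash's actual hybrid scheme; the sequence length and window size are arbitrary.

```python
import numpy as np

# Two attention masks a hybrid scheme can alternate between:
# full causal attention lets a token see every earlier token;
# sliding-window attention caps that view at the last W tokens.

def causal_mask(n: int) -> np.ndarray:
    # True where position i may attend to position j (j <= i).
    return np.tril(np.ones((n, n), dtype=bool))

def sliding_window_mask(n: int, window: int) -> np.ndarray:
    m = causal_mask(n)
    for i in range(n):
        m[i, : max(0, i - window + 1)] = False  # drop tokens outside the window
    return m

n, w = 6, 3
print(causal_mask(n).sum(axis=1))           # [1 2 3 4 5 6] -- grows per position
print(sliding_window_mask(n, w).sum(axis=1))  # [1 2 3 3 3 3] -- capped at window
```

Full attention costs grow with the square of sequence length, while sliding-window cost grows linearly, which is why mixing the two keeps a 256K-token context affordable.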
4:58

Step 3.5 Flash vs Gemini 3 Flash

That's smart design. Okay, this is the part everyone wants to know. Does Step 3.5 Flash actually beat Gemini 3 Flash? Here's the honest answer. In certain benchmarks, yes. On coding tests like SWE-Bench, it scores really well. On agent tasks and reasoning challenges, it's right up there, competing with models that use way more parameters when they run. Remember, this model only activates 11 billion parameters at a time, and it's still holding its own against models that are much bigger in that department. That's impressive, but let's keep it real. No model wins every single benchmark. Gemini 3 Flash is still strong in its own areas. The point isn't that Step 3.5 Flash is perfect at everything. The point is that it's competitive, it's open source, and it's free. That combination is what's turning heads. Can you run it on your own
5:47

How to Run It Locally

machine? So, here's the question a lot of people have. Can I actually run this on my own computer without renting cloud servers? Yes, you can. Stepfun released the model in GGUF format with INT4 quantization. That means the file size is small enough to actually download and run locally. It works on a Mac Studio with an M4 Max. It works on NVIDIA DGX systems. It even works on AMD AI Max hardware. You can grab it right now from Hugging Face, or you can access it through API platforms like OpenRouter if you'd rather not run it yourself. All the links are going in the description. Running AI models locally is a big deal. It means your data stays on your machine. No one else sees it, and you're not dependent on someone else's servers. For businesses that care about privacy, and that should be everyone, this is exactly the kind of option you want available. So, what do you actually do
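The rough memory math explains why INT4 quantization is what makes local inference plausible here. The parameter counts are from the video; the bytes-per-parameter figures are standard approximations that ignore quantization overhead (scales, metadata), so real GGUF files will be somewhat larger.

```python
# Why INT4 matters: approximate weight storage at different precisions.
# 196B total parameters, ~11B active per token (figures from the video).

total_params  = 196e9
active_params = 11e9

def gb(params: float, bits: int) -> float:
    # Gigabytes needed to store `params` weights at `bits` bits each.
    return params * bits / 8 / 1e9

print(f"FP16 full model: {gb(total_params, 16):.0f} GB")  # ~392 GB
print(f"INT4 full model: {gb(total_params, 4):.0f} GB")   # ~98 GB
print(f"INT4 active set: {gb(active_params, 4):.1f} GB")  # ~5.5 GB
```

At FP16 the full model is far beyond any desktop, while the INT4 version fits in the unified memory of high-end workstation hardware like the machines mentioned above.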
6:33

Best Use Cases for AI Agents

with this? You build agents. Agents that can research topics, draft content, plan steps, and execute tasks without you micromanaging every move. That is the power of a model built for agentic work. You use it for coding. It's fast, it holds a ton of context, and it understands long code bases. Developers are already experimenting with it for planning, debugging, and building. And you run it locally, which means you keep control: your data, your machine, your rules. So, that's Step 3.5 Flash: open source, fast, built for agents, runs on your own hardware, and competing with some of the best closed models out there. If you want to start using AI agents like this to actually automate parts of your workflow, head to AI Profit Boardroom. That's where we build these things for real. Link in the description and comments. And if you want the full process, the SOPs, and over 100 AI use cases like this one, join the AI Success Lab. It's our free AI community. Over 40,000 members in there already. You'll get all the notes from this video, access to the community, and everything you need to actually get started. Links are in the comments and the description. That's it for today. Comment below with what you want to see next. Julian reads every single one.
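The plan-act-observe loop that makes an agent more than a chatbot can be sketched in miniature. Everything here is a stand-in: `fake_model` plays the role a model like Step 3.5 Flash would fill, and the calculator tool, function names, and message shapes are invented for illustration only.

```python
# Minimal agent loop sketch: the model picks an action, a tool runs it,
# the result is fed back as an observation, and the loop repeats until
# the model decides to finish. All names here are hypothetical.

def calculator(expression: str) -> str:
    """A toy tool the agent can call (restricted eval for the demo)."""
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def fake_model(task: str, observations: list) -> dict:
    """Stand-in for the LLM: decides the next action from what it has seen."""
    if not observations:
        return {"action": "calculator", "input": "6 * 7"}
    return {"action": "finish", "answer": observations[-1]}

def run_agent(task: str, max_steps: int = 5) -> str:
    observations = []
    for _ in range(max_steps):
        step = fake_model(task, observations)
        if step["action"] == "finish":
            return step["answer"]
        result = TOOLS[step["action"]](step["input"])
        observations.append(result)   # the agent "sees" the tool output
    return "gave up"

print(run_agent("What is 6 times 7?"))  # → 42
```

A real setup swaps `fake_model` for an actual model call and gives it a richer toolbox (search, file access, code execution), but the control flow stays this simple.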
