China’s New Coding AI Beats GPT-5.1 & Claude Sonnet 4.5!
Universe of AI · 01.01.2026 · 2,766 views · 86 likes · updated 18.02.2026
Video description
China just released IQuest-Coder V1, an open coding AI that beats GPT-5.1 and Claude Sonnet 4.5 on SWE-Bench Verified. This video breaks down how it works, why the benchmarks matter, and shows real agentic coding demos. For hands-on demos, tools, workflows, and dev-focused content, check out World of AI, our channel dedicated to building with these models: @intheworldofai

🔗 My Links:
📩 Sponsor a Video or Feature Your Product: intheuniverseofaiz@gmail.com
🔥 Become a Patron (Private Discord): /worldofai
🧠 Follow me on Twitter: /intheworldofai
🌐 Website: https://www.worldzofai.com
🚨 Subscribe To The FREE AI Newsletter For Regular AI Updates: https://intheworldofai.com/

GPT-5.1, GPT 5.1 coding, GPT coding model, OpenAI GPT, GPT vs Claude, Claude Sonnet 4.5, Claude 4.5 coding, Claude AI coding, GPT vs Claude coding, best coding AI, AI coding comparison, coding AI benchmark, SWE-Bench, SWE-Bench Verified, LiveCodeBench, Terminal Bench, agentic AI, autonomous coding AI, software engineering AI, open source coding model, China AI models, IQuest Coder, IQuest Coder V1, IQuest Research, AI agents, AI programming, Universe of AI

#IQuestCoder #codingai #agenticai #SWEbench #opensourceai #aifordevelopers

0:00 - Intro
0:33 - Model Specifications
1:09 - Model Training
2:31 - LoopCoder
4:18 - Benchmarks
7:06 - Demos
10:38 - Outro

Table of contents (7 segments)

  1. 0:00 Intro (121 words)
  2. 0:33 Model Specifications (104 words)
  3. 1:09 Model Training (214 words)
  4. 2:31 LoopCoder (284 words)
  5. 4:18 Benchmarks (441 words)
  6. 7:06 Demos (682 words)
  7. 10:38 Outro (88 words)
0:00

Intro

Before we start off the video, happy new year, guys. And it looks like we have our very first new model of 2026, and it's from China. China just released a new open coding model that's quietly beating GPT-5.1 and Claude Sonnet 4.5 on some of the hardest software engineering benchmarks we have. This isn't another autocomplete model. This thing fixes real GitHub issues, runs tools, and handles full repositories. The model is called IQuest Coder V1, and today I want to walk you through why it's a big deal and how it works, and then we'll actually look at the demos on their website. So let's get into it. IQuest Coder V1 is
0:33

Model Specifications

a brand-new family of open-weight coding models coming out of China. It comes in multiple sizes, 7 billion, 14 billion, and 40 billion parameters, but the flagship is the 40 billion version, and it includes a special architecture they call LoopCoder. I'll get into that later in the video. But what makes this model interesting isn't just raw size. It's how it's trained and what it's optimized for. Instead of focusing only on code completion, IQuest Coder is designed for agentic software engineering, meaning long-horizon tasks, tool usage, debugging, and repository-level reasoning. Most coding models are
1:09

Model Training

trained on static snapshots of code. IQuest Coder takes a different approach. They introduce what they call a code-flow multi-stage training pipeline. In simple terms, the model doesn't just see finished code. It learns how code evolves over time. That includes commit histories, patches, bug fixes, tool outputs, test failures, and recoveries. The idea is to teach the model how real software is built and maintained, not just how it looks when it's done. This matters a lot for tasks like SWE-Bench, where the model has to take an issue description and produce a patch that actually passes the tests. The training pipeline has three major stages. Stage one is large-scale pre-training on general text and massive amounts of code, followed by an annealing phase on high-quality repositories. Stage two is where things get interesting. This is mid-training with long context lengths, first 32,000 and then 128,000 tokens, using reasoning data, agent trajectories, and full-repository context. This stage teaches the model how to plan across multiple files, track state over long contexts, and recover from errors. Stage three is post-training, where the model splits into two paths: an instruct version for general coding assistance, and a thinking version trained with reinforcement learning for deeper reasoning and self-correction. So the training
2:31

LoopCoder

explains why the model is smart, but the LoopCoder architecture explains how it delivers that intelligence without exploding compute. This section shows why LoopCoder is actually interesting from a systems perspective, not just a benchmark trick. On the left, you're seeing a traditional dense transformer. If you want better performance, the only real option is to add more layers or more parameters. That means higher VRAM usage, more memory bandwidth pressure, and harder deployment. On the right is LoopCoder. Instead of stacking more layers, it reuses the same transformer blocks multiple times. You can think of it like the model reading the problem once, then looping back and refining its understanding without increasing the parameter count. Because of that loop structure, you get a few important benefits. First, lower HBM overhead. Since parameters are shared across loops, the model needs less memory movement between GPU memory and compute units, which is one of the biggest bottlenecks in large models. Second, higher deployment efficiency. You're getting more reasoning steps on the same hardware, which matters a lot if you're running this in production. Third, performance is maintained. You're not trading intelligence for efficiency. The loop lets the model refine answers instead of bloating the architecture. And finally, scalability. Instead of scaling horizontally by adding more layers, LoopCoder scales temporally through iterative computation. This wormhole animation is actually a really good metaphor. Scrolling forward represents the model moving deeper through reasoning passes. In a dense model, that depth only comes from adding layers, which is expensive. In LoopCoder, depth comes from iteration. The model loops back, re-evaluating global context and refining its output, all while keeping the same memory footprint.
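To make the weight-sharing idea concrete, here is a minimal sketch of a looped forward pass. A single residual update stands in for a full transformer layer; all names and sizes are illustrative, not IQuest's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8
# ONE shared block's weights, standing in for a full attention + MLP layer.
W = rng.standard_normal((d_model, d_model)) * 0.1

def shared_block(x):
    # Stand-in for one transformer layer: residual update plus nonlinearity.
    return x + np.tanh(x @ W)

def looped_forward(x, n_loops):
    # LoopCoder-style depth: apply the SAME block n_loops times.
    # Effective depth grows with n_loops; parameter count does not.
    for _ in range(n_loops):
        x = shared_block(x)
    return x

x = rng.standard_normal((1, d_model))
out_1 = looped_forward(x, 1)  # one pass, like a shallow dense model
out_4 = looped_forward(x, 4)  # four refinement passes, same weights

# The parameter footprint is identical in both cases; only compute grows.
print(W.nbytes)  # 512 bytes either way
```

The point of the toy example is exactly the trade described above: `out_4` gets more "depth" than `out_1` without a single extra parameter, so memory movement stays flat while reasoning steps increase.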
4:18

Benchmarks

Before we move into the demos, I want to slow down on the benchmarks for a moment, because these aren't generic coding scores. Each of these tests something very specific about how the model behaves in real-world engineering workflows. Let's start with SWE-Bench Verified, because this is the most important one here. SWE-Bench isn't about writing functions from scratch. The model is given a real GitHub issue and a broken repository, and it has to generate a patch that actually passes the tests. IQuest Coder scores 81.4 pass@1 here, which is higher than GPT-5.1, Claude Sonnet 4.5, and Gemini 3 Pro. That's a strong signal that this model isn't just good at code generation. It understands debugging, repository context, and how changes propagate across files. This benchmark heavily rewards agentic behavior, not just pattern matching. Next is LiveCodeBench v6, which focuses on reasoning-heavy coding problems under strict contamination controls. What's interesting here is that the thinking variant of IQuest Coder performs significantly better than the instruct version. That tells us the reinforcement learning path is actually doing something. The model is benefiting from iterative reasoning instead of just generating longer answers. BigCodeBench tests something different: large compositional coding tasks that involve APIs, libraries, and multi-step instructions. IQuest Coder leads or matches top models here as well, which suggests it's strong at stitching together multiple concepts, not just solving isolated problems. Terminal-Bench is one of my favorite benchmarks because it's very practical. The model has to operate inside a real terminal environment, running commands, handling errors, managing dependencies, and completing workflows end to end. Strong performance here shows that the model can actually act, not just suggest code. Then you have Mind2Web and BFCL, which test general tool use.
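For readers unfamiliar with how a SWE-Bench-style number is produced, here is a hedged sketch of the scoring loop. The task list and the `run_tests` stub are invented for illustration; the real harness applies the model's patch to the repository and runs its actual test suite.

```python
# Hedged sketch of SWE-Bench-style pass@1 scoring (illustrative only).

def run_tests(task, patch_is_correct):
    # Stand-in for: apply the patch, run the repository's test suite,
    # return True only if the tests now pass.
    return patch_is_correct == task["needs_true_patch"]

def pass_at_1(tasks, model):
    # One attempt per task; the score is the fraction of tasks whose
    # single generated patch makes the tests pass, as a percentage.
    solved = sum(run_tests(t, model(t)) for t in tasks)
    return 100.0 * solved / len(tasks)

# Toy benchmark: 100 tasks, and a naive model that always emits the same patch.
tasks = [{"needs_true_patch": i % 5 != 0} for i in range(100)]
naive_model = lambda t: True

print(pass_at_1(tasks, naive_model))  # 80.0
```

The key property the sketch captures: the metric only rewards patches that survive the tests, which is why it rewards debugging and repository reasoning rather than fluent-looking code.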
Mind2Web evaluates whether an agent can follow instructions across real websites, while BFCL focuses on structured function calling and API usage. IQuest Coder performs competitively across both, which reinforces the idea that this model is built for tool-augmented workflows, not just chat-based coding. Finally, FullStack Bench evaluates end-to-end application development: backend, frontend, and integration. Strong results here suggest the model can reason across layers of the stack, which lines up with the long-context, repository-aware training we talked about earlier. When you put all of these together, a clear pattern emerges. IQuest Coder isn't winning because it's bigger or more verbose. It's winning on benchmarks that reward planning, iteration, and tool use. And that's exactly what you would expect from a model trained on code evolution, agent trajectories, and long-context reasoning. Let's start with
7:06

Demos

this example, because it looks simple but it's actually doing a lot under the hood. This is a real-time pixel sandbox where every particle follows different physical rules. Sand falls and piles up, water flows and spreads, stone blocks movement, and acid actually interacts with every other material. So let's take a look at this. If I click over here, you can see the sand is falling and filling up the area, and based on where I click, it's going to land on that section. Then, if I take water, it's going to fill up a specific area. As you can see, since we have these sand blocks blocking the water's path, it's going to flow around them. Then we have stone. We can place that over here, and it's going to block the area. Now if I were to pour water over it, the water would make its own path depending on where the stones are. And then we have acid, which is going to remove everything. What matters here isn't the visuals, it's the statefulness. The model has to generate logic that runs continuously, updates thousands of elements every frame, and keeps everything stable as the system evolves. This is exactly the kind of task that breaks shallow code generators, because the logic never stops running. It has to reason about cause and effect over time, depending on what element we're pouring into the simulation. The next simulation we're looking at is the boids flocking algorithm, and it's a great test of real reasoning. Each of these agents follows just a few local rules: separation, alignment, and cohesion. There's no global control telling them where to go. The overall flock behavior emerges from local interactions. So let's take a look at it. If I move my mouse around here, you can see that these boids have to avoid it, and it kind of shows you how the flock works. If I move it away, we can see that area being filled up again. What's impressive here is that the model didn't just generate the algorithm.
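The three local rules just named (separation, alignment, cohesion) are compact enough to sketch. The weights, radius, and flock size below are made-up knobs, mirroring the kind of tunable sliders the demo exposes; this is not the model's generated code.

```python
import numpy as np

# Illustrative boids parameters (hypothetical values).
N, RADIUS = 30, 2.0
W_SEP, W_ALI, W_COH = 0.05, 0.05, 0.01

rng = np.random.default_rng(1)
pos = rng.uniform(0.0, 10.0, size=(N, 2))       # positions in a 10x10 box
vel = rng.standard_normal((N, 2)) * 0.1         # small random headings

def step(pos, vel):
    new_vel = vel.copy()
    for i in range(N):
        offsets = pos - pos[i]
        dist = np.linalg.norm(offsets, axis=1)
        near = (dist > 0) & (dist < RADIUS)     # this boid's local neighbors
        if not near.any():
            continue
        sep = -offsets[near].sum(axis=0)          # steer away from crowding
        ali = vel[near].mean(axis=0) - vel[i]     # match neighbors' heading
        coh = pos[near].mean(axis=0) - pos[i]     # drift toward local center
        new_vel[i] = vel[i] + W_SEP * sep + W_ALI * ali + W_COH * coh
    return pos + new_vel, new_vel

# No global controller: just run frames and let flocking emerge.
for _ in range(20):
    pos, vel = step(pos, vel)
```

Note that each boid only ever reads its neighbors within `RADIUS`; the flock-level behavior in the demo emerges from exactly this kind of purely local update.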
It built a tunable system where you can adjust vision radius, speed, and force weights in real time. And when I move the mouse, the agents react instantly, scatter, and reorganize themselves. This last demo is less about deep algorithms and more about full-stack competence. The model generates a full 3D solar system simulation: orbital mechanics, camera controls, different view modes, and smooth interaction. You've got state management, rendering logic, user input handling, and physics concepts all working together. We can see this in action. We can increase the speed of the simulation. We can do a side view, which is really cool. Now we're in the free camera because we're moving around. Then we have the angle view, the top view, and the follow-planet view. We'll click on this, and right now we're following Venus. We'll slow this down a little. We can zoom in and zoom out. This is the orbital line, and we also have a description. What's really cool here is that this is the first time I've seen a 3D simulation with all of these features, and it looks pretty amazing: the graphics and everything feel very cohesive. This lines up well with its strong performance on FullStack Bench and on larger compositional coding tasks. What I like about these demos is that they match the benchmarks. They're not just flashy. They show state, iteration, interaction, and recovery. And that's exactly what IQuest Coder was trained to do. IQuest Coder V1 isn't just another coding model. It's a signal of where coding AI is going next: less autocomplete and more autonomy. If you want, I'll test this model hands-on in a follow-up video. Just leave a comment if you would like that. If you enjoyed this
10:38

Outro

video, this is what we do here: fast, clear updates on the biggest moves in AI. If you want to stay ahead of everything happening in this space, make sure you're subscribed. And if you want the hands-on side, demos, tools, workflows, and everything developers can actually build, check out World of AI. We also run a simple, no-noise newsletter that gives you the most important AI tools and updates in just a couple of minutes. Subscribe here, follow World of AI, and join the newsletter.
