GLM-4.7: Open-Source Agentic Coding Gets Better!

Universe of AI · 22.12.2025 · 5,186 views · 100 likes · updated 18.02.2026
Video description
GLM-4.7 is a new open-source model focused on agentic coding, terminal workflows, and tool use. In this video, I walk through:
• What actually changed from GLM-4.6
• How GLM-4.7 performs across coding, reasoning, and agent benchmarks
• Why improvements in SWE-bench, Terminal Bench, and τ²-Bench matter in practice
• Frontend and creative demos to see how the model behaves beyond numbers

Link to try GLM-4.7: https://chat.z.ai/

For hands-on demos, tools, workflows, and dev-focused content, check out World of AI, our channel dedicated to building with these models: @intheworldofai

🔗 My Links:
📩 Sponsor a Video or Feature Your Product: intheuniverseofaiz@gmail.com
🔥 Become a Patron (Private Discord): /worldofai
🧠 Follow me on Twitter: /intheworldofai
🌐 Website: https://www.worldzofai.com
🚨 Subscribe To The FREE AI Newsletter For Regular AI Updates: https://intheworldofai.com/

#GLM47 #AgenticCoding #OpenSourceAI #AICoding #UniverseOfAI

0:00 - Intro
0:23 - What's New!
1:25 - Benchmarks
4:37 - Frontend Design Capabilities
5:48 - Interactive Website
6:55 - Voxel Temple
9:10 - Outro

Contents (7 segments)

  1. 0:00 Intro 80 words
  2. 0:23 What's New! 151 words
  3. 1:25 Benchmarks 519 words
  4. 4:37 Frontend Design Capabilities 254 words
  5. 5:48 Interactive Website 220 words
  6. 6:55 Voxel Temple 431 words
  7. 9:10 Outro 88 words
0:00

Intro

There's a new open-source model worth paying attention to: GLM-4.7. We've seen a lot of open models lately, but this one is clearly focused on coding and agent workflows, not general chat. In this video, I'll do a quick overview of what the model is targeting, walk through the benchmark results, and then we'll look at a couple of demos so you can see how it actually behaves. So, let's get into it. So, at a high level,
0:23

What's New!

GLM-4.7 positions itself as a coding partner, especially for people using agents. There are four main areas they're emphasizing. First, core coding improvements. Compared to GLM-4.6, there are clear gains in agentic coding and terminal-based tasks. SWE-bench Verified is up to 73.8%, SWE-bench Multilingual jumps to 66.7%, and Terminal Bench goes up to 41%. Second, UI and frontend output, which they refer to as vibe coding. The model produces cleaner web pages and better-formatted slides with fewer layout issues. Third is tool use. GLM-4.7 performs very well on multi-step tool benchmarks like τ²-Bench and web browsing tasks like BrowseComp. And fourth, reasoning, especially when tools are enabled. On Humanity's Last Exam, the tool-enabled score jumps to 42.8%, which is a meaningful improvement over 4.6. That's the framing. Now let's look
1:25

Benchmarks

at the numbers more closely. Looking at this first chart, this is a high-level comparison across reasoning, coding, and agent benchmarks. What I would focus on here isn't whether GLM-4.7 is number one everywhere. It's not, but how consistent the improvements are over GLM-4.6. You see gains in LiveCodeBench, GPQA, SWE-bench, Terminal Bench, τ²-Bench, and HLE. In other words, the model isn't just improving in one narrow area. It's getting better across the tasks that tend to break agents: multi-step reasoning, execution, and follow-through. This is the kind of chart that tells you where to expect improvements, not necessarily how dramatic they'll feel.

All right, this second table is a more detailed breakdown of the benchmark results, and I'll break it into three buckets: reasoning, code agents, and general agents. Starting with reasoning, MMLU Pro and GPQA Diamond are both knowledge-heavy benchmarks, but they're structured to test reasoning under pressure, not just recall. GLM-4.7 improved modestly here, from 83.2 to 84.3 on MMLU Pro, and more clearly on GPQA Diamond at 85.7. The bigger signal, though, is HLE, Humanity's Last Exam. Raw HLE jumps from 17.2 to 24.8, which is already meaningful, but when tools are enabled, it goes up to 42.8. That gap tells you something important: this model is much better at using tools as part of its reasoning process, not just answering in isolation. That's exactly what you want in agent settings: reasoning with execution, not just before it.

Now, the most important section for this model is code agents. SWE-bench Verified moves from 68 to 73.8. That's not just "writes better code." SWE-bench tests whether the model can understand an existing codebase, apply a fix, and pass real unit tests. So a six-point jump there usually translates to fewer broken patches and fewer partial fixes. SWE-bench Multilingual is even more telling. Going from 53.8 to 66.7 suggests that the model is handling non-English comments, mixed-language repos, and less standardized code much more reliably than before. Then there's Terminal Bench. Terminal Bench isn't about code quality. It's about execution discipline: running commands in the right order, interpreting output, and fixing errors instead of repeating them. Jumping from 24.5 to 41 is a strong indicator that the model is more dependable once it leaves the editor and starts interacting with an environment.

Finally, the general agent benchmarks, BrowseComp and τ²-Bench, measure whether the model can decide when to browse, choose the right tool, and manage context across steps. GLM-4.7 improves across all of these, especially when context management is enabled. This usually shows up as fewer tool loops, less unnecessary browsing, and more direct paths to an answer. Again, not perfect, but noticeably more controlled than earlier versions. So, taken together, this table tells a pretty clear story: GLM-4.7 isn't just smarter in isolation. It's better at staying on track once it starts doing things. And I'm on Z.ai.
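To make those deltas concrete, here is a small sketch that tabulates the 4.6 → 4.7 scores quoted above and computes the absolute and relative gains. The numbers are the ones stated in the video; the code itself is just arithmetic, not anything from the model or its benchmark harness:

```javascript
// Benchmark scores quoted in the video: [GLM-4.6, GLM-4.7] per benchmark.
const scores = {
  "MMLU Pro":               [83.2, 84.3],
  "HLE (no tools)":         [17.2, 24.8],
  "SWE-bench Verified":     [68.0, 73.8],
  "SWE-bench Multilingual": [53.8, 66.7],
  "Terminal Bench":         [24.5, 41.0],
};

// Absolute point gain and relative improvement for each benchmark,
// rounded to one decimal place.
function gains(table) {
  const out = {};
  for (const [name, [oldScore, newScore]] of Object.entries(table)) {
    const pts = newScore - oldScore;
    out[name] = {
      points: Math.round(pts * 10) / 10,
      relativePct: Math.round((pts / oldScore) * 1000) / 10,
    };
  }
  return out;
}

for (const [name, g] of Object.entries(gains(scores))) {
  console.log(`${name}: +${g.points} pts (${g.relativePct}% relative)`);
}
```

Seen this way, the Terminal Bench jump (24.5 → 41) is the largest relative gain in the table, which matches the video's point that execution discipline improved the most.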
4:37

Frontend Design Capabilities

I'll put the link for this model in the description as well, so you can test it out yourself. So, click on GLM-4.7. Make sure you're selecting the most advanced model. Once you're in there, I'm going to test this model out by asking it to create an HTML website with a high-contrast dark mode, bold condensed headings, an animated ticker, chunky category chips, and a magnetic CTA. So, let's see what this creates. All right, looks like our model is done creating the website. And from the get-go, you can tell it's a pretty beautiful design without limits. We can see that it has this animated ticker going on, and our mouse has this cool feature here as well. We have this monochrome structure. Oh, this is pretty good. Like, if you hover over them, it adds color to them. So from the get-go, I can clearly tell the design elements are really good for this model. Let's see what we can do. Anything here? Can we click on these features? Not at the moment. But the UI is good. And if we look at the code it generated, how many lines of code are there? About 600 lines. And it only thought for about 20-30 seconds. And the preview, if you look at it, is pretty great. This is a professional-looking website. So, not bad for an open-source model.
5:48

Interactive Website

Here's another example of what I'm going to test out: I'm going to ask the model to behave like a creative front-end engineer and digital artist. Deliver one complete standalone HTML file, constrained to a single file with all CSS and JavaScript, static-hosting compatible, no backend, no build tools, and the design style should be cyberpunk 3D, using Three.js and everything like that. And this is what the model actually created. So, "Audit to Algorithm" is a chartered accountant website. You can press the initialize view here. You can also scroll down, and you can see the background kind of follows you along with it. So it follows the cyberpunk theme pretty spot-on, and the user interface is pretty good. Look at this: as I'm scrolling up and down, it zooms in and out on the background. So that's cool. The skill matrix here gives us our finance core, tech stack, everything like that. When we go down, it also has these beautiful animated features where, if I hover over them, they pop up. So this is pretty good. The UI is good, and the capability to code all this is also pretty good. So I am not upset with this at all. Now,
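For reference, the "one standalone HTML file, no backend, no build tools" constraint from that prompt boils down to a page shaped roughly like this. This is a minimal sketch, not the model's actual output; the Three.js version, CDN URL, and spinning wireframe cube are all illustrative stand-ins for the real cyberpunk scene:

```html
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <style>
    /* All CSS inline: no external stylesheet, so the file hosts anywhere */
    body { margin: 0; background: #05060a; color: #e0e0ff; }
    canvas { display: block; }
  </style>
</head>
<body>
  <!-- Three.js pulled from a CDN: still "no build tools", just one file -->
  <script src="https://unpkg.com/three@0.140.0/build/three.min.js"></script>
  <script>
    const scene = new THREE.Scene();
    const camera = new THREE.PerspectiveCamera(60, innerWidth / innerHeight, 0.1, 100);
    camera.position.z = 5;
    const renderer = new THREE.WebGLRenderer({ antialias: true });
    renderer.setSize(innerWidth, innerHeight);
    document.body.appendChild(renderer.domElement);
    // A single glowing wireframe cube stands in for the full 3D scene
    const cube = new THREE.Mesh(
      new THREE.BoxGeometry(),
      new THREE.MeshBasicMaterial({ color: 0x00ffe0, wireframe: true })
    );
    scene.add(cube);
    (function animate() {
      requestAnimationFrame(animate);
      cube.rotation.y += 0.01;
      renderer.render(scene, camera);
    })();
  </script>
</body>
</html>
```

Because everything lives in one file with no server-side code, the result can be dropped onto any static host or even opened from disk, which is exactly why this constraint is popular for quick model demos.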
6:55

Voxel Temple

I want to test out something a little more creative and complex than the previous examples: create a richly crafted voxel art environment featuring an ornate temple set within a vibrant garden and waterfall. Include diverse vegetation and animals, and ensure the composition feels lively, colorful, and visually striking. Use any voxel or WebGL libraries you prefer. So, let's see what this creates. Looks like our temple is live, and we can see it over here. It has tried to add a rich interactive voxel art environment. It features a procedurally generated ornate temple. So, this is our temple. It kind of looks like a temple, but it's a little bit glitchy. And then it tried to animate koi fish and birds. I guess these white things floating at the top over here are the birds, so it's not bad. It doesn't really look like a bird, but we'll give it to it. Then let's just look at the environment. Oh, that's our koi fish. It's pretty good, not bad at all. Good animation and everything like that. The temple looks nice. We have all these trees. We can also stop the rotation and toggle day and night. So, okay. Yeah, looks like there are some fireflies or something like that. And let's just make it day again. So, the temple doesn't really look like a temple. Obviously, it's a little bit complex, but it's not bad. It has the structure of what a temple would look like. And it has all these tree elements, and it has this pond here with koi fish. The waterfall is not really waterfalling; I guess this is the part where it's trying to be a waterfall, but it's still pretty good. I'm not going to criticize the model too much. It still was able to add all of this, and creating a voxel environment is a little bit tough. So, what this created is pretty good. And we can actually see it called it the Voxel Sanctuary, a procedural exploration of architecture and nature. So, good. We can zoom in and zoom out. The zoomed-in features are nice as well. This is pretty detailed. Let's go into the temple. The temple looks like it's levitating. The pillars are not attached to the top, which is fine. And then this is our pond. We can see our fish here, looping back left and right. So, not bad at all. This is pretty good. If you enjoyed this
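As an aside on what "procedurally generated" means in a voxel scene like this: the generator fills a 3D grid cell by cell from simple rules, and the renderer then draws a cube for each filled cell. Here is a toy sketch of a stepped temple base in that style. All the names and rules here are mine; the model's actual generation code isn't shown in the video:

```javascript
// Build a boolean 3D voxel grid (levels × size × size) for a toy stepped
// temple: each level is a smaller filled square stacked on the one below.
function voxelTemple(size, levels) {
  const grid = [];
  for (let y = 0; y < levels; y++) {
    const inset = y; // each level steps in by one voxel on every side
    const layer = [];
    for (let x = 0; x < size; x++) {
      const row = [];
      for (let z = 0; z < size; z++) {
        const inside =
          x >= inset && x < size - inset &&
          z >= inset && z < size - inset;
        row.push(inside);
      }
      layer.push(row);
    }
    grid.push(layer);
  }
  return grid;
}

// Count solid voxels per level; counts shrink as the temple steps inward.
function levelCounts(grid) {
  return grid.map(layer => layer.flat().filter(Boolean).length);
}

const temple = voxelTemple(8, 4);
console.log(levelCounts(temple)); // → [64, 36, 16, 4]
```

A real scene generator layers many rules like this (pillars, water, trees, fish paths) over the same grid, which is also why glitches like floating pillars happen: each rule fills its own cells and nothing forces them to connect.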
9:10

Outro

video, this is what we do here: fast, clear updates on the biggest moves in AI. If you want to stay ahead of everything happening in this space, make sure you're subscribed. And if you want the hands-on side, demos, tools, workflows, and everything developers can actually build, check out World of AI. We also run a simple, no-noise newsletter that gives you the most important AI tools and updates in just a couple of minutes. Subscribe here. Follow World of AI. Join the newsletter.
