GLM-4.6 Beats Opus 4.1 Already For Cheaper?!

Ray Amjad · 09.10.2025 · 6,654 views · 132 likes · updated 18.02.2026
Video description
Level up with my Claude Code Masterclass 👉 https://www.masterclaudecode.com/ Learn the AI I'm learning with my newsletter 👉 https://newsletter.rayamjad.com/ Got any questions? DM me on Instagram 👉 https://www.instagram.com/theramjad/ 🎙️ Sign up to the HyperWhisper Windows Waitlist 👉 https://forms.gle/yCuqmEUrfKKnd6sN7 Since I've never accepted a sponsor, my videos are made possible by... —— MY CLASSES —— 🚀 Claude Code Masterclass: https://www.masterclaudecode.com/?utm_source=youtube&utm_campaign=XUCOUoJnZn4 - Use coupon code YEAR2026 for 35% off —— MY APPS —— 🎙️ HyperWhisper, write 5x faster with your voice: https://www.hyperwhisper.com/?utm_source=youtube&utm_campaign=XUCOUoJnZn4 - Use coupon code YEAR2026 for 35% off 📲 Tensor AI: Never Miss the AI News - on iOS: https://apps.apple.com/us/app/ai-news-tensor-ai/id6746403746 - on Android: https://play.google.com/store/apps/details?id=app.tensorai.tensorai - 100% FREE 📹 VidTempla, Manage YouTube Descriptions at Scale: http://vidtempla.com/?utm_source=youtube&utm_campaign=XUCOUoJnZn4 💬 AgentStack, AI agents for customer support and sales: https://www.agentstack.build/?utm_source=youtube&utm_campaign=XUCOUoJnZn4 - Request private beta by emailing r@rayamjad.com ————— CONNECT WITH ME 🐦 X: https://x.com/@theramjad 👥 LinkedIn: https://www.linkedin.com/in/rayamjad/ 📸 Instagram: https://www.instagram.com/theramjad/ 🌍 My website/blog: https://www.rayamjad.com/ ————— Links: - https://x.com/FactoryAI/status/1975643644326793619 - https://docs.factory.ai/cli/byok/overview - https://x.com/FactoryAI/status/1971271087855186128 - https://x.com/aeitroc/status/1975564528634249492?s=12 - https://x.com/ronaldmannak/status/1976159065479999989 Timestamps: 00:00 - Intro 00:58 - Task 1: i18n 03:37 - Task 2: Bug Hunting 05:19 - Task 3 & My Learnings 08:25 - Concluding Thoughts

Table of contents (5 segments)

  1. 0:00 Intro (208 words)
  2. 0:58 Task 1: i18n (562 words)
  3. 3:37 Task 2: Bug Hunting (386 words)
  4. 5:19 Task 3 & My Learnings (644 words)
  5. 8:25 Concluding Thoughts (456 words)
0:00

Intro

So, a lot of folks have been talking about GLM 4.6 and its cost effectiveness recently. And now it turns out you can use it with Droid alongside any other open-source model. Droid is basically a newer agentic CLI tool that seems to perform better than the same model on Claude Code and Codex CLI. You can see on Terminal-Bench, it gets higher scores with Opus 4.1, GPT-5, and Sonnet 4 when using the same model. And now, using GLM 4.6 with Droid, you get the same score on Terminal-Bench as using Opus 4.1 in Claude Code. Droid is not sponsoring the video or anything. I don't accept sponsors on the channel, but these videos are made possible by the people who buy my products using links in the description down below. Anyways, after I saw the post, I thought it was pretty exciting and I wanted to try it myself, because now you can basically bring your own custom models and use any of the providers such as Fireworks AI, DeepInfra, Ollama, OpenRouter, and so forth. And you would define it in your config file kind of like this, as you can see here.
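The config itself isn't reproduced in the transcript, but based on Factory's BYOK docs (linked in the description), a custom model entry in `~/.factory/config.json` looks roughly like this. Treat the field values as placeholders; the `base_url` and key shown here are illustrative, not verified:

```json
{
  "custom_models": [
    {
      "model_display_name": "GLM 4.6 (Z.ai)",
      "model": "glm-4.6",
      "base_url": "https://api.z.ai/api/coding/paas/v4",
      "api_key": "YOUR_ZAI_API_KEY",
      "provider": "generic-chat-completion-api"
    }
  ]
}
```

Any OpenAI-compatible chat-completions endpoint (Fireworks, DeepInfra, Ollama, OpenRouter) can be slotted in the same way by swapping `base_url` and `model`.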
0:58

Task 1: i18n

So basically, to compare the two, I ran GLM 4.6 on opencode and also with Droid. With opencode it used my subscription from Z.ai that you can see here, and with Droid it just used their own version of GLM 4.6 that they have running. Because when you run Droid and then select the model, you can see that you have a Factory-hosted GLM 4.6 costing 1/4 of the standard token count, and they're running it on their own cluster of H100 GPUs. I know this because I asked one of their team where it is running, since I was concerned about the location, and it's running in the US on their own H100 cluster. Anyway, the first task that I got it to do is for my application HyperWhisper; there's a coupon code down below. It's basically a speech-to-text tool that you can see here. I started getting sales from Japan organically and I was like, "Hey, I should translate this application to Japanese now." Because this is a SwiftUI application, I opened opencode and Droid, gave them the same prompt, and basically said: look through the entire application and replace any hard-coded strings and text with the key-value equivalent, and then make a file that contains all the key-value pairs for the translations. And this is a pretty long task, because it has to go through every single file, see wherever there are hardcoded values, and then translate those and put them in a new file. Just comparing the timestamps from when it started to when it finished, opencode took about 26 minutes in total, whereas Droid, running their own version of GLM 4.6, took 12 minutes to complete the same task. But the interesting thing is that when comparing the code of the two solutions, the opencode version actually localized or translated more of the strings in the application: you can see there are over 450 lines, whereas the Droid version only seems to have translated about 20% of that.
But I guess that does make sense, because the opencode version did run for longer. The interesting thing is that Droid integrated it better with the project: it made a brand-new settings page where you can change the language as well, and that settings page is right over here. It also added the language to the list of known regions and did a bunch of other stuff that integrates it better with the application. But I have found Droid to be generally better at using SwiftUI. You can see an example over here where this person made a new application using SwiftUI, 100% coded by Droid, and they just ran into fewer problems with Swift 6.2 and the concurrency that was introduced in it. I found this to be true in my experience as well: when making SwiftUI changes, Droid often compiles on the first try, compared to Claude Code, which has to take a couple of tries to compile, because I think it has an older understanding of SwiftUI and just is not as strong when it comes to making
3:37

Task 2: Bug Hunting

changes to do with it. Anyways, another thing that I got Droid and opencode to do with GLM 4.6 is find some bugs in my application Tensor AI. If you don't know, that's my free AI news application that you can see on the App Store. There will be a link down below to download it, and it has some pretty good reviews, so you may want to check it out. Like someone said: "If you have experienced love at first sight, this was it for me." So they particularly enjoyed the application. Anyway, I basically gave the same prompt to both of them and said: review any uncommitted changes, find any potential bugs and issues, and group them by severity. The interesting thing is that whilst they do find some of the same issues, for example a race condition over here and over here on the right-hand side, they also find a bunch of different issues, and they group them by different severities. And it's quite interesting because the opencode version stays in English the entire time with Z.ai, whereas the Droid version seems to randomly switch to Chinese in some cases, like over here. But yeah, I think the behavior is just a bit interesting. I'd probably combine the two together and then get another agent to check, just to make sure these are actually severe issues, and only fix the most important ones before making the next update. But I think I would trust the Droid version with the implementation itself more, because of the implementation it did with the SwiftUI project. In this particular case, though, when it comes to finding issues and bugs in a project, the result is very heavily influenced by the system prompt that's given to the model. So I suppose the opencode system prompt and the Droid system prompt are different, and if you were to give the same GLM 4.6 model to Roo Code, or had it running in Claude Code, and told it to find bugs and issues, then it would come up with a different list as well, with some overlapping issues again. And
5:19

Task 3 & My Learnings

finally, something I got both of them to do is basically come up with a brand-new application which implements Sora 2, allowing me to generate videos. It has SQLite in the back end with Prisma ORM, it uses tRPC, and it uses a bunch of other things. And then I gave it the docs for Sora 2 as well. I gave the same prompt to both of them just to be fair, but I haven't checked the solution yet. One thing I do notice is that the opencode version with Z.ai as a back end took 27 minutes, whereas the Droid version took 15 minutes and 4 seconds instead. But yeah, let's test out both versions to see how they perform. Now, this is like a simple Sora video-generating application, and this is the opencode version. So I'm going to say something like "cat eating pasta on an airplane" and then see what it comes up with. And now it says "failed to generate video". I'm not exactly sure why, so let's just double check. And it seems that opencode did not implement the API successfully when using the model. So I'll just paste this back in and then hopefully it fixes it. And now let's try the Droid version whilst we're waiting. So I do npm run dev inside of here; this will spin up the Droid version. Let's paste in the same API key. And now, if we try the Droid version with the exact same prompt, you can see that it has all the same settings and a pretty similar design, as you would expect from GLM 4.6, and then press "generate one video". It actually seems to be generating, and we'll see what happens once it's done. But at least when it comes to using GLM 4.6 in opencode, you can see it's really struggling with the types over here, because it just keeps making the same edit over and over again. So that's a bit unfortunate, sadly.
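For reference, here is a minimal sketch of what a working Sora 2 generation request might look like, the part the opencode version apparently got wrong. It targets OpenAI's `/v1/videos` endpoint; the exact field names (`size`, `seconds`) should be checked against the current API docs, and `buildVideoRequest`/`generateVideo` are my own helper names, not anything from the video:

```typescript
// Shape of the request body for a Sora 2 video generation job.
interface VideoRequest {
  model: string;
  prompt: string;
  size?: string;    // e.g. "1280x720" (assumed parameter name)
  seconds?: string; // clip length, e.g. "4" (assumed parameter name)
}

// Pure helper: build the JSON body the endpoint expects.
function buildVideoRequest(prompt: string): VideoRequest {
  return { model: "sora-2", prompt, size: "1280x720", seconds: "4" };
}

// Fire the request. Video generation is asynchronous: the response is a
// job object whose status must be polled until the video is ready.
async function generateVideo(prompt: string, apiKey: string): Promise<unknown> {
  const res = await fetch("https://api.openai.com/v1/videos", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildVideoRequest(prompt)),
  });
  if (!res.ok) throw new Error(`Sora request failed: ${res.status}`);
  return res.json();
}
```

The "failed to generate video" error in the opencode build is consistent with treating this as a synchronous call, or sending a malformed body, rather than creating a job and polling it.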
So I think basically what this teaches me is that not only does the model matter, but also the agentic loop that the model is inside. The Factory team clearly figured out something when it comes to designing an efficient agentic loop for each of their models, because they seem to perform better with the same model compared to Codex CLI and Claude Code. So when talking about a model with someone and saying this model sucks or this model is significantly better, you also want to consider the agentic environment, or the loop that it's running in, whether that's Kilo Code, opencode, Cursor, Droid, or something else entirely. Anyways, it now seems the opencode version is done again, and it launched the server for us, so let's open up the server. Okay, so I ran the opencode version again after restarting the server, put in the same prompt, generated videos, and it gave me the same error again. Even though opencode tried to fix it, it didn't get there. So I think GLM 4.6 within opencode kind of sucks, because the agentic loop within opencode probably is not optimized for this model, or is just not as good overall compared to, say, Droid. Whereas now, rerunning the Droid server, it seems to be working; it's processing the video. If I press download, the video preview is kind of broken over here, but if I click on the video after downloading it, you can see it actually is a cat eating pasta on an airplane. And I thought it was going to be using the fork for some reason, but I should have specified that in my prompt. And yeah, so basically
8:25

Concluding Thoughts

Droid gets further along with GLM 4.6 compared to opencode. And I guess a conclusion or takeaway from this video is that you want to consider what agentic loop is running when those models are being used. It seems that Droid is scoring pretty well for the same models compared to the actual providers of those models, like Claude Code and Codex CLI. And yeah, my mind has kind of changed on this recently. I always thought to myself that the model provider always has the best agentic loop designed for their model, to maximize its performance when it comes to coding. But now that's no longer the case, because it seems that you can get better performance from the same model with a better-optimized agentic loop. And given two options, where you have, say, OpenAI, who are doing a million other things besides Codex CLI, and a company that's solely focused on providing a really good agentic loop, like Cursor or Droid or something else, for any models that are available in their program, I think in many cases it can be better to go for the latter. I really hope that in the future there is an easy way to compare the agentic loop and how good it is across different benchmarks for the same model. But we'll have to see if that ever happens. But yeah, as a final plug, there's no real point vibe coding applications if you're not also getting customers for them. I teach a lot to do with that in my AI Skool community, because I go through how to actually vibe code good applications that you can get paying users for. I cover a lot of distribution strategies in my vibe coding section over here, also coming up with ideas that are worth paying for, and distributing with AI. And there's a bunch of case studies as well. And of course, if you want to master Codex and Claude Code, then I have every single feature covered within these two classes here.
And I will slowly be making a class on Droid as well, because I'm using Droid more and more lately. As you can see here, a bunch of people have already joined the community and are now running their own profitable applications after learning from the classes and from each other. And one advantage of joining is that you can basically ask me any single question, and you get free access to the applications I make and any applications I make in the future as well.
