Want to make money and save time with AI? Get AI Coaching, Support & Courses 👉 https://www.skool.com/ai-profit-lab-7462/about
Get a FREE AI Course + 1000 NEW AI Agents + Video Notes 👉 https://www.skool.com/ai-seo-with-julian-goldie-1553/about
Want to know how I make videos like these? Join the AI Profit Boardroom → https://www.skool.com/ai-profit-lab-7462/about
Get a FREE AI SEO Strategy Session: https://go.juliangoldie.com/strategy-session?utm=julian
Sponsorship inquiries:
https://docs.google.com/document/d/1EgcoLtqJFF9s9MfJ2OtWzUe0UyKu1WeIryMiA_cs7AU/edit?tab=t.0
Exploring the New GLM 4.7 Flash: Fast, Free AI Coding Assistant
In this video, we dive into the newly released GLM 4.7 Flash, a local coding AI agent and assistant. The video guides you through how to access and use GLM 4.7 Flash via platforms like Hugging Face, LM Studio, and Ollama. We run performance benchmarks, comparing it to other models like Qwen and GPT OSS, and look at its faster speeds and affordability. Despite challenges with running it locally due to hardware constraints, alternative deployment methods via OpenRouter are discussed. Additionally, links to resources and training materials in the AI Success Lab and AI Profit Boardroom communities are provided for further learning and AI development.
00:00 Introduction to GLM 4.7 Flash
00:35 Accessing GLM 4.7 Flash on Hugging Face
00:55 Benchmark Comparisons
01:49 Using GLM 4.7 Flash in LM Studio
02:28 Installation and Setup
03:54 Challenges and Limitations
09:44 Community and Resources
11:33 Exploring Alternatives
We have a brand new release of GLM 4.7 Flash today, a local coding AI agent and assistant. You can see here it's comparable to models like GPT OSS or Qwen. Now, I've used GLM 4.7 already in the chat: there's a really good agent available for free at chat.z.ai using GLM 4.7, but this one is different because it's faster and more affordable. In fact, you can actually run it for free. So I'm going to show you exactly how to use it today, and we'll test it out side by side. So, you
can get access to this on Hugging Face, as you can see right here. For example, you can go to the inference provider and test it out. If we type in a prompt like "Okay, are you working?", it will come back to us using GLM 4.7 Flash. You can also see the
benchmark evaluations. In terms of the benchmarks here, let's compare it against Qwen. You can see it's outperforming Qwen by quite a long way, and it's outperforming GPT OSS, the open-source model from OpenAI, by quite a long way on pretty much all of the benchmarks. They've said it's "setting a new standard for the 30B class: GLM 4.7 balances high performance and efficiency," and so on. And we're going to test it out. G says, "Can you please link the toolkit you made a video about?" Yes. If you want all the links, they're always inside the AI Success Lab, which is a free community. The reason I don't put the links on YouTube is that it's easier to bunch everything in there. Then you can type in whatever you want. For example, if you wanted to learn about GLM 4.5, you can just type that into the archive and get all the links and resources right there. So that's inside the AI Success Lab, link in the comments and description. You can also see it's available
on LM Studio, as you can see right here. So let's test this out. LM Studio is a way to host free local models, and it comes with a nice UI. So we download LM Studio and install it, like you can see, and once that's set up, we can download the model right here. That is a huge file, by the way, about 16 GB, so it might take a while to download, but you can see how we get access to it inside LM Studio. GLM 4.7 Flash is available right here, and you can begin to use it inside LM Studio for
free. Also, if you get the pre-release of Ollama, this new update, you actually get access to GLM 4.7 Flash, as you can see right here. So if you want access to this, let's try and install it. What we do is download the latest version of Ollama, and from there we can install GLM 4.7 Flash, but the problem is it might take an hour to download. While that's downloading, let's see what questions we've got. Oh, this is just nearly done. Let's go. All right, here we go. So let's check that: 0.14.3. There we go, it's downloaded now. So if you want it running on Ollama, go to version 0.14.3 of Ollama and download the zip if you're on Mac. Then run it once you've downloaded it, and make sure you're on 0.14.3. Then, inside the terminal, you just run this command: ollama run glm-4.7-flash. And then you can get access to the model directly right there as well. It is a huge model, so it's going to take a little while to download, but that's essentially how you get access. There is GLM 4.7 inside OpenCode as well, but I can't see 4.7 Flash. Maybe we have to update here. Let's see. Can't see it in there. Let's have a look at the comments, see what we've got. "Why is bro dancing walking? Why?" To get the steps in and stay in shape. Why not? What are we doing right now? Can you run
a quantized version of GLM 4.7 Flash? Yes, you can run it on Ollama and LM Studio, like we were talking about. "By the way, 4.6 Flash is trash." I've not tried that one. Let's see. Team Insights says, "Please stop doing AI videos." Honestly, if you look at the views across every single channel, whether it's Facebook, Reddit, Twitter, or YouTube, the videos that tend to perform best are my avatar videos, and typically my human videos perform worse. In general, people vote with their views and their clicks, and AI avatar videos tend to outperform the human version of me, sadly. What else have we got? "How can you type whilst walking? Very nice setup." Thank you very much. I'm on a treadmill, yes, getting the steps in. And we've got a new member as well. Welcome, Chlo, thank you very much for joining. So, you can see that new model is loading right here. It's taking a little while. There we go, we've got GLM Flash now running on LM Studio. So if we go down here, we'll select the model and try GLM Flash. It says "Model loading was stopped due to insufficient resources. Continuing to load the model would likely overload your system." So I actually can't run it locally, sadly. Just so you're aware, I'm on a Mac Mini with an Apple M4 chip. I can't run it locally, which is annoying, but better than crashing during a live stream. And you can see here the 4-bit quant runs fast on a base M5 32 GB laptop. So I think if you have a powerful laptop, this is going to work nicely, but if you're just using a normal one, it's going to be pretty hard to use. The only place you can really start using it directly, then, is on Hugging Face. You can use it online; they've got an inference provider right here, so you can just test it out in the chat there and see how it performs.
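If you do manage to load the model, both LM Studio and Ollama expose an OpenAI-compatible chat endpoint on localhost, so you can script against it instead of using the UI. Here's a minimal sketch; the default ports and the model name `glm-4.7-flash` are assumptions, so copy the exact identifier your app shows once the download finishes.

```python
import json
import urllib.request

# Assumed default local endpoints -- check your app's server settings.
LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions"
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

# Hypothetical model identifier -- use the exact name shown in LM Studio
# or by `ollama list` after the download completes.
MODEL = "glm-4.7-flash"

def build_request(prompt: str, model: str = MODEL) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def ask(prompt: str, url: str = LMSTUDIO_URL) -> str:
    """POST the prompt to the local server and return the reply text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Point `ask()` at `OLLAMA_URL` instead if you're serving through Ollama; the payload shape is the same either way.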
But for local deployment, that's going to be pretty tough unless you've got a super powerful laptop, from what I can see. I might be wrong, but that's just what I've seen so far. Let's see how other people are using it. Oh, there we go. We can actually use it on OpenRouter as well. So if you don't want to use it locally, or you don't have the power to, you can just load it via the API on OpenRouter. And then, if you want to start coding with it, just load up your favorite IDE like this, and we can see if we can use it inside Cline. Let's have a look. Obviously, if you're not running it locally and you're running it with OpenRouter, there's going to be a cost, but it is super cheap. Look at that: $0.07 per million input tokens, with a 200,000-token context. Not bad. If you wanted to run it locally instead, you'd select Ollama, select the model, and go from there. If we want to use it with OpenRouter, I think we can use it like this. So let's just try this out. Yeah, you can select GLM 4.7 Flash like this, and then you just get an OpenRouter API key. So let's find that. Once you've got your API key, just use it inside your IDE; you can plug it in right here in the settings of Cline and hit done. And now we have GLM 4.7 Flash running. Right, so just to test this out, we'll say "create an SEO calculator" and see how it performs, and also how much it costs to build out the project. It's a reasoning model, so it's using thinking mode with a 200K token context. A little bit slow, but maybe that's because it's just been released; it should be fast, to be fair. Also, if we have a look inside the terminal, we now have GLM 4.7 Flash running with Ollama. I do think that's going to be very slow on my laptop if I try and ask it something, but we'll try this out.
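For anyone who'd rather call the OpenRouter route from code instead of an IDE, here's a minimal sketch. OpenRouter's endpoint is OpenAI-compatible; the model slug `z-ai/glm-4.7-flash` is an assumption, so copy the exact slug from the model's page on openrouter.ai, and the cost helper just applies the $0.07-per-million-input-tokens rate quoted above.

```python
import json
import os
import urllib.request

API_URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL_SLUG = "z-ai/glm-4.7-flash"  # hypothetical slug -- verify on openrouter.ai

def build_request(prompt: str) -> dict:
    """OpenAI-style chat payload for OpenRouter."""
    return {
        "model": MODEL_SLUG,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt: str) -> str:
    """Send the prompt; expects your key in the OPENROUTER_API_KEY env var."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(prompt)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

def input_cost_usd(input_tokens: int, price_per_million: float = 0.07) -> float:
    """Rough input-side cost at $0.07 per million input tokens."""
    return input_tokens / 1_000_000 * price_per_million
```

At that rate, even filling the entire 200K context on every request costs about 1.4 cents of input tokens per call, which is why "super cheap" is fair.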
So, you can see I've typed in the prompt right there for Ollama, and it doesn't seem to be working on the API, which is weird. So it's not working in the terminal or on the API. Good start. It worked on Hugging Face, but it was quite slow to respond. Ashkit says, "Love your content." Thank you very much, sir. Odoro says, "Just arrived. What are we building?" We're waiting for GLM 4.7 to work. It is very slow, though. Look at that. It's cheap, but this is nowhere near the speed I would expect from a Flash model. And Clo said, "Love what you're building." Thanks very much for joining; I'm glad you're in the boardroom, too. I actually had to shut down Ollama because it was being so slow, so I'm going to close these tabs. Yeah, this is way too slow in Antigravity as well. Look at that, it's lagging my whole computer. Overall, not an ideal model to use locally if you don't have a powerful laptop, I would say. Let's go to the chat and see how it performs there. I think next time I upgrade or get a new laptop, I'm going to have to get some serious RAM in it. So let's try this out: Opus versus Flash, as you can see right here. I'm going to shut down Ollama, shut down LM Studio, and try to get it working again. So now we have GLM 4.7 Flash working here and Claude Opus over here. And let's have a look. Looks like it totally failed on me. It's funny when you see this, because I think most YouTubers would never show you this, but we've tried it on pretty much everything: LM Studio, Antigravity with the API key for OpenRouter, and OpenRouter inside the chat itself, and it's just not working anywhere I've tried it. So maybe it's just been released and it's still slow, or everyone's rinsing it. But so far, I would never use this model. You're going to get way better results if you just go to chat.z.ai, select GLM 4.
7, and then build something out right there. So, for example, let's say you want to build a website. You can see some of the stuff we built previously using GLM 4.7, not the Flash version, and look at that: the UI looks really nice, the website looks nice. We literally just gave it a prompt to create a beautiful landing page based on some information we plugged in, and it created a much nicer page. So I would still say, for 99% of people watching, unless you have a super powerful laptop or you're really insistent on running local models, you just wouldn't use GLM 4.7 Flash. It doesn't make sense. So at least I've been transparent and shown you what works and what doesn't. If you do want to see my
training on how to use GLM 4.7, you can get that inside the AI Success Lab, completely free, link in the comments and description. You can see we have loads of trainings on how to use this for AI SEO and how to build things with it, plus all our best resources and step-by-step tutorials. So I'd recommend starting there if you want to learn this stuff. It's a completely free community that connects you with 47,000 people, as you can see. And then, if you haven't already, check out the AI Profit Boardroom, link in the comments and description. This is an amazing community full of serious AI builders who are focused on growing, learning, and scaling with AI automation together, where you have a daily accountability group, like you can see, where you can check in, post your goals, avoid information overload, shiny object syndrome, and overwhelm, and just focus on your goals for the day. You can also post inside the community and get help and support whenever you need it. And you can see it's not just me winning with this stuff: loads of people are getting results with AI automation, and they share their progress inside the wins section of the AI Profit Boardroom. On top of that, you get four coaching calls per week, so you can jump on these live, get help, and get support whenever you need it. You'll also get my best trainings inside the classroom. For example, there's the six-week AI automation masterclass that shows you how to go from complete beginner to expert with AI, plus how to build your first AI agent in under five minutes. On top of that, you'll get my best playbooks for automations on X (Twitter) and AI avatar videos, and you'll learn how to get more clients for your agency using the agency course. Additionally, you'll learn how to rank number one inside AI search engines, how to grow a YouTube channel, and what's working for me right now.
So, feel free to grab the AI Profit Boardroom link in the comments and description.
Stark DevOps says, "Have you tried Cerebras?" I've not tried that. Let's try it out now and see if it's got GLM 4.7 on there. So, it's got GLM 4.7, but it doesn't seem to have Flash, which is what we're looking for. So even on Cerebras, you don't seem to be able to get access to GLM 4.7 Flash, which is what we're looking to test out.