Cloud vs Local LLMs for Codex/Claude Code - The Truth You Need To Know



Gary Explains 30.04.2026 49,488 views 1,931 likes


Video description

Is it possible to use tools like Codex or Claude Code with a local LLM? If you can, the advantages would be massive: no expensive monthly subscription and no fear of running out of tokens. --- ⭐ Please support my channel on Patreon! Get early access to videos, members-only content, behind-the-scenes updates, and join the Gary Explains Discord! Join here 👉 https://www.patreon.com/GaryExplains 🙌 Twitter: https://twitter.com/garyexplains Instagram: https://www.instagram.com/garyexplains/ #garyexplains

Contents (2 segments)

Segment 1 (00:00 - 05:00)

So, large language models that you can run locally on your PC have improved vastly over the last year. In fact, I've recently done a review of Qwen 3.6 and Gemma 4, and they are both pretty impressive. However, that now leads to the question: can I use these local models that I run on my PC with tools like Codex or Claude Code, so that I can avoid paying a large subscription to OpenAI or whoever, and also avoid the problem of running out of tokens while I'm coding? Well, that's what I want to look at in this video. So, if you want to find out more, please let me explain.

Okay, so let's dive into this then. So, what's my setup? I'm using Ollama. You can also use LM Studio, and Ollama comes with a built-in way of launching Codex from OpenAI. So, I'm using OpenAI's Codex with Ollama, and you can specify which model you want to use. I've been trying three models. In fact, I do try four in the end. I've got Gemma 4 26 billion running on an RTX 5090, and I get 117 tokens a second out of that. I've got Qwen 3.6 35 billion also running on the RTX 5090, and I get 140 tokens a second out of that. And then I've got Qwen 3.6 35 billion parameters again, but this time on a Jetson 4. Now, with that I only get 36 tokens a second. That's the difference between unified memory and VRAM. But the reason I wanted to try the Jetson 4 is that it's got a 256K context window, compared to a 64K context window on the other two. I wanted to see if that made a difference.

So, what's the task? Well, the task is to build a language interpreter, like Python or Lua, but for a language called New Scrippy, which I basically invented. You can see some New Scrippy here on the right-hand side. The way it works is very familiar from other C-inspired interpreters. What I did is give Codex this readme file and then prompt it to implement it.
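The tokens-per-second figures quoted above can be derived from the timing fields that Ollama's REST API returns with a finished generation. The following is a minimal sketch, assuming the `eval_count` and `eval_duration` (nanoseconds) fields of an `/api/generate` response. The sample response and the 8,400-token output size are made-up illustration values, not measurements from the video; only the 140 and 36 tokens-per-second rates come from the transcript.

```python
def tokens_per_second(response: dict) -> float:
    """Generation speed from the final body of an Ollama /api/generate call.

    eval_count    -> number of tokens generated
    eval_duration -> time spent generating them, in nanoseconds
    """
    return response["eval_count"] * 1e9 / response["eval_duration"]


def minutes_to_generate(num_tokens: int, tok_per_s: float) -> float:
    """Wall-clock minutes needed to emit num_tokens at a steady rate."""
    return num_tokens / tok_per_s / 60


# Hypothetical response body, trimmed to the two fields used here.
sample = {"eval_count": 1170, "eval_duration": 10_000_000_000}
print(f"{tokens_per_second(sample):.0f} tokens/s")

# Rates quoted in the video; the output size is an assumption.
for label, rate in [("RTX 5090", 140), ("Jetson", 36)]:
    print(f"{label}: {minutes_to_generate(8400, rate):.1f} min for 8,400 tokens")
```

Raw generation speed only bounds how long a run takes. It says nothing about whether the model makes progress, which is the problem the rest of the video runs into.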
So, the project is a script interpreter for a simple language called New Scrippy. It's typeless. There are no curly brackets and no semicolons. The end keyword is used to delimit blocks, so you get if ... end, while ... end, etc. The interpreter will be in C. It needs a tokenizer, and it needs to use an AST. Now, I won't go into how you write interpreters in this video, but basically, if you want one that's more than just really simple, one that you can expand on and add things to, you need a tokenizer and an abstract syntax tree, which is kind of the plumbing that you need to get this working. Here is an example script. So, then I just gave it an example script, a longer one than that one, because I couldn't fit it all on the slide, but it's about twice as long as that, and it shows all the different things you can do. And then I say, "Go ahead and implement that interpreter, please."

Now, I tried it using the real Codex with GPT 5.5. It took 6 minutes on GPT 5.5, and it built a complete, functioning interpreter. It was able to run that test program. I then prompted it to write additional New Scrippy programs to stress test various edge cases, which it did, and all of those passed or failed according to what they were meant to do. So, this is something that a frontier model can do, and it's not a piece of technology that's unknown. I mean, there have been plenty of interpreters written, and there's plenty of example source code and plenty of documentation out there. So, this should be something that these models can handle.

So, I tried that, based on that readme file and a bit of prompting, on those models that I mentioned on my PC. Unfortunately, the results were not very good. On Gemma 4 26 billion running on my PC, I had to keep typing in "continue": it would do something and then stop, and I'd say, "Continue." And then in the end, it just got stuck. It just kept saying, "I will use shell. I will use shell." And it just kept on going round and round, or it kept saying, "I will list the programs. I will list the programs." It just kept on getting stuck. So, I would stop it. I would then resume the session. I would try to restart it, or I would restart it and say, "Where are you up to? What's the status? Try again." I tried as many different ways as I could to get it free of that loop, but it all failed. So, after several attempts, I abandoned it and just said, "No, it can't do it."

Now, Qwen 3.6 35 billion did write some code, which was good. A simple New Scrippy program with three or four lines, print this and define a few variables, would work, but more complicated stuff had bugs. And what actually happened is that the AI knew that, and it was trying to run it and test it. It would run it and see the bugs, it would try to fix them, and it just got stuck in a loop. It just kept saying, "So, if A is equal to B." And then it would go off and say, "So, if A is equal to B." And it would just go round and round. I left it for hours, and it just kept going round, with the same output coming out. No different. Again, I tried

Segment 2 (05:00 - 08:00)

to restart it a few times. It just didn't work, basically. Jetson 4, similar story. It got stuck. Didn't really write much code. Again, I had to restart to get it to try to continue, and I just abandoned it again.

So, basically, no, you can't use them locally to do something you can do with the real Codex or Claude Code. It's just not going to work. Even if you're prepared to wait 60 minutes or 2 hours or whatever, rather than 6 minutes, it can't do it. They just get stuck. They get bogged down. They don't know what they're doing.

So, I thought, "Well, what can they do? I mean, is it that bad?" So, I then went for simple New Scrippy. I said to it, "I want the simplest interpreter possible, including a tokenizer, that uses an abstract syntax tree and that can execute this: print 3 + 4." Now, that may not seem like a very complicated program, but you've got to understand that it's parsing a keyword, print, or built-in function. It has to handle an expression. It's got to know that you've got to add three and four together. The result of that has got to be printed out on the screen. So, although it's trivial compared to a full language, if you can do that, you've actually got a lot of the plumbing there for what you might need later on. That was the idea. Can it create the plumbing and create something that works?

Well, they all did work, so that's good. You got something out, and they only took a few minutes. So, 5 or 6 minutes, and you would get some code out of it. You were looking at between about 140 and 200 lines of code. For example, what Gemma came up with would only run a script with one line. It had nothing in there about processing the next line and the next line. That's fine. I asked it for the most simple thing. And it would only work with the plus operator. I then asked it to add minus, multiply, and divide as a second prompt, and it did do that, and that worked.
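The "plumbing" described above, a tokenizer feeding a parser that builds an abstract syntax tree, which is then evaluated, can be sketched in a few dozen lines. This is an illustrative sketch only: the interpreter in the video is written in C, Python is used here for brevity, and every name below (`tokenize`, `parse`, `Num`, `Add`, `Print`, `run`) is hypothetical rather than taken from the code the models produced. Like the first Gemma result, it handles a single line and only the plus operator.

```python
import re
from dataclasses import dataclass

# One token per match: an integer, a name such as "print", or "+".
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(\+))")


@dataclass
class Num:
    value: int


@dataclass
class Add:
    left: object
    right: object


@dataclass
class Print:
    expr: object


def tokenize(src):
    """Turn source text into a list of (kind, value) tuples."""
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected input at offset {pos}")
        number, name, plus = m.groups()
        if number is not None:
            tokens.append(("NUM", int(number)))
        elif name is not None:
            tokens.append(("NAME", name))
        else:
            tokens.append(("PLUS", plus))
        pos = m.end()
    return tokens


def parse(tokens):
    """Grammar: statement := 'print' expr ; expr := NUM ('+' NUM)*"""
    pos = 0

    def expect(kind):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            raise SyntaxError(f"expected {kind} at token {pos}")
        tok = tokens[pos]
        pos += 1
        return tok

    if expect("NAME")[1] != "print":
        raise SyntaxError("only 'print' is supported")
    node = Num(expect("NUM")[1])
    while pos < len(tokens):
        expect("PLUS")
        node = Add(node, Num(expect("NUM")[1]))
    return Print(node)


def evaluate(node):
    """Walk the expression tree and compute its value."""
    if isinstance(node, Num):
        return node.value
    if isinstance(node, Add):
        return evaluate(node.left) + evaluate(node.right)
    raise TypeError(f"cannot evaluate {node!r}")


def run(src):
    ast = parse(tokenize(src))
    result = evaluate(ast.expr)
    print(result)
    return result


run("print 3 + 4")  # prints 7
```

Keeping the tree as a separate step, rather than adding numbers while scanning, is what leaves room to grow: new operators or statements become new node types and new branches in `evaluate`.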
So, you can kind of get the plumbing going here. Again, the same with Qwen 3.6. That worked. 170 lines of code. It actually opted to go for plus and minus the first time around. Again, it could only run one line of code.

Qwen 3.6 27 billion, that's the fourth model that I used. I thought I'd give that a try. And that worked again, in just a few minutes. 140 lines of code. It supported plus, minus, divide, and multiply out of the box, even though I didn't ask it to do that. But the precedence was wrong. If you typed in a more complicated one where, you know, you can see it's tricky, it executed left to right. It didn't get it right in terms of the precedence.

Okay, so there you have it. Not quite the result I was hoping for. Up at the high end, where it's more complicated, it basically didn't work at all. And when you give it simpler tasks, well, in fact, they could have been one-shotted just using a prompt in a normal chat kind of setup. So, I'm a bit disappointed, to be honest.

Now, I'd love to hear your thoughts on this. Is this something you've tried? Is this something you use yourself? Do share your software and hardware configurations in the comments below.

Okay, that's it. My name is Gary Sims. This is Gary Explains. I really hope you enjoyed this video. If you did, please do give it a thumbs up. And if you like these kinds of videos, then why not stick around by subscribing to the channel? Okay, that's it. I'll see you in the next one.
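The precedence problem reported for the Qwen 3.6 27 billion result can be shown by evaluating the same expression two ways. The sketch below contrasts a plain left-to-right fold with precedence climbing, one common way to get operator precedence right. The video does not show the expression used or the generated code, so the test expression `3 + 4 * 2`, the `OPS` table, and all function names are assumptions made for illustration, and Python stands in for the C of the original project.

```python
import operator
import re

# Operator table: symbol -> (precedence level, function). Higher binds tighter.
OPS = {
    "+": (1, operator.add),
    "-": (1, operator.sub),
    "*": (2, operator.mul),
    "/": (2, operator.truediv),
}


def tokenize(src):
    """Split into integers and operator symbols, e.g. [3, '+', 4, '*', 2]."""
    return [int(t) if t.isdigit() else t for t in re.findall(r"\d+|[-+*/]", src)]


def eval_left_to_right(tokens):
    """The buggy behaviour: apply each operator as soon as it is seen."""
    result = tokens[0]
    for i in range(1, len(tokens), 2):
        result = OPS[tokens[i]][1](result, tokens[i + 1])
    return result


def eval_with_precedence(tokens, min_prec=1):
    """Precedence climbing. Consumes tokens from the front of the list."""
    left = tokens.pop(0)
    while tokens and OPS[tokens[0]][0] >= min_prec:
        prec, fn = OPS[tokens.pop(0)]
        # The right operand absorbs any operators that bind tighter than this one.
        right = eval_with_precedence(tokens, prec + 1)
        left = fn(left, right)
    return left


expr = "3 + 4 * 2"
print(eval_left_to_right(tokenize(expr)))    # prints 14, i.e. (3 + 4) * 2
print(eval_with_precedence(tokenize(expr)))  # prints 11, i.e. 3 + (4 * 2)
```

Passing `prec + 1` into the recursive call keeps operators of equal precedence left-associative, so `10 - 2 - 3` still evaluates to 5.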
