Can You Replace Claude Code/Codex with OpenCode and a Local LLM?
8:53


Gary Explains 07.05.2026 15,297 views 793 likes


Video description
It is a simple question: can you replace Claude Code/Codex with OpenCode and a local LLM? 🤔 But it is quite hard to answer. In this video I test OpenCode with 3 different local AI models and see if they can do the same tasks as the big coding agents. --- ⭐ Please support my channel on Patreon! Get early access to videos, members-only content, behind-the-scenes updates, and join the Gary Explains Discord! Join here 👉 https://www.patreon.com/GaryExplains 🙌 Twitter: https://twitter.com/garyexplains Instagram: https://www.instagram.com/garyexplains/ #garyexplains

Contents (2 segments)

Segment 1 (00:00 - 05:00)

So, I recently did a video where I tried to see if I could use Claude Code or Codex with a local large language model, an AI model running on your PC, and therefore save you a whole bunch of money in subscription fees to OpenAI and so on. Now, that test clearly showed you can't do that. It doesn't work, and I've got a whole video about that if you want to go and check it out. But there are alternatives. For example, there is an open-source product called OpenCode, and that's really a kind of equivalent to Claude Code or Codex. It's a coding agent, but it's kind of optimized for running on local large language models. So, in this video, I want to see whether I can write the same program that I was trying to write with Codex with a local large language model, but this time using OpenCode. I think you'll see that the results are quite surprising. So, if you want to find out more, please let me explain.

Now, to jump straight to the point, the answer is a bit nuanced. It's technically yes, practically no, and I'm going to unpack what I mean by that in the rest of this video.

Okay, so let's dive into the testing that I've done. I'm using Ollama again. It has this launch feature, so I can say Ollama launch OpenCode, and then give it the name of the model, and it will sort out all the configuration. You have to install OpenCode before that, but then Ollama handles it all. I've tried three different setups here. One is Qwen 3.6 35B on a Jetson Thor. That's with 256K of context, and it gives me 36 tokens a second. I then also wanted to try the 27 billion parameter one on an RTX 5090, and that was giving me 13 tokens a second. Remember, both of these failed to produce anything workable when used with Codex or Claude Code, again using this Ollama launch command. And then I tried the Qwen 3.6 35 billion Q8. So, the first two are both Q4, and we'll talk more about this in a second. This one is Q8, and that gives me seven tokens a second, again with 256K of context.
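To put those token rates in perspective, here is a back-of-the-envelope sketch in Python. The rates are the ones quoted above; the token budget of 200,000 is a made-up figure for illustration, not a measurement from the video, and the function name is hypothetical.

```python
def hours_to_generate(tokens: int, tokens_per_second: float) -> float:
    """Wall-clock hours needed to emit `tokens` at a steady generation rate."""
    return tokens / tokens_per_second / 3600


# Generation rates quoted in the video, in tokens per second.
RATES = {
    "Qwen 3.6 35B Q4 (Jetson Thor)": 36,
    "Qwen 3.6 27B Q4 (RTX 5090)": 13,
    "Q8 run": 7,
}

# Hypothetical number of tokens an agent session might generate (assumption).
TOKEN_BUDGET = 200_000

for label, rate in RATES.items():
    print(f"{label}: {hours_to_generate(TOKEN_BUDGET, rate):.1f} h")
```

This only counts generated tokens. A coding agent also re-reads a large context on every step, so real sessions would likely take longer than this estimate suggests.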
So, OpenCode is an AI coding agent. It looks similar to the terminal versions of Claude Code and of Codex. You basically start in here, and you can say things like, "Read the README file." Then, you can say, "Create a plan to implement it." You just talk to it, and it goes away, and it might go away for a long time doing all the different tasks it's got to do.

So, what is the task? Same as the previous video. I've got a language I invented, New Scrippy, and I want an interpreter for it. So, this project is a script interpreter for a simple language called New Scrippy. It's typeless. There are no curly brackets, no semicolons. And I've actually got the full README file on my GitHub repository if you'd like to take a look at that. The example code that I give is actually much longer than this, because I cover many, many aspects of the language inside that example program, but this is a good section here to show you the kind of thing it's got to be able to do. It's got to be able to do functions, assignments of integers, of strings. It's got to be able to do if statements, all that kind of stuff.

Now, what's the competition? Well, when I used Codex with GPT-5.5, it took 6 minutes. It built a fully functional interpreter. It ran the example code. I asked it to create some example scripts that would test different cases. It did that as well, and everything worked perfectly. I tried exactly the same thing with Codex again, but using a local LLM instead of GPT-5.5. That's what the other video is about, and that failed miserably. You can see all about that in the other video.

So, what happened this time around? We're now using OpenCode and a local LLM. I got good results across the board. So, in each case, that's Qwen 3.6 35 billion, Qwen 3.6 27 billion, and the Q8, I got a working interpreter. It actually worked. It would run the example script that was in the README file, and it was absolutely great.
So, from that point of view, it's very, very different using OpenCode with a local LLM, because it's kind of optimized for that. You know, Codex and Claude expect their own models, basically. This one says, "Well, I'm going to be a bit more conservative here about what these models can and can't do." And it works. I got a working interpreter in all three cases.

However, I then got ChatGPT to do a code review. I said, "Hey, look at this code and tell me what you think of it." And it did pull up lots of problems. So, for example, for the 35 billion one, it said, "Well, the biggest problem is that it's reading the script in text mode." I didn't think that was so bad, but it then went on to explain that line endings differ between Unix or Linux and Windows: it's CRLF or LF, carriage return plus line feed, or just line feed. Anyway, it said the interpreter can make a mistake because of that. And there were some memory leaks. But generally, the code was pretty good. The 27 billion parameter one had several critical memory bugs that would even cause crashes. And it never implemented the division

Segment 2 (05:00 - 08:00)

operator. So, it had plus, minus, and multiply, and if you did a division, it would drop that into multiply. So, it's got some coding errors as well. And the 27 billion parameter version with the Q8 quantization produced much better code than the Q4 version. It had some memory leaks, and the review complained that runtime error reporting was not optimal. So, this one, the 27 billion Q8, and the 35 billion produced a similar quality of code, but they all had their own problems.

So, that's great. As I said, I got code out, and it's got some bugs, but it works according to the testing that I gave it to do. But what are the problems? Well, first of all, it's extremely slow. We're talking about seven tokens a second in one case. So, this thing took hours and hours. I'd often leave it running overnight. I'd have to prompt it again for the next thing, and then it would run for a few more hours, and so on, so it was a long slog. And if you compare that to the 6 minutes it took ChatGPT 5.5, you know, it's just not practical, really.

And as you saw, the output isn't perfect. Ignoring style issues and things like that, there were bugs, there were memory leaks, there were buffer overflows, and they would all require fixing. And the code generated is basically not suitable for a production environment. It might be suitable if you just want to test out an idea or something, but if you're testing an idea, you don't know whether it's the idea that's wrong or whether it's the code that's wrong. So, you have to take care with what it produces.

And one final interesting thing to note is that quantization matters. So, in 2025, early 2025, I did a video about LLM size, the number of parameters, and LLM quantization, to see which was the most important factor. And my initial findings then were that quantization didn't really matter.
I was getting the same kind of results from a Q8 model as I was from a Q4 model or from a Q6 model. Go and watch that video if you want to see all about that. But where that was different is that it was mainly text-based stuff. So, I was asking for summarizations, essay outlines, general logic questions, not the demands of these coding harnesses.

Now, when we use a coding harness like OpenCode, the Q8 model was actually much better. It asked more questions, and more intelligent questions, which surprised me, because the other two didn't really ask any questions at all. In fact, they may not have asked any questions. And this one came back and said, "Do you want this? Do you want this? How do you want to handle this?" I was like, "Oh, wow, this is actually understanding." And the overall code was of much higher quality than the Q4, and there were significantly fewer critical bugs. So, using a higher-precision quantization like Q8 makes a difference in this situation.

However, not so much of a difference. I did try Qwen 3.5, just 9 billion. So, I went right down in parameters, but then went with the Q8 version. Did that maybe, you know, recover some of that quality? It didn't. The 9 billion parameter Q8 version didn't do anything either. So, quantization does matter, but so does the number of parameters.

So, there you have it. OpenCode with a local large language model does have some benefits and certainly can do some real work, but there are, of course, problems. I'd love to hear your thoughts and your experiences on using OpenCode with a local large language model in the comments below. Okay, that's it. My name is Gary Sims. This is Gary Explains. I hope you enjoyed the video. If you did, please do give it a thumbs up and subscribe to the channel as well. Okay, that's it. I'll see you in the next one.
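One way to see why both parameter count and quantization matter on local hardware is to estimate the size of the weights. This is a rough Python sketch, assuming Q4 and Q8 store a nominal 4 and 8 bits per weight; real quantized files carry extra overhead, and the context cache needs memory on top. The function name is hypothetical, and none of these figures were measured in the video.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Model sizes mentioned in the video, at nominal 4-bit and 8-bit precision.
for params in (9, 27, 35):
    for name, bits in (("Q4", 4), ("Q8", 8)):
        print(f"{params}B {name}: ~{weight_gib(params, bits):.1f} GiB")
```

Going from Q4 to Q8 roughly doubles the size of the weights, which may be part of why the Q8 run was the slowest of the three setups.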
