How I Used Codex CLI to Fix Claude Code

Ray Amjad · 30.08.2025
Video Description
Join AI Startup School & learn to vibe code and get paying customers for your apps ⤵️ https://www.skool.com/ai-startup-school

—— MY APPS ——
🎙️ HyperWhisper, write 3x faster with your voice: https://www.hyperwhisper.com/ - Use coupon code X8RW3ELH for 40% off
💬 MindDeck, an advanced frontend for LLMs: https://minddeck.ai/ - Use coupon code AWHK2ZWF for 40% off
📲 Tensor AI: Never Miss the AI News - 100% FREE
- on iOS: https://apps.apple.com/us/app/ai-news-tensor-ai/id6746403746
- on Android: https://play.google.com/store/apps/details?id=app.tensorai.tensorai

—— MY CLASSES ——
👾 Codex CLI Masterclass: https://www.mastercodexcli.com/ - Use coupon code K5LP2NRK for 20% off
🚀 Claude Code Masterclass: https://www.masterclaudecode.com/ - Use coupon code 6OKODFRW for 20% off

—— CONNECT WITH ME ——
📸 Instagram: https://www.instagram.com/theramjad/
👨‍💻 LinkedIn: https://www.linkedin.com/in/rayamjad/
🌍 My website/blog: https://www.rayamjad.com/

Links Mentioned:
- https://research.trychroma.com/context-rot
- https://arxiv.org/abs/2402.18216

Timestamps:
00:00 - Intro
00:24 - Context Rot
02:45 - The Problem with Claude Code
05:26 - Where Codex CLI Comes In
07:57 - LLM's Task Switching Capabilities
08:33 - How I've Been Using It
09:41 - Comparison to Subagents
10:24 - Conclusion

Table of Contents (8 segments)

  1. 0:00 Intro (101 words)
  2. 0:24 Context Rot (518 words)
  3. 2:45 The Problem with Claude Code (644 words)
  4. 5:26 Where Codex CLI Comes In (551 words)
  5. 7:57 LLM's Task Switching Capabilities (139 words)
  6. 8:33 How I've Been Using It (266 words)
  7. 9:41 Comparison to Subagents (164 words)
  8. 10:24 Conclusion (138 words)
0:00

Intro

So over the last few weeks, OpenAI's Codex CLI has gotten quite good, and I'll be explaining how I've been using it alongside Claude Code over the last few days to achieve even better results than using Claude Code alone. Firstly, why has it suddenly gotten good? Because GPT-5 launched about three weeks ago, and GPT-5 High actually performs quite well when it comes to coding. They've also been releasing updates almost every other day since the GPT-5 launch. Now, before getting into the problem I was having when it
0:24

Context Rot

came to using Claude Code alone, I want to explain what context rot is, because it helps you understand why the problem was happening to begin with. A paper came out about six weeks ago from Chroma on how increasing input tokens impacts LLM performance. We all know the needle-in-a-haystack test that many LLMs are run on: when Google releases a 1-million-token context window, they say it achieves a near-perfect score on needle in a haystack. In that test, you have a long stretch of text that is only loosely related to the needle, plus a needle such as "The best writing advice I got from my college classmate was to write every week." Then you ask a question like "What is the best writing advice I got from my college classmate?" and see whether the LLM can retrieve that piece of information from the long text. What Chroma did is introduce distractors, which are somewhat semantically similar to the needle, such as "I think the best writing tip I received from my college professor (the professor, not the classmate) was to write every day." That's semantically similar to the needle, but it's a different piece of information. They experimented with different types of distractors, from easy ones to challenging ones, and as they changed the number of distractors, the performance of many high-, medium-, and low-performance models decreased. As you would expect, with zero or one distractor, the high-performance models, such as Claude 4 Sonnet and also GPT-5, are fairly consistent. But as the input tokens increase, even with one distractor, you can see performance decrease.
And as you add more and more distractors with increasing input tokens, performance drops to almost 35% in some configurations. They also compared the individual distractors: distractors 0, 1, 2, and 3. The fourth distractor, distractor 3, is the most challenging across the board. Looking back at what it says, "I thought the best writing advice I got from my college classmate was to write each essay in four different styles, but not anymore," it's that trailing "but not anymore" that makes this distractor much harder for different LLMs than any other on the list. I'd recommend reading through the paper, because it's quite interesting, but the result is clear: as you increase the number of input tokens, LLM performance decreases; as you introduce more distractors, performance decreases even more at higher input-token counts; and as the distractors get more challenging, performance drops even further. And basically, the problem that I was having with Claude Code
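The setup described above can be sketched as a tiny prompt builder. This is an illustrative reconstruction, not Chroma's actual evaluation harness; the filler text is made up, and the needle and distractor strings paraphrase the examples quoted in the transcript.

```python
import random

# Illustrative needle-in-a-haystack-with-distractors setup (not Chroma's
# actual harness). All text below is made up for demonstration.

NEEDLE = ("The best writing advice I got from my college classmate "
          "was to write every week.")

# Distractors are semantically close to the needle but differ in a key
# detail (who gave the advice, or what the advice was).
DISTRACTORS = [
    "I think the best writing tip I received from my college professor "
    "was to write every day.",
    "The best writing advice I got from my high school teacher was to "
    "read widely before writing anything.",
    # The paper found trailing negations like this one the hardest:
    "I thought the best writing advice I got from my college classmate "
    "was to write each essay in four different styles, but not anymore.",
]

FILLER = "The weather that semester was unusually mild. " * 50  # padding

def build_haystack(n_distractors: int, seed: int = 0) -> str:
    """Scatter the needle and n distractors through filler text."""
    rng = random.Random(seed)
    chunks = [FILLER] * 4
    insertions = [NEEDLE] + DISTRACTORS[:n_distractors]
    positions = rng.sample(range(len(chunks)), len(insertions))
    for pos, text in zip(positions, insertions):
        chunks[pos] = chunks[pos] + "\n" + text + "\n"
    return "".join(chunks)

QUESTION = ("What was the best writing advice I got from my college "
            "classmate?")

# The model is then asked to retrieve the needle from the haystack:
prompt = f"{build_haystack(n_distractors=3)}\n\nQuestion: {QUESTION}"
```

Varying `n_distractors` and the amount of `FILLER` is what produces the degradation curves the paper reports: more input tokens and more (or harder) distractors mean lower retrieval accuracy.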
2:45

The Problem with Claude Code

over the last couple of weeks is this: I would have my file over here in green, and in it the needle, the thing that needed to be edited to achieve the result I gave Claude Code (Claude 4.1 Opus in this case, because I usually use Opus). Then I'd have a bunch of weak distractors scattered across the codebase. Bear in mind, I'm not the model, so I don't know exactly what counts as a distractor to it, but these are pieces of code that do something similar to whatever the needle is doing. And then, for some reason, Claude 4.1 Opus decides to add another distractor in another file related to the needle: a strong distractor over here while implementing the feature I asked for, and then another strong distractor over there. This happens over a three-to-four-hour coding session, and if you replicate it across dozens more files in the codebase, the distractors start to pile up. Usually when I'm vibe coding with Claude Code, I'm watching a television show at the same time, so I'm just pressing accept, accept, without reading through the code. During that time I reset the context window a bunch of times, because I try not to go over 50% of it. Then when I ask it to implement something, it ends up making a change over here instead of over there where the needle is, and this kind of thing happens across the codebase. Then when I test its implementation, I'm like, hey, this doesn't work, why is the thing it just added not working? I check the codebase, and it turns out there are three or four different functions that all do something very similar.
When I asked it to make an edit, it edited one of the functions, the one used by another part of the codebase, and didn't edit the others. Or when I asked it to do some refactoring or remove something, it would remove references to the thing, say a function, but wouldn't actually remove the function itself from the codebase. Ultimately, my codebase became filled with distractors: dozens across different files, some weak, some stronger, some related to functionality I was adding, some not. I found myself having to intervene more as the projects got bigger and more files were added. I would have Claude Code try to remove some of the duplicate code, and it would only succeed about 50% of the time. Other times I would notice there were multiple functions doing the same thing and have it merge them, and then it wouldn't delete the old function, despite me telling it to. Ultimately, it just became a massive nightmare, because Claude Code would lose the forest for the trees. It would be so caught up in the weeds of executing a task that it didn't pay attention to the bigger picture: it wouldn't realize that there were multiple functions doing similar things that needed to be merged, or that this wasn't the most effective solution given some other part of the codebase. And this is where Codex CLI comes in handy. So I have Claude Code open on the left
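To make the "multiple functions that do the same thing" failure concrete, here is a hypothetical Python example of the kind of near-duplicate helpers that pile up over long sessions. The names and logic are invented for illustration; they are not from the actual codebase being described.

```python
# Three helpers written in different sessions, in different files, all
# formatting a price the same way. Each one is a "distractor" for the
# others: when asked to change price formatting, a model may edit one
# copy and miss the rest.

def format_price(amount: float) -> str:
    """The original helper."""
    return f"${amount:,.2f}"

def display_cost(cost: float) -> str:
    """Added later in another file; same behavior, different name."""
    return "$" + f"{cost:,.2f}"

def price_to_string(value: float) -> str:
    """A third copy; a 'remove duplicates' pass deleted the callers
    of this one but left the function itself behind."""
    return f"${value:,.2f}"
```

All three produce identical output, so a change applied to only one of them silently leaves the other call sites with the old behavior, which is exactly the "it edited one function but not the others" symptom above.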
5:26

Where Codex CLI Comes In

and Codex CLI on the right. This is a real production application I'm editing called MindDeck, where you can run many different LLMs in parallel; you can see over here I can run up to eight different LLMs, and there are a bunch of advanced features that make it good for LLM power users. You can use many different models, all the models available on OpenRouter, you bring your own API keys, and there are more advanced features like importing from ChatGPT and so forth. Basically, I was adding more MCP servers to the application; you can see it over here with Claude Code. It did some research and made an implementation, and then I give everything it did to Codex CLI and have it come up with a critique of the plan it just implemented, or find any problems. You can see the critique over here; it identifies some problems. Then I give this critique back to Claude Code, which makes some of the changes, depending on what it thinks is a good change, and I give the result back to Codex CLI, and I just keep going back and forth between the two. I often end up with much better solutions, and it also prevents duplicates and distractors from arising in the codebase, because one model, Claude Code, is making all the changes and is focused on the weeds, whereas the other model, Codex CLI, has a big-picture overview and understanding of everything that's happening. I find that GPT-5 High, which I'm using in this case, has really good attention to detail and can make good recommendations to other models depending on what they're doing. For some reason, Claude Code is not able to both implement the required features and assess its own work; it kind of needs another model to keep it in check.
And I guess it's kind of like being a human as well: it's really hard to maintain both a close-up view of the codebase, making all those edits, and a big-picture understanding of the entire codebase at the same time. That's essentially what I'm doing here. I have Claude Code maintain the close-up view by editing everything that's required, and I have Codex CLI maintain the big-picture understanding of how the codebase fits together. Oftentimes, without me telling it to, it recognizes the duplicate functions and distractors and suggests that various things should be merged together in some way. In other words, over long coding sessions it's quite easy for Claude Code to lose the forest for the trees, implement distractors, implement duplicate functions, and generally produce a worse solution overall, unless someone else is checking its code as it goes along. In this case I found Codex CLI to be quite good at that, but I'm sure you can experiment with other models and other providers and tools as well. And it kind
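The implement-critique-revise loop can be sketched in Python. This assumes the non-interactive entry points of both tools, `claude -p "<prompt>"` (Claude Code's print mode) and `codex exec "<prompt>"` (Codex CLI's non-interactive mode); flags and behavior may differ across versions, and the prompt wording is my own, so treat this as a sketch rather than the exact workflow.

```python
import subprocess

# Sketch of the back-and-forth loop between an implementer model
# (Claude Code) and a big-picture reviewer model (Codex CLI).
# ASSUMPTIONS: `claude -p` and `codex exec` are available on PATH and
# accept a prompt string; verify against your installed versions.

def critique_prompt(summary: str) -> str:
    """Ask the reviewer for a big-picture critique of recent changes."""
    return (
        "Here is a summary of changes just made to the codebase:\n"
        f"{summary}\n\n"
        "Critique the plan and implementation. Flag duplicate functions, "
        "dead code, and anything that conflicts with the rest of the "
        "codebase."
    )

def revise_prompt(critique: str) -> str:
    """Feed the critique back to the implementer for another pass."""
    return (
        "Another model reviewed your recent changes and wrote this "
        f"critique:\n{critique}\n\n"
        "Apply the suggestions you agree with, and explain what you "
        "skipped and why."
    )

def run(cmd: list[str]) -> str:
    """Run a CLI tool and capture its stdout."""
    return subprocess.run(cmd, capture_output=True, text=True).stdout

if __name__ == "__main__":
    # Implementer does the work and summarizes it; PLAN.md is hypothetical.
    summary = run(["claude", "-p",
                   "Implement the feature in PLAN.md, then summarize "
                   "your changes."])
    for _ in range(3):  # a few back-and-forth rounds
        critique = run(["codex", "exec", critique_prompt(summary)])
        summary = run(["claude", "-p", revise_prompt(critique)])
```

In practice the workflow in the video is manual, pasting summaries and critiques between the two terminals; scripting it like this is just one way the same loop could be automated.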
7:57

LLM's Task Switching Capabilities

of reminds me of this paper, on LLM task interference, which confirms what many people intuitively know. The authors investigate how much worse LLMs perform when switching tasks from whatever they were previously doing. For example, a model could have been doing sentiment analysis, and then you ask it to solve some math problems, and the performance can be slightly worse than if you had just started a new chat and asked it to solve the math problems there. I find it's good to have one tool, such as Codex CLI, maintaining a big-picture understanding of the codebase and critiquing the code or implementation that another LLM, such as Claude 4.1 Opus in Claude Code, was writing. I've been doing this for my other application as well,
8:33

How I've Been Using It

HyperWhisper; there's a coupon code down below for that if you're interested. Basically, I have Claude Code make a bunch of changes, then I give all the changes, including the summary, to Codex CLI and ask it to come up with a critique of everything done so far. Then I pass this critique back to Claude Code and say, what do you think of this? I give it the whole critique, with the recommended changes and all, and after investigating everything that was done, it comes up with a new plan. Then I tell it to do X, Y, and Z, pass the result back to Codex CLI, and keep going back and forth between the two, with one acting as the implementer and the other acting as the big-picture thinker checking the implementation. Over the last few days of doing this, it has led to fewer bugs, fewer distractors, and just better code overall, with better solutions that consider edge cases and so forth. I have also been switching things up in some cases, where I get Claude Code to be the critiquer, or the checker, and have Codex CLI be the implementer. I found that in the case of SwiftUI, it actually works better to have Codex CLI as the implementer and Claude Code as the critiquer slash checker. And I have kind of done this before with subagents, where
9:41

Comparison to Subagents

I had the main Claude Code session holding the big-picture overview of everything and the subagents acting as implementers, and also checkers and so forth. But I found this approach, where I'm using a different model, in this case GPT-5 High (run /model and change it to GPT-5 High), to be better overall. I think that's because GPT-5 has a fundamentally different architecture and different training data, and thinks in a different way, so it's able to critique the code that Claude Code writes better than Claude Code itself can. It's like having someone else critique your work: they'll come up with a better critique because they're a different person, with different life experiences, different "training data" accumulated throughout their life, than if you try to critique your own work. But yeah, I will be continuing to experiment with this over
10:24

Conclusion

the coming weeks. If you want to learn more about Codex CLI and how it compares to Claude Code, I have a previous video about it. But ultimately I'd just recommend using both of them in parallel, with one acting as a critiquer and one acting as an implementer. Anyways, this video is not sponsored or anything; I don't accept sponsors on this channel, because I think it can lead to some kind of bias. This video is made possible and supported by the people who buy my AI products using a link in the description down below. There should be some coupon codes as well if you're interested. I've generally found the products very useful, and if you buy them, you're also supporting an indie developer and a small YouTuber.
