came to using Claude Code alone, I want to explain what context rot is, because it will actually help you understand why the problem was happening in the first place. Basically, a paper came out about six weeks ago from Chroma, and it examines how increasing input tokens impacts LLM performance.

Now, we all know the needle-in-a-haystack test that many LLMs are run on. When Google releases a 1-million-token context window, they say it achieves something like a perfect score on needle in a haystack. What that test does is take a long stretch of random text, loosely related to the needle in some way, and insert a needle, such as "The best writing advice I got from my college classmate was to write every week." Then they ask a question, such as "What is the best writing advice I got from my college classmate?", and check whether the LLM can retrieve that piece of information from the long context.

What Chroma did is introduce something called distractors, which are semantically similar to the needle but carry a different piece of information, such as "I think the best writing tip I received from my college professor was to write every day" (the professor, not the classmate). They experimented with different types of distractors, ranging from easy ones to challenging ones. And you can see that as they changed the number of distractors, the performance of the model actually decreases across high-, medium-, and low-performance models. As you would expect, with zero distractors or one distractor, the high-performance models, such as Claude Sonnet 4 and GPT-5, are fairly consistent. But as the input token count increases, even with one distractor, you can see performance decrease. And with more and more distractors, and with increasing input tokens, performance drops to almost 35%.
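To make the setup concrete, here is a minimal sketch of how a haystack with a needle and distractors might be assembled. The filler text, the `build_haystack` helper, and the exact insertion strategy are my own illustration, not the paper's actual harness; the needle and distractor wording paraphrases the examples described above.

```python
import random

NEEDLE = ("The best writing advice I got from my college classmate "
          "was to write every week.")

# Hypothetical distractors, paraphrasing the examples from the paper:
# semantically close to the needle, but factually different.
DISTRACTORS = [
    "I think the best writing tip I received from my college professor "
    "was to write every day.",
    "I thought the best writing advice I got from my college classmate "
    "was to write each essay in four different styles, but not anymore.",
]

def build_haystack(filler_sentences, needle, distractors, seed=0):
    """Assemble a long context: filler text with the needle and a chosen
    number of distractors inserted at random positions."""
    rng = random.Random(seed)
    sentences = list(filler_sentences)
    for item in [needle] + list(distractors):
        sentences.insert(rng.randrange(len(sentences) + 1), item)
    return " ".join(sentences)

# Toy filler; the real experiments scale this to hundreds of thousands of tokens.
filler = [f"Filler sentence number {i} about unrelated topics." for i in range(50)]
prompt = build_haystack(filler, NEEDLE, DISTRACTORS[:1])
question = "What is the best writing advice I got from my college classmate?"
# The assembled prompt plus the question would then be sent to the model under test.
```

The key variable is how many distractors go in alongside the needle: the more near-miss statements the model has to sift through, the harder retrieval gets as the context grows.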
They also compared the individual distractors. You can see distractors 0, 1, 2, and 3, and the fourth one, distractor 3, is the most challenging across the board. Looking back at what that distractor says: "I thought the best writing advice I got from my college classmate was to write each essay in four different styles, but not anymore." That trailing "but not anymore" is what makes this distractor much harder for the LLMs than any other distractor on the list.

I would recommend reading through the paper, because it is quite interesting, but the result is pretty clear. As you increase the number of input tokens, the performance of the LLM decreases. As you introduce more distractors, the performance decreases even further at higher input token counts. And as you introduce more challenging distractors, the performance drops further still. And basically, the problem that I was having with Claude Code
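The experiment described above is essentially a grid sweep: for each combination of context length and distractor count, run some trials and record retrieval accuracy. Here is a minimal, hypothetical harness showing that shape; `run_trial` is a stand-in for a real LLM call plus an answer check, not the paper's actual code.

```python
from typing import Callable, Dict, List, Tuple

def sweep(run_trial: Callable[[int, int], bool],
          context_lengths: List[int],
          distractor_counts: List[int],
          trials: int = 3) -> Dict[Tuple[int, int], float]:
    """Hypothetical harness: run_trial(ctx_len, n_distractors) returns
    whether the model answered correctly. Returns accuracy per cell of
    the (context length x distractor count) grid."""
    grid = {}
    for n_tokens in context_lengths:
        for k in distractor_counts:
            correct = sum(run_trial(n_tokens, k) for _ in range(trials))
            grid[(n_tokens, k)] = correct / trials
    return grid

# Toy stand-in model, only to show the harness shape: it "succeeds"
# whenever there are no distractors. A real run would call an LLM here.
demo = sweep(lambda n_tokens, k: k == 0, [1_000, 10_000], [0, 1, 2], trials=2)
```

Plotting accuracy against context length, one curve per distractor count, reproduces the kind of chart discussed above: every curve slopes downward as input tokens grow, and the curves with more (and harder) distractors sit lower.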