Cursor is CAUGHT red handed...

Wes Roth · 32,812 views · 914 likes

Video description

full story: https://natural20.beehiiv.com/p/cursor-got-caught-using-a-chinese-ai-model-and-didn-t-tell-anyone

My Links 🔗
➡️ Twitter: https://x.com/WesRoth
➡️ AI Newsletter: https://natural20.beehiiv.com/subscribe

Want to work with me? Brand, sponsorship & business inquiries: wesroth@smoothmedia.co

Check out my AI Podcast where me and Dylan interview AI experts: https://www.youtube.com/playlist?list=PLb1th0f6y4XSKLYenSVDUXFjSHsZTTfhk

00:00 – Cursor Composer 2
04:05 – Kimi k2.5
08:55 – China
13:58 – Self-Summarization
18:22 – Doom on MIPS

#ai #openai #llm

Contents (5 segments)

Cursor Composer 2

All right, so Cursor is in some hot water. Tons of drama surrounding this one. So exactly what happened? Well, first and foremost, Cursor releases their own AI model. It's called Composer 2, and it's good. Really good. People are absolutely blown away. It's frontier level at coding, but it's very, very cheap, much cheaper than the frontier-level models. Everything's going great until one person goes, "Wait a minute. Why is this new model by Cursor named Kimi K2.5?" So this is Finn. He's saying, so Composer 2 was just Kimi K2.5 with reinforcement learning. So apparently Composer 1.5, you know, blocked it. It seems like they did a better job of covering up their tracks for Composer 1.5, but not for Composer 2. After Finn posted this, it's now blocked. Elon Musk weighs in going, "Yeah, it's Kimi 2.5." Okay, so what just happened here? We'll hear from the people at Cursor and at Kimi AI. But first and foremost, if you're not familiar with Cursor, it's one of the fastest-growing software companies of all time. It's an AI code editor built on top of VS Code, which is an open-source project. It raised like 2.3 billion just a few months ago, in November 2025. It's valued at close to $30 billion and is reportedly making over 2 billion in annualized revenue. So they are extremely successful, one of the fastest-growing AI coding apps and arguably the most popular AI coding app. And what is Kimi K2.5? It's an open-source model released by a Chinese company, Moonshot AI. So it's a genuinely excellent model, especially for various agentic tasks. It's open source with a modified MIT license. So here's the point. If you're a startup, if you're a small company, if you're just kind of using it for your own purposes, basically you don't have to say that you're using the Kimi K2.5 model. It's open source. But that whole thing about it being modified, it just means that large companies do have to disclose that they're using this model.
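The disclosure rule can be written down as a simple either-or check. This is only a sketch: the two thresholds are the ones quoted in this video (over 100 million monthly active users, or over 20 million in monthly revenue), and the `Product` type, the function name, and the example figures are hypothetical.

```python
from dataclasses import dataclass

# Thresholds as quoted in the video for the modified MIT license.
MAU_THRESHOLD = 100_000_000             # monthly active users
MONTHLY_REVENUE_THRESHOLD = 20_000_000  # monthly revenue, in dollars


@dataclass
class Product:
    """Hypothetical description of a commercial product built on the model."""
    name: str
    monthly_active_users: int
    monthly_revenue_usd: float


def must_display_base_model(product: Product) -> bool:
    """Return True if either threshold is exceeded (the rule is an OR)."""
    return (
        product.monthly_active_users > MAU_THRESHOLD
        or product.monthly_revenue_usd > MONTHLY_REVENUE_THRESHOLD
    )


# Illustrative inputs only. The second one uses the "over 2 billion
# annualized" figure from the video, divided by 12; the user count is made up.
small_startup = Product("side-project", 5_000, 12_000.0)
large_company = Product("big-editor", 1_000_000, 2_000_000_000 / 12)

print(must_display_base_model(small_startup))
print(must_display_base_model(large_company))
```

With those inputs the revenue threshold alone triggers the rule for the large company, even though its made-up user count is far below 100 million.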
Basically, if you have over 100 million monthly active users or over 20 million in monthly revenue, you can't use it kind of like a plain open-source model. You have to declare that you're using it. You have to display prominently that you're using Kimi K2.5, or whatever you're using, in your product's user interface. So your users have to be aware that this is kind of the base model. So when we say Kimi is open source, it's open source with an asterisk, cuz technically, I mean, technically no, because open source is supposed to mean, you know, fully open source. This is open source unless you're a high-revenue company, and then not so much. Now, as you can imagine, Cursor did not prominently state that the base model was Kimi K2.5. This is Lee Robinson. He works for Cursor. He's helping developers, teaching them how to use all these tools. He's saying, "Yep, Composer 2 started from an open-source base. We'll do full pre-training in the future." Meaning that in the future, they're planning to train their own models sort of from scratch. And they're saying only about one quarter of the compute spent on the final model came from the base. The rest is from our training. This is why the evals are so different. So what he's saying is, I'm assuming they're estimating how much compute went into training the Kimi model, and they're saying that, you know, that was one quarter of the total. So they've spent, on top of that, a lot more compute to do reinforcement learning with all the data that Cursor has, because there's a lot of stuff that happens in Cursor with people using it. They do have a lot of data that they use for reinforcement learning to improve this model. That's why the model looks so good on the evals. But notice, even here he doesn't name the model. He says we started from an open-source base, some open-source base, right?
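To make the one-quarter figure concrete: if the base model accounts for about a quarter of the total compute, the training done on top of it is about three times the base compute. The snippet below is just that arithmetic, taking the quoted share at face value; the function name is made up for illustration.

```python
from fractions import Fraction


def added_compute_multiple(base_share: Fraction) -> Fraction:
    """Given the base model's share of total compute, return how many
    multiples of the base compute were spent on additional training."""
    if not (0 < base_share < 1):
        raise ValueError("base_share must be strictly between 0 and 1")
    return (1 - base_share) / base_share


# "About one quarter" as stated in the video, treated here as exactly 1/4.
base_share = Fraction(1, 4)
multiple = added_compute_multiple(base_share)
print(multiple)  # 3
```

So under that reading, the added training is roughly 3x the compute attributed to the base model.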
They're still not saying Kimi, and he's saying, "And yes, we are following the license through our inference partner terms." So it seems like because they're using an inference provider, and technically that's the party doing the inference and providing the model, they don't have to disclose this. Here's the thing. This is a little bit weird. Here's Yulun Du. So, he's at

Kimi k2.5

Kimi Moonshot. So, he's at the company building these Kimi AI models. He posted this post that was later deleted. You know, I found Harve, who reposted it, so that's why we're able to see it. So, this is no longer on X, as far as I can tell, but this is a post by Yulun. So, again, he works for Kimi. He's saying, "Wait, we tested with the Composer 2 model API and found out that the tokenizer is indeed the same as our Kimi tokenizer." Right? So, somebody from the Kimi team is going, yeah, this is indeed our model, or at least the base model is. He's saying, "We can almost confirm this is our model post-trained further. We are shocked that Cursor AI did not respect our license, nor did they pay us any fees." And he's saying, "Michael Truell, why did you do this?" So then that post disappears, as far as we can tell, and later this Kimi.ai post appears. They're saying, "Congrats to the Cursor team on the launch of Composer 2. We are proud to see Kimi K2.5 provide the foundation. Seeing our model integrated effectively through Cursor's continued pre-training and high-compute RL training is the open model ecosystem we love to support." So it looks like Fireworks AI is the inference provider, and so Cursor accesses Kimi K2.5 via Fireworks AI. It's a hosted reinforcement learning and inference platform, as part of an authorized commercial partnership. All right, so what just happened? Let's review. So Cursor launches Composer 2. March 18th, I believe, it came out. They describe it as frontier-level intelligence at a low cost and basically present this as their own model, without mentioning, you know, China or Moonshot or Kimi or anything. So it really sounded, I think, to most of us like they built it from scratch, because indeed it was presented seemingly as if it was built from scratch. No mention of any open-source model was there. Later, an X user finds that they're still referring, in some URL, to that model as Kimi K2.5.
And within hours it's on Reddit and X and Hacker News, everywhere. Employees working at Kimi are going, hey, what's going on? Why aren't you crediting us for using our models? Employees over at Cursor are posting, you know, okay, yes, we've used this as the base model. Still not using the name until they get called out for it. So here in this original one, no mention of Kimi, right? So there's some people calling them out saying, you know, why, with everything coming out after the leak, are you still not giving credit to the open-source model? And then later, as a response, they're saying, you know, here's the confirmation that we're using Kimi: since people really want me to say this, Kimi K2.5. Yes, that is the base we started from, and we are following the license through inference partner terms. And when people keep pressing them for more answers, why aren't you disclosing what the model is, they post this. This is from Cursor. It's called "Training Composer for longer horizons." It's a very interesting blog post that reveals how they've approached this whole thing. It's actually technically a very interesting post, and it kind of reveals what they built on top of Kimi. So here they describe this concept of self-summarization. Basically, the model pauses mid-task, as it's running some task, summarizes everything it knows so far into about a thousand tokens, and then continues on with that compressed context. And as they say here, by making self-summarization part of Composer's training, we can get training signal from trajectories much longer than the model's max context window. Now, we'll come back to this blog post in just a second here, because it has some very interesting points, including, you'll never guess what appears halfway through: Doom. But really fast, in case you're wondering why they didn't actually mention Kimi: was this some attempt to use something without disclosing it, or without following the license or paying some fees, or was something stolen? The answer is no.
As far as we can tell, everything was above board. Everything was fine. A lot of the open-source community wanted them to attribute it to the right model, for them to say Kimi, this was based on Kimi. And there are great arguments for why that should happen. It gives credit to the original open-source community. It also gives a signal to a lot of other people that are building in that community that, hey, we're all behind this open-source community, that if you build something amazing, people will give you attribution. So if it seems like people are taking advantage without sort of putting stuff back in, without attributing, it does ruffle a lot of people's feathers. And members of Cursor did sort of state that, yes, they should have, you know; they know now that in the future they need to make sure they put in the attribution, etc. With that said, let's assume we gave them a stern finger wagging and they've learned their lesson. Let's assume

China

we've yelled enough at them for this. Let's talk about why they did it. And also, what they've actually created is outstanding, amazing, and impressive all on its own. Because again, this wasn't them taking somebody else's work and passing it off as their own. That's not what happened here. And I think it's very important that that piece doesn't get lost in the outrage, right? We can still wag the finger and tell them they're bad for not attributing, but don't miss what actually happened. So, first and foremost, I think the number one reason they didn't want to mention it as somebody else's model is that it doesn't look great. It looks bad not to build your own model. Cursor just raised at a close to $30 billion valuation, partly on the idea that they are indeed a serious AI research company, not just a wrapper, right? Not just a UI on top of somebody else's work, cuz remember, VS Code is Microsoft's originally. So, you kind of get this idea, right? It kind of looks bad. They should be in a position to build their own model. We'll come back to that in just a second. I think reason number two why they didn't want to mention Kimi is China. Building on top of a Chinese model is politically kind of, uh, shaky ground. The US-China AI race narrative is everywhere. So DeepSeek, the DeepSeek moment, caused a lot of issues, if you recall that whole thing. Tons of stocks crashed. It caused a panic in Silicon Valley. So for Cursor, the biggest AI coding tool, right, saying, yeah, our base model, the thing that powers our stuff, is an Alibaba-backed Chinese model, it's not awesome to say that. That makes sense, I think. And I think that's important to keep in mind, because if they did mention it, they'd still be going through this storm right now. It would just be a slightly different theme.
They would still be getting attacked and henpecked, not for, you know, not attributing something to open source, but rather for using a model from a Chinese company. So I think that makes sense. It's a US company. A lot of the enterprise-level customers are very sensitive to using stuff that has ties to Chinese infrastructure. Disclosing this could have been a PR headache, so they didn't. And I'm not saying right or wrong. I'm not taking sides. I'm just kind of letting you know what I'm pretty sure the reasoning was behind being like, hey, let's just not say anything. So, here are kind of the big points to understand. The three quarters of compute that went into building this Composer 2 model, that's real work. That's a real contribution. I think a lot of people are talking about this as a way for them to, you know, slap a sticker on top of a Kimi model and maybe just charge money for it instead of running everything through OpenAI or Anthropic. I think that's missing some of the context. In fact, if you look at this specifically, this is the context that I think is missing. Cursor is doing real research. They are adding their own innovations to it. They are saying that they're going to be building kind of their own model in the future. So maybe starting on an open-source foundation was step one for them, to test things out on top of already existing models, to test out their reinforcement learning protocols, their reinforcement learning pipelines. And in the future, they're going to do their own pre-training and build it from scratch. And for open-source AI, for the open-source ecosystem, this is working more or less as intended. Moonshot AI released it open source for other people to build on top of it. Cursor did. So in that sense, everything is working as intended. This is exactly what we want to happen. So really, in the end it comes down to this: they didn't say whose model it was. They didn't give the right attribution.
The idea that they're running it through Fireworks as an inference platform, again, maybe that's one of those things where it's like technically everything is fine, but a lot of the people in the community are not going to feel great about that. It feels like just a loophole to go through to not give attribution. And again, it's getting kind of hard to tell whose technology things are built on, right? Because OpenAI, Anthropic, everybody's running on Google technology, by the way, right? Transformers, that was 2017, the "Attention Is All You Need" paper. Google chose to publish that work, and because they did, they've created a lot of competition for themselves. Like, OpenAI took that idea and ran with it. Anthropic split away from them very early on and started building kind of in their own direction. Everybody's doing distillation from each other. We know that a lot of the labs, uh, in China are using, you know, these distillation attacks, as Anthropic has called them, to get data out of the Western models to be able to train on it. Somebody posted this tweet earlier. They said, "My daughter has been asking a lot of questions this evening. This is a form of an advanced distillation attack." I thought that was the greatest joke ever, cuz yeah, your kid asking a bunch of questions is a form of a distillation attack. Technically true. They're trying to get the knowledge that's in your head so that they can improve their performance on various benchmarks in life. But really fast, let's quickly cover Cursor's self-summarization thing.

Self-Summarization

So again, this is where a lot of the RL compute went in the training of the Composer 2 model. You know, yes, they had the Kimi base, but they did tons and tons of work training on top of it. Again, Cursor has tons of data. It's one of the most popular AI coding apps. So they're using all of that to build on top of an open-source model. And the idea here is basically how to make it able to work on very long and large tasks. How do you compress everything when there's more data, more stuff, more code than what fits into the context window of the model? And so their approach is this self-summarization. So really fast, they explain it like this. Composer, this is Cursor's own model built on top of an open-source model, is a specialized model designed for agentic coding and trained through RL, reinforcement learning, in the Cursor agent harness. This enables it to be trained with compaction in the loop, improving its ability to determine the most critical information to summarize and preserve. So as Composer works through a task, it approaches a fixed context-length trigger. So let's say halfway through, when the context window is halfway filled. I think earlier they said that's kind of like that point. Maybe it's later. Basically, it pauses to summarize its own context before continuing. So how it works is, Composer generates from a prompt until a fixed token-length trigger is reached. So basically you tell it, go make me a Flappy Bird app, but I want to use my webcam and flap my hands, and that's going to be the controls for the Flappy Bird game. So it starts working on that, and let's say it approaches some token length, right, some predetermined sort of amount of work that it did. So it hits that fixed token length, something triggers, and then they insert a synthetic query asking the model to summarize the current context, right?
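The loop being described here (generate until a token trigger, ask for a summary, continue from the condensed context) can be sketched in a few lines. This is a toy illustration, not Cursor's implementation: `generate_step` and `summarize` are hypothetical stand-ins for model calls, tokens are approximated by whitespace-separated words, and the condensed context is simplified to the original prompt plus the summary.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def run_with_self_summarization(
    prompt: str,
    generate_step: Callable[[str], str],
    summarize: Callable[[str], str],
    trigger_tokens: int,
    max_steps: int,
) -> List[str]:
    """Generate until the context reaches trigger_tokens, then replace the
    context with prompt + summary and keep going. Returns the summaries."""
    context = prompt
    summaries: List[str] = []
    for _ in range(max_steps):
        context += " " + generate_step(context)
        if count_tokens(context) >= trigger_tokens:
            # Synthetic query: ask the model to condense everything so far.
            summary = summarize(context)
            summaries.append(summary)
            # Loop back to generation with the condensed context.
            context = prompt + " " + summary
    return summaries


# Toy stand-ins for the model (assumptions for illustration only).
def fake_generate(context: str) -> str:
    return "work " * 10  # pretend each step emits 10 tokens


def fake_summarize(context: str) -> str:
    return f"[summary of {count_tokens(context)} tokens]"


summaries = run_with_self_summarization(
    "build flappy bird", fake_generate, fake_summarize,
    trigger_tokens=30, max_steps=6,
)
print(summaries)
```

In this toy run the context is compacted twice, and after each compaction the working context drops back to a handful of tokens, which is what lets the overall trajectory run longer than the trigger length.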
So it's almost like a different model comes in and goes, okay, look at everything we've done so far and summarize it, condense it, really figure out how to get all the necessary information out of everything that we did and compact it. Right? So this model is given a scratch space to think about the best summary. And so it generates that condensed text, and then Composer loops back to step one with the condensed context, which includes the summary plus conversation state. The point here is this model that's compacting and summarizing everything: if you think about it, if it does a bad job at that, then the chances of, you know, the original project reaching a good conclusion go down. If it misses important context, if it's bad, then, you know, the chance of it completing the project goes down. If it's really good and really captures all the important details, the chances of that project having a good outcome improve, meaning that we have verifiable rewards. We can say, like, this way of condensing this information, of summarizing, that was good, and this other way was bad. So therefore, we can take that data and then use reinforcement learning to train this summarizer model to be even better at summarizing. So they say here, this means that self-summaries themselves are part of what gets rewarded. So basically, poor summaries that lost critical information are downweighted. They slowly go extinct, and the ones that do well, they propagate and get used more often. Now I know what you're asking. Uh, yeah, yeah, but what does this have to do with Doom? Isn't that the whole point of this thing? And, uh, yes. I mean, isn't everything? So there's this Terminal-Bench 2.0, you know, sort of pro-level benchmark obstacle course. This boss fight from that course, it's referred to as make-doom-for-mips. So MIPS is an old-school type of processor instruction set, kind of a computer language at the machine level, that's used in things like the Nintendo 64.
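Stepping back to the reward idea from this segment: the summaries have no separate grader, they simply inherit the outcome of the task they were part of. A minimal sketch of that credit assignment follows. The 0/1 outcome reward and the group-mean baseline are assumptions chosen for illustration, not something the blog post as described here spells out, and the `Rollout` type and example summaries are made up.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class Rollout:
    """One attempt at a task: the summaries it wrote and whether it succeeded."""
    summaries: List[str]
    solved: bool


def summary_advantages(rollouts: List[Rollout]) -> List[float]:
    """Reward each rollout 1.0 if solved, else 0.0, and subtract the group
    mean. Every summary in a rollout inherits that rollout's advantage."""
    rewards = [1.0 if r.solved else 0.0 for r in rollouts]
    baseline = mean(rewards)
    return [reward - baseline for reward in rewards]


# Hypothetical attempts at the same task.
rollouts = [
    Rollout(["kept the failing test name and file paths"], solved=True),
    Rollout(["dropped the file paths"], solved=False),
    Rollout(["kept paths, dropped the error message"], solved=False),
    Rollout(["kept error message and paths"], solved=True),
]
advantages = summary_advantages(rollouts)
print(advantages)  # [0.5, -0.5, -0.5, 0.5]
```

Summaries from the failed attempts end up with a negative advantage and get pushed down, while the ones from successful attempts get reinforced, which matches the "poor summaries are downweighted" description.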
So the problem that's given is: you're given the raw source code for the 1993 game Doom, and this file that tells the game how to draw frames to a specific file, and this virtual machine that only understands MIPS. It only understands this weird language that's not often used. And so you're given those pieces and you're just told to make it go, make it run. So this is a problem that's very simple to describe, to put in a prompt, and it's incredibly difficult to actually work through and complete. This problem is challenging enough that several powerful models are unable to get it correct in the official reported numbers, partly because it requires testing a significant amount of code. So you're constantly running up against the context window sort of challenge, and this is kind of one of the things that Cursor is trying to get better at, trying to solve, with this Composer 2. So they're saying, when testing an early research checkpoint of Composer 2, we

Doom on MIPS

found that it was able to solve this problem correctly. The solution required engineering and testing a significant amount of code as well as exploring some alternative implementations. Here's an image rendered in the course of solving the problem. By the way, um, this is a bit embarrassing, but this image, if you're looking at it, if you remember how the original looks, you're probably like, "Oh, this is kind of warped and weird." It's not. Here's the thing. I have a green screen behind me, and I'm using software that filters out that particular green color, and I'm able to add, you know, various effects, such as being over this website that we're looking at here. It's all very fancy. But the downside is, apparently that specific color green in the image also gets filtered out. Sorry, ADHD moment. The point is, Composer worked for 170 turns to find an exact solution, along the way creating self-summaries in a compact, human-readable, and structured form. So it self-summarized more than 100,000 tokens down to the 1,000 it believed would most help it solve the problem. All right, so in the end, what does this all mean? Number one, yes, in a perfect world Cursor should have said, you know, Kimi K2.5 was the base model. The people that are upset and yelling at Cursor right now, I mean, you know, they do have a point. Point number two, you know, did Cursor do this to pull a fast one, to trick people? I don't think so. I think it was done to avoid the drama from kind of the geopolitically sensitive issues surrounding US versus China. It really doesn't seem like they just took Kimi, rebranded it, put their own label on it, and resold it. It doesn't seem like that's what's happening here. Again, they put in more compute than the original model, you know, used for pre-training. Also, they built on top of an open-source model, and they did put out this blog post, you know, this one about what Composer's self-summarization is.
So, they're publishing their research, right? So they're part of the open-source ecosystem. They're doing a lot of things right. They're adding to it. They're contributing to it. And they're showing some pretty powerful things you can do with these open-source models and your own data, and how you can use your own reinforcement learning training to take something that is good and turn it into something pretty amazing, pretty great. So here's Clément Delangue. He is the founder currently running Hugging Face. This is probably one of the top open-source AI sort of champions, people that are getting it out there, enabling open-source AI to be used and doing a lot for the ecosystem as a whole. I mean, the dude runs Hugging Face, right? So, he's saying open source keeps being the greatest competition enabler. Another validation for Chinese open source, which is now the biggest force shaping the global AI stack. And the frontier is no longer just about who trains from scratch, but who adapts, fine-tunes, and productizes fastest. Seeing the same thing with OpenClaw, for example. So I think, all in all, it's a big win for open source. It does raise the question of how many other models are out there where companies are saying, "Yeah, we trained our own in-house model," but they just really were able to cover their tracks, and it's from some other Chinese open-source model. Is it possible that Cursor is the first company that thought of doing this and they were just the first that got caught? I don't know. But all in all, I think all's well that ends okay. And I hope Cursor does pre-train their own model next, if that makes sense from a technical perspective. More importantly, I hope that they continue doing research like this, improving the models and posting what they've done. But let me know if you agree or disagree. Do you think they should get continued outrage over this? Again, I don't think I'd be saying that this was a great thing to do.
I just wanted to point out in this video that they've also contributed a lot. And I think the whole China situation is what caused the sensitivity in the first place. I don't think this was just a low-effort way to steal something. I don't think that's the right narrative here. But let me know what you think in the comments. If you made it this far, thank you so much for watching. I will see you in the next one. My name is Wes Roth. Please consider subscribing. My channel has like the highest views-to-subs ratio, I think, in this entire industry on YouTube for AI. I mean, oh, why can't we make this official? Are you embarrassed of being seen with me? Am I just like on the side? Hit subscribe if you're not subscribed. Hit subscribe. Let's make this official, and I'll see you in the next
