OpenAI's New GPT-OSS Models Kinda Suck (Comparison)
10:23

Ray Amjad · 05.08.2025 · 794 views · 25 likes · updated 18.02.2026
Video description
Join AI Startup School & learn to vibe code and get paying customers for your apps ⤵️ https://www.skool.com/ai-startup-school
📲 Stay up to date on AI with my app Tensor AI
- on iOS: https://apps.apple.com/us/app/ai-news-tensor-ai/id6746403746
- on Android: https://play.google.com/store/apps/details?id=app.tensorai.tensorai
CONNECT WITH ME
📸 Instagram: https://www.instagram.com/theramjad/
👨‍💻 LinkedIn: https://www.linkedin.com/in/rayamjad/
🌍 My website/blog: https://www.rayamjad.com/
Links Mentioned:
- Tweet: https://x.com/OpenAI/status/1952776916517404876

Table of contents (3 segments)

  1. 0:00 Segment 1 (00:00 - 05:00), 1083 words
  2. 5:00 Segment 2 (05:00 - 10:00), 948 words
  3. 10:00 Segment 3 (10:00 - 10:00), 90 words
0:00

Segment 1 (00:00 - 05:00)

30 minutes ago, OpenAI released two brand new open-source models, and we'll be comparing them against the Chinese models that took the limelight in the past few weeks, Qwen3 Coder and also Kimi K2, when it comes to coding tasks on real-world benchmarks. Basically, they released a 120 billion parameter model, which can be run on data centers and high-end desktops and laptops, and a 20 billion parameter model, which can be run on most desktops and laptops. They seem to be designed for agentic tasks, which can be good for agentic coding I guess, and they have a full chain of thought that you can read too. There are a few benchmarks over here which I won't be going into, there's a research paper attached, and you can download the models on Hugging Face, but basically we'll be using them as they're already deployed on OpenRouter. So if you go to OpenRouter and search "openai", you can see the two models are deployed there. We'll use the bigger model, the 120 billion parameter one. It seems the pricing that many of the providers hosting the model right now are charging is about 15 cents per million input tokens and 60 cents per million output tokens. I'll probably make a separate video about the 20 billion parameter model versus the 120 billion parameter model, because many people would be interested in how much performance they can get locally if they don't have a high-end computer. But for now, we'll just be using the data center version over here, because it's already set up for us. I'll be testing this model against Qwen3 Coder and Kimi K2 by making changes to an AI application I made called Tensor AI, which basically helps you stay up to date with the latest AI news. Currently the open-source announcement is not on here, because the app only checks for new announcements every hour, and this one happened like one minute past the hour.
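At the rates quoted above (assuming roughly $0.15 per million input tokens and $0.60 per million output tokens, as the OpenRouter providers were charging at the time of recording), a per-request cost estimate can be sketched like this:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  in_price_per_m: float = 0.15,
                  out_price_per_m: float = 0.60) -> float:
    """Estimate the dollar cost of one request.

    Prices are dollars per million tokens, taken from the rates
    quoted for gpt-oss-120b on OpenRouter in the video.
    """
    return (input_tokens / 1_000_000) * in_price_per_m \
         + (output_tokens / 1_000_000) * out_price_per_m

# Example: a heavy agentic coding session
print(f"${estimate_cost(1_000_000, 500_000):.2f}")  # → $0.45
```

This is just back-of-the-envelope math; real agentic runs burn far more input tokens than output because the whole file context is resent on each turn.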
But yeah, basically, if you want to learn how to make applications just like that and monetize them and distribute them, I teach everything to do with that in my AI Startup School, and you can join it using the link in the description. I take you through a program that helps you build applications, distribute them, and also come up with AI ideas for your applications too.

But anyways, we'll be getting an API key from OpenRouter over here. I'm going to call this one "YouTube 6th August", because it's the 6th of August for me right now, and then just copy the key. I'm going to be using Cline in all three cases: I think it's a good idea to use the same environment, whether that's something like Gemini CLI or Claude Code routing the requests to a different place, and Cline is a pretty popular choice, so I'll use it everywhere. You want to install it, go to the settings over here, choose OpenRouter, paste your key, and press done. Finally, you can select the model from here, so it should be available if I start typing "openai", and it should be openai/gpt-oss-120b. I can start off by saying "hello, who are you" to make sure that it is an OpenAI model and the requests are being routed properly. You can see it says "I'm Cline" and so on, and that's because it's in the system prompt. But if we go back to OpenRouter and then go to Activity, we can see that this request was routed via OpenRouter to one of these model providers. It seems it reached about 300 tokens per second, which seems pretty fast, and we can see more information about which provider it was routed via and so forth. But anyways, we'll be giving it some tasks. I have a list of tasks over here which have been recommended by users of the application, and one of them is allowing users to adjust the time they receive notifications.
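The "hello, who are you" sanity check can also be done directly against OpenRouter's OpenAI-compatible chat completions endpoint, outside of Cline. A minimal stdlib-only sketch, assuming the key created above is in an `OPENROUTER_API_KEY` environment variable:

```python
import json
import os
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_payload(prompt: str, model: str = "openai/gpt-oss-120b") -> dict:
    """Build an OpenAI-style chat completions payload for OpenRouter."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask(prompt: str) -> str:
    """Send the prompt to OpenRouter and return the model's reply text."""
    req = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# With a valid key set: print(ask("Hello, who are you?"))
```

Without Cline's system prompt in the way, the reply should identify the underlying model rather than the coding assistant.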
So in the application, if you go to settings and then Manage Notifications, you can get a daily digest of any AI news that has happened, and there should be a setting that lets users adjust it to, like, 8 a.m., 9 a.m., 10 p.m., whatever time suits them. This requires a few things: a database migration, some editing on the API ingest endpoint, and a UI update. We'll be giving this task to all three models. Let's start by giving it to the open-source model by OpenAI. We can make a new chat over here, and I'll be using Superwhisper. So I'll just quickly say to Superwhisper: "Hey, I basically want you to add the ability for the user to decide which time they receive the daily digest notification. You should update the front end with some UI on the mobile app, update the Manage Notifications page, then also update the ingest endpoint which decides which times they should be receiving the daily digest, and then also add a new database migration for this and update the database schema. So all three things require updating, and yeah, go for it." So that's the prompt over here; I gave it some guidance on what it should be updating. We can just press enter and see what happens. And now let's do the same with Qwen over here, so open up Cline again. Okay, so for some reason Cline just kept bugging out and asking me to sign in over and over again, because it didn't like me having different models running in parallel in different instances of it. So we'll be using this application instead, which is Roo Code, and Roo Code is like the number one application right now on OpenRouter. So we
5:00

Segment 2 (05:00 - 10:00)

can just select a model from this list and go to API Configuration, make sure that OpenRouter is selected over here, and for the model we can choose openai/gpt-oss-120b, then paste in the same prompt that we had earlier. I'll press enter over here. Now let's just quickly double check that this is actually going via the new model: if we go back to Activity, we can see that it is now going via Roo Code, so Roo Code to GPT-OSS. We'll have this run, see how this model performs, and then we'll have the task run via Kimi K2 on Roo Code as well, and then via Qwen3 Coder on Roo Code after that is done.

All right, so it seems the OpenAI model completed the task on Roo Code, and it cost 47 cents in total, which is pretty interesting. It did take quite a while, because I guess OpenRouter is pretty under load for this model, with a lot of people trying to use it right now. We'll compare this against Kimi K2 and also Qwen3 Coder by opening a new session, opening up Kimi, and then having the task run by Kimi now. So I'm going to switch the model over to Kimi K2, press done, and then run it in the Kimi folder over here. What I'll actually do to make sure it's fair is start off in Architect mode and give it the exact same prompt, because the OpenAI model also started off in Architect mode. And I think whilst this is running, maybe I can spin up another session of Roo Code, which I haven't tried before, and give the prompt to Qwen3 Coder instead. I think that should be fine, because it should isolate things per session. So I'll go back, choose Qwen3 Coder, paste the prompt, choose Architect mode again to be fair, and press enter, and we'll see how Qwen3 Coder performs too.
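While those runs go, it's worth sketching what the ingest-endpoint part of the task amounts to: when the hourly job fires, the digest should only go to users whose chosen hour matches the current one. A minimal sketch, with hypothetical field and function names (not the app's actual code):

```python
from dataclasses import dataclass

@dataclass
class User:
    id: str
    digest_hour: int  # preferred delivery hour, 0-23 (the new DB column)

def users_due_for_digest(users: list[User], current_hour: int) -> list[User]:
    """Return users whose preferred digest hour matches the current hour.

    The hourly ingest job would call this and send the daily digest
    notification only to the users it returns.
    """
    return [u for u in users if u.digest_hour == current_hour % 24]

users = [User("a", 8), User("b", 21), User("c", 8)]
print([u.id for u in users_due_for_digest(users, 8)])  # → ['a', 'c']
```

The other two pieces of the task (the migration adding the column, and the time-picker UI) hang off this same field, which is why the prompt asks for all three to be updated together.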
All right, so it seems Qwen3 Coder is also done, and Kimi K2 is done too. When I quickly check the prices, it seems that GPT's model, at least when running on OpenRouter, came out to be the cheapest at $0.47, whereas Qwen3 Coder came out at $2.15 and Kimi K2 at $0.83. So we'll check the solution from each of them. For the OpenAI solution over here, I'm just going to quickly read through it, because this would be a very long video if I ran all of them. OpenAI's model seems to have added the migration properly, and also the new field to the database and the roundup generation, but it failed to actually make an update to the UI. So it added the state, but it didn't make a UI update, unfortunately. Whereas Kimi K2 over here, if we check through what it did, made a migration as well. It didn't make an update to the schema table, unfortunately, and it just deleted the database types for whatever reason. It did make an update to the roundups as well, and looking through the code, this seems to be correct. It also added a nice time picker component, and it made many more updates to the settings page: it added the state and then did all the relevant fetching on the notifications page. So I think Kimi K2 actually beats the open model in this case. As for Qwen3 Coder, the thing is, it made a bunch of MD files for planning that it didn't delete. It also made a migration. It didn't update the schemas, and it also deleted all the database types for whatever reason. But hey, it happens; we can regenerate them later automatically. And it actually added a component to the notifications page, which you can see over here, and this looks pretty good. It seems to follow the same design that we have used so far in the application, and it handles changing the time properly too.
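For reference, putting the three quoted run costs side by side (a throwaway snippet; the numbers are as read off OpenRouter's activity page in the video):

```python
# Cost of one run of the notification-time task, in dollars
runs = {"gpt-oss-120b": 0.47, "kimi-k2": 0.83, "qwen3-coder": 2.15}

cheapest = min(runs.values())
for model, cost in sorted(runs.items(), key=lambda kv: kv[1]):
    print(f"{model}: ${cost:.2f} ({cost / cheapest:.1f}x the cheapest run)")
# → gpt-oss-120b: $0.47 (1.0x the cheapest run)
# → kimi-k2: $0.83 (1.8x the cheapest run)
# → qwen3-coder: $2.15 (4.6x the cheapest run)
```

So GPT-OSS was cheapest by a clear margin, with Qwen3 Coder roughly 4.6 times the price for the same task.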
But of course, Qwen3 Coder was much more expensive than Kimi K2, so I think Kimi K2 comes out on top here. I know I can just tell OpenAI's model, "hey, can you add the component to the notifications page", and then it will add the component. But yeah, I don't think I'd be using Qwen3 Coder, because it just ends up being much more expensive. Kimi K2 seems to do pretty well over here; the only downside is it deleted my entire database types, which the OpenAI model did not do. Because this video is already getting pretty long and I'm tired, since it's 4:00 a.m. over here, I'll do other tests in another video. So do subscribe if you want to see me try the OpenAI model against other models in much harder tests. But it seems it finished adding a component as well, so it made a new component over here, and it didn't actually integrate the component into the wider screen that it was meant to. I thought that was
10:00

Segment 3 (10:00 - 10:00)

pretty obvious that it should do. So I think the model does require more guidance on the surface than Kimi K2 does. I feel like Kimi K2 just does a better job of doing it on its own. With the extra guidance, the OpenAI model would probably end up costing about the same amount, maybe, but we'll see over the coming days as I do more tests. I will be making an update video in a few days when I have done more testing of the model, so do subscribe for
