GPT-5.1 Takes Over Opus 4.1 in Coding?


Ray Amjad · 14.11.2025 (updated 18.02.2026)
Video description
Level up with my Claude Code Masterclass 👉 https://www.masterclaudecode.com/
Learn the AI I'm learning with my newsletter 👉 https://newsletter.rayamjad.com/
Got any questions? DM me on Instagram 👉 https://www.instagram.com/theramjad/
🎙️ Sign up to the HyperWhisper Windows Waitlist 👉 https://forms.gle/yCuqmEUrfKKnd6sN7

Since I've never accepted a sponsor, my videos are made possible by...

—— MY CLASSES ——
🚀 Claude Code Masterclass: https://www.masterclaudecode.com/?utm_source=youtube&utm_campaign=SdeVQ0YyMvc - Use coupon code YEAR2026 for 35% off

—— MY APPS ——
🎙️ HyperWhisper, write 5x faster with your voice: https://www.hyperwhisper.com/?utm_source=youtube&utm_campaign=SdeVQ0YyMvc - Use coupon code YEAR2026 for 35% off
📲 Tensor AI: Never Miss the AI News - on iOS: https://apps.apple.com/us/app/ai-news-tensor-ai/id6746403746 - on Android: https://play.google.com/store/apps/details?id=app.tensorai.tensorai - 100% FREE
📹 VidTempla, Manage YouTube Descriptions at Scale: http://vidtempla.com/?utm_source=youtube&utm_campaign=SdeVQ0YyMvc
💬 AgentStack, AI agents for customer support and sales: https://www.agentstack.build/?utm_source=youtube&utm_campaign=SdeVQ0YyMvc - Request private beta by emailing r@rayamjad.com

————— CONNECT WITH ME
🐦 X: https://x.com/@theramjad
👥 LinkedIn: https://www.linkedin.com/in/rayamjad/
📸 Instagram: https://www.instagram.com/theramjad/
🌍 My website/blog: https://www.rayamjad.com/
—————

Links:
- https://openai.com/index/gpt-5-1-for-developers/
- https://www.anthropic.com/news/claude-sonnet-4-5
- https://x.com/xeophon_/status/1989043950695641166?s=12
- https://x.com/embirico/status/1989081059041116598?s=12
- https://openrouter.ai/openai/gpt-5-codex
- https://openrouter.ai/openai/gpt-5.1-codex
- https://openrouter.ai/openai/gpt-5.1-codex-mini
- https://x.com/coderabbitai/status/1989035006774354387?s=12
- https://www.coderabbit.ai/blog/gpt-51-for-code-related-tasks-higher-signal-at-lower-volume

Timestamps:
00:00 - Intro
01:24 - Speeds & Mini Models
02:01 - Code Review
02:45 - My Tests
03:13 - Designs
05:16 - Making an iOS App

Table of contents (6 segments)

  1. 0:00 Intro (308 words)
  2. 1:24 Speeds & Mini Models (117 words)
  3. 2:01 Code Review (128 words)
  4. 2:45 My Tests (99 words)
  5. 3:13 Designs (451 words)
  6. 5:16 Making an iOS App (853 words)
0:00

Intro

Okay, so GPT-5.1 came out in Codex CLI early today, and we'll be going through it in this video to see how it compares to GPT-5 when it comes to vibe-coding. We'll start by going through the release notes and what some people online are saying about it, and then go through my own experience vibe-coding with it.

So they say they made GPT-5.1 faster because they overhauled the way they trained it to think, which means it now spends less time on easy tasks and more time on harder tasks. You can see here that GPT-5.1 medium generates fewer tokens per response across all these tasks. And finally, you can see that on SWE-bench Verified, GPT-5.1 High gets a better score of 76%, compared to GPT-5 High, which scored 73%.

Now, that is slightly behind Claude Sonnet 4.5, which gets a score of 77.2%, but GPT-5.1 High now beats Opus 4.1. For some reason they did not include the GPT-5.1 Codex scores, only the GPT-5.1 scores. Anyway, the GPT-5.1 Codex models did land in Codex, and they say that GPT-5.1 Codex is better for Codex on macOS and Linux, while GPT-5.1 is better for Codex on Windows. They say that's because the GPT-5.1 Codex models are not as good at PowerShell as GPT-5.1 is, which means that if you are using PowerShell, and not something like Windows Subsystem for Linux, then you should probably use GPT-5.1 instead.

And that's because the environment the Codex models are reinforcement-learned in is actually bash. Anyway, one thing that I did notice when doing my testing between GPT-5 Codex and GPT-5.1
1:24

Speeds & Mini Models

Codex is that GPT-5 Codex has become slower now, probably because they're phasing the model out. You can see on OpenRouter that it was performing at around 25 to 30 tokens per second, and now it has dropped to about 15-16 tokens per second. Meanwhile, GPT-5.1 Codex has been running at around 30 tokens per second for most of today.

And GPT-5.1 Codex Mini is running at around 57 tokens per second, which means it should feel twice as fast. They also say it's four times more cost-efficient, which means that if you use GPT-5.1 Codex Mini, you won't hit your rate limits as soon. And now, as for the
2:01

Code Review

final bit of online verdict: CodeRabbit says GPT-5.1 isn't just a faster GPT-5. In their own evals of the model, they found it's the highest-precision model they have ever tested for code-related tasks like code review. They measured a precision increase of between 7 and 50%, which is a pretty big range.

They say that compared to GPT-5 Codex and Sonnet 4.5, GPT-5.1's comments feel leaner, more conversational, and closer to how experienced engineers actually communicate.

They also say GPT-5.1 follows context better, because when prompts are vague, it explicitly explains its assumptions. The CodeRabbit team then explains a bunch more, including where GPT-5.1 still lags behind. So basically, I got GPT-5,
2:45

My Tests

5.1, 5 Codex, and 5.1 Codex to do a bunch of different things. The first one was making an iOS version of my application, HyperWhisper, which is my speech-to-text application. There will be a coupon code down below if you are interested in buying it.

And yeah, a lot of people have asked for an iOS application, so I figured it was worth trying to get one started. Then I got them to do some landing page designs, because I'm pretty interested in how the models have evolved when it comes to designing things.
3:13

Designs

So this is the design that GPT-5 came up with when introducing itself. The prompt was the same each time: I said, please introduce yourself, the model, in a complete landing page. It has a features section, a purple theme, and this live demo, which may actually work.

And yeah, it does seem to respond with something. Oh, it turns out it actually responds with the same thing every single time. Anyway, this is GPT-5 again: I gave a new session the exact same prompt, ignoring the previous context. This one actually seems slightly longer; it still has a very purple-esque design and the components of a landing page. It does not seem to have a footer this time, which is interesting.

Then I gave the same prompt to GPT-5.1, and this is the page it came up with. Interestingly enough, it did not use any purple; it went with green instead this time. It seems to be longer, which is pretty good. The FAQ also works.

It also added a footer and a call to action at the bottom, so it does seem more complete compared to GPT-5. And this is another attempt as well.

Honestly, I think this may be the best of all the landing pages. It is slightly messed up at the top over here, but it does feel more like a landing page; I think that's actually because of this right-hand bar over here.

Now let me show you GPT-5.1 Codex as well. So this is GPT-5.1 Codex. I do like that the color scheme here is better than the purple design, but it does seem a little too compact for my liking; the padding is kind of off.

And then this is GPT-5.1 Codex again in a different chat, and this nav bar is totally messed up. It didn't build it properly, but the padding here actually does seem better, so I think that's a good thing. It did add the footer in this case, as well as in the previous one. So honestly, it seems that GPT-5.1 is slightly better when it comes to design; I would probably stick with 5.1 myself instead of 5.1 Codex. Anyway, I will be trying out more design tasks over the coming days and sharing my thoughts in my AI startup school, which you can join using the link in the description down below. A bunch of people have already joined and found some value in it, so maybe you will too.
5:16

Making an iOS App

As for a much more complicated task, I asked it to make an iOS version of my application, HyperWhisper. So I came up with a somewhat detailed plan for how everything should look, and I read through the plan and was like, okay, this looks good.

Then I gave this plan to GPT-5 Codex medium thinking and GPT-5.1 Codex medium thinking, and I made some notes for myself here.

I would say that GPT-5.1 Codex is faster, because it took 13 minutes compared to GPT-5 Codex's 30 minutes. And when I asked both models, are you done? They basically said, yeah, we're done.

I think part of this performance difference is that they have slowed GPT-5 Codex down to half the speed it used to run at, so maybe the two would have taken roughly the same time before. Then I got Claude Code to look at both of them and compare the solutions to see which one is better.

Even though it says GPT-5 here, this actually refers to GPT-5 Codex. After Sonnet 4.5 looked through both codebases, it said that GPT-5 Codex's version is more advanced and feature-complete, with better modularity and a more modern Swift implementation, while GPT-5.1 Codex did not do as good a job, because it mostly just laid out some boilerplate for me to use. You can see that when it comes to feature capabilities, some of the features are actually missing compared with GPT-5 Codex. And yeah, honestly, it seems kind of disappointing.

I guess it is using fewer tokens per turn, but that also means any implementation it does produce may be less complete, because maybe they're training it to use fewer tokens per turn rather than checking whether its implementation has actually been completed. But I did use GPT-5.1 Codex here, so maybe it would have done a better job if I had just used plain GPT-5.1 instead.

Okay, this is the version that GPT-5.1 Codex made.
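As an aside, the timing claim a moment ago — that GPT-5 Codex might have matched GPT-5.1 Codex's time before the slowdown — can be sanity-checked with quick arithmetic. This sketch uses the token-per-second figures quoted earlier from OpenRouter (taking 15.5 and 27.5 as midpoints of the quoted ranges) and assumes, as a simplification, that run time is dominated by token generation:

```python
# Sanity check: GPT-5 Codex took 30 minutes at its current ~15.5 tok/s.
# What would the same run have taken at its earlier ~27.5 tok/s
# (midpoint of the 25-30 tok/s range quoted from OpenRouter)?

def adjusted_minutes(observed_minutes: float, current_tps: float, old_tps: float) -> float:
    """Minutes the same run would take at the old throughput.

    Assumes total run time scales inversely with generation speed,
    i.e. the run is dominated by token generation, not tool calls.
    """
    return observed_minutes * current_tps / old_tps

# 30 min at 15.5 tok/s corresponds to roughly 17 min at 27.5 tok/s -
# much closer to GPT-5.1 Codex's observed 13-minute run.
print(round(adjusted_minutes(30, 15.5, 27.5)))  # prints 17
```

So under that (rough) assumption, most of the 30-vs-13-minute gap is explained by the slowdown rather than by the new model doing dramatically less work per token.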
So there's a license key page, and this is what it looks like. For some reason the screen resolution is messed up, but I think that's a simulator issue.

It had the history view, and the settings were kind of added to my keyboard. And honestly, I told it to make the design like the desktop application, and it did a pretty bad job at that. When I press the record button, the app just crashes instead of asking me to allow microphone recording permissions.

So let's see what GPT-5 Codex did instead. Okay, this is the version that GPT-5 Codex made. You can see there's a record page and a history page, and the design is kind of messed up here.

Then the settings page has a vocabulary manager too, and you enter your license key. There's no validation system. Or actually, maybe there is, because it says the key is wrong.

There are also keyboard instructions, open new, and open settings, and then tap to record doesn't actually do anything. So I think turning a macOS desktop application into a mobile application is very difficult to do in one session, and I'd have to iterate with the model back and forth. But yeah, honestly, I think neither of them did as good a job as I had hoped.

GPT-5 Codex did manage to go further than GPT-5.1 Codex, so I will probably use its version as a starting point, continue to iterate on it until it's a working application, and then finally publish it to the iOS App Store.

My conclusion here is probably that GPT-5.1 Codex does seem more conversational and better at design. If you want even better design, you would want to go for GPT-5.1 instead. But GPT-5.1 Codex does seem to produce fewer tokens per turn.

So if you're getting it to do a really big task, it probably won't do as much of the task as you would have liked, and that's why you can see the implementation from GPT-5.1 Codex being worse than the GPT-5 Codex implementation.

Anyway, if you do want to buy the macOS version of HyperWhisper, it will be linked down below with a coupon code for Black Friday. And if you want to learn more about Codex CLI and Claude Code, I have masterclasses on both linked down below, where I go through every single feature as well as a bunch of bonus content. Finally, I will be sharing more of my thoughts on GPT-5.1 and GPT-5.1 Codex within my community as well, so if you are interested in that, there will be a link down below.
