Vibe Coding News THIS WEEK!
8:55

Ray Amjad · 27.10.2025 · 4,312 views · 104 likes · updated 18.02.2026
Video description
Level up with my Claude Code Masterclass 👉 https://www.masterclaudecode.com/
Learn the AI I'm learning with my newsletter 👉 https://newsletter.rayamjad.com/
Got any questions? DM me on Instagram 👉 https://www.instagram.com/theramjad/
🎙️ Sign up to the HyperWhisper Windows Waitlist 👉 https://forms.gle/yCuqmEUrfKKnd6sN7

Since I've never accepted a sponsor, my videos are made possible by...

MY CLASSES
🚀 Claude Code Masterclass: https://www.masterclaudecode.com/?utm_source=youtube&utm_campaign=OtDZVZOnP3M - Use coupon code YEAR2026 for 35% off

MY APPS
🎙️ HyperWhisper, write 5x faster with your voice: https://www.hyperwhisper.com/?utm_source=youtube&utm_campaign=OtDZVZOnP3M - Use coupon code YEAR2026 for 35% off
📲 Tensor AI: Never Miss the AI News
- on iOS: https://apps.apple.com/us/app/ai-news-tensor-ai/id6746403746
- on Android: https://play.google.com/store/apps/details?id=app.tensorai.tensorai
- 100% FREE
📹 VidTempla, Manage YouTube Descriptions at Scale: http://vidtempla.com/?utm_source=youtube&utm_campaign=OtDZVZOnP3M
💬 AgentStack, AI agents for customer support and sales: https://www.agentstack.build/?utm_source=youtube&utm_campaign=OtDZVZOnP3M - Request private beta by emailing r@rayamjad.com

CONNECT WITH ME
🐦 X: https://x.com/@theramjad
👥 LinkedIn: https://www.linkedin.com/in/rayamjad/
📸 Instagram: https://www.instagram.com/theramjad/
🌍 My website/blog: https://www.rayamjad.com/

Links (in order):
- https://scale.com/leaderboard/swe_bench_pro_public
- https://x.com/FactoryAI/status/1971271087855186128/photo/2
- https://x.com/EnoReyes/status/1981457750258549073/photo/1
- https://x.com/alexalbert__/status/1972707077182394744/photo/1
- https://github.com/anthropics/claude-code/blob/main/CHANGELOG.md
- https://www.masterclaudecode.com/
- https://x.com/limgangrui/status/1981008992240611752?s=12
- https://github.com/openai/codex/releases
- https://x.com/embirico/status/1982148230654681114?s=12
- https://x.com/ryolu_/status/1982580555905728560
- https://x.com/corbin_braun/status/1982227942936862940?s=12
- https://www.ycombinator.com/companies/compyle
- https://www.compyle.ai/
- https://www.tensorai.app/
- https://apps.apple.com/us/app/ai-news-tensor-ai/id6746403746?ct=home-page
- https://www.masterclaudecode.com/
- https://www.hyperwhisper.com/

Timestamps:
00:00 - Intro
00:03 - SWE-Bench Pro
00:53 - Limitations of Benchmarks
01:30 - Mixing Models
02:20 - Claude Code Updates
03:46 - Claude Code Viewer
04:25 - Codex CLI Updates
05:48 - Cursor News
06:30 - Compyle
08:18 - Conclusion

Contents (10 segments)

  1. 0:00 Intro 20 words
  2. 0:03 SWE-Bench Pro 163 words
  3. 0:53 Limitations of Benchmarks 119 words
  4. 1:30 Mixing Models 169 words
  5. 2:20 Claude Code Updates 298 words
  6. 3:46 Claude Code Viewer 156 words
  7. 4:25 Codex CLI Updates 284 words
  8. 5:48 Cursor News 148 words
  9. 6:30 Compyle 418 words
  10. 8:18 Conclusion 160 words
0:00

Intro

Okay, so we'll be going through a lot of the vibe coding-related news from last week. First of all,
0:03

SWE-Bench Pro

we have a brand new update to the SWE-Bench Pro benchmark from Scale AI. This is basically their own much harder version of the SWE-Bench Verified benchmark that a lot of the model providers like to use. As you can see in the case of the Claude models, they're slowly getting to the top of that benchmark, which means that we need harder ones like the one that Scale AI came up with. Anyways, you can see that they have a bunch of new models, like claude-4-5-Sonnet and Claude 4.5 Haiku, which does surprisingly well, with both within the error margin, and then gpt-5-2025-08-07 (High). kimi-k2-instruct also seems to do surprisingly well on this benchmark. Meanwhile, it seems that claude-opus-4-1-20250805 is kind of irrelevant now. Unfortunately, they don't have GLM 4.5, which is a pretty interesting model, or 4.6, but maybe they'll add it soon. Now, as I have talked about before, your performance
0:53

Limitations of Benchmarks

for each of these models can vary depending on which agentic environment you're using, whether that's Cursor, OpenCode, Claude Code, or basically any of the other agentic coding tools that are available. This is pretty clear with Droid, for example: when Droid's brand new update came out last month, you can see that, for the same models, it performs better than Claude Code and Codex CLI on the same Terminal-Bench accuracy benchmark. That's because they really nailed the agentic loop that the model is operating in. So you may actually see different performance for each of these models, depending on which agentic environment you're using it in. Anyway, something else
1:30

Mixing Models

interesting that I saw recently is this mixed-model performance result from Droid. You can see that when they use Sonnet 4.5, they get this score on Terminal-Bench. When they use GLM-4.5 Air, which is one of the open-source Chinese models, they get 34.6%. And when they use Sonnet 4.5 for specification but GLM-4.5 Air for executing and actually writing the code, they get very close to Sonnet 4.5. You would probably expect this to land closer to the middle, but it's closer to Sonnet 4.5. And I have heard that to be generally true: many people are using Sonnet 4.5 to plan out what changes should be made in Claude Code, for example, and then they switch to a lighter-weight model like Claude 4.5 Haiku to do the implementation of the code, because it's faster and also more cost effective. Anyways, we're going to talk about the Claude Code
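The plan-with-one-model, execute-with-another setup from this segment can be sketched roughly as follows. This is only an illustration of the idea, not how Droid or Claude Code actually wire it up: `plan_then_execute`, `fake_planner`, and `fake_executor` are hypothetical names, and the stub functions stand in for real API calls to a stronger and a cheaper model.

```python
from dataclasses import dataclass
from typing import Callable

# A "model" here is just a function from prompt text to response text.
# Real code would wrap an API client; these are hypothetical stand-ins.
Model = Callable[[str], str]


@dataclass
class Result:
    plan: str
    output: str


def plan_then_execute(task: str, planner: Model, executor: Model) -> Result:
    """Ask the stronger model for a specification, then hand it to the cheaper one."""
    plan = planner(f"Write a step-by-step implementation plan for: {task}")
    output = executor(f"Follow this plan exactly and write the code.\n\nPlan:\n{plan}")
    return Result(plan=plan, output=output)


# Stub models so the sketch runs without any API keys.
def fake_planner(prompt: str) -> str:
    return "1. Add a theme setting\n2. Apply the theme to every screen"


def fake_executor(prompt: str) -> str:
    # Report how many numbered plan steps the executor received.
    steps = [line for line in prompt.splitlines() if line[:1].isdigit()]
    return f"implemented {len(steps)} steps"


result = plan_then_execute("add a dark mode", fake_planner, fake_executor)
print(result.output)
```

The point of the split is that the expensive model is only called once, for the plan, while the cheaper and faster model does the bulk of the token-heavy work.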
2:20

Claude Code Updates

updates. It is worth mentioning that I also released my Claude Code masterclass. If you go to masterclaudecode.com, you should see it. There are a bunch of useful lessons for basically every single feature that is available in Claude Code, as well as a bunch of bonus content and lessons for features and use cases that I haven't really talked about before on this channel. I am still working on my Codex CLI masterclass, so do stay tuned for that.

Anyways, the first big change is a brand new UI for the permissions prompt. If you do /permissions, you can see it now has a tab structure, and it also includes workspaces as well. So when you do /add-dir to add a directory, it also appears in here. This seems slightly nicer and more convenient.

They also improved the /resume command. When you do /resume, you can press b to toggle the branch filter, so you only see sessions that were made on your current branch and avoid accidentally opening a session that happened on a different branch. You can also press slash to start searching. So I can search for "Japanese", and then I can see when I implemented Japanese language support in my application. This makes resuming a session significantly easier.

Now, something else that I mentioned in my previous video is the brand new interactive question tool. I have found it to be useful in some cases, but in other cases it can be annoying. If you do want to disable it, you can put this rule into your settings: with "deny": ["AskUserQuestion"], it won't ask any questions anymore. Something
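A minimal sketch of adding that deny rule to a settings file from a script. The video only shows the `"deny": ["AskUserQuestion"]` fragment, so the nesting under a top-level `"permissions"` object and the file path in the usage note are assumptions here; check your own settings file before relying on this. `add_deny_rule` is a hypothetical helper name.

```python
import json
from pathlib import Path


def add_deny_rule(settings_path: Path, rule: str = "AskUserQuestion") -> dict:
    """Add `rule` to permissions.deny in a JSON settings file, keeping other keys."""
    settings = {}
    if settings_path.exists():
        settings = json.loads(settings_path.read_text(encoding="utf-8"))

    # Assumed layout (not shown in the video): {"permissions": {"deny": [...]}}
    deny = settings.setdefault("permissions", {}).setdefault("deny", [])
    if rule not in deny:  # avoid duplicate entries on repeated runs
        deny.append(rule)

    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

For example, `add_deny_rule(Path(".claude/settings.json"))` would update a project-level file, assuming that is where your settings live. Running it twice leaves a single entry in the list.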
3:46

Claude Code Viewer

else pretty interesting that I came across is this Claude Code viewer. It basically helps you see what's happening under the hood, because I think it's intercepting the requests that go to the Anthropic API servers, then parsing them and putting them in a nice UI. You can see what kind of tool calls, commands, and messages were passed from the main agent to a subagent, and you can also see exactly what the subagent did. I think this is probably most useful for people who use subagents quite a lot, because you can see how information is being passed from the main agent to the subagent and back to the main agent. You can probably change your prompts around to make sure better or more correct information is being passed, so the main agent and subagents can make better decisions overall. Now we also have some Codex CLI updates.
4:25

Codex CLI Updates

They haven't really updated it much over the last month or so, so I'm not exactly sure what they're doing, but I imagine that they're cooking something big. Anyways, we now have a /feedback command, and if you run that, you can say whether it's a bug, a bad result, a good result, or something else entirely. They also made a bunch of improvements to MCP servers, so you can do things such as specifying which tools are enabled or disabled.

They also added a brand new flag, --add-dir, which allows you to add an additional working directory. This is pretty similar to what already exists in Claude Code with /add-dir. But in the case of Codex CLI, it seems you have to trigger it by passing this flag when you first launch Codex CLI. After adding it, Codex should be able to look through both directories. Even though it just says "directory" here, it will have multiple working directories. They also say they made the /compact command much more stable, and they added auto-compaction too.

Auto-compaction happens when around 90% of the context window is full, as you can see in this pull request. And apparently last week an OpenAI engineer ran Codex on an extremely hard task for over 60 hours. This took roughly a dozen auto-compactions, which are much more stable now. But I guess the main thing here is whether it was successful after 60 hours, because I guess anyone can get Codex CLI or another agent to run for 100-plus hours, but whether the result is actually good is a different story. And as for some Cursor related news, it seems that Cursor
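The auto-compaction trigger mentioned in this segment can be illustrated with a small sketch. The only detail taken from the video is the roughly 90% threshold; how Codex CLI actually counts tokens or builds its summary is not described, so `count_tokens`, `maybe_compact`, `fake_summarize`, and the `keep_recent` parameter are all hypothetical stand-ins.

```python
from typing import Callable, List

COMPACT_THRESHOLD = 0.90  # from the video: compaction kicks in around 90% full


def count_tokens(messages: List[str]) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return sum(len(m.split()) for m in messages)


def maybe_compact(
    messages: List[str],
    context_window: int,
    summarize: Callable[[List[str]], str],
    keep_recent: int = 2,
) -> List[str]:
    """Replace older messages with a summary once usage crosses the threshold."""
    if count_tokens(messages) < COMPACT_THRESHOLD * context_window:
        return messages  # still enough room, nothing to do
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    if not older:
        return messages  # nothing old enough to fold into a summary
    return [summarize(older)] + recent


def fake_summarize(older: List[str]) -> str:
    # A real agent would ask the model for a summary; this just counts messages.
    return f"[summary of {len(older)} earlier messages]"


history = ["alpha beta gamma", "delta epsilon", "zeta eta theta", "iota kappa"]
compacted = maybe_compact(history, context_window=10, summarize=fake_summarize)
print(compacted)
```

Each compaction trades detail for room, which is why a run that needs about a dozen of them depends heavily on how much useful state the summaries preserve.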
5:48

Cursor News

2.0 is going to be coming out soon. Some people have been teasing some features from it online. I don't seem to see the features myself, so maybe I have to switch to early access or something. But you can see there are a few changes. There's a microphone icon so you can speak directly into Cursor. There's also an option where you can have a run happen in the background in the cloud with Cursor background agents, run in parallel with different worktrees, or just run locally on your device as normal. It also seems you will be able to run multiple models at the same time. I think they will likely be running on different worktrees, and then maybe you combine the solutions together somehow or pick the best solution. I'm not exactly sure how this will work yet.
6:30

Compyle

And another pretty interesting tool that I've been trying recently is Compyle, spelled with a y. What made me interested in this was this image over here, where it basically helps you come up with a pretty useful plan. For my application Tensor AI, I was adding a feature that allows users to archive articles that they have already read, so you would be able to swipe left or right or something and then archive an article. I first told it to come up with a plan for how to implement this feature. It then generated some UI mockups as well, after asking me some clarifying questions. And I was like, yes, I want it to be green with an icon, like in this diagram. In the next diagram, it asked me, does this unarchiving page look good? And I'm like, no, I don't want the button over here. So what I find really useful here is that you can actually see visually what kind of changes would end up being made, and that gives you a much more robust plan overall.

And then you can see this is the plan that I came up with at the end. I should be able to give this plan to an agent like Haiku 4.5, because it's pretty long and robust and everything is in there. Or maybe give it to an even cheaper Chinese model, for example, and then have it execute the plan. You can also get it to execute the plan within Compyle, but I haven't really tried that feature yet.

But ultimately I think that people are slowly becoming worse and lazier at prompting over time. So there will be a new set of tools like Compyle that basically help people prompt better or come up with better plans through a bunch of clarifying questions. I think the standout here for me has been the UI-related components. Also, the questions that it has been asking me seem more natural than the questions that I've been getting from Claude Code in the interactive questions plan mode. So I think there's a lot of room in this planning sort of domain for development, and I'm pretty excited to see where this product, or maybe other products, go. Anyways, that's basically it for the video.
8:18

Conclusion

If you do want to check out my brand new Claude Code masterclass, there will be a coupon code and a link down below. And if you do want to check out the programs that I personally make and vibe code, like HyperWhisper, there will be links and coupon codes for those down below. Because I don't accept any sponsors on the channel and pay for the subscriptions myself, it's all funded by the software and classes that I sell. And if you do want to see anything new in this brand new This Week in Vibe Coding series that I'm doing, then do leave a comment down below and I will make the future videos in the series much better. And if you do enjoy this kind of stuff and want to see more of it, then do subscribe to the channel, because it lets the algorithm know that this is a good video and everything like that.
