Vibe Coding News THIS WEEK!

8:55

Ray Amjad, 27.10.2025, 4,312 views, 104 likes, updated 18.02.2026

Video description

Level up with my Claude Code Masterclass 👉 https://www.masterclaudecode.com/
Learn the AI I'm learning with my newsletter 👉 https://newsletter.rayamjad.com/
Got any questions? DM me on Instagram 👉 https://www.instagram.com/theramjad/
🎙️ Sign up to the HyperWhisper Windows Waitlist 👉 https://forms.gle/yCuqmEUrfKKnd6sN7

Since I've never accepted a sponsor, my videos are made possible by...

MY CLASSES
🚀 Claude Code Masterclass: https://www.masterclaudecode.com/?utm_source=youtube&utm_campaign=OtDZVZOnP3M
- Use coupon code YEAR2026 for 35% off

MY APPS
🎙️ HyperWhisper, write 5x faster with your voice: https://www.hyperwhisper.com/?utm_source=youtube&utm_campaign=OtDZVZOnP3M
- Use coupon code YEAR2026 for 35% off
📲 Tensor AI: Never Miss the AI News
- on iOS: https://apps.apple.com/us/app/ai-news-tensor-ai/id6746403746
- on Android: https://play.google.com/store/apps/details?id=app.tensorai.tensorai
- 100% FREE
📹 VidTempla, Manage YouTube Descriptions at Scale: http://vidtempla.com/?utm_source=youtube&utm_campaign=OtDZVZOnP3M
💬 AgentStack, AI agents for customer support and sales: https://www.agentstack.build/?utm_source=youtube&utm_campaign=OtDZVZOnP3M
- Request private beta by emailing r@rayamjad.com

CONNECT WITH ME
🐦 X: https://x.com/@theramjad
👥 LinkedIn: https://www.linkedin.com/in/rayamjad/
📸 Instagram: https://www.instagram.com/theramjad/
🌍 My website/blog: https://www.rayamjad.com/

Links (in order):
- https://scale.com/leaderboard/swe_bench_pro_public
- https://x.com/FactoryAI/status/1971271087855186128/photo/2
- https://x.com/EnoReyes/status/1981457750258549073/photo/1
- https://x.com/alexalbert__/status/1972707077182394744/photo/1
- https://github.com/anthropics/claude-code/blob/main/CHANGELOG.md
- https://www.masterclaudecode.com/
- https://x.com/limgangrui/status/1981008992240611752?s=12
- https://github.com/openai/codex/releases
- https://x.com/embirico/status/1982148230654681114?s=12
- https://x.com/ryolu_/status/1982580555905728560
- https://x.com/corbin_braun/status/1982227942936862940?s=12
- https://www.ycombinator.com/companies/compyle
- https://www.compyle.ai/
- https://www.tensorai.app/
- https://apps.apple.com/us/app/ai-news-tensor-ai/id6746403746?ct=home-page
- https://www.masterclaudecode.com/
- https://www.hyperwhisper.com/

Timestamps:
00:00 - Intro
00:03 - SWE-Bench Pro
00:53 - Limitations of Benchmarks
01:30 - Mixing Models
02:20 - Claude Code Updates
03:46 - Claude Code Viewer
04:25 - Codex CLI Updates
05:48 - Cursor News
06:30 - Compyle
08:18 - Conclusion

Contents (10 segments)

Intro

Okay, so we'll be going through a lot of the vibe coding related news from last week. First of all,

SWE-Bench Pro

we have a brand new update to the SWE-Bench Pro benchmark from Scale AI. This is basically their own, much harder version of the SWE-Bench Verified benchmark that a lot of the model providers like to use. And as you can see in the case of the Claude models, they're slowly getting to the top of that benchmark, which means that we need harder ones like the one that Scale AI came up with. Anyway, you can see that they have a bunch of new models, like claude-4-5-Sonnet and Claude 4.5 Haiku, which does surprisingly well, with both within the error margin, and then gpt-5-2025-08-07 (High). kimi-k2-instruct also seems to do surprisingly well in this benchmark. Meanwhile, it seems that claude-opus-4-1-20250805 is kind of irrelevant now. Unfortunately, they don't have GLM 4.5, which is a pretty interesting model, or 4.6, but maybe they'll add it soon. Now, as I have talked about before, your performance

Limitations of Benchmarks

for each of these models can vary depending on which agentic environment you're using, whether that's Cursor, OpenCode, Claude Code, or basically any of the other agentic coding tools that are available. This is pretty clear with Droid, for example: when Droid's brand new update came out last month, you could see that for the same models it performs better than Claude Code and Codex CLI on the same Terminal-Bench accuracy benchmark. And that's because they really nailed the agentic loop that the model is operating in. So you may actually see different performance for each of these models, depending on which agentic environment you're using it in. Anyway, something else

Mixing Models

interesting that I saw recently is this mixed-model performance comparison from Droid. You can see that when they use Sonnet 4.5, they get this score on Terminal-Bench. When they use GLM-4.5 Air, which is one of the open source Chinese models, they get 34.6%. And when they use Sonnet 4.5 for the specification, but GLM-4.5 Air for executing and actually writing the code, they get very close to Sonnet 4.5. You would probably expect this to land closer to the middle, but it's closer to Sonnet 4.5. And I have heard that to be generally true. Many people are using Sonnet 4.5 to plan out what changes should be made in Claude Code, for example, and then they switch to a lighter-weight model like Claude 4.5 Haiku to do the implementation, because it's faster and also more cost effective. Anyways, we're going to talk about the Claude Code
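As a rough illustration of the plan-with-one-model, execute-with-another idea described in this segment, here is a minimal Python sketch. It is not how Droid or Claude Code implement it: the `TwoStageAgent` class, the prompt wording, and the two stub "models" are all hypothetical, and a real setup would replace the stubs with calls to a stronger and a cheaper model.

```python
from dataclasses import dataclass
from typing import Callable

# A "model" here is just a function from prompt text to response text.
Model = Callable[[str], str]


@dataclass
class TwoStageAgent:
    planner: Model   # stronger, more expensive model writes the specification
    executor: Model  # lighter, cheaper model writes the code from that spec

    def run(self, task: str) -> dict:
        spec = self.planner(f"Write a step-by-step implementation plan for: {task}")
        code = self.executor(f"Implement exactly this plan:\n{spec}")
        return {"spec": spec, "code": code}


# Hypothetical stub models so the sketch runs without any API access.
def fake_planner(prompt: str) -> str:
    return "1. Add an archive flag to articles\n2. Hide archived articles from the feed"


def fake_executor(prompt: str) -> str:
    # A real executor would call a cheaper model; this stub only counts plan steps.
    steps = [line for line in prompt.splitlines() if line[:1].isdigit()]
    return f"# implemented {len(steps)} planned steps"


agent = TwoStageAgent(planner=fake_planner, executor=fake_executor)
result = agent.run("let users archive articles they have already read")
print(result["code"])
```

The point of the split is that the expensive model is called once for the plan, while the cheaper model does the longer implementation step.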

Claude Code Updates

updates. It is worth mentioning that I also released my Claude Code masterclass, so if you go to masterclaudecode.com, you should see it. There are a bunch of useful lessons for basically every single feature that is available in Claude Code, as well as a bunch of bonus content and lessons for features and use cases that I haven't really talked about before on this channel. I am still working on my Codex CLI masterclass, so do stay tuned for that.

Anyways, the first big change is a brand new UI for the permissions prompt. If you do /permissions, you can see it now has a tab structure, and it also includes workspaces. So when you do /add-dir to add a directory, it also appears in here. This seems slightly nicer and more convenient.

They also improved the /resume command. When you do /resume, you can press b to toggle the branch filter, so you only see sessions that were made on your current branch and avoid accidentally opening a session that happened on a different branch. You can also press slash to start searching. So I can search for "Japanese", and then I can see when I implemented Japanese language support in my application. This makes resuming sessions significantly easier.

Now, something else that I mentioned in my previous video is this brand new interactive question tool. I have found it to be useful in some cases, but in other cases it can be annoying. If you do want to disable it, you can put this into your settings: with "deny": ["AskUserQuestion"], it won't ask any questions anymore. Something
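For readers who want to apply the "deny": ["AskUserQuestion"] setting programmatically, here is a small Python sketch. It assumes the deny list sits under a "permissions" key in a Claude Code settings file, which the video does not show explicitly; the `deny_tool` helper and the example "allow" entry are hypothetical and only there for illustration.

```python
import json


def deny_tool(settings: dict, tool_name: str) -> dict:
    """Return a copy of settings with tool_name added to permissions.deny."""
    updated = json.loads(json.dumps(settings))  # deep copy via a JSON round-trip
    permissions = updated.setdefault("permissions", {})
    deny = permissions.setdefault("deny", [])
    if tool_name not in deny:  # keep the list free of duplicates
        deny.append(tool_name)
    return updated


# Hypothetical existing settings; a real file may contain other keys.
existing = {"permissions": {"allow": ["Bash(npm run test:*)"]}}
new_settings = deny_tool(existing, "AskUserQuestion")
print(json.dumps(new_settings, indent=2))
```

The helper leaves the input untouched and is safe to call twice, so it can be run against a settings file that already contains the entry.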

Claude Code Viewer

else pretty interesting that I came across is this Claude Code viewer. It basically helps you see what's happening under the hood, because I think it's intercepting the requests that go to the Anthropic API servers, then parsing the requests and putting them in a nice UI. You can see what kind of tool calls, commands, and messages were passed from the main agent to the subagent. You can also see exactly what the subagent did. I think this is probably most useful for people who use subagents quite a lot, because you can see how information is being passed from the main agent to the subagent and back to the main agent. And you can probably change your prompt around to make sure better or more correct information is being passed, so the main agent and subagents can make better decisions overall. Now we also have some Codex CLI updates.

Codex CLI Updates

They haven't really updated it much over the last month or so, so I'm not exactly sure what they're doing, but I imagine that they're cooking something big. Anyways, we now have a /feedback command, and if you run it, you can say whether it's a bug, a bad result, a good result, or something else entirely. They also made a bunch of improvements to MCP servers, so you can do things such as specifying which tools are enabled or disabled.

They also added a brand new flag, --add-dir, which allows you to add an additional working directory. This is pretty similar to what already exists in Claude Code with /add-dir. But in the case of Codex CLI, it seems you have to pass it when you're launching Codex CLI. After adding it, it should be able to look through both directories. Even though it just says directory here, it will have multiple working directories.

They also say they made the /compact command much more stable, and they added auto-compaction too. Auto-compaction happens when around 90% of the context window is full, as you can see in this pull request. And apparently last week an OpenAI engineer ran Codex on an extremely hard task for over 60 hours. This took roughly a dozen auto-compactions, which are much more stable now. But I guess the main thing here is whether it was successful after 60 hours, because anyone can get Codex CLI or another agent to run for 100-plus hours, but whether the result is actually good is a different story. And as for some Cursor related news, it seems that Cursor
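To make the "compact at around 90% of the context window" behavior concrete, here is a toy Python sketch. It is not Codex CLI's implementation: the `Conversation` class, the token counts, and the summary that shrinks the history to a tenth of its size are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    context_window: int          # total token budget (hypothetical number)
    threshold_percent: int = 90  # compact at roughly 90% full, per the video
    messages: list = field(default_factory=list)  # (text, token_count) pairs
    compactions: int = 0

    def used_tokens(self) -> int:
        return sum(tokens for _, tokens in self.messages)

    def add(self, text: str, tokens: int) -> None:
        self.messages.append((text, tokens))
        # Integer comparison avoids floating-point rounding at the boundary.
        if self.used_tokens() * 100 >= self.threshold_percent * self.context_window:
            self.compact()

    def compact(self) -> None:
        # Stand-in for a model-written summary: one short message replaces history.
        summary_tokens = max(1, self.used_tokens() // 10)
        self.messages = [(f"summary of {len(self.messages)} messages", summary_tokens)]
        self.compactions += 1


convo = Conversation(context_window=1000)
for i in range(12):
    convo.add(f"turn {i}", 100)
print(convo.compactions, convo.used_tokens())
```

A long-running session like the 60-hour one mentioned above would simply pass through this check many times, which is why the stability of each compaction matters.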

Cursor News

2.0 is going to be coming out soon. Some people have been teasing some features from it online. I don't see the features myself, so maybe I have to switch to early access or something. But you can see there are a few changes. There's a microphone icon, so you can speak directly into Cursor. There's also an option where you can have a run in the background on the cloud with Cursor background agents, run in parallel with different worktrees, or just run locally as normal on your device. It also seems you will be able to run multiple models at the same time. I think they will likely run on different worktrees, and then maybe you combine the solutions together somehow or pick the best solution. I'm not exactly sure how this will work yet.

Compyle

And another pretty interesting tool that I've been trying recently is Compyle, spelled with a y. What made me interested in this was this image over here, where it basically helps you come up with a pretty useful plan. For my application Tensor AI, I was adding a feature that allows users to archive articles that they have already read. So you would be able to swipe left or right or something, and then archive an article. I first told it to come up with a plan for how to implement this feature. It then generated some UI mockups as well, after asking me some clarifying questions. And I was like, yes, I want it to be green with an icon or something, like in this diagram. In the next diagram, it showed me the unarchiving page and asked, does this look good? And I'm like, no, I don't want the button over here.

What I find really useful here is that you can actually see visually what kind of changes would end up being made, and that gives you a much more robust plan overall. You can see this is the plan that I came up with at the end. I should be able to give this plan to an agent like Haiku 4.5 or something, because it's pretty long, robust, and everything is in there. Or maybe even give it to a cheaper Chinese model, for example, and then have it execute on the plan. You can also get it to execute on the plan within Compyle, but I haven't really tried that feature yet.

But ultimately I think that people are slowly becoming worse and lazier at prompting over time, so there will be a new set of tools like Compyle that basically help people prompt better or come up with better plans through a bunch of clarifying questions. I think the standout here for me has been the UI-related components. Also, the questions that it has been asking me seem more natural than the questions that I've been getting from Claude Code in the plan mode with interactive questions. So I think there's a lot of room in this planning sort of domain for development, and I'm pretty excited to see where this product or maybe other products go. Anyways, that's basically it for the video.

Conclusion

If you do want to check out my brand new Claude Code masterclass, there will be a coupon code down below with a link. And if you do want to check out the programs that I personally make and vibe code, like HyperWhisper, there will be links and coupon codes for those down below. Because I don't accept any sponsors on the channel and pay for the subscriptions myself, it's all funded by the software and classes that I sell. And if you do want to see anything new in this brand new "this week in vibe coding" series that I'm doing, then do leave a comment down below and I will make the future videos in the series much better. And if you do enjoy this kind of stuff and want to see more of it, then do subscribe to the channel, because it lets the algorithm know that this is a good video and everything like that.
