Claude Code's New Default Subagents (This Week's News!)
12:31


Ray Amjad · 03.11.2025 · 5,745 views · 138 likes · updated 18.02.2026
Video description
Level up with my Claude Code Masterclass 👉 https://www.masterclaudecode.com/
Learn the AI I'm learning with my newsletter 👉 https://newsletter.rayamjad.com/
Got any questions? DM me on Instagram 👉 https://www.instagram.com/theramjad/
🎙️ Sign up to the HyperWhisper Windows Waitlist 👉 https://forms.gle/yCuqmEUrfKKnd6sN7
Since I've never accepted a sponsor, my videos are made possible by...

—— MY CLASSES ——
🚀 Claude Code Masterclass: https://www.masterclaudecode.com/?utm_source=youtube&utm_campaign=TOs0DRBcdRs
- Use coupon code YEAR2026 for 35% off

—— MY APPS ——
🎙️ HyperWhisper, write 5x faster with your voice: https://www.hyperwhisper.com/?utm_source=youtube&utm_campaign=TOs0DRBcdRs
- Use coupon code YEAR2026 for 35% off
📲 Tensor AI: Never Miss the AI News
- on iOS: https://apps.apple.com/us/app/ai-news-tensor-ai/id6746403746
- on Android: https://play.google.com/store/apps/details?id=app.tensorai.tensorai
- 100% FREE
📹 VidTempla, Manage YouTube Descriptions at Scale: http://vidtempla.com/?utm_source=youtube&utm_campaign=TOs0DRBcdRs
💬 AgentStack, AI agents for customer support and sales: https://www.agentstack.build/?utm_source=youtube&utm_campaign=TOs0DRBcdRs
- Request private beta by emailing r@rayamjad.com

————— CONNECT WITH ME
🐦 X: https://x.com/@theramjad
👥 LinkedIn: https://www.linkedin.com/in/rayamjad/
📸 Instagram: https://www.instagram.com/theramjad/
🌍 My website/blog: https://www.rayamjad.com/

————— Links:
- https://github.com/anthropics/claude-code/blob/main/CHANGELOG.md
- https://www.reddit.com/r/Anthropic/comments/1oc8uq9/claude_code_overrides_the_sandbox_without/
- https://www.reddit.com/r/ClaudeCode/comments/1ohb48d/claude_code_after_i_caught_it_disabling_the/
- https://x.com/claudeai/status/1984304957353243061?s=12
- https://x.com/dmwlff/status/1984338962698420300?s=12
- https://github.com/numman-ali/openskills
- https://nextjs.org/evals
- https://github.com/vercel/next-evals-oss
- https://scale.com/leaderboard/rli
- https://arxiv.org/pdf/2510.26787
Timestamps:
00:00 - Introduction
00:12 - New Native Build
01:28 - Output Styles Deprecation
01:59 - Allow Unsandboxed Commands
03:21 - Disallowed Tools for Subagents
04:38 - Prompt-Based Stop Hooks
05:00 - Planning Subagent
06:19 - Resuming Subagents
06:58 - Auto Model Choice
07:19 - OpenSkills
07:44 - Next.js AI Evals
09:00 - Remote Labor Index Benchmark

0:00

Introduction

Okay, so I'll be going over a bunch of the Claude Code news from last week, as well as some related things such as OpenSkills, a brand-new eval from Next.js, a brand-new benchmark, and some related Claude Code plugins as well. So firstly, one of the biggest changes we have is that now
0:12

New Native Build

we have a Claude Code native installer that's much more stable and does not require Node.js. So previously, if you remember, when you were installing Claude Code, you had to install Node and then npm, and that was a bit of a hassle. But now you can install Claude Code much more easily by using the brew command or the install scripts instead. And if you already have Claude Code installed, then you can run `claude install` and it will switch over to the native installer. So you can see that if I run `npm list -g`, I have Claude Code installed via npm. But now if I just run `claude install`, it will switch over to the native build version, which should be much more stable according to their announcement. And if I run `npm list -g` again, you can see that Claude Code is no longer in the list, because it has switched over to the native build instead. Some of the improvements here are that the auto-updater is much more stable, users should run into fewer failed updates and bricked installs, and Claude Code is now a self-contained executable with no Node.js dependency. So basically, if you're telling a friend to install Claude Code for the first time, you can just give them any of these commands and that should help them install it quickly. And then after that, you can tell them to watch my brand-new Claude Code masterclass that will also be linked down below.
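For reference, the migration flow described here looks roughly like this. The install-script URL is the one from Anthropic's docs at the time of writing, so double-check the current instructions before running it:

```shell
# Fresh native install on macOS/Linux (per Anthropic's install script):
curl -fsSL https://claude.ai/install.sh | bash

# Already have the npm version? Switch it to the native build in place:
claude install

# Verify the npm copy is gone after migrating:
npm list -g
```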
1:28

Output Styles Deprecation

Okay, now if you run Claude Code, one of the biggest changes you will see is that they're planning to deprecate output styles. So if you do `/output-style`, you will see that it will be removed on November 5th or later, and they'll automatically convert any output styles you have. They basically recommend using plugins instead, which I think is pretty nice because they're consolidating a bunch of features. They also recommend using system prompt files, `--system-prompt`, `--append-system-prompt`, or any of these, basically whichever one gets the job done easily. I do go through all of them in my Claude Code masterclass as well, so do check that out.
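The replacement flags mentioned here are used along these lines. Flag spellings are the ones the deprecation notice lists, so verify against `claude --help` on your version:

```shell
# Append extra instructions to Claude Code's default system prompt:
claude --append-system-prompt "Always respond in a concise, terse style."

# Or replace the system prompt entirely (print mode):
claude -p --system-prompt "You are a code reviewer." "Review this diff"
```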
1:59

Allow Unsandboxed Commands

A useful setting they added to sandbox mode is `allowUnsandboxedCommands`. Basically, a problem some people were running into, for example this person on Reddit, is that they were running sandbox mode, and then all of a sudden Claude Code decided to manually override sandbox mode by running a bash command outside of it. How this works is that whenever Claude Code uses the Bash tool, it can set an additional parameter, `dangerouslyOverrideSandbox`, to `true`, which means the command won't be checked against the sandbox to see whether it's blocked. Now, in some cases that can be useful, but you may not want to risk Claude Code overriding the sandbox. So what you can do to prevent that from happening is go to your sandbox settings and set `allowUnsandboxedCommands` to `false`. That means if Claude Code tries to use the Bash tool and manually override the sandbox, it won't be able to. So that's probably most useful for enterprises who want to make sure Claude Code stays in a well-behaved sandbox.

Hey, so as a short aside, in addition to my paid community, I launched a brand-new free community earlier today. There are some vibe coding techniques I shared in the community that I haven't talked about before on my YouTube channel, and also a bunch of templates to help kickstart your next vibe coding project. You will be able to chat with people from all around the world in the community. So if you are casually interested in vibe coding and want to chat with more people about it online, this community will be a pretty good place to do so. There will be a link down below for those who are interested, so do check it out.
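For reference, the sandbox setting described above would look something like this in your settings file. This is a sketch based on the names mentioned in the video (`enabled` is my assumption); check Anthropic's sandboxing docs for the exact schema:

```json
{
  "sandbox": {
    "enabled": true,
    "allowUnsandboxedCommands": false
  }
}
```

With this in place, a Bash call that sets `dangerouslyOverrideSandbox: true` should simply be rejected instead of escaping the sandbox.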
3:21

Disallowed Tools for Subagents

Something else they added is a `disallowedTools` field in custom agent definitions, for explicit tool blocking. So you can see that in my agents folder, I have a web-fetcher sub-agent with `disallowedTools` set to `WebFetch`, `Bash`, and `WebSearch`. Obviously it's a web fetcher, and if it can't use Bash (hence no curl), can't do a web search, and can't do a web fetch either, then it's not going to be very useful. So now if I try running this sub-agent by calling it explicitly and saying "fetch masterclaude.com", you will see it basically won't be able to fetch that particular website, because it doesn't have any tools available to fetch content from it. Instead, it tries to find it on my local computer. But if I remove the `disallowedTools` line, run the sub-agent again, and say "fetch masterclaude.com", which is my masterclass, you will see that it actually is able to take that prompt and run the fetch; I can press allow, it will fetch the required content, and you can see it gives me a summary of the page. So this can be useful for specific sub-agents that you see inappropriately calling tools they should not be allowed to call: you can just add that field to the file.
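The web-fetcher example described above would look roughly like this as an agent definition file. The layout is the standard markdown-with-frontmatter format for `.claude/agents/`; the `disallowedTools` values mirror the ones named in the video, and the description text is illustrative:

```markdown
---
name: web-fetcher
description: Fetches and summarizes web pages
disallowedTools: WebFetch, Bash, WebSearch
---

Fetch the requested URL and produce a short summary of its content.
```

Deleting the `disallowedTools` line restores the default tool access, which is what makes the second run in the demo succeed.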
4:38

Prompt-Based Stop Hooks

They also added prompt-based stop hooks, and I'm not exactly sure what that is. So I ran a stop hook to see if anything has changed, and it seems like nothing really has, and people on Reddit are wondering the same thing: what is a prompt-based stop hook? So if you do know, then leave a comment down below about what exactly it is, because I'm pretty confused.

Anyways, they made a bunch of bug fixes over here, which we're not going to be
5:00

Planning Subagent

going over. One thing they did do is add a new planning sub-agent. We can see this in action if I make a change to my project HyperWhisper, my AI speech-to-text application: I want to add Speechmatics as one of the AI speech-to-text providers. So if I go to planning mode and basically say, "Hey, can you add Speechmatics to this particular application? Here's a link to their website: https://www.speechmatics.com/", and press enter, you will see it will first call the Explore sub-agent to explore the codebase, understand how it works, and fetch the website as well. And after fetching the information, you can see it's calling this brand-new Plan sub-agent and has passed in this prompt. All the inner workings of that particular sub-agent will be discarded, and only the final result will be kept and passed back to the main session. Anyways, what this means for you practically is that when you're using planning mode, you will not use as much of the context window as you previously did, because the planning now runs inside a sub-agent. You can also trigger the sub-agent manually without planning mode by typing `@` and then `agent`, and you will see it in the list as `agent-Plan`, right underneath `agent-Explore`. These, along with the general-purpose agent, are the three sub-agents built into Claude Code.
6:19

Resuming Subagents

Claude Code now also allows you to resume sub-agents. So you can see that I triggered a sub-agent here, the planning sub-agent, and said "come up with a detailed life plan." It asked me some clarifying questions and gave me a response. That response goes back to the main thread, the main general-purpose agent, and then I can say "continue with that sub-agent, resume it" to explicitly resume it. Claude Code should then be able to find that sub-agent and resume the session it was having with it. And if I review the transcripts for that particular session, I can see that it actually resumed the old sub-agent instead of spawning a brand-new one.
6:58

Auto Model Choice

Claude Code can also now choose the model used by its sub-agents. So I guess that means if you go to your sub-agent, for example, and remove the model from its definition, it will choose the appropriate model for whatever task it has in mind. But I don't think I would really be using this, because I know which models should be used by which sub-agents. And then there were a bunch of very small changes here.
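Concretely, the `model` line in an agent's frontmatter is the thing you would remove to let Claude Code pick. A sketch of a standard agent file, where `code-reviewer` and `sonnet` are just example values:

```markdown
---
name: code-reviewer
description: Reviews diffs for bugs and style issues
# model: sonnet   <- delete or comment out this line to let Claude Code choose
---

Review the provided changes and report any problems.
```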
7:19

OpenSkills

A pretty interesting project that I saw online recently is OpenSkills. If you install it and then run OpenSkills sync, it will bring Anthropic's skills system to all the AI coding agents on your machine: Claude Code, Cursor, Windsurf, and Aider. And basically, I think the way it works is that it injects this into the system prompt of all the above coding agents, so that they're aware of whatever skills are installed and how they work.
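Going by the description above, usage would be along these lines. The package name and subcommand here are my reading of the project, so check the repo (https://github.com/numman-ali/openskills) for the actual invocation:

```shell
# Install the CLI globally (assumed package name):
npm install -g openskills

# Sync installed skills into Claude Code, Cursor, Windsurf, Aider, etc.:
openskills sync
```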
7:44

Next.js AI Evals

Another pretty interesting thing that I saw is the AI model performance evaluations for Next.js specifically. Basically, they compared how different AI models and agents do at Next.js code generation and migration, measuring success rate, execution time, token usage, and quality improvements. You can see that gpt-5-codex seems to perform the best, with the highest success rate overall across the different tasks available in Next.js, and you can see how many tokens each model uses and how long it took on each of these tasks. If you want to see the tasks themselves for these particular evals, they should be available on GitHub here. So it's quite interesting that gpt-5-codex does the best here; claude-opus-4.1 does slightly worse at 40%, glm-4.6 does surprisingly well, and sadly, it seems that claude-sonnet-4.5 only gets 32% correct. So it's interesting that when it comes to Next.js specifically, glm-4.6 does better than claude-sonnet-4.5. You can also see how different agents perform: `codex` actually does worse for some reason, `cursor (composer-1)` does seemingly better, and `cursor (sonnet 4.5)` does pretty well. So anyway, this is probably worth looking through yourself if you're using Next.js a lot.
9:00

Remote Labor Index Benchmark

Anyway, something else that is pretty interesting is that Scale AI came out with a brand-new benchmark called the Remote Labor Index (RLI) that basically evaluates how good different AI agents are at performing real-world, economically valuable remote work. What they did is gather 240 projects from Upwork across a bunch of different domains, I think 23 different Upwork domains out of 64 in total, excluding projects requiring physical labor, long-term evaluation, or direct client interaction. From 550 initial projects, they ended up with 240 projects to be attempted, and the total value of all that work is $143,991. You can see all these different categories, and they evaluated the agents across all of them. The conclusion they came to is that when it comes to absolute automation, like completing a project for a client end-to-end, current agents perform near the floor. The highest-performing agent, Manus, achieved a 2.5% automation rate, in the sense of: if that particular result was delivered to a client, would the client actually accept it? Then you can see how the other models do: claude-sonnet-4-5 got around 2%, gpt-5-2025-08-07 got 1.6%, ChatGPT agent got 1.25%, and gemini-2.5-pro got 0.83%. But they say that whilst absolute performance is low, ELO scores reveal steady and measurable progress: newer frontier models consistently rank higher than older ones, which means they're probably going to keep using this benchmark over time to evaluate how economically valuable these agents are. The most common failure mode for the failed projects was that the quality was too poor, with agents producing child-like or amateur-quality work.
The deliverables were also often incomplete, meaning they did not follow all the instructions for what should be delivered as a final result. There were technical and file issues, where agents produced corrupt, empty, or incorrect file formats, and then various inconsistencies as well.

Anyways, this is pretty interesting. It will be linked down below alongside the actual paper, so I would recommend reading through it if you're interested. I think it speaks to the capabilities of many of these agents right now: an agent today is not going to build your company from scratch, and it would do a pretty bad job if it tried. At least when it comes to coding, a lot of people say they're better at really big auto-completion tasks, so they can autocomplete a lot of the codebase for you, given that you have a good understanding of what should be completed. But I was also watching the recent Andrej Karpathy interview, and he talked about how, for one particular project he was working on, it was really hard to use any of the coding agents, because they really struggle with a codebase that is completely new, something that hasn't been done before and is incredibly rare. They perform better with public frameworks and commonly used, well-understood techniques, which I think makes them better at auto-completion-style tasks.

Anyways, I'm going to be following this benchmark personally because I'm interested in seeing how agents perform on it over time, but it is pretty interesting that Claude Sonnet 4.5 does so well on the benchmarks, and GPT-5 as well. I do wish they measured some of the Chinese models too, but I think they will be improving on this benchmark over time.
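To make the automation-rate numbers concrete, here is a quick sketch. The rates are the approximate figures quoted above, not official leaderboard values:

```python
# Approximate RLI automation rates quoted in the video: the fraction of
# delivered projects a client would plausibly accept as-is.
TOTAL_PROJECTS = 240

automation_rates = {
    "Manus": 0.025,
    "claude-sonnet-4-5": 0.020,
    "gpt-5-2025-08-07": 0.016,
    "ChatGPT agent": 0.0125,
    "gemini-2.5-pro": 0.0083,
}

def accepted_projects(rate: float, total: int = TOTAL_PROJECTS) -> int:
    """Rough count of accepted projects implied by an automation rate."""
    return round(rate * total)

for agent, rate in sorted(automation_rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{agent}: {rate:.2%} of value, roughly {accepted_projects(rate)} of {TOTAL_PROJECTS} projects")
```

Even the best agent's 2.5% corresponds to only about 6 of the 240 projects being accepted, which is what "near the floor" means in practice.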
Anyways, that's basically it for the video. If you do enjoy this kind of stuff, then do subscribe to the channel, because it lets me know that I should be making more of it. And if you want to join my free community, it will be linked down below, and my Claude Code masterclass will be linked down below as well.
