Claude Code's New Default Subagents (This Week's News!)
12:31


Ray Amjad · 03.11.2025 · 5,745 views · 138 likes · updated 18.02.2026
Video description
Level up with my Claude Code Masterclass 👉 https://www.masterclaudecode.com/
Learn the AI I'm learning with my newsletter 👉 https://newsletter.rayamjad.com/
Got any questions? DM me on Instagram 👉 https://www.instagram.com/theramjad/
🎙️ Sign up to the HyperWhisper Windows Waitlist 👉 https://forms.gle/yCuqmEUrfKKnd6sN7
Since I've never accepted a sponsor, my videos are made possible by...

—— MY CLASSES ——
🚀 Claude Code Masterclass: https://www.masterclaudecode.com/?utm_source=youtube&utm_campaign=TOs0DRBcdRs
- Use coupon code YEAR2026 for 35% off

—— MY APPS ——
🎙️ HyperWhisper, write 5x faster with your voice: https://www.hyperwhisper.com/?utm_source=youtube&utm_campaign=TOs0DRBcdRs
- Use coupon code YEAR2026 for 35% off
📲 Tensor AI: Never Miss the AI News
- on iOS: https://apps.apple.com/us/app/ai-news-tensor-ai/id6746403746
- on Android: https://play.google.com/store/apps/details?id=app.tensorai.tensorai
- 100% FREE
📹 VidTempla, Manage YouTube Descriptions at Scale: http://vidtempla.com/?utm_source=youtube&utm_campaign=TOs0DRBcdRs
💬 AgentStack, AI agents for customer support and sales: https://www.agentstack.build/?utm_source=youtube&utm_campaign=TOs0DRBcdRs
- Request private beta by emailing r@rayamjad.com

————— CONNECT WITH ME
🐦 X: https://x.com/@theramjad
👥 LinkedIn: https://www.linkedin.com/in/rayamjad/
📸 Instagram: https://www.instagram.com/theramjad/
🌍 My website/blog: https://www.rayamjad.com/

————— Links:
- https://github.com/anthropics/claude-code/blob/main/CHANGELOG.md
- https://www.reddit.com/r/Anthropic/comments/1oc8uq9/claude_code_overrides_the_sandbox_without/
- https://www.reddit.com/r/ClaudeCode/comments/1ohb48d/claude_code_after_i_caught_it_disabling_the/
- https://x.com/claudeai/status/1984304957353243061?s=12
- https://x.com/dmwlff/status/1984338962698420300?s=12
- https://github.com/numman-ali/openskills
- https://nextjs.org/evals
- https://github.com/vercel/next-evals-oss
- https://scale.com/leaderboard/rli
- https://arxiv.org/pdf/2510.26787
Timestamps:
00:00 - Introduction
00:12 - New Native Build
01:28 - Output Styles Deprecation
01:59 - Allow Unsandboxed Commands
03:21 - Disallowed Tools for Subagents
04:38 - Prompt-Based Stop Hooks
05:00 - Planning Subagent
06:19 - Resuming Subagents
06:58 - Auto Model Choice
07:19 - OpenSkills
07:44 - Next.js AI Evals
09:00 - Remote Labor Index Benchmark

0:00

Introduction

Okay, so I'll be going over a bunch of the Claude Code news from last week, as well as some related things such as OpenSkills, a brand-new eval from Next.js, a brand-new benchmark, and some related Claude Code plugins as well. So firstly, one of the biggest changes we have is that now
0:12

New Native Build

we have a Claude Code native installer that's much more stable and does not require Node.js. So previously, if you remember, when you were installing Claude Code, you had to install Node and then npm, and that was a bit of a hassle. But now you can install Claude Code much more easily by using the brew command or the install scripts instead. And if you already have Claude Code installed, then you can run `claude install` and it will switch over to the native installer. So you can see that if I run `npm list -g`, I have Claude Code installed via npm. But now if I just run `claude install`, it will switch over to the native build version, which should be much more stable according to their announcement. And if I run `npm list -g` again, you can see that Claude Code is no longer in the list, because it has switched over to the native build instead. Some of the improvements here are that the auto-updater is much more stable, users should run into fewer failed updates and bricked installs, and Claude Code is now a self-contained executable with no Node.js dependency. So basically, if you're telling a friend to install Claude Code for the first time, you can just give them any of these commands and that should help them install it quickly. And then after that, you can tell them to watch my brand-new Claude Code masterclass that will also be linked down below.
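For reference, the migration flow described here looks roughly like this. The install-script URL is the one from Anthropic's docs at the time of writing, so double-check the current instructions before running it:

```shell
# Fresh native install on macOS/Linux (per Anthropic's install script):
curl -fsSL https://claude.ai/install.sh | bash

# Already have the npm version? Switch it to the native build in place:
claude install

# Verify the npm copy is gone after migrating:
npm list -g
```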
1:28

Output Styles Deprecation

Okay, now if you run Claude Code, one of the biggest changes you will see is that they're planning to deprecate output styles. So if you do `/output-style`, you will see that it will be removed on November 5th or later, and they'll automatically convert any output styles you have. They basically recommend using plugins instead, which I think is pretty nice because they're consolidating a bunch of features. They also recommend using system prompt files, `--system-prompt`, `--append-system-prompt`, or any of these, basically whichever one gets the job done easily. I do go through all of them in my Claude Code masterclass as well, so do check that out.
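The replacement flags mentioned here are used along these lines. Flag spellings are the ones the deprecation notice lists, so verify against `claude --help` on your version:

```shell
# Append extra instructions to Claude Code's default system prompt:
claude --append-system-prompt "Always respond in a concise, terse style."

# Or replace the system prompt entirely (print mode):
claude -p --system-prompt "You are a code reviewer." "Review this diff"
```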
1:59

Allow Unsandboxed Commands

A useful setting they added to sandbox mode is `allowUnsandboxedCommands`. Basically, a problem some people were running into, for example this person on Reddit, is that they were running sandbox mode, and then all of a sudden Claude Code decided to manually override sandbox mode by running a bash command outside of it. How this works is that whenever Claude Code uses the Bash tool, it can set an additional parameter, `dangerouslyOverrideSandbox`, to `true`, which means the command won't be checked against the sandbox to see whether it's blocked. Now, in some cases that can be useful, but you may not want to risk Claude Code overriding the sandbox. So what you can do to prevent that from happening is go to your sandbox settings and set `allowUnsandboxedCommands` to `false`. That means if Claude Code tries to use the Bash tool and manually override the sandbox, it won't be able to. So that's probably most useful for enterprises who want to make sure Claude Code stays in a well-behaved sandbox.

Hey, so as a short aside, in addition to my paid community, I launched a brand-new free community earlier today. There are some vibe coding techniques I shared in the community that I haven't talked about before on my YouTube channel, and also a bunch of templates to help kickstart your next vibe coding project. You will be able to chat with people from all around the world in the community. So if you are casually interested in vibe coding and want to chat with more people about it online, this community will be a pretty good place to do so. There will be a link down below for those who are interested, so do check it out.
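For reference, the sandbox setting described above would look something like this in your settings file. This is a sketch based on the names mentioned in the video (`enabled` is my assumption); check Anthropic's sandboxing docs for the exact schema:

```json
{
  "sandbox": {
    "enabled": true,
    "allowUnsandboxedCommands": false
  }
}
```

With this in place, a Bash call that sets `dangerouslyOverrideSandbox: true` should simply be rejected instead of escaping the sandbox.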
3:21

Disallowed Tools for Subagents

Something else they added is a `disallowedTools` field in custom agent definitions, for explicit tool blocking. So you can see that in my agents folder, I have a web-fetcher sub-agent with `disallowedTools` set to `WebFetch`, `Bash`, and `WebSearch`. Obviously it's a web fetcher, and if it can't use Bash (hence no curl), can't do a web search, and can't do a web fetch either, then it's not going to be very useful. So now if I try running this sub-agent by calling it explicitly and saying "fetch masterclaude.com", you will see it basically won't be able to fetch that particular website, because it doesn't have any tools available to fetch content from it. Instead, it tries to find it on my local computer. But if I remove the `disallowedTools` line, run the sub-agent again, and say "fetch masterclaude.com", which is my masterclass, you will see that it actually is able to take that prompt and run the fetch; I can press allow, it will fetch the required content, and you can see it gives me a summary of the page. So this can be useful for specific sub-agents that you see inappropriately calling tools they should not be allowed to call: you can just add that field to the file.
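The web-fetcher example described above would look roughly like this as an agent definition file. The layout is the standard markdown-with-frontmatter format for `.claude/agents/`; the `disallowedTools` values mirror the ones named in the video, and the description text is illustrative:

```markdown
---
name: web-fetcher
description: Fetches and summarizes web pages
disallowedTools: WebFetch, Bash, WebSearch
---

Fetch the requested URL and produce a short summary of its content.
```

Deleting the `disallowedTools` line restores the default tool access, which is what makes the second run in the demo succeed.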
4:38

Prompt-Based Stop Hooks

They also added prompt-based stop hooks, and I'm not exactly sure what that is. So I ran a stop hook to see if anything has changed, and it seems like nothing really has, and people on Reddit are wondering the same thing: what is a prompt-based stop hook? So if you do know, then leave a comment down below about what exactly it is, because I'm pretty confused.

Anyways, they made a bunch of bug fixes over here, which we're not going to be
5:00

Planning Subagent

going over. One thing they did do is add a new planning sub-agent. We can see this in action if I make a change to my project HyperWhisper, my AI speech-to-text application: I want to add Speechmatics as one of the AI speech-to-text providers. So if I go to planning mode and basically say, "Hey, can you add Speechmatics to this particular application? Here's a link to their website: https://www.speechmatics.com/", and press enter, you will see it will first call the Explore sub-agent to explore the codebase, understand how it works, and fetch the website as well. And after fetching the information, you can see it's calling this brand-new Plan sub-agent and has passed in this prompt. All the inner workings of that particular sub-agent will be discarded, and only the final result will be kept and passed back to the main session. Anyways, what this means for you practically is that when you're using planning mode, you will not use as much of the context window as you previously did, because the planning now runs inside a sub-agent. You can also trigger the sub-agent manually without planning mode by typing `@` and then `agent`, and you will see it in the list as `agent-Plan`, right underneath `agent-Explore`. These, along with the general-purpose agent, are the three sub-agents built into Claude Code.
6:19

Resuming Subagents

Claude Code now also allows you to resume sub-agents. So you can see that I triggered a sub-agent here, the planning sub-agent, and said "come up with a detailed life plan." It asked me some clarifying questions and gave me a response. That response goes back to the main thread, the main general-purpose agent, and then I can say "continue with that sub-agent, resume it" to explicitly resume it. Claude Code should then be able to find that sub-agent and resume the session it was having with it. And if I review the transcripts for that particular session, I can see that it actually resumed the old sub-agent instead of spawning a brand-new one.
6:58

Auto Model Choice

Claude Code can also now choose the model used by its sub-agents. So I guess that means if you go to your sub-agent, for example, and remove the model from its definition, it will choose the appropriate model for whatever task it has in mind. But I don't think I would really be using this, because I know which models should be used by which sub-agents. And then there were a bunch of very small changes here.
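Concretely, the `model` line in an agent's frontmatter is the thing you would remove to let Claude Code pick. A sketch of a standard agent file, where `code-reviewer` and `sonnet` are just example values:

```markdown
---
name: code-reviewer
description: Reviews diffs for bugs and style issues
# model: sonnet   <- delete or comment out this line to let Claude Code choose
---

Review the provided changes and report any problems.
```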
7:19

OpenSkills

A pretty interesting project that I saw online recently is OpenSkills. If you install it and then run OpenSkills sync, it will bring Anthropic's skills system to all the AI coding agents on your machine: Claude Code, Cursor, Windsurf, and Aider. And basically, I think the way it works is that it injects this into the system prompt of all the above coding agents, so that they're aware of whatever skills are installed and how they work.
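Going by the description above, usage would be along these lines. The package name and subcommand here are my reading of the project, so check the repo (https://github.com/numman-ali/openskills) for the actual invocation:

```shell
# Install the CLI globally (assumed package name):
npm install -g openskills

# Sync installed skills into Claude Code, Cursor, Windsurf, Aider, etc.:
openskills sync
```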
7:44

Next.js AI Evals

Another pretty interesting thing that I saw is the AI model performance evaluations for Next.js specifically. Basically, they compared how different AI models and agents do at Next.js code generation and migration, measuring success rate, execution time, token usage, and quality improvements. You can see that gpt-5-codex seems to perform the best, with the highest success rate overall across the different tasks available in Next.js, and you can see how many tokens each model uses and how long it took on each of these tasks. If you want to see the tasks themselves for these particular evals, they should be available on GitHub here. So it's quite interesting that gpt-5-codex does the best here; claude-opus-4.1 does slightly worse at 40%, glm-4.6 does surprisingly well, and sadly, it seems that claude-sonnet-4.5 only gets 32% correct. So it's interesting that when it comes to Next.js specifically, glm-4.6 does better than claude-sonnet-4.5. You can also see how different agents perform: `codex` actually does worse for some reason, `cursor (composer-1)` does seemingly better, and `cursor (sonnet 4.5)` does pretty well. So anyway, this is probably worth looking through yourself if you're using Next.js a lot.
9:00

Remote Labor Index Benchmark

Anyway, something else that is pretty interesting is that Scale AI came out with a brand-new benchmark called the Remote Labor Index (RLI) that basically evaluates how good different AI agents are at performing real-world, economically valuable remote work. What they did is gather 240 projects from Upwork across a bunch of different domains, I think 23 different Upwork domains out of 64 in total, excluding projects requiring physical labor, long-term evaluation, or direct client interaction. From 550 initial projects, they ended up with 240 projects to be attempted, and the total value of all that work is $143,991. You can see all these different categories, and they evaluated the agents across all of them. The conclusion they came to is that when it comes to absolute automation, like completing a project for a client end-to-end, current agents perform near the floor. The highest-performing agent, Manus, achieved a 2.5% automation rate, in the sense of: if that particular result was delivered to a client, would the client actually accept it? Then you can see how the other models do: claude-sonnet-4-5 got around 2%, gpt-5-2025-08-07 got 1.6%, ChatGPT agent got 1.25%, and gemini-2.5-pro got 0.83%. But they say that whilst absolute performance is low, ELO scores reveal steady and measurable progress: newer frontier models consistently rank higher than older ones, which means they're probably going to keep using this benchmark over time to evaluate how economically valuable these agents are. The most common failure mode for the failed projects was that the quality was too poor, with agents producing child-like or amateur-quality work.
The deliverables were also often incomplete, meaning they did not follow all the instructions for what should be delivered as a final result. There were technical and file issues, where agents produced corrupt, empty, or incorrect file formats, and then various inconsistencies as well.

Anyways, this is pretty interesting. It will be linked down below alongside the actual paper, so I would recommend reading through it if you're interested. I think it speaks to the capabilities of many of these agents right now: an agent today is not going to build your company from scratch, and it would do a pretty bad job if it tried. At least when it comes to coding, a lot of people say they're better at really big auto-completion tasks, so they can autocomplete a lot of the codebase for you, given that you have a good understanding of what should be completed. But I was also watching the recent Andrej Karpathy interview, and he talked about how, for one particular project he was working on, it was really hard to use any of the coding agents, because they really struggle with a codebase that is completely new, something that hasn't been done before and is incredibly rare. They perform better with public frameworks and commonly used, well-understood techniques, which I think makes them better at auto-completion-style tasks.

Anyways, I'm going to be following this benchmark personally because I'm interested in seeing how agents perform on it over time, but it is pretty interesting that Claude Sonnet 4.5 does so well on the benchmarks, and GPT-5 as well. I do wish they measured some of the Chinese models too, but I think they will be improving on this benchmark over time.
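To make the automation-rate numbers concrete, here is a quick sketch. The rates are the approximate figures quoted above, not official leaderboard values:

```python
# Approximate RLI automation rates quoted in the video: the fraction of
# delivered projects a client would plausibly accept as-is.
TOTAL_PROJECTS = 240

automation_rates = {
    "Manus": 0.025,
    "claude-sonnet-4-5": 0.020,
    "gpt-5-2025-08-07": 0.016,
    "ChatGPT agent": 0.0125,
    "gemini-2.5-pro": 0.0083,
}

def accepted_projects(rate: float, total: int = TOTAL_PROJECTS) -> int:
    """Rough count of accepted projects implied by an automation rate."""
    return round(rate * total)

for agent, rate in sorted(automation_rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{agent}: {rate:.2%} of value, roughly {accepted_projects(rate)} of {TOTAL_PROJECTS} projects")
```

Even the best agent's 2.5% corresponds to only about 6 of the 240 projects being accepted, which is what "near the floor" means in practice.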
Anyways, that's basically it for the video. If you do enjoy this kind of stuff, then do subscribe to the channel, because it lets me know that I should be making more of it. And if you want to join my free community, it will be linked down below, and my Claude Code masterclass will be linked down below as well.
