Claude Code Skills Just Got Even Better

Nate Herk | AI Automation · 05.03.2026 · 45,334 views · 1,406 likes

Video description

Full courses + unlimited support: https://www.skool.com/ai-automation-society-plus/about
All my FREE resources: https://www.skool.com/ai-automation-society/about
Apply for my YT podcast: https://podcast.nateherk.com/apply
Work with me: https://uppitai.com/

My Tools💻
14 day FREE n8n trial: https://n8n.partnerlinks.io/22crlu8afq5r
Code NATEHERK to Self-Host n8n for 10% off (annual plan): http://hostinger.com/nateherk
Voice to text: https://ref.wisprflow.ai/nateherk

Claude Code just dropped a major update to how skills work. The new Skill Creator helps you build better skills from scratch, run evals to test how they perform, optimize existing skills for better accuracy, and trigger them more reliably. In this video, I break down everything that changed and then do a full live build of a brand new skill so you can see exactly how it works. Whether you're just getting started with Claude Code or you've already been building skills, this update makes the whole workflow significantly better.

Sponsorship Inquiries: 📧 sponsorships@nateherk.com

Timestamps
00:00 What is a Skill?
00:52 Two types of Skills
03:27 Anthropic's Skill Creator
04:23 Evals
06:15 Trigger Tuning
07:47 Installing this Skill
08:44 Building new Skill
12:24 Final Result
14:18 Final Output

Study guide for this video

Structured summary

Claude Code Skills: How to Build, Test, and Optimize AI Agent Skills

A comprehensive guide to Claude Code skills — what they are, the two types (capability uplift vs encoded preference), and how to use Anthropic's new Skill Creator to build, evaluate, benchmark, and trigger-tune skills for AI agents.

Contents (9 segments)

What is a Skill?

Claude skills just got 10 times easier to build and stronger to use. So, in today's video, I'm going to explain exactly why that is, and then I'm going to live build a completely new skill right here in front of you guys. So, real quick, what is a skill? It's basically just a recipe, so that when you ask your agent to make you, for example, a LinkedIn post, it will read the recipe and it will get it right every single time. And when I say recipes, I literally just mean text. It's just text instructions. It's like a prompt. So if I go to Customize and I go to Skills and I click on, let's say for example, the internal comms skill, this says it's a set of resources to help me write all kinds of internal communication using the format that my company likes to use. And you can see this is the skill itself. It is literally just text that you could read, that an intern could read. Anybody could read and understand what's going on in the skill. And if you're using them in Claude Code, you can see I've got a ton of skills here. So for example, let's look at my idea mining skill. This is the markdown file that explains to the agent what this skill actually does. And once again

Two types of Skills

it's all just text. So what did Anthropic actually do that made all these skills better? They updated their skill creator skill, which is literally a skill that teaches Claude how to build, test, measure, refine, just make all the skills better and better. So let's actually cover why that matters and what happened. So the first thing I need you to understand is that there are two different types of skills. We have a capability uplift skill, which basically is a prompt. So it teaches Claude how to do something better, for example, design websites with the front-end design skill or create documents or run Excel formulas. Things that maybe the default model by itself doesn't know super well, but with a prompt, it does a much better job. And then we also have encoded preference skills, which means that Claude already understands each of these pieces, but it needs to follow them in a specific order. So these are way more like actual workflows, like actual step-by-step automations. So, quick example. If you ask Claude without a front-end design skill to build you a website, it could do it, but it might just look very generic. It might look like AI slop, as they call it. But if you give it the exact same prompt, but this time you also let it use the front-end design skill, it's going to look much better because that skill tells it stuff like good fonts, good color schemes, you know, good background elements, good layouts. And that is a classic capability uplift skill. Now, here's an example of an encoded preference skill, which is the one we just saw in my Claude Code, which I call idea mining. And this skill is a little bit more sequential and there are different steps involved. So, first it will look at my YouTube comments. It will look at, you know, some videos in my niche. It will also look at AI trends on X and the web. It will then spin up two different agents: a YouTube agent that analyzes this stuff and a research agent. And these run in parallel. 
And then they both send their output back to the main agent, which will score and cross-reference. And then the main agent turns all that information into some video ideas for me, which is why I call it idea mining. So, what I could do is I could say, "Hey, Mr. AI agent, go look at my comments, go look at YouTube, go look at X, you know, analyze that and help me find some video ideas," and every time it would give me different answers or sort of do it differently. Or I can just say, "Hey, do some idea mining," and it will just call the skill, and every time I get an output that I like. And the reason why this is actually important to understand is because capability uplift skills might fade over time. For example, with the front-end design skill, right now we're on Opus 4.6, right? What if Opus 5 drops and default Opus 5 is better at front-end design than Opus 5 with a front-end skill? So, at that point, you might just need to retire that skill completely. But with an encoded preference skill, these will probably stay pretty durable and accurate because the process is usually very specific to you, which Opus 5 most likely won't be trained on. Okay, so those are the two

Anthropic's Skill Creator

kind of different types of skills. Now, we can actually evaluate them. So, this new skill creator skill, which is an official Anthropic skill, is the one we're talking about. It's in the repo right here. And if I open up the actual SKILL.md, you can see this is what it does. It creates new skills. It can modify and improve existing skills. It can measure skill performance. So use this when you want to create a skill from scratch, if you want to update or optimize one, run evals to test a skill, if you want to do benchmarks, or if you want to optimize a skill's description for better trigger accuracy. So I'm going to talk about what each of these little elements means, but I just wanted to show you that this is the actual skill creator skill. It's basically just all of Anthropic's best practices on how to build better skills. They've done things before like dropping a 33-page PDF which walks you through fundamentals, planning and design, testing and iteration, distribution and sharing, all this kind of stuff, patterns and troubleshooting. This is pretty thorough. So you could either take time and learn this, or you could just give your agent the skill creator skill and all that information is

Evals

already in there. So what the evals do is let your agent actually evaluate the quality of your skill and then make improvements. So, let's say you have a skill for creating job descriptions. What you could do is give your agent tons of examples of really good job descriptions that you want. And then it will look at your skill, it will test out some prompts, it will compare the outputs to those examples, and it will be able to optimize your skill for you. As we've talked about in the past, the more you use a skill, the better and better it gets, because you're able to give feedback on what you like and what you don't. So, this basically shortcuts that process. Here's a quick example that Anthropic actually ran with this eval. The skill for filling out some PDF stuff was having trouble finding the right spot to put the text. But then after they ran the evaluation on the skill and it was able to improve, now you can see all the text is accurately being placed, whether that is a checkbox or some sort of fill-in field. So there are two reasons that we need to use evals, and they sound kind of similar, but they're basically the opposite. So the first one is to catch regressions. So this means, let's say we have a job description skill. As a model evolves, it might actually use the skill worse because it's trained a little differently and it, you know, thinks a little differently. So this would basically be an early signal that you need to evolve your skill. And then the second one is to spot outgrowth. So once again, as models improve or evolve, the model might be able to just do a better job without a skill at all. And that's when you would be able to run the evaluation and say, "Okay, wow, without a skill, it's actually better. And I'm just going to go ahead and delete this or maybe just archive it." And then we can also run benchmarks.
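The two uses of evals described above, catching regressions and spotting outgrowth, can be sketched as a simple comparison of pass rates. This is a minimal illustration and not how the skill creator itself works: the `EvalRun` structure, the `classify` helper, and the 5% tolerance are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class EvalRun:
    """Pass/fail totals for one run over a set of eval prompts (hypothetical)."""
    label: str
    passed: int
    total: int

    @property
    def pass_rate(self) -> float:
        return self.passed / self.total if self.total else 0.0


def classify(with_skill: EvalRun, without_skill: EvalRun,
             previous_with_skill: EvalRun, tolerance: float = 0.05) -> str:
    """Return a rough verdict for a skill after a model update.

    'regression': the skill scores noticeably worse than it used to.
    'outgrown':   the bare model matches or beats the skill.
    'keep':       the skill still adds uplift.
    """
    if with_skill.pass_rate < previous_with_skill.pass_rate - tolerance:
        return "regression"
    if without_skill.pass_rate >= with_skill.pass_rate:
        return "outgrown"
    return "keep"


# Made-up numbers for a job description skill, for illustration only.
verdict = classify(
    with_skill=EvalRun("new model + skill", passed=9, total=10),
    without_skill=EvalRun("new model, no skill", passed=5, total=10),
    previous_with_skill=EvalRun("old model + skill", passed=9, total=10),
)
print(verdict)
```

A "regression" verdict is the early signal to evolve the skill, and an "outgrown" verdict is the cue to archive it.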
So when a model updates or when you make an iteration and you change your skill, just run all the evals and run a benchmark, which will give you stuff like a pass rate, a time, and also how many tokens are being used. So here's an example where they said, "Benchmark the PDF skill with and without the skill loaded and show me side-by-side results so I can see the uplift." And we get all this information about these different evaluation metrics. We get the pass rate. We get the total time and the total tokens. So here you can clearly see that with the skill you're getting much better results. And then the final

Trigger Tuning

piece is skill trigger tuning. So once you've got a project filled up with, let's just say, 10 or more skills, you might notice sometimes that you get false triggers or you get misfires, meaning you wanted it to use a skill and it used the wrong one or just didn't use any at all. Luckily, you could also use them with slash commands, but it's so much more convenient to just be able to speak in natural language and make sure that your agent understands you. So using the trigger tuning, the skill creator will basically analyze your skill. It will test out different prompts that you might use to trigger that skill, and then it will edit the description so that the skill gets called more accurately. And this is an actual evaluation that they ran. You can see on the left-hand side and on the right-hand side we have the test score and the train score. And the green and blue are basically the results after it has been analyzed and fixed with the trigger tuning. So you can see it's still not perfect, but it's so much better than where we were without this new skill. What I think is really cool, and how I want to end off this section before we get into a live demo, is where this is going. And at the bottom, we have a quote from Anthropic themselves that says, "Over time, a natural language description of what the skill should do may be enough, with the model figuring out the rest." And I really think that this word "may" should actually have been "will." And basically what this means is that today, when we're telling our agent to build skills for us or maybe just giving it an SOP, we're giving it steps, rules, and format. But what's going to happen in the future is we're going to be able to just tell it in way more high-level natural language what we want, and it's going to be able to figure out all of that and get there with a spec and basically just cut down the time that it takes for us to get a really good skill or, you know, a really good automation.
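The trigger tuning loop described above (try prompts, score how often the skill fires correctly, keep the better description, then check it on held-out prompts) can be sketched in a few lines. This is a toy illustration under loud assumptions: `triggers` is a word-overlap stand-in for the model's real routing decision, and the prompts, candidate descriptions, and train/test split are invented for the example.

```python
def triggers(description: str, prompt: str) -> bool:
    """Toy stand-in for the model's routing decision: fire when the
    prompt shares at least two words with the skill description."""
    desc_words = set(description.lower().split())
    prompt_words = set(prompt.lower().split())
    return len(desc_words & prompt_words) >= 2


def accuracy(description: str, cases: list) -> float:
    """Fraction of (prompt, should_trigger) cases routed correctly."""
    correct = sum(triggers(description, p) == expected for p, expected in cases)
    return correct / len(cases)


def tune(candidates: list, train_cases: list) -> str:
    """Pick the candidate description with the best train accuracy."""
    return max(candidates, key=lambda d: accuracy(d, train_cases))


# Hypothetical prompts: True means the idea mining skill should fire.
train = [
    ("find me some video ideas", True),
    ("do some idea mining", True),
    ("write a linkedin post", False),
]
test = [
    ("mine ideas for my next video", True),
    ("fill out this pdf form", False),
]
candidates = [
    "Generates content",
    "Use for idea mining and finding new video ideas from comments and trends",
]

best = tune(candidates, train)
print("chosen description:", best)
print("held-out accuracy:", accuracy(best, test))
```

Scoring the chosen description on prompts it was not tuned against mirrors the train and test scores shown in the evaluation: a description that only does well on the train prompts has been overfit to them.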

Installing this Skill

All right, so I am in my Herk 2 project, which is kind of just like my personal assistant in Claude Code, and I'm going to show you guys how we can actually get the skill installed. So whether you're in VS Code, which is where I am, or in the terminal or desktop app, whatever, you just need to do /plugins, you can click on manage plugins, and then if you just go in here, you can see all of the Anthropic official ones, and you can just go ahead and search for skill-creator. And right here, you can see the official one. Here's the GitHub. And all you have to do is go ahead and click install. You can install this for just you, you can install it for your project, or locally. And I'm just going to install it for the whole project. So now you can see that's installed, and I'm just going to go ahead and restart Claude Code so that actually takes effect. And just keep in mind, if you're in Claude Code, it may not show up right here in your actual Claude skills if you installed it, you know, in your project. So you can just verify it and say, "Do you have the skill creator skill? What does it do?" And you can see right here that we do in fact have that. So I'm

Building new Skill

going to go ahead and switch on to plan mode and I'm going to see if it can build us a new skill. "I need you to create a skill called YouTube weekly roundup where at the end of every week, you will look at the videos that I made that week. You'll analyze the comments, you'll analyze the views, engagement, things like that. And you'll give me a PDF report on all of the insights, strengths, weaknesses, threats, opportunities." So, that's all I'm going to send off. And I kept this pretty vague intentionally to see what it's going to come back with and how it's going to be able to plan this out for us. And this is where the future's going. And this is what Anthropic is talking about. Because most people that are using skills right now are actually just, like, executives and managers and operators. They're not engineers, which means we're really good at being able to explain what we want, the metrics we need to hit, and why we need that, but maybe not all of those technical nitty-gritty details. All right, so it came back and asked me some questions. The first thing I said is I want it to just be the last 7 days. So, it's a rolling 7-day window. It asked about the report sections that it came up with, and I said those look good. And for the PDF style, I told it to use the brand assets in my folder. So right over here I've got my brand guidelines, and then this one is the actual logo for AIS. So I'm telling it to use those, and hopefully it can throw all that on there and make it feel really branded. So it's going to keep going now with this plan. All right. So at this point it came back with a plan. And keep in mind I still haven't told it anything about tech stack or anything else. It's writing out everything that it's going to do. And normally I would read through this and give it some tweaks potentially, but I just want to see what this skill creator is able to do with a one-shot prompt. And I'm just going to go ahead and accept. And look at this. 
In its to-do list we can see that it creates all these things. But then the last step is to run the test and iterate with the skill creator eval process. So I'm excited to see what it does there. So you can see that it created everything, and then what it did is it decided to test it to do a final iteration. Okay. So I was a little confused. I said, "Do you have an actual PDF file for me?" And it said, "Yes, it is in your projects folder." I was looking in the templates, where it created an HTML template, but apparently it actually rendered that as a PDF. So let me go to projects. We'll go to YouTube weekly roundup. And right here we have an actual PDF, which doesn't look great in this view. Obviously, this is not how a PDF should render, but if I actually open it up from my files, it is a PDF. So, here we have the logo, we have weekly roundup, we have three videos published, and then we've got some stats on views, likes, and comments. I'm going to keep going down. We have our executive summary. So, for this it actually ran, I think, two weeks' worth of data just to test this out. And I will say, just by glancing at this, I don't think that this data is correct. So, keep that in mind. Here we can see the per-video breakdown. Right now, we have nothing available in our SWOT analysis. And then we have competitor context, and there's nothing available here. So now it's time to give it some feedback and see what it can do. I'm first of all going to clear out this context because it used up 62%. I'm going to go back into plan mode and just give it some honest feedback. "All right, so the report looks great. Like aesthetically, you did a good job on the design. However, the data is all wrong. There were a lot of missing elements. I need you to really look at how you're actually scraping this data from my YouTube channel, how you're actually searching through the comments and competitor videos, and make sure that there's actually data going into this report." 
And before I send this off, it's interesting because you can see here it sent us some JSON data, which is actually the raw information that it was able to find from my YouTube channel. And the thing is, this isn't super in-depth. So, I just don't think that it did a good enough job on the research element. And maybe this is exactly what we were talking about earlier over here, where at some point the AI is going to be able to understand that we want all of this granular data, but maybe right now it's our job to just explain that really clearly. "I want to see comments analysis, what's working for other people in the space. I want to see, you know, other trending videos in AI. And I want you to use all of that and use your brain to figure out what are the strengths my channel has, the weaknesses, and the opportunities and the threats. And then all of this information should be a pretty in-depth research report for me on, you know, my YouTube weekly roundup." So, while this is running, I thought that we should

Final Result

real quick look at what it actually did. So, in my .claude, we've got my skills folder. If I go all the way down, we've got the YouTube weekly roundup. And this is the MD file. So, we've got the YAML up top with the name, description, disable-model-invocation: false, which just basically means that Claude Code can call this based on a request. It doesn't have to be explicitly a slash command. And then an argument hint. So basically when Claude Code decides to use this skill, it will send in maybe a hint so that the skill understands, like, what video we're looking at or, you know, the topic. It's giving some context. It's giving some channel benchmarks, some optional focus, and then step-by-step instructions on what to actually do here. Now you can see what it's doing is it's calling on a script called fetch YouTube data, which if I was to look for that in here, I could probably go down to my scripts. I could see YouTube weekly roundup. And right here we've got some different things. We've got the prepare data. We've got the render report. And we also have a script that I already had in this project that it was able to find and use, so it didn't have to create a new one. And this one is called fetch YouTube data. So the SKILL.md file here basically points to everything that the agent needs in order to do this accurately. Okay. So it's come back with another detailed plan. And I'm going to go ahead and fire this off. And I love this. Once again, we've got all these to-dos. And then at the end, it says to audit with the skill creator. And this is such a good example of why using a project more and using a skill more makes it stronger, because some of the pieces that I already had in this project it's able to reuse, like my YouTube analyzer agent, like my YouTube data script, and of course it has all the context about my business and my YouTube channel in here already. So all of those changes have been made, and now all that's left to do is actually run the skill. 
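The layout of the skill file walked through above, a YAML header followed by plain markdown instructions, can be sketched like this. The field values and the body text are invented for illustration, and the hyphenated spellings `disable-model-invocation` and `argument-hint` are an assumption based on how the fields are read out in the video; `split_frontmatter` is a hypothetical helper, not part of Claude Code.

```python
# Illustrative SKILL.md content; every value here is made up for the example.
SKILL_MD = """\
---
name: youtube-weekly-roundup
description: Builds a weekly PDF report on channel performance for the last 7 days.
disable-model-invocation: false
argument-hint: "[optional focus]"
---

# YouTube Weekly Roundup

1. Run the fetch script to refresh channel data.
2. Prepare the data and render the report.
"""


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md string into (frontmatter fields, markdown body).

    Handles only flat `key: value` lines, which is enough for this sketch;
    a real YAML parser would be needed for nested or multi-line values.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    end = lines.index("---", 1)  # position of the closing delimiter
    fields = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip().strip('"')
    body = "\n".join(lines[end + 1:]).strip()
    return fields, body


fields, body = split_frontmatter(SKILL_MD)
print(fields["name"], "| model may invoke:", fields["disable-model-invocation"] == "false")
```

The point of the split is that the header is the short part used to decide when the skill applies, while the body holds the step-by-step instructions and the pointers to scripts.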
So I just called the skill. You can see that it's reading it right here. And now what it's doing is it's going to refresh channel data. It's going to use three agents in parallel, prepare the report, populate the data, and then render the PDF and show it to me. So, I'll check in with you guys when we get that output. All right, that finished up. We've got some quick hits. Top competitor move, biggest opportunity. Apparently, Jack Roberts is my biggest threat. If you see this, Jack, keep

Final Output

crushing it. Okay, so here is the report. These stats right off the jump look a little bit more accurate. I might want to tell it to make this logo a bit bigger, but it did what we asked. Seven videos published, and like I said, these stats look more accurate. We've got the executive summary here with some key takeaways: double down on dollar-outcome titles, make a dedicated Antigravity tutorial, fill the ChatGPT to Claude migration, watch Jack Roberts closely, and then address VS Code versus Antigravity confusion. We've got the per-video breakdown. So now you can see the actual metrics from all the videos, including the one that I literally just dropped like an hour ago. And so all of these look like they're doing okay. This one might not be doing the best, and similar with this one. But I really like the way that this actually looks. The layout's pretty good. It is very clean and professional. For the SWOT analysis, we actually have it on the second page. It still looks good. That's obviously just an easy spacing issue that we can fix. So, we have some strengths here. We have some weaknesses. We have threats. And we have our opportunities. Top comments and audience signals. "Selling shovels in a gold rush, bro. Well played, man." 26 likes. "Hi, Nate. 10 days into joining your plus community, I got my first potential client. It's all thanks to you and your community." Awesome. And you can see that we see other comments. We see what video they came from and how many likes. And we're also getting video requests, we're getting pain points. And so that really helps me stay in tune with what you guys are saying. Wow, it just keeps on going. We've got competitor context. So all of these channels, all of these videos, all of these stats, and that comes along with some notable gaps. And then finally, we get what's trending in AI this week. 
So "What are skills," "The most powerful AI agent I've ever used," all of this stuff with the channel, the views, the views per day, and the topic. So this is amazing. And I was able to build this in 20 minutes. And now what I would do is just keep running it, and every time say, "Hey, I liked this. I didn't like this," and use the skill creator to make this better and better. So anyways, appreciate you guys making it to the end of the video. If you enjoyed, please leave a like. And now that you understand this concept of skills and how to make them really good, what you need to do next is build your own executive assistant that you can start to build tons and tons of skills into. So if you want to see how you can do that, then check out this video right up here. I'll see you guys over there.
