that mode, but on the other test it's scoring 84.8% for Claude 3.7 versus Grok 3 beta, so Claude is winning in that respect. Agentic coding: you can't see any benchmarks for extended thinking, but you can see no extended thinking, and it's storming ahead, right
which is important, is permission management. When you're running Claude Code, there are all sorts of different permission prompts flying by. Out of the box, what happens when you start our tool is, for read actions, if Claude is searching or reading, we just let it go. But once it starts writing or running bash...permission management and being smart about it can help you work faster. So there's something called auto-accept mode, where if you're working with Claude Code and you press Shift+Tab, Claude will just start working. There are things you can do, like you can configure Claude in the settings where specific commands, like on bash, like
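The permission behavior described in this snippet (reads pass through, writes and bash ask first, with an allowlist for specific commands) can be sketched roughly as below. This is a minimal illustration, not Claude Code's actual implementation: the function name `decide`, the action labels, the prefix-matching allowlist, and the choice that auto-accept covers file writes only are all assumptions made for the example.

```python
from typing import Iterable

# Hypothetical action labels; the real tool's internal names are not given in the transcript.
READ_ACTIONS = {"read", "search"}


def decide(action: str,
           command: str = "",
           allowlist: Iterable[str] = (),
           auto_accept: bool = False) -> str:
    """Return "allow" or "ask" for a requested action.

    Assumptions (not from the source): allowlist entries are matched as
    command prefixes, and auto-accept applies to file writes only.
    """
    # Searching and reading never prompt.
    if action in READ_ACTIONS:
        return "allow"
    # In this sketch, auto-accept mode lets file writes through without a prompt.
    if action == "write" and auto_accept:
        return "allow"
    # Bash commands pass only if they start with a pre-approved prefix.
    if action == "bash" and any(command.startswith(p) for p in allowlist):
        return "allow"
    # Everything else requires the user to confirm.
    return "ask"
```

For instance, `decide("bash", "npm test", allowlist=("npm test",))` is allowed in this sketch, while `decide("bash", "rm -rf build")` falls through to a prompt.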
Here's what makes Claude's approach different, though. Other AI agents tend to be narrow. They do one thing well. Schedule emails, analyze data, write code. But based on these leaks, Claude agent mode looks like a general purpose task machine. Research, analyze, write, build, do more. Those sections suggest it can handle almost any workflow. That...disappeared? You tell your agent what you need, you go grab coffee, you come back, and it's done. That's not science fiction anymore. That's what leaked code is showing us right now. Now, I need to hit you with the reality check again. This is rumored, not confirmed. The UI screenshots could be from internal testing that
specific app, and I think a lot of people in the industry do as well, and this can be multiple apps by now: VS Code, Windsurf, Cursor, etc. So I like to use Cursor currently, and this is a separate app you can get for, for example, your MacBook, and it works with the files on your...calling the API of Anthropic and asking Claude to do all of this stuff, but I don't have to manually go to Claude and copy-paste chunks of code around. This program does that for me and has all of the context of the files in the directory and all this kind of stuff, so the that
first thing that we're going to test is how does it perform versus Claude 3.7, and we'll use this prompt, which is: create an AI-powered audit tool for goldie agency that analyzes a business's operations and suggests automation opportunities in HTML; users must enter their details, etc., right, in HTML format. So now what...Sonnet gets straight into coding it, whereas Llama 4 has a little think about how it's going to work first and then starts coding this out. Does seem like Claude 3.7 Sonnet is a lot faster. But we'll test these out and see how they perform in a second. This is coding the CSS separately. All right
existing habit screen, squad screen, and it's created this challenges feature. What I'm doing here is I'm not really trying to write the code for the feature. I'm using Claude basically as a prototyping tool. Now back in the day, before we had AI tools like Claude and Lovable and v0 and things like that, um in order
some really crazy news out of Anthropic today. Claude was already probably the most powerful AI model; now they just took it up another notch. Claude can now write and run code right in the chat. This is really crazy, because now you can have zero programming knowledge and have Claude build and run programs right in the software
process 750,000 words. That's about 10 novels worth of text in a single conversation. This is a big deal for businesses and developers. Claude can now understand entire project code bases, analyze thousands of documents, and process complex files all at once. It's a giant leap in what AI can handle, making Claude a serious tool
into this. So, for example, this whole page right here was actually automated using Claude Opus 4.5, which is their brand new model. And from what I've seen, honestly, I prefer it out of all of the models for coding, for design, for front end, etc. Uh, for writing, very, very powerful tool as well...right? All this code with her before, etc. It's all very messy. So, what you can do over here is you can grab the code like you can see from Claude Opus 4.5. So, it's creating the modern language page right here. And um it's redesigning the page to create a nicer front end. And then
questions right there. You can get code suggestions. You can debug faster. It's all integrated. Now, I'll be honest. If you're serious about coding, I still recommend Cursor plus Claude. But Google's tools are catching up fast. Now, let's hit Google
thing with the theory, and that's maybe why this comprehension is a really relevant concept for us as developers with AI coding to understand: the theory that Claude builds internally, for the process of you running it for this particular feature, is, let's say, internally complete. Like, Claude knows that this color here also
model, the one they recently updated. For a brief window, it was even outperforming all the major competitors in benchmark tests. It handled reasoning, writing, and coding tasks exceptionally well. Models like Claude have always appealed more to AI enthusiasts, the kind of people who like to experiment, tweak, and push the limits. It's probably not the first
there's kind of two tips on debugging. One is to ask Claude to "think ultra hard" or "why do you think this happened" to help you debug bugs, right? So "think ultra hard" basically gets Claude to think longer before coding. And "why do you think this happened" causes it to try to root-cause the issue more, right...looks great. There's a bunch of emojis. And yeah, overall it seems to work well. So, let's go back to our Claude Code. And let's actually ask Claude to check things off in to-do.md that are done for milestone one, and also add to-dos for milestone 2, which, if you recall, is hooking
then Claude is going to do that into JSON. Then once that is done, in a couple of seconds, which shouldn't take that long, Claude is just going to be coding away in this area right here on the right-hand side. This does take a little bit longer because it's an Excel file of the amount
with AI. All right, so we're in Cursor. Make sure to select Agent down here. This makes it so that it goes out and builds code for you. I'm using Claude 4 Sonnet for this, but you can use really any model you want. Claude 4 Sonnet does cost money, but you can use Claude...well. That is significantly cheaper and it'll give you similar results, if we're being quite honest. All right, I pasted in our initial prompt that Claude gave us here, which goes step by step on exactly what we're going to be building, and I'm going to hit enter on this and it's going to start
press B, hit up, down, left, right, that kinda thing. You describe that tool to it behind the scenes. Like, I have to implement some code that actually says, like, when Claude says it wants to press A, I need to make it so that it actually goes and presses A on the emulator. - I see. So the tools are like options
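The idea in this snippet, describing a tool to the model and then writing the code that actually performs the button press, might look something like the following minimal sketch. The tool name `press_button`, the `FakeEmulator` class, and the `handle_tool_call` helper are hypothetical names for illustration; the tool description uses a JSON-schema style layout, and the speaker's real emulator integration is not shown in the transcript.

```python
from typing import Dict, List

BUTTONS = ["a", "b", "up", "down", "left", "right"]

# Hypothetical tool description that would be handed to the model.
PRESS_BUTTON_TOOL = {
    "name": "press_button",
    "description": "Press one button on the game emulator.",
    "input_schema": {
        "type": "object",
        "properties": {"button": {"type": "string", "enum": BUTTONS}},
        "required": ["button"],
    },
}


class FakeEmulator:
    """Stand-in for a real emulator; it only records button presses."""

    def __init__(self) -> None:
        self.pressed: List[str] = []

    def press(self, button: str) -> None:
        self.pressed.append(button)


def handle_tool_call(emulator: FakeEmulator, name: str, tool_input: Dict[str, str]) -> str:
    """Turn a tool request from the model into an actual emulator action."""
    if name != PRESS_BUTTON_TOOL["name"]:
        raise ValueError(f"unknown tool: {name}")
    button = tool_input["button"]
    if button not in BUTTONS:
        raise ValueError(f"unknown button: {button}")
    emulator.press(button)
    # The returned string is what would be reported back to the model.
    return f"pressed {button}"
```

When the model asks for `press_button` with `{"button": "a"}`, the handler calls `emulator.press("a")` and returns a short result string; anything outside the listed buttons is rejected.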
channel, Lenny's podcast, Greg's channel, Tina and Jeff. We all make content about AI and, you know, practical tutorial content. And let's wait until Claude finishes making the changes to the spec and the /youtube command. And then we're going to run the command to see if it can do the analysis for all five...some ways the easiest step. Now I've been following this process for a long time and I haven't run into a case where Claude hasn't one-shot the actual code, because we've done all the preceding steps. So that's it for this tutorial, and um like and subscribe to this channel. I have
working outputs. So, overall rankings: we've got the winners and the losers, like you can see, and in terms of the verdicts, Claude 3.7 won for writing the chat, DeepSeek won for coding, GPT-4.5 won for general AI stuff, and then Grok won as the best free option, but it is inconsistent. So final fource...than GPT-4o, but it's not 15 times better. Claude is an absolute writing beast; if you need high-quality content, probably go to Claude. DeepSeek is actually pretty good at coding, surprisingly, and Grok is fun but pretty inconsistent, probably still not on the same level as the rest, honestly. Which AI do you think is winning
task. Next, you set a max number of tries. This stops the loop from running forever. Usually, people set this between 10 and 20 tries. Then, Claude starts working. It builds the code. It writes the features. It does whatever you asked. Here's where it gets cool. A stop hook checks the output. It looks for that completion signal
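The loop described in this snippet (a maximum number of tries, with a stop hook that checks each output for a completion signal) can be sketched as below. This is a minimal illustration under assumptions: the marker string `TASK_COMPLETE`, the function names, and the `fake_agent` stand-in are all hypothetical, and a real setup would call the coding agent where `agent_step` is invoked.

```python
from typing import Callable, Tuple

# Hypothetical completion marker; the transcript does not say what the signal looks like.
COMPLETION_SIGNAL = "TASK_COMPLETE"


def stop_hook(output: str) -> bool:
    """Return True when the output contains the completion signal."""
    return COMPLETION_SIGNAL in output


def run_loop(agent_step: Callable[[int], str], max_tries: int = 10) -> Tuple[bool, int]:
    """Call the agent until the stop hook fires or the tries run out.

    Returns (finished, number_of_tries_used).
    """
    for attempt in range(1, max_tries + 1):
        output = agent_step(attempt)
        if stop_hook(output):
            return True, attempt
    # The cap on tries is what stops the loop from running forever.
    return False, max_tries


def fake_agent(attempt: int) -> str:
    """Stand-in agent that reports completion on its third try."""
    if attempt < 3:
        return "still working on the features"
    return "all features built. TASK_COMPLETE"
```

With the stand-in agent, `run_loop(fake_agent, max_tries=10)` stops on the third try; an agent that never emits the signal uses up all of its tries and returns `False`.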
Antigravity is going to be your go-to. It's free with generous rate limits right now and still has upgradeable plans, plus access to Claude Opus 4.5, the premier coding model today. If you're doing admin or basic data entry like we just discussed, work beaver low-end type of tasks, that's going