# Claude 3.7 Sonnet Just Shocked Everyone! (Claude 3.7 Sonnet and Claude Code)

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=St_I9HUEES8
- **Date:** 25.02.2025
- **Duration:** 14:53
- **Views:** 32,419

## Description

Join my AI Academy - https://www.skool.com/postagiprepardness 
🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Checkout My website - https://theaigrid.com/


Links From Today's Video:
https://www.anthropic.com/news/claude-3-7-sonnet

Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos.

Was there anything I missed?

(For Business Enquiries) contact@theaigrid.com

Music Used

LEMMiNO - Cipher
https://www.youtube.com/watch?v=b0q5PR1xpA0
CC BY-SA 4.0
LEMMiNO - Encounters
https://www.youtube.com/watch?v=xdwWCl_5x2s

#LLM #Largelanguagemodel #chatgpt
#AI
#ArtificialIntelligence
#MachineLearning
#DeepLearning
#NeuralNetworks
#Robotics
#DataScience

## Contents

### [0:00](https://www.youtube.com/watch?v=St_I9HUEES8) Segment 1 (00:00 - 05:00)

So today Anthropic finally released their new AI model, Claude 3.7 Sonnet, and there's a lot to digest, because this one isn't your standard LLM but rather a hybrid reasoning model with a lot to offer in terms of the benchmarks broken. The announcement reads: "Today, we're announcing Claude 3.7 Sonnet, our most intelligent model to date and the first hybrid reasoning model on the market." Claude 3.7 Sonnet can produce near-instant responses or extended, step-by-step thinking that is made visible to the user, and API users also have fine-grained control over how long the model can think.

This hybrid reasoning is essentially System 1 and System 2 thinking, which I'll explain in a second; in practice it means the model can suit both difficult queries and quick queries where you need an instant response. They talk about how they developed Claude 3.7 Sonnet with a different philosophy from other reasoning models on the market: "Just as humans use a single brain for both quick responses and deep reflection, we believe reasoning should be an integrated capability of frontier models rather than a separate model entirely. This unified approach also creates a more seamless experience for users." System 1 is intuition and instinct, where, much like humans, LLMs can respond instantly; System 2 is slower and logical, where you think through problems and come up with more complex solutions. That is what they've embedded into the model.

With Claude 3.7 Sonnet you can actually control the budget for thinking. If you're a developer thinking about customizability, you can tell Claude to think for no more than a certain number of tokens, so it's up to you how long Claude thinks. I think this is really valuable, because one thing I've noticed about other thinking models is that sometimes the model thinks about a problem for maybe 10 seconds when we wanted it to think for 100 or even 200 seconds.

I will get to the infamous benchmarks in a moment, but comparing Claude 3.7 to Claude 3.5, the standard mode of 3.7 is simply a lot smarter than the previous 3.5 Sonnet, and you can also enable the extended thinking mode, where it self-reflects before answering, which improves its performance on math, physics, instruction following, coding, and many other tasks. They generally find that prompting the model works similarly in both modes, so for those of you who have prompts that used to work on Claude 3.5 Sonnet, it's quite likely they will work the same way on Claude 3.7; there shouldn't be any changes to your prompting.

Now, one thing I found really interesting, and I'm finally glad an AI company has done this, is that they are optimizing for real-world focus. They state: "In developing our reasoning models, we've optimized somewhat less for math and computer science competition problems, and instead shifted focus towards real-world tasks that better reflect how businesses actually use LLMs." The reason I think this is going to be game-changing is that companies often obsess over benchmarks in areas that aren't in everyday use. When we take a look at the benchmarks for Claude 3.7 in a moment, you'll see they're still very impressive, but a lot of those areas don't directly translate to the business use that the average person gets value out of the model for. I think this is why Claude has traditionally been better than ChatGPT and its rival counterparts: the models are trained so that they're actually good for real-world use, and not so much for competition problems.

So these are the benchmarks for Claude 3.7 Sonnet, and one of the first things we notice is that it isn't crushing the other companies across the board. If you remember, the Grok 3 beta was released very recently, and yet, honestly, I don't know how they did it, but on several benchmarks Claude 3.7 comes out on top, for example in agentic coding and agentic tool use, which I'll dive into a little more; those two are really important because they reflect real-world use cases. In other areas, such as visual reasoning and high-school competition math, a lot of these top models seem to be converging around the same level, roughly 86%, but on GPQA we can see that Claude 3.7 Sonnet manages to edge out the Grok 3 beta. Now, I will say that this one is a little bit more
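
The per-request thinking budget described above is exposed through Anthropic's Messages API. As a hedged sketch (the `thinking={"type": "enabled", "budget_tokens": ...}` parameter shape and the launch model ID follow Anthropic's extended-thinking docs as I understand them; treat the exact names as assumptions, and `build_request` is just an illustrative helper, not part of any SDK):

```python
def build_request(prompt: str, budget_tokens: int = 0, max_tokens: int = 4096) -> dict:
    """Build a Messages API payload; budget_tokens > 0 enables extended thinking."""
    payload = {
        "model": "claude-3-7-sonnet-20250219",  # launch model ID (assumption)
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    if budget_tokens:
        # Thinking tokens count toward the overall output limit, so the
        # budget must stay below max_tokens.
        if budget_tokens >= max_tokens:
            raise ValueError("budget_tokens must be less than max_tokens")
        payload["thinking"] = {"type": "enabled", "budget_tokens": budget_tokens}
    return payload

# Quick query: no thinking block at all (the near-instant "System 1" path).
fast = build_request("What is the capital of France?")

# Hard query: let the model reason for up to 16k tokens before answering.
slow = build_request(
    "Prove that the sum of two odd numbers is even.",
    budget_tokens=16_000,
    max_tokens=20_000,
)
```

In an actual integration you would pass this payload to the Anthropic client (e.g. `client.messages.create(**payload)`); the point of the sketch is just that the "think for no more than N tokens" dial is a single request parameter.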

### [5:00](https://www.youtube.com/watch?v=St_I9HUEES8&t=300s) Segment 2 (05:00 - 10:00)

interesting, because with Claude 3.7 Sonnet I really do think this is a model you can't judge by the benchmarks alone; it's a model you truly have to use, and some of the tweets I'm seeing in the AI community definitely suggest this is probably going to be the model a lot of people immediately switch to. I wouldn't be surprised if Anthropic nearly runs out of inference capacity for it, considering how many people already used the previous one.

Now, like I said before, I don't want to focus too much on these benchmarks, but I'll show you the one we do need to focus on: agentic tool use, measured by τ-bench. This is a framework that tests AI agents on real-world tasks involving user and tool interactions, so it's a benchmark with actual real-world relevance. It evaluates how consistently an AI agent can perform the same task across multiple trials, using a metric called pass^k, which looks at how reliably the agent succeeds over repeated attempts. This matters because we need benchmarks that reflect real-world use cases; competition math and GPQA are great for assessing how smart an AI is, but you're going to need real-world evaluations if you want AI to actually be used in the real world. By focusing on tool use and consistent behavior, this benchmark helps ensure that AI agents are prepared for deployment in sensitive domains like customer service or healthcare.

And of course it wouldn't be a benchmark rundown without SWE-bench Verified, the benchmark for the software development niche, where Claude 3.7 Sonnet achieves state-of-the-art performance. Being state-of-the-art here really goes to show how much better this model is than OpenAI's reasoning models: despite all the DeepSeek hype, and a lot of o3 hype too, Claude 3.7 surpasses those models by a pretty significant amount. And this isn't just me believing the benchmarks; it's something I've seen firsthand from people currently writing code with Claude 3.7 Sonnet, all of whom are basically saying the model is outstanding. We can see OpenAI's o3-mini at around 49%, o1 at 48%, and Claude 3.5 Sonnet also around 49%, and then a massive jump to 62.3%, and all the way up to 70.3% with custom scaffolding. That's a huge leap, and since the previous number dates, I believe, to October 2024, that's a jump of over 12 points in just four months. It may not sound like a lot, but in day-to-day use this makes for a remarkably more helpful model across software-engineering tasks. For those of you who may use Devin, you'll also see that in the agentic coding evaluation this model once again jumps up to 67%; with GPT-4o starting at 49% and Claude 3.7 Sonnet already at 67%, can you imagine where we're going to be in just a few years? The future is truly exciting.

And if you're wondering about coding with Claude, you may want to take a look at this: a video where they introduce a new agentic coding tool that lets users work with Claude directly in the terminal, launched as a research preview to enhance coding capabilities. It can understand your codebase, analyze a repository and provide insights into its structure; users can request changes; it can display its thought process, generate and execute tests while resolving errors automatically, detect and fix build issues iteratively, and push changes to GitHub with clear summaries. This is a video you're definitely going to want to watch if you're someone who uses Claude for coding.

[Demo transcript] "Should we be doing, like, a big smile?" "Uh, no, big smile is creepy." "I'm Boris, I'm an engineer." "I'm Cat, I'm a product manager." "We love seeing what people build with Claude, especially with coding, and we want to make Claude better at coding for everyone. We built some tools, one of which we're sharing today: we're launching Claude Code as a research preview. Claude Code is an agentic coding tool that lets you work with Claude directly in your terminal. We're going to show you an example of it in action. We have a project here; it's a Next.js app. Let's open it up in an instance of Claude Code. Now that we've done this, Claude Code has access to all of the files in this repository. We don't know much about this
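
To make the pass^k idea concrete, here is a minimal sketch of how such a reliability metric can be estimated. The assumption (based on my reading of how τ-bench reports it) is that for a task with `n` recorded trials, `c` of them successful, pass^k is the probability that `k` trials sampled from those are all successful, averaged across tasks; the function names are mine, not τ-bench's.

```python
from math import comb

def pass_pow_k(n: int, c: int, k: int) -> float:
    """Estimate pass^k for one task: the probability that k trials drawn
    without replacement from n recorded trials (c of which succeeded)
    are ALL successful, i.e. C(c, k) / C(n, k)."""
    if k > n:
        raise ValueError("k cannot exceed the number of recorded trials")
    if c < k:
        return 0.0  # not enough successes for k all-success draws
    return comb(c, k) / comb(n, k)

def benchmark_pass_pow_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass^k across tasks; results is a list of (n, c) pairs."""
    return sum(pass_pow_k(n, c, k) for n, c in results) / len(results)

# A flaky agent that passes 4 of 8 trials looks fine at k=1 (50%),
# but its reliability collapses as k grows.
print(pass_pow_k(8, 4, 1))  # 0.5
print(pass_pow_k(8, 4, 4))  # 1/70 ≈ 0.0143
```

This is exactly why the video argues the metric matters for deployment: a customer-service agent that succeeds half the time per attempt is almost never dependable across several consecutive interactions.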

### [10:00](https://www.youtube.com/watch?v=St_I9HUEES8&t=600s) Segment 3 (10:00 - 14:00)

codebase; it looks like an app for chatting with a customer support agent. Let's get Claude to help explain this codebase to us. Claude starts by reading the higher-level files and then dives in deeper; now it's going through all the components in the project. Cool, here's its final analysis. So, say I was asked to replace this left sidebar with a chat history, and I'm also going to add a new-chat button; I'm going to ask Claude to help me out here. We haven't specified any files or paths, and Claude is already finding the right files by itself. Claude can also show its thinking, and we can see how it's decided to tackle this problem. Claude is asking me if I want to accept these changes; I'll say yeah. Now Claude is updating the navbar, adding a button and icons as well; next it's updating the logic to ensure the saving state works correctly. After a bit, Claude completes the task; here's a summary of what it's done. Let's take a look at that: we're seeing a new-chat button and a new chat-history section on the left. Let's check if I can start a new chat while keeping the previous one saved; I'll try out the new-chat button too. Great, it's all working. Now let's ask Claude to add some tests to make sure that the features we just added work. Claude is asking for permission to run commands; we'll say yes. Claude makes some changes to run these tests, and after getting the results it continues with its plan until all tests pass. After a few minutes, it looks like we're good to go. Now I'm going to ask Claude to compile the app and see if we get any build errors. Let's see what it finds: Claude identifies the build errors and is now fixing them; then it tries to build again, and it'll keep going until it works. Now let's finish everything up by asking Claude to commit its changes and push them to GitHub. Claude creates a summary and a description of our changes and pushes them to GitHub. That's it; that's an example of what Claude Code can do. We can't wait for people to start building."

Now, there was also a benchmark I forgot to include: they've introduced a benchmark for the model playing Pokémon. It says that Claude 3.7 Sonnet demonstrates it is the very best of all the Sonnet models so far at playing Pokémon Red. I don't actually play Pokémon myself, but they explain that while Pokémon is a fun way to appreciate Claude 3.7 Sonnet's capabilities, they expect these capabilities to have real-world impact beyond playing games, because the model's ability to maintain focus and accomplish open-ended goals will help developers with the wide range of state-of-the-art AI agents being developed. That's why they did this, and I think these kinds of new benchmarks are going to be super entertaining and super interesting.

We also got something about the future, where they said, and I quote: "Claude 3.7 Sonnet and Claude Code mark an important step towards AI systems that can truly augment human capabilities. With their ability to reason deeply, work autonomously, and collaborate effectively, they bring us closer to a future where AI enriches and expands what humans can achieve." They lay out a timeline: first we had assistants in 2024; then collaborators in 2025, where Claude does hours of independent work for you, on par with experts, expanding what every person or team is capable of; and then, by 2027, pioneers, where they predict Claude will be able to find breakthrough solutions to challenging problems that would have taken teams years to achieve. So the future for Claude is certainly bright, but let me know if you've used it already. And of course, if you go to the model picker, you can see Claude 3.7 Sonnet right there, along with the thinking mode: you've got Normal and then Extended, so it's completely up to you which one you want to use. Hopefully you guys enjoyed the video, and I'll see you in the next one.

---
*Source: https://ekstraktznaniy.ru/video/13281*