New Claude 3.7 Sonnet - World's First "Hybrid Reasoning" Model
Duration: 13:33


Skill Leap AI · 25.02.2025 · 15,177 views · 391 likes


Video description
Here is the link to the full Claude 3.7 Sonnet announcement and the Claude Code video: https://www.anthropic.com/news/claude-3-7-sonnet

In this video, I test the new Claude 3.7 Sonnet by Anthropic, an upgrade from the 3.5 Sonnet model. Key highlights:

- Hybrid reasoning mode: quick answers or step-by-step thinking
- Improved writing: still one of my favorite models for tone and clarity
- Coding tests: mixed results with a chess game and front-end web design
- Reasoning tests: extended mode struggled with logic-based problems
- Claude Code: new coding tool (research preview) with GitHub integration

The model is available on all tiers, but extended reasoning isn't on the free plan. Tried it yet? Let me know how it worked for you!

MORE FROM SKILL LEAP: Join the fastest-growing AI education platform and instantly access 20+ top courses in AI. Start with a free trial: https://bit.ly/skill-leap

Table of contents (3 segments)

Segment 1 (00:00 - 05:00)

Claude finally got a brand-new upgrade: they just released Claude 3.7 Sonnet, an upgrade from 3.5 Sonnet. If you've used Claude before, you know it's one of my go-to large language models, especially because of its tone and some of its functionality. It's from a company called Anthropic, and they don't do releases very often; in fact, Claude 3.5 Sonnet came out five months ago, and in the age of AI that's a thousand years ago. They also released something called Claude Code, which I'll show you here.

Now, inside the Claude website you'll have three different models. You have Claude 3.7 Sonnet, which is the replacement for 3.5, but for the thinking mode, the reasoning models, you actually have two different ones: a normal one, which is for most use cases, and an extended one, which is best for math and reasoning. The extended one is just a smarter version that's going to think longer; I'm going to use the normal version for most of the testing here. The reason they call this a hybrid reasoning model is that the normal version is almost instant. It doesn't really show you much thinking; it almost gives you the answer instantly, as you can see here, without having to show its reasoning.

As far as pricing, Claude 3.7 Sonnet is actually available on all Claude accounts, including Free, Pro, Team, and Enterprise. It's also available inside the Anthropic API, and they show their pricing for that. But the extended thinking mode is available in every tier except the free tier, so if you want the extended one, you'll have to upgrade to at least the Pro plan.

So I'm going to show you a couple of different things. We'll do some standard things with Claude 3.7 Sonnet in standard mode, then I'll show you the reasoning mode. I'll also show you some of the benchmarks they rolled out. For example, when it comes to coding, this is their latest benchmark comparing it to OpenAI o1, OpenAI o3-mini (high), and DeepSeek R1. Obviously, with benchmarks nowadays I like to test things for myself, but as you can see here, in software engineering it's beating all of those in this benchmark. And this is the most standard test you've probably seen if you've looked at any of these benchmarks; again, it's comparable to some of the top models, including DeepSeek R1 and Grok 3, which also recently rolled out.

They also introduced Claude Code. This one is only available as a research preview right now, so you may not have it yet. Because Claude 3.5 Sonnet was one of the best tools for coding, they decided to create something more native for coders, and that is their first agentic coding tool. It's built into some of those platforms, and you'll have GitHub access right inside Claude to work with code natively using Claude 3.7 Sonnet. This video does a really good job explaining it in about four minutes, so I'll link that page in the description if you want to learn more. Right now, I want to show you Claude 3.7 Sonnet right inside the Claude website.

Okay, we'll go ahead and put Claude through some basic testing. One of my favorite reasons I still have a paid subscription to Claude is its writing style. In addition to that, they have this "choose style" option, where you can define your own writing style with a better set of instructions that it uses on the back end every time you go to write something. It's going to use one of these styles; I've created one of them, but some of the default ones are really good too. Even the normal writing style, basically not choosing any style, was one of the best with Claude 3.5 Sonnet. Now we have Claude 3.7 Sonnet, so let's see what we get.

I'm not going to give it too many instructions. I'm just going to ask for a summary in 250 words, ask it to include the key points in five bullet points, and copy in this entire page. Here's our summary: Anthropic has announced Claude 3.7 Sonnet, their most intelligent model to date and their first hybrid reasoning model. The new model can provide either quick answers or extended step-by-step thinking that users can see. I'll show you the thinking when we use the reasoning model. Claude 3.7 Sonnet shows significant improvement in coding and front-end web development capabilities, so we'll test that a little here too. You can see the default writing is so much better: if you've ever used ChatGPT or even Gemini, they go overly promotional to this day (though they improve that all the time), but I still prefer Claude's default writing style without doing anything to the style or tone. And it actually listened to our prompt: we got exactly five bullet points, and when I copied and pasted this section to count the words, we got exactly 248 words, which is really impressive. Usually these models don't do a good job with word count; that's not really how they work, since they operate on token counting, not word counting, but
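The "hybrid" split between the near-instant normal mode and extended thinking comes down to a single request parameter in the Anthropic Messages API. A minimal sketch of building such a request, where the model ID and the shape of the `thinking` parameter follow Anthropic's published API at the time of the 3.7 Sonnet release; treat the exact values as assumptions, since they are not taken from the video:

```python
# Sketch: normal mode vs. extended-thinking mode as API request kwargs.
# (Model ID and `thinking` parameter shape are assumptions based on
# Anthropic's Messages API documentation, not on this video.)

def build_request(prompt: str, extended: bool = False) -> dict:
    """Build the kwargs you would pass to client.messages.create()."""
    request = {
        "model": "claude-3-7-sonnet-20250219",  # assumed model ID
        "max_tokens": 4096,
        "messages": [{"role": "user", "content": prompt}],
    }
    if extended:
        # Extended mode: grant a budget of "thinking" tokens the model
        # may spend on visible step-by-step reasoning before answering.
        request["thinking"] = {"type": "enabled", "budget_tokens": 2048}
    return request

# Normal (near-instant) mode vs. extended reasoning mode:
quick = build_request("Summarize this page in 250 words.")
deep = build_request("Solve this step by step.", extended=True)
```

The same model serves both modes; only the presence (and size) of the thinking budget changes, which is what makes it "hybrid" rather than two separate models.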

Segment 2 (05:00 - 10:00)

really great job here. Okay, now one limitation I'll point out right up front, which has been Claude's limitation for a long time and one of the reasons people use other models: it doesn't have web access. If I give it a question that requires real-time information, it's going to say its knowledge cutoff is October 2024. ChatGPT has web access; Grok, DeepSeek, and Gemini all have web access. This one still doesn't. All of those also have something called deep research, where the model analyzes a ton of websites and gives you a much more in-depth summary of what you're looking up. This doesn't have that either, so it's going to be a problem if that's what you're looking to do with it.

Now, the next test I want to do is a hallucination test, because if you use these kinds of large language models, you know they make stuff up sometimes, and that is one of their biggest limitations. "Describe each of the following varieties of mangoes" — I just named four, and one of them is not actually real. Let's see if it can figure that out. Okay, it totally failed this test. The "lemon cream" mango? I just made that up. "Lemon cream mangoes are a lesser-known but delightful variety" — nope, I made that up; they're not a delightful variety. I'm curious if ChatGPT gets this one right; I haven't tested it in a while. Okay, ChatGPT got it wrong too: it just told me this is a rare variety of mango. Sometimes having web search helps a little with these kinds of hallucination problems. Perplexity, for example, told me there is no specific information on a "lemon cream" variety, but there is something called a "lemon zest" mango, so I could follow up on that and do my research to make sure that one isn't made up either. So this is something you can solve to some extent with search, where Claude and ChatGPT failed.

Okay, next let's do a coding test, and I'm going to turn on extended mode this time, because it's designed for math and coding, and I have a math question coming up too. We're going to ask it to code a game of chess — just a basic game for now, and then I'll change the rules of the game. I could do this without a problem before: in my previous videos I checked o3-mini and DeepSeek R1 with the same kind of prompt and was able to get results. Then I changed the rules of the game to make it really complicated for them, and to some extent they got it right in those other videos. I'm going to tell it the pieces are saved in a folder called "assets" on my computer, and I'm going to upload an image of that.

Okay, on the very first attempt, for some reason it decided to do this in HTML, and this is the game I got. I downloaded it and tried to run it. It kind of looked nice, but the images did not load, so it wasn't able to connect those, and I couldn't play at all — so that was a fail. I asked it to make this in Python instead, which is what I've been using for all the other tests. On its first attempt it wrote the Python code, and I was able to run it, but the moment I tried to move a piece, it crashed, so that also didn't work. It took me five rounds of back-and-forth to get a working chess game, which I'll show you here.

Okay, here's the game. By the way, I didn't tell it to make a game of chess where the rules were different; I'm just trying to get a default chess game, and it's not working the right way. You can see already that this piece should be able to move two squares, and it can only move one; same thing for the opponent. So for some reason it got the basic rules of chess wrong — I literally can't move this one the right way. It also couldn't figure out when the game was done. If I took this piece here, the game just kept going; even though I should be able to take this right now, my opponent shouldn't be able to move another piece, but I could just keep moving my other pieces, and this keeps
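The pawn bug described above — pawns refusing their two-square first move — is a rule that's easy for generated code to get wrong. A minimal sketch of pawn move generation on an 8x8 board, ignoring captures, en passant, and promotion; the coordinates and helper name are my own, not from the model's game:

```python
# Sketch: forward-move generation for a pawn, showing the first-move
# two-square rule the generated game got wrong. (Helper name and board
# coordinates are illustrative, not from the video's code.)

def pawn_moves(row: int, col: int, white: bool, occupied: set) -> list:
    """Forward moves for a pawn at (row, col); row 0 is White's back rank."""
    direction = 1 if white else -1
    start_row = 1 if white else 6
    moves = []
    one = (row + direction, col)
    if 0 <= one[0] <= 7 and one not in occupied:
        moves.append(one)
        # The two-square option exists only on the pawn's first move,
        # and only if BOTH squares ahead are empty.
        two = (row + 2 * direction, col)
        if row == start_row and two not in occupied:
            moves.append(two)
    return moves

# A white pawn on its starting square may advance one or two squares:
print(pawn_moves(1, 4, True, set()))   # [(2, 4), (3, 4)]
# Once it has moved, only one square:
print(pawn_moves(2, 4, True, set()))   # [(3, 4)]
```

Getting this single conditional wrong (or omitting it) produces exactly the behavior in the video: every pawn limited to one square from the start.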

Segment 3 (10:00 - 13:00)

to fix the issue here for mobile. Let's see what we get: it's going to give us the code, and I'll run it in a second. Okay, this is what I got on desktop view, and if I shrink it down — okay, it center-aligns some things, but the formatting is not great. It really optimized for mobile and completely messed up the desktop view, so that didn't quite work. I'm sure with some back-and-forth I could get it, but I was using Claude 3.5 Sonnet for these kinds of things, and with some back-and-forth I could eventually get that to work too. Sometimes I run out of credits; that's one of the other issues I've had with Claude — you hit your limit all the time if you use it a lot. I'm sure you've hit that same issue too.

Okay, so far, still the same marks I gave 3.5 Sonnet: I think it's still great at its tone, its summaries, and listening to my instructions, but two coding tests in, even basic front-end work — taking an image and turning it into an HTML page — is not great.

Let's do a reasoning test that I've done with other models too. Okay, I've tested this one with every reasoning model in previous videos. I'm going to make sure it's set to the extended reasoning mode. The prompt says: you have a rope exactly 50 feet long and a building that is 75 feet tall; you need to measure the building using only the rope and your body, which is only 5 feet tall — describe your steps. I'll show you the reasoning as it goes through it, because one thing I do like is that if you click over here, you can see the actual reasoning, and it didn't seem to be just a summary: it appears to show the step-by-step reasoning. It's also showing a timer up here — 16 seconds so far — so I'll let it finish. Okay, 1 minute and 39 seconds this time. I ran this three different times, and it got it wrong every single time, in different ways. This time it got it wrong by saying to just stretch the rope 50 feet vertically along the side of the building from the ground. Well, how are we going to do that? Gravity is going to pull the rope down; we can't just stick it straight up. Then it says to get to 50 feet and use your 5-foot body to keep measuring, like you're supposed to float in the air. So this is definitely on par with some of the worst answers I've gotten from any of these models, and I'm pretty sure ChatGPT's o3-mini got this one right. This is a thinking model, a reasoning model; it thought for almost two minutes, which is also on the long end for these reasoning models solving this problem. Similar triangles is the actual solution for how you're supposed to do this; you can't just put the rope up there. So this is definitely a fail, and it failed three times in completely different ways. One time it just made up an answer that made zero sense; at least this time it merely doesn't know that gravity will pull the rope down, so the logic of it kind of makes sense.

When I first tested it, I asked how many R's are in "strawberry", and instead of just telling me three, it created an app: "click the strawberry to find out how many R's". Okay, so I guess it got three R's, but I don't know why it had to make an interactive app. I guess that's kind of neat.

Now, I'm going to keep using this. I still really love Claude for its writing, but some of these early tests are not very impressive. Let me know what you find: go ahead and comment below. I'd love to see if you're getting better results than I'm getting on this initial first-day release. Thanks for watching; I'll see you next time.
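For the rope puzzle, one way the similar-triangles approach can work, sketched as arithmetic. The setup is my own illustration (the video names the technique but not the steps): on flat ground, put your eye at ground level and use your 5-foot body, standing upright, as a sighting stick; the rope only has to measure the two ground distances:

```python
# Sketch: measuring the building by similar triangles.
# Assumptions (mine, not from the video): flat ground, eye at ground level,
# the 5 ft body standing upright so its top and the building's top fall on
# one sightline. Then  BODY / dist_to_body = H / dist_to_building.

BODY = 5.0  # ft, the observer's height, used as the sighting stick

def building_height(dist_to_body: float, dist_to_building: float) -> float:
    """Height implied by the sightline through the top of the 5 ft stick."""
    return BODY * dist_to_building / dist_to_body

# Example: stick 5 ft from the eye, building 75 ft away (measurable with the
# 50 ft rope plus the rope folded in half for another 25 ft):
print(building_height(5.0, 75.0))  # 75.0
```

The point is that no part of the rope ever has to go up the wall: both measured lengths lie on the ground, and the 5-foot body supplies the small triangle.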
