New Claude 3.5 Sonnet is Better Than GPT-4o
12:05

The AI Advantage · 21.06.2024 · 73,763 views · 2,178 likes · updated 18.02.2026
Video description
Anthropic surprised everyone by releasing Claude 3.5 Sonnet, a multimodal AI that beats even GPT-4o in a ton of categories. Today I'm breaking down everything you need to know about what makes this model special and showing you some of my early test results.

Links:
https://claude.ai/
https://www.anthropic.com/news/claude-3-5-sonnet
https://x.com/mikeyk/status/1803799827182154205
https://x.com/emollick/status/1803868040779481121

Chapters:
0:00 Anthropic’s New Model
1:25 Benchmarks
3:27 Testing
6:15 Artifacts Feature

#ai #claude

Free AI Resources:
🔑 Get My Free ChatGPT Templates: https://myaiadvantage.com/newsletter
🌟 Receive Tailored AI Prompts + Workflows: https://v82nacfupwr.typeform.com/to/cINgYlm0
👑 Explore Curated AI Tool Rankings: https://community.myaiadvantage.com/c/ai-app-ranking/
🐦 Twitter: https://twitter.com/TheAIAdvantage
📸 Instagram: https://www.instagram.com/ai.advantage/

Premium Options:
🎓 Join the AI Advantage Courses + Community: https://myaiadvantage.com/community
🛒 Discover Work Focused Presets in the Shop: https://shop.myaiadvantage.com/

Contents (4 segments)

  1. 0:00 Anthropic’s New Model (323 words)
  2. 1:25 Benchmarks (475 words)
  3. 3:27 Testing (720 words)
  4. 6:15 Artifacts Feature (1440 words)
0:00

Anthropic’s New Model

And it happened again: one of the big AI companies, in this case Anthropic, released a state-of-the-art model in multiple departments, Claude 3.5 Sonnet. As opposed to some of the latest LLM announcements, which were exactly that, announcements with nothing usable, this one is a breath of fresh air: you can use it right away, even for free. There are limitations, and we'll talk about those in a second. But not just that: the benchmarks are better than even GPT-4o's, and supposedly the vision recognition is state-of-the-art as of today, meaning that if you upload images, no other model is better at recognizing what you gave it. And this is my favorite: they made a massive step forward in how the interface of an AI is presented to consumers. They introduced an experimental feature they call Artifacts. If the AI writes code for you, whether that's a website or a game, one half of the screen holds the chat, as you're already used to with ChatGPT, while the second half is used as an interactive code editor. By default you don't even see the code; it just happens, and it's interactive. So you talk to the chat on one side and you get the results on the other. This is something we've seen in demos like Devin, but none of these were available to the public. They always just announce stuff and we never get our hands on it: the GPT-4o voice assistant, Sora, Devin, and many Google products are just a few examples. This is available today, for free, and this interface in particular really is a glimpse into the future of these products. So let's have a closer look at everything that came out here with Claude 3.5 Sonnet, and at whether you should be using it over something like ChatGPT with the GPT-4o model. All right, first
1:25

Benchmarks

things first: the facts. What is happening here? Claude 3.5 Sonnet is a brand-new model from Anthropic, released today, and it's their best model yet. For you true enthusiasts this might be a bit surprising, because up until now they had three models: Haiku, the smallest one; Sonnet, the medium-sized one; and Opus, their big-boy competitor to GPT-4 and now GPT-4o. There's no update to that large model. It's the medium-sized model that got upgraded to 3.5, and it outperforms their big model, Opus. It has a whopping 200,000 tokens of context that you can use. Let's briefly talk about benchmarks: yep, it crushes virtually all benchmarks against Opus and GPT-4o. Nice, but as you know, I care more about usability and consumer use cases than about these benchmarks. Still, it's always good to see that they yet again raised the bar on what's possible. Another fact: this is fast. OpenAI kind of set a standard with GPT-4o generating super quickly, and Opus was slower than that. This is twice as fast as Opus and feels virtually identical to OpenAI's GPT-4o, if you've had experience with that over the past few weeks. Its knowledge cutoff is April 2024, which is super recent: that's two months ago. And last but definitely not least, it's freely available to everybody, even in Europe. I'm sitting in Lisbon, Portugal, and accessing this without a VPN. Do keep in mind that if you don't have a paid account, you're going to be limited to around 12 messages every few hours, which means you won't really be able to complete projects, but you can get a taste for it or get some quick results. For my testing and examples I'm on the Pro plan.

Now we can talk about the innovations, because there are two big things I want to highlight. As for it being slightly better on some benchmarks, as a consumer I'm not sure that really matters at this point, unless you're talking about something like code generation or math problems, because let's be real: whether it scores 86 or 88 on MMLU is not going to make a difference for most people watching this video. What does make a difference is the vision capabilities, because the jumps between models there have recently been massive. The state of the art was GPT-4o, and before that Opus, but now Claude 3.5 Sonnet smashes most of these benchmarks, especially when it comes to reading charts and documents. That's interesting because it's a very common use case, at least for me: I upload a lot of charts and infographics, and this means that even more complex infographics will be easily digestible.
3:27

Testing

All right, but those are all numbers; let's see how it performs in practice. I put it to the test right away. My first example is one we've actually explored on the channel with previous versions of vision: this quite complex road sign. You need to look at all of it and make sense of it before you can make a recommendation. I uploaded it and asked, "Can I park here on Tuesday at 6 a.m.?" and it correctly answered that yes, you can park here on Tuesday at 6 a.m., followed by its reasoning. Quick comparison: inside ChatGPT we get the same thing. Mind you, this already worked with ChatGPT Vision on release, so let's step it up a bit, shall we?

I gave it this image from a Where's Waldo? book. You're probably familiar with them: they're very visually complex, and you need to find the tiny Waldo character, who wears a red-and-white striped shirt. For humans it's very hard and time-consuming; that's why it's such a popular book, because it's a good challenge. I uploaded this image to both Claude and ChatGPT and gave them the identical prompt: "Where is Waldo?" Very innovative prompt engineering right here, I know. And I was surprised by the results: both of them told me that, to find Waldo, you can look for his characteristic outfit, but the scene is extremely busy and intricate and they couldn't actually locate him. Hm, weird, right? Well, thanks for letting me know this is a busy picture, but where is he? Claude 3.5 Sonnet kept apologizing that it can't actually find or point out where Waldo is, and let me tell you, no matter how you prompt it, it's just not going to tell you. This is not because the vision isn't capable; it's because Anthropic is notorious for its limitations, and what we ran into here is that it refuses to identify people in an image, in this case Waldo. It just won't do it, because people could abuse this for nefarious reasons. ChatGPT, on the other hand, when I asked "But where is he?", straight up told me that Waldo is located near the bottom-left corner of the image. I spent about 3 minutes looking for him and still couldn't find him, so I followed up with "Be more precise, I can't find him," and it gave me pretty exact coordinates and a more detailed description of where he was. And yeah, turns out there is Waldo: you can't even see the shirt, only his head. So Claude is more restricted here; just something to be aware of.

Let's give it one more shot. I uploaded an image of the first YouTube homepage that pulled up for me and gave it a simple prompt: "List all the details of this image relevant to a YouTube content creator in the generative AI niche." The results are quite interesting, because the categories Claude 3.5 Sonnet chose were very different from what ChatGPT did with the same prompt and image. Subjectively, and this is just my personal taste, I'd say the Claude results were more relevant. Plus, I can tell you something about the Anthropic family of models: their writing style, generally speaking, and I think most people would agree, doesn't have the typical ChatGPT style that is usually frowned upon these days. As for the details they were able to pick out, both were excellent. I can't really make a value judgment, because both picked up on all the details I would care about here. So, at the end of the day, when it comes to vision, it really matters what you think and what your use cases look like. One, you'll just have to try it yourself and see how it works for you; and two, as time passes and more people test their use cases, we'll learn more, and I'll be sharing that here on the channel. But the vision looks
6:15

Artifacts Feature

excellent. Now let's talk about the second point, which is something I absolutely love to see: the new experimental Artifacts feature, which you can enable by clicking this little icon here at the bottom. Simply explained: the Claude models have always excelled at coding, and according to the benchmarks they're now even better than before. But what if you're not technical? What if generated code is of no use to you? A lot of people, including me, were making simplistic tutorials along the lines of: generate the code anyway, don't read it, just copy-paste it into another code editor that can display it, and you can still kind of work with it. But let's be real: for most people who don't know how to write code, this workflow doesn't work, and even for people who do, it's cumbersome to always take the code, bring it over to your IDE, run it, and then copy-paste the feedback back into something like Claude. That's why code editors like Cursor, where all of this happens in one interface, have become so popular. Now they're bringing all of these capabilities to the consumer. You don't have to know how to code to do things with code, because with Artifacts you can do something like this: "Generate a portfolio website for a designer." That's the simplest prompt you could come up with, right? Very intuitive. It starts writing the code, and you're going to think, "Hm, okay, great, now I have a window with a bunch of code I still don't understand." But wait a second: as soon as it's done, it switches to preview mode, and you have the viewer right inside this interface. This thing is interactive: you can press buttons, you can move around, and you can have multiple projects open in one. If you want the code, you can always switch over here and copy or download it, but if you don't want to mess with that, you don't have to, because the chat is still here on the left side.

As I mentioned earlier, this has been a thing with applications like Devin that were announced but never released to the public. If you're not familiar with this kind of interface: chat on one side, code editor and preview on the other, and here they're combined. So in practice I might generate a website like this, but hey, this should be a portfolio website for a designer, and it looks like something a 10-year-old who is learning to code would create. Do 10-year-olds learn how to code? I guess with apps like this they will. Anyway, beside the point: if I don't like it, I can follow up with something like "Make it more aesthetic." Whatever that means, right? I just know I don't like the look of it, so "make it more aesthetic" should do it. Let's see. It's writing code, which is pretty neat; let's see what we get. And look, we don't even have to make cuts here; this is all happening in real time. It writes so fast that I can just watch it and talk for a few seconds, and voilà, I think it should be wrapping up. There you go, the last sections. Okay, it's taking a few more seconds than I thought, but no worries, here it is. See, that wasn't too bad, and all of a sudden the website is way more aesthetic. There's a background image here at the top, look at that; this is starting to look good. There are little animations. I mean, are you serious? Look at these prompts, "Generate a portfolio website for a designer" and "Make it more aesthetic," and you get this. Impressive, to say the least. The same thing with ChatGPT would have taken twice as long, because there would be a bunch of copy-pasting back and forth while iterating; here it just happens. And as their demo videos and many people on the internet have already shown, this is not limited to websites: you can build simple little games, create graphics, and do all of that within this little interface yourself. You might have seen the benchmark of creating a Snake game with an LLM a million times before; now you can try it yourself, and you don't need a code editor to do it.

I'm not sure I'm getting across how monumental a development this is, because it really is a massive step from just assisting you to actually doing, aka the difference between an assistant and what people call an agent these days. I think so many AI terms have a very blurry definition. How would you define AGI? Ask 10 people and you get 10 different definitions. How do you define an agent? And worst of all: what the heck even is an AI expert? What I do know is that this direction, where the interface does the work for you, with less and less prompt engineering, less and less copy-pasting, and practical results for your use cases right here, is where we are heading.

Let's round this out with an example from Ethan Mollick on Twitter, where he built a game prototype that teaches about opportunity cost but is an arcade game with Lovecraftian elements, and then all he does is follow up with "make it better." Again, super generic, but these models are getting better and better at understanding what you mean without extensive prompt engineering. I actually just created a video on why you might not need prompt engineering at all, and in which situations you do. Long story short: if you're doing it yourself, you might not need it; you just need basic LLM education. But if you're delegating, creating chatbots or automations, or going to use a prompt regularly, prompt engineering is still a must. We'll be uploading that video next week, so keep an eye out for it. Back to this little demo: it generates all the code and you have a little interactive game, so now you can build interactive games for co-workers, for your children, or just for fun. This is no longer exclusive to people who know how to write and execute code. And it's not just about the writing; that could have been supplemented or replaced with something like GPT-3.5 already. But a lot of people got stuck at the point where they had to install VS Code and Python, and then maybe they hit some package conflicts and didn't know how to resolve them. This is very common: a lot of people make it sound very easy, but what if things go wrong? You get stuck. Not with this. As you can see, I'm absolutely loving this feature, and I think it's going to be available in every single LLM soon.

And there you have it: Claude 3.5 Sonnet, freely available. One more thing: when you're interacting with it, you can switch models down here, but there's no real reason to use Opus right now, even though it's their premium model. That also hints at the obvious future release of Opus 3.5, which would be their big model. In other words, they're probably keeping an ace up their sleeve for OpenAI's next big move, whether that's GPT-5 or whatever it ends up being named. We don't know, but that's on its way, and Llama 400B is in training, which is going to be the open-source counterpart. This space moves so fast, and it's very refreshing to see some of these innovations arrive in my browser, usable today. All right, if you're interested in learning about AI tools that are usable like this every single week, I run a show on this YouTube channel called AI News You Can Use. Every Friday I gather all the updates and usable apps and present them to you, and we're doing more and more testing in the process, so it's often not just a first look but also a second or third look. Here's the playlist; subscribe for more content like this, and I'll see you soon.
