The New "Claude 3.5 Sonnet" Actually SHOCKED The Industry! - Beats GPT-4o
13:24

TheAIGRID · 20.06.2024 · 35,631 views · 883 likes

Video description
Claude 3.5 Sonnet Revealed! Learn A.I With me - https://www.skool.com/postagiprepardness 🐤 Follow Me on Twitter https://twitter.com/TheAiGrid 🌐 Check out my website - https://theaigrid.com/ Links From Today's Video: https://www.anthropic.com/news/claude-3-5-sonnet Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos. Was there anything I missed? (For Business Enquiries) contact@theaigrid.com #LLM #Largelanguagemodel #chatgpt #AI #ArtificialIntelligence #MachineLearning #DeepLearning #NeuralNetworks #Robotics #DataScience

Table of contents (3 segments)

Segment 1 (00:00 - 05:00)

So Anthropic just went ahead and actually shocked the entire AI industry, because they released a very fascinating model, Claude 3.5 Sonnet, which is now the current state of the art in terms of AI models, meaning that currently this is the best AI model that you can use and interact with on the planet. So without further ado, let's dive into what makes this so amazing.

You can see on the screen that Claude 3.5 Sonnet performs better than any other model from any other company currently, and this is something that definitely caught us off guard, considering the fact that GPT-4o was released fairly recently, Llama 400 billion parameters is actually really good, and of course Claude 3 Opus was the previous state-of-the-art model competing with GPT-4o. The reason that this is such a surprise and such a shock is the fact that this is not going to be Anthropic's largest model. 3.5 Sonnet is actually the second model in the tier, which means that in the future, when they release their updated model, it's going to be even crazier.

So we can see here on GPQA, which is graduate-level reasoning, this actually takes a 6% jump over GPT-4o, or actually 5.9%, and this is rather impressive. It also gets 88.7% on MMLU, and you can see on coding 92%, on multilingual math 91.6%, on reasoning over text 87%, on BIG-Bench Hard 93%, and of course on the MATH benchmark 71.1% and on grade school math, the GSM8K benchmark, 96.4%.

Now what's crazy about this is that I always like to look at the fine print of these benchmarks, and we can see here that the majority of these are actually zero-shot, which means that this is just one question followed by an answer. In some cases, where it does say zero-shot chain of thought, that just means it's one question where they basically ask the AI system to explain its line of reasoning, and in doing so it gets a better answer. Whereas with these ones where you can see 3-shot, this is basically where they have three interactions with the model and then they ask it the final question, and that is how this reasoning is done. Of course, right here you can see that this is 5-shot, which is definitely very interesting because GPT-4o's is zero-shot, but I'm not trying to nitpick. This is definitely a very stark improvement, considering that this model is blazingly quick.

Now, the benchmarks of course aren't the only thing that we want to look at. What actually got released with this? Because they did show us a few interesting things that they're adding to Claude. One of the things that they spoke about was the fact that this has strong reasoning. It sets new industry standards for the benchmarks, and we can see here in this demo exactly how Claude 3.5 Sonnet is able to help this user, Sam. You can see that they're helping them craft this novel and, of course, kick off the plot and give various pieces of information that are truly state of the art. Early tests from many users are stating that this is like nothing they've seen before, and you can see right here, what's really intriguing is that Claude is outputting a huge amount of text in another box, which allows you to look at what the model is doing while it's actually generating the text. I think this is really useful because we can see exactly how the model is generating everything, and it just gives it a cleaner user interface.

Now what's fascinating about this example is that you can see that the user actually asks it to do quite a lot of things, and you can see here that it's making a detailed diagram of how the relationships work and how they connect to the characters. That is something that I think is really cool, and it shows that Claude actually has some increased capabilities. One of the ones that they talk about, and something that I've heard as well, is that Claude has always so far been the very best at coding and interpreting what you want to do with your code. So this now means that we do have a free coding model that is truly advanced, and you can see here it's using its strong reasoning abilities in order to generate a response to the user's inquiry.

Now there was also something that they released that I found was really cool, and I'm going to talk about that in a moment, but they also speak about how Claude now comes with stronger vision. This is a short demo where they showcase exactly how Claude is used to do many different things with its strong vision capabilities. You can see the user here actually tells Claude, "I'm giving a lecture to my class on the human genome." He then inputs two images and
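
The zero-shot, zero-shot chain-of-thought, and few-shot settings mentioned in the benchmark fine print can be sketched as prompt construction. In common usage, "3-shot" means three worked question-and-answer examples are placed in the prompt ahead of the real question. This is a minimal illustration, not the harness any lab actually uses; the function name `build_prompt`, the `Q:`/`A:` layout, and the "Let's think step by step." suffix are all assumptions made for the example.

```python
def build_prompt(question, examples=(), chain_of_thought=False):
    """Build a benchmark-style prompt (hypothetical format).

    examples: (question, answer) pairs shown before the real question.
              0 pairs = zero-shot, 3 pairs = 3-shot, 5 pairs = 5-shot.
    chain_of_thought: if True, ask the model to reason before answering.
    """
    parts = [f"Q: {q}\nA: {a}" for q, a in examples]
    # Assumed phrasing for eliciting step-by-step reasoning.
    suffix = " Let's think step by step." if chain_of_thought else ""
    parts.append(f"Q: {question}\nA:{suffix}")
    return "\n\n".join(parts)


zero_shot = build_prompt("What is 17 * 3?")
zero_shot_cot = build_prompt("What is 17 * 3?", chain_of_thought=True)
three_shot = build_prompt(
    "What is 17 * 3?",
    examples=[("What is 2 * 4?", "8"), ("What is 5 * 5?", "25"), ("What is 9 * 3?", "27")],
)
```

Because the number of examples and the use of chain of thought both change what the model sees, scores reported under different settings (for example 5-shot versus zero-shot) are not strictly like-for-like.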

Segment 2 (05:00 - 10:00)

says, "This is a genome timeline of the sequencing." You can see the two images here, and he says, "Can you transcribe the data into JSON?" Then we can see that, because Claude 3.5 is actually really quick, it's able to code this up very quickly and of course efficiently, without any mistakes. Now this is something that I love about Anthropic: they've released a product that is not only good but actually really quick, considering the price-to-speed ratio is very effective.

You can also see here that the user is able to add their color palette in, and once they've added their color palette in, it was able to make the adjustments and changes there, and then we were able to get this nice visualization where we can actually look at the data in a more efficient way. Then of course what was really cool about this, and I think this is going to be something that a lot of people do use, is that they create a presentation based on this topic very quickly. He says, "Use one of those charts you just used on the slide," and you can see here that within, I guess you could say, a couple of seconds we do get a quick presentation that we could use for pretty much anything. That is a workflow demonstrated by Anthropic that's rather effective at showing us how we can actually utilize Claude's new vision models. So by combining the different inputs, we can then ask Claude to make things for us, especially in code, and in all these different formats we're able to leverage these strong vision capabilities. This is something that's really effective, as the user was able to do this within only a few minutes.

Now there is also something called Artifacts. Essentially, artifacts are really interesting because they actually appear next to your chat, and they allow you to see, iterate, and build on your creations in real time. The user starts by asking, "Can you create an 8-bit style crab for me?" and then you can see that it responds very quickly with the code on the right-hand side. This is effective because now we can essentially do the things that we want to. So for example he says, "Can you make some seashells in the same style?" and it says, "Absolutely, I'll create some seashells to complement our crab." Then of course you can see that those seashells are there, and what's interesting is the next step. What the user asks Claude to do is essentially build many different things within this artifact area, and you can see that as he's managing to build all of these things, we're then going to get to a situation where he combines all of these into a working game, which looks very effective. You can see right here he says, "Let's make it playable. Have the crab jump up over oncoming seashells. Maybe add some styling too. I think I want to call this 'Crab Claude'." And then we can see that this user is able to play this game after a few more lines of code from Claude, and of course what we're able to see is that this is something that the user can enjoy. So this is another demonstration of how the user interface in Claude is changing. This is of course something you can try out right now, but I think it's rather effective that Claude is consistently adding new features, especially being a state-of-the-art model.

Now what really caught me off guard, and this is something that really caught everyone off guard with this entire model, is the price-to-intelligence ratio. What we can see here is the previous trajectory for Anthropic's models in cost per million tokens versus intelligence. Previously we followed a traditional slope where, as the cost was low, the intelligence was low, and things sloped up very smoothly like this, as you would expect on any normal chart. But interestingly enough, what we see right here in the middle is this huge jump up in capabilities for the same price. We can see that Claude 3.5 Sonnet is actually the same price for a higher level of intelligence than Claude 3 Opus. So this actually means that across the board the cost of intelligence is going down, and this is a trend that many of you in the AI industry probably would have known was coming. But to see it so rapidly, considering the fact that Claude 3 Opus was held in such high regard, only to be dwarfed by what I'm guessing is a much smaller model, considering it's much faster and it also costs less, is something that I didn't think we would see in just a couple of months. Now I'm really intrigued as to how they've managed to achieve this in such a short space of time, but I do know that Anthropic are actually working around the clock to bring us new frontier models.

Now something that they also did actually speak about, which I've mentioned before, was the agentic coding. Claude 3.5 Sonnet solves 64% of problems on an internal agentic coding evaluation, compared to 38% for Claude 3 Opus, and it says, "Our evaluation tests a model's ability to understand an open-source codebase and implement a pull request, such as a bug fix or a new feature, given a natural language description of
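
The "cost per million tokens" axis discussed above translates into a per-request cost with simple arithmetic. The sketch below is illustrative only: the video does not state any prices, so the dollar figures are assumptions based on list prices published around the release and should be checked against the provider's current pricing page. The names `Pricing`, `PRICES`, and `request_cost` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens


# Assumed figures for illustration; not taken from the video.
PRICES = {
    "claude-3-5-sonnet": Pricing(3.00, 15.00),
    "claude-3-opus": Pricing(15.00, 75.00),
}


def request_cost(model, input_tokens, output_tokens):
    """Return the USD cost of one request under the assumed price table."""
    p = PRICES[model]
    total = input_tokens * p.input_per_mtok + output_tokens * p.output_per_mtok
    return total / 1_000_000


# Same workload on both models: 20,000 input tokens, 2,000 output tokens.
sonnet_cost = request_cost("claude-3-5-sonnet", 20_000, 2_000)
opus_cost = request_cost("claude-3-opus", 20_000, 2_000)
```

Under these assumed prices the same workload costs five times as much on the larger model, which is the kind of gap the chart in the video is pointing at.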

Segment 3 (10:00 - 13:00)

the desired improvement. For each problem, the model is evaluated based on whether all the tests of the codebase pass for the completed code submission. The tests are not visible to the model," which is important, "and include tests of the bug fix or new feature. To ensure the evaluation mimics real-world software engineering, we based the problems on real pull requests submitted to open-source codebases, and the changes involve searching, viewing, and editing multiple files, typically three or four, or as many as 20. The model is allowed to write and run code in an agentic loop and iteratively self-correct during evaluation, and then we run these tests in a secure sandboxed environment without access to the internet, and the results are shown in Table 3."

So you can see here that overall Claude 3.5 Sonnet is nearly twice as good as Claude 3 Opus on coding evaluations, and you can see here how the other models fare in comparison. This is fairly fascinating because it sets a remarkable precedent for what is to come next. We can see Claude 3 Haiku at 17%, Claude 3 Sonnet at 21%, Claude 3 Opus with a steep jump to 38%, and Claude 3.5 Sonnet at a staggering 64%. That is pretty fascinating. I would have loved for Anthropic to show us some of these internal demos so we could see exactly how Claude is behaving during this agentic loop.

Now one of the main things that many people are wondering right now is this: if Claude 3.5 Sonnet is such a giant jump from the previous Claude 3 model, what is the jump for Claude 3 Opus going to be like in the future? You can see here that they said, "Our aim is to substantially improve the trade-off curve between intelligence, speed, and cost every few months. To complete the Claude 3.5 model family, we will be releasing Claude 3.5 Haiku and Claude 3.5 Opus later this year." So it will be interesting to see how much Claude 3 Opus improves on its previous benchmarks. Remember, Claude 3 Opus was actually the state-of-the-art model until today, considering the fact that GPT-4o was released and slowly dethroned Claude 3 Opus. Now that Claude 3.5 Sonnet is here, this is fairly surprising. If there was going to be a state-of-the-art model from Anthropic, I would have predicted it to be Claude 3 Opus, since that was their largest and most capable model. Which means that if Claude 3 Opus, or rather Claude 3.5 Opus, is coming in the future and it's going to be their most capable model, we can surely say that the benchmark improvements are going to be absolutely dramatic. And considering the fact that the other companies are also hot on the heels of Anthropic, it seems that this industry is not slowing down anytime soon.

Now, in terms of future models, they also stated that they're working on new modalities and features to support more use cases for businesses, including integrations with enterprise applications. One fascinating thing is that they said they're also exploring features like memory, which will enable Claude to remember a user's preferences and interactions that happened in the past, making their experience even more personalized and more efficient. So overall, what we have here is a fascinating showcase of exactly what Claude 3.5 Sonnet is. This is a remarkable AI release, and Anthropic continue to surprise us with their ability
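
The agentic coding evaluation quoted in this segment (the model writes and runs its own code, self-corrects in a loop, and is then graded on tests it never saw) can be sketched as a toy loop. This is a simplified illustration under stated assumptions, not Anthropic's harness: `stub_model` stands in for a real model, and every function name here is hypothetical.

```python
def stub_model(feedback):
    """Stand-in for a model: the first attempt is buggy, the retry is fixed."""
    if feedback is None:
        return lambda a, b: a - b  # buggy first attempt at add(a, b)
    return lambda a, b: a + b      # corrected after seeing feedback


def self_check(candidate):
    """A check the agent writes and runs itself (not the hidden tests)."""
    got = candidate(2, 3)
    if got == 5:
        return True, None
    return False, f"add(2, 3) returned {got}, expected 5"


def solve(propose, check, max_steps=5):
    """Agentic loop: propose, run own check, feed the failure back, retry."""
    feedback = None
    candidate = None
    for _ in range(max_steps):
        candidate = propose(feedback)
        ok, feedback = check(candidate)
        if ok:
            break
    return candidate


def hidden_tests_pass(candidate):
    """Grader-side tests; the agent never sees these cases."""
    cases = [((0, 0), 0), ((1, 2), 3), ((-4, 4), 0)]
    return all(candidate(*args) == want for args, want in cases)


def pass_rate(results):
    """Percentage of problems whose hidden tests all passed."""
    return 100 * sum(results) / len(results)


with_retries = hidden_tests_pass(solve(stub_model, self_check))
single_attempt = hidden_tests_pass(solve(stub_model, self_check, max_steps=1))
```

In this toy setup the single attempt fails the hidden tests while the looped run passes, which is the effect the self-correction step is meant to capture. On the reported figures, 64 / 38 is roughly 1.68, so "nearly twice as good" is a generous rounding.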
