Claude 3 Just Released - “Outperforms GPT-4 And Gemini in Every Category!”

12:52


Skill Leap AI · 04.03.2024 · 12,590 views · 298 likes


Video description

There is a brand new version of Claude: Claude 3 was just released by Anthropic, and it's a pretty big upgrade from Claude 2. Claude 3 beats GPT-4 and Gemini in the top benchmark testing. Claude 3 comes in three different models: Haiku, Sonnet, and Opus. All models of Claude 3 have vision capabilities. The best model, Opus, requires a paid subscription to Claude Pro. In this video, I'll test its vision capabilities, writing ability, image-to-code capabilities, and coding capabilities. You can read the full blog post here: https://www.anthropic.com/news/claude-3-family

MORE FROM SKILL LEAP: 💡 Join the fastest-growing AI education platform and instantly access 20+ top courses in AI. 👉 Start with a free trial: https://bit.ly/skill-leap

Contents (3 segments)

Segment 1 (00:00 - 05:00)

There is a brand new large language model from the company Anthropic. It's called Claude 3, and it's really a giant leap over the previous model, Claude 2. Based on the top benchmark testing that they posted, they beat Gemini 1.0 Ultra, the recent version that Google put out, their best version, and they beat GPT-4, the best version of ChatGPT, sometimes by a lot. In the coding category, for example, it's almost at 85% with zero-shot prompting versus 67%. And it's available right now, so I'll show it to you in action, but first I want to show you exactly what this is all about.

Claude 3 actually comes in three different models. This is similar to Gemini, which has Nano, Pro, and Ultra. Claude did the same thing: they have three models, Haiku, Sonnet, and Opus. Now, Opus is their best model, but this one requires a paid upgrade. So this is the one you're going to get access to if you use Claude for free, and this one is going to require you to upgrade to the premium version, which is $20 a month.

All three versions of Claude 3 actually have really great vision capabilities. Here's another benchmark, this one for vision: how well it can analyze images and see what's inside of them. Look at GPT-4 here, my favorite vision capability of any large language model: it doesn't win in any category. It lost in all of these categories. It looks like Gemini Ultra is winning, but Claude is keeping up with their best model, the Opus model, and sometimes this middle model is also winning.

And this is one of my favorite updates. This is the main reason I use GPT-4 over anything else: Gemini and Claude would refuse to answer a lot of the time. They had a lot of guardrails that wouldn't let them answer. It looks like with the three models of Claude 3 over 2.1 they have significantly changed that, so now the refusal rate is closer to 10% or lower, when before it was closer to 25%.

With every model we obviously expect it to be more accurate, and it looks like that holds in the benchmark testing between Claude 2.1 and the new version. Again, they're testing Opus here, so every time you see Opus, that's the best model, comparable to Gemini Ultra, the best model that Google has, and GPT-4, the best model from OpenAI.

Claude was always good at having basically the largest context window, and right now they are releasing Claude 3 with a 200K context window upon launch, which is obviously massive. But it says all three models will be capable of taking an input of 1 million tokens or more, so we'll see when they roll this out as well. This is really exciting. It looks like we're all moving towards the 1 million token context on the input side, but for now it's a 200K context window upon launch, available right now.

If you're building on top of Claude, if you're using their API to build your applications, this is the cost that they have published. The cost for the Opus model is $15 per million tokens on the input side and $75 on the output side. This is the price on the ChatGPT side: GPT-4 Turbo is $10 per million on the input side and $30 on the output side, and GPT-4 is a little bit more expensive. So this is obviously an expensive model, but if you look at their benchmarks, you may actually want to upgrade.

So let me show you the benchmarks in more detail. Here's the benchmark right here, but remember, this is on their website, so we'll put it to a bit of a test here. When it comes to coding, it's at 85%, where GPT-4 is only at 67%. This is zero-shot prompting. That is a massive upgrade. On some of these it's even getting close to 100%, over 95% in multiple categories, and you can see exactly what these benchmarks are. These are some of the more popular benchmarks that they run these different models against.

OK, let's jump into Claude and take this for a test. I want to test a few different things in this video, and then I'll make a much deeper dive video comparing the best version of Claude against the best versions of ChatGPT and Gemini.

Let's check the vision capability. I want to see how well it does here, and this time I'll just use Sonnet. You can see over here, Claude 3 Sonnet: this is the free version. Right now I'm on the free version of Claude; I'll upgrade in a second to the Opus version to show you that. I'm going to ask it to write me a product description. This is a good use case because this is a projector, but it's very unclear, and I made sure I didn't name the file anything it could recognize it from. Let me see if it can do this part. It says it's a high-end gaming console or entertainment center with a futuristic design. It's a really good product description, but it actually couldn't figure out what it was.

Now, I did the same thing with GPT-4, and right here it says "this sleek projector," so right away it knew this was a projector. This is not a traditional projector, it's more of a short-throw one, but I wanted to see if it could figure out what that is, and it did. In the product description, though, I don't think it did a good job at all; it
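The per-million-token API prices quoted in this segment can be turned into a rough per-request cost estimate. The sketch below is only an illustration: the prices are the ones read out in the video, while the dictionary keys, the function name `estimate_cost`, and the example token counts are assumptions made for this example and are not official API identifiers.

```python
# Prices in USD per 1 million tokens, as quoted in the video.
# The keys are illustrative labels, not official API model identifiers.
PRICES_PER_MILLION = {
    "claude-3-opus": {"input": 15.00, "output": 75.00},
    "gpt-4-turbo": {"input": 10.00, "output": 30.00},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    price = PRICES_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: a 100,000-token prompt with a 2,000-token reply.
    for model in PRICES_PER_MILLION:
        print(f"{model}: ${estimate_cost(model, 100_000, 2_000):.2f}")
```

Because output tokens cost several times more than input tokens in both price lists, the length of the reply can matter as much to the bill as the size of the prompt.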

Segment 2 (05:00 - 10:00)

always goes way, way overly promotional.

Now, in Claude we always have this limitation with the free version, so I feel like the free version most of the time is not very usable. I've used it for maybe an hour and I've already run out of credits. I didn't even do that much testing, and I have one message left, so let me upgrade to Pro. I upgrade to Pro sometimes and then cancel when I don't end up using it, but if it's going to be the best-in-class LLM, I might want to keep this subscription. By the way, when you subscribe to Pro, it lets you switch between the different models. You get the Opus model, which is the highest model and does require Pro, but you can downgrade to use a different model, the same as ChatGPT, where you can switch between GPT-4 and 3.5.

OK, now when you type in a new prompt, you'll have this option. You have Opus, the most intelligent model that I've mentioned, and this is the default model, and you can even go all the way back to 1.2 Instant. So let me choose this other option. The third one, by the way, is not available; I think that one's only available through the API. It's a fast model, but it's obviously the lowest one out of the three that we have access to.

And look how accurate this is: "This sleek ultra short throw projector delivers an immersive visual experience." Wow, that's great. This one is very accurate, very close to the Amazon description of this product, and again really excellent when it comes to the tone and the writing style. I would rather take this than the kind of story format that GPT takes: "Ever been in the middle of a presentation and thought, wow, I could really use some oomph?" OK, way different type of writing style. Maybe you like this; I personally prefer the Claude version.

OK, let's start a new chat. This time I'm going to check its tone and style. Let me upload a file. With Opus you still have a maximum of five files, 10 megabytes each, so that will be your upload limit, but remember you have that massive context window, so they can be pretty large files. And it says it does a really good job; I think it says over 99% accuracy on what it can pull from the document. I'm going to say "turn this into a newsletter." With these types of tests, by the way, I keep my prompts as simple as possible with no context, just because I want to see what the model is trained and tuned on, what it can do by default. I read through this, and every single time the default writing style and tone are very conversational. Again, if you pause and read this, I think you're going to prefer it a lot more than ChatGPT.

You can see ChatGPT, again by default with no context, uses things that are just overly promotional: "game changer for both individuals and businesses." I definitely did not say "game changer" in that video, so it's using that to embellish what the video is all about, and I've noticed words like that throughout. This one actually did a nice job with a PS line. Let me see if Claude added a PS line. It did not. So ChatGPT knows a little bit about marketing: in emails, the PS line is usually the most-read line after the subject line, and it did know that. ChatGPT is just a better marketing tool, but I have to really fine-tune the tone and style.

Now this time I'm going to do a little bit of vision and coding in the same test: I'm going to turn an image into code. This is going to be basic code, but it's going to have HTML, CSS, and JavaScript in the same code, and it's a screenshot from the OpenAI website about their GPT-4 Turbo API pricing. I took a picture from here, and it has this calculator that's going to require a little bit more on the coding side; it's not just going to be design and text.

OK, so the first time around it says, "Sorry, I cannot convert this into full HTML, CSS, and JavaScript because it infringes on copyright." Now, the first time I did it, before I recorded the video, it did do it. So let's see, if I say "yes you can," whether we can bypass it. This is one of those times where it refuses to answer because of some guardrails, and in this case it looks like it's not letting me do it again. Now, ChatGPT gave me some text about how it can't technically do it, but it did give me the output. When I tested this it didn't look great.

But when Claude did answer me the first time I did it, I saved that, so let me show you what it actually came up with. This is the text here; I copied and pasted it the first time it answered me, and it was all of it, HTML and JavaScript. And look at this: if I do the math, it's doing the math on this side, so it has everything that I needed. This is checkable here, and it's very much like the OpenAI website. I changed this to a hyperlink that I can link to any page I want, and as I read through the text, everything was exactly accurate based on what I saw on that page. So this was a really good test, because I wanted to see if it would refuse to answer; that's why I took a screenshot of someone else's site. But sometimes you can create something like a sketch and see if it can turn it into code. When I copied and pasted the ChatGPT version, this is basically what it looked like. This is the exact copy and paste; I

Segment 3 (10:00 - 12:00)

did the exact same thing with both. This is the output I got out of ChatGPT. Not even close; it missed everything. And as you can see, Claude gave me something that was far more usable and pretty much exactly like what I had. They do have that copyright protection, but when you're turning your own sketch into code, this could be extremely useful using Claude.

I always like to test math too, so here's an equation. It's a picture, so it needs to analyze what's in the picture. Again, for all of this I'm using Opus, the best version. I gave the same thing to ChatGPT, and this time it looks like ChatGPT got it right: "This is the quadratic formula." That's specifically what I was looking for it to explain to me. "This is a Swiss Army knife of algebra." But Claude did not tell me at all that this is a quadratic equation. I specifically chose the quadratic equation because I knew what that looked like, and this is going in a different direction. So I guess when it comes to figuring out what the math equation is, GPT is winning, but again, obviously this is just a one-time test that I'm doing a couple of hours after this came out, and I'll do a deeper dive.

But let me show you a coding example, because this is where the big gap is in the benchmark. I asked for a simple game of Snake written in Python. So far, every time I've tested this, it's only worked on GPT-4. It's worked nowhere else; elsewhere it doesn't even tell me exactly how to install it. Claude did a fantastic job helping me get this code right here and then telling me exactly how to run it. If I run the game, it opens up right here, and it worked perfectly fine. There are no issues with the game. The only thing is, when I make this with GPT-4 and I lose the game, it gives me a kind of game-over window; in this case, if I lose, it just closes the window and brings me back to the terminal app. But this was the fastest and best experience I've ever had using any large language model to write Python and run a game locally on my computer without really understanding code.

Now, in my early testing, Claude 3 Opus was extremely surprising. It really impressed me in pretty much all my testing, except a couple of reasoning and math tests that I ran it through. But I'll do a much deeper dive video comparing it with GPT-4 and Gemini Ultra to really see what the best model is now, because this benchmark is claiming that it's beating everything in every category, so we really need to take it through a deep-dive test. So far, pretty impressive. If you haven't watched my other head-to-head video, I took GPT-4 and ran it against Gemini Ultra, the 1.0 version, and did a detailed comparison across 10 different categories with a ton of different prompts. Watch that if you want to see who won that battle, and I'll see you in the next video. Thanks for watching.
