New ChatGPT 4.5 is Here - The Good, The Bad and The UGLY
13:21

Skill Leap AI · 28.02.2025 · 37,726 views · 630 likes

Video description
OpenAI just released GPT-4.5. In this video, I break down what’s improved, what hasn’t, and whether it’s worth using right now.

I test:
- Accuracy in Q&A and fact-checking
- Hallucination rate with a made-up mango test
- Emotional intelligence and writing quality
- API pricing (which is wild)
- Document analysis for legal red flags
- Speed comparisons with GPT-4, Claude, and Gemini

Right now, GPT-4.5 is only available in the $200/month Pro Plan, with wider access coming soon. But is it actually better? Watch the full breakdown and let me know what you think!

MORE FROM SKILL LEAP: 💡 Join the fastest-growing AI education platform & Instantly access 20+ top courses in AI: 👉 Start with a free trial: https://bit.ly/skill-leap

Table of contents (3 segments)

Segment 1 (00:00 - 05:00)

OpenAI just released GPT-4.5, and I've had almost a full day now to play around with it and test it, so I wanted to share some of my results with you in this video. I compared it against some other models, including the new Claude 3.7 model as well as the older ChatGPT model, GPT-4o.

As far as who has access to GPT-4.5 right now (and by the way, this is still in research preview, so some of this might get improved): as I'm recording this, it's only available in the $200-a-month plan, and I'll tell you right now, definitely do not upgrade just to get access to it. It will come to the Plus and Team plans next week, and if you have the Education or Enterprise plan, you'll get it the week after that.

Based on everything OpenAI has said about this, and if you look down the model list you'll see it, there are so many models now that just about everyone, including myself, is getting completely confused and overwhelmed. GPT-5 is supposed to combine all of these and basically have a model picker in the background, so you no longer have to choose between a reasoning model, one that's good for writing, one that's faster, and one that has scheduled tasks. It should all be in one. I can't wait for that; it's going to simplify a lot of this. But for now, we'll choose this one.

A couple of things I want to point out before I demo it. OpenAI is calling this its largest and best model for chat, but they are also saying this is not a reasoning model, so when it comes to benchmarks it falls way short of things like o1 or DeepSeek R1. It's not doing any type of reasoning. When GPT-5 comes out, that's supposed to have reasoning by default; 4.5 is supposed to be the last non-reasoning model.

There are a few things I wanted to test in this video. OpenAI is saying that when it comes to simple Q&A accuracy, it scores higher than GPT-4o and even the reasoning models here, so it has a broader knowledge base: 62%. They're also saying it hallucinates at a lower rate, 37% versus 61% for GPT-4o. That's still kind of high, so we'll do a hallucination test to double-check that 37%. And a couple of things they said about a specific use case: they're saying this has better emotional intelligence, so for tasks that require it to think more like a human, and talk and have empathy like a human, it does a better job. We'll go ahead and test that too.

Okay, let's start with a very basic search-based chat, and I'm not even going to turn on search. I've noticed that even if you don't turn on search, it will still do some searching online. The prompt: create a simple table comparing the cost of GPT-4.5, 4o mini, and Claude 3.7 Sonnet. I want to show you the API cost, in case you're a developer and you want to build on top of GPT-4.5, because this is kind of wild.

Okay, so it's doing a web search and it's going to create the table for us. Look at this right here: we have GPT-4.5 at $75 per million input tokens, compared to 4o at $2.50. And this one, 4o mini, is what I'm typically choosing if I'm building anything with the API or using the API for automations and things like that, and it's 15 cents. And look at the output side: $150. That is just crazy. This is not at all usable for anyone who's doing anything in the world of developing apps with AI. No one is going to use this, so I'm very confused by this pricing. And again, because they're short on GPUs right now, as I'm recording this it's only in the Pro plan.

By the way, I just ran this prompt live here, but the first time I ran it, it actually gave me the wrong number. I was using 4.5; this should have been 150, and it wasn't 150. So I looked at the source. It listed the source here, I clicked on it, and it was behind a paywall, so I don't know where it got that information from. Then I asked one more time. I said, "No, can you double-check your answers?" and this time it pulled the information from OpenAI and it was accurate. So this time the sources were OpenAI and Anthropic, which is where it should get that information directly, but it got it from other sources the first time around, and one part of it was wrong.

If it makes a simple mistake like this and I'm counting on it (let's say I'm developing an app and I'm trying to use AI to figure out how much it's going to cost me, and it literally gives me the cost at half the rate it should be), that's a huge problem. And that is one of the issues with hallucination, so let's do the hallucination test next.

Okay, this one says "describe each of these families of mangoes," basically, and I named four. This one I just totally made up, and I change it from time to time; I think I called it "banana cream" on the last test. I made sure through a Google search that there is no such thing as an "orange cream" mango, and I made this one up. Let's see if it's going to go through and give us an answer here.
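To put the API prices quoted earlier in this segment into perspective, here is a minimal sketch of the arithmetic. The prices are the ones mentioned in the video at recording time and may no longer be current; the token counts are a hypothetical workload chosen only to keep the numbers simple, and `api_cost` is an illustrative helper, not part of any SDK.

```python
# Prices in USD per one million tokens, as quoted in the video at recording
# time (treat them as a snapshot, not current pricing).
GPT_4_5_INPUT = 75.00
GPT_4_5_OUTPUT = 150.00
GPT_4O_MINI_INPUT = 0.15


def api_cost(input_tokens, output_tokens, input_price, output_price):
    """Return the USD cost of a workload given per-million-token prices."""
    return (
        (input_tokens / 1_000_000) * input_price
        + (output_tokens / 1_000_000) * output_price
    )


# Hypothetical monthly workload (an assumption for illustration only).
INPUT_TOKENS = 2_000_000
OUTPUT_TOKENS = 1_000_000

# Estimate using the correct output price.
correct = api_cost(INPUT_TOKENS, OUTPUT_TOKENS, GPT_4_5_INPUT, GPT_4_5_OUTPUT)

# Same workload with the output price understated by half, the kind of
# error described in the video.
understated = api_cost(
    INPUT_TOKENS, OUTPUT_TOKENS, GPT_4_5_INPUT, GPT_4_5_OUTPUT / 2
)

# How many times more expensive GPT-4.5 input is than GPT-4o mini input.
input_price_ratio = GPT_4_5_INPUT / GPT_4O_MINI_INPUT

print(f"correct estimate:     ${correct:,.2f}")
print(f"understated estimate: ${understated:,.2f}")
print(f"shortfall:            ${correct - understated:,.2f}")
print(f"GPT-4.5 input is about {input_price_ratio:.0f}x the GPT-4o mini input price")
```

With these assumed token counts, a halved output price hides a quarter of the real bill, which is why a wrong number in a pricing table matters when budgeting an app.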

Segment 2 (05:00 - 10:00)

Okay, it's still going; it's on number four, but let's look at number three. "Orange Cream, from Florida, USA, medium size." Yeah, it totally just made up all kinds of nonsense about this type of mango that is not a real mango. I literally just put two words next to each other, and it went ahead and gave me an answer.

So what if I use something that is search-based? Let me see: if I turn on search, is it going to actually look online? Yeah. The first time around it did not do a search; this time it looks like it's doing a search first, now that I've turned this on. But again, as you saw in the previous example, even when I didn't have search turned on it did do a search, and it still got the information wrong the first time around. And usually when I do these tests I just want to show you my results right off the bat. I'm not picking and choosing from these prompts; I'm showing you exactly what I get on my very first attempt. I'm not cherry-picking to figure out which model is best; I'm just showing you the actual results.

Yeah, even with search, the answer is still totally made up: 4.5, search turned on, "Orange Cream." Okay, let's try this with Perplexity, and I'm just going to leave it on Auto so it's not picking a specific model. Let's see. Okay: "Orange Cream. Note: there is no specific information available on 'Orange Cream' mango." All right, I don't understand why ChatGPT did not get that right. It's technically doing the same type of search, but it was not able to get it. This is the answer I was looking for. Inside of 4.5, with or without search, it was a fail for both.

Okay, so OpenAI is telling us it's also really good at writing and at emotional intelligence, so I just made up a scenario: "I laid off half my team due to budget cuts. Write a sincere message to share with my remaining employees to reassure them and maintain trust." Let's send this out. Okay, I read this off screen (you can pause if you want to read through it), and it is actually really good. I think it did a really nice job adding empathy, the formatting is nice, and the length is really good. This is the kind of subtle difference between 4o and 4.5, and some other models too, that I couldn't clearly make out at first; that's why I spent a whole day testing this before I made this video. The improvements are very subtle. Some of the things that haven't improved were very obvious, but some of the things that did improve, like this specific tone as far as emotional intelligence goes, you can see if you really use it all the time and look into the details.

Okay, let me check its writing ability now with something a little technical, but still a very straightforward consumer type of question: "My laptop battery drains very fast. Can you give me specific tips on improving battery life?" Okay, it gave us 10 different tips. I was going through these, and yeah, a lot of them actually make sense. The formatting is straightforward, and I really like the language: very straight to the point, not strange in any way, not promotional in any way. This is exactly the type of answer I was looking for. So far, good on emotional intelligence and good on writing.

Let me compare this against 4o, though, because I feel like I was getting good answers out of 4o with this type of question. Okay: power and display settings, same kind of thing there, low power mode, reduce screen brightness. It did the exact same thing, gave me the Windows option and the Mac option, and it even broke it up into categories (system maintenance, hardware) and gave me different tips inside each category. So I don't know, maybe this is actually more comprehensive. We got 17 tips, and I did not tell it how many tips I was looking for. So I guess I don't see a big improvement there with 4.5.

Okay, and OpenAI is telling me it's really good at coming up with ideas, at being a kind of smart partner to talk to. "I want to create a new company that utilizes AI to help business owners. Give me five ideas for that." Okay: an AI-powered financial forecasting assistant; an automated AI customer service agent, that's good; an AI content and social media manager, that's good; an AI-powered inventory manager, that's a good one too; an AI business insight platform. Yeah, these are all pretty good ideas. Let me ask it to give me more information about number five: "Break down number five for me." Okay: "An AI-powered platform designed to centralize, analyze, and interpret data from various aspects of a business, such as marketing, sales, operations, and finance." Yeah, I could probably use an AI-powered tool like this. "For small and medium-sized business owners and entrepreneurs with limited resources." Okay, we got a breakdown of its core functionality. Yeah, this is definitely a pass; it's doing a really good job. And I just checked Claude 3.5 Sonnet here, same thing, and it gave me a very similar type of thing: an automated customer insight engine, an AI

Segment 3 (10:00 - 13:00)

contract analysis and negotiation assistant (this one is actually pretty good), supply chain management. So, similar. Let me ask it to give me more about number four. Okay, so for number four it started with core functionalities this time, so the same kind of thing, then business impact. But the answer I got out of ChatGPT was a whole lot more comprehensive. We had the target audience and core functionality in more detail, plus tech stack, monetization model, and key benefits. So we got a lot closer to an entire business plan right off that very first prompt.

Okay, let's see if it's any good at analyzing a document: "Any red flags?" I'm going to give it almost no context; I'm just literally going to upload this NDA document. Okay, so the first thing it's saying is that you need some kind of termination of the relationship in there. It's indefinite right now, so you may want to put in something like three to five years. It's telling me to exclude some things that are not necessary and to add a clause to address compelled disclosure. Yeah, this is actually really good; I think it did a good job here. And that was just a boilerplate NDA. It was actually a made-up one that didn't have any real information; I just wanted to see how it would go through a legal document, and I think it did a good job with that too.

But again, if I choose 4o, which is what I was doing these types of things with before, okay, this is going to give us a similar breakdown, probably. "No time limit," right, so it's asking for the same kind of termination. "No mention of consequences if there is a breach." Well, that's kind of a big thing we should probably add. "No explicit clause on third part..." Yeah. So I don't know; again, not a huge leap. And I know I'm being critical of it, but they've released like five different updates since the release of three, so when there is a big update, another big frontier model like GPT-4.5, I was expecting a whole lot more. I literally spent a whole day trying to find some benefit that would make me want to use this over 4o, and today, after I did my testing, I reverted back to 4o.

One of the reasons why I'm not using 4.5 as much, I'll show you right here. Let me just show you the difference in speed. I'm using 4.5, I have it side by side with 4o here, and I'm going to send both at the same time so you can see the speed of these right away. This is one of the slowest models I've ever used, and this is not a reasoning model. With the reasoning models, fine, they need to be slow, because we need to get accurate answers and they need to analyze. If we use something like a free model, Gemini Flash for example, and I ask the same exact question, I mean, look at the speed of this thing. Even if I use Claude here and ask the same question, the speed is again much better than GPT-4.5, but also much better than 4o. This one always gives shorter answers, I've found; that's Claude compared to the GPT models. The new one especially is giving us more comprehensive answers.

So there is some good, but there's also a lot of bad. I have a feeling that with GPT-5 a lot of that complexity is just going to go away. We're not going to have models to choose from, reasoning is going to be the default, speed is going to improve, and we're going to have something where we don't really have to study this every single day to figure out what's best, which app to use, which model to use. So we'll see where it goes from here. Let me know what you thought of it if you've had a chance to test it out for yourself; I'd love to hear your feedback. Thanks for watching. I'll see you in the next video.
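The side-by-side speed test described above (same question, sent to several models at the same time) can be sketched roughly as follows. This is only an illustration: `fake_slow_model` and `fake_fast_model` are hypothetical stand-ins that sleep instead of calling a real API, and `timed_call` and `race` are illustrative helper names, not part of any vendor SDK.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(ask, prompt):
    """Call ask(prompt) and return (answer, seconds elapsed)."""
    start = time.perf_counter()
    answer = ask(prompt)
    return answer, time.perf_counter() - start


def race(models, prompt):
    """Send the same prompt to every model at once; return {name: seconds}."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {
            name: pool.submit(timed_call, ask, prompt)
            for name, ask in models.items()
        }
        return {name: future.result()[1] for name, future in futures.items()}


# Hypothetical stand-ins for real model calls; replace them with functions
# that send the prompt to an actual API and return the full response text.
def fake_slow_model(prompt):
    time.sleep(0.2)
    return "slow answer to: " + prompt


def fake_fast_model(prompt):
    time.sleep(0.02)
    return "fast answer to: " + prompt


timings = race(
    {"slow": fake_slow_model, "fast": fake_fast_model},
    "Give me tips on improving laptop battery life.",
)

# Print the fastest model first.
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name}: {seconds:.2f}s")
```

Because each call is timed until the full answer returns, a model that writes longer answers will look slower; measuring time to the first token of a streamed response would be a fairer comparison of responsiveness.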
