# I Compared GPT-5 to GPT-4o - The Results Will Surprise You

## Metadata

- **Channel:** Skill Leap AI
- **YouTube:** https://www.youtube.com/watch?v=mLUVcv3rpKY
- **Date:** 14.08.2025
- **Duration:** 24:29
- **Views:** 28,228
- **Source:** https://ekstraktznaniy.ru/video/12201

## Description

Learn how to use AI to grow your business with Skill Leap AI. 
Access 20+ expert courses & community, free for 7 days: https://bit.ly/skill-leap

I go step by step to show how I compare GPT-5 and GPT-4o in real tests.

I use the same prompts in ChatGPT and try auto mode and fast mode. I test coding, speed, document making, image work, dashboard building, and facts with sources. I keep the setup fair so you can see the true GPT-5 vs GPT-4o results.

GPT-5 with thinking on does better at hard tasks, like coding and making dashboards from data. It also handles visual input and reasoning well. GPT-4o is usually faster and makes cleaner PDFs and docs. Both had some bad links in citations. Both still struggle with perfect YouTube thumbnails.

If you want a simple GPT-5 vs GPT-4o comparison for ChatGPT speed, coding, documents, images, dashboards, and reasoning, this video will help you choose the right model for your work.

## Transcript

### Segment 1 (00:00 - 05:00)

For the last couple of days, I've been testing GPT-5 versus GPT-4o, since they brought back GPT-4o. If you're not aware, GPT-5 got a ton of backlash. A lot of people did not like it. I read two or three hundred comments on my video about it, and people were really upset. So, I wanted to put them side by side, run them across 10 different categories of prompts, and do some side-by-side testing inside of ChatGPT, so we could see: is GPT-5 actually worse than GPT-4o? Okay. So inside of ChatGPT, under the models dropdown, you have Auto, and you have Fast and Thinking. And if you have the Team plan, which I have here, or the Pro plan, you also get this Pro option that is really intended for research. And I have a video coming out just about this Pro option; it's actually really good for those types of tasks. And then you have the legacy models, where they brought back GPT-4o, and I have that open in this window. Now, what Auto does is decide how long it needs to think to give you an answer. This was how they released it in the first place: they didn't want you to pick before, right? But now you can. So Auto will decide if it can answer you instantly with the fast mode, or if it needs to think longer. So those are all going to be available here for you to choose from. For most of these, though, I'm going to leave it on Auto, because with GPT-4o we don't have that option. So I think it'd be a fair test: just the regular GPT-5, which is the version that came out, which is the auto mode, versus 4o. And there's actually one more setting I want to show you that's kind of hidden. I don't know exactly when they rolled it out, but if you go to your settings and Customize ChatGPT, there's this option right here: what personality should ChatGPT have? So you can click this dropdown and actually change it. Now, I've already set mine up with custom instructions. So I told it exactly how it should talk, and I have a ton of custom instructions on mine, but you can choose from these.
So: Cynic, Robot, Listener, Nerd, or the default, which is cheerful and adaptive. Okay, so with the very first test, I want to see how it does with a bunch of different things all at once. So, vision is one: I want to see if it can read this web page. This is just a screenshot of our website. And what I asked is, I want to see if it can actually recreate this. So, it's going to do some front-end coding. Then, I gave it a business task: increase this for conversion. So, I want to see, can it improve things for better conversion? And then I wanted it to recreate this in Canvas so I don't have to download the code. Canvas is the mode that visualizes everything inside of ChatGPT; it's been around for a while. And I'm going to send this out to both here. Okay. So, already GPT-4o is off to the races, and GPT-5 decided to do some thinking. Okay. Let me show you 4o first. So I just have to click run code. So it opened Canvas. It did all the things I asked, and here it is. So it center-framed everything. It kept the heading the same. It changed a little bit about the text here. And it has a broken image. So it's okay, but it's far worse than the landing page I gave it. It definitely did not improve the landing page at all. Okay. I'm going to say: go to Skill Leap and grab the image link from the source code. And let's see if it updates this broken image if I tell it to do that. Okay, it's telling me it can't do that, but it's asking me to go do it myself and give it the link. Okay, I'll give it the link here. And it was able to pull that in after I gave it the link. But I think the design is pretty bad compared to what we already have, so I don't think this is an improvement at all. Let's see what GPT-5 gave us. Okay, here's the result from GPT-5. It made a lot of better decisions, I think. A "Why join?" section. It added an FAQ section on top, which we don't have. It actually created the top menu, which GPT-4o just didn't do. Added a "Start Free Trial" here.
It has a "watch a two-minute tour." We don't even have that, but that's a good touch here. And it... oh wow. It actually added a whole lot more than the screenshot I gave it. So that's pretty good. It does have some very clear issues here. These random names it decided to put here overlap with the image. It does not have the image, but now let me see if it can go to the website. I asked the same question: go to the website, find that image, and then update this image here on the right. And it looks like it's already having some issues. So, it's updating the images for me, but then it just removed the rest of the code. So, if I press run, I'm going to get that error here inside of Canvas. Let's see if it can rewrite the whole thing again. Okay, it was able to actually pull in the image. I did not give it the image link; it went and found it on its own and pulled it in. So, it had that one issue. I did try this three different times before I recorded, and GPT-5 actually got it right every single time, but I wanted to keep any issue that happened while I'm recording in the video so you can see it. I don't want to edit out issues like you just saw. But overall, it fixed everything else. And yeah, I think for this one GPT-5 is

### Segment 2 (05:00 - 10:00)

the winner. And just to be fair, I'm using the default version of GPT-5, which has thinking built in when it needs it. So I'm comparing that to GPT-4o, which doesn't have thinking. The reasoning models had that, like o3 and o4. So it's a little bit unfair there, but I want to use both of the default options that we have, rather than picking from the model dropdown, since the defaults are what they're really pushing here. All right, let's do a speed test. This should not require thinking, but I'm still going to keep it on Auto. And this is 4o. I mean, that was almost exactly the same. I pressed it a little bit earlier on the left, but they finished at about the same time. So, I don't think there was a very obvious speed difference when GPT-5 doesn't need the thinking model and uses the fast mode. Let me do it again. This time, let me just try the fast mode with a new chat. All right, let's see if it's obviously faster. This time I'll press go on this one first and then this one. Yeah, that was not obviously faster. In fact, I feel like it was a little bit slower than the auto mode, and GPT-4o finished first. Again, over the last couple days of testing these two, it's pretty obvious that 4o is faster. It's not thinking, so that is going to make it faster. If I was comparing o3, for example, the reasoning model before this, it was probably going to be slower. But even when I was trying the fast mode or the auto mode on things that shouldn't require reasoning, there's no obvious speed difference, and a lot of times 4o was just faster. So this one I'm going to give the score to 4o for speed. Okay, for the next one, let's see how well it does at creating a document from scratch. I'm going to ask for a fake invoice here in a downloadable PDF. And I want to test the vision model, too: we'll pull the information back out of that PDF in a follow-up prompt. Okay, GPT-5 in this case finished faster, by just one second. Let me download both of these and I'll show you the result.
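As a side note (this is not from the video), a follow-up extraction test like the one described here can be made checkable: if you tell the model exactly which invoice fields to return as JSON, a few lines of code can validate whatever comes back, including whether the stated total matches the line items. The field names below are hypothetical examples, not the schema used in the video.

```python
import json

# Hypothetical required fields; the video doesn't specify an exact schema.
REQUIRED_FIELDS = {"invoice_number", "date", "total", "line_items"}

def validate_invoice(raw: str) -> dict:
    """Parse a model's JSON reply and sanity-check the invoice."""
    data = json.loads(raw)
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # Cross-check: the stated total should equal the sum of the line items.
    computed = round(sum(item["amount"] for item in data["line_items"]), 2)
    if computed != round(data["total"], 2):
        raise ValueError(f"total {data['total']} != sum of items {computed}")
    return data

# Example of a reply a model might produce for a two-line fake invoice:
reply = '''{"invoice_number": "INV-001", "date": "2025-08-14",
            "total": 150.0,
            "line_items": [{"desc": "Design", "amount": 100.0},
                           {"desc": "Hosting", "amount": 50.0}]}'''
invoice = validate_invoice(reply)
print(invoice["invoice_number"])  # INV-001
```

A check like this catches the silent failure mode where the model returns plausible-looking JSON with a wrong or missing value, which is exactly what you care about when extraction is the use case.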
Okay, so on the right side, this is 4o. On the left side, this is GPT-5. Now, GPT-5 created something that looks a little bit nicer, but it has all kinds of issues. I mean, there's a random box up here; there's some text overlap over here. This one is really, really clean. Okay, so 4o obviously wins this one, I think. Let's do one more. I want to see if it can create another set of sales data. We'll use that for a follow-up prompt, too. Okay, so again, this is 4o, and it looks really nice and organized. And on the left is GPT-5. And I mean, I guess the data is there, but the layout here is far nicer, right? So again, 4o is going to win when it comes to creating any type of document in these two tests. Okay, for the next one, I'm going to upload that same invoice (the one GPT-4o created) to both. And we're going to extract the information and see how cleanly it gives it back to us. This is a really big use case for these models. This is the invoice from my earlier test with 4o, which had a little bit more information; the other one only had two lines of information. And here is our JSON output. And yeah, they both got all the information exactly right. So I'm going to give this one a pass for both. Now, with our sales data, let's see if we upload it, whether it can create a nice visual dashboard from that set of data. Okay, this is what I got out of GPT-5: some code here, and it created this graph. I was definitely getting things like this out of GPT-3, though. Let's see what we get here. Okay, so I got almost the exact same chart. Neither one created a visual dashboard. Let me try this whole thing again, and I'm going to make it very clear: I want a visual dashboard in Canvas from the start. Okay. 4o went right to work. GPT-5 is deciding to do more thinking. Okay. Let me make 4o full screen here. Okay. Yeah, that looks a whole lot better. This is kind of what I had in mind.
An interactive chart here that I could go ahead and share with my team directly from here. So, this is 4o. Okay. Here's the result from GPT-5. And yeah, it looks obviously more detailed. It lacks some color; I think it'd be nicer with a little bit more blue and a little bit more purple, which it sometimes likes to do, but overall, yeah, and it gave us quick talking points. I think GPT-5 is the winner here. And I've noticed that is the one place it's clearly better than 4o: when you give it visuals or any kind of asset to turn into dashboards, which I do all the time. But I usually do that with Claude or Lovable; they just do a far better job. Still to this day, with both of these models out, they just do a better job, so I still use those. But it did do a pretty good job here with GPT-5. So, I'm going to give this one to GPT-5, too. Okay. When GPT-5 came out, in their release they talked about how it hallucinates a whole lot less. So, we're going to do a hallucination test here. It'll need to capture some data online, so it'll

### Segment 3 (10:00 - 15:00)

have to do a web search, and it has to give us citations. I specifically asked it for two credible sources, and it needs to answer these five questions below. These were just different factual questions. A lot of times with 4o, I was getting links that were just not clickable; they were just fake links that it would make up for us. Now, hallucination, in my opinion, is by far the biggest problem with generative AI, especially these large language models. So, let's see if we are one step closer to fixing that. Okay, let's go through the answers with GPT-5. So, the Hubble telescope launched in 1990. They both got the same answer here. Let me click the NASA link. Okay, that's good. And let's click the space agency link. Unsafe link. Okay, so that gave us a warning that it's an unsafe link, but it does look like a real, legit website. Okay, let's try GPT-4o. So, same answer. So, it got that right. And here's the issue: I got a 404. So, it made up a link to the website. It took us to nasa.gov, but it made up that link, which is actually a huge issue. This is one of the biggest reasons why I would use a tool like Perplexity or Gemini; they just do a far better job at citations and give me real links. Let me see. This one also says unsafe. Okay, so this one worked. This one took us to that same place, so they decided to use the same source. Let me go to this one. This one actually used two different links. Okay. Well, GPT-5 gave us the wrong link. This one actually chose the same link here. Yeah, they both gave us the wrong link. And I tried a few more links. So, GPT-4o had about three broken links; this one only had one broken link. So, it is an improvement. It's using citations, right? The answers were all correct, though, and they both actually gave us the same exact answers. But in this case, I wanted to know where the sources came from. And I asked for very specific direct links to two sources. They followed that instruction.
They gave us two sources, but in this case, out of what, 10 sources, three of them were just made-up URLs, which I think was a big problem with 4o, but it's not quite solved with GPT-5. Okay, the next one is ideation. And obviously we use ChatGPT a lot for ideation, but I wanted to see how they would come up with a test like the one I'm doing right here to compare GPT-5 versus 4o. So let's go ahead and send this out. Okay, this answer was really interesting. I didn't really specify whether I was testing these models somewhere else or inside of ChatGPT. So a lot of the information GPT-5 gave me was actually how I would test it with the API, something more technical. GPT-4o gave me very practical things: a reasoning test, a creative writing test, the code generation we have coming up. And I think they both did a good job; they just took a different angle. Now, 4o's is, I think, in my personal experience with using these models, the more practical way to test it; GPT-5 mostly kept to the API-style tests. But GPT-5 did give us a whole table that's really nice to look at. And it also gave us a bunch of sources. If you look at some of these, it pulled sources from different websites. This one did not pull a single source; it used its own training data to give us its recommendation. Now, as I was making this video, I did go through these, and a couple of my tests did come from them: two actually came from 4o and two came from GPT-5. The more technical stuff coming up came from GPT-5. So I think they both did a good job; there was no clear winner here. And in my previous video, my first impressions after using GPT-5 for maybe three hours, I tested a lot of writing. So if you want to see writing examples (emails, style, tone), all that was covered, and I actually liked it more than 4o. And at the time, I also thought it was not using these em dashes when it was responding. It really didn't in my tests. Now it uses them again all the time.
And I have no idea why they're doing this. Correct me if I'm wrong here, but I feel like this is a watermark, so you can tell AI is generating the content. So then it can't train on its own data if it doesn't want to, right? Its own output can't be part of the training data when it's crawling the web again, maybe. But yeah, that em-dash situation is still there, and I think it very much screams that AI wrote it. Now, when it comes to memory, not just stored memories but looking at old chats, so far I don't really have a clean test to show you. But in my experience over the last couple of days, 4o, I feel, just understands me a lot more. So just now, when I asked it for prompt examples, I think it gave me practical

### Segment 4 (15:00 - 20:00)

prompt examples rather than technical ones. Now, I make these types of videos all the time. I'm always asking for practical things, right? I'm not super technical. So, I'm not going to make a for-developers-only type of video where I'm just testing coding; I was testing all kinds of different work and business use cases. So, it was kind of following along with that without me being very specific in the prompt. GPT-5 does not do that. I'm on the same account, so I'm not sure why it's not referencing my old chats in a way that's intuitive, but it's definitely not; at least it's not obvious to me that it is. Okay, let's do some coding tests that we can actually run in Canvas. So, in this case, I'm just going to click this plus sign and turn on Canvas on my own on both of these. I'll turn it on here, too. And I want to build a goal-tracking app. I specifically said React, because that will be really easy for Canvas to display. We'll go ahead and send both of these out, the exact same prompt. All right, let's look at what 4o created for us. Okay, so here's our app. It doesn't look super great, but let's see if it works. Make a video, the video category, the dates, and add goal. Okay, it added the goal. Some kind of weird progress bar here; I'm not sure how I would know how to fill that up. Let me add a second one. Okay, let's try this one. Add goal. Okay, so it's working. It doesn't look great, but functionally it's working. Oh, I guess I could update that on my own, but yeah, in this case, it doesn't really make sense to. And it looks like GPT-4o wrote 89 lines of code to give us that. And this one, GPT-5 with thinking, looks like it's writing a whole lot more than that. Okay, 514 lines here. Let's see what we got. Okay, so already, yeah, this looks a whole lot better. We have dark mode and light mode. Well, that does not work, but cool idea. We have work and personal categories. We have a search option.
Let's go ahead and add a goal here. Video, work, target number one, deadline today. Add a goal. Okay, it added a goal. It looks like it added a weekly trend over here. Quick log. Okay, this is kind of how we would complete it. There's a reset option. There's a delete option. Yeah, a whole lot better, just from the exact same prompt, right? A couple of issues, obviously; some of the buttons don't work. But overall, clearly another time where GPT-5 beats GPT-4o. And I think a lot of that has to do with reasoning, right? This one, you know, wrote 400 more lines of code. It really took its time: it took 10 minutes, compared to the 30 seconds the other one took, to give us this type of output. So, for this next one, I just want to see what happens if I turn off the thinking part of GPT-5. This is going to be another coding prompt. And I think this will be kind of a fair test, but the whole point with Auto, the default, is that you really don't want to change it for the most part; you let it decide for you. And as I mentioned, I think it does a good job deciding when to think and when not to. Oh, this is interesting. This time it only wrote 177 lines versus 146 lines. So, we'll test out 4o first. Okay. So, already, in my dark mode, I can't even see what's going on here. So, business name, tech, New York, balanced. Actually, I'm in Chicago, so I'll choose that. Start a business. Okay, so it's kind of working. Let's go to the next month. Yes, very confusing. So, this is kind of like a business tycoon game where I'm making decisions as time goes on. Okay, it's not bad. Let's see what we got out of GPT-5. Okay. Yeah, pretty much the same thing. It also put black text on a black background. It pretty much made the exact same game with the exact same issues. Layoffs. No. All right, that's not great. Just for the sake of testing the thinking model too, I'm going to stop this. I'm going to give the same exact prompt to GPT-5, and I'm going to pick the thinking model this time.
Okay. So, with thinking turned on: 817 lines of code. So, let's go ahead and see what that looks like. Already, it fixed a lot of the visual issues we had with the last one. We can type in our own information here. We have the strategy here that was based on our prompt, and start. Yeah, I mean, that's a whole lot better. By default, though, it's usually not making something very colorful, but you could easily update that with a follow-up prompt. And it's set to auto mode. Let me go ahead and advance to the next month. Turn this auto on here. And now it's moving by itself even, which is really cool. Okay. Raise price. So, you're just making decisions pretty much every month or

### Segment 5 (20:00 - 24:00)

every other month here, and seeing how that contributes to sales. Oh, it looks like I went bankrupt already. So, already I chose poorly. Okay. So I think it's super obvious that it was GPT-5's thinking that pushed this to actually give us something really good, when the real test, GPT-5's fast mode against 4o, which is the real kind of apples-to-apples test, showed they both did not do a good job. So thinking took this to the next level. Most likely, if I tested this out with o4, the older reasoning model, it would probably be comparable to what we're getting out of this, too. And I'm going to do one very complicated prompt here for image generation, to see if it's changed how it generates images. This has a ton of text on a board. Let's see what we get out of these two. Okay, so they both perfectly passed this one, and I was actually reading through all of it; I couldn't find a single typo in either of these generations. So I don't see an obvious difference from when I've been using this. I'm going to do one more, to see if it can get a YouTube thumbnail right, which, by the way, it never can. It couldn't get the aspect ratio right in the old model. Okay, that took almost four minutes, but they both finished at around the same time. So, already both of them got confused and added the initial prompt I had into the background. GPT-5, I mean, that looks exactly like me. This one looks 90% like me, but this one is... wow, that's kind of crazy, actually. I thought they wouldn't, on purpose, let this model generate something identical to the picture you upload, but that looks pretty good. But the fundamental problem is still there. So, I think they both did a good job with the design. This one did a better job, but they don't understand what 16:9 is. And I specifically asked it not to crop the text: do not crop text. Okay, GPT-4o listened. GPT-5 did not. Again, no clear winner here. In fact, GPT-4o, I think, would be the winner in this case, because I could technically use this one without the cropped text.
This one, I just can't; it's literally cropping text. I'm not going to upload this one to YouTube, but the resemblance is like 99%, right? Maybe a little bit wider on the eyes, but that is wild. So, I don't know if they're going to kind of nerf this one and make it a little bit off, not 100% close to the person you upload, but overall, no clear winner here either. Now, when it comes to reasoning, this is not going to be a fair test at all. I already tried it, and GPT-5 wins every single time when there is logic, there is reasoning. GPT-4o doesn't have that. The older models, o3, o4, o4-mini, those types of models, had reasoning. They're not available anymore inside of ChatGPT. So that won't be a fair test. So what I'm going to do for reasoning is compare GPT-5 with things that have reasoning, like Gemini and Grok and Claude. So we'll save that for another video; I have that coming up next week. But for this video, it's very obvious that GPT-5 with thinking turned on excels at coding, excels at creating visual things, creating dashboards, and analyzing documents. But other than that, in almost everything else, it is not beating GPT-4o. It's very clearly not beating GPT-4o. And if you remember, if you used GPT-3 and then they went to GPT-4, it was such a huge leap that I can see why people would be upset. We've been waiting for GPT-5 for like a year, and I literally can't even tell the difference in a lot of things. Writing, though, I covered in the last video, in my first impressions: email, kind of outlines, things like that. I think it did a really good job there, so I was really happy with it in my initial testing in the first few hours. Now, when it comes to agentic use, I covered that in the last video, too. GPT-5 is really designed to work with agent mode, so it did a much better job than 4o. So I could see that taking us somewhere. But for everyday use, it's just not quite there yet.
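A quick aside on the 16:9 complaint in the thumbnail test (this sketch is mine, not from the video): the check the models keep failing is trivial to state in code, which is what makes the failure so conspicuous.

```python
def is_16_9(width: int, height: int) -> bool:
    """True if the pixel dimensions reduce exactly to a 16:9 aspect ratio."""
    return width * 9 == height * 16

# YouTube's recommended thumbnail size is 1280x720, which is exactly 16:9.
print(is_16_9(1280, 720))   # True
print(is_16_9(1024, 1024))  # False: a square render, which is what the models tend to return
```

So verifying a generated thumbnail before upload is a one-line check, even though getting the model to produce that shape in the first place still fails.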
And I think the big upgrade is really the user experience of not having to pick between a ton of different models. But a lot of people did not like that either, so they brought that back anyway. So I think that was the big improvement they thought was going to change how people use ChatGPT. And check out my video I made about Google Gemini. I covered pretty much every single thing it can do inside of the Gemini
