GPT-5.4 Is Here — I Tested the New ChatGPT Model

14:48

Skill Leap AI · 05.03.2026 · 51,642 views · 862 likes

Video description

👉 Join the fastest-growing AI education platform! Try it free and explore 20+ top-rated AI courses, including the Introduction to AI Agents course: https://bit.ly/skill-leap I walk through how I test the new GPT-5.4 Thinking model and how it compares to GPT-5.2 and other AI models. Here is the official blog post: https://openai.com/index/introducing-gpt-5-4/ In this video I test the new ChatGPT update that was just released. I talk about GPT-5.4 Thinking, GPT-5.4 Pro, and the new GPT-5.3 Instant model. I explain how the Instant model answers right away, while the Thinking model takes time to reason before it replies. I also show what the Pro model is meant for and when it makes sense to use it. I run real tests with GPT-5.4 Thinking to see what it can actually do: research, deep web search, building a PowerPoint presentation, and creating a full Excel spreadsheet with formulas. I also test coding by asking it to build a small AI tools website and a simulation app. This shows how well the new model handles coding, knowledge work, and tool use. I also talk about the new computer use capability that lets the model work on the web, handle tasks, and help with things like emails or data entry. Another update is lower hallucination rates, which means the model should make fewer incorrect claims. Along the way I compare GPT-5.4 with GPT-5.2 and discuss how it stacks up against other AI models like Claude and Gemini, giving a quick look at where the new ChatGPT model stands right now. If you want to see what GPT-5.4 Thinking can do with research, spreadsheets, presentations, coding, and everyday prompts, this walkthrough shows the results from real tests.

Contents (3 segments)

Segment 1 (00:00 - 05:00)

We just got a brand new ChatGPT model called GPT-5.4 Thinking, and there's also a new Pro model. Both came out today, and just a couple of days before that they also released GPT-5.3 Instant. That's the model that answers you instantly, while this new one answers after it does some thinking. The Pro model is really for high-end research. Now, there was no 5.3 Thinking model, which kind of confuses things, because now the Instant model is going to have a different number than the Thinking model. It might be ahead or it might be behind. They mentioned that in their blog post. Let me jump in there, and I also want to show you my actual account. We'll do some testing with GPT-5.4, but first I want to quickly show you the five key highlights that are new with this model. The one big improvement that came with GPT-5.2, which I covered in my GPT-5.2 video, was in knowledge work. For example, it was able to create really elaborate spreadsheets and documents like presentations, and this is again a big improvement over 5.2 when it comes to creating nicely formatted spreadsheets, documents, and even PowerPoint presentations. They also released something called ChatGPT for Excel. It's an add-on you can add to Excel if you have a paid version of ChatGPT. The other big upgrade is that computer use is now native to this model. This is the first general-purpose model with native computer use capabilities, so it can actually do things on the web for you. They've had their agent that kind of does this already, but this model now has it natively. You don't have to use another model for any type of computer use. They had a few examples of the computer actually doing things like data entry or handling emails and calendars for you, again without you touching anything. That's computer use with GPT-5.4.
Now, as far as coding goes, ChatGPT had a different model called GPT-5.3 Codex, which is also pretty new and was designed just for coding. The general-purpose 5.4 Thinking model now matches the capability of that coding-specific model. The tool use improvement is really handy if you do any vibe coding or if you're a developer building with the 5.4 Thinking model, because the way it does tool search is a lot more efficient. Even though the per-token price is slightly higher with 5.4 than with 5.2, it needs far fewer tokens, especially for things like tool calling. Overall, it will probably be cheaper to use 5.4 than 5.2 just because of that improvement. Now, in this entire blog post they don't compare it to models that aren't from OpenAI. They only compare it to other GPT models, but they did post one chart online that I'll show you now. They compared it with the best model from Anthropic, which right now is Opus 4.6, and with Google's best model right now, Gemini 3.1 Pro. The Thinking model does slightly win on some of these benchmarks, and on some of the others it does not. You can see it's pretty even among state-of-the-art models, but it does have a slight edge. So, technically, on paper, it's the best model available from any company right now. Okay, let's take it for a spin. Right now I'm going to use GPT-5.4 Thinking. I didn't get early access, but I did get it about an hour after they released it, so if you're watching this the day I published this video, it may take a few more hours to come to your account. They also have the Pro model. I have the Pro account here, but it's also available on the Plus account with limited usage. Again, this is for research-grade work. I almost never use it for everyday tasks. I almost always use a Thinking model or just set it to Auto.
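The "slightly higher price, but probably cheaper overall" point is just arithmetic. The cost of a task is the price per token times the number of tokens the task uses. A higher per-token price can therefore still produce a smaller bill if more efficient tool search cuts the token count enough. Below is a minimal Python sketch of that trade-off. The prices and token counts are hypothetical placeholders chosen for illustration. They are not OpenAI's published pricing or measured usage.

```python
# Hypothetical numbers for illustration only; not real OpenAI pricing or usage data.

def task_cost(price_per_million_tokens: float, tokens_used: int) -> float:
    """Dollar cost of one task, given a price quoted per 1M tokens."""
    return price_per_million_tokens * tokens_used / 1_000_000


# Assumption: the newer model charges more per token...
old_price, new_price = 10.0, 12.0          # dollars per 1M tokens (hypothetical)
# ...but uses fewer tokens on the same task (e.g. thanks to leaner tool calling).
old_tokens, new_tokens = 200_000, 150_000  # tokens per task (hypothetical)

old_cost = task_cost(old_price, old_tokens)
new_cost = task_cost(new_price, new_tokens)
print(f"older model: ${old_cost:.2f}")  # older model: $2.00
print(f"newer model: ${new_cost:.2f}")  # newer model: $1.80

# Break-even: the newer model is cheaper whenever
# new_tokens < old_tokens * old_price / new_price.
break_even_tokens = old_tokens * old_price / new_price
print(f"break-even token count: {break_even_tokens:,.0f}")  # break-even token count: 166,667
```

The break-even line is the useful part. With a 20% higher per-token price (12 vs. 10 in this sketch), the newer model comes out ahead as soon as it needs about one-sixth fewer tokens for the same task.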
So it picks between Instant and Thinking. The Instant model that just came out a few days ago is also available here, but for this video we're going to test the Thinking model. The first prompt I want to show you is for deeper web search, not with Deep Research, just in a regular chat. They also have this option right here where you can change the thinking effort. It's set to Standard by default, but if your task is more complex, you may want to change it all the way to Heavy, although that does sometimes take quite a while. This prompt asks for a research analysis, and I want to show it because this is one of the areas where they said there was a big improvement over 5.2. The interesting thing is that you can actually follow up while it's doing the research to give it more context without interfering with the research. So I could say, actually, give me 15 sources. I had initially asked for 10 sources, and it continues to build on what it was doing before. You don't have to pause it or start a new chat; you can change direction in the middle of a chat, which is really useful. That was one of the other

Segment 2 (05:00 - 10:00)

upgrades that they had with GPT-5.4. It took about 57 seconds, and the plan (I ran this before) is pretty comprehensive and followed my prompt really well. My prompt asked for three sections in the output: an upfront plan, then the findings with citations, and a final checklist. The goal was to figure out whether consumer AI products have reduced hallucination over time. That, by the way, is another one of the updates: they say this model has a further 33% hallucination reduction compared with the previous 5.2 model. Every model release claims to reduce hallucination, which is obviously one of the biggest problems in AI: it just makes up an answer, and that makes it unreliable. But with every release, it looks like we're getting closer to that 1% mark. And yeah, this is really nice and comprehensive. Again, I ran this previously, went through the answers, and thought it was really well-done research without using Deep Research. You obviously have the Deep Research tool right here, but that sometimes takes 15 minutes, and in a minute I got a pretty good response. Now for a follow-up. They say it's good at knowledge work, so let's say I need to present this in a meeting. I'm going to ask it for a PowerPoint presentation. It took roughly five minutes to create the entire presentation as a downloadable link, and it looks pretty good. It has all the different sections from the research, and it includes references to the websites it got the information from. Let me see how many pages: 15 slides, which is exactly what I asked for. I asked for a 15-slide presentation. Pretty good, though I think the design could be better. Let me see if, with just one follow-up prompt, I can get it to keep all the information but redesign it. Okay, four more minutes and it gave us a new design.
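A percentage reduction like the "33% hallucination reduction" is most naturally read as a relative figure. Each release multiplies the previous error rate by (1 - 0.33) rather than subtracting 33 percentage points. That is why repeated reductions shrink the rate geometrically but never quite reach zero. Below is a minimal Python sketch of that arithmetic. The 5% starting rate, and the idea that every release would cut errors by the same 33%, are hypothetical assumptions for illustration. They are not figures from OpenAI or from this video.

```python
import math

# Hypothetical inputs: only the 33% relative reduction is taken from the video.

def after_reduction(rate: float, relative_reduction: float) -> float:
    """Apply one relative reduction, e.g. 0.33 means 33% fewer errors than before."""
    return rate * (1 - relative_reduction)


def releases_to_reach(start: float, target: float, relative_reduction: float) -> int:
    """Smallest number of successive relative reductions that brings start down to <= target."""
    # Solve start * (1 - r)^n <= target for n:
    # n >= log(target / start) / log(1 - r)   (both logs are negative, so the ratio is positive)
    return math.ceil(math.log(target / start) / math.log(1 - relative_reduction))


start_rate = 0.05  # assumed: 5% of claims are wrong (hypothetical)
print(f"{after_reduction(start_rate, 0.33):.4f}")  # 0.0335
print(releases_to_reach(start_rate, 0.01, 0.33))   # 5
```

Under these made-up inputs, one 33% cut takes a 5% error rate to about 3.35%, not to 0%. It would take five such cuts in a row to get below the 1% mark mentioned above.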
I can actually play it here inside ChatGPT. Again, it's minimalistic because that's what I asked for in my initial prompt, but it used a more modern design than the previous version while keeping all the content the same. Now, what about creating a spreadsheet? I'm going to ask it to create an Excel spreadsheet and give it all the details. It took about 10 minutes and gave us a downloadable Excel document. I was able to open it in Excel, and it had the formulas. Let me show you inside ChatGPT. We've got a summary page here. I went through some of the details, and at first glance it looks great. Obviously, this is still made by an AI that can hallucinate, so do a spot check, especially with these types of numbers. That said, the reduction in hallucination is improving the output of these models with every new release. This has all the formulas; you can see at the top that they're all there. All the different sheets are there, and the data chart is there. This is a huge timesaver. I was able to create a PowerPoint from the research and, with one single prompt, create an entire Excel sheet. I could download it, make adjustments in Excel, spot check it to make sure everything is good, and turn it into something I could present to my team very quickly. So it's a huge time saver for any type of knowledge work. Now I want to test coding. I want to see if it can create a website that compares top AI tools, like we have on Futurepedia. This is the prompt I'm going to give it, with a lot of very specific details. I want to see if it follows those details, like rounded cards and a dark/light mode toggle. It's going to be something I can run inside ChatGPT in canvas mode, with the standard thinking effort.
I'm going to let this finish, and I'll show you the result of this one too. Okay, here is the first version of our app. Here's light mode. Actually, light mode looks really good; I like it more than dark mode. It did follow the whole prompt, including our filtering. It's a little too busy at first glance, but if you click on tools, it loads them here. Let's see if the information is correct and up to date. So far, so good; no big issues. Does it link out to the website? No, this link does not work. Okay, so there's one issue. What about comparing different tools, Claude versus ChatGPT? Nope, it won't let me select them. So it still has issues similar to the ones we had with 5.2 from my very first prompt, but that's exactly what I was testing: I wanted to see if you could get this in a single prompt. It's not bad. I actually had more issues when I tested 5.2 in my previous video. A true pass would have been getting a perfectly working app in one shot from one prompt. This has a couple of minor issues that I could fix, but overall I think it did a pretty good job. Now, this one is another coding test, but this time I want to compare it side by side with the 5.2 model as well

Segment 3 (10:00 - 14:00)

so you can see the improvement from 5.2 to 5.4 Thinking. Okay, this one is actually working pretty well: dawn all the way to nighttime. Yeah, everything is working exactly as it should, and again, this came from my very first prompt. Now I'll show you the result with 5.2, which was totally different. It actually gave me a more cinematic-looking simulation. But I think this one, even though it looks a little cartoony, makes more sense for the type of simulation I was doing. 5.1 actually gave me something closer to what I got out of 5.4. So, another interesting coding test. Now, when it comes to coding, I still think Opus 4.6 is the best model in the world, even though this one beats it in the benchmarks. I have a video coming up, which I do every few months, where I compare all the top AI models across all the top categories in a head-to-head test. I'll test Gemini, Grok, and others against this model. Now, most people use ChatGPT as their daily driver, and simple writing is how most people use it: emails, blogs, headlines. In this case, I'm going to ask for a hook for a YouTube video about the release of 5.4. This is exactly the prompt I used in my 5.2 video, so I can compare them head-to-head. This is going to be subjective, obviously, since it's based on my taste, but let me read through some of the results. I asked for five intros to the YouTube video, and I don't like a single one. I've done a lot of work in my account-level instructions to try to mimic my writing style, and this isn't really following it. It also has em dashes in three of the five, even though my system instructions very specifically say not to use em dashes. I thought 5.2 was actually following that, but this is not.
Let me do a quick test. Let's see: 5.2 Thinking, same prompt. Okay, maybe not. 5.2 actually used an em dash in every single example here. But I'm pretty sure that when I made that video, it was following my instructions and avoiding this. I think it's a clear giveaway that you're using ChatGPT; that's just what ChatGPT likes to do. They said they fixed it in the previous release, but I'm still getting it in 5.4 with the Thinking model. Okay, let me try again with the 5.4 Thinking model and see if it does the same thing. I'm not sure what happened there, but no em dashes this time. Let's read through one of these: "ChatGPT 5.4 just dropped, and it might be the first update in a long time that actually changes how people use AI every day. The benchmarks look wild. The demos look even better." Yeah, that's not really how I talk. Again, I don't think it's following my account-level system instructions on tone. I've done a ton of things to train it this way. I typically use Projects because I can tailor them more to specific use cases when it comes to mimicking my writing. Overall, though, this is not something I would copy and paste straight from here and read. With a follow-up prompt, maybe two, I could get it where I want. But I still find that Gemini and Claude do a better job with a non-promotional, straightforward, conversational tone right off the bat. They get there without follow-up prompts, system instructions, custom projects, or anything like that. This is still not the writing style I would use off the shelf. Again, I have a lot more testing to do. This is just my initial testing from a couple of prompts in the first couple of hours after release.
And if you haven't checked out our platform, Skill Leap, my team and I release comprehensive courses pretty quickly after a new AI model or technique comes out. We recently rolled out an AI risk management course and this digital marketing course. I also recently created AI Website Builder, which shows you vibe coding and building really nice-looking websites without writing any code. We also recently upgraded our Ultimate Guide to Generative AI. We first made it in 2023 and have now fully updated it six times, so this is a newer version with about 70 lessons, all in linear order. You get access to all these courses and all our new courses with a month-to-month subscription, and you can get a 7-day free trial to make sure it's a good fit for you. We also have tons of resources. Under the Resources tab there are entire prompt libraries, deals with other AI companies, and a community learning path, and you get access to all of that. I'll put a link in the description so you can try it totally free. Watch as much as you want, and if it's a good fit, again, it's a month-to-month subscription. Claude also had a pretty useful update that I covered in this video right here. Thanks for watching this one.
