GPT 5.4 Pro Is the STRONGEST AI Model I’ve Tested (But Costs a TON)
22:17


MattVidPro · 06.03.2026 · 19,659 views · 736 likes


Video description
OpenAI's GPT 5.4 is genuinely strong, but the interesting part is where it breaks, where Pro pulls ahead, and where Claude or Gemini still win. I pushed it through real one-shot creative coding tests: a 3D engine sim, instrument pack generation, a driving game, a water globe, and interactive educational tools. In this video, GPT 5.4 Pro looks like the strongest no-compromises model overall, but Gemini still rules multimodality for me and Claude stays very competitive.

▼ Link(s) From Today's Video:
Release blog: https://openai.com/index/introducing-gpt-5-4/
5.4 mods Pokemon Red: https://x.com/backus/status/2029711059247059282
Angel Minecraft Demo: https://x.com/Angaisb_/status/2029635731585372598
Adam's Skills Comparison: https://x.com/AdamHoltererer/status/2029926291021894016
Image fails: https://x.com/himanshustwts/status/2029864003217089009
API Pricing: https://developers.openai.com/api/docs/pricing/
MattVidPro Discord: https://discord.gg/mattvidpro
Follow Me on Twitter: https://twitter.com/MattVidPro
Buy me a Coffee! https://buymeacoffee.com/mattvidpro

▼ Extra Links of Interest:
General AI Playlist: https://www.youtube.com/playlist?list=PLrfI66qWYbW3acrBQ4qltDBsjxaoGSl3I
Instagram: instagram.com/mattvidpro
Tiktok: tiktok.com/@mattvidpro
Gaming & Extras Channel: https://www.youtube.com/@MattVidProGaming

Let's work together!
- For brand & sponsorship inquiries: https://tally.so/r/3xdz4E
- For all other business inquiries: mattvidpro@smoothmedia.co

Thanks for watching MattVideoProductions! I make all sorts of videos here on YouTube! Technology, tutorials, and reviews! Enjoy your stay here. All suggestions, thoughts, and comments are greatly appreciated.

Table of contents (5 segments)

Segment 1 (00:00 - 05:00)

Oh, I should probably clean that up, huh? Everyone, welcome back to the MattVidPro AI YouTube channel. As I'm sure all of you know by now, GPT 5.4 has been released publicly, and this model absolutely lives up to the expectations set out by the leaks we saw. John Backus here shows off experimenting with GPT 5.4 autonomously editing and rewriting the original Pokemon Red ROM, essentially modifying it, replacing the Pokemon with various AI models. The game boots on up exactly as you would normally expect. You start a new game, go through the character creation process, and you can see that some famous names in the AI space are modded right in. You get the idea. He's going through the opening of the game here. We get to choose our starters. The first one is Claude Safety: calm by nature, it answers each prompt with care. When danger nears, it adds three extra safety layers. Grok Unhinged: it posts first and fact-checks later, if ever. OpenAI Ambition: a polite model with an ambitious plan for prompts. You can see 5.4 biasing itself a little when making this mod. Then we actually have to battle Sama in a Pokemon battle, and he obviously is going to throw out OpenAI. Our user here chose Grok. And if you go to fight, you can see there are a few options here for moves. We have Bikini Edit and Slur, in classic Grok fashion. Pretty insane, though, that this is a straight-up real, playable Pokemon ROM, entirely autonomously edited by GPT 5.4. This is but a taste of its capabilities in what you'll see today. And there is a lot more to this story than just pure capabilities alone. Here you can see Angel attempt a Minecraft clone with GPT 5.4, and it nails it very decently first try. Upon close inspection of the footage, we can see it's actually quite detailed. We have textures that 5.4 was able to generate with code. Very impressive to me. Nine different kinds of blocks, and it actually is creating a procedural world.
Angel goes on to build a little bit of a house, but you can see in the corner there's actually a setting sun and nighttime. It's even got some little clouds. Some people were saying, "Oh, Minecraft is solved." I don't think so. A full-scale game with real depth and care is outside the scope of GPT 5.4's capabilities. Adam Holter shows off a side-by-side frontend test: one without skills and one with a skill enabled. Without the skill, the front-end design is okay. It's very basic, and it's safe in all of the ways that make it feel entirely uninspired. But keep in mind, it was not that long ago at all that seeing something like this would blow our socks right off. You have to think about the acceleration here; AI technology moves so fast. With the skill enabled, the difference is night and day. This is very promising for strong prompt adherence and its ability to replicate very specific types of patterns, whether you're coding up a website like this or following a specific style of writing. It's important to note, though, that Claude Opus 4.6 is incredible at front end as well as at following the instructions in the skills that allow it to do so. Anyways: this text in the center, the red buttons right at the bottom of this initial description, and this wonderful visual right on the side. And all of that just continues as you scroll down. Mr. Holter ran multiple tests. This one is without the front-end skill. This comic-y, bold, pop-out, punchy color style I'm actually kind of a fan of personally. A lot of that style that I like seems to have disappeared with the skill enabled. But I don't think there's any doubt that this is easier to understand at a glance, especially with this little spring menu and the settings and everything right on the side, with the rest following suit. Logically, this is better, but my taste remains with the former. In my testing, I've really dialed up the heat. And this first one is with plain old 5.4 thinking.
In the model selector here, you'll see that Instant uses 5.3 and Thinking is strictly 5.4, but there's also a Pro 5.4. I am not sure whether these are different checkpoints or Pro just has extended thinking enabled. Regardless, I had the thinking-effort toggle at the bottom set to Standard. I wanted a 3D model of a four-cylinder engine, replicating a specific engine. I want to be able to toggle inner views so I can actually see the pistons in the engine running realistically, be able to start the engine, rev it up, and then turn it off. Quality lighting, particles, and effects. And how did it go? Where's my engine? ChatGPT 5.4, where is my engine? I believe the engine exists in the code, but on one shot it does not work. Directly compare this with Gemini 3.1 Pro, which was actually able to do this first try. It's a little bit dark, but you can see the engine piston model.

Segment 2 (05:00 - 10:00)

And yeah, I can actually start the engine. And you can see all of the rotating pistons in there and the car's exhaust. The pistons move faster if you rev the engine up, too, which is pretty cool. But admittedly, this looks very sparse compared to what we just saw from 5.4. I'm using Google's Antigravity agent to bug-fix what ChatGPT 5.4 produced, because it was so much more complex than what Gemini made. 3.1 Pro only wrote about 300 lines; this is over a thousand. I also put Claude Opus with no extended thinking up against this task, and you can see, very similar to the issue we had with ChatGPT, there isn't any engine to actually be seen at all. Okay, the fixes are starting to come through in real time here as Antigravity works autonomously, and you can just see the level of breadth and detail that 5.4 was aiming for here. Did an agent have to polish it up? Yes. But we were only using standard thinking, not even Pro mode. And this definitely has another level of depth and breadth. Okay, if we start the engine, it fires up. The simulated engine has a working throttle. And you can actually see it kind of shake, and you can see the exhaust area getting very, very hot, glowing. And if I press three for X-ray, oh wow, this is on another level, guys. This is crazy. I know a lot of you probably don't know much about vehicle engines, but I've never seen such a detailed 3D engine model come from an AI like this. You stop the engine, you can see the different pistons there. It tried to do the connecting rods and the crankshaft and everything as well. It did the valves up top with cams and stuff. Man, this is incredible. You can rev the engine up and down. And it also tried to build an educational tool as well, showing the different flow of, you know, the exhaust gases or the charged air. And did that all work out perfectly? Not exactly. It did clearly have some issues with the 3D positioning. Like, the valves are absolutely incorrect.
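For context on what a "realistic" piston animation actually has to get right: the piston's travel follows standard slider-crank kinematics, which is the math any of these generated engine sims ultimately encodes. Here's a minimal sketch (my own illustration, not code from the video; the crank radius and rod length are plausible made-up dimensions for a small four-cylinder):

```python
import math

def piston_height(crank_angle, crank_radius=0.045, rod_length=0.14):
    """Slider-crank kinematics: distance from the crankshaft axis to the
    piston pin for a given crank angle (radians). Default dimensions are
    hypothetical but plausible for a small four-cylinder engine (meters)."""
    r, l = crank_radius, rod_length
    return r * math.cos(crank_angle) + math.sqrt(l * l - (r * math.sin(crank_angle)) ** 2)

# In a typical inline-four, crank throws sit in pairs 180 degrees apart,
# so cylinders 1 and 4 mirror cylinders 2 and 3.
OFFSETS = [0.0, math.pi, math.pi, 0.0]

def piston_heights(crank_angle):
    """Heights of all four pistons at one crank angle."""
    return [piston_height(crank_angle + off) for off in OFFSETS]
```

At crank angle 0 the piston sits at top dead center (`r + l`); at pi radians it's at bottom dead center (`l - r`). Animating the sim is just advancing the crank angle with engine RPM each frame.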
And, you know, there's just piping and stuff that isn't replicated perfectly by any means, but in all my days, I have never seen something like this. Let's move up to the default Pro mode. For this 5.4 prompt, I had it recreate 18 classic instruments from band class: brass, drum, guitar, recorder, piano. It used code and its agentic abilities to create a quote-unquote production-ready instrument pack. The time it took to accomplish this: 65 minutes and 18 seconds. This thing will consistently think for longer than an hour with very difficult tasks. The most challenging, bleeding-edge stuff takes forever, but Pro is competent. Just take a look at how long all of these thinking traces are. It does all of this coding and research. But to build an instrument pack from the ground up like this would take a human far longer than an hour. The full zip pack it delivered is like 60 megabytes. There's a lovely photo in here that is the spectrogram grid for all the instruments. Yeah, it went through the painstaking process of graphing and visualizing everything. Aerofoil Bloom is one instrument. Prism Reed, Nor Latis, Cedar Rift. It kind of went and made its own instruments: the Solar Baritone, Split River, Quantum Snare Veil, Monolith Pulse. It is so crazy to me that this is just a little added cherry on top to help analyze what it actually built. Let's perk our ears up and actually listen. Aerofoil Bloom. Let's start with that instrument. Keep in mind it generated that with code. Yeah, it sounds kind of pleasant. Bass clarinet. I like that one a lot. Glass Spire Oboe. Ooh, that's supposed to be the saxophone, dude. This is crazy. The trumpet's a little bit sharper. They all kind of have this digital essence. Deep tuba, thumping drums. Okay, the cymbal is terrible. That one didn't turn out good. All right, I like the guitar. Grand piano. You can see it provided a little README, kind of going through and explaining everything for a language

Segment 3 (10:00 - 15:00)

model. I am super impressed with this. For this next prompt, we're back to 3D. This time I wanted a driving game that I can control with WASD: realistic driving feel, suspension, NPC cars driving on the road. It's supposed to be a canyon location. This one again went to Pro and thought for almost an hour. And here is what it produced. You can definitely see the road and canyon, which is already so cool. You can even hear the car in the background, but we can't see the car, because unfortunately the camera controls are all borked and messed up with this one. Let's see if I can get you guys a view of the car. Oh, there it is. There's our red car. Okay, so we are supposed to be able to drive it on the road, but like I said, the camera controls are all kinds of messed up. You can see it's driving on the dirt. If we sit in our car and wait for long enough, we should eventually have an NPC drive by. Oh, they're coming up. You can actually see them on the minimap. And yeah, there actually is a working minimap, which is also super cool. For one shot, this has gotten pretty far. There they come. They hit you and just kind of push you back over there. Let's see if we can drive after them a little bit. Take me back through the mountain. Chase them. Oh man, the controls for the camera are so terrible. That's all kinds of messed up. We can also step out into a first-person mode and sort of walk around; that was also in the prompt. But yeah, the cars do sort of look messed up. It's all made of blocks and weird stuff, but for an LLM that really doesn't have access to a computer like a person would, this is so crazy as an HTML monolith. I also ran this one inside Gemini 3.1 Pro and inside Claude 4.6 Opus, so let's go check those out. Now, for Gemini 3.1 Pro, this isn't exactly a fair fight. This would be more comparable just to normal 5.4 thinking. But still, this is incredibly rudimentary.
I don't see, you know, really a canyon, just an orange wall. There are no NPCs. Here's my little red sports car. If I press F, I can get into it, and then you can sort of drive it around; it moves like a little RC car. It doesn't really feel realistic. There's definitely no suspension going on or anything like that. From that initial engine demo we ran, I can tell 5.4 would build out something a lot more detailed. But would it work on first try? Here is Claude 4.6 Opus with extended thinking's attempt. This one, I can immediately tell, is more detailed than Gemini's. I can sort of get this car moving, but yeah, the canyon is not fully produced. I don't know what's going on with this. Incredible. No doubt 5.4 Pro got the closest. The biggest caveat with using 5.4 Pro is the price; we've got to rip this band-aid off now. This Pro model, which in the API appears to be a separate checkpoint, has astronomically high costs. It is less cost-efficient than Gemini 3.1 Pro and Opus 4.6. But the real competitor to those checkpoints on an efficiency level is the regular GPT 5.4. And oh yes, by the way, this one does have a one-million-token context window, finally, as standard with the 5.4 series. Really, when you break it down, Opus 4.6 and GPT 5.4 Thinking go head-to-head if you're looking at quality output per dollar. But if you need the best, that no-compromises solution, right now 5.4 Pro is undefeated. Next up: an advanced 3D water simulation on a small rotatable 3D globe. I want to be able to whip the globe around with the cursor and actually have the water react to it realistically. I also want to be able to grow lemon trees on the surface of this globe, and they will spawn and drop lemons down on the surface. This first result right here is from Gemini 3.1 Pro. You can see I can definitely spawn the trees on the surface, and they will indeed drop some lemons. I kind of like the dot-shading texture on this water. It looks pretty cool.
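On the "quality output per dollar" point from a moment ago, the arithmetic behind a per-request cost comparison is simple: tokens divided by a million, times the per-million rate, summed for input and output. A minimal sketch; the rates below are placeholders I made up purely for illustration, NOT the real OpenAI, Google, or Anthropic pricing (check each provider's pricing page for actual numbers):

```python
# Hypothetical per-million-token rates in dollars -- illustration only,
# not any provider's real pricing.
RATES = {
    "gpt-5.4-thinking": {"input": 2.00,  "output": 12.00},
    "gpt-5.4-pro":      {"input": 20.00, "output": 120.00},
    "gemini-3.1-pro":   {"input": 1.50,  "output": 10.00},
}

def request_cost(model, input_tokens, output_tokens):
    """Dollar cost of one request: (tokens / 1M) * per-million rate."""
    r = RATES[model]
    return (input_tokens / 1e6) * r["input"] + (output_tokens / 1e6) * r["output"]

# A long agentic task: 50k tokens in, 200k tokens of reasoning + code out.
for model in RATES:
    print(f"{model}: ${request_cost(model, 50_000, 200_000):.2f}")
```

The point the video makes falls straight out of this shape: a Pro-tier checkpoint with roughly 10x rates costs roughly 10x per request, so it only wins when the task genuinely needs the extra capability.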
And if I shake it around, yeah, the water does kind of react, but in a fake way. There's no real fluid physics sim going on. I mean, there's some basic stuff, but not to a super impressive level. It does not feel very dynamic. Here is Opus 4.6's variant with extended thinking on. This one is definitely more impressive than 3.1 Pro's. It might be hard to tell at first, but as I whip this globe around, it actually is creating real physics wave effects. If I zoom in on one of these islands, you can see it actually in progress. And there are a ton of settings to mess with. We can change the wave intensity. You can raise or lower the water level. And you can see it all interacts dynamically with the topology. If I lower the gravity a little bit and really start to whip the globe around, you'll see the water slosh and splash a lot more. Let me go find an island. There you can see it moving and sloshing around. Let's make some waves as well. This allows us to basically just spawn a bunch of water right on the surface. And this will really give you an idea of the

Segment 4 (15:00 - 20:00)

type of water effects that Claude 4.6 was able to produce with this one. We now have some massive tsunamis approaching this land. Let's try the suck mode to suck all the water back into the surface. Okay, it made some weird split in the center here. Let's try to open that land up so I can plant some lemon trees. Let's plant some trees down here. You can see those dynamic lemon trees are also definitely included in this version by Opus 4.6. It's a little bit hard to see, but yeah, they are indeed spawning and dropping some lemons, and they even have little imperfections on the lemons, which I thought was a nice touch. You can also do lemon rain. Overall, definitely super impressed by this. Now, let's pull up 5.4 Pro. Here is 5.4 Pro. I'd say this looks a lot more like the Opus 4.6 variant than the Gemini one. I think the water reacts a little bit more dynamically, though, along with the graphics. Especially if you zoom in here, you can see there's some nice shading and reflections going on with the water. This one also starts with lemon trees growing. I think the lemon trees and the lemons look worse on this one, though, in comparison to 4.6. And I do love that the lemons are just dropping off the surface of the globe; they aren't affected by the gravity. Very silly. It's just because the trees grow so tall. Interestingly, the gravity isn't centered around the planet on this variant. So, if I increase it, you can actually see all the water slosh down off the topology and eventually clump towards the bottom, which is really awesome to see. I believe this was all intentional, because the AI thought that this is what I was trying to go for: to see the water get pulled off the surface like that. If I spin this one around, the water seems to kind of disappear, and now I'm just sort of left with this empty planet full of lemons. But yeah, growing trees works exactly as you would expect, and you can drop the lemons the same.
This has the most impressive water physics for sure, and honestly the most impressive globe topology, but at the end of the day, I do honestly think that the Opus variant had a little bit more detail and kind of takes the cake. It is very, very close, though. And honestly, all three of these did this on first try. Guys, check this out: if I give it a little spin, we should be able to see the water all clump towards the center like that. And then look, if I stop it, it all spreads back out. Next, I sent it an MD file of a previous conversation I had with an AI chatbot about how memories and thoughts are recalled in the brain. So, I said, "Explore the concept in great detail. Build me an interactive web page to explore it." And while the front-end design is basic, this is so cool because of the interactivity. Concepts are not stored; they are assembled. This little lab-app interactive demo explores how concepts become temporary coalitions across perception, memory, meaning, body, and conscious access. To conjure a frog in your brain requires many different systems working together. Guys, tell me this isn't the coolest thing. This is a fully clickable and interactive brain model. It doesn't look like a perfect brain; I mean, it's a language model that made this, but it gets the general idea across and shows how all the different systems are connected. We can click on the language system. Down here, we have different dials for perception, meaning, and memory binding, and over on this side, we can adjust a novelty scale, the emotional charge through the system, and sensory richness. If the concept label is frog and we are simply recalling a frog, maybe we're recalling its croak, the word itself, its color, and its motion. These are all the different things that are lighting up in your mind: the visual cortex, the color and surface stream, the motion cortex. Such a cool educational tool for visualizing this idea. Down here, it gives us another animated view, also affected by these same dials.
Let's say you see frogs all the time, but you don't really touch them too much, and suddenly you've got to clean a frog tank. We'll have the experience mode set to "hold." Novelty is cranked up to about 70%, a little emotional charge, and flooded with sensory richness. Shape, color, motion, touch, the visceral experiences all come together, and that is how much your brain is apparently lighting up. It really goes on and on. The ability to make custom learning tools like this is so insane to me. Anything you want to know about can be explained in a visual way, deeply, with high levels of interactivity. The final test I ran was a deep-dive comparison: another interactive website, this one comparing two different types of cars. You can see a bunch of front-line comparisons and some cool cards here. It truly doesn't have a bad knack for front-end design. I've run this same exact prompt with Claude, and while it came very close to this, this is still a better result. It demonstrates all of these comparisons very clearly, and this interactive torque graph is absolutely the best one that I've come across for this prompt. Not to mention, this was a difficult point for Claude, Gemini, and ChatGPT previously: the 3D suspension component here, where we can actually have a real visual for how the different

Segment 5 (20:00 - 22:00)

types of suspensions on these cars compress. This is all very, very cool stuff, but it's not necessarily modeled with the utmost detail. Finally, before we close off, I want to touch on multimodality. It does appear that run-of-the-mill 5.4 thinking struggles a little bit with multimodality, especially compared to Gemini. Claude has never been a huge multimodality contender, and Gemini used to be terrible, but now it's actually the best. You can see 5.4 thinking failing simple stuff like this, or this much more difficult test that we watched Gemini 3.1 Pro fly through: a highly distorted Family Guy screenshot. It is unable to identify the characters as Peter Griffin and Brian; instead, it thinks it's Batman and Snoopy. Pretty close, but Google rules multimodality right now. So, when all the cards fall and the night is over, where does this leave us? Overall, I think 5.4 is a pretty stellar release from OpenAI. This is a highly capable LLM, and I will be daily-driving 5.4 because I want to see where all the cracks are. My previous daily driver was Claude Opus 4.6, and that is the closest model that I can relate this one to. It feels agentic, smart, and creative, and unlike Gemini 3.1, it has zero case of the lazy bones. It loves to work, work, and work. However, if I need a quick research-backed answer or something multimodal, it's Gemini all day for me, and that certainly won't be changing. 5.4 Pro, as we've seen today, is really the strongest of the bunch right now, but it costs a ton. This is going to be for those big, most difficult tasks: something that you think AI can't do right now, and you want to know if it can. With everything going on politically regarding OpenAI right now, this is definitely a distraction for them. And like I said, it is genuinely a strong model release. If you're refusing to use OpenAI's models based on a moral stance, which I totally understand and respect as a decision, I would caution folks against jumping straight to Claude or Google.
If that's the case for you, I strongly recommend checking out open-source models, especially Qwen 3.5. This model can actually run entirely locally on several Apple devices and on any decent gaming computer or laptop. Thanks so much for watching, folks. I hope this video was a good demonstration and helps you get a decent grasp of what we're dealing with here. I've got to go return that bulldozer to the guy I stole it from. See you.
