Subscribe to the channel for more like this!
In this video, Igor breaks down OpenAI's new GPT-5.2 model, which is now powering ChatGPT for both free and paid plans. Enjoy!
Links:
🔑 Free ChatGPT Prompt Templates: https://bit.ly/newsletter-aia
🧑💻 Igor Pogany on LinkedIn: https://bit.ly/IgorLinkedIn
🐦 Twitter/X: https://bit.ly/AIAonTwitter
📸 Instagram: https://bit.ly/AIAinsta
https://openai.com/index/introducing-gpt-5-2/
Prompts:
create an svg of the death star in the sky above los angeles
Create a beginner-friendly monthly budget in Excel with 10 common expense categories, a place to enter income, formulas that auto-calculate totals and remaining balance, and conditional formatting that turns cells red when I overspend in a category
Create a tool to help me compare the total cost of a loan with 6.5% interest rate with no down payment vs. a loan with 5.5% interest rate with 20% down payment
Another day, another AI model, but this one is a big hitter. OpenAI released GPT-5.2, as you can see here. I have access in my account, I ran a bunch of test prompts, and I picked a few that I think might actually interest you. Some of them are the classic comparisons where you can see the progress, because earlier this week we had a massive announcement out of Google with their Deep Think model, which this is essentially on par with. Other prompts really show you things that can unlock productivity in your day-to-day, and that's what I'm here to do. My hope is that with this video, you'll see some things that GPT-5.2 can do that yesterday's GPT-5.1, which was the main model in ChatGPT, could not do, particularly creating spreadsheets. It's really good at that. And I have a bunch of comparisons between this and other state-of-the-art tools. So, with that being said, let's just get into it. Let's look at the overall announcement and then at a bunch of comparisons, because that's my personal favorite way to see how these models are progressing. It just feels more tangible, and then I know what to do and so do you.
All right, let's have a quick look at the announcement. You might have heard it already: GPT-5.2 is here. They've got to have a stockpile of these models. My sense is they already had this before GPT-5 was out and they were just waiting for the competition to come out. People on Twitter are talking about GPT-5 potentially being just a dumbed-down version of what we got here, because let me tell you, this thing is the real deal. It's on par with the competition and better in some cases. It's a really good model, and the sentiment across the internet is super positive. There's this cycle that you might be aware of. We're kind of right here. They're kind of just trumping each other. But what we're going to do here is look at examples.
But before we do that, I want to show you a few numbers, a few benchmarks. Not many. It's basically crushing 5.1 on everything. But what I care about is the comparison to the competition and the ability to actually create sheets. Look at that. This is 5.1 Thinking and 5.2 Thinking. Just last week I uploaded a video showing you a bunch of things that ChatGPT is not best at, and sheets was one of the big ones. Claude had that crown, and now we get to compare, which is nice.
Beyond that, on this ARC-AGI-2 benchmark, there's Gemini 3 Deep Think, which came out earlier this week. I know it's chaotic, but try to keep up. Gemini 3 Deep Think basically was the king on this very hard benchmark. Now, with GPT-5.2 coming out, you can see that it is on par. The funny thing is, when I look at this graph, they actually did not label Gemini, and this is not OpenAI tweeting it. This is the ARC Prize themselves tweeting it. So this right here, this little white triangle, is Gemini 3 Deep Think, their most advanced, most intelligent model, which thinks for multiple minutes every time you ask it something. It's on par with the new 5.2 Pro model. You can see the 5.2 family kind of dominating this area up here. And then Gemini 3 Pro, which a lot of people are using and switching to, honestly, is down here on the benchmark at around 32%, whereas this new model on a medium setting is at 38. On the lower settings of 5.2, you can see that it's right around here.
So in other words, what this means based on these benchmarks: if you're going to be using it just in the default mode, so you just go into ChatGPT and use the auto router, the automatic setting, it's on par with or slightly below Gemini 3 Pro. But if you switch to the Pro setting, which is behind the $200 plan, then it's on par with Google's, I guess it's like a $150 plan model, Gemini 3 Deep Think. All right, a lot of models, a lot of numbers, a lot of stuff.
Basically, they released a flagship model to compete with Google. That's what happened here. Okay, but what we care about is how it actually performs. So, let's go into some examples, starting with a visual one. This is not something you're going to use, okay? I just want something that lets you picture these benchmarks in a different way. I recently did this prompt and I really like it. It's basically: create an SVG of the Death Star in the sky above Los Angeles. I put this into GPT-5.2 Pro to start with, and this is one of the best examples I got. Look, this looks like a Death Star. This looks like LA. And these are palm trees, I suppose. I put the same thing into Google's Deep Think mode. If you're not familiar, you need to be on the Ultra plan here to access that. Just like in ChatGPT, you
need to be on the Pro plan to access this model. But when you're on there, you can run this and then wait for 10 minutes to get an SVG of the Death Star. Long story short, it looks like this. And I think these are some of the best ones we've seen, because when you compare this to some of the older models, it didn't even look like LA. They didn't even try to do palm trees. I mean, I'm not saying these are good palm trees. The Death Star was all over the place. No need to even pull them up. It wasn't even close, but all of a sudden these models, for the first time, create something that looks like what I asked for. I mean, sure, you'll be waiting for 8 minutes, but they do it. Okay, so that's a first view.
One extra note that I want to add: when you're using the Google model, you can't activate Deep Think, which is the advanced thinking, and the Canvas feature at the same time. The Canvas feature is what allows you to display this in here. ChatGPT has a more unified experience where you can do these things at the same time, which I really like. Just a little side note. So I had to open it in this external viewer.
But now let's look at the interesting thing. Let's look at its ability to generate sheets, because I created this. I basically went into Claude and ran this prompt: create a beginner-friendly monthly budget in Excel with 10 common expense categories, a place to enter income, formulas that auto-calculate totals and remaining balance, and conditional formatting that turns cells red when I overspend in a category. I really want to spend some time here to see how many of these boxes it checks, whether it fulfills all of them. So, this is the one that we got from Claude. I'm just going to download this. And previously, let me tell you, Claude was actually the king when it comes to creating sheets. They shipped an update first with Sonnet, and now with their newest model, Opus 4.5, and it was so good at sheets.
And it turns out this is one of those things where, if you learn that you can do this with AI, you'll just keep returning and doing it, because it becomes so easy and it's so convenient. It's actually something you'll start using if you discover this ability. Apparently, GPT-5.2 now has it, too. So, I ran the same prompt through it, and I wasn't really sure how to do this comparison, because look at this. If you go into Claude, this is not a model that's going to think for 15 minutes. This is not one of those Pro or Deep Think models like from the competition. It's like a light thinking model. It's always going to be pretty speedy. All you can do here is just say extended thinking and that's it. You can't really put Claude into a "go and think for 15 minutes" type of mode like you can with ChatGPT. So I think the fair comparison would be the first one, where I just set this to Thinking in ChatGPT instead of the Pro mode. This is also the mode that 99% of people will have, so I will compare it here. Okay, it still thought for 5 minutes, which is impressive, to say the least.
I'm going to download this, and then we're going to open these two Excel sheets and compare step by step. Afterwards we can look at the result that came out of the Pro model too. And I also have one more comparison. I was trying a bunch of stuff like retro games, but it's not worth showing. I don't think we want to waste any time on that. I want to show you what matters here. So, we're going to open up this first sheet. This is the one that Claude made. And then we're going to open up the second sheet. This is the one that GPT-5.2 made. I'm also going to open a little TextEdit file here. We can just do a new document, like so. I'm going to score these, and we're going to do this together. Okay, so basically there are a few categories that I want to be looking for, because this prompt asked for a few things.
Well, first of all, there should be 10 expense categories. Then we want a place to enter income. We want formulas to calculate totals and remaining balance. And then conditional formatting, meaning if a certain number goes over the budget, the fields turn red. Something very common, and it's really a nice-to-have thing in Excel. Okay. And then, look at that, look at my TextEdit skills, the first column is going to be ChatGPT 5.2 and the second column is going to be Claude Opus 4.5. Gemini is not even close in sheets, by the way. I tried it. So we're going to compare these two here. And with that being said, I think we're ready to have a look at these. Let's start with the first criterion as I resize this thing. I hope this is visible. I'm going to zoom in maybe a little more. That's nice. Okay. So, first category: 10 expense categories. 1 2 3 4 5 6 7 8 9 10.
This does it. Yep. Okay. Same here. Miscellaneous. Okay. So, that was a really easy one. I would say both of these succeeded on that category. What about the formulas, though? If I enter a budget of, you know, $100 here and 200 here. Oops, that's a mistake. 200. Yep, that auto-calculates. 100, 200. That also auto-calculates. Does it work in here too? 199 and one. And you can already see the conditional formatting at work. Yeah, this is working. Oh my god. Okay. So, it looks like both the formulas and the conditional formatting worked out here. Look at that. Formulas work. Conditional formatting also works. So, it's really a tie on these.
Now, let's have a closer look. Which one do we like more? I think styling-wise, it's kind of the same thing. Hm. You can plan, you can budget here. Let's say we have $1,000. Then this updates. I like that. It did the same thing over here because we asked for it. It added total expenses down here. You really have to be super nitpicky. This one gave you the how-to, which I like, but honestly, it's tied. And that's a big deal because Claude was excellent at this. So, let's close the one that we got from ChatGPT. My conclusion at this point, as I showed you, is that they're equally good. Now, you could run five more examples, but you'll find consistent results. It's just really good at sheets now.
And this is the one I got from Pro, I believe. I just want to make sure that's correct. This is the next application. Yes, monthly budget. Let's download this and see how this one turned out. This one did it in Czech currency, but it's essentially the same thing. So, there's no real practical difference with the Pro model generating it. I believe I switched this after the fact, or did I? There's no real practical difference between the Pro model and the Thinking model. I reviewed these results before shooting the video, just so you know.
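As a rough illustration of what both sheets are computing, here is a minimal Python sketch of the same logic: totals, remaining balance, and a flag that mirrors the "turn red when I overspend" rule. The category names, amounts, and the `Category` and `summarize` helpers are hypothetical and are not taken from either generated spreadsheet.

```python
from dataclasses import dataclass


@dataclass
class Category:
    """One budget row: a planned amount and what was actually spent."""
    name: str
    budgeted: float
    actual: float

    @property
    def overspent(self) -> bool:
        # Mirrors the conditional-formatting rule: flag when actual exceeds budget.
        return self.actual > self.budgeted


def summarize(income: float, categories: list[Category]) -> dict:
    """Compute the totals and remaining balance the prompt asks for."""
    total_budgeted = sum(c.budgeted for c in categories)
    total_actual = sum(c.actual for c in categories)
    return {
        "total_budgeted": total_budgeted,
        "total_actual": total_actual,
        "remaining": income - total_actual,
        "overspent": [c.name for c in categories if c.overspent],
    }


# Hypothetical example values, only two of the ten categories shown.
rows = [
    Category("Rent", budgeted=100, actual=199),
    Category("Groceries", budgeted=200, actual=1),
]
summary = summarize(income=1000, categories=rows)
print(summary)
```

In the spreadsheets themselves the same idea would be expressed with sum formulas and a conditional-formatting rule rather than code, so treat this only as a way to see what "the formulas work" is checking.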
So, I'll run this one more time. Maybe we can verify this once more, but there's no practical difference, which is kind of amazing, because then there's no real reason to get the $200 plan for these sheets. I like that. You can clearly see that this tool works. Yeah, it just works.
So, let's have a look at one more thing here, which is how good it is at creating these little applications. As I got more proficient with AI, I find myself more and more in this situation where I have a prompt that I want to use, and immediately my mind goes to: okay, what kind of dashboard or website or web app could this prompt be supported by, so that I use it more often and get more value out of it? It's so simple to vibe-code these things that it's sort of a default for me. So, let's have a look at something that is simple but valuable, and let's see the different approaches, because they really do differ. Look, I ran it through all the top three models: create a tool to help me compare the total cost of a loan with 6.5% interest rate with no down payment versus a loan with 5.5% interest rate with 20% down payment. And you can see that GPT-5.2 Thinking actually defaulted to doing this with a sheet. Now, I wonder, and this is just out of curiosity: if I run the same thing with the Thinking model on and I tell it to do it with Canvas, like so, will it vibe-code an app? But without telling it anything, it just defaulted to an Excel sheet, which I think is interesting. Let's have a brief look at that. There's a calculator, okay, with chart metrics. I don't really see the chart, so let's quickly download this and open it up. And then there seems to be... ah, no, it works. I just have to move this up. What a beautiful sheet. I mean, seriously, what a great sheet. Very nice. So a lot of this will be dynamic, I suppose. I can adjust these formulas. Everything is linked up. This just looks good.
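For a sense of the arithmetic behind a tool like this, here is a minimal Python sketch using the standard fixed-rate amortization formula. The prompt only gives the two rates and down payments, so the purchase price (400,000), the 30-year term, and monthly compounding are assumptions made here for illustration, and `monthly_payment` and `loan_summary` are hypothetical helper names, not anything the models produced.

```python
def monthly_payment(principal: float, annual_rate: float, years: int) -> float:
    """Fixed monthly payment for a fully amortizing loan (monthly compounding)."""
    n = years * 12          # number of monthly payments
    r = annual_rate / 12    # monthly interest rate
    if r == 0:
        return principal / n
    return principal * r / (1 - (1 + r) ** -n)


def loan_summary(price: float, down_fraction: float,
                 annual_rate: float, years: int) -> dict:
    """Total paid over the loan's life, counting the down payment as a cost."""
    down = price * down_fraction
    principal = price - down
    payment = monthly_payment(principal, annual_rate, years)
    total_paid = down + payment * years * 12
    return {
        "down_payment": down,
        "monthly_payment": payment,
        "total_paid": total_paid,
        "total_interest": total_paid - price,
    }


# Assumed inputs (not in the prompt): purchase price and loan term.
PRICE = 400_000
YEARS = 30

option_a = loan_summary(PRICE, 0.00, 0.065, YEARS)  # 6.5%, no down payment
option_b = loan_summary(PRICE, 0.20, 0.055, YEARS)  # 5.5%, 20% down payment

for label, s in (("6.5%, no down", option_a), ("5.5%, 20% down", option_b)):
    print(f"{label}: monthly {s['monthly_payment']:.2f}, "
          f"total paid {s['total_paid']:.2f}, "
          f"interest {s['total_interest']:.2f}")
```

This sketch ignores fees, taxes, insurance, and what the down payment could have earned elsewhere, so it is a starting point for comparing the two options rather than a full cost model.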
I like it. So this is what GPT did for the app, right? Claude took a very different approach, and I think this is interesting, because Claude is still the go-to model for development, for coding, and people know this. There have been a lot of discussions around the frontier coding model. At this point, almost everyone I talk to, most people you ask, they just use Claude Opus. It just works. It's just reliable. It's a bit ambitious sometimes. It goes over what you ask for if you leave the context kind of
undefined. That's a core characteristic. But for everything development-wise, it's just so good. And also, if you use Claude in the web app, it's just so good in how it functions. It just works, you know? You're in here and it tells you, look at that, better deal. You can readjust everything. These little apps, it's just good. What can I tell you? Detailed breakdown. We don't really have a graph here, which I kind of liked in my other app, but I could prompt for that, right? I could say "add graphs" and it'll figure that out. But yeah, this is typical Claude manner. It creates a custom app that is really good.
And then I did the same thing inside of Gemini. The thing is, you can't enable Canvas and Deep Think at the same time. So again, these are just the common models, okay? Not the $200 models, just the subscription models that you can also get on the free tier, with limited usage, but you can get them on the free tier, right? This is it. And this is what you get here. You get a bit of a visual down here. And I have to say, this result is so solid.
And that's kind of the story here. Honestly, that's the overarching story. All of these are so good, but there are specific things that differentiate them. With GPT-5.2 coming out, for me, one of the big differentiators was the fact that the sheets, the Excel sheets, used to work way better inside of Claude than here. Of course, there's image editing, which is better in Gemini than in GPT-5.2. But when it comes to work-related tasks, I always liked GPT-5.1. When it comes to thinking, strategizing, planning, brainstorming, it's fantastic at that. Now it's even better, incrementally. Barely noticeable in some cases, but it's better. But at these things where it had a massive deficit, like creating these sheets, or having top scores on the benchmarks and also reflecting that in complex tasks like creating SVG images or coding games, etc.,
on all of those things they were slightly behind Gemini. Now they're on par with Gemini, and they have the core capability that was one of the big reasons I used to go to Claude regularly. Now, let's have a quick final look at this. This is super solid. All of them are super solid. Actually, I think Gemini takes this one. What do you think, though? You can leave a comment below. As I ran this through the Pro model, yeah, when you run it through the Pro model, look, it thinks for all this time. But even with Canvas on, it kind of switches up. So anyway, that's the comparison here. This might run a bit more.
I want to do a final test, one spontaneous test, which is a simple writing-style test. Okay: write me an essay to my boss about the broken coffee machine in the office. Let's see. Thinking mode enabled. I'm going to look for the basic things I look for here. Okay. So, this is super lengthy. It wanted to flag an issue that's small in theory but has a surprisingly big impact. Okay. This is a very different type of writing. It does not give me a follow-up. Oh my god, this is so different. I did not expect this. Wait, I'm going to do one more, just without thinking. We'll do the Instant model. Usually I'm looking for follow-up prompts. I'm looking at the length of the message. I'm looking at how much it invents things and how many placeholders it keeps. Oh my god, this is super long, too. Okay, here it did the follow-up prompt. That's good. But there are no placeholders. It's sort of just assuming things for you. And you know what that tells me when I see that? This is what I would call a go-getter model, a helpful assistant that does not ask for context but does things. And I'm saying this because it aligns with some of the other tests I ran here.
And why does it make it so long? "Since the machine stopped working, the effects have been subtle but noticeable." "Reduced energy levels." "Investing in the repair or replacement of the coffee machine," and "the team and their collective energy levels would be extremely grateful." So yeah, it's making some assumptions, but good ones. It's nothing crazy. Hm. I don't know. You make up your own mind. The thing is, if you run this in any other model, even if I go into Gemini here, you will get something short and concise with placeholders. Not in this new model. Let's just go to the fast one and show you this quickly. And actually
I can retract that statement, because Gemini goes crazy here, too. Okay. I was under the impression that the other model... To be honest, I'm not the biggest Gemini user. You guys know that if you watch the channel. But yeah, if I run it through Opus, this will be a two-paragraph deal. Interesting. Yeah, there you go. Placeholders. This is what I like, and this is why I like Claude. So look, it's personal preference. I like this about Claude. I like its writing style. I like the fact that it's more concise. I like that I can create this dashboard, that it's state-of-the-art at coding, and that it also feels right when you code with it.
But for all these business tasks, ChatGPT has the tooling. ChatGPT has all the features. ChatGPT has the solid model. It thinks well. It writes well. It's just good overall. And now it can create good Excel sheets and presentations too. There are really not many reasons to leave the app. That's the reason they released this, that's why they came out with it, and I think they did well. So yeah, it's hard to recommend anything besides ChatGPT for most people, just because it has all the features in one and it just works. There are no convoluted menus or limited functionality. Sure, Gemini is good at taking market share, but if you had to make one recommendation, it's probably ChatGPT. I have all three, sure, as do many of you. But at the end of the day, you've just got to make up your own mind about what you need and what you like. I'd say test all of these. But yeah, ChatGPT back on top. Who would have thought? All right, I'll see you very soon.