# OpenAI Released GPT-5.2 Is Not What You Think - You Should Be Concerned

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=5G8CIcSppng
- **Date:** 12.12.2025
- **Duration:** 10:39
- **Views:** 17,747

## Description

Check out my newsletter: https://aigrid.beehiiv.com/subscribe
🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Learn AI With Me : https://www.skool.com/postagiprepardness/about

Links From Today's Video:
https://openai.com/index/introducing-gpt-5-2/

Welcome to my channel where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos.

Was there anything I missed?

(For Business Enquiries)  contact@theaigrid.com

Music Used

LEMMiNO - Cipher
https://www.youtube.com/watch?v=b0q5PR1xpA0
CC BY-SA 4.0
LEMMiNO - Encounters
https://www.youtube.com/watch?v=xdwWCl_5x2s

#LLM #Largelanguagemodel #chatgpt
#AI
#ArtificialIntelligence
#MachineLearning
#DeepLearning
#NeuralNetworks
#Robotics
#DataScience

## Contents

### [0:00](https://www.youtube.com/watch?v=5G8CIcSppng) Segment 1 (00:00 - 05:00)

So GPT-5.2 is not what you think, and we need to talk about it. GPT-5.2 is here, and it's a noticeable leap in quality compared to the previous models and arguably even Gemini 3 Pro. Now, of course, most people are going to look at this model and think it was just released to combat the Gemini 3 Pro fanatics who were all over social media, including myself. And I will say, at first glance, it does look like this is just another model that is once again a leap in capabilities compared to prior models. Looking at the benchmarks, that's what you would presume. Not only does it perform better on the software engineering benchmark, on GPQA, and on other reasoning benchmarks, but interestingly enough it also does so on FrontierMath. So you would argue that this is just a somewhat expected model release from OpenAI.

Now, I'm not going to talk about benchmark maxing and all that stuff about overfitting and yada yada yada. What I will talk about is the real deal behind this model, because most people are going to miss this fundamental moment. I was debating whether or not to make the video with this narrative, but the more research I saw, the more I realized that this is true. So this is what you need to take away: GPT-5.2 Pro is the first model that is actually good for work. And I believe that this is probably one of the first times that we're getting an AI system that will fundamentally change the labor force. I know you guys might be thinking that is an extravagant statement. But just believe me when I say that as you dive into this video, you'll see why GPT-5.2 is a much larger release than previous ones. And I'm pretty sure OpenAI knew what they were doing when they released this.
So the slide that I've got on here is the GDPval evaluation metric, which is specifically designed for knowledge work tasks, guys, and these are basically white-collar jobs in a majority of different cities, or whatever you call it, the Western world. Now, on GDPval, and this is why I say this is the first model that just changes the game, you have to understand that GPT-5 Thinking went from 38%, which is well below human level, to roughly double that, around 74.1%. So it's winning most of the time in tasks that the majority of people in white-collar jobs do. And I'm not saying this to fearmonger or anything like that. I'm literally showing you guys the benchmark. This is a wake-up call for me, because I didn't think that AI development was going as quickly as it is. Even as someone in the space, I still have relatively short timelines. But every day that I'm in the space, I'm reminded that things are moving faster than even I think.

Now, when we take a look at this, you can see just how crazy this is. This is win rate versus an industry professional. So think about what kind of implication that has. If an AI system is in an industry and it wins 38% of the time, most people are going to dismiss that tool and say, "You know what, at 38% it's absolutely useless. I'm going to stick with a human." But at 74% of the time, you start to think, "You know what, maybe I'll start to use the tool." What do you guys think happens when we get to maybe 85% or 95%? Because of course we know that benchmark improvements are coming.

Now, it's important to know that I think the model was also able to do that because it hallucinates less. In the paper, you can see that GPT-5.2 Thinking hallucinates much less, which is honestly incredible, and I'm glad that they were able to do this. Of course, with hallucinations, I'm not going to get into the big thing.
I was reading a research paper the other day that basically says hallucinations are somewhat baked into the model. So that's kind of good news for you if you don't want to get replaced by AI, like nobody does. But this is the deal; I've done a detour. When you look at this section from OpenAI's page, this was, I guess you could say, one of the most interesting yet most concerning things, because what we got to see here was that the difference between GPT-5.1 Thinking and GPT-5.2 Thinking is completely stark. I know you guys can't see all of the specific details in the Excel tables, and if you're listening to the audio, I'll just describe it for you. With GPT-5.1 Thinking, you can see it produces a pretty basic, boring Excel table after being asked to create a workforce planning model (headcount, hiring plan, attrition, budget impact) that includes engineering, marketing, legal, and sales departments. And then with GPT-5.2 Thinking, you can see it does it in an incredible way. You can see this is actually using Excel. A tutorial will be coming on that later.

But this is what I mean when I say this is just a huge jump in capability that makes it useful for the actual real world. A lot of the time, one of the biggest criticisms that people make about AI is that it's not useful in the real world. It can't do this, it can't do that. But every single day, we're seeing those big changes. And I think somewhere down the line, OpenAI was like, okay, we've saturated the chat model. We've got images, audio, video, vision. Why don't we actually focus on real-world knowledge tasks, which is the majority of what enterprise users actually want? And here is the deal, and this is why I said

### [5:00](https://www.youtube.com/watch?v=5G8CIcSppng&t=300s) Segment 2 (05:00 - 10:00)

most people are looking at this saying, okay, it beats this AGI benchmark, it beats that benchmark. Guys, this is probably the model that makes the world wake up. I mean, look at this. Again, this is another task, and it compares GPT-5.1 and GPT-5.2 Thinking. It talks about the incorrect outputs on GPT-5.1 Thinking versus the correct outputs on GPT-5.2 Thinking. And once again, you can see the prompt here: you are a projects manager at a UK-based tech startup called Bridge Mind. Bridge Mind successfully obtained grant funding from a UK-based organization that supports the development of AI tools to help local businesses. This website provid... 5.2 Thinking, that is a huge increase in terms of the quality. It's not perfect by any means, but I think this is the model that makes people realize that, okay, this is clearly the world we're headed to. And it's going to be an interesting one, definitely, because I honestly don't know what's going to happen to the economy or the world.

But even on long-context tasks, we can see that once again, on the OpenAI MRCR v2 needle-in-a-haystack evaluation, out to 256k tokens GPT-5.2 Thinking basically doesn't miss any needles in the haystack. One of the issues, of course, was that with longer context you kind of miss the needles in the haystack. It's difficult to reason over really long pieces. I don't know what they did, but they did something right, because GPT-5.2 Thinking is basically perfect all the way up. Now, remember how I said that this model is basically like a human employee? They've also improved the vision as well.
So this is another benchmark-type thing, and you can see it's able to easily identify things in this image, and this image is ridiculously low quality. But remember, guys, this is one of the big jumps, and vision, all of these things, are just key components. I think people underestimate a 5 to 20% jump across the industry every month. That is huge. Imagine if you had an investment that jumped 20% a month; that would be generational wealth pretty quickly.

And vision, you know, as you can see here... and this one isn't even vision, this is the agentic benchmark. Those of you who pay attention to the AI world will know that Tau2-bench is one of those super difficult benchmarks. I remember looking at it thinking that it seems so difficult that this one is going to hold up for quite a while. But on Telecom, one of the benchmarks, it literally jumps from 47% with GPT-5.1 all the way up to 98%. So once again, a doubling, which is pretty crazy if you ask me.

So the point I'm trying to make here, guys, is that this is the kind of unlock, an economic unlock, which is I guess good for individuals and companies, project managers; maybe it's good if you always have to be in your job. But at the same time, this is the kind of model that shifts the economy in ways that I don't think the world is ready for yet, and we really need to talk about those implications. Of course we can be excited for the model and what it's able to do for us; it's going to be able to automate more work. But we do need to understand that this model is not what you think. It's not just another chat model. This is an AI agent that focuses on long-context tasks. Look at this.
So, if you think that this is just complete clickbait or just a crazy angle, OpenAI had this section on the web page where it's like, what does GPT-5.2 do for, I guess you could say, my industry. And it says here that "GPT-5.2 unlocked a complete architecture shift for us." Guys, this is why I'm saying this: you can see here that they said a complete architecture shift. That's wild. "We collapsed a fragile multi-agent system into a single mega-agent with 20+ tools. And the best part is it just works. The mega-agent is faster, smarter, and 100 times easier to maintain. We're seeing dramatically lower latency, much stronger tool calling, and we no longer need sprawling system prompts because 5.2 will execute off a clean, simple one-line prompt. It feels like pure magic."

So understand that this is GPT-5.2 with one update. Remember, it wasn't that long ago they released GPT-5.1, and in that short space of time it collapsed a fragile multi-agent system into a single mega-agent with 20+ tools. That kind of gives me the vision for the future, in the sense that these models are just going to be so smart that you won't need an entire crazy prompt chain. You'll probably just need to tell the model, "Hey, do this, do that, and you already know what I'm implying by asking this question within this context." And we're already seeing that crazy jump here.

So, for those of you who are thinking, what does this model even do? What should I be looking at here? Well, this is the model that is going to be able to automate a lot of work. It's a better agent. It can handle long-context tasks. And I think this marks the first shift of OpenAI actually moving towards the enterprise, because they've been losing a little bit in enterprise. It's not

### [10:00](https://www.youtube.com/watch?v=5G8CIcSppng&t=600s) Segment 3 (10:00 - 10:00)

a coding model. It's not a chatting model. This is a smart model that is built for work, for the workforce, for the economy, and we're seeing that unlock now. So I'm going to be doing a video on this later. If you guys subscribe to the channel, you can check it out. But this is the do-it-all agent. And I think next year is when we really start to see the impact of this, because things are just getting started. Agents weren't that good, but now they are good, which is pretty scary. But let me know what you guys think about this model. Let me know if you're surprised by this. Of course, it really did do well on a majority of benchmarks, which is once again surprising.

---
*Source: https://ekstraktznaniy.ru/video/12575*