# Sam Altman Finally Admits It: "We Screwed Up"

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=2Lnyai0Q4GA
- **Date:** 03.02.2026
- **Duration:** 8:50
- **Views:** 42,764
- **Source:** https://ekstraktznaniy.ru/video/12192

## Description

Check out the Free Community: https://www.skool.com/theaigridcommunity
🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Interested in AI Business: https://www.youtube.com/@TheAIGRIDAcademy

Links From Today's Video:
https://futurism.com/artificial-intelligence/altman-openai-chatgpt-worse

Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos.

Was there anything I missed?

(For Business Enquiries)  contact@theaigrid.com

Music Used

LEMMiNO - Cipher
https://www.youtube.com/watch?v=b0q5PR1xpA0
CC BY-SA 4.0
LEMMiNO - Encounters
https://www.youtube.com/watch?v=xdwWCl_5x2s

#LLM #Largelanguagemodel #chatgpt
#AI
#ArtificialIntelligence
#MachineLearning
#DeepLearning
#NeuralNetworks
#Robotics
#DataScience

## Transcript

### Segment 1 (00:00 - 05:00)

So Sam Altman literally just publicly admitted that he messed up the new version of ChatGPT. So let's talk about it. Something crazy happened recently: OpenAI held an event, the OpenAI town hall with Sam Altman, where they talked about many different things. It was super interesting because they got to reveal the future of the company and certain things they're going to work on. But one of the things I found super interesting was that Sam Altman actually admitted that they accidentally made the new version of ChatGPT worse than the previous one. So, for one of the first times, Twitter is actually right about the performance of GPT-5.2 and how it performs. I'm going to show you guys a clip in just a moment, but in my personal experience, this model has been absolutely awful to use, and it's the reason I switched over to Gemini. I even asked some of you guys, and if you check the poll, you'll see that most of you have actually switched over to Gemini. So let's take a look at this clip from Sam Altman, because I think it's important to see exactly what he's talking about. And at least this time, I can say they're being transparent about some of the changes they made. — There's been a lot of discourse on Twitter/X recently about GPT-5's writing in ChatGPT being a little unwieldy, hard to read. Obviously GPT-5 is a much better agent model, really good at tool use, intermediate reasoning, and so on. So it feels like models are a little spiky, or have gotten even spikier, where some spikes like coding got super high, while it's very unspiky around writing. I'm just curious how OpenAI thinks about that. — I think we just screwed that up. We will make future versions of GPT-5.x hopefully much better at writing than 4.5 was.
We did decide, and I think for good reason, to put most of our effort in 5.2 into making it super good at intelligence, reasoning, coding, engineering, that kind of thing. We have limited bandwidth, and sometimes we focus on one thing and neglect another. But I believe the future is mostly going to be about very good general-purpose models. Even if you're trying to make a model that's really great at coding, it'd be nice if it writes well, too. If you're trying to have it generate a full application for you, you'd like good writing in there. When it's interacting with you, you'd like it to have a thoughtful, incisive personality and to communicate clearly. Good writing in the sense of clear thought, not beautiful prose. So my hope is that we push to get future models really good in all of these dimensions, and I think we will. I think intelligence is a surprisingly fungible thing, and we can get really good at all of these things in a single model. It does seem like this is a particularly important time to push on, let's call it, coding intelligence, but we will try to excel and catch up on everything else quickly. — And so in that clip you can clearly see that OpenAI's strategy was to simply focus on coding, because I think they realized that Anthropic is truly ahead when it comes to coding. And honestly, if you look at my channel for the past two days, you'll see that coding has been taking over in terms of what people are using and where AI's actual use cases are. Coding seems to be one of those things that is just super interesting. People are playing around with Moltbot and with Claude Code so much. I literally can't get away from the Claude Code environment. And it's not a bad thing; it's just people talking about a piece of software that they truly do love.
I mean, this is the SWE-bench leaderboard as of right now, and we can see that the current number one is Claude Opus 4.5, which is clearly a level above GPT-5.2 and the other models people are using for coding, like GPT-5.1 or GPT-5.2. So I think this is one of those cases where Sam was thinking, okay, maybe we decided to focus on the wrong thing, and sometimes when you try to fix one problem, you actually create more problems. Like I said at the beginning, I've used GPT-5.2, and it is genuinely a struggle, maybe not with basic things, but with instruction following and raw human understanding in terms of writing, it just doesn't work that well. And of course, this is where Google Gemini has come out on top. I'm seeing a lot of people switching. I'm actually going to make a video on the boycotts, because it's getting pretty crazy right now. But I think at least OpenAI has recognized that in chasing coding and being a good model, they've had to sacrifice some things. And they're in a weird spot right now, because OpenAI doesn't really have the entire coding audience. When you think about it, that's Anthropic's thing. As long as Anthropic keeps focusing on coding ability, the majority of their customer base is going to be exceedingly happy. So this is where we get the text from the article, which says the admission raises a high-stakes question: whether frontier AI models can continue to excel at tasks across the board, or if proficiency in one domain will come at the expense of a broader skill set. Now, I do think this is a very interesting question to

### Segment 2 (05:00 - 08:00)

ask: if you're going to chase coding, does the rest of the model suffer? I don't know. I would argue that Opus and the other Claude models are good at coding and good at overall writing. Someone said something that was really profound: the fact that Anthropic actually made their model somewhat honest and somewhat understanding means that maybe it's better in some way. And I know that sounds like complete nonsense that's not rooted in any kind of data, but I think they were saying that Anthropic trained the model to just be helpful, honest, and harmless. Maybe this is part of it, but I'm not too sure, because of course Anthropic are complete geniuses; they've clearly done something with the coding. But think about it like this: Anthropic's coding ability, if we look back at it, is actually very good. We know it is currently the best on the market right now; software engineers say it is the best model to use. And most people don't realize this, but if you're actually using Claude to write articles, to write any sort of blog post, or just to flesh out ideas, Claude is by far the best AI for that. And now, of course, there's this question from the article: will proficiency in one domain start to come at the expense of a broader skill set? I'm not sure that's going to be the case. I think maybe super niche areas might struggle, but the broader skill set should remain good as long as they train it the right way. And something to consider, someone said this, but I'm not sure how much it impacts the model: Claude was actually grown, I guess you could say, with constitutional AI. With constitutional AI, instead of humans constantly saying no, that's bad, and yes, that's good,
Claude was trained with a constitution that says: be helpful, be honest, don't cause harm, respect human values, and explain your reasoning where possible. Claude then basically rewrites its own outputs to better follow those principles. So I think it's going to be interesting to see how they train the models, and whether OpenAI will change the way they train theirs, because ChatGPT was trained with RLHF, where the model answers questions and humans rate the answers as good or bad, so the model basically learns to do what humans like. Some people have said that maybe that is why Anthropic's models are just winning across the board: the way they train the model gives it more agency. I don't know if that's going to turn into some AI-consciousness rant, but I think it's a conversation we'll have to have in the future. Now, of course, you can see here it says that a data scientist and tech blogger, Mahal Gupta, pointed out in a review of GPT-5.2 that there are plenty of signs the LLM is backsliding, and some of them aren't particularly subtle. He said those include a flatter tone, worse translation capability, inconsistent behavior across tasks, and a major regression in the instant mode setting, a mode meant to provide instant answers to simple questions. And like I said already, I agree with this, and I think you guys do too. I've shown you the poll. When I did it, I think 2,000 of you answered, and it was 50/50 between people who said they use ChatGPT and people who said they use Gemini. So, it was super interesting to see that result. And this article dives into the performance of the model and talks about the fact that factuality is better by their metrics, but not in everyday use: it's confident but wrong, with summaries making incorrect claims.
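To make the contrast concrete, here is a toy sketch of the critique-and-revise loop that constitutional AI is built around. This is an illustration only, not Anthropic's actual pipeline: the `model` function is a deterministic stub standing in for real LLM calls, and the constitution text is paraphrased from the video.

```python
# Toy constitutional-AI-style self-revision loop (illustrative, not Anthropic's real system).
# Instead of humans labeling each answer good/bad (as in RLHF), the model critiques its
# own draft against written principles and revises until no principle is violated.

CONSTITUTION = [
    "be helpful",
    "be honest",
    "don't cause harm",
]

def model(prompt: str) -> str:
    """Stub LLM: critiques flag the overclaiming word 'guaranteed'; revisions hedge it."""
    if prompt.startswith("CRITIQUE:"):
        return "overclaims" if "guaranteed" in prompt else "OK"
    if prompt.startswith("REVISE:"):
        return prompt.split("REVISE:", 1)[1].replace("guaranteed", "likely").strip()
    return prompt

def constitutional_revision(draft: str, rounds: int = 3) -> str:
    """Critique the draft against each principle; revise whenever a critique fires."""
    answer = draft
    for _ in range(rounds):
        revised = False
        for principle in CONSTITUTION:
            critique = model(f"CRITIQUE: does '{answer}' violate '{principle}'?")
            if critique != "OK":
                answer = model(f"REVISE: {answer}")
                revised = True
        if not revised:  # converged: no principle flagged the current answer
            break
    return answer

print(constitutional_revision("This fix is guaranteed to work"))
```

The key design difference from RLHF is where the feedback comes from: here the critique signal is generated by the model itself from a fixed list of principles, rather than from per-answer human preference labels.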
It also says long context is impressive on paper but messy in real workflows. And I agree with this wholeheartedly, because the other day I used this model and it kept hallucinating on some emails, and I thought, okay, I can't use this model for anything, because it hallucinated in a certain part of an email and I sent that email off. They responded saying, "What did you mean by this?" And I read the email and realized, okay, this is a mistake I didn't even catch. And yeah, GPT-5.2 is good at math, but because it's good at math and trying to get better at code, I'm not sure OpenAI has focused as much as they could have on writing and other subjects. So, it's going to be super interesting to see where it goes. But of course, let me know what you guys
