# OpenAI FIRES BACK At Leakers On GPT-5s Performance

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=-e77suMqZVQ
- **Date:** 14.11.2024
- **Duration:** 13:17
- **Views:** 22,010
- **Source:** https://ekstraktznaniy.ru/video/13750

## Description

Prepare for AGI with me - https://www.skool.com/postagiprepardness 
🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Checkout My website - https://theaigrid.com/

0:00 Altman Response
0:47 Model Performance
1:40 Release Delay
2:28 Marcus Critique
3:25 Industry Struggles
4:10 Scaling Debate
5:10 Marcus Predictions
5:52 Current State
6:46 Benchmark Discussion
7:34 Reasoning Tests
8:32 MIT Research
9:16 Benchmark Leaders
10:13 Symbolic Approach
10:43 Performance Analysis
11:25 Paradigm Shift
12:04 Future Direction
12:56 Final Analysis

Links From Today's Video:
https://x.com/sama/status/1856941766915641580
https://x.com/GaryMarcus/status/1855382564015689959

Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos.


## Transcript

### Altman Response [0:00]

So Sam Altman actually just fired back at those who leaked information about GPT-5/Orion. In this video I'll dive into everything you need to know and why this is important for the future of AI. It all starts with a bombshell article, which states that OpenAI, Google, and Anthropic are struggling to build more advanced AI. I'm going to explain exactly what this means, but basically the article says that three of the leading AI companies are seeing diminishing returns from their costly efforts to develop newer models. And here's specifically what has Sam Altman firing back: it says that GPT-5 is essentially a disappointment. The model in question is known internally as Orion, which some would call GPT-5, since they have changed the naming sequence.

### Model Performance [0:47]

According to the article, the model did not hit the company's desired performance, per two people familiar with the matter who spoke on condition of anonymity, and as of late summer Orion fell short when trying to answer coding questions it hadn't been trained on. So they're basically saying that this model, the supposed successor to GPT-4o, or as many know it, GPT-4, isn't living up to internal expectations, which is quite disappointing considering the hype around OpenAI's next generation. But there's more: it says Orion is still not at the level OpenAI would want before releasing it to users, and that the company is unlikely to roll out the system until early next year, according to one person. And of course The Information also reported on this; that's the article where I spoke about how GPT-5 is experiencing a slowdown.

### Release Delay [1:40]

You can see from this that the article is basically stating that this model, which was meant to come out later this year and is supposed to be GPT-5, isn't currently performing at the level they thought it would, and because of this it's now going to come out even later. A lot of this is actually rather fascinating, because some individuals in the AI community have been predicting it for quite a while, and this is where we get into the drama with Sam Altman. Right now it seems as if AI is slowing down, at least with the GPT series. Prior to this, Gary Marcus wrote an article stating that deep learning is hitting a wall, asking what it would take for AI to make real progress.

### Marcus Critique [2:28]

To summarize the article quickly: Gary Marcus critiques the limitations of deep learning, arguing that while AI excels at pattern-recognition tasks like image and speech processing, it falls short in areas requiring real reasoning, common sense, and understanding. He says that deep learning models often function as black boxes, lacking transparency and interpretability, which raises concerns about their reliability in critical applications such as radiology, autonomous vehicles, and other things we're going to need in real life. One of the takeaways I need you to hold on to, because we're going to come back to it later, is that he advocates for a more integrated approach to AI, combining deep learning with symbolic reasoning to address these shortcomings. His perspective is not a popular one, but he's been very vocal about it for quite some time, especially since March 2022.

### Industry Struggles [3:25]

Now remember that statement: deep learning is hitting a wall. It's a line that has been echoed in the AI community for quite some time, and every time AI makes an advancement, people mock it, saying "haha, deep learning is hitting a wall" and then showing astonishing new benchmarks. With all of this talk in the space, with articles saying these companies are struggling to build advanced AI, that GPT-5 is apparently a disappointment, and of course referencing that 2022 piece claiming deep learning is hitting a wall, Sam Altman tweeted yesterday, or perhaps today, simply: "there is no wall." Like I said, this refers to the article released two years ago. And Sam Altman isn't the only OpenAI person pushing back on Twitter.

### Scaling Debate [4:10]

Will Depue, who actually works on Sora at OpenAI, also stated something rather fascinating: that the only wall scaling has hit is 100% eval saturation. If you don't know what this means, he's basically saying that the only wall AI is going to hit is one where all of the current evaluation methods are completely blown out of the water, meaning the benchmarks we measure our models against will be completely saturated by the time 2025 is in full swing and we have future iterations of certain models. And he isn't the only one saying benchmarks are going to be saturated: in the OpenAI Ask Me Anything, one of Sam Altman's bold predictions for 2025 was that OpenAI will be able to saturate all the benchmarks, which is a pretty bold claim considering how difficult many of these benchmarks seem to be and the fact that a lot of them are supposed to stand the test of time.

### Marcus Predictions [5:10]

Now remember the previous statements: deep learning is hitting a wall, and news articles are stating that these companies are struggling to build advanced models. What I want to show you is something Gary Marcus recently said: that it's game over, GPT is hitting a period of diminishing returns, like he said it would. And honestly, credit where credit's due, because Gary Marcus did predict that by the end of 2024 we would see: 7 to 10 GPT-4-level models; no massive advance; either no GPT-5 or a disappointing GPT-5; price wars; very little moat for anyone; no robust solution to hallucinations; modest, lasting corporate adoption; and modest profits, split 7 to 10 ways.

### Current State [5:52]

I have to be honest: as someone who's been in the AI community for quite some time, this is something I have witnessed at a scale you wouldn't believe. There are 7 to 10 GPT-4-level models. There hasn't been any massive advance in the GPT series (don't worry, I still know o1 exists). And recently there has been no GPT-5, or a disappointing GPT-5, according to these leakers, the people who have spoken to reporters. Of course there have been price wars, and one of the crazy things is that there is still no robust solution to hallucinations; even in a recent AMA, OpenAI literally said hallucinations are a huge problem they're still working on. Now remember, all of that is about the GPT series, because there is a lot of information about the o1 series that I think a lot of people are ignoring, and this is where Sam Altman is hitting back and where you actually need to pay attention, because this is where things start to get interesting.

### Benchmark Discussion [6:46]

Someone on Twitter asked Sam Altman about Chollet's ARC eval. Chollet's ARC evaluation is basically the toughest evaluation there currently is for LLMs. There are a variety of different LLM benchmarks and evaluations, but this one is the toughest because of the way the questions are set up: the LLMs literally cannot memorize what's going on. They designed it that way because they want to test how an LLM truly reasons about a problem it has never seen before. A common critique of large language models is that since you're packing all this data into them, you're essentially just fitting the model with the data it needs to answer the questions, building a system that retrieves the answers at test time rather than actually reasoning about the solutions.
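For anyone who hasn't looked at the benchmark itself, ARC tasks are distributed as small JSON files: a handful of demonstration input/output grids under "train", and held-out inputs under "test" whose outputs the solver must produce from the demonstrations alone. A minimal sketch of that layout, with a made-up task (real ARC puzzles are far harder than this column swap):

```python
import json

# A made-up task in the ARC-AGI JSON layout: demonstration pairs
# under "train", held-out inputs under "test".
task = json.loads("""
{
  "train": [
    {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
    {"input": [[2, 0], [0, 2]], "output": [[0, 2], [2, 0]]}
  ],
  "test": [
    {"input": [[3, 0], [0, 3]]}
  ]
}
""")

# The hidden rule in this toy task is "swap the two columns"; a solver
# only ever sees the train pairs and must infer that rule itself.
def swap_columns(grid):
    return [row[::-1] for row in grid]

# The rule must fit every demonstration pair...
assert all(swap_columns(p["input"]) == p["output"] for p in task["train"])
# ...and is then applied to the unseen test input.
print(swap_columns(task["test"][0]["input"]))  # [[0, 3], [3, 0]]
```

Because the hidden rule differs for every task, memorizing the training corpus doesn't help, which is exactly the memorization resistance discussed above.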

### Reasoning Tests [7:34]

This evaluation was built to test how models reason on problems they could never have studied before. In regard to this evaluation (the examples on screen show how you'd solve such a problem), Sam Altman replied: "in your heart, do you believe we've solved that one, or no?" And in my heart, I do believe they've solved it already. Let me tell you why I believe this ARC-AGI benchmark, which is supposed to be resistant to memorization and to prove whether or not we're on the right track to AGI, has actually been solved. Remember the video I posted yesterday? I spoke about the surprising effectiveness of test-time training for abstract reasoning, and the main takeaway from that paper was that the authors achieved state-of-the-art public-validation accuracy on the ARC-AGI benchmark, basically matching the human score on the public set.

### MIT Research [8:32]

And this was done by people from MIT. Testing on this very tough benchmark, they managed to get roughly 62%, using a variation of test-time compute, the same family of methods OpenAI is using. So think about it like this: on Chollet's evaluation, the toughest there is for LLMs, people at MIT managed 61.9%, which matches the human score. So what do you think OpenAI has done with that benchmark, considering they've managed to build an entire model around this method? It's quite likely that future iterations, o2 and o3, are going to surpass what we think is possible. And what's crazy about this super-difficult benchmark is where the current top scores come from; you really need to pay attention here.

### Benchmark Leaders [9:16]

For this benchmark, the highest score we currently have comes from a pretty different approach. Take a look: the first entry is Ryan Greenblatt, and two steps down is o1-preview. The most interesting thing is Ryan's approach. Remember how I said at the beginning of the video that one of the main critiques of OpenAI's GPT-series models is that the model doesn't learn, and that according to Gary Marcus it needs a neurosymbolic approach, something it doesn't currently have? Well, Ryan's method of tackling the ARC-AGI benchmark actually uses GPT-4o with a neurosymbolic approach, and you can see he managed to get the highest score. His approach uses an LLM with a discrete program search, which is actually very similar to what OpenAI has done with o1.
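As described in Greenblatt's public write-up, the approach samples thousands of candidate Python programs from GPT-4o, executes each one against the task's demonstration pairs, and keeps only programs that reproduce every training output. The symbolic half is that final filter: programs are checked by running them, not by asking the model. A stripped-down sketch of the search loop, where the `candidates` list is a hand-written stand-in for LLM samples:

```python
def search_programs(candidates, train_examples):
    """Return the first candidate program that reproduces every
    demonstration output exactly, or None if none fits."""
    for program in candidates:
        try:
            if all(program(inp) == out for inp, out in train_examples):
                return program
        except Exception:
            continue  # sampled programs may crash; just skip them
    return None

# Stand-ins for thousands of LLM-sampled candidate programs.
candidates = [
    lambda g: g,                                    # identity
    lambda g: [row[::-1] for row in g],             # mirror each row
    lambda g: [[c + 1 for c in row] for row in g],  # shift every color
]
train = [([[1, 2]], [[2, 1]]), ([[3, 4, 5]], [[5, 4, 3]])]

solver = search_programs(candidates, train)
print(solver([[7, 8, 9]]))  # the row-mirror program fits: [[9, 8, 7]]
```

The execution check is what makes this "discrete": a program either matches all demonstrations or is thrown away, so wrong samples cost compute but never pollute the answer.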

### Symbolic Approach [10:13]

So, extrapolating all of this: on one side we have Gary Marcus saying that deep learning is hitting a wall, that he told us we need neurosymbolic AI, and that he's completely right about GPT slowing down. And he is right insofar as the GPT series is actually slowing down. But what seems to be happening now is that another paradigm is quickly gaining traction, because if we look at this evaluation, one that current LLMs really struggle with, we've already seen people at MIT manage to crush the benchmark and achieve basically human-level reasoning.

### Performance Analysis [10:43]

We've also already seen someone use a neurosymbolic approach to get 72% on one version of the benchmark and 43% on the private one, which is essentially the direction OpenAI is pursuing. If that's the case, I wouldn't be surprised if OpenAI and Sam Altman's team have already cracked this incredibly difficult evaluation, because other individuals have gotten really close to human-level performance, so what's to say a lab with billions of dollars wouldn't have found even better techniques than what's out there publicly? And what's crazy is that this same family of techniques is known for being used in AlphaGo, one of the first systems to reach ridiculous superhuman performance in a game.

### Paradigm Shift [11:25]

I'm sure you're all familiar with how crazy AlphaGo is. So basically, I think this is a situation where everybody wins, because while some people argue about whether AI is slowing down or speeding up, what's quite likely is that we have another S-curve: the GPT-series paradigm is potentially slowing down, and while some people say that isn't the case, I think evidence from all three frontier labs is pretty hard to ignore, even if it comes from leaks. It's not like any company is going to come out and publicly admit that their models are slowing down, even for a prior series they might not be focusing on. Meanwhile, it's quite likely that the o1, test-time-compute paradigm is going to be the future of AI that reasons in an advanced way.

### Future Direction [12:04]

Honestly, I think this is something the majority of normal people don't need to worry about, because o1, o2, o3, those kinds of models, are going to be used in areas where you really need super-advanced reasoning, which is not something you use on a day-to-day basis; most people are going to be pretty happy with a GPT-4-level model that performs well on everyday tasks. As for benchmark saturation, I wouldn't be surprised if benchmarks do get saturated, because when we look at what Sam Altman is saying, and remember the test-time compute graph, these models get better when submitting more samples. For example, when o1 was allowed to submit 10,000 samples per problem, it achieved the gold-medal threshold on one of the competitions. So for other problems, what would happen if the model were able to think longer, or submit 100,000 samples, on a problem that's even more difficult?

### Final Analysis [12:56]

Could it increase the benchmark score by, let's say, 2% each time? It's quite likely that those benchmarks will become increasingly saturated as we increase the submission size. Overall, I think the statement "there is no wall" is pretty accurate, and I do think the current state of AI is very confusing, because on one side, yes, some people are right that AI is slowing down, but on the other side, others are right that this next paradigm is only just beginning.
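The arithmetic behind that saturation claim is worth seeing. If each independent sample solves a problem with probability p, the chance that at least one of k samples succeeds is 1 - (1 - p)^k, which climbs toward 1 surprisingly fast, assuming you also have a reliable way to recognize the correct sample. A quick sketch:

```python
def pass_at_k(p_single, k):
    """Probability that at least one of k independent samples is
    correct, given per-sample success probability p_single."""
    return 1.0 - (1.0 - p_single) ** k

# Even a 1%-per-sample solver looks strong at large k.
for k in (1, 10, 100, 10_000):
    print(k, round(pass_at_k(0.01, k), 3))
```

This is also why brute sampling alone doesn't saturate a benchmark: with k = 10,000 the one correct answer is buried among thousands of wrong ones, so some verifier or voting scheme still has to pick it out.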
