OpenAI Might Be In Big Trouble...
Duration: 18:35


TheAIGRID · 06.03.2025 · 53,597 views · 1,153 likes


Video description
Join my AI Academy - https://www.skool.com/postagiprepardness
🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Checkout My website - https://theaigrid.com/

00:00 Intro
00:24 Satya Nadella on AGI Definition
00:48 Cognitive Labor and AGI
02:09 Dynamic Nature of AGI
02:32 Sam Altman's AGI Perspective
03:11 Nadella on AGI and Economic Growth
04:04 Real-world Value vs Benchmark Hacking
06:01 AI Infrastructure Investment Risks
06:14 Supply and Demand in AI
07:08 OpenAI’s AGI Levels Explained
08:02 CEO Predictions on AGI Timeline
09:17 Andrew Ng's View on AGI Timelines
10:13 AGI Complexity Explained
13:14 Missing Components for AGI
14:24 Comparing Human and AI Intelligence
16:06 Narrow AI vs. AGI
17:04 OpenAI’s Superintelligence Goals
18:20 Conclusion and Final Thoughts

Links From Today's Video:
https://x.com/ns123abc/status/1895900960519831576
https://x.com/paul_cal/status/1891896917161906204
https://www.reddit.com/r/singularity/comments/1iqjmng/warning_about_perplexity_ai_deep_research_it/
https://x.com/kimmonismus/status/1895459347750404412
https://siliconreckoner.substack.com/p/the-frontier-math-scandal

Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos. Was there anything I missed?

(For Business Enquiries) contact@theaigrid.com

Music Used:
LEMMiNO - Cipher https://www.youtube.com/watch?v=b0q5PR1xpA0 (CC BY-SA 4.0)
LEMMiNO - Encounters https://www.youtube.com/watch?v=xdwWCl_5x2s

#LLM #Largelanguagemodel #chatgpt #AI #ArtificialIntelligence #MachineLearning #DeepLearning #NeuralNetworks #Robotics #DataScience

Table of contents (18 segments)

Intro

So OpenAI actually might be in some trouble, and it's related to their recent model. Now, I personally think the recent model is really decent, but many people don't think so. Currently there is this idea that GPT-4.5 simply isn't that good of a model, and that apparently OpenAI is destined for failure. So without further ado, let's dive into: is OpenAI in trouble, and are they slowly falling

Satya Nadella on AGI Definition

off a cliff? One of the recent articles I came across, recently published by Futurism, was this one right here. It says OpenAI may have really screwed up with GPT-4.5, and that the hype is dying. This is in reference to the fact that GPT-4.5 was meant to be a frontier model. Now, currently it's not a frontier model in the sense of being the best in all

Cognitive Labor and AGI

categories, but I think maybe the marketing surrounding the model didn't do it any justice. One of the things they actually did was state that GPT-4.5 is the largest and most knowledgeable model yet, while managing expectations by cautioning that it's not a frontier model. So I think there were definitely some marketing issues, like I was talking about before, because they're essentially trying to have it both ways: you can't call it the largest and most knowledgeable model whilst also stating that, look, it's not going to be the best model. Now, I've done a video on GPT-4.5 that you guys may want to check out, but this article has its point. They say that's probably because the company knew the public response was going to be muted. They reference AI critic Gary Marcus calling the LLM a "nothing burger", and an anonymous expert who told Ars Technica that the model is a "lemon", and that the amply hyped new model appears to be seriously lacking the type of juice that made the original ChatGPT, or its follow-up GPT-4, become enormous cultural and financial touchstones. So in this article, where they're saying OpenAI may have screwed up with GPT-4.5, I, on the wider spectrum, don't think this is that big of an issue, because I do know what OpenAI are trying to do, and I'll talk about that later on. But I do think that overall it doesn't

Dynamic Nature of AGI

look good from an outsider's point of view, because they were hyping up new models, and then they released GPT-4.5 and the public reaction was quite bad. Now, we also need to take a look at GPT-4.5 and what was being said before. If you remember, earlier articles from places like the Wall Street Journal and Bloomberg actually had a lot of

Sam Altman's AGI Perspective

information regarding leaks. They spoke about how the project called GPT-5, code-named Orion, had been in the works for more than 18 months and was intended to be a major advancement in the technology that powers ChatGPT. The closest partner and investor, Microsoft, had expected to see the model around mid-2024; that would have been seven or eight months ago now. And apparently OpenAI conducted at least two large training runs, each of which entailed months of crunching huge amounts of data with the goal of making it smarter, and every single time they did this, apparently new problems arose and the software fell short of the results

Nadella on AGI and Economic Growth

researchers were hoping for. That's what people close to the project were saying, and they say at best Orion performs better than OpenAI's current models, but hasn't advanced enough to justify the enormous cost of keeping the new model running. And I think this is pretty crazy, because we've got a situation on our hands: this article says GPT-4.5 was apparently meant to be called GPT-5. If we deduce that they were talking about a model that was quite expensive but didn't live up to its expected capabilities, then maybe they actually just renamed GPT-5 to GPT-4.5, because you have to remember that GPT-4.5 was code-named Orion, and that's what they were saying here about GPT-5. So my best guess is that maybe OpenAI did actually make GPT-5, which is currently GPT-4.5, and since GPT-5 was probably a

Real-world Value vs Benchmark Hacking

disappointment in terms of not getting the results they were expecting, maybe the thinking was: since it's not that big of a deal in terms of the performance you're expecting, we should probably call this model GPT-4.5. All of this, where they're talking in quotes about GPT-5, looks exactly like what probably happened to GPT-4.5, the model we currently have now. And it is really expensive to run; if you're not in the Pro tier, I think, you don't even have access to it. It is a good model, but I certainly don't think it's worth the $200 a month for marginally better performance. Now, of course, you can see right here we've got GPT-4.5, and this is where Chubby says: judging by the mood, GPT-4.5 is the first big failure of OpenAI; too expensive, too little improvement, and often inferior to GPT-4o in creative answers in community tests. I think that statement is half true and half not: of course it's probably the first big failure of OpenAI, but I do believe it's not inferior to GPT-4o, and I'll get into that in a moment. Gary Marcus did quote-tweet this, saying, look, I've been trying to tell you guys about this for some time now. He said it's a big surprise only to you and the many other OpenAI fans who refused to listen to me when I patiently and endlessly explained that this would be exactly what would happen; for the rest of us, this was essentially inevitable. Now, Gary Marcus has been quite critical of OpenAI for some time. He's openly spoken about the things he believes OpenAI has done wrong, in terms of them not having any moat and, of course, having major problems. Here you can see he dives into some really key details on why he believes OpenAI is in serious trouble. He says, of course, they've got a brand name, which is good, but GPT-4.5 is hugely expensive, which I do agree with, and he says even so it offers no

AI Infrastructure Investment Risks

decisive advantage over competitors, which means they have zero moat, which, in the grand scheme of things, is actually quite true. Scaling hasn't gotten them to AGI; I do think that is true, but at the same time AGI will come later. The GPT-5

Supply and Demand in AI

project was a failure, some people are wondering. Then there's DeepSeek, which led to a price war; of course, I will speak about this as well. And OpenAI is, of course, losing money. One of the key things he does talk about is that many top people have left, and I do think this is unfortunate, but these people are essentially multi-multi-millionaires, because of course they've had equity in OpenAI, and I do think some of them maybe just want to pursue creative projects in AI rather than working ruthlessly at one of the most competitive startups. We can actually see what Gary Marcus says in a recent interview, where he talks about the fact that OpenAI might not be doing so well. Now, I'm always going to add this caveat: Gary Marcus is a known critic of the AI industry. He's critical of a lot of things in ways I don't think are always warranted; sometimes he does make good points, but sometimes it seems like his statements don't come from a place of genuine criticism, but rather, I'm not even sure where they come from. I just wanted to add that, because it's important to

OpenAI’s AGI Levels Explained

understand viewpoints. "I think that OpenAI is highly overvalued. I think we just saw their business model sort of blow up yesterday, or over the last few days, with DeepSeek basically giving away for free what they wanted to charge money for. Also, I think that DeepSeek is more open than OpenAI, and that's going to be attractive to talent. I don't think things are looking great; I think a $157 billion valuation is hard to bear out when you're losing $5 billion a year, or thereabouts." Now, this is why I said I didn't really agree with Chubby's statement: I agreed with half of it, but the other half I just didn't. Basically, the reason I said that is because Andrej Karpathy did some community testing and 80% preferred the answers of GPT-4o over GPT-4.5. But the reason I said this doesn't make sense as a valid test, whilst yes, it is kind of a test, is that the only

CEO Predictions on AGI Timeline

problem with this is that it's a very small sample size: you only have five questions, and five questions that Andrej Karpathy has asked isn't a large enough sample to be judging a model. It's like me saying, I've got five questions I've asked a model, pick between A or B. The reason I stand by that statement, of saying, look, if I had five prompts that I showed and people just preferred one model over the other, the reason you use other methods is that the larger the sample size, the more holistic the view you're going to get. So, for example, look at this, which was actually released while I was making this video: GPT-4.5 managed to top the LMSYS Arena. We can see right here that the GPT-4.5 preview is in an area where OpenAI actually managed to get themselves to number one, and this is pretty crazy, because it was only a week ago that Grok 3 became number one across all of the categories. And here we can see there were 3,000 votes, which is of course a much bigger sample size in terms of having users talk to the model and judge its actual responses, and considering that this is something OpenAI have essentially said is going to be number one, clearly this makes sense.
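As an aside on the statistics here, the gap between a 5-vote poll and a 3,000-vote leaderboard can be made concrete with a confidence interval for a preference rate. This is a minimal sketch; the 4-of-5 and 3,000-vote numbers are illustrative inputs only, not data from the video:

```python
import math

def wilson_interval(successes, n, z=1.96):
    """95% Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - margin, center + margin

# 4 of 5 preferences (a Karpathy-style mini poll): the interval is so wide
# it is consistent with either model being preferred overall.
print(wilson_interval(4, 5))        # roughly (0.38, 0.96)

# 2400 of 3000 preferences (an Arena-scale sample): a tight interval.
print(wilson_interval(2400, 3000))  # roughly (0.79, 0.81)
```

With only five votes, the interval reaches well below 50%, so the poll can't distinguish "preferred" from a coin flip; at Arena scale, it can.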

Andrew Ng's View on AGI Timelines

So if we want to look at other categories, we can see that in coding, math, and hard prompts it is number one, which is genuinely quite surprising considering other models are also pretty decent. So I don't know what it is about the vibes of the model; although that's a very vague term, the vibes clearly are real, and some small-scale user testing, even by goats like Andrej Karpathy, still doesn't account for the large-scale testing we can see here, where you can view exactly how these results come out. As well, with style control, GPT-4.5 actually manages to beat out the other models by a large jump in overall Elo rating, which is rather surprising. Now, remember the DeepSeek thing, which, as Gary Marcus said, kind of messed up OpenAI's launch. I do think certain industry partners are leaving OpenAI due to them being able to use open-source models.
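For context on the Elo numbers the leaderboard reports: Arena-style rankings are fit from many pairwise votes (LMSYS actually fits a Bradley-Terry model; the classic online Elo update below is a simplified sketch of the same idea, not their exact method):

```python
def elo_update(r_a, r_b, score_a, k=32):
    """One online Elo update from a single pairwise vote.
    score_a is 1.0 if model A wins, 0.0 if it loses, 0.5 for a tie."""
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))  # win probability implied by current ratings
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

# Two equally rated models; A wins the vote, so A gains exactly what B loses.
print(elo_update(1000.0, 1000.0, 1.0))  # (1016.0, 984.0)
```

Because the expected score is baked in, beating an already higher-rated model moves the ratings less than an upset does, which is why thousands of votes are needed before the rankings stabilize.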

AGI Complexity Explained

One other thing, and I don't remember who said this, but they basically said OpenAI doesn't have a moat, in the sense that if someone can offer a foundation model for the same price or cheaper, then the individuals working with OpenAI would work with those other companies instead, and have it completely open source, or basically for free, hosting it themselves. And we recently had that confirmed when Figure, the company that produces humanoid robots, terminated their agreement with OpenAI; in an interview the CEO said that everything they are currently using now is open source, because of course they had a breakthrough and they believe they are the very best at embodying humanoid robots with AI: "Like trajectories to go and grab something, right? That doesn't sit in the LLM; it just has no idea what that looks like; there's no robot data in there. So, yeah, we use open-source models today, but with our own models internally and our own data collection that we do internally on the robot, and we basically build the foundation models ourselves. We've been doing that almost exclusively, basically ourselves, for a year, and we think we're the best in the world at doing this on robotics; I don't think there's anybody that's demonstrated better AI learning on robots than we've put up publicly." And now, if we're going to talk about other things, we also have to talk about Deep Research hallucinations. This video isn't an attack on OpenAI; it's to critique certain things and also to ask the question: are OpenAI genuinely in trouble? Basically, Deep Research is OpenAI's flagship tool that I've seen a lot of people talk about, and they say the tool is really good, but there is currently a flaw in Deep Research that, I don't want to say, makes it unusable, but in certain aspects I would argue: what would be the point of using it if this is the case? So basically, this person said that this is the worst hallucination they've seen from a state-of-the-art LLM in a while, and I think this is basically true for all LLMs; this isn't just OpenAI's fault, it's something that goes across every single research tool. It says Deep Research made up a bunch of stats and analysis while claiming to compile a dataset of thousands of articles, and supposedly gathered birth info for each author from reputable sources, but they did the checking and none of this stuff is true. This is of course a big issue, and someone on Reddit put it in probably the best way: it is incredibly dangerous to have a research tool that gives you hallucinations, because it renders the entire point of deep research rather pointless; you will need to go and personally verify every single source to confirm the accuracy. And I think that is true: what is the point of a research tool that does the research for you if you need to go and confirm that those sources are legitimate? You might as well have just done the research yourself. Now, of course, maybe it saves you time or

Missing Components for AGI

whatnot, but isn't the point to have everything solidified and verified, so that you can say, okay, I've got this research report, and now I only need to skim through it and I've got the main details? And like I said, this goes for all deep research tools. I would like to see an update where they actually take the sources and have an AI that's able to click each link and verify it, so there are green ticks on verified sources, or, if it isn't sure about certain sources, some kind of "not sure about this source" flag, because this issue renders the tool quite useless, I would say, and it actually makes me a bit apprehensive to use it: if I'm using an AI tool to do research and that research comes back with hallucinations in it, that is of course a big issue. Now, remember ages ago when they were stating that these models could begin to automate large portions of the economy, and that they believed the companies that train the best 2025-to-2026 models would be too far ahead for anyone to catch up in subsequent cycles? Was it just hype? Because right now we've got a situation on our hands where even the people training the biggest models aren't that far ahead of the competition, and it seems like things are changing. And it was Ilya Sutskever who said that pre-training as we know it will end: of

Comparing Human and AI Intelligence

course, compute is growing, but the data is not growing, so there will be new innovations on this side. And I'm actually quite bullish here: the fact that these larger models are not producing better reasoning means, I think, that the real work in AI has now started, because the easy pickings have gone. Before, it was just, okay, make it bigger and it becomes smarter; now they have to think about new, innovative ways, which means we're probably going to get some really smart systems in the future, because people aren't just staring down one tunnel, they're exploring multiple solutions. Another thing I thought was a really bad look for OpenAI, and like I said before, I'm not bringing this up just to critique OpenAI, but does it mean that OpenAI are in trouble? This was something that, I don't want to say upset me, but it definitely threw me off, because I didn't expect it at all. This right here is FrontierMath. Essentially, there was an AI benchmark that a lot of time was spent on, and the idea was that if any AI is able to solve any of these problems, that AI is ridiculously smart. Basically, OpenAI commissioned Epoch AI to produce 300 advanced math problems for an AI evaluation that form the core of the FrontierMath benchmark, and the only problem was that OpenAI secretly funded this benchmark linked to the o3 model. So OpenAI basically paid for this benchmark to be created, and they actually had access to the problems and solutions of the test set. The individuals creating the benchmark didn't disclose that until o3 was released, so there was this huge backlash over FrontierMath, and it basically put a stain on the benchmark, because one of

Narrow AI vs. AGI

the key things with o3, whilst it does do well on certain benchmarks, is that one of its benchmark results was really surprising, because the jump was from 2% to 25%, and people were like, whoa, how on Earth did that happen? Basically, if that benchmark is invalidated, people are going to start questioning the integrity of other benchmark scores, and I'm somewhat doing the same, because it's now like, okay, there is an incentive for OpenAI, not necessarily to lie, but maybe to game the benchmarks a little bit so that their models can seem even better. I mean, these companies are trying to attract investment, and of course doing so would definitely boost that. Gary Marcus also talks about this. He says OpenAI's impressive results on o3 should be taken with a huge grain of salt: they had access to the problems and solutions, we don't know what they trained on, and we don't know what problem-tailored validation or data-augmentation techniques might have been built in. And the problem is, right now we're still waiting on this; the truth is that Epoch AI have not been able to independently

OpenAI’s Superintelligence Goals

evaluate the model using their hold-out set, and the hold-out set is basically a set of questions that simply haven't been shown to the model yet. So for me, I would like to see different benchmarks that OpenAI's models haven't seen before, where they're still able to reason well. Now, I will say that I'm not completely bearish on OpenAI at all. I still think the company's doing amazing things, and I still think they are going to do well, in terms of the fact that right now they're focusing on ASI, which I'll explain in another video. But the question is: did OpenAI screw up by releasing this model, did GPT-5 not live up to expectations? I'd love to know your thoughts and theories, but my honest opinion is yes, GPT-4.5, or GPT-5, may have been a flop. But I would say that, number one, this model is more of a vibes model; number two, it's not a reasoning model; and number three, the old paradigm, the one which is slowly ending, has given birth to the new paradigm, which is test-time reasoning, or test-time compute, in which there are still a lot of areas that are currently unexplored. So I would say OpenAI still have a lot of runway, because they are the ones who pioneered that area, and I think they will still lead due to their connections and partnerships. But of course there is now competition; we have other countries, like China, on the case

Conclusion and Final Thoughts

of being very hard to deal with in terms of the competition, so it will be interesting to see what goes on with these other companies. And with that being said, do not forget to leave a like and subscribe, and I will see you guys in the next one.
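On the "green ticks on verified sources" idea from the Deep Research section: a first step is mechanical and doesn't need an AI at all. Here is a toy sketch (the function names and sample strings are mine, purely illustrative) of checking whether a claimed quote actually appears in the text of the cited source:

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences don't matter."""
    return " ".join(text.lower().split())

def quote_is_supported(claimed_quote: str, source_text: str) -> bool:
    """Crude verification: the cited quote must appear verbatim in the source text."""
    return normalize(claimed_quote) in normalize(source_text)

# Pretend source_text came from fetching the cited URL (fetching omitted here).
source_text = "GPT-4.5 is OpenAI's largest   model to date, the company said."
print(quote_is_supported("largest model to date", source_text))   # True
print(quote_is_supported("smartest model to date", source_text))  # False
```

A real pipeline would fetch each cited URL and flag any quote that fails this check for human review; it catches outright fabricated quotes, though not misattributed paraphrases.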
