OpenAI Just Revealed They ACHIEVED AGI (OpenAI o3 Explained)
Duration: 12:05


TheAIGRID · 20.12.2024 · 303,934 views · 5,509 likes


Video description
Join my AI Academy - https://www.skool.com/postagiprepardness
🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Checkout My website - https://theaigrid.com/

00:00 AGI milestone announcement
00:36 Arc benchmark explained
01:46 Visual examples
03:21 Benchmark performance
04:25 Expert reactions
05:55 Earlier predictions
06:57 Compute limitations
07:54 Model iterations
09:15 Math performance
10:39 Future outlook
11:54 Final thoughts

Let me know if you'd like any adjustments to the timestamps or titles!

Links From Today's Video:
https://www.youtube.com/watch?v=SKBG1sqdyIU
https://www.youtube.com/watch?v=UakqL6Pj9xo&t=285s&pp=ygUTZnJhbmNpcyBjaG9sbGV0IGFnaQ%3D%3D

Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos. Was there anything I missed?

(For Business Enquiries) contact@theaigrid.com

Music Used:
LEMMiNO - Cipher https://www.youtube.com/watch?v=b0q5PR1xpA0 (CC BY-SA 4.0)
LEMMiNO - Encounters https://www.youtube.com/watch?v=xdwWCl_5x2s

#LLM #Largelanguagemodel #chatgpt #AI #ArtificialIntelligence #MachineLearning #DeepLearning #NeuralNetworks #Robotics #DataScience

Table of contents (11 segments)

AGI milestone announcement

So today actually marks a very historic moment for the AI community, as it is probably going to be regarded as the day where AGI actually happened. Now if you guys don't know why this is the case, it's because OpenAI today released, or announced I guess you could say, their new o3 model, which is the second iteration of their o1 series, the model that thinks for a long time. Now if you don't understand why this is potentially AGI, it's because the new system managed to surpass human performance on the ARC benchmark. Now the reason the ARC benchmark is such an important benchmark is that it is resistant to memorization. Sure, so ARC is

Arc benchmark explained

intended as a kind of IQ test for machine intelligence, and what makes it different from most benchmarks out there is that it's designed to be resistant to memorization. So if you look at the way LLMs work, they're basically this big interpolative memory, and the way you scale up their capabilities is by trying to cram as much knowledge and as many patterns as possible into them. By contrast, ARC does not require a lot of knowledge at all. It's designed to only require what's known as core knowledge, which is basic knowledge about things like elementary physics, objectness, counting, that sort of thing, the sort of knowledge that any four-year-old or five-year-old possesses. But what's interesting is that each puzzle in ARC is novel, something that you've probably not encountered before, even if you've memorized the entire internet. Now if you want to know what ARC actually looks like, in terms of this test that humans are so easily able to pass but these AI systems currently aren't, you can take a look at the examples right here in this video. ARC-AGI is all about

Visual examples

having input examples and output examples. Well, they're grids, okay? Now the goal is, you want to understand the rule of the transformation and apply it to produce the output. So Sam, what do you think is happening here? Probably putting a dark blue square in the empty space. Yes, that is exactly it. Now, it's easy for humans to intuitively guess what that is; it's actually surprisingly hard for AI to understand what's going on. What's interesting, though, is that AI has not been able to solve this problem thus far, even though we verified that a panel of humans could actually do it. Now the unique part about ARC-AGI is that every task requires distinct skills, and what I mean by that is there won't be another task where you need to fill in the corners with blue squares. We do that on purpose, and the reason why is that we want to test the model's ability to learn new skills on the fly; we don't just want it to repeat what it's already memorized. That's the whole point here. Now, ARC-AGI version 1 took 5 years to go from 0% to 5% with leading frontier models. However, today I'm very excited to say that o3 has scored a new state-of-the-art score that we have verified. On low compute, o3 has scored 75.7% on the ARC-AGI semi-private holdout set. Now this is extremely impressive because this is within the compute requirements that we have for our public leaderboard, and this is the new number one entry on ARC-AGI-Pub. So congratulations on that. Now I know those of you outside
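To make the task format above concrete, here is a minimal sketch of how an ARC-style task can be represented: grids are small 2D arrays of color indices, and the solver must infer a transformation rule from a few input/output demonstration pairs, then apply it to a fresh test grid. The rule below ("fill the empty cell with dark blue") is a hypothetical illustration modeled on the example discussed in the video, not an actual ARC-AGI task.

```python
DARK_BLUE = 1  # color indices are arbitrary labels; 0 = empty cell

def apply_rule(grid):
    """Fill every empty (0) cell with dark blue, leaving other colors intact."""
    return [[DARK_BLUE if cell == 0 else cell for cell in row] for row in grid]

# One demonstration pair: a 3x3 grid of red (2) squares with one gap.
demo_input  = [[2, 2, 2],
               [2, 0, 2],
               [2, 2, 2]]
demo_output = [[2, 2, 2],
               [2, 1, 2],
               [2, 2, 2]]
assert apply_rule(demo_input) == demo_output

# The test pair uses the *same* inferred rule on a grid never seen before:
test_input = [[0, 2],
              [2, 2]]
print(apply_rule(test_input))  # [[1, 2], [2, 2]]
```

The point ARC makes is that each task uses a different rule, so a solver cannot reuse a memorized `apply_rule`; it has to induce a new one from the demonstration pairs alone.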

Benchmark performance

the AI community might not think this is a big deal, but this is a really big deal, because this is something that we've been trying to solve for, I think, around 5 years now. This is a benchmark that many would have heralded as the gold standard for AI, and it would of course mark the first time that we've actually managed to get a system that can outperform humans at a task that traditionally AI systems would particularly fail at. Now, what was interesting was that they had two versions: we had o3 with low tuning and o3 with high tuning. The o3 with low tuning is the low reasoning effort; this is the model that operates with minimum computational effort, optimized for speed and cost efficiency, and it is suitable for simpler tasks where deep reasoning is not required, you know, basic coding, straightforward tasks. And then of course you have the high-tuned one, where the model takes more time and resources to analyze and solve problems; this is optimized for performance on complex tasks requiring deeper reasoning or multi-step problem solving. And what we can see here is that when we actually tune the model and make it think for longer, we manage to surpass where humans currently are. Now what's crazy about
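The low/high tuning trade-off described above is essentially a routing decision: spend a little compute on simple tasks, a lot on hard ones. The sketch below is a hypothetical illustration of that idea; the tier names and the relative cost multiplier are assumptions for illustration, not OpenAI's actual configuration or pricing.

```python
from dataclasses import dataclass

@dataclass
class ReasoningTier:
    name: str
    relative_cost: float  # assumed cost multiplier vs the low tier
    description: str

# Illustrative tiers; the 100x multiplier is a placeholder assumption.
LOW  = ReasoningTier("low",  1.0,   "fast and cheap; simple, straightforward tasks")
HIGH = ReasoningTier("high", 100.0, "slow and expensive; multi-step, complex reasoning")

def pick_tier(needs_deep_reasoning: bool) -> ReasoningTier:
    """Route simple tasks to the cheap tier, hard ones to the expensive tier."""
    return HIGH if needs_deep_reasoning else LOW

print(pick_tier(False).name)  # low
print(pick_tier(True).name)   # high
```

The design choice this models is that thinking time becomes a dial you pay for, rather than a fixed property of the model.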

Expert reactions

this is that the people who created the ARC-AGI benchmark said that the performance on ARC-AGI highlights a genuine breakthrough in novelty adaptation; this is not incremental progress, we are in new territory. So they start to ask, is this AGI? o3 still fails on some very easy tasks, indicating fundamental differences with human intelligence, and the guy that you saw at the beginning, François Chollet, actually spoke about how he doesn't believe that this is exactly AGI, but it does represent a big milestone on the way towards AGI. He says there's still a fair number of easy ARC-AGI-1 tasks that o3 can't solve, and we have early indications that ARC-AGI-2 will remain extremely challenging for o3. So he's stating that it shows it's feasible to create unsaturated, interesting benchmarks that are easy for humans yet impossible for AI, without involving specialist knowledge. And he states, which some people could argue means he's moved the goalposts, that we will have AGI when creating such evals becomes outright impossible. But this is a little bit of a contrast to what he said earlier this year. Take a look at what he said 6 months ago about the benchmark surpassing 80%, which it did today: let me turn the question around to you, so suppose that it's the case that in a year a multimodal model can solve ARC, let's say get 80%, whatever the average human would get, then AGI? Quite possibly, yes, I think. So honestly, what I would like to see is an LLM-type model solving ARC at like 80%, but after having only been trained on core-knowledge-related stuff. Now one of the

Earlier predictions

limitations of the model is actually the compute cost. So you can see right here that he says, does this mean that the ARC Prize competition is beaten? He says no: ARC Prize targets the fully private dataset, which is a different and somewhat harder evaluation, and the ARC Prize is of course the one where your solutions must run within a fixed amount of compute, which is about 10 cents per task. And the reason that is really interesting is because, I'm not sure you guys may have seen this, but if we take a look down the bottom here, you can see for o3 high-tuned the amount of compute that is being put into the model; it means this model cost over $1,000 per task, which is pretty expensive if you're trying to use that AI for anything at all. I think these models that search over many different solutions are going to be really expensive, and we can see that reflected here, with this one being over $1,000 per task, which is ridiculously expensive when you think about using it to perform any kind of task. Of course, as we've seen with AI, costs will eventually come down, so the fact that they've managed to beat the benchmark is just the very best thing. We
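The cost gap described above is worth making concrete. A rough back-of-the-envelope sketch, using only the approximate figures quoted in the video (these are not official OpenAI prices):

```python
# Approximate figures quoted in the video, not official pricing.
arc_prize_budget_per_task = 0.10     # ARC Prize compute limit: ~10 cents per task
o3_high_cost_per_task     = 1_000.0  # quoted "over $1,000 per task" for o3 high-tuned

# How far over the competition's compute budget the high-tuned run sits:
gap = o3_high_cost_per_task / arc_prize_budget_per_task
print(f"o3 high-compute is roughly {gap:,.0f}x over the ARC Prize budget")

# A full evaluation run at that rate (assuming a 100-task eval set):
print(f"~${o3_high_cost_per_task * 100:,.0f} for a 100-task evaluation")
```

This is why scoring high on the benchmark and winning the ARC Prize are different claims: the prize fixes the compute budget, and at these quoted rates the high-tuned configuration is orders of magnitude outside it.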

Compute limitations

can use the similarities with how technology was pretty bulky in the early days, you know, like the big TVs and the really bulky phones; but over time technology finds ways to become more and more efficient, and eventually you do things faster and of course cheaper. So it's quite likely that this will happen in AI too. And what I do find crazy about this is that, you know, two years ago he said that the ARC-AGI benchmark being fully solved was not going to happen within the next 8 years, and he says 70%, hopefully less than 8 years, perhaps four or five. Of course, you can see AI managing to speed past most people's predictions. We also got the fact that it did very well on SWE-bench, which is of course a very hard software engineering benchmark, and I'm guessing that if you are a software engineer this is probably not the best news for you, but I'm sure that now, with a bunch of people coding with AI, there's probably a lot more demand for software engineers who actually understand the code that's being written. But this is actually rather interesting, because one thing that I realized when doing this video was the fact that this is actually o2 and not o3. I just want to go on a

Model iterations

tangent quickly here, because whilst I was reading this breakthrough about o3, one thing that I needed to remember is that this is actually the second iteration of the model. I think some people might be thinking that o3 is OpenAI's third iteration, but o2 is simply being skipped because there is a naming conflict with O2, which is a British mobile service provider. So the fact that this is only the second iteration of the model does go to show that potentially o3, or even o4, is going to be quite a large jump and might reach benchmark saturation. Now, if we also look at the math benchmarks and the PhD-level science benchmarks, we can see that there is a decent improvement there as well. I do think, though, that this is sort of reaching the benchmark saturation area, because we can see this one on competition math is around 96.7% and this one is 87.7%. And like I said before, one of the things that most people are starting to say is, okay, AI has slowed down because it's no longer increasing at the same rates as it was before. What we have to understand is that as these benchmarks get to 95%-plus, or around that area, the incremental gains are going to be harder and harder to reach, because number one, you only have a few percent left to get, and number two, it's quite likely that 3 to 5% of all the questions are contaminated, meaning that there are errors in those questions anyway, which means that 100% is simply not possible on certain benchmarks. Which is why they decided to create the FrontierMath benchmark. Now, at the time
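The saturation point made above can be sketched with some quick arithmetic: if even a small fraction of a benchmark's questions are flawed, the effective ceiling sits below 100%, so the remaining headroom is smaller than it looks. The 3% figure below takes the low end of the video's 3-5% estimate; it is the video's claim, not a measured number.

```python
current_score   = 0.967  # the ~96.7% competition-math score quoted in the video
flawed_fraction = 0.03   # assumed low end of the video's 3-5% estimate

# Questions with errors can never be answered "correctly", so they cap the max score.
effective_ceiling = 1.0 - flawed_fraction
remaining_headroom = effective_ceiling - current_score

print(f"effective ceiling:  {effective_ceiling:.1%}")
print(f"remaining headroom: {remaining_headroom:.1%}")
```

On these assumptions the model is already within a fraction of a percent of the best achievable score, which is why further "progress" on such a benchmark tells you very little, and why a harder, unsaturated benchmark was needed.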

Math performance

when the benchmark was released, which I think was around 2 or 3 months ago, the then-current models could only do 2% on these questions. You can see in the lead were Gemini 1.5 Pro and Claude 3.5 Sonnet, and you can see o1-preview and o1-mini were there too. But for those of you who haven't realized just how good o3 is: this is a model that gets 25%. So this is something that is really incredible when it comes to research math, and you have to understand that this kind of math is super difficult, and all of those questions are completely novel. So the previous benchmarks that we recently had aren't truly showing the capabilities. I mean, think about it like this, okay: when we take a look at these benchmarks, you would say maybe the model's gotten, you know, 10% better overall, which is just not true, because if we actually look at the really hard benchmarks, where it's really solving unseen math problems, we can see that there is a 20-times improvement over the current state of the art, which is just absolutely incredible. So I think this is probably the very important image that people need to realize, because you can't compare this kind of model; even though it does show a massive increase on these kinds of benchmarks, you can't really compare it to this, where it's having, you know, this insane level of scores. Now, what's important to understand is that Noam Brown, who works on reasoning at OpenAI, said that o3 is going to continue on that trajectory. So for those of you who are thinking, you know, maybe AI is slowing down, it is clear that is not the case. Now, interestingly enough, we also got Sam Altman in an interview talking about what he believes AGI to be, and I think,

Future outlook

like I said before, the definition is constantly shifting and constantly changing. It used to be a term that people used a lot, and it was this really smart AI that was very far off in the future. As we get closer to it, I think it's become a less useful term; people use it to mean very different things, some meaning something that's not that different than o1, you know, and some people use it to mean true superintelligence, something smarter than, like, all of humanity put together. We try now to use these different levels; we have a five-level framework, we're on level two with agents now, sorry, with reasoning now, rather than the binary of is it AGI or is it not; I think that became too coarse as we get closer. But I will say, by the end of next year, end of '25, I expect we will have systems that can do truly astonishing cognitive tasks, where you'll use it and be like, that thing is smarter than me at a lot of hard problems. Now, the only other thing here is that if you are a safety researcher, he says, please consider applying to help test o3-mini and o3, because he's excited to get these out for general availability soon, and he is extremely proud of the work OpenAI have been doing in creating these amazing models. So it will be interesting to see, when these models are actually released, what people are going to do with them, what they're going to build, and of course what happens next. If you enjoyed

Final thoughts

this video, let me know what your thoughts are on whether this is AGI or not. I do think that the benchmark has been broken, and we're constantly finding new ways, so it will be interesting to see where things head next.
