Alpha Everywhere: AlphaGeometry, AlphaCodium and the Future of LLMs

AI Explained · 18.01.2024 · 97,942 views · 3,862 likes

Video description
Is AlphaGeometry a key step toward AGI? Even DeepMind's leaders can't seem to make their minds up. In this video, I'll give you the rundown of what AlphaGeometry is, what it means and what it doesn't mean. Plus I'll cover AlphaCodium, dropped open-source tonight seemingly out of nowhere and causing a big stir for what it might mean for coders the world over. And I'll touch on what I foresee as the future of large language models and their alliance with search.

AI Insiders: https://www.patreon.com/AIExplained
AlphaGeometry: https://www.nature.com/articles/s41586-023-06747-5
DeepMind Blog Post: https://deepmind.google/discover/blog/alphageometry-an-olympiad-level-ai-system-for-geometry/
Shane Legg Tweet: https://twitter.com/ShaneLegg/status/1747670093348176140
AlphaCodium Paper: https://arxiv.org/pdf/2401.08500.pdf
AlphaCodium Blog: https://www.codium.ai/blog/alphacodium-state-of-the-art-code-generation-for-code-contests/
AlphaCodium … Code: https://github.com/Codium-ai/AlphaCodium
AlphaCodium Tweets: https://twitter.com/karpathy/status/1748043513156272416 and https://twitter.com/svpino/status/1747971746047627682
Twitter Math Kardashian: https://twitter.com/afterveil/status/1746168116546093291
Hassabis New Tweet: https://twitter.com/demishassabis/status/1747669767270306256?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Etweet
AlphaGeometry NYT: https://www.nytimes.com/2024/01/17/science/ai-computers-mathematics-olympiad.html
AIMO Prize: https://aimoprize.com/
Metaculus IMO: https://www.metaculus.com/questions/6728/ai-wins-imo-gold-medal/
Paul Christiano LessWrong: https://www.lesswrong.com/posts/sWLLdG6DWJEy3CH7n/imo-challenge-bet-with-eliezer
Professor Rao, On the Planning Abilities of LLMs: https://arxiv.org/pdf/2305.15771.pdf
Eureka: Human-Level Reward Design via Coding Large Language Models: https://arxiv.org/pdf/2310.12931.pdf
Google Offers Salary: https://www.theinformation.com/articles/googles-defense-against-openai-talent-grab-special-stock?utm_source=ti_app&rc=sy0ihq
Mathematics Will Fall First: https://twitter.com/francoisfleuret/status/1731096582932578653
Samsung Galaxy S24 Gemini Ultra and Nano: https://blog.google/products/android/google-ai-samsung-galaxy-s24-/
Lead Author Trieu Video: https://www.youtube.com/watch?v=TuZhU1CiC0k
V100 to X100: https://pbs.twimg.com/media/FbkEJX1WYAAh8wv?format=png&name=small and https://substackcdn.com/image/fetch/w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F486c51a2-195a-4d4a-ba7a-281140c9bf64_2208x1230.png
AI Explained Eureka: https://www.youtube.com/watch?v=RCRuiu-3VDU
Check out the amazing Donato Capitella: https://www.youtube.com/@donatocapitella
Non-Hype, Free Newsletter: https://signaltonoise.beehiiv.com/

Table of contents (3 segments)

Segment 1 (00:00 - 05:00)

24 hours ago, Google DeepMind released AlphaGeometry, and while their leaders are calling it a step toward AGI, the team itself is warning everyone not to overhype it. I've read the paper in Nature, the press releases and associated interviews, and I feel that hitting gold for geometry in the International Math Olympiad is significant, more so for what it signifies about the growing alliance between language models and search: idea generation and brute force. In that same vein, we'll also take a quick peek at AlphaCodium, the brand new open-sourced rival to AlphaCode from Google DeepMind.

But let's start all the way down in the day-to-day way AI is now being used for math education. If you think this is the way to go to get kids interested, let me know: "It's bordered off by these two values, so in this case the integral would be the area of this shape here." "What about this other stuff here?" "Let me take it from here, Kim. That tall swirly symbol on the left is an S, which stands for sum. What are we summing? We're summing the area of these strips: a tiny distance dx multiplied by the height, which is the value of the function." "But these are way too thick, Taylor." "dx is actually really tiny."

For those who don't know, the International Math Olympiad is the most prestigious math competition in the world. I remember competing in challenges just to get into the International Math Olympiad; spoiler, I didn't get in, but I would say I never studied that hard anyway. This new system, AlphaGeometry, scores almost as highly as the average IMO gold medalist, but specifically for a subset of geometry problems only: not algebra or number theory, just geometry. So it's not like AlphaGeometry did an IMO test; it just did 30 geometry IMO questions. Nevertheless, getting a gold medal overall in the IMO has long been one of the holy grails of machine learning. That's maybe why one of the co-founders of DeepMind said AGI keeps getting closer, and even Demis Hassabis, the leader of DeepMind and one of the other co-founders, said this: "Congrats to the team. This represents another step on the road to AGI." He later edited out that last sentence, possibly because he read that the team said not to overhype it, but also he might have read some of the caveats in the paper itself. Of course I'll get to the paper, but first I want to set the stage. There is now a grand prize of $5 million and an overall prize pool of $10 million for getting gold in the IMO. Two years ago, the forecast on Metaculus for an AI getting a gold medal was 2037. And what is it as of tonight? 2027. And of course, you don't need me to tell you that's just three and a half years away.

So how does it work? Well, AlphaGeometry is a neuro-symbolic system: a combination of a neural network and the old-fashioned, pre-programmed symbolic systems. And in fact, that alliance between large language models (neural networks) and old-fashioned pre-programmed systems is going to be the theme of this video. Idea generation (you could call it creativity) plus brute force and search: that alliance, I predict, will in the future yield AGI.

Here is a simple example of how it works. Imagine you're trying to prove that two angles are equal in an isosceles triangle. A key part of that proof is to drop a perpendicular line down from A to hit the midpoint of BC. The thing is, symbolic systems aren't designed to propose those kinds of constructs; idea generation isn't their forte. That's where a language model came in.
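To make that concrete, here is a worked version of that proof (my own illustration, not taken from the paper), using the midpoint form of the same construction; once the auxiliary line is proposed, everything left is exactly the kind of mechanical deduction a symbolic engine handles:

```latex
% Worked example (my illustration, not from the paper).
% Claim: in isosceles triangle ABC with AB = AC, the base angles are equal.
\textbf{Construction (the creative step):} let $M$ be the midpoint of $BC$
and draw $AM$.
\textbf{Mechanical deduction:} in triangles $ABM$ and $ACM$,
\[
  AB = AC \;(\text{given}), \qquad
  BM = CM \;(M\text{ is the midpoint}), \qquad
  AM = AM,
\]
so $\triangle ABM \cong \triangle ACM$ by SSS, hence
$\angle ABM = \angle ACM$, i.e. $\angle B = \angle C$
(and $AM \perp BC$ follows, matching the perpendicular in the video). $\square$
```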
The language model in this case was only 151 million parameters, and it was trained on purely synthetic data. That synthetic training data was all about getting the model to provide proofs for various geometric statements. In 91 million of those samples, brute force would be enough: just step-by-step deduction using known rules. But in 9 million cases you would need one of these constructs (the authors call them "pulling rabbits out of the hat"), and the language model was fine-tuned on those examples; it paid particular attention to them. Basically, it got really good at suggesting such constructs. Going back to this example: the moment you posit that line, an old-fashioned symbolic deducer could then solve the rest. It could mechanically produce the proof that these two angles, the angle at B and the angle at C, are equal. If, by the way, the deducer couldn't solve the problem, it would send it back to the language model to suggest other constructs, in the loop sketched below. While most of that training data involved basic proofs, apparently one involved two constructs and a proof length of 247 deduction steps. I can start to see why AlphaGeometry outperformed all but the best humans. A bit below, somewhat sheepishly, the authors admit that these solutions tend not to be symmetrical like human-discovered theorems, as they are not biased toward any aesthetic standard.
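Here is a minimal sketch of that propose-and-deduce loop. The callables `propose_constructions` (standing in for the fine-tuned language model) and `symbolic_deduce` (standing in for the rule-based engine) are hypothetical, and the search strategy is simplified; this illustrates the structure described above, not DeepMind's actual implementation:

```python
# A minimal sketch of AlphaGeometry's neuro-symbolic loop as described above.
# `propose_constructions` and `symbolic_deduce` are hypothetical stand-ins
# for the fine-tuned language model and the symbolic deduction engine.

def solve(premises, goal, propose_constructions, symbolic_deduce,
          beam_width=512, max_rounds=16):
    """Alternate LM-proposed auxiliary constructions with mechanical
    symbolic deduction until the goal statement is proved."""
    frontier = [premises]  # candidate premise sets to expand
    for _ in range(max_rounds):
        next_frontier = []
        for state in frontier:
            # Brute-force phase: step-by-step deduction from known rules.
            proof = symbolic_deduce(state, goal)
            if proof is not None:
                return proof  # the deducer closed the gap mechanically
            # "Rabbit out of the hat" phase: the language model suggests
            # auxiliary constructions (e.g. "drop a perpendicular from A
            # to the midpoint of BC") that the engine cannot invent.
            for construction in propose_constructions(state, goal,
                                                      k=beam_width):
                next_frontier.append(state + [construction])
        frontier = next_frontier[:beam_width]  # prune to the search budget
    return None  # search budget exhausted without finding a proof
```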

Segment 2 (05:00 - 10:00)

In other words, these solutions don't look good; they look like trash, but they work. The lead author of the paper put this really well in a video on his own YouTube channel, and pointed out that the approach isn't fully novel: "The general observation here is that, given a hard problem, we usually have to come up with one or more rabbits in order to transform the problem into a more mechanical state, in such a way that the symbolic engine, or the mechanical solver, can just take the problem and then solve it. But if the solver fails to solve the problem, then we can always come back and ask for more rabbits, and we keep doing this in a loop until we find the solution. And so, with this observation, our solver here pretty much reflects the structure of this observation: we built a neural language model that is trained to propose magic constructions, and then we built a symbolic engine that is tasked with handling all the mechanical cases and the mechanical deduction in geometry, and then we put these two components into a loop, so that we obtain a neuro-symbolic solver named AlphaGeometry. Let me point to an important fact: this neuro-symbolic structure is not a novel observation made in our work. In fact, in 2020, Polu and Sutskever had already pointed out that a major limitation of theorem proving, compared to humans, is the ability to generate original mathematical terms, and that this limitation might be addressable via generation from language models."

Geometry, it seems, might be particularly amenable to this approach. As one IMO gold medalist and Fields Medalist put it, finding solutions for IMO geometry problems works a little bit like chess, in the sense that there is a rather small number of sensible moves at each step. Nevertheless, he says he was stunned that they could make it work. They even cheekily compare their system, trained on 100 million proofs, with GPT-4, which apparently had a success rate of 0%, often making syntactic and semantic errors. Of course, deciding which of the many constructs to use is a question of search and compute budget, but they noticed that using less than 2% of that search budget, analyzing eight constructs each time versus 512 during test time, it could still solve 21 problems. That would still put it at just below silver-medalist level, and way above the previous state of the art.
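In terms of the hypothetical `solve` sketch from segment 1 (with `premises`, `goal`, `propose` and `deduce` as the same illustrative stand-ins), that test-time ablation is a change to a single parameter; the solve counts in the comments are the ones just quoted, and the code is only an illustration:

```python
# Same hypothetical solve() sketched at the end of segment 1; only the
# test-time search budget changes.
proof_full    = solve(premises, goal, propose, deduce, beam_width=512)
proof_reduced = solve(premises, goal, propose, deduce, beam_width=8)
# With under 2% of the search budget, the system still solves 21 of the
# 30 IMO geometry problems: just below silver-medalist level.
```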
Speaking of search and compute budget, though, I couldn't help but notice this: they used Nvidia's V100 GPUs, and said, somewhat modestly, that scaling up these factors to examine a larger fraction of the search space might improve AlphaGeometry's results even further. I think, frankly, that's an understatement, because the V100 was replaced in 2020 with the A100, which was recently replaced by the H100 (and yes, I know I pronounce my H's in a cockney way). Even the H100 from Nvidia is going to be replaced this year with the B100, and next year with the X100. I've almost lost count now of how many generations behind the V100 is, so the fact that they used V100s is incredibly impressive. I feel like the bitter lesson is going to strike again soon, and IMO geometry is going to be all but solved by next year. I must caution, though, that this had been foreseen, including by Paul Christiano, former head of alignment at OpenAI and an IMO participant when he was younger: he predicted that AI would soon solve most geometry problems essentially for free.

DeepMind, in their blog post, go a bit further though: they describe this as demonstrating AI's growing ability to reason logically, and to discover and verify new knowledge. I feel like there might be years more of debate over whether it's appropriate to use that word, reason, for what's happening here, but in the end it might just be semantics. Nevertheless, Google are open-sourcing the AlphaGeometry code and model, and within a year they hope it will be inside Google's Gemini. Remember, Google also promised that AlphaCode 2 would be put inside Gemini, so that's a lot of Alphas to go around. Of course, many of you might be wondering if this is an example of mathematics falling first, which, as one machine learning professor put it, would then lead to a torrent of results impacting everything in theoretical science. Well, we simply don't know. As the co-founder of xAI and former Googler put it, it leaves a lot of questions open; he said it's not easily generalizable to other domains and other areas of math. That's not going to stop the lead author attempting to generalize the system across mathematical fields and beyond.

Speaking of AlphaCode and open-sourcing, though, we now have AlphaCodium. It's open source, single-click, and is claimed to beat AlphaCode 2 without fine-tuning. All the relevant links will be in the description, but there's another reason why I bring it up in this video, beyond it being brand new and state-of-the-art: it's that same theme of LLMs proposing solutions and iterating based on feedback from the environment, in this case code unit tests. As Andrej Karpathy puts it, we are moving away from that naive prompt-and-answer paradigm, autoregressive token by token, where LLMs like GPT-4 are forced to put out immediate solutions; it's becoming more like a conversation between LLMs and their environment.

Segment 3 (10:00 - 14:00)

In my own tests for SmartGPT 2.0, I'm discovering the same thing as the authors when they say this: try to avoid direct questions and leave room for exploration. The way I would translate that is that if you force an LLM into an immediate answer, it will pick an answer and then stick to it; it values fluency over accuracy. So what's the answer? Try to avoid those direct questions and encourage reflection. That's probably why chain of thought works so well.

Here's a great summary from Santiago on Twitter. First, AlphaCodium (and it's model-agnostic) gets the LLM to reason about the problem: describe it using bullet points and focus on the goal, inputs, outputs, rules, etc. Then make the model reason about the tests it would need. Next, generate potential solutions and rank them in order of correctness, simplicity and robustness. Now generate more diverse tests for edge cases. And here's the key step: pick a solution, generate the code and run it on a few test cases; if the tests fail, improve the code and repeat the process, as in the sketch below.
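Here is a minimal sketch of that staged, test-driven flow, assuming a hypothetical `llm(prompt)` call and a local `run_tests` harness; the real open-source implementation lives at https://github.com/Codium-ai/AlphaCodium, so treat this only as an illustration of the loop Santiago describes:

```python
# A minimal sketch of the AlphaCodium-style flow summarised above.
# `llm` and `run_tests` are hypothetical stand-ins; the real open-source
# implementation is at https://github.com/Codium-ai/AlphaCodium.

def alphacodium_style_solve(problem, public_tests, llm, run_tests,
                            n_candidates=5, max_fix_rounds=8):
    tests_text = "\n".join(public_tests)
    # 1. Reason about the problem first: bullet points, no code yet.
    analysis = llm("Describe this problem as bullet points, focusing on "
                   f"goal, inputs, outputs and rules:\n{problem}")
    # 2. Reason about the given tests before writing any solution.
    test_notes = llm(f"Explain what each of these tests implies:\n{tests_text}")
    # 3. Generate several candidate solutions, then rank them.
    candidates = [llm(f"Propose a solution in words:\n{analysis}\n{test_notes}")
                  for _ in range(n_candidates)]
    best = llm("Rank these by correctness, simplicity and robustness, and "
               "return the best one:\n" + "\n---\n".join(candidates))
    # 4. Generate more diverse tests covering edge cases.
    extra_tests = llm(f"Write edge-case tests for:\n{analysis}").splitlines()
    # 5. The key step: write code, run the tests, iterate on failures.
    code = llm(f"Implement this solution as code:\n{best}")
    for _ in range(max_fix_rounds):
        failures = run_tests(code, public_tests + extra_tests)
        if not failures:  # harness assumed to return the failing tests
            return code   # all tests pass
        code = llm(f"These tests failed:\n{failures}\nFix this code:\n{code}")
    return code  # best effort once the fix budget is spent
```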
I can't help but notice that this is eerily reminiscent of some of the prep work I did for SmartGPT. I won't go through it now, but what it involved was commanding the model not to output a solution immediately; in fact, I wanted it to generate mistakes that students might make. Then I would force it to come up with test cases, and the rest of the steps I might cover in another video, but it was that same approach, the same idea: don't get the model to output an immediate answer, delay that as long as possible, and first generate test cases. It's almost like you're forcing it to reason logically. And yes, in case you're wondering, this works amazingly for mathematics. Here are some of the results of AlphaCodium compared to direct prompting across a range of models, so you might mention this video to anyone who thinks LLMs have peaked.

The theme of using them for idea generation and then external experimentation just keeps occurring again and again in the literature. We saw it with Eureka, and if you haven't seen my video on that, do check it out: the LLM, GPT-4, would propose reward functions, these would be tested in a simulated environment, and the reflection fed back in. And even the notorious LLM skeptic Professor Rao, whom I interviewed for AI Insiders, updated his original paper on the planning abilities of LLMs in November, tweaking the ending to say this: "We demonstrate that LLM-generated plans can improve the search process for underlying sound planners, and additionally show that external verifiers can help provide feedback on the generated plans and back-prompt the LLM for better plan generation." Coming from him, that's borderline euphoric. And yes, I can't help but mention that I go into more detail on this topic on AI Insiders on Patreon; the link is in the description, and that's not just for this video on its implications for embodiment and robotics. I also interviewed Professor Rao for this video on reasoning as the holy grail for artificial intelligence.

While we're here, though, I can't resist mentioning that I also released this video tonight on AI Insiders: basically, it's my attempt, through analyzing five papers, to answer the question of whether LLMs boost worker productivity. And no, unfortunately, the ad is not yet over, because today I also released this video from Donato Capitella. He's an AI Insider himself, and one of the benefits is that members can submit explainers for other Insiders to watch; the best of these I'll talk about on the main channel, which is what I'm doing right now. This was a fantastic video from Donato, who is a cybersecurity consultant based in London. In fact, I fairly recently met up with him again, proving that I am not GPT-5. I'm even going to go one step further and recommend his YouTube channel; I think it is criminally underrated. He creates (partly with AI, admittedly) these amazing, detailed diagrams to explain certain topics. If you want to know what I mean, check out his channel.

So no, in summary, LLMs are not peaking. But here's another quick example: just 48 hours ago we heard about Google laying off a thousand workers. But what about their workers at Google DeepMind? No, those workers Google is spending hundreds of thousands to millions of dollars to keep, because OpenAI has hired at least six of Google's Gemini contributors since October. Indeed, money-wise, I would say things are heating up rather than slowing down. I imagine Samsung have signed a multibillion-dollar contract to get access to Google Gemini models in their smartphones, and apparently Samsung will be among the first partners to test Gemini Ultra. So no, AlphaGeometry and AlphaCodium are definitely not AGI, but neither is the race to AGI slowing down anytime soon. Thank you so much for watching, and have a wonderful day.
