AlphaGeometry: Solving olympiad geometry without human demonstrations (Paper Explained)
Duration: 35:26


Yannic Kilcher · 21.01.2024 · 40,177 views · 1,019 likes


Video description
#deepmind #alphageometry #llm AlphaGeometry is a combination of a symbolic solver and a large language model by Google DeepMind that tackles IMO geometry questions without any human-generated training data. OUTLINE: 0:00 - Introduction 1:30 - Problem Statement 7:30 - Core Contribution: Synthetic Data Generation 9:30 - Sampling Premises 13:00 - Symbolic Deduction 17:00 - Traceback 19:00 - Auxiliary Construction 25:20 - Experimental Results 32:00 - Problem Representation 34:30 - Final Comments Paper: https://www.nature.com/articles/s41586-023-06747-5 Abstract: Proving mathematical theorems at the olympiad level represents a notable milestone in human-level automated reasoning1,2,3,4, owing to their reputed difficulty among the world’s best talents in pre-university mathematics. Current machine-learning approaches, however, are not applicable to most mathematical domains owing to the high cost of translating human proofs into machine-verifiable format. The problem is even worse for geometry because of its unique translation challenges1,5, resulting in severe scarcity of training data. We propose AlphaGeometry, a theorem prover for Euclidean plane geometry that sidesteps the need for human demonstrations by synthesizing millions of theorems and proofs across different levels of complexity. AlphaGeometry is a neuro-symbolic system that uses a neural language model, trained from scratch on our large-scale synthetic data, to guide a symbolic deduction engine through infinite branching points in challenging problems. On a test set of 30 latest olympiad-level problems, AlphaGeometry solves 25, outperforming the previous best method that only solves ten problems and approaching the performance of an average International Mathematical Olympiad (IMO) gold medallist. Notably, AlphaGeometry produces human-readable proofs, solves all geometry problems in the IMO 2000 and 2015 under human expert evaluation and discovers a generalized version of a translated IMO theorem in 2004.
Authors: Trieu H. Trinh, Yuhuai Wu, Quoc V. Le, He He & Thang Luong Links: Homepage: https://ykilcher.com Merch: https://ykilcher.com/merch YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ykilcher.com/discord LinkedIn: https://www.linkedin.com/in/ykilcher If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannickilcher Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Table of contents (10 segments)

Introduction

Cool, hello there. Today we're going to look at "Solving olympiad geometry without human demonstrations". This is a paper that introduces the AlphaGeometry model. It's by Google DeepMind, and akin to other computer-mathematics papers, it goes after math olympiad problems in the domain of geometry, specifically Euclidean geometry, and it does so, as it says here, without any human demonstrations as training data. This paper is certainly a kind of breakthrough in computer mathematics on one hand, but on the other hand it's also super duper specialized, in a way: the premises, the circumstances, have to be exactly right for this technique to work. We're going to look into what they do right here. They say the system is neuro-symbolic, so it's a combination of trained language models and symbolic solvers that together do something like a proof search across these geometry problems. Specifically, the difficulty in this problem is the construction of auxiliary points or auxiliary things, and what that means is, I believe, demonstrated here in this example. The top is a very simple problem that serves as an example.

Problem Statement

So they say: let ABC be any triangle with AB equal to AC, so this is an isosceles triangle, meaning this side here is equal to this side here. Now we should prove that the angle at B here is equal to the angle at C. How can we do that? The ultimate proof is going to look like this: we construct the midpoint D here, and we observe that AB equals AC, so that's a premise; we observe that BD, this distance right here, is equal to DC, because D is a midpoint, and therefore we have two similar triangles, and therefore these angles have to be the same. You'll notice that the proof of why the angles are the same included this construction of the midpoint right here, the midpoint D. Introducing new things in order to make the proof happen is done in a lot of mathematical proofs: you introduce some helper variable, some kind of construction, whether that is in geometry or outside of geometry. Usually it's in the form of some definition or some kind of lemma, or, as I said, an extra variable. In geometry it's often the construction of extra points, extra circles and so on, like: okay, take the bisector here and intersect it with this line, and we'll call that new point E, and then we'll apply something to that point E. This is a difficult thing. Why is that difficult? Because if you have a math proof engine, it can only reason across the things that exist. As soon as it comes to "I need to construct something that doesn't exist first, and then I can apply my deduction rules to it", that's very tricky, because the number of things you could construct that aren't there already is infinite, and therefore what should a classic prover even do? It cannot even start searching, because even for a single step you have infinite possibilities to search over. So what the AlphaGeometry paper does is it says: how about we use a language model to suggest new things to construct, to suggest these auxiliary constructions, and then once
we add them, we use our prover to try to solve the proof, and if it can't solve the proof, we use the language model again to suggest the next thing to be constructed; we add that, we use the prover again, ask "can you prove it now?" — no — okay, and they keep adding things until the prover can prove it. So this is the basic core loop here: you have a symbolic deduction; if it is solved, all good; if it is not solved, go here, ask the language model what extra thing to introduce into the construction, give it again to the symbolic deduction engine, and repeat until solved. You'll already see — we'll get into how you train the language model to do the correct thing — you'll already see this requires an extremely specific set of circumstances. Namely, it requires that whenever a problem is solvable with the current things you have available, the solver here is able, at least in reasonable time, to give you that solution. It is also necessary that the language model can be trained to reasonably suggest next things to construct; that's also not a given by any means. It has long been held as sort of the creative act of mathematicians to introduce new things that then turn out to be helpful, and abstracting this into a language model, especially without human training data as this paper does, requires a specific setup of circumstances. You can see this applied to an actual problem here. This over here is the problem statement; you can see it's quite a bit longer than the sample problem statement we started with. Ultimately you are required to prove that the circumcircles O1 and O2, which are in green here, of the triangles FKM and KQH are tangent to each other, so that there's a point where they touch, I guess. This takes AlphaGeometry 109 proof steps
and includes quite a number of auxiliary constructions, which you can see right here: for example, this midpoint of BH, the midpoint of HC, and this other midpoint right here, among other things, I would guess. So as you can see, not an easy thing to do once you actually get to the hard problems. All right.
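The outer loop the video just walked through — run the symbolic engine, and if it gets stuck, ask the language model for one more auxiliary construction — can be sketched in a few lines. This is a minimal sketch, not the paper's code: `symbolic_solve` and `suggest_construction` are hypothetical stand-ins for the symbolic engine and the trained LM.

```python
def alphageometry_loop(premises, goal, symbolic_solve, suggest_construction,
                       max_constructions=10):
    """Sketch of the AlphaGeometry outer loop (all names illustrative).

    symbolic_solve(premises, goal)       -> a proof, or None if stuck
    suggest_construction(premises, goal) -> one new auxiliary object (the LM)
    """
    for _ in range(max_constructions):
        proof = symbolic_solve(premises, goal)
        if proof is not None:
            return proof, premises
        # Symbolic engine is stuck: ask the language model for one new
        # auxiliary construction and add it to the premises.
        premises = premises + [suggest_construction(premises, goal)]
    return None, premises
```

With a toy solver that only succeeds once "midpoint D" has been constructed, the loop adds the suggestion and then finds the proof on the next pass.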

Core Contribution: Synthetic Data Generation

So, how — I think the general structure, though, should be clear: you start with the problem, and you already have a solver in place, a symbolic solver that can give you a solution if one exists with the current set of premises and things you have available. If that does not yield a result, you add something, and the adding of something is done by a trained large language model. So now the question is: how do you train a large language model to suggest things to add? The first question is how you even encode such problems in a viable form. Now, you just have to go with the flow here; believe me that they have found a form to relatively efficiently and understandably encode this into a sort of language form where they can tokenize it and feed it into an LLM. The LLMs they're going to use right here are exclusively trained on this kind of domain-specific language, so they're not fine-tuning something that exists somewhere and relying on its pre-training or anything like this; as far as I understand, they train the model fully from scratch on the DSL they construct here. What they do is they sample what they call random premises, and random premises are really just things like this right here: they just construct a bunch of triangles and circumcircles and midpoints and angle bisectors and so on. You can see in this extended table down here all the things they sample from. Again, they do not use human data for this — there's no human data in here; they just go about it and say, okay, I can sample

Sampling Premises

an angle bisector, I can sample the incenter, I can make a tangent, I can make a line or a point on a line, I can make a segment — just a line. So they can sample all these things, and you'll see, even though it's quite a list, it's not a giant list. So I think crucial condition number two is that the set of base operations of the problem is relatively small. What they're going to do is just sample combinations of these: they'll start with this one, and then maybe this one, and maybe some become impossible after a while, because there is no line to put a point on, but once there is, those open up again. So they'll just sample these, and they'll augment them by what they call algebraic reasoning, like dividing angles in half or multiplying them by two and things like this, introducing constant factors and so on. But ultimately it's a finite, relatively small set of things that all of these geometry problems are representable as combinations of — at least a large part of these problems. They purposefully do not cover all of the geometry problems they encounter; they say only about 75% of their ultimate test set, or of the domain they consider, is coverable by their DSL and by their construct here, and they on purpose leave away the others — I'm going to guess in order to keep things interpretable, but also to keep this list they need to sample from small. Why is it important that this is a small list? Because this being a small list makes it possible to cover the space you're looking at relatively okayish with just sampling. So if I just consider proofs that are 100 steps long, and I just randomly sample from these things 100 times, and I do that a billion times, that's going to give me reasonable coverage of
my domain — not great, but reasonable, such that the LLM can learn something. If this here were, let's say, all of mathematics, or had to cover all of mathematics, there would be kind of no way, I think, to do that, but because the fundamental elements of the domain are so limited, this is all of a sudden accessible. That's what allows them to do this: they just go wild, uniformly sampling across these things, because again, they don't want to use human demonstration data, so they just uniformly sample random, what they call, premises. So this here could be a construction of these premises — and I made a small mistake before: how
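The premise-sampling step described above boils down to drawing random sequences from a small, fixed vocabulary of basic constructions. A minimal sketch, assuming a much-abridged, hypothetical version of the paper's construction list (the real list has a few dozen entries and validity constraints, e.g. you can only put a point on a line that exists):

```python
import random

# Hypothetical, heavily abridged stand-in for the paper's construction list.
CONSTRUCTIONS = [
    "triangle", "midpoint", "angle_bisector", "circumcircle",
    "on_line", "tangent", "incenter",
]

def sample_premises(n_steps, rng):
    """Uniformly sample a random sequence of basic constructions.

    The real system also checks which constructions are currently
    applicable and mixes in algebraic variants; this sketch ignores that.
    """
    return [rng.choice(CONSTRUCTIONS) for _ in range(n_steps)]

rng = random.Random(0)
premises = sample_premises(5, rng)
```

Because the vocabulary is this small, even plain uniform sampling covers the space of reachable premise sets reasonably well at scale.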

Symbolic Deduction

many of these basic elements you sample to construct the premises is not directly the number of proof steps involved in proving things about them, so I want to revise that statement — but there is a correlation. What do they do now? Now they have this construction right here. The second step is to figure out all of the things that they can say about it. Using their symbolic prover, they now prove everything they can about this. You can see right here, for example, there is a right angle here, and then there is — I guess this is a triangle — so one thing you can claim is that triangle A over here and B are in some way similar. Are they? I don't even know, but I hope you understand what I mean: from what we've constructed right here, we can go ahead and use our symbolic reasoning engine to give us the closure of all the statements we can now prove about this thing in total. They call that symbolic deduction: proving all the statements. By doing that, they build these trees. So, okay, we can prove that EADH, this polygon right here, is cyclic, which I think means it has a circumcircle, which you can know by, I guess, seeing that this here is a triangle — and if this is a triangle and you intersect it in any way, then this must have a circumcircle; I'm not sure, but I guess that's what it is. Likewise you can deduce that EBCD has a circumcircle: this one again can be completed to a triangle and therefore it has a circumcircle, I think that's why. Then you can also deduce that, for example, this angle is equal to this angle — so EDH, the angle at D between E and H, is equal to this angle right here — which you can deduce from this being cyclic; you can deduce that these angles must be equal because these are similar triangles, or maybe it's just the angles
that are equal. You can deduce that some things are orthogonal — you get the point. So you can deduce things, and from the things you deduced you can deduce other things, and you just do that until you've exhaustively made all the possible non-trivial statements you can make about the things you started with. Okay, that's step two, that's symbolic deduction. All right, so now we have all those things; what now? We're going to pick one of them. So of all the things we've proven — and you can see them out here represented by all these arrows, even the gray ones — we're going to pick one of them, this one here, and we just say: okay, this one is now our target, this is to be proven. Once we've done that, we can
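The "deduction closure" described here is essentially forward chaining: keep applying every deduction rule until no new fact appears, while remembering which antecedents produced each fact (that record is what the traceback step uses next). A minimal sketch with facts as strings and rules as (antecedents, consequent) pairs — the real engine works over typed geometric predicates:

```python
def deduction_closure(facts, rules):
    """Forward-chain to a fixed point.

    facts: iterable of starting premises.
    rules: list of (frozenset_of_antecedents, consequent) pairs.
    Returns the closed set of facts and, for each derived fact,
    the antecedents it was derived from.
    """
    facts = set(facts)
    derivations = {}  # derived fact -> frozenset of antecedents
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            if antecedents <= facts and consequent not in facts:
                facts.add(consequent)
                derivations[consequent] = antecedents
                changed = True
    return facts, derivations
```

On the isosceles-triangle example from earlier, two toy rules are enough to derive the angle equality from the premises AB = AC and BD = DC.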

Traceback

simply walk back the path that the prover took to get here, in order to obtain the proof. In order to prove this node right here, I need to know three things: I need to prove this here — well, the bottom one I can directly deduce from the premises that I have — but the other two I can again trace back through my steps and say, okay, I need to prove this and this right here, and these two I can then prove from my original premises. All right, why does this help? Sorry, this is a bit of a long-winded thing, but at this point you can say: okay, great, now you have the target and you know the proof to get there. So you can ask: isn't this exactly why we have the prover in the first place, to find that path? Why does that help us? And now comes the third step. You can say: okay, all the green ones are involved in proving this right here. Now let's look at what we actually have here: we have H and A and B and C, so we have four things, H, A, B, C. These are the four elements, let's say, that we want to prove something about. Now let's look at all the elements that appear anywhere in the proof: these are A and H, which we already had, and B and C, which we already had, but we also see other things

Auxiliary Construction

appearing on the path to the thing we want to prove, for example E and D. In fact, those are the two things that appear extra. So you can see, out of all the things we have available here, only four of them appear in the thing we want to prove, but six appear on the way to achieving that proof, and we're interested in that difference. So E and D are on the way to the proof, but they're not part of the thing we want to prove, and since we've constructed this thing here by combining those construction premises from the list before, this means we can now turn this into a math problem that requires the solver to construct something. How do we do that? We simply take the problem and reduce the original construction down to the things that we want to prove something about — so in this case only B, C and A, the triangle, and H — and we ask: please prove that HA, this thing right here, is perpendicular to BC. If we do that, then in order to prove this you will need to construct the points E and D along the way to your proof. I hope you can see a little bit how that works. Step one: we use this list of constructions, and we sample at random to construct something like this; that by definition means that anyone could construct these things, because you can just say, well, let's construct the midpoint of this and consider it, and so on. Then we do a kind of closure of all the things we could possibly say about what we've constructed — this is the symbolic deduction part. Then we pick one of these, and we do what they call a traceback: we look at all the nodes that are required to prove the statement that we picked. Now what we can do is say which elements of the original problem are part of the final statement, and which elements of the original problem are not part of the final statement but are somewhere part of the proof. The ones that are not part of the final
statement but appear somewhere in the proof — those directly lead to the formulation of a problem which, as you can see, is not the same as the original thing; it's derived from it. I can now leave away everything except the things I want to say something about, and whoever proves that statement will necessarily need to construct these auxiliary constructions along the way, because those are on the way to the proof; they can be constructed, because that's how we originally did it, and they are not already there, because they're not in the final statement. So that's how they come up with a data set that requires these auxiliary things to be constructed in order to prove something. That's it, that's how they make the data set, and I think that's pretty smart. You can totally see that it requires this very specific set of circumstances: okay, the problem domain is kind of infinite, but the list of basic premises you can combine is very small, and the results can be verified easily, and so on. But still, I think it's pretty smart how they did it. So, yeah, you can see right here: points E and D took part in the proof despite being irrelevant to the construction of HA and BC; therefore they are learned by the language model as auxiliary constructions. So now we're just going to take the language model and feed those derived proofs into it, and we're going to weight the ones with auxiliary constructions specifically — notably, they pre-train on all the proofs and then they fine-tune on the ones with auxiliary constructions, so that the language model is heavily biased towards making these auxiliary constructions. And yeah, that is how they come up with a language model that can suggest new things to construct. All right. All of this here is just saying that the key missing piece is generating new proof terms: if you want to do computer math, provers are very good
at symbolic deduction, but they're not good at suggesting new things to feed into the symbolic deduction. And yeah: the proofs perform auxiliary constructions that symbolic deduction engines are not designed to do, and if you make them consider those, that introduces infinite branching points into the search tree — so even in the first step you have infinitely many things to consider, not just many, but infinitely many, because constructing new things is a wide-open domain. The language model is seeded with the problem statement string and generates one extra sentence at each turn, conditioning on the problem statement and past constructions, describing one new auxiliary construction, such as "construct point X so that ABCX is a parallelogram". Each time the language model generates one such construction, the symbolic engine is provided with
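The data-generation trick from this section — points that appear in the proof but not in the goal statement become the LM's training target — can be sketched as a small set-difference computation. A minimal sketch with facts as whitespace-separated strings; the representation is illustrative, not the paper's actual DSL:

```python
def make_training_example(construction, proof_facts, goal):
    """Turn one synthetic (construction, proof, goal) triple into a problem.

    construction: dict mapping point name -> the premise step that created it
    proof_facts:  the facts collected by the traceback for `goal`
    goal:         the statement to prove, mentioning some of the points
    """
    def points_in(facts):
        # Any whitespace token that names a constructed point counts.
        return {tok for f in facts for tok in f.split() if tok in construction}

    goal_points = points_in([goal])
    proof_points = points_in(proof_facts)
    # Points used in the proof but absent from the goal: the auxiliary
    # constructions the language model must learn to propose.
    auxiliary = proof_points - goal_points
    # The stated problem keeps only the constructions the goal mentions.
    problem = {p: construction[p] for p in goal_points}
    return problem, auxiliary
```

On the video's example, E and D fall out as the auxiliary points while A, B, C, H stay in the stated problem.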

Experimental Results

new inputs to work with, and therefore its deduction closure expands, potentially reaching the conclusion. They look a little bit into the data they have generated. The proofs are skewed towards the shorter end, but here you can see the average proof length, and there are very long proofs too — there are enough proofs in the data set that surpass it, although this is a log scale. Suffice to say, they do manage to cover the domain okayish, and okayish to the degree that the proofs they generate are at least sometimes as long as or even longer than the proofs that are necessary to solve these math olympiad problems. I don't want to go too much into this; the notes are available to Patreons as always. I want to go a little bit into the results. They consider 30 math olympiad problems as a benchmark; previous methods solved far fewer, while their system solves 25 of them. As for the ones they didn't solve — maybe I've overread it, because Nature papers are... I really don't know how someone can read those papers and orient themselves in them, because it's like five passes over the same thing, but each time a bit different, and oh, this is the methods section — in any case, I have not seen them look deeply at the ones where it failed and explicitly say here is why it failed. Were they not representable? But I think the ones that are not representable in the language were already excluded from the 30, so I don't exactly know why they couldn't solve the remaining five, and that would be interesting, but maybe it's in the paper and I haven't seen it. They also do some ablations here. They say — the funny part is, okay, GPT-4: zero problems solved — they say if we only use 20% of the training data, AlphaGeometry still solves 21 problems; similarly, if we use less than 2% of the search budget, so a smaller beam search at test time, Alpha
Geometry can still solve 21 of the 25 problems. The way to read that is: they got to the 21 mark of solved problems pretty quickly, but then they pumped a lot more compute and data into it, and that gave them four more problems, respectively. Without fine-tuning, you can see that they solve 23, so they say okay, let's fine-tune on the ones with auxiliary constructions, and that gives them two more. What that suggests to me — they frame it here as "oh, if we only use 2% we still achieve state of the art"; I read it as "we pumped a lot more compute and data into it and only got very marginal returns" — is that there might be a ceiling to this, where in order to maybe solve the five remaining ones you'd probably have to invest humongously more resources. I don't know, that's just how it reads to me: they frame it as "we can still do really well with a smaller budget", but it's also "we couldn't do much better even with a bigger budget". Here you can see the procedure: we pre-train the language model on all 100 million synthetically generated proofs, including the ones that are pure symbolic deduction; we then fine-tune the language model on the subset of proofs that require auxiliary constructions, which was about 9 million proofs — so about 10% of the total data had these auxiliary constructions. Briefly going over the rest of the paper: I hope you get by now what they did and how they did it, but there are some interesting other bits. For example, the proof length of hard problems seems to correlate with the average score of human contestants: you can see, as the problems get longer in terms of proof length, humans also score less on them, which makes sense — it kind of makes sense once someone says it, but it wasn't necessarily obvious that this would be the case; it could also have been that they're just more difficult. But
then again, if you consider that the number of actually different things you can do in this domain is very limited, it does make sense that the main way you can introduce difficulty into this domain is just by making the required proof steps longer. You can also see that all of the very difficult problems did require some sort of construction; in fact, the deduction-only ones tend to be the easier ones. So yeah, I think this is a domain where the hardness of problems correlates, probably more than in most other domains, with just how many steps are required to achieve the goal, and humans simply aren't that good at doing that search problem over very long horizons, looking very far into the future in a search tree. Good. They say something about

Problem Representation

how they represent it. They purposefully don't use a formal language that can represent general math problems, such as Lean; they say that would require a large amount of groundwork to describe these problems. So they say: we instead adopted a more specialized language used in these other works and systems, and owing to that, 75% of all the problems can be adapted into this representation. Every proof step is logically and numerically verified and can also be evaluated by a human reader as if it were written by an IMO contestant, thanks to the highly natural grammar of the language. That's one thing that's certainly pretty helpful in evaluating these things: as you can see in the proofs I've shown you, it did look and sound interpretable. The proof here on the top right — we could read it; the top thing here is a construction, very readable from a human standpoint, and obviously also tokenizable, and then these things right here, the premises and the inferences and what they mean — all of that is quite readable and interpretable by a human, and that also allowed them to do human rating on their solutions. So that's certainly an advantage as opposed to doing it in something like Lean, where this proof would probably already be quite large and not very understandable by human standards. But the other reason is certainly that this being so limited, it limits a lot the breadth of their premise construction, and second, it is much more conducive to the language model, because the vocabulary is limited, and with the same amount of data you can cover a lot more valid sequences if the vocabulary is smaller. All right, I don't want to bore you too long. They are still a tiny bit worse than the gold medallists on this, but they are probably better than most humans, especially most untrained humans — so take that as you will. This is, as I say, in

Final Comments

summary, a paper that does a very good job at a very specific niche application. I guess the biggest question here is: this technique that they used, how reliant is it on the specifics of this domain, and how applicable is it to more general problem settings? Or, to ask it another way: what other problem settings exhibit the same core fundamental elements such that this technique is applicable? I'll leave that up to someone else to decide. I wish you a very pleasant evening, thanks for listening, and bye-bye.
