AlphaFold - The Most Important AI Breakthrough Ever Made


Two Minute Papers · 02.12.2025 · 220,277 views · 14,469 likes


Video description
To celebrate the 5th anniversary of #AlphaFold, I was invited by Google DeepMind to interview Nobel Prize Winner and Distinguished Scientist, John Jumper. Note that we have no business ties with them. Thank you so much to John for being so kind and insightful, and to the film crew as well - they all did an incredible job. AlphaFold: https://deepmind.google/science/alphafold/ The full Thinking Game Movie: https://www.youtube.com/watch?v=d95J8yzvjbQ Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers Note that just watching the series and leaving a kind comment every now and then is as much support as any of us could ever ask for! My research: https://cg.tuwien.ac.at/~zsolnai/

Table of contents (5 segments)

Segment 1 (00:00 - 05:00)

favorite Two Minute Papers episode. — Oh, AlphaFold. It's easy. It almost felt too easy. It felt like too many of our ideas were working. It felt like it was going up, and I remember talking to Tim, the engineering lead, going, "This is really feeling too easy. We're having too much success. This problem can't be this easy. Are we leaking the test set?" Right? You know, are we doing the classic machine learning sin? Fellow scholars, I don't really like to be on camera, but there is a big reason I am here today. You see, I met Nobel Prize winning chemist John Jumper last year, and we talked for an hour. And in that hour, I learned more than I thought I would learn in a year. It was unbelievable. And today, I have the opportunity to give you this amazing gift, too. So, with that said, hey, John. — Hello. — Really grateful to have you here today. I have goosebumps, which I have carefully hidden under this lab coat. So, what is AlphaFold and why is it important? — So AlphaFold is a neural network, which makes it relatively appropriate for the podcast, but it is a deep learning system that predicts the result of a specific scientific experiment. And to tell you about that, I should tell you about the domain that it's in: proteins. So proteins are the nanomachines that basically drive your cell. A couple thousand atoms each. They're coded for by your DNA. When we say that DNA is an instruction manual for the cell, a lot of what it's telling you is how and when to build proteins. And so three letters of your DNA map to one of 20 chemical groups. Those chemical groups are basically just little collections of atoms, you know, boop boop boop boop. And there's a machine, another protein in the body, that reads the DNA in a relatively complicated process and builds out the proteins one step at a time, joining links in a chain or a rope.
So it takes this chemical group, attaches that one, attaches that one, basically the same way each time, and builds out a string of maybe 300 of these, which is a reasonably typical length. And then what happens when your cell builds this thing? Of course, most of them are not machines that function as just floppy ropes. Some parts of it are greasy, some positively charged, some negative. So it will fold up. It will make helices. It will make sheets. It will pack into a relatively compact 3D object that is the assembled working machine. So these are machines that build themselves, joined in 1D. And of course our DNA is 1D and our world is 3D, so this is how the body solves this. It builds these things, and they fold up into this incredibly intricate shape. And there are about 20,000 human proteins; there are hundreds of millions, billions of known proteins across all organisms. One part of what I described is really easy to measure: it's really easy to read our DNA thanks to the genomics revolution. You can think of it as pennies to read the sequence of a protein, the DNA that becomes the linked amino acids. But it takes a year to get the structure of a protein, with really hard experiments, and they often fail; it's just extraordinarily difficult. If you want to put an economic value on it, maybe $100,000. So scientists do this experiment where they start from DNA, but they really want to understand how this machine works, so they need to see a picture of it. And so they determine the structure experimentally. They use enormous synchrotrons, the size of small villages, in order to do this. But people have done it a lot. There's been enormous societal investment, because it's really important to understand this to understand disease, to do drug development.
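The mapping John describes, three DNA letters to one of 20 chemical groups, can be sketched in a few lines. The codon table below is a small illustrative subset of the standard genetic code, not the full 64-entry table:

```python
# Minimal sketch of DNA-to-protein translation, as described above.
# Only a handful of codons from the standard genetic code are included here;
# a real translator uses the full 64-codon table.
CODON_TABLE = {
    "ATG": "M",              # methionine, the usual start codon
    "GCT": "A", "GCC": "A",  # alanine
    "GAT": "D",              # aspartic acid
    "TGG": "W",              # tryptophan
    "TAA": None, "TAG": None, "TGA": None,  # stop codons
}

def translate(dna: str) -> str:
    """Read the DNA three letters at a time, joining amino acids like links in a chain."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE.get(dna[i:i + 3])
        if aa is None:  # stop codon (or a codon missing from this toy table) ends the chain
            break
        protein.append(aa)
    return "".join(protein)

print(translate("ATGGCTGATTGGTAA"))  # → MADW
```

The output string is the 1D chain of amino acids; the folding into 3D is the part that took a year of experiment per protein, and that AlphaFold predicts.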
There are now about 200,000 known protein structures, about 140,000 when we did AlphaFold, and we developed a deep learning system that goes from amino acid sequence, the DNA sequence, to the structure of a protein in five or ten minutes instead of a year, and does this with accuracy close to, not quite as good as, but very close to experimental accuracy. And it's been used enormously. So we've predicted the structure of about 200 million proteins: every protein from an organism whose full genome has been sequenced. Scientists are using it for drug development, to understand the body, everything else. And I think from a machine learning point of view, it's kind of the first problem really transformed by AI. It's an extraordinarily practical system that scientists are using; I think something like three million scientists have used our database of predictions. People make predictions every day with this. And it's also this promise that we're going to use AI not just to do things that humans can do or solve human problems, but to do things at a superhuman level. There are no humans that are good at getting the structure of a protein by eye; they do it with experiment. That we

Segment 2 (05:00 - 10:00)

can use this to transform science, that we can build new tools that fundamentally advance our science. — Now, I remember asking you last year: how did it feel when it first started working? — The time I really remember... AlphaFold is built iteratively. It's not that yesterday we didn't have AlphaFold and today we did. It was maybe two years and probably 30 or 40 different individual ideas that worked along the way, some grand ideas, some small ideas, but each one inching up the performance. And I remember maybe a year into building AlphaFold 2, the one that was really very successful, it almost felt too easy. It felt like too many of our ideas were working. It felt like it was going up. And I remember talking to Tim, the engineering lead, going, "This is really feeling too easy. We're having too much success. This problem can't be this easy. Are we leaking the test set?" Right? You know, are we doing the classic machine learning sin? — And he was sitting there going, "I don't think we are." And we went back, we double checked, we zeroed coordinates in our eval set to make sure we weren't actually leaking. We couldn't really ever find a leak, but it felt too easy. It felt like nature shouldn't yield this easily to our efforts. — And I remember I wasn't really totally sure until we actually did some structure predictions for SARS-CoV-2 proteins related to COVID, — and then the experiment came out afterwards, and that's when we were really sure: okay, we were really not leaking anything. But it was wild. — Wow, that's crazy. But that is also the hallmark of a pro scientist, you know, because during a research project, you miss a thousand balls, and when you finally hit one, you don't ask questions, you celebrate. But that's not what you did; you picked apart the performance immediately instead. So that's amazing.
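The leak check John describes, making sure the eval set is not contaminated by training data, can be sketched as a simple overlap test. This is a hedged simplification: real protein pipelines deduplicate by sequence similarity, not exact match, and the sequences below are made up for illustration.

```python
# Simplified sketch of a train/eval leakage check, in the spirit of the story
# above. Real pipelines compare by sequence identity thresholds; this toy
# version only catches sequences that appear verbatim in both sets.
def find_leaks(train_seqs, eval_seqs):
    """Return eval sequences that also appear verbatim in the training set."""
    train = set(train_seqs)
    return [s for s in eval_seqs if s in train]

train = ["MKTAYIAK", "GSSGSSG", "MADWQQL"]   # hypothetical training sequences
evalset = ["MADWQQL", "PLLVNNN"]             # hypothetical eval sequences
print(find_leaks(train, evalset))  # → ['MADWQQL']
```

If this list is non-empty, the eval score is suspect: the model may simply be remembering answers rather than predicting them, the "classic machine learning sin" above.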
— But you see, pro athletes, when they miss, they're always interrogating, fixing, thinking. This is a craft. Machine learning is a craft, and you have to be a craftsperson to do it. — Mhm. All right. Now, about the score. I'm asking this because the score didn't jump from zero to 100 in just one magic trick. This was a sum of many brilliant little puzzle pieces, and each of these contributed a little to the score. You add another puzzle piece, you get another few points. And what I'm wondering is, this sounds like linear progress, you know, climbing step by step. So why is it so surprising when you get to the peak? — So, you know, much like Moore's law was a succession of ideas and breakthroughs that in total gave the appearance of inevitability, and that inevitability in the case of Moore's law was driven by exponential growth in investment as well. When you do this, you never know if you're going to get the next win, right? In fact, we have charts of progress, and they don't actually go up smoothly. The list of ideas maybe looks like that, but the actual progress went flat, flat, oh, what about this idea, idea, flat, flat, idea, idea. And about the flat versus up: at the time, DeepMind was kind of on six-month cycles, so every six months you formally continued your project and presented your results to the whole company. And I remember the first three months we would always try our wildest ideas, and it would mostly not work, and we would get very scared, and then about halfway through we'd say, okay guys, we've got to get serious, we can't have no progress. And then suddenly some idea would hit, and then a bunch of ideas would hit.
So it was always alternating elation and terror. It's only when you make it really blurry and you squint and zoom out that you say, oh, it went up linearly. — Yeah, it's like overnight successes ten years in the making, right? — Yeah, it's that sort of thing. — All right. Can you build an intuition on what proteins will look like when folded up into a 3D structure? And also, did you ever have a protein structure where you looked at the 3D result and said, that cannot be right, and it turned out to be right? Does that happen? — Okay, I'll tell two stories on this. An intuition, you mean an intuition, not on an individual protein? So sometimes you can say, oh, this looks really similar to this other protein, — and therefore I bet it's going to have about the same structure. That's what humans can do, and people call it homology modeling. It's a very fancy name for saying, well, the sequence is similar, so probably the structure is similar. So you can do that, and sometimes you can notice individual motifs. There were all these papers

Segment 3 (10:00 - 15:00)

that would list all these motifs. Helices are a very common element in proteins, and I remember a paper on, well, the last element of a helix is going to be one of these three amino acids, the one before that is going to be one of these, and so on. So there are some regularities and human rules that have been cataloged, and you can use those a bit, but ultimately it only works a little and doesn't give you the kind of precision you need to do drug development at all. In terms of things that actually surprised me, a real surprise came from machine learning. I shouldn't have been surprised, but I was. There were two big surprises. One was, sometimes we would have proteins with giant voids or cavities in the middle, or a protein that was C-shaped, and you know, the atoms in proteins are really packed up against each other, right? It's a very dense object. And I said, that doesn't look right, but the model was extremely confident. And we looked at the experimental structure and immediately realized what had happened. The original AlphaFold 2 was trained only on single proteins, but often when a protein is solved, multiple copies of itself will appear, what's called a homomer. So maybe three copies actually densely intertwine with each other, so the actual folded thing is not one copy, it's the three copies together, a trimer. Or there would be some other protein of a completely different type that would, say, wrap around it, and they only appear together in the body. And sometimes AlphaFold would realize these patterns and leave these giant voids that look totally wrong, or this spiral which is just floating in air, and I'd be like, well, that's wrong, but it's extraordinarily confident. And then I would find out, oh, it realized that in fact this protein comes in three copies, and so this spiral is one third of that, and if you overlay it, it's perfect.
So even though we didn't tell AlphaFold about this context, it had learned rules that sometimes produce these geometric patterns, which I can explain. I think the other big surprise was when we ran AlphaFold across random proteins in humans. We would see some bits that looked beautiful and structured, and some really ugly, long, arcing ribbons. Oh no, that's wrong. And I remember we looked at that, and we wouldn't see this very much when we predicted proteins that were experimentally solved. We said, oh no, are proteins that have been experimentally solved somehow special, and actually AlphaFold isn't good on the things we hadn't solved? And then Katherine on the team, a little later that day, looks in UniProt, this database of various experimental facts about proteins, which will tell you certain regions that are known, for example experimentally, to be disordered. And she starts to realize that where AlphaFold is making these ridiculous, long, arcing predictions that can't possibly be correct, the confidence was very low, and those regions were disordered. What AlphaFold was in fact telling us, kind of implicitly, is: this region doesn't have a structure. So what we found out is that the lowest AlphaFold confidence was actually pretty much a state-of-the-art predictor of whether a region of a protein was disordered. And so we would find all these things that we kind of knew about proteins but couldn't see, because disorder doesn't appear in this database of protein structures. We would find all these things out just by looking at AlphaFold and being surprised. — Mhm. Amazing. Now, I'll not ask what its most impactful application is, because it now has hundreds of thousands of research works building on it in just about five years, which is unbelievable. So which one is your favorite? — I think I have two favorites.
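The observation above, that stretches of low per-residue confidence track disorder, amounts to a simple threshold rule over the confidence track. A minimal sketch, assuming per-residue scores on the usual 0-100 pLDDT-style scale; the cutoff of 50 is a commonly cited heuristic, not an official rule, and the scores below are invented:

```python
# Sketch: flag likely-disordered regions from per-residue confidence scores,
# mirroring the observation above. The threshold of 50 is a commonly used
# heuristic for pLDDT-style scores; treat it as an assumption.
def disordered_regions(plddt, threshold=50.0, min_len=3):
    """Return (start, end) index ranges where confidence stays below threshold."""
    regions, start = [], None
    for i, score in enumerate(plddt):
        if score < threshold:
            if start is None:
                start = i          # a low-confidence run begins here
        elif start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    if start is not None and len(plddt) - start >= min_len:
        regions.append((start, len(plddt)))
    return regions

scores = [92, 90, 88, 40, 35, 30, 28, 85, 91]  # invented per-residue scores
print(disordered_regions(scores))  # → [(3, 7)]
```

High-confidence residues are left alone; only sustained low-confidence runs are reported, which is roughly how "the long arcing ribbons" showed up as implicit disorder calls.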
One was this giant protein complex, hundreds of protein chains, called the nuclear pore. The nuclear pore is kind of the giant gate for the nucleus. The nucleus stores your DNA, right? It's where your nucleic material is, and the rest of the cell is outside the nucleus. And so you need a gatekeeper that decides who can enter and leave, and kind of opens and contracts. And I remember thinking, you know, this is enormous; it's a thousand times bigger than what AlphaFold can do. So, you know, maybe later we'll come up with some machine learning that will help with these kinds of problems. — And then this paper comes out. The first one I saw was out of the Kosinski and Beck labs, saying: we solved the structure of the nuclear pore. We knew something like 30% of it before; now we know 60 to 70%, and a lot of the rest is actually disordered. They combined very low resolution experimental techniques, cryo-ET, with AlphaFold for the individual pieces, running different AlphaFold predictions and finding all the little joins and compartments, and then they could finally build the model of the nuclear pore. And in fact, that and some very related papers were a special issue of Science, all about the structure of the nuclear pore, and three out of the four made huge use of AlphaFold. I remember searching through these papers and finding maybe 150 mentions of the word AlphaFold in

Segment 4 (15:00 - 20:00)

this, in work that we didn't do. All we did was make the software tool that scientists used to make amazing discoveries. And I just felt like, you know, the Nobel is extraordinary, and now I'm waiting for the Nobel of someone who used AlphaFold and their own creativity to discover the next thing. — Yeah. — The second-order Nobel is the one I can't wait for. — And I think the other favorite was that people discovered all these uses of AlphaFold that we didn't expect to really work. So they would run thousands and thousands of AlphaFold predictions and just see which ones AlphaFold liked. The one I really loved, there was a paper on fertilization. How do egg and sperm come together? There are proteins on egg and sperm that join together, and they recognize each other, and they start fertilization. But it was known that there was a protein in humans that was missing, that something didn't make sense. And there were actually two labs that did this: they took this protein on the egg, and 2,000 proteins, every one that appears on the surface of sperm, and just ran 2,000 AlphaFold predictions, and they found one specific protein that AlphaFold thought stuck up against this egg protein. And then they go to the lab, and they show: knock this protein out, and egg and sperm will come together but not start fertilization. They make mutations in the individual regions in which these come together, and they find that blocks fertilization. So they've pretty much established the biochemistry now. They had no idea which of these 2,000 to look at, and AlphaFold said, look at just this one, and sure enough, that was the protein that was essential. And I love this notion that we would never do this with experiment. You would never send out 2,000 labs to make 2,000 structures and see which one comes back. — We can do new types of science because of the scale we've achieved. — Yeah. Incredible. Any unexpected use cases? — All right.
So one that really surprised me: I can tell you an unexpected weakness, and then an unexpected strength, of AlphaFold. The unexpected weakness is, if you take a protein and you break it, you do something that's going to cause it to be unstable. One very strong rule of proteins is that positively or negatively charged amino acids don't appear in the greasy middle part of a protein, right? They don't like grease. And aspartic acid is a very small charged amino acid; it doesn't really appear in the center of proteins. So if you take a protein and you mutate one of the inner amino acids to aspartate, AlphaFold won't really change its structure. Even though this doesn't make sense, and there are reasons you can explain it, we say AlphaFold is not very point mutation sensitive. It's answering a slightly different question. So we said, okay, that's some future work. Now, there are a lot of people who do protein design, and they were using AlphaFold to check their designs: does their design method work? Does it produce sequences that AlphaFold thinks fold to the structure they were trying to make? And I remember thinking, that's probably not going to work, because AlphaFold isn't mutation sensitive; it doesn't have a sensitive enough understanding of the interactions. But I was totally wrong about that. People found that it was actually really good, when it came to designed proteins, at figuring out which ones might work. One paper that came out a few months after AlphaFold said that when designing proteins to bind to each other, they get a ten-fold increase in success rate if they only make the things that AlphaFold thinks bind. — Mhm. — And it's become really dominant, actually: AlphaFold filtering is one of the secrets of modern protein design. Even though we built it for natural proteins, we got this enormous design improvement for free. — Mhm.
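Both tricks above, screening 2,000 candidate partners and filtering designed sequences, boil down to the same loop: run a prediction for each candidate and only keep the ones the model is confident about. A minimal sketch, where `predict_confidence` is a hypothetical stand-in for an actual structure-prediction run and the cutoff is an invented illustrative value:

```python
# Sketch of confidence-based filtering, as in the partner screen and the
# protein-design filtering described above. `predict_confidence` is a
# hypothetical stand-in for running a real structure prediction per candidate.
def filter_by_confidence(candidates, predict_confidence, cutoff=0.8):
    """Keep only candidates whose predicted confidence clears the cutoff."""
    return [c for c in candidates if predict_confidence(c) >= cutoff]

# Toy stand-in score (fraction of non-glycine residues), purely illustrative;
# a real pipeline would score each sequence with a structure predictor.
def toy_confidence(seq):
    return 1.0 - seq.count("G") / len(seq)

designs = ["MKTAYIAK", "GGGGGGGG", "MADWQQLG"]  # invented candidate sequences
print(filter_by_confidence(designs, toy_confidence))  # → ['MKTAYIAK', 'MADWQQLG']
```

The expensive part in practice is the per-candidate prediction; the filtering itself is trivial, which is why running thousands of predictions and keeping the confident ones became such a cheap and powerful screen.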
Now, just to showcase the influence of AlphaFold, in my opinion, let me hold on to my papers for this one to make sure I word this properly. — Oh, yeah. — In 20 years, nearly every person with access to modern healthcare will benefit from a tool, diagnostic, or drug influenced by AlphaFold. What do you think? — I think that's pretty fair. I think that it is now a tool of modern biology. And I will say that there are other such tools: every biological discovery today in some way benefits from DNA sequencing, right? DNA synthesis, right? These are tools that underpin the technology of modern biology, and AlphaFold is very certainly one of those. People teach it to grad students, right? It's a standard part of the graduate curriculum: we will learn how to do some things, and I will show you how to use AlphaFold, because you will probably use it in your research. And then people make all these discoveries, and these discoveries compound and grow. That's the wonderful part of working in research: you have this enormous spreading out of the work you do. It's wonderful. I think sometimes it's wonderful to be a doctor, to be someone who very

Segment 5 (20:00 - 22:00)

definitely and obviously decides the right treatment for a patient and makes someone healthy. But I also love the thought of being a researcher, that I can build a tool that will help a hundred thousand people, a million, that will help a billion be healthy in the fullness of time, as it helps bring forward science. You know, I like to think that AlphaFold maybe made structural biology, which is one of the major fields of biology, five or ten percent faster, — right? And that's extraordinary. Is it a possibility, you know, AlphaFold gives you a confidence score too, not just a prediction, can it be confidently incorrect? — Yes. So, a very simple analogy. If the weather report says there's a 90% chance of rain today and it doesn't rain, was it wrong? Some people will say yes, but that's not obviously correct; you're supposed to be wrong one time in ten. So we can say AlphaFold's confidence is calibrated. What we can really say is, you know, a 90% chance, or really, what we say is that average accuracy is 0.9 on a certain scale called lDDT. So if our confidence says 0.9, then on average it will be there, but some of them will be very bad. And actually, we know a very interesting failure mode of very high confidence. Sometimes it's just wrong. But more commonly, for example, a protein will have two structures, and AlphaFold will produce one with high confidence, but you really wanted the other one. And so its confidence more reflects: does this structure make sense as one state of the protein? It doesn't necessarily say it's every state of the protein, or the one you care about. — All right, let's have a lightning round. I ask you something and you try to answer in one sentence. — Oh, that's hard for me. — How did AlphaFold 2 improve on the first one? — We did machine learning research at the intersection of proteins and ML, not taking ML off the shelf and applying it to proteins.
— AlphaFold 3? — We expanded it to do the protein cinematic universe, and we adjusted the architecture to make it work. — AlphaProteo? — It developed new techniques to design proteins more efficiently, using AlphaFold and other ideas. — Favorite Two Minute Papers episode? — Oh, AlphaFold. It's easy. Kidding. — Yes! All right, John, I've learned so much again. Huge honor. Thank you so much. — It was a pleasure. Thank you.
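The calibration claim in the interview, that a stated confidence of 0.9 should correspond to an average accuracy of 0.9, can be checked with simple binning. A sketch under invented data; the numbers are illustrative, not real AlphaFold output:

```python
# Sketch of a calibration check: group predictions by stated confidence and
# compare against the average observed accuracy, as in the rain analogy above.
# All data here is illustrative, not actual AlphaFold output.
def calibration_error(confidences, accuracies, n_bins=10):
    """Mean absolute gap between stated confidence and observed accuracy, per bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, acc in zip(confidences, accuracies):
        idx = min(int(conf * n_bins), n_bins - 1)  # which confidence bin this falls in
        bins[idx].append((conf, acc))
    gaps = []
    for b in bins:
        if b:
            mean_conf = sum(c for c, _ in b) / len(b)
            mean_acc = sum(a for _, a in b) / len(b)
            gaps.append(abs(mean_conf - mean_acc))
    return sum(gaps) / len(gaps)

# A perfectly calibrated toy example: 0.9 stated confidence, and the model
# is "wrong one time in ten", so observed accuracy averages 0.9 as well.
confs = [0.9] * 10
accs = [1.0] * 9 + [0.0]
print(round(calibration_error(confs, accs), 3))  # → 0.0
```

A calibrated score being sometimes wrong is expected by construction; the failure mode John describes, confidently predicting one valid state when you wanted another, is a different problem that calibration alone cannot catch.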
