Building OpenAI o1 (Extended Cut)
22:13

Building OpenAI o1 (Extended Cut)

OpenAI 20.09.2024 278 718 просмотров 6 338 лайков

Machine-readable: Markdown · JSON API · Site index

Поделиться Telegram VK Бот
Транскрипт Скачать .md
Анализ с AI
Описание видео
Top row (left to right): Mark Chen, Giambattista Parascandolo, Trapit Bansal, Łukasz Kaiser, Hunter Lightman, Karl Cobbe, Łukasz Kondraciuk, Szymon Sidor, Noam Brown, Hongyu Ren, Liam Fedus, Hyung Won Chung Bottom row (left to right): Ilge Akkaya, Jakub Pachocki, Shengjia Zhao, Jason Wei, Wojciech Zaremba, Jerry Tworek Host: Bob McGrew More on the research: https://openai.com/index/learning-to-reason-with-llms/ Full list of contributors: https://openai.com/openai-o1-contributions/

Оглавление (11 сегментов)

Introduction

all right I'm Bob McGrew I lead the research team here at open aai we've just released a preview of our new series of models 01 and 01 mini which we are very excited about and we've got the whole team here to tell you about them what exactly is 01 so we're starting a series of new models uh with the new name oan this is to highlight the fact that you might feel different when you use o as a compared to previous models such as GPT 40 as others will explain later o1 is a reasoning model so it will think more before answering your question we are releasing two models o1 preview which is to preview what's coming for 01 and 01 mini which is a smaller and faster model that is trained with a similar framework as o1 so we hope you like our new naming scheme o1

What is reasoning

so what is reasoning anyway so one way of thinking of reasoning is that there are times where we ask questions and we need answers immediately because there are simple questions for example if you ask what's the capital of Italy you know the answer is Rome and you don't really have to think about it much but if you um Wonder um about a complex puzzle or you want to write a really good business plan novel you probably want to think about it for a while and the more you think about it the better the outcome so reasoning is the ability of turning thinking time into better outcomes whatever the task you're doing so how long have you guys been working on this early on at open AI we were very inspired by the alpha go results and the potential of deep reinforcement learning and so we were researching that heavily and we saw great scaling on daa and Robotics and we were thinking about how H how can we do reinforcement learning on a general domain to get to a very capable artificial intelligence and then we saw the amazing results of scaling on supervised learning in the GPD Paradigm and so ever since we've been thinking about how do we combine this two different paradigms into one yeah and it's hard to point to one exact instance where this whole effort got started but you know we've had early Explorations with yakob and Shimon lukash Ilia and of course like I think one moment in time here is consolidating things with Jerry and um having him build out this large scale effort here so I mean it's been

Aha moment

going on for a long time but I think what's really cool about research is there's that aha moment there's that particular point in time where something surprising happens and things really click together are there any times for you all when there was you had that aha moment like we trained gpt2 gpt3 GPT 4 there was the first moment when the one was hot of the press we started talking to the model people were like wow this mod is really great and starting doing something like that and I think that there was a certain moment in our training process where we trained like put more comput in our than before and train first mod generating coherent chains of thought and we so wow this looks like something meaningfully different than before and I think for me this is the moment uh wow related to that uh when we think about like training a model for reasoning uh one thing that immediately jumps to mind is you could have humans write out their thought process and train on that when aha moment for me was like when we saw that if you train the model using RL to generate and hone its own chain of thoughts it can do even better than having humans right chains of thought for it and that was in aha moment that you could really scale this uh and explore models reasoning that way for a lot of the time that I've been here we've been trying to make the models better at solving math problems as an example and we've put a lot of work into this and we've come up with a lot of different methods but one thing that I kept like every time I would read these outputs from the models I'd always be so frustrated the model just would never seem to question what was wrong or when it was making mistakes or things like that but one of these early uh 01 models when we trained it and we actually started talking to it we started asking it these questions and it was scoring higher on these math tests we were giving it we could look at how it was reasoning um and you could just see that it started to question itself and have really interesting reflection um and that was a moment for me where I was like wow like we we've uncovered something different this is going to be something new and it was just like one of these coming together moments that that was really powerful so

Spiritual experience

when you read the the thoughts do they does it feel like you're watching a human or robot it's like a spiritual experience it's a spiritual experience but then you can empathize with them all like oh that's a mistake that a lot of people would make or you can see it sort of questioning common conventions and yeah it's spiritual but like oddly human in its Behavior it was also pretty cool at some point when uh when we have seen in cases where there was like a limited amount of thinking allowed for the model that just before the timeout the was like I'm like I have to finish it now and like here's the answer I spent a lot of time doing uh competition math when I was young and that was really my whole reason for getting into AI was to try and automate this process and uh so it's been very like a huge full circle moment for me to see the model actually be able to follow through like very close to the same steps I would use when solving these problems um and that's you know it's not exactly the same chain I thought I would say but very relatable it's also really cool to you know it's believable that these models they are getting on the Casp of really advancing engineering and science and if they seem to be like solving the problems are you know maybe we can call ourselves experts hard for us then maybe they will be even hard for some other experts and could Advance science so

Hurdles

we've talked a lot about some of the Great Moments and the times and everything just clicked what are some of the hurdles places where it was actually really hard to make things work training large models is fundamentally a very hard thing to do and there are like thousands things that can go wrong and there are at least like hundreds that did go wrong in every training R so almost everyone here like you know put a lot of Blood Sweat and Tears in training those things and figuring out how to how to keep them continue learning and improving on a path is actually the path of success is very narrow and the ways of failure are plentiful it's like imagine like having a the center for launching a rocket to the let's say some planet Moon or so and if you are off by one angle you won't arrive at the destination and that's our job so the model we said is very good often times better than humans like has equivalent of several phds and that is sometimes a challenge because we have to often go and verify that the model isn't going off the rails doing something s sensible and it started taking some serious time as we scal the model uh we were saturating out all the industry uh grade evals and we don't know what to look for next so that is also a challenge yeah I do think all of these things we ran into it's also been one point of fulfillment it's like uh you know every time you have a puzzle it's like another hurdle for this team to overcome and I'm really glad with all the little hurdles that we've overcome

Testing

so what are some of the ways you tested the models did you have any favorite questions that you saw the model get better at how many hours are in ster for whatever reason the Jud GPT wasn't able to solve this question reliably but oh one like you know we did like a year and a half work like a Lar and now we can count the number of Arts in Strawberry we should have just hardcoded that wayu reliably I have this habit which I think other people here do too of whenever you go on Twitter and you see some post that's like large language models can't do this you copy and paste it in and then you said confirm that our large uh can do this to give people a

How to use o1

sense of what they can use the model for I'd love to hear some of the ways that you use 01 so uh one way I've been using o1 is for obviously coding and a lot of my job is about coding so the U more and more I focus on the problem definition and use this what's called a TD test driven development so instead of writing the code that implements the functionality I focus on writing say the unit test that specify what is correct behavior of this piece of code to pass and so because I can focus on more that and then pass it on to1 to really implements the thing I can focus on this what's important what's the high level problems um to solve and so on so this has been really a important way of Shifting my focus and another area is debugging so now when I get some error messages I just pass to oan and then it just prints out something sometimes it solves right away even if it doesn't it at least give some better questions to ask and uh provide some ways to think about this problem better so it has really important change of um working uh for me and I hope this helps others too I like using a one more and more for learning the more I ask it's like various complex technical subjects I find hallucinate less and explain better those Concepts on previous models for me I like to use o1 as like a brainstorming partner so that can range from anything from like a how to solve some very specific ml problem machine learning problem to like how to write uh a blog post or a tweet so uh for example I recently wrote a blog post about language model evaluations and I was asking oan about ideas for the structure of the blog post pros and cons of certain benchmarks um and even the style of the writing and I think because it's able to think before it gives the final answer um it's able to connect ideas better it can revise and uh critique candidate ideas and things like that yeah I think if you need like a know you have some short text and want it more creative something really different that that's a great use to like give me five different ideas also if you have just sort of like some unstructured thoughts it's a really brilliant thought partner so you have like some ideas it's like well how should I connect these things what am I missing um and through its final answers and through sort of reading it's like thought process it can really lead to like much better results for you yeah I use it to try out a bunch of our internal secret ideas and it actually tries improved yeah for Standalone projects it's great like I I had to add a GitHub plugin I know nothing about adding GitHub plugins and I just said like hey I want GitHub plugin that displays this and this information about the pr and and like um yeah just produce the code I just like you know I just ask it like okay so where do I need to paste this code I don't even know like it's just like yeah place it here let's go I think for a lot of people it's uh it's hard to really fill the AGI and until you see the models do something better than humans can at a domain that you really care about and I think you know for go players and chess players that would have come you know a few years earlier and for a lot of us that like really value math and coding I think we're starting to feel that now our moms would be proud of us so are there any parts of this

What needed to be done

project anything that really needed to be done but you know people might not realize how important it is so I think building large scale reliable infrastructure for to run our biggest Flagship model training runs as well as doing research experiments is something that is not as exciting as doing research itself but has to be done and it's has a tremendous impact on the success on the entire project I think there is something special in open ey about how we structure our research G that we value algorithmic advancements in the same way as building reliable large scale systems and building data that are needed either way for training those models I'm really proud of openi in that way yeah I think that has been a consistent pattern throughout many of our big projects every time we scale a new thing up another order of magnitude we see another host of problems both algorithmic and infrastructure and we've definitely built a capacity to to um Advance them both with a lot of fuckus I feel the final model is just like literally a beautiful piece of art right in order to make it work you have to make sure that every step has work right you know we find some any Challenge and we solve it right I think that's really how opening ey operates and then very proud to work out here and I also must say there's like a really not only brilliant people are here but also kind hearted is just fun to me to work over here and I'm grateful to my colleagues to you know code with me par code with me hang out with me eat lunch with me like speak with the model with me so what's it like to work on the

Working on the strawberry team

strawberry team you can have your brilliant ideas but most of the time you spend on like running them and not running and failing and then it's very good to have people very close by in your office that you can ask for help with whatever failed last time because I mean most of the time you spend your time debugging things that didn't work so and having people who can help was speaking of this U help we had many times when we were trying to debug this for like a week and then passing by wend the and then like ask it and then like he just solved it right away he started calling it w the blessing and then blessing people and that has been uh really effective and I stopped like thinking about is this too stupid to ask and just ask right away well one of the things I like really appreciate about working at open a is that from every big project like this we like really learn right we we I think like from daa we learn importance of engineering from gp4 research and we keep iterating like this and uh the effect of that is that like now the strawberry team is again the best big uh research project team yet because it's built on all of the things we've learned from the previous projects and it really like you can like really see it working here like people really like started developing develop very good intuition like when do you hack something where do you like need to develop stronger fundamentals like when do you st overnight where do you actually like take a weekend off and like come with a fresh mind to this particular problem like I think I think like it's really amazing to observe this progress we make as a company yeah one thing I've liked is just how organic this project has felt the ideas have come literally from everywhere on this team and um people feel empowered to just like hey here's an idea I really believe in and it's the thing that I'm going to push and also people are just willing to get their hands dirty I feel like there have been a lot of you know deadlines some self-imposed but um we've all really come together and you know been willing to put in that work to make it happen this project really demonstrated the power of momentum where we get initial good results and more people get excit about particular field and particular research they try to contribute their new ideas those new ideas work even better and then the thing started snowballing and getting more and more momentum on itself or people just believing that this is the right thing to do and we should continue pushing this research related to that I think like we have a very like lots of very smart people but also like very opinionated people but people are always willing to like update their opinions once you see results to the contrary and I think that's like make things really fun it's kind of cool to be in that place that's a combination of like a brilliant scientists and engineers and folks who can like build out like incredible systems uh it's very humbling

Giving the model personality

so one thing I remember a few months ago I remember the model was very smart but it was also kind of boring so what was it like to give the model A personality yeah so that's interesting I remember I asked model about the meaning of life and it gave me an answer 42 which is not that bad of an answer uh and you know the kind of similarity when I asked model you know what is love it told me oh there like a strange human feeling and uh once we actually uh get gave model personality made it actually work with chat then the aners start being quite interesting that day and know I asked about the love it told me you know there's like a romantic love familial love self love nonconditional love conditional love and uh it became more useful and also more fun the funnest moment is that ask the exact same question and you tries to Define love with algebra and you ask them after question so what's the story of 01 mini how did that come to be um so the motivation is that we want to bring the ow series to broader audience which with like much lower cost so we created ow mini which was designed to be like a minimal demonstration of the whole 01 pipeline order framework um we make it uh stand reasoning specialist which may not necessarily know the birth dat of our favorite celebrity but really truly understands like how to do reasoning effectively right and truly has a lot of um intelligence the model is actually really smart right it's like much smarter than our previous best model for all and also almost um part right with our best model o1 uh but only comes with like a fraction of cost and latency um it does have the limitation that you may not know a lot of the knowledge in the outside world right which is not about science or technology but we try to make it roughly on par with our previous best mini model like 40 mini and we are working to improve it further so I'm super excited for our external users to just try it out for this like lightning experience of uh reasoning and thinking so what motivates you to do your research I just find it fascinating that in this world you have these things that can do intelligence and reasoning and they're much smaller than you think and they can do this in different ways it's just super fascinating good things in life take time and our models just tend to answer too quickly and eventually want to have models I can do for example research for months or years and I feel like this is the first step in the direction of models I can think very long for about one problem and now we're at the level of minutes and I think it's just the first step on a long path that hopefully takes us to models that can think for months or years as time goes by it feels very meaningful that uh that IE together with a small number of people can have a some substantial positive impact on the world and also it's uh fun day today it's just fun I like you know speaking to the computer I like starting a job on the cluster I very much enjoy uh collaboration it's just beautiful I really like our malls to be useful and I think technology has a chance and a promise to improve human life and I like our models to do like work for us to to help us with our day-to-day problems and giving them ability to reason H allows them to do for us things that they just couldn't be that will allow us to spend our time more productively yeah very excited about this I mean I think these sort of paradigms unlock things that these the models couldn't do before so it's not just like answering some sets of queries a little bit better but it's actually getting to a point where through planning through error correction it's able to just unlock like new capabilities and you know the ability to produce new knowledge in the world for like science for Discovery I think is one of the most exciting pieces for this and I think in some short amount of time it's going to become like a larger and larger contribution or contributor to its own like development and I think that's like a really exciting regime I think some of the people on the team we were um math or coding Olympiad participants in the past and there's this huge personal motivation to create a system that can beat us we do best and um I think a second thing really kind of Echoes the point that JT and leam made you know I do think reasoning is a much more powerful primitive than people give it credit for you know um when you think about kind of accomplishing tasks reliably really that fundamental primitive has to be reasoning you're going to hit bottlenecks and you're going to have to navigate your way around them so I'm really excited for that future I think AI researchers job is to find the way to put more compute in and Hardware people have been doing so good of a job that the cost has been going down exponentially for a very long time and we don't have much time to find another way to put in more compute and it's kind of like a weight on my shoulder is just getting uh larger and this new paradigm really finds a way to unload that for probably a long time is there anything else

Conclusion

you've observed as we've been going through this project for the whole time we've been doing it anything else that's worth calling out I think an interesting meta observation that we've had is uh every model that we train is like a little bit different it has its own quirks and it's almost like a artisanal cuz when you look at a model that can do so many different tasks um each model you train won't be exactly the same performance at each task so it might be better at some tasks and worse at others and so there's this uniqueness of like personality to every model that is almost like a little bit beautiful thank you and congrats on releasing this

Другие видео автора — OpenAI

Ctrl+V

Экстракт Знаний в Telegram

Экстракты и дистилляты из лучших YouTube-каналов — сразу после публикации.

Подписаться

Дайджест Экстрактов

Лучшие методички за неделю — каждый понедельник