# OpenAI DevDay 2024 | Fireside chat with Olivier Godement and Mark Chen

## Metadata

- **Channel:** OpenAI
- **YouTube:** https://www.youtube.com/watch?v=Mkf1JA8_RkQ
- **Date:** December 17, 2024
- **Duration:** 30:20
- **Views:** 7,425

## Description

Fireside chat with OpenAI Leaders

## Contents

### [0:00](https://www.youtube.com/watch?v=Mkf1JA8_RkQ) Introduction

Hello again. Last session of the day, a fireside chat. You know me, so let me introduce Mark. Mark is our head of research at OpenAI, looking after the research effort, including model development, and Mark was also the lead on o1. He's been at OpenAI for quite a few years. Do you want to say a few words about yourself? Sure, yeah. It's nice to meet you all. I've been at OpenAI for over six years now, and it's crazy just having it go from a 20-person startup to really this kind of international corporation. It's great to be back in Singapore. I was here about a year and a half ago, met with a lot of the people on the ground, and I feel like the technical depth of the people we met with was beyond anywhere else that we visited. So yeah, really, you guys are in a great spot. I agree. In every single meeting for me, today and yesterday, I was pretty impressed by the depth, for sure. Awesome. We have 30 minutes, and I'm

### [1:05](https://www.youtube.com/watch?v=Mkf1JA8_RkQ&t=65s) Singapore leading in AI

excited. I don't get to put Mark on the hot seat with hard questions every day, but here we are. You guys left a bunch of good questions in the box, so we'll try to get to as many as we can. Let's start. Actually, starting with being in Singapore for the event: how do you see Singapore leading in AI? Yeah, I mean, it's really just how deep the technical depth is that you all have. I remember coming here and meeting the former prime minister, and I gave him a coding demo and learned that he actually codes himself, and he's done quite a bit of that. We found that pattern throughout, with all the people in government and in business: they just knew the technical details in and out. So that's the thing that makes me most excited. I have exactly the

### [1:55](https://www.youtube.com/watch?v=Mkf1JA8_RkQ&t=115s) Reinforcement learning

same anecdote. I was in a meeting with a regulatory agency this morning. I was prepared to talk about regulation at a high level, you know, how does ChatGPT work, and we started talking about reinforcement learning from the get-go, how does it work. The level of depth, and pragmatism I would say, is pretty high, for sure. When was the last time something in AI research surprised you, as in a "wow, this is sci-fi becoming real" kind of way? Ah, good question. You're starting with the spicy ones. Yeah, I think for me, I got my start in AI research in image generation, and I just feel like visual stuff is always very visceral to me, so it's very compelling: you can see it immediately, you don't have to read a paragraph or something. So the whole wave of image generation and video generation improvements, that's just been wow. Yeah, for

### [2:55](https://www.youtube.com/watch?v=Mkf1JA8_RkQ&t=175s) How close are we to AGI

me that would be speech-to-speech: the first time having a natural-sounding conversation with an AI for a few minutes, and just seeing that it's actually very intuitive, there is nothing weird about it, just what I would expect. That was my biggest mind-blown moment of the past few months. Yeah, one thing to add on that: I used to be a competitive programmer, and one other big moment was just watching these models slowly catch up to, and even get better than, me, which is very scary. He's pretty good, and that's what makes it even more scary. Okay, very easy question, Mark: how close are we to AGI? Yeah, that's a very hard question to answer, just because people define AGI so differently. I think when you look at it from an economic lens, you already see our products providing a lot of economic value: clearly OpenAI is one of the most valuable tech companies, and we are providing billions of dollars of value to real users today. I think there's a different definition too: are we able to perform well on benchmarks that capture intelligence, or the ability to do general tasks? It's interesting to see that just two years ago the frontier tasks for AIs were grade-school math problems, and today they're the hardest PhD problems. So I do think we're in the regime where these models are able to solve some of the most difficult exams that humanity has crafted. How do we benchmark models once

### [4:29](https://www.youtube.com/watch?v=Mkf1JA8_RkQ&t=269s) Benchmarking models

they solve PhD-level problems? What's next? Yeah, that's a very good question. I do think increasingly we have to lean on utility. We have a product out there, we're trying to provide utility and value to the world, and when you saturate all the benchmarks, what you're really looking at is: do I provide value to the end users? That makes sense. And, sorry, I'm drilling down: how has your mental model of benchmarks versus vibes evolved over the past few years of doing research? What do you mean by that? Benchmarks versus vibes: a qualitative sense, like you use the model and it just feels smarter, it feels better, versus having something quantitative. Yeah, I do think they're actually quite highly correlated, and really AGI is kind of a conversation: you put a model in the world, and then someone says, "well, I know it beats the benchmarks today, but it's not quite capturing what I feel an AGI is." So then they contribute a benchmark to you where you fail, and then you create another version of the model. So I do think this is an iterative process, and it's very highly correlated with vibes. That makes sense.

### [5:42](https://www.youtube.com/watch?v=Mkf1JA8_RkQ&t=342s) Safety improvements

Switching to safety: what is the most exciting safety development for you in the past year? Also a very good question. I actually think o1 is probably one of the biggest safety improvements that we've made in the last year, and it's surprising, because oftentimes this is framed as a capabilities improvement, but it's fundamentally a safety improvement as well. What I mean by that is, you can imagine trying to jailbreak a model, for instance, and you can think of the old GPT systems as having to respond immediately, so maybe they're more susceptible to being tricked. But when you have a reasoner, the model can reflect on "hey, is this prompt trying to get me to do something that's not consistent with what I'm trying to do, or what I should be doing?" That extra amount of time it gets to think and reflect makes it more robust to a lot of these safety attacks. Was that something that you and your team expected? Yeah, it very much was. When we think about reasoning, it's something that's very broad-based: it's not just something that you use for math or for coding, it's something that's very transferable. The same reasoning that you may use to do well in coding may be applicable to how you negotiate, or how you play a very difficult game. Yeah,

### [7:00](https://www.youtube.com/watch?v=Mkf1JA8_RkQ&t=420s) Benchmarks

that makes sense. And talking about benchmarks: do you observe the same challenge on safety benchmarks? Yeah, so I think the good thing is that safety kind of mimics this adversarial-attack framework, where the attacks are fairly strong, so I do think we have a ways to go there. I will not claim our models are perfectly robust, and I think there's a very rich set of tasks there to improve on. Makes sense. What

### [7:28](https://www.youtube.com/watch?v=Mkf1JA8_RkQ&t=448s) Key enablers

are the key enablers to move us from level one today to level five of super AI? Do you want to remind people what levels one through five are? Yeah, so essentially we put out a framework at OpenAI over the last couple of months where we define levels of AGI, progressing from basic reasoners to more agentic systems, to models that can take actions in the world, and eventually to fully autonomous systems. I do think robustness and reasoning are really the key there. I think the reason you can't rely on a lot of the agentic systems today is that they're just not reliable enough, and I think that's also what underlies our bet on reasoning: we've invested so heavily because we think that reasoning is what's going to drive reliability and robustness in the future. Makes sense. Would you say that we've achieved level two already, or are we getting there? Yeah, I do think we are moving from one to two, moving towards more agentic systems. Again, today a lot of the agentic systems do require human supervision, but increasingly you can be more hands-off: you let the model do its thing, and we're building more trust in AI systems. Awesome. Let's double-click on

### [8:47](https://www.youtube.com/watch?v=Mkf1JA8_RkQ&t=527s) Synthetic data

synthetic data. First, could you remind people what it is, and then: is there any good practice for generating synthetic data to train models? Yeah, very good question as well; you can really tell the technical breadth of this crowd. So synthetic data is data that's not produced by a human, but produced by a model, and oftentimes you see the power of synthetic data in data-poor or low-data-quality settings. One example where we've leveraged synthetic data is in training things like DALL-E 3. If you look at the DALL-E 3 paper, one core problem of training image generation models is that when you look at captioned images on the internet, there typically is low linkage between the caption and the image it's describing. You could have a photo of a hot air balloon, and the caption, instead of describing the balloon, is more like "oh, this was the best vacation I've ever had." In those cases you can really leverage synthetic data: you train something that's able to produce a high-fidelity caption for your image, and you go and regenerate captions for your entire dataset. We've shown approaches like this to work very well, and you can think of other areas in which you have poor quality on one side of your dataset. That makes sense.
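The recaptioning approach Mark describes can be sketched in a few lines. This is a toy illustration of the idea, not the DALL-E 3 pipeline: `caption_model`, the `keep_ratio` knob, and the data are all made up for the example.

```python
import random

# Toy sketch of recaptioning: replace weak web captions with captions
# produced by a captioning model. `caption_model` is a hypothetical
# stand-in; a real captioner would look at the image pixels.

def caption_model(image_id: str) -> str:
    return f"a detailed description of image {image_id}"

def recaption_dataset(dataset, keep_ratio=0.05):
    """Regenerate the caption for every (image_id, caption) pair,
    keeping a small fraction of the originals so human style survives."""
    out = []
    for image_id, web_caption in dataset:
        if random.random() < keep_ratio:
            out.append((image_id, web_caption))              # keep original
        else:
            out.append((image_id, caption_model(image_id)))  # synthetic
    return out

dataset = [("img_001", "best vacation ever!"), ("img_002", "lol")]
print(recaption_dataset(dataset, keep_ratio=0.0))
```

A knob like `keep_ratio` gestures at the blending of synthetic and original captions that such pipelines typically use, so the model still sees some human-written text.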

### [10:12](https://www.youtube.com/watch?v=Mkf1JA8_RkQ&t=612s) AI hitting a wall

Maybe related: is AI hitting a wall? Yeah, that's a very relevant question today. You might have seen some articles recently saying that a lot of the large foundation labs are hitting some pre-training walls, and even Ilya has come out and said explicitly that we may be hitting some walls in terms of pre-training. But our perspective internally is that we have two live paradigms today; it's an even richer environment than we had in the past. There's this test-time scaling paradigm that we've explored with the o series of models, and it's really taking off; I don't see the same barriers there to scaling reasoning models. And, you know, I've been at OpenAI since GPT-1, and every time you go from one to two to three to four, there are technical challenges, often multiple technical challenges. I do think even in the four-plus world we have a handle on exactly what technical challenges we need to solve, and they're very concrete; I don't think there's anything that we really just don't have a handle on. That makes sense. Yeah, something we

### [11:24](https://www.youtube.com/watch?v=Mkf1JA8_RkQ&t=684s) Is OpenAI still committed to research

like to say at OpenAI internally is that the reasoning paradigm is currently at the GPT-2 level of maturity: some product-market fit, but so much room for improvement, and then we see quite a bit in the applications and products that people are building. Oh, spicy: is OpenAI still as committed to research and safety as it was in the early days? 100%, yeah. You know, I lead the research team and manage a very large portfolio of research projects, and of course I think at a high level all the time about how much I should be allocating resources and manpower to exploratory research versus immediate short-term wins. I do think, as a matter of principle, we allocate more resources to exploratory research, and we have a different style than the other labs. When you look at the other big foundation labs, because they have so many good researchers, their style is to have them all do undirected research projects and just let them loose: whatever they want to work on, they work on. For us, we're a smaller lab, we have to be a little bit more directed, so we pick specific exploratory bets that we have high conviction in, and then within those areas we let the researchers really have a lot of freedom. So I think this gets the best of both worlds: we don't have aimless exploration, we have directed exploration, and we leverage our small size well.

### [13:00](https://www.youtube.com/watch?v=Mkf1JA8_RkQ&t=780s) What model capability do you use

That makes sense. What model capability do you use the most personally, and what model capability are you most excited to see others use creatively? Yeah, one of my recent favorites has been our search models. They're actually very useful. I really don't like the process of searching for information: you have to wade through so many links, and the links today are really bloated with ads and content that's not relevant. So I use that a lot to do information gathering on any kind of subject. I use it for learning a lot too: whenever there's some topic that I'm not familiar with, ChatGPT today is my default. And, you know, I'm a researcher by trade; having stepped into this role, I've had to learn a lot more about the business and different parts of OpenAI, and I've actually found ChatGPT to be a really tremendous resource there. Actually, I'm curious for your take on this, Olivier: what do you use mostly? My most surprising use case of the

### [14:07](https://www.youtube.com/watch?v=Mkf1JA8_RkQ&t=847s) Olivier's most surprising use case

past few weeks has been brainstorming with o1. It took me a few weeks, actually, to recalibrate on what sort of queries I could send to ChatGPT, and o1, versus 4o, has brought a new level of depth. It finally feels like a sparring partner who is truly engaging with the ideas, versus just commenting on them. I was doing a bunch of product strategy work in the past few weeks, and it felt like a real entity to engage with, so yeah, that's pretty incredible. Oh yeah, I love that; I feel like o1 for this kind of strategic planning is amazing. And in strategy, usually, there are no stupid questions, but of course I'm not going to ask people the most basic ones, and having that sort of neutral yet engaging thinking partner has been stunning. To the contrary, I ask a lot of stupid questions, and that's why I love ChatGPT for stupid questions. Someone gave me a good question: would you prefer someone to take a look at your search history or your ChatGPT history? Yeah, I cannot show the researchers... exactly, at that point it's locked, it's mine. Let's talk about o1: how did you come up with the intuition about reasoning and o1? So again, this was a collective effort, but also one that we've run for a very long time. Remember the point I made about having focused exploratory bets: this was one of those bets, more than two years ago. We had conviction that the models were lacking in certain ways: they feel incredibly smart, but somehow there's something about them that doesn't feel quite like AGI. Back then we had a hypothesis that it's really that the model has to respond right away, and when you think about a human, if you put them on the spot and ask them to respond right away, they're not going to give you the best answers. A human is going to think for a variable amount of time depending on what you're asking; sometimes a human might say, "hey, I need to think for a while," or "I'll get back to you tomorrow, I need to do some research." Really, we felt one missing thing was bridging the gap between system-one and system-two thinking: the fast thinking is there, the knowledge is there, but the slow thinking is not. That was the core hypothesis we had. We actually had a lot of different bets aimed at addressing this core challenge, and watching o1 take off has been a really fantastic process. It was a group of very exploratory researchers; they got some small signs of life, and when we got those signs of life, we just organized around them: we put together research teams, we scaled the projects, we put big data-generation efforts behind it, big scaling efforts, big infrastructure efforts. And yeah, I think we were able to fulfill the promise of that initial

### [17:06](https://www.youtube.com/watch?v=Mkf1JA8_RkQ&t=1026s) How long it took to get conviction

ambition. How long did it take to get conviction on those signs of life? Yeah, I think that's always the hardest part of research. In the beginning, especially when you're working on something that feels a little bit more like a moonshot project, you're going to have a lot of misses, and I think it's really about protecting the researchers who are doing that. If you believe strongly in a direction, it's a matter of time before it works out, so you just have to let the researchers try all the different approaches. Certainly there were stretches of three, four months where it didn't feel like we were making much meaningful progress, but then eventually someone makes a big breakthrough, and you're like, okay, this gives us enough energy to put more resources in and push a little bit further. That's really the joy of managing this large portfolio of research projects. It's been a couple of months

### [17:56](https://www.youtube.com/watch?v=Mkf1JA8_RkQ&t=1076s) What are the biggest surprises

since you launched o1. What are the biggest surprises or learnings about how people have been using it? Yeah, so we've already engaged with a lot of external partners, and one cool thing is that they've found it strictly better than, for instance, using a fine-tuning approach. A lot of people find that it's less able to get tripped up on hard questions, and there have been a lot of applications outside the domains of math and science that we focused on. It's just cool to see the reasoning really generalize to those kinds of things. When you look at medical domains, we've seen partners put in a list of symptoms, and anti-symptoms, and ask what disease is consistent with all of this, and the model can do this a lot better than 4o, because it involves forming hypotheses, invalidating them, and forming new hypotheses. So we're seeing a lot of benefits even in places that we didn't explicitly focus much attention on. That was a

### [19:06](https://www.youtube.com/watch?v=Mkf1JA8_RkQ&t=1146s) How quickly can we get o models to everyone

big surprise as well. We knew the model was good at math, science, and coding, and that's what we were focused on testing, but seeing the model work pretty well on legal reasoning, which on paper seems pretty far away but is really logic, has been amazing. And I'm sure there are many other topics, dimensions, and areas out there that we haven't fully tested yet. How quickly do you think we can get o models to everyone, in terms of custom models? I think it's funny, because at the moment, when people customize models, fine-tune models, it's mostly superficial: for 4o it's mostly style, tone, formatting. I think o1 is a very different flavor of fine-tuning; it feels much more like an expert model that goes really deep on a specific task, so I expect the types of use cases built on top of o1 fine-tuning will be very different. I think it's a matter of a month or two, that sort of timeframe.
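The customization Olivier contrasts here starts, in practice, with preparing training examples. Below is a minimal sketch of the chat-style JSONL format that OpenAI's fine-tuning endpoints accept: one JSON object per line, each containing a list of messages. The file name and the legal-review example content are made up, and a real dataset would need many more examples.

```python
import json

# Build a tiny JSONL training file in the chat fine-tuning format.
examples = [
    {"messages": [
        {"role": "system", "content": "You are a contract-review assistant."},
        {"role": "user", "content": "Flag the risky clause in this draft."},
        {"role": "assistant", "content": "The indemnification clause is one-sided because it covers only one party."},
    ]},
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Sanity check: every line parses back, with the expected roles in order.
with open("train.jsonl") as f:
    lines = [json.loads(line) for line in f]
print(len(lines), [m["role"] for m in lines[0]["messages"]])
```

The file would then be uploaded and referenced when creating a fine-tuning job; the point of the sketch is just the per-line message structure.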

### [20:11](https://www.youtube.com/watch?v=Mkf1JA8_RkQ&t=1211s) Biggest challenge for startups using AI

What's the next biggest challenge for a startup using AI as a core feature? Biggest challenge, sorry, for a startup using AI as a feature. Well, actually, I think it's quite a good time to be building startups on top of AI. When you look at foundation model players, we focus on generality, and really, at a company like OpenAI, it's impossible for us to go into every single vertical. I think there's just so much room and so much work you can do tailoring a model to be very functional in a particular domain, and I do think you're seeing that today: you see a rich ecosystem of startups building on top of OpenAI for different types of applications. I'm curious what you think; I think this is a great question for you as well.

### [21:05](https://www.youtube.com/watch?v=Mkf1JA8_RkQ&t=1265s) Why startups win

I think usually the reason why startups win is that they know, and have conviction on, a secret that the rest of the market does not. And I feel like with AI, essentially you're building on a tech stack which is constantly moving: you cannot predict when the next model will come out, or what the next capabilities unlocked will be. So I feel the startups that do best are usually the ones building right at the edge: stuff which is barely working, but where there is a sign of life, a bit like research. And when the next generation, o1, o2, comes out, it makes the feature just more reliable. So yeah, easier said than done, but I feel that's a pretty good recipe to build something really,

### [21:50](https://www.youtube.com/watch?v=Mkf1JA8_RkQ&t=1310s) Prompt caching

really cool. Do you have plans to expand prompt caching? Maybe I can take that one. So prompt caching is a really cool feature that we shipped about a month ago. Essentially, what we do is cache the most recent input tokens, which saves on latency, since you don't have to go through the whole GPU computation again, and it saves on cost. It's seen massive adoption, massive usage, and love, so we'll keep investing in it. I think prompt caching is becoming more and more important because of longer and longer context windows. If you want to move to a world where applications are truly agentic, you will need to pass a lot of context: previous interactions with the users, and a bunch of other context, so you have to find ways to optimize the cost. The dimensions we are most interested in for prompt caching, frankly, are making it even more cost-efficient, even more discounted, and having longer cache windows; at the moment it really depends on many parameters, but we're trying to expand that. And something that we want to keep as an invariant is that we decided, as a design principle, to make it automatic by default: you don't have to pass any parameter or opt in to prompt caching, we just cache and discount. I think we'll double down on that; people have loved it, and I think that was the right choice.
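The mechanics Olivier sketches can be illustrated with a toy prefix cache. This is a simplified model of the principle (reusing computation for a previously seen prefix of the prompt), not OpenAI's implementation; the real feature is automatic, applies only beyond a minimum prompt length, and discounts cached input tokens rather than counting them.

```python
# Toy model of prefix-based prompt caching: computation can be reused
# for the longest previously seen prefix of the token sequence.

class PrefixCache:
    def __init__(self):
        self.cached_prefixes = set()

    def run(self, tokens):
        """Return (cached_count, fresh_count) for this request, and
        remember every prefix of the request for next time."""
        hit = 0
        for i in range(len(tokens), 0, -1):
            if tuple(tokens[:i]) in self.cached_prefixes:
                hit = i
                break
        for i in range(1, len(tokens) + 1):
            self.cached_prefixes.add(tuple(tokens[:i]))
        return hit, len(tokens) - hit

cache = PrefixCache()
system = ["you", "are", "a", "helpful", "assistant"]
print(cache.run(system + ["question", "one"]))      # → (0, 7): cold cache
print(cache.run(system + ["another", "question"]))  # → (5, 2): prefix reused
```

The practical takeaway matches the conversation: put stable content (system instructions, shared context) at the front of the prompt, so that successive requests share the longest possible prefix.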

### [23:19](https://www.youtube.com/watch?v=Mkf1JA8_RkQ&t=1399s) One AI breakthrough

Oh wow: if you could time travel 10 years into the future, what's one AI breakthrough you would be thrilled to see? I do think, if you ask most people at OpenAI, 10 years from now we have AGI, even in a fairly strong form, so it really unlocks the full potential. You can imagine one person, in a week, creating a mega startup that delivers immense value to everyone. The ability for one person to leverage themselves and create impact just becomes wild, and I do think the first domains where you will see this kind of improvement will be software. What do you think? I love the "one person, massive impact, in a few days" thing, and I think you could expand it beyond business. I kind of love the nostalgic idea that in the 17th century scientists were in a room just doing something like revolutionizing physics, and I wonder if we could get back to that vibe: one person being able to make some massive scientific discovery in medicine, or physics, or computer science, but of course assisted so much by AI. That would be

### [24:39](https://www.youtube.com/watch?v=Mkf1JA8_RkQ&t=1479s) Interdisciplinary collaboration

insane. How do you see interdisciplinary collaboration, for example with sociologists, impacting AI research? Yeah, so I do think increasingly with the o1 models we are reaching out to external experts and partners. We've done a lot of collaborations with very famous mathematicians like Terence Tao, we've done collaborations with people in national labs in the United States, and we're seeing how they use the models; honestly, these are the people today who are telling us they're getting a lot of impact from the models. So I do think these external collaborations are things we're leaning into more and more. I also think, specifically towards the sociology angle, we want to define AI policy as a conversation with external experts as well. Safety isn't something that you decide internally in isolation; it's something you really have to engage the public with, to figure out the right way to tailor models for any particular group. I agree,

### [25:41](https://www.youtube.com/watch?v=Mkf1JA8_RkQ&t=1541s) Models and values

that's one of my biggest learnings of the past couple of years: clearly, models encode values; at this point that's clear, and people are using them more and more. If you fast-forward a few years and people are using AI five hours a day, six hours a day, at that point you have a huge responsibility, and it's clearly not one company, or frankly even one country, that is going to impose those values top-down. So we need to find a way, essentially, both to decide how to get there and to build mechanisms for people and communities to declare their values. That was a pretty big insight.

### [26:16](https://www.youtube.com/watch?v=Mkf1JA8_RkQ&t=1576s) Should you go long on coding

Yeah. A friend asked me if they should go long on coding: "I'm not sure, because as a senior engineer I notice my job is changing from coding to reviewing code from AI" (I feel seen) "so I feel industry-level coding is going away; what is your advice?" Yeah, so I actually wouldn't advise people today to just stop learning coding. I think fundamentally, when you learn coding, you're actually learning problem solving and general reasoning, and this is a skill that allows you to be robust to a lot of changes: you'll make more rational decisions, more principled decisions. And even when it comes to AI, I think there's a lot of value in people who are able to understand the internals of what is going on, because that allows you to better use these tools. So I do think, for a long time, the people who really deeply understand coding and machine learning are going to be the ones steering AI. Would you learn

### [27:12](https://www.youtube.com/watch?v=Mkf1JA8_RkQ&t=1632s) How would you learn coding

coding in a different way, you know, if you were 18, or whatever, 15 today? How would you do it? Oh, I'm so jealous of the people who are 15 or 18 today. Back when I was growing up, the internet was just coming up, and it was very hard to figure out how to get the right resources to learn something. Today you just ask ChatGPT. It's almost like a tailored curriculum for yourself: if you don't understand something immediately, someone helps you dig in more. One study that I've always really loved found that people learn in very different patterns: some people learn very linearly, you give them a curriculum and they learn at a particular rate; some people get stuck for a while, then have a breakthrough, then get stuck at a new thing. I do think this just really unlocks personalized learning, and today I do so much of my learning that way instead. Yeah, all right,

### [28:11](https://www.youtube.com/watch?v=Mkf1JA8_RkQ&t=1691s) One thing people would not guess about working at OpenAI

we have time for one last question: what is one thing people would not guess about working at OpenAI, maybe a team quirk or a tradition? Yeah, OpenAI is a very quirky place, of course. I actually think one thing is that even though we're here developing AI, OpenAI is a very human place to work. People are very kind; almost everyone goes out of their way to help other researchers onboard and get familiar with the tools. It's a very driven place; people feel empowered, and we want people to feel empowered to seek impact and make impact. It's also a very fluid place: we don't tell researchers, "hey, you're going to work on this thing"; they figure out with us what they want to work on, and I think that's very powerful, because researchers are motivated by what excites them, and you can't make a breakthrough if you're not deeply excited about the thing you're doing. So I just think there's this culture of excitement about pushing the field forward, helping each other out, and being a kind place to work. What do you say?

### [29:19](https://www.youtube.com/watch?v=Mkf1JA8_RkQ&t=1759s) What do you think about OpenAI's mission

I feel the same on the product and engineering side. It's the first time I feel it in a company: the mission is very clear, everyone is aligned, but the how is pretty much left to the teams. The way I interpret it, it's essentially the research lab roots: you empower people, you give them space and agency, and they will surprise you with the outcomes. So whenever we decide what product to roll out next, or what to do in ChatGPT, it's not like Sam is telling us every week, "hey, move that pixel to the left," et cetera; it's about trusting us. So that would be one. Second, I would double down on the humility and kindness. Especially because the OpenAI mission is so grandiose, I feel people think that's what we talk about every day, but people on a daily basis are just so approachable and humble. I wouldn't trade it for any other place. Yeah, all right. Thank you so much. Thank you. Thank you.

---
*Source: https://ekstraktznaniy.ru/video/11405*