OpenAI DevDay 2024 | OpenAI Research

OpenAI · 17.12.2024 · 9,228 views · 161 likes

Video description
Building with o1

Table of contents (2 segments)

Segment 1 (00:00 - 05:00)

Hello, my name is Hyung Won. Today, together with Jason, I'm happy to share our thoughts on how you can build with o1. o1 is a reasoning model: we train it to think with reinforcement learning, and during training it learns, among other things, to refine its thinking strategies and to recognize and correct its mistakes. When o1 attempts a very difficult problem, it may not find a working strategy in one go, but trying a strategy, even an unsuccessful one, gives cues about what to try next. o1 does this repeatedly until it eventually reaches a better strategy, and so on. It's very patient; it's a very different type of model.

Last month, when we released o1-preview, we showed some examples of the actual chain of thought. In one example, the model is trying to decipher some ciphertext, and you can see several reasoning patterns. At one point the model says "Hmm," recognizing that its current thinking strategy isn't leading anywhere and that it should try something different. Later it tries something out and says "wait a minute," realizing there is a slightly better approach, and tries that next. Then it has a more concrete idea to try out, so it says "let's test this theory." After a while, the model arrives at a correct strategy and says "perfect."

This behavior is so different that we believe o1 represents a new paradigm, and a new paradigm changes so many things that we should look at it with a new perspective. So what really changes? A good starting point for thinking about this question is where we were, where we are now, and where we're heading. I invite you to think about questions like these: what just became possible with o1 that wasn't possible with the previous generation of models, and what will become possible with future versions of o1? The answers will obviously differ depending on the specific domain, but just asking these questions puts you in a mode of thinking where you're building with future models in mind, instead of assuming the current generation will stay as is.

You might say, "I'm not building o1 myself, so how do I know what future generations of o1 might look like?" Unlike previous paradigms, the o1 paradigm is much simpler in this respect: it's a reasoning model, so its reasoning will just get better, meaning it will think better at pretty much anything that requires thinking. So when you're building something, it's useful to consider questions like: what would you want to build if reasoning were 50% better than it is now? What would you do differently? And, maybe more importantly, what would you not want to build if reasoning were 50% better? We have seen many cases where, as models get generally smarter, problems we used to think were difficult simply become tractable. If we believe reasoning will keep getting better, we should also think about which problems not to solve ourselves. I have been working with this new paradigm for a while, and I still find it hard because I'm so used to the previous one, so I think this exercise is really useful. I hope this sparks some interest in thinking about how to build with this new reasoning paradigm. With that, I'll pass it on to Jason.

Thanks, Hyung Won. I wanted to talk a little more specifically about a few evals that we showed in the blog post, which might guide you on when to use o1 and o1-preview compared to GPT-4o. One of the best use cases for models in the o1 paradigm is extremely hard math and code problems. In these bar charts we have AIME, which is competition math, on the left and Codeforces on the right, and

Segment 2 (05:00 - 09:00)

there are three bars: GPT-4o, o1-preview, and o1. The thing to note is that GPT-4o barely solves a few questions on these benchmarks, whereas o1-preview can solve more than half, and o1 can solve the majority of the problems in the dataset. So the point is that there is a subset of tasks where GPT-4o really struggles and o1 can solve most of the problems.

Here is a broader evaluation suite that we published in the blog post, and I'll highlight a few things. First, on some of the math benchmarks, such as MATH (the Hendrycks et al. benchmark), physics, college math, and the LSAT, there is a huge performance gain when you use o1-preview compared to GPT-4o. Conversely, you don't get a huge performance gain on every task: on tasks like AP English Language, AP English Literature, the SAT, and public relations, we don't actually see o1-preview doing much better than GPT-4o.

So I made a table that summarizes when you might want to use models from the o1 paradigm versus GPT-4o. The pros of o1-preview and o1: they're the right choice if your prompts are extremely challenging and in the domains of science, math, and coding, or if you don't care about any other constraints and just want the best answer, in which case o1 will likely be the most performant model. The obvious cons are that, as Hyung Won mentioned, o1-preview and o1 require time to think, so they are a lot more expensive and have much higher latency. GPT-4o, meanwhile, is still great for the majority of use cases people currently use the API for, and it's less expensive and lower latency than o1-preview and o1; its con is that it's weaker than o1 on prompts that require reasoning or strong coding or math.

Then there's the question of when you should use o1-preview versus o1-mini. This plot shows inference cost on the x-axis against performance on AIME (competition math) on the y-axis for a few of these models. Interestingly, o1-mini is actually strictly better than o1-preview here, because we really specialized o1-mini to be a fast but performant model on things like math and coding. I would say you should use o1-mini if you're doing math or coding, or if you want the answer more quickly or cheaply; in other cases, o1-preview is a good choice.

Finally, I wanted to highlight a few use cases for o1-preview and o1-mini in the API. I like this one from the OpenAI Cookbook: medical inaccuracy detection. You're given a bunch of patient information and a diagnosis, and o1-preview tries to detect whether the diagnosis is correct. Coding is another great example of o1-preview shining; for use cases like Cursor, I think o1-preview would do a great job. Hard-sciences research is another use case that o1-preview is particularly strong at, and we've also heard that these models have been good as a brainstorming partner on math problems or on legal-domain reasoning. I'll end here. Enjoy using o1-preview and o1-mini, thank you.
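The model-selection criteria Jason walks through can be summarized as a small routing helper. This is an illustrative sketch, not code from the talk: the task categories, thresholds, and the function itself are my own assumptions, while the model identifiers ("o1", "o1-preview", "o1-mini", "gpt-4o") match the API names discussed.

```python
# Hypothetical model-routing helper based on the pros/cons discussed in
# the talk. The task taxonomy below is an illustrative assumption, not
# an official OpenAI recommendation.

REASONING_HEAVY = {"math", "coding", "science"}

def pick_model(task_type: str, need_best_answer: bool = False,
               latency_sensitive: bool = False) -> str:
    """Return an API model name for a task.

    task_type: rough category, e.g. "math", "coding", "science", "chat".
    need_best_answer: ignore cost/latency and maximize quality.
    latency_sensitive: prefer a fast, cheap response.
    """
    if task_type in {"math", "coding"}:
        # o1-mini is specialized for math/coding: cheaper and faster
        # than o1-preview while scoring higher on benchmarks like AIME.
        return "o1" if need_best_answer else "o1-mini"
    if task_type in REASONING_HEAVY and not latency_sensitive:
        # Hard science problems are where reasoning models shine.
        return "o1" if need_best_answer else "o1-preview"
    # Everything else: GPT-4o is less expensive, lower latency,
    # and strong enough for the majority of API use cases.
    return "gpt-4o"
```

For example, `pick_model("math")` routes to o1-mini, while a generic chat task falls through to GPT-4o.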
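The medical-inaccuracy-detection use case can be sketched as a prompt builder plus an API call. The prompt wording and the helper name `build_diagnosis_check_messages` are hypothetical illustrations, not the OpenAI Cookbook's actual code; the single-user-message shape reflects o1-preview's restrictions at launch (no system messages).

```python
# Sketch of the medical-inaccuracy-detection pattern: give the model the
# case information plus a proposed diagnosis and ask it to judge
# correctness. Prompt text and function name are illustrative assumptions.

def build_diagnosis_check_messages(case_info: str, diagnosis: str) -> list:
    """Build a chat 'messages' list asking o1-preview to verify a diagnosis."""
    prompt = (
        "You are reviewing a medical record for inaccuracies.\n\n"
        f"Patient information:\n{case_info}\n\n"
        f"Proposed diagnosis: {diagnosis}\n\n"
        "Is this diagnosis consistent with the information above? "
        "Answer 'correct' or 'incorrect' and explain your reasoning."
    )
    # At launch, o1-preview accepted only user-role messages (no system
    # message), so everything goes into a single user turn.
    return [{"role": "user", "content": prompt}]

# Actual call (requires an API key; shown for context only):
# from openai import OpenAI
# client = OpenAI()
# resp = client.chat.completions.create(
#     model="o1-preview",
#     messages=build_diagnosis_check_messages(info, dx),
# )
# print(resp.choices[0].message.content)
```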
