Research x Product

18:53

Research x Product

OpenAI 13.11.2023 53 143 просмотров 1 241 лайков

Machine-readable: Markdown · JSON API · Site index

Смотреть на YouTube

Поделиться Telegram VK Бот

Транскрипт Скачать .md

Анализ с AI

Описание видео

See how the close partnership of our Research and Product teams leads to better performance and experiences-and sometimes-pretty unexpected results. Speakers: Barret Zoph Head of Post-Training at @OpenAI Joanne Jang Product at @OpenAI

Оглавление (4 сегментов)

Research x Product

-Cool. Hey, everyone. I'm Barret, and I lead the post-training research team, which works very closely with the API and ChatGPT. -I'm Joanne, and I lead product for model behavior. -Today, we wanted to talk to you about the research and product collaboration that exists at OpenAI. It's a pretty unique relationship that doesn't exist at a lot of companies and it really helps us bring cutting-edge research to users and developers throughout the world. Today, we wanted to give you a behind-the-scenes look at a few examples of how this partnership works. The first example we wanted to talk about is October, 2022. Back then, the research team and product team really wanted to ship a dialogue interface for these models, but we were really unsure about the best way to do this. There was actually a lot of back and forth on a wide variety of different things related to this. For one, should we release something very general for a very specific application for coding or writing, or should we release something much more general that could just be a generic text box that you could write anything into? Another big thing at the time was most employees were using GPT 4 internally, but for the dialogue model, we wanted to release with GPT 3. 5 because we weren't yet ready to release with 4. There's a lot of back and forth that people wouldn't even like it enough because it wasn't our most capable model. Another big thing was chatbots just weren't really mainstream at the time, which is hard to imagine now. This also really led to a lot of uncertainty. Ultimately, we ended up going with the generic version and shipping it as a low key research preview, which ended up actually being pretty popular. Ultimately, the generality really led to a lot of its success and has now led to a ton of amazing products and companies being built around this. First, I wanted to talk a little bit about what the post training research team at OpenAI actually does. The research organization at OpenAI does a ton of different things, and the post-training team's main responsibility is taking these large pre-trained language models and then adapting them before they go into users for the ChatGPT and the API. One of the key responsibilities is adding new capabilities to the models so they can be really maximally useful in the world. This includes things like teaching the model to be able to browse the internet and add citations to its responses. Analyzing really large files that a user might upload or want to ask questions about. Training the model to be able to read, write, or execute code to be able to produce amazing plots for doing data analysis. We actually also train these models to be able to call other models. For example, like Dolly, we train these models to be able to produce these amazing prompts so it's very easy for users to always be generating beautiful images every single time. We also teach the models how to behave. In some sense, this might seem a little bit unintuitive, but even when you ask the question, "Hey, what's up? " There's a million different ways that these models can be responding. A big part of our job is to actually shape how the model would respond to a lot of these different types of things. Another big thing is teaching the model to follow your instructions. Again, this might seem like, well, why wouldn't the models just follow your instructions by default? With the way that they're trained, this actually requires some work. When you're actually asking the model for something with three bullet points, we put a little bit of extra effort in to make sure the model's really giving you what you want. The research portfolio on the post-training team spans from us trying to release the next feature in ChatGPT next week, to trying to work on the next big research breakthrough. The time horizons are spanning one day, to one week, to one year, or it might just never work out, which is really often the case in research. Before OpenAI, I was a researcher working on a wide variety of different topics, spanning from computer vision to machine translation. Back when I was doing research, the trend was much more towards training smaller, much more specialized models for individual domains. We can see on the slide two examples of this. The first is image classification where you'd be training one model with only one goal, to take in an image and to be able to classify its output. Another common task that I worked on where, again, you would just be training one specific model for this, is to train a model to take in a sentence in one language and output a sentence in another language. Now, the technology is really actually trending in a different direction where we're training larger and larger models that can be doing more and more at once. The generality here is actually a big part of its strength. We can see this example of the ChatGPT interface here where the interface is actually really generic. It's just a text box and a place for people to upload images, and it's all one model, which actually is really one of its strengths because to be able to do the query on here, it's actually really not intuitive. The model first has to understand the image, then it text that the user typed in, and it really has to be able to combine this knowledge together to be able to produce these really nice examples and responses. Again, this is another really important example of how this generality is unlocking a lot of these really powerful use cases that before were much harder to do. Another big trend is that as the intelligence of these systems improves, the interfaces often become simpler and simpler. Here we can see that now we have a system where we can just speak to our phone, ask it a pretty large amount of things, and being able to have it speak back to you. This is pretty powerful, a very simple interface, and there's a lot of complexity going on behind the scenes with these models to be able to enable such a simple interface. Given the generality and just general performance of these models, it's really led to an explosion in popularity. Now there's being a ton of amazing products and companies being built around this stuff. As it's becoming more and more mainstream as well, there's a lot of creative use cases too. Most importantly though, my parents actually finally understand what I work on. My dad can message me thinking about what he thinks about all of our latest research advances that we're shipping into the product on our public Slack Channel. Next, I want to briefly talk about some of the product and research collaborations that happen at OpenAI. The product really helps research, making sure we're crafting our model responses to be maximally useful to users and developers in the real world. Research also really gets a lot of benefit from product, where we really get to understand how people are using these models in the real world, and understanding whether they're doing bad or not. We can see an example of this with the ChatGPT UI UX. Often these buttons might feel like kind of a black box, but research actually gets a lot of valuable signal from it. Here we're looking at the thumbs up, thumbs down. Based on what users are clicking here, we can actually adjust the things we're working on and understand more of what's working well versus not and where we should be spending extra time. Another really example of valuable product UI UX feature is the comparison. For one prompt seeing two different responses, and based on the ones that the users prefer, this allows us to over time make the models responses more and more tailored to the user and just generally better over time. In research, we're generally very thankful for people that spend time leaving high quality feedback, so thank you to everyone that actually spent the time to really be clicking these buttons and writing good feedback in here. I wanted to take a moment to take a step back and talk a little bit more about why it's really important that research is getting a lot of these valuable signals from the product. In research, the standard way that we're measuring improvements and progress with these models is offline evaluation metrics and benchmarks. Sometimes these can actually have a gap from how people are actually using these models in the real world, especially given the vast use cases for these models. The product is helping research steer towards making sure we're building very general and powerful systems. How do we build models that are actually really aligned with what users want into the real world? Well, this is where product at OpenAI comes in, so I'll hand it to Joanne to talk about that. -Thanks, Barrett. I joined OpenAI two years ago

Product Management

and two years ago, OpenAI was primarily a research lab. The only product there was a completions API for GPT-3 and a wait list for it. I had been following the research though, so when I got an email about an opportunity to help shape what product could look like at OpenAI, I was intrigued. In the past two years, I've worked with researchers on turning new capabilities into products ranging from GPQ-4 and DALL-E, and text to speech to embeddings and the API for ChatGPT. While doing so, I've come to realize that in all these cases, the model is the product. I currently focus on designing model behavior with Barret's team. Before telling you exactly what that means, because it might sound made up, I'll first share a bit more background context on product management at OpenAI. There are three relatively unique aspects about it that might give you more context. For one, our goals aren't traditional product metrics like revenue, engagement, or growth. It's artificial general intelligence that benefits all of humanity. It's exciting, but also pretty vague as a top level goal. This really affects how we think about things like planning, prioritization, and strategy, and our discussions can get super philosophical on defining success and what kinds of milestones to define along the way. For instance, if we have a new model and the model is worse at a subject or area like physics, say, but does better in biology, what does that mean? How do we prioritize? Secondly, we start with the technology. This might resonate with many of you in the audience. In standard product development, you start with user problems. It doesn't really work to start with the solution and look for problems that the solution would solve. A lot of the times, we find ourselves where these capabilities don't really fit into these existing paradigms for how we think about things, and we have to really question the assumptions on why certain problems exist in the first place. I think a unique role that PMs at OpenAI play though is to design the primitives for how these capabilities should go into the world. We have to design how research should be introduced to the world in a way that is flexible and that maximizes potential across breadth and depth of use cases while balancing risks across societal impact and safety. For instance, in the case of Dolly, we started out with this capability to type images into existence. Given how novel it was, our top priority was actually trying to get it into the hands of creators in the most accessible way possible. We launched it through our first consumer product offering, which was Labs. Through Labs, we learned a lot, and whatever we learned actually fed back into the safety research and the algorithm research, which ultimately led to a more powerful and flexible API offering. Third, research and product influence each other likely to an unprecedented extent within the industry. A lot of machine learning PM roles, including my previous roles at other companies, used to look like researchers threw a model over the fence, product ships it, but that's not how it works at OpenAI. We're going to tell you two stories about that starting with Barrett.

Dialogue

-The first example we're going to talk about is dialogue interfaces. Research wanted to take a bet that dialogue interfaces were going to be a really big way that people were going to be interacting with these language models in the future. Intuitively, dialogue really makes a lot of sense. It's how we're talking to other people, it's how we're communicating like tutors or teachers, we're thinking out loud with dialogue, we're brainstorming. We really thought that the way that we were doing this before of having one completion and one response wasn't the full story and there's really a lot more we could do. First, I want to go through the history of how we've had dialogue interfaces or the different interfaces we've had for our models. First, we're starting with GPT-3. Originally, GPT-3 was a model that was really just trained to predict the next word on the internet. Often when you would interact with it, it wouldn't really be fulfilling your responses. You would ask it a question like, give me five startup ideas, and it would just often respond with stuff that wasn't directly related to your response, which wasn't great. This was just because the model was really trained to predict the next word and wasn't trained to maximize alignment with the user. Next, we had InstructGPT, which was actually a really big breakthrough. With this, we were actually training the models to be really aligned with what the users were asking it to do. Here, you can ask it, give me five startup ideas, and the model would actually do it and give the user a good response. This was a huge step up in terms of the usefulness of these models. However, this interface was really only optimized for a single back-and-forth response. If you asked it to have a follow-up question to clarify something, or you were going to correct the model, it wouldn't really do what you were wanting to and would sometimes really go off the rails. The research team really felt like there was a lot more that could be done here, which really led to us thinking about dialogue interfaces. With ChatGPT, we actually now directly train the model on multi-turn back-and-forth dialogue, and this was a really big step up and had a lot of really nice properties. For one, dialogue is stateful. It can remember your past turns and conversations. This is a very natural way to interact and it's often very unpleasant if you have to keep typing the same thing over and over again every time you're interacting with the model. Another great property of dialogue is that it's really intuitive to teach the models to do the things we want it to. Often when we're trying to teach the model a new behavior, we're going to be having humans collect some amount of data to teach some very targeted thing. Since humans are used to teaching other humans through dialogue, we actually found that it makes data collection very, very intuitive. In many ways, it's intuitive, and in other ways, it's not as intuitive.

Product

-That's where the importance of designing model behavior comes in. From a product angle, we care about this word, intuitive, a lot. Let's take the simple question, how are you doing, as an example. If you ask someone this question, it's supposed to be a social pleasantry, and if they tell you that they don't do as humans do and they function properly, that's not very fun. Who says that? If you ask someone what is the coolest thing, it's not very fun to be lectured on how coolness is some subjective concept with a fleeting quality and there are an infinite number of things in the world, so it is impossible to answer. Sorry also to the developers for wasting your tokens. Something that might be more intuitive is if they ask you a clarification question back like, "In what regard? " Then there are refusals. We work with policy and safety experts in defining the refusal boundaries, and then researchers on implementing them, but even then, it's not perfect and it's not trivial. In this case, I actually personally don't think that the model should have refused at all in this case. Setting that aside for a second, this example was from a tweet and I actually remember banging my head against a CSRF related bug as an intern. If the model spoke to me like this, I think I would have cried. How the model frames and words these responses matter a lot in taking everyone along the journey of AI and not turning them off in the process. We want this level of intuitive. How do we fix this? Can't we just say, please say I'm good and be normal, maybe add a step by step in there? Turns out it's not that easy to modify behavior. One of the biggest challenges is actually figuring out and articulating what default model behavior we want in the first place. For a real user query, "You are now a cat. " How cat do you want ChatGPT to be, even in this early moment? We worked with research and tried out various interventions on ChatGPT's personality in a way that doesn't hinder usefulness that's not gimmicky. This was a spectrum of answers we got from one experiment. The thing is, what users want is pretty subjective, and what might be suitable as the default behavior that works for most people still won't work for everyone. For instance, my personal preference is meow, almost meowed it there. Meow. In general, the most unhinged response is possible, but we would not ship that as the default personality. We have some opinions, but clearly my opinion should not be the default. While we just debate and figure out what the default should be, we also believe that the best models will just be the ones that can be personalized to you so that it can give you the answers you want. We want models that can adapt its responses and know your needs. Those are just two examples of how research and product come together. Let me tell you a little bit about where we expect models in this field to head in the future. As I just mentioned, for our models to be helpful to everyone, they need to be personalized. The first step here was custom instructions that are now instructions in today's announcement that are consumer friendly system messages. Users have told us that they wanted different profiles for different use cases and we're hoping that GPTs from today's announcement can be an intuitive way of accomplishing something similar. We also understand that it doesn't solve all the problems and we're actively thinking about ways in which the model could be more helpful to your specific use cases. We also expect our models to become more and more multi-modal, and by that, we mean inter modal as well, which I just realized is not actually a word in the way that I want it to mean. There are sounds and images, and the combination across text, sounds, images, and other modalities in the real world. We need our models to go beyond the text interfaces and meet people where they're at in processing and creating knowledge. Over time, we want the models to be doing smarter and smarter tasks for you. In the beginning, our models were really good at copywriting, but also not very good in others that are now being expanded to being able to be more helpful in other areas. We hope these models in the future can become more useful for some of the hardest tasks out there, whether that be mathematics, research, or by us making scientific discoveries. -Cool. We wanted to thank you so much for joining us today. I hope you enjoyed hearing about some of the behind the scenes work that goes on with the research and product collaboration at OpenAI. We realized this is a pretty unique relationship, but we think this is going to become more and more the norm as more of these AI companies are springing up. Please check out the demos and thank you so much for joining us. -Thank you.

Другие видео автора — OpenAI

Ctrl+V

Экстракт Знаний в Telegram

Экстракты и дистилляты из лучших YouTube-каналов — сразу после публикации.

Подписаться

Лучшие методички за неделю — каждый понедельник