Why does ChatGPT sound the way it does? Go behind the scenes to see how the style of a model emerges, the trade-offs researchers weigh, and what the future of tone and voice in human-AI interaction could look like.
Table of contents (4 segments)
Segment 1 (00:00 - 05:00)
Hello everyone. I'm Laurentia, and I work on model behavior. Today I'm excited to talk to you about the science of AI style. But before we get into that, a little bit about me. I'm actually a librarian by trade. I went to library school after learning that librarians work at Google on their search team. Librarians help people access information, and I wanted to do that on the internet. It's kind of cool. I still get to do that today, but grad school was ages ago. Since then, I've had a couple of short stints at Restoration Hardware and Apple before a much longer stint at Instacart; I was there for 10 years. Today I work in research at OpenAI, and I get to talk to all of you. Thanks for coming. I work with teams across OpenAI to improve model behavior. I've spent my career helping people find what they're looking for, and to this day I love asking questions about information: what it is, how people access it, how people give it meaning, what makes it useful, and critically, what it feels like to grapple with information. Like I said, you have to go to grad school to be a librarian. Someone wanted to see me from my early days, so there you go. I've come a long way, right? All right. Now, I'm one of the custodians of our Model Spec. You may have heard of it. It's a document we put out publicly that states our principles around how we design our models' behavior wholesale. We put it out into the world so everyone knows how we think about shaping model behavior, because these choices aren't just made in theory; they show up in how people experience ChatGPT every single day. So, a quick exercise. Let's do a show of hands to see how some of these different preferences land. Raise your hand if you find emojis helpful in assistant responses. I see a few, but you know, there are people inside our company who love emojis. All right. Who here thinks it's easier to work with ChatGPT when it's concise?
I love this, developers. Okay. And who wishes ChatGPT would ideate with you more? I'm a PM by trade, so I'm definitely in that camp. You know, there are different hands going up, and those are different behaviors that can show up at different times. All of this, the stuff that makes up how information is expressed, is part of our model style. So here's our roadmap for the next 25 minutes. First, we'll ground ourselves in style: how I define it and why I think it matters. Then we'll talk about how style emerges in a model and what that means for trust and perception. We'll talk about the complexities behind style, and finally I'll share some parting thoughts on the future of tone and voice in AI, and then we'll wrap with your questions. I've been asked to remind all of you: go to Discord and ask your questions there. That's how you can get questions up to me at the end. All right. So what is style anyway, and why does it matter? Well, for today's talk, when I talk about style in models, I mean values: this is generally stuff our models should always or never do, like we always want our models to uphold the law. Traits: like be curious, be warm, be concise, be sarcastic. And then flair: the use of emojis, em dashes, these sorts of micro things that turn up in model responses. When we put all of those things together, we get demeanor, which is how values, traits, and flair adapt across specific contexts. Why does this stuff matter? Because altogether, these parts of style change how people experience the model. Early AI models were cautious and flat. They gave you the facts, but they felt aloof. I can remember interacting with models before I joined OpenAI and very much feeling this way: they felt flat. Later models became more dynamic, more adaptable, and more understandable in tone. And that evolution actually started to change how people use our models.
Instead of just looking up trivia or using ChatGPT like a search tool, people started to use ChatGPT to collaborate: as a tutor, a coding partner, in planning one's day, in writing. The list goes on. As one user told us, using ChatGPT feels like hiring a ghostwriter who never sleeps, never complains, and always gets the tone right. And that right there, that is not about IQ points. That's about style: how it feels to interact with our models. So, how does a model get its style anyway? Well, I'm going to keep it super simple with three buckets, starting with pre-training. So, with my
Segment 2 (05:00 - 10:00)
librarian hat on, pre-training is all about filling the library with knowledge. The corpus that we train on sets the baseline voice, idioms, and breadth of knowledge that define the model's realm of capabilities. Then we start fine-tuning. This is where I get more involved. We start to add tone, helpfulness, and guardrails; we measure how good our models are at meeting our guidelines, and we train in improvements. And then finally, this is where users really get involved. Of course, devs, I know you can do some of your own post-training, but this is a big area: dev settings like system instructions, tools, and defaults let folks like us, in our system prompt in ChatGPT, and you, in your apps if you're using our API, further refine the style of our models. User prompts change style. I love thinking about this. How you write your prompt changes how the model responds. If you say "yo," "howdy," or "hi there," it actually leads to different styles of response from the model. I'm from Alberta, Canada; it's like the Texas of Canada. We say howdy all the time. And the model has started to adapt and recognize that and talk to me a little bit like an Albertan. That gets into our personalization features, like memory, which when enabled tailor model style over time to each and every one of you. And then finally, you can actually select a default personality in ChatGPT. These are deeply trained-in personalities that are a bit more robust than just prompting things on your own. Style is mostly set by training, refined by tuning, and finally shaped by how you and the app prompt it in the moment. But style isn't just about aesthetics. It shapes how people interpret and trust the model. Humans have a natural tendency to read into things. And for that, I'm going to give you a story. This is a drawing of Bruce, my red Chevy Astro.
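For developers, the point about system instructions steering style can be sketched as plain data: a style-setting system message placed before the user's message in a chat-style request payload. This is a minimal illustrative sketch, not official API usage; the helper function, the instruction texts, and the model name are all assumptions.

```python
# Sketch: steering model style via a system instruction.
# The helper, instruction texts, and model name are illustrative,
# not part of any official API.

def build_style_request(user_prompt: str, style: str) -> dict:
    """Assemble a chat-style request payload where a system message
    sets the assistant's style before the user speaks."""
    style_instructions = {
        "concise": "Answer in at most two sentences. No emojis.",
        "warm": "Be friendly and encouraging. Emojis are welcome.",
        "plain": "Use neutral, formal prose. Avoid em dashes and emojis.",
    }
    return {
        "model": "gpt-4o-mini",  # placeholder model name
        "messages": [
            # System message: refines the default style, as described above.
            {"role": "system", "content": style_instructions[style]},
            # User message: the actual request.
            {"role": "user", "content": user_prompt},
        ],
    }

request = build_style_request("Explain what a model spec is.", "concise")
print(request["messages"][0]["content"])
```

The design point the talk makes maps onto the message ordering: the developer-controlled system message sets defaults, and the user's prompt refines them in the moment.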
He was signed by Bobby Orr, who's a hockey player, and he went everywhere with me during my college years. He had all-wheel drive, he was heavyset, and he could plow through feet and feet of snow. He made me feel confident, powerful, and it was always a little fun for me when I'd jump out of that van in a dress and heels and surprise everyone. I think they were expecting someone bigger. I loved Bruce's vibe, and boy did I miss him when he finally broke down, because I never got his oil changed. He was good to me. And I'm not alone in this. As humans, we talk to pets that can't talk back. We argue with our GPS. And we let old cars go with a twinge of nostalgia and sadness. AI can magnify this. When interacting with our models, it can feel like you're in a conversation with a human counterpart. That makes the model helpful. It makes the interaction smoother and more approachable. But without the right model behavior, it can also blur lines and lead people to assume judgment, expertise, and even agency that the model doesn't actually have. We tackle this through deep intent and craft around how our models respond. We ask things like: How warm or neutral should the model sound by default? When should it adapt to your preferences, and when should it stay the same and consistent for everyone? How do we balance ensuring the model isn't annoying while making sure you don't misinterpret what it is? These choices are foundational to trust and safety. Regardless of how we tune style, our work should never impact safety guardrails negatively. And people notice when we make changes to style. When ChatGPT leaned too far into praise, what folks online dubbed glazing (we call it that, too), it became distracting and undermined trust. That's why we try to tune carefully. We aim for models that are balanced, warm, and approachable, but not sycophantic. Style isn't a bolt-on. It's an interface that people experience.
And because people read intention into style, we have to be deliberate in what we do. Even small stylistic choices can ripple into whether users trust or mistrust our models. So, how do we make these decisions anyway? Well, there are a lot of deep philosophical questions that go into designing model behavior and writing our spec. The principles that guide our work are found in it; they're all available for you to see. But there are three core principles that we really anchor in, so I'm going to walk you through those. The first is maximizing helpfulness and freedom for our users. We aim to maximize users' autonomy and their ability to use and customize our tools according to their needs. Again, librarian hat on: we call this intellectual freedom. We think people should be able to explore
Segment 3 (10:00 - 15:00)
all sorts of ideas, largely unbounded. But we balance that against minimizing harm. Like any system that interacts with hundreds of millions of users, AI systems carry potential risks of harm. Our safety systems team leads the charge on defining and measuring our safety standards. That gets into the values I was mentioning. And then overall, we need some sort of default behavior for our models to have. So we aim to set defaults that users and developers can override, so long as they stick to those safety policies. Two quick callouts on all of this. There's no single owner of the Model Spec. I'm standing up here today, but I'm not the only person working on it, and that is by design. Our policies are shaped by researchers, safety experts, product managers, designers, policy makers, and others from across our company, in addition to taking input from users like all of you, developers, and civil society groups. It's also important to call out that there's no single style that works for all users. You know, as we saw with that exercise at the start, we all want something a little bit different, right? So we don't think there should be a single overarching style for everyone. Instead, we aim to provide choice and flexibility so the model can adapt to different contexts and user needs. And flexibility can sound straightforward, but as I'm sure many of you have seen under the hood, this is tricky. It collides with the fundamental limits of how large language models work. They don't execute rules; they approximate patterns. And that makes consistency in style one of the hardest open problems in alignment. So let's connect this all back to the style concepts I laid out earlier. Values are the things we won't compromise on, like our safety standards. Traits, like be curious, be warm, and be concise, all of which are actually in our Model Spec, are things we've started to actively define defaults for.
And then we've got flair: em dashes, emojis, things like that. Frankly, there's often no designed default for these things; they just sort of emerge in the models today. But what's true is that across both traits and flair, we think developers and users should be able to steer these things. We are trying to set defaults, but not fixed behaviors. It's important to note that even if we can all align here, and you're like, okay, Laurentia has taught me this, this makes sense; even if we align on what the model should do in the spec, the hardest part is actually getting the model to consistently follow through. Again, alignment is not a solved problem. Why? Because large language models don't execute rules the way code does. They generate text statistically, based on their training. That makes them powerful and flexible, and it also makes them less predictable at fine-grained levels. So when we ask for something like "don't use em dashes," the model doesn't have a clean toggle to flip. Instead, it's trying to balance your request against everything else it has already learned. That means we sometimes see inconsistencies in the way the model weighs the many instructions it's getting at the same time. It's also why alignment remains an open research challenge. Translating human intent into behavior that's reliable, steerable, and consistent across billions of contexts is extraordinarily hard. And until that problem is better solved, you'll sometimes see cases where the model technically knows what you want it to do, because you're telling it, but it can't consistently execute it. The good news is we're working on it. And that gets us to the future of style. When I listen to how people actually use our models and what builds trust, three themes keep emerging. Power users want fine-grained control. Everyday users want systems that adapt naturally to context.
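Because the model has no clean "no em dashes" toggle, one pragmatic workaround for developers who need a hard guarantee is deterministic post-processing: enforce simple flair constraints in code after generation rather than relying on the prompt alone. A minimal sketch; the function name and rules here are illustrative assumptions, not anything from the talk or an official API.

```python
import re

def enforce_flair_rules(text: str, allow_emoji: bool = False) -> str:
    """Deterministically enforce simple flair constraints that a
    prompt alone may not reliably achieve."""
    # Replace em dashes with a comma-spaced pause.
    text = text.replace("\u2014", ", ")
    if not allow_emoji:
        # Strip characters in the main emoji blocks (coarse heuristic).
        text = re.sub(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]", "", text)
    # Collapse doubled spaces introduced by the substitutions.
    return re.sub(r" {2,}", " ", text).strip()

print(enforce_flair_rules("Style matters\u2014a lot! \U0001F680"))
# → Style matters, a lot!
```

The trade-off: post-processing is reliable but blunt, while prompting is flexible but probabilistic; the steerability work described next aims to close that gap in the model itself.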
And almost everyone wants things to feel simpler and more intuitive. That's why we're focused, first, on steerability. This is a big one for me, and we're investing a lot in it right now. It means making our models better at following customization requests so things work the way you expect. People often ask, why can't I prompt dashes away? This is the work that will let us, and you, better manage model traits and flair. Second, contextual awareness. Like I said earlier, even if some of you raised your hands on emojis or didn't, you might actually like getting emojis out of the model when you're composing a text to send to a friend, but you might really hate emojis when you're writing code, because they're going to break how things run, right? So we need to teach our models to shift tone appropriately depending on your context. Whether you're drafting medical guidance or a bedtime story, we want to get the tone right. And then, again, librarian here: third, AI literacy and accessibility are always top of mind for me. Most people aren't power users. They're not folks like you, coming to these talks. They aren't asking a lot of questions about AI. They're just trying to use
Segment 4 (15:00 - 17:00)
ChatGPT. We need style management to feel as simple as picking your phone's wallpaper, while also helping people learn how to get the most out of these systems, because they're quite powerful. So, I'm just going to show you how these ideas translate into the product. This is where you can shape ChatGPT's personality and style right now. I like to choose Nerd; that's one of the personalities. It ideates a lot more than our baseline model, so that's a style that I like. There are a few different ones in there. A lot of people really like Cynic, which is our super sarcastic personality. And our models will also follow the instructions you put in there. But as I mentioned earlier, work is ongoing to ensure customization carries across model turns. So I can say something to the model like "don't talk like a millennial" (I'm a millennial, and it echoes kind of how I talk), and it'll start to lose that instruction after multiple turns. That's the steerability work we're doing. The "so what" is this: if we get style right, AI becomes more usable, trustworthy, and personal for everyone, not just the folks in this room. So, some takeaways. How our models communicate is central to how humans experience AI. Some things that drive user perception of style are fixed, like our safety policies, but generally, we're aiming to be flexible. We believe style should always be anchored in freedom. AI should expand your ability to explore ideas, not restrict it. So, there are some actions you can take to help shape what comes next. If you learned some things, please tell your friends and help keep them up to date. If you've got ideas, you can actually tag me on X; I've put my handle up there. I'd love it if you shared your conversation links, or exactly what you were doing, because that helps me debug. And then, best of all, take matters into your own hands.
Go out and customize your ChatGPT, and stay tuned, because our customization features are just going to keep getting better. Style is about how people feel about technology. Some of it's fixed for safety. Much of it is about your freedom: your freedom to shape how AI shows up for you. And that's my talk. Thanks everyone.