# Feedback Loops in Opinion Modeling | Danielle Ensign | OpenAI Scholars Demo Day 2021

## Metadata

- **Channel:** OpenAI
- **YouTube:** https://www.youtube.com/watch?v=wZ6PqNp-W_w
- **Date:** 10.05.2021
- **Duration:** 15:31
- **Views:** 6,578

## Description

Learn more: https://openai.com/blog/openai-scholars-2021-final-projects#danielle

## Contents

### [0:00](https://www.youtube.com/watch?v=wZ6PqNp-W_w) Introduction

So, my presentation is going to be on feedback loops in opinion modeling. I'm Danielle Ensign, and my mentor is Jeff Wu. To briefly overview: I'm going to talk about why this is a problem we should study, then give a brief literature review of lenses on opinion modeling, and then talk about the particular thing we studied, which is what happens when models produce data that then goes back into models, with an application to temperature decay.

So why study opinion modeling? Well, in AI safety it would be useful to understand how preferences change over time. In open-endedness and data augmentation, it would be good to understand the processes that produce the data we're feeding into our models. And in AI fairness, there is a concern that as language models generate text in the world, they may affect the opinion ecosystem that exists out there, so it would be good to understand how that happens and how it affects the models themselves.

### [0:57](https://www.youtube.com/watch?v=wZ6PqNp-W_w&t=57s) Literature Review

First, a brief literature review. We have language modeling, which is essentially where you take lots of data and feed it into a language model, and it captures a snapshot in time. This is useful, but it doesn't quite capture the dynamic questions we might ask, like how the process is changing over time. There is some previous work here, like Facebook-scale simulators, or people who study GitHub, and these find that there are spikes in activity, but it's hard to get a detailed analysis when you're only working with these learned systems. Another thing you can do is build agent-based or physics-style models, and one thing people find there is that reality is very nuanced: it's hard to concretely define the individual pieces because they all interact with each other. So one thing people do is take a systems-theory approach, where they study the structure of many interacting parts, but the problem is that it's very hard to choose the right level of granularity for your models. One thing you can do is take the empirical laws we see in real-world data and use them to validate your models. Finally, there's the network perspective, where you look at how opinions move across a network, the density of the network, and things like that, and that can give you some insight into what's happening.

There's a lot else out there, but we decided to study one particular problem: models that output data that is then fed back into the models themselves. Concretely, right now there are models like GPT-3 outputting text that goes onto the internet, and that data is going to go back into future models, so it would be good to understand what's happening and what we should be worried about.

Here's the setup. We take some data and train a model on it; we generate some data from that model; we use that data to train another model; we generate some more data; and repeat. You can imagine a couple of variations of this: maybe we fine-tune the model instead of training one from scratch, or maybe we're in a classification setting where we label a data distribution and that leads to a trained model.

Very concretely, consider this coin setting. We flip lots of coins; in this case we ended up with the same number of heads and tails, so our new probability of heads is 0.5. We do this again and end up with 13 heads and 7 tails, so our new probability of heads is 0.65. We can repeat this many times. There are a couple of things you find when you do the formal analysis on linear classifiers or coins. The first insight is that more data tends to lead to a decreased step size, which just makes sense: it gives you a better estimator. The second insight is that the process eventually ends up at all heads or all tails: there is some probability of outputting the same outcome every time, and once that happens, the process is stuck there.
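To make that concrete, here is a minimal sketch of the coin experiment in Python; the flip count, step count, and seed are illustrative choices, not the exact settings from the talk:

```python
import numpy as np

def coin_feedback_loop(p0=0.5, n_flips=20, n_steps=10_000, seed=0):
    """Repeatedly 'train' a coin on its own samples: flip n_flips coins
    with the current heads-probability, then set the new probability to
    the observed fraction of heads."""
    rng = np.random.default_rng(seed)
    p = p0
    history = [p]
    for _ in range(n_steps):
        heads = rng.binomial(n_flips, p)  # generate data from the current model
        p = heads / n_flips               # refit the model on that data
        history.append(p)
        if p in (0.0, 1.0):               # absorbing states: all tails / all heads
            break
    return history

history = coin_feedback_loop()
print(len(history), history[-3:])  # the walk is absorbed at 0.0 or 1.0
```

Raising `n_flips` shrinks the step size of this random walk (the first insight above), but the absorbing states at all heads and all tails remain (the second).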

### [4:07](https://www.youtube.com/watch?v=wZ6PqNp-W_w&t=247s) Theory

Or, even more generally, imagine we have more than two outcomes in a discrete setting. We can do a random walk on these outcome probabilities, and eventually one of them is going to end up at zero, in which case we're back to the two-token setting. So there's some theory here.

One other thing worth talking about in the theory setting is temperature. In this graph, when the red line lies along the green line, that's temperature 1.0, and when the red line is horizontal, that's temperature 0.0. What you see is that when we sample with temperature below 1.0, outcomes with probability 0.5 or higher get pushed up and outcomes below 0.5 get pushed down (a small sketch of this rescaling appears at the end of this section). This has real-world implications, because it means that when we sample from models, we are perpetuating existing biases. That's bias in the technical sense; it's less clear whether you can argue this applies to bias in the real-world sense, but it certainly seems like it would speed up the collapse issue we're talking about.

So there's some theory, but how does it apply in practice? First, we looked at n-gram models, where you can actually run the theory. The theory suggests the model should collapse to a single path on the graph from the start to the end, and that path should have no cycles, because a cycle represents two different continuations from the same place, and one of those directions will get collapsed away. And that's in fact what we found: doing a basic n-gram model, this traditional kind of NLP modeling, after ten thousand iterations of this step it collapsed to "by being missed i will not wish the apart cousin of duty."
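Returning to temperature for a moment, here is a small sketch of the rescaling described above, using the usual softmax-with-temperature form; the example distribution is made up:

```python
import numpy as np

def apply_temperature(probs, temperature):
    """Rescale a categorical distribution: softmax(log(p) / T).
    T < 1 sharpens the distribution (likely outcomes gain mass),
    T > 1 flattens it toward uniform, and T = 1 leaves it unchanged."""
    logits = np.log(np.asarray(probs, dtype=float)) / temperature
    logits -= logits.max()            # subtract max for numerical stability
    out = np.exp(logits)
    return out / out.sum()

p = [0.6, 0.4]
print(apply_temperature(p, 1.0))  # [0.6, 0.4]    -- unchanged
print(apply_temperature(p, 0.5))  # ~[0.69, 0.31] -- majority pushed up
print(apply_temperature(p, 2.0))  # ~[0.55, 0.45] -- flattened
```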

### [5:44](https://www.youtube.com/watch?v=wZ6PqNp-W_w&t=344s) Transformer

So that's great, but transformers are the more modern language models, and the question is what happens with transformers. There's a tricky question of how you measure collapse, and one way you can do it is by measuring entropy: if we generate lots of sentences, compute the probability of each sentence by multiplying the probabilities of each word, and then average over lots of those sentences, we get a rough estimate of the entropy of the model itself (sketched in code at the end of this section).

As a reminder, here's what we're doing: we're feeding data into a trained model and feeding its outputs back in, and I'd like you to guess what you think is going to happen with a transformer. What we find is that there are two regimes. In the first, the model essentially shoots off to high-entropy randomness: roughly speaking, it becomes a very random generator. It still centers on things, but the generation becomes very random. The other thing we find is the behavior shown in the next section.
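As a sketch, here is what that entropy estimate might look like in code; `model.sample()` and `model.token_logprobs()` are hypothetical stand-ins for whatever sampling and scoring API the language model exposes:

```python
def estimate_entropy(model, n_samples=200):
    """Monte Carlo estimate of the model's entropy: sample sentences,
    sum the per-token log-probabilities to get each sentence's
    log-probability, and average the negative log-probabilities."""
    total = 0.0
    for _ in range(n_samples):
        sentence = model.sample()                    # hypothetical API
        logp = sum(model.token_logprobs(sentence))   # log p(sentence)
        total += -logp
    return total / n_samples
```

Since the sentences are drawn from the model itself, the average negative log-probability converges to the model's own entropy.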

### [6:52](https://www.youtube.com/watch?v=wZ6PqNp-W_w&t=412s) Collapse

Here the entropy shoots up initially and then comes down to the collapse the theory predicted. Concretely, it starts out with fairly standard output, like "this plankton or aggressive wildlife some days in the season." If you've looked at the outputs of language models, you know that a lot of the outputs are pretty weird, and what seems to be happening is that the language models are getting used to outputs that are weirder than the inputs they are used to, so generation quality decreases. Eventually it hits this peak, where it's now used to how weird the output is, and then we get these cycles. For example, the model might just repeat something like "hi hello hi hello," and once that cycle appears in a generated output, the model will see it, it will be more likely to be produced in the future, and it gets perpetuated. If you look at the most common tokens at the peak, they're pretty common tokens, but then the model focuses on particular weird little loops: in this case it really liked to say "twitter" a lot, or "ally" a lot. We ran some other runs, and one of them eventually just focused on saying "enemy" over and over. The important point is that we get this collapse behavior.

The theory also suggests that with temperature below 1.0 we would expect it to collapse quicker, and with temperature above 1.0 we would expect it to shoot off to high entropy, and that is what we find. With temperature below 1.0 it collapses very quickly: you'll recall that earlier it took about 250 steps, while here it only took about 20. With very low temperature it collapses almost immediately, essentially to the most common sentence, whereas with higher temperatures it has some time to wander before it collapses. And with temperatures above 1.0 it just takes off to essentially the maximum entropy it can reach.
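Putting the pieces together, the overall experiment might be skeletonized as below; `train_fn`, `sample_fn`, and `estimate_entropy` are hypothetical hooks standing in for the actual training, sampling, and evaluation code:

```python
def feedback_loop(train_fn, sample_fn, estimate_entropy,
                  initial_corpus, n_rounds=250, temperature=1.0):
    """Skeleton of the retraining experiment: train a model on the
    current corpus, sample a replacement corpus at the given
    temperature, and repeat, tracking entropy each round."""
    corpus = initial_corpus
    entropies = []
    for _ in range(n_rounds):
        model = train_fn(corpus)                    # train from scratch each round
        corpus = sample_fn(model, temperature)      # outputs become the next dataset
        entropies.append(estimate_entropy(model))   # watch for collapse or blow-up
    return entropies
```

Per the findings above, one would expect the entropy trace to collapse for `temperature < 1.0` (faster the lower it is) and run away toward maximum entropy for `temperature > 1.0`.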

### [9:12](https://www.youtube.com/watch?v=wZ6PqNp-W_w&t=552s) Future Work

For future work, it would be good to have a better understanding of what's happening in this process. It would be good to run more runs and understand some of the variability; some of these take a very long time, and it would be good to know whether there are other kinds of outcomes, since I'm describing the general patterns we've seen. It would be good to understand whether we are perpetuating real-world biases: there's an argument that temperature leads to these models being mode-seeking, where they output the most common thing, which then goes back into later models, so you can imagine it should perpetuate bias, but it's hard to argue that rigorously. And finally, there's this temperature decay phenomenon, where because lower-temperature outputs are being fed back into the models, the model gets used to them and we get less entropy over time, and that seems like a problem.

One other thing: all of this theory seems a little iffy, because in practice, when we feed these models' outputs into the real world, there's some filter on what data actually goes out there. So really we should incorporate into this model some kind of feedback mechanism that filters what data is fed back. This is analogous to a Go-playing system, where you only feed back the games that do well, so that's a relevant piece here as well.
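One way to express that filtering idea is a small variant of the loop above, where only outputs passing some external selection signal re-enter training; `keep_fn` is a hypothetical stand-in for whatever real-world filter applies:

```python
def filtered_feedback_loop(train_fn, sample_fn, keep_fn,
                           initial_corpus, n_rounds=100):
    """Feedback loop with a filter: only samples that pass keep_fn
    (e.g. an engagement or quality signal, analogous to keeping only
    winning games in a self-play Go system) are fed back."""
    corpus = initial_corpus
    for _ in range(n_rounds):
        model = train_fn(corpus)
        samples = sample_fn(model)
        corpus = [s for s in samples if keep_fn(s)]  # filter before feeding back
    return model
```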

### [10:53](https://www.youtube.com/watch?v=wZ6PqNp-W_w&t=653s) Questions

So yeah, that's my presentation, and thank you to everyone: my mentors and everyone that was able to help. I'm open to questions now.

**Q: For the language model at t = 1, does entropy always increase and then decrease?** Sometimes it does seem to just take off to really high entropy, and sometimes it goes up and then back down; we haven't ever seen it just go down from the start. And honestly it's kind of weird: once it turns, it goes down pretty consistently rather than doing much of a random walk, whereas the theory suggests it should be doing a bit of a random walk. So there are a lot of open questions there.

**Q: Would you see the same effect if you extend the dataset instead of replacing it with model samples?** Yeah, this is another direction I think is really interesting; it's essentially the question of grounding, and I have a couple of slides on it. You can imagine that, instead of training only on generated data, we also feed some ground-truth data into the model. In principle this can help quite a bit: we're essentially doing a random walk at each step, and if you bias the random walk toward a particular distribution, we would expect it to roughly stay around there. For small n-gram models we did find that this helped, and it's relevant in practice, because these models are going to be feeding back into themselves while also taking in real-world data. We haven't validated this on language models, though, and I think it's a really interesting question. (A grounded variant of the coin sketch appears at the end of this section.)

**Q: What are the implications of this work for semi-supervised learning?** That's a tricky question, and I'd need to think more about it. You could certainly imagine that, at the limit of labeling data from some small seed set, you may end up with some feedback loops, but it's less clear. That's a nuanced and tricky question, and I'm not sure I have a great answer for you on that one.

**Q: What happens if the outputs are only some smallish percentage of the input for the next training step, mixed with new real-world data?** That's the grounding setting again. We haven't run it on language models, but I did run it for the n-gram experiments, and honestly my intuition is that even a very small percentage would help significantly. It's also worth pointing out that in practice we're only going to be doing a couple of steps, not the 10,000 steps we needed to converge, so in practice having some grounding data may make a big difference.

**Q: Have you considered other settings apart from the coin setting?** Yeah, we ran that setting with lots of different models. We looked at a linear classifier, for example, and it's a little different there: instead of collapsing, it just randomly walks, and if you don't have any bias it just keeps cycling. The general insight, that discrete settings give you either a random walk or collapse, seems to hold in a lot of settings, but I think it's worth doing a more detailed analysis, because for some more complex models you might be able to say something more interesting.

Okay, that's all the time I have, so I'm going to pass it off to Jonathan. Thank you, everyone!
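As a footnote to the two grounding questions above, here is a grounded variant of the earlier coin sketch; `ground_frac`, the fraction of real-world data mixed in each round, is an illustrative parameter:

```python
import numpy as np

def grounded_coin_loop(p_true=0.5, ground_frac=0.1, n_flips=20,
                       n_steps=10_000, seed=0):
    """Coin feedback loop with grounding: each round, a fraction of
    the training flips comes from the true coin rather than from the
    model's own samples, biasing the random walk back toward p_true."""
    rng = np.random.default_rng(seed)
    n_ground = round(ground_frac * n_flips)   # flips drawn from the true coin
    n_model = n_flips - n_ground              # flips drawn from the current model
    p = p_true
    for _ in range(n_steps):
        heads = rng.binomial(n_model, p) + rng.binomial(n_ground, p_true)
        p = heads / n_flips                   # ground-truth flips let the walk
    return p                                  # escape the states at 0 and 1

print(grounded_coin_loop())
```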

---
*Source: https://ekstraktznaniy.ru/video/11579*