# Stanford CS221 | Autumn 2025 | Lecture 18: AI & Society

## Метаданные

- **Канал:** Stanford Online
- **YouTube:** https://www.youtube.com/watch?v=071zJXhvNfM

## Содержание

### [0:00](https://www.youtube.com/watch?v=071zJXhvNfM) Segment 1 (00:00 - 05:00)

So today we're going to talk about AI's role in society. So this is going to be a bit of a departure from the rest of the class. Um so far we've only focused on the technical aspects of AI. We talked about machine learning. We talked about state based models with search MDPs and games. Talked about Beijian networks. We talked about logic. And so now we're going to talk about societal um implications. And so the first thing you might wonder is why should we care about this? Right? So this is a class about it's a CS class about AI. So why are we talking about this in this class and there are many other classes that you could take to learn much more about um AI in society. Um so I'll try to argue why this is here today. So first is indisputably technology has massive impact on society. This has been historically true. Um I mean you can go back as far as printing press and the steam engine and things like that. But more recently internet, mobile phones, social networks have just changed the fabric of society and uh for either better or in some case for worse. Um and AI in particular is the fastest growing in you know history. Um and part of that growth is due to the fact that we have you know internet and social media to overlay the growth of this technology on top of um you know for example right now chat GPT has 800 million weekly active users. So it and this is going to keep on growing. So we're in if you think about we're only in the very beginning stages of this technology you know revolution. The second point is that we as technologists um speaking to an audience who is probably primarily computer science um we have incredible amount of power right first we understand the capabilities and limitations of this technology um better than anyone else. um we can choose what problems to work on and given those systems that we decide to build we make design choices that shape um access or um how the uh system works. So for example, you might when building a model, you have to decide what languages do we support or if you release make a model, do we release the weights? Um what kind of requests should your um service uh take and which one should you refuse? So as you can tell, these are design decisions that no one else is going to make. I mean that it has to be the job of the model builder and these are not technical questions. These are kind of design decisions that have um you know uh implications. So you know if you're not convinced yet you can maybe think about you might be wondering well we're just here to develop the technology let's someone else worry about that's someone else's job to worry about the consequences. Um and just to hammer it home, this is like a relatively extreme example, but you know, there was this uh man by the name of Vernon Haron Brown who in World War II, he helped Hitler develop rockets and then he came to us and helped develop the space program and the song by Tom Lair kind of sums up his perspective. Once rockets are up, who cares where they come down? That's my department, says Veron Brown. So this is a bit kind of um I mean it's a song but I think it describes an attitude that we probably don't want to take. Um okay so let's say we do care about uh technological impact on society. What should we do next? So you can start with now you know the principles um and the broadest principle is we want to ensure AI is developed to benefit and not harm society. It almost sounds like a platitude by now. Um, and there are a number of documents you can turn to draw inspirations. For example, there's a Belmont report um, which uh, talks about ethical principles um, for human subjects research. And this came about because there was this really unfortunate uh study that was done in 1974 where they basically did a gave a batch of African-American men um who were known to have syphilis but didn't treat it because they wanted to study what happens if you don't treat syphilis. Apparently not great things. Um so that led to the uh creation of the

### [5:00](https://www.youtube.com/watch?v=071zJXhvNfM&t=300s) Segment 2 (05:00 - 10:00)

Belmont report to protect human subjects research and the idea is that um there's some principles here respect for persons which leads to the idea of informed consent. Um subjects have to consent and understand what they're getting themselves into. Um we want to maximize the benefits and minimize the harms and don't affect uh some groups disproportionately more than others. All these things were kind of violated in this previous study. You could also look at things like the ACM code of ethics which talks about uh we want to contribute to human well-being, avoid harm and respect privacy and confidentiality and so on so forth. So these all seem pretty unobjectionable, right? Um but you know the real question is how do we actually operationalize them in practice? because it's all well and good to say we want to benefit society and minimize harm. Um but you know what does that actually mean? So I'm going to talk through a bit of um let's say call them frameworks to think about and help us orient um how we should think about this uh very challenging problem. So this is a question how do we benefit a society by developing AI and the main problem here is that we don't get to fully control its use right if you commit a bad act you can then that's sort of your choice and you know we can sort of say to ourselves don't do bad things but the challenge is that it's that's not our call and AI is an example of a dual use technology, one that can be used to both benefit or to harm people. And dual use technologies aren't uh new to AI. They've been around for some time. I think they were fairly there's this idea of dual use as came up um probably first in when thinking about um in the chemical industry where ammonia was used heavily to uh for arguulture but then um they could also be used for chemical weapons. Um rockets as we saw earlier the you know they are can be used to fire missiles at people or they can send people into space and discover science and all that. Nuclear is a probably a prime example. They can be used to um you know as weapons or to for you know uh for energy. um cyber security. Um if you have let's say an agent that can uh hack into, you know, a server, this can actually be useful for penetration testing. So testing your own servers um before you deploy them or they could actually be used for cyber attacks on someone else's servers. And then finally um you know encryption. So they can protect user privacy and we all love and um you know encryption is the foundation of how you can you know do anything on the internet these days or but it could also be used to conceal criminal activity and encryption we all take it for granted today but you know 20 30 years ago there's a big debate over the kind of the dual use of encryption and that actually was uh you know banned um or strong encryption was banned for a bit of time. So, and AI is a dual use technology. Um, but that doesn't mean you just throw up your hands and say, well, you know, it could go either way. It could be positive, negative. Let's just, you know, you can't do anything about it. There are things you can do to steer AI so it tilts towards the benefits. I said it was hard, but I didn't say it was impossible. So the way to think about um the role of AI in impact on society um is this diagram here. There's two axes here. Intent and impact, right? And intent could be good and bad and impact could be uh positive or negative. So in this upper left quadrant, you're trying to do good things and you have a positive effect. These are beneficial applications to healthcare, education, science, things like that. Um, then you could be in this lower right quadrant where you have bad intent and you have negative impact on society, spam, fraud, disinformation. This is generally about misuse of AI. So, don't do that. Um the more subtle case is um in this upper right quadrant where you have good intent but somehow some bad something happens and it turns out to be have a bad negative impact. This is probably

### [10:00](https://www.youtube.com/watch?v=071zJXhvNfM&t=600s) Segment 3 (10:00 - 15:00)

the more common um you know case in a lot of uh situations and that's something we'll you know we'll talk about. And now um you know there is also this technically this other quadrant here where you have bad intent and you end up having good outcomes um that as far as I know I don't know uh yeah I guess you have to be a pretty bad super villain to kind of live in that um quadrant. So um so you know benefits um beneficial applications um I think there's many things that we can do that you know helps uh you know the world and um remember we as researchers we choose what problems we work on right so we can choose problems along these lines that have a more direct impact so um for example just to name um a few Um AI is uh is having a will have a big influence in the you know biomedical space um to accelerate drug development. For example, alpha fold and its successors not only can predict, you know, protein structure, but also how these proteins bind to various kind of, you know, drugs. And this is can be used to accentuate accelerate drug development because you can do a bunch of simulations or model calls instead of actually doing the wet lab experiment. um more on the clinical side. Um it could be used to help doctors for example answer questions on electric uh electronic healthcare records or it could be used to communicate with um you know patients um directly let's say translating um uh you know medical jargon into uh language that one can understand. Um in your education for on the student side you can have personalized learning on the instructor side you could design curriculums uh create problems automatic grade u the main challenge here is to think about pedagogy and rather than task completion because if you have a tutor um you don't want it just to give you the answer immediately you want it to make sure you understand um robotics uh there's self-driving cars which are actually you know um pretty good these days. Um and you know it seems like people are now looking uh more aggressively at you know other forms of robots such as in the household which would be have significant implications for countries with large aging populations. um weather and climate for you know forecasting for early warning and um uh monitoring the effectiveness of various uh climate policies. Um so all these things are things that AI could have a very positive impact um on. Then a few misuse. Um obviously there's more but just to give you a sense um there's cyber attacks. Um so AI it can be powerful enough. There's a recent anthropic uh post showing that they detected uh a bunch of attackers using cloud code to conduct a large scale cyber attack. Um and uh you know AI agents are getting good enough where these things can actually be quite effective. Now remember some of the flip side of this is that the same agents could be helpful for securing our system or detecting vulnerabilities. So AI is always a dual use in this way. We can also generate text, images, audio, video now with in you know stunning fidelity and this obviously can be used for spreading you know dis disinformation, misinformation. uh this can be used by state actors or it could be used by you know teenagers that want to do harm to you know their peers. These are kind of fairly big and serious um you know problems. Then if we think about accidents so this is really about AI having unintended consequences. So neither the AI developer nor the users wants this to happen but it happens anyway through essentially oversight or negligence. So there's inequality. So AI works better or worse for different uh groups. So if you have an accent, you might realize that voice assistant just may not recognize your speech as well as if um someone who doesn't have an accent. Um, sickle fency has been a big problem. Um, where it can uh affirm a false belief that a user has. Um, but this is especially troubling for users with u various mental health issues

### [15:00](https://www.youtube.com/watch?v=071zJXhvNfM&t=900s) Segment 4 (15:00 - 20:00)

which can lead to, you know, self harm. Uh, potentially there's overreiance, you know. Um, so just to back up a little bit. So sency, right? Why is this an accident? So it's not the case that you know language model providers wanted this to happen. They're fine-tuning their models. So that is you know pleasant for the user but as a result a side effect um you get some reinforcement of false beliefs for over reliance. Again the model developers want these models to be helpful. Um but if they're too helpful then be people might become overdependent and lose ability to critically think for themselves. There's uh cultural homogenization where AI re enforces existing biases and stereotypes and um AI displacing various job functions. So for example, entry level software engineers are um maybe at the level of what uh cloud code can do which um can cause some you know stress in terms of how um you know for hiring of that those entry- level positions. So in summary there's three categories benefits we should do more of this misuse. we should implement safeguards to try to prevent. I don't think you can prevent misuse altogether. Um but you can try to prevent it and accidents are probably where we can do the most right. You can do much more preventative testing and just be careful before you deploy an AI system. Okay. So the other thing I want another view of um AI societal impact is you know thinking about the whole ecosystem and this is something that um you know Richishi will talk about uh the next lecture and the starting point is that um we often think about an AI model right we think about a model as and its properties does it generate the correct answer? Does it what kind of queries does it handle and so on? But that's really insufficient. I think you we need to really take an ecosystem view on AI to understand how AI uh impacts society. So here's a picture um you should have in your head, right? So here's your AI system and um there are it interacts the world through either upstream or downstream. So in the case of let's say a large language model you know it interacts with upstream in that it takes you know data which are created by people it takes GPUs which comes from resources and energy in the environment and an AI system is built and is deployed downstream to users and the use and misuse of that AI causes various impacts on people. So AI people kind of live on both sides of um this picture here. So let's look at kind of upstream. So we know AI models especially large language models are created from data and compute right data. It's important to remember that they come from people, right? This idea of, you know, um, from a model developer perspective, you can have this view that, well, the internet data is free. It's just out there and you can crawl it. But, but all that data is really the product of people's labor. And that's something important to remember. and compute. It all comes from you know extracting resources from environment whether that be energy or you know materials to build the actual data centers and the GPUs and so on. Um there's a few considerations uh that are present upstream. So you think about privacy where people's information is unintentionally revealed. uh copyright which I'll talk about a little bit later where creators might not be appropriately compensated. So an incentive system is kind of uh you know broken labor practices where workers might be treated you know poorly because um you know AI models are fairly data hungry and you have to uh usually pay people to create uh data for you. And if I were environmental impact now that um the some of these like AI data center builds are kind of insane in scale. Um this is you know think about uh carbon emissions, water usage and resource extraction. Downstream AI is used by people and this could have positive and negative um

### [20:00](https://www.youtube.com/watch?v=071zJXhvNfM&t=1200s) Segment 5 (20:00 - 25:00)

implications. um inequality so AI helps some more than others. um it can uh you know generate toxic content or take um harmful actions. We already talked about over reliance and jobs. Okay. So now in the rest of the lecture I want to deep dive a bit into a few topics. Um it is impossible to cover the whole breadth of what it means for AI to have an impact on society. So I'll try to cover a few representative samples to give you an idea of um some of these concerns. So um let's talk about inequality. So um in 2018 there was a very classic study uh done to show the what inequality means for AI. So this was the gender shades project. um they evaluated gender classifiers on um from Microsoft Face++ and IBM on a population and then they stratify based on demographic group. So um darker male, darker female, lighter male, lighter female and looked at the accuracy rates. And what they noticed is that you know there's one group that had much lower accuracy than the other groups. So if you were just measuring average accuracy, you would think that this system these systems work pretty well. But there you would notice oversight that there was actually intersectional group here um that was uh getting much worse service. So one interesting thing is that this study came out and then after the study you know the systems got fixed. Um so one thing that's I think a big takeaway here is that this shows the power of thirdparty auditing which can be a powerful incentive to incentivize to get companies to actually fix problems. And the first step actually in solving a problem is to identify the problem. Sounds obvious but I think this really you know you have to actually do it to really see its impact. On the technical side you know how do you reduce inequality? Well um you know one thing you could do is collect more data for underrepresented demographic groups. This generally is um can be you know costly which is why I think a lot of companies might not have the data to um uh in a in the first place. You can also do some things on the algorithmic side where you can upweight um represented groups in your data set. you can use this a tool called distributional robust optimization which looks at not just the average um accuracy but the the worst case average accuracy over groups. So what you would do is you would maximize the worst the minimum accuracy over all these groups and that objective function would allow you to make sure that all the groups have good accuracy. Now just as an aside you know one thing you might wonder is you know is gender classification even you know well- definfined task given you know because it's ultimately based on the image you're relying on superficial features as opposed to reaching the kind of gender as a um selfidentification another type of inequality comes um more at sort of a global level so I think we tend to think a lot about um models and what's happening in in the US. Um but if you look um at so here's a study that was uh that was done where they took um this model that was fine-tuned from llama um and which was advertised as a reward model which it scores basically how good AI assistant generations given user generations and they evaluated based on using this reward model different generations to worry from I am from country. And uh this reward model is usually used to then post-train a language model using reinforcement learning as we alluded to Ken talked about uh last time. And we see that um this is very uneven, right? So um higher is blue is better, red is worse here. And um maybe not surprisingly like US and Canada have um you know really high reward and places like Saudi Arabia have very low reward

### [25:00](https://www.youtube.com/watch?v=071zJXhvNfM&t=1500s) Segment 6 (25:00 - 30:00)

right so for no particularly good you know reason obviously this is not a particularly fair uh reward model to use and you can imagine if you fine-tune on this reward model you're going to get some pretty serious uh biases Um, another type of inequality that happens is a bit more subtle. This has to do with spirious correlation. So, here's a here's an example um problem from you know a few years ago. So, the task here is given a image of an X-ray try to predict whether there's a collapsed lung um or not. So, this is some you know bad medical condition. And if you look at the accuracies here of the system, it's, you know, not bad. But if you take a closer look um at this image, this is a um classified as a collapsed lung. And it's the researchers noticed that um there's a see this pipe here. This is called a chest drain, which is a common treatment for a collapsed lungs. And what happened was that the model was basically not trying to leverage the presence of a chest drain to predict whether it had a collapsed lung or not. So this is bad because if you look at um this is the accuracy so over higher is better here. Um if you look on average that's the blue so it's 0. 87 87. If you look at the people with uh chest drains, so these are the people who are got treated for cops, it looks great. But the people who haven't gotten treated for chest drains, it's much worse, right? Which means that the people who don't have chest drains, which means that they actually haven't been treated but nonetheless need to be treated, we're not able to actually predict those uh on those folks very well, which is bad. Um so this is known as a spirious correlation. These are patterns in training data that don't generalize. All machine learning algorithms do is they look for patterns in data. Right? And or correlations. And a spirious correlation is one that you see in the training data and the learning algorithm says aha this is how I'm going to predict uh collapsed lung or not. but it actually turns out not to be the right causal variable to latch on to and therefore if you have another condition it won't work. And this affects minority populations often the most. And the takeaway from all of this is that it is really important to not just look at one number. You look at accuracy and you get it up to 90% and you say like wow I made so much progress. It is really important to monitor uh metrics on different populations just to make sure that you're not essentially um harming a particular uh sub population. Okay. So next I want to talk about alignment and alignment is about how do we make an AI system do what we want to do. So sounds, you know, fairly straightforward, but it turns out not to be really. Um, so here if you put on your reinforcement learning hat, what you would do is you define a reward function that captures the values or the what you want and then you train an agent to maximize the expected reward. Okay, so what could go wrong here? Any ideas? You just write down what you want. That reward function is very hard to define in especially in all cases. And what happens if you don't get it right? You get reward hacking. So this is an example from old a open AAI uh blog post before they started training language models. Um, and this is a game called Coast Runners where the goal is to uh race a boat and to finish a race. And the reward that you have in this game is that you're getting points and you get points for hitting things. So when you do reinforcement learning in this environment, the agent learns this behavior where the boat is just swinging around and just hitting this boat many times and not completing the race at all. Right? So this is um an instance of reward hacking which is where the reward function doesn't capture what we want

### [30:00](https://www.youtube.com/watch?v=071zJXhvNfM&t=1800s) Segment 7 (30:00 - 35:00)

and this is terribly frustrating for a model developer because they're thinking just do what I mean not what I say but alas the algorithms can only do what you say. Um so this shows up in more realistic cases. So for example, we have a lot of um people interested in building coding models and a natural reward function for a coding agent is that you write code that passes unit tests. But we know that tests are always complete in their coverage. Which means that just because you pass the test doesn't mean you actually get the answer that your code is, you know, correct. But even if your code is correct, it might have other problems. For example, it might not be secure. robust to um you know adversarial you know testing and there's this work that um evaluates language models both on correctness and um securess um and even if it were both correct and secure it might have other things for example it might have bad style might be complex and so on and it's really hard to get this reward function right so And what happens if you don't get the reward function right? You should be cautious. Never overoptimize your reward function. That's incorrect. If it's slightly incorrect, you can optimize it a bit. Um, and likely you're going to be moving roughly in the right direction. But if you overoptimize, you're going to start gaming the reward function and finding all sorts of devious ways to squeak out more reward. And this is generally true, I think, in in life. So a second problem is the idea that if you think about the reward function representing values the question is whose values are we talking about and there's this idea of pluralism which means that you know different people have different values right so there's actually no one reward function that works uh for everyone um for example you know the view of is it go okay for governments to moderate public social media content And you know, people are divisive, uh, on this. And so, um, it's sort of, you know, dangerous to pick one view of this and just fine-tune your language model to, um, only that view and serve it to the world because then you're projecting that view on everyone. Um, and so ideally models would represent the diversity of thought. Of course within the overton you know window you don't want to represent maybe like fringe um kind of ideas at the same time models should be personalized um but not so much we don't want everyone living in their own echo chamber and certainly we already saw that psychopantic um language models are not or a bit dangerous. So then there's this kind of trade-off between how much do you represent that diversity of thought versus personalized to a individual um user. So again there's no kind of right answer or recipe here. The third challenge is um has to do with scalable oversight. So language models are getting really good. they can solve incredibly complex problems and already they're generating solutions that are hard for experts, you know, to verify. So there's we had this paper on unanswered, you know, questions where you take look at all the unanswered questions on, you know, stack exchange and we have language models that generate the answers, but we don't it's hard to actually um verify whether these answers are correct or not. So there's a few ideas around scalable oversight. One is that you can break down a problem into smaller you know problems um so that humans might be able to verify individual problems. You can use AI itself because if um AI is better than humans at least on these on some task maybe you can get a second AI that's better than human to keep the first AI check. There's some ideas around debate and constitutional AI that are proposing that really the only way to address this is to kind of use AI to monitor AI which you know obviously has some kind of recursive problems. Um another thought is that if you have process level supervision or evaluation which means that language model generates a rationale and then you check the steps basically you show the language model shows the work and then you're checking the steps that is generally going to be much more effective than looking at um just did you get the answer right or not. There's

### [35:00](https://www.youtube.com/watch?v=071zJXhvNfM&t=2100s) Segment 8 (35:00 - 40:00)

some other things you might imagine for example formal verification if you can cast things in terms of theorem you know proving then you're good but most problems in the world are um subtle and complex and you can't really do that. So you know summary of all the things that can go wrong here. The reward function isn't what we want exactly leading to reward hacking. There's actually no one reward function in the first place. So you have to think more pluralistically and it's really hard to even for humans to write down a reward function um in the first place or to provide feedback to a um an agent let alone write down the reward function. So you need to have some story for scalable oversight. Okay. So moving on um you know to copyright. So copyright has to do with the upstream dependencies on the language world in particular you know data. So currently there's a lot of lawsuits floating around in the jai space mostly around copyright and recently there was a settlement. Anthropic agreed to pay authors $ 1. 5 billion to settle copyright lawsuit and so this is you know pretty uh meaningful um that uh you know this is a big issue. So I want to take a step back and think about how to think about you know copyright and so this will be a bit you know going through kind of basic kind of IP law u but I think it's important to for understanding um kind of where language models fit into the space. So why does copyright exist? So copyright is part of a framework um intellectual property law generally and the goal is to incentivize creation of intellectual goods right and the idea is that you protect creators you in by protecting creators you incentivize people to be creators. So there's many types of intrial property, copyright, patents, trademarks, trade secrets, but the most relevant for um foundation model training is copyright. So this actually goes back all the way to uh 1700s in England. Um and u this was the first time that you know copyright kind of emerged in the US. More recently, there was a copyright act of 1976 and it basically says um it applies to original works of authorship fixed in any tangible medium of expression um and so on and so forth. So in particular it's uh applies to original works. So collections are not copyrightable. So like you can't copyright a telephone directory because it's just like a collection of numbers unless there is some creativity in the selection or arrangement. So if you just list everyone alphabetically there's no creativity but if you have some artistic way of doing it perhaps that could be copyrightable and in particular cop uh applies to expression uh not ideas. So you can't copyright the quicksort algorithm for example, but you can copyright the code that implements the quicksort algorithm. So in 1976 the copyright got expanded. So before things that had to be published to get copyrighted and after 1976 all they had to be was you know fixed. So you don't have to register a copyright for it to uh have um to be copyrighted. This is uh in contrast with patents. If you invent something and you don't patent it, then someone can just take it and you know run with it. But so the threshold for copyright is extremely low. So this is something I think a lot of people don't get is that you know it's not just that books are copyright but your website if you have a website is copyrighted. Like your you know technically I guess your homework in uh 221 is copyright. It's your work. any essay you write is your work and so it's extremely low, right? So then you might wonder then if that means everything basically everything is copyrighted and you wouldn't be wrong. But there's more to the story. So um registration is required before creator can sue someone. So uh if you didn't register and you have a copyright but you can't go and sue someone for copyright infringement. It's pretty easy to register. You pay $65. Um and but also copyright expires. It lasts for 75 years and then it becomes part of the

### [40:00](https://www.youtube.com/watch?v=071zJXhvNfM&t=2400s) Segment 9 (40:00 - 45:00)

what is known as a public domain. So works of Shakespeare, Beethoven, um all the you know works in you know project Gutenberg all old books essentially are in the public domain which means that you can use them there's no problem. So summary is that most things on the internet are actually copyrighted. Okay. So then the question is like well if it's copyrighted how can you use a copyrighted work? There's two ways. You can get a license for it or you can appeal to the fair use clause. So a license is basically a contract that says um that is granted by a licenser to a l and effectively says you can use this and I work that I own and you I promise to not sue you for using it. So um there's something important called a creative common license and this enables free distribution of copyright work. So if uh a work has if I create some work and I put a creative common license on it that means um people can use it freely without um I basically give permission for anyone in the world to use it. So examples are Wikipedia, um, Khan Academy is all creative comments, free music archive, a bunch of things are creative comments and this was created in 2001 to essentially bridge public domain and existing copyright. So it basically allows creators to explicitly say effectively this is public domain even though 75 years hasn't elapsed. But most model developers in addition to using public domain data they also license data for training foundation models. For example, Google uses there's I mean these are um some examples. So Google pays Reddit for their data. Open AI pays stack exchange and shuttertock. Um and otherwise they wouldn't be able to you know use that uh data. Okay. So if you think about it, licenses don't cover that much, right? So you have to pay for data. It can be creative comments or it can be in the public domain. But the vast majority of data on the internet is none of these. And that comprises most of the training data for language models. So then what you have to do is appeal to uh fair use. So section 107 of copyright act um and it basically gives you fair use says there's four criteria to determine whether you can use a particular work um and it depends whether you can use it depends on um the purpose of the use. So if you're using it for educational purposes um that's going to be more likely to be fair use than if you're trying to make money off of it. If you're um taking the work and doing some transformation on it, it's going to be more likely fair use than if you're just taking it and reproducing and reselling it. Um it also depends on the nature of the work. If it's factual, um then it's more uh likely to be fair use than if it's fictional. Um because facts are closer to you know things are that aren't really about creativity just like what exists. Um and the more non-creative something is the more fair use it is. So um and then also it depends on the how much of the work you use. If you use a snippet it's fine. whole book that might be problematic. And then finally this is a kind of economic condition. It depends on the use the effect of a use on the market, right? So if you use someone else's work and you end up competing with them or that entry in the same market, that's going to be looked down upon as opposed to if you're using that work and doing something completely different. So here's some examples of fair use. You watch a movie and you write a summary of it. Um that's totally fine. You reimplement algorithm which is idea rather than copying the code. So remember expression is copyrightable ideas are not and then Google books um indexing a bunch of books and showing snippets. This is fair use. It took a very long lawsuit to conclude that it was fair use and now it sets a lot of the kind of precedent going forward for uh data use but now it is fair use. It wasn't you know clear at the time. One thing also to note is that copyright is not about verbatim memorization. You can

### [45:00](https://www.youtube.com/watch?v=071zJXhvNfM&t=2700s) Segment 10 (45:00 - 50:00)

copyright plots and characters like Harry Potter. So even if you're not writing um you're not like copying the book Harry Potter book, but if you use the characters then um that could be a copyright violation. Um but if you are pariding then that's more likely to be fair use. So there's a lot of kind of semantic subtleties around fair use. It's about semantics and economics as opposed to purely technical notion of string overlap. Okay. So now for AI models what is the implication? So first of all copying the data first of training is already potentially violation even if you don't do anything with it. So train a model if you just by copying it I mean that's why it's called copyright you could potentially violate unless you appeal to fair use. Um now has been well argued I think that ML models training ML models is transformative right you're not just like taking a book and then you know putting it up on your website for anyone to download it the book is trained clearly the model can do things that the book didn't so it's a transformative you know operation and furthermore you can argue that ML systems are interested in the idea. So for example, you might want to uh classify stop signs versus not stop signs. The ML system couldn't care about what exactly your stop sign image looks like. It just wants the general idea. Um so it's not really interested in artistic choices. Now that that's for classifiers. So it depends on which ML system you're talking about. And then there is a kind of market um issue where foundation models can and do um affect the market. So there's a reason why writers and uh artists have a lot of angst about AI because their livelihoods sort of are impacted by you know the flood of kind of AI generated content. So but this is also regardless of copyright. So even if um AI were able to uh you know if AI let's say were not trained on any of this data but still were able to generate uh things that were really competing with artists there would still be a problem but it wouldn't be a copyright issue. Okay. So the final piece of is that even if you have a license or you can appeal to fair use for a work um there could be terms of service that prevent you from actually using it. So for example, YouTube actually has a lot of creative comments content on it, but its terms of service prohibits technically downloading you know videos um and you know at scale and storing them. So you can run into barriers even if you have um you're free on the copyright sign. Okay. So um now I want to talk about one kind of subtlety here, right? Because I talked about how ML model trained on data is transformative and it's true that it it's a model is different than the data and it can do things that the data can't do. I can answer new questions. It can generate new um sentences. But there is a sense and people have studied this extensively that the model can memorize the data and thereby copying the data. So for example the idea of memorization is the text sort of contained in the model weights. Now what does that mean? You can operationalize that by looking at the probabistic model. So you take a book and you look at the probability of um the token I of a book given the previous tokens and you look at how high that probability is. If the probability is very high that means essentially the model knows the book and therefore the book has been memorized and has been copied into the model. Um now if you uh do this on a few models like llama 370B um you actually notice that some books

### [50:00](https://www.youtube.com/watch?v=071zJXhvNfM&t=3000s) Segment 11 (50:00 - 55:00)

like Harry Potter are actually memorized very heavily. So this shows for three different books and uh the rows are different models. Um and each of these bars shows the extent to memorize these models are memorized. So this book there's all this white means that there's very little memorization whereas down here it says that llamoth 3. 17B has memorized Harry Potter and sorcerer stone basically kind of extensively. So um this is you know interesting from a copyright you know perspective because um a lot of the arguments around training is fair use. It's fine because we don't really um we're transforming the data. This result kind of calls that into question because at least for some books we are the models are copying uh the data. There's another level here though, right? Just because the model assigns high probability to tokens doesn't mean that it's uh a user can leverage this fact to uh cause harm, right? Because ultimately copyright is trying to protect you know creators and if some uh you know third party can't let's say uh get the book out of the model then you know in some sense the harm is limited. Now, so extraction is the idea that a lay user let's say without too much work can actually extract the text from model weights. And there's experiment where if you turn out if you prompt llama 3. 17b with just the string Mr. and Mrs. D. So those of you who have read Harry Potter know that this is basically the first uh few words of Harry Potter and you put that into a language model this uh llama paper 170B and it will generate all the rest of the book and by I mean you know 450 460,000 tokens almost verbatim not quite and so this is additional evidence that well this could be you know problematic uh because um that if a user is able to take download these models which are you know free you can download from hugging face and then just like generate any book out then uh that is basically like putting the books out for anyone to download which you know obviously violates copyright. Um so extraction in general is a stronger case for infringement. Now fortunately um it turns out that this is the exception rather than rule. So for some reason llama really likes Harry Potter but um for example and butterfly it doesn't you can't extract anything out. Um so I think this complicates this, you know, the copyright story a bit because um it really depends on the model and it depends on the work of art. So you can't say universally models memorize works nor can you say they don't. It really depends. Okay. Um so there's a few other links here which uh you can read if you're interested. So now the final thing I want to talk about is the idea of openness and transparency. Um and with this I urge everyone to think about not just what a model can do but essentially how a model is built. So we're sort of the meta stuff, right? So in particular who can make decisions about a model's behavior who can even build a model you know at all and there is one kind of risk that um is I haven't mentioned before which is uh centralization of power right now there are very few uh big tech companies that can build you know frontier models because these are very capital intensive projects and there's not that many people who can you know do it and furthermore very little was revealed about how they work. Now, a lot of the think uh discussion about, you know, AI risk and benefits and safety are really tied to what these models do and how they um impact the world. But I think it's important to realize that a lot of the what they do is governed by the processes by which these models are

### [55:00](https://www.youtube.com/watch?v=071zJXhvNfM&t=3300s) Segment 12 (55:00 - 60:00)

built. So let's talk about kind of transparency a bit. So transparency is think about as a prerequisite, right? So along the lines of if you can't measure it, you can't improve it. So suppose we want to make AI systems safer. um harm people less and so on. Um if you don't have the transparency of how these models are built and what they can do, you can't even begin to uh start making these models um you know better. So we had a project uh called the foundation models transparency index which um which tries to is one take on how to improve transparency and the idea is that um it evaluates model developers based on their transparency practices. Right? So this idea of an index in it's commonly used in economics for example the human development index um monitors basically certain types of indicators that have to do with uh how the society as a whole is doing. So there's a 100 indicators that capture the upstream the model and downstream properties. Okay. So this uh shows the upstream um indicators include you know information about the data how the data was created through labor compute um and then there's things about the model um capabilities and risks and what mitigations were done how the model was and then there's downstream um indicators on how the model is distributed um who can use the model um what are the what they can use it for if there's a problem how we provide feedback and so on so forth and those are the rows um and then the columns are different models so we have anthropic cloud 3 here um you know this is from 2024 um and each cell basically is a score so uh you know blue is great um a red is not great and you can see that the transparency levels of different companies is fairly uneven. So there's some that do quite well 100%. Um whereas some of the closed models they're very poor especially on some of these um upstream metrics because there's not much disclosed about how these models are built. And remember that um in order to mitigate some of the let's say the harms around you know labor practices or environmental impact and um you know understanding copyright infringement we need to know at least the very minimum you know what data and the compute situation for building various models are. So um so here's just a kind of a summary statistic of you know how the various model providers are are doing and now the idea of the theory of change so to speak is that public reporting will incentivize companies to be more transparent. So remember the gender shades uh project by just simply reporting the status quo that there were these disparities led to improvements and with this project there is something you know similar that uh you know happened where if you look at the transparency in May 2024 it has gone up by and large uh from October 2023. So this is an example of how just by measuring you can sort of get people to pay attention and therefore um in incentivize improvement. Okay. So related to transparency but not identical to transparency is idea of openness. And so this is a topic I you know actually care quite a bit about. Um and uh so the idea is these foundation models are built and the question is how uh much how open are they? So openness is a spectrum here. So on one end of the spectrum are closed models. These are models like GPD5 or cloud where you can only use them through uh API access or through the product. um you can't access the internals. Then there's open weight models such as llama, deep sea, quen and so on. Um where the weights are released but um

### [1:00:00](https://www.youtube.com/watch?v=071zJXhvNfM&t=3600s) Segment 13 (60:00 - 65:00)

you don't have the data, code, you don't really know how the necessarily how the model was built. Um and then you have you know open source models where um you have the weights, you can see the code and the data recipe. Um but still the development of the model is you know closed and then finally we have uh you know open development where you basically see everything and more importantly you can have the community come in and contribute and audit and so on. And if you think about the relationship to open source software where it's clear that open source software has been very successful for not just bringing in the community to further improve has been better for security um and decentralizing you know power. Um imagine if back in the 90s if wind Microsoft Windows had been the only um you know operating system as opposed to Linux and the number of uh Linux- like operating systems have existed. Um so one note is that often people say open models and they mean openweight models. Um most of the models I think that people use are closed and open weight. uh but I think it is important to look at the whole spectrum here and in particular openweight models uh even just because you're open weight doesn't mean you do well on the transparency index because you're not talking about how you don't have to say anything about your data to have open weight model but as remember we need disclosures on the data to understand you know things like copyright and you know privacy uh implications and so Um, one thing to way to think about open way models is that they're essentially like releasing a binary executable. Um, right, you don't have the code, you just have something that you can run. Um, and you know, from that angle, it's clear that, you know, having just a binary executable that you can run on your computer doesn't constitute open source. Okay. So you know why is openness um important? Um so this paper talks mostly about kind of open uh weight models um but some of these ideas can be extended to the full spectrum as well. first the benefits right so increased uh innovation and customizability for researchers and developers right if you only have an API access then um there's only so much you can do only what the API allows you to do which is put in a prompt get a response out right so you can't you know know innovate um with open weight models. You have the weights, you can fine-tune, you can take layers. So people do various things like take the layers and quantize and customize the models in all sorts of creative ways which is something you can't do if you have um a API only model. Um, openness is also important for increasing transparency. Though, as I mentioned before, it's just having the weights open is not enough. There are actually many um, you know, providers that um, you know, if you look at uh, Mistrol 7B is an open way model, but they're actually um, you know, not there there's only 55 out of 100 um, on the scale. Um and finally, openness hopes to reduce the centralization of power. And this is because you have this artifact in the world. And people can take that and customize it however they want as opposed to having a central source of authority to dictate how model behavior should be for everyone. So now that said, some of this is also partial because at the end of the day, it still requires a hu huge amount of capital to train a model. So it's not like you have once you have open weights, everyone can train their own foundation model. So the other thing to think about is the risks and mostly the risks around open models have to do with misuse. And this is uh because anyone can take open weight model even it has been you know safety tuned it's fairly easy to just strip off the safety and you can

### [1:05:00](https://www.youtube.com/watch?v=071zJXhvNfM&t=3900s) Segment 14 (65:00 - 70:00)

use it to gen generate disinformation you can use it to uh cause various to hack into you know systems potentially um one thing that I think is important um is to think more about the risk in terms of the delta from what exists already. So this is called marginal risk, right? So the alternative in an alternative world without the open release model, you still have closed models and you still have the internet, right? As you can see that you know cloud code example where hackers were actually using cloud code to do cyber security attacks. So it's not that there's no harm from closed models either. And furthermore, a lot of the information about how to make, you know, bioweapons is actually just on the internet. It's a lot of uh um kind of, you know, and there's, you know, classes that talk about uh if you take a advanced bio class, then you probably learn a lot about how to do it as well. So, I think that's something that's important to uh take into account. It's not saying that there's no risk but that there's the relevant quantity is the marginal risk. And then the second point is that it is important to think about the whole ecosystem of how the impact happens. So just thinking about um one example that people are worried about is people take um you know a lay person takes a open way model they ask it to strip off the safety and now they can prompt it to get it to uh help you make a you know uh you know boweapon and one thing is that the language model only helps you with designing it right you still have to go manufacture it deploy Okay. And those are additional gates um that uh actually have a bunch of weaknesses that could be shored up. And one can make an argument that uh sometimes gating at the kind of the physical level is easier than trying to do at uh the model level. But um the reason this paper exists is that in 2023 in particular was a year where there was a lot of um angst about you know AI especially in the policy world. This is right after ChachiBG came out December 2022 and then the everyone kind of the world woke up and there was a lot of you know confusion about um the risks um and the benefits and there was a lot of debate over um well should we have these open weight models at all? Meta at that time was putting out open weight models and um you know I think this framework allows us to lend some clarity into how to think about the benefits and the risks. Okay. So to summarize um you know hopefully I've convinced you that technologies should care about societal impact. After all we are people and you know this is the and as people we should care about the world that we um inhabit. Um but it's hard um AI is dual use. there's a lot of uncertainty as especially since you know AI is in uh growing in capabilities so quickly um which makes it hard to reason about what the exact impact will be that doesn't mean we don't try I think we can do a lot of things here um I think we talked about how you can you know focus on beneficial applications you can try to deter misuse and prevent accidents from happening Hopefully we've talked through some examples so at least you have the language and awareness of some of the issues that come up. Um I think one general comment is that we need to think about the whole ecosystem not the model. So as technologists we often fixated on okay the data shows up at the front door we figure out what the algorithm is how do you train it how you evaluate it and then that's it right but I think in order to meaningfully talk about societal impact you really have to understand both ends of the both input upstream and output downstream. when we talked about inequality um that motivated me to monitor multiple metrics um same with uh you know alignment um because of reward hacking um we can't there's no one metric that is the

### [1:10:00](https://www.youtube.com/watch?v=071zJXhvNfM&t=4200s) Segment 15 (70:00 - 72:00)

perfect metric um there might not be even one voice um here and we might as humans we might be not kind of powerful enough to even write down what the thing is that suggests a greater need to kind of have a much more comprehensive suite of metrics to monitor it because that is what will give us kind of robustness and in an understanding of what our systems are doing and then argue that openness and transparency is a basic you know foundation. you know think about kind of a um you know kind of freedom in a kind of a political system that I think of that as kind of analog uh here which it's like uh without it I think it's very hard to uh make some of these basic um you know improvements and then finally the idea of auditing as a powerful tool so on a technical level it's just evaluation right it's just prompting a model with some things and you get about some other things or you know evaluation but really I think from a kind of a social perspective it's uh it's auditing it's a bu raising awareness of certain issues that happen in the world and that in instigates a you know the desire and hopefully the will to actually address some of these issues. Okay. So that is uh all for today. Um next time we'll take a deeper look at the players in the AI ecosystem. So Richishi is going to give a guest lecture on AI supply chains which uh expands on this idea of paying to attention to the entire ecosystem rather than the individual model. Hopefully that will give you even a greater appreciation for how to think about AI's role um in society.

---
*Источник: https://ekstraktznaniy.ru/video/20903*