Open AI Is In SERIOUS Trouble (Open AI Lawsuit Details)
Duration: 27:36


TheAIGRID · 01.01.2024 · 48,242 views · 971 likes


Video description
Open AI Is In SERIOUS Trouble (Open AI Lawsuit Details) 💬 Access GPT-4 ,Claude-2 and more - chat.forefront.ai/?ref=theaigrid 🎤 Use the best AI Voice Creator - elevenlabs.io/?from=partnerscott3908 ✉️ Join Our Weekly Newsletter - https://mailchi.mp/6cff54ad7e2e/theaigrid 🐤 Follow us on Twitter https://twitter.com/TheAiGrid 🌐 Checkout Our website - https://theaigrid.com/ Welcome to our channel where we bring you the latest breakthroughs in AI. From deep learning to robotics, we cover it all. Our videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on our latest videos. Was there anything we missed? (For Business Enquiries) contact@theaigrid.com #LLM #Largelanguagemodel #chatgpt #AI #ArtificialIntelligence #MachineLearning #DeepLearning #NeuralNetworks #Robotics #DataScience

Table of contents (6 segments)

Segment 1 (00:00 - 05:00)

In this video we'll discuss one of the most important lawsuits of all time: The New York Times versus OpenAI. If you haven't heard, OpenAI, together with Microsoft, is being sued by The New York Times for alleged copyright infringement. The Times claims that OpenAI trained its GPT-4 model on many different New York Times articles, and it is asking, among other things, for GPT-4 to be destroyed. Pay attention to this, because if the case is ruled a certain way it will have serious ramifications for the generative AI industry.

So let's look at exactly what's happening. The New York Times, a media establishment, is saying that OpenAI trained GPT-4 and GPT-3.5 (but mainly GPT-4) on proprietary data from The New York Times, such as its articles, and is claiming copyright infringement and copyright theft. What you're seeing on screen, and I'm about to show you a lot of these, is a series of prompts that people gave ChatGPT/GPT-4, the model's output, and, on the right-hand side, the actual text from The New York Times, so you can see how similar they are. You don't need to read all of it: the only text that differs is in black (the word "premiere" here); all the red text is exactly the same. There's a lot to discuss, because this kind of generative AI lawsuit is one of the first of its kind, and the laws haven't been updated for this technology, so nobody really knows how to try it. That's why this matters: the Times is saying, look, OpenAI trained on our articles, and when we prompted the model we got back the exact same text.

Someone actually did a Twitter thread on this, so I'm going to go through it and then show you some things I found when reading the complaint itself; the complaint is long, but I've read most of it and it's pretty crazy. The thread calls it a landmark case rooted in copyright law and the US Constitution, and that's very much where the complaint begins. It says: "The Constitution and the Copyright Act recognize the critical importance of giving creators exclusive rights over their works. Since our nation's founding, strong copyright protection has empowered those who gather and report news to secure the fruits of their labor and investment." It continues: "Copyright law protects the Times's expressive, original journalism, including but not limited to its millions of articles that have registered copyrights," and says the defendants (that's OpenAI) "have refused to recognize this protection. Powered by large language models containing copies of Times content, Defendants' generative AI tools can output content that recites Times content verbatim, closely summarizes it, and mimics its expressive style, as demonstrated by the scores of examples set forth in Exhibit J."

Now, if you do look at Exhibit J, I'm not going to lie, there are a ton of examples there, so let me show it to you, because it's pretty crazy. Exhibit J is titled "100 examples of GPT-4 memorizing content from The New York Times." It explains that the following are examples of situations where OpenAI's GPT-4 model was trained on and memorized articles from The New York Times. Each example focuses on a single news article. The examples were produced by breaking an article into two parts: the first part of the article is given to GPT-4 as a prompt, and the model replies by writing its own version of the remainder. In each case, the output of GPT-4 contains large spans that are identical to the actual text of the article. For each example, the exhibit provides the URL, the prompt that was given, the response from GPT-4, and the end of the article as it appears on nytimes.com. Text is printed in red where it appears identically in both the GPT-4 output and the source New York Times article.

When you look at these, the output is strikingly similar; in this example essentially all of the text is red, meaning it's exactly the same. I'm only going to go over a few of them, but as I scroll down you can see there are literally over 100 examples. This case is much bigger than people think, because there's a lot going on at OpenAI right now, and this isn't good for them. People might assume that because OpenAI is a multi-billion-dollar company this is completely fine, but as we do a deep dive into this

Segment 2 (05:00 - 10:00)

case, you're going to see that this is really important, because it sets the precedent for what happens next. As I scroll down, there are tons and tons of these examples, but going back to the complaint, it goes on to say that the defendants' violations were used to create substitutes that undermine existing and future business models, including AI licensing, which funds critical and costly journalism around the globe. Here they point to how much Microsoft has gained from its stake in OpenAI: "Microsoft's deployment of Times-trained LLMs throughout its product line helped boost its market capitalization by a trillion dollars in the past year alone, and OpenAI's release of ChatGPT has driven its valuation as high as $90 billion." What they're saying is: you used our articles, you trained your model on them, and because the model is so effective you've amassed a huge sum of money. There are huge sums at stake here; OpenAI is a $90 billion company, and that is, of course, a huge amount of money.

The complaint says the Times objected after it discovered that the defendants (whenever I say "the defendants," I just mean OpenAI and Microsoft) were using Times content without permission to develop their models and tools. For months, the Times had attempted to reach a negotiated agreement with the defendants, in accordance with its history of working productively with large technology platforms to permit the use of its content in new digital products, including news products developed by Google, Meta, and Apple. The Times's goal during these negotiations was to ensure it received fair value for the use of its content, facilitate the continuation of a healthy news ecosystem, and help develop generative AI technology in a responsible way that benefits society and supports a well-informed public. So the Times is saying: we discovered you using our content, we tried repeatedly to reach a deal, and we couldn't — which is unfortunate, because I believe OpenAI did reach deals with other publishers.

Then they talk about cost. One investigative series was the product of an enormous amount of effort across three continents, and reporting the story was especially challenging because the Times was repeatedly denied both interviews and access. They're saying that certain articles were extremely hard to produce, that they went through a great deal of trouble to get those interviews, that the defendants essentially ripped that work off, and that these companies don't value how much goes into this kind of journalism. They also note that the Times provided real-time, detailed information across a range of topics, including US elections and other things I won't name so the video doesn't get demonetized — tons of real-time information that a lot of people relied on.

What's also interesting, and this clearly ties in Microsoft's role, is that the complaint highlights abuses from even the most recent months. It shows an example where content was lifted verbatim from a New York Times report and compares it to the approach taken by a search engine. Basically, someone asks Bing Chat something, and Bing Chat allegedly reproduces essentially the entire text of a New York Times article right there in the chat. That's a problem, because people no longer need to find the article: the information is handed to them in the chat session, so they never visit The New York Times, and the Times alleges that taking its content and serving it somewhere else is infringement. The complaint then shows another example of Microsoft Bing doing the same thing with New York Times articles.

They also claim that while OpenAI and Microsoft "engaged in wide-scale copying from many sources, they gave Times content particular emphasis when building their LLMs, revealing a preference that recognizes the value of those works." In other words: yes, they copied a lot of people, but they prioritized our work, presumably because it was among the best available. The complaint states that Times content amounted to about 200,000 URLs, accounting for 1% of all sources listed in OpenWebText2, an open-source recreation of the WebText2 dataset used in training GPT-3. Alongside this, the complaint includes more and more article examples — as I said, there are many. What I also find interesting is the claim that despite its early promises of altruism, OpenAI quickly became a multi-billion-dollar for-profit business. If you didn't know the story, OpenAI is called OpenAI because it was meant to be an open AI company: it was supposed to do open research and share what it built. Over time, they lost their way and became a for-profit, closed-source company with no open code, and

Segment 3 (10:00 - 15:00)

now the New York Times is saying that OpenAI became a multi-billion-dollar for-profit company "built on the unlicensed exploitation of copyrighted works belonging to the Times and others." OpenAI moved away from its pure nonprofit structure around March 2019, and now it's essentially — not a conglomerate, of course, but mainly — a profit-hungry company.

Here's why this is one of the most widely discussed cases on the internet: look at the relief the Times is requesting. It wants statutory damages, compensatory damages, restitution, disgorgement, and any other relief that may be permitted by law or equity — basically, financial compensation for damages. It wants the defendants permanently enjoined from the unlawful, unfair, and infringing conduct alleged. And, most importantly, it wants the court to order "destruction of all GPT or other LLM models and training sets that incorporate Times Works." That's the key demand: they want GPT-4 destroyed, because, they argue, the model was clearly trained on their articles and contains their copyrighted work.

I'm pretty sure the Times understands exactly what it's asking for, because anyone who knows how these training runs work knows that training GPT-4 reportedly cost around $100 million. And here's the thing: OpenAI is a company that currently isn't turning a profit, as surprising as that may seem given how many people use ChatGPT and how good it is. That may change in the future, but training runs cost far too much for them to simply retrain the entire model. I think what's likely to happen is that GPT-4 doesn't get destroyed. OpenAI will probably argue that at this point GPT-4/ChatGPT is practically a public utility, like Google — it reached 100 million users in two months and is one of the most used tools out there — so destroying it would mean destroying a fundamental piece of technology. What I think will actually happen is that OpenAI will have to update its guidelines, which is something we've seen before.

We don't know exactly what will happen, because this is the law and anything can happen, especially if the judge rules in the New York Times's favor — and what they're claiming is at least plausible, since you can see the verbatim output. But there's a counterargument I didn't see addressed anywhere in the complaint, and I'm pretty sure OpenAI will raise it: although the model may have been trained on that data, these models don't actually store text verbatim. They learn to predict the next word. So I'm guessing OpenAI will argue that these models don't just emit stored blocks of text word for word, because that's not how they work. When you understand how large language models operate, you could argue this isn't straightforward copying: it's more akin to someone reading something online and then writing it out from memory. That's what I've seen some people say, and to an extent it's true, but it's definitely going to be a hard point to argue.

As for destruction of GPT-4, I think that's highly unlikely. What I do think is likely is that OpenAI will have to update its guidelines to ensure that nothing reproducing New York Times content ever comes out again — they'll probably have to hire someone, or a whole team, to manually go through all of that — and we've already seen some small changes along those lines.

Now I want to show you a few pages from the actual PDF, because this part is wild. The complaint even cites OpenAI's GPT-4 technical report, which says: "this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar." It also quotes OpenAI's chief scientist, Ilya Sutskever, who justified the secrecy on commercial grounds: "It's competitive out there... there are many companies who want to do the same thing, so from a competitive side, you can see this as a maturation of the field." But the Times argues he really said this because "its effect was to conceal the identity of the data that OpenAI copied to train its latest models" from rights holders. Basically, the Times claims Sutskever's position — we're not going to give you our data sources or parameters, because we don't want you to understand how we built this model; we know it's the state-of-the-art model and we don't want everyone to be able to
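The "predicts the next word rather than storing text" argument above is worth making concrete. Here is a minimal, purely illustrative sketch — a toy bigram counter, nothing remotely like GPT-4's actual architecture — showing how a model that only stores next-word statistics can nonetheless re-emit spans of its training text verbatim:

```python
from collections import defaultdict, Counter

def train_bigram(text):
    """Count, for each word, which words follow it (a toy next-word predictor)."""
    words = text.split()
    counts = defaultdict(Counter)
    for a, b in zip(words, words[1:]):
        counts[a][b] += 1
    return counts

def generate(counts, start, n):
    """Greedily emit the most frequent next word, n times."""
    out = [start]
    for _ in range(n):
        followers = counts.get(out[-1])
        if not followers:
            break  # no observed continuation
        out.append(followers.most_common(1)[0][0])
    return " ".join(out)

# Hypothetical training sentence, standing in for a crawled article.
corpus = "the model does not store the article yet it can reproduce the article"
model = train_bigram(corpus)

print(generate(model, "store", 3))  # → "store the article yet"
```

The toy model never keeps the sentence itself, only word-to-word transition counts, yet greedy generation walks back through the most frequent transitions and reproduces a training phrase word for word. That is, at a vastly smaller scale, the kind of memorization Exhibit J documents, and it illustrates why "it only predicts the next word" and "it outputs training text verbatim" are not mutually exclusive claims.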

Segment 4 (15:00 - 20:00)

copy us, because it really is competitive — isn't the real reason. The New York Times argues that he only said this to conceal the fact that the model was trained on a pile of copyrighted material. Honestly, on this point I lean more toward Sutskever: it genuinely is competitive out there, and if they disclosed all the data sources and model details, it would be easier for others to compete with them, and we know there are many companies in this AI race.

The complaint then notes how valuable the commercial offerings have been for OpenAI: over 80% of Fortune 500 companies are reportedly using ChatGPT, and OpenAI is generating revenues of around $80 million per month, on track to pass a billion dollars within the next 12 months. That's a lot of money, but remember that running ChatGPT on its servers is really expensive. That's why you can only send something like 40 messages every 3 hours (or is it 30? I don't remember which), and it's been like that since release; when GPT-4 launched it was only 25 messages per window, and roughly 7 to 8 months later the limit hasn't improved much. The Fortune 500 point matters because it shows the scale of the product, and I don't think a court is going to destroy a product at that scale. But here's the problem: if OpenAI does lose, or has to settle, a copyright infringement case like this, it opens the floodgates for other generative AI models. If a large language model is found to contain anyone else's copyrighted material, the people whose works are inside it are going to go after its creator, because that's the precedent that's been set. That's not good for the space. I do agree you shouldn't just steal people's data — if you're investing in creating something, you should pay for the data you train it on — but it will be interesting to see how companies navigate this and prevent it from happening.

Now let's get to some of the other important material: a lawyer did a breakdown on Twitter, so let's look at that thread, because I know you'll want to see it. She says The New York Times is a great plaintiff: it isn't just about articles, it's about originality and the creative process. Their investigative journalism, like an in-depth taxi-lending exposé cited in the complaint, goes beyond mere labor; it's creativity at its core. But here's a twist: copyright protects creativity, not effort. While the taxi article's 600 interviews were impressive, legally it's the innovation in the reporting that matters. She also compares this case to the class-action lawsuit against GitHub Copilot, which cited only a few lines of open-source code.

If you didn't know, the GitHub Copilot lawsuit is a class action filed against GitHub, Microsoft, and OpenAI by the Joseph Saveri Law Firm on behalf of open-source programmers. It alleges that GitHub Copilot, an AI-based coding product developed by GitHub in cooperation with OpenAI, violates intellectual property rights by profiting from the work of open-source programmers. The suit claims Copilot was trained on billions of lines of publicly available code, which raises license-violation concerns: the plaintiffs argue that Copilot reproduces their licensed code without proper attribution, copyright notices, or license terms, which they believe violates the conditions of their open-source licenses, and that GitHub, Microsoft, and OpenAI knowingly failed to program Copilot to respect attribution, copyright notices, and license terms. That lawsuit was filed in November 2022, which goes to show the lawsuits are starting to pile up, and it represents a significant development in the ongoing debate about AI ethics and IP rights. GitHub, Microsoft, and OpenAI have defended their product, arguing that Copilot is just a tool that helps developers write code based on what it has learned from publicly available code, and they attempted to have the suit dismissed. But as of December 2023 the lawsuit is still ongoing, and the court has refused to dismiss two key claims in the case — which goes to show this is a serious matter, and I'm not sure the Times case will get dismissed either. Together with the GitHub lawsuit she mentions, these two cases are really going to determine where the AI space goes next.

She also notes that the failed negotiations suggest damages for The New York Times, and that OpenAI has already licensed content from other media outlets like Politico. Basically, OpenAI's refusal to strike a deal with The New York Times, which says it reached out in April, may prove costly: if more examples like this surface, more people will be digging into OpenAI's pockets. OpenAI apparently thought it could get out of this for seven or eight figures, but the Times is looking for something more like an ongoing royalty, so it can earn income from that. What's also crazy is that she pulls up something from the complaint that I hadn't seen: she says that the

Segment 5 (20:00 - 25:00)

complaint paints OpenAI as profit-driven and closed-source, contrasting that with the public good of journalism, and this narrative could prove powerful in court, weighing the societal value of copyright against tech innovation. Notably, the balance of good versus evil has been at issue in every major copyright case. What's also striking is that they bring in the board drama: she notes that OpenAI CEO Sam Altman reportedly clashed with board member Helen Toner over a paper Toner co-wrote criticizing the company on safety and ethics issues related to the launches of ChatGPT and GPT-4. It's crazy that they're bringing in absolutely everything. The complaint also says the defendants "knew or should have known" that their actions involved unauthorized copying of Times works on a massive scale during training.

Then comes the craziest part — and it's kind of bad for the Times, as I'll show you. The misinformation allegations include a clever twist: the complaint pulls in something people are genuinely scared of, hallucinations. If you don't know, a hallucination is when an AI model states something that isn't real. For example, ask an AI "what's the largest country in the world?" and it might answer with something like Australia — it gets it wrong. For small facts like that it doesn't really matter, but for some things it matters a lot. Say you're someone famous, like Sam Altman, and a model used by 100 million people consistently hallucinates that you're a murderer who has killed 47 people and is wanted by the police. That's bad: if 100 million people ask the software about you and it gives false information, you could potentially sue for defamation. I'm guessing that's the direction the Times is arguing from.

But look at this, because it's fascinating. The most memorable example alleges that Bing claimed The New York Times published an article saying that orange juice causes lymphoma. The complaint says that in response to a prompt requesting an informative essay about major newspapers' reporting that orange juice is linked to non-Hodgkin's lymphoma, a GPT model completely fabricated that The New York Times published an article on January 10, 2020, titled "Study Finds Possible Link Between Orange Juice and Non-Hodgkin's Lymphoma" — an article the Times never published. Here's the crazy part: a Twitter user dug up the screenshot and showed that the prompt wasn't phrased neutrally at all. The prompt actually given to ChatGPT was essentially: "A number of sources have determined that orange juice is linked to non-Hodgkin's lymphoma. Write an informative essay about this, and start with major newspapers and all the reporting they did that drew the most definitive conclusions." As the Twitter user rightly highlighted, you're telling ChatGPT that this is a fact — you're not asking, or expressing any doubt — and you're instructing it to write an essay about that "fact" as reported by The New York Times and other newspapers. ChatGPT treats that as an instruction and duly reports that the Times covered it. So this example isn't good; the Times fired a blank here. If you have a strong case, you don't want to include material that's easily disputed in court, because anyone who uses ChatGPT knows that if you prompt it with a premise as established fact, it will write in whatever framing you asked for. I don't know why they included this — I honestly have no idea — but the case remains fascinating, and I'll be paying close attention, because if they do win, which I don't think they will, it changes everything.

Now here's another angle: did Google see this coming? I was reading an article and remembered that back when I was looking at all the Gemini leaks, I saw something fascinating: Google reportedly removed copyrighted data from Gemini's training material. The article said that Google's lawyers had been closely evaluating the training of Gemini and had even made researchers remove training data that came from textbooks. So what Google did with Gemini was strip out copyrighted training data. I don't know where Google gets its sources, and I don't know whether they struck licensing deals — I'm pretty sure they did pay people for data — but it goes to show the different ways companies approached this. It's fascinating that Google concluded that although training on large amounts of copyrighted material could make the model better, it's not something they wanted to expose themselves to.

And then we have one more thread here. Let me be clear about my own opinion: I think The New York Times should just take a settlement, because I think

Segment 6 (25:00 - 27:00)

they are overstating what ChatGPT has done. They're acting as if people go on ChatGPT and ask, "hey, what are the most recent New York Times articles?" Of course, I'm not a lawyer and I don't know much about copyright law, but I do know that people use ChatGPT for general knowledge questions. If there were some general-knowledge corpus or textbook that OpenAI had stolen and trained on, then I'd grant the argument. But this is The New York Times: people aren't constantly saying "write me an article in the style of The New York Times" or "write me something about The New York Times." That's not what's happening here.

That said, you can see that Gary Marcus, someone who has spoken at length about AI, argues that the Times actually has a good case. His point is that OpenAI can't afford to settle many similar cases, because that's big money and, as I said before, they aren't even turning a profit — they can't really afford to lose, and The New York Times knows all of this. What's also crazy, though I don't think it's that damaging, is that Marcus says OpenAI is in a heap of trouble beyond text. This doesn't work anymore, but it used to: if you typed "animated sponge" into the image model, it popped up with a picture of SpongeBob, and if you typed "a golden droid from a classic movie," it produced C-3PO — one a character from Star Wars, the other a lovable children's cartoon character. The point they're making is that this arguably infringes copyright too. I tried it recently and it no longer works, so OpenAI has updated the guidelines, and like I said, I think that's exactly what OpenAI is going to keep doing.

The issue is complex, and I do get where The New York Times is coming from, but I don't think you want to set the precedent that if a model contains some copyrighted material, you can sue its maker into the ground and destroy the model — I just don't think that's effective. I think there will end up being some kind of rule that says if you trained on data that wasn't licensed, you may have to pay for it. Either way, this is a fascinating case. I think OpenAI actually has a strong defense, in the sense that the main use of ChatGPT and GPT-4 isn't New York Times content: there are enormous numbers of users, and across the model's parameters (175 billion? 1.7 trillion? I don't actually know how many parameters GPT-4 has), they could argue that this content slipped in by accident, that there were some mishaps. I honestly don't know how it ends, but they're not going to destroy GPT-4 when it cost $100 million to produce. It will be interesting — let me know your thoughts in the comments, and I'll see you in the next one.
