A 1980s copyright case shows how OpenAI could survive its lawsuits.
How copyright lawsuits could kill OpenAI

If you're old enough to remember watching the hit kid's show Animaniacs, you probably remember Napster, too. The peer-to-peer file-sharing site, which made it easy to download music for free in an era before Spotify and Apple Music, took college campuses by storm in the late 1990s. This did not escape the notice of the record companies, and in 2001, a federal court ruled that Napster was liable for copyright infringement. The content producers fought back against the technology platform and won.

But that was 2001: before the iPhone, before YouTube, and before generative AI. This generation's big copyright battle is pitting journalists against artificially intelligent software that has learned from and can regurgitate their reporting.

Late last year, the New York Times sued OpenAI and Microsoft, alleging that the companies are stealing its copyrighted content to train their large language models and then profiting off of it. In a point-by-point rebuttal to the lawsuit's accusations, OpenAI claimed no wrongdoing. Meanwhile, the Senate Judiciary Subcommittee on Privacy, Technology, and Law held a hearing in which news executives implored lawmakers to force AI companies to pay publishers for using their content.

Depending on who you ask, what's at stake is either the future of the news business, the future of copyright law, the future of innovation, or, specifically, the future of OpenAI and other generative AI companies. Or all of the above.

Ideally, Congress would step in to settle the debate, but as James Grimmelmann, a professor of digital and information law at Cornell Law School, told me: "Congress does not like to legislate on copyright unless there's a consensus of most of the players in the room, and there's not anything resembling that consensus right now. So Congress may hold hearings and talk about it, but we're really far from any legislative action."

So which is it?
Advocates of technological innovation would say that AI technology is full of promise and we'd better not stifle that while it's in the early days of development. Media companies would say that even exciting technology companies need to pay when they use copyrighted content, and if we give AI a free pass, journalism as we know it could eventually cease to exist.

The consensus of casual observers and legal experts alike is that this New York Times lawsuit is a big deal. Not only does the Times appear to have a solid case, but OpenAI has a lot to lose, perhaps its very existence.

The case against OpenAI, briefly explained

If you ask ChatGPT a question about, say, the fall of the Berlin Wall, there's a good chance some of the information in the answer has been culled from New York Times articles. That's because the large language model, or LLM, that powers ChatGPT has been trained on over 500 gigabytes of data, including newspaper archives. Generative AI tools only work because this training data helps them know how to effectively respond to prompts. In other words, copyrighted data, in part, is what makes this new technology powerful and what makes OpenAI such a valuable company.

The New York Times claims that OpenAI trained its model with copyrighted Times content and did not pay proper licensing fees. That, the lawsuit says, enables OpenAI to "compete with and closely mimic" the New York Times, perhaps by summing up a news story based on Times reporting or summing up a product recommendation based on Wirecutter reviews. Even worse is what the lawsuit calls "regurgitation," which is when OpenAI spits out text that matches Times articles verbatim. The Times provides 100 examples of such "regurgitation" in the lawsuit.
In its rebuttal, OpenAI said that regurgitation is a "rare bug" that the company is "working to drive to zero." It also claims that the Times "intentionally manipulated prompts" to get this to happen and "cherry-picked their examples from many attempts."

But at the end of the day, the New York Times argues that OpenAI is making money off of Times content and costing the newspaper "billions of dollars in statutory and actual damages." By one estimate, given the millions of articles potentially implicated and the cost per instance of copying, the New York Times might be looking for $450 billion in damages.

OpenAI has a clear solution to this conflict: Pay the copyright owners upfront. The company has already announced licensing deals with organizations like the Associated Press and Axel Springer. OpenAI also claims that it was negotiating a deal with the New York Times right before the newspaper filed its lawsuit.

Just how much OpenAI is willing to pay news outlets is unclear. A January 4 report in the Information said that OpenAI has offered some media firms "as little as between $1 million and $5 million to license their articles for use in training its large language models," which seems like a small amount of money for OpenAI, which is currently aiming for a valuation as high as $100 billion. But the mounting lawsuits, should they go against the company, could be far more expensive than paying heftier licensing fees.

The New York Times is also not the only party suing OpenAI and other tech companies over copyright infringement. A growing list of authors and entertainers have been filing lawsuits since ChatGPT made its splashy debut in the fall of 2022, accusing these companies of copying their works in order to train their models. The copyright holders filing these lawsuits extend well beyond writers, too.
Developers have sued OpenAI and Microsoft for allegedly stealing software code, while Getty Images is embroiled in a lawsuit against Stability AI, the maker of the image-generating model Stable Diffusion, over its copyrighted photos.

"When you're talking about copyright and you get statutory damages," said Corynne McSherry, legal director at the Electronic Frontier Foundation, "if you lose, the downside and the financial risk is massive."

The case for innovation

While it's easy to compare the Times case to the Napster one, the better precedent involves the VCR, according to McSherry. In 1984, a years-long copyright case between Sony and Universal Studios over the practice of using VCRs to record TV shows made it all the way to the United States Supreme Court. The studio alleged that Sony's Betamax video recorders could be used for copyright infringement, while Sony's lawyers argued that taping shows was fair use, which is the doctrine that allows copyrighted material to be reused without permission or payment.

Sony won. The court's decision, which has never been overturned, said that if machines, including the VCR, have non-infringing uses, then the company that makes them can't be held liable if customers use them to infringe upon copyrights.

The entertainment industry was forever changed by this case. The VCR let people watch whatever was broadcast on TV whenever they wanted, and within just a few years, Hollywood studios actually saw their profits grow. The machine got people more excited about watching movies, and they watched more of them, both at home and in theaters.

"If you have to go to copyright owners for permission for technological innovation, you're going to get a lot less innovation," McSherry told Vox.

With that in mind, there's one more copyright lawsuit worth looking at: the Google Books case.
In 2004, Google started scanning books, including copyrighted works, so that "snippets" of their text would show up in search results. It partnered with libraries at places like Harvard, Stanford, and the University of Michigan, as well as magazines, like New York Magazine and Popular Mechanics, that wanted their archives digitized.

Then came the lawsuits, including a 2005 class action suit from the Authors Guild. The authors cried copyright infringement, and Google claimed that making books searchable amounted to fair use. As Judge Denny Chin said in a 2013 decision dismissing the authors' lawsuit, Google Books is transformative because, thanks to the tool, "words in books are being used in a way they have not been used before." It took about a decade, but Google eventually won, and Google Books is now legal.

Like the Sony and Napster cases before it, the Google Books case is ultimately about the battle between new technology platforms and copyright holders. It also raises the question of innovation. Is it possible that giving copyright holders too much power could stifle technological progress?

In that 2013 decision, Judge Chin said Google's technology "advances the progress of the arts and sciences, while maintaining respectful consideration for the rights of authors and other creative individuals, and without adversely impacting the rights of copyright holders." And a 2023 economics study of the effects of Google Books found that "digitization significantly boosts the demand for physical versions" and "allows independent publishers to introduce new editions for existing books, further increasing sales." So consider that another point in favor of giving tech platforms room to innovate.

Few would disagree that technological progress has shaped the media business since the invention of the printing press.
That's basically why the earliest copyright laws were written over 300 years ago: Technology made copying easier, and authors needed some way to protect their intellectual property.

But AI is a bigger leap forward, technologically speaking, than the VCR, Napster, and Google Books combined. We don't know yet, but AI seems destined to transform our understanding of copyright and how content creators get paid for their work. It will take a while, too. A ruling in the New York Times's case against OpenAI will take years, and even then, questions will remain.

"I think generative AI could be as transformational for copyright as the printing press," said Grimmelmann, the Cornell law professor. "But that will probably take a little bit longer to play out."

Adam Clark Estes, senior correspondent

There are too many chatbots. Will OpenAI's new chatbot store finally make AI useful?
Thousands of AI experts are torn about what they've created, new study finds. The very confusing landscape of advanced AI risk, briefly explained.
Are flying cars finally here? The world had "flying cars" in the 1930s. We could be getting them again.
You thought 2023 was a big year for AI? Buckle up. AI will change the world this year. We just don't know how yet.
24 things we think will happen in 2024. From Trump to Tesla, how 2024 will shake out, according to the Future Perfect team.

Support our work
Vox Technology is free for all, thanks in part to financial support from our readers. Will you join them by making a gift today?

Listen to This
Hollywood's secret musicals. The studios promoting Mean Girls, Wonka, and The Color Purple are hiding something from you.

This is cool
Astronomers spotted something perplexing near the beginning of time
Vox Media, 1201 Connecticut Ave. NW, Washington, DC 20036. Copyright © 2024. All rights reserved.