
    OpenAI, The New York Times debate copyright infringement of AI tech companies in first trial arguments

    The copyright infringement case between The New York Times and OpenAI had its first federal court hearing on Tuesday.

    A judge listened to arguments from both parties in a motion to dismiss brought by OpenAI and its financial backer Microsoft. The New York Times — as well as The New York Daily News and the Center for Investigative Reporting, which have filed their own lawsuits against OpenAI and Microsoft — claims OpenAI and Microsoft used the publishers’ content to train the large language models powering their generative AI chatbots. In doing so, the publishers claim, the tech companies compete with them by using their content to answer users’ questions, removing the incentive for a user to visit their sites for that information and ultimately hurting their ability to monetize those users through digital advertising and subscriptions.

    OpenAI and Microsoft say what they’re doing is covered by “fair use,” a legal doctrine that allows the use of copyrighted material to make something new that doesn’t compete with the original work.

    The outcome of this lawsuit has large implications for the entire digital media ecosystem, and will determine the legality of generative AI tools using publishers’ copyrighted work without their consent for training.

    Here were the main arguments during the hearing:

    The New York Times’ argument

    Using copyrighted content

    OpenAI is using The New York Times’ content to train its large language models, sometimes by making copies of that content, the plaintiffs claim. Sometimes several paragraphs, or entire articles that are part of that training dataset, are returned in response to a user’s prompt. And in some cases, fresh content the LLM didn’t use for its training (because of a cut-off date) is also regurgitated by the LLM in response to a prompt. Plaintiffs gave examples of outputs containing verbatim language or summaries of The New York Times’ articles without attribution.

    LLMs copy content because they can’t process information like humans

    Humans can read something, understand the underlying information and learn something new, which isn’t considered copying information. But LLMs don’t have the ability to do that since they are machines, meaning the models absorb the “expression” of the facts, not the facts themselves, which should be considered copyright infringement, according to The New York Times’ lawyers.

    Generative AI search is different from a traditional search engine

    Unlike a traditional search engine (where links to the original source are provided and a publisher can monetize that traffic through advertising or subscriptions), a generative search engine provides the answer to a question with sources in the footnotes. The footnotes, The New York Times’ lawyers argue, can contain a variety of sources, which hurts a publisher’s ability to get that user to their site.

    Evading paywalls

    OpenAI has custom GPTs in its store with products that help users remove paywalls. “Users were posting to Reddit forums and social media how they’ve gotten around a paywall using a product called SearchGPT, and in fact OpenAI pulled the product after they were aware products were being used to infringe,” said Ian Crosby, a partner at Susman Godfrey and The New York Times’ lead counsel.

    Time-sensitive content gets stripped without attribution

    The New York Times’ lawyers said content was being used from The Times’ product recommendation site Wirecutter without appropriate attribution, which means Wirecutter lost revenue from people not clicking through to the site and on affiliate links. And that stripped content was sometimes time-sensitive, such as product recommendations around Black Friday. They claim the content should be protected by a “hot news” doctrine, part of copyright law that protects time-sensitive news from being used by competitors. The lawyers argued ChatGPT cited some products as endorsed by Wirecutter when they weren’t, which hurts the brand’s reputation.

    OpenAI and Microsoft’s arguments

    Fair use doctrine 

    Lawyers for OpenAI and Microsoft said their use of the copyrighted materials in question is allowed under the fair use doctrine. AI companies have been staunch proponents of the doctrine, which allows copyrighted material to be used without permission as long as the use serves a different purpose from the original work, is used in non-commercial contexts and isn’t used in a way that would harm the copyright owner.

    Annette Hurst, an attorney representing Microsoft, said LLMs understand language and ideas that can be adapted for “everything from curing cancer to national security.” She added: “The plaintiffs in their own words have alleged that this technology is capable of being commercialized to the tune of billions of dollars without regard to any capability for how.”

    How LLMs work 

    Defense attorneys also disagreed with their plaintiff counterparts when it came to describing how large language models work. For example, OpenAI’s attorney said the company’s LLMs don’t actually store copyrighted content, but just rely on the weights of data derived from the training process.

    “If I say to you, ‘Yesterday all my troubles seemed so,’ we will all think to ourselves ‘far away’ because we have been exposed to that text so many times,” said Joe Gratz, an attorney at Morrison & Foerster who represented OpenAI. “That doesn’t mean you have a copy of that song somewhere in your brain.”

    Statute of limitations 

    Defense lawyers claimed the lawsuit shouldn’t be allowed because of the three-year statute of limitations for copyright infringement cases. However, attorneys for the Times noted it wasn’t possible to know by April 2021 that OpenAI would be using the publishers’ content in ways that would harm it.

    ‘Misleading’ examples

    Lawyers for the Times say they’ve found millions of examples to prove their case. However, OpenAI argued plaintiffs have been misleading with examples of how ChatGPT replicates copyrighted content and with examples of AI-generated content citing the Times in inaccurate answers. Defense lawyers also claim the Times exploited weaknesses in ChatGPT, using prompts that violated OpenAI’s terms to generate the content. (Lawyers also noted OpenAI has sought to address those weaknesses.)

    No proof of harm

    The Times’ claims include OpenAI removing copyright management information (CMI) such as mastheads, author bylines and other identifiable information. However, OpenAI and Microsoft say the plaintiffs haven’t proven how they were harmed by the removal of CMI. They also claim plaintiffs haven’t shown OpenAI and Microsoft willingly infringed on copyrighted works. Plaintiff lawyers countered that past court rulings have recognized copying copyrighted content as infringement on its own, without any need to prove dissemination or economic loss.

    “Their biggest problem is they don’t have a plausible story for how they would be better off if the CMI they say was removed was in fact removed,” Gratz said. “… There is not a way in which the world would be better for them in the ways that they say the world is not good for them if the CMI that they say was removed was never removed.”

    What comes next

    The Times’ lawsuit is just one of many lawsuits facing OpenAI. While OpenAI won a case in November, other ongoing lawsuits include complaints by a group of Canadian news publishers, a group of U.S. newspapers owned by Alden Global Capital, and a class action lawsuit filed by a group of authors. (OpenAI, Perplexity and Microsoft were roped into the ongoing Google search antitrust lawsuit after Google sent subpoenas to all three companies.)

    Other major tech startups and giants have their own legal battles related to AI and copyright. Meta faces a class action lawsuit filed by a group of writers including Sarah Silverman. Perplexity is a defendant in a lawsuit filed in October by News Corp. Google is facing a lawsuit brought against it by the Authors Guild.

    It’s unclear when U.S. Judge Sidney Stein will issue his decision on whether to let the case move forward. Megan Gray, an attorney and founder of GrayMatters Law & Policy, attended the hearing in person and noted Stein seemed to be “in it for the long haul” and unlikely to dismiss it this early.

    “Judge Stein was engaged and curious, remarkable given his age and lack of technical sophistication,” Gray said. “He understood the cases and positions, plus he has a tight rein over his courtroom. He doesn’t normally provide an audio line for the public and the fact that he did so here indicates that he is well familiar with the import of the case and its impact on society.”

    https://digiday.com/?p=565500
