
    OpenAI, The New York Times debate copyright infringement of AI tech companies in first trial arguments

    The copyright infringement case between The New York Times and OpenAI had its first federal court hearing on Tuesday.

    A judge listened to arguments from both parties in a motion to dismiss brought by OpenAI and its financial backer Microsoft. The New York Times — as well as The New York Daily News and the Center for Investigative Reporting, which have filed their own lawsuits against OpenAI and Microsoft — claims OpenAI and Microsoft used the publishers’ content to train the large language models powering their generative AI chatbots. In doing so, the publishers claim, the tech companies compete with them by using their content to answer users’ questions, removing the incentive for a user to visit their sites for that information and ultimately hurting their ability to monetize those users through digital advertising and subscriptions.

    OpenAI and Microsoft say what they’re doing is covered by “fair use,” a legal doctrine that allows the use of copyrighted material to make something new that doesn’t compete with the original work.

    The outcome of this lawsuit has large implications for the entire digital media ecosystem, and will determine the legality of generative AI tools using publishers’ copyrighted work without their consent for training.

    Here were the main arguments during the hearing:

    The New York Times’ argument

    Using copyrighted content

    OpenAI is using The New York Times’ content to train its large language models, sometimes by making copies of that content, the plaintiffs claim. Sometimes several paragraphs, or entire articles that are part of that training dataset, are returned in response to a user’s prompt. And in some cases, fresh content the LLM didn’t use for its training (because of a cut-off date) is also regurgitated by the LLM in response to a prompt. Plaintiffs gave examples of outputs containing verbatim language or summaries of The New York Times’ articles without attribution.

    LLMs copy content because they can’t process information like humans

    Humans can read something, understand the underlying information and learn something new, which isn’t considered copying information. But LLMs don’t have the ability to do that since they are machines, meaning the models absorb the “expression” of the facts, not the facts themselves, which should be considered copyright infringement, according to The New York Times’ lawyers.

    Generative AI search is different from a traditional search engine

    Unlike a traditional search engine (where links to the original source are provided and a publisher can monetize that traffic through advertising or subscriptions), a generative search engine provides the answer to a question with sources in the footnotes. The footnotes, The New York Times’ lawyers argue, can contain a variety of sources, which hurts a publisher’s ability to get that user to their site.

    Evading paywalls

    OpenAI has custom GPTs in its store with products that help users remove paywalls. “Users were posting to Reddit forums and social media how they’ve gotten around a paywall using a product called SearchGPT, and in fact OpenAI pulled the product after they were aware products were being used to infringe,” said Ian Crosby, a partner at Susman Godfrey and The New York Times’ lead counsel.

    Time-sensitive content gets stripped without attribution

    The New York Times’ lawyers said content was being used from The Times’ product recommendation site Wirecutter without appropriate attribution, which means Wirecutter lost revenue from people not clicking through to the site and on affiliate links. And that stripped content was sometimes time-sensitive, such as product recommendations around Black Friday. They claim the content should be protected by a “hot news” doctrine, part of copyright law that protects time-sensitive news from being used by competitors. The lawyers argued ChatGPT cited some products as endorsed by Wirecutter when they weren’t, which hurts the brand’s reputation.

    OpenAI and Microsoft’s arguments

    Fair use doctrine 

    Lawyers for OpenAI and Microsoft said their use of the copyrighted materials in question is allowed under the fair use doctrine. AI companies have been staunch proponents of the doctrine, which allows copyrighted material to be used without permission as long as the use serves a different purpose from the original work, is used in non-commercial contexts and isn’t used in a way that would harm the copyright owner.

    Annette Hurst, an attorney representing Microsoft, said LLMs understand language and ideas that can be adapted for “everything from curing cancer to national security.” She added: “The plaintiffs in their own words have alleged that this technology is capable of being commercialized to the tune of billions of dollars without regard to any capability for how.”

    How LLMs work 

    Defense attorneys also disagreed with their plaintiff counterparts when it came to describing how large language models work. For example, OpenAI’s attorney said the company’s LLMs don’t actually store copyrighted content, but just rely on the weights of data derived from the training process.

    “If I say to you, ‘Yesterday all my troubles seemed so,’ we will all think to ourselves ‘far away’ because we have been exposed to that text so many times,” said Joe Gratz, an attorney at Morrison & Foerster who represented OpenAI. “That doesn’t mean you have a copy of that song somewhere in your brain.”

    Statute of limitations 

    Defense lawyers claimed the lawsuit shouldn’t be allowed because of the three-year statute of limitations for copyright infringement cases. However, attorneys for the Times noted it wasn’t possible to know by April 2021 that OpenAI would be using the publishers’ content in ways that would harm it.

    ‘Misleading’ examples

    Lawyers for the Times say they’ve found millions of examples to prove their case. However, OpenAI argued plaintiffs have been misleading with examples of how ChatGPT replicates copyrighted content and with examples of AI-generated content citing the Times in inaccurate answers. Defense lawyers also claim the Times exploited weaknesses in ChatGPT, using prompts that violated OpenAI’s terms to generate the content. (Lawyers also noted OpenAI has sought to address those weaknesses.)

    No proof of harm

    The Times’ claims include OpenAI removing copyright management information (CMI) such as mastheads, author bylines and other identifiable information. However, OpenAI and Microsoft say the plaintiffs haven’t proven how they were harmed by the removal of CMI. They also claim plaintiffs haven’t shown OpenAI and Microsoft willingly infringed on copyrighted works. Plaintiff lawyers countered that past court rulings have recognized copying copyrighted content as infringement on its own, without any need to prove dissemination or economic loss.

    “Their biggest problem is they don’t have a plausible story for how they would be better off if the CMI they say was removed was in fact removed,” Gratz said. “… There is not a way in which the world would be better for them in the ways that they say the world is not good for them if the CMI that they say was removed was never removed.”

    What comes next

    The Times’ lawsuit is just one of many lawsuits facing OpenAI. While OpenAI won a case in November, other ongoing lawsuits include complaints by a group of Canadian news publishers, a group of U.S. newspapers owned by Alden Global Capital, and a class action lawsuit filed by a group of authors. (OpenAI, Perplexity and Microsoft were roped into the ongoing Google search antitrust lawsuit after Google sent subpoenas to all three companies.)

    Other major tech startups and giants have their own legal battles related to AI and copyright. Meta faces a class action lawsuit filed by a group of writers including Sarah Silverman. Perplexity is a defendant in a lawsuit filed in October by News Corp. Google is facing a lawsuit brought against it by the Authors Guild.

    It’s unclear when U.S. Judge Sidney Stein will issue his decision on whether to let the case move forward. Megan Gray, an attorney and founder of GrayMatters Law & Policy, attended the hearing in person and noted Stein seemed to be “in it for the long haul” and unlikely to dismiss it this early.

    “Judge Stein was engaged and curious, remarkable given his age and lack of technical sophistication,” Gray said. “He understood the cases and positions, plus he has a tight rein over his courtroom. He doesn’t normally provide an audio line for the public and the fact that he did so here indicates that he is well familiar with the import of the case and its impact on society.”

    https://digiday.com/?p=565500
