policy Major

SDNY Denies Dismissal of AI-Output-Based Copyright Infringement Claims

Summary

Judge Sidney H. Stein denied OpenAI's motion to dismiss output-based copyright infringement claims in MDL 3143, finding at the pleading stage that plaintiffs had plausibly alleged that ChatGPT's summaries of George R.R. Martin novels were substantially similar to the protected expression in those works. The ruling extended the MDL's scope from training-data ingestion to the downstream question of whether AI outputs can themselves infringe.

What Happened

Within MDL 3143 (1:25-md-03143), the Authors Guild and other plaintiff authors alleged not only that OpenAI's training process infringed their copyrights but also that ChatGPT's outputs, specifically detailed plot summaries and character descriptions generated on demand, reproduced protected expression from their novels and so constituted direct infringement. OpenAI moved to dismiss these output-based claims, arguing that language model outputs are generated stochastically and therefore cannot, as a matter of law, be substantially similar to any specific prior work.

Judge Stein rejected this argument at the pleading stage. Reviewing the specific ChatGPT outputs attached to the complaint alongside excerpts from the Martin novels, the court found that plaintiffs had plausibly alleged substantial similarity between the AI-generated summaries and the expressive elements of the underlying books. The court declined to rule as a matter of law that no output of a language model could ever be substantially similar to a training work, holding that the question required factual development through discovery and expert analysis.

On a separate discovery question, the court also denied Microsoft's request for limits on document production, signaling that the MDL would proceed with broad discovery into both companies' training and deployment practices.

Why It Matters

The output-infringement ruling keeps open a second front in AI copyright litigation that the industry had hoped to close at the pleading stage. Even if AI companies ultimately prevail on training-data fair use arguments, they still face potential liability for what their models produce in deployment. The ruling puts pressure on developers to invest in output filtering and similarity detection, and it likely increases the settlement value of the pending MDL. It also sets the stage for what could be the first trial on output-based AI copyright infringement.

Tags

#copyright #litigation #ai-outputs #substantial-similarity #training-data