active multi-axis polarity

AI Training on Copyrighted Works: Fair Use or Infringement?

Part of thread: The AI Copyright Wars

Positions

"Training AI models on publicly available content is transformative fair use that benefits society"

medium
Proponents: OpenAI, Google DeepMind, Meta AI, tech industry trade groups, some legal scholars

"Using copyrighted works to train AI without permission or compensation is infringement at scale"

medium
Proponents: The New York Times, Authors Guild, Getty Images, music industry, visual artists

"The copyright framework is inadequate — we need new legal structures (compulsory licensing, data dividends, collective bargaining) to address AI's impact on creators"

medium
Proponents: digital rights organizations, labor advocates, some legal reformists

"AI companies should negotiate voluntary licensing deals with willing rights holders, creating a market-based solution"

medium
Proponents: Associated Press, Axel Springer, some media companies with licensing deals, pragmatic industry voices

Context

The AI copyright controversy is fundamentally about the allocation of economic value. Generative AI creates value by learning from vast corpora of human-created content — and the question of who is entitled to share in that value has no clear precedent in copyright law. The existing legal framework was designed for copying and distribution, not for the extraction of patterns and knowledge from creative works.

Key Tensions

Transformative use vs. competitive substitution: Fair use arguments are strongest when the new use is transformative, meaning it creates something fundamentally different from the original. AI companies argue that models learn patterns rather than memorize works. But when AI outputs compete directly with the original works (for example, when ChatGPT provides information that users would otherwise get from The New York Times website), that competitive substitution undermines the transformative argument.

The inconsistency problem: AI companies simultaneously argue that training on copyrighted data is fair use (requiring no permission or payment) while signing licensing deals with some publishers (acknowledging that the data has value worth paying for). This inconsistency weakens the fair use argument and suggests that licensing is commercially feasible when the counterparty has sufficient negotiating power.

Enforcement challenges: Even if courts rule that AI training requires permission, enforcing that ruling is practically difficult. Models have already been trained, and it's unclear whether retraining on licensed-only data is technically or economically feasible. This creates a first-mover advantage for companies that trained on everything before legal clarity emerged.

The international dimension: Copyright law varies by jurisdiction. The EU AI Act requires providers of general-purpose AI models to be transparent about their training data and to comply with EU copyright provisions, including the opt-out mechanisms that let rights holders reserve their works from text and data mining. Whether AI companies can practically comply with a patchwork of international copyright requirements remains to be seen.

Status

Actively contested, with no definitive judicial ruling as of early 2026. The practical resolution is evolving through a combination of litigation, voluntary licensing deals, and regulatory requirements, but the fundamental fair use question remains open.

Last updated: March 8, 2026