The Ledger A sourced historical record of AI

AI Training on Copyrighted Works: Fair Use or Infringement?

A multi-axis controversy in The AI Copyright Wars thread.

Context

The AI copyright controversy is fundamentally about the allocation of economic value. Generative AI creates value by learning from vast corpora of human-created content — and the question of who is entitled to share in that value has no clear precedent in copyright law. The existing legal framework was designed for copying and distribution, not for the extraction of patterns and knowledge from creative works.

Key Tensions

Transformative use vs. competitive substitution: The strongest fair use arguments apply when the new use is transformative, that is, when it creates something fundamentally different from the original. AI companies argue that models learn patterns rather than memorize works. But when AI outputs compete directly with the original works, as when ChatGPT provides information that users would otherwise get from The New York Times website, that competitive substitution undermines the transformative argument.

The inconsistency problem: AI companies simultaneously argue that training on copyrighted data is fair use (requiring no permission or payment) while signing licensing deals with some publishers (acknowledging that the data has value worth paying for). This inconsistency weakens the fair use argument and suggests that licensing is commercially feasible when the counterparty has sufficient negotiating power.

Enforcement challenges: Even if courts rule that AI training requires permission, enforcing that ruling is practically difficult. Models have already been trained, and it's unclear whether retraining on licensed-only data is technically or economically feasible. This creates a first-mover advantage for companies that trained on everything before legal clarity emerged.

The international dimension: Copyright law varies by jurisdiction. The EU AI Act requires transparency about training data and compliance with EU copyright provisions, including opt-out mechanisms for rights holders. Whether AI companies can practically comply with a patchwork of international copyright requirements remains to be seen.

Status

Actively contested, with no definitive judicial ruling as of early 2026. The practical resolution is evolving through a combination of litigation, voluntary licensing deals, and regulatory requirements, but the fundamental fair use question remains open.

Position A: Training AI models on publicly available content is transformative fair use that benefits society

Medium confidence

Proponents: OpenAI, Google DeepMind, Meta AI, tech industry trade groups, some legal scholars

Sources

  1. OpenAI and Journalism
  2. EFF on AI and Fair Use

Position B: Using copyrighted works to train AI without permission or compensation is infringement at scale

Medium confidence

Proponents: The New York Times Company, The Authors Guild, Getty Images, music industry, visual artists

Sources

  1. NYT v. OpenAI Complaint
  2. Authors Guild class action

Position C: The copyright framework is inadequate — we need new legal structures (compulsory licensing, data dividends, collective bargaining) to address AI's impact on creators

Medium confidence

Proponents: digital rights organizations, labor advocates, some legal reformists

Sources

  1. EFF on AI and Copyright

Position D: AI companies should negotiate voluntary licensing deals with willing rights holders, creating a market-based solution

Medium confidence

Proponents: Associated Press, Axel Springer, some media companies with licensing deals, pragmatic industry voices

Sources

  1. AP-OpenAI licensing agreement

References

  1. OpenAI and Journalism
  2. EFF on AI and Fair Use
  3. NYT v. OpenAI Complaint
  4. Authors Guild class action
  5. EFF on AI and Copyright
  6. AP-OpenAI licensing agreement
