Four Emerging Principles on Fair Use and Training Artificial Intelligence
By Paul Roberts, modernpatentlaw.com
Across landmark decisions in Thomson Reuters v. Ross Intelligence, Bartz v. Anthropic, and Kadrey v. Meta, federal courts are converging on a coherent framework for evaluating AI training under copyright law. These cases establish four foundational principles that collectively define the boundaries of lawful machine learning in 2025.
This framework represents the judiciary's pragmatic response to transformative technology—adapting century-old fair use doctrine to accommodate analytical uses that bear little resemblance to traditional copying. The principles address transformation, intermediate copying, market harm, and the diminished role of creative nature when use is non-expressive.
Principle 1: Transformation Is Function-Specific
What the Copying Does
Courts now focus on the purpose and function of the use rather than merely the amount copied or technical means employed.
Analytical Repurposing
When AI training extracts linguistic patterns for analysis rather than reproducing expression for consumption, the use qualifies as transformative.
Learning vs. Competing
The critical distinction: learning from copyrighted works differs fundamentally from competing with them in their native markets.
"The use is for an entirely new and different function"—Judge Chhabria's formulation captures how function-driven analysis reshapes transformation inquiry under the first fair use factor.
This principle effectively inverts traditional analysis. Volume of copying becomes secondary when the purpose of that copying serves a fundamentally different function than the original work's expressive or entertainment value. Courts assess what the AI system does with the training data, not merely what it takes.
Principle 2: Intermediate Copying Is Permissible
Doctrinal Foundation
Building on precedents from Google v. Oracle, Sony v. Connectix, and Sega v. Accolade, courts recognize that complete copying can be lawful when technologically necessary for a transformative purpose.
The key inquiry shifts from whether entire works were copied to whether the AI system subsequently exposes that protected expression to end users.
Application to AI Training
Both Bartz and Kadrey confirm that ingesting complete copyrighted works to extract linguistic structure—without reproducing those works in outputs—falls within established intermediate copying doctrine.
What matters is not the training corpus itself, but whether the resulting model functions as a substitute for the original copyrighted materials or merely embodies learned patterns.
This principle eliminates per se liability for full-text ingestion during training. Courts treat the training phase as intermediate copying—a necessary step in creating a transformative analytical tool. Liability attaches only when outputs impermissibly reconstruct or substitute for the training materials.
Principle 3: Market Harm Requires Evidence
Speculation Insufficient
Plaintiffs must prove actual market substitution or demonstrate a functioning licensing market—hypothetical harm fails as a matter of law.
Ross: Proven Harm
Thomson Reuters prevailed because Westlaw's established legal research market was directly affected by a competing AI product serving identical functions.
Bartz & Kadrey: No Evidence
Defendants prevailed on summary judgment as to training because plaintiffs provided no empirical proof of lost book sales, displaced markets, or existing AI training licenses.
Critical Evidentiary Standard: Courts explicitly reject protecting hypothetical licensing markets that plaintiffs propose during litigation. The fourth fair use factor requires concrete evidence of economic harm to established markets or derivative works that creators would traditionally license.
This evidentiary burden significantly favors AI developers in cases involving general-purpose language models. Unless plaintiffs can demonstrate that the AI system serves as a market substitute—reducing demand for the original works themselves—fair use prevails on the fourth factor despite commercial training purposes.
Principle 4: Creative Nature Carries Reduced Weight
Traditionally, the highly creative nature of copyrighted works weighs against fair use under the second factor. However, this calculus changes dramatically when copying serves a non-expressive analytical function rather than an expressive or entertainment purpose.
Formal Recognition
Both Bartz and Kadrey acknowledge that using creative literary works formally weighs against defendants on the nature-of-work factor.
Minimal Practical Weight
Yet judges explicitly describe this factor as carrying minimal practical significance when the use transforms expression into analytical training data.
Data Points, Not Expression
Courts characterize creative works in this context as "data points in a mathematical space"—their creativity matters less when the AI extracts patterns rather than reproducing expression.
This represents a fundamental doctrinal shift. Creativity of the source material, while still technically relevant, loses its traditional weight when the defendant's use is analytical rather than expressive. The function of the copying—not the nature of what was copied—becomes dispositive.
The Emerging Safe Harbor
Pragmatic Boundary
Together, these four principles define a functional safe harbor for AI training under fair use doctrine. The boundary turns on two determinative questions: Does the AI system compete with the copyrighted product's core market? Does the use transform expression into non-expressive analytical patterns?
The Decisive Test
Fails Fair Use: Training that creates products competing directly with copyrighted works in their native markets
Passes Fair Use: Training that extracts patterns without market substitution or verbatim reproduction
Transformative analytical use—absent concrete proof of market harm—constitutes the new judicial safe harbor. AI developers operating within these boundaries can proceed with confidence that intermediate copying for pattern extraction, even of complete copyrighted works, falls within the scope of lawful fair use under current precedent.
Safe Harbor Applied: Spectrum of Risk
1. Ross Intelligence
Commercial Substitution
AI legal research tool directly competing with Westlaw's established market for case law analysis and retrieval: clear infringement.
2. Middle Ground
Context-Dependent
Intermediate copying and full-text ingestion tolerated when outputs don't expose originals, even in commercial contexts.
3. Bartz & Kadrey
Transformative Learning
General-purpose language models learning patterns without reproducing specific works: protected fair use.
Function and market impact now define liability risk. The spectrum runs from direct product substitution (infringement) to pattern extraction for general analytical tools (fair use). AI developers must assess where their specific use case falls along this continuum, with market competition serving as the most reliable predictor of judicial outcome.
Notably, factors that once seemed decisive—commercial purpose, complete copying, creative nature of works—prove largely irrelevant when the use is truly transformative and non-substitutive. The safe harbor accommodates even aggressive training practices provided outputs don't reconstruct the copyrighted inputs.
Practical Implications for AI Developers
1. Document Training Purpose
Maintain detailed records demonstrating that training serves analytical pattern extraction rather than expressive reproduction. Prove non-expressive use through system design documentation and technical specifications.
2. Retain Output Logs
Preserve evidence showing that the AI system does not generate verbatim or substantially similar reproductions of training materials. Automated testing and content filtering logs become critical evidence.
3. Audit Market Overlap
Conduct regular assessments of whether AI products compete with copyrighted materials in their primary markets. Identify potential substitution effects before litigation forces the analysis.
4. Implement Transparency Protocols
Evidence and transparency constitute the best defense under emerging case law. Proactive documentation of training practices, data sources, and output controls positions developers favorably should disputes arise.
Best Practice: Treat fair use as an evidentiary exercise. The developer who can conclusively demonstrate transformative analytical use and absence of market substitution will prevail under the framework established by Bartz and Kadrey.
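As one illustration of the output-log practice above, the following is a minimal sketch of an automated verbatim-overlap check. It is an assumption-laden example, not a method drawn from any of the decisions: the function names (word_ngrams, overlap_ratio, audit_record), the 8-word window, and the 0.2 flagging threshold are all hypothetical choices, and word-level n-gram matching catches only near-verbatim reproduction, not the broader legal notion of substantial similarity.

```python
import json
from datetime import datetime, timezone


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(output: str, source: str, n: int = 8) -> float:
    """Fraction of the output's n-grams that also appear verbatim in source."""
    out_grams = word_ngrams(output, n)
    if not out_grams:
        return 0.0
    return len(out_grams & word_ngrams(source, n)) / len(out_grams)


def audit_record(output_id: str, output: str, sources: dict[str, str],
                 n: int = 8, threshold: float = 0.2) -> dict:
    """Build one log entry recording the worst-case overlap for an output.

    The threshold is an illustrative assumption, not a legal standard.
    """
    scores = {sid: overlap_ratio(output, text, n) for sid, text in sources.items()}
    worst_id = max(scores, key=scores.get) if scores else None
    worst = scores[worst_id] if worst_id is not None else 0.0
    return {
        "output_id": output_id,
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "ngram_size": n,
        "max_overlap": round(worst, 4),
        "closest_source": worst_id,
        "flagged": worst >= threshold,
    }


if __name__ == "__main__":
    # Hypothetical sample data; real audits would read from the training corpus.
    sources = {"book-A": "the quick brown fox jumps over the lazy dog near the river bank today"}
    record = audit_record("out-1", "a fast fox leaps above a sleepy hound", sources, n=4)
    print(json.dumps(record))  # one JSON line per checked output
```

Appending each record as a JSON line to a write-once log would give a dated, reproducible trail of the kind the evidentiary framing above calls for, though what a court would actually accept as proof remains a question for counsel.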
Policy and Legislative Outlook
Judicial Adaptation
Courts are successfully adapting 17 U.S.C. § 107 without requiring legislative amendment. The fair use factors prove sufficiently flexible to accommodate AI training when judges focus on function and market impact rather than mechanical application of traditional doctrine.
This judicial approach creates precedent faster than legislation could be drafted and enacted, providing immediate guidance to the AI industry while preserving flexibility for future technological developments.
Congressional Prospects
Congress may eventually codify distinctions between AI training (pattern extraction) and generation (output production), potentially creating statutory safe harbors or licensing frameworks. However, absent legislative action, the judiciary is effectively leading AI copyright governance through case-by-case adjudication.
Likely regulatory focus areas include documentation requirements, data-provenance rules, and mandatory licensing frameworks for certain commercial AI applications—particularly those serving markets adjacent to copyrighted works.
The 2025 decisions serve as a judicial prototype for AI copyright governance. The principles emerging from Ross, Bartz, and Kadrey will guide both future litigation and eventual legislative reforms. AI developers and publishers should treat these holdings as the operative framework until Congress provides additional clarity through statute.
Closing Takeaways
Purpose Over Volume
Courts care about why copying occurs far more than how much is copied. Transformative analytical purpose trumps traditional quantity concerns.
Evidence Is Essential
Fair use determinations increasingly turn on concrete proof rather than legal abstractions. Market harm requires empirical demonstration; speculation fails as a matter of law.
Function Determines Outcome
The analytical versus expressive distinction drives results across all four fair use factors. Non-expressive pattern extraction receives maximum protection.
Courts are building a judicial safe harbor for AI learning—one that protects transformative analytical uses while preserving copyright's core incentive function. AI developers who understand and operate within this framework can train large language models with confidence, while publishers gain clarity on when their works receive protection against market substitution.
Thank you. I'm happy to address questions on how these cases reshape copyright strategy for AI developers and publishers navigating this evolving landscape.