Recent rulings by two U.S. District Judges offer a nuanced perspective on the legality of training Artificial Intelligence (AI) models on copyrighted books.
Judge Alsup’s Ruling in Bartz v. Anthropic
- Training on Copyrighted Works as “Fair Use” (Generally): Judge Alsup ruled that training LLMs on copyrighted books generally constitutes “fair use” under U.S. copyright law. He deemed the process “quintessentially transformative,” comparing it to a human learning to write by reading books, rather than merely reproducing them. This is a significant win for AI developers, as it validates the core process of training models on vast datasets. The judge emphasized that the AI’s purpose is not to replicate the original works but to learn statistical relationships and generate new, different text.
- The “Piracy Twist” – Using Pirated Materials is NOT Fair Use: This is where the divergence and critical caveat lie. While training itself can be fair use, Judge Alsup drew a hard line regarding the sourcing of the training data. He explicitly stated that Anthropic’s creation of a “central library” of over 7 million pirated books, downloaded from “shadow libraries” (pirate sites), was copyright infringement and not protected by fair use. The judge noted that “piracy was the point: To build a central library that one could have paid for… but without paying for it.” This part of the ruling means Anthropic will proceed to trial in December to determine damages for this specific infringement related to its pirated library.
- Implication: This creates a crucial distinction: how the data is acquired matters immensely. AI companies cannot claim fair use as a shield if their underlying training data was obtained illegally. This could open the door for a wave of lawsuits focusing on the source of the data rather than solely on the “transformative” nature of the AI output.
Other Related Rulings and Divergent Approaches
While it did not turn directly on pirated material, another recent ruling, in a case brought by authors against Meta Platforms and decided by Judge Vince Chhabria, showcased a different judicial approach, which indirectly highlights the complexities of AI copyright litigation:
- Focus on “Market Harm”: In the Meta case, Judge Chhabria appeared more focused on whether the plaintiffs could prove “market harm” – that Meta’s use of their books directly damaged their ability to sell their works. Despite Meta also allegedly using pirated materials, the judge seemed less interested in that specific angle, emphasizing the need for concrete financial damages to win a copyright case against an AI company.
- Implication: This suggests that proving infringement for AI training might depend heavily on demonstrating quantifiable financial impact on the copyrighted work’s market, which can be challenging. It indicates that different judges may prioritize different factors within the four-factor fair use analysis.
Overall Implications
- Mixed Signals, but Crucial Clarification: The rulings are a “mixed bag,” offering wins to both sides. AI companies can celebrate the general affirmation of “fair use” for the training process itself, a foundational principle for their technology. However, copyright holders now have a powerful tool to challenge AI companies that have relied on illegally sourced data.
- Emphasis on Data Sourcing Due Diligence: AI companies are now on clear notice that they must rigorously vet their training datasets. Relying on “pirate sites” or “shadow libraries” carries significant legal risk, regardless of whether the eventual AI output is deemed transformative.
- Long Legal Battle Ahead: These are just the initial significant rulings. Many other high-profile cases are ongoing (e.g., The New York Times v. OpenAI, Getty Images v. Stability AI), and appeals are highly likely. The legal landscape for AI and copyright is still nascent and will continue to evolve as more cases are decided and potentially reach higher courts.
- Potential for Licensing Frameworks: The ongoing litigation could push the industry towards establishing robust licensing frameworks, similar to those in the music industry, to compensate creators whose works are used for AI training. This is seen by some as a more sustainable path forward than endless litigation.
In essence, while courts acknowledge the transformative nature of AI training, they are drawing clear boundaries around illegal data acquisition, emphasizing that the end does not justify the means when it comes to copyright law.
Source: Bar and Beach