Recent rulings by two U.S. District Judges offer a nuanced perspective on the legality of training Artificial Intelligence (AI) models on copyrighted books.
Judge Alsup’s Ruling in Bartz v. Anthropic
- Training on Copyrighted Works as “Fair Use” (Generally): Judge Alsup ruled that training LLMs on copyrighted books generally constitutes “fair use” under U.S. copyright law. He deemed the process “quintessentially transformative,” comparing it to a human learning to write by reading books, rather than merely reproducing them. This is a significant win for AI developers, as it validates the core process of training models on vast datasets. The judge emphasized that the AI’s purpose is not to replicate the original works but to learn statistical relationships and generate new, different text.
- The “Piracy Twist” – Using Pirated Materials is NOT Fair Use: This is where the divergence and critical caveat lie. While training itself can be fair use, Judge Alsup drew a hard line regarding the sourcing of the training data. He explicitly stated that Anthropic’s creation of a “central library” of over 7 million pirated books, downloaded from “shadow libraries” (pirate sites), was copyright infringement and not protected by fair use. The judge noted that “piracy was the point: To build a central library that one could have paid for… but without paying for it.” This part of the ruling means Anthropic will proceed to trial in December to determine damages for this specific infringement related to its pirated library.
- Implication: This creates a crucial distinction: how the data is acquired matters immensely. AI companies cannot claim fair use as a shield if their underlying training data was obtained illegally. This could open the door for a wave of lawsuits focusing on the source of the data rather than solely on the “transformative” nature of the AI output.
Other Related Rulings and Divergent Approaches
While it did not turn directly on pirated material, another recent ruling, in a case brought by authors against Meta Platforms and decided by Judge Vince Chhabria, showcased a different judicial approach, which indirectly highlights the complexities of AI copyright litigation:
- Focus on “Market Harm”: In the Meta case, Judge Chhabria appeared more focused on whether the plaintiffs could prove “market harm” – that Meta’s use of their books directly damaged their ability to sell their works. Despite Meta also allegedly using pirated materials, the judge seemed less interested in that specific angle, emphasizing the need for concrete financial damages to win a copyright case against an AI company.
- Implication: This suggests that proving infringement for AI training might depend heavily on demonstrating quantifiable financial impact on the copyrighted work’s market, which can be challenging. It indicates that different judges may prioritize different factors within the four-factor fair use analysis.
Overall Implications
- Mixed Signals, but Crucial Clarification: The rulings are a “mixed bag,” offering wins to both sides. AI companies can celebrate the general affirmation of “fair use” for the training process itself, a foundational principle for their technology. However, copyright holders now have a powerful tool to challenge AI companies that have relied on illegally sourced data.
- Emphasis on Data Sourcing Due Diligence: AI companies are now on clear notice that they must rigorously vet their training datasets. Relying on “pirate sites” or “shadow libraries” carries significant legal risk, regardless of whether the eventual AI output is deemed transformative.
- Long Legal Battle Ahead: These are just the initial significant rulings. Many other high-profile cases are ongoing (e.g., The New York Times v. OpenAI, Getty Images v. Stability AI), and appeals are highly likely. The legal landscape for AI and copyright is still nascent and will continue to evolve as more cases are decided and potentially reach higher courts.
- Potential for Licensing Frameworks: The ongoing litigation could push the industry towards establishing robust licensing frameworks, similar to those in the music industry, to compensate creators whose works are used for AI training. This is seen by some as a more sustainable path forward than endless litigation.
In essence, while courts acknowledge the transformative nature of AI training, they are drawing clear boundaries around illegal data acquisition, emphasizing that the end does not justify the means when it comes to copyright law.
Source: Bar and Beach