AI Training and Copyright: A Shifting Legal Landscape
The burgeoning field of artificial intelligence, particularly large language models (LLMs), has been operating under the assumption that the use of copyrighted data for training falls under the “fair use” provision of the Copyright Act. That assumption is now facing legal challenges from publishers and creators, with many of the lawsuits still working their way through the courts. Recent key decisions nonetheless suggest growing legal momentum in favor of AI companies on the core question of whether scraping copyrighted data for training constitutes fair use.
A pivotal ruling came this summer in Bartz v. Anthropic, a case Anthropic now plans to settle. Judge William Alsup determined that Anthropic’s use of digitized books as training data qualified as “fair use” under the Copyright Act. The judge’s reasoning centered on the “transformative” nature of the use: the models weren’t simply replicating the books’ content but were using the text to learn predictive language patterns, the fundamental mechanism behind LLM text generation.
Judge Vince Chhabria reached a comparable conclusion in Kadrey v. Meta, a class action lawsuit brought by authors including Sarah Silverman. He also found Meta’s use of copyrighted books to be transformative, in line with the primary test for fair use. Chhabria cautioned, however, that transformative use alone might not guarantee fair use protection, noting that the impact on a work’s market value could also be a determining factor. His ruling signaled a reluctance to establish a sweeping precedent for future AI training cases.
Despite these nuances, the outcomes of Bartz and Kadrey are influencing industry behavior. According to one media executive, publications are now reluctant to pursue legal action against AI firms for unauthorized content use, fearing costly defeats. This caution stems not only from the rulings themselves but also from other recent federal decisions, such as the Google search monopoly case, and from the broader judicial climate.
The most significant legal test remains the New York Times lawsuit against OpenAI and Microsoft. OpenAI has repeatedly tried to have the case dismissed, so far without success. In May, a judge ordered the preservation of millions of chat logs and transcripts potentially relevant to the case. A victory for the New York Times could substantially alter the legal landscape surrounding fair use in AI training.