- Judge grants certification for books in two ‘shadow’ libraries
- Authors denied certification for Books3 set and scanned copies
A federal judge certified a class of authors whose books Anthropic PBC is accused of infringing by downloading them from pirate websites as it built repositories for its AI model.
Judge William Alsup granted certification to authors whose books are in two pirate libraries, LibGen and PiLiMi, from which Anthropic has been accused of illegally downloading material, according to an order filed Thursday in the US District Court for the Northern District of California. Authors whose books are in another dataset, Books3, weren’t certified. The ruling marks the first certification order among a slew of class actions filed against AI companies over the past few years.
Although Anthropic argued it would need “individualized” investigations to identify which books were copied or whether any downloaded files were empty, Alsup said a jury could work with a list based on a LibGen-provided catalog containing specific titles and authors. PiLiMi similarly has a catalog of bibliographic metadata unique to its dataset, Alsup wrote.
Alsup told the plaintiffs to omit any empty files downloaded by Anthropic from the final list of books they compile to present to jurors.
The class certification order defines which books Anthropic could be held liable for infringing as the pared-down copyright case proceeds to trial. While Alsup ruled in June that training on copyrighted works was fair use, he said it was unlikely that Anthropic could assert that defense for the data it downloaded from illicit websites to create a “research library.”
Anthropic is appealing the latter part of the ruling, seeking to resolve a conflict in approaches to fair use after another California federal judge’s decision in a similar copyright case against Meta didn’t focus on the download issue.
An Anthropic spokesperson said in an emailed statement about the Thursday order that the court “failed to properly account for the significant challenges and inefficiencies of having to establish valid ownership millions of times over in a single lawsuit.” The company is “exploring all avenues for review.”
The authors’ counsel declined to comment.
While the class certification decisions are a procedural update, the “stakes are enormous,” said Adam Eisgrau, a director at tech trade group Chamber of Progress, who said an “extremely high” damages award could lead to financial ruin for the company.
“We potentially could have hundreds of thousands—maybe even millions—of plaintiffs,” Eisgrau said.
“The scope of the damages underscores the magnitude of the importance of getting this fair use analysis right—not just for Anthropic, but for the fate of innovation in our economy in the future,” he added.
While two of the online datasets passed muster, Alsup denied certification for authors whose books are in the shadow library Books3. Although there is “no question” Anthropic downloaded 196,640 records from this database, the only identifiable author information is in the file names downloaded, which isn’t sufficient, Alsup wrote. Working from file names can be confusing or redundant if they contain incomplete information or don’t specify the edition of the book, he noted.
Book content in the Books3 dataset was also “spotty,” Alsup said, with some 7% of the files found empty.
Anthropic abandoned the use of pirated copies for LLM training and eventually switched to purchasing books and scanning them to create digital copies. But the authors of those books were denied certification in Alsup’s Thursday order.
He said his fair use ruling “peters out” any path to recovery for the plaintiffs on the scanned books. While there is evidence that the copying was done for other uses beyond LLM training, he said, that evidence “has not been presented with the requisite specificity and uniformity to warrant class certification.”
Ed Lee, a Santa Clara Law professor, said in a blog post that the order signifies a “potential for business-ending liability.”
“Even if there are only 100,000 works in the class, the total damages can easily hit $1 billion to $3 billion under the standard range,” Lee wrote.
Lieff Cabraser Heimann & Bernstein LLP, Cowan Debaets Abrahams & Sheppard LLP and Susman Godfrey LLP represent the class plaintiffs. Arnold & Porter Kaye Scholer LLP, Cooley LLP, Latham & Watkins LLP, and Lex Lumina LLP represent Anthropic.
The case is: Bartz v. Anthropic PBC, N.D. Cal., Docket No. 3:24-cv-05417, Order issued 7/17/25