Publishers and record labels waging a copyright battle against training AI models on their content are turning to a new target: the pirate sites that allegedly supplied tech firms with the bulk of their training data.
Two lawsuits in New York federal court aim to hold the pirate library Anna’s Archive responsible for a “staggering” scale of copyright infringement that has taken on added importance with the growth of generative artificial intelligence. The Association of American Publishers’ suit calls out Anna’s Archive for soliciting AI developers to purchase high-speed access to its content and says its sale of pirated works undercuts the market for legitimate licenses.
Pirate libraries are increasingly at the center of the copyright battle over AI training. Writers, artists, and movie studios suing AI companies are more often including claims over how firms acquired their training data. The lawsuits against Anna’s Archive buttress those efforts, said Justin Nelson, a Susman Godfrey attorney on the team that secured a $1.5 billion settlement for authors in a class action accusing Anthropic PBC of illegal downloads.
“It’s complementary to what we’re doing—they’re two sides of the same coin,” Nelson said. “This is trying to choke off the supply and we are trying to impose consequences on the demand.”
While the lawsuits represent a new frontier in AI-copyright litigation, they are unlikely to produce direct results for copyright owners because pirate sites are too difficult to permanently shut down. When pirate sites hosted on servers in the US are blocked, they inevitably reappear under new domain names in foreign countries where the US lacks jurisdiction, turning the effort into a game of whack-a-mole, according to Princeton University assistant professor Peter Henderson.
The lawsuits are “a tool by which they can create friction, but unless parties travel abroad, they won’t really face repercussions,” Henderson said.
Anna’s Archive
Anna’s Archive is one of the “largest and most brazen” websites known as shadow libraries, according to authors who have sued AI firms over their acquisition of books from the pirate site. The library, named in the Office of the US Trade Representative’s annual review of notorious online pirate markets, hosts more than 63 million books and 95 million papers, according to its website.
It’s also gone after music—in December, the library said it had scraped roughly 86 million music files from Spotify, downloading a total of 300 terabytes of data. That triggered a lawsuit from record labels and the streaming platform, and the plaintiffs secured a preliminary injunction suspending several Anna’s Archive websites in January.
But by the time the AAP filed its lawsuit, it said the site had resurfaced with several new domain names that continued to host pirated content.
“We don’t have any illusions that it’s going to completely shut down Anna’s Archive,” AAP president Maria Pallante said of the suit. “It’s there partly also as an educational opportunity for us to remind law enforcement and to also remind members of Congress that really do understand that this is a scourge on US intellectual property that the problem is getting worse, not better.”
Bloomberg Law sent a request for comment to a site branded as Anna’s Archive, which includes a copyright claim form and an email address to follow up on any correspondence.
The Problem with Pirates
Online piracy has been a problem since long before the AI boom.
The failure to address the limits of US jurisdiction to block pirate sites “is a part of the big trauma of the internet governance community,” said Rodrigo Balbontin, an associate director at the Information Technology and Innovation Foundation.
In 2011, two bills in Congress proposing to expand US authority to block access to foreign pirate sites generated sharp backlash from the internet community and civil liberties advocates including the ACLU and Electronic Frontier Foundation, who said the measures risked blocking legitimate websites.
“You could argue American rightsholders have it harder than anyone else,” Pallante said, citing the demand for American content and the inability to block foreign-hosted pirate sites.
In the past decade, several countries have successfully adopted site-blocking measures, largely disproving many of the fears that led to the bills’ defeat, according to Balbontin.
“Piracy is on the rise,” Balbontin said. “There is a proven mechanism on how to do this correctly.”
The Endgame
Authors and publishers are “caught between two extreme developments” when it comes to AI, Pallante said. One is AI developers’ “categorical assertion” of fair use for AI training. The second is flagrant piracy by shadow libraries that have become “untouchable.”
When sued, the operators of pirate sites rarely appear in court, often resulting in default judgments and no monetary relief. Pallante said there is “very little hope of any kind of financial award.” Suing Anna’s Archive was necessary because it was important to not let the level of piracy “just go on and get worse,” she said, declining to rule out suing other pirate sites.
While foreign pirate sites are hard to shut down, winning injunctions in the cases against pirate libraries can help rightsholder plaintiffs in suits against AI developers, allowing them to point to rulings against Anna’s Archive to bolster their piracy claims.
But the likely lack of any monetary recovery makes it harder for independent musicians and authors to justify bringing their own suits, leaving this a battle that’s more likely to be waged by larger rightsholders.
Beyond developing a new strategy for AI copyright litigation, Princeton’s Henderson said the lawsuits against Anna’s Archive could be a strategic move given a pending ruling from the US Supreme Court in Cox Communications Inc. v. Sony Music Entertainment, which will determine whether internet service providers are on the hook for users’ repeat music piracy.
“I could see the rightsholders setting themselves up for this,” he said. A win for Sony means “now they can go after ISPs. They can say, ‘Hey, ISPs, we have this default judgment against Anna’s Archive.’”
