
Two authors have launched a class action lawsuit against Apple, accusing the company of stealing books and using their content to train its AI models.
“Apple is building Apple Intelligence using Books3, a dataset of pirated copyrighted books that includes the published works of Plaintiffs and the Class,” the lawsuit explains. “Apple used Books3 to train its OpenELM language models. Apple also likely trained its Foundation Language Models using this same pirated dataset.”
According to the suit, Apple’s use of stolen content is both “deliberate and commercially significant” because the company hopes that Apple Intelligence will “add trillions [of dollars] to its market capitalization in coming years.” But Apple has paid for only some of the content it uses to train its models. It “entered into a multimillion-dollar licensing agreement with Shutterstock but not with Plaintiffs or the Class.”
Like other AI models, Apple’s Foundation Language Models depend on high-quality content for training. So Apple has used an in-house software program called Applebot to scrape content from “mass quantities of webpages for nearly nine years before disclosing that it intended to train its AI systems on this scraped data.” The suit notes that scrapers like Applebot also reach “shadow libraries” containing millions of other unlicensed copyrighted books. That includes content created by the authors who launched this suit.
“Plaintiffs and the Class are authors who have registered copyrights for their published works,” the suit continues. “They did not consent to the use of their works in any Apple Intelligence model, including the Foundation Intelligence Models and OpenELM language models. Apple did not compensate creators for use of their copyrighted works and [it] concealed the sources of their training datasets to evade legal scrutiny.”
The suit claims a “class period” of three years before the date of the filing and seeks damages and injunctive relief. The proposed class is a body of content creators so large and geographically diverse that identifying them is “impractical.” The suit says that only Apple can properly identify every author whose work was stolen by the company.
“Plaintiffs demand a jury trial for all claims,” the filing concludes.