Meta sued for allegedly training AI with content from pirated books

Meta is one of the companies that have bet heavily on artificial intelligence to stay among the leaders of the tech industry. The firm has its own series of AI models, Llama. Like other companies, Meta trained Llama on datasets containing large amounts of information available on the internet. However, a group of authors is suing Meta for allegedly using pirated books to train its AI models.

Authors including Ta-Nehisi Coates and comedian Sarah Silverman are part of the group, which says Meta used a dataset containing content from pirated books. The plaintiffs also allege that the company’s CEO, Mark Zuckerberg, knew the dataset contained pirated books before he approved its use in training Llama.

Meta deliberately used pirated books to train AI, lawsuit claims

Documents related to the lawsuit were made public in the middle of this week. The case, filed in a California federal court, stems from an earlier suit filed in 2023 and dismissed last year by U.S. District Judge Vince Chhabria. At the time, the authors claimed that Meta AI was able to generate text that infringed their copyrights. The original suit also alleged that Meta removed the copyright management information (CMI) from the content of their books.

The plaintiff group wants the case reopened

However, the plaintiff group claims that new findings warrant reopening the case. They say that they obtained internal Meta communications in which Zuckerberg “approved Meta’s use of the LibGen dataset notwithstanding concerns within Meta’s AI executive team (and others at Meta) that LibGen is ‘a dataset we know to be pirated.’” LibGen, short for Library Genesis, is an online collection of books that was available on the internet for a time. It contained around 32 TB of content focused on books of all kinds, including scientific content.

The plaintiffs told Judge Chhabria that the new findings not only bolster their previous claims but may also support a new computer fraud claim. The judge will allow the plaintiffs to present their new evidence in an amended complaint. However, he also expressed skepticism that the authors' lawsuit could succeed.

2025-01-13 15:07:34
