Meta knowingly used pirated materials to train its Llama artificial intelligence models, with the blessing of company chief Mark Zuckerberg, according to an ongoing copyright lawsuit against the company. As TechCrunch reports, the plaintiffs in Kadrey v. Meta have filed court documents that describe the company's use of the LibGen dataset for AI training.
LibGen is commonly described as a “shadow library” that provides access to academic and general-interest books, journals, images, and other materials. A lawyer for the plaintiffs, who include writers Sarah Silverman and Ta-Nehisi Coates, accused Zuckerberg of approving the use of LibGen for training despite concerns raised by company executives and employees, who described it as “a dataset [they] know to be pirated.”
The complaint also states that the company stripped copyright information from LibGen materials before feeding them to Llama. Meta apparently admitted in a document submitted to the court that it had “remove[d] all copyright paragraphs from beginning to end” from scientific journal articles. One of its engineers reportedly even created a script to remove copyright information automatically. The lawyer argued that Meta did this to hide its copyright infringement from the public. Additionally, the lawyer said that Meta admitted to torrenting LibGen material, although its engineers felt uncomfortable doing so from “a [Meta-owned] corporate laptop.”
Silverman, along with other writers, sued Meta and OpenAI for copyright infringement in 2023. They accused the companies of using pirated materials from shadow libraries to train their AI models. The court had previously rejected some of their claims, but the plaintiffs said their amended complaint supported their allegations and addressed the court's earlier reasons for dismissal.