Meta secretly trained its AI on a notorious piracy database, newly unredacted court documents reveal


“Meta has treated the so-called 'public availability' of shadow datasets as a get-out-of-jail-free card, even though Meta's internal records show that every decision maker involved at Meta, up to and including its CEO, Mark Zuckerberg, knew LibGen was 'a data set that we knew was copyright infringing,'” the plaintiffs allege in this motion. (First filed in late 2024, this motion is the third request to file an amended complaint.)

In addition to the plaintiffs' brief, another filing was unredacted at Chhabria's order: Meta's opposition to the motion to file an amended complaint. It argued that the authors' attempt to add claims to the case was an eleventh-hour maneuver built on “a false and inflammatory premise,” and denied that Meta had waited to reveal important information during discovery. Instead, Meta argues that it first disclosed to the plaintiffs that it used the LibGen data set in July 2024. (Because much of the discovery material remains confidential, it is difficult for WIRED to confirm that statement.)

Meta's argument hinges on its claim that the plaintiffs already knew about the use of LibGen and should not be given additional time to file a third amended complaint when they had sufficient time to do so before discovery concluded in December 2024. “Plaintiffs have known about Meta's downloads and use of LibGen and other 'shadow libraries' since at least mid-July 2024,” the tech giant's lawyers argue.

In November 2023, Chhabria granted Meta's motion to dismiss several of the lawsuit's claims, including the claim that Meta's alleged use of the authors' work to train AI violated the Digital Millennium Copyright Act, a United States law enacted in 1998 to prevent people from selling or copying copyrighted works on the internet. At the time, the judge agreed with Meta's position that the plaintiffs had not provided enough evidence to show that the company removed what is known as “copyright management information” (CMI), such as author names and work titles.

The unredacted documents argue that the plaintiffs should be allowed to amend their complaint, contending that the information Meta disclosed is evidence that the DMCA claim has merit. They also say that discovery uncovered reasons to add new allegations. “Meta, through a company representative who testified on November 20, 2024, has now admitted under oath to uploading (aka 'seeding') copyright-infringing files containing Plaintiffs' works on 'torrent' sites,” the motion alleges. (Seeding is when torrent files are shared with other peers after they have finished downloading.)

“This torrenting activity has turned Meta itself into a major distributor of pirated copyrighted material that it is also downloading for use in its commercially available AI models,” one of the newly unredacted documents claims, alleging that Meta, in other words, has not only used copyrighted material without permission but also disseminated it.

LibGen, an archive of books uploaded to the internet that originated in Russia around 2008, is one of the largest and most controversial “shadow libraries” in the world. In 2015, a judge in New York ordered a preliminary injunction against the site, a measure theoretically designed to temporarily shut down the repository, but its anonymous administrators simply switched its domain name. In September 2024, another judge in New York ordered LibGen to pay $30 million to copyright holders for infringing on their copyrights, although it is unknown who actually runs the piracy hub.

Meta's discovery troubles in this case are not over either. Chhabria also warned the tech giant against any overly broad redaction requests in the future: “If Meta again submits an unreasonably broad sealing request, all materials will simply be unsealed,” he wrote.


