Meta faces class action over AI training copyright infringement


Meta faces a class action lawsuit from five major publishers alleging the company systematically infringed copyrights by using their content without authorisation to train its Llama large language models. The complaint, filed in federal court, represents the latest legal challenge to Big Tech’s practice of scraping vast quantities of web content for AI development.

The publishers—whose identities have not been publicly disclosed pending court filings—claim Meta harvested their copyrighted articles, books, and other materials as part of datasets used to develop its open-source AI models. According to The Verge AI, the lawsuit seeks both monetary damages and injunctive relief that could force Meta to retrain its models using only licensed content.

The legal action arrives as the AI industry faces mounting scrutiny over training data practices. Meta released its Llama 3.1 model in July 2024, trained on what the company described as over 15 trillion tokens—a scale that necessarily requires ingesting substantial portions of the public internet, including copyrighted works.

Meta has consistently maintained that its use of publicly available data constitutes fair use under US copyright law, an argument that will now face judicial scrutiny. The company declined to comment on pending litigation but has previously stated that AI training on published works falls within established legal precedent for transformative use.

The class action structure could prove particularly consequential. If certified, the lawsuit would allow potentially thousands of publishers and content creators to join the complaint, dramatically expanding Meta’s potential liability. Similar class actions against OpenAI and Stability AI remain in early procedural stages, with courts yet to rule definitively on the fair use defence.

Business Impact

The lawsuit creates immediate pressure on Meta’s AI strategy, which relies heavily on the open-source Llama ecosystem to compete with proprietary models from OpenAI and Anthropic. Should the publishers prevail, Meta would face three costly scenarios: paying substantial damages, paying licensing fees to retrain its models on authorised content, or both.

Publishers stand to gain leverage in negotiations over AI licensing deals. The New York Times, which filed its own lawsuit against OpenAI and Microsoft in December 2023, has already demonstrated that major publishers can extract significant concessions. OpenAI subsequently signed licensing agreements with the Associated Press, Axel Springer, and the Financial Times, establishing market rates for training data.

For the broader AI industry, an adverse ruling could fundamentally alter development economics. Training frontier models already costs hundreds of millions of dollars; adding comprehensive licensing fees could create insurmountable barriers for smaller competitors, potentially consolidating the market around well-capitalised incumbents.

Investors should note the asymmetric risk profile. Meta’s market capitalisation exceeds $1 trillion, providing substantial resources to absorb damages or licensing costs. However, a precedent establishing that AI training requires explicit authorisation would affect all players uniformly, potentially benefiting Meta relative to less-capitalised competitors.

Legal Landscape

The case will likely hinge on the four-factor fair use test under US copyright law: purpose and character of use, nature of the copyrighted work, amount used, and effect on the market. AI companies argue that training constitutes transformative use that doesn’t substitute for original works. Publishers counter that AI models directly compete with their content by answering questions that would otherwise drive traffic to their sites.

District court rulings in related cases have produced mixed signals. In October 2023, a federal judge allowed similar claims against Stability AI to proceed, finding that fair use determinations require factual development unsuitable for early dismissal. However, courts have also dismissed portions of complaints, particularly claims related to model outputs rather than training itself.

The outcome will likely depend on technical evidence about how training data influences model behaviour—a question that remains contested even among AI researchers. Meta may argue that individual copyrighted works constitute infinitesimal fractions of its 15-trillion-token training corpus, making any single publisher’s contribution negligible.
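The scale of that negligibility argument can be illustrated with a rough back-of-envelope calculation. The figures below are illustrative assumptions (a typical book length and an average tokens-per-word ratio for English text), not numbers from the case filings; only the 15-trillion-token corpus size comes from Meta’s own description of Llama 3.1.

```python
# Back-of-envelope: what fraction of a 15-trillion-token corpus
# a single book represents. Assumed values are illustrative only.
TOKENS_PER_WORD = 1.3               # rough average for English text
BOOK_WORDS = 100_000                # a typical full-length book
CORPUS_TOKENS = 15_000_000_000_000  # ~15 trillion tokens (Llama 3.1)

book_tokens = int(BOOK_WORDS * TOKENS_PER_WORD)
fraction = book_tokens / CORPUS_TOKENS
print(f"{book_tokens:,} tokens = {fraction:.2e} of the corpus")
```

On these assumptions a full-length book amounts to well under one ten-millionth of the training data, which is the kind of figure Meta is likely to put before the court; publishers, in turn, would argue that aggregate use across thousands of works, not any single title’s share, is what matters.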

What’s Next

The immediate procedural battle will centre on class certification, with Meta expected to argue that individual publishers have insufficiently similar claims to warrant collective treatment. Discovery could prove particularly contentious, as publishers seek detailed information about Meta’s training datasets—information the company considers proprietary.

Industry observers should monitor whether this lawsuit prompts additional publishers to file claims or join the class action. The case timeline will likely extend 18 to 24 months before any substantive rulings, but preliminary decisions on motions to dismiss could arrive within six months, providing early signals about judicial receptiveness to fair use arguments.

This lawsuit marks a critical juncture in the collision between copyright law and AI development, with implications extending far beyond Meta to the entire foundation model ecosystem.