Locke Lord QuickStudy: Generative AI and Copyright Infringement: Federal Judge in Stability AI Distinguishes Between AI Model Creators and Users – For Now

Locke Lord LLP
November 28, 2023

A California federal judge recently issued one of the first substantive generative AI decisions to date. In Andersen, et al. v. Stability AI Ltd., et al., No. 23-cv-00201-WHO (N.D. Cal. Oct. 30, 2023) (“Stability AI”), Plaintiffs, a trio of visual artists, asserted claims for direct and vicarious copyright infringement under the Copyright Act. The three AI companies moved to dismiss the complaint for failure to state a viable claim. Judge William Orrick of the U.S. District Court for the Northern District of California[1] dismissed most of the copyright and intellectual property claims asserted by the content owners against the various AI companies. The decision demonstrates that courts will likely require plaintiffs to plead their copyright infringement claims with more specificity. The court did, however, permit the copyright claims against the primary defendant to proceed and allowed the claims against the other defendants to be repled. By outlining what must be alleged to sufficiently plead a copyright claim against an AI company, the court provides a roadmap of where liability may lie and where proof problems inherent to the technology may thwart a claim.


At the center of Stability AI is the titular company’s product, Stable Diffusion, a “software library” that provides image-generating services to consumers and businesses (such as commercial defendants DeviantArt and Midjourney, open-source generative AI companies that generate images in response to a user’s text prompt). To create and train Stable Diffusion, Stability copied (or directed someone to copy) five billion images from the internet, an indiscriminate total that necessarily included copyrighted works. The Stable Diffusion website states that it is a “state-of-the-art tool that can bring your imagination to life.” Thanks to the large reservoir of training images from which it learns, the AI art tool can replicate the style of specific artists on demand.

Plaintiffs are copyright holders who allege that their works were ingested by Stability and used to train Stable Diffusion.[2] Plaintiffs contend that all three AI companies are committing copyright infringement on an enormous scale by using their visual artworks without authorization to allow AI image generators such as Stable Diffusion to create “new” images. The defendants in Stability AI can be sorted into two groups: those involved in the scraping, copying, and use of copyrighted works to train AI models (Stability), and those that integrated the Stable Diffusion program into their own products but played no part in the scraping, copying, and use of the registered training images (DeviantArt and Midjourney).

In brief, Plaintiffs’ main position is that the AI companies cannot and should not hide behind a characterization of the output images as “new” when those images are infringing derivative works. The decision takes aim at the primary generative AI companies: those that scrape and store registered works to train generative AI models. Companies that merely use the software created by these companies avoided the court’s crosshairs, although they are not completely out of the woods, as the court outlined how claims could plausibly be made against them.

Using registered images to train an AI model

The focus of the order was Stability AI and its Stable Diffusion product. The court held that Plaintiffs had pled enough in their complaint to state a claim for direct copyright infringement by alleging that Stability AI “downloaded or otherwise acquired copies of billions of copyrighted images without permission to create Stable Diffusion.”[3] The court relied heavily on Stability’s alleged scraping, copying, and use of copyrighted images when addressing Plaintiffs’ direct infringement claims. And while Stability contested the validity of some of the complaint’s factual allegations, those arguments are irrelevant at the motion to dismiss stage.[4] Accordingly, the court denied Stability’s motion to dismiss Plaintiffs’ direct copyright infringement claim and allowed that claim to go forward.

Notably, the court focused on the copying, storage, retrieval, and use of registered works in analyzing direct infringement. However, as the order appears to suggest, it is possible that a generative AI model could reproduce an entire image without having that image stored in its memory, i.e., a model could “include only a few [or no] elements of a copyrighted [i]mage.”[5] These doctrinal concepts of “copying” are difficult to apply to such a scenario.

Regardless, the court found that companies who train AI models, depending on the facts and circumstances, may be liable for direct copyright infringement, at least under the current legal landscape and at the motion to dismiss stage.

Downstream companies using generative AI in their products

The court, however, dismissed, with leave to replead, Plaintiffs’ claims against DeviantArt and Midjourney, rejecting in their current form the three theories Plaintiffs put forward regarding direct infringement by these defendants.

First, Plaintiffs asserted that by distributing Stable Diffusion, DeviantArt and Midjourney were directly infringing Plaintiffs’ copyrights. That claim, however, is premised, at least in part, on Plaintiffs’ assertion that Stable Diffusion contains compressed copies of the registered “training images.” In other words, the court was asking whether, by licensing Stable Diffusion and integrating that software into their own products, these downstream defendants possessed an unauthorized copy of a registered work, which could serve as the predicate “copying” for a copyright claim, as mentioned above. Because it was unclear whether physical copies of the specific copyrighted images at issue exist within Stable Diffusion, or whether Plaintiffs’ theory instead rests on an algorithm’s ability to “reconstruct” registered training images, the court requested clarity on this theory (and expressed some skepticism of the latter position).

Second, Plaintiffs claimed DeviantArt’s and Midjourney’s programs were themselves infringing derivative works, and therefore, according to Plaintiffs, distributing these products constitutes direct infringement. This claim, as currently pled, failed for the same reasons as the first: Plaintiffs did not allege plausible facts about what is compressed and stored in the Stable Diffusion program, nor did Plaintiffs explain how defendants’ programs utilize the registered images, if at all.[6]

Finally, Plaintiffs alleged DeviantArt’s and Midjourney’s programs create output images (i.e., images generated for end-consumers who enter a prompt) that are infringing derivative works. The court rejected this argument as currently pled, noting that: (1) it suffers from the same lack of specificity as the first two arguments; (2) it is not plausible that every output relies upon registered work(s), as Plaintiffs argued; and (3) the court was skeptical whether claims based on a derivative-work theory can survive absent a showing of “substantial similarity.” (Importantly, Plaintiffs concede that none of the output images generated are substantially similar to their original works.)

Accordingly, the court dismissed the direct claims against DeviantArt and Midjourney but allowed Plaintiffs an opportunity to replead. The court also requested clarity as to how registered images are stored, used, and reproduced, if at all, in the various programs. Finally, the court requested evidence of AI image generators responding to prompts such as “in the style of [an artist with protected works],” which could be evidence of unauthorized copying.

Pleading standards for identifying infringed works

The court clarified the types of allegations sufficient to allege that a plaintiff’s work was used as a “training image.” As a practical matter, while it is widely known that a large portion, if not most, of the copyrighted works on the internet have been scraped and used to train AI models, it is not easy to prove that a specific image was scraped. The court held that a plaintiff’s attestation that she used an internet service called “” to determine that her work was in fact included in datasets used for training was sufficient to survive a motion to dismiss on this issue.


This is one of the first decisions directly addressing the copyright implications of generative AI. By focusing on the main player, the company that ingests data and uses that data to train generative AI, the court is setting the stage for where liability may land. The court also previewed some of the key issues before it: (1) whether ingesting data to train an AI model is “copying” under the Copyright Act, and (2) whether secondary use of generative AI models can expose companies like DeviantArt and Midjourney to copyright liability. On the first, the court’s decision suggests that it may be. The second question remains unanswered and will depend on the ability to develop evidence of substantial similarity between the “outputs” and the original works used in training. Other issues, such as the applicability of fair use and other defenses, remain open. We will keep updating this space as this case and these issues develop.


[1] The Northern District of California has become the venue of choice for the first series of cases litigating IP rights and generative AI. See, e.g., Kadrey et al. v. Meta Platforms, Inc., No. 3:23-cv-03417; J.L. et al. v. Alphabet Inc., et al., No. 3:23-cv-03440; Tremblay v. OpenAI, Inc., et al., No. 3:23-cv-03223; Doe 1 et al. v. GitHub, Inc., et al., No. 22-cv-06823.

[2] Most plaintiffs were dismissed for failing to register their works, leaving just one plaintiff, Andersen. However, we will continue to use the plural “Plaintiffs” for continuity.

[3] Stability AI, at 8.

[4] Stability AI, at 7 (cleaned up).

[5] Stability AI, at 10.

[6] Stability AI, at 8-9.
