This year, AI went mainstream with the launch of GPT-4, DALL·E 3, Bing Chat, Gemini, and numerous other models and tools that can produce text and images from a straightforward written prompt. To train these models, AI developers used text and photos from millions of real people, and some of those people are not pleased that their creations were used without their consent. The launches were followed by lawsuits, and the first of them will probably go to trial next year.
Because nearly every lawsuit currently pending involves copyright in one way or another, the tech companies behind each AI model are leaning on defence strategies such as fair use. They usually can't claim that their AIs weren't trained on copyrighted works. Instead, many contend that scraping content from the internet is transformative, since the generative models produce "new" works from it. The sheer scope of generative AI tools has created massive legal messes that will play out in 2024 and beyond, even though text-based plagiarism may be easier to prosecute than image generators that mimic the visual styles of specific artists.
Getty Images sued Stability AI, the company that created Stable Diffusion, in January, claiming that the generative image model was improperly trained on millions of copyrighted photos from the stock photo giant's collection. The lawsuit seeks an undisclosed amount of damages. Although Getty has filed a comparable lawsuit in Delaware, a judge decided this week that the case can proceed to trial in the United Kingdom. There is no set date. (For the record, Getty provides hilariously damning examples of how Stable Diffusion applies an odd, hazy, Getty-like watermark to some of its outputs.)
Stability AI, Midjourney, DeviantArt, and Runway AI are being sued by a group of visual artists for allegedly violating their copyright by using their creations as training data for their AI models. The lawsuit, filed in San Francisco, claims that when the artists' names are entered as part of a prompt, the models can produce images that match their unique styles. The judge largely dismissed an earlier version of the suit, in part because two of the artists involved had not registered their copyright with the US Copyright Office, but allowed the plaintiffs to refile, which they did in November. Next year, we'll probably find out whether the amended lawsuit can proceed.
The Authors Guild, a writers' trade group, has filed a lawsuit against OpenAI (the company behind ChatGPT, GPT-4, and DALL·E 3) on behalf of John Grisham, George R. R. Martin, George Saunders, and fourteen other writers, for allegedly using their writings illegally to train its large language models (LLMs). The plaintiffs contend that because ChatGPT can accurately summarise their works, the copyrighted full texts must be somewhere in the training data. The proposed class-action lawsuit, filed in New York in September, also argues that some of that training data may have come from pirate websites, although a similar lawsuit brought by Sarah Silverman against Meta was largely dismissed in November. The plaintiffs are requesting damages and an injunction to stop the unauthorised use of their works. A judge hasn't yet ruled in this case, but we should find out more in the upcoming months.
And it extends beyond writers and artists. Three music publishers, ABKCO, Universal Music, and Concord, are suing Anthropic (the company that makes Claude) for allegedly using their musicians' song lyrics as training data for its models. The Tennessee lawsuit asserts that Claude will both quote the copyrighted lyrics on request and reproduce them verbatim in compositions it presents as its own. Since the suit was only filed in October, it is unlikely that a court date will be scheduled before the end of the year. Anthropic will probably attempt to have the case dismissed.
The most unusual case involves eight anonymous plaintiffs, including two minors, who are suing Google in a proposed class-action lawsuit for allegedly misusing users' personal data and violating their copyright. According to the lawsuit, filed in San Francisco in July, the content Google allegedly exploited includes books, photos from dating websites, Spotify playlists, and TikTok videos. Unsurprisingly, Google is fighting back and attempting to have the case dismissed. Since it filed that motion back in October, we might learn before the end of the year whether the case will proceed.
It appears that some of these lawsuits, concerning the legality (or not) of training AI models on copyrighted content scraped from the internet, may finally go to trial next year. Most plaintiffs are seeking damages for their works being used without permission, while some, such as the Authors Guild, are also requesting an injunction to stop AI makers from using models trained on copyrighted works. If such a ruling stood, any AI developed using the relevant data would have to be shut down and retrained on a different dataset.
Of course, the lawsuits could all be settled, drag on for years, or be dismissed outright. And however any judge rules, we can probably expect a number of appeals. While all of these lawsuits are pending, generative AI models continue to be developed, released, and used by an increasing number of people. Even if a judge declares generative AI makers' behaviour a gross violation of copyright law and fines them millions of dollars, it seems unlikely the genie will go back in the bottle, given how reluctant US courts have been to ban tech products over copyright or patent infringement.