A California-based law firm has filed a class-action complaint against OpenAI, the maker of the well-known chatbot ChatGPT, arguing that the company significantly violated the copyrights and privacy of countless people when it used data scraped from the internet to train its technology.
By using millions of internet users’ social media comments, blog posts, Wikipedia articles, and family recipes, OpenAI allegedly breached their rights, according to a novel legal theory put to the test by the case. The law firm behind the lawsuit, Clarkson, has previously filed sizable class-action lawsuits on a variety of topics, including data breaches and misleading advertising.
Ryan Clarkson, the firm's managing partner, said the suit seeks to represent real people whose information was taken and commercially misused to build this extremely powerful technology.
The case was filed in federal court in the Northern District of California on Wednesday morning. A representative of OpenAI did not respond to requests for comment.
The lawsuit centers on a significant unanswered question raised by the rise of "generative" AI tools such as chatbots and image generators. These systems work by ingesting billions of words from the public internet and learning to draw inferences from them. After taking in enough data, the resulting "large language models" can predict what to say in response to a prompt, which enables them to produce poetry, hold sophisticated conversations, and ace professional exams. However, the people who wrote those words never consented to a business like OpenAI using them for its own financial gain.
According to Clarkson, all of that data is being used at scale by large language models in ways it was never intended for. He said he hopes to persuade a court to impose restrictions on how AI algorithms are developed and to ensure that people are compensated when their data is used.
The firm has already signed up a group of plaintiffs and is actively seeking more.
It is still unclear whether it is legal to train potentially highly profitable technologies on data taken from the open internet. Some AI developers have argued that using internet data should qualify as "fair use," a doctrine in copyright law that creates an exception when the material is changed in a "transformative" way.
According to Katherine Gardner, an intellectual property attorney at Gunderson Dettmer, a firm that mostly represents internet start-ups, the question of fair use remains open and will likely be resolved in court in the coming months and years. Individuals who merely posted or commented on a website are unlikely to win damages, she added. However, artists and other creative professionals who can demonstrate that their copyrighted work was used to train the AI models could have a case against the companies using it.
According to Gardner, when you post content on a social media site or any other website, you typically grant the site a very broad license to use your content however it sees fit. It will be very difficult for the average end user to argue that they are entitled to any kind of payment or compensation for the use of their data in training.
The lawsuit adds to the legal challenges facing the businesses developing AI technology and looking to profit from it. In November, a class-action lawsuit was brought against Microsoft and OpenAI over the use of code from GitHub, the Microsoft-owned coding platform, to train AI tools. In February, Getty Images sued Stability AI, a smaller AI start-up, on the grounds that it had improperly trained an image-generating bot on Getty Images' photographs. And this month, a Georgia radio host filed a defamation lawsuit against OpenAI, claiming that text generated by ChatGPT falsely accused him of fraud.
OpenAI is not alone in training its AI models on vast amounts of data collected from the public internet; Google, Facebook, Microsoft, and a growing number of other businesses do the same. But Clarkson said the firm decided to pursue OpenAI because of its role, after ChatGPT captured the public's attention last year, in spurring its larger competitors to push forward with their own AI.
OpenAI, he said, is the company that set off the current AI arms race, making it the obvious first target.
Although OpenAI has not disclosed what kind of data went into its most recent model, GPT-4, earlier iterations of the technology have been found to have ingested Wikipedia pages, news stories, and social media comments. Chatbots from Google and other businesses have used similar data sets.
Lawmakers are considering new legislation that would require businesses to be more transparent about the data used in their AI models. Gardner, the intellectual property attorney, also noted that a court case could lead a judge to order a business like OpenAI to disclose the data it used.
A few businesses have tried to stop AI start-ups from taking their data. In April, the music distributor Universal Music Group reportedly urged Apple and Spotify to block scrapers. The social media network Reddit is restricting access to its data stream, citing years of Big Tech firms scraping its comments and conversations. And Elon Musk, the owner of Twitter, has threatened to sue over the use of data obtained from Twitter to train AI.
The latest class-action lawsuit against OpenAI makes broader accusations, claiming that the company does not disclose to users who sign up for its tools that the data they provide may be used to train new products from which the company will profit, such as its Plugins tool. It also claims that OpenAI is not doing enough to prevent children under the age of 13 from using its tools, an accusation previously leveled at other tech giants such as Facebook and YouTube.