Training Generative AI with Open Sourced Content

January 9, 2024

Numerous lawsuits accuse OpenAI and Microsoft, its largest investor, of using other people’s copyrighted works without authorization to train their large language models (LLMs). Judging by what OpenAI told the House of Lords Communications and Digital Select Committee, more legal actions against the companies may follow. In written testimony (PDF) submitted for the committee’s inquiry into LLMs, OpenAI stated that it would be “impossible to train today’s leading AI models without using copyrighted materials.”

The company explained that this is because copyright now protects almost every form of human expression, including government documents, blog posts, photographs, forum posts, and snippets of software code. It further stated that while limiting training material to public domain books and pictures made more than a century ago might make for an intriguing experiment, it would not produce AI systems that meet the needs of today’s citizens. OpenAI also emphasized that it complies with copyright law when it trains its models, and it maintained that using publicly accessible online material to train artificial intelligence (AI) falls under the fair use doctrine.

It acknowledged, though, that more needs to be done to support and empower creators. The company described how it lets publishers block its GPTBot web crawler from accessing their websites. It also stated that it is developing additional mechanisms that allow rightsholders to opt out of training, and that it is working with them to reach mutually beneficial arrangements.
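For readers curious what blocking the crawler looks like in practice, the usual mechanism is a robots.txt rule that names the crawler's user agent, "GPTBot". The following is a minimal sketch, not OpenAI's own tooling: it uses Python's standard robots.txt parser to check how such a rule would be interpreted. The example.com URL, the "SomeOtherBot" agent, and the is_allowed helper are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a publisher that blocks GPTBot
# but leaves the site open to every other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative URL; any path on the site is covered by "Disallow: /".
article_url = "https://example.com/articles/1"

gptbot_allowed = is_allowed(ROBOTS_TXT, "GPTBot", article_url)
other_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", article_url)

print(gptbot_allowed, other_allowed)  # prints: False True
```

Note that robots.txt is a voluntary convention: it only keeps out crawlers that choose to honor it, and it does not affect material that was collected before the rule was added.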

In some of the lawsuits brought against Microsoft and OpenAI, the plaintiffs claim that the companies have built a billion-dollar industry and profited substantially from copyrighted content without compensating its authors. In a more recent lawsuit, a few non-fiction writers argued that the companies could have explored alternative financing options, such as profit sharing, but “decided to steal” instead.

While OpenAI refrained from commenting on those specific complaints, it did respond directly to The New York Times’ lawsuit alleging unauthorized use of its published news articles. OpenAI claimed that the newspaper was not telling the whole story. It said it had already been in talks with The Times about a “high-value partnership” that would give it access to the newspaper’s reporting. The two sides were apparently still in communication as of December 19, and OpenAI said it learned about the lawsuit only afterward, by reading about it in The Times.

In its complaint, the newspaper cited instances in which ChatGPT gave users “near-verbatim excerpts” from paywalled stories. OpenAI accused the newspaper of intentionally manipulating its prompts, for example by including long excerpts from articles in order to get the chatbot to regurgitate them. It also accused The Times of cherry-picking examples from a large number of attempts. OpenAI said it considers The Times’ lawsuit to be without merit, but it remains hopeful for a “constructive partnership” with the newspaper.
