For the first time, OpenAI is allowing creators to remove their work from training data for DALL-E 3, its most recent AI image generator. The opt-out procedure is so onerous that it nearly seems as though it was created specifically to fail.
In order to request the removal of owned or copyrighted photos from the DALL-E training data, image owners and authors can now use a new form made available by OpenAI.
For AI models to function successfully, training data must be of high quality and produced by humans. There is a race to gather all of this data. However, the original authors of this information have now come to understand that the value and knowledge they put into it are being absorbed and used for someone else’s advantage. This is putting pressure on major tech firms to give creators a way to either actively opt in or have their data removed.
One by one
For OpenAI’s new process to even evaluate an opt-out request, an artist, owner, or rights holder must submit an individual copy of each image they want removed from DALL-E’s training dataset, along with a description.
For the majority of artists, this might entail having to individually submit hundreds or thousands of pieces of work. The Georgia O’Keeffe Museum, for instance, as the owner of the artist’s rights, would have to make unique requests for every one of O’Keeffe’s more than 2,000 works of art in order to have them removed from DALL-E’s dataset.
Smart technologists abound at OpenAI. The company could have implemented a system that allowed an owner or artist to ask for the removal of all of their work from the training data in a single request. However, it chose not to. Why? Most likely because it needs as much data as possible to create its AI models.
Enraging
The DALL-E opt-out mechanism used by OpenAI is “enraging,” according to Toby Bartlett, an artist and founder of a consultancy company.
“Now, artists will have to almost ruin their work with massive watermarks in the hopes that their work would not be used… if that even works!” he went on to say.
An IT expert named Greg Madhere also posted that he recently started taking photos and wants to share them online. Given the extent to which internet content is being scraped and utilized to train AI models like DALL-E and ChatGPT, he is now reluctant.
“Where is it even safe to publish online anymore?” he asked.
Too late
Even if OpenAI grants an artist’s or owner’s opt-out request, it will apply only to “future” training data for DALL-E. A person may request that their creative work be removed from the training data, but DALL-E 3, which was recently released, has already been trained on such work. Or, as OpenAI put it, its models have already learned from their training data and can retain the concepts they learned.
Translation: This is the opt-out process, but it’s too late because we’ve already extracted the majority of the value from your work.
Opt-outs are one of the concerns related to the use of copyrighted works for AI training that are currently being addressed as part of a rule-making process at the US Copyright Office.
According to a representative for OpenAI, the company has heard from artists and other creators that they don’t always want their work to be used for model training. As a result, it is giving them the option to choose whether or not their work will be included in future model training.
Robots.txt option
For those who have a large body of work or a “high volume of images from specific URLs,” the company advises blocking OpenAI’s web crawler, GPTBot, through robots.txt. Last month, OpenAI declared that it would adhere to the time-honored practice of websites indicating that a web crawler should not collect their data.
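As a rough illustration of the robots.txt approach, the sketch below shows what a rule aimed at GPTBot might look like and checks it with Python's standard-library robots.txt parser. The file contents, the example.com URL, the "SomeOtherBot" crawler name, and the helper name is_allowed are assumptions made for this example, not taken from OpenAI's instructions. The check also only shows how a crawler that honors robots.txt would read the rule; it does not enforce anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for a site that wants to turn GPTBot away
# from every path while leaving other crawlers unaffected.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    image_url = "https://example.com/gallery/painting.jpg"  # placeholder URL
    print(is_allowed(ROBOTS_TXT, "GPTBot", image_url))        # False
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", image_url))  # True
```

A rule like this has to live at the root of each site that serves the images, so it only helps on sites the artist or owner controls.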
The issue is that, to implement robots.txt, an artist or owner would need to know every website that hosts their images and have access to the codebase of each of those sites in order to add a robots.txt file that blocks GPTBot.
It is probably impossible for an owner or artist to get their works taken out of the DALL-E training data without such access.