AI is coming for journalists

Is it theft when artificial intelligence companies program their systems to scrape and ingest millions of laboriously created news stories? Or is it fair use, like teaching a painting class by having students copy the “Mona Lisa”?

In a recent complaint filed in federal court in Manhattan, The New York Times says it’s theft, joining a growing number of artists and copyright holders who are pushing back against the unauthorized use of their works by AI companies. Defendants OpenAI and Microsoft Corp. will almost surely respond that using millions of copyrighted works by the Times and others to train ChatGPT and similar models is lawful “fair use.”

In fact, OpenAI has already hinted at a fair use defense in response to a separate, ongoing lawsuit filed last year in federal court in San Francisco by comedian Sarah Silverman and other writers against OpenAI and Meta. That lawsuit is premised on the claim that the companies’ models, having been trained on the plaintiffs’ books, can reproduce significant portions of them. Although the suit hasn’t fared well so far (the judge recently granted Meta’s motion to dismiss all but one of Silverman et al.’s claims against it), it rests largely on different theories than the ones in the Times case.

The theory of “fair learning,” which some scholars invoke to justify the wholesale copying of copyrighted materials into generative AI training sets, analogizes that copying to the way humans privately reproduce copyrighted works in order to study and learn from them, a practice typically regarded as non-infringing or fair use. But the Times’ complaint alleges that the AI outputs are strikingly similar to particular Times articles. That is not fair use by any measure.

Without a fair use defense, generative AI companies could be held liable for copyright infringement. That liability would give the Times and other publishers leverage to negotiate “guardrails” on how their materials are used in, and surface in, generative AI outputs, as well as a share of the revenue generative AI stands to earn from the publishers’ materials.

In response to the Times lawsuit, an OpenAI spokesperson said the company respects the rights of content creators and owners and is working with them to ensure they benefit from AI technology and new revenue models. The spokesperson said OpenAI’s ongoing conversations with The New York Times had been productive, so the company was surprised and disappointed by this development, and that, as with many other publishers, it hopes the two sides can find a mutually beneficial way to work together.

When art students copy the “Mona Lisa,” they are trying to understand how Leonardo da Vinci achieved his creative vision. Their goal is not to slavishly reproduce another artist’s technique, but to acquire the tools to express their own vision in their own way.

OpenAI and similar companies, by contrast, design their generative AI systems to imitate the styles of existing human works. “Generative AI” refers to systems that produce text, images, and other forms of expression in response to user prompts.

Applications like Jukebox and MuseNet, two other OpenAI projects, advertise their ability to produce “new” compositions in the style of designated musicians and artists. Whether there is a long game in which these systems pivot to producing genuinely original output remains unclear.

At its best, the emerging industry promotes the idea of a tool that lets anyone produce original works of art. For now, though, generative AI is limited to mashups of pre-existing styles (partly because the systems must be trained on pre-existing materials).

Truly innovative creativity does more than repurpose existing stylistic elements while keeping them all recognizable; it produces an original style in which the sources are barely discernible. Prompted with “Frank Sinatra singing an Ed Sheeran song,” a current generative AI system would produce exactly what the title suggests: the listener would hear Sinatra’s voice performing what sounds like a rendition of a Sheeran song, complete with the signature melodies, chord progressions, and phrasing, even though Sheeran wrote no such song.

When human musicians develop their own style from the musicians they admire and emulate, by contrast, the result does not sound like one influence singing another influence’s song. Singer-songwriter Brandi Carlile, for instance, is famously open about the profound influence that earlier performers like Elton John and Joni Mitchell had on her own approach. To be sure, Carlile has performed songs by Joni Mitchell and Elton John, but her original songs don’t sound like imitations of their performances or compositions. The output of skilled creative humans sounds like something new, while the output of AI sounds like strange juxtapositions of the various human works it was trained on.

At its worst, the generative AI industry appears bent on displacing human creativity altogether, with AI systems generating new works at scale, for every taste and budget, from internally generated prompts. Will any valuable aesthetic or authentically new style emerge from that?

Some may object that news is “just the facts,” and facts are not protectable under copyright law. Nor is text that is “functional,” such as a recipe. But the U.S. Supreme Court’s 1918 decision in International News Service v. Associated Press holds that even where reporting is purely factual and functional, immediately copying noncopyrightable news reports is misappropriation.

Journalism, however, is more than “just the facts.” It is also storytelling. Readers seek out fresh, thought-provoking viewpoints and insights delivered in appealing, well-crafted prose. In some cases, unconventional styles, like Hunter S. Thompson’s “gonzo journalism,” can even give readers a new perspective on current affairs.

Generative AI is designed to mimic the writing styles of seasoned journalists, producing content that looks like the news and commentary readers are accustomed to. In practice, that means generative AI not only writes in the style of well-known authors but also reproduces, nearly verbatim, passages that were previously published. The Times’ complaint documents numerous such instances.

Is it fair use to deploy generative AI to reproduce previously published news and opinion? Not in my view. Some journalists attract greater readership than others not only because they publish first or analyze better, but because readers recognize their skillful expression of ideas. Generative AI’s wholesale appropriation of those stylistic achievements fails the four-factor statutory test for fair use: the purpose and character of the use (e.g., commercial or noncommercial); the nature of the copyrighted work; the amount and substantiality of the portion used relative to the work as a whole; and the effect of the use on the market for the original. Courts often distill some or all of these factors into a “transformative use” inquiry: does the allegedly infringing work use the copied material for a different purpose than the original did?

Insofar as generative AI does not transform the original story for a new medium, purpose, or context, its use is not “transformative.” It is merely copying substantial portions of other people’s work in order to compete in the same markets as the originals.

Even more troubling, in a society where misinformation abounds, generative AI “hallucinates” articles while making them look as if they came from reputable news outlets. “Hallucinating” means generating facts and narratives that are false, or altered to be false, yet presented convincingly (for example, a citation to a court case that follows the proper technical format but does not actually exist). In doing so, generative AI misattributes ideas and stories and infringes trademark rights.

In the end, generative AI accomplishes the exact opposite of what human learning is meant to accomplish. Rather than learning the styles of human experts in order to produce new and better ones, it is a fire hose flailing out of control, spewing unthinking sequences of text based purely on the probability that one word follows another in human expression.
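To make that “one word follows another” mechanism concrete, here is a minimal sketch of a hypothetical next-word “babbler” in Python. It is a toy bigram model, not any real system’s code; production models use vastly larger neural networks, but the core move, sampling each next word in proportion to how often it followed the previous one in the training text, is the same.

```python
import random
from collections import defaultdict

# Toy next-word "babbler" (hypothetical, for illustration only).
# It picks each next word purely by how often that word followed
# the previous one in the training text.

training_text = "the news is the story and the story is the point"
words = training_text.split()

# Record every observed continuation of each word (a bigram table).
# Repeats in each list make frequent continuations more likely to be drawn.
continuations = defaultdict(list)
for prev, nxt in zip(words, words[1:]):
    continuations[prev].append(nxt)

def babble(start: str, length: int = 8) -> str:
    """Emit text by repeatedly sampling a statistically likely next word."""
    out = [start]
    for _ in range(length - 1):
        options = continuations.get(out[-1])
        if not options:  # no observed continuation: stop
            break
        out.append(random.choice(options))
    return " ".join(out)

print(babble("the"))  # e.g. "the story is the news is the point"
```

Nothing in that loop knows what the words mean; any plausibility in the output comes entirely from the frequency statistics of the source text, which is the author’s point.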

The profound expressions of expert human creators have been appropriated by a heinous firehose of inanity that threatens to destroy not just the creative industries but also democracy and our basic sense of reality and truth. Copyright infringement may not seem like a big deal by comparison, but enforcing intellectual property rights is the best place to start in governing generative AI for the benefit of humanity.
