So far, not even AI companies have been able to create tools that reliably detect when a piece of writing was generated with a large language model. Now, a team of researchers has devised a novel way to estimate LLM usage across a large set of scientific writing: measuring which "excess words" started to appear much more frequently during the LLM era (i.e., 2023 and 2024). According to the researchers, the results "suggest that at least 10 percent of 2024 abstracts were processed with LLMs."
In a preprint paper published earlier this month, four researchers from Northwestern University and Germany's University of Tübingen say they were inspired by studies that measured the impact of the COVID-19 pandemic by looking at excess deaths compared with previous years. Applying the same approach to "excess word usage" after LLM writing tools became widely available in late 2022, the researchers found that "the appearance of LLMs led to an abrupt increase in the frequency of certain style words" that was "unprecedented in both quality and quantity."
Delving In
To measure these vocabulary changes, the researchers tracked the relative annual frequency of each word across 14 million paper abstracts published on PubMed between 2010 and 2024. They then compared the actual frequency of those words in abstracts from 2023 and 2024, when LLMs were in widespread use, against the frequency that the pre-2023 trend line would have predicted.
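The paper's exact extrapolation procedure isn't spelled out here, but the core idea can be sketched in a few lines. A minimal illustration, assuming a simple linear trend fit (NumPy's `polyfit` stands in for whatever fitting the paper actually used) and invented per-year frequencies:

```python
import numpy as np

# Invented stand-ins for "fraction of PubMed abstracts containing a
# given word in each year, 2010-2024" (real values come from the
# 14 million abstracts).
years = np.arange(2010, 2025)
freq = np.array([0.0010, 0.0010, 0.0011, 0.0011, 0.0012, 0.0012,
                 0.0013, 0.0013, 0.0014, 0.0014, 0.0015, 0.0015,
                 0.0016, 0.0150, 0.0400])  # sudden jump in 2023-2024

# Fit a linear trend to the pre-2023 data only, then extrapolate it
# into the LLM era.
pre = years < 2023
slope, intercept = np.polyfit(years[pre], freq[pre], 1)
expected = slope * years + intercept

# Excess usage: observed frequency relative to the trend's prediction.
for y in (2023, 2024):
    i = int(np.where(years == y)[0][0])
    print(f"{y}: observed/expected = {freq[i] / expected[i]:.1f}x")
```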
According to the results, certain words that were extremely rare in these scientific abstracts before 2023 saw a sudden surge in popularity after LLMs were introduced. The word "delves," for example, shows up in 25 times as many 2024 papers as the pre-LLM trend would predict; words like "showcasing" and "underscores" also jumped to nine times their expected frequency. Other, already-common words became significantly more frequent in post-LLM abstracts: the frequency of "potential" rose by 4.1 percentage points, "findings" by 2.7 percentage points, and "crucial" by 2.6 percentage points.
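Note that these examples mix two ways of quantifying excess: a multiplicative ratio, which suits rare words like "delves," and a percentage-point gap, which is more readable for common words like "potential." A minimal sketch of both metrics, with invented frequencies chosen to mirror the figures above:

```python
# Two complementary excess metrics; frequencies are fractions of
# abstracts containing the word, and the numbers are illustrative.
def excess_ratio(observed: float, expected: float) -> float:
    """How many times more often the word appeared than predicted."""
    return observed / expected

def excess_gap_pp(observed: float, expected: float) -> float:
    """Observed minus expected frequency, in percentage points."""
    return (observed - expected) * 100

print(f"{excess_ratio(0.0050, 0.0002):.0f}x")    # rare word: 25x
print(f"+{excess_gap_pp(0.164, 0.123):.1f} pp")  # common word: +4.1 pp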
Such shifts in word use could happen independently of LLMs, of course; language evolves naturally, and words cycle in and out of style. But the researchers found that, before the LLM era, year-over-year spikes of this size and suddenness appeared only for words associated with major world health events: "ebola" in 2015, "zika" in 2017, and words like "coronavirus," "lockdown," and "pandemic" from 2020 to 2022.
In the post-LLM era, though, the researchers found hundreds of words with sudden, pronounced jumps in scientific usage that had no connection to world events. In fact, while the excess words during the COVID pandemic were overwhelmingly nouns, the words with a post-LLM frequency bump were overwhelmingly "style words" such as verbs, adjectives, and adverbs (a small sampling: "across, additionally, comprehensive, crucial, enhancing, exhibited, insights, notably, particularly, within").
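One rough way to reproduce that noun-versus-style-word breakdown is to part-of-speech tag each excess word and bucket the results. A sketch using spaCy, which is an assumption here (the paper doesn't say what tagger, if any, it used); tagging isolated words without sentence context is crude but enough to show the contrast:

```python
from collections import Counter

import spacy  # assumed tooling, not the paper's

nlp = spacy.load("en_core_web_sm")

# Excess words from the two eras, as listed above.
covid_era = ["ebola", "zika", "coronavirus", "lockdown", "pandemic"]
llm_era = ["across", "additionally", "comprehensive", "crucial",
           "enhancing", "exhibited", "insights", "notably",
           "particularly", "within"]

def pos_profile(words: list[str]) -> Counter:
    # Tag each word in isolation and tally the parts of speech.
    return Counter(nlp(word)[0].pos_ for word in words)

print(pos_profile(covid_era))  # expect mostly NOUN
print(pos_profile(llm_era))    # expect ADJ/ADV/VERB (plus prepositions)
```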
This finding isn't entirely new; the increasing use of "delve" in scientific papers has been well documented recently, for instance. But previous studies generally relied on comparisons with "ground truth" human writing samples or on lists of predefined LLM markers obtained from outside the study. Here, the pre-2023 set of abstracts acts as its own effective control group, showing how vocabulary choice has changed overall in the post-LLM era.
A Complicated Interaction
Looking for hundreds of these so-called "marker words," which became significantly more popular in the post-LLM era, can sometimes make the telltale signs of LLM use easy to spot. Consider this sample abstract line the researchers flagged as dense with such markers: "Effective therapy strategies require a thorough understanding of the complex interactions between […] and […]."
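In code, a first-pass marker scan is little more than set membership over tokens. A toy sketch, with a tiny assumed marker list and an invented example sentence (the researchers' real list runs to hundreds of words, and a hit count is a statistical signal, not proof of LLM use):

```python
import re

# A small, assumed sample of marker words for illustration.
MARKERS = {"delves", "showcasing", "underscores", "comprehensive",
           "crucial", "notably", "insights", "particularly"}

def marker_hits(abstract: str) -> list[str]:
    """Return the marker words that appear in an abstract."""
    tokens = re.findall(r"[a-z]+", abstract.lower())
    return sorted(set(tokens) & MARKERS)

sample = ("Notably, these comprehensive insights underscore the need "
          "for crucial follow-up studies.")
print(marker_hits(sample))
# ['comprehensive', 'crucial', 'insights', 'notably']
```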
After running statistical measurements of marker word appearance across individual papers, the researchers estimate that at least 10 percent of the post-2022 papers in the PubMed corpus were written with at least some LLM assistance. The number could be even higher, they say, because their analysis may miss LLM-assisted abstracts that don't contain any of the marker words they identified.
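The intuition behind a lower bound like this can be sketched under one simplifying assumption: each excess appearance of a marker word implies at least one LLM-processed abstract, since a single abstract contributes a given word at most once to the per-abstract frequency. The paper's actual estimate combines many words and groups of papers to reach its 10 percent figure; the figures below are invented:

```python
# If a word appears in an extra `delta` fraction of abstracts beyond
# its pre-LLM trend, at least that fraction of abstracts must have
# been LLM-processed. The max over words gives a conservative floor.
def lower_bound(observed: dict[str, float],
                expected: dict[str, float]) -> float:
    return max(observed[w] - expected[w] for w in observed)

obs = {"potential": 0.164, "findings": 0.138, "crucial": 0.103}
exp = {"potential": 0.123, "findings": 0.111, "crucial": 0.077}
print(f"At least {lower_bound(obs, exp):.1%} of abstracts affected")
```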
Those measured percentages can also vary widely across different subsets of papers. The researchers found that papers from countries such as China, South Korea, and Taiwan showed LLM marker words 15 percent of the time, suggesting that LLMs may be widely used there to help non-native English speakers edit their English-language text. On the other hand, the researchers note, native English speakers may simply be better at noticing and actively removing unusual style words from LLM output, masking their LLM usage from this kind of analysis.
The researchers stress that detecting LLM use matters because "LLMs are infamous for making up references, providing inaccurate summaries, and making false claims that sound authoritative and convincing." And as awareness of LLMs' telltale marker words spreads, human editors may get better at scrubbing those words from generated text before it is shared with the public.
Perhaps future large language models will run this kind of frequency analysis on their own output, down-weighting marker words to better pass their writing off as humanlike. Before long, we may need to call in some Blade Runners to pick out the generative AI text hiding in our midst.