The same issue plagues all large language models (LLMs), including OpenAI’s ChatGPT: they invent things.
The errors range from odd and harmless, such as claiming that the Golden Gate Bridge was transported across Egypt in 2016, to seriously problematic and even dangerous.
Recently, a mayor in Australia threatened to sue OpenAI after ChatGPT falsely stated that he had pleaded guilty in a major bribery scandal. Researchers have found that LLM hallucinations can be exploited to distribute malicious code packages to unsuspecting software developers. And LLMs frequently give bad mental health and medical advice, such as the claim that drinking wine can “prevent cancer.”
Today’s LLMs, and all generative AI models for that matter, are built and trained in a way that encourages them to invent “facts,” a phenomenon known as hallucination.
Training models
Generative AI models have no real intelligence; they are statistical systems that predict words, images, speech, music, or other data. Fed an enormous number of examples, usually sourced from the public web, they learn how likely data is to occur based on patterns, including the context of any surrounding data.
For example, a typical email might end with the fragment “Looking forward…”, and an LLM might complete it with “…to hearing back,” following the pattern of the countless emails it has been trained on. That doesn’t mean the LLM is looking forward to anything.
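To make the idea concrete, here is a minimal sketch of next-token prediction using a small GPT-2 checkpoint via the Hugging Face transformers library. The model choice and code are purely illustrative; this is not how ChatGPT or any production system is actually served.

```python
# Minimal sketch of next-token prediction. GPT-2 stands in for any LLM;
# this is an illustration, not a production setup.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Looking forward to"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits  # shape: (1, sequence_length, vocab_size)

# Probability distribution over the vocabulary for the *next* token only.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top_probs, top_ids = next_token_probs.topk(5)

for prob, token_id in zip(top_probs, top_ids):
    print(f"{tokenizer.decode([int(token_id)])!r}: {prob:.3f}")
```

The model simply reports which continuations were most common in text like the prompt; nothing in this loop checks whether a continuation is true.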
Sebastian Berns, a Ph.D. researcher at Queen Mary University of London, explained in an interview that the current framework for training LLMs involves concealing, or “masking,” previous words for context and having the model predict which words should replace them. It is conceptually similar to using iOS’s predictive text and repeatedly tapping one of the suggested next words.
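As a small illustration of masked-word prediction, the snippet below uses a BERT-style masked language model from Hugging Face transformers as a stand-in; the exact masking scheme differs across model families, so treat this only as a toy demonstration of the idea Berns describes.

```python
# Sketch of "masking": hide a word and ask the model to fill it back in.
# BERT is used here only as an illustration; production LLMs are trained
# at a vastly larger scale and with different objectives.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill_mask("Looking forward to [MASK] back from you."):
    print(f"{prediction['token_str']!r}: {prediction['score']:.3f}")
```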
Most of the time, this probability-based approach works remarkably well at scale. But while the range of words and their probabilities is likely to result in text that makes sense, it’s far from certain.
LLMs can generate something that is grammatically correct but nonsensical, for instance, like the claim about the Golden Gate. They can also spout mistruths, propagating inaccuracies in their training data. Or they can conflate different sources of information, including fictional ones, even when those sources clearly contradict each other.
The LLMs aren’t doing this maliciously. They have no malice, and the concepts of true and false are meaningless to them. They have simply learned to associate certain words or phrases with certain concepts, even when those associations are inaccurate.
According to Berns, hallucinations are connected to an LLM’s inability to estimate the uncertainty of its own predictions. An LLM is typically trained to always produce an output, even when the input is very different from its training data. Standard LLMs have no way of knowing whether they can reliably answer a query or make a prediction.
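One naive idea is to treat the model’s own token probabilities as a confidence signal. The sketch below, again using a small GPT-2 model from Hugging Face transformers purely as an illustration, scores sentences by their average log-probability under the model; the point is that this number reflects fluency, not truth, which is part of why uncertainty estimation for LLMs is hard.

```python
# A naive "confidence" proxy: the average log-probability a model assigns
# to a piece of text. Fluent but false statements can score well, so this
# signal does not tell you whether a statement is correct.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def mean_logprob(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Score each token under the model's prediction from the preceding tokens.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    token_scores = logprobs.gather(1, ids[0, 1:].unsqueeze(1)).squeeze(1)
    return token_scores.mean().item()

# Compare a true statement with a fluent falsehood; neither score reveals
# which one is actually correct.
print(mean_logprob("The Golden Gate Bridge is in San Francisco."))
print(mean_logprob("The Golden Gate Bridge was moved to Egypt in 2016."))
```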
Solving hallucination
Can hallucination be solved? That depends on what you mean by “solved.”
Vu Ha, an applied researcher and engineer at the Allen Institute for Artificial Intelligence, asserts that LLMs “do and will always hallucinate.” But he also believes there are concrete ways to reduce, though not eliminate, hallucinations, depending on how an LLM is trained and deployed.
Consider a question-answering system, Ha suggested via email. It can be engineered for high accuracy by curating a high-quality knowledge base of questions and answers and connecting that knowledge base with an LLM, so that accurate responses are produced through a retrieval-like process.
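As a rough sketch of the retrieval-like process Ha describes, the snippet below pairs a small curated question-and-answer knowledge base with an LLM call. The `call_llm` stub and the toy word-overlap retrieval are hypothetical stand-ins for whatever LLM API and search index a real system would use; they are not the Allen Institute’s actual setup.

```python
# Minimal sketch of grounding an LLM answer in a curated knowledge base.
# `call_llm` is a hypothetical stand-in for a real LLM API; the retrieval
# step here is simple word overlap, not a production search index.
from typing import List, Tuple

KNOWLEDGE_BASE: List[Tuple[str, str]] = [
    ("Who are the authors of the Toolformer paper?",
     "The Toolformer paper (Meta AI, 2023) has eight co-authors, all from Meta."),
    ("What is Toolformer?",
     "Toolformer is an AI model created by Meta that learns to call external tools."),
]

def retrieve(question: str) -> str:
    """Return the stored answer whose question overlaps most with the query."""
    q_words = set(question.lower().split())
    best = max(KNOWLEDGE_BASE,
               key=lambda qa: len(q_words & set(qa[0].lower().split())))
    return best[1]

def call_llm(prompt: str) -> str:
    raise NotImplementedError("Replace with a real LLM API call.")

def answer(question: str) -> str:
    context = retrieve(question)
    prompt = (
        "Answer using ONLY the context below. If the context is insufficient, "
        "say you don't know.\n\n"
        f"Context: {context}\n\nQuestion: {question}\nAnswer:"
    )
    return call_llm(prompt)
```

The design choice is the one Ha describes: the model is asked to restate curated facts rather than to recall them from its training data, which narrows the room for hallucination without eliminating it.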
Ha illustrated the difference between an LLM drawing on a “high quality” knowledge base and one with less careful data curation. He put the question “Who are the authors of the Toolformer paper?” (Toolformer is an AI model created by Meta) to Google’s Bard and to Microsoft’s LLM-powered Bing Chat. Bing Chat correctly listed all eight Meta co-authors, while Bard misattributed the paper to researchers at Google and Hugging Face.
Any deployed LLM-based system will hallucinate. The real question, according to Ha, is whether the benefits outweigh the harms those hallucinations cause. In other words, if a model is broadly helpful but occasionally gets a date or a name wrong, the trade-off may be worth it. It is a matter of maximizing the expected utility of the AI, he added.
Berns pointed to reinforcement learning from human feedback (RLHF), another approach that has been used with some success to reduce hallucinations in LLMs. Introduced by OpenAI in 2017, RLHF involves training an LLM, gathering additional data to train a “reward” model, and then fine-tuning the LLM with the reward model via reinforcement learning.
In RLHF, a set of prompts from a predefined dataset is fed through an LLM to generate new text. Human annotators then rank the LLM’s outputs by their overall “helpfulness,” and those rankings are used to train the reward model. The reward model, which at this point can take any text and assign it a score reflecting how favorably humans are likely to judge it, is then used to fine-tune the LLM’s generated responses.
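To make the reward-model step concrete, here is a toy PyTorch sketch that learns to score responses from human preference pairs using a pairwise (Bradley-Terry style) loss. The tiny bag-of-words scorer and the example pairs are illustrative stand-ins; real reward models are large transformers, typically initialized from the LLM itself and trained on far larger ranking datasets.

```python
# Toy sketch of the RLHF reward-model step: learn to score responses so that
# human-preferred ("chosen") answers score higher than rejected ones.
import torch
import torch.nn as nn

VOCAB = {w: i for i, w in enumerate(
    "the capital of france is paris london berlin i don't know".split())}

def encode(text: str) -> torch.Tensor:
    """Bag-of-words vector over the toy vocabulary (a stand-in encoder)."""
    vec = torch.zeros(len(VOCAB))
    for word in text.lower().split():
        if word in VOCAB:
            vec[VOCAB[word]] += 1.0
    return vec

reward_model = nn.Sequential(nn.Linear(len(VOCAB), 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-2)

# Pairs ranked by (hypothetical) human annotators: (chosen, rejected).
preference_pairs = [
    ("the capital of france is paris", "the capital of france is london"),
    ("i don't know", "the capital of france is berlin"),
]

for _ in range(200):
    loss = torch.tensor(0.0)
    for chosen, rejected in preference_pairs:
        r_chosen = reward_model(encode(chosen))
        r_rejected = reward_model(encode(rejected))
        # Pairwise loss: push the chosen answer's score above the rejected one's.
        loss = loss - torch.nn.functional.logsigmoid(r_chosen - r_rejected).squeeze()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(reward_model(encode("the capital of france is paris")).item())
print(reward_model(encode("the capital of france is london")).item())
```

In full RLHF, a reward model like this would then be used as the training signal for a reinforcement learning step that fine-tunes the LLM’s outputs.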
OpenAI used RLHF to train several of its models, including GPT-4. But even RLHF isn’t perfect, Berns cautioned.
According to Berns, the space of possibilities is too vast to fully “align” LLMs with RLHF. A common practice in the RLHF setting is to train a model to give an “I don’t know” answer to a tricky question, relying primarily on human domain knowledge and hoping the model generalizes it to its own domain knowledge. Often it does, but it can be a bit finicky.
Alternative philosophies
So if hallucination isn’t solvable, at least not with today’s LLMs, is that a bad thing? Berns doesn’t think so, actually. He posits that hallucinating models could spur creativity by acting as a “co-creative partner,” producing outputs that may not be entirely factual but still contain useful threads to pull on. Used creatively, hallucinations can yield results or combinations of ideas that wouldn’t occur to most people.
Hallucinations are a problem when generated statements are factually incorrect or violate general human, social, or specific cultural values, in situations where a person relies on the LLM to be an expert, he said. But in creative or artistic tasks, the ability to come up with unexpected outputs can be valuable. A human recipient might be surprised by an answer to a question and therefore be pushed in a certain direction of thought, which might lead to a novel connection of ideas.
Ha argued that today’s LLMs are being held to an unfair standard; after all, humans also “hallucinate” when we misremember or otherwise misrepresent the truth. But with LLMs, he contends, we experience cognitive dissonance because the models produce outputs that look good on the surface yet turn out to contain errors on closer inspection.
Simply put, LLMs, like any AI technique, are imperfect and therefore make mistakes, he said. Traditionally, we are fine with AI systems making mistakes because we expect and accept imperfections. But it is more nuanced when LLMs make mistakes.
Indeed, the answer may not lie in how generative AI models work at a technical level. Insofar as there is a “solution” to hallucination today, the most prudent approach seems to be treating models’ predictions with a skeptical eye.