In recent years, artificial intelligence models of language have become very good at certain tasks. Most notably, they excel at predicting the next word in a string of text; this technology helps search engines and texting apps predict the next word you are going to type.
The most recent generation of predictive language models also appears to be learning something about the underlying meaning of language. In addition to predicting the next word, these models can perform tasks that seem to require some degree of genuine understanding, such as answering questions, summarizing documents, and completing stories.
Such models were designed to optimize performance for the specific task of predicting text, without attempting to mimic anything about how the human brain performs that task or understands language. But a new study from MIT neuroscientists suggests that the underlying function of these models resembles the function of language-processing centers in the human brain.
Computer models that perform well on other types of language tasks do not show this similarity to the human brain, offering evidence that the human brain may use next-word prediction to drive language processing.
“The better the model is at predicting the next word, the more closely it fits the human brain,” says Nancy Kanwisher, the Walter A. Rosenblith Professor of Cognitive Neuroscience, a member of MIT’s McGovern Institute for Brain Research and Center for Brains, Minds, and Machines (CBMM), and an author of the new study. “It’s surprising that the models fit so well, and it suggests, very indirectly, that the human language system might be predicting what will happen next.”
Joshua Tenenbaum, a professor of computational cognitive science at MIT and a member of CBMM and MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), and Evelina Fedorenko, the Frederick A. and Carole J. Middleton Career Development Associate Professor of Neuroscience and a member of the McGovern Institute, are the senior authors of the study, which appears this week in the Proceedings of the National Academy of Sciences. Martin Schrimpf, an MIT graduate student who works in CBMM, is the paper’s first author.
Making predictions
The new, high-performing next-word prediction models belong to a class of models called deep neural networks. These networks contain computational “nodes” that form connections of varying strength, arranged in layers that pass information to one another in prescribed ways.
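As a rough illustration, the sketch below builds a tiny network of this kind in Python. It assumes PyTorch, and the layer sizes and vocabulary size are arbitrary placeholders rather than those of any model in the study.

```python
# A minimal sketch of a deep neural network: layers of "nodes" joined by
# weighted connections, each layer passing its output to the next.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(128, 256),     # input layer: 128 nodes feeding 256 hidden nodes
    nn.ReLU(),               # nonlinearity applied at each node
    nn.Linear(256, 256),     # hidden layer passing information onward
    nn.ReLU(),
    nn.Linear(256, 50_000),  # output layer: one score per vocabulary word
)

# Given a vector summarizing the context so far, the network emits a score
# for every candidate next word; the highest-scoring word is its guess.
context = torch.randn(1, 128)          # placeholder context representation
scores = model(context)                # shape: (1, 50_000)
predicted_word_id = scores.argmax(-1)  # index of the predicted next word
```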
For the past decade or so, scientists have used deep neural networks to create models of vision that can recognize objects as well as primate brains do. Research at MIT has also shown that the underlying function of visual object recognition models matches the organization of the primate visual cortex, even though those computer models were not specifically designed to mimic the brain.
In the new study, the MIT team used a similar approach to compare language-processing centers in the human brain with language-processing models. The researchers analyzed 43 different language models, many of which were optimized for next-word prediction. These include a model called GPT-3 (Generative Pre-trained Transformer 3), which, given a prompt, can generate text similar to what a human would produce. Other models were designed to perform different language tasks, such as filling in a blank in a sentence.
As each model was presented with a string of words, the researchers measured the activity of the nodes that make up the network. They then compared these patterns to activity in the human brain, measured in subjects performing three language tasks: listening to stories, reading sentences one at a time, and reading sentences in which each word is revealed one at a time. These human datasets included functional magnetic resonance imaging (fMRI) data and intracranial electrocorticographic measurements taken from people undergoing brain surgery for epilepsy.
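The study’s actual procedure and metrics are more involved, but the general idea of the comparison can be sketched as follows. Here the model activations and brain recordings are random placeholder arrays, and the ridge-regression mapping with a correlation score is one common way of doing such comparisons, not necessarily the paper’s exact method.

```python
# A hedged sketch: can a linear mapping from a model's internal activations
# predict held-out brain responses to the same stimuli?
import numpy as np
from sklearn.linear_model import Ridge
from scipy.stats import pearsonr

n_stimuli, n_units, n_voxels = 200, 768, 100
model_activity = np.random.randn(n_stimuli, n_units)   # one row per sentence
brain_activity = np.random.randn(n_stimuli, n_voxels)  # e.g., fMRI voxels

# Fit the mapping on half the stimuli, evaluate on the held-out half.
train, test = slice(0, 100), slice(100, 200)
mapping = Ridge(alpha=1.0).fit(model_activity[train], brain_activity[train])
predicted = mapping.predict(model_activity[test])

# Score: average correlation between predicted and actual brain responses.
scores = [pearsonr(predicted[:, v], brain_activity[test][:, v])[0]
          for v in range(n_voxels)]
print(f"mean neural predictivity: {np.mean(scores):.3f}")
```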
They found that the best-performing next-word prediction models had activity patterns that very closely resembled those seen in the human brain. Activity in those same models also correlated strongly with measures of human behavior, such as how fast people were able to read the text.
“We found that models that predict neural responses well also tend to better predict human behavioral responses in terms of reading times. And both are explained by the model’s performance in predicting the next word. This triangle really connects everything,” says Schrimpf.
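As a toy illustration of that triangle, one can check that the three scores move together across a set of models. The numbers below are invented placeholders, not the study’s data.

```python
# Toy version of the "triangle": across models, next-word-prediction skill,
# neural fit, and behavioral fit (reading times) all correlate.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
next_word_score = rng.uniform(0, 1, 43)                # one value per model
neural_fit = next_word_score + rng.normal(0, 0.1, 43)  # tracks prediction skill
behavioral_fit = next_word_score + rng.normal(0, 0.1, 43)

for name, (x, y) in {
    "prediction vs. neural fit": (next_word_score, neural_fit),
    "prediction vs. behavioral fit": (next_word_score, behavioral_fit),
    "neural fit vs. behavioral fit": (neural_fit, behavioral_fit),
}.items():
    r, _ = pearsonr(x, y)
    print(f"{name}: r = {r:.2f}")
```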
“A key takeaway from this work is that language processing is a highly constrained problem: The best solutions to it that AI engineers have created end up being similar, as this paper shows, to the solutions found by the evolutionary process that created the human brain. Since the AI network didn’t seek to mimic the brain directly — but does end up looking brain-like — this suggests that, in a sense, a kind of convergent evolution has occurred between AI and nature,” says Daniel Yamins, an assistant professor of psychology and computer science at Stanford University, who was not involved in the study.
Game changer
One of the key computational features of predictive models such as GPT-3 is an element known as a forward one-way predictive transformer. This kind of transformer can make predictions about what is going to come next, based on previous sequences. A significant feature of this transformer is that it can make predictions based on a very long prior context (hundreds of words), not just the last few words.
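In practice, this “one-way” constraint is usually implemented as a causal attention mask: each word position can attend only to itself and to earlier positions, never to later ones. The sketch below (assuming PyTorch, with a toy six-word context) shows the idea.

```python
# A minimal sketch of causal (forward, one-way) attention masking.
import torch

context_length = 6  # real models attend over hundreds of prior words
mask = torch.tril(torch.ones(context_length, context_length, dtype=torch.bool))
print(mask.int())  # row i has ones only in columns 0..i: no peeking ahead

# In attention, disallowed (future) positions are set to -inf before the
# softmax, so each word's prediction draws only on the preceding context.
scores = torch.randn(context_length, context_length)  # placeholder scores
scores = scores.masked_fill(~mask, float("-inf"))
weights = torch.softmax(scores, dim=-1)               # weights over the past
```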
Scientists have not found any brain circuits or learning mechanisms that correspond to this type of processing, Tenenbaum says. However, the new results are consistent with earlier hypotheses that prediction is one of the key functions in language processing, he says.
“One of the challenges of language processing is its real-time nature,” he says. “Language comes in, and you have to keep up with it and be able to make sense of it in real time.”
The researchers now plan to build variants of these language-processing models to see how small changes in their architecture affect their performance and their ability to fit human neural data.
“For me, this result is a game-changer,” says Fedorenko. “It is completely transforming my research program, because I would not have predicted that in my lifetime we would develop these computationally explicit models that capture enough about the brain to be used to understand how the brain works.”
The researchers also plan to try combining these high-performing language models with computer models previously developed in Tenenbaum’s lab that can perform other types of tasks, such as constructing perceptual representations of the physical world.
“If we can understand what these language models do and how they can connect to models that do things more like perceiving and thinking, then that can give us more integrative models of how things work in the brain,” says Tenenbaum. “This could lead us to better artificial intelligence models, as well as better models of how more of the brain works and how general intelligence emerges, than we have had in the past.”