DeepMind, a subsidiary of Google, has announced Gopher, a 280-billion-parameter AI model for natural language processing (NLP). Based on the Transformer architecture and trained on a 10.5 TB corpus called MassiveText, Gopher surpassed the current state of the art in 100 of 124 evaluation tasks.
The model and several experiments are described in a paper published on arXiv. As part of their broader AI research, the DeepMind team trained Gopher along with several smaller models to examine the strengths and weaknesses of large language models (LLMs). In particular, the researchers identified tasks where scaling up the model improved accuracy, such as reading comprehension and fact-checking, and tasks where it did not, such as logical and mathematical reasoning.
According to the DeepMind team, the research provides
a foundation for DeepMind’s language research going forward, particularly in areas that will have a bearing on how these models are evaluated and deployed…This approach is key to creating large language models that serve society, furthering our mission of solving intelligence to advance science and benefit humanity.
A language model predicts the next element, or token, in a text sequence given the previous tokens. When such a model is applied repeatedly, with its predicted output fed back in as input, it is said to be autoregressive. Autoregressive language models based on the Transformer deep-learning architecture have set new performance records on many NLP tasks, and several research teams have built very large versions of them. The best known is the 175B-parameter GPT-3, but models with even more parameters have been trained, such as the 178B-parameter Jurassic-1 and the 530B-parameter Megatron-Turing NLG.
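To illustrate the feedback loop described above, here is a minimal sketch of greedy autoregressive decoding in Python. The `model` callable is a hypothetical stand-in for any trained language model that returns next-token scores; it is not part of Gopher or any specific library.

```python
import numpy as np

def generate(model, prompt_tokens, max_new_tokens=20):
    """Greedy autoregressive decoding sketch.

    `model` is a hypothetical callable mapping a token sequence to a
    vector of logits (one score per vocabulary entry) for the next token.
    """
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        logits = model(tokens)               # score every candidate next token
        next_token = int(np.argmax(logits))  # greedy: take the highest-scoring token
        tokens.append(next_token)            # feed the prediction back as input
    return tokens
```

In practice, sampling strategies such as temperature or nucleus sampling are often used instead of the greedy argmax shown here, but the feedback loop is the same.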
Collecting large datasets to train such models is difficult. Some datasets, such as the Pile and C4, have been released as open source and contain documents scraped from sites such as Wikipedia. The DeepMind team was concerned that naively crawling the web could contaminate the training dataset with data from the benchmark test sets, since those are also available on the internet.
To prevent this, DeepMind developed a data-preparation pipeline and a custom training dataset called MassiveText. The pipeline filters out explicit content, performs document deduplication, and removes test-set data; MassiveText draws its content from C4, Wikipedia, GitHub, and other sources.
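The paper describes this pipeline in detail; the snippet below is only a rough sketch of two of the steps mentioned, exact-duplicate removal and test-set overlap filtering, using simple hashing and word n-gram overlap. The function names, n-gram size, and threshold are illustrative assumptions, not DeepMind's actual implementation.

```python
import hashlib

def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies hash alike."""
    return " ".join(text.lower().split())

def deduplicate(documents):
    """Drop exact duplicates by hashing normalized document text."""
    seen, unique = set(), []
    for doc in documents:
        digest = hashlib.sha256(normalize(doc).encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

def ngrams(text, n=13):
    """Return the set of word n-grams in a document."""
    words = normalize(text).split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def filter_test_overlap(documents, test_documents, threshold=0.1):
    """Remove training documents that share too many n-grams with benchmark test sets."""
    test_ngrams = set().union(*(ngrams(d) for d in test_documents))
    kept = []
    for doc in documents:
        doc_ngrams = ngrams(doc)
        overlap = len(doc_ngrams & test_ngrams) / max(len(doc_ngrams), 1)
        if overlap < threshold:
            kept.append(doc)
    return kept
```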
DeepMind trained six models of various sizes, ranging from 44 million parameters up to the 280B-parameter Gopher. They evaluated the models on a battery of 152 tasks, including 62 from BIG-bench and 57 from MMLU, covering language modeling, reading comprehension, fact-checking, question answering, and common sense. On 124 of these tasks they compared performance against the known state of the art, and Gopher set a new record on 100. The team also investigated the models' performance at different scales, concluding that general-knowledge and academic-subject tasks see large improvements from scale alone, but that scale offers less benefit for logical reasoning, common sense, and mathematics problems.
The closer we get to artificial intelligence, the more we raise the bar for what qualifies as AI (as we should). Gopher/GPT-3 are already much more accurate than the average human at technical information retrieval.