DeepMind, the Alphabet-owned AI lab, has published a paper proposing a family of machine learning models that aim to handle more tasks with far less of the expensive, time-consuming training such models usually require.
The benefit, according to the lab, is a massive cost saving, since training state-of-the-art models has become prohibitively expensive. The challenge is that combining visual learning with a language model is a difficult task.
Flamingo is a family of few-shot visual language models (VLMs), built as distinct systems rather than one monolithic model like GPT-3. The DeepMind team claims it outperforms all previous few-shot learning approaches, even models fine-tuned on orders of magnitude more data.
In a preprint of DeepMind’s academic paper on the subject, Flamingo is described as taking a combination of text and image inputs and producing a text-only answer, with some room for the models to do interpretation. DeepMind trained it on an in-house dataset built specifically for multimodal machine learning research: 43.3 million instances of unlabeled data gathered from the public internet, comprising 185 million images and 182 GB of text.
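As a purely illustrative sketch (not DeepMind’s actual schema), a single interleaved instance from such a web-scraped multimodal corpus might look something like this:

```python
# Hypothetical sketch of one interleaved web-page instance in a
# multimodal corpus like the one described above. Field names and
# structure are illustrative, not DeepMind's actual data format.
instance = {
    # Text in document order, with placeholders marking where
    # images appeared on the original page.
    "text": "Chinchillas are rodents <image> native to the Andes. "
            "Their dense fur <image> protects them from the cold.",
    # One entry per <image> placeholder, so the model sees images
    # and their surrounding text in context.
    "images": ["page_photo_1.jpg", "page_photo_2.jpg"],
}
```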
To put it another way, here’s an example of what Flamingo can do: given only a few examples of a task in its prompt (identify an animal, solve a math problem, count the types of animals in an image, and so on), each paired with the expected response, it can be shown a new image and asked to return the appropriate explanatory text, without any further training.
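A minimal sketch of such a few-shot prompt, using the chinchilla/shiba/flamingo example from the paper’s own figures; `load_image` and `model.generate` here are hypothetical placeholders, since Flamingo has no public API:

```python
# Few-shot prompting with interleaved images and text. The helpers
# `load_image` and `model.generate` are hypothetical placeholders;
# this is not a published Flamingo interface.
prompt = [
    load_image("chinchilla.jpg"), "This is a chinchilla. They are mainly found in Chile.",
    load_image("shiba.jpg"), "This is a shiba. They are very popular in Japan.",
    load_image("mystery_bird.jpg"), "This is",  # the model completes this line
]

# The model conditions on the interleaved examples and returns text only,
# e.g. "a flamingo. They are found in the Caribbean and South America."
completion = model.generate(prompt)
```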
DeepMind built Flamingo using its own pre-trained Chinchilla language model, which has 70 billion parameters. DeepMind “fused” the Chinchilla LM with visual learning elements “by adding novel architecture components in-between” them that keep the pretrained weights isolated and frozen, resulting in the largest Flamingo model at 80 billion parameters.
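Concretely, the paper interleaves trainable gated cross-attention layers (which it calls GATED XATTN-DENSE) between the frozen Chinchilla layers; a tanh gate initialized to zero means the combined model starts out behaving exactly like the unmodified LM. Below is a simplified PyTorch sketch of one such layer, with illustrative dimensions, not DeepMind’s implementation:

```python
import torch
import torch.nn as nn

class GatedCrossAttentionBlock(nn.Module):
    """Simplified sketch of a gated cross-attention ("GATED XATTN-DENSE")
    layer: trainable attention from text tokens into visual features,
    inserted between frozen pretrained LM blocks."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        # tanh gates start at zero, so at initialization this block is an
        # identity mapping and the frozen LM's behavior is preserved.
        self.attn_gate = nn.Parameter(torch.zeros(1))
        self.ffn_gate = nn.Parameter(torch.zeros(1))

    def forward(self, text: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
        # text:   (batch, text_len, dim) token features from the frozen LM
        # visual: (batch, vis_len, dim) features from the vision encoder
        attn_out, _ = self.cross_attn(query=text, key=visual, value=visual)
        text = text + torch.tanh(self.attn_gate) * attn_out
        text = text + torch.tanh(self.ffn_gate) * self.ffn(text)
        return text
```

In the full model, the vision encoder’s output is first compressed into a fixed number of tokens by a Perceiver-style resampler before being fed as keys and values to these layers; only those new components are trained.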
“A single Flamingo model can achieve state-of-the-art results on a wide range of tasks, competing with approaches that require task-specific fine-tuning on orders of magnitude more examples and frequently require hand-engineered ‘tricks’,” according to DeepMind’s Flamingo contributors.
The potential applications of this machine learning model are obvious, and they aren’t limited to what Flamingo can do with data: the approach could also benefit the state of machine learning in general, which is grappling with the rising energy and computing requirements of training newer models. A single Google BERT training run, according to one estimate, emitted as much carbon as a trans-American jet flight.
DeepMind didn’t mention the energy costs associated with training a Flamingo model, though it did describe the model as “computationally expensive to train.”
On the other hand, the paper notes, a trained Flamingo model can be quickly adapted to low-resource settings and tasks, such as evaluating data for PII, social biases, stereotypes, and other elements that feed the frequently encountered problem of AI bias.
Despite this, Flamingo may not be ready for prime time, and not because of flaws in the model itself: DeepMind admits that few-shot learning has inherent limitations, such as leaving too many variables unaccounted for when the set of examples is so small.
“There is no ‘golden’ few-shot method that will work in all situations,” the Flamingo researchers said.