Multimodal neurons can respond to a cluster of abstract concepts centred around a common high-level theme rather than a specific visual feature.
In a major breakthrough, researchers at OpenAI have discovered ‘multimodal neurons’ inside one of their artificial neural networks — individual units that behave much like the multimodal neurons documented in the human brain.
The researchers found that these neurons respond to a cluster of abstract concepts centred around a common high-level theme rather than to a specific visual feature. Like their biological counterparts, they respond to concepts such as emotions, animals and famous people, whether presented as photographs, drawings or text.
The researchers wrote that these neurons in CLIP respond to the same concept whether it is presented literally, symbolically or conceptually.
The multimodal neurons were discovered in CLIP, a model that connects text and images and learns visual concepts from natural-language supervision. This general-purpose vision system matches the performance of a ResNet-50 while outperforming existing vision systems on some of the most challenging datasets. For instance, one neuron, dubbed the ‘Spider-Man’ neuron, responds to an image of a spider, the text ‘spider’, and the comic book character ‘Spider-Man’.
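CLIP’s text-image matching boils down to comparing embeddings in a shared space. As a rough sketch of that scoring step — the embedding vectors below are made up for illustration, standing in for the high-dimensional outputs of CLIP’s real image and text encoders — it might look like:

```python
import numpy as np

def cosine_similarity(a, b):
    # CLIP scores an image against candidate captions via the cosine
    # similarity of their L2-normalised embeddings.
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.dot(a, b))

# Hypothetical 4-d embeddings standing in for CLIP's real encoder outputs.
image_embedding = np.array([0.9, 0.1, 0.0, 0.2])
captions = {
    "a photo of a spider": np.array([0.8, 0.2, 0.1, 0.3]),
    "a photo of a dog":    np.array([0.1, 0.9, 0.4, 0.0]),
}

scores = {text: cosine_similarity(image_embedding, emb)
          for text, emb in captions.items()}
best = max(scores, key=scores.get)
print(best)
```

The same mechanism gives CLIP its zero-shot behaviour: any set of natural-language captions can act as the candidate labels, with no retraining.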
The researchers found multimodal neurons in several CLIP models of varying sizes, but focused their study on the mid-sized RN50x4 model. They employed two tools to understand the model’s activations:
- Feature visualisation, which maximises a neuron’s firing through gradient-based optimisation of the input.
- Dataset examples, which look at the distribution of maximally activating images for a neuron across a dataset.
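Both tools above can be sketched in miniature. In the toy example below, a single linear unit stands in for a real CLIP neuron (its weights and the activation values are invented for illustration); real feature visualisation backpropagates through the whole network rather than a single dot product:

```python
import numpy as np

def feature_visualisation(neuron_weights, steps=200, lr=0.1):
    # Gradient-based optimisation of the *input* to maximise a neuron's
    # firing. For a toy linear unit w . x, the gradient w.r.t. the input
    # is simply w; we renormalise each step to keep the input bounded.
    rng = np.random.default_rng(0)
    x = rng.normal(size=neuron_weights.shape)
    for _ in range(steps):
        grad = neuron_weights          # d(w . x)/dx for a linear unit
        x = x + lr * grad
        x = x / np.linalg.norm(x)
    return x

def top_dataset_examples(activations, k=3):
    # "Dataset examples": rank images by how strongly they activate the
    # neuron and return the indices of the top-k most activating ones.
    return np.argsort(activations)[::-1][:k]

w = np.array([1.0, -2.0, 0.5])         # invented neuron weights
x_star = feature_visualisation(w)       # converges to the weight direction

acts = np.array([0.1, 3.2, -0.5, 2.7, 1.1])  # invented per-image activations
top = top_dataset_examples(acts)
print(top)
```

For the linear toy unit, the optimised input simply aligns with the weight vector; the interesting structure in CLIP comes from running the same procedure through many nonlinear layers.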
The researchers carried out a series of carefully constructed experiments to probe these neurons’ capabilities in the model’s convolutional layers, each of which contains thousands of neurons. “For our preliminary analysis, we looked at feature visualisations, the dataset examples that most activated the neuron, and the English words that most activated the neuron when rastered as images,” said the researchers. Many of these neurons deal with sensitive topics, from political figures to emotions.
The experiment revealed an incredible diversity of features such as region neurons, person neurons, emotion neurons, art style neurons, time neurons, abstract neurons, colour neurons and more.
The researchers found that the majority of neurons in CLIP are readily interpretable. “From an interpretability perspective, these neurons can be seen as extreme examples of ‘multi-faceted neurons’ which respond to multiple distinct cases. Looking to neuroscience, they might sound like ‘grandmother neurons,’ but their associative nature distinguishes them from how many neuroscientists interpret that term,” stated the researchers.