Some thoughts on deep learning

We are putting another year of remarkable progress in deep learning for artificial intelligence (AI) behind us. This year was full of notable advances, controversies, and, of course, disagreements. As we wind up 2022 and get ready for what 2023 has in store, here are some of the most important broad trends that characterised the year in deep learning.

1. Scale is still a crucial consideration

The drive to build larger neural networks is one deep learning trend that hasn’t changed in recent years. Scaling neural networks is made possible by the availability of compute resources, specialised AI hardware, large datasets, and the development of scale-friendly architectures like the transformer model.

Scaling neural networks to larger sizes is currently helping companies achieve better results. In the past year, Google announced the Pathways Language Model (PaLM), with 540 billion parameters, and the Generalist Language Model (GLaM), with up to 1.2 trillion parameters. DeepMind announced Gopher, a 280 billion-parameter large language model (LLM). Microsoft and Nvidia also released Megatron-Turing NLG, a 530 billion-parameter LLM.

One of the fascinating characteristics of scale is emergent abilities, where larger models succeed at tasks that were impossible for smaller ones. This phenomenon has been especially interesting in LLMs, where larger models show promising performance on a wider variety of tasks and benchmarks.

However, it is important to note that even in the biggest models, some of deep learning’s core issues remain unresolved (more on this in a bit).

2. Unsupervised learning is still effective

Many successful deep learning applications rely on supervised learning, which requires humans to annotate training examples. However, most of the data on the internet does not come with the clean labels that supervised learning needs, and data annotation is slow and expensive, which creates bottlenecks. For this reason, researchers have long pursued advances in unsupervised learning, in which deep learning models are trained without human-annotated data.

In recent years, this field has made enormous strides, particularly in LLMs, which are typically trained on massive sets of raw data gathered from across the internet. While LLMs continued to advance in 2022, we also saw unsupervised learning approaches gain traction in other areas.
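To make concrete how raw text can provide its own training signal, here is a minimal sketch in Python. It assumes a next-word prediction objective and plain whitespace tokenisation, both simplifications chosen for illustration; the function name `next_token_pairs` and the `context_size` parameter are hypothetical and not taken from any particular model or library.

```python
def next_token_pairs(text, context_size=3):
    """Turn raw text into (context, target) training pairs without human labels.

    Assumption: whitespace tokenisation and a fixed-size context window,
    used here only to illustrate the idea.
    """
    tokens = text.split()
    pairs = []
    for i in range(1, len(tokens)):
        # The words preceding position i form the input...
        context = tokens[max(0, i - context_size):i]
        # ...and the word at position i is the "label", taken from the text itself.
        pairs.append((context, tokens[i]))
    return pairs


pairs = next_token_pairs("the wind rustles through the tree")
for context, target in pairs:
    print(context, "->", target)
```

Every training target here comes from the text itself, so no one has to annotate anything, which is why this style of training can scale to very large collections of raw data.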

For instance, this year saw incredible progress in text-to-image models. Models like OpenAI’s DALL-E 2, Google’s Imagen, and Stability AI’s Stable Diffusion have shown the power of unsupervised learning. Unlike earlier text-to-image models, which needed carefully annotated pairs of images and descriptions, these models use enormous datasets of loosely captioned images that already exist on the internet. The sheer size of their training datasets (only feasible because manual labelling is not required) and the variety of captioning styles allow these models to find a wide range of complex relationships between textual and visual information. As a result, they are far more flexible when generating images for different descriptions.

3. Multimodality advances significantly

Another intriguing feature of text-to-image generators is that they combine multiple data types in a single model. Being able to handle several modalities allows deep learning models to take on much more complicated tasks.

Multimodality is crucial to the kind of intelligence seen in both humans and animals. For instance, when you see a tree and hear the wind rustling through its branches, your mind quickly links the two together. Similarly, when you hear the word “tree,” you may instantly picture a tree, recall the scent of pine after a rainstorm, or remember other past experiences.

Clearly, multimodality has contributed significantly to the flexibility of deep learning systems. This was perhaps best demonstrated by DeepMind’s Gato, a deep learning model trained on a range of data types including images, text, and proprioception data. Gato performed respectably on a variety of tasks, including playing games, controlling a robotic arm, and captioning images. This contrasts with traditional deep learning models, which are designed to perform a single task.

Some researchers have gone so far as to say that a system like Gato is all we need to create artificial general intelligence (AGI). While many scientists disagree with this view, multimodality has undoubtedly contributed to significant advances in deep learning.

4. Fundamental issues with deep learning still exist

Deep learning has made remarkable advances, but some of its challenges remain unsolved. Among them are causality, compositionality, common sense, reasoning, planning, intuitive physics, abstraction, and analogy-making.

These are some of the mysteries of intelligence that researchers from various disciplines are still trying to solve. Deep learning approaches based purely on scale and data have helped make incremental progress on some of these problems but have fallen short of providing a definitive solution.

Larger LLMs, for instance, can maintain coherence and consistency over longer passages of text. However, they fall short on tasks that demand careful, step-by-step planning and reasoning.

Similarly, text-to-image generators produce stunning images but make basic mistakes when asked to draw scenes that require compositionality or have intricate descriptions.

Numerous scientists, including some of the pioneers of deep learning, are debating and researching these issues. Prominent among them is Yann LeCun, the Turing Award-winning inventor of convolutional neural networks (CNNs), who recently wrote a lengthy essay on the limitations of LLMs that learn from text alone. LeCun is working on a deep learning architecture that learns world models and could address some of the problems the field currently faces.

Deep learning has come a long way. But the more progress we make, the more aware we become of how difficult it is to build truly intelligent systems. Without a doubt, next year will be just as exciting as this one.
