Unlocking human-compatible AI

What should be the next step in bridging the gap between natural and artificial intelligence? The answer is up for debate among scientists and researchers. Yann LeCun, Chief AI Scientist at Meta and 2018 Turing Award winner, is betting on self-supervised learning: machine learning models that can be trained without human-labeled examples.

LeCun has been thinking about and discussing self-supervised and unsupervised learning for years. As his research and the fields of AI and neuroscience have advanced, his vision has converged around several promising concepts and trends.

LeCun recently spoke at a Meta AI event about possible paths toward human-level AI, remaining challenges, and the impact of AI advances.

World models are central to effective learning

Deep learning’s known limitations include its need for enormous amounts of training data and its lack of robustness when dealing with novel situations. The latter, a failure of “out-of-distribution generalization,” is also described as sensitivity to edge cases.

These are the kinds of problems that humans and animals learn to solve from a young age. You don’t have to drive off a cliff to know your car will crash. We know that when one object obscures another, the hidden object continues to exist even though it can’t be seen. We also know that if we hit a ball with a club, it will fly in the direction of the swing.

The majority of these things we learn without being explicitly instructed, purely through observation and action in the world. During the first few months of our lives, we form a “world model” and learn about gravity, dimensions, physical properties, causality, and other concepts. This model assists us in developing common sense and making accurate predictions about what will happen in the world around us. We then utilize these fundamental building blocks for acquiring more complex knowledge.

Current AI systems lack this commonsense knowledge, which explains why they are data-hungry, need labeled examples, and are extremely rigid and sensitive to out-of-distribution data.

The question LeCun is attempting to answer is: how can we get machines to learn world models mostly through observation, and accumulate the enormous body of knowledge that babies acquire just by watching the world?

Self-supervised learning

Deep learning and artificial neural networks, according to LeCun, will play a significant role in the future of AI. More specifically, he advocates for self-supervised learning, a branch of machine learning that reduces the need for human input and guidance in training neural networks.

Supervised learning, in which models are trained on labeled examples, is the more popular branch of ML. While supervised learning has proven very effective in a variety of applications, its need for annotation by an outside actor (mostly humans) is a bottleneck. First, labeling training examples for supervised ML models requires significant human effort. Second, supervised ML models cannot improve on their own, because they need outside assistance to annotate new training examples.

Self-supervised ML models, on the other hand, learn by observing the world, recognizing patterns, making predictions (and sometimes acting and intervening), and updating their knowledge based on how their predictions match the outcomes they observe in the world. They function like a supervised learning system that does its own data annotation.

Self-supervised learning is far more in tune with how humans and animals learn. We humans do a lot of supervised learning, but we acquire most of our fundamental and commonsense skills through self-supervised learning.

Because only a small portion of the data available is annotated, self-supervised learning is a highly sought-after goal in the ML community. The ability to train ML models on massive amounts of unlabeled data has numerous applications.

Self-supervised learning has made its way into several areas of machine learning in recent years, including large language models. Essentially, a self-supervised language model is trained by being given text excerpts with some words removed. The model must attempt to predict the missing components. Because the missing parts are included in the original text, this process requires no manual labeling and can scale to very large corpora of text, like Wikipedia and news websites. The trained model will learn accurate representations of text structure. It can be used for tasks like text generation or fine-tuned for downstream tasks like question answering.
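To make the idea concrete, here is a minimal sketch of masked-token training in PyTorch. The model size, vocabulary, masking rate, and random token batch are toy placeholders, not the recipe of any particular language model:

```python
# Minimal sketch of masked-token self-supervised training (PyTorch assumed).
# Vocabulary, model sizes, and the token batch are toy placeholders.
import torch
import torch.nn as nn

vocab_size, embed_dim, mask_id = 1000, 64, 0

model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.TransformerEncoder(
        nn.TransformerEncoderLayer(embed_dim, nhead=4, batch_first=True),
        num_layers=2,
    ),
    nn.Linear(embed_dim, vocab_size),  # predict a token at every position
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

tokens = torch.randint(1, vocab_size, (8, 32))  # a batch of token sequences
mask = torch.rand(tokens.shape) < 0.15          # hide ~15% of the tokens
inputs = tokens.masked_fill(mask, mask_id)

logits = model(inputs)
# The loss is computed only on the masked positions. The "labels" are the
# original tokens themselves, so no human annotation is needed.
loss = nn.functional.cross_entropy(logits[mask], tokens[mask])
loss.backward()
optimizer.step()
```

The key point is in the last few lines: the training targets come from the raw text itself, which is why this setup scales to unlabeled corpora.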

Researchers have also applied self-supervised learning to computer vision tasks such as medical imaging. Here, the technique is known as “contrastive learning”: a neural network is trained to create latent representations of unlabeled images. For instance, during training, the model is given various copies of an image with different modifications (e.g., rotations, crops, zooms, color changes, different angles of the same object). The network’s parameters are adjusted until its output is consistent across the different variations of the same image. The model can then be fine-tuned on a downstream task using far fewer labeled images.
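A rough sketch of this idea, loosely following a SimCLR-style contrastive loss (PyTorch and torchvision assumed; the ResNet encoder, augmentations, and temperature are illustrative choices, not a specific published recipe):

```python
# Hedged sketch of contrastive pretraining on unlabeled images.
import torch
import torch.nn.functional as F
from torchvision import transforms, models

augment = transforms.Compose([
    transforms.RandomResizedCrop(224),      # random crops and zooms
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4),  # color modifications
])

encoder = models.resnet18(weights=None)
encoder.fc = torch.nn.Linear(encoder.fc.in_features, 128)  # latent head

def contrastive_loss(z1, z2, temperature=0.5):
    """Pull the two views of each image together in latent space and
    push apart the views of different images in the batch."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2]), dim=1)
    sim = z @ z.t() / temperature
    # Mask out self-similarity so a view is never its own positive.
    sim = sim.masked_fill(torch.eye(2 * n, dtype=torch.bool), float("-inf"))
    # Each view's positive target is the other view of the same image.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
    return F.cross_entropy(sim, targets)

images = torch.rand(16, 3, 224, 224)  # stand-in for a batch of unlabeled images
z1, z2 = encoder(augment(images)), encoder(augment(images))
loss = contrastive_loss(z1, z2)
loss.backward()
```

No labels appear anywhere: the only supervisory signal is the requirement that two augmented views of the same image map to nearby embeddings.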


High-level abstractions

Scientists have recently experimented with pure self-supervised learning on computer vision tasks. In this case, the model must predict the occluded parts of an image or the next frame in a video.

According to LeCun, this is a very difficult problem. Images are extremely high-dimensional spaces: the pixels in an image can be arranged in nearly infinite ways. Humans and animals are good at foreseeing what will happen in their environment, but they do not need to forecast the world at the pixel level. We use high-level abstractions and prior knowledge to intuitively filter the solution space and narrow it down to a few plausible outcomes.


For example, when we see a video of a flying ball, we expect it to continue on its path in the following frames. We expect it to bounce back if there’s a wall in front of it. We know this because we understand intuitive physics and how rigid and soft bodies work.

Similarly, when watching a person speak, we expect their facial features to change across frames. As they talk, their mouth, eyes, and brows will move, and they may tilt or nod their head slightly. But we don’t expect their mouth and ears to suddenly switch places. This is because we have high-level mental representations of faces and are familiar with the constraints that govern the human body.

Self-supervised learning with these types of high-level abstractions, according to LeCun, will be critical in developing the kind of robust world models required for human-level AI. An important component of the solution LeCun is developing is the Joint Embedding Predictive Architecture (JEPA). JEPA models learn high-level representations that capture the dependencies between two data points, such as two video segments that follow each other. JEPA replaces contrastive learning with “regularized” techniques that can extract high-level latent features from the input and discard irrelevant data. This enables the model to make inferences about high-dimensional data, such as visual data.

JEPA modules can be stacked on top of one another to predict and make decisions at various spatial and temporal scales.
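To make the contrast with pixel-level prediction concrete, here is a minimal, speculative sketch of the joint-embedding-predictive idea: encode two consecutive segments, predict one embedding from the other, and regularize against collapse. The encoders, the predictor, and the VICReg-style variance penalty are stand-ins for the “regularized” techniques LeCun alludes to, not his actual design:

```python
# Speculative sketch: predict the *representation* of the next video
# segment rather than its pixels (PyTorch assumed; toy shapes throughout).
import torch
import torch.nn as nn
import torch.nn.functional as F

embed_dim = 128
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, embed_dim))
predictor = nn.Sequential(nn.Linear(embed_dim, embed_dim), nn.ReLU(),
                          nn.Linear(embed_dim, embed_dim))

segment_a = torch.rand(32, 3, 64, 64)  # a video segment
segment_b = torch.rand(32, 3, 64, 64)  # the segment that follows it

s_a, s_b = encoder(segment_a), encoder(segment_b)

# The prediction loss lives in latent space, so irrelevant pixel-level
# detail can simply be discarded by the encoder.
prediction_loss = F.mse_loss(predictor(s_a), s_b.detach())

# A regularizer keeps the embeddings from collapsing to a constant vector
# (a VICReg-style variance term, used here purely as an example).
std = s_a.std(dim=0)
collapse_penalty = F.relu(1.0 - std).mean()

loss = prediction_loss + collapse_penalty
loss.backward()
```

The design choice this illustrates: because the loss compares embeddings rather than pixels, the model is never punished for failing to predict details (lighting flicker, leaf movement) that no abstraction could capture.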


Modular architecture

LeCun also discussed a modular architecture for human-level AI at the Meta AI event. The world model is central to this architecture, but it must also work in concert with other modules. A perception module receives and processes sensory information from the outside world. An actor module transforms perceptions and predictions into actions. A short-term memory module records actions and perceptions and fills in the model’s information gaps. A cost module helps determine the intrinsic (or hardwired) costs of actions as well as the task-specific value of future states.

Furthermore, there is a configurator module that alters all other modules based on the particular tasks that the AI system wishes to perform. The configurator is critical because it directs the model’s limited attention and computation resources toward information relevant to its current tasks and goals. For instance, if we’re participating in or watching a basketball game, our perception system will be focused on particular attributes and components of the world (e.g., the ball, players, court limits, etc.).

As a result, our world model will attempt to predict hierarchical features more relevant to the task at hand (e.g., where will the ball land, to whom will the ball be passed, will the player holding the ball shoot or dribble?) and eliminate non-essential features (e.g., actions of spectators, the movements and sounds of objects outside the basketball court).
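To show how these modules could fit together, here is a schematic, deliberately toy rendering in Python. The module names follow the talk, but every interface, feature name, and cost value below is a guess made purely for illustration:

```python
# Schematic sketch of the modular agent described above; all interfaces
# and values are hypothetical illustrations, not LeCun's specification.
from dataclasses import dataclass, field

@dataclass
class Agent:
    memory: list = field(default_factory=list)  # short-term memory module

    def perceive(self, observation):            # perception module
        return {"ball": observation.get("ball"),
                "players": observation.get("players")}

    def predict(self, state, task):             # world model module
        # Predict only the features the configurator marked as relevant.
        return {k: v for k, v in state.items() if k in task["relevant"]}

    def cost(self, prediction, action, task):   # cost module
        # Intrinsic (hardwired) cost plus the task-specific value of the
        # state this action is expected to lead to.
        return task["intrinsic_cost"] + task["value"](prediction, action)

    def act(self, observation, task):           # actor module
        state = self.perceive(observation)
        prediction = self.predict(state, task)
        self.memory.append((state, prediction))  # record for gap-filling
        return min(task["actions"],
                   key=lambda a: self.cost(prediction, a, task))

def configure(goal):
    """Configurator: points the other modules at task-relevant features
    and supplies the task-specific value function."""
    return {
        "relevant": {"ball", "players"},  # ignore e.g. the spectators
        "intrinsic_cost": 0.0,
        "value": lambda pred, action: {"shoot": 0.3, "pass": 0.5}.get(action, 1.0),
        "actions": ["shoot", "pass", "dribble"],
    }

agent = Agent()
task = configure("basketball")
print(agent.act({"ball": "free", "players": 10, "crowd": "loud"}, task))  # -> "shoot"
```

Note how the configurator shapes everything downstream: the “crowd” feature is dropped at the world-model stage, mirroring how our attention ignores the spectators during a game.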


LeCun believes that each of these modules can learn its task separately and communicate with the others via high-level abstractions. This is similar to the brains of humans and animals, which have a modular architecture (different cortical areas, the hypothalamus, basal ganglia, amygdala, brain stem, hippocampus, and so on), with each region connected to the others and gradually updating its neural structure with the organism’s experience.

Role of human-level AI

The majority of discussions about human-level AI revolve around machines that replace natural intelligence and can perform any task that a human can. Naturally, these discussions lead toward topics like technological unemployment, singularity, runaway intelligence, and robot invasions. The future of artificial general intelligence is hotly debated among scientists.

Can there be artificial intelligence without the need to survive and reproduce, which has been the driving force behind the evolution of natural intelligence? Is consciousness required for AGI? Will AGI have its own desires and goals? Is it possible to create a brain in a vat, a mind without a physical shell? These are some of the philosophical questions that remain unanswered as scientists make incremental progress toward the long-desired goal of creating thinking machines.

However, a more practical research direction is to develop AI that is “compatible with human intelligence.” This is the type of AI that may not be able to create the next great invention or write a compelling novel on its own, but it will undoubtedly help humans become more creative and productive and find solutions to complex problems. It will most likely make our roads safer, healthcare systems more effective, weather prediction more reliable, search results more relevant, robots less dumb, and virtual assistants more helpful.

When asked about the most exciting aspect of the future of human-level AI, LeCun said he thought it was the amplification of human intelligence: the fact that every human could do more, be more productive and creative, and spend more time on fulfilling activities, which is the story of technological evolution.
