Last year, MIT researchers announced that they had built "liquid" neural networks for real-world safety-critical tasks such as flying and driving. Inspired by the brains of small species, these are robust, flexible machine learning models that learn on the job and can adapt to changing conditions. Their adaptability could improve decision-making for many jobs that involve time-series data, including heart and brain monitoring, weather forecasting, and stock pricing.
However, as the number of neurons and synapses in these models grows, they become computationally expensive and require clunky computer programs to solve the intricate math at their core. Like many physical phenomena, this math gets harder to solve as it scales, because reaching a solution means computing many small steps.
To unlock a new class of fast and efficient artificial intelligence systems, the same team has now found a way to ease this bottleneck by solving the differential equation behind the interaction of two neurons through synapses. The resulting models are orders of magnitude faster and more scalable than liquid neural networks, yet they share the same flexible, causal, robust, and explainable properties. Because they remain compact and adaptable even after training, unlike many traditional models, these networks could be used for any task that involves gaining insight into data over time.
The “closed-form continuous-time” (CfC) neural network models performed better than their state-of-the-art counterparts on a variety of tasks, including event-based sequential image processing, modelling the physical dynamics of a simulated walker robot, and human activity recognition from motion sensors. For instance, the new models were 220 times faster on a sample of 8,000 patients for a medical prediction task.
According to MIT Professor Daniela Rus, director of the Computer Science and Artificial Intelligence Laboratory (CSAIL) and senior author on the new paper, "The new machine-learning models we call 'CfCs' replace the differential equation defining the computation of the neuron with a closed-form approximation, preserving the beautiful properties of liquid networks without the need for numerical integration. CfC models are efficient to train and predict, causal, compact, and explainable. They pave the path for reliable machine learning for applications that require it for safety."
Differential equations let us compute the state of the world, or of a phenomenon, as it evolves, but only step by step through time. To model natural phenomena over time and understand past and present behavior, such as recognizing human activity or tracing a robot's path, the team reached into their bag of mathematical tricks and found an ideal solution: a "closed-form" solution that describes an entire system in a single computing step.
Their models allow one to compute this equation at any point in the past or the future. Not only that, but because the differential equation doesn’t need to be solved step-by-step, computing is much faster.
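To make the contrast between step-by-step integration and a single closed-form evaluation concrete, here is a minimal sketch on a toy problem that is not the CfC neuron model. It assumes a hypothetical leaky integrator dx/dt = -(x - I)/τ with constant input I, whose exact solution is x(t) = I + (x0 - I)·exp(-t/τ). The function names and parameter values are illustrative assumptions.

```python
import math


def euler_solution(x0, inp, tau, t_end, dt):
    """Approximate x(t_end) by taking many small explicit Euler steps."""
    x = x0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        # Toy dynamics: dx/dt = -(x - inp) / tau
        x += dt * (-(x - inp) / tau)
    return x, steps


def closed_form_solution(x0, inp, tau, t):
    """Exact solution of the same toy ODE, evaluated in one step at any time t."""
    return inp + (x0 - inp) * math.exp(-t / tau)


x0, inp, tau = 0.0, 1.0, 0.5
numeric, n_steps = euler_solution(x0, inp, tau, t_end=2.0, dt=1e-4)
exact = closed_form_solution(x0, inp, tau, t=2.0)
print(f"Euler after {n_steps} steps: {numeric:.6f}")
print(f"Closed form, 1 evaluation: {exact:.6f}")
```

The Euler loop needs 20,000 small steps to reach t = 2, and its accuracy depends on the step size. The closed form reaches the same point in one evaluation and can be called at any t, including earlier times.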
Imagine an end-to-end neural network that takes driving input from a camera mounted on a car and is trained to produce outputs such as the car's steering angle. In 2020, the team used liquid neural networks with 19 nodes to build a car that could be driven by those 19 neurons plus a small perception module. Each node in that system is described by a differential equation. Because the closed-form solution is a good approximation of the system's actual dynamics, substituting it inside this network closely reproduces the desired behavior. As a result, the team can solve the problem with even fewer neurons, making the process faster and less computationally expensive.
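The article does not spell out the CfC equation. The published CfC paper describes a neuron's state roughly as x(t) = σ(-f·t) ⊙ g + (1 - σ(-f·t)) ⊙ h, where f, g, and h are learned neural-network heads of the current state and input, and t is elapsed time. The sketch below illustrates only that gated form, assuming single-layer heads with random, untrained weights. The class name ToyCfCCell, the layer sizes, and the input sequence are hypothetical, and this is not the authors' implementation.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class ToyCfCCell:
    """Hypothetical toy cell: new_x = sig(-f*t)*g + (1 - sig(-f*t))*h, elementwise.

    f, g, h are single-layer maps of [state, input] with random, untrained weights.
    """

    def __init__(self, n_state, n_input, seed=0):
        rng = random.Random(seed)
        n_in = n_state + n_input

        def layer():
            return [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_state)]

        self.w_f, self.w_g, self.w_h = layer(), layer(), layer()

    @staticmethod
    def _linear(w, v):
        return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

    def step(self, x, inp, t):
        """Advance the state by elapsed time t in one evaluation, with no ODE solver."""
        v = list(x) + list(inp)
        f = self._linear(self.w_f, v)
        g = [math.tanh(z) for z in self._linear(self.w_g, v)]
        h = [math.tanh(z) for z in self._linear(self.w_h, v)]
        new_x = []
        for fi, gi, hi in zip(f, g, h):
            gate = sigmoid(-fi * t)  # time-dependent gate between the two heads
            new_x.append(gate * gi + (1.0 - gate) * hi)
        return new_x


cell = ToyCfCCell(n_state=4, n_input=2)
# Irregularly spaced (elapsed_time, input) pairs; each gap is used directly.
sequence = [(0.1, [0.5, -0.2]), (0.7, [0.1, 0.3]), (0.05, [-0.4, 0.9])]
state = [0.0] * 4
for dt, inp in sequence:
    state = cell.step(state, inp, dt)
print(state)
```

Because elapsed time enters the update directly, irregularly sampled time series can be processed one sample at a time without numerical integration between samples.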
These models can take time-series inputs (events that occurred over time) and use them for classification, driving a car, controlling a humanoid robot, and forecasting financial and medical events, among other things. Gains in computation speed sometimes come with a trade-off, but this approach can also improve accuracy, robustness, and performance across all of these applications.
Solving this equation will significantly advance our understanding of both natural and artificial intelligence systems. A closed-form description of how neurons and synapses communicate would make it possible to build computational models of brains with billions of cells, which is currently infeasible because of the enormous computational complexity of neuroscience models. According to Ramin Hasani, an MIT CSAIL Research Affiliate and the paper's first author, the closed-form equation might make such large-scale simulations possible and thus open new research directions in the quest to understand intelligence.
There is also some evidence that Liquid CfC models can learn tasks from visual inputs in one environment and then transfer those skills to an entirely different environment without additional training. This capability, known as out-of-distribution generalization, is one of the most important open problems in artificial intelligence.
Neural network systems based on differential equations are hard to solve and to scale to, say, millions or billions of parameters. Obtaining a description of how neurons interact with one another, not just the threshold but the physical dynamics between cells, makes it possible to build larger-scale neural networks, says Hasani. Because this framework can help tackle increasingly complex machine learning tasks and enable better representation learning, it should serve as a fundamental building block of any future embedded intelligence system.
Sildomar Monteiro, AI and Machine Learning Group Lead at Aurora Flight Sciences, a Boeing company, who was not involved in this paper, notes that recent neural network architectures such as neural ODEs and liquid neural networks have hidden layers made of specific dynamical systems that represent infinite latent states, instead of explicit stacks of layers. These implicitly defined models have achieved state-of-the-art performance while requiring far fewer parameters than conventional architectures.
However, their practical use has been limited by the substantial computational cost of training and inference. This paper shows a significant improvement in computational efficiency for this class of neural networks, Monteiro continues, and it could open up a wider range of applications in safety-critical commercial and defense systems.
The paper was co-authored by Hasani and Mathias Lechner, a postdoc at MIT CSAIL, under the supervision of Rus. Other contributors included Max Tschaikowski, an associate professor of computer science at Aalborg University in Denmark, and Gerald Teschl, a professor of mathematics at the University of Vienna. Lucas Liebenwein, SM ’18, PhD ’21, Aaron Ray, an electrical engineering and computer science PhD student at MIT, and Lucas Liebenwe