Machine learning provides powerful tools for researchers to identify and predict patterns and behaviors, as well as to learn, optimize, and perform tasks. These applications range from vision systems on autonomous vehicles and social robots, to smart thermostats, to wearable and mobile devices like smartwatches and apps that can monitor changes in health. While these algorithms and their architectures are becoming more powerful and efficient, they typically require huge amounts of memory, computation, and data to train and to make inferences.
At the same time, researchers are working to reduce the size and complexity of the devices these algorithms can run on, all the way down to a microcontroller unit (MCU) found in billions of Internet of Things (IoT) devices. An MCU is a minicomputer with limited memory, housed in a compact integrated circuit, that lacks an operating system and runs simple commands. These relatively inexpensive edge devices require low power, processing, and bandwidth, and they offer many opportunities to inject AI technology to expand their usefulness, increase privacy, and democratize their use, a field called TinyML.
Now, an MIT team working on TinyML in the MIT-IBM Watson AI Lab and the research group of Song Han, an assistant professor in the Department of Electrical Engineering and Computer Science (EECS), has developed a technique to shrink the amount of memory required even further, while improving image recognition performance on live video.
“Our new technique can do a lot more and paves the way for tiny machine learning on edge devices,” says Han, who develops TinyML software and hardware.
To make TinyML more efficient, Han and his colleagues at EECS and the MIT-IBM Watson AI Lab studied how memory is used on microcontrollers running various convolutional neural networks (CNNs). CNNs are biologically inspired models of the neurons in the brain and are widely used to evaluate and identify visual features in images, such as a person walking through a frame of video. In their study, they discovered an imbalance in memory usage that front-loaded work on the computer chip and created a bottleneck. By developing a new neural architecture and inference technique, the team was able to alleviate the problem and reduce peak memory usage by four to eight times.
The team then deployed it on their own TinyML vision system, equipped with a camera and capable of detecting people and objects, creating its next generation, dubbed MCUNetV2. Compared to other machine learning methods running on microcontrollers, MCUNetV2 outperformed them with high recognition accuracy, opening the door to additional vision applications that were not previously possible.
The results will be presented this week at the Neural Information Processing Systems (NeurIPS) conference in a paper by Han, lead author and PhD student Ji Lin, postdoc Wei-Ming Chen, PhD student Han Cai, and MIT-IBM Watson AI Lab research scientist Chuang Gan.
A design for memory efficiency and redistribution
TinyML offers numerous advantages over the deep machine learning that happens on larger devices such as remote servers and smartphones. These, Han notes, include privacy, since the data are not transferred to the cloud for computing but processed on the local device; robustness, as the computation is fast and the latency is low; and low cost, as IoT devices cost roughly $1 to $2. Additionally, some larger, more traditional AI models can emit as much carbon as five cars do over their lifetimes, require many GPUs, and cost billions of dollars to train. “We therefore believe these TinyML techniques can enable us to go offline to save CO2 emissions and make AI greener, smarter, faster, and also more accessible to everyone, to democratize AI,” says Han.
However, small MCU memory and digital storage limit AI applications, so efficiency is a central challenge. MCUs contain only 256 kilobytes of memory and 1 megabyte of storage. In comparison, mobile AI on smartphones and cloud computing have 256 gigabytes and terabytes of storage, respectively, along with 16,000 and 100,000 times more memory. Because memory is such a precious resource, the team wanted to optimize its use, so they profiled the MCU memory usage of CNN designs, a task that had previously been overlooked, Lin and Chen say.
Their results showed that memory usage peaked within the first five convolutional blocks, out of roughly 17 in total. Each block contains many connected convolutional layers that help filter for the presence of specific features in an input image or video, producing a feature map as the output. During this memory-intensive early stage, most of the blocks operated beyond the 256 KB memory constraint, leaving plenty of room for improvement.
To reduce the memory spike, the researchers developed a patch-based inference schedule that operates on only a small fraction, about 25 percent, of a layer’s feature map at a time, before moving on to the next quarter, until the whole layer is done. This method saved four to eight times the memory of the previous layer-by-layer computation method, without adding latency.
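To make the imbalance concrete, here is a minimal sketch, not the authors’ actual tooling, of how one might profile per-block activation memory of a small CNN against an MCU budget. The toy network, layer sizes, and the 4-byte float assumption are illustrative only; real TinyML deployments use quantized weights and activations.

```python
# Illustrative sketch: measure per-layer activation memory of a toy CNN
# and flag layers that would not fit in a 256 KB MCU activation budget.
import torch
import torch.nn as nn

# A toy convolutional stem; the networks in the study are larger.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=1, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
)

activation_bytes = []

def record(module, inputs, output):
    # Use the output tensor size as a simple proxy for a layer's
    # activation memory (peak usage also includes the input tensor).
    activation_bytes.append((module.__class__.__name__,
                             output.numel() * output.element_size()))

hooks = [m.register_forward_hook(record)
         for m in model if isinstance(m, nn.Conv2d)]

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))  # one camera frame

for name, nbytes in activation_bytes:
    flag = "  <-- exceeds a 256 KB budget" if nbytes > 256 * 1024 else ""
    print(f"{name}: {nbytes / 1024:.0f} KB{flag}")

for h in hooks:
    h.remove()
```

Run on a 224x224 input, the early, high-resolution layers dominate: the same kind of front-loaded peak the researchers observed in the first few blocks.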
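The following is a simplified sketch of that idea, not the authors’ implementation: run an early block of the network patch by patch, so that only about a quarter of its feature map is resident at any moment. For clarity, it ignores the halo of extra border pixels a real convolution needs at patch edges, which is exactly the overlap discussed next.

```python
# Simplified patch-based inference: process a 2x2 grid of spatial patches
# one at a time, then stitch the outputs back into a full feature map.
import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
)

def patch_based_inference(block, image, splits=2):
    """Run `block` over splits x splits spatial patches of `image`."""
    _, _, h, w = image.shape
    ph, pw = h // splits, w // splits
    rows = []
    for i in range(splits):
        row = []
        for j in range(splits):
            patch = image[:, :, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
            row.append(block(patch))   # only this patch's activations are live
        rows.append(torch.cat(row, dim=3))
    return torch.cat(rows, dim=2)      # reassemble the full feature map

x = torch.randn(1, 3, 224, 224)
out = patch_based_inference(block, x)
print(out.shape)  # torch.Size([1, 16, 224, 224])
```

Because each patch’s intermediate activations are freed before the next patch is processed, the peak activation memory of the block drops roughly in proportion to the number of patches.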
To illustrate, say we have a pizza: we can divide it into four slices and eat only one slice at a time, saving about three-quarters. “That is the patch-based inference method,” says Han. “But it wasn’t a free lunch.” Like the photoreceptors in the human eye, which can only take in and examine part of an image at a time, each patch has a receptive field, a section of the total image or field of view. As these receptive fields (or pizza slices, in this analogy) grow in size, the overlap between them increases, which amounts to redundant computation that the researchers found to be about 10 percent.
To complement the patch-based inference method, the researchers also proposed redistributing the neural network across the blocks, without losing the vision system’s accuracy. The question remained, however, of which blocks needed the patch-based inference method and which could use the original layer-by-layer one, together with the redistribution decisions; tuning all of these knobs by hand was labor-intensive and better left to AI.
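A back-of-the-envelope check of that overhead, with illustrative numbers rather than figures from the paper: if each patch must recompute a small halo of border pixels determined by the receptive field, the extra work is the ratio of the padded per-patch area to the full-map area.

```python
# Illustrative arithmetic only: patch size and halo width are assumptions,
# not values from the paper; the halo depends on kernel sizes and how many
# layers run per patch.
patch = 112   # output patch side when a 224x224 map is split 2x2
halo = 3      # assumed extra border pixels each patch must recompute

independent = 4 * (patch + 2 * halo) ** 2   # work with per-patch halos
exact = (2 * patch) ** 2                    # work if the layer ran on the whole map
print(f"redundant computation: {independent / exact - 1:.1%}")  # ~11%
```

With these assumed numbers the redundancy lands near 10 percent, in the same ballpark as what the researchers report; deeper per-patch stages or larger kernels push it higher.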
“We want to automate this by doing a joint, automated optimization search that includes the neural network architecture, such as the number of layers, the number of channels, and the kernel size, as well as the inference schedule, including the number of patches, the number of layers for patch-based inference, and other optimization knobs,” says Lin, “so that non-machine-learning experts can have a push-button solution that improves computational efficiency but also improves engineering productivity, to be able to deploy this neural network on microcontrollers.”
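The sketch below gives a hedged picture of what such a joint search might look like: sample architecture knobs and inference-schedule knobs together, discard candidates whose estimated peak memory exceeds the MCU budget, and keep the best-scoring survivor. The search space, cost model, and accuracy proxy here are stand-ins, not the ones used in the paper.

```python
# Toy joint search over architecture and inference-schedule knobs under a
# memory constraint. The estimators below are illustrative stand-ins.
import random

SEARCH_SPACE = {
    "num_blocks":         [12, 14, 17],
    "width_multiplier":   [0.35, 0.5, 0.75],
    "kernel_size":        [3, 5, 7],
    "num_patches":        [1, 4, 9],     # 1 = plain layer-by-layer inference
    "patch_based_blocks": [0, 2, 4],     # how many early blocks run patch-based
}

MEMORY_BUDGET_KB = 256

def estimated_peak_memory_kb(cfg):
    # Stand-in cost model: early activations shrink as more blocks run
    # patch-based and as the number of patches grows.
    base = 1400 * cfg["width_multiplier"]
    patch_factor = 1.0 if cfg["patch_based_blocks"] == 0 else 1.0 / cfg["num_patches"]
    return base * patch_factor + 5 * cfg["kernel_size"]

def estimated_accuracy(cfg):
    # Stand-in proxy: deeper/wider models score higher. In practice this
    # would be a trained accuracy predictor or a quick evaluation.
    return 0.5 + 0.01 * cfg["num_blocks"] + 0.2 * cfg["width_multiplier"]

best = None
for _ in range(1000):
    cfg = {k: random.choice(v) for k, v in SEARCH_SPACE.items()}
    if estimated_peak_memory_kb(cfg) > MEMORY_BUDGET_KB:
        continue  # violates the MCU memory constraint
    score = estimated_accuracy(cfg)
    if best is None or score > best[0]:
        best = (score, cfg)

print(best)
```

The published approach searches these dimensions jointly rather than one at a time, precisely because the best architecture depends on the inference schedule and vice versa.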
A new horizon for tiny vision systems
The co-design of the network architecture, via neural architecture search, with the inference scheduling provided significant gains and was adopted into MCUNetV2; it outperformed other image processing systems in peak memory usage and in image and object detection and classification. The MCUNetV2 device includes a small screen and a camera, and is roughly the size of a headphone case. Compared with the first version, the new version required four times less memory for the same accuracy, says Chen. When placed head-to-head against other TinyML solutions, MCUNetV2 detected the presence of objects in image frames, such as human faces, with an improvement of nearly 17 percent.
Additionally, it set an accuracy record of nearly 72 percent for thousand-class image classification on the ImageNet dataset, using 465 KB of memory. The researchers also tested what is known as visual wake words, that is, how well their MCU vision model could identify the presence of a person in an image; even with a limited memory of only 30 KB, it achieved over 90 percent accuracy, beating the previous state-of-the-art method. This means the method is accurate enough that it could be deployed to help with, for example, smart-home applications.
With its high accuracy, low energy consumption, and low cost, MCUNetV2’s performance opens up new IoT applications. Because of their limited memory, Han says, vision systems on IoT devices were previously thought to be good only for basic image classification tasks, but their work has helped expand the possibilities for TinyML. Further, the research team envisions it in a wide range of fields, from monitoring sleep and joint movement in the health-care industry, to sports coaching and movements like a golf swing, to plant identification in agriculture, to smarter manufacturing, from identifying screws and nuts to detecting faulty machines.
“We really tried to make these large-scale, real-world applications a reality,” says Han. “Without GPUs or special hardware, our technique is so small that it can run on these cheap little IoT devices and perform real-world applications like visual wake words, face mask detection, and person detection. This opens the door to a brand-new way of doing tiny AI and mobile vision.”