Understanding the Reasons Behind the Huge Energy and Power Demands of Artificial Intelligence
Artificial intelligence (AI) systems today are still far from replicating true human intelligence, but they are certainly getting better than us at discerning data patterns and mining insights. At present, AI models can recognize images, converse with people via chatbots, drive autonomous cars, and even beat us at chess. But did you know that the energy and power consumed in training and building these models is staggering? In other words, training AI is an energy-intensive process with a high carbon footprint.
Reducing this energy consumption would therefore have positive knock-on effects for the environment. It would also bring other benefits to businesses, such as shrinking their carbon footprint and moving them closer to carbon-related targets. But before we can build energy-efficient, or "green", AI, we must understand why artificial intelligence is so power-hungry in the first place.
Training a Neural Network
Let us consider a neural network model. A neural network is a powerful type of machine learning model that loosely mirrors the human brain. Composed of layers of nodes, it attempts to recognize the underlying relationships in a data set. Each node is connected to others and has an associated weight and threshold. If a node's output value exceeds its threshold, the node is activated and relays data to the next layer of the network.
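To make this concrete, here is a minimal sketch in Python of the node behavior just described; the input values, weights, and threshold are illustrative assumptions, not figures from any particular network.

```python
# A single node: compute the weighted sum of the inputs and relay
# the result onward only if it clears the node's threshold.
def node_output(inputs, weights, threshold):
    activation = sum(x * w for x, w in zip(inputs, weights))
    return activation if activation > threshold else 0.0

# Illustrative values: two inputs, two weights, one threshold.
print(node_output([0.5, 0.8], [1.2, -0.4], threshold=0.1))  # 0.28, node fires
```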
Training a neural network comprises a forward pass, in which an input is passed through the network and an output is generated, and a backward pass, in which the network's weights are updated using the error from the forward pass, via gradient descent algorithms that demand massive amounts of matrix manipulation.
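As a rough illustration, here is a minimal sketch of such a training loop for a tiny two-layer network in plain NumPy; the layer sizes, activation, learning rate, and random data are arbitrary choices for the example, not from any model discussed here.

```python
import numpy as np

# Tiny two-layer network trained with gradient descent.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 8))           # batch of 32 inputs, 8 features each
y = rng.normal(size=(32, 1))           # regression targets
W1 = 0.1 * rng.normal(size=(8, 16))    # first-layer weights
W2 = 0.1 * rng.normal(size=(16, 1))    # second-layer weights
lr = 0.01                              # learning rate

for step in range(100):
    # Forward pass: input flows through the layers to produce an output.
    h = np.maximum(0, X @ W1)          # hidden layer with ReLU activation
    y_hat = h @ W2                     # network output
    loss = np.mean((y_hat - y) ** 2)   # mean squared error

    # Backward pass: propagate the error back to get weight gradients.
    grad_y_hat = 2 * (y_hat - y) / len(X)
    grad_W2 = h.T @ grad_y_hat
    grad_h = grad_y_hat @ W2.T
    grad_W1 = X.T @ (grad_h * (h > 0))  # mask by the ReLU derivative

    # Gradient descent: nudge the weights against the gradient.
    W1 -= lr * grad_W1
    W2 -= lr * grad_W2

print(f"final loss: {loss:.4f}")
```

Notice that almost every line is a matrix operation; at scale, those matrix manipulations dominate the energy bill.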
In June 2019, a team of researchers from the University of Massachusetts Amherst published a study assessing the energy consumption required to train four large neural networks: Transformer, ELMo, BERT, and GPT-2. They trained each on a single GPU for one day and measured the energy consumed throughout.
One of these networks, BERT (Bidirectional Encoder Representations from Transformers), was trained on 3.3 billion words from English books and Wikipedia articles. According to an article on The Conversation by Kate Saenko, BERT had to read this vast data set around 40 times during training. For comparison, she notes that an average child learning to talk hears about 45 million words by the age of five, 3,000 times fewer than BERT.
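A quick back-of-the-envelope check of that comparison, using only the figures quoted above:

```python
# BERT's total word exposure: 3.3 billion words read roughly 40 times,
# versus the ~45 million words an average child hears by age five.
bert_words = 3.3e9 * 40
child_words = 45e6
print(bert_words / child_words)  # ~2,933, i.e. roughly 3,000x
```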
In the study, the University of Massachusetts Amherst researchers found that training BERT once has the carbon footprint of a passenger flying a round trip between New York and San Francisco. The team computed the total power consumption for training each model by multiplying the single-GPU energy figure by the total training time reported by each model's original developers. The carbon footprint was then calculated from the average carbon emissions of power production in the US.
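To illustrate the shape of that calculation, here is a hedged sketch: power draw times training time, converted to CO2 with a grid-emissions factor. Every number below (GPU power, cluster size, training hours, grid intensity) is an illustrative assumption, not a figure from the study.

```python
# Back-of-the-envelope energy-to-CO2 estimate for a training run.
power_per_gpu_kw = 0.25      # assumed average draw per GPU (kW)
num_gpus = 64                # assumed size of the training cluster
training_hours = 96          # assumed total training time
grid_kg_co2_per_kwh = 0.45   # assumed average US grid carbon intensity

energy_kwh = power_per_gpu_kw * num_gpus * training_hours
co2_kg = energy_kwh * grid_kg_co2_per_kwh
print(f"Energy: {energy_kwh:.0f} kWh, CO2: {co2_kg:.0f} kg")
```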
The study also covered the training and development of a tuning process called Neural Architecture Search, a technique that automates the design of a neural network through an energy-intensive trial-and-error process. This additional tuning step, used to enhance BERT's final accuracy, contributed an estimated 626,155 pounds of CO2, roughly the total lifetime carbon footprint of five cars. By comparison, the average American generates 18.078 tons of CO2 emissions in a year.
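To see why this search is so energy-hungry, here is a minimal sketch of architecture search as exhaustive trial and error: every candidate architecture triggers a full training run. The search space, the train_and_score stand-in, and the scores are all hypothetical.

```python
import itertools
import random

# Hypothetical search space over architecture choices.
search_space = {
    "layers": [2, 4, 8],
    "hidden_units": [64, 128, 256],
    "learning_rate": [1e-2, 1e-3, 1e-4],
}

def train_and_score(config):
    # Stand-in for a full training run; in real architecture search,
    # this single call can consume hours or days of GPU time.
    return random.random()

best_config, best_score = None, float("-inf")
for values in itertools.product(*search_space.values()):
    config = dict(zip(search_space.keys(), values))
    score = train_and_score(config)  # one full training run per candidate
    if score > best_score:
        best_config, best_score = config, score

print(best_config, best_score)
```

Even this toy space requires 27 separate training runs; real search spaces are vastly larger, which is where the enormous energy cost comes from.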
The GPU Hunger
The advancements in artificial intelligence have been made possible by the powerful GPUs (Graphics Processing Units) we have today, and these GPUs consume a lot of electricity. According to NVIDIA, the maximum power dissipated by a GPU is 250 W, 2.5 times that of an Intel CPU. Meanwhile, researchers believe that larger AI models lead to better accuracy and performance. This is similar to gaming laptops, which are more capable than regular laptops but also heat up faster under heavy load. Today, one can rent online servers with dozens of CPUs and powerful GPUs in a few minutes and quickly develop powerful AI models.
According to OpenAI, a San Francisco-based AI research lab, from the early years of machine learning up to 2012, the computational resources the technology required doubled every two years, paralleling Moore's law of growth in processor power. After 2012, however, the computing power used to build top-notch models has doubled every 3.4 months on average. These escalating computational requirements translate into growing negative environmental impacts from artificial intelligence.
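Those two doubling rates diverge dramatically, as a quick calculation shows; the six-year horizon is an illustrative choice.

```python
# Compare compute growth under a 24-month doubling period (Moore's-law-like)
# with the post-2012 trend of doubling every 3.4 months.
def growth_factor(months, doubling_period_months):
    return 2 ** (months / doubling_period_months)

years = 6
print(growth_factor(years * 12, 24))   # ~8x over six years
print(growth_factor(years * 12, 3.4))  # ~2.4 million x over six years
```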
Experts now also argue that building massive AI models does not necessarily deliver a better return on investment in terms of performance and accuracy, so companies may have to make trade-offs between accuracy and computational efficiency.
Spiking Neural Network
A research group from Oak Ridge National Laboratory has previously demonstrated a promising way to improve AI energy efficiency: converting deep learning neural networks into spiking neural networks (SNNs). An SNN replicates the brain's neuronal-firing mechanism and thus inherits many of the brain's capabilities, such as energy efficiency and spatio-temporal data processing. The Oak Ridge team produced their deep spiking neural network (DSNN) by introducing a stochastic process that adds random values, much like Bayesian deep learning, which attempts to mimic how the brain processes information. The spiking activity tells researchers where the necessary computations are needed, so effort is spent only there, lowering energy consumption.
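To give a flavor of the spiking mechanism, here is a minimal sketch of a leaky integrate-and-fire neuron, a common SNN building block. The leak, threshold, and input values are illustrative assumptions, and this is not the Oak Ridge team's DSNN.

```python
import numpy as np

# A leaky integrate-and-fire neuron: it accumulates input over time,
# its potential decays (leaks) each step, and it emits a spike only
# when the potential crosses a threshold.
rng = np.random.default_rng(1)
inputs = rng.random(100)   # input current at each timestep
leak = 0.9                 # membrane potential decay per step
threshold = 2.0            # firing threshold

potential = 0.0
spikes = []
for t, current in enumerate(inputs):
    potential = leak * potential + current  # integrate with leak
    if potential >= threshold:              # fire only on crossing
        spikes.append(t)
        potential = 0.0                     # reset after the spike

# Downstream computation happens only at spike times, which is the
# source of SNNs' energy efficiency on neuromorphic hardware.
print(f"{len(spikes)} spikes out of {len(inputs)} timesteps")
```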
Currently, SNNs are touted as the next iteration of neural networks and a foundation for neuromorphic computing. Last year, researchers at Centrum Wiskunde & Informatica (CWI), the Dutch national research center for mathematics and computer science, and the IMEC/Holst Research Center in Eindhoven, the Netherlands, successfully developed a learning algorithm for spiking neural networks.