Amazon Web Services (AWS), Amazon’s cloud services division, today announced the general availability of Elastic Compute Cloud (EC2) DL1 instances. While new instance types generally aren’t particularly novel, DL1 (especially dl1.24xlarge) is the first instance type on AWS designed specifically for training machine learning models, Amazon says, powered by Gaudi accelerators from Intel-owned Habana Labs.
Cheaper model training
Machine learning is becoming mainstream as companies realize the business impact of deploying AI models in their organizations. Using machine learning typically begins with training a model to recognize patterns by learning from datasets, then applying the model to new data to make predictions. Maintaining a model’s accuracy requires frequent retraining, which consumes significant compute resources and drives up costs. Google affiliate DeepMind reportedly spent $35 million training a system to learn the Chinese board game Go.
With DL1, AWS’s first answer to Google’s tensor processing units (TPUs), the custom accelerator chips available on Google Cloud Platform, Amazon and Habana claim that AWS customers can now train models faster and with up to 40% better price performance than the latest GPU-powered EC2 instances. DL1 instances leverage up to eight purpose-built Gaudi accelerators, coupled with 256 GB of high-bandwidth memory, 768 GB of system memory, custom 2nd Generation Intel Xeon Scalable (Cascade Lake) processors, 400 Gbps of network throughput, and up to 4 TB of local NVMe storage.
Above: Habana’s new training chip was designed for high-performance AI training at significant scale.

Gaudi introduces one of the industry’s first implementations of Remote Direct Memory Access over Converged Ethernet (RoCE) on an AI chip. This provides ten 100 Gbps or twenty 50 Gbps communication links, allowing customers to scale to many thousands of discrete accelerators. Because of architectural differences with GPU- and CPU-based instances, customers must use Habana’s SynapseAI SDK to migrate existing models. Alternatively, Habana provides pretrained models for image classification, object detection, natural language processing, and recommendation systems in its GitHub repository.
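A minimal sketch of what that SynapseAI-based migration looks like in practice: Habana’s PyTorch integration exposes Gaudi as an `hpu` device via the `habana_frameworks` package. The sketch below falls back to CPU when that package isn’t installed so it stays runnable on ordinary machines; the model and data here are stand-ins, not from the article.

```python
import torch

# On a DL1 instance with SynapseAI installed, habana_frameworks exposes
# Gaudi as the "hpu" device. Elsewhere, fall back to CPU so this sketch
# still runs. (Illustrative pattern, not Habana's complete porting guide.)
try:
    import habana_frameworks.torch.core as htcore
    device = torch.device("hpu")
except ImportError:
    htcore = None
    device = torch.device("cpu")

model = torch.nn.Linear(8, 2).to(device)   # move model weights to the device
x = torch.randn(4, 8, device=device)       # place input data on the device
loss = model(x).sum()
loss.backward()

if htcore is not None:
    # On Gaudi, mark_step() triggers execution of the accumulated graph.
    htcore.mark_step()
```

The key point is that existing PyTorch code mostly needs a device change rather than a rewrite, with `mark_step()` marking graph boundaries for Gaudi’s compiler.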
“The use of machine learning has skyrocketed. However, one of the challenges of training machine learning models is that it is computationally intensive and can become expensive as customers refine and retrain their models,” AWS EC2 Vice President David Brown said in a statement. “AWS already offers the most comprehensive choice of high-performance compute for any machine learning project or application. Adding DL1 instances with Gaudi accelerators provides the most cost-effective alternative to GPU-based instances in the cloud to date. Their optimal combination of price and performance makes it possible for customers to reduce training costs, train more models, and innovate faster.”
Sizing up the competition
In the June 2021 results from MLPerf Training, an industry benchmark for AI training hardware, an eight-Gaudi system took 62.55 minutes to train a variant of the popular ResNet computer vision model and 164.37 minutes to train a variant of Google’s BERT natural language model. Comparable figures for Google’s TPUs are hard to come by, but MLPerf Training shows 4,096 fourth-generation TPU (TPUv4) chips training a ResNet model in about 1.82 minutes and 256 TPUv4 chips training a BERT model in 1.82 minutes.
Aside from the purported performance benefits, Amazon and Habana claim DL1 offers cost savings compared with three GPU-based instances: p4d.24xlarge (eight 40 GB Nvidia A100 GPUs), p3dn.24xlarge (eight 32 GB Nvidia V100 GPUs), and p3.16xlarge (eight 16 GB V100 GPUs). DL1 carries an on-demand rate of $13.11 per hour when training a ResNet model, compared with $24.48 per hour for p3 and $32.77 per hour for p4d.
Eight 40 GB A100 GPUs can process more images per second during training (18,251) than a system with eight Gaudi accelerators (12,987), but Habana emphasizes the efficiency of its chips over sheer performance.
“Based on Habana’s tests of the various EC2 instances and the prices published by Amazon, we found that DL1 offers 44% savings on ResNet50 training costs compared to the p4d instance. For p3dn end users, the cost savings on ResNet50 training is 69%,” Habana wrote. “While … Gaudi doesn’t contain as many transistors as the 7-nanometer A100 GPU, Gaudi’s architecture, designed from the ground up for efficiency, achieves higher resource utilization and requires fewer system components than the GPU architecture. The resulting lower system cost ultimately enables lower prices for end users.”
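Habana’s 44% figure can be roughly reproduced from the numbers quoted above: divide each instance’s on-demand hourly rate by its training throughput to get a cost per image, then compare. This back-of-envelope check uses only the prices and images-per-second figures from the article, not independent measurements.

```python
# Back-of-envelope check of the claimed ~44% ResNet50 training-cost savings
# versus p4d, using the on-demand prices and throughput figures quoted in
# the article.

ON_DEMAND_USD_PER_HOUR = {"dl1.24xlarge": 13.11, "p4d.24xlarge": 32.77}
IMAGES_PER_SECOND = {"dl1.24xlarge": 12_987, "p4d.24xlarge": 18_251}

def cost_per_million_images(instance: str) -> float:
    """Dollars to push one million training images through the instance."""
    usd_per_second = ON_DEMAND_USD_PER_HOUR[instance] / 3600
    return usd_per_second / IMAGES_PER_SECOND[instance] * 1_000_000

dl1 = cost_per_million_images("dl1.24xlarge")
p4d = cost_per_million_images("p4d.24xlarge")
savings = 1 - dl1 / p4d

print(f"DL1: ${dl1:.2f}/M images, p4d: ${p4d:.2f}/M images, "
      f"savings: {savings:.0%}")
```

Despite p4d’s higher raw throughput, DL1’s lower hourly price yields a lower cost per image, which is the trade-off Habana is highlighting.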
Future developments
When Intel acquired Habana for around $2 billion in December 2019, somewhat sidelining the AI accelerator hardware developed by its own Nervana division, it looked like a smart move by the chip giant; AWS’s investment in Habana’s chips should help shorten their time to market.
As an EETimes article points out, cloud providers have so far been reluctant to invest in third-party chips with novel computing architectures for AI acceleration; Baidu offers its own Kunlun, for example, while Alibaba has developed Hanguang. Such chips are available on Microsoft’s Azure cloud or on Nimbix, but priority is given to customers “pushing the boundaries of machine learning.”
The DL1 instances will sit alongside Amazon’s AWS Trainium hardware, a custom accelerator that will be available to AWS customers later this year. For its part, Habana is working on Gaudi2, a next-generation AI chip that will move the Gaudi architecture from a 16-nanometer to a 7-nanometer process.
DL1 instances can be purchased as On-Demand Instances, Savings Plans, Reserved Instances, or Spot Instances. They are currently available in the US East (N. Virginia) and US West (Oregon) AWS regions.