Last year, Meta, formerly known as Facebook, announced that it would concentrate on the metaverse, a shared virtual environment. As part of the Meta Research program, engineers are building hardware and software that is engaging and social, and that deepens people’s connections with one another.
Meta is investing in the following research areas:
- Augmented reality (AR) / Virtual reality (VR)
- Artificial Intelligence (AI)
- Blockchain technology and cryptocurrencies
- Computer vision
- Machine Learning (ML)
These cutting-edge technologies frequently require powerful computers capable of performing quadrillions of operations per second. Meta recently announced the design and construction of an AI Research SuperCluster (RSC) to help meet the computing needs of its research.
Using RSC, Meta researchers were able to train large models required for the development of AI for technologies such as Natural Language Processing, Computer Vision, and Speech Recognition.
Let us examine the need for artificial intelligence and supercomputers before delving into Meta’s RSC supercomputer.
Supercomputers and AI applications
The widespread use of AI and AI-based applications has greatly increased the demand for supercomputers. AI models are becoming more complex as they tackle next-generation technology challenges. Training them requires enormous computational power and scalability, because learning is the true power of AI, and a model is only as dependable as the training it has received.
Supercomputers speed up the training of AI models. With the added speed and capacity, models can be trained faster and on much larger, more detailed, and more focused data sets.
Applications such as computer vision need a system that can process large amounts of media at high data sampling rates. Other applications, such as natural language processing (NLP), need to understand multiple languages, dialects, and accents. Supercomputers can help with real-world tasks like these.
A supercomputer would not only help Meta with its future AR/VR and AI projects, but would also help Meta engineers develop a range of models. They could, for example, develop models that detect harmful content on social media platforms, while also paving the way for embodied AI and multimodal AI that improve the user experience.
All about AI Research SuperClusters
The RSC will help researchers develop new and improved AI models that can learn from trillions of examples, whether they are images, text, or other media. Meta claims it is among the fastest AI supercomputers in the world.
AI supercomputers are typically built by combining multiple graphics processing units (GPUs) into compute nodes, which are then linked by a high-performance, high-speed network fabric that lets the nodes communicate quickly.
The RSC is made up of 760 NVIDIA DGX A100 compute nodes, totaling 6080 GPUs.
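As a quick sanity check on these figures, the short Python sketch below divides the GPU total by the node count. The helper name `gpus_per_node` is purely illustrative and not something from Meta or NVIDIA; the two constants are the numbers quoted above.

```python
# Figures quoted in the article.
RSC_NODES = 760
RSC_GPUS = 6080


def gpus_per_node(total_gpus: int, nodes: int) -> int:
    """Return the number of GPUs per node, insisting on an even split."""
    per_node, remainder = divmod(total_gpus, nodes)
    if remainder != 0:
        raise ValueError("GPU total is not a whole multiple of the node count")
    return per_node


print(gpus_per_node(RSC_GPUS, RSC_NODES))  # prints 8
```

The result matches the eight A100 GPUs found in each DGX A100 system.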
The NVIDIA DGX A100 is marketed as a system suited to all types of AI workloads. It is built around the NVIDIA A100 Tensor Core GPU, one of the most advanced accelerators available, which is reported to offer three times the throughput for AI training and 83 percent more throughput than a CPU.
Furthermore, this GPU uses the NVIDIA Ampere architecture, which NVIDIA says delivers up to twenty times the performance of its predecessor.
Each DGX compute node communicates over the NVIDIA 1600 Gb/s InfiniBand fabric, without oversubscription (a situation in which the bandwidth that connected endpoints can demand exceeds the capacity the network can actually provide).
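To make the idea of oversubscription concrete, here is a minimal Python sketch that computes the ratio for a single switch. The port counts and the 200 Gb/s link speed are hypothetical values chosen for illustration; they do not describe the RSC's actual switches, and `oversubscription_ratio` is an assumed helper name.

```python
def oversubscription_ratio(downlinks: int, downlink_gbps: float,
                           uplinks: int, uplink_gbps: float) -> float:
    """Ratio of bandwidth facing the nodes to bandwidth facing the rest of the fabric.

    1.0 means no oversubscription; anything above 1.0 means the nodes
    could ask for more bandwidth than the uplinks can carry.
    """
    if uplinks <= 0 or uplink_gbps <= 0:
        raise ValueError("uplink count and speed must be positive")
    return (downlinks * downlink_gbps) / (uplinks * uplink_gbps)


# Hypothetical switch with as much uplink as downlink capacity.
balanced = oversubscription_ratio(20, 200.0, 20, 200.0)

# Hypothetical switch where 30 node-facing ports share 10 uplinks.
oversubscribed = oversubscription_ratio(30, 200.0, 10, 200.0)

print(balanced, oversubscribed)
```

A fabric without oversubscription corresponds to the first case, where every node can use its full link speed at the same time.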
Furthermore, when the RSC is finished, it will have over 16000 GPU endpoints. On the storage side, accelerated computing in a data center generally relies on two kinds of storage systems: one optimized for storing data and the other optimized for delivering it.
Flash storage in this kind of configuration is faster than traditional storage. The RSC’s storage tier consists of the following:
- 175 petabytes of Pure Storage FlashArray
- 46 petabytes of cache storage
- 10 petabytes of Pure Storage FlashBlade
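The three figures in this list can be added up to get a sense of the overall storage tier. The sketch below is only arithmetic on the numbers quoted above; the dictionary keys and the `storage_summary` helper are illustrative names.

```python
# Capacities in petabytes, as listed in the article.
RSC_STORAGE_PB = {
    "flash_array": 175,
    "cache": 46,
    "flash_blade": 10,
}


def storage_summary(tiers: dict[str, int]) -> tuple[int, dict[str, float]]:
    """Return the total capacity and each tier's share of it in percent."""
    total = sum(tiers.values())
    shares = {name: 100 * pb / total for name, pb in tiers.items()}
    return total, shares


total_pb, shares = storage_summary(RSC_STORAGE_PB)
print(f"Total: {total_pb} PB")
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

By this count the listed tiers come to 231 petabytes, with roughly three quarters of it in the bulk storage tier.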
The future of RSC
The RSC is already in operation and is still being developed. The project’s second phase will increase the number of GPUs to 16000 and expand the InfiniBand fabric to 16000 ports. Furthermore, researchers intend to increase the delivery bandwidth to 16 TB/s and to provide exabyte-scale storage capacity.
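A rough sense of what the second phase means in practice can be had from a few lines of arithmetic. The Python sketch below assumes eight GPUs per node (as implied by the 760-node, 6080-GPU figures), decimal units (1 TB = 1000 GB), and delivery bandwidth spread evenly across all GPUs; these are simplifying assumptions, not details confirmed by Meta, and the function names are illustrative.

```python
import math

CURRENT_GPUS = 6080        # from the article
PLANNED_GPUS = 16000       # from the article
DELIVERY_TB_PER_S = 16     # from the article
GPUS_PER_NODE = 8          # assumption: 6080 GPUs / 760 nodes


def growth_factor(current: int, planned: int) -> float:
    """How many times larger the planned GPU count is."""
    return planned / current


def nodes_needed(gpus: int, per_node: int) -> int:
    """Nodes required to host the given number of GPUs."""
    return math.ceil(gpus / per_node)


def per_gpu_bandwidth_gb_s(total_tb_s: float, gpus: int) -> float:
    """Delivery bandwidth per GPU in GB/s, assuming an even split."""
    return total_tb_s * 1000 / gpus


print(f"Growth: {growth_factor(CURRENT_GPUS, PLANNED_GPUS):.2f}x")
print(f"Nodes: {nodes_needed(PLANNED_GPUS, GPUS_PER_NODE)}")
print(f"Per GPU: {per_gpu_bandwidth_gb_s(DELIVERY_TB_PER_S, PLANNED_GPUS)} GB/s")
```

Under these assumptions the cluster would grow to about 2.6 times its current GPU count, and the planned delivery bandwidth works out to about 1 GB/s per GPU.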
Overall, the Meta researchers claim that the project’s second phase will result in more accurate AI models and better user experiences. They hope to use this supercomputer to build next-generation AI infrastructure and foundational technologies that advance the broader AI community.