Singularity – Microsoft’s new AI infrastructure service

Microsoft is developing a new Artificial Intelligence (AI) infrastructure service called ‘Singularity,’ which will give data scientists and AI researchers a way to build, scale, test, and iterate on their models on a Microsoft Azure cloud service designed specifically for AI.

A group of Microsoft researchers has published a paper detailing the ‘Singularity’ project’s technical design.

Lowering costs through increased utilization of deep learning workloads is a critical lever for cloud providers. The researchers wrote that they were presenting Singularity – Microsoft’s globally distributed scheduling service for highly efficient and reliable execution of deep learning training and inference workloads.

Microsoft’s ‘Singularity’ is a completely managed, universal distributed infrastructure service for AI workloads that supports a wide range of hardware accelerators.

‘Singularity’ is built to scale across a worldwide fleet of hundreds of thousands of GPUs and other AI accelerators.

The researchers explained that Singularity is built with one primary objective: driving down the cost of AI by maximizing the aggregate useful throughput on a given fixed pool of accelerator capacity at planet scale.

Microsoft invested $1 billion in OpenAI in 2019 and announced last year that the ‘Azure OpenAI Service’ would provide customers with access to OpenAI’s powerful models, combined with the security, reliability, compliance, data privacy, and other enterprise-grade capabilities built into Microsoft Azure.

According to the researchers, ‘Singularity’ achieves a significant breakthrough in scheduling deep learning workloads, turning niche features such as elasticity into mainstream, always-on capabilities that the scheduler can rely on to meet stringent service-level agreements (SLAs).

They added that Singularity enables unprecedented levels of workload fungibility, allowing jobs to take advantage of spare capacity anywhere in the globally distributed fleet while continuing to meet SLAs.
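To make the idea of elasticity and fungibility concrete, the toy sketch below shows how a scheduler might let jobs opportunistically grow into spare GPU capacity and shrink back to a guaranteed minimum when reserved capacity is reclaimed. This is an illustration of the concept only, not Singularity’s actual scheduler; all class names, fields, and policies here are hypothetical.

```python
# Conceptual sketch of elastic scheduling with SLA floors.
# Not Microsoft's implementation; a hypothetical toy model.
from dataclasses import dataclass


@dataclass
class ElasticJob:
    name: str
    min_gpus: int  # SLA floor: never scale below this
    max_gpus: int  # cap on opportunistic scale-up
    gpus: int = 0  # currently assigned GPUs


class ToyScheduler:
    def __init__(self, total_gpus: int):
        self.total = total_gpus
        self.jobs: list[ElasticJob] = []

    def free(self) -> int:
        return self.total - sum(j.gpus for j in self.jobs)

    def submit(self, job: ElasticJob) -> None:
        # Shrink running elastic jobs toward their floors to make
        # room for the new job's guaranteed minimum, then admit it.
        need = job.min_gpus - self.free()
        for j in self.jobs:
            if need <= 0:
                break
            give = min(j.gpus - j.min_gpus, need)
            j.gpus -= give
            need -= give
        if need > 0:
            raise RuntimeError("cannot satisfy SLA floor")
        job.gpus = job.min_gpus
        self.jobs.append(job)
        self.rebalance()

    def rebalance(self) -> None:
        # Hand any spare capacity to jobs that can still grow.
        for j in self.jobs:
            grow = min(j.max_gpus - j.gpus, self.free())
            j.gpus += grow


sched = ToyScheduler(total_gpus=16)
a = ElasticJob("a", min_gpus=4, max_gpus=16)
sched.submit(a)  # job "a" grows into all 16 spare GPUs
b = ElasticJob("b", min_gpus=8, max_gpus=8)
sched.submit(b)  # "a" shrinks to 8 so "b" gets its guaranteed 8
```

The key property mirrored from the paper’s description is that scale-downs never violate a job’s guaranteed minimum, so spare capacity can be loaned out without breaking SLAs.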

Microsoft unveiled a new supercomputer in partnership with OpenAI in 2020, making new infrastructure available in Azure for training extremely large AI models.

The supercomputer built for OpenAI is a dedicated system consisting of more than:

  1. 285,000 CPU cores,
  2. 10,000 GPUs, and
  3. 400 gigabits per second of network connectivity for each GPU server.
