By identifying a large dataset's key features and splitting it into manageable batches that don't overwhelm the hardware, a machine-learning method has demonstrated that it can process data exceeding a computer's available memory. The program, created at Los Alamos National Laboratory, broke a world record for factorizing enormous datasets during a test run on Summit, the fifth-fastest supercomputer in the world, at Oak Ridge National Laboratory.
The highly scalable algorithm, which is equally effective on laptops and supercomputers, addresses hardware bottlenecks that prevent the processing of data from data-rich applications in fields like cancer research, satellite imagery, social media networks, national security science, and earthquake research, to name a few.
Ismael Boureima, a computational physicist at Los Alamos National Laboratory, said the team has developed an 'out-of-memory' implementation of the non-negative matrix factorization method that makes it possible to factorize larger datasets than was previously feasible on a given piece of hardware. Simply put, the method divides the huge dataset into manageable chunks that can be analyzed with the resources at hand, which makes it a helpful tool for managing exponentially expanding datasets.
Traditional data analysis requires that the data fit within memory limits. Manish Bhattarai, a machine-learning scientist at Los Alamos and a co-author of the article, said their method challenges that assumption by providing an out-of-memory solution: when the data volume exceeds the available memory, the program divides it into smaller chunks, and each chunk is processed separately, cycling in and out of memory. This technique is what makes it possible to manage and analyze extremely large datasets efficiently.
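To make the chunking idea concrete, here is a minimal sketch in Python/NumPy of an out-of-core non-negative matrix factorization that keeps the large matrix on disk and cycles one row block at a time through memory. This is an illustrative assumption of how such a scheme can work, not the SmartTensors implementation; the function name, file name, and block size are made up for the example.

```python
import numpy as np

def out_of_core_nmf(X, k, n_iter=20, block_rows=1024, eps=1e-9):
    """Factorize a disk-resident matrix X (m x n) into W (m x k) and H (k x n)
    using block-wise multiplicative updates, loading one row block at a time."""
    m, n = X.shape
    rng = np.random.default_rng(0)
    W = rng.random((m, k)).astype(np.float32)
    H = rng.random((k, n)).astype(np.float32)

    for _ in range(n_iter):
        # Accumulators for the H update (W^T X and W^T W), summed block by
        # block so the full matrix X never has to be resident in memory.
        WtX = np.zeros((k, n), dtype=np.float32)
        WtW = np.zeros((k, k), dtype=np.float32)
        HHt = H @ H.T

        for start in range(0, m, block_rows):
            stop = min(start + block_rows, m)
            Xb = np.array(X[start:stop])          # pull one chunk off disk into RAM
            Wb = W[start:stop]                    # view: updating Wb updates W in place
            Wb *= (Xb @ H.T) / (Wb @ HHt + eps)   # multiplicative update for this block
            WtX += Wb.T @ Xb
            WtW += Wb.T @ Wb

        H *= WtX / (WtW @ H + eps)                # update H from the accumulated statistics
    return W, H

if __name__ == "__main__":
    # Stand-in for a dataset too large for RAM: a small random memory-mapped file.
    m, n, k = 5000, 200, 8
    X = np.memmap("big_matrix.dat", dtype=np.float32, mode="w+", shape=(m, n))
    X[:] = np.random.default_rng(1).random((m, n))
    W, H = out_of_core_nmf(X, k)
    print("reconstruction error:", np.linalg.norm(X - W @ H))
```

The key design point the sketch illustrates is that only the small factors W and H (plus one data block) need to live in memory; the statistics needed to update H can be accumulated across blocks, so the full matrix is only ever streamed from disk.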
According to Boureima, the distributed method for modern, heterogeneous high-performance computing systems can be useful on hardware ranging from a desktop computer to machines as large and sophisticated as the Chicoma, Summit, and forthcoming Venado supercomputers.
The question is no longer whether it is possible to factorize a larger matrix, but rather how long the factorization will take, according to Boureima.
The Los Alamos system takes advantage of hardware features such as GPUs to accelerate computation and fast interconnects to move data efficiently between machines, and the algorithm completes many tasks concurrently.
The Los Alamos SmartTensors project has produced a number of high-performance algorithms, the most recent of which is this out-of-memory non-negative matrix factorization.
Non-negative matrix factorization can be used in machine learning as a form of unsupervised learning that extracts meaning from data, according to Boureima. This is crucial for machine learning and data analytics because the algorithm can find explainable hidden features in the data that carry a specific meaning for the user.
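As a toy illustration of that idea, the short example below applies non-negative matrix factorization to synthetic count data using scikit-learn's NMF (not the SmartTensors platform); because both factors are constrained to be non-negative, the learned hidden features can be read as additive, interpretable parts. The data and parameter choices here are assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy "documents x word counts" matrix: rows are samples, columns are features.
rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(100, 20)).astype(float)   # non-negative count data

model = NMF(n_components=4, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(X)   # (100 x 4): how strongly each sample uses each hidden factor
H = model.components_        # (4 x 20): what each hidden factor looks like in feature space

print("per-sample factor weights:", W.shape)
print("hidden factors (all non-negative):", H.shape)
```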
The record-breaking run
During the Los Alamos team's record-breaking run, the program used 25,000 GPUs to process a 340-terabyte dense matrix and an 11-exabyte sparse matrix.
According to Boian Alexandrov, a co-author of the article and a theoretical physicist at Los Alamos who led the team that created the SmartTensors artificial-intelligence platform, the group is achieving exabyte-scale factorization, which to their knowledge no one else has done.
Decomposition, or factorization, is a specialized data-mining technique that aims to extract the relevant information and transform the data into formats that are easier to understand.
Bhattarai also highlighted the algorithm's scalability, noting that traditional approaches frequently run into bottlenecks, mostly because of the delay in transferring data between a computer's processors and its memory.
The team also demonstrated that large computers are not always necessary, according to Boureima. Scaling to 25,000 GPUs is great if you can afford it, but the method also lets you analyze data on a desktop PC that you couldn't handle before.