Swarm learning is breaking down data silos

Swarm Learning enables machine learning on distributed data sources in compliance with data confidentiality requirements.

Like any large company, Hewlett Packard Enterprise (HPE) has had to deal with data silos across the enterprise, from sales and marketing to finance and supply chain. But to develop machine learning analytics and applications that drive insights across the enterprise, it needs to combine data from different sources.

Instead of building a data lake, where data can be out of date by the time analyses are performed, HPE decided a few years ago to adopt a data federation approach, in which data from different sources is aggregated in a virtual database that still provides real-time access to the underlying data.
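The idea of a virtual database over separate sources can be illustrated with a minimal sketch. It is not HPE's implementation; it assumes two hypothetical stores, `sales` and `finance`, with made-up tables and figures, and uses SQLite's ability to attach several databases to one connection. The data stays in its own store, and a view queries both at read time, so a change at the source shows up immediately.

```python
import sqlite3

# The main connection plays the role of the "virtual database".
conn = sqlite3.connect(":memory:")

# Two independent stores (hypothetical names), attached rather than copied.
conn.execute("ATTACH DATABASE ':memory:' AS sales")
conn.execute("ATTACH DATABASE ':memory:' AS finance")

# Illustrative tables and figures; nothing here comes from the article.
conn.execute("CREATE TABLE sales.orders (customer TEXT, amount REAL)")
conn.execute("CREATE TABLE finance.invoices (customer TEXT, paid REAL)")
conn.executemany(
    "INSERT INTO sales.orders VALUES (?, ?)",
    [("acme", 120.0), ("globex", 80.0)],
)
conn.executemany(
    "INSERT INTO finance.invoices VALUES (?, ?)",
    [("acme", 100.0), ("globex", 80.0)],
)

# A TEMP view may span attached databases. It stores no data of its own;
# every query against it reads the current rows of both sources.
conn.execute(
    """
    CREATE TEMP VIEW outstanding AS
    SELECT o.customer AS customer, o.amount - i.paid AS balance
    FROM sales.orders AS o
    JOIN finance.invoices AS i ON i.customer = o.customer
    """
)


def balances():
    """Return (customer, balance) pairs computed from the live sources."""
    cursor = conn.execute(
        "SELECT customer, balance FROM outstanding ORDER BY customer"
    )
    return cursor.fetchall()


before = balances()
# An update in the finance store is visible through the view right away.
conn.execute("UPDATE finance.invoices SET paid = 120.0 WHERE customer = 'acme'")
after = balances()
```

Here `before` reflects the original invoice and `after` reflects the update, with no copy or refresh step in between. That is the contrast with a data lake, where the combined data is only as fresh as the last load.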

HPE recently took its expertise in data federation to the next level when it worked with the German Center for Neurodegenerative Diseases on a research project using swarm learning to develop disease classifiers that draw on distributed patient data from different hospitals while ensuring data confidentiality.

“Swarm learning enables any hospital to perform its own machine learning on its own patient data,” said Goh Eng Lim, senior vice president and chief technology officer for artificial intelligence (AI) and high-performance computing at HPE. “There is no sharing of the data, but every now and then a blockchain comes in to collect the knowledge, which, in technical terms, is the weights and parameters of neural networks.”
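The pattern Goh describes, local training followed by a periodic exchange of model parameters only, can be sketched as follows. This is a simplified illustration, not the method used in the study: the blockchain coordination is replaced by a plain function call, the neural network by a one-variable linear model, and the merge step by a sample-weighted average of parameters. The function names and the synthetic data are all assumptions made for the example.

```python
import random


def local_update(weights, data, lr=0.05, epochs=20):
    """Run gradient descent for y = w*x + b on one site's private data."""
    w, b = weights
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2 * err * x
            grad_b += 2 * err
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return (w, b)


def merge_parameters(site_weights, site_sizes):
    """Average the sites' parameters, weighted by their sample counts."""
    total = sum(site_sizes)
    w = sum(wt[0] * n for wt, n in zip(site_weights, site_sizes)) / total
    b = sum(wt[1] * n for wt, n in zip(site_weights, site_sizes)) / total
    return (w, b)


def make_site_data(n, seed):
    """Synthetic stand-in for one hospital's records: y = 2x + 1 plus noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0, 0.05)) for x in xs]


# Three hypothetical hospitals with different amounts of data.
hospitals = [make_site_data(n, seed) for seed, n in enumerate([40, 60, 100])]
sizes = [len(data) for data in hospitals]

shared = (0.0, 0.0)  # parameters every site starts each round from
for _ in range(30):
    # Each site trains on its own data; raw records never leave the site.
    local = [local_update(shared, data) for data in hospitals]
    # Only the learned parameters are exchanged and merged.
    shared = merge_parameters(local, sizes)
```

After the rounds complete, `shared` ends up close to the slope and intercept used to generate the data, even though no site ever saw another site's records. Only the `(w, b)` tuples cross site boundaries.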

The study, published in the journal Nature in May 2021, found that the results were similar to those obtained by performing machine learning on a pooled data set, showing that swarm learning is viable while offering better confidentiality, privacy and data protection.

Swarm Learning is part of a portfolio of capabilities that HPE seeks to bring to enterprises through the Gaia-X initiative, a federated data infrastructure project supported by more than 300 organizations in Europe and around the world. The initiative is also seen as a move by the European Union to challenge the dominance of American tech giants in the global digital economy.

HPE is a member of the non-profit Gaia-X Association for Data and Cloud and contributes to the architecture, standards and certification of Gaia-X. It is already working with dozens of organizations across Europe to help them prepare for decentralized data infrastructures such as Gaia-X.

Goh noted that Asia-Pacific organizations that are already doing business in Europe, or planning to, will benefit from capabilities such as swarm learning under the auspices of Gaia-X.

Citing the example of credit card companies in Europe and Asia, which typically don’t share proprietary customer data for business reasons, he said they could benefit from sharing fraud profile insights based on what they know about their customers. “They learn what fraud is from their own customers, but they know they don’t see all of it, and maybe another credit card company saw things they didn’t see,” Goh said.

“So you are in a bind. On the one hand, you can’t share customer data, but you want to share insights from customer data related to a specific fraud profile. This is where swarm learning can come into play.” He added that Gaia-X also serves other areas such as health, agriculture and energy.

Swarm learning also comes into play at the edge. Today, data is typically sent from the edge to a central location such as a public cloud to train a machine learning model, which is then deployed back to the edge. With swarm learning, the training can be done at the edge locations themselves, with what is learned aggregated at the macro level.

“With 50 to 100 billion devices out there, most of your data will reside at your decentralized sources,” Goh said, adding that a lot of work is still needed to ensure highly distributed data sources can be linked in a consistent manner.
