A new kind of neural network for understanding symmetry and accelerating materials analysis.
Using a large unstructured data set of 25,000 images, scientists demonstrate for the first time a new type of machine learning technique to identify structural similarities and trends in materials.
Understanding the relationships between structure and properties is an important goal of materials research, according to Joshua Agar, a faculty member in the Department of Materials Science and Engineering at Lehigh University. However, because materials structure is complex and high-dimensional, there are currently few useful metrics for quantifying it.
Artificial neural networks, a type of machine learning, can be trained to spot similarities and even correlate parameters like structure and properties, but there are two main challenges, says Agar.
- One is that the vast majority of data generated by materials experiments is never analyzed. This is largely because such images, produced by scientists in laboratories all over the world, are rarely stored in a usable manner and are not usually shared with other research teams.
- The second challenge is that neural networks are not very effective at learning symmetry and periodicity (how periodic a material’s structure is), two features of utmost importance to materials researchers.
Now, a team led by Lehigh University has developed a machine learning approach that creates similarity projections, allowing researchers to search a database of unstructured images and spot trends for the first time. Agar and his coworkers developed and trained a symmetry-aware neural network, then applied their method to a set of 25,133 piezoresponse force microscopy (PFM) images collected on various material systems over five years at the University of California, Berkeley. The resulting projections revealed trends that form a basis for understanding structure-property relationships.
"One of the novelties of our work is that we built a special neural network to understand symmetry and use it as a feature extractor to improve understanding of the images," says Agar, lead author of the article describing the work, "Symmetry Aware Recursive Image Similarity Scan for Materials Microscopy," published in npj Computational Materials. The team arrived at the projections using Uniform Manifold Approximation and Projection (UMAP), a non-linear dimensionality reduction technique. This approach, Agar says, lets researchers learn the topology and high-level structure of the data in a fuzzy way and compress it into 2D.
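The symmetry-aware feature extraction Agar describes can be illustrated with a standard trick: averaging a descriptor over the rotations of the input, so the result cannot change when the image is rotated. This is a minimal sketch, not the authors' actual architecture; the toy `quadrant_features` function is a hypothetical stand-in for a learned network.

```python
import numpy as np

def quadrant_features(img):
    # Toy stand-in for a learned feature extractor (e.g. a CNN):
    # the mean intensity of each quadrant of the image.
    h, w = img.shape
    return np.array([
        img[:h // 2, :w // 2].mean(), img[:h // 2, w // 2:].mean(),
        img[h // 2:, :w // 2].mean(), img[h // 2:, w // 2:].mean(),
    ])

def c4_invariant_features(img):
    # Group-average the features over the four 90-degree rotations of
    # the input, making the descriptor invariant to those rotations.
    return np.mean([quadrant_features(np.rot90(img, k)) for k in range(4)],
                   axis=0)

img = np.arange(16.0).reshape(4, 4)
# The descriptor is identical for the image and its rotated copy:
print(np.allclose(c4_invariant_features(img),
                  c4_invariant_features(np.rot90(img))))  # True
```

Because the four rotated copies of a rotated image are the same set as the rotated copies of the original, the averaged descriptor is exactly rotation-invariant, which is the kind of symmetry a plain network struggles to learn on its own.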
"When you train a neural network, the result is a vector, a series of numbers that serves as a compact descriptor of an image's properties. Those properties help classify things so that some similarity is learned," says Agar. "But what is produced is still quite large, because it can have 512 or more different features. So you want to compress it into a space a human can comprehend, like 2D or 3D, or maybe 4D."
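The compression step can be sketched as follows. The 100-by-512 matrix here is a hypothetical stand-in for the network's per-image descriptors; the paper uses UMAP (the `umap-learn` package), shown in a comment, while a dependency-free principal-component projection stands in below (a linear reduction, unlike UMAP's non-linear one).

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the network's output: one 512-dimensional
# descriptor per image (here, 100 random "images").
X = rng.normal(size=(100, 512))

# The paper's non-linear compression, using umap-learn:
#   import umap
#   xy = umap.UMAP(n_components=2).fit_transform(X)
# Dependency-free placeholder: project onto the top two principal
# components via SVD instead.
Xc = X - X.mean(axis=0)           # center the descriptors
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
xy = Xc @ Vt[:2].T                # each image is now a 2-D point
print(xy.shape)                   # (100, 2)
```

Either way, each image ends up as a point in the plane, and images with similar descriptors land near each other, which is what makes the similarity projections browsable.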
In this way, Agar and his team were able to take more than 25,000 images and group very similar types of material together. Similar structure types sit semantically close in the projection, and certain trends become visible, says Agar; researchers can then apply metadata filters "to refine and get more and more similarity."
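A metadata-filtered similarity lookup over such a projection might look like the sketch below. The coordinates, material tags ("PZT", "BiFeO3"), and the `most_similar` helper are all hypothetical, chosen only to illustrate the idea of restricting a nearest-neighbor search by metadata.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical catalogue: a 2-D projected point and a metadata tag per image.
coords = rng.normal(size=(8, 2))
material = np.array(["PZT", "BiFeO3", "PZT", "PZT",
                     "BiFeO3", "PZT", "BiFeO3", "PZT"])

def most_similar(query_idx, tag):
    # Keep only images whose metadata matches the filter (excluding the
    # query itself), then return the nearest candidate in the projection.
    mask = (material == tag) & (np.arange(len(coords)) != query_idx)
    candidates = np.where(mask)[0]
    dists = np.linalg.norm(coords[candidates] - coords[query_idx], axis=1)
    return candidates[np.argmin(dists)]

hit = most_similar(0, "PZT")      # nearest other image tagged "PZT"
```

Layering filters like this is how a coarse 2-D map of 25,000 images can be narrowed down to increasingly specific structural neighborhoods.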
This similarity can then be linked to other parameters, such as properties. The work shows how better data storage and management could dramatically accelerate materials discovery. According to Agar, the images and data generated by failed experiments are of particular value.
"Nobody publishes failed results, and that's a big loss, because a few years later someone repeats the same set of experiments," says Agar. "You end up wasting really good resources on an experiment that probably won't work." Rather than losing all of that information, the data already gathered could be used to reveal trends that have never been seen before and exponentially accelerate discovery, he says.
This study is the first use case for DataFed, an innovative new scientific data management system developed at Oak Ridge National Laboratory. According to its website, DataFed is "a federated, big-data storage, collaboration, and full-life-cycle management system for computational science and/or data analytics within distributed high-performance computing (HPC) and/or cloud-computing environments."
Agar's team at Lehigh was involved in the design and development of DataFed to make it relevant for scientific use cases. Lehigh hosts the first live deployment of this fully scalable system. Because the database is federated, any group can stand up its own server and connect it to the central installation.
Agar is the machine learning expert on the team behind Lehigh University's Presidential Nano/Human Interface Initiative. The interdisciplinary initiative, which integrates the social sciences and engineering, aims to transform the way people interact with the tools of scientific discovery in order to accelerate innovation.
"One of the main goals of the Lehigh Nano/Human Interface Initiative is to provide experimenters with relevant, actionable information that enables better-informed decision-making and accelerates scientific discovery," says Agar. "Humans have limited memory and limited ability to recall. DataFed is a modern memex: a store of scientific information that is easy to find and recall."
"This is one of the key components of our Lehigh Presidential Nano/Human Interface Initiative (NHI) to accelerate scientific discovery," said Martin P. Harmer, Alcoa Foundation Professor in the Department of Materials Science and Engineering at Lehigh and director of the Nano/Human Interface Initiative.