The US Department of Energy’s (DOE) advanced Computational and Data Infrastructures (CDIs), such as supercomputers, edge systems at experimental facilities, massive data storage, and high-speed networks, are used to tackle the country’s most pressing scientific problems. These include supporting astrophysics research, delivering new materials, developing new drugs, designing more efficient engines and turbines, and producing more accurate and timely weather forecasts and climate change predictions. Increasingly, computational science campaigns leverage distributed, heterogeneous scientific infrastructures that span multiple sites connected by high-performance networks, so scientific data must be transferred from instruments to computing, storage, and visualization facilities.
However, because these federated service infrastructures are typically complex and managed by different organizations, domains, and communities, both the infrastructure operators and the scientists who use them have limited global visibility, resulting in an incomplete understanding of the behavior of cross-resource scientific workflows. Although scientific workflow systems greatly increase the productivity of scientists in managing and orchestrating computational campaigns, the complex nature of CDIs, including the heterogeneity of resources and the deployment of complex system software stacks, poses several challenges in predicting the behavior of scientific workflows and in steering them past system and application anomalies.
Our new project will provide an integrated platform consisting of algorithms, methods, tools, and services that will help DOE facility operators and scientists to address these challenges and improve the overall end-to-end science workflow.
– Research professor of computer science and research director at the University of Southern California
With new funding from the DOE, the project aims to advance the understanding of how simulation and machine learning (ML) methods can be used and extended to improve the DOE’s computational and data science. The project will add three key capabilities to current scientific workflow systems: (1) predicting the performance of complex workflows; (2) detecting and classifying anomalies in the infrastructure and the workflows and “explaining” the sources of these anomalies; and (3) suggesting performance optimizations. To deliver these capabilities, the project will explore novel simulation, ML, and hybrid methods to predict, understand, and optimize the behavior of complex DOE scientific workflows on DOE CDIs.
The Deputy Director of Research and Network Infrastructure at RENCI stated that, in addition to creating a more efficient timeline for researchers, the team wants to provide CDI operators with the tools to efficiently identify and address anomalies as they occur in the complex DOE landscape. To detect anomalies, the project will explore real-time machine learning models that detect and classify anomalies by exploiting underlying spatial and temporal correlations and expert knowledge, combining heterogeneous sources of information, and generating real-time predictions. The project will enable scientists working at the DOE’s scientific frontiers to execute complex workflows efficiently and reliably across a wide range of DOE resources, reducing the time to discovery.
Additionally, the project will develop machine learning methods that can learn corrective behaviors on their own and optimize workflow performance, with an emphasis on the explainability of these optimization methods, advancing scientific discovery and transforming the way computational and data science is conducted.
As reported by OpenGov Asia, the US Department of Energy’s (DOE) Argonne National Laboratory is at the forefront of efforts to combine cutting-edge artificial intelligence (AI) and simulation workflows to better understand biological observations and accelerate drug discovery. Argonne has worked with academic and commercial research partners to achieve near real-time feedback between simulation and AI approaches, in order to understand how two proteins in the SARS-CoV-2 viral genome interact to help the virus replicate and evade the host’s immune system.