How to Empower AI’s Data

Artificial intelligence (AI) is one of the most important technological trends of the next decade. Propagating and collecting data is the default state of modern business and internet activity in an increasingly digital world. The issue for businesses is not a lack of data, but rather an abundance of it. Despite the massive amounts of data available to industrial companies, most find that their AI systems are not providing the insights they expected. The solution is to filter data so that only relevant data reaches AI systems. This “smart data” approach will enable AI systems to generate the expected insights.

What exactly is Smart Data?

AI is an important part of the fourth industrial revolution. AI extracts insights from Big Data that no human could discover. The more data an AI system has, meaning more variables, longer timescales, and finer granularity, the greater its potential for insight.

Years of data can be used by AI to discover the optimal parameters for industrial processes that use controlling variables. These insights can then be applied to these industrial systems to improve their performance.

Despite AI’s promise, many industrial companies have yet to reap the benefits of propagating and collecting so much data. According to McKinsey, while 75% of industrial companies have tried some form of AI system, only 15% have seen any meaningful, scalable impact from AI. McKinsey identifies a lack of operational insight as a barrier to their use of AI. An approach that relies on data without that operational insight can be successful, but only within very specific parameters, and often with frequent retraining and a large number of inputs. It can also lead to nonphysical or unrealistic results.

Hence, these AI models cannot be used in the real world or to produce the kinds of meaningful change that their users expect. As a result, teams become dissatisfied with the system and lose faith in AI.

Smart data is the answer. For big data to deliver the expected insights, it must be reduced to fewer variables, chosen through feature engineering based on first principles. This re-engineering of data to produce smart data, combined with more appropriate training, can result in superior returns ranging from 5% to 15%.

Smart data has been defined in a variety of ways, but its key characteristic is that the data has been prepared and organized at the point of collection, so that it is ready and optimized for faster, higher-quality, and more insightful analytics.

Donna Roy, then executive director of the US Department of Homeland Security’s Information Sharing and Services Office, stated at a 2018 conference that her teams spend about 80 percent of their time just searching for, ingesting, and getting data ready for analysis. The smart data approach has assisted federal agencies in optimizing processes, speeding up operations, and making them more intelligent. According to Wired, smart data is information that makes sense.

How Do You Create Smart Data?

Let’s take a look at five steps to creating smart data.

  1. Define the Process

The first step in developing smart data is to define the process. Processes must be broken down into clearly defined steps together with the company’s plant engineers and experts, with physical and chemical changes sketched out. The critical instruments and sensors must be identified, along with their limits, maintenance timeframes, measurement units, and whether they can be controlled. Physical systems contain deterministic elements that are governed by well-defined equations. These equations, as well as their variables, must be noted. To deepen their own understanding, teams must also study the literature surrounding these equations.
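As a rough illustration of what the outcome of this step might look like, the sketch below records each sensor with its unit, limits, maintenance interval, and controllability. The `SensorTag` class, its field names, and the example tags are all hypothetical, chosen only to mirror the attributes listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SensorTag:
    """One instrument in the process definition (hypothetical structure)."""
    name: str
    unit: str
    low_limit: float
    high_limit: float
    controllable: bool
    maintenance_interval_days: Optional[int] = None

    def in_range(self, value: float) -> bool:
        # A reading outside the instrument's limits is suspect.
        return self.low_limit <= value <= self.high_limit


# Hypothetical tags for a simple heating step; values are illustrative only.
TAGS = [
    SensorTag("feed_flow", "kg/s", 0.0, 50.0, controllable=True,
              maintenance_interval_days=180),
    SensorTag("inlet_temp", "degC", 0.0, 200.0, controllable=False,
              maintenance_interval_days=365),
    SensorTag("outlet_temp", "degC", 0.0, 250.0, controllable=False),
]

# Only controllable variables can later be acted on by operators.
controllable_names = [tag.name for tag in TAGS if tag.controllable]
print(controllable_names)
```

Keeping this registry explicit makes it easier to check later that a model's recommendations only touch variables the plant can actually adjust.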

  2. Enhance the Data

We’ve all heard the phrase “bad data in, bad data out,” but the reality is that all data is bad data in some way. Raw process data is always incomplete. So, rather than increasing the amount of data available, your task is to improve the quality of the dataset. Non-steady-state information, such as data recorded while the process was starting up, shutting down, or otherwise in transition, must be aggressively weeded out.
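One simple way to weed out non-steady-state samples is a rolling standard-deviation test, sketched below. This is only one possible heuristic, not a method the article prescribes; the function name, window size, threshold, and readings are all assumptions made for illustration.

```python
from statistics import pstdev


def steady_state_mask(values, window=5, max_std=0.5):
    """Mark a sample True when the trailing window is nearly flat.

    A sample counts as steady when the population standard deviation of
    the last `window` readings (including itself) is at most `max_std`.
    """
    mask = []
    for i in range(len(values)):
        if i + 1 < window:
            mask.append(False)  # not enough history to judge
            continue
        recent = values[i + 1 - window : i + 1]
        mask.append(pstdev(recent) <= max_std)
    return mask


# Illustrative temperature trace: flat, then a ramp, then flat again.
readings = [100.0, 100.1, 99.9, 100.0, 100.1,
            104.0, 109.0, 115.0, 120.0,
            120.1, 119.9, 120.0, 120.1, 120.0]

mask = steady_state_mask(readings, window=5, max_std=0.5)
steady = [value for value, keep in zip(readings, mask) if keep]
print(steady)
```

The window and threshold would need tuning per signal, since a tolerance that suits a temperature loop may be far too loose or too tight for a flow or pressure measurement.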

  3. Reduce the Dimensionality

AI models are created by matching outputs, known as observables, to inputs, known as features. To obtain a generalized model, the number of observations must far outnumber the number of features. Inputs are frequently combined to generate new features. When the typical plant’s abundance of sensors is taken into account, the number of candidate features becomes very large, which in turn demands a vast number of observations. Instead, the inputs that describe the physical processes involved should be selected and funneled through deterministic equations, reducing the dimensionality while creating features that intelligently combine sensor information.
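To make this concrete, the sketch below collapses three raw sensor readings into a single first-principles feature using the sensible heat balance Q = m · cp · (T_out − T_in). The choice of a heat-duty feature, the water-like fluid, and the sample rows are assumptions for illustration, not something taken from the article.

```python
# Approximate specific heat of liquid water in kJ/(kg*K); assumed fluid.
CP_WATER_KJ_PER_KG_K = 4.18


def heat_duty_kw(mass_flow_kg_s, inlet_temp_c, outlet_temp_c,
                 cp=CP_WATER_KJ_PER_KG_K):
    """Sensible heat duty in kW: Q = m * cp * (T_out - T_in)."""
    return mass_flow_kg_s * cp * (outlet_temp_c - inlet_temp_c)


# Illustrative raw sensor rows: three inputs per observation.
rows = [
    {"mass_flow_kg_s": 2.0, "inlet_temp_c": 20.0, "outlet_temp_c": 60.0},
    {"mass_flow_kg_s": 2.5, "inlet_temp_c": 22.0, "outlet_temp_c": 58.0},
]

# One engineered feature replaces the three raw inputs.
features = [{"heat_duty_kw": heat_duty_kw(**row)} for row in rows]
print(features)
```

Here three columns become one, and the resulting feature has a physical meaning that plant engineers can check against their own calculations.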

  4. Make Use of Machine Learning

Industrial processes contain both deterministic and stochastic elements. The deterministic components are captured by first-principles-based features, while the stochastic components are captured by machine learning. Features should be evaluated in order to determine their significance and explanatory power. Ideally, the most important features should be expert-engineered features.
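A minimal sketch of this split, under stated assumptions: a physics equation gives the deterministic estimate, and a fitted model explains only what is left over. Ordinary least squares on a single variable stands in for "machine learning" here purely to keep the example short. The steam-demand relation, the ambient-temperature effect, and the synthetic data are all hypothetical.

```python
# Approximate latent heat of vaporization of water near 100 degC, kJ/kg.
LATENT_HEAT_KJ_PER_KG = 2257.0


def physics_steam_kg_s(heat_duty_kw):
    """Deterministic part: ideal steam demand for a given heat duty."""
    return heat_duty_kw / LATENT_HEAT_KJ_PER_KG


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic data generated from a known relation so the fit can be checked.
duty_kw = [300.0, 350.0, 400.0, 450.0, 500.0]
ambient_c = [10.0, 25.0, 15.0, 30.0, 20.0]
measured = [physics_steam_kg_s(d) + 0.02 - 0.0005 * a
            for d, a in zip(duty_kw, ambient_c)]

# The learned part models only the residual the physics does not explain.
residuals = [m - physics_steam_kg_s(d) for m, d in zip(measured, duty_kw)]
intercept, slope = fit_line(ambient_c, residuals)


def hybrid_predict(heat_duty_kw, ambient_temp_c):
    """Physics estimate plus the learned residual correction."""
    return (physics_steam_kg_s(heat_duty_kw)
            + intercept + slope * ambient_temp_c)


print(hybrid_predict(400.0, 15.0))
```

Because the learned part only has to explain the residual, it can stay small, and the physics term keeps predictions plausible outside the range of the training data.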

Models should prioritize plant improvements over achieving maximum predictive accuracy. Process data tends to be highly correlated, so a correlation on its own may be meaningless. What is required is to separate out the causal elements and identify the controllable variables.

  5. Model Implementation and Validation

Models must be implemented in order to have the anticipated meaningful impact. Results must be continuously evaluated by examining key features to ensure that they correspond to physical processes. Partial dependence plots must also be reviewed in order to learn about causality and confirm controllable elements.
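For readers unfamiliar with partial dependence, the sketch below computes one by hand: the feature of interest is forced to each grid value in every row, and the model's predictions are averaged. The `toy_model` function and its coefficients are hypothetical stand-ins for a trained model, and the feature names are illustrative.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is set to each value in `grid`.

    `predict` takes one row (a dict of feature values) and returns a number.
    All other features keep their observed values.
    """
    curve = []
    for value in grid:
        predictions = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(predictions) / len(predictions))
    return curve


def toy_model(row):
    # Hypothetical stand-in for a trained model.
    return 2.0 * row["feed_rate"] + 0.5 * row["ambient_temp"]


rows = [
    {"feed_rate": 1.0, "ambient_temp": 10.0},
    {"feed_rate": 3.0, "ambient_temp": 30.0},
]

curve = partial_dependence(toy_model, rows, "feed_rate", [0.0, 1.0, 2.0])
print(curve)  # [10.0, 12.0, 14.0]
```

A curve like this shows how the model responds to a feature on average. Whether that response matches the physical process is the question engineers and operators then need to review.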

To better understand what is implementable and what performance expectations make sense, operations teams must be consulted and made a critical part of the process. Control room operators must receive model results as they are generated, or teams must conduct on-off testing so that management can determine whether it is worthwhile to invest capital in full-scale solutions.

Conclusion

AI holds enormous promise, and it seems counterintuitive to suggest that limits or guardrails should be placed around the data that is being propagated and collected today. Nonetheless, Big Data frequently fails to yield meaningful AI insights. Smart data can ensure that AI has the significant impact that we expect.
