Summary. Data is more important than ever, but most organizations still struggle with a few common issues: They focus more on data infrastructure than data products; data is often created with the needs of a particular department in mind, but little thought for the end use; they lack a common “data language” with each department coding and classifying with their own system; and they’re increasingly focused on outside data, but have few quality control systems in place. By focusing on “data supply chain” management, companies can address these and other issues. Similar to physical supply chains, companies should think systematically, focus on end products, define standards and measurements, introduce quality controls, and constantly refine their approach across all phases of data gathering and analysis.
Data management has bedeviled large companies for decades. Almost all firms spend a lot on it but find the results unsatisfactory. While the issue does not appear to be growing worse, resolving it is increasingly urgent as managers and companies strive to become more data driven, leverage advanced analytics and artificial intelligence, and compete with data. In this article we’ll explore a powerful approach to data management through the lens of “data products” and “data supply chains.”
Most companies struggle with a few common but significant data management issues.
First, companies have concentrated on the technical capabilities of data management, which are controlled by the IT function and are needed to acquire, store, and move data. This is no mean feat — building technical “pipes” is a challenging job. But in so doing they have focused more on infrastructure and much less on the outputs: the data products that are used to make decisions, differentiate products and services, and satisfy customers.
Second, data is created in different parts of the organization to meet the needs of various departments, not for later use by others in data products, business decisions, or processes. Contrast that with a physical product, such as a car, where components such as the chassis and the starter are designed with the end product in mind.
Third, most organizations lack a common data language. Data is subtle and nuanced and has different meanings for different people in different contexts. Exacerbating this, some departments, taking ownership of “their data,” may be reluctant to share. Or, while willing to share, they will not make time to explain these nuances so that others can use the data effectively. This leads other departments to set up their own “near-redundant” databases, adding to the overall confusion.
Finally, companies are increasingly interested in what happens outside their walls, tapping external data to answer a variety of questions. But external data is largely unmanaged, with little supplier qualification or data quality assessment.
Data supply chain management, with data products as the end result of the process, can help to address each of these issues. It puts equal emphasis on all phases of data management — from collection to organization to consumption of data products. It’s a means of balancing the benefits of common data with those of unique and tailored data in products, and it’s equally suited to internal and external data. Relatively few companies employ data supply chain management, but those that do tend to report better results.
Process and Supplier Management for Data Products
Companies have always produced data products in the form of financial statements, reports to regulators, and so forth. Still, the range and importance of such products are growing. For many, the goal is to embed analytics and AI-derived models into products that serve both internal and external customers. Morgan Stanley’s Next Best Action, LinkedIn’s People You May Know, Google’s many search offerings, and MasterCard’s SpendingPulse and Business Locator are good examples. With the issues cited above on full display, “wrangling” the data takes far longer than building the model and still doesn’t solve all the issues.
Fortunately, there is a better way to source high-quality data. It builds on the process and supplier management techniques used by manufacturers of physical products. In particular, manufacturers extend deep into their supply chains to clarify their requirements, qualify suppliers, insist that suppliers measure quality, and make needed improvements at the source(s) of problem(s). This enables them to assemble components into finished products with minimal “physical product wrangling,” improving quality and lowering costs.
One organization employing supplier quality management in its data supply chain is Altria, the U.S.-based provider of tobacco and smoke-free products. Altria depends on point-of-sale data from more than 100,000 convenience stores daily to complete its market reports and analysis. A team reporting to Kirby Forlin, VP Advanced Analytics, manages this base. Data requirements are spelled out in contracts, and the team aims to help stores meet them. To begin, Altria concentrated on its most basic requirements. Quality was poor, with only 58% of daily submissions meeting them. But the Altria team worked patiently, improving quality to 98% in three years. As the score for basic quality improved, the Altria team added its more advanced requirements to the mix. As Forlin noted, “This is a work in progress. The evidence that we can increasingly trust the data saves us a lot of work in our analytics practice and builds trust into our work.”
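The kind of score Altria tracks, the share of daily submissions that meet a set of basic requirements, can be sketched in a few lines of Python. This is only an illustration: the field names (`store_id`, `sale_date`, `upc`, `units`), the three rules, and the sample records are all hypothetical and are not Altria's actual contract requirements.

```python
from datetime import date

# Hypothetical basic requirements for one daily point-of-sale submission.
REQUIRED_FIELDS = ("store_id", "sale_date", "upc", "units")


def meets_basic_requirements(submission: dict, expected_date: date) -> bool:
    """Return True if a submission satisfies every illustrative rule."""
    # Rule 1: every required field is present and non-empty.
    if any(submission.get(field) in (None, "") for field in REQUIRED_FIELDS):
        return False
    # Rule 2: the submission covers the expected business day.
    if submission["sale_date"] != expected_date:
        return False
    # Rule 3: the unit count is a non-negative integer.
    units = submission["units"]
    return isinstance(units, int) and units >= 0


def conformance_rate(submissions: list[dict], expected_date: date) -> float:
    """Share of submissions (0.0 to 1.0) that meet the basic requirements."""
    if not submissions:
        return 0.0
    passed = sum(meets_basic_requirements(s, expected_date) for s in submissions)
    return passed / len(submissions)


# Invented sample data for a single business day.
day = date(2024, 1, 15)
sample = [
    {"store_id": "A1", "sale_date": day, "upc": "0001", "units": 12},
    {"store_id": "B2", "sale_date": day, "upc": "", "units": 3},       # missing UPC
    {"store_id": "C3", "sale_date": day, "upc": "0002", "units": -1},  # negative count
    {"store_id": "D4", "sale_date": day, "upc": "0003", "units": 0},
]
rate = conformance_rate(sample, day)
print(f"{rate:.0%} of submissions meet basic requirements")
```

Tracking this single number over time, and adding stricter rules only once the basic score is high, mirrors the staged approach described above.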
Steps Toward a Data Supply Chain
The data supply chain can be established within a company by following some of the same steps used in process and quality management for physical supply chains:
1. Establish management responsibilities. As step 1a, the chief data officer or product manager should name a “data supply chain manager” from their staff to coordinate the effort and recruit “responsible parties” from each department (including external data sources) across the supply chain. Step 1b is to put issues associated with data sharing and ownership front and center. We find that most issues melt away, as few managers wish to take a hard stance against data sharing in front of their peers.
2. Identify and document the data and associated cost, time, and quality requirements needed to create and maintain data products.
3. Describe the supply chain. Develop a flowchart that describes the points of data creation (or the original sources of data) and the steps taken to move, enrich, and analyze data for use in data products.
4. Define and establish measurements. Generally, the idea is to implement measurements that indicate whether requirements are met. Start with data accuracy and the elapsed time from data creation to incorporation into a data product. Measures will vary for each data product’s supply chain.
5. Establish process control and assess conformance to requirements. Use the measurements of step four to bring the process under control, determine how well the requirements of step two are met, and identify gaps.
6. Investigate the supply chain to identify needed improvements, both overall and for particular data products. Determine where gaps uncovered in step five originate in the flowchart of step three.
7. Make improvements and continuously monitor. Identify and eliminate root causes of gaps identified in step six, and return to previous steps if necessary. Continuously monitor both the input data and the data products, looking both to improve products and to find the new data and better sources needed to do so.
8. “Qualify” data sources. Companies will continue to employ increasing numbers of external data suppliers, and it is helpful to identify those that consistently provide high-quality data. Audits of their data quality programs provide the means to “qualify” those that do and to identify areas of weakness in those that do not.
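The measurement and process-control steps above can be made concrete with a p-chart, a standard statistical process control tool for tracking a proportion such as the daily share of records that fail requirements. This is a sketch under assumptions: the article does not prescribe a particular control technique, the function names are hypothetical, and the daily counts are invented for illustration.

```python
import math


def p_chart_limits(defects: list[int], sample_size: int) -> tuple[float, float, float]:
    """Return (lower limit, center line, upper limit) for a p-chart.

    defects[i] is the number of records failing requirements on day i;
    sample_size is the constant number of records checked each day.
    """
    # Center line: overall proportion of failing records.
    p_bar = sum(defects) / (len(defects) * sample_size)
    # Standard deviation of a daily proportion under a binomial model.
    sigma = math.sqrt(p_bar * (1 - p_bar) / sample_size)
    # Conventional three-sigma limits, clipped to the valid range [0, 1].
    lower = max(0.0, p_bar - 3 * sigma)
    upper = min(1.0, p_bar + 3 * sigma)
    return lower, p_bar, upper


def out_of_control_days(defects: list[int], sample_size: int) -> list[int]:
    """Return the indices of days whose failure rate falls outside the limits."""
    lower, _, upper = p_chart_limits(defects, sample_size)
    return [
        day for day, count in enumerate(defects)
        if not lower <= count / sample_size <= upper
    ]


# Invented data: failing records per day, out of 500 records checked daily.
daily_defects = [8, 10, 9, 12, 11, 10, 30, 9, 11, 10]
flagged = out_of_control_days(daily_defects, sample_size=500)
print(flagged)  # [6]: the day with 30 failures exceeds the upper limit
```

Days flagged this way are natural starting points for the investigation step: trace the failing records back through the flowchart to the point in the supply chain where they originated.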
Key Bank, a top 20 U.S. bank in asset size, uses a broad data supply chain concept to structure its data management initiatives. It breaks its process into the areas of “capture/organize/consume” and attempts to improve efficiency and effectiveness in each area. It recently shifted much of its data storage and analytics to the cloud, and found major improvements in flexibility and speed across the supply chain. Its consumption activities were historically focused on classic business intelligence capabilities, but now it also has a strong data science function.
That necessitated a change in the supply chain toward greater data virtualization and the ability to construct views of data that cut across different data marts and that incorporate external data as well. The bank has been able to use its data supply chain to rapidly develop new banking products that rely heavily on data. For example, it was one of the largest lenders of Paycheck Protection Program loans in the U.S., and also recently introduced a national digital bank for doctors. Mike Onders, the bank’s chief data officer, is effectively the data supply chain manager. He and his staff have evaluated the ability of the bank’s data supply chain to supply a variety of needed data products.
We urge all companies to aggressively manage their most important data supply chains. Data is as important an asset to businesses as any other type, and data products are increasingly as important as physical ones. The same thinking that has improved physical supply chains for decades is proving equally valuable for data.