Role of Data Science in Epidemiology

During the COVID-19 pandemic, many data science and business analytics practitioners have been pulled, mostly willingly, into the field of epidemiology. Large businesses with data science teams wanted to learn as much as possible about the likely course of the infection in the places where they do business. Some also had epidemiologists or medical officers on staff, but those groups didn’t necessarily have enough analytical talent to run the numbers on the virus’s prevalence and growth.

These data scientists were primarily attempting to report or predict cases, deaths, or both due to the coronavirus. Although many websites offered basic descriptive analytics on the prevalence of the virus, most didn’t offer predictions about future cases and deaths, or provide data at a level of geography granular enough to be useful to companies. Each company had its own rationale for doing this work, depending on its industry and business model and on how the pandemic affected its business, customers, or workforce.

Because each use case for data science depends on its context, I’ll describe each example in terms of the company that employed it. Several of the companies and their representatives wished to remain anonymous, but they confirmed the details of their projects.

Predicting Deaths at a Life Insurance Company

The analytics and data science group at a large life insurance company began a project in March 2020 to predict deaths from COVID-19. A life insurance company needs to understand and predict the course of any pandemic that leads to a sizable increase in unexpected deaths. The company was also interested, of course, in when its agency and office employees might return to the office safely, and in what numbers.

Their models suggested that total deaths from COVID-19 would be higher than most other estimates, depending in part on the measures taken to control the virus. The models rely not only on extrapolations of reported deaths, but also on analyses of “excess deaths,” meaning those likely to be from COVID-19 but not officially reported as such. The data scientists have revised their model several times to account for new data and new public health policies across the US. The models aggregate state-level predictions and include state-specific undercount factors, along with the state-level effects of tightening and reopening policies. The company then assigned each state to one of four standardized opening phases. The phase categories incorporate factors such as the opening or closing of schools, nonessential businesses, and other facilities and institutions.
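The excess-deaths idea can be illustrated with a short calculation. What follows is a minimal sketch, not the insurer's actual model: the function names, the use of a prior-year average as the baseline, the floor of 1.0 on the undercount factor, and every figure are hypothetical assumptions chosen for illustration. It estimates excess deaths as observed all-cause deaths minus a historical baseline, derives an undercount factor for each state, and sums the adjusted state figures into a national total.

```python
from statistics import mean


def excess_deaths(observed, prior_years):
    """Observed all-cause deaths minus the mean for the same period in prior years."""
    return observed - mean(prior_years)


def undercount_factor(excess, reported_covid):
    """Ratio of excess deaths to officially reported COVID-19 deaths.

    Assumption: the factor is floored at 1.0 so that reported deaths
    are never scaled down.
    """
    if reported_covid <= 0:
        return 1.0
    return max(1.0, excess / reported_covid)


def national_adjusted_deaths(states):
    """Sum reported deaths across states, each scaled by its own undercount factor."""
    total = 0.0
    for s in states.values():
        excess = excess_deaths(s["observed"], s["prior_years"])
        factor = undercount_factor(excess, s["reported_covid"])
        total += s["reported_covid"] * factor
    return total


# Hypothetical figures for two made-up states (not real data)
states = {
    "State A": {"observed": 12_000, "prior_years": [10_000, 10_200, 9_800], "reported_covid": 1_600},
    "State B": {"observed": 5_300, "prior_years": [5_000, 5_100, 4_900], "reported_covid": 400},
}

print(national_adjusted_deaths(states))
```

In this made-up example, State A's excess deaths exceed its reported COVID-19 deaths, so its reported figure is scaled up, while State B's is left unchanged. A real model would also need to account for deaths indirectly caused by the pandemic and for reporting lags.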

The data scientists have also made more granular, county-level predictions to assess the impact on agencies and offices. The analytics team did not predict the number of COVID cases, in part because the number of cases has less impact on the company’s business, but primarily because the available data on US cases is less reliable. All of the analyses were received with great interest by various executives and groups within the company.

Forecasting Staffing Implications for a Logistics Company

The head of health and safety at a logistics company was thinking about how data might help the company adapt to the pandemic. Since his function included medical leave programs, he was particularly interested in predicting medical leaves for COVID-19, and understanding how they might affect company operations. He asked his analytics group to create a dashboard of COVID-19 impacts on the company. One key item was predictions of medical leaves based on COVID.

The health and safety leader said the dashboard has been very popular, and he gets requests for it from all over the company. In general, however, he notes that managers have been more interested in descriptive data on what has already happened than in predictions of what might happen.

Predicting the Impact on Meat Processing Plants for an Animal Health Company

First Analytics, an analytics and data science services company (where I am co-founder and nonexecutive chairman), does analytics work for large companies. When the COVID-19 pandemic hit, Mike Thompson and Rob Stevens, who lead the firm, thought that some of their clients might be interested in predictive analytics on COVID prevalence in the US. They knew that there were several sources of descriptive county-level case and death data in the US, but none of them, at least at the time, were predictive. So the First Analytics team created a predictive model that took county-level data aggregated by the New York Times and predicted case and death rates several weeks out. It took into account the lockdown status of the state or county and the percentage of tests coming back positive in the area. Of course, the model could be confounded by local outbreaks of the virus in a prison or nursing home.
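To make the idea of projecting county-level counts a few weeks out more concrete, here is a deliberately simple sketch. It is not the First Analytics model, whose details are not described here: it fits a log-linear trend to recent daily case counts and extrapolates it, and it ignores the lockdown status and test-positivity inputs mentioned above. The function names and the sample data are hypothetical.

```python
import math


def fit_log_linear(daily_cases):
    """Least-squares fit of log(cases) = a + b * day_index; returns (a, b)."""
    if len(daily_cases) < 2:
        raise ValueError("need at least two days of data to fit a trend")
    xs = list(range(len(daily_cases)))
    ys = [math.log(max(c, 1)) for c in daily_cases]  # guard against log(0)
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def project(daily_cases, days_ahead):
    """Extrapolate the fitted trend for the given number of days past the data."""
    a, b = fit_log_linear(daily_cases)
    start = len(daily_cases)
    return [math.exp(a + b * (start + d)) for d in range(days_ahead)]


# Hypothetical county history: 14 days of cases growing 10% per day
history = [100 * 1.1 ** d for d in range(14)]
forecast = project(history, days_ahead=14)
print(round(forecast[0]), round(forecast[-1]))
```

A pure trend extrapolation like this breaks down quickly when policies change or a localized outbreak occurs, which is one reason a practical model would bring in additional signals such as the ones described above.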

First Analytics had done analytics consulting for Elanco, a leading animal health company, and asked whether the company was interested in using the COVID-19 predictions. Michael Genho, head of analytics and other knowledge-based solutions for the company, said that he was interested in discussing the idea. The primary interest was not for internal use at Elanco, but rather for customers who have large herds of livestock. COVID-19 has been particularly problematic for meat processing plants, which have had 40,000 cases in the US, in part because workers have little room for social distancing. If a plant closed down or reduced its capacity, livestock owners would have nowhere to bring their animals for slaughter. During normal times, they carefully plan to bring animals in for processing when they are at the optimal weight.

Elanco does have epidemiologists on staff, but they focus on animals. The analytics group commonly works with commercial leaders to help them make business decisions using data and analysis. The predictive model accurately identified meat processing plants that were soon to face challenges from COVID outbreaks. It expressed the likelihood of problems at each plant by segmenting plants into green, yellow, and red categories. The best predictions were made one or two weeks before the plants closed or reduced capacity.

Customers, who otherwise were relying only on their intuition, valued the forecasts and asked to talk each week with Elanco when the predictions were updated. The predictions were augmented by Genho’s analytics group with data on weekly production of the plants, and news on plant shutdowns, slowdowns, whistleblowers among employees, and actual COVID cases in plants. Customers had some options in terms of moving livestock to other facilities or changing the window when they would go to market. The customers didn’t use the dashboard in an interactive fashion, but they were happy to get the predictions from Elanco.

Field Sales Safety at a Consumer Products Company

A consumer products company that sells through grocery retailers was concerned about the health and safety of its field sales force when visiting stores in COVID hot zones. The sales reps had been pulled out of stores in March, and the company was trying to determine when it would be safe for them to return. The company’s analytics group heard about the county-level prediction model from Rob Stevens at First Analytics, and applied it to analyze individual stores. A member of the analytics group put together a COVID-19 tracker, an internal, location-based view of COVID cases in the company’s plants and offices. Another version assessed store safety: each store for which a field sales rep was responsible was given a “red/yellow/green” tag based on how many COVID cases there were in its county.
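The store-tagging step could look something like the following sketch. The company's actual rules are not described, so everything here is an assumption: the use of recent cases per 100,000 residents rather than raw counts, the threshold values, the function names, and the sample stores and counties are all hypothetical.

```python
def county_tag(recent_cases, population, yellow_at=50.0, red_at=200.0):
    """Tag a county by recent cases per 100,000 residents.

    The thresholds are illustrative assumptions, not published guidance.
    """
    rate = recent_cases * 100_000 / population
    if rate >= red_at:
        return "red"
    if rate >= yellow_at:
        return "yellow"
    return "green"


def tag_stores(store_to_county, county_stats):
    """Give each store the tag of the county it sits in."""
    tags = {}
    for store, county in store_to_county.items():
        stats = county_stats[county]
        tags[store] = county_tag(stats["recent_cases"], stats["population"])
    return tags


# Hypothetical stores and county figures (not real data)
county_stats = {
    "County X": {"recent_cases": 30, "population": 100_000},
    "County Y": {"recent_cases": 120, "population": 100_000},
    "County Z": {"recent_cases": 900, "population": 300_000},
}
store_to_county = {"Store 1": "County X", "Store 2": "County Y", "Store 3": "County Z"}

print(tag_stores(store_to_county, county_stats))
```

Normalizing by population keeps a large county from being flagged simply because it has more residents. It does not address the concern raised by the Health and Safety and Legal teams, which is that a coarse label can be misread as a guarantee.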

The analytics group provided the analysis to the company’s Health and Safety and Legal teams, who were discussing what messages to send to employees. The teams found the predictive model interesting and useful, but they did not want to send the predictions to employees because they thought the predictions might be difficult to explain. In addition, they were concerned that a “green” rating for a store might lead salespeople to skip precautions when visiting it.

Data Science and Epidemiology, on Balance

I learned several lessons from examining how data science and analytics groups are addressing COVID-19 data and acting as amateur epidemiologists. First, there aren’t enough epidemiologists in business to go around, so data scientists and business analytics professionals can provide helpful information to decision-makers. They may not be trained in epidemiology, but the principles of data science and analytics can easily be applied to the field.

However, given the challenges of applying these analytical results to daily operations, companies may be more comfortable providing insights to customers than to their own employees. And in many cases, because of the lack of historical data about pandemics, predictive analytics were somewhat less trustworthy to decision-makers than descriptive analytics in this uncertain time. And although their skills can be applied to epidemiology, all the data science and analytics people I spoke with will be happy to return to more traditional domains like demand forecasting and customer behavior analysis when the COVID-19 pandemic is no longer with us.