Feature stores manage data pipelines that transform raw data to feature values.
According to a Gartner study, 85 percent of AI projects will flatline by 2022. Even carefully built machine learning models may not meet expectations when deployed in an enterprise setting, mainly for two reasons: inadequate data infrastructure and talent scarcity.
In the machine learning pipeline, searching for appropriate data and preparing datasets are among the most time-consuming steps. A data scientist spends around 80 percent of their time managing and preparing data for analysis. The gap between the demand for and supply of qualified data scientists is another pressing challenge.
Enter the feature store.
What Are Feature Stores?
A feature store allows features (measurable pieces of data) to be registered, discovered, and used both in machine learning pipelines and in online applications for model inference. It can store large volumes of feature data and provide low-latency access to features for online applications. A feature store automates the input of data into machine learning models, and tracks and governs that data along the way. Enterprise AI can benefit immensely from such a centralised and reproducible framework for managing the data behind machine learning models.
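The register, discover, and serve flow described above can be illustrated with a toy example. This is a minimal in-memory sketch, not how any production feature store is implemented: the class `ToyFeatureStore`, its method names, and the ride-hailing features are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class FeatureDefinition:
    """A named transformation from a raw record to a single feature value."""
    name: str
    description: str
    transform: Callable[[Dict[str, Any]], Any]


class ToyFeatureStore:
    """Hypothetical in-memory feature store, for illustration only."""

    def __init__(self) -> None:
        self._registry: Dict[str, FeatureDefinition] = {}
        # Online store: entity id -> {feature name: latest value}
        self._online: Dict[str, Dict[str, Any]] = {}

    def register(self, feature: FeatureDefinition) -> None:
        """Add a feature definition so that others can find and reuse it."""
        self._registry[feature.name] = feature

    def discover(self, keyword: str) -> List[str]:
        """Return names of features whose name or description mentions keyword."""
        keyword = keyword.lower()
        return sorted(
            f.name
            for f in self._registry.values()
            if keyword in f"{f.name} {f.description}".lower()
        )

    def ingest(self, entity_id: str, raw_record: Dict[str, Any]) -> None:
        """Run every registered transformation on a raw record and store the results."""
        values = {name: f.transform(raw_record) for name, f in self._registry.items()}
        self._online.setdefault(entity_id, {}).update(values)

    def get_online_features(self, entity_id: str, names: List[str]) -> Dict[str, Any]:
        """Look up precomputed feature values for one entity at inference time."""
        stored = self._online[entity_id]
        return {name: stored[name] for name in names}


store = ToyFeatureStore()
store.register(FeatureDefinition(
    name="avg_fare",
    description="Average fare per trip",
    transform=lambda r: r["total_fare"] / r["trip_count"] if r["trip_count"] else 0.0,
))
store.register(FeatureDefinition(
    name="is_frequent_rider",
    description="True if the rider has taken 10 or more trips",
    transform=lambda r: r["trip_count"] >= 10,
))

# Raw data goes in once; feature values are computed and kept ready for serving.
store.ingest("rider_42", {"trip_count": 12, "total_fare": 180.0})

matches = store.discover("fare")
features = store.get_online_features("rider_42", ["avg_fare", "is_frequent_rider"])
```

In this sketch the transformations run once at ingestion time, so a lookup at inference time is just a dictionary read. That is a simplified stand-in for the low-latency access that real feature stores provide through dedicated online storage.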
In 2017, Uber changed the game with the introduction of Michelangelo, its internal machine learning platform, which included a feature store. In 2019, Gojek, in collaboration with Google Cloud, announced Feast, an open-source feature store.
The latest to join the bandwagon is Amazon SageMaker Feature Store, a fully managed, purpose-built repository for machine learning features. Airbnb, Twitter, Facebook, and Netflix are other major players with feature stores.
By taking over the most mundane yet time-intensive data tasks, feature stores allow data scientists to focus on model building and experimentation rather than on cleaning and managing data.