In March of this year, MIT Sloan Management Review made a sobering observation: most corporate data science projects are considered failures, with a shocking proportion of companies failing to achieve a significant ROI from them. A Gartner Inc. analyst sounded the alarm in 2017, VentureBeat reported an 87% failure rate in 2019, and Forbes reported 85.4% in 2020.
Despite advances in data science and machine learning (ML), despite the development of all kinds of data management software, and despite the hundreds of articles and videos on the Internet, why do production-ready machine learning models keep missing the mark?
People often attribute this to a lack of appropriate data science talent and to disorganized data. However, my business partner and co-founder Nirman Dave and I discussed this recently, and we believe there is something more complex at play. Three key factors prevent ML models from going into production:
1. Volume – The rate at which raw data is created
2. Cleansing – The ability to create an ML-ready dataset from raw data
3. Explainability – The ability to explain to everyday, non-technical business users how complex ML models reach their decisions
Let’s start with volume, one of the first major bottlenecks in putting ML models into production. We know that the rate of data collection is growing exponentially. With this growing amount of data, it becomes incredibly important to deliver insights in real time. However, by the time insights are generated, new raw data has already been collected, which can make the existing insights out of date.
On top of that comes data cleansing: organizing, cleaning, and manipulating data to make it ML-ready. Since data is spread across multiple storage solutions in different formats (e.g., spreadsheets, databases, CRMs), this step can be Herculean. A change as small as a new column in a table may require changes throughout the pipeline to accommodate it.
Even once the models have been built, explaining them becomes a challenge. Nobody likes to take orders from a computer unless the reasoning behind them is well explained.
Solving any of these problems can take an army, and many organizations do not have a data science team or the means to scale one up. It doesn’t have to be that way, however. Imagine if all of these problems could be eased simply by changing your choice of ML models. I call that the tiny model theory.
The tiny model theory is the idea that you don’t have to use heavyweight machine learning models to make simple, repetitive, everyday business predictions. In fact, by using lighter models (e.g., random forests, logistic regression), you can ease the bottlenecks above and reduce your time to market.
It’s often easy for engineers to reach for complicated deep neural networks to solve problems, but in my experience as CTO at one of the big AI startups in the Bay Area, most problems don’t require them. Small models can work very well instead, unlocking speed, reducing complexity, and increasing explainability.
Let’s start with speed. Because a significant portion of the project schedule is taken up by data preprocessing, data scientists have less time to experiment with different types of models. As a result, they are drawn to large models with complex architectures in the hope that these will be the miracle solution to their problems.
In most business use cases, like forecasting losses, earnings, or loan defaults, this just ends up increasing time to value, decreasing the return on time invested relative to the performance gained.
I find it similar to using a sledgehammer to crack a walnut, and this is where small models can shine. Small models like logistic regression can be trained simultaneously using distributed ML, which trains models in parallel across different servers in the cloud. Because their architectures lack complexity, small models require significantly less computing power to train and less storage space.
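As a rough illustration of that point, here is a minimal sketch of training several lightweight models in parallel with scikit-learn and joblib. The synthetic dataset, the four-way split, and the choice of logistic regression are my own illustrative assumptions; the same pattern scales out from local processes to separate servers in the cloud.

```python
# Minimal sketch: train several small models in parallel on partitions of the data.
# Data, partitioning, and model choice are illustrative assumptions.
import numpy as np
from joblib import Parallel, delayed
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy data standing in for a large business dataset.
X, y = make_classification(n_samples=100_000, n_features=20, random_state=0)

def train_partition(X_part, y_part):
    """Train one lightweight model on one partition of the data."""
    return LogisticRegression(max_iter=1000).fit(X_part, y_part)

# Split the data into chunks and train a logistic regression on each chunk in parallel.
chunks = zip(np.array_split(X, 4), np.array_split(y, 4))
models = Parallel(n_jobs=4)(delayed(train_partition)(Xc, yc) for Xc, yc in chunks)
print(f"Trained {len(models)} small models in parallel")
```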
That same simplicity makes them ideal candidates for distributed ML. Some of the biggest companies prefer simple models for their distributed ML pipelines, which include edge devices like IoT hardware and smartphones. Federated machine learning, which builds on distributed ML, is rapidly becoming popular today.
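To make the federated idea concrete, here is a heavily simplified sketch in which each "device" trains a small local model and only the learned weights are averaged centrally. The equal weighting, shared feature space, and synthetic data are simplifying assumptions, not how any particular federated framework works.

```python
# Heavily simplified sketch of the federated-averaging idea with small models.
# Assumptions: three "devices", equal weighting, a shared feature space, toy data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=30_000, n_features=10, random_state=1)

# Each "device" trains its own small model on the data it holds locally.
local_models = [
    LogisticRegression(max_iter=1000).fit(X_local, y_local)
    for X_local, y_local in zip(np.array_split(X, 3), np.array_split(y, 3))
]

# A central server averages only the model weights -- the raw data never leaves the devices.
global_coef = np.mean([m.coef_ for m in local_models], axis=0)
global_intercept = np.mean([m.intercept_ for m in local_models], axis=0)
print(global_coef.shape, global_intercept)
```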
The average data scientist can easily see how a simple model like a decision tree makes a prediction. A trained decision tree can be drawn out to show how individual features contribute to a prediction, which makes simple models more explainable.
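For example, here is a minimal sketch of that kind of explainability, using scikit-learn's built-in export_text to print a trained tree's decision rules as plain if/else statements (the iris dataset and depth limit are just illustrative choices):

```python
# Minimal sketch: read a trained decision tree's logic directly as text rules.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Train a small tree on the classic iris dataset (purely illustrative data).
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Print the learned rules, feature by feature, as readable if/else statements.
print(export_text(tree, feature_names=[
    "sepal length", "sepal width", "petal length", "petal width"
]))
```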
You can also use a number of simple trained models and average their predictions. Such an ensemble is often more accurate than a single complex model. Instead of putting all your eggs in one basket, a collection of simple models spreads the risk of a poorly performing ML model.
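A minimal sketch of that eggs-in-many-baskets idea, assuming a soft-voting ensemble of a few simple scikit-learn estimators over synthetic data:

```python
# Minimal sketch: average the predictions of several simple models instead of
# relying on one complex model. Estimators and data are illustrative choices.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5_000, n_features=20, random_state=0)

# Soft voting averages the predicted probabilities of the simple models.
ensemble = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(max_depth=5)),
        ("forest", RandomForestClassifier(n_estimators=100)),
    ],
    voting="soft",
)
print(cross_val_score(ensemble, X, y, cv=5).mean())
```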
Simple models are also much easier to implement today because they are more accessible. Models like logistic regression and random forests have been around much longer than neural networks, so they are better understood. Popular low-code ML libraries like scikit-learn have also lowered the barrier to entry, since an ML model can be instantiated with a single line of code.
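To show just how low that barrier is, a tiny sketch (again with illustrative synthetic data) where creating and training a model really is about one line:

```python
# Tiny sketch of scikit-learn's low barrier to entry; the data is illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1_000, n_features=10, random_state=0)

# Instantiating and training the model is a single line.
model = RandomForestClassifier().fit(X, y)
print(model.predict(X[:5]))
```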
Given the importance of AI in business strategy, the number of companies experimenting with AI will only increase. However, if companies want to gain a noticeable competitive advantage over others, I believe simple ML models are the way to go. This doesn’t mean complex models like neural networks are disappearing; they are still used for niche projects like facial recognition and cancer detection. But every business has choices to make, and for everyday predictions, simple models are the better option.