There is a reason businesses do not build their own cloud computing infrastructure. Last decade, IT infrastructure teams pursued private clouds because they believed they could do it cheaper and more effectively than the public cloud. Instead, these projects took longer and cost more to build than anticipated, required more resources to maintain, and offered fewer of the latest security and scaling capabilities than public clouds. Rather than investing in core business capabilities, these organizations spent significant time and resources on infrastructure that could not meet expanded business needs.
Many businesses are now taking the same do-it-yourself approach to MLOps, building custom solutions from open source tools such as Apache Spark. These efforts frequently result in model deployments that consume weeks or even months per model, inefficient runtimes (as measured by inferences delivered per unit of compute and time), and, most importantly, a lack of the observability required to test and monitor model accuracy over time. Such approaches are too specialized to provide scalable, repeatable processes across multiple use cases in the enterprise.
The Misdiagnosed Problem
Conversations with line-of-business leaders as well as chief data and analytics officers have revealed that organizations continue to hire more data scientists but aren’t seeing a return on that investment.
As we dug deeper and asked questions to identify the bottlenecks in their AI initiatives, these leaders quickly realized that the hindrance was actually the last mile: deploying models against live data, running them efficiently enough that compute costs do not exceed the gains, and then measuring their performance.
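The "last mile" concerns above, keeping inference cost below the value a model generates and measuring accuracy in production, can be made concrete with a minimal sketch. The class, method names, and the per-second compute price below are all illustrative assumptions, not a reference to any particular MLOps product:

```python
import time
from collections import deque


class ModelMonitor:
    """Illustrative sketch: track a deployed model's rolling accuracy
    and an estimated cost per inference. All names and the compute
    price are hypothetical."""

    def __init__(self, window=1000, cost_per_second=0.0001):
        # Recent correctness outcomes (True/False), bounded by `window`.
        self.outcomes = deque(maxlen=window)
        self.total_seconds = 0.0
        self.total_inferences = 0
        self.cost_per_second = cost_per_second  # assumed compute price

    def record(self, predict_fn, features, actual):
        """Run one prediction, timing it and logging whether it was correct."""
        start = time.perf_counter()
        prediction = predict_fn(features)
        self.total_seconds += time.perf_counter() - start
        self.total_inferences += 1
        self.outcomes.append(prediction == actual)
        return prediction

    def rolling_accuracy(self):
        """Fraction of correct predictions over the recent window."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def cost_per_inference(self):
        """Estimated compute cost per prediction, given the assumed price."""
        if not self.total_inferences:
            return None
        return self.total_seconds * self.cost_per_second / self.total_inferences
```

In practice this bookkeeping lives in a feature store, model registry, or observability platform rather than in application code, but the two quantities being tracked, accuracy over time and cost per inference, are the same ones described above.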
Data scientists are experts at transforming data into models that solve business problems and inform business decisions. However, the expertise and skills needed to build great models are not the same as those required to deploy those models in the real world with production-ready code, then monitor and upgrade them on an ongoing basis.
This is where machine learning engineers come in. ML engineers are responsible for integrating tools and frameworks so that the data, data pipelines, and key infrastructure work together to scale ML models.
What is the action plan? Recruit more ML engineers?
Even with the best ML engineers, enterprises face two major challenges when it comes to scaling AI:
- Failure to hire ML engineers quickly enough
Job openings for ML engineers have grown 30 times faster than IT services overall. Instead of waiting months or even years for these roles to be filled, MLOps teams must find a way to support more ML models and use cases without growing ML engineering headcount linearly. Doing so, however, runs into the second challenge.
- The absence of a repeatable, scalable process for deploying models regardless of where or how they were built
The reality of today’s enterprise data ecosystem is that different business units use different data platforms, based on the data and technology requirements of their use cases (for instance, the product team may need to support streaming data, whereas finance may require a simple querying interface for non-technical users).
Furthermore, instead of being a centralized practice, data science is frequently dispersed across business units. Each data science team has its own approved model training framework suited to the use cases it is solving, which means a one-size-fits-all training framework for the entire enterprise may not be feasible.
How to Maximize the Value of AI
Enterprises have invested billions of dollars in AI in the hopes of increasing automation, customizing the customer experience at scale, or providing more accurate and granular predictions. However, there has been a significant gap between AI’s promise and its outcomes thus far, with only about 10% of AI investments yielding remarkable ROI.
To solve the MLOps problem, Chief Data & Analytics Officers must build core business capabilities around data science while investing in technologies that automate the rest of MLOps. Yes, this is the age-old “build vs. buy” debate, but this time the right metric is not just OpEx cost; it is how swiftly and efficiently your AI investments spread throughout the enterprise, whether by generating new revenue through better products and customer segmentation or by reducing costs through greater automation and less waste.