Artificial Intelligence Pipelines With IBM CodeFlare

To say that AI is complicated is an understatement. Machine learning, a subset of artificial intelligence, is a multifaceted process that integrates and scales mountains of data in various forms from various sources. The data is used to train machine learning models to develop insights and solutions from newly collected related data. For example, an image recognition model trained on several million photos of dogs and cats can efficiently classify a new image as a cat or a dog.

Developing machine learning models requires coordinating many processes, which are organized into pipelines. Pipelines handle the ingestion, cleaning, and manipulation of data from various sources for training and inference. Machine learning models rely on end-to-end pipelines to manage the collection and processing of input and output data.
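As a rough illustration of what such a pipeline looks like in code, the sketch below chains three stages (ingest, clean, scale) in plain Python. The stage names, the `label,value` input format, and the `run_pipeline` helper are all assumptions made for this example; none of them comes from CodeFlare.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
Stage = Callable[[list], list]


def ingest(raw_lines: Iterable[str]) -> List[Record]:
    """Parse 'label,value' text lines into records (hypothetical input format)."""
    records = []
    for line in raw_lines:
        label, _, value = line.partition(",")
        records.append({"label": label.strip(), "value": value.strip()})
    return records


def clean(records: List[Record]) -> List[Record]:
    """Drop records with an empty label or a value that is not a number."""
    cleaned = []
    for record in records:
        if not record["label"]:
            continue
        try:
            value = float(record["value"])
        except ValueError:
            continue
        cleaned.append({"label": record["label"], "value": value})
    return cleaned


def scale(records: List[Record]) -> List[Record]:
    """Divide every value by the largest absolute value in the batch."""
    if not records:
        return []
    peak = max(abs(r["value"]) for r in records)
    if peak == 0:
        return records
    return [{**r, "value": r["value"] / peak} for r in records]


def run_pipeline(data, stages: List[Stage]):
    """Feed the output of each stage into the next one, in order."""
    for stage in stages:
        data = stage(data)
    return data


raw = ["cat, 2", "dog, 4", ", 3", "dog, n/a"]
result = run_pipeline(raw, [ingest, clean, scale])
print(result)
```

Here the two malformed lines are dropped by `clean`, and the remaining values are rescaled by `scale`. Keeping each stage as a plain function is one simple design choice that makes stages easy to reorder or swap.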

To accommodate the extraordinary growth of AI and its increasing complexity, IBM created an open source framework called CodeFlare to address the complex needs of AI pipelines. CodeFlare simplifies the integration, scaling, and acceleration of complex, multi-step machine learning and analytics pipelines in the cloud. Cloud deployment is one of CodeFlare's critical design points: with OpenShift, it can be deployed anywhere from on-premises environments to public clouds to the edge.

It is important to note that CodeFlare is not currently a generally available product, and IBM has not yet committed to a timeline for turning it into one. However, CodeFlare is available as an open source project, and because it is still evolving, some aspects of orchestration and automation are still in the works. At this stage, issues can be reported through the public GitHub project. IBM invites the community to participate through issue and bug reports, which are handled on a best-effort basis.

The main features of CodeFlare are:

  • Pipeline Scaling and Execution: CodeFlare Pipelines make it easy to define and run parallel pipelines. They unify pipeline workflows across multiple frameworks while offering near-optimal scale-out parallelism for pipelined computations.
  • Deploy and Integrate Anywhere: CodeFlare simplifies deployment and integration by enabling a serverless user experience with Red Hat OpenShift and IBM Cloud Code Engine, and by providing adapters and connectors to make it easy to load data and connect to data services.
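The idea of running independent pipelines in parallel and then collecting their results can be sketched with Python's standard library. This is only a minimal fan-out and join illustration under stated assumptions: `run_parallel` and the branch names are hypothetical, a thread pool stands in for a distributed runtime, and nothing here is CodeFlare's own API.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, median
from typing import Callable, Dict, List


def run_parallel(
    branches: Dict[str, Callable[[List[float]], float]],
    data: List[float],
    max_workers: int = 4,
) -> Dict[str, float]:
    """Submit every branch on the same input, then wait for all results.

    Each branch stands in for one independent pipeline. Threads are used
    only to show the fan-out/join structure; a real deployment would hand
    the branches to a distributed runtime instead.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Fan out: one future per branch, all started before any is awaited.
        futures = {name: pool.submit(func, data) for name, func in branches.items()}
        # Join: block until each branch finishes and collect results by name.
        return {name: future.result() for name, future in futures.items()}


# Three toy "pipelines" that each summarize the same data differently.
branches = {"mean": mean, "median": median, "max": max}
data = [3, 1, 4, 1, 5]
results = run_parallel(branches, data)
print(results)
```

Because the branches share no state, they can be started in any order and joined at the end, which is the property that lets a scheduler spread such work across many machines.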

Technology

CodeFlare is based on Ray, an open source distributed computing framework for machine learning applications. According to IBM, CodeFlare expands Ray’s capabilities by adding specific elements to make it easier to scale workflows.

IBM Cloud Code Engine and Red Hat OpenShift: These platforms give CodeFlare the flexibility to deploy anywhere.

Emerging Workflows

CodeFlare can integrate emerging workflows with complex pipelines that require the integration and coordination of different tools and runtimes. It is also designed to scale complex pipelines, such as multi-step NLP, complex time series and forecasting, reinforcement learning, and AI workbenches, as well as heterogeneous pipelines that use data from multiple sources and require different kinds of processing.

What difference does CodeFlare make?

According to the IBM Research Blog, CodeFlare significantly increases the efficiency of machine learning. The blog claims that one user applied the framework to analyze and optimize approximately 100,000 pipelines for training machine learning models. CodeFlare reportedly cut the time it took to run each pipeline from 4 hours to 15 minutes, described as an 18x speedup (the rounded times quoted here work out to 16x).

The research blog also suggests that CodeFlare can save scientists months of work on large pipelines and give data teams more time to develop and be productive.

Bottom Line

Studies show that despite high investments in artificial intelligence, around 75% of prototype machine learning models do not transition to production. The reasons for this low conversion rate range from poor project planning to poor collaboration and communication among members of the AI data team.

CodeFlare is a purpose-built platform that provides complete end-to-end pipeline visibility and analysis for a wide variety of machine learning models and workflows. It offers a simpler way of integrating and scaling entire pipelines and at the same time offers a uniform runtime and programming interface.

For these reasons, despite historically high failure rates for AI models, Moor Insights and Strategy believes that a high percentage of machine learning models using CodeFlare pipelines will transition from experimental to production status.

Analyst Notes:

  • IBM hopes to improve CodeFlare to support increasingly complex pipelines.
  • Future development plans are expected to include increased fault tolerance and support for pipeline visualization.
  • IBM has made CodeFlare available in the project's GitHub repository. There are also examples that run on IBM Cloud and Red Hat OpenShift.
