Multiple-Model Machine Learning

An ensemble learning method involves combining the predictions from multiple contributing models.

Nevertheless, not all techniques that make use of multiple machine learning models are ensemble learning algorithms.

It is common to divide a prediction problem into subproblems. For example, some problems naturally subdivide into independent but related subproblems, and a machine learning model can be prepared for each. It is less clear whether these represent examples of ensemble learning, although we might distinguish these methods from ensembles given that no contributing member can produce a solution (however weak) to the overall prediction problem on its own.

In this tutorial, you will discover multiple-model techniques for machine learning and their relationship to ensemble learning.

After completing this tutorial, you will know:

  • Multiple-model machine learning refers to techniques that use multiple models in some way that closely resembles ensemble learning.
  • The use of multiple models for multi-class classification and multi-output regression differs from ensembles in that no contributing member can solve the problem alone.
  • Mixture of experts might be considered a true ensemble method, although hybrid machine learning models are probably not ensemble learning methods.

Kick-start your project with my new book Ensemble Learning Algorithms With Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

Tutorial Overview

This tutorial is divided into five parts; they are:

  1. Multiple-Model Techniques
  2. Multiple Models for Multi-Class Classification
  3. Multiple Models for Multi-Output Regression
  4. Multiple Expert Models
  5. Hybrids Constructed From Multiple Models

Multiple-Model Techniques

Ensemble learning is concerned with approaches that combine predictions from two or more models.

We can characterize a model as an ensemble learning technique if it has two properties:

  • It comprises two or more models.
  • The predictions of those models are combined.

We might also suggest that the goal of an ensemble model is to improve predictions over any contributing member, although a lesser goal might be to improve the stability of the model, e.g. to reduce the variance in the predictions or prediction errors.

Nevertheless, there are models and model architectures that contain elements of ensemble learning methods, but it is not clear whether they can be considered ensemble learning.

For example, we might define an ensemble learning technique as being composed of two or more models. The problem is that there may be techniques that use two or more models yet do not combine their predictions, or that combine their predictions in unexpected ways.

There are some methods which try to make use of multiple learners, yet in a strict sense they can not be recognized as ensemble combination methods.

— Page 89, Ensemble Methods, 2012.

For lack of a better name, we will refer to these as “multiple-model techniques” to help differentiate them from ensemble learning methods. Yet, as we will see, the line that separates these two types of machine learning methods is not that clear.

  • Multiple-Model Techniques: Machine learning algorithms that are composed of multiple models and combine them in some way, but might not be considered ensemble learning.

As such, it is important to review and explore multiple-model techniques that sit on the border of ensemble learning to both better understand ensemble learning and to draw upon related ideas that may improve the ensemble learning models we create.

There are predictive modeling problems where the structure of the problem itself may suggest the use of multiple models.

Typically, these are problems that can be divided naturally into sub-problems. This does not mean that dividing the problems into subproblems is the best solution for a given example; it only means that the problem naturally lends itself to decomposition.

Two examples are multi-class classification and multiple-output regression.

Multiple Models for Multi-Class Classification

Classification problems involve assigning a class label to input examples.

Binary classification tasks are those that have two classes. One decision is made for each example, assigning it to one class or the other. If modeled using probability, a single probability of the example belonging to one class is predicted, and the complement is the probability for the second class; this is called a binomial probability distribution.
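
As a quick illustration, a minimal sketch of this on a made-up synthetic dataset (an assumption for demonstration only), where a logistic regression model predicts a single probability per example and the second class takes the complement:

```python
# minimal sketch of a binomial probability prediction on a synthetic binary task
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=10, random_state=1)
model = LogisticRegression().fit(X, y)
# each row holds [P(class 0), P(class 1)]; the two columns sum to one
print(model.predict_proba(X[:3]))
```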

More than two classes can introduce a challenge. The techniques designed for two classes can be extended to multiple classes, and sometimes, this is straightforward.

  • Multi-Class Classification: Assign one among more than two class labels to a given input example.

Alternatively, the problem can be naturally partitioned into multiple binary classification tasks.

There are many ways this can be achieved.

For example, the classes can be grouped into multiple one-vs-rest prediction problems. A model can then be fit for each subproblem, and typically the same algorithm type is used for each model. When a prediction is required for a new example, the model that responds most strongly is used to assign the prediction. This is called a one-vs-rest (OvR) or one-vs-all (OvA) approach.

  • OvR: A technique that splits a multi-class classification into one binary classification problem per class.
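
A minimal sketch of OvR on a synthetic dataset (the dataset and the choice of logistic regression as the base model are assumptions for illustration), using scikit-learn's OneVsRestClassifier wrapper:

```python
# one-vs-rest: one binary logistic regression per class, winner-take-all prediction
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_classes=3, random_state=1)
model = OneVsRestClassifier(LogisticRegression()).fit(X, y)
print(len(model.estimators_))  # one binary model per class: 3
print(model.predict(X[:5]))
```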

The multi-class classification problem can be divided into multiple pairs of classes, and a model fit on each pair. Again, a prediction for a new example can be chosen from the model that responds most strongly. This is referred to as one-vs-one (OvO).

  • OvO: A technique that splits a multi-class classification into one binary classification problem per pair of classes.
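
A minimal sketch of OvO on a synthetic dataset (dataset and base model are assumptions), using scikit-learn's OneVsOneClassifier wrapper around a support vector machine:

```python
# one-vs-one: one binary model per pair of classes
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_classes=3, random_state=1)
model = OneVsOneClassifier(SVC()).fit(X, y)
print(len(model.estimators_))  # n_classes * (n_classes - 1) / 2 = 3 binary models
print(model.predict(X[:5]))
```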

This approach of partitioning a multi-class classification problem into multiple binary classification problems can be generalized. Each class can be mapped to a unique binary string (codeword) of arbitrary length. One classifier can then be fit to predict each bit in the bit string, allowing an arbitrary number of classifiers to be used.

The bit string can then be mapped to the class label with the closest match. The additional bits act like error-correcting codes, improving the performance of the approach in some cases over simpler OvR and OvO methods. This approach is referred to as Error-Correcting Output Codes, ECOC.

  • ECOC: A technique that splits a multi-class classification into an arbitrary number of binary classification problems.
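
A minimal sketch of ECOC on a synthetic dataset (dataset and base model are assumptions), using scikit-learn's OutputCodeClassifier, where the code_size argument controls how many classifiers are used per class:

```python
# error-correcting output codes: one binary classifier per bit of each class codeword
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OutputCodeClassifier

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_classes=3, random_state=1)
model = OutputCodeClassifier(LogisticRegression(), code_size=2, random_state=1).fit(X, y)
print(len(model.estimators_))  # code_size * n_classes = 6 binary models
print(model.predict(X[:5]))
```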

In each of these cases, multiple models are used, just like an ensemble. Predictions are also combined, like an ensemble method, although in a winner-take-all manner rather than via a vote or weighted sum. Technically, this is a combination method, although it is unlike most typical ensemble learning methods.

Unlike ensemble learning, these techniques are designed to explore the natural decomposition of the prediction problem and leverage binary classification problems that may not easily scale to multiple classes.

Ensemble learning, in contrast, is not concerned with opening up new capabilities and is typically focused only on improved predictive performance over the contributing models. With techniques like OvR, OvO, and ECOC, the contributing models cannot be used to address the prediction problem in isolation, by definition.

Multiple Models for Multi-Output Regression

Regression problems involve predicting a numerical value given an input example.

Typically, a single output value is predicted. Nevertheless, there are regression problems where multiple numeric values must be predicted for each input example. These problems are referred to as multiple-output regression problems.

  • Multiple-Output Regression: Predict two or more numeric outputs given an input.

Models can be developed to predict all target values at once, although a multi-output regression problem is another example of a problem that can be naturally divided into subproblems.

Like binary classification in the previous section, most techniques for regression predictive modeling were designed to predict a single value. Predicting multiple values can pose a problem and requires the modification of the technique. Some techniques cannot be reasonably modified for multiple values.

One approach is to develop a separate regression model to predict each target value in a multi-output regression problem. Typically, the same algorithm type is used for each model. For example, a multi-output regression with three target values would involve fitting three models, one for each target.

When a prediction is required, the same input pattern is provided to each model, each model predicts its specific target, and together the predictions represent the vector output of the method.

  • Multi-Output Regression: A technique where one regression model is used for each target in a multi-output regression problem.
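
A minimal sketch of this approach on a synthetic problem (the dataset and linear regression base model are assumptions), using scikit-learn's MultiOutputRegressor wrapper to fit one model per target:

```python
# one separate linear regression per target value
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor

X, y = make_regression(n_samples=1000, n_features=10, n_targets=3, random_state=1)
model = MultiOutputRegressor(LinearRegression()).fit(X, y)
print(model.predict(X[:2]))  # each prediction is a vector of three values
```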

Another related approach is to create a sequential chain of regression models. The difference is that the first model in the chain predicts the first output target value, and this value is used as part of the input to the second model in the chain in order to predict the second output target value, and so on.

As such, the chain introduces a linear dependence between the regression models, allowing the outputs of models later in the chain to be conditional on the outputs of prior models in the chain.

  • Regression Chain: A technique where a sequential chain of regression models is used to predict each target in a multi-output regression problem, where models later in the chain use the values predicted by models earlier in the chain.
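
A minimal sketch of a regression chain on a synthetic problem (dataset and base model are assumptions), using scikit-learn's RegressorChain, where the order argument sets the sequence of targets along the chain:

```python
# chained regression: later models also receive earlier predicted targets as input
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import RegressorChain

X, y = make_regression(n_samples=1000, n_features=10, n_targets=3, random_state=1)
model = RegressorChain(LinearRegression(), order=[0, 1, 2]).fit(X, y)
print(model.predict(X[:2]))  # each prediction is a vector of three values
```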

In each case, multiple regression models are used, just like an ensemble.

A possible difference from ensembles is that the predictions made by each model are not combined directly. We could stretch the definition of “combining predictions” to cover this approach, however. For example, the predictions are concatenated in the case of separate multi-output regression models and are combined indirectly via the conditional approach in chained regression.

The key difference from ensemble learning methods is that no contributing ensemble member can solve the prediction problem alone. A solution can only be achieved by combining the predictions from all members.

Multiple Expert Models

So far, we have looked at dividing problems into subtasks based on the structure of what is being predicted.

There are also problems that can be naturally divided into subproblems based on the input data. This might be as simple as partitions of the input feature space, or something more elaborate, such as dividing an image into the foreground and background and developing a model for each.

A more general approach for this from the field of neural networks is referred to as a mixture of experts (MoE).

The approach involves first dividing the learning task into subtasks, developing an expert model for each subtask, using a gating model to decide or learn which expert to use for each example, and pooling the outputs of the experts and the gating model together to make a final prediction.

  • MoE: A technique that develops an expert model for each subtask and learns how much to trust each expert when making a prediction for specific examples.

Two aspects of MoE make the method unique. The first is the explicit partitioning of the input feature space, and the second is the use of a gating network or gating model that learns which expert to trust in each situation, e.g. for each input case.
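
There is no single standard implementation of MoE, but the following is a minimal sketch of one possible design on a synthetic regression problem (the use of k-means for partitioning, linear experts, and a logistic regression gate are all assumptions for illustration): the input space is partitioned with k-means, one expert is fit per partition, and the gating model weights the experts for each input.

```python
# minimal mixture-of-experts sketch: partition the inputs, fit one expert per
# partition, and weight the experts with a learned gating model
from numpy import zeros
from sklearn.datasets import make_regression
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression, LogisticRegression

X, y = make_regression(n_samples=1000, n_features=5, noise=0.5, random_state=1)
# divide the input space into subtasks
k = 3
clusters = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(X)
# fit one expert per subtask
experts = [LinearRegression().fit(X[clusters == i], y[clusters == i]) for i in range(k)]
# fit the gating model to predict how much to trust each expert for a given input
gate = LogisticRegression(max_iter=1000).fit(X, clusters)
# combine the experts via a weighted sum of their predictions
weights = gate.predict_proba(X)  # shape (n_samples, k)
yhat = zeros(len(X))
for i, expert in enumerate(experts):
    yhat += weights[:, i] * expert.predict(X)
print(yhat[:5])
```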

Unlike the previous examples for multi-class classification and multi-output regression that divided the target into subproblems, the contributing members in a mixture of experts model can address the whole problem, at least to some degree. Although an expert may not be tailored to a specific input, it can still be used to make a prediction on examples outside of its area of expertise.

Also, unlike those previously reviewed methods, a mixture of experts combines the predictions of all contributing members using a weighted sum, with the weights determined by the gating network.

As such, it more closely resembles more familiar ensemble learning techniques, such as stacked generalization, known as stacking.

Hybrids Constructed From Multiple Models

Another type of machine learning that involves the use of multiple models and is loosely related to ensemble learning is hybrid models.

Hybrid models are those models that combine two or more models explicitly. As such, the definition of what does and does not constitute a hybrid model can be vague.

  • Hybrid Model: A technique that combines two or more different machine learning models in some way.

For example, an autoencoder neural network that learns how to compress input patterns to a bottleneck layer, the output of which is then fed to another model, such as a support vector machine, would be considered a hybrid machine learning model.

This example has two machine learning models, a neural network and a support vector machine. It just so happens that the models are linearly stacked one on top of another into a pipeline and the last model in the pipeline makes a prediction.
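
A rough sketch of this kind of hybrid, assuming a small MLPRegressor trained to reconstruct its own input stands in for the autoencoder (this choice, along with the synthetic dataset, is an assumption for illustration), with the bottleneck activations then fed to a support vector machine:

```python
# hybrid sketch: neural-network autoencoder feeding a support vector machine
from numpy import maximum
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, random_state=1)
# autoencoder: train the network to reproduce its own input through a bottleneck
autoencoder = MLPRegressor(hidden_layer_sizes=(5,), max_iter=2000, random_state=1)
autoencoder.fit(X, X)
# encode: compute the bottleneck activations (relu is the default hidden activation)
encoded = maximum(0, X.dot(autoencoder.coefs_[0]) + autoencoder.intercepts_[0])
# feed the compressed representation to a second, different model type
svm = SVC().fit(encoded, y)
print(svm.score(encoded, y))
```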

Consider an ensemble learning method that has multiple contributing ensemble members of different types (e.g. a logistic regression and a support vector machine) and uses voting to average their predictions. This ensemble too might be considered a hybrid machine learning model, under the broader definition.
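
For comparison, a minimal sketch of such a heterogeneous voting ensemble on a synthetic dataset (dataset and member models are assumptions), using scikit-learn's VotingClassifier with soft voting to average the predicted probabilities:

```python
# heterogeneous voting ensemble of a logistic regression and a support vector machine
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=10, random_state=1)
ensemble = VotingClassifier(
    estimators=[('lr', LogisticRegression()), ('svm', SVC(probability=True))],
    voting='soft')  # average the predicted probabilities across members
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))
```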

Perhaps the key difference between ensemble learning and hybrid machine learning is the requirement in hybrid models to use models of differing types, whereas in ensemble learning, the contributing members may be of any type.

Further, it is more likely that a hybrid machine learning model will graft one or more models onto another base model, which is quite different from fitting separate models and combining their predictions as we do in ensemble learning.
