Calculating the Bias-Variance Trade-off with Python

The performance of a machine learning model can be characterized in terms of the bias and the variance of the model.

A model with high bias makes strong assumptions about the form of the unknown underlying function that maps inputs to outputs in the dataset, such as linear regression. A model with high variance is highly dependent upon the specifics of the training dataset, such as unpruned decision trees. We desire models with low bias and low variance, although there is often a trade-off between these two concerns.

The bias-variance trade-off is a useful conceptualization for selecting and configuring models, although generally cannot be computed directly as it requires full knowledge of the problem domain, which we do not have. Nevertheless, in some cases, we can estimate the error of a model and divide the error down into bias and variance components, which may provide insight into a given model’s behavior.

In this tutorial, you will discover how to calculate the bias and variance for a machine learning model.

After completing this tutorial, you will know:

  • Model error consists of model variance, model bias, and irreducible error.
  • We seek models with low bias and variance, although typically reducing one results in a rise in the other.
  • How to decompose mean squared error into model bias and variance terms.

Let’s get started.

Tutorial Overview

This tutorial is divided into three parts; they are:

  1. Bias, Variance, and Irreducible Error
  2. Bias-Variance Trade-off
  3. Calculate the Bias and Variance

Bias, Variance, and Irreducible Error

Consider a machine learning model that makes predictions for a predictive modeling task, such as regression or classification.

The performance of the model on the task can be described in terms of the prediction error on all examples not used to train the model. We will refer to this as the model error.

  • Error(Model)

The model error can be decomposed into three sources of error: the variance of the model, the bias of the model, and the variance of the irreducible error in the data.

  • Error(Model) = Variance(Model) + Bias(Model) + Variance(Irreducible Error)

Let’s take a closer look at each of these three terms.

Model Bias

The bias is a measure of how close the model can capture the mapping function between inputs and outputs.

It captures the rigidity of the model: the strength of the assumption the model has about the functional form of the mapping between inputs and outputs.

This reflects how close the functional form of the model can get to the true relationship between the predictors and the outcome.

— Page 97, Applied Predictive Modeling, 2013.

A model with high bias is helpful when the bias matches the true but unknown underlying mapping function for the predictive modeling problem. Yet, a model with a large bias will be completely useless when the functional form for the problem is mismatched with the assumptions of the model, e.g. assuming a linear relationship for data with a high non-linear relationship.

  • Low Bias: Weak assumptions regarding the functional form of the mapping of inputs to outputs.
  • High Bias: Strong assumptions regarding the functional form of the mapping of inputs to outputs.

The bias is always positive.

Model Variance

The variance of the model is the amount the performance of the model changes when it is fit on different training data.

It captures the impact of the specifics the data has on the model.

Variance refers to the amount by which [the model] would change if we estimated it using a different training data set.

— Page 34, An Introduction to Statistical Learning with Applications in R, 2014.

A model with high variance will change a lot with small changes to the training dataset. Conversely, a model with low variance will change little with small or even large changes to the training dataset.

  • Low Variance: Small changes to the model with changes to the training dataset.
  • High Variance: Large changes to the model with changes to the training dataset.

The variance is always positive.

Irreducible Error

On the whole, the error of a model consists of reducible error and irreducible error.

  • Model Error = Reducible Error + Irreducible Error

The reducible error is the element that we can improve. It is the quantity that we reduce when the model is learning on a training dataset and we try to get this number as close to zero as possible.

The irreducible error is the error that we can not remove with our model, or with any model.

The error is caused by elements outside our control, such as statistical noise in the observations.

… usually called “irreducible noise” and cannot be eliminated by modeling.

— Page 97, Applied Predictive Modeling, 2013.

As such, although we may be able to squash the reducible error to a very small value close to zero, or even zero in some cases, we will also have some irreducible error. It defines a lower bound in performance on a problem.

It is important to keep in mind that the irreducible error will always provide an upper bound on the accuracy of our prediction for Y. This bound is almost always unknown in practice.

— Page 19, An Introduction to Statistical Learning with Applications in R, 2014.

It is a reminder that no model is perfect.

Bias-Variance Trade-off

The bias and the variance of a model’s performance are connected.

Ideally, we would prefer a model with low bias and low variance, although in practice, this is very challenging. In fact, this could be described as the goal of applied machine learning for a given predictive modeling problem,

Reducing the bias can easily be achieved by increasing the variance. Conversely, reducing the variance can easily be achieved by increasing the bias.

This is referred to as a trade-off because it is easy to obtain a method with extremely low bias but high variance […] or a method with very low variance but high bias …

— Page 36, An Introduction to Statistical Learning with Applications in R, 2014.

This relationship is generally referred to as the bias-variance trade-off. It is a conceptual framework for thinking about how to choose models and model configuration.

We can choose a model based on its bias or variance. Simple models, such as linear regression and logistic regression, generally have a high bias and a low variance. Complex models, such as random forest, generally have a low bias but a high variance.

We may also choose model configurations based on their effect on the bias and variance of the model. The k hyperparameter in k-nearest neighbors controls the bias-variance trade-off. Small values, such as k=1, result in a low bias and a high variance, whereas large k values, such as k=21, result in a high bias and a low variance.

High bias is not always bad, nor is high variance, but they can lead to poor results.

We often must test a suite of different models and model configurations in order to discover what works best for a given dataset. A model with a large bias may be too rigid and underfit the problem. Conversely, a large variance may overfit the problem.

We may decide to increase the bias or the variance as long as it decreases the overall estimate of model error.

Calculate the Bias and Variance

I get this question all the time:

How can I calculate the bias-variance trade-off for my algorithm on my dataset?

Technically, we cannot perform this calculation.

We cannot calculate the actual bias and variance for a predictive modeling problem.

This is because we do not know the true mapping function for a predictive modeling problem.

Instead, we use the bias, variance, irreducible error, and the bias-variance trade-off as tools to help select models, configure models, and interpret results.

In a real-life situation in which f is unobserved, it is generally not possible to explicitly compute the test MSE, bias, or variance for a statistical learning method. Nevertheless, one should always keep the bias-variance trade-off in mind.

— Page 36, An Introduction to Statistical Learning with Applications in R, 2014.

Even though the bias-variance trade-off is a conceptual tool, we can estimate it in some cases.

The mlxtend library by Sebastian Raschka provides the bias_variance_decomp() function that can estimate the bias and variance for a model over multiple bootstrap samples.

First, you must install the mlxtend library; for example:

The example below loads the Boston housing dataset directly via URL, splits it into train and test sets, then estimates the mean squared error (MSE) for a linear regression as well as the bias and variance for the model error over 200 bootstrap samples.

Running the example reports the estimated error as well as the estimated bias and variance for the model error.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, we can see that the model has a high bias and a low variance. This is to be expected given that we are using a linear regression model. We can also see that the sum of the estimated mean and variance equals the estimated error of the model, e.g. 20.726 + 1.761 = 22.487.

Summary

In this tutorial, you discovered how to calculate the bias and variance for a machine learning model.

Specifically, you learned:

  • Model error consists of model variance, model bias, and irreducible error.
  • We seek models with low bias and variance, although typically reducing one results in a rise in the other.
  • How to decompose mean squared error into model bias and variance terms.

This article has been pubished from the source link without modifications to the text. Only the headline has been changed.

Source link