Upcoming changes to the scikit-learn machine learning library are reported as FutureWarning messages when your code is run.
Warning messages can be confusing to beginners as it looks like there is a problem with the code or that they have done something wrong. Warning messages are also not good for operational code as they can obscure errors and program output.
There are many ways to handle a warning message, including ignoring the message, suppressing warnings, and fixing the code.
In this tutorial, you will discover FutureWarning messages in the scikit-learn API and how to handle them in your own machine learning projects.
After completing this tutorial, you will know:
- FutureWarning messages are designed to inform you about upcoming changes to default values for arguments in the scikit-learn API.
- FutureWarning messages can be ignored or suppressed as they do not halt the execution of your program.
- Examples of FutureWarning messages and how to interpret the message and change your code to address the upcoming change.
Discover how to prepare data with pandas, fit and evaluate models with scikit-learn, and more in my new book, with 16 step-by-step tutorials, 3 projects, and full Python code.
Let’s get started.
Tutorial Overview
This tutorial is divided into four parts; they are:
- Problem of FutureWarnings
- How to Suppress FutureWarnings
- How to Fix FutureWarnings
- FutureWarning Recommendations
Problem of FutureWarnings
The scikit-learn library is an open-source library that offers tools for data preparation and machine learning algorithms.
It is a widely used and constantly updated library.
Like many actively maintained software libraries, the APIs often change over time. This may be because better practices are discovered or preferred usage patterns change.
Most functions available in the scikit-learn API have one or more arguments that let you customize the behavior of the function. Many arguments have sensible defaults so that you don’t have to specify a value for the arguments. This is particularly helpful when you are starting out with machine learning or with scikit-learn and you don’t know what impact each of the arguments has.
Changes to the scikit-learn API over time often come in the form of changes to the sensible default values of function arguments. Changes of this type are often not performed immediately; instead, they are planned.
For example, if your code was written for a prior version of the scikit-learn library and relies on a default value for a function argument and a subsequent version of the API plans to change this default value, then the API will alert you to the upcoming change.
This alert comes in the form of a warning message each time your code is run. Specifically, a “FutureWarning” is reported on standard error (e.g. on the command line).
This is a useful feature of the API and the project, designed for your benefit. It gives you time to prepare your code for the next major release of the library: either retain the old behavior by specifying a value for the argument, or adopt the new behavior, which will require no change to your code.
A Python script that reports warnings when it runs can be frustrating.
- For a beginner, it may feel like the code is not working correctly, that perhaps you have done something wrong.
- For a professional, it is a sign of a program that requires updating.
In either case, warning messages may obscure real error messages or output from the program.
How to Suppress FutureWarnings
Warning messages are not error messages.
As such, a warning message reported by your program, such as a FutureWarning, will not halt the execution of your program. The warning message will be reported and the program will carry on executing.
You can, therefore, ignore the warning each time your code is executed, if you wish.
It is also possible to programmatically ignore the warning messages. This can be done by suppressing warning messages when your program is run.
This can be achieved by explicitly configuring the Python warning system to ignore warning messages of a specific type, such as ignoring all FutureWarnings, or, more generally, to ignore all warnings.
This can be achieved by adding the following block around your code that you know will generate warnings:
# run block of code and catch warnings
import warnings

with warnings.catch_warnings():
    # ignore all caught warnings
    warnings.filterwarnings("ignore")
    # execute code that will generate warnings
    ...
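If you only want to silence FutureWarnings and still see any other warnings your code may raise, you can restrict the filter to that category within the same block. A minimal sketch, assuming the same structure as above:

# run block of code and suppress only FutureWarnings
import warnings

with warnings.catch_warnings():
    # ignore FutureWarning messages only, leaving other warnings visible
    warnings.filterwarnings('ignore', category=FutureWarning)
    # execute code that will generate warnings
    ...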
Or, if you have a very simple flat script (no functions or blocks), you can suppress all FutureWarnings by adding two lines to the top of your file:
# import warnings filter
from warnings import simplefilter
# ignore all future warnings
simplefilter(action='ignore', category=FutureWarning)
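Warning filters can also be set from the command line when the script is run, without editing the script at all. For example (script.py here is a placeholder for your own file name):

python -W ignore::FutureWarning script.py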
To learn more about suppressing warnings in Python, see the warnings module in the Python standard library documentation.
How to Fix FutureWarnings
Alternately, you can change your code to address the reported change to the scikit-learn API.
Typically, the warning message itself will instruct you on the nature of the change and how to change your code to address the warning.
Nevertheless, let’s look at a few recent examples of FutureWarnings that you may encounter and be struggling with.
The examples in this section were developed with scikit-learn version 0.20.2. You can check your scikit-learn version by running the following code:
# check scikit-learn version
import sklearn
print('sklearn: %s' % sklearn.__version__)
You will see output like the following:
sklearn: 0.20.2
As new versions of scikit-learn are released over time, the nature of the warning messages reported will change and new defaults will be adopted.
As such, although the examples below are specific to a version of scikit-learn, the approach to diagnosing and addressing each API change applies generally and provides a good template for handling future changes.
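One way to track down exactly which call triggers a given warning is to temporarily promote FutureWarnings to errors so that the offending line shows up in a traceback. A minimal sketch for debugging only, not something to leave in production code:

# temporarily turn FutureWarnings into exceptions to locate their source
import warnings
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
warnings.simplefilter('error', FutureWarning)
# prepare dataset
X, y = make_blobs(n_samples=100, centers=2, n_features=2)
# under scikit-learn 0.20, this call raises the FutureWarning as an
# exception and the traceback points at the offending line
model = LogisticRegression()
model.fit(X, y)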
FutureWarning for LogisticRegression
The LogisticRegression algorithm has two recent changes to the default argument values that result in FutureWarning messages.
The first has to do with the solver for finding coefficients and the second has to do with how the model should be used to make multi-class classifications. Let’s look at each with code examples.
Changes to the Solver
The example below will generate a FutureWarning about the solver argument used by LogisticRegression.
# example of LogisticRegression that generates a FutureWarning
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
# prepare dataset
X, y = make_blobs(n_samples=100, centers=2, n_features=2)
# create and configure model
model = LogisticRegression()
# fit model
model.fit(X, y)
Running the example results in the following warning message:
FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
This issue involves the 'solver' argument, which used to default to 'liblinear' and will default to 'lbfgs' in a future version. You must now specify the 'solver' argument explicitly.
To maintain the old behavior, you can specify the argument as follows:
# create and configure model
model = LogisticRegression(solver='liblinear')
To support the new behavior (recommended), you can specify the argument as follows:
# create and configure model
model = LogisticRegression(solver='lbfgs')
Changes to the Multi-Class
The example below will generate a FutureWarning about the 'multi_class' argument used by LogisticRegression.
# example of LogisticRegression that generates a FutureWarning
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
# prepare dataset
X, y = make_blobs(n_samples=100, centers=3, n_features=2)
# create and configure model
model = LogisticRegression(solver='lbfgs')
# fit model
model.fit(X, y)
Running the example results in the following warning message:
FutureWarning: Default multi_class will be changed to 'auto' in 0.22. Specify the multi_class option to silence this warning.
This warning message only affects the use of logistic regression for multi-class classification problems, rather than the binary classification problems for which the method was originally designed.
The default of the 'multi_class' argument is changing from 'ovr' to 'auto'.
To maintain the old behavior, you can specify the argument as follows:
# create and configure model
model = LogisticRegression(solver='lbfgs', multi_class='ovr')
To support the new behavior (recommended), you can specify the argument as follows:
# create and configure model
model = LogisticRegression(solver='lbfgs', multi_class='auto')
FutureWarning for SVM
The support vector machine implementation has had a recent change to the default for the 'gamma' argument that results in a warning message, specifically in the SVC and SVR classes.
The example below will generate a FutureWarning about the 'gamma' argument used by SVC, but it applies equally to SVR.
# example of SVC that generates a FutureWarning
from sklearn.datasets import make_blobs
from sklearn.svm import SVC
# prepare dataset
X, y = make_blobs(n_samples=100, centers=2, n_features=2)
# create and configure model
model = SVC()
# fit model
model.fit(X, y)
Running this example will generate the following warning message:
FutureWarning: The default value of gamma will change from 'auto' to 'scale' in version 0.22 to account better for unscaled features. Set gamma explicitly to 'auto' or 'scale' to avoid this warning.
This warning message reports that the default for the 'gamma' argument is changing from the current value of 'auto' to a new default value of 'scale'.
The gamma argument only impacts SVM models that use the RBF, polynomial, or sigmoid kernel.
The parameter controls the 'gamma' coefficient used in the algorithm; if you do not specify a value, a heuristic is used to choose one. The warning is about a change in the way that the default is calculated.
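For reference, 'auto' sets gamma to 1 / n_features, while 'scale' (as defined in recent scikit-learn releases) sets it to 1 / (n_features * X.var()), which also takes the spread of the input data into account. A small sketch computing the two values on a synthetic dataset, for illustration only (the exact details of the 'scale' heuristic may differ slightly between versions):

# compare the 'auto' and 'scale' heuristics for the gamma coefficient
from sklearn.datasets import make_blobs
# prepare dataset
X, y = make_blobs(n_samples=100, centers=2, n_features=2)
gamma_auto = 1.0 / X.shape[1]
gamma_scale = 1.0 / (X.shape[1] * X.var())
print('auto: %.4f, scale: %.4f' % (gamma_auto, gamma_scale))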
To maintain the old behavior, you can specify the argument as follows:
# create and configure model
model = SVC(gamma='auto')
To support the new behavior (recommended), you can specify the argument as follows:
# create and configure model
model = SVC(gamma='scale')
FutureWarning for Decision Tree Ensemble Algorithms
The decision tree-based ensemble algorithms will change the default number of sub-models or trees used in the ensemble, controlled by the 'n_estimators' argument.
This affects the random forest and extra trees models for classification and regression, specifically the classes: RandomForestClassifier, RandomForestRegressor, ExtraTreesClassifier, ExtraTreesRegressor, and RandomTreesEmbedding.
The example below will generate a FutureWarning about the 'n_estimators' argument used by RandomForestClassifier, but it applies equally to RandomForestRegressor and the extra trees classes.
# example of RandomForestClassifier that generates a FutureWarning
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
# prepare dataset
X, y = make_blobs(n_samples=100, centers=2, n_features=2)
# create and configure model
model = RandomForestClassifier()
# fit model
model.fit(X, y)
Running this example will generate the following warning message:
FutureWarning: The default value of n_estimators will change from 10 in version 0.20 to 100 in 0.22.
This warning message reports that the default number of sub-models is increasing from 10 to 100, likely because computers have become faster and 10 trees is a very small ensemble (even 100 is small).
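If you want to get a feel for how much difference the larger default makes on your own data, you can compare the two settings directly. A minimal sketch using cross-validation on a synthetic dataset, purely for illustration:

# compare the old and new default number of trees with cross-validation
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
# prepare dataset
X, y = make_blobs(n_samples=100, centers=2, n_features=2)
for n in (10, 100):
    model = RandomForestClassifier(n_estimators=n)
    scores = cross_val_score(model, X, y, cv=5)
    print('n_estimators=%d: mean accuracy %.3f' % (n, scores.mean()))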
To maintain the old behavior, you can specify the argument as follows:
# create and configure model
model = RandomForestClassifier(n_estimators=10)
To support the new behavior (recommended), you can specify the argument as follows:
# create and configure model
model = RandomForestClassifier(n_estimators=100)
More FutureWarnings?
Are you struggling with a FutureWarning that is not covered?
Let me know in the comments below and I will do my best to help.
FutureWarning Recommendations
Generally, I do not recommend ignoring or suppressing warning messages.
Ignoring warning messages means that the messages may obscure real errors or program output, and that future API changes may negatively impact your program unless you have considered them.
Suppressing warnings might be a quick fix for R&D work, but should not be used in a production system. Worse than simply ignoring the messages, suppressing the warnings may also suppress messages from other APIs.
Instead, I recommend that you fix the warning messages in your software.
How should you change your code?
In general, I recommend almost always adopting the new behavior of the API, e.g. the new default, unless you explicitly rely on the prior behavior of the function.
For long-lived operational or production code, it might be a good idea to explicitly specify all function arguments and not use defaults, as they might be subject to change in the future.
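For example, pinning the arguments discussed in this tutorial explicitly keeps the model's behavior stable across releases; a minimal sketch (any arguments not listed here still use their defaults):

# specify argument values explicitly rather than relying on defaults
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(solver='lbfgs', multi_class='auto')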
I also recommend that you keep your scikit-learn library up to date, and keep track of the changes to the API in each new release.
The easiest way to do this is to review the release notes for each version, available in the scikit-learn documentation.
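If you installed scikit-learn with pip, upgrading to the latest release is typically a single command (conda users would use the equivalent conda command instead):

pip install --upgrade scikit-learn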
Summary
In this tutorial, you discovered FutureWarning messages in the scikit-learn API and how to handle them in your own machine learning projects.
Specifically, you learned:
- FutureWarning messages are designed to inform you about upcoming changes to default values for arguments in the scikit-learn API.
- FutureWarning messages can be ignored or suppressed as they do not halt the execution of your program.
- Examples of FutureWarning messages and how to interpret the message and change your code to address the upcoming change.