Python Static Analyzers

Static analyzers are tools that help you check your code without actually running it. The syntax highlighter in your favorite editor is the most basic form of static analyzer. If you need to compile your code (for example, in C++), your compiler, such as LLVM, may also provide some static analysis to warn you about potential issues (for example, mistaking the assignment “=” for the equality test “==” in C++). Python also has tools to identify potential errors or violations of coding standards.

After completing this tutorial, you will be familiar with some of these tools. Specifically, you will learn:

  1. What the tools Pylint, Flake8, and mypy can do
  2. What coding style violations are
  3. How type hints help analyzers identify potential bugs

Let’s get started.

Overview

This tutorial is divided into three sections:

  1. Pylint
  2. Flake8
  3. Mypy

Pylint

Lint was the name of a static analyzer for C that was developed many years ago. Pylint borrows its name from it and is one of the most widely used static analyzers for Python. It is available as a Python package and can be installed using pip:

$ pip install pylint

The command pylint is then available in our system.

Pylint can examine a single script or an entire directory. For example, suppose we have the following script saved as lenet5-notworking.py:

import numpy as np
import h5py
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dense, AveragePooling2D, Dropout, Flatten
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import EarlyStopping

# Load MNIST digits
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()

# Reshape data to (n_samples, height, width, n_channel)
X_train = np.expand_dims(X_train, axis=3).astype("float32")
X_test = np.expand_dims(X_test, axis=3).astype("float32")

# One-hot encode the output
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# LeNet5 model
def createmodel(activation):
    model = Sequential([
        Conv2D(6, (5,5), input_shape=(28,28,1), padding="same", activation=activation),
        AveragePooling2D((2,2), strides=2),
        Conv2D(16, (5,5), activation=activation),
        AveragePooling2D((2,2), strides=2),
        Conv2D(120, (5,5), activation=activation),
        Flatten(),
        Dense(84, activation=activation),
        Dense(10, activation="softmax")
    ])
    return model

# Train the model
model = createmodel(tanh)
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
earlystopping = EarlyStopping(monitor="val_loss", patience=4, restore_best_weights=True)
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=100, batch_size=32, callbacks=[earlystopping])

# Evaluate the model
print(model.evaluate(X_test, y_test, verbose=0))
model.save("lenet5.h5")

We can use Pylint to see how good our code is before we run it:

$ pylint lenet5-notworking.py

The output is as follows:

************* Module lenet5-notworking
lenet5-notworking.py:39:0: C0301: Line too long (115/100) (line-too-long)
lenet5-notworking.py:1:0: C0103: Module name "lenet5-notworking" doesn't conform to snake_case naming style (invalid-name)
lenet5-notworking.py:1:0: C0114: Missing module docstring (missing-module-docstring)
lenet5-notworking.py:4:0: E0611: No name 'datasets' in module 'LazyLoader' (no-name-in-module)
lenet5-notworking.py:5:0: E0611: No name 'models' in module 'LazyLoader' (no-name-in-module)
lenet5-notworking.py:6:0: E0611: No name 'layers' in module 'LazyLoader' (no-name-in-module)
lenet5-notworking.py:7:0: E0611: No name 'utils' in module 'LazyLoader' (no-name-in-module)
lenet5-notworking.py:8:0: E0611: No name 'callbacks' in module 'LazyLoader' (no-name-in-module)
lenet5-notworking.py:18:25: E0601: Using variable 'y_train' before assignment (used-before-assignment)
lenet5-notworking.py:19:24: E0601: Using variable 'y_test' before assignment (used-before-assignment)
lenet5-notworking.py:23:4: W0621: Redefining name 'model' from outer scope (line 36) (redefined-outer-name)
lenet5-notworking.py:22:0: C0116: Missing function or method docstring (missing-function-docstring)
lenet5-notworking.py:36:20: E0602: Undefined variable 'tanh' (undefined-variable)
lenet5-notworking.py:2:0: W0611: Unused import h5py (unused-import)
lenet5-notworking.py:3:0: W0611: Unused tensorflow imported as tf (unused-import)
lenet5-notworking.py:6:0: W0611: Unused Dropout imported from tensorflow.keras.layers (unused-import)

-------------------------------------
Your code has been rated at -11.82/10

If you give Pylint the root directory of a package, it will check every module in that package. In that case, the path to each file appears at the beginning of each line.

There are several points to consider here. First, Pylint’s complaints are divided into categories. We would most commonly see issues with convention (i.e., a matter of style), warnings (i.e., the code may run in a way that is inconsistent with what you intended to do), and errors (i.e., the code may fail to run and throw exceptions). They are identified by a code, such as E0601, with the first letter representing the category.
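To make the category letters concrete, here is a minimal sketch that tallies the messages in a Pylint text report by category. The helper name count_by_category and the regular expression are illustrative assumptions, not part of Pylint itself; the pattern only targets the default "path:line:column: code: message" layout shown above and would need adjusting for other output formats.

```python
import re
from collections import Counter

# Pylint message categories, keyed by the first letter of the message code.
CATEGORIES = {
    "C": "convention",
    "R": "refactor",
    "W": "warning",
    "E": "error",
    "F": "fatal",
    "I": "informational",
}

# Matches lines such as "file.py:18:25: E0601: message (symbol)".
# Assumption: the path itself contains no colon.
LINE_PATTERN = re.compile(r"^[^:]+:\d+:\d+: (?P<code>[A-Z]\d{4}): ")


def count_by_category(report):
    """Count the messages of each category in a Pylint text report."""
    counts = Counter()
    for line in report.splitlines():
        match = LINE_PATTERN.match(line)
        if match is None:
            continue  # skip the module header, separators, and the score
        counts[CATEGORIES.get(match["code"][0], "unknown")] += 1
    return counts


sample = """\
************* Module lenet5-notworking
lenet5-notworking.py:39:0: C0301: Line too long (115/100) (line-too-long)
lenet5-notworking.py:18:25: E0601: Using variable 'y_train' before assignment (used-before-assignment)
lenet5-notworking.py:2:0: W0611: Unused import h5py (unused-import)
lenet5-notworking.py:36:20: E0602: Undefined variable 'tanh' (undefined-variable)
"""
print(dict(count_by_category(sample)))
```

A tally like this can help decide what to look at first: errors usually deserve attention before conventions.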

Pylint can produce false positives. In the preceding example, Pylint flagged the imports from tensorflow.keras.datasets and its sibling modules as errors. This is caused by an optimization in the TensorFlow package: when we import TensorFlow, Python does not scan and load everything; instead, a LazyLoader is created to load only the necessary parts of the large package. This saves time when starting the program, but it also confuses Pylint, because we appear to import something that does not exist.

Furthermore, one of Pylint’s key features is helping us align our code with the PEP 8 coding style. For example, when we define a function without a docstring, Pylint complains that we did not follow the coding convention, even though the code is not broken.

But the most important function of Pylint is to help us identify potential problems. For example, we misspelled y_train as Y_train with an uppercase Y. Pylint tells us that we are using a variable before assigning a value to it. It does not tell us exactly what went wrong, but it does point us to the right place to proofread our code. Similarly, when we define the variable model on line 23, Pylint tells us that the same name exists in the outer scope. As a result, a later reference to model may not be the object we intended. Likewise, unused imports could indicate that we misspelled the module names.
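The outer-scope warning is easier to appreciate with a stripped-down case. The sketch below uses hypothetical names and plain strings instead of a Keras model; it only illustrates that a name assigned inside a function is a separate, local variable that hides the module-level one.

```python
def create_model():
    # This `model` is local to the function; it shadows the module-level
    # name and does not modify it.
    model = "model built inside create_model()"
    return model


model = "module-level model"
built = create_model()

print(model)  # the module-level name is untouched
print(built)  # the value produced inside the function
```

Nothing is broken here, which is why Pylint reports it as a warning rather than an error: the risk is that a reader confuses the two variables.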

Pylint provided all of these hints. We must still use our discretion to correct our code (or ignore Pylint’s complaints).

However, if you know what Pylint should stop complaining about, you can ask it to ignore those messages. For example, we know the import statements are fine, so we can invoke Pylint with:

$ pylint -d E0611 lenet5-notworking.py

Pylint will now ignore all errors of code E0611. You can disable multiple codes by using a comma-separated list, for example,

$ pylint -d E0611,C0301 lenet5-notworking.py

If you only want to disable some issues on a single line or section of code, you can add special comments to your code as follows:

...
from tensorflow.keras.datasets import mnist # pylint: disable=no-name-in-module
from tensorflow.keras.models import Sequential # pylint: disable=E0611
from tensorflow.keras.layers import Conv2D, Dense, AveragePooling2D, Dropout, Flatten
from tensorflow.keras.utils import to_categorical

Pylint-specific instructions are introduced by the magic keyword pylint:. The code E0611 corresponds to the name no-name-in-module. Because of the special comments, Pylint will complain about the last two import statements but not the first two in the preceding example.

Flake8

Flake8 is a wrapper around PyFlakes, McCabe, and pycodestyle. Installing flake8 with:

$ pip install flake8

also installs all of these dependencies.

After installing this package, we have the command flake8, which is similar to Pylint in that we can pass in a script or a directory for analysis. However, Flake8’s emphasis is on coding style. As a result, for the same code as above, we would get the following result:

$ flake8 lenet5-notworking.py
lenet5-notworking.py:2:1: F401 'h5py' imported but unused
lenet5-notworking.py:3:1: F401 'tensorflow as tf' imported but unused
lenet5-notworking.py:6:1: F401 'tensorflow.keras.layers.Dropout' imported but unused
lenet5-notworking.py:6:80: E501 line too long (85 > 79 characters)
lenet5-notworking.py:18:26: F821 undefined name 'y_train'
lenet5-notworking.py:19:25: F821 undefined name 'y_test'
lenet5-notworking.py:22:1: E302 expected 2 blank lines, found 1
lenet5-notworking.py:24:21: E231 missing whitespace after ','
lenet5-notworking.py:24:41: E231 missing whitespace after ','
lenet5-notworking.py:24:44: E231 missing whitespace after ','
lenet5-notworking.py:24:80: E501 line too long (87 > 79 characters)
lenet5-notworking.py:25:28: E231 missing whitespace after ','
lenet5-notworking.py:26:22: E231 missing whitespace after ','
lenet5-notworking.py:27:28: E231 missing whitespace after ','
lenet5-notworking.py:28:23: E231 missing whitespace after ','
lenet5-notworking.py:36:1: E305 expected 2 blank lines after class or function definition, found 1
lenet5-notworking.py:36:21: F821 undefined name 'tanh'
lenet5-notworking.py:37:80: E501 line too long (86 > 79 characters)
lenet5-notworking.py:38:80: E501 line too long (88 > 79 characters)
lenet5-notworking.py:39:80: E501 line too long (115 > 79 characters)

The error codes that begin with the letter E are from pycodestyle, while those that begin with the letter F are from PyFlakes. Flake8 complains about coding style issues, such as the use of (5,5) without a space after the comma. We can also see that it detects the use of variables before assignment. However, it does not catch some code smells; for example, the function createmodel() reuses the variable name model, which is also defined in the outer scope.

Just as with Pylint, we can ask Flake8 to ignore some complaints. For example,

$ flake8 --ignore E501,E231 lenet5-notworking.py

The E501 and E231 lines are no longer printed in the output:

lenet5-notworking.py:2:1: F401 'h5py' imported but unused
lenet5-notworking.py:3:1: F401 'tensorflow as tf' imported but unused
lenet5-notworking.py:6:1: F401 'tensorflow.keras.layers.Dropout' imported but unused
lenet5-notworking.py:18:26: F821 undefined name 'y_train'
lenet5-notworking.py:19:25: F821 undefined name 'y_test'
lenet5-notworking.py:22:1: E302 expected 2 blank lines, found 1
lenet5-notworking.py:36:1: E305 expected 2 blank lines after class or function definition, found 1
lenet5-notworking.py:36:21: F821 undefined name 'tanh'

We can also use magic comments to disable certain complaints, for example,

...
import tensorflow as tf # noqa: F401
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential

Flake8 will look for the comment # noqa: in order to skip some complaints on those lines.

Mypy

Python is not a statically typed language, so, unlike in C or Java, you do not need to declare the types of functions or variables before using them. However, recent versions of Python introduced type hint notation, which lets us state what type a function or variable is intended to be without enforcing it as a statically typed language would.
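To see that type hints are not enforced at run time, consider the minimal sketch below. The function double() is a hypothetical example, not part of the scripts in this tutorial.

```python
def double(value: int) -> int:
    """Return twice the value; the hints say input and output are int."""
    return value * 2


# The annotations are stored on the function but are not checked on a call.
print(double.__annotations__)

# Passing a str contradicts the hint, yet Python runs it without complaint
# because str also supports the * operator.
result = double("ab")
print(result)  # prints abab
```

The interpreter accepts the call, but a type checker that reads the hints would be expected to flag the argument as incompatible with int.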

One of the most significant advantages of using type hints in Python is that they provide additional information for static analyzers to check. Mypy is a tool that can recognize type hints. Mypy can provide complaints similar to Pylint and Flake8 even without type hints.

Mypy can be installed from PyPI:

$ pip install mypy

The example above can then be passed to the mypy command:

$ mypy lenet5-notworking.py
lenet5-notworking.py:2: error: Skipping analyzing "h5py": module is installed, but missing library stubs or py.typed marker
lenet5-notworking.py:2: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
lenet5-notworking.py:3: error: Skipping analyzing "tensorflow": module is installed, but missing library stubs or py.typed marker
lenet5-notworking.py:4: error: Skipping analyzing "tensorflow.keras.datasets": module is installed, but missing library stubs or py.typed marker
lenet5-notworking.py:5: error: Skipping analyzing "tensorflow.keras.models": module is installed, but missing library stubs or py.typed marker
lenet5-notworking.py:6: error: Skipping analyzing "tensorflow.keras.layers": module is installed, but missing library stubs or py.typed marker
lenet5-notworking.py:7: error: Skipping analyzing "tensorflow.keras.utils": module is installed, but missing library stubs or py.typed marker
lenet5-notworking.py:8: error: Skipping analyzing "tensorflow.keras.callbacks": module is installed, but missing library stubs or py.typed marker
lenet5-notworking.py:18: error: Cannot determine type of "y_train"
lenet5-notworking.py:19: error: Cannot determine type of "y_test"
lenet5-notworking.py:36: error: Name "tanh" is not defined
Found 10 errors in 1 file (checked 1 source file)

We see similar errors to Pylint above, although sometimes less precise (for example, the issue with the variable y_train). However, we also see one characteristic of mypy: it expects all of the libraries we use to come with a stub so that type checking can be performed. This is because type hints are optional. If the code from a library does not provide type hints, the code will still work, but mypy cannot verify it. Some libraries have typing stubs available, which allow mypy to check them better.

Consider the following example:

import h5py

def dumphdf5(filename: str) -> int:
    """Open a HDF5 file and print all the dataset and attributes stored

    Args:
        filename: The HDF5 filename

    Returns:
        Number of dataset found in the HDF5 file
    """
    count: int = 0

    def recur_dump(obj) -> None:
        print(f"{obj.name} ({type(obj).__name__})")
        if obj.attrs.keys():
            print("\tAttribs:")
            for key in obj.attrs.keys():
                print(f"\t\t{key}: {obj.attrs[key]}")
        if isinstance(obj, h5py.Group):
            # Group has key-value pairs
            for key, value in obj.items():
                recur_dump(value)
        elif isinstance(obj, h5py.Dataset):
            count += 1
            print(obj[()])

    with h5py.File(filename) as obj:
        recur_dump(obj)
        print(f"{count} dataset found")

with open("my_model.h5") as fp:
    dumphdf5(fp)

This program is supposed to load an HDF5 file (such as a Keras model) and print every attribute and dataset stored in it. We used the h5py module (which has no typing stub and hence cannot be type-checked by mypy), but we added type hints to the function we defined, dumphdf5(). This function expects the filename of an HDF5 file and prints everything stored inside. At the end, it is supposed to return the number of datasets found.

When we save this script as dumphdf5.py and run it through mypy, we get the following results:

$ mypy dumphdf5.py
dumphdf5.py:1: error: Skipping analyzing "h5py": module is installed, but missing library stubs or py.typed marker
dumphdf5.py:1: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
dumphdf5.py:3: error: Missing return statement
dumphdf5.py:33: error: Argument 1 to "dumphdf5" has incompatible type "TextIO"; expected "str"
Found 3 errors in 1 file (checked 1 source file)

We misused our function by passing an opened file object to dumphdf5() rather than just the filename (as a string). This error is detectable by Mypy. We also stated that the function should return an integer, but the function lacked a return statement.

There is one more error in this code that mypy did not detect. The variable count is defined in the enclosing function dumphdf5(), outside the scope of the inner function recur_dump(), so the inner function must declare it nonlocal before updating it. Pylint and Flake8 can detect this error, but mypy missed it.
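The scoping problem can be reproduced without h5py. The sketch below uses two hypothetical functions, count_items_broken() and count_items_fixed(), that differ only in the nonlocal declaration.

```python
def count_items_broken(items):
    count = 0

    def visit(item):
        # Assigning to `count` makes it local to visit(), so the augmented
        # assignment reads the local name before it has a value.
        count += 1

    for item in items:
        visit(item)
    return count


def count_items_fixed(items):
    count = 0

    def visit(item):
        nonlocal count  # rebind the variable of the enclosing function
        count += 1

    for item in items:
        visit(item)
    return count


try:
    count_items_broken(["a", "b"])
except UnboundLocalError as exc:
    print(f"broken version failed: {exc}")

print(count_items_fixed(["a", "b"]))  # prints 2
```

The broken version only fails when visit() is actually called, which is why catching it statically, before running the program, is valuable.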

The corrected code is shown below, with no more errors. To silence mypy’s typing stubs warning, we added the magic comment “# type: ignore” at the first line:

import h5py # type: ignore


def dumphdf5(filename: str) -> int:
    """Open a HDF5 file and print all the dataset and attributes stored

    Args:
        filename: The HDF5 filename

    Returns:
        Number of dataset found in the HDF5 file
    """
    count: int = 0

    def recur_dump(obj) -> None:
        nonlocal count
        print(f"{obj.name} ({type(obj).__name__})")
        if obj.attrs.keys():
            print("\tAttribs:")
            for key in obj.attrs.keys():
                print(f"\t\t{key}: {obj.attrs[key]}")
        if isinstance(obj, h5py.Group):
            # Group has key-value pairs
            for key, value in obj.items():
                recur_dump(value)
        elif isinstance(obj, h5py.Dataset):
            count += 1
            print(obj[()])

    with h5py.File(filename) as obj:
        recur_dump(obj)
        print(f"{count} dataset found")
    return count


dumphdf5("my_model.h5")

Finally, the three tools we discussed above can be used in conjunction with one another. You should consider running all of them to look for potential bugs in your code or to improve your coding style. Each tool can be customized, either from the command line or from a config file (e.g., how long a line must be before it triggers a warning). Using a static analyzer can also help you improve your programming skills.

Summary

In this tutorial, you’ve seen how some common static analyzers can help you write better Python code. Specifically, you learned:

  1. The advantages and disadvantages of three tools: Pylint, Flake8, and mypy
  2. How to customize the behavior of these tools
  3. How to understand the complaints made by these analyzers
