Specific Attacks on Machine Learning Systems

According to a recent whitepaper by the NCC Group, the growing number of organizations developing and deploying Machine Learning solutions raises concerns about the inherent security of those systems.

The NCC Group's whitepaper provides a classification of attacks that can be executed against Machine Learning systems, with examples based on popular libraries and platforms such as SciKit-Learn, Keras, PyTorch, and TensorFlow.

While the mechanisms enabling such attacks have been documented to some extent, the whitepaper's authors maintain that their security implications are not thoroughly understood in the larger ML community.

According to the NCC Group, ML systems are prone to specific forms of attacks, in addition to more traditional attacks that attempt to exploit infrastructure or application bugs, or other kinds of issues.

A first risk vector is linked to the fact that several ML model formats can embed code that is executed when the model is loaded, or when a specific condition is met, such as a given output class being predicted. This implies that an attacker could craft a model containing malicious code and have it executed to achieve a variety of goals, such as leaking sensitive information, installing malware, or producing erroneous output.
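To make this concrete, the following sketch (with a harmless placeholder command) shows how an object serialized with Python's pickle protocol can execute an arbitrary command as soon as it is deserialized, which is the same mechanism a malicious "model" file could abuse when loaded with a pickle-based loader.

```python
# Minimal sketch of code execution on deserialization via Python's pickle
# protocol; the payload command is a harmless placeholder.
import os
import pickle

class MaliciousModel:
    def __reduce__(self):
        # pickle stores the returned callable and its arguments, and the
        # deserializer re-invokes them: here, os.system("echo ...").
        return (os.system, ("echo 'arbitrary code ran on model load'",))

payload = pickle.dumps(MaliciousModel())

# Simply loading the "model" runs the attacker-controlled command.
pickle.loads(payload)
```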

Downloaded models should therefore be treated in the same way as downloaded code: by verifying the supply chain, cryptographically signing the content, and scanning the models for malware where possible.
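As one illustration of such a check, the sketch below compares the SHA-256 digest of a downloaded model file against a known-good value before the file is ever deserialized; the expected digest and file name are placeholders.

```python
# Sketch: verify a downloaded model's SHA-256 digest before loading it.
# EXPECTED_SHA256 is a placeholder for a digest obtained from a trusted source.
import hashlib

EXPECTED_SHA256 = "0000000000000000000000000000000000000000000000000000000000000000"

def verify_model_file(path, expected_digest=EXPECTED_SHA256):
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash the file in chunks so large models do not need to fit in memory.
        for chunk in iter(lambda: f.read(8192), b""):
            sha256.update(chunk)
    if sha256.hexdigest() != expected_digest:
        raise ValueError(f"Model file {path} failed integrity check")

# Only deserialize the model after the check passes, e.g.:
# verify_model_file("model.pkl")
```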

The NCC Group claims to have successfully exploited this type of vulnerability across a variety of popular formats and libraries, including Python pickle files, SciKit-Learn pickles, PyTorch pickles, state dictionaries, TensorFlow Server, and others.

Another kind of attack is the adversarial perturbation attack, in which the attacker crafts an input that causes the ML system to return results of the attacker's choosing.

Several methods for accomplishing this have been described in the literature, including crafting input that increases confidence in any class or in a specific class, or that decreases confidence in a given class. Such attacks could be used to subvert authentication systems, content filters, and similar systems.

The NCC Group's whitepaper gives a reference implementation of a simple hill-climbing algorithm that illustrates adversarial perturbation through the addition of noise to the pixels of an image: random noise is added to the image and, if the classifier's confidence in the target class increases, the perturbed image is used as the new base image. The process begins by adding noise to 5% of the pixels in the image, and that proportion is decreased when an iteration is unsuccessful.
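The sketch below follows that hill-climbing approach; it is not the NCC Group's reference implementation, and the classifier interface (a predict_proba-style callable returning class probabilities), the noise scale, and the schedule for shrinking the perturbed fraction are assumptions made for illustration.

```python
# Minimal sketch of a hill-climbing adversarial perturbation loop.
# `predict_proba` is assumed to return a 1-D array of class probabilities
# for a single image with pixel values in [0, 1].
import numpy as np

def hill_climb_perturbation(image, predict_proba, target_class,
                            max_iters=1000, start_fraction=0.05,
                            noise_scale=0.1, min_fraction=0.001):
    base = image.copy()
    best_conf = predict_proba(base)[target_class]
    fraction = start_fraction  # start by perturbing ~5% of the pixels

    for _ in range(max_iters):
        candidate = base.copy()
        # Pick a random subset of pixels and add Gaussian noise to them.
        mask = np.random.rand(*candidate.shape) < fraction
        candidate[mask] += np.random.normal(0, noise_scale, mask.sum())
        candidate = np.clip(candidate, 0.0, 1.0)

        conf = predict_proba(candidate)[target_class]
        if conf > best_conf:
            # Confidence increased: keep the perturbed image as the new base.
            base, best_conf = candidate, conf
        else:
            # Unsuccessful step: perturb a smaller proportion of pixels next time.
            fraction = max(fraction * 0.9, min_fraction)

    return base, best_conf
```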

Other well-known attacks are membership inference attacks, which allow an attacker to determine whether a given input was part of the model's training set; model inversion attacks, which enable an attacker to recover confidential data from the training set; and data poisoning backdoor attacks, in which particular items are inserted into a system's training data so that it later responds in a pre-defined way.
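As a rough illustration of the first of these, a common baseline for membership inference simply thresholds the model's confidence on a candidate record, on the assumption that models tend to be more confident on data they were trained on; the predict_proba interface and the threshold value below are illustrative assumptions.

```python
# Sketch of a confidence-threshold membership inference baseline.
# `predict_proba` is assumed to return class probabilities for one input;
# the 0.9 threshold is an arbitrary illustrative value.
import numpy as np

def likely_training_member(x, predict_proba, threshold=0.9):
    probs = predict_proba(x)
    # High maximum confidence is taken as weak evidence that the record was
    # in the training set; practical attacks refine this with shadow models
    # and per-class calibration.
    return np.max(probs) >= threshold
```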

As previously stated, the whitepaper includes an exhaustive taxonomy of Machine Learning attacks, along with potential mitigations, and a review of more traditional security issues discovered in many Machine Learning systems.