Generative vs. Discriminative Machine Learning Models

Some machine learning models belong to either the “generative” or “discriminative” model categories. Yet what is the difference between these two categories of models? What does it mean for a model to be discriminative or generative?

The short answer is that generative models are those that model the distribution of the data itself, so they can assign a probability to a given example. Generative models are often used to generate new examples or to predict what occurs next in a sequence. Meanwhile, discriminative models are used for either classification or regression, and they return a prediction based on the conditional probability of the output given the input. Let’s explore the differences between generative and discriminative models in more detail, so that we can truly understand what separates the two types of models and when each type should be used.

Generative vs. Discriminative Models

There are a variety of ways to categorize a machine learning model. A model can belong to categories such as generative or discriminative, parametric or non-parametric, and tree-based or non-tree-based.

This article will focus on the differences between generative models and discriminative models. We’ll start by defining both generative and discriminative models, and then we’ll explore some examples of each type of model.

Generative Models

Generative models are those that center on the distribution of the classes within the dataset. The machine learning algorithms typically model the distribution of the data points. Generative models rely on finding the joint probability p(x,y): the probability that a given input feature x and a desired output/label y occur together.

Generative models are typically employed to estimate probabilities and likelihood, modeling data points and discriminating between classes based on these probabilities. Because the model learns a probability distribution for the dataset, it can reference this probability distribution to generate new data instances. Generative models often rely on Bayes’ theorem: they model the joint probability p(x,y), typically as p(x|y)p(y), and apply Bayes’ theorem to obtain p(y|x) when a class prediction is needed. Essentially, generative models model how the data was generated, answering the following question:

“What’s the likelihood that this class or another class generated this data point/instance?”

Examples of generative machine learning models include Linear Discriminant Analysis (LDA), Hidden Markov models, and Bayesian networks like Naive Bayes.
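To make the idea of a joint probability concrete, here is a minimal sketch in Python. The toy weather dataset and the function names are hypothetical and chosen only for illustration. The sketch estimates p(x,y) by counting, derives p(y|x) from it, and then samples new (x, y) pairs from the learned distribution.

```python
import random
from collections import Counter

# Hypothetical toy dataset of (feature, label) pairs.
data = [
    ("sunny", "play"), ("sunny", "play"), ("sunny", "stay"), ("sunny", "play"),
    ("rainy", "stay"), ("rainy", "stay"), ("rainy", "play"), ("rainy", "stay"),
]

def joint_distribution(pairs):
    """Estimate p(x, y) as the relative frequency of each (x, y) pair."""
    counts = Counter(pairs)
    total = len(pairs)
    return {pair: count / total for pair, count in counts.items()}

def posterior(joint, x):
    """Derive p(y | x) = p(x, y) / p(x) from the joint distribution."""
    evidence = sum(p for (xi, _), p in joint.items() if xi == x)
    return {y: p / evidence for (xi, y), p in joint.items() if xi == x}

joint = joint_distribution(data)
post = posterior(joint, "sunny")

# Because the joint distribution is known, new instances can be sampled from it.
rng = random.Random(0)
samples = rng.choices(list(joint), weights=list(joint.values()), k=5)
```

Here `post` holds the class probabilities for a sunny day, and `samples` is a list of newly generated (feature, label) pairs.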

Discriminative Models

While generative models learn about the distribution of the dataset, discriminative models learn about the boundary between classes within a dataset. With discriminative models, the goal is to identify the decision boundary between classes to apply reliable class labels to data instances. Discriminative models separate the classes in the dataset by using conditional probability, without modeling how the individual data points themselves are distributed.

Discriminative models set out to answer the following question:

“What side of the decision boundary is this instance found on?”

Examples of discriminative models in machine learning include support vector machines, logistic regression, decision trees, and random forests.

Differences Between Generative and Discriminative Models

Here’s a quick rundown of the major differences between generative and discriminative models.

Generative models:

  • Generative models aim to capture the actual distribution of the classes in the dataset.
  • Generative models model the joint probability distribution – p(x,y) – utilizing Bayes’ theorem.
  • Generative models are computationally expensive compared to discriminative models.
  • Generative models are useful for unsupervised machine learning tasks.
  • Generative models are impacted by the presence of outliers more than discriminative models.

Discriminative models:

  • Discriminative models model the decision boundary for the dataset classes.
  • Discriminative models learn the conditional probability – p(y|x).
  • Discriminative models are computationally cheap compared to generative models.
  • Discriminative models are useful for supervised machine learning tasks.
  • Discriminative models are more robust to outliers than generative models.

We’ll now briefly explore some different examples of generative and discriminative machine learning models.

Examples of Generative Models

Linear Discriminant Analysis (LDA)

LDA models function by estimating the mean and variance of the data for each class in the dataset. After the means and variances for every class have been calculated, predictions can be made by estimating the probability that a given set of inputs belongs to each class and choosing the most probable class.
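The procedure described above can be sketched for a single input feature. This is a simplified illustration rather than a full LDA implementation: it assumes one feature, Gaussian classes, and a variance shared (pooled) across classes, and the function names and toy data are hypothetical.

```python
import math

def fit_lda_1d(xs, ys):
    """Estimate per-class means and priors plus a pooled variance."""
    classes = sorted(set(ys))
    n = len(xs)
    means, priors = {}, {}
    pooled = 0.0
    for c in classes:
        members = [x for x, y in zip(xs, ys) if y == c]
        means[c] = sum(members) / len(members)
        priors[c] = len(members) / n
        pooled += sum((x - means[c]) ** 2 for x in members)
    variance = pooled / (n - len(classes))
    return means, priors, variance

def predict_lda_1d(x, means, priors, variance):
    """Pick the class with the largest linear discriminant score."""
    def score(c):
        return (x * means[c] / variance
                - means[c] ** 2 / (2 * variance)
                + math.log(priors[c]))
    return max(means, key=score)

# Hypothetical toy data: two well-separated classes.
xs = [1.0, 1.5, 2.0, 6.0, 6.5, 7.0]
ys = [0, 0, 0, 1, 1, 1]
means, priors, variance = fit_lda_1d(xs, ys)
```

With equal priors and a shared variance, the decision boundary in this sketch falls halfway between the two class means.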

Hidden Markov Models

Markov chains can be thought of as graphs with probabilities that indicate how likely it is that we will move from one point in the chain, a “state”, to another state. A Markov chain specifies the probability of moving from state j to state i, which can be denoted as p(i|j). Combined with the probability of the starting state, these transition probabilities give the joint probability of an entire sequence of states. A Hidden Markov Model is one in which the Markov chain itself is hidden: its states cannot be observed directly, and each hidden state instead emits an observable output. The observed data are given to the model, and the probabilities for the current state and the state immediately preceding it are used to calculate the most likely outcome.
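One way to see how a Hidden Markov Model assigns a probability to observed data is the forward algorithm, a standard technique that the text above does not name. The sketch below is a minimal version; the states, observation symbols, and all probabilities are hypothetical values chosen for illustration.

```python
def forward(observations, states, start_p, trans_p, emit_p):
    """Return p(observations) by summing over every hidden state path."""
    # alpha[s] = p(observations so far, current hidden state = s)
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: emit_p[s][obs] * sum(alpha[prev] * trans_p[prev][s] for prev in states)
            for s in states
        }
    return sum(alpha.values())

# Hypothetical model: hidden weather states emit observable activities.
states = ["rainy", "sunny"]
symbols = ["walk", "shop", "clean"]
start_p = {"rainy": 0.6, "sunny": 0.4}
trans_p = {
    "rainy": {"rainy": 0.7, "sunny": 0.3},
    "sunny": {"rainy": 0.4, "sunny": 0.6},
}
emit_p = {
    "rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}

p_walk = forward(["walk"], states, start_p, trans_p, emit_p)

# Sanity check: probabilities of all length-2 observation sequences sum to 1.
total = sum(
    forward([a, b], states, start_p, trans_p, emit_p)
    for a in symbols for b in symbols
)
```

Each update uses only the current state and the one immediately before it, which is the Markov assumption described above.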

Bayesian Networks

Bayesian networks are a type of probabilistic graphical model. They represent conditional dependencies between variables using a Directed Acyclic Graph. In a Bayesian network, each edge of the graph represents a conditional dependency, and each node corresponds to a unique variable. The conditional independence relationships in the graph can be used to determine the joint distribution of the variables and calculate joint probability. In other words, a Bayesian network captures a subset of the independence relationships in a specific joint probability distribution.

Once a Bayesian network has been created and properly defined, with random variables, conditional relationships, and probability distributions known, it can be used to estimate the probability of events or outcomes.

One of the most commonly used types of Bayesian network is the Naive Bayes model. A Naive Bayes model handles the challenge of calculating probability for datasets with many parameters/variables by treating all features as independent of one another given the class.
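The independence assumption means the probability of a whole row is just the product of per-feature probabilities. Below is a minimal sketch of a categorical Naive Bayes classifier under that assumption. The add-one (Laplace) smoothing, the function names, and the toy dataset are all assumptions made for this illustration.

```python
import math
from collections import Counter, defaultdict

def fit_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter(labels)
    # feature_counts[label][feature_index][value] -> count
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values seen for each feature
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[label][i][value] += 1
            values[i].add(value)
    return class_counts, feature_counts, values

def predict_naive_bayes(row, model):
    """Return the class maximizing log p(y) + sum_i log p(x_i | y)."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, value in enumerate(row):
            # Add-one smoothing avoids zero probabilities for unseen values.
            numerator = feature_counts[label][i][value] + 1
            denominator = count + len(values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data: (outlook, temperature) -> activity.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("sunny", "cool"), ("rainy", "hot")]
labels = ["play", "play", "stay", "stay", "play", "stay"]
model = fit_naive_bayes(rows, labels)
```

Working in log space keeps the product of many small probabilities from underflowing.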

Examples of Discriminative Models

Support Vector Machines

Support vector machines operate by drawing a decision boundary between data points, finding the decision boundary that best separates the different classes in the dataset. The SVM algorithm draws a line in 2-dimensional space, a plane in 3-dimensional space, or a hyperplane in higher dimensions to separate the points. SVM endeavors to find the line/hyperplane that best separates the classes by trying to maximize the margin, or the distance between the line/hyperplane and the nearest points. SVM models can also be used on datasets that aren’t linearly separable by using the “kernel trick” to identify non-linear decision boundaries.
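As a rough illustration of margin maximization, the sketch below trains a linear SVM with sub-gradient descent on the regularized hinge loss. This is one simple training method, not the solver most libraries use, and it does not cover kernels. The learning rate, regularization strength, function names, and toy points are assumptions.

```python
def train_linear_svm(points, labels, lr=0.1, lam=0.01, epochs=50):
    """Sub-gradient descent on hinge loss; labels must be +1 or -1."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is inside the margin or misclassified: move toward it.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Point is safely classified: only apply regularization.
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def svm_predict(w, b, x):
    """Report which side of the hyperplane w.x + b = 0 the point lies on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical linearly separable toy data.
points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
```

The regularization term shrinks `w`, which corresponds to widening the margin, while the hinge term pushes back whenever a point ends up inside it.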

Logistic Regression

Logistic regression is an algorithm that models the logit (log-odds) of an input belonging to one of two classes as a linear function of the input features. A sigmoid function converts the log-odds into a probability between 0 and 1. Probabilities of 0.5 or greater are typically assigned to class 1, while probabilities below 0.5 are assigned to class 0. For this reason, logistic regression is typically used in binary classification problems. However, logistic regression can be applied to multi-class problems by using a one vs. all approach, creating a binary classification model for each class and determining the probability that an example is a target class or another class in the dataset.
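The sigmoid and threshold described above can be sketched with a single-feature logistic regression trained by batch gradient descent. The learning rate, number of epochs, function names, and toy data are assumptions made for illustration.

```python
import math

def sigmoid(z):
    """Map a log-odds value to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

def predict_proba(x, w, b):
    return sigmoid(w * x + b)

def predict_class(x, w, b, threshold=0.5):
    return 1 if predict_proba(x, w, b) >= threshold else 0

# Hypothetical toy data: small x values are class 0, large ones class 1.
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ys)
```

Note that the model directly learns p(y|x); it never models how the x values themselves are distributed, which is what makes it discriminative.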

Decision Tree

A decision tree model functions by splitting a dataset into smaller and smaller subsets, and once the subsets can’t be split any further the result is a tree with nodes and leaves. Nodes in a decision tree are where decisions about data points are made using different filtering criteria. The leaves are the end points of the tree, where data points receive their final classification. Decision tree algorithms can handle both numerical and categorical data, and splits in the tree are based on specific variables/features.
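A single split of the kind described above can be sketched as follows. The sketch assumes Gini impurity as the splitting criterion, which is one common choice among several, and it only searches one numerical feature. The function names and toy data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 for a pure subset, higher for mixed subsets."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(xs, ys):
    """Return (threshold, weighted impurity) of the best single split."""
    best_threshold, best_impurity = None, float("inf")
    values = sorted(set(xs))
    # Candidate thresholds are midpoints between adjacent distinct values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity

# Hypothetical toy data with one numerical feature.
xs = [1, 2, 3, 10, 11, 12]
ys = ["a", "a", "a", "b", "b", "b"]
threshold, impurity = best_split(xs, ys)
```

A full tree would apply `best_split` recursively to the left and right subsets until they are pure or cannot be split further.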

Random Forests

A random forest model is basically just a collection of decision trees where the predictions of the individual trees are combined, by majority vote for classification or by averaging for regression, to come to a final decision. The random forest algorithm selects observations and features randomly, building the individual trees based on these selections.
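The random selection and voting can be sketched in a heavily simplified form. In this illustration each "tree" is only a one-split stump trained on a bootstrap sample of the observations and one randomly chosen feature; real random forests grow deeper trees and sample features at every split. All names, parameters, and data here are hypothetical.

```python
import random
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_stump(rows, labels, feature):
    """One-split tree on a single feature; constant if no split exists."""
    values = sorted({row[feature] for row in rows})
    best = None  # (impurity, threshold, left_label, right_label)
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or impurity < best[0]:
            best = (impurity, threshold, majority(left), majority(right))
    if best is None:  # only one distinct value: nothing to split on
        label = majority(labels)
        return feature, None, label, label
    return (feature,) + best[1:]

def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    if threshold is None or row[feature] <= threshold:
        return left_label
    return right_label

def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap: sample observations with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Random feature selection for this tree.
        feature = rng.randrange(len(rows[0]))
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feature))
    return forest

def predict_forest(forest, row):
    """Combine the individual trees by majority vote."""
    return majority([predict_stump(stump, row) for stump in forest])

# Hypothetical toy data with two informative features.
rows = [(1, 5), (2, 6), (3, 7), (10, 20), (11, 21), (12, 22)]
labels = ["a", "a", "a", "b", "b", "b"]
forest = fit_forest(rows, labels)
```

Because every tree sees a different sample and feature, individual trees can disagree, and the vote smooths out their individual mistakes.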
