Introduction
This article covers the Maximal-Margin Classifier, the Support Vector Classifier, and Support Vector Machines. Although many people mix these terms up, there are significant differences between them. Let's take a look at each one individually.
Before we get going, let's understand what a hyperplane is.
Hyperplane
A hyperplane is a (d-1)-dimensional subspace that divides a d-dimensional space into two parts. In the diagram below, the green 2-dimensional hyperplane separates the two classes, red and blue, in three-dimensional space.
Note!
A (d-1)-dimensional hyperplane divides data in d dimensions into two parts.
For instance, a point (0-D) divides a line (1-D) into two parts, a line (1-D) divides a plane (2-D) into two parts, and a plane (2-D) divides three-dimensional space into two parts.
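The "two parts" idea can be sketched in a few lines: a hyperplane can be written as w · x + b = 0, and the sign of w · x + b tells you which side a point falls on. The weight vector and bias below are illustrative values chosen for this sketch, not taken from the article.

```python
import numpy as np

# Illustrative hyperplane: the line x1 = x2 in 2-D (normal vector w, bias b).
w = np.array([1.0, -1.0])
b = 0.0

def side(x):
    """Return +1 or -1 depending on which side of the hyperplane x falls on."""
    return int(np.sign(w @ x + b))

print(side(np.array([2.0, 1.0])))  # -> 1  (one side of the line)
print(side(np.array([1.0, 3.0])))  # -> -1 (the other side)
```

Every classifier discussed below ultimately reduces to a decision rule of this form; they differ only in how the hyperplane is chosen.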
1. Maximal Margin Classifier
This classifier is designed specifically for linearly separable data, i.e., data that can be separated by a hyperplane. But what exactly is linearly separable data?
Linearly separable and non-linearly separable data
Linearly and non-linearly separable data are illustrated in the diagram below. Linearly separable data is distributed in such a way that it can easily be classified with a straight line or a hyperplane. Non-linearly separable data, on the other hand, cannot be separated by a simple straight line and requires a more complex classifier.
However, as shown in the diagram below, there can be an infinite number of hyperplanes that will classify the linearly separable classes.
How do we choose the hyperplane that we really need?
The Maximal-Margin Classifier chooses the optimal hyperplane based on the maximum margin. The dotted lines parallel to the hyperplane in the following diagram are the margin boundaries, and the distance between these two dotted lines is the margin, which this classifier maximizes.
Each margin boundary passes through the points of its class that lie nearest to the hyperplane, where "nearest" is measured perpendicular to the hyperplane. These points are referred to as "support vectors" and are shown circled in the diagram below.
This classifier chooses the hyperplane with the maximum margin, which is why it is known as the Maximal-Margin Classifier.
Drawbacks:
This classifier is heavily reliant on the support vectors and changes whenever they change. As a result, it tends to overfit.
It cannot be used for data that is not linearly separable. Since most real-world data is non-linear, this classifier is inefficient in practice.
The Maximal-Margin Classifier is also known as a "Hard Margin Classifier" because it forbids misclassification and ensures that no point crosses the margin. This hard margin is what makes it prone to overfitting. An extension, the "Support Vector Classifier", was introduced to address these problems.
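A hard-margin classifier can be sketched with scikit-learn's `SVC`. Note that scikit-learn has no literal hard-margin mode; a very large `C` approximates one because almost no margin violations are tolerated. The toy coordinates below are made up for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Two tiny, clearly linearly separable clusters (illustrative coordinates).
X = np.array([[1, 1], [2, 1], [1, 2],    # class 0
              [5, 5], [6, 5], [5, 6]])   # class 1
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard margin: misclassification is
# penalized so heavily that effectively none is allowed.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

print(clf.support_vectors_)            # the points that define the margin
print(clf.predict([[0, 0], [7, 7]]))
```

Only a handful of points end up in `support_vectors_`, which illustrates the drawback above: moving just one of those points can change the entire decision boundary.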
2. Support Vector Classifier
The Support Vector Classifier is an extension of the Maximal-Margin Classifier. It is less sensitive to individual data points: since it allows certain points to be misclassified, it is also known as the "Soft Margin Classifier". It introduces a budget that determines how much misclassification is tolerated.
As shown in the following diagram, some points are allowed to be misclassified. In this scenario, the points inside the margin and on the margin are referred to as "support vectors", whereas in the Maximal-Margin Classifier only the points on the margin were support vectors.
The margin widens as the budget for misclassification increases, while the margin narrows as the budget decreases.
While building the model, we use a hyperparameter called "cost", denoted by "C". Cost is the inverse of the budget: as the budget increases, the cost decreases, and vice versa.
The influence of the value of C on the margin is depicted in the diagram below: a small value (for example, C=1) widens the margin, while a large value narrows it.
Small ‘C’ value —> large budget —> wide margin —> more misclassification allowed
Large ‘C’ value —> small budget —> narrow margin —> less misclassification allowed
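The effect of C on the margin can be measured directly: for a linear SVM the margin width is 2/||w||, so we can fit the same data with two values of C and compare. The synthetic overlapping clusters below are assumptions made for this sketch.

```python
import numpy as np
from sklearn.svm import SVC

# Two slightly overlapping Gaussian clusters (synthetic, for illustration).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1.2, (40, 2)),
               rng.normal(3, 1.2, (40, 2))])
y = np.array([0] * 40 + [1] * 40)

def margin_width(C):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # For a linear SVM, the margin width is 2 / ||w||.
    return 2.0 / np.linalg.norm(clf.coef_)

print(f"C=0.01 -> margin width {margin_width(0.01):.2f}")  # wider
print(f"C=100  -> margin width {margin_width(100):.2f}")   # narrower
```

Running this shows the pattern in the arrows above: the small-C fit tolerates more margin violations in exchange for a wider, more stable margin.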
Drawback:
This classifier can only perform linear classification; it is ineffective when the decision boundary is non-linear.
Note the difference!
Maximal Margin Classifier —> Hard Margin Classifier
Support Vector Classifier —> Soft Margin Classifier
However, both the Maximal-Margin Classifier and the Support Vector Classifier are restricted to linear decision boundaries.
3. Support Vector Machines
Support Vector Machines are an extension of the Soft Margin Classifier. By using kernels, they can also perform non-linear classification. As a result, this algorithm performs well on the majority of real-world problems, since real-world data is mostly non-linearly separable and requires more complex classifiers.
Kernel: A kernel transforms non-linearly separable data from a lower dimension to a higher one to facilitate linear classification, as illustrated in the figure below. We use kernel-based techniques because separation can be simpler in higher dimensions.
The kernel performs this transformation using a mathematical function, computing the higher-dimensional relationships between points without constructing the high-dimensional coordinates explicitly.
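The kernel idea can be demonstrated on a classic non-linear dataset: two concentric circles, which no straight line can separate in 2-D. The sketch below compares a linear kernel with an RBF kernel using scikit-learn; the dataset parameters are illustrative choices.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: not separable by any straight line in 2-D.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# A linear decision boundary cannot separate the two rings...
linear = SVC(kernel="linear").fit(X, y)

# ...but the RBF kernel implicitly maps the data into a higher-dimensional
# space where the rings become linearly separable.
rbf = SVC(kernel="rbf").fit(X, y)

print(f"linear accuracy: {linear.score(X, y):.2f}")  # close to chance
print(f"rbf accuracy:    {rbf.score(X, y):.2f}")     # near perfect
```

The linear model hovers near 50% accuracy while the RBF model classifies the rings almost perfectly, which is exactly the kernel trick described above in action.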