
Whether you implement a neural network yourself or use a built-in library for neural network learning, it is of paramount importance to understand the significance of the sigmoid function. The sigmoid function is key to understanding how a neural network learns complex problems. It has also served as a basis for discovering other activation functions that lead to efficient and effective solutions for supervised learning in deep learning architectures.
In this tutorial, you will discover the sigmoid function and its role in learning from examples in neural networks.
After completing this tutorial, you will know:
- The sigmoid function
- Linear vs. nonlinear separability
- Why a neural network can make complex decision boundaries if a sigmoid unit is used
Let’s get started.
Tutorial Overview
This tutorial is divided into three parts; they are:
- The sigmoid function and its properties
- Linear vs. nonlinearly separable problems
- Using a sigmoid as an activation function in neural networks
Sigmoid Function
The sigmoid function is a special form of the logistic function and is usually denoted by σ(x) or sig(x). It is given by:
σ(x) = 1/(1 + exp(-x))
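As a quick illustration, here is a minimal Python/NumPy sketch of the function; NumPy is assumed here purely for vectorized evaluation, as the article itself does not prescribe an implementation:

```python
import numpy as np

def sigmoid(x):
    """The sigmoid (logistic) function: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))                         # 0.5
print(sigmoid(np.array([-2.0, 0.0, 2.0])))  # [0.119... 0.5 0.880...]
```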
Properties and Identities of the Sigmoid Function
The graph of the sigmoid function is an S-shaped curve, as shown by the green line in the graph below. The figure also shows the graph of the derivative in pink. The expression for the derivative, σ'(x) = σ(x)(1 - σ(x)), along with some important properties, is shown on the right.
A few other properties include:
- Domain: (-∞, +∞)
- Range: (0, +1)
- σ(0) = 0.5
- The function is monotonically increasing.
- The function is continuous everywhere.
- The function is differentiable everywhere in its domain.
- Numerically, it is enough to compute this function's value over a small range of numbers, e.g., [-10, +10]. For values less than -10, the function's value is almost zero. For values greater than +10, the function's value is almost one.
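To make these properties concrete, the following sketch (reusing the sigmoid helper defined above) checks a few of them numerically:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))    # 0.5, as stated above
print(sigmoid(-10.0))  # ~4.5e-05: effectively zero below -10
print(sigmoid(10.0))   # ~0.99995: effectively one above +10

x = np.linspace(-5, 5, 11)
s = sigmoid(x)
print(np.all(np.diff(s) > 0))  # True: monotonically increasing
```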
The Sigmoid As A Squashing Function
The sigmoid function is also called a squashing function: its domain is the set of all real numbers, but its range is (0, 1). Hence, whether the input to the function is a very large negative number, a very large positive number, or anything in between -∞ and +∞, the output always lies between 0 and 1.
Sigmoid As An Activation Function In Neural Networks
The sigmoid function is used as an activation function in neural networks. To review what an activation function is, the figure below shows its role in one layer of a neural network: a weighted sum of inputs is passed through an activation function, and this output serves as an input to the next layer.
When the activation function for a neuron is a sigmoid function, the output of this unit is guaranteed to always be between 0 and 1. Also, as the sigmoid is a nonlinear function, the output of this unit is a nonlinear function of the weighted sum of inputs. Such a neuron that employs a sigmoid function as an activation function is termed a sigmoid unit.
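As a sketch of a single sigmoid unit, the following snippet computes a weighted sum of inputs and squashes it; the inputs, weights, and bias are made-up values for illustration only:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Made-up inputs and parameters for a single sigmoid unit.
inputs = np.array([0.5, -1.2, 3.0])
weights = np.array([0.4, 0.7, -0.2])
bias = 0.1

z = np.dot(weights, inputs) + bias  # weighted sum of inputs
output = sigmoid(z)                 # guaranteed to lie in (0, 1)
print(output)                       # this value feeds the next layer
```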
Linear Vs. Non-Linear Separability
Suppose we have a typical classification problem, where we have a set of points in space and each point is assigned a class label. If a straight line (or a hyperplane in an n-dimensional space) can divide the two classes, then we have a linearly separable problem. On the other hand, if a straight line is not enough to divide the two classes, then we have a nonlinearly separable problem. The figure below shows data in two-dimensional space, with each point assigned a red or blue class label. The left figure shows a linearly separable problem, where a linear boundary is enough to distinguish between the two classes. The right figure shows a nonlinearly separable problem, where a nonlinear decision boundary is required.
For a three-dimensional space, a linear decision boundary can be described via the equation of a plane. For an n-dimensional space, the linear decision boundary is described by the equation of a hyperplane.
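To make the distinction concrete, here is one hypothetical way to generate the two kinds of data in Python; the blob locations and circle radius are arbitrary choices for illustration, not values from the article:

```python
import numpy as np

rng = np.random.default_rng(1)

# Linearly separable: two Gaussian blobs that a straight line can divide.
class_a = rng.normal(loc=(2.0, 2.0), scale=0.5, size=(50, 2))
class_b = rng.normal(loc=(-2.0, -2.0), scale=0.5, size=(50, 2))
print(class_a.shape, class_b.shape)

# Nonlinearly separable: one class inside the unit circle, one outside;
# no single straight line can separate the two labels.
points = rng.uniform(-2.0, 2.0, size=(100, 2))
labels = (points[:, 0] ** 2 + points[:, 1] ** 2 < 1.0).astype(int)
print(labels.sum(), "points fall inside the circle")
```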
Why Is The Sigmoid Function Important In Neural Networks?
If we use a linear activation function in a neural network, then the model can only learn linearly separable problems. However, with the addition of just one hidden layer and a sigmoid activation function in the hidden layer, the neural network can easily learn a nonlinearly separable problem. Because a nonlinear function produces nonlinear decision boundaries, the sigmoid function can be used in neural networks for learning complex decision functions.
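As a sketch of this idea, the following self-contained NumPy example trains a network with one hidden layer of sigmoid units on XOR, a classic nonlinearly separable problem. The layer sizes, learning rate, and iteration count are illustrative choices, not values from the article:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# XOR: the classic nonlinearly separable problem.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(42)
W1 = rng.normal(size=(2, 4))  # input -> hidden layer of 4 sigmoid units
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1))  # hidden -> output sigmoid unit
b2 = np.zeros(1)
lr = 1.0                      # illustrative learning rate

for _ in range(10000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass, using sigma'(z) = sigma(z) * (1 - sigma(z)).
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Gradient-descent updates.
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

# Outputs approach [0, 1, 1, 0]; a different seed or more
# iterations may be needed if training stalls in a flat region.
print(out.round(3))
```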
Classically, a nonlinear function used as an activation function in a neural network is also required to be monotonically increasing; by this criterion, functions such as sin(x) or cos(x) cannot be used as activation functions. The activation function should also be defined and continuous everywhere in the space of real numbers, and it is required to be differentiable over the entire space of real numbers.
Typically, the weights of a neural network are learned via gradient descent, with the backpropagation algorithm computing the required gradients. Deriving this algorithm requires the derivative of the activation function.
The fact that the sigmoid function is monotonic, continuous, and differentiable everywhere, coupled with the property that its derivative can be expressed in terms of itself, i.e., σ'(x) = σ(x)(1 - σ(x)), makes it easy to derive the update equations for learning the weights of a neural network when using the backpropagation algorithm.
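A small sketch of the practical payoff: during backpropagation, the derivative can be recovered from the already-computed forward-pass output, with no extra call to exp():

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, 0.0, 2.0])
s = sigmoid(x)       # already computed in the forward pass
ds = s * (1.0 - s)   # derivative from the output alone: sigma'(x) = sigma(x)(1 - sigma(x))
print(ds)            # peaks at 0.25 when x = 0
```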
Summary
In this tutorial, you discovered what the sigmoid function is. Specifically, you learned:
- The sigmoid function and its properties
- Linear vs. nonlinear decision boundaries
- Why adding a sigmoid function at the hidden layer enables a neural network to learn complex nonlinear boundaries