
Activation Functions

An activation function is the function inside a neuron that defines the output of a node given an input or set of inputs.
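
For example, a neuron applies its activation to a weighted sum of its inputs plus a bias. A minimal NumPy sketch (the helper name and values here are illustrative, not from the source):

```python
import numpy as np

def neuron_output(weights, inputs, bias, activation):
    # weighted sum of inputs plus bias, passed through the activation
    z = np.dot(weights, inputs) + bias
    return activation(z)

# illustrative values, with sigmoid as the activation
out = neuron_output(np.array([0.5, -0.3]), np.array([1.0, 2.0]), 0.1,
                    lambda z: 1.0 / (1.0 + np.exp(-z)))
```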

Activation Function Types

Linear Activation Function

It simply returns the input as output: f(x) = x. Since a stack of linear layers still computes a linear function, it adds no expressive power on its own.

Binary Step Activation Function

It is used for binary classification problems. It returns 0 if the input is less than a certain threshold and 1 otherwise.
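
A minimal NumPy sketch, assuming a threshold of 0:

```python
import numpy as np

def binary_step(x, threshold=0.0):
    # 0 below the threshold, 1 at or above it
    return np.where(x < threshold, 0, 1)
```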

Sigmoid/Logistic Activation Function

It maps the input to a value between 0 and 1. It is commonly used in the output layer for binary classification problems.

  • Non-linear
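
A minimal NumPy sketch:

```python
import numpy as np

def sigmoid(x):
    # squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))
```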

Hyperbolic Tangent (Tanh) Activation Function

It maps the input to a value between -1 and 1.

  • Non-linear
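
A minimal NumPy sketch:

```python
import numpy as np

def tanh(x):
    # squashes any real input into (-1, 1); equivalent to np.tanh(x)
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))
```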

Rectified Linear Unit (ReLU) Activation Function

It returns the input if it is positive and 0 otherwise.

  • Dying ReLU problem: a neuron whose input is always negative outputs 0 and receives zero gradient, so it stops learning. Variants that replace the hard 0 with a small negative-side slope (e.g., 0.01) fix this.
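
A minimal NumPy sketch of ReLU itself:

```python
import numpy as np

def relu(x):
    # passes positive inputs through, zeroes out the rest
    return np.maximum(0, x)
```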

Parametric ReLU (PReLU) Activation Function

It is a variant of ReLU where the slope of the negative part is learned during training.
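
A minimal NumPy sketch; in a real network alpha would be a trainable parameter updated by backpropagation:

```python
import numpy as np

def prelu(x, alpha):
    # alpha is learned during training rather than fixed
    return np.where(x >= 0, x, alpha * x)
```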

Leaky ReLU Activation Function

It is a variant of ReLU where the slope of the negative part is a small constant value (e.g., 0.01).
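
A minimal NumPy sketch:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # small fixed slope on the negative side keeps gradients flowing
    return np.where(x >= 0, x, alpha * x)
```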

Exponential Linear Unit (ELU) Activation Function

It is similar to ReLU but smooths the negative part using an exponential: f(x) = alpha * (e^x - 1) for x < 0.
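
A minimal NumPy sketch, assuming the common default alpha = 1.0:

```python
import numpy as np

def elu(x, alpha=1.0):
    # smooth exponential curve on the negative side instead of a hard zero
    return np.where(x >= 0, x, alpha * (np.exp(x) - 1))
```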

Swish Activation Function

It is a smooth, non-monotonic function defined as f(x) = x * sigmoid(x). It has been shown to outperform ReLU in some deep learning models.
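
A minimal NumPy sketch:

```python
import numpy as np

def swish(x):
    # f(x) = x * sigmoid(x)
    return x / (1.0 + np.exp(-x))
```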

Softmax Activation Function

It is used in the output layer for multi-class classification problems. It converts the raw output scores (logits) into a probability distribution over the classes.
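
A minimal NumPy sketch (max subtraction added for numerical stability):

```python
import numpy as np

def softmax(x):
    # subtracting the max keeps exp() from overflowing; output sums to 1
    e = np.exp(x - np.max(x))
    return e / e.sum()

softmax(np.array([2.0, 1.0, 0.1]))  # -> approx. [0.659, 0.242, 0.099]
```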

Choosing the Right Activation Function

  • For multi-class classification problems, use Softmax in the output layer.
  • RNNs often use Tanh.
  • For everything else, start with ReLU.
  • If ReLU is not working well, try Leaky ReLU, then PReLU or Maxout.
  • Consider Swish for very deep networks.