Skip to content

Tunning Neural Networks

Hyperparameters

Learning Rate

We start at some random point, and sample difference solutions (weights) seeking to minimize some cost function over many epochs. How far apart these samples are is the learning rate.

  • Neural Networks are trained by gradient descendent algorithms.
  • too high learning rate means we might overshoot the optimal solution.
  • too low learning rate means we might take too long to get the optimal solution.

Batch Size

The batch size is the number of training examples utilized in one iteration.

  • Small batch sizes tend to not get stuck in local minima, but take longer to converge.
  • Large batch sizes can converge on the wrong solution at random