Hello everyone, welcome to the final blog in our Deep Learning Introduction series. In this blog, we will discuss hyperparameter tuning, a question on everyone’s mind when getting started with Deep Learning.
Hyperparameter tuning
There are several hyperparameters we should take into consideration while building deep learning models, and most of them are specific to our design choices. In this blog, we will discuss the hyperparameters that are common to most deep learning models.
Weight Initialization
Weights are not exactly hyperparameters, but they form the heart of deep learning. A good initialization scheme, one that gives non-zero initial weight vectors, can help the network converge faster and reach a better minimum. One such technique is Xavier initialization: we draw our weights from a distribution with zero mean and a variance that depends on the network architecture.
The variance is computed from two values, fan-in and fan-out, which correspond to the number of incoming and outgoing connections of a given layer. Here’s the formula for the variance based on these values:

Var(W) = 2 / (fan_in + fan_out)
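As a minimal sketch of this idea (assuming the common Glorot-normal form of the variance above; the layer sizes 784 and 256 are just placeholder values), weights can be drawn with NumPy like this:

```python
import numpy as np

def xavier_init(fan_in, fan_out):
    # Zero-mean Gaussian with variance 2 / (fan_in + fan_out)
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return np.random.normal(loc=0.0, scale=std, size=(fan_in, fan_out))

W = xavier_init(784, 256)   # weights for a 784 -> 256 layer
print(W.mean(), W.var())    # roughly 0 and 2 / (784 + 256)
```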
Learning Rate
The learning rate, often noted α or sometimes η, indicates the pace at which the weights get updated. It can be fixed or changed adaptively. Adaptive learning rates, where the step size varies over the course of training, can reduce training time and lead to better convergence. One of the most popular adaptive methods is Adam, which adjusts the learning rate as the model trains.
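As a hedged sketch of how this looks in practice (assuming a Keras/TensorFlow model; the layer sizes and the 0.001 learning rate are only common defaults, not recommendations):

```python
import tensorflow as tf

# Adam adapts the effective step size per parameter during training;
# 1e-3 is a common default starting learning rate.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer=optimizer,
              loss='binary_crossentropy',
              metrics=['accuracy'])
```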
Number of Hidden Layers & Units
The number of hidden layers and the number of neurons in each hidden layer always play a huge role in the training process. Larger hidden layers can learn more complex patterns and representations from the training set. Unsupervised learning models typically use more hidden layers than supervised ones.

There is no exact formula for choosing the number of layers or units, but we have seen models perform well with more layers and units. The first hidden layer would usually have more nodes than the input layer.
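One common way to treat these as tunable hyperparameters is to parameterize the model-building function. Below is a small sketch, assuming a Keras Sequential model; the names `build_mlp`, `hidden_layers`, and `units` are our own for illustration:

```python
import tensorflow as tf

def build_mlp(input_dim, hidden_layers=2, units=128):
    # Both hidden_layers and units are hyperparameters we can tune.
    model = tf.keras.Sequential([tf.keras.layers.Input(shape=(input_dim,))])
    for _ in range(hidden_layers):
        model.add(tf.keras.layers.Dense(units, activation='relu'))
    model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
    return model

model = build_mlp(input_dim=20, hidden_layers=3, units=64)
```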
Loss Function
The loss function can also be treated as a tunable hyperparameter, chosen according to the task our model is expected to perform. For classification we can use a categorical (cross-entropy) loss, while for regression we can use squared error, and there are many loss functions within regression alone. We need to choose the loss function that best fits our data.
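For instance (a hedged Keras example; the right loss also depends on how the labels are encoded), the choice might look like this:

```python
import tensorflow as tf

# Classification: a cross-entropy loss; regression: squared error.
clf_loss = tf.keras.losses.SparseCategoricalCrossentropy()
reg_loss = tf.keras.losses.MeanSquaredError()

# model.compile(optimizer='adam', loss=clf_loss)   # classification task
# model.compile(optimizer='adam', loss=reg_loss)   # regression task
```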
Epochs
The number of epochs, or training iterations, has a direct impact on our model’s performance. Normally, we train the model for a large number of epochs and use the Early Stopping technique to halt training once the validation performance stops improving (or starts getting worse).
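A small sketch of Early Stopping with Keras (the patience of 5 epochs and the commented `model.fit` arguments are illustrative only):

```python
import tensorflow as tf

# Stop once the validation loss has not improved for 5 consecutive epochs,
# and roll back to the best weights seen so far.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=5, restore_best_weights=True)

# model.fit(x_train, y_train, validation_split=0.2,
#           epochs=200, callbacks=[early_stop])
```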
Using Grid Search
Grid search simply tries every hyperparameter setting over a specified range of values. This involves a cross-product of all intervals, so the computational expense is exponential in the number of parameters.
It can be easily parallelized, but care should be taken to ensure that if one job fails it fails gracefully; otherwise a portion of the hyperparameter space could be left unexplored.
Grid search iterates over all the possible combinations of our specified hyperparameters; once the best combination is found, we can use those parameters to build our final model.
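A minimal sketch of this loop in plain Python is shown below; the search space is hypothetical, and `train_and_evaluate` is a placeholder (here it just returns a random number) standing in for a real training run that returns a validation score:

```python
import itertools
import random

def train_and_evaluate(learning_rate, units, hidden_layers):
    # Placeholder for the real training routine: train a model with these
    # settings and return its validation score.
    return random.random()

param_grid = {
    'learning_rate': [1e-2, 1e-3, 1e-4],
    'units': [64, 128, 256],
    'hidden_layers': [1, 2, 3],
}

best_score, best_params = float('-inf'), None
keys = list(param_grid)
for values in itertools.product(*(param_grid[k] for k in keys)):
    params = dict(zip(keys, values))
    score = train_and_evaluate(**params)   # evaluate this combination
    if score > best_score:
        best_score, best_params = score, params

print('Best hyperparameters:', best_params)
```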
That’s it for this blog. We will catch up again with some more concepts related to Deep Learning and even some real-time applications. Until then, stay safe. Cheers ✌✌