Hello everyone, welcome back to our Deep Learning series. In our previous post, we discussed Data Preprocessing for DL models, or Neural Networks. In this post, we will delve into the factors to consider while training a Deep Neural Net. Training the model can be considered the heart of the Deep Learning pipeline, which involves Data Collection, Data Preparation, Data Preprocessing, Training, Cross Validation, Performance Evaluation, and Testing.
This pipeline is iterative, and deliberately so in most cases: feature extraction in Deep Learning tasks is abstract and rarely right on the first pass, and the bias-variance tradeoff usually calls for repeated rounds of tuning.
Terminology
Let’s look into some terms used while training a Neural Network.
Epoch
In this context, i.e. training a DL model, an epoch is one complete pass of the model over the whole training data, during which it learns from the data and updates its weights accordingly.
Batch
In some cases, where we have huge training data, we can't feed the whole training set into the model at once, for reasons such as memory overflow. In those situations, we send the training data in batches, preferably with a smaller batch size.
Iteration
Iterations, in deep learning terminology, refer to the number of weight-update steps taken to complete one epoch over the train set. For example, if we have 1,000 images with 250 images per batch, our model takes 4 iterations to complete one epoch.
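Here is a minimal sketch of how epochs, batches, and iterations relate, using the hypothetical numbers from the example above (1,000 samples, batch size 250):

```python
import numpy as np

# Hypothetical setup matching the example above.
num_samples = 1000
batch_size = 250
num_epochs = 3

iterations_per_epoch = num_samples // batch_size  # 1000 / 250 = 4
X = np.random.rand(num_samples, 10)  # dummy training data

for epoch in range(num_epochs):              # one epoch = one full pass over the data
    for i in range(iterations_per_epoch):    # one iteration = one batch processed
        batch = X[i * batch_size:(i + 1) * batch_size]
        # ... forward pass, loss, and weight update would happen here ...

print(iterations_per_epoch)  # -> 4
```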
Loss Function
This function is used to quantify the performance of our model: it tells us how good our model's predictions are compared to the ground-truth values.
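As an illustration, here is mean squared error (MSE), one common loss function for regression, sketched in NumPy; it is just one of many possible choices:

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """Mean squared error: average squared gap between predictions and ground truth."""
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([3.0, 5.0, 2.0])
y_pred = np.array([2.5, 5.5, 2.0])
print(mse_loss(y_true, y_pred))  # 0.1666... — lower means better predictions
```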
Activation Function
An activation function is associated with each node in a neural network; it can be seen as deciding whether the node "fires" and takes part in predicting the target value. It defines the output of a node for a given set of input values. There are many kinds of activation functions, which we'll discuss in upcoming posts.
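For a concrete feel, here are two widely used activations, ReLU and sigmoid, sketched in NumPy (the upcoming post will cover these and others in detail):

```python
import numpy as np

def relu(x):
    """ReLU: passes positive inputs through unchanged, zeroes out negative ones."""
    return np.maximum(0, x)

def sigmoid(x):
    """Sigmoid: squashes any real input into the range (0, 1)."""
    return 1 / (1 + np.exp(-x))

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))     # [0.  0.  3.]
print(sigmoid(z))  # [0.1192...  0.5  0.9525...]
```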
Mini Batch Gradient Descent
This term refers to how weights are updated during the training phase, based on the loss function. We perform gradient descent on mini-batches, rather than after a full pass over the training data (which is computationally expensive) or after every single training example (which introduces a lot of noise and can hurt the model's ability to generalize).
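Here is a minimal sketch of mini-batch gradient descent for linear regression in NumPy, assuming MSE loss and hypothetical hyperparameters (learning rate 0.1, batch size 250); a real training loop would look structurally the same:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))               # 1,000 samples, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=1000)

w = np.zeros(3)            # weights to learn
lr, batch_size = 0.1, 250  # hypothetical hyperparameters

for epoch in range(20):
    idx = rng.permutation(len(X))             # shuffle the data each epoch
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]
        Xb, yb = X[b], y[b]
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(b)  # gradient of MSE on this mini-batch
        w -= lr * grad                            # weight update after every mini-batch

print(w)  # close to the true weights [2.0, -1.0, 0.5]
```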
That covers some of the most commonly used terms relating to the training phase of building a DL model.
Training
As we have already seen, our model goes through the training data performing thousands of calculations to extract features and establish the relationship between input features and target labels or values.
For every training example, our model first predicts a value, which is then compared with the ground truth; the loss function computes a loss, which is propagated back to all the nodes in the network using Backpropagation. You can refer to our previous article for a more detailed explanation of Gradient Descent and Backpropagation.
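As a sketch only (not the exact code from the previous article), here is what one such training step typically looks like in a framework such as PyTorch, with a hypothetical one-layer model, loss, and optimizer for illustration:

```python
import torch
import torch.nn as nn

# Hypothetical model, loss, and optimizer for illustration only.
model = nn.Linear(10, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(32, 10)   # a mini-batch of 32 examples
y = torch.randn(32, 1)    # ground-truth targets

prediction = model(x)            # 1. model predicts a value
loss = loss_fn(prediction, y)    # 2. loss compares prediction to ground truth
optimizer.zero_grad()
loss.backward()                  # 3. loss is propagated back through the network
optimizer.step()                 # 4. gradient descent updates the weights
```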
So, how does our model initially predict a value without knowing anything about our problem? Therein lies the essence of weight initialization and hyperparameter tuning.
We'll look into both in our upcoming article. With that, we will have an outline of building Neural Net based predictive models. Then we will work through a simple but comprehensive problem to put all of this into practice. So, stay tuned. Until then, stay safe. Cheers!