Data Preprocessing for Neural Networks

Hello everyone👋. In our previous blog, we discussed Face Recognition using One-Shot Learning. In this post, we will cover some of the most widely used preprocessing methods in Deep Learning. So let's jump in.

Data Augmentation

Data Augmentation is one of the most widely used preprocessing strategies in Computer Vision. Deep Learning models need a lot of data to train properly without overfitting or underfitting the training set. In the current data-driven era, there is a lot of raw data, but only a small fraction of it is actually useful.

To make the most of our useful training data, we augment the existing samples to generate even more data. In Computer Vision, there are many kinds of transformations we can apply to the existing images in our training set.

Some Image Augmentation Techniques

  • Horizontal Flip.
  • Vertical Flip.
  • Random Cropping.
  • Brightness Range.
  • Width and Height shift.
  • Rescaling the Image.
  • Zoom Range.
  • Feature Wise Normalization.
  • Input Normalization.
Example of Data Augmentation.

You need not worry about writing code for these preprocessing techniques from scratch. Most modern Deep Learning frameworks support a huge number of them out of the box. One such utility for Data Augmentation in Keras is ImageDataGenerator.
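Here is a minimal sketch of how ImageDataGenerator can apply several of the transformations listed above; the parameter values and the dummy input batch are illustrative, not a recommendation.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Configure a generator with a few of the augmentations listed above
datagen = ImageDataGenerator(
    horizontal_flip=True,       # random horizontal flips
    vertical_flip=True,         # random vertical flips
    width_shift_range=0.1,      # horizontal shift, as a fraction of width
    height_shift_range=0.1,     # vertical shift, as a fraction of height
    zoom_range=0.2,             # random zoom in [0.8, 1.2]
    rescale=1.0 / 255,          # rescale pixel values to [0, 1]
    # brightness_range=(0.8, 1.2),  # random brightness (requires Pillow)
)

# A dummy batch of four 32x32 RGB images, just to demonstrate flow()
x = np.random.randint(0, 256, size=(4, 32, 32, 3)).astype("float32")
augmented = next(datagen.flow(x, batch_size=4, shuffle=False))
print(augmented.shape)  # (4, 32, 32, 3)
```

In practice you would pass the generator's output straight to `model.fit`, so fresh randomly transformed images are produced every epoch.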

Batch Normalization

Batch Normalization is one of the most widely used normalization techniques in Deep Learning. It is used in several modern state-of-the-art architectures to reduce the model's sensitivity to the initialization of weights and biases.

Batch Normalization introduces two learnable parameters, γ and β, which scale and shift the normalized values. The technique normalizes each batch of data using the batch's own statistics, and it is applied while the model is being trained.

Formula for Batch Normalization:

x̂ = (x − μ_B) / √(σ_B² + ε)
y = γ · x̂ + β

μ_B and σ_B² are the mean and variance of the batch to which we want to apply Batch Normalization, and ε is a small constant added for numerical stability.
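The formula above can be computed directly; here is a minimal NumPy sketch, with γ = 1 and β = 0 as illustrative defaults.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch x (shape: batch_size x features), per feature."""
    mu = x.mean(axis=0)                    # batch mean, mu_B
    var = x.var(axis=0)                    # batch variance, sigma_B^2
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalized activations
    return gamma * x_hat + beta            # scale and shift

x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = batch_norm(x)
print(y.mean(axis=0))  # ~[0, 0]
print(y.std(axis=0))   # ~[1, 1]
```

After normalization each feature has (approximately) zero mean and unit variance within the batch; γ and β then let the network learn any other scale and shift it needs.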

This technique is mostly used after convolutional layers, typically before the non-linear activation. It also helps training stay stable and efficient, since subsequent layers receive inputs in a well-behaved, normalized range.
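In Keras this placement looks like the sketch below; the layer sizes and input shape here are illustrative.

```python
from tensorflow.keras import layers, models

# Conv -> BatchNorm -> Activation: a common ordering in modern architectures
model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Conv2D(16, 3, padding="same"),
    layers.BatchNormalization(),  # normalizes the conv output per channel
    layers.Activation("relu"),    # non-linearity applied after normalization
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
print(model.output_shape)  # (None, 10)
```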

Preprocessing on Numeric Data

There are several preprocessing techniques we can perform on numerical data. We will discuss some of the most widely used ones for continuous and categorical data.

Normalization and Standardization

Normalization refers to scaling values from different ranges to a common range, i.e. [0, 1], while standardization refers to transforming the data so that its mean is equal to zero and its standard deviation to one.
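Both transforms are one-liners; here is a small NumPy sketch with made-up data.

```python
import numpy as np

data = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Normalization (min-max scaling): map values into [0, 1]
normalized = (data - data.min()) / (data.max() - data.min())

# Standardization: zero mean, unit standard deviation
standardized = (data - data.mean()) / data.std()

print(normalized)           # [0.   0.25 0.5  0.75 1.  ]
print(standardized.mean())  # ~0.0
print(standardized.std())   # ~1.0
```

scikit-learn offers the same transforms as `MinMaxScaler` and `StandardScaler`, which also remember the training-set statistics so the identical transform can be applied to test data.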

One-Hot Encoding

This technique represents a categorical variable as a vector of 0s and 1s (label encoding, in contrast, maps each category to a single integer). If the number of classes is equal to 5 and the sample we are processing has label 3 (zero-indexed), then our encoded vector will be [0, 0, 0, 1, 0].
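The encoding is straightforward to sketch in NumPy (Keras also provides `to_categorical` for the same purpose):

```python
import numpy as np

def one_hot(label, num_classes):
    """Encode an integer label (0-indexed) as a one-hot vector."""
    vec = np.zeros(num_classes, dtype=int)
    vec[label] = 1  # set a 1 at the position of the class
    return vec

print(one_hot(3, 5))  # [0 0 0 1 0]
```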

For more preprocessing techniques for numerical and categorical data, head over to the scikit-learn documentation.

That’s all for this blog. In our next blog, we will discuss training a Deep Learning model and tuning several hyperparameters to help our model train better. Until then, stay safe. Cheers.🤞
