We discussed Multi-Layer Neural Networks and their implementation in Python in our previous post. In this post, we will briefly cover some of the most widely used neural network architectures and then look at Convolutional Neural Networks in detail.
Some of the most widely used Neural Nets:
- Convolutional Neural Networks.
- Recurrent Neural Networks.
- Generative Adversarial Networks.
- Advanced CNNs.
In this article, we focus on Convolutional Neural Networks and implement the first ever CNN using Keras in Python.
Convolutional Neural Networks
Convolutional Neural Networks, also called ConvNets, were first devised by the French scientist Yann LeCun at the end of the 1980s. These networks delivered astonishing results and better performance than other network architectures. CNNs are used to detect edges and shapes in tasks such as handwriting recognition, locating particular objects in an image, and even finding multiple objects in a complex scene.
Dealing with Images
Before understanding the architecture of CNNs, let's understand image basics. Images are represented by computers as a matrix of pixel values, with each position in the matrix corresponding to a particular point in the image.
An image is represented by a computer as a three-dimensional matrix consisting of height, width and number of channels, which is three for an RGB image and just one for a black and white image.
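As a quick illustration, here is a minimal NumPy sketch (the image sizes and pixel values are made up purely for demonstration) showing how a colour image and a black and white image are stored as arrays:

import numpy as np

# A made-up 4x4 RGB image: height x width x 3 channels, pixel values 0-255
rgb_image = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)
print(rgb_image.shape)   # (4, 4, 3)

# A black and white (grayscale) image has a single channel
gray_image = np.random.randint(0, 256, size=(4, 4, 1), dtype=np.uint8)
print(gray_image.shape)  # (4, 4, 1)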


Understanding Convolutions
Convolution refers to a mathematical operation; in this article we focus on how convolutions work rather than the math behind them. A convolution operates on image chunks across every channel at the same time. An image chunk is simply a moving window over the image. The convolution window can be a square or a rectangle; it starts at the upper left of the image and moves from left to right and top to bottom. One complete pass of the window over the image is often described as applying a filter.
Each time, the window shifts by a certain number of pixels to process the next chunk; the size of this shift is called the stride.
Every time the window moves to a new position, a filtering step occurs: the pixel values inside the window are multiplied by the values of the kernel, a small matrix used for blurring, smoothing or sharpening the edges in the image. There are different kinds of kernels available based on your need. You can find them here.
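To make this filtering step concrete, here is a minimal NumPy sketch (the 3×3 chunk and the sharpening kernel below are illustrative values, not taken from any real image) that applies a kernel at a single window position by element-wise multiplication followed by a sum:

import numpy as np

# One 3x3 chunk (window) taken from a grayscale image
chunk = np.array([[10, 10, 10],
                  [10, 50, 10],
                  [10, 10, 10]], dtype=np.float32)

# A common 3x3 sharpening kernel (illustrative values)
kernel = np.array([[ 0, -1,  0],
                   [-1,  5, -1],
                   [ 0, -1,  0]], dtype=np.float32)

# Element-wise multiply and sum gives one pixel of the output image
output_pixel = np.sum(chunk * kernel)
print(output_pixel)  # 210.0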
Once the convolution is done, you will have a new image with the following characteristics:
- If you use a stride of 1, the window visits every position and the new image stays close to the size of the input image; larger strides shrink it further.
- The resulting image may still get smaller depending on the kernel size: because the kernel has to fit entirely inside the image, it trims the output by its size minus one. For example, if the image size is 7×7 and the kernel size is 3×3, the kernel eats up 2 pixels from both height and width, resulting in a 5×5 output image (see the sketch after this list).
- We have the option to maintain the size of the image by padding its borders with zeros. This is called same padding. If you let the kernel shrink the output instead, it is called valid padding.
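For the 7×7 image and 3×3 kernel mentioned above, the output size can be worked out directly. This is a small arithmetic sketch assuming valid padding and a square input:

input_size = 7
kernel_size = 3
stride = 1
# With valid padding the kernel must fit inside the image,
# so the output loses (kernel_size - 1) pixels per dimension
valid_output = (input_size - kernel_size) // stride + 1
print(valid_output)  # 5, i.e. a 5x5 output image
# With same padding the borders are filled with zeros,
# so the output keeps the input size (7x7) when the stride is 1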
When using a convolutional neural network, we just need to set the following:
- The number of filters
- The kernel size
- The strides
- The padding
After these parameters are set, the optimization process takes care of the kernel matrix values. Each kernel element is therefore a weight that is adjusted during training using Back Propagation. You can read more about Back Propagation and Neural Networks in this article. By changing the values in the kernel, the network finds the best way to process images for its regression or classification purpose.
So a kernel can be described as a neural layer whose weights are shared across different input positions. We will use Keras to implement our CNN; Keras provides a layer called Conv2D to which we can feed the image directly by setting input_shape as a tuple. We can also pass filters, kernel_size, strides and padding as parameters to the Conv2D layer, and each layer can be named using the name parameter.
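For example, a single Conv2D layer for 28×28 grayscale images could be declared like this. This is only a hedged sketch: the number of filters, kernel size and layer name here are arbitrary choices for illustration, not part of any particular network:

from keras.models import Sequential
from keras.layers import Conv2D

model = Sequential()
# 32 filters, a 3x3 kernel, stride of 1, zero padding and a custom name
model.add(Conv2D(filters=32,
                 kernel_size=(3, 3),
                 strides=(1, 1),
                 padding="same",
                 input_shape=(28, 28, 1),
                 name="first_conv"))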
What is Pooling?
CNNs transform the image by applying the process described above, but as this process continues the network becomes too complex to handle. To keep the complexity manageable we need to speed up the filtering and reduce the number of operations.
Pooling layers are used to simplify the output of a convolutional layer, reducing the number of operations and leaving fewer values for the next convolution to process. Four commonly used pooling layers are:
- Max Pooling
- Average Pooling
- Global Max Pooling
- Global Average Pooling
For example, if we have a 4×4 matrix and use a window of 2×2 pixels, performing the pooling operation on the matrix reduces the output size to 2×2.
In max pooling, the maximum value inside the window is retained. Likewise, in average pooling the average of all the pixels inside the window is kept in the output matrix. Several pooling layers are available in Keras; you can look into this documentation.
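Here is a minimal sketch of the 4×4 example above using Keras' MaxPooling2D and AveragePooling2D with a 2×2 window (the input values are made up):

import numpy as np
from keras.models import Sequential
from keras.layers import MaxPooling2D, AveragePooling2D

# A made-up 4x4 single-channel image, reshaped to (batch, height, width, channels)
x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 1],
              [3, 4, 1, 5]], dtype=np.float32).reshape(1, 4, 4, 1)

max_pool = Sequential([MaxPooling2D(pool_size=(2, 2), input_shape=(4, 4, 1))])
avg_pool = Sequential([AveragePooling2D(pool_size=(2, 2), input_shape=(4, 4, 1))])

print(max_pool.predict(x).reshape(2, 2))  # each 2x2 block replaced by its maximum
print(avg_pool.predict(x).reshape(2, 2))  # each 2x2 block replaced by its average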
LeNet – The First CNN
LeNet, the first convolutional neural network, was designed by LeCun and implemented by AT&T in the late 1990s to read ZIP codes for the United States Postal Service; it was also used to automatically read the amount on bank checks. According to LeCun the system worked well, but it went largely unnoticed because few people were interested in AI at that time.
LeNet is used to decipher both printed and handwritten digits.
LeNet consists of two sequences of convolutional and average pooling layers that perform the image processing. The output of the last layer in the sequence is flattened, and a softmax classifier completes the network, giving the output as class probabilities.
Let’s code…
You can find the whole code in this GitHub repo. Let's start with importing all the packages required to build LeNet.
import keras
import numpy as np
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Conv2D,AveragePooling2D
from keras.layers import Dense,Flatten
from keras.losses import categorical_crossentropy
After importing the necessary modules, we will load the data into our program.
(X_train,y_train),(X_test,y_test)=mnist.load_data()
The first time you run this command, the data will be downloaded from the MNIST source. The downloaded data consists of single-channel 28×28-pixel images representing handwritten digits from zero to nine. We will convert the target values into categorical (one-hot) vectors for our neural network to understand.
num_classes=len(np.unique(y_train))
print(y_train[0],end=" => ")
y_train=keras.utils.to_categorical(y_train,10)
y_test=keras.utils.to_categorical(y_test,10)
print(y_train[0])
This would output the following, representing five as a vector.
5 => [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
Now we will prepare the training data to feed into our network. The training data contains pixel values ranging from 0 to 255. We will scale these to the range 0 to 1 to reduce the burden on the neural network of performing operations on large numbers.
X_train=X_train.astype(np.float32)/255
X_test=X_test.astype(np.float32)/255
img_rows,img_cols=X_train.shape[1:]
X_train=X_train.reshape(len(X_train),img_rows,img_cols,1)
X_test=X_test.reshape(len(X_test),img_rows,img_cols,1)
input_shape=(img_rows,img_cols,1)
As discussed in the architecture above, let's build LeNet using Keras: two blocks of convolution and average pooling, a final convolutional layer, then a flattened output passed through a fully connected layer and a softmax layer that gives us the probability of the class each image belongs to.
lenet=Sequential()
# C1: 6 filters of size 5x5 over the 28x28x1 input
lenet.add(Conv2D(6,kernel_size=(5,5),activation="tanh",input_shape=input_shape,padding="same"))
# S2: average pooling with a 2x2 window
lenet.add(AveragePooling2D(pool_size=(2,2),strides=(1,1),padding="valid"))
# C3: 16 filters of size 5x5
lenet.add(Conv2D(16,kernel_size=(5,5),strides=(1,1),activation="tanh",padding="valid"))
# S4: average pooling with a 2x2 window
lenet.add(AveragePooling2D(pool_size=(2,2),strides=(1,1),padding="valid"))
# C5: 120 filters of size 5x5
lenet.add(Conv2D(120,kernel_size=(5,5),activation="tanh"))
lenet.add(Flatten())
# F6: fully connected layer with 84 units
lenet.add(Dense(84,activation="tanh",name="FC6"))
# Output layer: 10 classes with softmax probabilities
lenet.add(Dense(10,activation="softmax",name="OUTPUT"))
The first convolution uses 6 filters with a kernel size of 5×5. The activation function for all the layers is tanh, a non-linear function that was the state-of-the-art activation at the time LeCun devised the network. It is now outdated; you can use ReLU (Rectified Linear Unit) instead, as shown below. As you can see, the number of filters increases in the later convolutions, processing the image to extract more features before classifying them with the softmax layer.
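As a small sketch of that swap (only the activation argument changes; the rest of the network stays exactly as above), the first LeNet layer rewritten with ReLU would look like this:

# The first layer with ReLU instead of the original tanh
relu_layer = Conv2D(6, kernel_size=(5, 5), activation="relu",
                    input_shape=input_shape, padding="same")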
Now that we have built the architecture, let's compile the network.
lenet.compile(loss=categorical_crossentropy,optimizer="SGD",metrics=['accuracy'])
lenet.summary()
You can see a loss parameter in the compile command. The loss function measures the error during training, which is then used for back propagation. Categorical Cross Entropy is one of the most widely used loss functions for classification problems. SGD stands for Stochastic Gradient Descent, an optimization algorithm.
The summary of the network would be:

Let's train the CNN for 50 epochs with a batch size of 64 images at a time. We will use our test set to validate the model after each epoch.
batch_size=64
epochs=50
history=lenet.fit(X_train,y_train,batch_size=batch_size,epochs=epochs,validation_data=(X_test,y_test))
It takes time to train the CNN depending on your system configuration. Alternatively, you can run this on Google Colab, where each epoch takes approximately 8 seconds. After the last epoch, our model reaches a validation accuracy of 0.989, which means it correctly classifies roughly 99 out of every 100 images. That's a good score…🙌🙌
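If you want to check the final score yourself, the fit call above returns a history object. A minimal sketch (note the key is 'val_acc' on older Keras versions and 'val_accuracy' on newer ones):

# Print the validation accuracy of the last epoch
# (use 'val_acc' instead of 'val_accuracy' on older Keras versions)
print(history.history['val_accuracy'][-1])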
Conclusion
We have discussed CNNs in this article and implemented one of the earliest CNNs, LeNet. In our next post we will discuss Recurrent Neural Networks and their implementation using Keras. Until then, cheers..✌✌