In our previous post, we discussed the implementation of the perceptron, a simple neural network model, in Python. In this post, we will start learning about multi layer neural networks and back propagation. Multi layer networks trained with back propagation are capable of expressing non-linear decision surfaces. So, what is non-linear, and what exactly is called linear?
Linear vs Non Linear Functions
Linear functions are those whose graph is a straight line, i.e. those which have a constant slope. E.g.: y = mx + c or y = c.
Non linear functions are those which don’t have a constant slope; more simply, any polynomial whose highest exponent is greater than 1 is a non linear function. E.g.: y = x^2.
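To make the difference concrete, here is a small Python sketch (the function names and numbers are my own, chosen for illustration) that estimates the slope of each kind of function numerically:

```python
def linear(x):
    return 2 * x + 1       # y = mx + c with m = 2, c = 1

def non_linear(x):
    return x ** 2          # y = x^2

def slope(f, x, h=1e-5):
    # Numerical estimate of the slope of f at x (central difference).
    return (f(x + h) - f(x - h)) / (2 * h)

# A linear function has the same slope everywhere...
print(slope(linear, 0.0), slope(linear, 3.0))          # ~2.0, ~2.0
# ...while a non linear function's slope changes with x.
print(slope(non_linear, 0.0), slope(non_linear, 3.0))  # ~0.0, ~6.0
```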

Why do we need Back Propagation in Multi Layer Neural Networks?
First, let’s see what a multi layer neural network looks like. In a multi layer neural network, there is one input layer, one output layer, and one or more hidden layers.

Every node in the nth layer is connected to every node in the (n−1)th layer (for n > 1). The input from the input layer is multiplied by the weights associated with each link and propagated forward until the output layer produces the final output. When there is an error, unlike with the perceptron, we may need to update several weight vectors across many hidden layers. This is where back propagation comes in: it updates the weight vectors in the hidden layers according to the training error, or loss, produced at the output layer.
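As a rough illustration of this forward pass, here is a minimal NumPy sketch; the layer sizes and random weights are assumptions made just for the example, and the sigmoid activation used here is introduced later in this post:

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.random(3)              # input layer: 3 features
W_hidden = rng.random((4, 3))  # weights on links from input layer to 4 hidden nodes
W_output = rng.random((2, 4))  # weights on links from hidden layer to 2 output nodes

def sigmoid(y):
    # Activation function; discussed in detail below.
    return 1.0 / (1.0 + np.exp(-y))

# The input is multiplied with the associated weights of every link
# and traversed layer by layer until the final output.
hidden_out = sigmoid(W_hidden @ x)
final_out = sigmoid(W_output @ hidden_out)
print(final_out)               # outputs of the 2 output units
```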
BACK PROPAGATION ALGORITHM
In this post, we are considering multiple output units rather than the single output unit discussed in our previous post. The formula for calculating the training error of such a network can therefore be represented as follows:
E(w) = ½ Σ(d ∈ D) Σ(k ∈ outputs) (t_kd − o_kd)²
- outputs is the set of output units in the network
- d is a data point in the training set D
- t_kd and o_kd are the target value and the output value produced by the network for the kth output unit on data point ‘d’.
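As a quick numerical check of this formula, here is a small sketch; the target and output values are made-up numbers for the example:

```python
import numpy as np

# t_kd: target values for 2 output units on 2 data points (made-up numbers)
targets = np.array([[1.0, 0.0],
                    [0.0, 1.0]])
# o_kd: the outputs the network produced for the same units and data points
outputs = np.array([[0.8, 0.3],
                    [0.4, 0.6]])

# E = 1/2 * sum over data points d and output units k of (t_kd - o_kd)^2
error = 0.5 * np.sum((targets - outputs) ** 2)
print(error)  # 0.225
```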
Now that we have the error function and the input and output units, we need to know the rule for updating the weight vectors. Before that, let’s look at one of the most common activation functions used in multi layer neural networks: the sigmoid function.
A sigmoid function is a continuously differentiable, S-shaped function. The most common choice, the logistic function, produces output in the range of 0 to 1 (not including 0 and 1); the hyperbolic tangent (tanh) is a related sigmoid-shaped function whose output lies between −1 and 1. The logistic sigmoid can be represented as:

σ(y) = 1 / (1 + e^(−y))
where y is the linear combination of the input vector and the weight vector at a given node.
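In Python the logistic sigmoid is a one-liner; here is a minimal sketch:

```python
import numpy as np

def sigmoid(y):
    # Squashes any real y into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-y))

print(sigmoid(0.0))    # 0.5, the midpoint
print(sigmoid(10.0))   # ~0.99995, approaches 1 for large y
print(sigmoid(-10.0))  # ~0.00005, approaches 0 for very negative y
```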
Now, let’s see how the weight vectors are updated in multi layer networks according to the back propagation algorithm.
Updating the weights in Back Propagation
The algorithm can be represented in a step-wise manner:
- Feed a data point into the network and calculate the output ‘o_u’ of every unit ‘u’ by propagating it forward through the layers.
- For each output unit ‘k’, the error term ‘𝛿_k’ can be calculated by the given formula:

𝛿_k = o_k (1 − o_k) (t_k − o_k)
- For each hidden unit ‘h’, the error term ‘𝛿_h’ is calculated by the given formula, which takes into account the error terms of the output units that the hidden unit feeds into:

𝛿_h = o_h (1 − o_h) Σ(k ∈ outputs) w_kh 𝛿_k
- Update each weight by the given formula:

w_ji ← w_ji + Δw_ji, where Δw_ji = η 𝛿_j x_ji

- The weight ‘w_ji’ on the connection between node ‘i’ and node ‘j’ is updated using the above formula, in which ‘η’ is the learning rate, ‘𝛿_j’ is the error term of node ‘j’, and ‘x_ji’ is the input flowing into node ‘j’ from node ‘i’. A sketch of one full update step follows this list.
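Putting the four steps together, here is a minimal sketch of one back propagation step on a single data point, for a network with one hidden layer; the layer sizes, random weights, targets, and learning rate are assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
eta = 0.5                      # learning rate η (made-up value)

x = rng.random(3)              # input vector
t = np.array([1.0, 0.0])       # target values for the 2 output units (made-up)
W_h = rng.random((4, 3))       # weights into the 4 hidden units
W_o = rng.random((2, 4))       # weights into the 2 output units

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

# Step 1: forward pass -- compute the output o of every unit.
o_h = sigmoid(W_h @ x)         # hidden unit outputs
o_k = sigmoid(W_o @ o_h)       # output unit outputs

# Step 2: error term for each output unit k:
# delta_k = o_k * (1 - o_k) * (t_k - o_k)
delta_k = o_k * (1 - o_k) * (t - o_k)

# Step 3: error term for each hidden unit h, folding in the error
# terms of the output units it feeds into:
# delta_h = o_h * (1 - o_h) * sum_k w_kh * delta_k
delta_h = o_h * (1 - o_h) * (W_o.T @ delta_k)

# Step 4: update every weight w_ji by eta * delta_j * x_ji,
# where x_ji is the input flowing into node j from node i.
W_o += eta * np.outer(delta_k, o_h)
W_h += eta * np.outer(delta_h, x)
```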
Termination Criterion for Multi layer networks
The above algorithm is run repeatedly over all data points until a termination criterion is met, which can be implemented in any of these three ways:
- training the network for a fixed number of epochs (iterations).
- setting a threshold on the error: once the error falls below the threshold, we stop training the network further.
- creating a validation sample of data: after every iteration we validate the model on this sample, and the model from the iteration with the highest validation accuracy is taken as the final model.
The first criterion might not yield the best results; the third way is the most recommended, since it tells us how accurate our model has been so far.
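A minimal sketch of that validation-based stopping rule follows; `train_one_epoch`, `accuracy`, the data splits, and `max_epochs` are hypothetical placeholders standing in for your own training code, not functions defined in this post:

```python
# Hypothetical helpers (placeholders): train_one_epoch runs one pass of
# back propagation over the training data and returns updated weights;
# accuracy scores the model on a held-out validation sample.
best_accuracy = 0.0
best_weights = None

for epoch in range(max_epochs):
    weights = train_one_epoch(weights, training_data)
    acc = accuracy(weights, validation_data)  # validate after every iteration
    if acc > best_accuracy:                   # keep the best model seen so far
        best_accuracy = acc
        best_weights = weights.copy()

# best_weights from the highest-accuracy iteration is the final model
```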
Conclusion
We have discussed a lot of math in this article regarding multi layer neural networks. In my next post, we will discuss the derivation of the back propagation algorithm along with the implementation of multi layer neural networks in Python. Until then, cheers🤞.