Hello all, it’s been a while since I posted a blog in this series, “Artificial Neural Networks”. We are back with an interesting post on the implementation of Multi Layer Networks in Python from scratch.
We discussed all the math behind Multi Layer Networks in our previous post. I recommend going through that first to get a clear understanding of this post. Here, we will jump into the Python implementation and build a simple Multi Layer Network with one hidden layer and one output layer. Before writing all the math functions required, let’s import every module we’re gonna use in this post.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
%matplotlib inline
Initialising weights
As I have already mentioned, we will build a neural network with an input layer, one hidden layer and one output layer. Every link in the network is associated with a weight. We will initialise those weights and create the architecture using the snippets of code below.
The init() function creates a weight matrix whose shape depends on the number of neurons in the two adjacent layers it connects. Initialising every weight with zeros or very small values leaves the network with too little signal to work with, while very large values saturate the activations and make learning cumbersome for the network.
def init(inp, out):
    # draw weights from a standard normal and rescale by the size of the previous layer
    return np.random.randn(inp, out) / np.sqrt(inp)
So, weight initialisation should always take into account the range of values your activation function passes through the network. A simple solution is to draw the weights from a distribution with zero mean and unit standard deviation, i.e. a standard normal distribution, which is exactly what randn() gives us. Since each neuron in a layer receives values from all the neurons in the previous layer, the weights are rescaled with respect to the number of neurons in that previous layer, which is why we divide by np.sqrt(inp).
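For instance, asking init() for the weights connecting a 2-neuron layer to a 3-neuron layer (the sizes we will end up using later in this post) gives a 2x3 matrix of small random values:

w = init(2, 3)
print(w.shape)   # (2, 3)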
Creating Architecture
Our create_arch() function takes the layer sizes as input and creates a fully connected neural network by initialising the link weights between neurons using our init() function.
def create_arch(input_layer, first_layer, output_layer, random_seed=0):
    np.random.seed(random_seed)
    # layer sizes: (input, hidden, output)
    layers = input_layer, first_layer, output_layer
    # one weight matrix for every pair of adjacent layers
    arch = list(zip(layers[:-1], layers[1:]))
    weights = [init(inp, out) for inp, out in arch]
    return weights
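As a quick sanity check, the architecture we will use later (2 input features, 3 hidden neurons, 1 output neuron) produces one weight matrix per pair of adjacent layers:

weights = create_arch(2, 3, 1)
print([w.shape for w in weights])   # [(2, 3), (3, 1)]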
Activation Function
Now that we have the architecture ready, we need to decide on the activation function used to fire a neuron. As we are building a simple network, we will go with one of the most widely used activation functions, the sigmoid. The sigmoid is well suited to probability-based outputs, since its value always lies between 0 and 1.

Below is the code which defines the activation function and its derivative.
def sigmoid(z):
    return 1/(1+np.exp(-z))

def sigmoid_derivative(s):
    return s*(1-s)
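A quick check shows how the sigmoid squashes any real input into the (0, 1) range:

print(sigmoid(np.array([-5.0, 0.0, 5.0])))   # approx [0.0067 0.5 0.9933]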
Feed Forward Network
We are all set to build the feed forward pass with everything we have so far. In the forward pass, we multiply the activations of each layer by the corresponding weight matrix, run the result through the activation function, and collect the output of every layer.
def feed_forward(X, weights):
    a = X.copy()
    out = list()
    for w in weights:
        # weighted sum followed by the sigmoid activation
        z = np.dot(a, w)
        a = sigmoid(z)
        out.append(a)
    return out
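To see what feed_forward() returns, here is a small hypothetical check: five random 2-D points pushed through a (2, 3, 1) network yield the hidden activations and the output probabilities.

demo_X = np.random.randn(5, 2)                        # five hypothetical 2-D points
hidden, output = feed_forward(demo_X, create_arch(2, 3, 1))
print(hidden.shape, output.shape)                     # (5, 3) (5, 1)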
Back Propagation
A feed forward pass alone will not help us in most cases, as we need to update the weights of our network whenever the output is in error during training. A detailed explanation of this is given in our previous post. Below is the code which implements the Back Propagation algorithm.
def backpropagation(l1, l2, weights, y):
    # error at the output layer
    l2_error = y.reshape(-1, 1) - l2
    l2_delta = l2_error * sigmoid_derivative(l2)
    # propagate the error back to the hidden layer
    l1_error = l2_delta.dot(weights[1].T)
    l1_delta = l1_error * sigmoid_derivative(l1)
    return l2_error, l1_delta, l2_delta
Here, l1 and l2 hold the activations of the hidden layer and the output layer, l1_error and l2_error are the errors associated with each of them, and l1_delta and l2_delta are the corrections to be applied to the network weights.
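A similar shape check (again with hypothetical points and dummy labels) confirms that each delta matches the layer it corrects:

w_demo = create_arch(2, 3, 1)
a1, a2 = feed_forward(np.random.randn(5, 2), w_demo)
err, d1, d2 = backpropagation(a1, a2, w_demo, np.zeros(5))
print(d1.shape, d2.shape)                             # (5, 3) (5, 1)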
Updating the weights
Now that we know the error in our network, we need to update our weights accordingly.
def update_weights(X, l1, l1_delta, l2_delta, weights, alpha=1.0):
    # gradient-style updates scaled by the learning rate alpha
    weights[1] += alpha * l1.T.dot(l2_delta)
    weights[0] += alpha * X.T.dot(l1_delta)
    return weights
Here, alpha represents the learning rate, which controls how fast or slow your network learns. The value of alpha plays a key role in building an efficient neural network. Too large a value makes your model overshoot minima, often including the global minimum, while too small a value makes your network take forever to learn. Normally the learning rate is kept a bit higher during the initial epochs (an epoch is one full pass of training over the data), and as the accuracy increases we reduce alpha to converge more smoothly towards the global minimum.
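The training loop later in this post keeps alpha fixed at 0.05, but if you want to decay it over time, a minimal sketch (the schedule and its constants are my own assumption, not part of the post’s code) could look like this:

def decayed_alpha(alpha0, epoch, decay=1e-4):
    # hypothetical schedule: alpha shrinks as the epoch counter grows
    return alpha0 / (1 + decay * epoch)

print(decayed_alpha(0.5, 0), decayed_alpha(0.5, 30000))   # 0.5 0.125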
Prediction using our network
We will predict the output of our network using the trained weights and our feed forward network.
def predict(X, weights):
    # forward pass, keeping only the output layer activations
    _, l2 = feed_forward(X, weights)
    # threshold the probabilities at 0.5 to get 0/1 class labels
    preds = np.ravel((l2 > 0.5).astype(int))
    return preds
Our feed forward pass returns the hidden layer activations along with the output probability for each data point. As we are dealing with the moons data, to predict which of the two moons a point belongs to we set a threshold of 0.5: anything above 0.5 is considered a point in moon 1 and everything else a point in moon 2. Don’t worry too much about what the moons are; you will have a better understanding when we visualize the data.
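For reference, calling predict() with freshly initialised (untrained) weights and two made-up points returns one 0/1 label per point; the labels only become meaningful after training:

demo_weights = create_arch(2, 3, 1)
demo_points = np.array([[0.5, -0.2], [1.5, 0.3]])     # two hypothetical 2-D points
print(predict(demo_points, demo_weights))             # one 0/1 label per point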
Loading the data and Visualization
In this example, we will use the make_moons function from sklearn, which creates two interleaving half-moon shaped clusters for a given number of data points.
coord, cl = make_moons(1000, noise=0.05)
X, Xt, y, yt = train_test_split(coord, cl, test_size=0.3, random_state=0)
plt.scatter(X[:, 0], X[:, 1], s=25, c=y, cmap=plt.cm.Set1)
plt.show()
make_moons will return the coordinates of each data point and the class to which it belongs. Let’s visualize this to have a clear understanding.

Why this Data ?
As you can see, the data is highly non-linear: the points belonging to the two moons are not linearly separable, i.e. they cannot be split by drawing a straight line between them. This kind of data makes a good challenge for our neural network.
Training the network
As we have all the functions ready, let’s train our network using the moons data.
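One note before running the loop below: it calls an accuracy() helper that is not defined anywhere in the snippets above. A minimal version consistent with how it is called (this particular implementation is my own sketch) would be:

def accuracy(true_label, predicted):
    # fraction of predictions that match the true labels
    correct = np.ravel(predicted).astype(int) == np.ravel(true_label).astype(int)
    return np.mean(correct)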
weights = create_arch(X.shape[1], 3, 1)
for j in range(30000 + 1):
    # forward pass, backpropagate the error, then update the weights
    l1, l2 = feed_forward(X, weights)
    l2_error, l1_delta, l2_delta = backpropagation(l1, l2, weights, y)
    weights = update_weights(X, l1, l1_delta, l2_delta, weights, alpha=0.05)
    if j % 5000 == 0:
        train_error = np.mean(np.abs(l2_error))
        print("epoch {:5} ".format(j), end='-')
        print(' error:{:0.4f} '.format(train_error), end='-')
        train_accuracy = accuracy(true_label=y, predicted=(l2 > 0.5))
        test_preds = predict(Xt, weights)
        test_accuracy = accuracy(true_label=yt, predicted=test_preds)
        print(" acc:train {:0.3f} ".format(train_accuracy), end="|")
        print(" test {:0.3f} ".format(test_accuracy))
In the first line we created the architecture: the input layer has two neurons because each data point has two coordinates, the hidden layer has three neurons (a design choice), and the output layer has a single neuron that outputs the probability of the data point belonging to one of the moons. The input layer is fed with X.
We train the network for 30000 epochs, performing a forward pass, Back Propagation and a weight update in every epoch. We print the train accuracy and test accuracy of our model every 5000 iterations. The output would be the following:

The higher the test accuracy, the better our model is. We should have a look at the train accuracy too, to check whether our model is overfitting or not. And with that, we have built a simple Multi Layer Neural Network with one hidden layer and one output layer. You can download the whole code from my github repo.
We will discuss different types of Neural Networks in our upcoming posts. Until then, cheers ✌✌.