Handwritten digit recognition using KNN

In our previous post, we discussed classification problems and the algorithms available in the sklearn module, along with an implementation of the KNN algorithm. In this post, let’s build a model to recognize handwritten digits.

Requirements :

  • sklearn

Commands :

Open your Anaconda prompt and verify that sklearn is installed using pip list. If it is not available, install it with the command pip install scikit-learn (the package is published under the name scikit-learn, while the import name is sklearn).
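The same check can also be done from Python. The following is a minimal sketch that uses only the standard library; the helper name `installed_version` is hypothetical and not part of the original post.

```python
from importlib import metadata, util

def installed_version(import_name, dist_name):
    """Return the installed version string, or None if the package is missing."""
    if util.find_spec(import_name) is None:
        return None
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata was found.
        return "unknown"

# Import name is "sklearn"; the distribution name on PyPI is "scikit-learn".
version = installed_version("sklearn", "scikit-learn")
print("scikit-learn version:", version if version else "not installed")
```

If this prints "not installed", run the install command above and try again.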

Data Collection

The dataset we use in this post is a modified version of the “Optical Recognition of Handwritten Digits Data Set” by E. Alpaydin, C. Kaynak, Department of Computer Engineering at Bogazici University, 80815 Istanbul Turkey, retrieved from the UCI Machine Learning Repository on October 3, 2010. You can get the dataset from here.

Data pre-processing

It is very important to know what the dataset looks like before we start to build a model. So, let’s take a look at our data.

Digit-0

Here, the data cannot be fed directly to the algorithm. So, we convert each 32×32 image into a vector of dimension 1×1024.

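def img2vector(filename):
    # Flatten a 32x32 text image of '0'/'1' characters into a 1024-element list
    imgVector = []
    with open(filename) as file:  # the file is closed even if an error occurs
        for i in range(32):
            lineStr = file.readline()
            for j in range(32):
                imgVector.append(int(lineStr[j]))
    return imgVector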

In the above code, the given text file is read line by line, and each character is converted into a number and stored in a vector.
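To make the flattening concrete, here is a small self-contained sketch. It writes a made-up 32×32 image to a temporary file, flattens it in the same way, and then renders the vector back as rows. The sample image, the file name, and the helper names `flatten_image` and `vector_to_rows` are illustrative assumptions, not part of the original code.

```python
import os
import tempfile

SIZE = 32  # each image is 32 rows of 32 characters ('0' or '1')

def flatten_image(path, size=SIZE):
    """Read a size x size text image and return it as a flat list of ints."""
    vector = []
    with open(path) as handle:
        for _ in range(size):
            line = handle.readline()
            vector.extend(int(ch) for ch in line[:size])
    return vector

def vector_to_rows(vector, size=SIZE):
    """Turn a flat vector back into a list of row strings for display."""
    return ["".join(str(v) for v in vector[r * size:(r + 1) * size])
            for r in range(size)]

# Made-up sample: a vertical bar in columns 14-17, loosely resembling a "1".
sample_rows = ["0" * 14 + "1" * 4 + "0" * 14 for _ in range(SIZE)]

with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "1_0.txt")  # hypothetical file name
    with open(sample_path, "w") as handle:
        handle.write("\n".join(sample_rows) + "\n")
    vec = flatten_image(sample_path)

print(len(vec))             # 1024
# The pixel at (row, col) ends up at index row * 32 + col.
print(vec[5 * SIZE + 15])   # 1
print(vector_to_rows(vec)[0])
```

Rendering a vector back into rows is also a handy way to eyeball a sample when a prediction looks wrong.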

from os import listdir

trainData = './trainingDigits'
trainingFiles = listdir(trainData)
trainDigits = []
trainLabels = []
for file in trainingFiles:
    trainDigits.append(img2vector(trainData+'/'+file))
    # The first character of the file name is the digit label
    trainLabels.append(int(file[0]))

In the above code, we’ve listed all the files in our trainingDigits folder and converted them into vectors. The vector data is saved in trainDigits and the labels in trainLabels.
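Because the label is taken from the first character of each file name, a stray file in the folder (a hidden system file, for example) would make `int(file[0])` fail. A defensive sketch follows; the `<digit>_<index>.txt` naming pattern and the helper `label_from_filename` are assumptions for illustration.

```python
def label_from_filename(name):
    """Return the digit encoded as the first character, or None for other files."""
    is_digit_file = (
        len(name) > 0
        and name[0] in "0123456789"
        and name.endswith(".txt")
    )
    return int(name[0]) if is_digit_file else None

names = ["0_12.txt", "7_3.txt", ".DS_Store", "notes.md"]
labels = [label_from_filename(n) for n in names]
print(labels)  # [0, 7, None, None]
```

Files that map to `None` can simply be skipped in the loading loop.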

testData = './testDigits'
testingFiles = listdir(testData)
testDigits = []
testLabels = []
for file in testingFiles:
    testDigits.append(img2vector(testData+'/'+file))
    testLabels.append(int(file[0]))

In the above code, we’ve listed all the files in our testDigits folder and converted them into vectors. The vector data is saved in testDigits and the labels in testLabels.
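Before training, it can be worth checking that every vector has 1024 entries and that each digit is reasonably represented. Here is a minimal sketch that uses a few made-up vectors in place of the real `trainDigits` and `trainLabels`; the helper `summarize` is hypothetical.

```python
from collections import Counter

def summarize(vectors, labels, expected_len=1024):
    """Check vector lengths and count how many samples each label has."""
    if len(vectors) != len(labels):
        raise ValueError("vectors and labels differ in length")
    bad = [i for i, v in enumerate(vectors) if len(v) != expected_len]
    if bad:
        raise ValueError(f"vectors with unexpected length at indices {bad}")
    return Counter(labels)

# Stand-in data: three all-zero "images" labelled 0, 0 and 1.
demo_vectors = [[0] * 1024 for _ in range(3)]
demo_labels = [0, 0, 1]
counts = summarize(demo_vectors, demo_labels)
print(sorted(counts.items()))  # [(0, 2), (1, 1)]
```

With the real data, the call would be `summarize(trainDigits, trainLabels)`.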

Training the model

Now that our data is ready, let’s fit it to the KNN algorithm, which is available in the sklearn module.

from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier()
knn.fit(trainDigits,trainLabels)
trainScore = knn.score(trainDigits,trainLabels)
print(trainScore)

The above code fits the KNN algorithm on our trainDigits and trainLabels data and prints its accuracy on that same training data.
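To see what the classifier does when it labels a new sample, here is a minimal pure-Python sketch of k-nearest neighbours: measure the distance from the query to every training vector, take the k closest, and return the majority label. It uses k = 5 by default and Euclidean distance, which match the defaults of `KNeighborsClassifier`. The tie-breaking rule and the toy data are assumptions, and sklearn’s actual implementation is considerably more optimized.

```python
from collections import Counter

def squared_distance(a, b):
    """Squared Euclidean distance (gives the same ordering as the true distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_predict(train_vectors, train_labels, query, k=5):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    order = sorted(range(len(train_vectors)),
                   key=lambda i: squared_distance(train_vectors[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    # Assumption for this sketch: ties go to the smallest label.
    best = max(votes.values())
    return min(label for label, count in votes.items() if count == best)

# Toy 2-D data instead of 1024-dimensional digit vectors.
points = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
labels = [0, 0, 0, 1, 1, 1]
print(knn_predict(points, labels, [0.5, 0.5], k=3))  # 0
print(knn_predict(points, labels, [5.5, 5.5], k=3))  # 1
```

The same function works unchanged on the 1024-element digit vectors, only much more slowly than the library version.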

Testing the model

In the above step, the model shows an accuracy of 0.98 on the training data. We need to check how our model behaves on unseen data.

pred = knn.predict(testDigits)
print("Predicted - Actual")
for i in range(20):
    print(pred[i],testLabels[i])

Predicted v/s actual

from sklearn.metrics import accuracy_score
print(accuracy_score(testLabels,pred))

The above code runs our model against the test data. The model predicts the unseen data with an accuracy of 0.9809725158562368, which is approximately 0.98. Let’s have a look at one of the predicted samples in our test data.
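Accuracy is simply the fraction of test samples whose predicted label equals the actual label. The sketch below computes it by hand and also counts which digits get confused with which. The short label lists are made up and stand in for `testLabels` and `pred`; the helper names are hypothetical.

```python
from collections import Counter

def accuracy(actual, predicted):
    """Fraction of positions where the prediction matches the actual label."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)

def confusions(actual, predicted):
    """Count (actual, predicted) pairs for the misclassified samples only."""
    return Counter((a, p) for a, p in zip(actual, predicted) if a != p)

# Made-up labels: 10 samples, 3 of them misclassified.
demo_actual = [0, 1, 1, 8, 8, 9, 9, 9, 3, 3]
demo_pred = [0, 1, 7, 8, 1, 9, 9, 9, 3, 9]
print(accuracy(demo_actual, demo_pred))  # 0.7
print(confusions(demo_actual, demo_pred))
```

Looking at the confused pairs often says more than the single accuracy number, because it shows which digits the model mixes up.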

You can clone or fork our repository on GitHub, which contains the whole code and the data. The repo also has many algorithms other than KNN that you can practice on your local machine.

Conclusion

In this post, we’ve implemented Python code for handwritten digit recognition. In our next post, we’ll be discussing decision trees. Until then, cheers ✌️.
