Decision Trees in Machine Learning

In our previous post, we implemented Python code for handwritten digit recognition. In this post, let’s understand what decision trees are and how to implement them in Python.

Decision Trees:

Decision trees are tree structures in which each internal node represents a question about an attribute, each edge represents an answer to that question, and each leaf node represents the actual output or output class.

Let’s understand this with a small example. Suppose you want to decide whether to play cricket on a particular day. How do you decide? You might consider factors such as whether it is cold or hot, how the weather looks (sunny, rainy, or cloudy), the wind speed, and so on. Let’s consider data in the following format.

Now you can decide whether to play or not. But what if a new situation doesn’t match any of the patterns listed above? That can be a problem. A decision tree can represent this data in a way that makes such decisions easy.

Decision Tree for playing cricket

How does a decision tree work?

The general algorithm for a decision tree can be described as follows (a rough code sketch appears right after the list):

  1. Select the attribute that best splits or separates the data.
  2. Ask the relevant question.
  3. Follow the answer path.
  4. Repeat these steps until you arrive at the answer.
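As a rough illustration of these four steps (this is not scikit-learn’s actual algorithm; the Node class and the best_split helper below are hypothetical placeholders), a recursive sketch might look like this:

class Node:
    def __init__(self, attribute=None, branches=None, prediction=None):
        self.attribute = attribute      # the question: "what is the value of this attribute?"
        self.branches = branches or {}  # answer -> child node
        self.prediction = prediction    # set only on leaf nodes

def build_tree(rows, labels, best_split):
    # Stop when the node is pure (every row has the same label)
    if len(set(labels)) == 1:
        return Node(prediction=labels[0])
    # 1. Select the attribute that best separates the data
    attribute, partitions = best_split(rows, labels)
    # 2-4. Ask the question, follow each answer, and repeat on the subsets
    children = {answer: build_tree(sub_rows, sub_labels, best_split)
                for answer, (sub_rows, sub_labels) in partitions.items()}
    return Node(attribute=attribute, branches=children)

def classify(node, example):
    # Follow the answer path until a leaf gives the output class
    while node.prediction is None:
        node = node.branches[example[node.attribute]]
    return node.prediction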

While constructing a decision tree, the major challenge is identifying which attribute to place at the root of each level. This process is known as “attribute selection”, and it can be done using two methods. Let’s take a look at each.

Information Gain: To keep our tree small, we select the attribute that splits the data into the purest possible subsets, i.e., the one that separates the classes most distinctly. The split with the highest information gain is used first, and the process is repeated until all child nodes are pure or the information gain is 0.
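As a rough sketch of how this works (the helper functions below are illustrative, not part of scikit-learn), entropy-based information gain for a single binary split could be computed like this:

import numpy as np

def entropy(labels):
    # Shannon entropy of a list/array of class labels
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))

def information_gain(parent_labels, left_labels, right_labels):
    # Gain = entropy of the parent minus the weighted entropy of the children
    n = len(parent_labels)
    weighted = (len(left_labels) / n) * entropy(left_labels) \
             + (len(right_labels) / n) * entropy(right_labels)
    return entropy(parent_labels) - weighted

# A split that separates the two classes perfectly has the maximum gain of 1.0
print(information_gain([0, 0, 1, 1], [0, 0], [1, 1]))  # 1.0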

Gini Index: The Gini index measures how often a randomly chosen element would be misclassified. Attributes with a lower Gini index are preferred when splitting or making a decision.
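And a minimal sketch of the Gini impurity calculation (again, just illustrative code rather than scikit-learn’s internal implementation):

import numpy as np

def gini(labels):
    # Probability that a randomly chosen element would be misclassified
    # if it were labeled according to the class distribution at this node
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return 1.0 - np.sum(probs ** 2)

print(gini([0, 0, 1, 1]))  # 0.5 -> maximally impure for two classes
print(gini([0, 0, 0, 0]))  # 0.0 -> pure node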

Implementation of Decision Trees

Let’s start coding

Now we’ve reached the most interesting part: building a banknote classifier using decision trees. We’ll use the Banknote Authentication dataset, which contains 1,372 records with 5 attributes: variance, skewness, curtosis, entropy, and class. Let’s start building the model.

Data Collection

import pandas as pd

# Load the banknote authentication dataset into a DataFrame
file = './BankNote_Authentication.csv'
data = pd.read_csv(file)
data.head()

Data pre-processing

When we take a look at the data, we can see that the attributes have different ranges, so we first separate the feature columns from the class label and then scale the features as follows.

from sklearn.preprocessing import StandardScaler
# Separate the feature columns from the class label
features = data.drop('class', axis=1)
labels = data['class']
# Standardize the features to zero mean and unit variance
scaler = StandardScaler()
features = scaler.fit_transform(features)
features

Building the model

Now our data is ready for processing, so we split it into training and testing sets.

from sklearn.model_selection import train_test_split
# Hold out part of the data (25% by default) for testing
featureTrain, featureTest, labelTrain, labelTest = train_test_split(features, labels)

Now let’s train our decision tree on the data as follows.

from sklearn import tree
# Fit a decision tree classifier on the training split
model = tree.DecisionTreeClassifier()
model.fit(featureTrain, labelTrain)

Now our model is ready for testing, so we evaluate it as follows.

from sklearn.metrics import classification_report, confusion_matrix
# Predict on the held-out test set and inspect where the model goes wrong
pred = model.predict(featureTest)
print("Confusion Matrix:\n")
print(confusion_matrix(labelTest, pred))
print("\nClassification Report:\n")
print(classification_report(labelTest, pred))

Those results are a good sign that our model is well trained. Now let’s check the overall accuracy of the model on the test data.

from sklearn.metrics import accuracy_score
# Fraction of test samples classified correctly
accuracy_score(labelTest, pred)

Whohhhh!! That’s coooollll🤩. Now, let’s take a look at how the decision tree was built on our data.
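If you want to draw the fitted tree yourself, one option is scikit-learn’s plot_tree (a minimal sketch, assuming matplotlib is installed; the feature names come from the dataset columns listed earlier, and the class names are just the raw 0/1 labels):

import matplotlib.pyplot as plt
from sklearn import tree

plt.figure(figsize=(20, 10))
tree.plot_tree(model,
               feature_names=['variance', 'skewness', 'curtosis', 'entropy'],
               class_names=['0', '1'],
               filled=True)
plt.show()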

For a clearer picture, you can view this figure and the full code on my GitHub.

Conclusion

In this post, we’ve discussed decision trees and their implementation in Python. In the next post, we’ll be discussing logistic regression. Until then, cheers✌️.
