In our previous post, we discussed Decision Trees and their implementation in Python. In this post, let's understand what logistic regression is and implement it in Python.
Logistic Regression
Logistic Regression can be understood as an extension of Linear Regression. In linear regression, we deduce a linear relationship between the predictors and the response. In logistic regression, we assume the data follows a linear function and then pass its output through the "Sigmoid Function" to model class probabilities.
Sigmoid Function
The output of a linear function is always real, i.e., continuous. But in classification problems, the output must be discrete, i.e., categorical. To convert the continuous value into a categorical one, we use the "Sigmoid Function".
The Sigmoid function squashes the continuous value into the range (0, 1), so that the positive class lies on one side of the decision boundary and the negative class on the other.
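Concretely, the sigmoid is σ(z) = 1 / (1 + e^(−z)). A minimal sketch of it in Python (the function name `sigmoid` is ours, not from sklearn):

```python
import numpy as np

def sigmoid(z):
    # Maps any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))    # 0.5 -- exactly at the decision boundary
print(sigmoid(10))   # close to 1: confidently the positive class
print(sigmoid(-10))  # close to 0: confidently the negative class
```

Large positive linear scores map near 1, large negative scores near 0, which is what lets a threshold at 0.5 separate the two classes.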

Now that we know the basic concepts of Logistic Regression, let's code it in Python using the sklearn module.
Logistic Regression in Python
We've now reached the most interesting part of the blog. Let's build a model by following the steps mentioned in the Machine Learning Pipeline.
Data Collection and Data pre-processing
The dataset used in this notebook is "Social Network Ads", which contains data related to the purchase of a particular product. The dataset can be obtained from here.
import pandas as pd
file = './Social_Network_Ads.csv'
data = pd.read_csv(file)
data.head()

Not much preprocessing is needed, as the dataset is documented to have no missing values. Since the data relates to a particular product, the factors that affect the output are likely the age and salary. So, let's extract the required features and their labels.
features = data.iloc[:, 2:4]  # Age and EstimatedSalary columns
labels = data.iloc[:, 4]      # Purchased (0 or 1)
Taking a closer look at the features, we observe that each feature is on a different scale. So, we scale the features as follows:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
scaledFeatures = sc.fit_transform(features)
pd.DataFrame(scaledFeatures,columns = features.columns).head(10)
StandardScaler standardizes each feature to zero mean and unit variance; for roughly normally distributed data, most scaled values then fall within about (-3, 3).
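We can verify that behaviour on a toy array (the numbers below are made up for illustration, not rows from the dataset):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical (age, salary) rows on very different scales
toy = np.array([[25.0, 20000.0],
                [35.0, 60000.0],
                [45.0, 100000.0]])

scaled = StandardScaler().fit_transform(toy)

print(scaled.mean(axis=0))  # ~[0, 0]: each column now has zero mean
print(scaled.std(axis=0))   # ~[1, 1]: and unit variance
```

After scaling, both columns contribute on comparable scales, which keeps the optimizer from being dominated by the salary column.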

Splitting the Data
Now let's split our data into train and test sets using train_test_split from the sklearn.model_selection module.
from sklearn.model_selection import train_test_split
Xtrain, Xtest, Ytrain, Ytest = train_test_split(scaledFeatures, labels, test_size=0.25, random_state=0)  # fixed seed for reproducibility
Building the model
Let's fit our data with a LogisticRegression model from sklearn.linear_model.
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression()
classifier.fit(Xtrain,Ytrain)
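Under the hood, the fitted classifier computes a linear score from its learned coefficients and passes it through the sigmoid; `predict_proba` exposes that probability, and `predict` thresholds it at 0.5. A self-contained sketch on toy 1-D data (not the Social Network Ads set):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy separable data: class 0 for negative x, class 1 for positive x
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0, 0, 1, 1])

clf = LogisticRegression().fit(X, y)

# Columns of predict_proba are [P(class 0), P(class 1)]
print(clf.predict_proba([[1.5]]))
print(clf.predict([[1.5]]))  # thresholds P(class 1) at 0.5
```

Inspecting `predict_proba` is often more informative than the hard labels, e.g. when you want to rank customers by purchase likelihood rather than just classify them.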

Validating the model
Now, let's test our model on the unseen data. So, we proceed as follows:
YPredict = classifier.predict(Xtest)
from sklearn.metrics import confusion_matrix,classification_report
cm = confusion_matrix(Ytest, YPredict)
print(cm)
print(classification_report(Ytest,YPredict))
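The accuracy can also be read straight off the confusion matrix: correct predictions (the diagonal) divided by the total. A sketch with illustrative counts (not the exact matrix from this run):

```python
import numpy as np

# Hypothetical confusion matrix: rows = actual class, columns = predicted class
cm = np.array([[63,  5],
               [11, 21]])

# Diagonal entries are correct predictions; the total is all test samples
accuracy = np.trace(cm) / cm.sum()  # (63 + 21) / 100
print(accuracy)  # 0.84
```

The off-diagonal entries (false positives and false negatives) are what the precision and recall columns of the classification report summarize per class.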

So, our model achieves an accuracy of nearly 0.84. You can get the complete code and dataset on my GitHub.
Conclusion
In this post, we discussed Logistic Regression and its implementation in Python. In the next post, we'll discuss S.V.M. Until then, cheers✌️.