
Supervised Machine Learning

Logistic Regression
Associate Professor Lemai Nguyen
Information Systems and Business Analytics

Expertise areas and research interests:


Artificial Intelligence and Business Analytics
Health Informatics and Digital Health
Socio-technical Analysis and Evaluation

External affiliations / professional activities:


Section Editor, Australasian Journal of Information Systems (2013 – Present)
Honorary Senior Research Affiliate, Epworth Healthcare (2015 – 2017)
Member of Association for Information Systems (AIS) and AAIS
Member of Australasian Institute of Digital Health (formerly HISA) (2009 – Present)
Senior Certified Professional, Australian Computer Society (ACS) (2017 – Present)
Track Co-Chair of Digital Healthcare Systems, Australasian Conference on Information Systems, 2019 – 2022

Email: lemai.nguyen@deakin.edu.au

UPCOMING EVENTS:
• DATA ANALYTICS IN AUSTRALIAN ORGANISATIONS
• EV DETECTION CHALLENGE

Deakin University CRICOS Provider Code: 00113B - A/Prof Lemai Nguyen Slide 2
Tell me about you…!

Kotu V., Deshpande B. Data Science: Concepts and Practice, chapters 1 and 4. Second edition. Morgan Kaufmann Publishers; 2019.

Google Colab

https://colab.research.google.com https://jupyter.org/

https://rapidminer.com

Predictive Machine Learning with Logistic Regression

• Logistic Regression – Key concepts
• Illustrative example
• Exercises in Python

Logistic Regression – Key concepts

Supervised Machine Learning

Kotu and Deshpande, 2019, chapter 1

Deakin University CRICOS Provider Code: 00113B - A/Prof Lemai Nguyen MIS716 AI for Business Slide 7
Linear Regression – revision (1)
What it is and How it works

Simple linear regression

y’= f(𝑥) = 𝒃𝟎 + 𝒃𝟏 𝒙
𝒃𝟎 is the intercept of the line

𝒃𝟏 is the slope of the line

𝑥=independent variable/predictor
𝑦=dependent variable/label

Image source: http://www.sthda.com/english/articles/40-regression-analysis/167-simple-linear-regression-in-r/

Image source: https://www.miabellaai.net/regression.html
If we have more than one independent variable:

𝒚′ = 𝒃𝟎 + 𝒃𝟏 𝒙𝟏 + 𝒃𝟐 𝒙𝟐 + ⋯ + 𝒃𝒏 𝒙𝒏
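
The multiple regression equation above can be fitted by ordinary least squares. The following is a minimal sketch using NumPy on synthetic data; the data, the true coefficients (2, 3, -1) and the variable names are assumptions made for illustration and do not come from the slides.

```python
import numpy as np

# Synthetic data generated from y = 2 + 3*x1 - 1*x2 with no noise,
# so least squares should recover b0 = 2, b1 = 3, b2 = -1.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 2))
y = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1]

# Prepend a column of ones so the intercept b0 is estimated with the slopes.
X_design = np.column_stack([np.ones(len(X)), X])

# Solve the least-squares problem min ||X_design @ b - y||^2.
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
b0, b1, b2 = coef
print("b0, b1, b2:", np.round(coef, 3))
```

With noisy real data the recovered coefficients would only approximate the underlying relationship.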

Linear Regression – Revision (2)
Model evaluation metrics and cost functions

y_i = actual value, y'_i = estimated value

𝒚′ = 𝒃𝟎 + 𝒃𝟏 𝒙

Sales = $10,000 + $31,565 (Ad_spend)

Mean absolute error (MAE)

\[ MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - y'_i| \]

Root mean square error/deviation (RMSE/RMSD)

\[ RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - y'_i)^2} \]

Image source: http://www.sthda.com/english/articles/40-regression-analysis/167-simple-linear-regression-in-r/
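
As a small worked example of the two metrics, the sketch below computes MAE and RMSE in plain Python. The four actual and estimated values are made up for illustration.

```python
import math

def mae(actual, estimated):
    """Mean absolute error: average of |actual - estimated|."""
    return sum(abs(a - e) for a, e in zip(actual, estimated)) / len(actual)

def rmse(actual, estimated):
    """Root mean square error: square root of the average squared error."""
    squared = sum((a - e) ** 2 for a, e in zip(actual, estimated))
    return math.sqrt(squared / len(actual))

actual = [3.0, 5.0, 7.0, 9.0]
estimated = [2.0, 5.0, 8.0, 12.0]

# Absolute errors are 1, 0, 1, 3, so MAE = 5 / 4 = 1.25.
print("MAE:", mae(actual, estimated))
# Squared errors are 1, 0, 1, 9, so RMSE = sqrt(11 / 4).
print("RMSE:", rmse(actual, estimated))
```

RMSE is larger than MAE here because squaring gives the single large error (3) more weight.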

Observations

𝒚′ = 𝒃𝟎 + 𝒃𝟏 𝒙𝟏 + 𝒃𝟐 𝒙𝟐 + ⋯ + 𝒃𝒏 𝒙𝒏

• Continuous predictors and target


• No missing data
• No outliers
• No multi-collinearity
• Normal distribution and constant variance
of residuals
• Not good for non-linear relationships,
complex relationships

Image source: https://www.miabellaai.net/regression.html

For example
Kotu and Deshpande, 2019

Continuous target data

Discrete target data

Fitting Linear Regression to a binary target
Kotu and Deshpande, 2019

• Target is categorical
• Predictors can be continuous or
categorical

One way is to convert the categorical target into y ∈ {0,1}
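
To see why a straight line is a poor fit for a 0/1 target, the sketch below fits a simple linear regression to a made-up binary dataset (an assumption for illustration) and inspects the range of the estimates.

```python
import numpy as np

# Hypothetical predictor and a binary target coded as 0/1.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)

# Fit y' = b0 + b1 * x by least squares (polyfit returns slope first).
b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x

print("smallest estimate:", y_hat.min())
print("largest estimate:", y_hat.max())
```

The estimates fall below 0 and above 1 at the ends of the range, so they cannot be read as probabilities. This motivates passing the linear output through a function bounded between 0 and 1.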

The Logistic function - logit

A common Sigmoid function: S(x) = 1 / (1 + e^(-x))

x is continuous from -∞ to +∞

As x → -∞, Sigmoid(x) → 0

As x → +∞, Sigmoid(x) → 1

• Aha! S(x) can be the function to convert a continuous value into a value between 0 and 1

https://en.wikipedia.org/wiki/Sigmoid_function
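
A minimal sketch of the sigmoid in plain Python follows. The branch on the sign of x is an implementation choice made here to avoid overflow for large negative inputs; it is not something the slides prescribe.

```python
import math

def sigmoid(x):
    """Logistic function S(x) = 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x use the equivalent form e^x / (1 + e^x).
    z = math.exp(x)
    return z / (1.0 + z)

# The output approaches 0 for very negative x and 1 for very positive x.
for x in (-10, -2, 0, 2, 10):
    print(x, round(sigmoid(x), 5))
```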

Logistic Regression
How it works
Kotu and Deshpande, 2019

p is the probability of the event y happening

then (1-p) is the probability of the event not happening

p/(1-p) -> the odds of the event happening

The log of the odds, log(p/(1-p)), is called the logit.

Given predictors X, the logit is modelled as a linear regression: logit = b0 + b1x1 + b2x2 + ... + bnxn

The logit is continuous from -∞ to +∞

The probability of y can be computed from the logit using the logistic function (Sigmoid):

p = 1 / (1 + e^(-logit))
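
The relationship between probability, odds and logit can be checked numerically. The sketch below uses p = 0.8 as an arbitrary example value; the function names are chosen here for illustration.

```python
import math

def odds(p):
    """Odds of the event: p / (1 - p)."""
    return p / (1.0 - p)

def logit(p):
    """Log of the odds."""
    return math.log(p / (1.0 - p))

def inverse_logit(z):
    """Logistic (sigmoid) function: maps a logit back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

p = 0.8
print("odds:", odds(p))            # about 4, i.e. 4 to 1 in favour of the event
print("logit:", logit(p))          # log of about 4
print("back to p:", inverse_logit(logit(p)))
```

A probability of 0.5 corresponds to odds of 1 and a logit of 0, which is why 0.5 is the natural default decision threshold.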

Training the logistic model
Model training and loss, cost functions

Given predictors X (x1, x2, ..., xn),

Find the logit, a linear regression on the predictors X: logit = b0 + b1x1 + ... + bnxn

Compute the probability of y using the Sigmoid function: p = 1 / (1 + e^(-logit))

Training involves a search for the coefficients bi that maximise the likelihood of the estimations for each datapoint, using a simplified likelihood function, where:
• y – original target data (training dataset)
• p – estimated probability

• The cost function is the sum of all likelihood values.
• Gradient descent can be utilised to search for the coefficients that maximise the likelihood of correct estimations.
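
The training search can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the method used by scikit-learn or RapidMiner: it assumes the standard Bernoulli log-likelihood y·log(p) + (1-y)·log(1-p) as the per-datapoint likelihood, uses plain gradient ascent on its sum (equivalent to gradient descent on the negative log-likelihood), and runs on synthetic data with a hypothetical learning rate and iteration count.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(X1, y, b):
    """Sum over datapoints of y*log(p) + (1-y)*log(1-p)."""
    p = sigmoid(X1 @ b)
    eps = 1e-12  # guards against log(0)
    return np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def fit_logistic(X, y, learning_rate=0.1, n_iter=2000):
    """Gradient ascent on the log-likelihood; returns coefficients and design matrix."""
    X1 = np.column_stack([np.ones(len(X)), X])  # column of ones for b0
    b = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X1 @ b)
        gradient = X1.T @ (y - p) / len(y)  # averaged gradient of the log-likelihood
        b += learning_rate * gradient       # step uphill
    return b, X1

# Synthetic data: one predictor, true logit = -0.5 + 2.0 * x.
rng = np.random.default_rng(42)
x = rng.normal(size=(200, 1))
y = (rng.uniform(size=200) < sigmoid(-0.5 + 2.0 * x[:, 0])).astype(float)

b, X1 = fit_logistic(x, y)
print("coefficients:", b)
print("log-likelihood at b = 0:", log_likelihood(X1, y, np.zeros(2)))
print("log-likelihood after training:", log_likelihood(X1, y, b))
```

The log-likelihood after training should be higher (closer to zero) than at the all-zero starting point, which is what "maximising the likelihood" means in practice.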

Are you keeping pace?

Predictive Machine Learning with Logistic Regression

Illustrative example

Business problem framing – business needs and application
Business needs
• Long delay in returning pathology results
• Novice pathologists need training

Data
• Past biopsy data and results

Analytics (ML)
• Predictive analytics to classify diagnosis
• To predict cancer diagnosis

Application
• Assist pathologists in interpreting test data -> reduced time and improved accuracy
• Training novice pathologists

ML Problem Framing: Classification
Predict if a datapoint belongs to one of the predefined classes, based on learning from a labelled
dataset

§ Business Context: Pathology Lab


§ Business Problem: To make cancer diagnoses in less time, with same or higher accuracy
§ Business Data: Historical datasets of biopsy data and results (diagnoses)
§ Machine Learning Problem: Classification

Data Preparation & Exploration → Model Training → Model Evaluation



Dataset

Data:
V1, V2, V7-V9: biological variables
Diagnosis: healthy or cancerous

Source: adapted from a dataset provided by Dr Mark Griffin, Industry Fellow, University of Queensland

Sample size: 699
Number of columns: 7

Loading and Exploring the Dataset

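import pandas as pd

# load dataset (Google Drive path as mounted in Colab)
records = pd.read_csv("/content/drive/MyDrive/VNU2022/biopsy_ln.csv")

# column types and non-null counts
records.info()

# summary statistics of the numeric columns
records.describe()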

Exploration

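import seaborn as sns

# bar chart of the number of records in each class
# (recent seaborn versions require x= to be passed as a keyword)
sns.countplot(x='class', data=records, hue='class')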

Data preparation

#convert categorical data to numerical
def coding_diagnosis(x):
    if x == 'cancerous': return 1
    if x == 'healthy': return 0

records['Diagnosis'] = records['class'].apply(coding_diagnosis)

records.head(10)

Exploration

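import matplotlib.pyplot as plt

# plot a fitted logistic curve of Diagnosis against each of columns 1 to 4
# (logistic=True requires the statsmodels package)
for i in records.iloc[:, 1:5]:
    sns.regplot(x=records[i], y=records['Diagnosis'], logistic=True, ci=None)
    plt.title(i)
    plt.show()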

Data Preparation

#Selecting predictors

features =['V1', 'V2', 'V7', 'V8', 'V9']


X=records[features] #Input data
y=records['Diagnosis'] # Target variable

print(X.head())
print(y.head())

Data Splitting

from sklearn.model_selection import train_test_split # Import train_test_split function

# Split dataset into training set and test set


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1) # 80% training and 20% testing

#inspect the split datasets


print(X_train.head())
print(y_train.head())

print('Training dataset size:',X_train.shape[0])


print('Test dataset size:',X_test.shape[0])

Model Training

from sklearn.linear_model import LogisticRegression

#Create an initial Logistic Regression model


logreg = LogisticRegression(max_iter=100)

# Train Logistic Regression Classifier with the training dataset


logreg = logreg.fit(X_train, y_train)

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

Model Testing

#Make predictions for the test dataset


y_pred = logreg.predict(X_test)

#inspection
inspection=pd.DataFrame({'Actual':y_test, 'Predicted':y_pred})
inspection.head(20)

Model Testing

import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.metrics import precision_recall_curve
# plot_precision_recall_curve and plot_confusion_matrix were removed in
# scikit-learn 1.2; the Display classes replace them
from sklearn.metrics import PrecisionRecallDisplay
from sklearn.metrics import ConfusionMatrixDisplay

#Calculate metrics: Accuracy, Precision, Recall, F1
print("Accuracy: ", metrics.accuracy_score(y_test,y_pred))
print("Precision: ", metrics.precision_score(y_test,y_pred))
print("Recall: ", metrics.recall_score(y_test,y_pred))
print("F1: ", metrics.f1_score(y_test,y_pred))
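
To make the four metrics concrete, the sketch below derives them by hand from the counts of true/false positives and negatives. The eight labels and predictions are made up for illustration, with 1 standing for the positive class (cancerous).

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for a binary target coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_hat = [1, 0, 0, 1, 0, 1, 1, 0]
result = classification_metrics(y_true, y_hat)
print(result)
```

Here there are 3 true positives, 3 true negatives, 1 false positive and 1 false negative, so all four metrics come out at 0.75. In a diagnosis setting, recall is often watched most closely because a false negative is a missed cancer.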

Model Testing

#print confusion matrix and evaluation report


from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

Plot ROC curve and Confusion Matrix

#Plot ROC (Receiver operating characteristic) curve and confusion matrix


from sklearn.metrics import RocCurveDisplay
from sklearn.metrics import ConfusionMatrixDisplay

RocCurveDisplay.from_estimator(logreg,X_test, y_test)
ConfusionMatrixDisplay.from_estimator(logreg, X_test, y_test)
plt.show()

Cross Validation

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegressionCV.html

from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score

# 5-fold cross-validation of the model on the full dataset
kfold = KFold(n_splits=5, random_state=2022, shuffle=True)
evaluation = cross_val_score(logreg, X, y, cv=kfold)

print("Accuracy and standard deviation: %.3f%% (%.3f%%)" % (evaluation.mean()*100.0, evaluation.std()*100.0))

Accuracy and standard deviation: 92.983% (2.721%)

from sklearn.linear_model import LogisticRegressionCV

# LogisticRegressionCV uses 10-fold cross-validation to select the regularisation strength C
logreg2 = LogisticRegressionCV(cv=10, random_state=2022).fit(X, y)
# note: score(X, y) is the accuracy on the same data used for fitting
print("Accuracy: %.3f" % logreg2.score(X,y))
Accuracy: 0.930
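
What a k-fold split does can be shown without any library. The sketch below is a simplified, unshuffled version written for illustration (the slide's KFold shuffles the rows first); it partitions 699 row indices, the sample size of the biopsy dataset, into 5 folds so that each row is used for testing exactly once.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_indices, test_indices) for each fold, taken in order."""
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds get one additional row.
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(kfold_indices(699, 5))
test_sizes = [len(test) for _, test in folds]
print("test fold sizes:", test_sizes)
print("train sizes:", [len(train) for train, _ in folds])
```

The model is trained and scored once per fold, and the mean and standard deviation of the fold scores give the figures reported above.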

Recap

#visualise logistic regression S-curve for a single predictor
sns.regplot(x=X_train['V7'], y=y_train, logistic=True, ci=None)

Deakin University CRICOS Provider Code: 00113B - A/Prof Lemai Nguyen Slide 34
RapidMiner

Logit = -8.553 + 0.304 V1 + 0.194 V2 + 1.185 V7 + 0.170 V8 + 0.092 V9
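
The fitted equation can be applied to a new record by computing the logit and passing it through the sigmoid. The coefficients below are the ones shown above; the sample record (V1=5, V2=3, V7=4, V8=2, V9=1) is hypothetical and chosen only to illustrate the arithmetic.

```python
import math

# Coefficients from the logit equation above.
intercept = -8.553
coefficients = {"V1": 0.304, "V2": 0.194, "V7": 1.185, "V8": 0.170, "V9": 0.092}

# Hypothetical record, not taken from the dataset.
sample = {"V1": 5, "V2": 3, "V7": 4, "V8": 2, "V9": 1}

# Logit = intercept + sum of coefficient * value.
logit_value = intercept + sum(coefficients[name] * sample[name] for name in coefficients)

# Convert the logit to a probability with the sigmoid.
probability = 1.0 / (1.0 + math.exp(-logit_value))

print("logit:", round(logit_value, 3))
print("probability of cancerous:", round(probability, 3))
print("predicted class at threshold 0.5:", int(probability >= 0.5))
```

For this record the logit is negative, so the estimated probability is below 0.5 and the default threshold would classify it as healthy. V7 has the largest coefficient, so a one-unit change in V7 moves the logit more than a one-unit change in any other predictor.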

Tuning the model parameters

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

• Select different predictors and feature engineering
• Tuning the logistic regression model hyperparameters
• LogisticRegression(class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='warn', n_jobs=None,
penalty='l1', random_state=None, solver='warn', tol=0.0001, verbose=0,
warm_start=False)
• penalty can be "l1", "l2" or "elasticnet" (both) (adjusts the loss function)
• C is the inverse of the regularisation strength: close to 1.0 (the default) gives a light penalty; close to 0.0 gives a strong penalty
• not all solvers support all penalty types
• Change threshold
y_proba = logreg.predict_proba(X_test) # class probabilities, one column per class
y_pred2 = y_proba[:,1] >= 0.3 # predict positive when P(class 1) >= 0.3 instead of the default 0.5
• Retrain and retest the model
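
The effect of changing the threshold can be sketched as follows. The example uses a synthetic dataset from make_classification as a stand-in for the biopsy data (an assumption for illustration), and the variable names are chosen here rather than taken from the slides.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (illustration only).
X, y = make_classification(n_samples=500, n_features=5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

# C is the inverse regularisation strength; 1.0 is the library default.
model = LogisticRegression(C=1.0, max_iter=1000).fit(X_train, y_train)

# Probability of the positive class (column 1) for each test record.
proba_positive = model.predict_proba(X_test)[:, 1]

results = {}
for threshold in (0.5, 0.3):
    y_pred = (proba_positive >= threshold).astype(int)
    results[threshold] = {
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "predicted_positives": int(y_pred.sum()),
    }
    print(threshold, results[threshold])
```

Lowering the threshold can only add positive predictions, so recall never decreases, usually at the cost of precision. For a screening task where missed cancers are costly, that trade may be acceptable.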

Assumptions

• The target should be categorical


• The datapoints are independent
• No extreme outliers
• No severe collinearity among the predictors
• There exists a linear relationship between each predictor and the logit of the
target, i.e. log(p / (1-p))
• Sample size is large enough
https://www.statology.org/assumptions-of-logistic-regression/
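
One of these assumptions, no severe collinearity, can be screened with a pairwise correlation matrix. The sketch below uses synthetic predictors and a hypothetical cut-off of 0.9; both are assumptions for illustration. Pairwise correlation only catches collinearity between two predictors, so it is a first check rather than a complete test.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
x1 = rng.normal(size=n)
x2 = 2.0 * x1 + rng.normal(scale=0.05, size=n)  # nearly a rescaled copy of x1
x3 = rng.normal(size=n)                         # generated independently
predictors = np.column_stack([x1, x2, x3])

# Pearson correlation between every pair of columns.
corr = np.corrcoef(predictors, rowvar=False)

threshold = 0.9  # hypothetical cut-off for "severe" collinearity
flagged = [
    (i, j)
    for i in range(corr.shape[0])
    for j in range(i + 1, corr.shape[1])
    if abs(corr[i, j]) > threshold
]
print(np.round(corr, 3))
print("highly correlated pairs (column indices):", flagged)
```

A flagged pair suggests dropping or combining one of the two predictors before fitting the model.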

Linear Regression and Logistic Regression

Linear Regression
• Supervised ML
• Linear Regression equations
• Estimation: target is continuous
• Best-fit line
• Loss function: Prediction Error
• Cost function: Mean squared error
• Assumes linear relationships between predictors and target

Logistic Regression
• Supervised ML
• Linear Regression equations
• Classification: target is categorical
• S-curve (fit the regression values to the sigmoid curve)
• Loss function: maximum likelihood estimation
• Cost function: maximum likelihood estimation values
• Does not assume linear relationships between predictors and target (the linear relationship is with the logit of the target)

Reflections

Pros
• Explainable: Easy to interpret
• Visual representation
• No assumption of normally distributed data or of a linear relationship between predictors and target (linearity is assumed with the logit instead)
• Less effort for data preparation, no need for normalisation
• Works for both numerical and categorical predictors

Cons
• Target must be categorical, best with binary (dichotomous)
• Works best if the classes are linearly separable in the predictors
• Requires large datasets; overfitting if datasets are small
• Complex when having multi-class targets

Are you keeping pace?

Predictive Machine Learning with Logistic Regression

Exercises in Python

Google Colab

https://colab.research.google.com https://jupyter.org/

Exercise 1: Cancer Diagnosis
• Biopsy dataset
Exercise 2: Cancer Survivability Prediction
• Breast Cancer dataset
Exercise 3: Diabetes Diagnosis
• Pima Indians Diabetes dataset
Exercise 4: Titanic Survivability Prediction
• Titanic dataset
Exercise 5: Churn Prediction
• Telco Churn dataset

Additional resources

• Molnar, C. (2022) Interpretable Machine Learning - A Guide for Making Black Box Models Explainable,
https://christophm.github.io/interpretable-ml-book/logistic.html

• Brownlee, J., Multinomial Logistic Regression With Python,
https://machinelearningmastery.com/multinomial-logistic-regression-with-python/

• Huyen, C. (2022) Designing Machine Learning Systems,
https://learning.oreilly.com/library/view/designing-machine-learning/9781098107956/ch05.html
