5 - Logistic Regression - Lemai Nguyen 2022

Supervised Machine Learning
Logistic Regression
Associate Professor Lemai Nguyen
Information Systems and Business Analytics
Expertise areas and research interests:

Artificial Intelligence and Business Analytics
Health Informatics and Digital Health
Socio-technical Analysis and Evaluation
External affiliations / professional activities:

Section Editor, Australasian Journal of Information Systems (2013 – Present)
Honorary Senior Research Affiliate, Epworth Healthcare (2015-2017)
Member of Association for Information Systems (AIS) and AAIS
Member of Australasian Institute of Digital Health (formerly HISA) (2009 - Present)
Senior Certified Professional, Australian Computer Society (ACS) (2017-Present)
Track Co-Chair of Digital Healthcare Systems, Australasian Conference on
Information Systems, 2019-2022
Email: lemai.nguyen@deakin.edu.au
UPCOMING EVENTS:
• DATA ANALYTICS IN AUSTRALIAN ORGANISATIONS
• EV DETECTION CHALLENGE
Deakin University CRICOS Provider Code: 00113B - A/Prof Lemai Nguyen Slide 2
Tell me about you…!
Kotu V., Deshpande B. Data Science: Concepts and Practice, chapters 1 and 4. Second edition. Morgan Kaufmann Publishers; 2019.
Google Colab
https://colab.research.google.com https://jupyter.org/
https://rapidminer.com
Predictive Machine Learning with Logistic Regression
• Logistic Regression – Key concepts
• Illustrative example
• Exercises in Python
Logistic Regression – Key concepts
Supervised Machine Learning
Kotu and Deshpande, 2019, chapter 1
Deakin University CRICOS Provider Code: 00113B - A/Prof Lemai Nguyen MIS716 AI for Business Slide 7
Linear Regression – revision (1)
What it is and How it works
Simple linear regression
y’= f(𝑥) = 𝒃𝟎 + 𝒃𝟏 𝒙
𝒃𝟎 is the intercept of the line
𝒃𝟏 is the slope of the line
𝑥=independent variable/predictor
𝑦=dependent variable/label
Image source: http://www.sthda.com/english/articles/40-regression-analysis/167-simple-linear-regression-in-r/
Image source: https://www.miabellaai.net/regression.html
If we have more than one independent variable:
𝒚′ = 𝒃𝟎 + 𝒃𝟏 𝒙𝟏 + 𝒃𝟐 𝒙𝟐 + ⋯ + 𝒃𝒏 𝒙𝒏

Linear Regression – Revision (2)
Model evaluation metrics and cost functions
y_i = actual, \hat{y}_i = estimated
𝒚′ = 𝒃𝟎 + 𝒃𝟏 𝒙
Sales = $10,000 + $31,565 (Ad_spend)
Mean absolute error (MAE)
\[ MAE = \frac{\sum_{i=1}^{n} |y_i - \hat{y}_i|}{n} \]
Root mean square error/deviation (RMSE/RMSD)
\[ RMSE = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}} \]
Image source: http://www.sthda.com/english/articles/40-regression-analysis/167-simple-linear-regression-in-r/
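As a small worked example of the two metrics, the sketch below computes MAE and RMSE by hand. The actual and estimated values are made up for illustration and do not come from the Sales example.

```python
import math

# Hypothetical actual and estimated values, for illustration only
y_actual = [3.0, 5.0, 7.0, 9.0]
y_estimated = [2.5, 5.5, 6.0, 10.0]

n = len(y_actual)
errors = [a - e for a, e in zip(y_actual, y_estimated)]

# MAE: average of the absolute errors
mae = sum(abs(err) for err in errors) / n

# RMSE: square root of the average squared error
rmse = math.sqrt(sum(err ** 2 for err in errors) / n)

print(f"MAE:  {mae:.3f}")
print(f"RMSE: {rmse:.3f}")
```

Because the errors are squared before averaging, RMSE penalises large errors more heavily than MAE and is never smaller than MAE on the same data.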

Observations
𝒚′ = 𝒃𝟎 + 𝒃𝟏 𝒙𝟏 + 𝒃𝟐 𝒙𝟐 + ⋯ + 𝒃𝒏 𝒙𝒏
• Continuous predictors and target
• No missing data
• No outliers
• No multi-collinearity
• Normal distribution and constant variance of residuals
• Not good for non-linear relationships, complex relationships
Image source: https://www.miabellaai.net/regression.html

For example
Kotu and Deshpande, 2019
Continuous target data
Discrete target data

Fitting Linear Regression to a binary target
• Target is categorical
• Predictors can be continuous or categorical
• One way is to convert the categorical target into y ∈ {0,1}

The Logistic function (the inverse of the logit)
A common Sigmoid function: S(x) = 1 / (1 + e^(-x))
x is continuous from −∞ to +∞
As x → −∞, Sigmoid(x) → 0
As x → +∞, Sigmoid(x) → 1
• Aha! S(x) can be the function to convert any continuous value into a value between 0 and 1
https://en.wikipedia.org/wiki/Sigmoid_function
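The behaviour of the sigmoid at the extremes can be checked numerically. This is a minimal sketch using the standard logistic form S(x) = 1 / (1 + e^(-x)); the sample x values are arbitrary.

```python
import math

def sigmoid(x):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Arbitrary sample points, from strongly negative to strongly positive
xs = [-10.0, -2.0, 0.0, 2.0, 10.0]
values = [sigmoid(x) for x in xs]

for x, s in zip(xs, values):
    print(f"x = {x:6.1f}  ->  S(x) = {s:.5f}")
```

The output approaches 0 for large negative x, equals 0.5 at x = 0, and approaches 1 for large positive x, which is why S(x) can be read as a probability.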

Logistic Regression
How it works
p is the probability of the event y happening
then (1 − p) is the probability of the event not happening
p / (1 − p) is the odds of the event happening
The log of the odds, log(p / (1 − p)), is called the logit.
Given predictors X, the logit is modelled as a linear regression: logit = b0 + b1 x1 + b2 x2 + ⋯ + bn xn
The logit is continuous from −∞ to +∞
The probability of y can be computed from the logit using the logistic function (Sigmoid):
p = 1 / (1 + e^(-logit))
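To make the chain probability -> odds -> logit -> probability concrete, here is a short round trip. The starting probability of 0.8 is an arbitrary value chosen for illustration.

```python
import math

p = 0.8                      # hypothetical probability of the event

odds = p / (1 - p)           # odds of the event happening
logit = math.log(odds)       # log of the odds (natural logarithm)

# Applying the logistic function to the logit recovers the probability
p_back = 1 / (1 + math.exp(-logit))

print(f"odds   = {odds:.3f}")
print(f"logit  = {logit:.3f}")
print(f"p_back = {p_back:.3f}")
```

A probability of 0.5 corresponds to odds of 1 and a logit of 0; probabilities above 0.5 give a positive logit and probabilities below 0.5 a negative one.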

Training the logistic model
Model training and loss, cost functions
Given predictors X (x1, x2, ..., xn),
Find the logit, a linear regression on the predictors X: logit = b0 + b1 x1 + b2 x2 + ⋯ + bn xn
Compute the probability of y using the Sigmoid function: p = 1 / (1 + e^(-logit))
Training involves a search for the coefficients bi that maximise the likelihood of the estimations for each datapoint, using a simplified likelihood function, where
• y – original target data (training dataset)
• p – estimated probability
• The cost function is the sum of the likelihood values over all datapoints.
• Gradient descent can be used to search for the coefficients that maximise the likelihood of correct estimations (equivalently, that minimise the negative of the cost function).
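The training loop can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the algorithm scikit-learn uses internally: the data is synthetic, the "simplified likelihood function" is assumed to be the log-likelihood y·log(p) + (1 − y)·log(1 − p), and the learning rate and iteration count are arbitrary choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(y, p):
    """Sum over datapoints of y*log(p) + (1-y)*log(1-p) (assumed cost function)."""
    eps = 1e-12  # guards against log(0)
    return np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

# Synthetic data with one predictor; the "true" coefficients are made up
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = (rng.random(200) < sigmoid(-0.5 + 2.0 * x)).astype(float)

X = np.column_stack([np.ones_like(x), x])  # column of 1s for the intercept b0
b = np.zeros(2)                            # start from b0 = b1 = 0
learning_rate = 0.1                        # arbitrary step size

initial_ll = log_likelihood(y, sigmoid(X @ b))
for _ in range(2000):
    p = sigmoid(X @ b)                     # estimated probabilities
    gradient = X.T @ (p - y) / len(y)      # gradient of the mean negative log-likelihood
    b -= learning_rate * gradient          # gradient descent step
final_ll = log_likelihood(y, sigmoid(X @ b))

print("Estimated coefficients (b0, b1):", b)
print("Log-likelihood before and after training:", initial_ll, final_ll)
```

Descending the negative log-likelihood is the same search as ascending the likelihood, so the log-likelihood printed after training is higher than the starting value.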

Are you keeping pace?
Illustrative example
Business problem framing – business needs and application
Business needs
• Long delay in returning pathology results
• Novice pathologists need training
Data
• Past biopsy data and results
Analytics (ML)
• Predictive analytics to classify diagnosis
• To predict cancer diagnosis
Application
• Assist pathologists in interpreting test data -> reduced time and improved accuracy
• Training novice pathologists
ML Problem Framing: Classification
Predict if a datapoint belongs to one of the predefined classes, based on learning from a labelled dataset
• Business Context: Pathology Lab
• Business Problem: To make cancer diagnoses in less time, with same or higher accuracy
• Business Data: Historical datasets of biopsy data and results (diagnoses)
• Machine Learning Problem: Classification
Data Preparation & Exploration -> Model Training -> Model Evaluation
Dataset
Data:
V1, V2, V7-V9: biological variables
Diagnosis: healthy or cancerous
Source: adapted from a dataset provided by Dr Mark Griffin, Industry Fellow, University of Queensland
Sample size: 699

Number of columns: 7
Loading and Exploring the Dataset
# load dataset
import pandas as pd
records = pd.read_csv("/content/drive/MyDrive/VNU2022/biopsy_ln.csv")
records.info()
records.describe()
Exploration
import seaborn as sns
sns.countplot(x='class', data=records, hue='class')
Data preparation
# Convert the categorical target to numerical codes
def coding_diagnosis(x):
    if x == 'cancerous':
        return 1
    if x == 'healthy':
        return 0

records['Diagnosis'] = records['class'].apply(coding_diagnosis)
records.head(10)
Exploration
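import matplotlib.pyplot as plt
# Fit and plot a logistic curve per column (logistic=True needs statsmodels)
for i in records.iloc[:, 1:5]:
    sns.regplot(x=records[i], y=records['Diagnosis'], logistic=True, ci=None)
    plt.title(i)
    plt.show()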
Data Preparation
#Selecting predictors
features =['V1', 'V2', 'V7', 'V8', 'V9']

X=records[features] #Input data
y=records['Diagnosis'] # Target variable
print(X.head())
print(y.head())
Data Splitting
from sklearn.model_selection import train_test_split # Import train_test_split function
# Split dataset into training set and test set

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1) # 80% training and 20% testing
#inspect the split datasets

print(X_train.head())
print(y_train.head())
print('Training dataset size:',X_train.shape[0])

print('Test dataset size:',X_test.shape[0])
Model Training
from sklearn.linear_model import LogisticRegression
#Create an initial Logistic Regression model

logreg = LogisticRegression(max_iter=100)
# Train Logistic Regression Classifier with the training dataset

logreg = logreg.fit(X_train, y_train)
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
Model Testing
#Make predictions for the test dataset

y_pred = logreg.predict(X_test)
#inspection
inspection=pd.DataFrame({'Actual':y_test, 'Predicted':y_pred})
inspection.head(20)
Model Testing
import matplotlib.pyplot as plt
from sklearn import metrics

# Calculate metrics: Accuracy, Precision, Recall, F1
print("Accuracy: ", metrics.accuracy_score(y_test, y_pred))
print("Precision: ", metrics.precision_score(y_test, y_pred))
print("Recall: ", metrics.recall_score(y_test, y_pred))
print("F1: ", metrics.f1_score(y_test, y_pred))
Model Testing
#print confusion matrix and evaluation report

from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
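To see where the four metrics come from, the sketch below derives them by hand from the confusion-matrix counts. The labels are hypothetical (1 = cancerous, 0 = healthy) and are not the biopsy results; the arithmetic mirrors what the `sklearn.metrics` score functions report for a binary target.

```python
# Hypothetical actual and predicted labels, for illustration only
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_hat = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

pairs = list(zip(y_true, y_hat))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives

accuracy = (tp + tn) / len(pairs)   # share of all predictions that are correct
precision = tp / (tp + fp)          # share of predicted positives that are correct
recall = tp / (tp + fn)             # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)

print("TP, TN, FP, FN:", tp, tn, fp, fn)
print("Accuracy:", accuracy, "Precision:", precision, "Recall:", recall, "F1:", f1)
```

In a diagnosis setting a false negative (a missed cancer) is usually the costlier error, which is why recall is often inspected alongside accuracy.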
Plot ROC curve and Confusion Matrix
#Plot ROC (Receiver operating characteristic) curve and confusion matrix

from sklearn.metrics import RocCurveDisplay
from sklearn.metrics import ConfusionMatrixDisplay
RocCurveDisplay.from_estimator(logreg,X_test, y_test)
ConfusionMatrixDisplay.from_estimator(logreg, X_test, y_test)
plt.show()
Cross Validation
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegressionCV.html
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
# 5-fold cross validation: mean and standard deviation of the accuracy across folds
kfold = KFold(n_splits=5, random_state=2022, shuffle=True)
evaluation = cross_val_score(logreg, X, y, cv=kfold)
print("Accuracy and standard deviation: %.3f%% (%.3f%%)" % (evaluation.mean()*100.0, evaluation.std()*100.0))
Accuracy and standard deviation: 92.983% (2.721%)
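from sklearn.linear_model import LogisticRegressionCV
# LogisticRegressionCV uses cross validation internally to choose the regularisation strength C
logreg2=LogisticRegressionCV(cv=10, random_state=2022).fit(X, y)
# Note: score(X, y) here is the accuracy on the same data used for fitting
print("Accuracy: %.3f" % logreg2.score(X,y))
Accuracy: 0.930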
Recap
#visualise logistic regression S-curve for a single predictor

sns.regplot(x=X_train['V7'], y=y_train, logistic=True, ci=None)
RapidMiner
Logit = -8.553 + 0.304 V1 + 0.194 V2 + 1.185 V7 + 0.170 V8 + 0.092 V9
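The logit equation above can be turned into a predicted probability for a single record. In this sketch the coefficients are copied from the equation, while the record values are hypothetical and not taken from the biopsy dataset.

```python
import math

# Coefficients copied from the logit equation above
intercept = -8.553
coefficients = {"V1": 0.304, "V2": 0.194, "V7": 1.185, "V8": 0.170, "V9": 0.092}

# Hypothetical biopsy record, for illustration only
record = {"V1": 5, "V2": 3, "V7": 4, "V8": 2, "V9": 1}

# Linear part: intercept plus coefficient * value for each predictor
logit = intercept + sum(coefficients[name] * record[name] for name in coefficients)

# Logistic function converts the logit into a probability
probability = 1 / (1 + math.exp(-logit))

# Default decision rule: predict the positive class when probability >= 0.5
predicted_class = 1 if probability >= 0.5 else 0

print(f"logit = {logit:.3f}, probability = {probability:.3f}, class = {predicted_class}")
```

The coefficients also have a direct reading: with the other predictors held fixed, a one-unit increase in V7 adds 1.185 to the logit, which multiplies the odds by e^1.185, roughly 3.27.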
Tuning the model parameters
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
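• Select different predictors and feature engineering
• Tuning the logistic regression model hyperparameters
• LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, n_jobs=None, penalty='l1',
random_state=None, solver='liblinear', tol=0.0001, verbose=0,
warm_start=False)
• penalty can be "l1", "l2" or "elasticnet" (both); it adds a regularisation term to the loss function
• C is the inverse of the regularisation strength (default 1.0): larger values give a lighter penalty; values close to 0.0 give a stronger penalty
• not all solvers support all penalty types (for example, "l1" needs the "liblinear" or "saga" solver)
• Change threshold
# predicted probability of each class (column 1 is the positive class)
y_test2 = logreg.predict_proba(X_test)
# classify as positive when the probability is at least 0.3 instead of the default 0.5
y_test2 = logreg.predict_proba(X_test)[:,1] >= 0.3
• Retrain and retest the model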
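The effect of changing the threshold can be illustrated without a trained model. The probabilities and labels below are hypothetical stand-ins for `predict_proba(X_test)[:,1]` and `y_test`, and `evaluate` is a helper defined only for this sketch.

```python
# Hypothetical predicted probabilities of the positive class and actual labels
probabilities = [0.95, 0.80, 0.45, 0.35, 0.20, 0.60, 0.32, 0.10, 0.05, 0.15]
actual = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

def evaluate(threshold):
    """Return (precision, recall) when predicting 1 for probability >= threshold."""
    predicted = [1 if p >= threshold else 0 for p in probabilities]
    tp = sum(1 for a, y in zip(actual, predicted) if a == 1 and y == 1)
    fp = sum(1 for a, y in zip(actual, predicted) if a == 0 and y == 1)
    fn = sum(1 for a, y in zip(actual, predicted) if a == 1 and y == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

for threshold in (0.5, 0.3):
    precision, recall = evaluate(threshold)
    print(f"threshold {threshold}: precision = {precision:.2f}, recall = {recall:.2f}")
```

On this made-up data, lowering the threshold from 0.5 to 0.3 catches more of the actual positives (higher recall) at the price of more false positives, which is the usual trade-off to weigh in a diagnosis setting.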
Assumptions
• The target should be categorical

• The datapoints are independent
• No extreme outliers
• No severe collinearity among the predictors
• There exists a linear relationship between each predictor and the logit of the target, i.e. log(p / (1-p))
• Sample size is large enough
https://www.statology.org/assumptions-of-logistic-regression/

Linear Regression and Logistic Regression
Linear Regression
• Supervised ML
• Linear Regression equations
• Estimation: target is continuous
• Best-fit line
• Loss function: Prediction Error
• Cost function: Mean squared error
• Assumes linear relationships between predictors and target
Logistic Regression
• Supervised ML
• Linear Regression equations (for the logit)
• Classification: target is categorical
• S-curve (fit the regression values to the sigmoid curve)
• Loss function: maximum likelihood estimation
• Cost function: maximum likelihood estimation values
• Does not assume linear relationships between predictors and target (the linear relationship is between the predictors and the logit)
Reflections
Pros
• Explainable: Easy to interpret
• Visual representation
• No normality assumption on the data distribution (linearity is assumed only between the predictors and the logit)
• Less effort for data preparation, no need for normalisation
• Works for both numerical and categorical predictors
Cons
• Target must be categorical, best with binary (dichotomous)
• Works best when a linear boundary in the predictors separates the classes well
• Requires large datasets. Overfitting if datasets are small
• Complex when having multi-class targets
Are you keeping pace?
Exercises in Python
Google Colab
https://colab.research.google.com https://jupyter.org/
Exercise 1: Cancer Diagnosis

• Biopsy dataset
Exercise 2: Cancer Survivability Prediction
• Breast Cancer dataset
Exercise 3: Diabetes Diagnosis
• Pima Indians Diabetes dataset
Exercise 4: Titanic Survivability Prediction
• Titanic dataset
Exercise 5: Churn Prediction
• Telco Churn dataset
Additional resources
• Molnar, C. (2022) Interpretable Machine Learning - A Guide for Making Black Box Models Explainable, https://christophm.github.io/interpretable-ml-book/logistic.html
• Brownlee, J., Multinomial Logistic Regression With Python, https://machinelearningmastery.com/multinomial-logistic-regression-with-python/
• Huyen, C. (2022) Designing Machine Learning Systems, https://learning.oreilly.com/library/view/designing-machine-learning/9781098107956/ch05.html
