Chapter 4 Statistical Classification Methods
CHAPTER 4: Classifications
Overview
At the end of this chapter students
should be able to understand
➢Logistic Regression
➢Naïve Bayesian
➢Discriminant Analysis
➢Linear Discriminant Analysis
➢Quadratic Discriminant Analysis
Machine Learning
Supervised Learning
Supervised learning is the machine learning task of learning a function that
maps an input to an output based on example input-output pairs. It infers a
function from labeled training data consisting of a set of training examples.
In supervised learning, a dataset comprising a number of elements is given with a set of
features X1,X2,…,Xp as well as a response or outcome variable Y for each
element. The goal is then to build a model to predict Y using X1,X2,…,Xp.
Example: Regression and classification, where prior information is available
Classification
Classification is a subcategory of supervised learning: a class or label is attributed
to an observation by exploiting a training set (labeled data). In other words, the goal
is to predict the categorical class labels (discrete, unordered values, group
membership) of new instances based on past observations.
Unsupervised Learning
What is unsupervised machine learning?
Unsupervised learning is a machine learning technique in which models are not
supervised using a labeled training dataset. Instead, the models themselves find hidden
patterns and insights in the given data. It can be compared to the learning that takes
place in the human brain when learning new things.
Example: Clustering
Supervised vs Unsupervised
Classification Performance - Confusion Matrix
THE TOOLS
What is confusion matrix?
A confusion matrix is a table that is often used to describe the performance of
a classification model (or "classifier") on a set of test data for which the true
values are known.
TP : True Positive, TN : True Negative
FP : False Positive, FN : False Negative
Example: In medical diagnosis,
test sensitivity is the ability of a test to
correctly identify those with the
disease (true positive rate), whereas test
specificity is the ability of the test to
correctly identify those without the
disease (true negative rate).
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Sensitivity} = \text{recall} = r = \frac{TP}{TP + FN}$$
$$\text{Specificity} = \frac{TN}{TN + FP}$$
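As a minimal sketch of the accuracy, sensitivity and specificity formulas, the Python snippet below computes all three from the four confusion-matrix counts. The function name `classification_metrics` and the example counts are invented for illustration and do not come from the slides.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total       # share of all predictions that are correct
    sensitivity = tp / (tp + fn)       # true positive rate (recall)
    specificity = tn / (tn + fp)       # true negative rate
    return accuracy, sensitivity, specificity


# Hypothetical counts for a diagnostic test on 100 patients
accuracy, sensitivity, specificity = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(f"accuracy={accuracy:.2f}, sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```

With these made-up counts, the test identifies 40 of the 50 diseased patients and 45 of the 50 healthy ones.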
Overview
➢Logistic Regression
➢Naïve Bayesian
➢Discriminant Analysis
➢Linear Discriminant Analysis
➢Quadratic Discriminant Analysis
G. James, D. Witten, T. Hastie, R. Tibshirani, “An Introduction to Statistical Learning with Applications in R”, Springer,
ISBN 978-1-4614-7137-0, ISBN 978-1-4614-7138-7 (eBook)
Overview – Logistic Regression
Logistic regression is a statistical model that uses a logistic function to model a
binary dependent variable. In regression analysis, logistic regression estimates
the parameters of a logistic model, which is a form of binary regression.
What is logistic regression in simple terms?
Logistic Regression, also known as Logit Regression or Logit Model, is a
mathematical model used in statistics to estimate (guess) the probability of an event
occurring having been given some previous data. Logistic Regression works with
binary data, where either the event happens (1), or the event does not happen (0).
Overview – Logistic Regression
What is difference between logistic regression and linear regression?
• Linear regression is used to predict a continuous dependent
variable from a given set of independent features, whereas logistic
regression is used to predict a categorical dependent variable.
Logistic Regression
Example: Credit Card Fraud
When a credit card transaction happens, the bank
makes note of several factors. For instance, the
date of the transaction, amount, place, type of
purchase, etc. Based on these factors, they
develop a Logistic Regression model of whether
the transaction is a fraud or not.
Logistic Regression
Why is logistic regression better?
Good accuracy for many simple data sets and it performs well when the dataset
is linearly separable.
Logistic Regression
Formula for one variable, X Formula for multi variables of Xi
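For reference, the standard logistic model (the form given in James et al.) can be written as below. The notation is an assumption here: p(X) stands for Pr(Y = 1 | X) and the β terms are the coefficients to be estimated.

```latex
% One predictor X
p(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}

% Several predictors X_1, \dots, X_p
p(X) = \frac{e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p}}
            {1 + e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p}}
```

In both cases the log-odds, log(p(X) / (1 - p(X))), is linear in the predictors.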
Logistic Regression - Example
Use software (e.g. Python, R) to fit the logistic regression model.
More than 2 independent variables
Example of outputs
Logistic Regression - Example
Example: more than 2 variables
A sample of 1000 people was selected to identify how their age, daily internet
usage and time spent on site affect their intention to click on an
advertisement. Use the first 700 observations as the training dataset and the
remaining 300 as the testing dataset.
Number of observations: 1000
Variable   Description
Y          “Clicked on Ad”: whether the consumer clicked on the ad, 0 = NO, 1 = YES
X1         “Daily Time Spent on Site”: time the consumer spends on the site, in minutes
X2         “Age”: consumer age
X3         “Daily Internet Usage”: average time in minutes a day the consumer is on the internet (online)
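Logistic Regression – Python Code
# Logistic Regression
# Import the libraries needed for the code
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Import data from the computer (Google Colab)
from google.colab import files
uploaded = files.upload()

# x (features X1, X2, X3) and y (label Y) are assumed to have been built
# from the uploaded data; that step is not shown on the slide.

# Split the data: first 700 rows (70%) for training, remaining 300 (30%) for testing
x_for_train = x[:700]
y_for_train = y[:700]
x_for_test = x[700:1000]
y_for_test = y[700:1000]

# Fit the model on the training data and print the results
model = LogisticRegression()
model.fit(x_for_train, y_for_train)
print("beta0 :", model.intercept_)
print("beta1 :", model.coef_)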
beta0 : [19.66511428]
beta1 : [[-0.17887355 0.1258386 -0.06456599]]
Results - Interpretation:
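As a hedged illustration of how the fitted coefficients can be read, the sketch below plugs the printed beta0 and beta1 values into the logistic function for one hypothetical visitor and converts each coefficient to an odds ratio. It assumes the coefficients are ordered as X1, X2, X3; the visitor's values (70 minutes on site, age 30, 200 minutes online per day) are invented.

```python
import math

# Coefficients copied from the printed output; the X1, X2, X3 order is an assumption
beta0 = 19.66511428
betas = [-0.17887355, 0.1258386, -0.06456599]


def predict_probability(features):
    """Logistic function applied to the linear score beta0 + sum(beta_j * x_j)."""
    z = beta0 + sum(b * x for b, x in zip(betas, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical visitor: [daily time on site (min), age, daily internet usage (min)]
visitor = [70.0, 30.0, 200.0]
probability = predict_probability(visitor)
predicted_class = 1 if probability > 0.5 else 0

# exp(beta_j) is the multiplicative change in the odds per one-unit increase in X_j
odds_ratios = [math.exp(b) for b in betas]

print(f"P(click) = {probability:.3f}, predicted class = {predicted_class}")
print("odds ratios:", [round(r, 3) for r in odds_ratios])
```

Under the assumed ordering, the negative coefficients for X1 and X3 give odds ratios below 1, so more time on the site or online lowers the odds of clicking in this fitted model, while the positive coefficient for age raises them.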
Logistic Regression - Example
Classifying your daily productivity
Lately you’ve been interested in gauging your productivity. You’ve been asking yourself, at
the end of each day, if the day was indeed productive. But that’s just a potentially biased,
qualitative data point. You want to find a more scientific way to go about it. You’ve
observed the natural flows of your day, and realized that what impacts it the most is:
• Sleep: you know that sleep, or lack thereof, has a big impact on your day.
• Coffee: doesn’t the day start after coffee?
• Focus time: it’s not always possible, but you try to have 3–4h of intently focused time to
dive into projects.
• Lunch: you’ve noticed the day flows smoothly when you have time for a proper lunch, not
just snacks.
• Walks: you’ve been taking short walks to get your steps in, relax a bit and muse about
your projects.
https://towardsdatascience.com/logistic-regression-in-real-life-building-a-daily-productivity-classification-model-a0fc2c70584e
Logistic Regression - Example
To classify your day as productive or not with a
Logistic Regression model, the first step is to pick
an arbitrary threshold x and assign observations to
each class based on a simple criterion:
•Class Non-Productive, all outcomes that are less
than or equal to x.
•Class Productive otherwise, i.e., all outcomes
greater than x.
Logistic Regression - Example
Observed Data for 20 days
•Outcomes less than or equal to zero are assigned to Class 0, i.e., a nonproductive day.
•Positive outcomes are assigned to Class 1, i.e., a productive day.
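The thresholding rule with x = 0 can be sketched as follows. Thresholding the linear outcome at zero is equivalent to thresholding the logistic probability at 0.5, which the snippet checks. The outcome values are made up and are not the 20 observed days.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real outcome to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def classify(outcome, threshold=0.0):
    """Class 1 (productive) if the outcome exceeds the threshold, else class 0."""
    return 1 if outcome > threshold else 0


# Hypothetical linear outcomes for five days
outcomes = [-2.5, -0.3, 0.0, 0.4, 3.1]
classes = [classify(z) for z in outcomes]
probabilities = [sigmoid(z) for z in outcomes]

for z, c, p in zip(outcomes, classes, probabilities):
    print(f"outcome={z:+.1f}  class={c}  probability={p:.3f}")
```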
Logistic Regression
One of the most significant advantages of the logistic regression model is that it doesn't
just classify but also gives probabilities.
The following are some of the advantages of the logistic
regression algorithm.
•Simple to understand, easy to implement, and efficient to train
•Performs well when the dataset is linearly separable
•Good accuracy for smaller datasets
•Doesn't make any assumptions about the distribution of classes
•Useful to find relationships between features
•Provides well-calibrated probabilities
•Less prone to overfitting in low dimensional datasets
•Can be extended to multi-class classification
Logistic Regression - Example
The following are some of the disadvantages of the logistic regression algorithm:
➢Naïve Bayes
➢Discriminant Analysis
➢Linear Discriminant Analysis
➢Quadratic Discriminant Analysis
Naïve Bayes
Learning objectives:
-Introduction - Deterministic vs Stochastic
-Laws of Probability
-Understand the Naïve Bayes Classifier
Some references
http://www3.cs.stonybrook.edu/~cse634/ch6book.pdf
https://www3.cs.stonybrook.edu/~cse634/T14.pdf
Introduction- Stochastic vs Deterministic
A deterministic system is a system in which no randomness is involved in the
development of future states of the system. A deterministic model will thus always
produce the same output from a given starting condition or initial state.
A stochastic model is a tool for estimating probability distributions of potential
outcomes by allowing for random variation in one or more inputs over time. The random
variation is usually based on fluctuations observed in historical data for a selected
period using standard time-series techniques.
Example - Stochastic vs Deterministic
Laws of Probability
Example
A doctor knows that meningitis causes stiff neck 50% of the time (likelihood).
The prior probability of any patient having meningitis is 1/50,000 (prior).
The prior probability of any patient having stiff neck is 1/20 (prior).
Question: If a patient has stiff neck, what is the probability he/she has meningitis?
Solution
$$P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)}$$
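A short sketch of the arithmetic for the doctor example, using Bayes' theorem with H = "patient has meningitis" and X = "patient has stiff neck". The variable names are chosen here for illustration; exact fractions are used to avoid rounding.

```python
from fractions import Fraction

p_stiff_given_meningitis = Fraction(1, 2)   # likelihood P(X | H)
p_meningitis = Fraction(1, 50000)           # prior P(H)
p_stiff = Fraction(1, 20)                   # P(X)

# Bayes' theorem: P(H | X) = P(X | H) * P(H) / P(X)
p_meningitis_given_stiff = p_stiff_given_meningitis * p_meningitis / p_stiff

print(p_meningitis_given_stiff, float(p_meningitis_given_stiff))
```

The posterior is 1/5000 = 0.0002: a stiff neck raises the probability of meningitis tenfold over the prior, but it remains very small.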
Naïve Bayes - Example
Bayes’ Theorem – Python code
# Naive Bayes
# Import the libraries needed for the code
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# x (features) and y (labels) are assumed to have been loaded already;
# that step is not shown on the slide.

# Split the data: first 700 rows (70%) for training, remaining 300 (30%) for testing
x_for_train = x[:700]
y_for_train = y[:700]
x_for_test = x[700:1000]
y_for_test = y[700:1000]
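The slide code stops at the train/test split. As a self-contained sketch of the remaining steps, the example below fits scikit-learn's `GaussianNB` on synthetic data with the same 700/300 split. The data, the choice of the Gaussian variant, and the variable names beyond those on the slide are assumptions, not the course's actual dataset or method.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB

# Synthetic data (assumption): two classes, three independent Gaussian features
rng = np.random.default_rng(0)
n_per_class = 500
x_class0 = rng.normal(loc=[0.0, 0.0, 0.0], scale=1.0, size=(n_per_class, 3))
x_class1 = rng.normal(loc=[2.0, 2.0, 2.0], scale=1.0, size=(n_per_class, 3))
x = np.vstack([x_class0, x_class1])
y = np.array([0] * n_per_class + [1] * n_per_class)

# Shuffle so that both classes appear in the training and testing parts
order = rng.permutation(len(y))
x, y = x[order], y[order]

# 70% training, 30% testing, as on the slide
x_for_train, y_for_train = x[:700], y[:700]
x_for_test, y_for_test = x[700:1000], y[700:1000]

# Fit the classifier and evaluate on the held-out rows
model = GaussianNB()
model.fit(x_for_train, y_for_train)
y_predicted = model.predict(x_for_test)
test_accuracy = accuracy_score(y_for_test, y_predicted)

print("test accuracy:", test_accuracy)
```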
Naïve Bayes
• Advantages:
• Disadvantage: assumes independence of features
Naïve bayes vs Logistic Regression – major differences
1. Purpose, or what class of machine learning problem does it solve?
Both algorithms can be used for classification. For example, you could predict whether a banker can
offer a loan to a customer, or identify whether a given email is spam.
2. Learning mechanism
Naïve Bayes: For the given features (x) and the label y, it estimates a joint probability from the training data; hence
it is a generative model.
Logistic regression: Estimates the probability P(y | x) directly from the training data by minimizing error; hence it is
a discriminative model.
3. Model assumptions
Naïve Bayes: The model assumes all the features are conditionally independent, so if some of the features
depend on each other (as can happen in a large feature space), the prediction might be poor.
Logistic regression: It splits the feature space linearly and works reasonably well even if some of the variables are correlated.
4. Model limitations
Naïve Bayes: Works well even with little training data, as the estimates are based on the joint density function.
Logistic regression: With little training data, the model estimates may overfit the data.
5. Approach to improve the results
Naïve Bayes: When the training data size is small relative to the number of features, information on prior probabilities
helps improve the results.
Logistic regression: When the training data size is small relative to the number of features, Lasso and Ridge
regularization help improve the results.
https://www.quora.com/What-is-the-difference-between-logistic-regression-and-Naive-Bayes
Overview
➢Logistic Regression
➢Naïve Bayes
➢Discriminant Analysis
➢Linear Discriminant Analysis
➢Quadratic Discriminant Analysis
What is linear discriminant analysis?
Linear discriminant analysis is a technique that is used by the researcher to
analyze the research data when the criterion or the dependent variable is
categorical and the predictor or the independent variable is interval in nature.
• Model the distribution of X in each of the classes separately, and then use
Bayes' theorem to obtain $\Pr(Y = k \mid X = x)$.
• Use normal (Gaussian) distributions for each class; this leads to linear or
quadratic discriminant analysis.
• Remark: it could be done with other distributions.
Discriminant Analysis
Linear Discriminant Analysis when there is only 1 predictor (p=1)
Classify to the highest density.
Example of decision boundaries:
Discriminant Analysis
Discriminant functions
• To classify the value X = x, we need to find the k which gives the largest $p_k(x)$.
• After simplification, this is equivalent to finding the largest discriminant score using
the formula:
δ0 = - 0.12 δ1 = - 2.52
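For a single predictor with Gaussian classes that share one variance, the standard form of the discriminant score (as given in James et al.) is δ_k(x) = x·μ_k/σ² − μ_k²/(2σ²) + log(π_k), where μ_k is the class mean, σ² the common variance and π_k the class prior. A minimal Python sketch follows. The means, variance and priors are invented, so the scores it prints are not the δ0 and δ1 values quoted above.

```python
import math


def discriminant_score(x, mean, variance, prior):
    """LDA discriminant score for one predictor and a shared variance."""
    return x * mean / variance - mean ** 2 / (2 * variance) + math.log(prior)


# Hypothetical class parameters (assumptions for illustration only)
means = {0: -1.0, 1: 2.0}
priors = {0: 0.5, 1: 0.5}
variance = 1.5


def classify(x):
    """Assign x to the class with the largest discriminant score."""
    scores = {k: discriminant_score(x, means[k], variance, priors[k]) for k in means}
    return max(scores, key=scores.get)


for x in (0.0, 1.0):
    print(f"x={x}: class {classify(x)}")
```

With equal priors the decision boundary sits midway between the two means, here at x = 0.5, so 0.0 falls in class 0 and 1.0 in class 1.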
These codes are not complete; please attend tutorial/lab classes for full details of the codes.
# Import additional items
from sklearn import linear_model
import numpy as np
import matplotlib.pyplot as plt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.metrics import accuracy_score
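As a self-contained sketch of how these imports might be used, the example below fits LDA on synthetic two-class data drawn with a common covariance matrix and reports the test accuracy. The data, the 70/30 split and the variable names are assumptions and do not reproduce the lab exercise.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.metrics import accuracy_score

# Synthetic data (assumption): two Gaussian classes sharing one covariance matrix
rng = np.random.default_rng(1)
shared_cov = [[1.0, 0.3], [0.3, 1.0]]
x_class0 = rng.multivariate_normal([0.0, 0.0], shared_cov, size=500)
x_class1 = rng.multivariate_normal([2.5, 2.5], shared_cov, size=500)
x = np.vstack([x_class0, x_class1])
y = np.array([0] * 500 + [1] * 500)

# Shuffle, then split 70% / 30%
order = rng.permutation(len(y))
x, y = x[order], y[order]
x_train, y_train = x[:700], y[:700]
x_test, y_test = x[700:], y[700:]

# Fit LDA and evaluate on the held-out rows
lda = LDA()
lda.fit(x_train, y_train)
lda_accuracy = accuracy_score(y_test, lda.predict(x_test))

print("LDA test accuracy:", lda_accuracy)
```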
LDA Assumptions:
•LDA assumes normally distributed data and a class-
specific mean vector.
•LDA assumes a common covariance matrix, that is
common to all classes in a data set.
When these assumptions hold, then LDA approximates the Bayes classifier very
closely and the discriminant function produces a linear decision boundary.
QDA vs LDA
QDA Assumptions:
•Observation of each class is drawn from a normal distribution (same as LDA).
•QDA assumes that each class has its own covariance matrix (different from LDA).
When these assumptions hold, QDA approximates the Bayes classifier very closely
and the discriminant function produces a quadratic decision boundary.
In conclusion, LDA is less flexible than QDA because it estimates fewer
parameters. This is an advantage when the training dataset has only a few
observations, since it lowers the variance. On the other hand, when the K classes
have very different covariance matrices, LDA suffers from high bias and QDA might
be a better choice. The choice comes down to the bias-variance trade-off. Therefore, it is
crucial to test the underlying assumptions of LDA and QDA on the data set and
then use both methods to decide which one is more appropriate.
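The trade-off can be illustrated with a small experiment. In the sketch below the two classes are deliberately given very different covariance matrices, which violates the LDA assumption, and both classifiers are fitted on the same split. The synthetic data and all parameter values are assumptions chosen to make the effect visible.

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

# Synthetic data (assumption): class 1 is far more spread out than class 0
rng = np.random.default_rng(2)
x_class0 = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], size=1000)
x_class1 = rng.multivariate_normal([1.0, 1.0], [[9.0, 0.0], [0.0, 9.0]], size=1000)
x = np.vstack([x_class0, x_class1])
y = np.array([0] * 1000 + [1] * 1000)

# Shuffle, then split 70% / 30%
order = rng.permutation(len(y))
x, y = x[order], y[order]
x_train, y_train = x[:1400], y[:1400]
x_test, y_test = x[1400:], y[1400:]

# Fit both models on the same training data and compare test accuracy
lda_accuracy = LinearDiscriminantAnalysis().fit(x_train, y_train).score(x_test, y_test)
qda_accuracy = QuadraticDiscriminantAnalysis().fit(x_train, y_train).score(x_test, y_test)

print("LDA test accuracy:", lda_accuracy)
print("QDA test accuracy:", qda_accuracy)
```

On data like this the quadratic boundary of QDA should fit better than the linear one. With a shared covariance matrix and few observations, the comparison can go the other way.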
Prediction Models in Healthcare
Machine learning applications in healthcare sector: An overview
Virendra Kumar Verma, Savita Verma
Materials Today: Proceedings 57 (2022) 2144–2147
This work proposes a smart home health monitoring system that helps to analyze the patient’s
blood pressure and glucose readings at home and notifies the healthcare provider in case of any
abnormality detected. The goal is to predict the hypertension and diabetes status using the
patient’s glucose and blood pressure readings using supervised machine learning classification
algorithms.