Yes or No?
Fail or Pass?
“To be, or not to be”: decisions in the business world
In or Out?
Advanced decision:
Which one? Accept or Deny?
(speculating in property vs. cryptocurrency vs. sneakers)
Shall I Lend Money to Him?
o You work for a bank. Some of your customers apply to you for loans. By checking
their background, you know the property they own, their credit score, and some
information on their previous credit card use.
o Decision: should you approve the loan?
To Whom to Lend Money?
o $Y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4 + b_5 x_5$
o Y: approve or not (binary)
o X:
–X1: Homeowner or not
–X2: Credit score
–X3: Years of credit history
–X4: Revolving balance
–X5: Revolving utilization
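Cannot Use Linear Regression!
o If we use linear OLS (ordinary least squares) regression:
$Y = a + bX + e$, where $Y \in \{0, 1\}$
o The fitted values a + bX are not confined to the range [0, 1], so they cannot be read as probabilities.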
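A minimal sketch of the problem, using hypothetical toy data (hours studied against a pass = 1 / fail = 0 outcome; the numbers are invented for illustration): a straight line fitted by OLS to a 0/1 outcome produces fitted values below 0 and above 1.

```python
# Hypothetical toy data: x = hours studied, y = 1 (pass) or 0 (fail).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

def ols_fit(x, y):
    """Closed-form simple OLS: return intercept a and slope b of y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

a, b = ols_fit(xs, ys)
for x in (0, 1, 10):
    # Fitted "probabilities" from the straight line.
    print(x, round(a + b * x, 3))
```

With this toy data the fitted line gives about -0.357 at x = 0 and about 1.548 at x = 10, neither of which is a valid probability.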
Why Not Linear Regression?
[Figure: Exam Result]
Source: http://www.math.cornell.edu/~numb3rs/kostyuk/num218.htm
S-Shaped Curve
• Logistic regression was introduced by statisticians in the mid-20th century
(notably D. R. Cox in 1958) and came into wide use in the 1970s.
• The idea is to approximate the data by an S-shaped curve, called the logistic
curve, which is given by the following equation:
$y = \dfrac{e^{a+bx}}{1+e^{a+bx}}$
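A minimal sketch of the logistic curve in Python. The intercept a = -4 and slope b = 1 are hypothetical values chosen only to show the S shape; the function is written in two branches so that very large or very small exponents do not overflow.

```python
import math

def logistic(z: float) -> float:
    """Return e^z / (1 + e^z), evaluated in a form that avoids overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # small for very negative z, so no overflow
    return ez / (1.0 + ez)

a, b = -4.0, 1.0  # hypothetical intercept and slope
for x in (0, 2, 4, 6, 8):
    print(x, round(logistic(a + b * x), 3))
```

The output rises from near 0 to near 1 and equals exactly 0.5 at x = 4, where a + bx = 0.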
Odds of an event
• We define the odds of an event A as:
the probability of A happening divided
by the probability of it not happening.
$\text{Odds}(A) = \dfrac{P(A)}{1 - P(A)}$
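A short sketch of the odds definition and its inverse. Python's standard `fractions` module keeps the arithmetic exact, so the result can be compared directly with the hand calculation for the 60 < X ≤ 100 interval; the helper names `odds` and `prob_from_odds` are illustrative only.

```python
from fractions import Fraction

def odds(p):
    """Odds(A) = P(A) / (1 - P(A)); requires 0 <= p < 1."""
    return p / (1 - p)

def prob_from_odds(o):
    """Invert the definition: P(A) = Odds(A) / (1 + Odds(A))."""
    return o / (1 + o)

print(odds(Fraction(1, 6)))         # 1/5
print(prob_from_odds(Fraction(5)))  # 5/6
```

The inverse is handy for reading odds back as probabilities: odds of 5 correspond to a probability of 5/6.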
• Natural log of the odds: ln(odds) is called the logit of p.
• Use the LN() function in Excel to calculate the log odds. You will get the same
results as shown in the figure.
• In the left figure, we divide the hours studied into intervals of 40 hours and
calculate the odds for each interval. Let X denote the hours studied.
Odds(60<X<=100) = (1/6)/(1-(1/6)) = 1/5
Odds(100<X<=140) = 1/4
Odds(140<X<=180) = 2/3
Odds(180<X<=220) = 5
Odds(220<X<=260) = 4
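Python's `math.log` is the natural logarithm, so it plays the same role as Excel's LN(). A minimal sketch that turns the interval odds listed above into log odds:

```python
import math

# Odds per 40-hour interval, copied from the calculation above.
interval_odds = {
    "60<X<=100": 1 / 5,
    "100<X<=140": 1 / 4,
    "140<X<=180": 2 / 3,
    "180<X<=220": 5,
    "220<X<=260": 4,
}

# math.log with one argument is the natural log, like Excel's LN().
for interval, o in interval_odds.items():
    print(f"{interval}: odds = {o:.3f}, log odds = {math.log(o):.3f}")
```

Odds below 1 give negative log odds and odds above 1 give positive log odds, so the logit is centred on 0 at p = 0.5.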
Probabilities and the Corresponding Odds
Probabilities, odds, and the corresponding log-odds
Kinds of research questions
o Logistic regression allows one to predict a discrete outcome (the DV), such as
group membership, from a set of variables (the IVs) that may be continuous,
discrete, dichotomous, or a mix.
Logistic Regression
o The purpose is to assess the effects of multiple
explanatory variables, which can be numeric and/or
categorical, on the outcome.
o Build a classification system based on the logistic model:
– Profiling: the outcome category is already known; the goal is to describe
the characteristics of the explanatory variables
– Classification: use logistic regression to predict the class
probability
o The function from the explanatory variables to the predicted probability is
non-linear (S-shaped); the resulting decision boundary, where a + bX equals a
fixed cutoff, is still linear
The Logistic Regression Model
$\ln[p/(1-p)] = a + bX + e$
log(odds) = a + bX + e
o p is the probability that the event Y occurs, p(Y = 1)
o p/(1-p) is the "odds"
o ln[p/(1-p)] is the log odds, or "logit"
o In other words, $p = \dfrac{e^{a+bX}}{1+e^{a+bX}}$
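A quick numerical check, with hypothetical values of a, b, and X, that the two forms of the model agree: converting a + bX to p with the probability formula and then taking ln[p/(1 - p)] recovers a + bX. The error term e is left out because it does not appear in the formula for p.

```python
import math

def prob(a, b, x):
    """p = e^(a + bX) / (1 + e^(a + bX))."""
    z = a + b * x
    return math.exp(z) / (1 + math.exp(z))

def logit(p):
    """ln[p / (1 - p)], the log odds."""
    return math.log(p / (1 - p))

a, b = -1.5, 0.8  # hypothetical coefficients
for x in (0.0, 1.0, 2.5, 5.0):
    p = prob(a, b, x)
    # Third and fourth columns should match up to rounding.
    print(x, round(p, 4), round(logit(p), 4), a + b * x)
```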
Multiple Logistic Regression Model
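A sketch of the standard multiple-predictor form, written with the five loan variables x1 to x5 and the same notation as the single-predictor model:

```latex
p = P(Y = 1) = \frac{e^{a + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4 + b_5 x_5}}
                    {1 + e^{a + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4 + b_5 x_5}}
\quad\Longleftrightarrow\quad
\ln\frac{p}{1 - p} = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4 + b_5 x_5
```

Each coefficient b_j is then interpreted as the change in the log odds for a one-unit increase in x_j, holding the other explanatory variables fixed.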
Assumptions of Logistic Regression
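o The advantages of logistic regression stem largely from its relatively few
assumptions.
o Logistic regression does not require any specific distributional form (such as
normality) for the independent variables.
o It does, however, assume independent observations and a linear relationship
between the independent variables and the log odds.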
Performance of the Model
§ Prediction correctness
§ For instance, if the estimated p is greater than or equal to 0.5,
the case is predicted as 1 (i.e., the loan is rejected).
§ $\text{Specificity} = \dfrac{\text{correct predictions}}{\text{all approved cases}} = \dfrac{22}{22+1}$
§ $\text{Sensitivity} = \dfrac{\text{correct predictions}}{\text{all rejected cases}} = \dfrac{26}{1+26}$
§ $\text{Overall accuracy} = \dfrac{\text{correct predictions}}{\text{all predictions}} = \dfrac{22+26}{22+1+26+1}$
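The same three measures can be computed in code. A minimal sketch, assuming the counts from the example above: 26 rejected cases predicted as rejected, 1 rejected case predicted as approved, 22 approved cases predicted as approved, and 1 approved case predicted as rejected, with "rejected" (1) treated as the positive class. The helper names are illustrative.

```python
def classify(probs, threshold=0.5):
    """Predict 1 (loan rejected) when the estimated p >= threshold, else 0 (approved)."""
    return [1 if p >= threshold else 0 for p in probs]

def confusion_counts(actual, predicted):
    """Count (tp, fn, tn, fp), treating 1 = rejected as the positive class."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    return tp, fn, tn, fp

def performance(tp, fn, tn, fp):
    sensitivity = tp / (tp + fn)  # correct predictions among all rejected cases
    specificity = tn / (tn + fp)  # correct predictions among all approved cases
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Counts from the example above.
sens, spec, acc = performance(tp=26, fn=1, tn=22, fp=1)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.2f}")
# sensitivity=0.963 specificity=0.957 accuracy=0.96
```

For raw model output, `classify` applies the 0.5 cutoff first; for example, `classify([0.2, 0.5, 0.9])` returns `[0, 1, 1]`, and those predictions can then be passed to `confusion_counts` together with the actual outcomes.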
Sensitivity vs. Specificity
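$\text{Sensitivity} = \dfrac{n_{2,2}}{n_{2,1} + n_{2,2}}$
$\text{Specificity} = \dfrac{n_{1,1}}{n_{1,1} + n_{1,2}}$
where $n_{i,j}$ is the number of cases with actual class i and predicted class j
(1 = approved, 2 = rejected).
We are more interested in maximizing the sensitivity.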
Interpretation of the Coefficients
log(odds) = a + bX + e
• In logistic regression, the coefficient B (the b in the equation above) measures
the incremental effect of a one-unit increase in the IV on the log odds
• Equivalently, $e^{B}$ (Exp(B)) is the factor by which the odds of the outcome are
multiplied when x increases by 1 unit
• If B = 0, the odds and probability are the same at all x levels ($e^{B} = 1$)
• If B > 0, the odds and probability increase as x increases ($e^{B} > 1$)
• If B < 0, the odds and probability decrease as x increases ($e^{B} < 1$)
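A numerical illustration with hypothetical coefficients a = -2 and B = 0.4: the ratio of the odds at x + 1 to the odds at x equals e^B at every x, while the change in probability for the same one-unit step depends on where x starts.

```python
import math

a, B = -2.0, 0.4  # hypothetical intercept and coefficient

def p(x):
    """Model probability p = e^(a + Bx) / (1 + e^(a + Bx))."""
    z = a + B * x
    return math.exp(z) / (1 + math.exp(z))

def odds(x):
    return p(x) / (1 - p(x))

for x in (0, 5, 10):
    ratio = odds(x + 1) / odds(x)  # constant, equal to e^B
    dp = p(x + 1) - p(x)           # varies with the starting x
    print(f"x={x}: odds ratio={ratio:.4f}, e^B={math.exp(B):.4f}, change in p={dp:.4f}")
```

This is why Exp(B) is reported as the effect size: it summarizes the one-unit effect on the odds with a single number, whereas the effect on the probability is not constant.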
Example
o Look at the odds ratios.
Exp(B), the odds ratio, is the effect size of the predictor: the closer the odds
ratio is to 1, the smaller the effect.
Interpretation:
o OR > 1: increased frequency of exposure among cases
o OR = 1: no change in frequency of exposure
o OR < 1: decreased frequency of exposure
How to calculate the probability change due to a one-unit change
in the independent variable?
Exp(B) (exponentiated coefficient):   0.20    0.50    1.0    1.5    1.7
Exp(B) - 1:                           -0.80   -0.50   0.0    0.50   0.70
Percentage change in odds:            -80%    -50%    0%     50%    70%
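A minimal sketch that reproduces the percentage changes in the table and then converts a change in odds into a change in probability. The baseline probability of 0.5 is a hypothetical starting point; the probability change always depends on the baseline, while the percentage change in odds does not.

```python
def pct_change_in_odds(exp_b):
    """Percentage change in the odds for a one-unit increase in x: (Exp(B) - 1) * 100."""
    return round((exp_b - 1) * 100)

def new_probability(p0, exp_b):
    """Probability after a one-unit increase in x, starting from baseline probability p0."""
    new_odds = p0 / (1 - p0) * exp_b  # the odds are multiplied by Exp(B)
    return new_odds / (1 + new_odds)

for exp_b in (0.20, 0.50, 1.0, 1.5, 1.7):
    print(exp_b, f"{pct_change_in_odds(exp_b)}%", round(new_probability(0.5, exp_b), 3))
```

For example, with Exp(B) = 1.5 and a baseline probability of 0.5, the odds rise from 1 to 1.5 (a 50% increase) and the probability rises from 0.5 to 0.6.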
Use the logistic regression to estimate the class probability
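A from-scratch sketch of fitting a one-predictor logistic regression by gradient ascent on the log-likelihood, then using the fitted model to estimate the class probability for new cases. The toy data, learning rate, and iteration count are hypothetical choices for illustration; in practice a statistical package would normally be used to fit the model.

```python
import math

def sigmoid(z):
    """Logistic function e^z / (1 + e^z), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(xs, ys, lr=0.1, iters=20000):
    """Estimate a, b in ln[p/(1-p)] = a + b*x by batch gradient ascent on the log-likelihood."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iters):
        # Gradient of the average log-likelihood: mean of (y - p) and of (y - p) * x.
        grad_a = sum(y - sigmoid(a + b * x) for x, y in zip(xs, ys)) / n
        grad_b = sum((y - sigmoid(a + b * x)) * x for x, y in zip(xs, ys)) / n
        a += lr * grad_a
        b += lr * grad_b
    return a, b

# Hypothetical toy data: hours studied and exam result (1 = pass, 0 = fail).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

a, b = fit_logistic(hours, passed)
for h in (1.0, 2.75, 4.5):
    print(f"hours={h}: estimated P(pass) = {sigmoid(a + b * h):.3f}")
```

The toy outcomes overlap (some passes occur below some fails), which keeps the maximum-likelihood estimates finite; with perfectly separated data the coefficients would grow without bound.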
Tutorial: Use Logistic Regression To Predict The
Probability Of Marketing Campaign Response