
7 Advanced Regression Analysis

Business Analytics, 1e
By Sanjiv Jaggia, Alison Kelly, Kevin Lertwachara, and Leida Chen

Copyright © 2021 McGraw-Hill Education. All rights reserved. No reproduction or distribution without the prior written consent of McGraw-Hill Education.
8/16/2020
7.3: Linear Probability and Logistic Regression Models (1/8)
• The response variable thus far has been quantitative.
• Binary choice (classification) models have a binary response
variable.
• Examples
– Whether or not to buy a house
– Whether or not to join a health club
– Whether or not to approve a loan
• The binary choice is related to predictor variables.
• We will consider two types of models.
– The linear probability regression model
– The logistic regression model

7.3: Linear Probability and Logistic Regression Models (2/8)
• The linear probability model is a linear regression model with a binary outcome: $y = \beta_0 + \beta_1 x + \varepsilon$.
– The outcome is a success ($y = 1$) or a failure ($y = 0$)
– $y \in \{0, 1\}$
• The model provides estimates of the probability of a success
– $P(y = 1)$ is the probability of success
– Linearly related to $x$
– $\hat{p} = b_0 + b_1 x$ is the predicted probability
• Shortcomings
– Can produce predicted probabilities greater than 1 or less than 0
– Relationship may not be linear

7.3: Linear Probability and Logistic Regression Models (3/8)
• Example: The response variable $y$ equals 1 if the mortgage loan is approved, 0 otherwise. It is believed that approval depends on the percentage of the down payment ($x_1$) and the percentage of the income-to-loan ratio ($x_2$).

a. Estimate and interpret the linear probability model.


b. Predict the loan approval probability for an applicant with a 20% down
payment and a 30% income-to-loan ratio. What if the down payment
was 30%?
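7.3: Linear Probability and Logistic Regression Models (4/8)
a. Model output (estimated equation, with the coefficients used in part b): $\hat{p} = -0.8682 + 0.0188 x_1 + 0.0258 x_2$

b. $x_1 = 20$, $x_2 = 30$: $\hat{p} = -0.8682 + 0.0188 \times 20 + 0.0258 \times 30 = 0.2818$

$x_1 = 30$, $x_2 = 30$: $\hat{p} = -0.8682 + 0.0188 \times 30 + 0.0258 \times 30 = 0.4698$

• If we use $x_1 = 60$, $x_2 = 30$, then $\hat{p} = 1.0338$, which is greater than 1.

• If we use $x_1 = 5$, $x_2 = 30$, then $\hat{p} = -0.0002$, which is less than 0.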
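The predictions above can be reproduced with a few lines of Python. This is a minimal sketch: the coefficients are the ones from the worked example, while the function name `lpm_probability` is an illustrative choice and not part of the slides.

```python
# Coefficients copied from the worked example (intercept, down payment,
# income-to-loan ratio).
B0, B1, B2 = -0.8682, 0.0188, 0.0258


def lpm_probability(down_payment, income_to_loan):
    """Predicted approval probability from the linear probability model."""
    return B0 + B1 * down_payment + B2 * income_to_loan


# Hold the income-to-loan ratio at 30% and vary the down payment.
for x1 in (5, 20, 30, 60):
    p_hat = lpm_probability(x1, 30)
    is_valid = 0.0 <= p_hat <= 1.0
    print(f"x1={x1:>2}, x2=30: p_hat={p_hat:.4f}, valid probability: {is_valid}")
```

The down payments of 5% and 60% show the shortcoming directly: nothing in the linear form keeps the prediction inside the interval from 0 to 1.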

7.3: Linear Probability and Logistic Regression Models (5/8)
• The logistic regression model constrains the probabilities to be between 0 and 1:
$\hat{p} = \frac{\exp(b_0 + b_1 x)}{1 + \exp(b_0 + b_1 x)}$.
– This also allows for a nonlinear relationship.
– The model is estimated by Maximum Likelihood Estimation.
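To make the idea of maximum likelihood estimation concrete, here is a minimal sketch that fits a one-predictor logistic model by plain gradient ascent on the log-likelihood. The toy data, the learning rate, and the number of steps are all assumptions made for illustration; statistical software typically uses faster numerical methods, and the slides do not say which one.

```python
import math


def logistic(z):
    """Equivalent to exp(z) / (1 + exp(z)); always strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0, b1, xs, ys):
    """Log-likelihood of binary outcomes ys given predictor values xs."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = logistic(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def fit_logistic(xs, ys, learning_rate=0.01, steps=2000):
    """Gradient ascent on the log-likelihood, starting from b0 = b1 = 0."""
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        residuals = [y - logistic(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += learning_rate * sum(residuals)
        b1 += learning_rate * sum(r * x for r, x in zip(residuals, xs))
    return b0, b1


# Hypothetical toy data (not from the slides).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 1, 0, 1]

b0, b1 = fit_logistic(xs, ys)
start_ll = log_likelihood(0.0, 0.0, xs, ys)
final_ll = log_likelihood(b0, b1, xs, ys)
print(f"b0={b0:.3f}, b1={b1:.3f}")
print(f"log-likelihood: start={start_ll:.3f}, after fitting={final_ll:.3f}")
```

The estimates are the values of b0 and b1 that make the observed outcomes most likely under the model, which is why the log-likelihood after fitting is higher than at the starting values.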

7.3: Linear Probability and Logistic Regression Models (6/8)
• Example: revisit the mortgage example

a. Estimate and interpret the logistic regression model.


b. For an applicant with a 30% income-to-loan ratio, predict loan approval
probabilities with down payments of 20% and 30%.
c. Compare the predicted probabilities based on the estimated logistic
regression model with those from the estimated linear probability
regression model.

7.3: Linear Probability and Logistic Regression Models (7/8)

$x_1 = 20$, $x_2 = 30$:
$\hat{p} = \frac{\exp(-9.3671 + 0.1349 \times 20 + 0.1782 \times 30)}{1 + \exp(-9.3671 + 0.1349 \times 20 + 0.1782 \times 30)} = 0.2103$
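The same calculation can be scripted so that both down payments are evaluated and set beside the linear probability predictions. This is a sketch under the assumption that the coefficients are exactly those shown in the two worked examples of this section; the helper names are illustrative.

```python
import math

# (intercept, down payment, income-to-loan ratio) from the worked examples.
LOGIT_COEFS = (-9.3671, 0.1349, 0.1782)
LPM_COEFS = (-0.8682, 0.0188, 0.0258)


def linear_index(coefs, x1, x2):
    """Compute b0 + b1*x1 + b2*x2 for a coefficient triple."""
    b0, b1, b2 = coefs
    return b0 + b1 * x1 + b2 * x2


def logistic_probability(x1, x2):
    """Predicted approval probability from the logistic regression model."""
    z = linear_index(LOGIT_COEFS, x1, x2)
    return math.exp(z) / (1.0 + math.exp(z))


def lpm_probability(x1, x2):
    """Predicted approval probability from the linear probability model."""
    return linear_index(LPM_COEFS, x1, x2)


# Income-to-loan ratio fixed at 30%; down payments of 20% and 30%.
for x1 in (20, 30):
    print(
        f"x1={x1}: logistic={logistic_probability(x1, 30):.4f}, "
        f"linear probability={lpm_probability(x1, 30):.4f}"
    )
```

Unlike the linear probability model, the logistic form stays between 0 and 1 for any input, including a 60% down payment.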

7.3: Linear Probability and Logistic Regression Models (8/8)
• We cannot assess the performance of the binary choice models using
the methods thus far.
• We assess accuracy of these models on the basis of the percentage of
correctly classified observations.
• Use the predicted probabilities to get {0,1} predicted values
– The predicted value is 1 if $\hat{y} \geq 0.5$.
– The predicted value is 0 if $\hat{y} < 0.5$.
– Can use cutoffs other than 0.5
• Then compute the accuracy rate:
$\text{Accuracy rate} = \frac{\text{Number of correct predictions}}{\text{Total number of observations}} \times 100$
• Report the number of correctly classified observations for both types of
outcomes.
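The classification and accuracy steps can be sketched as follows. The outcomes and predicted probabilities are made-up values for illustration only, and the function names are not from the slides.

```python
def classify(probabilities, cutoff=0.5):
    """Convert predicted probabilities to 0/1 predictions using a cutoff."""
    return [1 if p >= cutoff else 0 for p in probabilities]


def accuracy_rate(actual, predicted):
    """Percentage of observations that are classified correctly."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual) * 100


def correct_by_outcome(actual, predicted):
    """Counts of correctly classified 0s and correctly classified 1s."""
    correct_zeros = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    correct_ones = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    return correct_zeros, correct_ones


# Hypothetical observed outcomes and predicted probabilities.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
probabilities = [0.81, 0.32, 0.47, 0.66, 0.55, 0.12, 0.50, 0.29]

predicted = classify(probabilities)
print("Predicted classes:", predicted)
print("Accuracy rate (%):", accuracy_rate(actual, predicted))
print("Correct 0s and 1s:", correct_by_outcome(actual, predicted))
```

Reporting the two counts separately matters because a model can reach a high overall accuracy rate while classifying one of the two outcomes poorly.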

Exercises

7.4: Cross-Validation Methods (1/9)
• All of the model evaluation measures thus far assess predictability
in the sample data that was used to build the model.
• These measures do not help gauge how well an estimated model
will predict an unseen sample.
• It is possible a model performs well with the sample data used for
estimation, but then performs miserably once a new sample is
evaluated.
• Overfitting occurs when an estimated model describes quirks of
the data rather than the relationships between variables.
– Model becomes too complex
– Fails to describe the behavior in a new sample
– Predictive power for new samples is compromised

7.4: Cross-Validation Methods (2/9)
• We need a measure of the predictive power of a model based on a set of
data not used in estimation.
• Cross-validation is a technique that evaluates predictive models by
partitioning the sample.
– Training set: use to build/train the model
– Validation set: use to evaluate/validate the model
• Use the root mean square error (RMSE) on the validation set.
$RMSE = \sqrt{\frac{\sum (y - \hat{y})^2}{n^*}}$
– $\hat{y}$ is a true prediction for an observation in the validation set, since that observation was not used in estimation.
– $n^*$ is the number of observations in the validation set.
– RMSE is typically lower for the training set than for the validation set.
• For binary choice models, use the accuracy rate for the validation set.
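As a small illustration of the RMSE calculation, the sketch below applies the formula to a handful of validation observations. The numbers are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import math


def rmse(actual, predicted):
    """Root mean square error over the n* observations supplied."""
    n_star = len(actual)
    squared_errors = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    return math.sqrt(squared_errors / n_star)


# Hypothetical validation-set responses and their predictions.
y_valid = [10.0, 12.0, 9.0, 15.0]
y_pred = [11.0, 10.0, 9.0, 17.0]

# Squared errors are 1, 4, 0, 4; their mean is 2.25, so the RMSE is 1.5.
print("Validation RMSE:", rmse(y_valid, y_pred))
```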

7.4: Cross-Validation Methods (3/9)
• The holdout method partitions the sample data set into two
independent and mutually exclusive data sets.

A. Partition the sample into two parts: training and validation sets.
B. Use the training set to estimate competing models.
C. Use the estimates from the training set to predict the response in the
validation set.
D. Calculate RMSE (or accuracy rate) for each competing model. The
preferred model will have the smallest RMSE (or highest accuracy rate).
• We would like the model with the best performance in the training set to
also have the best performance in the validation set.
• Conflicting results are a sign that the model overfits the training set.
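Steps A through D can be sketched in plain Python. Everything specific here is an assumption for illustration: the toy data, the 7/3 split, and the two competing models (an intercept-only model and a simple linear regression), none of which come from the slides.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    return math.sqrt(
        sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / len(actual)
    )


def fit_mean_model(xs, ys):
    """Competing model 1: always predict the training mean of y."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear_model(xs, ys):
    """Competing model 2: simple linear regression fitted by least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return lambda x: b0 + b1 * x


# Hypothetical sample of 10 observations.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

# Step A: partition into training (first 7) and validation (last 3) sets.
split = 7
x_train, y_train = xs[:split], ys[:split]
x_valid, y_valid = xs[split:], ys[split:]

# Steps B and C: estimate each model on the training set, then predict
# the response in the validation set.
models = {"mean only": fit_mean_model, "linear": fit_linear_model}
results = {}
for name, fit in models.items():
    predict = fit(x_train, y_train)
    results[name] = rmse(y_valid, [predict(x) for x in x_valid])

# Step D: prefer the model with the smallest validation RMSE.
best = min(results, key=results.get)
print(results)
print("Preferred model:", best)
```

For binary choice models, the same structure applies with the accuracy rate in place of RMSE and the highest value preferred.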

7.4: Cross-Validation Methods (4/9)
• Example: recall the introductory case
• Model 1 used Size, Experience, Female, and Grad
• Model 2 used Size, Experience, Female, and Grad, and also includes the interactions of Female with Experience, Female with Grad, and Size with Experience.
• Use the holdout method to compare the predictability of both
models using the first 150 observations for training and the
remaining 50 observations for validation.

7.4: Cross-Validation Methods (5/9)
• Estimation based on the training set

7.4: Cross-Validation Methods (6/9)
• Validation results

Exercises
