Uploaded by ChristianAquino

BUS STAT CH 14

Copyright © 2015 McGraw-Hill Education. All rights reserved. No reproduction or distribution without the prior written consent of McGraw-Hill Education.

Multiple Regression and Model Building

14.1 The Multiple Regression Model and the Least Squares Point Estimate

14.2 Model Assumptions and the Standard Error

14.3 R² and Adjusted R²

14.4 The Overall F Test

14.5 Testing the Significance of an Independent Variable

14.6 Confidence and Prediction Intervals

14.7 The Sales Territory Performance Case: Evaluating Employee Performance

14.8 Using Dummy Variables to Model Qualitative Independent Variables

14.9 Using Squared and Interactive Terms

14.10 Model Building and the Effects of Multicollinearity

14.11 Residual Analysis in Multiple Regression

14.12 Logistic Regression

LO 14-1: Explain the multiple regression model and the related least squares point estimates.

14.1 The Multiple Regression Model and the Least Squares Point Estimate

Simple linear regression used one independent variable to explain the dependent variable

Some relationships are too complex to be described using a single independent variable

Multiple regression uses two or more independent variables to describe the dependent variable

This allows multiple regression models to handle more complex situations

There is no limit to the number of independent variables a model can use

Multiple regression has only one dependent variable

The linear regression model relating y to x1, x2,…, xk is y = β0 + β1x1 + β2x2 + … + βkxk + ε

µy = β0 + β1x1 + β2x2 + … + βkxk is the mean value of the dependent variable y when the values of the independent variables are x1, x2,…, xk

β0, β1, β2,…, βk are the unknown regression parameters relating the mean value of y to x1, x2,…, xk

ε is an error term that describes the effects on y of all factors other than the independent variables x1, x2,…, xk

The Least Squares Estimates and Point Estimation and Prediction

1. The estimation/prediction equation ŷ = b0 + b1x1 + b2x2 + … + bkxk is the point estimate of the mean value of the dependent variable when the values of the independent variables are x1, x2,…, xk

2. It is also the point prediction of an individual value of the dependent variable when the values of the independent variables are x1, x2,…, xk

3. b0, b1, b2,…, bk are the least squares point estimates of the parameters β0, β1, β2,…, βk

4. x1, x2,…, xk are specified values of the independent predictor variables x1, x2,…, xk
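
As a rough illustration of how the least squares point estimates b0, b1,…, bk might be computed, the sketch below solves the normal equations with plain Python. The function names `least_squares` and `predict` and the small data set are made up for this example; the slides do not prescribe any particular algorithm or software.

```python
def least_squares(X, y):
    """Return [b0, b1, ..., bk] minimizing the sum of squared errors.

    X is a list of rows [x1, ..., xk]; y is the list of observed values.
    """
    rows = [[1.0] + list(r) for r in X]      # prepend the intercept column
    p = len(rows[0])                         # p = k + 1 parameters
    # Augmented normal equations [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda q: abs(A[q][c]))
        A[c], A[piv] = A[piv], A[c]
        for q in range(p):
            if q != c:
                f = A[q][c] / A[c][c]
                A[q] = [a - f * b for a, b in zip(A[q], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(b, x):
    """Point estimate / point prediction: b0 + b1*x1 + ... + bk*xk."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [9, 8, 19, 18, 26]
b = least_squares(X, y)
y_hat_new = predict(b, [6, 2])
```

Because the made-up data contain no error term, the estimates recover the generating coefficients up to rounding. With real data the fitted values would differ from the observations.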

EXAMPLE 14.1 The Tasty Sub Shop Case

LO 14-2: Explain the assumptions behind multiple regression and calculate the standard error.

14.2 Model Assumptions and the Standard Error

The model is y = β0 + β1x1 + β2x2 + … + βkxk + ε

The assumptions for multiple regression are stated about the model error terms, the ε’s

1. Mean of Zero Assumption

The mean of the error terms is equal to 0

2. Constant Variance Assumption

The variance of the error terms, σ², is the same for every combination of values of x1, x2,…, xk

3. Normality Assumption

The error terms follow a normal distribution for every combination of values of x1, x2,…, xk

4. Independence Assumption

The values of the error terms are statistically independent of each other

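Sum of Squares

Sum of squared errors

$$SSE = \sum e_i^2 = \sum (y_i - \hat{y}_i)^2$$

Mean square error: point estimate of the residual variance σ²

$$s^2 = MSE = \frac{SSE}{n - (k + 1)}$$

Standard error: point estimate of the residual standard deviation σ

$$s = \sqrt{MSE} = \sqrt{\frac{SSE}{n - (k + 1)}}$$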
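
The following minimal sketch shows how SSE, the mean square error, and the standard error s fit together. The function name `standard_error` and the observed and fitted values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math


def standard_error(y, y_hat, k):
    """Return (SSE, MSE, s) for a model with k independent variables."""
    n = len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # sum of squared residuals
    mse = sse / (n - (k + 1))                              # point estimate of sigma^2
    return sse, mse, math.sqrt(mse)                        # s estimates sigma


# Hypothetical observed and fitted values for a model with k = 2.
y = [10, 12, 15, 19, 24, 30]
y_hat = [11, 11, 16, 18, 25, 29]
sse, mse, s = standard_error(y, y_hat, k=2)
```

Here each residual is +1 or -1, so SSE is 6, the divisor n - (k + 1) is 3, and s is the square root of 2.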

LO 14-3: Calculate and interpret the multiple and adjusted multiple coefficients of determination.

14.3 R² and Adjusted R²

This section can be read anytime after reading Section 14.1

1. Total variation is given by the formula Σ(yi - ȳ)²

2. Explained variation is given by the formula Σ(ŷi - ȳ)²

3. Unexplained variation is given by the formula Σ(yi - ŷi)²

4. Total variation is the sum of explained and unexplained variation

5. The multiple coefficient of determination R² is the ratio of explained variation to total variation

6. R² is the proportion of the total variation that is explained by the overall regression model

7. The multiple correlation coefficient R is the square root of R²

With simple linear regression, r would take on the sign of b1

There are multiple bi’s with multiple regression

For this reason, R is never negative

To interpret the direction of the relationship between the x’s and y, you must look to the sign of the appropriate bi coefficient

The Adjusted R²

Adding an independent variable to a multiple regression model will raise R²

R² will rise slightly even if the new variable has no relationship to y

The adjusted R² corrects this tendency in R²

As a result, it gives a better estimate of the importance of the independent variables
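
The quantities in this section can be sketched in a few lines of Python. The slides do not show the adjusted R² formula, so the sketch assumes the commonly used form 1 - (1 - R²)(n - 1)/(n - (k + 1)). It also takes explained variation as total minus unexplained variation, which holds for a least squares fit with an intercept. The function name and the numbers are hypothetical.

```python
import math


def r_squared_summary(y, y_hat, k):
    """Return (R^2, R, adjusted R^2) for a model with k independent variables."""
    n = len(y)
    y_bar = sum(y) / n
    total = sum((yi - y_bar) ** 2 for yi in y)
    unexplained = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    explained = total - unexplained          # assumes total = explained + unexplained
    r2 = explained / total
    # Assumed adjustment formula; it penalizes each added variable.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - (k + 1))
    return r2, math.sqrt(r2), adj_r2


# Hypothetical values: total variation 40, unexplained variation 4.
y = [2, 4, 6, 8, 10]
y_hat = [3, 3, 6, 9, 9]
r2, r, adj_r2 = r_squared_summary(y, y_hat, k=2)
```

With these made-up numbers R² is 0.9 while the adjusted value is 0.8, which shows how the adjustment lowers the figure when the sample is small relative to the number of variables.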

LO 14-4: Test the significance of a multiple regression model by using an F test.

14.4 The Overall F Test

To test H0: β1 = β2 = … = βk = 0 versus Ha: At least one of β1, β2,…, βk ≠ 0

Test statistic

$$F(\text{model}) = \frac{(\text{Explained variation})/k}{(\text{Unexplained variation})/[n - (k + 1)]}$$

Reject H0 in favor of Ha if F(model) > Fα* or p-value < α

*Fα is based on k numerator and n-(k+1) denominator degrees of freedom
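
A small sketch of the F(model) calculation follows. The function name `overall_f` and the input values are hypothetical. The critical value or p-value would still have to come from an F table or a statistics package, which this sketch does not attempt.

```python
def overall_f(explained, unexplained, n, k):
    """F(model) = (explained / k) / (unexplained / (n - (k + 1)))."""
    return (explained / k) / (unexplained / (n - (k + 1)))


# Hypothetical variations for n = 5 observations and k = 2 variables.
f_model = overall_f(explained=36.0, unexplained=4.0, n=5, k=2)
degrees_of_freedom = (2, 5 - (2 + 1))   # (numerator, denominator)
```

The statistic is then compared with the F point having k numerator and n - (k + 1) denominator degrees of freedom.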

LO 14-5: Test the significance of a single independent variable.

14.5 Testing the Significance of an Independent Variable

A variable in a multiple regression model is not likely to be useful unless there is a significant relationship between it and y

To test significance, we use the null hypothesis H0: βj = 0 versus the alternative hypothesis Ha: βj ≠ 0

Testing Significance of an Independent Variable #2

Testing Significance of an Independent Variable #3

It is customary to test the significance of every independent variable

If we can reject H0: βj = 0 at α = 0.05, we have strong evidence the independent variable xj is significantly related to y

If we can reject H0: βj = 0 at α = 0.01, we have very strong evidence the independent variable xj is significantly related to y

The smaller the significance level α at which H0 can be rejected, the stronger the evidence that xj is significantly related to y

A Confidence Interval for the Regression Parameter βj

If the regression assumptions hold, a 100(1 - α)% confidence interval for βj is [bj ± tα/2 sbj]

tα/2 is based on n - (k + 1) degrees of freedom
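
The interval can be sketched as below. The function names and all numbers are hypothetical, and the t point is passed in as an argument because it would normally be read from a t table with n - (k + 1) degrees of freedom. The helper `t_statistic` uses the usual ratio bj / sbj for testing H0: βj = 0, which the extracted slides do not show.

```python
def coefficient_interval(b_j, s_bj, t_crit):
    """Return the interval [b_j - t*s_bj, b_j + t*s_bj]."""
    half_width = t_crit * s_bj
    return b_j - half_width, b_j + half_width


def t_statistic(b_j, s_bj):
    """Assumed test statistic for H0: beta_j = 0."""
    return b_j / s_bj


# Hypothetical estimate, standard error, and t point.
lower, upper = coefficient_interval(b_j=2.0, s_bj=0.5, t_crit=2.0)
t_value = t_statistic(b_j=2.0, s_bj=0.5)
```

If the interval excludes 0, as it does for these made-up values, the same data would also reject H0: βj = 0 at the matching significance level.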

LO 14-6: Find and interpret a confidence interval for a mean value and a prediction interval for an individual value.

14.6 Confidence and Prediction Intervals

The point on the regression line corresponding to particular values x1, x2,…, xk of the independent variables is ŷ = b0 + b1x1 + b2x2 + … + bkxk

It is unlikely that this value will equal the mean value of y for these x values

Therefore, we need to place bounds on how far away the predicted value might be

We can do this by calculating a confidence interval for the mean value of y and a prediction interval for an individual value of y

Distance Value

Both the confidence interval for the mean value of y and the prediction interval for an individual value of y employ a quantity called the distance value

With simple regression, we were able to calculate the distance value fairly easily

However, for multiple regression, calculating the distance value requires matrix algebra

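A Confidence Interval for a Mean Value of y

Assume the regression assumptions hold

Confidence interval

$$[\hat{y} \pm t_{\alpha/2}\, s_{\hat{y}}], \qquad s_{\hat{y}} = s\sqrt{\text{Distance value}}$$

Prediction interval

$$[\hat{y} \pm t_{\alpha/2}\, s_{(y-\hat{y})}], \qquad s_{(y-\hat{y})} = s\sqrt{1 + \text{Distance value}}$$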
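
To make the difference between the two intervals concrete, here is a minimal sketch that takes the distance value as given (statistical software normally reports it, since computing it requires matrix algebra). The function names and the inputs are hypothetical.

```python
import math


def mean_confidence_interval(y_hat, t_crit, s, distance_value):
    """Interval for the mean value of y: y_hat +/- t * s * sqrt(distance)."""
    half = t_crit * s * math.sqrt(distance_value)
    return y_hat - half, y_hat + half


def prediction_interval(y_hat, t_crit, s, distance_value):
    """Interval for an individual y: y_hat +/- t * s * sqrt(1 + distance)."""
    half = t_crit * s * math.sqrt(1 + distance_value)
    return y_hat - half, y_hat + half


# Hypothetical point estimate, t point, standard error, and distance value.
ci = mean_confidence_interval(100.0, 2.0, 5.0, 0.25)
pi = prediction_interval(100.0, 2.0, 5.0, 0.25)
```

The extra 1 under the square root makes the prediction interval wider than the confidence interval for the same x values.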

14.7 The Sales Territory Performance Case: Evaluating Employee Performance

y = Yearly sales of the company’s product

x1 = Number of months the representative has been employed

x2 = Sales of products in the sales territory

x3 = Dollar advertising expenditure in the territory

x4 = Weighted average of the company’s market share in territory for the previous four years

x5 = Change in the company’s market share in the territory over the previous four years

Partial Excel Output of a Regression Analysis of the Sales Territory Performance Data

LO 14-7: Use dummy variables to model qualitative independent variables.

14.8 Using Dummy Variables to Model Qualitative Independent Variables

So far, we have only looked at including quantitative data in a regression model

However, we may wish to include descriptive qualitative data as well

For example, we might want to include the gender of respondents

We can model the effects of different levels of a qualitative variable by using what are called dummy variables, also known as indicator variables

How to Construct Dummy Variables

A dummy variable always has a value of either 0 or 1

For example, to model sales at two locations, we would code the first location as 0 and the second as 1

Operationally, it does not matter which is coded 0 and which is coded 1

What If We Have More Than Two Categories?

Consider having three categories, say A, B, and C

We cannot code this using one dummy variable

Coding A=0, B=1, and C=2 would be invalid because it assumes the difference between A and B is the same as the difference between B and C

We must use multiple dummy variables

Specifically, k categories require k-1 dummy variables

For A, B, and C, we would need two dummy variables

x1 is 1 for A, zero otherwise

x2 is 1 for B, zero otherwise

If x1 and x2 are both zero, the category must be C

This is why the third dummy variable is not needed
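
The coding rule above can be sketched as follows. The function name `dummy_code` and the sample observations are hypothetical. The baseline category is the one that receives all zeros.

```python
def dummy_code(values, baseline):
    """Return (levels, rows): one 0/1 column per non-baseline category."""
    levels = sorted(set(values) - {baseline})     # k - 1 coded categories
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# Hypothetical observations of a three-category variable, with C as baseline.
levels, rows = dummy_code(["A", "B", "C", "A"], baseline="C")
```

For three categories the result has two columns, and the observation in category C is coded as zeros in both.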

Interaction Models

So far, we have only considered dummy variables as stand-alone variables

The model so far is y = β0 + β1x + β2D + ε, where D is a dummy variable

However, we can also look at the interaction between a dummy variable and other variables

That model would take the form y = β0 + β1x + β2D + β3xD + ε

With an interaction term, both the intercept and the slope are shifted
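
A short sketch shows why both the intercept and the slope shift. Setting D = 0 or D = 1 in the interaction model and collecting terms gives a separate line for each group. The coefficient values below are hypothetical.

```python
def line_for_group(b0, b1, b2, b3, d):
    """Return (intercept, slope) of y-hat = b0 + b1*x + b2*D + b3*x*D for D = d."""
    intercept = b0 + b2 * d      # b2 shifts the intercept when D = 1
    slope = b1 + b3 * d          # b3 shifts the slope when D = 1
    return intercept, slope


# Hypothetical coefficient estimates.
group0 = line_for_group(10.0, 2.0, 5.0, 0.5, d=0)
group1 = line_for_group(10.0, 2.0, 5.0, 0.5, d=1)
```

Without the xD term (b3 = 0) the two groups would share the same slope and differ only in intercept.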

LO 14-8: Use squared and interaction variables.

14.9 Using Squared and Interaction Variables

The quadratic regression model is y = β0 + β1x + β2x² + ε, where

1. β0 + β1x + β2x² is μy

2. β0, β1, and β2 are the regression parameters

3. ε is an error term

Regression models often contain interaction variables

These are formed by multiplying two independent variables together

Consider a model where x3 and x4 interact and x3 is used as a quadratic

LO 14-9: Describe multicollinearity and build a multiple regression model.

14.10 Model Building and the Effects of Multicollinearity

Multicollinearity: when “independent” variables are related to one another

Considered severe when the simple correlation exceeds 0.9

Even moderate multicollinearity can be a problem

Another measurement is the variance inflation factor (VIF)

$$VIF_j = \frac{1}{1 - R_j^2}$$

where R²j is the multiple coefficient of determination for the regression of xj on the other independent variables

Multicollinearity is a problem when VIF > 10

Moderate problem for VIF > 5
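
The variance inflation factor and the rule-of-thumb cutoffs from this slide can be sketched as below. The function names and the R² inputs are hypothetical. Each R² value would come from regressing one independent variable on all the others.

```python
def vif(r2_j):
    """Variance inflation factor 1 / (1 - R_j^2) for one independent variable."""
    return 1.0 / (1.0 - r2_j)


def classify(vif_value):
    """Apply the slide's cutoffs: above 10 is a problem, above 5 is moderate."""
    if vif_value > 10:
        return "severe"
    if vif_value > 5:
        return "moderate"
    return "low"


# Hypothetical R^2 values from regressing each x_j on the other x's.
labels = [classify(vif(r2)) for r2 in (0.5, 0.85, 0.95)]
```

An R² of 0.5 gives a VIF of 2, while values of 0.85 and 0.95 push the VIF past the moderate and severe cutoffs respectively.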

Effect of Adding Independent Variable

Adding any independent variable will increase R², even an unimportant independent variable

Thus, R² cannot tell us that adding an independent variable is undesirable

A Better Criterion is the Standard Error

A better criterion is the size of the standard error s

If s increases when an independent variable is added, we should not add that variable

However, decreasing s alone is not enough

An independent variable should only be included if it reduces s enough to offset the higher t value and reduces the length of the desired prediction interval for y

$$s = \sqrt{\frac{SSE}{n - (k + 1)}}$$

C Statistic

Another quantity for comparing regression models is called the C (a.k.a. Cp) statistic

First, calculate the mean square error for the model containing all p potential independent variables (s²p)

Next, calculate SSE for a reduced model with k independent variables

$$C = \frac{SSE}{s_p^2} - [n - 2(k + 1)]$$
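
The two-step calculation above can be sketched as a single function. The name `c_statistic` and the numbers are hypothetical, chosen so that C comes out equal to k + 1.

```python
def c_statistic(sse_reduced, s2_full, n, k):
    """C = SSE / s_p^2 - [n - 2(k + 1)] for a reduced model with k variables."""
    return sse_reduced / s2_full - (n - 2 * (k + 1))


# Hypothetical values: n = 25 observations, reduced model with k = 3 variables,
# SSE of the reduced model 84, mean square error of the full model 4.
c = c_statistic(sse_reduced=84.0, s2_full=4.0, n=25, k=3)
```

Here C equals 4, which matches k + 1 for this reduced model.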

C Statistic Continued

We want the value of C to be small

Adding unimportant independent variables will raise the value of C

While we want C to be small, we also wish to find a model for which C roughly equals k+1

A model with C substantially greater than k+1 has substantial bias and is undesirable

If a model has a small value of C and C for this model is less than k+1, then it is not biased and the model should be considered desirable

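The Partial F Test: An F Test for a Portion of a Regression Model

To test

H0: All of the βj coefficients corresponding to the independent variables in the subset are zero

Ha: At least one of the βj coefficients is not equal to zero

$$F(\text{partial}) = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]}$$

Here SSE_C is the unexplained variation for the complete model with k independent variables, and SSE_R is the unexplained variation for the reduced model with g independent variables

Reject H0 in favor of Ha if F(partial) > Fα or p-value < α

Fα is based on k-g numerator and n-(k+1) denominator degrees of freedom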
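
As a rough sketch of the partial F calculation, the function below assumes the complete model has k independent variables and the reduced model has g, so that k - g variables are being tested. The function name and the sums of squares are hypothetical.

```python
def partial_f(sse_reduced, sse_complete, n, k, g):
    """Partial F statistic for testing the k - g variables left out of the reduced model."""
    numerator = (sse_reduced - sse_complete) / (k - g)
    denominator = sse_complete / (n - (k + 1))
    return numerator / denominator


# Hypothetical values: n = 25, complete model k = 4, reduced model g = 2.
f_partial = partial_f(sse_reduced=120.0, sse_complete=80.0, n=25, k=4, g=2)
df = (4 - 2, 25 - (4 + 1))   # (numerator, denominator) degrees of freedom
```

A large value means that dropping the subset raised the unexplained variation by more than chance alone would suggest.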

LO 14-10: Use residual analysis to check the assumptions of multiple regression.

14.11 Residual Analysis in Multiple Regression

For an observed value of yi, the residual is ei = yi - ŷi = yi - (b0 + b1xi1 + … + bkxik)

If the regression assumptions hold, the residuals should look like a random sample from a normal distribution with mean 0 and variance σ²

Residual plots

Residuals versus each independent variable

Residuals versus predicted y’s

Residuals in time order (if the response is a time series)
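
The residual definition can be sketched directly. The function name `residuals`, the coefficients, and the data are hypothetical. The returned list is what would be plotted against each independent variable, against the predicted values, or in time order.

```python
def residuals(b, X, y):
    """Return e_i = y_i - (b0 + b1*x_i1 + ... + bk*x_ik) for each observation."""
    fitted = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]
    return [yi - fi for yi, fi in zip(y, fitted)]


# Hypothetical estimates b0 = 1, b1 = 2, b2 = 3 and three observations.
e = residuals([1, 2, 3], [[1, 1], [2, 0], [0, 2]], [7, 4, 8])
```

For these made-up numbers the fitted values are 6, 5, and 7, so the residuals are 1, -1, and 1.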

Residual Plots for the Sales Territory Performance Model
