
Multiple Linear Regression

Hector Lemus

Spring 2016

Multiple linear regression relates a set of independent variables to a single continuous dependent variable. Its purposes are:

1. To assess the relationship between the dependent variable and the independent variables simultaneously, taking into account the intercorrelations among the independent variables.
2. To examine the effect of one or more variables on the dependent variable after controlling (adjusting) for the effects of the other variables in the model.
3. To assess the interaction of two or more independent variables with respect to the dependent variable.
4. To develop a prediction equation.

Example

Dependent variable: systolic blood pressure (SBP)

Independent variables:

1. BMI
2. Age
3. Smoking history: 0 = Nonsmoker, 1 = Current or Previous Smoker

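Hypothetical Example

Consider a trial in patients with hypertension. Half of the patients are randomly assigned to an active drug and half are assigned to placebo.

The dependent variable is the change in diastolic blood pressure (DBP) from the baseline evaluation to the 6-month evaluation.

Suppose we observe the following mean changes in DBP, stratified by age:

| Age (years) | Active | Placebo |
|---|---|---|
| < 60 | -10 | -2 |
| ≥ 60 | -1 | -2 |

Age modifies the effect of the active drug, so its effectiveness varies by age. Relative to placebo, the active drug lowers mean DBP by an additional 8 units in patients under 60. It shows no benefit in patients 60 and older (-1 vs. -2).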

Hypothetical Example (cont.)

We can use multiple linear regression to model the change in DBP and test whether the interaction is statistically significant.

Independent variables:

1. Age
2. Drug group (Active/Placebo)
3. Interaction term (to be discussed)

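Notation:

Let $Y$ be the dependent variable.

Let $X_1, \ldots, X_k$ be the independent variables.

Model:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + E = \beta_0 + \sum_{i=1}^{k} \beta_i X_i + E$$

Here $E$ is the error term. It is a random variable, and we must make an assumption about its distribution.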

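Assumptions:

1. $Y$ is a random variable with a distribution of values for each specific combination of the $X$s.
2. The $Y$ observations are statistically independent of one another.
3. The mean value of $Y$ for each specific combination of the $X$s is given by $\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$.
4. The variance of $Y$ is the same for any fixed combination of the $X$s.
5. $Y$ is normally distributed for each specific combination of the $X$s.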

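Basic idea: find estimates of the $\beta$s that minimize the sum of the squared distances between the observed and corresponding predicted values.

Let the predicted value be

$$\hat{Y} = \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2 + \cdots + \hat\beta_k X_k$$

Find the estimated parameters $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$ that minimize

$$SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \cdots - \hat\beta_k X_{ki})^2$$

SSE is also called the residual sum of squares.

The residual $\hat{E} = Y - \hat{Y}$ is an estimate for $E$.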
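The least-squares criterion above can be sketched in plain Python by solving the normal equations $X^\top X \hat\beta = X^\top Y$. When $X^\top X$ is invertible, their solution minimizes SSE. This is a minimal illustration, not how SAS computes its estimates. The function names `ols` and `sse` and the small data set are hypothetical; the data are built from a known equation so that the correct answer is obvious.

```python
def ols(xs, y):
    """Least-squares estimates [b0, b1, ..., bk] from the normal equations X'X b = X'y.

    xs: list of predictor columns [X1, ..., Xk], each a list of n numbers.
    y:  list of n responses.
    """
    n = len(y)
    cols = [[1.0] * n] + [[float(v) for v in c] for c in xs]  # leading 1s = intercept
    p = len(cols)
    xtx = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(p)] for i in range(p)]
    xty = [sum(a * b for a, b in zip(cols[i], y)) for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix [X'X | X'y].
    aug = [row + [rhs] for row, rhs in zip(xtx, xty)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(p):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[c])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def sse(xs, y, b):
    """Residual sum of squares: sum over i of (Y_i - Yhat_i)^2."""
    total = 0.0
    for i, yi in enumerate(y):
        yhat = b[0] + sum(bj * col[i] for bj, col in zip(b[1:], xs))
        total += (yi - yhat) ** 2
    return total


# Hypothetical data built exactly from Y = 1 + 2*X1 - 0.5*X2 (no error term),
# so least squares should recover (1, 2, -0.5) with a residual SS of about 0.
x1 = [0, 1, 2, 3, 4, 5]
x2 = [1, 0, 2, 1, 3, 2]
y = [1 + 2 * a - 0.5 * c for a, c in zip(x1, x2)]
b = ols([x1, x2], y)
print([round(v, 6) for v in b], round(sse([x1, x2], y, b), 10))
```

Solving the normal equations directly is fine for a toy example. For real, possibly ill-conditioned data, numerically stabler approaches such as a QR decomposition are usually preferred.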

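Sums of squares:

$$SSY = \sum_{i=1}^{n} (Y_i - \bar{Y})^2, \qquad SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$

$$SS_{Reg} = SSY - SSE, \qquad SS_{Res} = SSE$$

ANOVA table:

| Source | df | SS | MS | F |
|---|---|---|---|---|
| Regression | $k$ | $SSY - SSE$ | $MS_{Reg} = (SSY - SSE)/k$ | $F = MS_{Reg}/MS_{Res}$ |
| Residual | $n-k-1$ | $SSE = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$ | $MS_{Res} = SSE/(n-k-1)$ | |
| Total | $n-1$ | $SSY = \sum_{i=1}^{n}(Y_i - \bar{Y})^2$ | | |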
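The ANOVA table can be assembled from just SSY, SSE, n and k. The sketch below is a hypothetical helper named `anova`. It fills in the table and checks it against the SBP example used later in these notes: SSY = 6425.96875 (Corrected Total), SSE = 1536.14305 (Error), n = 32, k = 3.

```python
def anova(ssy, sse, n, k):
    """Overall-regression ANOVA quantities for a model with k predictors and an intercept."""
    ss_reg = ssy - sse
    df_reg, df_res, df_tot = k, n - k - 1, n - 1
    ms_reg = ss_reg / df_reg
    ms_res = sse / df_res
    return {
        "SS_Reg": ss_reg, "SS_Res": sse, "SSY": ssy,
        "df": (df_reg, df_res, df_tot),
        "MS_Reg": ms_reg, "MS_Res": ms_res,
        "F": ms_reg / ms_res,
    }


# SBP example: Corrected Total and Error sums of squares from the SAS output.
table = anova(ssy=6425.96875, sse=1536.14305, n=32, k=3)
print(table["df"], round(table["MS_Reg"], 5), round(table["MS_Res"], 5), round(table["F"], 2))
```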

Coefficient of Determination

$R^2$ is the proportion of the variability of $Y$ that can be explained by the model:

$$R^2 = \frac{SSY - SSE}{SSY}, \qquad 0 \le R^2 \le 1$$
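As a quick check, the formula reproduces the R-Square value in the SAS output for the SBP example. The sketch also computes the adjusted $R^2$ that SAS prints as Adj R-Sq. It assumes the usual definition for a model with an intercept, $1 - (1 - R^2)(n-1)/(n-k-1)$, which these notes do not define. The function names are hypothetical.

```python
def r_squared(ssy, sse):
    """Proportion of the variability of Y explained by the model."""
    return (ssy - sse) / ssy


def adj_r_squared(ssy, sse, n, k):
    """R^2 penalized for the number of predictors k (model with an intercept assumed)."""
    return 1 - (1 - r_squared(ssy, sse)) * (n - 1) / (n - k - 1)


r2 = r_squared(6425.96875, 1536.14305)
adj = adj_r_squared(6425.96875, 1536.14305, n=32, k=3)
print(f"{r2:.4f} {adj:.4f}")  # 0.7609 0.7353
```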

Three test types:

1. Overall test: Does the set of independent variables taken together explain a significant amount of the variability in Y?

   Example: Taken together, do BMI, Age and Smoking History explain a significant amount of the variability of SBP?

2. Test for the addition of a single variable: Given that the other independent variables are already in the model, does the addition of one variable explain a significant amount of the variability of Y? This evaluates the relationship between one independent variable and Y after controlling (adjusting) for the other variables in the model.

   Example: Given that Age and Smoking History are in the model, what is the relationship between BMI and SBP?

3. Test for the addition of a group of variables: Given a set of independent variables in the model, does the addition of another set of variables explain a significant amount of the variability of Y?

   Example: testing a set of variables after adjusting for known factors related to DBP, such as Age and BMI.

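For example, suppose we test the association between BMI and SBP after adjusting for Age and Smoking History:

$$H_0: \beta_1 = 0$$

Full model: $SBP = \beta_0 + \beta_1(\text{BMI}) + \beta_2(\text{Age}) + \beta_3(\text{Smoking History}) + E$

If $H_0$ is true, the reduced model

$$SBP = \beta_0 + \beta_2(\text{Age}) + \beta_3(\text{Smoking History}) + E$$

is appropriate.

The concepts of nested (full and reduced) models apply to all of the tests that we discuss.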

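Overall test: three equivalent statements of the null hypothesis

1. H0: The k independent variables taken together do not explain a significant amount of the variability in Y.
2. H0: The overall regression using the k independent variables is not statistically significant.
3. $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$

Full model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + E$

Reduced model: $Y = \beta_0 + E$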

Test statistic:

$$F = \frac{MS_{Reg}}{MS_{Res}}$$

Reject $H_0$ if $F > F_{k,\, n-k-1,\, 1-\alpha}$. This is the $100(1-\alpha)$ percentile of the F distribution with $k$ and $n-k-1$ degrees of freedom, where $\alpha$ is our chosen level of significance.

The percentile is the critical value or critical point.

Alternatively, compute the p-value and compare it to the $\alpha$ level.

SBP Example

Determine whether BMI, age and smoking history taken together account for a significant amount of the variability of SBP.

Y: SBP, X1: BMI, X2: Age, X3: Smoking History

n = 32 subjects, k = 3

Full model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + E$

$H_0: \beta_1 = \beta_2 = \beta_3 = 0$

Reduced model: $Y = \beta_0 + E$

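SAS Output

The REG Procedure
Model: MODEL1
Dependent Variable: SBP Systolic Blood Pressure (mmHg)

Number of Observations Read: 32
Number of Observations Used: 32

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 3 | 4889.82570 | 1629.94190 | 29.71 | <.0001 |
| Error | 28 | 1536.14305 | 54.86225 | | |
| Corrected Total | 31 | 6425.96875 | | | |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Root MSE | 7.40691 | R-Square | 0.7609 |
| Dependent Mean | 144.53125 | Adj R-Sq | 0.7353 |
| Coeff Var | 5.12478 | | |

Critical values of F with 3 and 28 degrees of freedom:

- At $\alpha = 0.05$, $F_{3,28,0.95} = 2.95$
- At $\alpha = 0.01$, $F_{3,28,0.99} = 4.57$
- At $\alpha = 0.001$, $F_{3,28,0.999} = 7.19$

F = 29.71 exceeds all three critical values (p < .0001). Reject H0 and conclude that, taken together, the 3 variables account for a significant amount of the variability of SBP.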
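The remaining summary statistics in the output are derived from the ANOVA table:

- Root MSE is the square root of the error mean square.
- Coeff Var is Root MSE expressed as a percentage of the Dependent Mean.

The sketch below recomputes them, together with the overall F test against the three critical values quoted above. Variable names are illustrative only.

```python
import math

ms_res = 1536.14305 / 28                  # Error mean square (MSE)
root_mse = math.sqrt(ms_res)              # estimated error standard deviation
dep_mean = 144.53125                      # Dependent Mean from the SAS output
coeff_var = 100 * root_mse / dep_mean     # Root MSE as a percent of the mean

f_value = (4889.82570 / 3) / ms_res       # MS_Reg / MS_Res
crit = {0.05: 2.95, 0.01: 4.57, 0.001: 7.19}   # F(3, 28) critical values from the slide
decisions = {alpha: f_value > c for alpha, c in crit.items()}   # True = reject H0
print(round(root_mse, 5), round(coeff_var, 5), round(f_value, 2), decisions)
```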

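The regression sum of squares can be broken down into components that can be used to test hypotheses about individual variables.

One type of breakdown is sequential (variables added in order), called Type I in SAS.

X1: BMI, X2: Age, X3: Smoking History

| Source | Term | df | SS |
|---|---|---|---|
| Regression | $X_1$ | 1 | 3537.95 |
| | $X_2 \mid X_1$ | 1 | 582.65 |
| | $X_3 \mid X_1, X_2$ | 1 | 769.23 |
| Residual | | 28 | 1536.14 |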

SS(X1)

This may be used to test whether BMI is linearly related to SBP without adjusting for any other variables.

Since, technically, X2 and X3 are not in the model, pool their terms with the residual:

SSRes = 1536.14 + 582.65 + 769.23 = 2888.02

dfRes = 28 + 1 + 1 = 30

Test $H_0: \beta_1 = 0$ using $Y = \beta_0 + \beta_1 X_1 + E$:

$$F = \frac{3537.95/1}{2888.02/30} = \frac{3537.95}{96.27} = 36.75$$

SS(X2|X1)

The extra sum of squares explained by adding Age to the model given BMI already in the model.

Pooled error term:

SSRes = 1536.14 + 769.23 = 2305.37

dfRes = 28 + 1 = 29

Full: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + E$

Reduced: $Y = \beta_0 + \beta_1 X_1 + E$

$$F = \frac{582.65/1}{2305.37/29} = \frac{582.65}{79.50} = 7.33$$

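SS(X3|X1, X2)

The extra sum of squares explained by adding Smoking History to the model given BMI and Age already in the model.

Full: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + E$

Reduced: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + E$

$H_0: \beta_3 = 0$ [Smoking history is not associated with SBP after adjusting for BMI and Age.]

No pooling is needed here because all three variables are in the model:

$$F = \frac{769.23/1}{1536.14/28} = \frac{769.23}{54.86} = 14.02$$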
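The three sequential tests above follow one rule. Each Type I term is tested against a residual into which every term entered after it has been pooled. A minimal sketch of that rule, using the Type I sums of squares from these notes (the variable names are illustrative):

```python
# Type I (sequential) SS in entry order: BMI, then Age, then Smoking History.
type1 = [3537.95, 582.65, 769.23]
ss_res, df_res = 1536.14, 28          # residual of the model containing all three

results = []
for j, ss_term in enumerate(type1):
    later = type1[j + 1:]             # terms entered after this one
    pooled_ss = ss_res + sum(later)   # pool them into the error term
    pooled_df = df_res + len(later)   # each pooled term contributes 1 df
    f = (ss_term / 1) / (pooled_ss / pooled_df)
    results.append((round(pooled_ss, 2), pooled_df, round(f, 2)))
    print(f"X{j + 1}: SSRes = {pooled_ss:.2f}, df = {pooled_df}, F = {f:.2f}")
```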

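Test for the addition of a single variable

Full model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \beta^* X^* + E$

H0: The addition of $X^*$ to the model does not explain a significant amount of the variability of Y in the presence of $X_1, X_2, \ldots, X_p$.

$H_0: \beta^* = 0$

Reduced model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + E$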

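To construct the partial F test, you need the extra sum of squares for $X^*$. Denote:

$$SS(X^* \mid X_1, X_2, \ldots, X_p) = \text{RegSS}(X_1, X_2, \ldots, X_p, X^*) - \text{RegSS}(X_1, X_2, \ldots, X_p) = \text{RegSS(Full)} - \text{RegSS(Reduced)}$$

So,

$$F(X^* \mid X_1, \ldots, X_p) = \frac{SS(X^* \mid X_1, \ldots, X_p)}{MS_{Res}(\text{Full})}, \qquad MS_{Res}(\text{Full}) = \frac{SS_{Res}(\text{Full})}{n-p-2}$$

Reject $H_0$ if $F(X^* \mid X_1, \ldots, X_p) > F_{1,\, n-p-2,\, 1-\alpha}$.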
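The partial F recipe above needs only the two regression sums of squares and the full model's residual SS. A minimal sketch, using a hypothetical helper `partial_f`, applied to Smoking History given BMI and Age (p = 2). Here RegSS(Reduced) = SS(X1) + SS(X2 | X1) from the Type I breakdown.

```python
def partial_f(regss_full, regss_reduced, ssres_full, n, p):
    """Partial F for adding one variable X* to a model with p predictors.

    The full model has p + 1 predictors, so its residual df is n - p - 2.
    Returns the statistic and its (numerator, denominator) degrees of freedom.
    """
    extra_ss = regss_full - regss_reduced      # SS(X* | X1, ..., Xp)
    ms_res_full = ssres_full / (n - p - 2)
    return extra_ss / ms_res_full, (1, n - p - 2)


# Smoking history given BMI and Age: RegSS(X1, X2) = SS(X1) + SS(X2 | X1).
f, df = partial_f(4889.83, 3537.95 + 582.65, 1536.14, n=32, p=2)
print(round(f, 2), df)
```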

Example 1

Test whether smoking history is related to SBP after controlling for Age and BMI.

Full model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + E$

$H_0: \beta_3 = 0$

SS(X3|X1, X2) = 769.23

MSRes(X1, X2, X3) = 1536.14/28 = 54.86

F = 769.23/54.86 = 14.02

Since F = 14.02 exceeds $F_{1,28,0.95} = 4.20$, reject H0 and conclude that smoking history is significantly related to SBP after adjusting for BMI and Age.

Example 2

Test the relationship of BMI to SBP controlling for Age and Smoking history.

$H_0: \beta_1 = 0$

Full: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + E$

Reduced: $Y = \beta_0 + \beta_2 X_2 + \beta_3 X_3 + E$

We know that SS(X1, X2, X3) = 4889.83 from the SAS output.

However, we would have to find SS(X2, X3) by fitting a model with only X2 and X3 in it.

Example 2 (cont.)

SS(X1|X2, X3) = 4889.83 - 4689.69 = 200.14

This is the marginal sum of squares; SAS can provide this information.

F(X1|X2, X3) = 200.14/54.86 = 3.65

$F_{1,28,0.90} = 2.89$

$F_{1,28,0.95} = 4.20$

Since 2.89 < 3.65 < 4.20, the p-value is between 0.05 and 0.10. At $\alpha = 0.05$ there is no evidence to suggest a significant relationship between SBP and BMI adjusting for Age and Smoking history.
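The Example 2 calculation can be sketched as follows. The code uses the two quoted critical values to bracket the p-value, since no F-distribution function is available in the Python standard library. SS(X2, X3) = 4689.69 is the value quoted above from the reduced-model fit.

```python
ss_full = 4889.83           # SS(X1, X2, X3), from the SAS output
ss_reduced = 4689.69        # SS(X2, X3), from fitting Age + Smoking History only
ms_res_full = 1536.14 / 28  # MSRes of the full model

extra = ss_full - ss_reduced        # SS(X1 | X2, X3)
f = extra / ms_res_full

f_90, f_95 = 2.89, 4.20     # F(1, 28) critical values quoted on the slide
if f > f_95:
    verdict = "p < 0.05: reject H0"
elif f > f_90:
    verdict = "0.05 < p < 0.10: do not reject H0 at alpha = 0.05"
else:
    verdict = "p > 0.10: do not reject H0"
print(round(extra, 2), round(f, 2), verdict)
```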

A T-test Equivalent

An equivalent test to the partial F test.

Full model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \beta^* X^* + E$

Test: $H_0: \beta^* = 0$

Could use $F(X^* \mid X_1, X_2, \ldots, X_p)$ or, equivalently,

$$T = \frac{\hat\beta^*}{s_{\hat\beta^*}}$$

where $\hat\beta^*$ is the estimated regression parameter and $s_{\hat\beta^*}$ is its estimated standard error.

Reject $H_0$ if $|T| > t_{n-p-2,\, 1-\alpha/2}$.

Example 2 (again)

Relationship of BMI to SBP adjusting for Age and Smoking History.

Parameter Estimates

| Variable | Label | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|---|---|
| Intercept | Intercept | 1 | 45.10319 | 10.76488 | 4.19 | 0.0003 |
| BMI | Body Mass Index | 1 | 1.22225 | 0.63993 | 1.91 | 0.0664 |
| AGE | Age (years) | 1 | 1.21271 | 0.32382 | 3.75 | 0.0008 |
| SMK | Smoking History | 1 | 9.94557 | 2.65606 | 3.74 | 0.0008 |

$$T = \frac{1.2223}{0.6399} = 1.91, \qquad \text{p-value} = 0.066$$

$$F = T^2 = (1.91)^2 = 3.65$$
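The equivalence $F = T^2$ can be checked numerically from the BMI row of the table and the Example 2 partial F. The two-sided critical value $t_{28,\,0.975} = 2.048$ is a standard table value supplied here; it does not appear on the slide. Variable names are illustrative.

```python
beta_hat, se = 1.22225, 0.63993          # BMI row of the Parameter Estimates table
t = beta_hat / se                        # T = beta_hat* / s_beta_hat*
t_crit = 2.048                           # t(28, 0.975): two-sided test at alpha = 0.05
f_partial = 200.14 / (1536.14 / 28)      # F(X1 | X2, X3) from Example 2

print(round(t, 2), round(t ** 2, 2), round(f_partial, 2), abs(t) > t_crit)
```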

Two ways of breaking down the regression sum of squares for testing:

1. $SS(X_1)$, $SS(X_2 \mid X_1)$, $SS(X_3 \mid X_1, X_2)$

   Variables are added in order (sequential testing). This is SAS Type I SS.

2. $SS(X_1 \mid X_2, X_3)$, $SS(X_2 \mid X_1, X_3)$, $SS(X_3 \mid X_1, X_2)$

   Each test adjusts for all other variables in the model. This is SAS Type II SS.

With the exception of the last test, these tests are not equivalent.

proc reg data=sbp_data;
  model sbp = bmi age smk / ss1 ss2;  /* ss1: Type I (sequential) SS; ss2: Type II (partial) SS */
run;
quit;

Parameter Estimates

| Variable | Label | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| | Type I SS | Type II SS |
|---|---|---|---|---|---|---|---|---|
| Intercept | Intercept | 1 | 45.10319 | 10.76488 | 4.19 | 0.0003 | 668457 | 963.09739 |
| BMI | Body Mass Index | 1 | 1.22225 | 0.63993 | 1.91 | 0.0664 | 3537.94574 | 200.14147 |
| AGE | Age (years) | 1 | 1.21271 | 0.32382 | 3.75 | 0.0008 | 582.64651 | 769.45920 |
| SMK | Smoking History | 1 | 9.94557 | 2.65606 | 3.74 | 0.0008 | 769.23345 | 769.23345 |
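Three properties of the two breakdowns can be read off this output, and the sketch below checks each one:

1. The predictors' Type I SS add up to the Model SS (4889.82570).
2. Type I and Type II SS coincide for the last variable entered.
3. Each Type II SS divided by MSE is that variable's partial F, which equals $t^2$ up to rounding in the printed output.

The numbers are copied from the table; the dictionary names are illustrative only.

```python
# Values from the PROC REG output with the SS1 and SS2 options.
est = {"BMI": 1.22225, "AGE": 1.21271, "SMK": 9.94557}
se = {"BMI": 0.63993, "AGE": 0.32382, "SMK": 2.65606}
type1 = {"BMI": 3537.94574, "AGE": 582.64651, "SMK": 769.23345}
type2 = {"BMI": 200.14147, "AGE": 769.45920, "SMK": 769.23345}
mse = 1536.14305 / 28                      # Error mean square

# 1. Type I SS (sequential) add up to the Model SS; Type II SS in general do not.
model_ss = sum(type1.values())
type2_total = sum(type2.values())
# 2. For the last variable entered, Type I and Type II SS coincide.
same_last = type1["SMK"] == type2["SMK"]
# 3. Type II SS / MSE is the partial F, which equals t^2 for each variable.
partial_f = {v: type2[v] / mse for v in type2}
t_squared = {v: (est[v] / se[v]) ** 2 for v in est}

print(round(model_ss, 5), round(type2_total, 5), same_last)
for v in type2:
    print(v, round(partial_f[v], 3), round(t_squared[v], 3))
```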

MLR Table

Multiple Linear Regression of Systolic Blood Pressure versus selected characteristics (n = 32)

| Characteristic | Estimated Coefficient | 95% CI | p-value |
|---|---|---|---|
| BMI (kg/m²) | 1.2 | -0.1, 2.5 | 0.066 |
| Age (5 yr interval) | 6.1 | 2.7, 9.4 | <0.001 |
| Smoking History | 9.9 | 4.5, 15.4 | <0.001 |

R² = 0.76
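The table entries can be reproduced from the parameter estimates. The sketch assumes the intervals are 95% t-based confidence intervals, $\hat\beta \pm t_{28,\,0.975}\, s_{\hat\beta}$ with $t_{28,\,0.975} = 2.048$; the table does not state this, but the sketch matches every entry to one decimal. Reporting Age per 5-year interval multiplies both the estimate and its standard error by 5. The helper `ci` is hypothetical.

```python
t_crit = 2.048  # t(28, 0.975) for a 95% interval with 28 residual df


def ci(estimate, se, scale=1.0):
    """(lower, estimate, upper) for scale * beta, rounded to one decimal.

    Rescaling a predictor's units multiplies both the estimate and its SE.
    """
    half = t_crit * se * scale
    center = estimate * scale
    return (round(center - half, 1), round(center, 1), round(center + half, 1))


print(ci(1.22225, 0.63993))             # BMI, per kg/m^2
print(ci(1.21271, 0.32382, scale=5))    # Age, per 5-year interval
print(ci(9.94557, 2.65606))             # Smoking history (1 vs 0)
```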

Test for the addition of a group of variables

Given that a set of independent variables is in the model, test for the addition of another set.

Uses:

1. The additional set represents a related group of variables; for example, test a set of behavioral variables controlling for a set of demographic variables.

2.

3.


Full model: $Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \beta^*_{p+1} X^*_{p+1} + \cdots + \beta^*_k X^*_k + E$

H0: The addition of $X^*_{p+1}, \ldots, X^*_k$ to the model does not explain a significant amount of the variability of Y in the presence of $X_1, X_2, \ldots, X_p$.

H0: The set $X^*_{p+1}, \ldots, X^*_k$ is not significantly related to Y controlling for $X_1, X_2, \ldots, X_p$.

$H_0: \beta^*_{p+1} = \cdots = \beta^*_k = 0$

Reduced model: $Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + E$

Need the extra sum of squares from adding $X^*_{p+1}, \ldots, X^*_k$ to the model. Denote:

$$SS(X^*_{p+1}, \ldots, X^*_k \mid X_1, X_2, \ldots, X_p) = \text{RegSS(Full)} - \text{RegSS(Reduced)}$$

So,

$$F(X^*_{p+1}, \ldots, X^*_k \mid X_1, \ldots, X_p) = \frac{SS(X^*_{p+1}, \ldots, X^*_k \mid X_1, \ldots, X_p)/(k-p)}{MS_{Res}(\text{Full})}$$

Reject $H_0$ if $F(X^*_{p+1}, \ldots, X^*_k \mid X_1, \ldots, X_p) > F_{k-p,\, n-k-1,\, 1-\alpha}$.
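A minimal sketch of this F test for a set of variables follows; the helper name `multiple_partial_f` is hypothetical. With $k - p = 1$ it reduces to the single-variable partial F test. The usage line uses that special case to re-check Example 1 (Smoking History given BMI and Age).

```python
def multiple_partial_f(regss_full, regss_reduced, ssres_full, n, k, p):
    """F for adding k - p variables to a model that already has p predictors.

    The full model has k predictors, so its residual df is n - k - 1.
    Returns the statistic and its (numerator, denominator) degrees of freedom.
    """
    df_num, df_den = k - p, n - k - 1
    extra_ss = regss_full - regss_reduced      # RegSS(Full) - RegSS(Reduced)
    f = (extra_ss / df_num) / (ssres_full / df_den)
    return f, (df_num, df_den)


# k - p = 1 special case: Smoking History added to BMI and Age.
f, df = multiple_partial_f(4889.83, 3537.95 + 582.65, 1536.14, n=32, k=3, p=2)
print(round(f, 2), df)
```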

Test for a set of interactions.

Let:

- X1 = BMI
- X2 = Age
- X3 = Smoking History
- X4 = BMI × Age interaction
- X5 = BMI × Smoking History interaction
- X6 = Age × Smoking History interaction

Full model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \beta_6 X_6 + E$

$H_0: \beta_4 = \beta_5 = \beta_6 = 0$

Reduced model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + E$
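Each two-way interaction term is the subject-by-subject product of the two variables involved. The sketch below builds X4 to X6 for a few made-up rows. These values are hypothetical and not the study data; in practice the same products would be computed for all 32 subjects before fitting the full model.

```python
# Hypothetical rows (NOT the study data): BMI, age in years, smoking history (0/1).
bmi = [24.0, 31.5, 27.0]
age = [45, 60, 52]
smk = [0, 1, 1]

# Each interaction term is the elementwise product of two main-effect columns.
x4 = [b * a for b, a in zip(bmi, age)]   # BMI x Age
x5 = [b * s for b, s in zip(bmi, smk)]   # BMI x Smoking History
x6 = [a * s for a, s in zip(age, smk)]   # Age x Smoking History

# Predictor columns X1..X6 of the full model, printed one subject per row.
design = [bmi, age, smk, x4, x5, x6]
for row in zip(*design):
    print(row)
```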

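ANOVA for the reduced model (X1, X2, X3):

| Source | SS | df | MS |
|---|---|---|---|
| Regression | 4889.83 | 3 | 1629.94 |
| Residual | 1536.14 | 28 | 54.86 |

ANOVA for the full model (X1, ..., X6):

| Source | SS | df | MS |
|---|---|---|---|
| Regression | 5092.83 | 6 | 848.80 |
| Residual | 1333.14 | 25 | 53.33 |

$$F(X_4, X_5, X_6 \mid X_1, X_2, X_3) = \frac{(5092.83 - 4889.83)/3}{53.33} = \frac{203.00/3}{53.33} = 1.27$$

p-value > 0.25: fail to reject H0.

The interactions taken together do not explain a significant amount of the variability of SBP.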
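The interaction test arithmetic, as a short sketch with illustrative variable names. The extra regression SS equals the drop in residual SS between the two models, which gives a useful consistency check.

```python
regss_full, ssres_full, df_res_full = 5092.83, 1333.14, 25   # model with X1..X6
regss_reduced, ssres_reduced = 4889.83, 1536.14              # model with X1..X3

extra_ss = regss_full - regss_reduced          # SS(X4, X5, X6 | X1, X2, X3)
drop_in_sse = ssres_reduced - ssres_full       # same quantity via the residuals
ms_res_full = ssres_full / df_res_full
f = (extra_ss / 3) / ms_res_full               # 3 interaction terms added

print(round(extra_ss, 2), round(drop_in_sse, 2), round(ms_res_full, 2), round(f, 2))
```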

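Constructing Extra SS

Suppose we have $SS(X_1)$, $SS(X_2 \mid X_1)$ and $SS(X_3 \mid X_1, X_2)$, and we want to test $H_0: \beta_2 = \beta_3 = 0$.

We need the extra sum of squares from adding $X_2$ and $X_3$ to the model given $X_1$ already in the model:

$$SS(X_2, X_3 \mid X_1) = SS(X_1, X_2, X_3) - SS(X_1)$$

This can be built from the sequential sums of squares, since

$$SS(X_2 \mid X_1) = SS(X_1, X_2) - SS(X_1)$$

$$SS(X_3 \mid X_1, X_2) = SS(X_1, X_2, X_3) - SS(X_1, X_2)$$

Therefore,

$$\begin{aligned}
SS(X_2 \mid X_1) + SS(X_3 \mid X_1, X_2) &= SS(X_1, X_2) - SS(X_1) + SS(X_1, X_2, X_3) - SS(X_1, X_2) \\
&= SS(X_1, X_2, X_3) - SS(X_1) \\
&= SS(X_2, X_3 \mid X_1)
\end{aligned}$$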
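Applied to the SBP example, this gives SS(X2, X3 | X1) from the Type I table, and with it the F statistic for $H_0: \beta_2 = \beta_3 = 0$ with 2 numerator df. The F value is computed here from the notes' numbers rather than quoted from the original; it would be compared with $F_{2,\,28,\,1-\alpha}$. Variable names are illustrative.

```python
type1 = {"X1": 3537.95, "X2|X1": 582.65, "X3|X1,X2": 769.23}   # sequential SS
ss_all = 4889.83                    # SS(X1, X2, X3)
ss_res, df_res = 1536.14, 28        # residual of the full model

extra = type1["X2|X1"] + type1["X3|X1,X2"]     # SS(X2, X3 | X1) from Type I SS
direct = ss_all - type1["X1"]                  # SS(X1, X2, X3) - SS(X1)
f = (extra / 2) / (ss_res / df_res)            # 2 numerator df, 28 denominator df
print(round(extra, 2), round(direct, 2), round(f, 2))
```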
