Multiple Regression Analysis

The principles of Simple Regression Analysis can be extended to two or more explanatory variables. With two explanatory variables we get an equation Y = α + β1X1 + β2X2. It is customary to write it as Y = β0 + β1X1 + β2X2.
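To make the fitting step concrete, the sketch below estimates β0, β1 and β2 by ordinary least squares, solving the normal equations with the Python standard library only. The function name fit_two_predictors and the small data set are made up for illustration: the y values are generated exactly from Y = 1 + 2X1 + 3X2, so the fit should recover those coefficients. In practice a statistics package would normally do this work.

```python
def fit_two_predictors(x1, x2, y):
    """Ordinary least squares for Y = b0 + b1*X1 + b2*X2 (hypothetical helper)."""
    # Design matrix rows: intercept column, X1, X2.
    rows = [[1.0, float(a), float(b)] for a, b in zip(x1, x2)]
    # Normal equations: (X'X) b = X'y, stored as a 3x4 augmented matrix.
    aug = []
    for i in range(3):
        xtx_row = [sum(r[i] * r[j] for r in rows) for j in range(3)]
        xty_i = sum(r[i] * t for r, t in zip(rows, y))
        aug.append(xtx_row + [xty_i])
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("explanatory variables are collinear")
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    # The matrix is now diagonal, so each coefficient is a simple ratio.
    return [aug[i][3] / aug[i][i] for i in range(3)]


# Made-up data generated from Y = 1 + 2*X1 + 3*X2 with no noise.
x1 = [0, 1, 2, 3, 4]
x2 = [1, 0, 2, 1, 3]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]
b0, b1, b2 = fit_two_predictors(x1, x2, y)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```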

As an example, if a hypotensive agent is administered prior to surgery, the time taken for blood pressure to recover to its normal value will depend on the dose of the hypotensive and the blood pressure during surgery.

Recovery time for Blood Pressure and dose of hypotensive
The scatter plot shows a linear relationship. Blood Pressure takes longer to come back to normal value the larger the dose of the hypotensive. There are many outliers because of individual variability of subjects and because of
RecvTime = -14.2576 + 8.00772 Logdose
S = 14.7103, R-Sq = 15.5 %, R-Sq(adj) = 13.8 %
[Figure: scatter plot of RecvTime against Logdose, with regression line and 95% CI]

Recovery time for Blood Pressure and lowest Blood Pressure reading during surgery
The lower the blood pressure achieved during surgery, the longer the time for it to reach normal value during recovery from anaesthesia.
RecvTime = 34.4692 - 0.183546 Bpsurg
S = 15.9386, R-Sq = 0.8 %, R-Sq(adj) = 0.0 %
[Figure: scatter plot of RecvTime against Bpsurg, with regression line and 95% CI]

Multiple Regression Analysis
The effect of the two explanatory variables acting jointly is described by the equation Recov. Time = 22.3 + 10.6 Log dose - 0.740 Surg. B.P. As noted on the scatter plots, several observations were outliers or had larger than expected X values.
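As a small illustration of how such a fitted equation is used, the sketch below plugs values of the two explanatory variables into the joint equation quoted above. The coefficients are the ones given in the text. The function name and the input values are hypothetical, and the units are whatever the original data used.

```python
def predict_recovery_time(log_dose, surg_bp):
    """Predicted recovery time from the fitted joint equation.

    Coefficients are taken from the equation in the text; the function
    name and argument names are hypothetical.
    """
    return 22.3 + 10.6 * log_dose - 0.740 * surg_bp


# Hypothetical inputs: log dose of 4.0, lowest surgical B.P. of 70.
print(predict_recovery_time(4.0, 70.0))
```

Holding Log dose fixed, each one-unit fall in Surg. B.P. raises the predicted recovery time by 0.740, which is what a partial regression coefficient expresses.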

Categorical Explanatory Variables
Binary variables are coded 0, 1. For example, a variable x1 (Gender) is coded male = 0, female = 1. Then in the regression equation Y = β0 + β1x1 + β2x2, when x1 = 1 the value of Y indicates what is obtained for females, and when x1 = 0 the value of Y indicates what is obtained for males. If we have a nominal variable with more than two categories, we have to create a number of new dummy (also called indicator) binary variables.
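For a nominal variable with k categories, the usual approach is to choose one category as the reference and create k - 1 indicator variables, each coded 1 for its own category and 0 otherwise. A minimal sketch follows. The function name dummy_code and the blood group example are hypothetical and are not taken from the data discussed here.

```python
def dummy_code(values, reference):
    """Return k - 1 indicator (0/1) variables for a nominal variable.

    `reference` is the category that gets no indicator of its own; it is
    represented by all indicators being 0. Hypothetical helper.
    """
    levels = sorted(set(values) - {reference})
    return {
        f"is_{level}": [1 if v == level else 0 for v in values]
        for level in levels
    }


# Hypothetical nominal variable with three categories.
blood_group = ["A", "B", "O", "A"]
print(dummy_code(blood_group, reference="O"))
```

Each indicator then enters the regression equation as its own x variable, and its coefficient is read as the difference from the reference category.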

How many Explanatory Variables?

As a rule of thumb multiple regression analysis should not be performed if the total number of variables is greater than the number of subjects ÷ 10.
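The rule of thumb can be written as a one-line check. This is only a sketch of the rule exactly as stated above; the function name is hypothetical.

```python
def regression_is_advisable(n_variables, n_subjects):
    """True when the number of variables does not exceed subjects / 10."""
    return n_variables <= n_subjects / 10


# Hypothetical study sizes.
print(regression_is_advisable(2, 61))
print(regression_is_advisable(7, 61))
```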

Analysis
In the computer output look for:

Adjusted R². It represents the proportion of variability of Y explained by the X’s. R² is adjusted so that models with different number of variables can be compared.

The F-test in the ANOVA table. Significant F indicates a linear relationship between Y and at least one of the X’s.

The t-test of each partial regression coefficient. Significant t indicates that the variable in question influences the Y response while controlling for other explanatory variables.
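The adjustment is commonly computed as 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of subjects and p the number of explanatory variables. The sketch below assumes that standard formula, since the text does not say which one the software used. The function name is hypothetical. As a check, it is applied to the water hardness fit reported in this document (R-Sq = 42.9 %, 61 towns, one explanatory variable).

```python
def adjusted_r_squared(r_squared, n_subjects, n_predictors):
    """Adjusted R-squared using the standard degrees-of-freedom correction."""
    residual_df = n_subjects - n_predictors - 1
    if residual_df <= 0:
        raise ValueError("need more subjects than predictors + 1")
    return 1.0 - (1.0 - r_squared) * (n_subjects - 1) / residual_df


# Water hardness example: R-Sq = 0.429, n = 61 towns, one predictor.
print(round(100 * adjusted_r_squared(0.429, 61, 1), 1))
```

Rounded to one decimal place this comes to 41.9 %, in line with the R-Sq(adj) value reported for that fit.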

Usefulness of Scatter Plots - I

The scatter plot on the right illustrates the relationship between water hardness and mortality in 61 large towns in England and Wales. The regression line indicates an inverse relationship between water hardness and mortality rates.

Mortality and Water Hardness
Mortal = 1676.36 - 3.22609 Calcium
S = 143.029, R-Sq = 42.9 %, R-Sq(adj) = 41.9 %
[Figure: scatter plot of Mortal against Calcium, with regression line and 95% CI]

Usefulness of Scatter Plots - II

Mortality and Water Hardness in Towns in the North
MortalN = 1692.31 - 1.93134 CalciumN
S = 129.209, R-Sq = 13.6 %, R-Sq(adj) = 11.0 %
[Figure: scatter plot of MortalN against CalciumN, with regression line and 95% CI]

Mortality and Water Hardness in Towns in the South
MortalS = 1522.82 - 2.09272 CalciumS
S = 114.297, R-Sq = 36.3 %, R-Sq(adj) = 33.6 %
[Figure: scatter plot of MortalS against CalciumS, with regression line and 95% CI]

The inverse relationship between water hardness and mortality is still maintained. But for towns in the North the regression line is less steep than for towns in the South, indicating that other causes of mortality are stronger in the North compared to the South.
