
Multiple Regression Analysis

The principles of Simple Regression Analysis can be extended to two or more explanatory variables. With two explanatory variables we get the equation Y = α + β1X1 + β2X2. It is customary to write it as
Y = β0 + β1X1 + β2X2
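As a rough illustration of how the coefficients β0, β1 and β2 might be estimated, the sketch below fits the two-variable equation by ordinary least squares. It assumes NumPy is available; the function name `fit_two_predictor_model` and the data are made up for illustration and are not taken from the slides.

```python
import numpy as np


def fit_two_predictor_model(x1, x2, y):
    """Least-squares estimates (b0, b1, b2) for Y = b0 + b1*X1 + b2*X2."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of ones for the intercept, then X1 and X2.
    design = np.column_stack([np.ones_like(x1), x1, x2])
    coefficients, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    return coefficients


# Made-up, noise-free data generated from Y = 2 + 3*X1 - 1*X2,
# so the fit should recover approximately those three numbers.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [2.0 + 3.0 * a - 1.0 * b for a, b in zip(x1, x2)]

b0, b1, b2 = fit_two_predictor_model(x1, x2, y)
print(round(float(b0), 3), round(float(b1), 3), round(float(b2), 3))
```

With real data the points scatter around the fitted plane, so the estimates are only approximations of the underlying coefficients.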

As an example, if a hypotensive agent is administered prior to surgery, the time for blood pressure to recover to its normal value will depend on the dose of the hypotensive and the blood pressure during surgery.
Recovery time for Blood Pressure and dose of hypotensive

[Figure: Recovery time for Blood Pressure and dose of hypotensive. RecvTime plotted against Logdose, with regression line and 95% CI.
RecvTime = -14.2576 + 8.00772 Logdose
S = 14.7103, R-Sq = 15.5 %, R-Sq(adj) = 13.8 %]

The scatter plot shows a linear relationship. Blood pressure takes longer to come back to normal value the larger the dose of the hypotensive. There are many outliers because of individual variability of subjects and because of
Recovery time for Blood Pressure and lowest Blood Pressure reading during surgery

[Figure: Recovery time for Blood Pressure and lowest B.P. reading during surgery. RecvTime plotted against Bpsurg, with regression line and 95% CI.
RecvTime = 34.4692 - 0.183546 Bpsurg
S = 15.9386, R-Sq = 0.8 %, R-Sq(adj) = 0.0 %]

The lower the blood pressure achieved during surgery, the longer the time for it to reach normal value during recovery from anaesthesia.
Multiple Regression Analysis

The joint effect of the two explanatory variables is described by the equation
Recov. Time = 22.3 + 10.6 Log dose - 0.740 Surg. B.P.

As noted on the scatter plots, several observations were outliers or had larger than expected X values.
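To make the fitted equation concrete, the following sketch simply evaluates it for one hypothetical patient. The input values (Log dose 4.5, surgical B.P. 70) are assumptions chosen to fall inside the ranges shown on the scatter plots; they are not observations from the study, and the function name is made up.

```python
def predicted_recovery_time(log_dose, surgery_bp):
    """Evaluate Recov. Time = 22.3 + 10.6 Log dose - 0.740 Surg. B.P."""
    return 22.3 + 10.6 * log_dose - 0.740 * surgery_bp


# Hypothetical patient: 22.3 + 47.7 - 51.8, which is about 18.2.
example = predicted_recovery_time(log_dose=4.5, surgery_bp=70.0)
print(f"{example:.1f}")

# Holding Log dose fixed, a surgical B.P. that is 10 units lower raises
# the predicted recovery time by 0.740 * 10 = 7.4.
difference = predicted_recovery_time(4.5, 60.0) - predicted_recovery_time(4.5, 70.0)
print(f"{difference:.1f}")
```

Each coefficient is read with the other variable held fixed, which is why the second calculation changes only the surgical B.P.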
Categorical Explanatory Variables
• Binary variables are coded 0, 1. For example, a variable x1 (Gender) is coded male = 0, female = 1. Then in the regression equation Y = β0 + β1x1 + β2x2, setting x1 = 1 gives the value of Y for females, and setting x1 = 0 gives the value of Y for males.
• If we have a nominal variable with more than two categories, we have to create a number of new binary dummy (also called indicator) variables.
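The dummy coding described above can be sketched as follows. The sketch assumes the common convention of one indicator per category except a chosen reference category, so that k categories give k - 1 dummy variables; the helper `dummy_code` and the three-level `region` variable are hypothetical.

```python
def dummy_code(values, reference):
    """Return {category: [0/1, ...]} for every category except `reference`."""
    categories = sorted(set(values) - {reference})
    return {
        category: [1 if value == category else 0 for value in values]
        for category in categories
    }


# Hypothetical nominal variable with three categories.
region = ["North", "South", "Midlands", "South", "North"]
dummies = dummy_code(region, reference="North")

# Two indicator variables represent the three categories; "North" is the
# reference level and is coded 0 on both.
for name, column in dummies.items():
    print(name, column)
```

The coefficient of each dummy variable then compares its category with the reference category, in the same way that the Gender coefficient compares females with males.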
How many Explanatory Variables?
• As a rule of thumb, multiple regression analysis should not be performed if the total number of variables is greater than the number of subjects ÷ 10.
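The rule of thumb amounts to a single comparison, sketched below. The function names are made up, and the slide does not say whether "total number of variables" counts the response Y, so the sketch takes the count as given.

```python
def max_variables(n_subjects):
    """Largest whole number of variables allowed by the subjects / 10 rule."""
    return n_subjects // 10


def within_rule_of_thumb(n_variables, n_subjects):
    """True if the number of variables does not exceed subjects / 10."""
    return n_variables <= n_subjects / 10


# With 61 subjects the rule allows at most 6 variables.
print(max_variables(61))
print(within_rule_of_thumb(3, 61))
print(within_rule_of_thumb(7, 61))
```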
Analysis
In the computer output look for:
• Adjusted R². It represents the proportion of variability of Y explained by the X's. R² is adjusted so that models with different numbers of variables can be compared.
• The F-test in the ANOVA table. A significant F indicates a linear relationship between Y and at least one of the X's.
• The t-test of each partial regression coefficient. A significant t indicates that the variable in question influences the Y response while controlling for other explanatory variables.
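The adjustment to R² and the overall F statistic can both be computed from R², the number of subjects n and the number of explanatory variables p. The sketch below uses the standard textbook formulas, which are not stated on the slides, and checks the first against the water hardness fit reported later in this document (61 towns, one explanatory variable, R-Sq = 42.9 %).

```python
def adjusted_r_squared(r_squared, n_subjects, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_subjects - 1) / (n_subjects - n_predictors - 1)


def overall_f_statistic(r_squared, n_subjects, n_predictors):
    """F = (R^2 / p) / ((1 - R^2) / (n - p - 1)) for the ANOVA F-test."""
    residual_df = n_subjects - n_predictors - 1
    return (r_squared / n_predictors) / ((1.0 - r_squared) / residual_df)


# Water hardness fit: R-Sq = 42.9 %, 61 towns, 1 explanatory variable.
adj = adjusted_r_squared(0.429, n_subjects=61, n_predictors=1)
print(f"{100 * adj:.1f} %")  # agrees with the reported R-Sq(adj) = 41.9 %

f_value = overall_f_statistic(0.429, n_subjects=61, n_predictors=1)
print(f"{f_value:.1f}")
```

The penalty grows with p, so adding a variable that explains little can lower adjusted R² even though R² itself never decreases.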
Usefulness of Scatter Plots - I
• The scatter plot on the right illustrates the relationship between water hardness and mortality in 61 large towns in England and Wales.
• The regression line indicates an inverse relationship between water hardness and mortality rates.

[Figure: Mortality and Water Hardness. Mortal plotted against Calcium, with regression line and 95% CI.
Mortal = 1676.36 - 3.22609 Calcium
S = 143.029, R-Sq = 42.9 %, R-Sq(adj) = 41.9 %]
Usefulness of Scatter Plots - II

[Figure: Mortality and Water Hardness in Towns in the North. MortalN plotted against CalciumN, with regression line and 95% CI.
MortalN = 1692.31 - 1.93134 CalciumN
S = 129.209, R-Sq = 13.6 %, R-Sq(adj) = 11.0 %]

[Figure: Mortality and Water Hardness in Towns in the South. MortalS plotted against CalciumS, with regression line and 95% CI.
MortalS = 1522.82 - 2.09272 CalciumS
S = 114.297, R-Sq = 36.3 %, R-Sq(adj) = 33.6 %]

• The inverse relationship between water hardness and mortality is still maintained.
• But for towns in the North the regression line is less steep than for towns in the South, indicating that other causes of mortality are stronger in the North compared to the South.
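Fitting a separate line to each subgroup, as done for the North and the South, might look like the sketch below. The rows are made-up illustrative values, not the towns data, and the helper names `fit_line` and `fit_by_group` are hypothetical.

```python
def fit_line(x, y):
    """Least-squares (intercept, slope) for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_by_group(rows):
    """rows: iterable of (group, x, y). Returns {group: (intercept, slope)}."""
    grouped = {}
    for group, x, y in rows:
        xs, ys = grouped.setdefault(group, ([], []))
        xs.append(x)
        ys.append(y)
    return {group: fit_line(xs, ys) for group, (xs, ys) in grouped.items()}


# Made-up rows (region, calcium, mortality), for illustration only.
rows = [
    ("North", 10.0, 1680.0), ("North", 50.0, 1600.0), ("North", 90.0, 1520.0),
    ("South", 20.0, 1480.0), ("South", 60.0, 1360.0), ("South", 100.0, 1240.0),
]
fits = fit_by_group(rows)
for region, (intercept, slope) in fits.items():
    print(region, round(intercept, 1), round(slope, 2))
```

Comparing the slopes and intercepts between groups is what reveals differences that a single pooled regression line would hide.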
