
CLASS XIII

Controlling for Other Variables


Association versus Causation

[Scatterplot: “Storks and Births.” Number of storks on the x-axis (0 to 10), number of births on the y-axis (0 to 30): the classic illustration that association does not imply causation.]
Equations for bivariate and multiple regression

The bivariate regression equation is expressed as:

Y = a + bX

a = the Y intercept
b = the slope of the relationship between X and Y

The multiple regression equation is expressed as:

Y = a + b1X1 + b2X2

a = the Y intercept
b1 = the slope of the relationship between X1 and Y, holding X2 constant
b2 = the slope of the relationship between X2 and Y, holding X1 constant
What happens to the X1 → Y association
when we consider X2?

When X2 is introduced¹, there are three possible
effects on the X1 → Y association. Either:

1. The magnitude of the association changes
2. The association disappears (magnitude = 0)
3. The magnitude of the association stays the same

¹ “X2 is introduced” = “X2 is held constant” = “adjusting for X2”
--- these are used interchangeably in multiple regression.
(1) Changes, (2) Disappears, (3) Stays the same

• We’ll illustrate these three possibilities in three different ways:
  • Venn diagrams
  • Word problem examples
  • Regression equations
• Each time:
  • b reflects the magnitude of the association between X1 and Y
    (bivariate)
  • b1 reflects the magnitude of the association between X1 and Y,
    holding X2 constant (multivariate)
The bivariate association
Seeing the patterns in Venn diagrams

[Venn diagram: circles for X1 and Y overlap; the overlap is labeled b.]

b reflects the magnitude of the bivariate
association between X1 and Y.
Seeing the patterns in an example

Maternal Age → Infant birth weight

Suppose that you find that older mothers give birth to heavier
babies. That is, the slope of maternal age (X) on birthweight (Y)
is positive.

Is this relationship direct, or could other factors explain it?

Let’s look at how the relationship is modified by 3rd variables.


Seeing the patterns in a regression context

If maternal age and birth weight were measured as continuous
variables, then we could look at the association between the two
using regression.

Here is what the regression equation might look like:

BW = a + b (Maternal age)
where a = 2500 and b = 20

(BW in grams; AGE in years)

BW = 2500 + 20 (AGE)

[Plot: fitted regression line of birth weight (BW) against maternal age (AGE), sloping upward.]

“For every additional year of age, birthweight increases 20 grams”
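To make this concrete, here is a minimal sketch in Python (the course itself uses Stata; this is just for illustration). The data are simulated so that the true intercept and slope match the slide’s hypothetical values of 2500 and 20; all variable names and numbers here are invented.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500

# Hypothetical data generated from BW = 2500 + 20*AGE + noise
age = rng.uniform(18, 40, n)                   # maternal age in years
bw = 2500 + 20 * age + rng.normal(0, 300, n)   # birth weight in grams

# Bivariate regression of BW on AGE
fit = sm.OLS(bw, sm.add_constant(age)).fit()
print(fit.params)  # estimates should land near a = 2500, b = 20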


(1) Adding a third variable
changes the magnitude of the association
Seeing the patterns in Venn diagrams

[Venn diagram: circles for X1, X2, and Y all overlap; b1 is the part of the X1–Y overlap not shared with X2.]
b1 reflects the magnitude of the association
between X1 and Y, holding X2 constant.
In this case b1 changes: it is smaller than the bivariate b.
Seeing the patterns in an example

Maternal Age → Infant birth weight

Third variable: maternal smoking during pregnancy


Consider the 3rd variable “smoking during pregnancy”

Suppose: older women are less likely to smoke during pregnancy, and
women who smoke less have healthier (heavier) babies.

Holding smoking constant, there is still an association between maternal
age and birth weight, but it is smaller. We say that maternal smoking
confounds the association between maternal age and birth weight.
Seeing the patterns in a regression context

Let’s now look at the association between AGE and BW, holding
constant # of cigarettes smoked.

The form of our regression equation would be:

BW = a + b1 (AGE) + b2 (# cigs smoked/day)


And the values of the coefficients might be:
a = 2800, b1 = 15, and b2 = -25

or

BW = 2800 + 15 (AGE) - 25 (# cigs/day)

[Plot: fitted line of BW against AGE, with a shallower slope than in the bivariate case.]

“For every additional year of age, birthweight increases 15 grams,
holding constant number of cigarettes smoked”
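A minimal Python sketch (invented data, for illustration only) shows the confounding mechanically: smoking is built to decline with age and to lower birth weight, so the bivariate age slope is inflated, and it shrinks back toward 15 once smoking is held constant.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 2000

age = rng.uniform(18, 40, n)
# Hypothetical: older mothers smoke fewer cigarettes per day
cigs = np.clip(rng.normal(15 - 0.4 * age, 2), 0, None)
# True model: BW = 2800 + 15*AGE - 25*CIGS + noise
bw = 2800 + 15 * age - 25 * cigs + rng.normal(0, 300, n)

# Bivariate: the age slope also absorbs the smoking effect (> 15)
print(sm.OLS(bw, sm.add_constant(age)).fit().params)

# Adjusted: with cigs in the model, the age slope returns to ~15
X = sm.add_constant(np.column_stack([age, cigs]))
print(sm.OLS(bw, X).fit().params)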
(2) Adding a third variable
makes the association disappear
Seeing the patterns in Venn diagrams

[Venn diagram: the X1–Y overlap lies entirely inside X2, so nothing is left over for b1.]

b1 reflects the magnitude of the association
between X1 and Y, holding X2 constant.
In this case b1 = 0 – the new variable has
swallowed up all of b’s space!
Seeing the patterns in an example

Maternal Age → Infant birth weight

Third variable: maternal socioeconomic status (SES)


Consider the 3rd variable “maternal socioeconomic status.”

Suppose: mothers of higher SES are older when they have their babies,
and mothers of higher SES have heavier babies
(perhaps because they have better prenatal care).

So when maternal SES is held constant, there is no longer
a relationship between maternal age and infant birth weight.
The bivariate relationship was spurious.
Seeing the patterns in a regression context

Now let’s look at the association between AGE and BW, holding constant
maternal SES.

The form of our regression equation would be:

BW = a + b1 (AGE) + b2 (SES)
And the values of the coefficients might be:
a = 2200, b1 = 0, and b2 = 200, or

BW = 2200 + 0 (AGE) + 200 (SES)

[Plot: with SES held constant, the fitted line of BW against AGE is flat.]

“Holding SES constant, there is no association between birthweight and age,”
or
“For every additional year of age, birthweight increases 0 grams,
holding SES constant”
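The same kind of sketch (again with invented data) shows the “disappearing” pattern: SES drives both maternal age and birth weight, while age has no direct effect at all.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 2000

ses = rng.normal(5, 2, n)                       # hypothetical SES score
age = 20 + 1.5 * ses + rng.normal(0, 2, n)      # higher SES -> older mothers
bw = 2200 + 200 * ses + rng.normal(0, 300, n)   # note: AGE does not appear

# Bivariate: age looks predictive only because it proxies for SES
print(sm.OLS(bw, sm.add_constant(age)).fit().params)

# Adjusted: the age slope collapses to ~0 once SES is held constant
X = sm.add_constant(np.column_stack([age, ses]))
print(sm.OLS(bw, X).fit().params)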
(3) Adding a third variable
doesn’t change the association at all
Seeing the patterns in Venn diagrams

[Venn diagram: X1 and X2 each overlap Y but not each other; b1 is untouched by X2.]

b1 reflects the magnitude of the association
between X1 and Y, holding X2 constant.
In this case b1 is the same size (magnitude)
as the bivariate b.
Seeing the patterns in an example

Maternal Age → Infant birth weight

Third variable: gender of the child

Consider the 3rd variable “gender of the child.”

Suppose that, while male children are heavier than female children,
there is no association between maternal age and gender of the child.

Gender of the child has no effect on the association between maternal age
and infant birth weight. We say that the relationship between maternal age
and infant birth weight is direct, or not confounded by the gender of
the child.
Seeing the patterns in a regression context

Now let’s look at the association between BW and AGE, holding
gender constant.

The form of our regression equation would be:

BW = a + b1 (AGE) + b2 (GENDER), where GENDER is a dummy variable (= 1 for male children)

and the values of the coefficients might be:
a = 2300, b1 = 20, and b2 = 400

or

BW = 2300 + 20 (AGE) + 400 (GENDER)

[Plot: fitted lines of BW against AGE for each gender, parallel and 400 grams apart.]

“For every additional year of age, birthweight increases 20 grams,
holding gender constant”
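A dummy X fits into the same sketch (invented data once more): because gender is generated independently of age, adding it leaves the age slope essentially unchanged.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 2000

age = rng.uniform(18, 40, n)
male = rng.integers(0, 2, n)   # dummy: 1 = male, 0 = female; unrelated to age
bw = 2300 + 20 * age + 400 * male + rng.normal(0, 300, n)

# The age slope is ~20 with or without the dummy in the model
print(sm.OLS(bw, sm.add_constant(age)).fit().params)
X = sm.add_constant(np.column_stack([age, male]))
print(sm.OLS(bw, X).fit().params)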
Multiple Regression --- The Movie

http://www.math.yorku.ca/SCS/spida/lm/visreg.html
Multiple regression in Stata:
interpreting the output
Example: Predicting the number of hours worked per week,
based on education and drinking habits

hrs_worked = number of hours worked per week
educ = number of years of education
repub = number of hours spent in the pub last night
Regression output for the example
Interpreting the regression output

hrs_worked = Number of hours worked per week
educ = Number of years of education
repub = Number of hours spent in the pub last night

N = 1725
Look under Number of obs

R2 = 0.0572
The proportion of variance in Y that is explained by both independent
variables together. In this case it is 5.72%.

Adjusted R2 = 0.0561
R2, adjusted for the number of independent variables in the model.

Sum of Squares – Related to the residuals, or the “unexplained” distance
between the regression line and the actual values of Y. FYI, one can
calculate R2 by dividing the Model Sum of Squares [= 21115.4807] by
the Total [Model + Residual] Sum of Squares [= 369471.478] – see the
ANOVA table.
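Checking this with the numbers from the ANOVA table:

R2 = Model SS / Total SS = 21115.4807 / 369471.478 ≈ 0.0572

which matches the R2 reported above.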
Interpreting the regression output, continued

F = 52.19, Sig. F = 0.000
Tests the statistical significance of the entire regression equation –
that is, the predictive value of all the variables in the equation. If the
significance is less than 0.05, then the equation is significant at the
0.05 level.

Constant = 38.49222
The constant is the Y intercept (a.k.a. “a”).

The Regression Equation

hrs_worked’ = 38.49 + 0.4831397 * [educ] - 6.556286 * [repub]

b, the Regression Coefficient
Represents the change in the value of the dependent variable [hrs_worked]
for each unit change in the independent variable.

“For each additional year of education, there is an average increase of
0.48 hours worked per week, holding constant the number of hours in the
pub last night.”

“For every additional hour spent in the pub last night, there is an
average decrease of 6.6 hours worked per week, holding constant
years of education.”
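To see the equation in action, here is a tiny Python sketch that plugs in the coefficients from the output; the person described (16 years of education, 2 hours in the pub) is hypothetical.

# Coefficients taken from the regression output above
a, b_educ, b_repub = 38.49222, 0.4831397, -6.556286

def predicted_hours(educ, repub):
    """Predicted hours worked per week from education and pub hours."""
    return a + b_educ * educ + b_repub * repub

# Hypothetical person: 16 years of education, 2 hours in the pub
print(round(predicted_hours(16, 2), 1))  # 38.49 + 7.73 - 13.11 ≈ 33.1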
Interpreting the regression output, continued

Std. Error [for b] – The standard error of b, which is needed to test
whether b is statistically significant.

t, Pr(t) – Tell you whether b is statistically significant, that is, different
from zero.

• t is the obtained t statistic for the regression coefficient. If the obtained
t falls in the critical region, then the coefficient is statistically
significantly different from zero. (For alpha = .05, the critical region is
beyond +/- 1.96.)
• Pr(t) is the p-value associated with the obtained t. If Pr(|t|) is less than
alpha, then the coefficient is statistically significantly different from zero.

If “t” were not given, you could still calculate t by using the
formula:

t = Statistic / Std. Error
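For example, with a made-up standard error (the actual one would come from the output table): if b = 0.4831 and its Std. Error were 0.11, then

t = 0.4831 / 0.11 ≈ 4.4

which is beyond +/- 1.96, so the coefficient would be significant at the 0.05 level.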
Rules for X’s in multiple regression

(For those who are aiming to finish the
final project analysis; we’ll discuss this
again next week)

• Continuous variables are OK
  • You knew that before today!
• Grouped variables with > 2 values are NOT OK
  • You knew that before today!
• Dummy variables are OK
  • These special grouped variables are permitted as X’s in regression
  • This is news – it breaks the “continuous/continuous” rule
  • We’ll talk a lot more about dummy variables in regression next week
Encore!

This week in Stat 1

• No lab this week
• For next lecture:
  • No Healey reading
  • Re-read Chapters 7, 8, and 9 of Wagner Way
  • Complete the homework
  • Complete the challenge problems
• Final assignment:
  • You can now complete almost all analyses
  • Dig into the assignment deeply before you attend next week’s
    lecture – I’ll take questions then
  • Always: analyze, then stop and think
Practice problems

