
PUBH 614 Lab 6: Quant and Qual Data Analysis Methods

Correlation and Linear Regression

➢ Open NCBIRTH800mod dataset


Recall that the North Carolina State Center for Health Statistics makes publicly available birth and infant
death data for all children born in the state of North Carolina. This comprehensive dataset for the births
in 2001 contains 120,300 records. The data in ‘NCBIRTH800.sav’ represents a random sample of 800 of
those births and selected variables.

➢ Correlational analysis

▪ Is birth weight associated with gestational age? Is this relationship statistically significant?

State the null and alternative hypotheses:

H0: ρ = 0
Ha: ρ ≠ 0

Visual inspection of the association:


GRAPHS → LEGACY DIALOGS → SCATTER/DOT → Simple Scatter

There is a positive, linear relationship between weeks of gestation and birth weight: longer gestation
time is associated with higher birth weight. Data points in the lower left corner of the scatterplot show
infants with very low birth weights; these are premature infants. You may want to exclude these data
points if you are interested in inference for full-term infants only.


Compute the Pearson product-moment correlation coefficient:


ANALYZE → CORRELATE → BIVARIATE

Sample correlation coefficient r = .58


The association between weeks of gestation and birth weight is positive, moderate, and statistically
significant (i.e. different from 0, p<.001).
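
The significance test behind this statement can be checked by hand. The following is a minimal sketch, assuming the usual t statistic for a Pearson correlation, t = r·sqrt(n − 2)/sqrt(1 − r²), and n = 799 complete pairs (the N that SPSS reports for these two variables); the function name is illustrative and not part of SPSS.

```python
import math

def correlation_t(r, n):
    """t statistic for H0: rho = 0, referred to a t distribution with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# r = .583 and n = 799 are taken from the SPSS output in this lab.
t_stat = correlation_t(0.583, 799)
print(round(t_stat, 2))
```

With 797 degrees of freedom, the two-sided 5% critical value is close to 1.96, so a t statistic of this size is consistent with p < .001.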

➢ Simple Linear Regression

▪ Is birth weight associated with gestational age? Is this relationship statistically significant?

State the null and alternative hypotheses:

H0: β = 0
Ha: β ≠ 0

First, show the regression line in the scatterplot:


GRAPHS → LEGACY DIALOGS → SCATTER/DOT → Simple Scatter

Double-click on the scatterplot → Chart Editor opens → select Add Fit Line at Total


Request Simple Linear Regression SPSS output:


ANALYZE → REGRESSION → LINEAR

Confidence Intervals for β estimates and Descriptive statistics are available under Statistics; residual
analysis plots are available under Plots


Variables Entered/Removed table shows the outcome variable and the explanatory variable:

Variables Entered/Removedᵃ

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | Completed weeks of gestation (weeks)ᵇ | . | Enter |

a. Dependent Variable: Weight of child (grams)
b. All requested variables entered.

Model Summary table shows the r² estimate and residual standard deviation:

Model Summaryᵇ

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .583ᵃ | .340 | .339 | 519.65007 |

a. Predictors: (Constant), Completed weeks of gestation (weeks)
b. Dependent Variable: Weight of child (grams)
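
The R Square and Adjusted R Square columns can be recovered from R alone. This is a minimal sketch, assuming the standard adjustment 1 − (1 − R²)(n − 1)/(n − p − 1) with p = 1 predictor and n = 799 cases; the function name is illustrative.

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjusted R^2 for a model with p predictors fitted to n cases."""
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

r = 0.583   # R from the Model Summary table
n = 799     # cases used in the model
r_squared = r ** 2
adj_r_squared = adjusted_r_squared(r_squared, n, p=1)
print(round(r_squared, 3), round(adj_r_squared, 3))
```

With a single predictor and a sample this large, the adjustment barely changes the estimate, which is why the two columns differ only in the third decimal place.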

Descriptive statistics table shows basic summary statistics for the outcome variable and the
explanatory variable (Sample correlation coefficient is also produced, not shown here):

Descriptive Statistics

| Variable | Mean | Std. Deviation | N |
|---|---|---|---|
| Weight of child (grams) | 3298.5704 | 639.06495 | 799 |
| Completed weeks of gestation (weeks) | 38.61 | 2.716 | 799 |


Coefficients table shows the estimates for the intercept (b0) and slope (b1) along with 95% CIs for
these estimates. This table also shows the t-test and associated p-value for the following test:
H0: β = 0
Ha: β ≠ 0

Coefficientsᵃ

| Model | Term | B (Unstandardized) | Std. Error | Beta (Standardized) | t | Sig. | 95.0% CI for B: Lower Bound | 95.0% CI for B: Upper Bound |
|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | -1996.149 | 262.165 | | -7.614 | .000 | -2510.765 | -1481.534 |
| 1 | Completed weeks of gestation (weeks) | 137.117 | 6.773 | .583 | 20.246 | .000 | 123.823 | 150.412 |

a. Dependent Variable: Weight of child (grams)
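
The t and confidence-interval columns follow directly from B and its standard error. A minimal sketch, assuming t = B/SE and CI = B ± t*·SE; the critical value 1.963 is an assumed approximation of the two-sided 5% point of the t distribution with 797 df, and the helper name is illustrative.

```python
T_CRIT = 1.963  # assumed approximate critical value for 797 df (two-sided, alpha = .05)

def slope_test(b, se, t_crit=T_CRIT):
    """Return the t statistic and the 95% CI for a regression coefficient."""
    t_stat = b / se
    return t_stat, (b - t_crit * se, b + t_crit * se)

# B and Std. Error for weeks of gestation, from the Coefficients table.
t_stat, (lower, upper) = slope_test(137.117, 6.773)
print(round(t_stat, 2), round(lower, 2), round(upper, 2))
```

Small differences from the SPSS columns in the third decimal place are expected, because the table values used as inputs are themselves rounded.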

Sample statistics:

n=799 – sample size

Gestational age (x): M = 38.61 weeks, SD = 2.72 weeks

Weight of child in grams (y): M = 3299 g, SD = 639.06 g

Sample correlation coefficient: r = .58

Sample r²: gestational age (in weeks) explained 34% of the variability in birth weight

Regression equation: ŷ = -1996.15 + 137.12x
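
The regression equation is tied to the summary statistics listed above. A minimal sketch, assuming the textbook identities b1 = r·(SD_y/SD_x) and b0 = M_y − b1·M_x, with inputs copied from the SPSS tables:

```python
r = 0.583                              # sample correlation
mean_x, sd_x = 38.61, 2.716            # gestational age (weeks)
mean_y, sd_y = 3298.5704, 639.06495    # birth weight (grams)

slope = r * sd_y / sd_x                # b1
intercept = mean_y - slope * mean_x    # b0
print(round(slope, 1), round(intercept, 1))
```

The results land within a gram or two of the SPSS estimates; the gap comes from using r and the mean gestational age rounded to a few decimals.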

Hypotheses testing (Is birth weight associated with gestational age?):


H0: β = 0
Ha: β ≠ 0

t = 20.25, p < .001; 95% CI for β is (123.82, 150.41)

• Conclusion: Since the p-value associated with this test is p<.001 (which is smaller than α= .05), we
reject the null hypothesis β = 0 in favor of the alternative hypothesis Ha: β ≠ 0. We
conclude that gestational age is associated with birth weight and this relationship is statistically
significant. Based on a 95% CI, we can conclude that, on average, infants that differ by one week
in gestational age differ by between 124 and 150 grams in birth weight.
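
To make the interpretation of the slope concrete, the fitted equation can be evaluated at two gestational ages one week apart. A minimal sketch using the unrounded coefficients from the Coefficients table; the function name and the chosen ages (38 and 39 weeks) are illustrative.

```python
def predict_birth_weight(weeks):
    """Predicted birth weight (grams) from the fitted simple regression line."""
    return -1996.149 + 137.117 * weeks

at_38 = predict_birth_weight(38)
at_39 = predict_birth_weight(39)
print(round(at_38, 1), round(at_39, 1), round(at_39 - at_38, 3))
```

The difference between the two predictions equals the slope, which is the "one week apart" comparison described in the conclusion. Predictions are best restricted to gestational ages within the range observed in the sample.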


Residual analysis
Check for:

1. Linearity: inspect the scatterplot (done; the assumption is met).

2. Constant variability (homoscedasticity): examine the scatterplot of standardized
predicted values vs. standardized residuals and look for a random cloud of data points.
The scatterplot does not show any serious violation of this assumption: there are no
obvious trends visible in the plot, and the spread of data points is approximately constant
(except for a couple of premature infants on the left side of the data cloud).

3. Approximate normality: examine the histogram of standardized residuals (the plot
reveals that the residuals are approximately normally distributed).

4. Independence: are the observations independent? Probably yes, but we need to check whether
siblings were included in the dataset (correlated observations).
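
The quantities plotted in these checks can be sketched in a few lines. The example below is an illustration only: the eight data points are hypothetical toy values, not records from NCBIRTH800, and standardized residuals are computed as residual divided by the residual standard deviation, which is one common definition.

```python
import math

# Hypothetical toy values for illustration only; NOT taken from NCBIRTH800.
weeks = [34, 36, 37, 38, 39, 40, 41, 42]
grams = [2300, 2750, 2900, 3250, 3300, 3500, 3550, 3700]

n = len(weeks)
mean_x = sum(weeks) / n
mean_y = sum(grams) / n

# Least-squares slope and intercept.
sxx = sum((x - mean_x) ** 2 for x in weeks)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(weeks, grams))
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

# Fitted values and raw residuals.
fitted = [b0 + b1 * x for x in weeks]
residuals = [y - f for y, f in zip(grams, fitted)]

# Residual standard deviation with n - 2 degrees of freedom.
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
standardized = [e / s for e in residuals]

for f, z in zip(fitted, standardized):
    print(round(f), round(z, 2))
```

Plotting the standardized residuals against the fitted values gives the homoscedasticity plot, and a histogram of the standardized residuals gives the normality check.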


For practice, repeat the analysis to investigate the following associations:


1) Birth weight and mother’s age (mage)
2) Birth weight and weight gained during pregnancy (gained)

Simple linear regression output of the birth weight - mother’s age relationship:

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .168ᵃ | .028 | .027 | 630.32143 |

a. Predictors: (Constant), Age of mother (years)

Coefficientsᵃ

| Model | Term | B (Unstandardized) | Std. Error | Beta (Standardized) | t | Sig. | 95.0% CI for B: Lower Bound | 95.0% CI for B: Upper Bound |
|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | 2827.226 | 100.768 | | 28.057 | .000 | 2629.425 | 3025.027 |
| 1 | Age of mother (years) | 17.538 | 3.651 | .168 | 4.803 | .000 | 10.371 | 24.705 |

a. Dependent Variable: Weight of child (grams)

Sample r²: mother’s age (in years) explained 2.8% of the variability in birth weight
Regression equation: ŷ = 2827.23 + 17.54x

Hypotheses testing (Is birth weight associated with mother’s age?):


H0: β = 0
Ha: β ≠ 0

t = 4.80, p < .001; 95% CI for β is (10.37, 24.71)

• Conclusion: Since p-value associated with this test is p<.001 (which is smaller than α= .05), we
reject the null hypothesis β = 0 in favor of the alternative hypothesis Ha: β ≠ 0. We conclude
that mother’s age is associated with birth weight and this relationship is statistically significant.
Based on a 95% CI, we can conclude that, on average, mothers that differ by one year in age have
infants that differ by between 10 and 25 grams in birth weight.

Simple linear regression output of the birth weight – weight gained during pregnancy relationship:

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .222ᵃ | .049 | .048 | 613.27957 |

a. Predictors: (Constant), Weight gained during pregnancy (pounds)


Coefficientsᵃ

| Model | Term | B (Unstandardized) | Std. Error | Beta (Standardized) | t | Sig. | 95.0% CI for B: Lower Bound | 95.0% CI for B: Upper Bound |
|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | 2996.665 | 54.014 | | 55.479 | .000 | 2890.633 | 3102.696 |
| 1 | Weight gained during pregnancy (pounds) | 10.239 | 1.613 | .222 | 6.348 | .000 | 7.072 | 13.405 |

a. Dependent Variable: Weight of child (grams)

Sample r²: weight gained during pregnancy (in pounds) explained 4.9% of the variability in birth
weight
Regression equation: ŷ = 2996.67 + 10.24x

Hypotheses testing (Is birth weight associated with weight gained during pregnancy?):
H0: β = 0
Ha: β ≠ 0

t = 6.35, p < .001; 95% CI for β is (7.07, 13.41)

• Conclusion: Since the p-value associated with this test is p<.001 (which is smaller than α= .05), we
reject the null hypothesis β = 0 in favor of the alternative hypothesis Ha: β ≠ 0. We conclude
that weight gained during pregnancy is associated with birth weight and this relationship is
statistically significant. Based on a 95% CI, we can conclude that, on average, mothers that differ
by one pound in weight gained during pregnancy have infants that differ by between 7 and 13 grams
in birth weight.
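
All three simple regressions are statistically significant, yet they differ greatly in how much variability they explain. A minimal sketch that squares the R values reported in the three Model Summary tables (in simple regression, R is the absolute value of the Pearson correlation); the dictionary labels are illustrative.

```python
# R from each simple regression's Model Summary table.
simple_models = {
    "gestational age (weeks)": 0.583,
    "mother's age (years)": 0.168,
    "weight gained (pounds)": 0.222,
}

# Percentage of the variability in birth weight explained by each predictor alone.
explained = {name: round(100 * r ** 2, 1) for name, r in simple_models.items()}
for name, pct in explained.items():
    print(f"{name}: {pct}%")
```

The contrast illustrates that a very small p-value in a sample of about 800 does not imply a strong relationship.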

➢ Multiple Linear Regression

• Is birth weight associated with gestational age? Are age of mother and weight gained during
pregnancy potential confounders?

EXPLORATORY ANALYSIS:

1. First, construct scatterplots for each explanatory variable vs. the outcome variable:

GRAPHS → LEGACY DIALOGS → SCATTER/DOT → Matrix Scatter


Compute the Pearson product-moment correlation coefficient for each pair of variables:


ANALYZE → CORRELATE → BIVARIATE

Correlations

| Variable | Statistic | Weight of child (grams) | Completed weeks of gestation (weeks) | Age of mother (years) | Weight gained during pregnancy (pounds) |
|---|---|---|---|---|---|
| Weight of child (grams) | Pearson Correlation | 1 | .583** | .168** | .222** |
| | Sig. (2-tailed) | | .000 | .000 | .000 |
| | N | 800 | 799 | 800 | 777 |
| Completed weeks of gestation (weeks) | Pearson Correlation | .583** | 1 | .021 | .081* |
| | Sig. (2-tailed) | .000 | | .553 | .024 |
| | N | 799 | 799 | 799 | 777 |
| Age of mother (years) | Pearson Correlation | .168** | .021 | 1 | .028 |
| | Sig. (2-tailed) | .000 | .553 | | .441 |
| | N | 800 | 799 | 800 | 777 |
| Weight gained during pregnancy (pounds) | Pearson Correlation | .222** | .081* | .028 | 1 |
| | Sig. (2-tailed) | .000 | .024 | .441 | |
| | N | 777 | 777 | 777 | 777 |

**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).
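
The pattern of flagged and unflagged correlations in this table can be understood through the smallest |r| that reaches significance at a given sample size. A minimal sketch, assuming the relation r_crit = t*/sqrt(t*² + n − 2), where t* = 1.963 is an assumed approximate two-sided 5% critical value for several hundred degrees of freedom; the function name is illustrative.

```python
import math

T_CRIT = 1.963  # assumed approximate critical t (two-sided, alpha = .05, df near 780)

def critical_r(n, t_crit=T_CRIT):
    """Smallest |r| that is significant at the 5% level with n paired observations."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)

r_crit_777 = critical_r(777)
r_crit_799 = critical_r(799)
print(round(r_crit_777, 3), round(r_crit_799, 3))
```

Both thresholds are close to .07, which is why r = .081 (N = 777) is flagged as significant at the 0.05 level while r = .028 (N = 777) and r = .021 (N = 799) are not.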


The scatterplot matrix and correlational analysis suggest that the relationship between the outcome
variable and each of the explanatory variables is approximately linear. The Pearson correlation is
strongest for the birth weight – gestational age relationship (r=.58, p<.001), and rather weak but still
statistically significant for the birth weight – mother’s age relationship (r=.17, p<.001) and the
birth weight – weight gained during pregnancy relationship (r=.22, p<.001).

ANALYZE → REGRESSION → LINEAR

Confidence Intervals for β estimates and Descriptive statistics are available under Statistics; residual
analysis plots are available under Plots


Variables Entered/Removed table shows the outcome variable and the explanatory variables:

Variables Entered/Removedᵃ

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | Weight gained during pregnancy (pounds), Age of mother (years), Completed weeks of gestation (weeks)ᵇ | . | Enter |

a. Dependent Variable: Weight of child (grams)
b. All requested variables entered.

Model Summary table shows the r² estimate and residual standard deviation:

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .608ᵃ | .370 | .367 | 499.94166 |

a. Predictors: (Constant), Weight gained during pregnancy (pounds),
Age of mother (years), Completed weeks of gestation (weeks)

ANOVA table shows the Omnibus F-test results:


• H0: β1 = β2 = β3 = 0
• Ha: at least one beta parameter is not equal to 0

ANOVAᵃ

| Model | Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 113436571.246 | 3 | 37812190.415 | 151.284 | .000ᵇ |
| 1 | Residual | 193204908.215 | 773 | 249941.667 | | |
| 1 | Total | 306641479.461 | 776 | | | |

a. Dependent Variable: Weight of child (grams)
b. Predictors: (Constant), Weight gained during pregnancy (pounds), Age of mother (years),
Completed weeks of gestation (weeks)
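
The ANOVA table contains everything needed to rebuild the Model Summary values. A minimal sketch, assuming the standard definitions (mean square = sum of squares / df, F = MS regression / MS residual, R² = SS regression / SS total, and Std. Error of the Estimate = square root of MS residual); the variable names are illustrative.

```python
import math

# Sums of squares and degrees of freedom from the ANOVA table.
ss_regression, df_regression = 113436571.246, 3
ss_residual, df_residual = 193204908.215, 773

ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_stat = ms_regression / ms_residual                        # omnibus F
r_squared = ss_regression / ss_total                        # R Square
adj_r_squared = 1 - ms_residual / (ss_total / df_total)     # Adjusted R Square
se_estimate = math.sqrt(ms_residual)                        # Std. Error of the Estimate

print(round(f_stat, 3), round(r_squared, 3), round(adj_r_squared, 3), round(se_estimate, 3))
```

The total degrees of freedom (776) also confirm that 777 cases with complete data entered the multiple regression.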

Descriptive statistics table shows basic summary statistics for the outcome variable and the
explanatory variables (sample correlation coefficients are also produced, not shown here):

Descriptive Statistics

| Variable | Mean | Std. Deviation | N |
|---|---|---|---|
| Weight of child (grams) | 3309.7986 | 628.61478 | 777 |
| Completed weeks of gestation (weeks) | 38.66 | 2.630 | 777 |
| Age of mother (years) | 26.95 | 6.069 | 777 |
| Weight gained during pregnancy (pounds) | 30.58 | 13.649 | 777 |


Coefficients table shows the estimates for the intercept (b0) and slopes (b1 to b3) along with 95% CIs
for these estimates. This table also shows the t-test and associated p-value for the following tests:
H0: βk = 0
Ha: βk ≠ 0

Coefficientsᵃ

| Model | Term | B (Unstandardized) | Std. Error | Beta (Standardized) | t | Sig. | 95.0% CI for B: Lower Bound | 95.0% CI for B: Upper Bound |
|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | -2383.550 | 274.881 | | -8.671 | .000 | -2923.152 | -1843.948 |
| 1 | Completed weeks of gestation (weeks) | 129.586 | 6.847 | .542 | 18.926 | .000 | 116.145 | 143.027 |
| 1 | Age of mother (years) | 16.244 | 2.959 | .157 | 5.490 | .000 | 10.436 | 22.053 |
| 1 | Weight gained during pregnancy (pounds) | 8.016 | 1.320 | .174 | 6.075 | .000 | 5.426 | 10.607 |

a. Dependent Variable: Weight of child (grams)

Sample statistics:

n=777 – sample size (cases with complete data on all four variables)

Sample multiple correlation coefficient: R= .61

Sample r²: Gestational age, Age of mother and Weight gained during pregnancy explain 37% of the
variation in birth weight

Regression equation: ŷ = -2383.55 + 129.59*weeks + 16.24*mage + 8.02*gained
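
A quick sanity check on a fitted least-squares equation is that it should return approximately the mean outcome when evaluated at the means of the predictors. A minimal sketch using the coefficients and descriptive statistics from the tables above; the function name is illustrative, and `mage` and `gained` are the dataset's variable names for mother's age and weight gained.

```python
def predict_birth_weight(weeks, mage, gained):
    """Predicted birth weight (grams) from the fitted multiple regression equation."""
    return -2383.550 + 129.586 * weeks + 16.244 * mage + 8.016 * gained

# Predictor means from the Descriptive Statistics table (n = 777).
at_means = predict_birth_weight(weeks=38.66, mage=26.95, gained=30.58)
print(round(at_means, 1))
```

The prediction falls within about one gram of the sample mean birth weight of 3309.80 g; the remaining gap reflects rounding in the reported means.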

Hypothesis testing (omnibus F-test: is the model useful for prediction?):

• H0: β1 = β2 = β3 = 0
• Ha: at least one beta parameter is not equal to 0

F(3, 773) = 151.28, p < .001
• Conclusion: Since the p-value associated with this test is p<.001 (which is smaller than α= .05), we
reject the null hypothesis β1 = β2 = β3 = 0 in favor of the alternative hypothesis Ha: at least one
beta parameter is not equal to 0. We conclude that at least one explanatory variable has a non-zero
slope, i.e. is associated with the outcome.

Significance tests and CIs for slopes

All three explanatory variables are statistically significant (p<.001 for all 3 variables).

Birth weight – gestational age: t = 18.93, p < .001; 95% CI for β is (116.14, 143.03).
Infants that differ by one week in gestational age are expected to differ by 129.59 grams in
birth weight (on average) (95% CI (116.14, 143.03)), after controlling for mother’s age and
weight gained during pregnancy.


Birth weight – mother’s age: t = 5.49, p < .001; 95% CI for β is (10.44, 22.05).
Mothers who differ in age by one year are expected to have infants that differ by 16 grams in
birth weight (on average) (95% CI (10.44, 22.05)), after controlling for gestational age and
weight gained during pregnancy.

Birth weight – weight gained during pregnancy: t = 6.08, p < .001; 95% CI for β is (5.43,
10.61).
Mothers who differ by 1 pound in weight gained during pregnancy are expected to have infants
that differ by 8 grams in birth weight (on average) (95% CI (5.43, 10.61)), after controlling for
gestational age and mother’s age.
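
The question that opened this section, whether mother's age and weight gained are potential confounders, can be explored by comparing the crude and adjusted slopes for gestational age. The sketch below uses a change-in-estimate comparison; the 10% threshold is a common rule of thumb assumed here for illustration and is not stated in this lab.

```python
crude_slope = 137.117      # gestational age only (simple regression, n = 799)
adjusted_slope = 129.586   # adjusted for mother's age and weight gained (n = 777)

THRESHOLD_PERCENT = 10.0   # assumed rule-of-thumb cutoff, not from the lab

percent_change = 100 * (crude_slope - adjusted_slope) / crude_slope
exceeds_threshold = percent_change > THRESHOLD_PERCENT
print(round(percent_change, 1), exceeds_threshold)
```

Under that assumed cutoff, a change of roughly 5.5% would point to limited confounding of the gestational age effect. The comparison is only approximate, because the two models were fitted to different numbers of cases.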

Residual analysis
Check for:
1. Linearity: inspect the scatterplots (done; the assumption is met).
2. Constant variability (homoscedasticity): examine the scatterplot of standardized predicted
values vs. standardized residuals and look for a random cloud of data points. The scatterplot
does not show any serious violation of this assumption: there are no obvious trends visible
in the plot, and the spread of data points is approximately constant (except for a couple of
premature infants on the left side of the data cloud).

3. Approximate normality: examine the histogram of standardized residuals (the plot reveals
that the residuals are approximately normally distributed).


4. Independence: are the observations independent? Probably yes, but we need to check whether
siblings were included in the dataset (correlated observations).
