
**A Regression Analysis on the Birth Weight of Children Born in Philadelphia, 1990**

Christian Jones

Andrew Garrison

Introduction:

Low birth weight has been linked to mental retardation, epilepsy, deafness, and blindness. It has furthermore been shown to increase the chance of perinatal mortality (Bache et al 2008). Because of this, we want to analyze the factors that may contribute to low birth weight in children. Many of the suggested causes of low birth weight are manageable, even in poor countries (Pojda & Kelley 2000). We therefore hope that by finding some of the key contributing factors to low birth weight, we can shed some light on this issue.

We chose to analyze the birth weight of children born in Philadelphia in the year 1990. We found this relevant because we and our peers were born close to this time, and thus could have fit right into this dataset. While there are many factors that may affect birth weight, we chose to focus on a select few for simplicity's sake. Smoking is often linked to low birth weight, and thus we wanted to test that factor in our regression (Bache et al 2008). Other predictor variables we chose to include were race, education level, and gestation period. Race could possibly affect the outcome due to genetic differences. Education level may not have a direct effect on birth weight, but higher education may lead to healthier decisions which improve the child's birth weight. Finally, gestation period should also directly affect the birth weight.

We expect that smoking and gestation period will be strong predictors of birth weight in our regression analysis. We are unsure at this time whether race and education level will be significant to the model.

Analysis:

We will attempt to build a linear regression model to predict the birth weight of children (measured in grams). Our pool of potential predictor variables includes race (other = 0, black = 1), education level (number of years), whether or not the mother smoked during pregnancy (no = 0, yes = 1) and gestation period (number of days).

We begin by examining the Minitab output regressing birth weight on all four of these predictors (fig. 1). Using Minitab's data subsetting feature, we immediately see that there may be an interaction between the race and smoking variables, and there may be curvature in the gestation period variable. There is also a clear lack of fit (p = 0.000). To correct these problems, we add an interaction term between race and smoking and add squared and cubed terms of the gestation period variable. We run the regression analysis in Minitab again and these issues are resolved. However, there is clearly collinearity present between the gestation period variable and its squared and cubed terms (VIF = 6750.440, 30542.551 and 8773.925, respectively). To address this concern, we center the gestation period variable by subtracting the mean value from each value. We then square and cube this centered variable to create the polynomial terms. These centered variables are represented by "Cgestation", "Cgest^2" and "Cgest^3". Fig. 2a shows the Minitab output for this regression analysis. We see that there are still legitimate collinearity concerns between the squared and cubed terms of the gestation period variable (VIF = 23.061 and 19.656, respectively). However, having already centered the variables, we choose to move forward.
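Why centering helps can be seen in a small sketch. The gestation values below are hypothetical (they are not the Philadelphia data), and the model is reduced to two terms, x and x², for which the VIF is simply 1 / (1 - r²), with r the correlation between the two terms.

```python
from math import sqrt

def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def vif_two_predictors(a, b):
    """VIF of either predictor when the model holds exactly two predictors."""
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r * r)

# Hypothetical gestation values, for illustration only.
gestation = list(range(30, 45))
mean_g = sum(gestation) / len(gestation)
centered = [g - mean_g for g in gestation]

vif_raw = vif_two_predictors(gestation, [g ** 2 for g in gestation])
vif_centered = vif_two_predictors(centered, [c ** 2 for c in centered])

print(f"VIF, raw x and x^2:      {vif_raw:.1f}")
print(f"VIF, centered x and x^2: {vif_centered:.3f}")
```

Because these made-up values are perfectly symmetric about their mean, centering removes the correlation entirely. Real gestation data are skewed, so the reduction would be only partial, which is in line with the VIFs near 20 that remain in fig. 2a.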

Next, we check the assumption that the residual terms follow a normal distribution (fig. 2b). The Anderson-Darling test statistic = 0.365, and p = 0.437. Since p is well above 0.05, we fail to reject normality and conclude that the residuals can be treated as following a normal distribution.
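As a rough check on the reported figures, the sketch below computes the Anderson-Darling statistic for a sample with estimated mean and standard deviation, and converts a statistic to a p-value with a commonly cited approximation attributed to D'Agostino and Stephens. Whether Minitab uses exactly this approximation is an assumption, and the function names are ours.

```python
import random
from math import exp, log
from statistics import NormalDist, mean, stdev

def anderson_darling_normal(sample):
    """A^2 statistic for normality, with mean and sd estimated from the sample."""
    xs = sorted(sample)
    n = len(xs)
    dist = NormalDist(mean(xs), stdev(xs))
    cdf = [dist.cdf(x) for x in xs]
    total = sum((2 * i + 1) * (log(cdf[i]) + log(1.0 - cdf[n - 1 - i]))
                for i in range(n))
    return -n - total / n

def ad_p_value(a2, n):
    """Approximate p-value for A^2 (assumed approximation, not Minitab's documented method)."""
    a = a2 * (1 + 0.75 / n + 2.25 / n ** 2)  # small-sample adjustment
    if a >= 0.6:
        return exp(1.2937 - 5.709 * a + 0.0186 * a ** 2)
    if a > 0.34:
        return exp(0.9177 - 4.279 * a - 1.38 * a ** 2)
    if a > 0.2:
        return 1 - exp(-8.318 + 42.796 * a - 59.938 * a ** 2)
    return 1 - exp(-13.436 + 101.14 * a - 223.73 * a ** 2)

# Statistic and sample size reported in fig. 2b.
p_reported = ad_p_value(0.365, 1115)
print(f"approximate p-value for A^2 = 0.365, n = 1115: {p_reported:.3f}")

# Synthetic residuals (not the study data) to exercise the statistic itself.
random.seed(1)
fake_residuals = [random.gauss(0, 428.6) for _ in range(200)]
a2_fake = anderson_darling_normal(fake_residuals)
print(f"A^2 for synthetic normal residuals: {a2_fake:.3f}")
```

Under these assumptions the approximation gives a p-value of about 0.44, close to the 0.437 that Minitab reports.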

We also check the assumption that the residual terms have equal variance by applying the modified Levene test (fig. 2c). We conclude that the error terms for the education level (p = 0.843) and gestation period (p = 0.843) variables, as well as the squared (p = 0.930) and cubed (p = 0.843) terms for gestation period, all have constant variance. We cannot run this test for the race and smoking variables because they are not continuous.
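One common form of the modified Levene (Brown-Forsythe) test for a continuous predictor splits the residuals into two groups at the predictor's median and compares the groups' absolute deviations from their own medians with a pooled two-sample t statistic. The sketch below follows that recipe on synthetic data; it is an assumption that the analysis above used this exact grouping, and the p-value here uses a large-sample normal approximation rather than the t distribution.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, median

def brown_forsythe_t(residuals, predictor):
    """Pooled t statistic on absolute deviations from group medians."""
    cut = median(predictor)
    low = [e for e, x in zip(residuals, predictor) if x <= cut]
    high = [e for e, x in zip(residuals, predictor) if x > cut]
    med_low, med_high = median(low), median(high)
    d_low = [abs(e - med_low) for e in low]
    d_high = [abs(e - med_high) for e in high]
    n1, n2 = len(d_low), len(d_high)
    m1, m2 = mean(d_low), mean(d_high)
    ss = sum((d - m1) ** 2 for d in d_low) + sum((d - m2) ** 2 for d in d_high)
    pooled_sd = sqrt(ss / (n1 + n2 - 2))
    return (m1 - m2) / (pooled_sd * sqrt(1 / n1 + 1 / n2))

def two_sided_p_normal(t):
    """Large-sample approximation to the two-sided p-value."""
    return 2 * (1 - NormalDist().cdf(abs(t)))

# Synthetic data only: one residual series with constant spread,
# one whose spread jumps for large predictor values.
random.seed(0)
x = list(range(400))
constant_spread = [random.gauss(0, 1) for _ in x]
growing_spread = [random.gauss(0, 1 if xi < 200 else 5) for xi in x]

t_const = brown_forsythe_t(constant_spread, x)
t_grow = brown_forsythe_t(growing_spread, x)
print(f"constant spread: t = {t_const:.2f}, p ~ {two_sided_p_normal(t_const):.3f}")
print(f"growing spread:  t = {t_grow:.2f}, p ~ {two_sided_p_normal(t_grow):.3g}")
```

A large p-value, like those reported above, means the two halves show similar spread, so there is no evidence against constant variance.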

Having checked the assumptions and made the necessary transformations, we begin the process of building the best model. We split our data into two roughly equal sets: one to select a few candidate models and the other to validate our selection. We make use of all three of Minitab's automatic procedures: forward selection, backward selection and stepwise selection (figs. 3a-c). The forward and stepwise selection methods select the same model, which includes the gestation, race, smoking, and race-by-smoking interaction predictors. The backward selection method selects a model including all of the predictors except education level. We also use Minitab's best subsets function to compare models based on R², R²(adj) and Mallows' Cp values (fig. 3d). We find that the model selected by the backward selection method has the highest R² and R²(adj) values and the lowest Cp. Thus, we choose these two models to compare using our validation data set.
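For reference, the two criteria used in the comparison have the standard definitions below, where n is the number of observations, p is the number of estimated parameters in the candidate model (including the constant), SSE_p is that model's residual sum of squares, SST is the total sum of squares, and MSE_full is the mean squared error of the model containing all candidate predictors. The notation is ours and is not taken from the Minitab output.

$$
R^2_{\text{adj}} = 1 - \frac{SSE_p/(n-p)}{SST/(n-1)}, \qquad
C_p = \frac{SSE_p}{MSE_{\text{full}}} - (n - 2p)
$$

For the full model, C_p equals p by construction, which matches the value of 8.0 shown for the seven-predictor model in fig. 3b. A smaller model with little bias should have C_p close to or below its own p.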

Using the validation data set, we again use Minitab's best subsets feature to compare our two candidate models (fig. 4). The only difference between the models is whether the squared and cubed terms for the gestation period are included. While including these terms yields only a marginal increase in the R² and R²(adj) values, the Cp value decreases from 26.8 to 7.0. Since these terms can be computed from the gestation period term, including them in the model will not require any additional observations and will not be expensive. Therefore, we feel that the large reduction in the Cp value justifies their inclusion in the model. Our final model includes race, smoking, the race and smoking interaction, gestation period, and the squared and cubed terms of gestation period as its predictors. Education level is excluded from the model. The regression analysis of this model is shown in fig. 5.
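To make the fitted equation concrete, here is a minimal sketch that evaluates it using the coefficients printed in fig. 5. The function name is ours, and the input is the centered gestation value, since the sample mean used for centering is not reported.

```python
def predict_weight(black, smoke, cgest):
    """Predicted birth weight in grams from the fig. 5 coefficients.

    black, smoke: 0/1 indicators; cgest: gestation minus its sample mean.
    """
    return (3439.07
            - 227.52 * black
            - 316.64 * smoke
            + 138.724 * cgest
            + 215.25 * black * smoke
            - 10.285 * cgest ** 2
            - 0.5338 * cgest ** 3)

# Baseline: non-black, non-smoking mother at the mean gestation.
baseline = predict_weight(black=0, smoke=0, cgest=0)

# Smoking effect within each race group at the mean gestation.
smoke_effect_other = predict_weight(0, 1, 0) - predict_weight(0, 0, 0)
smoke_effect_black = predict_weight(1, 1, 0) - predict_weight(1, 0, 0)

print(f"baseline prediction: {baseline:.2f} g")
print(f"smoking effect, other: {smoke_effect_other:.2f} g")
print(f"smoking effect, black: {smoke_effect_black:.2f} g")
```

The positive interaction coefficient means the estimated smoking penalty is smaller for black mothers (about -101 g) than for other mothers (about -317 g), holding gestation fixed.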

Conclusion:

Our regression supported much of what we thought to be true, and also surprised us in some areas. It confirmed our beliefs that smoking and gestation period are key components in predicting birth weight, although we did not expect to see an R² of only 54.2%. However, birth weight is a complex problem that cannot be easily explained by four variables. If four variables can explain 54.2% of the variation in birth weight, that can be considered a success.
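The 54.2% figure can be reproduced from the analysis of variance table in fig. 5. The short sketch below recomputes R², adjusted R² and the F statistic from the sums of squares and degrees of freedom printed there; the variable names are ours.

```python
# Values copied from the Analysis of Variance table in fig. 5.
ss_regression = 242254609
ss_error = 204843542
ss_total = 447098151
df_regression = 6
df_error = 1108
df_total = 1114

r_sq = 1 - ss_error / ss_total
r_sq_adj = 1 - (ss_error / df_error) / (ss_total / df_total)
f_stat = (ss_regression / df_regression) / (ss_error / df_error)

print(f"R-Sq      = {100 * r_sq:.1f}%")
print(f"R-Sq(adj) = {100 * r_sq_adj:.1f}%")
print(f"F         = {f_stat:.2f}")
```

These agree with the R-Sq = 54.2%, R-Sq(adj) = 53.9% and F = 218.39 reported by Minitab.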

We were somewhat surprised to see that education did not contribute much to the model. Education clearly should not directly impact birth weight; however, we thought that since it might impact a mother's health choices, higher education may lead to higher birth weights.

It is interesting to note that race was a significant component of our selected model. One possibility is that low birth weight is genetically more common among black infants, but it is also very possible that race is acting as a proxy for socioeconomic class.

The inclusion of the gestation period was no surprise to us. It seems intuitive that a shorter gestation period would likely result in a smaller baby, and our results confirm our suspicions. As we would have suspected, it is the strongest indicator of birth weight in our model.

Through running our regression, we now have a better understanding of some of the factors contributing to low birth weight. Smoking and gestation period were factors as expected, but we were surprised to learn that higher education levels may not be a reasonable solution to the problem. Furthermore, we were surprised to find that race acted as a significant predictor of low birth weight. We are satisfied with the results of our regression and have learned more about the factors associated with low birth weight.

Appendix

**Figure 1: Model including race, education, smoking, gestation**

The regression equation is

weight = - 2834 - 169 black + 9.57 edu - 175 smoke + 157 gestation

Predictor Coef SE Coef T P VIF

Constant -2834.5 215.6 -13.15 0.000

black -168.97 27.26 -6.20 0.000 1.051

edu 9.572 6.458 1.48 0.139 1.074

smoke -174.81 31.62 -5.53 0.000 1.073

gestation 156.512 5.014 31.22 0.000 1.051

S = 436.107 R-Sq = 52.8% R-Sq(adj) = 52.6%

Analysis of Variance

Source DF SS MS F P

Regression 4 235987581 58996895 310.20 0.000

Residual Error 1110 211110570 190190

Total 1114 447098151

Source DF Seq SS

black 1 29407936

edu 1 3020604

smoke 1 18211075

gestation 1 185347966

Unusual Observations

R denotes an observation with a large standardized residual.

X denotes an observation whose X value gives it large leverage.

Lack of fit test

Possible interaction in variable black (P-Value = 0.000 )

Possible interaction in variable smoke (P-Value = 0.047 )

Possible curvature in variable gestation (P-Value = 0.000 )

Overall lack of fit test is significant at P = 0.000

Figure 2: Model including race, education, smoking, gestation (centered), interaction term between race and smoking, and gestation (centered) squared and cubed

Fig. 2a Regression Analysis

The regression equation is

weight = 3352 - 222 black + 6.68 edu - 306 smoke + 139 Cgestation

+ 209 black·smoke - 10.2 Cgest^2 - 0.527 Cgest^3

Predictor Coef SE Coef T P VIF

Constant 3352.28 86.85 38.60 0.000

black -222.48 30.60 -7.27 0.000 1.363

edu 6.677 6.405 1.04 0.297 1.087

smoke -305.87 50.38 -6.07 0.000 2.803

Cgestation 138.517 7.843 17.66 0.000 2.647

black·smoke 209.29 62.70 3.34 0.001 3.093

Cgest^2 -10.199 2.110 -4.83 0.000 23.061

Cgest^3 -0.5271 0.1208 -4.36 0.000 19.656

S = 429.956 R-Sq = 54.2% R-Sq(adj) = 53.9%

Analysis of Variance

Source DF SS MS F P

Regression 7 242455509 34636501 187.36 0.000

Residual Error 1107 204642642 184862

Total 1114 447098151

Source DF Seq SS

black 1 29407936

edu 1 3020604

smoke 1 18211075

Cgestation 1 185347966

black·smoke 1 2077746

Cgest^2 1 871580

Cgest^3 1 3518602

No evidence of lack of fit (P = 0.1).

Fig. 2b Checking Normality of Residuals

**Anderson-Darling Test**

H0: residuals follow a normal distribution

Ha: residuals do not follow a normal distribution

At the .05 significance level, we fail to reject H0 and conclude that the normality assumption is plausible.

Fig. 2c Checking Equal Variance Assumption (Modified Levene's Test)

Education: P = 0.843

Centered Gestation: P = 0.843

(Centered Gestation)^2: P = 0.930

(Centered Gestation)^3: P = 0.843

Note: We cannot run the test for the race and smoking variables because they are not continuous.

H0: residuals have constant variance

Ha: residuals do not have constant variance

For all continuous variables, we fail to reject H0 at the .05 significance level and conclude that the residuals have equal variance.

[Plot for Fig. 2b: Probability Plot of RESI1, Normal, 95% CI. Horizontal axis: RESI1 (about -2000 to 2000); vertical axis: Percent (0.01 to 99.99). Mean = 3.164281E-12, StDev = 428.6, N = 1115, AD = 0.365, P-value = 0.437]

Figure 3: Model Building

**Fig. 3a Forward Selection**

Forward selection. Alpha-to-Enter: 0.25

Response is weight on 7 predictors, with N = 557

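| | Step 1 | Step 2 | Step 3 | Step 4 |
|---|---|---|---|---|
| Constant | 3208 | 3325 | 3368 | 3399 |
| Cgestation | 167.7 | 161.3 | 156.3 | 156.8 |
| T-Value | 23.83 | 22.98 | 22.39 | 22.56 |
| P-Value | 0.000 | 0.000 | 0.000 | 0.000 |
| black | | -196 | -181 | -234 |
| T-Value | | -4.93 | -4.64 | -5.29 |
| P-Value | | 0.000 | 0.000 | 0.000 |
| smoke | | | -199 | -349 |
| T-Value | | | -4.61 | -4.73 |
| P-Value | | | 0.000 | 0.000 |
| black·smoke | | | | 225 |
| T-Value | | | | 2.50 |
| P-Value | | | | 0.013 |
| S | 459 | 450 | 442 | 440 |
| R-Sq | 50.57 | 52.65 | 54.40 | 54.91 |
| R-Sq(adj) | 50.48 | 52.48 | 54.15 | 54.58 |
| Mallows Cp | 54.2 | 30.7 | 11.2 | 6.9 |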

Fig. 3b Backward Selection

Backward elimination. Alpha-to-Remove: 0.1

Response is weight on 7 predictors, with N = 557

| | Step 1 | Step 2 |
|---|---|---|
| Constant | 3417 | 3423 |
| black | -235 | -235 |
| T-Value | -5.17 | -5.30 |
| P-Value | 0.000 | 0.000 |
| edu | 0.5 | |
| T-Value | 0.05 | |
| P-Value | 0.957 | |
| smoke | -353 | -354 |
| T-Value | -4.66 | -4.82 |
| P-Value | 0.000 | 0.000 |
| black·smoke | 236 | 237 |
| T-Value | 2.59 | 2.62 |
| P-Value | 0.010 | 0.009 |
| Cgest^2 | -6.4 | -6.4 |
| T-Value | -2.16 | -2.17 |
| P-Value | 0.031 | 0.030 |
| Cgest^3 | -0.35 | -0.35 |
| T-Value | -2.18 | -2.19 |
| P-Value | 0.030 | 0.029 |
| Cgestation | 147 | 147 |
| T-Value | 12.83 | 12.84 |
| P-Value | 0.000 | 0.000 |
| S | 439 | 439 |
| R-Sq | 55.31 | 55.31 |
| R-Sq(adj) | 54.74 | 54.82 |
| Mallows Cp | 8.0 | 6.0 |

Fig. 3c Stepwise Selection

Alpha-to-Enter: 0.15 Alpha-to-Remove: 0.15

Response is weight on 7 predictors, with N = 557

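| | Step 1 | Step 2 | Step 3 | Step 4 |
|---|---|---|---|---|
| Constant | 3208 | 3325 | 3368 | 3399 |
| Cgestation | 167.7 | 161.3 | 156.3 | 156.8 |
| T-Value | 23.83 | 22.98 | 22.39 | 22.56 |
| P-Value | 0.000 | 0.000 | 0.000 | 0.000 |
| black | | -196 | -181 | -234 |
| T-Value | | -4.93 | -4.64 | -5.29 |
| P-Value | | 0.000 | 0.000 | 0.000 |
| smoke | | | -199 | -349 |
| T-Value | | | -4.61 | -4.73 |
| P-Value | | | 0.000 | 0.000 |
| black·smoke | | | | 225 |
| T-Value | | | | 2.50 |
| P-Value | | | | 0.013 |
| S | 459 | 450 | 442 | 440 |
| R-Sq | 50.57 | 52.65 | 54.40 | 54.91 |
| R-Sq(adj) | 50.48 | 52.48 | 54.15 | 54.58 |
| Mallows Cp | 54.2 | 30.7 | 11.2 | 6.9 |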

Fig. 3d Criterion Selection

Response is weight

Predictor columns, left to right: black, edu, smoke, black·smoke, Cgest^2, Cgest^3, Cgestation (X = predictor included)

Vars R-Sq R-Sq(adj) Mallows Cp S Predictors

1 50.6 50.5 54.2 459.24 X

1 30.4 30.2 302.3 545.05 X

2 52.6 52.5 30.7 449.90 X X

2 52.6 52.5 30.9 450.01 X X

3 54.4 54.2 11.2 441.90 X X X

3 53.1 52.8 27.4 448.25 X X X

4 54.9 54.6 6.9 439.82 X X X X

4 54.4 54.1 12.9 442.19 X X X X

5 54.9 54.5 8.7 440.15 X X X X X

5 54.9 54.5 8.8 440.18 X X X X X

6 55.3 54.8 6.0 438.67 X X X X X X

6 54.9 54.4 10.7 440.53 X X X X X X

7 55.3 54.7 8.0 439.07 X X X X X X X

Figure 4 Model Validation

Response is weight

Predictor columns, left to right: black, smoke, black·smoke, Cgest^2, Cgest^3, Cgestation (X = predictor included)

Vars R-Sq R-Sq(adj) Mallows Cp S Predictors

1 47.6 47.5 61.4 443.73 X

1 32.6 32.5 237.2 503.13 X

1 28.7 28.5 284.0 517.78 X

1 5.6 5.4 554.8 595.59 X

1 3.6 3.4 578.1 601.83 X

2 49.3 49.1 43.9 437.04 X X

2 48.8 48.6 49.7 439.16 X X

2 48.1 47.9 57.2 441.89 X X

2 48.1 47.9 57.8 442.09 X X

2 47.7 47.5 62.4 443.75 X X

3 50.5 50.2 31.3 432.03 X X X

3 49.9 49.6 38.5 434.70 X X X

3 49.5 49.2 43.2 436.44 X X X

3 49.4 49.1 44.2 436.80 X X X

3 49.4 49.1 44.5 436.89 X X X

4 51.4 51.1 22.6 428.40 X X X X

4 51.2 50.9 24.7 429.19 X X X X

4 51.1 50.7 26.8 429.98 X X X X

4 50.7 50.4 30.9 431.52 X X X X

4 50.6 50.3 31.7 431.83 X X X X

5 52.7 52.2 10.0 423.28 X X X X X

5 51.7 51.3 21.2 427.52 X X X X X

5 51.6 51.2 22.5 428.03 X X X X X

5 51.2 50.8 26.8 429.64 X X X X X

5 50.6 50.2 33.7 432.20 X X X X X

6 53.1 52.6 7.0 421.76 X X X X X X

Figure 5 Regression Analysis of Final Model

The regression equation is

weight = 3439 - 228 black - 317 smoke + 139 Cgestation + 215 black·smoke

- 10.3 Cgest^2 - 0.534 Cgest^3

Predictor Coef SE Coef T P VIF

Constant 3439.07 24.75 138.93 0.000

black -227.52 30.22 -7.53 0.000 1.329

smoke -316.64 49.31 -6.42 0.000 2.685

Cgestation 138.724 7.841 17.69 0.000 2.646

black·smoke 215.25 62.44 3.45 0.001 3.067

Cgest^2 -10.285 2.109 -4.88 0.000 23.026

Cgest^3 -0.5338 0.1206 -4.42 0.000 19.600

S = 429.973 R-Sq = 54.2% R-Sq(adj) = 53.9%

Analysis of Variance

Source DF SS MS F P

Regression 6 242254609 40375768 218.39 0.000

Residual Error 1108 204843542 184877

Total 1114 447098151

Source DF Seq SS

black 1 29407936

smoke 1 20664220

Cgestation 1 185497639

black·smoke 1 2229039

Cgest^2 1 836653

Cgest^3 1 3619122

No evidence of lack of fit (P = 0.1).
