
1. Introduction

Children's environment, including nutrition and family income, has been reported to play an important role in children's reading achievement. To evaluate the relationship between mathematics achievement and parenting practice, and to diagnose the resulting regression model, data from a national probability sample are used in the following analysis.

2. Descriptive statistics

Table 1. Descriptive Statistics

Variables                               N       M          SD          Minimum      Maximum
Dependent variable
  Mathraw                               2211    36.32      22.27       0            98.00
Independent variables (quantitative)
  Age97                                 2223    7.47       2.93        3.00         13.00
  Faminc97                              3563    49841.3    49751.07    -72296.26    784610.59
  Home97                                3563    18.92      3.62        7.00         27.00
Independent variables (dichotomous)
  Low birth weight status               3563    0.39
  WIC participation                     3322    0.43
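As a quick arithmetic check on Table 1, the range within one standard deviation of the mean can be computed directly. This is only a sketch: the function name `one_sd_range` is hypothetical, and reading such a range as covering about 68% of the children assumes an approximately normal distribution, which Table 1 alone does not establish.

```python
def one_sd_range(mean, sd):
    """Return the (lower, upper) bounds one standard deviation around the mean."""
    return (round(mean - sd, 2), round(mean + sd, 2))

# Means and standard deviations taken from Table 1.
age_range = one_sd_range(7.47, 2.93)     # Age97
home_range = one_sd_range(18.92, 3.62)   # Home97

print("Age97 within one SD:", age_range)
print("Home97 within one SD:", home_range)
```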

The descriptive statistics for the variables are presented in Table 1. The skewness of each variable was examined, and no value exceeded an absolute value of one, which suggests reasonably symmetric distributions.

The first variable is Age97. The mean age of the children is 7.47 and the sample standard deviation is 2.93, so if age were approximately normally distributed, about 68.3% of the children in this sample would be aged 4.54 to 10.40. The ages of the participating children range from 3.00 to 13.00.

The second variable is family income. The mean family income is 49841.25 and the standard deviation is 49751.07; family income ranges from -72296.26 to 784610.59.

The third variable is low birth weight status: 38.82% of the children in the sample were born with low birth weight.

The fourth variable is parenting practice (Home97), which ranges from 7.00 to 27.00. Its mean is 18.92 and its standard deviation is 3.62, so under approximate normality about 68.3% of the children would have a parenting practice score between 15.30 and 22.54.

3. Multiple Regression Model

Table 2. Multiple Regression Models
                           Model 1                            Model 2
Independent variables      B (SE)           Standardized B    B (SE)           Standardized B
Age97                      7.01* (0.06)     0.92*             6.89* (0.06)     0.90*
Faminc97                   0.00* (0.00)     0.08*             0.00* (0.00)     0.05*
WIC program                -3.11* (0.40)    -0.07*            -2.00* (0.40)    -0.04*
Low birth weight status    -2.15* (0.38)    -0.04*            -1.86* (0.37)    -0.04*
Home97                     -                -                 0.65* (0.07)     0.09*
N                          2042                               2042
R-squared                  0.8648                             0.8708

* p<.05

Multiple regression can be used to examine the relationship between several independent variables (child's age, family income, low birth weight status, WIC participation, and home parenting practice) and a single continuous dependent variable (the raw score of the child's mathematics achievement). The general formula for ordinary least squares regression with more than one predictor is $\hat{Y} = a + b_1 X_1 + b_2 X_2 + \dots + b_p X_p$.

Table 2 shows the multiple regression models of children's math raw score. To assess the influence of the independent variables on the dependent variable, I estimated two models. Model 1 includes child's age, family income, low birth weight status, and WIC participation. Model 2 adds the home parenting practice variable in order to see its influence on the child's math raw score.

Model 1 in Table 2 indicates that, controlling for the other variables, each one-unit increase in the child's age is associated with a 7.01-point increase in the child's raw math score. Each one-unit increase in family income is associated with an increase of about 0.00003 points (roughly 0.32 points per 10,000 units of income), which rounds to 0.00 in the table. The raw math score of low birth weight children is on average 2.15 points (unstandardized coefficient) lower than that of their counterparts, and the raw math score of children in the WIC program is on average 3.11 points (unstandardized coefficient) lower than that of their counterparts.

From these results, increases in the child's age and in family income are positively associated with the child's raw math score, whereas low birth weight status and WIC program participation are negatively associated with it.

In the second model, the parenting practice variable is added to the independent variables. The result indicates that each one-unit increase in parenting practice is associated with a 0.65-point increase in the child's raw math score, which suggests that children benefit from more parenting practice at home.

4. Correlation
Table 3: Pearson Correlation Coefficients

                        Math Test    Age of      Total Family    Birth Weight    WIC
                        Raw Score    Child       Income          Status          Participation
Math Test Raw Score     1.00000      0.91702*    0.15325*        0.13407*        -0.18295*
Age of Child            0.91702*     1.00000     0.05013*        0.21580*        -0.08985*
Total Family Income     0.15325*     0.05013*    1.00000         -0.10109*       -0.39297*
Birth Weight Status     0.13407*     0.21580*    -0.10109*       1.00000         0.10401*
WIC Participation       -0.18295*    -0.08985*   -0.39297*       0.10401*        1.00000

* p<.05

The correlation coefficient measures how well a straight line fits a scatter of points plotted on x and y axes. The correlations between the variables are shown in the table above. Before using the Pearson correlation coefficient as a measure of association, however, we should be aware of its assumptions and limitations. Diagnostics of these assumptions are therefore provided below.

5. Diagnostic

Violations of assumptions can lead to problems in the specification of the regression model, and detecting them provides valuable clues for revising the model. The true relationship between the IVs and the DV may take many different mathematical forms, so we use graphical methods here to detect misspecification of the form of the relationship.

1) Linearity

To check whether the data meet the linearity assumption of linear regression, we construct a separate scatter plot of the dependent variable (mathraw) against each independent variable and superimpose a linear fit and a loess curve. The graph for the faminc variable shows a large discrepancy between the loess curve and the regression line. In the graph for the age variable, the loess curve also departs from the straight regression line. As a result, both variables violate linearity.
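The loess overlay is a graphical check, but the same idea can be approximated numerically. The sketch below is not the procedure used in this analysis: it fits a straight line and then averages the residuals within bins of the predictor, a crude stand-in for loess. The helper names and the toy data are hypothetical. If the relationship is linear the binned means stay near zero; a curved relationship leaves a systematic pattern.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def binned_residual_means(x, y, bins=3):
    """Mean residual from the straight-line fit within each bin of sorted x."""
    intercept, slope = fit_line(x, y)
    pairs = sorted(zip(x, y))
    resid = [yi - (intercept + slope * xi) for xi, yi in pairs]
    size = len(resid) // bins
    means = []
    for b in range(bins):
        start = b * size
        end = (b + 1) * size if b < bins - 1 else len(resid)
        chunk = resid[start:end]
        means.append(sum(chunk) / len(chunk))
    return means

# Toy data (not the study data): one curved and one exactly linear relationship.
x = list(range(9))
curved = binned_residual_means(x, [xi ** 2 for xi in x])
linear = binned_residual_means(x, [3 + 2 * xi for xi in x])

print("curved:", curved)   # positive, negative, positive pattern
print("linear:", linear)   # all approximately zero
```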

2) Homoscedasticity

When the variance of the error term is constant, the data are considered homoscedastic; otherwise, the data are said to be heteroscedastic. The plot of the predicted values suggests some heteroscedasticity. The graph for age of child (graph) shows that the residuals have constant variance, so age does not violate the assumption. In the graph for family income (graph 5), however, the residuals are concentrated unevenly above and below zero, which means that the variance is not constant and thus a systematic error exists.
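A rough numeric counterpart to the residual plots is to compare the spread of the residuals at low and high values of a predictor. The following is a minimal sketch under stated assumptions: the helper names and the toy data are hypothetical, and the simple ratio of mean squared residuals is an illustration rather than a formal test or the method used in this analysis.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def residual_spread_ratio(x, y):
    """Mean squared residual in the upper half of x divided by the lower half."""
    intercept, slope = fit_line(x, y)
    pairs = sorted(zip(x, y))
    resid = [yi - (intercept + slope * xi) for xi, yi in pairs]
    half = len(resid) // 2
    lower, upper = resid[:half], resid[half:]
    mean_sq = lambda values: sum(r * r for r in values) / len(values)
    return mean_sq(upper) / mean_sq(lower)

# Toy data (not the study data): the error grows in proportion to x.
x = list(range(1, 11))
y = [2 * xi + ((-1) ** i) * 0.2 * xi for i, xi in enumerate(x)]

ratio = residual_spread_ratio(x, y)
print("upper/lower residual spread ratio:", round(ratio, 2))
```

A ratio well above 1 means the residuals fan out as the predictor increases, which is the pattern described above for family income.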
3) Normality

Non-normal residuals are often an important signal of problems such as misspecification in the regression model. The normality graph shows that the residuals are approximately normal.

6. Correction

To deal with the violations of linearity and homoscedasticity, the variables were transformed. We apply a log transformation to family income and create a centered age-squared term. Using the transformed variables, we fit a new model.

1) Linearity

The result indicates that age2c is significant, so age has a nonlinear relationship with mathraw97 and age2c should be included in the model. In the graph of the log of family income against math raw score (graph 7), the regression line matches the loess curve more closely, which means that the log of family income should also be in the model. From this analysis, we conclude that the new model is better than the old one.

2) Homoscedasticity

Apart from linearity, the homoscedasticity of the variables was also checked. Based on the earlier diagnostic, the age variable is homoscedastic and does not need to be rechecked. The distribution of the residuals in the new graph for the log of family income (graph 8) is more even than before.

3) Normality

The normality of the residuals was also checked. The new graph (graph 9) shows that the residuals are approximately normal in the new model.

7. Parenting practice (Home 97)

Parenting practice is thought to be a relevant variable for the model. To test whether home97 should be included as a potential omitted relevant variable, we construct an added variable plot (AVP), which plots the residuals from the original model on the y-axis against the residuals from a regression of home97 on the other predictors on the x-axis (home97=age97 faminc97 bthwht wicpreg). Because the regression line and loess curve on the plot are not horizontal, there is reason to suspect that home97 is an omitted variable. If we do not include home97 in the model, we will therefore face omitted variable bias.
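The logic of the added variable plot can be sketched with a single existing predictor. The example below uses hypothetical toy data and helper names, not the study data: `x` stands for a predictor already in the model, `z` for the candidate variable, and `y` for the outcome. The slope of the outcome residuals on the candidate residuals equals the coefficient the candidate would receive in the full model, so a non-horizontal line signals a relevant omitted variable.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def residuals(x, y):
    """Residuals of y after regressing it on x."""
    intercept, slope = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Toy data: y depends on both x and z, with a true coefficient of 0.5 on z.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
z = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [1 + 2 * xi + 0.5 * zi for xi, zi in zip(x, z)]

ry = residuals(x, y)   # y-axis of the AVP: outcome with x partialled out
rz = residuals(x, z)   # x-axis of the AVP: candidate with x partialled out
_, avp_slope = fit_line(rz, ry)

print("AVP slope:", round(avp_slope, 4))
```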

Therefore, our new model is as follows:

mathraw97 = intercept + b1*agec + b2*age2c + b3*loginc + b4*bthwht + b5*wicpreg + b6*home97 + e
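The transformed predictors in the model above can be sketched as follows. The exact construction is an assumption: the text says only that age-squared is created "with centering", so here `age2c` is taken to be the square of mean-centered age, and the log is taken as the natural log of strictly positive incomes. How non-positive incomes were handled is not stated in the source. The sample values are illustrative only.

```python
import math

def center(values):
    """Subtract the mean from each value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

# Illustrative values only (not the study data).
ages = [3, 5, 7, 9, 11, 13]
incomes = [12000.0, 35000.0, 49841.25, 120000.0]

agec = center(ages)                        # centered age
age2c = [a * a for a in agec]              # assumed: square of centered age
loginc = [math.log(v) for v in incomes]    # natural log, defined for income > 0

print("agec:", agec)
print("age2c:", age2c)
print("loginc:", [round(v, 3) for v in loginc])
```

Centering before squaring keeps the linear and squared age terms from being strongly correlated with each other, which is consistent with the low variance inflation factors reported for both age terms.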

8. Multicollinearity

Table 4: Multicollinearity

                           Tolerance    Variance Inflation Factor
Age of child               0.91         1.10
Age of child squared       0.93         1.08
Log of family income       0.74         1.35
Low birth weight status    0.87         1.15
WIC nutrition program      0.76         1.31
Parenting practices        0.71         1.40
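Tolerance and the variance inflation factor carry the same information, because VIF = 1 / tolerance. The sketch below recomputes the VIF column from the unrounded tolerances in the appendix output and applies the usual cutoffs (VIF >= 10, tolerance <= 0.10). The dictionary layout and variable names are illustrative.

```python
# Unrounded tolerances from the REG output in the appendix.
tolerances = {
    "agec": 0.90836,
    "age2c": 0.92556,
    "loginc": 0.73945,
    "bthwht": 0.87024,
    "WICpreg": 0.76261,
    "HOME97": 0.71323,
}

# VIF is the reciprocal of tolerance.
vifs = {name: 1.0 / tol for name, tol in tolerances.items()}

# Flag predictors that meet either conventional cutoff.
flagged = [name for name, tol in tolerances.items()
           if tol <= 0.10 or vifs[name] >= 10]

for name in tolerances:
    print(f"{name:8s} tolerance={tolerances[name]:.2f} VIF={vifs[name]:.2f}")
print("flagged:", flagged)
```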

Multicollinearity is a statistical phenomenon in which two or more predictor variables in a multiple regression model are highly correlated. It is a serious problem because it makes parameter estimates unstable and inflates their standard errors. Tolerance and the variance inflation factor (VIF) are used to test for multicollinearity. A VIF >= 10 or a tolerance <= .10 indicates serious multicollinearity. In the table above, the variance inflation factor of each variable is close to 1 (at most 1.40) and the tolerance of each variable is well above 0.10 (at least 0.71), which means that there is no reason to suspect multicollinearity.

9. New multiple regression models after diagnostic

Table 5. Multiple Regression Models

                           Model 1            Model 2            Model 3            Model 4
                           (with outliers)    (discrepancy)      (leverage)         (influence)
Independent variables      B (SE)             B (SE)             B (SE)             B (SE)
Child's age                6.89* (0.06)       6.94* (0.06)       6.97* (0.06)       7.05* (0.06)
Child's age squared        -0.09* (0.02)      -0.08* (0.02)      -0.07* (0.03)      -0.06* (0.02)
Log family income          0.63* (0.17)       0.51* (0.16)       1.38* (0.24)       1.06* (0.18)
Low birth weight status    -1.60* (0.39)      -1.62* (0.37)      -1.54* (0.39)      -1.51* (0.33)
WIC program                -2.26* (0.41)      -2.26* (0.39)      -1.85* (0.43)      -1.71* (0.37)
Parenting practice         0.67* (0.07)       0.70* (0.07)       0.56* (0.07)       0.56* (0.06)
N                          2042               2023               1985               1936
R-squared                  0.8701             0.8836             0.8731             0.9067

* p<.05

The new models after the diagnostics are shown above. From Model 1, we conclude that, controlling for the other variables, each one-unit increase in log family income is associated with a 0.63-point increase in the child's raw math score; the raw math score of low birth weight children is on average 1.60 points (unstandardized coefficient) lower than that of their counterparts; the raw math score of children in the WIC program is on average 2.26 points (unstandardized coefficient) lower than that of their counterparts; and a one-unit change in parenting practice is associated with a 0.67-point change in math test raw score. As discussed before, age has a nonlinear relationship with math raw score. The R-squared of this model is 0.8701, which means that 87.01% of the variance in math raw score is explained by the model.

Apart from the model with outliers, we also built three models with outliers removed.

The second model defines outliers in terms of discrepancy (standardized residuals). Discrepancy is the distance between the predicted and observed values. The distribution of the standardized residuals shows which observations have large differences between predicted and observed values and consequently might be outliers. Observations with standardized residuals lower than -3 or higher than 3 are regarded as potential outliers and are removed from the model. The second model indicates that, controlling for the other variables, each one-unit increase in log family income is associated with a 0.51-point increase in the child's raw math score; the raw math score of low birth weight children is on average 1.62 points (unstandardized coefficient) lower than that of their counterparts; the raw math score of children in the WIC program is on average 2.26 points (unstandardized coefficient) lower than that of their counterparts; and a one-unit change in parenting practice is associated with a 0.70-point change in math test raw score. Age has a nonlinear relationship with math raw score. The R-squared of this model is 0.8836, which means that 88.36% of the variance is explained by the model.

The third model defines outliers in terms of leverage. Leverage measures how far an observation's values on the independent variables deviate from their means. High-leverage points can affect the estimates of the regression coefficients. Generally, a point with leverage greater than (2k+2)/n should be carefully examined, where k is the number of predictors and n is the number of observations. In our example this works out to (2*6+2)/2042 = 0.00686, so points with leverage greater than 0.00686 are removed. The third model indicates that, controlling for the other variables, each one-unit increase in log family income is associated with a 1.38-point increase in the child's raw math score; the raw math score of low birth weight children is on average 1.54 points (unstandardized coefficient) lower than that of their counterparts; the raw math score of children in the WIC program is on average 1.85 points (unstandardized coefficient) lower than that of their counterparts; and a one-unit change in parenting practice is associated with a 0.56-point change in math test raw score. Age has a nonlinear relationship with math raw score. The R-squared of this model is 0.8731, which means that 87.31% of the variance is explained by the model.

The final model is a regression model without outliers defined in terms of influence. Cook's D is used to assess influence, and observations with a Cook's D above 4/n are removed from the model. In our example the sample size is 2042, so observations with a Cook's D above 0.0020 are removed from the model.
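The three outlier cutoffs described above are simple functions of the number of predictors k and the sample size n. The sketch below reproduces the thresholds for k = 6 and n = 2042; the function names are hypothetical, and the diagnostic values themselves (standardized residuals, leverage, Cook's D) are assumed to come from the regression software.

```python
def leverage_cutoff(k, n):
    """Rule-of-thumb leverage cutoff (2k + 2) / n."""
    return (2 * k + 2) / n

def cooks_d_cutoff(n):
    """Rule-of-thumb Cook's D cutoff 4 / n."""
    return 4 / n

def is_discrepancy_outlier(std_residual, limit=3.0):
    """True when a standardized residual lies outside [-limit, limit]."""
    return abs(std_residual) > limit

k, n = 6, 2042
print("leverage cutoff:", round(leverage_cutoff(k, n), 5))
print("Cook's D cutoff:", round(cooks_d_cutoff(n), 4))
print("residual of -3.4 flagged:", is_discrepancy_outlier(-3.4))
```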
The final model indicates that, controlling for the other variables, each one-unit increase in log family income is associated with a 1.06-point increase in the child's raw math score; the raw math score of low birth weight children is on average 1.51 points (unstandardized coefficient) lower than that of their counterparts; the raw math score of children in the WIC program is on average 1.71 points (unstandardized coefficient) lower than that of their counterparts; and a one-unit change in parenting practice is associated with a 0.56-point change in math test raw score. Age has a nonlinear relationship with math raw score. The R-squared of this model is 0.9067, which means that 90.67% of the variance is explained by the model.

10. Conclusion

The analysis above indicates that variables such as family income violate the assumptions of the original linear model. A new model was built by transforming these variables, and the assumptions are better satisfied after the transformation. The added variable plot shows that parenting practice is an omitted relevant variable, which would result in omitted variable bias if it were not included. The new model also suggests that participation in the WIC program is negatively associated with children's math achievement.

Appendix
The MEANS Procedure

Variable     Label                  N       Mean          Std Dev       Minimum       Maximum
mathraw97                           2211    36.3265491    22.2712427    0             98.0000000
AGE97        AGE OF CHILD 97        2223    7.4669366     2.9318174     3.0000000     13.0000000
faminc97     TOTAL FAMILY INCOME    3563    49841.25      49751.07      -72296.26     784610.59
HOME97       FULL HOME SCALE 97     3563    18.9230143    3.6229133     7.0000000     27.0000000

The SAS System    18:33 Thursday, March 17, 2011

The FREQ Procedure

WIC PROGRAM-PREGNANT 97
WICpreg    Frequency    Percent    Cumulative Frequency    Cumulative Percent
0          1882         56.65      1882                    56.65
1          1440         43.35      3322                    100.00
Frequency Missing = 241

The FREQ Procedure

BIRTH WEIGHT OF THIS INDIVIDUAL
bthwht     Frequency    Percent    Cumulative Frequency    Cumulative Percent
0          2180         61.18      2180                    61.18
1          1383         38.82      3563                    100.00

The REG Procedure
Model: MODEL1
Dependent Variable: mathraw97

Number of Observations Read                   3563
Number of Observations Used                   2042
Number of Observations with Missing Values    1521

Analysis of Variance
Source             DF      Sum of Squares    Mean Square    F Value    Pr > F
Model              4       876792            219198         3257.12    <.0001
Error              2037    137086            67.29802
Corrected Total    2041    1013878

Root MSE          8.20354     R-Square    0.8648
Dependent Mean    35.82860    Adj R-Sq    0.8645
Coeff Var         22.89662

Parameter Estimates
Variable     Label                              DF    Parameter Estimate    Standard Error    t Value    Pr > |t|    Standardized Estimate
Intercept    Intercept                          1     -15.54787             0.58633           -26.52     <.0001      0
WICpreg      WIC PROGRAM-PREGNANT 97            1     -3.11256              0.40231           -7.74      <.0001      -0.06887
AGE97        AGE OF CHILD 97                    1     7.01318               0.06394           109.68     <.0001      0.91948
faminc97     TOTAL FAMILY INCOME                1     0.00003238            0.00000371        8.73       <.0001      0.07724
bthwht       BIRTH WEIGHT OF THIS INDIVIDUAL    1     -2.14889              0.38263           -5.62      <.0001      -0.04736

The REG Procedure
Model: MODEL2
Dependent Variable: mathraw97

Number of Observations Read                   3563
Number of Observations Used                   2042
Number of Observations with Missing Values    1521

Analysis of Variance
Source             DF      Sum of Squares    Mean Square    F Value    Pr > F
Model              5       882864            176573         2743.99    <.0001
Error              2036    131014            64.34884
Corrected Total    2041    1013878

Root MSE          8.02177     R-Square    0.8708
Dependent Mean    35.82860    Adj R-Sq    0.8705
Coeff Var         22.38930

Parameter Estimates
Variable     Label                              DF    Parameter Estimate    Standard Error    t Value    Pr > |t|    Standardized Estimate
Intercept    Intercept                          1     -27.89046             1.39399           -20.01     <.0001      0
WICpreg      WIC PROGRAM-PREGNANT 97            1     -1.99849              0.40977           -4.88      <.0001      -0.04422
AGE97        AGE OF CHILD 97                    1     6.89320               0.06373           108.16     <.0001      0.90375
faminc97     TOTAL FAMILY INCOME                1     0.00002210            0.00000378        5.85       <.0001      0.05273
bthwht       BIRTH WEIGHT OF THIS INDIVIDUAL    1     -1.86583              0.37528           -4.97      <.0001      -0.04112
HOME97       FULL HOME SCALE 97                 1     0.65323               0.06725           9.71       <.0001      0.08992

Multiple Regression

The SAS System    11:57 Sunday, March 20, 2011

The REG Procedure
Model: MODEL1
Dependent Variable: mathraw97

Number of Observations Read                   3563
Number of Observations Used                   2042
Number of Observations with Missing Values    1521

Analysis of Variance
Source             DF      Sum of Squares    Mean Square    F Value    Pr > F
Model              4       876792            219198         3257.12    <.0001
Error              2037    137086            67.29802
Corrected Total    2041    1013878

Root MSE          8.20354     R-Square    0.8648
Dependent Mean    35.82860    Adj R-Sq    0.8645
Coeff Var         22.89662

Parameter Estimates
Variable     Label                              DF    Parameter Estimate    Standard Error    t Value    Pr > |t|    Standardized Estimate
Intercept    Intercept                          1     -15.54787             0.58633           -26.52     <.0001      0
AGE97        AGE OF CHILD 97                    1     7.01318               0.06394           109.68     <.0001      0.91948
faminc97     TOTAL FAMILY INCOME                1     0.00003238            0.00000371        8.73       <.0001      0.07724
bthwht       BIRTH WEIGHT OF THIS INDIVIDUAL    1     -2.14889              0.38263           -5.62      <.0001      -0.04736
WICpreg      WIC PROGRAM-PREGNANT 97            1     -3.11256              0.40231           -7.74      <.0001      -0.06887

Correlations

The REG Procedure
Model: MODEL1
Dependent Variable: mathraw97

Number of Observations Read                   3563
Number of Observations Used                   2042
Number of Observations with Missing Values    1521

Analysis of Variance
Source             DF      Sum of Squares    Mean Square    F Value    Pr > F
Model              4       876792            219198         3257.12    <.0001
Error              2037    137086            67.29802
Corrected Total    2041    1013878

Root MSE          8.20354     R-Square    0.8648
Dependent Mean    35.82860    Adj R-Sq    0.8645
Coeff Var         22.89662

Parameter Estimates
Variable     Label                              DF    Parameter Estimate    Standard Error    t Value    Pr > |t|    Standardized Estimate    Squared Semi-partial Corr Type II    Squared Partial Corr Type II
Intercept    Intercept                          1     -15.54787             0.58633           -26.52     <.0001      0                        .                                    .
AGE97        AGE OF CHILD 97                    1     7.01318               0.06394           109.68     <.0001      0.91948                  0.79854                              0.85520
bthwht       BIRTH WEIGHT OF THIS INDIVIDUAL    1     -2.14889              0.38263           -5.62      <.0001      -0.04736                 0.00209                              0.01525
faminc97     TOTAL FAMILY INCOME                1     0.00003238            0.00000371        8.73       <.0001      0.07724                  0.00506                              0.03609
WICpreg      WIC PROGRAM-PREGNANT 97            1     -3.11256              0.40231           -7.74      <.0001      -0.06887                 0.00397

The REG Procedure
Model: MODEL2
Dependent Variable: mathraw97

Number of Observations Read                   3563
Number of Observations Used                   2042
Number of Observations with Missing Values    1521

Analysis of Variance
Source             DF      Sum of Squares    Mean Square    F Value    Pr > F
Model              5       882864            176573         2743.99    <.0001
Error              2036    131014            64.34884
Corrected Total    2041    1013878

Root MSE          8.02177     R-Square    0.8708
Dependent Mean    35.82860    Adj R-Sq    0.8705
Coeff Var         22.38930

Parameter Estimates
Variable     Label                              DF    Parameter Estimate    Standard Error    t Value    Pr > |t|    Standardized Estimate    Squared Semi-partial Corr Type II    Squared Partial Corr Type II
Intercept    Intercept                          1     -27.89046             1.39399           -20.01     <.0001      0                        .                                    .
AGE97        AGE OF CHILD 97                    1     6.89320               0.06373           108.16     <.0001      0.90375                  0.74247                              0.85176
bthwht       BIRTH WEIGHT OF THIS INDIVIDUAL    1     -1.86583              0.37528           -4.97      <.0001      -0.04112                 0.00157                              0.01200
faminc97     TOTAL FAMILY INCOME                1     0.00002210            0.00000378        5.85       <.0001      0.05273                  0.00217                              0.01654
WICpreg      WIC PROGRAM-PREGNANT 97            1     -1.99849              0.40977           -4.88      <.0001      -0.04422                 0.00151                              0.01155
HOME97       FULL HOME SCALE 97                 1     0.65323               0.06725           9.71       <.0001      0.08992                  0.00599                              0.04429

The SAS System    10:26 Wednesday, March 23, 2011

The CORR Procedure

5 Variables: mathraw97 AGE97 faminc97 bthwht WICpreg

Simple Statistics
Variable     N       Mean       Std Dev     Sum          Minimum    Maximum     Label
mathraw97    2211    36.32655   22.27124    80318        0          98.00000
AGE97        2223    7.46694    2.93182     16599        3.00000    13.00000    AGE OF CHILD 97
faminc97     3563    49841      49751       177584390    -72296     784611      TOTAL FAMILY INCOME
bthwht       3563    0.38816    0.48740     1383         0          1.00000     BIRTH WEIGHT OF THIS INDIVIDUAL
WICpreg      3322    0.43347    0.49563     1440         0          1.00000     WIC PROGRAM-PREGNANT 97

Pearson Correlation Coefficients
Prob > |r| under H0: Rho=0
Number of Observations

                                     mathraw97    AGE97       faminc97    bthwht      WICpreg
mathraw97                            1.00000      0.91702     0.15325     0.13407     -0.18295
                                                  <.0001      <.0001      <.0001      <.0001
                                     2211         2211        2211        2211        2042

AGE97                                0.91702      1.00000     0.05013     0.21580     -0.08985
AGE OF CHILD 97                      <.0001                   0.0181      <.0001      <.0001
                                     2211         2223        2223        2223        2053

faminc97                             0.15325      0.05013     1.00000     -0.10109    -0.39297
TOTAL FAMILY INCOME                  <.0001       0.0181                  <.0001      <.0001
                                     2211         2223        3563        3563        3322

bthwht                               0.13407      0.21580     -0.10109    1.00000     0.10401
BIRTH WEIGHT OF THIS INDIVIDUAL      <.0001       <.0001      <.0001                  <.0001
                                     2211         2223        3563        3563        3322

WICpreg                              -0.18295     -0.08985    -0.39297    0.10401     1.00000
WIC PROGRAM-PREGNANT 97              <.0001       <.0001      <.0001      <.0001
                                     2042         2053        3322        3322        3322

Graph. Linearity (age)

Graph 2. Linearity between mathraw and faminc

Graph 3. Predicted value by mathraw

Graph 4. Homoscedasticity 2

Graph 5. Homoscedasticity Faminc

Graph 6. Normality for residuals

Linearity:
The REG Procedure
Model: MODEL1
Dependent Variable: mathraw97

Number of Observations Read                   3563
Number of Observations Used                   2042
Number of Observations with Missing Values    1521

Analysis of Variance
Source             DF      Sum of Squares    Mean Square    F Value    Pr > F
Model              5       876154            175231         2590.47    <.0001
Error              2036    137724            67.64431
Corrected Total    2041    1013878

Root MSE          8.22462     R-Square    0.8642
Dependent Mean    35.82860    Adj R-Sq    0.8638
Coeff Var         22.95545

Parameter Estimates
Variable     Label                              DF    Parameter Estimate    Standard Error    t Value    Pr > |t|    Standardized Estimate
Intercept    Intercept                          1     27.02422              1.85308           14.58      <.0001      0
agec                                            1     7.01213               0.06412           109.37     <.0001      0.91935
age2c                                           1     -0.10530              0.02528           -4.17      <.0001      -0.03524
loginc                                          1     1.18155               0.16788           7.04       <.0001      0.06316
bthwht       BIRTH WEIGHT OF THIS INDIVIDUAL    1     -1.81405              0.39663           -4.57      <.0001      -0.03998
WICpreg      WIC PROGRAM-PREGNANT 97            1     -3.28529              0.40871           -8.04      <.0001      -0.07269

Graph 7

2) Homoscedasticity:

Graph 8

Graph 9

OMITTED VARIABLE BIAS


The REG Procedure
Model: MODEL1
Dependent Variable: HOME97 FULL HOME SCALE 97

Number of Observations Read                   3563
Number of Observations Used                   2053
Number of Observations with Missing Values    1510

Analysis of Variance
Source             DF      Sum of Squares    Mean Square    F Value    Pr > F
Model              5       5593.24343        1118.64869     166.53     <.0001
Error              2047    13750             6.71723
Corrected Total    2052    19343

Root MSE          2.59176     R-Square    0.2892
Dependent Mean    20.17540    Adj R-Sq    0.2874
Coeff Var         12.84615

Parameter Estimates
Variable     Label                              DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept    Intercept                          1     12.54808              0.58263           21.54      <.0001
agec                                            1     0.18212               0.02012           9.05       <.0001
age2c                                           1     -0.03023              0.00793           -3.81      0.0001
loginc                                          1     0.83258               0.05278           15.78      <.0001
bthwht       BIRTH WEIGHT OF THIS INDIVIDUAL    1     -0.32716              0.12458           -2.63      0.0087
WICpreg      WIC PROGRAM-PREGNANT 97            1     -1.53162              0.12853           -11.92     <.0001

Graph 10

Multicollinearity and Outliers 1)


The REG Procedure
Model: MODEL1
Dependent Variable: mathraw97

Number of Observations Read                   3563
Number of Observations Used                   2042
Number of Observations with Missing Values    1521

Analysis of Variance
Source             DF      Sum of Squares    Mean Square    F Value    Pr > F
Model              6       882222            147037         2272.75    <.0001
Error              2035    131656            64.69563
Corrected Total    2041    1013878

Root MSE          8.04336     R-Square    0.8701
Dependent Mean    35.82860    Adj R-Sq    0.8698
Coeff Var         22.44955

Parameter Estimates
Variable     Label                              DF    Parameter Estimate    Standard Error    t Value    Pr > |t|    Standardized Estimate    Tolerance    Variance Inflation
Intercept    Intercept                          1     18.64701              2.00809           9.29       <.0001      0                        .            0
agec                                            1     6.89157               0.06393           107.80     <.0001      0.90354                  0.90836      1.10088
age2c                                           1     -0.08536              0.02481           -3.44      0.0006      -0.02857                 0.92556      1.08042
loginc                                          1     0.62996               0.17378           3.63       0.0003      0.03367                  0.73945      1.35235
bthwht       BIRTH WEIGHT OF THIS INDIVIDUAL    1     -1.59559              0.38854           -4.11      <.0001      -0.03516                 0.87024      1.14911
WICpreg      WIC PROGRAM-PREGNANT 97            1     -2.26226              0.41342           -5.47      <.0001      -0.05005                 0.76261      1.31128
HOME97       FULL HOME SCALE 97                 1     0.66547               0.06871           9.68       <.0001      0.09161                  0.71323      1.40208


Regression model w/ outliers:

The REG Procedure
Model: MODEL1
Dependent Variable: mathraw97

Number of Observations Read                   3563
Number of Observations Used                   2042
Number of Observations with Missing Values    1521

Analysis of Variance
Source             DF      Sum of Squares    Mean Square    F Value    Pr > F
Model              6       882222            147037         2272.75    <.0001
Error              2035    131656            64.69563
Corrected Total    2041    1013878

Root MSE          8.04336     R-Square    0.8701
Dependent Mean    35.82860    Adj R-Sq    0.8698
Coeff Var         22.44955

Parameter Estimates
Variable     Label                              DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept    Intercept                          1     18.64701              2.00809           9.29       <.0001
agec                                            1     6.89157               0.06393           107.80     <.0001
age2c                                           1     -0.08536              0.02481           -3.44      0.0006
loginc                                          1     0.62996               0.17378           3.63       0.0003
bthwht       BIRTH WEIGHT OF THIS INDIVIDUAL    1     -1.59559              0.38854           -4.11      <.0001
WICpreg      WIC PROGRAM-PREGNANT 97            1     -2.26226              0.41342           -5.47      <.0001
HOME97       FULL HOME SCALE 97                 1     0.66547               0.06871           9.68       <.0001

Regression model without outliers defined in terms of discrepancy (studentized residuals);

The REG Procedure
Model: MODEL1
Dependent Variable: mathraw97

Number of Observations Read                   3544
Number of Observations Used                   2023
Number of Observations with Missing Values    1521

Analysis of Variance
Source             DF      Sum of Squares    Mean Square    F Value    Pr > F
Model              6       880992            146832         2549.99    <.0001
Error              2016    116084            57.58141
Corrected Total    2022    997076

Root MSE          7.58824     R-Square    0.8836
Dependent Mean    35.71478    Adj R-Sq    0.8832
Coeff Var         21.24678

Parameter Estimates
Variable     Label                              DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept    Intercept                          1     19.27959              1.90351           10.13      <.0001
agec                                            1     6.93544               0.06071           114.24     <.0001
age2c                                           1     -0.07684              0.02353           -3.27      0.0011
loginc                                          1     0.51045               0.16495           3.09       0.0020
bthwht       BIRTH WEIGHT OF THIS INDIVIDUAL    1     -1.61804              0.36813           -4.40      <.0001
WICpreg      WIC PROGRAM-PREGNANT 97            1     -2.26195              0.39203           -5.77      <.0001
HOME97       FULL HOME SCALE 97                 1     0.69831               0.06511           10.73      <.0001

* Regression model without outliers defined in terms of LEVERAGE;

The SAS System    01:42 Wednesday, March 23, 2011

The REG Procedure
Model: MODEL1
Dependent Variable: mathraw97

Number of Observations Read                   3506
Number of Observations Used                   1985
Number of Observations with Missing Values    1521

Analysis of Variance
Source             DF      Sum of Squares    Mean Square    F Value    Pr > F
Model              6       852610            142102         2268.35    <.0001
Error              1978    123913            62.64539
Corrected Total    1984    976522

Root MSE          7.91488     R-Square    0.8731
Dependent Mean    35.79950    Adj R-Sq    0.8727
Coeff Var         22.10893

Parameter Estimates
Variable     Label                              DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept    Intercept                          1     12.72161              2.52431           5.04       <.0001
agec                                            1     6.96853               0.06452           108.00     <.0001
age2c                                           1     -0.07290              0.02541           -2.87      0.0042
loginc                                          1     1.38018               0.23938           5.77       <.0001
bthwht       BIRTH WEIGHT OF THIS INDIVIDUAL    1     -1.53548              0.38668           -3.97      <.0001
WICpreg      WIC PROGRAM-PREGNANT 97            1     -1.84535              0.42952           -4.30      <.0001
HOME97       FULL HOME SCALE 97                 1     0.55511               0.07433           7.47       <.0001

* Regression model without outliers defined in terms of Influence (Cook's d);

The REG Procedure
Model: MODEL1
Dependent Variable: mathraw97

Number of Observations Read                   3457
Number of Observations Used                   1936
Number of Observations with Missing Values    1521

Analysis of Variance
Source             DF      Sum of Squares    Mean Square    F Value    Pr > F
Model              6       844913            140819         3125.37    <.0001
Error              1929    86914             45.05674
Corrected Total    1935    931827

Root MSE          6.71243     R-Square    0.9067
Dependent Mean    35.06715    Adj R-Sq    0.9064
Coeff Var         19.14165

Parameter Estimates
Variable     Label                              DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept    Intercept                          1     16.05139              1.98502           8.09       <.0001
agec                                            1     7.05032               0.05570           126.59     <.0001
age2c                                           1     -0.05996              0.02162           -2.77      0.0056
loginc                                          1     1.05519               0.18295           5.77       <.0001
bthwht       BIRTH WEIGHT OF THIS INDIVIDUAL    1     -1.51037              0.33201           -4.55      <.0001
WICpreg      WIC PROGRAM-PREGNANT 97            1     -1.70965              0.36580           -4.67      <.0001
HOME97       FULL HOME SCALE 97                 1     0.56232               0.06089           9.24       <.0001