Lidia Lentz
12/07/2020
PREDITING LITERACY PROFICIENCY 2
Predicting Literacy Proficiency Level from Age, Sex, Previous Year Preschool Enrollment, and
Race
Kindergarten standards for literacy and mathematics have risen over the years,
placing higher expectations on children. The percentage of children enrolled in preschool
and pre-kindergarten had risen only slightly since the previous census, to 66% of four-year-olds
and 43% of three-year-olds (Yoshikawa et al., 2016). Low enrollment is surprising,
considering the importance of early education that research has shown over the years. This trend
raises the question: why are enrollment rates low? What are the barriers and implications
Literature Review
education before four years old, and pre-kindergarten, a structured setting based on education in the
school year before kindergarten. Early education has shifted from play time to a structured
learning environment focused on all aspects of being a learner. Bassok et al. (2016) found that
kindergarten expectations from 1998 to 2010 increased in rigor over five domains: literacy,
language, mathematics, social behavior, and functional behavior. Since the bar is set higher
for children entering kindergarten, research to determine the effects of early education and its
Literacy is a focus of this research and can be very complex because of the numerous skill
sets in this category. Literacy is a combination of different reading and language skills,
knowledge, and invented spelling (Slutzky & Debruin-Parecki, 2019). Kindergarten students are
expected to identify sight words, read pattern texts, and understand story elements and the
differences between genres. Since the expectations for entering kindergarten students are
significantly higher in literacy than in years past, building necessary reading skills is vital to
Children across America come from different racial groups and family dynamics. Family
dynamics can play a part in whether a child attends preschool and in the type of preschool attended. Past studies
identified race as a predictor of which students struggle. Bowdon et al. (2019) found
that Hispanic and Black students were 28 and 24 days behind White students, which is significant.
Race was a common predictor of achievement across studies. Gender, on the
other hand, was not a predictor of academic achievement (St. Clair-Christman et al., 2011).
Finding predictors and correlations across multiple predictors will help identify areas of
focus for early elementary stakeholders. Previous research shows some compelling results about
possible predictors such as race, gender, and preschool enrollment (St. Clair-Christman et al.,
2011; Morrow, 2005). The study does not focus on the years before pre-kindergarten, ages 0-3,
or on the impact of childcare during that time on later academic achievement. This study takes
all of these variables into consideration and examines whether they are significantly related to literacy proficiency.
Purpose statement
The purpose of this data analysis is to investigate possible predictors of Spring
literacy proficiency level: age, sex, race, and previous-year preschool enrollment.
Research Questions
1. Does age significantly predict Spring literacy proficiency level for children entering
kindergarten?
2. After controlling for age, do sex and last year's preschool enrollment significantly
predict Spring literacy proficiency level for children entering kindergarten?
3. After controlling for age, sex, and previous year preschool enrollment, does race
significantly predict Spring literacy proficiency level for children entering kindergarten?
Methodology
Dataset
The dataset used in this study combines the datasets of two studies: the
Multi-State Study of Pre-Kindergarten and the State-Wide Early Education
Programs (SWEEP) study, which together collected information across 11 states focusing on early childhood
education. The participants were children, 4 to 6 years old, and early education teachers. In total,
721 classrooms and 2,982 pre-kindergarten children participated (Early et al., 2013).
The Multi-State Study of Pre-Kindergarten was conducted in the 2001-2002 school year
in six states that had a strong focus on, and initiatives for, early education. The
sample was a stratified random sample of 40 centers or schools from a selected list. Data
were collected from participating teachers and families. Included students were entering
kindergarten the next year, did not qualify for an IEP, and understood English or Spanish
The State-Wide Early Education Programs (SWEEP) study was conducted in the 2003-2004
school year in five states. These states were different from those in the Multi-State Study of
Pre-Kindergarten, in order to represent states that use other initiatives and funding models.
State-funded pre-kindergarten sites were selected at random. From the lists the states
provided, 465 sites participated, and only two discontinued in the spring. As in the previous study, data were
collected from teachers and families. Eligible participants were selected using the same criteria as in the previous study (Early
et al., 2013). Data collected for both studies included demographic information.
Variables of Interest
The full dataset 2, renamed MSS_SWEEP, was imported into the R statistical software
package for analysis. Dataset 1 was not used because it contained the teachers' information and
classroom observations, which were not variables of interest. Missing values are appropriately
specified for all variables used in data analysis. A data frame was created omitting missing
Outcome Variable
The dependent variable (outcome) is a composite score, the mean of five items
related to literacy proficiency that teachers evaluated on a rating scale. The respondents were
asked to rate the participants on a scale from 1 to 5, where 1=not yet, 2=beginning, 3=In
(CSLANG2 through CSLANG6) included computing a composite score (Proflitskill) for each
participant. "Proflitskill" is a quantitative ratio variable. The composite scale score items rated
participants' comprehension, letter identification, phonological skills, prediction skills, and early
Predictor Variables
predictors in this data analysis. The predictors used are age (ASMTAGEPS), age in years;
Age was converted to a numeric variable and named ageR. Sex, last year's preschool enrollment,
and race were converted to factors and assigned labels. The sex variable has two levels, 1 = "Male"
and 2 = "Female", and was renamed sexR. Last year's preschool enrollment has two levels, 1 = "No"
and 2 = "Yes", and was named prekR. prekR is releveled so that the reference category
is 'Yes' (prekR1). Releveling is appropriate since the variable measures whether the child went to school
the previous year. The race variable has six levels, 1 = Latino, 2 = African American, 3 = Native
American, 4 = Asian, 5 = White, 6 = Multiracial, and was renamed raceR. raceR is releveled, so
statistics and assess the distributions using the data frame, MSregdata1cc. Cronbach's alpha was
calculated for the composite score, Proflitskill, to check reliability across the items. A Q-Q
plot was used to check for normality of the distribution. Multiple linear regression was used for
this analysis. Additionally, the Shapiro-Wilk test for normality and a test for homogeneity of variance were
computed.
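The distribution checks named above can be sketched in base R. This is a minimal illustration on simulated data, not the study's data: `scores` is a hypothetical stand-in for Proflitskill, and its right-skewed shape is an assumption chosen only to show what a rejected null hypothesis looks like.

```r
# Minimal sketch of the normality checks, on simulated (not study) data.
set.seed(1)

# Hypothetical stand-in for a composite score: right-skewed values capped at 5.
scores <- pmin(1 + rexp(300, rate = 1), 5)

# Shapiro-Wilk test: the null hypothesis is that the data come from a normal distribution.
sw <- shapiro.test(scores)

# Q-Q coordinates without drawing the plot; the closer the points lie to a
# straight line, the closer the sample is to a normal distribution.
qq <- qqnorm(scores, plot.it = FALSE)
qq_cor <- cor(qq$x, qq$y)

cat("W =", round(sw$statistic, 3), " p =", format.pval(sw$p.value), "\n")
cat("Q-Q correlation =", round(qq_cor, 3), "\n")
```

A small p-value here leads to rejecting normality, which is the same decision rule applied to Proflitskill in the Results.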
Results
Descriptive Statistics
Descriptive statistics were computed for each variable. Figure 1 shows the visual
representation of each predictor variable (Early et al., 2013). Skewness for ageR is 0, which
indicates a symmetric distribution. raceR (-0.22) and sexR (-0.02) are slightly negatively
skewed (skewed to the left), and prekR (-1.04) is more strongly negatively skewed. Negative kurtosis
for all variables indicates platykurtic distributions: ageR, -0.85; raceR, -1.70; sexR, -2.00;
and prekR, -0.92.
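For a two-level variable, skewness and kurtosis are determined entirely by the split between the two categories. The sketch below illustrates this with moment-based formulas. The helper names and both toy vectors are hypothetical, the 27%/73% split is an assumption chosen because it roughly reproduces the values reported for prekR, and packages may use slightly different estimators.

```r
# Moment-based skewness and excess kurtosis (population moments; an assumption,
# since statistical packages differ slightly in the estimator they use).
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}
excess_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

# Hypothetical two-level variable coded 1/2 with an even split.
balanced <- rep(c(1, 2), times = 500)
# Hypothetical two-level variable with an assumed 27% / 73% split.
unbalanced <- rep(c(1, 2), times = c(270, 730))

cat("balanced:   skew =", skewness(balanced),
    " kurtosis =", excess_kurtosis(balanced), "\n")
cat("unbalanced: skew =", round(skewness(unbalanced), 2),
    " kurtosis =", round(excess_kurtosis(unbalanced), 2), "\n")
```

An even split gives an excess kurtosis of exactly -2, which is consistent with the value reported for sexR; for categorical predictors these statistics describe the category balance rather than the shape of a distribution.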
Figure 1
Descriptive statistics for each variable in the subset.
Cronbach's alpha was computed to determine reliability across the items in the
composite scale score of Proflitskill; the item set is named Proflitskillitems. Raw alpha = 0.88
(based on covariances) and standardized alpha = 0.88 (based on correlations). The value of alpha
indicates adequate reliability. Removal of an item would not significantly increase alpha, so all
items remained in the composite score. Descriptive statistics were computed, and a
histogram was constructed (Figure 2). The computed results show that a negative skewness
statistic indicates a "left-skewed" distribution, and a slightly positive kurtosis statistic indicates a
A Q-Q plot was computed for Proflitskill, and the results showed linearity in this plot,
which indicated normal distribution. Descriptive data were calculated, with plots shown in
Figure 2. The positive skewness statistic indicates a "right-skewed"
distribution, and the negative kurtosis statistic indicates a somewhat "flattened" distribution.
Values of skew.2SE and kurt.2SE were more extreme than ±1.0, which is evidence of
The Shapiro-Wilk test for normality was conducted. The descriptive results showed a 95% confidence
interval for the mean of 2.438 ± 0.039 and a coefficient of variation = SD/mean = 0.997/2.438 =
0.409. The null hypothesis was that the data come from a normal distribution. The null hypothesis is
rejected, W = 0.952, p < .001. There is a statistically significant departure from
normality for the composite "Literacy Proficient skills" scores. The Hmisc descriptive output showed
an information statistic of 0.997; because this is close to 1, it suggests a high degree of continuity in
this variable.
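The coefficient of variation and the interval bounds follow directly from the reported mean, standard deviation, and half-width. A minimal sketch of that arithmetic, using only the values stated in the text (the variable names are illustrative):

```r
# Values as reported in the text.
m <- 2.438           # mean of Proflitskill
s <- 0.997           # standard deviation of Proflitskill
half_width <- 0.039  # reported half-width of the 95% confidence interval

# Coefficient of variation: standard deviation relative to the mean.
cv <- s / m

# Lower and upper bounds of the reported confidence interval.
ci <- c(lower = m - half_width, upper = m + half_width)

cat("CV =", round(cv, 3), "\n")
print(round(ci, 3))
```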
Figure 2
Inferential Statistics
Level” from age. The equation used was Proflitskill = b0 + b1(ageR). This was computed in RStudio:
Proflitskill = -0.63458 + 0.61172(ageR), R² = 0.03881. An F test was conducted to test the null
hypothesis: F(1, 2091) = 84.44, p < 2.2e-16; because p < .05, we reject the null hypothesis. It is a
formal test from the car package was computed to test the null hypothesis that the residuals have constant
variance. The test results showed χ2(1) = 7.432077, p = 0.0064071. Since p < .05, we reject the
null hypothesis of constant variance. Further, the residual plots and histogram in Figure 3 show that the
homoscedasticity assumption has not been met. The Shapiro-Wilk test for normality of residuals
rejected the null hypothesis, W = 0.9652, p < 2.2e-16, because p < .05.
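The logic of a score test for non-constant error variance can be sketched in base R. The example below uses simulated data in which the error spread is made to grow with age, so the variable names, coefficients, and the helper `ncv_score_test` are all hypothetical. The computation is understood to be the same quantity that ncvTest in the car package reports by default, but that is an assumption and has not been checked against the package source.

```r
# Score test for non-constant error variance, computed by hand in base R
# on simulated (not study) data.
set.seed(42)
n <- 500
age <- runif(n, 4, 6)                                  # hypothetical predictor
y <- 1 + 0.6 * age + rnorm(n, sd = 0.3 * (age - 3))    # error spread grows with age
fit <- lm(y ~ age)

ncv_score_test <- function(model) {
  e2 <- resid(model)^2
  u <- e2 / mean(e2)              # squared residuals scaled by the ML variance estimate
  aux <- lm(u ~ fitted(model))    # auxiliary regression on the fitted values
  reg_ss <- sum((fitted(aux) - mean(u))^2)
  chisq <- reg_ss / 2             # half the regression sum of squares, 1 df
  c(chisq = chisq, df = 1, p = pchisq(chisq, df = 1, lower.tail = FALSE))
}

res <- ncv_score_test(fit)
print(round(res, 4))
```

A p-value below .05 leads to rejecting the null hypothesis of constant variance, as in the result reported for the age-only model.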
Figure 3
Multiple linear regression predicting "Literacy Proficiency Level" from age, last year's
preschool enrollment, and sex was computed, R² = 0.08108; 8.1% of the variation in "Literacy
Proficiency Level" is explained by the full set of predictors. This is an increase in R² of .04227
over the age-only model.
An F test was computed to test the null hypothesis: F(3, 2089) = 61.44, p < 2.2e-16;
because p < .05, we reject the null hypothesis. This combination of three
predictors significantly predicts perceived Literacy Proficiency Level. The Shapiro-Wilk test for
normality of residuals rejected the null hypothesis at the .05 level (W = 0.9652; the p-value is
reported as 0.96681). The residuals are positively skewed with negative kurtosis, as can be seen
in the histogram in Figure 4. The non-constant variance score test gave χ2(1) = 6.880831,
p = 0.0087125. All VIF values were 1.0, which means there is no variance inflation due to
multicollinearity. Plots can be seen in Figure 4. The two models were compared with an F test
on the change in fit, F(2, 2089) = 48.041, p < 2.2e-16; because p < .05, we reject the null hypothesis.
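The F statistics for these models can be recovered from the reported R² values and degrees of freedom, which gives a quick consistency check. The sketch below uses only numbers reported in the text, taking three predictors and the residual degrees of freedom of 2089 from the model comparison; the helper names `f_overall` and `f_change` are illustrative and not part of the study's syntax.

```r
# Overall F test of H0: R^2 = 0, from R^2, the number of predictors k,
# and the residual degrees of freedom.
f_overall <- function(r2, k, df_resid) {
  (r2 / k) / ((1 - r2) / df_resid)
}

# F test for the change in R^2 when q predictors are added to a nested model.
f_change <- function(r2_full, r2_reduced, q, df_resid) {
  ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_resid)
}

r2_age  <- 0.03881   # age only
r2_full <- 0.08108   # age + preschool enrollment + sex

f_model1 <- f_overall(r2_age, k = 1, df_resid = 2091)
f_model2 <- f_overall(r2_full, k = 3, df_resid = 2089)
f_delta  <- f_change(r2_full, r2_age, q = 2, df_resid = 2089)

cat("R2 change =", r2_full - r2_age, "\n")
print(round(c(model1 = f_model1, model2 = f_model2, change = f_delta), 2))
```

The recomputed values agree with the reported 84.44, 61.44, and 48.041 to within rounding of the R² inputs.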
Figure 4
Scatterplot and residual plots of multiple regression model proflitskill ~ age + preschool + sex
A third linear regression model was computed with an additional predictor, race, a
nominal variable requiring dummy coding. The reference category for the dummy coding is 'White', so the model
compares White children to children from each of the other race categories. The model showed that
Native American, Asian, and Multiracial children do not differ significantly from White children in perceived Literacy
Proficiency Level, because their p-values are greater than .05. The coefficients for Latino children (p = 2.64e-12)
and Black children (p = 6.67e-05) are significant, so these categories predict lower
Literacy Proficiency Levels. In addition, age (b = 0.55119, p < 2e-16), pre-K enrollment (b = 0.39725,
p < 2e-16), and female sex (b = 0.20888, p = 2.63e-07) are statistically significant predictors of
The R² = 0.1354 for this model. The combined set of predictors explains 13.54% of the
for multicollinearity, the test of the null hypothesis, H0: R² = 0 in the population, found F(8,
2084) = 40.8, p < 2.2e-16, and because p < .05, we reject the null hypothesis. The non-constant
variance score test was computed, and all VIF statistics equaled one, which means there are no
concerns about multicollinearity among predictors. Models were compared; race is a statistically
Figure 5
Scatterplot and residual plots of multiple regression model proflitskill ~ age + preschool + sex
+ race
RQ1: Does age significantly predict Spring literacy proficiency level for children
entering kindergarten? There is a positive linear relationship between age and perceived Literacy
Proficiency Level. 3.88% of the variation in "Literacy Proficiency Level" is explained by age.
RQ2: After controlling for age, do sex and last year's preschool enrollment significantly
predict Spring literacy proficiency level for children entering kindergarten? Considered
individually, age (b1 = 0.59494, p < 2e-16), preschool enrollment ‘Yes’ (b2 = 0.38652, p =
2.76e-16), and female sex (b3 = 0.21363, p = 3.04e-07) are each statistically significant, positive
predictors of Literacy Proficiency Level. As age increases, the Literacy Proficiency
Level also increases. Females have a higher perceived Literacy Proficiency Level than
males. Children who went to preschool the previous year have higher perceived Literacy
Proficiency Levels than children attending other childcare types. After controlling for age, the
combined set of predictor variables (sex and preschool enrollment) is statistically significant
RQ3: After controlling for age, sex, and previous year preschool enrollment, does race
significantly predict Spring literacy proficiency level for children entering kindergarten? After
controlling for age, preschool enrollment, and sex, race is a statistically significant predictor of literacy
proficiency level.
The findings show that preschool enrollment for two years before kindergarten has an
impact on literacy achievement. Further funding and initiatives in early education can improve
academic achievement for young children. This research also supports the findings that early
entrance to kindergarten is not ideal because older age is a predictor of academic achievement.
Hispanic and black children are at risk for lower academic achievement and should be a focus
Limitations
One limitation of this data set is the possibility of error in data input. There was little
information about how the two sets of data were combined and processed. Additionally, there
is a substantial chance of error in data collection, since multiple teachers collected data
across 11 states in different studies. The guidelines for data collection were not exact, which
could cause discrepancies. The levels of literacy proficiency were vague and dependent on
peer performance. Since the rating scale for proficiency was based solely on a teacher's
perspective, these levels are subjective, shaped by the teacher's perceived expectations and the standards set in
their district or state. These studies were performed in different states that might have different
standards or initiatives. State funding for early education and for marginalized groups may also
differ.
Future Research
The questionnaire for this research asked whether a child attended preschool the previous year.
Since this was a significant predictor in the data, it is an area where further research is essential. Future
research should examine the consistency of preschool attendance from 3 years old to entering
kindergarten and the relationship of preschool quality and enrollment to academic achievement. Additionally, preschool
enrollment together with race, gender, and age should be examined for growth versus
achievement. Achievement does not assess the child's ability from the start, where growth will
References
Bassok, D., Latham, S., & Rorem, A. (2016). Is kindergarten the new first grade? AERA Open,
Bowdon, J., Dahlke, K., Yang, R., Pan, J., Marcus, J., & Lemieux, C. (2019). Children's
knowledge and skills at kindergarten entry in Illinois: Results from the first statewide
Early, D., Burchinal, M., Barbarin, O., Bryant, D., Chang, F., Clifford, R., . . . Barnett, W. S.
and study of state-wide early education programs (SWEEP). ICPSR Data Holdings.
doi:10.3886/icpsr34877.v1
Morrow, L. M. (2005). Language and literacy in preschools: Current issues and concern.
St. Clair-Christman, J., Buell, M., & Gamel-McCormick, M. (2011). Money matters for early
education: The relationships among childcare quality, teacher characteristics, and subsidy
Yoshikawa, H., Weiland, C., & Brooks-Gunn, J. (2016). When does preschool matter? The Future
Appendix A
1 Male
2 Female
ASK ALL
Q.20 Rate the student’s achievement in comparison to other students of the same grade
level. The examples do not exhaust all the ways that a child may demonstrate what he/she
knows or can do. This child (INSERT ITEM) is not yet, beginning, in progress,
intermediate, proficient, not applicable.
RESPONSE CATEGORIES:
1 Not yet
2 Beginning
3 In progress
4 Intermediate
5 Proficient
CHRACEP and CHATNDPRKP were survey items in the family questionnaire – The family
Appendix B
Appendix C
R Syntax
####load packages####
library(dplyr)
library(psych)
library(car)
library(mice)
library(MissMech)
library(imputeR)
library(naniar)
library(Hmisc)
library(ggplot2)
library(pastecs)
library(psych)
nrow(MSS_SWEEP)
MSregdata1cc<-na.omit(MSregdata1)
table(MSregdata1cc$sexR, exclude=NULL)
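The recoding described under Predictor Variables can be sketched on a toy data frame. Everything below is illustrative: the raw column names `sex_raw`, `prek_raw`, and `race_raw` and all values are hypothetical, and only the labels and the new variable names follow the text.

```r
# Toy data frame; raw column names and values are made up for illustration.
toy <- data.frame(
  ASMTAGEPS = c(4.6, 5.1, 4.9, 5.3),
  sex_raw   = c(1, 2, 2, 1),
  prek_raw  = c(1, 2, 2, 2),
  race_raw  = c(1, 5, 2, 5)
)

# Age as a numeric variable.
toy$ageR <- as.numeric(toy$ASMTAGEPS)

# Numeric codes converted to labelled factors.
toy$sexR  <- factor(toy$sex_raw,  levels = 1:2, labels = c("Male", "Female"))
toy$prekR <- factor(toy$prek_raw, levels = 1:2, labels = c("No", "Yes"))
toy$raceR <- factor(toy$race_raw, levels = 1:6,
                    labels = c("Latino", "AfricanAmerican", "Native American",
                               "Asian", "White", "Multiracial"))

# relevel() moves the chosen level to the first position, which makes it
# the reference category when the factor is dummy coded by lm().
toy$raceR2 <- relevel(toy$raceR, ref = "White")

print(levels(toy$raceR2))
```

With 'White' as the first level, each race coefficient in a regression is a contrast against White children.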
####Side-by-side boxplots####
boxplot(data=MSregdata1cc,
Proflitskill~raceR,
main="Boxplot of Literacy Proficiency Level by Race",
ylab="Composite Literacy Proficiency Level",
col="yellow",
notch=TRUE)
boxplot(data=MSregdata1cc,
Proflitskill~raceR,
main="Boxplot of Literacy Proficiency Level by Race",
ylab="Composite Literacy Proficiency Level",
col="yellow",
notch=FALSE)
####Side-by-side boxplots####
boxplot(data=MSregdata1cc,
Proflitskill~prekR,
main="Boxplot of Literacy Proficiency Level by Preschool Enrollment",
Cronbach Alpha
####Creating a dataframe that includes items only####
Proflitskillitems<- subset(MSS_SWEEP,
select=(c("CSLANGPF2R",
"CSLANGPF3R",
"CSLANGPF4R",
"CSLANGPF5R",
"CSLANGPF6R")))
names(Proflitskillitems)
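Cronbach's alpha and the composite score can be sketched in base R on simulated items. The data, the helper `cronbach_alpha`, and the object names below are hypothetical. The psych package loaded above also provides an alpha() function, which may report additional statistics such as alpha if an item is dropped.

```r
# Cronbach's alpha from the item variances and the variance of the total score.
cronbach_alpha <- function(items) {
  items <- as.matrix(items)
  k <- ncol(items)
  item_var <- apply(items, 2, var)
  total_var <- var(rowSums(items))
  (k / (k - 1)) * (1 - sum(item_var) / total_var)
}

# Simulated ratings: five items on a 1-5 scale driven by one underlying trait.
set.seed(7)
trait <- rnorm(200)
toy_items <- sapply(1:5, function(i) {
  pmin(pmax(round(3 + trait + rnorm(200, sd = 0.8)), 1), 5)
})

alpha_toy <- cronbach_alpha(toy_items)

# Composite score as the mean of the five items for each child.
composite <- rowMeans(toy_items)

cat("alpha =", round(alpha_toy, 2), "\n")
print(summary(composite))
```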
####Descriptive statistics####
Hmisc::describe(MSregdata1cc$Proflitskill)
####boxplot####
boxplot(MSregdata1cc$Proflitskill,
main="Boxplot of 'Literacy Proficiency Level'",
ylab="Composite Score",
col="darkgreen",
notch=TRUE)
####Side-by-side boxplots####
boxplot(data=MSregdata1cc,
Proflitskill~sexR,
main="Boxplot of 'Literacy Proficiency Level'",
ylab="Composite Score",
col="darkgreen",
notch=TRUE)
#Scatterplot of scores on age with linear regression line and confidence interval#
ggplot2::ggplot(MSregdata1cc,
aes(x=ageR,
y=Proflitskill))+
geom_point(na.rm=TRUE, alpha=0.2)+
labs(title="Scatterplot of Literacy Proficiency Level",
x="Age",
y="Literacy Proficiency Level") +
theme_bw(base_size=8) +
theme(plot.title=element_text(hjust = 0.5)) +
geom_smooth(method=lm,
se=TRUE,
color="red",
fill="darkgreen",
alpha=0.2)
ggplot2::ggplot(MSregdata1cc,
aes(x=raceR,
y=Proflitskill))+
geom_point(na.rm=TRUE, alpha=0.2)+
labs(title="Scatterplot of Literacy Proficiency Level",
x="Race",
y="Literacy Proficiency Level") +
theme_bw(base_size=8) +
theme(plot.title=element_text(hjust = 0.5)) +
geom_smooth(method=lm,
se=TRUE,
color="red",
fill="darkgreen",
alpha=0.2)
ggplot2::ggplot(MSregdata1cc,
aes(x=sexR,
y=Proflitskill))+
geom_point(na.rm=TRUE, alpha=0.2)+
labs(title="Scatterplot of Literacy Proficiency Level",
x="Sex",
y="Literacy Proficiency Level") +
theme_bw(base_size=8) +
theme(plot.title=element_text(hjust = 0.5)) +
geom_smooth(method=lm,
se=TRUE,
color="red",
fill="darkgreen",
alpha=0.2)
ggplot2::ggplot(MSregdata1cc,
aes(x=prekR,
y=Proflitskill))+
geom_point(na.rm=TRUE, alpha=0.2)+
labs(title="Scatterplot of Literacy Proficiency Level",
x="Last Year Preschool Enrollment",
y="Literacy Proficiency Level") +
theme_bw(base_size=8) +
theme(plot.title=element_text(hjust = 0.5)) +
geom_smooth(method=lm,
se=TRUE,
color="red",
fill="darkgreen",
alpha=0.2)
#histogram of residuals
hist(MSreg2$residuals, col="red")
#Residual plots
plot(MSreg2)
#histogram of residuals
hist(MSreg3$residuals, col="red")
#Residual plots
plot(MSreg3)
car::vif(MSreg3)
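The steps that fit and compare the three nested models do not appear in this listing. A minimal sketch of how such a hierarchical comparison could be run, on simulated data with made-up effect sizes (the data frame `sim` and the objects m1, m2, and m3 are hypothetical and are not the study's MSreg models):

```r
# Simulated data with the same kinds of predictors as the study.
set.seed(123)
n <- 400
sim <- data.frame(
  ageR  = runif(n, 4, 6),
  sexR  = factor(sample(c("Male", "Female"), n, replace = TRUE),
                 levels = c("Male", "Female")),
  prekR = factor(sample(c("No", "Yes"), n, replace = TRUE, prob = c(0.27, 0.73)),
                 levels = c("No", "Yes")),
  raceR = relevel(factor(sample(c("Latino", "White", "Multiracial"), n,
                                replace = TRUE)), ref = "White")
)
sim$Proflitskill <- with(sim,
  -0.5 + 0.6 * ageR + 0.4 * (prekR == "Yes") + 0.2 * (sexR == "Female") -
    0.6 * (raceR == "Latino") + rnorm(n, sd = 0.9))

# Three nested models, each adding predictors to the previous one.
m1 <- lm(Proflitskill ~ ageR, data = sim)
m2 <- update(m1, . ~ . + prekR + sexR)
m3 <- update(m2, . ~ . + raceR)

# F tests for the improvement in fit at each step, and the R-squared changes.
cmp <- anova(m1, m2, m3)
r2 <- sapply(list(m1, m2, m3), function(m) summary(m)$r.squared)

print(cmp)
print(round(diff(r2), 4))
```

Each row of the anova() table tests whether the added block of predictors improves fit over the previous model, which is the comparison reported in the Results.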
library(imputeR)
####Create new dataframe with mean-imputed missing values####
MSregmean<-imputeR::guess(MSregdata1cc, type="mean")
Hmisc::describe(MSregmean)
MSreg2MIsum<-summary(pool(MSreg2MI))
round(MSreg2MIsum,digits=3)
Appendix D
Regression Outputs
> summary(MSreg1)
Call:
lm(formula = Proflitskill ~ ageR, data = MSregdata1cc)
Residuals:
    Min      1Q  Median      3Q     Max
-1.8663 -0.7696 -0.1712  0.6485  2.9194
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.59696    0.33005  -1.809   0.0706 .
ageR         0.60441    0.06518   9.274   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> summary(MSreg2)
Call:
lm(formula = Proflitskill ~ ageR + prekR2 + sexR, data = MSregdata1cc)
Residuals:
    Min      1Q  Median      3Q     Max
-2.2377 -0.7493 -0.1409  0.6425  2.9133
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.72665    0.32384  -2.244   0.0249 *
ageR         0.58858    0.06385   9.218  < 2e-16 ***
prekR2Yes    0.37893    0.04599   8.239 2.93e-16 ***
sexRFemale   0.21286    0.04081   5.216 1.99e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> summary(MSreg3)
Call:
lm(formula = Proflitskill ~ ageR + prekR2 + sexR + raceR2, data = MSregdata1cc)
Residuals:
    Min      1Q  Median      3Q     Max
-2.4218 -0.6927 -0.1498  0.5909  2.9971
Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
(Intercept)           -0.31699    0.31702  -1.000    0.317
ageR                   0.54815    0.06207   8.831  < 2e-16 ***
prekR2Yes              0.39170    0.04491   8.722  < 2e-16 ***
sexRFemale             0.20619    0.03960   5.207 2.09e-07 ***
raceR2Latino          -0.59414    0.04937 -12.035  < 2e-16 ***
raceR2AfricanAmerican -0.22920    0.05597  -4.095 4.38e-05 ***
raceR2Native American -0.20650    0.26049  -0.793    0.428
raceR2Asian           -0.03671    0.11869  -0.309    0.757
raceR2Multiracial     -0.09545    0.06998  -1.364    0.173
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
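To show how the dummy-coded coefficients are read, the sketch below computes predicted scores from the summary(MSreg3) estimates. The two example children are hypothetical and the vector names are illustrative; the coefficients are copied from the output above.

```r
# Coefficients copied from the summary(MSreg3) output.
b <- c(intercept = -0.31699,
       ageR      =  0.54815,
       prekYes   =  0.39170,
       female    =  0.20619,
       latino    = -0.59414)

# Hypothetical child: 5.0 years old, attended preschool last year, female,
# in the reference race category (White), so no race coefficient is added.
pred_white <- b[["intercept"]] + b[["ageR"]] * 5 + b[["prekYes"]] + b[["female"]]

# Same hypothetical child in the Latino category: the race coefficient is
# the difference from the reference category, holding the other predictors fixed.
pred_latino <- pred_white + b[["latino"]]

print(round(c(white = pred_white, latino = pred_latino), 3))
```

The gap between the two predictions equals the Latino coefficient, which is what "compared to White children" means for a dummy-coded predictor.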