Etr 560 Lidia Lentz 1

Predicting Literacy Proficiency Level from Age, Sex, Previous Year
Preschool Enrollment, and Race
Lidia Lentz
Department of Education Research and Evaluation, Northern Illinois University
ETR 560: Computer Data Analysis
Dr. Thomas Smith
12/07/2020
Predicting Literacy Proficiency Level from Age, Sex, Previous Year Preschool Enrollment, and
Race
Kindergarten standards for literacy and mathematics have risen over the years, placing
higher expectations on children. The percentage of children enrolled in preschool and
pre-kindergarten rose only slightly from the previous census, to 66% of four-year-olds and
43% of three-year-olds (Yoshikawa et al., 2016). Such low enrollment is surprising given how
consistently research has shown the importance of early education. This trend raises two
questions: why are enrollment rates low, and what are the barriers to, and implications of, a
lack of early education?
Literature Review
Early education in this study is defined as preschool, a structured educational setting for
children before age four, and pre-kindergarten, a structured educational setting in the school
year before kindergarten. Early education has shifted from play time to a structured learning
environment focused on all aspects of being a learner. Bassok et al. (2016) found that
kindergarten expectations increased in rigor from 1998 to 2010 across five domains: literacy,
language, mathematics, social behavior, and functional behavior. Because the bar is set higher
for children entering kindergarten, the effects of early education and its importance for later
achievement have been the topic of many studies.
Literacy, the focus of this research, is complex because it covers numerous skill sets. It
combines different reading and language skills, particularly oral language,
phonological/phonemic awareness, alphabetic knowledge, print knowledge, and invented spelling
(Slutzky & Debruin-Parecki, 2019). Kindergarten students are expected to identify sight words,
read pattern texts, and understand story elements and the differences between genres. Because
literacy expectations for students entering kindergarten are significantly higher than in years
past, it is vital that children build the necessary reading skills before kindergarten.
Children across America come from different racial groups and family circumstances. Family
circumstances can affect whether a child attends preschool and the type of preschool attended.
Past studies have identified race as a predictor of which students struggle. Bowdon et al.
(2019) found that Hispanic and black students were 28 and 24 days behind white students,
respectively, a notable gap. Race was a common predictor of achievement across studies. Gender,
on the other hand, was not a predictor of academic achievement (St. Clair-Christman et al., 2011).
Identifying predictors of achievement, and the relationships among them, will help early
elementary stakeholders decide where to focus. Previous research shows some compelling results
about possible predictors such as race, gender, and preschool enrollment (St. Clair-Christman
et al., 2011; Morrow, 2005). This study does not address the years before pre-kindergarten,
ages 0 to 3, or the impact of childcare during that time on later academic achievement. Instead,
it considers all of these predictors together and tests whether they are significantly related
to literacy proficiency.
Purpose Statement
The purpose of this data analysis is to investigate age, sex, race, and previous-year
preschool enrollment as possible predictors of Spring literacy proficiency level for students
entering kindergarten.
Research Questions
1. Does age significantly predict Spring literacy proficiency level for children entering
kindergarten?
2. After controlling for age, do sex and previous-year preschool enrollment significantly
predict Spring literacy proficiency level for children entering kindergarten?
3. After controlling for age, sex, and previous-year preschool enrollment, does race
significantly predict Spring literacy proficiency level for children entering kindergarten?
Methodology
Dataset
The dataset used in this study combines the datasets of two studies: the Multi-State Study
of Pre-Kindergarten and the State-Wide Early Education Programs (SWEEP) study, which together
collected information on early childhood education across 11 states. The participants were
children, 4 to 6 years old, and early education teachers. In total, 721 classrooms and 2,982
pre-kindergarten children participated (Early et al., 2013).
The Multi-State Study of Pre-Kindergarten was conducted in the 2001-2002 school year in six
states that had a dedicated focus on, and initiatives for, early education. The sample was a
stratified random sample of 40 centers or schools from a selected list. Data were collected
from participating teachers and families. Children were eligible if they would enter
kindergarten the next year, did not qualify for an IEP, and understood directions in English or
Spanish (Early et al., 2013).
The SWEEP study was conducted in the 2003-2004 school year in five states. These states
differed from those in the Multi-State Study of Pre-Kindergarten so as to represent states that
use other initiatives and funding models. State-funded pre-kindergarten sites were selected at
random from each state's list. Of the 465 sites that participated, only two discontinued in the
spring. As in the previous study, data were collected from teachers and families, and eligible
participants were selected in the same way (Early et al., 2013). Data collected for both studies
included demographic information.
Variables of Interest
The full Dataset 2, renamed MSS_SWEEP, was imported into the R statistical software for
analysis. Dataset 1 was not used because it contains teacher information and classroom
observations, which were not part of this analysis. Missing values were specified appropriately
for all variables used in the analysis. A data frame omitting cases with missing values,
MSregdata1cc, was then created (Early et al., 2013).
Outcome Variable
The dependent (outcome) variable is a composite score computed as the mean of five items
related to literacy proficiency, each rated by teachers. Teachers rated each participant on a
scale from 1 to 5, where 1 = not yet, 2 = beginning, 3 = in progress, 4 = intermediate, and
5 = proficient. The five item variables (CSLANGPF2 through CSLANGPF6) were combined into a
composite score (Proflitskill) for each participant. Proflitskill is treated as a quantitative,
ratio-level variable. The items rated participants' comprehension, letter identification,
phonological skills, prediction skills, and early reading skills (Early et al., 2013).
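The composite described above can be illustrated with a minimal base R sketch. The item names follow the codebook in Appendix B, but the ratings below are made up for illustration, and averaging over whichever items are non-missing (na.rm = TRUE) is an assumption about how incomplete ratings are handled.

```r
# Toy ratings on the 1-5 scale; values are invented, not taken from MSS_SWEEP.
items <- data.frame(
  CSLANGPF2 = c(3, 4, NA, 1),
  CSLANGPF3 = c(2, 5, 3, 1),
  CSLANGPF4 = c(3, 4, 2, NA),
  CSLANGPF5 = c(4, 4, 3, 2),
  CSLANGPF6 = c(1, 5, NA, 1)
)

# Composite = mean of the available items for each child.
# na.rm = TRUE averages over the non-missing items instead of returning NA.
items$Proflitskill <- rowMeans(items[, paste0("CSLANGPF", 2:6)], na.rm = TRUE)
items$Proflitskill
```

Because the composite is a mean rather than a sum, it stays on the original 1 to 5 rating scale even when a child is missing one or two item ratings.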
Predictor Variables
Four variables from MSS_SWEEP, the combination of both studies, were selected as predictors
in this analysis: age in years (ASMTAGEPS), a quantitative, ratio-level variable; gender
(CHGENP), a categorical variable; race (CHRACEP), a nominal variable; and previous-year
preschool enrollment (CHATNDPRKP), a binary nominal variable (Early et al., 2013).
Age was converted to a numeric variable named ageR. Sex, previous-year preschool enrollment,
and race were converted to factors and assigned labels. The sex variable, renamed sexR, has two
levels, 1 = "Male" and 2 = "Female". Previous-year preschool enrollment, renamed prekR, has two
levels, coded 1 = "Yes" and 2 = "No" in the codebook (Appendix B). prekR was releveled as prekR2
so that the predictor's reference category is 'No'; its regression coefficient (prekR2Yes in
Appendix D) therefore compares children who attended preschool the previous year with those who
did not. The race variable, renamed raceR, has six levels: 1 = Latino, 2 = African American,
3 = Native American, 4 = Asian, 5 = White, and 6 = Multiracial. raceR was releveled as raceR2 so
that the predictor's reference category is 'White'.
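How releveling determines the reference category, and therefore the name and meaning of the dummy-coded coefficient, can be seen in a small base R sketch. The codes and scores below are invented; only the coding 1 = Yes, 2 = No (Appendix B) and the variable names prekR and prekR2 (Appendix C) come from the document.

```r
# Made-up codes and scores; 1 = Yes, 2 = No as in the Appendix B codebook.
prek_code <- c(1, 2, 2, 1, 2, 1)
score     <- c(3.2, 2.0, 2.4, 3.0, 1.8, 3.4)

prekR  <- factor(prek_code, levels = c(1, 2), labels = c("Yes", "No"))
prekR2 <- relevel(prekR, ref = "No")   # "No" becomes the reference category

levels(prekR2)                         # the reference level is listed first
fit <- lm(score ~ prekR2)
coef(fit)                              # slope is named "prekR2Yes": mean(Yes) - mean(No)
```

In this toy fit the intercept is the mean score of the reference group and the prekR2Yes coefficient is the difference between the two group means, which is the same reading that applies to the prekR2Yes row in Appendix D.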
Analytic Methods Used
Descriptive statistics and graphical displays were used to summarize the variables and assess
their distributions in the data frame MSregdata1cc. Cronbach's alpha was calculated for the
items in the composite score, Proflitskill, to check internal-consistency reliability. A Q-Q
plot and the Shapiro-Wilk test were used to check normality. Multiple linear regression was used
for the main analysis, with a score test for non-constant variance used to check homogeneity of
variance.
Results
Descriptive Statistics
Descriptive statistics were computed for each variable. Figure 1 shows a visual
representation of each predictor variable (Early et al., 2013). Skewness for ageR is 0, which
indicates a symmetric distribution. raceR (-0.22) and sexR (-0.02) are slightly negatively
skewed (skewed to the left), and prekR (-1.04) is negatively skewed. Kurtosis is negative for
all variables, indicating platykurtic distributions: ageR, -0.85; raceR, -1.70; sexR, -2.00;
and prekR, -0.92.
Figure 1
Descriptive statistics for each variable in the subset.
Cronbach's alpha was computed to assess the reliability of the items in the Proflitskill
composite score (data frame Proflitskillitems). Raw alpha was 0.88 (based on covariances) and
standardized alpha was 0.88 (based on correlations), which indicates adequate reliability.
Removing any item would not meaningfully increase alpha, so all items were retained in the
composite score. Descriptive statistics were also computed and a histogram was constructed
(Figure 2). The negative skewness statistic indicates a "left-skewed" distribution, and the
slightly positive kurtosis statistic indicates a somewhat "peaked" distribution. Approximate
normality was assumed.
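For readers who want to see what raw alpha measures, the following base R sketch applies the standard covariance-based formula, alpha = k/(k - 1) * (1 - sum of item variances / variance of the total score), to simulated data. The function name cronbach_alpha and the simulated items are illustrative assumptions; the reported value of 0.88 came from psych::alpha on the actual items, not from this code.

```r
cronbach_alpha <- function(items) {    # hypothetical helper, not from the paper
  items <- na.omit(items)              # listwise deletion, for simplicity
  k <- ncol(items)
  item_vars <- apply(items, 2, var)    # variance of each item
  total_var <- var(rowSums(items))     # variance of the summed scale
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

# Five simulated items that share a common trait plus independent noise.
set.seed(1)
trait <- rnorm(200)
toy <- as.data.frame(sapply(1:5, function(i) trait + rnorm(200, sd = 0.8)))
cronbach_alpha(toy)
```

Alpha rises as the items covary more strongly relative to their individual variances, which is why it is read as a measure of internal consistency.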
A Q-Q plot was constructed for Proflitskill (Figure 2). The points were approximately
linear, which suggests an approximately normal distribution. Additional descriptive statistics
were also calculated. The positive skewness statistic indicates a "right-skewed" distribution,
and the negative kurtosis statistic indicates a somewhat "flattened" distribution. Values of
skew.2SE and kurt.2SE were more extreme than ±1.0, which is evidence of statistically
significant (p < .05) skewness and kurtosis.

The Shapiro-Wilk test for normality was conducted, with the null hypothesis that the data
come from a normal distribution. The 95% confidence interval for the mean was 2.438 ± 0.039,
and the coefficient of variation was SD/mean = 0.997/2.438 = 0.409. The null hypothesis was
rejected, W = 0.952, p < .001: there is a statistically significant departure from normality
for the composite "Literacy Proficient skills" scores. In the Hmisc descriptive output, the
information statistic (0.997) is close to 1, which suggests a high degree of continuity in this
variable.
Figure 2
Q-Q Plot of Composite Scale Score
Inferential Statistics
A simple linear regression was computed to predict "Literacy Proficiency Level" from age,
using the model Proflitskill = b0 + b1(ageR). Estimated in RStudio, the fitted equation was
Proflitskill = -0.63458 + 0.61172(ageR), R2 = 0.03881. An F test of the null hypothesis
H0: R2 = 0 gave F(1, 2091) = 84.44, p < 2.2e-16; because p < .05, we reject the null hypothesis.
The effect is small (R2 = 0.03881).
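To make the fitted equation concrete, the short sketch below plugs a few ages into it. The intercept and slope are the values reported in the text; the helper name predict_literacy and the example ages are illustrative assumptions.

```r
b0 <- -0.63458   # intercept reported in the text
b1 <-  0.61172   # slope for age reported in the text

# Hypothetical helper: predicted composite score at a given age in years.
predict_literacy <- function(age) b0 + b1 * age

predict_literacy(c(4.5, 5.0, 5.5))
predict_literacy(5.5) - predict_literacy(4.5)   # a one-year difference equals b1
```

The slope is read as the expected difference in the composite score between two children who differ in age by one year, so within the 4 to 6 year range of the sample the predicted scores stay on the 1 to 5 rating scale.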
The scatterplot in Figure 3 is not excessively curved, so a linear relationship is
plausible. A formal test from the car package was used to test the null hypothesis that the
residuals have constant variance: χ2(1) = 7.432, p = .0064. Because p < .05, we reject the null
hypothesis of constant variance. The residual plots and histogram in Figure 3 likewise show that
the homoscedasticity assumption has not been met. The Shapiro-Wilk test rejected the null
hypothesis of normally distributed residuals, W = 0.9652, p < 2.2e-16.
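The logic of a score test for non-constant variance can be sketched in base R. The version below is a Breusch-Pagan-type test that regresses scaled squared residuals on the fitted values and refers half the regression sum of squares to a chi-square distribution with 1 degree of freedom. It is offered as an illustration of the idea, under the assumption that this mirrors the default behavior of car::ncvTest; the function name ncv_score_test and the simulated data are not from the paper.

```r
ncv_score_test <- function(fit) {            # hypothetical helper name
  e2  <- residuals(fit)^2
  u   <- e2 / mean(e2)                       # squared residuals scaled by the ML variance estimate
  aux <- lm(u ~ fitted(fit))                 # auxiliary regression on the fitted values
  reg_ss <- sum((fitted(aux) - mean(u))^2)   # regression sum of squares
  chisq  <- reg_ss / 2
  c(chisq = chisq, df = 1, p = pchisq(chisq, df = 1, lower.tail = FALSE))
}

# Simulated data in which the error spread grows with x.
set.seed(42)
x <- runif(300, 4, 6)
y <- 1 + 0.6 * x + rnorm(300, sd = 0.3 * x)
ncv_score_test(lm(y ~ x))
```

A small p-value from this kind of test indicates that the residual spread changes systematically with the fitted values, which is the pattern the residual plots are meant to reveal.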
Figure 3
Scatterplot and residual plots of simple regression model proflitskill ~ age
$\hat{y}_i = -0.63458 + 0.61172\,x_i$
A multiple linear regression predicting "Literacy Proficiency Level" from age, previous-year
preschool enrollment, and sex was computed, R2 = 0.08108; that is, 8.1% of the variation in
"Literacy Proficiency Level" is explained by the full set of predictors. This is an increase in
R2 of .04227 over the age-only model. The overall test of the null hypothesis H0: R2 = 0 gave
F(1, 2098) = 61.44, p < 2.2e-16; because p < .05, we reject the null hypothesis. This
combination of three predictors significantly predicts perceived Literacy Proficiency Level. The
Shapiro-Wilk test of the residuals rejected the null hypothesis of normality at p < .05 (values
reported for this test: 0.9652 and 0.96681). The residuals are positively skewed with negative
kurtosis, as can be seen in the histogram in Figure 4. The non-constant variance score test gave
χ2(1) = 6.881, p = .0087, so the null hypothesis of constant variance is again rejected. All
variance inflation factors (VIF) were approximately 1.0, which indicates no variance inflation
due to multicollinearity. Plots can be seen in Figure 4. The two models were compared with an
F test, F(2, 2089) = 48.041, p < 2.2e-16; because p < .05, we reject the null hypothesis that
the added predictors do not improve the model.
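The model-comparison F statistic can be checked by hand from the two R2 values using the standard R2-change formula, F = [(R2 full - R2 reduced) / df1] / [(1 - R2 full) / df2]. The sketch below uses the values reported in the text; the helper name r2_change_F is an assumption for illustration.

```r
# Hypothetical helper: F test for the increase in R2 between nested models.
r2_change_F <- function(r2_reduced, r2_full, df_num, df_den) {
  f <- ((r2_full - r2_reduced) / df_num) / ((1 - r2_full) / df_den)
  c(F = f, p = pf(f, df_num, df_den, lower.tail = FALSE))
}

# Values reported in the text: age-only model vs. age + preschool + sex.
r2_change_F(r2_reduced = 0.03881, r2_full = 0.08108, df_num = 2, df_den = 2089)
```

With these inputs the statistic comes out at about 48, in line with the reported F(2, 2089) = 48.041; small differences reflect rounding of the R2 values.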
Figure 4
Scatterplot and residual plots of multiple regression model proflitskill ~ age + preschool + sex
A third linear regression model was computed with an additional predictor, race, a nominal
variable requiring dummy coding. The reference category for the dummy coding is 'White', so the
model compares White children with children in each of the other race categories. Native
American, Asian, and Multiracial children did not differ significantly from White children in
perceived Literacy Proficiency Level (all p > .05). The coefficients for Latino children
(p = 2.64e-12) and black children (p = 6.67e-05) were statistically significant, and both groups
were predicted to have lower Literacy Proficiency Levels. Age (b = 0.55119, p < 2e-16), prek
enrollment (b = 0.39725, p < 2e-16), and female sex (b = 0.20888, p = 2.63e-07) remained
statistically significant predictors of Literacy Proficiency Level.
R2 = 0.1354 for this model: the combined set of predictors explains 13.54% of the
variability in perceived literacy proficiency level. This is an increase in R2 of .09659 over
the age-only model (.05432 over the second model). The overall test of the null hypothesis,
H0: R2 = 0 in the population, gave F(8, 2084) = 40.8, p < 2.2e-16; because p < .05, we reject
the null hypothesis. The non-constant variance score test was also computed. To check for
multicollinearity, variance inflation factors (VIF) were computed; all were approximately 1, so
there are no concerns about multicollinearity among the predictors. When the second and third
models were compared, race was a statistically significant predictor of perceived literacy
proficiency (see Figure 5).
Figure 5
Scatterplot and residual plots of multiple regression model proflitskill ~ age + preschool + sex
+ race
Discussion of the Findings and Recommendations
RQ1: Does age significantly predict Spring literacy proficiency level for children
entering kindergarten? There is a positive linear relationship between age and perceived Literacy
Proficiency Level; 3.88% of the variation in "Literacy Proficiency Level" is explained by age.
RQ2: After controlling for age, do sex and previous-year preschool enrollment significantly
predict Spring literacy proficiency level for children entering kindergarten? Considered
individually, age (b1 = 0.59494, p < 2e-16), preschool enrollment 'Yes' (b2 = 0.38652, p =
2.76e-16), and female sex (b3 = 0.21363, p = 3.04e-07) are each statistically significant,
positive predictors of Literacy Proficiency Level. As age increases, the Literacy Proficiency
Level also increases. Females have a higher perceived Literacy Proficiency Level than males.
Children who attended preschool the previous year have higher perceived Literacy Proficiency
Levels than children who were in other types of childcare. After controlling for age, the
combined set of predictor variables (sex and preschool enrollment) significantly predicts
literacy proficiency level.
RQ3: After controlling for age, sex, and previous-year preschool enrollment, does race
significantly predict Spring literacy proficiency level for children entering kindergarten? After
controlling for age, preschool enrollment, and sex, race is a statistically significant predictor
of literacy proficiency level.
The findings suggest that preschool enrollment for two years before kindergarten is
associated with higher literacy achievement. Further funding and initiatives in early education
may improve academic achievement for young children. This research also supports earlier
findings that early entrance to kindergarten is not ideal, because older age is a predictor of
higher academic achievement. Hispanic and black children are at risk for lower academic
achievement, and efforts to close the achievement gap should focus on these marginalized groups.
Limitations
One limitation of this dataset is the possibility of data-entry error. Little information
was available about how the two datasets were combined and processed. Additionally, there is a
substantial chance of error in data collection, since multiple teachers collected data across
11 states in two different studies. The guidelines for data collection were not exact, which
could cause discrepancies. The levels of literacy proficiency were vaguely defined and depended
on comparison with peers. Because the ratings were based solely on a teacher's perspective, they
are subjective and reflect each teacher's expectations and the standards set in the district or
state. The studies were conducted in different states that may have different standards or
initiatives, and state funding for early education and for marginalized groups may differ.
Future Research
The questionnaire for this research asked whether a child attended preschool the previous
year. Because this variable was a significant predictor, it is an area where further research is
essential. Future research should examine how the consistency of preschool attendance from age 3
until kindergarten entry, and the quality of the preschool attended, relate to academic
achievement. Additionally, preschool enrollment, race, gender, and age should be examined as
predictors of growth rather than achievement alone. Achievement does not account for a child's
ability at the start, whereas growth would show preschool's impact.

References
Bassok, D., Latham, S., & Rorem, A. (2016). Is kindergarten the new first grade? AERA Open,
2(1), Article 2332858415616358. doi:10.1177/2332858415616358
Bowdon, J., Dahlke, K., Yang, R., Pan, J., Marcus, J., & Lemieux, C. (2019). Children's
knowledge and skills at kindergarten entry in Illinois: Results from the first statewide
administration of the Kindergarten Individual Development Survey (Rep. No. REL
2020012). Retrieved from https://ies.ed.gov/ncee/edlabs/projects/project.asp?projectID=4573
(ERIC Document Reproduction Service No. ED599357)
Early, D., Burchinal, M., Barbarin, O., Bryant, D., Chang, F., Clifford, R., . . . Barnett, W. S.
(2013). Pre-Kindergarten in eleven states: NCEDL's multi-state study of pre-kindergarten
and study of state-wide early education programs (SWEEP). ICPSR Data Holdings.
doi:10.3886/icpsr34877.v1
Morrow, L. M. (2005). Language and literacy in preschools: Current issues and concern.
Literacy Teaching and Learning, 9(1), 7-19. Retrieved from https://eric.ed.gov/?id=EJ966159
Slutzky, C., & Debruin‐Parecki, A. (2019, December). State‐level perspectives on kindergarten
readiness. ETS Research Report Series, 2019(1), 1-40. doi:10.1002/ets2.12242
St.Clair-Christman, J., Buell, M., & Gamel-McCormick, M. (2011). Money matters for early
education: The relationships among childcare quality, teacher characteristics, and subsidy
status. Early Childhood Research & Practice, 13(2).
Yoshikawa, H., Weiland, C., & Brooks-Gunn, J. (2016). When does preschool matter? The Future
of Children, 26(2), 21-35. doi:10.1353/foc.2016.0010

Appendix A
Excerpt of Survey Items
CHGENP [ENTER RESPONDENTS GENDER:]
1 Male
2 Female
ASMTAGEPS ASK ALL

What is this child’s date of birth?
ASK ALL
Q.20 Rate the student’s achievement in comparison to other students of the same grade
level. The examples do not exhaust all the ways that a child may demonstrate what he/she
knows or can do. This child (INSERT ITEM) is not yet, beginning, in progress,
intermediate, proficient, not applicable.
b. Understands and interprets a story or other text read to him/her – for example,
retelling a story just read to the group, or telling about why a story ended as it did,
or connecting part of the story to his/her own life.
c. Easily and quickly names all upper– and lower-case letters of the alphabet.
d. Produces rhyming words – for example, says a word that rhymes with "chip,"
"shop," "drink," or "light."
e. Predicts what will happen next in stories by using the pictures and storyline for
clues.
f. Reads simple books independently – for example, reads books with a repetitive
language pattern.
RESPONSE CATEGORIES:
1 Not yet
2 Beginning
3 In progress
4 Intermediate
5 Proficient
CHRACEP and CHATNDPRKP were survey items in the family questionnaire – the family
questionnaire was unavailable.

Appendix B
Data Values for Each Variable
ASMTAGEPS: ASSMT PK S: AGE AT TIME OF ASSMT (YEARS)

Based upon 2,757 valid cases out of 2,982 total cases.
• Mean: 5.05
• Median: 5.06
• Mode: 5.34
• Minimum: 4
• Maximum: 6
• Standard Deviation: 0.32
Location: 2129-2136 (width: 8; decimal: 2)
Variable Type: numeric
(Range of) Missing Values: -99.00
CHGENP: PRESCHOOL: CHILD`S GENDER

Value Label Frequency Unweighted %
1 Male 1459 48.9 %
2 Female 1507 50.5 %
Missing Data
-99 System Missing 16 0.5 %
Total 2,982 100%
CHRACEP: FAMQ: PK CHILD`S RACE (MutExCat)

Value Label Frequency Unweighted %
1 Latino 764 25.6 %
2 African American 533 17.9 %
3 Native American 21 0.7 %
4 Asian 83 2.8 %
5 White 1200 40.2 %
6 Multiracial 297 10.0 %
Missing Data
Total 2,982 100%
CHATNDPRKP: TQSC PK F: DID CHILD ATTEND PREK LAST YEAR?

Value Label Unweighted Frequency %
1 Yes 669 22.4 %
2 No 1803 60.5 %
Missing Data
Total 2,982 100%
(Range of) Missing Values: -99
CSLANGPF2: ECLSK Acad skills PK FL&L ITEM2:STORY

Value Label Unweighted Frequency %

1 Not Yet 294 9.9 %
2 Beginning 682 22.9 %
3 In Progress 677 22.7 %
4 Intermediate 512 17.2 %
5 Proficient 356 11.9 %
Missing Data
Total 2,982 100%
CSLANGPF3: ECLSK Acad skills PK FL&L ITEM3:ALPHABET

1 Not Yet 821 27.5 %
Missing Data
Total 2,982 100%
CSLANGPF4: ECLSK Acad skills PK FL&L ITEM4:RHYME

1 Not Yet 940 31.5 %
Missing Data
Total 2,982 100%
CSLANGPF5: ECLSK Acad skills PK FL&L ITEM5:PREDICTS

1 Not Yet 323 10.8 %

Missing Data
Total 2,982 100%
CSLANGPF6: ECLSK Acad skills PK FL&L ITEM6:READS

1 Not Yet 1059 35.5 %
Missing Data
Total 2,982 100%
Appendix C
R Syntax
####Making a backup copy of your dataframe####

MSS_SWEEPup<-MSS_SWEEP
####load packages####
library(dplyr)
library(psych)
library(car)
library(mice)
library(MissMech)
library(imputeR)
library(naniar)
library(Hmisc)
library(ggplot2)
library(pastecs)
library(psych)
nrow(MSS_SWEEP)
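####Converting the five literacy items to numeric####

MSS_SWEEP$CSLANGPF2R<-as.numeric(MSS_SWEEP$CSLANGPF2)
MSS_SWEEP$CSLANGPF3R<-as.numeric(MSS_SWEEP$CSLANGPF3)
MSS_SWEEP$CSLANGPF4R<-as.numeric(MSS_SWEEP$CSLANGPF4)
MSS_SWEEP$CSLANGPF5R<-as.numeric(MSS_SWEEP$CSLANGPF5)
MSS_SWEEP$CSLANGPF6R<-as.numeric(MSS_SWEEP$CSLANGPF6)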
####Computing a composite score as the mean of item scores####

MSS_SWEEP$Proflitskill<-rowMeans(cbind(MSS_SWEEP$CSLANGPF2R,
MSS_SWEEP$CSLANGPF3R,
MSS_SWEEP$CSLANGPF4R,
MSS_SWEEP$CSLANGPF5R,
MSS_SWEEP$CSLANGPF6R),
na.rm=TRUE)
####Classify age as numeric####

MSS_SWEEP$ageR<-as.numeric(MSS_SWEEP$ASMTAGEPS)
####Classifying sex as a factor and assigning labels####

MSS_SWEEP$sexR<-factor(MSS_SWEEP$CHGENP,
levels=c(1,2),
labels=c("Male", "Female"))
####Converting variable to a factor and assigning labels####

MSS_SWEEP$raceR<-factor(MSS_SWEEP$CHRACEP,
levels=c(1,2,3,4,5,6),
labels=c("Latino",
"AfricanAmerican",
"Native American",
"Asian",
"White",
"Multiracial"))
####Classifying Last Year Preschool Enrollment as a factor and assigning labels####

MSS_SWEEP$prekR<-factor(MSS_SWEEP$CHATNDPRKP,
levels=c(1,2),
labels=c("Yes", "No"))
####Create subset dataframe MSregdata1cc####

MSregdata1<-dplyr::select(MSS_SWEEP,
Proflitskill,
ageR,
sexR,
raceR,
prekR)
MSregdata1cc<-na.omit(MSregdata1)
####Check for MCAR Create a temporary data set of six variables####

MSS_SWEEP$sexRR<-factor(MSS_SWEEP$CHGENP,
levels=c(1,2))
MSS_SWEEP$raceRR<-factor(MSS_SWEEP$CHRACEP,
levels=c(1,2,3,4,5,6))
MSS_SWEEP$prekRR<-factor(MSS_SWEEP$CHATNDPRKP,
levels=c(1,2))
####Descriptive statistics for age####

summary(MSregdata1cc$ageR)
psych::describe(MSregdata1cc$ageR)
hist(MSregdata1cc$ageR, col = "red")
####Descriptive statistics for sex####

summary(MSregdata1cc$sexR)
####construct a frequency distribution table for sex####

table(MSregdata1cc$sexR, useNA="ifany")
####compute descriptive stats for age by sex####

psych::describeBy(MSregdata1cc$ageR,MSregdata1cc$sexR)
####Assign frequency table to an object called sex_table####

sex_table <- table(MSregdata1cc$sexR)
####construct a barplot of sex####

barplot(sex_table,
col="purple",
main="Barplot of Sex")
####Bar plot of mean values, including 95% bootstrapped CI####

ggplot2::ggplot(MSregdata1cc,
aes(sexR, ageR)) +
stat_summary(fun=mean,
geom="bar",
fill=c("lightgreen","lightblue"),
color="black") +
labs(title="Barplot of Mean Age by Sex",
x="Sex",
y="Mean Age") +
stat_summary(fun.data=mean_cl_boot,
geom="pointrange")
####Bar plot of median values, including 95% CI####

ggplot2::ggplot(MSregdata1cc,
aes(sexR, ageR)) +
stat_summary(fun=median,
geom="bar",
fill=c("lightgreen","lightblue"),
color="black") +
labs(title="Barplot of Median Age by Sex",
x="Sex",
y="Median Age") +
stat_summary(fun.data=median_hilow,
geom="pointrange")
####Constructing a freq table####

table(MSregdata1cc$raceR)
####Constructing a freq table that also shows missing values####

table(MSregdata1cc$raceR, useNA="ifany")
####construct a frequency distribution table for race####

race_table<-table(MSregdata1cc$raceR)
####construct barplot for Race####

barplot(race_table,
col="yellow",
main="Barplot for Race",
xlab="Race of Student",
ylab="Frequency")
####compute descriptive stats for age by race####

psych::describeBy(MSregdata1cc$ageR,MSregdata1cc$raceR)
####Descriptive statistics for “PROF” by Sex####

psych::describeBy(MSregdata1cc$Proflitskill, MSregdata1cc$sexR)
####Side-by-side boxplots####
boxplot(data=MSregdata1cc,
Proflitskill~raceR,
main="Boxplot of Literacy Proficiency Level by Race",
ylab="Composite Literacy Proficiency Level",
col="yellow",
notch=TRUE)
boxplot(data=MSregdata1cc,
Proflitskill~raceR,
main="Boxplot of Literacy Proficiency Level by Race",
col="yellow",
notch=FALSE)
####construct a frequency distribution table for Preschool Setting####

table(MSregdata1cc$prekR, useNA="ifany")
####Assign frequency table to an object called prek_table####

prek_table <- table(MSregdata1cc$prekR)
####construct a barplot of last year preschool enrollment####

barplot(prek_table,
col="purple",
main="Barplot of Last Year Preschool Enrollment",
xlab="Enrolled in Preschool Last Year",
ylab = "Frequency")
####Descriptive statistics for “PROF” by Enrollment####

psych::describeBy(MSregdata1cc$Proflitskill, MSregdata1cc$prekR)
####Side-by-side boxplots by enrollment####
boxplot(data=MSregdata1cc,
Proflitskill~prekR,
main="Boxplot of Literacy Proficiency Level by Preschool Enrollment",
xlab="Last year Preschool Enrollment",
col="purple",
notch=TRUE)
#Computing descriptive statistics and histogram for composite variable

#na.rm=TRUE indicates to exclude missing values from computations
summary(MSregdata1cc$Proflitskill,
na.rm=TRUE)
hist(MSregdata1cc$Proflitskill, col = "green")
####Advanced Descriptive Statistics Histogram for composite variable####

#Descriptive stats from psych package
psych::describe(MSregdata1cc$Proflitskill)
#Histogram using additonal options

hist(MSregdata1cc$Proflitskill,
main="Histogram for Proficiency level of \n
Literacy Skills",
xlab="Proficiency Scores",
col="lightblue")
####Cronbach's alpha####
####Creating a dataframe that includes items only####
Proflitskillitems<- subset(MSS_SWEEP,
select=(c("CSLANGPF2R",
"CSLANGPF3R",
"CSLANGPF4R",
"CSLANGPF5R",
"CSLANGPF6R")))
names(Proflitskillitems)
####Computing Cronbach's alpha####

psych::alpha(Proflitskillitems)
####Constructing a Q-Q plot####

ggplot2::qplot(sample=MSregdata1cc$Proflitskill,
main = "Q-Q plot of Composite Scale Score (Proflitskill)")
####Computing descriptive statistics####

pastecs::stat.desc(MSregdata1cc$Proflitskill,
norm=TRUE)
####Computing descriptive statistics and round values to 3 digits####

ProflitskillStats1<-pastecs::stat.desc(MSregdata1cc$Proflitskill,
norm=TRUE)
round(ProflitskillStats1, digits=3)
####Descriptive statistics####
Hmisc::describe(MSregdata1cc$Proflitskill)
####boxplot####
boxplot(MSregdata1cc$Proflitskill,
main="Boxplot of 'Literacy Proficiency Level'",
ylab="Composite Score",
col="darkgreen",
notch=TRUE)
boxplot(data=MSregdata1cc,
Proflitskill~sexR,
main="Boxplot of 'Literacy Proficiency Level'",
ylab="Composite Score",
col="darkgreen",
notch=TRUE)
#Scatterplot of scores on age with linear regression line and confidence interval#
ggplot2::ggplot(MSregdata1cc,
aes(x=ageR,
y=Proflitskill))+
geom_point(na.rm=TRUE, alpha=0.2)+
labs(title="Scatterplot of Literacy Proficiency Level",
x="Age",
y="Literacy Proficiency Level") +
theme_bw(base_size=8) +
theme(plot.title=element_text(hjust = 0.5)) +
geom_smooth(method=lm,
se=TRUE,
color="red",
fill="darkgreen",
alpha=0.2)
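#Same plot with race on the x-axis#
ggplot2::ggplot(MSregdata1cc,
aes(x=raceR,
y=Proflitskill))+
geom_point(na.rm=TRUE, alpha=0.2)+
labs(title="Scatterplot of Literacy Proficiency Level",
x="Race",
y="Literacy Proficiency Level") +
theme_bw(base_size=8) +
theme(plot.title=element_text(hjust = 0.5)) +
geom_smooth(method=lm, se=TRUE,
color="red", fill="darkgreen", alpha=0.2)
#Same plot with sex on the x-axis#
ggplot2::ggplot(MSregdata1cc,
aes(x=sexR,
y=Proflitskill))+
geom_point(na.rm=TRUE, alpha=0.2)+
labs(title="Scatterplot of Literacy Proficiency Level",
x="Sex",
y="Literacy Proficiency Level") +
theme_bw(base_size=8) +
theme(plot.title=element_text(hjust = 0.5)) +
geom_smooth(method=lm, se=TRUE,
color="red", fill="darkgreen", alpha=0.2)
#Same plot with last year preschool enrollment on the x-axis#
ggplot2::ggplot(MSregdata1cc,
aes(x=prekR,
y=Proflitskill))+
geom_point(na.rm=TRUE, alpha=0.2)+
labs(title="Scatterplot of Literacy Proficiency Level",
x="Last Year Preschool Enrollment",
y="Literacy Proficiency Level") +
theme_bw(base_size=8) +
theme(plot.title=element_text(hjust = 0.5)) +
geom_smooth(method=lm, se=TRUE,
color="red", fill="darkgreen", alpha=0.2)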
####Linear regression of Proficiency Literacy Lvel on age####

MSreg1<-lm(data=MSregdata1cc,
Proflitskill~ageR)
summary(MSreg1)
####Test for non-constant variance####

car::ncvTest(MSreg1)
####Regression diagnostic plots####

plot(MSreg1)
#Shapiro-Wilk test for normality of residuals

shapiro.test(MSreg1$residuals)
####Releveling the predictor prekR so the reference category is level 2, 'No' ####

MSregdata1cc$prekR2 <- relevel(MSregdata1cc$prekR, 2)
#Multiple linear regression model

MSreg2 <- lm(data=MSregdata1cc,
Proflitskill ~
ageR +
prekR2 +
sexR)
summary(MSreg2)
#histogram of residuals
hist(MSreg2$residuals, col="red")

#Descriptive statistics for residuals

psych::describe(MSreg2$residuals)
#Test for non-constant variance
car::ncvTest(MSreg2)

#Residual plots
plot(MSreg2)
#Assessing multicollinearity among predictors

car::vif(MSreg2)
#Comparing ‘reduced’ and ‘full’ models#

anova(MSreg1,MSreg2)
####Releveling the predictor raceR so the reference category is 'White' ####

MSregdata1cc$raceR2 <- relevel(MSregdata1cc$raceR, 5)
#Releveling a categorical predictor (equivalent code)

MSregdata1cc$raceR2<-relevel(MSregdata1cc$raceR, "White")
MSreg3 <- lm(data=MSregdata1cc,

Proflitskill ~
ageR +
prekR2 +
sexR +
raceR2)
summary(MSreg3)
#histogram of residuals
hist(MSreg3$residuals, col="red")

#Descriptive statistics for residuals

psych::describe(MSreg3$residuals)
#Test for non-constant variance
car::ncvTest(MSreg3)

#Residual plots
plot(MSreg3)
car::vif(MSreg3)
#Plot of actual outcome on predicted outcome

plot(predict(MSreg3),MSregdata1cc$Proflitskill,
xlab="predicted Literacy Proficiency Level",ylab="actual Literacy Proficiency Level")
#Comparing ‘reduced’ and ‘full’ models#

anova(MSreg2,MSreg3)
library(imputeR)
####Create new dataframe with mean-imputed missing values####
MSregmean<-imputeR::guess(MSregdata1cc, type="mean")
Hmisc::describe(MSregmean)
####Create new dataframe with randomly-imputed missing values ####

MSregrand<-imputeR::guess(MSregdata1cc, type="random")
MSregrand<-as.data.frame(MSregrand)
Hmisc::describe(MSregrand)
#Classifying sex as a factor and assigning labels#

MSregrand$sexR<-factor(MSregrand$sexR,
levels=c(1,2),
labels=c("Male", "Female"))
#Multiple linear regression model w/imputed data

MSregrandIMP <- lm(data=MSregrand,
Proflitskill ~
ageR +
prekR2 +
sexR +
raceR2)
summary(MSregrandIMP)
#Create five multiply-imputed data sets

library(mice)
MSregMULTIMP <- mice::mice(MSregdata1cc,
m=5,
maxit=50,
seed=500)
#Pooling regression results from multiply-imputed data

MSreg2MI <- with(MSregMULTIMP,
lm(Proflitskill ~
ageR +
prekR2 +
sexR +
raceR2))
summary(pool(MSreg2MI))
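MSreg2MIsum<-summary(pool(MSreg2MI))
#Round only the numeric columns (recent mice versions also return a non-numeric 'term' column)
num_cols <- sapply(MSreg2MIsum, is.numeric)
MSreg2MIsum[num_cols] <- round(MSreg2MIsum[num_cols], digits=3)
MSreg2MIsum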
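Pooling across multiply-imputed data sets follows Rubin's rules: the pooled estimate is the mean of the per-imputation estimates, and its variance combines the average within-imputation variance with the between-imputation variance. A minimal base-R sketch for a single coefficient, using made-up estimates and standard errors for m = 5 imputations (these are not results from this study):

```r
# Hypothetical per-imputation results for one coefficient (m = 5)
est <- c(0.55, 0.57, 0.54, 0.56, 0.58)        # coefficient estimates
se  <- c(0.062, 0.063, 0.061, 0.062, 0.064)   # their standard errors
m   <- length(est)

pooled_est  <- mean(est)    # pooled point estimate
within_var  <- mean(se^2)   # average within-imputation variance
between_var <- var(est)     # between-imputation variance
total_var   <- within_var + (1 + 1 / m) * between_var

pooled_se <- sqrt(total_var)
print(c(estimate = pooled_est, std.error = pooled_se))
```

The pooled standard error is larger than the average within-imputation standard error whenever the estimates differ across imputations, reflecting the extra uncertainty due to the missing data.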
Appendix D
Regression Outputs
> summary(MSreg1)
Call:
lm(formula = Proflitskill ~ ageR, data = MSregdata1cc)
Residuals:
Min 1Q Median 3Q Max
-1.8663 -0.7696 -0.1712 0.6485 2.9194
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.59696 0.33005 -1.809 0.0706 .
ageR 0.60441 0.06518 9.274 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.9825 on 2223 degrees of freedom

Multiple R-squared: 0.03724, Adjusted R-squared: 0.03681
F-statistic: 86 on 1 and 2223 DF, p-value: < 2.2e-16
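The R-squared and F-statistic in a summary are tied together by R² = F·df1 / (F·df1 + df2), which gives a quick consistency check on printed output. A small sketch (the helper name r2_from_f is hypothetical), applied to the rounded values printed for MSreg1:

```r
# Recover R-squared from an overall F-statistic and its degrees of freedom
r2_from_f <- function(f, df1, df2) {
  (f * df1) / (f * df1 + df2)
}

# Values as printed for MSreg1: F = 86 on 1 and 2223 DF
r2_msreg1 <- r2_from_f(86, 1, 2223)
print(round(r2_msreg1, 4))  # about 0.0372, in line with the printed 0.03724
```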
> summary(MSreg2)
Call:
lm(formula = Proflitskill ~ ageR + prekR2 + sexR, data = MSregdata1cc)
Residuals:
Min 1Q Median 3Q Max
-2.2377 -0.7493 -0.1409 0.6425 2.9133
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.72665 0.32384 -2.244 0.0249 *
ageR 0.58858 0.06385 9.218 < 2e-16 ***
prekR2Yes 0.37893 0.04599 8.239 2.93e-16 ***
sexRFemale 0.21286 0.04081 5.216 1.99e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

F-statistic: 62.3 on 3 and 2221 DF, p-value: < 2.2e-16
> summary(MSreg3)
Call:
lm(formula = Proflitskill ~ ageR + prekR2 + sexR + raceR2, data = MSregdata1cc)
Residuals:
Min 1Q Median 3Q Max
-2.4218 -0.6927 -0.1498 0.5909 2.9971
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.31699 0.31702 -1.000 0.317
ageR 0.54815 0.06207 8.831 < 2e-16 ***
prekR2Yes 0.39170 0.04491 8.722 < 2e-16 ***
sexRFemale 0.20619 0.03960 5.207 2.09e-07 ***
raceR2Latino -0.59414 0.04937 -12.035 < 2e-16 ***
raceR2AfricanAmerican -0.22920 0.05597 -4.095 4.38e-05 ***
raceR2Native American -0.20650 0.26049 -0.793 0.428
raceR2Asian -0.03671 0.11869 -0.309 0.757
raceR2Multiracial -0.09545 0.06998 -1.364 0.173
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

F-statistic: 43.73 on 8 and 2216 DF, p-value: < 2.2e-16
> anova(MSreg2,MSreg3)
Analysis of Variance Table
Model 1: Proflitskill ~ ageR + prekR2 + sexR
Model 2: Proflitskill ~ ageR + prekR2 + sexR + raceR2
Res.Df RSS Df Sum of Sq F Pr(>F)
1 2221 2055.8
2 2216 1924.9 5 130.91 30.141 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
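The F value in the table can be reproduced by hand from the two residual sums of squares: the drop in RSS per added parameter, divided by the full model's residual mean square. A short check using the figures printed above (the variable names are only illustrative):

```r
# Figures as printed in the ANOVA table above
rss_reduced <- 2055.8; df_reduced <- 2221
rss_full    <- 1924.9; df_full    <- 2216

df_added <- df_reduced - df_full   # 5 race indicators added
f_stat <- ((rss_reduced - rss_full) / df_added) / (rss_full / df_full)
p_value <- pf(f_stat, df_added, df_full, lower.tail = FALSE)

print(round(f_stat, 2))   # close to the printed 30.141 (inputs are rounded)
print(p_value < 2.2e-16)
```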
