
POLYTECHNIC UNIVERSITY OF THE PHILIPPINES

A Critique of a Research Article

Do Curriculum-Based Measures Predict Performance on Word-Problem Solving Measures?

By

Dennis Sisco-Taylor, MA, Wenson Fung, MA, and H. Lee Swanson, PhD

Presented to
Dr. Dennis O. Dumrique
Polytechnic University of the Philippines

In Partial Fulfillment
of the Requirements for the Course
Curriculum Theories, Principles, and
Instructional Designs

By

Dinnes A. Masubay

October 2016

RESEARCH PROBLEMS
The main purpose of this study was to examine the utility of word-problem CBMs in predicting math outcomes for third-grade students. In particular, the researchers (Swanson et al., 2014) wanted to investigate the extent to which word-problem CBMs were predictive of high-stakes criterion measures of math word-problem solving. Therefore, the California STAR (California's Standardized Testing and Reporting measure) math test (California Department of Education, 2009) and two norm-referenced tests (the word-problem subtests from the KeyMath and the Comprehensive Mathematical Abilities Test) were used as criterion measures. Third-grade students were selected because word problems become a major point of emphasis in third-grade math curricula.

RESEARCH QUESTIONS

The research questions are the following:

Research Question 1 (RQ1): To what extent do level of performance and ROI on word-problem CBMs predict word-problem accuracy above and beyond covariates of problem solving (i.e., calculation, reading comprehension, estimation)?

Research Question 2 (RQ2): Are there significant differences in ROI on word-problem CBMs that can be attributed to ability status?

Research Question 3 (RQ3): Are word-problem CBMs able to identify at-risk students at levels that are greater than chance?

IMPORTANCE OF THE RESEARCH

It has been emphasized that there has been a large drop in students' mathematics achievement in the past few years. This research may help in the sense that formative assessments allow teachers to adapt instruction based on results, making modifications and improvements that will produce immediate benefits for students' learning (Naiku, 2015). The results offered by formative assessments can be used to find where the shortcomings lie within instruction, or among students.

 Gersten, Chard, et al. (2009) reported that formative assessment


practices were most effective when teachers used performance assessments to
evaluate specific academic skills, and subsequently used those data to make
instructional changes. In this research, teachers and educators can affirm that
formative assessments are very crucial in the teaching-learning process. In this case, it can be concluded that teachers and educators might give emphasis to critically planned formative assessments so that learning can be ensured.
 Response to intervention (RTI) models (as cited in the research,
Swanson et al., 2014) present an established framework for implementing formative assessment practices. Formative assessment practices are critical components of the RTI process, as formative assessments help inform both general classroom instruction and supplemental instruction. These promise to enrich and empower the education process not just inside the classroom but also in extracurricular activities.

METHODS AND PROCEDURES USED

Students were administered a battery of assessments in the fall (pretest) and spring (posttest) of third grade in group and individual formats.

The word-problem CBMs (curriculum-based measures) were administered to groups of students by teachers. The CBMs (modeled after Leh et al., 2007) were administered across six different time points (i.e., Day 6, Day 12, Day 18, etc.). These six alternate forms of the CBM were randomly assigned to classrooms. Each of the measures had 12 problems that included relevant and irrelevant propositions matched by complexity.

Students had 2 min to work on the CBM version they were given. The 2-min
time limit was selected because this was assumed to capture the formatting of
timed tests in high-stakes assessment as well as norm-referenced measures. To
assist in controlling for possible order effects, one of three presentation orders was
randomly assigned to each of the classrooms. Although a comparison of the three
presentation orders showed no significant order effect favoring one presentation
order, presentation order was included as a covariate in the subsequent analyses.

Word-problem composite (Comprehensive Mathematical Abilities Test [CMAT] + KeyMath). Two separate norm-referenced measures of word-problem accuracy were individually administered during the posttest phase: the word-problem subtests from the CMAT (Hresko, Schlieve, Herron, Swain, & Sherbenou, 2003) and KeyMath (Connolly, 1998). The technical manuals for these subtests reported adequate reliabilities (>.86) and moderate correlations (>.50) with other standardized math tests (e.g., the Stanford Diagnostic Mathematics Test). The scores from the CMAT and KeyMath were converted to scaled scores and then averaged to create the word-problem composite score.
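To make the composite computation concrete, here is a minimal sketch; the scaled scores below are hypothetical, since the actual values come from each test's norm tables, which are not reproduced here.

```python
import numpy as np

# Hypothetical scaled scores for three students; real values come from the
# CMAT and KeyMath norm tables.
cmat_scaled = np.array([9, 11, 7])
keymath_scaled = np.array([10, 12, 8])

# Word-problem composite: the mean of the two scaled scores per student.
wp_composite = (cmat_scaled + keymath_scaled) / 2
print(wp_composite)  # [ 9.5 11.5  7.5]
```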

SAMPLE USED IN THE STUDY


The sample consisted of 142 third-grade students (72 males, 70 females) who
were nested within 11 classrooms in five schools. The sample consisted of 61.97%
White (n = 88), 11.97% Hispanic (n = 17), 8.45% African American (n = 12), 4.23%
Asian (n = 6), and 13.38% who reported mixed ethnicity (n = 19). Of the 142
students, 6 (4.23%) had Special Education status (i.e., students with an Individual
Education Plan). The socioeconomic status (SES) of the sample consisted of low to
middle SES, based on students’ free and reduced lunch eligibility and parents’
education level and occupation. Of the 157 students recruited, the parents of 15 did not give permission for their children to participate. Students were included in the
study if they returned informed parental consent (N = 142).


RELIABILITY AND VALIDITY OF THE INSTRUMENT USED

The research discussed the validity of word-problem CBMs in predicting math achievement and identifying students in need of intervention. Findings related to each research question are reviewed below.

RQ1: To What Extent Do Level of Performance and ROI on Word-Problem CBMs Predict Overall Math Achievement Above and Beyond Traditional Predictors?

The hierarchical regression models revealed that measures of calculation, reading comprehension, estimation, and word-problem solving accounted for 23% of the
variance in the STAR math test and 55% of the variance in the word-problem
composite scores at the end of the school year. In addition, the initial score and ROI
on the CBMs added unique variance in the prediction of math achievement at the
end of the school year for both outcome measures. Thus, even after accounting for
students’ calculation, reading comprehension and estimation skills at the beginning
of the school year, students’ initial performances on the word-problem CBM and
their rates of improvement on the task provided useful information in predicting end
of year outcomes in math problem solving. This finding lends further support to
recommendations from Gersten, Chard, et al. (2009), which called for the
implementation of math interventions that include explicit instruction on solving
word-problems. As students’ performances on word-problem CBMs in the winter
were highly predictive of problem-solving outcomes in the spring, improving
students’ word-problem skills through explicit instruction would likely result in better
problem-solving outcomes. While the relationships between the CBM measures and
the criterion measures are not as strong as those often observed in the CBM reading
literature, they were comparable with those previously reported in the math
literature. For example, Jitendra and colleagues (2005) reported concurrent validity
coefficients ranging from .64 to .71; whereas, in the current study, a predictive
validity coefficient of .62 was observed between the word-problem CBM
(administered in the winter) and the word-problem composite (administered in the
spring). This finding has strong implications for practice, as it shows that the
strength of association between word-problem tasks and word-problem outcome
measures was not compromised by the shorter administration time used in this
study. Recall that students had 8 min to work on the word-problem CBM used in
Jitendra et al. (2005), and students in this study had only 2 min to work on the
measure used in this study. The differences observed in the strength of association
between the word-problem CBM and the two respective criterion measures used in
this study may be attributable to differences in response format. While the STAR
test is in a multiple-choice format, the word-problem composite measure in this
study required the students to generate answers. Therefore, the norm-referenced tests (CMAT and KeyMath) used to form the word-problem composite are likely more accurate representations of students' math abilities in the area of word-problem solving. However, given the focus on high-stakes testing in most states in the
United States, it is important to find measures that are better at predicting scores
on high-stakes tests. Therefore, future research in this area might consider
comparing the technical properties of problem-solving CBM with other CBM
measures that are more representative of the variety of math problems included in
high-stakes tests (i.e., a combination of word problems, calculation, and math
concepts).

RQ2: Are There Significant Differences in Growth (Rates of Improvement) on Word-Problem CBMs That Can Be Attributed to Ability Status?

While the CBM ROI across the three time points emerged as a significant
predictor of performance on both criterion measures, there were no statistically
significant differences between the slopes of at-risk students and students who
were not at risk. We have two explanations for these findings.

First, the standard deviations for the slope estimates of the respective groups
were rather large in comparison with the means (SDs = 2.00–3.60), indicating
substantial variance associated with these slope estimates. This outcome is likely
because only three data points were used to estimate slope in this study. Christ
(2006) demonstrated that the standard error of the estimates (SEEs) decreased
continuously when more data points were considered in calculating rates of
improvement and therefore, with more sessions, differences between risk groups
may have emerged. Second, our ability to accurately measure growth may be a
function of our methodology. In this study, we elected to use the total number of
correct responses as the outcome variable. Others (e.g., Foegen et al., 2008) have
used alternative scoring methodologies that reward students for accurately
completing parts of the word-problem process (i.e., identifying correct numbers,
choosing the correct algorithm, etc.). Rewarding students for properly executing
parts of the word-problem process would likely increase the range of possible scores
and be more sensitive to growth. Future research in this area should examine these
possibilities. Despite these issues with measurement error, however, it is important
to note that the CBM slope emerged as a significant predictor of the criterion
measures. This is an important finding as it demonstrates that we are not only able
to measure growth in word-problem skills in an efficient manner but that growth
rates can give valuable information as it pertains to end-of-year success in math.
Thus, as teachers make the shift to incorporating more activities related to math
word-problem in their lesson plans, word problem CBMs hold promise for providing a
means to evaluate the effectiveness of the instruction. It is our recommendation that, in addition to CBM scores, ROI be used to give teachers more information for assessing their students' progress.
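The Christ (2006) point about slope precision can be illustrated with a small simulation. This is only a sketch under assumed values (the true slope, noise level, and spacing are invented for illustration, not taken from the study): the spread of OLS slope estimates shrinks as more data points are used.

```python
import numpy as np

# Simulate noisy weekly CBM scores around a common growth line and show that
# the spread of OLS slope estimates shrinks as more data points are used.
rng = np.random.default_rng(42)
true_slope, noise_sd, n_students = 0.5, 2.0, 10_000

for n_points in (3, 6, 12):
    weeks = np.arange(n_points)
    scores = true_slope * weeks + rng.normal(0, noise_sd, size=(n_students, n_points))
    slopes = np.array([np.polyfit(weeks, s, 1)[0] for s in scores])
    print(f"{n_points} data points: SD of slope estimates = {slopes.std():.2f}")
```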

RQ3: Are Word-Problem CBMs Able to Identify At-Risk Students at Levels That Are Greater Than Chance?

The word-problem CBMs demonstrated the ability to distinguish between students at different levels of risk, which could have strong implications for
informing math instruction. If a measure can correctly distinguish between students
who will meet an end of year criterion and those who will not, it has the capacity to
serve as an agent for selecting students for intervention. The word-problem CBMs
produced AUC values of .80 and .83 for the STAR and word-problem composite,
respectively. These values are greater than chance (50%), and comparable with
rates reported in the reading CBM literature (e.g., Compton et al., 2006; Deno et al.,
2009). Thus, our findings provide preliminary evidence that math CBMs have the
potential to be used in a similar fashion as reading CBMs; they can be used for the
purposes of identifying the students who present the greatest need for math
intervention. For this study, the 25th percentile was used as a cutoff for risk to
identify students scoring in the lowest quartile on the criterion measures. This is an
arbitrary cutoff, and the precision of the CBM measures can be evaluated using a
number of different cutoffs. However, from a practical aspect, the selection of
students for math intervention in a given school will depend largely on the
resources available at the school. For example, it may be of benefit to identify
students scoring below the 15th percentile as the resources available may only
allow one to serve a very small number of students. Regardless, this study showed
that word-problem CBMs hold the potential to be used as screeners for math
difficulties. Results from the logistic regression and ROC analyses have strong
implications for educational decision-making with regard to math instruction. The
overall AUCs of .80 and .83, and the levels of sensitivity and specificity that were
generated when using optimal cut-points, suggest that the word-problem CBMs can
aid educators in determining which students are in need of intervention. As Gersten, Beckmann, et al. (2009) and others have recommended providing interventions targeting word-problem solving during the early elementary years, it would benefit teachers to know which students need those interventions most.
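As a rough sketch of how such a cutoff-based risk classification could be applied in practice (the scores below are simulated, and the cutoff percentile is a policy choice rather than a fixed value):

```python
import numpy as np

# Hypothetical end-of-year criterion scores for 142 students.
rng = np.random.default_rng(0)
criterion_scores = rng.normal(100, 15, 142)

# Flag students scoring below a chosen percentile cutoff as at risk.
for cutoff_pct in (25, 15):
    cutoff = np.percentile(criterion_scores, cutoff_pct)
    n_flagged = int((criterion_scores < cutoff).sum())
    print(f"{cutoff_pct}th percentile cutoff = {cutoff:.1f}; students flagged = {n_flagged}")
```
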
TYPE OF RESEARCH

In this research, the goal is to determine the relationship between students' performances on curriculum-based measures and whether those performances can predict word-problem solving. The way the researchers gathered and analyzed the data shows that they used quantitative research with a descriptive design (subjects usually measured once). A descriptive study establishes only associations between variables; an experimental study establishes causality. The data were gathered using structured research instruments, in this case the CBMs.

Furthermore, the study examined whether curriculum-based measures (CBMs) of math word-problem solving contributed unique variance in predictions of performance on high-stakes tests, beyond the contribution of calculation and reading skills. CBMs were administered to a representative sample of 142 third-grade students at three time points. Results indicate that problem-solving CBMs uniquely predicted a problem-solving composite measure and the California Standardized Testing and Reporting (STAR) test, and were able to discriminate between students at risk and not at risk for math difficulties. Implications for practice are discussed.

HOW WAS THE DATA ANALYZED?

The results below are organized by the research questions that guided the study.

RQ1: To What Extent Do Level of Performance and ROI on Word-Problem CBMs Predict Word-Problem Accuracy Above and Beyond Traditional Predictors (i.e., Calculation, Number Sense)?

This research question was addressed by evaluating whether word-problem CBM level and/or slope could add additional variance in predicting the criterion
measures of math achievement beyond the contribution of calculation skills,
estimation, and reading comprehension. A forced entry hierarchical regression
method was used where a measure of calculation was entered into the model first,
followed by a measure of number sense (estimation), and then the student’s
reading comprehension score. Next, the students’ initial CBM scores were entered
into the model, followed by the CBM ROI across the three data points. The ROIs
were calculated using an ordinary least squares regression approach as
recommended in Shinn, Good, and Stein (1989). The ROI was taken from the
regression equation of each student's respective trend line. (Note that because the ROI was calculated using the trend-line feature in Excel to generate a least squares regression line, it is an estimated rather than an actual ROI.) Results are
reported below for each measure.
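A minimal sketch of this analysis pipeline is shown below, using simulated data and assumed variable names (the spacing of the CBM administrations and all values are placeholders, not the study's data). It computes each student's estimated ROI with an ordinary least squares fit and then runs a forced-entry hierarchical regression, reporting the change in R-squared contributed by each block.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated data; column names and score distributions are assumptions for
# illustration only, not the study's variables.
rng = np.random.default_rng(0)
n = 142
df = pd.DataFrame({
    "calculation": rng.normal(100, 15, n),
    "estimation": rng.normal(100, 15, n),
    "reading_comp": rng.normal(100, 15, n),
    "cbm_t1": rng.poisson(4, n),
    "cbm_t2": rng.poisson(5, n),
    "cbm_t3": rng.poisson(6, n),
    "wp_composite": rng.normal(10, 3, n),
})

# Estimated ROI: ordinary least squares slope over the three CBM administrations
# (items per week), analogous to the Excel trend-line approach described above.
weeks = np.array([1.0, 2.0, 3.0])  # assumed one-week spacing between administrations
cbm = df[["cbm_t1", "cbm_t2", "cbm_t3"]].to_numpy()
df["cbm_level"] = cbm[:, 0]
df["cbm_roi"] = [np.polyfit(weeks, row, 1)[0] for row in cbm]

# Forced-entry hierarchical regression: add predictors block by block and
# track the change in R-squared at each step.
blocks = [["calculation"], ["estimation"], ["reading_comp"], ["cbm_level"], ["cbm_roi"]]
predictors, prev_r2 = [], 0.0
for block in blocks:
    predictors = predictors + block
    model = sm.OLS(df["wp_composite"], sm.add_constant(df[predictors])).fit()
    print(f"+ {block[0]}: R2 = {model.rsquared:.3f} (change = {model.rsquared - prev_r2:.3f})")
    prev_r2 = model.rsquared
```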

The inclusion of the word-problem CBM accounted for an additional 8% of the variance in the word-problem composite beyond the model containing calculation, estimation, and reading comprehension. Also, once the initial CBM score was introduced into the model, reading comprehension was no longer a significant predictor. Last, entering the CBM ROI into the model yielded a unique contribution. The final model accounted for 55% of the variance in the problem-solving composite.

RQ2: Are There Significant Differences in Growth (Rates of Improvement) on Word-Problem CBMs That Can Be Attributed to Ability Status?

Students were grouped by ability status. Those with scores below the 25th
percentile on the California STAR or word-problem composite (CMAT and KeyMath) criterion measures were considered at risk. A one-way ANOVA was computed to compare the rates of improvement (ROI) of at-risk students with the ROI of students who were not at risk for both criterion measures. (All assumptions of ANOVA were tested and met.) The researchers found that no significant differences in ROI emerged between the two groups as a function of risk status on the STAR, F(1, 141) = 1.16, p > .05, or the problem-solving composite, F(1, 141) = 0.83, p > .05. For the total
sample, the mean ROI was 0.54 items per week, with a standard deviation of 1.44.
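A minimal sketch of this comparison, assuming simulated ROI values and group sizes rather than the study's data, could look like the following:

```python
import numpy as np
from scipy import stats

# Hypothetical ROI values (items per week) for the two risk groups; the group
# sizes and distributions are illustrative assumptions only.
rng = np.random.default_rng(1)
roi_at_risk = rng.normal(0.4, 1.4, 36)
roi_not_at_risk = rng.normal(0.6, 1.4, 106)

# One-way ANOVA comparing mean ROI across the two risk groups.
f_stat, p_value = stats.f_oneway(roi_at_risk, roi_not_at_risk)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```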

RQ3: Are Word-Problem CBMs Able to Identify At-Risk Students at Levels That Are Greater Than Chance?

To address this research question, logistic regression and receiver operating characteristic (ROC) curve analyses were utilized.

Results from the previous hierarchical regression model indicated that CBM
level and CBM slope were the only significant predictors of the STAR measure, while
calculation, CBM level, and CBM slope were all significant predictors of the word
problem composite. Therefore, these variables were included in the logistic
regression models, and the non-significant predictors from the previous analysis
(i.e., estimation, reading comprehension) were excluded.

The models specified for the STAR measure and word-problem composite produced AUCs of .80 and .83, respectively. This means that students identified as members of the not-at-risk group, as a function of their initial CBM level and slope, yielded scores that were greater than those of students identified as at risk 80% of the time for the STAR measure, and 83% of the time for the word-
problem composite. The AUC indicates the extent to which the model is able to
discriminate between members of the at-risk group and not-at-risk group. The
observed values of .80 and .83 indicate that the models are discriminating at levels
greater than chance (e.g., Pepe, Longton, & Janes, 2009).
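The logistic regression and ROC analysis can be sketched as follows; the predictors and labels are simulated stand-ins for the study's CBM level, CBM slope, and risk-status variables, not the actual data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve

# Simulated predictors: column 0 is initial CBM level, column 1 is CBM ROI.
rng = np.random.default_rng(2)
X = rng.normal(size=(142, 2))
# Simulated at-risk labels loosely tied to the predictors (illustrative only).
at_risk = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=142) < -0.7).astype(int)

# Fit the logistic regression and score each student's predicted risk.
model = LogisticRegression().fit(X, at_risk)
risk_prob = model.predict_proba(X)[:, 1]

# AUC: the probability that a randomly chosen at-risk student receives a higher
# predicted risk than a randomly chosen not-at-risk student (0.5 = chance).
auc = roc_auc_score(at_risk, risk_prob)
fpr, tpr, thresholds = roc_curve(at_risk, risk_prob)
print(f"AUC = {auc:.3f}")
```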

MAJOR FINDINGS

Large-scale reviews of the math literature (e.g., Gersten, Chard, et al., 2009;
NMAP, 2008) have highlighted the need to utilize formative assessment practices in
schools to improve math education. Formative assessment practices have been
most effective when teachers use performance assessments to evaluate specific
academic skills, and subsequently use those data to make instructional changes;
effects are strengthened further when guidance is given to teachers on using
assessment data to make instructional changes (Gersten, Chard, et al., 2009). Word-
problem CBMs hold the promise of assisting in this process because they can
provide low-inference information to teachers on students’ word-problem skills and
be used in a repeated fashion. This study examined the extent to which word-
problem CBMs predict math achievement, specifically criterion referenced measures
of word-problem, and the California STAR standardized test, beyond that of
traditional measures such as calculation, estimation, and reading abilities. This
investigation uncovered that word-problem CBM accounted for approximately one
quarter of the variance in the STAR test and over one half of the variance in the
word-problem composite measure. Predictive correlations with the STAR math test
and word-problem composite ranged from .46 to .62 (similar to correlations reported
in other studies investigating word-problem tasks; for example, Fuchs et al., 2011;
Fuchs et al., 2012; Jitendra et al., 2005), providing early evidence of predictive
validity for word-problem CBM. The researchers acknowledged that this is an initial
step in the validation process and that further research will be necessary to provide
more empirical support for the psychometric properties of word-problem CBMs.

SUGGESTIONS

In this study, only word-problem accuracy was assessed by the CBM
measure. The addition of other problems such as calculation (e.g., addition,
subtraction) may be more representative of what is taught in schools and on high-
stakes tests. However, because of the limited research on word-problem CBM
measures, one of the purposes of this study was to try to determine whether word-
problem CBM predicted word-problem performance and the California STAR high-
stakes test beyond that of calculation skills.

The 2-min time limit that students had to complete the CBM measure is
questionable. This may have assessed other areas that are related to word-problem,
including reading fluency, processing speed, and working memory (e.g., Andersson,
2007; Swanson & Beebe-Frankenberger, 2004; Vilenius-Tuohimaa, Aunola, & Nurmi,
2008) rather than word-problem-solving ability. That is, students who have better
reading comprehension, faster phonological processing speed, and/or more working
memory capacity may be able to answer each question more quickly and answer
more questions. However, one advantage of the time limit is that it can be easily
implemented by teachers because it takes little time to administer. Despite the
limitations of the 2-min administration time limit addressed in the previous section,
findings from the current study suggest that 8- to 10-min samples of word problem
may not be necessary for the purposes of screening. The word-problem CBM was
able to distinguish between students who were at risk for word-problem difficulties
and students with relatively low risk of having word problem difficulties, all within a
2-min time frame. Future research in this area might examine how much time is
necessary to obtain an adequate sample of word-problem skills for the purposes of screening and progress monitoring. Measures that require less administration time
also take less instructional time away from teachers; they are therefore more likely
to be accepted by teachers.
