Integrated Literature Review: The Relationship Between Assessment and Student Achievement

William Kralovec

Lehigh University Comprehensive Examination - Summer 2017


Relationship Between Assessment and Student Achievement

In education, the term assessment refers to the wide variety of methods or tools that

educators use to evaluate, measure, and document the academic readiness, learning progress,

skill acquisition, or educational needs of students (Great Schools Partnership, 2013). A precise

definition of student achievement is difficult because the term has several layers of meaning and

nuance. Student learning and student achievement are closely related ideas and are often used

interchangeably. Student achievement is defined as the status of subject-matter knowledge,

understandings, and skills at one point in time (Linn, Bond, Darling-Hammond, Harris, Hess,

& Shulman, 2011). Student learning is the growth of achievement, or as previously defined,

subject-matter knowledge, understandings and skills, over time. Achievement is commonly

assessed through standardized tests, developed by testing companies and administered to large

groups of students. Standardized tests compare and rank students. Teachers, students, parents,

policymakers and others focus on what is being measured, thus assessment and achievement are

closely related. The type of assessment determines what type of achievement is measured and

therefore valued. Achievement can also be measured through graduation rates, attendance,

university acceptances, and ultimately, whether a student matures into a happy, contributing

member of society. As Dewey (1902) wrote in The Child and the Curriculum:

The world in which most of us live is a world in which everyone has a calling and

occupation, something to do. Some are managers and others are subordinates. But the

great thing for one as for the other is that each shall have had the education which enables

him to see within his daily work all there is in it of large and human significance.

This review does not consider the full interpretation of achievement; it focuses on the

achievement that is measured or tested. However, student achievement is larger than what is

formally tested in the core subjects. There is a larger body of knowledge and skills that can be

measured but is not included on standardized tests. There is also an even larger amount of

classroom learning that cannot be measured easily. Linn et al. (2011) capture this model in

Figure 1.

Figure 1. From learning to measuring (Linn et al., 2011).

Assessment takes many forms and serves many purposes. One major division of

assessment is formative versus summative. The seminal study of formative assessment by Black

and Wiliam (1998) defined it as all those activities undertaken by teachers, and/or by their

students, which provide information to be used as feedback to modify the teaching and learning

activities in which they are engaged. Dunn and Mulvenon (2009) further clarified that the

assessments need to be used to monitor student progress during the learning process, and

teachers must provide qualitative and quantitative feedback to the student, if they are to be

considered formative. Many studies, including the Black and Wiliam (1998) meta-analysis of

over 250 articles related to formative assessment, show conclusively that formative assessment

does improve learning, and that gains in student achievement were amongst the largest ever

reported. One criticism of the numerous studies demonstrating this is that there are no

well-defined practices or artifacts that represent formative assessment. Implementations

differ greatly from one setting and student population to the next

(Bennett, 2011). Bennett urges that new development focus on conceptualising well-specified

approaches built around process and methodology rooted within specific content domains.

Stiggins and Chappuis (2005) did set conditions for effective formative assessment as follows:

1. Focuses on clear purposes or goals.

2. Is an accurate reflection of achievement.

3. Uses frequent descriptive feedback on improvement in students' work,

communicated clearly.

4. Involves students in the assessment process.

The research on formative assessment does back up the claim that these aspects of

formative assessment improve student achievement. Orsmond, Merry and Reiling (2002)

demonstrated that the use of exemplars, self-assessment and peer marking criteria improved the

performance of first-year biology students on a poster assignment. Exemplars are models or

excellent examples of previous students' work at different levels. Feedback, done well, is a major

influence on learning and achievement (Hattie & Timperley, 2007). This means asking the right

questions to identify knowledge gaps, find erroneous understanding and determine remediation or

alternative steps to learning. Feedback is so important that teachers should automate many of the

routine tasks in schools so more time and resources can be devoted to responding to feedback

(Hattie & Jaeger, 1998).

Technology is making formative assessment even more effective by

individualizing learning. For example, Faber, Luyten & Visscher (2016) found large

gains in grade 3 mathematics from the use of digital formative assessment programs. Other technology

tools like Audience Response Systems (clicker-based technologies) and mobile devices (phones,

laptops) also had a small but positive effect on learning through increased communication of

feedback between teacher and students (Hunsu, Adesope, Bailey, 2016; Sung, Chang, Liu,


In contrast, summative assessments are used to evaluate student learning, skill acquisition

and academic achievement at the conclusion of a defined instructional period (Great Schools

Partnership, 2013). This is typically at the end of a project, unit, course, semester, program or

school year. Smith (2014) gives a comprehensive classification of all categories of

assessment, including summative assessments, in Figure 2.

Figure 2. National assessment categories: types of assessment policies.


Summative testing summarizes how well an individual is doing at a given point in time

(Looney, 2009; Brookhart, 2011). Scores are disseminated at the district, state, national, regional

or world levels. Testing for assessment evaluates academic progress and may direct instruction,

hence becoming formative in nature. For example, graded work with comments allows

students to take the feedback and improve on the next summative assessment (Basey, Maines,

Frances, 2014). Testing for advancement determines the academic trajectory for students. For

example, in Japan, where I currently work, admittance to prestigious or highly selective

junior high schools, high schools and universities is determined by exams (Yamamoto, 2016).

Examples of testing for accreditation are teacher certification exams or high school graduation

proficiency exams, which can serve both for accreditation and advancement.

In the USA, testing for accountability is becoming more prevalent. Because formative

assessment is difficult to measure, summative assessment, mostly in the form of standardized

tests, has become synonymous with student achievement. Testing for accountability shifts the

blame from low-performing students to low-performing schools, including teachers and

administration (Apple, 1999). There is much research outlining the history of this movement in

America, starting in the 1970s, when education leaders established a link between test scores and

school accountability (Dorn, 2007; Kornhaber, 2004; Carl, 1994; Lee, 2008). The failure to

narrow the achievement gap between upper and middle class white students and disadvantaged

minority students was the focus of the 1983 A Nation at Risk report (Hopman, 2008). The Educate

America Act of 1994 under the Clinton administration pushed for national standards and

voluntary national testing in grades 4, 8 and 12 (Carl, 1994). The movement culminated with the

No Child Left Behind Act (NCLB), signed by George W. Bush in January 2002. The law

required states to link standards, assessment and accountability (Lohman, 2010). Schools were

judged on making adequate yearly progress (AYP) towards 100% student proficiency within

three years or faced sanctions and even potential school closure (Springer, 2008).

Does the increase in testing for accountability improve student

achievement? A meta-analysis of 25 states by Nichols, Glass & Berliner (2012)

looked at the relationship between high-stakes testing pressure and student achievement: "...a

pattern seems to have emerged that suggests that high-stakes testing has little or no relationship

to reading achievement, and a weak to moderate relationship to math, especially in fourth grade

but only for certain student groups" (Nichols, Glass, Berliner, 2012, p. 3). Dee and Jacob (2009)

concurred that NCLB improved math achievement in grade 4 students and improved the lower

percentile groups in grade 8 mathematics, but had no effect on reading achievement.

Beyond the impact on student test scores, other unintended consequences resulted from

assessment for accountability. Nichols, Glass & Berliner (2005) reminded us of Campbell's

Law and that high-stakes tests cannot be trusted because they are corrupted and distorted. Campbell's

Law predicts that the more a quantitative indicator is used for social decision-making, the more

subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social

processes it is intended to monitor (Campbell, 1979, p. 85). Scholarly evidence finds that

accountability pressure has a number of negative consequences such as a narrowing of the

curriculum (Au, 2007; Jacob, 2005), gaming the system by reclassifying students to remove them

from the testing pool (Cullen & Reback, 2006; Figlio & Getzler, 2006; Jacob, 2005), and outright

cheating (Jacob & Levitt, 2003). A recent report from the National Assessment of Educational

Progress (NAEP) showed little progress between the 2008 and 2016 assessments in the music and

visual arts achievement of grade 8 students (Johnson, 2016). This may confirm the narrowing of

the curriculum, with resources going towards mathematics and reading because they are assessed

annually, and music and visual arts being ignored.

The goal of 100% proficiency announced with the NCLB legislation was unrealistic and

the consequences of school closure and personnel changes were too high for schools. This

resulted in individual states lowering their standards to ensure all students reached the

proficiency standards (Peterson & Hess, 2008). In response, the United States Department of

Education announced the Race to the Top program in 2009. The competitive grant program

awards funding to schools demonstrating performance-based evaluations of teachers and

administrators, adopting common learning standards and policies that do not prohibit charter

schools (US Department of Education, 2015). The program shifts control away from the states toward

a national focus on standards, and relies on inducements instead of punitive measures (Lohman, 2010).

There has been much public discourse on the adoption of Common Core State Standards (CCSS)

and the growth of charter schools. Lee and Wu (2017) found that the CCSS movement has

indeed raised standards but not student achievement on the NAEP math and reading assessments.

The Common Core has helped America race to the top for performance standards, but not yet for

performance outcomes (Lee and Wu, 2017). This might be because the NAEP is not yet aligned

to the CCSS (Wixson, Valencia, Murphy, & Phillips, 2013; Hughes, Daro, Holtzman &

Middleton, 2013).

The testing for accountability movement is spreading throughout the world (Smith,

2014). The Programme for International Student Assessment (PISA) is one of the best-known

assessments used to measure student achievement across the world. The number of countries

participating has increased from 43 in the first round in 2000 to 71 countries in the latest round

of tests in 2015. Surveys show PISA affects educational systems throughout the world through

ministries of education learning about and emulating practices and policies of countries of high

achievement or that have demonstrated growth (PISA, 2017). One example is Germany, where

reading achievement greatly improved between 2000 and 2009. PISA evidence indicated

great inequalities among schools in the country, and Germany invested in sub-par and disadvantaged

schools (Hanushek & Woessmann, 2010). In another example, US Education Secretary Arne

Duncan in 2012 called for accelerating achievement in secondary school and for

closing large and persistent achievement gaps after test results revealed that US students had failed

to improve in mathematics (Carnoy & Rothstein, 2013).

A common assessment to measure student achievement in international schools is the

Measures of Academic Progress (MAP) produced by the Northwest Evaluation Association (NWEA). The

computer-based adaptive assessments in reading, mathematics, language usage and science can

be administered to students in kindergarten through grade 10. Students can compare themselves

to norm groups in the USA, international schools and the several regional associations of

international schools. Currently, over 1,000 international schools in 145 countries are using the

assessment, along with over 5,000 schools in the USA (NWEA, 2017). Precise feedback is

provided immediately to teachers, and it is designed for teachers to use to direct instruction. The

assessment is usually given two to three times per year in order to assess progress.

NWEA has a team of 20 curious and innovative researchers who spend their days

investigating strategies to advance academic student growth and measurement (NWEA, 2017).

In a series of studies focusing on high-achieving students, defined as those in the 90th percentile

on the MAP assessment, approximately 40% dropped to around the 80th percentile over time;

however, a larger number of students in the 80th percentile rose to the 90th percentile and above.

The study also revealed that high achievers in high-poverty and low-poverty schools showed no

difference in growth rates (Xiang, Dahlin, Cronin, Theaker, & Durant, 2011). Another study

reaffirmed that the poverty rate at a given school had little effect on the growth of these

high-achieving students. There was also much variance in growth among schools, with a significant

number of the high-poverty schools among the 1,300 schools in the study outperforming low-poverty

schools in the growth of high-achieving students (Dahlin & Terasawa, 2013). In both studies,

although growth rates in high- and low-poverty schools were the same, pre-existing achievement

gaps gave more students in high-achieving schools access to merit-based scholarships. Dahlin

and Terasawa conclude that policy makers should not define the nation's elite students as those

scoring in the top 1%, 5% or 10% of the national standardized pool; rather, every school has its own

top 10% of students, and improving the achievement of these elite students promotes American

competitiveness and a more fair and just society (Dahlin & Terasawa, 2013, p. 5).

For such a common assessment used in American and international schools, there is a

disturbing lack of studies by independent researchers on the relationship between MAP and

student achievement. One study showed that MAP had no significant impact on the scores of

students on the state reading tests in grades 4 and 5. Evidence showed that subgroups of low and

high-achieving students may benefit most from MAP, but more research is needed (Cordray,

2013). Other groups, such as English language learners, low-income students and special needs

students, grow at the same rate as all students on MAP reading, mathematics and language usage

assessments (Buchsbaum, 2013). An interesting line of research is in the value-added model of

teacher evaluation using MAP. Gray (2010) found that principals were partially able to identify

high-performing teachers in mathematics, but not in English.

On the basis of the research findings in this work, the following recommendations are offered to

guide policy development at (school not identified):

With so much evidence pointing to the effectiveness of formative assessment on student

growth and achievement, greater emphasis on best practices and wider implementation of

formative assessment should be promoted in all classrooms.

Policies of assessment for accountability have too narrowly focused the attention of

teachers and schools on low-performing students and students close to proficiency level,

and a broader assessment program is needed to address the needs of all students. For our

school, this means focusing on high-achievers to ensure student growth. Do not be

satisfied with proficiency or world averages, but individualize assessment and focus on

achievement for all students. This will give us more data to support the

achievement of high-achieving students.

Assessment of reading and mathematics has narrowed teaching and learning to these

subjects. A drive to improve and assess student achievement in the fine arts, natural

sciences, technology and innovation, instead of solely assessing these two subjects,

would benefit students. Developing assessments for achievement in these areas should be


Encourage research projects on student achievement using NWEA's Measures of

Academic Progress assessment. There is a lack of studies on this assessment in

international schools.

Using assessment data at our school, explore developing a value-added model of teacher


Although not an issue at our school, improving the family and home life of high-poverty and

disadvantaged students would raise the achievement of students coming from these homes.

Research indicates that the pre-existing conditions of students determine their ultimate

achievement as much as their schooling. In our context, appropriate interventions would

take the form of outreach meetings and workshops for our families (parents,

grandparents) to support English language education, student well-being and improving

understanding of the cultural aspects of the International Baccalaureate curriculum.



