The Facts About Test Scores

CLAIM Test scores are falling, and the educational system is broken and obsolete.

REALITY Test scores are at their highest point ever recorded.

who doubt this should review the textbooks in common use then and now or look at the tests then and now. If they are still in doubt, I invite them to go to the NAEP Web site and review the questions in math and science for eighth-grade students. The questions range from easy to very difficult. Surely an adult should be able to answer them all, right? You are likely to learn, if you try this experiment, that the difficulty and complexity of what is taught today far exceed anything the average student encountered in school decades ago.

NAEP is central to any discussion of whether American students and the public schools they attend are doing well or badly. It has measured reading and math and other subjects over time. It is administered to samples of students; no one knows who will take it, no one can prepare to take it, no one takes the whole test. There are no stakes attached to NAEP; no student ever gets a test score. NAEP reports the results of its assessments in two different ways.
[Figure 1. Trend in Fourth-Grade NAEP Mathematics Average Scores. Scale score by year, 1990 to 2011, with separate lines for accommodations not permitted and accommodations permitted. *Significantly different (p<.05) from 2011.]

[Figure 2. Trend in Fourth-Grade NAEP Mathematics Achievement-Level Results. Percent by year, 1996 to 2011: % at Advanced, % at or above Proficient, and % at or above Basic, shown for accommodations not permitted and accommodations permitted. *Significantly different (p<.05) from 2011.]

The NAEP governing board authorized the establishment of achievement levels in the early 1990s with the hope that the public would have a better understanding of student performance, as compared with scale scores. Critics of the achievement levels complained at the time that the process was rushed and that the standards might be flawed and unreasonably high. But a member of the governing board, Chester E. Finn Jr., said it was necessary to move forward promptly and not to let the perfect become the "enemy of the good" for fear of sacrificing the "sense of urgency for national improvement."¹

The critics were right. The achievement levels have not led to better understanding. Instead, the public is confused about what expectations are appropriate. The achievement levels present a bleak portrait of what students know and can do and, like No Child Left Behind, create the expectation that all students ought to be proficient.

All definitions of education standards are subjective. People who set standards use their own judgment to decide what students ought to know and how well they should know it. People use their own judgment to decide the passing mark on a test. None of this is science. It is human judgment, subject to error and bias; the passing mark may go up or down, and the decision about what students should know in which grades may change, depending on who is making the decisions and whether they want the test to be hard or easy or just right. All of these are judgmental decisions, not science.

Here are definitions of NAEP's achievement levels:

"Advanced" represents a superior level of academic performance. In most subjects and grades, only 3-8 percent of students reach that level. I think of it as A+. Very few students in any grade or subject score "advanced."

"Proficient" represents solid achievement. The National Assessment Governing Board (NAGB) defines it as "solid academic performance for each grade assessed. This is a very high level of academic achievement. Students reaching this level have demonstrated competency over challenging subject matter, including subject matter knowledge, application of such knowledge to real-world situations, and analytical skills appropriate to the subject matter." From what I observed as a member of the NAGB who reviewed questions and results over a seven-year period, a student who is "proficient" earns a solid A and not less than a B+.

"Basic," as defined by the NAGB, is "partial mastery of prerequisite knowledge and skills that are fundamental for proficient work at each grade." In my view, the student who scores "basic" is probably a B or C student.

"Below basic" connotes students who have a weak grasp of the knowledge and skills that are being assessed. This student, again in my understanding, would be a D or below.

The film Waiting for "Superman" misinterpreted the NAEP achievement levels. Davis Guggenheim, the film's director and narrator, used the NAEP achievement levels to argue that American students were woefully undereducated. The film claimed that 70 percent of eighth-grade students could not read at grade level. That would be dreadful if it were true, but it is not. NAEP does not report grade levels (grade level describes a midpoint on the grading scale where half are above and half are below). Guggenheim assumed that students who were not "proficient" on the NAEP were "below grade level." That is wrong. Actually, 76 percent on NAEP are basic or above, and 24 percent are below basic. It would be good to reduce the proportion who are "below basic," but it is 24 percent, not the 70 percent that Guggenheim claimed.²

48 REIGN OF ERROR The Facts About Test Scores 49

Michelle Rhee, the former chancellor of the District of Columbia public schools, makes the same error in her promotional materials for her advocacy group called StudentsFirst. She created this organization after the mayor of Washington, D.C., was defeated and she resigned her post. StudentsFirst raised millions of dollars, which Rhee dedicated to a campaign to weaken teachers' unions, to eliminate teachers' due process rights, to promote charter schools and vouchers, and to fund candidates who agreed with her views. Her central assertion is that the nation's public schools are failing and in desperate shape. Her new organization claimed, "Every morning in America, as we send eager fourth graders off to school, ready to learn with their backpacks and lunch boxes, we are entrusting them to an education system that accepts the fact that only one in three of them can read at grade level." Like Guggenheim, she confuses "grade level" with "proficiency." The same page has a statement that is more accurate, saying, "Of all the 4th graders in the U.S., only 1/3 of them are able to read this page proficiently." That's closer to the NAEP definition, yet it is still a distortion, akin to saying it is disappointing that only 1/3 of the class earned an A. But to deepen the confusion, the clarifying statement is followed by "Let me repeat that. Only one in three U.S. fourth-graders can read at grade level. This is not okay." So, two out of three times, Rhee confuses "proficiency" (which is a solid A or B+ performance) with "grade level" (which means average performance).³

What are the facts? Two-thirds of American fourth graders were reading at or above basic in 2011; one-third were reading below basic. Thirty-four percent achieved "proficiency," which is solid academic performance, equivalent to an A. Three-quarters of American eighth graders were reading at or above basic in 2011; a quarter were reading below basic. Thirty-four percent achieved "proficiency," equivalent to a solid A. (See graph 5; graphs 5-41 appear in the appendix.)

Unfortunately, you can't generate a crisis atmosphere by telling the American public that there are large numbers of students who don't earn an A. They know that. That is common sense. Ideally, no one should be "below basic," but that lowest rating includes children who are English-language learners and children with a range of disabilities that might affect their scores. Only in the dreams of policy makers and legislators is there a world where all students reach "proficiency" and score an A. If everyone scored an A or not less than a B+, the reformers would be complaining about rampant grade inflation, and they would be right.

In recent years, reformers complained that student achievement has been flat for the past twenty years. They make this claim to justify their demand for radical, unproven strategies like privatization. After all, if we have spent more and more and achievement has declined or barely moved for two decades, then surely the public educational system is "broken" and "obsolete," and we must be ready to try anything at all. This is the foundational claim of the corporate reform movement.

But it is not true.

Let's look at the evidence.

NAEP has tested samples of students in the states and in the nation every other year since 1992 in reading and mathematics.

Here is what we know from NAEP data. There have been significant increases in both reading and mathematics, more in mathematics than in reading. The sharpest increases were registered in the years preceding the implementation of NCLB, from 2000 to 2003.⁴

Reading scores in fourth grade have improved slowly, steadily, and significantly since 1992 for almost every group of students. (See graph 6.)

The scale scores in reading show a flat line, but this is misleading. Every group of students saw gains, but the overall line looks flat because of an increase in the proportion of low-scoring students. This is known to statisticians as Simpson's paradox.⁵

The proportion of fourth-grade students who were proficient or advanced increased from 1992 to 2011. In 1992, 29 percent of students were proficient or above; in 2011, it was 34 percent.

The proportion of fourth-grade students who were "below basic" declined from 38 percent in 1992 to 32 percent in 2011.

The scores of white students, black students, Hispanic students, and Asian students in fourth grade were higher in 2011 than in 1992. The only group that saw a decline was American Indian
students.⁶ (See graphs 7, 8, 9, and 10, which show rising scores for whites, blacks, Hispanics, Asians, but not for American Indians.)

Reading scores in eighth grade have improved slowly, steadily, and significantly since 1992 for every group of students.

The proportion of eighth-grade students who were proficient or advanced increased from 1992 to 2011. In 1992, 29 percent of students were proficient or above; in 2011, it was 34 percent. (See graph 11.)

The proportion of eighth-grade students who were "below basic" declined from 31 percent in 1992 to 24 percent in 2011.

The scores of white students, black students, Hispanic students, Asian students, and American Indian students in eighth grade were higher in 2011 than in 1992. (See graphs 12, 13, 14, and 15.)

Reformers claim that reading has not improved over the past twenty years. It isn't true. NAEP is the only gauge of change over time, and it shows slow, steady, and significant increases. Students of all racial and ethnic groups are reading better now than they were in 1992. And that's a fact.

Mathematics scores in fourth grade have improved dramatically from 1992 to 2011.

The proportion of fourth-grade students who were proficient or advanced increased from 1990 to 2011. In 1990, 13 percent of students were proficient or above; in 2011, it was 40 percent. (See graph 2.)

The proportion of fourth-grade students who were "below basic" declined from 50 percent in 1990 to an astonishingly low 18 percent in 2011.

The proportion of eighth-grade students who were proficient or advanced increased from 1990 to 2011. In 1990, 15 percent were proficient or above; in 2011, it was 35 percent. (See graph 22.)

The proportion of eighth-grade students who were "below basic" declined from 48 percent in 1990 to 27 percent in 2011. (See graph 22.)

The scores of white students, black students, Hispanic students, Asian students, and American Indian students in eighth grade were higher in 2011 than in 1992. (See graphs 23, 24, 25, 26, and 27.)

As it happens, there is another version of NAEP that the federal government has administered since the early 1970s. The one I described before is known as the "main NAEP." It tests students in grades 4 and 8; scores on the main NAEP reach back to 1990 or 1992, depending on the subject. It is periodically revised and updated.

The alternative form of NAEP is called the "long-term trend assessment." It dates back to the early 1970s and tests students who are ages nine, thirteen, and seventeen (which roughly corresponds to grades 4, 8, and 12). The long-term trend NAEP contains large numbers of questions that have been used consistently for more than forty years. Unlike the main NAEP, the content of the long-term trend NAEP seldom changes, other than to remove obsolete terms like "S&H Green Stamps." The long-term trend NAEP is administered to scientific samples of students every four years.

Both the main NAEP and the long-term trend NAEP show steady increases in reading and mathematics. Neither shows declines. The long-term tests hardly ever change, so they provide a consistent yardstick over the past four decades.

Here are the changes in the long-term trend data in mathematics, from 1973 to 2008:⁷
The overall score does not reflect the large gains that were made over the past four decades, again because of Simpson's paradox. Each of the four major groups of students saw significant gains. (See graphs 28 and 29.)

White students over the past forty years show impressive gains: age nine, up 25 points; age thirteen, up 16 points; age seventeen, up 4 points.

Black students over the past forty years show remarkable gains: age nine, up 34 points; age thirteen, up 34 points; age seventeen, up 17 points.

Hispanic students also show remarkable gains: age nine, up 32 points; age thirteen, up 29 points; age seventeen, up 16 points.

On the main NAEP, from 1990 to 2011, here are the data for mathematics:

White students: fourth grade, up 29 points; eighth grade, up 23 points. (See graphs 16 and 23.)

Black students: fourth grade, up 36 points; eighth grade, up 25 points. (See graphs 16 and 23.)

Hispanic students: fourth grade, up 29 points; eighth grade, up 24 points. (See graphs 17 and 24.)

Asian students: fourth grade, up 31 points; eighth grade, up 28 points. (See graphs 18 and 25.)

In reading, the changes are less dramatic, but they are steady and significant.

On the long-term trend assessments, these were the changes in reading from 1971 to 2008:

4. Reading: Changes from 1973 to 2008
Age 9: up 14 points
Age 13: up 7 points
Age 17: up 4 points

White students: age nine, up 14 points; age thirteen, up 7 points; age seventeen, up 4 points.

Black students: age nine, up 34 points; age thirteen, up 25 points; age seventeen, up 28 points.

Hispanic students: age nine, up 25 points; age thirteen, up 10 points; age seventeen, up 17 points.

Compare this with gains on the main NAEP reading from 1992 to 2011:

White students: fourth grade, up 7 points; eighth grade, up 7 points. (See graphs 7 and 12.)

Black students: fourth grade, up 13 points; eighth grade, up 12 points. (See graphs 7 and 12.)

Hispanic students: fourth grade, up 9 points; eighth grade, up 11 points. (See graphs 8 and 13.)

Asian students: fourth grade, up 19 points; eighth grade, up 7 points. (See graphs 9 and 14.)

NAEP data show beyond question that test scores in reading and math have improved for almost every group of students over the past two decades: slowly and steadily in the case of reading, dramatically in the case of mathematics. Students know more and can do more in these two basic skills subjects now than they could twenty or forty years ago.

Why the difference between the two subjects? Reading is influenced to a larger extent by differences in home conditions than mathematics. Put another way, students learn language and vocabulary at home and in school; they learn mathematics in school. Students can improve their vocabulary and background knowledge by reading literature and history at school, but their starting point in reading is influenced more by home and family than in mathematics.

So the next time you hear someone say that the system is "broken," that American students aren't as well educated as they used to be, that our schools are failing, tell that person the facts. Test scores are rising. Of course, test scores are not the only way to measure education, but to the extent that they matter, they are improving. Our students have higher test scores in reading and mathematics than they did in the early 1970s or the early 1990s. Of course, we can do better. Students should be writing more and reading more and doing more science projects and
decades after 1990, when the federal tests were first offered; black student achievement was higher in 2009 than white student achievement in 1990. In addition, over this past generation there has been a remarkable decline in the proportion of African American and Hispanic students who register "below basic," the lowest possible academic rating on the NAEP tests.

If white achievement had stood still, the achievement gap would be closed by now, but of course white achievement has also improved, so the gap remains large.

In mathematics, over the past two decades, all students made dramatic progress. In 1990, 83 percent of black students in fourth grade scored "below basic," but that number fell to 34 percent in 2011. In eighth grade, 78 percent of black students were below basic in 1990, but by 2011 the proportion had dropped to 49 percent. Among Hispanic students, the proportion below basic in fourth grade fell from 67 percent to 28 percent; in eighth grade, that proportion declined from 66 percent to 39 percent. Among white students in fourth grade, the proportion below basic dropped in that time period from 41 percent to only 9 percent; in eighth grade, it declined from 40 percent to 16 percent. The proportion of fourth-grade Asian students below basic dropped from 38 percent in 1990 to 9 percent in 2011; in eighth grade, Asian students who were below basic declined from 36 percent to 14 percent. (See graphs 20 and 27.)

This is truly remarkable progress.

The changes in reading scores were not as dramatic as in math, but they nonetheless are impressive. In fourth-grade reading, the proportion of black students who were below basic in 1992 was 68 percent; by 2011, it was down to 51 percent. In eighth grade, the proportion of black students who were reading below basic was 55 percent; that had declined to 41 percent by 2011. Among fourth-grade white students, the proportion below basic declined from 29 percent to 22 percent in the same twenty-year period. Among fourth-grade Hispanic students, the proportion reading below basic dropped from 62 percent to 49 percent. Among eighth-grade Hispanic students, the proportion reading below basic declined from 51 percent to 36 percent. Among fourth-grade Asian students, the proportion below basic fell from 40 percent to 20 percent. In the eighth grade, it declined from 24 percent to 17 percent. (See graphs 30 and 31 for all racial, ethnic groups.)

Clearly, performance on NAEP is not flat. The gains in reading have been slow, steady, and significant. The gains in mathematics in both tested grades have been remarkable for whites, blacks, Hispanics, and Asians.

Despite these increases, the achievement gaps remain between white and black students and between white and Hispanic students because all groups are improving their scores. Asian students perform as well as white students in reading and better than white students in math. Reformers ignore these gains and castigate the public schools for the persistence of the gap.

Closing the racial achievement gap has been a major policy goal of education policy makers for at least the past decade. There has been some progress, but it has been slow and uneven. This is not surprising: it is hard to narrow or close the gap if all groups are improving.

There is nothing new about achievement gaps between different racial and ethnic groups and between children from families at different ends of the income distribution. Such gaps exist wherever there is inequality, not only in this country, but internationally. In every country, the students from the most advantaged families have higher test scores on average than students from the least advantaged families.¹

One of the major reasons for the passage of the No Child Left Behind law was the expectation that it would narrow, perhaps even close, the black-white and also the Hispanic-white achievement gaps. Policy makers and legislators believed in 2001, when NCLB was debated, that testing and accountability would suffice to close the gaps. Lawmakers believed that the combination of test-based accountability and transparency would produce the desired results. The very act of publishing the disparate results, they expected, would compel teachers to spend more time teaching the students who had low scores, especially if there were punitive consequences for not raising those scores. President George W. Bush staked his claim to being a "compassionate conservative" because, as he put it, he opposed "the soft bigotry of low expectations." If teachers were required by law to have high expectations for all students, the theory went, then all students would learn and meet high standards.

Now we know that, despite some gains, NCLB did not close the gaps. Paul Barton and Richard Coley of the Educational Testing Service wrote an overview of the black-white achievement gap over the
58 REIGN OF ERROR The Facts About the Achievement Gap 59
achievement gap, which has narrowed, the income achievement gap is growing. In fact, he found that the income achievement gap was nearly twice as large as the black-white achievement gap; the reverse was true fifty years earlier. The income achievement gap is already large when children start school, and according to the work of other researchers it "does not appear to grow (or narrow) appreciably as children progress through school." Reardon suggests that the income-based gap is growing in part because affluent families invest in their children's cognitive development, with tutoring, summer camp, computers, and other enriching experiences. He concludes that "family income is now nearly as strong as parental education in predicting children's achievement."⁵

Thomas B. Timar of the University of California reviewed the efforts to close the black-white achievement gap and the Hispanic-white achievement gap and concluded that while there had been progress, the overall situation was discouraging. Why was there so little progress? He wrote: "One reason is that although schools can be held accountable for some of the disadvantage these students experience, they have been given the entire responsibility for closing the achievement gap [emphasis mine]. Yet the gap is the symptom of larger social, economic and political problems that go far beyond the reach of the school . . . While schools are part of the solution, they alone cannot solve the problem of educational disparities."⁶

Another reason for the persistence of the gaps, Timar writes, is that policy makers have invested in strategies for thirty years that are "misdirected and ineffectual," managing to keep urban schools in a state of "policy spin," bouncing from one idea to another but never attaining the learning conditions or social capital that might make a difference.

Schools can't solve the problem alone, Timar acknowledges, as long as society ignores the high levels of poverty and racial isolation in which many of these youngsters live. He writes of children growing up in neighborhoods that experience high rates of crime and incarceration, violence, and stress-related disorders. In the current version of reform, fixing schools means more legislation, more mandates, and more regulations. What is missing from reform, he says, is an appreciation for the value of local and regional efforts, the small-scale programs that rely on local initiative for implementation. Without local initiative, reforms cannot succeed.

Of great importance in creating lasting change is social capital, Timar notes. This is the capital that grows because of relationships within the school and between the school and the community. Social capital is a necessary ingredient of reform, and it is built on a sense of community, organizational stability, and trust. Successful schools in distressed communities have stable leadership and a shared vision for change. They have "a sense of purpose, a coherent plan, and individuals with responsibility to coordinate and implement the plan. Teachers worked collaboratively to improve teaching and learning across the entire school curriculum . . . School improvement wasn't something done to them (like some sort of medical procedure), but a collaborative undertaking. Students also realized that the school's engagement in school improvement activities was meant for them, for their benefit."⁷

If we are serious about significantly narrowing the achievement gaps between black and white students, Hispanic and white students, and poor and affluent students, then we need to think in terms of long-term, comprehensive strategies. Those strategies must address the problems of poverty, unemployment, racial isolation, and mass incarceration. Income inequality in the United States, he points out, cannot be ignored, since it is greater now than at any time since the 1920s and more extreme than in any other advanced nation. But American politics has grown so politically conservative and unwilling to address structural issues that the chances of this happening are slim.

So we are left with the short-term strategies. Timar says that the strategies of "bureaucratizing the process of school improvement and turning it into a chase for higher test scores" have not worked. They have not made schools more stable, more coherent, and more professional. NCLB plus the Obama administration's Race to the Top have made schools less stable, encouraged staff turnover, promoted policy churn, and undermined professionalism.

Timar believes that the best hope for a school-based strategy for reducing the gaps lies in a grassroots model of change. He points to approaches like the Comer Process, developed by Dr. James Comer of Yale University, which engages the school community in meeting the emotional, psychological, social, and academic needs of students. What works best is not regulation and mandates but professional collaboration, community building, and cooperation. Such a scenario can happen only when those in the school have the authority to design their own improvement plans and act without waiting for instructions or permission from Washington or the state capital.

What we know from these scholars makes sense. The achievement
Critics say that the nation is more at risk than ever because American students are getting mediocre scores on international tests and falling behind other nations. If we don't have top scores soon, our nation will suffer grievously, our national security will falter, our economy will founder, and our future will be in jeopardy.

By now, this is a timeworn bugbear, but it still works, so the critics continue to employ it to alarm the public. In 1957, critics blamed the public schools when the Soviets were first to launch a space satellite, even though this feat was the work of a tiny scientific and technological elite. In 1983, critics blamed the public schools for the success of the Japanese automobile industry (overlooking the lack of foresight by leaders of the American automobile industry) and said the nation was "at risk." In 2012, critics asserted that the nation's public schools are "a very grave national security crisis," even though the nation has no significant international enemies.¹

Today, critics use data from international assessments to generate a crisis mentality, not to improve public schools but to undermine public confidence in them. To the extent that they accomplish this, the public will be more tolerant of efforts to dismantle public education and divert public funding to privately managed schools and for-profit vendors of instruction.