STUDENT TESTING
Current Extent and Expenditures, With Cost Estimates for a National Examination
148205

GAO/PEMD-93-8

Program Evaluation and Methodology Division

B-249895

January 13, 1993

The Honorable William D. Ford
Chairman, Committee on Education and Labor
House of Representatives

The Honorable William F. Goodling
Ranking Minority Member, Committee on Education and Labor
House of Representatives

The Honorable Dale E. Kildee
Chairman, Subcommittee on Elementary, Secondary, and Vocational Education
Committee on Education and Labor
House of Representatives

In 1991, this country began debating in earnest the proposition that the United States adopt a national examination system. It soon became apparent, however, that the debate lacked some key information about the present extent and cost of testing, as well as the likely cost of a national examination system. At your request, we attempted to develop that information and found that students do not seem to be overtested today. Systemwide testing (that is, the testing that is given to all students at any one grade level in a school district) took up about 7 hours for an average student in 1990-91 (half of that time in direct testing and half in related activity) and cost about $15 per student (including the cost of the test and staff time). We estimate that such testing cost about $516 million nationwide in that year, and that a national examination would cost about $160 million annually if based on multiple-choice testing or about $330 million annually if based on performance testing.
We are sending copies of this report to officials at the Department of Education and to others who are interested, and we will make copies available to others upon request. If you have any questions or would like additional information, please call me at 202-275-1864 or Robert L. York, Director of Program Evaluation in Human Services Areas, at 202-275-5886. Other major contributors to the report are listed in appendix V.

Eleanor Chelimsky
Assistant Comptroller General

Executive Summary

Purpose

Recent proposals from the federal executive branch and private groups have drawn unprecedented attention to the idea of a national examination for elementary and secondary students. The House Committee on Education and Labor asked GAO to look at school testing as it exists today, describe its nature, estimate its extent and cost, and assess how a new, national test might affect those factors.

Background

Most of the debate on expanded national testing has centered on major issues of what to test, how to test, and how to use the results. Not much attention has been given to date to the question of how much and what kind of testing there is now. Yet the likely success of future testing may be related to the size, nature, and cost of current efforts, about which there exist only wide-ranging, conflicting, and highly uncertain estimates. These range from 30 million to over 127 million standardized tests administered per year, at a cost of from $100 million to $915 million. The congressionally mandated National Council on Education Standards and Testing (NCEST) declined to provide a cost estimate in its report recommending a national testing system, and other estimates have ranged from a few million dollars a year up to $3 billion.

GAO wanted to obtain valid national data on at least all systemwide tests; that is, those given to all students at any one grade level in a school district. This excludes tests that only selected students take, such as individual teachers' exams, special education diagnostic tests, or college admissions exams. In the fall of 1991, GAO surveyed testing officials in all the state education agencies and in a random sample of U.S. school districts. The survey included questions about each test administered and about the testing officials' views on the balance of costs and benefits in their current testing effort, on trends in the field, and on the idea of a national test. GAO received completed questionnaires from 74 percent of the local districts in the national sample and from 48 of the 50 states. The results are generalizable nationwide.

Results in Brief

In 1990-91, U.S. students do not seem to have been overtested. Systemwide testing took up about 7 hours per year for an average student (half in direct testing and half in related activity) and cost about $15 per student, including the cost of the test and staff time. The typical test was the familiar, commercially developed four- or five-subject multiple-choice exam. The less common performance-based tests, in which students write out some answers, cost more (an average of about $20 per student) but were considered by some testing officials to be an improvement and a preferable direction for further development. GAO estimates the overall cost of systemwide testing in 1990-91 at $516 million.

Three models are commonly discussed for future national testing, including (1) a single national multiple-choice test, (2) a single national performance-based test, and (3) a decentralized system of clusters of states, each cluster using different performance-based tests. GAO estimated that none of these would cost as much as the multi-billion-dollar estimates that some have put forth. The first option would be least expensive ($160 million per year). The third (clusters), the one advocated by NCEST, would likely cost about $330 million per year after about $100 million in start-up development costs, and the costs could be expected to decline over time. Any choice among the three options would involve trade-offs. For example, the least expensive multiple-choice test would be familiar and provide the most comparable data, but would be the most duplicative and might not be as valued by many state and local testing officials. Clusters of performance tests would cost more and would not necessarily be comparable, but may be better linked to local teaching and would be viewed more favorably by many testing officials.
Those officials responding to GAO's survey did not oppose more tests, but expressed concerns over the purpose, quality, and locus of control over the content and administration of further tests. They preferred tests of high technical quality that would be useful for diagnosing problems at the state or local level. However, many respondents expressed opposition to the general idea of a national test.

Principal Findings

The Current Extent and Nature of School Testing

Though the average student spent only 7 hours annually on systemwide testing, GAO found wide variation and totals as high as 30 hours a year. A majority of systemwide testing was state-mandated, with state education agencies developing most of these tests, usually in conjunction with test development contractors. Almost 60 percent of the tests used were commercially available, with achievement tests from three publishers accounting for 43 percent of all systemwide tests. Testing remained traditional in format, with 71 percent of all tests including only multiple-choice questions.


GAO's survey showed that new approaches to testing are finding limited acceptance. By 1990-91, performance-based tests (with the exception of fairly common tests asking for a writing sample) were in use in only seven states or in specialized applications such as readiness tests for very young students. However, these seven states, and several others that have developed high-quality multiple-choice tests, have developed fairly sophisticated testing programs and have gained an expertise in test development that could be useful to the development of a national examination system. Most of these states, moreover, employed local teachers and administrators in test development and scoring and reported that their involvement facilitated acceptance of the test and the alignment of the test to the subject matter that teachers actually teach.

The Current Cost of Testing

The $15 per-student average cost of testing included $4 in purchase costs and over $10 in state and local staff time, but costs varied for different types of tests. In a subset of states where GAO obtained the best comparative data, multiple-choice tests averaged less than half the cost of performance-based tests ($16 versus $33, respectively).

In budgetary terms, testing rarely accounted for more than 1 percent of school district budgets, averaging about one-half of 1 percent. State testing programs averaged less than 2 percent of state education agency budgets. For only three tests in the country did state costs average more than district costs.

The Future Cost and Extent of Testing

GAO estimates that a national test modeled on the common multiple-choice tests, if taken by 10 million students a year, would cost about $160 million; a national performance-based test similar to those now developed in several states would cost $330 million per year, or almost two thirds of the $516 million GAO estimates is now spent on systemwide testing. Start-up development costs could add another $100 million.
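
The arithmetic behind these totals can be checked with a short sketch. It assumes that each annual total is simply a per-student cost multiplied by the 10 million students tested, and it borrows the $16 and $33 per-student averages reported for multiple-choice and performance-based tests in the comparison states. That multiplication is an illustrative reading, not a method the summary states, and the function and variable names are hypothetical.

```python
# Illustrative check of the national test cost figures quoted in the text.
# Assumption (not stated in the source): annual cost = per-student cost x students tested.
STUDENTS_TESTED = 10_000_000            # students per year, from the text
COST_PER_STUDENT = {                    # dollars per student, averages quoted in the text
    "multiple_choice": 16,
    "performance_based": 33,
}
CURRENT_SYSTEMWIDE_COST = 516_000_000   # dollars, GAO estimate of current spending


def annual_cost(test_type: str) -> int:
    """Return the estimated annual cost in dollars for one test type."""
    return COST_PER_STUDENT[test_type] * STUDENTS_TESTED


def share_of_current_spending(test_type: str) -> float:
    """Return the annual cost as a fraction of current systemwide spending."""
    return annual_cost(test_type) / CURRENT_SYSTEMWIDE_COST


if __name__ == "__main__":
    for test_type in COST_PER_STUDENT:
        cost = annual_cost(test_type)
        share = share_of_current_spending(test_type)
        print(f"{test_type}: ${cost / 1e6:.0f} million, {share:.0%} of current spending")
```

Under these assumptions the two options come to $160 million and $330 million, and the performance-based option works out to a little under two thirds of current spending, which is consistent with the figures above.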

But GAO found new costs would vary depending on the plan. Looking at decisions made in school districts that in the past faced a choice between an old test and a new state-mandated test, GAO found that 82 percent dropped the old test when the state test largely duplicated it, but were much more likely to use both if the tests differed in purpose or coverage. If the same pattern held true in response to a national test, a national multiple-choice test would cost the districts only $42 million more and 15 minutes per student in new costs, all from additional testing in 26 percent of U.S. school districts. The other 74 percent of districts would simply drop a current test, replacing it with the national test. Because many fewer districts use such tests now, a national performance-based test would add more new costs in money and time: $209 million and 30 minutes per student.
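
One way to read the $42 million figure is as the gross cost of a national multiple-choice test scaled by the share of districts that would add it rather than substitute it for a current test. The sketch below assumes that simple proportional model, which is an illustration and not a method stated in the summary; the function names are hypothetical, and the share it backs out for the performance-based test is a derived value, not a figure from the report.

```python
def net_new_cost(gross_cost_millions: float, share_adding: float) -> float:
    """New spending if only districts that add (not replace) a test incur new cost."""
    if not 0.0 <= share_adding <= 1.0:
        raise ValueError("share_adding must be between 0 and 1")
    return gross_cost_millions * share_adding


def implied_share_adding(net_cost_millions: float, gross_cost_millions: float) -> float:
    """Share of the gross cost that is new spending under the same proportional model."""
    if gross_cost_millions <= 0:
        raise ValueError("gross_cost_millions must be positive")
    return net_cost_millions / gross_cost_millions


if __name__ == "__main__":
    # Multiple-choice test: $160 million gross, 26 percent of districts adding a test.
    print(f"multiple choice, new cost: ${net_new_cost(160, 0.26):.1f} million")
    # Performance-based test: $330 million gross, $209 million in new costs.
    print(f"performance based, implied share: {implied_share_adding(209, 330):.0%}")
```

Under this model, $160 million times 26 percent is about $42 million, matching the text, while the $209 million figure would correspond to roughly 63 percent of the gross cost of a performance-based test being new spending.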

Testing Officials' Views on Present and Future Testing

Seventy-five percent of state testing officials and 43 percent of local testing officials considered the net benefits of their present testing programs to be positive, and most believed that these benefits would continue or even increase if more tests were added. Majorities mentioned performance-based testing as a positive trend and confirmed a trend away from norm-referenced multiple-choice tests toward tests with a higher degree of curriculum alignment. Less than half the states had a curriculum that their districts were obliged to follow, however, while 10 states had unrequired curricula.

The survey revealed significant opposition to the concept of a national examination system. Forty percent of local respondents and 29 percent of state respondents saw no advantages to a national system, and they forecast some disadvantages, particularly a potential for misuse of test results. Thirty-two percent of local respondents and 63 percent of state respondents, however, specifically cited the potential for comparing test scores nationally as an advantage of a national testing system. When asked under what conditions they would decide to use a voluntary national test, they rated most important whether or not the test was of high technical quality, useful to their needs, and not costly to them.

Matters for Congressional Consideration

GAO believes that if a decision is made to implement a national examination system, the Congress may wish to ensure the involvement of local teachers and administrators in test development and scoring and of state testing officials in planning and implementation. This should build support and improve the likelihood of success, as state and local educators will probably play a considerable role in the administration of any national test.

If the Congress wishes to encourage the development of a well-accepted and widely used national examination system, it should also consider means for ensuring the technical quality of the tests. Test quality will require an enduring commitment and sufficient resources.


Contents

Executive Summary

Chapter 1: Introduction
    National Test Proposals
    The Debate Over National Testing
    Objectives
    Scope
    Methodology
    Organization of the Report

Chapter 2: The Current Extent and Nature of School Testing
    Amount of Time Spent in Testing
    Types of Tests
    Sources of Tests
    Test Design
    Purposes of Tests
    Relationship of Tests to Curriculum
    Trends in State Testing Programs
    Summary

Chapter 3: The Current Cost of Testing
    Dollar and Time Costs of Tests
    Differences in State and Local Costs
    Multiple-Choice and Performance-Based Test Costs
    Economies in Testing
    Test Development Costs
    Summary

Chapter 4: The Future Cost and Extent of Testing
    Cost of a National Examination System
    The Effect of Adding a New Test
    Three Alternatives for a National Examination System
    Possible Responses to Each Plan
    Incidental Costs and Benefits of Adding a National Test
    Summary

Chapter 5: Testing Officials' Views on the Benefits and Costs of Present and Future Testing
    Benefits and Costs of Current Tests
    The Future of Testing
    Reaction to a National Examination
    A Trade-Off Between Test Quality and Cost
    Summary

Chapter 6: Conclusions and Matters for Consideration
    Conclusions
    Matters for Congressional Consideration

Appendixes
    Appendix I: Sample Survey: Statistical Analysis
    Appendix II: Marginal Effect of Proposed Testing Over Current Testing
    Appendix III: The Extent and Cost of Other Standardized Testing
    Appendix IV: Other Estimates of the Extent and Cost of Testing
    Appendix V: Major Contributors to This Report
    Glossary
    Bibliography

Tables
    Table 1.1: Sources We Used to Answer Evaluation Questions
    Table 2.1: Comparison of Performance-Based Tests With All Tests
    Table 3.1: Factors Related to Higher and Lower Testing Costs per Student
    Table 4.1: Projected Costs of National Testing Options
    Table 4.2: School District Responses to State Testing Mandates
    Table 4.3: School District Responses to Three National Test Alternatives
    Table 4.4: Incidental Costs and Benefits of Proposed Tests
    Table 5.1: Positive and Negative Trends in Testing
    Table 5.2: Advantages and Disadvantages of a National Examination System
    Table 6.1: Evaluating the Three National Test Alternatives
    Table I.1: Chi-Square Tests Comparing Survey Response Rates Among Respondent Groups
    Table I.2: 95-Percent Confidence Intervals for Key Variables

Figures
    Figure 2.1: Types of Tests
    Figure 2.2: Sources of Commercially Developed Tests
    Figure 2.3: Test Design Features
    Figure 3.1: Per-Student Costs of Two Test Types in States Having Both
    Figure 3.2: Economies of Scale in State Performance Testing
    Figure 3.3: Economies of Scope in State Performance Testing
    Figure II.1: Degree of Overlap With a Single National Multiple-Choice Test
    Figure II.2: Degree of Overlap With a Single National Performance-Based Test
    Figure II.3: Degree of Overlap With a Cluster System

Abbreviations
    ETS     Educational Testing Service
    IQ      Intelligence quotient
    NAEP    National Assessment of Educational Progress
    NCEST   National Council on Education Standards and Testing
    NCTPP   National Commission on Testing and Public Policy
    OTA     Office of Technology Assessment
    SAT     Stanford Achievement Test or Scholastic Aptitude Test
    UCLA    University of California at Los Angeles

Chapter 1
Introduction

In the summer of 1991, the country was debating in earnest the proposition that the United States adopt a national examination for elementary and secondary school students. Several proposals with some measure of detail were put forth by various policy-oriented groups. One major stimulus for the proposals was the continuing interest in measuring progress toward the national educational goals that emerged from the September 1989 education summit meeting between the state governors and the President at Charlottesville, Va.

National Test Proposals

Since 1990, many in the field have offered a variety of proposals for some type of national testing. These include:

• the American Achievement Tests segment of President Bush's America 2000 education strategy;
• innovative performance-based tests urged by a coalition of university researchers working on the New Standards project;
• a single national multiple-choice test advocated by Educate America, an ad hoc group;
• work-related skill tests recommended by the Secretary of Labor's Commission on Achievement of Necessary Skills; and
• a variety of state tests merged into a national system, proposed by the congressionally mandated National Council on Education Standards and Testing. (The Council's ideas are discussed in more detail in chapter 4.)1

The stated principal objective of each of the test proposals was to encourage better teaching and more learning; in short, to raise education standards, and in turn, to improve the economic competitiveness of the nation.

The Debate Over National Testing

Advocates for national testing argued that to compete in a technologically advanced world, American students must achieve higher levels of knowledge and skills. Some argued that the new tests should improve academic achievement by driving instructional practices and curricula to be more focused and challenging than they have been. Some argued, further, that the new examinations should facilitate comparisons across all states and school districts, calling attention to the most successful and deficient education programs.

1 See, for example, Education Commission of the States, "National Efforts," State Education Leader, 11:1 (spring 1992), pp. 6-10.

Critics agreed that a national test would focus instructional practices and curricula, but thought this would be harmful if the focus was too narrow and some skills and subject matter lost out. Some critics argued, further, that interpretation of academic results shown in the tests should be balanced with information on students and schools because students come to school from widely different backgrounds and attend schools with widely different levels of resources.

Other arguments revolved around format and administration. Should test questions have a multiple-choice or performance-based format? Should they be based on a national curriculum, and if so, should the curriculum be developed first or simultaneously? Should a national body develop and administer the test, or should both tasks be left to the states to coordinate by some sort of compact among them?2

2 Many of these same issues are also addressed in our analysis of setting and measuring standards for student achievement to be used with the National Assessment of Educational Progress, the subject of a forthcoming report.

Early in the debate over national testing, decisionmakers saw that they lacked some key information. What was the current extent and cost (in both time and dollars) of testing in the schools, and how much would a national examination cost? Some opponents of a national exam asserted that American students and teachers were already overburdened with standardized tests.3 Other opponents asserted that a national examination would be prohibitively expensive; they often based their estimates on the cost of one particular test series.4

3 They often referred to a report of the National Commission on Testing and Public Policy, From Gatekeeper to Gateway: Transforming Testing in America (Boston: Boston College, 1990), pp. 14-18.

4 This refers to the Advanced Placement examinations of the Educational Testing Service, which are expensive for several reasons: each subject-area exam is administered separately, to different populations, in highly secure conditions, and test scorers are flown in from many states to central sites.

Estimates of the extent of standardized testing in the United States ranged from 30 million to over 127 million tests administered annually. Similarly, estimates of the current annual cost of standardized testing ranged from $100 million to $915 million.5 We discuss some estimates of the extent and cost of testing in appendix IV. Estimates of the cost of a new national test varied widely, too. Our survey of the literature revealed seven thoughtful estimates that ranged from several million dollars annually for a multiple-choice test like the Armed Services Vocational Aptitude Battery to $3 billion a year plus $10 billion in development costs for a system of performance-based exams similar to the Advanced Placement series.

5 The low estimates of number of tests and costs come from Douglas J. McRae, "TOPIC: Too Much Testing?," press release, Monterey, Calif.: CTB Macmillan/McGraw-Hill, Nov. 16, 1990; the high estimates are from the National Commission on Testing and Public Policy, From Gatekeeper to Gateway (Boston: 1990).

Objectives

To obtain more reliable estimates, the House Committee on Education and Labor and its Subcommittee on Elementary, Secondary, and Vocational Education asked us to examine the present extent and cost of testing in the United States. Specifically, we addressed the following questions in our study:

• What is the nature of current standardized school testing, and what is its extent, including tests initiated by local school districts as well as by states?
• What are the costs of these tests?
• How would new national tests affect those factors, and is there any overlap between current assessments and those being proposed?
• How do testing officials view the costs and benefits of present and future testing?

Scope

We restricted the domain of tests to include only "systemwide" tests; that is, those administered to every student, to almost every student, or to a representative sample of all students in at least one grade level in a district or state. Since we intended to use questionnaires as our primary source of data, we realized it was impossible to ask about all tests, or even all standardized tests, because the reporting burden would have been too great and our response rate would have decreased in consequence. The domain of systemwide tests includes all standardized tests except those administered to special populations, such as special education and gifted and talented students; optional tests, such as college entry exams; and many tests used for Chapter 1 evaluation.6 Thus, the set of systemwide tests seemed the most appropriate for our study, since it consists of the tests most like the national tests proposed for all students. We attempt to account for the extent of other standardized testing in appendix III.

6 Tests used for federal Chapter 1 program evaluation would only be included in our survey data if the tests were administered at all schools in a school district.

We defined "costs" by its two relevant components. Purchase costs (dollar costs) represent the first cost component of testing: money spent on test-related goods or services purchased at set prices. The test forms and booklets used with a standardized test are purchased from test companies at a contracted price, for example. Likewise, the scoring of machine-readable forms is a service purchased at a contracted price. Time spent by education personnel represents the second cost component of testing; that is, the amount of time spent in all the test-related activities of developing, administering, preparing for, taking, grading, and interpreting tests by all the parties involved (teachers, administrators, clerical staff, and others). Some of this time is explicitly paid for, and we gathered information on its cost. This cost can also be indirect, or in-kind, if it is not paid for.
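
The two cost components can be illustrated with a small sketch. It assumes that staff time, whether paid or in-kind, is valued at a single hourly rate and added to purchase costs; that valuation rule, the class and function names, and all the example figures are hypothetical and are not taken from the survey.

```python
from dataclasses import dataclass


@dataclass
class SystemwideTestCost:
    """Costs of one systemwide test in one district (all values hypothetical)."""
    purchase_dollars: float     # goods and services bought at set prices
    paid_staff_hours: float     # staff time that is explicitly paid for
    in_kind_staff_hours: float  # staff time that is not paid for (indirect cost)
    hourly_rate: float          # assumed dollar value of one staff hour

    def time_cost(self) -> float:
        """Dollar value of all staff time, paid and in-kind."""
        return (self.paid_staff_hours + self.in_kind_staff_hours) * self.hourly_rate

    def total_cost(self) -> float:
        """Purchase costs plus the dollar value of staff time."""
        return self.purchase_dollars + self.time_cost()


def cost_per_student(cost: SystemwideTestCost, students_tested: int) -> float:
    """Average total cost per student tested."""
    if students_tested <= 0:
        raise ValueError("students_tested must be positive")
    return cost.total_cost() / students_tested


if __name__ == "__main__":
    # Made-up district: 1,000 students, $4,000 in purchases, 500 staff hours at $22.
    example = SystemwideTestCost(4000.0, 300.0, 200.0, 22.0)
    print(f"per-student cost: ${cost_per_student(example, 1000):.2f}")
```

Keeping paid and in-kind hours as separate fields makes it possible to report the explicitly paid portion on its own, which mirrors the distinction drawn in the paragraph above.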
In general, any one test should not necessarily be preferred over another simply because it is less expensive, as it may also be less beneficial. We did not attempt to make a quantitative estimate of tests' benefits, as they do not lend themselves to precise measurement. But we did ask knowledgeable officials to give us their assessment of the benefits of testing relative to the costs.

We did not limit the type of test or test format we asked about. Many advocates of a national examination system proposed that it employ some of the newer testing techniques, such as performance-based formats (in which students must write, perform a laboratory experiment, or in some other way do more than simply answer a multiple-choice question), since many experts consider such tests of higher quality. So we also sought information pertaining to their cost and extent of use. We buttressed our cost estimates for performance testing with figures obtained from our interviews with education officials in two Canadian provinces that employ performance tests.7 With adequate data on the cost and extent of most types of current tests, we also planned to estimate both the expense of any proposed national test that would be similar to current tests and its overlap with current testing.

Methodology

Surveys on State and Local Testing

We gathered the primary data to answer the four evaluation questions through surveys of state testing officials and local school district administrators. (Table 1.1 shows the data sources we used for each of the four evaluation questions.)

7 These interviews were conducted as part of a related study. A detailed discussion of Canadian provinces' experience with school testing will appear in a forthcoming report.

Table 1.1: Sources We Used to Answer Evaluation Questions

Question                                    Data source
------------------------------------------  ------------------------------------------
Current nature and extent of school         Surveys of all states and a sample of
testing                                     districts; data on each test

Cost of testing                             Surveys, data on each test, plus interviews
                                            with officials of testing firms, interviews
                                            with Canadian officials

Potential effects of new national tests     Surveys, case studies of 50 districts, our
on nature, extent, and cost of testing,     analysis of national test proposals, plus
and overlap between current and proposed    interviews with officials of testing firms
tests

Testing officials' views on the costs and   Surveys
benefits of current and future testing

Duri n g Jul y and August 1991, 10 state and l o cal testi n g offi c i a l s
from four
states revi e wed
earl y versi o ns
of our questi o nnai r es
and suggested
revi s i o ns.*
We then pretested
the revi s ed
versi o ns
wi t h four l o cal school
di s tri c t
offi c i a l s
and one state testi n g di r ector
i n Maryl a nd
and Vi r gi n i a .
The four l o cal di s tri c ts
represented
smal l , l a rge, and very l a rge student
popul a ti o ns
and urban, suburban,
and rural areas9
We desi g ned
two questi o nnai r es
for our state or l o cal respondents.
The
fi r st requested
general
i n formati o n
about the state or di s tri c t
and the
respondents
vi e ws on general
testi n g i s sues, The second
requested
i n formati o n
about each systemwi d e
test, parti c ul a rl y
detai l e d
i n formati o n
on ti m e and dol l a r
expendi t ures.
Respondents
were to fil out a separate
questi o nnai r e
for each test. In hopes of i n creasi n g
the wi l i n gness
of
offi c i a l s
to respond,
wi t h the agreement
of the congressi o nal
requesters
we promi s ed
not to i d enti f y
any speci f i c
state or school
di s tri c t.
In September 1991, we sent the questionnaires to all 50 state testing directors and to 663 local public school administrators (director of testing or superintendent) in school districts containing more than 50 students. To achieve a high response rate, we sent the survey twice, if necessary, and then sent two postcard reminders. We telephoned many of the respondents who returned incomplete questionnaires in order to fill in missing information.

8 Those four states were California, Maryland, North Carolina, and Virginia.
9 We classified district size as small (from 61 to 3,600 students), large (from 3,600 to 35,000 students), and very large (over 36,000 students).

Of the 663 local districts that received questionnaires, 16 either had fewer than 50 students, were defunct (usually through merging with another district), or were unable to respond, giving us a total sample size of 648 local school districts. Of those districts, 500 formed a nationally representative sample we had designed to produce generalizable estimates for the United States.10 We received 368 completed questionnaires from this group, for a 74-percent response rate. We received completed questionnaires from 48 of the 50 states. The two remaining states did not administer statewide tests in 1990-91. Appendix I contains our analysis of the sample survey response rates among different respondent groups.

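The response rates quoted above can be checked with a short calculation. This is only a sketch: the function name `response_rate` is ours, and the inputs are the figures given in this chapter (368 completed questionnaires from the 500-district national sample, and completed questionnaires from 48 of the 50 states).

```python
def response_rate(completed: int, sampled: int) -> float:
    """Return completed questionnaires as a percentage of the units sampled."""
    if sampled <= 0:
        raise ValueError("sampled must be positive")
    return 100.0 * completed / sampled


# Figures quoted in the text of this chapter.
district_rate = response_rate(368, 500)
state_rate = response_rate(48, 50)

print(f"District response rate: {district_rate:.1f}% (rounds to {round(district_rate)} percent)")
print(f"State response rate: {state_rate:.0f}%")
```

Rounding the district figure to a whole percentage is what appears to yield the 74-percent rate reported in the text.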
We searched for published information on variations in testing programs among local school districts so that we could target our surveys to cover the different situations. We found no useful information aside from the types of state-mandated tests. We therefore designed our stratified sample using some school district characteristics that were available to us--district size, metropolitan status (urban, suburban, or rural), and type of state test--that we thought to be related to the level and cost of testing.11

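The stratified design described above can be illustrated with a short sketch. The report does not say how the sample was allocated across strata, so the proportional allocation below, the function name `stratified_sample`, and the made-up sampling frame are all assumptions for illustration only. The three stratifying characteristics and their category labels follow the text and footnotes of this chapter.

```python
import random
from collections import defaultdict


def stratified_sample(districts, strata_keys, total_n, seed=0):
    """Draw a proportionally allocated stratified sample (illustrative only).

    districts   -- list of dicts, one per district
    strata_keys -- dict keys that jointly define a stratum
    total_n     -- approximate overall sample size
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for district in districts:
        strata[tuple(district[key] for key in strata_keys)].append(district)

    sample = []
    for members in strata.values():
        # Proportional allocation, with at least one district per stratum.
        n = max(1, round(total_n * len(members) / len(districts)))
        sample.extend(rng.sample(members, min(n, len(members))))
    return sample


# Hypothetical sampling frame of 1,000 made-up districts.
SIZES = ["small", "large", "very large"]
METRO = ["urban", "suburban", "rural"]
STATE_TEST = ["none", "norm-referenced", "criterion-referenced", "performance-based"]

frame_rng = random.Random(1)
frame = [
    {
        "id": i,
        "size": frame_rng.choice(SIZES),
        "metro": frame_rng.choice(METRO),
        "state_test": frame_rng.choice(STATE_TEST),
    }
    for i in range(1000)
]

sample = stratified_sample(frame, ["size", "metro", "state_test"], total_n=100)
strata_count = len({(d["size"], d["metro"], d["state_test"]) for d in frame})
print(f"{len(sample)} districts sampled across {strata_count} strata")
```

Guaranteeing at least one district per stratum is one simple way to keep small groups represented; deliberately drawing extra districts from selected strata, as in the oversampling the report describes, would replace the proportional rule for those strata.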
In addition to surveying the 500 school districts that formed our national sample, we oversampled in certain states that were using performance-based formats in state-specific and state-managed tests.12 We attempted to get more responses from district officials in these states because there are few data available elsewhere on the implementation of these techniques and their administration costs. Oversampling allowed for more precise estimates of the cost of performance-based tests.

In summary, the surveys provided direct answers to the first two questions concerning the nature, extent, and costs of current testing programs. We also gathered data on test development and its costs from state testing officials and representatives of commercial testing firms. The surveys were also useful in answering the third question--concerning the overlap between current and proposed tests--by providing information on school district reactions in the past when they faced new state test mandates and had to choose between simply adding another test to their programs or dropping a current test in favor of the new state test. The surveys also provided direct answers to the fourth question regarding testing officials' views on the costs and benefits of present and future testing.

10 The remainder of the 648 local school districts (148 districts were not included in the nationally representative sample) were used to oversample in certain categories of school districts, including those in states that we knew employed statewide performance-based tests.
11 We classified state tests into four types: no state test, only norm-referenced multiple-choice test(s), at least one criterion-referenced multiple-choice test, and at least one performance-based test.
12 These states included Arizona, Connecticut, Maine, Maryland, Massachusetts, New York, and Vermont. We oversampled districts in Arizona and Vermont to obtain data on the implementation of portfolio assessments. That effort proved unsuccessful, as most of the respondents in those states did not consider portfolios to be tests and thus did not complete surveys about this activity.

Case Studies of the Effect of Testing Mandates

To find further information on the third question, concerning the possible effects of new national tests on the nature, extent, and cost of current testing programs, we made a separate study of past instances where states mandated that their districts administer new tests. Given new national tests, school districts would face the choice of adding another test to their lineup, discarding an existing test in favor of a national test, or rejecting the national test in favor of the existing tests.

Before the late 1970s, few states mandated statewide tests. By the end of the 1980s, the situation reversed so that only a few states did not do so. From our survey responses, we identified about 200 school districts that had been administering tests of similar subjects and purposes at the time their states imposed statewide tests. Some of these districts kept their old tests; some discarded them. We asked all of them in the main survey how similar in purpose or content the new test was to the old, and specifically, if they dropped the old test and, if so, why. We interviewed officials by telephone in a systematic sample of 50 of these districts to learn more about how they made their decisions and how the new tests affected the level and costs of their testing programs.

Study Strengths and Limitations

Our study has four main strengths. First, it is unique, since no other up-to-date information on current testing is available, and new test costs have typically been crudely estimated. Second, our findings are comprehensive, covering the entire country, close to the full population of states and a representative sample of local school districts. Third, with the stratified sample design, we can make stronger estimates of the extent and cost of testing in the United States than we otherwise could. Fourth, we oversampled in certain groups that otherwise would not have been well represented, such as very large school districts and districts in states with statewide performance-based tests.

Of course, the study has some limitations, too. It covers 1 year only--the 1990-91 school year--and we only surveyed public schools. We did not gather information on all testing, or even on all standardized testing, nor did we make first-hand observations to check survey answers, and so any effort on our part to portray the total testing burden on elementary and secondary students (or its cost) from the estimates our respondents gave can only be roughly approximate. We were not able to collect much data pertaining to assessment methods that testing officials often do not consider to be tests, the most prominent example being student portfolios.

Some useful data were beyond our ability to collect, such as the views of parents, advocacy groups, and students on the costs, burdens, and benefits of testing or the merits of expanded national tests. Similarly, we could not gather first-hand data on key topics such as how tests are used or how they affect instruction; the views of our survey respondents give only some aspects of these matters, and from a particular point of view. Finally, because we asked respondents in fall 1991 their general views on national tests, the answers do not reflect the specifics of any proposals made since then.

Organization of the Report

Chapter 2 addresses the first of our four questions, with estimates of the current nature and extent of systemwide testing in the United States and the relative prominence of different types of tests. Chapter 3 answers the second question, on the current costs of systemwide testing in the United States, in time and in dollars, and for different types of tests. Chapter 4 responds to the third question with information on the possible effects on district testing programs of adding a new test and the conditions under which districts would adopt new tests. Chapter 5 answers the fourth question, providing the views of our respondents on the benefits of their testing programs, and it includes additional views on trends in testing, a national examination system, and current levels and costs of testing programs. Chapter 6 proposes some matters for congressional consideration raised by this study.


Chapter 2
The Current Extent and Nature of School Testing

Many claims have been made concerning the number of tests students in the United States are required to take and the amount of time they spend taking them. On the one hand, some state-level education reformers of the past 2 decades thought that there was too little testing to ensure accountability for education expenditures, and they successfully urged expanded statewide testing. On the other hand, some testing critics have asserted that U.S. students take the most tests of any students. In the discussions on national testing, observers reacted to new testing proposals, in part, based on their view of the extent of testing at the time.

This chapter presents our national survey data, allowing us to describe the extent and nature of systemwide school testing for the year 1990-91, including the amount of time students spent in testing, the types of tests they took, who designed these tests, what designs they used, and for which purposes they intended to use them. We also present information on how testing officials see tests linked to curriculum.

Amount of Time Spent in Testing

For the systemwide tests we looked at, the burden of testing on U.S. students was modest in 1990-91. On average, students spent less than 4 hours each taking systemwide tests, or less than one-half of 1 percent of a student's school year. Counting all the time devoted to test-related activities, such as learning test-taking skills or listening to test instructions or results, the mean time burden still averaged less than 7 hours for the year (with the median at less than 6 hours).

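The "less than one-half of 1 percent" figure can be reproduced with a rough calculation. This is a sketch under stated assumptions: the 180-day school year is our assumption and does not come from the report, the 6-hour school day is borrowed from the report's own reference to 6-hour school days, and 3.4 hours is the exact mean given in this chapter's footnotes.

```python
SCHOOL_DAYS_PER_YEAR = 180  # assumption, not stated in the report
HOURS_PER_SCHOOL_DAY = 6    # the report elsewhere refers to 6-hour school days


def share_of_school_year(hours: float) -> float:
    """Return the given hours as a percentage of total school hours in a year."""
    total_hours = SCHOOL_DAYS_PER_YEAR * HOURS_PER_SCHOOL_DAY
    return 100.0 * hours / total_hours


test_taking = share_of_school_year(3.4)   # mean hours spent taking systemwide tests
all_activity = share_of_school_year(7.0)  # upper bound on mean test-related hours

print(f"Test taking: {test_taking:.2f}% of the school year")
print(f"All test-related activity: under {all_activity:.2f}% of the school year")
```

Under these assumptions the test-taking share stays below one-half of 1 percent, and even the broader test-related total stays below 1 percent; a shorter assumed school year would raise both shares somewhat.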
There was a wide range to this time burden, however. One school district gave no systemwide tests at all in 1990-91; six districts in our sample administered 10 or more. Eighty-five percent of school districts administered one to three tests (the mean was 2.5; the median, 2 tests a year). Five states required no statewide tests, while one of them required four. The mean number of tests among all states was 1.7; the median was 1 test.

At the high end of the range of testing effort, we found several districts administered over 27 hours of systemwide tests in 1990-91.2 Counting student time devoted to all test-related activities (including preparation, listening to results, and the like), several districts claimed over 100 hours. When we divided each district's total hours by the number of students in the district, we found districts have made a wide range of choices of how much time to devote to testing: from zero to over 13 hours for the average student just writing exams in 1990-91 and from zero to over 40 hours altogether (or a full week of 6-hour school days) in test-related activity.

1 The exact number is 3.4 hours. This statistic represents the mean for all U.S. students; the median was 3 hours per student.
2 This is the sum of the hours required to administer all of the district's tests. No individual student in 1 year would have taken all of them, owing to the common practice (as we describe) of scattering systemwide tests across the grades.

The most time-consuming test was a certain state test that covered four subject areas. Passing this test was required for graduation. Because the stakes were high, students took the test during a 3-day period with virtually no time constraint on each section of the test. (The official who completed our survey estimated 18 hours for the full test.) Also because the stakes were high, some districts in the state spent a considerable amount of time in test preparation activities.3 Few tests with more conventional time limits occupied more than 10 hours total test-taking time.

We found more hours of systemwide testing in the school districts with more experience in testing, that have a relatively high level of poverty, that administer high-stakes tests, and that are located in Northeastern or Southern states.4 Northeastern states' testing programs commonly used longer, performance-based tests that contributed to more hours of testing there. High-stakes testing in Southern statewide testing programs contributed to more hours of test-related activity there, though not more hours of test writing time. On average, high-stakes tests required 43 percent more time in test-related activities other than taking the test, mostly in test preparation activities.

We found fewer hours of systemwide testing in school districts with higher professional salaries and in Western states.6 Metropolitan location, district size, and the presence of bilingual students seemed to have little clear relationship to the amount of district testing one way or another.

3 These were defined in our survey as minutes of instruction in test-taking skills, of taking practice tests, or in motivational activities geared to this test.
4 Poverty was measured by the proportion of students receiving free or reduced-price lunches. High-stakes tests are those used to determine promotion, retention, or graduation. High-stakes tests and tests used for student-level accountability are considered synonymous.
5 Salaries and expenditures represent both the wealth and the cost of living in a region.
6 This includes, generally, the Plains, the Rocky Mountain states, and the Pacific states.

Types of Tests

One way to categorize tests divides them according to the main purpose intended by the test makers. Classified this way, 81 percent of all systemwide tests taken in 1990-91 were achievement tests, those that attempt to measure a student's accumulated knowledge or skill. Most of the achievement tests were commercially available; many of them were state-specific (i.e., designed or adapted to match a state's curriculum). Examples of the more widely used commercial achievement tests included the Iowa Test of Basic Skills, the Comprehensive Test of Basic Skills, the Stanford Achievement Test (SAT), the California Achievement Test, and the Metropolitan Achievement Test.

Another 8 percent of systemwide tests were designed to measure aptitude or ability (i.e., future performance). IQ tests fall in this category. Examples of commercially available aptitude tests include the Otis-Lennon School Ability Test, the Cognitive Abilities Test, and the Test of Cognitive Skills. Six percent of systemwide tests were designed to measure vocational interests in order to help students with career planning. Another 3 percent of tests taken in 1990-91 were developed to measure readiness. Readiness tests are normally given to kindergarten or primary school students. Figure 2.1 summarizes test types according to their main purpose.

Another way to categorize tests divides them according to the subject areas covered; of course, any one test could address from one to several subjects. Systemwide tests in 1990-91 mostly covered school achievement in five core subjects: math, reading, grammar, science, and history or social science. Our respondents claimed that 25 percent of systemwide tests addressed aptitude or ability and 10 percent addressed readiness, though for that to be true, some school districts must have been using tests that were designed to measure achievement as an indicator of aptitude, ability, or readiness. As the previous section explained, only 8 percent of tests were designed to measure aptitude and only 3 percent of tests were designed to measure readiness. Our respondents also told us that 36 percent of tests addressed writing, 12 percent critical thinking, 7 percent civics or citizenship, 6 percent vocational interests, and 1 percent attitudes. Notable in their absence (at less than 1 percent of districts) were tests that included foreign language or art. Seldom do all students in a district take art or any single foreign language, so these subjects tend not to be tested systemwide.

7 Seventy-eight percent of tests addressed math knowledge or skills, 70 percent reading, 49 percent grammar, 44 percent science, and 42 percent history or social science. Percentages do not total 100 because many tests cover multiple subject areas.
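The reason the subject percentages add up to more than 100 can be shown with a small sketch. The four tests below and their subject lists are invented for illustration and are not survey data; the function name `subject_coverage` is likewise ours.

```python
from collections import Counter

# Hypothetical tests and the subjects each one covers (illustrative only).
tests = {
    "Test A": {"math", "reading", "grammar"},
    "Test B": {"math", "science"},
    "Test C": {"reading"},
    "Test D": {"math", "reading", "history or social science"},
}


def subject_coverage(tests):
    """Return {subject: percentage of tests that address that subject}."""
    counts = Counter(subject for subjects in tests.values() for subject in subjects)
    return {subject: 100.0 * n / len(tests) for subject, n in counts.items()}


coverage = subject_coverage(tests)
for subject, pct in sorted(coverage.items(), key=lambda item: -item[1]):
    print(f"{subject}: {pct:.0f}% of tests")
print(f"Sum of percentages: {sum(coverage.values()):.0f}%")
```

Because each test is counted once for every subject it covers, the percentages measure overlapping groups, so their sum says nothing about the total number of tests.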

Figure 2.1: Types of Tests
[Chart not reproduced. Recoverable labels: Achievement; Aptitude or Ability, 8%; Vocational Interest, 6%.]
a The sum of the percentages exceeds 100 because of rounding.

On the question of the grade level at which tests were given, we found, first, that local tests were distributed fairly evenly over all the elementary and secondary grade levels, with some drop-off at 12th grade. The distribution of state tests over the grade levels was more uneven. More state tests were given in grades 3, 4, 6, 8, and 11. Very few state tests were administered to 1st, 2nd, or 12th graders.

Sources of Tests

State education agencies strongly influence local school district testing programs. States mandated just over half of all the systemwide tests administered in U.S. school districts. State education agencies developed most, but not all, of these state-mandated tests, usually in conjunction with test development contractors. Half the state education agencies required that their local districts administer specific commercially developed tests. In four cases, states modified these tests to match state curriculum standards. Another four state education agencies required only that their local districts administer some commercially developed test from an approved list.

Thus, directly or through state mandates, commercial test publishers are also a force shaping school district testing programs. Almost 60 percent of the systemwide tests reported to us were commercially developed. In fact, achievement tests produced by the three largest commercial test publishers comprised 43 percent of all tests. Figure 2.2 summarizes these data on sources of tests.

Figure 2.2: Sources of Commercially Developed Tests
[Chart not reproduced. Recoverable labels: CTB MacMillan/McGraw-Hill; Riverside Publishing; Psychological Corporation.]
a The sum of percentages exceeds 100 because of rounding. Tests include four categories: achievement, aptitude, readiness, and vocational interest. CTB MacMillan/McGraw-Hill published the Comprehensive Test of Basic Skills, California Achievement Test, Science Research Associates tests, Test of Cognitive Skills, and Kuder Occupational Interest Survey. The Psychological Corporation published the Stanford Achievement Test, Metropolitan Achievement Test, Otis-Lennon School Abilities Test, Differential Aptitude Test, and Ohio Vocational Interest Survey. Riverside Publishing published the Iowa Test of Basic Skills, Iowa Test of Educational Development, Tests of Achievement and Proficiency, Cognitive Abilities Test, and the 3-Rs Test.

Test Design

When we asked about the technical design of current tests, we found most were quite traditional, despite lively debate in recent years about needed improvements. That is, most systemwide tests were designed to show how a student performs in relation to the norm or average of all others (norm-referenced) and to measure knowledge by asking students to choose one answer among several choices for each of a battery of questions (multiple-choice format). The alternatives are to measure a student against some standards or criteria external to the group (criterion-referenced) or to examine more types of skills and learning by calling for short written answers, essays, or other more creative activities (performance-based format).

8 Some of the tests were customized to state or local specifications. The three publishers are: CTB MacMillan/McGraw-Hill, the Psychological Corporation/Harcourt Brace Jovanovich, and Riverside Publishing/Houghton-Mifflin.

In the critical appraisals of a majority of testing experts and the larger community of education professionals, criterion-referenced and performance-based tests are more popular than the traditional norm-referenced and multiple-choice tests. Responses to the opinion questions of our survey affirm this (see chapter 5). Yet, testing practice lags behind these preferences. We found that 71 percent of the tests administered last year were norm-referenced, reflecting the dominance of national commercially developed tests. And 71 percent of the tests were formatted exclusively with multiple-choice responses. Thirty percent of the tests did contain some performance element, but 40 percent of them were writing samples alone or test batteries that included a writing sample but were otherwise multiple-choice tests. Only 18 percent of all tests asked students to perform in more than one subject area using performance formats. Figure 2.3 summarizes the test design features we found.

Figure 2.3: Test Design Features
[Chart not reproduced. Recoverable labels: Multiple-choice tests; Writing samples or multiple-choice tests with writing sample; Multiple-subject performance tests.]
a The sum of the percentages exceeds 100 because of rounding.

Writing samples, reading comprehension and response exercises, and math or science problem-solving predominated among the performance-based test formats. Less frequently, we found some use of other types of performance formats, such as science laboratory work, group work, or skills observations. But laboratory work and group work comprised only 4 percent of all performance formats used in 1990-91. Skills observations comprised a larger percentage--12 percent--but largely because of their use in readiness tests. Thus, performance formats remain dominated by the more traditional paper-and-pencil essay questions.

States and school districts, rather than the testing industry, seem to have managed most of this type of testing up to now. Table 2.1 shows that performance-based tests in 1990-91 tended more often to be state-mandated and to be much more often developed by or for a state or school district than tests in general.

Table 2.1: Comparison of Performance-Based Tests With All Tests

Test characteristic                    Performance-based tests    All tests
State-mandated                         88%                        58%
Developed by or for school district    14                         7
Developed by or for state              76                         35
Commercially developed                 9                          55
Grades K-8                             77                         82
Grades 9-12                            55                         56
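The contrast drawn from Table 2.1 can be restated as percentage-point differences. The sketch below simply re-keys the table's values as they appear in this report; the dictionary and function names are ours.

```python
# Percent of tests with each characteristic, as given in Table 2.1:
# (performance-based tests, all tests)
TABLE_2_1 = {
    "State-mandated": (88, 58),
    "Developed by or for school district": (14, 7),
    "Developed by or for state": (76, 35),
    "Commercially developed": (9, 55),
    "Grades K-8": (77, 82),
    "Grades 9-12": (55, 56),
}


def point_differences(table):
    """Return performance-based minus all-tests, in percentage points."""
    return {name: perf - overall for name, (perf, overall) in table.items()}


for name, diff in point_differences(TABLE_2_1).items():
    print(f"{name}: {diff:+d} percentage points")
```

Read this way, the largest gaps are in who developed the test and whether the state mandated it, while the grade-level rows differ by only a few points.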

A few more states are now trying criterion-referenced, performance-based statewide tests, and all of the three largest test publishers expect to have their major achievement exams available with performance formats within the next 2 years. The costs of performance-based tests are discussed in chapter 3.

Purposes of Tests

Another debate over testing concerns the stakes that should ride on the results, with higher stakes thought by some to strengthen teacher and student motivation, but by others to divert too much time from regular classwork and even to prompt cheating by teachers and students. In fact, most districts reported low stakes. Districts gave tests because their states required them and because they believed tests offered useful information on the students, schools, or curriculum. Thus, we found that local districts were least likely to report using tests for student or school accountability or for student placement. Twenty-four percent of tests were reported as formally used for student accountability and, therefore, as high-stakes tests. (See glossary.) For over half of the tests administered, the respondents rated student or school accountability measures of little or no importance or not applicable.

At the state level, however, district or state accountability was a vivid purpose for testing--though not student or school accountability. States reported a clear purpose in making test results public to encourage voters or school boards to instigate needed systemwide changes. As was true at the district level, though, state education agencies most commonly administered statewide tests for purposes of evaluation and diagnosis. The least popular uses of statewide tests involved state-level management (planning, tracking, or resource allocation) or grouping and placement of individual students.

Relationship of Tests to Curriculum

Whether students have had enough opportunity to learn the material on tests is another continuing issue in testing policy debates. The match of what is tested with what is taught--or required to be taught--is sometimes referred to as the alignment of the two, and state curriculum requirements are one means toward that end. Despite considerable discussion of the need for standards prescribing course content, not all states had a statewide curriculum in 1990-91, and not all of those that did required their local districts to follow it. At least 17 states had no curriculum and at least 10 others had curricula their local districts were not obliged to follow. Only 14 states both required that local districts follow a state curriculum and administered a statewide test. For 65 percent of the statewide tests in those states with a curriculum, officials told us they believed the tests were largely or perfectly aligned with the curriculum, and for another 30 percent, officials believed the tests were moderately aligned.

Local district respondents reported that 37 percent of the districtwide tests in use in 1990-91 had caused some curricular realignment, 27 percent to a moderate or large extent. The influence of tests on curriculum was judged positively, by and large. Where local officials reported shifts in curriculum in response to tests, about two-thirds thought that the realignment had strengthened learning in their district, while only 2 percent thought that it had weakened learning.

The issue of alignment raises the question of alignment to what, especially if local teaching and curriculum do not match the breadth or depth of content national experts recommend in an area. As shown in chapter 5, some state testing officials told us they prefer tests geared to their curricula, though that may to some degree be at odds with the current pressure for schools to adopt national standards and be tested against them.

Trends in State Testing Programs

As was mentioned in the previous chapter, few states mandated statewide tests before the late 1970s, but by the end of the 1980s, few did not do so. Many of the first statewide tests, arising from the back-to-basics emphasis of the period, were meant to measure minimum competency. They tested only the major subjects and sometimes just reading, writing, or math. More often than not, states merely purchased, or required that their local districts purchase, commercial norm-referenced tests. Partly in reaction to perceived shortcomings in this method of assessment, state education officials argued for different testing programs. In many states, they were allowed to design testing programs that largely matched their desires.

Greater control over student testing by state education officials has fostered several trends. They include: more involvement in test development by state and local education officials; more criterion-referenced testing and less norm-referenced testing; more performance-based formats; teacher involvement in test development and scoring; test development procedures that include consensus-building among most interested groups; collecting and releasing social and economic indicators, along with test results, to describe school district or performance; and statewide testing programs incorporating more than one test.

Local teacher and administrator involvement in test development and scoring has generally worked to the satisfaction of all parties in many states and Canadian provinces. Moreover, survey respondents in states with criterion-referenced performance-based tests--which provide an opportunity for teacher involvement in development and scoring--usually cited teacher involvement as one of the major strengths of their testing programs. Not all teachers and administrators need to be involved, just enough, on a rotating basis, to give local education professionals a sense that their group is influential in the process.

Some devel o pments,
occurri n g
i n too few states and too recentl y
to be
cal l e d
trends, poi n t to ways i n whi c h state testi n g programs
mi g ht be
expanded.
Two states are attempti n g
to devel o p
programs
that are
rel a ti v el y
comprehensi v e
i n subj e ct
matter, i n cl u di n g
tests i n art, musi c ,
many vocati o nal
educati o n
subj e cts,
and more. Fiv e states are i n the earl y
stages of devel o pment
for statewi d e
end-of-course
tests. Two states
al r eady
admi n i s ter
statewi d e
achi e vement
tests for advanced
hi g h school
subj e cts,
and other states may j o i n i n that effort.
Thus, state education agencies are actively involved in testing and are getting more so. Testing activity has been stalled somewhat in several states owing to current poor state fiscal conditions. But though some states have been forced to skip a year or stretch out development schedules, few have given up on statewide tests without replacing them with other tests.

Summary

Our survey results suggest that testing officials did not ascribe much importance to tests as student-accountability measures but that one-quarter of all tests were, nonetheless, high-stakes tests. And with the exception of writing samples, despite all the enthusiasm surrounding criterion-referenced, performance-based testing, by 1991 it was still primarily implemented in the seven states with statewide performance tests and could otherwise be found mostly in early-grade school-readiness tests.

In the main, students in 1990-91 were tested in four or five subject areas, using commercially developed, multiple-choice, norm-referenced, and state-mandated tests. But if state testing officials have their way, more tests in the future will be performance-based, criterion-referenced, and at least partly developed by state and local officials.

At least on average, and considering only systemwide tests, students do not seem to have been overly tested. The average student spent less than 4 hours in the year taking exams. Thus, an argument that a national examination system should be opposed on those grounds demands other evidence than what we found. However, there was a range to the amount of testing from district to district. Some districts tested quite a lot, and some not at all.

Chapter 3
The Current Cost of Testing

Of all the unsettled issues in the debate over a national examination, none has provoked such a diverse set of claims as its estimated cost. These have ranged widely, from a few million dollars to several billion dollars a year.

The costs of current tests arouse controversy, too, and are not always known precisely. This is true even for tests that are commercially developed and sold at a fixed price, for while the testing firms know their costs, variations in use by the purchasing school districts affect the overall costs in ways that have not been thoroughly documented.

This chapter answers part of the second evaluation question with the results of our surveys on the costs of particular types of tests and on the aggregate cost of testing in the United States. Both allow us to make reasonable estimates of the potential cost of different kinds of national examination systems, a task undertaken in chapter 4. The first part of this chapter discusses the major components that make up the cost of a test, which partly explain why cost estimates can vary so much when taking only some of these components into account. We then present our cost estimates for systemwide testing in the United States, for particular types of tests, and for test development. We also investigate the presence of economies in large-scale testing.

Dollar and Time Costs of Tests

Cost estimates can be thoughtful and accurate and still vary widely, since a test's cost has many components, not all of which are always included in estimates. Some are obvious. The length of the test is one component, for example, and longer tests tend to be more expensive to develop, administer, score, and report than shorter tests when all other factors are equal. Some components are not so obvious. The time taken from a teacher's schedule to administer a test, for example, is often neglected in cost calculations. Test development costs, likewise, often get left out.
Since we asked about all costs in our surveys, we can use the responses to estimate all costs involved in administering systemwide tests in U.S. school districts in the year 1990-91.1 Our respondents accounted for testing costs in two ways: by listing the dollars they paid out for tests or test-related services or supplies and by estimating the personnel hours devoted to testing and to test-related activities.2 For state-mandated tests, we incorporated costs from both the state and the local district level. We calculated the time cost by multiplying the number of hours spent on test-related activity by the hourly employee salary.3 Adding the time costs to the other costs gave us the total for each test.

1 We did not ask about costs included in general school district or state agency overhead expenses, such as the costs of building space used for tests. Such indirect costs would have been difficult for respondents to allocate consistently.
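The calculation described above (personnel hours multiplied by hourly salary, then added to the dollars paid out) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `StaffTime` record, the function names, and every number in the example are hypothetical and are not taken from the survey data. The three staff levels are the ones the survey asked about.

```python
from dataclasses import dataclass


@dataclass
class StaffTime:
    """Hours spent on test-related activity by one staff level (hypothetical record)."""
    level: str            # "managerial", "nonmanagerial professional", or "clerical"
    hours: float          # hours devoted to testing and test-related activity
    hourly_salary: float  # average hourly salary reported for that level, in dollars


def time_cost(staff):
    """Time cost: hours multiplied by hourly salary, summed over staff levels."""
    return sum(s.hours * s.hourly_salary for s in staff)


def total_test_cost(cash_outlay, staff):
    """Total cost of a test: dollars paid out plus the dollar value of personnel time."""
    return cash_outlay + time_cost(staff)


# Illustrative figures only; none of these numbers come from the survey data.
example_staff = [
    StaffTime("managerial", 10, 30.0),
    StaffTime("nonmanagerial professional", 100, 20.0),
    StaffTime("clerical", 20, 10.0),
]
example_total = total_test_cost(1000.0, example_staff)
print(example_total)
```

Keeping the cash outlay and the time cost as separate terms makes it easy to see how an estimate changes when personnel time is left out, which is one reason published cost estimates can differ so widely.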
Without exception, every test incurred some expenditure of personnel time. School personnel (usually teachers) administered almost all the tests taken by their students. School districts also expended cash when they purchased tests from commercial test publishers. In many cases, however, school districts paid nothing, in cash, for tests: states that developed their own tests commonly did not charge the districts for them. Occasionally, tests were also provided free when a school district served as a pilot for a new test or when it used the Armed Services Vocational Aptitude Battery.
In the year 1990-91, state and local educational agencies paid an average of about $4 for each individual student test administration. At the same time, they devoted slightly over $10 worth of state and local education personnel time for each individual student test administration (that amounts to about 35 minutes of personnel time per student test, or about 620 hours per district test). So each time a student took a test, it cost about $15. On average, each school district expended about 1,500 personnel hours last year on systemwide testing and spent, in dollars and time, about $34,500.4 In budget terms, testing did not often account for more than 1 percent of school district budgets, averaging about one-half of 1 percent. State programs averaged less than 2 percent of state education agency budgets.
We found wide variation in these figures (from less than $1 to over $90 per student test), so we looked for explanations of those variations. First, the type of test influences costs. Multiple-choice-only tests averaged around $14, while tests with at least some performance component averaged about $20. Second, different districts face different situations that seem to be systematically related to the costs they incur for testing. These factors are summarized in table 3.1. Some cost variations reflect characteristics of the student body, such as more low-income or non-English-speaking students. Still others reflect state mandates. The choice to use more of the more expensive performance tests carries obvious cost consequences. Northeastern and Southern states may have higher testing costs because they administer the more expensive performance-based tests (in the Northeast) and high-stakes tests with higher levels of test security. In situations where we found a district spending less per student on testing, we also found such features as larger size of district, more grade levels tested, and more experience with the chosen tests, as well as more testing overall. We also found more use of high-stakes tests associated with lower costs.5

2 Respondents were asked to account for the amount of personnel time devoted to: developing the test; preparing students to take the test; getting trained to administer or score the test; training others to administer or score the test; administering the test; collecting, sorting, and mailing completed tests; scoring the tests; and analyzing and reporting the results.

3 We asked for the time spent on testing by three levels of staff: managerial, nonmanagerial professional, and clerical. We also asked each state or district to give the average salary of each of the three levels, which we then used to calculate the dollar costs of the time spent.

4 These particular district averages for personnel hours expended and cost do not include any state time and cost figures.
Table 3.1: Factors Related to Higher and Lower Testing Costs per Student

Testing costs (a)    Contributing factors
Higher               Higher number of performance tests
                     Higher proportion of low-income students
                     Higher proportion of bilingual students
                     State mandates to test
                     Northeastern location
                     Southern location
Lower                Higher number of tests administered
                     Higher number of grade levels tested
                     Higher number of years of experience with a test
                     Higher number of high-stakes tests
                     Larger district size

a As measured by cost per student test hour.

Differences in State and Local Costs

As might be expected, local school districts and state agencies differed in the contribution of different kinds of staff and activities to the overall cost figures. For example, in the local districts, teachers and specialists contributed 86 percent of the time spent in test-related activity, and administrators and clerical employees only 12 percent. In contrast, state officials responding to our survey reported that administrative and clerical employees contributed about 41 percent of the time spent in test-related activity and nonmanagerial professionals contributed 69 percent.

5 High-stakes tests influenced overall testing costs in two different ways. They tended to consume more personnel time in test preparation activities, but these increased costs were more than offset by cost decreases associated with the fact that these tended more often to be multiple-choice tests.

Concerning test-related activities, states tend to develop rather than administer tests, while districts show the opposite pattern: much administrative expense but few development costs. Thus at the district level, 39 percent of time was devoted to administering tests; 28 percent to preparing students; 18 percent to collecting, scoring, and analyzing the tests; and 16 percent to other test-related activities. At the state level, 36 percent of time was devoted to test development; 10 percent to training; 37 percent to scoring, collecting, mailing, and analyzing; and 17 percent to other activities. Only 9 percent of state-level time was devoted to test administration, and only 2 percent of district-level time was devoted to test development.

For only three tests in the United States did state costs average more than district costs. Even in those states administering their own state-developed, full-battery (that is, three or more core subject areas) performance-based tests in 1990-91 (probably the most expensive possible situation for a state), district costs exceeded state costs. On average, the state assumed only 25 percent of the costs of tests in which states were involved. Even with tests that state agencies themselves developed, printed, distributed, scored, analyzed, and provided to the districts without charge, the bulk of the costs fell at the local level. The result reflected the fact that personnel time devoted to test administration always comprised the majority of the costs, and these were, of course, costs only to the local school districts.

Multiple-Choice and Performance-Based Test Costs

As we learned in the opinion section of our survey (reported in chapter 5), and as widespread discussion of desirable improvements in testing shows, there are currently both great hopes and large unknowns about new methods of testing that go beyond asking students to choose from among several answers. These performance-based tests are known to be more expensive than multiple-choice tests as a general rule, but how much more expensive? Accurate estimates require clear distinctions among different definitions of a performance test. Many have some multiple-choice items and may have only one or several performance components. Formats can vary widely in type and expense, and as a result, performance-based tests can vary widely in cost.


Our large survey sample and oversampling in states with state performance-based tests allowed us to obtain a good comparison of the costs of the two types of tests by looking at school districts in states where both were administered. Thus, we could hold constant, or remove the confusing effect of, many factors by examining costs of two different kinds of tests in the same district for the same student population, and all as reported by a single person completing our survey. And where school districts generally administer both kinds of tests, the performance-based tests are likely to be clearly different from the multiple-choice tests, different enough to justify using both.

In the six states where school districts used both state-developed performance-based tests and commercially developed multiple-choice tests, we found the performance-based tests were typically almost twice as expensive. As shown in figure 3.1, the multiple-choice tests averaged $16 per student (ranging from $11 to $20), while the performance-based tests averaged $33 (with a range from $16 to $64).6

6 Strictly speaking, these cost figures may underestimate the cost of pure performance-based tests. These six states reported to us on a total of 11 performance-based tests (two states used 2 each and another state used 4). Of the 11 tests, only 1 was formatted exclusively with performance questions. All of the others had some multiple-choice questions. All of the tests, however, employed performance formats in more than one subject. That distinguishes these from other state tests with performance-based formats in writing but multiple-choice formats in all other subject areas. The percentage of test time devoted to performance-based questions among these tests ranged from 20 to 100, with a mean of 46 percent.

Figure 3.1: Per-Student Costs of Two Test Types in States Having Both
[Figure: dollars per student for each test type, showing the most expensive, average, and least expensive cases]

Economies in Testing

The cost of testing can be lowered over time, we found, through economies of scale and scope and as experience grows. This is especially important in considering the nationwide costs of performance testing, which has been very expensive in the pilot efforts so far. Our survey data provide evidence of all three possibilities for future economies, though we did not try to use our observations to create corrected or adjusted estimates of the long-term costs of one or more systems of national tests because so many factors are uncertain. As shown in figure 3.2, in reviewing our data on state performance-based tests, we found economies of scale when tests were given to different-sized groups of students. The per-student cost of a test declined as more students were included in a test administration. We can expect the per-student cost to decline as fixed costs (such as test development and some costs of operation, such as scoring, distribution, and site preparation) are divided by a larger test population.
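The arithmetic behind economies of scale can be illustrated with a short Python sketch. It assumes a simple two-part cost model (a fixed cost shared by all test takers plus a constant variable cost per student), which is a simplification introduced here for illustration and not a model fitted to the survey data. The function name and all dollar amounts are hypothetical.

```python
def per_student_cost(fixed_cost, variable_cost_per_student, students):
    """Per-student cost when a fixed cost is spread over all students tested."""
    if students <= 0:
        raise ValueError("students must be positive")
    return fixed_cost / students + variable_cost_per_student


# Hypothetical inputs: $2 million in fixed costs and $15 per student in variable costs.
FIXED = 2_000_000
VARIABLE = 15
GROUP_SIZES = (50_000, 100_000, 400_000, 800_000)

costs_by_size = {n: per_student_cost(FIXED, VARIABLE, n) for n in GROUP_SIZES}
for n, cost in costs_by_size.items():
    print(f"{n:>7} students: ${cost:.2f} per student")
```

Under this assumed model the per-student cost falls toward the variable cost as the test population grows, so most of the savings come from the first increases in scale.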

Figure 3.2: Economies of Scale in State Performance Testing
[Figure: cost per student per test (dollars) plotted against number of students tested (thousands)]

Second, as shown in figure 3.3, again using the state performance-based test data, we found economies of scope when the same test administration was employed for several purposes, such as to test the same student population in more than one subject area. Again, we can expect the per-subject-area cost of a test to decline as more subject areas are included in a test administration and the fixed costs are divided by this larger number.

[Figure 3.3, state performance testing: cost per student per subject area tested (dollars) plotted against number of subject areas tested]

Third, costs can decline with experience, as those involved find ways to accomplish tasks in simpler and less expensive ways. For example, the state and the Canadian province reporting the most years of experience in performance-based testing have average per-student performance-based test costs of less than $22, well below the overall average of $33.

Test Development Costs

The data we used to describe testing costs thus far include all the costs our state and local survey respondents could recall for the academic year 1990-91. Thus, we learned of ongoing test development costs (which can themselves be considerable), but we did not get good information on start-up development costs, those encountered before a test's first administration. We interviewed officials at testing firms and in state agencies to learn more about these one-time-only costs. Some tests in use today, such as the commercially produced, national norm-referenced achievement tests, were developed decades ago. Their start-up costs, even if adjusted for inflation, would bear little similarity to today's costs. Technologies, procedures, and expertise have changed a great deal over time and so have test development costs.


The many state tests developed within the past decade offer some more recent information. These state efforts have ranged widely in complexity. In all cases, commercial firms have been employed to do some or most of the work. The least expensive efforts have involved commercial test publishers adapting their existing achievement tests to state curricula or other state needs. The test publishers tap their existing test-item bank as a source of questions for a particular state test. Officials of testing firms told us their start-up development costs ranged from one to a few dollars per student.
A more expensive way to develop a test is to start from scratch, writing test questions that fit a state's curriculum or guidelines, then testing the draft on pilot groups of students and making further revisions in the text, procedures, and so on. All of the recent state-developed, full-battery, performance-based tests have been done this way. From officials in six states with these tests and two more states where they were being developed, we learned that costs for initial test development averaged $10 per student. These state testing officials also told us the amount of time needed to develop the 10 tests from scratch to the pilot-test stage averaged 14 months and to final form, 27 months. None of these states used an existing state curriculum to develop questions for the tests. All developed state curricula or the like simultaneously with the tests.7

Two very expensive current state test development efforts cost around $30 per student. These prototypes, the first of which was piloted in 1991-92, exclusively use performance-based formats (no part of the test uses the less expensive multiple-choice format) and cover many subject areas, including vocational education, art, music, or foreign language.8

7 Testing officials developed what they called "specifications," "curricular frameworks," "valued outcomes," "skills," or "objectives," less detailed than a true curriculum. Testing officials in the one state among the eight with an existing curriculum at first tried to use it in developing test items, but abandoned the effort when they found it too cumbersome.

8 Performance assessment and new testing techniques are discussed in detail in U.S. Congress, Office of Technology Assessment, Testing in American Schools: Asking the Right Questions, OTA-SET-519 (Washington, D.C.: U.S. Government Printing Office, February 1992), ch. 7-8.

Summary

Based on returns from our national sample of school districts and state education agencies, we estimate that the average per-student test cost in the United States in 1990-91 was $15. Multiple-choice tests tended to cost less while performance-based tests tended to cost more, and testing costs varied widely from district to district. School personnel time devoted to testing accounted for three-quarters of a test's cost, which was borne for the most part at the local district level, even for statewide tests. State and local roles in testing differed; states did more test development and training, and local districts did more test administration and student preparation.

Both our sample of state performance-based tests and the national sample of all tests revealed economies of scale, scope, and learning. Some factors associated with higher testing costs included central city or rural school district location, low-income or ethnically mixed populations, and state mandates to test.
Our surveys collected complete information about ongoing testing costs (including ongoing test development costs), but not about start-up costs incurred before the first test administration. A polling of state testing directors in states with the newer forms of statewide performance-based tests suggested an average start-up development cost of $10 per student and an average start-up development time of 3 years.

Chapter 4
The Future Cost and Extent of Testing

In and of itself, current testing does not predict the future extent and cost of testing that would occur with the addition of a national examination system. That would depend on the type of national exams and on what states and school districts would do with their current tests. If they all were to keep all their current tests and add a national test, the extent and cost of testing would increase by an increment equal to the length and cost of the national test.1 If schools were to replace an equal amount of current testing with a new national test, the extent and cost of testing would not change.

This chapter responds to the third evaluation question regarding the extent and cost of testing in a future with a national examination system and the overlap between current tests and those being proposed. First, we estimate the cost of a national examination system. Then we examine how local school districts reacted when their states mandated the use of new statewide tests: specifically, did they simply add the new tests or did they replace a then current test with the new state test? How they reacted provides a clue as to how state and local officials may react to a national test mandate. Finally, we discuss the impact of the type of national test, which may determine how many school districts replace current tests with a national test, how much additional testing time and expenditure will be required, and to what degree different school districts benefit from (or pay for) the change.

Cost of a National Examination System

To estimate the cost of a national examination system, many assumptions must be made about the type and extent of that system, especially the kinds of tests involved. Most recent discussions have proposed testing students at three grade levels. That would involve approximately 10 million students.2 Given the range in test cost per student of $16 for multiple-choice tests to $33 for performance-based tests, as estimated in chapter 3, a national examination system could cost between $160 million and $330 million per year. As most recent discussions have proposed that a national system be made up of performance-based tests, however, $330 million may be the more relevant estimate. At a maximum, if the performance-based tests used in a national system were to cost as much as the most expensive state-developed performance-based test, the cost would rise to $640 million for a national system.3

1 That is, the total cost would increase. The per-unit, or per-student-test, cost could possibly decline owing to economies of scale.

2 Many national test cost estimates have used this figure in calculations, and it is plausible, as it represents one quarter (3 of 12 grades) of the nation's 40 million total enrollment in precollege public education.
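The estimates above follow from multiplying the number of students tested by a per-student test cost. The short Python sketch below reproduces that arithmetic using the figures given in the text ($16, $33, and $64 per student; 3 of 12 grades; 40 million enrolled). The function and variable names are illustrative, and the even spread of enrollment across grades is a simplifying assumption implied by the "one quarter" reasoning rather than a measured fact.

```python
TOTAL_ENROLLMENT = 40_000_000  # precollege public enrollment cited in the text
GRADES_TESTED = 3
TOTAL_GRADES = 12

# Per-student test costs from chapter 3, in dollars.
COST_PER_STUDENT = {
    "multiple-choice (average)": 16,
    "performance-based (average)": 33,
    "performance-based (most expensive)": 64,
}


def students_tested(enrollment, grades_tested, total_grades):
    """Students tested, assuming enrollment is spread evenly across grades."""
    return enrollment * grades_tested // total_grades


def national_cost(per_student_cost, students):
    """Annual cost in dollars: per-student test cost times students tested."""
    return per_student_cost * students


students = students_tested(TOTAL_ENROLLMENT, GRADES_TESTED, TOTAL_GRADES)
estimates = {
    kind: national_cost(cost, students) for kind, cost in COST_PER_STUDENT.items()
}
for kind, dollars in estimates.items():
    print(f"{kind}: ${dollars / 1e6:.0f} million")
```

Because the calculation is linear, any change in the assumed number of grades tested or in the per-student cost scales the national estimate proportionally.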

Again, these estimates involve all testing costs, including the cost of personnel time spent in test-related activity. In other words, our figures include what it would cost at local and state levels to prepare, administer, and score a test given nationwide to all students in three grades, including paying for the time of all education personnel involved.4 The estimates differ dramatically from both the high estimates offered by some national test opponents and the low estimates offered by some proponents.
High Estimates

The high estimates (at over $3 billion) were based on the per-student price for tests in five subject areas of the Advanced Placement examinations now administered by Educational Testing Service (ETS) for the College Board. Though the elaborate centralized marking of these long written exams is undeniably expensive, using these existing tests as a benchmark produces a high estimate for several reasons. The figure of $65 per subject-area exam is the price currently charged each student taking an exam, not the cost. Thus, some prior development costs may not be reflected (which understates the cost), and some current expenses paid from the fee may be for unrelated activities, such as fee reductions for low-income students, teacher training, or other activities of the College Board or ETS (which overstates the cost). Further, the five exams are separate, with five different administrations, each taking 3 hours. And the ETS staff told us that the $65 the student must pay is, in fact, an average price. Some Advanced Placement tests cost ETS more than that (art and foreign language exams) and some cost less (core subject area exams).

Low Estimates

Lower estimates of the cost of a national test have been made using the analogy of the Armed Services Vocational Aptitude Battery. This is a multiple-choice test composed of 13 subtests measuring abilities considered important for military service. It is administered by military personnel to all potential recruits and given free to school districts that wish to use it. Some inherent features of this test make it particularly weak as an analogy; for example, the inclusion of some topics such as electronics, the low degree of security, and the exclusive reliance on multiple-choice items. In addition, the commonly used cost data do not include the costs of staff administering the tests and analyzing and reporting the results.

3 The exceptionally high cost of that one test appears to result from the extra supervision believed to be necessary for administration of a high-stakes exam.

4 We are not addressing the issue of who should bear the costs of national testing. All the costs, including the time of local district personnel, could be paid for nationally. Or, a test could simply be mandated nationally, with all costs left to local districts. In between these two extremes, the local districts might absorb the costs of test administration, while the test itself is developed and provided nationally.

Our Best Estimate

Generalizing from our survey data, the cost of all systemwide testing in the United States in 1990-91 totaled about $516 million. Given our best estimate of $330 million for a national examination system based on typical current performance-based tests, we believe a national system would cost almost two-thirds as much as the present cost of all systemwide testing.

By comparison, the annual federal contribution to local public schools has ranged between $7 billion and $10 billion in the past two decades. Total annual revenues to local public schools ranged between $110 billion and $130 billion over the same time period. Thus, annual total cost for a complete national examination system of the type we have been discussing would amount to less than 5 percent of present federal contributions to local public schools and to less than one-half of 1 percent of all government funds for local public schools.
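
The comparisons in this passage are simple ratios and can be checked directly. The following is a minimal sketch in Python that assumes only the dollar figures quoted in the text; the function and variable names are illustrative and are not part of the original analysis.

```python
# Illustrative check of the ratios quoted in the text.
# All dollar figures come from the passage; the names are hypothetical.

NATIONAL_SYSTEM_COST_M = 330    # best estimate for a national system, $ millions
CURRENT_TESTING_COST_M = 516    # all systemwide testing in 1990-91, $ millions
FEDERAL_RANGE_B = (7, 10)       # annual federal contribution, $ billions
TOTAL_RANGE_B = (110, 130)      # total annual revenues, $ billions


def share_percent(cost_millions: float, base_billions: float) -> float:
    """Return a cost in $ millions as a percentage of a base in $ billions."""
    return cost_millions / (base_billions * 1000) * 100


# National system cost relative to present systemwide testing.
ratio_to_current = NATIONAL_SYSTEM_COST_M / CURRENT_TESTING_COST_M

# Shares of federal and of total school funding at each end of the ranges.
federal_shares = [share_percent(NATIONAL_SYSTEM_COST_M, b) for b in FEDERAL_RANGE_B]
total_shares = [share_percent(NATIONAL_SYSTEM_COST_M, b) for b in TOTAL_RANGE_B]

print(f"Ratio to current testing cost: {ratio_to_current:.2f}")
print("Share of federal contribution (%):", [round(s, 1) for s in federal_shares])
print("Share of total revenues (%):", [round(s, 2) for s in total_shares])
```

Under these figures the ratio is a little under two-thirds, the largest federal share stays below 5 percent, and the largest share of total revenues stays below one-half of 1 percent.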

We judged the current state performance-based tests to be a valid sample from which to estimate the cost of a national test for a couple of reasons. First, these tests resulted from consensus, that is, from a political process with pressures and counterpressures similar to those one finds in the current debate over a national test. Different interest groups, testing experts, testing officials, and elected officials expressed concerns over test format, quality, cost, and length, and these are the tests they chose. Second, these state testing programs have actually been implemented. So their extent and cost figures arise from actual practice, not as estimates.

Because all but one of these programs are fairly recent, they may be more expensive now than they will be later, after testing officials have learned how to administer them more efficiently. Nonetheless, a national test could incorporate features that would make it more expensive. For example, a high-end national test could include only performance-based questions, which take more time to answer and to score, but it could still include enough questions to cover all subject area content thoroughly, and it could be a high-stakes, and thus high-security, test. Extrapolating from the cost of a certain state test that if altered somewhat would resemble a high-end test, we estimate that a high-end national test could cost over $1 billion.5 No state or school district now administers a test with all the features of a high-end test, however.

Cost Economies

Economies in large-scale testing are relevant to any estimation of the cost of a national examination system, too. The decline in costs from having more experience in testing suggests that the cost of a national examination system should decline over time. The presence of economies of scope suggests that per-subject-area costs should decline as more subject areas are added to the same test. The presence of economies of scale suggests that per-student costs should decline as more students take the same test.

In the previous chapter, we noted that 11 state-developed performance-based exams, each covering four to eight subject areas, averaged $33 in per-student costs. Many advocate this type of exam as the main format for any national system of exams; thus, the $33 per student seems a reasonable estimate for the cost of national exams. If, as has also been suggested, some of these current state performance-based tests were used by clusters of states, their per-student costs should decline as larger populations of students take each test and decline even more with experience over time.

Could economies of scale be pushed to an extreme, such that a single national test could be the least expensive, and most efficient, approach to broadening current testing? Our data do not suggest that. We found that the economies in performance-based testing seem to be exhausted at a much lower scale than that of the entire nation, at about the scale of a large state. Thus, grouping small states together in a cluster in which all use a common exam would achieve most or all of the possible economies of scale.

5This state test is completely performance-based in format, has high security because of that format, covers six subject areas, and employs local teachers and administrators in test development and scoring. There are currently three different forms of the test, and any one student gets only one of the forms. One form takes about 10 hours. Collectively, the three forms cover the entire curriculum; singly, each form covers only one-third of it. The test is now a low-stakes test. If the test were to be high-stakes, in fairness, each student should be tested with the same test and over the entire curriculum. Such a test would take 30 hours. The current state test, in its first year of use, cost $48 per student. Tripling the test's length would not quite triple its cost, because ongoing development costs would not change. Moreover, a national administration of the test would benefit from economies of scale and, over time, economies of experience. Adjusting the cost estimate for economies of scale, a national high-end test could cost over $1 billion.

Start-Up Development Costs

Projecting start-up test development costs to the national level involves, once again, multiplying the per-student costs by the assumed 10 million students. For the three methods of test development mentioned in chapter 3 (for a multiple-choice test, for an average performance-based test, and for the most expensive type of performance-based test), national costs would amount to about $20 million, $100 million, and over $300 million. As most recent discussions of a national exam system have proposed the type of exam represented by the middle figure (a performance-based test like those currently in use in some states), $100 million probably ranks as the best estimate. Again, this represents a one-time-only cost that could be used to develop a new test or perhaps to pay the states that have already developed appropriate tests to share their knowledge. Table 4.1 summarizes these projections.
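
The projection described here is a single multiplication: a per-student cost times the assumed 10 million students. The sketch below, in Python, reproduces that step using the per-student figures reported for multiple-choice and average performance-based tests ($2 and $10 for start-up development, $16 and $33 for annual administration). The function name and data layout are illustrative assumptions, not the report's own procedure.

```python
# Illustrative national cost projection: per-student cost x students tested.
# Per-student figures are those reported in the text and in table 4.1;
# the names and structure here are hypothetical.

STUDENTS_TESTED = 10_000_000  # the assumed number of students tested


def national_cost_millions(per_student_dollars: float,
                           students: int = STUDENTS_TESTED) -> float:
    """Project a per-student cost to a national total, in $ millions."""
    return per_student_dollars * students / 1_000_000


PER_STUDENT_COSTS = {
    "multiple-choice": {"start_up": 2, "annual": 16},
    "performance-based": {"start_up": 10, "annual": 33},
}

# Build the national projections for each test type and cost category.
projections = {
    test_type: {kind: national_cost_millions(cost) for kind, cost in costs.items()}
    for test_type, costs in PER_STUDENT_COSTS.items()
}

for test_type, costs in projections.items():
    print(f"{test_type}: start-up ${costs['start_up']:.0f} million, "
          f"annual ${costs['annual']:.0f} million")
```

Changing `STUDENTS_TESTED` shows how sensitive each national total is to the assumed number of students.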
Table 4.1: Projected Costs of National Testing Options

                                      Type of test
Testing cost                          Multiple choice    Performance-based
Per-student
  Start-up development                $2                 $10
  Annual administration (a)           16                 33
National (millions)
  Start-up development                20                 100
  Annual administration (a)           160                330

(a) Includes ongoing, recurring development costs.

The Effect of Adding a New Test

We examined districts' responses to past testing mandates as a way to estimate future responses to any required national test. Twenty years ago, very few states mandated statewide testing, but by 1990-91, only a handful of states remained that did not require their local districts to administer a statewide test. Twenty-five states required their districts to administer a commercially developed norm-referenced test. Thirty-three states have developed their own tests, either adapting an available commercial test to their needs (typically producing a criterion-referenced multiple-choice test) or developing their own test from scratch (usually producing a criterion-referenced performance-based test).

At the time their states mandated new tests, officials in local districts that were already testing faced a choice. They could replace an existing test with the state-mandated test and thus hold to the same number of tests, or they could add the state-mandated test to their testing program. Evidence from our surveys indicates that, in making their choice of whether or not to drop a test, local school officials considered the state-mandated test's similarity in purpose and content to their existing test. As shown in table 4.2, we found that when the new state-mandated test was very similar, district officials tended to drop their existing local test. They were much more likely to keep their own test and add the new one when the state test differed in purpose or content.
Table 4.2: School District Responses to State Testing Mandates

State and local tests' purpose and content    Districts substituting state test
Exactly the same or very similar              82%
Somewhat or moderately similar                69
Not at all similar or very little             41

Still, 41 percent of districts dropped their own test even when it was different from the state's. The most common reasons cited by our survey respondents were that the new state test made their overall testing program too large or the new test was of higher quality than the old.

In conversations with officials from commercial testing firms and from our systematic sample of 50 school districts affected by state testing mandates, we learned that school districts try to spread the testing burden evenly across grade levels. When school districts add state-mandated tests at certain grade levels, they often move other tests to other grade levels. Some school districts augment the information gained from the state-mandated commercial tests by administering the same tests at other grade levels, at their own expense.6

This predilection of school districts to even out the testing burden across grade levels is corroborated in our national sample. As was mentioned in chapter 2, statewide tests tend to be concentrated at grades 3, 4, 6, 8, and 11. But districtwide tests, which include both statewide and exclusively local tests, are spread fairly evenly across the grades. So exclusively local tests are concentrated in the grade levels in which state tests are not administered.

6In our systematic sample of 60 school districts adopting state-mandated tests, 26 of them simply dropped an old test and 4 kept an old test in some of the same grade levels as the new state test. However, 16 districts kept an old test but moved it to grade levels not covered by the new state test. Six districts paid to supplement the state-mandated test in nonmandated grade levels.


Three Alternatives for a National Examination System

We projected the costs and other effects of three hypothetical alternative national testing plans, drawn from current debates. For example, the congressionally mandated National Council on Education Standards and Testing (NCEST) reviewed three main options in its work in 1991-92.7 The several possible structures for a national examination system that NCEST considered can be summarized by three general types: a single national multiple-choice test; a single national performance-based test; and several clusters of states, each administering a different performance-based test.

NCEST finally recommended the latter structure, which could employ several national performance-based tests, leaving the states free to choose among them. A cluster would be formed when several states decided on a particular test or developed one themselves. Conceivably, some or all of the current state-developed performance-based tests could be incorporated in a cluster system.8

Possible Responses to Each Plan

Knowing how districts responded in the past to mandated tests similar or dissimilar to those already in use, we assessed what could happen under each of the three alternative national test scenarios just described. From knowing past behavior in dropping tests or not and how many districts have tests similar to those proposed, we derived estimates of how many districts would replace current tests and how many would increase their testing programs by not replacing any current tests. From these data we derived further estimates of overall increased cost and testing time. The details of our procedure are shown in appendix II; the results are shown in table 4.3.

7By Public Law 102-62 (signed June 27, 1991) the Congress created NCEST to report by January 1992 on the desirability and feasibility of national standards and tests and on planning an appropriate system of tests. At the time, several independent groups had already proposed structures for a national examination system. In its first several months, NCEST studied those proposed structures and others generated by its own members.

8NCEST, Raising Standards for American Education (Washington, D.C.: January 24, 1992).

Table 4.3: School District Responses to Three National Test Alternatives

                                                Test alternative
                                                Single             Single                Clusters of
Response                                        multiple choice    performance-based     performance-based (a)
Add new test; drop old test                     74%                52%                   30%
Add new test; keep old test                     26%                48%                   43%
Marginal effect
  Additional annual cost (millions)             $42                $209                  $193
  Additional testing time
  (minutes per year per student)                15                 30                    25

(a) Our calculations assume that under a cluster system, 27 percent of school districts that now or soon will administer state performance-based tests will incorporate them into the cluster system.

We found that a single national multiple-choice test, which would most overlap with current testing, would add the least new time and money cost, as 74 percent of districts would drop some existing test. We estimated earlier in the chapter that the total absolute cost of a national multiple-choice test would be $160 million. Considering the 26 percent of districts that would maintain their existing test while adding the new national test, we estimate that only $42 million would be new costs. The remaining $118 million is already being spent, but on tests that would be replaced. We estimated only a small change in overall testing time per student per year, about 16 minutes more.
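
The split between new and already-spent costs for the multiple-choice alternative appears consistent with a simple proportional allocation: the share of districts keeping their old test times the total cost. The Python sketch below illustrates that reading. It is an assumption made for illustration only; the report's actual procedure is described in its appendix II and may differ, and the function name is hypothetical.

```python
# Illustrative split of a national test's total cost into "new" spending
# (districts that keep their old test) and spending that displaces
# existing tests. Proportional allocation is an assumption made here.


def split_cost(total_millions: float, share_keeping_old_test: float):
    """Return (new_cost, replaced_cost) in $ millions."""
    new_cost = total_millions * share_keeping_old_test
    replaced_cost = total_millions - new_cost
    return new_cost, replaced_cost


# Figures from the text: $160 million total, 26 percent of districts
# adding the national test without dropping an existing one.
new_cost, replaced_cost = split_cost(160, 0.26)

print(f"New costs: about ${new_cost:.0f} million")
print(f"Already being spent: about ${replaced_cost:.0f} million")
```

The same proportional rule does not reproduce the performance-based figures in table 4.3, which suggests that the full procedure takes additional factors into account.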
Because performance-based tests are much less commonly used, they would bring something new to many more districts, and as we have shown, districts facing such a choice are much more likely to add a mandated test that is different without replacing an existing test. Thus, as table 4.3 shows, we estimated that from 43 percent to 48 percent of districts would add a national or cluster performance test without replacing any current test, thus yielding a higher level of new costs (from $193 million to $209 million) and between 26 minutes and 30 minutes more testing time per year for the average student.
The single national multiple-choice test emerges as the least expensive alternative for two reasons. First, multiple-choice tests are inherently less expensive than performance-based tests to administer and to process. Second, they impose fewer new costs because they duplicate current testing the most.

Incidental Costs and Benefits of Adding a National Test

Replacement Disruption

The addition of a national test will affect more than the extent and cost of testing in the United States. It could disrupt present systems and testing programs. For example, school districts that drop currently used multiple-choice tests in favor of a new national multiple-choice test would give up some test familiarity, trend data, and perhaps a curricular alignment with the test. And if enough school districts abandon commercial and state-developed multiple-choice tests, some test developers could lose their jobs.

Similarly, school districts that drop currently used state performance tests in favor of a single national performance-based test would give up some test familiarity, trend data, and perhaps a curricular alignment with the test. Moreover, many state testing officials who now develop and administer state performance-based tests might find their jobs obsolete.

Presumably, with a cluster system, states would be able to join a cluster with a test that closely matches their curriculum, if they have one. So little would be lost in curricular alignment. Moreover, if states that currently administer their own performance-based tests were allowed to keep them by starting a cluster, no state testing officials would be displaced.

Windfall Benefits and Added Costs

Adding a test to a school district's testing program can be viewed as a benefit or as a burden: a benefit if the test is wanted, a burden if it is not. When a school district receives a desirable test free, as it might by participating in a national examination system, it receives a windfall benefit. It gets a test it wants without purchase or development costs. If the school district's administrative time costs were subsidized as well, the test would be an even greater benefit.

When a higher level of government mandates that a local school district administer an unwanted test, the test would create added costs to the school district unless the personnel time in administering the test was completely subsidized and did not detract from regular instruction (in which case, the new test would have a neutral effect). In most cases, a new test adds both benefits and costs to a testing program.

Efficiency Benefits

Efficiency benefits can result from the achievement of one or more economies in testing, such as those mentioned earlier: scale, scope, or learning. One might think that conversion to single national tests, whether multiple-choice or performance-based, would produce economies of scale. But scale economies in testing seem to be exhausted at a scale smaller than the whole country. Our analysis earlier in this chapter showed that scale economies in state performance-based testing seem to be exhausted at about the level of a large state. The fact that three profitable test publishers coexist selling several different nationally normed multiple-choice tests suggests scale economies might be exhausted for multiple-choice tests as well. (Otherwise, one of the three companies could undercut the other two on price, enlarge its market share while lowering its costs, and drive its competitors from the market.)

Performance-based tests are now being developed and administered by some relatively small states, and the larger scales of either a single national performance-based test or a cluster system should engender some scale economies.

A cluster system could produce some learning benefits. Several tests in separate clusters would, essentially, compete with each other for the allegiance of states, who would be free to select the clusters of their choice. Competition among the several clusters should provide incentive for them to learn how to lower costs and improve quality in order to retain the states within the cluster and to attract others. Learning effects can be particularly important with new or relatively undeveloped technologies. The more prevalent the opportunities to experiment with new methods, the faster the technology can develop.
Test Matching Costs

In and of itself, a national system of state performance-based tests would not provide comparability of test scores across clusters. For example, the median student test score in one cluster of states might not represent the same level of academic achievement as the median student test score in another cluster of states. The two clusters might have very different tests or tests that differ in their level of difficulty. Arranging the tests to produce comparable scores will require some effort and coordination. If such test-matching is to be done, it should be considered as a cost, one unique to the cluster design.

Table 4.4 summarizes the incidental costs and benefits of the proposed national tests.

Table 4.4: Incidental Costs and Benefits of Proposed Tests

Current testing: States with no state test
  Single national multiple-choice test: Add a test; windfall benefit (a) and added cost (b)
  Single national performance test: Add a test; windfall benefit and added cost
  Clusters of performance tests: Add a test; windfall benefit, added cost, and test-matching cost (c)

Current testing: States with multiple-choice tests
  Single national multiple-choice test: Do not add a test; or replace a test; replacement disruption (d)
  Single national performance test: Add a test; windfall benefit and added cost; or replace a test; replacement disruption
  Clusters of performance tests: Add a test; windfall benefit, added cost, and test-matching cost

Current testing: States with performance tests
  Single national multiple-choice test: Add a test; windfall benefit and added cost; or replace a test; replacement disruption
  Single national performance test: Do not add a test; or replace a test; replacement disruption
  Clusters of performance tests: Join a cluster; efficiency benefit (e) and test-matching cost; or replace test; replacement disruption and test-matching cost; or do not add or replace a test

(a) Windfall benefit: State gets a new test to use free of some of its costs.
(b) Added cost: New test would be administered along with old test (or for the first time).
(c) Test-matching cost: Effort required to make scores comparable across clusters.
(d) Replacement disruption: In replacing an old test, a district may give up familiarity, trend data, or curricular alignment; commercial test publishers may lose customers; and employees managing state tests may no longer be needed.
(e) Efficiency benefits: Clustering helps small states as a group to reach efficient scale in testing.

Summary

Looking at a sample of states with performance-based tests much like those proposed for a national examination system allowed us to estimate the cost of a national system. Our best estimate is $330 million, and different alternatives could cost from $160 million to $640 million, far lower than the estimates of some national test opponents, and far higher than those of some proponents. Economies of scale, scope, and learning imply that the cost of any national system of exams should decline over time.
In general, when a school district in our sample adopted a mandated state test, it was more likely to abandon an existing test if the two were similar in purpose or content, and more likely to retain an existing test if the two were different. This suggests that the purpose or content of a voluntary national test might determine the degree of overlap with existing testing programs. If the national test is different from existing tests, a district (or state) may just add the national test. If the national test is similar to an existing test, a district (or state) may jettison the existing test or not adopt the national test, effectively not enlarging its testing program.

Each of three alternative plans for a national examination system would likely have different effects on the extent and cost of testing in the United States. A single national multiple-choice test would likely replace tests now in use in three-quarters of U.S. school districts (90 percent of which would be other multiple-choice tests) and would add $42 million overall and 16 minutes per student to the current cost of testing ($516 million and 3.4 hours per student per year). A single national performance-based test would likely replace tests now or soon to be in use in just over half of U.S. school districts (42 percent of which would be other performance-based tests) and would add $209 million and 30 minutes of testing per student. A cluster system of performance-based tests would likely replace tests now in use in 30 percent of U.S. school districts and would add $193 million and 26 minutes of testing per student.

Chapter 5

Testing Officials' Views on the Benefits and Costs of Present and Future Testing

The views of local and state school administrators on school testing can be important for several reasons. First, the administrators implement the present school testing programs and will determine much of the character of any new ones. Second, they are in a position to make informed judgments about the value of their current tests. Third, for some of the information we were asked to obtain, such as the benefits of testing, there is virtually no other practical way to get it.

This chapter presents a summary of the views of state and local testing officials on the benefits and costs of their testing programs, their perspectives on future trends in testing, and their reactions to the concept of a national exam or national examination system.

Benefits and Costs of Current Tests

Two survey questions addressed the net benefits (total benefits minus total costs) of local and state testing programs. Respondents from both groups strongly believed that the net benefits of their present testing programs were positive. Seventy-five percent of state respondents felt that way (compared to 6 percent who felt the opposite) and 43 percent of local respondents felt that way (compared to 18 percent who felt the opposite). Those local district respondents who were testing directors were almost twice as likely as local district superintendents to see their testing programs' net benefits as positive, though even superintendents leaned strongly in that direction. All our state respondents were either testing directors or administrators in the testing programs.

State respondents believed strongly that net benefits would increase if their testing programs were somewhat larger; 62 percent indicated so (compared to 6 percent indicating the opposite).2 But a larger state testing program necessarily means larger district programs, unless district administrators jettison an existing test when their state mandates a new one. At the local level, slightly more respondents (28 percent versus 22 percent) thought net benefits would decrease than thought they would increase with a somewhat larger district testing program, but 40 percent thought net benefits would remain the same. Thus, 62 percent of local respondents thought net benefits would increase or remain the same with an additional test.

1An additional 16 percent of state respondents and 34 percent of local respondents thought that the benefits and costs of their testing programs were about equal. Five percent of each group replied with "don't know" or "no opinion." The exact wording of the question was, "Do you believe that the benefits of your state's/district's present testing program are greater than the costs, that the costs are greater than the benefits, or do you believe that they are about equal?"

2Twenty-three percent of the state respondents felt that the net benefits would remain about the same if their testing programs were somewhat larger, while another 21 percent replied "don't know" or "no opinion." The exact wording of the question was, "Do you believe that the net benefits (total benefits minus total costs) to your state/district would increase if your testing program were somewhat larger, would decrease, or would remain about the same as now?"

When asked in open-ended questions to list their testing programs' chief benefits, the local respondents overwhelmingly mentioned such benefits as diagnosis and evaluation information for students, parents, schools, programs, or districts. By contrast, other potential benefits, accounting for less than 15 percent of the responses, concerned positive classroom outcomes (improved student performance, curriculum alignment with standards, and so on), positive products of the assessment process (clear standards, better public understanding, teacher edification, and so on), or accountability at any level.3 Clearly, local school officials viewed tests as helpful diagnostic instruments, though not clearly linked to practice or results, even while others perceived different purposes for the same tests.
State respondents were more likely than local respondents to mention other types of benefits, such as accountability (33 percent) or maintenance of common, clear standards (11 percent). Otherwise, they mentioned most often the diagnostic benefits.

When asked to list the chief costs of their testing programs, the local respondents referred usually to direct costs, such as the test purchase, administration time, or scoring fees, or to opportunity costs, chiefly the loss of teaching and learning time. Though the question directed respondents to think of costs in a broad sense, few mentioned such problems as teaching to the test, misuse of test results, or stress. Much the same was true among the state respondents, though with them, test development was also often mentioned as a cost.

The Future of Testing

Majorities among both the local and state respondents saw more tests in their future, whether they liked it or not. Fifty-nine percent of the local respondents anticipated more state-mandated tests in particular, despite the fact that all but a few states already use at least one state-mandated test. Forty-six percent of the local respondents and 61 percent of the state respondents believed that the proportion of their education budgets devoted to testing would grow in the near future (compared to 6 and 2 percent, respectively, who believed the opposite). Plus, very large
3Accountability was defined in the questionnaire to mean that "assessment [is] used to determine promotion, retention, or graduation" at the student level; that "results are used to help determine principal's retention, promotion, or bonus," or "cash awards to, honors for, status of, or budget of the school" at the school level; that "information is made public and voters or school board can instigate systemwide change" at the district level; and that "information is made public and voters or legislature can instigate systemwide change" at the state level.


majorities from both groups believed their testing programs were secure; that is, no more susceptible to budget cutbacks, or even less so, than other educational programs.

A majority of local respondents believed that the proportion of tests that are commercially developed will not change in the near future. A clear majority of state respondents, however, believed that they will rely less on commercially developed tests (only one state respondent expected to rely on them more). Both state and local respondents identified a trend toward more use of criterion-referenced and away from norm-referenced tests and a trend toward more use of performance-based and away from multiple-choice formats. Very large majorities among the state respondents confirmed the two trends: 70 percent predicted more criterion-referenced tests (compared to 2 percent predicting more norm-referenced tests), and 87 percent predicted more performance-based tests (compared to no state respondents predicting more multiple-choice tests).4
When asked in open-ended questions to identify the most positive contemporary trends in student assessment, state testing directors mentioned most often: more performance-based and "authentic" tests (52 percent); improved testing procedures in general, such as criterion-referencing, less cultural bias in test items, and testing "higher order skills" (38 percent); and testing as part of integrated educational programs (11 percent). When asked to identify the most negative contemporary trends, they mentioned most often: misuse of test results to compare districts or states that are not alike as if they were or to make unwarranted inferences about students (47 percent); use of unproven methods (for example, performance-based tests, 25 percent); and too much testing or too much emphasis on testing (21 percent). Table 5.1 summarizes these responses.

4In addition, 17 percent of state respondents predicted about the same proportion of criterion-referenced to norm-referenced tests, while 11 percent replied "no opinion" or "not applicable." Eleven percent of state respondents predicted about the same proportion of multiple-choice to performance-based tests, while 2 percent replied "no opinion" or "not applicable."

Table 5.1: Positive and Negative Trends in Testing

Trends                                                          Respondents (a)
Positive
  More performance-based tests                                  52%
  Improved testing procedures (less bias, higher order
    skills tested)                                              38
  Testing as part of an integrated educational system           11
Negative
  Misuse of test results                                        47
  Use of unproven methods (including performance-based tests)   25
  Too much testing or emphasis on testing                       21

a State testing directors responding to our survey. Up to three responses were counted from each respondent, so the percentages of all responses do not total 100.

Reaction to a National Examination
Our questionnaires were developed in 1991 when many proposals centered on a single national test, so we asked respondents for their reaction to a "voluntary national achievement test." Thus, their responses reflect their reaction to the idea of a single test. Were the questionnaire written today, that phrase might have been altered to read "national examination system," to better reflect the widely discussed recommendation of NCEST against a single test and in favor of a system incorporating several different tests. Some others who were opposed to a single test might favor such a "cluster" system of exams. We did, however, attach a series of open-ended questions to our survey that referred to a potential "national examination system."
The survey, then, asked respondents which factors, among 12 posed, would be most important to them if it were their responsibility to choose to adopt a voluntary national test. Among the 12, 3 factors were considered the most important by both groups: the quality of the national test, the cost to the state or district of administering the test, and the usefulness of the test results to state or local internal evaluations. Judging from their responses to other survey questions, we believe that our respondents would consider a test to be of higher quality to the degree that it covers what their teachers teach and measures diverse skills, by including some performance-based response formats as well as content-based, multiple-choice formats.
Other factors considered important, but less so than these three just mentioned, were those involving the fit between a national test and existing district or state tests. Kinds of fit that respondents said were important included, in order of decreasing importance, similarity in content, purpose, grade level, test type, or time of year when given. A national test would be less accepted, that is, to the degree that it differed from current practice on these dimensions.
State respondents noted that if it were their responsibility to choose to adopt a voluntary national test, they would find it extremely important if the national test proposal were accompanied by pressure to adopt or not adopt from forces outside, such as the governor, the state legislature, or public opinion. Local respondents judged these considerations somewhat less important. When asked which factors they would consider most important in deciding to drop an existing test in favor of a national test, the relative ranking of factors mirrored that for the previous question on the simple adoption of the national test.
Separate, open-ended questions that asked for the perceived advantages and disadvantages of a "national examination system" revealed a good deal of opposition to the idea (see table 5.2). Forty percent of the local district respondents and 29 percent of the state respondents offered that there were no advantages or that they could not think of any. One positive advantage mentioned by a sizable number of respondents (over half of the state respondents and 32 percent of the local respondents) concerns the common metric and basis for comparison of performance that a national testing system could provide.
Table 5.2: Advantages and Disadvantages of a National Examination System

                                                          Officials responding (a)
Response                                                  State     Local district
Advantages
  No advantages or cannot think of any                    29%       40%
  Common bases for comparison, clear standards            53        32
Disadvantages
  Misuse of test results                                  41        26
  Push for national curriculum or a decrease in local
    control                                               25
  Mismatch of test to local curriculum                    20
  Teaching to the test or a narrowing of curriculum       16
  Use of restrictive or narrow testing formats            14

[Local district values for the last four rows: only 14, 4, and 17 are legible in this copy, and their row alignment cannot be recovered.]

a Up to two responses were counted from each respondent, so the percentages of all responses do not total 100.

The potential linkage of such expanded testing to a national curriculum, the clear decrease in local control, and a lack of match of the tests to local curricula were mentioned often as disadvantages of a national exam system. Other disadvantages often mentioned concerned misuses of tests in general (not necessarily just a national test), such as the inappropriate comparison of unlike districts or states, inaccurate reporting of test results, teaching to the test, narrowing the curriculum, and use of restrictive or narrow testing formats.

A Trade-Off Between Test Quality and Cost

Although our survey respondents did not seem opposed to more testing, they were particular about the kinds of tests they favored. They indicated a preference for performance-based tests with the content based on their state or local curricula and results that can serve local purposes, such as student, school, or curriculum diagnosis.

This desire does not necessarily match present practice. As we discussed in chapter 2, most state respondents reported no required state curriculum in 1990-91. Overall, state respondents identified only 46 percent of their statewide tests as largely or perfectly aligned with their state curricula. Even so, curricula, whether state or local, whether specified or just ad hoc, do not vary so much that disparate school districts cannot still use the same textbooks, virtually all of which are sold nationally.
That the kind of testing our respondents want is not exactly what they now have does not invalidate their wishes, however. Testing directors and local superintendents and administrators work within the constraints of budgets and state mandates and cannot always completely control the make-up of their testing programs. Besides, it seems logical that they would desire a positive addition to their present testing programs. A national multiple-choice test (the low-cost alternative) would be largely duplicative; a curriculum-based performance test would be, for most districts, something new.
Some would argue, moreover, that the present commercially developed multiple-choice tests already are national tests; they are designed and developed with information drawn from national samples of students in pilot tests, and then they are sold nationally. Some critics of these tests have argued that the test publishers do not update the material in these tests often enough, that their test security is often lax, or that they test only a narrow range of skills and do not challenge the students enough even within that range. Still more criticisms have been leveled against these tests.5

To be fair, however, we note that a sizable minority (25 percent) of state testing directors saw the use of performance-based tests as a negative trend. We cannot be sure, but they may have said this because of the undeniable fact that multiple-choice tests do have some advantages other than cost over performance-based tests. First, because multiple-choice test questions can be answered quickly, many more of them can be answered within a given time period. Thus, multiple-choice tests can cover the content of a subject area far more quickly than can a performance-based test. Second, because multiple-choice tests limit the domain of possible answers and only one is correct, scoring the exams can be done quickly and with near-perfect consistency. Machines score multiple-choice tests, and every test is scored the same way. Individuals score performance-based tests, and each scorer may have a different idea of which answer is correct, how it should be expressed, and what score certain answers should get.6
Regardless of the test format, some efforts to cut costs can threaten test quality. To save money, testing officials can update the content of tests less often, develop shorter tests or fewer forms of a test, use fewer teachers to score performance-based test items, or make no effort to tie the content of a test to the subject matter actually taught in the schools. These particular cost-saving efforts can threaten test quality by decreasing the degree to which individual test results genuinely and accurately represent a student's knowledge.

Summary

Our respondents generally told us that they believed the net benefits of their testing programs were positive and would increase or remain the same if more tests were added. Thus, our local district and state
5A wide variety of problems with tests were raised in testimony before the House Committee on Education and Labor in 3 days of hearings on the NCEST proposals. See House Committee on Education and Labor, Oversight Hearing on the Report of the National Council on Education Standards and Testing, serial no. 102-106 (Washington, D.C.: U.S. Government Printing Office, February 19, 1992).

6For example, serious problems of reliability surfaced in a recent evaluation of one state's portfolio assessment (that involved teacher ratings of selected examples of students' writing and math). In this case, standardized scoring conditions (a major preventive against unreliable ratings) may have been difficult to obtain as a large number of teachers took part and their training was modest. See Dan Koretz, et al., The Reliability of Scores From the 1992 Vermont Portfolio Assessment Program: Interim Report, Technical Report No. 366 (Los Angeles: UCLA Center for the Study of Evaluation, December 1992).


respondents seemed not to be opposed to more tests, though the local districts, where the tests are administered, may be closer to the saturation point than the states. Moreover, though both state and local respondents were open to more testing, they were particular about the type of tests: they worried about their quality, purpose, and locus of control over content and administration.

Very clearly, local districts told us they use tests (and believed they should use them) as diagnostic instruments, to assess and improve the performance of students, programs, schools, or districts, rather than as accountability measures. Our respondents have indicated a preference for well-designed tests that served local purposes, such as student, school, or curriculum diagnosis.
Finally, the survey revealed a large amount of opposition in fall 1991, particularly at the local level, to the concept of a national test or national examination system. Tempering that opposition was an acknowledgment by 53 percent of state officials and 32 percent of local officials that a national examination system could provide a common, clear basis for comparing academic performance across the United States.


Chapter 6
Conclusions and Matters for Consideration

Conclusions

A National Examination System May Not Be So Costly

Our estimates for the cost of a national examination system are higher than those of some national test proponents, but lower than those of some opponents. Our best estimates for the most likely type of test show a national cost near $330 million annually, or about one-tenth the amount that some test opponents have suggested. Of this, we estimate that close to $200 million would be new costs, while the rest would be compensated for by replacing some current tests. Start-up test development could add a one-time cost of $100 million. A national examination system would likely increase by up to 30 minutes the average amount of systemwide testing time per student, increasing the national average to 4 hours per student per year, an amount of time that still does not seem unduly burdensome, especially in view of the powerful potential information gains.

Some Opposition to a National Test Exists

Though our respondents did not seem opposed to more testing that met certain quality, utility, and reporting criteria, many expressed opposition to a national examination system. That is, they opposed a national examination system in the abstract without knowing its particular characteristics. This opposition should give pause to advocates of a national system who are counting on the cooperation and support of state and local education officials who will likely be the ones responsible for administering and preparing the students for the exams. If they remain opposed to the idea, either because no one has convinced them of its worth, because they see it as a useless or harmful imposition, or because they do not see themselves involved, success is less likely.

No One Plan Dominates the Others
No plan is a clear winner. Table 6.1 compares three alternative national testing plans on the three main criteria we examined (cost, overlap, testing officials' preferences) as well as on three others where they have obvious, well-established differences (familiarity of method, comparability of scores nationally, and alignment of test and curriculum). A single national multiple-choice test offers lower cost, strong comparability of scores, and the most familiar methodology. A cluster system of performance-based tests overlaps less with present testing, but may better match the preferences of state and local testing officials and has more chance for curricular alignment. The third option, a single national performance-based test, could provide stronger test score comparability than the cluster plan and potentially stronger influence toward national standards and curriculum. Obviously, preferences for certain criteria or for the alternatives will vary among teachers, other educators and officials, and the public.

Table 6.1: Evaluating the Three National Test Alternatives

                                     National examination system alternative
Criterion                            Single                 Single                 Clusters of
                                     multiple-choice        performance-based      performance-based
Cost                                 Not costly             More expensive         More expensive
Overlap with present testing         More                   Less                   Least
Testing officials' preferences       Least                  More                   Most
Methodology                          Familiar               Less familiar          Least familiar
Comparability of test scores
  nationally                         Strong                 Good                   Weak
Curricular alignment                 Strong if curriculum   Strong if curriculum   Possible even with
                                     is national            is national            diverse curricula

Matters for Congressional Consideration

Involvement of State and Local Educators

If the Congress wishes to build support for a national examination system among teachers and state and local administrators, it should consider specific ways to encourage their involvement in the process of curriculum development, standard-setting, and test development, administration, and scoring. This would improve the likelihood of success of a national system as local teachers and administrators should be an integral part of any test administration.
Done this way, test development efforts can still try to benefit from the lower cost, adherence to common standards, curricular integration, and other potential advantages of large-scale assessment while trying also to overcome local fears and alienation. Teacher involvement in test development seems to strengthen teacher adherence to standards and curricular integration and to relate testing to improvements in teaching and learning.

Involving state testing officials in the planning and execution of a national system could be advantageous for two reasons. First, officials in the many states with active and sophisticated testing programs have developed a great deal of expertise in large-scale testing and, thus, have much to teach anyone planning a national system. Their expertise in technical aspects of testing may be shared by many experts in universities and elsewhere. But their expertise in the implementation of large-scale testing programs involving different types of tests and involving several groups of stakeholders is shared by few others.

A second reason for involving state testing officials in the planning and execution of a national system is to benefit from their views on the most orderly transition to a national system. Many state testing programs are now well-established or soon will be. Several others are being planned. Some of these programs are large, sophisticated, and complicated, and a national system will, inevitably, affect them.

Ensuring the Validity and Reliability of Tests

If the Congress wishes to encourage the development of a well-accepted and widely used national examination system, it should consider means for ensuring the technical quality of the tests. Large-scale performance-based testing, in particular, is both popular and new: only one state performance-based test is more than 6 years old and only two are more than 3 years old. Its newness suggests that development of appropriate, valid, and reliable tests and of efficient methods for scoring them will require some trial, effort, and time. State performance-based test development periods ranged from just 1 year to 3 years. Creating a national system of any kind, however, will be an endeavor of unprecedented scope. Coordinating the efforts of several layers of government alone should challenge the best of planners.
Test quality will require an enduring commitment and sufficient resources to ensure that any tests in a national system are valid and reliable. Pressures to cut corners and degrade the quality of tests are inevitable. Money and time can be saved, at the expense of high quality, for example, by creating fewer forms of a test, forgoing pilot tests of test items, shortening the length of a test, or relaxing security. The need for quality controls is underscored by the views and preferences of the testing officials who responded to our survey. They prefer tests with high-quality characteristics and they worry that a national examination will not embody those characteristics. And they worry that test results may be misrepresented and misused.

It is, however, beyond the scope of our study to suggest what means should be used to ensure quality in a national system of examinations. The National Council on Education Standards and Testing has proposed that a national technical panel be appointed for this purpose. There are other possible ways to ensure quality. In view of the sizable controversy over current testing, and the potential for incorrect decisions based on flawed test data, quality assurance in an expanded system is extremely important and should be explicitly and proactively considered in any national examination system implementation plan.


Appendix I
Sample Survey: Statistical Analysis

The representative national sample from which we derived our estimates on the cost, extent, and nature of testing in the United States consisted of 500 school districts. We received 368 completed questionnaires from this group, for a 74-percent response rate.

Differences in Survey Response Rates Among Respondent Groups

National estimates built up from sample survey data can be shaky if some groups surveyed did not send back many responses. We analyzed differences in survey response rates among groups based on all the characteristics for which we had information: metropolitan status of district (urban, suburban, or rural), district student population size, number of statewide tests in the district's state, number of statewide criterion-referenced tests, and number of statewide performance-based tests.

Only the difference in response rates across districts with different student population sizes proved to be statistically significant. We used a statistical test called the chi-square, which signals the likelihood that a pattern of differences among groups would prove to be, upon further repetition, consistent. A lower chi-square statistic suggests a strong probability that the response rates among respondent groups are truly the same, and a higher chi-square statistic suggests a low probability that the rates are truly the same.
For district size, the chi-square was a relatively high 11.031, with a very small chance, a probability level of 0.004, that the response rates were actually the same among the respondent groups. As shown in table I.1, the other chi-squares were low, with corresponding high probabilities of truly similar response rates among the respondent groups.

Table I.1: Chi-Square Tests Comparing Survey Response Rates Among Respondent Groups

Respondent characteristic and groups                             Chi-square   Probability
Metropolitan status (urban, suburban, rural)                       2.35         0.308
District student population size (small, large, very large)       11.03         0.004
Number of statewide criterion-referenced tests (0, 1, or 2)        1.28         0.527
Number of statewide performance-based tests (0, 1, 2, or 3)        3.15         0.369
Number of statewide tests (0, 1, 2, or 3)                          2.01         0.571

District student population was categorized in three sizes: small (lower than 3,500 students), large (between 3,500 and 35,000 students), or very large (more than 35,000 students). The response rates varied for the three groups, with a 69-percent rate from small districts, an 83-percent rate from large districts, and a 79-percent rate from very large districts. The differences in response rates among the three sizes of districts do not imply a bias in the estimates toward the large and very large districts, because all the estimates were weighted. For example, one cell that represents 30 districts in the United States could be represented by 7 districts responding to our survey. Another cell that represents 300 districts in the United States could be represented by another 7 districts responding to our survey. In the first case, the responses from the 7 districts are given the weight of 30 districts in the national estimates, and in the second case, the responses of the 7 districts responding to our survey are given the weight of 300 districts in the national estimates.
The nati o nal
esti m ates
woul d not be bi a sed
in
di s tri c ts,
for the smal l di s tri c ts
are suffi c i e ntl y
sampl e
through
the wei g hti n g.
However,
i n thi s
smal l di s tri c ts
woul d be l e ss rel i a bl e
(i . e., l e ss
woul d be based on a smal l e r
percentage
of the

favor of l a rge and very l a rge


represented
i n the nati o nal
exampl e
the esti m ates
for
accurate)
because
they
group.
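The weighting in the example above can be sketched as follows. The cell sizes (30 and 300 districts, 7 respondents each) come from the text; the reported hours are hypothetical values invented purely for illustration, and the `Cell` class and `weighted_mean` function are not from the report.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    population: int         # districts this cell represents nationally
    responses: List[float]  # values reported by the responding districts

    @property
    def weight(self) -> float:
        # Each respondent stands in for population / respondents districts.
        return self.population / len(self.responses)


def weighted_mean(cells: List[Cell]) -> float:
    """National estimate: every response counts in proportion to its cell weight."""
    total = sum(cell.weight * sum(cell.responses) for cell in cells)
    districts = sum(cell.population for cell in cells)
    return total / districts


# Hypothetical testing hours: 3.0 in the first cell, 4.0 in the second.
cell_a = Cell(population=30, responses=[3.0] * 7)
cell_b = Cell(population=300, responses=[4.0] * 7)

print(round(cell_a.weight, 2), round(cell_b.weight, 2))
print(round(weighted_mean([cell_a, cell_b]), 2))
```

With equal numbers of respondents, an unweighted average would treat the two cells alike; the weights restore each cell's share of the national total.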

The estimates we derived from the group of small districts should be reliable, however, because the response rate for the group was still rather high: 69 percent. The group was large: 214 school districts responded, out of 312 surveyed. And it was, almost certainly, the most homogeneous (in terms of the extent and cost of testing) of the three district sizes. Small districts were well represented in the respondent group because there were so many in the original sample: 312 of the original 500 were small districts. By contrast, only 42 of the original 500 were very large districts.
We cannot, of course, demonstrate that nonrespondents might not be systematically different on other (nonmeasured) factors. Again, however, the response rate was sufficiently high to mute such concerns.

Confidence Intervals on Key Estimators

Presented below in table I.2 are the 95-percent confidence intervals for the key variables in the report. The estimates are provided for the sample as a whole. Standard errors for all variables are available from our office upon request.


Table I.2: 95-Percent Confidence Intervals for Key Variables

Variable                                                              Estimate       Lower bound    Upper bound
Average amount of hours spent taking test, per student per year      3.4            3.1            3.8
Average amount of hours spent in all test-related activity,
  per student per year                                                6.5            5.4            7.6
Average cost per test administration per studentᵃ                    $14.51         $12.61         $16.41
Average purchase cost per test administration per studentᵃ           $4.33          $3.79          $4.87
Average personnel time cost per test administration per studentᵃ     $10.18         $8.56          $11.80
Total cost of testing nationwide in 1990-91ᵃ                         $516 million   $448 million   $583 million
Total number of districtwide tests administered in 1990-91           35,600         32,700         38,500
Total number of individual test administrations in 1990-91           36 million     32 million     39 million

ᵃEstimate includes state- and district-level costs. The confidence intervals, however, pertain only to the district-level costs. Our district-level estimates were derived from a sample of districts. Our state-level figures are totals from the universe of all the states and, thus, are not estimates at all.
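As a rough illustration of how the bounds in table I.2 relate to standard errors, the sketch below backs out an implied standard error from a 95-percent interval. It assumes a symmetric normal-approximation interval (estimate plus or minus 1.96 standard errors); the report does not say how its intervals were constructed, so the result is only indicative, and `implied_standard_error` is an illustrative name.

```python
Z_95 = 1.96  # normal-approximation multiplier for a 95-percent interval (assumption)


def implied_standard_error(lower: float, upper: float) -> float:
    """Half the interval width divided by the 95-percent multiplier."""
    return (upper - lower) / (2 * Z_95)


# Average cost per test administration per student: $12.61 to $16.41
se_cost = implied_standard_error(12.61, 16.41)

# Average hours spent taking tests per student per year: 3.1 to 3.8
se_hours = implied_standard_error(3.1, 3.8)

print(f"cost: about ${se_cost:.2f}; hours: about {se_hours:.2f}")
```

The published bounds are rounded, so figures derived this way carry at least that much rounding error.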


Appendix II
Marginal Effect of Proposed Testing Over Current Testing

This appendix gives details of our analysis of how school districts would react to several national testing alternatives. It is based on responses to survey questions about how districts had responded in the past to state-mandated tests (see table 4.2) and on other information about districts' current tests.

Potential Response to a Single Multiple-Choice Test

If all school districts were to adopt a single national multiple-choice test, we estimate 74 percent of them would drop another test, thus not enlarging their testing programs. The remaining 26 percent would add the national test without dropping another test.
A single national multiple-choice test would clearly overlap in the 81 percent of school districts that now administer full-battery (multi-subject) multiple-choice achievement tests systemwide. Using the figure (from table 4.2) of 82 percent to estimate the proportion of districts with similar tests that would drop a current test, we conclude that 66 percent of all districts would do so (and 15 percent would not).

Similarly, using 41 percent (again, from table 4.2) as our estimate of the fraction of those districts that do not currently administer a multiple-choice achievement test but that would replace some current test¹ with the national test, we see that another 8 percent of all districts would drop a current test (and 11 percent would not).

This can be more easily visualized in a tree diagram of conditional probabilities as shown in figure II.1. The tree diagram shows that school districts, either with or without multiple-choice tests, might replace a current test, though the probabilities of that happening differ between the two groups. Adding the two different replacement probabilities (66 and 8 percent) together produces an overall replacement probability of 74 percent.
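The tree of conditional probabilities can be reproduced with a few multiplications. The sketch below uses the shares quoted in the text (81 percent of districts with full-battery multiple-choice tests, and replacement rates of 82 and 41 percent from table 4.2); the function name and dictionary keys are illustrative only.

```python
def replacement_tree(share_with: float, p_replace_with: float, p_replace_without: float) -> dict:
    """Split all districts into four branches and total the two outcomes."""
    share_without = 1 - share_with
    tree = {
        "with_test_replace": share_with * p_replace_with,
        "with_test_add": share_with * (1 - p_replace_with),
        "without_test_replace": share_without * p_replace_without,
        "without_test_add": share_without * (1 - p_replace_without),
    }
    tree["replace_total"] = tree["with_test_replace"] + tree["without_test_replace"]
    tree["add_total"] = tree["with_test_add"] + tree["without_test_add"]
    return tree


multiple_choice = replacement_tree(0.81, 0.82, 0.41)
for branch, share in multiple_choice.items():
    print(f"{branch}: {round(100 * share)}%")
```

The same function applied to a 27-percent starting share gives the branches of the performance-based tree discussed later in this appendix.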

¹That is, some current test other than a full-battery multiple-choice test. These are districts not currently administering full-battery multiple-choice tests, but administering other types of tests.

[Figure II.1: tree diagram of conditional probabilities for a single national multiple-choice test]

School districts with multiple-choice tests:
  0.81 x 0.82 = 66%
  0.81 x 0.18 = 15%
School districts without multiple-choice tests:
  0.19 x 0.41 = 8%

School districts that would replace a current test = 66% + 8% = 74%
School districts that would not replace a current test = 15% + 11% = 26%

74% = replacement testing; 26% = new, additional testing

Though we estimated in chapter 4 that a single national multiple-choice test would cost around $160 million a year to administer, the analysis here shows that some of this cost would be new and some of it would be compensated for by school districts dropping old tests and their costs. Using our replacement probabilities, we calculate that a single national


multiple-choice test would add only $42 million a year in new costs. The new testing would also add about 15 minutes to the average of 3.4 hours per student in systemwide testing. The calculations are shown below.

First-Order Conditions

Cost:    $16 per student per multiple-choice test
Number:  10 million students tested (3 grade levels)
         40 million U.S. students total
         26% of school districts adopting new tests
Time:    4 hours per test

Calculations

10 million x 0.26 = 2.6 million students
2.6 million x 4 hours = 10.4 million new hours
10.4 million ÷ 40 million = 0.26 hours (15 minutes)
0.26 hours x $16 per test x 10 million students = $42 million new costs
Of $160 million costs, $42 million new, $118 million replacement

Potential Response to a Single Performance-Based Test

Another tree diagram, figure II.2, illustrates the equivalent circumstances that would obtain if all school districts were to adopt a single national performance-based test.² Fifty-two percent of school districts would drop another test, thus not enlarging their testing programs. The remaining 48 percent would add the national test without dropping another test, thus adding to the extent and cost of testing.

²To make the numbers more relevant, we count among the school districts with performance-based tests all those in the nine states that plan to have statewide performance-based tests 3 years from now. Only seven states now administer statewide performance-based tests.

Figure II.2: Degree of Overlap With a Single National Performance-Based Test

[Tree diagram of conditional probabilities]

School districts with performance-based tests:
  0.27 x 0.82 = 22%
  0.27 x 0.18 = 5%
School districts without performance-based tests

School districts that would replace a current test = 22% + 30% = 52%
School districts that would not replace a current test = 5% + 43% = 48%

52% = replacement testing; 48% = new, additional testing

Though we estimated in chapter 4 that a single national performance-based test would cost around $330 million a year to administer, again the analysis here shows that some of this cost would be new and some of it would be compensated for by the districts dropping old tests. Using our replacement probabilities, we calculate that a single


national performance-based test would add about $209 million a year in new costs. The new testing would also add more than 30 minutes to the average of 3.4 hours per student in systemwide testing. The calculations are shown below.

First-Order Conditions

Cost:    $33 per student per performance-based test
Number:  10 million students tested (3 grade levels)
         40 million U.S. students total
         48% of school districts adopting new tests
Time:    4 hours per test

Calculations

10 million x 0.48 = 4.8 million students
4.8 million x 4 hours = 19.2 million new hours
19.2 million ÷ 40 million = 0.48 hours (30 minutes)
0.48 hours x $33 per test x 10 million students = $158 million new costs
Of $330 million costs, $158 million new, $172 million replacement

Because 30 percent of the $33 performance-based replacement tests will replace $16 multiple-choice tests:

10 million x 0.30 = 3 million students
3 million x ($33 - $16) = 3 million x $17 = another $51 million new costs
$158 million + $51 million = $209 million new costs

Potential Response to a Cluster System

The last tree diagram, figure II.3, illustrates the situation if all school districts not now administering state performance-based exams were to adopt a performance test from one of the national clusters. Fifty-seven percent of the school districts would drop another test, thus not enlarging their testing programs. Forty-three percent would add a national test without dropping another test, thus adding to the extent and cost of testing.

Figure II.3: Degree of Overlap With a Cluster System

[Tree diagram of conditional probabilities]

School districts with performance-based tests (join a cluster)
School districts without performance-based tests

School districts that would replace a current test or not need to add the new national test = 27% + 30% = 57%
School districts that would not replace a current test = 43%
School districts joining a cluster do not add or replace a test = 27%

30% = replacement testing; 43% = new, additional testing

Using our replacement probabilities, we calculate that a cluster system of performance-based tests would add $193 million a year in new costs. The new testing would also add more than 25 minutes to the average of 3.4 hours per student in systemwide testing. The calculations are shown below.

First-Order Conditions

Cost:    $33 per student per performance-based test
Number:  10 million students tested (3 grade levels)
         40 million U.S. students total
         43% of school districts adopting new tests
Time:    4 hours per test

Calculations

10 million x 0.43 = 4.3 million students
4.3 million x 4 hours = 17.2 million new hours
17.2 million ÷ 40 million = 0.43 hours (25 minutes)
0.43 hours x $33 per test x 10 million students = $142 million new costs
Of $330 million costs, $142 million new, $99 million replacement

Because 30 percent of the $33 performance-based replacement tests will replace $16 multiple-choice tests:

10 million x 0.30 = 3 million students
3 million x ($33 - $16) = 3 million x $17 = another $51 million new costs
$142 million + $51 million = $193 million new costs
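The three sets of first-order calculations in this appendix follow one pattern, which the sketch below restates. The constants come from the conditions listed above; the function names are illustrative. The printed calculations write the multiplier as "0.26 hours" (or 0.48, 0.43), but with 4-hour tests given to one-quarter of all students that number equals the share of districts adding a test, and the share is what the sketch multiplies by.

```python
STUDENTS_TESTED = 10_000_000  # three grade levels
ALL_STUDENTS = 40_000_000     # total U.S. students
HOURS_PER_TEST = 4


def new_testing(share_adding: float, cost_per_test: float):
    """Return (added hours per U.S. student, new cost in dollars)."""
    students = STUDENTS_TESTED * share_adding
    added_hours = students * HOURS_PER_TEST / ALL_STUDENTS
    new_cost = students * cost_per_test
    return added_hours, new_cost


def upgrade_cost(share: float = 0.30, new_price: float = 33, old_price: float = 16) -> float:
    """Extra cost where a $33 performance-based test replaces a $16 multiple-choice test."""
    return STUDENTS_TESTED * share * (new_price - old_price)


mc_hours, mc_cost = new_testing(0.26, 16)   # single multiple-choice test
pb_hours, pb_cost = new_testing(0.48, 33)   # single performance-based test
cl_hours, cl_cost = new_testing(0.43, 33)   # cluster system

pb_total = pb_cost + upgrade_cost()
cl_total = cl_cost + upgrade_cost()

for name, hours, cost in [("multiple-choice", mc_hours, mc_cost),
                          ("performance-based", pb_hours, pb_total),
                          ("cluster", cl_hours, cl_total)]:
    print(f"{name}: {hours * 60:.0f} minutes per student, ${cost / 1e6:.0f} million new costs")
```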


Appendix III
The Extent and Cost of Other Standardized Testing

To find the extent and cost of the most widespread tests, our surveys asked about systemwide testing done in U.S. schools. We defined systemwide tests as those administered to all students, almost all students, or a representative sample of all students in a school district in at least one grade level. Most standardized tests are given systemwide, but not all. Schools give some standardized tests only to certain groups of students.

How much standardized testing did we miss by our choice of tests to study? We think not much. This appendix gives our estimates of the extent of three kinds of tests beyond those covered in our survey: tests given to meet evaluation requirements of the federal Chapter 1 program, state advanced achievement tests, and college entrance tests.

Chapter 1 Testing

Chapter 1 is the federal program providing supplementary services for economically disadvantaged students. Most school districts receive some Chapter 1 funds, which are targeted to schools that exceed a minimum percentage of economically disadvantaged students. Thus, Chapter 1 funds may support activities at some schools within a school district but not at others.

To identify educationally disadvantaged students for services and also to check the Chapter 1 program's effects, participating schools must test students both at the beginning and at the end of a reporting period. The test employed must be nationally normed. Because they have nationally normed tests readily available, commercial test publishers supply virtually all the tests used for Chapter 1 testing. Furthermore, because the publishers have formatted all the nationally normed tests with multiple-choice questions, Chapter 1 tests are always in multiple-choice format.

According to Department of Education officials, some school districts use the Chapter 1 testing requirement as an opportunity to test all their students at one or more grade levels. Thus, instead of purchasing just enough test booklets for the students in their Chapter 1 schools, district officials purchase enough test booklets for all the students in certain grade levels. That way, they obtain information on all their students and they fulfill their Chapter 1 evaluation requirement, paying a lower price than they would if they tried to meet both evaluation objectives separately.

Department of Education officials believe that most school districts do their Chapter 1 testing this way, administering full-battery commercial tests to all students in certain grade levels in both Chapter 1 schools and non-Chapter 1 schools.


If the majority of school districts receiving Chapter 1 money do, indeed, test all students at the same time and with the same test as their Chapter 1 students, then most Chapter 1 testing is systemwide and is represented in the data the national sample of local school officials provided on our surveys. We have no way of precisely estimating how much Chapter 1 testing is systemwide and how much is not.

Even adding all Chapter 1 testing to our estimate of the extent of testing does not markedly increase our estimate, however. We calculated this extreme case, which assumes that no Chapter 1 tests were included in our surveys, so as not to underestimate the added testing burden caused by Chapter 1 evaluation. About 1.5 million students take Chapter 1 tests in reading and about 1 million students take Chapter 1 tests in math.¹ Department of Education officials told us the tests (given twice) take about 45 minutes per test administration. This amount of testing adds less than 6 minutes, or 0.1 hour, to our estimate of 3.4 hours of systemwide testing per student.

All systemwide testing and Chapter 1 testing comprise the group of all mandatory, school district-administered standardized academic tests. Our national sample of systemwide tests comprises 98 percent of the tests in this group.

State Advanced-Subject-Area Tests

In addition to statewide achievement tests, two of the larger states administer advanced-subject-area tests to some of their high school students. These are not systemwide tests because not all students in any one grade level take these tests, only those registered in certain advanced high school courses. The advanced-subject-area exams are administered to about 2.5 million students for about 3 hours each. From information provided in interviews with the two states' testing officials, we calculated the time involved for all students taking all the different subject tests, and found that in total those tests add 12 minutes, or 0.2 hour, to our 3.4-hours-per-student average for the extent of testing in the United States.

All systemwide testing, Chapter 1 testing, and these state advanced-subject-area tests comprise the group of all school district-administered standardized academic tests. Our national sample of systemwide tests comprises 93 percent of the tests in this group.

¹Beth Sinclair and Babette Gutman, A Summary of State Chapter 1 Participation and Achievement Information for 1989-90 (Washington, D.C.: Department of Education, 1992), p. 46.


College Entrance Examinations

The college entrance examinations of the American College Testing Program (the American College Test and Preliminary American College Test) and the Educational Testing Service (SAT, Preliminary Scholastic Aptitude Test, and the Advanced Placement exams) are not administered by school districts but by the testing firms, themselves. High school students are not required to take them; they take them only if they are considering applying to colleges and universities that require them for admission or advanced course credit. From figures supplied by the two firms on test times and number of students involved, we calculated that including all the nationally standardized college entrance exams adds 20 minutes, or 0.3 hours, to our national average of 3.4 hours per student for the extent of testing in the United States.

All systemwide testing, Chapter 1 testing, state advanced-subject-area tests, and college entrance exams comprise the group of all standardized academic tests. Our national sample of systemwide tests comprises 86 percent of the tests in this group.

Other Standardized Tests

The standardized tests for school-age students that remain are those given to special populations, such as psychological tests for special education students, IQ tests for gifted and talented students, or optional nonacademic tests, such as vocational-interest tests administered after school hours to students who elect to take them on their own time. We did not examine these. Compared to the national sample of systemwide tests, they are not many, and they are not like achievement tests, the kind of tests being considered for a national examination system.

Other Testing

Most tests, of course, are not standardized. Classroom teachers develop and administer most tests as a normal part of academic coursework. We know of no completed studies designed to accurately determine the extent of teacher classroom testing. And such a study was well beyond our resources to undertake.


Appendix IV
Other Estimates of the Extent and Cost of Testing

This appendix summarizes other attempts to estimate the current extent and cost of testing in the United States. These studies have derived their estimates either from aggregate figures or from case studies.

OTA Estimates

The Office of Technology Assessment (OTA) did not attempt to estimate the current extent and cost of testing but did provide some pertinent information that we examined to see how consistent it was with our own data.¹ The OTA report includes information from one large urban school district on all outlays for one school year on materials, services, and personnel related to standardized testing. From data in the OTA report, we calculated that expenses on standardized testing amounted to less than one-half of 1 percent of the district's budget. That's a typical level of spending for large districts in our national sample.

OTA also reported the extent of testing in that district, finding the average student took 5 to 6 hours of standardized tests per year. This is slightly more than our national average of 3.4 hours per student per year. But we also found in our national sample that districts with some of the characteristics of the district OTA studied (central city location, a high level of poverty, and Northeastern location) had somewhat more testing hours.

The other test cost information in the report is derived from a report prepared for OTA by university researchers. They stated that performance-based tests in Great Britain and Ireland cost $107 per student, and OTA used this figure to represent potential costs of performance-based tests in the United States.² None of the state performance-based tests in our national sample cost that much (we found an average cost of $33 and a range from $16 to $64), though such a cost figure could be expected given certain conditions. The conditions surrounding the European tests were not specified in the researchers' report.

¹Testing in American Schools: Asking the Right Questions, OTA-SET-519 (Washington, D.C.: U.S. Government Printing Office, 1992), pp. 27-29.

²George F. Madaus and Thomas Kellaghan, Student Examination Systems in the European Community: Lessons for the United States. Contractor report submitted to OTA, June 1991.

NCTPP Estimates

Between 1987 and 1990 the Ford Foundation sponsored the work of the National Commission on Testing and Public Policy (NCTPP), which centered chiefly on equity issues in the design and use of tests. Using some estimates and some aggregate figures (for the reported sales revenue and volume of commercial tests, for example) the Commission's report estimated that mandatory testing consumes some 20 million school days and the equivalent of $700 to $900 million in direct and indirect expenditures annually.³ The report cites as its source a book that is as yet unpublished, so we could not determine how NCTPP made these estimates.
The Commission's figures are, nonetheless, close to ours. Using our figure of about 3.4 hours of testing for the average student, the approximately 40 million students in public elementary and secondary schools would spend in the aggregate 17 million 8-hour days on tests. Using a 6-hour school day in the calculation, we would estimate a somewhat higher total: 23 million school days of testing per year. We also report in chapter 4 an overall estimate of $516 million in testing costs annually, which falls below the Commission's estimate, but we do not know exactly what they were counting as indirect expenditures. The report used some very strong language to emphasize its contention that this amount of testing is too much. The figure of less than one day per student per year, on average, seems not so alarming to us, but the conclusion is a matter of judgment.
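The conversion from hours per student to aggregate school days is simple arithmetic, sketched below with the figures quoted in the paragraph above. The function name is illustrative only.

```python
def aggregate_school_days(hours_per_student: float, students: int, hours_per_day: float) -> float:
    """Total testing hours across all students, expressed in school days."""
    return hours_per_student * students / hours_per_day


STUDENTS = 40_000_000   # approximate public elementary and secondary enrollment
HOURS_PER_STUDENT = 3.4

days_8h = aggregate_school_days(HOURS_PER_STUDENT, STUDENTS, 8)
days_6h = aggregate_school_days(HOURS_PER_STUDENT, STUDENTS, 6)

print(f"8-hour days: {days_8h / 1e6:.0f} million; 6-hour days: {days_6h / 1e6:.0f} million")
```

Either way the total works out to less than one school day per student per year, which is the comparison the text draws against the Commission's 20 million school days.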
The Commission report also estimated that students take 127 million tests per year, with individual students at some grade levels taking from 7 to 12 tests in a year. But in calculating these estimates, the Commission separated test batteries into their several subject-area components and counted each of them as a test. A typical commercial test of 4 to 5 hours in length might contain separate sections covering the basic subject areas of reading, grammar, math, science, social science, and writing. The test publisher and most others would still call it one test; the Commission described this as six tests. We estimate that U.S. students take about 36 million tests per year.

Other Estimates

In our search of the literature, we found only two other empirical estimates of the extent or cost of testing that were based on reasonably complete calculations. A survey of school districts in 14 Northwestern states estimated that the average student experiences 2 to 6 hours of testing each year throughout elementary and secondary school.⁴ That range includes our estimate from our national sample of 3.4 hours.

From a 1982 case study of one suburban school district, the Test Use Project at the University of California at Los Angeles calculated districtwide testing costs to be one-half of 1 percent of district expenditures, about the average that we found.⁵

³From Gatekeeper to Gateway (Boston: 1990), p. X.

⁴Beverly Anderson, "Test Use Today in Elementary and Secondary Schools," in Alexandra K. Wigdor and Wendell R. Garner, eds., Ability Testing: Uses, Consequences, and Controversies (Washington, D.C.: National Academy Press, 1982), pp. 232-264.

⁵D. Dorr-Bremme and J. Caterall, Test Use Project: Costs of Testing (Los Angeles: UCLA Center for the Study of Evaluation, 1982).

Appendix V
Major Contributors to This Report

Program Evaluation and Methodology Division

Frederick V. Mulhauser, Assistant Director
Richard P. Phelps, Project Manager
Gail S. MacColl, Social Science Analyst
Christine Ing, Researcher
Cynthia S. Taylor, Researcher
Harry M. Conley, Sampling Consultant
Venkareddy Chennareddy, Referencer


Glossary

Accountability

Defined in our questionnaire to mean assessment that is used to determine promotion, retention, or graduation at the student level; whose results are used to help determine principals' retention, promotion, or bonus, or cash awards to, honors for, status of, or budget of the school at the school level. At the district level, information is made public and voters or school board can instigate systemwide change, and at the state level, information is made public and voters or legislature can instigate systemwide change.

Achievement Test

A test designed to measure a person's knowledge, understanding, or accomplishment in a certain subject area, or the degree to which a person possesses a certain skill. Achievement tests should be distinguished from aptitude tests, which attempt to estimate future performance.

Assessment

Generally refers to large-scale, systemwide measurement programs for pupil diagnosis, program evaluation, accountability, resource allocation, or teacher evaluation.

Cri t eri o n-Referenced

Hi g h-Stakes

or
a person
from

Test

Test

Norm-Referenced

Performance-Based

A test that al l o ws i t s users to i n terpret


scores i n rel a ti o nshi p
to a
functi o nal
performance
l e vel . Cri t eri o n-referenced
measures
provi d e
i n formati o n
as to the degree of competence
attai n ed
by a parti c ul a r
student, wi t hout
reference
to the performance
of others.
A test that i s used to determi n e
promoti o n,
retenti o n,
H i g h-stakes
tests and tests used for s tudent-l e vel
consi d ered
synonymous.

Test

Test

or graduati o n.
accountabi l i t y

are

A test that shows a persons


rel a ti v e
standi n g
al o ng a conti n uum
of
attai n ment
i n compari s on
to the performance
of other peopl e
in a
speci f i e d
group, such as test-takers
of a certai n age or group.

Performance-Based Test

A test that measures ability by assessing open-ended responses or by asking a person to complete a task. Also known as alternative assessment, constructed response, or task performance, performance-based tests require the respondent to produce a response or demonstrate a skill or procedure. Examples include answering an open-ended question, solving a mathematics problem while showing all calculations, conversing in a foreign language, writing an essay on a given topic, or designing a science experiment.

Reliability

The reliability of a test refers to the degree to which test results are consistent across test administrations. Individual student scores are reliable if the same student gives the same answers to the same questions asked at different times. Test reliability can also be measured at the classroom, school, or district level. Tests tend to be reliable if their questions are clear and focused and unreliable if their questions are vague, contradictory, or confusing. Reliability can be measured rather precisely.

Representative Sample

A sample is a subgroup of a population. A sample is representative if it accurately reflects the character of the population in those aspects under study.

Standardized Test

A test is standardized if it is given in identical form and at the same time to students in more than one school, and all the results are marked in the same way. Tests scored by machine-reading of student marks in answer bubbles are not the only type of standardized test. Tests with open-ended essay questions and other kinds of performance-based tests can be standardized, too, if the conditions of administration and scoring are carefully controlled across schools.

Stratified Sample

In stratified sampling, a researcher selects randomly within each of several separate, homogeneous subsets, or strata. The values derived from each of these subsamples are then weighted according to the proportion of the population represented by each subset.
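
The weighting step in stratified sampling can be illustrated with a short sketch. It is only an illustration, not the procedure used in this study: the function name `stratified_estimate`, the `strata` dictionary, and the `per_stratum` sample size are all hypothetical, and the sketch assumes that the full membership of each stratum is known and that the quantity of interest is a simple mean.

```python
import random


def stratified_estimate(strata, per_stratum, rng=None):
    """Estimate a population mean from random samples drawn within each stratum.

    strata: dict mapping a stratum name to the list of values in that stratum
            (hypothetical input; assumes the whole stratum is enumerable).
    per_stratum: number of items to draw at random from each stratum.
    """
    rng = rng or random.Random()
    population_size = sum(len(values) for values in strata.values())
    estimate = 0.0
    for values in strata.values():
        if not values:
            continue  # an empty stratum contributes nothing
        # Select randomly within the stratum.
        subsample = rng.sample(values, min(per_stratum, len(values)))
        stratum_mean = sum(subsample) / len(subsample)
        # Weight by the proportion of the population the stratum represents.
        estimate += stratum_mean * len(values) / population_size
    return estimate


# Hypothetical example: per-student cost figures in two kinds of districts.
districts = {"small": [10.0] * 80, "large": [20.0] * 20}
print(stratified_estimate(districts, per_stratum=5))
```

Because each stratum mean is multiplied by that stratum's share of the population, a stratum that is sampled heavily but holds few members does not dominate the overall estimate.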

Systematic Sample

In a systematic sample, the researcher randomly picks a number between zero and a number n/x, with n being the population size and x being the size of the systematic sample. Then, starting with that random number, the researcher picks every (n/x)th item until x items are selected. In our study, we picked a systematic sample from our national sample, picking every (n/x)th item in the order in which the questionnaires were returned in the mail.
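
The selection rule for a systematic sample can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this study: the function name `systematic_sample` and its parameters are hypothetical, `items` is assumed to be already in the desired order (for example, the order in which questionnaires were returned), and a fractional interval n/x is handled by rounding each position down.

```python
import random


def systematic_sample(items, sample_size, rng=None):
    """Pick sample_size items at a fixed interval from a random starting point.

    items: the ordered population (n = len(items)).
    sample_size: the number of items to select (x).
    """
    rng = rng or random.Random()
    n = len(items)
    if not 0 < sample_size <= n:
        raise ValueError("sample_size must be between 1 and len(items)")
    interval = n / sample_size            # the n/x spacing between picks
    start = rng.random() * interval       # random start in [0, n/x)
    picks = []
    for i in range(sample_size):
        # Round down to a list position; clamp to guard against float rounding.
        index = min(int(start + i * interval), n - 1)
        picks.append(items[index])
    return picks


# Hypothetical example: 100 returned questionnaires, numbered in return order.
returned = list(range(1, 101))
print(systematic_sample(returned, 10, random.Random(0)))
```

Only the starting point is random; every later pick follows from it, which is why the ordering of the list matters in a systematic sample.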

Systemwide Test

We defined systemwide tests, for the purpose of this study, as any test that is administered to all students, to almost all students, or to a representative sample of all students within a jurisdiction for at least one grade level. Such a test can include several subject areas in a test battery. Tests that are optional for the student (as are college entrance tests) or that are only administered to unrepresentative subsets of the student population (as are tests for special education students) are not included.

Validity

The validity of a test refers to the degree to which it measures what it is designed to measure. There are several kinds of validity. Curricular validity, for example, would be strong if a test contained questions based on the content of the curriculum and weak if a test contained questions not based on the content of the curriculum. Predictive validity would be strong if an individual's test score accurately forecasted some other event, such as the likelihood of graduating or succeeding in a particular endeavor. Unlike reliability, validity is difficult to measure precisely.


Ordering Information

The first copy of each GAO report and testimony is free. Additional copies are $2 each. Orders should be sent to the following address, accompanied by a check or money order made out to the Superintendent of Documents, when necessary. Orders for 100 or more copies to be mailed to a single address are discounted 25 percent.

U.S. General Accounting Office
P.O. Box 6015
Gaithersburg, MD 20877

Orders may also be placed by calling (202) 275-6241.
