
European Journal of Psychological Assessment, Vol. 13, Issue 2, pp. 140–152
© 1997 Hogrefe & Huber Publishers

ITC Bulletin: Test Ethics in the USA

Use and Consequences of Assessments in the USA: Professional, Ethical and Legal Issues
Wayne J. Camara
The College Board, Princeton Junction NJ, USA

Tests and assessments in the USA have taken on additional burdens as their uses have been greatly expanded by educators, employers, and policy makers. Increased demands are often placed on the same assessment, by different constituencies, to serve varied purposes (e.g., instructional reform, student accountability, quality of teaching and instruction). Such trends have raised considerable concerns about the appropriate use of tests and test data, and testing is under increased scrutiny in education, employment and health care. This paper distinguishes among the legal, ethical and professional issues recently emerging from the increased demands of assessments and also identifies unique issues emanating from computer-based modes of test delivery and interpretation. Efforts to improve assessment practices by professional associations and emerging issues concerning the proper use of assessments in education are reviewed. Finally, a methodology for identifying consequences associated with test use and a taxonomy for evaluating the multidimensional consequential outcomes of test use within a setting are proposed.

Keywords: ethics, legal issues, assessment, USA

Individuals are first exposed to tests and assessments very early in their school years in the United States. By the age of 18, assessments have already played a significant role in the life decisions of many young adults, such as graduation, promotion and retention, college admissions, placement, and scholarship awards. Tests and assessments are also widely used in the psychological and educational screening of children and adults, for career and vocational assessment, for certification and licensing of individuals for a number of occupations, and for the selection and placement of workers within government and private sector organizations. Given the diverse and important uses of tests and assessments, measurement professionals have become increasingly concerned with questions of validity, fairness, intended use(s) and consequences related to the appropriate use of educational and psychological assessments.

The ethical and legal conduct of lawmakers, celebrities, athletes and professionals from all areas (e.g., business, investment, marketing, law) has attracted headlines and the attention of mass media. In the U.S. and many European countries there has been a fixation in recent years on such ethical and legal issues involving responsible conduct and obligations to the public. Attention has also focused on the use of tests and test results in education, employment, and health care settings (Berliner & Biddle, 1995). The real and perceived misuses of assessments and assessment results have become one of the most challenging dilemmas facing measurement professionals and test users today. Abuses have been widely reported in preparing students to take tests and in the use and misuse of data resulting from large-scale testing programs. Elaborate and costly test cheating operations have been disclosed by federal agencies (Educational Testing Service, 1996), test preparation services have allegedly employed confederates to steal large pools of items from computer-based admissions testing programs, and instances of students and employees being provided with actual test items before an administration have been reported, as have unauthorized extensions of time limits, falsification of answer sheets and score reports, and violations of the confidentiality of test data (Schmeiser, 1992). Misuses of test data in high stakes programs abound, and the accuracy and marketing tactics of test publishers have been criticized in some cases (Sackett, Burris, & Callahan, 1989; Sackett & Harris, 1984).

Professional conduct and responsibilities in the use of assessments can be ordered within three levels: (1) legal issues, (2) ethical issues, and (3) professional issues. The practices and behaviors within these three levels are certainly interrelated, yet this categorization is useful to initiate a discussion of concerns, the severity of inappropriate practices and behaviors, and examples of assessment practices which are most likely to be questioned by professionals and the public.

This paper, focusing on the U.S., will first discuss the assessment practices and behaviors which raise professional, ethical and legal concerns. Second, the paper discusses efforts in addressing these concerns and the diversity among individuals using tests and test results. Third, a paradigm is proposed for identifying and evaluating consequences associated with test use. Unique professional issues emerging from the increased reliance on technology and computer-based testing are briefly reviewed. Finally, the variety of such issues directly relating to educational assessments is illustrated to provide a context for discussions of the technical, legal and professional issues involved in assessment.

Legal, Ethical and Professional Issues in Testing and Assessment

It is difficult to define the boundaries of and distinctions between professional, ethical and legal issues or concerns surrounding the development and use of tests and assessments. Legal, ethical, and professional issues form a continuum of standards for professional conduct in assessment and other areas. Laws and government regulations are legal mandates that affect all individuals living in a society. Ethical codes may range from enforceable, to exemplary, to educational principles that guide the professional behavior and conduct of members of any profession. Professional guidelines, principles and standards are also developed to educate and guide professionals in more technical activities. All three layers of regulations or standards exist in testing and assessment.

Laws and legal documents about testing and assessment are generally vague and ambiguous, but it is clear that where they exist, they have greatly influenced both professional standards of conduct and professional practices in assessment and testing. Government involvement and regulation of testing is most evident in personnel testing. But even here, laws and legal challenges to testing are limited to very specific domains. They address some issues (e.g., discrimination) and applications (e.g., employment testing) of assessment which have received widespread attention, while leaving many more common issues and concerns of test use unaddressed (e.g., validity of the measure when disparate impact is not present). Numerous federal and state laws and executive orders have implications for employment testing, primarily through prescribed standards for equal employment opportunity (Camara, 1996), but also for the assessment of individuals with disabilities, the handling and retention of personnel records, and restrictions on the use of certain pre-employment techniques (e.g., the Employee Polygraph Protection Act of 1988). The general consensus among industrial psychologists is that Civil Rights laws, which emanated in the 1960s, have been a major stimulus for improved pre-employment assessment practices. Employers became better educated about the technical, professional, and legal issues involved in the use of testing out of necessity, and while there is some evidence that regulations initially decreased the use of employment testing, today tests are used by a higher proportion of organizations than ever (Deutsch, 1988).

The first formal ethics code for any profession using assessments was adopted by the American Psychological Association (APA) in 1952. Eighteen of the more than 100 ethical principles from this Code (APA, 1953) addressed the use of psychological tests and diagnostic aids, covering the following issues of test use: (1) qualifications of test users (3 principles); (2) responsibilities of the psychologist sponsoring test use (4 principles); (3) responsibilities and qualifications of test publishers' representatives (3 principles); (4) readiness of a test for release (1 principle); (5) the description of tests in manuals and publications (5 principles); and (6) security of testing materials (2 principles).

Codes from the Canadian and British Psychological Associations came later, as did those from other European nations (Lindsay, 1996). In the past decade, many other professional associations have adopted ethical standards and professional codes which cover measurement and assessment issues. These trends have resulted from the increased public awareness of ethical issues, the variety of new proposed and actual uses for assessments, the increased visibility given to assessments for accountability purposes, and a commitment from the professions to safeguard the public (Eyde & Quaintance, 1988; Schmeiser, 1992). The ethical standards of the American Counseling Association and APA are unique in that these associations maintain formal enforcement mechanisms that can result in member suspension and expulsion, respectively. In 1992, the American Educational Research Association (AERA) adopted ethical standards, followed in

1995 by the National Council on Measurement in Education's (NCME) Code of Professional Responsibilities in Educational Measurement. Several other organizations, such as the Society for Industrial and Organizational Psychology (SIOP) and regional I-O organizations, formally adopted APA's most recent ethical code for their members for educational purposes, without any enforcement mechanisms.

Laws which affect testing primarily strive to protect certain segments of the public from specific abuses. Ethical standards and codes attempt to establish a higher normative standard for a broader range of professional behaviors. For example, APA's ethical standards note that:

. . . in making decisions regarding their professional behavior, psychologists must consider this Ethics Code, in addition to applicable laws and psychology board regulations. If this Ethics Code establishes a higher standard of conduct than is required by law, psychologists must meet the higher ethical standard. If the Ethics Code standard appears to conflict with the requirements of law, then psychologists make known their commitment to the Ethics Code and take steps to resolve the conflict in a responsible manner (APA, 1992, p. 1598).

Coinciding with this increased attention to ethical codes has been a dramatic increase in the professional and technical standards issued for assessment, which is described later. The Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1985) are the most widely cited document addressing technical, policy, and operational standards for all forms of assessments that are professionally developed and used in a variety of settings. Four separate editions of these standards have been developed by these associations, and a fifth edition is currently under development. However, numerous other sets of standards have been developed to address more specific applications of tests or aimed at specific groups of test users. Standards have been developed: (1) for specific uses such as the validation and use of pre-employment selection procedures (Society for Industrial and Organizational Psychology, 1987), integrity tests (ATP, 1990), licensing and certification exams (Council on Licensure, Enforcement, and Regulation, 1993; Council on Licensure, Enforcement and Regulation & National Organization for Competency Assurance, 1993), and educational testing (Joint Committee on Testing Practices, 1988); (2) for specific groups of users such as classroom teachers (AFT, NCME, NEA, 1990) and test takers (Joint Committee on Testing Practices, 1996); and (3) for specific applications such as performance assessments, adapting and translating tests (Hambleton, 1994), and admissions testing (College Entrance Examination Board, 1988; National Association of Collegiate Admissions Counselors, 1995).

Professional standards, principles, and guidelines are more specific and generally oriented toward more technical issues, to guide test users with specific applications and uses of assessments. First, technical issues concerning the development, validation, and use of assessments are addressed in standards. Validity is the overarching technical requirement for assessments; however, additional professional and social criteria have been considered in evaluating assessments, such as: (1) how useful the test is overall, (2) how fair the test is overall, and (3) how well the test meets practical constraints (Cole & Willingham, 1997). These criteria are directed at both the intended and unintended uses and consequences of assessments. Existing standards guide test users in the development and use of tests and assessments; however, these standards may rarely reach and influence test users not associated with a profession.* For example, most employers are unaware of the Principles for the Validation and Use of Personnel Selection Procedures (Society for Industrial and Organizational Psychology, 1987), and certainly the vast majority of educational administrators and policy makers who determine how to use tests and cite test results in making inferences about the quality of education have never viewed a copy of any of the various standards in educational measurement and testing.

* Often professional standards have been cited by courts in case law and have influenced assessment practices in these ways.

Professional standards developed by groups such as APA and AERA do not appear in publications commonly read by employers, educators and policy makers. Many standards are written at a level where they may be incomprehensible to such individuals even if they had access to them. Finally, in many instances, members of the professional associations which develop and release standards may not themselves have and use copies of the standards, and may have had little exposure to the standards and other new topics in testing, measurement, and statistics through graduate training courses (Aiken, West, Sechrest, Reno, Roediger, Scarr, Kazdin, & Sherman, 1990). For example, the Standards for Educational and Psychological Testing, referred to as “the Standards” (AERA, APA, & NCME, 1985), which are the most widely cited professional standards for any form of testing and assessment, have total sales of 56,000 through 1996, while there are more than 120,000 members of APA alone (Brown, 1997).

Efforts to Improve Proper Use of Tests and Assessments

In the 1950s, when the first technical standards for testing were adopted in the United States (APA, 1955; AERA & NCMUE, 1954), the test user was considered to be a trained professional who conducted testing or disseminated results and interpretations to an individual. This classic definition of test user includes psychologists, physicians, counselors, personnel managers, and state or local assessment directors, who generally have both some training in and explicit job responsibilities for assessment. These test users seek to qualify for purchasing test materials, provide detailed interpretation of test scores, or represent a client organization in procuring and using assessments. The current version of the Standards (AERA, APA, & NCME, 1985) follows this line in defining the test user as someone who “requires the test for some decision making purpose” (p. 1). These individuals may best be termed “primary test users” because of their role and responsibilities. Today, however, the concept of test user is much broader and often includes many different individuals with little or no training in measurement and assessment. There are secondary test users, especially in education, including policy makers, teachers, parents and the media, who often have no general training in assessment and no prescribed responsibilities for assessment. Some of these individuals can greatly influence and distort the general interpretation of assessment results, misuse assessments and results, and may have political incentives to selectively use results to support certain values or beliefs (Berliner & Biddle, 1995). Many of the most striking examples of test misuse are generated and supported by such secondary users. The more removed the test user is from the examinee, the less familiar they are with the personal characteristics, academic abilities, or workplace skills of the individual tested, and the more likely instances of test misuse will occur.

With the expanded uses of and increased focus on assessment has come renewed criticism of the misuses and negative consequences of assessments. Professionals in measurement and testing are increasingly struggling with how best to improve proper test use and to both inform and influence an increasingly diverse group of test users who may have no formal training in testing and measurement, but still have legitimate claims for using test results in a wide variety of ways. Such groups have attempted to address the legal, ethical, and professional concerns with additional codes of conduct, technical standards, workshops, and case studies. However, most of these efforts rarely reach beyond members of the specific professional association or clients/users of a specific assessment product. Clearly such efforts are essential for improving the proper use of assessments and appropriate understanding of assessment results. Yet these initiatives will not generally reach the secondary users who may insist on using assessment results as the sole determinant of high school graduation, as the basis for rewards and sanctions to schools and teachers, and as the primary indicator of equity, educational improvement or student achievement. Unfortunately, efforts which are aimed at only one segment of a much more expansive population of test users may not go far enough, fast enough, for improving proper assessment practice.

Associations have attempted to cope with this new, more expansive group of “secondary test users” by developing broader and simpler forms of standards, such as the Code of Fair Testing Practices in Education, which condenses the primary standards from a 100-page document into a four-page booklet which encourages duplication. Other efforts have been to work in collaboration with broader groups such as the National Education Association to develop codified guidelines or standards. However, all such efforts have had no consistent impact across situations because there are few common linkages, different priorities and expectations for assessments, and little common understanding between primary and secondary test users. Relatively few efforts have been focused on the undergraduate and graduate programs which train teachers and measurement specialists. Because universities and colleges differ in the types of programs offered, the titles of courses and course sequences, and even the departments in which such programs are housed, targeting and reaching educational programs broadly presents a number of substantial logistical

obstacles. Often it is difficult to identify the faculty and administrators responsible for such programs and to effect systematic changes in their training programs.

For these and other reasons, Haney and Madaus (1991) state that test standards have had little direct impact on test publishers' practice and even less impact on test use. They note that professional codes and standards primarily serve to enhance the prestige, professional status, and public relations image of the profession, rather than narrow the gap between standards and actual practice. How do we resolve these issues? Given the increased social policy implications of testing, some have argued that greater legal regulation, litigation, or enforcement of technical standards by an independent auditing agency present some potential mechanisms for reducing the misuse of assessment practices (Haney, 1996; Haney, Madaus, & Lyons, 1993; Madaus, 1992). However, such mechanisms may have little impact on many of the most visible misuses of assessments, because it is often the legislative and executive branches of state and federal government who advance expanded and often inappropriate uses of assessments. Because test use is so expansive and abuses are so diverse, solutions which address only one element or one audience (e.g., test developer, teacher) may not be equipped to resolve the majority of instances where assessments are misused.

Consequences of Testing and Assessment

A more thorough understanding and consideration of the potential consequences of test use can substantially reduce inappropriate uses and the resulting ethical issues. When consequences are discussed, we are often reminded of the exclusively negative consequences resulting from test use:
– adverse impact on minorities and women
– discouraging and 'institutionalizing' failure for individuals
– teaching to the test and limiting curriculum and learning
– reinforcing tasks requiring simple rote memory at the expense of more complex cognitive processes required for success in the real world
– creating barriers
– tracking individuals into classrooms and jobs offering fewer challenges and opportunities

There are also often positive consequences when validated assessments are appropriately used:
– merit as a guide for decision making (selecting the most qualified candidate or making awards based on relevant performance)
– efficiency (a relatively quick and effective means of collecting a large amount of data across a range of skills/competencies)
– quality control (certification or licensure)
– protection of the public (e.g., against negligent hiring for critical occupations)
– objectivity for making comparisons and decisions among individuals or against established criteria
– cost effectiveness and utility

Consideration of how the social ramifications of assessments affect validity has been summarized by Cronbach (1988), who stated that validity research is essentially a system which considers personal, institutional, and societal goals as they relate to inferences derived from test scores. If validity is established through evidence that supports inferences regarding specific uses of a test, then intended and unintended consequences of test interpretation and use should be considered in evaluating validity (Messick, 1989). Test developers and test users need to anticipate negative consequences that might result from test scores, potential corruption of tests, negative fallout from curriculum coverage, and how teachers and students spend their time (Linn, 1993). While there is some consensus that the consequences of test use must become an important criterion in evaluating tests within education, this view is not generally held in other settings (e.g., personnel, clinical).

Before consequences can be integrated as a component in evaluating assessments, a taxonomy or model is required. Such a taxonomy must consider both the positive and negative impacts and consequences associated with test use, as well as the impact, consequences, and feasibility of alternative procedures (e.g., biographical data, open admissions vs. selection). Further complicating such a taxonomy is the knowledge that different stakeholders will have widely differing views on these issues. After the consequences have been identified, their probability of occurrence, the weight (positive or negative) associated with each consequence, and the level at which the consequence occurs (i.e., individuals, organizations, or society) must be determined.

This taxonomy borrows terminology and processes from expectancy theory (Vroom, 1964), where the weight of a consequence is similar to the “valence” and the probability is related to the

Table 1. Paradigm for evaluating the consequential basis of assessments.

Consequence    Individual         Organization      Societal
               (e.g., student)    (e.g., school)    (e.g., community)
Positive
Harmful
Summative

Figure 1. Computation of the summative consequence for each potential consequence associated with test use.

Consequence #1 = Valence × Instrumentality

Summative consequence = Strength of the consequence (–10 to +10) × Probability the consequence will occur (0 to 10)

Example: A state proposes development of a high standards test which all students must pass to graduate from high school. This proposed test has numerous potential consequences for the students, schools, districts, and the state, which would include the business community, parents, citizens, etc. Below are only two of several potential consequences of such a testing program.

Individual Consequence #1: increase student drop-out rate
  Individual consequence (–8) × Probability (5) = –40

Societal Consequence #2: higher standards will increase the value of a diploma and produce more competent graduates
  Societal consequence (+9) × Probability (3) = 27

These two examples show how specific outcomes of a high standards graduation test can produce summative consequences for arriving at a consensus regarding the potential positive and harmful consequences of assessment. For each of the possible consequences, the summative consequence would be summed by level (individual, organization, societal) to arrive at an overall index of the potential consequences of an assessment for that level. A consensus process would be employed to develop these values and to determine the overall desirability of the proposed assessment.
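The arithmetic in Figure 1, and the per-level summation described in its caption, can be sketched in a short program. This is only an illustrative rendering of the paper's two graduation-test examples; the function and variable names are not from the paper.

```python
# Sketch of the Figure 1 arithmetic: summative consequence =
# strength (valence, -10..+10) x probability of occurrence (0..10),
# summed by level to give an overall index per level (Table 1's columns).
from collections import defaultdict

# (description, level, valence, probability) -- the paper's two examples
consequences = [
    ("increase student drop-out rate", "individual", -8, 5),
    ("higher standards increase the value of a diploma", "societal", +9, 3),
]

def summative_consequence(valence, probability):
    # Figure 1: valence x instrumentality (probability of occurrence)
    assert -10 <= valence <= 10 and 0 <= probability <= 10
    return valence * probability

index_by_level = defaultdict(int)
for _description, level, valence, probability in consequences:
    index_by_level[level] += summative_consequence(valence, probability)

print(dict(index_by_level))  # -> {'individual': -40, 'societal': 27}
```

In step 7 of the process below, such per-level indices would serve only as inputs to a consensus judgment among stakeholders, not as a mechanical decision rule.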

“instrumentality.” Proposed steps (adapted from Camara, 1994) in determining the consequences of test use include:
1. Identify the intended consequences and objectives of assessment
2. Identify subject matter or content experts, or develop an alternative consensus process
3. Identify potential intended and unintended consequences to individuals and organizations through a review of the literature, interviews or focus groups with key stakeholders
4. Determine the level at which each consequence occurs (does it impact individuals, organizations, or society?)
5. Determine the probability of occurrence for each consequence (i.e., instrumentality)
6. Determine the strength or weight of each consequence (i.e., valence)
7. Employ a consensus process to determine the summative consequences of different aspects of test use on individuals, organizations and society.

Ideally the test developer, test users, and other key stakeholder groups would consider these issues before embarking on a new or revised testing program. Most consequences will have multiple impacts on individuals (e.g., test taker, teacher), organizations (e.g., schools, business), and society (e.g., community, state). Steps 5 and 6 require individuals, often with very diverse views, to arrive at a consensus or common judgment about the probabilities and the strength (and direction) of consequences. The literature on standard setting may be of assistance in structuring a more explicit process.

Table 1 illustrates how consequences may be identified and classified through a consensus process. A list of potential consequences would be developed and classified within each of the nine boxes (step 4). Once all potential consequences are identified, each consequence is fully evaluated to determine its valence and instrumentality, as illustrated in Figure 1. Step 7 in the process would have key stakeholders determine the overall summative consequences on individuals, organizations and society before a final decision is reached on the desirability and appropriateness of an assessment program or proposed use for assessments.

Such a taxonomy would not ensure that test misuse is minimized, but it would help to raise awareness of the diverse range of issues that emerge across the different stakeholders and constituency groups who are involved in a high stakes assessment program. The absence of literature proposing models or taxonomies to identify and contrast consequences associated with test use leaves the test developer and user with little or no guidance in improving professional conduct and the appropriate use of assessments.

Concerns Arising from Technology and Computer-Based Testing

Professional and ethical concerns about the appropriate use of testing have dramatically increased over the past few decades as the community of test users has grown and the use of assessments has expanded. Most recently, technological innovations have given rise to a number of new and unique professional and ethical challenges in assessment.

Matarazzo (1986), and later Eyde and Kowal (1987), identified many of the unique concerns created by the use of computerized clinical psychological test interpretation services. For example, Matarazzo noted that variation in clinical interpretation is often the rule rather than the exception, requiring much greater time and effort for clinical judgment and interpretation of computer-generated clinical interpretations than is usually the case. “Specifically, two or more identical soil readings, blood chemistries, meteorological conditions or MMPI profiles may require different interpretations depending on the natural or human context in which each is found . . . use of the same ‘objective’ finding (e.g., an IQ of 120 or a ‘27’ MMPI codetype) may be quite different if the ‘unique’ patient is a 23-year-old individual being treated for a first acute, frankly suicidal episode than if the ‘unique’ patient is a 52-year-old truck driver . . . applying for total disability” (Matarazzo, 1986, pp. 20–21). Eyde and Kowal (1987) explained that computer-based testing provides greater access to tests, and expressed concern about the qualifications of such expanded test users.

Technological innovations and the increased pressure for accountability in health care services may also be creating a different demand and market for clinical and counseling assessments. Assessments in all areas can be and are delivered directly to consumers. The availability of a take-home CD-ROM IQ test for children and adults, marketed to the general public by Pro-Ed, a psychological testing and assessment publisher, has raised these same ethical and professional issues for psychologists. The CD-ROM test comes with an 80-page manual which informs parents of some of the theories of testing, how to administer the test, and how to deal with test results (New York Times, January 22, 1997). In such instances, when tests are delivered by the vendor directly to the test taker, there is no traditional test user. The test taker or their parents, who have no training and little knowledge of testing, must interpret the results, which increases the risk of test misuse.

Computer-adaptive testing (i.e., assessments in which the examinee is presented with items or tasks matched to his or her ability or skill level) is increasingly used for credentialing and licensing examinations and admissions test programs today. Several unique concerns arise even when computer-based tests are administered under controlled conditions, such as in the above instances. First, issues of equity and access arise because these computer-based testing programs often charge substantially higher testing fees, which are required to offset the additional expenses incurred for test development and delivery, and often have more limited geographical testing locations. Second, familiarization with technology and completing assessments on computer may be related to test performance. Research has demonstrated that providing students with practice tests on disk in advance of testing, and tutorials at the beginning of the test administration, are important in reducing anxiety and increasing familiarization with computer-based testing. Third, differences in the format, orientation, and test specifications of computer-based testing may affect the overall performance of individual test takers. Russell and Haney (1997) note that students who use computers regularly perform about one grade level worse if tested with a paper-and-pencil test than with a computer-based test. Students completing computer-adaptive tests may also become more frustrated and anxious as they receive far fewer “easy items,” since item selection algorithms are designed to match items to the level of each test taker's ability, resulting in more items that are perceived as “hard” by the test taker. Additionally, computer-based tests generally do not permit test takers to review items previously answered, as is common on paper-and-pencil tests. Additional rules are required for students who omit

Educational Assessment Today: Legal, Ethical and Professional Concerns

As tests are increasingly used for distinct and multiple purposes, negative consequences and misuse are more likely to emerge. The use of performance assessments and portfolios in high stakes assessment programs can also raise additional issues about standardization and fairness. Nowhere are these concerns more evident than in educational assessment today.

In the past decade, there have been expanded expectations for assessments to not only measure educational achievement but to bring it about. Assessments are increasingly viewed as tools to document the need for reform by holding schools and students accountable for learning, and also as leverages of reform (Camara & Brown, 1995; Linn, 1993). President Clinton has proposed development of national
a large number of items and disclosure of test forms. assessments for all students in reading and mathe-
Computer adaptive testing could be manipulated by matics by 1999 and called on all schools to measure
test takers or coaching schools if some minimum achievement of their students in these and other ar-
threshold of item completion were not required. Be- eas. Currently, forty-eight of fifty states in the U. S.
cause exposure of items is a major risk with these currently have in place or are developing large-scale
tests disclosure of test forms can not be as easily ac- educational assessments to measure the perform-
commodated as with paper-and-pencil tests (Mills ance of their students. In some states these tests are
& Stocking, 1996). These and other distinctions as- used for high stakes purposes such as issuing a di-
sociated with computer-based testing raise addi- ploma or rewarding/sanctioning schools, districts,
tional professional issues for test users, test devel- and even individual teachers. State and local boards
opers, and test takers. As Everson (1997) notes, con- of education and state and local departments of
vergence of new theories of measurement with education translate test performance to make deci-
increased technology presents many opportunities sions about schools and/or individuals. School ad-
for improved assessment frameworks, but also ministrators come under increased pressure in such
raises additional professional and ethical issues con- high-stakes testing programs to defend instructional
cerning assessment. practices and student achievement. Classroom
Fees for computer-based testing programs have teachers who administer the assessments and in-
generally been running between 300 to 600% higher creasingly view them as a driving force for instruc-
than the fees for the same paper-based tests. Cur- tional change and educational reform also have a
rently, test takers will receive immediate score re- role in such assessment programs. Parents, students,
ports and slightly more precise measurement accu- and the general public who demand improved qual-
racy at their level, but little additional benefits from ity in education, business leaders who are often criti-
the higher test fees. The few national programs of- cal of graduates for lacking appropriate workplace
fering computer-based testing programs have either skills, higher education which finds an increasing
eliminated (or plan to eliminate) the paper-and- proportion of incoming students requiring remedial
pencil or raised fees on the paper-based test to en- instruction, and policy makers who must respond to
sure adequate volume for the higher priced com- all these diverse stakeholder groups represent many
puter-based product. Until additional advantages different types of secondary test users.
are realized from computer-based tests, business Dissatisfaction with standardized assessments is
practices of replacing a lower priced test with one also greatest in education because of their perceived
that is three to six times as costly for the test taker negative consequences on learning and instruction.
should be questioned. The performance assessment movement has strong
support both within education and educational
148 Wayne J. Camara

measurement and has not become another educational fad as some had predicted. Several large assessment programs have sought to replace their standardized testing programs with fully performance-based or portfolio systems. Today it appears that the “model” state assessment program will combine such constructed-response tasks with more discrete, selected-response test items (e.g., multiple choice, grid-ins). Employing multiple measures allows educators to gain the benefits of more in-depth and applied performance tasks that increase curricular validity, as well as the increased reliability and domain coverage that selected-response items offer. However, a number of legal, ethical and professional concerns emerge with any high stakes assessment program, whether the decisions made primarily affect the student or the school.

Single assessments, whether norm-referenced multiple choice assessments or more performance-based assessments, do not serve multiple, high-stakes needs well (CRESST, 1995). Often key proponents of large-scale assessments support multiple uses but actually have very different priorities among these uses. Kirst and Mazzeo (1996) explain that when one such state assessment system moved from a design concept to an operational testing program, it became clear that not all the proposed uses and priorities for the design could be accommodated. When priorities of key stakeholders could not be met, support for the program decreased.

Phillips (1996) identified legal criteria which apply to such expanded uses of assessments for high stakes purposes. These criteria have been modified and supplemented here with several additional criteria which reflect a range of issues:

Adequate advance notification of the standards required of students. To ensure fairness, students and parents should be notified several years in advance of the type of standards they will be held to in the future. Students and teachers should be provided with the content standards (knowledge and skills required) and performance standards (level of performance). Sample tasks, model answers, and released items should be provided, and clear criteria should be established, when high stakes uses (e.g., graduation) are associated with the test.

Evidence that students had an opportunity to learn. The critical issue is whether students had adequate exposure to the knowledge and skills included on the assessment or whether they are being asked to demonstrate competency on content or skills that they were not exposed to in school. Phillips (1996) notes that such curricular validity can often be demonstrated through survey responses from teachers that ensure students had, on average, more than one opportunity to learn each skill tested.

Evidence of opportunity for success. This challenge emerges when major variations from standardization occur. Fairness assumes that all students are familiar with the types of tasks on the assessment and with the mode of administration (e.g., computer-based testing); that all have the same standardized administration procedures, scoring procedures, and equipment (a condition violated when, e.g., some students have access to a calculator or to superior laboratory equipment in completing the assessment); and that outside assistance could not affect performance on the assessment (e.g., on group tasks, or on student work produced over time, where parents and others could unduly offer assistance). Variations in these and other conditions can present an unfair advantage to some students.

Assessments reflect current instructional and curricular practices. If assessments are designed to reflect exemplary instructional or curricular practices, as is often the desire of educators who hope to use the assessment to drive change, but those practices are not yet reflected in the actual practices of many schools, a fundamental fairness requirement may not be met. The same challenge could be raised where teachers do not receive the professional development needed to implement new instructional or assessment practices (e.g., use of a graphing calculator) that are required on the assessment, or in end-of-course assessments where the teacher lacks appropriate credentials for the subject area.

While these concerns apply to most educational assessments, they move from professional issues to legal and ethical concerns when assessments are used to make high stakes decisions. Additional ethical and professional issues which have been associated with various high stakes educational assessments may also affect other types of testing programs in other settings. Only a few of these issues are briefly addressed below.

Overreliance or exclusive reliance on test scores. Test performance should be supplemented with all relevant and available information to form a coherent profile of students when making individual high stakes decisions (e.g., admissions, scholarships). Student performance on tests should be interpreted within the larger context of other relevant indicators of their performance. In admissions decisions, students’ grades, courses, and test scores are generally all considered, along with supporting information on personal qualities and other achievements. When testing has been repeated, performance on all administrations will permit individuals to identify any particular aberration; less weight should generally be assigned to that particular test score or other indicator in these instances. Similar errors occur when individuals overinterpret small score differences between individuals, groups, or schools.

Cheating and “teaching to the test.” There have been numerous examples of individuals cheating on high stakes tests. In addition, several instances where educators and other test users have been accused of systematic cheating (e.g., excessively high erasure rates on students’ papers, disclosure of answer keys to job incumbents on promotional exams) have received national attention, with some estimates that over 10% of test takers cheat on high stakes tests (Fairtest, 1996). Because test scores are used as an indicator of school quality, school performance influences property values, school funding, and school choice, creating added incentives to increase school and district test scores by any means possible. According to many educators, these pressures often result in teaching to the test, a common criticism of standardized tests. It is such negative consequences and the prospect of improved schooling that have provided the impetus for performance assessment, not the desire for better measurement for its own sake (Dunbar, Koretz, & Hoover, 1991).

Consideration of the cultural and social experiences of the test taker. Students bring their prior social and cultural experiences with them when they participate in class, compete in a sporting event, or complete an assessment. For many students the cumulative effect of these experiences may be to emphasize certain behaviors, skills, or abilities that are less similar to those required by the assessment. The greater the similarity of an individual’s socioeconomic and cultural background to that of the majority population, the better his or her test performance will generally be (Helms, 1992). Additional efforts are required both to ensure that all students are familiar with the types of tasks on the assessment and to ensure that divergent skills and abilities are considered in the construction and validation of assessment programs. Sensitivity to cultural, ethnic, gender, and language differences is required when interpreting results from assessments or other measures for individual students. Similarly, differences in these and other demographic variables must be considered when making simplistic comparisons among schools, districts, and other units. When these issues are not adequately considered by test developers and test users, serious professional and ethical problems arise.

Exclusion of students from large-scale testing programs. Most large-scale national assessment programs which use aggregate-level data (school, district, state) to monitor educational progress and permit comparisons systematically exclude large proportions of students with limited English proficiency and students with disabilities (McGrew, Thurlow, & Spiegel, 1993). Often school staff determine which students may be excluded from such national and state testing programs, and there is variation across schools in exclusion rates and in the application of criteria for excluding students. Paris, Lawton, Turner, and Roth (1991) have also demonstrated that “low achievers” are often excluded by some schools or districts, which has the effect of artificially raising district test scores. Such practices introduce additional error into analyses, complicate accurate policy studies, affect the rankings resulting from the test data, and introduce a basic unfairness into the use of test data (National Academy of Education, 1992).

Use of test scores for unintended purposes. Many of the most visible misuses of tests occur when scores are used for unintended purposes (Linn, Baker, & Dunbar, 1991). This occurs with state comparisons of unadjusted SAT or ACT scores, when results from state assessments are used as indicators of teacher competence, and when test results become the primary basis for inferences concerning the relative quality of learning or education among different schools or geographical regions. Test scores will always be considered an important indicator of learning. However, test users must become more aware of the extraordinary limitations and weaknesses of placing undue weight on test scores in such situations. Many state report cards have attempted to provide public reports on the quality of education by examining a range of criteria (e.g., safety, learning, continuation in higher education, gainful employment, student honors) with a range of indicators that extend beyond test scores. As the test user becomes increasingly removed from personal knowledge of the examinee, or less familiar with the units of comparison (e.g., schools, districts), instances of mismeasurement and test misuse will increase (Scheuneman & Oakland, in press).
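The arithmetic behind the exclusion effect described above can be sketched in a few lines of Python. The district scores and the exclusion cutoff below are hypothetical illustrations, not data from any study cited here:

```python
def mean(scores):
    """Arithmetic mean of a list of test scores."""
    return sum(scores) / len(scores)

def mean_after_exclusion(scores, cutoff):
    """Mean after removing every score below `cutoff`
    (a hypothetical exclusion rule; real criteria vary by school)."""
    retained = [s for s in scores if s >= cutoff]
    return mean(retained)

# A hypothetical district of ten students (scale scores).
district = [410, 430, 450, 480, 500, 520, 540, 560, 580, 600]

full = mean(district)                            # all students tested
reported = mean_after_exclusion(district, 450)   # "low achievers" excluded

print(f"mean with all students: {full:.1f}")
print(f"mean after exclusion:   {reported:.1f}")
print(f"artificial gain:        {reported - full:.1f}")
```

Removing the two lowest scorers raises the reported mean by more than 20 points in this toy example, with no change at all in what any student actually learned — which is exactly why differential exclusion rates make cross-district comparisons misleading.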
Conclusion

This paper has attempted to distinguish among legal and regulatory mandates, ethical issues, and professional responsibilities, all of which concern the appropriate use of tests and test data. Numerous efforts have been undertaken by testing professionals and professional organizations to improve the responsible use of tests, yet these efforts are often judged to have fallen short. As tests are used by an increasing number of users with a variety of objectives (e.g., policy makers, state and local education officials, business), the potential for misuse of tests increases and efforts to educate and monitor test users become less effective. Existing testing standards, specialty guidelines, and other means of addressing the responsible use of tests have been discussed. The potential consequences of testing and assessment have been reviewed, and a taxonomy has been proposed to aid test users in addressing the multiple and multidimensional consequences resulting from test use with various key stakeholder groups. Finally, this paper has provided a more detailed review of the professional concerns arising from the migration of tests to computer-based platforms and of the increased demands placed on assessments in U.S. education.

The value of assessment is often related to its impact. Individual appraisals should bring to bear all relevant information to describe and explain important qualities, minimize problems, promote growth and development, and increase the validity of important decisions (e.g., course placement, admissions, certification, selection) (Scheuneman & Oakland, in press). National, state, and local testing programs should provide comprehensive data that can supplement other sources of information, both in informing us of student skills and knowledge today and in tracking growth in learning over time.

Legal, ethical, and professional concerns with assessment are difficult to distinguish. All such issues concern the proper use of assessment and the probable consequences of using assessments. Consequences of testing are in the eye of the beholder: the same assessment which presents several potential benefits to some groups (e.g., policy makers, the community, business) may also result in negative consequences for individuals (e.g., test takers, students). A paradigm is needed to assist test users in identifying and evaluating the potential consequences that result from test use and the consequences which would result from alternative practices (use of more subjective processes, collecting no data). Additional attention to the consequences of testing, and to how these are determined and evaluated by the various stakeholders, is essential to reduce the misuse of testing and to improve assessment practices among the increasingly diverse types of individuals using tests and results from testing.

Résumé

Tests and assessments have taken on additional burdens as their use has been extended by teachers, employers, and decision makers. Growing demands are often placed on these same assessments by various bodies to serve varied purposes (e.g., instructional reform, student accountability, the quality of teaching methods). These trends have raised considerable concerns about the appropriate use of tests and their results, and testing practices are being examined ever more closely in education, employment, and health care. The present article distinguishes the legal, ethical, and professional problems that have recently appeared as a result of increased demands for assessment, and it identifies the specific problems linked to computerized test administration and interpretation. The author reviews the efforts undertaken by professional associations to improve assessment practices, as well as the problems concerning the proper use of assessments in education. Finally, a methodology is proposed for identifying the consequences linked to test use, together with a taxonomy for evaluating the multidimensional consequences of test use in a given context.

Author’s address:

Dr. Wayne J. Camara
The College Board
19 Hawthorne Drive
Princeton Junction, NJ 08550
USA
E-mail: wcamara@collegeboard.org

References

Aiken, L., West, S. G., Sechrest, L., Reno, R. R., Roediger, H. L., III, Scarr, S., Kazdin, A. E., & Sherman, S. J. (1990). Graduate training in statistics, methodology, and measurement in psychology: A survey of PhD programs in North America. American Psychologist, 45, 721–734.

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1985). Standards for educational and psychological testing. Washington, DC: APA.

American Educational Research Association & National Council on Measurements Used in Education. (1955). Technical recommendations for achievement tests. Washington, DC: National Educational Association.

American Federation of Teachers, National Council on Measurement in Education, & National Educational Association. (1990). Standards for teacher competence in educational assessment of students. Washington, DC: Authors.

American Psychological Association. (1953). Ethical standards for psychologists. Washington, DC: Author.

American Psychological Association. (1954). Technical recommendations for psychological tests and diagnostic techniques. Washington, DC: Author.

American Psychological Association. (1993). Ethical principles of psychologists and code of conduct. American Psychologist, 49, 1597–1611.

Association of Personnel Test Publishers. (1990). Model guidelines for preemployment integrity testing programs. Washington, DC: Author.

Berliner, D. C., & Biddle, B. J. (1995). The manufactured crisis: Myths, fraud, and the attack on America’s public schools. Reading, MA: Addison-Wesley.

Brown, D. C. (1997, February 8). Personal correspondence.

Camara, W. J. (1994). Consequences of test use: The need for criteria. Paper presented at the 23rd International Congress of Applied Psychology, Madrid, Spain.

Camara, W. J. (1996). Fairness and public policy in employment testing. In R. Barrett (Ed.), Fair employment strategies in human resource management (pp. 3–11). Westport, CT: Quorum Books.

Camara, W. J., & Brown, D. C. (1995). Educational and employment testing: Changing concepts in measurement and policy. Educational Measurement: Issues and Practice, 14, 1–8.

Center for Research on Evaluation, Standards and Student Testing (CRESST). (1995). Results from the 1995 CRESST conference: Assessment at the crossroads. Los Angeles: UCLA, CRESST.

College Entrance Examination Board. (1988). Guidelines on the uses of College Board test scores and related data. New York: Author.

Cole, N., & Willingham, W. (1997). Gender and fair assessment. Hillsdale, NJ: Erlbaum.

Council on Licensure, Enforcement, and Regulation. (1993). Development, administration, scoring, and reporting of credentialing examinations. Lexington, KY: Council of State Governments.

Council on Licensure, Enforcement, and Regulation & National Organization for Competency Assurance. (1993). Principles for fairness: An examining guide for credentialing boards. Lexington, KY: Author.

Cronbach, L. J. (1988). Five perspectives on validity arguments. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 3–17). Hillsdale, NJ: Erlbaum.

Deutsch, C. H. (1988, October 16). A mania for testing spells money. New York Times.

Dunbar, S. B., Koretz, D. M., & Hoover, H. D. (1991). Quality control in the development and use of performance assessments. Applied Measurement in Education, 4(4), 289–303.

Educational Testing Service. (1996, October 31). Test cheating scheme used encoded pencils, complaint charges. ETS Access. Princeton, NJ: Author.

Employee Polygraph Protection Act of 1988, 29 U.S.C. Sec. 2001 et seq.

Everson, H. E. (in press). A theory-based framework for future college admissions tests. In S. Messick (Ed.), Assessment in higher education. Hillsdale, NJ: Erlbaum.

Eyde, L. D., & Kowal, D. M. (1987). Computerized test interpretation services: Ethical and professional concerns regarding U.S. producers and users. Applied Psychology: An International Review, 36, 401–417.

Eyde, L. D., & Quaintance, M. K. (1988). Ethical issues and cases in the practice of personnel psychology. Professional Psychology: Research and Practice, 19(2), 148–154.

Fairtest. (1996, Summer). Cheating cases reveal testing mania. Fairtest Examiner, 9, 3–4.

Hambleton, R. K. (1994). Guidelines for adapting psychological and educational tests: A progress report. European Journal of Psychological Assessment, 10, 229–244.

Haney, W. (1996). Standards, schmandards: The need for bringing test standards to bear on assessment practice. Paper presented at the Annual Meeting of the American Educational Research Association, New York.

Haney, W., & Madaus, G. C. (1991). In R. K. Hambleton & J. C. Zaal (Eds.), Advances in educational and psychological testing (pp. 395–424). Boston, MA: Kluwer.

Haney, W., Madaus, G. C., & Lyons, R. (1993). The fractured marketplace for standardized testing. Boston, MA: Kluwer.

Helms, J. E. (1992). Why is there no study of cultural equivalence in standardized cognitive ability testing? American Psychologist, 47, 1083–1101.

Joint Committee on Testing Practices. (1988). Code of fair testing practices in education. Washington, DC: Author. (Copies may be obtained from NCME, Washington, DC.)

Joint Committee on Testing Practices. (1996). Rights and responsibilities of test takers (Draft). Washington, DC: Author.

Kirst, M. W., & Mazzeo, C. (1996). The rise and fall of state assessment in California, 1993–96. Phi Delta Kappan, 22, 319–323.

Lindsay, G. (1996). Ethics and a changing society. European Psychologist, 1, 85–88.

Linn, R. L. (1993). Educational assessment: Expanded expectations and challenges. Educational Evaluation and Policy Analysis, 15(1), 1–16.

Linn, R. L., Baker, E. L., & Dunbar, S. B. (1991). Complex, performance-based assessment: Expectations and validation criteria. Educational Researcher, 20(8), 15–21.

Madaus, G. F. (1992). An independent auditing mechanism for testing. Educational Measurement: Issues and Practice, 11, 26–31.

Matarazzo, J. D. (1986). Computerized clinical psychological test interpretations: Unvalidated plus all mean and no sigma. American Psychologist, 41, 14–24.

McGrew, K. S., Thurlow, M. L., & Spiegel, A. N. (1993). An investigation of the exclusion of students with disabilities in national data collection programs. Educational Evaluation and Policy Analysis, 15, 339–352.

Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 33–46). New York: Macmillan.

Mills, C. N., & Stocking, M. L. (1996). Practical issues in large-scale computerized adaptive testing. Applied Measurement in Education, 9, 287–304.

National Academy of Education. (1992). Assessing student achievement in the states: The first report of the National Academy of Education panel on the evaluation of the NAEP Trial State Assessment: 1990 Trial State Assessment. Stanford, CA: Stanford University, National Academy of Education.

National Association of Collegiate Admissions Counselors. (1995). NACAC commission on the role of standardized testing in college admissions. Author.

New York Times. (1997, January 22). One of the newest take-at-home tests: IQ.

Paris, S. G., Lawton, T. A., Turner, J. C., & Roth, J. L. (1991). A developmental perspective on standardized achievement testing. Educational Researcher, 20(5), 12–20.

Phillips, S. E. (1996). Legal defensibility of standards: Issues and policy perspectives. Educational Measurement: Issues and Practice, 15, 5–13.

Russell, M., & Haney, W. (1997). Testing writing on computers: An experiment comparing student performance on tests conducted via computer and via paper-and-pencil. Education Policy Analysis Archives, 5(3), 1–18.

Sackett, P. R., Burris, L. R., & Callahan, C. (1989). Integrity testing for personnel selection: An update. Personnel Psychology, 42, 491–529.

Sackett, P. R., & Harris, M. M. (1984). Honesty testing for personnel selection: A review and critique. Personnel Psychology, 32, 487–506.

Scheuneman, J. D., & Oakland, T. (in press). High stakes testing in education.

Schmeiser, C. B. (1992). Ethical codes in the professions. Educational Measurement: Issues and Practice, 11(3), 5–11.

Society for Industrial and Organizational Psychology. (1987). Principles for the validation and use of personnel selection procedures. Bowling Green, OH: Author.

Vroom, V. H. (1964). Work and motivation. New York: Wiley.