INTRODUCTION
Educational and psychological testing and assessment are among the most important contributions of behavioral science to our society, providing fundamental and significant sources of information about individuals and groups. Although not all tests are well developed, nor are all testing practices wise and beneficial, there is extensive evidence documenting the positive effects of well-constructed tests. The proper use of tests can result in better decisions about individuals and programs than would be the case without their use, and it can also provide a route to broader and more equitable access to education and employment. The improper use of tests, however, can cause considerable harm to test takers and other parties affected by test-based decisions. The intent of the Standards is to promote sound testing practices and to provide a basis for evaluating the quality of these practices.

PARTICIPANTS IN THE TESTING PROCESS


Educational and psychological testing and assessment involve and significantly affect individuals, institutions, and society as a whole. The individuals affected include students, parents, teachers, educational administrators, job applicants, employees, clients, patients, supervisors, executives, and evaluators, among others. The institutions affected include schools, colleges, businesses, industry, clinics, and government agencies. Individuals and institutions benefit when testing helps them achieve their goals. Society, in turn, benefits when testing contributes to the achievement of individual and institutional goals.

There are many participants in the testing process, including, among others: (a) those who prepare and develop the test; (b) those who publish and market the test; (c) those who administer and score the test; (d) those who use the test results for some decision-making purpose; (e) those who interpret test results for clients; (f) those who take the test by choice, direction, or necessity; (g) those who sponsor tests, which may be boards that represent institutions or governmental agencies that contract with a test developer for a specific instrument or service; and (h) those who select or review tests, evaluating their comparative merits or suitability for the uses proposed. In general, those who are participants in the testing process should have appropriate knowledge of tests and assessments to allow them to make good decisions on what tests to use and how to interpret test results. Uses of tests and testing practices are improved to the extent that those involved have adequate levels of assessment literacy.

The interests of the various parties involved in the testing process may or may not be congruent. For example, when a test is given for counseling purposes or for job placement, the interests of the individual and the institution often coincide. In contrast, when a test is used to select from among many individuals for a highly competitive job or for entry into an educational or training program, the preferences of an applicant may be inconsistent with those of an employer or admissions officer. Similarly, when testing is mandated by a court, the interests of the test taker may be different from those of the party requesting the court order. Individuals or institutions may serve several roles in the testing process. For example, in clinics the test taker is typically the intended beneficiary of the test results. In some situations the test administrator is an agent of the test developer, and sometimes the test administrator is also the test user. When an industrial organization prepares its own employment tests, it is both the developer and the user. Sometimes a test is developed by a test author but published, advertised, and distributed by an independent publisher, though the publisher may play an active role in the test development process. Roles may also be further subdivided. For example, both the organization and a professional assessor may play a role in the provision of an assessment center. Given this intermingling of roles, it is difficult to assign precise responsibility for addressing various standards to specific participants in the testing process.

The Standards is based on the premise that effective testing and assessment require that all participants in the testing process possess the knowledge, skills, and abilities necessary to fulfill their role in the testing process, as well as an awareness of personal and contextual factors that may influence the testing process. For example, test developers and those selecting tests and interpreting test results need adequate knowledge of psychometric principles such as validity and reliability. They also should obtain any appropriate supervised experience and legislatively mandated practice credentials that are required to perform competently those aspects of the testing process in which they engage.
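
To make the reliability portion of that psychometric knowledge concrete, the minimal sketch below computes a common internal-consistency estimate (Cronbach's alpha) from a matrix of item scores. It is purely illustrative and not part of the Standards; the function name and the small data set are invented for the example.

```python
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """Internal consistency: k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    item_scores = np.asarray(item_scores, dtype=float)
    k = item_scores.shape[1]                         # number of items
    item_vars = item_scores.var(axis=0, ddof=1)      # per-item variances
    total_var = item_scores.sum(axis=1).var(ddof=1)  # variance of examinees' total scores
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Hypothetical responses of five examinees to a four-item scale scored 0-3.
scores = np.array([
    [2, 3, 2, 3],
    [1, 1, 0, 1],
    [3, 3, 3, 2],
    [0, 1, 1, 0],
    [2, 2, 3, 3],
])
print(f"alpha = {cronbach_alpha(scores):.2f}")  # roughly 0.93 for these invented data
```

In practice, such an estimate is only one piece of evidence; it is interpreted alongside validity evidence and the other considerations discussed throughout the Standards.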

THE PURPOSE OF THE STANDARDS


The purpose of publishing the Standards is to provide criteria for the evaluation of tests, testing practices, and the effects of interpretations of test scores for the intended test use(s). Although the evaluation of the appropriateness of an interpretation of a test result for its intended use should depend heavily on professional judgment, the Standards provides a frame of reference to assure that relevant issues are addressed. All professional test developers, sponsors, publishers, and users should make reasonable efforts to satisfy and follow the Standards and encourage others to do so. All applicable standards should be met by all tests and in all test uses unless a sound professional reason is available to show why the standard is not relevant or technically feasible in a particular case.

The Standards makes no attempt to provide psychometric answers to questions of public policy regarding the use of tests. In general, the Standards advocates that, within feasible limits, the relevant technical information be made available so that those involved in policy decisions may be fully informed.

TESTS AND TEST USES TO WHICH THESE STANDARDS APPLY


A test is an evaluative device or procedure in which a sample of an examinee's behavior in a specified domain is obtained and subsequently evaluated and scored using a standardized process. While the label test is ordinarily reserved for instruments on which responses are evaluated for their correctness or quality, and the terms scale or inventory are used for measures of attitudes, interests, and dispositions, the Standards uses the single term test to refer to all such evaluative devices.

A distinction is sometimes made between test and assessment. Assessment is a broader term than test, commonly referring to a process that integrates test information with information from other sources (e.g., information from other tests, the individual's social, educational, employment, health, or psychological history). The applicability of the Standards to an evaluation device or method is not altered by the label applied to it (e.g., test, assessment, scale, inventory).

Tests differ on a number of dimensions: the mode in which test materials are presented (paper and pencil, oral, computerized administration, and so on); the degree to which stimulus materials are standardized; the type of response format (selection of a response from a set of alternatives as opposed to the production of a response); and the degree to which test materials are designed to reflect or simulate a particular context. In all cases, however, tests standardize the process by which test-taker responses to test materials are evaluated and scored. As noted in prior versions of the Standards, the same general types of information are needed for all varieties of tests.
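
For readers who catalog or review instruments, these dimensions can be recorded as structured fields. The sketch below is a hypothetical illustration of one way to do so; the class, field names, and category labels are assumptions made for the example, not terminology prescribed by the Standards.

```python
from dataclasses import dataclass
from enum import Enum

class PresentationMode(Enum):
    PAPER_AND_PENCIL = "paper and pencil"
    ORAL = "oral"
    COMPUTERIZED = "computerized administration"

class ResponseFormat(Enum):
    SELECTED = "selection from a set of alternatives"
    CONSTRUCTED = "production of a response"

@dataclass
class TestDescription:
    """Hypothetical record of the dimensions on which tests differ."""
    name: str
    presentation_mode: PresentationMode
    stimulus_standardization: str      # e.g., "fully standardized" or "semi-structured"
    response_format: ResponseFormat
    context_fidelity: str              # degree to which materials simulate a particular context
    scoring_standardized: bool = True  # in all cases, evaluation and scoring are standardized

# Illustrative (invented) entry for a computer-administered multiple-choice test.
example = TestDescription(
    name="Hypothetical Grade 8 Mathematics Test",
    presentation_mode=PresentationMode.COMPUTERIZED,
    stimulus_standardization="fully standardized",
    response_format=ResponseFormat.SELECTED,
    context_fidelity="abstract items with little simulation of a real-world context",
)
print(example)
```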

The precise demarcation between those measurement devices used in the fields of educational and psychological testing that do and do not fall within the purview of the Standards is difficult to identify. Although the Standards applies most directly to standardized measures generally recognized as "tests," such as measures of ability, aptitude, achievement, attitudes, interests, personality, cognitive functioning, and mental health, the Standards also may be usefully applied in varying degrees to a broad range of less formal assessment techniques. Admittedly, rigorous application of the Standards to unstandardized questionnaires, to the broad range of unstructured behavior samples used in some forms of clinical- and school-based psychological assessment (e.g., an intake interview), and to instructor-made tests that are used to evaluate student performance in education and training is generally not possible. It is useful to distinguish devices that lay claim to the concepts and techniques of the field of educational and psychological testing from those that represent nonstandardized or less standardized aids to day-to-day evaluative decisions. Although the principles and concepts underlying the Standards can be fruitfully applied to day-to-day decisions, such as when a business owner interviews a job applicant, a manager evaluates the performance of subordinates, a teacher uses classroom assessment to monitor student progress on an instructional lesson, or a coach evaluates a prospective athlete, it would be overreaching to expect those making such decisions to follow the standards of the educational and psychological testing field. In contrast, a structured interviewing system developed by a psychologist and accompanied by claims that the system has been found to be predictive of job performance in a variety of other settings falls within the purview of the Standards.

SCOPE OF THE REVISION


This volume serves as a revision of the 1999 Standards for Educational and Psychological Testing. The revision process started with the appointment of a Management Committee, composed of representatives of the three sponsoring organizations (AERA, APA, NCME), who were responsible for overseeing the general direction of the effort. To guide the revision, the Management Committee solicited and synthesized comments on the 1999 Standards from members of the sponsoring organizations and convened the Joint Committee for the Revision of the 1999 Standards in 2009 to do the actual revision. The Joint Committee also was composed of representatives of the three sponsoring organizations and was charged by the Management Committee to address five major areas: consider the accountability issues for use of tests in educational policy; broaden the concept of accessibility of tests for all examinees; represent more comprehensively the role of tests in the workplace; broaden the role of technology in testing; and provide for a better organizational structure for communicating the standards.

To be responsive to this charge, several actions were taken:



• The chapters on Educational Testing and Policy Uses of Test Results were rewritten to attend to the issues of uses of tests for educational accountability purposes.

• A new chapter on Fairness in Testing was written to emphasize accessibility and fairness as fundamental issues in testing, and specific concerns for fairness are threaded throughout all of the chapters of the Standards.

• The chapter on Testing in Employment and Credentialing was reorganized, clearly identifying when a standard is relevant to employment and/or credentialing.

• Technology was considered throughout the volume. One of the major technology issues identified was the tension between the use of proprietary algorithms and the need for test users to be able to evaluate complex applications in areas such as automated scoring of essays, administering and scoring of innovative item types, and computer-based testing. These issues are considered in the Test Design and Development chapter of these Standards.

• A content editor was engaged to help with both the technical accuracy and clarity of each chapter and also with the consistency of language across chapters. As noted below, chapters in the Foundations and Operations sections now have an "overarching standard" as well as themes under which the individual standards are organized. In addition, the glossary from the 1999 Standards for Educational and Psychological Testing was updated.

A major change in the organization of this volume involves the conceptualization of fairness. The previous edition of the Standards (1999) had a section devoted to this topic, with separate chapters on Fairness in Testing and Test Use, Testing Individuals of Diverse Linguistic Backgrounds, and Testing Individuals with Disabilities. In this edition of the Standards, the topics addressed in those chapters are combined into a single, comprehensive chapter, and the chapter is located in Part I, Foundations. This change was made to emphasize that fairness demands that all test takers be treated equitably, not just individuals from particular subgroups. Fairness and accessibility, the unobstructed opportunity for all examinees to demonstrate their standing on the construct(s) being measured, are relevant for all individuals and subgroups in the intended population of test takers. Because issues related to fairness in testing are not restricted to individuals with diverse linguistic backgrounds or those with disabilities, the chapter was more broadly cast to support appropriate testing experiences for all individuals. While the examples in the chapter often refer to individuals with diverse linguistic and cultural backgrounds and individuals with disabilities, they also include examples relevant to older adults and young children to illustrate potential barriers to fair and equitable assessment for all examinees.

ORGANIZATION OF THE VOLUME


Part I of the Standards, "Foundations," contains standards for Validity (Chapter 1); Errors of Measurement and Reliability/Precision (Chapter 2); and Fairness in Testing (Chapter 3). Part II, "Operations," consists of the following chapters: Chapter 4: Test Design and Development; Chapter 5: Scores, Scales, Norms, Cut Scores, and Score Linking; Chapter 6: Test Administration, Scoring, and Reporting; Chapter 7: Supporting Documentation for Tests; Chapter 8: Rights and Responsibilities of Test Takers; and Chapter 9: Rights and Responsibilities of Test Users. Part III treats specific "Testing Applications" and contains standards involving Psychological Testing and Assessment (Chapter 10); Workplace Testing and Credentialing (Chapter 11); Educational Testing and Assessment (Chapter 12); and Policy Uses of Test Results (Chapter 13).

Each chapter begins with introductory text that provides background for the standards that follow. Although the text is at times prescriptive and exhortatory, it should not be interpreted as imposing additional standards. The Standards also contains an index and includes a glossary that provides definitions for terms as they are specifically used in this volume.

CATEGORIES OF STANDARDS


The text of each standard, and any accompanying commentary, discusses the conditions under which a standard is relevant. Depending on the context and purpose of test development or use, some standards will be more salient than others. Moreover, some standards are broad in scope, setting forth concerns or requirements relevant to nearly all tests or testing contexts, and other standards are narrower in scope. However, all standards are important in the contexts to which they apply. Any classification that gives the appearance of elevating the general importance of some standards over others could invite neglect of certain standards that need to be addressed in particular situations. Rather than differentiate standards using priority labels such as "Primary," "Secondary," or "Conditional" (as in the 1985 Standards), this edition instead emphasizes that unless a standard is deemed irrelevant, inappropriate, or technically infeasible for a particular use, all standards should be met, making all of them essentially "Primary."

Unless otherwise specified in the standard or commentary, and with the caveats outlined below, standards should be met before operational test use. Each standard should be carefully considered to determine its applicability to the testing context under consideration. In a given case there may be a sound professional reason why adherence to the standard is unnecessary. There may also be occasions when technical feasibility influences whether a standard can be met prior to operational test use. For example, some standards may call for analyses of data that may not be available at the point of initial operational test use. In other cases, traditional quantitative analyses may not be feasible due to small sample sizes. However, there may be other methodologies that could be used to gather information to support the standard, such as small-sample methodologies, qualitative studies, focus groups, or even logical analysis. In such instances, test developers and users should make a good-faith effort to provide the kinds of data called for in the standard to support valid interpretations of the test results for their intended purposes. If test developers, users, and, when applicable, sponsors have deemed a standard to be inapplicable or infeasible, they should be able, if called upon, to explain the basis for their decision. However, there is no expectation that documentation of all such decisions be routinely available.
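
As a purely illustrative sketch of one such small-sample methodology (not a procedure the Standards prescribes), the example below uses a percentile bootstrap to attach a confidence interval to a test-criterion correlation computed from a small sample; the data and sample size are invented for the illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical small sample: test scores and a criterion measure for 12 examinees.
test_scores = np.array([41, 47, 52, 55, 58, 60, 63, 66, 70, 74, 79, 83], dtype=float)
criterion = np.array([2.1, 2.4, 2.2, 2.9, 2.6, 3.1, 3.0, 3.4, 3.2, 3.6, 3.5, 3.9])

def pearson_r(x, y):
    """Pearson correlation between two equal-length arrays."""
    return float(np.corrcoef(x, y)[0, 1])

# Percentile bootstrap: resample examinees with replacement and recompute r each time.
n = len(test_scores)
boot_r = []
for _ in range(5000):
    idx = rng.integers(0, n, size=n)
    boot_r.append(pearson_r(test_scores[idx], criterion[idx]))

lo, hi = np.percentile(boot_r, [2.5, 97.5])
print(f"observed r = {pearson_r(test_scores, criterion):.2f}, "
      f"95% bootstrap CI = [{lo:.2f}, {hi:.2f}]")
```

Such an interval conveys the uncertainty that accompanies small-sample estimates; it supplements, rather than replaces, the professional judgment that the Standards calls for.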

PRESENTATION OF INDIVIDUAL STANDARDS


The standards in each chapter in Parts I and II (Foundations and Operations) are introduced by an overarching standard, designed to convey the central intent of the chapter. These overarching standards, always numbered as .0 (e.g., 1.0 for the overarching standard in the Validity chapter), summarize guiding principles that are applicable to all tests and test uses. Further, the themes and standards were ordered in each chapter to be consistent with the sequence of the material in the introductory text for the chapter. Because some users of the Standards may turn only to chapters directly relevant to a given application, certain standards are repeated in different chapters, particularly in the Applications section (Part III). When such repetition occurs, the essence of the standard is the same. Only the wording, area of application, or level of elaboration in the comment is changed.

CAUTIONS TO BE CONSIDERED IN USING THE STANDARDS


Several cautions are important to avoid misinterpreting the Standards:

• Evaluating the acceptability of a test or test application does not rest on the literal satisfaction of every standard in this document, and acceptability cannot be determined by using a checklist. Specific circumstances affect the importance of individual standards, and individual standards should not be considered in isolation. Therefore, evaluating acceptability involves (a) professional judgment that is based on a knowledge of behavioral science, psychometrics, and the community standards in the professional field to which the test applies; (b) the degree to which the intent of the standard has been satisfied by the test developer and user; (c) the alternatives that are readily available; (d) research and experiential evidence regarding the feasibility of meeting the standard; and (e) applicable laws and regulations for the professional field.

• When tests are at issue in legal proceedings and other venues requiring expert witness testimony, it is essential that professional judgment be based on the accepted corpus of knowledge in determining the relevance of particular standards in a given situation. The intent of the Standards is to offer guidance for such judgments.

• Claims by test developers or test users that a test, manual, or procedure satisfies or follows the standards in this volume should be made with care. It is appropriate for developers or users to state that efforts were made to adhere to the Standards, and to provide documents describing and supporting those efforts. Blanket claims without supporting evidence should not be made. These standards are concerned with a field that is evolving. Consequently, there is a continuing need to monitor changes in the field and to revise this document as knowledge develops.

• Prescription of the use of specific technical methods is not the intent of the Standards. For example, where specific statistical reporting requirements are mentioned, the phrase "or generally accepted equivalent" always should be understood.

• The standards do not attempt to repeat or to incorporate the many legal or regulatory requirements that might be relevant to the issues they address. In some areas, such as the collection, analysis, and use of test data and results for different subgroups, the law may both require participants in the testing process to take certain actions and prohibit those participants from taking other actions. Where it is apparent that one or more standards or comments address an issue on which established legal requirements may be particularly relevant, the standard, comment, or introductory material may make note of that fact. Lack of specific reference to legal requirements, however, does not imply that no relevant requirement exists. In all situations, participants in the testing process should separately consider and, where appropriate, obtain legal advice on legal and regulatory requirements.
