
BACHMAN

CHAPTER 3
Presented by:
AliReza Khazaei
• Major uses of language tests:

1. As sources of information for making decisions within the context of educational programs
2. As indicators of abilities or attributes that are of interest in research on language, language acquisition, and language teaching

• In the context of research, the interpretation of test results is of both theoretical and applied interest.

• Such interpretations can assist in:

I. Understanding the nature of language proficiency
II. Drawing implications for language learning and teaching
• In educational settings, the major uses of test scores are
related to evaluation or making decisions about people or
programs.

• Evaluation comprises essentially two components:

1. Information
2. Value judgment or decision

• The information relevant to evaluation can be either:

• Qualitative: observations, performance checklists
• Quantitative: teacher ratings, class ranks
• The use of tests as a source of evaluation information
requires three assumptions:

1. We must assume that information regarding educational outcomes is essential to effective formal education, and that accountability and feedback are essential mechanisms for the effectiveness of educational programs.

2. We must assume that it is possible to improve learning and teaching through appropriate changes in the program, based on feedback.

3. We must assume that the educational outcomes of the given program are measurable.
• In addition to those assumptions, we must also consider
how much and what kind of testing is needed, as well as
the quality of information provided by our tests.

• The amount and type of testing that is done depends upon:

1. The decisions that are to be made
2. The type of information that is needed to make the correct decisions
• A second consideration in using tests is the quality of the
information that they must provide.

• In educational programs, the decisions made are generally about people and have some effect on their lives.

• It is therefore essential that the information upon which we base these decisions be as reliable and as valid as possible.

• The necessary level of reliability and validity is relative and depends on the importance of the decisions to be made.
• In general, the more important the decision to be made,
the greater the effort that should be expended in assuring
that the test is reliable and valid.

• In every decision situation, there is a certain probability of errors, and costs associated with those errors.

• The more important the decision, the greater the cost of making an error.

• By maximizing the reliability and validity of the information provided by tests, we reduce the probability of errors in the decisions we make, and hence the potential costs associated with those errors.
• At the same time, there are costs associated with assuring
reliability and validity, even in classroom testing.

• Collecting the right kinds of information to estimate reliability takes not only training, but time and effort.

• Similarly, the care that goes into designing and developing a test whose content is representative of a given course is a cost incurred in assuring content validity.

• In order to achieve very high levels of reliability and validity, we must carry out rigorous and extensive research and development, and this can be quite costly.
• In deciding how much time and effort to put into assuring reliability and validity, we need to consider the cost-effectiveness of such expenditures.

• The costs associated with assuring reliability and validity should be offset by the potential reduction in costs associated with decision errors.

• If the costs associated with decision errors are minimal, then it would be wasteful to expend a great deal of time and effort to assure high levels of reliability and validity.

• On the other hand, if the potential costs of errors are great, it would be unethical not to make every effort to achieve the highest levels of reliability and validity possible.
• Decisions are of two types:

1. Decisions about individuals: Micro-evaluation

I. Decisions about students
II. Decisions about teachers

2. Decisions about the program: Macro-evaluation

• Decisions about students include:

• Selection (entrance, readiness): The first decision that may be made about students is whether they should enter the program. If the purpose of this test is to determine whether students are ready for instruction, it may be referred to as a readiness test.

• Placement: Decisions regarding the placement of students into appropriate groups.

• Diagnosis: Information from language tests can be used for diagnosing students’ areas of strength and weakness in order to determine appropriate types and levels of teaching and learning activities.
• Progress and grading

• Formative evaluation:

I. Provides continuous feedback to both teacher and learner for making decisions regarding appropriate modifications in the instructional procedures and learning activities.
II. Identifies the learner’s areas of strength and weakness.
III. Decisions are made on the basis of qualitative feedback.
IV. Tests provide additional information on the student’s progress.

• Summative evaluation:

I. Typically takes the form of grades or marks.
II. Takes place at the end of an instructional course.
III. Grades are assigned on the basis of performance on tests in addition to classroom performance.

• The content of tests used for making decisions regarding progress and grades should be based on the syllabus rather than on a theory of language proficiency.
• Decisions about teachers include:

I. The decision to hire a given individual as a teacher will depend on a wide range of information, some of which may be obtained from tests.

II. It is important to recognize that the proficiency required of a language teacher may be both quantitatively and qualitatively different from that of the students.

III. The language tests that are used for students may not be sufficient for evaluating teachers’ communicative language ability.
• Decisions about programs

• In developing a new program, its components are evaluated in terms of appropriacy, efficiency and effectiveness.

• Learners’ performance on achievement tests can provide:

I. An indication of the achievement of the program’s objectives
II. A means of pinpointing areas of deficiency

• Formative evaluation of programs
• The focus is on providing information that will be useful for
making decisions about a program while it is under
development.
• Achievement tests based on the content of the syllabus

• Summative evaluation of programs

• The focus is on whether our program is better than other comparable programs, or whether it is the ‘best’ program currently available.
• Measures of language proficiency are obtained in addition to information on students' achievement of syllabus objectives.
• Research uses of language tests

• As operational definitions of theoretical constructs, language tests have a potentially important role in virtually all research, both basic and applied.

• It is now generally agreed that language proficiency:

I. Is not a single unitary ability
II. Consists of several distinct but related constructs in addition to a general construct of language proficiency
1. Research into the nature of language proficiency

• Focuses on identifying and empirically verifying various components of language proficiency.

• Of particular interest in this regard are models of communicative competence, which have provided the theoretical definitions for the development of tests of constructs such as:

I. Sensitivity to cohesive relationships
II. Discourse organization
III. Differences in register
2. Research into the nature of language processing

• Responses to language tests can provide a rich body of data for the identification of processing errors and their explanation, while language testing techniques can serve as elicitation procedures for collecting information on language processing.

3. Research on the nature of language acquisition

• Studies of language acquisition often require indicators of the amount of language acquired for use as criterion or dependent variables, and these indicators frequently include language tests.
4. Research into factors related to language acquisition

• Factors include:

I. Language aptitude
II. Level of proficiency in the native language

• Although language attrition (loss) is not simply the reverse of language acquisition, many of the same factors that have been examined with respect to language acquisition are also hypothesized to affect language attrition, and language tests also have a role to play in this area of research.
5. Research into effects of different instructional settings
and techniques on language acquisition

• Language tests have also provided criterion indicators of language ability for studies of classroom-centered second language acquisition, and for research into the relationship between different language teaching strategies and aspects of second language competence.
• Language tests can be classified according to five
distinctive features:

1. The purpose or use for which they are intended
2. The content upon which they are based
3. The frame of reference within which their results are to be interpreted
4. The way in which they are scored
5. The specific technique or method they employ

• Intended use
• Tests are developed with a particular primary use in mind.

• In research, language tests are used to provide information for comparing the performances of individuals.

• In educational settings, however, language tests provide information for making a wide variety of decisions, such as:

I. Admission decisions (entrance & readiness tests)
II. Identifying the appropriate level at which instruction is needed (placement & diagnostic tests)
III. Decisions about how individuals should proceed through the program or how well they are attaining the program’s objectives (achievement tests)
• Content

• Tests can be based on either a theory of language proficiency or a specific domain of content (a course syllabus).

• We can refer to theory-based tests as proficiency tests, while syllabus-based tests are generally referred to as achievement tests.

• Language aptitude tests are also distinguished according to content.

• Like language proficiency tests, language aptitude tests are theory-based, but the theory upon which they are based includes abilities that are related to the acquisition, rather than the use, of language.
• Frame of reference

• The results of language tests can be interpreted in two different ways, depending on the frame of reference adopted.

• When test scores are interpreted in relation to the performance of a particular group of individuals, we speak of a norm-referenced interpretation.

• On the other hand, if they are interpreted with respect to a specific level or domain of ability, we speak of a criterion-referenced interpretation.
• Norm-referenced tests

• Designed to enable the test user to make ‘normative’ interpretations of test results.

• Test results are interpreted with reference to the performance of a given group, or norm.

• The norm group is typically a large group of individuals who are similar to the individuals for whom the test is designed.

• In the development of NR tests, the norm group is given the test, and then the characteristics or norms of this group’s performance are used as reference points for interpreting the performance of other students who take the test.
• Typical performance characteristics used are (see the formulas below):
• Mean (x̄): the average score of the group
• Standard deviation (s): an indicator of how spread out the scores of the group are
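For reference (these standard formulas are my addition and do not appear on the slides), the mean and standard deviation of a set of n scores x_1, …, x_n are:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```

The n − 1 divisor is the usual sample-based convention; some treatments divide by n instead.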

• If the NR test is properly designed, the scores attained will typically be distributed in the shape of a normal bell-shaped curve.
• Characteristics of the normal bell-shaped curve (a short sketch verifying these percentages follows this list):

I. 50% of the scores are below the mean and 50% are above.
II. 34% of the scores are between the mean and one standard deviation above (+1s) or below (-1s) the mean.
III. 27% are between one and two standard deviations from the mean (13.5% above and 13.5% below).
IV. Only 5% of the scores will be as far away as two or more standard deviations from the mean.
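These percentages are rounded properties of the normal distribution and can be checked directly from its cumulative distribution function. The short Python sketch below is illustrative only (the function name is my own) and is not part of the slides:

```python
from math import erf, sqrt

def std_normal_cdf(z):
    """Proportion of a normal distribution lying below z standard deviations from the mean."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# I.  Half the scores lie below the mean.
below_mean = std_normal_cdf(0.0)                                     # 0.500
# II. Between the mean and one standard deviation above (or below) it.
mean_to_1sd = std_normal_cdf(1.0) - std_normal_cdf(0.0)              # ~0.341
# III. Between one and two standard deviations from the mean (both sides combined).
between_1_and_2sd = 2 * (std_normal_cdf(2.0) - std_normal_cdf(1.0))  # ~0.272
# IV. Two or more standard deviations away from the mean (both tails combined).
beyond_2sd = 2 * (1.0 - std_normal_cdf(2.0))                         # ~0.046

print(below_mean, mean_to_1sd, between_1_and_2sd, beyond_2sd)
```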
• The quintessential NR test is the standardized test, which
has three characteristics:

I. Standardized tests are based on a fixed, or standard, content, which does not vary from one form of the test to another.

II. There are standard procedures for administering and scoring the test, which do not vary from one administration of the test to the next.

III. Standardized tests have been thoroughly tried out, and through a process of empirical research and development, their characteristics are well known.
• Criterion-referenced tests

• Designed to enable the test user to interpret a test score with reference to a criterion level of ability or domain of content.

• The primary concerns in developing a CR test are that it adequately represent the criterion ability level or sample the content domain, and that it be sensitive to levels of ability or degrees of mastery of the different components of that domain.

• It is important to point out that it is this level of ability or domain of content that constitutes the criterion, and not the setting of a cut-off score for making decisions.
• Primary distinctions between NR and CR tests are:

I. In their design, construction, and development
II. In the scales they yield and the interpretation of these scales

• NR tests are designed and developed to maximize distinctions among individual test takers.

• This means that the items or parts of such tests will be selected according to how well they discriminate among individuals (one common way of quantifying this is sketched below).
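Bachman does not specify a particular statistic here, but one common classical index of how well an item discriminates is the upper-lower discrimination index: the difference between the proportion of high-scoring and low-scoring test takers who answer the item correctly. The sketch below is a hypothetical illustration; the function name and the choice of upper and lower thirds are my own assumptions, not part of the slides:

```python
def discrimination_index(item_correct, total_scores, group_fraction=1/3):
    """Upper-lower discrimination index for a single item.

    item_correct: 0/1 flags, 1 if the test taker answered the item correctly.
    total_scores: total test scores for the same test takers, in the same order.
    Returns p_upper - p_lower (range -1 to +1); higher values mean the item
    separates high scorers from low scorers more sharply.
    """
    ranked = sorted(zip(total_scores, item_correct), key=lambda pair: pair[0])
    k = max(1, int(len(ranked) * group_fraction))
    lower = [correct for _, correct in ranked[:k]]   # lowest-scoring group
    upper = [correct for _, correct in ranked[-k:]]  # highest-scoring group
    return sum(upper) / len(upper) - sum(lower) / len(lower)

# Example: six test takers; the item is answered correctly mainly by high scorers.
print(discrimination_index([0, 0, 1, 0, 1, 1], [10, 12, 15, 20, 24, 28]))  # 1.0
```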
• CR tests, on the other hand, are designed to be
representative of specified levels of ability or domains of
content.

• The items or parts will be selected according to how adequately they represent these ability levels or content domains.

• NR test scores are interpreted with reference to the performance of other individuals on the test.

• CR test scores are interpreted as indicators of a level of ability or degree of mastery of the content domain.
• Scoring procedure

• Subjective tests are distinguished from objective tests entirely in terms of scoring procedure.

• In an objective test, the correctness of the test taker’s response is determined entirely by predetermined criteria, so that no judgment is required on the part of scorers (a small illustration follows below).

• In a subjective test, the scorer must make a judgment about the correctness of the response based on her subjective interpretation of the scoring criteria.
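As a concrete illustration of "no judgment on the part of scorers", an objective test can be scored mechanically against a predetermined answer key. This sketch is not from the slides; the key and responses are invented for illustration:

```python
# Objective scoring: correctness is fixed in advance by an answer key,
# so the score is fully determined by the key and the responses.
answer_key = {1: "B", 2: "D", 3: "A", 4: "C"}

def score_objective(responses, key=answer_key):
    """Count the responses that exactly match the predetermined key."""
    return sum(1 for item, answer in responses.items() if key.get(item) == answer)

# This test taker matches the key on items 1, 3 and 4.
print(score_objective({1: "B", 2: "A", 3: "A", 4: "C"}))  # 3
```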
• Testing method

• It is not possible to make an exhaustive list of the methods used for language tests, since their number keeps growing.

• Some commonly used methods are multiple-choice, completion, dictation and cloze.

• Such methods are not themselves single methods, however, but consist of different combinations of features: instructions, types of input, and task types.

• Test method facets such as these provide a more precise way of describing and distinguishing among different types of tests than do single category labels.
THANK YOU FOR YOUR ATTENTION
