
BACHMAN

CHAPTER 3
Presented by:
AliReza Khazaei
• Major uses of language tests:

1. As sources of information for making decisions within the context of educational programs
2. As indicators of abilities or attributes that are of interest in research on language, language acquisition, and language teaching

• In the context of research, the interpretation of test results is of both theoretical and applied interest.

• Such interpretations can assist in:

I. Understanding the nature of language proficiency
II. Drawing implications for language learning and teaching
• In educational settings, the major uses of test scores are
related to evaluation or making decisions about people or
programs.

• Evaluation comprises essentially two components:

1. Information
2. Value judgment or decision

• The information relevant to evaluation can be either:

• Qualitative: observations, performance checklists
• Quantitative: teacher ratings, class ranks
• The use of tests as a source of evaluation information
requires three assumptions:

1. We must assume that information regarding educational outcomes is essential to effective formal education, and that accountability and feedback are essential mechanisms for the effectiveness of educational programs.

2. We must assume that it is possible to improve learning and teaching through appropriate changes in the program, based on feedback.

3. We must assume that the educational outcomes of the given program are measurable.
• In addition to those assumptions, we must also consider
how much and what kind of testing is needed, as well as
the quality of information provided by our tests.

• The amount and type of testing that is done depends upon:

1. The decisions that are to be made
2. The type of information that is needed to make the correct decisions
• A second consideration in using tests is the quality of the
information that they must provide.

• In educational programs, the decisions made are generally about people and have some effect on their lives.

• It is therefore essential that the information upon which we base these decisions be as reliable and as valid as possible.

• The necessary level of reliability and validity is relative and depends on the importance of the decisions to be made.
• In general, the more important the decision to be made,
the greater the effort that should be expended in assuring
that the test is reliable and valid.

• In every decision situation, there is a certain probability of errors, and costs associated with those errors.

• The more important the decision, the greater the cost of making an error.

• By maximizing the reliability and validity of the information provided by tests, we reduce the probability of errors in the decisions we make, and hence the potential costs associated with those errors.
• At the same time, there are costs associated with assuring
reliability and validity, even in classroom testing.

• Collecting the right kinds of information to estimate reliability takes not only training, but time and effort.

• Similarly, the care that goes into designing and developing a test whose content is representative of a given course is a cost incurred in assuring content validity.

• In order to achieve very high levels of reliability and validity, we must carry out rigorous and extensive research and development, and this can be quite costly.
• In deciding how much time and effort to put into assuring reliability and validity, we need to consider the cost-effectiveness of such expenditures.

• The costs associated with assuring reliability and validity should be offset by the potential reduction in costs associated with decision errors.

• If the costs associated with decision errors are minimal, then it would be wasteful to expend a great deal of time and effort to assure high levels of reliability and validity.

• On the other hand, if the potential costs of errors are great, it would be unethical not to make every effort to achieve the highest levels of reliability and validity possible.
• Decisions are of two types:

1. Decisions about individuals: Micro-evaluation

I. Decisions about students
II. Decisions about teachers

2. Decisions about the program: Macro-evaluation

• Decisions about students include:

• Selection (entrance, readiness): The first decision that may be made about students is whether they should enter the program. If the purpose of this test is to determine whether students are ready for instruction, it may be referred to as a readiness test.

• Placement: Decisions regarding the placement of students into appropriate groups.

• Diagnosis: Information from language tests can be used for diagnosing students’ areas of strength and weakness in order to determine appropriate types and levels of teaching and learning activities.
• Progress and grading

• Formative evaluation:

I. Provides continuous feedback to both teacher and learner for making decisions regarding appropriate modifications in the instructional procedures and learning activities.
II. Identifies the learner’s areas of strength and weakness.
III. Decisions are made on the basis of qualitative feedback.
IV. Tests provide additional information on the student’s progress.

• Summative evaluation:

I. Typically takes the form of grades or marks.
II. Takes place at the end of an instructional course.
III. Grades are assigned on the basis of performance on tests in addition to classroom performance.

• The content of tests used for making decisions regarding progress and grades should be based on the syllabus rather than on a theory of language proficiency.
• Decisions about teachers include:

I. The decision to hire a given individual as a teacher will depend on a wide range of information, some of which may be obtained from tests.

II. It is important to recognize that the proficiency required of a language teacher may be both quantitatively and qualitatively different from that of the students.

III. The language tests that are used for students may not be sufficient for evaluating teachers’ communicative language ability.
• Decisions about programs

• In developing a new program, its components are evaluated in terms of appropriacy, efficiency and effectiveness.

• Learners’ performance on achievement tests can provide:

I. An indication of the achievement of the program’s objectives
II. A means of pinpointing areas of deficiency

• Formative evaluation of programs
• The focus is on providing information that will be useful for
making decisions about a program while it is under
development.
• Achievement tests based on the content of the syllabus

• Summative evaluation of programs

• The focus is on whether our program is better than other comparable programs, or whether it is the ‘best’ program currently available.
• Measures of language proficiency are obtained in addition to information on students' achievement of syllabus objectives.
• Research uses of language tests

• As operational definitions of theoretical constructs, language tests have a potentially important role in virtually all research, both basic and applied.

• It is now generally agreed that language proficiency:

I. Is not a single unitary ability
II. Consists of several distinct but related constructs in addition to a general construct of language proficiency
1. Research into the nature of language proficiency

• Focuses on identifying and empirically verifying various components of language proficiency.

• Of particular interest in this regard are models of communicative competence, which have provided the theoretical definitions for the development of tests of constructs such as:

I. Sensitivity to cohesive relationships
II. Discourse organization
III. Differences in register
2. Research into the nature of language processing

• Responses to language tests can provide a rich body of data for the identification of processing errors and their explanation, while language testing techniques can serve as elicitation procedures for collecting information on language processing.

3. Research on the nature of language acquisition

• Studies of language acquisition often require indicators of the amount of language acquired for use as criterion or dependent variables, and these indicators frequently include language tests.
4. Research into factors related to language acquisition

• Factors include:

I. Language aptitude
II. Level of proficiency in the native language

• Although language attrition (loss) is not simply the reverse of language acquisition, many of the same factors that have been examined with respect to language acquisition are also hypothesized to affect language attrition, and language tests also have a role to play in this area of research.
5. Research into effects of different instructional settings
and techniques on language acquisition

• Language tests have also provided criterion indicators of language ability for studies of classroom-centered second language acquisition, and for research into the relationship between different language teaching strategies and aspects of second language competence.
• Language tests can be classified according to five
distinctive features:

1. The purpose or use for which they are intended
2. The content upon which they are based
3. The frame of reference within which their results are to be interpreted
4. The way in which they are scored
5. The specific technique or method they employ

• Intended use
• Tests are developed with a particular primary use in mind.

• In research, language tests are used to provide information for comparing the performances of individuals.

• In educational settings, however, language tests provide information for making a wide variety of decisions, such as:

I. Admission decisions (entrance & readiness tests)
II. Identifying the appropriate level at which instruction is needed (placement & diagnostic tests)
III. Decisions about how individuals should proceed through the program or how well they are attaining the program’s objectives (achievement tests)
• Content

• Tests can be based on either a theory of language proficiency or a specific domain of content (a course syllabus).

• We can refer to theory-based tests as proficiency tests, while syllabus-based tests are generally referred to as achievement tests.

• Language aptitude tests are also distinguished according to content.

• Like language proficiency tests, language aptitude tests are theory-based, but the theory upon which they are based includes abilities that are related to the acquisition, rather than the use, of language.
• Frame of reference

• The results of language tests can be interpreted in two different ways, depending on the frame of reference adopted.

• When test scores are interpreted in relation to the performance of a particular group of individuals, we speak of a norm-referenced interpretation.

• On the other hand, if they are interpreted with respect to a specific level or domain of ability, we speak of a criterion-referenced interpretation.
• Norm-referenced tests

• Designed to enable the test user to make ‘normative’ interpretations of test results.

• Test results are interpreted with reference to the performance of a given group, or norm.

• The norm group is typically a large group of individuals who are similar to the individuals for whom the test is designed.

• In the development of NR tests, the norm group is given the test, and then the characteristics or norms of this group’s performance are used as reference points for interpreting the performance of other students who take the test.
• Typical performance characteristics used are (see the formulas below):
• Mean (x̄): the average score of the group
• Standard deviation (s): an indicator of how spread out the scores of the group are
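For reference (these standard formulas are my addition and do not appear on the slides), the mean and standard deviation of a set of n scores x_1, …, x_n are:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```

The n − 1 divisor is the usual sample-based convention; some treatments divide by n instead.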

• If the NR test is properly designed, the scores attained will typically be distributed in the shape of a normal bell-shaped curve.
• Characteristics of the normal bell-shaped curve (a short sketch verifying these percentages follows this list):

I. 50% of the scores are below the mean and 50% are above.
II. 34% of the scores are between the mean and one standard deviation above (+1s) or below (-1s) the mean.
III. 27% are between one and two standard deviations from the mean (13.5% above and 13.5% below).
IV. Only 5% of the scores will be as far away as two or more standard deviations from the mean.
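These percentages are rounded properties of the normal distribution and can be checked directly from its cumulative distribution function. The short Python sketch below is illustrative only (the function name is my own) and is not part of the slides:

```python
from math import erf, sqrt

def std_normal_cdf(z):
    """Proportion of a normal distribution lying below z standard deviations from the mean."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# I.  Half the scores lie below the mean.
below_mean = std_normal_cdf(0.0)                                     # 0.500
# II. Between the mean and one standard deviation above (or below) it.
mean_to_1sd = std_normal_cdf(1.0) - std_normal_cdf(0.0)              # ~0.341
# III. Between one and two standard deviations from the mean (both sides combined).
between_1_and_2sd = 2 * (std_normal_cdf(2.0) - std_normal_cdf(1.0))  # ~0.272
# IV. Two or more standard deviations away from the mean (both tails combined).
beyond_2sd = 2 * (1.0 - std_normal_cdf(2.0))                         # ~0.046

print(below_mean, mean_to_1sd, between_1_and_2sd, beyond_2sd)
```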
• The quintessential NR test is the standardized test, which
has three characteristics:

I. Standardized tests are based on a fixed, or standard, content, which does not vary from one form of the test to another.

II. There are standard procedures for administering and scoring the test, which do not vary from one administration of the test to the next.

III. Standardized tests have been thoroughly tried out, and through a process of empirical research and development, their characteristics are well known.
• Criterion-referenced tests

• Designed to enable the test user to interpret a test score with reference to a criterion level of ability or domain of content.

• The primary concerns in developing a CR test are that it adequately represent the criterion ability level or sample the content domain, and that it be sensitive to levels of ability or degrees of mastery of the different components of that domain.

• It is important to point out that it is this level of ability or domain of content that constitutes the criterion, and not the setting of a cut-off score for making decisions.
• Primary distinctions between NR and CR tests are:

I. In their design, construction, and development
II. In the scales they yield and the interpretation of these scales

• NR tests are designed and developed to maximize distinctions among individual test takers.

• This means that the items or parts of such tests will be selected according to how well they discriminate among individuals (one common way of quantifying this is sketched below).
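Bachman does not specify a particular statistic here, but one common classical index of how well an item discriminates is the upper-lower discrimination index: the difference between the proportion of high-scoring and low-scoring test takers who answer the item correctly. The sketch below is a hypothetical illustration; the function name and the choice of upper and lower thirds are my own assumptions, not part of the slides:

```python
def discrimination_index(item_correct, total_scores, group_fraction=1/3):
    """Upper-lower discrimination index for a single item.

    item_correct: 0/1 flags, 1 if the test taker answered the item correctly.
    total_scores: total test scores for the same test takers, in the same order.
    Returns p_upper - p_lower (range -1 to +1); higher values mean the item
    separates high scorers from low scorers more sharply.
    """
    ranked = sorted(zip(total_scores, item_correct), key=lambda pair: pair[0])
    k = max(1, int(len(ranked) * group_fraction))
    lower = [correct for _, correct in ranked[:k]]   # lowest-scoring group
    upper = [correct for _, correct in ranked[-k:]]  # highest-scoring group
    return sum(upper) / len(upper) - sum(lower) / len(lower)

# Example: six test takers; the item is answered correctly mainly by high scorers.
print(discrimination_index([0, 0, 1, 0, 1, 1], [10, 12, 15, 20, 24, 28]))  # 1.0
```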
• CR tests, on the other hand, are designed to be
representative of specified levels of ability or domains of
content.

• The items or parts will be selected according to how adequately they represent these ability levels or content domains.

• NR test scores are interpreted with reference to the performance of other individuals on the test.

• CR test scores are interpreted as indicators of a level of ability or degree of mastery of the content domain.
• Scoring procedure

• Subjective tests are distinguished from objective tests entirely in terms of scoring procedure.

• In an objective test, the correctness of the test taker’s response is determined entirely by predetermined criteria, so that no judgment is required on the part of scorers (a small illustration follows below).

• In a subjective test, the scorer must make a judgment about the correctness of the response based on her subjective interpretation of the scoring criteria.
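As a concrete illustration of "no judgment on the part of scorers", an objective test can be scored mechanically against a predetermined answer key. This sketch is not from the slides; the key and responses are invented for illustration:

```python
# Objective scoring: correctness is fixed in advance by an answer key,
# so the score is fully determined by the key and the responses.
answer_key = {1: "B", 2: "D", 3: "A", 4: "C"}

def score_objective(responses, key=answer_key):
    """Count the responses that exactly match the predetermined key."""
    return sum(1 for item, answer in responses.items() if key.get(item) == answer)

# This test taker matches the key on items 1, 3 and 4.
print(score_objective({1: "B", 2: "A", 3: "A", 4: "C"}))  # 3
```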
• Testing method

• It is not possible to make an exhaustive list of the methods used for language tests, since their number keeps growing.

• Some commonly used methods are multiple-choice, completion, dictation and cloze.

• Such methods are not themselves single methods, however, but consist of different combinations of features: instructions, types of input, and task types.

• Test method facets such as these provide a more precise way of describing and distinguishing among different types of tests than do single category labels.
THANK YOU FOR YOUR ATTENTION
