
PRINCIPLES OF SECOND LANGUAGE ASSESSMENT

Dr. VMS

Fundamental principles for evaluating and designing second language assessment include validity, reliability, practicality, equivalency, authenticity, and washback.
Language Assessment
 Validity
 Reliability
 Practicality
 Equivalency
 Authenticity
 Washback
VALIDITY

 A test is considered valid when it reflects the test-takers’ ability in a particular area and does not measure anything else.
RELIABILITY

 A test is considered reliable if similar results are obtained when it is administered on different occasions. Brown and Abeywickrama (2010, p. 27) suggested the following ways to ensure that a test is reliable:
 It is consistent in its conditions across two or more administrations.
 It gives clear directions for scoring or evaluation.
 It has uniform rubrics for scoring or evaluation.
 It lends itself to consistent application of those rubrics by the rater.
 It contains items or tasks that are unambiguous to the test-takers.
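One common way to check the "similar results on different occasions" criterion is test-retest reliability: correlating the scores from two administrations of the same test to the same learners. A minimal sketch, assuming made-up example scores (the data and the 0.97 result are illustrative, not from the source):

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x ** 0.5 * var_y ** 0.5)

# Hypothetical scores for the same five test-takers on two occasions.
first_admin  = [72, 85, 64, 90, 78]
second_admin = [70, 88, 66, 92, 75]

reliability = pearson_r(first_admin, second_admin)
print(round(reliability, 2))  # → 0.97, i.e. highly consistent across occasions
```

A coefficient close to 1 suggests the test ranks learners consistently across administrations; values far below 1 would signal unreliable conditions, scoring, or items.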
PRACTICALITY
 Practicality refers to the logistical, practical, and
administrative issues involved in the process of constructing,
administering, and rating an assessment instrument (Brown
& Abeywickrama, 2010).
 Bachman and Palmer (1996, p. 36) defined practicality as
“the relationship between the resources that will be required
in the design, development, and use of the test and the
resources that will be available for testing activities.”
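Bachman and Palmer's definition can be read as a simple comparison: a test is practical only if every resource it requires is actually available. A minimal sketch of that reading, with made-up resource figures (the resource names and numbers are assumptions, not from the source):

```python
def is_practical(required, available):
    """A test is practical if every required resource is actually available."""
    return all(available.get(resource, 0) >= amount
               for resource, amount in required.items())

# Hypothetical resource estimates for designing and administering a test.
required  = {"rater_hours": 40, "rooms": 2, "computers": 30}
available = {"rater_hours": 50, "rooms": 3, "computers": 25}

print(is_practical(required, available))  # → False (too few computers)
```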
EQUIVALENCY AND AUTHENTICITY

“An assessment has the property of equivalency if it is directly based on curriculum standards or instructional activities. Specifically, equivalency determines in what ways assessment design is influenced by teaching” (Mihai, 2010, p. 45).
WASHBACK
 Washback has also been called backwash, test impact, measurement-driven instruction, curriculum alignment, and test feedback (Brown & Hudson, 1998).
 Washback is the effect of testing and assessment on the language teaching curriculum related to it.
 The term washback refers to the influence that a test has on teaching and learning (Hughes, 2003).
LYLE E. BACHMAN (1990, p. 18)
 ‘Measurement’ in the social sciences is the process of quantifying characteristics according to explicit procedures and rules.
 Quantification: assigning numbers
 Characteristics: verbal accounts or non-verbal, visual representations
 Non-numerical categories or rankings: grades A, B, C, etc.
 Characteristics measured: attitude, aptitude, intelligence, motivation, field dependence/independence, native language
 Explicit rules and procedures: applying reliable quantification methods
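Bachman's point that measurement quantifies according to explicit rules can be illustrated with the grade example above: a fixed, stated mapping turns non-numerical categories into numbers. A minimal sketch (the particular 4-point scale is an assumed example, not from Bachman):

```python
# An explicit quantification rule: every rater applies the same mapping,
# so the same grade always yields the same number.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

def quantify(grades):
    """Apply the explicit rule to a list of letter grades."""
    return [GRADE_POINTS[g] for g in grades]

print(quantify(["A", "C", "B"]))  # → [4, 2, 3]
```

What makes this "measurement" in Bachman's sense is not the numbers themselves but the explicitness of the rule: anyone applying it to the same characteristics gets the same quantities.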
EVALUATION
 Evaluation is an activity through which human behaviors, actions, and events are identified, perceived, and understood. It is the activity that provides valid judgments and conclusions about day-to-day events.
 A test is a part of the evaluation process, but not the whole of it.
 An evaluation process is complete only when the tests are rightly interpreted, with their pros and cons taken into account.
TESTING AND EVALUATION IN
CURRICULUM DOMAIN
 Tests do not always follow evaluation procedures; in many cases the purpose of a test is specific, and it does not necessarily include evaluation procedures. Tests are mostly conducted and used for pedagogical and recruitment purposes.
GRANT HENNING (1987, p. 9)
Evaluation of the language tests should consider
 Purpose of the test

 Characteristics of the examinees

 Accuracy of measurement

 Suitability of the format and features of a test

 Developmental sample

 Availability of equivalent or equated forms

 Nature of the scoring and reporting of scores

 Procurement, and

 Political compatibility of the test.


ROLE OF EVALUATION
 Identification of course objectives. (the expected or desired
learning outcome)
 Defining the objectives in terms of learners’ terminal
behavior.
 Constructing appropriate tools or instruments for measuring the behavior.
 Applying or administering the tools/instruments and
analyzing the results to determine the degree of learners’
achievement in the instructional program.
 The above four steps are basically the same in the evaluation of instruction, the curriculum, or the program as a whole. Both measurement and evaluation require a broad variety of tools or instruments, such as tests, rating scales, inventories, checklists, questionnaires, etc.
TYPES OF EVALUATION

Evaluation is first divided into:
 Ongoing
 Terminal
It is further classified into:
 Formative
 Summative
 Brief
 Extensive
ONGOING EVALUATION
 Ongoing evaluation is meant to obtain feedback regularly, after the completion of every step of the process, viz. planning, preparation, production, and application. This enables the program to be improved at various stages while it is still in progress. This type of evaluation is the most helpful for modifying anything, if necessary, in the course of the didactic process.
TERMINAL EVALUATION
 Terminal evaluation is a type of evaluation made after the completion of the program; it is used to determine whether the program is a success or a failure. This type of evaluation is not used for improving the program. In general, evaluation has been further classified into four categories:
 Formative evaluation

 Summative evaluation

 Brief evaluation and

 Extensive evaluation

TYPES OF EVALUATION
 Formative evaluation
 Formative evaluation is a process of evaluation carried out from time to time during an instructional program, from one stage to the next. It does not provide a total impression of the quality of the instructional program, its techniques and methods, materials, or media.

 Summative evaluation
 Summative evaluation takes into consideration the periodic evaluations that have been made, in addition to a total evaluation of the program, process, or product; conclusions are arrived at keeping in view the outcomes of the periodic evaluations together with the final evaluation.
TYPES OF EVALUATION
 Brief evaluation
 A program can also be evaluated by taking into account only some of its aspects, with the evaluator giving a judgment based on the few aspects chosen for evaluation. Such a judgment is subjective and impressionistic rather than realistic, but it can be useful for roughly comparing two or more programs.

 Extensive evaluation
 Extensive evaluation involves the analysis of a program in all its main and sub-aspects. The evaluator has to rate and weigh each of them individually and consolidate a total rating, on the basis of which the value judgment is made. This is more objective and valid.
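The "rate and weigh each aspect individually, then consolidate" step of extensive evaluation can be sketched as a weighted average. The aspect names, ratings, and weights below are made-up examples, not from the source:

```python
def consolidate(ratings, weights):
    """Weighted average of per-aspect ratings (weights need not sum to 1)."""
    total_weight = sum(weights.values())
    return sum(ratings[aspect] * weights[aspect] for aspect in ratings) / total_weight

# Hypothetical per-aspect ratings on a 1-5 scale, and their relative weights.
ratings = {"materials": 4, "methods": 3, "assessment": 5}
weights = {"materials": 2, "methods": 1, "assessment": 1}

print(round(consolidate(ratings, weights), 2))  # → 4.0
```

Making the weights explicit is what gives extensive evaluation its objectivity: the same ratings and weights always produce the same consolidated judgment.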
RESEARCH TECHNIQUES
1. Data collection techniques
 Observational
 Experimental
2. Causality relationships
 Descriptive
 Analytical
3. Relationships with time
 Retrospective
 Prospective
 Cross-sectional
4. Medium of application
 Clinical
 Laboratory
 Social descriptive research
SCIENTIFIC RESEARCH

 Scientific research can be classified in several ways: according to data collection techniques, causality relationships, relationship with time, and the medium through which the research is applied.
Scientific research:
 Logical
 Expands understanding; results can be reproduced and demonstrated
 Truth and factual enquiry; scientific techniques are utilized: identification of the problem, formulation of the hypothesis, data analysis and interpretation, recommendations and conclusions
 Systematic experimentation and observation

Non-scientific research:
 Not logical
 Reproduction may result in varied results
 Acquires knowledge and truths about the world using techniques that do not follow the scientific method
 Investigation based on natural phenomena
PRE-SCIENTIFIC MOVEMENT
 Characterized by translation tests developed exclusively by classroom teachers.
 Such tests are relatively difficult to score objectively; thus, subjectivity becomes an important factor in their scoring (Brown, 1996).
THE PSYCHOMETRIC-STRUCTURALIST
PERIOD
 With the onset of the psychometric-structuralist movement in language testing, language tests became increasingly scientific, reliable, and precise. In this era, testers and psychologists, who were responsible for developing modern theories and techniques of educational measurement, tried to provide objective measures, using various statistical techniques to assure reliability and a certain kind of validity.
REFERENCE
 Bachman, L. F. (2011). Fundamental Considerations in Language Testing. Oxford: Oxford University Press.
 Bachman, L. F., & Palmer, A. S. (1996). Language Testing in Practice. Oxford: Oxford University Press.
 Brown, D., & Abeywickrama, P. (2010). Language Assessment: Principles and Classroom Practices (2nd ed.). Pearson Education.
 Harris, M. (1997). Self-assessment of language learning in formal settings. ELT Journal, 51(1), 14. Oxford University Press.
 Iseni, A. (2011). Assessment, testing and correcting students' errors and mistakes. Language Testing in Asia, 1(3).
 Miller, M. J. Reliability and validity. Graduate Research Methods, Western International University.
 Rahimi, M., Momeni, G., & Nejati, R. (2012). The impact of lexically based language teaching on students' achievement in learning English as a foreign language. Elsevier Journal, 31.
 Underhill, N. (1987). Testing Spoken Language: A Handbook of Oral Testing Techniques (1st ed., pp. 21-86). Cambridge University Press.
 Wolfe, K. (2004). Student assessment: testing and grading. Tips for Teaching Assistants and New Instructors. Journal of Teaching in Travel & Tourism, 4(2), 80.
Thank You
