
PRINCIPLES OF SECOND LANGUAGE ASSESSMENT

Dr. VMS

Fundamental principles for evaluating and designing second language assessment include validity, reliability, practicality, equivalency, authenticity, and washback.
Language Assessment
 Validity
 Reliability
 Practicality
 Equivalency
 Authenticity
 Washback
VALIDITY

 A test is considered valid when it reflects the test-takers’ ability in a particular area and does not measure anything else.
RELIABILITY

 A test is considered reliable if similar results are obtained when it is administered on different occasions. Brown and Abeywickrama (2010, p. 27) suggested the following ways to ensure that a test is reliable:
 It is consistent in its conditions across two or more administrations.
 It gives clear directions for scoring or evaluation.
 It has uniform rubrics for scoring or evaluation.
 It lends itself to consistent application of those rubrics by the rater.
 It contains items or tasks that are unambiguous to the test-takers.
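One common way to check the "similar results on different occasions" criterion is test-retest reliability: correlating the scores from two administrations of the same test to the same learners. A minimal sketch, assuming made-up example scores (the data and the 0.97 result are illustrative, not from the source):

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x ** 0.5 * var_y ** 0.5)

# Hypothetical scores for the same five test-takers on two occasions.
first_admin  = [72, 85, 64, 90, 78]
second_admin = [70, 88, 66, 92, 75]

reliability = pearson_r(first_admin, second_admin)
print(round(reliability, 2))  # → 0.97, i.e. highly consistent across occasions
```

A coefficient close to 1 suggests the test ranks learners consistently across administrations; values far below 1 would signal unreliable conditions, scoring, or items.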
PRACTICALITY
 Practicality refers to the logistical, practical, and
administrative issues involved in the process of constructing,
administering, and rating an assessment instrument (Brown
& Abeywickrama, 2010).
 Bachman and Palmer (1996, p. 36) defined practicality as
“the relationship between the resources that will be required
in the design, development, and use of the test and the
resources that will be available for testing activities.”
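Bachman and Palmer's definition can be read as a simple comparison: a test is practical only if every resource it requires is actually available. A minimal sketch of that reading, with made-up resource figures (the resource names and numbers are assumptions, not from the source):

```python
def is_practical(required, available):
    """A test is practical if every required resource is actually available."""
    return all(available.get(resource, 0) >= amount
               for resource, amount in required.items())

# Hypothetical resource estimates for designing and administering a test.
required  = {"rater_hours": 40, "rooms": 2, "computers": 30}
available = {"rater_hours": 50, "rooms": 3, "computers": 25}

print(is_practical(required, available))  # → False (too few computers)
```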
EQUIVALENCY AND AUTHENTICITY

“An assessment has the property of equivalency if it is directly based on curriculum standards or instructional activities. Specifically, equivalency determines in what ways assessment design is influenced by teaching” (Mihai, 2010, p. 45).
WASHBACK
 Washback has also been called backwash, test impact, measurement-driven instruction, curriculum alignment, and test feedback (Brown & Hudson, 1998).
 Washback is the effect of testing and assessment on the language teaching curriculum related to it.
 The term washback refers to the influence that a test has on teaching and learning (Hughes, 2003).
LYLE E. BACHMAN (1990, p. 18)
 ‘Measurement’ in the social sciences is the process of quantifying characteristics according to explicit procedures and rules.
 Quantification: assigning numbers
 Characteristics: verbal accounts or non-verbal, visual representations
 Non-numerical categories or rankings: grades A, B, C, etc.
 Characteristics measured: attitude, aptitude, intelligence, motivation, field dependence/independence, native language
 Explicit rules and procedures: applying reliable quantification methods
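Bachman's point that measurement quantifies according to explicit rules can be illustrated with the grade example above: a fixed, stated mapping turns non-numerical categories into numbers. A minimal sketch (the particular 4-point scale is an assumed example, not from Bachman):

```python
# An explicit quantification rule: every rater applies the same mapping,
# so the same grade always yields the same number.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

def quantify(grades):
    """Apply the explicit rule to a list of letter grades."""
    return [GRADE_POINTS[g] for g in grades]

print(quantify(["A", "C", "B"]))  # → [4, 2, 3]
```

What makes this "measurement" in Bachman's sense is not the numbers themselves but the explicitness of the rule: anyone applying it to the same characteristics gets the same quantities.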
EVALUATION
 Evaluation is an activity through which human behaviors, actions, and events are identified, perceived, and understood. It is the activity that provides valid judgments and conclusions about day-to-day events.
 A test is a part of the evaluation process, but not the whole of it.
 An evaluation process is complete only when the tests are rightly interpreted, with their pros and cons taken into account.
TESTING AND EVALUATION IN
CURRICULUM DOMAIN
 Tests do not always follow evaluation procedures; in many cases the purpose of a test is specific, and it does not necessarily include evaluation procedures. Tests are mostly conducted and used for pedagogical and recruitment purposes.
GRANT HENNING (1987, p. 9)
Evaluation of the language tests should consider
 Purpose of the test

 Characteristics of the examinees

 Accuracy of measurement

 Suitability of the format and features of a test

 Developmental sample

 Availability of equivalent or equated forms

 Nature of the scoring and reporting of scores

 Procurement, and

 Political compatibility of the test.


ROLE OF EVALUATION
 Identification of course objectives. (the expected or desired
learning outcome)
 Defining the objectives in terms of learners’ terminal
behavior.
 Constructing appropriate tools or instruments for measuring the behavior.
 Applying or administering the tools/instruments and
analyzing the results to determine the degree of learners’
achievement in the instructional program.
 The above four steps are basically the same in the evaluation of instruction, the curriculum, or the program as a whole. Both measurement and evaluation require a broad variety of tools or instruments, such as tests, rating scales, inventories, checklists, questionnaires, etc.
TYPES OF EVALUATION

Evaluation is first divided into:
 Ongoing
 Terminal
It is further classified into:
 Formative
 Summative
 Brief
 Extensive
ONGOING EVALUATION
 Ongoing evaluation is meant to obtain feedback regularly, after the completion of every step of the process, viz. planning, preparation, production, and application. This enables the program to be improved at various stages while it is still in progress. This type of evaluation is the most helpful for modifying anything, if necessary, in the course of the didactic process.
TERMINAL EVALUATION
 Terminal evaluation is a type of evaluation made after the completion of the program; it is used to determine whether the program is a success or a failure. This type of evaluation is not used for improving the program. In general, evaluation has been further classified into four categories:
 Formative evaluation

 Summative evaluation

 Brief evaluation and

 Extensive evaluation

TYPES OF EVALUATION
 Formative evaluation
 Formative evaluation is a process of evaluation carried out from time to time during an instructional program, from one stage to the next. It does not provide a total impression of the quality of the instructional program, its techniques and methods, materials, or media.

 Summative evaluation
 Summative evaluation takes into consideration the periodic evaluations that have been made, in addition to a total evaluation of the program, process, or product; conclusions are arrived at keeping in view the outcomes of the periodic evaluations together with the final evaluation.
TYPES OF EVALUATION
 Brief evaluation
 A program can also be evaluated by taking into account only some of its aspects, with the evaluator giving a judgment based on the few aspects chosen for evaluation. Such a judgment is subjective and impressionistic rather than realistic, but it can be useful for roughly comparing two or more programs.

 Extensive evaluation
 Extensive evaluation involves the analysis of a program in all its main and sub-aspects. The evaluator has to rate and weigh each of them individually and consolidate a total rating, on the basis of which the value judgment is made. This is more objective and valid.
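The "rate and weigh each aspect individually, then consolidate" step of extensive evaluation can be sketched as a weighted average. The aspect names, ratings, and weights below are made-up examples, not from the source:

```python
def consolidate(ratings, weights):
    """Weighted average of per-aspect ratings (weights need not sum to 1)."""
    total_weight = sum(weights.values())
    return sum(ratings[aspect] * weights[aspect] for aspect in ratings) / total_weight

# Hypothetical per-aspect ratings on a 1-5 scale, and their relative weights.
ratings = {"materials": 4, "methods": 3, "assessment": 5}
weights = {"materials": 2, "methods": 1, "assessment": 1}

print(round(consolidate(ratings, weights), 2))  # → 4.0
```

Making the weights explicit is what gives extensive evaluation its objectivity: the same ratings and weights always produce the same consolidated judgment.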
RESEARCH TECHNIQUES
1. Data collection techniques
 Observational
 Experimental
2. Causality relationships
 Descriptive
 Analytical
3. Relationships with time
 Retrospective
 Prospective
 Cross-sectional
4. Medium of application
 Clinical
 Laboratory
 Social descriptive research
SCIENTIFIC RESEARCH

 Scientific research can be classified in several ways: according to data collection techniques, causality relationships, relationship with time, and the medium through which the research is applied.
Scientific research:
 Logical
 Expands understanding; results can be reproduced and demonstrated
 Truth and factual enquiry; scientific techniques are utilized: identification of the problem, formulation of the hypothesis, data analysis and interpretation, recommendations and conclusions
 Systematic experimentation and observation

Non-scientific research:
 Not logical
 Reproduction may result in varied results
 Acquires knowledge and truths about the world using techniques that do not follow the scientific method
 Investigation based on natural phenomena
PRE-SCIENTIFIC MOVEMENT
 Characterized by translation tests developed exclusively by classroom teachers.
 Such tests are relatively difficult to score objectively; thus, subjectivity becomes an important factor in their scoring (Brown, 1996).
THE PSYCHOMETRIC-STRUCTURALIST
PERIOD
 With the onset of the psychometric-structuralist movement in language testing, language tests became increasingly scientific, reliable, and precise. In this era, testers and psychologists, who were responsible for developing modern theories and techniques of educational measurement, tried to provide objective measures, using various statistical techniques to assure reliability and a certain kind of validity.
REFERENCE
 Bachman, L. F. (2011). Fundamental Considerations in Language Testing. Oxford: Oxford University Press.
 Bachman, L. F., & Palmer, A. S. (1996). Language Testing in Practice. Oxford: Oxford University Press.
 Brown, D., & Abeywickrama, P. (2010). Language Assessment: Principles and Classroom Practices (2nd ed.). Pearson Education.
 Harris, M. (1997). Self-assessment of language learning in formal settings. ELT Journal, 51(1), 14. Oxford University Press.
 Iseni, A. (2011). Assessment, testing and correcting students' errors and mistakes. Language Testing in Asia, 1(3).
 Miller, M. J. Reliability and validity. Graduate Research Methods, Western International University.
 Rahimi, M., Momeni, G., & Nejati, R. (2012). The impact of lexically based language teaching on students' achievement in learning English as a foreign language. Elsevier Journal, 31.
 Underhill, N. (1987). Testing Spoken Language: A Handbook of Oral Testing Techniques (1st ed., pp. 21-86). Cambridge University Press.
 Wolfe, K. (2004). Student assessment: testing and grading. Tips for Teaching Assistants and New Instructors. Journal of Teaching in Travel & Tourism, 4(2), 80.
Thank You
