Professional Documents
Culture Documents
Assessment
❖ ‘Assessment’ is an umbrella term encompassing many measurement
instruments. Some assessment information is gathered throughout a student’s time in
a course with the aim of adjusting instruction, and some at the end to measure
student learning.
❖ Assessment is the process of gathering and evaluating information on what students
know, understand, and can do in order to make an informed decision about next
steps in the educational process.
❖ Assessment refers to a variety of tools (e.g., tests, quizzes, student assignments,
teacher observations) for collecting information on a learner’s language ability or
achievement.
Kinds of assessment
❖ Diagnostic assessment
❖ Formative assessment
❖ Summative assessment
Test
❖ A test in simple terms is a method of measuring a person’s ability, knowledge or
performance in a given domain.
❖ A test is an instrument used to measure student learning at a particular
point in time (e.g., multiple-choice tests, quizzes, cloze tests).
❖ A test can be informal or structured.
Evaluation
❖ Evaluation is the culminating act of interpreting the information gathered for the
purpose of making decisions or judgements about students’ learning and needs, often
at a reporting time.
❖ Evaluation is also concerned with the overall language program and not just with
what individual students have learned. It could include interviews, examination of
curriculum materials and a variety of other information sources to determine how
well a program is operating and which of its goals are being met.
Assessment vs. Evaluation
(various sources, but especially Dan Apple 1998)
Assessment: absolute (strive for ideal outcomes); cooperative (learn from each other)
Evaluation: comparative (divide better from worse); competitive (beat each other out)
Criteria for designing and selecting tests
Bachman and Palmer (1996) describe these concepts as the qualities of test usefulness.
Validity: whether a test measures what it is supposed to measure (the desired criterion)
and not other factors. Does it reflect the ability/competency we want to test?
e.g. an essay that requires specialist knowledge of history or science (not shared by all
students) is not valid.
✓ Content validity:
- Does the test cover all aspects of what it claims to measure? (e.g., for control of
grammar, is a sufficient sample of grammatical structures represented?)
- If the course contains ten objectives and a test covers only two, it lacks content
validity (e.g., the case of testing speaking on paper).
✓ Face validity:
- Do the test items look like realistic, authentic uses of what is being measured? (mirrors
the competencies expected)
✓ Construct validity:
- Does the test align with the theoretical background and the course objectives?
- Does it actually test the competence we want to test? For example, writing
(direct and indirect tests) or communication (pronunciation/ fluency/ grammar/
vocabulary use/ socio-linguistic appropriacy).
If you test only one component, your test lacks construct validity.
✓ Concurrent validity:
- If you assess mastery of a grammar point (e.g., in communicative use), later test
it in another behaviour, and the scores agree, then you have concurrent
validity.
Reliability deals with the consistency of measurement/ trustworthiness of the test results.
✓ Test reliability?
Standard: answers to test questions can consistently be trusted to represent what students
know.
✓ Test score reliability?
Requirement: Adequate specification of an analytical scoring instrument with clear and
measurable rubrics.
✓ Student-related reliability?
✓ Test administration reliability?
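One common way to quantify the internal consistency mentioned above is Cronbach’s alpha (the statistic is an addition here, not named in these notes); a minimal sketch in Python, assuming scores are organised per test item:

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals).

    `item_scores` is a list of per-item score lists, one inner list per test
    item, aligned so that position j in every inner list is the same student.
    """
    k = len(item_scores)                               # number of items
    totals = [sum(col) for col in zip(*item_scores)]   # each student's total score
    item_var = sum(pvariance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_var / pvariance(totals))
```

Values near 1 indicate that the items measure the same ability consistently; very low or negative values signal an unreliable test.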
Washback means the effect of testing on teaching and learning. Negative effects include
teaching only to the test and memorizing possible test questions. Positive effects, if the test is
valid, include focusing teaching upon what is important.
➢ Tips: praise strengths/ provide constructive criticism.
Practicality is a matter of the extent to which the demands of the particular test can be met
within the limits of existing resources including time, staff and test administration/ scoring
and interpretation.
Approaches of testing
➢ Direct testing / Indirect testing
➢ Discrete point / Integrative
Indirect: asking students to write the correct tense of a verb/ Direct: writing a
composition
Cloze procedure: indirect, but integrative (vocabulary, grammar, collocations, fixed phrases,
reading comprehension…)
➢ Norm-referenced (scores are interpreted against a mean/ median or percentile
rank)/ criterion-referenced (give feedback on specific course or lesson
objectives/ deals with the achievement of every student)
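The two score interpretations above can be sketched as follows; the function names and the cutoff value are illustrative assumptions, not part of any standard:

```python
def percentile_rank(score, cohort):
    """Norm-referenced: where a score sits relative to the cohort
    (percentage of the cohort scoring strictly below it)."""
    return 100 * sum(1 for s in cohort if s < score) / len(cohort)

def objective_met(score, cutoff):
    """Criterion-referenced: did this student reach the mastery
    threshold for a specific objective, regardless of the cohort?"""
    return score >= cutoff
```

For example, `percentile_rank(80, [55, 60, 70, 80, 90])` compares one student to the group, while `objective_met(80, cutoff=70)` reports mastery for every student individually.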
Before opting for any technique, we should ask ourselves these questions:
➢ Will it elicit the behaviour which is the target tested ability?
➢ Can it be reliably scored?
➢ Is it practical in terms of time and energy?
➢ Will it have a positive washback effect?
Assessment techniques
Selected-response assessments:
➢ Strengths:
• easy to score/ fast/relatively objective
➢ Weaknesses:
• Non-productive
True-false
Strengths: students choose from two alternatives/ easy to write/ can cover many levels of
difficulty/ easy to score
Weaknesses: sometimes tricky/ guessing (a 50% chance of being right)/ the score might be
unreliable
Tips:
➢ Wording in a true/false item or a multiple choice item should be
different from the wording of the passage being tested and, if possible,
should be based on a potential misunderstanding of the meaning of
the text.
➢ avoid words like "always" and "never." Items with these words are
almost always false, and experienced testees know that.
Matching
Strengths: compact in space/ objective/ easy to design/ easy to think of examples/
guessing is reduced if there are more options than items
Weaknesses: tests passive knowledge/ only for grammar, vocabulary and pronunciation/ not
good for testing productive skills
Transformations
Strengths: good for testing some structures/ easy to write and administer/ tackles one
grammatical objective/ allows recognition of a connection between « grammar » and
« meaning »
Weaknesses: very artificial/ there can sometimes be more than one correct transformation.
Constructed-response assessments:
❖ Fill-in
❖ Short answers
❖ Performance
❖ Dictation
❖ etc.
Strengths: Productive
Limitations: Subjective
Questions
Strengths: Good for checking comprehension/ useful to test the student’s ability to analyse
Limitations: not easy to score/ takes time to correct
Cloze tests
Strengths:
- Economical ways of measuring overall ability
- may satisfy content validity (e.g., deleting every ninth word gives broad sampling)
- Easy to construct, administer and score
Weaknesses:
- Might not sample all language forms/ tests reading comprehension but not other skills
(speaking, writing)
- Hard to guess all the words, even for native speakers, unless you provide word lists.
- Variety is necessary (do not always delete every ninth word; select target words, and
test reading and oral ability with recordings).
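The fixed-ratio deletion procedure behind a cloze test is mechanical enough to script; `make_cloze` and the blank marker below are illustrative names, not a standard tool:

```python
def make_cloze(text, n=9, blank="____"):
    """Build a fixed-ratio cloze: delete every n-th word and keep an answer key."""
    gapped, key = [], []
    for i, word in enumerate(text.split(), start=1):
        if i % n == 0:          # every n-th word becomes a gap
            key.append(word)
            gapped.append(blank)
        else:
            gapped.append(word)
    return " ".join(gapped), key
```

The default `n=9` matches the ninth-word deletion mentioned above; smaller values of `n` make the test denser and harder, and a teacher-selected word list can replace the fixed ratio when targeting specific structures.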
Dictation:
Strengths:
- Test overall ability including listening
- Easy to create and administer.
- Scoring is easier with partial dictation
Weaknesses:
- Full dictation is hard to score.
Personal-response assessments: (Alternative assessment)
❖ Conference
❖ Portfolio
❖ Self and peer-assessment
❖ Checklists
❖ Journals
❖ Logs
Assessing listening
Listening and speaking are generally tested together in oral interactions; however, there are
cases when listening stands by itself (lectures, radio, railway station announcements, etc.)
✓ Content :(Macro/ micro skills)
✓ Test type or format: (monologue, dialogues, lectures, directions, instructions,
announcements, etc.)
✓ Criterial level of performance: a set of responses may be required
✓ Materials used: should be based on genuine recordings or transcripts
Assessing speaking
Rule: set tasks that are representative of the abilities we expect students to be able to
perform, to guarantee a certain amount of validity and reliability.
• Content
• Text types or format (interview/ Interaction with peers, presentations)
• Criterial levels of performance: (accuracy/ appropriacy/ range/ flexibility/ size, or
other)
Mostly used facilitation techniques: Questions and requests for information/ pictures/ Role
play/ Discussion/oral presentation/ information-gap…
Test formats:
• Interview: one candidate (with one or more examiners)
• Oral presentation: one candidate (speaking to a real or imagined audience)
• Interactive task: at least two candidates
• Group discussion: four to six candidates
Possible tasks:
• Describing something (a picture, a place, a person or an event)
• Telling a story (based on a single picture or a series of pictures, or invented)
• Comparing things (real objects, photographs, artwork or abstract concepts)
• Giving some personal information (talking about family, a hobby, the hometown,
or an experience such as a holiday)
Features to be tested:
• Grammar range and accuracy
• Vocabulary range and accuracy
• Task fulfilment
• Fluency
• Pronunciation/ stress
• Body language
• Etc. (taught material)
Rating: use of rubrics
✓ http://rubistar.4teachers.org/index.php
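A simple way to turn the features listed above into a single mark is a weighted analytic rubric; the weights below are assumptions for illustration, not a prescribed scheme:

```python
# Hypothetical analytic rubric over the speaking features listed above;
# the criteria weights are illustrative assumptions and sum to 1.0.
WEIGHTS = {
    "grammar": 0.25,
    "vocabulary": 0.25,
    "task fulfilment": 0.20,
    "fluency": 0.15,
    "pronunciation": 0.15,
}

def weighted_band(bands, weights=WEIGHTS):
    """Combine per-criterion band scores (e.g. 0-5) into one weighted mark."""
    return sum(weights[c] * bands[c] for c in weights)
```

Keeping the per-criterion bands alongside the combined mark preserves the diagnostic feedback that makes an analytic rubric more reliable to score than a single holistic impression.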