
Assessment and Evaluation

Reasons for testing:


❖ Guide and improve learning and instruction.
❖ Help learners retain more content, reduce test anxiety, and aid their own monitoring
of their progress.
❖ Test results are useful tools for measuring the effectiveness of instruction and
learning.
❖ Test results are important devices for sharing information with boards of education,
parents, and the general public.

The relationship between teaching and testing


❖ Testing and teaching are not separate entities. Testing is an integral part of teaching.
❖ Testing is a useful tool at the beginning of the year to diagnose what individual
learners know
❖ Testing can aid in decisions about grouping students in class
❖ Testing can help the teacher determine the pace of classroom instruction
❖ Tests help the teacher devise more differentiated activities

The successful exam teacher is likely someone who:


❖ thinks that exams are useful and important.
❖ enjoys the discipline of teaching towards an exam, and manages his/her class time
effectively.
❖ knows and understands the exam that s/he is teaching (including the necessary
administrative details).
❖ listens to students’ concerns and anxieties.
❖ gives honest and direct feedback on student performance.
❖ motivates students and fosters autonomous learning.

What do students need to know about exams?


❖ Students need an introduction to the format and content of the exam.
❖ They also need early, clear information on the date(s) when the exam will be held.
❖ They need to see some sample questions showing which skills are tested and how.
❖ It is important to explain to students, in the first class, how the course is paced, how
learning will be monitored, and when practice tests will be used.
Students should also build exam skills:
❖ spend some regular out-of-class time working on their English.
❖ review what they did in class and make a note to ask about anything that was not
clear.
❖ learn to use reference books intelligently.
❖ find a time and place to study - where they can concentrate and not be distracted.
❖ organize their paperwork so that they can review their work easily and get a sense
of their own progress.
❖ monitor their own use of language, and identify and correct their own mistakes.

Assessment
❖ The term ‘Assessment’ is an umbrella term encompassing many measurement
instruments. Some assessment information is gathered throughout a student’s time in
a course with the aim of adjusting instruction and some at the end to measure
student learning.
❖ Assessment is the process of gathering and evaluating information on what students
know, understand, and can do in order to make an informed decision about next
steps in the educational process.
❖ Assessment refers to a variety of tools (e.g., tests, quizzes, student assignments,
teacher observations) for collecting information on a learner’s language ability or
achievement.

Kinds of assessment
❖ Diagnostic assessment
❖ Formative assessment
❖ Summative assessment

Test
❖ A test in simple terms is a method of measuring a person’s ability, knowledge or
performance in a given domain.
❖ Test refers to an instrument that is used to measure student learning at a particular
point in time (e.g., multiple-choice tests, quizzes, cloze tests).
❖ A test can be informal or structured.

Evaluation
❖ Evaluation is the culminating act of interpreting the information gathered for the
purpose of making decisions or judgements about students’ learning and needs, often
at a reporting time.
❖ Evaluation is also concerned with the overall language program and not just with
what individual students have learned. It could include interviews, examination of
curriculum materials and a variety of other information sources to determine how
well a program is operating and which of its goals are being met.

Assessment vs. Evaluation
(various sources, but especially Dan Apple 1998)

Assessment | Evaluation
Formative: Ongoing to Improve Learning | Summative: Final to Gauge Quality
Process-Oriented: How Learning Is Going | Product-Oriented: What’s Been Learned
Reflective: Internally Defined Criteria/Goals | Prescriptive: Externally Imposed Standards
Diagnostic: Identify Areas for Improvement | Judgmental: Arrive at an Overall Grade/Score
Flexible: Adjust As Problems Are Clarified | Fixed: To Reward Success, Punish Failure
Absolute: Strive for Ideal Outcomes | Comparative: Divide Better from Worse
Cooperative: Learn from Each Other | Competitive: Beat Each Other Out

Criteria for designing and selecting tests
Bachman and Palmer (1996) describe these concepts as the qualities of test usefulness.
Validity: whether a test measures what it is supposed to measure (the desired criterion)
and not other factors. Does it reflect the ability or competency we want to test?
e.g. an essay that requires specialist knowledge of history or science (if not shared by all
students) is not a valid measure of language ability.

✓ Content validity:
- Does the test cover all aspects of what it claims to measure? (For control of grammar, for
example, is a sufficient sample of grammatical structures represented?)
- If the course contains ten objectives and the test covers only two, it lacks content
validity (as does testing speaking on paper).
✓ Face validity:
- Do the test items look like realistic, authentic uses of what is being measured? (Do they
mirror the competencies expected?)
✓ Construct validity:
- Does the test accord with the theoretical background and the course objectives?
- Can it be shown to test the competence we want to test? For example, writing
(direct and indirect tests) or communication (pronunciation, fluency, grammar,
vocabulary use, sociolinguistic appropriacy).
If you test only one component, your test lacks construct validity.
✓ Concurrent validity:
- If you assess mastery of a grammar point in communicative use, for example, then test it
later through another kind of behaviour and the scores agree, you have concurrent validity.

Reliability deals with the consistency of measurement and the trustworthiness of the test results.
✓ Test reliability?
Standard: Answers to test questions can consistently be trusted to represent what students
know.
✓ Test score reliability?
Requirement: Adequate specification of an analytical scoring instrument with clear and
measurable rubrics (a quick rater-consistency check is sketched below).
✓ Student-related reliability? (temporary factors such as anxiety, fatigue, or illness that affect a student’s performance)
✓ Test administration reliability? (conditions of administration such as noise, lighting, or unclear photocopies)
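One practical way to check test score reliability is to have two raters score the same set of papers with the analytical instrument and see whether their scores agree. The sketch below is a minimal illustration with invented scores; it uses only the Python standard library, and a simple correlation is used here as a rough proxy rather than a formal reliability coefficient.

```python
from statistics import correlation  # available in Python 3.10+

# Invented scores given by two raters to the same ten essays.
rater_a = [12, 15, 9, 18, 14, 11, 16, 13, 17, 10]
rater_b = [13, 14, 10, 17, 15, 11, 15, 12, 18, 9]

# A high correlation suggests the scoring instrument is being applied
# consistently; a low one points to unclear rubrics or a need for rater training.
print(f"Inter-rater correlation: {correlation(rater_a, rater_b):.2f}")
```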
Washback means the effect of testing on teaching and learning. Negative effects include
teaching only to the test and memorizing possible test questions. Positive effects, if the test is
valid, include focusing teaching upon what is important.
➢ Tips: praise what is done well and provide constructive criticism.

Practicality is the extent to which the demands of a particular test can be met within the
limits of existing resources, including time, staff, and the administration, scoring, and
interpretation of the test.

Authenticity: the degree to which a test task approximates « real-world tasks ».

Approaches to testing
➢ Direct testing / Indirect testing
➢ Discrete point / Integrative
Direct: asking students to write a composition / Indirect: asking students to write the right
tense of a verb
Cloze procedure: indirect, but integrative (vocabulary, grammar, collocations, fixed phrases,
reading comprehension…)
➢ Norm-referenced (scores are interpreted against a mean, median, or percentile
rank) / criterion-referenced (gives feedback on specific course or lesson
objectives/ concerned with the achievement of each individual student)

Kinds of tests and testing


➢ Proficiency tests
➢ Achievement tests
▪ Final achievement tests
- syllabus-based approach
- objective-based approach
➢ Progress tests
➢ Diagnostic tests
➢ Placement tests
➢ Aptitude tests

Test construction: questions


➢ WHY: Test purpose
➢ WHAT: Test specifications (length of test/ weight to be given to each
objective/ weight to be given to each item/ estimated number of items/ macro- and
micro-skills; see the allocation sketch below)
➢ HOW: Task types
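To make the WHAT step concrete, the sketch below (with hypothetical objectives and weights, not taken from any particular syllabus) shows how the weight given to each objective can be turned into an estimated number of items.

```python
def allocate_items(objective_weights, total_items):
    """Distribute a planned number of items across objectives in proportion
    to their weights. Rounding may leave the total an item or two off target,
    so the final counts should be adjusted by hand."""
    total_weight = sum(objective_weights.values())
    return {objective: round(total_items * weight / total_weight)
            for objective, weight in objective_weights.items()}

# e.g. a 40-item test in which grammar carries twice the weight of the other areas
print(allocate_items({"grammar": 2, "vocabulary": 1, "reading": 1}, total_items=40))
```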

Before opting for any technique, we should ask ourselves these questions:
➢ Will it elicit the behaviour that reflects the target ability being tested?
➢ Can it be reliably scored?
➢ Is it practical in terms of time and energy?
➢ Will it have a positive washback effect?

Test Development Process


❖ Set the objectives (make a list of what students should know and be able to do before
writing the objectives)
❖ Write an Outline (Test specifications)
❖ Select and construct items
❖ Write instructions
❖ Construct answer keys
❖ Administer the test
❖ Revise it

Assessment techniques
Selected-response assessments:
➢ Strengths:
• easy to score/ fast/relatively objective
➢ Weaknesses:
• Non-productive
True-false
Strengths: students choose from two alternatives/ easy to write/ can cover many levels of
difficulty/ easy to score
Weaknesses: sometimes tricky/ encourages guessing/ the score might be unreliable
Tips:
➢ Wording in a true/false item or a multiple choice item should be
different from the wording of the passage being tested and, if possible,
should be based on a potential misunderstanding of the meaning of
the text.
➢ Avoid words like "always" and "never." Items with these words are
almost always false, and experienced testees know that.

Multiple-choice (a stem, the correct option, and distractors)


Strengths:
• highly reliable test scores.
• scoring efficiency and accuracy.
• objective measurement of student achievement or ability.
• a wide sampling of content or objectives.
• a reduced guessing factor when compared to true-false items.
• different response alternatives which can provide diagnostic feedback.
Weaknesses:
• are difficult and time consuming to construct.
• test only recognition knowledge
• problem of construct validity
• a certain degree of guessing
• washback may be harmful
• cheating is facilitated (two versions can help avoid this)
• Not authentic

Matching
Strengths: compact in space/ objective/ easy to design/ easy to think of examples/ guessing
is reduced if there are more options than items to match
Weaknesses: tests passive knowledge/ only for grammar, vocabulary, and pronunciation/ not
good for testing the skills

Transformations
Strengths: good for testing some structures/ easy to write and administer/ tackles one
grammatical objective at a time/ allows recognition of a connection between « grammar » and
« meaning »
Weaknesses: very artificial/ there can sometimes be more than one correct transformation.

Constructed-response assessments:
❖ Fill-in
❖ Short answers
❖ Performance
❖ Dictation
❖ etc
Strengths: Productive
Limitations: Subjective

Questions
Strengths: Good for checking comprehension/ useful to test the student’s ability to analyse
Limitations: not easy to score/ takes time to correct

Cloze tests
Strengths:
- An economical way of measuring overall ability
- May satisfy content validity (e.g., deleting every ninth word samples the text fairly
evenly; see the sketch after this list)
- Easy to construct, administer, and score
Weaknesses:
- Might not predict all language forms/ tests reading comprehension but not the other skills
(speaking, writing)
- Hard to supply all the missing words, even for native speakers, unless lists of options
are provided.
- Variety is necessary (e.g., deleting selected target words rather than every ninth word, or
adapting the procedure to reading and to oral ability with recordings).
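As a concrete illustration of the fixed-ratio procedure (every ninth word deleted), here is a minimal sketch; the sample passage, the blank marker, and the decision not to keep a lead-in sentence intact are simplifications for illustration only.

```python
def make_cloze(text, n=9, blank="______"):
    """Delete every nth word and return the gapped passage plus an answer key.
    Real cloze tests usually leave the first sentence intact and may delete
    selected target words instead of using a fixed interval."""
    words = text.split()
    key = {}
    for i in range(n - 1, len(words), n):   # every nth word (0-indexed)
        key[i + 1] = words[i]               # record the deleted word by position
        words[i] = blank
    return " ".join(words), key

passage = ("Testing and teaching are not separate entities. Testing is an integral "
           "part of teaching, and it can help the teacher determine the pace of "
           "classroom instruction and devise more differentiated activities.")
gapped, answers = make_cloze(passage)
print(gapped)
print(answers)
```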

C-tests (the second half of every second word is deleted; a construction sketch follows below)


Strengths:
- Test a wide range of abilities (different short C-tests)
- Test different parts of speech
Weaknesses:
- Harder to read than cloze tests
- The passage might contain the answers
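The deletion rule itself is mechanical, so a short sketch may help; this version applies only the core rule (second half of every second word removed) and, for brevity, does not keep the first sentence intact or skip one-letter words as published C-tests usually do.

```python
def make_c_test(sentence):
    """Remove the second half of every second word, keeping the first half
    (rounded up), and return the mutilated text plus an answer key."""
    words = sentence.split()
    key = {}
    for i, word in enumerate(words):
        if i % 2 == 1 and len(word) > 1:         # every second word
            keep = (len(word) + 1) // 2          # keep the first half, rounding up
            key[i + 1] = word
            words[i] = word[:keep] + "_" * (len(word) - keep)
    return " ".join(words), key

damaged, answers = make_c_test(
    "Assessment is the process of gathering information on what students know.")
print(damaged)
print(answers)
```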

Dictation:
Strengths:
- Test overall ability including listening
- Easy to create and administer.
- Scoring is easier with partial dictation
Weaknesses:
- hard to score
Personal-response assessments: (Alternative assessment)
❖ Conference
❖ Portfolio
❖ Self and peer-assessment
❖ Checklists
❖ Journals
❖ Logs
Assessing listening
Listening and speaking are generally tested together in oral interactions; however, there are
cases when listening stands by itself (lectures, radio, railway station announcements, etc.)
✓ Content: (macro/micro skills)
✓ Test type or format: (monologue, dialogues, lectures, directions, instructions,
announcements, etc.)
✓ Criterial level of performance: A set of responses may be required
✓ Materials used: should be based on genuine recordings or transcripts.

Types of Listening Performances


✓ Listening for perception of the components (phonemes, words, intonation, discourse
markers, etc.) of a larger stretch of language.
✓ Listening to a relatively short stretch of language (a greeting, question, command,
comprehension check, etc.) in order to make an equally short response.
✓ Processing stretches of discourse such as short monologues for several minutes in
order to pick up specific information. Students, for example, can listen for names,
numbers, a grammatical category, directions (in a map exercise), or certain facts and
events.
✓ Listening for a top-down understanding of spoken language: Listening for the gist,
for the main idea, and making inferences…
Possible techniques:
➢ Multiple Choice (to be kept short and simple)
➢ Short answers
➢ Information transfer
➢ Note taking
➢ Partial dictation
➢ T/F questions
➢ etc.
➢ Scoring: errors of grammar and spelling should not be penalized

Assessing speaking
Rule: set tasks that are representative of the abilities we expect students to be able to
perform, in order to guarantee a certain amount of validity and reliability.
• Content
• Text types or format (interview/ Interaction with peers, presentations)
• Criterial levels of performance: (accuracy/ appropriacy/ range/ flexibility/ size, or
other)
The most commonly used facilitation techniques: questions and requests for information/ pictures/ role
play/ discussion/ oral presentation/ information gap…

What do we want to test?


Interactional or transactional communication?
• Interactional communication is for social purposes (exchange of news or
catching up)/ No particular purpose
• Transactional communication (buying something or asking about specific
information)/ A clear purpose
• Presentations and monologues, which have functional purposes, for
example, to persuade or complain.

Test formats:
• Interview: one candidate with one or more examiners
• Oral presentation: one candidate speaking to a real or imagined audience
• Interactive task: at least two candidates
• Group discussion: four to six candidates

Possible tasks:
• Describing something (a picture, a place, a person or an event)
• Telling a story (based on a single picture or a series of pictures, or invented)
• Comparing things (real objects, photographs, artwork, or abstract concepts)
• Giving some personal information (talking about family, a hobby, a hometown, or an
experience such as a holiday)

Features to be tested:
• Grammar range and accuracy
• Vocabulary range and accuracy
• Task fulfilment
• Fluency
• Pronunciation/ stress
• Body language
• Etc. (taught material)
Rating: use of rubrics
✓ http://rubistar.4teachers.org/index.php
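As one way of turning the features above into a rating instrument, the sketch below combines per-criterion band scores through an analytic (weighted) rubric; the weights and the 0–5 band scale are illustrative choices, not an official scale.

```python
# Illustrative weights for the speaking features listed above (they sum to 1.0).
RUBRIC_WEIGHTS = {
    "grammar": 0.25,
    "vocabulary": 0.20,
    "task_fulfilment": 0.20,
    "fluency": 0.15,
    "pronunciation": 0.10,
    "body_language": 0.10,
}

def analytic_score(bands):
    """Combine 0-5 band scores per criterion into a weighted total out of 5."""
    assert set(bands) == set(RUBRIC_WEIGHTS), "score every criterion exactly once"
    return sum(RUBRIC_WEIGHTS[c] * bands[c] for c in RUBRIC_WEIGHTS)

print(round(analytic_score({
    "grammar": 4, "vocabulary": 3, "task_fulfilment": 5,
    "fluency": 4, "pronunciation": 3, "body_language": 4,
}), 2))  # -> 3.9
```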
