
English Language Assessment: Meeting 3
Stages of test development (Hughes, 2003)
1. Make a clear statement of the testing 'problem'.
2. Write test specifications.
3. Write and moderate items.
4. Trial the items on native speakers.
5. Trial the test on a group of non-native speakers.
6. Analyse the results of the trial.
7. Calibrate scales.
8. Validate (for high-stakes tests).
9. Write handbooks for test takers, test users and staff.
10. Train any necessary staff.
Stating the problem
• What kind of test is it to be? Achievement (final or progress), proficiency, diagnostic, or placement?
• What is its precise purpose?
• What abilities are to be tested?
• How detailed must the results be?
• How accurate must the results be?
• How important is backwash?
• What constraints are set by unavailability of expertise, facilities, time (for construction, administration and scoring)?
According to Brown (2004), test specifications comprise the outline of the test, the skills to be included, and the item types and tasks.
Sample (Brown, 2004)
Specifications for the test
• Content (operations, types of text, addressees, length, topics, structural range, vocabulary, dialect and style, speed of processing)
• Test structure (e.g. 3 sections, expeditious reading, or no separate items, etc.)
• Number of items and passages
• Timing (for each section and for the entire test), e.g. 30 minutes for all multiple-choice questions
• Medium/channel and techniques (paper and pencil, tape, computer, face-to-face, etc., and how to measure skills and subskills)
Specifications for the test (cont.)
• Techniques, e.g. half of the items will be gap filling and the other half multiple choice
• Criterial levels of performance (accuracy, appropriacy, range, flexibility, size), e.g. performance is considered satisfactory at 75% accuracy (correct answers)
• Scoring procedures, e.g. students answer on a separate answer sheet and an answer key is provided for scoring
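As a rough sketch of such a scoring procedure in Python (the answer key, the student's answers, and the way the 75% criterion is applied below are all invented for illustration, not taken from any sample test in these slides):

```python
# Hypothetical scoring sketch: compare each answer sheet against a key
# and check the 75%-accuracy criterial level. Key and answers are invented.
ANSWER_KEY = ["B", "A", "D", "C", "A", "B", "C", "D"]   # one correct option per item
CRITERION = 0.75                                         # criterial level of performance

def score_sheet(answers, key=ANSWER_KEY):
    """Return (number correct, proportion correct) for one answer sheet."""
    correct = sum(1 for given, wanted in zip(answers, key) if given == wanted)
    return correct, correct / len(key)

# Example: one student's answer sheet
student = ["B", "A", "D", "C", "B", "B", "C", "A"]
correct, proportion = score_sheet(student)
passed = proportion >= CRITERION
print(f"{correct}/{len(ANSWER_KEY)} correct ({proportion:.0%}) - criterion met: {passed}")
```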
Writing and moderating items
• Sampling (considering content validity and beneficial backwash)
• Writing items (planned, precise, and clear)
• Moderating items (proofread by at least two colleagues and trialled informally on native speakers)
Sample of moderation: grammar items
Activity: Moderating grammar items
Analysis of test results
• At the whole-test level:
  • Descriptive statistics (mean, spread)
  • Reliability (internal consistency, or inter-rater)
• At the item level:
  • Item Facility (IF)
  • Item Discrimination (ID)
  • Distracters (MC items only)
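A minimal sketch of the whole-test statistics in Python, assuming dichotomously scored (0/1) items; the small response matrix is invented, and KR-20 is used here simply as one common internal-consistency estimate:

```python
# Sketch of whole-test analysis: mean, spread, and KR-20 internal consistency.
# The 0/1 response matrix (rows = test takers, columns = items) is invented.
from statistics import mean, pstdev

responses = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 0, 0, 1],
]

totals = [sum(row) for row in responses]          # total score per test taker
k = len(responses[0])                             # number of items
print("mean =", mean(totals), "SD =", round(pstdev(totals), 2))

# KR-20: reliability estimate for dichotomously scored items
p = [sum(row[i] for row in responses) / len(responses) for i in range(k)]
pq_sum = sum(pi * (1 - pi) for pi in p)
var_total = pstdev(totals) ** 2
kr20 = (k / (k - 1)) * (1 - pq_sum / var_total)
print("KR-20 =", round(kr20, 2))
```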
Item Facility (IF)
• Measures the difficulty of an item (the higher, the easier)
• Index p, a value between 0 and 1, e.g. p = .65
• Often informally given as a percentage (65%)
• Divide the number who answer an item correctly by the number of test takers, e.g. with 5 test takers:
  p = number answering correctly / number of test takers
  p = 2 / 5 = .4
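The same calculation sketched in Python (the five 0/1 responses are invented to mirror the p = 2/5 example):

```python
# Item Facility: proportion of test takers who answered the item correctly.
# 1 = correct, 0 = incorrect; these five responses mirror the p = 2/5 example.
item_responses = [1, 0, 0, 1, 0]

p = sum(item_responses) / len(item_responses)
print(f"IF (p) = {p}")   # 0.4, i.e. 40% of test takers answered correctly
```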
Item Discrimination (ID)
• Indicates how well an item distinguishes between weak and strong candidates
• Expectation: students who perform well overall are more likely to get a particular item right than those who perform poorly overall
• The more discriminating the items, the more reliable the test
• Highly discriminating items => high reliability!
• D = 0: weak and strong students perform the same
ID continued (Brown, 2004)
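One common way to estimate ID, in the spirit of Brown (2004), is to rank test takers by total score, split them into an upper and a lower group, and take the difference in item facility between the two groups; the response matrix below is invented for illustration:

```python
# Item Discrimination via upper/lower groups: D = p_upper - p_lower.
# Rows = test takers (0/1 per item); the data are invented for illustration.
responses = [
    [1, 1, 0, 1, 1],   # strong test taker
    [1, 1, 1, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0],   # weak test taker
]

def discrimination(responses, item, fraction=0.5):
    """D for one item: facility in the upper group minus facility in the lower group."""
    ranked = sorted(responses, key=sum, reverse=True)   # best total scores first
    n = max(1, int(len(ranked) * fraction))
    upper, lower = ranked[:n], ranked[-n:]
    p_upper = sum(row[item] for row in upper) / n
    p_lower = sum(row[item] for row in lower) / n
    return p_upper - p_lower

for i in range(len(responses[0])):
    print(f"item {i}: D = {discrimination(responses, i):+.2f}")
```

In this invented data set item 2 comes out with D = 0: the upper and lower groups do equally well on it, which is exactly the D = 0 case noted on the previous slide.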
Other stages in test development
• Calibration (assigning samples/items covering the full range of the relevant scales)
• Validation (for published tests: checking that the test measures what it intends to measure)
• Handbook (for test takers, users, and staff)
• Staff training (raters, scorers, computer operators, etc.)
Activity
Try to develop a placement test for undergraduate students who will join a TOEFL preparation class. This test is important for determining the students' competence level and helping teachers set up the syllabus and course materials. In this case, backwash is an essential part of the test.
Questions? Thank you!
