
Session II: Designing valid and useful English tests
Diana Arroyo
Universidad del Norte
2021
Session objectives:
Participants will be able to:
• Recognize the constructs informing standardized English testing in the Colombian context.
• Identify the task types associated with specific constructs in Saber English tests.
• Critique an English test in terms of its constructs and format.
Key concepts in assessment
1. RELIABILITY
2. VALIDITY
3. AUTHENTICITY
4. PRACTICALITY
5. WASHBACK

A. The test is feasible to be applied under specific time and resource constraints.
B. Test results are consistent and dependable. If you give the same test to the same student on two occasions, the test should yield similar results.
C. The test allows appropriate, meaningful inferences on the skills it intends to measure.
D. The test has a positive impact on teaching and learning practices.
E. The test reflects and allows inferences on language use outside the classroom.
Saber English tests: general considerations

• Oriented by the Common European Framework of Reference (CEFR)
• Measures up to level B1 for high school graduates.
• Similar constructs across levels: Primary, Secondary,
University.
• Targets reading ability.
• 45 questions, 60 minutes.
CEFR Levels
Example task: Column matching

Level: Pre-A1 to A1
Example task: Cloze test (grammatical accuracy focus)

Level: A2
Example task: Cloze test (comprehension focus)

Level: B1
Example task: Utterance matching (multiple choice)

Level: A1
Example task: Utterance matching (multiple choice)

Level: A1
Example task: Context-location matching task (multiple choice)

Level: Pre-A1
Reading levels
Inferential
• Contextualizes text in terms of purpose and intention.
• Extrapolates information from the text.
Literal
• Retrieves details mentioned on the surface of the text.
Example task: Literal level reading (multiple choice)

Level: A2
Example task: Literal level reading (multiple choice)

Level: A2
Example task: Inferential level reading (multiple choice)

Level: B1
Example task: Inferential level reading (multiple choice)

Level: B1
Summary
Competence | A1 | A2 | B1
Linguistic competence | Column matching | Cloze test (accuracy focus) | Cloze test (comprehension focus)
Pragmatic competence | Utterance matching | |
Sociolinguistic competence | Context-location matching task | |
Reading competence | | Literal level reading multiple choice | Inferential level reading multiple choice
Are your tests aiming for these competence levels and constructs?
If not, probably they should…
Tips and guidelines for designing some common test tasks
Cloze

[Figure: sample cloze passage; slide labels: LEAD IN, DELETED]

Source

• Word bank provided?


• What words are you interested in deleting?
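The two decisions above (which words to delete, and whether to provide a word bank) can be prototyped with a short script. The following is a minimal sketch in Python, not a standard tool: the function name `make_cloze`, the gap format, and the sample sentence are assumptions made for illustration.

```python
import re


def make_cloze(text, targets, with_word_bank=True):
    """Blank out the first occurrence of each target word.

    Returns the gapped text and, optionally, an alphabetized word bank.
    """
    gapped = text
    for number, word in enumerate(targets, start=1):
        # \b keeps "saw" from matching inside a longer word such as "sawmill"
        pattern = r"\b" + re.escape(word) + r"\b"
        gapped = re.sub(pattern, f"({number}) ______", gapped, count=1)
    # Alphabetical order avoids giving away the answers through bank order
    bank = sorted(targets, key=str.lower) if with_word_bank else []
    return gapped, bank


# Hypothetical sample sentence and target words
passage = "The donkey saw the thief and woke the dog up."
gapped, bank = make_cloze(passage, ["saw", "woke"])
print(gapped)
print("Word bank:", ", ".join(bank))
```

Choosing the targets by hand, as here, keeps the focus on the construct you want to measure (for example, verbs for grammatical accuracy) rather than on arbitrary words.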
C-tests
C-tests may be defined as a type of cloze item in which, instead of deleting complete words, the second half of every second word is deleted, beginning with the second word of the second line; both the first and the last lines of the text are left intact.

The usefulness of C-tests has been validated for measuring language competence for placement rather than for measuring successful learning processes.
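The deletion rule can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not a validated C-test generator: it works on sentences rather than printed lines, it assumes the common convention of mutilating every second word, and the names `mutilate` and `make_c_test` are hypothetical.

```python
import re


def mutilate(word):
    """Keep the first half of a word (rounded down) and blank out the rest."""
    letters = re.match(r"[A-Za-z']+", word)
    if not letters or len(letters.group()) < 2:
        return word  # one-letter words and non-words are left untouched
    core = letters.group()
    keep = len(core) // 2
    # Trailing punctuation (word[len(core):]) is preserved after the blanks
    return core[:keep] + "_" * (len(core) - keep) + word[len(core):]


def make_c_test(sentences):
    """First and last sentences stay intact; in between, every second word is mutilated."""
    if len(sentences) < 3:
        return " ".join(sentences)  # nothing between the intact first and last sentences
    out = [sentences[0]]
    counter = 0
    for sentence in sentences[1:-1]:
        new_words = []
        for word in sentence.split():
            counter += 1
            new_words.append(mutilate(word) if counter % 2 == 0 else word)
        out.append(" ".join(new_words))
    out.append(sentences[-1])
    return " ".join(out)


# Hypothetical three-sentence text
print(make_c_test([
    "Once upon a time there was a dog.",
    "The thief broke into the house.",
    "The dog did not care.",
]))
```

Rounding down the kept half means that odd-length words lose the larger part ("house" becomes "ho___"), which is one common way of handling them.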
C-tests
[Slide labels: LEAD IN, DELETED]
Once upon a time, there was a merchant family who owned a dog and a donkey. One night when t_ _ owner of the ho_ _ _ was slee_ _ _ _ in his b_ _, a th_ _ _ broke in_ _ the ho_ _ _ to st_ _ _ some o_ the own_ _ _ possessions. The donkey w_ _ saw th_ thief, wo_ _ the d_ _ up a_ _ said, “W_ _ don’t y_ _ bark a_ _ wake t_ _ master u_?”
“Why should I care about our master?” the dog replied.

Cloze Elide

Cloze Elide requires the test taker to detect and then eliminate the intrusive words. This can be done by drawing a line through them.
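To build such a passage, intrusive words can be inserted at chosen positions in a clean text. The following is a minimal Python sketch under assumptions: the function name `make_cloze_elide` and the index-based way of placing intruders are illustrative choices, not an established procedure.

```python
def make_cloze_elide(text, intruders):
    """Insert intrusive words into a clean text.

    `intruders` maps a word index in the original text to the extra word
    that should appear immediately before that word.
    """
    out = []
    for index, word in enumerate(text.split()):
        if index in intruders:
            out.append(intruders[index])
        out.append(word)
    return " ".join(out)


def is_fully_repaired(original, repaired):
    """True when the test taker's struck-through version matches the clean text."""
    return original.split() == repaired.split()


clean = "The angle is simply your perspective"
print(make_cloze_elide(clean, {3: "possible"}))
```

Keeping the clean text as the answer key makes scoring simple: a response is correct only if removing the struck-through words restores the original.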
Cloze Elide
[Slide labels: LEAD IN, EXTRA]
In college, your professors will occasionally provide you with writing is topics for your papers. When the assignment has a clear topic or focus, one thing were that is often not given to you is the angle. The angle for your topic is possible simply your perspective or point of view on the topic. The there are rarely new topics or genres to writing to assignments, but each assignment has a unique perspective
—your perspective.
Considerations on multiple choice items
Multiple choice item parts:
• Which is the largest river in the United States of America?   STEM

Alternatives:
a. Mississippi   KEY
b. Missouri   Distracter
c. Ohio   Distracter
d. Rio Grande   Distracter

Direct-Question Format


Multiple choice item
• The largest river in the United States of America is
the ____________

a. Mississippi
b. Missouri
c. Ohio
d. Rio Grande

Incomplete-Sentence Format


Multiple choice item
• Which variable is generally thought to be the most
important when buying a house?

a. cost
b. builder
c. design
d. location   KEY

Best-Answer Format
Guidelines

• While there is not a universally accepted format for multiple choice items, here are a
few recommendations regarding physical layout that enhance clarity:
Format Guidelines (1)

• Brief and clear directions: Including how the selected alternative should be marked.
• The item should be numbered for easy identification, while alternatives are indented
and identified with letters.
• Either capital or lowercase letters followed by a period or parenthesis can be used for
alternatives. If a scoring sheet is used, make the alternative letters on the scoring sheet
and the test as similar as possible.
Format Guidelines (2)

• There is no need to capitalize the beginning of alternatives unless they begin with a
proper name.
• Keep the alternatives in a vertical list instead of placing them side by side. It is easier
for students to scan vertical lists quickly.
• Use correct grammar and formal language structure in writing items.
• All items should be written so that the entire question appears on one page.
When the item stem is a complete sentence, there should not be a period at the end of the alternatives:

1. Which type of validity study involves a substantial time interval between when the test is administered and when the criterion is measured?
a. delayed study
b. content study
c. factorial study
d. predictive study
No periods!
When the stem is in the form of an incomplete statement with the missing phrase at the end of the sentence, alternatives should end with a period.
2. The type of validity study that involves a substantial time interval between when the test is administered and when the criterion is measured is a _______________________
a. delayed study.
b. content study.
c. factorial study.
d. predictive study.
Periods!
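The two punctuation rules (no periods after a question stem, periods after an incomplete-statement stem) are mechanical enough to check automatically. Below is a rough Python sketch; the function name `punctuation_problems` is hypothetical, and it assumes that any stem not ending in a question mark is an incomplete statement with the blank at the end.

```python
def punctuation_problems(stem, alternatives):
    """List alternatives whose final punctuation does not match the stem type."""
    # Assumption: a stem that does not end in "?" is an incomplete statement
    is_question = stem.strip().endswith("?")
    problems = []
    for alt in alternatives:
        has_period = alt.strip().endswith(".")
        if is_question and has_period:
            problems.append(f"drop the period: {alt!r}")
        elif not is_question and not has_period:
            problems.append(f"add a period: {alt!r}")
    return problems


print(punctuation_problems(
    "Which scale of measurement incorporates a true or absolute zero point?",
    ["interval scale", "nominal scale.", "ordinal scale", "ratio scale"],
))
```

An empty list means the item follows both rules.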
Have the item stem contain all the information
necessary to understand the problem or question.

3. Absolute zero point.


a. interval scale
b. nominal scale
c. ordinal scale
d. ratio scale
4. Which scale of measurement incorporates a true or
absolute zero point?
a. interval scale
b. nominal scale
c. ordinal scale
d. ratio scale
Unnecessary content:
5. There are several different scales of measurement used in educational settings. Which scale of measurement incorporates a true or absolute zero point?
a. interval scale
b. nominal scale
c. ordinal scale
d. ratio scale
DOES NOT ADD CLARITY!
Keep the alternatives brief and arrange them
in an order that promotes efficient scanning

6. Andrew Jackson _______________________


a. was born in Virginia
b. did not fight in the American Revolution due to a
childhood
c. was the 7th president of the United States
d. served three terms as president of the United States
7. Who was the 7th president of the United States of
America?
a. James Monroe
b. John Adams
c. Andrew Jackson
d. Martin Van Buren
Avoid negatively stated items
in most situations
9. Which country does not have a border with Brazil?
a. Chile
b. Argentina
c. Guyana
d. Colombia
Avoid negatively stated items
in most situations
10. Which country does NOT have a border with Brazil?
a. Chile
b. Argentina
c. Guyana
d. Colombia
Better, but… no.

How would you re-write this item?


Make sure only one alternative is
correct or represents the best answer
11. Which country has a border with Brazil?
a. Chile
b. Argentina   KEY
c. Guyana   KEY
d. Colombia   KEY
Obvious or “cascarita” (a trick question)?
Avoid cues that inadvertently identify the correct answer

13. Which type of validity study examines the ability of test scores to predict a criterion?
a. delayed study
b. content study
c. factorial study
d. predictive study
Keep alternatives of equal length and complexity:
14. Ecology is the study of ____________________________________
a. genetics.
b. organisms and their relationship to their environment.
c. internal balances.
d. evolution.
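A quick way to spot an alternative that stands out by length is to compare its word count with the average of the others. This is only a heuristic sketched in Python; the function name `length_outliers` and the default threshold of twice the average are assumptions, not an established standard.

```python
def length_outliers(alternatives, ratio=2.0):
    """Flag alternatives whose word count exceeds `ratio` times the mean of the others."""
    if len(alternatives) < 2:
        return []  # nothing to compare against
    lengths = [len(alt.split()) for alt in alternatives]
    flagged = []
    for i, alt in enumerate(alternatives):
        others = lengths[:i] + lengths[i + 1:]
        if lengths[i] > ratio * (sum(others) / len(others)):
            flagged.append(alt)
    return flagged


# Alternatives from item 14
print(length_outliers([
    "genetics.",
    "organisms and their relationship to their environment.",
    "internal balances.",
    "evolution.",
]))
```

A flagged alternative is not necessarily wrong, but test-wise students often pick the longest option, so it is worth rewriting the others to a similar length.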
Make sure the alternatives are grammatically correct relative to the stem
• Watch for a vs. an in English
• Which individuals are credited with the first successful flights in a heavier-than-air aircraft that was both powered and controlled?
a. Octave Chanute
b. Otto Lilienthal
c. Samuel Langley
d. Wilbur and Orville Wright
Guidelines:

• Make sure no item reveals the answer to another item.


• Have all the distracters appear plausible.
• Place the correct answer in a random position among the alternatives.
• Avoid the use of “none of the above” or “all of the above”.
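Two of these guidelines lend themselves to a small helper: randomizing the position of the key and catching "none of the above" or "all of the above". The Python sketch below is illustrative only; the function names `shuffle_alternatives` and `banned_alternatives` are hypothetical.

```python
import random

BANNED = ("none of the above", "all of the above")


def shuffle_alternatives(key, distracters, rng=random):
    """Shuffle key and distracters together; return the options and the key's letter."""
    options = [key] + list(distracters)
    rng.shuffle(options)
    key_letter = "abcdefghij"[options.index(key)]
    return options, key_letter


def banned_alternatives(options):
    """Return any alternative that is 'none of the above' or 'all of the above'."""
    return [o for o in options if o.strip().lower().rstrip(".") in BANNED]


# Alternatives from item 7; the key's position changes from run to run
options, key_letter = shuffle_alternatives(
    "Andrew Jackson", ["James Monroe", "John Adams", "Martin Van Buren"]
)
for letter, option in zip("abcd", options):
    print(f"{letter}. {option}")
print("Key:", key_letter)
```

Recording the key letter at shuffle time avoids having to rebuild the answer sheet by hand after the alternatives are reordered.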
Your turn:

• Complete the alternatives for the questions you created for the competences.
• Follow the guidelines.
