English Language Assessment: Methods
ASSESSMENT PROCESSES IN ENGLISH
Before choosing any type of test method, you must first determine the
purpose for assessing. After that, you need to start planning it. At this point, you
must identify and clarify the test's content, decide on the proportionate weights
assigned to each section, select the assessment method, determine how the test
will be given, establish the correction procedures that will be followed, and
ultimately decide on how the results will be communicated.
Some teachers use a specification table, which helps determine the goals
and material that the test will cover (Shohamy, 1985). Let's look at an
example of a test specification table.
Test specification

| Objective | Evaluate the simple past - regular verbs | Evaluate the simple past - irregular verbs | Giving chronological sequence of past events |
|---|---|---|---|
| Content | Knowledge-based, indirect items | Knowledge-based, indirect items | Performance-based, direct items |
| What kind of test will it be? | Achievement test | Achievement test | Achievement test |
| Talking about past events (weight) | 25% | 25% | 50% |
| Test method | Fill in the blanks | Multiple-choice + fill in the blanks | Writing: comic strips, retell the story. Speaking: talk about pictures of your childhood. |
| Time | 15 minutes | 15 minutes | Writing: 20 minutes. Speaking: 20 minutes. |

Source: Retorta, 2023.
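The weights in the specification table can be converted into marks per section once a total score is chosen. The sketch below is only an illustration: the 20-point total and the function name `allocate_points` are assumptions, not part of the original table.

```python
def allocate_points(weights, total_points):
    """Split total_points across sections in proportion to percentage weights."""
    if sum(weights.values()) != 100:
        raise ValueError("section weights must add up to 100%")
    return {section: total_points * pct / 100 for section, pct in weights.items()}


# Weights taken from the specification table; the 20-point total is an assumption.
weights = {
    "simple past, regular verbs": 25,
    "simple past, irregular verbs": 25,
    "chronological sequence of past events": 50,
}
points = allocate_points(weights, total_points=20)
for section, pts in points.items():
    print(f"{section}: {pts} points")
```

Checking that the weights add up to 100% before building the test helps catch a specification table whose sections were weighted inconsistently.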
We will start this module by discussing the types of testing and how the
choice of test should be in line with the teacher's view of language and of
teaching.
Once you have established the purpose and the specification for your test, it
is time to choose a test method. Below are some of the methods you might
use to compose your test. Remember that the choice of method depends on the
objective of the evaluation.
1.1 Multiple choice
The multiple-choice test is a method that requires the test-taker to choose the
correct response(s) from a set of options. The incorrect options are
called distractors. Ideally, the distractors should be designed to mislead
those who lack the relevant knowledge. Here are some details about this type of
test.
Writing the question/stem: Creating multiple-choice questions is not as easy as it may seem and requires significant expertise from the examiner. The quality of the questions can only be evaluated after they have been administered and graded, since only then is it clear whether the distractors were effective in misleading the test-takers.

Correction: Multiple-choice exams are easy to score and can be scanned optically for grading purposes. The method is considered to have high reliability.

Types of multiple-choice tests:
Standard multiple choice: This is the most common type of multiple-choice question. The candidate is given a stem followed by several options, only one of which is the correct answer.
Multiple response: This type of question allows the student to select more than one correct answer from a list of options.
Ordering: This type of question requires the candidate to place items in the correct order, for example by sequencing events or steps in a process.
Explanation: This type of question asks the student to choose the best explanation for a given concept or situation, rather than selecting the correct answer from a list of options.

Advantages and disadvantages: Multiple-choice tests are convenient to administer and score, and they are recognized for their objectivity and dependability, which makes them a preferred choice for large-scale evaluations. Nonetheless, creating effective questions can be challenging, and pilot testing, although advisable, is not always feasible.
Overall, multiple-choice tests can be a useful tool for language testing,
particularly when used in combination with other assessment methods to provide
a more comprehensive picture of a test-taker's language skills.
1.2 True/False tests
True/False
Writing the question/stem: Developing True/False questions is more complex than it might initially appear and requires as much care and attention as writing multiple-choice questions. Despite the apparent simplicity of the process, caution is needed to avoid creating questions that are unclear, misleading, or poorly constructed. To ensure that True/False items are effective, pre-testing is recommended in order to identify and correct any issues before administering the test to a larger population. Creating high-quality True/False questions requires both skill and experience, and we suggest that test-makers begin with open-ended questions and then write the false options based on incorrect or inconsistent responses from test-takers.

Correction: True/False tests are easy to correct and can be scored mechanically. They are considered highly reliable but not necessarily valid.

Types of true/false tests:
Simple True/False: This type of test presents declarative statements that are either true or false, and the student must select the correct answer.
True/False with Reasons: This type of test requires the student not only to indicate whether a statement is true or false, but also to provide a reason for their answer.
Matching True/False: This type of test presents two columns, one with statements and the other with True or False. The student must match the statements with the appropriate True or False column.
Complex True/False: This type of test presents more complex sentences with multiple clauses, and the student must determine the truth or falsehood of each individual clause.

Advantages and disadvantages: The main benefit of the True/False method is its efficiency: it is quick and easy to grade, and the grader does not need to know the subject being tested. A downside is that it allows for guessing, since the chance of guessing each item correctly is 50%, which makes it less reliable than other objective assessment methods. This method can also facilitate cheating on tests, and it can be challenging to create high-quality false items for this type of test.
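To see what a 50% chance per item means for a whole test, the probability of reaching a pass mark by blind guessing can be computed with the binomial formula. This is a minimal sketch: the 10-item test and the pass mark of 7 are invented for illustration, and `prob_at_least` is a hypothetical helper name.

```python
from math import comb


def prob_at_least(n_items, min_correct, p_guess=0.5):
    """Probability of getting at least min_correct items right by blind guessing."""
    return sum(
        comb(n_items, k) * p_guess**k * (1 - p_guess) ** (n_items - k)
        for k in range(min_correct, n_items + 1)
    )


# Hypothetical 10-item True/False test with a pass mark of 7 correct answers.
print(round(prob_at_least(10, 7), 3))  # 0.172
```

Under these assumptions, roughly one test-taker in six would pass without knowing anything. Lengthening the test or asking for reasons (the "True/False with Reasons" format) reduces the effect of guessing.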
1.3 Matching tests
Matching tests are a type of language test in which the student has to match
items from one column to items in another column. The items may be words,
phrases, or sentences, and they are usually related in some way. Matching tests
are commonly used to assess vocabulary knowledge, collocations, and grammar
structures. They can be presented in different formats, such as multiple-choice
matching or open-ended matching.
Matching
Writing the question/stem: Matching items require careful construction to ensure that the options accurately reflect the preceding statement. To minimize guessing, the conclusion column should contain more options than necessary, so that the candidate cannot arrive at the correct answer by elimination.

Authenticity: This method is considered to have low authenticity because it differs from an argumentative linguistic interaction in which a set of propositions supports a conclusion. In this method, the conclusions are already established and may not be the same as those the student being evaluated would reach.

Correction: This method is easy to correct and score, with a high degree of reliability.

Types of matching tests:
Simple matching: In this type of test, the student is required to match items from one column to items in another column. The items in both columns are typically presented in random order, and the student must match each item correctly. This type of test is commonly used to test vocabulary and grammar knowledge.
Complex matching: In this type of test, the student is required to match items from one column to multiple items in another column. The items in the second column may be presented in a different order or may be mixed with distractors. This type of test is commonly used to test comprehension skills and to assess the student's ability to identify relationships between different pieces of information.

Advantages and disadvantages: The main advantage of this method is the ease of correction, while the drawback lies in the challenge of crafting properly structured questions.
2.1 The cloze procedure
Writing the question/stem: Once a text has been selected, the examiner needs to choose a deletion criterion for the test. The examiner then deletes words from the text according to the type of procedure they wish to use.

Authenticity: The cloze method is considered very authentic because it assesses the overall ability of the test-taker. However, it has drawn several criticisms. Some critics point out that there is no agreement on which specific skill or ability it evaluates. Others argue that in practical situations we do not usually have to guess the meaning of words in context.

Correction: If the examiner selects the fixed-ratio criterion, correcting the cloze test is simple, as there is only one correct answer for each blank. If the rational deletion criterion is used, correcting the test becomes more challenging, because the student may have multiple acceptable options, including synonyms, to fill in the blank.

Types of cloze procedures:
Fixed-ratio deletion cloze: This type of cloze test involves deleting every nth word from a passage, usually every fifth or sixth word, and requires the test-taker to fill in the missing words.
Rational deletion cloze: In this type of cloze test, words are removed from a passage based on the text's logical structure or semantic relationships, making it more challenging to fill in the blanks.
Semantic cloze: This type of cloze test involves removing words from a passage that are semantically related to the target word, so the test-taker needs a good understanding of the passage's meaning.
Syntax cloze: This type of cloze test involves deleting words from a passage based on their grammatical function or syntactic structure, requiring the test-taker to understand the syntax of the sentence to fill in the blanks.
Discourse cloze: This type of cloze test requires the test-taker to fill in the blanks with words that fit the text's overall meaning and flow, rather than just selecting individual words that match grammatically or semantically.

Advantages and disadvantages: The cloze procedure has the advantage of being easy to create, administer, and correct. However, one of its main disadvantages is the lack of consensus on what construct it actually evaluates.
In sum, the cloze procedure is best used when the aim is to evaluate the
test-taker's ability to comprehend a passage and fill in missing words in context. It
can be used to assess various language skills such as vocabulary, grammar,
syntax, and reading ability.
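The fixed-ratio deletion described above is mechanical enough to sketch in code. The example below is a rough illustration only: the passage is invented, `fixed_ratio_cloze` is a hypothetical helper, and keeping punctuation next to the blank is a design assumption. Rational, semantic, syntax, and discourse deletion depend on the examiner's judgment and are not covered.

```python
import re


def fixed_ratio_cloze(text, n=5):
    """Replace every n-th word with a numbered blank; return (gapped_text, answer_key)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    gapped, answers = [], []
    for position, token in enumerate(text.split(), start=1):
        # Separate the word from any punctuation attached to it.
        prefix, core, suffix = re.match(r"^(\W*)(.*?)(\W*)$", token).groups()
        if position % n == 0 and core:
            answers.append(core)
            gapped.append(f"{prefix}({len(answers)}) ______{suffix}")
        else:
            gapped.append(token)
    return " ".join(gapped), answers


# Invented passage, used only to show the deletion pattern.
passage = (
    "Yesterday I walked to the park and played football "
    "with my friends until it got dark."
)
gapped_text, key = fixed_ratio_cloze(passage, n=5)
print(gapped_text)
print(key)  # ['the', 'with', 'got']
```

Because the deleted words are collected in order, the returned list doubles as the answer key for the single-answer correction that the fixed-ratio criterion allows.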
2.2 Role-play
Role-play
Authenticity: When role-play activities are connected to genuine, day-to-day scenarios, they can be deemed authentic. If they are unrealistic or imaginary, they cannot be viewed as authentic, but they are still acceptable.

Correction: The way in which a role play is assessed will vary according to the examiner's objectives for the activity. The evaluation may concentrate on the appropriateness of the vocabulary and grammar used, as well as on whether the activity's objectives were met. More subjective factors, such as originality and ingenuity, may also be considered during the assessment.

Types of role plays:
Simulated Role Plays: This involves presenting a situation or scenario that is similar to a real-life situation the test-taker may encounter. The student is then required to play a particular role and engage in a conversation with an interlocutor.
Information Gap Role Plays: In this type of role play, one participant has information that the other participant does not, and has to convey it to the other participant using language. This type of role play assesses the student's ability to use language to convey information accurately and effectively.
Decision-Making Role Plays: In this type of role play, the test-taker is presented with a problem or situation that requires them to make a decision or take action. The student then uses language to justify their decision or actions. This type of role play assesses the student's ability to use language to make decisions and express opinions.
Interactive Role Plays: This type of role play requires the student to engage in a conversation with an interlocutor. The conversation is designed to simulate a real-life situation, such as making a reservation or ordering food at a restaurant. This type of role play assesses the student's ability to use language in a natural, conversational context.

Advantages and disadvantages: Role play is a technique used in various fields. Its advantages include enhancing learning, developing social skills, promoting empathy, and providing a safe environment in which to experiment. Its disadvantages include unrealistic scenarios, participants' lack of confidence, the time it consumes, and its limited scope. Careful planning, clear objectives, and consideration of participant needs and expectations are essential for the success of role-play activities.
3.1 Open-ended questions
Writing the question/stem: Open-ended questions may appear simple to write. However, for them to be considered valid and reliable, the examiner must pay special attention to their coherence with the objectives of the exam. Selecting well-written texts is a prerequisite for developing good questions. The choice of question type will depend on various factors, including the textual structure, the intended level of difficulty, and the type of reading being assessed.

Correction: The main limitation of these questions is their lack of reliability, which stems from the subjective nature of the evaluation process. Correcting open-ended questions can be difficult, as examiners may disagree on what is considered correct or acceptable. The reliability of the test therefore depends on establishing clear and objective criteria, which can be developed through the use of correction grids.

Types of open-ended questions: Open-ended questions in language testing can be categorized into five types: discursive questions, interpretive questions, creative questions, comparative questions, and personal response questions. Discursive questions require discussing a topic, interpretive questions require interpreting a text, creative questions require using imagination, comparative questions require comparing and contrasting, and personal response questions require providing a personal response to a prompt (see footnote 1).

Advantages and disadvantages: Open-ended questions have the advantage of decreasing the chance of guessing correct answers, as each student must provide their own response. This method is also authentic and can assess a combination of skills, such as reading and writing. However, a major drawback of this approach is the time-consuming process of correction, which involves creating a detailed correction grid and manually correcting each response. This cannot be accomplished by a machine. Moreover, selecting appropriate texts for the exam is a demanding task that requires considerable effort and attention.
1 For a thorough explanation of the types of open-ended questions, we suggest that you read
chapter 2 of Nery (2003).
3.2 Writing tasks
Writing tasks
Types of writing tasks: The type of writing task assigned to students will depend on the teacher's objective.

Advantages and disadvantages: Writing tasks in language testing have advantages such as providing a more accurate assessment of writing skills, assessing a range of skills, encouraging creativity and higher-level thinking, and allowing customization for different levels and purposes. However, writing tasks are time-consuming to administer and correct, may involve subjectivity in grading, have a limited scope of assessment, and require prompts that can be difficult to make relevant.
automated scoring system, analyzing the results to determine proficiency levels or
progress, reporting results in the form of scores, rankings, or study
recommendations, providing feedback to test-takers on their performance and
areas of improvement, and evaluating the effectiveness of the test itself for
potential modifications in the future. In this session, we will study the whole
process.
Once the test has been designed and reviewed, it's time to administer it.
But before that, the teacher or examiner should make sure that all the requirements are
met. Based on Shohamy (1985), we propose the following recommendations.
Recommendations to be followed
Try not to increase the student's anxiety level. High anxiety can affect test performance. Remember that a test is not to be used as a means of punishment.
Make it clear to the student what will be evaluated a few weeks before the test is administered, so that they can prepare for the assessment.
Do not forget to include the value of each question in the test and, if it is a task, make clear the criteria that will be evaluated. This can help the student organize their time during the test.
Before starting the test, inform the student how much time they will have to take it.
Mix up the order of items on the test or on the answer sheet to avoid "cheating".
Review the test with the student before starting. In this way, doubts and problems can be resolved and the student will be more familiar with the procedures adopted.
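The recommendation to mix up the order of items can be applied by producing several versions of the same test. The sketch below is one possible way to do it, under stated assumptions: the item labels and the version seeds "A" and "B" are invented, and `make_version` is a hypothetical helper.

```python
import random


def make_version(items, seed):
    """Return a reproducibly shuffled copy of the item list, leaving the original intact."""
    rng = random.Random(seed)  # a fixed seed lets the same version be rebuilt for correction
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled


# Placeholder item labels; a real test would hold the full question text.
items = ["Item 1", "Item 2", "Item 3", "Item 4", "Item 5"]
version_a = make_version(items, seed="A")
version_b = make_version(items, seed="B")
print(version_a)
print(version_b)
```

Each version needs its own answer key in the shuffled order, so it helps to record which seed produced which version before the test is administered.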
It is important to correct the test promptly after it has been administered to
ensure that the evaluation goal is accomplished. In order for an evaluation tool to
be reliable and valid, it is crucial for the examiner to have clear and well-defined
correction criteria that are also explicitly communicated to the students.
Performance-based tests should have criteria set by the teacher, whereas
large-scale tests have criteria set by a team. To ensure that the grading or
scoring scale is appropriate and valid, examiners need to undergo training
and make any necessary adjustments to the scale.
A fair correction process ensures that all test-takers are evaluated
consistently, without bias or discrimination. It also helps to maintain the validity of
the test and the credibility of the evaluation system. Therefore, it is crucial for
examiners to follow clear and well-defined correction criteria, and to ensure that
they are applied consistently and objectively to all test-takers.
of the test. To obtain a representative sample of a student's progress and
difficulties, it is recommended that the teacher use a variety of evaluation
instruments, such as written tests, tasks, seminars, exercises, and assignments,
over a period of time, such as two or three months, or a semester.
After the evaluations have been completed, the teacher should conduct a
detailed analysis of the results, identifying the students' mistakes and difficulties.
Based on this analysis, the teacher can determine the appropriate measures to
help the student effectively learn the content. This may involve implementing a
remedial work process (recuperação paralela) to provide the student with a second
chance to improve their performance.
According to Law No. 9.394 (Brasil, 1996), which establishes the Guidelines
and Bases of National Education (LDB), remedial studies are mandatory. This
obligation is outlined in Article 24, item V, letter "e", which states that remedial
studies should ideally be conducted alongside the regular school period and
regulated by educational institutions in their policies. This means that remedial
work (recuperação paralela), as determined by CNE/CEB No. 12 (Brasil, 1997),
cannot be incorporated into the subject's curriculum, and students must be offered
complementary activities within the institution's pedagogical project. Providing
these complementary studies is not enough: there must also be a reassessment,
likewise conducted in parallel, as stated in the regulations. These corrective
measures throughout the school year may result in better performance for students
with lower grades and learning difficulties.
Next, we will talk about the last phase of administering a language test: the
analysis of the items and the correction of any badly written item or question.
correspond to a particular learning outcome and should be written accurately to
reflect the language skills being measured.
An additional crucial consideration when analyzing test items is the level of
difficulty. Test items should be suitably challenging for the intended test-takers. If
the items are too simple, they may not accurately evaluate the test-taker's
language proficiency, while if they are too difficult, they may cause undue stress
and anxiety for the test-taker (Shohamy, 1985).
Furthermore, test items should not include bias or ethnic/social/cultural
references that may unfairly disadvantage particular test-takers. The items should
be clear and easy to understand and should not require any specific
cultural or background knowledge.
Finally, test items should be evaluated to ensure that they are reliable and
valid. This involves examining the statistical properties of the items, mainly when
considering large-scale exams, to confirm that they are evaluating the intended
language skills and that the scores are accurate and consistent across different
test-takers. For achievement tests, a sound way to analyze test items is
to classify the test results according to the percentage of mistakes made on each
item. For example, if 70% of the class got one item wrong, the teacher should ask
whether the item was too difficult for that group or whether it was badly written.
It is advisable to ask students what they thought about the items they got
wrong. You will get interesting comments on the mistakes, and with them you can
decide whether to disregard the item, which is an ethical attitude to take, or
to redo the test with the item corrected.
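The percentage-of-mistakes analysis described above can be sketched as follows. The class results are invented for illustration, the function names are hypothetical, and the 70% threshold simply mirrors the example in the text; it is not a fixed rule.

```python
def error_rates(responses):
    """Share of students who missed each item.

    responses holds one list per student: True for a correct answer, False for a mistake.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [
        sum(1 for student in responses if not student[item]) / n_students
        for item in range(n_items)
    ]


def flag_items(responses, threshold=0.7):
    """Return the 1-based numbers of items missed by at least `threshold` of the class."""
    return [i + 1 for i, rate in enumerate(error_rates(responses)) if rate >= threshold]


# Invented results for a class of 10 students on a 4-item test.
responses = [
    [True, False, True, True],
    [True, False, False, True],
    [True, False, True, True],
    [False, False, True, True],
    [True, False, False, True],
    [True, False, True, True],
    [True, False, False, True],
    [True, True, True, True],
    [True, True, True, True],
    [True, True, True, True],
]
print(error_rates(responses))  # [0.1, 0.7, 0.3, 0.0]
print(flag_items(responses))   # [2]
```

A flagged item is only a starting point: the numbers cannot tell whether the item was too difficult or badly written, which is why asking the students about it remains part of the analysis.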
Overall, analyzing test items is a critical aspect of language testing as it
helps ensure that the test is fair, reliable, and accurately measures the language
skills of the test-takers.
REFERENCES