
Tests
There are a number of tools for the assessment of students, and one of the best is the test: a means to measure the knowledge and skill of an individual or a group (Gay, 2000). According to the Glossary of Testing and Assessment Terms (2003):

“A test is a measure that provides information about a person’s knowledge, skill, competency or behavior.”

A test is defined as a group of questions to be answered orally or in writing, or tasks to be performed, to evaluate skills or knowledge. According to Haladyna (2004, p.4):

“A test is a measuring device intended to describe numerically the degree or amount of learning... In educational testing, most tests contain a single item or set of items intended to measure a domain of knowledge or skills or a cognitive ability.”

The purpose of a test may be to diagnose the effectiveness of teaching plans, group students, gauge readiness for a learning experience, assess students’ mastery learning, track students’ progress in learning, identify students’ learning difficulties and learning requirements, or inform decisions about a future field of interest (i.e. aptitude).

Types of Tests

A. Major Types of Tests

There are three major types of tests: written tests, oral tests and practical tests.

1. Written Tests: A test taker who takes a written test responds to specific items by writing or typing within a given space of the test or on a separate form or document.

2. Oral Tests: The oral exam (also oral test or viva voce) is a practice in many
schools and disciplines, where an examiner poses questions to the student in spoken
form. The student has to answer the question in such a way as to demonstrate sufficient
knowledge of the subject in order to pass the exam.
3. Practical Tests: In a practical test, examinees are required to demonstrate a learned skill or task (Majeed, 2005). Practical exams can take many different forms depending on the subject material.

B. Types of Written Tests

In written tests, the examinee expresses his or her ideas in written form. The basic types of written tests are essay type tests and objective type tests.

1. Essay Type Tests: According to Majeed (2005), an essay type test presents one or more questions or other tasks that require extended written responses from the person being tested. Abilities such as logical thinking, critical reasoning and systematic presentation can be evaluated by essay type tests.

2. Objective Type Tests: Objective type tests present students with a highly structured task that limits the type of response they can make. These tests have a single predetermined correct answer. Pupils must demonstrate the specific skill, knowledge, and understanding required by the item (Linn & Miller, 2008).

C. Types of Objective Type Tests: The two main types of objective type tests are supply type tests and selection type tests.

1. Supply type tests: Supply type tests include short answer items and completion items. A short answer item can be answered by a word, phrase, number, or symbol; preferably there is a single correct answer. Short answer type tests are used for the recall of memorized facts and figures (Wiersma & Jurs, 1990).

2. Selection type tests: Selection type tests include true-false (alternate response) items, matching items and multiple-choice items. According to Special Connections (2005), matching items are utilized to measure a student’s understanding of the relationship between similar terms, events, or theorists (Kline, 1986). Matching items are considered selection assessments, as they require students to identify or recognize the correct association between provided alternatives. According to Wiersma & Jurs (1990), matching items are usually presented in a two-column format: one column consists of premises and the other of responses, and students have to match premises with responses (Majeed, 2005). In a true-false or alternate response type test, a statement or proposition (declarative statement) is given and students are asked to mark it true or false (Kline, 1986), right or wrong, correct or incorrect, yes or no, agree or disagree, and the like. Such items are used to measure the ability to judge the correctness of statements of fact (Linn & Gronlund, 1995). According to Axman (2005), the most commonly used type of question is the multiple-choice question, which can measure a variety of learning outcomes, such as knowledge, understanding and application, ranging from simple to complex. Many standardized tests use multiple-choice items (Linn & Gronlund, 1995). Multiple-choice questions are more easily and objectively graded than essay questions, and are more difficult to answer correctly without the required knowledge than true-false questions.

D. Types of Tests w. r. t. Criteria: Tests may be classified into norm-referenced tests (NRT) and criterion-referenced tests (CRT).

1. Norm-referenced tests: In norm-referenced tests, students are placed on a normal curve and compete with each other (Kelly, 2005). Interpretation describes the performance in terms of the relative position held in some known group (Linn & Gronlund, 1995), and scores are compared with the average performance of others (Woolfolk, 1998).

2. Criterion-referenced tests: In criterion-referenced tests, questions are written according to specific predetermined criteria (Kelly, 2005). These tests are designed to provide a measure of performance that is interpretable in terms of a clearly defined domain of learning tasks, and scores are compared to a set performance standard (Woolfolk, 1998).

E. Types of Tests w. r. t. Purpose: Some other types of tests are speed tests, power tests, mastery tests, individual tests, group tests, readiness tests and oral tests.

In a speed test, students must answer the questions or perform the tasks in a limited amount of time. The purpose of a speed test is to measure primarily the rapidity of a student’s performance. In contrast to a pure speed test is a power test. The items in a power test are more difficult, and time limits should not exist. In reality, time limits frequently do exist; little is to be gained by giving a student unrestricted time for writing a test (Ahmann & Glock, 1981). In a mastery test, the level of item difficulty is quite low and the time limits allowed are generous. Its main purpose is to measure the knowledge and skills that every student of the class should have acquired. It is expected that all of the students (at least eighty percent) will perform perfectly on the test. Mastery tests are commonly used for units of programmed instruction, to find out whether a student should proceed to new material, repeat the material completed, or study remedial material (Morris, 1987; Vashist, 1993).

Individual tests are tests that can be administered to only one student at a time. The examiner practically always has the responsibility of making a written record of the student’s answers. A common illustration of an individual test is the Wechsler Intelligence Scale for Children (Gronlund, 1990). A commonly used group test can be administered to more than one student at a time; students are often tested as a class, i.e. as a group. Because of its simplicity and low cost, this type of test is widely used (Linn & Gronlund, 1995). Readiness tests are designed to determine the ability of a student to undertake a certain type of learning. They are also designed to discover whether a student is sufficiently advanced to profit from formal instruction in a subject area or a program. They deal with a highly specific kind of achievement, such as reading. These tests are administered to students before the formal instruction is given (Shah, 2002).

According to Rashid (1997), the tests normally used in research are achievement tests, aptitude tests, intelligence tests, interest inventories, and personality measures (Harrison, 1987; Koul, 1992). Similarly, psychological tests are used to determine and analyze individual differences in general intelligence, performance, behavior, attitude, aptitude, etc. of the students (Freeman, 1965). Achievement tests determine what an individual has learnt after some kind of treatment (Haladyna, 2004; Yasmin, 2005). Achievement tests focused on specific academic curricula are referred to as academic achievement tests; they seek to measure academic progress (Ahmann & Glock, 1981; Wiersma & Jurs, 1990; Kubiszyn & Borich, 2003; Yasmin, 2005). An achievement test has been described as a description of what a person has learnt (Thorndike & Hagen, quoted by Shahid et al., 2003), a systematic procedure for determining the learning acquired through instruction (Gronlund, quoted by Shahid et al., 2003), the measure of a student’s level of achievement (Gronlund & Linn, 1990) or of an examinee’s achievement (Popham, 2000), and the measure of knowledge (Wrightslaw, 2005; Factmonster Dictionary Online, 2004) or skills in a content area (Wiersma & Jurs, 1990). Shahid et al. (2003) quote Gronlund and Linn on the characteristics of a good achievement test: its items are selected on the basis of difficulty level (difficulty index) and discriminating power, and it includes a description of the measured behavior. These tests are typically interpreted in terms of the specific educational tasks each person has learned and has yet to learn in some clearly specified achievement domain (Gronlund, 1995; Kubiszyn & Borich, 2003).

Intelligence tests are designed to measure learning abilities (Linn & Gronlund, 1995) and attempt to measure the factor of native brightness, or the capacity for intelligent adaptation, as it varies from individual to individual (Rashid, 1999). These tests are used for the selection and classification of students in school, detection of superior and inferior intelligence, award of scholarships, determination of the optimum level of work, etc. Intelligence tests may be classified under three categories: individual tests, group tests and performance tests (Linn & Miller, 2008). Interest tests or interest inventories are used to choose courses, curricula and co-curricular activities. Personality tests or inventories are developed to assess the personality of an individual, but this assessment is not easy due to its complex nature (Shahid et al., 2003). Personality inventories are widely used in psychological research to measure personality because they can be constructed with many attributes of a good test: reliability, discriminatory power and well-standardized norms (Kline, 1986). Aptitude tests measure the effects of learning under relatively uncontrolled and unknown conditions. They are designed to predict students’ future performance and are used to measure the ability to learn new tasks. They can be used with pupils of widely varying educational backgrounds (Ahmann & Glock, 1981; Gronlund, 2003; Gay, 1996; Linn & Miller, 2008). Shahid et al. (2003) quote Bingham and Freeman to the effect that aptitude tests are used to predict success to some degree and to establish whether there is evidence of potential in these areas; they need to be taken in accordance with the instructions provided (Barrett, 2004).

F. Types of Tests w. r. t. Construction:

1. Teacher-made Tests

2. Standardized Tests

1. General Principles for Writing Questions



While the different types of questions (multiple choice, fill-in-the-blank or short answer, true-false, matching, and essay) are constructed differently, the following principles apply to constructing questions and tests in general (Axman, 2005; Blerkom, 2009; Magno & Ouano, 2009).

1. Make the instructions for each type of question simple and brief.
2. Use simple and clear language in the questions. If the language is
difficult, students who understand the material but who do not have
strong language skills may find it difficult to demonstrate their
knowledge. If the language is ambiguous, even a student with strong
language skills may answer incorrectly if his or her interpretation of the
question differs from the instructor’s intended meaning.
3. Write items that require specific understanding or ability developed in
that course, not just general intelligence or test-wiseness.
4. Do not suggest the answer to one question in the body of another
question. This makes the test less useful, as the test-wise student will
have an advantage over the student who has an equal grasp of the
material, but who has less skill at taking tests.
5. Do not write questions in the negative. If you must use negatives,
highlight them, as they may mislead students into answering
incorrectly.
6. Specify the units and precision of answers. For example, will you
accept numerical answers that are rounded to the nearest integer?

2. Suggestions for Construction of Multiple Choice Items

Shahid et al. (2003) discuss suggestions for the construction of multiple choice
items, as given below:

1. Every item should reflect specific content, and the content of an item should
be kept independent of the content of other items on the test (Haladyna, 2004).
2. Create the stem of the item by forming a question or an incomplete sentence
that implies a meaningful problem or question (Linn & Gronlund, 2005).
3. Write the correct answer to the question in the stem in as few words as possible
(Haladyna, 2004), and include common words in the main body of the question
(Axman, 2005). The stem of the item should be free from irrelevant material
(Kubiszyn & Borich, 2003; Linn & Gronlund, 2005; Blerkom, 2009).
4. Write distracters that are plausible to the pupils lacking the degree of knowledge
you want to assess. Distracters should be capable of distracting subjects
(Kline, 1986).
5. The stem should introduce what is expected of the examinees. The stem of the
item should be meaningful and should clearly formulate a problem.
6. Specific determiners should be avoided.
7. The vocabulary used in the test should be suitable to the examinees,
so that vocabulary level does not confound the results (Kline, 1986; Haladyna, 2004).
8. The stems and choices should be worded positively as far as possible (Haladyna,
2004; Magno & Ouano, 2009).
9. All choices should be plausible (Kubiszyn & Borich, 2003; Linn & Gronlund,
2005).
10. Each test item should have a defensibly correct best choice (Haladyna, 2004).
11. Opinions should not be evaluated through multiple choice test items.
12. The correct choice should not occupy the same position in all or most of the
items; the placement of correct responses should vary randomly (Axman, 2005),
as in the sketch after this list.
13. There should not be any overlap among the choices (Blerkom, 2009).
14. The “None of the above” choice should not be used (Kline, 1986; Axman, 2005;
Linn & Gronlund, 2005; Magno & Ouano, 2009).
15. The “All of the above” choice should be avoided, because all choices should not
be equally correct (Kline, 1986; Haladyna, 2004; Axman, 2005; Linn & Gronlund,
2005; Magno & Ouano, 2009).
16. All the choices should agree with the stem in grammar and language
(Linn & Gronlund, 2005) and should be grammatically consistent with the
stem (Kubiszyn & Borich, 2003; Haladyna, 2004).
17. All the common words should be taken out of the choices and included in the stem
as far as possible (Haladyna, 2004).
18. All the choices should be of equal strength; otherwise they would provide a clue
to the examinees about the correct answer. Similarly, the answer to one question
should not give clues to the answer to another (Kline, 1986; Linn & Gronlund,
2005). Make incorrect alternatives attractive to students who have not achieved
the targeted learning objectives (Axman, 2005).
19. Each item should pose only one problem (Haladyna, 2004; Linn & Gronlund,
2005).
20. If a negative word is used, it should be emphasized through underlining or bold
printing. Use a negatively stated item stem only when a significant learning
outcome requires it (Kubiszyn & Borich, 2003; Linn & Gronlund, 2005).
21. Control the difficulty of a question by making the alternatives more or less similar
or by making the main part of the question more or less specific. If the
alternatives are more similar, the student will have to make finer distinctions
among them. If the main part is more specific, the student will be required to
draw on more detailed knowledge (Axman, 2005).
22. Include from three to five options to optimize testing for knowledge (Axman,
2005).
23. Increase the difficulty of multiple-choice items gradually, from simple to complex
items (Axman, 2005).
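
As an illustration of suggestion 12, the following Python sketch (a hypothetical helper, not part of the cited sources) varies the placement of the correct response randomly while retaining the answer key:

    # Hypothetical sketch: randomize the position of the correct choice
    # (suggestion 12) while keeping track of the answer key.
    import random

    def shuffle_item(stem, correct, distracters, rng=random):
        """Return the shuffled option list and the index of the correct choice."""
        options = [correct] + list(distracters)
        rng.shuffle(options)            # correct answer lands in a random position
        key = options.index(correct)    # record where it ended up
        return stem, options, key

    stem, options, key = shuffle_item(
        "Which planet is closest to the Sun?",
        correct="Mercury",
        distracters=["Venus", "Mars", "Jupiter", "Saturn"],
    )
    print(options, "answer:", chr(ord("A") + key))

Applying such a shuffle independently to each item keeps correct responses from clustering in one position across the test.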

3.3.3.1 Item Difficulty

Test items with a difficulty index below 0.30 or above 0.70 were dropped. The difficulty level of items is discussed in detail as follows:

Traditional grading practices used by many teachers at nearly all levels of education seem geared to a belief that test scores of around 70% or better represent a "passable" level of achievement. The thinking that leads to the idea of a passing score in the neighborhood of 70% may have its roots in elementary education. However, this approach to testing frequently breaks down even at the elementary school level, and there are test construction methods available for dealing with these problems (Frary, 2008). It is recommended to avoid constructing a test that is answered correctly by 80 to 90% of examinees: qualified students can still answer a large percentage of the questions, while negligent students will score below average on a harder test. Harder tests will have a higher difficulty level, and the passing grade may be below 50% (Frary, 2008). According to Shahid et al. (2003), the average difficulty of multiple-choice items should be:

Item Type         Average Difficulty (percent correct)

5-choice items    60

4-choice items    63

3-choice items    67

On a good test, most items will be answered correctly by 30% to 80% of examinees
(Kehoe, 2008).

3.3.3.2 Guessing Effect

To reduce the guessing effect, students were requested in the instructions given for the science aptitude test to avoid the use of guessing techniques to answer the questions, because guessing would negatively affect the research results; in addition, five options were given in the multiple-choice items. According to Majeed (2005), three, four or five choices are used, but experts favor five choices to reduce the chance of guessing.
Guessing affects the scores on multiple-choice tests: score increases due to guessing affect the reliability of the score (Frary, 2008), distort the score to some extent, and of course lower the reliability and validity (Kline, 1986). As a result, many educators avoided the use of multiple-choice tests. However, multiple-choice tests came into major use in testing and were found to have other virtues which argued for their inclusion in the educational process, such as broader coverage of instructional topics, accuracy of scoring, and provision of statistical feedback at the item level (Frary, 2008). The guessing correction can be done by the formula:

X_corrected = X − W / (N − 1), where

X_corrected = score corrected for guessing
X = number of correct items
W = number of wrong items
N = number of options per item (Kline, 1986; Linn & Gronlund, 1995)

This formula, called the conventional correction formula, subtracts a fraction of the wrong answers from the number-right score (Frary, 2008). It assumes that all wrong answers are due to guessing; the correction applies to the average examinee and will be wrong in individual cases. Guessing is not a major problem with tests, and guessing corrections are useful only in tests of true-false items, which are not recommended (Kline, 1986). The guessing effect is negligible in the case of five-option multiple-choice items (Kline, 1986). According to Majeed (2005), five choices reduce the chance of guessing, as shown below:

Item Type         Chances of Correct Guess    Correct Score by Guessing on a 100-Item Test

5-choice items    1 in 5                      20

4-choice items    1 in 4                      25

3-choice items    1 in 3                      33

On balance, then, it is difficult to recommend any scoring procedure to control guessing for typical college multiple-choice testing. In the absence of such a procedure, the only fair practice is to advise all students that it is to their advantage to answer every question regardless of knowledge (Frary, 2008). So, the test in the present study had five options, and students were requested to avoid guessing.
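
As a worked illustration, the following Python sketch (hypothetical scores, not data from the study) applies the conventional correction formula given above and reproduces the expected guess scores in the table, i.e. 100 × 1/N for an N-choice, 100-item test:

    # Illustrative sketch: correction for guessing, X_corrected = X - W/(N - 1),
    # applied to hypothetical scores.

    def corrected_score(correct, wrong, n_options):
        """Subtract a fraction of the wrong answers from the number-right score."""
        return correct - wrong / (n_options - 1)

    # Hypothetical examinee: 100 five-option items, 60 right, 40 wrong, none omitted.
    print(corrected_score(correct=60, wrong=40, n_options=5))  # 60 - 40/4 = 50.0

    # Expected score from pure guessing on a 100-item test (cf. the table above):
    for n in (5, 4, 3):
        print(f"{n}-choice items: about {100 / n:.0f} correct by chance")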
