
Faculty of Law and Business Studies Dr Lazar Vrkatić, English Language

CRITERIA AND TYPES OF TESTS (Term paper)
Assessment in English Language Teaching

Professor: Dr Biljana Milatović

Student: Slavica Miranović, 202/12

Novi Sad, December 2012

Introduction

This paper deals with criteria and types of tests. Tests are teaching devices that serve several purposes: to reinforce learning, to motivate the student, or to assess the student's performance in the language. It is therefore necessary to establish criteria for constructing and evaluating tests, and to distinguish between types of tests.

Criteria

criterion /kraɪˈtɪə.ri.ən/ (noun [C], plural criteria): A standard by which you judge, decide about or deal with something¹

Tests should have certain features, such as validity and reliability. They should also discriminate between candidates, be practical to administer, and take backwash effects into account.

Validity

valid (adjective): Based on truth or reason; able to be accepted²

In language testing, validating a test means being able to establish a reasonable link between a test-taker's performance and his or her actual language ability. So, the question in validating a test is: Does the test measure what it is intended to measure? (Heaton 1990: 159). Validity, then, can be seen as a concept that allows us to endow test scores with meaning. This unitary notion of validity has traditionally been subdivided according to the kind of evidence on which the interpretations are based. Usually, one will come across the terms construct validity, content validity, empirical validity, and face validity. It should, however, be understood that these types are in reality different methods of assessing validity and that it is best to validate a test in as many ways as possible (Alderson, Clapham, Wall 2005: 171).

¹ Cambridge Advanced Learner's Dictionary, Cambridge University Press 2003, Version 1.0
² Cambridge Advanced Learner's Dictionary, Cambridge University Press 2003, Version 1.0

Face validity

Face validity is concerned not so much with whether the interpretations of the test results are valid as with whether they appear valid. What we are dealing with in face validity is not the actual validity but the face value that test-takers and test users attribute to the test. A test's face validity is therefore the degree to which test-takers and users believe the interpretation of the test results to be accurate. Face validity thus has much more to do with acceptance than with validity (Alderson, Clapham, Wall 2005: 173).

Content validity

When dealing with content validity, we are concerned with the systematic investigation of the degree to which the items on a test, and the resulting scores, are representative and relevant samples of whatever content or abilities the test has been designed to measure (Brown & Hudson 2002: 213). This kind of validity depends on a careful analysis of the language being tested and of the particular course objectives. The test should be so constructed as to contain a representative sample of the course, the relationship between the test items and the course objectives always being apparent (Heaton 1990: 160).

Construct validity

The construct is defined as the abstracted set of abilities we want to infer from the test results. So, before asking whether the test measures what it is intended to measure, one has to be clear about what it is intended to measure, that is, about what the test construct is. Only then can we ask what the test actually measures and compare it to the predefined construct. Even when the construct appears somewhat questionable, it is important to bear in mind that the theory itself is not called into question: it is taken for granted. The issue is whether the test is a successful operationalisation of the theory (Alderson, Clapham, Wall 2005: 183).

Empirical validity

A fourth type of validity is usually referred to as statistical or empirical validity. This validity is obtained by comparing the results of the test with the results of some criterion measure, such as:
- an existing test, known or believed to be valid and given at the same time; or
- the teacher's ratings or any other form of independent assessment given at the same time; or
- the subsequent performance of the testees on a certain task measured by some valid test; or
- the teacher's ratings or any other such form of independent assessment given later (Heaton 1990: 161).
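Such a comparison is commonly summarised as a correlation coefficient between the two sets of scores. The sources cited here do not name a particular statistic, so the Pearson coefficient in the following sketch is an assumption, as are the function name pearson_r and the invented scores for six candidates.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of the deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical data: scores on the new test and the teacher's ratings
# (the criterion measure) for the same six candidates.
new_test = [12, 15, 9, 18, 14, 7]
teacher_rating = [3, 4, 2, 5, 4, 2]

validity_coefficient = pearson_r(new_test, teacher_rating)
print(round(validity_coefficient, 2))
```

A coefficient close to 1 would suggest that the test ranks candidates much as the criterion measure does. How high a value is acceptable depends on the purpose of the test and is not settled by the calculation itself.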

Reliability

reliable (adjective): Something or someone that is reliable can be trusted or believed because they work or behave well in the way you expect³

Reliability is a necessary characteristic of any good test: for it to be valid at all, a test must first be reliable as a measuring instrument. If the test is administered to the same candidates on different occasions (with no language practice work taking place between these occasions), then, to the extent that it produces differing results, it is not reliable. Reliability measured in this way is commonly referred to as test/re-test reliability to distinguish it from mark/re-mark reliability and the other kinds of reliability (Heaton 1990: 162).

Apart from test-retest reliability, there are two other ways to estimate reliability: parallel-form reliability and internal consistency reliability. Parallel-form reliability is concerned with the correlation between one test version and another, parallel one. While this model solves the problem of having to present the same test to the same candidates twice, it creates another: having to come up with a parallel test version of equal difficulty and standard deviation (Bachman 1990: 183). In internal consistency reliability estimates, a single test administration is enough to provide information about the reliability of the entire test, as the test is split into two halves which are then treated as parallel test versions. Obviously, though, we have to make sure that the two halves are equivalent in terms of difficulty, mean and standard deviation, as well as independent of each other, that is, that an individual's performance on one half does not affect how he or she performs on the other (Bachman 1990: 175).

³ Cambridge Advanced Learner's Dictionary, Cambridge University Press 2003, Version 1.0
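The split-half estimate can be illustrated with a short sketch. Several choices here are assumptions rather than anything prescribed by the sources cited: the test is split into odd-numbered and even-numbered items, the two half scores are correlated with the Pearson coefficient, and the result is adjusted with the Spearman-Brown formula, a commonly used correction for the fact that each half is only half as long as the full test. The function names and the response data are invented for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)


def split_half_reliability(item_matrix):
    """Estimate reliability from one administration.

    item_matrix holds one row per candidate and one 0/1 entry per item
    (1 = correct). Assumes an odd/even split gives two comparable halves.
    """
    odd_half = [sum(row[0::2]) for row in item_matrix]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_matrix]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_half, even_half)
    if r_half <= -1:
        raise ValueError("Spearman-Brown correction undefined for r = -1")
    # Spearman-Brown correction: step the half-test correlation up to
    # an estimate for the full-length test.
    return 2 * r_half / (1 + r_half)


# Hypothetical responses of six candidates to an eight-item test.
responses = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]

reliability = split_half_reliability(responses)
print(round(reliability, 2))
```

The sketch does not check whether the two halves are in fact equivalent in difficulty, mean and standard deviation, or independent of each other. Those conditions still have to be verified separately before the estimate can be trusted.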

Discrimination

discriminate /dɪˈskrɪm.ɪ.neɪt/ (verb): To be able to see the difference between two things or people⁴

Sometimes an important feature of a test is its capacity to discriminate among the different candidates and to reflect the differences in the performances of the individuals in the group. The extent of the need to discriminate will vary depending on the purpose of the test (Heaton 1990: 165). One way to achieve discrimination is to include items of different difficulty levels: easy items, items of average difficulty, and difficult items.
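Item difficulty and discrimination can be checked once a test has been administered. The sketch below uses two measures that are common in item analysis but are not described in the text above, so they should be read as assumptions: the facility value (the proportion of candidates answering an item correctly) and a discrimination index (the difference between the number of correct answers in the top-scoring and bottom-scoring groups, divided by the group size). The function name, the group size and the response data are invented for illustration.

```python
def item_statistics(item_matrix, group_size):
    """Return a (facility, discrimination) pair for every item.

    item_matrix holds one row per candidate and one 0/1 entry per item.
    group_size is the number of candidates in the upper and lower groups.
    """
    n_candidates = len(item_matrix)
    n_items = len(item_matrix[0])
    # Rank candidates by total score, highest first.
    ranked = sorted(item_matrix, key=sum, reverse=True)
    upper = ranked[:group_size]
    lower = ranked[-group_size:]
    stats = []
    for i in range(n_items):
        facility = sum(row[i] for row in item_matrix) / n_candidates
        correct_upper = sum(row[i] for row in upper)
        correct_lower = sum(row[i] for row in lower)
        discrimination = (correct_upper - correct_lower) / group_size
        stats.append((facility, discrimination))
    return stats


# Hypothetical responses of six candidates to a four-item test.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]

stats = item_statistics(responses, group_size=2)
for number, (facility, discrimination) in enumerate(stats, start=1):
    print(f"item {number}: facility={facility:.2f}, discrimination={discrimination:.2f}")
```

In this invented data set the first item is answered correctly by everyone, so it is very easy and does not separate stronger from weaker candidates, whereas the second and third items are of average difficulty and separate the two groups completely.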

Administration

administration /ədˌmɪn.ɪˈstreɪ.ʃən/ (noun): The arrangements and tasks needed to control the operation of a plan or organization⁵

A test must be practicable: in other words, it must be fairly straightforward to administer. There are several considerations, one of which concerns the answer sheets and the stationery used. The use of separate answer sheets greatly facilitates marking and is strongly recommended when large numbers of students are being tested (Heaton 1990: 168).

Backwash effects

backwash /ˈbæk.wɒʃ/ (noun): An indirect effect⁶

When talking about backwash, we are dealing with the way in which tests affect the preceding teaching and learning process. On the one hand, backwash can be seen as a negative factor in that it may add to the predictability of a test's outcome and may lead to a restriction of the syllabus to only those criteria which are absolutely necessary to pass the test. On the other hand, backwash can have positive aspects as well. These aspects become particularly apparent in effect-driven test development: if we know the effects of particular tests or test methods, they can be employed as a valuable tool to create the desired influence, e.g. in a school setting. However, what we actually know about specific test backwash is surprisingly little (Alderson, Clapham, Wall 2005: 46). Studies are sometimes contradictory, and a thorough investigation is still needed that takes into account the many extrinsic and intrinsic motivational factors in test preparation, on the part of both students and teachers.

⁴ Cambridge Advanced Learner's Dictionary, Cambridge University Press 2003, Version 1.0
⁵ Cambridge Advanced Learner's Dictionary, Cambridge University Press 2003, Version 1.0
⁶ Cambridge Advanced Learner's Dictionary, Cambridge University Press 2003, Version 1.0

Types of tests

In general, we can distinguish four kinds of tests: proficiency tests, achievement tests, aptitude tests, and diagnostic tests.

Achievement tests

An achievement test is a test of developed skill or knowledge. The most common type of achievement test is a standardized test developed to measure skills and knowledge learned in a given grade level, usually through planned instruction, such as training or classroom instruction.

Proficiency tests

Proficiency tests assess the extent to which the testee has reached proficiency, i.e. a certain predefined level. The proficiency test is thus concerned with measuring not general attainment but specific skills in the light of the language demands made later on the student by a future course of study or job.

Aptitude tests

A language aptitude test is designed to measure the student's probable performance in a foreign language which he or she has not started to learn, i.e. it assesses aptitude for learning a language.

Diagnostic tests

A diagnostic test is a test that helps the teacher and learners identify problems that they have with the language. Achievement and proficiency tests are frequently used for diagnostic purposes.

Conclusion

We use tests to obtain information. The information that we hope to obtain will vary from situation to situation. If we want to conduct a successful language assessment, we have to choose an appropriate test. We also have to ensure that the test we choose is valid and reliable. A good choice of test should have a positive effect on learning and teaching.

1. Alderson, Charles, Clapham, Caroline & Wall, Diane. (2005) Language Test Construction and Evaluation. Cambridge: CUP.
2. Bachman, Lyle. (1990) Fundamental Considerations in Language Testing. Oxford:
3. Brown, James Dean & Hudson, Tom. (2002) Criterion Referenced Language Testing. Cambridge: CUP.
4. Heaton, J. B. (1988) Writing English Language Tests. Harlow: Pearson Longman.
5. Hughes, A. (1989) Testing for Language Teachers. Cambridge: CUP.
6. Cambridge Advanced Learner's Dictionary, Cambridge University Press 2003, Version 1.0