Byram, M. (ed.) (1993) Germany. Its representation in textbooks for teaching German in Great Britain, Frankfurt: Diesterweg.
Doyé, P. (ed.) (1991) Großbritannien. Seine Darstellung in deutschen Schulbüchern für den Englischunterricht (Great Britain. Its representation in textbooks for teaching English in Germany), Frankfurt: Diesterweg.
Kramer, J. (1983) English cultural and social studies, Stuttgart: Metzler.
Kramer, J. (1990) Cultural and intercultural studies, Frankfurt: Peter Lang.
Kramsch, C. (1993) Context and culture in language teaching, Oxford: Oxford University Press.

JÜRGEN KRAMER
Assessment and testing
The term `assessment' is generally used to cover all methods of testing and assessment, although some teachers and testers apply the term `testing' to formal or standardised tests such as the Test of English as a Foreign Language (TOEFL), and `assessment' to more informal methods. In this entry, however, the terms `assessment' and `test' are used interchangeably.

Recent history

In Britain, assessment of foreign languages was mostly conducted by means of traditional examinations until well into the twentieth century (Spolsky, 1995). In the USA, however, influences from the field of PSYCHOLOGY, together with concerns about the fairness of subjective EVALUATIONS, led to the wide use from the 1920s onwards of objectively marked tests. Such tests were ideally suited to the structural language SYLLABUSES of the 1950s and 1960s, with their emphasis on the teaching of separate elements of language, and discrete point multiple choice questions became common in many parts of the world (Lado, 1961). Objective tests had many advantages: apart from being easy to mark, the internal RELIABILITY of the tests could be calculated, and item analysis could tell test constructors not only how difficult individual items had been for their examinees, but also how well these items discriminated between the strong and the weak students. (See Alderson, Clapham and Wall, 1995, for information about item analysis and reliability indices.)

In the 1970s, however, concerns that the answers to these discrete point items provided no evidence of students' more global linguistic SKILLS led to Oller's unitary competence hypothesis, and to the wide use of integrative tests such as CLOZE and DICTATION to assess general linguistic proficiency (see Oller, 1979). Although Oller later concluded that language proficiency consisted of more than one underlying factor (Oller, 1983), and although cloze tests were later shown to be less valid and reliable than had originally been thought (Cohen, 1998), cloze tests have remained a popular method of testing around the world.

In recent years, the move towards the COMMUNICATIVE approach to teaching has encouraged testers to make their test items more integrated (less discrete), and the tasks more AUTHENTIC in both content and purpose. Interest has swung from reliability to validity, and more researchers are turning their attention once again to direct tests of SPEAKING and WRITING (see McNamara, 1996). In recent years, too, differing test philosophies have moved closer together: American test constructors are more concerned with test content than they were, while British examination boards use statistical procedures to analyse the validity and reliability of their tests.

Theories of language testing

Test content is linked to theories of language learning and testing, and at present such theories relate to communicative principles. Canale and Swain (1980) included sociolinguistic and STRATEGIC COMPETENCE in their description of the domains of language knowledge, and Bachman (1990) added psychophysiological mechanisms. Bachman and Palmer (1996) elaborated on this model further to include both affective and metacognitive factors.
This model of communicative language ability is used as the theoretical basis for tests such as the International English Language Testing System (IELTS) test, and also provides the theoretical basis for many current research projects. (See McNamara, 1996, for a discussion of recent language testing models.)
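The classical item analysis mentioned under Recent history, i.e. item facility (how difficult an item proved for a group of examinees) and item discrimination (how well it separated strong from weak students), can be sketched in a few lines. This sketch is illustrative only and not part of the original entry: the function names and the top-third/bottom-third comparison are assumptions, and operational analyses would also compute reliability indices.

```python
# Classical item analysis on a 0/1 response matrix:
# one row per examinee, one column per item.

def item_facility(responses, item):
    """Proportion of examinees answering the item correctly."""
    scores = [row[item] for row in responses]
    return sum(scores) / len(scores)

def item_discrimination(responses, item):
    """Difference in success rate on one item between the top- and
    bottom-scoring thirds of examinees (ranked by total score).
    High values mean the item separates strong from weak students."""
    ranked = sorted(responses, key=sum, reverse=True)
    n = max(1, len(ranked) // 3)
    top, bottom = ranked[:n], ranked[-n:]
    p_top = sum(row[item] for row in top) / n
    p_bottom = sum(row[item] for row in bottom) / n
    return p_top - p_bottom

# Five examinees, three items (1 = correct, 0 = wrong)
data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
print(item_facility(data, 0))        # 0.6: a middling item for this group
print(item_discrimination(data, 0))  # 1.0: strong students got it, weak did not
```

Note that both indices depend entirely on the particular group of examinees tested, which is the limitation Item Response Theory later addressed.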
Test purpose

The overall purpose of a test inevitably affects its contents. Tests where much is at stake for the examinee are generally based on a set of specifications (see Alderson, Clapham and Wall, 1995) which set out the main features of the test: its content and the theory of language teaching on which it is based. The specifications vary according to whether they are designed to be read by students, teachers, item writers or administrators, but in all cases these specifications state the test's overall purpose (whether it is to assess the students' linguistic APTITUDE, progress, achievement or proficiency, or whether it is to be used for placement or diagnostic purposes), and describe the test's aims, as well as describing its potential candidature. The specifications also list other reasons for taking the test, such as the demonstration of an ability to communicate in a foreign language (for example, the International Baccalaureate language examinations) or to speak a language for a specific purpose (for example, the Finnish Foreign Language Diploma for Professional Purposes (FFLDPP)). Such Language for Specific Purposes (LSP) tests contain language and tasks similar to those the students will encounter in their future career (see Douglas, 2000).

Test types

Test types, too, are affected by the test's purpose, and any detailed set of test specifications will describe the methods of assessment to be used (Alderson, Clapham and Wall, 1995). Since it is now accepted that students differ in the types of task in which they excel (Wood, 1991), test batteries generally include a range of test types. In addition, test constructors attempt to prevent their tests being biased against students according to factors such as GENDER, first language or background knowledge, so that a test is not biased according to test method effect. One type of test which is widely used at present is the C-TEST, which is easy to construct and is supposed to assess a wide range of skills; however, it may have many of the same weaknesses as the cloze test (see Jafarpur, 1995). Discussions of different test types are given in Buck, 1997, Fulcher, 1997, Hamp-Lyons, 1990, and Brindley, 1998a and 1998b. (For useful descriptions of different test methods, see Heaton, 1988, and Weir, 1993.)

Rating scales

With the increasing use of subjectively marked writing and speaking tests, rating scales have been devised to help raters assess students' performances. Such scales may be `holistic', where the assessor judges the student's performance as a whole, or `analytic', where the performance is marked according to a range of separate criteria such as content, organisation, GRAMMAR and VOCABULARY (Weir, 1993). Examples of these are used in the Oral Proficiency Interview (Lowe and Stansfield, 1988), and in the speaking and writing components of the English as a Foreign Language examinations of the University of Cambridge Local Examinations Syndicate (UCLES). The validity of such marking scales may be questionable: few attempts have so far been made to design analytic scales using samples of actual performance (see, for example, Fulcher, 1997). How such scales work needs to be investigated because, in spite of training, raters do not always mark consistently and sometimes give marks that are not in line with those of other markers (Brindley, 1998b). It is possible, however, to investigate the reliability of the marking using generalisability studies (Bachman, 1997), and the accessibility of computer programs such as FACETS (see Cushing Weigle, 1998) has made it possible to assess how such scales work in practice.

Methods of test validation

Other advances in statistical analysis have enabled test researchers to use complex methods such as multiple regression, analysis of variance, factor analysis and structural equation modelling to assess the construct validity of their tests. In addition, the increasing sophistication and ease of use of computer programs such as NUD*IST, the Ethnograph and ATLAS have made it more possible to analyse large amounts of qualitative data, and many researchers now use qualitative methods such as in-depth interviews and verbal introspections and retrospections to investigate the validity of a test or a test method (Banerjee and Luoma, 1997). Not all of Messick's (1989) theories about validity are universally accepted, but his views have had a profound effect on language testing. His 1989 article is long and complex, but many authors have explained his views more simply (see, for example, Moss, 1994, and Shepard, 1993). (For a more traditional view of test validity, see Alderson, Clapham and Wall, 1995.)

Alternative assessment

`Alternative assessment' refers to informal assessment procedures, such as writing portfolios, learner diaries or interviews with teachers, which are often used within the classroom. Such assessment procedures may be more time-consuming and difficult for the teacher to administer than `paper-and-pencil' tests, but they have many advantages: they produce information that is easy for administrators, teachers and students to understand, and they can reflect the more holistic teaching methods used in the classroom. It is difficult to draw a line between `testing' and `alternative assessment', but it is perhaps fair to say that while `tests' are often `norm referenced', i.e. with the student's score being compared to that of other students, `alternative assessment' is generally `criterion referenced', with the student's performance being compared not to that of other students but to a set of performance OBJECTIVES or criteria. Similarly, it is often the case that teachers use `tests' for `summative assessment' at the end of a course or the school year, and `alternative assessment' for `formative assessment' carried out during the learning process, with the intention of using the information to decide what needs to be taught or reviewed in the next stages of a course. One problem with methods of alternative assessment, however, lies with the reliability of such assessments: their marking schemes may not have been validated, and raters have often not been trained to give consistent marks. As Hamayan (1995) says, such alternative methods of assessment will not be considered to be part of the mainstream of language assessment until they can be shown to be both valid and reliable.

Technological advances

So far, the expected impact of personal computers on language assessment has not materialised. Computer testing has tended to fossilise existing objective testing methods, because multiple choice items and gap-filling tasks are straightforward to answer on the computer, and are easy to mark mechanically. However, the increasing ability of the computer to recognise sounds and letters, the comparative ease with which videos and listening extracts can now be downloaded from the Internet, and advances in the uses of language corpora for teaching and testing are all steadily increasing the scope of computer-administered tests. One project which has the potential to produce interesting tests which are easy to deliver and mark is DIALANG, a project supported by Lingua in Europe, which aims to produce diagnostic tests in fourteen different European languages (DIALANG, 1997). Students will be tested on their grammatical knowledge and on their READING, WRITING, LISTENING and SPEAKING skills, and the tests will be computer adaptive, i.e. they will adapt to each student's level of linguistic proficiency. After taking their chosen test, students will receive instant diagnostic information about the strengths and weaknesses of their performance. The fact that DIALANG will be able to adjust to the student's level is possible because of advances in test analysis. Unlike classical item analysis, which can only report the difficulty of an item for a particular group of test takers, Item Response Theory (see Bachman and Eignor, 1997) also takes account of the ability of the students, so that it is theoretically possible to report the difficulty of any test item regardless of the students on whom the item has been trialled. Items can therefore be banked according to their level of difficulty, and can be used as required in computer adaptive tests.
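The property of Item Response Theory that makes item banking and computer-adaptive testing possible, namely that examinee ability and item difficulty sit on one common scale, can be illustrated with a minimal one-parameter (Rasch) sketch. This is a sketch under stated assumptions, not part of the original entry: the difficulty values, names and the nearest-difficulty selection rule are invented for illustration, and real calibration estimates difficulties from response data with specialist software.

```python
import math

def p_correct(theta: float, b: float) -> float:
    """Rasch model: probability that an examinee of ability theta answers
    an item of calibrated difficulty b correctly. Both parameters share
    one logit scale, so b is meaningful independently of any particular
    group of test takers."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# An item bank tagged with (illustrative) difficulty values.
bank = {"easy": -1.5, "medium": 0.0, "hard": 1.5}

def next_item(theta: float) -> str:
    """Crude adaptive step: pick the banked item whose difficulty is
    closest to the current ability estimate, so p_correct is near 0.5
    and the item is maximally informative."""
    return min(bank, key=lambda name: abs(bank[name] - theta))

print(p_correct(0.0, 0.0))  # 0.5: ability exactly matches difficulty
print(next_item(-1.2))      # easy: the closest match for a weak examinee
```

A test driven this way homes in on each student's level instead of presenting every item to every candidate, which is the behaviour the entry describes for DIALANG.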
Impact and washback

In the last ten years there has been an upsurge of interest in the impact of tests on education, and in the effect of tests on teaching (see Alderson and Wall, 1993). In their 1993 article, Alderson and Wall bemoan the lack of research into whether tests do actually affect teaching and, if they do, what form such `washback' might take. Since then there have been many empirical studies into washback (Wall, 1996, 1997).

Ethics and accountability

There is also increasing concern with issues relating to ethics and accountability in assessment (see Hamp-Lyons, 1997, Norton, 1997, and Shohamy, 1997). This concern relates partly to questions of fairness and equity, and partly to the uses that might be made of test results. Many testing organisations adhere to the AERA standards (American Educational Research Association, 1999), and ILTA (the International Language Testing Association) has prepared its own Code of Ethics for language testers (ILTA, 2000) and is preparing its own Code of Practice. Other testing organisations, such as the Association of Language Testers in Europe (ALTE) and Educational Testing Service (ETS) in Princeton, New Jersey, have their own codes of practice. (For more about this, see Davidson, Turner and Huhta, 1997.)

Current trends

It seems likely that the competing requirements of test validity and financial practicality will maintain the distinction between tests which can be administered reliably to large numbers of students, and more holistic tests which can potentially reveal all aspects of the candidates' language proficiency. While testers are likely to experiment with complex and time-consuming methods of testing language, the expense of such methods will prevent many large testing organisations from adopting them.

It is impossible to cover all aspects of language assessment in this entry, but the Dictionary of language testing (Davies et al., 1999) and the Multilingual glossary of language testing terms (ALTE, 1998) have concise explanations of most of the concepts related to the field. Current research in different areas of language assessment is discussed in Clapham and Corson (1997), Bachman (2000) and Brindley (in press), and current concerns about language testing are described by Douglas (1995). In addition, the International Language Testing Association has produced twelve five-minute videos on the most frequently discussed aspects of language testing (ILTA, 1999). These videos introduce the novice language tester to test specifications, item-writing, pre-testing, statistics, reliability, validity, test impact and ethics, testing for specific purposes, and the assessment of the skills of reading, writing, listening and speaking.

See also: Aptitude tests; Cloze test; C-test; Diagnostic tests; Direct/indirect testing; Discrete point tests; Evaluation; Integrated tests; Integrative tests; Placement tests; Proficiency tests; Progress tests; Reliability; Validity

References
Alderson, J.C. (2000) Assessing reading, Cambridge: Cambridge University Press.
Alderson, J.C., Clapham, C. and Wall, D. (1995) Language test construction and evaluation, Cambridge: Cambridge University Press.
Alderson, J.C. and Wall, D. (1993) 'Does washback exist?', Applied Linguistics 14, 2: 115–29.
ALTE (1998) Multilingual glossary of language testing terms, Cambridge: Cambridge University Press.
American Educational Research Association (1999) Standards for educational and psychological testing, Washington, DC: AERA Publications.
Bachman, L.F. (1990) Fundamental considerations in language testing, Oxford: Oxford University Press.
Bachman, L.F. (1997) 'Generalizability theory', in C. Clapham and D. Corson (eds), Language testing and assessment, Encyclopedia of Language and Education, Vol. 7, Dordrecht: Kluwer.
Bachman, L.F. (2000) 'State of the art article on language testing', Language Testing 17, 1: 1–42.
Bachman, L.F. and Eignor, D.R. (1997) 'Recent advances in quantitative test analysis', in C. Clapham and D. Corson (eds), Language testing and assessment, Encyclopedia of Language and Education, Vol. 7, Dordrecht: Kluwer.
Bachman, L.F. and Palmer, A.S. (1996) Language testing in practice, Oxford: Oxford University Press.
Banerjee, J. and Luoma, S. (1997) 'Qualitative approaches to test validation', in C. Clapham and D. Corson (eds), Language testing and assessment, Encyclopedia of Language and Education, Vol. 7, Dordrecht: Kluwer.
Brindley, G. (1998a) 'Assessing listening abilities', Annual Review of Applied Linguistics 18: 171–91.
Brindley, G. (1998b) 'Outcomes-based assessment and reporting in language learning programmes: a review of the issues', Language Testing 15, 1: 45–85.
Brindley, G. (in press) 'Assessment', in R. Carter and D. Nunan (eds), The Cambridge TESOL guide, Cambridge: Cambridge University Press.
Buck, G. (1997) 'The testing of listening in a second language', in C. Clapham and D. Corson (eds), Language testing and assessment, Encyclopedia of Language and Education, Vol. 7, Dordrecht: Kluwer.
Canale, M. and Swain, M. (1980) 'Theoretical bases of communicative approaches to language teaching and testing', Applied Linguistics 1, 1: 1–47.
Clapham, C. and Corson, D. (eds) (1997) Language testing and assessment, Encyclopedia of Language and Education, Vol. 7, Dordrecht: Kluwer.
Cohen, A. (1998) 'Strategies and processes in test taking and SLA', in L.F. Bachman and A. Cohen (eds), Interfaces between second language acquisition and language testing research, Cambridge: Cambridge University Press.
Cushing Weigle, S. (1998) 'Using FACETS to model rater training effects', Language Testing 15, 2: 263–87.
Davidson, F., Turner, C.E. and Huhta, A. (1997) 'Language testing standards', in C. Clapham and D. Corson (eds), Language testing and assessment, Encyclopedia of Language and Education, Vol. 7, Dordrecht: Kluwer.
Davies, A., Brown, A., Elder, C. and Hill, K. (1999) Dictionary of language testing, Cambridge: Cambridge University Press.
DIALANG (1997) 'DIALANG: a new European system for diagnostic language assessment', Language Testing Update 21: 38–9 (website: http://www.jyu.fi/DIALANG).
Douglas, D. (1995) 'Developments in language testing', Annual Review of Applied Linguistics 15: 167–87.
Douglas, D. (1997) 'Language for specific purposes testing', in C. Clapham and D. Corson (eds), Language testing and assessment, Encyclopedia of Language and Education, Vol. 7, Dordrecht: Kluwer.
Douglas, D. (2000) Assessing language for specific purposes, Cambridge: Cambridge University Press.
Fulcher, G. (1997) 'The testing of L2 speaking', in C. Clapham and D. Corson (eds), Language testing and assessment, Encyclopedia of Language and Education, Vol. 7, Dordrecht: Kluwer.
Hamayan, E. (1995) 'Approaches to alternative assessment', Annual Review of Applied Linguistics 15: 212–26.
Hamp-Lyons, L. (1990) 'Second language writing: assessment issues', in B. Kroll (ed.), Second language writing: research insights for the classroom, Cambridge: Cambridge University Press.
Hamp-Lyons, L. (1997) 'Ethics in language testing', in C. Clapham and D. Corson (eds), Language testing and assessment, Encyclopedia of Language and Education, Vol. 7, Dordrecht: Kluwer.
Heaton, J.B. (1988) Writing English language tests, London: Longman.
ILTA (1999) Frequently asked questions about language testing, http://www.surrey.ac.uk/ELI/ilta/faqs/main.html
ILTA (2000) 'Code of ethics for ILTA', Language Testing Update 27: 14–24.
Jafarpur, A. (1995) 'Is C-testing superior to cloze?', Language Testing 12, 2: 194–216.
Lado, R. (1961) Language testing, London: Longman.
Lowe, P. and Stansfield, C.W. (eds) (1988) Second language proficiency assessment, Englewood Cliffs, NJ: Prentice Hall Regents.
McNamara, T. (1996) Measuring second language performance, London: Longman.
Messick, S. (1989) 'Validity', in R. Linn (ed.), Educational measurement, New York: American Council of Education/Macmillan.
Moss, P. (1994) 'Can there be validity without reliability?', Educational Researcher, March: 6–12.
Norton, B. (1997) 'Accountability in language assessment', in C. Clapham and D. Corson (eds), Language testing and assessment, Encyclopedia of Language and Education, Vol. 7, Dordrecht: Kluwer.
Oller, J.W. (1979) Language tests at school, London: Longman.
Oller, J.W. (1983) 'An emerging consensus', in J.W. Oller (ed.), Issues in language testing research, Rowley, MA: Newbury House.
Shepard, L. (1993) 'Evaluating test validity', Review of Research in Education 19: 405–50.
Shohamy, E. (1997) 'Second language assessment', in G.R. Tucker and D. Corson (eds), Second language education, Encyclopedia of Language and Education, Vol. 4, Dordrecht: Kluwer.
Spolsky, B. (1995) Measured words, Oxford: Oxford University Press.
Wall, D. (1996) 'Introducing new tests into traditional systems: insights from general education and from innovation theory', Language Testing 13, 3: 334–54.
Wall, D. (1997) 'Impact and washback in language testing', in C. Clapham and D. Corson (eds), Language testing and assessment, Encyclopedia of Language and Education, Vol. 7, Dordrecht: Kluwer.
Weir, C.J. (1993) Understanding and developing language tests, Hemel Hempstead: Prentice Hall.
Wood, R. (1991) Assessment and testing, Cambridge: Cambridge University Press.

Further reading
Alderson, J.C., Clapham, C. and Wall, D. (1995) Language test construction and evaluation, Cambridge: Cambridge University Press.
Alderson, J.C. and Wall, D. (1993) 'Does washback exist?', Applied Linguistics 14, 2: 115–29.
Bachman, L.F. (1990) Fundamental considerations in language testing, Oxford: Oxford University Press.
Bachman, L.F. and Palmer, A.S. (1996) Language testing in practice, Oxford: Oxford University Press.
Clapham, C. and Corson, D. (eds) (1997) Language testing and assessment, Encyclopedia of Language and Education, Vol. 7, Dordrecht: Kluwer.
McNamara, T. (1996) Measuring second language performance, London: Longman.

CAROLINE CLAPHAM

Attitudes and language learning

Language does not consist only of forms, patterns and rules but is simultaneously bound up with the social, subjective and objective world, since it also carries the attitudes, habits and cultural characteristics of its speakers. The child already internalises in first language ACQUISITION the values of its environment, and identifies with those people who appear to it to be authorities. If it is confronted with an unknown sign system, this can undermine its ways of perceiving hitherto. As Wilhelm von HUMBOLDT says, this may be `one of the best mental exercises' because `on account of this, thought becomes more independent of one particular kind of expression, its true inner content appears more clearly, depth and clarity, strength and lightness meet each other in a more harmonious way' (Humboldt, 1907: 193).

The Humboldtian way of thinking has left its traces in the context of the justification and aims of foreign language teaching, as has that of the SAPIR-WHORF HYPOTHESIS, according to which belonging to a language community determines the mode of human perception (Sapir, 1970: 68). It is for this reason that foreign language teaching sees its task, for educational, practical and political reasons, as that of leading pupils from primary age onwards out of its tried and tested conventions, of making them conscious of the limits of their own ways of seeing as determined by their MOTHER TONGUE, with the help of a new language and its contents. It aims thereby not only to teach the cultural context of the other language but also to create a certain distance from pupils' own culture. In this way it is hoped to establish an approach to the understanding of Otherness which will contribute to changes in attitudes, to the breaking down of prejudices and STEREOTYPES. Concepts such as ACCULTURATION, CULTURAL STUDIES and INTERCULTURAL COMMUNICATION are concerned at a theoretical level with the relationship between understanding of self and understanding of the Other, country and people.

Empirical research into the connection between attitudes and language learning leads to two viewpoints, which as the resultative hypothesis and the MOTIVATIONAL hypothesis continue to be discussed in polarised terms, even though the data (here discussed selectively) and current theoretical work suggest a quite different interpretation.

The resultative hypothesis

The resultative hypothesis is based on the assumption that experience of success influences attitudes to language. The first systematically collected data were provided by a study at the beginning of the 1940s of 11-15-year-old boys