

A Grammar and Writing Achievement Test: Assessing Learners of the Advanced Study Course
Community English Program, Teachers College, Columbia University

Aya Tsuchie
Rongchan Lin
Teachers College, Columbia University

Professor Kirby C. Grabowski
A&HL 4088 Second Language Assessment
Final Paper (Fall 2012)

TABLE OF CONTENTS

1. INTRODUCTION
   A. Motivation for the Test
   B. Research Questions
2. REVIEW OF THE LITERATURE
   A. Grammar Knowledge
   B. Writing Ability
   C. Relationship between Grammar Knowledge and Writing Ability
3. TEST CONSTRUCTION
   A. Target Language Use Domain
   B. Design Statement
   C. Operationalization
   D. Item Coding for the MC Section
   E. Administration Procedures
4. TEST PILOT
   A. Study Participants
   B. Measuring Instrument
   C. Scoring Procedures
5. TEST ANALYSES AND RESULTS
   5.1. RESULTS FOR THE GRAMMAR TASK
      A. Descriptive Statistics
      B. Internal-consistency Reliability and Standard Error of Measurement
      C. Item Analyses
      D. Distractor Analysis
      E. Evidence of Construct Validity within the MC Task
   5.2. RESULTS FOR THE WRITING TASK
      A. Descriptive Statistics
      B. Internal-consistency Reliability and Standard Error of Measurement
      C. Inter-Rater Reliability
      D. Evidence of Construct Validity within the Extended Production Task
   5.3. Other Evidence of Validity
      A. Relationships between the Two Parts of the Test
      B. Relationships between Background Variable and Performance
6. DISCUSSION AND CONCLUSIONS
7. REFERENCES
8. APPENDICES
   A. Test Task Specifications
   B. Outline of Participation



1. INTRODUCTION

A. Motivation for the Test

The Community English Program (CEP) is a program within the Community Language Program (CLP) offered by the TESOL and Applied Linguistics (AL) Programs at Teachers College, Columbia University. It is dedicated to teaching English as a second and foreign language to adult learners from diverse backgrounds. The student population is made up of students of different nationalities, professions, economic statuses and language backgrounds. They usually reside in New York City or nearby areas. The instructors are students from the TESOL/AL Programs who undertake teaching duties at the CEP as part of their practical training.

Students in the CEP sit for a placement exam and are subsequently placed into different classes based on their scores. They then receive instruction aimed at their respective proficiency levels. The curriculum takes a theme-based and communicative approach to equip the students with effective communication skills for the real world.

This paper pertains to the learning progress of the students enrolled in the Advanced Level course in the CEP. In Advanced Study, the teachers have the flexibility to design the curriculum in accordance with students' needs. This helps ensure that learning is meaningful and of an appropriate difficulty level. As part of the assessment procedure for the CEP, a midterm exam and a final exam are administered every semester. The assessment is relatively low-stakes. Continuing students are promoted to the next level for the next semester if they have a score of 70 or higher out of 100 on their final grade. Otherwise, depending on their test scores, they have to retake the course.

The midterm exam is a form of formative assessment and has diagnostic value. It comprises five sections, namely grammar, listening, speaking, reading and writing. The derived test score for the midterm makes up part of students' final score for the program. In addition, the test results provide useful information to guide further teaching and learning. After taking the test, students have a better idea of their mastery level and are better informed of their strengths and weaknesses. The teachers are also able to analyze the results, identify the learning gaps and adjust their teaching accordingly for the second half of the semester.

The test in the present study is positioned as an achievement test that is aligned with the curriculum for the Advanced Studies course. It aims to assess students' mastery of the content taught for the grammar and writing components in the first half of the semester. The grammar section consists of 20 multiple-choice (MC) questions targeting students' mastery of grammatical knowledge, focusing primarily on the morphosyntactic elements (i.e., tenses) and lexical elements (i.e., collocations) taught in class. Form and meaning of both categories are given equal emphasis. The writing component aims to assess students' writing ability for the argumentative essay. Students are required to respond to an extended production task. No word limit is imposed.

The purpose of this paper is to discuss the construction of the test. First, the research questions will be presented. This will be followed by a literature review regarding the constructs of grammar knowledge and writing ability. The relationship between the two constructs will be explored and the respective theoretical models will also be presented. We will then describe the different stages of the test construction, namely the target language use (TLU) domain, the design statement, operationalization, the item coding for the MC section and the administration procedures. A copy of the test with the key will also be provided.

B. Research Questions

This research paper will address the following research questions:
1. What is the nature of grammatical knowledge at the Advanced Study level in the CEP?
2. What is the nature of writing ability at the Advanced Study level in the CEP?
3. What is the relationship between grammatical knowledge and writing ability in this test?
4. What is the relationship between test-takers' native language (Asian vs. non-Asian) and their performance on this test?

2. REVIEW OF THE LITERATURE

A. Grammar Knowledge

Nature of Grammatical Knowledge

In order to make inferences from the test results about the test takers' language ability, a clear definition of the constructs being measured on the test needs to be provided (Purpura, 2004). Therefore, in order to test the grammatical knowledge of a test taker, one must have a clear understanding of what constitutes grammatical knowledge. Grammatical knowledge has been actively discussed and defined by several researchers in the last few decades. This section of the literature review looks at the definitions of grammatical knowledge provided by different researchers by examining their consideration of form, meaning, and pragmatics as part of grammatical knowledge. Lastly, the construct measured in the midterm test administered in the CEP at Teachers College, Columbia University will be described along with a graphic representation of the model.

Some frameworks of grammatical knowledge proposed by researchers include only form, without considering meaning. In Lado's (1961) view, morphosyntactic form was the only component of grammatical knowledge. Similarly, Bachman and Palmer (1996) focused on the accuracy of utterances, and meaning was not explicitly mentioned in their definition of grammatical knowledge. However, in their opinion, grammatical knowledge included additional components of linguistic form, such as phonology, graphology and vocabulary, along with syntax.

Some researchers noticed a need to add meaning to their definition since grammar is used to encode meaning. One of the early researchers to propose this need was Carroll (1968). Carroll pointed out a need to design tests that allow teachers to predict the use of language in future situations. He recognized the overlap between form and meaning in real-life situations and introduced the idea of incorporating semantics as a part of grammatical competence in addition to morphosyntactic form. In terms of the incorporation of semantics in the definition of grammatical knowledge, Canale and Swain (1980) constructed a similar framework. In their view, grammatical competence is composed of phonology, lexicon, syntax and semantics. What differed from Carroll's (1968) definition was the inclusion of phonology and lexicon as a part of grammatical knowledge. In Carroll's framework, lexical items such as vocabulary and idioms are part of lexical competence, which is one of the components of language competence along with grammatical competence.

Other researchers expanded the definition of grammar knowledge even further by considering the pragmatic aspect of a language. One of the early researchers to propose this idea was Oller (1979). Oller introduced pragmatic use in addition to semantic meaning and linguistic form in the definition of grammar. However, Oller did not clearly specify how pragmatic use relates to grammatical knowledge. Rea-Dickins (1991) and Larsen-Freeman (1991) also suggested pragmatics as a part of grammatical knowledge in addition to syntax and semantics. However, Purpura (2004) proposed separating pragmatics from the definition of grammatical knowledge since the inclusion of pragmatics blurs the distinction between grammar and language.

One of the most recent frameworks is the one proposed by Purpura (2004), who suggested that grammatical knowledge is composed of grammatical form and meaning (literal and intended). According to Purpura, literal meaning can be derived from the combination of the parts of a sentence and how the parts are arranged in the sentence, while intended meaning can be derived from the speaker's meaning in context. In Purpura's model, phonological, lexical and morphosyntactic elements are included for both grammatical form and meaning at the sentential level, while cohesive, information management and interactional elements pertain to the discourse level. Purpura's view of considering the speaker's intended meaning in the definition of grammatical meaning differs from the traditional definitions of grammatical meaning proposed by previous researchers. Purpura noticed a need to include intended meaning in addition to literal meaning by pointing out that, in some situations, the intended meaning of an utterance might be different from its literal meaning in a specific context.

Theoretical Model of Grammatical Knowledge

The construct of grammatical knowledge that is tested in Section A of the test for the Advanced Studies course in the CEP is drawn from Purpura's (2004) framework of grammatical knowledge. Both grammatical form and meaning are tested at the morphosyntactic and lexical levels. Discourse-level grammatical form and meaning (cohesive, information management and interactional form and meaning), as well as phonological/graphological form and meaning, are not tested. This scope was decided by the instructor. Since there is no textbook at the Advanced Study level, what is tested was based on what the instructor of the course decided to cover for grammatical knowledge. The construct of grammatical knowledge is represented graphically in Figure 1.

FIGURE 1 Theoretical Model of Grammatical Knowledge

[Figure: Grammatical Knowledge divides into Grammatical Form and Grammatical Meaning. Grammatical Form comprises Morphosyntactic Form (Present Perfect, Past Perfect, Past Continuous, Simple Past) and Lexical Form (Collocations). Grammatical Meaning comprises Morphosyntactic Meaning (Present Perfect, Past Perfect, Past Continuous, Simple Past) and Lexical Meaning (Collocations).]

B. Writing Ability

Nature of Writing Ability

Writing ability was neglected for a period of time because, before the 1990s, many believed that it could be tested using an indirect test (Hamp-Lyons, 1990). However, writing is a skill with its own unique features and conventions (Brown, 2004). Over the years, we have come to recognize the importance of writing, and its related assessment has matured significantly. Considering the multifaceted and complex nature of writing ability, there is a need to scrutinize the various models proposed by researchers to gain a deeper understanding of its construct and how it has been operationalized. Since writing is a form of language use, it is essential for us to examine the construct of language ability and the complex interaction between the various components involved in writing specifically.

Regarding what writers need to deal with in producing a piece of writing, Raimes (1983) identified nine domains: content, syntax, grammar, mechanics, organization, word choice, purpose, audience and the writing process. Raimes' (1983) model does not seem to suggest any hierarchical relationship between the various elements, as they appear to exist as separate entities. There is no clear categorization of grammatical knowledge and the actual writing process involved.

Adapting from Weir (2005), Shaw and Weir (2007) identified three aspects of context validity for writing, namely: 1) Setting: task (response format, purpose, knowledge of criteria, weighting, text length, time constraints, writer-reader relationship), 2) Setting: administration (physical conditions, uniformity of administration, security) and 3) Linguistic demands: task input and output (lexical resources, structural resources, discourse mode, functional resources and content knowledge). The aspect of linguistic demands is of direct relevance to the construct of writing ability. Such elements could be further categorized and expanded for better clarity.

In their model of language use and language test performance, Bachman and Palmer (1996) listed four characteristics of individuals, namely personal characteristics, topical knowledge, affective schema, and language ability. This section will discuss the last three components, which relate more closely to the construct of the test in the present study.
Topical knowledge can be perceived as knowledge structures in long-term memory, and we are reminded that certain tasks that presuppose cultural or topical knowledge on the part of the test takers may be easier for those who have that knowledge and more difficult for those who do not (Bachman & Palmer, 1996, p. 65). The choice of topic for a writing task is believed to affect students' writing performance to a certain extent. Regarding affective schema, emotionally charged topics should be avoided so that learners can better display the repertoire of their language knowledge and metacognitive strategies.

Building on Bachman (1990), Bachman and Palmer (1996) defined language ability as involving two domains: language competence and strategic competence. Organizational knowledge and pragmatic knowledge are the two main categories underpinning the construct of language competence. The former comprises grammatical knowledge and textual knowledge. Grammatical knowledge is concerned with the accuracy of utterances or sentences, which involves knowledge of vocabulary, syntax and phonology. Textual knowledge constitutes knowledge of cohesion and rhetorical or conversational organization. Pragmatic knowledge is made up of functional knowledge (ideational, manipulative, heuristic and imaginative functions) and sociolinguistic knowledge (knowledge of dialects/varieties, registers, natural or idiomatic expressions, cultural references and figures of speech). Strategic competence, which is the other component of language ability, is a set of metacognitive strategies that provide a cognitive management function in language use, as well as in other cognitive activities (Bachman & Palmer, 1996, p. 70). Given the complexity of the writing task, effective writing essentially entails a good mastery of the above-mentioned elements.

The components identified above interact with one another. In a recent study by He and Shi (2012) regarding topical knowledge and ESL writing, it was discovered that students performed significantly better on general topics than on specific topics.
For the timed-impromptu essay that requires specific knowledge, students received lower scores on content due to poor development of ideas, weak position taking, and a weak conclusion. Interestingly, lower scores were also obtained on organization and language, arising from weaker coherence and more language errors. The intricate relationship between the components of writing ability requires careful consideration in the design of a test to ensure construct validity.

It is also crucial to recognize that there are fundamental differences between L1 and L2 writing. This gives us a clearer idea of the components of writing ability that we need to focus on when assessing L2 writing. According to Hyland (2003, p. 3), learning to write in a foreign or second language mainly involves linguistic knowledge and the vocabulary choices, syntactic patterns, and cohesive devices that comprise the essential building blocks of texts. Weigle (2002) also noted that L2 writers tend to focus on language rather than content. This may imply that higher-order issues of content and organization are neglected due to limitations of L2 learners' working memory. In a study involving 72 research reports, Silva (1993) suggested that L2 writers tend to be less fluent, less accurate and less effective. In addition, L2 writing demonstrates more conjunctive ties, fewer lexical ties and less lexical control. Hence, in our design of the writing task, while drawing on the language ability models proposed by researchers, we also attempt to factor in the difficulties faced by L2 learners so that realistic goals are set, and to ensure that a comprehensive set of related skills is tested.

Theoretical Model of Writing Ability

The construct of writing in the present study is derived from the related syllabus for the CEP Advanced Studies course. The genre and scope were determined by the instructor based on the results of the needs analysis conducted. The students were taught to write an argumentative essay, which requires them to state their position, provide evidence, and address alternative views.


Besides considering the difficulties faced by L2 learners, the test in the present study also addresses the aspects of Bachman and Palmer's (1996) model of language ability that pertain most to the writing task of the CEP test. The relevant components are merged and streamlined into three broad domains to facilitate grading. The writing task aims to assess students' writing of an argumentative essay by looking at three domains of the written product, namely 1) content control, 2) organizational control and 3) language control.

First, in our attempt to ensure validity, the topic in the present study was selected with care to minimize the chances of students being penalized for a lack of topical knowledge. Students were required to discuss their views on the given prompt, and any stance taken by the test takers was accepted. The content component effectively corresponds to pragmatic knowledge in Bachman and Palmer's model. The responses were assessed based on the clarity of position and claim, the integration of evidence and the effectiveness of the counter-argument. Second, organizational control refers to the use of transitional strategies and the progression of ideas. The completeness of the discourse is also taken into account by considering the inclusion of an introduction and a conclusion. This is consistent with Bachman and Palmer's (1996) textual knowledge. Finally, language control reflects test takers' grammatical knowledge by requiring them to demonstrate their mastery of different grammatical structures and accurate use of collocations. This is consistent with the grammatical knowledge proposed by Bachman and Palmer (1996). Emphasis is also placed on language conventions such as the effective and consistent use of punctuation, capitalization and spelling. The construct of writing ability is represented graphically in Figure 2.


FIGURE 2 Theoretical Model of Writing Ability

[Figure: Writing Ability comprises three domains: Content Control, Organizational Control and Language Control.]

C. Relationship between Grammar Knowledge and Writing Ability

Considering that language is temporal, language material entails a certain kind of sequential order which applies to grammatical elements and chunks of information (also known as propositional content) that the grammar structure conveys (Rutherford, 1987). Regarding the relationship between form and content, in an empirical study conducted by Fathman and Whalley (1990), it was noted that when students receive feedback on content and grammar at the same time, the content during their rewriting improves approximately as much as when content feedback only is given and focus on grammar does not negatively affect the content of writing (p. 186). The findings hint at a relationship between content and form which certainly deserves closer scrutiny.

As noted by Raimes (1983), writing itself is a language skill that incorporates a repertoire of skills from the learners' perspective. Many researchers would agree that grammar and form are intrinsic components of writing ability. Brown (2004) advanced our understanding of writing by providing a comprehensive taxonomy of its related micro-skills and macro-skills. The former pertain more to imitative and intensive types of writing while the latter relate more to responsive and extensive writing. It is noteworthy that the use of acceptable grammatical systems (e.g., tense, agreement, pluralization), the expression of a particular meaning in different grammatical forms and the use of cohesive devices constitute part of the micro-skills. For macro-skills, the ability to convey links and connections between events and communicate such relations as main idea, supporting idea, new information, given information, generalization, and exemplification (p. 221) is listed along with other skills. For effective communication, the importance of grammatical knowledge as a part of communicative language ability cannot be underestimated (Canale and Swain, 1980; Purpura, 2008).

From the above taxonomy, it is evident that the rules of grammar govern the process of writing. It was posited that a focus on form (grammar) can help writers develop the rich linguistic resources that are required to express ideas effectively (Frodesen & Eyring, 2000, as cited in Mohammad, 2008). This claim was supported by an empirical study which discovered that the correlation between students' writing and grammatical competence was the strongest compared to other sub-skills (Mohammad, 2008). Based on the above, it is speculated that a high correlation exists between grammatical knowledge and writing ability. The relationship between the two components is presented in Figure 3.


FIGURE 3 Relationship between Grammatical Knowledge and Writing Ability

[Figure: Grammatical Knowledge, made up of Grammatical Form (Morphosyntactic Form: Present Perfect, Past Perfect, Past Continuous, Simple Past; Lexical Form: Collocations) and Grammatical Meaning (Morphosyntactic Meaning: Present Perfect, Past Perfect, Past Continuous, Simple Past; Lexical Meaning: Collocations), shown in relation to Writing Ability (Content Control, Organizational Control, Language Control).]
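
The speculated correlation between grammatical knowledge and writing ability could be examined once both sets of scores are available. The following is only a minimal sketch, assuming one total grammar score and one overall writing score per test taker. The function name `pearson_r` and the score lists are hypothetical illustrations (not data from this study), and Pearson's r is one common choice rather than an analysis prescribed in this section.

```python
from math import sqrt


def pearson_r(x, y):
    """Return Pearson's product-moment correlation between two score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores for five test takers (illustration only).
grammar_scores = [12, 15, 17, 18, 20]        # out of 20
writing_scores = [2.0, 2.5, 3.0, 3.0, 3.5]   # on a 0-4 scale
r = pearson_r(grammar_scores, writing_scores)
```

A value of r close to 1 would be consistent with the speculated high positive correlation, while a value near 0 would not.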



3. TEST CONSTRUCTION

A. Target Language Use Domain

The overarching goal of the CEP is to equip learners with the essential skills to communicate in English in everyday life. Hence, authentic tasks are integrated into the curriculum. Besides developing the four language skills, special emphasis is also placed on grammar. The instructional outcomes are clearly delineated to guide teaching and learning.

As evident from the needs analysis administered by the instructor, the students enrolled in the Advanced Studies course are highly motivated to learn English. Their main goal for learning English is to engage in real-life communication and be able to interact effectively with others in New York. The advanced learners aim to increase their English competency in order to have a competitive edge at work or to further their academic pursuits at the tertiary level in the United States.

The test content was determined after an in-depth discussion with the instructor. To ensure representativeness of the test content, we tested all of the content taught. In addition, to ensure that the test is aligned with the curriculum, two lessons were observed. We also referred to the notes and exercises given out so that the test methods would be familiar to the test takers. After reviewing the handouts and consulting the CEP teacher, the theme, Education, was chosen as we felt it is close to the learners' hearts and is of interest to them.

The test comprises two sections. The grammar test consists of 20 multiple-choice questions in three parts. The writing test is the extended production section, which requires the learners to write an essay.

In the grammar test, students were tested on their understanding of grammatical knowledge. The mastery of grammar is fundamental to laying a good foundation for the learning of the English language. A good understanding of grammar, including tenses and collocations, will allow students to perform a wide range of real-life tasks and minimize miscommunication in everyday life. MC questions are an effective and efficient way of assessing learners' understanding, and this is also a test format familiar to the students.


Regarding tenses, the questions revolve around the simple past, present perfect, past perfect and past continuous tenses. Besides knowing the form of the above-mentioned tenses, students are also tested on their meaning. The learning of tenses will enable students to better understand the timeframe of events and facilitate effective communication in everyday life.

For collocations, the students are required to demonstrate their understanding of the different types of collocation. It is important to assess learners' mastery of collocations because an appreciation for collocation will allow learners to 1) use words more accurately, 2) sound more natural in speaking and writing, 3) vary their speech and writing, and 4) understand when a skillful writer departs from normal patterns of collocation (O'Dell & McCarthy, 2008). The various types of collocation taught are: 1) verb + noun, 2) noun + verb, 3) noun + preposition and 4) metaphorical collocation. Also known as idioms, metaphorical collocations are the hardest to master as they are fixed expressions. This will stretch the students appropriately as they are advanced learners.

For writing, the focus is on the writing of an argumentative essay, which is a challenging yet attainable task for the learners. This task mirrors the interaction that the learners may have outside the classroom. The advanced learners have many opportunities to exchange opinions on particular topics in everyday life or in academic settings. The writing task hence requires them to state their stand or position, address alternative views, and explain their reasons in a logical and convincing manner. By integrating critical thinking and effective use of the language, this task challenges the advanced learners appropriately. In the design of the topic, special care is taken to ensure that there is no topical bias. The topic designed is suitable for the target profile in accordance with their interests and experience.
It does not require students to possess specific knowledge as students can draw upon their own experience of learning a second language. Considering that the learners themselves are learning a second language, such a topic should give them room for discussion.

B. Design Statement

The design statement for the test is presented in Table 1.

TABLE 1 Design Statement
1. TEST PURPOSE
   A. Inferences: About test takers' knowledge of selected grammar points and their ability to write an argumentative essay.
   B. Decisions
      1. Stakes: Relatively low. Test results count towards students' final grades, but the course is the most advanced level in the Community Language Program, so the test will not be used to determine the advancement of the students to the next level.
      2. Individuals affected: Students and teachers of the course, CEP administrators.
      3. Specific decisions to be made
         A. Achievement: To assess whether students mastered the grammar points and writing skills covered in the first half of the semester.
         B. Diagnostic: To identify students' strengths and weaknesses in grammar points and writing skills for both teachers and students. The results can be used to decide which areas need to be taught further in the second half of the semester.

2. DESCRIPTION OF TLU DOMAIN/S AND TASK TYPES
   A. Identification of tasks
      1. TLU domain: Real-life situations.
      2. Identification of selection of TLU tasks for consideration as test tasks: TLU tasks were identified by the instructor based on the needs analysis carried out at the beginning of the semester. Test tasks were then identified by the instructor based on what was covered in the first half of the semester and what is important.
   B. Description of TLU task types: See Table 2.

3. DEFINITION OF CONSTRUCTS
   A. Language ability: The construct definitions for this achievement test are based on the theoretical definitions of grammatical knowledge and writing, and also on the course content.
      1. Grammatical knowledge: Grammatical form and meaning.
      2. Writing ability: Content, organization, and language control.
   B. Strategic competence: Not explicitly included in the construct.
   C. Topical knowledge: Not included in the construct, since the achievement test was created to make an inference about the grammatical knowledge and writing ability covered in the class.

The test task specification can be found in Appendix A.

C. Operationalization

Blueprint

Test Structure
1. Number of parts/tasks: The test consists of two tasks: a grammar task (multiple-choice) and a writing task (extended production). The grammar task consists of three parts: Part 1 grammatical form, Part 2 grammatical meaning, and Part 3 collocation form and meaning.
2. Salience of parts: Three parts are clearly distinct with separate instructions and printed on separate pages. The three parts in the grammar task are also distinct with headings.
3. Sequence of parts: The grammar task is completed first, followed by the writing task. For the grammar task, grammatical form questions are completed first, grammatical meaning questions are completed afterwards, followed by the collocation questions.
4. Relative importance of parts of tasks: Both tasks are equally important. For the grammar task, Parts 1 and 2 have six questions each while Part 3 has eight questions.
5. Number of tasks per part: One grammar task and one writing task

The test structure is presented in Table 2.

TABLE 2 Test Blueprint

Grammar
   Test content: Form (morphosyntactic, lexical); Meaning (morphosyntactic, lexical)
   Topic: Education
   Input type and expected response: Item; selected response (multiple choice)
   Time: 20 minutes
   Length (number of items): 20
   Scoring: Dichotomous scoring (0/1); one criterion for correctness; 20 points total

Writing
   Test content: Content control; organizational control; language control
   Input type and expected response: Prompt; extended production (composing an argumentative essay)
   Scoring: Independently scored by two raters, scores are then averaged; analytic rubric with a 0-4 point scale for each of the following: content, organization, and language control


Test task specifications
1. Purpose: see Table 1, design statement
2. Definition of construct: see Table 1, design statement
3. Setting: see Appendix A on test task specifications
4. Time allotment: one hour (20 minutes for grammar and 40 minutes for writing)
5. Instruction:
   a. Language: the target language (English)
   b. Channel: visual (writing)
   c. Instructions: Students were told to read the instructions and to ask the teacher if they had any questions.
6. Characteristics of input and expected response: see Appendix A on test task specifications

D. Item Coding for the MC Section

The coding of the multiple-choice section for the grammar task is presented in Table 3. Depending on the construct of grammatical knowledge tested, each item is coded to test students' mastery of morphosyntactic form, morphosyntactic meaning, lexical form or lexical meaning.


Table 3
Item Coding

Measured Variable                         Item Number (Key)
Form: Morphosyntactic
  Present Perfect                         1 (D), 5 (D)
  Past Perfect                            3 (B), 4 (C)
  Past Continuous                         2 (D)
  Simple Past                             6 (A)
Meaning: Morphosyntactic
  Present Perfect                         10 (C), 11 (D)
  Past Perfect                            7 (C), 12 (C)
  Past Continuous                         9 (B)
  Simple Past                             8 (B)
Form: Lexical
  Collocation: Noun + Preposition         13 (C), 14 (B)
  Collocation: Verb + Noun                15 (C), 16 (C)
Meaning: Lexical
  Collocation: Verb + Preposition         17 (A), 18 (B)
  Collocation: Metaphorical               19 (C)
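As an illustration of how the answer key supports the dichotomous (0/1) scoring named in the blueprint, the following is a minimal sketch and not the authors' actual scoring tool. The function names and the sample answer sheet are hypothetical. Table 3 lists keys for items 1 to 19 only, so item 20 is left out of the sketch.

```python
# Answer key for items 1-19 as listed in Table 3 (item 20 is not listed there).
ANSWER_KEY = {
    1: "D", 2: "D", 3: "B", 4: "C", 5: "D", 6: "A",
    7: "C", 8: "B", 9: "B", 10: "C", 11: "D", 12: "C",
    13: "C", 14: "B", 15: "C", 16: "C", 17: "A", 18: "B", 19: "C",
}


def score_item(response, key):
    """Return 1 for a single response matching the key, otherwise 0.

    A blank or a double-marked response (e.g. "BD") never equals the
    one-letter key, so it earns no credit. No partial credit is given.
    """
    return 1 if response.strip().upper() == key else 0


def score_grammar(answers, key=ANSWER_KEY):
    """Sum the 0/1 item scores; unanswered items count as 0."""
    return sum(score_item(answers.get(item, ""), k) for item, k in key.items())


# Hypothetical answer sheet: item 1 correct, item 2 double-marked, item 3 correct.
sample_sheet = {1: "d", 2: "BD", 3: "B"}
print(score_grammar(sample_sheet))
```

Keeping the key in a single dictionary makes it easy to check the scoring against the item coding table when items are revised.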

E. Administration Procedures
The test was administered to the students in the Advanced Studies level. The whole test lasted two hours, but only one hour was allotted to the grammar and writing tasks discussed in this paper; the remaining hour was dedicated to the listening and reading tasks. It was suggested that students spend 20 minutes on Section A (Grammar Section) and 40 minutes on Section B (Writing Section). Students were required to answer all questions. The test booklets were handed out, and the students were asked to check that each booklet comprised a total of five printed pages. The students also received writing paper for Section B. Students were reminded that the test was a closed-book test and that they were not allowed to refer to any materials or consult their classmates at any point during the test. Students were also reminded not to start answering until told to do so. Students were also told that each section of the test would be graded individually. Section A (Grammar Section) was worth a total of 20 marks (1 mark for each correct answer). Section B (Writing Section) would be graded from 0 to 4, with 4 indicating excellent performance and 0 indicating unacceptable performance. At the end of the test, students were reminded to write their names clearly at the top left-hand corner of the test booklet as well as the writing paper before submission.

4. PILOT TEST

A. Study Participants
In terms of personal characteristics, the native languages of the test takers varied widely (i.e., French, German, Korean, Japanese, Portuguese and Spanish). Their length of stay in the USA ranged from 2 weeks to 5 years. Their length of English study also varied, from 1 to 12 years. The highest level of education obtained also varied, comprising high school, college and postgraduate levels. The topical knowledge of the test takers was relatively varied because of the students' varying experiences. Regarding the levels and profiles of language knowledge, the test takers were advanced learners. Many of them wanted to improve their writing and/or speaking skills for work and academic purposes.


Considering that the achievement test implemented in this study was of relatively low stakes, students were unlikely to feel negative about the test.

B. Measuring Instrument
In alignment with the test blueprint and description of the test tasks, the test is reproduced below.

SECTION A: GRAMMAR (20 points; 20 minutes)
(Key: MS = morphosyntactic; L = lexical; F = form; M = meaning. Keys are circled.)

Part 1
Choose the best answer.

1) I _____________ working on my research paper yet. (MSF - Present Perfect Tense)
   a) do not start  b) will not start  c) had not started  d) have not started

2) I thought about what to write while I _____________ some articles. (MSF - Past Continuous Tense)
   a) had read  b) have read  c) am reading  d) was reading

3) I deleted my research paper before I _____________ it to my computer. (MSF - Past Perfect Tense)
   a) will save  b) had saved  c) was saving  d) have saved

4) When I realized what _____________, I started to panic. (MSF - Past Perfect Tense)
   a) is happening  b) has happened  c) had happened  d) have happened

5) I _____________ since I started working on my research paper. (MSF - Present Perfect Tense)
   a) won't rest  b) didn't rest  c) wasn't resting  d) haven't rested

6) I presented my research paper in class after David _____________. (MSF - Past Tense)
   a) presented  b) will present  c) has presented  d) was presenting

Part 2
Gloria, a student, sent an email to her professor, Dr. Johnson. Read the email and answer the questions by choosing the best answer.

Dear Dr. Johnson,
I visited your office at 3 pm yesterday but I believe you (7) had already left. I (8) went to your office right after class but I was not able to get there before 3 pm. I (9) was thinking about going to your office tomorrow again but it might be better to email you instead. Ever since you assigned the research paper, I (10) have been thinking about my topic. I would like to focus on World Englishes because, since I started learning English, I (11) have always wondered why British English is different from American English. I (12) had also thought about other research areas by the time I chose this topic. Would you mind giving me some advice? Thank you.
Warm regards,
Gloria Smith

7) Dr. Johnson left __________ Gloria arrived. (MSM - Past Perfect Tense)
   a) after  b) when  c) before  d) as soon as

8) It is likely that Gloria went to the office ____________________ Dr. Johnson knocked off. (MSM - Past Tense)
   a) 1 hour after  b) 5 minutes after  c) 1 hour before  d) 5 minutes before

9) Gloria thought about going to the office again the next day ___________________. (MSM - Past Continuous Tense)
   a) after she emailed Dr. Johnson
   b) after she noticed Dr. Johnson had left the office
   c) before she went to the office for the first time
   d) before she noticed Dr. Johnson had left the office

10) Gloria thought about her topic ____________________________. (MSM - Present Perfect Tense)
   a) when she visited the office
   b) when the research paper was assigned
   c) from the time the research paper was assigned to now
   d) from the time the research paper was assigned to the time she visited the office

11) When does Gloria wonder about British and American English? (MSM - Present Perfect Tense)
   a) When she thought about a topic
   b) When she started learning English
   c) Before she started learning English to now
   d) From the time she started learning English to now

12) Gloria thought about different topics __________ World Englishes as her topic. (MSM - Past Perfect Tense)
   a) after she chose  b) when she chose  c) before she chose  d) while she was choosing




Part 3
Choose the best answer.

13) This lecture gives insight ____________ the nature of bilingual education. (LF - Noun + Preposition)
   a) in  b) for  c) into  d) about

14) Reproduction of an article without permission is an infringement ___________ copyright. (LF - Noun + Preposition)
   a) in  b) of  c) on  d) to

15) The professor firmly believes that education is the only way to ____________ extreme poverty. (LF - Verb + Noun)
   a) remove  b) destroy  c) eradicate  d) demolish

16) To stay abreast of the times, there is a pressing need to ___________ new educational policies. (LF - Verb + Noun)
   a) invent  b) fabricate  c) formulate  d) procreate

17) Danny and his team recently ____________ a policy regarding compulsory education. (LM - Verb + Preposition)
   a) drew up  b) drew on  c) drew in  d) drew back

18) For her presentation, Elizabeth decided to ____________ with her vision of bilingual education. (LM - Verb + Preposition)
   a) kick in  b) lead in  c) tune in  d) cave in

19) The President of the University _____________ by implementing the new initiatives as promised. (LM - Metaphorical)
   a) struck a chord  b) faced the music  c) hit the right note  d) played second fiddle

20) With the examination around the corner, we need to _______________. (LM - Metaphorical)
   a) keep our eyes peeled
   b) keep our ears to the ground
   c) keep body and soul together
   d) keep our noses to the grindstone

SECTION B: WRITING (40 minutes)
Question: All students should be required to learn a second language. Do you agree? Why?
Make sure your essay includes the following:
- A clear position/stand
- Clear thesis statement
- Supporting evidence
- Evidence refuting opposing view
- Clear transition between introduction, body, and conclusion

You may use the space below to organize your thoughts before writing. This will NOT be graded. Write your essay on the paper provided.



The following response was given a score of 12 out of 12. This version is typed word for word to ensure a true reflection of the test taker's writing ability. To ensure the confidentiality of the student, the name of the candidate is withheld.

Sample Response

This summer, I met numerous new people from all over the world. Their native languages were all different, but I was surprised to find out that most of them were fluent in English, as well as other languages not native to them. This made friendships much easier, quicker, and a lot more fun. I recall thinking to myself, "How fortunate it is that everyone speaks a second language!" With this experience in mind, I am of the opinion that learning a second language should be mandatory for all students.

The command of a second language can be the basis for all around personal growth. Speaking a second language can open doors to new experiences, international friendships, broaden view of the world and give oneself a sense of personal achievement, A friend of mine, who speaks English and French as well as her native language Korean, has travelled to the US and to Europe, made friends there, and has been moved by French novels and movies. Think of all the new experiences she would have missed had she not been able to speak other languages!

However, some may argue that with all the great translating tools and machines that are currently in existence - not to mention the even more advanced versions sure to be on their way - why go through the trouble of learning a second language? But translation tools aren't perfect and they produce lots of errors. Also, no tool could ever take the place of knowing the language yourself. Often, a machine is unable to catch the gentle nuance or a tone. A different language could bring about a totally different experience - you can probably recall being surprised by the difference between books that had been translated into your language and the original version - they are shockingly different sometimes.

In such a globalized world, everyone is connected in some way to the whole world in various ways, and by making it compulsory for students to learn a second language, we could help them be better prepared. Nowadays, even an employee in a small company in Korea finds herself/himself corresponding with counterparts in other countries, and by knowing a second language, he/she can be more confident and competent in those situations. Isn't the goal of education to help students be better prepared for later life? In this sense, students should be required to learn a second language.

Based on the reasons stated above, I firmly state that it should be mandatory for students to learn a second language. It would have immense positive effects on personal growth enrichment in life, and competence at work situations. I would change the famous saying "Boys, be ambitious!" to "Boys, be bilingual!"

C. Scoring Procedures
Regarding the criteria for correctness, the multiple-choice grammar task was scored dichotomously in an objective manner. One point was awarded for each correct response and zero points for each incorrect response. Though the distractors may reflect varying levels of grammatical knowledge, no partial credit (i.e., 0.5 point) was awarded. The maximum possible score for this task was 20 points. Only one response was accepted for each question; no mark was given if more than one response was written in the bracket provided. The two raters scored the scripts independently. The writing task, on the other hand, was scored using an analytic rubric, and the grading was done subjectively by the two raters. The three components evaluated were content control, organizational control, and language control, and each component was scored on a scale of 0-4.

Regarding the procedures for scoring the responses, the grammar task was scored independently by the two raters using the answer key. The writing task was scored independently by the two raters using the analytic rubric. The raters normed their scoring beforehand through discussion. Each rater assigned a score on each component of the rubric, and the two raters' scores were then averaged and reported to the students. In the case of a large discrepancy between the two raters, a third rater scored the response and the two closest scores were averaged.

Pertaining to the explicitness of criteria and procedures, the test takers were informed of the general scoring method for the grammar task. The instructor of the course was familiar with the writing rubric and taught the test takers with the rubric in mind. Students have access to the rubric when the scored writing task is returned.

The rubric was adapted from the Smarter Balanced argumentative writing rubric developed by the Measured Progress/ETS Collaborative. The original wording of that source was simplified, and its five components (statement of purpose/focus, organization, elaboration of evidence, language and vocabulary, conventions) were streamlined into three, namely content control, organizational control and language control. Statement of Purpose/Focus and Elaboration of Evidence in the original rubric are subsumed under Content Control, while Organizational Control becomes a stand-alone criterion. Language and Vocabulary is subsumed under Language Control, along with grammar and language conventions, which are also regarded as components of language control. Lastly, the content was refined to ensure that each point in the rubric has a parallel counterpart at every other level.
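The score-resolution procedure for the writing task can be sketched as follows. This is a minimal illustration under stated assumptions, not the raters' actual procedure: the paper does not define what counts as a "large discrepancy", so the `max_gap` threshold of 1 point is hypothetical, as are the function names and the tie-breaking choice when the third score is equally close to both others.

```python
def resolve_component(rater1, rater2, third=None, max_gap=1):
    """Resolve one rubric component (0-4) from two or three ratings.

    max_gap is a hypothetical threshold for a 'large discrepancy';
    the source does not state the value the raters used.
    """
    if abs(rater1 - rater2) <= max_gap:
        return (rater1 + rater2) / 2
    if third is None:
        raise ValueError("large discrepancy: a third rating is required")
    low, mid, high = sorted([rater1, rater2, third])
    # Average the two closest scores; on a tie, the lower pair is used
    # (an arbitrary choice made for this sketch).
    if mid - low <= high - mid:
        return (low + mid) / 2
    return (mid + high) / 2


def essay_total(content, organization, language):
    """Sum the three resolved component scores (maximum 12)."""
    return content + organization + language


# Hypothetical ratings: agreement on content and language,
# and a discrepancy on organization settled by a third rater.
content = resolve_component(4, 3)
organization = resolve_component(1, 4, third=3)
language = resolve_component(3, 3)
print(essay_total(content, organization, language))
```

Resolving each component separately, rather than the total, keeps the analytic structure of the rubric intact when a third rating is needed.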
The rubric is presented in Table 4.


TABLE 4
Rubrics for Extended Production Task

Adapted from Smarter Balanced Assessment Consortium English Language Arts Rubrics, retrieved from: LanguageArtsLiteracy/ELARubrics.pdf

Rating 4
  Content Control
  - Position is clearly stated, focused and strongly maintained
  - Claim is introduced and communicated clearly within the context
  - Use of evidence from sources is smoothly integrated, comprehensive, relevant, and concrete
  - Alternative or opposing claims are clearly addressed
  - Effective use of a variety of elaborative techniques
  Organizational Control
  - Effective, consistent use of a variety of transitional strategies
  - Logical progression of ideas from beginning to end
  - Effective introduction and conclusion for audience and purpose
  - Strong connections among ideas, with some syntactic variety
  Language Control
  - Wide range of grammatical structures with minimal errors
  - Accurate use of collocations with minimal errors
  - Use of academic and domain-specific vocabulary is clearly appropriate for the audience and purpose
  - Few, if any, errors are present in usage and sentence formation
  - Effective and consistent use of punctuation, capitalization, and spelling

Rating 3
  Content Control
  - Position is clear and for the most part maintained, though some loosely related material may be present
  - Context provided for the claim is adequate
  - Some evidence from sources is integrated, though citations may be general or imprecise
  - Alternative or opposing claims are addressed but are rather sketchy
  - Adequate use of some elaborative techniques
  Organizational Control
  - Adequate use of transitional strategies with some variety
  - Adequate progression of ideas from beginning to end
  - Adequate introduction and conclusion
  - Adequate, if slightly inconsistent, connection among ideas
  Language Control
  - Good range of grammatical structures with sporadic errors that do not obscure meaning
  - Good use of collocation with sporadic errors that do not impede understanding
  - Use of domain-specific vocabulary is generally appropriate for the audience and purpose
  - Some errors in usage and sentence formation may be present, but no systematic pattern of errors is displayed
  - Adequate use of punctuation, capitalization, and spelling

Rating 2
  Content Control
  - Position may be clearly focused on the claim but is insufficiently sustained
  - Claim on the issue may be somewhat unclear and unfocused
  - Evidence from sources is weakly integrated, and citations, if present, are uneven
  - Alternative or opposing claims are inadequately addressed
  - Weak or uneven use of elaborative techniques
  Organizational Control
  - Inconsistent use of basic transitional strategies with little variety
  - Uneven progression of ideas from beginning to end
  - Conclusion and introduction, if present, are weak
  - Weak connection among ideas
  Language Control
  - Reasonable range of grammatical structures with errors that obscure meaning
  - Frequent misuse of collocation with errors that impede understanding
  - Use of domain-specific vocabulary may at times be inappropriate for the audience and purpose
  - Frequent errors in usage may obscure meaning
  - Inconsistent use of punctuation, capitalization, and spelling

Rating 1
  Content Control
  - Position is unclear and may have a major drift
  - Claim may be confusing or ambiguous
  - Use of evidence from sources is minimal, absent, in error, or irrelevant
  - Alternative or opposing claims are not addressed or it is too brief
  - Minimal or no use of elaborative techniques
  Organizational Control
  - Few or no transitional strategies are evident
  - Frequent extraneous ideas may intrude
  - Conclusion and/or introduction is missing
  - No connection between ideas
  Language Control
  - Limited range of grammatical structures with frequent errors that obscure meaning
  - Ineffective use of collocation with errors that impede understanding
  - Uses limited language or domain-specific vocabulary
  - May have little sense of audience and purpose
  - Errors are frequent and severe for punctuation, spelling and capitalization; meaning is often obscure

Rating 0
  Unattempted. Shows no evidence of language ability. Response is completely irrelevant.

5. TEST ANALYSES AND RESULTS

5.1. RESULTS FOR THE GRAMMAR TASK

A. Descriptive Statistics
By calculating and analyzing descriptive statistics, the distribution of the scores can be examined. The grammar task in the present study consisted of 20 items (k=20), which tested students' understanding of the tenses (present perfect, past perfect, past continuous, and simple past) as well as collocation. Specifically, 12 items tested morphosyntactic form and meaning and 8 items pertained to lexical form and meaning. Since each item was worth 1 point, the maximum possible score was 20. The mean score was 14.19 and the median was 14.00. The small difference of .19 between the mean and the median suggested that the distribution of the scores was close to normal. This was also supported by the kurtosis and skewness of the distribution: the kurtosis was -.303 and the skewness was -.493, both within the range of -2.5 to 2.5. The standard deviation was 1.76 and the range was 7, meaning that the difference between the lowest score and the highest score was 7 points. These statistics are summarized in Table 5.

Table 5
Descriptive Statistics for Grammar Task

          N    k    Max. Possible Score   Mean    Median   SD     Range   Kurtosis   Skewness
Grammar   16   20   20                    14.19   14.00    1.76   7       -.30       -.49
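The statistics in Table 5 can be illustrated with a short sketch. The raw scores are not reported in this paper, so the score list below is hypothetical and will not reproduce Table 5. The shape indices here are the simple moment-based versions; SPSS reports bias-corrected estimates, so its values would differ somewhat for a small sample.

```python
import statistics


def describe(scores):
    """Return basic descriptive statistics for a list of total scores."""
    n = len(scores)
    mean = statistics.mean(scores)
    # Central moments with an n denominator, used for the
    # uncorrected skewness and excess kurtosis below.
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    m4 = sum((x - mean) ** 4 for x in scores) / n
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(scores),
        "sd": statistics.stdev(scores),  # sample SD (n - 1 denominator)
        "range": max(scores) - min(scores),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3,  # excess kurtosis (0 for a normal curve)
    }


# Hypothetical total scores for 16 test takers (not the study's data).
hypothetical_scores = [10, 12, 13, 13, 14, 14, 14, 14, 14, 15, 15, 15, 16, 16, 16, 17]
for name, value in describe(hypothetical_scores).items():
    print(name, round(value, 3))
```

A negative skewness from such a function corresponds to the pattern described in this section: a longer tail of low scores with most scores bunched toward the upper end.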


From the descriptive statistics, the following interpretations can be made about the results of the grammar task. First, the negative skewness indicates that the curve peaked towards the right, which denotes that more students scored above the mean. A total of seven students scored between 15 and 17 points, while five students obtained a score of 14. This result was expected since the grammar task was an achievement test. The items on the test covered the grammar points and collocations taught in the first half of the semester, and students were expected to answer most of the questions correctly. The results may also indicate that the teaching was effective and led to learning. In addition, the question type was familiar to the students, so they should not have had difficulty understanding the task requirements. Second, the negative kurtosis indicated that there was a certain degree of variability in the scores. This might be because, even though all students were at the same level in the CEP, their proficiency varied to some degree owing to the diversity of the student profile. The students' length of stay in the States ranged from 2 weeks to 5 years, while their length of English study spanned from 1 to 12 years.

B. Internal-consistency Reliability and Standard Error of Measurement
The reliability of a test refers to the extent to which its results are consistent. Internal-consistency reliability was calculated using Cronbach's alpha in order to determine whether the items on the test relate to each other and measure the construct we were trying to measure. Though the test included 20 items, two of the items were answered correctly by all the test takers. Therefore, Cronbach's alpha was calculated using 18 items, and it was estimated to be .074, as seen in Table 6.


TABLE 6
Reliability Statistics for the Grammar Task

Cronbach's Alpha   k (number of items)
.074               18
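For readers who wish to see how the coefficient is obtained, the following is a minimal sketch of the standard Cronbach's alpha formula, alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores). The analysis in this paper was run in SPSS; the function name and the small response matrix below are hypothetical and are not the study's data.

```python
import statistics


def cronbach_alpha(responses):
    """Cronbach's alpha for a score matrix.

    responses: one row per test taker, each row a list of 0/1 item scores.
    """
    k = len(responses[0])
    items = list(zip(*responses))  # transpose: one tuple per item
    item_variances = [statistics.variance(col) for col in items]
    total_variance = statistics.variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical 0/1 responses: 4 test takers by 3 items.
hypothetical = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(cronbach_alpha(hypothetical))  # 0.75 for this toy matrix
```

An item that every test taker answers correctly has zero variance, so it adds nothing to the sum of item variances or to the variance of the total scores.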

A Cronbach's alpha of .074 indicates that 7.4% of the test score variance can be accounted for by the true score variance. In other words, only 7.4% of the test score variance reflects the learners' true knowledge of the construct we were trying to measure, and the remaining 92.6% is attributed to measurement error. This is very low, given that the ideal reliability coefficient of a classroom test should be above .75.

Several reasons for the low estimate of .074 can be suggested. One reason might lie in the items on the test. If some items do not homogeneously measure the same construct as the other items, those items can lower the internal-consistency reliability. The test included items on the present and past perfect as well as on collocations. The perfect tenses and collocation are somewhat unrelated, and this may be the cause of the inconsistency among the items. Another source of error variance might be the test takers. The test was administered in only one class of 16 students. If a few students' scores show unusual variability, those scores could lower the reliability coefficient. Furthermore, the test takers were at the Advanced Study level, which is the highest level in the CEP. Because there was no possibility of advancement to a higher level, their motivation to do well on the test might have been low. Since the reliability coefficient is very low, the test is likely not reliable in measuring the construct we were trying to measure. The reliability coefficient might be improved if the number of items as well as the number of test takers were increased. In order to increase the internal-consistency reliability, an item analysis was conducted, and poorly performing items were marked for revision or removal. See Section 5.1 C for the item analysis.

In addition to the internal-consistency reliability, the Standard Error of Measurement (SEM) was calculated. The SEM can be used to determine the range within which a test taker's score would fall if s/he took the test again on a different occasion. The SEM was calculated using the formula

\[ \mathrm{SEM} = S\sqrt{1 - r_{xx}} \]

where S is the standard deviation and r_xx is the reliability. As shown in Figure 4, the SEM for the grammar test was 1.69.

Figure 4
Standard Error of Measurement (SEM) Calculation for the Grammar Task

\[ \mathrm{SEM} = S\sqrt{1 - r_{xx}} = 1.759\sqrt{1 - .074} = 1.69 \]

S = standard deviation, r_xx = reliability

Based on the value of the SEM, the 95% confidence band (±2 SEM) was determined to be ±3.38 points. In other words, if test takers were given the same test again, their scores would fall within 3.38 points of their first score with 95% certainty. For example, if a test taker scored 15 on the first test, his or her score would fall within a range of 11.62 to 18.38 or, rounded to the nearest integer since the test is scored dichotomously, within a range of 12 to 18. This range is relatively wide given that the maximum score is 20. The wide range is due to a high standard deviation and low internal-consistency reliability.
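The SEM and score-band arithmetic above can be checked with a short sketch. The function names are hypothetical, and clamping the band to the 0 to 20 score scale is an added assumption that does not affect the example. The unrounded band differs from the reported 11.62 to 18.38 only in the second decimal place, because the paper doubles the already rounded SEM of 1.69.

```python
import math


def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)


def score_band(observed, sd, reliability, max_score, n_sem=2):
    """Band of +/- n_sem SEMs around an observed score.

    The band is clamped to the 0..max_score scale (an assumption of this sketch).
    """
    half_width = n_sem * sem(sd, reliability)
    return max(0, observed - half_width), min(max_score, observed + half_width)


# Values reported for the grammar task: SD = 1.759, alpha = .074.
print(round(sem(1.759, 0.074), 2))  # 1.69
low, high = score_band(15, 1.759, 0.074, max_score=20)
print(round(low), round(high))  # 12 18
```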


C. Item Analyses
In order to examine the quality of each test item in the multiple-choice grammar task, item difficulty, item discrimination, and the effect on the internal-consistency measure if each item were deleted were calculated using SPSS. Item difficulty (p-value) and item discrimination (D-index) were used to decide whether to keep, revise, or remove each item. The p-value indicates the proportion of test takers who answered the item correctly, that is, the number of test takers who answered correctly divided by the total number of test takers. A high p-value indicates a large proportion of correct answers and therefore a low level of difficulty, and vice versa. For instance, if 18 of 20 test takers answered an item correctly, the p-value for that item would be .90, indicating that 90% of the test takers answered it correctly and that the item is relatively easy. The p-value for a classroom achievement test should be in the range of .6 to .95. The D-index indicates the degree to which an item discriminates between high- and low-performing groups. A positive D-index indicates that more high performers than low performers answered the item correctly; a negative D-index indicates the opposite. A D-index of zero indicates that the same number of high and low performers answered the item correctly. Both negatively discriminating and non-discriminating items lower the internal-consistency reliability. The D-index for a classroom achievement test should be .300 or above. In addition to the p-value and D-index, the effect on Cronbach's alpha if an item were deleted was considered in order to improve the internal-consistency reliability. The following table shows the three values discussed as well as the decision made on the basis of those values.


Table 7
Initial Item Analysis for the Multiple Choice Grammar Task

Item      Difficulty   Discrimination   Alpha if Deleted   Decision
Form 1    .94          .363             -.032              Keep
Form 2    1.00         0                .074               Keep
Form 3    .81          -.083            .116               Remove
Form 4    .94          -.113            .107               Remove
Form 5    .75          .471             -.196              Keep
Form 6    .44          -.363            .276               Remove
Mng 7     1.00         0                .074               Keep
Mng 8     .88          .448             -.114              Keep
Mng 9     .94          -.113            .107               Remove
Mng 10    .88          .191             .000               Revise
Mng 11    .94          -.113            .107               Remove
Mng 12    .69          -.195            .183               Remove
Form 13   .38          -.213            .197               Remove
Form 14   .38          -.142            .160               Remove
Form 15   .75          .361             -.126              Keep
Form 16   .75          -.106            .133               Remove
Mng 17    .88          .191             .000               Revise
Mng 18    .25          -.147            .153               Remove
Mng 19    .56          .284             -.104              Revise
Mng 20    .06          .137             .036               Remove
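The two indices described in this section can be sketched as follows. This is an illustration of the textbook definitions (proportion correct, and the upper-group minus lower-group difference in proportion correct), not the SPSS procedure used for Table 7. The function names, the group size, and the response matrix are hypothetical.

```python
def item_difficulty(item_scores):
    """p-value: proportion of test takers who answered the item correctly."""
    return sum(item_scores) / len(item_scores)


def d_index(responses, item, group_size):
    """Upper-lower discrimination index for one item.

    responses: one row of 0/1 item scores per test taker.
    item: zero-based column index of the item.
    group_size: number of test takers in each of the high and low groups
                (a hypothetical choice; the cut-off is up to the analyst).
    """
    ranked = sorted(responses, key=sum, reverse=True)  # highest totals first
    high, low = ranked[:group_size], ranked[-group_size:]
    p_high = sum(row[item] for row in high) / group_size
    p_low = sum(row[item] for row in low) / group_size
    return p_high - p_low


# The worked example from the text: 18 of 20 test takers correct.
print(item_difficulty([1] * 18 + [0] * 2))  # 0.9

# Hypothetical responses: 4 test takers by 5 items.
hypothetical = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
]
print(d_index(hypothetical, item=0, group_size=2))  # 0.5
print(d_index(hypothetical, item=4, group_size=2))  # -0.5
```

In the toy data, the last item is answered correctly only by the weakest test taker, which is the pattern a negative D-index flags.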

Though there were 20 items on the grammar test, two items (items 2 and 7) were answered correctly by all the test takers, so these two items were omitted by SPSS from the data analysis, and no meaningful alpha-if-deleted value is available for them. The analysis therefore covers a total of 18 items. The level of difficulty for the 18 items ranged from .06 to .94. Six items (items 6, 13, 14, 18, 19, and 20) fell below .6, the lowest ideal p-value for an achievement test. These items might be too difficult for the test takers. In terms of item discrimination, the D-indices ranged from -.363 to .471. Four items (items 1, 5, 8, and 15) had D-indices above .3, four items (items 10, 17, 19, and 20) had positive D-indices below .3, and ten items (items 3, 4, 6, 9, 11, 12, 13, 14, 16, and 18) had negative D-indices. The decisions stated in Table 7 were tentative and based purely on the respective data generated. Hence, negatively discriminating items were proposed for removal at this stage. For the purpose of this test, items 10, 17 and 19, which had positive discrimination indices of less than .3, were proposed for revision.

The grammar test had a very low internal-consistency reliability of .074, which may be due to the negatively discriminating items. A total of eight items (items 6, 12, 4, 18, 13, 9, 20, and 3, in order of deletion) were deleted to increase the reliability coefficient. Items were deleted one at a time. At each step, the item whose deletion would increase Cronbach's alpha the most and the item that discriminated the least were considered. Most of the time, these two criteria pointed to the same item. When they did not, the effect on Cronbach's alpha was given preference. The procedure was repeated until the internal-consistency reliability reached a reasonable level of .666, as shown in Table 8. Though it did not reach the ideal coefficient level of .700, revision of the remaining items is expected to increase the reliability further.

Table 8
Reliability of the Revised Grammar Task (N=16)

Cronbach's alpha   k (# of items)
.666               10

Table 9 shows the final results and the decisions made after the iterations.

Table 9
Final Item Analysis for the Multiple Choice Grammar Task

Item      Difficulty   Discrimination   Alpha if Deleted   Decision
Form 1    .94          .244             .657               Revise
Form 2    1.00         0                .666               Keep
Form 5    .75          .400             .626               Keep
Mng 7     1.00         0                .666               Keep
Mng 8     .88          .274             .652               Revise
Mng 10    .88          .274             .652               Revise
Mng 11    .94          .103             .674               Revise
Form 14   .38          .309             .649               Revise
Form 15   .75          .400             .626               Keep
Form 16   .75          .309             .647               Keep
Mng 17    .88          .166             .669               Revise
Mng 19    .56          .728             .533               Keep

Since the test in the present study is positioned as an achievement test, items that had a p-value between .6 and 1 and a D-index of .3 or above were kept. Hence, items 5, 15, and 16 were kept. The non-discriminating items, namely items 2 and 7, were also kept. Considering that this is a classroom achievement test, their lack of discrimination is not a concern, since it shows that the students had mastered the respective grammar knowledge well. Retaining these two items would also encourage the learners and boost their confidence. Items that had a p-value of .6 to 1 and a D-index of less than .3 were proposed for revision, namely items 1, 8, 10, 11 and 17. Item 14 was also proposed for revision because its p-value of .38 is too low for an achievement test. Item 19, on the other hand, was kept even though its p-value (.56) fell just below .6, because it discriminates well, as is evident from its D-index of .728. Though the deletion of eight items increased the reliability, most of the items that tested the past perfect tense were deleted as a result of this process. In order to ensure that the construct we decided to measure is represented on the test, the items that tested the past perfect tense need to be re-examined, or new ones added.

D. Distractor Analysis
In order to examine the items that discriminated poorly between high performers and low performers, a distractor analysis was conducted. Items 6 and 12 were selected for analysis because they discriminated poorly and were the first two items deleted to increase the internal consistency. The four answer choices were examined in terms of overall frequency and the number of high and low performers who selected each choice. The p-value and D-index for each choice were not examined because of the small number of test takers. Table 10 shows the analysis of each of the answer choices for item 6.

Table 10
Distractor Analysis for Item 6 (N=16)

Answer Choice   Frequency   High (N=4)   Low (N=4)
A (Key)         7           2            2
B               0           0            0
C               8           2            1
D               1           0            1
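A tally of this kind can be sketched as follows. The sketch is illustrative only: the function name, the six hypothetical test takers, their chosen options, and their total scores are all invented, since the individual response data are not reported here.

```python
from collections import Counter


def distractor_table(choices, totals, group_size, options="ABCD"):
    """Count how often each option was chosen overall and by the
    highest- and lowest-scoring groups.

    choices: the option each test taker selected for one item.
    totals: each test taker's total test score, in the same order.
    group_size: number of test takers in each of the two groups.
    """
    order = sorted(range(len(choices)), key=lambda i: totals[i], reverse=True)
    overall = Counter(choices)
    high = Counter(choices[i] for i in order[:group_size])
    low = Counter(choices[i] for i in order[-group_size:])
    return {
        opt: {"freq": overall[opt], "high": high[opt], "low": low[opt]}
        for opt in options
    }


# Hypothetical data for one item: six test takers.
chosen = ["A", "C", "C", "A", "D", "B"]
total_scores = [18, 17, 15, 14, 12, 10]
for option, counts in distractor_table(chosen, total_scores, group_size=2).items():
    print(option, counts)
```

A distractor that is chosen by no one, or that attracts high performers as often as low performers, shows up directly in such a table.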

Item 6, reproduced below, was designed to measure knowledge of the morphosyntactic form of the past tense. The key is circled.

6) I presented my research paper in class after David ______________________.
   a) presented  b) will present  c) has presented  d) was presenting

The results show that more test takers chose C than the key, A, for this item. This might indicate that the test takers did not yet fully understand the distinction between the past perfect tense and the present perfect tense, since "had presented" (past perfect) would be a correct answer, while choice C, "has presented" (present perfect), would not. They might have chosen C because of the similarity in form between "has presented" and "had presented". Distractors B and D were not attractive enough, since none of the test takers chose B and only one low performer chose D. Distractor B might have been too obviously wrong. Though the item was deleted from the test, if it were to be revised, a better distractor would be "has been presenting" or "had been presenting". The most attractive choice, C, discriminated better than the key, A, and this seems to be the major problem with this item.

A distractor analysis was also conducted for item 12. Table 11 shows the analysis of each answer choice.

Table 11
Distractor Analysis for Item 12 (N=16)

Answer Choice   Frequency   High (N=4)   Low (N=4)
A               0           0            0
B               3           1            1
C (Key)         11          3            3
D               2           0            0

Item 12, reproduced below, was designed to measure knowledge of the morphosyntactic meaning of the past perfect tense. The key is choice (c).

(earlier text omitted here) I had also thought about other research areas by the time I chose this topic [World Englishes].

12) Gloria thought about different topics ____________ World Englishes as her topic.
a) after she chose
b) when she chose
c) before she chose
d) while she was choosing

The results show that most of the test takers chose the key, choice C. The major problem with this item is that none of the choices, including the key, discriminated high performers from low performers. Since no one chose Distractor A, our first instinct was to replace after with another subordinator. However, there seemed to be no reasonable replacement. We therefore decided to create two choices beginning with when and two beginning with before. It is suggested that Distractor A be revised to before she had chosen and that Distractor D be replaced by when she had chosen. It is hoped that the amended distractors will attract the low-performing students and increase the D-index for this question.
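The counts in Tables 10 and 11 can be checked with a short calculation. The sketch below is only an illustration: it applies an upper-lower difference, (high - low) / group size, to every answer choice, which is an assumed way of quantifying how each choice separates the two groups and not a statistic reported in the paper.

```python
# Counts copied from Tables 10 and 11: (frequency, high group, low group).
ITEM_6 = {"A": (7, 2, 2), "B": (0, 0, 0), "C": (8, 2, 1), "D": (1, 0, 1)}   # key: A
ITEM_12 = {"A": (0, 0, 0), "B": (3, 1, 1), "C": (11, 3, 3), "D": (2, 0, 0)}  # key: C


def option_discrimination(table, group_size=4):
    """Upper-lower difference for each answer choice (assumed index)."""
    return {
        choice: (high - low) / group_size
        for choice, (_frequency, high, low) in table.items()
    }


print(option_discrimination(ITEM_6))
print(option_discrimination(ITEM_12))
```

For item 6 the key scores 0.0 while distractor C scores 0.25, which matches the observation that C discriminated better than the key. For item 12 every choice scores 0.0, which matches the observation that no choice separated high from low performers.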

E. Evidence of Construct Validity within the Grammar Task
Construct validity refers to the degree to which a test measures the construct it is intended to measure. The grammar task in the present study was designed to measure grammatical knowledge by measuring test takers' knowledge of grammatical form and meaning, following Purpura's (2004) model. There were several items on the test for each item type. By examining the degree to which these item types relate to one another, inferences can be made about whether or not the items are all measuring the same underlying construct of grammatical knowledge. Using SPSS, the correlation between the two variables, form and meaning, was calculated. The Pearson product-moment procedure was used since the variables, the average score on each component, were on an interval scale. The revised version of the grammar task, which excluded items 3, 4, 6, 9, 12, 13, 18 and 20 and therefore had a total of 12 items, was used for the correlation. The coefficients are presented in the correlation matrix in Table 12.

Table 12
Correlation Matrix between Two Components of the Grammar Task

            Meaning    Form
Meaning     1          .585*
Form        .585*      1

* Correlation is significant at the 0.05 level

As shown in Table 12, a correlation of .585 was found between meaning and form. This positive correlation was moderate and statistically significant at the .05 level, which indicates that a correlation of this size would occur by chance less than 5% of the time if meaning and form were in fact unrelated. A positive correlation is expected because, according to the theoretical model of grammatical knowledge, meaning and form are closely related. The statistically significant correlation suggests that the test items are measuring the same underlying construct, namely grammatical knowledge, and so provides some evidence of construct validity.
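The significance level reported for this correlation can be verified by hand. The sketch below converts r to a t statistic with n - 2 degrees of freedom and compares it with standard two-tailed critical values for 14 degrees of freedom. It assumes that all 16 test takers entered the correlation; the function name is illustrative.

```python
import math


def t_for_r(r, n):
    """t statistic for testing a Pearson r against zero, with df = n - 2."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Two-tailed critical values of t for df = 14, from standard t tables.
T_CRIT_DF14 = {0.05: 2.145, 0.01: 2.977}

t = t_for_r(0.585, 16)  # assumes N = 16 for the form-meaning correlation
print(round(t, 2))
print("significant at .05:", t > T_CRIT_DF14[0.05])
print("significant at .01:", t > T_CRIT_DF14[0.01])
```

Under that assumption the statistic falls between the two critical values, which agrees with the single asterisk in Table 12: significant at the .05 level but not at the .01 level.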

5.2. RESULTS FOR THE WRITING TASK
A. Descriptive Statistics
Descriptive statistics were calculated to explore the participants' performance on the extended-production writing task. There was only one prompt for the writing section, and students' responses were scored using an analytic rubric comprising three domains: Content Control, Organization Control, and Language Control. For each criterion, the highest possible rating is 4 and the lowest is 0. A total of 16 students (N=16) sat for the test. The students' responses were marked independently by two raters, and the two raters' scores were averaged to derive the score that each student actually received. For administrative purposes, the maximum possible score (Writing Total) that a student could obtain on this test was 12, derived by adding up the scores for all three criteria. In this paper, however, we report the Writing Average in order to keep the numbers on the same scale as the rubric. Hence, the maximum for the Writing Average was 4. The minimum score was 1.83 and the maximum was 4, yielding a range of 3.17. The mean of 2.67 and the median of 2.58 were very close. The standard deviation was 0.577. A kurtosis of 0.52 and a skewness of 0.53 fell within the acceptable range of ±2.5. These figures indicate a near-normal distribution. The statistics are presented in Table 13.

Table 13
Descriptive Statistics for the Writing Task

                          N     Mean    Median    SD      Range    Min. Poss. Score    Max. Poss. Score    Kurtosis    Skewness
Content Control Ave       16    2.22    2.25      1.00    4        1                   4                   -1.18       0.05
Organization Control Ave  16    2.91    3.00      0.58    3        2                   4                   -0.38       -0.16
Language Control Ave      16    2.88    3.00      0.59    3        2                   4                   -0.55       0.00
Writing Ave               16    2.67    2.58      0.58    3.17     1.83                4                   0.52        0.53

Interestingly, the skewness for the language control average is zero, which indicates a symmetric distribution. The skewness values for the other two components approximate zero as well. Organization control yielded the only negative skewness (-0.163) and the highest mean, showing that the students performed best on organization control, followed by language control and content control. This could be because mastery of organizational control is slightly easier than the demonstration of grammatical performance. Since organizational control relates to the structure of the essay and the progression of ideas, students are able to score quite well on this component as long as they make effective use of an introduction, a conclusion and transitional strategies. However, performance on the three components does not seem to differ greatly. The skewness for the writing task as a whole was slightly positive. For an achievement test, we would expect a negatively skewed distribution as an indication that the students have mastered the skills taught. However, a few students misinterpreted the question and went off topic by discussing why all students should learn English as a second language instead of the learning of a second language in general. This could have affected their scores for content even though they did reasonably well for organization and language control. A total of 5 students scored 1 for the content component, which made the average score for content (2.22) much lower than that for organization (2.91) and language control (2.88). The standard deviation of the content control average is also the greatest. Evidently, the students performed much better on organization and language control, given that 11 and 10 students respectively scored above the average score for these two domains. The difference of 0.03 between the average scores for organization and language control is negligible. This may be an indication that organization and language control are more closely related to each other than to content control, which pertains to students' topical knowledge and critical thinking ability. With regard to dispersion, the kurtosis of each of the three components was negative, indicating platykurtic (flatter than normal) distributions. The kurtosis for content control (-1.18) was the lowest. This was expected because a handful of students misunderstood the prompt, as mentioned above. The slightly positive kurtosis of 0.517 for the writing average indicates that the scores were more or less clustered in the middle and that variability was limited. Had some students not misinterpreted the question, the kurtosis might have been greater. Although the students came from diverse backgrounds and varied in their length of stay in the United States and in the length of time they had studied English, their overall performance on the writing task was relatively homogeneous. This could be attributed to the instructor's success in teaching the argumentative essay.
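As an illustration of how skewness and kurtosis values like those in Table 13 are obtained, the sketch below implements the bias-adjusted sample formulas commonly used by statistical packages. Whether the authors' SPSS output used exactly these formulas is an assumption, and the score list is hypothetical, not the study data.

```python
import statistics


def sample_skewness(xs):
    """Bias-adjusted sample skewness; 0 for a symmetric distribution."""
    n = len(xs)
    mean = statistics.mean(xs)
    sd = statistics.stdev(xs)  # sample SD (n - 1 in the denominator)
    z3 = sum(((x - mean) / sd) ** 3 for x in xs)
    return n / ((n - 1) * (n - 2)) * z3


def sample_excess_kurtosis(xs):
    """Bias-adjusted excess kurtosis: negative = platykurtic, positive = leptokurtic."""
    n = len(xs)
    mean = statistics.mean(xs)
    sd = statistics.stdev(xs)
    z4 = sum(((x - mean) / sd) ** 4 for x in xs)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * z4 - correction


# Hypothetical domain averages for eight test takers.
scores = [1.0, 1.5, 2.0, 2.5, 2.5, 3.0, 3.5, 4.0]
print(sample_skewness(scores), sample_excess_kurtosis(scores))
```

A flat, evenly spread set of scores such as 1, 2, 3, 4, 5 gives a skewness of 0 and a negative excess kurtosis, the platykurtic pattern seen in the three domain averages.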

B. Internal-Consistency Reliability and Standard Error of Measurement
The internal-consistency reliability and the Standard Error of Measurement (SEM) of the writing task were calculated. For each domain, namely content control, language control and organizational control (k=3), the students obtained scores ranging from 1 to 4 points. The scores awarded by the two raters for each domain were averaged and treated as items, which were then used to estimate the internal-consistency reliability. The Cronbach's alpha obtained was 0.655, as shown in Table 14 below.

Table 14
Reliability Statistics for the Writing Task (N=16)

Cronbach's Alpha    k (# of items)
.655                3
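For readers who want to see how an alpha of this kind is obtained when the three domain averages are treated as items, here is a minimal sketch using the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The scores are hypothetical and will not reproduce the .655 reported in Table 14.

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha.

    `items` is a list of k lists; each inner list holds one item's scores
    for the same n test takers, in the same order.
    """
    k = len(items)
    sum_item_variances = sum(statistics.variance(col) for col in items)
    totals = [sum(row) for row in zip(*items)]  # total score per test taker
    return k / (k - 1) * (1 - sum_item_variances / statistics.variance(totals))


# Hypothetical domain averages for six test takers.
content = [1, 2, 3, 4, 2, 3]
organization = [2, 3, 3, 4, 3, 3]
language = [2, 2, 3, 4, 3, 3]

alpha = cronbach_alpha([content, organization, language])
print(round(alpha, 3))
```

With only three items, alpha is sensitive to any one domain behaving differently from the others, which is relevant here because the content scores were depressed for students who went off topic.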

Considering that internal-consistency reliability ranges from 0 to 1.00, a Cronbach's alpha of .655 is moderately high. It indicates that 65.5% of the test score variance can be accounted for by true score variance, leaving approximately 34.5% attributable to measurement error, or error variance. Given its purpose, the test in the present study could have been more consistent and reliable. The test could be subject to the following sources of error variance. First, students could suffer from test anxiety, as it may be challenging for them to complete all four sections of the test within two hours. Second, there may be differences in the scores given by the raters, which are discussed further under inter-rater reliability. Third, the task may have been too difficult for the students. Besides Cronbach's alpha, the SEM and the 95% confidence interval were also calculated to further examine the consistency of scores. The SEM is useful in determining the confidence interval around an observed score, and it is calculated using the formula shown in Figure 5.


FIGURE 5
Standard Error of Measurement (SEM) Calculation for the Writing Task

$\mathrm{SEM} = S\sqrt{1 - r_{xx}} = 0.58\sqrt{1 - .655} = 0.34$

where $S$ = standard deviation and $r_{xx}$ = reliability estimate for the test

Additionally, a 95% confidence interval (2 SEMs) of ±0.68 was determined, indicating that there is a 95% chance that test takers' scores would fall within 0.68 points of their observed score should they take the writing test again. This range is fairly broad, but it could still be within reasonable limits for the reliability of this test.
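The SEM and the confidence band can be reproduced from the reported standard deviation (0.58) and Cronbach's alpha (.655). The sketch below uses the usual formula SEM = S × sqrt(1 − r_xx) and the paper's convention of 2 SEMs for the 95% band. The example observed score of 2.67 is simply the group mean, used for illustration.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = S * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1 - reliability)


def confidence_band(observed, sem, n_sems=2):
    """Band of n_sems SEMs around an observed score (2 SEMs ~ 95%)."""
    return observed - n_sems * sem, observed + n_sems * sem


sem = standard_error_of_measurement(0.58, 0.655)
low, high = confidence_band(2.67, sem)  # 2.67 is the mean Writing Average

print(round(sem, 2))                    # 0.34
print(round(2 * sem, 2))                # 0.68
print(round(low, 2), round(high, 2))    # 1.99 3.35
```

On a 0 to 4 rubric scale, a band of roughly 1.4 points in total covers more than one rubric level, which is why the range is described as fairly broad.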

C. Inter-Rater Reliability
Because the writing task responses were marked by two raters, it is important to determine the inter-rater reliability, that is, the consistency of measurement between Rater 1 (R1) and Rater 2 (R2). Their scores for content control, organization control and language control were correlated domain by domain using the Spearman rank-order correlation, since the three domain ratings are ordinal variables. The Pearson product-moment correlation was computed for the writing total of R1 and R2, since these variables are on a continuous scale. For the three individual domains, correlations of .931 for Content Control, .672 for Organization Control and .629 for Language Control were observed, all statistically significant at the .01 level. This implies that correlations of this size would occur by chance less than 1% of the time if the two raters' scores were unrelated. These results can be found in Table 15.

Table 15
Correlation Matrix for Inter-Rater Reliability

                                Rater 2            Rater 2                 Rater 2
                                Content Control    Organization Control    Language Control
Rater 1 Content Control         .931**
Rater 1 Organization Control                       .672**
Rater 1 Language Control                                                   .629**

** Correlation is significant at the 0.01 level

The inter-rater reliability of .931 for Content Control was the highest among the three domains. The correlations for Organization Control and Language Control were .672 and .629 respectively, differing by only .043. Ideally, norming should have been conducted prior to scoring. However, owing to unforeseen circumstances (i.e., the hurricane), the two raters were unable to carry out proper norming beforehand. Both raters therefore proceeded with independent scoring to ensure that the scripts were returned to the students promptly. R1 had the original written responses and R2 had a scanned version. Because of the power outage at R2's apartment after the hurricane, scoring was done under poor lighting. In addition, a couple of students wrote on both sides of the paper, which led to poor-quality scans. These technical issues affected R2's judgment when grading the responses. Hence, after much deliberation, the responses were re-marked once R2 obtained the original responses. This was done to better reflect the students' actual performance so that accurate feedback could be given to them for the benefit of their learning. The eventual inter-rater reliability for Content Control


was the highest, and this could be attributed to the straightforward nature of the argumentative genre, which was also captured in the rubric descriptors. The two raters also shared a good understanding of the requirements delineated in the rubric. Students who went off topic were given a common score of 1 out of 4 for Content Control. Although the inter-rater reliability coefficients for Language Control and Organization Control were close to each other, they were much lower than that for Content Control. This could be a result of discrepancies between the two raters in their understanding of the rubric. There were four sub-points listed under Organization Control and five under Language Control. It was noted that R1 tended to be harsher, awarding lower scores when students did not meet two or more of the criteria listed for these two domains. In comparison, R2 tended to be more lenient in her grading. The difference in the two raters' backgrounds could be a factor. R1 grew up in a bilingual society where she learned a different variety of English from a young age, and R2 is a competent second language learner. Hence, R2 might be more empathetic in her grading as she may better understand the difficulties of learning a second language. The closeness of the coefficients for Organization Control and Language Control, which differed by only .043, may be because both domains pertain to the technical aspects of language and are hence more closely related to each other than to content. Each rater was therefore consistent in her individual judgment of the two domains, resulting in a similar outcome for both. In retrospect, the raters could have discussed the rubric more explicitly to enhance the inter-rater reliability for the various domains. A more rigorous norming procedure could also have been conducted. Nonetheless, the inter-rater reliability for the writing task as a whole was .897 and was statistically significant at the .01 level.
This indicates that the two raters were highly


consistent in their grading, which resulted in a high correlation between them. The results can be found in Table 16.

Table 16
Inter-rater Reliability Based on Total Task Score

                             Rater 2 Writing Ave Score
Rater 1 Writing Ave Score    .897**

** Correlation is significant at the 0.01 level

D. Evidence of Construct Validity within the Extended Production Task

Table 17
Correlation Matrix between Domains of the Writing Task

                          Language Control    Organizational Control    Content Control
Language Control          1.00
Organizational Control    .308                1.00
Content Control           .346                .593*                     1.00

* Correlation is significant at the 0.05 level

Only the correlation between content control and organizational control was statistically significant, indicating that a correlation of this size would occur by chance less than 5% of the time if the two domains were unrelated. The systematic relationship could be due to an overlap between the two domains as described in the scoring rubric. To score well for content, the test taker's position had to be clearly stated and strongly maintained, and the claim had to be introduced and communicated clearly within the context. Such requirements inevitably overlapped with the criteria for organizational control, under which essays had to have an effective introduction and conclusion, demonstrate a logical progression of ideas and make strong connections among ideas. The coefficient of .593 was only moderate. It could have been higher had some of the students not digressed from the given topic, since those students were graded low for content but high for organization. The relationship between content control and language control was not statistically significant. There are two possible reasons for this observation. First, it was possible for students to come up with good content and yet not display a high level of language control. Second, quite a number of test takers misinterpreted the question and hence failed to answer to the point. Thus, even though they scored well for language control, they were penalized for inappropriate content. Similarly, no systematic relationship was found between organizational control and language control. As is evident from the rubric, language control is manifested at the lexical, phrasal or sentence level. It is not closely related to organizational control, which places more emphasis on transitional strategies. Hence, an essay might display effective language control and yet be poorly organized and incoherent.

5.3. Other Evidence of Validity
A. Relationship between the Grammar Task and the Writing Task
The correlation between the grammar and writing scores is shown in Table 18.

Table 18
Correlation Matrix between Grammar and Writing Scores

                 Grammar Score    Writing Score
Grammar Score    1.00
Writing Score    -.145            1.00


There are several reasons why the correlation between the grammar and writing sections was negative and not statistically significant. First, the grammar section was not very well designed. After eight items were removed, the internal-consistency reliability increased from .074 to .666. However, this was still relatively low, and the remaining items need considerable revision. It is believed that the revised items could lead to a higher correlation. In addition, only a limited range of grammatical knowledge was tested in the grammar task, namely tenses and collocations. Moreover, these two areas are largely unrelated, and students performed better on the tense questions than on the collocation questions. Students might have a better grasp of tenses because the teaching and learning of tenses is more rule-based, and the related concepts are easier to understand and apply in context. In contrast, mastery of collocations depends on the students' exposure to the language, especially where the meaning of idioms is concerned. The collocation section proved challenging for the students and affected their overall score for the grammar task. In the writing task, however, students could strategically avoid collocations that they were not familiar with so that they would not be penalized. It is also worth noting that the test developers did not manage to observe the teaching of collocations. Hence, the difficulty of the test items may not have been level-appropriate, and the collocations tested might not have been emphasized in class. This could have led to a poorly defined construct for the collocation section, causing it to resemble a proficiency test rather than an achievement test, and it might have had a pronounced effect on the correlation between the grammar and writing tasks.


The writing task was strictly aligned with the content covered in class. However, the topic unexpectedly appeared to be emotionally charged, as several students wrote specifically about learning English as a second language instead of relating it to the learning of second languages in general. Hence, they performed badly for content, and the overall writing grade was not a true reflection of their writing ability. This in turn weakened the correlation between the two tasks. Although it was hypothesized that a strong correlation exists between grammatical knowledge and writing ability, the test in the present study did not support this hypothesis. However, this could be due to design flaws in the test.

B. Relationships between Nature of Native Language and Overall Performance
Background variables were collected from the needs analysis conducted by the instructor at the beginning of the semester. The test takers' native language was chosen as the variable to be analyzed against their overall test scores. Since there was a variety of native languages within this group of test takers, the languages were divided into Asian and non-Asian languages, which resulted in an equal number of test takers in each group. For this analysis, a point-biserial correlation should have been used, since native language is a nominal variable while total score is a continuous (interval) variable. However, for this study we analyzed the data using the Pearson product-moment correlation due to technical limitations. The result should therefore be interpreted with caution. Table 19 presents the correlation matrix for the relationship between the test takers' native languages (Asian or non-Asian) and overall test scores.


Table 19
Correlation Matrix for Relationship between Native Language and Test Score

                                         Overall Test Score
Native Languages (Asian or Non-Asian)    .920

As shown in Table 19, no statistically significant correlation was found between native language and overall test score. This may be explained by the small number of test takers (16), as the sample might be too small for any significant correlation to be detected. Another reason might be that the grammar test was an achievement test. All of the test takers received the same instruction on the grammar points covered on the test, so the impact of their native languages on the test score might have been weakened. In contrast, if this test had been positioned as a placement test, some correlation might have been found with a larger sample.
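On the choice between the point-biserial and Pearson coefficients, it may help to see that the two are numerically identical when the dichotomous variable is coded 0/1. The sketch below demonstrates this with hypothetical data (eight test takers per group, invented scores) and is not a reanalysis of the study data.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson product-moment correlation."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def point_biserial(groups, scores):
    """Point-biserial r from its textbook formula; `groups` holds 0/1 codes."""
    ones = [s for g, s in zip(groups, scores) if g == 1]
    zeros = [s for g, s in zip(groups, scores) if g == 0]
    n = len(scores)
    sd_n = statistics.pstdev(scores)  # population SD (n in the denominator)
    mean_diff = statistics.mean(ones) - statistics.mean(zeros)
    return mean_diff / sd_n * math.sqrt(len(ones) * len(zeros) / n ** 2)


# Hypothetical data: 1 = Asian L1, 0 = non-Asian L1; scores are invented.
groups = [1] * 8 + [0] * 8
scores = [71, 64, 80, 77, 69, 85, 73, 66, 70, 82, 75, 68, 79, 63, 74, 81]

print(pearson_r(groups, scores), point_biserial(groups, scores))
```

Both calls return the same value, so running a Pearson correlation on a 0/1-coded grouping variable yields the point-biserial coefficient.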


DISCUSSION AND CONCLUSIONS
The aim of this study was to design an achievement test for the Advanced Study level of the CEP at Teachers College, Columbia University. It aimed to explore the nature of the students' grammatical knowledge and writing ability at the Advanced Study level. The study also attempted to investigate the relationship between grammatical knowledge and writing ability in this test, as well as the relationship between test takers' native language (Asian vs. non-Asian) and their performance on the test. An MC grammar task and an extended-production writing task were therefore designed and administered, and a series of statistical analyses was then conducted.


For the MC grammar task, the results revealed a negative skewness, which is characteristic of an achievement test. However, a negative kurtosis was observed, indicating a certain degree of variability in the students' ability. The items in the grammar task that pertained to tenses were relatively easy. Two items were answered correctly by all the test takers, and four items had an item facility of 0.94, indicating that only one student answered each of them incorrectly. The items on collocation were clearly more challenging, with item facility ranging from 0.88 down to 0.06. Only one student answered the last question correctly. The internal-consistency reliability of the grammar task was estimated to be .074, indicating that the grammar task might have been an inaccurate assessment of the students' grammatical knowledge. Upon reviewing the items, we removed a total of eight. Of the remaining 12 items, six were recommended for revision. It is hoped that this will enhance the overall construct validity of the MC task. The correlation between meaning and form was statistically significant, with a coefficient of .585, which is of moderate magnitude. In sum, there is still room to refine the MC task so that it suits its purpose as a classroom achievement test. The results for the writing task, on the other hand, revealed a kurtosis of 0.52 and a skewness of 0.53, which fell within the acceptable range of ±2.5. The internal-consistency reliability was moderately high. The inter-rater reliability coefficients for the three domains, namely content control, organizational control and language control, were all statistically significant. With regard to the total task score, the two raters were highly consistent in their grading, as is evident from the inter-rater reliability of .897, which was statistically significant at the .01 level. There was no strong evidence of construct validity within the extended production task.
Only the relationship between content control and organizational control was reported to be


statistically significant. This was probably due to overlapping content in the rubric. Because a number of students digressed from the given topic, low scores were given for content control, and this in turn affected the correlation matrix of the three domains. In addition, the outcome of the study did not confirm the hypothesis, suggested by the literature, that grammatical knowledge and writing ability are related. However, this could be due to design flaws in the test as well as unforeseen circumstances. There also appeared to be no relationship between the test takers' native language (Asian vs. non-Asian) and their overall test performance. Given the small sample size, the design of this study may be unsuitable for identifying such a relationship. There were several limitations to this study. A sample of 16 test takers is too small to support any generalization. Furthermore, a lack of knowledge of exactly what had been taught in terms of grammar might have led to the relatively low internal-consistency reliability of the test. Neither of the test developers taught the course, and although each test developer observed the class once, this might not have been enough to understand exactly what had been taught by the instructor. More observations might have been necessary for the grammar test items to better reflect what had been taught. As for the writing task, given the varying experience of the test takers, the topic had to be generic enough for all students to be able to respond. The topic also had to be aligned with the theme of education as taught in class. This made it rather challenging for the test developers to design an appropriate question. It was unexpected that a couple of students ended up discussing the importance of learning English as a second language instead of learning second languages in general.
In hindsight, a less emotionally charged topic could have been chosen.


Furthermore, the norming process for the writing task could have been conducted more carefully and thoroughly to increase inter-rater reliability. In addition, owing to administrative constraints, the test takers had to sit for the grammar and writing tests along with the reading and listening tests within two hours. This could have caused fatigue and anxiety among the students and threatened student-related reliability. For future administrations of the CEP tests, the students could perhaps complete the four tests over two different days. In conclusion, it is anticipated that resolving the issues mentioned above would enhance the overall quality of the test tasks so that the test can better serve as a classroom achievement test.


REFERENCES
Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford, UK: Oxford University Press.
Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice. Oxford, UK: Oxford University Press.
Brown, H. D. (2004). Language assessment: Principles and classroom practices. New York: Pearson.
Canale, M., & Swain, M. (1980). Theoretical bases of communicative approaches to second language teaching and testing. Applied Linguistics, 1, 1-47.
Carroll, J. B. (1968). The psychology of language testing. In A. Davies (Ed.), Language testing symposium: A psycholinguistic approach (pp. 46-69). London: Oxford University Press.
Fathman, A. K., & Whalley, E. (1990). Teacher response to student writing: Focus on form versus content. In B. Kroll (Ed.), Second language writing: Research insights for the classroom (pp. 178-190). New York: Cambridge University Press.
Hamp-Lyons, L. (1990). Second language writing: Assessment issues. In B. Kroll (Ed.), Second language writing: Research insights for the classroom (pp. 69-87). New York: Cambridge University Press.
He, L., & Shi, L. (2012). Topical knowledge and ESL writing. Language Testing, 29, 443-464.
Hyland, K. (2002). Teaching and researching writing. Harlow, UK: Pearson.
Hyland, K. (2003). Second language writing. Cambridge, UK: Cambridge University Press.
Lado, R. (1961). Language testing. New York: McGraw-Hill.
Larsen-Freeman, D. (1991). Teaching grammar. In M. Celce-Murcia (Ed.), Teaching English as a second or foreign language (pp. 279-296). Boston, MA: Heinle & Heinle Publishers.
Mohammad, F. (2008). The relationship between writing competence, language proficiency, and grammatical errors in the writing of Iranian TEFL sophomores (Doctoral dissertation). Universiti Sains Malaysia, Malaysia.
O'Dell, F., & McCarthy, M. (2008). English collocations in use: Advanced. Cambridge, UK: Cambridge University Press.
Oller, J. W., Jr. (1979). Language tests at school. London: Longman.
Purpura, J. (2004). Assessing grammar. Cambridge, UK: Cambridge University Press.
Purpura, J. (2008). Assessing communicative language ability: Models and their components. In E. Shohamy & N. H. Hornberger (Eds.), Encyclopedia of language and education (2nd ed.): Vol. 7. Language testing and assessment (pp. 53-68). New York: Springer Science + Business Media LLC.
Raimes, A. (1983). Techniques in teaching writing. Oxford, UK: Oxford University Press.
Rea-Dickins, P. M. (1991). What makes a grammar test communicative? In J. C. Alderson & B. North (Eds.), Language testing in the 1990s: The communicative legacy (pp. 112-131). New York: Harper Collins.
Rutherford, W. E. (1987). Second language grammar: Learning and teaching. New York: Longman.
Shaw, S. D., & Weir, C. J. (2007). Examining writing. Cambridge, UK: Cambridge University Press.
Silva, T. (1993). Toward an understanding of the distinct nature of L2 writing: The ESL research and its implications. TESOL Quarterly, 27, 657-677.
Weigle, S. C. (2002). Assessing writing. Cambridge, UK: Cambridge University Press.
Weir, C. J. (2005). Language testing and validation: An evidence-based approach. Basingstoke, UK: Palgrave Macmillan.


Appendix A: Test Task Specifications

Task 1: Grammar (20 multiple choice questions)
Task 2: Writing (writing an argumentative essay)

SETTING
Physical characteristics
  Task 1: Location: One classroom at Teachers College, Columbia University. Noise level: Low. Temperature: Comfortable. Seating conditions: Two or three students at a long table. Lighting: Well lit. Materials: Pen or pencil, test packet.
  Task 2: Same as Task 1
Participants
  Task 1: Students and a teacher
  Task 2: Same as Task 1
Time of Task
  Task 1: During class time: 10 AM to 12 PM on October 25th. Grammar Task: 20 minutes. Writing Task: 40 minutes. The total test duration is two hours (including the listening and reading tests not covered here). Hence, the above time allocation is recommended for the two tasks respectively.
  Task 2: Same as Task 1

INPUT
Format
  Task 1: Channel: Visual. Form: Language. Language: Target (English). Length: Short 20 questions, Medium for Part 2 - Email. Type: Multiple-choice items. Speededness: Generally unspeeded. Vehicle: Reproduced (printed).
  Task 2: Channel: Visual. Form: Language. Language: Target (English). Length: Short 3-sentence prompt. Type: Prompt. Speededness: Generally unspeeded. Vehicle: Reproduced (printed).
Language characteristics
 Organizational characteristics
  Grammatical
    Task 1: Vocabulary: Range of general vocabulary. Morphosyntax: Range of sentence structure. Graphology: Typed.
    Task 2: Same as Task 1
  Textual
    Task 1: Cohesion: Input across questions is loosely related
    Task 2: Cohesion: Cohesive prompt
 Pragmatic characteristics
  Functional
    Task 1: Ideational
    Task 2: Same as Task 1
  Sociolinguistic
    Task 1: Dialect: Standard. Register: Formal. Naturalness: Natural. Cultural reference: None. Figurative language: None.
    Task 2: Same as Task 1
Topical characteristics
  Task 1: Restricted: Theme - Education
  Task 2: Same as Task 1

EXPECTED RESPONSE
Format
  Task 1: Channel: Visual. Form: Non-language. Language: N/A. Length: Short 20 multiple choice items. Type: Selected response. Speededness: Generally unspeeded.
  Task 2: Channel: Visual. Form: Language. Language: Target (English). Length: Medium 3 to 4 paragraphs. Type: Extended production. Speededness: Generally unspeeded.
Language characteristics
 Organizational characteristics
  Grammatical
    Task 2: Vocabulary: Range of general vocabulary. Morphosyntax: Range of sentence structure. Graphology: Handwritten.
  Textual
    Task 2: Cohesion: Cohesive content
 Pragmatic characteristics
  Functional
    Task 2: Ideational and heuristic
  Sociolinguistic
    Task 2: Dialect: Standard. Register: Formal. Naturalness: Natural. Cultural reference: Varied. Figurative language: Minimum.
Topical characteristics
  Task 1: N/A
  Task 2: Restricted: Theme - Education

RELATIONSHIP BETWEEN INPUT AND RESPONSE
Reactivity
  Task 1: Non-reciprocal
  Task 2: Same as Task 1
Scope of relationship
  Task 1: Narrow
  Task 2: Broad
Directness of relationship
  Task 1: Direct
  Task 2: Indirect (opinions and background knowledge are required)


Appendix B: Outline of Participation

Phase / Description: Name of Contributor

Preparation
Meeting with CEP Teachers & Subsequent Communication: Aya Tsuchie & Rongchan Lin
Lesson Observation (one lesson each): Aya Tsuchie & Rongchan Lin

Test Design
Design of Test: Aya Tsuchie (Tenses); Rongchan Lin (Collocation & Writing)

Writing of Paper
Introduction: Rongchan Lin
Literature Review, Grammar: Aya Tsuchie
Literature Review, Writing: Rongchan Lin
Literature Review, Writing and Grammar: Rongchan Lin
Test Construction, Target Language Use Domain: Rongchan Lin
Test Construction, Design Statement: Aya Tsuchie
Test Construction, Operationalization: Aya Tsuchie
Test Construction, Item Coding for the MC Section: Aya Tsuchie
Test Construction, Administration Procedures: Rongchan Lin

Scoring Procedures
Design of Writing Rubrics: Rongchan Lin
Marking: Aya Tsuchie & Rongchan Lin

Administrative Matters
Printing of Papers: Rongchan Lin (as she lives nearer to school)
Collection and Return of Scripts: Aya Tsuchie & Rongchan Lin
Feedback to the Class: Aya Tsuchie & Rongchan Lin

Running of SPSS; Test Analysis and Results (MC Grammar Section, Writing Task, Other Evidence of Validity); Discussion and Conclusion (Purpose of Paper and Limitations): Aya Tsuchie & Rongchan Lin

Notes:

We would like to express our heartfelt appreciation to Dr. Grabowski and Ms. Saerhim Oh for their strong support. We are grateful to Dr. Grabowski for her valuable feedback on refining our test and for her patience in answering our queries throughout the process. In the earlier phase of this project, unforeseen circumstances made it difficult to confirm which CEP teachers we should collaborate with. Many thanks to Saerhim for her prompt advice and for her help in connecting us with an alternative CEP teacher.