Journal of Second Language Writing

journal homepage: www.elsevier.com/locate/jslw
Use of lexical features in non-native academic writing

Sonca Vo⁎
University of Foreign Language Studies - the University of Danang, Danang, Vietnam
ARTICLE INFO
Keywords: Lexical analysis; Lexical bundles; Academic writing
ABSTRACT
Second language writing research has often analyzed written discourse to provide evidence on learner language development;
however, single word-based analyses have been found to be insufficient in capturing learner language development (Read & Nation,
2006). This study therefore utilized both single word-based and multi-word analyses. Specifically, it explored vocabulary
distributions and lexical bundles to better understand the development of writing proficiency across three levels in an English
Placement Test corpus (EPT) (N = 1388). Inference tests for multiple population proportions were conducted to compare statistical
differences in the proportions of vocabulary and lexical bundle distributions across all group levels. The results suggested that
higher proficiency learners used a higher number of types, tokens, and word families than lower proficiency learners. Regarding
lexical bundles, noun phrase-based and verb phrase-based bundles with referential and stance functions were significantly more
frequent in lower-level responses, whereas preposition phrase-based bundles were significantly more frequent in higher-level written
discourse. This study suggests the importance of vocabulary and lexical bundles in academic writing, the necessity of including the
features identified in this study in a second language writing curriculum, and the need to incorporate the prevalent features into
rating scales for assessments of academic writing.
1. Introduction
Being proficient in academic writing is crucial since it is one of the key measures of academic success (Kellogg & Raulerson, 2007).
Among the many features of second language (L2) writing, lexical knowledge can be a significant indicator of developmental stages of
writing ability. Engber (1995), for example, found a positive relationship between lexical diversity (e.g., the type-token ratio)
and L2 writing proficiency: essays with more diverse lexical items tended to be awarded higher scores. In addition,
Santos (1988) showed that lexical errors could affect the judgments of essays, since raters’ evaluations were primarily based on
whether the lexicon was used correctly or whether a variety of lexical resources was included in the essays. Lexical sophistication is
another indicator of writing proficiency. Less frequent words are considered sophisticated, whereas more frequent words are
considered less sophisticated (Kyle & Crossley, 2015). Research showed that L2 writers at higher proficiency used less frequent words
than their less proficient counterparts (Crossley & McNamara, 2012). Therefore, lexical analysis of learner language is an important
area of second language acquisition (SLA), and numerous studies analyzed different features of learner language in order to provide
measures of language development (Biber & Gray, 2013; Ferris, 1994; Grant & Ginther, 2000). The wide range of lexical features that
received much attention includes lexical specificity (Grant & Ginther, 2000); word length and special lexical classes (Ferris, 1994);
lexical density (Read & Nation, 2006); mean word length and type/token ratio (Cumming et al., 2005); and lexical frequency
distributions (Biber & Gray, 2013). These studies showed that vocabulary frequency measures could be useful indicators of the
development of learner language at different levels.
⁎ Present address: Department of English, University of Foreign Language Studies - the University of Danang, Vietnam.
E-mail address: vtsca@ufl.udn.vn.
https://doi.org/10.1016/j.jslw.2018.11.002
Received 20 June 2018; Received in revised form 19 November 2018; Accepted 26 November 2018
However, analyses of individual words are not robust enough to fully measure learner language development, since lexis is not
just related to single words but involves multi-word units (Read & Nation, 2006). Therefore, recent research paid more attention to
formulaic language that is learned and encoded in the brain as chunks rather than individual words (Nation, 2013). For example, a
few studies focused on lexical bundles (Biber & Gray, 2013) or collocational density (Howarth, 1998). Those studies found that
formulaic sequences might provide clearer evidence of learner language progression. Nevertheless, as most L2 vocabulary acquisition
research has focused on comparing native and nonnative written discourse rather than on learner language across proficiency levels
(Staples, Egbert, Biber, & McClair, 2013), more studies on learner lexical development in terms of individual words and lexical
bundles are needed (Read & Nation, 2006).
Therefore, the current study is distinguished from previous studies by its focus on vocabulary profiles and lexical bundles of
non-native writing samples across proficiency levels. Lexical bundles are defined as the most frequent recurring lexical sequences in a
register (Biber, Johansson, Leech, Conrad, & Finegan, 1999). They are usually neither complete grammatical structures nor idiomatic, but
they function as basic building blocks of discourse (Biber et al., 1999). Together with vocabulary distribution analysis, lexical bundles
are chosen in this study because “using well-tried expressions in appropriate places” (Biber et al., 1999, p. 990) is important in
helping L2 learners sound natural in their second language (Vidakovic & Barker, 2010). Therefore, the current study examines how
lexical frequency distributions as well as the frequency, structure and functions of lexical bundles are used in written responses across
proficiency levels in order to provide an insight into learners’ L2 lexical development.
2. Literature review
2.1. Vocabulary profiles in academic writing
In the two studies investigating lexical variation in learners’ written output for the Main Suite exams, which are Cambridge ESOL’s
core General English exams, Shaw and Weir (2007) showed that higher proficiency learners produced more tokens and types than
lower-level counterparts. Moreover, these studies found that discourse by higher proficiency learners included content words with a
higher information load than discourse by lower proficiency learners. Vidakovic and Barker (2010) also supported these findings by
showing the same patterns in both spoken and written responses for the five Common European Framework of Reference for
Languages (CEFR) levels.
Several other studies investigated a variety of linguistic characteristics of the Test of English as a Foreign Language Internet-Based
Test (TOEFL iBT) written and spoken discourse. Examining the written independent and integrated responses from 36 TOEFL iBT test-
takers, Cumming et al. (2005) found significant differences across levels for length of response, lexical diversity, T-unit (clause)
length, grammatical accuracy, integration of source materials, and paraphrasing. Later, Biber and Gray (2013) conducted a larger-
scale discourse analysis of 480 TOEFL iBT written and spoken performances. Biber and Gray found that 80–85% of words belonged to
the top 1000 words from the General Service List (GSL) (West, 1953). Moreover, they found that higher-level responses used fewer of
the most frequent words (GSL 1K words) and more of the less common words (GSL 2K words) and Academic Word List (AWL)
(Coxhead, 2000) words than lower-level responses.
2.2. Lexical bundles in academic writing
2.2.1. Frequency of lexical bundles

Previous studies investigated differences in the frequencies of lexical bundles by native English writers and non-native English
writers (DeCock, 2000; Römer, 2009). DeCock (2000) compared the type and token frequencies of lexical bundles in native English
(NS) and nonnative English (NNS) writing, and found that more two- to four-word bundle tokens were employed in NNS undergraduate
writing than in NS undergraduate writing. DeCock added that compared to NS writing, some bundles were very frequent in
NNS writing while other bundles were less frequent. In contrast, Römer (2009) found few differences between four-word bundles in
native and advanced non-native English writing, but both native and advanced non-native English student writers lacked very similar
sets of expert academic English bundles. Römer indicated that experience with academic writing is more important to the frequency
of lexical bundles than the first language of the writers, and indicated both groups of students might need similar training with their
academic writing so that they can become more proficient writers.
Other studies focused on differences in the frequency of bundles in NNS academic writing across proficiency levels in order to
provide evidence of learner lexical development. Read and Nation (2006) revealed that higher proficiency level test-takers (IELTS
Band 8) produced more formulaic multiword strings than other band test-takers. Similarly, investigating learner lexical development
in the written discourse of Cambridge ESOL’s Skills for Life test, Vidakovic and Barker (2010) suggested that lexical bundles were
rarely used in the written discourse of low proficiency learners, whereas the types and tokens of lexical bundles increased in the
discourse of intermediate and advanced proficiency learners. However, the results of Read and Nation (2006) and Vidakovic and
Barker (2010) differ from the findings of Staples et al.’s (2013) study, in which more lexical bundle tokens were observed in low-level
TOEFL iBT written responses than in higher-level responses, although most of those bundles in low-level discourse were prompt-
dependent. These differences might be a result of the different task types in those three standardized tests because different task types
can lead to different patterns in learners’ discourse (Cumming et al., 2005).

Table 1
Structures of lexical bundles. Adapted from Chen and Baker (2010, p. 35).
Category    Pattern                                                            Example
NP-based    (1) NP with post-modifier fragment                                 the nature of the
PP-based    (2) PP + noun phrase fragment                                      as a result of
VP-based    (3) copula be + NP/adjective phrase                                is one of the
            (4) VP with active verb                                            has a number of
            (5) anticipatory it + VP/adjective phrase + (complement clause)    it is possible to
            (6) passive verb + PP fragment                                     is based on the
            (7) VP + that-clause fragment                                      should be noted that
            (8) verb/adjective + to-clause fragment                            are likely to be
Others      (9) others                                                         as well as the
Note. NP = noun phrase; PP = prepositional phrase; VP = verb phrase.
2.2.2. Structures of lexical bundles

In addition to the studies on frequencies of lexical bundles, a number of studies have also investigated the structures of lexical
bundles across registers (Biber et al., 1999; Biber, Conrad, & Cortes, 2004) and across writers’ language backgrounds (Chen & Baker,
2010). In the conversation register, lexical bundles mainly take the form of clauses (I don’t know why), whereas those in academic prose
are mostly NPs and PPs (the nature of the, as a result of) (Biber et al., 1999). Biber et al. (1999) grouped lexical bundles prevalent in
academic prose into twelve structural categories: 1) NP with of-phrase fragment (the base of the); 2) NP with other post-modifier
fragment (the way in which); 3) PP with embedded of-phrase fragment (about the nature of); 4) other PP fragment (between the two
groups); 5) anticipatory it + VP/AdjP (it is possible to); 6) passive verb + PP fragment (be found in the); 7) copula be + NP/AdjP (is one
of the); 8) VP + that-clause fragment (should be noted that); 9) verb/adjective + to-clause fragment (are likely to be); 10) adverbial
clause fragment (as we have seen); 11) pronoun/NP + be (this is not the); and 12) other expressions (as well as the) (pp. 1014–1015).
Later, Biber et al. (2004, p. 381) grouped lexical bundles into three main categories: lexical bundles that incorporate 1) VP fragments
(is based on the), 2) dependent clause fragments (that this is a), and 3) NP and PP fragments (the way in which), and showed that phrasal
bundles are most common in academic research articles and university textbooks.
Chen and Baker (2010) re-organized Biber et al.’s (1999) twelve structural categories of lexical bundles in academic prose into
three main categories: NP-based, PP-based, and VP-based bundles as shown in Table 1 according to the prevalent patterns occurring in
academic writing by native English novice and expert writers and non-native English writers in their study. NP-based bundles refer to
NPs with post-modifier fragments (the nature of the) (which are relevant to categories 1 and 2 identified in Biber et al., 1999 and
category 1 identified in Biber et al., 2004). PP-based bundles include bundles starting with a preposition and an NP fragment (as a
result of) (which are related to categories 3 and 4 identified in Biber et al., 1999 and category 3 in Biber et al., 2004). VP-based
bundles refer to bundles starting with a verb component (see patterns 3 to 8 in Table 1, which are associated with categories 5,
6, 7, 8, and 9 in Biber et al., 1999 and relevant to categories 1 and 2 in Biber et al., 2004). Chen and Baker also added an ‘others’ category
for the bundles that have grammatical structures different from the three main categories above (my dream is to) (which are related to
bundles in categories 10 and 12 in Biber et al., 1999 and to several sub-parts of the categories 1 [e.g., third person pronoun + VP
fragments: that’s one of the], 2 [e.g., if-clause fragments: if you want to], and 3 [e.g., comparative expressions: as well as the] in Biber
et al., 2004). Comparing lexical bundles in written discourse by the three groups of writers, Chen and Baker showed that native
English experts wrote more NP-, PP-, and VP-based bundles than non-native and native English students, and among these three types
of structures, NP-based and PP-based bundles were used more than VP-based bundles across both native and non-native writing.
However, little is known about how lexical bundle structures are employed by non-native writers at different proficiency levels.
Given this gap in the literature, this study adapted Chen and Baker’s (2010) structural framework of lexical bundles in academic
writing, a re-categorization of the twelve structural categories proposed by Biber et al. (1999) for academic prose, for the structural
bundle analysis of the EPT written responses. Chen and Baker’s framework was chosen over those of Biber et al. (1999) and Biber et al.
(2004) because the current study compared the structure of bundles produced by learners across proficiency levels. This focus is
closer to that of Chen and Baker, who compared bundle structures in student and expert academic writing, than to that of Biber
et al. (1999) and Biber et al. (2004), who focused on comparisons of lexical bundle structure across registers.
2.2.3. Functions of lexical bundles

Apart from the frequency and structures of lexical bundles, several other studies also examined functional classification of lexical
bundles in academic writing (Biber & Gray, 2013; Biber et al., 2004; Chen & Baker, 2010; Hyland, 2008; Nesi & Basturkmen, 2006;
Staples et al., 2013; Vidakovic & Barker, 2010). Most of these studies adopted the taxonomy from Biber et al.’s (2004) study in which
lexical bundles are classified into three main functions: stance expressions, discourse organizers, and referential expressions. Stance
expressions “express attitudes or assessments of certainty that frame some other proposition” (Biber et al., 2004, p. 384). The
following examples extracted from the materials in this current study include epistemic stance in Example 1 (are more likely to) and
attitudinal/modality stance as in Example 2 (it is important to). The illustrations of the bundles are bolded in the examples.

(1) According to the recent reasearches people who eat too much fast food are more likely to get heart disease or be fat. (Epistemic stance,
EPT 101B)
(2) It is important to remember that even though it may be a cool invention to be able to chat through your phone, social skills are still
very much needed in order to work in certain business or corporations. (Attitudinal/modality stance, EPT Pass)
Discourse organizers “reflect relationships between prior and coming discourse” (Biber et al., 2004, p. 384). ‘On the other hand’ in
Example 3 below is one type of discourse organizing bundle.
(3) They never let their children watch TV. On the other hand, some people refute such criticisms on TV and say that TV is useful and
good for education. (Discourse organizer, EPT 101C/D)
Referential expressions “make direct reference to physical or abstract entities, or to the textual context itself, either to identify the
entity or to single out some particular attribute of the entity as especially important” (Biber et al., 2004, p. 384). Examples include
identification/focus (one of the most, Example 4), imprecision (or something like that, Example 5), specification of attributes (a lot of
the, Example 6), and time/place/text reference (in the United States, Example 7).
(4) The Internet has developed to a great extent and has become one of the most powerful communication tools for almost every purpose
over the past decade. (Referential expression – identification/focus, EPT Pass)
(5) Nowdays, students waste tons of money and time on their outfit everyday. They have to figure what kind of T-shirt goes with jeans or
something like that. (Referential expression – imprecision, EPT 101C/D)
(6) A lot of the medical treatment doctors use is to prolong the life but not curing the patient. (Referential expression – specification of
attributes, EPT 101C/D)
(7) In the past few years, distance learning has been developed in the United States. (Referential expression – place reference, EPT 101B)
Like the structure of lexical bundles, the function of lexical bundles has been found to vary according to register and
writers’ language background (Biber et al., 2004; Hyland, 2008; Nesi & Basturkmen, 2006). Research on bundle functions in a
variety of registers has shown that there were more stance and discourse organizing bundles in classroom teaching than in conversation,
and that more referential bundles were employed in classroom teaching than in academic prose (Biber et al., 2004). Furthermore,
discourse organizing bundles were mostly found in academic lectures (Nesi & Basturkmen, 2006). Meanwhile, instances of stance
bundles were not common in research articles and dissertations (Hyland, 2008).
Bundle functions in writing by non-native and native English writers and expert writers were also investigated in a
number of studies, which reported mixed results (Ädel & Erman, 2012; Chen & Baker, 2010). When compared with published
academic writing, both non-native and native student writing used more discourse organizing bundles and fewer referential bundles
(Chen & Baker, 2010). Meanwhile, Ädel and Erman (2012) did not find any significant differences in the three functions between NS and
NNS essay writing. Both groups used referential bundles more frequently than discourse organizing and stance bundles, whereas the
number of stance bundles was found to be slightly greater in NS writing and the instances of discourse organizing bundles were
lower in NNS writing.
Several recent studies on functions of lexical bundles in academic writing across learner levels also showed that high proficiency
learners produced more discourse bundles (Vidakovic & Barker, 2010) and stance bundles (Biber & Gray, 2013) than lower proficiency
learners, who relied mainly on bundles with a referential function. However, Staples et al. (2013) showed that the frequencies of
stance and discourse organizing bundles were similar across proficiency groups and that very few referential bundles were present across
the levels. These contrasting findings might be because the prompts and the test tasks in these studies were different, which might
have influenced the functional distributions of lexical bundles in academic writing (Cumming et al., 2005).
Generally, most research on the function of lexical bundles has compared the writing by native and non-native English writers.
Moreover, the findings of those studies have not yet provided clear evidence as to whether non-native writing across proficiency levels uses
bundles with different functional purposes. While Chen and Baker’s (2010) structural framework of lexical bundles in academic
writing is most helpful for examining structures of lexical bundles in non-native writing, Biber et al.’s (2004) framework is most
appropriate for examining the functional patterns of lexical bundles among different L2 writing proficiency levels in this study. The
purpose of this analysis was to provide more understanding of the functional patterns of bundles in NNS writing across different levels.
3. Research questions
Given the shortcomings and contrasting findings in the literature with respect to lexical frequency distributions and lexical
bundles in non-native English academic writing across proficiency levels, the current study aimed to compare vocabulary frequency
and lexical bundles in the EPT written responses across score levels. The purpose of this study was to provide insights into learners’ L2
lexical development. The study investigated four research questions:
1) What are the word frequency distributions across proficiency levels of academic writing?
2) To what extent do learners across proficiency levels use lexical bundles in academic writing?
3) How are the structures of lexical bundles distributed among different levels of writing proficiency?
4) What are the functions of lexical bundles across different levels of writing proficiency?

4. Methodology
4.1. Corpus description
The material of this study was the English Placement Test Corpus (EPT), which is a collection of non-native English students’
writing examination papers taken from different EPT tests from 2007 to 2014 from a large Midwestern university in the US. In the 30-
minute writing task, English as a second language (ESL) undergraduate students were required to write an essay in response to a point
of view, argument, or problem while ESL graduate students were asked to describe and explain data presented in a table or a chart.
Each essay was rated and classified into one of three levels by experienced ESL instructors according to the 2012 ACTFL Proficiency
Guidelines for Writing (http://www.actfl.org/sites/default/files/pdfs/public/ACTFLProficiencyGuidelines2012_FINAL.pdf). This
study drew on three sub-corpora corresponding to the three EPT proficiency levels: EPT 101B, EPT 101C/D, and EPT Pass. Students who
failed the EPT writing test were placed into EPT 101B or EPT 101C/D. The EPT 101B students are ESL students (both undergraduate
and graduate students) who lack control of major time markers, which results in a lack of clarity, and who lack the ability to use
vocabulary, grammar, and style appropriate to written language effectively. They need to take ESL academic writing courses with a focus on
sentence-, paragraph-, and essay-level writing. Essays that lacked well-connected discourse of paragraph structure were categorized as
EPT 101C/D. The EPT 101C/D students need to take only one ESL academic writing course with an emphasis on genre, discourse, and
rhetoric. EPT Pass students are advanced ESL undergraduate and graduate students who are able to demonstrate the ability to
develop arguments with good grammatical structure and wide general vocabulary control but are not able to do this consistently.
They passed the EPT exam and are not required to take any ESL academic writing courses. Most of the test-takers were from China,
India, Brazil, or Korea.
At the time of this study, the corpus contained 386,451 words. The sub-corpora contained 57,990 words for EPT 101B; 156,790
words for EPT 101C/D; and 171,671 for EPT Pass. The distributions of texts and word counts across these three sub-corpora are
represented in Table 2. Frequency-based lexical analyses were conducted on these three EPT sub-corpora in order to identify the
distributions of vocabulary profiles and of lexical bundles (in terms of frequency, structures, and functions) across different levels of
writing proficiency. The prompts were also available for checking prompt-dependent and non-prompt dependent lexical bundles.
4.2. Procedures
To address the first research question about vocabulary distributions across proficiency levels of academic writing, Vocabulary
Profiler within Compleat Lexical Tutor software (www.lextutor.ca/vp/eng/) was used for a variety of frequency-based lexical ana-
lyses. These analyses included type (different words) and token (total words in text) counts, lexical diversity, lexical density, and
lexical sophistication. Lexical diversity was measured by means of type-token ratio (TTR) per text. TTR is the ratio of the number of
different words to the number of total words, which shows the degree of lexical richness of a text according to the amount of
repetition (Nation, 2013). Lexical density was measured by the number of content words (i.e., nouns, verbs, adjectives, and adverbs)
divided by total number of words. As content words carry higher information load than function words (i.e., prepositions, articles),
texts with more content words are expected to be linguistically and cognitively richer (Nation, 2013). Lexical sophistication was
calculated as the proportion of low and high frequency words in a text according to three categories: K1 tokens (the most frequent
1000 words of English), K2 tokens (the second most frequent 1000 words of English), and AWL (Coxhead, 2000) tokens. The
current study also counted the total number of word families, because higher-level discourse has been found to contain a higher
number of word families than lower-level performances (Kang & Wang, 2014).
Overall, the first question of the study included eight measures of lexical resources: total number of types, total number of tokens,
TTR, lexical density, percentage of K1 tokens, percentage of K2 tokens, percentage of AWL tokens, and total number of word families.
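As a rough illustration of the first few measures above, the following Python sketch counts tokens and types and derives TTR and lexical density for a single text. It is not the Vocabulary Profiler implementation: the tokenizer, the function name lexical_measures, and the very short FUNCTION_WORDS list are assumptions made only for this example, and a real analysis would identify content words with a full function-word list or a part-of-speech tagger.

```python
import re

# Hypothetical, deliberately tiny function-word list (an assumption for this
# sketch); it is NOT the list used by any real profiling tool.
FUNCTION_WORDS = {
    "a", "an", "the", "of", "in", "on", "at", "to", "and", "or", "but",
    "is", "are", "was", "were", "be", "it", "this", "that", "for", "with",
}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def lexical_measures(text):
    """Return token count, type count, TTR, and lexical density for one text."""
    tokens = tokenize(text)
    types = set(tokens)
    # Content words are approximated as every token not in FUNCTION_WORDS.
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    n = len(tokens)
    return {
        "tokens": n,
        "types": len(types),
        "ttr": len(types) / n if n else 0.0,
        "lexical_density": len(content) / n if n else 0.0,
    }


sample = "The internet is one of the most powerful tools in the world"
measures = lexical_measures(sample)
print(measures)
```

Because TTR is sensitive to text length, the per-text values would be averaged within each sub-corpus before levels are compared, in line with the per-text measurement described above.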
In order to answer the other three research questions regarding the frequency, structures, and functions of lexical bundles among
writing proficiency levels, AntConc 3.2.1 (Anthony, 2011) was run. The study focused on four-word clusters as they occur more
commonly than 5-word clusters and have a clearer range of functions than 3-word clusters (Biber et al., 1999). Biber et al. (1999)
defined lexical sequences as lexical bundles only if they are used 10 times per million words in a register and spread across at least
five different texts in the register. Considering the word counts in the EPT sub-corpora, this current study defined lexical bundles as
any four-word clusters that occurred at least 10 times per 50,000 words in the corpus and were present in at least five texts. Any
bundles that were used in fewer than five texts were excluded in order to avoid individual writer idiosyncrasies. Because of the word
counts of the EPT sub-corpora, which were less than one million words, and the different text lengths between the sub-corpora, the
bundle counts were normalized to a rate of occurrence per 50,000 words in order for quantitative measures to be directly comparable
across the sub-corpora.

Table 2
Texts and word counts in the three levels of the EPT sub-corpora.
Sub-corpora    Texts    Word counts
EPT 101B       242      57,990
EPT 101C/D     580      156,790
EPT Pass       566      171,671
Total          1388     386,451
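The bundle criteria described above (four-word sequences, at least 10 occurrences per 50,000 words, present in at least five texts) can be sketched in Python as follows. This is a simplified illustration rather than a reproduction of the AntConc procedure: the tokenizer, the function names find_bundles and normalize, and the toy texts are all assumptions introduced for the example.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def normalize(raw_count, corpus_size, per=50_000):
    """Convert a raw frequency into a rate of occurrence per `per` words."""
    return raw_count * per / corpus_size


def find_bundles(texts, n=4, min_rate=10.0, per=50_000, min_texts=5):
    """Return {bundle: (raw count, normalized rate, number of texts)}."""
    counts = Counter()
    text_range = defaultdict(set)
    total_tokens = 0
    for idx, text in enumerate(texts):
        tokens = tokenize(text)
        total_tokens += len(tokens)
        # Slide an n-word window over the text; n-grams never cross texts.
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            counts[gram] += 1
            text_range[gram].add(idx)
    bundles = {}
    for gram, raw in counts.items():
        rate = normalize(raw, total_tokens, per)
        # Keep only sequences meeting both the frequency and range thresholds.
        if rate >= min_rate and len(text_range[gram]) >= min_texts:
            bundles[gram] = (raw, rate, len(text_range[gram]))
    return bundles


# Toy texts invented for illustration only; they are not EPT data.
texts = [
    "on the other hand tv is useful",
    "on the other hand students waste money",
    "on the other hand distance learning helps",
    "on the other hand fast food is cheap",
    "on the other hand phones are useful",
    "some people say that tv is bad",
]
bundles = find_bundles(texts)
print(bundles)
```

In a corpus this small the rate threshold is met trivially, so the five-text range requirement does the filtering; in a corpus the size of the EPT sub-corpora both thresholds would matter.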

A search of the entire corpus was run in AntConc, with the results limited to four-word lexical bundles.
Two coders, both graduate students majoring in Applied Linguistics, checked misspellings and consulted the key word in context (KWIC)
lines whenever possible to ensure that each usage of a bundle was appropriately included in the data for analysis. Each coder first
coded the same 5% sample of each sub-corpus. When there was disagreement, the researcher was consulted to resolve
the issue. Cohen’s kappa was 0.95. Then, the two coders shared the remaining texts for the coding. The coding left a
total of 32 lexical bundle types for the investigation of the bundle frequency in the whole EPT corpus. The complete chart with the
raw frequencies of tokens and types as well as the normalized frequency of the lexical bundles can be found in Appendix A in
Supplementary materials.
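For readers unfamiliar with the agreement statistic reported above, the following Python sketch computes Cohen's kappa for two coders who label the same items. The function name cohens_kappa and the example labels are hypothetical and are not taken from the study's coding data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labeled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both coders must label the same non-empty set of items.")
    n = len(labels_a)
    # Observed agreement: proportion of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the coders' marginal proportions per label.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # Both coders used one identical label throughout.
    return (observed - expected) / (1 - expected)


# Invented labels for six bundles (illustration only).
coder_a = ["stance", "stance", "referential", "referential", "discourse", "referential"]
coder_b = ["stance", "referential", "referential", "referential", "discourse", "referential"]
kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 3))
```

Kappa corrects raw percentage agreement for the agreement expected by chance, which is why it is usually preferred over simple agreement rates when a few categories dominate the coding.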
Then, the list of 32 lexical bundle types was checked to see whether they came from the prompts. Any lexical bundle that
belonged to the prompts was deleted from the analyses of bundle structures and functions because several prompt-dependent bundles
were topic-related and did not fit into any functional categories (e.g., similar occupation and income or the distribution of male). As a
result, the final list of lexical bundles for structural and functional analyses included 19 types (389 tokens). The complete list of non-
prompt bundles is provided in Appendix B in Supplementary materials. Lexical bundles were analyzed according to the structural
taxonomy of lexical bundles used in academic prose by Chen and Baker (2010). For the analysis of bundle functions, the 19 bundle
types were classified according to three functions: stance expressions, discourse organizers, and referential expressions (Biber et al.,
2004). The coding process for the analyses of bundle structures and functions was similar to that for the analysis of lexical bundle
frequency above. Cohen’s kappa was 0.93 and 0.91 for the coding of bundle structures and functions, respectively. To answer the
third and fourth research questions, the structures and functions of lexical bundles in each sub-corpus were examined and compared.
Inference tests for multiple population proportions were conducted to compare statistical differences in vocabulary and lexical bundle
distributions across levels. Fig. 1 below represents a brief overview of the research design of the current study.
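The text does not specify how the inference tests for multiple population proportions were computed. One common choice, assumed here purely for illustration, is a Pearson chi-square test of homogeneity across the groups. The sketch below uses only the Python standard library; the function names and the counts are hypothetical and are not results from the study.

```python
import math


def chi_square_proportions(successes, totals):
    """Pearson chi-square statistic and df for H0: all groups share one proportion."""
    if len(successes) != len(totals) or len(totals) < 2:
        raise ValueError("Provide matching counts for at least two groups.")
    pooled = sum(successes) / sum(totals)
    if pooled in (0.0, 1.0):
        raise ValueError("Pooled proportion must lie strictly between 0 and 1.")
    chi2 = 0.0
    for x, n in zip(successes, totals):
        # Compare observed and expected counts for both outcomes in each group.
        for observed, expected in ((x, n * pooled), (n - x, n * (1 - pooled))):
            chi2 += (observed - expected) ** 2 / expected
    return chi2, len(totals) - 1


def p_value_df2(chi2):
    """Upper-tail p-value of a chi-square statistic with exactly 2 degrees of freedom."""
    return math.exp(-chi2 / 2)


# Made-up counts: occurrences of a feature out of 1000 sampled words per level.
successes = [30, 60, 90]
totals = [1000, 1000, 1000]
chi2, df = chi_square_proportions(successes, totals)
p = p_value_df2(chi2)  # valid here because three groups give df = 2
print(round(chi2, 2), df, p < 0.05)
```

The closed-form p-value applies only to the three-group case (df = 2); for other numbers of groups a statistics library would be needed to evaluate the chi-square distribution.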
5. Results and discussion
This study investigated the discourse of EPT written examinations in terms of vocabulary distributions and lexical bundles. The
purpose of the study was to examine lexical development of learner written discourse.
5.1. The word frequency across proficiency levels
The first research question was what the word frequency distributions across proficiency levels of academic writing are. As
presented in Table 3, there was a progression in the number of types and tokens in different levels of writing proficiency. The number
of types and tokens was low in lower-level written responses (3695 types and 57,989 tokens for EPT 101B) but high in advanced level
responses (6536 types and 156,790 tokens for EPT 101C/D; 7121 types and 171,671 tokens for EPT Pass). An inference test for
multiple population proportions indicated a statistically significant difference in the proportion of vocabulary distributions across all
groups (χ2 = 1054.6, p < 0.05). A more detailed analysis showed that the significant differences were found in the types (p < 0.05)
and tokens (p < 0.05) among the three groups. This suggests that written output by higher-level learners was longer and more varied
because learners at higher levels have acquired greater linguistic knowledge. This result is in accordance with Shaw and Weir’s (2007)
findings for Main Suite written learner output in which the authors suggested that as learners developed in proficiency, they produced
a wider range of vocabulary in terms of both tokens and types.
Fig. 1. Study design.

Table 3
Distribution of words across vocabulary classes identified by EPT proficiency levels (normed per 50,000).
| Vocabulary distributions | EPT 101B (N = 242) | EPT 101C/D (N = 580) | EPT Pass (N = 566) |
|---|---|---|---|
| (1) Total number of types | 3695 | 6536 | 7121 |
| (2) Total number of tokens | 57,989 | 156,790 | 171,671 |
| (3) Lexical diversity (type-token ratio) | 0.27 | 0.29 | 0.29 |
| (4) Lexical density | 1.81 | 5.77 | 6.76 |
| (5) Lexical sophistication | | | |
| (5a) K1 tokens | 0.76 | 0.27 | 0.25 |
| (5b) K2 tokens | 0.04 | 0.04 | 0.05 |
| (5c) AWL tokens | 0.03 | 0.04 | 0.05 |
| (6) Total number of word families | 1390 | 1806 | 1919 |
Note. TTR = mean of type-token ratio per text; Lexical density = number of content words divided by total number of words; K1 = the most
frequent 1000 words of English; K2 = the second most frequent thousand words of English; AWL = academic word list.

Lexical diversity was measured as the mean type-token ratio (TTR) per text. The EPT 101C/D and EPT Pass sub-corpora had the same
TTR (0.29), which was slightly higher than that of the lowest level, EPT 101B (0.27).
Another vocabulary measure is lexical density. As can be seen from Table 3, lexical density was higher as proficiency levels increased
(1.81 for EPT 101B, 5.77 for EPT 101C/D, and 6.76 for EPT Pass). For lexical sophistication, a large percentage of the words in the
responses by the three groups came from the most frequent 1000 words of English (K1 tokens), and their proportion decreased as
proficiency levels increased, with 0.76 in the EPT 101B corpus, 0.27 in the EPT 101C/D corpus, and 0.25 in the EPT Pass corpus. In contrast,
K2 and AWL tokens increased across score levels. EPT 101B responses included 0.04 K2 tokens and 0.03 AWL tokens, whereas EPT
101C/D responses included 0.04 K2 tokens and 0.04 AWL tokens, and EPT Pass responses included 0.05 K2 tokens and 0.05 AWL tokens. However,
there was no statistically significant difference in lexical diversity, lexical density, or lexical sophistication across the three levels of
proficiency.
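
As a rough illustration of the single-word measures in Table 3, the sketch below counts types and tokens for a sub-corpus and averages the type-token ratio over its texts. It is a minimal Python sketch, not the procedure used in the study: the regular-expression tokenizer and the names `tokenize` and `lexical_profile` are assumptions introduced here, and the sample sentences are invented.

```python
import re


def tokenize(text):
    """Lowercase a text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def lexical_profile(texts):
    """Return types, tokens, and mean per-text TTR for a list of texts."""
    all_tokens = []
    ttrs = []
    for text in texts:
        tokens = tokenize(text)
        if not tokens:
            continue  # skip empty responses so they do not distort the mean
        all_tokens.extend(tokens)
        ttrs.append(len(set(tokens)) / len(tokens))  # TTR of this text
    return {
        "types": len(set(all_tokens)),   # distinct word forms in the sub-corpus
        "tokens": len(all_tokens),       # running words in the sub-corpus
        "mean_ttr": sum(ttrs) / len(ttrs) if ttrs else 0.0,
    }


# Invented sample responses, for illustration only.
sample = ["The cat sat on the mat.", "A dog sat."]
profile = lexical_profile(sample)
print(profile)
```

Averaging TTR per text, as the note to Table 3 describes, keeps the measure from being dominated by the longest responses, although raw TTR still tends to fall as texts get longer.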
As for word families, there were 1390 word families in the EPT 101B sub-corpus, 1806 in the EPT 101C/D sub-corpus, and 1919 in
the EPT Pass sub-corpus. An inference test for multiple population proportions showed a statistically significant difference
in the proportion of word families (χ2 = 136.56, p < 0.05) across the three groups of learners. This finding suggests that the
higher-level learners used more word families in their written discourse and that their responses were
linguistically and cognitively richer than the lower-level output.
Very similar findings have also been reported in Biber and Gray’s (2013) study where higher-level TOEFL iBT responses contained
fewer of the most frequent words (General Service List 1000 most frequent words) than lower-level responses, and more of the less-
common words (General Service List second 1000 most frequent words) and AWL words. Moreover, these results are similar to Kang
and Wang’s (2014) findings, which indicated that higher-level candidates of Cambridge English exams (C1-C2) produced more types,
tokens, and word families than lower (B1-B2) level candidates. Additionally, the current study shares similar findings with Vidakovic
and Barker’s (2010), which compared the total number of tokens and types in written responses to Skills for Life Writing ex-
aminations across five entry levels and also suggested that those vocabulary variables were good indicators of high and low profi-
ciency levels.
Overall, although there is not much difference in lexical density, lexical diversity, and lexical sophistication among the EPT levels,
when combined with the other individual word measures, such as types, tokens, and word families, the higher-level learners’ written
discourse comes across as being more complex than the lower-level learners’ written language. Moving beyond the measures of
individual words, the results section continues with findings regarding formulaic language, particularly lexical bundles in terms of
their frequency, structures, and functions at different writing levels. In general, lower-level learners used prompt-based
bundles more than the advanced group did. VP-based bundles were prevalent in lower-level responses, while PP-based bundles were noticeable
in higher-level responses.
5.2. The frequency of lexical bundles across proficiency levels
The second research question was ‘To what extent do learners across proficiency levels use lexical bundles in academic writing?’
As shown in Table 4, the number of lexical bundle tokens decreased as the proficiency level increased. There were
more bundles in EPT 101B responses than in EPT 101C/D and EPT Pass responses (263, 242, and 195, respectively).

Table 4
Frequency of lexical bundles (tokens and types) in the three EPT sub-corpora.
| Corpus | Lexical bundles: tokens (raw frequency) | Lexical bundles: tokens (normed per 50,000) | Lexical bundles: types | Total words |
|---|---|---|---|---|
| EPT 101B | 348 | 263 | 30 | 57,990 |
| EPT 101C/D | 851 | 242 | 31 | 156,790 |
| EPT Pass | 744 | 195 | 32 | 171,671 |
| Total | 1943 | 700 | 32 | 386,451 |

Fig. 2. Frequency normed per 50,000 of non-prompt vs. prompt bundle tokens across proficiency levels.

An inference test for multiple population proportions was conducted to determine if the proportion of lexical bundles was the same
across proficiency levels. The test statistic was χ2 = 15.587 with a p-value of < 0.05. We reject the null hypothesis and conclude
that the proportion of lexical bundles was different for at least one of the levels of proficiency. The pairwise hypothesis tests
for the proportion of lexical bundles showed statistically significant differences between 101B and Pass
(p < 0.05) and between 101C/D and Pass (p < 0.05). There was no statistically significant difference between 101B and 101C/D.
Extending the claim made by Staples et al. (2013) that formulaic language is an important device for low-level learners, the current
study specifically points out that lexical bundles are one of those important devices in low-level responses in non-native academic
prose.
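
One common way to test whether several groups share a single proportion is Pearson's chi-square test of homogeneity on a groups-by-outcome table. The Python sketch below applies it to the raw bundle-token counts and total words from Table 4. This is an assumption about the form of the test: the article does not state exactly which counts entered its inference tests, so the output should not be expected to match the reported statistic. The function name `chi_square_proportions` is introduced here for illustration, the p-value uses closed forms that hold only for 1 or 2 degrees of freedom, and the pairwise comparisons are unadjusted.

```python
import math
from itertools import combinations


def chi_square_proportions(successes, totals):
    """Pearson chi-square test that k groups share one proportion.

    Assumes the pooled proportion is strictly between 0 and 1.
    """
    pooled = sum(successes) / sum(totals)
    stat = 0.0
    for hits, n in zip(successes, totals):
        # Each group contributes a "bundle" cell and a "non-bundle" cell.
        cells = ((hits, n * pooled), (n - hits, n * (1 - pooled)))
        for observed, expected in cells:
            stat += (observed - expected) ** 2 / expected
    df = len(successes) - 1
    if df == 1:
        p_value = math.erfc(math.sqrt(stat / 2))  # exact for 1 degree of freedom
    elif df == 2:
        p_value = math.exp(-stat / 2)             # exact for 2 degrees of freedom
    else:
        p_value = None  # this sketch has no closed form for larger df
    return stat, df, p_value


# Raw bundle tokens and total words per sub-corpus, taken from Table 4.
bundles = {"EPT 101B": 348, "EPT 101C/D": 851, "EPT Pass": 744}
words = {"EPT 101B": 57990, "EPT 101C/D": 156790, "EPT Pass": 171671}
levels = list(bundles)

overall = chi_square_proportions([bundles[g] for g in levels],
                                 [words[g] for g in levels])
print("all levels:", overall)

# Pairwise follow-up comparisons, with no correction for multiple testing.
for a, b in combinations(levels, 2):
    print(a, "vs", b,
          chi_square_proportions([bundles[a], bundles[b]], [words[a], words[b]]))
```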
5.3. The frequency of prompt-based versus non prompt-based bundle tokens across proficiency levels
A detailed examination of the bundles in the three sub-corpora showed that many bundles came from the test prompts, such as year
of high school or similar occupation and income. As shown in Fig. 2, a high number of prompt-related bundles were found in
all three groups of learners. One hundred and six out of 263 bundles from the EPT 101B group were prompt bundles; 119 out of 242
bundles in EPT 101C/D responses were prompt-dependent; and 86 out of 195 bundles from the EPT Pass responses were prompt-based
bundles. Comparing prompt bundles across the three groups, the EPT 101C/D learners had a higher number of
prompt-dependent bundles than the EPT Pass learners. An inference test for multiple population proportions showed
that instances of prompt-dependent bundles differed significantly between 101C/D and Pass (χ2 = 10.48,
p < 0.05).
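
Whether a bundle is prompt-based can be checked mechanically by looking for its word sequence in the prompt text. The Python sketch below shows one way to do this. The article does not describe its own procedure at this level of detail, so the matching rule, the function names `words_of` and `is_prompt_based`, and the example prompt wording are all assumptions made for illustration.

```python
import re


def words_of(text):
    """Lowercase a string and return its word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def is_prompt_based(bundle, prompt_text):
    """True if the bundle's words occur contiguously in the prompt."""
    bundle_words = words_of(bundle)
    prompt_words = words_of(prompt_text)
    n = len(bundle_words)
    if n == 0:
        return False  # an empty bundle is never prompt-based
    return any(prompt_words[i:i + n] == bundle_words
               for i in range(len(prompt_words) - n + 1))


# Hypothetical prompt wording, invented for this example.
prompt = "Describe how your last year of high school prepared you for university."

for bundle in ["year of high school", "on the other hand"]:
    print(bundle, "->", is_prompt_based(bundle, prompt))
```

Exact matching of this kind would miss bundles that only partly overlap the prompt, so a manual check of borderline cases would still be needed.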
The fact that prompt-based bundles were preferred by lower-level groups more than the advanced group suggests a general
developmental process in which lower-level learners have acquired bundles and tend to overuse them, while advanced learners can
control their formulaic language and employ more creative expression in their writing. The findings of the current study also indicate
that the knowledge of formulaic language starts at the low level and then develops at the intermediate level and becomes the most
productive at the advanced level (Biber & Gray, 2013).
5.4. The structures of lexical bundles
The third research question investigated the structural distributions of lexical bundles among the three groups of non-native
English writers. In general, the three groups produced more NP-based and PP-based bundles than VP-related bundles and other
patterns such as adverbial clause fragment and NP + copula (to clause fragment) (see Table 5). The types and tokens of NP-based (5
types and 85 tokens) and PP-based bundles (4 types and 92 tokens) were higher than those of VP bundles (3 types and 72 tokens). These
findings are in line with Biber et al. (1999) in that it is common to see more NP- and PP-based lexical bundles in academic prose.

Table 5
Distribution of lexical bundle structures across proficiency levels (normed per 50,000).
| Category | Pattern | Total (Type) | EPT 101B (Token) | EPT 101C/D (Token) | EPT Pass (Token) | Example |
|---|---|---|---|---|---|---|
| NP-based | (1) NP with post-modifier fragment | 5 | 29 | 32 | 24 | a lot of time |
| PP-based | (2) PP + noun phrase fragment | 4 | 29 | 28 | 35 | on the other hand |
| VP-based | (3) copula be + NP | 1 | 8 | 4 | 7 | is one of the |
| | (4) VP with active verb | 2 | 33 | 14 | 6 | have a lot of |
| Others | (5) Adverbial clause fragment | 3 | 27 | 19 | 14 | as far as I |
| | (6) NP + copula (to clause fragment) | 1 | 9 | 5 | 3 | my dream is to |
| | (7) Personal pronoun + lexical VP | 3 | 24 | 21 | 22 | I want to be |
| Total | | 19 | 159 | 123 | 111 | |
Note: NP = noun phrase; PP = prepositional phrase; VP = verb phrase.

It is also interesting to see from Table 5 that there are more PP-related bundles (on the other hand) in high proficiency academic
prose than in lower-level discourse. The number of PP-related bundle tokens in the EPT Pass corpus (35) was higher than that in the
lower-level sub-corpora (28 for EPT 101C/D and 29 for EPT 101B). In contrast, the patterns for NP- and VP-based bundles were opposite,
with a higher frequency of those bundles in lower proficiency academic prose (29 NP-based bundles, 8 copula be + NP, and 33 VP
with active verb for EPT 101B) than in higher-level responses (24 NP-based bundles, 7 copula be + NP, and 6 VP with active verb for
EPT Pass). An inference test for multiple population proportions was conducted to determine if types of bundle structures were the
same for all of the proficiency groups. There was no statistically significant difference in the distributions of NP across proficiency
levels. However, there were differences for the distributions of VP and other types. For instance, for VP, the test statistic is
χ2 = 27.875 and p-value is < 0.05. We reject the null hypothesis that the distribution of VP structures is the same for all three groups
and conclude that the distribution of VP structures is different for at least one of the three groups. The pairwise comparisons of
proportions showed that 101B responses contained more VP-based bundles than 101C/D ones (p < 0.05), and that 101B responses
were composed of more VP-based bundles than Pass responses (p < 0.05). Concerning PP, the test statistic is χ2 = 7.313 and p-value
is < 0.05, suggesting that the distribution of PP structures is different for at least one of the three levels. The pairwise comparisons
showed that there was a difference between the proportion of PP bundles in 101B responses and that in Pass responses (p < 0.05). The
frequency of PP-based bundles in the written responses by EPT Pass learners suggests that when learners advance in proficiency, their
writing is more complex in a way that may show the use of English prepositions beyond their adverbial meanings. Moreover, learners
at higher proficiency levels preferred more VP-based bundles in passive voice followed by PP-fragment while learners at lower-levels
produced more VP-based bundles in infinitive verb form. For example, the VP-based bundles beginning with an infinitive verb (have a
lot of) are common in lower-level output. Another interesting point is that lower-level output tended to produce more NP bundles to
denote quantities (a lot of time) than higher-level output. These patterns might be a result of topic effect when undergraduate and
graduate students in this study took different prompts with different topics, which could be an interesting area for future in-
vestigation.
5.5. The functions of lexical bundles
The frequency of types and tokens of lexical bundles as well as the bundle structures provide only partial information about
learners’ language development. Therefore, the functions of lexical bundles were also investigated to provide a clearer understanding
of development of non-native English academic writing. As presented in Fig. 3, the most frequent function of lexical bundles across
the three levels was referential (159), followed by stance bundles (118) and discourse organizers (116).
An inference test for multiple population proportions showed a statistically significant difference in the instances of
referential bundles across levels (χ2 = 8.151, p < 0.05). EPT 101B and EPT Pass learners produced a
significantly higher number of referential bundles (p < 0.05) than EPT 101C/D learners. For the stance bundles, an
inference test for multiple population proportions showed a statistically significant difference across
proficiency levels (χ2 = 32.136, p < 0.05). 101B learners used significantly more stance bundles than 101C/D learners (p < 0.05)
and Pass learners (p < 0.05). Another inference test for multiple population proportions was run for discourse organizing
bundles. However, there was no statistical evidence that responses across groups differed in bundles functioning as discourse
organizers. These findings are in accordance with Vidakovic and Barker’s (2010) findings that as compared to stance and referential
bundles, advanced learners used a greater number of discourse bundles with a greater frequency than lower-level learners.
The nine types of referential bundles in the EPT sub-corpora (see Appendix B in Supplementary materials for the complete list of
referential bundles in the three EPT sub-corpora) expressed identification (is one of the, Example 11), quantity specification (a lot of
time, Example 12), and time or place reference (all over the world, Example 13). These discourse functions were more frequent in
lower-level output than in higher-level responses.
Fig. 3. Functions of lexical bundles (tokens) across proficiency levels (normed per 50,000).

(11) The finding of paper is one of the great and important events in the world. (EPT 101B, identification)
(12) If a person always changes the goal, he will waste a lot of time and maybe will not live his successful life in the end. (EPT 101C/D,
quantity specification)
(13) I believe all these similarities are due to the fact that high school students all over the world are teenagers after all and thus have
common interest. (EPT Pass, place reference)
Meanwhile, all of the five stance bundles in the three sub-corpora (see Appendix B in Supplementary materials) functioned as
personal epistemic stance (my point of view) and personal desire (I want to be) as in Examples 14 and 15. While the personal desire
bundles such as I want to be, my dream is to were more common in the lower-level EPT writing than in the higher-level sub-corpus, the
pattern of the frequency of the epistemic stance my point of view was opposite.
(14) From my point of view, I vote for the latter point. (EPT Pass, personal epistemic stance)
(15) In my life to make a decision that I want to be was a big homework to me. (EPT 101B, personal desire)
Of the five bundles that functioned as discourse organizers among all three groups (see Appendix B in Supplementary materials), on the
other hand (Example 16) was the most frequent. In the three groups, this bundle served to move the focus of the essay
from one point to another. I would like to (Example 17) was also present in higher-level responses to introduce the topic of the
essay, whereas this bundle did not occur in the lowest-level sub-corpus.
(16) On the other hand, some think rules should be abolished. (EPT 101B, topic elaboration)
(17) I would like to use the paper version for writing since this will improve my skills and accuracy. (EPT Pass, topic introduction)
The analysis of lexical bundle functions in this study also shows that although lower-level learners used more lexical bundles than
advanced learners, an examination of concordance lines in the entire corpus suggested that the bundle functions in advanced re-
sponses were more appropriate than those in lower-level responses. To illustrate, two examples of the stance bundle my point of view
in the EPT 101B and EPT Pass sub-corpora are provided below.
(18) In my point of view, I think these products and serivices actually make our lives more convenient. (EPT 101B)
(19) In my point of view, there are lots of differences, between electronic version and paper version system. (EPT Pass)
While there was an overlap in the stance bundle in the low-level response as in Example 18 (my point of view and I think were
usually found together in the low-level sub-corpora), high-level output could use the stance bundle my point of view correctly as in
Example 19. These findings suggest that learning formulaic language becomes truly productive only at later stages of L2 acquisition
(Biber & Gray, 2013). Generally, this functional analysis of lexical bundles suggests that the development of written discourse across
score levels goes together with the changing functional roles of lexical bundles. The higher frequency of referential and stance
bundles by low-level learners suggests that low-level learners acquire referential and stance bundles before discourse bundles.
6. Conclusion
The main purpose of this study was to investigate lexical frequency distributions and lexical bundles in learner language between
language proficiency levels, trying to obtain a picture of learners’ L2 lexical development. The findings of this study based on the
three EPT sub-corpora consisting of 1388 texts and 386,451 tokens suggest that the single word-based measures such as the number
of types, tokens, and the number of word families provide useful information about learner lexical progression. The frequency of
word types and tokens increased as the proficiency levels increased, and higher proficiency learners produced a wider range of word
families than their lower-proficiency counterparts.
The frequency, structure and functions of lexical bundles can also be useful indicators of writing development across proficiency
levels. Lexical bundles were employed the most frequently by low-level learners, although their bundles were more prompt-dependent
and less structurally diverse than those by learners of higher writing proficiency. In terms of the functional findings of lexical
bundles, the study suggested that there were some changes in the functional roles of lexical bundles as the proficiency level increased.
For example, referential and stance bundles were prevalent in the written responses of high intermediate and low-advanced learners.
Overall, the findings of the frequency, structure and functions of lexical bundles in written discourse provide useful information on
the patterns of L2 lexical development and the development of discourse organization in L2 writing.
Several implications can be drawn from the findings of this study. First, it supports L2 vocabulary acquisition research (Nation,
2013) in a way that can trace learners’ lexical progression across proficiency levels. Discourse by lower-level learners shows evidence
of high proportions of high-frequency words and memorized expressions, suggesting the limits in their vocabulary knowledge at the
early stages of L2 acquisition. At the later stages of L2 acquisition, advanced L2 learners can master a wider range of vocabulary and
construct their own language with the diverse and appropriate range of formulaic language, suggesting the developmental stage to
more native-like language. Thus, such research on lexical analyses of learner language would be beneficial for understanding of how
L2 learners develop their knowledge of vocabulary at different proficiency levels.
Second, the study has implications for teaching practices by raising the importance of academic words and lexical bundles in
academic writing and the necessity of teaching them in the L2 writing classroom. Although there are no statistical differences in less-
common words and discourse organizers among the level groups in this study, it might be useful to have more explicit instruction on
vocabulary with a focus on using more reading as input to serve as exposure to less-common words (e.g., K2 words) and AWL words
in order to help improve learners’ lexical knowledge so that they can become more proficient L2 writers. Moreover, low-level learners
appear not to be aware of the necessity of discourse organizers in their academic writing. This indicates that learners at
earlier proficiency levels should also be taught how to use discourse organizers to connect their ideas in academic writing. As
formulaic language can be learned in a short-term course (Schmitt, Dörnyei, Adolphs, & Durow, 2004), the instruction of lexical
bundles could be beneficial for language learners (Cortes, 2006). Thus, teachers could give learners opportunities to practice using
different types of lexical bundles, especially frequently used discourse organizer bundles, in writing assignments. Useful information
about lexical bundle instruction has also been recommended in Cortes (2006). Overall, the results of this study suggest that ESL/
English as a foreign language learners would benefit from a writing curriculum which includes features of L2 writing at both in-
dividual word and lexical bundle levels.
Finally, individual words and lexical bundles are considered to be features of interest for assessments of writing proficiency.
Although the EPT raters might have paid attention to a variety of textual features during their ratings, the lexical features investigated
in this study could be among the useful factors that might have led to the ratings of the EPT writing examinations. Therefore, from the
writing assessment perspective, the findings of this study inform future development of a rating scale for the EPT writing test and the
EPT rater training process.
Future research is recommended to overcome the limitations of this study. First, the linguistic analysis of learner language in this
study was limited to individual words and lexical bundles. Therefore, it would be good to include more comprehensive linguistic
measures such as collocational and lexico-grammatical analyses so that more evidence of learner language development can be
obtained.
Second, topic-related bundles have been a concern in the literature because they could inflate the findings of the bundle em-
ployment in academic writing (Chen & Baker, 2010). As the EPT written responses in this study were from different prompts, the
findings could not avoid the effect of topic-related bundles. Also, this study did not identify two types of overlapping bundles
mentioned in Chen and Baker’s (2010, p. 33) study: “complete overlap” and “complete subsumption”. Complete overlap refers to two
4-word bundles which come from a single 5-word bundle. For example, if both it has been suggested and has been suggested that occur
six times and come from one longer bundle it has been suggested that, then those two 4-word bundles are called complete overlaps.
Complete subsumption occurs when “two or more 4-word bundles overlap and the occurrences of one of the bundles subsume those
of the other overlapping bundles” (Chen & Baker, 2010, p. 33). For example, one of the most occurs 10 times and is one of the occurs
five times and both bundles occur as a subset of the 5-word expression is one of the most. In this situation, one of the most and is one of
the are called complete subsumption bundles. Chen and Baker argued that these types of bundles should be combined into one longer
unit to avoid inflating the results of quantitative analyses. Thus, future research might want to take this issue into more careful
consideration.
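
The two overlap types can be detected from n-gram frequencies alone. The Python sketch below is one reading of the definitions quoted above, not Chen and Baker's own procedure: for every 5-word sequence it compares the frequencies of the two 4-word sequences it contains. The function names `ngram_counts` and `classify_overlaps`, the `min_freq` cut-off, and the filler tokens in the toy data are hypothetical.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-word sequence in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def classify_overlaps(tokens, min_freq=2):
    """Label 5-word parents whose two 4-word parts overlap completely.

    min_freq is a hypothetical bundle cut-off; both 4-word parts must reach it.
    """
    four = ngram_counts(tokens, 4)
    five = ngram_counts(tokens, 5)
    labels = {}
    for parent, parent_freq in five.items():
        left_freq, right_freq = four[parent[:4]], four[parent[1:]]
        if min(left_freq, right_freq) < min_freq:
            continue  # at least one part is too rare to count as a bundle
        if left_freq == right_freq == parent_freq:
            # Both parts occur only inside the longer sequence.
            labels[parent] = "complete overlap"
        elif min(left_freq, right_freq) == parent_freq:
            # One part occurs only inside the parent; the other also occurs elsewhere.
            labels[parent] = "complete subsumption"
    return labels


# Toy data mirroring the two examples in the text; x0, y0, z0, ... are fillers.
overlap_tokens = []
for i in range(6):
    overlap_tokens += "it has been suggested that".split() + [f"x{i}"]

subsumption_tokens = []
for i in range(5):
    subsumption_tokens += "is one of the most".split() + [f"y{i}"]
for i in range(5):
    subsumption_tokens += [f"z{i}"] + "one of the most".split()

print(classify_overlaps(overlap_tokens))
print(classify_overlaps(subsumption_tokens))
```

Parents flagged in this way are candidates for merging into one longer unit before bundle frequencies are compared across sub-corpora.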
As a final point, previous studies have shown several similarities in lexical bundles between non-native advanced English learners
and native English student writers (Römer, 2009); however, there could be some differences between the two groups in terms of types
of lexical bundles (Ädel & Erman, 2012) and structures of lexical bundles (Chen & Baker, 2010). As this current study indicates that
individual words and lexical bundles were more diverse, productive, and appropriate in the advanced level responses, it would be
particularly interesting to see if this group of advanced L2 learners uses those lexical features in the same way as native English
student writers do.
Appendix A. Supplementary data
Supplementary material related to this article can be found, in the online version, at https://doi.org/10.1016/j.jslw.2018.11.002.
References
Ädel, A., & Erman, B. (2012). Recurrent word combinations in academic writing by native and non-native speakers of English: A lexical bundles approach. English for
Specific Purposes, 31, 81–92.
Anthony, L. (2011). AntConc (Version 3.2.1) [Computer Software]. Tokyo, Japan: Waseda University.
Biber, D., & Gray, B. (2013). Discourse characteristics of writing and speaking task types on the TOEFL iBT® test: A lexico-grammatical analysis (TOEFL iBT® research series No.
TOEFLiBT-19). Princeton, NJ: Educational Testing Service.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman grammar of spoken and written English. London: Longman.
Biber, D., Conrad, S., & Cortes, V. (2004). If you look at…: Lexical bundles in university teaching and textbooks. Applied Linguistics, 25(3), 371–405.
Chen, Y. H., & Baker, P. (2010). Lexical bundles in L1 and L2 academic writing. Language Learning & Technology, 14(2), 30–49.
Cortes, V. (2006). Teaching lexical bundles in the disciplines: An example from a writing intensive history class. Linguistics and Education, 17, 391–406.
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34(2), 213–238.
Crossley, S. A., & McNamara, D. S. (2012). Predicting second language writing proficiency: The roles of cohesion and linguistic sophistication. Journal of Research in
Reading, 35, 115–135.
Cumming, A., Kantor, R., Baba, K., Erdosy, U., Eouanzoui, K., & James, M. (2005). Differences in written discourse in independent and integrated prototype tasks for
next generation TOEFL. Assessing Writing, 10(1), 5–43.
DeCock, S. (2000). Repetitive phrasal chunkiness and advanced EFL speech and writing. In C. Mair, & M. Hundt (Eds.). Corpus Linguistics and Linguistic Theory (pp. 51–68).
Amsterdam: Rodopi.
Engber, C. A. (1995). The relationship of lexical proficiency to the quality of ESL compositions. Journal of Second Language Writing, 4(2), 139–155.
Ferris, D. (1994). Lexical and syntactic features of ESL writing by students at different levels of L2 proficiency. TESOL Quarterly, 28(2), 414–420.
Grant, L., & Ginther, A. (2000). Using computer-tagged linguistic features to describe L2 writing differences. Journal of Second Language Writing, 9(2), 123–145.
Howarth, P. (1998). Phraseology and second language proficiency. Applied Linguistics, 19(1), 24–44.
Hyland, K. (2008). As can be seen: Lexical bundles and disciplinary variation. English for Specific Purposes, 27(1), 4–21.
Kang, O., & Wang, L. (2014). Impact of different task types on candidates’ speaking performances and interactive features that distinguish between CEFR levels. Cambridge
English: Research notes, vol. 57, 40–49.
Kellogg, R. T., & Raulerson, B. A. (2007). Improving the writing skills of college students. Psychonomic Bulletin & Review, 14, 237–242.
Kyle, K., & Crossley, S. A. (2015). Automatically assessing lexical sophistication: Indices, tools, findings and application. TESOL Quarterly, 49, 757–786.
Nation, I. S. P. (2013). Learning vocabulary in another language (2nd ed.). Cambridge: Cambridge University Press.
Nesi, H., & Basturkmen, H. (2006). Lexical bundles and discourse signaling in academic lectures. International Journal of Corpus Linguistics, 11(3), 283–304.
Read, J., & Nation, P. (2006). An investigation of the lexical dimension of the IELTS speaking test. IELTS research reports 6. IELTS Australia/British Council, 207–231.
Römer, U. (2009). English in academia: Does nativeness matter? Anglistik. International Journal of English Studies, 20(2), 89–100.
Santos, T. (1988). Professors’ reactions to the academic writing of nonnative-speaking students. TESOL Quarterly, 22, 69–90.
Schmitt, N., Dörnyei, Z., Adolphs, S., & Durow, V. (2004). Knowledge and acquisition of formulaic sequences. In N. Schmitt (Ed.). Formulaic sequences: Acquisition,
processing and use (pp. 55–71). John Benjamins.
Shaw, S. D., & Weir, C. J. (2007). Examining writing: Research and practice in assessing second language writing. Cambridge: Cambridge University Press.
Staples, S., Egbert, J., Biber, D., & McClair, A. (2013). Formulaic sequences and EAP writing development: Lexical bundles in the TOEFL iBT writing section. Journal of
English for Academic Purposes, 12(3), 214–225.
Vidakovic, I., & Barker, F. (2010). Use of words and multi-word units in Skills for Life writing examinations. IELTS research reports 41. IELTS Australia/British Council, 7–14.
West, M. (1953). General service list of English words. London, UK: Longman.
Sonca Vo received a Ph.D. degree in Applied Linguistics and Technology from Iowa State University, Ames, IA, USA. She is teaching at the Department of English at
University of Foreign Language Studies - the University of Danang, Danang, Vietnam. She is interested in the development and validation of language assessments and
quantitative research methods.
