Journal of English for Academic Purposes 12 (2013) 214–225

Formulaic sequences and EAP writing development: Lexical bundles in the TOEFL iBT writing section
Shelley Staples, Jesse Egbert, Douglas Biber, Alyson McClair
Northern Arizona University, Applied Linguistics
Iowa State University, Applied Linguistics & Technology

Abstract

Formulaic sequences are widely used in academic writing and are known to be an important aspect of EAP writing development.
However, little research has investigated the frequency, function and degree of fixedness of their use by ESL writers across
proficiency levels. This study examines the use of lexical bundles in written responses across three proficiency levels in the
TOEFL iBT (N = 480). Bundles that were identical to those found in the prompts were analyzed separately. Biber, Conrad, and
Cortes’ (2004) taxonomy was used to identify bundle functions. Following Biber (2009), the degree of fixedness for each of the
four slots in the bundle was investigated in relation to the other three. The results indicate that lower level learners used
more bundles overall but also more bundles identical to those in the prompts. In contrast, the functional analysis reveals a
similar use of stance and discourse organizing bundles across proficiency levels and very few referential bundles used by any
of the groups. In addition, there were few differences in fixed versus variable slot bundles across proficiency levels. These
findings have important implications for instruction and assessment of EAP writing.
© 2013 Elsevier Ltd. All rights reserved.

Keywords: Corpus linguistics; EAP; Lexical bundles; Academic writing development; Learner writing

1. Introduction

Formulaic sequences are an essential part of native and native-like language use. Use of formulaic sequences has also been
shown to be an important measure of learner development (Bolinger, 1976; Ellis, 1996; Ellis & Simpson-Vlach, 2008; Pawley &
Syder, 1983; Wray, 2002). Second language acquisition (SLA) theory indicates that there is a heavy reliance on formulaic
sequences at early stages of second language learning. Research suggests that these sequences are initially stored as whole
units but then reanalyzed and reprocessed to form more flexible constructions at later stages of development (see Ellis, 1996;
Wray, 2002). Despite the importance of formulaic sequences in learner development, few studies have investigated the use of
formulaic sequences by learners of different proficiency levels. In addition, the findings for these studies are mixed: it is
unclear whether lower proficiency learners use more or fewer formulaic sequences than higher proficiency learners (Boers,
Eyckmans, Kappel, Stengers, & Demecheleer, 2006; Forsberg, 2010; Myles, Hooper, & Mitchell, 1998).
Notably, most previous investigations of learner development in the use of formulaic language have focused on spoken
language. Previous research has shown that formulaic sequences, specifically three or four word recurrent sequences called
lexical bundles, also have important functions in academic writing (see Biber, 2006; Biber, 2009; Biber & Barbieri, 2007; Biber,
Conrad & Cortes, 2004; Biber, Johansson, Leech, Conrad, & Finegan, 1999). While there have been some empirical in-
vestigations of formulaic sequences in learner academic writing, these have examined differences between native and non-
native speaker use rather than across learner proficiency levels (Ädel & Erman, 2012; Chen & Baker, 2010; DeCock, 2000;
Römer, 2009). Also important to determine is the extent to which learners are using formulaic language that is typical for
academic writing. This can be examined in terms of the functions employed by writers and the level of bundle fixedness.
The current study investigates scored responses of 480 participants on the written portion of the Test of English as a
Foreign Language Internet-Based Test (TOEFL iBT). The main goal of this study is to determine whether the frequency,
function and fixedness of lexical bundles used by English language learners in a controlled environment vary across profi-
ciency levels. A secondary goal is to see how learners’ use of lexical bundles compares with that generally found in academic

2. Literature review

2.1. Frequency of lexical bundles

Numerous previous studies have investigated the relative frequencies of lexical bundles in a wide range of registers (e.g.,
Biber & Conrad, 1999; Biber et al., 1999; Cortes, 2002). In addition, a growing number of studies have investigated differences
in bundle use between native speakers (NSs) and non-native speakers (NNSs) (e.g., Ädel & Erman, 2012; Chen & Baker, 2010;
DeCock, 2000; Römer, 2009).
DeCock (2000) compared the type (total unique lexical bundles) and token (total number of lexical bundles, including
duplicates) frequencies of bundles used by NS and NNS undergraduate academic writers. She found that NNS writers used
more two to four word bundle tokens than NS writers, but that NNSs used some sequences very frequently and others rather
infrequently in comparison with NSs. Ädel and Erman (2012) similarly found that NNS undergraduate writers used fewer
types than NS undergraduates but did not investigate token frequency. Results from Chen and Baker’s (2010) study support
the findings of DeCock (2000) and Ädel and Erman (2012), in part. The NNS undergraduate academic writers in their study
used certain lexical bundle types more than others, but they found a pattern of increased number of lexical bundle tokens
from NNS student writing to NS student writing to NS published academic writing.
However, other studies have found fewer distinctions between native and non-native use of bundles. Römer (2009), for
example, found very few differences between the use of four word sequences by NS and advanced NNS undergraduate
writers. Her findings emphasized the importance of experience with academic writing regardless of first language. It seems
that learners’ level of academic writing proficiency might be an important factor in whether or not native speaker status plays
a role in use of formulaic language.
A few studies have investigated and quantified lexical bundle use with multiple learner levels. Myles et al. (1998) studied
16 child language learners of French over a two year period and determined that while learners relied on fixed sequences in
the early stages of learning they gradually “unpacked” the formulaic language used in lower proficiency levels to create
unique patterns as proficiency levels increased. Boers et al. (2006) investigated whether oral proficiency judgments were
related to college level learners’ use of formulaic language. They found that a greater amount of formulaic language correlated
with higher proficiency scores. Finally, Forsberg (2010) developed a corpus of interviews with French language learners of
four different levels (beginner, intermediate, advanced, and very advanced). Similar to the findings in Boers et al. (2006), the
use of formulaic language increased as proficiency level increased.
It should be noted that it is difficult to accurately compare these studies, as different definitions of formulaic language were
used. All three studies coded the sequences by hand, using researcher impressions to guide choices. In addition, all three
focused on speech rather than writing. The present study extends previous research by investigating the use of formulaic
sequences, specifically lexical bundles, in academic writing of three learner levels.

2.2. Functions of lexical bundles

In addition to quantifying the frequencies of lexical bundles, a number of studies have organized bundles according to
function. One widely accepted taxonomy is Biber et al. (2004), which distinguishes among three functions of bundles: stance
expressions, discourse organizers, and referential expressions. Stance bundles most often refer to the speaker’s knowledge of
or attitude toward the information in the following proposition. Referential bundles are defined as expressions that “make
direct reference to physical or abstract entities, or to the textual context itself” (Biber et al., 2004, p. 384). Finally, discourse
organizers are defined as bundles that “reflect relationships between prior and coming discourse” (Biber et al., 2004, p. 384).
Other functional taxonomies were developed for more specific circumstances than Biber et al. (2004) and thus were not
chosen for this study. Hyland (2008) proposes three categories of bundles for analyzing research articles and dissertations:
research-oriented bundles (some of which correspond to referential bundles), text-oriented bundles (which mostly corre-
spond to discourse orienting bundles), and participant-oriented bundles (which loosely correspond to stance bundles).
Hyland (2008) found that participant-oriented bundles were the least common in all of the disciplines explored in his study.
Another important functional study is Nesi and Basturkmen (2006), which investigates bundles used for the purpose of
cohesion in academic lectures (many of which overlap with Biber et al.’s (2004) discourse organizing bundles).
While no studies could be found that compared the use of bundles for the functions identified by Biber et al. (2004) across
learner levels, two recent studies compared the functions used by NSs and NNSs. Chen and Baker (2010) investigated student
writing by NSs and NNSs as well as journal articles written by NSs. Differences were found between the journal articles and
the two groups of student writing, but not between the NS and NNS student writing. Both student groups used proportionally
more discourse organizing bundles and proportionally fewer referential bundles when compared with those found in
published academic writing. However, both student groups still used referential bundles at a rate of approximately 38% (p.
38). Ädel and Erman (2012) similarly compared student writing by NSs and NNSs and found very little difference in the two
groups’ use of bundles for the three functions. They did find that the NS writers used a slightly greater number of bundles to
convey stance and slightly fewer bundles for discourse organizing functions. Referential bundles were used more frequently
than discourse organizers or stance bundles by both groups, at around the same rate (Ädel & Erman, 2012).
It is unclear whether NS and NNS writers use bundles for different functions. Even less certain is whether NNS writers at
different levels of proficiency produce bundles in different functional categories. Categorizing and comparing the functions for
which NNSs use lexical bundles may increase our understanding of patterns of use in NNS writing.

2.3. Fixedness of lexical bundles

Most studies of formulaic language have focused on quantifying and describing fixed bundles, where each bundle token is
a fixed string of a predetermined number of words. Fewer studies have investigated the degree to which the “slots” in
formulaic sequences are fixed or variable. In an early study of this type, Renouf and Sinclair (1991) investigate discontinuous
bundles that combine fixed and variable lexical slots. They show evidence that many formulaic sequences in English are made
up of frameworks with variable slots, containing fixed slots filled by function words and variable slots which may be filled by a
variety of content words (e.g., ‘a + ? + of’).
Biber (2009) expands on the work of Renouf and Sinclair (1991) and introduces a more comprehensive methodology for
the analysis of lexical bundle fixedness. He looks at lexical bundles as both fixed units and frames with four word slots that are
potentially variable, meaning that more than one word can fill a given slot. In this study, each of the four slots in a given four
word formula is then compared with other similar formulas to measure the degree of fixedness for that particular slot (Biber,
2009, p. 292). Slots that are filled by a given word at least 50% of the time are labeled “fixed”. For example, the lexical bundle
‘on the other hand’ was relatively fixed compared to ‘the nature of the’, in which the second slot was found to be filled by other
words, such as ‘end’.
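The 50% criterion can be made concrete with a short sketch. The Python below is an illustration only, not Biber's (2009) procedure: it assumes a toy list of four-word sequences (invented here), groups sequences that share the three words outside a chosen slot, and reports the share of the most frequent filler for that slot. The function name `slot_fixedness` and its zero-based `slot` argument are hypothetical.

```python
from collections import Counter, defaultdict

FIXED_THRESHOLD = 0.5  # a slot counts as "fixed" at a share of 50% or more


def slot_fixedness(sequences, slot):
    """For each frame (the three words outside `slot`, zero-based), return
    the most frequent filler of that slot and its share of all fillers."""
    fillers = defaultdict(Counter)
    for seq in sequences:
        words = tuple(seq.split())
        if len(words) != 4:
            continue  # only four-word sequences are considered
        frame = words[:slot] + ("*",) + words[slot + 1:]
        fillers[frame][words[slot]] += 1
    result = {}
    for frame, counts in fillers.items():
        word, n = counts.most_common(1)[0]
        result[" ".join(frame)] = (word, n / sum(counts.values()))
    return result


# Toy data, invented for illustration (not taken from any corpus).
sample = (
    ["on the other hand"] * 9 + ["on the one hand"] * 1
    + ["the nature of the"] * 2 + ["the end of the"] * 2
    + ["the rest of the"] * 2
)

other = slot_fixedness(sample, slot=2)["on the * hand"]
nature = slot_fixedness(sample, slot=1)["the * of the"]
other_is_fixed = other[1] >= FIXED_THRESHOLD
nature_is_fixed = nature[1] >= FIXED_THRESHOLD
```

With these invented counts, the third slot of ‘on the * hand’ is filled by ‘other’ in 9 of 10 cases and would count as fixed, while no single word fills the second slot of ‘the * of the’ half the time, so that slot would count as variable.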
There is very little research conducted on the use of bundles with variable slots by NNSs. However, Chen and Baker (2010)
did report that published academic writers used some bundles common to academic writing (e.g., ‘the + noun + of a/the’ as
in, ‘the nature of the’) with a higher number of collocations than either NS student writers or NNS student writers. Thus, there
is some evidence that published academic writers are using variable slot bundles more frequently than student writers,
regardless of native language.

2.4. Lexical bundles in speech and academic writing

Many studies have investigated differences in the use of lexical bundles across registers. These studies have revealed
notable differences between spoken and written registers in the frequency, function, and fixedness of lexical bundles. Studies
that have investigated the frequency of lexical bundles in NS writing have repeatedly shown that lexical bundles are more
common in speech than in writing, particularly academic writing (Biber, 2006; Biber et al., 2004, 1999).
Biber et al. (2004) compared the frequencies of lexical bundles used for stance, discourse organizing and referential
functions in conversation and academic writing. They found that stance and discourse organizing bundles were more
frequent in conversation, while referential bundles were more frequent in writing. Regarding bundle fixedness, Biber (2009)
revealed that formulaic language in conversation tended to contain three fixed words in a sequence with a variable slot either
preceding or succeeding it (e.g., ‘I don’t know *’). In academic writing, on the other hand, there were more frames with internal
variable slots (e.g., ‘the * of the’) (Biber, 2009, p. 299). In sum, variation in the frequency, function and fixedness of
lexical bundle use suggests meaningful differences between speech and writing in the way that formulaic language is pro-
cessed and the functions for which it is used. While the current study focuses on the use of lexical bundles in academic
writing, a comparison with findings for both conversation and academic writing provides insight into whether learners are
producing patterns common to academic writing.

3. Research questions

a) What differences, if any, are there in the frequency of lexical bundle use across proficiency levels?
b) What differences, if any, are there in the distribution of lexical bundle functions across proficiency levels?
c) What differences, if any, are there in the degree of lexical bundle fixedness across proficiency levels?
d) How do the findings for function and fixedness compare with those reported in other studies of academic writing and
4. Methodology

4.1. Corpus description

As mentioned above, this study uses data from a corpus composed of written responses to items on the TOEFL iBT.⁴ This
study is part of a larger project, the details of which are reported in Biber and Gray (2013). The larger project involved a
comprehensive lexico-grammatical description of the TOEFL iBT responses, both spoken and written, and across different
score levels and task types. The results showed systematic variation in the use of lexical bundles across score levels, so a more
detailed analysis of the bundle characteristics was undertaken for the current study. Along with additional investigations into
the frequency and function of lexical bundles used, this study adds an analysis of the degree of fixedness found for the bundles
used by test takers, since this is an important feature of lexical bundles used in academic writing. Another important addition
is the analysis of frequency data from each individual test taker, instead of analyzing the group as a whole. This approach is
relatively rare in studies investigating lexical bundle use and importantly allows for the use of inferential statistics.⁵
The corpus used for this study contains two written texts from 480 participants for a total of 960 texts and 249,417 words.
The responses were scored on a five point scale in half point increments. The corpus is further subdivided into three proficiency
levels (low, intermediate, and high) based on a range of ETS scores, determined by percentile rank (33%, 66%, 99%). For the
purposes of this study, we define participant proficiency as the mean of the ETS scores on each participant’s two written tasks
(ETS, 2010). Table 1 below includes information regarding the composition of the corpus.

4.2. TOEFL iBT data

The writing section of the TOEFL iBT contains two tasks: integrated and independent. The integrated task requires the
participant to read a passage and listen to a recording, then synthesize information that they read and heard. The independent
task prompts the participant to express their opinion about an issue. The data for this study included two topics for each of the
two task types. Studies have shown the TOEFL iBT writing section to be both a reliable (ETS, 2008) and a culturally unbiased
(Stricker & Rock, 2008) assessment tool.

4.3. Procedures

After obtaining the corpus, a programming script was written in the R language, which provided word counts, retrieved
individual four word sequences, and determined whether they met the baseline criteria for analysis. Additionally, in an effort
to eliminate bundles that were task-related, lexical bundles needed to occur in texts from at least two separate writing tasks.
An effort was also made to distinguish between “prompt” and “non-prompt” bundles. This was done by identifying all of the
bundles used in the prompts. Prompt bundles were defined as bundles that appeared word for word in the prompt and that
were clearly related to the topic or task.
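The retrieval and screening steps above can be sketched in a few lines. The study's script was written in R and is not reproduced in the text, so the Python below is only a hedged illustration: the function names, the tokenization rule, and the toy texts are assumptions. Because the "baseline criteria" are not spelled out here, the sketch applies only the two-task requirement. It also flags a sequence as a prompt bundle purely by word-for-word match and does not model the manual judgment of whether a bundle is "clearly related to the topic or task".

```python
import re
from collections import defaultdict


def four_grams(text):
    """Return all contiguous four-word sequences in a text, lowercased.
    Assumes straight apostrophes; punctuation is simply dropped."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + 4]) for i in range(len(words) - 3)]


def candidate_bundles(texts, prompts, min_tasks=2):
    """texts: list of (task_id, text) pairs; prompts: list of prompt texts.
    Keeps sequences found in texts from at least `min_tasks` different
    tasks and labels each as "prompt" or "non-prompt"."""
    tasks_by_gram = defaultdict(set)
    for task_id, text in texts:
        for gram in four_grams(text):
            tasks_by_gram[gram].add(task_id)
    prompt_grams = {g for p in prompts for g in four_grams(p)}
    return {
        gram: ("prompt" if gram in prompt_grams else "non-prompt")
        for gram, tasks in tasks_by_gram.items()
        if len(tasks) >= min_tasks
    }


# Toy data, invented for illustration.
prompts = ["Do you agree with the statement that cooperation matters?"]
texts = [
    ("task_a", "I agree with the statement. On the other hand, it is hard."),
    ("task_b", "On the other hand, some agree with the statement too."),
]
bundles = candidate_bundles(texts, prompts)
```

In this toy example only two sequences occur in both tasks: ‘agree with the statement’, which also appears in the prompt, and ‘on the other hand’, which does not.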
The next step was to generate normed token rates of occurrence (per 100 words) of prompt and non-prompt lexical
bundles for each test taker. Normed rates of occurrence were used for all lexical bundle analyses to eliminate the effect of
varying text lengths. We then classified non-prompt bundles that occurred more than 25 times based on their function
(referential, stance, or discourse organizing) (see Biber et al., 2004). Non-prompt bundle types were also analyzed for the
degree to which each slot was fixed or variable. A bundle was considered “fixed” if all four slots were filled by the same words
at least 50% of the time. On the other hand, a variable slot bundle was one in which one or more of the four slots was filled by a
different word more than 50% of the time.
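The norming step described in this paragraph is simple arithmetic: the number of bundle tokens in a text is divided by the text's word count and scaled to 100 words. A minimal sketch follows; the function name and the example counts are hypothetical.

```python
def normed_rate(bundle_tokens, word_count, per=100):
    """Rate of occurrence per `per` words for a single test taker."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return bundle_tokens * per / word_count


# Invented example: the same raw count in a short and a long response.
short_text_rate = normed_rate(12, 300)
long_text_rate = normed_rate(12, 600)
```

Twelve bundle tokens yield a rate of 4 per 100 words in a 300-word response but 2 per 100 words in a 600-word response, which is why raw counts are not compared directly across texts of different lengths.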
In addition to prompt-related bundles, topic-related bundles have been identified as a concern in a number of studies.
Chen and Baker (2010) warn that topic-related bundles in particular should be included with extreme caution. For this study,
since test takers answered the same prompts, topic was not considered to be as great a concern. Bundles related to the topic of
the essay were included in the analysis of rates of occurrence and fixedness. However, they were not analyzed for function,
since their function did not fit into one of the three identified categories.

4.4. Variables

The independent variable in this study is the proficiency level of the participants. The two scores (from each of the two
tasks) were first averaged for each participant and then proficiency level was determined by the participant’s percentile rank.
The participants were divided into three groups – low, intermediate, and high. The three dependent variables in this study are
rates of occurrence of lexical bundle tokens (overall and prompt/non-prompt), rates of occurrence of lexical bundle tokens for
the three functional categories, and frequencies of fixed and variable slot bundle tokens. A preliminary analysis of lexical
bundle tokens and types showed only a very small difference between results for tokens and types. This means that there
were many different types of bundles and most of them were not used very often. Therefore, the analysis focuses on token
rates of occurrence only. Fig. 1 displays the overall research design of the study.

Table 1
Composition of the TOEFL written corpus.

Score range                 # of participants   # Words    Average # words/participant

1.00–2.75 (Low)             170                 74,430     437.82
2.76–3.75 (Intermediate)    165                 87,338     529.32
3.76–5.00 (High)            145                 87,649     604.48
Total                       480                 249,417    519.62

4.5. Data analysis

For the frequency analysis, the token rates of occurrence for each participant were included. These data were screened, and
it was determined that they did not meet the underlying assumptions for normality or homogeneity of variance. Therefore,
non-parametric analogs were selected for each of the statistical procedures. The rates of occurrence for overall bundle tokens,
percentage of prompt versus non-prompt bundle tokens, and non-prompt bundle tokens were tested for significant differences
across proficiency levels with a Kruskal–Wallis one-way analysis of variance (α = .05), the non-parametric analog to
the parametric one-way ANOVA. A post-hoc analysis was conducted to compare all paired mean ranks using an independent
Mann–Whitney U-Statistic (the non-parametric analog to the independent samples t-test) with a Bonferroni adjustment
(α = .017).
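For readers unfamiliar with these procedures, the sketch below shows how the Kruskal–Wallis H statistic is computed from pooled ranks and how the Bonferroni-adjusted threshold of .017 follows from dividing .05 by the three pairwise comparisons. It is a pure-Python illustration with invented data, not the analysis code used in the study. The function names are hypothetical, and the p-value (which requires a chi-square distribution with k − 1 degrees of freedom) is deliberately left out.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are zero-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold after Bonferroni adjustment."""
    return alpha / n_comparisons


# Invented rates for three small groups with no overlap between groups.
h_stat = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
adjusted = round(bonferroni_alpha(0.05, 3), 3)
```

In practice a statistics package would normally be used for both tests; in Python, for example, `scipy.stats.kruskal` and `scipy.stats.mannwhitneyu` return the test statistic together with a p-value.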
For the functional analysis, we classified non-prompt bundles that occurred more than 25 times based on their function:
referential, stance, or discourse organizing (see Biber et al., 2004). Referential expressions “make direct reference to physical
or abstract entities, or the textual context itself” (Biber et al., 2004, p. 384). For example, ‘one of the things’ and ‘in the case of’ are
referential bundles because they refer to physical and abstract entities. Stance expressions “express attitudes or assessments
of certainty that frame some other proposition.” (p. 384). ‘I don’t know if’ and ‘It is important to’ are examples of stance
bundles, with the former expressing personal stance and latter expressing impersonal stance. Finally, discourse organizing
bundles “reflect relationships between prior and coming discourse.” Two examples are ‘If you look at’ and ‘On the other hand’.
Lexical bundles were excluded from the functional analysis if they were determined to be focused on the specific topic of
the essay. For example, ‘the sun and the stars’ and ‘the earth’s magnetic field’ were not analyzed functionally. Once the bundles
were analyzed according to the functional taxonomy, the normed rates of stance, discourse organizing, and referential
bundles were then compared across proficiency levels. Due to the fact that many participants did not use the individual lexical
bundles identified in the analysis, tests for statistical significance were not conducted.

Fig. 1. Visual representation of categorizations in research design. [Diagram: proficiency level is related to overall bundle frequency, which is divided into prompt bundles (frequency) and non-prompt bundles (frequency; function: stance, discourse organizing, referential; fixedness: fixed, variable slot, internal variable slot).]

Finally, the fixedness analysis first compared the percentage of bundles for each level that was fixed or had variable slots.
Since conversation and academic writing have been shown to follow different patterns in relation to fixedness (conversation
has more continuous sequences and academic writing contains more bundles with internal variable slots), the bundles were
further compared according to these patterns. Thus, the bundles that were either fixed or had a variable slot at the beginning
or end of the sequence (position 1 or 4 of a four word bundle) were grouped together, and those that contained a variable slot
in position 2 or 3 formed the other, comparison, group. Similar to the functional analysis, many participants did not use the
particular bundles identified for analysis and thus inferential statistics were not computed.
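The two-way grouping by slot position used in the fixedness analysis can be expressed as a small rule. The sketch below is an assumption-laden illustration: the function name and group labels are hypothetical, and the text does not say how a bundle with both an edge and an internal variable slot was treated, so the sketch assigns such cases to the internal group.

```python
def fixedness_group(variable_slots):
    """Assign a four-word bundle to a comparison group.

    variable_slots: positions (1-4) of the slots that are variable;
    an empty collection means the bundle is fully fixed.
    """
    slots = set(variable_slots)
    if not slots <= {1, 2, 3, 4}:
        raise ValueError("slot positions must be between 1 and 4")
    # Assumption: any variable slot in position 2 or 3 takes precedence.
    if slots & {2, 3}:
        return "internal variable slot"
    return "fixed or edge variable slot"


fully_fixed = fixedness_group([])       # e.g., 'on the other hand'
edge_variable = fixedness_group([4])    # e.g., "I don't know *"
internal = fixedness_group([2])         # e.g., 'the * of the'
```

Under this rule, fully fixed bundles and bundles with a variable first or last slot fall into the conversation-like group, while frames such as ‘the * of the’ fall into the group typical of academic writing.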

5. Results and discussion

5.1. Overall rates of occurrence of bundle tokens

The total number of bundle tokens meeting the criteria was determined for each participant and the mean scores were
compared across the three proficiency levels. As Fig. 2 shows, there was a decrease in the number of bundle tokens used as the
proficiency level increased.

Fig. 2. Participants’ overall bundle token frequency by proficiency level. Note: Center horizontal lines represent the median.

To determine whether the differences were statistically significant, a Kruskal–Wallis one-way ANOVA⁶ was computed for
the total rates of occurrence. The omnibus ANOVA analysis was significant, χ²KW (2, N = 480) = 21.307, p = .000, η² = .044. A
post-hoc analysis was conducted to compare all paired means using a Mann–Whitney U test.⁷ Significant differences were
found between group 1 (M = 5.90) and group 3 (M = 3.92) (p = .001) as well as group 2 (M = 5.07) and group 3 (p = .000). The
participants with the highest proficiency level (scores on the TOEFL iBT ranging from 3.76 to 5.00) used significantly fewer
bundle tokens than both the lowest scoring group and intermediate level group. There seems to be a pattern of greater use of
bundle tokens by lower proficiency level writers. This finding may offer support for the idea that formulaic language is an
essential device for lower level learners.

5.2. Rates of occurrence of prompt-based versus non prompt-based bundle tokens

Upon examining the bundles used in the corpus, it became apparent that many of the four word sequences came directly
from the test prompts (e.g., ‘the ability to cooperate well’). Thus, bundle use for each participant was calculated separately for
prompt and non-prompt bundles. As Fig. 3 indicates, the lowest proficiency group used a higher number of prompt-based
bundles than either of the other two groups.
On the other hand, the participants in the intermediate proficiency group used the most non-prompt bundles. It can also be
seen that the lowest proficiency level used more prompt-based bundles in comparison to non-prompt bundles while the two
higher proficiency levels used proportionally more non-prompt bundles than prompt bundles.

Fig. 3. Non-prompt versus prompt bundle tokens across proficiency levels. Note: Center horizontal lines represent the median.

⁶ The Kruskal–Wallis one-way ANOVA is the non-parametric analog to the parametric one-way ANOVA. The non-parametric analog was used since the
data in this study did not follow a normal distribution.
⁷ The Mann–Whitney U test is the non-parametric analog to the parametric independent samples t-test. The non-parametric analog was used since the
data in this study did not follow a normal distribution.

To determine whether the differences across proficiency levels were statistically significant, a separate Kruskal–Wallis one-way ANOVA was computed for
the prompt bundles and the non-prompt bundles. The omnibus ANOVA analysis for prompt bundles was significant, χ²KW
(2, N = 480) = 18.647, p = .000, η² = .039. A post-hoc analysis was conducted to compare all paired means using a Mann–Whitney
U test. Significant differences were found between group 1 (M = 3.77) and group 3 (M = 2.02) (p = .000). The participants in the
lowest proficiency group (scores on the TOEFL iBT ranging from 1.00 to 2.75) used significantly more bundles that came directly
from the prompts than the participants in the highest level group. This finding is not surprising. The learners in this group are
heavily relying on language provided in the environment, and certainly these chunks of language are often left unanalyzed by the
writers. Learners at higher proficiency levels, on the other hand, rely less on these unanalyzed chunks and are able to draw on
formulaic language from outside of the test prompts.
The rate of occurrence for non-prompt bundle tokens was also compared across the participants in the three groups. As
Fig. 3 shows, the pattern is in fact different for bundles not taken directly from the prompt as compared with the overall
bundle token use and use of prompt bundles. Level 2 used the greatest number of non-prompt bundles while the mean scores
for the lowest and highest proficiency level groups are similar. To determine whether any of the differences were statistically
significant, a Kruskal–Wallis one-way ANOVA was computed. The omnibus ANOVA analysis for non-prompt bundles was
significant, χ²KW (2, N = 480) = 11.335, p = .003, η² = .024. A post-hoc analysis was conducted to compare all paired means
using a Mann–Whitney U test. Significant differences were found between level 2 (M = 2.31) and level 3 (M = 1.90) (p = .001).
Participants in level 2 (TOEFL iBT scores of 2.76–3.75) used significantly more bundles than participants in level 3 when
the bundles that came directly from the prompt were removed. This is similar to the finding for the rates of overall bundle use.
However, there was no significant difference between the number of bundles used by the participants in level 1 (M = 2.13)
and level 3 once the prompt-based bundles were removed (p = .148).
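The three effect sizes reported in this section can be checked against the test statistics. The text does not state how η² was computed; the sketch below assumes the common estimate η² = χ²/(N − 1) for a Kruskal–Wallis test, and the function name is hypothetical. The statistics are the ones reported above.

```python
def eta_squared_from_h(h, n):
    """Assumed effect size estimate: eta squared = H / (N - 1)."""
    return h / (n - 1)


N = 480
# (reported chi-square statistic, reported eta squared)
reported = {
    "overall": (21.307, 0.044),
    "prompt": (18.647, 0.039),
    "non-prompt": (11.335, 0.024),
}
recomputed = {
    name: round(eta_squared_from_h(h, N), 3)
    for name, (h, _) in reported.items()
}
matches = {name: recomputed[name] == eta for name, (_, eta) in reported.items()}
```

All three reported values match this formula to three decimal places, which suggests, but does not confirm, that it was the one used. By conventional benchmarks these are small effects.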
If we compare the findings from the three approaches to rates of bundle use, we find that the participants in the lowest
level group, while using more bundles overall, used a greater percentage of bundles directly from the prompt. Thus, when the
prompt bundles were removed, the number of bundles was not significantly different from either of the other groups. The
level 2 participants, on the other hand, used fewer bundles overall than the lowest level group (although not significantly
fewer) but a greater percentage of non-prompt based bundles than level 1. Compared to the highest level of test takers, level 2
used significantly more bundles overall, and this difference remained once the prompt-based bundles were removed. There
seems, then, to be a pattern of use of more prompt-based bundles at the lowest level, meaning, as mentioned above, that
these participants are relying heavily on repeating unanalyzed language from the immediate environment. In level 2, the
participants developed skills to draw on bundles not found in the prompts, and seemed to rely more heavily than the highest
level group on these bundles. Finally, the highest TOEFL scorers used fewer bundles overall, and while they were adept at
developing their own recurrent sequences, relied on them less than test takers in level 2.

5.3. Functional use of bundles

To determine the functions of the bundles used by the test takers, bundles which were not found in the prompt and
occurred more than 25 times in the entire corpus were investigated. As indicated in Section 4.3 above, bundles that were
related to the topics in the prompts were not analyzed for function. For example, ‘the sun and stars’ and ‘the sun and the’ were
two of the most frequently used bundles in the corpus. These bundles were eliminated from the functional analysis. The
frequencies of referential, stance, and discourse organizing bundles across the three proficiency levels can be found in Table 2.
As can be seen, bundles expressing the writer’s stance were overall more frequent than bundles functioning to organize
the discourse or frame references to entities. All of the stance bundles express attitudinal/evaluative stance, which function to
S. Staples et al. / Journal of English for Academic Purposes 12 (2013) 214–225 221

Table 2
Mean frequency per 100 words of referential, stance, and discourse organizing bundles across three proficiency levels.

Bundle                          Level 1 M (SD)   Level 2 M (SD)   Level 3 M (SD)

Referential bundles
  Are a lot of                  .014 (.62)       .010 (.47)       .006 (.32)
  There are a lot               .014 (.61)       .012 (.50)       .007 (.32)
Stance bundles
  Agree with the statement      .026 (.93)       .032 (.93)       .030 (.77)
  I agree with the              .028 (.89)       .024 (.76)       .017 (.54)
  I am interested in            .036 (1.37)      .025 (.98)       .010 (.45)
  I would like to               .010 (.53)       .012 (.53)       .009 (.39)
  Is more important than        .014 (.55)       .027 (.73)       .016 (.52)
  It is important to            .027 (1.01)      .018 (.63)       .021 (.63)
  It is very important          .007 (.49)       .021 (.75)       .010 (.45)
  Not be able to                .014 (.82)       .012 (.52)       .019 (.68)
  Others is more important      .008 (.58)       .021 (.71)       .012 (.52)
  That it is more               .022 (.69)       .022 (.73)       .020 (.57)
  They are interested in        .011 (.49)       .018 (.83)       .009 (.39)
  To be able to                 .009 (.54)       .013 (.55)       .025 (.76)
  We are interested in          .029 (1.33)      .019 (.81)       .010 (.61)
  With others is more           .008 (.58)       .024 (.85)       .010 (.50)
Discourse organizing bundles
  According to the lecture      .014 (.62)       .012 (.53)       .010 (.41)
  According to the professor    .007 (.53)       .010 (.54)       .012 (.53)
  According to the reading      .010 (.52)       .008 (.45)       .007 (.39)
  At the same time              .008 (.49)       .015 (.59)       .008 (.36)
  First of all the              .016 (.59)       .015 (.55)       .007 (.44)
  In the lecture the            .018 (.69)       .019 (.61)       .014 (.48)
  On the other hand             .043 (.92)       .039 (.85)       .031 (.73)
  The lecture the professor     .012 (.54)       .018 (.72)       .010 (.42)
  The other hand the            .008 (.40)       .017 (.56)       .006 (.34)
  The second theory is          .015 (.64)       .013 (.56)       .014 (.50)

express the test takers’ opinions about the topic. For example, test takers frequently started their responses to the
independent task with ‘I agree with the’ or ‘agree with the statement’:
“In my opinion, I agree with the statement.”
However, the most frequent bundle across all three proficiency levels was ‘on the other hand’. Test takers frequently used
this bundle in the integrated task to shift the focus of the essay between one source and another:
“The listening passage, on the other hand, mentions why these theories are not valid in all situations, so the listening
passage focuses on the limitation of the these theories.”
In the independent task, test takers used this bundle to provide more local contrast between statements:
“Additionally cooperating with others make you a better person because you learn other things by helping others and
also you can make more friends. On the other hand when you are a person that does not like to cooperate with people
you do get the chance to get along with them, and make people get away from you.”
The discourse organizing bundles used by test takers can be divided into three main functions: reference to the source of
information (‘according to the lecture’, ‘according to the professor’, ‘according to the reading’, ‘in the lecture the’, and ‘the
lecture the professor’), general discourse organizers (‘at the same time’, ‘first of all the’, ‘on the other hand’, and ‘the other
hand the’) and organizing source information (‘the second theory is’).
Only two bundles that performed a referential function met the frequency criteria: ‘are a lot of’ and ‘there are a lot’. The
importance of these bundles in framing reference to entities can be seen in the words that commonly collocate with these two
bundles: ‘people’, ‘fish’, ‘clouds’. For example,
“Of course, it is true that there are a lot of people who changed their majors because they wanted to earn more money.”
Many of the bundles (46%) show a trend of greatest use by participants in level 2. This follows the trend seen for all
non-prompt bundles. Some of the bundles (38%), however, show a trend of greatest use by the lowest proficiency level and a
downward trend across the three proficiency levels. Three bundles were used most by the highest proficiency level
(‘according to the professor’, ‘to be able to’, ‘not be able to’).
To determine how functional use differed across levels, the normed frequency of each of the bundles that occurred more
than 25 times in the corpus was compared to the total number of bundles occurring for each level, creating a percentage of use
for each bundle. The individual bundles were then summed by category (referential, stance or discourse organizing) allowing
for comparison of the percentage of use across levels. Fig. 4 displays the percentage of bundle use by function across the three
proficiency levels.

Fig. 4. Proportion of referential, stance, and discourse organizing bundles used by proficiency level. [Figure not reproduced: proportion of analyzed bundles by function for Level 1, Level 2, and Level 3.]

Fig. 4 shows that the three proficiency levels are using discourse organizing and stance bundles at very similar rates
relative to their use of other bundles. The majority of bundles are related to the specific topics used in the exam prompts, and
test takers used very few bundles that reference the textual context in a general way, rather than framing a specific topic
within the essay.
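The proportional comparison behind Fig. 4 can be illustrated with a short sketch. It assumes that each function's share is the sum of its bundles' normed frequencies divided by the sum for all analyzed bundles at that level; the helper name `function_shares` is hypothetical, and only a five-bundle subset of Table 2 is used, so the resulting shares are illustrative and do not reproduce the figure.

```python
# Subset of Table 2: mean frequency per 100 words at Levels 1, 2, and 3.
FREQS = {
    "there are a lot": (0.014, 0.012, 0.007),
    "i agree with the": (0.028, 0.024, 0.017),
    "it is important to": (0.027, 0.018, 0.021),
    "on the other hand": (0.043, 0.039, 0.031),
    "according to the lecture": (0.014, 0.012, 0.010),
}

CATEGORY = {
    "there are a lot": "referential",
    "i agree with the": "stance",
    "it is important to": "stance",
    "on the other hand": "discourse organizing",
    "according to the lecture": "discourse organizing",
}


def function_shares(freqs, category):
    """Return, for each level, the share of summed frequency held by each function."""
    shares = {}
    for i, level in enumerate(["Level 1", "Level 2", "Level 3"]):
        totals = {}
        for bundle, values in freqs.items():
            func = category[bundle]
            totals[func] = totals.get(func, 0.0) + values[i]
        grand_total = sum(totals.values())
        shares[level] = {func: value / grand_total for func, value in totals.items()}
    return shares


shares = function_shares(FREQS, CATEGORY)
for level, by_function in shares.items():
    print(level, {func: round(share, 3) for func, share in by_function.items()})
```

Summing normed frequencies within a category before dividing keeps the shares comparable across levels even though the levels differ in overall bundle use.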

5.4. Degree of fixedness

In order to further understand lexical bundle use in NNS writing we investigated patterns of bundle fixedness across
proficiency levels. The lexical bundles that met the inclusion criteria for this study were categorized as either fixed or variable
slot. Table 3 shows the resulting percentages for each proficiency level. There are no identifiable trends in the data, with the
exception of a very slight proportional increase in the number of fixed bundles as proficiency increases. However, this
increase is far too small to be interpreted as meaningful.
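The percentages reported in Table 3 follow directly from the raw counts, as the short check below shows. The counts are taken from Table 3; the helper name `fixed_share` is introduced here only for illustration.

```python
# (fixed, variable slot) bundle counts per proficiency level, from Table 3.
COUNTS = {
    "Level 1": (781, 754),
    "Level 2": (1096, 921),
    "Level 3": (916, 738),
}


def fixed_share(fixed, variable):
    """Percentage of bundles that are fixed, out of all fixed and variable slot bundles."""
    return 100 * fixed / (fixed + variable)


for level, (fixed, variable) in COUNTS.items():
    print(f"{level}: {fixed_share(fixed, variable):.2f}% fixed")
```

The fixed share rises from 50.88% at Level 1 to 55.38% at Level 3, a difference of 4.5 percentage points.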
Additionally, the lists of fixed and variable slot lexical bundles were reviewed, but they did not reveal any noteworthy
differences among the three proficiency levels. Therefore, for this stage of the analysis, the fixed and variable slot lists for each
proficiency level were combined for qualitative analysis. The ten most frequent fixed and variable slot lexical bundles can be
seen in Table 4 below, with the variable slots indicated with asterisks. It can be seen that the majority of bundles in both lists
are closely related to the writing task prompt. However, it is interesting to note that ten of the eleven variable slots (91%) were
most often filled by function words rather than content words. This finding in the ten most frequent bundles led us to
calculate the total percentage of function word fillers in the entire pool of variable slot tokens (unique variable slots). The
results of this analysis showed that 61% of the variable slot tokens were filled by function words. Biber (2009) showed that
function word fillers were much more common in conversation, whereas content words were more commonly used as fillers
in academic writing.
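The filler calculation described above amounts to a simple proportion, sketched below. The function word list and the example fillers are hypothetical and deliberately tiny; an actual analysis would need a full function word inventory and the observed fillers for each variable slot.

```python
# Hypothetical, deliberately small function word list (an assumption for illustration).
FUNCTION_WORDS = {
    "a", "an", "the", "to", "of", "in", "and", "not",
    "is", "are", "that", "i", "we", "they", "it",
}


def function_word_share(fillers):
    """Proportion of variable slot fillers that are function words."""
    if not fillers:
        raise ValueError("at least one filler is required")
    hits = sum(1 for filler in fillers if filler.lower() in FUNCTION_WORDS)
    return hits / len(fillers)


# Hypothetical fillers observed in variable slots.
example_fillers = ["to", "not", "people", "and"]
print(function_word_share(example_fillers))
```

Here three of the four example fillers are function words, so the share is 0.75.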
The final step in our investigation of lexical bundle fixedness was to look more closely at the variable slot bundles.
Biber (2009) divided variable slot bundles into two sub-categories: continuous sequence and internal variable slot.
Continuous sequence bundles include both fixed four-word bundles and fixed three-word strings which are preceded
and/or followed by a variable slot. Internal variable slot bundles, on the other hand, consist of a fixed frame in
which the second and/or third slot is variable. Fig. 5, below, displays the proportion of these two variable slot types
across proficiency levels. Similar to our previous fixedness analyses, there is no noticeable variation across proficiency
levels. However, the pattern for all three proficiency levels is clear. There is a strong preference for using continuous
sequence bundles rather than ones with internal variable slots. Biber (2009) showed that this pattern is associated
with conversational discourse, in contrast with academic prose which prefers internal variable slot lexical bundles
(p. 299).
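The two sub-categories can be made concrete with a small classifier over four-slot patterns, where an asterisk marks a variable slot. This is a simplified sketch: it treats any pattern with a variable second or third slot as an internal variable slot bundle and everything else as a continuous sequence, which is an assumption made for illustration. The function name and the example frame 'the * of the' are hypothetical.

```python
def classify_pattern(pattern):
    """Classify a four-slot bundle pattern; '*' marks a variable slot."""
    slots = pattern.split()
    if len(slots) != 4:
        raise ValueError("expected exactly four slots")
    # A variable slot in position 2 or 3 interrupts the fixed words.
    if slots[1] == "*" or slots[2] == "*":
        return "internal variable slot"
    # Otherwise the fixed words are contiguous (variable slots, if any, sit at the edges).
    return "continuous sequence"


for pattern in ["* be able to", "fish farming is *", "the * of the", "on the other hand"]:
    print(pattern, "->", classify_pattern(pattern))
```

Patterns such as '* be able to' and 'fish farming is *' from Table 4 fall into the continuous sequence category under this rule.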

Table 3
Percentage distribution of fixed versus variable slot lexical bundles across proficiency levels.

           Fixed bundles    Variable slot bundles   Total

Level 1    781 (50.88%)     754 (49.12%)            1535
Level 2    1096 (54.34%)    921 (45.66%)            2017
Level 3    916 (55.38%)     738 (44.62%)            1654

Table 4
Ten most frequent fixed and variable slot lexical bundles.

Fixed bundles               Variable slot bundles

On the other hand           the sun * the stars
To cooperate with others    * in the past
The sun and the             ability to cooperate *
That the ability to         * more important *
That fish farming is        * are interested in
Agree with the statement    * be able to
I am interested in          * be able to
That it is more             the sun and *
It is important to          to study subjects *
I agree with the            fish farming is *

In sum, the findings related to lexical bundle fixedness revealed little variation across proficiency levels. The three
proficiency levels used very similar proportions of fixed and variable slot bundles. Likewise, there were no meaningful trends
across proficiency levels in their use of internal variable slot versus continuous sequence variable slot bundles. However,
comparisons of the fixedness patterns in these data from NNSs and the patterns revealed in previous analyses of academic
writing and conversation show marked differences, which will be discussed in more detail below.

5.5. Comparison with previous findings for academic writing and conversation

A comparison of the results for function and fixedness with those of academic writing and conversation indicates that in
both areas, all three groups are trending towards use of the types of bundles found in conversation rather than academic
writing. First, none of the groups used very many referential bundles in their exam responses. Previous research has shown
that bundles such as ‘the nature of the’, ‘in terms of the’, and ‘on the basis of’ are frequent in academic writing (Biber et al.,
2004; Chen & Baker, 2010; Hyland, 2008). These bundles help to identify characteristics of abstract entities and thus are
helpful in framing specific informational content. Biber et al. (2004) found that referential bundles were the most frequent out
of the three categories in academic writing (p. 396). The fact that only two referential bundles were identified indicates that
test takers did not utilize this important function of bundles in academic writing.
Furthermore, the results of the fixedness analysis suggest that the use of variable slot lexical bundles by the NNS writers in
this study tends to follow patterns of native speaker conversation rather than academic writing. The first indication of this is
the use of function word fillers in variable slots. Biber (2009) states, “Function words predominate in conversation, in both
fixed and variable slots” (p. 299). It appears that function word fillers are generally more common in variable slot bundles
(61%), and that this pattern is even more pronounced in the ten most frequent bundles (91%).
The results of the final analysis, in which internal variable slot and continuous sequence bundles were compared, offer
additional evidence that lexical bundle use by NNS writers is comparable to that which is found in NS conversation. All three
proficiency levels use continuous sequence bundles at least 75% of the time. This pattern closely resembles the patterns seen
in conversation (see Biber, 2009, p. 295).

Fig. 5. Proportional distribution of variable slot lexical bundles (continuous sequence and internal variable slot) across proficiency levels. [Figure not reproduced: continuous sequence versus internal variable slot bundles for Level 1, Level 2, and Level 3.]

6. Conclusion and implications

The use of formulaic sequences has been a focus of studies on learner development for quite some time. The current study
provides some evidence that suggests there may be a developmental sequence for some aspects of formulaic language use.
This sequence is reflected in the rates of occurrence of bundles used by test takers scoring at different levels of the TOEFL iBT.
The findings seem to support Ellis’ hypothesis and Myles et al.’s (1998) findings that low-level NNS learners rely heavily on
formulaic patterns and move toward self-constructed language as their proficiency increases (Ellis, 2002, p. 145). This would
also support conclusions from other second language acquisition studies which show that developmental sequences begin
with memorization and one-to-one form-function mapping and move slowly in the direction of more native-like production
of language (Ellis, 2006).
On the other hand, there was not much variability across the three levels of TOEFL iBT scores in terms of the functional use
of the bundles or the degree of bundle fixedness. One potential explanation for lack of variability in this area is the possibility
that test takers as a whole have had more exposure to stance and discourse organizing functions and less exposure to
referential functions of bundles. Chen and Baker (2010) found that both NS and NNS student writers used proportionally
fewer referential bundles than those found in published academic writing. However, TOEFL test takers appear to be using
proportionally fewer referential bundles than the two groups of writers in Chen and Baker’s study, since approximately 40% of
the bundles used by those writers were categorized as referential. The test takers, regardless of overall proficiency level, may
not have developed the skills necessary, for example, to refer to abstract entities, a common function within academic writing.
This suggests an area that EAP instructors may want to include in direct instruction. Frequently used referential bundles could
be explicitly taught and students could be given opportunities to practice using such bundles in the context of academic
writing assignments. Schmitt, Dörnyei, Adolphs, and Durow (2004) offer evidence that formulaic language can be learned by
EAP students in a short-term course. Cortes (2006) provides an example of how such instruction could proceed. Notably,
although her study found little increase in students’ use of targeted bundles after instruction, students did produce more
bundles that framed intangible attributes (e.g., ‘in the face of’).
Similarly, variable slot bundles, found rarely in the data set, may be an area to focus on with EAP learners. These bundles
provide important building blocks for managing the highly informational writing found in academic texts. However, new
approaches to teaching bundle frames rather than fixed sequences need to be developed.
It should be noted that another explanation for the lack of referential and variable slot bundles could be register
differences between longer and shorter essay writing. Unlike the studies on which this study has been based (e.g., Biber, 2009;
Biber et al., 2004) and previous studies investigating bundle frequency and function in NNS writing (e.g., Chen & Baker,
2010), the data in this study were collected in high-stakes testing situations. The writing produced in such contexts may
constitute a different register than other types of academic writing and thus may be associated with different characteristics
of bundle use in terms of both function and level of fixedness. Future research should explore such register differences as they
have implications for students entering academic institutions. Finally, the results presented here suggest that lexical bundle
use may be a feature of interest for assessments designed to measure learner development. However, it is important to
remember that raters using holistic scoring methods are responding to a number of textual features, and that lexical bundle
use is only one of many potentially useful linguistic measures of writing proficiency.
It is also not clear whether some of the differences found between bundles used in TOEFL iBT writing and published
academic writing might also be found for apprentice NS academic writers. Studies have shown a number of similarities in
bundle use between NS and NNS student writers (e.g., Ädel & Erman, 2012; Chen & Baker, 2010). However, there may be
differences between the two groups of writers as well. Chen and Baker (2010), for example, found that NS student writers
used significantly more noun phrase bundles that did not contain ‘of’ (e.g., ‘the extent to which’). Such bundles are often found
to serve referential purposes. Future research could explore whether writers use a more limited set of bundles in controlled
situations such as the TOEFL iBT, regardless of first language use.
Finally, one question raised by Biber’s (2009) findings is whether a bundle containing a variable slot before or after three
fixed words is actually nothing more than a three-word bundle. One possible approach to answering this question is Gries and
Mukherjee’s (2010) computational measure of lexical gravity, which determines the strength of a given association between
words in a lexical bundle. While this is an important methodological issue to consider in future research, we do not believe it
invalidates the findings in this study.


Acknowledgments

This project was supported by Educational Testing Service. We thank Bethany Gray for her feedback during the analysis
stage and William Crawford for suggestions on earlier drafts of this paper. We would also like to thank the anonymous
reviewers for their helpful and insightful comments and suggestions.


Shelley Staples is a PhD candidate in Applied Linguistics at Northern Arizona University. Her research interests include corpus-based analyses of specialized
spoken and written registers as well as applications of corpus-based research to language teaching and learning.

Jesse Egbert is a PhD candidate in Applied Linguistics at Northern Arizona University. His research interests include register variation, academic writing,
corpus stylistics, and English grammar.

Douglas Biber is Regents’ Professor of English (Applied Linguistics) at Northern Arizona University. His research efforts have focused on corpus linguistics,
English grammar, and register variation (in English and cross-linguistic; synchronic and diachronic).

Alyson McClair is a PhD student at Iowa State University. Her research interests include CALL materials development, virtual learning environments,
L2 motivation and learner identity, automatic assessments, iCALL, and corpus linguistics.