With an increasing need for pedagogic mediation of corpus use in L2 writing instruction, this study
explored the potential of controlled corpus consultation of selected formulaic expressions as a pedagogic
mediation to improve learners' awareness of formulaic language and L2 writing quality. This study
compared the use of formulaic expressions and L2 writing quality between two groups: the Controlled
Corpus Consultation Group (CG) and the Uncontrolled Corpus Consultation Group (UG). CG used
formulaic expressions (such as extent to which) as corpus search terms, working with a limited set of concordance lines for
the selected expressions, while UG consulted the corpus data on the same words that made up the
formulaic expressions given to CG, but as individual words (such as extent, to, which). The types, frequency, and
usage of formulaic expressions, as well as the overall writing quality of the two groups, were subjected to
statistical analysis to identify significant group differences. Results suggested that CG
outperformed UG in the frequency and types of formulaic expressions as well as in the overall quality of
writing. These findings suggest that controlled corpus consultation of selected formulaic
expressions improved learners' general awareness of formulaic language and their L2 writing proficiency.
Key words: controlled corpus consultation, formulaic expressions, cluster analysis, L2 writing
doi: 10.15702/mall.2016.19.3.11
I. INTRODUCTION
Corpora have been used as valuable learning tools because they provide realistic, rich, illustrative,
and up-to-date sources of language (Braun, 2005). In particular, direct use of corpora for
learners' analysis has brought strong benefits to L2 learning (e.g., Chambers & O'Sullivan, 2004;
Cresswell, 2007; Flowerdew, 2012; Geluso, 2013; Kennedy & Miceli, 2001; Lee & Swales, 2006;
Nam, 2010; O'Sullivan, 2007; Todd, 2001; Yoon, 2008; Yoon & Hirvela, 2004). These studies
have noted that the discovery process of corpus analysis improved learners' awareness of
lexical patterns and collocations typically used in specific contexts and genres.
Moreover, learners' corpus consultation can foster inductive learning through analysis of a large
number of language examples, and it can improve learner autonomy by allowing students to
engage in independent searches of corpora.
* This study used part of the data from an unpublished doctoral dissertation by Cho (2014).
However, it should be noted that the benefits of learners' corpus consultation come with
significant challenges. As Conrad (2005) pointed out, learners need considerable
technological and research skills to perform corpus consultation: they must know how to use a
computer, handle an overwhelming amount of data, and generalize from and
evaluate the findings. In the same vein, Ädel (2010) reported major challenges of corpus-based
L2 writing instruction. She found that, in dealing with corpus data, students were easily
overwhelmed by the sheer amount of data and felt lost in the maze of a corpus. In
particular, the students had difficulty choosing concordance lines to analyze and identifying
linguistic patterns in the corpus. In addition, the students tended to pay undue attention to
the word and phrase levels, making it difficult to connect surface forms to meaning.
The difficulties of learners' corpus consultation have been documented in a number of
empirical studies (e.g., Chambers, 2005; Chambers & O'Sullivan, 2004; Cheng, Warren,
& Xun-Feng, 2003; Cresswell, 2007; Pérez-Paredes, Sánchez-Tornel, Alcaraz Calero, &
Jiménez, 2011; Sun, 2007; Vännestal & Lindquist, 2007). These studies showed that some
participants did not find the concordances helpful in achieving particular learning purposes. For
instance, in examining the benefits of corpus-based lexicogrammar instruction in an
English as a foreign language class, Liu and Jiang (2009) found that many participants felt
overwhelmed by the extremely large number of search results. They suggested that the problem
was compounded when search results did not seem to be relevant to the students' particular
study questions.
Along a similar line, Vännestal and Lindquist (2007) found that students experienced many
difficulties when learning grammatical rules through corpus-based problem solving tasks and
peer teaching activities, and a student even questioned the usefulness of corpora in language
learning. The authors noted that learners needed extensive introduction and support in
their corpus analysis. Similarly, Kennedy and Miceli (2010) found that students experienced
significant difficulties in corpus consultation, and the authors highlighted the importance of
In the face of these difficulties in learners' corpus consultation, a number of researchers
have pointed out the need for a more controlled approach and suggested providing pedagogic
mediation for students to make the corpus data pedagogically appropriate (e.g., McEnery, Xiao,
& Tono, 2006; Widdowson, 1998; 2000; 2004). Aston (2001) noted that guided and controlled
corpus searching is needed for effective learning even for higher-level students. In particular,
the author suggested that the difficulties that students experienced in the process of corpus
analysis can be reduced by controlling the corpus data and corpus consultation tasks.
Instructors can usually control the corpus data by pre-selecting and pre-editing the data to
make the samples of language suitable for learning objectives and learners' proficiency levels.
As exemplified by Tribble and Jones (1990), Hill (2000), and Yoon and Jo (2014), the use of
controlled corpus data can be actively directed by instructors who manipulate the corpora to
serve the learning needs of students. In addition to using the pre-selected/pre-edited corpus
data, the instructors gave explicit instructions on the corpus analysis and detailed explanations
about formulaic expressions to assist the learners in the corpus consultation process.
The use of controlled tasks can reduce learners' difficulties in corpus consultation. Aston
(2001) suggested several examples of controlled corpus consultation: "choosing tasks which do
not pose undue problems of precision and recall in interrogating the corpus; choosing tasks
which require little manipulation of the output in order to categorize and sort citations, remove
Hyeyoung Cho 13
irrelevant citations, etc.; choosing tasks which do not require all the data to be classified and
interpreted; choosing tasks which require relatively superficial interpretation of the data;
choosing tasks which allow learners to help and support each other; choosing tasks whose more
complex aspects can be delegated to more able students" (pp. 43-44). In particular, having
learners collaborate has been reported to assist the corpus consultation process (e.g.,
Flowerdew, 2008; Gavioli & Aston, 2001; O'Sullivan, 2007). For example, Gavioli and Aston
(2001) noted that collaborative corpus consultation allowed for more generalizable (or
comprehensible) interpretation of the corpus data. In addition, Flowerdew (2008) found that
more proficient students were able to offer their insights and interpretations on the corpus data,
assisting the weaker students to gradually develop independence in the learning process.
Compared to collaborative corpus consultation, the guided inductive approach puts more
focus on the role of instructors in controlling the process of corpus consultation. Referring to
the definition put forth by Flowerdew (2009), Smart (2014) noted that the guided inductive
approach to learners' corpus consultation is the process whereby teachers assist learners in the
language discovery process through extensive guidance and scaffolding. Flowerdew (2012)
showed an example of the guided inductive approach in the use of an online corpus to teach
formulaic expressions. In learning the frequent and appropriate usage of formulaic expressions in
business letters, the students engaged in hands-on consultation of a corpus, a process
aided by the instructor's feedback and suggestions. Flowerdew noted that the guided
inductive approach assisted the learners in developing linguistic and pragmatic awareness of the
genre of business letters.
In order to further advance the controlled tasks of corpus consultation, this study aimed to
examine the potential benefits of a type of controlled task (using selected formulaic expressions as
search terms for corpus analysis) for improving learners' linguistic awareness of formulaic
language and L2 writing proficiency. It should be noted that the use of single words as search
terms for corpus consultation has been customary in learners' corpus consultation (e.g.,
Kennedy & Miceli, 2001; Lee & Swales, 2006; O'Sullivan, 2007; Todd, 2001; Yoon, 2008; Yoon
& Hirvela, 2004). However, as studies have suggested (e.g., Ädel, 2010; Kennedy & Miceli,
2010; Liu & Jiang, 2009), students felt seriously overwhelmed by the large amount of corpus
data that may appear irrelevant to the students' particular objectives of a corpus query. In
addition, learners may find it difficult to identify typical formulaic patterns and build
hypotheses about their usage from the data. Further, it would be equally challenging for
learners to refine their corpus search through evaluation of their findings.
On the other hand, the controlled task of using selected formulaic expressions as search
III. METHOD
1. Participants
This study was conducted in two English writing classes at a Korean university. Each class consisted of
40 students of different majors such as English, Chinese, Japanese, and Business
Administration. On the first day of the experiment, the researcher surveyed the participants
about their gender, age, majors, and years of English education. After the survey, the students
were asked to visit a website (http://www.typeonline.co.uk/typingspeed.php) to measure their
English typing speed. Since the participants took the writing tests on a computer, it was important to
test the homogeneity of the two groups’ typing speed. After the measurement of the typing
speed, the participants took a pretest, consisting of two English argumentative essay writing
activities. The description of the participants is provided in Table 1.
[TABLE 1] Description of the Participants
Group  N    Gender (M/F)  Age, Mean (SD)  Years of Education, Mean (SD)  Typing Speed (WPM), Mean (SD)  Writing Quality (0-6 scale), Mean (SD)
CG     40   6/34          22.45 (1.64)    12.65 (1.81)                   48.20 (10.78)                  2.38 (.84)
UG     40   4/36          22.40 (1.14)    12.40 (1.67)                   47.05 (14.49)                  2.60 (.55)
2. Data Collection
The experiment started with an introductory class of corpus-based writing activities for
both groups. The participants learned basic concepts of corpus linguistics and did hands-on
activities to analyze corpus data. After the introductory class, the students took a pretest of
writing two argumentative essays on different topics. Different topics were used to reduce the
direct effects of topic on writing scores. Each writing activity consisted of two
stages: 10 minutes for brainstorming and 30 minutes for writing and revising.
From the second to the ninth week of the semester, both groups engaged in L2 writing class
activities using the textbook, "Writing Academic English" by Oshima and Hogue (2006). In
most cases, the students received lectures of about 15 minutes on theoretical issues in L2
writing using the textbook and engaged in relevant textbook activities and writing tasks. After
60 minutes of textbook-based activities, approximately 30 minutes were allotted for learners'
corpus consultation using the Corpus of Contemporary American English (COCA). CG
students were given a list of formulaic expressions (e.g., extent to which, when it comes to) in each
class, selected from the Academic Formulas List (AFL) (Simpson-Vlach & Ellis, 2010). UG
students were given a list of the individual words that make up the formulaic expressions that
were given to CG (e.g., extent, to, which, when, come). In addition to the AFL, this study included
several formulaic expressions frequently used in argumentative essays, which were chosen from
"TOEFL writing (TWE) topics and model essays" (Wayabroad Company, 2002), a collection of
model essays and writing templates of argumentative essays. The inclusion was necessary
because the text type of each test was an argumentative essay. Given the various majors of the
[TABLE 2] Timetable of the Study
3. Data Analysis
The data collected in the experiment were analyzed for quantitative differences (types
and frequencies of formulaic expressions, and writing quality) and qualitative differences (usage of
formulaic expressions) between the two groups.
To analyze the types and frequencies of formulaic expressions, this study
performed cluster analysis using WordSmith Tools 5.0. Clusters refer to "words which are found
repeatedly together in each others' company, in sequence. They represent a tighter relationship
than collocates, more like multi-word units or groups or phrases" (Scott, 2014, para. 1). Given
this broad definition, this study used the results of cluster analysis to
investigate how the two groups differed in their use of formulaic expressions. The cluster analysis
produced separate lists of 2-, 3-, 4-, and 5-word clusters for the pretest
and the immediate and delayed posttests, with a minimum frequency cut-off of five.
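To make the cluster lists concrete, the following is a minimal sketch of what such a list involves: counting contiguous 2- to 5-word sequences and keeping those that reach the cut-off of five. It is an illustrative stand-in, not WordSmith's implementation. The tokenizer (lowercased letters with internal apostrophes) and the function names `extract_clusters` and `cluster_profile` are assumptions, and dedicated tools may treat punctuation and sentence boundaries differently than this sketch, which lets clusters run across them.

```python
import re
from collections import Counter

# Assumed tokenizer: lowercase word forms, keeping internal apostrophes (e.g. "don't").
TOKEN = re.compile(r"[a-z]+(?:'[a-z]+)?")


def extract_clusters(texts, n, min_freq=5):
    """Count contiguous n-word sequences across all texts (hypothetical helper).

    Only clusters occurring at least `min_freq` times are kept, mirroring
    the minimum frequency cut-off of five. In this simplified version,
    clusters may span punctuation and sentence boundaries.
    """
    counts = Counter()
    for text in texts:
        tokens = TOKEN.findall(text.lower())
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return {cluster: freq for cluster, freq in counts.items() if freq >= min_freq}


def cluster_profile(texts, sizes=range(2, 6), min_freq=5):
    """Return {n: (number of cluster types, total cluster frequency)} for each size n."""
    profile = {}
    for n in sizes:
        kept = extract_clusters(texts, n, min_freq)
        profile[n] = (len(kept), sum(kept.values()))
    return profile


essays = ["When it comes to food, I agree."] * 5  # toy input, not study data
print(cluster_profile(essays))
```

With five copies of the toy sentence, every 2-word sequence passes the cut-off, giving six cluster types with 30 occurrences in total. With only four copies, nothing reaches a frequency of five and every list is empty.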
Based on the cluster lists, this study calculated the total frequency of clusters as
well as the number of cluster types and performed chi-square tests to investigate differences
between CG and UG. To investigate the instructional effects of the controlled corpus
consultation on general linguistic awareness of formulaic language, this study
examined the use of uninstructed as well as instructed clusters in CG's writing and
performed chi-square tests to assess statistical significance.
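The exact form of the chi-square test is not stated. Because the two groups were the same size (40 students each), this sketch assumes a one-degree-of-freedom goodness-of-fit test of each pair of counts against an equal split. Under that assumption the statistic reduces to (a - b)^2 / (a + b), and the sketch reproduces, for example, the pretest 2-word results reported in Table 3: χ²(1)=3.89, p=.049 for frequency and χ²(1)=0.62, p=.431 for types. The function name is hypothetical, and only the standard library is used.

```python
import math


def chi_square_equal_split(a, b):
    """One-df goodness-of-fit chi-square for two counts against an equal split.

    Each group's expected count is e = (a + b) / 2, so
    sum((observed - e) ** 2 / e) simplifies to (a - b) ** 2 / (a + b).
    """
    chi2 = (a - b) ** 2 / (a + b)
    # With one degree of freedom, chi2 = z ** 2, so the upper-tail p-value
    # is P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Pretest frequencies of 2-word clusters (CG vs UG).
chi2, p = chi_square_equal_split(2305, 2173)
print(f"chi2(1) = {chi2:.2f}, p = {p:.3f}")
```

If the tests were instead run as contingency tables with unequal expected counts, this shortcut would not apply. It relies on the equal group sizes.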
In addition, this study examined differences in the usage of formulaic expressions between the
two groups by analyzing the clusters used exclusively by each group. Further, following Hyland's
(2008) classification of word clusters, this study classified the clusters into
research-oriented, text-oriented, and participant-oriented types and compared their usage
between the two groups.
[TABLE 3] Types and Frequencies of 2-Word Clusters in CG and UG
Measure     Test                  CG      UG      Chi-square
Type        Pretest               214     198     χ²(1)=0.62, p=.431
            Immediate posttest    287     225     χ²(1)=7.51, p=.006*
            Delayed posttest      306     259     χ²(1)=3.91, p=.048*
Frequency   Pretest               2305    2173    χ²(1)=3.89, p=.049*
            Immediate posttest    3365    2645    χ²(1)=86.46, p<.001*
            Delayed posttest      3199    2638    χ²(1)=53.92, p<.001*
Note. An asterisk indicates that the chi-square value is statistically significant.
Regarding the total frequency of 2-word clusters, CG used 2-word clusters more
frequently (2305) than UG (2173) on the pretest, and the group difference was marginally
significant (χ²(1)=3.89, p=.049). This weak significance on the pretest contrasts with the
strong results on the posttests, indicating noticeable differences in instructional benefits between the
two groups. CG used 2-word clusters 3365 times in the immediate posttest, while UG used
them 2645 times. On the delayed posttest, CG used 2-word clusters 3199 times, while UG used
them 2638 times. The group differences were statistically significant in both the immediate
(χ²(1)=86.46, p<.001) and the delayed posttest (χ²(1)=53.92, p<.001), indicating
meaningful instructional benefits of controlled corpus consultation of formulaic expressions. For
cluster types, the group difference was not significant on the pretest (χ²(1)=0.62, p=.431) but was
significant in the immediate (χ²(1)=7.51, p=.006) and delayed posttests (χ²(1)=3.91, p=.048),
suggesting that CG both increased and diversified their use of 2-word clusters.
The examination of 3-word clusters showed results similar to those for 2-word clusters,
providing further evidence of the instructional benefits for CG in their use of formulaic
expressions, as shown in Table 4.
[TABLE 4] Types and Frequencies of 3-Word Clusters in CG and UG
Measure     Test                  CG      UG      Chi-square
Type        Pretest               51      46      χ²(1)=.26, p=.612
            Immediate posttest    84      50      χ²(1)=8.63, p=.003*
            Delayed posttest      68      52      χ²(1)=2.13, p=.144
Frequency   Pretest               588     549     χ²(1)=1.34, p=.247
            Immediate posttest    1047    715     χ²(1)=62.56, p<.001*
            Delayed posttest      713     526     χ²(1)=28.22, p<.001*
Note. An asterisk indicates that the chi-square value is statistically significant.
[TABLE 5] Types and Frequencies of 4-Word Clusters in CG and UG
Measure     Test                  CG      UG      Chi-square
Type        Pretest               24      21      χ²(1)=.2, p=.655
            Immediate posttest    36      27      χ²(1)=1.29, p=.257
            Delayed posttest      25      19      χ²(1)=.82, p=.366
Frequency   Pretest               313     284     χ²(1)=1.41, p=.235
            Immediate posttest    485     382     χ²(1)=12.24, p<.001*
            Delayed posttest      286     216     χ²(1)=9.76, p=.002*
Note. An asterisk indicates that the chi-square value is statistically significant.
The frequency of 5-word clusters also showed meaningful group differences in the immediate
and delayed posttests (Table 6). Although the group differences in the types of 5-word clusters were
not significant (presumably because of the small counts), the differences in frequency were
statistically significant in the immediate (χ²(1)=8.61, p=.003) and delayed
posttests (χ²(1)=8.15, p=.004). This finding is more meaningful considering the non-significant
group difference on the pretest (χ²(1)=2.2, p=.138), indicating a significant difference in the
instructional effects between the two groups.
[TABLE 6] Types and Frequencies of 5-Word Clusters in CG and UG
Measure     Test                  CG      UG      Chi-square
Type        Pretest               19      16      χ²(1)=.26, p=.612
            Immediate posttest    28      18      χ²(1)=2.17, p=.14
            Delayed posttest      17      13      χ²(1)=.53, p=.465
Frequency   Pretest               206     177     χ²(1)=2.2, p=.138
            Immediate posttest    270     206     χ²(1)=8.61, p=.003*
            Delayed posttest      185     134     χ²(1)=8.15, p=.004*
Note. An asterisk indicates that the chi-square value is statistically significant.
Qualitative investigation of the use of instructed and uninstructed clusters in the two
groups' writing suggested that CG was more successful than UG in developing general awareness
of formulaic expressions through corpus searches. For instance, the formulaic
use of sense, or sense-clusters, showed marked group differences after eight weeks of corpus
consultation tasks. As shown in Table 8, neither group used any word cluster containing the term
sense on the pretest. However, after CG did the corpus consultation on the expression
in the sense that, the students seemed to develop their awareness of the formulaic use of sense, as
they used a variety of sense-clusters such as a sense of, in this sense and a sense of belonging in the
immediate and delayed posttests. This diverse use of sense-clusters by CG is in stark contrast
with UG, which showed only one type of cluster (sense of) in the delayed posttest despite their
corpus search on sense.
[TABLE 8] Use of Sense-Clusters by CG and UG
The different development of awareness of formulaic expressions between the two
groups seemed to be attributable to the different units of search terms in the corpus
consultation. During the corpus consultation, the researcher witnessed many instances in
which CG students gradually modified their search terms (e.g., in the sense that) to shorter and
simpler ones (e.g., in the sense, the (a) sense, and sense), presumably in an attempt to refine their
corpus search. Through this process, the students moved from dealing with a few simple
concordance lines to analyzing more diverse and complicated ones that included various
semantic and textual usages of the search terms. Despite the increasing difficulty of the task, the
process seemed manageable because each new corpus analysis was scaffolded by the findings from
the prior search. Through this gradual refinement of corpus consultation, the CG students
seemed to improve their awareness of various uninstructed formulaic expressions.
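The narrowing effect of a formulaic search term can be illustrated with a minimal keyword-in-context (KWIC) sketch. It uses a small invented list of sentences rather than COCA data (COCA is queried through its own interface), and the `kwic` helper is hypothetical. The only point shown is that a longer search term such as in the sense that returns a subset of the lines returned by sense, so shortening the term step by step widens the concordance gradually.

```python
import re


def kwic(sentences, query, width=25):
    """Return keyword-in-context lines for whole-word, case-insensitive matches of `query`."""
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    lines = []
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            left = sentence[max(0, match.start() - width):match.start()]
            right = sentence[match.end():match.end() + width]
            # Right-align the left context so the node words line up in a column.
            lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


# Invented toy sentences for illustration only.
corpus = [
    "It makes sense to start early.",
    "She has a good sense of humor.",
    "The plan failed in the sense that nobody used it.",
    "He was right in the sense of being honest.",
    "In this sense, the argument holds.",
    "Students need a sense of belonging.",
]

# Moving from the formulaic expression to shorter search terms.
for query in ["in the sense that", "in the sense", "sense"]:
    hits = kwic(corpus, query)
    print(f"{query!r}: {len(hits)} line(s)")
    for line in hits:
        print("   ", line)
```

On this toy corpus the three queries return 1, 2 and 6 lines respectively. Each step adds only a few new lines to interpret, whereas starting from sense presents all six usages at once.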
However, UG, who started their corpus consultation with the single word sense, seemed
to find it more difficult to develop their corpus analysis by modifying search
terms as CG did. As shown in Figure 1, UG students had to deal with various usages and
functions of sense from the first corpus search results such as get a sense of, make sense, could sense,
my sense of, and common sense.
The concordances including various examples of the use of sense may have seemed too
arbitrary for the students to notice formulaic patterns based on semantic and functional
consideration of the search term. In order to advance their corpus search, UG students had to
choose one or two clusters based on the clusters’ pedagogical values, the process which
Results of the qualitative investigation of word-cluster usage in CG's and
UG's writing helped to explain the significant group differences in learners'
awareness of formulaic expressions. To choose word clusters for qualitative
examination, this study identified clusters that were used exclusively by each group in the
immediate and delayed posttests. For an accurate evaluation of the instructional effects, this study
excluded clusters that were part of the writing prompts, because it is difficult
to determine whether the use of formulaic expressions taken from the prompts was a
result of improved awareness of formulaic language through learners' corpus consultation.
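The selection step described above amounts to two set differences followed by removal of prompt wording. This sketch assumes each group's posttest clusters are already available as plain lists. The function name and sample inputs are hypothetical; office buildings stands in for prompt wording because the text mentions it as part of a writing prompt.

```python
def exclusive_clusters(group_a, group_b, prompt_clusters=()):
    """Return (clusters only in group_a, clusters only in group_b), sorted.

    Clusters that also occur in the writing prompts are dropped from both sides,
    since their use may come from the prompt rather than from learners' awareness.
    """
    a, b, prompt = set(group_a), set(group_b), set(prompt_clusters)
    return sorted(a - b - prompt), sorted(b - a - prompt)


# Hypothetical inputs for illustration.
cg = ["a sense of", "in terms of", "office buildings", "i agree with"]
ug = ["many of", "official buildings", "i agree with"]
cg_only, ug_only = exclusive_clusters(cg, ug, prompt_clusters=["office buildings"])
print(cg_only)  # ['a sense of', 'in terms of']
print(ug_only)  # ['many of', 'official buildings']
```

Here office buildings occurs only in the CG list, but it is still excluded because it comes from the prompt. A cluster shared by both groups, such as i agree with, is excluded by the set difference itself.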
The qualitative investigation of the usage of word clusters revealed two major findings.
First, UG used some erroneous expressions that were not found in CG's writing. For instance,
official buildings, wear same clothes and many of were erroneous clusters present only in UG's writing.
Official buildings was classified as erroneous because qualitative
analysis of UG's writing suggested that it was mistakenly used for office buildings in the writing
prompt. In addition, wear same clothes and many of are grammatically incorrect owing to the omission of
the article the and the incorrect use of the preposition of, respectively. Figure 2 illustrates the
erroneous use of many of and shows that it is not an isolated mistake by a single writer but a
systematic error made by multiple writers in UG. Given the
distinctive and systematic errors by UG writers in their use of word clusters, it appears
that UG was less aware of appropriate forms of formulaic language than CG, which did not use
such erroneous expressions in their writing.
3-word cluster
  research-oriented
    CG: a lot of, in touch with, more and more, not good for, the fact that, do harm to, is harmful to, the issue of, there is not, we don't have, a sense of, sense of belonging, to talk about, in front of, can talk about, in that it, it is easy, talk about the, the statement that, they can talk, they want to, to have a
    UG: wear same clothes*, is not only, there is a, and it is, is one of, can be a, there are many, but it is, it is hard, some people say
  text-oriented
    CG: according to the, it comes to, when it comes, in other words, in this sense, for these reasons, in terms of, for this reason
    UG: this is because, because of the, however it is, however I think
  participant-oriented
    CG: I agree with, agree with the, in my opinion, should not be, these reasons I
    UG: I believe that, I agree that
4-word cluster
  research-oriented
    CG: a sense of belonging
    UG: is one of the
  text-oriented
    CG: when it comes to
    UG: -
  participant-oriented
    CG: I agree with the, for these reasons I
    UG: -
Among 3-word clusters, research-oriented clusters were the most common in
both groups. Notably, CG's clusters represented various syntactic
structures (e.g., do harm to, is harmful to, there is not, we don't have, can talk about, in that it, it is easy,
they can talk, they want to), while most of UG's research-oriented clusters (8 out of 10)
included be verbs. Studies have suggested that excessive use of the be copula as a main verb is
indicative of syntactically simple English writing (e.g., Hinkel, 2002; 2003). For
text-oriented clusters, CG used a wider variety (8) than UG (4). In particular, UG's
3-word text-oriented clusters were limited to causal (because) and adversative (however) connectors,
while CG used transition signals with various textual functions, such as according to the,
when it comes, in other words, in this sense and in terms of. Participant-oriented clusters likewise
showed that CG used diverse types such as I agree with, agree with the, in my opinion, should not be
and these reasons I, while UG used only two, I believe that and I agree that. The
4-word clusters showed a similar pattern: UG had only one exclusive cluster
(is one of the), while CG used four clusters spanning all three macrofunctions.
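The 8-out-of-10 figure can be checked mechanically. This sketch transcribes the research-oriented 3-word clusters from the cluster table above and counts the clusters that contain a form of be. The set of be forms and the helper name are assumptions made for illustration.

```python
# Forms of "be" counted as copula/auxiliary markers (an assumed, simple list).
BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}

# Research-oriented 3-word clusters used exclusively by each group (from the cluster table).
UG_RESEARCH = [
    "wear same clothes", "is not only", "there is a", "and it is", "is one of",
    "can be a", "there are many", "but it is", "it is hard", "some people say",
]
CG_RESEARCH = [
    "a lot of", "in touch with", "more and more", "not good for", "the fact that",
    "do harm to", "is harmful to", "the issue of", "there is not", "we don't have",
    "a sense of", "sense of belonging", "to talk about", "in front of",
    "can talk about", "in that it", "it is easy", "talk about the",
    "the statement that", "they can talk", "they want to", "to have a",
]


def count_be_clusters(clusters):
    """Return (number of clusters containing a form of be, total number of clusters)."""
    with_be = sum(1 for c in clusters if BE_FORMS & set(c.lower().split()))
    return with_be, len(clusters)


print("UG:", count_be_clusters(UG_RESEARCH))  # UG: (8, 10)
print("CG:", count_be_clusters(CG_RESEARCH))  # CG: (3, 22)
```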
In brief, CG used not only fewer erroneous word clusters but also more diverse clusters serving
different macrofunctions than UG. The results indicate the instructional benefits of
controlled corpus consultation for improving learners' awareness of appropriate and diverse
forms of formulaic expressions. In addition, it should be noted that most of the clusters used
exclusively by CG were not the given search terms for corpus consultation. This seems to
suggest that the controlled task did not restrict the scope of the corpus search; rather, it allowed
for more effective investigation of the data to develop learners' general awareness of formulaic
expressions in L2.
To compare the instructional benefits for writing quality between the
two groups, this study examined the groups' mean writing test scores, the significance
of the group difference on each test, and the group by time interaction. The results are reported in
Table 10. For the within-subject ANOVA, Mauchly's test indicated that the
sphericity assumption was violated (W(2)=.905, p=.021), so the Huynh-Feldt adjustment was used
for hypothesis testing. In the separate analyses of each group's change over time, sphericity held
for CG (W(2)=.990, p=.829) but was violated for UG (W(2)=.784, p=.010), so the UG test was
likewise adjusted with the Huynh-Feldt epsilon.
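The Huynh-Feldt correction leaves the F ratio unchanged and multiplies both degrees of freedom by the estimated epsilon (written here as ε̃). Epsilon itself is not reported, so the values below are back-calculated from the degrees of freedom in Table 10. They serve only to show how the adjustment produces the non-integer degrees of freedom, with k = 3 measurement points, N participants and g groups.

```latex
\begin{aligned}
\mathrm{df}_{\mathrm{effect}} &= \tilde{\varepsilon}\,(k-1), \qquad
\mathrm{df}_{\mathrm{error}} = \tilde{\varepsilon}\,(k-1)(N-g) \\[4pt]
\text{Group} \times \text{time } (k=3,\ N=80,\ g=2):\quad
\tilde{\varepsilon} &\approx \frac{147.57}{(3-1)(80-2)} = \frac{147.57}{156} \approx .946,
\qquad \tilde{\varepsilon}\,(k-1) \approx 1.89 \\[4pt]
\text{UG time effect } (k=3,\ N=40,\ g=1):\quad
\tilde{\varepsilon} &\approx \frac{66.63}{(3-1)(40-1)} = \frac{66.63}{78} \approx .854,
\qquad \tilde{\varepsilon}\,(k-1) \approx 1.71
\end{aligned}
```

CG's separate test needed no correction because sphericity held, which is why its degrees of freedom remain the integers 2 and 78.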
[TABLE 10] Writing Quality of CG and UG
Group                Pretest        Immediate posttest   Delayed posttest   Within-subject ANOVA (time effect)
                     Mean (SD)      Mean (SD)            Mean (SD)
CG                   2.38 (.84)     3.13 (.69)           3.9 (.87)          F(2,78)=103.77, p<.001*
UG                   2.6 (.55)      2.9 (.74)            2.7 (.65)          F(1.71,66.63)=3.1, p=.059
Independent t-test   t(78)=-1.42,   t(78)=1.41,          t(78)=6.99,
                     p=.159         p=.164               p<.001*
Between-subject ANOVA (group by time interaction): F(1.89,147.57)=40.43, p<.001*
Note. An asterisk indicates that the p value is statistically significant.
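The independent t-tests in Table 10 can be approximated from the tabled summary statistics. This sketch assumes Student's t with equal group sizes (n = 40), in which case the pooled and Welch standard errors coincide. The helper name is hypothetical. Because the tabled means and SDs are rounded, recomputed values may differ slightly from the reported ones: the pretest, for example, gives roughly -1.39 rather than -1.42. If SciPy is available, `scipy.stats.ttest_ind_from_stats` computes the same statistic from these inputs along with a p-value.

```python
import math


def independent_t_from_summary(m1, sd1, m2, sd2, n):
    """Student's independent-samples t for two groups of equal size n.

    With equal group sizes the pooled standard error simplifies to
    sqrt((sd1**2 + sd2**2) / n); the degrees of freedom are 2n - 2.
    """
    se = math.sqrt((sd1 ** 2 + sd2 ** 2) / n)
    return (m1 - m2) / se, 2 * n - 2


# Delayed posttest, CG vs UG, using the rounded means and SDs from the table.
t, df = independent_t_from_summary(3.9, 0.87, 2.7, 0.65, n=40)
print(f"t({df}) = {t:.2f}")  # t(78) = 6.99
```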
CG's mean score rose from 2.38 on the pretest to 3.13 on the immediate posttest
and 3.9 on the delayed posttest, and this improvement over time was statistically significant
(F(2,78)=103.77, p<.001). This is in stark contrast with UG, whose scores changed
non-significantly across the pretest (2.6), immediate posttest (2.9) and delayed posttest (2.7)
(F(1.71,66.63)=3.1, p=.059). The repeated-measures ANOVA showed a significant group by time interaction
(F(1.89,147.57)=40.43, p<.001), indicating meaningful group differences in writing quality
across the three measurement points. Further, it is noteworthy that the independent t-tests to
V. CONCLUSION
Given the increasing attention to the direct use of corpora for L2 learning, a number of
studies have reported a need for controlled tasks to gain the full benefits of learners' corpus
consultation (e.g., Ädel, 2010; Aston, 2001; Kennedy & Miceli, 2010). In response to this
need, this study explored the instructional benefits of controlled corpus consultation of
selected formulaic expressions in L2 writing. It presented an empirical investigation
comparing students' linguistic awareness of formulaic expressions and writing quality
between a controlled corpus consultation group (which began its corpus searches
with selected formulaic expressions) and an uncontrolled corpus consultation group (which
began its corpus searches with individual words). The findings of this study suggested
that the controlled group increased both the number of types and the frequency of formulaic
expressions in their L2 writing, indicating instructional benefits of the controlled task for
improving learners' awareness of formulaic expressions in L2. Qualitative investigation of the
use of clusters supported the improvement of the controlled group, as it used more diverse and
accurate word clusters, while the uncontrolled group showed less diversity and some erroneous
usage of clusters. In terms of writing quality, the controlled group also showed meaningful
improvement in the immediate and delayed posttests compared with the uncontrolled
group. Overall, the controlled group significantly outperformed the uncontrolled group,
indicating the instructional benefits of controlled corpus consultation of selected
formulaic expressions for improving linguistic awareness of formulaic expressions as well as L2
writing quality.
The significant group difference in this study is largely attributable to the different units of
search terms. The investigation suggested that the different search terms
created significantly different environments for the learners to develop their linguistic
awareness of formulaic expressions and L2 writing proficiency. When students start their
corpus consultation with single search words, they usually have to deal with a vast number of
concordance lines. In analyzing these concordances, students have to put a lot of effort into identifying
formulaic patterns because there are many possible hypotheses about the meanings and
usage of the search terms. To test these hypotheses, students have to modify their
search terms by adding or changing words, which requires significant analytical
and linguistic ability (e.g., Ädel, 2010; Kennedy & Miceli, 2010; Liu & Jiang, 2009). On the
other hand, when students start their corpus search with formulaic expressions as search
terms, they get fewer concordance lines, making interpretation of the data more manageable. With
a small number of possible hypotheses about the usage of the search terms, it can be relatively
easy for the students to test these hypotheses and refine their corpus search,
allowing more opportunities for learners to examine various forms and usages of formulaic
expressions.
The findings of this study suggest several pedagogical implications for corpus-based L2
writing instruction. Above all, they highlight the significance of a controlled approach in
learners' use of corpus data. As a number of studies have suggested (e.g., Ädel, 2010; Kennedy
& Miceli, 2010; Liu & Jiang, 2009; McEnery, Xiao, & Tono, 2006; Widdowson, 1998; 2000;
2004), the direct use of corpora for pedagogical purposes may bring about considerable
challenges. Despite the significant benefits of corpus searching, it seems clear that more
guidance and training are required for both instructors and students in their use of corpora as a
learning tool. The findings of this study echoed the significance of controlled corpus
consultation, inviting future studies to examine various ways of controlling the data and tasks of
corpus consultation for better pedagogical benefits. In addition, instructors who incorporate or
hope to incorporate corpora into their L2 instruction should keep in mind the strong need for
which are not readily identifiable through the current experimental design. As such, we cannot
easily dismiss the benefits of corpus consultation using single words as search terms, which
should be investigated through carefully designed longitudinal studies.
Notwithstanding these limitations, the findings of this study have significance for corpus-based L2 writing instruction. The study is meaningful in that it explored the instructional benefits of controlled corpus consultation of selected formulaic expressions. These benefits concern both students' linguistic awareness of formulaic expressions and their L2 writing proficiency. Only a limited number of studies have explored controlled corpus consultation tasks. It is therefore hoped that these findings will prompt further research into the various types and functions of controlled tasks in learners' corpus consultation. Building on these findings, future research might also explore the instructional value of formulaic expressions in learners' corpus consultation in order to maximize the benefits of corpus-based L2 writing instruction.
REFERENCES
Ädel, A. (2010). Using corpora to teach academic writing: Challenges for the direct approach. In M. Campoy-Cubillo, B. Belles-Fortuno, & M. Gea-Valor (Eds.), Corpus-based approaches to English language teaching (pp. 39-55). London & New York: Continuum.
Aston, G. (2001). Learning with corpora: An overview. In G. Aston (Ed.), Learning with corpora (pp. 7-45). Houston, TX: Athelstan.
Boers, F., Eyckmans, J., Kappel, J., Stengers, H., & Demecheleer, M. (2006). Formulaic sequences and
perceived oral proficiency: Putting a lexical approach to the test. Language Teaching Research,
10(3), 245-261.
Braun, S. (2005). From pedagogically relevant corpora to authentic language learning contents.
ReCALL, 17(1), 47-64.
Chambers, A. (2005). Integrating corpus consultation in language studies. Language Learning &
Technology, 9(2), 111-125.
Chambers, A., & O'Sullivan, I. (2004). Corpus consultation and advanced learners' writing skills in
French. ReCALL, 16(1), 158-172.
Cheng, W., Warren, M., & Xu, X. (2003). The language learner as language researcher: Putting corpus linguistics on the timetable. System, 31(2), 173-186.
Cho, H. (2014). The effects of corpus consultation of formulaic expressions on the improvement of automaticity in the
Kennedy, C., & Miceli, T. (2001). An evaluation of intermediate students' approaches to corpus
investigation. Language Learning & Technology, 5(3), 77-90.
Kennedy, C., & Miceli, T. (2010). Corpus-assisted creative writing: Introducing intermediate Italian
learners to a corpus as reference resource. Language Learning & Technology, 14(1), 28-44.
Lee, D., & Swales, J. (2006). A corpus-based EAP course for NNS doctoral students: Moving from
available specialized corpora to self-compiled corpora. English for Specific Purposes, 25(1), 56-75.
Liu, D., & Jiang, P. (2009). Using a corpus-based lexicogrammatical approach to grammar instruction in EFL and ESL contexts. The Modern Language Journal, 93(1), 61-78.
McEnery, T., Xiao, R., & Tono, Y. (2006). Corpus-based language studies. London: Routledge.
Nam, D. (2010). The effects of corpus-based language instruction on productive vocabulary
knowledge. Multimedia-Assisted Language Learning, 13(2), 155-176.
Oshima, A., & Hogue, A. (2006). Writing academic English. New York: Pearson Education.
O'Sullivan, I. (2007). Enhancing a process-oriented approach to literacy and language learning: The role
of corpus consultation literacy. ReCALL, 19(3), 269-286.
Pérez-Paredes, P., Sánchez-Tornel, M., Alcaraz Calero, J. M., & Jiménez, P. A. (2011). Tracking
learners' actual uses of corpora: Guided vs non-guided corpus consultation. Computer Assisted
Language Learning, 24(3), 233-253.
Scott, M. (2014). WordSmith tools manual. Retrieved May 12, 2016, from http://www.lexically.net/downloads/version6/HTML/index.html?single_words.htm
Simpson-Vlach, R., & Ellis, N. C. (2010). An academic formulas list: New methods in phraseology
research. Applied Linguistics, 31, 487-512.
Smart, J. (2014). The role of guided induction in paper-based data-driven learning. ReCALL, 26(2),
184-201.
Sun, Y.-C. (2007). Learner perceptions of a concordancing tool for academic writing. Computer Assisted
Language Learning, 20(4), 323-343.
Todd, R. W. (2001). Induction from self-selected concordances and self-correction. System, 29(1),
91-102.
Tribble, C., & Jones, G. (1990). Concordances in the classroom: A resource book for teachers. Harlow: Longman.
Vannestål, M., & Lindquist, H. (2007). Learning English grammar with a corpus: Experimenting with concordancing in a university grammar course. ReCALL, 19(3), 329-350.
Wayabroad Company. (2002). TOEFL writing (TWE) topics and model essays. Retrieved January 25, 2016, from https://www.wiziq.com/tutorial/671118-185-TOEFL-Writing-TWE-Topics-and-Model-Essays
Weigle, S. C. (2002). Assessing writing. Cambridge: Cambridge University Press.
Widdowson, H. G. (1998). Communication and community: The pragmatics of ESP. English for Specific
Purposes, 17(1), 13-14.
Widdowson, H. G. (2000). On the limitations of linguistics applied. Applied Linguistics, 21(1), 3-25.
APPENDIX A
(continued from the previous week)
  CG (formulaic expressions): When we consider that/ A deem B as the premier choice/
  UG (single words): well/ exactly/ likely/ also/ safely

Week 7
  CG (formulaic expressions): be interested in/ face the dilemma of (whether to A or to B)/ whether or not/ referred to as/ in response to/ the real world/ modern society/ It is important to/ It is also important to see that/ In summary/ In short
  UG (single words): case/ way/ point/ role/ effect/ size/ response/ choice/ type/ presence/ spite/ dilemma/ world

Week 8
  CG (formulaic expressions): it may be/ may not be/ there may be/ may neglect that/ this may explain why/ Some people might argue that/ can be used/ it can be/ we can see/ it does not/ Most people would agree that/ it would be/ it seems to be/ it seems that
  UG (single words): in/ on/ at/ by/ from/ with/ between/ to/ toward/ of/ for

Week 9
  CG (formulaic expressions): the development of/ the role of/ the size of the/ the importance of/ the effect of/ as a function of/ the use of/ the presence of/ different types of
  UG (single words): this/ there/ it/ we/ I/ these/ people/ the/ a/ not/ no
2. Writing Topics
Test               Topic
Pretest            The sale of human organs should be legalized.
                   Businesses should do anything they can to make a profit.
Posttest           People should not be allowed to smoke in public places and office buildings.
                   Face-to-face communication is better than other types of communication, such as letters, email or telephone calls.
Delayed posttest   High school students should be required to wear school uniforms.
                   Television has destroyed communication among friends and family.