http://journal.kamall.or.kr/wp-content/uploads/2016/10/Cho_19_3_01.pdf
http://www.kamall.or.kr
Multimedia-Assisted Language Learning

19(3) 11-36
An Investigation of Controlled Corpus Consultation of Selected Formulaic Expressions in L2 Writing*
Hyeyoung Cho (Cyber Hankuk University of Foreign Studies)
Cho, Hyeyoung. (2016). An investigation of controlled corpus consultation of selected formulaic
expressions in L2 writing. Multimedia-Assisted Language Learning, 19(3), 11-36.
With the increasing need for pedagogic mediation of corpus use in L2 writing instruction, this study
explored the potential of controlled corpus consultation of selected formulaic expressions as a form of
pedagogic mediation to improve learners' awareness of formulaic language and L2 writing quality. This study
compared the use of formulaic expressions and L2 writing quality between two groups, i.e., Controlled
Corpus Consultation Group (CG) and Uncontrolled Corpus Consultation Group (UG). CG used
formulaic expressions (such as extent to which) in their corpus search dealing with limited concordances on
the selected expressions, while UG consulted the corpus data on the same words that comprised the
formulaic expressions given to CG, but as individual words (such as extent, to, which). Types, frequency, and
usage of formulaic expressions as well as overall writing quality of the two groups were subjected to
statistical analysis in order to identify significant group differences. Results suggested that CG
outperformed UG in terms of frequency and types of formulaic expressions as well as overall quality of
writing. Findings of this study suggested that the controlled corpus consultation on selected formulaic
expressions improved learners' general awareness of formulaic language and writing proficiency in L2.
Key words: controlled corpus consultation, formulaic expressions, cluster analysis, L2 writing
doi: 10.15702/mall.2016.19.3.11
I. INTRODUCTION
Corpora have been used as valuable learning tools as they provide realistic, rich, illustrative,
and up-to-date sources of language (Braun, 2005). In particular, direct use of corpora for
learners' analysis has brought strong benefits to L2 learning (e.g., Chambers & O'Sullivan, 2004;
Cresswell, 2007; Flowerdew, 2012; Geluso, 2013; Kennedy & Miceli, 2001; Lee & Swales, 2006;
Nam, 2010; O'Sullivan, 2007; Todd, 2001; Yoon, 2008; Yoon & Hirvela, 2004). These studies
have noted that the discovery process of corpus analysis improved learners' awareness of
lexical patterns and collocations that are typically used in specific contexts and genres.
Moreover, learners' corpus consultation can foster inductive learning through analysis of a large
number of language examples, and it can improve learning autonomy by allowing students to
engage in independent searches of corpora.
* This study used part of the data from an unpublished doctoral dissertation by Cho (2014).
However, it should be noted that the benefits of learners' corpus consultation come with
significant challenges. As Conrad (2005) pointed out, learners are required to have a lot of
technological and research skills to perform corpus consultation. They need to have substantial
knowledge to use a computer, to handle an overwhelming amount of data and to generalize and
evaluate the findings. In the same vein, Ädel (2010) reported major challenges of corpus-based
L2 writing instruction. She found that in dealing with the corpus data, students were easily
overwhelmed by the significant amount of data and felt lost in the maze of a corpus. In
particular, the students experienced difficulty in choosing concordances to analyze and identify
linguistic patterns from the corpus. In addition, the students tended to pay undue attention to
the word and the phrase level, making it difficult to connect surface forms to meaning.
The difficulties of learners' corpus consultation have been observed by a number of
empirical research studies (e.g., Chambers, 2005; Chambers & O'Sullivan, 2004; Cheng, Warren,
& Xun-Feng, 2003; Cresswell, 2007; Pérez-Paredes, Sánchez-Tornel, Alcaraz Calero, &
Jiménez, 2011; Sun, 2007; Vännestal & Lindquist, 2007). The studies showed that some
participants did not find the concordances helpful in achieving particular learning purposes. For
instance, in an examination of the benefits of corpus-based lexicogrammar instruction in an
English as a foreign language class, Liu and Jiang (2009) found that many participants felt
overwhelmed by the extremely large number of search results. They suggested that the problem
was compounded when search results did not seem to be relevant to the students' particular
study questions.
Along a similar line, Vännestal and Lindquist (2007) found that students experienced many
difficulties when learning grammatical rules through corpus-based problem solving tasks and
peer teaching activities, and a student even questioned the usefulness of corpora in language
learning. The authors noted that a large amount of introduction and support was needed in
learners' corpus analysis. Similarly, Kennedy and Miceli (2010) found that students experienced
significant difficulties in corpus consultation, and the authors highlighted the importance of
extensive practice and training for learners to formulate proper search terms and to interpret
concordance examples correctly.
Given the challenges of learners' corpus consultation and the need for extensive training
in corpus use, it seems urgent to develop a more controlled approach to the use of corpus data
in L2 learning. As one way of developing controlled corpus use in L2 writing instruction,
this study focused on the value of formulaic expressions as search terms for corpus consultation.
Unlike the corpus query starting from individual search words (such as extent, which), the corpus
consultation starting from selected formulaic expressions (such as extent to which) would provide
more controlled and limited search results to the learners. The purpose of this study is to
identify the instructional effects of the controlled corpus consultation of selected formulaic
expressions to improve learners’ general linguistic awareness in L2 and their L2 writing
proficiency.
II. CONTROLLED APPROACH IN CORPUS CONSULTATION
In the face of the difficulties of the learners' corpus consultation, a number of researchers
have pointed out the need for a more controlled approach and suggested providing pedagogic
mediation to make the corpus data pedagogically appropriate for students (e.g., McEnery, Xiao,
& Tono, 2006; Widdowson, 1998; 2000; 2004). Aston (2001) noted that guided and controlled
corpus searching is needed for effective learning even for higher-level students. In particular,
the author suggested that the difficulties that students experienced in the process of corpus
analysis can be reduced by controlling the corpus data and corpus consultation tasks.
Instructors can usually control the corpus data by pre-selecting and pre-editing the data to
make the samples of language suitable for learning objectives and learners' proficiency levels.
As exemplified by Tribble and Jones (1990), Hill (2000), and Yoon and Jo (2014), the use of
controlled corpus data can be actively directed by instructors who manipulate the corpora to
serve the learning needs of students. In addition to using the pre-selected/pre-edited corpus
data, the instructors gave explicit instructions on the corpus analysis and detailed explanations
about formulaic expressions to assist the learners in the corpus consultation process.
The use of controlled tasks can reduce learners' difficulties in corpus consultation. Aston
(2001) suggested several examples of controlled corpus consultation: "choosing tasks which do
not pose undue problems of precision and recall in interrogating the corpus; choosing tasks
which require little manipulation of the output in order to categorize and sort citations, remove
irrelevant citations, etc.; choosing tasks which do not require all the data to be classified and
interpreted; choosing tasks which require relatively superficial interpretation of the data;
choosing tasks which allow learners to help and support each other; choosing tasks whose more
complex aspects can be delegated to more able students" (pp. 43-44). In particular, having
learners collaborate has been reported to assist the corpus consultation process (e.g.,
Flowerdew, 2008; Gavioli & Aston, 2001; O'Sullivan, 2007). For example, Gavioli and Aston
(2001) noted that collaborative corpus consultation allowed for more generalizable (or
comprehensible) interpretation of the corpus data. In addition, Flowerdew (2008) found that
more proficient students were able to offer their insights and interpretations on the corpus data,
assisting the weaker students to gradually develop independence in the learning process.
Compared to collaborative corpus consultation, the guided inductive approach puts more
focus on the role of instructors in controlling the process of corpus consultation. Referring to
the definition put forth by Flowerdew (2009), Smart (2014) noted that the guided inductive
approach in learners' corpus consultation is the process whereby teachers assist learners in the
language discovery process through extensive guidance and scaffolding. Flowerdew (2012)
showed an example of the guided inductive approach in the use of an online corpus to teach
formulaic expressions. In learning frequent and appropriate usage of formulaic expressions in
business letters, the students engaged in hands-on consultation of a corpus, the process of
which was aided by the instructor's feedback and suggestions. Flowerdew noted that the guided
inductive approach assisted the learners in developing linguistic and pragmatic awareness of the
genre of business letters.
In order to further advance the controlled tasks of corpus consultation, this study aimed to
examine potential benefits of a type of controlled task - using selected formulaic expressions as
search terms for corpus analysis - to improve learners' linguistic awareness of formulaic
language and L2 writing proficiency. It should be noted that the use of single words as search
terms for corpus consultation has been customary in learners' corpus consultation (e.g.,
Kennedy & Miceli, 2001; Lee & Swales, 2006; O'Sullivan, 2007; Todd, 2001; Yoon, 2008; Yoon
& Hirvela, 2004). However, as studies have suggested (e.g., Ädel, 2010; Kennedy & Miceli,
2010; Liu & Jiang, 2009), students felt seriously overwhelmed by the large amount of corpus
data that may appear irrelevant to the students' particular objectives of a corpus query. In
addition, learners may find it difficult to identify typical formulaic patterns and build
hypotheses about their usage from the data. Further, it would be equally challenging for
learners to refine their corpus search through evaluation of their findings.
On the other hand, the controlled task of using selected formulaic expressions as search
terms for corpus consultation would narrow down the corpus search results to a more
manageable level, requiring little manipulation of the data for the learners to analyze. Dealing
with the controlled data, the learners should be able to easily identify typical forms of formulaic
expressions, which would lead the learners to refine their corpus search. In this process of
controlled corpus analysis, it is expected that the learners would improve their L2 awareness of
formulaic language as well as their L2 writing proficiency.
In order to investigate the benefits of controlled corpus consultation of selected formulaic
expressions, this study examined the development of learners' awareness of formulaic language
in L2 and overall quality of L2 writing after eight weeks of corpus consultation activities. To
that end, this study utilized two groups of students: a group who engaged in controlled corpus
consultation by searching selected formulaic expressions (Controlled Corpus Consultation
Group, CG) and a group who searched the exact same words that were given to CG but as
single words (Uncontrolled Corpus Consultation Group, UG). In order to examine the
differences in learners' use of formulaic language and L2 writing quality, this study compared
the types, frequencies, and usage of formulaic expressions as well as L2 writing quality between
the two groups. This study addresses the following three research questions:
1) Do CG and UG show differences in the frequencies and types of formulaic expressions in L2 writing?
2) Do CG and UG show differences in the usage of formulaic expressions in L2 writing?
3) Do CG and UG show differences in the improvement of overall quality of L2 writing?
III. METHOD
1. Participants
This study used two English writing classes at a Korean university. Each class consisted of
40 students of different majors such as English, Chinese, Japanese, and Business
Administration. On the first day of the experiment, the researcher surveyed the participants
about their gender, age, majors, and years of English education. After the survey, the students
were asked to visit a website (http://www.typeonline.co.uk/typingspeed.php) to measure their
English typing speed. Since the participants took writing tests on computer, it was important to
test the homogeneity of the two groups’ typing speed. After the measurement of the typing
speed, the participants took a pretest, consisting of two English argumentative essay writing
activities. The description of the participants is provided in Table 1.
[TABLE 1] Description of Participants
            Gender   Age           Years of      Typing Speed   Writing Quality
                                   Education     (WPM)          (scale of 0 to 6)
Group  N    M   F    Mean   SD     Mean   SD     Mean   SD      Mean   SD
CG     40   6   34   22.45  1.64   12.65  1.81   48.20  10.78   2.38   .84
UG     40   4   36   22.40  1.14   12.40  1.67   47.05  14.49   2.60   .55
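As an illustration of how group homogeneity could be checked from the summary statistics in Table 1, the following Python sketch computes a pooled-variance independent-samples t statistic for typing speed. The paper does not report this statistic; the function name `pooled_t` and the choice of the pooled-variance form are assumptions made for illustration only.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic from summary statistics.

    Assumes equal population variances (pooled-variance form).
    Returns the t statistic and its degrees of freedom.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Typing speed (WPM) as listed in Table 1: mean, SD, and N for CG and UG.
t, df = pooled_t(48.20, 10.78, 40, 47.05, 14.49, 40)
print(f"t({df}) = {t:.2f}")
```

With 78 degrees of freedom, the two-tailed critical value at the .05 level is roughly 1.99, so a t statistic well below that value would be consistent with the two groups having comparable typing speed.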
2. Data Collection
The experiment started with an introductory class of corpus-based writing activities for
both groups. The participants learned basic concepts of corpus linguistics and did hands-on
activities to analyze corpus data. After the introductory class, the students took a pretest of
writing two argumentative essays on different topics. Different topics were used to reduce the
direct effects of the topic on the results of writing scores. Each writing activity consisted of two
stages: 10 minutes for brainstorming and 30 minutes for writing and revising.
From the second to the ninth week of the semester, both groups engaged in L2 writing class
activities using the textbook, "Writing Academic English" by Oshima and Hogue (2006). In
most cases, the students received about 15-minute lectures on theoretical issues about L2
writing using the textbook and engaged in relevant textbook activities and writing tasks. After
60 minutes of textbook-based activities, approximately 30 minutes were allotted for learners'
corpus consultation using the Corpus of Contemporary American English (COCA). CG
students were given a list of formulaic expressions (e.g., extent to which, when it comes to) in each
class, selected from the Academic Formulas List (AFL) (Simpson-Vlach & Ellis, 2010). UG
students were given a list of the individual words that make up the formulaic expressions
given to CG (e.g., extent, to, which, when, come). In addition to the AFL, this study included
several formulaic expressions frequently used in argumentative essays, which were chosen from
"TOEFL writing (TWE) topics and model essays" (Wayabroad Company, 2002), a collection of
model essays and writing templates of argumentative essays. The inclusion was necessary
because the text type of each test was an argumentative essay. Given the various majors of the
participants, argumentative essays seemed to be a more appropriate text type than academic
writing for assessment of instructional benefits. The lists of the search terms for corpus
consultation of the two groups are provided in Appendix 1.
Prior to corpus consultation, the students were given a short lecture about the search terms.
Due to the different units of search terms, the lecture for CG was mainly concerned with
semantic and functional usage of the formulaic expressions (e.g., in terms of, the fact that the), while
that of UG was more focused on literal meanings and grammatical issues of the search terms
(e.g., in, of, term, fact, that). To start the corpus consultation, CG students typed the given
formulaic expressions, while UG students searched for them as individual words in the
concordancer of COCA. During the corpus consultation, the researcher provided weekly
worksheets and asked students to write down words, phrases, or sentences that they found
useful from the concordances for their future L2 English writing. In analyzing the data, the
students were encouraged to ask questions, and the instructor provided individual assistance
when the students asked for help.
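The contrast between the two search conditions can be sketched with a toy keyword-in-context (KWIC) search. This is not COCA's interface or query syntax; the `concordance` function and the five sample sentences are invented for illustration, and the sketch only shows that a phrase query returns a subset of the lines returned by a query on one of its component words.

```python
import re


def concordance(sentences, query, width=30):
    """Return keyword-in-context lines for every whole-word match of query."""
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    lines = []
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            left = sentence[max(0, match.start() - width):match.start()]
            right = sentence[match.end():match.end() + width]
            lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


# Invented sample sentences; they are NOT taken from COCA.
SAMPLE = [
    "It is unclear to what extent the policy helped.",
    "We measured the extent to which students revised their drafts.",
    "The extent of the damage was surprising.",
    "Results depend on the extent to which the task is controlled.",
    "To a large extent, the findings agree.",
]

single_word_hits = concordance(SAMPLE, "extent")        # UG-style query
phrase_hits = concordance(SAMPLE, "extent to which")    # CG-style query

print(len(single_word_hits), "lines for the single word")
print(len(phrase_hits), "lines for the formulaic expression")
for line in phrase_hits:
    print(line)
```

In a real corpus the gap between the two result sets would be far larger, which is the sense in which the phrase query yields more limited concordances.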
In the 10th week, CG and UG took the immediate posttest that involved writing two
argumentative essays on different topics to evaluate the instructional benefits of the two
groups. As in the pretest, the students were given 10 minutes of brainstorming, followed by 30
minutes of writing for each topic. From weeks 11 to 13, there was no corpus consultation
activity. Instead, both groups of students took lectures on stylistic and rhetorical issues about
English writing and engaged in writing tasks and peer feedback activities. In the 14th week, the
delayed posttest was administered to measure the enduring effects of corpus consultation on
the two groups. As in the pretest and the immediate posttest, the delayed posttest included
writing two argumentative essays. Each essay writing process started with 10 minutes of
brainstorming, followed by 30 minutes of writing. The writing prompts for the writing tests are
given in Appendix 2.
The overall schedule of the data collection is presented in Table 2.
[TABLE 2] Timetable of the Study
Week    Instruction                                                         Tests
1       Introductory session to corpus analysis                             Pretest
2-9     Textbook instruction; 30 minutes of corpus consultation
10                                                                          Immediate posttest
11-13   Lectures on stylistic and rhetorical issues & writing activities;
        no corpus consultation
14                                                                          Delayed posttest
3. Data Analysis
The data collected through the experiment was analyzed in terms of the quantitative (types
and frequencies of formulaic expressions and writing quality) and the qualitative (usage of
formulaic expressions) differences in the use of formulaic expressions between the two groups.
In order to analyze the different types and frequencies of formulaic expressions, this study
performed cluster analysis using WordSmith Tools 5.0. Clusters refer to "words which are found
repeatedly together in each others' company, in sequence. They represent a tighter relationship
than collocates, more like multi-word units or groups or phrases" (Scott, 2014, para. 1). Given
the comprehensive definition of cluster, this study utilized the results of cluster analysis to
investigate the different use of formulaic expressions in the two groups. The cluster analysis
created wordlists of clusters of two to five words for the pretest, the immediate posttest, and
the delayed posttest. The minimum frequency cut-off was five.
Based on the wordlists of clusters, this study calculated the total frequency of clusters as
well as the number of cluster types and performed chi-square tests to investigate differences
between CG and UG. In order to investigate the instructional effects of the controlled corpus
consultation for improving general linguistic awareness of formulaic language, this study
examined the use of uninstructed clusters as well as instructed ones in the CG's writing and
performed chi-square tests to assess statistical significance.
In addition, this study examined the different usage of formulaic expressions between the
two groups by analyzing the clusters exclusively used by each group. Further, following the
classification of word clusters by Hyland (2008), this study classified the clusters into
research-oriented, text-oriented, and participant-oriented types and examined the differences in
the usage of formulaic expressions between the two groups.

Finally, the students' essays were rated by two professional instructors with more
than 10 years of TESOL experience. They were native English speakers born and raised in the
US and UK respectively. They were asked to give an appropriate grade to an essay on a scale
from 0 (very poor and hard to understand) to 6 (very effectively organized and well-written)
according to the holistic rating rubric developed by ETS (Weigle, 2002). To ensure the
reliability of the scoring, the researcher gave raters samples of writing graded according to the
rubric before the actual rating. The raters worked individually and exchanged their grading results
for cross-examination. When they gave different grades to the same essay, they discussed
the discrepancy with the researcher and chose one grade according to the guidelines. The scores of students'
writing were subjected to ANOVA and independent t-tests to assess the significance of changes
in the writing quality and the group by time interaction.
IV. RESULTS AND DISCUSSION
1. Types and Frequency of Clusters
Results of types and frequencies of clusters suggested statistically significant differences
between CG and UG. The results of 2-word clusters are reported in Table 3. In terms of types
of 2-word clusters, CG used 214 types of 2-word clusters in the pretest, which increased to 287
in the immediate posttest and 306 in the delayed posttest. Similarly, UG showed an increasing
number of types of 2-word clusters from 198 in the pretest to 225 in the immediate posttest
and 259 in the delayed posttest. The difference between the two groups on the pretest was not
statistically significant (χ²(1)=0.62, p=.431), indicating homogeneity of the two groups.
However, the group difference on the immediate and the delayed posttests became significant
(χ²(1)=7.51, p=.006* and χ²(1)=3.91, p=.048* respectively), suggesting strong instructional
effects on CG's use of varied types of 2-word clusters in L2 writing.
[TABLE 3] Types and Frequencies of 2-Word Clusters in CG and UG
             Type                                      Frequency
Group        Pretest       Immediate     Delayed       Pretest       Immediate     Delayed
                           posttest      posttest                    posttest      posttest
CG           214           287           306           2305          3365          3199
UG           198           225           259           2173          2645          2638
Chi-square   χ²(1)=0.62,   χ²(1)=7.51,   χ²(1)=3.91,   χ²(1)=3.89,   χ²(1)=86.46,  χ²(1)=53.92,
             p=.431        p=.006*       p=.048*       p=.049*       p<.001*       p<.001*
Note. An asterisk indicates that the chi-square value is statistically significant.
When it comes to the total frequency of 2-word clusters, CG used 2-word clusters more
frequently (2305) than UG (2173) on the pretest, and the group difference was marginally
significant (χ²(1)=3.89, p=.049*). The weak significance in the pretest is in contrast to the
strong results in posttests, indicating noticeable differences in instructional benefits between the
two groups. CG used 2-word clusters 3365 times in the immediate posttest, while UG used
them 2645 times. On the delayed posttest, CG used 2-word clusters 3199 times, while UG used
them 2638 times. The group differences were statistically significant both in the immediate
(χ²(1)=86.46, p<.001*) and the delayed posttest (χ²(1)=53.92, p<.001*), indicating the meaningful
instructional benefits of controlled corpus consultation of formulaic expressions. The results of
statistical examination on the types and frequencies of 2-word clusters in CG and UG showed a
significant group difference in the immediate and delayed posttests, suggesting that CG
increased as well as diversified their use of 2-word clusters.
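The paper does not spell out how the chi-square statistic was computed. As a hedged illustration, the sketch below assumes a one-way chi-square test in which the two group counts are compared against equal expected counts; under that assumption the values reported for the types of 2-word clusters in Table 3 are reproduced to two decimals, while the immediate-posttest frequency value comes out slightly lower than the reported 86.46. The function names are hypothetical.

```python
import math


def chi_square_equal_expected(counts):
    """One-way chi-square statistic, assuming equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)


def p_value_df1(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


# Counts of 2-word cluster types (CG, UG) as listed in Table 3.
table3_types = {
    "pretest": (214, 198),
    "immediate posttest": (287, 225),
    "delayed posttest": (306, 259),
}

for period, counts in table3_types.items():
    statistic = chi_square_equal_expected(counts)
    print(f"{period}: chi2(1) = {statistic:.2f}, p = {p_value_df1(statistic):.3f}")
```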
The examination of 3-word clusters showed results similar to those of 2-word clusters,
providing clear evidence of the instructional benefits for CG in their use of formulaic
expressions, as shown in Table 4.
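[TABLE 4] Types and Frequencies of 3-Word Clusters in CG and UG
             Type                                      Frequency
Group        Pretest       Immediate     Delayed       Pretest       Immediate     Delayed
                           posttest      posttest                    posttest      posttest
CG           51            84            68            588           1047          713
UG           46            50            52            549           715           526
Chi-square   χ²(1)=.26,    χ²(1)=8.63,   χ²(1)=2.13,   χ²(1)=1.34,   χ²(1)=62.56,  χ²(1)=28.22,
             p=.612        p=.003*       p=.144        p=.247        p<.001*       p<.001*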

Table 4 shows that the types of 3-word clusters of CG increased from 51 in the pretest to
84 in the immediate posttest and 68 in the delayed posttest. UG also used more types of 3-word
clusters in the immediate (50) and the delayed posttest (52) compared to the pretest (46), but
the improvement seemed to be more prominent in CG than in UG. The outperformance of
CG is statistically evident with significant group difference in the immediate posttest
(χ²(1)=8.63, p=.003*), which is in contrast with the non-significant difference in the pretest
(χ²(1)=.26, p=.612).
The frequency of 3-word clusters also showed significant improvement in CG's writing. CG
used 3-word clusters 588 times, while UG used them 549 times in the pretest. The difference
between the two groups is not significant (χ²(1)=1.34, p=.247), suggesting no group difference
in the pretest. However, the two groups showed a significant difference in the immediate
(χ²(1)=62.56, p<.001*) and the delayed posttest (χ²(1)=28.22, p<.001*). CG almost doubled the
frequency of 3-word clusters in the immediate posttest (1047) from that in the pretest (588),
while the increase of UG was relatively moderate (549 in the pretest and 715 in the immediate posttest).
More notably, CG's frequency in the delayed posttest (713) was higher than that in the
pretest (588), while UG reduced the frequency in the delayed posttest (526) from the pretest
(549). The results on the use of 3-word clusters suggested significant outperformance of CG
both in types and frequencies of 3-word clusters, highlighting the instructional benefits for CG
in its use of formulaic expressions in L2 writing.
Results of 4-word clusters, as shown in Table 5, suggested group differences similar to those
reported in the investigation of 2- and 3-word clusters. Although the types of 4-word clusters
showed statistically non-significant group differences, presumably due to the low frequency, it is
noteworthy that CG increased the types of 4-word clusters from 24 in the pretest to 36 and 25
in the immediate and delayed posttest respectively, while UG reduced the types from 21 in the
pretest to 19 in the delayed posttest. The investigation of the frequency of 4-word clusters
showed much clearer findings on the differences between the two groups with statistical
significance. That is, the group difference was not significant in the pretest (χ²(1)=1.41,
p=.235), but it became significant in the immediate (χ²(1)=12.24, p<.001*) and the delayed
posttest (χ²(1)=9.76, p=.002*), indicating meaningful differences in the frequency of 4-word
clusters between the two groups.
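[TABLE 5] Types and Frequencies of 4-Word Clusters in CG and UG
             Type                                      Frequency
Group        Pretest       Immediate     Delayed       Pretest       Immediate     Delayed
                           posttest      posttest                    posttest      posttest
CG           24            36            25            313           485           286
UG           21            27            19            284           382           216
Chi-square   χ²(1)=.2,     χ²(1)=1.29,   χ²(1)=.82,    χ²(1)=1.41,   χ²(1)=12.24,  χ²(1)=9.76,
             p=.655        p=.257        p=.366        p=.235        p<.001*       p=.002*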
The frequency of 5-word clusters also showed meaningful group differences in immediate
and delayed posttests (Table 6). Despite non-significant results in the types of 5-word clusters
(presumably due to low numbers), the investigation of the frequency showed that the group
difference was statistically significant in the immediate (χ²(1)=8.61, p=.003*) and delayed
posttest (χ²(1)=8.15, p=.004*). This finding is more meaningful considering the non-significant
group difference in the pretest (χ²(1)=2.2, p=.138), indicating a significant difference in the
instructional effects between the two groups.
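[TABLE 6] Types and Frequencies of 5-Word Clusters in CG and UG
             Type                                      Frequency
Group        Pretest       Immediate     Delayed       Pretest       Immediate     Delayed
                           posttest      posttest                    posttest      posttest
CG           19            28            17            206           270           185
UG           16            18            13            177           206           134
Chi-square   χ²(1)=.26,    χ²(1)=2.17,   χ²(1)=.53,    χ²(1)=2.2,    χ²(1)=8.61,   χ²(1)=8.15,
             p=.612        p=.14         p=.465        p=.138        p=.003*       p=.004*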
In order to further examine the instructional effects on CG in their use of formulaic
expressions, this study identified the types of instructed and uninstructed clusters in CG's
writing as shown in Table 7. The total number of types of instructed and uninstructed clusters
showed significant improvement over three testing periods from 308 types in the pretest to 435 and
416 types in the immediate and the delayed posttest respectively (χ²(2)=24.29, p<.001*). In
terms of instructed clusters, CG students used 11 clusters in the pretest, but qualitative
examination showed that the clusters were highly common and typical expressions in English
writing such as it can be and such as. More importantly, CG showed an increase in the number of
types of instructed clusters after the corpus consultation activities (26 and 24 in the immediate
and delayed posttest respectively), indicating instructional benefits of the controlled corpus
consultation to improve the use of selected formulaic expressions in L2 writing (χ²(2)=6.53,
p=.038*). In addition, the use of uninstructed clusters showed more noticeable development
from the pretest (297) to the immediate (409) and the delayed posttest (392), suggesting extensive
effects of corpus consultation activities to diversify the use of uninstructed word clusters
(χ²(2)=19.91, p<.001*). The results seemed to indicate that the controlled corpus consultation on
selected formulaic expressions allowed students to investigate not only instructed but also
uninstructed expressions, heightening their general awareness of formulaic expressions in L2.
[TABLE 7] Types of Instructed and Uninstructed Clusters in CG's Writing
Type of clusters   Pretest   Immediate posttest   Delayed posttest   Chi-square results
Instructed         11        26                   24                 χ²(2)=6.53, p=.038*
Uninstructed       297       409                  392                χ²(2)=19.91, p<.001*
Total              308       435                  416                χ²(2)=24.29, p<.001*
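The comparison across the three testing periods in Table 7 can be sketched in the same way. The code below assumes, as an illustration rather than a description of the author's procedure, a one-way chi-square test against equal expected counts with two degrees of freedom, for which the upper-tail p-value has the closed form exp(-x/2). Under this assumption the computed statistics fall within about 0.01 of the values in Table 7. The function names are hypothetical.

```python
import math


def chi_square_equal_expected(counts):
    """One-way chi-square statistic, assuming equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)


def p_value_df2(statistic):
    """Upper-tail p-value of a chi-square statistic with 2 degrees of freedom."""
    return math.exp(-statistic / 2)


# Cluster types in CG's writing (pretest, immediate, delayed) from Table 7.
table7 = {
    "Instructed": (11, 26, 24),
    "Uninstructed": (297, 409, 392),
    "Total": (308, 435, 416),
}

for row, counts in table7.items():
    statistic = chi_square_equal_expected(counts)
    print(f"{row}: chi2(2) = {statistic:.2f}, p = {p_value_df2(statistic):.3f}")
```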
Qualitative investigation of the use of instructed and uninstructed clusters in the two
groups' writing suggested that CG was more successful than UG in developing general awareness
of formulaic expressions through corpus search. For instance, the formulaic
use of sense, or sense-clusters, showed notable group differences after eight weeks of corpus
consultation tasks. As shown in Table 8, there was no word cluster including the term
sense on the pretest of either group. However, after CG did the corpus consultation on the term
in the sense that, the students seemed to develop their awareness of the formulaic use of sense as
they varied the use of sense-clusters such as a sense of, in this sense and a sense of belonging in the
immediate and the delayed posttest. The diverse use of sense-clusters by CG is in stark contrast
with UG, which showed only one type of cluster (sense of) in the delayed posttest despite their
corpus search on sense.
[TABLE 8] Use of Sense-Clusters by CG and UG
Group   Pretest   Immediate posttest          Delayed posttest
CG      -         in this sense, this sense   sense of, a sense, a sense of, sense of belonging,
                                              in this sense, this sense, a sense of belonging
UG      -         -                           sense of
The different development of the awareness of formulaic expressions between the two
groups seemed to be attributable to the different units of search terms in the corpus
consultation. During the corpus consultation, the researcher witnessed many instances in
which CG students gradually modified their search terms (e.g., in the sense that) to shorter and
simpler ones (e.g., in the sense, the (a) sense, and sense), presumably in an attempt to refine their
corpus search. Through this process, the students moved from dealing with simple and limited
concordances to analyzing more diverse and complicated concordances including various
semantic and textual usages of the search terms. Despite the increasing difficulty of the task, the
process seemed to be manageable as the corpus analysis was scaffolded by the findings from
prior corpus searches. Through this gradual refinement of corpus consultation, the CG students
seemed to be able to improve their awareness of various uninstructed formulaic expressions.
However, UG students, who started their corpus consultation with the single word sense,
seemed to find it more difficult to develop their corpus analysis by modifying search
terms as CG did. As shown in Figure 1, UG students had to deal with various usages and
functions of sense from the first corpus search results, such as get a sense of, make sense, could sense,
my sense of, and common sense.
[Figure 1] Examples of Concordances of Sense
The concordances including various examples of the use of sense may have seemed too
arbitrary for the students to notice formulaic patterns based on semantic and functional
consideration of the search term. In order to advance their corpus search, UG students had to
choose one or two clusters based on the clusters’ pedagogical value, a process that
required substantial linguistic and analytical ability. In this effortful process, UG seemed to be
significantly disadvantaged in developing the same level of awareness of formulaic expressions
as CG did.
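The contrast between the two units of search can be sketched with a small keyword-in-context (KWIC) routine. This is only an illustration under stated assumptions: the function kwic and the six-sentence mini-corpus are hypothetical and are not the concordancer or the corpus used in this study. The sketch merely shows that a single-word query returns lines with heterogeneous usages, whereas a formulaic query returns fewer and more homogeneous lines.

```python
import re

def kwic(text, query, width=30):
    """Return keyword-in-context lines for every whole-word match of `query`."""
    # The query may be a single word or a multi-word expression; case is ignored.
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

# Hypothetical mini-corpus, invented for illustration only.
corpus = (
    "It is hard to get a sense of the scale. "
    "That does not make sense to me. "
    "She could sense the tension in the room. "
    "The rule is fair in the sense that it applies to everyone. "
    "Use your common sense. "
    "The plan is realistic in the sense that it needs no new funding."
)

single_word_hits = kwic(corpus, "sense")         # noun, verb, and idiom uses mixed together
phrase_hits = kwic(corpus, "in the sense that")  # one formulaic use only

for line in phrase_hits:
    print(line)
```

In this toy example the single-word query mixes nominal, verbal, and idiomatic uses of sense, while the phrase query isolates one formulaic pattern, which mirrors the difference in task difficulty described above.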
All in all, the investigation of the types, frequencies, and use of instructed and
uninstructed clusters suggested meaningful achievement of CG in developing general awareness
of formulaic language in L2. In terms of the frequencies of 2-, 3-, 4-, and 5-word clusters and
the types of 2- and 3-word clusters, CG significantly outperformed
UG in the immediate and delayed posttests. In addition, the investigation of the use of
instructed and uninstructed word clusters supported the finding, as CG students increased and
diversified both types of clusters after the corpus consultation activities. The findings seemed to
suggest that the controlled corpus consultation created a more suitable environment for
learners to enhance their general awareness of formulaic expressions than the uncontrolled task
did.
2. Usage of Word Clusters in CG and UG
Results of the qualitative investigation of the usage of word clusters in the writing by CG and
UG provided a clearer picture of the significant group differences in learners'
awareness of formulaic expressions. In order to choose word clusters for qualitative
examination, this study identified clusters that were exclusively used by each group in the
immediate and delayed posttests. For accurate evaluation of the instructional effects, this study
excluded clusters that were parts of the writing prompts from the analysis because it is difficult
to determine whether or not the use of the formulaic expressions in the writing prompts was a
result of the improved awareness of formulaic language through learners' corpus consultation.
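The selection procedure described above amounts to a set difference: clusters found in one group's writing, minus those found in the other group's writing, minus those contained in the writing prompts. The following is a minimal sketch under stated assumptions. The function names, the regular-expression tokenizer, and the toy learner sentences are hypothetical, and no frequency threshold is applied, so it should not be read as the cluster-extraction settings actually used in this study. Only the prompt sentence is taken from the writing topics in Appendix A.

```python
import re

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (apostrophes allowed)."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    """Return the set of contiguous n-word sequences in a token list."""
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def clusters(texts, sizes=(3, 4)):
    """Collect all n-word clusters of the given sizes from a list of texts."""
    found = set()
    for text in texts:
        tokens = tokenize(text)
        for n in sizes:
            found |= ngrams(tokens, n)
    return found

def exclusive_clusters(own_texts, other_texts, prompts, sizes=(3, 4)):
    """Clusters used by one group only, excluding clusters that occur in the prompts."""
    return clusters(own_texts, sizes) - clusters(other_texts, sizes) - clusters(prompts, sizes)

# Toy learner sentences, invented for illustration only.
cg_texts = [
    "In this sense uniforms give a sense of belonging.",
    "Students should be required to follow rules.",
]
ug_texts = ["Uniforms is one of the rules and give a sense."]
# Writing prompt taken from Appendix A (delayed posttest).
prompts = ["High school students should be required to wear school uniforms."]

cg_only = exclusive_clusters(cg_texts, ug_texts, prompts, sizes=(3,))
ug_only = exclusive_clusters(ug_texts, cg_texts, prompts, sizes=(3,))
print(sorted(cg_only))
```

In the toy data, give a sense is dropped because both groups use it, and should be required is dropped because it is part of the prompt.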
The qualitative investigation of the usage of word clusters revealed two major findings.
First, UG used some erroneous expressions that were not found in CG's writing. For instance,
official buildings, wear same clothes and many of were erroneous clusters present only in UG's writing.
Official buildings in UG's writing was classified as an erroneous expression because qualitative
analysis of UG's writing suggested that it was mistakenly used for office buildings in the writing
prompt. In addition, wear same clothes and many of are grammatically wrong due to omission of
the article the and incorrect use of the preposition of, respectively. An illustration of the erroneous use of
many of is given in Figure 2. Figure 2 demonstrates that the use of many of is not a unique
mistake by a single writer, but a systematic error made by multiple writers in UG. Given the
distinctive and systematic errors by UG writers in their use of word clusters, it seems that
UG was less aware of appropriate forms of formulaic language than CG, which did not use
such erroneous expressions in their writing.
[Figure 2] Erroneous Use of Many of by UG
Another significant group difference was found in terms of the macrofunctions of the word
clusters. For objective investigation of the different use of macrofunctions, this study adopted the
classification of word clusters by Hyland (2008), who suggested three linguistic macrofunctions
of word clusters based on the study of Halliday (1994): research-oriented, text-oriented, and
participant-oriented clusters. Hyland (2008) noted that research-oriented clusters help writers to
structure their activities and experiences of the real world, concerning location, procedure,
quantification, description, and topics of writing. Text-oriented clusters are concerned with the
organization of the text, such as transition signals, resultative signals, structuring signals, and
framing signals. Finally, participant-oriented clusters are related to the writer/reader of the
text, such as stance and engagement features. Based on the definitions of the categories, this
study classified the 3- and 4-word clusters that were exclusively used in each group's writing.
This study did not examine the 2- and 5-word clusters because 2-word clusters were too short
to identify their macrofunctional orientation and all 5-word clusters were parts of the writing
prompts. The results of the classification of functional clusters are shown in Table 9.

[TABLE 9] Categorization of 3- and 4-word Clusters Exclusively Used by Each Group
Cluster | Categories | CG | UG
3-word cluster | research-oriented | a lot of, in touch with, more and more, not good for, the fact that, do harm to, is harmful to, the issue of, there is not, we don't have, a sense of, sense of belonging, to talk about, in front of, can talk about, in that it, it is easy, talk about the, the statement that, they can talk, they want to, to have a | wear same clothes*, is not only, there is a, and it is, is one of, can be a, there are many, but it is, it is hard, some people say
3-word cluster | text-oriented | according to the, it comes to, when it comes, in other words, in this sense, for these reasons, in terms of, for this reason | this is because, because of the, however it is, however I think
3-word cluster | participant-oriented | I agree with, agree with the, in my opinion, should not be, these reasons I | I believe that, I agree that
4-word cluster | research-oriented | a sense of belonging | is one of the
4-word cluster | text-oriented | when it comes to | -
4-word cluster | participant-oriented | I agree with the, for these reasons I | -
In terms of 3-word clusters, the most frequently used function was the research-oriented one in
both groups, and it is notable that CG included clusters indicating various types of syntactic
structures (e.g., do harm to, is harmful to, there is not, we don't have, can talk about, in that it, it is easy,
they can talk, they want to), while the majority of UG's research-oriented clusters (8 out of 10)
included be verbs. Studies have suggested that excessive use of the copula be as a main verb is
indicative of low syntactic complexity in English writing (e.g., Hinkel, 2002; 2003). In terms of the
text-oriented clusters, CG used a wider variety of clusters (8) than UG (4). In particular, UG's
3-word clusters were limited to the use of causal (because) and adversative (however) connectors,
while CG used various transition signals of different textual functions such as according to the,
when it comes, in other words, in this sense and in terms of. The use of participant-oriented clusters also
showed that CG used diverse types such as I agree with, agree with the, in my opinion, should not be
and these reasons I, while UG showed only two clusters, I believe that and I agree that. The use of
4-word clusters showed similar results to 3-word clusters in that UG had only one cluster
(is one of the), while CG showed four clusters across all three macrofunctions.
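The proportion of be-verb clusters reported above (8 out of 10) can be checked against the UG research-oriented 3-word clusters listed in Table 9. The sketch below is illustrative only: the set of be forms and the whitespace tokenization are assumptions made for this check, not a description of the coding procedure used in this study.

```python
# Forms of the verb "be" (an assumed list for this check).
BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}

# UG's research-oriented 3-word clusters as listed in Table 9.
ug_research_clusters = [
    "wear same clothes", "is not only", "there is a", "and it is", "is one of",
    "can be a", "there are many", "but it is", "it is hard", "some people say",
]

# Keep the clusters in which at least one word is a form of "be".
with_be = [c for c in ug_research_clusters if BE_FORMS & set(c.split())]

print(f"{len(with_be)} of {len(ug_research_clusters)} clusters contain a be verb")
```

The two clusters without a be verb are wear same clothes and some people say.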
In brief, CG used not only fewer erroneous word clusters, but also more diverse clusters serving
different macrofunctions than UG. The results clearly indicate the instructional benefits of
controlled corpus consultation for improving learners' awareness of appropriate and diverse
forms of formulaic expressions. In addition, it should be noted that most of the clusters used
exclusively by CG were not the given search terms for corpus consultation. This seems to
suggest that the controlled task did not restrict the scope of the corpus search; rather, it allowed
for more effective investigation of the data to develop learners' general awareness of formulaic
expressions in L2.
3. Overall Quality of Writing
In order to compare the different instructional benefits on the writing quality between the
two groups, this study examined the two groups' mean scores on the writing tests, the significance
of the group difference on each test, and the group by time interaction. The results are reported in
Table 10. In terms of a within-subject ANOVA, Mauchly’s test of sphericity showed that the
sphericity assumption was violated, so the Huynh-Feldt adjustment was used for hypothesis testing
(W(2)=.905, p=.021). For between-subject tests, unlike CG (W(2)=.990, p=.829), UG showed
significant results (W(2)=.784, p=.010), so its degrees of freedom were adjusted by the Huynh-Feldt epsilon.
[TABLE 10] Results of Statistical Analysis on Writing Quality of CG and UG
Group | Pretest Mean (SD) | Immediate posttest Mean (SD) | Delayed posttest Mean (SD) | Within-subject ANOVA (time effect)
CG | 2.38 (.84) | 3.13 (.69) | 3.9 (.87) | F(2,78)=103.77, p<.001*
UG | 2.6 (.55) | 2.9 (.74) | 2.7 (.65) | F(1.71,66.63)=3.1, p=.059
Independent t-test | t(78)=-1.42, p=.159 | t(78)=1.41, p=.164 | t(78)=6.99, p<.001* |
Between-subject ANOVA (group by time interaction): F(1.89,147.57)=40.43, p<.001*
Note. An asterisk indicates that the p value is statistically significant.
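The fractional degrees of freedom in Table 10 follow from the Huynh-Feldt correction, which multiplies both the effect and the error degrees of freedom by an epsilon value. The sketch below back-calculates the implied epsilon from the reported error degrees of freedom and checks that it reproduces the reported effect degrees of freedom. It rests on assumptions: three measurement points, two groups, 80 participants in total (inferred from t(78)), and 40 participants in UG (inferred from F(2,78) for CG). The epsilon values themselves are not reported in this study, and the function names are hypothetical.

```python
def implied_epsilon(reported_df_error, unadjusted_df_error):
    """Back-calculate the correction factor from a reported error df."""
    return reported_df_error / unadjusted_df_error

def adjusted_df(df_effect, df_error, epsilon):
    """Apply an epsilon correction to both degrees of freedom."""
    return epsilon * df_effect, epsilon * df_error

TIMES, GROUPS = 3, 2       # pretest, immediate posttest, delayed posttest; CG and UG
N_TOTAL, N_UG = 80, 40     # assumed sample sizes (inferred, see the paragraph above)

# Group by time interaction: unadjusted df are (3-1)(2-1) and (3-1)(80-2).
df_effect_int = (TIMES - 1) * (GROUPS - 1)
df_error_int = (TIMES - 1) * (N_TOTAL - GROUPS)
eps_int = implied_epsilon(147.57, df_error_int)
adj_int = adjusted_df(df_effect_int, df_error_int, eps_int)

# Time effect within UG: unadjusted df are (3-1) and (3-1)(40-1).
df_effect_ug = TIMES - 1
df_error_ug = (TIMES - 1) * (N_UG - 1)
eps_ug = implied_epsilon(66.63, df_error_ug)
adj_ug = adjusted_df(df_effect_ug, df_error_ug, eps_ug)

print(round(eps_int, 3), round(adj_int[0], 2))  # implied epsilon and effect df, interaction
print(round(eps_ug, 3), round(adj_ug[0], 2))    # implied epsilon and effect df, UG
```

Under these assumptions, the implied epsilon values reproduce the effect degrees of freedom of 1.89 and 1.71 shown in Table 10.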
CG's mean score on the pretest was 2.38, which improved to 3.13 in the immediate posttest
and 3.9 in the delayed posttest. The improvement over time was statistically significant
(F(2,78)=103.77, p<.001*). This is in stark contrast with the results of UG, who made
non-significant improvement in the posttest (2.9) and delayed posttest (2.7) (F(1.71,66.63)=3.1,
p=.059). The repeated measures ANOVA suggested a significant group by time interaction
(F(1.89,147.57)=40.43, p<.001*), indicating meaningful group differences in the writing quality
across the three measurement points. Further, it is noteworthy that the independent t-tests to
examine the group difference in each test suggested that the two groups showed a meaningful
difference in the delayed posttest (t(78)=6.99, p<.001*). Given the non-significant difference in
the pretest (t(78)=-1.42, p=.159), the significant group difference in the delayed posttest hinted
at strong enduring benefits of the controlled corpus consultation in overall quality of L2
writing. The different patterns of writing quality over time can be illustrated as in Figure 3.
[Figure 3] Differences in Improvement of Writing Quality between the Two Groups
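The independent t-tests in Table 10 can be approximately reproduced from the reported means and standard deviations alone. The sketch below uses the standard pooled-variance formula and assumes 40 participants per group, which is inferred from the reported degrees of freedom (t(78)) rather than stated in this section. The function name pooled_t is hypothetical. Differences of a few hundredths from the reported t values are to be expected because the table gives rounded means and standard deviations.

```python
import math

def pooled_t(m1, s1, n1, m2, s2, n2):
    """Independent-samples t statistic with pooled variance, from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df

N = 40  # assumed number of participants per group (inferred from df = 78)

# (CG mean, CG SD, UG mean, UG SD) for each test, taken from Table 10.
summary = {
    "pretest": (2.38, 0.84, 2.6, 0.55),
    "immediate posttest": (3.13, 0.69, 2.9, 0.74),
    "delayed posttest": (3.9, 0.87, 2.7, 0.65),
}

results = {}
for test, (m_cg, s_cg, m_ug, s_ug) in summary.items():
    results[test] = pooled_t(m_cg, s_cg, N, m_ug, s_ug, N)
    t_value, df = results[test]
    print(f"{test}: t({df}) = {t_value:.2f}")
```

The recomputed values show the same pattern as the reported ones: small, non-significant differences in the pretest and the immediate posttest, and a large difference in favor of CG in the delayed posttest.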
V. CONCLUSION
Based on the increasing attention to the direct use of corpora for L2 learning, a number of
studies have reported a need for controlled tasks to exploit the best benefits of learners' corpus
consultation (e.g., Ädel, 2010; Aston, 2001; Kennedy & Miceli, 2010). In response to such
needs, this study explored the instructional benefits of controlled corpus consultation of
selected formulaic expressions in L2 writing. This study presented an empirical investigation
comparing the students’ linguistic awareness of formulaic expressions and the writing quality
between a controlled corpus consultation group (which developed the corpus search starting
from selected formulaic expressions) and an uncontrolled corpus consultation group (which
began the corpus search from individual search terms). The findings of this study suggested
that the controlled group increased the number of types and the frequency of formulaic
expressions in their L2 writing, indicating instructional benefits of the controlled task for
improving the learners' awareness of formulaic expressions in L2. Qualitative investigation of the
use of clusters supported the improvement of the controlled group, as it used more diverse and
accurate word clusters, while the uncontrolled group showed less diversity with some erroneous
usage of clusters. In terms of writing quality, the controlled group also showed meaningful
improvement in the immediate and delayed posttests compared to the uncontrolled
counterpart. The findings of this study showed that the controlled group significantly
outperformed the uncontrolled group, indicating the instructional benefits of controlled corpus consultation of selected
formulaic expressions for improving linguistic awareness of formulaic expressions as well as L2
writing quality.
The significant group difference in this study is largely attributable to the different unit of
search terms. The investigation suggested that the different search terms seemed
to create significantly different environments for the learners to develop their linguistic
awareness of formulaic expressions and L2 writing proficiency. When students start their
corpus consultation with single search words, they usually have to deal with a vast number of
concordances. Analyzing the concordances, students have to put a lot of effort into identifying
formulaic patterns because there are a number of possible hypotheses about meanings and
usage of the search terms. As a way of testing the hypotheses, students have to modify their
search terms by adding new words and changing the words, which requires significant analytical
and linguistic ability (e.g., Ädel, 2010; Kennedy & Miceli, 2010; Liu & Jiang, 2009). On the
other hand, when students start their corpus search with formulaic expressions as search
terms, they have fewer concordances, making interpretation of the data more manageable. With
a small number of possible hypotheses about the usage of the search terms, it can be relatively
easy for the students to complete the testing of hypotheses and to refine their corpus search,
allowing for more opportunities for learners to examine various forms and usage of formulaic
expressions.
The findings of this study suggested several pedagogical implications for corpus-based L2
writing instruction. Most of all, they highlighted the significance of a controlled approach in
learners' use of corpus data. As a number of studies have suggested (e.g., Ädel, 2010; Kennedy
& Miceli, 2010; Liu & Jiang, 2009; McEnery, Xiao, & Tono, 2006; Widdowson, 1998; 2000;
2004), the direct use of corpora for pedagogical purposes may bring about considerable
challenges. Despite the significant benefits of corpus searching, it seems obvious that more
guidance and training are required both for instructors and students in their use of corpora as a
learning tool. The findings of this study echoed the significance of controlled corpus
consultation, inviting future studies to examine various ways of controlling the data and tasks of
corpus consultation for better pedagogical benefits. In addition, instructors who incorporate or
hope to incorporate corpora into their L2 instruction should keep in mind the strong need for
pedagogic mediation in learners’ corpus consultation process. Given the outperformance of
the controlled task group in this study, instructors need to appreciate the value of the
controlled approach and be ready to deal with the potential challenges in the process of
learners' corpus consultation.
In addition, this study shed light on the value of formulaic expressions in learners' corpus
consultation. The findings of this study suggested that focused attention on formulaic
expressions in learners' corpus consultation facilitated not only the use of instructed
expressions, but also that of uninstructed ones, hinting at a significant effect of the instructional
focus on formulaic expressions in the learners' corpus search. Consequently, instructors need to
put more focus on formulaic expressions in designing learners' corpus consultation tasks and
should develop various ways to improve students' linguistic awareness of formulaic language
through corpus-based L2 writing instruction. Along with the controlled task of learners' corpus
consultation (as presented in this study), prior studies suggested several ways of teaching
formulaic expressions such as noticing (e.g., Boers, Eyckmans, Kappel, Stengers, &
Demecheleer, 2006), CLT-based instruction (e.g., Gatbonton & Segalowitz, 1988; 2005) and
text memorization (e.g., Ding, 2007; Dai & Ding, 2010). Based on the findings of these studies,
it is hoped that more effective and diverse ways of corpus-based instruction on formulaic
expressions will be designed and examined in future empirical studies.
The findings of this study should, however, be taken with caution. Due to the quantitative
research framework, this study fell short of examining the students' thoughts and opinions on
the learning experience through questionnaires or interviews. Any future attempt to
examine the students' opinions on the controlled corpus consultation of formulaic expressions
would provide a more in-depth understanding and explanation of the instructional effects of the
controlled corpus consultation in L2 writing. In addition, this study used only 40 students,
which was a relatively small number of subjects from which to draw a generalizable conclusion.
Accordingly, it is hoped that future studies will recruit more participants to replicate this study,
which would provide empirical support for the current findings. Finally, the findings of this
study should be cautiously adopted by researchers and practitioners who consider formulaic
expressions as the only appropriate form of search terms for corpus consultation. It is true that
the findings of this study suggested positive results for the controlled corpus consultation of
selected formulaic expressions, but learners' corpus consultation using single words as search
terms has its own merit (e.g., Chambers & O’Sullivan, 2004; Cresswell, 2007; Lee & Swales,
2006; O'Sullivan, 2007; Todd, 2001; Yoon, 2008; Yoon & Hirvela, 2004). In addition, it should
be noted that incidental learning and data-driven learning have long-term and life-long effects,
which are not readily identifiable through the current experimental design. As such, we cannot
easily dismiss the benefits of corpus consultation using single words as search terms, which
should be investigated through carefully designed longitudinal studies.
Notwithstanding these limitations, the findings of this study have significance for
corpus-based L2 writing instruction. This study is meaningful as it explored the instructional
benefits of controlled corpus consultation of selected formulaic expressions for improving
students' linguistic awareness of formulaic expressions and L2 writing proficiency. With only a
limited number of studies exploring the controlled corpus consultation tasks, it is hoped that
findings of this study will provoke further studies to examine various types and functions of
controlled tasks of learners' corpus consultation. In addition, based on the findings of this
study, it is hoped that future research will explore the instructional value of formulaic expressions in
learners' corpus consultation to achieve the best benefits of corpus-based L2 writing
instruction.
REFERENCES
Ädel, A. (2010). Using corpora to teach academic writing: Challenges for the direct approach. In M.
Campoy-Cubillo, B. Belles-Fortuno, & M. Gea-Valor (Eds.), Corpus-based approaches to English
language teaching (pp. 39-55). London & New York: Continuum.
Aston, G. (2001). Learning with corpora: An overview. In G. Aston (Ed.), Learning with corpora (pp.
7-45). Houston, TX: Athelstan.
Boers, F., Eyckmans, J., Kappel, J., Stengers, H., & Demecheleer, M. (2006). Formulaic sequences and
perceived oral proficiency: Putting a lexical approach to the test. Language Teaching Research,
10(3), 245-261.
Braun, S. (2005). From pedagogically relevant corpora to authentic language learning contents.
ReCALL, 17(1), 47-64.
Chambers, A. (2005). Integrating corpus consultation in language studies. Language Learning &
Technology, 9(2), 111-125.
Chambers, A., & O'Sullivan, I. (2004). Corpus consultation and advanced learners' writing skills in
French. ReCALL, 16(1), 158-172.
Cheng, W., Warren, M., & Xun-Feng, X. (2003). The language learner as language researcher: Putting
corpus linguistics on the timetable. System, 31(2), 173-186.
Cho, H. (2014). The effects of corpus consultation of formulaic expressions on the improvement of automaticity in the
cognitive process and L2 writing quality. Unpublished doctoral dissertation, Hankuk University of
Foreign Studies, Korea.
Conrad, S. (2005). Corpus linguistics and L2 teaching. In E. Hinkel (Ed.), Handbook of research in second
language teaching and learning (pp. 393-409). Mahwah, NJ: Lawrence Erlbaum Associates.
Cresswell, A. (2007). Getting to 'know' connectors? Evaluating data-driven learning in a writing skills
course. In E. Hidalgo, L. Quereda, & S. Juan (Eds.), Corpora in the foreign language classroom (pp.
267-287). Amsterdam: Rodopi.
Dai, Z., & Ding, L. (2010). Effectiveness of text memorization in EFL learning of Chinese students. In
D. Wood (Ed.), Perspectives on formulaic language in acquisition and communication (pp. 71-87). London
& New York: Continuum.
Ding, Y. (2007). Text memorization and imitation: The practices of successful Chinese learners of
English. System, 35(2), 271-280.
Flowerdew, L. (2008). Corpus linguistics for academic literacies mediated through discussion activities.
In D. Belcher & A. Hirvela (Eds.), The oral-literate connection: Perspectives on L2 speaking, writing and
other media interactions (pp. 268-287). Ann Arbor, MI: University of Michigan Press.
Flowerdew, L. (2009). Applying corpus linguistics to pedagogy: A critical evaluation. International Journal
of Corpus Linguistics, 14(3), 393-417.
Flowerdew, L. (2012). Exploiting a corpus of business letters from a phraseological, functional
perspective. ReCALL, 24(2), 152-168.
Gatbonton, E. & Segalowitz, N. (1988). Creative automatization: Principles for promoting fluency
within a communicative framework. TESOL Quarterly, 22(3), 473–492.
Gatbonton, E. & Segalowitz, N. (2005). Rethinking communicative language teaching: A focus on
access to fluency. Canadian Modern Language Review, 61(3), 325-353.
Gavioli, L., & Aston, G. (2001). Enriching reality: Language corpora in language pedagogy. ELT
Journal, 55(3), 238-246.
Geluso, J. (2013). Phraseology and frequency of occurrence on the web: Native speakers’ perceptions
of Google-informed second language writing. Computer Assisted Language Learning, 26(2),
144-157.
Halliday, M. A. (1994). Functional grammar. London: Edward Arnold.
Hill, J. (2000). Revising priorities: From grammatical failure to collocational success. In M. Lewis (Ed.),
Teaching collocation: Further development in the lexical approach (pp. 47-69). Oxford: Oxford
University Press.
Hinkel, E. (2002). Second language writers' text: Linguistic and rhetorical features. London: Routledge.
Hinkel, E. (2003). Simplicity without elegance: Features of sentences in L1 and L2 academic texts. TESOL
Quarterly, 37(2), 275-301.
Hyland, K. (2008). Academic clusters: Text patterning in published and postgraduate writing.
International Journal of Applied Linguistics, 18(1), 41-62.
Kennedy, C., & Miceli, T. (2001). An evaluation of intermediate students' approaches to corpus
investigation. Language Learning & Technology, 5(3), 77-90.
Kennedy, C., & Miceli, T. (2010). Corpus-assisted creative writing: Introducing intermediate Italian
learners to a corpus as reference resource. Language Learning & Technology, 14(1), 28-44.
Lee, D., & Swales, J. (2006). A corpus-based EAP course for NNS doctoral students: Moving from
available specialized corpora to self-compiled corpora. English for Specific Purposes, 25(1), 56-75.
Liu, D., & Jiang, P. (2009). Using a corpus-based lexicogrammatical approach to grammar instruction
in EFL and ESL contexts. The Modern Language Journal, 93(1), 61-78.
McEnery, T., Xiao, R., & Tono, Y. (2006). Corpus-based language studies. London: Routledge.
Nam, D. (2010). The effects of corpus-based language instruction on productive vocabulary
knowledge. Multimedia-Assisted Language Learning, 13(2), 155-176.
Oshima, A., & Hogue, A. (2006). Writing academic English. New York: Pearson Education.
O'Sullivan, I. (2007). Enhancing a process-oriented approach to literacy and language learning: The role
of corpus consultation literacy. ReCALL, 19(3), 269-286.
Pérez-Paredes, P., Sánchez-Tornel, M., Alcaraz Calero, J. M., & Jiménez, P. A. (2011). Tracking
learners' actual uses of corpora: Guided vs non-guided corpus consultation. Computer Assisted
Language Learning, 24(3), 233-253.
Scott, M. (2014). WordSmith tools manual. Retrieved May 12, 2016, from
http://www.lexically.net/downloads/version6/HTML/index.html?single_words.htm
Simpson-Vlach, R., & Ellis, N. C. (2010). An academic formulas list: New methods in phraseology
research. Applied Linguistics, 31, 487-512.
Smart, J. (2014). The role of guided induction in paper-based data-driven learning. ReCALL, 26(2),
184-201.
Sun, Y.-C. (2007). Learner perceptions of a concordancing tool for academic writing. Computer Assisted
Language Learning, 20(4), 323-343.
Todd, R. W. (2001). Induction from self-selected concordances and self-correction. System, 29(1),
91-102.
Tribble, C., & Jones, G. (1990). Concordances in the classroom: A resource book for teachers. Harlow: Longman.
Vännestal, M., & Lindquist, H. (2007). Learning English grammar with a corpus: Experimenting with
concordancing in a university grammar course. ReCALL, 19(3), 329-350.
Wayabroad Company. (2002). TOEFL writing (TWE) topics and model essays. Retrieved Jan 25, 2016, from
https://www.wiziq.com/tutorial/671118-185-TOEFL-Writing-TWE-Topics-and-Model-Essays
Weigle, S. C. (2002). Assessing writing. Cambridge: Cambridge University Press.
Widdowson, H. G. (1998). Communication and community: The pragmatics of ESP. English for Specific
Purposes, 17(1), 13-14.
Widdowson, H. G. (2000). On the limitations of linguistics applied. Applied Linguistics, 21(1), 3-25.

Widdowson, H. G. (2004). Text, context, pretext. London: Blackwell.
Yoon, H. (2008). More than a linguistic reference: The influence of corpus technology on L2 academic
writing. Language Learning & Technology, 12(2), 31-48.
Yoon, H., & Hirvela, A. (2004). ESL student attitudes toward corpus use in L2 writing. Journal of Second
Language Writing, 13(4), 257-283.
Yoon, H., & Jo, J. W. (2014). Direct and indirect access to corpora: An exploratory case study
comparing students' error correction and learning strategy use in L2 writing. Language Learning
& Technology, 18(1), 96-117.
APPENDIX A
1. List of Expressions for Corpus Consultation
Week | Controlled Group | Uncontrolled Group
2 | in terms of the / with respect to / in the case of / in this case / when it comes to / the relationship between / in relation to / related to / associated with the / in the sense that / by the same token / from the point of view / as well as / generally speaking | important / importance / serious / premier / real / interested / most / more / numerous / some / few / each / different / difference / short / large / other / due / same
3 | as a result of / in order to / this is a / there is no / it is not / that there is a / a number of / the number of / large number of / the amount of / a variety of / a series of / the extent to which / to some extent / the way in which / ways in which / I would explain | related / associated / relationship / relation / result / suggest / explain / refer / function / sense / see / use / respect / face / come / cast / complete / follow / opposed / contrast
4 | drawbacks such as / cast (serious) doubts on / complete evidence to / come to the conclusion that / what is more / numerous reasons why / as follows / for instance | that / which / what / who / why / when / where / how / whether / as / although / even (though) / and / so / or / but
5 | due to the / the fact that the / on the basis of / based on the / is based on / an example of / as an example / for example of / such as the / in such a way / in other words / the part of the / each of these / and so on | words / doubts / term / part / such / number / amount / variety / extent / series / order / drawback / development / example / instance / reasons / basis / based / fact / evidence / conclusion / summary
6 | at the same time / exactly the same / in the same / more likely to / likely to be / as opposed to / in contrast / the difference between (the) (two) / in spite of (the fact that there may be) / People may prefer (A to B) | may / might / will / would / could / can / should / must / have to / have (had) better / prefer / agree / view / consider / deem / seem / argue / suggest / neglect
7 | When we consider that / A deem B as the premier choice / be interested in / face the dilemma of (whether to A or to B) / whether or not / referred to as / in response to / the real world / modern society / It is important to / It is also important to see that / In summary / In short | well / exactly / likely / also / safely / case / way / point / role / effect / size / response / choice / type / presence / spite / dilemma / world
8 | it may be / may not be / there may be / may neglect that / this may explain why / Some people might argue that / can be used / it can be / we can see / it does not / Most people would agree that / it would be / it seems to be / it seems that | in / on / at / by / from / with / between / to / toward / of / for
9 | the development of / the role of / the size of the / the importance of / the effect of / as a function of / the use of / the presence of / different types of | this / there / it / we / I / these / people / the / a / not / no
2. Writing Topics
Test | Topic
Pretest | The sale of human organs should be legalized.
Pretest | Businesses should do anything they can to make a profit.
Posttest | People should not be allowed to smoke in public places and office buildings.
Posttest | Face-to-face communication is better than other types of communication, such as letters, email or telephone calls.
Delayed posttest | High school students should be required to wear school uniforms.
Delayed posttest | Television has destroyed communication among friends and family.
Applicable levels: tertiary education

Author: Cho, Hyeyoung (Cyber Hankuk University of Foreign Studies);
junjungh7@naver.com
Received: July 31st, 2016

Reviewed: August 20th, 2016
Accepted: September 15th, 2016

Copyright of Multimedia-Assisted Language Learning is the property of Korea Association of
Multimedia-Assisted Language Learning and its content may not be copied or emailed to
multiple sites or posted to a listserv without the copyright holder's express written permission.
However, users may print, download, or email articles for individual use.
