
The practicality of data-driven learning in developing EFL learners’ adjective-noun collocations

1. INTRODUCTION
English has become the predominant foreign language in Vietnam, where many
citizens need to be able to use it (Vu & Peters, 2021). In foreign language learning,
vocabulary plays a pivotal role in language proficiency (Nation & Meara, 2020). Among
lexical items, collocations are a type of formulaic expression that many EFL learners find
challenging (McCarthy & O’Dell, 2017).
Collocations can be taught and learned in various ways; one of the most noteworthy
approaches is data-driven learning (DDL) (Hadley, 2002). In DDL, teachers use
corpus-based tools to give students access to authentic texts that inform their later
language use (Pérez-Paredes, 2022). In theory, this approach is believed to benefit
students because it lets them see language in use instead of memorizing lists of
grammatical rules and producing unnatural sentence patterns (Cotos, Link, & Huffman,
2017; Wu, 2021). DDL may also help students become more autonomous learners (Lin,
2016). On the other hand, uneven proficiency levels among students and the sheer volume
of language data may be obstacles to this approach (Park, 2012). Overall, although DDL
is often associated with computer-assisted language learning, it is strongly connected to
the study of language learning and teaching (Wu, 2021).
The use of DDL for students’ collocation retention has been examined in a wide range of
research contexts. The most prominent corpus-based tools for DDL are the Corpus of
Contemporary American English (COCA) and online dictionaries. Studies showing
positive effects of DDL on EFL learners’ collocation retention include Kartal and
Yangineksi (2018), Pham (2018), Altun (2021), Fang, Ma and Yan (2021), and Wu (2021).
In addition, Pham (2022) confirmed that Vietnamese EFL learners have problems with
adjective-noun collocations. Together, these findings motivate the present study.
2. LITERATURE REVIEW
2.1. Data-driven learning
Hadley (2002, p. 108) defines data-driven learning (DDL) as “a method
of studying grammar. Language learners start with a question, and then come to their
conclusions after analyzing the corpora with a concordancer program.” In this kind of
learning, learners are exposed to authentic language and generalize appropriate usage
from it. Similarly, Pérez-Paredes (2022) notes that DDL originated with linguists who
became interested in using corpora, large collections of authentic texts, for language
research.
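Hadley’s definition centers on analyzing corpora with a concordancer program. As an illustration only, the Python sketch below shows the core idea of a key-word-in-context (KWIC) concordance: every occurrence of a search word is printed with a few words of context on each side, so that a learner can inspect recurring patterns such as the adjectives that precede “decision”. The function name kwic, its simple tokenizer, and the three-sentence corpus are hypothetical; this is not how COCA or any other real concordancer is implemented.

```python
import re
from typing import List


def kwic(corpus: List[str], node: str, width: int = 4) -> List[str]:
    """Return key-word-in-context lines for every occurrence of `node`.

    Each line shows up to `width` tokens of left and right context,
    the basic display a concordancer offers to learners.
    """
    lines = []
    for sentence in corpus:
        # Simple lowercase word tokenizer; punctuation is dropped.
        tokens = re.findall(r"[a-z']+", sentence.lower())
        for i, tok in enumerate(tokens):
            if tok == node:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                # Right-align the left context so the node words line up.
                lines.append(f"{left:>30} [{tok}] {right}")
    return lines


# Hypothetical mini-corpus standing in for authentic texts.
corpus = [
    "She made a difficult decision after a long discussion.",
    "It was a tough decision for the whole team.",
    "The committee reached a final decision yesterday.",
]

for line in kwic(corpus, "decision"):
    print(line)
```

Reading down the aligned lines, a learner would notice “difficult”, “tough”, and “final” in the slot before the node word, which is the kind of inductive generalization DDL relies on.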
DDL mainly serves academic purposes, especially vocabulary learning (Lee, Warschauer,
& Lee, 2019) and writing (Cotos, Link, & Huffman, 2017; Wu, 2021). It may also help
EFL learners become more autonomous, particularly in improving their writing (Lin,
2016). Nevertheless, because learners have to explore authentic written language across
diverse contexts, they may find it challenging to generalize correct patterns, and the
success of this learning depends heavily on their own proficiency (Park, 2012).
2.2. Collocations
Celce-Murcia and Schmitt (2010, p. 9) define collocations as “words co-occur
together”, such as strong coffee and a herd of deer; these combinations form chunks, or
formulaic expressions, that native speakers use frequently. Additionally, McCarthy and
O’Dell (2017) stress that collocations are meaningful word combinations that native
speakers use frequently and that many foreign language learners find difficult to
understand and use.
Moehkardi (2002) divides collocations into two groups: lexical collocations and
grammatical collocations. Lexical collocations have no grammatical function and are
used for their meaning. They include verb + noun/pronoun (e.g., “compose music”), noun
+ verb (e.g., “bomb explodes”), noun + noun (e.g., “a bouquet of flowers”), adjective +
noun (e.g., “best regards”), adverb + adjective (e.g., “deeply absorbed”), and verb +
adverb (e.g., “appreciate sincerely”). Grammatical collocations serve a grammatical
function and include noun + preposition (e.g., “solution to”), verb + preposition (e.g.,
“consist of”), and adjective + preposition (e.g., “angry with”).
2.3. Previous findings
Kartal and Yangineksi (2018) showed that DDL could have positive effects on Turkish
EFL learners’ collocation retention, drawing on data from a writing pretest and posttest,
a collocation pretest and posttest, and a post-course survey. The participants were 60
first-year English teaching majors, divided equally into a control group and an
experimental group. The results revealed no statistically significant change in the
students’ receptive knowledge of collocations after the treatment. However, the
experimental group outperformed the control group in collocation production in writing.
These students also expressed positive attitudes towards incorporating corpus-based tools
into language learning.
In a Vietnamese context, Pham (2018) conducted a quasi-experimental study with 68
English majors at a private university in Ho Chi Minh City to investigate the effects of
corpus-based tools on students’ writing abilities and vocabulary gains. The instruments
were a writing pretest and posttest and a vocabulary pretest and posttest. The findings
showed positive effects of DDL on learners’ receptive and productive vocabulary
knowledge in academic writing, and the researcher concluded that the approach was
useful for the students.
Wu (2021) also examined EFL learners’ responses to DDL for collocation retention.
This quasi-experimental study involved seven Taiwanese first-year EFL students at a local
university who completed writing pretests and posttests before and after the course. The
other instruments were a post-course questionnaire and interview and video recordings of
the students retrieving collocations from COCA. The results demonstrated that DDL had
a positive effect on the students’ use of collocations in their essays in terms of accuracy
and complexity. Although the students differed in how well they understood the induced
sentence structures and incorporated them into their writing, their improvement was
remarkable.
Providing detailed statistical evidence, Altun (2021) demonstrated positive impacts of
DDL on Turkish EFL learners’ retention of both strong and weak collocations. The study
involved two groups (N = 44, divided equally): a control group and an experimental
group that received COCA instruction. Although the corpus group’s gain in strong
collocation retention did not reach statistical significance, its mean change was higher
than that of the control group. Notably, the corpus group’s retention of weak collocations
exceeded that of the control group.
Similarly, Fang, Ma and Yan (2021) reported that Chinese EFL learners (N = 22)
improved their collocation use after corpus-based instruction focusing on three types:
verb + noun, adjective + noun, and verb + preposition. The learners also expressed
positive attitudes towards the approach, even though they might not be allowed to use
these tools in formal tests or exams. Three instruments supported these findings: a
writing pretest and posttest, a questionnaire, and a post-course interview. All major results
showed positive effects of the instruction on the students’ use of collocations in writing.
Regarding Vietnamese students’ problems with English adjective-noun collocations,
Pham (2022) showed that Vietnamese EFL learners’ acquisition of these collocations
deserves attention. At a local public university in Ho Chi Minh City, Pham (2022)
administered two tests to measure students’ receptive and productive knowledge of
adjective-noun collocations. The results demonstrated that the students, even those at
higher proficiency levels, faced many difficulties in both understanding and producing
these collocations. The advanced students tended to produce novel collocations as a
product of their creativity. Therefore, although that study did not focus on DDL, it helps
describe Vietnamese EFL learners’ difficulties with one particular type of collocation,
adjective-noun, which should be a concern in research on Vietnamese EFL learners’
collocation retention.
In summary, using collocations can be a problem for many EFL learners, including
Vietnamese learners, and adjective-noun collocations are a typical case. DDL, which
aims to expose learners to authentic language use, has been shown across research
contexts to help EFL learners improve their collocation retention in writing and their
attitudes towards vocabulary instruction. However, it remains unclear whether DDL can
improve Vietnamese EFL learners’ retention of adjective-noun collocations. Accordingly,
the present research addresses the following questions.
Research question 1: Is there a positive effect of data-driven learning on Vietnamese EFL
learners’ adjective-noun collocation production in writing?
Research question 2: What are Vietnamese EFL learners’ attitudes towards using data-
driven learning for their adjective-noun collocation production in writing?
3. RESEARCH METHODOLOGY
3.1. Research design
The current study employs action research to investigate the effects of DDL on
Vietnamese EFL learners’ retention of adjective-noun collocations in their writing. The
rationale for this design is twofold. First, the researcher is also the teacher of the intact
classes and wishes to improve the students’ learning situation; as teacher-researcher, they
have observed that these students make many collocation errors in their writing,
especially incorrect adjective-noun combinations. Second, compared with a
quasi-experimental design, action research helps the researcher develop an effective
teaching approach rather than merely demonstrate the positive impacts of the instruction.
Action research also allows the researcher to use a wide range of qualitative and
quantitative instruments to describe the phenomenon under study (Creswell, 2012).

3.2. Research setting
The research will take place at a local private university in Ho Chi Minh City, focusing
on the English program for English majors. At the beginning of the course, these
students study the four skills (listening, speaking, reading, and writing) as discrete
subjects.
3.3. Sampling and participants
Convenience sampling will be used because the researcher has access to the intact
classes assigned in their teaching schedule. As its name suggests, this sampling method
saves the researcher the time needed to stratify or randomize the sample, since the
researcher can work with participants who are available for the research (Creswell,
2012).
The prospective participants will be 125 second-year English majors at the university
who have studied for at least three years. They study in three intact classes, referred to
in this research as Class A (N = 40), Class B (N = 45), and Class C (N = 40). The three
classes will be randomly assigned to three research conditions: the control group (Class
A), the first experimental group (Class B), and the second experimental group (Class C).
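Because whole intact classes, not individual students, are assigned to conditions, the randomization operates on class labels. A minimal Python sketch of a reproducible class-level assignment is given below as an illustration; the function name assign_classes, the condition labels, and the seed value are hypothetical, and the mapping actually adopted in this study is the one stated above (Class A control, Class B COCA, Class C OLD).

```python
import random
from typing import Dict, List


def assign_classes(classes: List[str], conditions: List[str], seed: int) -> Dict[str, str]:
    """Randomly map each intact class to exactly one condition."""
    if len(classes) != len(conditions):
        raise ValueError("need exactly one class per condition")
    rng = random.Random(seed)  # seeded so the assignment can be reported and reproduced
    shuffled = conditions[:]   # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    return dict(zip(classes, shuffled))


assignment = assign_classes(
    ["Class A", "Class B", "Class C"],
    ["control", "experimental 1 (COCA)", "experimental 2 (OLD)"],
    seed=2023,  # hypothetical seed value
)
print(assignment)
```

Recording the seed lets another researcher regenerate the same mapping, which makes the “random assignment” claim checkable.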
3.4. Research instruments
Three research instruments will be employed: a pre- and post-questionnaire, a pretest and
posttest, and field notes from classroom observations.
The pre-questionnaire aims to collect students’ background information before the
intervention: their English learning experience, their background knowledge of
adjective-noun collocations, and any previous experience with DDL. The
post-questionnaire aims to collect the students’ self-evaluation of the course’s
effectiveness. Questionnaires are suited to gathering responses from a large sample and
are among the most widely used instruments in research (Creswell, 2012); the current
study therefore uses them to collect data from the 125 participants.

The pretest and posttest share the same content and aim to evaluate how well students
use the collocations they have retained in writing. Both tests have the same writing topic
under the same instruction: “Write a 150-word paragraph to answer the following
question.” To eliminate extraneous factors such as recalling or copying texts, the
students are required to write on paper without any aids. This design yields authentic
data on the students’ collocation use for later comparison.
Classroom observations will be made throughout the 10-week course. The field notes will
record key events in each lesson rather than all information. As Creswell (2012) notes,
field notes help the researcher focus on the key details to observe and reflect on rather
than writing too much.
3.5. Training materials
As specified in the course syllabus, Great Writing 2 will be the primary textbook for
these classes. The coursebook scaffolds students’ paragraph writing through both
process-based writing (brainstorming, drafting, writing, and proofreading) and
genre-based writing (definition, narrative, opinion, comparison/contrast, and process
paragraphs, among others).
For collocation instruction, corpus-based tools will be employed in Classes B and C:
Class B will use the Corpus of Contemporary American English (COCA,
https://www.english-corpora.org/coca/), and Class C will use Oxford Learner’s
Dictionaries (OLD, https://www.oxfordlearnersdictionaries.com/).
3.6. Training procedures
Three different teaching procedures will be applied to the three intact classes over the
10-week course, so that each class receives collocation scaffolding in a different way.

Class A will learn to write in the way traditionally used at this school. Every writing
lesson starts with a short theory-based lecture covering genre definitions, paragraph
structure, and linguistic mechanics (vocabulary, grammar, and cohesive devices), after
which students complete writing tasks. Class A will have chances to read model
paragraphs, but they will not be asked to use any corpus-based tools while developing
their writing. The teacher will only show some useful collocations on PowerPoint slides
for the students to replicate, so the students depend heavily on the teacher for the
collocations they retain for their writing.
Class B will study the same lessons as Class A but will also receive brief instruction on
how to use COCA before the writing tasks, namely how to use COCA to extract good
sentence patterns that contain good collocations. Class C will study writing with OLD
and receive parallel instruction on how to use OLD to extract good sentence patterns that
contain good collocations. The two classes will thus use two different corpus-based tools
for DDL during the course.
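The instruction for Classes B and C centers on extracting sentence patterns that contain good collocations. The idea behind an adjective + noun search can be sketched in Python under stated assumptions: the text has already been part-of-speech tagged with Penn Treebank-style tags (JJ for adjectives, NN/NNS for nouns), and the program counts which adjectives immediately precede a target noun. The function name adjective_collocates and the tagged sentences are hypothetical; this is not how COCA or OLD work internally.

```python
from collections import Counter
from typing import List, Tuple

# A tagged sentence is a list of (word, tag) pairs, assumed to come from
# an external tagger using Penn Treebank tags (JJ* = adjective, NN* = noun).
TaggedSentence = List[Tuple[str, str]]


def adjective_collocates(sentences: List[TaggedSentence], noun: str) -> Counter:
    """Count adjectives that immediately precede `noun` (adjective + noun pattern)."""
    counts: Counter = Counter()
    for sent in sentences:
        # Walk over adjacent word pairs in the sentence.
        for (w1, t1), (w2, t2) in zip(sent, sent[1:]):
            if t1.startswith("JJ") and t2.startswith("NN") and w2.lower() == noun:
                counts[w1.lower()] += 1
    return counts


# Hypothetical pre-tagged sentences.
sentences = [
    [("a", "DT"), ("difficult", "JJ"), ("decision", "NN")],
    [("a", "DT"), ("tough", "JJ"), ("decision", "NN")],
    [("a", "DT"), ("difficult", "JJ"), ("decision", "NN"), ("today", "NN")],
    [("strong", "JJ"), ("coffee", "NN")],
]
print(adjective_collocates(sentences, "decision").most_common())
```

Only directly adjacent pairs are counted, so a phrase such as “difficult and painful decision” would credit only “painful”; a fuller search would allow a small window or skip coordinated adjectives.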
The lesson plans for these classes are summarized in Table 3.6.
Table 3.6. The teaching procedures
Week | Class | Lesson | Collocation instruction | Corpus-based tool
1 | A | Lesson 1: Descriptive Paragraphs | Yes, through the teacher’s lecture | None
1 | B | Lesson 1: Descriptive Paragraphs | Yes, through the corpus-based tool | COCA
1 | C | Lesson 1: Descriptive Paragraphs | Yes, through the corpus-based tool | OLD
2 | A | Lesson 2: Descriptive Paragraphs (cont.) | Yes, through the teacher’s lecture | None
2 | B | Lesson 2: Descriptive Paragraphs (cont.) | Yes, through the corpus-based tool | COCA
2 | C | Lesson 2: Descriptive Paragraphs (cont.) | Yes, through the corpus-based tool | OLD
3 | A | Lesson 3: Process Paragraphs | Yes, through the teacher’s lecture | None
3 | B | Lesson 3: Process Paragraphs | Yes, through the corpus-based tool | COCA
3 | C | Lesson 3: Process Paragraphs | Yes, through the corpus-based tool | OLD
4 | A | Lesson 4: Process Paragraphs (cont.) | Yes, through the teacher’s lecture | None
4 | B | Lesson 4: Process Paragraphs (cont.) | Yes, through the corpus-based tool | COCA
4 | C | Lesson 4: Process Paragraphs (cont.) | Yes, through the corpus-based tool | OLD
5 | A | Lesson 5: Comparison/Contrast Paragraphs | Yes, through the teacher’s lecture | None
5 | B | Lesson 5: Comparison/Contrast Paragraphs | Yes, through the corpus-based tool | COCA
5 | C | Lesson 5: Comparison/Contrast Paragraphs | Yes, through the corpus-based tool | OLD
6 | A | Lesson 6: Comparison/Contrast Paragraphs (cont.) | Yes, through the teacher’s lecture | None
6 | B | Lesson 6: Comparison/Contrast Paragraphs (cont.) | Yes, through the corpus-based tool | COCA
6 | C | Lesson 6: Comparison/Contrast Paragraphs (cont.) | Yes, through the corpus-based tool | OLD
7 | A | Lesson 7: Narrative Paragraphs | Yes, through the teacher’s lecture | None
7 | B | Lesson 7: Narrative Paragraphs | Yes, through the corpus-based tool | COCA
7 | C | Lesson 7: Narrative Paragraphs | Yes, through the corpus-based tool | OLD
8 | A | Lesson 8: Narrative Paragraphs (cont.) | Yes, through the teacher’s lecture | None
8 | B | Lesson 8: Narrative Paragraphs (cont.) | Yes, through the corpus-based tool | COCA
8 | C | Lesson 8: Narrative Paragraphs (cont.) | Yes, through the corpus-based tool | OLD
9 | A | Lesson 9: Opinion Paragraphs | Yes, through the teacher’s lecture | None
9 | B | Lesson 9: Opinion Paragraphs | Yes, through the corpus-based tool | COCA
9 | C | Lesson 9: Opinion Paragraphs | Yes, through the corpus-based tool | OLD
10 | A | Lesson 10: Opinion Paragraphs (cont.) | Yes, through the teacher’s lecture | None
10 | B | Lesson 10: Opinion Paragraphs (cont.) | Yes, through the corpus-based tool | COCA
10 | C | Lesson 10: Opinion Paragraphs (cont.) | Yes, through the corpus-based tool | OLD

3.7. Reliability and validity
First, the researcher will invite one expert to proofread the questionnaires and tests
before the actual implementation. Second, once the pre-questionnaire and
post-questionnaire have been constructed, five students from outside the sample will be
invited to complete them in order to check for any technical problems.
3.8. Data collection procedures
The pre-questionnaire will be administered one week before the course, and the pretest
in the first lesson. The researcher will then make the first classroom observation in each
class. The teaching procedures described above will continue over the next nine weeks.
At the end of the course, the post-questionnaire and posttest will be administered.
3.9. Data analysis procedures
SPSS (Version 20.0) will be used to compute descriptive statistics for the
pre-questionnaire on the students’ background information; this will be done at the
beginning of the course. The field notes will be analyzed after each writing lesson: the
researcher will reread them repeatedly and highlight particular problems and, where
possible, benefits to the students. The pretest and posttest will be used to extract the
students’ use of adjective-noun collocations. The researcher will first read each piece of
writing, underline the adjective-noun collocations, and classify them as appropriate or
inappropriate; the errors will also be extracted and presented in tables. After the course,
descriptive statistics for the post-questionnaire will be computed in SPSS, and collocation
use in the posttest will be recorded in the same way. Comparing the pretest and posttest
will show the notable errors in adjective-noun collocations and the change in the number
of appropriate adjective-noun collocations over time.
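The counting step of the pretest/posttest comparison can be sketched in Python as a complement to, not a replacement for, the SPSS analysis. The sketch assumes that each adjective-noun collocation the researcher underlines is recorded as a tuple of group, test, collocation, and an appropriate/inappropriate judgment; this record format, the function names summarize and errors, and the sample records are hypothetical and are not study data.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record format: (group, test, collocation, judged_appropriate).
# The judgment is the researcher's manual decision after reading a script.
Record = Tuple[str, str, str, bool]


def summarize(records: List[Record]) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Count appropriate and inappropriate collocations for each (group, test) pair."""
    tally: Dict[Tuple[str, str], Dict[str, int]] = defaultdict(
        lambda: {"appropriate": 0, "inappropriate": 0})
    for group, test, _collocation, ok in records:
        tally[(group, test)]["appropriate" if ok else "inappropriate"] += 1
    summary = {}
    for key, counts in tally.items():
        total = counts["appropriate"] + counts["inappropriate"]
        # total is never 0: a key exists only after at least one record.
        summary[key] = {**counts, "total": total,
                        "accuracy": counts["appropriate"] / total}
    return summary


def errors(records: List[Record], group: str, test: str) -> List[str]:
    """List the distinct inappropriate collocations for one group and test."""
    return sorted({c for g, t, c, ok in records
                   if g == group and t == test and not ok})


# Invented records for illustration only.
records = [
    ("B", "pretest", "strong rain", False),
    ("B", "pretest", "heavy traffic", True),
    ("B", "posttest", "heavy rain", True),
    ("B", "posttest", "heavy traffic", True),
]
summary = summarize(records)
print(summary[("B", "pretest")], summary[("B", "posttest")])
print(errors(records, "B", "pretest"))
```

The sketch reports only counts and proportions, matching the descriptive comparison described above; comparing a group’s accuracy across the two tests shows the change over time, and the error lists feed the error tables.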

REFERENCES
Altun, H. (2021). The learning effect of corpora on strong and weak collocations: Implications for corpus-based assessment of collocation competence. International Journal of Assessment Tools in Education, 8(3), 509-526. https://doi.org/10.21449/ijate.845051
Celce-Murcia, M., & Schmitt, N. (2010). An overview of applied linguistics. In N. Schmitt (Ed.), An introduction to applied linguistics (pp. 1-15). NY: Routledge.
Cotos, E., Link, S., & Huffman, S. (2017). Effects of DDL technology on genre learning. Language Learning & Technology, 21(3), 104-130. http://hdl.handle.net/10125/44623
Creswell, J. W. (2012). Educational research: Planning, conducting, and evaluating quantitative and qualitative research (4th ed.). Prentice Hall.
Fang, L., Ma, Q., & Yan, J. (2021). The effectiveness of corpus-based training on collocation use in L2 writing for Chinese senior secondary school students. Journal of China Computer-Assisted Language Learning, 1(1), 80-109. https://doi.org/10.1515/jccall-2021-2004
Hadley, G. (2002). An introduction to data-driven learning. RELC Journal, 33(2), 99-124. https://doi.org/10.1177/003368820203300205
Kartal, G., & Yangineksi, G. (2018). The effects of using corpus tools on EFL student teachers’ learning and production of verb-noun collocations. PASAA: Journal of Language Teaching and Learning in Thailand, 55, 100-125.
Lee, H., Warschauer, M., & Lee, J. H. (2019). The effects of corpus use on second language vocabulary learning: A multilevel meta-analysis. Applied Linguistics, 40(5), 721-753.
McCarthy, M., & O’Dell, F. (2017). English collocations in use: Intermediate (2nd ed.). NY: Cambridge University Press.
Moehkardi, R. R. D. (2002). Grammatical and lexical English collocations: Some possible problems to Indonesian learners of English. Humaniora, 15(1), 53-62.
Nation, P., & Meara, P. (2020). Vocabulary. In N. Schmitt & M. P. H. Rodgers (Eds.), An introduction to applied linguistics (pp. 35-54). NY: Hodder & Stoughton.
Park, K. (2012). Learner–corpus interaction: A locus of microgenesis in corpus-assisted L2 writing. Applied Linguistics, 33(4), 361-385. https://doi.org/10.1093/applin/ams012
Pérez-Paredes, P. (2022). A systematic review of the uses and spread of corpora and data-driven learning in CALL research during 2011–2015. Computer Assisted Language Learning, 35(1-2), 36-61.
Pham, T. B. N. (2022). Language proficiency and knowledge in adjective-noun collocations: A case study of Vietnamese learners of English. Journal of Language Teaching and Research, 13(1), 172-181.
Pham, T. T. T. (2018). An investigation into data-driven approach to introducing vocabulary and collocations. In Proceedings of Language Teaching and Learning Today 2018: Diversity and unity of language education in the globalised landscape (5-6 May 2018). Faculty of Foreign Languages, Ho Chi Minh City University of Technology and Education, Vietnam.
Vu, D. V., & Peters, E. (2021). Vocabulary in English language learning, teaching, and testing in Vietnam: A review. Education Sciences, 11(9), 563.

Appendices
Primary coursebook

COCA

OLD

Lesson plan
Class A
Lesson title: Process Paragraph
Aim
- to help students write a process paragraph
Time: 50 minutes
Aid: whiteboard, projector, coursebook
Stage | Teacher’s activities | Students’ activities
Warm-up | Show a model paragraph and ask students to work in groups to answer these questions: How many sentences are there in the paragraph? What are its topic and concluding sentences? How many key supporting sentences are there? What pattern of organization does it follow? | Work in groups of four and briefly discuss for 3 minutes.
Pre-writing | Present the definition of a process paragraph and show two other model paragraphs. | Observe and take notes if possible.
Pre-writing | Give a topic and ask students to work in groups to finish a 120-word paragraph. | Work in groups and do the writing. Students MUST NOT use any technical device or printed dictionary to help with the writing.
While-writing | Observe students’ group work and give language hints (vocabulary) when needed. | Work in groups to finish the writing and ask the teacher to explain possible problems.
While-writing | List some useful collocations related to the topic on the PowerPoint slides. | Take notes and may apply some of them in the writing.
Post-writing | Collect the students’ papers and look them over. | Submit the paragraph.
Post-writing | Receive questions or feedback from the students. | Ask questions and give feedback, if any.

Class B
Lesson title: Process Paragraph
Aim
- to help students write a process paragraph
Time: 50 minutes
Aid: whiteboard, projector, coursebook, COCA
Stage | Teacher’s activities | Students’ activities
Warm-up | Show a model paragraph and ask students to work in groups to answer these questions: How many sentences are there in the paragraph? What are its topic and concluding sentences? How many key supporting sentences are there? What pattern of organization does it follow? | Work in groups of four and briefly discuss for 3 minutes.
Pre-writing | Present the definition of a process paragraph and show two other model paragraphs. | Observe and take notes if possible.
Pre-writing | Give a topic and ask students to work in groups to finish a 120-word paragraph. | Work in groups and do the writing. Students HAVE TO use COCA to help with the writing and take screenshots of COCA to document their work.
Pre-writing | Instruct students how to use COCA to search for useful words or phrases for the writing. | Practice using COCA.
While-writing | Observe and monitor students’ group work. | Work in groups to finish the writing and ask the teacher to explain any technical problems with COCA.
While-writing | Present COCA examples on the screen. |
Post-writing | Collect the students’ papers and look them over. | Submit the paragraph.
Post-writing | Receive questions or feedback from the students. | Ask questions and give feedback, if any.

Class C
Lesson title: Process Paragraph
Aim
- to help students write a process paragraph
Time: 50 minutes
Aid: whiteboard, projector, coursebook, Oxford Learner’s Dictionaries (OLD)
Stage | Teacher’s activities | Students’ activities
Warm-up | Show a model paragraph and ask students to work in groups to answer these questions: How many sentences are there in the paragraph? What are its topic and concluding sentences? How many key supporting sentences are there? What pattern of organization does it follow? | Work in groups of four and briefly discuss for 3 minutes.
Pre-writing | Present the definition of a process paragraph and show two other model paragraphs. | Observe and take notes if possible.
Pre-writing | Give a topic and ask students to work in groups to finish a 120-word paragraph. | Work in groups and do the writing. Students HAVE TO use OLD to help with the writing and take screenshots of OLD to document their work.
Pre-writing | Instruct students how to use OLD to search for useful words or phrases for the writing. | Practice using OLD.
While-writing | Observe and monitor students’ group work. | Work in groups to finish the writing and ask the teacher to explain any technical problems with OLD.
While-writing | Present OLD examples on the screen. |
Post-writing | Collect the students’ papers and look them over. | Submit the paragraph.
Post-writing | Receive questions or feedback from the students. | Ask questions and give feedback, if any.
