A Micro Process Product Study of A CLIL
Instructed Second Language Acquisition
isla (online) issn 2398–4163
Article
Abstract
Affiliations
Michael H. Long: University of Maryland, USA.
email: mlong5@umd.edu
Assma Al Thowaini: University of Maryland, USA; King Saud University, Riyadh, Saudi Arabia.
email: aalthowaini@ksu.edu.sa
Buthainah Al Thowaini: University of Maryland, USA; King Saud University, Saudi Arabia.
email: balthowaini@ksu.edu.sa
Jiyong Lee: University of Maryland, USA.
email: jlee0123@umd.edu
Payman Vafaee: Columbia University, USA.
email: pv2203@tc.Columbia.edu
generalisations to real CLIL programs, which was not our intention. Rather,
we wish to suggest that process-product laboratory studies of larger scale and
longer duration, paired with classroom studies employing a similar design and
research methodology, offer a useful approach to identifying strengths and
weaknesses of CLIL programs largely ignored to date.
1. Canadian-style immersion
2. Transitional (early exit) and maintenance (late exit) bilingual education
3. So-called (and misleadingly called) Structured English Immersion
4. Submersion
5. Content-based language teaching
6. Dual, or two-way, immersion
7. Sheltered subject-matter teaching
8. Foreign language immersion.
In practice, this was generally a foreign language (FL), usually English, with
development and maintenance of students’ native language(s) not at risk
because of its use outside the program.
The increasing reliance on English in education in some countries today,
however, is sometimes differently motivated, for example, by the need for a
lingua franca when the increasing presence of children from ethnolinguis-
tic minorities in schools, or the growing numbers of international students
at universities, means that not all students and/or teachers share a common
language. In still other cases, especially but by no means only at the ter-
tiary level, even where the student body is drawn exclusively from the same
L1 background, the motivation is like that for the original primary- and
secondary-level CLIL programs: a felt need to internationalise the curricu-
lum and, through mastery of a foreign language, facilitate students’ access
to educational opportunities overseas and employment prospects both
at home and abroad. Rather than ‘CLIL’, such programs today are often
referred to as ‘English-Medium Instruction’, ‘English as a lingua franca in
academia’ or ‘Integrating Content and Language in Higher Education’ (see,
e.g., Mauranen 2012; Slobodanka, Hultgren and Jensen 2015; Wilkinson
and Walsh 2015).
In many cases, however, for example, at a growing number of Middle-
Eastern universities, instructors are non-native speakers of English them-
selves and local nationals who share the students’ L1, Arabic. Theirs are still
CLIL programs, therefore, even if the rationale for the use of English in the
situations in which they work may now be different. In still other settings,
instructors may be expatriate content specialists, for example, university
lecturers in science and technology, for whom the medium of instruction,
English, is their L1. Their situation is different again, therefore, from that of
local-born lecturers now mandated to use English in their courses, some-
times at the very same universities, and of CLIL teachers in many parts of
Asia, the Middle East and Europe who share an L1 with their students, and
for whom the medium of instruction, usually English, is a foreign language
that they and their students do not necessarily speak very well.
Adding to the program diversity still further, like Canadian immer-
sion, even traditional CLIL programs are not monolithic, either within or
across countries. Of the situation in Spain, for example, Ruiz de Zarobe
and Lasagabaster write, ‘There are no set formula and methods for CLIL’
(2010:vii), and ‘there are as many models as [the 17 autonomous] regions
and no single blueprint exists to take root across the country’ (2010:ix).
European CLIL programs range from single subjects to much of the cur-
riculum taught through the L2 (Hüttner and Smit 2014), sometimes with
the presence of a second or even a third language in the mix, as in several
A micro process-product study of a CLIL lesson 7
Figure 1: French immersion in Canada and CLIL programs in many countries compared.
Research on CLIL
As several commentators have pointed out (see, e.g., Bruton 2011a, 2011b,
2013, 2015; Dallinger et al. 2016; Navés and Victori 2010; Pérez-Cañado
2012; Rumlich 2013, 2014), some comparative studies of CLIL and tra-
ditional FL programs, for example, that by Lorenzo, Casal and Moore
(2010), have been beset by threats to internal validity, with selection being
particularly problematic. (For a discussion of the impact of six standard
threats to internal validity in comparative studies of L2 programs: history,
maturation, testing, instrumentation, selection and mortality, see Long,
1984.) Participation in CLIL programs is usually optional for both teach-
ers and students (although not always for students; see Hüttner, Dalton-
Puffer and Smit, 2013). Consequently, there is a lack of random assignment
to CLIL and regular FL courses, some CLIL programs even guaranteeing
non-equivalent groups by admitting students based on their superior FL
abilities. Yet when comparing intact groups, researchers typically fail to
correct for pre-existing differences among teachers and students, who may
have elected CLIL or non-CLIL groups voluntarily. If teachers volunteer
to teach their usual courses in an FL, it may indicate greater enthusiasm
and willingness to work harder, for example, preparing new materials and/
or greater competence. Students and families who choose CLIL often do
so because they value FL study more than those who do not, or are chosen
by the school as having superior FL abilities and/or greater potential to
thrive in CLIL. These factors can mean that the CLIL students are more
motivated to succeed, have higher starting English or other FL proficiency,
in turn often related to economic and social class differences, and may also
enjoy more out-of-class contact with English during the school year, for
example, through additional private tutoring. CLIL courses themselves
often provide as many as 50% more hours of instruction than the tradi-
tional foreign language courses with which they are compared (another
confounding variable sufficient to predict better outcomes), but even
then, do not always produce superior FL learning (Pladevall-Ballester and
Vallbona 2016). Muñoz (2015) provides an insightful review of studies
showing the importance of time (total hours) and timing (starting age) in
CLIL programs. There is sometimes a lack of established reliability of FL
or subject-matter measures, and in cases of pre-test–post-test designs,
unverified equivalence of their pre- and post-test forms.
In defence of the early studies, such threats to internal validity are by no
means unique to evaluations of CLIL. The use of non-equivalent control
group designs, with learning effects potentially confounded with pre-exist-
ing differences between students and conditions, differential time on task,
12 Michael H. Long ET AL.
Dallinger et al.’s study is one of the first evaluations to examine the effects of
selection and other potential confounds, and one of the very few with suf-
ficient, and sufficiently reliable, measures to do so. (However, for a recent
two-year study of CLIL in Germany with similar findings, see Rumlich
2016.) The broad scope and methodological rigour of the work, along with
the lack of studies of comparable size and quality, make its findings unusu-
ally important. The results demonstrate that selection really can be a major
factor in determining outcomes in CLIL studies and needs to be controlled
for in future work. Since doing so is difficult in natural classroom settings,
alternative approaches, including a true randomised design, are worthy of
consideration. This will usually mean a laboratory study, with attendant
limitations on external validity, that is, the generalisability of findings to
real classroom settings. However, as proposed elsewhere (Long, 2015b),
pairing laboratory experiments and classroom studies, ideally performed
in that order, examining the same variables and using the same measures,
can provide a defensible basis for pedagogic recommendations if results
from each setting are comparable.
Two additional limitations of most research to date should be noted.
First, while valuable descriptive studies of lessons and lectures have been
reported (e.g. Smit 2010), documenting such matters as the greater fre-
quency of language-related episodes in CLIL than in EFL classrooms in
Spain (Basterrechea and García Mayo 2013), patterns of corrective feed-
back in CLIL and immersion classrooms (Llinares and Lyster 2014), and
the use of focus on form in lectures in English at an Italian university (Costa
2012), process variables have yet to be related to learning outcomes. Yet
L2 classroom researchers have long established the importance of detailed
descriptions of classroom processes and language use before moving on to
evaluation studies (for a detailed discussion, see Long, 2015a, pp. 347–364;
Shintani 2011). In comparative studies of FL and CLIL programs, impor-
tant dimensions of classroom discourse would include the proportions
of lessons delivered through the students’ L2 and/or L1, and the extent
to which each focused on subject matter as opposed to code features. An
absence of data on classroom processes renders explanations of findings
on learning outcomes speculative. Second, virtually all studies to date have
been conducted on the use of CLIL with school-age children. With the
model now spreading fast in both public and private tertiary institutions
in many countries, process-product research on CLIL with college-age
students is sorely needed, as is research that considers both language and
subject-matter learning.
It was with these considerations in mind that a small-scale laboratory
study was undertaken to ascertain the feasibility of evaluating CLIL while
dealing with confounds that, with the notable exception of Dallinger and
co-workers (2016), have afflicted much of the early work. The study was
exploratory, designed to identify potential pitfalls and methodological con-
siderations when designing a larger-scale CLIL evaluation. The experimen-
tal lessons differed in several important ways from those in authentic CLIL
programs, so were not intended to produce findings generalisable to such
programs.
• RQ1: How does teacher speech differ when the medium of instruc-
tion is either the L1 or L2 of teachers and students?
Method
Design
The study employed a post-test only, criterion group design. Three
language conditions were examined: English baseline, CLIL and Arabic
baseline. Arabic was chosen as the L1 of participants in the CLIL and Arabic
baseline groups due to the surging interest in CLIL at the tertiary level in parts
of the Arab world (among other places). Each condition comprised three
groups, with one participant in each group functioning as an instructor
and four as students (see Figure 2). The medium of instruction and the L1
of all teachers and students in the English baseline condition was English.
The medium of instruction and L1 of all teachers and students in the
Arabic baseline condition was Arabic. In the CLIL condition, the medium
of instruction was English, while the L1 of all teachers and students was
Arabic.
Language conditions
Participants
Following IRB approval, forty-five participants were recruited via adver-
tisements and word of mouth. Participants in the English baseline condi-
tion were three graduate students and twelve undergraduates at a public
university in the USA, all native speakers (NSs) of English. The twelve
undergraduates were randomly assigned to form three groups of four, and
the three graduate students randomly assigned to serve as instructors,
one for each of the three groups. The fact that the graduate students were
not specialists in the subject matter, and that the material created for the
study was fictitious (see below), pre-empted potential confounds caused
by differences in teaching experience that might exist, and whose effects
would need to be examined, among real teachers. For the Arabic baseline
condition, fifteen native speakers of Gulf Arabic studying at a university
in a Middle-Eastern country and with minimal knowledge of English were
randomly assigned to form three groups of four undergraduates, and three
graduate students were randomly assigned to serve as their instructors,
one for each group. For the CLIL condition, potential participants, NSs of
Gulf Arabic studying at a public university in the USA, first completed an
English proficiency test and submitted their most recent TOEFL or other
standardised test scores, with the dates on which the tests had been taken.
The proficiency information was used to screen a suitable subset into the
study, and then for their stratified random assignment to form three groups
of four of comparable average proficiency in English. The proficiency levels
used to determine eligibility to participate in the CLIL condition were the
iBT and PBT scores equivalent to CEFR B1 for teachers and CEFR A1 for
students. Participants functioning as teachers and students in the
experiment were paid $40 and $20, respectively.
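The stratified random assignment described above can be illustrated with a minimal sketch. The exact procedure used in the study is not reported, so the approach here is an assumption: candidates are ranked by proficiency, cut into strata as wide as the number of groups, and the members of each stratum are shuffled across groups. The function name, participant labels and scores are all hypothetical.

```python
import random


def stratified_groups(scores, n_groups=3, seed=None):
    """Split participants into n_groups of comparable average proficiency.

    scores: dict mapping a participant id to a proficiency score.
    Assumption: strata are formed by ranking and slicing; the study's
    own stratification procedure is not described in the text.
    """
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get, reverse=True)
    groups = [[] for _ in range(n_groups)]
    for start in range(0, len(ranked), n_groups):
        stratum = ranked[start:start + n_groups]  # similar-proficiency block
        rng.shuffle(stratum)                      # random order within stratum
        for group, participant in zip(groups, stratum):
            group.append(participant)
    return groups


# Hypothetical scores for twelve student participants.
example_scores = {f"S{i:02d}": 30 + i for i in range(12)}
for group in stratified_groups(example_scores, seed=1):
    average = sum(example_scores[p] for p in group) / len(group)
    print(group, average)
```

Because every group receives exactly one member from each stratum, group means can differ by no more than the score range within a stratum, whichever random ordering is drawn.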
Materials
To eliminate the possibility that participants might possess prior knowl-
edge of the subject matter, a story sufficient for a fifteen-minute lesson
was written especially for the study (for a brief excerpt, see Appendix 1),
together with forty multiple-choice test items. The story intentionally
contained plausible, but purely fictitious, information about an amateur
anthropologist’s alleged discovery of a hitherto unknown indigenous tribe,
the Kiriboe, in the Amazonian jungle. It covered such matters as the fictional
tribe’s language, matriarchal social organisation (a concept that turned out
to pose significant comprehension problems in the Arabic baseline condi-
tion), living arrangements, childcare, hunting methods, rituals and more,
as reported by Smith, the explorer, on his return to London. His account
Instrumentation
Multiple-choice subject-matter test
A subject-matter assessment measure was developed consisting of forty
multiple-choice content questions. It was based on the key information
points in the lesson outline, each answer containing one correct choice
and three distractors. The test included a variety of items involving factual
recall, information synthesis and inference from information provided. An
Arabic translation of the measure was provided for the Arabic baseline and
CLIL conditions.
Procedures
The three groups in each condition (a teacher and four students per group)
completed the experiment in a small classroom. Teachers arrived first and
were asked to read and sign the consent form in their native language. They
were then given sixty minutes to prepare their lesson. Teachers in the two
baseline conditions reviewed the accompanying picture prompts (printed
PowerPoint slides) and read the Kiriboe text in their native language,
English or Arabic, as many times as they liked. The CLIL teachers reviewed
the prompts and read the script in English, accompanied by a glossary of
Arabic translations of the low-frequency words. The teachers were allowed
to take notes, but were warned that they would only be permitted to use
the pictures, not the script or their notes, when teaching the lesson. This
was to preclude their reading excerpts of the script aloud, which would
have pre-empted the kind of spontaneous language and decision-making
characteristic of real lessons.
When the participants acting as students arrived an hour later, they read
and signed the consent form in their native language. Then, with one of the
researchers seated unobtrusively in the classroom, and with the English
baseline and CLIL sessions audio- and video-recorded, each teacher was
asked to deliver the lesson in fifteen minutes and to encourage student
participation, for example, in the form of questions and comments. As
instructed by the researchers, lessons were delivered exclusively in English
in the English baseline and CLIL conditions, and exclusively in Arabic in
the Arabic baseline condition. Some fairly minimal teacher–student inter-
action, including teacher responses to occasional student clarification
requests, was observed in all groups. Immediately after the lessons, stu-
dents completed the multiple-choice test on lesson content, in English for
the English baseline group, in Arabic for both the Arabic baseline and CLIL
groups. They then completed the cloze test. The tests were administered
in that order to all groups. The multiple-choice test required about fifteen
minutes, the cloze test approximately twenty minutes. The English baseline
and CLIL lessons were subsequently transcribed, and the transcripts veri-
fied by two native speakers.
Hypotheses
Summarised here for reasons of space, our hypotheses were motivated by:
types and tokens, in the English baseline than the CLIL condition, and
superior subject-matter learning and vocabulary scores in the English and
Arabic baseline conditions than in the CLIL condition.
Results
Teacher             T1      T2      T3
English baseline    1765    1141    2156
CLIL                1105     570    1767
                      English baseline         CLIL
Teacher               T1      T2      T3       T1      T2      T3
Clausal utterances    70      59      143      78      37      107
Total                 272                      222
Mean                  90.67                    74.00
SD                    45.65                    35.17
SEM                   26.36                    20.31
n                     3                        3
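As a check on the descriptive statistics for clausal utterances, the following minimal sketch recomputes the total, mean, sample standard deviation and standard error of the mean from the three per-teacher counts in each condition. It assumes that the first three columns belong to the English baseline teachers and the last three to the CLIL teachers, which is how the two-column totals read; the function name is illustrative only.

```python
import math
from statistics import mean, stdev


def describe(counts):
    """Return total, mean, sample SD and SEM for a list of counts."""
    n = len(counts)
    sd = stdev(counts)  # sample SD (n - 1 in the denominator)
    return {
        "n": n,
        "total": sum(counts),
        "mean": mean(counts),
        "sd": sd,
        "sem": sd / math.sqrt(n),  # standard error of the mean
    }


# Clausal utterances per teacher (column assignment is an assumption).
english_baseline = describe([70, 59, 143])
clil = describe([78, 37, 107])

for label, stats in (("English baseline", english_baseline), ("CLIL", clil)):
    print(label, {key: round(value, 2) for key, value in stats.items()})
```

With only three teachers per condition, the SEM is large relative to the difference between the condition means, which is consistent with the absence of statistically significant differences on this measure.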
Teacher             T1      T2      T3
English baseline    2.27    3.58    1.73
CLIL                1.71    2.08    1.89
Table 3a: Numbers and descriptive statistics for vocabulary in teacher speech.
                   English baseline         CLIL
Teacher            T1      T2      T3       T1      T2      T3
Type               9       16      13       4       7       5
Token              13      27      29       8       8       7
TTR                0.69    0.59    0.45     0.50    0.88    0.71
Guiraud’s Index    2.50    3.08    2.41     1.41    2.47    1.89
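The two lexical diversity measures in Table 3a can be recomputed from the type and token counts. The sketch below uses the standard definitions, type-token ratio as types divided by tokens and Guiraud's Index as types divided by the square root of tokens, which reproduce the tabled values. The assignment of the first three columns to the English baseline teachers is an assumption, and the function names are illustrative.

```python
import math


def type_token_ratio(types, tokens):
    """TTR: number of distinct words divided by number of running words."""
    return types / tokens


def guiraud_index(types, tokens):
    """Guiraud's Index: types divided by the square root of tokens,
    which dampens the sensitivity of TTR to text length."""
    return types / math.sqrt(tokens)


# (types, tokens) per teacher, taken from Table 3a.
english_baseline = [(9, 13), (16, 27), (13, 29)]
clil = [(4, 8), (7, 8), (5, 7)]

for label, teachers in (("English baseline", english_baseline), ("CLIL", clil)):
    for number, (types, tokens) in enumerate(teachers, start=1):
        print(label, f"T{number}",
              round(type_token_ratio(types, tokens), 2),
              round(guiraud_index(types, tokens), 2))
```

Note that TTR favours the CLIL teachers T2 and T3 simply because they produced very few tokens, whereas Guiraud's Index, which corrects for length, is higher for every English baseline teacher than for the corresponding CLIL teacher except T2 versus CLIL T2, where the gap narrows.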
There was very little difference between the English baseline and CLIL
lessons either in the total numbers of mentions of the forty target items
in the cloze test or in the numbers of the forty items not mentioned by
any of the three teachers in each condition (see Appendix 3). Thirty out
of forty target items were mentioned a total of eighty times by the three
teachers in the English baseline condition. In the CLIL groups, twenty-six
out of the forty target items were mentioned by the three teachers a total
of seventy-six times. The difference was statistically non-significant (z =
0.43, p = 0.67).
A simple linear regression analysis was conducted to determine whether
the number of correct responses to the cloze test items could be predicted
from the number of occurrences of the words needed for correctly answer-
ing those items in teacher speech. The null hypothesis tested was that the
regression coefficient (i.e. the slope) was equal to 0. Prior to analysis, the
data for both the English baseline and CLIL groups were screened for
missing entries and violation of assumptions. There were no missing data.
For the English baseline groups, the results of the simple linear regression
suggested that a significant proportion of the total variance in the number
of correct answers to cloze test items was predicted by the number of
occurrences in teacher speech of the words needed to answer the items
correctly. In other words, the number of times a word appeared in the
lessons was a good predictor of correct responses to the cloze test item
targeting that word (F (1, 38) = 18.5, p < 0.001). Additionally, it was found
that the unstandardised slope (0.91) and standardised slope (0.57) were
statistically significantly different from 0 (t = 4.3, df = 38, p < 0.001). Finally,
multiple R-squared indicated that approximately 32.7% of the variance in
cloze test scores was predicted by the number of occurrences of the rel-
evant words in the lessons. According to Cohen (1988), this constitutes a
medium effect.
For the CLIL groups, the results were similar. The simple linear regres-
sion suggested that a significant proportion of the total variance in the
number of correct answers to cloze test items was predicted by the number
of occurrences in teacher speech of the words required to answer the items
correctly. In other words, the number of times a word appeared in the
lessons was a good predictor of correct responses to the cloze test item
targeting that word (F (1, 38) = 10.29, p < 0.01). Additionally, it was found
that the unstandardised slope (0.23) and standardised slope (0.46) were
statistically significantly different from 0 (t = 3.2, df = 38, p < 0.01). Finally,
multiple R-squared indicated that approximately 21.3% of the variance in
cloze test scores was predicted by the number of occurrences of the
relevant words in the lessons. According to Cohen (1988), this constitutes
a small effect.
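The regression statistics reported for the two conditions are mutually constrained, and a short sketch makes the relationships explicit. In a simple linear regression with one predictor, the F statistic equals the square of the slope's t statistic, and it can also be recovered from R-squared and the residual degrees of freedom. The function names below are illustrative, and small discrepancies from the reported values reflect rounding in the published figures.

```python
def f_from_t(t_value):
    """With a single predictor, F(1, df) is the square of the slope's t."""
    return t_value ** 2


def f_from_r_squared(r_squared, df_residual, df_model=1):
    """F = (R^2 / df_model) / ((1 - R^2) / df_residual)."""
    return (r_squared / df_model) / ((1 - r_squared) / df_residual)


# Values as reported in the text: (t, R-squared, residual df, reported F).
reported = {
    "English baseline": (4.3, 0.327, 38, 18.5),
    "CLIL": (3.2, 0.213, 38, 10.29),
}

for condition, (t_value, r_squared, df_residual, f_reported) in reported.items():
    print(condition,
          "F from t:", round(f_from_t(t_value), 2),
          "F from R-squared:", round(f_from_r_squared(r_squared, df_residual), 2),
          "reported F:", f_reported)
```

Both routes land close to the reported F values, and the squared standardised slopes (0.57 and 0.46) likewise approximate the reported proportions of variance explained.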
Subject-matter learning
Three items on the MC test of subject-matter learning were deleted from
both versions of the subject-matter test because they were discovered to
have two possibly correct answers. The two versions of the test were then
subjected to a classical test theory item and reliability analysis. The first round
of item analysis on the version for the English baseline and CLIL groups (n
= 24) revealed four items with a zero or negative item discrimination index
(DI). Those items were deleted, the reliability estimate (Cronbach’s alpha,
α) for this version increasing from 0.69 to 0.74. The same analysis for the
Arabic version (n = 12) identified three items with a zero or negative DI.
Those items were deleted, α increasing from 0.82 to 0.86. The second round
of item analysis on both versions of the test found no additional items with
a zero or negative DI. Total scores for the three experimental groups were
calculated for the purpose of group comparisons. As can be seen in Table
4 and Figure 3, the English and Arabic baseline groups outperformed the
CLIL groups, and statistically significantly so in the case of the English
baseline groups.
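The item and reliability analysis can be sketched as follows. Cronbach's alpha is computed with its standard formula. The text does not say how the item discrimination index was calculated, so the upper-lower group method used here (proportion correct among the highest scorers minus proportion correct among the lowest scorers) is an assumption, as are the function names and the small response matrix, which is invented purely for illustration.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """responses: one row per test taker, one 0/1 score per item."""
    k = len(responses[0])
    item_variances = [pvariance([row[i] for row in responses]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def discrimination_indices(responses, fraction=0.5):
    """Upper-lower DI per item (assumed method, not confirmed by the text)."""
    ranked = sorted(responses, key=sum, reverse=True)
    size = max(1, int(len(ranked) * fraction))
    upper, lower = ranked[:size], ranked[-size:]
    k = len(responses[0])
    return [
        sum(row[i] for row in upper) / size - sum(row[i] for row in lower) / size
        for i in range(k)
    ]


# Hypothetical responses: four test takers, three items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(cronbach_alpha(responses))          # 0.75
print(discrimination_indices(responses))  # [0.5, 1.0, 0.5]
```

Items with a zero or negative index fail to separate stronger from weaker test takers, so removing them would be expected to raise alpha, as happened with both versions of the test.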
Assumptions for the one-way ANOVA analysis were met, except that
variances among the groups were unequal, as revealed by the results of
Levene’s test: F (2, 33) = 3.75, p = 0.03. The Welch test was used, therefore,
the results showing that the omnibus test was significant, F (2, 19.25), p
= 0.04. Dunnett’s T3 post-hoc test indicated that the difference between
the English baseline and CLIL groups was marginally significant, with a
p-value of 0.05. The remaining group comparisons yielded no significant
differences between groups.
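Welch's test for unequal variances can be computed from group sizes, means and standard deviations alone. The sketch below applies the standard Welch formulas to the rounded summary statistics in Tables 4 and 5. It is a recomputation from published summary values under that assumption, not the authors' analysis; the function name is illustrative, and the output will differ slightly from statistics computed on the raw scores.

```python
def welch_anova(groups):
    """Welch's one-way ANOVA from summary statistics.

    groups: list of (n, mean, sd) tuples, one per group.
    Returns (F, numerator df, denominator df).
    """
    k = len(groups)
    weights = [n / sd ** 2 for n, _, sd in groups]
    weight_total = sum(weights)
    # Variance-weighted grand mean.
    grand_mean = sum(w * m for w, (_, m, _) in zip(weights, groups)) / weight_total
    between = sum(
        w * (m - grand_mean) ** 2 for w, (_, m, _) in zip(weights, groups)
    ) / (k - 1)
    lam = sum(
        (1 - w / weight_total) ** 2 / (n - 1)
        for w, (n, _, _) in zip(weights, groups)
    )
    f_stat = between / (1 + 2 * (k - 2) / (k ** 2 - 1) * lam)
    df_denominator = (k ** 2 - 1) / (3 * lam)
    return f_stat, k - 1, df_denominator


# (n, mean, SD) for English baseline, CLIL and Arabic baseline groups.
table_4 = [(12, 16.75, 4.29), (12, 13.00, 2.22), (12, 14.83, 5.13)]
table_5 = [(12, 13.58, 4.62), (12, 1.50, 1.68), (12, 16.42, 7.01)]

for label, table in (("MC test", table_4), ("Cloze test", table_5)):
    f_stat, df1, df2 = welch_anova(table)
    print(label, round(f_stat, 2), df1, round(df2, 2))
```

The denominator degrees of freedom obtained this way, roughly 19.26 and 17.04, agree closely with the values of 19.25 and 17.04 reported for the two omnibus tests.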
Table 4: Means and standard deviations for the MC test of subject-matter learning.
Group               n     Mean     SD
English baseline    12    16.75    4.29
CLIL                12    13.00    2.22
Arabic baseline     12    14.83    5.13
Figure 3: Group differences for the MC test scores for subject-matter learning.
Vocabulary knowledge
The first round of item analysis for the version for the English baseline and
CLIL groups (n = 24) revealed seven items with a zero or negative DI. After
deleting those items, α for this version rose from 0.91 to 0.92. The same
analysis for the Arabic version (n = 12) identified ten items with a zero or
negative DI. Those items were removed, α increasing from 0.82 to 0.90. The
second round of item analysis on both versions of the test found no addi-
tional items with a zero or negative DI. Total scores for the three experi-
mental groups were then calculated for the purpose of group comparisons.
Assumptions for the one-way ANOVA analysis were met, except that
the variances among the groups were unequal, as revealed by the results of
Levene’s test: F (2, 33) = 8.66, p < 0.01. Therefore, the Welch test was used,
the results showing that the omnibus test was significant, F (2, 17.04), p <
0.01. As shown in Table 5 and Figure 4, both baseline groups outperformed
the CLIL group. Dunnett’s T3 post-hoc test indicated that the differences
between the English baseline and CLIL, and Arabic baseline and CLIL,
Table 5: Means and standard deviations for the cloze test of vocabulary learning.
Group               n     Mean     SD
English baseline    12    13.58    4.62
CLIL                12     1.50    1.68
Arabic baseline     12    16.42    7.01
Figure 4: Group differences for the cloze test scores for vocabulary learning.
Discussion
Results for the process variables were mixed. The sheer volume of input
in English baseline lessons was greater than in the CLIL lessons, but not
statistically significantly so, as measured both by the numbers of words and
clausal utterances in teacher speech. As predicted, the syntactic complexity
of teacher speech in the baseline condition was also greater, as measured
by s-nodes per clausal utterance, but again, not statistically significantly
so. These differences could be expected to become statistically significant
with larger samples. As it was, the small sample size limited the statistical power
to detect differences, a problem compounded by the considerable variability
among teachers in the same condition, for example, in the case of clausal
utterances.
was chiefly due to a lack of time in which to pilot either measure before its
use in the study for fear of reducing our pool of potential Arabic-speaking
participants for the CLIL condition, some of whom were soon to return to
their country of origin. Nevertheless, the English baseline advantage for
content learning, and both the English baseline and Arabic baseline advan-
tage for vocabulary scores, were statistically significant, in line with find-
ings of more recent studies, such as Dallinger and colleagues (2016), and
like them, suggest a serious need for further research before CLIL is hailed
as a success in either domain.
There were no pre-tests in the design for this study, so language learning
in the CLIL groups could only be assessed indirectly, and since the les-
son’s short duration precluded measurable grammatical or other linguistic
development, only in terms of post-test vocabulary cloze test scores. (The
need for a pre-test of content knowledge was obviated by the use of fictional
subject matter.) Students in the native speaker baseline groups would have
known all, or almost all, the target lexical items before the study began,
making the relevant data here the CLIL groups’ cloze test scores. The CLIL
groups’ mean was a mere 1.5 out of a possible 23 (after seven poor items
among the original thirty had been removed) – a meagre return even for
such a short lesson. A full-scale study will require pre-testing of whichever
linguistic abilities are to be targeted.
Although language learning was the original driver for CLIL in Europe,
results for subject-matter learning are potentially even more important
now that CLIL programs are becoming so common across the curricu-
lum in some countries, potentially placing curricular content at risk. In
research of larger scope and longer duration than the micro-study reported
here, the syntactic complexity and low-frequency lexical type diversity of
teacher speech and other classroom process variables offer potential means
of explaining differential subject-matter results (or the same results despite
50% more time being allocated) in CLIL compared with regular versions of
courses delivered in students’ native language(s).
Some results failed to achieve statistical significance, but those for
several instructional process variables and the two outcome measures gen-
erally conformed to predictions. The non-statistically significant findings
probably reflected one or more of at least four factors, each easy to modify
in future larger scale research:
3 the relatively low number of items in the subject matter and vocab-
ulary tests, and especially,
4 the study’s brief duration.
Jiyong Lee is a PhD student in the Second Language Acquisition program at the
University of Maryland. Her research interests include negative feedback, age effects and
maturational constraints in SLA, Korean phonology and morphosyntax, the validation
of task complexity manipulations, and relationships among task complexity, language
aptitude, and L2 performance. jlee0123@umd.edu
Population: A total of about 1,000 individuals, divided into about twenty family
groups, each spanning several generations.
Food: Monkeys and small mammals, small reptiles, coconuts, honey, bananas,
mangos, oranges and other fruit, wild plants and herbs.
Tools and weapons: Knives fashioned from stones and animal bones, spears made
from sharpened tree branches, blow-pipes made from hollowed bamboo, used to fire
darts tipped with a lethal plant-based poison.
Language: Kiriboese. Kiriboese can be used to describe past and present events and
situations, and has a rich array of colour terms and numerous terms for different kinds
of family members, but it has no words for counting and no means for referring to the
future. The Kiriboe believe talking about the future brings bad luck.
Appendix 2
Lexeme Frequency
ancestral 2,362
paddle 2,305
edible 2,066
dart 1,877
poisonous 1,868
reptile 1,849
trudge 1,849
forage 1,700
repel 1,604
slay 1,508
taboo 1,488
unheard 1,481
zebra 1,405
tributary 1,404
fight off 1,127
fermented 1,076
linguist 991
machete 871
encroach 831
songbird 748
great-grandmother 723
stilt 708
childcare 617
shrunken 443
stereotypically 185
matriarchal 139
blow dart 11
dart blower 0
Note
1 COCA is a free online corpus that contains more than 520 million words, compiled
from 1990 to 2015. It consists of spoken texts, fiction, popular magazines, newspa-
pers and academic texts.
References
Aguilar, M. and Muñoz, C. (2014) The effect of proficiency on CLIL benefits in
engineering students in Spain. International Journal of Applied Linguistics 24(1): 1–18.
https://doi.org/10.1111/ijal.12006
Basterrechea, M. and García Mayo, P. (2013) Language-related episodes during
collaborative tasks: a comparison of CLIL and EFL learners. In K. McDonough and A.
Mackey (eds) Second Language Interaction in Diverse Educational Contexts 25–43.
Philadelphia: John Benjamins. https://doi.org/10.1075/lllt.34.05ch2
Bruton, A. (2011a) Are the differences between CLIL and non-CLIL groups in
Andalusia due to CLIL? A reply to Lorenzo, Casal and Moore (2010). Applied Linguistics
32(2): 236–41. http://applij.oxfordjournals.org/content/32/2/236.abstract;
https://doi.org/10.1093/applin/amr007
Bruton, A. (2011b) Is CLIL so beneficial, or just selective? Re-evaluating some of the
research. System 39(4): 523–32. https://doi.org/10.1016/j.system.2011.08.002
Bruton, A. (2013) CLIL: Some of the reasons why … and why not. System 41(3): 587–97.
https://doi.org/10.1016/j.system.2013.07.001
Bruton, A. (2015) CLIL: detail matters in the whole picture. More than a reply to J.
Hüttner and U. Smit (2014). System 53: 119–28.
https://doi.org/10.1016/j.system.2015.07.005
Cenoz, J., Genesee, F. and Gorter, D. (2014) Critical analysis of CLIL: taking stock and
looking forward. Applied Linguistics 35(3): 243–62.
https://doi.org/10.1093/applin/amt011
Chaudron, C. (1982) Vocabulary elaboration in teachers’ speech to L2 learners. Studies
in Second Language Acquisition 4(2): 170–80.
https://doi.org/10.1017/S027226310000440X
Cohen, J. (1988) Statistical Power Analysis for the Behavioral Sciences (2nd edn).
Mahwah, NJ: Lawrence Erlbaum.
Costa, F. (2012) Focus on form in ICLHE lectures in Italy. Evidence from English-
medium science lectures by native speakers of Italian. AILA Review 25(1): 30–47.
http://www.jbe-platform.com/content/journals/10.1075/aila.25.03cos
Coyle, D. (2008) CLIL – a pedagogical approach from the European perspective. In N.
van Dusen-Scholl and N. Hornberger (eds) Encyclopedia of Language and Education
Vol. 4 97–110. Berlin: Springer. https://doi.org/10.1007/978-0-387-30424-3_92
Coyle, D., Hood, P. and Marsh, D. (2010) Content and Language Integrated Learning.
Cambridge: Cambridge University Press.
Cummins, J. (1998) Immersion education for the millennium: what have we learned
from 30 years of research on second language immersion? In M. R. Childs and R. M.
Bostwick (eds) Learning Through Two Languages: Research and Practice 34–47.
Shizuoka: Katoh Gakuen.
Cummins, J. (2009) Bilingual and immersion programs. In M. H. Long and C. J. Doughty
(eds) The Handbook of Second Language Teaching 161–81. Oxford: Wiley-Blackwell.
https://doi.org/10.1002/9781444315783.ch10
Dallinger, S., Jonkmann, K., Hollm, J. and Fiege, C. (2016) The effect of content and
language integrated learning on students’ English and history competences – killing
two birds with one stone? Learning and Instruction 41: 23–31.
https://doi.org/10.1016/j.learninstruc.2015.09.003
Dalton-Puffer, C. (2008) Outcomes and processes in CLIL: current research from
Europe. In W. Delanoy and L. Volkmann (eds) Future Perspectives for English
Language Teaching 139–57. Heidelberg: Carl Winter.
Dalton-Puffer, C. (2011) Content-and-language integrated learning: from practice to
principles? Annual Review of Applied Linguistics 31: 182–204.
https://doi.org/10.1017/S0267190511000092
Dalton-Puffer, C., Llinares, A., Lorenzo, F. and Nikula, T. (2013) ‘You can stand under
my umbrella’: immersion, CLIL and bilingual education. A response to Cenoz,
Genesee and Gorter. Applied Linguistics 35(2): 213–18.
http://applij.oxfordjournals.org/content/35/2/213.abstract;
https://doi.org/10.1093/applin/amu010
Ellis, N. C. and Wulff, S. (2015) Usage-based approaches to SLA. In B. VanPatten and J.
Williams (eds) Theories in Second Language Acquisition. An Introduction (2nd edn)
75–93. New York: Routledge.
Genesee, F. (1995) The Canadian second language immersion program. In O. Garcia
and C. Baker (eds) Policy and Practice in Bilingual Education 118–33. Clevedon:
Multilingual Matters.
Hoyer, W. J. and Lincourt, A. E. (1998) Ageing and the development of learning. In M.
A. Stadler and P. A. Frensch (eds) Handbook of Implicit Learning 445–70. Thousand
Oaks: Sage.
Hüttner, J. and Smit, U. (2014) CLIL (content and language integrated learning): the
bigger picture. A reply to A. Bruton (2013) System 41: 587–97.
https://doi.org/10.1016/j.system.2014.03.001
Hüttner, J., Dalton-Puffer, C. and Smit, U. (2013) The power of beliefs: lay theories and
their influence on the implementation of CLIL programmes. International Journal
of Bilingual Education and Bilingualism 16(3): 267–84.
https://doi.org/10.1080/13670050.2013.777385
Janacsek, K., Fiser, J. and Nemeth, D. (2012) The best time to acquire new skills:
age-related differences in implicit sequence learning across the human lifespan.
Developmental Science 15(4): 496–505. https://doi.org/10.1111/j.1467-7687.2012.01150.x
Jäppinen, A.-K. (2005) Thinking and content learning of mathematics and science as
cognitional development in content and language integrated learning (CLIL):
teaching through a foreign language in Finland. Language and Education 19(2): 148–69.
https://doi.org/10.1080/09500780508668671
Järvinen, H. M. (2007) Language in language and content integrated learning (CLIL).
In D. Marsh and D. Wolff (eds) Diverse Contexts – Converging Goals: CLIL in Europe
253–60. Bern: Peter Lang.
Johnson, R. K. and Swain, M. (1997) Immersion Education: International Perspectives.
Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9781139524667
Lambert, W. E. and Tucker, G. R. (1972) Bilingual Education of Children: The St.
Lambert Experiment. Rowley, MA: Newbury House.
Lasagabaster, D. and Ruiz de Zarobe, Y. (2010) CLIL in Spain: Implementation, Results,
and Teacher Training. Newcastle: Cambridge Scholars.
Lasagabaster, D. and Sierra, J. M. (2010) Immersion and CLIL in English: more
differences than similarities. ELT Journal 64(4): 367–75.
http://eltj.oxfordjournals.org/content/64/4/367.abstract; https://doi.org/10.1093/elt/ccp082