Instructed Second Language Acquisition
ISLA (print) ISSN 2398–4155
ISLA (online) ISSN 2398–4163
Article
A micro process-product study of a CLIL lesson: linguistic modifications,
content dilution and vocabulary knowledge
Michael H. Long, Assma Al Thowaini, Buthainah Al Thowaini, Jiyong Lee
and Payman Vafaee
Abstract
We begin by comparing two models for the simultaneous teaching of language
and content: immersion, and content and language integrated learning (CLIL).
Following a brief summary and critique of research on CLIL, we describe a
micro process-product laboratory experiment with young adult native speakers
of Arabic for whom English was the L2. The same fifteen-minute lesson about
an amateur anthropologist’s alleged discovery of a hitherto unknown indigenous
tribe in the Amazonian jungle was delivered by nine surrogate teachers to
nine groups of four surrogate students in three baseline English native speaker
groups, three baseline Arabic native speaker groups and three CLIL groups.
Findings on language use in the nine lessons are related to content learning and
vocabulary knowledge. The short-term, artificial nature of the study precludes
generalisations to real CLIL programs, which was not our intention. Rather,
we wish to suggest that process-product laboratory studies of larger scale and
longer duration, paired with classroom studies employing a similar design and
research methodology, offer a useful approach to identifying strengths and
weaknesses of CLIL programs largely ignored to date.
keywords: immersion; CLIL; incidental learning; age effects; teacher
speech; linguistic modifications; content dilution
Affiliations
Michael H. Long: University of Maryland, USA.
email: mlong5@umd.edu
Assma Al Thowaini: University of Maryland, USA; King Saud University, Riyadh, Saudi Arabia.
email: aalthowaini@ksu.edu.sa
Buthainah Al Thowaini: University of Maryland, USA; King Saud University, Saudi Arabia.
email: balthowaini@ksu.edu.sa
Jiyong Lee: University of Maryland, USA.
email: jlee0123@umd.edu
Payman Vafaee: Columbia University, USA.
email: pv2203@tc.Columbia.edu
ISLA vol 2.1 2018 3–38 doi: https://doi.org/10.1558/isla.33605
©2018, Equinox Publishing
Models for the integration of language learning and content learning
At least eight models have been employed for the integration of language
learning and content learning in North America over the past five decades:
1. Canadian-style immersion
2. Transitional (early exit) and maintenance (late exit) bilingual education
3. So-called (and misleadingly called) Structured English Immersion
4. Submersion
5. Content-based language teaching
6. Dual, or two-way, immersion
7. Sheltered subject-matter teaching
8. Foreign language immersion.
Widely recognised as one of the most successful, Canadian immersion was
initially developed in Montreal in the 1970s for the education of mostly
middle-class, school-age, language-majority English-speaking children
through French (Cummins 1998, 2009; Genesee 1995; Lambert and Tucker
1972; Lyster 2007). The model has since been extended to other languages
and to socially and ethnically diverse student populations, and research
has shown that it results in additive bilingualism, with high levels of
L2 proficiency, especially in receptive skills, and achievement in content
subjects broadly comparable to that of monolingual age peers (Cummins 1998;
Swain 1991).
The success of French immersion programs in Canada was the early
inspiration for a ninth model now increasingly popular in primary and sec-
ondary schools in Europe and spreading rapidly from pre-kindergarten to
university there, in East Asia, the Middle East and elsewhere: content and
language integrated learning (CLIL). It is instructive to recall the typical
characteristics of Canadian French immersion programs before consider-
ing the likely and observed outcomes of the more recent model.
French immersion in Canada

Immersion in Canada is not a single monolithic approach. Forty-three vari-
ants (early/middle/late, complete/partial, one/two immersion languages,
etc.) were identified by Rebuffot (1998). However, there are some common
features. Placement in French immersion programs is usually voluntary,
and usually much sought-after. Parental support has always been strong, in
part because bilingualism in English and French is a requirement for many
government jobs, and partly because of the positive results of the early
programs. As noted, research has shown that the best French immersion
students graduate from high school aged 18 with their native language,
English, intact, and receptive French skills statistically comparable to those
of monolingual French age peers, although their French spoken and written
production is often still marked by a range of grammatical errors (Swain
1991). The advanced French skills, it should be remembered, however, are
achieved after as many as ten years, during which at least 50% of classroom
exposure and use is in the L2.
The regular Canadian school curriculum is delivered by trained subject-
matter teachers who are native speakers or highly proficient non-native
speakers of French, to groups of students with broadly homogeneous (on
entry, little or no) command of the L2, meaning that teacher speech can
be modified and made comprehensible for most students simultaneously,
thereby providing them with usable input for language learning. Immersion
students’ achievements in subjects taught through French are as good as
those of anglophone children taught exclusively in English throughout their
school years. Since English is most immersion students’ home language, as
well as the language of wider communication outside school in most parts
of the country, the native language develops normally and is fully retained,
making French immersion in Canada a case of additive bilingualism with
no adverse effects on content mastery – two for the price of one. Not sur-
prisingly, immersion programs inspired by the Canadian example are now
to be found in many countries around the world (see, e.g., Johnson and
Swain 1997).
CLIL in Europe and beyond

As with French immersion in Canada and FL immersion programs in the
USA, the original goal of the first CLIL programs in Europe, and now in
Asia and the Middle East, was not subject-matter learning, but the addition
of a second (or third, fourth, etc.) language. (For historical background
and overviews of CLIL, see Cenoz, Genesee and Gorter 2014; Coyle 2008;
Coyle, Hood and Marsh 2010; Dalton-Puffer 2011; Muñoz and Navés 2007.)
In practice, this was generally a foreign language (FL), usually English, with
development and maintenance of students’ native language(s) not at risk
because of its use outside the program.
The increasing reliance on English in education in some countries today,
however, is sometimes differently motivated, for example, by the need for a
lingua franca when the increasing presence of children from ethnolinguis-
tic minorities in schools, or the growing numbers of international students
at universities, means that not all students and/or teachers share a common
language. In still other cases, especially but by no means only at the ter-
tiary level, even where the student body is drawn exclusively from the same
L1 background, the motivation is like that for the original primary- and
secondary-level CLIL programs: a felt need to internationalise the curricu-
lum and, through mastery of a foreign language, facilitate students’ access
to educational opportunities overseas and employment prospects both
at home and abroad. Rather than ‘CLIL’, such programs today are often
referred to as ‘English-Medium Instruction’, ‘English as a lingua franca in
academia’ or ‘Integrating Content and Language in Higher Education’ (see,
e.g., Mauranen 2012; Slobodanka, Hultgren and Jensen 2015; Wilkinson
and Walsh 2015).
In many cases, however, for example, at a growing number of Middle-
Eastern universities, instructors are non-native speakers of English them-
selves and local nationals who share the students’ L1, Arabic. Theirs are still
CLIL programs, therefore, even if the rationale for the use of English in the
situations in which they work may now be different. In still other settings,
instructors may be expatriate content specialists, for example, university
lecturers in science and technology, for whom the medium of instruction,
English, is their L1. Their situation is different again, therefore, from that of
local-born lecturers now mandated to use English in their courses, some-
times at the very same universities, and of CLIL teachers in many parts of
Asia, the Middle East and Europe who share an L1 with their students, and
for whom the medium of instruction, usually English, is a foreign language
that they and their students do not necessarily speak very well.
Adding to the program diversity still further, like Canadian immer-
sion, even traditional CLIL programs are not monolithic, either within or
across countries. Of the situation in Spain, for example, Ruiz de Zarobe
and Lasagabaster write, ‘There are no set formula and methods for CLIL’
(2010:vii), and ‘there are as many models as [the 17 autonomous] regions
and no single blueprint exists to take root across the country’ (2010:ix).
European CLIL programs range from single subjects to much of the cur-
riculum taught through the L2 (Hüttner and Smit 2014), sometimes with
the presence of a second or even a third language in the mix, as in several
bilingual regions of Spain. Many programs begin as early as infant or
elementary school, but others as late as secondary school or university.
Some college programs, again, may be motivated wholly or in part by the
felt need for internationalisation and globalisation mentioned earlier or by
the presence of international students with different L1s. At the primary
and secondary level, some programs are open to any families interested;
others select students for admission on the basis of demonstrated FL skills.
Most teachers are content specialists, not language specialists, non-native
speakers of the language of instruction, and until recently, with only in-
service training for CLIL (Muñoz and Navés 2007). In fact, like the pro-
grams themselves, training for CLIL varies, depending very much on
a program’s location. (For the situation in Spain, see chapters in Part II:
Teacher Training, in Lasagabaster and Ruiz de Zarobe 2010.)
The variations notwithstanding, CLIL always involves teaching subject
matter through the medium of a foreign (or occasionally, a minority) lan-
guage, most often English, although some teachers temporarily switch to
the students’ L1 when the need arises. As pointed out by Cenoz, Genesee
and Gorter (2014), the same is true of Canadian immersion and several
other models, and the validity of claimed distinctions (see, e.g., Lasagabaster
and Sierra 2010) between current incarnations of CLIL and various forms
of content-based instruction (CBI), including immersion, is contested.
Complicating the matter further, given the increasing presence in urban
classrooms in parts of Europe of ethnolinguistic minority students, there
may not be a common L1 for the students, and the medium of instruction
may not be the L2 for all of them, but a later-learned language, perhaps
even an L4, as for an immigrant Moroccan child with Arabic as L1 and
Spanish and Catalan as L2 and L3. Cenoz, Genesee and Gorter (2014) provide a detailed
analysis of contrasting positions, concluding that CLIL serves as a cover
term for a whole range of programs today, with just about the only char-
acteristic they share being ‘their view that authentic content that extends
beyond language be used as a vehicle for L2/foreign language teaching and
learning – as well, of course, as being of importance for teaching and
learning itself’ (2014:255). As they point out, that is true of immersion, too.
In our view, despite the undeniable similarities and overlaps, differ-
ences between at least some CLIL and some immersion programs are real,
with nine features distinguishing many (but not all) CLIL programs from
some (but not all) true immersion programs (see Figure 1). Unfortunately,
they are differences that could reasonably be expected to reduce CLIL’s
effectiveness.

French immersion in Canada | CLIL programs in many countries
Teachers
Native or near-native speakers of the L2 | Mostly non-natives, some with limited L2 ability
Trained content and immersion teacher training | Content teachers, with no or variable training for CLIL
Students
Younger students with greater capacity for incidental learning | Older students with reduced capacity in cases when programs begin later
Homogeneous L2 proficiency | Heterogeneous L2 proficiency
L2 exposure
Several years, part or full-time | As little as three or four hours per week
Native or native-like teacher input | Sometimes restricted and deviant teacher input
Some L2 access outside the classroom | Variable, often minimal, L2 access outside
Specialised pedagogic materials
Often available for immersion, as are materials written for native speakers | Often unavailable for CLIL, which must then use abridged versions
Articulation
Consistently available year to year, with articulation across grades and schools | Perhaps only available for one semester or year, with little articulation across grades or schools
Figure 1: French immersion in Canada and CLIL programs in many countries compared.

Most features in Figure 1 are self-explanatory, but the third and fourth
are related and deserve comment. To a large degree, immersion, especially
early immersion, relies on students’ capacity for incidental (as opposed
to intentional) and implicit language learning, the dominant processes in
L1A or L2A in young children. Children pick up much of a language while
focused on something else, such as play, or in this case, subject matter.
Although still available to older children and adults (see, e.g., Ellis and
Wulff 2015; Rastelli 2014), the capacity is no longer as powerful in them
(Janacsek, Fiser and Nemeth 2012; Nemeth, Janacsek and Fiser 2013). In
particular, the capacity for instance learning, that is, the ability to pick up
arbitrary form-function and form-meaning associations (most obviously
the thousands of vocabulary items and collocations they need, but also
many grammatical rules) without conscious attention or intention to do so,
is weaker (Hoyer and Lincourt 1998). To the extent, therefore, that immer-
sion generally begins earlier, and CLIL sometimes later, immersion students
are at a psycholinguistic advantage. Compounding the problem, even when
the capacity is at its strongest, incidental learning requires large quantities
of rich input and takes time. Several years of consistent immersion,
whether total or partial, involves massively more target language exposure
than the typical three or four hours a week of a single subject (occasionally,
six or eight hours for two subjects) taught through the medium of an FL,
perhaps only for one or two years.
As CLIL has evolved and spread, the priority originally accorded
improved foreign language learning has become a dual focus on both lan-
guage and content (Coyle 2008; Marsh 2002). Foreign language learning
outcomes have usually been reported as comparable to those in traditional
FL classes, with listening comprehension and vocabulary development
sometimes better (Aguilar and Muñoz 2014; Dalton-Puffer 2008; Järvinen
2007; Lorenzo, Casal and Moore 2010; Lorenzo, Moore and Casal 2011),
and until recently, subject-matter learning comparable (e.g. Jäppinen 2005)
or slightly better (Madrid 2011). In cases where both instructor and stu-
dents have a reasonably good command of the L2, CLIL seems to be a
viable arrangement. Elsewhere, however, while it is usually recommended
that teachers have C1 proficiency, or a minimum of B2, on the CEFR scale,
B1 is common. CLIL students’ L2 proficiency varies, but is often lower
still, even A1 or A2 among secondary school students in some countries,
for example, Spain. In the many situations where the instructor’s command
of the medium of instruction is weak, and the students’ command weaker
still, the possibility obviously exists for CLIL to prove unsatisfactory
for language learning, due to limited and often deviant teacher and student
classroom input, and, because of teachers’ and students’ linguistic and
communicative difficulties in the FL, unsatisfactory for subject-matter
learning as well.
CLIL advocates are usually well aware of the risks that some local instan-
tiations of the model present, yet, for such a widely accepted educational
innovation, very little research on FL-learning outcomes has been carried
out, and even less on content learning (Seikkula-Leino 2007), despite the
fact that many would consider mastery in such subjects as maths, science
or history at least as important as FL development. When teachers’ or
students’ L2 proficiency is limited, there is clear potential for linguistic
simplification, curricular dilution, and eventually, poorer content mastery.
Given these potentially negative outcomes, it is surprising that such a
sweeping educational change, affecting the lives of so many students, could
have been introduced with little evidence of its efficacy and some reason to
believe it might be unsuccessful, depending on local circumstances. Even
staunch supporters recognise that ‘empirical research on CLIL implemen-
tations visibly started to happen in different national contexts [only] around
the mid 2000s’ (Dalton-Puffer et al. 2013:214). Do students really learn an
FL better through CLIL than through FL courses taught by trained language
teachers? Do they learn as much maths, biology, history or economics
as students who take the same courses in their native language in the
classroom next door, at a neighbouring institution or in another country?
It is tempting to assume that the success of French immersion programs
in Canada would imply affirmative answers to both questions. However,
as noted above, many Canadian immersion and CLIL programs differ in
several ways, and at least three research findings before CLIL was born
might have given policymakers some cause for concern.
First, in a quasi-ethnographic study of education in an Arctic school in
northern Canada, Mackay (1986, 1993) observed the difficulties experienced
even by native English-speaking teachers delivering content lessons
in English to Inuktitut-speaking children, whose command of English
was poor. This situation repeatedly led to comprehension failures and
embarrassing breakdowns in communication. Mackay reported that the various
measures teachers employed to repair the discourse and ‘clean up’ classroom
trouble, which he termed ‘hygienic’, often entailed diluting lesson
content, for example, through switching to a simpler question or dropping
a topic. Most CLIL teachers fall well short of the L2 proficiency of the native
speaker teachers in Mackay’s study, making their task that much harder.
Second, a laboratory study by Lynch (1987) found that in their effort
to maintain comprehensibility, native speakers (NSs) describing the same
picture-guided stories deleted progressively more information as the ESL
(English as a second language) proficiency of their individual non-native
interlocutors decreased. If CLIL students’ L2 proficiency is low, as is often
the case, the amount of information lost over a school year could be sig-
nificant, and student command of the subject matter weaker than that of
students taking the same subject through their native language.
Third, Long and Ross (1993, 2009) found that linguistic simplification
of a short reading passage resulted in a text that contained fewer informa-
tion bits than the original version. Elaboration of the same text, conversely,
increased comprehensibility while retaining all of the original information.
(For a review of research findings on spoken and written simplification and
elaboration, see Long, 2015a, pp. 250–259.) Simplification, moreover, often
results in non-native-like L2 use, and in loss of key lexical items and collo-
cations, suggesting another potential drawback, this time for the language-
learning opportunities CLIL provides.
Research on CLIL
As several commentators have pointed out (see, e.g., Bruton 2011a, 2011b,
2013, 2015; Dallinger et al. 2016; Navés and Victori 2010; Pérez-Cañado
2012; Rumlich 2013, 2014), some comparative studies of CLIL and tra-
ditional FL programs, for example, that by Lorenzo, Casal and Moore
(2010), have been beset by threats to internal validity, with selection being
particularly problematic. (For a discussion of the impact of six standard
threats to internal validity in comparative studies of L2 programs: history,
maturation, testing, instrumentation, selection and mortality, see Long,
1984.) Participation in CLIL programs is usually optional for both teach-
ers and students (although not always for students; see Hüttner, Dalton-
Puffer and Smit, 2013). Consequently, there is a lack of random assignment
to CLIL and regular FL courses, some CLIL programs even guaranteeing
non-equivalent groups by admitting students based on their superior FL
abilities. Yet when comparing intact groups, researchers typically fail to
correct for pre-existing differences among teachers and students, who may
have elected CLIL or non-CLIL groups voluntarily. If teachers volunteer
to teach their usual courses in an FL, it may indicate greater enthusiasm
and willingness to work harder, for example, in preparing new materials,
and/or greater competence. Students and families who choose CLIL often do
so because they value FL study more than those who do not, or are chosen
by the school as having superior FL abilities and/or greater potential to
thrive in CLIL. These factors can mean that the CLIL students are more
motivated to succeed, have higher starting English or other FL proficiency,
in turn often related to economic and social class differences, and may also
enjoy more out-of-class contact with English during the school year, for
example, through additional private tutoring. CLIL courses themselves
often provide as many as 50% more hours of instruction than the tradi-
tional foreign language courses with which they are compared (another
confounding variable sufficient to predict better outcomes), but even
then, do not always produce superior FL learning (Pladevall-Ballester and
Vallbona 2016). Muñoz (2015) provides an insightful review of studies
showing the importance of time (total hours) and timing (starting age) in
CLIL programs. There is sometimes a lack of established reliability of FL
or subject-matter measures, and in cases of pre-test–post-test designs,
unverified equivalence of their pre- and post-test forms.
In defence of the early studies, such threats to internal validity are by no
means unique to evaluations of CLIL. The use of non-equivalent control
group designs, with learning effects potentially confounded with pre-exist-
ing differences between students and conditions, differential time on task,
and measures of unknown reliability and/or unestablished equivalence of
pre- and post-test forms, afflicts much educational research in real school
settings, which is notoriously difficult to carry out reliably, given under-
standable restrictions on disturbing regular classes and curricula. Many
such problems can be dealt with satisfactorily, however, as shown by a
recent large-scale German evaluation of a CLIL program.
In a methodologically rigorous year-long study, one of the first to assess
subject-matter learning, as well as FL development, Dallinger and col-
leagues (2016) used a pre-test–post-test design to compare the achieve-
ment of 1,806 CLIL and non-CLIL German eighth graders in thirty-seven
schools in English as an FL and a content subject, history, controlling for
selection, students’ prior general English and listening comprehension
skills, prior history knowledge, individual differences (IDs) in student
motivation and figural and verbal cognitive abilities, quality of instruction
(assessed via student ratings), history teacher characteristics (enthusiasm
and self-efficacy, via teacher responses to Likert scale items) and a number
of other variables. CLIL students differed in the amount of history instruc-
tion they received – three lessons per week, instead of two. Pre- and post-
general English skills were assessed using equivalent versions of a 159-item
c-test, each form with high internal reliability (> 0.94), and listening abili-
ties via equivalent versions of a standardised test with internal reliability
ranging from 0.71 to 0.84. Individual student differences were assessed
using standardised tests. At the start of the school year, all students’ prior
history knowledge was assessed in German, using a cloze test and a mul-
tiple-choice test. At the end of grade 8, all non-CLIL and one-third of the
CLIL students took both tests (of history taught in grade 8) in German, and
the remaining CLIL students either the cloze or the multiple-choice test in
English. Reliabilities were acceptable (pre- 0.70, post- 0.79).
Using sophisticated statistical analyses, Dallinger and co-workers (2016)
identified significant pre-instruction differences between the CLIL and
non-CLIL groups – in prior achievement and motivation in English, and
often in history, cognitive abilities, and SES – and significant relationships
between those differences and outcomes at the end of grade 8. Student
ID variables, for example, explained roughly 55% of the within-classroom
variance in post-treatment c-test results, 30% in listening comprehension,
and 37% in history knowledge. Selection effects were visible among teach-
ers in the study, too, with CLIL teachers reporting more enthusiasm than
their non-CLIL counterparts. Controlling for these potentially confound-
ing pre-existing differences, it was shown that listening skills improved
significantly more in the CLIL group, but that there were no statistically
significant differences between the achievement of CLIL and non-CLIL
groups in general English abilities or in knowledge of history. A positive
effect on listening skills, but little grammatical improvement, is a common
finding in CLIL program evaluations, including one of a postgraduate
university engineering course (Aguilar and Muñoz 2014). The equivalent
achievement in history knowledge in the German study, however, does not
reflect well on CLIL, the authors point out, when it is recalled that the
CLIL students had received 50% more history instruction than the non-
CLIL students (three hours per week, instead of two). (Some CLIL pro-
grams provide additional English instruction for CLIL students, as well.)
Dallinger and associates concluded:
if future studies do not find any CLIL-advantages in other content subject-related
areas or English skills, then the implementation of CLIL programmes would have
to be questioned. (2016:30)
Dallinger et al.’s study is one of the first evaluations to examine the effects of
selection and other potential confounds, and one of the very few with suf-
ficient, and sufficiently reliable, measures to do so. (However, for a recent
two-year study of CLIL in Germany with similar findings, see Rumlich
2016.) The broad scope and methodological rigour of the work, along with
the lack of studies of comparable size and quality, make its findings unusu-
ally important. The results demonstrate that selection really can be a major
factor in determining outcomes in CLIL studies and needs to be controlled
for in future work. Since doing so is difficult in natural classroom settings,
alternative approaches, including a true randomised design, are worthy of
consideration. This will usually mean a laboratory study, with attendant
limitations on external validity, that is, the generalisability of findings to
real classroom settings. However, as proposed elsewhere (Long, 2015b),
pairing laboratory experiments and classroom studies, ideally performed
in that order, examining the same variables and using the same measures,
can provide a defensible basis for pedagogic recommendations if results
from each setting are comparable.
Two additional limitations of most research to date should be noted.
First, while valuable descriptive studies of lessons and lectures have been
reported (e.g. Smit 2010), documenting such matters as the greater fre-
quency of language-related episodes in CLIL than in EFL classrooms in
Spain (Basterrechea and García Mayo 2013), patterns of corrective feed-
back in CLIL and immersion classrooms (Llinares and Lyster 2014), and
the use of focus on form in lectures in English at an Italian university (Costa
2012), process variables have yet to be related to learning outcomes. Yet
L2 classroom researchers have long established the importance of detailed
descriptions of classroom processes and language use before moving on to
evaluation studies (for a detailed discussion, see Long, 2015a, pp. 347–364;
Shintani 2011). In comparative studies of FL and CLIL programs, impor-
tant dimensions of classroom discourse would include the proportions
of lessons delivered through the students’ L2 and/or L1, and the extent
to which each focused on subject matter as opposed to code features. An
absence of data on classroom processes renders explanations of findings
on learning outcomes speculative. Second, virtually all studies to date have
been conducted on the use of CLIL with school-age children. With the
model now spreading fast in both public and private tertiary institutions
in many countries, process-product research on CLIL with college-age
students is sorely needed, as is research that considers both language and
subject-matter learning.
It was with these considerations in mind that a small-scale laboratory
study was undertaken to ascertain the feasibility of evaluating CLIL while
dealing with confounds that, with the notable exception of Dallinger and
co-workers (2016), have afflicted much of the early work. The study was
exploratory, designed to identify potential pitfalls and methodological con-
siderations when designing a larger-scale CLIL evaluation. The experimen-
tal lessons differed in several important ways from those in authentic CLIL
programs, so were not intended to produce findings generalisable to such
programs.
A micro process-product study of a CLIL lesson

A process-product laboratory study of a CLIL lesson was conducted. The
study was micro-scale and short-term. Of interest was the potential for
explaining product findings on both subject-matter and language learn-
ing (here, just vocabulary knowledge) through use of an instructional
process component missing from all previous product evaluations of CLIL
of which we are aware. Given the unnaturalness of several aspects of the
lessons and the limited scope of the study, and given the great variability of
CLIL programs outlined above, the purpose was not to produce findings
generalisable to any, much less all, such programs. Rather, the aim was to
investigate the feasibility of an approach to investigating their efficacy that
is common in educational research, but, to the best of our knowledge, has
yet to be used in research on CLIL.
The research questions motivating the study were as follows.
• RQ1: How does teacher speech differ when the medium of instruction
is either the L1 or L2 of teachers and students?
• RQ2: What effect do the differences have on student learning of
lesson content?
• RQ3: What effect do the differences have on student knowledge
of the L2?
• RQ4: What is the potential of experimental process-product
research on CLIL?
Method
Design
The study employed a post-test only, criterion group design. Three lan-
guage conditions were examined: English baseline, CLIL and Arabic base-
line. Arabic was chosen as the L1 of participants in the CLIL and baseline
groups due to the surging interest in CLIL at the tertiary level in parts
of the Arab world (among other places). Each condition comprised three
groups, with one participant in each group functioning as an instructor
and four as students (see Figure 2). The medium of instruction and the L1
of all teachers and students in the English baseline condition was English.
The medium of instruction and L1 of all teachers and students in the
Arabic baseline condition was Arabic. In the CLIL condition, the medium
of instruction was English, while the L1 of all teachers and students was
Arabic.
Language conditions

| English baseline condition (English as medium of instruction) | CLIL condition (English as medium of instruction) | Arabic baseline condition (Arabic as medium of instruction) |
|---|---|---|
| 3 teachers, NSs of English + 3 groups of 4 students, NSs of English | 3 teachers, NSs of Arabic + 3 groups of 4 students, NSs of Arabic | 3 teachers, NSs of Arabic + 3 groups of 4 students, NSs of Arabic |

Figure 2: Design for the study.

Participants
Following IRB approval, forty-five participants were recruited via adver-
tisements and word of mouth. Participants in the English baseline condi-
tion were three graduate students and twelve undergraduates at a public
university in the USA, all native speakers (NSs) of English. The twelve
undergraduates were randomly assigned to form three groups of four, and
the three graduate students randomly assigned to serve as instructors,
one for each of the three groups. The fact that the graduate students were
not specialists in the subject matter, and that the material created for the
study was fictitious (see below), pre-empted potential confounds caused
by differences in teaching experience that might exist, and whose effects
would need to be examined, among real teachers. For the Arabic baseline
condition, fifteen native speakers of Gulf Arabic studying at a university
in a Middle-Eastern country and with minimal knowledge of English were
randomly assigned to form three groups of four undergraduates, and three
graduate students were randomly assigned to serve as their instructors,
one for each group. For the CLIL condition, potential participants, NSs of
Gulf Arabic studying at a public university in the USA, first completed an
English proficiency test and submitted their most recent TOEFL or other
standardised test scores, with the dates on which the tests had been taken.
The proficiency information was used to screen a suitable subset into the
study, and then for their stratified random assignment to form three groups
of four of comparable average proficiency in English. The proficiency levels
used to determine eligibility to participate in the CLIL condition were the
iBT and PBT scores equivalent to CEFR B1 for teachers and CEFR A1 for
students. Participants functioning as teachers and students in the experiment
were paid $40 and $20, respectively.
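The stratified random assignment described above can be illustrated with a minimal sketch. The exact procedure used in the study is not reported, so the approach below (rank by proficiency, cut the ranking into strata the size of the number of groups, shuffle within each stratum) is an assumption, as are the function name `stratified_assignment`, the participant labels and the scores.

```python
import random

def stratified_assignment(scores, n_groups=3, seed=None):
    """Assign participants to n_groups of similar mean proficiency.

    scores: dict mapping participant id -> proficiency score (hypothetical).
    Participants are ranked by score, cut into consecutive strata of
    n_groups people, and each stratum is shuffled across the groups.
    """
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get, reverse=True)
    groups = [[] for _ in range(n_groups)]
    for start in range(0, len(ranked), n_groups):
        stratum = ranked[start:start + n_groups]
        rng.shuffle(stratum)  # random assignment within the stratum
        for group, participant in zip(groups, stratum):
            group.append(participant)
    return groups

# Hypothetical proficiency scores for twelve students (not data from the study).
raw = [31, 34, 36, 38, 40, 41, 43, 45, 47, 50, 52, 55]
scores = {f"S{i:02d}": s for i, s in enumerate(raw, start=1)}

groups = stratified_assignment(scores, n_groups=3, seed=1)
group_means = [sum(scores[p] for p in g) / len(g) for g in groups]
```

Because each group receives exactly one member from every stratum, the group means can differ only by the spread within strata, which is what keeps average proficiency comparable.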
Materials
To eliminate the possibility that participants might possess prior knowl-
edge of the subject matter, a story sufficient for a fifteen-minute lesson
was written especially for the study (for a brief excerpt, see Appendix 1),
together with forty multiple-choice test items. The story intentionally
contained plausible, but purely fictitious, information about an amateur
anthropologist’s alleged discovery of a hitherto unknown indigenous tribe,
the Kiriboe, in the Amazonian jungle. It covered such matters as the fictional
tribe’s language, matriarchal social organisation (a concept that turned out
to pose significant comprehension problems in the Arabic baseline condi-
tion), living arrangements, childcare, hunting methods, rituals and more,
as reported by Smith, the explorer, on his return to London. His account
contained several claims that raised suspicions as to its veracity, thereby
providing material for a number of post-test questions that required inferencing
and ‘reading between the lines’ on students’ part. For instance, if
Kiriboese really exists and does not have a way of referring to future time,
as Smith claimed, how could the Kiriboe have told him what they would do
if ever attacked again? And while head-shrinking rituals like those Smith
reported have been observed elsewhere, for example, among the Jivaro
people of the Amazon rainforest, it was the heads of slain enemies that
were shrunk and displayed, not, as Smith reported to be the case with the
Kiriboe, those of deceased family members.
The text was written in semi-note form to make it less likely that teach-
ers could memorise and use ready-made sentences in the lessons that
followed. It came accompanied by a set of pictures linked to the main
pieces of information about the Kiriboe. The pictures were organised as
a PowerPoint presentation and followed the order of their mention in
the text. They were designed both to liven up the lessons for the students
and, more importantly, to serve as visual reminders for the participants
functioning as teachers of the main information they should include. They
were told that they would no longer have access to the texts when they
taught. The three teachers in the Arabic baseline condition were provided
with a translated version of the text. By design, it included a number of
low-frequency lexical items unlikely to be known to the Arabic-speaking
participants, for example, tributary, stilts, blow-darts, reptiles, carvings,
matriarchal, hollowed, fermented, skull and artifacts, which would serve as
target items for assessing vocabulary knowledge.
Instrumentation
Multiple-choice subject-matter test
A subject-matter assessment measure was developed consisting of forty
multiple-choice content questions. It was based on the key information
points in the lesson outline, each answer containing one correct choice
and three distractors. The test included a variety of items involving factual
recall, information synthesis and inference from information provided. An
Arabic translation of the measure was provided for the Arabic baseline and
CLIL conditions.
Modified cloze test of vocabulary knowledge

Vocabulary knowledge was assessed using a modified thirty-item cloze test
based on the original lesson script, with gaps where critical low-frequency
vocabulary items had been deleted. A translated version of the cloze test
was provided for the Arabic baseline condition.
Procedures
The three groups in each condition (a teacher and four students per group)
completed the experiment in a small classroom. Teachers arrived first and
were asked to read and sign the consent form in their native language. They
were then given sixty minutes to prepare their lesson. Teachers in the two
baseline conditions reviewed the accompanying picture prompts (printed
PowerPoint slides) and read the Kiriboe text in their native language,
English or Arabic, as many times as they liked. The CLIL teachers reviewed
the prompts and read the script in English, accompanied by a glossary of
Arabic translations of the low-frequency words. The teachers were allowed
to take notes, but were warned that they would only be permitted to use
the pictures, not the script or their notes, when teaching the lesson. This
was to preclude their reading excerpts of the script aloud, which would
have pre-empted the kind of spontaneous language and decision-making
characteristic of real lessons.
When the participants acting as students arrived an hour later, they read
and signed the consent form in their native language. Then, with one of the
researchers seated unobtrusively in the classroom, and with the English
baseline and CLIL sessions audio- and video-recorded, each teacher was
asked to deliver the lesson in fifteen minutes and to encourage student
participation, for example, in the form of questions and comments. As
instructed by the researchers, lessons were delivered exclusively in English
in the English baseline and CLIL conditions, and exclusively in Arabic in
the Arabic baseline condition. Some fairly minimal teacher–student inter-
action, including teacher responses to occasional student clarification
requests, was observed in all groups. Immediately after the lessons, stu-
dents completed the multiple-choice test on lesson content, in English for
the English baseline group, in Arabic for both the Arabic baseline and CLIL
groups. They then completed the cloze test. The tests were administered
in that order to all groups. The multiple-choice test required about fifteen
minutes, the cloze test approximately twenty minutes. The English baseline
and CLIL lessons were subsequently transcribed, and the transcripts veri-
fied by two native speakers.
Data and analyses

The number of words and number of clausal utterances (utterances with
one or more tensed verbs) in teacher speech in the English baseline and
CLIL conditions were counted as measures of volume of input. As mea-
sures of the lexical diversity of teacher speech, type-token ratios (TTRs)
and Guiraud’s Index of lexical richness (which corrects TTR for text
length) were computed using Wordsmith lexical analysis software. Low-frequency
words in the transcripts, those with frequencies of five or fewer
per million in the Corpus of Contemporary American English (COCA)1,
were identified (see Appendix 2 for a list of the low-frequency lexemes).
The frequency of mention of items targeted in the cloze test was calculated
for the English baseline and CLIL conditions. The syntactic complexity of
teacher speech in the three English baseline and three CLIL lessons was
assessed in terms of average s-nodes per clausal utterance.
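As a rough illustration of the two lexical diversity measures named above, the sketch below computes a type-token ratio (types divided by tokens) and Guiraud's Index (types divided by the square root of tokens). It is not the Wordsmith procedure: the whitespace tokenisation, the function names and the sample sentence are assumptions made for the example.

```python
import math

def type_token_ratio(n_types, n_tokens):
    """TTR: distinct word forms divided by running words."""
    return n_types / n_tokens

def guiraud_index(n_types, n_tokens):
    """Guiraud's Index: types divided by the square root of tokens."""
    return n_types / math.sqrt(n_tokens)

def lexical_measures(tokens):
    """Count types and tokens in a list of word strings (case-insensitive)."""
    words = [t.lower() for t in tokens]
    n_types, n_tokens = len(set(words)), len(words)
    return {
        "types": n_types,
        "tokens": n_tokens,
        "ttr": type_token_ratio(n_types, n_tokens),
        "guiraud": guiraud_index(n_types, n_tokens),
    }

# Invented sample utterance, split on whitespace (a simplification).
sample = "the tribe lives near the river and the tribe hunts".split()
measures = lexical_measures(sample)
```

Dividing by the square root of the token count, rather than the count itself, is what makes Guiraud's Index less sensitive than TTR to the length of a transcript.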
No word count was conducted for the Arabic group, as comparisons
with the English data would have been misleading. In Arabic, an agglutina-
tive language, one word may contain the object, verb and subject. A simple
example would be the word Ahbik – I love you in English. Thus, while the
Arabic text has a word count of 1, the English version would be 3. Syntactic
complexity in the three Arabic lessons was not calculated either, again due
to the inappropriateness of s-nodes per clausal utterance as a measure for
an agglutinative language and the non-comparable results its use would
have produced.
Both the multiple-choice (MC) tests of subject-matter learning and the
cloze tests of vocabulary learning had two versions – English for the English
baseline and CLIL groups, Arabic for the Arabic baseline groups. After
standard item analyses, conducted separately on each version of both tests,
and elimination of unsatisfactory items, the performance of groups in the
three conditions on the MC and cloze tests was compared via a set of ANOVAs,
with group as the independent variable and total test scores the dependent
variable.
Hypotheses
Summarised here for reasons of space, our hypotheses were motivated by:
1 the findings in the studies of Mackay (1986, 1993), Lynch (1987)
and Long and Ross (1993), described earlier;
2 results obtained in the extensive research over the past four
decades on linguistic and conversational adjustments to non-
native speakers made by native speakers (e.g. Long, 1983) and by
classroom language teachers (e.g. Chaudron 1982), and
3 the FL and subject-matter learning outcomes in recent evaluations
of CLIL (e.g. Dallinger et al. 2016).
We predicted more speech, more syntactically complex speech, and lexically
more complex and diverse speech, for example, more low-frequency
types and tokens, in the English baseline than the CLIL condition, and
superior subject-matter learning and vocabulary scores in the English and
Arabic baseline conditions than in the CLIL condition.
Results
Quantity of teacher speech

The numbers of words and clausal utterances in teacher speech in the
English baseline and CLIL lessons are shown in Tables 1a and 1b, respectively.
Teachers in the English baseline lessons produced more words and
more clausal utterances than teachers in the CLIL lessons, but one-tailed
t-tests for independent samples showed that the differences were not sta-
tistically significant: words (t (4) = 1.18, p = 0.3/2 = 0.15); clausal utterances
(t (4) = 0.5, p = 0.64/2 = 0.32). The results of the Shapiro–Wilk test and
Levene’s test had shown that the assumptions of normality and homogene-
ity of variance, respectively, were met for the two t-tests.
Table 1a: Words in teacher speech.

| Teacher | T1 | T2 | T3 |
|---|---|---|---|
| English baseline | 1765 | 1141 | 2156 |
| CLIL | 1105 | 570 | 1767 |

Table 1b: Clausal utterances in teacher speech.

| Group | T1 | T2 | T3 | Total | Mean | SD | SEM | n |
|---|---|---|---|---|---|---|---|---|
| English baseline | 70 | 59 | 143 | 272 | 90.67 | 45.65 | 26.36 | 3 |
| CLIL | 78 | 37 | 107 | 222 | 74.00 | 35.17 | 20.31 | 3 |
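The one-tailed independent-samples t-tests on the quantity measures can be retraced from the per-teacher counts in Tables 1a and 1b. The sketch below is a plain pooled-variance t-test written with the standard library only; the function names are ours, and the closed-form tail probability is valid only for exactly 4 degrees of freedom, which is the case here.

```python
import math
from statistics import mean, variance

def pooled_t(sample_a, sample_b):
    """Student's t for two independent samples with pooled variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / se, df

def upper_tail_p_df4(t):
    """One-tailed p-value for Student's t with exactly 4 df (closed form)."""
    denom = 1 + t * t / 4
    cdf = 0.5 + 0.375 * (t / math.sqrt(denom)) * (1 - (t * t / denom) / 12)
    return 1 - cdf

# Per-teacher counts from Tables 1a and 1b.
words_english, words_clil = [1765, 1141, 2156], [1105, 570, 1767]
utts_english, utts_clil = [70, 59, 143], [78, 37, 107]

t_words, df_words = pooled_t(words_english, words_clil)
t_utts, df_utts = pooled_t(utts_english, utts_clil)
p_words = upper_tail_p_df4(t_words)
p_utts = upper_tail_p_df4(t_utts)
```

With three teachers per condition the test has only 4 degrees of freedom, so even a difference of several hundred words between condition means yields a small t value.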
Syntactic complexity of teacher speech

The numbers of s-nodes in clausal utterances in teacher speech in the
English baseline and CLIL conditions are shown in Table 2a. Table 2b
shows the mean s-nodes per clausal utterance in teacher speech in the two
conditions. As predicted, the average complexity of teacher speech in the
three English baseline lessons (2.52) was higher than that in the three CLIL
lessons (1.89), but a one-tailed t-test for independent samples showed that
the difference was not statistically significant (df = 4, t = 1.13, p = 0.32/2
= 0.16). The results of the Shapiro–Wilk test and Levene’s test had shown
that the assumptions of normality and homogeneity of variance, respec-
tively, were met for the t-test.
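Table 2a: Syntactic complexity of teacher speech.

| Group | Measure | T1 | T2 | T3 | Total |
|---|---|---|---|---|---|
| English baseline | Clausal utterances | 70 | 59 | 143 | 272 |
| English baseline | s-nodes | 159 | 211 | 248 | |
| CLIL | Clausal utterances | 78 | 37 | 107 | 222 |
| CLIL | s-nodes | 133 | 77 | 203 | |

Table 2b: Mean s-nodes per clausal utterance in teacher speech.

| Teacher | T1 | T2 | T3 |
|---|---|---|---|
| English baseline | 2.27 | 3.58 | 1.73 |
| CLIL | 1.71 | 2.08 | 1.89 |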
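The per-teacher values in Table 2b follow directly from Table 2a: each is the teacher's s-node count divided by his or her number of clausal utterances, and the condition figure is the mean of the three per-teacher ratios. A short sketch of that arithmetic (variable names are ours):

```python
# Counts per teacher (T1, T2, T3), taken from Table 2a.
s_nodes = {"English baseline": [159, 211, 248], "CLIL": [133, 77, 203]}
utterances = {"English baseline": [70, 59, 143], "CLIL": [78, 37, 107]}

# s-nodes per clausal utterance for each teacher.
ratios = {
    group: [s / u for s, u in zip(s_nodes[group], utterances[group])]
    for group in s_nodes
}

# Condition-level complexity: mean of the per-teacher ratios.
condition_means = {group: sum(r) / len(r) for group, r in ratios.items()}

for group, r in ratios.items():
    print(group, [round(x, 2) for x in r], round(condition_means[group], 2))
```

Averaging per-teacher ratios weights each teacher equally, regardless of how much he or she spoke. Small discrepancies in the second decimal place relative to the published tables are to be expected from rounding.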
Lexical complexity and diversity of teacher speech

It is well known that type-token ratios (TTRs) decrease with increasing
text length, as speakers or writers tend to repeat themselves more as they
speak or write more about the same topics. Guiraud’s Index of lexical
richness corrects for text length, and was applied here to take the varying
length of the lesson transcripts into account. The data for vocabulary and
low-frequency words in teacher speech are shown in Tables 3a, 3b and 3c.
A significant difference between the English baseline and CLIL groups was
found in the number of types (t (4) = 2.7, p = 0.05/2 = 0.03) and number of
low-frequency types (t (4) = 3.32, p = 0.03/2 = 0.015). However, there were
no significant differences in number of tokens (t (4) = 1.402, p = 0.234/2 =
0.117), TTR (t (4) = –0.75, p = 0.49/2 = 0.25), Guiraud’s Index (t (4) = 1.24,
p = 0.28/2 = 0.14), or low-frequency TTR (t (4) = –0.92, p = 0.41/2 = 0.21).
There were marginally significant differences between the two groups in
the number of low-frequency tokens (t (2.02) = 3.04, p = 0.09/2 = 0.05)
and the low-frequency Guiraud’s Index measure (t (4) = 1.99, p = 0.117/2
= 0.06). The results of the Shapiro–Wilk test and Levene’s test had shown
that the assumptions of normality and homogeneity of variance, respectively,
were met for all the above t-tests, with one exception. Levene’s test
indicated unequal variances (F = 9.02, p = 0.04) for the low-frequency
tokens t-test, so degrees of freedom were adjusted from 4 to 2.02.
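Table 3a: Numbers and descriptive statistics for vocabulary in teacher speech.

| Measure | Group | n | Mean | SD | SEM |
|---|---|---|---|---|---|
| Type | English baseline | 3 | 407.33 | 30.50 | 17.61 |
| Type | CLIL | 3 | 302.33 | 60.01 | 34.65 |
| Token | English baseline | 3 | 1463.67 | 466.53 | 269.35 |
| Token | CLIL | 3 | 947.33 | 434.96 | 251.12 |
| TTR | English baseline | 3 | 0.29 | 0.08 | 0.04 |
| TTR | CLIL | 3 | 0.35 | 0.10 | 0.06 |
| Guiraud’s Index | English baseline | 3 | 10.85 | 0.99 | 0.57 |
| Guiraud’s Index | CLIL | 3 | 10.07 | 0.44 | 0.25 |
| Low-frequency type | English baseline | 3 | 12.67 | 3.51 | 2.03 |
| Low-frequency type | CLIL | 3 | 5.33 | 1.53 | 0.88 |
| Low-frequency token | English baseline | 3 | 23.00 | 8.72 | 5.03 |
| Low-frequency token | CLIL | 3 | 7.67 | 0.58 | 0.33 |
| Low-frequency TTR | English baseline | 3 | 0.58 | 0.12 | 0.07 |
| Low-frequency TTR | CLIL | 3 | 0.70 | 0.19 | 0.11 |
| Low-frequency Guiraud’s Index | English baseline | 3 | 2.66 | 0.36 | 0.21 |
| Low-frequency Guiraud’s Index | CLIL | 3 | 1.96 | 0.53 | 0.31 |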
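The adjusted degrees of freedom for the low-frequency tokens comparison can be illustrated with the Welch–Satterthwaite approximation, applied here to the rounded summary statistics in Table 3a. This is a sketch of the standard formula, not a reproduction of the authors' software output; the function name is ours.

```python
import math

def welch_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t and Welch-Satterthwaite df from group summary statistics."""
    va = sd_a ** 2 / n_a  # squared standard error of group A's mean
    vb = sd_b ** 2 / n_b  # squared standard error of group B's mean
    t = (mean_a - mean_b) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t, df

# Low-frequency tokens: English baseline vs CLIL (Table 3a).
t_lf_tokens, df_lf_tokens = welch_from_summary(23.00, 8.72, 3, 7.67, 0.58, 3)
print(round(t_lf_tokens, 2), round(df_lf_tokens, 2))
```

Because almost all of the variance comes from the English baseline teachers, the effective degrees of freedom fall to roughly those of that group alone (n – 1 = 2).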
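Table 3b: Low-frequency words in teacher speech.

| Measure | English baseline T1 | English baseline T2 | English baseline T3 | CLIL T1 | CLIL T2 | CLIL T3 |
|---|---|---|---|---|---|---|
| Type | 9 | 16 | 13 | 4 | 7 | 5 |
| Token | 13 | 27 | 29 | 8 | 8 | 7 |
| TTR | 0.69 | 0.59 | 0.45 | 0.50 | 0.88 | 0.71 |
| Guiraud’s Index | 2.50 | 3.08 | 2.41 | 1.41 | 2.47 | 1.89 |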
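Table 3c: Group statistics for low-frequency words.

| Measure | Group | n | Mean | SD | SEM |
|---|---|---|---|---|---|
| Type | English baseline | 3 | 12.67 | 3.51 | 2.03 |
| Type | CLIL | 3 | 5.33 | 1.53 | 0.88 |
| Token | English baseline | 3 | 23.00 | 8.72 | 5.03 |
| Token | CLIL | 3 | 7.67 | 0.58 | 0.33 |
| TTR | English baseline | 3 | 0.58 | 0.12 | 0.07 |
| TTR | CLIL | 3 | 0.70 | 0.19 | 0.11 |
| Guiraud’s Index | English baseline | 3 | 2.66 | 0.36 | 0.21 |
| Guiraud’s Index | CLIL | 3 | 1.96 | 0.53 | 0.31 |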
There was very little difference between the English baseline and CLIL
lessons either in the total numbers of mentions of the forty target items
in the cloze test or in the numbers of the forty items not mentioned by
any of the three teachers in each condition (see Appendix 3). Thirty out
of forty target items were mentioned a total of eighty times by the three
teachers in the English baseline condition. In the CLIL groups, twenty-six
out of the forty target items were mentioned by the three teachers a total
of seventy-six times. The difference was statistically non-significant (z =
0.43, p = 0.67).
A simple linear regression analysis was conducted to determine whether
the number of correct responses to the cloze test items could be predicted
from the number of occurrences of the words needed for correctly answer-
ing those items in teacher speech. The null hypothesis tested was that the
regression coefficient (i.e. the slope) was equal to 0. Prior to analysis, the
data for both the English baseline and CLIL groups were screened for
missing entries and violation of assumptions. There were no missing data.
For the English baseline groups, the results of the simple linear regression
suggested that a significant proportion of the total variance in the number
of correct answers to cloze test items was predicted by the number of
occurrences in teacher speech of the words needed to answer the items
correctly. In other words, the number of times a word appeared in the
lessons was a good predictor of correct responses to the cloze test item
targeting that word (F (1, 38) = 18.5, p < 0.001). Additionally, it was found
that the unstandardised slope (0.91) and standardised slope (0.57) were
statistically significantly different from 0 (t = 4.3, df = 38, p < 0.001). Finally,
multiple R-squared indicated that approximately 32.7% of the variance in
cloze test scores was predicted by the number of occurrences of the rel-
evant words in the lessons. According to Cohen (1988), this constitutes a
medium effect.
For the CLIL groups, the results were similar. The simple linear regres-
sion suggested that a significant proportion of the total variance in the
number of correct answers to cloze test items was predicted by the number
of occurrences in teacher speech of the words required to answer the items
correctly. In other words, the number of times a word appeared in the
lessons was a good predictor of correct responses to the cloze test item
targeting that word (F (1, 38) = 10.29, p < 0.001). Additionally, it was found
that the unstandardised slope (0.23) and standardised slope (0.46) were
statistically significantly different from 0 (t = 3.2, df = 38, p < 0.001). Finally,
multiple R-squared indicated that approximately 21.3% of the variance in
cloze test scores was predicted by the number of occurrences of the
relevant words in the lessons. According to Cohen (1988), this constitutes
a small effect.
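In a simple linear regression with a single predictor, the statistics reported above are tied together: F = R² / (1 – R²) × (n – 2), t for the slope is the square root of F, and the standardised slope has the same magnitude as the square root of R². The sketch below applies those identities to the reported R² values, assuming n = 40 items per analysis (implied by df = 38) and positive slopes; the function name is ours.

```python
import math

def regression_summary_from_r2(r_squared, n):
    """Derive F, t and |standardised slope| for a one-predictor regression."""
    df_resid = n - 2
    f_stat = r_squared / (1 - r_squared) * df_resid
    return {
        "F": f_stat,
        "t": math.sqrt(f_stat),        # t for the slope; t squared equals F
        "beta": math.sqrt(r_squared),  # magnitude of the standardised slope
        "df": df_resid,
    }

english = regression_summary_from_r2(0.327, n=40)
clil = regression_summary_from_r2(0.213, n=40)
print({k: round(v, 2) for k, v in english.items()})
print({k: round(v, 2) for k, v in clil.items()})
```

The values recovered this way agree with the reported F, t and standardised slopes to within rounding, which is a useful internal consistency check when only summary statistics are available.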
Subject-matter learning
Three items on the MC test of subject-matter learning were deleted from
both versions of the subject-matter test because they were discovered to
have two possibly correct answers. The two versions of the test were then
subjected to a classic test theory item and reliability analysis. The first round
of item analysis on the version for the English baseline and CLIL groups (n
= 24) revealed four items with a zero or negative item discrimination index
(DI). Those items were deleted, the reliability estimate (Cronbach’s alpha,
α) for this version increasing from 0.69 to 0.74. The same analysis for the
Arabic version (n = 12) identified three items with a zero or negative DI.
Those items were deleted, α increasing from 0.82 to 0.86. The second round
of item analysis on both versions of the test found no additional items with
a zero or negative DI. Total scores for the three experimental groups were
calculated for the purpose of group comparisons. As can be seen in Table
4 and Figure 3, the English and Arabic baseline groups outperformed the
CLIL groups, and statistically significantly so in the case of the English
baseline groups.
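The item analysis described above (removing items with a zero or negative discrimination index, then re-estimating Cronbach's alpha) can be sketched as follows. The response matrix is invented, and the upper-lower definition of the discrimination index is an assumption, since the text does not say which DI formula was used.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """responses: list of examinee rows, each a list of 0/1 item scores."""
    k = len(responses[0])
    item_vars = [pvariance([row[i] for row in responses]) for i in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def discrimination_indices(responses):
    """Upper-lower DI: proportion correct in the top half minus bottom half."""
    ranked = sorted(responses, key=sum, reverse=True)
    half = len(ranked) // 2
    upper, lower = ranked[:half], ranked[-half:]
    k = len(responses[0])
    return [sum(r[i] for r in upper) / half - sum(r[i] for r in lower) / half
            for i in range(k)]

def drop_weak_items(responses):
    """Remove items whose DI is zero or negative."""
    di = discrimination_indices(responses)
    keep = [i for i, d in enumerate(di) if d > 0]
    return [[row[i] for i in keep] for row in responses]

# Hypothetical scores: six examinees by four items (1 = correct).
responses = [
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
di = discrimination_indices(responses)
alpha_before = cronbach_alpha(responses)
trimmed = drop_weak_items(responses)
alpha_after = cronbach_alpha(trimmed)
```

In this toy data set the fourth item is answered correctly more often by low scorers than by high scorers, so it is dropped and alpha rises, mirroring the pattern reported for both test versions.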
Assumptions for the one-way ANOVA analysis were met, except that
variances among the groups were unequal, as revealed by the results of
Levene’s test: F (2, 33) = 3.75, p = 0.03. The Welch test was used, therefore,
the results showing that the omnibus test was significant, F (2, 19.25), p
= 0.04. Dunnett’s T3 post-hoc test indicated that the difference between
the English baseline and CLIL groups was marginally significant, with a
p-value of 0.05. The remaining group comparisons yielded no significant
differences between groups.
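Welch's one-way ANOVA, used above because the group variances were unequal, can be computed from group means, standard deviations and sample sizes alone. The sketch below applies the standard Welch formulas to the rounded summary statistics for the MC test (English baseline 16.75, SD 4.29; CLIL 13.00, SD 2.22; Arabic 14.83, SD 5.13; n = 12 each). The resulting F is an illustrative recomputation from rounded values, not a figure reported by the authors; the function name is ours.

```python
def welch_anova(means, sds, ns):
    """Welch's F test for k independent groups, from summary statistics."""
    k = len(means)
    weights = [n / sd ** 2 for n, sd in zip(ns, sds)]  # precision weights
    w_total = sum(weights)
    grand = sum(w * m for w, m in zip(weights, means)) / w_total
    between = sum(w * (m - grand) ** 2
                  for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / w_total) ** 2 / (n - 1)
              for w, n in zip(weights, ns))
    f_stat = between / (1 + 2 * (k - 2) / (k ** 2 - 1) * lam)
    df2 = (k ** 2 - 1) / (3 * lam)  # adjusted denominator df
    return f_stat, k - 1, df2

# MC test of subject-matter learning: English baseline, CLIL, Arabic NSs.
f_mc, df1_mc, df2_mc = welch_anova(
    means=[16.75, 13.00, 14.83],
    sds=[4.29, 2.22, 5.13],
    ns=[12, 12, 12],
)
print(round(f_mc, 2), df1_mc, round(df2_mc, 2))
```

The denominator degrees of freedom obtained this way are close to the 19.25 reported for the omnibus test, with the small gap attributable to rounding in the summary statistics.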
Table 4: Means and standard deviations for the MC test of subject-matter learning.

| Group | n | Mean | SD |
|---|---|---|---|
| English baseline | 12 | 16.75 | 4.29 |
| CLIL | 12 | 13.00 | 2.22 |
| Arabic NSs | 12 | 14.83 | 5.13 |
Figure 3: Group differences for the MC test scores for subject-matter learning.
Vocabulary knowledge
The first round of item analysis for the version for the English baseline and
CLIL groups (n = 24) revealed seven items with a zero or negative DI. After
deleting those items, α for this version rose from 0.91 to 0.92. The same
analysis for the Arabic version (n = 12) identified ten items with a zero or
negative DI. Those items were removed, α increasing from 0.82 to 0.90. The
second round of item analysis on both versions of the test found no addi-
tional items with a zero or negative DI. Total scores for the three experi-
mental groups were then calculated for the purpose of group comparisons.
Assumptions for the one-way ANOVA analysis were met, except that
the variances among the groups were unequal, as revealed by the results of
Levene’s test: F (2, 33) = 8.66, p = 0.00. Therefore, the Welch test was used,
the results showing that the omnibus test was significant, F (2, 17.04), p =
0.00. As shown in Table 5 and Figure 4, both baseline groups outperformed
the CLIL group. Dunnett’s T3 post-hoc test indicated that the differences
between the English baseline and CLIL, and Arabic baseline and CLIL,
groups were statistically significant, with a p-value of 0.00. The difference
between the English and Arabic NS groups was not statistically significant.
Table 5: Means and standard deviations for the cloze test of vocabulary learning.

| Group | n | Mean | SD |
|---|---|---|---|
| English baseline | 12 | 13.58 | 4.62 |
| CLIL | 12 | 1.50 | 1.68 |
| Arabic NSs | 12 | 16.42 | 7.01 |
Figure 4: Group differences for the cloze test scores for vocabulary learning.
Discussion
Results for the process variables were mixed. The sheer volume of input
in English baseline lessons was greater than in the CLIL lessons, but not
statistically significantly so, as measured both by the numbers of words and
clausal utterances in teacher speech. As predicted, the syntactic complexity
of teacher speech in the baseline condition was also greater, as measured
by s-nodes per clausal utterance, but again, not statistically significantly
so. These differences could be expected to become statistically significant
with larger samples. As it was, the small n-size limited the statistical power
to detect differences, a problem increased by the considerable variability
among teachers in the same condition, for example, in the case of clausal
utterances.
The lexical richness and diversity of teacher speech tended to be greater
in the English baseline lessons, as measured by number of types and
low-frequency types. These differences, together with the marginally
significant differences in numbers of low-frequency tokens and in the
low-frequency Guiraud’s Index measure, constitute a potentially troubling finding
for CLIL. As suggested by the early studies of Mackay (1986, 1993), Lynch
(1987) and Long and Ross (1993), continued over time, such patterns could
well indicate content dilution, with potentially important information lost
as non-native teachers addressing non-native students in the L2 default to
higher frequency lexical items and collocations (fall asleep → go to sleep,
French Brittany → dog, iron oxide → metal, plurality → majority, increase
→ get bigger, etc.) in their effort to maintain communication.
Unexpectedly, the number of target lexical items mentioned – 30/40
and 26/40 in the English baseline and CLIL lessons, respectively – and total
number of mentions of those items – eighty and seventy-six, respectively
– differed little. It seems that the PowerPoint slides did their job by provid-
ing the surrogate teachers with visual reminders of the key information to
be covered, but in doing so, inadvertently precluded a direct test of one
of the major hypotheses, that subject matter coverage would be diluted
in the CLIL condition. The pictures on a slide pertaining to key pieces of
information prompted mention of the information, at least. Whether that
would happen in real lessons, and if mentioned, how superficially or deeply
the information would be treated, remain open questions. The small difference
in numbers of mentions of key items is certainly an issue to be
examined more closely in larger scale process-product research on CLIL
programs, for again, as suggested by the Lynch (1987) study, key lexical
items and collocations tend to mark key pieces of subject-matter informa-
tion, and their absence to indicate subject-matter dilution. Input frequency,
moreover, is known to be a major factor in language learning. This was
reflected here in the finding that within conditions, the number of times a
word appeared in the lessons was a predictor of correct responses to the
cloze test item targeting that word and accounted for 32.7% and 21.3% of
the variance in cloze test scores within the English baseline and CLIL con-
ditions, respectively.
The English baseline groups outperformed the CLIL groups statisti-
cally significantly on the multiple-choice test of subject-matter learning,
and both the English and Arabic baseline groups outperformed the CLIL
groups statistically significantly on the vocabulary cloze test. They did so
despite the reduced number of test items militating against finding statistically
significant differences by limiting the total possible variability
in each case. The loss of items in both the MC and cloze outcome measures
was chiefly due to a lack of time in which to pilot either measure before its
use in the study for fear of reducing our pool of potential Arabic-speaking
participants for the CLIL condition, some of whom were soon to return to
their country of origin. Nevertheless, the English baseline advantage for
content learning, and both the English baseline and Arabic baseline advan-
tage for vocabulary scores, were statistically significant, in line with find-
ings of more recent studies, such as Dallinger and colleagues (2016), and
like them, suggest a serious need for further research before CLIL is hailed
as a success in either domain.
There were no pre-tests in the design for this study, so language learning
in the CLIL groups could only be assessed indirectly, and since the les-
son’s short duration precluded measurable grammatical or other linguistic
development, only in terms of post-test vocabulary cloze test scores. (The
need for a pre-test of content knowledge was obviated by the use of fictional
subject matter.) Students in the native speaker baseline groups would have
known all, or almost all, the target lexical items before the study began,
making the relevant data here the CLIL groups’ cloze test scores. The CLIL
groups’ mean was a mere 1.5 out of a possible 23 (after seven poor items
among the original thirty had been removed) – a meagre return even for
such a short lesson. A full-scale study will require pre-testing of whichever
linguistic abilities are to be targeted.
Although language learning was the original driver for CLIL in Europe,
results for subject-matter learning are potentially even more important
now that CLIL programs are becoming so common across the curricu-
lum in some countries, potentially placing curricular content at risk. In
research of larger scope and longer duration than the micro-study reported
here, the syntactic complexity and low-frequency lexical type diversity of
teacher speech and other classroom process variables offer potential means
of explaining differential subject-matter results (or the same results despite
50% more time being allocated) in CLIL compared with regular versions of
courses delivered in students’ native language(s).
Some results failed to achieve statistical significance, but those for
several instructional process variables and the two outcome measures gen-
erally conformed to predictions. The non-statistically significant findings
probably reflected one or more of at least four factors, each easy to modify
in future larger scale research:
1 the small numbers of teachers and students in the study,
2 considerable individual variation within teacher performance,
mostly due to the novelty of the task for student surrogates,
3 the relatively low number of items in the subject matter and vocab-
ulary tests, and especially,
4 the study’s brief duration.
The differences observed, it should be remembered, were all in the predicted
direction and were obtained on the basis of a single fifteen-minute
lesson. The magnitude of the same effects over a fifteen-week semester or a
whole school year of forty-five weeks of three or four hours of instruction
per week with larger numbers of real teachers and students of the subject
matter concerned, and better tests, could be expected to be far greater.
To reiterate, however, whatever the outcomes had been, the purpose of
this micro-study was not to generalise findings to genuine CLIL programs.
Rather, the aim was to assess the potential of process-product laboratory
studies, eventually paired with classroom studies of authentic CLIL pro-
grams employing the same basic design and methodology, as one compo-
nent among many in future research on their results and relative efficacy.
Limitations and conclusions

Given the fact that CLIL programs of various kinds and scope are spread-
ing fast, including in high schools and universities, and already involve
large numbers of students in many parts of the world, there is a pressing
need for evaluations both of their absolute effectiveness and of their rela-
tive effectiveness in comparison with the same content courses delivered
using teachers’ and students’ native languages. Delivery of the equivalent
content course through the L1 may not be a realistic option in multilingual
urban primary and secondary school settings where immigration has led
to the presence of multiple L1s in the same classroom and made English an
attractive alternative; however, as noted earlier, use of the students’ native
language remains an option in many parts of the world, for example, in the
Middle East, where CLIL is increasingly being used with a student body
that remains linguistically and ethnically homogeneous. Are those students
really being well served by CLIL? Is their command of English improving
as much as would be the case through traditional foreign language courses,
and is their subject-matter learning being negatively affected?
As with most educational research, a variety of methodological
approaches will be required, both qualitative and quantitative, each with
unique insights to contribute. Controlled process-product laboratory
studies of the kind illustrated, but on a larger scale and of longer duration,
offer potential for avoiding confounding variables and, crucially, for
using the instructional process findings to explain language-learning and
subject-matter learning outcomes. As detailed elsewhere (Long, 2015b),
laboratory studies should precede, but eventually be paired with, studies
of the same populations, employing a similar design and most of the same
research methodology, in authentic classroom settings. If causal relationships
cannot be established under controlled laboratory conditions, it is
unreasonable and, some would argue, premature to implement the model
in real-world classrooms, in just the same way that it would be dangerous
to release new drugs for public use before they had been tested and shown
to work in controlled laboratory trials.
Several limitations of the pilot study reported here have already been
noted, but at least four are worth reiterating. First, and most obviously,
as this was a micro-study to ascertain the feasibility and potential foci of
quasi-experimental process-product work on CLIL with college-age learn-
ers, the scope and duration of the research were both tiny. Second, while
tolerably realistic, the subject matter was invented, so as to preclude the
possibility of a confound in the form of prior student knowledge. Third, the
participants were not real teachers or students (of this subject matter), but
surrogates recruited for the purpose of the study. Genuine subject-matter
teachers using material with which they are familiar and for which they are
trained and experienced, working with students with whom they are famil-
iar, and with whose command of the subject matter and the L2 they are
also familiar, might very well produce different results. Over the course of
a semester or school year, some could be expected to deal with many prob-
lems, both linguistic and with regard to content. Whether they could do so
while covering the regular curriculum adequately in the same amount of
time, however, remains a serious open question. Fourth, as with any study,
outcome measures of both language learning and subject-matter learning
of greater breadth and depth, and with previously established item reli-
ability, would enhance the validity of findings and simultaneously increase
their potential for revealing statistically significant differences between
conditions if the differences we predict turn out to be real.
About the authors

Michael H. Long is Professor of SLA at the University of Maryland (USA). He has served
on the editorial boards of many journals and was co-editor of the Cambridge Applied
Linguistics Series for its first 20 years. Recent publications include The Handbook of
Second Language Acquisition (Blackwell, 2003), Second Language Needs Analysis (CUP,
2005), Problems in SLA (Erlbaum, 2007), The Handbook of Language Teaching (Wiley-
Blackwell, 2009), Sensitive Periods, Language Aptitude, and Ultimate L2 Attainment
(John Benjamins, 2013) and Second Language Acquisition and Task-Based Language
Teaching (Wiley-Blackwell, 2015). In 2009, he was awarded a doctorate honoris causa
by Stockholm University for his contributions to the field of SLA.
Assma Al Thowaini is a PhD candidate in the Second Language Acquisition program
at the University of Maryland and a lecturer at King Saud University. Her research
interests include instructed second language acquisition, explicit and implicit language
learning, computer-mediated communication (CMC) in language learning, and Arabic
word recognition and lexical access. ama267@umd.edu
Buthainah Al Thowaini is a PhD candidate in the Second Language Acquisition
program at the University of Maryland, and a faculty member at King Saud University,
Riyadh. Along with instructed SLA, her research interests include lexical and morpho-
logical processing of Arabic, and the relationships between lexical access processes,
language learning, and higher cognitive functions. Bma3@umd.edu
Jiyong Lee is a PhD student in the Second Language Acquisition program at the Uni-
versity of Maryland. Her research interests include negative feedback, age effects and
maturational constraints in SLA, Korean phonology and morphosyntax, the validation
of task complexity manipulations, and relationships among task complexity, language
aptitude, and L2 performance. jlee0123@umd.edu
Payman Vafaee is a Lecturer in Applied Linguistics at Teachers College of Columbia
University. He earned his PhD in SLA from the University of Maryland with a disserta-
tion on the relative contributions of lexical and syntactic knowledge in L2 listening
comprehension. His research interests include L2 assessment and testing, quantita-
tive research methodology, instructed SLA, and cognitive individual differences. He
has published in several journals, including Studies in Second Language Acquisition,
Working Papers in TESOL & Applied Linguistics, and International Journal of Language
Testing. pv2203@tc.Columbia.edu
Appendix 1: subject matter for the lesson (excerpts)
The Kiriboe: fact or fiction?

Name: Kiriboe
Population: A total of about 1,000 individuals, divided into about twenty family
groups, each spanning several generations.
Living arrangements: All generations of a family live together in a single long,
rectangular hut made of wood and tree branches and set about ten feet above ground
on stilts.
Social structure and life-style: A matriarchal society of hunter-gatherers. Men have
primary responsibility for building and maintaining the family shelter, for cooking
and for childcare. When women die, their property passes to their daughters, or in the
absence of daughters, to their nieces. Decision-making authority is vested in women
elders, typically grandmothers and great-grandmothers. Younger women leave the
tribe for two or three days at a time to forage for edible plants and fruit, and to hunt,
using spears and poison darts fired from blow-guns.
Food: Monkeys and small mammals, small reptiles, coconuts, honey, bananas,
mangos, oranges and other fruit, wild plants and herbs.
Tools and weapons: Knives fashioned from stones and animal bones, spears made
from sharpened tree branches, blow-pipes made from hollowed bamboo, used to fire
darts tipped with a lethal plant-based poison.
Language: Kiriboese. Kiriboese can be used to describe past and present events and
situations, and has a rich array of colour terms and numerous terms for different kinds
of family members, but it has no words for counting and no means for referring to the
future. The Kiriboe believe talking about the future brings bad luck.
Appendix 2
Lexeme Frequency
ancestral 2,362
paddle 2,305
edible 2,066
dart 1,877
poisonous 1,868
reptile 1,849
trudge 1,849
forage 1,700
repel 1,604
slay 1,508
taboo 1,488
unheard 1,481
zebra 1,405
tributary 1,404
fight off 1,127
fermented 1,076
linguist 991
machete 871
encroach 831
songbird 748
great-grandmother 723
stilt 708
childcare 617
shrunken 443
stereotypically 185
matriarchal 139
blow dart 11
dart blower 0
Appendix 3: total mentions of the 40 target items, and number not mentioned
Item # Target Word English CLIL
1 indigenous 1 2
2 explorer 4 6
3 jungle 1 5
4 sceptical 4 2
5 add up 0 2
6 paddling 5 2
7 tributary 2 1
8 dense 2 0
9 machete 3 3
10 spanning 1 0
11 stilts 2 1
12 matriarchal 11 4
13 shelter 0 2
14 absence 0 0
15 vested 0 0
16 forage 2 0
17 poison darts 3 9
18 partner 0 2
19 approved 1 3
20 reptiles 2 2
21 spears 4 4
22 hollowed 1 1
23 tipped 0 2
24 array 1 0
25 pets 3 1
26 wood carvings 2 1
27 play with 3 2
28 consume 1 0
29 ancestor 9 8
30 skulls 2 6
31 offerings 2 0
32 fought them off 1 3
33 encroached 1 0
34 come to the attention of 0 0
35 neighbouring 1 1
36 gone undetected 0 0
37 achieve 0 0
38 rituals 1 0
39 slain 0 1
40 artifacts 4 0
Total mentions of the 40 target items/numbers of items not mentioned 80/10 76/14
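The totals in the final row can be checked with a short tally. What follows is a minimal Python sketch, not part of the original study: the counts are transcribed from the table above, and the names MENTIONS and summarise are hypothetical, introduced only for illustration.

```python
# Mentions of each of the 40 target items, transcribed from Appendix 3.
# Each tuple is (target item, English baseline mentions, CLIL mentions).
# MENTIONS and summarise are illustrative names, not from the study.
MENTIONS = [
    ("indigenous", 1, 2), ("explorer", 4, 6), ("jungle", 1, 5),
    ("sceptical", 4, 2), ("add up", 0, 2), ("paddling", 5, 2),
    ("tributary", 2, 1), ("dense", 2, 0), ("machete", 3, 3),
    ("spanning", 1, 0), ("stilts", 2, 1), ("matriarchal", 11, 4),
    ("shelter", 0, 2), ("absence", 0, 0), ("vested", 0, 0),
    ("forage", 2, 0), ("poison darts", 3, 9), ("partner", 0, 2),
    ("approved", 1, 3), ("reptiles", 2, 2), ("spears", 4, 4),
    ("hollowed", 1, 1), ("tipped", 0, 2), ("array", 1, 0),
    ("pets", 3, 1), ("wood carvings", 2, 1), ("play with", 3, 2),
    ("consume", 1, 0), ("ancestor", 9, 8), ("skulls", 2, 6),
    ("offerings", 2, 0), ("fought them off", 1, 3), ("encroached", 1, 0),
    ("come to the attention of", 0, 0), ("neighbouring", 1, 1),
    ("gone undetected", 0, 0), ("achieve", 0, 0), ("rituals", 1, 0),
    ("slain", 0, 1), ("artifacts", 4, 0),
]


def summarise(column):
    """Return (total mentions, items never mentioned) for one condition.

    column is 1 for the English baseline and 2 for the CLIL condition.
    """
    counts = [row[column] for row in MENTIONS]
    return sum(counts), sum(1 for c in counts if c == 0)


english_total, english_missing = summarise(1)
clil_total, clil_missing = summarise(2)
print(f"English baseline: {english_total} mentions, {english_missing} items not mentioned")
print(f"CLIL: {clil_total} mentions, {clil_missing} items not mentioned")
```

Run as written, the tally reproduces the 80/10 and 76/14 reported in the final row of the table.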
Note
1 COCA is a free online corpus that contains more than 520 million words, compiled
from 1990 to 2015. It consists of spoken texts, fiction, popular magazines, newspa-
pers and academic texts.
References
Aguilar, M. and Muñoz, C. (2014) The effect of proficiency on CLIL benefits in engi-
neering students in Spain. International Journal of Applied Linguistics 24(1): 1–18.
https://doi.org/10.1111/ijal.12006
Basterrechea, M. and García Mayo, P. (2013) Language-related episodes during collab-
orative tasks: a comparison of CLIL and EFL learners. In K. McDonough and A.
Mackey (eds) Second Language Interaction in Diverse Educational Contexts 25–43.
Philadelphia: John Benjamins. https://doi.org/10.1075/lllt.34.05ch2
Bruton, A. (2011a). Are the differences between CLIL and non-CLIL groups in Andalu-
sia due to CLIL? A reply to Lorenzo, Casal and Moore (2010). Applied Linguistics
32(2): 236–41. http://applij.oxfordjournals.org/content/32/2/236.abstract; https://
doi.org/10.1093/applin/amr007
Bruton, A. (2011b) Is CLIL so beneficial, or just selective? Re-evaluating some of the
research. System 39(4): 523–32. https://doi.org/10.1016/j.system.2011.08.002
Bruton, A. (2013) CLIL: Some of the reasons why … and why not. System 41(3): 587–97.
https://doi.org/10.1016/j.system.2013.07.001
Bruton, A. (2015). CLIL: detail matters in the whole picture. More than a reply to J.
Huttner and U. Smit (2014). System 53: 119–28. https://doi.org/10.1016/j.
system.2015.07.005
Cenoz, J., Genesee, F. and Gorter, D. (2014) Critical analysis of CLIL: taking stock and
looking forward. Applied Linguistics 35(3): 243–62. https://doi.org/10.1093/applin/
amt011
Chaudron, C. (1982) Vocabulary elaboration in teachers’ speech to L2 learners. Studies
in Second Language Acquisition 4(2): 170–80. https://doi.org/10.1017/
S027226310000440X
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd edn).
Mahwah, NJ: Lawrence Erlbaum.
Costa, F. (2012) Focus on form in ICLHE lectures in Italy. Evidence from English-
medium science lectures by native speakers of Italian. AILA Review 25(1): 30–47.
http://www.jbe-platform.com/content/journals/10.1075/aila.25.03cos
Coyle, D. (2008) CLIL – a pedagogical approach from the European perspective. In N.
van Dusen-Scholl and N. Hornberger (eds) Encyclopedia of Language and Education
Vol. 4 97–110. Berlin: Springer. https://doi.org/10.1007/978-0-387-30424-3_92
Coyle, D., Hood, P. and Marsh, D. (2010). Content and Language Integrated Learning.
Cambridge: Cambridge University Press.
Cummins, J. (1998) Immersion education for the millennium: what have we learned
from 30 years of research on second language immersion? In M. R. Childs and R. M.
Bostwick (eds) Learning Through Two Languages: Research and Practice 34–47. Shi-
zuoka: Katoh Gakuen.
Cummins, J. (2009) Bilingual and immersion programs. In M. H. Long and C. J. Doughty
(eds) The Handbook of Language Teaching 161–81. Oxford: Wiley-Blackwell.
https://doi.org/10.1002/9781444315783.ch10
Dallinger, S., Jonkmann, K., Hollm, J. and Fiege, C. (2016) The effect of content and
language integrated learning on students’ English and history competences – killing
two birds with one stone? Learning and Instruction 41: 23–31. https://doi.
org/10.1016/j.learninstruc.2015.09.003
Dalton-Puffer, C. (2008) Outcomes and processes in CLIL: current research from
Europe. In W. Delanoy and L. Volkmann (eds) Future Perspectives for English
Language Teaching 139–57. Heidelberg: Carl Winter.
Dalton-Puffer, C. (2011) Content-and-language integrated learning: from practice to
principles? Annual Review of Applied Linguistics 31: 182–204. https://doi.
org/10.1017/S0267190511000092
Dalton-Puffer, C., Llinares, A., Lorenzo, F. and Nikula, T. (2013) ‘You can stand under
my umbrella’: immersion, CLIL and bilingual education. A response to Cenoz, Gen-
esee and Gorter. Applied Linguistics 35(2): 213–18. http://applij.oxfordjournals.org/
content/35/2/213.abstract; https://doi.org/10.1093/applin/amu010
Ellis, N. C. and Wulff, S. (2015) Usage-based approaches to SLA. In B. VanPatten and J.
Williams (eds) Theories in Second Language Acquisition. An Introduction (2nd edn)
75–93. New York: Routledge.
Genesee, F. (1995) The Canadian second language immersion program. In O. Garcia
and C. Baker (eds) Policy and Practice in Bilingual Education 118–33. Clevedon:
Multilingual Matters.
Hoyer, W. J. and Lincourt, A. E. (1998) Ageing and the development of learning. In M.
A. Stadler and P. A. Frensch (eds) Handbook of Implicit Learning 445–70. Thousand
Oaks: Sage.
Hüttner, J. and Smit, U. (2014) CLIL (content and language integrated learning): the
bigger picture. A reply to A. Bruton (2013) System 41: 587–97. https://doi.
org/10.1016/j.system.2014.03.001
Hüttner, J., Dalton-Puffer, C. and Smit, U. (2013) The power of beliefs: lay theories and
their influence on the implementation of CLIL programmes. International Journal
of Bilingual Education and Bilingualism 16(3): 267–84. https://doi.org/10.1080/136
70050.2013.777385
Janacsek, K., Fiser, J. and Nemeth, D. (2012) The best time to acquire new skills: age-
related differences in implicit sequence learning across the human lifespan. Develop-
mental Science 15(4): 496–505. https://doi.org/10.1111/j.1467-7687.2012.01150.x
Jäppinen, A.-K. (2005) Thinking and content learning of mathematics and science as
cognitional development in content and language integrated learning (CLIL): teach-
ing through a foreign language in Finland. Language and Education 19(2): 148–69.
https://doi.org/10.1080/09500780508668671
Järvinen, H. M. (2007). Language in language and content integrated learning (CLIL).
In D. Marsh and D. Wolff (eds) Diverse Contexts – Converging Goals: CLIL in Europe
253–60. Bern: Peter Lang.
Johnson, R. K. and Swain, M. (1997) Immersion Education: International Perspectives.
Cambridge University Press. https://doi.org/10.1017/CBO9781139524667
Lambert, W. E. and Tucker, G. R. (1972) Bilingual Education of Children: The St. Lam-
bert Experiment. Newbury House.
Lasagabaster, D. and Ruiz de Zarobe, Y. (2010) CLIL in Spain: Implementation, Results,
and Teacher Training. Newcastle: Cambridge Scholars.
Lasagabaster, D. and Sierra, J. M. (2010) Immersion and CLIL in English: more differ-
ences than similarities. ELT Journal 64(4): 367–75. http://eltj.oxfordjournals.org/
content/64/4/367.abstract; https://doi.org/10.1093/elt/ccp082
Llinares, A. and Lyster, R. (2014) The influence of context on patterns of corrective
feedback and learner uptake: a comparison of CLIL and immersion classrooms.
Language Learning Journal 42(2): 181–94. https://doi.org/10.1080/09571736.2014.8
89509
Long, M. H. (1983). Linguistic and conversational adjustments to non-native speakers.
Studies in Second Language Acquisition 5(2): 177–93.
Long, M. H. (1984). Process and product in ESL program evaluation. TESOL Quarterly
18(3): 409–25.
Long, M. H. (2015a). Second Language Acquisition and Task-Based Language Teaching.
Oxford: Wiley-Blackwell.
Long, M. H. (2015b). Experimental perspectives on classroom interaction. In N. Markee
(ed.) Handbook of Classroom Discourse and Interaction 60–73. Oxford:
Wiley-Blackwell.
Long, M. H. and Ross, S. (1993) Modifications that preserve language and content. In
M. Tickoo (ed.) Simplification: Theory and Application 29–52. Singapore: SEAMEO
Regional Language Centre.
Long, M. H. and Ross, S. (2009) Input elaboration: a viable alternative to ‘authentic’ and
simplified texts. In K. Namai and Y. Fukada (eds) Toward the Fusion of Language,
Culture and Education: From the Perspectives of International and Interdisciplinary
Research. A Festschrift for Yasukata Yano 307–25. Tokyo: Kaitakusha.
Lorenzo, F., Casal, S. and Moore, P. (2010) The effects of content and language inte-
grated learning in European education: key findings from the Andalusian bilingual
sections evaluation project. Applied Linguistics 31(3): 418–42. http://applij.oxford-
journals.org/content/31/3/418.short; https://doi.org/10.1093/applin/amp041
Lorenzo, F., Moore, P. and Casal, S. (2011) On complexity in bilingual research: the
causes, effects, and breadth of content and language integrated learning – a reply to
Bruton (2011). Applied Linguistics 32(4): 450–5. http://applij.oxfordjournals.org/
content/32/4/450.short; https://doi.org/10.1093/applin/amr025
Lynch, T. (1987). Modifications to Foreign Listeners: The Stories Teachers Tell. ERIC
Document ED 274 255. Center for Applied Linguistics. http://eric.ed.
gov/?id=ED274225
Lyster, R. (2007) Learning and Teaching Languages through Content: A Counterbal-
anced Approach. Amsterdam: John Benjamins. https://doi.org/10.1075/lllt.18
Mackay, R. (1986) The Role of English in Education in an Eastern Arctic School: An
Account of Success and Failure in the Canadian Arctic. PhD dissertation, L’Université
de Montréal.
Mackay, R. (1993) Embarrassment and hygiene in the classroom. ELT Journal 47(1):
32–39. http://eltj.oxfordjournals.org/content/47/1/32.abstract; https://doi.org/
10.1093/elt/47.1.32
Madrid, D. (2011) Monolingual and bilingual students’ competence in social sciences.
In D. Madrid and S. Hughes (eds) Studies in Bilingual Education 195–222. Bern:
Peter Lang.
Marsh, D. (2002) CLIL/EMILE – The European Dimension: Actions, Trends and Fore-
sight. Potential Public Services Contract DG EAC. European Commission.
Mauranen, A. (2012). Second-order language contact: English as an academic lingua
franca. In M. Filppula, J. Klemola and D. Sharma (eds) The Oxford Handbook of
World Englishes. Oxford: Oxford University Press.
Muñoz, C. (2015) Time and timing in CLIL. In M. Juan-Garau and J. Salazar-Noguera
(eds) Content-Based Learning in Multilingual Educational Environments 87–102.
Berlin: Springer.
Muñoz, C. and Navés, T. (2007) Spain. In A. Maljers, D. Marsh, and D. Wolff (eds) Win-
dows on CLIL 160–5. Graz, Austria: European Centre for Modern Languages.
Navés, T. and Victori, M. (2010) CLIL in Catalonia: an overview of research studies. In
D. Lasagabaster and Y. Ruiz de Zarobe (eds) CLIL in Spain: Implementation, Results
and Teacher Training 30–54. Newcastle: Cambridge Scholars Publishing.
Nemeth, D., Janacsek, K. and Fiser, J. (2013) Age-dependent and coordinated shift in
performance between implicit and explicit skill learning. Frontiers in Computa-
tional Neuroscience 7(147): 1–13. https://www.ncbi.nlm.nih.gov/pubmed/24155717;
https://doi.org/10.3389/fncom.2013.00147
Pérez-Cañado, M. (2012) CLIL research in Europe: past, present, and future.
International Journal of Bilingual Education and Bilingualism 15(3): 315–41.
https://doi.org/10.1080/13670050.2011.630064
Pladevall-Ballester, E. and Vallbona, A. (2016) CLIL in minimal input contexts: a longi-
tudinal study of primary school learners’ receptive skills. System 58: 37–48. https://
doi.org/10.1016/j.system.2016.02.009
Rastelli, S. (2014) Discontinuity in Second Language Acquisition. The Switch between
Statistical and Grammatical Learning. Multilingual Matters.
Rebuffot, J. (1998) Aspects récents de l’immersion en français au Canada vers le renou-
vellement de la pédagogie immersive. In J. Arnau and J. Artigal (eds) Immersion
Programmes: A European Perspective 685–92. Barcelona: Publicacions de la Univer-
sitat de Barcelona.
Ruiz de Zarobe, Y. and Lasagabaster, D. (2010) Introduction. The emergence of CLIL in
Spain. An educational challenge. In D. Lasagabaster and Y. Ruiz de Zarobe (eds)
CLIL in Spain: Implementation, Results and Teacher Training ix–xvii. Newcastle:
Cambridge Scholars.
Rumlich, D. (2013) Student general English proficiency prior to CLIL: empirical evi-
dence for substantial differences between prospective CLIL and non-CLIL students
in Germany. In S. Breidbach and B. Viebrock (eds) Content and Language-Integrated
Learning (CLIL) in Europe: Research Perspectives on Policy and Practice 181–201.
Bern: Peter Lang.
Rumlich, D. (2014). Prospective CLIL and non-CLIL students’ interest in English
(classes): a quasi-experimental study on German sixth graders. In R. Breeze, C. Mar-
tínez Pasamar, C. Llamas Saíz and C. Tabernero Sala (eds) Integration of Theory and
Practice in CLIL. Utrecht Studies in Language and Communication 75–96. Amster-
dam and New York: Rodopi.
Rumlich, D. (2016). Evaluating Bilingual Education in Germany: CLIL Students’ Gen-
eral English Proficiency, EFL Self-Concept and Interest. Bern: Peter Lang. https://doi.
org/10.3726/978-3-653-06460-5
Seikkula-Leino, J. (2007). CLIL learning: achievement levels and affective factors. Lan-
guage and Education 21(4): 328–41. https://doi.org/10.2167/le635.0
Shintani, N. (2011). A comparative study of the effects of input-based and production-
based instruction on vocabulary acquisition by young EFL learners. Language
Teaching Research 15(2): 137–58. http://ltr.sagepub.com/content/15/2/137; https://
doi.org/10.1177/1362168810388692
Slobodanka, D., Hultgren, A. K. and Jensen, C. (2015). English-Medium Instruction in
European Higher Education. Series: Language and Social Life 4. Berlin: De Gruyter
Mouton.
Smit, U. (2010). English as a Lingua Franca in Higher Education. A Longitudinal Study
of Classroom Discourse. Berlin: De Gruyter. https://doi.org/10.1515/9783110215519
Swain, M. (1991). French immersion and its off-shoots: getting two for one. In B. F.
Freed (ed.) Foreign Language Acquisition Research and the Classroom 91–103. Lex-
ington, MA: D. C. Heath.
Wilkinson, R. and Walsh, M. L. (2015). Integrating Content and Language in Higher
Education. From Theory to Practice. Selected papers from the 2003 ICLHE Conference.
Berlin: Peter Lang. https://doi.org/10.3726/978-3-653-05109-4
