
Teaching and Teacher Education 88 (2020) 102971

Contents lists available at ScienceDirect

Teaching and Teacher Education


journal homepage: www.elsevier.com/locate/tate

Professional development on fostering students’ academic language proficiency across the curriculum: A meta-analysis of its impact on teachers’ cognition and teaching practices
Eva Kalinowski a,*, Franziska Egert b, Anna Gronostaj a,1, Miriam Vock a
a Universität Potsdam, Empirische Unterrichts- und Interventionsforschung, Karl-Liebknecht-Str. 24-25, 14476 Potsdam, Germany
b Katholische Stiftungshochschule München, Don-Bosco-Str. 1, 83671 Benediktbeuern, Germany

Highlights

• Focuses teacher training on promoting academic language support in content areas.
• Effects of 11 training interventions for in-service teachers are aggregated.
• Meta-analyses reveal effects on teachers’ cognition and teachers’ practices.
• Methodological quality of studies moderates the effect magnitude.

Article info

Article history:
Received 25 November 2018
Received in revised form 23 September 2019
Accepted 7 November 2019
Available online 2 December 2019

Keywords:
Professional development
Language
Cross-curriculum
Content areas
In-service teacher training

Abstract

This meta-analysis aggregates effects from 10 studies evaluating professional development interventions aimed at qualifying in-service teachers to support their students in mastering academic language skills while teaching their respective subject areas. The analysis of a subset of studies revealed a small non-significant weighted training effect on teachers’ cognition (g’ = 0.21, SE = 0.14). An effect aggregation including all studies (with 650 teachers) revealed a medium to large weighted overall effect on teachers’ classroom practices (g’ = 0.71, SE = 0.16). Methodological variables moderated the effect magnitude. Nevertheless, the results suggest professional development is beneficial for improving teachers’ practice.

© 2019 Elsevier Ltd. All rights reserved.
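The abstract reports each pooled effect as a weighted mean g’ with its standard error. As a rough arithmetic check on why the cognition effect is described as non-significant while the practice effect is not, the sketch below forms normal-theory 95% confidence intervals (g’ ± 1.96·SE) and two-sided z-test p-values from the two reported pairs. It assumes a simple Wald-type test on the pooled estimate; the procedure actually used by the authors’ software may differ, and the function name `wald_summary` is hypothetical.

```python
from statistics import NormalDist


def wald_summary(g: float, se: float, level: float = 0.95):
    """Return (lower, upper, z, p) for a pooled effect g with standard error se.

    Assumption: the pooled estimate is approximately normal (Wald-type test).
    This is an illustrative check, not the paper's exact analysis.
    """
    std_normal = NormalDist()                     # mean 0, sd 1
    z_crit = std_normal.inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    lower, upper = g - z_crit * se, g + z_crit * se
    z = g / se
    p = 2 * (1 - std_normal.cdf(abs(z)))          # two-sided p-value
    return lower, upper, z, p


# Pooled estimates (g', SE) as reported in the abstract.
for label, g, se in [("cognition", 0.21, 0.14), ("practice", 0.71, 0.16)]:
    lower, upper, z, p = wald_summary(g, se)
    print(f"{label}: 95% CI [{lower:.2f}, {upper:.2f}], z = {z:.2f}, p = {p:.3g}")
```

With these inputs the interval for teachers’ cognition includes zero, whereas the interval for classroom practices lies entirely above zero, which matches the wording of the abstract.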

* Corresponding author.
E-mail addresses: eva.kalinowski@uni-potsdam.de (E. Kalinowski), franziska.egert@ksh-m.de (F. Egert), anna.gronostaj@deutsche-schulakademie.de (A. Gronostaj), miriam.vock@uni-potsdam.de (M. Vock).
1 Author’s affiliation has changed since the work described in the article was conducted. Present address: Die Deutsche Schulakademie gGmbH, Hausvogteiplatz 12, 10117 Berlin, Germany.

https://doi.org/10.1016/j.tate.2019.102971
0742-051X/© 2019 Elsevier Ltd. All rights reserved.

1. Introduction

The ability to use the type of language used in school is considered a major prerequisite for educational success (Schleppegrell, 2004). This appears problematic in light of the number of students who have difficulty with the language used at school. For example, in the US, English language learners (ELLs) alone, whose native language is not English or whose level of English proficiency is substantially influenced by a language other than English, make up approximately 10% of all public school students (McFarland et al., 2018). In fact, there is a persistent achievement gap in educational attainment between such students and those who have access to the language of instruction in their home contexts. This also applies to children from socially disadvantaged families (McFarland et al., 2018; OECD, 2016). To close these gaps, policies and concepts calling on teachers to integrate language support into their subject area teaching have been established (see e.g., Carnegie Council on Advancing Adolescent Literacy, 2010; Cheuk, 2016; Van Roekel, 2011). Considerable efforts and investments in teacher professional development (PD) are made to qualify teachers for this challenging task (see e.g., National Clearinghouse for English Language Acquisition, 2017; U.S. Department of Education, 2017; and Education through Speaking and Writing [BiSS], 2014, in Germany). In most US states and in
other countries as well, teachers are required to participate in some amount of PD, e.g., to maintain their teaching licenses (Hoffman & Harris, 2018; OECD, 2009). PD endeavors are rooted in the assumption that they benefit teachers’ competence development and improve teaching quality. Although there is evidence that teacher PD can foster positive change (Hattie, 2009), it is not necessarily effective in all cases (Jacob & McGovern, 2015; Wei, Darling-Hammond, Andree, Richardson, & Orphanos, 2009). Instead, PD success depends on various factors, such as characteristics of the PD intervention and individual school and teacher characteristics, and can differ substantially across domains (Lipowsky, 2014; Timperley, Wilson, Barrar, & Fung, 2007). Nevertheless, policymakers and school leaders need to ensure that available PD opportunities are effective (OECD, 2009). In the specific field of language development in academic content classes, individual PD programs have been evaluated, but to our knowledge no comprehensive overview of all existing evaluation studies in this specific field has been available until now. Thus, the aggregate effect of such PD initiatives is unclear. The present study was conducted to close this research gap and to investigate what quantitative studies tell us about the effectiveness of PD aimed at fostering students’ language proficiency across the curriculum and, more specifically, about the effect such PD has on teachers.

1.1. Language proficiency in teaching and learning processes

Language is an essential medium for teaching and learning. Whereas language skills used in everyday or home life also play a role at school, typical tasks at school, such as analyzing a text, explaining a phenomenon, or writing an essay, require proficiency in academic language (Cummins, 2008; Francis, Rivera, Lesaux, Kieffer, & Rivera, 2006; Schleppegrell, 2001, 2009), “the language used in schooling for purposes of learning” (Schleppegrell, 2009, p. 3). Academic language is characterized by specific grammatical, lexical, and discursive features associated with academic contexts; it involves mastery of written and oral discourse and encompasses literacy focused on higher-order skills rather than basic or technical skills, such as decoding single words (Fillmore & Snow, 2000; Scarcella, 2003; Schleppegrell, 2001). Anstrom et al. (2010) comprehensively describe different conceptualizations of academic language.

Studies indicate that academic language poses more difficulties than everyday language for all learners. Furthermore, students’ sociocultural backgrounds seem to affect their language proficiency (Heppt, Henschel, & Haag, 2016; Townsend, Filippini, Collins, & Biancarosa, 2012; Uccelli, Galloway, Barr, Meneses, & Dobbs, 2015): Students whose language experiences at home are not aligned with those at school are less likely to meet the requirements essential for school success. This particularly concerns students from low socioeconomic backgrounds and other students growing up in an environment where a language other than the one used in educational institutions is dominant, such as ELLs. Language proficiency, and academic language proficiency in particular, affects students’ academic performance (Gogolin & Lange, 2011; Heppt et al., 2016; Townsend et al., 2012), yet academic language skills are usually not taught explicitly in schools but rather implicitly presupposed (Brisk & Zisselsberger, 2011; Feilke, 2012; Schleppegrell, 2001; Schmölzer-Eibinger, 2013). These students are therefore at particular risk of low academic achievement (McFarland et al., 2018; Stanat, Weirich, & Radmann, 2012).

The fact that teachers usually do not explicitly teach the language skills fundamental for learning at school goes hand in hand with most teachers being ill-prepared for this challenging task (Bunch, 2013; Quintero & Hansen, 2017; Samson & Collins, 2012). To make content accessible to all students and reduce the above-mentioned risk, teachers should provide directed support to students at all grade levels in elementary and high schools and from all socioeconomic and linguistic backgrounds, especially to students who do not encounter academic language at home. To better account for the interconnected development of language and content knowledge and to avoid segregation, many policies and researchers recommend instructional approaches that integrate academic language and literacy instruction into subject-area teaching (Becker-Mrotzek, Schramm, Thürmann, & Vollmer, 2012; Chamot & O’Malley, 1987; Fillmore & Snow, 2000; Lucas & Villegas, 2011; U.S. Department of Education, 2016; Valdés, Kibler, & Walqui, 2014). Corresponding programs and concepts have been developed in recent years. These programs are based on the simultaneous learning of academic language and subject-area content, thus making content accessible to all students. Effective teacher PD in this field is essential in order to qualify teachers.

1.2. Effectiveness of teacher professional development

The term PD is used inconsistently in the literature. In this study, it “designates any purposeful, to some extent face-to-face, formalized and organized learning and/or training opportunity for in-service teachers” (Kalinowski, Gronostaj, & Vock, 2019, p. 3).

Researchers, particularly in the US, have started to examine the effectiveness of PD specifically focusing on language support in subject-area teaching. Parts of this evaluation literature have already been summarized. For instance, Knight and Wiseman (2005) examined several peer-reviewed US studies on PD for teachers of (linguistically) diverse students. They assumed that teacher PD that is not tailored to the specific target group might be inappropriate for diverse student groups and concluded that research on such PD, and evidence for determining the effectiveness of different PD approaches, were lacking. A few years later, Bunch (2013) focused on the pedagogical language knowledge mainstream teachers of English learners should possess and found research claims regarding outcomes to be quite varied and limited. DiCerbo, Anstrom, Baker, and Rivera (2014) reviewed the literature on teaching academic English to ELLs, including a few studies on teacher PD, and concluded that the reviewed studies suggest beneficial impacts of high-quality PD. Zhang (2014) synthesized 25 studies on PD for teachers of ELLs and found that these studies primarily reported successes. Similarly, a systematic review by Kalinowski, Gronostaj, and Vock (2019) of teacher PD to foster students’ academic language proficiency in content area classes found some degree of positive impact for all evaluated interventions, although much of the primary research was based on small sample sizes. To date, none of the existing research syntheses has quantified, in a meta-analytic way, the effects of PD interventions that specifically focus on integrating academic language and literacy instruction into subject-area teaching.

1.2.1. Outcomes of professional development

Generally, the outcome levels and constructs used in PD evaluation studies vary greatly. Teacher PD effectiveness is usually measured on one or more of the following levels: (1) teachers’ satisfaction with or acceptance of the PD intervention; (2) teachers’ cognition, i.e., changes in knowledge, beliefs, motivation, etc.; (3) teachers’ classroom practices; and (4) student achievement and development (e.g., Guskey, 2000; Lipowsky & Rzejak, 2015; Wade, 1984). Across various domains, teacher PD programs have been found to have an impact on the student level, which can be considered the ultimate goal of teacher PD (Birman, Desimone, Porter, & Garet, 2000). In his meta-analysis, Hattie (2009) computed a substantial overall teacher PD effect of d = 0.62. However, meta-analytical evidence on the effects of PD varies
considerably across domains, such as literacy and science (e.g., Timperley et al., 2007). We present the results of primary studies quantitatively evaluating PD aimed at qualifying teachers to foster students’ language and literacy skills in subject areas according to the four aforementioned levels as follows.

Level 1: Satisfaction and acceptance. Some evaluation studies have examined teachers’ reactions to and satisfaction with the intervention in question descriptively. They mostly reported positive teacher feedback (e.g., He, Prater, & Steed, 2011; McIntyre, Kyle, Chen, Muñoz, & Beldon, 2010). However, it is generally not possible to quantify effects on this level in a (quasi-)experimental way, because it is not meaningful to compare teachers’ assessment of a given PD program with their assessment before the intervention or with that of non-participants.

Level 2: Teachers’ cognition. With respect to cognition, there are several areas that one would expect to be affected by PD interventions focused on fostering students’ language skills in subject-area classes, in particular teachers’ self-efficacy, beliefs, and attitudes, as well as their knowledge. Cognition may be defined as comprising beliefs as well as knowledge (see e.g., Borg, 2006), and self-efficacy may be defined as a belief (Guskey & Passaro, 1994). In addition, according to Bandura’s (1993) social-cognitive theory, self-efficacy is cognitively generated. Some positive findings exist for these areas of teacher cognition, although the results vary in terms of statistical significance and effect size. For example, Cantrell and Hughes (2008) studied an intervention’s impact on different measures of teachers’ self-efficacy for literacy teaching, i.e., their “belief or conviction that they can influence how well students learn” (Guskey & Passaro, 1994, p. 628). They reported significant improvements from pre- to post-test. As another example, Hart and Lee (2003) examined PD impact on the importance teachers ascribed to incorporating language and literacy into science instruction and found mixed results: Pre- and post-data comparisons indicated significant increases for reading and writing with a small effect, but no significant change for grammar. Greenleaf et al. (2011) studied effects on different aspects of teachers’ teaching philosophy (among various other constructs), including the degree to which teachers view diverse language backgrounds as a potential asset. They reported effects from intervention and control group comparisons ranging from a non-significant ES = 0.24 to a significant ES = 0.76. Lee and Maerten-Rivera (2012) examined teachers’ knowledge of the subject matter. Teachers need subject knowledge to be able to enhance students’ language skills and understanding of the subject at the same time. Lee and Maerten-Rivera (2012) found slight but non-significant growth in teachers’ science knowledge during the first year of the intervention but no further increase in subsequent years.

Level 3: Classroom practices. During PD, teachers may also be expected to develop specific skills to foster (academic) language proficiency across the curriculum. According to the empirically based Teaching through Interactions model, which considers teacher-student interactions to play a crucial role in student learning in lessons (Hamre et al., 2013), students’ language learning is related to the quality of language modeling and instructional support in teacher-student interactions (e.g., opportunities to express existing skills, feedback, and scaffolding). PD effects on language-supportive teaching practices are measured in different ways (based on different theories of language and literacy), but the techniques examined are instructional strategies that likely predict the development of students’ language and literacy skills. Such strategies are manifold and include the promotion of inquiry, proper use of language structures, engaging students in literacy-related activities, and providing students with opportunities for meaningful and authentic academic talk, such as making reasoned arguments and negotiating ideas (Greenleaf et al., 2011; Lee & Maerten-Rivera, 2012; Shanahan & Shea, 2012).

Overall, the results concerning this sort of classroom practice have been heterogeneous; that is, they do not point in one direction, as the following examples illustrate. Greenleaf et al. (2011), who examined “the literacy challenge offered to students” (p. 680), reported significantly higher ratings on some measures for intervention group teachers compared to control group teachers, but also several non-significant results. Other examined teaching practices include initiating collaborative activities, introducing key vocabulary, using visuals, and employing a slower pace of discourse and clearer enunciation (Choi & Morrison, 2014; Echevarria & Graves, 2007; Echevarria, Vogt, & Short, 2004; Gibbons, 2006; Haynes, 2007; Herrera & Murry, 2005; Miramontes, Nadeau, & Commins, 1997). These practices are variously referred to as English-as-a-second-language strategies (Anderson, 2009) and features of sheltered instruction (Crawford, Schmeister, & Biggs, 2008). Another term used is linguistic scaffolding (Hart & Lee, 2003; Lee & Maerten-Rivera, 2012), which emphasizes the responsiveness of teachers’ support to the particular demands made on students (Gibbons, 2015). For example, Hart and Lee (2003) reported pre-post mean differences indicating significant positive change, with a small effect, in the linguistic scaffolding provided by teachers. In contrast, they did not find significant change in instructional strategies to encourage reading and writing. Other components that have been studied include teachers’ use of cognitive strategies in literacy activities during their lessons (Olson et al., 2012) and engaging students in metacognitive conversations about reading processes or reflections on learning (Cantrell & Hughes, 2008; Greenleaf et al., 2011). Considerable PD effects have been demonstrated for different measures related to both. Another language support strategy studied in existing primary research is teachers’ use of reasoning and of sophisticated and varied vocabulary. Henrichs and Leseman (2014) reported intervention effects ranging from a non-significant ES = 0.27 to a significant ES = 1.25 for several specific topics and areas. These examples show the wide variety of dependent variables that can be used to capture teachers’ classroom practices.

Level 4: Student achievement. Several studies that evaluated PD aimed at increasing students’ comprehension of content while facilitating language acquisition reported positive changes in student outcomes, e.g., students’ language proficiency and content knowledge (e.g., August et al., 2014; Lara-Alecio et al., 2012). However, using this level of evaluation has potential risks, particularly when no data on the teacher level are available. Teacher PD and student achievement are assumed to be linked only indirectly (Yoon, Duncan, Lee, Scarloss, & Shapley, 2007). In the case of null results, it is impossible to determine whether improvements emerged at the teacher level that simply did not carry over to student achievement. Furthermore, PD effectiveness on the student level might be confounded with several other factors, such as the effectiveness of the student support activities teachers are supposed to learn about in PD. The latter is particularly problematic in this field, because the effectiveness of language support strategies and programs has not yet been sufficiently evaluated (Paetsch, Wolf, Stanat, & Darsow, 2014). Thus, differences in the effectiveness of different student support activities might affect how students benefit from teacher PD programs.

1.2.2. Potential moderators and sources of bias

Due to different outcome constructs and measures, it remains unclear how effective PD aimed at qualifying teachers to foster students’ language and literacy skills in subject-area classes is overall. Additionally, a number of factors can potentially affect evaluation outcomes, making it even more difficult to determine the effectiveness of these programs. Effect sizes may vary
depending on characteristics of the study as well as the PD program in question.

Study characteristics. Wilson and Lipsey (2001) showed that several factors regarding study quality and methodology can account for almost as much variation in outcomes in meta-analyses as intervention features. For instance, different research designs can explain differences in effect sizes between studies (Higgins & Green, 2011). Wilson and Lipsey (2001) found that one-group pre-post designs tend to overestimate intervention effects compared to comparison group designs. Furthermore, quasi-experimental designs may inflate intervention effect estimates. For example, as a form of selection bias, higher motivation among participants in the intervention group could result in larger effects (Borenstein, Hedges, Higgins, & Rothstein, 2009; Cheung & Slavin, 2016). Rigorous research designs, such as randomized controlled trials, can reduce the risk of such alternative explanations for gains (WWC, 2017; Yoon et al., 2007). However, even in randomized trials, methodological variables can affect effect sizes when randomization occurs at the group (e.g., school) rather than the individual (e.g., teacher) level. Such cluster-randomized trials may be associated with larger effects, e.g., due to motivation bias resulting from greater compliance in implementing new methods in preexisting groups. In contrast, randomized trials tend to underestimate intervention effects when teachers from the same school are randomized individually and PD content is passed on from intervention group to comparison group teachers (Higgins, Deeks, & Altman, 2011). Furthermore, the measures and scales used to evaluate the intervention impact matter. For example, instruments specifically developed for the study in question may yield larger effects than standardized/published instruments due to closer alignment with the content of the PD intervention (Blank & de las Alas, 2009; Cheung & Slavin, 2016; Gersten et al., 2005; Wilson & Lipsey, 2001). While instruments generally need some degree of alignment and sensitivity to identify effects, unstandardized and unestablished instruments can suffer from insufficient breadth and unknown validity (Gersten et al., 2005; Yoon et al., 2007). Moreover, there may be issues associated with the reliability of outcome measures (Hunter & Schmidt, 2004). For example, scales with low internal consistency or low interrater reliability may reduce precision (WWC, 2017). Furthermore, the information source may affect the magnitude of the effect (Wilson & Lipsey, 2001). For example, unlike an independent rater, a person involved in the development or delivery of an intervention might not be able to judge that intervention without bias; thus, larger effects may be expected if the information source is affiliated with the intervention. Similarly, the allegiance of the researchers conducting the evaluation can potentially influence PD outcomes (Luborsky et al., 1999; Munder, Brütsch, Leonhart, Gerger, & Barth, 2013).

PD intervention characteristics. Features of successful teacher training across disciplines are well known. One of the most often-cited features is program duration. There is consensus that PD should be intensive and sustained over time (Garet, Porter, Desimone, Birman, & Yoon, 2001; Guskey & Yoon, 2009). However, findings on the relationship between the duration of a PD program and its effectiveness are inconsistent (Lipowsky & Rzejak, 2015). In the area of academic language across the curriculum, PD programs thought to be effective differ greatly in duration, although most are long-term (Kalinowski, Gronostaj, & Vock, 2019). According to Timperley et al. (2007), a certain amount of time and an extended period of PD are important, yet not sufficient. There seems to be no ideal amount of time and no linear relationship between PD duration and effectiveness in general. Another aspect related to PD effectiveness is program format. For example, traditional workshops versus more innovative approaches have been discussed (Birman et al., 2000). The PD interventions studied in the review by Kalinowski, Gronostaj, and Vock (2019) encompassed a wide range of formats, with workshops and coaching being the most common components. Coaching may be of particular interest, as it has been found to be a particularly effective form of PD for language and literacy practices, at least among early childhood educators (Markussen-Brown et al., 2017; Neuman & Wright, 2010). Beyond the type of format, the number of formats included may also be important. Markussen-Brown et al. (2017) found a significant positive association between the number of formats used in PD and educators’ practices.

In summary, the information found in the literature to date on the effects of teacher PD aimed at fostering students’ language skills across the curriculum is heterogeneous. Variations in study quality and methodological factors as well as intervention features complicate the interpretation of current primary PD research. Thus, the overall effectiveness of PD aimed at qualifying teachers to foster language development in academic content classes, and the extent to which study and intervention features relate to the magnitude of effects, are unclear. Reviews and syntheses can help to integrate existing literature, estimate overall effects, and reveal variables that can potentially account for differences in the results of studies (Cooper & Hedges, 2009). Moreover, they can increase statistical power and the precision of intervention effect calculations (Cohn & Becker, 2003; Mulrow, 1994).

1.3. Aim of this study

Given the lack of syntheses of existing primary studies and the lack of information on the overall effectiveness of PD interventions to help teachers integrate the development of language necessary for schooling into their teaching of academic content, our research objectives were twofold. We aimed (1) to estimate the effect of such PD programs across existing studies on (a) teachers’ cognition and (b) teachers’ classroom practice and (2) to examine the distribution of effect sizes and gather evidence on potential effect modifiers. This is of great practical relevance, as it can help us better understand PD effectiveness and show what is important in PD interventions. Assuming a sequential relationship between teacher change and student outcomes (Yoon et al., 2007), it seems necessary to examine the teacher level carefully before (where appropriate) turning to the student level in subsequent research. While meta-analyses focusing on teacher PD aimed solely at fostering students’ general and basic literacy skills (without integration into content-area teaching) already exist (e.g., Basma & Savage, 2018; Timperley et al., 2007), this study specifically focuses on teacher PD to promote academic language support in content areas. In contrast to some of the non-quantitative research syntheses cited above (e.g., Zhang, 2014), this study’s focus is not limited to PD geared to ELLs.

2. Method

2.1. Inclusion criteria

We applied a meta-analytic method to meet these research objectives. After perusing the relevant literature, we developed the following selection criteria for study inclusion: The study must (a) evaluate the effectiveness of a PD intervention for in-service teachers aimed at helping them integrate students’ development of the language necessary for schooling into their academic content instruction; (b) have been published between January 2000 and January 2016; (c) be written in English; and (d) be a quantitative empirical study that reported either effect sizes or statistical data allowing effect sizes to be computed from pre- and post-data and/or intervention and comparison group data. Further, the examined PD intervention (e) must have been conducted among teachers at general education schools, including kindergarten if part of primary education; and (f) must be described in some detail. Criterion (a) was further specified in that the study’s focus had to be the evaluation of the PD intervention itself rather than the student support program teachers were trained to use in the intervention. In line with the above definition of academic language, we excluded PD initiatives focusing on initial or basic reading and writing, foreign-language learning, and children with disabilities. Whereas some researchers advocate using only studies with rigorous research designs (such as randomized controlled trials) in a meta-analysis, this can exclude much of the existing research. Instead of excluding studies beforehand, it can be more informative to examine systematic differences in study characteristics among the included studies as part of the meta-analysis (Chambers, 2004; Hattie, 2009; Lipsey & Wilson, 2001; Pant, 2014).

2.2. Search procedure

The search for studies was conducted using a systematic multi-step process. To define our key concepts and search terms, we conducted preliminary searches and used database thesauri. We then developed a comprehensive search syntax² with Boolean operators including terms describing academic language or the student population of interest (e.g., “language,” “literacy,” “linguistically diverse,” “bilingual”) and terms to indicate that the language support took place in the subject areas (e.g., “across the curriculum,” “mainstream,” “sheltered instruction”). We used various combinations of these keywords as appropriate and further combined them with “teacher professional development” and synonyms (e.g., “teacher* professional learning,” “teacher* training”) as well as “effectiveness” and alternate terms (e.g., “impact,” “effic*”). Using this syntax, we searched the databases Educational Resources Information Centre and EBSCOhost, including PsycINFO, PsycARTICLES, Education Full Text (H.W. Wilson), and Psychology and Behavioral Sciences Collection. We identified 1,778 records. The first author of this paper screened and applied the inclusion criteria to the titles and abstracts of the identified records. A trained research assistant independently rated 27% of the records. Inter-rater reliability was almost perfect, as indicated by Cohen’s κ = 0.81 (Landis & Koch, 1977). Next, the full texts of the approved documents were screened.³ Disagreements and uncertainties about whether a study should be included were discussed until consensus was reached. We also conducted a manual search in relevant journals and examined the reference lists of all eligible studies. Additionally, we asked experts in the field for suggestions of pertinent studies, attempting also to find unpublished studies and reduce the risk of publication bias, which is “the tendency [ …] to

Zisselsberger, 2011; McIntyre et al., 2010). Other studies were not proper empirical studies but rather, for instance, short descriptive reports (e.g., Frey, Fisher, & Nelson, 2010; Walsleben, 2008), focused on questions other than the effectiveness of the PD intervention (e.g., Patrick, 2012) or on a different type of language support (e.g., Vaughn et al., 2008), or barely described the PD intervention (e.g., Eun, 2006; Kolano, Davila, Lachance, & Coffey, 2014). To avoid duplicate data, we included the more recent and comprehensive of two articles that reported on the same study (Kim et al., 2011; Olson et al., 2012). If studies met all inclusion criteria but lacked the required statistical information, we tried to contact the authors to request the missing information. Fig. 1 illustrates the different stages of the literature search with the respective numbers of articles identified, assessed, excluded, and retrieved. As can be seen, the studies found to meet the inclusion criteria reported effects across different outcome levels, namely teachers’ cognition (e.g., knowledge, beliefs), teachers’ practice, and student outcomes. None of the thematically relevant studies reported quantifiable data on teachers’ reactions to PD. In this paper, we focus on the available results for the teacher level.

2.3. Data coding

Following recommendations by Lipsey and Wilson (2001), we coded study and PD intervention variables (derived from the literature cited in Section 1.2) as well as effects (separated by outcome levels). Table 1 lists the coded variables and their respective definitions.⁴ Following the guidelines of The Campbell Collaboration (2014), all included studies were independently coded by the first and second authors of this paper, both research associates with several years of experience in research on educators’ PD, to ensure the reliability of coding. We used a coding form for all codes on the study and intervention level. Interrater reliability, estimated using SPSS 24.0 (IBM Corp, 2016), was excellent, with Cohen’s κ = 0.87 (calculated for nominal variables) and an intraclass correlation of ICC = 0.87 (calculated for interval variables). For all codes on the effect size level, one author entered the data directly into a spreadsheet in the Comprehensive Meta-Analysis (CMA) software, Version 3 (Borenstein, Hedges, Higgins, & Rothstein, 2013), after which the other author checked the entries and marked discrepancies. Since none of the studies provided correlations between pre- and post-measurements, we inserted a conservative estimate of r = 0.50 (Fukkink & Lont, 2007). Interrater reliability for codes on the effect size level was 99.4%. Disagreements in coding were resolved through discussion. A short description of the PD programs’ content and a specification of the outcome construct were added.

2.4. Analysis
submit or accept manuscripts for publication based on the direction
The statistical data provided in various formats in the primary
or strength of the study findings” (Dickersin, 1990, p. 1385),
studies was transformed into the standardized mean difference
generally in favor of positive or statistically significant results. The
corrected for bias, Hedges’ g. This is the appropriate measure when
71 articles identified through these processes were also examined
sample sizes are small (Borenstein et al., 2009). Mean differences
using the assessment procedure outlined above. The main reasons
were based on data on pre-post changes or contrasts between
for excluding articles after full-text screening stemmed from the
intervention and comparison groups. If pre- and post means and
inclusion criteria: Many studies lacked a study design (e.g., only
standard deviations for both the intervention and comparison
post-intervention surveys on perceived PD impact) or statistical
group were given, we standardized by change score standard de-
data allowing for the estimation of effect sizes (e.g., Brisk &
viation. If multiple outcome measures were available on the same
outcome level, we nested the effect sizes under PD intervention and
2
calculated an aggregate effect size per intervention. We applied a
The syntax will be provided by the first author upon request.
3
The full text of seven potentially relevant dissertations could not be obtained
even after contacting the authors or their chairs (when contact details were
4
available). Further details on the inclusion process and a list of excluded studies can Some additional characteristics were coded but not analyzed because of
be provided upon request. missing data and the small number of studies.

Fig. 1. Literature search process with numbers of records considered (chart adapted from Moher, Liberati, Tetzlaff, Altman, & The PRISMA Group, 2009).

Table 1
Overview and description of codes used for included studies.

Code | Description
Study
Publication year | Year the study was published
Sample size | Number of participating teachers at beginning of the study
Educational stage | Possible categories: (1) elementary (kindergarten–grade 6), (2) secondary (grades 7–12), (3) mixed (kindergarten–grade 12)
Design | General study design. Possible categories: (1) one-group pre-post measurement, (2) pre-post comparison group design, (3) comparison group design without pre-measurement
Assignment to conditions | Applicable if comparison group design was used. Possible categories: (1) non-random assignment and no matching of groups prior to intervention, (2) non-random assignment but comparison group matched to intervention group, (3) random assignment at group/school level, (4) random assignment at individual level
Instruments | Instruments used for data collection. Possible categories: (1) survey, (2) interview, (3) observation, (4) assignment/test
Standardized instrument | Standardized/established instruments that were used/validated in previous studies. Possible categories: (1) not included (i.e., all instruments used were developed or adapted for the study in question), (2) included in study
Reliable data collection | Reliable measures (internal consistency of scale with Cronbach’s alpha ≥ .65)/data collection process (interobserver agreement of ICC/Kappa ≥ .70). Possible categories: (1) not included (i.e., reliability lower than cut-off values or not reported), (2) included in study
Independent information source | Independence of the person who provided data. Possible categories: (1) not included (rating solely by persons involved in PD, e.g., mentor, PD facilitator), (2) included (study included rating by independent persons, e.g., external researcher, student)
Independent evaluation | Evaluation of PD intervention and evaluation financing by actors not involved in the development/delivery of the PD intervention. Possible categories: (1) no, (2) missing, (3) yes
Intervention
Format | Delivery format(s) employed. Possible categories: (1) workshop/course; (2) coaching/mentoring; (3) web-mediated/online training; (4) classroom demonstrations; (5) other group meetings (e.g., PLG); (6) provision of curricular units
Time span | Total time span in months in which PD took place. If a study did not report the time span in months, we calculated it ourselves assuming one school year = 9 months.
Duration | Duration of PD workshops/courses in hours. If studies did not report workshop duration in hours, total hours were approximated by assuming one full day = 8 h. Other supports like coaching and team meetings could not be considered because their duration was often not specified.
Effect size level
Outcome name | Outcome as labeled in the report
Outcome level | Level matched by the outcome data applicable for meta-analysis. Possible categories: (a) teacher cognition (i.e., beliefs, attitudes, knowledge), (b) teachers’ practices (i.e., techniques and instructional strategies to support students’ language development and make content accessible to all learners)
Statistical data | Outcome data applicable for meta-analysis. Format dependent on data given in reports, including sample sizes, means, standard deviations, pre-post correlations

Note. All codes from Design through Independent information source were coded separately for (a) teacher cognition and (b) teachers’ practices.

separate meta-analysis for each outcome level (teachers’ cognition and teachers’ practices). We integrated the intervention effects into one weighted summary effect size g’ per outcome level by conducting random-effects meta-analyses, with interventions weighted by the inverse of the variance of their effect sizes. Random-effects models were used because variables that may have affected the magnitude of the effect size, such as samples and interventions, varied from study to study. Thus, between-study variance had to be taken into account in addition to within-study variance (Borenstein et al., 2009). We estimated the homogeneity of effects using Q-statistics and I² (Higgins, Thompson, Deeks, & Altman, 2003), which relate the observed variance to the within-study variance, and examined potential sources of heterogeneity, i.e., effect modifiers. We first investigated study characteristics, followed by PD features. If variables were categorical, mixed-effects subgroup analyses assuming a common among-study variance across subgroups were used to test whether the mean effects for each subgroup differed from each other. For continuous variables, separate random-effects meta-regressions were performed, taking into consideration that at least around 10 studies

should be available per covariate (Borenstein et al., 2009; Deeks, Higgins, & Altman, 2011). In sensitivity analyses, i.e., repeating an analysis while substituting alternatives for unclear or arbitrary decisions, we tested whether and how the effects changed if particular studies (e.g., outliers) and values were omitted from the analyses (Deeks et al., 2011). In addition, we assessed the potential impact of publication bias using a funnel plot, in which each study’s intervention effect is plotted against its standard error (as an index of precision). In the absence of publication bias, one would expect the plot to resemble a symmetrical funnel, with the results of less precise studies (which are smaller and have larger standard errors) more widely scattered and located toward the bottom of the plot than those of more precise studies. Conversely, asymmetry may indicate publication bias. We also applied Egger’s regression, which can be considered a statistical analogue of a funnel plot, with a significant result indicating asymmetry (Egger, Smith, Schneider, & Minder, 1997; Sterne, Egger, & Smith, 2001). All analyses were conducted using CMA (Borenstein et al., 2013).

3. Results

3.1. Description of included studies and PD interventions

Table 2 gives an overview of the characteristics of the 10 studies and PD interventions included. The studies were published between 2003 and 2014. All studies were conducted in the US, with the exception of one from the Netherlands. Nine were journal articles and one was a dissertation. A total of N = 650 teachers participated in the studies at the beginning of the investigations. Teacher sample sizes ranged from 21 to 198 and were relatively small in many of the included studies. Six studies used a one-group pre-post design and four used a pre-post comparison group design, of which two also included group comparisons without pre-measurement. Of the controlled trials, one used matched groups, two were randomized at the school level, and one was randomized at the individual level. Teachers’ cognition was assessed via surveys. Classroom practices were sometimes assessed via surveys and interviews, but in most studies teachers were observed. In one study, teachers completed assignments instead. It was striking that only one study included a standardized instrument and only one study clearly stated that the evaluators were not involved in the PD intervention. The 10 studies evaluated 10 different PD interventions. However, Shanahan and Shea (2012) compared data from teachers who attended a certain amount of PD to data from teachers who attended less than this amount. Considering the different amounts of time the two groups participated in PD, we treated the study as evaluating two separate interventions, i.e., as two different trials. Thus, a total of k = 11 PD interventions were included in our analysis. Elementary and secondary grade levels were represented. The time span of the in-service training and the duration of the workshops varied greatly, from half a day to three years and from three to 112 h, respectively. Most of the interventions applied multiple delivery formats. Workshops were part of all programs; coaching was the second most frequently used format. PD programs addressed various language support approaches and strategies, such as sheltered instruction. Six interventions were related to a specific subject, which was science in five cases. All 11 trials evaluated PD program effectiveness on teachers’ practices; four studies additionally examined effects on teachers’ cognition.⁵ The latter were measured using constructs like self-efficacy, teaching philosophy, and content knowledge. Teachers’ practices, by contrast, comprised instructional practices and language input provided, both in the sense of practices to improve lesson quality.

3.2. Effects of PD on teachers’ cognition

Four studies with N = 378 teachers could be included to estimate the effect of the examined PD programs on teachers’ cognition. Eleven individual effect sizes were calculated for this outcome, ranging from one to four per study. Table 3 and Fig. 2 show estimates ranging from negligible (g = 0.02) to medium positive impacts (g = 0.63), according to the rules of thumb established by Cohen (1988). The meta-analysis showed a non-significant weighted overall effect of PD interventions on teachers’ cognition of g’ = 0.21 (SE = 0.14, p = .120). The heterogeneity test, which examines whether all of the dispersion in the interventions’ effect sizes is due to random error, was non-significant (Q = 6.03, df = 3, p = .110, T² = 0.04, SE = 0.06). However, power may be low when only a small number of studies is available (Higgins et al., 2003). An I² of 50.28% suggests that about half of the observed dispersion reflected real differences in effect sizes. According to Deeks et al. (2011), this proportion may be interpreted as moderate heterogeneity, indicating that the studies included in this analysis differ in some respects, which could be clarified through further analyses. However, because of the very small number of studies, heterogeneity was not investigated further on this level.

A one-study-removed procedure revealed that the aggregated effect was not very robust. In particular, if the intervention with the highest effect size (Cantrell & Hughes, 2008) was removed, the summary effect decreased to g’ = 0.08 (SE = 0.10, p = .431). Conversely, removing the intervention with the lowest effect size (Lee & Maerten-Rivera, 2012) resulted in a total effect of g’ = 0.31 (SE = 0.18, p = .098).

We further analyzed the risk of publication bias. The funnel plot (Fig. 3) displays the relationship between the standard error and the effect size for each trial. In general, trials with small sample sizes tend to have larger standard errors, i.e., to be less precise, and appear at the bottom of the figure. Although some small studies with large standard errors are included in the meta-analysis, Egger’s regression did not indicate significant asymmetry (Intercept = 4.32, SE = 2.15, p = .182).

3.3. Effects of PD on teachers’ classroom practices

Another meta-analysis based on 11 trials examined outcomes related to teachers’ classroom practices. For this outcome, 74 effect sizes were extracted, between one and 30 per trial. As displayed in Table 4 and Fig. 4, aggregated intervention effects were all positive and ranged from g = 0.11 to g = 2.17. The overall pooled effect was g’ = 0.71 (SE = 0.16, p < .001), indicating a significant medium to large PD effect on teachers’ classroom practices. However, there was significant variation between effect sizes (Q = 41.41, df = 10, p < .001, T² = 0.19, SE = 0.12), with I² = 75.85%, suggesting that a substantial share of the variation in individual trial results cannot be explained by chance.

In a sensitivity analysis, we removed from the analysis the intervention studied by Choi and Morrison (2014), which exhibited an exceptionally large effect. The results revealed a smaller, yet still considerable overall effect of g’ = 0.54 (SE = 0.10, p < .001). The removal of the study with the smallest effect (Hart & Lee, 2003) only slightly changed the results; the aggregated effect was g’ = 0.78 (SE = 0.16, p < .001). We conducted further explorative analyses to examine potential bias in our data. From Lee and Maerten-Rivera (2012), we included science knowledge in the

⁵ Four of the included articles also considered the student level. For two other studies, student outcomes were presented in separate articles identified during our search process.
Table 2
Overview of included studies and PD interventions.

Study | Country | Publication type | Sample | Educational stage | Design | Instruments | Standardized instrument | Reliable data collection | Independent information source | Independent evaluation | Time span (months) | Duration (h) | Formats | Content | Outcomes
Anderson (2009) | USA | Dissertation | 32 | Secondary | Pre-post comparison group (matched) | Observation | Not incl. | Incl. | Not incl. | No | 9 | 80 | Workshops, coaching, demonstrations | SIOP, ESL strategies, guided language acquisition design | Practice (instruction)
Cantrell and Hughes (2008) | USA | Journal | 22 | Secondary | One-group pre-post | Survey, observation | Not incl. | Incl. | Incl. | Yes | 9 | 64 | Workshops, coaching, groups | Content literacy techniques | Cognition (self-efficacy), practice (instruction)
Choi and Morrison (2014) | USA | Journal | 33 | Mixed | One-group pre-post | Observation | Not incl. | Not incl. | Not incl. | No | 18 | 90 | Workshops, coaching, online, groups | SIOP | Practice (instruction)
Crawford et al. (2008) | USA | Journal | 23 | Elementary | One-group pre-post | Observation | Not incl. | Not incl. | Incl. | Missing | 4.5 | 45 | Workshops, coaching | Sheltered instruction, total integrated language approach | Practice (instruction)
Greenleaf et al. (2011) | USA | Journal | 105 | Secondary | (Pre-post) comparison group (randomization at school level) | Survey (teachers, students), interview, assignment | Not incl. | Incl. | Incl. | No | 12 | 80 | Workshops, coaching | Reading apprenticeship, inquiry-based science | Cognition (teaching philosophy), practice (instruction)
Hart & Lee (2003) | USA | Journal | 53 | Elementary | One-group pre-post | Survey, observation | Not incl. | Incl. | Incl. | No | 9 | 32 | Workshops, units | Scaffolding, inquiry-based science | Cognition (knowledge, beliefs), practice (instruction)
Henrichs & Leseman (2014) | NL | Journal | 59 | Elementary | Pre-post comparison group (randomization at school level) | Observation | Not incl. | Incl. | Not incl. | No | 0.03 | 3 | Workshop | Academic language input, science | Practice (language input)
Lee and Maerten-Rivera (2012) | USA | Journal | 198 | Elementary | One-group pre-post | Survey, observation | Not incl. | Incl. | Incl. | Missing | 36 | 112 | Workshops, units | Scaffolding, literacy development and ESOL strategies, inquiry-based science | Cognition (knowledge), practice (instruction)
Olson et al. (2012) | USA | Journal | 103 | Secondary | (Pre-post) comparison group (randomization at individual level) | Observation | Incl. | Incl. | Incl. | No | 9 | 46 | Workshops, coaching | Cognitive strategies approach to literacy instruction, English | Practice (instruction)
Shanahan and Shea (2012) (a) | USA | Journal | 10 | Elementary | One-group pre-post | Observation | Not incl. | Not incl. | Incl. | Missing | 8 | 18 | Workshops | Student-talk strategies, Five E model, inquiry-based science | Practice (instruction)
Shanahan and Shea (2012) (b) | USA | Journal | 11 | Elementary | One-group pre-post | Observation | Not incl. | Not incl. | Incl. | Missing | 8 | 18 | Workshops | Student-talk strategies, Five E model, inquiry-based science | Practice (instruction)

Note. SIOP = Sheltered Instruction Observation Protocol. ESL = English as a second language. ESOL = English for speakers of other languages.

Table 3
Effects of PD on teachers’ cognition in random-effects meta-analysis.

Study | g | SE | Lower CI limit | Upper CI limit | p | Relative weight (%)
Cantrell and Hughes (2008) | 0.63 | 0.23 | 0.19 | 1.08 | .006 | 20.75
Greenleaf et al. (2011) | 0.33 | 0.26 | −0.19 | 0.85 | .212 | 17.41
Hart & Lee (2003) | 0.05 | 0.15 | −0.23 | 0.34 | .709 | 31.93
Lee & Maerten-Rivera (2012) | 0.02 | 0.16 | −0.30 | 0.33 | .921 | 29.90
Combined | 0.21 | 0.14 | −0.06 | 0.48 | .120 |

Fig. 2. Forest plot showing effects of PD on teachers’ cognition in random-effects meta-analysis. Squares represent the aggregated effect per trial with horizontal lines showing the
respective confidence interval. The diamond represents the summary effect size.

Fig. 3. Funnel plot for the cognition outcome. Circles represent trials; diamond represents the summary effect size and its confidence interval.

Table 4
Effects of PD on teachers’ practices in random-effects meta-analysis.

Study | g | SE | Lower CI limit | Upper CI limit | p | Relative weight (%)
Anderson (2009) | 0.98 | 0.37 | 0.26 | 1.70 | .008 | 7.35
Cantrell and Hughes (2008) | 0.46 | 0.22 | 0.04 | 0.89 | .033 | 10.05
Choi and Morrison (2014) | 2.17 | 0.32 | 1.54 | 2.80 | < .001 | 8.16
Crawford et al. (2008) | 1.10 | 0.26 | 0.59 | 1.60 | < .001 | 9.28
Greenleaf et al. (2011) | 0.49 | 0.28 | −0.05 | 1.03 | .074 | 9.01
Hart & Lee (2003) | 0.11 | 0.16 | −0.20 | 0.42 | .492 | 11.08
Henrichs & Leseman (2014) | 0.81 | 0.28 | 0.26 | 1.35 | .004 | 8.91
Lee & Maerten-Rivera (2012) | 0.39 | 0.18 | 0.04 | 0.73 | .028 | 10.76
Olson et al. (2012) | 0.45 | 0.28 | −0.10 | 1.00 | .106 | 8.92
Shanahan & Shea (2012) (a) | 0.51 | 0.31 | −0.10 | 1.12 | .102 | 8.34
Shanahan & Shea (2012) (b) | 0.75 | 0.32 | 0.12 | 1.38 | .020 | 8.16
Combined | 0.71 | 0.16 | 0.41 | 1.01 | < .001 |

original meta-analysis, a construct that might not be as closely related to teachers’ support of students’ language as the other included outcomes on the level of teachers’ classroom practices. When this construct was excluded in a sensitivity analysis, the summary effect remained the same as when it was included.

The risk of publication bias can be assessed using the funnel plot (Fig. 5). Egger’s regression (Intercept = 5.02, SE = 1.77, p < .020) confirmed the asymmetry visible in the plot, with a tendency towards more studies with larger standard errors and larger effect sizes. In this case, small studies might lead to less precise results and overestimated effect sizes.
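To make the asymmetry test concrete, the following minimal sketch applies the classic form of Egger’s regression: an ordinary least-squares regression of each trial’s standardized effect (g/SE) on its precision (1/SE), in which an intercept far from zero signals funnel-plot asymmetry. It is an illustration, not the CMA routine used in the study; it assumes CMA implements the classic unweighted test, and because it uses the rounded g and SE values from Table 4 it should only approximate the reported intercept of 5.02 (SE = 1.77). The function name `egger_test` is our own.

```python
import math

# (g, SE) per trial for teachers' practices, copied from Table 4 (rounded to two decimals)
TABLE4 = [
    (0.98, 0.37),  # Anderson (2009)
    (0.46, 0.22),  # Cantrell and Hughes (2008)
    (2.17, 0.32),  # Choi and Morrison (2014)
    (1.10, 0.26),  # Crawford et al. (2008)
    (0.49, 0.28),  # Greenleaf et al. (2011)
    (0.11, 0.16),  # Hart & Lee (2003)
    (0.81, 0.28),  # Henrichs & Leseman (2014)
    (0.39, 0.18),  # Lee & Maerten-Rivera (2012)
    (0.45, 0.28),  # Olson et al. (2012)
    (0.51, 0.31),  # Shanahan & Shea (2012) (a)
    (0.75, 0.32),  # Shanahan & Shea (2012) (b)
]


def egger_test(effects):
    """Classic (unweighted) Egger regression of g/SE on 1/SE.

    Returns the intercept, its standard error, the t statistic, and the
    residual degrees of freedom. Assumes this matches CMA's implementation.
    """
    x = [1.0 / se for _, se in effects]  # precision
    y = [g / se for g, se in effects]    # standardized effect
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    df = n - 2
    # Residual variance of the OLS fit
    sigma2 = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)) / df
    se_intercept = math.sqrt(sigma2 * (1.0 / n + mean_x ** 2 / sxx))
    return intercept, se_intercept, intercept / se_intercept, df


intercept, se, t, df = egger_test(TABLE4)
print(f"Egger intercept = {intercept:.2f} (SE = {se:.2f}), t({df}) = {t:.2f}")
```

A p-value would follow from a t distribution with n − 2 degrees of freedom (for instance via SciPy); it is omitted here to keep the sketch free of third-party dependencies.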

Fig. 4. Forest plot showing effects of PD on teachers’ practices in random-effects meta-analysis. Squares represent the aggregated effect per trial with horizontal lines showing the
respective confidence interval. The diamond represents the summary effect size.

Fig. 5. Funnel plot for teachers’ practices outcome. Circles represent trials; diamond represents the summary effect size and its confidence interval.
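The pooled estimate shown in Table 4 and Fig. 4 can be approximated from the trial-level values alone. The sketch below assumes the DerSimonian-Laird method-of-moments estimator of the between-study variance T², a common default for random-effects models; we have not confirmed the exact estimator settings used in CMA, and the inputs are rounded to two decimals, so the results should agree with the reported values (g’ = 0.71, SE = 0.16, Q = 41.41, T² = 0.19, I² = 75.85) only approximately. The loop at the end mirrors the one-study-removed sensitivity analysis described in Section 3.3. Function and variable names are illustrative.

```python
import math

# (g, SE) per trial for teachers' practices, copied from Table 4 (rounded to two decimals)
TABLE4 = {
    "Anderson (2009)": (0.98, 0.37),
    "Cantrell and Hughes (2008)": (0.46, 0.22),
    "Choi and Morrison (2014)": (2.17, 0.32),
    "Crawford et al. (2008)": (1.10, 0.26),
    "Greenleaf et al. (2011)": (0.49, 0.28),
    "Hart & Lee (2003)": (0.11, 0.16),
    "Henrichs & Leseman (2014)": (0.81, 0.28),
    "Lee & Maerten-Rivera (2012)": (0.39, 0.18),
    "Olson et al. (2012)": (0.45, 0.28),
    "Shanahan & Shea (2012) (a)": (0.51, 0.31),
    "Shanahan & Shea (2012) (b)": (0.75, 0.32),
}


def random_effects(effects):
    """Random-effects pooling of (g, SE) pairs with a DerSimonian-Laird estimate of T^2."""
    g = [est for est, _ in effects]
    v = [se ** 2 for _, se in effects]      # within-study variances
    w = [1.0 / vi for vi in v]              # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    g_fixed = sum(wi * gi for wi, gi in zip(w, g)) / sum_w
    q = sum(wi * (gi - g_fixed) ** 2 for wi, gi in zip(w, g))  # Cochran's Q
    df = len(g) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)           # between-study variance, truncated at 0
    i2 = (max(0.0, (q - df) / q) * 100) if q > 0 else 0.0
    w_re = [1.0 / (vi + tau2) for vi in v]  # random-effects weights
    pooled = sum(wi * gi for wi, gi in zip(w_re, g)) / sum(w_re)
    return {"g": pooled, "se": math.sqrt(1.0 / sum(w_re)),
            "Q": q, "df": df, "tau2": tau2, "I2": i2}


overall = random_effects(list(TABLE4.values()))
print({key: round(val, 2) for key, val in overall.items()})

# One-study-removed sensitivity analysis: re-pool after dropping each trial in turn
for name in TABLE4:
    rest = [es for study, es in TABLE4.items() if study != name]
    res = random_effects(rest)
    print(f"without {name}: g' = {res['g']:.2f} (SE = {res['se']:.2f})")
```

Because T² is re-estimated in every leave-one-out run, removing an outlying trial such as Choi and Morrison (2014) shrinks both the pooled effect and its standard error, which is the pattern reported in Section 3.3.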

3.3.1. Study characteristics and teachers’ classroom practice

Table 5 presents the results of subgroup analyses on the level of teachers’ classroom practices. Subgroup analyses were first performed to determine whether the effect sizes varied depending on study characteristics. Comparing the summary effect of interventions studied using pre-post designs to those studied using comparison group designs yielded no significant difference, justifying the combination of both study types in further analyses. Comparing the four controlled trials according to method of assignment to conditions, the non-random matched group trial yielded higher values than the two school-level randomized trials and the individual-level randomized trial, but the differences were not statistically significant. Another subgroup analysis showed a non-significant difference between the majority of interventions evaluated with unstandardized instruments and the one intervention evaluated with standardized measures, with the aggregated effect size favoring the former. The effects of interventions evaluated solely with instruments and data collection processes for which reliability was low or not reported were significantly higher than the effect sizes for interventions evaluated with reliable measurements. Furthermore, interventions evaluated solely via observations by mentors or other persons involved in the intervention yielded effect sizes significantly higher than those (also) assessed by independent personnel. Finally, the effects of interventions evaluated by personnel involved in their development, and of those for which information about evaluator involvement was not given, tended to be higher than those of interventions assessed by independent evaluators. The differences were not statistically significant, however.

3.3.2. PD intervention characteristics and teachers’ classroom practice

In this section, we present the results of the moderator analyses for all 11 trials investigating PD intervention characteristics as potential moderators of PD effectiveness. Subgroup analysis showed that the impact of the six interventions including coaching was higher than that of the five interventions without coaching, but the difference was not statistically significant (see Table 5). Meta-regressions (see Table 6) showed that neither the time span of the PD intervention nor the total workshop hours were linearly related to PD effects on teachers’ practices. There was a positive association between the number of formats used in the PD interventions and the effect size, with the effect size increasing by 0.32 with every additional PD format included. However, this association did not reach statistical significance.

Given the high effect size and the type of instruments and methods used (e.g., no standardized instruments or reliable data, no external information source or evaluation) in the study by Choi and Morrison (2014), we repeated the analyses omitting this trial. In these analyses, the difference between interventions with
coaching (k = 5, g’ = 0.67, SE = 0.15) and without coaching was smaller (k = 5, g’ = 0.43, SE = 0.13, p = .228), and unlike in the original results, the number of formats used in the interventions was not associated with the effect size (B = 0.04, SE = 0.17, p = .796, R²analog = .00). Apart from that, the results did not differ substantially from those of the original analyses.

Table 5
Results of subgroup analyses on the level of teachers’ practices.

Groups | Number of trials (k) | Effect size g | SE | Test of null (2-tailed) p | Heterogeneity Q | df | p
Study characteristics
Design
Pre-post | 7 | 0.74 | 0.20 | < .001 | | |
Comparison group | 4 | 0.67 | 0.28 | .016 | 0.04 | 1 | .841
Assignment to conditions
Non-random, matched | 1 | 0.98 | 0.37 | .008 | | |
Random, school level | 2 | 0.65 | 0.20 | .001 | | |
Random, individual level | 1 | 0.45 | 0.28 | .106 | 1.31 | 2 | .520
Standardized instrument
Not included | 10 | 0.74 | 0.17 | < .001 | | |
Included | 1 | 0.45 | 0.54 | .407 | 0.26 | 1 | .612
Reliable data collection
Not included | 4 | 1.12 | 0.22 | < .001 | | |
Included | 7 | 0.48 | 0.15 | .002 | 5.73 | 1 | .017
Independent information source
Not included | 3 | 1.31 | 0.26 | < .001 | | |
Included | 8 | 0.50 | 0.14 | < .001 | 7.59 | 1 | .006
Independent evaluation
No | 6 | 0.79 | 0.24 | .001 | | |
Missing | 4 | 0.68 | 0.30 | .022 | | |
Yes | 1 | 0.46 | 0.57 | .417 | 0.31 | 2 | .856
PD intervention characteristics
Format
Coaching | 6 | 0.91 | 0.20 | < .001 | | |
No coaching | 5 | 0.48 | 0.21 | .023 | 2.17 | 1 | .141

Table 6
Results of random-effects meta-regression analyses.

Term | B | SE | 95% CI | p
Time span (k = 11, R²analog = .00)
Intercept | 0.73 | 0.27 | 0.21, 1.25 |
Time span | 0.00 | 0.02 | −0.04, 0.03 | .937
Duration (k = 11, R²analog = .00)
Intercept | 0.53 | 0.30 | −0.10, 1.12 |
Duration | 0.00 | 0.00 | −0.01, 0.01 | .454
Number of formats (k = 11, R²analog = .15)
Intercept | 0.05 | 0.39 | −0.71, 0.80 |
Number of formats | 0.32 | 0.17 | −0.02, 0.65 | .064

4. Discussion

The results of our meta-analysis on teachers’ cognition indicate that all four evaluated interventions yielded positive point estimates. However, the weighted summary effect of PD was small and non-significant. This overall estimate appears to be in line with findings by Markussen-Brown et al. (2017), for instance, who reported a non-significant ES of 0.12 for language- and literacy-focused PD on early educators’ knowledge. Moreover, although all sensitivity analyses indicated a positive overall effect, the magnitude of the effect varied from negligible to small. Since only four studies could be included in this meta-analysis, the results cannot be interpreted with confidence, and further research is needed.

On the level of teachers’ classroom practices, our meta-analysis revealed a significant medium to large effect and substantial heterogeneity. Again, all interventions showed a positive effect on teachers’ ability to support students’ simultaneous acquisition of content and language. The aggregate effect estimate varied in some of the sensitivity and moderator analyses. The original estimate of g’ = 0.71 corresponds to a 26-percentile point difference on a normal distribution, while conservative estimates restricted to trials randomized at the individual level or using standardized instruments, reliable data collection, independent information sources, or external evaluation suggest an overall PD effect on teachers’ classroom practice ranging from g’ = 0.45 to g’ = 0.50, indicating a difference of at most 19 percentile points. Our summary effect size estimate is in line with results from other meta-analyses: Across various domains, Wade (1984) found a PD effect of d = 0.60 on teachers’ practices. In a meta-analysis on language- and literacy-focused PD for early education teachers, Markussen-Brown et al. (2017) found an overall effect of g’ = 0.59 on a measure of educators’ practice.

Given the significant heterogeneity, however, Borenstein et al. (2009) recommend focusing on factors that can explain differences in effect sizes rather than immediately interpreting the combined effect size as meaningful. Indeed, subgroup analyses showed some of the study characteristics to moderate the outcomes on teachers’ practices. In line with other meta-analyses, methodological variables affected the magnitude of the effect size (Cheung & Slavin, 2016; Egert, Fukkink, & Eckhardt, 2018; Markussen-Brown et al., 2017; Wilson & Lipsey, 2001). In particular, studies using reliable data collection had a summary PD effect (g’ = 0.48) that was significantly lower than that of studies that did not report reliable data collection, and also lower than the original overall estimate. The larger effect of unreliable measures was produced by trials that used observations but did not report interrater reliability. Furthermore, effect sizes were significantly larger (g’ = 1.31) in studies where only non-independent data from sources involved in the PD intervention were used. On the one hand, this might be explained by the assumption that a person who is involved in a program has a particular interest in finding their work to be effective. On the other hand, such a person may simply be better able to identify important aspects or outcomes of that program.
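Two computations quoted in this discussion can be roughly checked from the reported summaries. The sketch below assumes that the between-subgroup test in Table 5 is the usual Q statistic computed from subgroup means weighted by 1/SE² (with 1 df for two subgroups), and that the percentile-point figures correspond to Cohen’s U3 (the standard normal CDF of g) minus 50. With the rounded Table 5 inputs it gives Q of about 5.8 for reliable data collection and about 7.5 for independent information sources, close to the tabled 5.73 and 7.59. The helper names `q_between`, `chi2_p_df1`, and `percentile_points` are hypothetical.

```python
import math
from statistics import NormalDist


def q_between(subgroups):
    """Between-subgroup Q for subgroup summaries given as (g, SE) pairs."""
    w = [1.0 / se ** 2 for _, se in subgroups]
    g = [est for est, _ in subgroups]
    g_bar = sum(wi * gi for wi, gi in zip(w, g)) / sum(w)
    q = sum(wi * (gi - g_bar) ** 2 for wi, gi in zip(w, g))
    return q, len(subgroups) - 1


def chi2_p_df1(q):
    """Upper-tail p-value for a chi-square statistic with exactly 1 df."""
    return math.erfc(math.sqrt(q / 2.0))


def percentile_points(g):
    """Percentile-point difference implied by g under normality (Cohen's U3 minus 50)."""
    return (NormalDist().cdf(g) - 0.5) * 100


# Subgroup summaries (g, SE) from Table 5: "not included" vs. "included"
RELIABLE = [(1.12, 0.22), (0.48, 0.15)]
INDEPENDENT = [(1.31, 0.26), (0.50, 0.14)]

for label, groups in [("Reliable data collection", RELIABLE),
                      ("Independent information source", INDEPENDENT)]:
    q, df = q_between(groups)
    print(f"{label}: Q = {q:.2f}, df = {df}, p = {chi2_p_df1(q):.3f}")

for g in (0.71, 0.50, 0.45):
    print(f"g' = {g:.2f} corresponds to about {percentile_points(g):.0f} percentile points")
```

The subgroup SEs in Table 5 appear to already include the common between-study variance (the single standardized-instrument trial has SE = 0.54 there versus 0.28 in Table 4), so the remaining gap to the tabled Q values should mainly reflect rounding.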

Moderator analyses indicated several further tendencies for differences in effect sizes, but these differences did not reach statistical significance. However, non-significant results of subgroup analyses can be due to small sample sizes and, as a consequence, low test power (Borenstein et al., 2009). Considering just the differences in average effect sizes, our results on effect modifiers point in the expected direction and are in line with the findings on methodological moderators outlined in the introduction.

None of the subgroup analyses yielded significant results regarding PD intervention characteristics. Our findings on coaching are somewhat surprising in light of the results of meta-analyses in related fields showing coaching to be an effective approach to improving classroom quality (Egert, Fukkink, & Eckhardt, 2018; Markussen-Brown et al., 2017). Our results exhibit a trend in the expected direction but did not reach statistical significance. Thus, we cannot definitively conclude whether the descriptive advantage of coaching was due to chance or to the assumed effectiveness of coaching. While the method might be beneficial in principle, its ultimate effectiveness might depend on how it is implemented (Sanetti & Kratochwill, 2014). It is possible that the coaching implemented in (some of) the evaluated interventions lacked some important elements. However, vague and insufficient reporting in several included studies (a problem regularly encountered by authors of meta-analyses; Lipsey & Wilson, 2001) prevented us from assessing and comparing the quality and extent of the implementation of elements such as coaching.

Meta-analytic evidence on PD intensity and duration is controversial (Egert, Fukkink, & Eckhardt, 2018; Lipowsky & Rzejak, 2015; Markussen-Brown et al., 2017). In our study, PD duration was not systematically related to the PD effect. This could be explained by the fact that, owing to vague reporting, we were only able to code the total hours spent in workshops/courses and not the duration of other support elements such as coaching (see Table 1). Another reason could be that different PD aims and contents require different program durations, as Lipowsky and Rzejak (2015) assume. We further found a non-significant association between the number of formats and the magnitude of the PD effect, which disappeared after removing an outlying study. Further examining the relationship between PD content and intensity might have been informative. Since our study’s small sample size prevented us from including methodological moderators as control variables in meta-regressions, the findings on moderators regarding PD characteristics could be confounded.

The relatively small number of studies that fall under the scope of our research topic is somewhat surprising. No studies on bilingual programs or from any countries other than the US or the Netherlands met the inclusion criteria for our analysis. One difficulty of this research project certainly lay in defining PD and the type of language support to be considered in the synthesis, since a great variety of concepts exist, many of which are not clearly defined and are used inconsistently (in international research as well as in the field of education in general). Thus, creating an appropriate search syntax and deciding which studies to include involved extensive literature searches and numerous decisions. Given our thorough search for studies, an explanation for the small study sample size may be the relative novelty of the topic (DiCerbo et al., 2014) and the fact that language support strategies need to be developed and, ideally, evaluated before corresponding PD initiatives can emerge and evaluation studies become useful. The preponderance of studies from the US may reflect the pioneering role the US seems to have taken in evaluating PD in this field. Whether the results of our analysis also pertain to other countries is unclear. As many interventions in this analysis were related to science, generalizability might also be limited in terms of subject areas.

There were several additional limitations to our study. For one, we were far from taking all potential effect modifiers into account. For instance, individual school and teacher characteristics also play a role in the effectiveness of teacher PD (Lipowsky & Rzejak, 2015). Insufficient reporting prevented some features from being coded, and small sample sizes, too little or too much variance across studies concerning a given feature, as well as potential confounding prevented others from being analyzed, in particular further intervention characteristics. Finally, some information had to be summarized when aggregating the data, which necessarily led to some simplification. For example, some studies reported data for multiple time points (e.g., Lee & Maerten-Rivera, 2012; Olson et al., 2012), but the development of effects over time was disregarded in our analysis as it was not our study’s focus. Similarly, we summarized findings across different subject areas (note that some studies did not distinguish between subject areas either) as well as across constructs and their respective operationalizations. Considering that different outcome constructs can explain a substantial amount of variance in effect sizes even within one domain (Wilson & Lipsey, 2001), a more differentiated analysis of outcomes might have been useful.

5. Conclusion

The positive results of our analysis support the idea that PD plays an important role in supporting teachers in fostering students’ academic language proficiency across the curriculum. The quantitative meta-analytic estimates may provide a more precise estimate of the benefits of PD than single studies and complement existing qualitative research syntheses (e.g., Kalinowski, Gronostaj, & Vock, 2019; Zhang, 2014). Taken together, the findings of our meta-analysis show that research on the effects of such PD is still at an early stage. Drawing upon existing studies evaluating PD for in-service teachers at general education schools in the field of language support in subject areas, this study identified the magnitude of the effect of such PD interventions on teachers’ cognition and classroom practices. We explored the field on the basis of a few studies with heterogeneous designs, methods, and quality. The magnitude of PD effects might be biased due to the inclusion of unreliable instruments and non-independent information sources. Our study confirms the importance of rigorous evaluation research to reduce the risk of bias. Without doubt, investigating changes in teachers’ classroom performance is methodologically challenging. For instance, it may be difficult to obtain randomized control groups in real-world educational settings (Lipowsky, 2010). Nevertheless, valid instruments and rigorous research processes, such as reliable observations by independent raters, should be established in future research on the effectiveness of teacher PD so that conclusions can be drawn with greater confidence. For instance, a number of recent studies, published after our systematic literature search, used randomized controlled trials (at the school level) as well as multiple observations (Babinski, Amendum, Knotek, Sánchez, & Malone, 2018; Tong et al., 2018). Furthermore, methods, interventions, and outcomes should be described as clearly and comprehensively as possible in order to facilitate comparisons. Further analyses based on a greater body of primary research in this field are needed to resolve the remaining uncertainties in interpretation. In particular, since our analysis does not clearly indicate whether different training characteristics yield different effects, study and PD intervention characteristics need to be disentangled and the relationship between PD features and effect sizes needs to be further explored. On the basis of a greater number of studies, it might also be informative to look at outcome constructs and subject areas separately and to study teacher change over time meta-analytically. An important next step will be to examine student outcomes as well as their relation to teacher outcomes.
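To make the contrast between the two outcome levels concrete, the pooled estimates reported in the abstract (g′ = 0.21, SE = 0.14 for teachers’ cognition; g′ = 0.71, SE = 0.16 for classroom practices) can be converted into a test statistic and a confidence interval. The Python sketch below is illustrative only: it assumes a simple normal-approximation (Wald) test with z = g′/SE and a 95% interval of g′ ± 1.96·SE, as described in introductory texts such as Borenstein et al. (2009). It is not necessarily the exact procedure implemented in the software used for the original analyses, and the function name `wald_summary` is hypothetical.

```python
import math
from statistics import NormalDist


def wald_summary(effect: float, se: float, level: float = 0.95) -> dict:
    """Normal-approximation (Wald) summary of a pooled effect size.

    Assumption: z = effect / se follows a standard normal distribution
    under the null hypothesis of no effect.
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = effect / se
    # Two-sided p value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    # Critical value of the standard normal, about 1.96 for a 95% interval.
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return {"z": z, "p": p, "ci": (effect - crit * se, effect + crit * se)}


# Pooled estimates as reported in the abstract of this article.
cognition = wald_summary(0.21, 0.14)
practices = wald_summary(0.71, 0.16)

for label, res in [("cognition", cognition), ("practices", practices)]:
    lower, upper = res["ci"]
    print(f"{label}: z = {res['z']:.2f}, p = {res['p']:.3g}, "
          f"95% CI [{lower:.2f}, {upper:.2f}]")
```

Under these assumptions, the interval for cognition spans zero (about -0.06 to 0.48, p ≈ 0.13), which is consistent with the non-significant effect, whereas the interval for classroom practices (about 0.40 to 1.02) excludes zero (p < 0.001). The sketch takes the reported standard errors as given and does not re-estimate between-study variance.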
The small overall effect on the level of teachers’ cognition, which was based on a small number of studies and might be unstable, should be investigated further as soon as more primary research on this outcome is available. On the level of teachers’ practices, substantial heterogeneity was indicated, and we were able to examine study and PD intervention characteristics that could help to explain differences in effects. Significantly larger effect sizes were found among potentially biased trials. As a result, the aggregate effect in our original meta-analysis might be slightly overestimated, and the results on moderating effects of PD intervention characteristics could not be interpreted with confidence. Even when taking these uncertainties into account, however, we can assume with some confidence a small to medium positive effect of PD in this particular field.

Thus, our findings on PD effects are promising and of practical relevance. It seems that PD is able to positively affect some aspects of teachers’ beliefs and knowledge and to support teachers in integrating language and literacy teaching into their subject-area teaching. This could inform arguments about providing resources for teacher PD. Our analyses may also support administrators and educators seeking to design or implement teacher PD focused on providing all students with access to the specific kind of language used in education. Although many factors influence students’ language learning (Texas Education Agency, 2000) and teacher PD is only one element involved in fostering it, investments in this area generally seem worthwhile and, in the long term, may help to reduce the disadvantages that students from minority language groups and from families with lower levels of education face in the educational system.

Funding

This work was supported by the German Federal Ministry of Education and Research, Germany [01JI1501A]. The funding source had no involvement in any of the stages of this study.

Acknowledgements

Our thanks to Keri Hartman, who provided indispensable language assistance, and to Max Borchelt for rating hundreds of studies.

References*

*References marked with an asterisk are included in the meta-analysis.

*Anderson, E. M. (2009). Teacher change: The effect of a professional development intervention on middle school mainstream teachers of English language learners. Doctoral dissertation. Denton: University of North Texas.
Anstrom, K. A., DiCerbo, P. A., Butler, F., Katz, A., Millet, J., & Rivera, C. (2010). A review of the literature on academic English: Implications for K-12 English language learners. Arlington, VA: The George Washington University Center for Equity and Excellence in Education.
August, D., Branum-Martin, L., Cárdenas-Hagan, E., Francis, D. J., Powell, J., Moore, S., et al. (2014). Helping ELLs meet the Common Core State Standards for literacy in science: The impact of an instructional intervention focused on academic language. Journal of Research on Educational Effectiveness, 7(1), 54–82.
Babinski, L. M., Amendum, S. J., Knotek, S. E., Sánchez, M., & Malone, P. (2018). Improving young English learners’ language and literacy skills through teacher professional development: A randomized controlled trial. American Educational Research Journal, 55(1), 117–143.
Bandura, A. (1993). Perceived self-efficacy in cognitive development and functioning. Educational Psychologist, 28(2), 117–148.
Basma, B., & Savage, R. (2018). Teacher professional development and student literacy growth: A systematic review and meta-analysis. Educational Psychology Review, 30(2), 457–481.
Becker-Mrotzek, M., Schramm, K., Thürmann, E., & Vollmer, H. J. (2012). Sprache im Fach. Sprachlichkeit und fachliches Lernen [Language in subject areas. Linguisticality and subject-specific learning]. Münster: Waxmann.
Birman, B. F., Desimone, L. M., Porter, A. C., & Garet, M. S. (2000). Designing professional development that works. Educational Leadership, 57(8), 28–33.
Blank, R. K., & de las Alas, N. (2009). Effects of teacher professional development on gains in student achievement: How meta analysis provides scientific evidence useful to education leaders. Washington, DC: The Council of Chief State School Officers.
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to meta-analysis. Chichester, UK: Wiley.
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2013). Comprehensive meta-analysis. Englewood, NJ: Biostat.
Borg, S. (2006). Teacher cognition and language education: Research and practice. London, New York, NY: Continuum.
Brisk, M. E., & Zisselsberger, M. (2011). “We’ve let them in on the secret”: Using SFL theory to improve the teaching of writing to bilingual learners. In T. Lucas (Ed.), Teacher preparation for linguistically diverse classrooms: A resource for teacher educators (pp. 111–126). New York: Taylor & Francis.
Bunch, G. C. (2013). Pedagogical language knowledge: Preparing mainstream teachers for English learners in the new standards era. Review of Research in Education, 37(1), 298–341.
*Cantrell, S. C., & Hughes, H. K. (2008). Teacher efficacy and content literacy implementation: An exploration of the effects of extended professional development with coaching. Journal of Literacy Research, 40(1), 95–127.
Carnegie Council on Advancing Adolescent Literacy. (2010). Time to act: An agenda for advancing adolescent literacy for college and career success. New York, NY: Carnegie Corporation of New York.
Chambers, E. A. (2004). An introduction to meta-analysis with articles from the Journal of Educational Research (1992-2002). The Journal of Educational Research, 98(1), 35–44.
Chamot, A. U., & O’Malley, J. M. (1987). The cognitive academic language learning approach: A bridge to the mainstream. Teachers of English to Speakers of Other Languages, 21(2), 227–249.
Cheuk, T. (2016). Discourse practices in the new standards: The role of argumentation in Common Core-era Next Generation Science Standards classrooms for English language learners. Electronic Journal of Science Education, 20(3), 92–111.
Cheung, A. C. K., & Slavin, R. E. (2016). How methodological features affect effect sizes in education. Educational Researcher, 45(5), 283–292.
*Choi, D. S.-Y., & Morrison, P. (2014). Learning to get it right: Understanding change processes in professional development for teachers of English learners. Professional Development in Education, 40(3), 416–435.
Cohen, J. (1988). Statistical power analysis for the behavioural sciences. Hillsdale, NJ: Lawrence Erlbaum Associates.
Cohn, L. D., & Becker, B. J. (2003). How meta-analysis increases statistical power. Psychological Methods, 8(3), 243–253.
Cooper, H., & Hedges, L. V. (2009). Research synthesis as a scientific process. In H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), The handbook of research synthesis and meta-analysis (2nd ed., pp. 3–16). New York: Russell Sage Foundation.
*Crawford, L., Schmeister, M., & Biggs, A. (2008). Impact of intensive professional development on teachers’ use of sheltered instruction with students who are English language learners. Journal of In-Service Education, 34(3), 327–342.
Cummins, J. (2008). BICS and CALP: Empirical and theoretical status of the distinction. In B. Street, & N. H. Hornberger (Eds.), Literacy (2nd ed., Vol. 2, pp. 71–83). New York: Springer Science + Business Media. Encyclopedia of language and education.
Deeks, J. J., Higgins, J. P. T., & Altman, D. G. (2011). Analysing data and undertaking meta-analyses. In J. P. T. Higgins, & S. Green (Eds.), Cochrane handbook for systematic reviews of interventions (Version 5.1.0).
DiCerbo, P. A., Anstrom, K. A., Baker, L. L., & Rivera, C. (2014). A review of the literature on teaching academic English to English language learners. Review of Educational Research, 84(3), 446–482.
Dickersin, K. (1990). The existence of publication bias and risk factors for its occurrence. The Journal of the American Medical Association, 263(10), 1385–1389.
Echevarria, J., & Graves, A. (2007). Sheltered content instruction: Teaching English language learners with diverse abilities (3rd ed.). Boston: Pearson/Allyn and Bacon.
Echevarria, J., Vogt, M. E., & Short, D. J. (2004). Making content comprehensible for English learners: The SIOP model (3rd ed.). Boston: Pearson/Allyn and Bacon.
Education through Speaking and Writing [BiSS – Bildung durch Sprache und Schrift]. (2014). Offizieller Auftakt der Bund-Länder-Initiative in Berlin [Official kick-off of the joint federal-states initiative in Berlin]. Retrieved on 22 November, 2018 from http://www.biss-sprachbildung.de/neuigkeit.html?Id=11.
Egert, F., Fukkink, R. G., & Eckhardt, A. G. (2018). Impact of in-service professional development programs for early childhood teachers on quality ratings and child outcomes: A meta-analysis. Review of Educational Research, 88(3), 401–433. https://doi.org/10.3102/0034654317751918.
Egger, M., Smith, G. D., Schneider, M., & Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test. British Medical Journal, 315(7109), 629–634.
Eun, B. (2006). The impact of an English as a second language professional development program: A social cognitive approach. Doctoral dissertation. Chapel Hill: University of North Carolina.
Feilke, H. (2012). Bildungssprachliche Kompetenzen fördern und entwickeln [Fostering and developing academic language proficiency]. Praxis Deutsch, 233, 4–13.
Fillmore, L. W., & Snow, C. E. (2000). What teachers need to know about language. Washington, DC: Center for Applied Linguistics.
Francis, D. J., Rivera, M., Lesaux, N., Kieffer, M., & Rivera, H. (2006). Practical guidelines for the education of English language learners: Research-based recommendations for instruction and academic interventions. Retrieved on 22 November, 2018 from http://www.centeroninstruction.org/files/ELL1-Interventions.pdf.
Frey, N., Fisher, D., & Nelson, J. (2010). Lessons scooped from the melting pot: California district increases achievement through English language development. JSD, 31(5), 24–28.
Fukkink, R. G., & Lont, A. (2007). Does training matter? A meta-analysis and review of caregiver training studies. Early Childhood Research Quarterly, 22(3), 294–311.
Garet, M. S., Porter, A. C., Desimone, L. M., Birman, B. F., & Yoon, K. S. (2001). What makes professional development effective? Results from a national sample of teachers. American Educational Research Journal, 38(4), 915–945.
Gersten, R., Fuchs, L. S., Compton, D., Coyne, M., Greenwood, C., & Innocenti, M. S. (2005). Quality indicators for group experimental and quasi-experimental research in special education. Exceptional Children, 71(2), 149–164.
Gibbons, P. (2006). Bridging discourses in the ESL classroom: Students, teachers and researchers. London, New York: Continuum.
Gibbons, P. (2015). Scaffolding language, scaffolding learning: Teaching English language learners in the mainstream classroom. Portsmouth: Heinemann.
Gogolin, I., & Lange, I. (2011). Bildungssprache und Durchgängige Sprachbildung [Academic language and continuous language education]. In S. Fürstenau, & M. Gomolla (Eds.), Migration und schulischer Wandel: Mehrsprachigkeit (pp. 107–127). Wiesbaden: VS, Springer Fachmedien.
*Greenleaf, C. L., Litman, C., Hanson, T. L., Rosen, R., Boscardin, C. K., Herman, J., & Jones, B. (2011). Integrating literacy and science in biology: Teaching and learning impacts of reading apprenticeship professional development. American Educational Research Journal, 48(3), 647–717.
Guskey, T. R. (2000). Evaluating professional development. Thousand Oaks: Corwin.
Guskey, T. R., & Passaro, P. D. (1994). Teacher efficacy: A study of construct dimensions. American Educational Research Journal, 31(3), 627–643.
Guskey, T. R., & Yoon, K. S. (2009). What works in professional development? Phi Delta Kappan, 90(7), 495–500.
Hamre, B. K., Pianta, R. C., Downer, J. T., DeCoster, J., Mashburn, A. J., Jones, S. M., et al. (2013). Teaching through interactions: Testing a developmental framework of teacher effectiveness in over 4,000 classrooms. The Elementary School Journal, 113(4), 461–487. https://doi.org/10.1086/669616.
*Hart, J. E., & Lee, O. (2003). Teacher professional development to improve the science and literacy achievement of English language learners. Bilingual Research Journal, 27(3), 475–501.
Hattie, J. A. C. (2009). Visible learning: A synthesis of over 800 meta-analyses relating to achievement. London: Routledge.
Haynes, J. (2007). Getting started with English language learners: How educators can meet the challenge. Alexandria, VA: Association for Supervision and Curriculum Development.
*Henrichs, L. F., & Leseman, P. P. M. (2014). Early science instruction and academic language development can go hand in hand: The promising effects of a low-intensity teacher-focused intervention. International Journal of Science Education, 36(17), 2978–2995.
Heppt, B., Henschel, S., & Haag, N. (2016). Everyday and academic language comprehension: Investigating their relationship with school success and challenges for language. Learning and Individual Differences, 47, 244–251.
He, Y., Prater, K., & Steed, T. (2011). Moving beyond “just good teaching”: ESL professional development for all teachers. Professional Development in Education, 37(1), 7–18.
Herrera, S. G., & Murry, K. G. (2005). Mastering ESL and bilingual methods: Differentiated instruction for culturally and linguistically diverse (CLD) students. Boston: Pearson.
Higgins, J. P. T., Deeks, J. J., & Altman, D. G. (2011). Special topics in statistics. In J. P. T. Higgins, & S. Green (Eds.), Cochrane handbook for systematic reviews of interventions (Version 5.1.0). Retrieved on 22 November, 2018 from https://training.cochrane.org/handbook.
Higgins, J. P. T., & Green, S. (Eds.). (2011). Cochrane handbook for systematic reviews of interventions (Version 5.1.0). Retrieved on 22 November, 2018 from https://handbook-5-1.cochrane.org/.
Higgins, J. P. T., Thompson, S. G., Deeks, J. J., & Altman, D. G. (2003). Measuring inconsistency in meta-analyses. BMJ Clinical Research Ed., 327(7414), 557–560.
Hoffman, M., & Harris, W. (2018). A complete guide to continuing education for teachers. Retrieved on 22 November, 2018 from https://www.teachtomorrow.org/continuing-education-for-teachers/.
Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings. Thousand Oaks: Sage Publications.
IBM Corp. (2016). IBM SPSS statistics for Windows. Armonk, NY: IBM Corp.
Jacob, A., & McGovern, K. (2015). The mirage: Confronting the hard truth about our quest for teacher development. Brooklyn: TNTP.
Kalinowski, E., Gronostaj, A., & Vock, M. (2019). Effective professional development for teachers to foster students’ academic language proficiency across the curriculum: A systematic review. AERA Open, 5(1), 1–23. https://doi.org/10.1177/2332858419828691.
Kim, J. S., Olson, C. B., Scarcella, R., Kramer, J., Pearson, M., van Dyk, D. A., & Land, R. E. (2011). A randomized experiment of a cognitive strategies approach to text-based analytical writing for mainstreamed Latino English language learners in grades 6 to 12. Journal of Research on Educational Effectiveness, 4(3), 231–263.
Knight, S. L., & Wiseman, D. L. (2005). Professional development for teachers of diverse students: A summary of the research. Journal of Education for Students, 10(4), 387–405.
Kolano, L. Q., Dávila, L. T., Lachance, J., & Coffey, H. (2014). Multicultural teacher education: Why teachers say it matters in preparing them for English language learners. CATESOL Journal, 25(1), 41–65.
Landis, J. R., & Koch, G. G. (1977). An application of hierarchical Kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics, 33(2), 363.
Lara-Alecio, R., Tong, F., Irby, B. J., Guerrero, C., Huerta, M., & Fan, Y. (2012). The effect of an instructional intervention on middle school English learners’ science and English reading achievement. Journal of Research in Science Teaching, 49(8), 987–1011.
*Lee, O., & Maerten-Rivera, J. L. (2012). Teacher change in elementary science instruction with English language learners: Results of a multiyear professional development intervention across multiple grades. Teachers College Record, 114(8), 1–44.
Lipowsky, F. (2010). Lernen im Beruf – empirische Befunde zur Wirksamkeit von Lehrerfortbildung [Vocational learning: Empirical evidence on the effectiveness of teacher training]. In F. H. Müller, A. Eichenberger, M. Lüders, & J. Mayr (Eds.), Lehrerinnen und Lehrer lernen: Konzepte und Befunde zur Lehrerfortbildung (pp. 51–72). Münster: Waxmann.
Lipowsky, F. (2014). Theoretische Perspektiven und empirische Befunde zur Wirksamkeit von Lehrerfort- und -weiterbildung [Theoretical perspectives and empirical evidence on the effectiveness of teachers’ professional development and training]. In E. Terhart, H. Bennewitz, & M. Rothland (Eds.), Handbuch der Forschung zum Lehrerberuf (pp. 511–541). Münster: Waxmann.
Lipowsky, F., & Rzejak, D. (2015). Key features of effective professional development programmes for teachers. Ricercazione, 7(2), 27–51.
Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis. Applied social research methods series: Vol. 49. Thousand Oaks, CA: Sage Publications.
Luborsky, L., Diguer, L., Seligman, D. A., Rosenthal, R., Krause, E. D., Johnson, S., & Schweizer, E. (1999). The researcher’s own therapy allegiances: A “wild card” in comparisons of treatment efficacy. Clinical Psychology: Science and Practice, 6(1), 95–106.
Lucas, T., & Villegas, A. M. (2011). A framework for preparing linguistically responsive teachers. In T. Lucas (Ed.), Teacher preparation for linguistically diverse classrooms: A resource for teacher educators. New York: Taylor & Francis.
Markussen-Brown, J., Juhl, C. B., Piasta, S. B., Bleses, D., Højen, A., & Justice, L. M. (2017). The effects of language- and literacy-focused professional development on early educators and children: A best-evidence meta-analysis. Early Childhood Research Quarterly, 38, 97–115.
McFarland, J., Hussar, B., Wang, X., Zhang, J., Wang, K., Rathbun, A., & Bullock Mann, F. (2018). The condition of education 2018. Washington, DC: U.S. Department of Education. Retrieved on 22 November, 2018 from https://nces.ed.gov/pubs2018/2018144.pdf.
McIntyre, E., Kyle, D., Chen, C.-T., Muñoz, M., & Beldon, S. (2010). Teacher learning and ELL reading achievement in sheltered instruction classrooms: Linking professional development to student development. Literacy Research and Instruction, 49(4), 334–351.
Miramontes, O. B., Nadeau, A., & Commins, N. L. (1997). Restructuring schools for linguistic diversity: Linking decision making to effective programs. Language and literacy series. New York: Teachers College Press.
Mulrow, C. D. (1994). Rationale for systematic reviews. British Medical Journal, 309(6954), 597–599.
Munder, T., Brütsch, O., Leonhart, R., Gerger, H., & Barth, J. (2013). Researcher allegiance in psychotherapy outcome research: An overview of reviews. Clinical Psychology Review, 33(4), 501–511.
National Clearinghouse for English Language Acquisition. (2017). National professional development program. Retrieved on 22 November, 2018 from https://ncela.ed.gov/national-professional-development-program.
Neuman, S. B., & Wright, T. S. (2010). Promoting language and literacy development for early childhood educators: A mixed-methods study of coursework and coaching. The Elementary School Journal, 111(1), 63–86.
OECD. (2016). PISA 2015 results: Excellence and equity in education (Vol. I). Paris: OECD Publishing.
OECD [Organisation for Economic Co-operation and Development]. (2009). Creating effective teaching and learning environments: First results from TALIS. OECD Publishing. Retrieved on 22 November, 2018 from https://www.oecd-ilibrary.org/education/creating-effective-teaching-and-learning-environments_9789264068780-en.
*Olson, C. B., Kim, J. S., Scarcella, R., Kramer, J., Pearson, M., van Dyk, D. A., & Land, R. E. (2012). Enhancing the interpretive reading and analytical writing of mainstreamed English learners in secondary school: Results from a randomized field trial using a cognitive strategies approach. American Educational Research Journal, 49(2), 323–355.
Paetsch, J., Wolf, K. M., Stanat, P., & Darsow, A. (2014). Sprachförderung von Kindern und Jugendlichen aus Zuwandererfamilien [Language support for children and adolescents from immigrant families]. Zeitschrift für Erziehungswissenschaft, 24, 315–347.
Pant, H. A. (2014). Aufbereitung von Evidenz für bildungspolitische und pädagogische Entscheidungen: Metaanalysen in der Bildungsforschung [Preparing evidence for education policy and pedagogical decisions: Meta-analyses in education science]. Zeitschrift für Erziehungswissenschaft, 17(4), 79–99.
Patrick, V. D. (2012). The role of superintendents as instructional leaders facilitating student achievement among ESL/EL learners through school-site professional development. Doctoral dissertation. University of Southern California.
Quintero, D., & Hansen, M. (2017). English learners and the growing need for qualified teachers. Retrieved on 22 November, 2018 from https://www.brookings.edu/blog/brown-center-chalkboard/2017/06/02/english-learners-and-the-growing-need-for-qualified-teachers/.
Samson, J. F., & Collins, B. A. (2012). Preparing all teachers to meet the needs of English language learners: Applying research to policy and practice for teacher effectiveness. Retrieved on 22 November, 2018 from http://files.eric.ed.gov/fulltext/ED535608.pdf.
Sanetti, L. M. H., & Kratochwill, T. R. (2014). Treatment integrity: A foundation for evidence-based practice in applied psychology. School Psychology book series. Washington, DC: American Psychological Association.
Scarcella, R. (2003). Academic English: A conceptual framework. University of California Linguistic Minority Research Institute.
Schleppegrell, M. J. (2001). Linguistic features of the language of schooling. Linguistics and Education, 12(4), 431–459.
Schleppegrell, M. J. (2004). The language of schooling: A functional linguistics perspective. Mahwah, New Jersey: Erlbaum.
Schleppegrell, M. J. (2009). Language in academic subject areas and classroom instruction: What is academic language and how can we teach it? Paper presented at the workshop “The role of language in school learning” by the National Academy of Sciences, Menlo Park, CA. Retrieved on 22 November, 2018 from https://www.rcoe.us/educational-services/files/2012/08/What_is_Academic_Language_Schleppegrell.pdf.
Schmölzer-Eibinger, S. (2013). Sprache als Medium des Lernens im Fach [Language as the medium for learning in the subjects]. In M. Becker-Mrotzek, K. Schramm, E. Thürmann, & H. J. Vollmer (Eds.), Sprache im Fach. Sprachlichkeit und fachliches Lernen (pp. 25–40). Münster: Waxmann.
*Shanahan, T., & Shea, L. (2012). Incorporating English language teaching through science for K-2 teachers. Journal of Science Teacher Education, 23(4), 407–428.
Stanat, P., Weirich, S., & Radmann, S. (2012). Sprach- und Leseförderung [Language and reading support]. In P. Stanat, H. A. Pant, K. Böhme, & D. Richter (Eds.), Kompetenzen von Schülerinnen und Schülern am Ende der vierten Jahrgangsstufe in den Fächern Deutsch und Mathematik: Ergebnisse des IQB-Ländervergleichs 2011 (pp. 251–276). Münster: Waxmann.
Sterne, J. A. C., Egger, M., & Smith, G. D. (2001). Investigating and dealing with publication and other biases in meta-analysis. British Medical Journal, 323(7304), 101–105.
Texas Education Agency. (2000). The Texas Successful Schools Study: Quality education for limited English proficient students. Austin, TX.
The Campbell Collaboration. (2014). Campbell Collaboration systematic reviews: Policies and guidelines. Campbell Policies and Guidelines Series No. 1: The Campbell Collaboration. Retrieved on 22 November, 2018 from https://www.campbellcollaboration.org/library/campbell-collaboration-systematic-reviews-
The impact of professional learning on in-service teachers’ pedagogical delivery of literacy-infused science with middle school English learners: A randomised controlled trial study in the U.S. Educational Studies, 21(2), 1–21.
Townsend, D., Filippini, A., Collins, P., & Biancarosa, G. (2012). Evidence for the importance of academic word knowledge for the academic achievement of diverse middle school students. The Elementary School Journal, 112(3), 497–518.
Uccelli, P., Galloway, E. P., Barr, C. D., Meneses, A., & Dobbs, C. L. (2015). Beyond vocabulary: Exploring cross-disciplinary academic-language proficiency and its association with reading comprehension. Reading Research Quarterly, 50(3), 337–356.
U.S. Department of Education. (2016). English learner tool kit for State and Local Education Agencies (SEAs and LEAs). Washington, DC. Retrieved on 22 November, 2018 from https://www2.ed.gov/about/offices/list/oela/english-learner-toolkit/eltoolkit.pdf.
U.S. Department of Education. (2017). National professional development program. Retrieved on 22 November, 2018 from https://www2.ed.gov/programs/nfdp/funding.html.
Valdes, G., Kibler, A., & Walqui, A. (2014). Changes in the expertise of ESL professionals: Knowledge and action in an era of new standards. Alexandria, VA: TESOL International Association.
Van Roekel, D. (2011). Professional development for general education teachers of English language learners. Washington, DC: NEA Quality School Programs and Resources Department.
Vaughn, S., Linan-Thompson, S., Woodruff, A. L., Murray, C. S., Wanzek, J., Scammacca, N., & Elbaum, B. (2008). Effects of professional development on improving at-risk students’ performance in reading. In C. R. Greenwood, T. R. Kratochwill, & M. Clements (Eds.), Schoolwide prevention models: Lessons learned in elementary schools (pp. 115–142). New York: Guilford Press.
Wade, R. K. (1984). What makes a difference in inservice teacher education? A meta-analysis of research. Educational Leadership, 42(4), 48–54.
Walsleben, L. (2008). Signs of success: District gets specific about the needs of English language learners. JSD: English Language Learners, 29(1), 18–22.
Wei, R. C., Darling-Hammond, L., Andree, A., Richardson, N., & Orphanos, S. (2009). Professional learning in the learning profession: A status report on teacher development in the United States and abroad. Dallas, TX: National Staff Development Council. Retrieved on 10 September, 2019 from https://edpolicy.stanford.edu/sites/default/files/publications/professional-learning-learning-profession-status-report-teacher-development-us-and-abroad.pdf.
Wilson, D. B., & Lipsey, M. W. (2001). The role of method in treatment effectiveness research: Evidence from meta-analysis. Psychological Methods, 6(4), 413–429.
WWC [What Works Clearinghouse]. (2017). Standards handbook (Version 4.0). Washington, DC: U.S. Department of Education.
Yoon, K. S., Duncan, T., Lee, S. W.-Y., Scarloss, B., & Shapley, K. (2007). Reviewing the evidence on how teacher professional development affects student achievement. Washington, DC: Institute of Education Sciences. Retrieved on 22 November,
policies-and-guidelines.html. Version 1.3. 2018 from https://ies.ed.gov/ncee/edlabs/regions/southwest/pdf/REL_2007033.
Timperley, H. S., Wilson, A., Barrar, H., & Fung, I. (2007). Teacher professional learning pdf.
and development: Best evidence synthesis iteration. Wellington: Ministry of Zhang, Y. (2014). Investigating the impact of a university-based professional develop-
Education. ment program for teachers of English language learners in Ohioda mixed methods
Tong, F., Irby, B. J., Lara-Alecio, R., Guerrero, C., Tang, S., & Sutton-Jones, K. L. (2018). study of teacher learning and change. Doctoral dissertation.
