sustainability

Review
Effects of Gamification on Students’ English Language
Proficiency: A Meta-Analysis on Research in South Korea
Je-Young Lee 1 and Minkyung Baek 2, *

1 Department of English Education, Jeonju University, Jeonju 55069, Republic of Korea; jylee@jj.ac.kr
2 Department of Home Economics Education, Jeonju University, Jeonju 55069, Republic of Korea
* Correspondence: bmk0419@jj.ac.kr

Abstract: This study presents a meta-analysis of research on the impact of gamification on English
language proficiency among South Korean students. Through an examination of 11 cases involving
610 participants, the study reveals a medium effect size (g = 0.517), suggesting that gamification
can significantly enhance English language learning outcomes. The analysis also reveals that theses
(g = 0.799) reported higher effect sizes than journal articles (g = 0.298), and that the absence of
technology in gamified learning interventions could potentially lead to larger effect sizes (g = 0.932).
Furthermore, the incorporation of points/scores and badges/rewards showed statistically significant
effects on student learning. The study found no significant differences in effect sizes when considering
grade, number of participants, weeks, sessions, sessions per week, and the number of gaming
elements. The results demonstrate varying impact of gamification across different subcomponents
of English proficiency, particularly in the learning of vocabulary, listening, and writing skills. The
findings underscore the potential of gamification as a tool for English language learning, but also call
for careful consideration in its design and implementation to maximize learning outcomes. Lastly,
we offer suggestions for future research and discuss the pedagogical implications of this study.

Keywords: gamification; gamified learning; meta-analysis; research synthesis; English as a foreign language
Citation: Lee, J.-Y.; Baek, M. Effects of Gamification on Students’ English Language Proficiency: A Meta-Analysis on Research in South Korea. Sustainability 2023, 15, 11325. https://doi.org/10.3390/su151411325
Academic Editors: Fezile Özdamlı, Damla Karagozlu and Şenay Kocakoyun Aydoǧan
Received: 30 June 2023
Revised: 16 July 2023
Accepted: 19 July 2023
Published: 20 July 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sustainability 2023, 15, 11325. https://doi.org/10.3390/su151411325 https://www.mdpi.com/journal/sustainability

1. Introduction
The 20th century was characterized by the advent of information technology, marking a significant milestone in the way we learn and educate. However, the 21st century has heralded a new era, the ‘ludic century’, a time when play has assumed a critical role in educational methodologies [1]. In contrast to the 20th century’s focus on merely integrating technology into classrooms, the current century propels us to make learning not just technologically advanced, but also more enjoyable and effective. Among the various strategies adopted to achieve this, gamification, or the application of game elements in non-game contexts, has emerged as a promising approach. The concept of gamification was first introduced in 2002 and has been incorporated into numerous domains, including education, since 2010 [2].
English as a Foreign Language (EFL) education has also benefited from this trend. After the advent of computers, several technological applications have been tested in foreign language education, including computer-assisted language learning (CALL) and technology-enhanced language learning (TELL). The late 1990s and early 2000s saw recognition for the potential of game-based learning in language acquisition [3]. In the realm of EFL, gamification entails the use of game design principles and elements, such as rewards, badges, leaderboards, and challenges. These elements aim to motivate learners, augment their engagement, and render the learning process more enjoyable [4].
The journey to attain proficiency in a foreign language requires significant time and effort. Keeping this in mind, fostering engagement and interest in learners is crucial. Research indicates that learners who exhibit interest and active engagement in their language study perform better and retain the language for longer [5,6]. Hence, gamification can significantly enhance not only the efficiency but also the overall amount of learning.
Nevertheless, the efficacy of gamification in foreign language education is not unanimously
agreed upon. Numerous studies (e.g., [7–9]) have suggested that gamification
positively impacts learning outcomes and improves foreign language skills. Conversely,
other studies have highlighted no significant improvements or even potential negative
impacts on foreign language learning and behavioral aspects (e.g., [10–12]).
With this background, our study aspires to conduct a comprehensive meta-analysis on
the influence of gamification on English learning as a foreign language, with a particular
focus on experimental studies from South Korea. We aim to evaluate whether English
classes employing gamification demonstrate positive effects, and if so, quantify the extent
of these benefits. Simultaneously, we intend to scrutinize factors, particularly gaming
elements, that moderate the impact of gamification on learning. Prior to presenting the
methodology and results of our research, we will delve into the gaming elements, known for
significantly influencing the success or failure of gamification, and underline the importance
and uniqueness of the present study.

1.1. Gaming Elements in Gamification


Gamification, defined as the application of game design elements in non-gaming environments,
has been recognized for its potential to enhance user engagement, productivity,
and learning outcomes. In the context of education, the integration of gaming elements into
learning experiences—known as gamified learning—is particularly notable for its array of
associated benefits.
Engagement, an essential component of effective learning, is greatly amplified by the
deliberate choice and incorporation of gaming elements [13]. These elements also function
as powerful aids in reinforcing learning outcomes by providing learners with immediate
and tangible feedback [14]. Moreover, gaming elements facilitate an environment of healthy
competition and collaboration, augmenting social interaction and collective learning [15].
The multifaceted nature of these elements accommodates diverse learning styles and rates,
fostering a more inclusive and individualized learning experience [16].
Building on existing academic work, Toda et al. [17] presented an expanded taxonomy
of gaming elements. This is concisely summarized by the researchers, as provided in the
following Table 1.

Table 1. Taxonomy of gaming elements (summarized from the findings of Toda et al. [17]).

| Dimension | Element | Description |
|---|---|---|
| Performance/measurement | 1. Acknowledgement | Rewards learners for specific tasks, e.g., badges for completed problems. |
| Performance/measurement | 2. Level | Hierarchical system providing new advantages as learners progress. |
| Performance/measurement | 3. Progression | Guides users about their advancement in the environment. |
| Performance/measurement | 4. Point | Basic feedback method, usually through scores or experience points. |
| Performance/measurement | 5. Stats | Visual information about the learner’s performance or overall environment. |
| Ecological | 1. Chance | Involves uncertainty in outcomes. |
| Ecological | 2. Imposed Choice | Requires users to make decisions for progress. |
| Ecological | 3. Economy | Represents transactions within the environment. |
| Ecological | 4. Rarity | Involves limited resources to stimulate specific goals. |
| Ecological | 5. Time Pressure | Applies time constraints but can disengage users. |
| Social | 1. Competition | Involves user challenges to attain common goals. |
| Social | 2. Cooperation | Encourages collaboration towards shared objectives. |
| Social | 3. Reputation | Relates to social status titles within a community. |
| Social | 4. Social Pressure | Reflects the influence of social interactions on behavior. |
| Personal | 1. Novelty | Updates within the environment to maintain user engagement. |
| Personal | 2. Objectives | Goals providing a purpose for task completion. |
| Personal | 3. Puzzle | Cognitive challenges within the environment. |
| Personal | 4. Renovation | Opportunities for learners to redo tasks. |
| Personal | 5. Sensation | Enhances the experience using sensory stimulation. Lack of these may lead to demotivation. |
| Fictional | 1. Narrative | Describes event sequences, influenced by user decisions. |
| Fictional | 2. Storytelling | Conveys the environment’s story, supporting the narrative. |

The practical implementation of this gamification taxonomy can significantly shape learning environments. When integrated correctly, these dimensions can create an interactive, motivating, and meaningful learning experience. However, their application necessitates careful balance. Constructive and motivating performance feedback is one example of this [18]. It is crucial to promote healthy competition and collaboration through ecological and social dimensions [15], and to cater to individual learning preferences and pace with personal elements [16]. Fictional elements should be used to enhance immersion without distracting from educational content [19]. While gamification can boost user engagement and motivation, an unbalanced emphasis or misuse can lead to stress, disengagement, or unhealthy competition [20]. Therefore, gamification should be thoughtfully planned and executed with an emphasis on enriching learning and overall student experiences [21].

1.2. Contributions of This Study


First, the present study is significant because it uniquely focuses on the influence
of gamification on English language proficiency. As one of the most frequently attempted
applications of gamification is in the field of foreign languages, this specificity holds
substantial relevance. Previous research by Bai et al. [2] has reported an effect size of
g = 0.377 for language subjects, but there still exists a gap in quantitative meta-analytical
studies that concentrate particularly on the impact within foreign languages, and more
specifically, English. This gap is especially noteworthy considering the integral role of
English as a functional subject and the prevalent usage of various technological tools in
its teaching and learning processes. Therefore, a meta-analysis exploring the effects of
gamification within English education holds immense significance.
Secondly, this research narrows its focus to studies conducted exclusively within South Korea. A qualitative meta-analysis by Kwon and Lyou [22] on gamification studies within South Korea identified ‘education’ as the most active field of such studies since 2010. In comparison with other regions, Bai et al. [2] reported that the implementation of gamification yielded higher effects in East Asian regions, including South Korea (g = 0.514), compared to South America (g = 0.39), North America (g = 0.254), and Southern Europe (g = −0.725). Additionally, the latest curriculum from South Korea’s Ministry of Education [23] emphasizes the application of various technologies in education to integrate education for sustainable development. For example, the national curriculum for English language stated that “the English language subject nurtures students’ English communication skills, cultivating fundamental competencies and adaptability to actively respond to societal changes prompted by digital transformation, climate change, and environmental disasters. Communicating in English means acquiring various forms of information expressed in English within real-life contexts connected to the students’ lives. It also involves freely and creatively expressing their thoughts and feelings in English, and cooperatively interacting with other participants in the English-speaking community (p. 5)”. There has been a surge in the active implementation of training programs aimed at strengthening teachers’ digital literacy, with a special focus on technological pedagogical content knowledge (TPACK) [24–26]. Hence, a meta-analysis centered on studies from South Korea, a region striving for diverse technology-education integrations, including gamification, is likely to provide significant insights into the overall effects of gamification and the factors that dictate its success or failure.
Thirdly, the study proposes new insights regarding publication bias, particularly with
regard to language bias. Publication bias is a vital issue to consider in meta-analysis,
as it can considerably distort research conclusions [27]. Language bias pertains to the
preferential selection and citation of studies published in certain languages, commonly
English, at the expense of other languages. This bias can significantly influence the accessibility,
interpretation, and generalizability of research findings [28,29]. Meta-analyses and
systematic reviews published in English-language journals often overlook studies penned
in other languages. Despite the impracticality of mastering every language containing
relevant literature, researchers must remain cognizant of this limitation and its bearing on
the body of literature under review [30].
Some scholars have put forth several pragmatic strategies for reducing language bias
in meta-analysis, including: (1) incorporating non-English studies, (2) implementing a
comprehensive search strategy involving multiple databases and sources of grey literature,
(3) collaborating with researchers fluent in diverse languages, and (4) employing language
bias assessment tools such as Egger’s Test and Funnel Plots [28,31,32]. Following these
guidelines, this study employs a systematic approach, as detailed in the methodology
section, to mitigate language bias and publication bias.
Firstly, this study includes research written in languages other than English, specifically
Korean, to ensure the representation of non-English literature. Secondly, a comprehensive
search strategy was employed, encompassing multiple academic databases to gather a
broad spectrum of relevant literature. Additionally, we sought out unpublished material,
such as dissertations and research reports, to reduce the impact of publication bias. Lastly,
to assess the presence and impact of potential language bias, we conducted a statistical
validation of publication bias, employing tools such as Egger’s Test. Therefore, through
these multi-pronged strategies, this study aims to present a thorough and balanced meta-
analysis by minimizing the influence of language bias.
The specific objectives of this research can be listed as follows:
1. To evaluate the effect of gamification on the English proficiency of Korean students.
2. To identify the moderating factors, such as gaming elements, influencing the effects of
gamification on Korean students’ English proficiency.
3. To investigate any differences in the effects of gamification on Korean students’ English
proficiency based on various dependent variables.

2. Methods
2.1. Selection of Studies for Analysis
The scope of this meta-analysis is defined by studies that have investigated the influence
of gamification on English proficiency, identified through a search on the Research
Information Sharing Service (RISS). RISS, a well-acknowledged Korean academic database,
provides a broad platform to perform meta-searches across prime academic databases in
Korea, such as KCI, DBpia, e-article, KISS, and Kyobo Scholar. Furthermore, RISS holds
almost all Master’s theses and Doctoral dissertations from universities in South
Korea. The search, conducted in April 2023, utilized keywords such as ‘gamification’ and
‘gamified learning’, along with their Korean equivalents.
The initial collection phase accumulated a total of 237 studies: 154 from academic
journals and 83 theses. A series of refining steps, as depicted in Figure 1, were then applied
to this primary collection, following the PRISMA statement [33].

Figure 1. PRISMA flowchart of the article screening process.

In the selection process, inclusion and exclusion criteria were established with reference to previous studies [2,34–36], creating a definitive framework for this meta-analysis:
(a) Only studies that empirically investigated gamified practices were included, excluding any that simply discussed or described gamification without empirical backing.
(b) The focus was placed on studies conducted in K-12 or higher education settings. Any studies where interventions were held in out-of-school environments, such as private tutoring or extracurricular academies, were omitted.
(c) Included studies were required to objectively measure students’ English proficiency after the treatment. Studies were excluded if they solely relied on self-reported data about students’ learning achievements.
(d) Studies had to explicitly mention the usage of at least one game element. Those not specifying the game elements used were excluded.
(e) Finally, studies were excluded if their datasets or results lacked sufficient information for effect size calculations, such as missing sample size data or mean scores without corresponding standard deviation values.
2.2. Codebook
The objective of this study is to explore the impact of gamification on EFL learners’
English proficiency. In order to do so, each study is classified according to analysis criteria,
as displayed in Table 2. Two independent researchers conducted the analysis, reconciling
any differences in coding outcomes via discussion and consensus.

Table 2. Coding scheme.

| Category | Variables |
|---|---|
| 1. Publication Type | (1) journal article (2) MA thesis |
| 2. Experimental Design | (1) quasi-experimental (2) pre-experimental |
| 3. School Level | (1) primary school (2) secondary school (3) university |
| 4. Technology Use | (1) used (2) not used |
| 5. Gaming Element | (1) avatar/character (2) point/score (3) leaderboard/scoreboard (4) feedback (5) level (6) collaboration (7) mission/challenge (8) story/fiction (9) badge/reward |
| 6. Grade | Raw data |
| 7. Number of Participants | Raw data |
| 8. Weeks | Raw data |
| 9. Sessions | Raw data |
| 10. Sessions per Week | Raw data |
| 11. Number of Gaming Elements | Raw data |
| 12. Dependent Variables | (1) listening (2) speaking (3) reading (4) writing (5) vocabulary (6) achievement |

To begin with, let us explain the moderating variables, which are classified under a
nominal scale. The ‘publication type’ is the first criterion, differentiating between studies
published in academic journals and those presented as master’s theses. Notably, no doctoral
dissertation was included in the final meta-analysis.
The second criterion, ‘experimental design’, is employed to categorize the study
designs investigating the effects of gamification. As per Brown’s classification [37] for the
L2 field, experimental study designs are divided into: (1) true-experimental design, using
both random sampling and control groups, (2) quasi-experimental design, only employing
control groups without random sampling, and (3) pre-experimental design, which lacks
both random sampling and control groups. This study adopts these categories. However,
it should be noted that this study is confined to experimental studies carried out in K-12
and universities, where random sampling is infrequently applied. Therefore, none of the
studies featured random sampling.
The ‘school level’ is the third criterion, categorizing the setting of the experiment
into primary school, secondary school, and university. Initially, a distinction was planned
between middle and high schools, but no middle school studies were included in the final
meta-analysis.
The fourth criterion, ‘technology use’, differentiates between studies that incorporated
technology-related elements, such as computers, tablets, or applications, in their
gamification process, and those that did not.
The fifth criterion, ‘gaming element’, classifies the elements utilized in each gamified
learning experiment, only recording those explicitly mentioned in each study, and
recategorizing these according to the classifications of prior studies [2,17,34,38–40].
Items 6–11 correspond to continuous scale (interval scale) moderators, with the results
for each case entered as raw data (numerical data). In the ‘grade’ category, the grades of
experiment participants are recorded. In studies where multiple grades are involved in the
experiment [17,41], an average grade is calculated based on the ratio of students by grade
and recorded.
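
The averaging rule for mixed-grade samples can be illustrated with a minimal sketch. The function name and the student counts below are hypothetical and are not taken from any of the coded studies; the sketch only assumes that "ratio of students by grade" means a mean weighted by head count.

```python
def weighted_mean_grade(counts_by_grade):
    """Average grade, weighted by the number of students in each grade."""
    total = sum(counts_by_grade.values())
    if total == 0:
        raise ValueError("at least one participant is required")
    return sum(grade * n for grade, n in counts_by_grade.items()) / total


# Hypothetical sample: 30 fifth-graders and 20 sixth-graders.
example_grade = weighted_mean_grade({5: 30, 6: 20})
```

Under this assumption, a study with more fifth-graders than sixth-graders is coded with a grade closer to 5 than to 6.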
For ‘number of participants’, the numbers of students in the experimental and control
groups are added together. In a meta-analysis, participants can be counted per group or in
total, depending on the research question and the context of the study. Yet, it is
typically more common to use the total number of participants from both groups, especially
when determining effect sizes such as standardized mean differences. These effect sizes
rely on the variability within both groups, and larger sample sizes usually offer more
dependable estimates of the effect size [27].
The set of items (‘weeks’, ‘sessions’, and ‘sessions per week’) evaluates the influence
of treatment duration on gamified learning. Although all studies in this meta-analysis
reported the number of weeks, some studies [17,42] did not provide session information.
There are various methods for handling missing values, such as listwise or
pairwise deletion [43], model-based methods [44], and imputation methods [45,46]. In this
study, regression imputation, which can predict missing values based on other variables
and maintain the relationship between variables as much as possible, was used.
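
As a rough illustration of regression imputation, the sketch below fills missing session counts from a simple linear regression on the number of weeks. The choice of ‘weeks’ as the only predictor, the function names, and the data are all assumptions made for illustration; the source does not state which predictors or software routine were used.

```python
def fit_simple_regression(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def impute_sessions(weeks, sessions):
    """Replace None entries in `sessions` with values predicted from `weeks`."""
    observed = [(w, s) for w, s in zip(weeks, sessions) if s is not None]
    intercept, slope = fit_simple_regression(
        [w for w, _ in observed], [s for _, s in observed]
    )
    return [
        s if s is not None else intercept + slope * w
        for w, s in zip(weeks, sessions)
    ]


# Hypothetical data: two studies did not report their session counts.
weeks = [4, 8, 12, 6, 10]
sessions = [8, 16, 24, None, None]
imputed = impute_sessions(weeks, sessions)
```

Observed values are left untouched; only the missing entries are replaced by fitted values, which is what keeps the weeks-to-sessions relationship intact.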
The final set of items, ‘dependent variables’, assesses the English-proficiency-related
variables under examination for the effects of gamification. There were a total of six dependent
variables among the studies targeted in this meta-analysis: listening, speaking,
reading, writing, vocabulary, and achievement. The results of each study’s coding, classified
by these criteria, are presented in Appendices A and B.

2.3. Instruments and Analysis


This section delineates the tools used for the meta-analysis, the procedure undertaken,
and the methods of interpretation. The meta-analysis was carried out using three
types of software: Excel 16.73, Comprehensive Meta Analysis (CMA) 3.3, and RStudio
2023.06.0+421. We utilized Excel for coding and entering data for each study. Subsequently,
CMA 3 facilitated the computation of individual effect sizes and the homogeneity
test (Q-test) for categorical variables, which was necessary for the subgroup analysis. Meanwhile,
the ‘metafor’ package for R, run in RStudio, was employed for the meta-regression analysis of
continuous moderating variables, the evaluation of publication bias, and the visualization
of our findings.
In terms of effect size computation, we applied Hedges’ g. Notably, although Hedges’
g and Cohen’s d bear similarities in interpretation, the former includes a bias correction
contributing to more conservative effect size estimates. This measure assists in reducing
Type I errors, thus preventing false claims of an effect when none is present [47].
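
To make the bias correction concrete, the following sketch computes Hedges' g and its standard error for two independent groups using the commonly cited small-sample correction factor J = 1 − 3/(4df − 1). It is an illustration under that assumption, not a reproduction of CMA's internal routine, and the input values are hypothetical.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return (g, standard error) for treatment vs. control groups."""
    df = n_t + n_c - 2
    # Pooled within-group standard deviation.
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_t - mean_c) / pooled_sd  # Cohen's d
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    return j * d, math.sqrt(j**2 * var_d)


# Hypothetical post-test scores: treatment 75 (SD 10), control 70 (SD 10), 30 students each.
g, se = hedges_g(75, 10, 30, 70, 10, 30)
```

With these inputs Cohen's d is 0.5, and the corrected g is slightly smaller, which is the conservative shrinkage described above.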
The choice of the analytical model hinged on the Q-test for homogeneity. As a general
protocol, a Fixed Effect Model was adopted where homogeneity was established, while a
Random Effect Model was used in the absence of homogeneity [27]. Subsequent to the Q-
test, we computed the mean effect size. We also scrutinized the presence of publication bias
utilizing multiple approaches, including Egger’s Regression Test, Begg’s Rank Correlation
Test, Trim and Fill Method, and fail-safe N analysis.
We conducted separate analyses of the effects of moderating and dependent variables,
keeping their nature in consideration. For variables on a discrete scale, we executed a
homogeneity test (Q-test) to verify statistically significant differences among the variables.
In the case of moderating variables on a continuous scale, a meta-regression analysis was
performed to assess their impact on dependent variables.
Finally, Cohen’s guidelines [48] informed the interpretation of effect sizes, with 0.2, 0.5,
and 0.8 representing ‘small’, ‘medium’, and ‘large’ effect sizes, respectively. The process of
interpreting effect sizes adhered to these benchmarks, while acknowledging alternative
standards proposed by other scholars [49–52].

3. Results and Discussion


3.1. Homogeneity Test and Assessment of Effect Sizes
Before the computation of effect sizes, we conducted a homogeneity test (Q-test).
The outcome revealed a significant level of heterogeneity amongst the analyzed studies
(Q = 30.848, df = 10, p = 0.001). Notably, about two-thirds of the observed variability in the
results across studies (indicated by the I² statistic of 67.583%) was due to true differences in
effect sizes rather than merely sampling errors. Considering the typical interpretation of
I² values (25%, 50%, and 75%, corresponding to low, moderate, and high heterogeneity,
respectively [53]), our study unveiled a moderate to high level of heterogeneity.
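
The reported I² value follows directly from Q and its degrees of freedom through the standard definition I² = (Q − df)/Q × 100. The short sketch below, with a hypothetical function name, reproduces the figure from the reported Q = 30.848 and df = 10.

```python
def i_squared(q, df):
    """Percentage of total variability attributed to between-study heterogeneity."""
    if q <= 0:
        return 0.0
    # Negative values are truncated to zero by convention.
    return max(0.0, (q - df) / q) * 100


i2 = i_squared(30.848, 10)
```

Rounded to three decimals, the result matches the 67.583% reported above.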
Card provides three potential options for instances where significant heterogeneity is identified: (1) disregarding the heterogeneity and proceeding with analysis as if the data were homogeneous, an approach that is typically considered least justifiable; (2) undertaking moderator analyses, which leverage the coded characteristics of the studies, such as methodological features or sample attributes, to predict variances in effect sizes between studies; or (3) adopting an alternative to the Fixed Effect Model, specifically the Random Effect Model, which conceptualizes the population effect size as a distribution rather than a fixed point (pp. 184–185) [30]. In response to the detected heterogeneity in this study, we applied the Random Effects Model to compute the mean effect size, thus mitigating potential errors attributed to the heterogeneity. We also pursued further investigation into the heterogeneity sources through a subgroup analysis and meta-regression analysis, based on the characteristics of the moderator.
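
For readers unfamiliar with how a random-effects mean is pooled, the sketch below uses the DerSimonian-Laird estimator of the between-study variance. This estimator is an assumption chosen because it is a common default; the source does not state which estimator CMA applied. The effect sizes and standard errors are hypothetical and are not the values of the 11 included cases.

```python
import math


def random_effects_dl(effects, std_errors):
    """Pool effect sizes with a DerSimonian-Laird random-effects model."""
    variances = [se**2 for se in std_errors]
    w = [1 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_star = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return {"q": q, "tau2": tau2, "pooled": pooled, "se": math.sqrt(1 / sum(w_star))}


# Hypothetical studies with equal precision but different effects.
result = random_effects_dl([0.2, 0.5, 0.8], [0.1, 0.1, 0.1])
```

Because the between-study variance is added to every study's sampling variance, the pooled standard error is wider than under a fixed-effect model, which reflects treating the population effect as a distribution rather than a single point.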
Subsequently, we employed Hedges’ g to determine the individual effect size for each study. As shown in Figure 2, the resulting individual effect sizes displayed a range from −0.26 to 1.26. Among these, we observed nine instances of positive effect sizes and two of negative, indicating that most studies reported beneficial outcomes.

Figure 2. Forest plot [11,18,19,41,42,54–56].

The overall mean effect size, computed via the random-effects model, was 0.517, which signifies a medium effect size (N = 610, k = 11, g = 0.517, se = 0.139, CI = 0.245–0.790, Z = 3.718, p < 0.001). This implies that the employment of gamification-based English instruction can lead to a medium-to-large improvement in English proficiency.
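
The reported interval and test statistic can be checked from g and its standard error with a normal-approximation (Wald) calculation. The sketch below assumes a 95% confidence level and a two-sided test; the function name is hypothetical, and small discrepancies from the reported values are expected because g and se are themselves rounded.

```python
import math


def wald_summary(estimate, se, z_crit=1.959964):
    """Return (z, lower, upper, two-sided p) under a normal approximation."""
    z = estimate / se
    lower = estimate - z_crit * se
    upper = estimate + z_crit * se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value for a standard normal
    return z, lower, upper, p


z, lower, upper, p = wald_summary(0.517, 0.139)
```

The recomputed interval agrees with the reported 0.245–0.790 to within rounding, and the p-value falls below 0.001.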
As noted in the previous discussion on related literature, no quantitative meta-analysis has been conducted solely on the impact of gamification on foreign language learning to date. Nonetheless, Bai et al. [2] have documented that gamification in diverse educational contexts correlates with a medium effect size (g = 0.504) on academic achievements. Interestingly, they reported a slightly smaller effect size (g = 0.377) for language-related disciplines. While their study did not isolate foreign languages, particularly English, a larger effect size emerged from the findings of the present study. In reviewing the collective findings of past meta-analyses that investigated the influence of gamification on academic achievements across a range of subjects, inclusive of foreign languages, the reported effect sizes typically fall within the medium range ([34]: g = 0.464; [35]: g = 0.49; [36]: g = 0.557). Remarkably, these reported effect sizes are closely aligned with the findings from the current study. As such, the evidence suggests that the application of gamification could result in a moderate increase in academic achievements across various disciplines, including English as a Foreign Language.
3.2. Publication Bias
Publication bias was assessed using various statistical analyses, including a funnel
plot with Egger’s Regression Test, Begg’s Rank Correlation Test, Trim and Fill Method,
and fail-safe N analysis. Each study’s effect size and standard error were derived, and the
resulting funnel plot (Figure 3) showed considerable symmetry. The results from Egger’s
Regression Test were not statistically significant (z = −0.5410, p = 0.589), suggesting no
evidence of funnel plot asymmetry or publication bias in this meta-analysis [31].

Figure 3. Funnel plot.
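
The idea behind Egger's test can be sketched as an ordinary least-squares regression of the standardized effect (g / se) on precision (1 / se), where an intercept far from zero points to funnel-plot asymmetry. This is the classical textbook form, not necessarily the variant the authors' software used (the text reports a z statistic). The data below are hypothetical and the function name is illustrative.

```python
import math

def egger_regression(effects, std_errors):
    """Classical Egger regression: regress g/se on 1/se by ordinary least squares.

    Returns (intercept, slope, se_intercept). An intercept far from zero,
    relative to its standard error, signals funnel-plot asymmetry.
    """
    x = [1.0 / s for s in std_errors]                 # precision
    y = [g / s for g, s in zip(effects, std_errors)]  # standardized effect
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    residual_var = residual_ss / (n - 2)
    se_intercept = math.sqrt(residual_var * (1.0 / n + mean_x ** 2 / sxx))
    return intercept, slope, se_intercept

# Hypothetical effect sizes and standard errors (NOT the data of this meta-analysis)
effects = [0.30, 0.55, 0.80, 0.45, 0.62, 0.20]
std_errors = [0.15, 0.25, 0.35, 0.20, 0.30, 0.10]
intercept, slope, se_intercept = egger_regression(effects, std_errors)
print(f"intercept = {intercept:.3f} (SE {se_intercept:.3f}), slope = {slope:.3f}")
```

When every study estimates the same effect, the standardized effects lie on a line through the origin, so the intercept is zero and the slope equals the common effect.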

The Begg’s Rank Correlation Test measured the correlation between the ranks of
effect sizes and their variances. A negative Kendall’s tau (−0.2727) suggests smaller studies
tended to show smaller effects, but the relationship is not strong. The p-value of 0.283,
being greater than 0.05, indicates the asymmetry observed in the funnel plot is not
statistically significant. Thus, there is insufficient evidence of publication bias based on this
test [57].
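
A rank correlation of this kind can be illustrated with Kendall's tau computed from concordant and discordant pairs. The sketch below follows the simplified description in the text (raw effect sizes against variances); Begg's procedure is commonly applied to standardized effects, so this is an approximation. The data and the function name are hypothetical.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total pairs; ties count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical effect sizes and sampling variances (NOT the data of this meta-analysis)
effects = [0.2, 0.4, 0.6, 0.8]
variances = [0.04, 0.02, 0.03, 0.01]
tau = kendall_tau_a(effects, variances)  # 1 concordant and 5 discordant pairs out of 6
print(f"tau = {tau:.4f}")
```

With k = 11 cases there are 11 × 10 / 2 = 55 pairs, so a tau of −0.2727 equals −15/55. Assuming tau-a and no ties, that would correspond to 15 more discordant than concordant pairs.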
The Trim and Fill method, a non-parametric technique, estimates potentially missing
studies due to publication bias by assessing the funnel plot’s symmetry [58]. The method
identified no missing studies on the right side of the funnel plot (SE = 2.1765), suggesting
no substantial publication bias.
Finally, the fail-safe N was calculated using the Rosenthal approach. The fail-safe N
method estimates the number of unpublished studies with null findings that would need
to exist to make the meta-analysis finding statistically non-significant. The fail-safe N of
this meta-analysis is 197, exceeding Rosenthal’s threshold of 5k + 10 (65 for this study) [59].
Therefore, given the observed significance level of less than 0.0001, the meta-analysis
results appear robust and not overly influenced by publication bias.
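
The threshold comparison above is simple arithmetic, and Rosenthal's fail-safe N itself can be sketched from per-study Z scores as (ΣZ)² / z_α² − k with a one-tailed z_α of about 1.645. The per-study Z scores are not given in the text, so the list below is hypothetical and the function names are illustrative.

```python
from statistics import NormalDist

def rosenthal_fail_safe_n(z_scores, alpha=0.05):
    """Rosenthal's fail-safe N: (sum of Z)^2 / z_alpha^2 - k, with one-tailed z_alpha."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = 0.05
    k = len(z_scores)
    return sum(z_scores) ** 2 / z_alpha ** 2 - k

def rosenthal_threshold(k):
    """Tolerance level 5k + 10 used as the benchmark in the text."""
    return 5 * k + 10

threshold = rosenthal_threshold(11)  # k = 11 cases in this meta-analysis
reported_n = 197                     # fail-safe N reported in the text
is_robust = reported_n > threshold

# Hypothetical per-study Z scores, for illustration only
example_n = rosenthal_fail_safe_n([2.1, 1.3, 3.0, 0.8, 2.5])
print(threshold, is_robust, round(example_n, 1))
```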

3.3. Effect Sizes by Moderating Variables


The noticeable heterogeneity across the study results instigated a more detailed explo-
ration of potential sources [30]. The results from each study were stratified according to
various moderators. Subsequently, an analysis was carried out to identify whether these
stratified groups manifested statistically significant disparities in their respective effect
sizes.

3.3.1. Subgroup Analysis


For moderators measured on a nominal scale, a Q-test was utilized to gauge the
differences in effect sizes among the subvariables. This classification incorporated five
variables. With respect to gaming elements, there were nine subvariables. Given that
numerous studies incorporated a mix of multiple gaming elements, the examination was
organized depending on the inclusion or exclusion of each individual element in the
experiment. The mean effect size for the variables, except gaming elements, is presented in
Table 3.

Table 3. Effect size data by moderators.

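| Moderator | Subgroup | N | k | g | SE | 95% CI | Z | p | Q | df | p (Q) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Publication Type | Journal Article | 324 | 7 | 0.298 | 0.180 | −0.055~0.650 | 1.657 | 0.098 | 6.426 | 1 | 0.011 * |
| | Thesis | 286 | 4 | 0.799 | 0.083 | 0.638~0.961 | 9.682 | 0.000 | | | |
| Experimental Design | Quasi-experimental | 462 | 10 | 0.475 | 0.161 | 0.160~0.790 | 2.953 | 0.003 | 2.953 | 1 | 0.103 |
| | Pre-experimental | 148 | 1 | 0.778 | 0.093 | 0.595~0.961 | 8.325 | 0.000 | | | |
| School Level | Primary | 270 | 4 | 0.801 | 0.094 | 0.616~0.986 | 8.484 | 0.000 | 5.589 | 2 | 0.061 |
| | Secondary | 46 | 1 | 0.624 | 0.297 | 0.042~1.207 | 2.102 | 0.036 | | | |
| | Tertiary | 294 | 6 | 0.270 | 0.205 | −0.132~0.672 | 1.316 | 0.180 | | | |
| Technology Use | Yes | 410 | 9 | 0.383 | 0.149 | 0.091~0.674 | 2.572 | 0.010 | 4.171 | 1 | 0.041 * |
| | No | 200 | 2 | 0.932 | 0.224 | 0.494~1.370 | 4.171 | 0.000 | | | |

Columns N through p report the effect size and 95% confidence interval; Q, df, and p (Q) report the heterogeneity test. * p < 0.05.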
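
The between-subgroup Q statistic can be approximated from the subgroup summaries in Table 3 alone. The sketch below weights each subgroup mean by the inverse of its squared standard error and refers the weighted sum of squared deviations to a chi-square distribution with (number of subgroups − 1) degrees of freedom. It assumes this standard formulation; the authors' software and exact settings are not stated, and the helper names are illustrative.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df (closed forms)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return total

def q_between(subgroups):
    """subgroups: list of (g, se) pairs, one per subgroup. Returns (Q, df, p)."""
    weights = [1.0 / se ** 2 for _, se in subgroups]
    pooled = sum(w * g for w, (g, _) in zip(weights, subgroups)) / sum(weights)
    q = sum(w * (g - pooled) ** 2 for w, (g, _) in zip(weights, subgroups))
    df = len(subgroups) - 1
    return q, df, chi2_sf(q, df)

# Subgroup means and standard errors taken from Table 3 (rounded to three decimals)
publication = q_between([(0.298, 0.180), (0.799, 0.083)])
technology = q_between([(0.383, 0.149), (0.932, 0.224)])
school = q_between([(0.801, 0.094), (0.624, 0.297), (0.270, 0.205)])
for name, (q, df, p) in [("publication type", publication),
                         ("technology use", technology),
                         ("school level", school)]:
    print(f"{name}: Q = {q:.3f}, df = {df}, p = {p:.3f}")
```

Because the inputs are rounded, the recomputed Q values come out close to, but not identical with, the tabulated ones (for example, about 6.39 against the reported 6.426 for publication type), while the p-values match the reported ones at three decimals or nearly so.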

1. Publication type
Upon assessing the impact of gamification by the type of publication, it was observed
that master’s theses (g = 0.799) exhibited a higher effect size than journal articles
(g = 0.298). The difference was statistically significant according to a homogeneity
test (Q = 6.426, df = 1, p = 0.011). Interestingly, these findings are in stark contrast to those
presented by Huang et al. [34], where journal articles (g = 0.662) and conference proceedings
(g = 0.666) recorded a medium-to-large effect size, while dissertations/theses exhibited a
negative effect size (g = −0.170).
Numerous studies concerning publication bias warn of an increased propensity for
published studies to exhibit positive effects in comparison to their unpublished counter-
parts, often attributing such tendencies to journals’ bias towards positive results [60,61],
researchers’ bias towards positive results [62], and the phenomenon of ‘salami slicing’ [63].
However, in an interesting deviation from these widely held perspectives, the findings of
this study contradict the commonly perceived pattern.
A potential explanation for these findings might be rooted in the research settings.
Specifically, journal articles, which tended to report relatively lower effects, were predomi-
nantly conducted in university environments. On the other hand, the theses that reported
larger effect sizes were chiefly carried out in elementary school settings. This disparity in
research settings across different types of sources may account for the observed differences.
2. Experimental design
The current study, following Brown’s classification [37], divided the experimental
design into two types: quasi-experimental and pre-experimental. In the quasi-experimental
design, while no random sampling was conducted, both control and experimental groups
were set up. In contrast, the pre-experimental design only included an experimental group,
without implementing any random sampling. The results displayed a medium-to-large
effect size (g = 0.778) in the pre-experimental design and a small-to-medium effect size
(g = 0.475) in the quasi-experimental design. However, this difference was not statistically
significant (Q = 2.953, df = 1, p = 0.103).
Sailer and Homner’s study [35] utilized the presence of randomization, differentiating
between the true-experimental design and quasi-experimental design, as a moderating
variable in their meta-analysis on the impact of gamification on academic achievement.
Their findings revealed a larger effect size in the quasi-experimental design (g = 0.56) than
in the true-experimental design (g = 0.29). Upon synthesizing the findings of both studies,
a trend became evident: as experimental designs become stricter, the effect sizes reported
tend to decrease.
3. School level
A comparison of the effects of gamification was conducted in the educational envi-
ronments of primary school, secondary school, and university. The results revealed that
gamification had the largest effect size in primary schools, with a value of 0.801. The effect
size was slightly lower in secondary schools (g = 0.624) and further reduced in universities
(g = 0.270). Although there was a decreasing trend in effect sizes as the school level
increased, these differences were not statistically significant (Q = 5.589, df = 2, p = 0.061).
This finding aligns with previous meta-analytic studies that examined the influence of
gamification on academic achievement [2,34,36], which also found no statistically significant
differences among school levels.
The observed pattern of decreasing effect sizes from primary schools to universities
could be attributed to various factors. To begin, primary school students generally demon-
strate a keen interest in play and games, which makes them potentially more receptive
to gamification in education. The familiarity and engagement with digital games outside
school settings might facilitate the transference of this enthusiasm into the classroom when
gamified methods are adopted [64].
In contrast, as we ascend the educational ladder to secondary schools and universities,
students’ learning preferences and motivations may undergo significant shifts. Older
students, for instance, could perceive gamification elements, such as rewards, points,
and badges, as somewhat infantile or even as distractions from their main educational
objectives. Intrinsically motivated, these students often place greater value on autonomous
and self-directed learning experiences, exhibiting a tendency to favor traditional teaching
methodologies over gamified ones [65,66].
While these considerations offer insight into the decreasing trend of effect sizes, they
also highlight the importance of adapting educational strategies to cater to the evolving
needs and preferences of students at different stages of their academic journey. Therefore,
the effective implementation of gamification in education, across all levels, may require a
nuanced approach tailored to the age and preferences of the learners.
4. Technology use
In this moderator, we examined the impact of gamification on learning outcomes,
particularly differentiating between instances where technology such as computers, tablets,
or applications were utilized and those where no technology was used. The outcomes
yielded a noticeable distinction between the two groups. In scenarios where technology
was absent, we observed an effect size of g = 0.932, surpassing the range defined as a large
effect size. However, when technology was incorporated, the effect size (g = 0.383) fell
within the range of a small-to-medium effect size. The difference in these results was found
to be statistically significant (Q = 4.171, df = 1, p = 0.041).
The results in this moderator analysis, demonstrating higher effect sizes in non-
technology gamified interventions, appear to be in opposition to the findings of Yıldırım’s
research, which did not identify a significant difference between technology-based and
non-technology courses in a gamified class [36]. Similarly, in Bai et al.’s examination of
gamification’s impact, treating flipped learning as a moderating variable, no significant
statistical difference was found, although a slightly higher effect size was reported in
flipped classes (g = 0.671) compared to non-flipped ones (g = 0.446) [2].
This discrepancy in outcomes raises important considerations about the role of tech-
nology in gamified educational environments. Despite the widespread belief in the efficacy
of technology in enhancing learning outcomes, our study suggests that its deployment
does not inherently guarantee improved results. This might be due to the increased focus
and reduced distractions that a non-digital environment might offer. Furthermore, the
effectiveness of technology use in second language (L2) learning is highly dependent on a
myriad of contextual factors. These may include individual differences among learners,
such as their learning preferences, learning styles, and varying levels of technology acces-
sibility and familiarity. All these factors could significantly influence the effectiveness of
incorporating technology into the language learning process [67]. This would necessitate a
careful evaluation of whether the use of technology would enhance or potentially detract
from the intended learning outcomes.
5. Gaming elements
Gamification can be implemented using various gaming elements. In this study, the
gaming elements utilized in each study were extracted during the coding process, and
were organized into nine subelements based on the classification of previous research. As
multiple elements were discovered through this process, and because several elements were
used simultaneously in each study, the effectiveness of gaming elements was compared
based on the use of each element. The results are presented in Table 4.

Table 4. Effect size data by gaming elements.

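| Moderator | Used | N | k | g | SE | 95% CI | Z | p | Q | df | p (Q) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Avatar/Character | Yes | 276 | 4 | 0.679 | 0.197 | 0.292~1.065 | 3.441 | 0.001 | 0.931 | 1 | 0.335 |
| | No | 334 | 7 | 0.415 | 0.189 | 0.044~0.786 | 2.193 | 0.028 | | | |
| Point/Score | Yes | 370 | 8 | 0.340 | 0.161 | 0.025~0.655 | 2.117 | 0.034 | 6.235 | 1 | 0.013 * |
| | No | 240 | 3 | 0.840 | 0.119 | 0.606~1.073 | 7.055 | 0.000 | | | |
| Leaderboard/Scoreboard | Yes | 536 | 8 | 0.565 | 0.166 | 0.241~0.890 | 3.412 | 0.001 | 0.660 | 1 | 0.417 |
| | No | 74 | 3 | 0.337 | 0.227 | −0.109~0.782 | 1.482 | 0.138 | | | |
| Feedback | Yes | 116 | 3 | 0.426 | 0.185 | 0.064~0.788 | 2.306 | 0.021 | 0.218 | 1 | 0.640 |
| | No | 494 | 8 | 0.545 | 0.176 | 0.201~0.889 | 3.014 | 0.002 | | | |
| Level | Yes | 368 | 4 | 0.502 | 0.252 | 0.008~0.996 | 1.992 | 0.046 | 0.010 | 1 | 0.921 |
| | No | 242 | 7 | 0.532 | 0.169 | 0.204~0.860 | 3.177 | 0.001 | | | |
| Collaboration | Yes | 270 | 4 | 0.591 | 0.147 | 0.303~0.879 | 4.027 | 0.000 | 0.091 | 1 | 0.763 |
| | No | 340 | 7 | 0.510 | 0.227 | 0.065~0.954 | 2.247 | 0.025 | | | |
| Mission/Challenge | Yes | 294 | 6 | 0.496 | 0.256 | −0.006~0.998 | 1.936 | 0.053 | 0.210 | 1 | 0.647 |
| | No | 316 | 5 | 0.625 | 0.119 | 0.392~0.858 | 5.248 | 0.000 | | | |
| Story/Fiction | Yes | 244 | 4 | 0.713 | 0.211 | 0.300~1.125 | 3.385 | 0.001 | 1.196 | 1 | 0.274 |
| | No | 366 | 7 | 0.413 | 0.175 | 0.070~0.757 | 2.359 | 0.018 | | | |
| Badge/Reward | Yes | 246 | 3 | 0.832 | 0.134 | 0.569~1.094 | 6.207 | 0.000 | 5.088 | 1 | 0.024 * |
| | No | 364 | 8 | 0.351 | 0.165 | 0.027~0.676 | 2.125 | 0.034 | | | |

Columns N through p report the effect size and 95% confidence interval; Q, df, and p (Q) report the heterogeneity test. * p < 0.05.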

Among the nine subvariables, statistically significant differences were found for
point/score and badge/reward. First, in the case of point/score, the use of this ele-
ment corresponded to a small-to-medium effect size of 0.340, while the absence of its
use corresponded to a large effect size of 0.840 (Q = 6.235, df = 1, p = 0.013). Secondly,
for ‘badge/reward’, there was a difference of about 0.48 between cases where it was applied
(g = 0.832) and not applied (g = 0.351), which was statistically significant (Q = 5.088,
df = 1, p = 0.024). On the other hand, no statistically significant differences were found
for the other variables, such as avatar/character, leaderboard/scoreboard, feedback, level,
collaboration, mission/challenge, and story/fiction.
Hamari, Koivisto, and Sarsa [68] highlighted the capacity
of various gaming elements, notably points, badges, and leaderboards, to boost student
motivation and subsequently enhance academic performance. The prominent effects of
these distinct gaming elements suggest that they potentially stimulate students’ intrinsic
motivation by furnishing palpable markers of progress and accomplishments. Contrast-
ingly, Huang and colleagues [34] did not identify any significant differences among 14
gaming elements, including leaderboards, badges/awards, points/experiences, and ad-
vancements/levels. Similarly, Bai et al. [2] found no statistical significance when comparing
effects across types of gaming elements.
In view of these divergent findings, while points/scores and badges/rewards seem to
be influential components within gamified learning environments, educators are urged to
consider the comprehensive design of the learning experience. This includes contemplating
the synergistic effects among various gaming elements and aligning them with the specific
needs and preferences of learners. It is imperative to underscore that the effectiveness of
these gaming elements may fluctuate depending on the context and the characteristics of
the learners [14].

3.3.2. Meta-Regression Analysis


Meta-regression analysis was used to scrutinize the impact of interval-scale moderat-
ing variables, such as grade, duration of instruction (in weeks), number of class sessions,
number of gaming elements, etc., on the dependent variable. The outcomes of this investi-
gation are detailed in Table 5.

Table 5. Results of the meta-regression analysis.

Variable b SE 95% CI Z p
Intercept −13.8558 44.5029 −101.0800~73.3683 −0.3113 0.756
Grade 0.0115 0.3317 −0.6387~0.6616 0.0345 0.973
No. of participants 0.0034 0.0313 −0.0579~0.0648 0.1098 0.913
Weeks 1.1266 3.2314 −5.2069~7.4601 0.3486 0.747
Sessions −0.7337 2.1939 −5.0336~3.5662 −0.3344 0.738
Sessions per week 9.1047 28.0205 −45.8144~64.0238 0.3249 0.745
No. of gaming elements 0.1789 0.2491 −0.3093~0.6672 0.7183 0.473

In a meta-regression analysis, the regression coefficient, or ‘estimate’, provides insight
into the potential increase (when the estimate is positive) or decrease (when the estimate is
negative) in the dependent variable for each unit increase in the independent variable [27].
In the provided results, all the p-values exceed the standard significance threshold of 0.05,
suggesting that none of the examined variables—grade, number of participants, weeks,
sessions, sessions per week, and number of gaming elements—significantly influence the
outcome in this meta-regression model.
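
To make the reading of Table 5 concrete, the sketch below fits a weighted least-squares regression of effect size on a single moderator, using inverse-variance weights, and forms Z as the slope divided by its standard error. It is a simplified illustration: it uses one moderator rather than six, it ignores between-study variance, and the data and function name are hypothetical rather than taken from the study.

```python
import math

def simple_meta_regression(effects, std_errors, moderator):
    """Weighted least squares of effect size on one moderator (inverse-variance weights).

    Returns (intercept, slope, se_slope, z_slope). Between-study variance is
    ignored here, which is a simplification of this sketch.
    """
    weights = [1.0 / s ** 2 for s in std_errors]
    total_w = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, moderator)) / total_w
    mean_y = sum(w * y for w, y in zip(weights, effects)) / total_w
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, moderator))
    sxy = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(weights, moderator, effects))
    slope = sxy / sxx
    se_slope = math.sqrt(1.0 / sxx)
    intercept = mean_y - slope * mean_x
    return intercept, slope, se_slope, slope / se_slope

# Hypothetical data: effect sizes, standard errors, and intervention length in weeks
effects = [0.80, 0.62, 0.27, 0.45, 0.90]
std_errors = [0.10, 0.30, 0.20, 0.15, 0.25]
weeks = [4, 6, 15, 12, 10]
intercept, slope, se_slope, z_slope = simple_meta_regression(effects, std_errors, weeks)
print(f"b = {slope:.4f}, SE = {se_slope:.4f}, Z = {z_slope:.4f}")
```

The same ratio links the columns of Table 5: for Weeks, for example, 1.1266 / 3.2314 ≈ 0.3486, which is the tabulated Z.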
In contrast to this study, Bai et al. [2] found statistically significant differences by
sample size and intervention duration. However, it is noteworthy that Bai et al.’s findings
were based on a subgroup analysis rather than a meta-regression. In their study, the relationship
between sample size and effect sizes was not linear, showing effect sizes of g = 0.984 for
50–100 participants, g = 0.501 for fewer than 50 participants, and g = 0.106 for more than
150 participants. A similar non-linear pattern was observed for intervention duration, with
effect sizes of g = 0.906 for 1–3 months, g = 0.533 for less than 1 week, g = 0.488 for 1 week
to 1 month, and g = −0.278 for 1 semester or longer. Similarly, Sailer and Homner [35]
did not find any statistically significant difference when conducting a subgroup analysis
according to the duration of intervention. The fact that the largest effect sizes were not
found in the largest sample size or the longest intervention duration groups implies that
‘more’ is not necessarily ‘better’ in the context of gamified learning. These non-linear
relationships concerning the effect size, sample size, intervention duration, and other
variables may point to the existence of ‘sweet spots’ that might be optimal for gamified
learning implementations.

3.4. Mean Effect Sizes by Dependent Variables


This meta-analysis divided the dependent variables of the included studies into six domains
(DV3): listening, speaking, reading, writing, vocabulary, and achievement. Further stratification
was performed within these categories. In DV1, the variables were split between receptive
and productive skills, whereas in DV2, they were categorized into spoken and written
language. A noteworthy detail is that Kim’s study [18], initially classified under ‘achievement’,
primarily focused on reading. Thus, it was classified under ‘receptive skills’ in DV1
and ‘written language’ in DV2. The outcomes, derived through these classifications and
subclassifications, are presented in Table 6.

Table 6. Effect size data by dependent variables.

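| Moderator | Subgroup | N | k | g | SE | 95% CI | Z | p | Q | df | p (Q) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DV1 | Receptive | 449 | 7 | 0.522 | 0.192 | 0.145~0.899 | 2.714 | 0.007 | 0.020 | 1 | 0.888 |
| | Productive | 161 | 4 | 0.559 | 0.183 | 0.201~0.918 | 3.061 | 0.002 | | | |
| DV2 | Spoken | 178 | 2 | 0.760 | 0.090 | 0.583~0.937 | 8.400 | 0.000 | 2.087 | 1 | 0.149 |
| | Written | 432 | 9 | 0.473 | 0.177 | 0.127~0.819 | 2.679 | 0.007 | | | |
| DV3 | Listening | 148 | 1 | 0.778 | 0.093 | 0.595~0.961 | 8.325 | 0.000 | 23.885 | 5 | 0.000 * |
| | Speaking | 30 | 1 | 0.490 | 0.361 | −0.217~1.197 | 1.358 | 0.175 | | | |
| | Reading | 87 | 1 | −0.256 | 0.213 | −0.674~0.162 | −1.199 | 0.231 | | | |
| | Writing | 131 | 3 | 0.527 | 0.261 | 0.015~1.039 | 2.018 | 0.044 | | | |
| | Vocabulary | 138 | 3 | 0.875 | 0.198 | 0.486~1.263 | 4.413 | 0.000 | | | |
| | Achievement | 76 | 2 | 0.274 | 0.226 | −0.169~0.717 | 1.210 | 0.226 | | | |

Columns N through p report the effect size and 95% confidence interval; Q, df, and p (Q) report the heterogeneity test. * p < 0.05.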

Comparisons between receptive skills (g = 0.522) and productive skills (g = 0.559) in
DV1 and between spoken language (g = 0.760) and written language (g = 0.473) in DV2
revealed no statistically significant differences. After delving deeper with a subgroup
analysis based on the subcomponents of English proficiency, distinct trends emerged.
Notably, vocabulary displayed a large effect size (g = 0.875), while listening
(g = 0.778) and writing (g = 0.527) skills demonstrated medium-to-large effect sizes.
Speaking (g = 0.490) and achievement (g = 0.274) indicated small-to-medium effect sizes.
Particularly intriguing was the observation that the single case related to
reading resulted in a negative effect size (g = −0.256).
The large effect size for vocabulary aligns with the observation that vocabulary
acquisition plays a crucial role in language learning [69]. Similarly, the medium-to-large
effect sizes for listening and writing emphasize the potential of gamified
interventions in improving these specific skill sets. The negative effect size for reading,
which rests on a single case, might indicate that certain gamified approaches are less
effective for reading instruction, or that other variables might have affected the outcomes
of that specific study. However, the effectiveness of different methods in reading
instruction is known to vary considerably [70], suggesting that more research is needed to
fully understand this result. To sum up, more nuanced research could investigate further
how different elements of gamified learning might be more or less effective for different
language skills [71].

4. Conclusions and Implications


The meta-analysis provided in this study highlighted the significant impact of gam-
ification on English language learning. The results show that the use of gamification in
education positively affects English language learning (g = 0.517). At the same time,
significant heterogeneity was found among the studies included in this meta-analysis. To
identify the causes of this heterogeneity, a subgroup analysis and a meta-regression analysis
were conducted based on the moderator variables.
The higher effect size demonstrated by MA theses (g = 0.799) compared to journal arti-
cles (g = 0.298) suggests a potential publication bias, where more rigorous or conservative
research venues might publish studies with less pronounced effects. This calls for more
transparency and diversity in research reporting across different publication venues [72].
The contrasting findings on technology use (g = 0.932 without technology; g = 0.383
with technology) further extend the discourse on the role of technology in gamified learning
environments. They suggest that the mere application of technology does not automati-
cally enhance learning outcomes, echoing the need for a more nuanced understanding of
technology integration in gamified education, considering factors such as the pedagogical
alignment, learners’ technology literacy, and the appropriateness of the technology for the
learning objectives [73].
From a practical standpoint, educators and curriculum designers should take note of
the significant differences in effects based on the types of gamification elements used. Specif-
ically, the results suggest that incorporating badges/rewards rather than points/scores
might lead to more successful outcomes in English language learning settings [65].
The absence of significant influence of factors such as grade, number of participants,
weeks, sessions, and number of gaming elements on the outcomes aligns with previous
research indicating that the successful implementation of gamification may be more about
the quality of the design rather than the quantity of gamified elements [16].
Furthermore, the application of gamification across different facets of language learn-
ing without significant differences in effects validates its versatility as an instructional tool.
However, educators should note that specific subcomponents of English proficiency (e.g.,
vocabulary, listening, and writing) may respond differently to gamification.
However, these findings should be interpreted in light of certain limitations. The scope
of studies included in this meta-analysis might limit the generalizability of the findings to
other contexts or populations. Additionally, there might be other unmeasured variables
or potential confounders that were not accounted for in this study, which could have
influenced the observed effects. Moreover, it is worth noting that the single case study
related to reading resulted in a negative effect size (g = −0.256), which warrants further
investigation.
As the effectiveness of gamified learning continues to be explored, it is hoped that
future research will provide further insight into how to maximize the potential of gamifi-
cation in language learning and other educational contexts. Furthermore, more nuanced
research investigating the differential impact of gamification on specific language skills is
recommended.
These conclusions have important implications for educators and researchers in the
field of English language learning. Given the emerging evidence of the benefits of gamifica-
tion, educators are encouraged to incorporate such strategies into their teaching methods.
Additionally, including non-English studies in future systematic reviews
and meta-analyses would offer a more comprehensive perspective on the global impact
and potential of gamification in language learning.

Author Contributions: Conceptualization, J.-Y.L. and M.B.; methodology, J.-Y.L. and M.B.; software,
J.-Y.L.; validation, J.-Y.L. and M.B.; formal analysis, J.-Y.L. and M.B.; investigation, J.-Y.L. and M.B.;
resources, J.-Y.L. and M.B.; data curation, J.-Y.L. and M.B.; writing—original draft preparation, J.-Y.L.
and M.B.; writing—review and editing, J.-Y.L. and M.B.; visualization, J.-Y.L.; supervision, J.-Y.L.;
project administration, J.-Y.L. and M.B. All authors have read and agreed to the published version of
the manuscript.
Funding: This research received no external funding.
Data Availability Statement: The data presented in this study are available on request from the
corresponding author.
Conflicts of Interest: The authors declare no conflict of interest.

Appendix A. Coding Results I


Study PT ED SL TU G P W S S/W DV
Kim (2014) [18] Journal Quasi University Yes 14 38 15 30 2 Achievement
Lee (2019) [42] Journal Quasi Primary Yes 6 30 12 - - Speaking
Lee (2022) [11] Journal Quasi University Yes 12 87 15 30 2 Reading, Writing
Laffey (2022) [19] Journal Quasi University Yes 14.59 22 15 - - Writing
Kim (2023) [54] Thesis Quasi Primary Yes 5 40 4 6 1.5 Vocabulary
Baek (2021) [55] Thesis Quasi Primary No 4 52 12 24 2 Vocabulary
Ahn (2019) [41] Thesis Pre Primary No 3.5 148 10 10 1 Listening
Jeon (2021) [56] Thesis Quasi Secondary Yes 12 46 6 8 1.33 Vocabulary
PT: Publication Type, ED: Experimental Design, SL: School Level, TU: Technology Use, G: Grade,
P: Participants, W: Weeks, S: Sessions, S/W: Sessions per Week, DV: Dependent Variable.

Appendix B. Coding Results II


Study A/C P/S L/S F L C M/C S/F B/R NGE
Kim (2014) [18] O O O O - O - - - 5
Lee (2019) [42] - O - - - - - - - 1
Lee (2022) [11] - O O - - - O - - 3
Laffey (2022) [19] - O - - O - O O - 4
Kim (2023) [54] - - O O - - - - - 2
Baek (2021) [55] O - O - - - O O O 5
Ahn (2019) [41] - O O - O O - O O 6
Jeon (2021) [56] - O O - O O O - O 6
A/C: Avatar/Character, P/S: Point/Score, L/S: Leaderboard/Scoreboard, F: Feedback, L: Level, C: Collaboration,
M/C: Mission/Challenge, S/F: Story/Fiction, B/R: Badge/Reward, NGE: Number of Gaming Elements.

References
1. Zimmerman, E. Position statement: Manifesto for a ludic century. In The Gameful World: Approaches, Issues, Applications; Walz, S.P.,
Deterding, S., Eds.; The MIT Press: Cambridge, MA, USA, 2015; pp. 19–22.
2. Bai, S.; Hew, K.F.; Huang, B. Does gamification improve student learning outcome? Evidence from a meta-analysis and synthesis
of qualitative data in educational contexts. Educ. Res. Rev. 2020, 30, 100322. [CrossRef]
3. Gee, J.P. What Video Games Have to Teach Us about Learning and Literacy; Palgrave Macmillan: New York, NY, USA, 2003.
4. Burke, B. Gamify: How Gamification Motivates People to Do Extraordinary Things; Bibliomotion: New York, NY, USA, 2014.
5. Le-Thi, D.; Dörnyei, Z.; Pellicer-Sánchez, A. Increasing the effectiveness of teaching L2 formulaic sequences through motivational
strategies and mental imagery: A classroom experiment. Lang. Teach. 2020, 26, 136216882091312. [CrossRef]
6. Philp, J.; Duchesne, S. Exploring engagement in tasks in the language classroom. Annu. Rev. Appl. Linguist. 2016, 36, 50–72.
[CrossRef]
7. Berns, A.; Isla-Montes, J.; Palomo-Duarte, M.; Dodero, J. Motivation, students’ needs and learning outcomes: A hybrid game-based
app for enhanced language learning. SpringerPlus 2016, 5, 1305. [CrossRef] [PubMed]
8. Hung, H. Clickers in the flipped classroom: Bring your own device (BYOD) to promote student learning. Interact. Learn. Environ.
2017, 25, 983–995. [CrossRef]
9. Purgina, M.; Mozgovoy, M.; Mozgovoy, M. WordBricks: Mobile technology and visual grammar formalism for gamification of
natural language grammar acquisition. J. Educ. 2019, 58, 073563311983301. [CrossRef]
10. Cardoso, W.; Rueb, A.; Grimshaw, J. Can an interactive digital game help French learners improve their pronunciation? In CALL
in a Climate of Change: Adapting to Turbulent Global Conditions—Short Papers from EUROCALL 2017; Borthwick, K., Bradley, L.,
Thouësny, S., Eds.; Research-Publishing.Net: Dublin, Ireland, 2017; pp. 67–72.
11. Lee, J. Effects of using gamification-based quiz on recalling formulaic sequences. Int. Promot. Agency Cult. Technol. 2022, 8,
589–596. [CrossRef]
12. Munday, P. The case for using Duolingo as part of the language classroom experience. RIED 2016, 19, 83–101. [CrossRef]
13. Koivisto, J.; Hamari, J. The rise of motivational information systems: A review of gamification research. Int. J. Inf. Manag. 2019,
45, 191–210. [CrossRef]
14. Sailer, M.; Hense, J.U.; Mayr, S.K.; Mandl, H. How gamification motivates: An experimental study of the effects of specific game
design elements on psychological need satisfaction. Comput. Hum. Behav. 2017, 69, 371–380. [CrossRef]
15. Dignan, A. Game Frame: Using Games as a Strategy for Success; Free Press: New York, NY, USA, 2011.
16. Hanus, M.D.; Fox, J. Assessing the effects of gamification in the classroom: A longitudinal study on intrinsic motivation, social
comparison, satisfaction, effort, and academic performance. Comput. Educ. 2014, 80, 152–161. [CrossRef]
17. Toda, A.M.; Klock, A.C.T.; Oliveira, W.; Palomino, P.T.; Rodrigues, L.; Shi, L.; Bittencourt, I.; Gasparini, I.; Isotani, S.; Cristea, A.I.
Analysing gamification elements in educational environments using an existing Gamification taxonomy. Smart Learn. Environ.
2019, 6, 16. [CrossRef]
18. Kim, S. Effects of a gamified learning environment on learning experiences: A case study of a general English course using
relative evaluation policy. MALL 2014, 17, 68–94.
19. Laffey, D. Gamification and EFL writing: Effects on student motivation. ETAK 2022, 28, 23–42. [CrossRef]
20. Toda, A.M.; Valle, P.H.D.; Isotani, S. The dark side of gamification: An overview of negative effects of gamification in education.
In Higher Education for All: From Challenges to Novel Technology-Enhanced Solutions; Cristea, A., Bittencourt, I., Lima, F., Eds.;
Springer: Cham, Switzerland, 2018; pp. 143–156.
21. Lee, J.; Hammer, J. Gamification in education: What, how, why bother? Acad. Exch. Q. 2011, 15, 146.
22. Kwon, B.; Lyou, C. The meta-analysis of domestic gamification research: Status and suggest. Humancon 2015, 39, 97–124.
[CrossRef]
23. Ministry of Education. 2022 Revised English Language Curriculum; Ministry of Education: Sejong, Republic of Korea, 2022.
24. Kim, S.; Lee, Y. Development of TPACK-P education program for improving technological pedagogical content knowledge of
pre-service teacher. J. Korea Soc. Comput. Inf. 2017, 22, 141–152.
25. Yi, S.; Lee, Y. The effects of software education teaching efficacy (SE-TE) of in-service teachers on backward design based TPACK-P
teachers’ training program. KACE 2019, 22, 113–121.
26. Kim, Y.M. Pre-service English teachers’ mobile information and communication technology-technological pedagogy and content
knowledge. FLE 2018, 25, 1–25. [CrossRef]
27. Borenstein, M.; Hedges, L.V.; Higgins, J.P.T.; Rothstein, H.R. Introduction to Meta-Analysis; John Wiley & Sons: West Sussex,
UK, 2009.
28. Morrison, A.; Polisena, J.; Husereau, D.; Moulton, K.; Clark, M.; Fiander, M.; Mierzwinski-Urban, M.; Clifford, T.; Hutton, B.;
Rabb, D. The effect of English-language restriction on systematic review-based meta-analyses: A systematic review of empirical
studies. Int. J. Technol. Assess. Health Care 2012, 28, 138–144. [CrossRef]
29. Pieper, D.; Puljak, L. Language restrictions in systematic reviews should not be imposed in the search strategy but in the eligibility
criteria if necessary. J. Clin. Epidemiol. 2021, 132, 146–147. [CrossRef] [PubMed]
30. Card, N.A. Applied Meta-Analysis for Social Science Research; Guilford Press: New York, NY, USA, 2012.
31. Egger, M.; Smith, G.D.; Schneider, M.; Minder, C. Bias in meta-analysis detected by a simple, graphical test. BMJ 1997, 315,
629–634. [CrossRef] [PubMed]
32. Sutton, A.J.; Duval, S.J.; Tweedie, R.L.; Abrams, K.R.; Jones, D.R. Empirical assessment of effect of publication bias on
meta-analyses. BMJ 2000, 320, 1574–1577. [CrossRef] [PubMed]
33. Moher, D.; Liberati, A.; Tetzlaff, J.; Altman, D.G. Preferred reporting items for systematic reviews and meta-analyses: The PRISMA
statement. BMJ 2009, 339, b2535. [CrossRef]
34. Huang, R.; Ritzhaupt, A.D.; Sommer, M.; Zhu, J.; Stephen, A.; Valle, N.; Hampton, J.; Li, J. The impact of gamification in
educational settings on student learning outcomes: A meta-analysis. Educ. Technol. Res. Dev. 2020, 68, 1875–1901. [CrossRef]
35. Sailer, M.; Homner, L. The gamification of learning: A meta-analysis. Educ. Psychol. Rev. 2020, 32, 77–112. [CrossRef]
36. Yıldırım, İ.; Şen, S. The effects of gamification on students’ academic achievement: A meta-analysis study. Interact. Learn. Environ.
2021, 29, 1301–1318. [CrossRef]
37. Brown, J.D. Understanding Research in Second Language Learning: A Teacher’s Guide to Statistics and Research Design; Cambridge
University Press: Cambridge, UK, 1988.
38. Dichev, C.; Dicheva, D. Gamifying education: What is known, what is believed and what remains uncertain: A critical review. Int.
J. Educ. Technol. High. Educ. 2017, 14, 9. [CrossRef]
39. Kim, J.; Castelli, D.M. Effects of gamification on behavioral change in education: A meta-analysis. Int. J. Environ. Res. Public
Health 2021, 18, 3550. [CrossRef]
40. Landers, R.N. Developing a theory of gamified learning: Linking serious games and gamification of learning. Simul. Gaming 2014,
45, 752–768. [CrossRef]
41. Ahn, Y. The Effects of a Phonemic Awareness Activity Class with Gamification on English Phonemic Awareness and Affective
Domains in Elementary School Students, and Observation for the Influence of the Native Language on Phonemic Awareness.
Master’s Thesis, Cyber Hankuk University of Foreign Studies, Seoul, Republic of Korea, 2019.
42. Lee, S. The effects of Gamification-based Artificial Intelligence Chatbot activities on elementary English learners’ speaking
performance and affective domains. Korean Soc. Elem. Engl. Educ. 2019, 25, 75–98. [CrossRef]
43. Little, R.J.A.; Rubin, D.B. Statistical Analysis with Missing Data, 3rd ed.; Wiley: Hoboken, NJ, USA, 2019.
44. Schafer, J.; Graham, J.W. Missing data: Our view of the state of the art. Psychol. Methods 2002, 7, 147–177. [CrossRef]
45. Allison, P.D. Missing Data; Sage: Thousand Oaks, CA, USA, 2002.
46. Rubin, D.B. Multiple Imputation for Nonresponse in Surveys; John Wiley & Sons: New York, NY, USA, 1987.
47. Cumming, G. Understanding the New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis; Routledge: New York, NY,
USA, 2012.
48. Cohen, J. Statistical Power Analysis for the Behavioral Sciences, 2nd ed.; Academic Press: New York, NY, USA, 1988.
49. Ferguson, C.J. An effect size primer: A guide for clinicians and researchers. Prof. Psychol. Res. Pract. 2009, 40, 532–538. [CrossRef]
50. Lakens, D. Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs.
Front. Psychol. 2013, 4, 863. [CrossRef]
51. Plonsky, L.; Oswald, F.L. How big is “big”? Interpreting effect sizes in L2 research. Lang. Learn. 2014, 64, 878–912. [CrossRef]
52. Sawilowsky, S.S. New effect size rules of thumb. J. Mod. Appl. Stat. Methods 2009, 8, 597–599. [CrossRef]
53. Higgins, J.P.T.; Thompson, S.G.; Deeks, J.J.; Altman, D.G. Measuring inconsistency in meta-analyses. BMJ 2003, 327, 557–560.
[CrossRef]
54. Kim, H. A Study of the Improvement of Elementary Students’ English Vocabulary Through Gamification Using Kahoot
Application. Master’s Thesis, Woosuk University, Jeonbuk, Republic of Korea, 2023.
55. Baek, J. The Effects of Using Gamification on Primary School Students’ Learning English: Based on Students’ Acquisition
of English Vocabulary and Their Affective Attitudes on English. Master’s Thesis, Chinju National University of Education,
Gyeongnam, Republic of Korea, 2021.
56. Jeon, W. The Effects of Gamification Using Classcard and Class123 on the English Vocabulary Proficiency and the Affective
Domain for High School Students. Master’s Thesis, Cyber Hankuk University of Foreign Studies, Seoul, Republic of Korea, 2021.
57. Begg, C.B.; Mazumdar, M. Operating characteristics of a rank correlation test for publication bias. Biometrics 1994, 50, 1088–1101.
[CrossRef]
58. Duval, S.; Tweedie, R. Trim and fill: A simple funnel-plot-based method of testing and adjusting for publication bias in
meta-analysis. Biometrics 2000, 56, 455–465. [CrossRef]
59. Rosenthal, R. The file drawer problem and tolerance for null results. Psychol. Bull. 1979, 86, 638–641. [CrossRef]
60. Dickersin, K. The existence of publication bias and risk factors for its occurrence. JAMA 1990, 263, 1385–1389. [CrossRef]
61. Dwan, K.; Gamble, C.; Williamson, P.R.; Kirkham, J.J. Systematic review of the empirical evidence of study publication bias and
outcome reporting bias: An updated review. PLoS ONE 2013, 8, e66844. [CrossRef] [PubMed]
62. Franco, A.; Malhotra, N.; Simonovits, G. Publication bias in the social sciences: Unlocking the file drawer. Science 2014, 345,
1502–1505. [CrossRef] [PubMed]
63. Norman, G. Data dredging, salami-slicing, and other successful strategies to ensure rejection: Twelve tips on how to not get your
paper published. Adv. Health Sci. Educ. 2014, 19, 1–5. [CrossRef] [PubMed]
64. Hamari, J.; Shernoff, D.J.; Rowe, E.; Coller, B.; Asbell-Clarke, J.; Edwards, T. Challenging games help students learn: An empirical
study on engagement, flow and immersion in game-based learning. Comput. Hum. Behav. 2016, 54, 170–179. [CrossRef]
65. Landers, R.N.; Bauer, K.N.; Callan, R.C.; Armstrong, M.B. Psychological theory and the gamification of learning. In Gamification
in Education and Business; Reiners, T., Wood, L.C., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 165–186.
66. Hew, K.F.; Huang, B.; Chu, K.W.S.; Chiu, D.K.; Lo, C.K. Engaging Asian students through game mechanics: Findings from two
experiment studies. Comput. Educ. 2016, 92, 221–236. [CrossRef]
67. Warschauer, M. Information literacy in the laptop classroom. Teach. Coll. Rec. 2008, 109, 2511–2540. [CrossRef]
68. Hamari, J.; Koivisto, J.; Sarsa, H. Does gamification work? A literature review of empirical studies on gamification. In Proceedings
of the 47th Hawaii International Conference on System Sciences, Waikoloa, HI, USA, 6–9 January 2014.
69. Nation, I.S.P. Learning Vocabulary in Another Language; Cambridge University Press: Cambridge, UK, 2001.
70. Stoller, F.L. Establishing a theoretical foundation for project-based learning in second and foreign language contexts. In Project-
Based Second and Foreign Language Education: Past, Present, and Future; Beckett, G.H., Miller, P.C., Eds.; Information Age Publishing:
Greenwich, CT, USA, 2006; pp. 19–40.
71. Peterson, M. Computer Games and Language Learning; Palgrave Macmillan: New York, NY, USA, 2013.
72. Song, F.; Hooper, L.; Loke, Y.K. Publication bias: What is it? How do we measure it? How do we avoid it? Open Access J. Clin.
Trials 2013, 5, 71–81. [CrossRef]
73. Bower, M. Technology-mediated learning theory. Br. J. Educ. Technol. 2019, 50, 1035–1048. [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
