
Cogent Education

ISSN: (Print) (Online) Journal homepage: www.tandfonline.com/journals/oaed20

Fun and frustrating: Students’ perspectives on practising speaking English with virtual humans

Elin Ericsson, Sylvana Sofkova Hashemi & Johan Lundin

To cite this article: Elin Ericsson, Sylvana Sofkova Hashemi & Johan Lundin (2023) Fun and
frustrating: Students’ perspectives on practising speaking English with virtual humans, Cogent
Education, 10:1, 2170088, DOI: 10.1080/2331186X.2023.2170088

To link to this article: https://doi.org/10.1080/2331186X.2023.2170088

© 2023 The Author(s). This open access article is distributed under a Creative
Commons Attribution (CC-BY) 4.0 license.

Published online: 13 Feb 2023.

Ericsson et al., Cogent Education (2023), 10: 2170088
https://doi.org/10.1080/2331186X.2023.2170088

INFORMATION & COMMUNICATIONS TECHNOLOGY IN EDUCATION | RESEARCH ARTICLE
Fun and frustrating: Students’ perspectives on
practising speaking English with virtual humans
Elin Ericsson1*, Sylvana Sofkova Hashemi2 and Johan Lundin1
Received: 30 September 2022
Accepted: 13 January 2023

*Corresponding author: Elin Ericsson, Department of Applied Information Technology, Division of Learning, Communication and IT, University of Gothenburg, Gothenburg, Box 100 405 30, Sweden
E-mail: elin.ericsson@ait.gu.se

Reviewing editor: Fan Fang, Department of Foreign Languages and Literature, College of Liberal Arts, Shantou University, Shantou, China

Additional information is available at the end of the article

ABOUT THE AUTHOR
Elin Ericsson is a PhD student at the University of Gothenburg, at the Department of applied IT, working in the division for Learning, communication, and IT. Her research project is within the field of dialogue-based computer-assisted language learning (DB-CALL). She explores the use of spoken dialogue systems with conversational agents and their possibilities and challenges in teaching and learning foreign languages, especially speaking skills. Research reported in this paper will form, together with the contributions of other two student studies, and one initial study of teachers’ views of using this educational technology, the core of her doctoral thesis. The planned title is “Practising L2 Speaking with Conversational AI”. Photo of the first author, Elin Ericsson. Photograph: Amanda Curtis.

PUBLIC INTEREST STATEMENT
This paper reports on a study where students (N = 25) practised speaking English face-to-face with embodied virtual humans in a system. They solved tasks in everyday-life scenarios, e.g., ordering at restaurants. Our analysis is mainly based on the students’ self-reported experiences produced through questionnaires, logbooks, and interviews, but also some logged data about their activity and performance in the system. We identified seven themes, revolving around the speaking practice method, scenarios, and technology, which, in combination with descriptive statistics, enabled a deeper understanding of the students’ experiences. Results indicate that, on average, they found it easy, fun, and safe but sometimes frustrating practising in scenarios with content not always relevant to their everyday lives. Factors suggested as underlying the levels of experienced frustration include technical issues and not being understood or heard as expected. The findings implicate beneficial opportunity for practising speaking skills, especially pronunciation and interaction in dialogues.

Abstract: Speaking in a foreign language is considered challenging to both teach and learn. Virtual humans (VHs), as conversational agents (CAs), provide opportunities to practise speaking skills. Lower secondary school students (N = 25) engaged in an AI-based spoken dialogue system (SDS) and interacted verbally with VHs in simulated everyday-life scenarios to solve given tasks. Our analysis is based on system-generated metrics and self-reported experiences collected through questionnaires, logbooks, and interviews. Thematic analysis resulted in seven themes, revolving around the speaking practice method, scenarios and technology, which, in combination with descriptive statistics, enabled a deeper understanding of the students’ experiences. The results indicate that, on average, they found it easy, fun, and safe, but sometimes frustrating in scenarios not always relevant to their everyday lives. Factors suggested as underlying the levels of experienced frustration include technical issues and constraints with the system, such as not being understood or heard as expected. The findings suggest that lower secondary school students conversing with VHs in the SDS in an institutional educational context facilitated a beneficial opportunity for practising speaking skills, especially pronunciation and interaction in dialogues, aligning with the key principles of second language acquisition (SLA) for language development.

Subjects: Modern Foreign Languages; Teaching & Learning; Language Teaching & Learning

Keywords: experience; interaction; lower secondary school students; spoken dialogue system; virtual humans

1. Introduction
Speaking is emphasised in today’s communicative language teaching approach (Byram & Méndez,
2011; Council of Europe, 2020), highlighting oral communication in the target language for meaningful
purposes in everyday-life situations. Hence, interaction plays a crucial role in foreign language
development and learning (Loewen & Sato, 2018). It has been found that the teaching of
speaking skills does not match the communicative approach (Adem et al., 2022) and that it is
difficult to approach it in pedagogically sound ways (Goh & Burns, 2012). Foreign language
teachers have internationally ranked speaking as the most important skill (to teach and to learn;
Thiriau, 2017), which coincides in general with how students measure, value and align their
success in language learning with their mastery of speaking skills (Darancik, 2018). However,
there are limited opportunities to effectively practise speaking the target language in education,
and little access to it outside the classroom. Additionally, speaking is considered particularly
complex to learn and frustrating due to its sense of transience (Goh & Burns, 2012). Spoken
interaction is cognitively and socially demanding, creating underlying mental and emotional
barriers such as anxiety and a lack of confidence (Burns, 2016).

Practising speaking skills with embodied chatbots (Wang et al., 2017), also known as virtual
humans (VHs; Schroeder & Craig, 2021), in a virtual learning environment, has been proposed as
one possible way to address these teaching and learning challenges and provide possibilities for
speaking practice (Johnson, 2019; Sydorenko et al., 2018). VHs add a social dimension and can
broadly be described as human-like and artificially intelligent characters (Burden & Savin-Baden,
2020; Fryer et al., 2019), visually represented for face-to-face interaction in simulations (Johnson &
Lester, 2016). VHs have human characteristics and behaviours, including social cues using facial
expressions, gestures, and body movements. The VH actions can be controlled either by a human
or by the system autonomously (Schroeder & Craig, 2021) when acting as the counterpart, that is,
a conversational agent (CA; Burden & Savin-Baden, 2020) in spoken interaction. Such spoken
dialogue systems (SDSs; Bibauw et al., 2019) use natural language processing technology to
interpret the speech of students and produce relevant responses (Griol & Callejas, 2016).
Although it is a promising technology (Golonka et al., 2014), there are few studies exploring its
experienced use in actual language teaching and learning, i.e., as a supplement in institutional
foreign language education (Bibauw et al., 2019). To fill this gap, this paper reports on a study
exploring the use of system-controlled VHs in verbal face-to-face situations in which Swedish lower
secondary school students practised speaking English as a foreign language in simulated scenarios.
We focus on the subskill interaction outlined by Li (2017), including interrelated skills such as
the production of utterances to maintain a fluent verbal conversation and listening comprehension
(Hodges et al., 2012).

2. Background
In this section, we delineate the theoretical framing of the study, followed by related literature
from the field before presenting the study goals and the questions guiding the research. Finally, we
provide a brief description of the context and roles of the researchers.

2.1. Learning to speak a foreign language


Analytically, language learning is understood as framed by social contexts and interactions
between people (Swain, 2000). Hence, this study is anchored in a sociocultural perspective on
learning and development as the increased ability to participate in relevant ways in different social
interactions (cf. Lantolf et al., 2014; Vygotsky, 1978). The theoretical foundation of the study is
inspired by the pedagogical approach that integrates learning through practice, as proposed by
Chapelle (2009), with the active use of the target language in task-based teaching and learning
(Blake, 2017), in line with the communicative approach. It is important that the student be
exposed to the language and given opportunities for spontaneous conversational interaction,
including key principles such as “authentic input, conscious noticing on form, opportunity for
interaction, in-time and individualised feedback, low affective filter and an environment where
language can be used” (Li, 2017, p. 28), along with an adequate level of scaffolding and guidance
(Goh & Burns, 2012). Furthermore, Swain (2013) pointed out the need for the field to consider the
role of emotions as one aspect affecting the language learning process. Individual differences in
the affective filter, such as levels of anxiety and self-confidence, speaking skills, and motivation to
speak the target language, all influence speaking ability (Goh & Burns, 2012; Li, 2017).

2.2. DB-CALL
In their review, Bibauw et al. (2019) found studies of dialogue-based computer-assisted language
learning (DB-CALL) dispersed and fragmented and suggested a conceptual framework for
the combined research in this field. DB-CALL assumes that meaningful efforts in the target
language in verbal contextualised conversations with the computer as an automated interlocutor
develop learners’ language skills, in contrast to computer-mediated communication through
the computer (Blake, 2017). In this case, the pragmatic unit of instruction is the dialogue,
consisting of turn-taking between the system or, more precisely, the embodied CA and the
student, where the dialogue is the actual task. Automatic speech recognition and natural language
processing are used to recognise and analyse what the students say, and the system converses with
them in the target language by means of a synthesised native-speaker voice (Griol & Callejas,
2016). Consequently, Long’s interaction hypothesis (1996), supporting the negotiation of
meaning, the importance of opportunities for output, and receiving comprehensible input and
feedback, is a prominent foundation for designing such SDSs, as suggested by Morton and Jack
(2005). We still know little about the implications of interaction in this kind of new context for L2
development (Loewen & Sato, 2018).

2.2.1. Developing speaking skills with CAs in SDSs


Chatbots and human-like embodied VHs (Huang et al., 2022) are increasingly available as CAs for
providing conversation practice, pronunciation (Kukulska-Hulme & Lee, 2020), and vocabulary
learning, but they are still scarcely used in the institutional educational context as supplements
for teaching speaking. By engaging with CAs, students can practise speaking the target language
repeatedly and with feedback from the system (Doremalen et al., 2016; Fryer et al., 2019), allowing
access anywhere and at any time (Huang et al., 2022), without travelling abroad. CAs are found to
be mostly useful for language learning, with technological, pedagogical, and social affordances
highlighted, such as ease of use, personalisation, authentic simulations, and keeping conversations
fluent (Jeon, 2022). Technological limitations affect students very differently (ibid.), sometimes
causing frustration but still making students mostly willing to engage in conversations (Alm &
Nkomo, 2020). Drawbacks reported include CAs’ lack of emotions and visible cues and their
inability to confirm understanding (Jeon, 2022). The systems’ constraints on understanding several
utterances in a row are identified as a challenge, together with a restricted conversational path,
due to the lack of negotiation of meaning that we would normally see in a human-to-human
conversation (Bibauw et al., 2019). Hence, CAs still have some way to go before being sufficiently
robust, as suggested earlier by Coniam (2008).

Research shows that SDSs have the potential to enhance language learning since they
allow for meaningful everyday-life speaking activities with CAs (e.g., Anderson et al., 2008;
Doremalen et al., 2016; Hassani et al., 2016; Johnson, 2019). Progress pointing towards an
enjoyable and non-threatening use of CAs in everyday-life contexts is also suggested in
systems like Mondly and Duolingo (Alm & Nkomo, 2020). It is envisioned that student
engagement and acceptance could be facilitated by placing the educational activities in
virtual reality, supporting experiences of immersion and the sense of being somewhere else
(Kaplan-Rakowski & Wojdynski, 2018; Wang et al., 2017), hence enhancing the development
of speaking skills (Morton & Jack, 2005). Divekar et al. (2021) used a multimodal SDS with
extended reality and AI-empowered CAs and showed statistically significant improvements in
students’ vocabulary learning, listening comprehension, and speaking. Moreover, Johnson and
Lester (2016) argued that interacting with CAs in virtual reality environments facilitates
cultural learning.

Studies of the use of SDSs conducted in higher education in relation to the learning outcomes
of the students’ speaking ability have demonstrated a greater impact on beginner- to
intermediate-level students than on advanced-level students (Kim, 2016). Contrastingly,
Sydorenko et al. (2018) suggested that students with more advanced proficiency levels of
the target language may benefit more from simulated interactions due to their ability to
listen to and understand the interlocutor’s utterances. Anderson et al. (2008) also found that
adolescent students interacting with embodied VHs were engaged and less stressed and found the
practice useful and fun. SDSs have been shown to enable a low-anxiety learning environment (e.g.,
Anderson et al., 2008; Morton & Jack, 2005), influencing students’ willingness to communicate
(Ayedoun et al., 2015; Divekar et al., 2021). Reduced anxiety over speaking has been
found and explained, for instance, by SDSs that provide enjoyable and less stressful speaking
practice (Bashori et al., 2020, 2021). The independence of conversing with a CA alone was
reported to be beneficial for learning, with the text-to-speech function mentioned as a merit
of the system but nevertheless not able to fully replace human beings in dialogues (Gallacher
et al., 2018). Subsequently, there has been scepticism about incorporating CAs into institutional
language education settings.

2.3. Aim & research questions


In summary, even though the field has looked at learning effects, students’ attitudes, and
experiences of using SDSs, there is a lack of studies outside of higher education as well as of
SDSs used in non-experimental institutional educational activities, in particular SDSs with VHs.
Therefore, this study was designed to be a regular part of learning a second language (L2), and it
was conducted in a lower secondary school. This study aims to explore the perspectives of lower
secondary school students on practising their foreign language speaking skills through verbal
interactions with VHs in an SDS. One way to explore this is by tapping into the students’ self-
reported experiences (affective, cognitive, and educational aspects) of practising in the system.
The study is guided by the following research questions:

(1) How do students in lower secondary school practise speaking English in an SDS with VHs?
(2) What factors can be identified as the basis for any variations in the group’s overall
experience?

2.4. The role of the researchers


The first author of this paper initiated, designed, and conducted the study under the supervision of
the co-authors, who participated in the preparation phases, the analysis of the data, and the
writing process of this paper. The first author distributed the logbook and the questionnaires used
in the study and conducted the interviews. Our relationship to the inquiry lies in our interest in
understanding the possibilities and challenges of educational technology and its application in the
(foreign language) classroom, focusing on the students as users and their experiences.

3. Methods
This section describes the roles of the researcher and the participants. Next, the design of the
study, including the system used and the instruments applied for producing data in the procedure,
is presented. Finally, the steps undertaken in the two-fold analysis are outlined.


3.1. Participants
A language teacher and the teacher’s class (N = 25; 12 girls and 13 boys aged 13 to 14 years) agreed
to use the system as part of their institutional education. The type of sampling was
convenience (Creswell & Creswell, 2018), with the participants selected based on availability. The
students were in the seventh grade at a lower secondary school in a suburb of a large city in Sweden
and studied English as a foreign language at an approximate beginner level (A1–A2; for more
information, see Council of Europe, 2020), according to the teacher’s assessment. The characteristics
of the students relevant to the study are described in Section 4.1.

3.2. Study design


This study mainly explored the students’ self-reported experiences, as suggested by Levy (2015), to
tap into the students’ perspectives. In addition, the study identified the differences within the
group, as suggested by Schroeder et al. (2021) for educational research on VHs, rather than the
traditional between-group experimental design. To more fully understand how the students
experienced practising speaking in the system, the study employed posttrial questionnaires, logbooks,
semi-structured interviews, and system-generated metrics, combining both qualitative and
quantitative data, as suggested by Bergman (2008) and Creswell and Creswell (2018). These
multiple sources enabled triangulation for a deeper understanding.

Questionnaires were chosen as an effective way of gathering data about
the students’ self-reported experiences in the system (posttrial) and background information
(pretrial), complemented with logbooks for their reflections and ratings immediately after each
speaking session. The interviews provided the possibility to deepen understanding, and the metrics
offered a more objective way of measuring performance and activity in the system. Since the
experiences of the student–VH spoken interaction are not shared reciprocally in the same way as in
human-to-human interaction, it is not yet possible to ask these VHs to reflect and report on their
experiences (cf. the discussion of VHs’ minds in Burden & Savin-Baden, 2020); hence, our instruments
are one possible way to approach the new learning situation. The analysis is based primarily on the
students’ self-reported experiences, including affective, cognitive, and educational aspects of
practising a foreign language in this virtual environment. The SDS was selected based on the
criterion of enabling spontaneous interaction in the target language with a VH.

Analytically, the students and the SDS are viewed as constituting the learning situation,
which also includes the classroom, the teacher, and the curriculum, although data on the
physical environment and the school’s pedagogical perspective were not collected. The teacher
and a teacher trainee were, however, included in the semi-structured interviews aimed at
gaining insights into their views of the students’ experiences. The responses provided insights
into the educational situation. The role of the researcher (first author) in the study was to
present the aim of the study and to introduce the system together with the teacher, who acted
as a facilitator during the trial. The VHs guided and instructed the students within the system,
which also provided text-based instruction. The students were instructed to engage in the
speaking activities, to use the various supporting features, and to visit the learner’s dashboard
(Section 3.3).

3.3. Spoken Dialogue System (SDS)


The speaking sessions took place in Enskill (Alelo, n.d.; beta version) and sought to achieve realistic
interaction in English to develop the students’ interactional skills. The system and the student
took turns such that both the student and the VH could be the initiators and solve different tasks
together. The conversations simulated everyday-life scenarios with an action-oriented approach, in
which the students’ task was to solve practical problems, such as ordering food at a restaurant or
giving directions. The VHs are animated and embodied female or male characters (see Figure 1),
who act as human counterparts in the dialogues. Figure 1 illustrates a system-generated transcript
in which VH Lily and a Swedish student engaged in a spoken conversation.


Figure 1. Screenshot of the VH (upper left) and the practising student (upper right), along with
a transcript from a VH–student (S) verbal interaction in Enskill (Alelo, n.d.).

Note. ©Alelo Incorporated (Inc). Used with permission.

The students practised English at the A1 and A2 levels (for more information, see Council
of Europe, 2020). Students’ utterances are registered and analysed in the system by applying
automatic speech recognition and language technology (Johnson, 2019). The students can
express themselves spontaneously, and the system is not dependent on predetermined
responses. However, there are some constraints, such as the context, that keep conversational
freedom within bounds and guide the conversation along pre-set paths. Students can
take alternative paths to solve the task. They can also choose various supporting
features, such as displaying the learning objectives, providing transcripts (text-to-speech) of
the ongoing conversation (see Figure 2), and providing directions (suggestions for the next
action and utterance). It is not possible to change the speed of the VH’s speech or the variety of
English (American English).

To solve a task, the students must accomplish certain goals (learning objectives) during the
conversation (see Figure 3). Immediately after the spoken dialogue, auto-generated individualised
feedback is provided with adapted instruction, suggesting adequate follow-up exercises (if
needed). The students can choose to repeat the same conversation until they have fulfilled all
learning objectives and feel satisfied with the results. The SDS includes an individual student
dashboard (beta version) showing the results and mastery score based on learning analytics,
such as time, accuracy, and turns per minute.

3.4. Data production


To explore students’ experiences, the study analysed data produced in system-generated metrics,
questionnaires, logbooks, and interviews. The metrics are logged data about the students’ speaking
activity and performance in the system. They displayed students’ fulfilled learning objectives,
simulation, time spent, and mastery scores (based on accuracy, fluency and turns per minute). The
system also generated transcripts of the conversations.


Figure 2. Screenshot of a student’s view with a prompt to order food at a restaurant,
displaying the transcript.

Note. ©Alelo Inc. Used with permission.

Figure 3. Part of a screenshot of a student’s view, displaying the learning objectives.

Note. ©Alelo Inc. Used with permission.

Two questionnaires were constructed and conducted digitally, drawing on previous questionnaires
evaluating Enskill (Johnson, 2019) and with inspiration from the research project Teaching, Assessment,
and Learning (TAL) on oral proficiency in foreign language education (Erickson et al., 2022).
Speaking was divided into three subskills in this context: pronunciation, producing one’s own
utterances, and interaction (ibid.). The study’s questionnaire used a similar five-point Likert scale (Creswell
& Creswell, 2018), sometimes followed by optional open-response items, as well as multiple-choice
items (Brown, 2009). We adapted vocabulary to the age group, and “avatar” was used as an equivalent to VH
(still controlled by the system). This study’s pretrial questionnaire (N = 25) for background information
reported on the students’ self-assessed views of learning and speaking English in an institutional
educational setting and their experience of using digital educational tools. The posttrial questionnaire
(N = 23) reported primarily on the students’ experiences of practising speaking English with VHs in the
SDS, the benefits and limitations experienced, as well as suggestions for improvement and, finally, the
students’ own evaluation of their learning and speaking English.


In a separate digital logbook, the students (N = 22) documented their experiences (today’s
practice, challenges, and benefits). In addition, semi-structured posttrial interviews were conducted
with six volunteer students until data saturation was reached to deepen our understanding
of the learning situation based on their experiences. An interview guide was used, with pictures
representing experiences using the system. Pre- and posttrial interviews were conducted with the
teacher and a teacher trainee. All interviews were audio-recorded and transcribed verbatim.

3.5. Procedure
The study has been guided by ethical research practice principles (Swedish Research Council, 2017),
including personal privacy compliance (GDPR, 2018). Once the students (and their legal guardians)
had consented to participate in the study, their teacher and the first author introduced it
and demonstrated the system. Each student was equipped with a headset with a microphone and an
account with an anonymised login to use on their laptop. Before starting the trial, the students
answered the pretrial questionnaire. The trial covered four 15-minute sessions (total 60 minutes) over
two weeks. The students chose from among 10 conversational scenarios per level. Unpredicted
technical issues with the system (for example, software bugs) occurred randomly and were solved
continuously. Directly after each speaking session, the students reflected individually by writing in the
logbook for approximately five minutes. After the trial period, they filled out the posttrial
questionnaire and were interviewed until data saturation was reached.

3.6. Analysis
The metrics from the system and questionnaire ratings were analysed through descriptive statistics.
Data from the open-response items of the posttrial questionnaire, written logbook reflections and
transcripts from semi-structured interviews were analysed in two ways: (i) through reflexive thematic
analysis (Braun & Clarke, 2006, 2019) on the students’ experiences and (ii) by exploring the variations
in the overall experience within the group to identify possible patterns and subgroups.

In the thematic analysis, the data were coded systematically, with collegial consensus to ensure
reliability. A combination of data-driven (inductive) and theory-driven (deductive) approaches was
used, and seven themes were developed in relation to the research questions. We refined these
themes during the analysis and gave them illustrative titles. For further depth, as suggested by
Brown (2009), we added representative quotes (after translation) from the student reflections.

To explore the variations within the group, the complete data set was analysed at an individual
level to obtain a broad experience profile for each student. These profiles were then clustered into
three subgroups (positive, neutral, and negative) showing the degree of positivity versus negativity
towards practising speaking. Group membership was subsequently related to their engagement in
the system, measured through metrics (time spent and learning objectives), and analysed in
correlation plots.
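The subgroup step described above can be sketched in code. This is a minimal illustration only: the paper does not specify how the profiles were clustered, so the student records, the threshold rule around the Likert midpoint, and the function names `subgroup` and `engagement_by_subgroup` are all assumptions made for the example, not the authors' procedure or data.

```python
from statistics import mean

# Hypothetical records (not the study's data): Likert ratings on a 1-5 scale
# and total minutes spent in the system for four invented students.
students = {
    "S01": {"ratings": [4, 5, 4], "minutes": 52},
    "S02": {"ratings": [3, 3, 3], "minutes": 40},
    "S03": {"ratings": [2, 1, 2], "minutes": 15},
    "S04": {"ratings": [5, 4, 4], "minutes": 61},
}


def subgroup(ratings, low=2.5, high=3.5):
    """Assumed rule: label a profile by its mean rating relative to the midpoint 3."""
    average = mean(ratings)
    if average >= high:
        return "positive"
    if average <= low:
        return "negative"
    return "neutral"


def engagement_by_subgroup(data):
    """Return the mean time spent (minutes) for each subgroup label."""
    groups = {}
    for record in data.values():
        label = subgroup(record["ratings"])
        groups.setdefault(label, []).append(record["minutes"])
    return {label: mean(values) for label, values in groups.items()}


summary = engagement_by_subgroup(students)
for label, minutes in sorted(summary.items()):
    print(f"{label}: mean time spent {minutes} min")
```

In the study itself, the profiles drew on the complete data set, including qualitative material, so a single numeric threshold like this one would only approximate that judgement.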

Data produced from the pretrial questionnaire (N = 25) were analysed using descriptive statistics
for background information about the participating students. The system-generated transcripts of
the VH–student conversations were read to select an illustrative excerpt (see Figure 1).

4. Results
This section presents background information about the students, logged data (metrics) on their
activity in the system, and their overall experience based on ratings and open-response items
posttrial. Additionally, it outlines three clustered subgroups of students (positive, neutral, and
negative) and the association established between the students’ membership in the subgroup
and their engagement with the system (time spent and learning objectives fulfilled).

4.1. Background information


The participating students reported spending an average of eight hours on digital devices daily,
including schoolwork using digital educational tools. Many students reported in the pretrial
questionnaire that learning English is generally fun and easy. However, a quarter of the students
were dissatisfied with the frequency of opportunities to speak English during lesson time and
identified this as the skill with the least opportunity for practice. The main obstacles to speaking
English reported were a lack of confidence and anxiety.

Table 1. Metrics reports from trials

Simulation level | Active users (N = 25) | Maximum time spent in a simulation | Minimum time spent in a simulation | Maximum total time (hr:min:s) | Minimum total time (hr:min:s) | Mean time spent | Standard deviation
A1 | 20 | 27 min 42 s | 1 s | 01:07:20 | 00:01:40 | 00:23:45 | 00:17:32
A2 | 23 | 27 min 53 s | 1 s | 00:59:38 | 00:01:25 | 00:26:19 | 00:18:07
A1 & A2 | 16 | 27 min 53 s | 1 s | 01:22:52 | 00:22:58 | 00:44:13 | 00:18:26

4.2. Speaking activity (Metrics)


Metrics reports generated in the system revealed that all students (N = 25) started simulations,
with 16 of the students making attempts at both the A1 and A2 levels. The mean value per person
of the total time spent on activities at both levels was 44 minutes and 13 seconds (Table 1). The
minimum time spent (one second) is explained by students starting a simulation without proceeding. The
scenarios chosen most frequently were the planning of a party, followed by ordering at
a restaurant.
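
As an illustration of how time metrics of this kind can be summarised, the following minimal Python sketch converts hr:min:s strings to seconds and computes a mean and a sample standard deviation. The per-student totals are hypothetical placeholder values, not the study's logged data, and the helper names `to_seconds` and `to_hms` are assumptions introduced here; the system's own metrics report may compute these values differently.

```python
from statistics import mean, stdev


def to_seconds(hms):
    """Convert an 'hh:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(seconds):
    """Convert seconds (rounded to the nearest second) back to 'hh:mm:ss'."""
    total = round(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Hypothetical per-student total times (NOT data from the study).
totals = ["00:22:58", "00:35:10", "00:44:00", "01:22:52"]
secs = [to_seconds(t) for t in totals]

mean_hms = to_hms(mean(secs))   # mean total time per student
sd_hms = to_hms(stdev(secs))    # sample standard deviation (n - 1)

print(mean_hms, sd_hms)
```

Whether the reported standard deviations use the sample (n - 1) or the population formula is not stated in the text; the sketch assumes the sample version.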

4.3. Overall experience—easy, fun, and average realistic practice of speaking skills (ratings)
The posttrial questionnaire ratings revealed that most of the students were neutral or positive
about practising speaking English with a VH, regarding both how easy (48%) and how fun (40%) it
was (see Figure 4).

Almost half of the students (48%) found the threshold for starting to use the system low. In addition,
11 students (48%) reported that overall, they experienced the system as good for practising
speaking English (see Figure 5).

Figure 4. Students’ ratings of practising speaking English in the system.

Figure 5. Students’ overall ratings of practising speaking English in the system.

The students did not consider the scenarios very realistic (as in “real life”), with the mean rating
value being 2.48. The most common answer was “average”, and 43.5% of them thought the tasks
were “not realistic at all” or “only slightly realistic”. Overall, across all statements evaluating
the method of practising in relation to the development of speaking skills, the median rating
was 3 (see Table 2).

The students’ self-assessment of which subskills they practised revealed that they mostly felt they
had practised pronunciation (44%), closely followed by dialogue/conversation (39%). However, around
a third (35%) had difficulty assessing which subskills they had practised, as illustrated in Figure 6.

Concerning the supportive features, only some students reported having used the system’s
transcript feature, and only a few looked at the learner’s dashboard and practised the adapted
follow-up exercises, despite being repeatedly encouraged to do so.

4.4. Overall experience with method, scenarios, technology, and design (open-response items)
Based on open-response items, the thematic analysis revealed seven themes revolving around the
students’ self-reported experiences of the speaking practice method, the content of the scenarios,
and the technology and design of the system. Most of the coded data were distributed within
themes about the technology, mostly about not being understood or heard, resulting in repetitive
VHs, hang-ups and bugs in the system.

4.4.1. Easy, fun, and innovative


The students expressed that it was “easy”, “smart”, “exciting”, “fun”, “nice”, and “a good way” of
practising speaking and learning English: “Very good and this was fun and I’m very excited to
continue” (Student 11). Comments included this is “an innovative way of teaching English”
(Student 5) and “I liked it a lot. It was fun and nice trying a new thing, and I enjoyed it” (Student 9).

4.4.2. Learning to speak, listen, and read


Some students reflected upon which skills they had learnt and developed: “One started to feel somewhat
more comfortable speaking English” (Student 16) and “you learn words and how to speak”
(Student 25). Furthermore, “I think you learnt a lot from this, since you were both speaking with an
avatar (so you learn pronunciation), and at the same time as speaking, you also learn to read (if you
have a text beside with what you are talking about)” (Student 17, student’s parentheses).

4.4.3. Less nervous and more comfortable


Many students reported feeling safe, “less nervous” and “more comfortable” speaking with a VH
compared to speaking with other people in the classroom: “When you speak to an avatar you do
not become nervous about saying something wrong. That is why I think this is good, since you
learn speaking better without having to be nervous” (Student 17). The system was reported as
a safer way of practising speaking English: “It is nice for many people who do not feel fully
comfortable speaking with other people, so [it is] good to do this in advance so you can skip real-
life conversations with human beings, but it is good to combine both” (Student 7). Additionally, this
way of speaking gives you time to think: “You can train to have quicker answers when the avatar
speaks, you can think through what to answer” (Student 11).

4.4.4. Human-like versus non-human-like interaction


Some students experienced the VH as more “human-like” and “real” compared to other systems:

The avatar felt like a human being; they spoke exactly like a normal human being and not
like a robot, like Siri or something; this one spoke normally, which was nice. They put in
scenarios in real life, so you have to say something like in a real conversation. Yes, they
succeeded; it felt quite authentic indeed. (Student 9)

Table 2. Students’ ratings of self-evaluative and overall experiences

Item  Statement                                                                    M     SD    Mdn  %
1.    I have developed my English-speaking skills.                                 2.65  1.11  3    22
2.    I feel less nervous about speaking English.                                  2.43  1.20  3    13
3.    I am more motivated to speak English now than before the trial.              2.65  1.07  3    17
4.    I am more prepared for everyday-life situations now than before the trial.   2.7   1.11  3    26
5.    It has been good for me to speak with a virtual human.                       2.87  1.10  3    30
6.    It has been good for me to speak in everyday-life situations.                2.65  1.03  3    9
7.    It has been good for me to practise listening.                               2.83  1.27  3    35

Note. % was calculated using the number of students whose posttrial ratings were given as “agree” (4) and “totally
agree” (5).
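
The summary columns in Table 2 can be illustrated with a minimal Python sketch that computes M, SD, Mdn, and the percentage of “agree” (4) or “totally agree” (5) ratings for one item. The ratings below are hypothetical values on the 1–5 scale, not the study's responses, and the function name `summarise_item` is an assumption made for this illustration. The sketch also assumes a sample standard deviation, which the text does not specify.

```python
from statistics import mean, median, stdev


def summarise_item(ratings):
    """Summarise 1-5 ratings for one questionnaire statement."""
    agree = sum(1 for r in ratings if r >= 4)  # "agree" (4) or "totally agree" (5)
    return {
        "M": round(mean(ratings), 2),
        "SD": round(stdev(ratings), 2),  # sample SD (n - 1), an assumption
        "Mdn": median(ratings),
        "pct_agree": round(100 * agree / len(ratings)),
    }


# Hypothetical ratings from 23 respondents (NOT data from the study).
ratings = [1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5]
summary = summarise_item(ratings)
print(summary)
```

In this hypothetical item, five of the 23 ratings are 4 or 5, which rounds to 22%.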

Figure 6. Students’ ratings of estimated practised speaking subskill(s).

Aspects of humanising are interesting, and the students’ ways of referring to the VHs varied between “they,”
“avatar(s),” “the figures,” “it,” “he,” “the man,” “she,” or using their human names: “I was talking
with Emma about a soccer game” (Student 11). However, there were large individual differences,
and the VH was sometimes experienced as “strange,” feeling “kind of weird” compared with
a machine: “Speaking to this avatar was good. But it is a little bit difficult because he is like
a computer” (Student 4). “It was not so realistic since you had to say: ‘No, I do not need any
more’ . . . instead of just saying ‘no’ as you would say in a normal conversation” (Student 12).
Frustration also arose around the repetitiveness of the interaction:

It should be more human to speak with and at least have more than one answer to answer
with than ‘Can you please repeat that again?’ I was going to throw my computer out the
window. You must be able to say that you want to book a train ticket or something in various
ways, because [the interaction] does not work otherwise. (Student 15)

Many students reported that they gradually felt used to and more comfortable with this way of
practising, indicating that it takes time: “Somewhat strange feeling at first, but you got used to it”
(Student 7). “The final lesson, I have got more and more comfortable with time. I think you must
take some time to get to know the programme” (Student 5).

4.4.5. Frustration when not being understood or heard by a limited system


Whereas the students reported understanding the VHs well and that “(t)he voice of them
sounded good and clear” (Student 20), they expressed frustration over not being understood or
heard by the system: “Often when I speak, they say that I should repeat myself, and that is very
annoying” (Student 21). “If you answer in another way than it says in the manual, then they say,
‘Can you say that again?’ It feels petrifying when you keep trying, but you do not know exactly
what they want you to say” (Student 9). Students expressed repeated frustration over constraints
in the system: “There were only a few answers to every question” (Student 11), “I can’t answer the
way I want, and they can’t hear me” (Student 20), and “they didn’t answer the question I asked”
(Student 2). Students also expressed frustration over the non-functioning “useless” system,
constantly repeated technical issues and boredom: “She or he just asked many boring questions and as
I said it was very boring” (Student 20). Female students also reported that their voices were not as
easily recognised by the system: “But we have concluded that it becomes more difficult for us girls,
since it hears male voices better” (Student 22).

4.4.6. Quite realistic scenarios with preference for variation and age-adapted tasks
The sixth theme refers to tasks in the simulations and the extent to which they were experienced
as realistic/relevant and useful. This student reported positively about the content: “For instance,
there are such subjects that as a matter of fact can apply in real life, hence booking a hotel,
interview, football game, etc., that you can really make big use of” (Student 17). Additionally, the
students suggested more options to choose from, for instance, at the restaurant ordering various
foods and not only spaghetti.

However, some students would have preferred it if the scenarios were more adapted to their
age, as summarised by the teacher: “Linked to what young people are doing, their lives, hobbies,
interests, conversation subjects, and so as authentically linked to their world as possible”.
Furthermore, some comments reflected opinions about the content as preparation for the future
and for travelling:

You learn about a lot of things here, about disco, food; if you are going on a trip to England
or the USA, then you can practice and learn here beforehand. The situations were good.
I missed something you do not learn in school, but need later in life, for instance, airports,
something realistic, real life. (Student 9)

4.4.7. Suggestions
There was also feedback concerning the interface, such as optimising the tutorials and introduction
to reduce the number of clicks and repetitive tutorials. Another idea was that the VH should
ask follow-up questions to deepen the conversation and vary the utterances used when not
understanding. The teacher expressed a wish for a better match between the system and the
Swedish curriculum, such as speaking and listening to global English with a variety of options,
a wish also echoed in the students’ suggestions.

4.5. Three subgroups of students in relation to overall experience


Three subgroups of student profiles were clustered according to a manual overview of their
various posttrial ratings, open-response items, and reflection posts: negative (N = 5), neutral
(N = 8), and positive (N = 12). In the correlation plots drawn, no association was established
between group membership and either self-reported pretrial views (on learning to speak and
speaking English) or time spent daily on digital devices. However, an association was
indicated between group membership and time spent per person in the system, as well as
fulfilled learning objectives in relation to learning objectives started (see Table 3 and Figure 7).

Table 3. Subgroups and all students’ time spent and performance in the system

Mean value (per person)                     Negative (N = 5)  Middle (N = 8)  Positive (N = 12)  All (N = 25)
Time spent per simulation (hr:min:s)        00:05:02          00:04:44        00:05:21           00:05:01
Total time spent in the system (hr:min:s)   00:32:10          00:40:10        00:47:25           00:43:07
Number of logins/simulations                9.6               8.5             8.7                8.8
Fulfilled learning objectives               7.8               9.9             12.9               10.9
Started learning objectives                 31.6              32.9            31.3               31.8

Figure 7. Fulfilled learning objectives in relation to all started learning objectives.

This means that students’ overall experiences are reflected in their engagement with the system in
terms of time spent, as well as their successful completion of learning objectives. For the negative
group, reports of being bored and frustrated dominated the overall experience of the whole trial. These
students showed little or no patience with the difficulties and even used inappropriate words.
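
To make the relation between fulfilled and started learning objectives concrete, the following minimal Python sketch divides the group means reported in Table 3. This is only an illustration: it takes the ratio of group means, which is not necessarily the same as the mean of per-student ratios and may differ from what Figure 7 plots. The dictionary layout and the function name `completion_share` are assumptions introduced here.

```python
# Group means per person, copied from Table 3.
table3 = {
    "Negative": {"fulfilled": 7.8, "started": 31.6},
    "Middle": {"fulfilled": 9.9, "started": 32.9},
    "Positive": {"fulfilled": 12.9, "started": 31.3},
    "All": {"fulfilled": 10.9, "started": 31.8},
}


def completion_share(fulfilled, started):
    """Percentage of started learning objectives that were fulfilled."""
    return round(100 * fulfilled / started, 1)


shares = {group: completion_share(**values) for group, values in table3.items()}

for group, share in shares.items():
    print(f"{group}: {share}% of started learning objectives fulfilled")
```

Under this simplification, the share rises from roughly a quarter in the negative group to about two fifths in the positive group, in line with the association described above.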

5. Discussion
Triangulation across multiple methods showed that the students’ experiences mostly coincide with
the key principles for successfully developing speaking skills outlined by, for instance, Li
(2017). This way of practising was reported to provide opportunities for (human-like) interaction
(Lantolf et al., 2014; Loewen & Sato, 2018), for feeling comfortable (Li, 2017), and for practising
speaking in the target language in a fun, safe (Swain, 2013) and active way through practical
everyday-life tasks and practice of pronunciation and dialogues (Blake, 2017). Having time to think before
answering in the dialogue nuances the picture of speaking with VHs, signalling the practice as less
stressful than human-to-human interaction. Furthermore, the students expressed a need for
variation and personalisation of the system according to their interests, age, and educational
needs, highlighting in particular the importance of conversational themes associated with their
everyday lives and interests (e.g., Li, 2017; Long, 1996), both current and future. This
observed lack of relevant scenarios relates to the design of SDSs as educational tools regarding
the audience they target, as this specific system was not designed with Swedish lower
secondary school students and their curriculum in mind.

The participants’ variation in reported experiences in this study echoes previous findings in DB-
CALL (see, e.g., Anderson et al., 2008; Jeon, 2022; Johnson, 2019), although the former studies
were primarily conducted in higher education (Bibauw et al., 2019). The findings reported here
nuance previous studies of VR-assisted language learning, which highlighted the great potential for
positive engagement (Anderson et al., 2008; Kaplan-Rakowski & Wojdynski, 2018) and acceptability
when practising speaking skills with VHs in immersive scenarios (Wang et al., 2017). Our study
observed a large variation in terms of both how the speaking practice and technology were
experienced and the amount of effort each student put into the activity. The results also indicate
that students adapt differently to the VHs’ conversational strategies, experiencing that they are
not always appropriate or aligned with the students’ expectations. The way the students in this
study experience the VH possibly reveals something about the level of immersion in the different
simulations and the degree to which the students engage in conversations with the VH. The results
show that these students were more concerned with what the VHs said—the content and fluency
—rather than how they said it, with no regard for the picture, sound, or physical appearance. In
addition, the students reported being quite neutral about the use of VHs in language education.

Given that we see a large variation in individual experiences, it makes sense to explore this in terms
of the underlying individual factors affecting the overall experience. One possible way for such
exploration is to look for dependencies within the three subgroups. The results suggest that the few
students in the “negative group” were among those very concerned with and disturbed by technical
issues, constraints and frequently not being understood or heard by the VH. These students reported
little patience with the difficulties, spent less time on average in the system, and fulfilled fewer
learning objectives. Time is one of the components raised as crucial in relation to getting used to
the system and the new way of practising. It cannot be concluded in this study whether this result has
to do with the students’ proficiency level, pronunciation of English, digital competence or setting, or
a combination of these factors. However, one possible explanation is that the system may be more
suitable for students with a higher proficiency level in English, similar to the findings of Sydorenko et al.
(2018), in which intermediate students struggled with listening comprehension at the expense of
benefitting from the system. In contrast, other studies have found greater effects in beginners with low or
moderate proficiency levels (Kim, 2016). Since this study was not intended to measure the efficacy of
the SDS, it is not possible to draw any similar conclusions.

Most students reported an average increase in motivation and feeling more prepared, more
willing, and less anxious about speaking in everyday life outside of an institutional educational
setting after the trial, which is in line with earlier studies (e.g., Ayedoun et al., 2015; Bashori et al.,
2020, 2021; Divekar et al., 2021). These lower secondary school students rated their experiences
with the system as easy, useful, and fun, in line with a similar study of university students
(Johnson, 2019), who also reported Enskill as a good way to learn English, while being easy and fun
to use. The students in our study found it a good way to practise English and reported mostly
favourable views, mixed with some communication breakdown issues for which a minority showed less
patience. This contrasts with the results of Dizon and Tang (2020), where the students tended to give up
entirely in those cases.

The negative group’s results can, for instance, be related to earlier findings showing that users’
confidence in the ability to learn, speak and understand the target language increased when using
a system providing appropriate feedback (Wang & Johnson, 2008), something that did not work
entirely for all students in this study. Natural language processing was probably not sufficiently
developed to recognise all students’ speech in the SDS; similar limitations were also reported in
Johnson’s (2019) snapshot evaluations of Enskill. The VHs (i.e., acoustic models) of this data-driven
system had not previously been trained to recognise Swedish students’ variations in pronunciation and
widely varying levels of English. This study was the first to use Enskill in a Swedish educational
context. One parallel can be drawn to a study of SPELL, where the VHs were trained on L1 speakers
but had to recognise non-L1 speakers’ spoken language (Anderson et al., 2008), a prominent issue
that Godwin-Jones (2009) also highlighted as provoking frustration.

In contrast, other studies found that the comprehensibility and intelligibility of “intelligent
personal assistants” interacting with non-L1 accented students is comparable to human-to-
human interaction (Moussalli & Cardoso, 2020). Further, CAs have been used successfully for language
assessment based on this kind of conversation, with positive student experiences (Forsyth et al.,
2019). The gender bias reported in this study, with less accuracy for female speakers,
aligns with earlier research (Bajorek, 2019) but was not investigated further here. Both
students and their teacher expressed a wish for fewer constraints in the system and opportunities
to choose between different varieties of English, including for the VH, which currently speaks American
English.

The strengths of this study are the rich dataset unpacking the students’ experiences in the
system and enabling insights based on their voices of self-reported experiences, as suggested by
Levy (2015), together with the measurement of performance from the system, as suggested by
Chan (2009). Practising speaking skills with VHs has been reported mostly positively, but our study
provides more nuance to current knowledge. This study provides a starting point for understanding
how lower secondary school students experience using this kind of SDS and which aspects they
highlight after interacting verbally with VHs. Future studies are recommended for this age group to
explore aspects with a bearing on learning and effectiveness in students’ foreign language
development.

6. Conclusions
This study aimed to increase knowledge about lower secondary school students’ perspectives on
practising their foreign language speaking skills in verbal interactions with VHs in an SDS. When
interpreting the results, it should be noted that the SDS used was not designed with this age
group of students or this Swedish educational context in mind, and that this is a small-scale study
in terms of sample size and duration. Based on a multi-method analysis, the findings showed that the students found this
method of practising speaking skills to be mostly fun, easy, and safe but could not see the task
content in the scenarios entirely relating to their own everyday lives. In addition, there was
individual variation in the extent to which constraints and technical issues provoked frustration.
A possible relation was detected between the students’ reported positiveness/negativity toward
the overall experience and their engagement with the system in relation to the learning objectives
completed and the time spent. Regardless of the students’ mixed experiences, they were all eager
to give suggestions for improving the system, such as content linked to their current and future
everyday lives and with fewer constraints in the interactional design. The findings suggest that
these lower secondary school students engaged in conversations with VHs in the SDS, facilitating
supplementary and beneficial opportunities for practising speaking skills, especially pronunciation
and interaction in dialogues. Their self-reported experiences are, to a great extent, consistent with
the SLA key principles recognised for efficient foreign language development and learning.

Funding
The authors received no direct funding for this research.

Author details
Elin Ericsson1
E-mail: elin.ericsson@ait.gu.se
ORCID ID: http://orcid.org/0000-0001-6472-9149
Sylvana Sofkova Hashemi2
ORCID ID: http://orcid.org/0000-0001-5248-771X
Johan Lundin1
ORCID ID: http://orcid.org/0000-0001-5547-9395
1 Department of Applied Information Technology, Division of Learning, Communication and IT, University of Gothenburg, Gothenburg, Sweden.
2 Department of Education, Communication and Learning, University of Gothenburg, Gothenburg, Sweden.

Disclosure statement
No potential conflict of interest was reported by the authors.

Citation information
Cite this article as: Fun and frustrating: Students’ perspectives on practising speaking English with virtual humans, Elin Ericsson, Sylvana Sofkova Hashemi & Johan Lundin, Cogent Education (2023), 10: 2170088.

References
Adem, H., Berkessa, M., Khajavi, Y., & Khajavi, Y. (2022). (Reviewing editor) A case study of EFL teachers’ practice of teaching speaking skills vis-à-vis the principles of Communicative Language Teaching (CLT). Cogent Education, 9(1). https://doi.org/10.1080/2331186X.2022.2087458
Alelo. (n.d.). Enskill® English. A revolution in English language teaching. https://www.alelo.com/english-language-teaching/
Alm, A., & Nkomo, L. M. (2020). Chatbot experiences of informal language learners: A sentiment analysis. International Journal of Computer-Assisted Language Learning and Teaching, 10(4), 51–65. https://doi.org/10.4018/IJCALLT.2020100104
Anderson, J. N., Davidson, N., Morton, H., & Jack, M. A. (2008). Language learning with interactive virtual agent scenarios and speech recognition: Lessons learned. Computer Animation and Virtual Worlds, 19(5), 605–619. https://doi.org/10.1002/cav.265
Ayedoun, E., Hayashi, Y., & Seta, K. (2015). A conversational agent to encourage willingness to communicate in the context of English as a foreign language. Procedia Computer Science, 60(1), 1433–1442. https://doi.org/10.1016/j.procs.2015.08.219
Bajorek, P. J. (2019, May 10). Voice recognition still has significant race and gender biases. Harvard Business Review. https://hbr.org/2019/05/voice-recognition-still-has-significant-race-and-gender-biases
Bashori, M., van Hout, R., Strik, H., & Cucchiarini, C. (2020). Web-based language learning and speaking anxiety. Computer Assisted Language Learning, 35(5–6), 1058–1089. https://doi.org/10.1080/09588221.2020.1770293
Bashori, M., van Hout, R., Strik, H., & Cucchiarini, C. (2021). Effects of ASR-based websites on EFL learners’ vocabulary, speaking anxiety, and language enjoyment. System, 99, 102496. https://doi.org/10.1016/j.system.2021.102496
Bergman, M. M. (2008). The straw men of the qualitative-quantitative divide and their influence on mixed methods research. In M. M. Bergman (Ed.), Advances in mixed methods research theories and applications (pp. 11–23). SAGE.
Bibauw, S., François, T., & Desmet, P. (2019). Discussing with a computer to practice a foreign language: Research synthesis and conceptual framework of dialogue-based CALL. Computer Assisted Language Learning, 32(8), 827–877. https://doi.org/10.1080/09588221.2018.1535508
Blake, R. J. (2017). Technologies for teaching and learning L2 speaking. In C. A. Chapelle & S. Sauro (Eds.), The handbook of technology and second language teaching and learning (pp. 107–117). Wiley-Blackwell.
Braun, V., & Clarke, V. (2006). Qualitative research in psychology using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77–101. https://doi.org/10.1191/1478088706qp063oa
Braun, V., & Clarke, V. (2019). Reflecting on reflexive thematic analysis. Qualitative Research in Sport, Exercise, and Health, 11(4), 589–597. https://doi.org/10.1080/2159676X.2019.1628806
Brown, J. D. (2009). Open-response Items in questionnaires. In J. Heigham & R. A. Croker (Eds.), Qualitative research in applied linguistics (pp. 200–219). Palgrave Macmillan.
Burden, D., & Savin-Baden, S. (2020). Virtual humans today and tomorrow. CRC Press.
Burns, A. (2016). Research and the teaching of speaking in the second language classroom. In E. Hinkel (Ed.), Handbook of research in second language teaching (pp. 242–256). Routledge.
Byram, M., & Méndez, M.-C. (2011). Communicative language teaching. In K. Knapp & B. Seidlhofer (Eds.), Handbooks of applied linguistics communication competence language and communication problems practical solutions (pp. 491–516). Walter de Gruyter.
Chan, D. (2009). So why ask me? Are self-report data really that bad? In C. Lance & R. Vandenberg (Eds.), Statistical and methodological myths and urban legends: Doctrine, verity and fable in the organizational and social sciences (pp. 309–336). Routledge.
Chapelle, C. A. (2009). The relationship between second language acquisition theory and computer-assisted language learning. The Modern Language Journal, 93, 741–753. https://doi.org/10.1111/j.1540-4781.2009.00970.x
Coniam, D. (2008). An evaluation of chatbots as software aids to learning English as a second language. The EuroCALL Review, 13, 2–14. https://doi.org/10.4995/eurocall.2008.16353
Council of Europe. (2020). Common European framework of reference for languages: Learning, teaching, assessment – companion volume. Council of Europe Publishing. http://www.coe.int/lang-cefr
Creswell, J. W., & Creswell, J. D. (2018). Research design: Qualitative, quantitative, and mixed methods approaches. Sage Publications, Inc.
Darancik, Y. (2018). Students’ views on language skills in foreign language teaching. International Education Studies, 11(7), 166–178. https://doi.org/10.5539/ies.v11n7p166
Divekar, R. R., Drozdal, J., Chabot, S., Zhou, Y., Su, H., Chen., Zhu, H., Hendler, J. A., & Braasch, J. (2021). Foreign language acquisition via artificial intelligence and extended reality: Design and evaluation. Computer Assisted Language Learning, 1–29. Advance Online Publication. https://doi.org/10.1080/09588221.2021.1879162
Dizon, G., & Tang, D. (2020). Intelligent personal assistants for autonomous second language learning: An investigation of Alexa. The JALT CALL Journal, 16(2), 107–120. https://doi.org/10.29140/jaltcall.v16n2.273
Doremalen, J. V., Boves, L., Colpaert, J., Cucchiarini, C., & Strik, H. (2016). Evaluating automatic speech recognition-based language learning systems: A case study. Computer Assisted Language Learning, 29(4), 833–851. https://doi.org/10.1080/09588221.2016.1167090
Erickson, G., Bardel, C., Österberg, R., & Rosén, M. (2022). Attitudes and ambiguities – Teachers’ views on second foreign language education in Swedish compulsory school. In C. Bardel, C. Hedman, K. Rejman, & E. Zetterholm (Eds.), Exploring language education: Global and local perspectives (pp. 157–201). Stockholm University Press. https://doi.org/10.16993/bbz
Forsyth, C. M., Luce, C., Zapata-Rivera, D., Jackson, G. T., Evanini, K., & So, Y. (2019). Evaluating English language learners’ conversations: Man vs. machine. Computer Assisted Language Learning, 32(4), 398–417. https://doi.org/10.1080/09588221.2018.1517126
Fryer, L. K., Nakao, K., & Thompson, A. (2019). Chatbot learning partners: Connecting learning experiences, interest and competence. Computers in Human Behavior, 93, 279–289. https://doi.org/10.1016/j.chb.2018.12.023
Gallacher, A., Thompson, A., & Howarth, M. (2018). “My robot is an idiot!” – Students’ perceptions of AI in the L2 classroom. Future-proof CALL: Language learning as exploration and encounters – short papers from EUROCALL 2018. https://doi.org/10.14705/rpnet.2018.26.815
GDPR. (2018). General data protection regulation. https://eur-lex.europa.eu/legalcontent/SV/TXT/PDF/?uri=CELEX:32016R0679&from=EN
Godwin-Jones, R. (2009). Emerging technologies speech tools and technologies. Language Learning and Technology, 13(3), 4–11. http://dx.doi.org/10125/44186
Goh, C. C. M., & Burns, A. (2012). Teaching speaking: A holistic approach. Cambridge University Press.
Golonka, E. M., Bowles, A. R., Frank, V. M., Richardson, D. L., & Freynik, S. (2014). Technologies for foreign
language learning: A review of technology types and their effectiveness. Computer Assisted Language Learning, 27(1), 70–105. https://doi.org/10.1080/09588221.2012.700315
Griol, D., & Callejas, Z. (2016). Conversational agents as learning facilitators: Experiences with a mobile multimodal dialogue system architecture. In S. Caballé & R. Clarisó (Eds.), Formative assessment, learning data analytics and gamification in ICT education (pp. 313–331). Academic Press.
Hassani, K., Nahvi, A., & Ahmadi, A. (2016). Design and implementation of an intelligent virtual environment for improving speaking and listening skills. Interactive Learning Environments, 24(1), 252–271. https://doi.org/10.1080/10494820.2013.846265
Hodges, H., Steffensen, S. V., & Martin, J. E. (2012). Caring, conversing, and realizing values: New directions in language studies. Language Sciences, 34(5), 499–506. https://doi.org/10.1016/j.langsci.2012.03.006
Huang, W., Hew, K. F., & Fryer, L. K. (2022). Chatbots for language learning—Are they really useful? A systematic review of chatbot-supported language learning. Journal of Computer Assisted Learning, 38(1), 237–257. https://doi.org/10.1111/jcal.12610
Jeon, J. (2022). Exploring AI chatbot affordances in the EFL classroom: Young learners’ experiences and perspectives. Computer Assisted Language Learning, Taylor & Francis Online. https://doi.org/10.1080/09588221.2021.2021241
Johnson, W. L. (2019). Data-driven development and evaluation of Enskill English. International Journal of Artificial Intelligence in Education, 29(3), 425–457. https://doi.org/10.1007/s40593-019-00182-2
Johnson, W. L., & Lester, J. C. (2016). Face-to-face interaction with pedagogical agents, twenty years later. International Journal of Artificial Intelligence in Education, 26(1), 25–36. https://doi.org/10.1007/
experience. CALICO Journal, 32(3), 554–568. Equinox Publishing Ltd. https://doi.org/10.1558/cj.v32i3.26620
Li, L. (2017). New Technologies and Language Learning. Palgrave.
Loewen, S., & Sato, M. (2018). Interaction and instructed second language acquisition. Language Teaching, 51(3), 285–329. https://doi.org/10.1017/S0261444818000125
Long, M. H. (1996). The role of the linguistic environment in second language acquisition. In W. C. Ritchie & T. K. Bhatia (Eds.), Handbook of second language acquisition (pp. 413–468). Academic Press.
Morton, H., & Jack, M. A. (2005). Scenario-based spoken interaction with virtual agents. Computer Assisted Language Learning, 18(3), 171–191. https://doi.org/10.1080/09588220500173344
Moussalli, S., & Cardoso, W. (2020). Intelligent personal assistants: Can they understand and be understood by accented L2 learners? Computer Assisted Language Learning, 33(8), 865–890. https://doi.org/10.1080/09588221.2019.1595664
Schroeder, N. L., Chiou, E. K., & Craig, S. D. (2021). Trust influences perceptions of virtual humans, but not necessarily learning. Computers & Education, 160, 1–11. https://doi.org/10.1016/j.compedu.2020.104039
Schroeder, N. L., & Craig, S. D. (2021). Learning with virtual humans: Introduction to the special issue. Journal of Research on Technology in Education, 53(1), 1–7. https://doi.org/10.1080/15391523.2020.1863114
Swain, M. (2000). The output hypothesis and beyond: Mediating acquisition through collaborative dialogue. In J. P. Lantolf (Ed.), Sociocultural theory and second language learning (pp. 97–114). Oxford University Press.
Swain, M. (2013). The inseparability of cognition and emotion in second language learning. Language Teaching, 46(2), 195–207. https://doi.org/10.1017/S0261444811000486
Swedish Research Council. (2017). Good research practice.
s40593-015-0065-9 https://www.vr.se/english/analysis-and-assignments
Kaplan-Rakowski, R., & Wojdynski, T. (2018). Students’ /we-analyse-and-evaluate/all-publications/publica
attitudes toward high-immersion virtual reality tions/2017-08-31-good-research-practice.html
assisted language learning. In P. Taalas, J. Jalkanen, Sydorenko, T., Daurio, P., & Thorne, S. L. (2018). Refining
L. Bradley, & S. Thouësny Eds. Future-proof CALL: pragmatically-appropriate oral communication via
Language learning as exploration and encounters – computer-simulated conversations. Computer
Short papers from EUROCALL 2018 (pp. 124–129). Assisted Language Learning, 31(1–2), 157–180. http://
Research-publishing.net. https://doi.org/10.14705/ doi.org/10.1080/09588221.2017.1394326
rpnet.2018.26.824 Thiriau, C. (2017). Global Teaching Speaking Survey: The
Kim, N. Y. (2016). Effects of voice chat on EFL learners’ results. Cambridge University Press. https://www.
speaking ability according to proficiency levels. cambridge.org/elt/blog/2017/11/06/teaching-speak
Multimedia-Assisted Language Learning, 19(4), 63– ing-survey-results/
88. https://doi.org/10.15702/mall.2016.19.4.63 Vygotsky, L. S. (1978). Mind in society: The development of
Kukulska-Hulme, A., & Lee, H. (2020). . In K.- higher psychological processes. Harvard University
M. Frederiksen, S. Larsen, L. Bradley, & S. Thouësny Press.
(Eds.), CALL for widening participation: Short papers Wang, N., & Johnson, W. L. (2008). The politeness effect in
from EUROCALL 2020 (pp. 172–176). Research an intelligent foreign language tutoring system. In B.
publishing.net. https://doi.org/10.14705/rpnet.2020. P, Woolf, E., Aïmeur, R., Nkambou, S., Lajoie (Eds.)
48.1184 Intelligent Tutoring Systems. ITS 2008. Lecture Notes
Lantolf, J., Thorne, S. L., & Poehner, M. E. (2014). in Computer Science (Vol. 5091, pp. 270–280). LNCS:
Sociocultural theory and second language develop­ Springer. https://doi.org/10.1007/978-3-540-69132-
ment. In B. Vanpatten & J. Williams(Eds.), Theories 7_31
in second language acquisition: An introduction (pp. Wang, Y. F., Petrina, S., & Feng, F. (2017). VILLAGE–Virtual
207–226). Taylor and Francis. https://doi.org/10. immersive language learning and gaming environ­
4324/9780203628942-16 ment: Immersion and presence. British Journal of
Levy, M. (2015). The role of qualitative approaches to Educational Technology, 48(2), 431–450. https://doi.
research in CALL contexts: Closing in on the learner’s org/10.1111/bjet.12388

© 2023 The Author(s). This open access article is distributed under a Creative Commons Attribution (CC-BY) 4.0 license.
You are free to:
Share — copy and redistribute the material in any medium or format.
Adapt — remix, transform, and build upon the material for any purpose, even commercially.
The licensor cannot revoke these freedoms as long as you follow the license terms.
Under the following terms:
Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made.
You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
No additional restrictions
You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.

Cogent Education (ISSN: 2331-186X) is published by Cogent OA, part of Taylor & Francis Group.