
Innovation in Language Learning and Teaching

ISSN: 1750-1229 (Print) 1750-1237 (Online) Journal homepage: http://www.tandfonline.com/loi/rill20

Exploring the integration of automated feedback among lower-proficiency EFL learners

Shu Huang & Willy A. Renandya

To cite this article: Shu Huang & Willy A. Renandya (2018): Exploring the integration of automated
feedback among lower-proficiency EFL learners, Innovation in Language Learning and Teaching,
DOI: 10.1080/17501229.2018.1471083

To link to this article: https://doi.org/10.1080/17501229.2018.1471083

Published online: 08 May 2018.


Exploring the integration of automated feedback among lower-proficiency EFL learners

Shu Huang (a) and Willy A. Renandya (b)

(a) Chengdu University of Information Technology, Chengdu, China; (b) National Institute of Education, Nanyang Technological University, Singapore, Singapore

ABSTRACT
This article reports on an exploratory study which investigated the impact of automated feedback in a writing class among Chinese EFL university learners at a lower language level. In particular, the study explored the students’ perceptions of Pigai, the largest and most popular locally designed automated writing evaluation (AWE) system in China, and the impact of integrating this AWE tool on the revision quality of student texts. Participants were 67 students enrolled in two classes of a College English course at a Chinese university. One of the two classes was randomly selected as the experimental group (N = 35) and the other as the control group (N = 32). Data for the study include student texts of one academic writing assignment (the pre-test), revised drafts of another writing assignment (the post-test), and responses to a questionnaire. Quantitative and qualitative analyses of the questionnaire show that the lower-proficiency student participants generally thought highly of the feedback given by Pigai. Statistical analyses of the student texts, however, reveal that the integration of automated feedback did not necessarily result in observable improvement in the students’ revised drafts. Implications for writing instruction and the use of AWE technology in EFL writing classes are also discussed.

ARTICLE HISTORY
Received 18 December 2017
Accepted 26 April 2018

KEYWORDS
Automated writing evaluation; English as a foreign language writing instruction; process-based pedagogy; lower-proficiency students; student perceptions; revision quality

Introduction
Automated feedback, also known as computer-generated feedback (Ware 2011), is an area that has
received growing attention in current SLA literature (e.g. Bai and Hu 2017; Cheung 2016; Hyland and
Hyland 2006; Lai 2010; Lawley 2016; Ware 2011; Xi 2010). While previous researchers have been more
concerned about the psychometric properties (reliability and validity) of AWE scoring and feedback
(e.g. Shermis and Burstein 2013), a small and yet growing number of studies have started to examine
the actual impact of specific AWE programs as instructional tools in language classrooms (Hegelhei-
mer, Dursun, and Li 2016; Stevenson 2016). This article extends these recent studies by exploring the
effect of AWE on a group of lower-proficiency EFL students.
The impact of AWE feedback is usually evaluated by observing its effects on the improvement of
students’ writing (e.g. Li, Link, and Hegelheimer 2015; Wang, Shang, and Briody 2013). Much less
attention has been given to how learners perceive what they receive from the computer. Given
that learner perceptions are ‘crucial determinants in their performance as writers’ (Zamel 1987,
p. 699) and that students might ignore the feedback when their expectations about the feedback
were not met (Swain 2006), an understanding of L2 learners’ perceptions towards automated feed-
back is crucial. In addition, as the extant studies were mainly conducted among native speakers or L2

CONTACT Shu Huang hshu56@163.com


© 2018 Informa UK Limited, trading as Taylor & Francis Group

learners of intermediate-above level (Chen and Cheng 2008; Dikli and Bleyle 2014; Fang 2010; Lai
2010), more research on lower-proficiency EFL learners becomes necessary. Furthermore, while it
is common for teachers to integrate AWE feedback in process-based writing instruction, insufficient
studies have been conducted to clarify to what extent this additional integration can benefit the lear-
ners. For instance, Huang and Zhang (2014) claimed that AWE feedback was responsible for 35% of
the revision, but the researchers did not sufficiently explain the role of AWE feedback due to the lack
of a control group. To bridge these important gaps, the present study took a mixed methods research
design to explore the impact of the integration of Pigai, the largest and most popular locally designed
AWE system in China, with a special focus on (1) how students perceive the usefulness of automated
feedback and (2) how the integration of Pigai affects revision quality of student texts.

Literature review
The literature has not been consistent with regard to students’ general perceptions of automated feedback.
Some studies have reported that students think highly of the computer-generated feedback (Fang
2010; Li, Link, and Hegelheimer 2015; Ma 2013; Tsuda 2014). Others, however, have found the oppo-
site (Chen and Cheng 2008; Grimes and Warschauer 2010; Lai 2010). Chen and Cheng (2008), for
instance, found that students generally showed an unfavorable attitude towards the automated
evaluation system used in the writing class. While it was suggested that AWE feedback might be
better perceived by learners with a lower language level (Chen and Cheng 2008), little classroom-
based research has been conducted among learners at such a proficiency level.
A close review of the previous studies indicates that positive comments from the learners were
associated mostly with the effect of AWE on language-related issues of the writing, such as
grammar, word usage, and mechanics (Dikli and Bleyle 2014; Fang 2010). Another noted advantage
of AWE programs lies in their ability to motivate learners in writing practice (Li, Link, and Hegelheimer
2015). On a negative note, the most oft-heard complaint about AWE programs was the lack of
feedback at the level of discourse (e.g. Lai 2010; Li, Link, and Hegelheimer 2015). Another complaint
was that the machine could not detect mistakes in sentences like ‘He hardly worked’ when the student
should have written ‘He worked very hard’ (Tsuda 2014). This was why the two participants in Li, Link,
and Hegelheimer (2015) commented in the interview that the AWE system was ‘just a machine’ (p. 10).
Other complaints included the vagueness of the feedback (e.g. Lai 2010; Li, Link, and Hegelheimer 2015),
the incomprehensibility of technical terms (Lai 2010), and the unfriendly interface of the AWE program
(e.g. Dikli and Bleyle 2014; Lai 2010; Tsuda 2014). While both positive and negative comments have been
recorded in the previous literature, the extant studies were mainly conducted among native speakers or
L2 learners of intermediate level and above (Dikli and Bleyle 2014; Fang 2010; Lai 2010), and a few were
unspecific about the language level of their participants (Li, Link, and Hegelheimer 2015), giving little
attention to lower-proficiency EFL learners.
The AWE system applied in the present study is Pigai, the largest locally designed web-based AWE
service system in China. As one can see in Figure 1, the system gives an overall score and general
feedback on student writing. In addition, the system gives written feedback in response to each of
the sentences in the composition, providing diagnostic suggestions in regard to a variety of
formal aspects of writing.
This program differs from other AWE systems (e.g. Criterion) mostly in two ways: first, it provides
feedback on the appropriateness of collocations present in the student writing. It searches for the
collocations in its corpus and, if it fails to locate them, offers warnings. For instance, in
response to the sentence ‘They use at least one hour to learn English knowledge a day’, the
system warns that the collocation of ‘learn … knowledge’ is seldom used by native speakers of
English (Figure 2). Second, it provides multiple synonyms for some words present in the writing,
which facilitates not only student revision but also the students’ learning of vocabulary.
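
Pigai’s internal implementation is not publicly documented; purely as an illustration of the corpus-lookup behaviour described above, the following minimal Python sketch warns when a verb–noun pair is missing or rare in a reference corpus. The counts, threshold, and function name are hypothetical.

```python
# Minimal illustrative sketch of the corpus-lookup idea described above.
# The corpus counts, threshold, and function name are hypothetical;
# Pigai's actual implementation is not publicly documented.
collocation_counts = {
    ("acquire", "knowledge"): 1530,
    ("learn", "knowledge"): 12,    # rare in native-speaker corpora
    ("work", "hard"): 2048,
}

def check_collocation(verb: str, noun: str, threshold: int = 50) -> str:
    """Warn when a verb-noun pair is missing or rare in the reference corpus."""
    count = collocation_counts.get((verb, noun), 0)
    if count < threshold:
        return (f"Warning: '{verb} ... {noun}' is seldom used by native speakers "
                f"(corpus count: {count}).")
    return f"'{verb} ... {noun}' looks acceptable (corpus count: {count})."

print(check_collocation("learn", "knowledge"))
```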

Figure 1. An example of the feedback by Pigai.

In terms of precision rate, the performance of Pigai seems to fall within an acceptable range. As
can be seen in Bai and Hu (2017), one of the first attempts to examine this issue, Pigai might not
compare very unfavorably with Criterion: the overall precision rate was 45.77%, about 10% lower
than that of Criterion; its precision for grammar errors was 58.71%, only 5% lower; and its precision
for mechanics errors reached 98.07%, almost twice as high as the precision rate of Criterion (50%).
The authors reported that Pigai was reliable in detecting certain error types: (1) conjunction errors,
such as ‘although … but’ and ‘because … so’; (2) misuses of a/an; (3) disagreement in number such
as ‘men is’ and ‘a big trees’; (4) misuses of verb forms as in ‘have grew’. However, misidentifications
might occur when the system examined complex sentences and certain structures. One given
example was that the structure of ‘see sb. do sth’ might be flagged as ‘improper use of a “verb +
clause” construction’. While an increasing number of studies have been undertaken to examine
the effect of Pigai on Chinese students’ writing performance as well as the students’ uptake of
Pigai (Huang 2015; Huang and Zhang 2014; Ma 2013), none of them were concerned specifically
with lower-proficiency EFL learners.
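
For readers unfamiliar with the metric, the precision figures reported above (Bai and Hu 2017) are simply the proportion of machine-flagged errors that human raters confirm as genuine. A minimal sketch of the calculation, using invented counts rather than the actual figures from that study:

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Proportion of machine-flagged errors that raters confirm as genuine."""
    return true_positives / (true_positives + false_positives)

# Hypothetical counts for illustration only (not the figures from Bai and Hu 2017).
confirmed_flags = 120   # flags judged correct by human raters
rejected_flags = 85     # flags judged incorrect (misidentifications)
print(f"Precision: {precision(confirmed_flags, rejected_flags):.2%}")  # -> 58.54%
```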
On the whole, one can conclude that classroom-based research on the application of AWE programs
is not yet mature. First, given the growing popularity of AWE tools in L2 writing classrooms, the
total number of studies that have been carried out to date remains relatively small. Second, not much
effort has been made to investigate how students of lower-proficiency levels perceive Pigai feedback,
and how these students may benefit from the use of automated feedback. Third, little is known as to
whether and to what extent the traditional process-based writing project might benefit from the inte-
gration of AWE technology. To fill these important gaps, the present study was conducted. In

Figure 2. Feedback on collocation by Pigai.



particular, it examined the impact of the use of an AWE tool in process-based writing classes among a
group of low-proficiency Chinese EFL college students. Two research questions were formulated:

(1) What are the students’ perceptions of the feedback by Pigai?
(2) Does the integration of automated feedback affect the revision quality of student texts?

Methods
Participants and context
The present study took place in the context of a 16-week College English class in a national university
in China. College English is a mandatory course required for all non-English-major students in their
first and second years of college in the country, and is often examination-oriented due to a variety
of reasons such as the students’ test-driven learning style (You 2004) and the pressure teachers
are under about the passing rate of CET-4, i.e. the prevailing standard English test administered by
the National Ministry of Education of the country. In this context, the quality of students’ writing is
often evaluated based on the CET-4 writing rubric that uses impressionistic or holistic grading that
emphasizes ‘correct form rather than well-developed thought’ (You 2004, p. 104). In view of the popu-
larity of the process-based pedagogy in the field (Lee 2011), many College English teachers have
started to introduce brainstorming and peer review into the instruction (You 2004). However, the
writing cycle in these contexts is not completely the same as the ones reported in the literature
(e.g. Min 2005). Because of the large class size and heavy workload of the teachers, teacher feedback
usually comes as a summative score and is often delayed until several weeks after students’ sub-
mission of the compositions.
Participants in the present study were 67 students enrolled in two classes of a College English
course with one instructor teaching the two classes. The student participants were aged about 20
and were all sophomores. They were placed into two lower-proficiency classes by the school. Their
most recent CET-4 scores were below 425 (roughly 50 in TOEFL iBT). One of the two classes was ran-
domly selected as the experimental group (N = 35) and the other as the control group (N = 32).

Research instruments
To answer the research questions in this study, two types of instruments were developed: (1) writing
task prompt; and (2) questionnaire survey, i.e. a questionnaire on learner perceptions of automated
feedback. Two writing task prompts were developed: one for the pre-test and another for the peer
review activity. Both of them were modeled after the CET-4 writing task, and both tasks were on
topics familiar to the students.
The questionnaire (Appendix 1) was developed to probe into students’ perceptions of automated
feedback by Pigai. It includes two sections. The first section consists of 13 closed-ended items on a six-
point Likert scale, with 1 indicating strong disagreement and 6 indicating strong agreement. The 13
items were inspired by previous research on automated feedback (e.g. Hyland and Hyland 2006; Ware
2011; Xi 2010), and were written to probe into students’ (a) perceived comprehensibility of the feed-
back; (b) perceived value of the feedback for revision; (c) perceived value of the feedback for English
writing performance; and (d) perceived value for subsequent peer review activity. The internal con-
sistency reliability of the questionnaire items was above the standard reliability threshold: the Cron-
bach Alpha coefficient was .78 for the perceived comprehensibility of the feedback, .80 for perceived
usefulness of the feedback for composition revision, .74 for perceived usefulness of the feedback for
the learning of writing, and .78 for perceived value of automated feedback for the subsequent peer
review activity. The second section of the questionnaire contains 4 open-ended questions that
require students to comment on the four content areas above and provide examples wherever
possible.
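
As an illustration of how the reliability coefficients above can be computed, the following sketch implements Cronbach’s alpha over the item-level Likert responses. The file and column names are hypothetical; the item grouping (items 2, 4, and 6 for perceived comprehensibility) follows the questionnaire in Appendix 1.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a set of Likert items (one column per item)."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical file and column names; responses are the 1-6 Likert codes,
# one row per student. Items 2, 4 and 6 form the 'perceived comprehensibility'
# subscale in the questionnaire (Appendix 1).
responses = pd.read_csv("questionnaire_responses.csv")
print(cronbach_alpha(responses[["item2", "item4", "item6"]]))
```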

Procedures
The main study was preceded by a piloting phase whereby all the major instruments were tried out
with a sample of six students in another lower-proficiency class of the university taught by another
teacher. The students represented a population similar to that of the main study but were not
included in the main study. Four students filled in the first version of the questionnaire, and based
on their suggestions, some questionnaire items were modified and the Chinese version of the ques-
tionnaire was constructed. Then, two other students filled in the new questionnaire and the wording
of some items was further modified.
The main study lasted for four sessions over two weeks. An outline of the experimental procedures
for both groups is presented in Table 1.
In the first session, students in both groups were asked to write a composition in response to the
prompt for the pre-test. After that, a training program following Min (2005) was conducted for all the
students on peer review. This is considered necessary because the quality of feedback has been
found to be an important factor that influences the progression of the activity (Wang 2014).
The focus of the second session was still on student training. Thirty minutes before the second
session ended, students in the experimental group were introduced to Pigai. The teacher modeled
how to interact with the system step by step. The teacher was asked not to add any of her own com-
ments on the system. As homework, students in the experimental group were asked to explore the
website after class. On the other hand, students in the control group were merely told about some
strategies related to the CET-4 test during this 30-minute period.
In the third session, the class for the experimental group was held in a lab with wired computers.
Students were first given 30 min to write a composition on a piece of paper. Then, they were asked to
link to Pigai, log in to their own accounts, type the composition into the required page, and seek
feedback from the machine. After handing in the first draft, students were given 15 min to revise the
composition based on the computer-generated feedback. After the break, each student was asked
to conduct an anonymous peer review. Similarly, each student in the control group was instructed
to write a composition in 30 min and to participate in the anonymous peer review activity, but
they did not receive automated feedback as the experimental group did. Considering the student
profile in terms of English proficiency level and the finding in literature that ‘L2 students may be
… handicapped when they are required to give oral or written feedback in the L2, the language
they are learning’ (Hu and Lam 2010, p. 374), the students’ L1 (i.e. Chinese) rather than English
was adopted as the main language for communication in the peer review.

Table 1. A summary of the experimental procedures.

Session 1
  Experimental group: (a) pre-test writing performance; (b) student training of peer review (Part 1).
  Control group: (a) pre-test writing performance; (b) student training of peer review (Part 1).

Session 2
  Experimental group: (a) student training of peer review (Part 2); (b) students were introduced to Pigai.
  Control group: (a) student training of peer review (Part 2).

Session 3
  Experimental group: (a) students write the first draft in response to the writing prompt; (b) students revise the draft based on feedback by Pigai; (c) students conduct anonymous peer review.
  Control group: (a) students write the first draft in response to the writing prompt; (b) students conduct anonymous peer review.

Session 4
  Experimental group: (a) students revise their compositions based on peer feedback; (b) students hand in the final draft; (c) students complete the questionnaire on learner perceptions of the automated feedback.
  Control group: (a) students revise their compositions based on peer feedback; (b) students hand in the final draft.

In the last session, students in both groups were given back their own compositions and the anon-
ymous peer feedback. Then, they were asked to revise the compositions based on peer feedback and
to hand in the revised version to the teacher for assessment. After a 10-minute break, the question-
naire on learner perceptions of automated feedback was distributed to the students in the exper-
imental group.

Data analysis
To address the first research question, data from the questionnaire (see Appendix 1) were analysed.
The closed-ended items were coded using the exact Likert-scale points in the questionnaire (1 =
Strongly disagree, 2 = Disagree, 3 = Somewhat disagree, 4 = Somewhat agree, 5 = Agree, and 6 =
Strongly agree), and presented after some basic descriptive statistics. The open-ended questions
in the questionnaire were analysed using the method of ‘grounded theory’. The data were first
read through several times for a general impression. Then, ‘open coding’ was conducted, whereby
the responses were broken down into sentence-level chunks and coded as ‘positive’ or ‘negative’.
After that, data in each subcategory were labeled using notions employed by Jacobs et al. (1981) such
as vocabulary, collocation, and spelling. In order to establish coding reliability, another coder was
invited into the study. We independently coded all the comments by five randomly selected participants,
and the inter-rater agreement was good: 98% for classifying comments by attitude, and 93% for
classifying comments by the rubric-related themes. Given the good coding reliability, the remaining
data were then independently coded.
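
A minimal sketch of the two analysis steps described above, i.e. descriptive statistics for the closed-ended Likert items and a simple percent-agreement check between the two coders. The file name and column labels are hypothetical.

```python
import pandas as pd

# Hypothetical file and column names: item1..item13 hold the 1-6 Likert codes.
likert = pd.read_csv("questionnaire_closed_items.csv")

# Descriptive statistics per item: counts at each scale point and the mean response.
for item in likert.columns:
    counts = likert[item].value_counts().reindex(range(1, 7), fill_value=0)
    print(item, counts.to_dict(), round(likert[item].mean(), 1))

# Simple percent agreement between the two coders on the open-ended comments.
def percent_agreement(coder_a: list, coder_b: list) -> float:
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)

print(percent_agreement(["positive", "negative", "positive"],
                        ["positive", "negative", "negative"]))  # -> 0.67
```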
In response to the second research question, student compositions were collected both at the first
and the last session of the study. Students’ compositions collected at the first session (as the pre-test)
were scored using the CET-4 marking scheme because the purpose of the pre-test was to check
whether the two groups were comparable in terms of writing proficiency and an impressionistic
grading scheme was deemed sufficient for this purpose. The revised drafts (collected in the last
session) were scored using the 100-point marking scheme by Jacobs et al. (1981), which contains
detailed descriptors for each of five categories, and each composition was rated on all five aspects:
content (maximum = 30), organization (maximum = 20), vocabulary (maximum = 20), language use
(maximum = 25), and mechanics (maximum = 5). The student compositions were randomized and evaluated blindly (i.e.
without identifying information about which group the student writer belonged to). Before the
first researcher marked all the compositions, efforts were made to ensure marking reliability. The
first researcher and an experienced English teacher independently marked 5 compositions randomly
selected from the pool (2 compositions from the pre-test and 3 from the revised texts). The inter-rater
agreement was good: the ranking orders for the five compositions in all categories were almost the
same. Both descriptive and inferential statistics were then run to analyse scores generated so far.
Independent-sample t-tests were run to determine (1) whether the two groups were comparable
in the pre-test; and (2) whether the experimental group and the control group were different in
post-test scores for content, organization, vocabulary, language use, mechanics, and the overall
scores.
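
The between-group comparisons can be reproduced with standard independent-samples t-tests; a minimal sketch, assuming the post-test subscores are stored with one row per student and a group label (file and column names are hypothetical).

```python
import pandas as pd
from scipy import stats

# Hypothetical file layout: one row per student, a 'group' label, and the
# post-test subscores plus the total.
scores = pd.read_csv("posttest_scores.csv")
experimental = scores[scores["group"] == "experimental"]
control = scores[scores["group"] == "control"]

for measure in ["content", "organization", "vocabulary",
                "language_use", "mechanics", "total"]:
    t, p = stats.ttest_ind(experimental[measure], control[measure])
    print(f"{measure}: t({len(scores) - 2}) = {t:.2f}, p = {p:.3f}")
```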

Results and discussions


Learner perceptions of AWE feedback
Perceived comprehensibility of feedback by Pigai
As can be seen from Table 2, only one student disagreed with the item ‘I can understand the feedback
by Pigai’. The student quoted a line from the AWE feedback he received, ‘this collocation has
appeared in corpus for 45 times’ and commented, ‘I don’t know what it means. Is the frequency
the higher the better? This should have been mentioned somewhere’. While the perceived

Table 2. Perceived comprehensibility of automated feedback.

Item | StD | D | SoD | SoA | A | StA | Average response
I can understand feedback by Pigai. | 0 | 1 (3%) | 0 | 11 (31%) | 22 (63%) | 1 (3%) | 4.6
I know how to revise the composition based on feedback I receive from Pigai. | 0 | 1 (3%) | 11 (31%) | 15 (43%) | 8 (23%) | 0 | 3.9
I think the feedback by Pigai is clear. | 0 | 2 (6%) | 6 (17%) | 19 (54%) | 7 (20%) | 1 (3%) | 4.0
Average | | | | | | | 4.2
Note: StD = Strongly Disagree; D = Disagree; SoD = Somewhat Disagree; SoA = Somewhat Agree; A = Agree; StA = Strongly Agree.

comprehensibility appeared high, with the average response scoring 4.6, about one third of the stu-
dents reported disagreement regarding ‘I know how to revise the composition based on the feed-
back I receive from Pigai’, indicating that the feedback was not informative enough to many
students. A student said,
It commented beside a sentence of mine, saying that the sentence was not correct in terms of grammar. Since it
did not state clearly which part of the sentence was incorrect, I found it difficult to incorporate the suggestion into
my revision.

Likewise, another student complained, ‘It flagged a mistake in collocation, but did not advise which
word to use’. Such responses confirmed the observation in literature (Cheung 2016; Ware 2011) that
students did not necessarily have the knowledge to interpret the AWE feedback, and revealed that
indirect feedback might cause difficulties for lower-proficiency learners.

Perceived usefulness of the automated feedback for composition revision


As can be seen from Table 3, students indicated quite favorable attitudes towards the helpfulness of
the feedback for composition revision. All students agreed with the item ‘I think it can help me
improve the quality of this composition’, among whom 20% had expressed strong agreement.
94% of the students believed ‘It can help me get higher score for this composition’. Examples of
the typical comments are ‘It removed the basic grammar errors, and introduced more “advanced”
words into my composition. I feel sure that it can get me a higher score’, ‘The feedback is very
useful for my revision. It’s pointed out a lot of problems in grammar, capitalization, and word spelling’,
and ‘It helps me polish the text by indicating the misuse of inappropriate collocations in my writing’.
A small number of students (N = 3), however, commented that Pigai could not ‘really enhance com-
position quality’ because ‘it could not provide useful suggestions on content and organization’.

Perceived usefulness of the automated feedback for enhancing writing performance


Table 4 displays students’ belief in the long-term use of automated feedback. As indicated by an
average score of 4.5, the majority of students were optimistic about the value of using Pigai for
their writing performance. Many students thought highly of the synonym-related feedback of
Pigai, commenting that ‘It’s good that they can list out the “advanced” words that I may utilize in
my text’, and that ‘such feedback help me recall more words I learned before’, which explained

Table 3. Perceived usefulness of the automated feedback for composition revision.

Item | StD | D | SoD | SoA | A | StA | Average response
The feedback can help me correct grammar mistakes in this composition. | 0 | 0 | 3 (9%) | 13 (37%) | 14 (40%) | 5 (14%) | 4.6
It can help me get higher score for this composition. | 0 | 0 | 2 (6%) | 7 (20%) | 22 (63%) | 4 (11%) | 4.8
I think it can help me improve the quality of this composition. | 0 | 0 | 0 | 7 (20%) | 21 (60%) | 7 (20%) | 5.0
Average | | | | | | | 4.8
Note: StD = Strongly Disagree; D = Disagree; SoD = Somewhat Disagree; SoA = Somewhat Agree; A = Agree; StA = Strongly Agree.

Table 4. Perceived usefulness of the automated feedback for enhancing writing performance.

Item | StD | D | SoD | SoA | A | StA | Average response
The feedback can help me realize my writing problems. | 0 | 0 | 4 (11%) | 15 (43%) | 13 (37%) | 3 (9%) | 4.4
It can help me improve my grammar. | 0 | 0 | 7 (20%) | 11 (31%) | 14 (40%) | 3 (9%) | 4.4
It can help me enlarge my vocabulary. | 0 | 2 (6%) | 4 (11%) | 10 (29%) | 15 (43%) | 4 (11%) | 4.4
I think it can help me enhance my writing performance. | 0 | 0 | 3 (9%) | 7 (20%) | 22 (63%) | 3 (9%) | 4.7
Average | | | | | | | 4.5
Note: StD = Strongly Disagree; D = Disagree; SoD = Somewhat Disagree; SoA = Somewhat Agree; A = Agree; StA = Strongly Agree.

why there was a high percentage of agreement (83%) regarding the item ‘It helps me enlarge my
vocabulary’. Quite a few also commented on the use of automated feedback for the learning of
English grammar, as in ‘Pigai can identify our grammar errors. It reminds me of the grammatical
rules I learned in high school’, and ‘It can point out the basic grammar errors and I can try to
avoid making these mistakes in my next composition’. In line with the high percentage of agreement
(89%) regarding the item ‘The feedback helps me realize my writing problems’, many students pro-
duced comments like ‘I’ve never noticed so many of my writing problems in mechanics until I read
the feedback’, and ‘it guides me to discover the previously unattended language use problems in my
English writing, such as spelling, capitalization, and collocation’. Still, two students expressed doubts
about the usefulness of the AWE system, commenting that ‘What matters in writing is content, and
AWE systems cannot help us on that’.

Perceived usefulness of the feedback for peer review


As can be seen from Table 5, most students (91%) considered the integration of automated feedback
both helpful and necessary for the subsequent peer review activity. A student wrote, ‘our English is
not good. My classmates may fail to identify problems in the writing (without this step)’. Another
student explained, ‘Pigai can point out basic errors and make the composition easier to understand’.
One student said frankly, ‘it is absolutely necessary because we are not capable enough to identify
language errors in the writing, and we might not understand the composition to be reviewed if
we did peer review without Pigai’.
Very few negative comments were found in this regard. The only instance of disagreement col-
lected in the present study was ‘Pigai can to some extent identify our writing problems (but) … I
think it would be better if we can do peer review first and then submit the text to Pigai’. It seems
that the respondent did not question the value of Pigai, but merely suggested a change of sequence
in the pedagogical activity.
Overall, the results above reveal that the student participants in the present study held highly positive
perceptions of Pigai: over 90% of the participants were convinced of the value of Pigai for their
writing performance. Qualitative analysis shows that the identified strengths of the AWE use were
largely consistent with those in literature (e.g. Chen and Cheng 2008; Dikli and Bleyle 2014; Ma

Table 5. Perceived usefulness of the feedback for peer review.

Item | StD | D | SoD | SoA | A | StA | Average response
It helps me gain more confidence when I show my classmates the composition during the peer review. | 1 (3%) | 2 (6%) | 7 (20%) | 10 (29%) | 14 (40%) | 1 (3%) | 4.1
I think it is necessary to receive the feedback before the peer review. | 1 (3%) | 0 | 2 (6%) | 9 (26%) | 19 (54%) | 4 (11%) | 4.6
I think it is helpful to receive the feedback before the peer review. | 1 (3%) | 0 | 2 (6%) | 6 (17%) | 20 (57%) | 6 (17%) | 4.8
Average | | | | | | | 4.5
Note: StD = Strongly Disagree; D = Disagree; SoD = Somewhat Disagree; SoA = Somewhat Agree; A = Agree; StA = Strongly Agree.

2013; Xi 2010). Many students reported that Pigai could help them identify and correct grammatical,
lexical and mechanical errors, could benefit their vocabulary learning, and could alert them to
previously unattended language problems in their writing. Weaknesses of the machine, however,
were mentioned less often. While students did complain about the use of indirect feedback
(echoing Lai (2010) that AWE tools could be easier to incorporate if they provided direct
rather than indirect feedback) and about the confusing technical terms (see Li, Link, and Hegelheimer
2015), only a small number (N = 3) of the participants commented on the lack of feedback on
content and organization, which was often a major concern in previous studies (e.g. Chen and
Cheng 2008; Lai 2010; Li, Link, and Hegelheimer 2015; Yang 2004). Moreover, none of the participants
doubted the accuracy of the AWE feedback (see Tsuda 2014).
One explanation for such discrepancies could lie in the learners’ language proficiency level. As
suggested by Chen and Cheng (2008), learners ‘in the early stage learning of L2 writing’ (p. 107)
might put more emphasis on language form rather than meaning, and hence were less concerned
with the fact that they did not receive from the machine sufficient feedback at the level of discourse.
At the same time, as these students might not be able to detect inaccuracies in the AWE feedback
due to their own inadequate language ability, they were more likely to trust the AWE feedback,
which also contributed to their positive attitudes towards the machine. A second explanation
might lie in the different learning contexts. In the present study, students’ learning of writing was
mostly driven by the accuracy-oriented CET-4 and because of that, it may not be a surprise why
the form-related feedback along with the vocabulary-oriented learning features were so highly
appreciated. The difference might also be explained by the use of different AWE tools across studies. It
is possible that Chinese students feel more comfortable with Pigai, the locally developed AWE
system that provides feedback in their mother tongue.

The impact of the integration of automated feedback on quality of student revisions


To answer this question, the present study collected 67 essays (35 experimental and 32 control) from the
pre-test, and 67 (35 experimental and 32 control) after the intervention. An independent-samples t-test
did not identify any significant between-group difference on the pre-test scores of the students (Table 6).
To determine whether revision quality of student texts in the experimental group was better than the
control group in the five subcategories of composition (i.e. content, organization, vocabulary, language
use, and mechanics) and the overall score, six independent-samples t-tests were run. No significant differ-
ence, however, was found in any of the tests (see Table 7), suggesting that the quality of student revisions
may not benefit much from the additional integration of automated feedback.
While the results seemed to indicate that the integration of Pigai was not an effective pedagogical
activity with lower-proficiency Chinese students, it was worth noting that the results were likely to be
influenced by other factors. One possible reason was that new grammatical mistakes could be pro-
duced when students attempted to incorporate content-related feedback from their peers in the final
drafts, which then lowered their scores in language use. In other words, the results might differ if
these same students had been encouraged to make use of Pigai throughout the whole writing process.
Second, due to time constraints, the study lasted for only two weeks. A more sustained use of the
system might have yielded a different outcome. Third, while the AWE tool utilized in the present
study is generally considered user-friendly, it was still new to the students, and such unfamiliarity
could be one reason behind the negative results. Still, the finding above provides a rebuttal to the

Table 6. Between-group comparison on students’ writing scores before the intervention.

Measure | Experimental (N = 35) M (SD) | Control (N = 32) M (SD) | df | t | p (2-tailed)
Writing scores | 6.29 (1.64) | 6.5 (1.76) | 65 | −0.52 | 0.61

Table 7. Between-group comparison on students’ writing scores after the intervention.

Measure | Experimental (N = 35) M (SD) | Control (N = 32) M (SD) | df | t | p (2-tailed)
Content | 20.11 (1.79) | 19.34 (2.48) | 65 | .33 | .148
Organization | 16.17 (1.58) | 16.03 (1.89) | 65 | .33 | .74
Vocabulary | 15.86 (.94) | 15.63 (1.24) | 65 | .87 | .39
Language Use | 17.46 (1.58) | 18.03 (1.77) | 65 | −1.40 | .17
Mechanics | 4.00 (0.64) | 3.72 (0.73) | 65 | .33 | .10
Total | 73.6 (4.55) | 72.75 (6.06) | 65 | .65 | .52

prevalent belief that providing more sources of feedback could result naturally in higher revision
quality. By revealing that students may not benefit much from the integration of AWE technology
if such integration is not implemented within a well thought-out pedagogical design, the study
echoed Hegelheimer, Dursun, and Li (2016) that ‘researchers and practitioners regarding the use
of AWE tools as part of classroom instruction (should) putting the learners at the center’ (p. iv).

Conclusion and implications


The present study was conducted to examine the impact of the integration of automated feedback in
process-based writing classes among a group of Chinese EFL university learners at the lower-inter-
mediate language level. In particular, the study explored the students’ perceptions of an AWE
system called Pigai, and the impact of the integration of this AWE tool on students’ revision
quality. The findings show that the lower-proficiency L2 students, i.e. those who scored below 425
in CET-4 (roughly below 50 in TOEFL iBT), were positive about the feedback by Pigai in spite of the
observed technological limitations, and that the integration of AWE feedback did not necessarily
result in observable improvement in the students’ revised drafts as was sometimes assumed.
Several practical implications can be drawn from the present study for the use of AWE technology
and peer review in L2 process-based writing instruction. First, the study lends support to the use of AWE
tools among EFL learners who are still on the way towards acquiring the language. However, to make
the best use of them, teachers may have to scaffold students’ interpretations of the machine-generated
feedback. Second, the study reminds us that the integration of technology into language classrooms
does not necessarily make the job of teaching writing easier. Instead of taking it for granted that the
additional feedback source would benefit learners to a greater extent, teachers may want to give a
fuller consideration of the design of the pedagogical activity, so as to augment the benefits and to mini-
mize the problems brought about by the AWE technology. Finally, the present study indicates that the
language-related issues in writing might be overemphasized in examination-driven, accuracy-oriented
EFL contexts, which suggests that writing instructors in these contexts may need to consider orienting
students towards a more comprehensive view of what makes good writing.
The present study is a small-scale one exploring the integration of automated feedback in the L2
writing classrooms, and has only investigated the application of one AWE tool in one particular
context. Therefore, the generalizability of the findings would depend to a large extent on the simi-
larity of other contexts to the present one. In addition, due to time constraints, the study lasted
for only two weeks. It is possible that results of the present study might differ with a longer exper-
imental period. Future research might explore other ways of integrating the automated feedback
(e.g. asking students to conduct peer review with the help of AWE tools). Future research might also
include textual analyses of the feedback by Pigai. Also needed is research that examines teacher per-
ceptions of automated feedback and how such perceptions might influence its application in the
writing classrooms. Finally, there is a clear need for longitudinal research that can provide deeper
insights as to whether students’ perceptions of AWE feedback change over time. Research of

these types should be able to provide valuable insights into the questions of how to make the best
use of technology to enhance classroom teaching, and how the AWE programs might be developed
to facilitate writing instruction in different contexts.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This work was supported by Education Department of Sichuan Province: [Grant Number 18SB0126].

Notes on contributors
Shu Huang is an English teacher at Chengdu University of Information Technology, China. Her research interests include
technology in language teaching, classroom-based research methodology, second language writing, and second
language listening. She has published a few articles in the area of applied linguistics and TESOL in Chinese journals.
Email: hshu56@163.com.
Dr Willy A. Renandya is a language teacher educator, currently teaching applied linguistics courses at the National Insti-
tute of Education, Singapore. He has taught in many parts of Asia, including Singapore, Malaysia, Indonesia, the Philip-
pines, and Vietnam. His most recent books are Motivation in the language classroom (2014, TESOL International), Simple,
Powerful Strategies for Student Centred Learning with George Jacobs and Michael Power (2016, Springer), and English
Language Teaching Today: Linking Theory and Practice (co-edited with Handoyo Widodo, Springer, 2016). He maintains
an active language teacher professional development forum called Teacher Voices: https://www.facebook.com/groups/
teachervoices/. Email: willy.renandya@nie.edu.sg.

ORCID
Shu Huang http://orcid.org/0000-0002-3818-7881

References
Bai, L., and G. Hu. 2017. “In the Face of Fallible AWE Feedback: How Do Students Respond?” Educational Psychology 37 (1):
67–81.
Chen, C. F. E., and W. Y. E. Cheng. 2008. “Beyond the Design of Automated Writing Evaluation: Pedagogical Practices and
Perceived Learning Effectiveness in EFL Writing Classes.” Language Learning & Technology 12 (2): 94–112.
Cheung, Y. L. 2016. “Feedback from Automated Essay Evaluation Systems: A Review of Selected Research.” TESL Reporter
48 (2): 1–15.
Dikli, S., and S. Bleyle. 2014. “Automated Essay Scoring Feedback for Second Language Writers: How Does It Compare to
Instructor Feedback?” Assessing Writing 22: 1–17.
Fang, Y. 2010. “Perceptions of the Computer-Assisted Writing Program among EFL College Learners.” Educational
Technology & Society 13 (3): 246–256.
Grimes, D., and M. Warschauer. 2010. “Utility in a Fallible Tool: A Multi-Site Case Study of Automated Writing Evaluation.”
Journal of Technology, Learning, and Assessment 8 (6): 1–43.
Hegelheimer, V., A. Dursun, and Z. Li. 2016. “Automated Writing Evaluation in Language Teaching: Theory, Development,
and Application.” CALICO Journal 33 (1): i–v.
Hu, G., and S. T. E. Lam. 2010. “Issues of Cultural Appropriateness and Pedagogical Efficacy: Exploring Peer Review in a
Second Language Writing Class.” Instructional Science 38 (4): 371–394.
Huang, H. 2015. “Zai xian da xue ying yu xie zuo xing cheng xing ping jia mo xing gou jian yan jiu. Research on Model
Construction of Online College English Writing Formative Assessment.” Modern Educational Technology 25 (1): 79–86.
Huang, J., and W. Zhang. 2014. “Duo yuan fan kui dui da xue sheng ying yu zuo wen xiu gai de ying xiang yan jiu. The
Impact of the Integrated Feedback on Students’ Writing Revision.” Foreign Languages in China 11 (1): 51–56.
Hyland, K., and F. Hyland. 2006. “Feedback on Second Language Students’ Writing.” Language Teaching 39 (2): 83–101.
Jacobs, H., S. Zinkgraf, D. Wormuth, V. Hartfiel, and J. Hughey. 1981. Testing ESL Composition: A Practical Approach. Rowley,
MA: Newbury House.
Lai, Y. 2010. “Which Do Students Prefer to Evaluate their Essays: Peers or Computer Program.” British Journal of
Educational Technology 41 (3): 432–454.

Lawley, J. 2016. “Spelling: Computerised Feedback for Self-Correction.” Computer Assisted Language Learning 29 (5): 868–
880.
Lee, I. 2011. “Bringing Innovation to EFL Writing through a Focus on Assessment for Learning.” Innovation in Language
Learning and Teaching 5 (1): 19–33.
Li, J., S. Link, and V. Hegelheimer. 2015. “Rethinking the Role of Automated Writing Evaluation (AWE) Feedback in ESL
Writing Instruction.” Journal of Second Language Writing 27: 1–18.
Ma, K. 2013. “Improving EFL Graduate Students’ Proficiency in Writing through an Online Automated Essay Assessing
System.” English Language Teaching 6 (7): 158–167.
Min, H.-T. 2005. “Training Students to Become Successful Peer Reviewers.” System 33 (2): 293–308.
Shermis, M. D., and J. Burstein, eds. 2013. Handbook of Automated Essay Evaluation: Current Applications and New Directions. New York: Routledge.
Stevenson, M. 2016. “A Critical Interpretative Synthesis: The Integration of Automated Writing Evaluation into Classroom
Writing Instruction.” Computers and Composition 42: 1–16.
Swain, M. 2006. “Languaging, Agency and Collaboration in Advanced Language Proficiency.” In Advanced Language
Learning: The Contribution of Halliday and Vygotsky, edited by H. Byrnes, 95–108. New York: Continuum.
Tsuda, N. 2014. “Implementing Criterion (Automated Writing Evaluation) in Japanese College EFL Classes.” 言語と文化 [Language and Culture] 18: 25–45.
Wang, W. 2014. “Students’ Perceptions of Rubric-Referenced Peer Feedback on EFL Writing: A Longitudinal Inquiry.”
Assessing Writing 19: 80–96.
Wang, Y. J., H. F. Shang, and P. Briody. 2013. “Exploring the Impact of Using Automated Writing Evaluation in English as a
Foreign Language University Students’ Writing.” Computer Assisted Language Learning 26 (3): 234–257.
Ware, P. 2011. “Computer-Generated Feedback on Student Writing.” TESOL Quarterly 45 (4): 769–774.
Xi, X. 2010. “Automated Scoring and Feedback Systems: Where Are We and Where Are We Heading?” Language Testing 27
(3): 291–300.
Yang, N. D. 2004. “Using MyAccess in EFL Writing.” In The Proceedings of 2004 International Conference and Workshop on TEFL & Applied Linguistics, 550–564. Taipei, Taiwan: Ming Chuan University.
You, X. 2004. “‘The Choice Made from No Choice’: English Writing Instruction in a Chinese University.” Journal of Second
Language Writing 13 (2): 97–110.
Zamel, V. 1987. “Recent Research on Writing Pedagogy.” TESOL Quarterly 21 (4): 697–715.

Appendix 1. Student questionnaire: perceptions of automated feedback.

Part 1: Please indicate how much you agree or disagree with the statements. Put a tick (√) in the relevant box.
1 = Strongly Disagree; 2 = Disagree; 3 = Somewhat Disagree; 4 = Somewhat Agree; 5 = Agree; 6 = Strongly Agree.

      
My Perceptions of the feedback by Pigai (tick one box from 1 to 6 for each item)
1. The feedback can help me correct grammar mistakes in this composition.
2. I can understand feedback by Pigai.
3. The feedback can help me realize my writing problems.
4. I know how to revise the composition based on feedback I receive from Pigai.
5. I think it can help me improve the quality of this composition.
6. I think the feedback by Pigai is clear.
7. It helps me gain more confidence when I show my classmates the composition during the peer review.
8. It can help me improve my grammar.
9. It can help me enlarge my vocabulary.
10. It can help me get higher score for this composition.
11. I think it is necessary to receive the feedback before the peer review.
12. I think it can help me enhance my writing performance.
13. I think it is helpful to receive the feedback before the peer review.

Part 2: Questions.

1. Can you understand the feedback by Pigai? Is there anywhere that you find not clear? Please explain in detail.
2. To what extent has the feedback by Pigai helped you improve the quality of this composition? In what way? Please
give examples to illustrate your point.
3. Do you think Pigai can help you improve your writing performance? (To what extent? In what aspect?) Please give
examples to illustrate your point.
4. Do you think it is necessary to receive feedback from Pigai before the peer review activity? Please explain in detail.
