IEEE TRANSACTIONS ON LEARNING TECHNOLOGIES, VOL. 17, 2024

A Feasible Study of a Deep Learning Model


Supporting Human–Machine Collaborative
Learning of Object-Oriented Programming
Feng Hsu Wang

Abstract—Due to the development of deep learning technology, its application in education has received increasing attention from researchers. Intelligent agents based on deep learning technology can perform higher order intellectual tasks than ever. However, the high deployment cost of deep learning models has hindered their widespread application in education. In addition, there needs to be more research on applying deep learning technology in education. In this article, we develop an intelligent agent using a performer-based encoder–decoder neural model to classify object-oriented programming (OOP) errors in student code and generate hint feedback in natural language to help students correct the code. This study investigates the feasibility of deploying this agent in an educational setting to support the learning of OOP. This study first examines the low-speed inference problem of the deep learning model. A fast inference algorithm is proposed for the model, which achieves a speedup of eighty times. This study further explores integrating a human–machine collaborative learning process with the deep learning agent. Students were surveyed about their perceptions of the agent in supporting learning. Student responses are interpreted within the learning partnerships model (LPM) framework to show how the agent's technical automation and autonomy features support student–agent learning partnerships. Finally, implications and suggestions for educational application and research of deep learning technology are presented.

Index Terms—Deep learning technology, human–machine collaborative learning, intelligent tutoring system, learning partnerships model (LPM), object-oriented programming (OOP).

I. INTRODUCTION

OBJECT-ORIENTED programming (OOP) is an essential programming skill that has evolved into a programming paradigm adopted by software industries to develop reliable and flexible software. As an elementary programming course for university information-engineering departments, the course is perceived by most students as complex. One reason is the need for an appropriate tool to help students learn OOP concepts and skills. More than information about syntactic code errors provided by a compiler is needed to help them solve the structural design problems of OOP programs. Another reason is that students must wait for a teacher to help them solve these problems due to the many students in a classroom. Therefore, there is an urgent need for computing tools to help students learn OOP concepts and skills effectively.

With the advancement of deep learning technology, its practical applications have been developed with great success in many areas. Nevertheless, its application in educational practice is still in its infancy [1]. Deep learning technology's high computational power requirements hinder its widespread application in education. Therefore, downsizing deep learning technology is the first step to overcoming this barrier. For example, transformer [2] is a state-of-the-art deep learning architecture that consumes much space and time to develop solution models. A successful downsizing effort for transformer is performer [3], which takes only linear time to process a sequence of tokens to build a solution model. However, performer still suffers from slow inference speed during the model deployment phase. For example, on a workstation-level computer, performer will take tens of minutes to generate a feedback message hundreds of tokens long. It is hard to imagine students willing to spend so long waiting for a system to provide feedback. Therefore, improving the model inference speed is the first step to making performer-based deep learning models technically feasible for educational practice.

Furthermore, successfully integrating new technologies into education would improve students' motivation and allow them to achieve high-level educational goals through human–machine collaboration [4]. Therefore, this study explores the integration of the deep-learning agent into a human–machine collaborative learning process. Based on the technical automation features of an agent [5], this study proposes a human–machine collaborative learning process in which students were asked to solve OOP problems with the aid of the intelligent agent developed using the performer-based deep learning model. Students were surveyed to find out what they think about the agent that supports OOP learning. Their responses to the survey are interpreted within the learning partnerships model (LPM) [6] to show how the agent's technical automation and autonomy features support student–agent learning partnerships. Finally, the contributions of the intelligent agent

Manuscript received 7 July 2022; revised 25 September 2022 and 28 November 2022; accepted 30 November 2022. Date of publication 2 December 2022; date of current version 4 January 2024. This work was supported by the Ministry of Science and Technology, Taiwan, under Grants MOST 110-2221-E-130-009 and MOST 111-2410-H-130-023.
The author is with the Department of Computer Science and Information Engineering, Ming Chuan University, Taoyuan City 333, Taiwan (e-mail: fhwang@mail.mcu.edu.tw).
Digital Object Identifier 10.1109/TLT.2022.3226345

1939-1382 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: Mukesh Patel School of Technology & Engineering. Downloaded on February 06,2024 at 02:02:13 UTC from IEEE Xplore. Restrictions apply.

and the potential risks of using the agent in programming education are discussed, and future research directions are proposed. As a result, the research questions of this study are as follows.
1) How can the inference speed of the performer-based model be improved to make it a technically feasible solution in education?
2) How can the intelligent agent support student–agent learning partnerships in a human–machine collaborative learning process?
The rest of this article is organized as follows. Section II presents previous work on deep learning, intelligent tutoring systems for programming education, human–machine collaborative learning, and the LPM. Section III presents the performer architecture, focusing on its fast attention via positive orthogonal random features (FAVOR+) operation for learning a solution model. Section IV presents the methods and evaluation criteria to address the two research questions. Section V presents the experiment designs. Sections VI and VII present the results and discussions, respectively. Finally, Section VIII concludes this article.

II. LITERATURE REVIEW

A. Deep Learning Models for Natural Language Processing

Deep learning is a branch of machine learning that uses multilayer neural networks to achieve specific learning tasks from large amounts of data. Deep learning has become essential for building advanced intelligent systems in many application fields, especially in computer vision, speech recognition, and text-based natural language processing. Regarding deep learning technology for natural language processing, the sequence-to-sequence (Seq2Seq) model has been the most widely adopted architecture. Most early Seq2Seq models used sequential networks, such as the recurrent neural network and long short-term memory. However, training these networks takes longer because they must wait until the state at the previous time step has been calculated before calculating the state at the current time step.

To solve this problem, the transformer, based on the attention mechanism, has made significant progress in image/speech recognition and machine translation in recent years [2]. The self-attention mechanism in transformer can compute interdependencies between arbitrary data items in a sequence in parallel. However, transformer consumes O(L²) space and time to conduct the attention operation on a sequence of length L. Among many improved versions of transformer, performer [3] is one of the most cost effective in terms of time and space. Performer adopts a novel attention operation called FAVOR+, which provides scalable, low-variance, and unbiased estimation of attention values that can be computed by kernels represented as a set of random feature map decompositions. Therefore, the performer architecture was chosen to build the deep learning model in this study.

B. Barriers and Challenges to Human–Machine Collaborative Learning for Programming

Intelligent agent systems for educational contexts have long attracted the attention of researchers aiming to apply artificial intelligence to educational research and practice. A salient feature of such systems is generating feedback in natural language to support learning [7], [8]. Educational feedback includes formative feedback intended to change students' thinking or behavior to improve learning [9]. Formative feedback is more important in learning than superficial feedback, such as grades [10], [11]. For programming education, formative feedback conveys variants of knowledge, such as knowledge about concepts [12], [13], knowledge about mistakes [14], [15], [16], and knowledge about how to proceed, such as edit-based hints on adding, deleting, and modifying text in the source code to reach a solution [17], [18], [19], [20], [21]. However, while edit-based hints can guide students in revising their code, they may not help students understand why the error happened. Finally, knowledge feedback about metacognition could contain explanations of metacognitive strategies or guiding questions [22].

Regarding approaches to generating feedback on programming, the most common method is dynamic code analysis, which runs a program against test cases and compares the output to the expected one to report testing success or failure [23]. However, this approach requires the cost of preparing test cases for each problem. In addition, the success or failure feedback is of little help for learning programming. The static analysis method analyzes a program without running it; it can be applied to detect misunderstood concepts and the absence or presence of some code structures [16], [24], and to generate hints to fix the errors [25]. However, the diversity of code creates challenges for static code analysis. Several techniques can be applied to increase the number of codes a solution model can recognize, such as code normalization to convert a program to another in the same or a different language at the same abstraction level [26], [27]. Intent-based diagnostics use a knowledge base of programming goals, plans, or buggy rules to match with student code to find out which solution strategy the student used [28], [29]. However, this approach uses bug templates for a limited set of predefined exercises, so it cannot handle the wide variety of student codes.

On the other hand, AI-based approaches can handle code diversity better. However, it was found that they only account for about 28% of the literature [8]. Some studies use neural networks to generate feedback for specific programming skills, such as recurrent programming [28], [30]. However, to the best of our knowledge, deep learning models for generating feedback in natural language for open exercises and assignments are rare [31]. It is worth noting that the automatic program repair technique has been widely applied in software engineering industries. However, it has limited contribution to programming education because it was developed for code written by professional programmers and, therefore, does not suit novice students well [32].

As intelligent systems are gradually integrated into human life, effective collaborative learning with such systems


becomes an important research issue [33]. Human–machine collaboration can generally be described by two technical features: automation and autonomy [5]. Technical automation is about task allocation in which a technical system performs all (or part of) tasks previously performed by humans. On the other hand, technical autonomy is related to four technical characteristics used to build intelligent agents: nontransparency, nondeterminism, adaptability, and openness [5]. Such a taxonomy can be a valuable framework to capture an intelligent agent's characteristics and interpret human and machine interactions.

From a constructivist and student-centered learning perspective, in human–machine collaborative learning, intelligent agents can do data analysis tasks and present information in an "intelligent" way that enables students to focus on critical thinking and to discover meaning and value behind the data. With the advances in deep learning technology, the levels of automation and autonomy of an intelligent agent are raised. As a learning partner, an intelligent agent with enhanced automation and autonomy could better support human learning [33], [34], [35]. However, little work has been conducted to bring deep learning technology into educational contexts, and research on the application of deep learning technology in educational contexts is under initial development [1].

C. Learning Partnerships Model

Learning partnerships support the development of students' self-authorship, defined as framing one's reality based on internal values, beliefs, and loyalty [6]. Self-authorship can be described in epistemological, intrapersonal, and interpersonal dimensions. The epistemological dimension of self-authorship focuses on how one knows what one knows. The intrapersonal dimension focuses on how one understands oneself. The interpersonal dimension focuses on how one understands others. From a learning partnership perspective, self-authorship represents the ability to control one's thoughts and create one's own truth or reality, thereby authoring one's experience. Self-authorship emphasizes that students bring their own identities and values to what they learn and believe, and participate in the construction of knowledge [6]. As a result, students achieve self-authorship through continuous learning and coconstruction of knowledge with others [36].

In this study, the LPM [6], [36] was used as a conceptual framework to model the process of human–machine collaborative learning to promote student self-authorship. The LPM is based on three critical assumptions about achieving self-authorship in learning partnerships. The first is that knowledge is socially constructed and complex, implying that students negotiate what to believe with others. The second assumption is that one's identity is central to the construction of knowledge. The third assumption is that authority and expertise are shared in the mutual construction of knowledge, meaning that students participate as equal partners in the mutual construction of meaning.

The LPM uses three principles to offer support to help students develop self-authorship [36], [37]. The first is validating students' capacity to know, i.e., helping students see themselves as capable learners. The second is to situate learning in students' own experiences, helping them integrate identity goals and knowledge construction. The third is to define learning as mutually constructing meaning, helping students see that they share authority and expertise and that learning is about mutually constructing meaning. Finally, the LPM claims to help students develop an internal belief system, an internal identity/sense of self, and a capacity for mutual, interdependent relationships. In brief, the LPM can help researchers investigate how a student develops to understand her/himself, to understand his/her partners, and, based on these understandings, how she/he coconstructs knowledge with the partners. The LPM was chosen as a lens for this study because it contains features consistent with learning, support, self-reflection, relationship maintenance, and identity development in human–machine collaborative learning. The LPM can help understand how learning and self-authorship development cooccur in collaborative human–machine relationships to inform and enhance the practice of deep learning technology in educational settings.

III. PRELIMINARIES OF THE PERFORMER-BASED MODEL

This section presents the performer-based model [31] for analyzing Java code to generate feedback, including error type prediction and hint generation, to support students' learning of OOP design principles. Error type prediction provides students with potential error types in code, whereas hint feedback provides detailed explanations and suggestions for what to do next. The detailed explanations of errors allow students to learn about OOP concepts and illustrate how the code violates the key concepts and how they can be corrected. Nevertheless, hint feedback can add more cognitive load to students than error-type predictions.

A. Architecture

Fig. 1. Encoder–decoder Java code analysis model.

As shown in Fig. 1, the encoder encodes a given source code into a sequence of representation features. In this study, we adopted a performer encoder architecture with six blocks and four parallel attention layers, or heads. The detail of the encoder is shown in Fig. 2. The embedding layer and the


sinusoid position encoding layer transform the input sequence of tokens individually, and the resulting tokens are then added together as the new embedded tokens. Next, these tokens are propagated to the network module consisting of the multihead self-attention layer, a dropout layer (optional), an add/norm layer, a feedforward layer, and another add/norm layer. The network module can be replicated and stacked one by one to extract deeply encoded features of the input tokens. Finally, the output of the encoder is fed into the error-type classifier (ETC; a feed-forward network) and the performer-based decoder.

Fig. 2. Detailed components of the performer-based encoder–decoder model.

The ETC analyzes the output feature sequence of the encoder to determine what errors may exist in the code. It outputs a 36-dimensional vector, indicating 35 error types and the correct one, meaning no errors were detected. Binary cross entropy is the loss function used to train the ETC.

The decoder receives the sequence of embedded tokens from the encoder and the error prediction vector from the ETC module. It performs a series of transformations to generate feedback messages for each error of the input code. The decoder was trained using the teacher forcing strategy, in which the target messages are rearranged as an input sequence to the decoder so that the tokens up to (and including) position t can be used to predict the target token at position t+1. During the training phase, as in the encoder, the rearranged message tokens are propagated to the decoder's embedding and positional encoding layers and transformed into a sequence of embedded tokens.

Next, the embedded tokens are propagated to the masked multihead self-attention layer, which performs masked self-attention to prevent tokens from paying attention to their following tokens, which are not available in the inference phase. The resulting self-attention tokens are then added to the original tokens, normalized, and transformed through the add/norm layer and the feed-forward layer, respectively. Then, the embedded tokens are propagated to the multihead query attention layer to capture the attention between the embedded tokens and the source code embedded tokens. Again, these network modules can be replicated and stacked to extract deeply encoded features of the target feedback tokens. Finally, an embedding_sim layer is used as the output layer to determine the target feedback messages by computing the similarity between the embedded tokens and target message tokens using the token embedding weights of the decoder. The loss function adopted for training the decoder is sparse categorical cross entropy. The sparse categorical cross-entropy loss is like the categorical cross-entropy loss, but it uses an integer for each target label, which saves computation time and space.

B. Attention Operation in Performer

In transformer, the attention operation takes O(L²) to compute an attention matrix encoding the relationship between token pairs in two data sequences of length L. In performer, the attention operation uses the FAVOR+ algorithm to calculate the attention matrices. It has been shown that any attention matrix can be effectively approximated by positive random features obtained by positive-valued nonlinear mapping functions of the original queries and keys [3]. These random features are crucial for avoiding instabilities during model training and provide an accurate approximation of the regular softmax-attention operation in transformer. It is worth noting that the FAVOR+ algorithm requires the entire data sequence to ensure its correct operation. This requirement means that in the inference phase of a performer-based model, generating a feedback message of length L word by word will take time complexity of O(L²), which is not feasible in practical education.

IV. METHODS

Concerning the first research question, a cached version of the FAVOR+ attention operation is developed and evaluated by comparing its inference time with the noncached version. Regarding the second research question, this study proposes and evaluates a human–machine collaborative learning process with the intelligent agent developed using the performer model.

A. Technical Improvement of the Performer Inference Algorithm

1) Incremental Inference Algorithm: The performer-based decoder model requires different behaviors in the training and inference phases. During the training phase, the decoder needs to specify a maximum sequence length so that it can accept the entire target feedback prefixed by a <START> token and learn to generate the target feedback with an ending <END> token. To facilitate the training process, target feedback whose length is less than the maximum is padded up with <PAD> tags (usually 0) at the end. On the other hand, during the inference phase, the decoder model attempts to generate the feedback word by word, starting from the initial <START> token until a maximum length is reached or the <END> token is generated. However, since the decoder is designed with a prespecified maximum feedback length, padding the current feedback sequence is still needed so that the


decoder can generate the next feedback token, causing the inference process to take more time.

Algorithm 1: Incremental masked self-attention FAVOR+
(1) Initialize and cache \(SUM_1 = 0\) and \(SUM_2 = 0\).
(2) Let \(l\) be the position of the current input token embedding.
(3) Let \(K_l\), \(Q_l\), and \(V_l\) be the transformed tokens of token \(l\) using the corresponding linear transformations \(W_K\), \(W_Q\), and \(W_V\), respectively.
(4) Let \(K'_l\) and \(Q'_l\) be the projected tokens of \(K_l\) and \(Q_l\), respectively, using the cached random projection matrix \(f_{SM}\).
(5) Get the cached \(SUM_1 = \sum_{i=1}^{l-1} f_{SM}(k)_i \otimes V_i\) and \(SUM_2 = \frac{z}{\sqrt{m}} \sum_{i=1}^{l-1} V_i\).
(6) Compute \(SUM_1 = SUM_1 + K'_l \otimes V_l\) and \(SUM_2 = SUM_2 + \frac{z}{\sqrt{m}} V_l\).
(7) \(Numerator = \frac{1}{\max_{0 \ldots l,\, 0 \ldots m-1} f_{SM}(k)} \, Q'_l \cdot SUM_1 + SUM_2\).
(8) Update the cached \(SUM_1\) and \(SUM_2\).
(9) Initialize and cache \(S = 0\). // ready to compute the denominator
(10) Get the cached \(S = \sum_{j=1}^{l-1} [f_{SM}(k)]_j\).
(11) \(S = S + K'_l\).
(12) \(Denominator = \frac{1}{\max_{0 \ldots l,\, 0 \ldots m-1} f_{SM}(k)} \, Q'_l \cdot S + \frac{l z}{\sqrt{m}}\).
(13) return \(Attention = Numerator / Denominator\).

Fig. 3. Transformation-level cache modules for (a) masked self-attention and (b) query attention in the decoder.
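The cached prefix-sum idea behind Algorithm 1 can be illustrated with a small NumPy sketch. This is a simplified illustration, not the paper's implementation: the stabilizing \(z/\sqrt{m}\) terms and the max-normalization are omitted, and `phi` is one common choice of positive random feature map assumed here. The incremental pass keeps running numerator and denominator sums, and its output matches recomputing causal FAVOR+-style attention from scratch at every position:

```python
import numpy as np

rng = np.random.default_rng(0)
L, d, m = 6, 8, 16                      # sequence length, model dim, random features

# positive random features: phi(x) = exp(W @ x - ||x||^2 / 2) / sqrt(m)
W = rng.normal(size=(m, d))
def phi(x):
    return np.exp(W @ x - x @ x / 2.0) / np.sqrt(m)

Q = rng.normal(size=(L, d))
K = rng.normal(size=(L, d))
V = rng.normal(size=(L, d))

# reference: causal linear attention, recomputed from scratch at every step
def full_prefix(l):
    num = sum(phi(K[i])[:, None] * V[i][None, :] for i in range(l + 1))  # (m, d)
    den = sum(phi(K[i]) for i in range(l + 1))                           # (m,)
    q = phi(Q[l])
    return (q @ num) / (q @ den)

# incremental version: cache the running sums instead of recomputing them
SUM = np.zeros((m, d))   # numerator cache (Algorithm 1's SUM_1)
S = np.zeros(m)          # denominator cache (Algorithm 1's S)
outs = []
for l in range(L):
    k = phi(K[l])
    SUM += k[:, None] * V[l][None, :]
    S += k
    q = phi(Q[l])
    outs.append((q @ SUM) / (q @ S))

for l in range(L):
    assert np.allclose(outs[l], full_prefix(l))
```

Each step of the incremental loop costs O(md), so generating a message of length L is linear in L instead of the quadratic cost of rerunning FAVOR+ on the whole prefix at every step.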
To address this performance issue, we modified the decoder architecture to generate the next token by receiving only the previously generated one. For this purpose, the position-embedding layer is first revised to compute a token's positional embedding with an extra parameter indicating its position in the original sequence. Second, an incremental decoder inference algorithm is developed by modifying the FAVOR+ algorithm to cache the intermediate results of previously embedded tokens to ensure its correct operation.

2) Transformation-Level Cache Mechanism: There are two levels of caches inherent in the incremental FAVOR+ algorithm for the decoder, including the transformation- and algorithm-level caches. The transformation-level caches store an input token's query, key, and value transformations, whereas the algorithm-level caches store the intermediate results produced by the incremental FAVOR+ algorithm, which will be presented in the following section.

In the decoder, the masked self-attention operation must cache all query, key, and value tensors obtained by transforming the token sequence. In contrast, the query-attention operation only needs to cache the query tensors because the key and value tensors are transformed only once from token sequences outside the query-attention module, as shown in Fig. 3. The transformation-level cache mechanism saves the time of obtaining the k, q, and v values by preventing duplicate transformation operations. However, our test showed that the transformation-level cache does not significantly improve the decoder's inference speed.

3) Algorithm-Level Cache Mechanism: To develop the algorithm-level cache mechanism for the FAVOR+ algorithm, Att(Q, K, V), as shown in the following, needs to be reformulated as a recurrent formula suitable for incremental implementation:

\[ \mathrm{Att}(Q, K, V) = D^{-1} A_{Q,K} V = \frac{Q'\,(K'^{T} V)}{D \sqrt{d}} = \frac{N_{att}}{N_{norm}} \tag{1} \]

where \(\dim(Q') = L \times m\), \(\dim(K'^{T}) = m \times L\), and \(\dim(V) = L \times d\).

After a series of derivation steps, (1) is transformed into the recurrent formula shown in the following. The detailed derivation of the recurrent formula is omitted here. Given \(0 \le l \le L-1\) and \(0 \le j \le d-1\), where \(m\) is the number of random projection features,

\[ N_{att}[l]_j = \sum_{r=0}^{m-1} Q'[l, r] \cdot SUM[l, r]_j \tag{2} \]

where

\[ SUM[l, r]_j = SUM[l-1, r]_j + K'^{T}[r, l] \cdot V[l]_j. \tag{3} \]

In addition,

\[ N_{norm}[l] = \sum_{r=0}^{m-1} Q'[l, r] \cdot S[r, l] \tag{4} \]

Algorithm 2: Incremental query-attention FAVOR+
(1) Let \(K\) and \(V\) be the transformed tokens of the query attention.
(2) Let \(K'\) be the projected tokens of \(K\), using the cached random projection matrix \(f_{SM}\).
(3) Initialize and cache \(C_{KV} = K'^{T} V\).
(4) Let \(l\) be the current input token embedding.
(5) Let \(Q_l\) be the transformed token of token \(l\) using the linear transformation \(W_Q\).
(6) Let \(Q'_l\) be the projected token of \(Q_l\), using the cached random projection matrix \(f_{SM}\).
(7) Get the cached \(C_{KV}\).
(8) \(Numerator = Q'_l \cdot C_{KV}\).
(9) Initialize and cache \(SK' = \sum_{i=0}^{L-1} [K']_i\).
(10) \(Denominator = Q'_l \cdot SK'\).
(11) return \(Attention = Numerator / Denominator\).
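Algorithm 2 exploits the fact that, in the decoder's query attention, the encoder-side keys and values never change during generation, so the key–value product and the key sum can be computed once and reused at every decoding step. The following NumPy sketch illustrates that idea under simplified assumptions (the stabilization terms are omitted, and `phi` is a stand-in positive feature map, not the paper's exact code):

```python
import numpy as np

rng = np.random.default_rng(1)
L_src, L_tgt, d, m = 5, 4, 8, 16        # source/target lengths, model dim, features

W = rng.normal(size=(m, d))
def phi(x):
    return np.exp(W @ x - x @ x / 2.0) / np.sqrt(m)

K = rng.normal(size=(L_src, d))          # encoder keys, fixed during generation
V = rng.normal(size=(L_src, d))          # encoder values, fixed during generation
Q = rng.normal(size=(L_tgt, d))          # decoder queries, one per generated token

# encoder-side caches, computed once because K and V never change
Kp = np.stack([phi(k) for k in K])       # projected keys, (L_src, m)
C_KV = Kp.T @ V                          # cached numerator term, (m, d)
SK = Kp.sum(axis=0)                      # cached denominator term, (m,)

# per-step query attention reuses the caches: O(md) per generated token
def step(l):
    q = phi(Q[l])
    return (q @ C_KV) / (q @ SK)

# reference: recompute everything from scratch for each query
def full(l):
    q = phi(Q[l])
    num = sum(phi(K[i])[:, None] * V[i][None, :] for i in range(L_src))
    den = sum(phi(K[i]) for i in range(L_src))
    return (q @ num) / (q @ den)

for l in range(L_tgt):
    assert np.allclose(step(l), full(l))
```

Because only the query changes from step to step, nothing about the cache needs updating during generation, which is why the query-attention cache is even cheaper than the masked self-attention cache of Algorithm 1.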


where

\[ S[r, l] = \sum_{i=0}^{l} K'^{T}[r, i] = S[r, l-1] + K'^{T}[r, l]. \tag{5} \]

In practice, to improve the stability and convergence of the FAVOR+ algorithm [4], an incremental stable FAVOR+ algorithm is developed, as shown in Algorithms 1 and 2, for masked attention and query attention, respectively.

4) Evaluation Criteria for Feedback: The error type prediction is evaluated by accuracy. The hint feedback is evaluated using METEOR [38]. METEOR includes synonym matching and produces results that are highly correlated with those of human judgment. The METEOR algorithm first calculates the precision rate P and recall rate R in the case of unigrams. Then the harmonic mean F is obtained as follows:

\[ F = \frac{(\alpha^{2} + 1) P}{R + \alpha P} \tag{6} \]

where \(\alpha\) is the relative importance of P to R. This study uses a default value of 0.9 for \(\alpha\). What makes METEOR unique is that it uses a penalty formula to prevent fragmented feedback, given as follows:

\[ p = \gamma \left( \frac{\text{number of chunks}}{\text{unigrams matched}} \right)^{u} \tag{7} \]

where \(\gamma\) is a parameter that determines how much the penalty factor affects the final METEOR value, which has a default value of 0.5, and \(u\) is a parameter that determines the weight of the incorrect ordering rate of the unigrams, which has a default value of 3. The purpose of the penalty is to penalize hypothetical sentences with correct words and phrases, but in a very different order than in the references. Before the penalty can be computed, the unigrams are first grouped into the fewest possible chunks, defined as subsets of the unigrams adjacent to each other in the reference. When evaluating the fluency of feedback, the smaller the number of chunks and the longer the average length of each chunk, the more consistent the word order of the candidate feedback is with the reference feedback. Finally, the final METEOR score is calculated as follows:

\[ \mathrm{Meteor} = (1 - p) \cdot F. \tag{8} \]

TABLE I. Examples of METEOR scores of generated hint feedback.

Table I tabulates the METEOR scores for feedback examples, where <BOM> and <EOM> mark the start and end of a hint, respectively. The METEOR score prefers longer hints even if the reference and predicted hints are the same. The preference appears to be caused by the rounding error when calculating the chunking penalty.

5) Design of the Agent System: To deploy the agent into educational practice, a website was developed using the flask package, as shown in Fig. 4. On this website, descriptions of debugging tasks are displayed in the left window, and the sample code is displayed in the middle window for editing. Clicking the green diagnose button will submit the code to the server for analysis, and the feedback will be displayed in the right window. There are two levels of feedback: error types and detailed hints. Students can switch between the levels at will. In addition, students can accept or refuse the feedback and submit their decisions.

Fig. 4. Screenshot showing task descriptions, student code, and system feedback.

The proposed agent system supports the learning partnerships model as follows. To help validate students' capacity to know, the agent returns feedback on the code modified and submitted by students. To situate learning in students' own experiences, the agent does not try to lead the way to correct the code. Instead, it is entirely up to the students to correct the code based on their own experience and evaluation of the feedback. Finally, to help students see that they can share authority and expertise and that learning is about mutually constructing meaning, the agent analyzes the code submitted by students and provides feedback for informational purposes only. The students will learn to take the agent's feedback into


TABLE II
AUTONOMY CHARACTERISTICS OF THE INTELLIGENT AGENT

Fig. 6. Screenshot showing an interactive session involving error prediction and user responses after the user corrected the code.

Fig. 7. Screenshot showing an interactive session involving high-level error prediction and hint feedback.
Fig. 5. Screenshot showing an interactive session involving error prediction and user responses.

account without being consumed by them. The autonomy characteristics of the agent are given in Table II.

To illustrate the human–machine interaction supported by the agent system, we consider the debugging task of designing a pet system containing a pet hierarchy, including dogs and cats, and the relations between pet and keeper objects. Fig. 5 shows that the agent successfully predicted a coding error, and the student responded by accepting the prediction. However, after correcting the code, the agent made the previous error prediction again. This time the student responded by refusing the prediction, as shown in Fig. 6. Another case showing an interaction involving a higher level of code errors is shown in Fig. 7, where the structural relationship between a pet object and a keeper object is not correctly established. The agent correctly made the error prediction and delivered the correct hint feedback. The student responded by accepting both the error prediction and the hint feedback. In addition, the agent also delivered a hint that helped identify a code segment that incorrectly built the relationship between a pet object and a keeper object, as shown in Fig. 8. The user responded by accepting both the error prediction and the delivered hint feedback.

B. Human-Collaborative Learning Process

In OOP education, students learn the design principles of OOP and the ability to write OOP code. To achieve this goal, they must be able to recognize and modify incorrect OOP code. The intelligent agent is deployed to help students achieve their learning goals by offering feedback to lead them to understand and correct the code. Accordingly, a human–machine collaborative learning process is proposed, as given in Table III. Students must evaluate the agent's feedback and decide whether to accept or reject the feedback. In this way, students learn OOP design principles through collaboration with a machine partner.

V. EXPERIMENT DESIGNS

A. Model Development and Performance Comparison

1) Data Collection: Regarding the first research question, this study compares the performance of the proposed inference algorithm with the original one. A deep-learning model was first developed based on data collected from Java code submitted by students for exercises, quizzes, and exams in an introductory OOP course at a university in Taiwan. A total of 330 codes were collected and manually annotated. The codes were randomly assigned to human annotators, with each code annotated by two annotators and the annotations proofread against each other. All annotators adopt the same hint structure to make the model suitable for assisted OOP learning: first describing the error type and related OOP concepts, then explaining why the error occurred, and finally suggesting appropriate code changes.

Since the GPU hardware device used in this study has limited storage, the code length was restricted to 1000 tokens. As a result, we sorted out a total of 35 error types, and the maximum length of hint feedback is 738 tokens. However, the small amount of code is far from the sample size required to


TABLE IV
SYSTEM EVALUATION QUESTIONNAIRE

Fig. 8. Screenshot showing an interactive session involving a hint that helps identify error code segments.

TABLE III
AUTOMATION DESIGN OF THE LEARNING PROCESS

train a neural model. Therefore, this study adopts the code augmentation method proposed by Wang [39] to increase the sample size by changing local variable names and randomly reordering statements without affecting the original code semantics. In the end, the sample size was increased from 330 to 11 200.

2) Model Training: The data are divided into 80% training data and 20% testing data. The model was trained for 1000 epochs, using 20% of the training data for validation. The training process was conducted on the Windows 10 operating system with an Intel(R) Core(TM) i9-11900K 3.50 GHz processor, 128 GB RAM, and an NVIDIA RTX 3090 with 24 GB RAM.

3) Performance Comparison: First, we investigate the performance of the original performer inference algorithm, which is used as a baseline for comparison. For the comparison, 1000 random samples were selected from the training, validation, and test datasets. The average prediction accuracy and inference time for each selected dataset are reported and compared.

B. User Study

1) Data Collection: This study conducts a human evaluation of the intelligent agent. Eleven students enrolled in an advanced Java programming course were invited to participate in the evaluation. Four debugging tasks of different difficulty levels were given to the students randomly, and they were requested to complete the debugging tasks with the agent's help. After the experiment, they completed a seven-item survey, as given in Table IV, about their opinions of the system. The system also recorded the students' interactions with the agent, including the error predictions and hints generated by the agent and the students' acceptance or rejection decisions. A total of 128 log records were collected. Finally, the students handed in a self-reflection report describing how they learned the debugging tasks in collaboration with the agent.

2) Data Analysis: The analysis used the thematic analysis framework of Braun and Clarke [40]. Thematic analysis was chosen because it allows for finding meaningful patterns within the data inductively, which helps address the research question by finding what themes underlie the students' experience and perspectives of the human–machine collaborative learning supported by the intelligent agent. First, students' responses to the open questions are encoded using semantic and latent codes, which are then grouped into meaningful candidate themes generated by the researcher. These candidate themes are reviewed and further developed at the extraction level, considering the context relevant to the entire dataset. Themes are further refined and defined until the researcher judges that they represent a thorough understanding of the study. Themes are then renamed according to their content and encoding extraction.

VI. RESULTS

A. Prediction Accuracy and Inference Speed Comparison

The loss and accuracy of the training process for the error classifier and hint generation model are shown in Figs. 9 and


TABLE V
EXAMPLES OF HUMAN–MACHINE INTERACTIONS

Fig. 9. (a) Training loss of the classifier submodel. (b) Training accuracy of
the classifier submodel.

Fig. 10. (a) Training loss of the hint feedback submodel. (b) Training accu-
racy of the hint feedback submodel.

10, respectively. As seen in the figures, the encoder–decoder model is well trained on both the training and validation data.
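The code augmentation used to build the training set (Section V-A: renaming local variables and reordering statements without changing semantics) can be sketched as follows. This is a hypothetical illustration, not the implementation of Wang [39]; the class and method names are our own, and a production version would also need to skip identifiers inside string literals and comments.

```java
import java.util.*;

// Hedged sketch of label-preserving code augmentation: produce a
// variant of a Java snippet by renaming local variables. Statement
// reordering is omitted for brevity. Names here are illustrative.
public class CodeAugmenter {

    // Rename each identifier key to its mapped value, whole words only.
    public static String renameLocals(String code, Map<String, String> renames) {
        String out = code;
        for (Map.Entry<String, String> e : renames.entrySet()) {
            // \b keeps "count" from matching inside "counter"
            out = out.replaceAll("\\b" + e.getKey() + "\\b", e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        String code = "int count = 0; count = count + step;";
        String variant = renameLocals(code, Map.of("count", "n", "step", "delta"));
        System.out.println(variant); // int n = 0; n = n + delta;
    }
}
```

Because the renaming leaves the code's semantics (and hence its error annotation) unchanged, each original sample can yield many labeled variants.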
For error prediction, the error classifier achieves an average binary accuracy of 1.000, 0.999, and 0.999 for the training, validation, and testing datasets, respectively. For hint generation, the model achieves average METEOR scores of 0.922, 0.915, and 0.918 for the training, validation, and testing datasets, respectively. The high accuracy of the error predictions and the fluency and accuracy of the generated hints show that the agent performs very well in the code analysis tasks, at least for the collected data.
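The METEOR-style score behind these numbers can be sketched from (6) and (7). The harmonic-mean form below is our reconstruction of the garbled equation (6), and the exponent default u = 3 is an assumption (a = 0.9 and g = 0.5 follow the text); the class name is illustrative.

```java
// Hedged sketch of the METEOR-style score: weighted harmonic mean of
// unigram precision P and recall R, discounted by a fragmentation
// penalty. Not the paper's evaluation code.
public class MeteorSketch {

    public static double score(double p, double r, int chunks, int matches,
                               double a, double g, double u) {
        if (matches == 0) return 0.0;                       // nothing matched
        double f = (a * a + 1) * p * r / (r + a * a * p);   // Eq. (6), reconstructed
        double penalty = g * Math.pow((double) chunks / matches, u); // Eq. (7)
        return f * (1 - penalty);
    }

    public static void main(String[] args) {
        // nine matched unigrams forming a single contiguous chunk
        System.out.println(score(0.9, 0.9, 1, 9, 0.9, 0.5, 3));
    }
}
```

Note that when P = R the harmonic mean reduces to that common value, and a perfectly ordered match (one chunk) incurs only a tiny penalty.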
Regarding the inference speed, the baseline version of the
original inference algorithm took an average of 5.74 s to gen-
erate a target token. In contrast, the proposed cached version
achieves an average of about 0.07 s per token, faster than the
baseline version by nearly 80 times, which is a significant
improvement and, to the best of our knowledge, is by far the
best in the literature on performer-based models.
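The intuition behind such a cached decoder can be sketched with simplified linear attention: the prefix statistics are carried across steps, so generating each new token costs the same regardless of how long the prefix is. This is a toy illustration of the idea, not the paper's FAVOR+ implementation; it assumes queries and keys have already been passed through the performer's nonnegative feature map.

```java
// Hedged sketch of why caching makes per-token decoding O(1) in the
// sequence length for linear (performer-style) attention: the prefix
// sums S = sum(phi(k_i) v_i^T) and z = sum(phi(k_i)) are updated in
// place instead of being recomputed over the whole prefix each step.
public class CachedLinearAttention {
    final int d;
    final double[][] S;  // running sum of outer products phi(k) v^T
    final double[] z;    // running sum of phi(k)

    CachedLinearAttention(int d) {
        this.d = d;
        this.S = new double[d][d];
        this.z = new double[d];
    }

    // One decoding step: O(d^2) work, independent of tokens seen so far.
    double[] step(double[] q, double[] k, double[] v) {
        for (int i = 0; i < d; i++) {
            z[i] += k[i];
            for (int j = 0; j < d; j++) S[i][j] += k[i] * v[j];
        }
        double norm = 1e-9; // guard against division by zero
        for (int i = 0; i < d; i++) norm += q[i] * z[i];
        double[] out = new double[d];
        for (int j = 0; j < d; j++) {
            double acc = 0;
            for (int i = 0; i < d; i++) acc += q[i] * S[i][j];
            out[j] = acc / norm;
        }
        return out;
    }

    public static void main(String[] args) {
        CachedLinearAttention att = new CachedLinearAttention(1);
        att.step(new double[]{1}, new double[]{1}, new double[]{2});
        double[] out = att.step(new double[]{1}, new double[]{1}, new double[]{4});
        System.out.println(out[0]); // attends over the whole prefix in O(1)
    }
}
```

Recomputing S and z from scratch at every step is what makes the baseline O(L) per token; carrying them forward is the whole trick.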

B. Human–Machine Interaction
Table V tabulates two scenarios of the interactions between the agent and students. In scenario 1, the code is about a pet system where the pet hierarchy, including dog and cat, needs to be adequately designed using Java's abstraction and inheritance mechanisms. Furthermore, the relationship between the pet and keeper classes must be carefully designed.

The first interaction session has two error predictions and three hints. The student accepted the two error predictions and two of the three generated hints. The first hint says

“Because there is only one keyboard device, the scanner object can be declared once, and there is no need to declare a different scanner object for each keyboard reading action.”

This hint is not related to the buggy code. The second hint says

“Object data/method members are the data/methods owned by an object. Each object has its own data/method members, independent of other objects. Syntactically, those data/method members not declared as static are object data/method members. You can also


TABLE VI
STUDENT RESPONSES TO THE QUESTIONNAIRE ITEMS

TABLE VII
STUDENT DECISIONS IN THE EVALUATION

declare access privileges for object data/method members; for example, the statement: private int x; indicates that object variable x can be accessed only by the owner object.”

The third hint says

“The relationship between two objects can often be implemented with one object owning the other. For example, to describe the relationship between the Pet and the Keeper, you can declare an object data of Keeper inside the Pet object to indicate who the Keeper of the Pet is. The keeper object can be created using constructors, for example, public Pet(String type, Keeper keeper) {. . . this.keeper = keeper;}.”

These two hints are correct for the test case. Therefore, the student's decisions in this interaction session were all correct. After correcting the code, she/he resubmitted the code and received feedback that no errors were detected.

In scenario 2, the code is about a letter system where the letter hierarchy, including Chinese and English, needs to be designed using Java's abstraction and inheritance mechanisms. Furthermore, the relationship between the Envelop class and the Letter class must be carefully designed. In the first session, the agent predicted an abstraction error but did not generate any hint feedback, which can happen due to the agent's nondeterministic characteristic. The student accepted the error prediction but rejected the empty hint feedback. She/he resubmitted the code in session 2, and this time the agent detected an extra error named object error. In addition, the hint that did not appear in the first session was generated in session 2, which says

“Subclasses can inherit common object data/methods by declaring them once in the parent class. Different object data/methods can be declared in each derived subclass.”

This case shows that accurate predictions and hints can be obtained by repeatedly asking the agent to diagnose the same code. In other words, the agent can help students explore the problem space.

C. Survey Analysis

Table VI tabulates the student responses to the survey, where all students agree that either the predicted error types or the hints helped them learn in the debugging tasks. Regarding the perceived reliability of the feedback types, five students rated error-type predictions as more reliable, three students rated the hints as more reliable, and three students rated both feedback types as equally reliable. In contrast, concerning the perceived unreliability of the feedback types, four students rated error-type predictions as less reliable, four students rated hint feedback as less reliable, and three rated both feedback types as equally unreliable. Although the agent achieved high performance in the training phase, it performed less well in the human–machine interaction phase. Nevertheless, this indicates that the agent should play a learning partner's role, triggering the students' thinking and judgment, instead of playing a tutor role. Regarding the overall usefulness of the agent for learning in the debugging tasks, six students rated it as helpful, and five students rated it as fair. Finally, concerning the change of trust in the system before and after the evaluation experiment, four students indicated that their trust in the system increased, and seven students felt no change in their trust after the learning tasks.

D. Log Analysis

Table VII tabulates the log analysis results of the students' decisions to accept or reject the feedback to investigate their trust in the agent further. On average, the students' acceptance rate is 67/128 = 0.523. For error predictions, the acceptance rate (0.547) is higher than the rejection rate (0.453). As for hints, the acceptance rate (0.5) is the same as the rejection rate (0.5). Furthermore, of the accepted feedback, the share of error predictions (0.547) is higher than that of hints (0.453), and of the rejected feedback, the share of error predictions (0.475) is lower than that of hints (0.525). As a result, the students trust the agent's error predictions more than the agent's hints, which is consistent with the survey data in Table VI.

Table VIII tabulates the total counts of acceptance/rejection decisions in each interaction session to further elucidate changes in students' trust in the agent across interactive sessions. The maximum number of human–machine interactive sessions is four. Fig. 11 shows the trends in the acceptance rates of error predictions and hints in different interactive sessions. It can be seen that the acceptance rates drop across sessions, except for the hints in the last session. The agent performs best in the first session but less well in subsequent sessions, which suggests that the agent should be retrained on two-way interaction data rather than one-way interaction data, such as exercises and quizzes. In addition, the students appear to be using their own judgment to accept or reject the agent's feedback across the interactive sessions.

E. Thematic Analysis

As given in Table IX, several themes emerged from students' perceptions of the agent's feedback. They are organized by their positive/negative effects from the perspectives of the 3-Ds of the learner partnerships model.
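For readers less familiar with the OOP structure the debugging scenarios revolve around, the composition-plus-inheritance pattern the agent's hints describe — a Pet hierarchy whose instances each hold their own Keeper, set through the constructor — can be sketched in Java as follows. The class bodies are our own minimal reconstruction of the quoted hints, not the study's task code.

```java
// Hedged sketch of the pattern from the scenario-1 hints: inheritance
// for the pet hierarchy, composition for the pet-keeper relationship.
class Keeper {
    final String name;
    Keeper(String name) { this.name = name; }
}

abstract class Pet {
    final String type;
    final Keeper keeper;  // object member: each Pet knows its own Keeper
    Pet(String type, Keeper keeper) {
        this.type = type;
        this.keeper = keeper;  // relationship established in the constructor
    }
    abstract String speak();   // common behavior, specialized per subclass
}

class Dog extends Pet {
    Dog(Keeper keeper) { super("dog", keeper); }
    String speak() { return "woof"; }
}

class Cat extends Pet {
    Cat(Keeper keeper) { super("cat", keeper); }
    String speak() { return "meow"; }
}

public class PetDemo {
    public static void main(String[] args) {
        Pet pet = new Dog(new Keeper("Ann"));
        System.out.println(pet.speak() + ", kept by " + pet.keeper.name);
    }
}
```

The structural errors the agent flags in the scenarios amount to omitting the Keeper member (or the analogous Letter-inside-Envelop member) from the owning class.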


TABLE VIII
SESSIONS OF STUDENT ACCEPTANCE DECISIONS IN THE EVALUATION

1) Interpersonal Dimension: First, the theme “nondeterminism” is about the predictability of the agent's predictions given the same input code. For example, the following is a negative remark:

“Sometimes, the same code has different and conflicting prediction results.”

The theme “responsiveness” refers to the agent's ability to respond to students' actions. For example, there are some negative remarks, such as

“Sometimes, after correcting the code, the agent still delivered the same error prediction”

and

“Sometimes, the agent still delivered the same hint feedback after modifying the code according to the hints.”

The theme “accuracy” refers to the agent's ability to deliver accurate predictions. For example, there are some positive remarks, such as

“The hint given is more accurate (than error predictions), and it feels easier to understand the problem than by the error type prediction.”

“The hint given is correct most of the time.”

In contrast, there are some negative remarks about the accuracy theme, such as

“Sometimes the hint given did not help correct the code.”

“It (the error prediction) is sometimes right and sometimes wrong, which is quite confusing.”

The theme “expertise” refers to the student's evaluation of the agent's expertise. For example, a positive remark says

“The agent can analyze the relationships between objects.”

The theme “context-sensitivity” refers to the agent's ability to deliver context-sensitive feedback on the input code. For example, one student said that

“It is nice to get (hint) feedback with snippets about what went wrong in the code, helping to get straight to where the error is.”

The abovementioned themes can lead to a positive or a negative impact on students' trust in the agent's behavior. Therefore, they are classified as belonging to the interpersonal dimension.

Fig. 11. Ratios of acceptance decisions across sessions.

TABLE IX
THEMES EMERGED FROM STUDENTS' OPINIONS ON ERROR PREDICTIONS AND HINTS AND THEIR RELATIONS TO LPM DIMENSIONS

2) Epistemological Dimension: The theme “directivity” refers to how the delivered feedback can guide students through the debugging tasks. For example, there are some positive remarks, such as

“I can know the general direction to correct the error (by the error predictions).”

“It (error prediction) can let me know where there is a problem with the code and get to the point.”

“It (error prediction) offers positive help by giving a general debugging direction.”

However, there are some negative remarks, such as

“But they (the error prediction and the hint) can also be misleading.”

“Sometimes the error prediction will mislead me to focus on the wrong place in the code to be corrected, resulting in an inefficient debugging process.”

The theme “specificity” refers to the level of detail in the feedback delivered by the agent. For example,

“While the error prediction is difficult to understand, the hints can give us a deeper understanding of how to correct errors.”

“Understanding the problem (by the hints) is beneficial.”

In contrast, there is a negative remark about error predictions saying that
dimension. tions saying that


“The description of the error type prediction is a bit vague.”

The theme “overload” refers to the cognitive load demands on students when reading the feedback. For example,

“The (hint) narrative is slightly long and not easy to comprehend.”

The theme “location identification” refers to how the agent's feedback helps students locate error codes. For example,

“I can know the approximate error location (by the hints).”

“I can directly find the wrong place in the code (by the hints).”

The theme “exploration” refers to how the agent's feedback helps students explore the error code. For example,

“It (error prediction) can tell me code errors I did not notice.”

The abovementioned themes can lead to a positive or a negative impact on students' understanding of the target domain. Therefore, they are classified as belonging to the epistemological dimension.

3) Intrapersonal Dimension: The theme “comprehension” refers to the students' internal comprehension of the agent's feedback. For example,

“Sometimes, I do not understand what the hints mean.”

“My Chinese is not good; if it is not vernacular, I cannot understand it (the hint) myself.”

The theme “exertion of authority” refers to the student's ability to evaluate the agent's feedback. For example,

“Sometimes the agent gives conflicting error predictions.”

“It (the error prediction) is sometimes right and sometimes wrong.”

The abovementioned themes can lead to a positive or a negative impact on students' self-trust. Therefore, they are classified as belonging to the intrapersonal dimension.

VII. DISCUSSIONS

A. Technical Feasibility of the Deep Learning Agent

Advances in deep learning technology have enabled intelligent agents to perform high-level intellectual analysis tasks, such as code analysis, more than ever before. However, applying deep learning technology in educational settings still has several issues. First, deep learning models usually require high-end computing devices, hindering their widespread application in educational practice. This study proposes an incremental FAVOR+ algorithm for the performer to reduce the token generation time from O(L) to O(1). As a result, it takes O(L) time to generate feedback of length L. The proposed incremental inference algorithm is nearly 80 times faster than the original one.

Improving model inference speed is essential for applying performer-based models in educational settings. Specifically, in the training data, the average length of the generated feedback is 132.90 tokens with a standard deviation of 123.67, implying that the original inference algorithm takes an average of 12.71 min with a standard deviation of 13.83 min to generate feedback. Such long response times make the original model technically infeasible for educational applications. The new inference algorithm takes only an average of 9.30 s for hint generation, with a standard deviation of 8.66 s. Although the inference time can also be reduced by investing in high-end devices that might cost tens or hundreds of thousands of dollars, we can achieve satisfactory performance on workstation-level systems that cost thousands of dollars. On the other hand, the model's prediction performance on test data is quite good, with an average accuracy of 99.9% for error prediction and 91.8% for hint feedback. Therefore, this study helps address the cost barriers to applying deep learning technology to education, making performer-based models a potentially viable and accessible solution in educational settings.

B. Educational Feasibility of the Deep Learning Agent

This study adopts the LPM to investigate how the students perceived the agent in the human–machine collaborative learning process. From the interpersonal perspective, we found positive and negative factors that could affect students' trust in the agent. Students' trust in the agent can come from the agent's expertise in error prediction and in generating hints with accuracy and context sensitivity. For example, the students found the agent to be very accurate at predicting codes that incorrectly implemented relationships between objects. As given in Table VII, log analysis also shows that the students' average acceptance rate of the agent's feedback is 0.523, and the acceptance rate of error predictions is higher than that of hints, which is consistent with the survey data. However, students' trust in the agent can be reduced by the agent's nondeterministic behavior, inaccuracy of feedback, and lack of responsiveness to students' actions. For example, the agent may give different diagnostic results for the same code at different times, sometimes even contradicting each other, leaving students unsure about what to do next. In addition, the students found that sometimes, after modifying the code according to the given feedback, the agent still gives the same feedback for the modified code. The lack of responsiveness may be due to the improper training data collection policy adopted to develop the neural model. As shown in Fig. 11, students' acceptance rates of the agent's feedback decreased across interactive sessions, implying that the model did not respond well to students' code-correcting actions. Therefore, the model's training data should be collected from two-way interactions between teachers and students, not just from one-way interactive activities, such as homework and quizzes.


From the intrapersonal perspective, we found that some students showed higher self-confidence by using self-authority to evaluate the agent's feedback. However, we also found that some students showed less confidence in understanding the agent's hints. This finding raises the issue that these students need different learning supports during the human–machine collaborative learning process.

From the epistemological perspective, error predictions can provide students with general directions to correct the code and alert them to potential bugs in their code. In addition, the higher level of detail in the hints can help students grasp the essentials of the concepts and skills needed to correct the problems. In particular, the snippets in the hints can help students efficiently identify the error code's locations. However, students also mentioned that the error type predictions and hints are sometimes inaccurate, leading to an inefficient debugging process. Students may need to evaluate the code themselves and should not put complete trust in the agent's decisions all the time. Instead, they should view the agent as a learning partner and learn to evaluate the agent's feedback to decide which to accept or reject. Through this peer-evaluation process, they will have more opportunities to think and learn.

On the other hand, student cognition affects human–machine interaction. Some students mentioned that the descriptions of the error types needed to be more specific for them to understand. Although students responded that the hints are more accurate and helpful than error predictions for understanding the problems and identifying the locations of the errors, some students mentioned that the descriptions of the hints needed to be shorter and easier to understand. Therefore, the agent needs to adapt the type and content of feedback to students' cognitive load.

In summary, the human evaluation results show that the agent has a certain level of expertise to support students in learning to debug OOP code, especially in the first interactive session. In addition, the agent's random behavior can help students develop explorative learning strategies and promote self-development with self-confidence. Furthermore, while the agent's expertise can be enhanced by retraining the model with two-way interaction data, students should view the agent as a learning partner rather than a learning tutor. In the partnership created between humans and machines, a student can develop self-authorship by developing appropriate relationships with the agent and himself, and then an appropriate collaborative learning strategy with the agent. As one student says in his self-reflection report:

“I see the agent's feedback as a list of potential problems in the code. I will check the code for every possible error option in the list, which gives me more confidence to make the correct changes to the code.”

VIII. CONCLUSION

A. Implications for Education, Research, and Practice

Compared with other methods of building intelligent programming tutors, deep learning technology frees teachers from the burden of building solution models or test cases for problems, and it can handle the diversity of student code. However, despite the advantages and growing popularity of intelligent agents developed with deep learning technology, the feasibility of using this technology as a practical solution for education still requires considerable research. Finding a technically feasible solution by downsizing deep learning technology is the first step toward widespread adoption in education. The first contribution of this study is showing that the performer, an efficient version of the transformer, can be improved into a technically feasible solution that offers intelligent learning support within a dozen seconds in the deployment phase.

This study further explores how such an intelligent agent can support learning in educational settings. A human–machine collaborative learning process with the intelligent agent is proposed and evaluated based on the LPM, which reveals the relationships between the agent's technical automation and autonomy features and students' development of self-authorship. The results of the second research question are presented in the following.

1) Nontransparency: Transparency is the ability of an agent system to produce high-quality explanations of its intent, performance, plans, and reasoning [41]. It was shown that increasing the awareness of the decision-making process of an intelligent agent improves the trust relationship between humans and machines [42], [43], [44]. Furthermore, the literature shows that human trust in automated systems is essential to human–machine collaboration [45]. In this study, the transparency problem is addressed by training the agent to deliver hints that can help explain the error-type predictions, enabling students to better understand and locate problems in code, thereby contributing to students' epistemological development. Furthermore, the accuracy of the error predictions and hints helps enhance students' trust in the agent.

However, some students still wished to know why the agent delivered specific feedback. Therefore, other forms of feedback that could directly explain the agent's decisions should be considered. However, nontransparency is an innate characteristic of a neural-based agent; therefore, increasing the transparency of such agents could be challenging. The importance of the interpretability of deep learning models has created an emerging field called eXplainable Artificial Intelligence (XAI) [46]. For example, XAI technology could be applied to enable the agent to display the code segments that lead to a specific type of error. Such feedback could help improve the transparency of the agent, thereby improving the students' trust in the agent.

2) Nondeterminism: It is found that the agent's nondeterministic behavior negatively affects students' trust in the agent. Nevertheless, the nondeterministic characteristics of the agent may help students discover novel things they did not notice, expanding their exploration space. This nondeterministic characteristic of the agent also suggests that keeping student autonomy is essential in the human–machine collaborative learning process. Students should see the agent as a learning partner and learn to evaluate its feedback to decide which to accept or reject, which can help them achieve self-authority.


3) Adaptability: As can be seen from the students’ responses, human–computer interaction is dynamic and individual. Although the current agent adapts to the diversity of student code, it adapted less well to student-modified code across the interactive sessions. Therefore, we recommend adding a new dimension called “responsiveness” to the characteristics of intelligent agents applied in educational settings. Responsiveness focuses on the agent’s adaptability to student responses. It is recommended to find appropriate data sources containing two-way interaction training data to ensure that the agent learns to generate correct feedback for subsequent interactive sessions. For example, in this study, retraining the model on erroneous code annotated with feedback and corrected versions could improve the agent’s responsiveness.

Furthermore, students have different preferences for the type and content of feedback. The agent’s adaptability could be increased if it could adapt to student characteristics, such as cognitive load and trust in the agent [49]. For example, the agent can learn from human–machine interactions when and how long students take to decide whether to accept or reject its feedback, thereby adjusting its internal models of students’ cognitive load and trust and changing its feedback behavior accordingly. Such extensive adaptability can help students build a positive relationship with the agent, thereby improving the human–machine collaborative learning process and promoting students’ self-authorship.

4) Openness: The proposed agent is currently closed to interacting with other agents and to collecting data from multiple sources. Nonetheless, a multiagent design can be expected to augment the agent system with the capability to expand its original input and the flexibility to gather source data to improve the interaction between humans and machines. For example, the agent’s feedback could become more adaptive if it interacted with another agent that models the student’s cognitive load and trust states, as mentioned above.

B. Limitations and Suggestions for Future Research

First, developing deep learning models is a domain-dependent task, and this study focused only on an introductory OOP course using the Java programming language. Therefore, the collected data may suffer from data bias. In addition, only a small population of undergraduates from the programming discipline was considered. Therefore, discretion must be exercised when extending the results to other courses, populations, and disciplines. Nonetheless, the same model development process can be replicated in other programming courses and populations. Second, this study explored how deep learning technology can be integrated with programming education. Therefore, large-scale controlled experiments on the impact of human–machine collaborative learning on learning outcomes are encouraged.

Furthermore, because the predictions of a deep learning model may not be 100% correct, the human–machine interaction may involve more uncertainty caused by the unstable trust between students and agents. Therefore, in the study of human–machine collaborative learning, effective interaction between students and agents and its impact on students’ learning outcomes can be an essential topic for future research [47], [48].

Future research could apply and evaluate the proposed human–machine collaborative learning process combining deep learning technology in different educational settings to explore its pedagogical benefits for teachers and students. For example, the learning process could be extended to a group-based human–machine collaborative learning process in which students in a group learn collaboratively with the agent. In such situations, individuals can get assistance from peers if they have difficulty understanding the agent’s feedback alone.

REFERENCES

[1] X. Chen, H. Xie, D. Zou, and G.-J. Hwang, “Application and theory gaps during the rise of artificial intelligence in education,” Comput. Educ. Artif. Intell., vol. 1, 2020, Art. no. 100002.
[2] A. Vaswani et al., “Attention is all you need,” in Proc. 31st Conf. Neural Inf. Process. Syst., 2017, pp. 6000–6010.
[3] K. Choromanski et al., “Rethinking attention with performers,” in Proc. 9th Int. Conf. Learn. Representations, 2021, pp. 1–38.
[4] P. Urlaub and E. Dessein, “From disrupted classrooms to human-machine collaboration? The pocket calculator, Google translate, and the future of language education,” L2 J., vol. 14, no. 1, pp. 45–59, 2022.
[5] M. Simmler and R. Frischknecht, “A taxonomy of human–machine collaboration: Capturing automation and technical autonomy,” AI Soc., vol. 36, pp. 239–250, 2021.
[6] M. B. Magolda and P. M. King, Learning Partnerships: Theory and Models of Practice to Educate for Self-Authorship. Sterling, VA, USA: Stylus Publishing, LLC, 2004.
[7] H. C. Lane and K. VanLehn, “Teaching the tacit knowledge of programming to novices with natural language tutoring,” Comput. Sci. Educ., vol. 15, no. 3, pp. 183–201, 2005.
[8] H. Keuning, J. Jeuring, and B. Heeren, “A systematic literature review of automated feedback generation for programming exercises,” ACM Trans. Comput. Educ., vol. 19, no. 1, pp. 1–43, 2019.
[9] V. J. Shute, “Focus on formative feedback,” Rev. Educ. Res., vol. 78, no. 1, pp. 153–189, 2008.
[10] J. Hattie and H. Timperley, “The power of feedback,” Rev. Educ. Res., vol. 77, no. 1, pp. 81–112, 2007.
[11] D. Boud and M. Elizabeth, Feedback in Higher and Professional Education. Evanston, IL, USA: Routledge, 2013.
[12] J. R. Anderson and E. Skwarecki, “The automated tutoring of introductory computer programming,” Commun. ACM, vol. 29, no. 9, pp. 842–849, 1986.
[13] S. Gross, B. Mokbel, B. Hammer, and N. Pinkwart, “Learning feedback in intelligent tutoring systems,” Künstliche Intelligenz, vol. 29, no. 4, pp. 413–418, 2015.
[14] R. Singh, S. Gulwani, and A. Solar-Lezama, “Automated feedback generation for introductory programming assignments,” ACM SIGPLAN Notices, vol. 48, no. 6, pp. 15–26, 2013.
[15] W. Wu, G. Li, Y. Sun, J. Wang, and T. Lai, “AnalyseC: A framework for assessing students’ programs at structural and semantic level,” in Proc. IEEE Int. Conf. Control Automat., 2007, pp. 742–747.
[16] N. Truong, P. Roe, and P. Bancroft, “Static analysis of students’ Java programs,” in Proc. 6th Australas. Conf. Comput. Educ., 2004, vol. 30, pp. 317–325.
[17] B. Hartanto and J. Reye, “CSTutor: An intelligent tutoring system that supports natural learning,” in Proc. 4th Annu. Int. Conf. Comput. Sci. Educ.: Innov. Technol., 2013, pp. 19–26.
[18] W. L. Johnson, “Understanding and debugging novice programs,” Artif. Intell., vol. 42, no. 1, pp. 51–97, 1990.
[19] E. Sykes, “Design, development and evaluation of the Java intelligent tutoring system,” Technol., Instruct., Cogn. Learn., vol. 8, no. 1, pp. 25–65, 2010.
[20] W. Jin, T. Barnes, and J. Stamper, “Program representation for automatic hint generation for a data-driven novice programming tutor,” in Proc. Int. Conf. Intell. Tutoring Syst., 2012, pp. 304–309.

[21] K. Rivers and K. R. Koedinger, “Data-driven hint generation in vast solution spaces: A self-improving python programming tutor,” Int. J. Artif. Intell. Educ., vol. 27, pp. 37–64, 2017.
[22] T. Price, R. Zhi, and T. Barnes, “Evaluation of a data-driven feedback algorithm for open-ended programming,” in Proc. Int. Conf. Educ. Data Mining, 2017, pp. 192–197.
[23] F. Jurado, M. Redondo, and M. Ortega, “Using fuzzy logic applied to software metrics and test cases to assess programming assignments and give advice,” J. Netw. Comput. Appl., vol. 35, no. 2, pp. 695–712, 2012.
[24] R. Matloobi, M. Blumenstein, and S. Green, “Extensions to generic automated marking environment: Game-2+,” in Proc. Interactive Comput. Aided Learn. Conf., 2009, vol. 1, pp. 1069–1076.
[25] M. Striewe and M. Goedicke, “A review of static analysis approaches for programming exercises,” in Proc. Int. Conf. Comput. Assist. Assessment Res. E-Assessment, 2014, pp. 100–113.
[26] H. Ueno, “A generalized knowledge-based approach to comprehend Pascal and C programs,” IEICE Trans. Inf. Syst., vol. 83, no. 4, pp. 591–598, 2000.
[27] S. Xu and Y. S. Chee, “Transformation-based diagnosis of student programs for programming tutoring systems,” IEEE Trans. Softw. Eng., vol. 29, no. 4, pp. 360–384, Apr. 2003.
[28] E. Soloway, E. Rubin, B. Woolf, J. Bonar, and W. L. Johnson, “Meno-II: An AI-based programming tutor,” J. Comput.-Based Instruct., vol. 10, no. 1, pp. 1–35, 1983.
[29] W. L. Johnson and E. Soloway, “Intention-based diagnosis of novice programming errors,” in Proc. AAAI Conf. Artif. Intell., 1984, pp. 162–168.
[30] M. Day, M. R. Penumala, and J. Gonzalez-Sanchez, “Annete: An intelligent tutoring companion embedded into the Eclipse IDE,” in Proc. IEEE 1st Int. Conf. Cogn. Mach. Intell., 2019, pp. 71–80.
[31] F. H. Wang, “An efficient performer-based model for automatic feedback generation for teaching object-oriented programming,” in Proc. 14th Annu. Int. Conf. Educ. New Learn. Technol., 2022, pp. 2167–2174.
[32] J. Yi, U. Z. Ahmed, A. Karkare, S. H. Tan, and A. Roychoudhury, “A feasibility study of using automated program repair for introductory programming assignments,” in Proc. 11th Joint Meeting Foundations Softw. Eng., 2017, pp. 740–751.
[33] A. Vizcaíno, “A simulated student can improve collaborative learning,” Int. J. Artif. Intell. Educ., vol. 15, pp. 3–40, 2005.
[34] G. J. Hwang, H. Xie, B. W. Wah, and D. Gasevic, “Vision, challenges, roles and research issues of artificial intelligence in education,” Comput. Educ., Artif. Intell., vol. 1, pp. 1–5, 2020.
[35] G. J. Hwang and C.-Y. Chang, “A review of opportunities and challenges of chatbots in education,” Interactive Learn. Environ., pp. 1–14, Jul. 2021, doi: 10.1080/10494820.2021.1952615.
[36] M. B. Magolda, “Learning partnerships model: A framework for promoting self-authorship,” in Learning Partnerships: Theory and Models of Practice to Educate for Self-Authorship, M. B. Magolda and P. M. King, Eds. Sterling, VA, USA: Stylus Publishing, LLC, 2004, pp. 37–62.
[37] M. B. Magolda and P. M. King, “Toward reflective conversations: An advising approach that promotes self-authorship,” Peer Rev., Emerg. Trends Key Debates Undergraduate Educ., vol. 10, no. 8, pp. 8–11, 2008.
[38] S. Banerjee and A. Lavie, “METEOR: An automatic metric for MT evaluation with improved correlation with human judgments,” in Proc. ACL Workshop Intrinsic Extrinsic Eval. Measures Mach. Transl. Summarization, 2005, pp. 65–72.
[39] F. H. Wang, “A semantics-preserving approach to augmentation of Java codes for training deep learning classification models,” in Proc. 25th TANET, 2019, pp. 277–282.
[40] V. Braun and V. Clarke, “Using thematic analysis in psychology,” Qualitative Res. Psychol., vol. 3, no. 2, pp. 77–101, 2006.
[41] J. Y. Chen, K. Procci, M. Boyce, J. Wright, A. Garcia, and M. Barnes, “Situation awareness-based agent transparency,” United States Army Res. Lab., Adelphi, MD, USA, Tech. Rep. ARL-TR-6905, 2014.
[42] J. E. Mercado, M. A. Rupp, J. Y. C. Chen, M. J. Barnes, D. Barber, and K. Procci, “Intelligent agent transparency in human-agent teaming for multi-UxV management,” Hum. Factors, vol. 58, no. 3, pp. 401–415, 2016.
[43] K. Akash, W. L. Hu, T. Reid, and N. Jain, “Dynamic modeling of trust in human-computer interactions,” in Proc. Amer. Control Conf., 2017, pp. 1542–1548.
[44] W.-L. Hu, K. Akash, T. Reid, and N. Jain, “Computational modeling of the dynamics of human trust during human–machine interactions,” IEEE Trans. Hum.-Comput. Syst., vol. 49, no. 6, pp. 485–497, Dec. 2019.
[45] M. E. G. Moe, M. Tavakolifard, and S. J. Knapskog, “Learning trust in dynamic multiagent environments using HMMs,” in Proc. 13th Nordic Workshop Secure IT Syst., 2008, pp. 135–146.
[46] M. Danilevsky, K. Qian, R. Aharonov, Y. Katsis, B. Kawas, and P. Sen, “A survey of the state of explainable AI for natural language processing,” in Proc. 27th ACM SIGKDD Conf. Knowl. Discov. Data Mining, 2020, pp. 447–459.
[47] K. Akash, K. Polson, T. Reid, and N. Jain, “Improving human-machine collaboration through transparency-based feedback – Part I: Human trust and workload model,” IFAC-PapersOnLine, vol. 51, no. 34, pp. 315–321, 2019.
[48] K. Akash, K. Polson, T. Reid, and N. Jain, “Improving human-computer collaboration through transparency-based feedback – Part II: Control design and synthesis,” IFAC-PapersOnLine, vol. 51, no. 34, pp. 322–328, 2019.
[49] G. J. Hwang, H. Y. Sung, S. C. Chang, and X. C. Huang, “A fuzzy expert system-based adaptive learning approach to improving students’ learning performances by considering affective and cognitive factors,” Comput. Educ., Artif. Intell., vol. 1, 2020, Art. no. 100003.

Feng Hsu Wang received the M.S. and Ph.D. degrees in computer science and information engineering from National Taiwan University, Taipei, Taiwan, in 1988 and 1993, respectively.
In 1995, he joined the Graduate School of Information Science and Engineering, Ming Chuan University, Taoyuan City, Taiwan, where he is currently a Professor. His research interests include deep learning technology, educational technology, and pedagogy research.
