You are on page 1of 13

The Internet and Higher Education 54 (2022) 100855

Contents lists available at ScienceDirect

The Internet and Higher Education


journal homepage: www.elsevier.com/locate/iheduc

Using trace data to enhance Students' self-regulation: A learning


analytics perspective
Dan Ye a, *, 1, Svoboda Pennisi b
a
Department of Career and Information Studies, University of Georgia, Athens, GA 30602, United States of America
b
Department of Horticulture, University of Georgia-Griffin, Griffin, GA 30223, United States of America

A R T I C L E I N F O A B S T R A C T

Keywords: The purpose of this study was to investigate whether students' self-reported SRL align with their digital trace data
Self-regulated learning collected from the learning management system. This study took place in an upper-level college agriculture
Online learning environments course delivered in an asynchronous online format. By comparing online students' digital trace data with their
Digital trace data
self-reported data, this study found that digital trace data from LMS could predict students' performance more
Self-reported self-regulated learning data
Cluster analysis
accurately than self-reported SRL data. Through cluster analysis, students were classified into three levels based
on their self-regulatory ability and the characteristics of each group were analyzed. By incorporating qualitative
data, we explored possible explanations for the differences between students' self-reported SRL data and the
digital trace data. This study challenges us to question the validity of existing self-reported SRL instruments. The
three-cluster division of students' learning behaviors provides practical implications for online teaching and
learning.

1. Introduction reported surveys (Winne & Perry, 2000). Self-reported data from stu­
dents have been criticized as lacking validity (Hadwin, Nesbit,
Online education has been growing tremendously in the past decade Jamieson-Noel, Code, & Winne, 2007; Winne, 2010; Zimmerman,
(Van Rooij & Zirkle, 2016), and it has been playing a dominant role in 2008). One possible solution to address this issue is to use learners' trace
education during the coronavirus pandemic. Despite the popularity of data collected by learning management systems as a supplement to the
online education, not all students are equally successful in asynchronous self-reported SRL data. Traces are defined as “observable indicators
online courses. The situation has been even worse during the corona­ about cognition that students create as they engage with a task” (Winne
virus pandemic because most students have had no choice but to take & Perry, 2000, p. 551). Recent studies (Hwu, 2003; Yu & Zhao, 2015)
their courses online. Dray, Lowenthal, Miszkiewicz, Ruiz-Primo, and have indicated that online students' behavioral data are more accurate
Marczynski (2011) indicated that students' personal traits of self- because the data collected from modern tracking technologies occur in
direction and initiative are significant predictors of online learners' actual learning situations in real-time. Learners may be aware of the
success. Recent studies also demonstrate that in order for learners to data collection taking place, but it is relatively unobtrusive and difficult
succeed in online courses, they must have the capacity to regulate their for learners to alter, so one could assert that more authentic learning
learning (Hew & Cheung, 2014; Kizilcec & Schneider, 2015). With the behaviors can be recorded on a large scale using this approach. Winne
continuous growth of online courses and online programs offered by and Perry (2000) proposed two different conceptualizations of SRL: as
higher education, it is important to understand online students' self- an aptitude and as an event. Winne (2010) believed that self-reported
regulated learning (SRL) processes so that we can implement strate­ SRL should be considered as an aptitude and trace data could be
gies to enhance students' self-regulation abilities and thus improve their treated as an event. Trace data becomes the raw material for researchers
academic performance. to track aptitudes “in action” and how aptitudes may evolve as students
Although numerous studies about SRL have been conducted in online make progress in their studies.
learning environments, existing research has heavily relied on self- The purpose of this study is to investigate whether students' self-

* Corresponding author.
E-mail addresses: dny8514@uga.edu (D. Ye), bpennisi@uga.edu (S. Pennisi).
1
Present address: 219 Van Pelt and Opie Library, Michigan Technological University, Houghton, MI 49931, United States of America.

https://doi.org/10.1016/j.iheduc.2022.100855
Received 13 January 2022; Received in revised form 11 April 2022; Accepted 11 April 2022
Available online 15 April 2022
1096-7516/© 2022 Elsevier Inc. All rights reserved.
D. Ye and S. Pennisi The Internet and Higher Education 54 (2022) 100855

reported SRL aligns with their behavior as indicated by the digital trace she aims to achieve, and his or her related prior knowledge and expe­
data collected through the learning management system. We hope that rience. Goal setting plays a critical role in this phase (Zimmerman, 2000,
this study will help inform how trace data can be used to enhance online 2008). Pintrich (2000) postulated that learners in general have two
teaching and learning. The research questions posed are as follows: major goal orientations: mastery and performance. Learners with
mastery approach goals aim at improving their skills and knowledge
(1) How do the digital trace data collected by the learning manage­ while learners with performance approach goals are more concerned
ment system reflect the students' self-reported SRL? with outperforming others. In the monitoring phase, learners oversee
(2) What is the relationship between students' performance and the their own learning progress, motivations, effort, cognitive strategies,
digital trace data and self-reported SRL data? and the learning environment. During the controlling phase, learners
(3) What are the patterns of learning behaviors based on the digital adjust their cognitive and motivational strategies, decide to increase or
trace data and self-reported SRL data? decrease effort, choose better learning environments, etc. In the reaction
(4) What are the explanations for any differences between the stu­ and reflection phase, learners execute the selected cognitive and moti­
dents' self-reported SRL data and the digital trace data? vational strategies and then evaluate if their reactions are effective or
not. Reactions are mainly reflected in time and effort patterns that
2. Theoretical framework learners spend studying the learning task. Learners also make attribu­
tions for their performance and make reflections in this phase.
Several SRL models have been proposed in existing self-regulated
learning literature, including Zimmerman's cyclical phases model 3. Literature review on SRL-based learning analytics
(Zimmerman, 2000), Pintrich's SRL model (Pintrich, 2000), Boekaerts'
dual processing model (Boekaerts, 2011), the Conditions, Operations, Given the volume of data that learning management systems collect,
Products, Evaluations, and Standards (COPES) model (Winne & Hadwin, it can be difficult to decide which learning behavior variables to focus
1998), and Efklide's Metacognitive and Affective Model of SRL (MASRL) on. A review of the existing literature provides some guidance. This
(Efklides, 2011). SRL is a consistent adaptive process, the momentary section will also explore what existing research has been done regarding
trace data collected by the LMS can change dynamically and contextu­ the alignment between self-reported SRL and the trace data.
ally; however, long-term trace data can reflect students' SRL ability
when the content domain is fixed. Therefore, the trace data reveal the
SRL process (through momentary trace data) and inform the SRL results 3.1. Existing research on learning behavioral variables
(through long-term trace data).
In view of this SRL model development, Pintrich's SRL model was A summary of learning behavioral variables used in existing related
selected to serve as the theoretical framework of this study. Pintrich's studies was analyzed. The most commonly used learning behavior var­
SRL model is a general process model, which was developed based on an iables are study regularity (Kim, Yoon, Jo, & Branch, 2018; Li, Baker, &
extension of Zimmerman's cyclical phases model. It has been widely Warschauer, 2020; Li, Flanagan, Konomi, & Ogata, 2018; You, 2016),
used and its process-oriented nature fits well with that of digital trace procrastination (Colthorpe, Zimbardi, Ainscough, & Anderson, 2015; Li
data. Pintrich's self-regulated learning model (Fig. 1) includes four et al., 2018; Li et al., 2020; You, 2016), time investment (Gelan et al.,
phases: 1) forethought, planning, and activation, 2) monitoring, 3) 2018; Kim et al., 2018; Li et al., 2020; You, 2016), completion (Li et al.,
control, and 4) reaction and reflection. In the forethought, planning, and 2018; You, 2016), and help-seeking (Kim et al., 2018). The definition of
activation phase, the learner forms a plan on how to achieve the learning these terms in different studies varies slightly. Study regularity generally
goals based upon the perceived upcoming learning tasks, the goals he/ refers to the frequency of the student accessing various learning mate­
rials or LMS login frequency. Procrastination usually measures whether

Fig. 1. Pintrich's self-regulated learning model. Adapted from Pintrich (2004), p.390.

2
D. Ye and S. Pennisi The Internet and Higher Education 54 (2022) 100855

students submit the assignments on time or whether they study the Table 1
learning materials on time. Time investment often refers to the total time Matching map between the SRL phases, learning behavior data in LMS, and
students spend studying the course or completing a task. Completion OSLQ.
usually measures the portion of the course or the task students complete. Pintrich's SRL model Learning behavioral variable in LMS OSLQ scale
Help-seeking generally measures how many times students reach out to phase (code) (code)*
other learners or the instructor for help. However, most of these studies Forethought, Time spent on reviewing the syllabus Goal-setting (GS)
did not provide a solid pedagogy or theory support for why these planning, and (SylTim) and the rubrics (RubTim); if
learning behavioral variables were chosen. The researchers generally activation the syllabus has been downloaded
(SylDow)
chose these learning behavioral variables based on other existing
Syllabus (SylFre) and rubrics (RubFre)
studies. visit frequency
Monitoring Course logins (LoginM), total item Task strategies
3.2. Existing research on the comparison between self-reported SRL and visits (TotVisM), topic visits (TS)
the trace data (TopVisM) per module
Average days accessing the course per
module (DayVisM)
Some studies compared the academic prediction power between Control Average time spent on each module Time
these two types of data and found that digital trace data are more (TimeM) and additional resources management
powerful than self-reported SRL in predicting academic performance (AddRes) (TM)
Number of threads (#DisThr) and
(Cho & Yoo, 2017; Hadwin et al., 2007; Li et al., 2020). Cho and Yoo
replies (#DisRep) created in online
(2017) used a classification model to predict students' final grades based discussions; number of posts read
on self-reported SRL survey data and log files and found that the accu­ (#PostRea)
racy of correctly classified instances (58.33%) of the prediction model Lecture completion rate (LecCom)
from log attributes was higher than that of the prediction model from Average number of late submissions
per module (LatSubAv); Average
self-reported SRL (41.67%). Using regression analysis, Pardo, Han, and hours submitting assignments before
Ellis (2017) found that the variation of the students' final scores for their the deadline (Advance)
course is better explained when combining self-reported SRL data with Average module completion rate
the digital trace data of seven student engagement events. However, before the deadline (MComAv)
Reaction and Number of total questions asking to Help-seeking
there is limited research investigating whether students' self-reported
reflection the instructor(s) (#QSum) and asking (HS)
SRL aligns with their learning behavior as indicated by digital trace for an extension (#AskExt)
data, and how to interpret digital trace data in light of self-reported SRL Number of times asking the instructor
data. Li et al. (2020) used trace data to measure time management and (s) questions excluding asking for
effort regulation aspects of SRL and found a significant association be­ extension (#QAskIns)
Average number of revisiting a Self-evaluation
tween the trace data measures and students' self-reported time man­ module after the grades are published (SE)
agement and effort regulation after the course. However, their study was (Revisit)
only focused on two aspects of SRL. The current study aims at comparing Optional self-assessments (SAssCom)
students' trace data with that of self-reported SRL data from all aspects in and bonus quiz (BQCom) completion
rate
the hope of providing a comprehensive understanding of how the trace
Average number of retaking quizzes
data correspond to self-reported SRL measures. (RetakeQ)
*
OSLQ includes another scale named Environmental Structuring (ES), which
4. Learning behavior variables proposed in this study
has no corresponding learning behavior data, so it is excluded from the table.

Based on a review of the literature, the majority of related studies


have not provided a solid pedagogy or theoretical support for why the etc.
learning behavioral variables were chosen. Several studies pointed out
that it is critical to align the data collection with SRL model (Siadaty, 5. Methods
Gasevic, & Hatala, 2016; Yu & Zhao, 2015). In addition, several re­
searchers have argued that when doing learning analytics research, data Existing research mainly focuses on quantitative data, such as the
should be interpreted from the learners' perspective (Ferguson, 2012) clickstream data or the interaction between the learner and the content.
and that the results should provide actionable recommendations The current study includes the interaction data between the learner and
(Gasevic, Dawson, & Siemens, 2015). One key reason why existing the content, the learner and other learners, and the learner and the
learning analytic research receives such criticisms is that we do not have instructor. In addition, both quantitative and qualitative research
a solid pedagogy or theory support for data collection and results methods were used in this study because both types of data can support
explanation. Therefore, in this study, a theory-based data collection is and supplement each other while also serving different purposes.
proposed in Table 1. Using Pintrich's (2000) cyclical SRL model, we Maxwell (2013) pointed out that qualitative and quantitative methods
developed a matching map between the SRL phases and the corre­ are not simply different ways of doing the same thing, but are best used
sponding learning behavioral data in LMS. We noticed that some stu­ to address different kinds of questions and goals. In this study, quanti­
dents chose to download some learning materials and read them offline, tative data were used to address the “what” questions, while qualitative
implying the trace data may not reflect the actual time they spent data were used to address the “why” question. Although data gathered
studying. For this reason, the study monitored whether the learner within the learning management system are authentic and provide
downloaded any files and if so, that data were collected and included in critical information about learners' learning behaviors, the behavioral
the analysis. Online Self-Regulated Learning Questionnaire (OSLQ: data are part of a more complex learning process. Pardo, Ellis, and Calvo
Barnard, Paton, & Lan, 2008) was used to collect students' self-reported (2015) used a mixed research method and argued that the meaning of
SRL data in this study, so its scales were also included in the matching learning analytics is improved when combining quantitative data with
map. OSLQ includes six scales and a total of 24 items. Each scale has qualitative data. Therefore, this study combined self-reported SRL
three to six items. For example, the goal setting scale includes items measurement results with LMS behavioral data and qualitative inter­
about whether the learner sets short-term and long-term goals, and view data in the hope that they can overcome the inherent limitation of
whether the learner sets standards for his/her learning in online courses, each type of methodology and provide a more complete picture.

3
D. Ye and S. Pennisi The Internet and Higher Education 54 (2022) 100855

5.1. Research context contextual explanations for the differences between the trace data and
the self-reported data using qualitative data.
The data were collected from an upper-level, undergraduate/grad­
uate split-level science course, cross-listed with horticulture and crop 5.2.2. Data collection process
and soil sciences, and offered through a College of Agriculture and Students were invited to participate in the study voluntarily in the
Environmental Sciences. The course was taught in an asynchronous first week of the course through a course announcement. Once the
online format. The content of the course is core to both disciplines as participants agreed, they were asked to complete the OSLQ. During the
well as required for several graduate programs. However, approximately semester, the learning management system recorded learners' behavior
half of the students who registered for the course major in other biology data automatically. Data from the LMS was collected based on the
disciplines and take the course as an upper-level science elective. The module length, weekly or biweekly. Some scholars (Li et al., 2020) argue
course was chosen for this study because it has had a stable large that it is problematic to use self-report questionnaires before a course
enrollment in the past 5 years with a diverse student population. The starts and researchers should gather data during the interventions to
content is organized into modules. Each module has required readings, document whether aptitudes change. Therefore, in this study, students
lectures, additional resources, some optional self-assessments, a quiz, were asked to complete the OSLQ again in the last week of the course. By
and an online discussion or assignment. The final grades are calculated the end of the course, learners' completion data of each individual item,
based on 10 quizzes (35.3% of the final grade), four practical lab written interaction data in online discussions, interactions with the instructor,
assignments (33% of the final grade), six online discussions (28.2% of login history, time spent on reviewing content, final grades, etc. were
the final grade), and one applied concept activity (3.5% of the final recorded in an Excel file and used for data analysis. The randomly
grade). The quizzes cover basic knowledge students need to master, selected 20 participants were contacted and 11 participated in the video
while the practical lab written assignments, online discussions, and conferencing interview. The length of the interviews ranged from 30 to
applied concept activity require students to observe, perform hands-on 50 min. One of the authors is the instructor of the course. All the data
experiments, record and analyze data, interpret their findings, and collection and analysis were conducted by the other author and the
explain their understanding by applying the concepts they learned in the instructor was not involved nor given any information while the study
course. Based on the structure and components of the grading policy, was ongoing. Participants communicated only with the researcher
final grades measure both students' knowledge understanding and responsible for data collection and analysis.
application and cover both low and high levels of cognitive learning.
5.3. Data analysis
5.1.1. Participants
Participants were students who registered for this course. They came Both quantitative and qualitative data analysis methods were used in
from various majors and included both undergraduates (majority) and this study. Table 2 presents an overview of the data analysis methods
graduates. A total of 91 students were enrolled in the course. Of these, used in this study within the framework of the study's research
67 students agreed to participate in the study, but one student withdrew questions.
in the middle of the semester and another student did not complete the
SRL post-course survey. Among these 65 participants, there were 20 5.3.1. Quantitative data analysis
identified as male and 45 identified as female. A total of 47 were un­ Trace data were organized in an Excel file, and the average values
dergraduate students and the remaining 18 were graduate students. were calculated for each learning variable of the trace data and each
Among the 65 participants, 29 students indicated they were willing to scale of the self-reported SRL data.
participate in the interview at the beginning of the course. By the end of Pearson correlations were used to explore how the trace data re­
the semester, 20 students were randomly selected among these 29 and flected students' self-reported SRL data, as well as to compare the pre-
invited to participate in the interview. A total of 11 eventually partici­ course with the post-course self-reported SRL data. In addition, linear
pated in the interview. regression analysis and feature selection methods were used to construct
prediction models and identify key attributes from self-reported SRL
5.2. Data collection data and the digital trace data. Three methods were used to identify key
features: stepwise, forward, and backward.
5.2.1. Instruments Cluster analysis was used to analyze both the trace data and the self-
Although the use of Motivated Strategies for Learning Questionnaire reported SRL data. Cluster analysis is a class of techniques that are used
has a long history and is the most widely used self-report questionnaire
for self-regulated learning, it has a large number of items, making its use Table 2
to collect data impractical in an online course. In contrast, the OSLQ is An overview of the data collection and analysis methods.
short and concise, which led to the assumption that it would yield higher Research questions Data collection Analysis
completion rates and therefore more data. Thus, the OSLQ was used to methods methods
collect students' self-reported self-regulation data (Table 1). It consists of
RQ1: How do the trace data reflect the self- 1.1 Recorded data Correlation
24 items in six scales using a 5-point Likert response format ranging from reported SRL data? from LMS
strongly agree (5) to strongly disagree (1). In addition, it is an SRL 1.2 OSLQ Pre- and
questionnaire specifically designed for online learning environments, post-course
which fits well with the context of the study. Most importantly, the RQ2: What's the relationship between 1.1 Recorded data Regression
performance and the trace data and self- from LMS analysis
available psychometric data show it to be a reliable and valid way to reported SRL data? 1.2 OSLQ Pre- and Feature
assess the self-regulatory learning skills of students in both blended and post-course selection
online courses (Barnard, Lan, To, Paton, & Lai, 2009). RQ3: What learning patterns are present 1.1 Recorded data Cluster
A semi-structured interview was conducted to focus on students' based on the trace data? from LMS analysis
1.2 OSLQ Pre- and
typical self-regulated learning behaviors while taking this course. Spe­
post-course
cifically, participants were asked to recall “what they typically did while RQ4: What are the explanations for any 1.1 Recorded data Correlation
taking this course” based on the six scales of self-regulated learning: goal differences between their self-reported from LMS Thematic
setting, environmental structuring, task strategies, time management, SRL data and the digital trace data? 1.2 OSLQ Pre- and Analysis
help-seeking, and self-evaluation. An interview protocol was used dur­ post-course
1.3 Interview
ing the interview. The purpose of the interview was to explore possible

4
D. Ye and S. Pennisi The Internet and Higher Education 54 (2022) 100855

to classify a set of data objects based on similarity and dissimilarity Table 3


without prior information about the cluster membership for any of the Comparison of the pre- and post-course self-reported SRL data.
objects. In this study, the K-means clustering method was used to group OSLQ scale Item # Mean Std Dev Cronbach's Alpha Correlation
students based on relatively homogeneous learning behavioral patterns.
GS pr 5 4.258 0.555 0.772 0.473
It is a classification method to group points by computing the distance GS po 5 4.305 0.511 0.751
between points and group centers. All the values of these variables were ES pr 4 4.220 0.585 0.768 0.510
standardized by converting them into z-scores, and then the K-mean ES po 4 4.292 0.646 0.839
clustering analysis was conducted using the SPSS® software. After the TS pr 4 3.139 0.706 0.592 0.450
TS po 4 3.169 0.679 0.512
completion of the cluster analysis, the characteristics of each group of TM pr 3 3.569 0.676 0.424 0.522
learners were summarized. Finally, the results were compared with that TM po 3 3.590 0.790 0.607
of the self-reported SRL data to analyze similarities and differences. HS pr 4 3.196 0.705 0.553 0.428
HS po 4 2.919 0.777 0.677
SE pr 4 3.268 0.829 0.686 0.545
5.3.2. Qualitative data analysis
SE po 4 3.223 0.771 0.729
The interview data were transcribed and analyzed using a thematic
analysis method (proposed by Braun & Clarke, 2006) as follows. The Note. See Table 1 for the full meaning of the OSLQ scale abbreviations. The pr
topics of the interview data were categorized and coded, and the codes and po indicate pre- and post-course survey results respectively.
were assigned to themes based on the following process: (1) familiar­
ization with the data; (2) generating initial codes; (3) searching for review of the correlations of the pre- and post-course data shows a range
themes; (4) reviewing themes; (5) defining and naming themes; and (6) of between 0.428 and 0.545 (Table 3). The critical r (two-tailed) value is
producing the report. The interview recordings were first transcribed. 0.244 (p < .05) and 0.399 (p < .001) with a sample size n = 65.
After that, the transcribed data were read repeatedly while coding each Therefore, students' self-reported SRL data are positively correlated.
topic, and the codes were categorized into themes. The audio recording According to the guidelines for social sciences proposed by Cohen
was also listened to as needed to pick up nuances in meaning based on (1988), the effect size is medium.
the participants' voice inflections and tones. Matrices strategy was used To explore statistical significance between the means of pre- and
to list the topics in a summary format for each participant in an Excel post-course self-reported SRL data, both paired samples test and Wil­
file. Then, after reading across all these participants, common themes coxon signed rank tests were conducted according to the results of
were found and marked. While analyzing the common themes, an effort normality tests. The results show that there are no significant mean
was made to align them with the six scales of the OSLQ. Finally, these differences between the pre- and post-course self-reported SRL data
themes were reported. except the help-seeking scale (Table 4). The results indicate that stu­
dents thought they would be more likely to ask for help in the pre-course
5.4. Validity and reliability survey than in the post-course survey.
To better answer the question of how the trace LMS data reflect the
Different approaches were used to ensure the validity and reliability self-reported SRL data, correlation analysis of the trace data (pre- and
of both quantitative and qualitative research methods used in this study. post-course) and self-reported SRL data were conducted using rcritical =
The OSLQ was developed from an 86-item pool and reduced to 24 items .244, p<.05 and rcritical = .399, p < .001 (Table 5). The environment
after examining their internal consistency and exploratory factor anal­ structuring scale was excluded from the data analysis due to no corre­
ysis results based on data collected. Barnard et al. (2009) has shown that sponding trace data. More significant correlations were found between
the instrument is reliable and valid to assess the self-regulatory learning the learning behavior data and post-course self-reported SRL data than
skills of students in both blended and online courses. that of the pre-course data. Based on the results shown in Table 5, it is
The behavioral data automatically recorded from the LMS were apparent that the following three learning behavior variables aptly
reviewed carefully for accuracy, and both the elbow method and reflect students' self-reported SRL data: (1) bonus quiz completion rate;
silhouette method were performed to validate cluster analysis results. (2) average time spent on each module; and (3) quiz retaken rate. The
For the qualitative interview, the same questions were consistently reason is that they are significantly positively correlated to at least two
posed to all interviewees, except some additional questions were added self-reported SRL scales. There are another seven learning behavior
after the first interview in order to more closely align them with trace variables that are significantly positively correlated to at least one self-
data. The process was recorded in detail, and to enhance the validity of reported SRL scale: (1) time spent on viewing rubrics, (2) course logins
the qualitative research process, all memos, drafts, notes, and analytic per module, (3) average days accessing the course per module, (4) the
notes were maintained in an organized file and are available upon number of discussion threads created, (5) the number of discussion re­
request. plies created, (6) average hours submitting assignments before the
deadline, and (7) module completion rate before the deadline. While the
6. Results majority of correlations are positive, some are negative. It is not sur­
prising that the average late submission rate is negatively correlated
6.1. RQ1: how do the trace data reflect the self-reported SRL data? with the goal setting of the post-course survey (r(64) = − 0.253, p < .05),
but other negative correlations are interesting and worth some discus­
The self-reported SRL data were collected twice: once at the begin­ sion. First, students' self-reported help-seeking scale (post-course) is
ning of the course (pre-course) and again at the end of the course (post- negatively correlated with the average lecture completion rate (r(64) =
course). Based on the OSLQ, six perspectives of self-regulated learning − 0.326, p < .05) as well as the average time spent on additional
data were self-reported by the participants, including goal setting,
environment structuring, task strategies, time management, help-
Table 4
seeking, and self-evaluation. The question is whether the students'
Mean differences of pre- and post-course self-reported SRL data.
self-reported SRL data collected before they began the course signifi­
Scale GS ES TM TS SE HS
cantly differ from their self-reported SRL data collected after the course
ended. In order to compare these two sets of scores, the mean score of Test Wilcoxon signed rank tests Paired samples tests
each scale was calculated. Table 3 shows the comparison of the pre- and Sig. 0.497 0.358 0.952 0.734 0.671 0.003*

post-course self-reported SRL data. An informal inspection indicates that Note. See Table 1 for the full meaning of the OSLQ scale abbreviations.
the means and standard deviations of each scale appear similar. A *
p < .05.

5
D. Ye and S. Pennisi The Internet and Higher Education 54 (2022) 100855

Table 5
Correlations of the learning behavior variables and self-reported SRL data.
Variables GSPr GSPo TSPr TSPo TMPr TMPo SEPr SEPo HSPr HSPo

SylFre 0.136 0.117 0.078 0.002 0.090 0.133 − 0.001 0.059 − 0.031 − 0.005
SylTim − 0.127 − 0.004 − 0.160 ¡0.250* 0.080 − 0.073 − 0.167 − 0.139 − 0.014 − 0.085
SylDow 0.145 − 0.039 0.196 − 0.015 0.123 0.131 0.126 0.068 0.151 0.029
RubFre 0.160 0.185 0.101 − 0.106 0.038 − 0.041 0.147 − 0.115 0.081 0.111
RubTim 0.214 0.047 0.094 − 0.071 0.065 − 0.044 0.254* − 0.060 0.232 − 0.032
LoginM 0.042 0.244* − 0.125 ¡0.255* 0.018 0.168 0.082 0.094 0.119 0.109
DayVisM 0.193 0.315* 0.113 − 0.010 0.219 0.247 0.080 − 0.140 0.074 0.030
TopVisM 0.202 0.202 0.072 0.180 0.143 0.049 − 0.070 ¡0.250* 0.023 − 0.145
TotVisM 0.175 0.233 0.058 − 0.008 0.072 0.005 − 0.036 ¡0.348* − 0.034 − 0.159
TimeM 0.261* 0.214 0.106 0.077 0.250* 0.180 − 0.121 − 0.138 − 0.113 − 0.100
AddRes 0.177 0.084 0.047 − 0.018 0.044 − 0.104 0.174 − 0.119 0.123 ¡0.269*
#PostRea 0.115 0.166 0.060 0.114 0.017 0.045 0.056 − 0.044 0.078 0.178
#DisThr 0.364** 0.195 0.155 0.145 0.215 0.084 0.075 − 0.062 0.096 − 0.002
#DisRep 0.214 0.287* 0.170 0.154 0.111 0.038 0.014 − 0.120 0.040 0.016
LecCom 0.085 0.086 − 0.057 − 0.098 0.105 − 0.040 − 0.160 ¡0.387* − 0.079 ¡0.326*
LatSubAv − 0.106 ¡0.253* 0.058 − 0.005 − 0.018 − 0.053 − 0.060 − 0.041 − 0.048 − 0.023
Advance 0.112 0.322* − 0.078 0.163 0.131 0.123 − 0.118 − 0.204 − 0.016 − 0.020
MComAv 0.206 0.248* 0.092 0.215 0.156 0.089 − 0.010 − 0.169 0.039 − 0.102
#QAskIns 0.005 − 0.091 − 0.178 − 0.012 0.060 0.068 − 0.073 − 0.099 − 0.133 − 0.098
#AskExt − 0.053 − 0.129 0.070 0.038 0.126 0.189 0.074 0.063 − 0.032 − 0.087
RetakeQ 0.246* 0.173 0.108 0.118 0.141 0.207 0.034 0.293* 0.057 0.168
Revisit 0.002 − 0.130 − 0.040 − 0.155 − 0.070 − 0.041 − 0.067 − 0.201 0.026 − 0.059
SAssCom 0.185 0.133 − 0.091 0.073 0.119 0.077 − 0.148 − 0.143 0.028 − 0.164
BQCom 0.238 0.247* 0.250* 0.263* 0.236 0.295* 0.125 0.004 0.219 0.109

Note. See Table 1 for the full meaning of the OSLQ scale abbreviations and the variable codes. Pr and Po indicate pre- and post-course survey results respectively.
Numbers in bold indicate statistically significant correlations.
*p < .05 ** p < .001

resources (r(64) = − 0.269, p < .05). Secondly, students' self-reported an R2 of .568, while the backward method ended with a different model
self-evaluation scale (post-course) is negatively correlated with the [F(9, 55) = 13.451, p < .001], with an R2 of .688. Both models include
average topics visited per module (r(64) = − 0.250, p < .05), total items five predictors: (1) the average number of late submissions, (2) the
visited per module (r(64) = − 0.348, p < .05), and lecture completion average number of discussion replies created, (3) the average course
rate (r(64) = − 0.387, p < .05). Lastly, students' self-reported task stra­ logins per module, (4) the average number of discussion posts read, and
tegies scale (post-course) is negatively correlated with the average time (5) the average number of discussion threads created. Besides the five
spent on viewing the syllabus (r(64) = − 0.250, p < .05) and login per key predictors used in the first model, four additional variables were
module (r(64) = − 0.255, p < .05). included in the second model as predictors: (1) self-assessment
Overall, the LMS trace data correlate with the self-reported SRL data completion rate, (2) average time spent on each module, (3) average
to some degree. The LMS learning behavior data correlated well with the module completion rate before the deadline, and (4) average topics
goal-setting scale in the OSLQ but did not correlate well with the task visited per module.
strategies and time management scales. The worst were the help-seeking
and self-evaluation scales because at least two negative correlations 6.2.2. The relationship between performance and the self-reported SRL data
existed between these two scales and the trace data. It also shows that Multiple linear regression was used to predict final grades using all
most of the time management learning behavior variables are positively the self-reported SRL data including both the pre- and post-course sur­
correlated with the goal-setting of the OSLQ. vey data. No significant regression equation was found. Three feature
selection methods (stepwise, backward, and forward) were used to
6.2. RQ2: what is the relationship between performance and the trace eliminate unnecessary predictors and they all ended with the same
data and self-reported SRL data? model [F(1, 63) = 7.625, p < .05], with an R2 of .108. The model in­
cludes only one predictor - the mean rate of the goal setting in the post-
6.2.1. The relationship between performance and the trace data course survey.
Multiple linear regression was conducted to predict final grades The results show that the trace data could explain around 73% of the
based on all the learning behavioral data (Table 6). A significant variance of students' final grades, while the self-reported SRL data could
regression equation was found [F(24, 40) = 4.454, p < .001], with an R2 only explain around 11% variance of students' final grades. In summary,
of .728. Because the learning behavior data include 25 variables, three it appears that students' trace data collected from LMS are much more
feature selection methods were used to eliminate unnecessary pre­ powerful than students' self-reported SRL data in predicting students'
dictors: stepwise, backward, and forward. The stepwise and forward final grades.
methods ended with the same model [F(5, 59) = 15.484, p < .001], with
6.3. RQ3: what are the patterns of learning behaviors based on the trace
Table 6 data and self-reported SRL data?
Multiple linear regression results of three feature selection methods.
Method Variables R2 F Sig. Using standardized trace data, the K-means cluster analysis was
Stepwise/ LatSubAv, #DisRep, LoginM, 0.568 F(5, 59) p<
conducted based on two assumptions: (1) students with high or low self-
Forward #PostRea, #DisThr = 15.484 0.001 regulatory ability, or (2) students with high, moderate, or low self-
Backward #PostRea, LoginM, #DisThr, 0.688 F(9, 55) p< regulatory ability. These assumptions are based on existing SRL
SAssCom, TimeM, LatSubAv, = 13.451 0.001 models (Pintrich, 2000; Zimmerman, 2013) and literature reported by
#DisRep, MComAv, TopVisM
other researchers (Kim et al., 2018).
Note. See Table 1 for the full meaning of the variable codes. Fig. 2 shows the final cluster centers of two clusters based on

6
D. Ye and S. Pennisi The Internet and Higher Education 54 (2022) 100855

Fig. 2. Results of two cluster assumption using students' trace data. Cluster 1 represents low self-regulated learners (76% of the behavioral learning variables with
negative cluster centers) and cluster 2 represents high self-regulated learners (76% of the behavioral learning variables with positive cluster centers).

students' trace data collected from LMS: these students can be catego­ self-regulated learners, and 36 were classified as low self-regulated
rized into low self-regulated learners (cluster 1–76% of the behavioral learners. Based on the ANOVA results, among the 25 behavioral
learning variables with negative cluster centers) and high self-regulated learning variables, 14 had significantly different cluster centers in the
learners (cluster 2–76% of the behavioral learning variables with posi­ two clusters. The significant results show that students with low self-
tive cluster centers). Among the 65 students, 29 were classified as high regulatory ability tend to submit assignments late and ask for an

Fig. 3. Results of three cluster assumption using students' trace data. Cluster 1 represents low self-regulated learners (20% of the behavioral learning variables with
positive cluster centers), cluster 2 represents moderate self-regulated learners (52% of the behavioral learning variables with positive cluster centers), and cluster 3
represents high self-regulated learners (76% of the behavioral learning variables with positive cluster centers).

7
D. Ye and S. Pennisi The Internet and Higher Education 54 (2022) 100855

extension more often, while students with high self-regulatory ability remaining 40 were high self-regulated learners. The three-cluster results
tend to access rubrics, the course, topics, and items more frequently, categorized these students into: (1) low self-regulated learners who have
spend more time accessing the course per module, post more discussion negative cluster centers in all five SRL scales; (2) moderate self-
threads and replies, have higher lecture and unit completion rate before regulated learners who have negative cluster centers in goal setting,
the deadline, start to work on assignment earlier, and have higher self- task strategies, and time management but positive cluster centers in self-
assessment and bonus quiz completion rate. The results also show that evaluation and help-seeking; (3) high self-regulated learners who have
although there are no significant differences, students with low self- positive cluster centers in all five SRL scales. Among the 65 participants,
regulatory ability tend to access the syllabus more frequently and 29 were classified as high self-regulated learners, another 23 were
spend more time viewing the syllabus, but students with high self- moderate self-regulated learners, and the remaining 13 students were
regulatory ability tend to download the syllabus. low self-regulated learners. The ANOVA results indicate that all ten of
Using the second cluster assumption, three cluster analysis results the SRL scales had significantly different cluster centers in the three
indicated that students could be categorized into low (cluster 1–20% of clusters.
the behavioral learning variables with positive cluster centers), mod­ Based on the cluster analysis of both LMS trace data and self-reported
erate (cluster 2–52% of the behavioral learning variables with positive SRL data, similar learning behavior patterns were found among low,
cluster centers), and high (cluster 3–76% of the behavioral learning moderate, and high self-regulated learners although the trace data re­
variables with positive cluster centers) self-regulated learners (Fig. 3). sults were not as straightforward as the self-reported SRL data. However,
Among the 65 participants, 20 were classified as high self-regulated upon further examination on the results of these two types of data
learners, 20 students were moderate self-regulated learners, and the analysis, only 53.846% and 30.769% of cases were classified into the
remaining 25 students were low self-regulated learners. Based on the same cluster in the two-cluster and three-cluster analysis respectively.
ANOVA results, among the 25 behavioral learning variables, 18 had This indicates that although the trace data and self-reported SRL data
significantly different cluster centers among the three clusters. The have similarities, they are different. This leads to the next research
significant results show that high self-regulated learners in general question.
accessed the course more often, visited more topics and items, actively
participated in online discussions more often, and spent more time on 6.4. RQ4: what are the explanations for any differences between the
the course. They also tended to work on assignments or modules ahead Students' self-reported SRL data and the digital trace data?
of time to avoid late submissions, and the average unit completion
before the due date is also higher than the other two clusters. It is sur­ To answer this research question, three types of data were used to
prising that they tended to ask fewer questions and seldom revisited triangulate (Table 2). Among the 65 participants, 11 voluntarily
previous modules. participated in a semi-structured interview. According to the cluster
The moderate self-regulated learners had medium cluster center analysis results of trace data and self-reported SRL data, these 11 stu­
values in most of the behavioral learning variables, but they also had dents were classified into different clusters. Table 7 shows the cluster
some extreme cluster center values in some of the behavioral learning results, with these 11 students represented by the letters Sa – Sk. It is
variables. They tended to visit the syllabus and rubrics the least apparent that students Sc, Sd, and Si were classified as high self-regulated
frequently, but they had the highest cluster center value of syllabus learners by all these four cluster analysis methods. Although students Se,
downloading. They also logged into the course least frequently and had Sf, and Sg were classified into different groups, the results are close. The
the largest late submission value, while they have the highest cluster big differences exist between the results for students Sb, Sh, Sj, and Sk.
center values of revisiting previous modules and asking the instructor Overall, it could be construed that both trace data and self-reported SRL
questions. In general, the moderate self-regulated learners spent less data were generally able to classify high self-regulated learners more
time studying and were not good at time management, but they tended accurately than moderate or low self-regulated learners in this study.
to improve their performance through strategies such as seeking help The logical next step is to compare high self-regulated learners with
from the instructor, revisiting previous modules, reading the syllabus to moderate or low self-regulated learners to explore what causes the dif­
better understand the course requirements, etc. ferences by incorporating the interview data collected.
The low self-regulated learners had the lowest cluster center values Based on the cluster analysis results, students Sc, Sd, and Si can easily
in most of these behavioral learning variables. Based on the statistically be classified as high self-regulated learners. Three out of the four cluster
significant results, they spent least time studying, visiting topics and analysis methods identify students Se and Sf as high self-regulated
items, and watching lectures. They were the least active group in the learners, and the other method identifies them as moderate self-
online discussions. They were the latest group to begin to study a regulated learners (they can roughly be categorized as high self-
module and tended to submit assignments late, but their late submission regulated learners in order to be divided into two groups of similar
happened not as often as the moderate self-regulated learners. It appears numbers and compare them). Therefore, students Sc, Sd, Se, Sf, and Si are
that they tend to get the work done in the least time. They also logged grouped into “higher” self-regulated learners and the rest six students
into the course more frequently than the moderate group. However, they are categorized as “lower” self-regulated learners. Here, “higher” and
had the lowest cluster center values in self-evaluation and help-seeking.
Overall, they are a group of students who spend the least time studying Table 7
and do not utilize many learning strategies. Cluster memberships of the eleven students based on the cluster analysis results.
In summary, based on the cluster analysis results of LMS trace data, Student Trace data 2 Trace data 3 SRL survey 2 SRL survey 3
high self-regulated learners tend to spend the most time and effort in Clu Clu Clu Clu
studying and do not seek help much; moderate self-regulated learners
Sa High Moderate High Moderate
tend to spend moderate time and effort in studying and seek help the Sb High High Low Moderate
most often; low self-regulated learners tend to spend the least time and Sc High High High High
effort in studying and seek help the least often. Sd High High High High
As a comparison, cluster analysis was also conducted based on stu­ Se High Moderate High High
Sf High Moderate High High
dents' self-reported SRL data in both the pre-course and post-course Sg Low Low Low Moderate
surveys. The two-cluster results categorized these students into: (1) Sh Low Low High High
one group has all the SRL scales with negative cluster centers; (2) one Si High High High High
group has all the SRL scales with positive cluster centers. Among the 65 Sj High Moderate Low High
Sk Low Low High High
students, 25 were classified into low self-regulated learners, and the

8
D. Ye and S. Pennisi The Internet and Higher Education 54 (2022) 100855

“lower” are used because these students are classified comparably. Let's students Sc, Sd, and Si, their average hours of submitting assignments
first compare these two groups by looking at the correlations between before the deadline were 58, 91, and 18 h, respectively. It is noteworthy
the three types of data: LMS trace data, self-reported SRL data, and that although student Si was taking five courses in the semester, he/she
interview data. Due to the fact that only limited quantitative data can be was still able to complete all the assignments 18 h before the deadline on
collected through interviews, the comparison mainly focused on stu­ average, which is not easy. Based on the excerpts from students Sc and
dents' time management ability. Table 8 shows the correlation values Sd, they also seem to be able to balance study and personal life well.
comparison between the higher and lower self-regulated learners ach­ They tend to get all their school work done by Friday and enjoy a study-
ieved by cross-checking these three types of data. Based on previous free weekend.
analysis, trace data reflect students' self-regulated learning more accu­
rately than the self-reported SRL data, so trace data is compared with 6.4.1.2. Pattern two: being able to identify his/her weakness through self-
both the interview data and the self-reported SRL data. As shown in reflection. Being able to identify one's weakness through self-reflection
Table 8, overall, higher self-regulated learners tend to self-report their of learning and being honest with oneself is another important sign of
SRL more accurately while lower self-regulated learners' self-reported high self-regulatory ability.
data tend to have more negative correlations with trace data. This ex­ Student Sc admitted that he/she should work on the graduate school
plains why trace data and self-reported SRL data tend to classify high assignment earlier, saying:
self-regulated learners accurately. The negative correlations found
among lower self-regulated learners indicate that these students tend to I think if I were to do anything differently, it would be the graduate
self-rate SRL scales higher. school stuff, because I did that all in the last, like month of classes.
And so, if I had maybe looked at that before, it would ease my stress a
6.4.1. Patterns of higher self-regulated learners little bit.
The theme analysis results show that higher self-regulated learners Student Sd expressed a similar thought that he/she should have
have the following two patterns: (1) completing the assignments early started the graduate students' module earlier because that module was
and (2) being able to identify his/her weakness through self-reflection. more challenging for him/her.
Student Si also planned to change, saying:
6.4.1.1. Pattern one: completing the assignments early. Students Sc, Sd,
and Si all tended to get the assignments done early. This feature is a well- Um, I would definitely try to get in contact with my classmates more
recognized and important sign of high self-regulatory ability. often just so because I know there're other people out there who
Student Sc said: probably don't understand some things or just wanted to keep like …
keep each other accountable for our work…So I definitely love to do
“I am the type of person who doesn't like to procrastinate too much. that and probably be more in contact with the professor to just like
So especially for this course, … so all the course assignments were sending frequent emails...
due on Sunday nights, … well, I would try to have the lectures and
the quizzes done by Wednesday. And I would try to have the practical Both students Sc and Sd admitted that they did not start to work on
lab assignments and the discussion posts done by Friday so that I the graduate students' assignments early on in the semester, which was a
could have the weekend to myself...” mistake. Student Si reflected that he/she should interact with classmates
or the professor more frequently. The reflections of students Sc, Sd, and Si
Student Sd expressed a similar thought that he/she would usually were very specific and obtainable. They admitted their mistakes frankly
have all the assignments done by Friday so that he/she did not have any and did not use any excuse.
work to do at weekend. Student Si also indicated that he/she would get
all the assignments done by Friday so that he/she could email the pro­ 6.4.2. Patterns of lower self-regulated learners
fessor when he/she was confused with anything. The theme analysis results show that lower self-regulated learners
Although for different reasons, these students all aimed at tend to procrastinate or study just enough to get their desired grade.
completing the work at least 2 days before it was due so that there would They also do not plan to change their study approaches or strategies
be some flexibility when they needed it. The trace data show that for much. Because the regression analysis shows trace data can reflect stu­
dents' learning more accurately, students Sg, Sh, and Sk are used here to
Table 8 represent lower self-regulated learners.
Correlation comparison between “higher” and “lower” self-regulated (SR)
learners. 6.4.2.1. Pattern one: procrastinate or not studying hard. Procrastination
Correlations Higher SR Lower SR or not studying hard is an important sign of low self-regulatory ability.
learners learners Student Sg admitted that he/she is dilatory, saying:
Avg. time spent on each module Trace Vs. 0.135 0.681
Interview
I mean, there's a couple of weeks where, you know, I just get busy.
Avg. time spend on each module Trace Vs. Survey 0.015 ¡0.575 And I had to take the quizzes on Sunday or to do the discussions on
time management scale pr Sunday, but there was always preparation prior to doing them, even
Avg. time spent on each module Trace Vs. Survey ¡0.239 ¡0.623 if I did have to wait till the last minute to do those, which didn't
time management scale po
happen very often.
Avg. days visit per module Trace Vs. Interview ¡0.295 ¡0.865
Total # of questions asked to instructors Trace Vs. 0.199 ¡0.629
Student Sg also admitted that he/she sometimes does not want to
Survey help-seeking pr
Total # of questions asked to instructors Trace Vs. 0.410 ¡0.625 study although he/she knows it may harm his/her grade and bring
Survey help-seeking po stress:
Avg. assignment advance submission Trace Vs. 0.824 ¡0.240
Interview I mean, there are times like I may not want to study, but it's also, you
Avg. assignment advance submission Trace Vs. 0.380 ¡0.215 know, of a thing where it's like for the grade or good. And I have an
Survey time management scale pr understanding of that, like if I could just go ahead and knock it out
Avg. assignment advance submission Trace Vs. 0.622 0.060
now, tonight or later this week, it won't be as big of a stressor and I'll
Survey time management scale po
have a grasp on it and be able to perform well.
Note. n = 5 for higher SR learners; n = 6 for lower SR learners. Numbers in bold
indicate negative correlations. Student Sh also admitted that he/she sometimes tends to

9
D. Ye and S. Pennisi The Internet and Higher Education 54 (2022) 100855

Student Sh also admitted that he/she sometimes tends to procrastinate:

The only thing is really just procrastination. Sometimes discussion… I would feel like it's just like so much work, so sometimes in the weeks I would just be like, ok, no, just do it the next day because it was just too much and I just didn't want to do it. But I mean, I would get to it at the end of the day. Sometimes I would wait… like for some of them, I waited till the last minute. But I mean, I finished everything else.

Student Sk frankly stated that he/she just studies enough to get the grades he/she wants:

I'm not very good at studying. I kind of just know how to do work. Like I calculate what exactly I would need to get the grade that I want and study and study just enough to get there.

Both students Sg and Sh tend to procrastinate sometimes. Student Sk did not submit any assignments late, but he/she wanted to study just enough to get the grade he/she wanted.

6.4.2.2. Pattern two: do not plan to make much change. Although they tend to procrastinate or not to study hard, these students do not plan to make much change when asked what they would do differently if they took another online course.

Student Sg did not plan to change much except to make some adjustments based on the learning materials:

Um, I don't know. I think I would probably keep a lot of stuff the same. I mean, I might… depending on what the class is like, you know, there might be a little bit like different things. But I think, like, I have a pretty good indication of, like how I learn, how I study, what drives me, how to manage time. And so I think just like sticking to that and ensuring that I'm doing the things right is kind of the same approach I would take. I don't know that would be like a tone of things I'd do different, but it is kind of adjusting to the material of the course or what's at hand.

Student Sh expressed a similar opinion:

Um, I don't think I would do anything different. I would really just do what I do right now, which is just, you know, planning everything out. Planning, I think, to me is key, (that) works for me for sure, to finish things off.

Student Sk indicated that he/she would watch the lecture videos, saying:

I would make a commitment to watching the lecture videos more because maybe the experience might be different, maybe that they'll actually have pertinent information that's different from what's in the textbooks in the lecture videos. What else could I do? … I don't think I'd do anything else differently over that.

Both students Sg and Sh do not plan to change their habits, although student Sg mentioned that he/she would adjust his/her study approach based on the materials of the course. Student Sk said he/she would commit to watching the lecture videos if they had relevant information that was different from what was in the textbook. According to the trace data, student Sk's average lecture completion rate was 2.36%. The question is, if he/she does not watch all or most of the lecture videos, how does he/she know that the lecture videos do not have additional useful information beyond what is covered in the textbook? It sounds more like an excuse for not watching the lecture videos. Overall, lower self-regulated learners tend to use excuses more often to make themselves feel good.

7. Discussion

In summary, the results of this study suggest that digital trace data are more powerful in predicting learners' performance than self-reported SRL data. Learners can be classified into three groups: high self-regulated learners, who spend a lot of time and effort in studying; moderate self-regulated learners, who spend moderate time and effort in studying but use many study strategies to improve their performance; and low self-regulated learners, who spend the least time and effort in studying.
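The predictive comparison summarized in this paragraph can be illustrated with a brief sketch. The code below is not the authors' analysis script; it assumes a hypothetical dataset with one row per student, invented column names for the LMS trace variables and the five self-reported SRL scales, and the final course grade as the outcome, and it simply contrasts the fit of two ordinary least squares models.

# Illustrative sketch only: the file name and column names are hypothetical,
# not taken from the study's actual data.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("course_data.csv")  # hypothetical: one row per student

trace_cols = ["avg_time_per_module", "avg_days_visited_per_module",
              "discussion_interactions", "assignment_advance_days",
              "lecture_completion_rate"]
survey_cols = ["goal_setting", "task_strategies", "time_management",
               "self_evaluation", "help_seeking"]

def fit_ols(data, predictors, outcome="final_grade"):
    """Fit an ordinary least squares model of the outcome on the given predictors."""
    X = sm.add_constant(data[predictors])
    return sm.OLS(data[outcome], X).fit()

trace_model = fit_ols(df, trace_cols)
survey_model = fit_ols(df, survey_cols)

# Compare explained variance between the two predictor sets.
print("Trace-data model:  adj. R^2 =", round(trace_model.rsquared_adj, 3))
print("Self-report model: adj. R^2 =", round(survey_model.rsquared_adj, 3))

Under this setup, a higher adjusted R-squared for the trace-based model would correspond to the pattern reported in this study.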
7.1. Self-reported SRL data pre-course vs. post-course

By comparing students' self-reported SRL data collected before and after the course, this study found that although the two are close in general, the post-course self-reported SRL data reflect students' self-regulation in context more accurately. This result is consistent with existing research (Li et al., 2020). Although students' self-regulatory ability is relatively stable, students' reactions in different contexts may differ. Therefore, when it is necessary to collect students' self-reported SRL data that reflect the learning context, doing so during or at the end of the course may be a better option than collecting it at the beginning of the course.

7.2. Digital trace data vs. self-reported SRL data

This study also shows that the trace data collected from the LMS can reflect learners' self-reported SRL data to some degree. The learning behavior data from the LMS positively correlate with the goal-setting scale in the self-reported SRL data, but do not correlate much with the task strategies and time management scales, and negatively correlate with the self-evaluation and help-seeking scales. It is worthwhile to focus additional attention on the self-evaluation and help-seeking scales to find out why some negative correlations existed. The help-seeking scale could be interpreted differently in different learning contexts. For example, it could be interpreted to mean that students who ask more questions are more responsible for their learning, but it could also mean that students who ask more questions have not mastered the learning materials or fully followed the instructions, so they need to ask more questions to get clarification or guidance from the instructor. In the context of this study, the latter seems to be true because this course provided a clear structure and detailed instructions. For all the online discussions and lab report assignments, rubrics were given with grading criteria. During the interviews, students also agreed that this course had clear instructions. But the results of this study indicate that students' self-reported help-seeking scale was negatively correlated with the average lecture completion rate as well as the average time spent on additional resources. It could be that students who spent less time on lectures and additional resources tended to ask more questions, so they rated their help-seeking scale higher than the rest of the students, and vice versa. Therefore, the help-seeking scale in self-reported SRL data cannot reliably reflect students' self-regulatory ability.

For the self-evaluation scale, it is generally believed that students who self-evaluate more often tend to be high self-regulated learners. However, this study shows that students' self-reported self-evaluation scale (post-course) is negatively correlated with the average topics visited per module, the average items visited per module, and the lecture completion rate, but positively correlated with the average rate of retaking quizzes. This indicates that students tended to rate their self-evaluation higher when they retook quizzes more often, even though their lecture completion rate was lower and they visited fewer topics and items per module on average. These students seemed to take advantage of retaking quizzes to get higher grades instead of mastering the learning materials by watching lectures or studying the materials.

It is surprising that among these five self-reported SRL scales, only the goal-setting scale predicts students' learning performance. Both the self-evaluation and help-seeking scales are problematic in predicting students' learning performance. These two scales can be interpreted differently in different contexts, as explained previously. They can also be used as strategies to improve learners' grades. Therefore, in future self-regulated learning studies, we should be cautious with the use of these two scales. It is also questionable whether the self-reported SRL instrument is valid in its ability to reflect students' true self-regulatory ability.
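The scale-by-trace relationships discussed in this section, like the group-level comparison in Table 8, rest on simple bivariate correlations. A minimal sketch of how such correlations could be computed is given below; the data file, column names, and group labels are hypothetical, and the snippet illustrates the general procedure rather than reproducing the authors' analysis code.

# Illustrative sketch only: column names, file name, and group labels are hypothetical.
import pandas as pd
from scipy.stats import pearsonr

df = pd.read_csv("students.csv")  # hypothetical: one row per student

pairs = [  # (trace variable, self-reported measure)
    ("avg_time_per_module", "interview_time_per_module"),
    ("avg_time_per_module", "time_management_post"),
    ("questions_to_instructor", "help_seeking_post"),
    ("assignment_advance_days", "time_management_post"),
]

# Correlate each pair separately for the "higher" and "lower" SR groups,
# mirroring the structure of Table 8.
for group, sub in df.groupby("sr_group"):
    print(f"--- {group} SR learners (n = {len(sub)}) ---")
    for trace_col, report_col in pairs:
        r, p = pearsonr(sub[trace_col], sub[report_col])
        print(f"{trace_col} vs. {report_col}: r = {r:.3f} (p = {p:.3f})")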
By comparing students' self-reported SRL data with their trace data collected throughout the course, we found that students' digital trace data from the LMS reflect students' learning more accurately than self-reported SRL data, which is consistent with existing research (Cho & Yoo, 2017; Hadwin et al., 2007; Li et al., 2020). However, we should be aware that digital trace data rely heavily on the context of the course. For example, a student who spends little time and effort on this course may spend a lot of time and effort on a different course due to different perceived demands and interests. Therefore, it is questionable to interpret students' self-regulatory ability mainly on the basis of the digital trace data of a particular course. A combination of digital trace data from multiple courses may be more reliable than that of a single course. We also should be cautious about potential ethical issues when using digital trace data to interpret students' self-regulated learning abilities. Besides the potential bias that exists in using digital trace data from a limited number of courses, some students may prefer to download learning materials to study offline, and digital trace data will not account for this. Using digital trace data alone may lead to inaccurate judgments about the participation of these students. Just as Perrotta (2013) pointed out: "the decontextualized analysis of student data and the powerful performativity arguments that underpin them may subvert concerns for social equity and justice" (p. 119). More research should be conducted to explore and establish possible ethical practices and policies in relation to the use of digital trace data. For example, increasing transparency in data ownership, analysis, and use may be one ethical practice that could be adopted (Pardo & Siemens, 2014).

7.3. Important learning behavior variables

The results of this study indicate that the most important learning behavioral variables that impact students' learning are study regulation, online discussion interactions, timing of assignment submission (termed "procrastination" in the literature), time investment, and completion. This finding is consistent with existing reports (Cho & Yoo, 2017; Colthorpe et al., 2015; Gelan et al., 2018; Kim et al., 2018; Li et al., 2018; You, 2016). Compared to previous research, the results of the current study show that students' online discussion interactions are the most important learning behavioral variable, one that has been largely neglected by other works. However, online discussions are worth about 28% of the final grade in the course, which may be the reason why online discussion interactions were found to be the most important behavioral learning variable; this weighting would merit further study.

7.4. Learning behavior patterns based on students' trace data

Existing self-regulated learning theories tend to categorize students into two groups, such as proactive and reactive learners (Zimmerman, 2013) and learners with mastery and performance approach goals (Pintrich, 2000). According to the results of the two-cluster analysis, high self-regulated learners align well with learners with mastery approach goals, but there is no clear evidence to support that low self-regulated learners align with performance approach goals in this study. Both the elbow and silhouette methods were performed to identify the optimal number of clusters. Although the result of the elbow method appears to support both two and three clusters, the silhouette method shows that the optimal cluster number is three. Moreover, it is more informative to categorize students into three groups to provide better differentiation.
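As an illustration of this model-selection step, the sketch below runs k-means for several candidate cluster counts and reports the inertia (for the elbow method) and the silhouette score. The feature file and its contents are hypothetical, and the snippet shows the general procedure rather than the authors' exact analysis.

# Illustrative sketch only: the feature file is hypothetical (one row per student,
# standardized trace-data features such as time spent, visits, and submissions).
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

features = pd.read_csv("trace_features.csv")
X = StandardScaler().fit_transform(features)  # standardize so no variable dominates

for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    sil = silhouette_score(X, km.labels_)
    # Elbow method: look for the bend in inertia; silhouette: higher is better.
    print(f"k = {k}: inertia = {km.inertia_:.1f}, silhouette = {sil:.3f}")

An inertia curve that bends at both k = 2 and k = 3, together with a silhouette peak at k = 3, would match the pattern described above.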
Bain (2004) suggested that most learning approaches fall into three categories: 1) the surface approach, in which students are interested primarily in surviving the course; 2) the strategic approach, in which students are driven by a desire to receive good grades; and 3) the deep approach, in which students are learning for mastery, conceptual understanding, and critical thinking. The results of the three-cluster analysis in this study are consistent with Bain's postulation: high self-regulated learners use the deep approach, so they spend a lot of time and effort in studying; moderate self-regulated learners use the strategic approach, so they tend to spend moderate time and effort in studying and use learning strategies such as self-evaluation and help-seeking more often to improve their grades; and low self-regulated learners use the surface approach, studying just enough to get the grades they want.

By comparing the trace data with the self-reported SRL data and the interview data, it was found that high self-regulated learners tend to be more consistent when reporting their self-regulatory ability. Through thematic analysis, it was found that high self-regulated learners tend to submit assignments early and are able to self-reflect and see their own shortcomings clearly. In contrast, low self-regulated learners tend to procrastinate and, surprisingly, do not plan to change their behaviors much. It appears that low self-regulated learners are not concerned with their study habits even though the digital trace data show that they do not have good study habits (they tend to spend the least time and effort in studying). They might have intrinsically lower expectations in learning compared to high self-regulated learners. This may explain why low self-regulated learners tend to self-report their SRL ability relatively high. Another possible explanation is that low self-regulated students may know their own weaknesses in self-regulation but choose not to admit them. Based on Boekaerts' dual processing model (Boekaerts, 2011), low self-regulated learners might choose a well-being pathway to protect their ego from damage. According to Zimmerman (2013), it also could be that high self-regulated learners are proactive learners who can plan their learning strategically in order to see their limitations, while low self-regulated learners are reactive learners who cannot identify their own weaknesses without comparing themselves with others. It is not clear whether low self-regulated learners truly are unable to identify their own weaknesses or, instead, are choosing not to admit that these weaknesses exist. How to help low self-regulated learners identify their own limitations and motivate them to change might be worth further exploration in future studies.

Moderate self-regulated learners may value grades more than learning. It is critical to create a learning environment in which learning and grading work together for the students; for example, Farias, Farias, and Fairfield (2010) pointed out that educators could assign grades with a developmental perspective in which students can respond to feedback. One means of supporting grade-oriented learners is to provide detailed, critical feedback with high standards and to assure learners that they are capable of achieving those standards. Through this process, they may become confident in their ability and eventually shift toward a learning orientation. This study also shows that educators should be cautious with the dilemma of learning versus grades when using self-evaluation and help-seeking strategies in online courses. For example, it may not be ideal to give unlimited attempts on a quiz. When answering students' questions, instead of giving answers directly, instructors can give hints, inspire students, or ask thought-provoking questions to promote students' independent thinking and deep learning.

7.5. Limitations of the study

Although students in this course came from several different disciplines, one of the limitations of this study is that data were collected from a single online course. Further studies in different disciplines and multiple online courses should be conducted to further validate the results of this study. In addition, only part of the students in the class agreed to participate in this study, which may introduce bias. Further studies with more complete data should be conducted to yield more convincing results.

Another limitation of this study is that the course final grade was used as the outcome variable in the linear regression. It may be better to use students' GPAs as the outcome variable because students' self-regulatory ability is relatively stable and GPA is a longitudinal measure, which may match that ability better than the final grade of a particular course. In this study, final grades were used because they reflect students' learning very well, as explained in the methods section. In addition, it is difficult to acquire students' GPAs. However, this study involved three types of data sources, and data triangulation was used to ensure the validity of the findings.

8. Conclusion and future research

By highlighting the need to explore new ways to measure students' self-regulatory ability, this study found that digital trace data could reflect students' learning accurately. Thus, in the future, digital trace data from LMSs could be used more often in both SRL research and educational research more broadly. While using digital trace data to conduct research, we also need to follow ethical rules and protect participants' privacy. In this study, we anonymized the data, and all data were stored in password-protected files. The instructor was excluded from the data collection and analysis process, which provided some safeguards for vulnerable participants who might be affected by the consequences of the study. More research should be conducted to develop related ethics standards and to examine the effect of such standards on students while weighing potential risks and benefits.

This study shows that self-reported SRL data have very limited power in predicting students' academic performance, which challenges us to question the validity of existing self-reported SRL instruments, especially those with self-evaluation and help-seeking scales. In practice, educators need to balance the dilemma of learning versus grades when using self-evaluation and help-seeking strategies in online courses.

Through cluster analysis, this study found that students can be categorized into three groups: high, moderate, and low self-regulated learners, which aligns well with the three different learning approaches proposed by Bain (2004). It challenges us to rethink the two-cluster division of existing SRL models. It also highlights a great potential to use students' digital trace data for their benefit. As mentioned in the discussion, educators could use diverse strategies to teach students with different self-regulatory abilities. We also could release some of the digital trace data to students so that they could benefit from comparing themselves with others, which may help low self-regulated learners identify their own weaknesses, based on Zimmerman's (2013) theory of proactive and reactive learners. Currently, students have only limited access to their digital trace data. Further research should explore what kinds of digital trace data could be made available to students and provide detailed guidelines for both instructors and students on how to use the data. Both ethical issues and potential benefits should be carefully considered when deciding what kinds of digital trace data students should have access to.

Through the triangulation of three different types of data, this study also found that low self-regulated learners tend to self-rate their own SRL scales relatively high and that they are reluctant to change their study habits. Further research should be conducted to find possible approaches to help these students overcome this challenge.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Declarations of interest

The authors declare that they have no conflict of interest.

The data that support the findings of this study are not publicly available due to privacy and ethical restrictions.

Data availability

The authors do not have permission to share data.

Acknowledgements

I thank Dr. Lloyd Rieber at the University of Georgia for his guidance and advice throughout this research project and for his extensive feedback and suggestions on earlier drafts of this manuscript.

References

Bain, K. (2004). What the best college teachers do. Harvard University Press.
Barnard, L., Lan, W. Y., To, Y. M., Paton, V. O., & Lai, S. L. (2009). Measuring self-regulation in online and blended learning environments. The Internet and Higher Education, 12(1), 1-6.
Barnard, L., Paton, V., & Lan, W. (2008). Online self-regulatory learning behaviors as a mediator in the relationship between online course perceptions with achievement. The International Review of Research in Open and Distance Learning, 9(2), 1-11.
Boekaerts, M. (2011). Emotions, emotion regulation, and self-regulation of learning. In B. J. Zimmerman & D. H. Schunk (Eds.), Handbook of self-regulation of learning and performance (pp. 408-425). New York, NY: Routledge.
Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3, 77-101.
Cho, M., & Yoo, J. S. (2017). Exploring online students' self-regulated learning with self-reported surveys and log files: A data mining approach. Interactive Learning Environments, 25(8), 970-982.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. New York, NY: Routledge Academic.
Colthorpe, K., Zimbardi, K., Ainscough, L., & Anderson, S. (2015). Know thy student! Combining learning analytics and critical reflections to increase understanding of students' self-regulated learning in an authentic setting. Journal of Learning Analytics, 2(1), 134-155.
Dray, B. J., Lowenthal, P. R., Miszkiewicz, M. J., Ruiz-Primo, M. A., & Marczynski, K. (2011). Developing an instrument to assess student readiness for online learning: A validation study. Distance Education, 32(1), 29-47.
Efklides, A. (2011). Interactions of metacognition with motivation and affect in self-regulated learning: The MASRL model. Educational Psychologist, 46(1), 6-25.
Farias, G., Farias, C. M., & Fairfield, K. D. (2010). Teacher as judge or partner: The dilemma for grades versus learning. Journal of Education for Business, 85, 336-342.
Ferguson, R. (2012). Learning analytics: Drivers, developments and challenges. The International Journal of Technology Enhanced Learning, 4(5/6), 304-317.
Gasevic, D., Dawson, S., & Siemens, G. (2015). Let's not forget: Learning analytics are about learning. TechTrends, 59(1), 64-71.
Gelan, A., Fastre, G., Verjans, M., Martin, N., Janssenswillen, G., … Thomas, M. (2018). Affordances and limitations of learning analytics for computer-assisted language learning: A case study of the VITAL project. Computer Assisted Language Learning, 31(3), 294-319.
Hadwin, A. F., Nesbit, J. C., Jamieson-Noel, D., Code, J., & Winne, P. H. (2007). Examining trace data to explore self-regulated learning. Metacognition and Learning, 2, 107-124.
Hew, K. F., & Cheung, W. S. (2014). Students' and instructors' use of massive open online courses (MOOCs): Motivations and challenges. Educational Research Review, 12, 45-58.
Hwu, F. (2003). Learners' behaviors in computer-based input activities elicited through tracking technologies. Computer Assisted Language Learning, 16(1), 5-29.
Kim, D., Yoon, M., Jo, I.-H., & Branch, R. M. (2018). Learning analytics to support self-regulated learning in asynchronous online courses: A case study at a women's university in South Korea. Computers & Education, 127, 233-251.
Kizilcec, R. K., & Schneider, E. (2015). Motivation as a lens to understand online learners: Toward data-driven design with the OLEI scale. ACM Transactions on Computer-Human Interaction (TOCHI), 22(2), 24.
Li, H., Flanagan, B., Konomi, S., & Ogata, H. (2018). Measuring behaviors and identifying indicators of self-regulation in computer-assisted language learning courses. Research and Practice in Technology Enhanced Learning, 13(1), 1-12.
Li, Q., Baker, R., & Warschauer, M. (2020). Using clickstream data to measure, understand, and support self-regulated learning in online courses. The Internet and Higher Education, 45. https://doi.org/10.1016/j.iheduc.2020.100727
Maxwell, J. A. (2013). Qualitative research design: An interactive approach (3rd ed.). SAGE Publications, Inc.
Pardo, A., Ellis, R., & Calvo, R. A. (2015). Combining observational and experiential data to inform the redesign of learning activities. In LAK '15: Proceedings of the fifth international conference on learning analytics and knowledge (pp. 305-309).
Pardo, A., Han, F., & Ellis, R. A. (2017). Combining university student self-regulated learning indicators and engagement with online learning events to predict academic performance. IEEE Transactions on Learning Technologies, 10(1).
Pardo, A., & Siemens, G. (2014). Ethical and privacy principles for learning analytics. British Journal of Educational Technology, 45(3), 438-450.
Perrotta, C. (2013). Assessment, technology and democratic education in the age of data. Learning, Media and Technology, 38(1), 116-122. https://doi.org/10.1080/17439884.2013.752384
Pintrich, P. R. (2004). A conceptual framework for assessing motivation and self-regulated learning in college students. Educational Psychology Review, 16(4), 385-407.
Pintrich, P. R. (2000). The role of goal orientation in self-regulated learning. In M. Boekaerts, P. R. Pintrich, & M. Zeidner (Eds.), Handbook of self-regulation (pp. 452-502). San Diego, CA: Academic Press.
Siadaty, M., Gasevic, D., & Hatala, M. (2016). Trace-based microanalytic measurement of self-regulated learning processes. Journal of Learning Analytics, 3(1), 183-214. https://doi.org/10.18608/jla.2016.31.11
Van Rooij, S. W., & Zirkle, K. (2016). Balancing pedagogy, student readiness and accessibility: A case study in collaborative online course development. The Internet and Higher Education, 28, 1-7.
Winne, P. H. (2010). Improving measurements of self-regulated learning. Educational Psychologist, 45(4), 267-276. https://doi.org/10.1080/00461520.2010.517150
Winne, P. H., & Hadwin, A. F. (1998). Studying as self-regulated learning. In D. J. Hacker, J. Dunlosky, & A. C. Graesser (Eds.), Metacognition in educational theory and practice (pp. 279-306). Hillsdale, NJ: Erlbaum.
Winne, P. H., & Perry, N. E. (2000). Measuring self-regulated learning. In M. Boekaerts, P. R. Pintrich, & M. Zeidner (Eds.), Handbook of self-regulation (pp. 531-566). San Diego: Academic Press.
You, J. W. (2016). Identifying significant indicators using LMS data to predict course achievement in online learning. The Internet and Higher Education, 29, 23-30.
Yu, Q., & Zhao, Y. (2015). The value and practice of learning analytics in computer assisted language learning. Studies in Literature and Language, 10(2), 90-96.
Zimmerman, B. J. (2000). Attaining self-regulation: A social cognitive perspective. In M. Boekaerts, P. R. Pintrich, & M. Zeidner (Eds.), Handbook of self-regulation (pp. 531-566). San Diego: Academic Press.
Zimmerman, B. J. (2008). Investigating self-regulation and motivation: Historical background, methodological developments, and future prospects. American Educational Research Journal, 45(1), 166-183.
Zimmerman, B. J. (2013). From cognitive modeling to self-regulation: A social cognitive career path. Educational Psychologist, 48(3), 135-147.