ARTICLE INFO

Keywords:
Self-regulated learning
Online learning environments
Digital trace data
Self-reported self-regulated learning data
Cluster analysis

ABSTRACT

The purpose of this study was to investigate whether students' self-reported SRL aligns with their digital trace data collected from the learning management system. This study took place in an upper-level college agriculture course delivered in an asynchronous online format. By comparing online students' digital trace data with their self-reported data, this study found that digital trace data from the LMS could predict students' performance more accurately than self-reported SRL data. Through cluster analysis, students were classified into three levels based on their self-regulatory ability and the characteristics of each group were analyzed. By incorporating qualitative data, we explored possible explanations for the differences between students' self-reported SRL data and the digital trace data. This study challenges us to question the validity of existing self-reported SRL instruments. The three-cluster division of students' learning behaviors provides practical implications for online teaching and learning.
D. Ye and S. Pennisi. The Internet and Higher Education 54 (2022) 100855
https://doi.org/10.1016/j.iheduc.2022.100855
Received 13 January 2022; Received in revised form 11 April 2022; Accepted 11 April 2022
Available online 15 April 2022
1096-7516/© 2022 Elsevier Inc. All rights reserved.
* Corresponding author.
E-mail addresses: dny8514@uga.edu (D. Ye), bpennisi@uga.edu (S. Pennisi).
1 Present address: 219 Van Pelt and Opie Library, Michigan Technological University, Houghton, MI 49931, United States of America.

1. Introduction

Online education has been growing tremendously in the past decade (Van Rooij & Zirkle, 2016), and it has been playing a dominant role in education during the coronavirus pandemic. Despite the popularity of online education, not all students are equally successful in asynchronous online courses. The situation has been even worse during the coronavirus pandemic because most students have had no choice but to take their courses online. Dray, Lowenthal, Miszkiewicz, Ruiz-Primo, and Marczynski (2011) indicated that students' personal traits of self-direction and initiative are significant predictors of online learners' success. Recent studies also demonstrate that in order for learners to succeed in online courses, they must have the capacity to regulate their learning (Hew & Cheung, 2014; Kizilcec & Schneider, 2015). With the continuous growth of online courses and online programs offered by higher education, it is important to understand online students' self-regulated learning (SRL) processes so that we can implement strategies to enhance students' self-regulation abilities and thus improve their academic performance.

Although numerous studies about SRL have been conducted in online learning environments, existing research has heavily relied on self-reported surveys (Winne & Perry, 2000). Self-reported data from students have been criticized as lacking validity (Hadwin, Nesbit, Jamieson-Noel, Code, & Winne, 2007; Winne, 2010; Zimmerman, 2008). One possible solution to address this issue is to use learners' trace data collected by learning management systems as a supplement to the self-reported SRL data. Traces are defined as “observable indicators about cognition that students create as they engage with a task” (Winne & Perry, 2000, p. 551). Recent studies (Hwu, 2003; Yu & Zhao, 2015) have indicated that online students' behavioral data are more accurate because the data collected from modern tracking technologies occur in actual learning situations in real time. Learners may be aware of the data collection taking place, but it is relatively unobtrusive and difficult for learners to alter, so one could assert that more authentic learning behaviors can be recorded on a large scale using this approach. Winne and Perry (2000) proposed two different conceptualizations of SRL: as an aptitude and as an event. Winne (2010) believed that self-reported SRL should be considered as an aptitude and trace data could be treated as an event. Trace data become the raw material for researchers to track aptitudes “in action” and how aptitudes may evolve as students make progress in their studies.

The purpose of this study is to investigate whether students' self-reported SRL aligns with their behavior as indicated by the digital trace data collected through the learning management system. We hope that this study will help inform how trace data can be used to enhance online teaching and learning. The research questions posed are as follows:

(1) How do the digital trace data collected by the learning management system reflect the students' self-reported SRL?
(2) What is the relationship between students' performance and the digital trace data and self-reported SRL data?
(3) What are the patterns of learning behaviors based on the digital trace data and self-reported SRL data?
(4) What are the explanations for any differences between the students' self-reported SRL data and the digital trace data?

2. Theoretical framework

Several SRL models have been proposed in existing self-regulated learning literature, including Zimmerman's cyclical phases model (Zimmerman, 2000), Pintrich's SRL model (Pintrich, 2000), Boekaerts' dual processing model (Boekaerts, 2011), the Conditions, Operations, Products, Evaluations, and Standards (COPES) model (Winne & Hadwin, 1998), and Efklides' Metacognitive and Affective Model of SRL (MASRL) (Efklides, 2011). SRL is a consistent adaptive process. The momentary trace data collected by the LMS can change dynamically and contextually; however, long-term trace data can reflect students' SRL ability when the content domain is fixed. Therefore, the trace data reveal the SRL process (through momentary trace data) and inform the SRL results (through long-term trace data).

In view of this SRL model development, Pintrich's SRL model was selected to serve as the theoretical framework of this study. Pintrich's SRL model is a general process model, which was developed based on an extension of Zimmerman's cyclical phases model. It has been widely used and its process-oriented nature fits well with that of digital trace data. Pintrich's self-regulated learning model (Fig. 1) includes four phases: 1) forethought, planning, and activation, 2) monitoring, 3) control, and 4) reaction and reflection. In the forethought, planning, and activation phase, the learner forms a plan on how to achieve the learning goals based upon the perceived upcoming learning tasks, the goals he/she aims to achieve, and his or her related prior knowledge and experience. Goal setting plays a critical role in this phase (Zimmerman, 2000, 2008). Pintrich (2000) postulated that learners in general have two major goal orientations: mastery and performance. Learners with mastery approach goals aim at improving their skills and knowledge while learners with performance approach goals are more concerned with outperforming others. In the monitoring phase, learners oversee their own learning progress, motivations, effort, cognitive strategies, and the learning environment. During the controlling phase, learners adjust their cognitive and motivational strategies, decide to increase or decrease effort, choose better learning environments, etc. In the reaction and reflection phase, learners execute the selected cognitive and motivational strategies and then evaluate whether their reactions are effective. Reactions are mainly reflected in the patterns of time and effort that learners spend on the learning task. Learners also make attributions for their performance and reflect on it in this phase.

Fig. 1. Pintrich's self-regulated learning model. Adapted from Pintrich (2004), p. 390.

3. Literature review on SRL-based learning analytics

Given the volume of data that learning management systems collect, it can be difficult to decide which learning behavior variables to focus on. A review of the existing literature provides some guidance. This section will also explore what existing research has been done regarding the alignment between self-reported SRL and the trace data.

3.1. Existing research on learning behavioral variables

A summary of learning behavioral variables used in existing related studies was analyzed. The most commonly used learning behavior variables are study regularity (Kim, Yoon, Jo, & Branch, 2018; Li, Baker, & Warschauer, 2020; Li, Flanagan, Konomi, & Ogata, 2018; You, 2016), procrastination (Colthorpe, Zimbardi, Ainscough, & Anderson, 2015; Li et al., 2018; Li et al., 2020; You, 2016), time investment (Gelan et al., 2018; Kim et al., 2018; Li et al., 2020; You, 2016), completion (Li et al., 2018; You, 2016), and help-seeking (Kim et al., 2018). The definition of these terms in different studies varies slightly. Study regularity generally refers to the frequency of the student accessing various learning materials or LMS login frequency. Procrastination usually measures whether
students submit the assignments on time or whether they study the learning materials on time. Time investment often refers to the total time students spend studying the course or completing a task. Completion usually measures the portion of the course or the task students complete. Help-seeking generally measures how many times students reach out to other learners or the instructor for help. However, most of these studies did not provide solid pedagogical or theoretical support for why these learning behavioral variables were chosen. The researchers generally chose these learning behavioral variables based on other existing studies.

3.2. Existing research on the comparison between self-reported SRL and the trace data

Some studies compared the academic prediction power between these two types of data and found that digital trace data are more powerful than self-reported SRL in predicting academic performance (Cho & Yoo, 2017; Hadwin et al., 2007; Li et al., 2020). Cho and Yoo (2017) used a classification model to predict students' final grades based on self-reported SRL survey data and log files and found that the accuracy of correctly classified instances (58.33%) of the prediction model from log attributes was higher than that of the prediction model from self-reported SRL (41.67%). Using regression analysis, Pardo, Han, and Ellis (2017) found that the variation of the students' final scores for their course is better explained when combining self-reported SRL data with the digital trace data of seven student engagement events. However, there is limited research investigating whether students' self-reported SRL aligns with their learning behavior as indicated by digital trace data, and how to interpret digital trace data in light of self-reported SRL data. Li et al. (2020) used trace data to measure time management and effort regulation aspects of SRL and found a significant association between the trace data measures and students' self-reported time management and effort regulation after the course. However, their study focused on only two aspects of SRL. The current study aims at comparing students' trace data with self-reported SRL data across all aspects in the hope of providing a comprehensive understanding of how the trace data correspond to self-reported SRL measures.

4. Learning behavior variables proposed in this study

Table 1
Matching map between the SRL phases, learning behavior data in LMS, and OSLQ.
Pintrich's SRL model phase | Learning behavioral variable in LMS (code) | OSLQ scale (code)*
Forethought, planning, and activation | Time spent on reviewing the syllabus (SylTim) and the rubrics (RubTim); whether the syllabus has been downloaded (SylDow) | Goal-setting (GS)
 | Syllabus (SylFre) and rubrics (RubFre) visit frequency |
Monitoring | Course logins (LoginM), total item visits (TotVisM), topic visits (TopVisM) per module | Task strategies (TS)
 | Average days accessing the course per module (DayVisM) |
Control | Average time spent on each module (TimeM) and additional resources (AddRes) | Time management (TM)
 | Number of threads (#DisThr) and replies (#DisRep) created in online discussions; number of posts read (#PostRea) |
 | Lecture completion rate (LecCom) |
 | Average number of late submissions per module (LatSubAv); average hours assignments were submitted before the deadline (Advance) |
 | Average module completion rate before the deadline (MComAv) |
Reaction and reflection | Total number of questions asked of the instructor(s) (#QSum) and number of extension requests (#AskExt) | Help-seeking (HS)
 | Number of questions asked of the instructor(s), excluding extension requests (#QAskIns) |
 | Average number of revisits to a module after the grades are published (Revisit) | Self-evaluation (SE)
 | Optional self-assessments (SAssCom) and bonus quiz (BQCom) completion rate |
 | Average number of quiz retakes (RetakeQ) |
* OSLQ includes another scale named Environmental Structuring (ES), which has no corresponding learning behavior data, so it is excluded from the table.
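To make two of the Table 1 codes concrete, the following is a minimal sketch of how LatSubAv and Advance might be derived from submission timestamps. The paper does not give the exact formulas, so the definitions here are assumptions: LatSubAv is taken as late submissions divided by the number of modules, and Advance as the mean number of hours between submission and deadline, with late work counted as negative hours. The record layout, the dates, and all function names are hypothetical.

```python
from datetime import datetime

# Hypothetical submission log for one student: (module, deadline, submitted_at).
# The layout and the timestamps are invented for illustration only.
SUBMISSIONS = [
    ("M1", datetime(2021, 1, 17, 23, 59), datetime(2021, 1, 16, 11, 59)),
    ("M2", datetime(2021, 1, 24, 23, 59), datetime(2021, 1, 25, 5, 59)),
    ("M3", datetime(2021, 1, 31, 23, 59), datetime(2021, 1, 31, 17, 59)),
]


def hours_before_deadline(deadline, submitted_at):
    """Positive when the work came in early, negative when it was late."""
    return (deadline - submitted_at).total_seconds() / 3600.0


def late_submissions_per_module(submissions):
    """Assumed reading of LatSubAv: late submissions divided by modules."""
    modules = {module for module, _, _ in submissions}
    late = sum(1 for _, deadline, done in submissions if done > deadline)
    return late / len(modules)


def average_advance_hours(submissions):
    """Assumed reading of Advance: mean hours between submission and deadline."""
    hours = [hours_before_deadline(d, s) for _, d, s in submissions]
    return sum(hours) / len(hours)


if __name__ == "__main__":
    print("LatSubAv:", late_submissions_per_module(SUBMISSIONS))
    print("Advance:", average_advance_hours(SUBMISSIONS))
```

Whether late work should pull Advance below zero or simply be excluded is a design choice the source does not settle; the sketch keeps it negative so that one variable still carries the direction of the delay.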
5.1. Research context

The data were collected from an upper-level, undergraduate/graduate split-level science course, cross-listed with horticulture and crop and soil sciences, and offered through a College of Agriculture and Environmental Sciences. The course was taught in an asynchronous online format. The content of the course is core to both disciplines as well as required for several graduate programs. However, approximately half of the students who registered for the course major in other biology disciplines and take the course as an upper-level science elective. The course was chosen for this study because it has had a stable large enrollment in the past 5 years with a diverse student population. The content is organized into modules. Each module has required readings, lectures, additional resources, some optional self-assessments, a quiz, and an online discussion or assignment. The final grades are calculated based on 10 quizzes (35.3% of the final grade), four practical lab written assignments (33% of the final grade), six online discussions (28.2% of the final grade), and one applied concept activity (3.5% of the final grade). The quizzes cover basic knowledge students need to master, while the practical lab written assignments, online discussions, and applied concept activity require students to observe, perform hands-on experiments, record and analyze data, interpret their findings, and explain their understanding by applying the concepts they learned in the course. Based on the structure and components of the grading policy, final grades measure both students' knowledge understanding and application and cover both low and high levels of cognitive learning.

5.1.1. Participants
Participants were students who registered for this course. They came from various majors and included both undergraduates (majority) and graduates. A total of 91 students were enrolled in the course. Of these, 67 students agreed to participate in the study, but one student withdrew in the middle of the semester and another student did not complete the SRL post-course survey. Among the remaining 65 participants, 20 identified as male and 45 identified as female. A total of 47 were undergraduate students and the remaining 18 were graduate students. Among the 65 participants, 29 students indicated at the beginning of the course that they were willing to participate in the interview. By the end of the semester, 20 students were randomly selected among these 29 and invited to participate in the interview. A total of 11 eventually participated in the interview.

5.2. Data collection

5.2.1. Instruments
Although the Motivated Strategies for Learning Questionnaire has a long history and is the most widely used self-report questionnaire for self-regulated learning, it has a large number of items, making its use to collect data impractical in an online course. In contrast, the OSLQ is short and concise, which led to the assumption that it would yield higher completion rates and therefore more data. Thus, the OSLQ was used to collect students' self-reported self-regulation data (Table 1). It consists of 24 items in six scales using a 5-point Likert response format ranging from strongly agree (5) to strongly disagree (1). In addition, it is an SRL questionnaire specifically designed for online learning environments, which fits well with the context of the study. Most importantly, the available psychometric data show it to be a reliable and valid way to assess the self-regulatory learning skills of students in both blended and online courses (Barnard, Lan, To, Paton, & Lai, 2009).
A semi-structured interview was conducted to focus on students' typical self-regulated learning behaviors while taking this course. Specifically, participants were asked to recall “what they typically did while taking this course” based on the six scales of self-regulated learning: goal setting, environmental structuring, task strategies, time management, help-seeking, and self-evaluation. An interview protocol was used during the interview. The purpose of the interview was to explore possible contextual explanations for the differences between the trace data and the self-reported data using qualitative data.

5.2.2. Data collection process
Students were invited to participate in the study voluntarily in the first week of the course through a course announcement. Once the participants agreed, they were asked to complete the OSLQ. During the semester, the learning management system recorded learners' behavior data automatically. Data from the LMS were collected based on the module length, weekly or biweekly. Some scholars (Li et al., 2020) argue that it is problematic to use self-report questionnaires before a course starts and that researchers should gather data during the interventions to document whether aptitudes change. Therefore, in this study, students were asked to complete the OSLQ again in the last week of the course. By the end of the course, learners' completion data of each individual item, interaction data in online discussions, interactions with the instructor, login history, time spent on reviewing content, final grades, etc. were recorded in an Excel file and used for data analysis. The 20 randomly selected participants were contacted and 11 participated in the video conferencing interview. The length of the interviews ranged from 30 to 50 min. One of the authors is the instructor of the course. All the data collection and analysis were conducted by the other author, and the instructor was neither involved nor given any information while the study was ongoing. Participants communicated only with the researcher responsible for data collection and analysis.

5.3. Data analysis

Both quantitative and qualitative data analysis methods were used in this study. Table 2 presents an overview of the data analysis methods used in this study within the framework of the study's research questions.

5.3.1. Quantitative data analysis
Trace data were organized in an Excel file, and the average values were calculated for each learning variable of the trace data and each scale of the self-reported SRL data.
Pearson correlations were used to explore how the trace data reflected students' self-reported SRL data, as well as to compare the pre-course with the post-course self-reported SRL data. In addition, linear regression analysis and feature selection methods were used to construct prediction models and identify key attributes from self-reported SRL data and the digital trace data. Three methods were used to identify key features: stepwise, forward, and backward.
Cluster analysis was used to analyze both the trace data and the self-reported SRL data. Cluster analysis is a class of techniques that are used

Table 2
An overview of the data collection and analysis methods.
Research questions | Data collection methods | Analysis methods
RQ1: How do the trace data reflect the self-reported SRL data? | 1.1 Recorded data from LMS; 1.2 OSLQ pre- and post-course | Correlation
RQ2: What's the relationship between performance and the trace data and self-reported SRL data? | 1.1 Recorded data from LMS; 1.2 OSLQ pre- and post-course | Regression analysis; Feature selection
RQ3: What learning patterns are present based on the trace data? | 1.1 Recorded data from LMS; 1.2 OSLQ pre- and post-course | Cluster analysis
RQ4: What are the explanations for any differences between their self-reported SRL data and the digital trace data? | 1.1 Recorded data from LMS; 1.2 OSLQ pre- and post-course; 1.3 Interview | Correlation; Thematic analysis
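The correlation step named for RQ1 can be illustrated with a short sketch. It computes a Pearson coefficient between one trace variable and one OSLQ scale mean using only the standard library. The two value lists and their names are hypothetical stand-ins, not data from the study, and the authors' actual software is not stated in this section.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for five students: average logins per module and the
# mean of the post-course goal-setting items on the 5-point scale.
logins_per_module = [3.0, 5.0, 2.0, 8.0, 6.0]
goal_setting_post = [3.2, 3.8, 2.6, 4.6, 4.0]

if __name__ == "__main__":
    print(round(pearson_r(logins_per_module, goal_setting_post), 3))
```

In a full analysis the same function would be applied to every pairing of a trace variable with a pre- or post-course scale mean, which is the shape of a variables-by-scales correlation table.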
post-course self-reported SRL data. An informal inspection indicates that the means and standard deviations of each scale appear similar. A

Note. See Table 1 for the full meaning of the OSLQ scale abbreviations.
* p < .05.
Table 5
Correlations of the learning behavior variables and self-reported SRL data.
Variables GSPr GSPo TSPr TSPo TMPr TMPo SEPr SEPo HSPr HSPo
SylFre 0.136 0.117 0.078 0.002 0.090 0.133 −0.001 0.059 −0.031 −0.005
SylTim −0.127 −0.004 −0.160 −0.250* 0.080 −0.073 −0.167 −0.139 −0.014 −0.085
SylDow 0.145 −0.039 0.196 −0.015 0.123 0.131 0.126 0.068 0.151 0.029
RubFre 0.160 0.185 0.101 −0.106 0.038 −0.041 0.147 −0.115 0.081 0.111
RubTim 0.214 0.047 0.094 −0.071 0.065 −0.044 0.254* −0.060 0.232 −0.032
LoginM 0.042 0.244* −0.125 −0.255* 0.018 0.168 0.082 0.094 0.119 0.109
DayVisM 0.193 0.315* 0.113 −0.010 0.219 0.247 0.080 −0.140 0.074 0.030
TopVisM 0.202 0.202 0.072 0.180 0.143 0.049 −0.070 −0.250* 0.023 −0.145
TotVisM 0.175 0.233 0.058 −0.008 0.072 0.005 −0.036 −0.348* −0.034 −0.159
TimeM 0.261* 0.214 0.106 0.077 0.250* 0.180 −0.121 −0.138 −0.113 −0.100
AddRes 0.177 0.084 0.047 −0.018 0.044 −0.104 0.174 −0.119 0.123 −0.269*
#PostRea 0.115 0.166 0.060 0.114 0.017 0.045 0.056 −0.044 0.078 0.178
#DisThr 0.364** 0.195 0.155 0.145 0.215 0.084 0.075 −0.062 0.096 −0.002
#DisRep 0.214 0.287* 0.170 0.154 0.111 0.038 0.014 −0.120 0.040 0.016
LecCom 0.085 0.086 −0.057 −0.098 0.105 −0.040 −0.160 −0.387* −0.079 −0.326*
LatSubAv −0.106 −0.253* 0.058 −0.005 −0.018 −0.053 −0.060 −0.041 −0.048 −0.023
Advance 0.112 0.322* −0.078 0.163 0.131 0.123 −0.118 −0.204 −0.016 −0.020
MComAv 0.206 0.248* 0.092 0.215 0.156 0.089 −0.010 −0.169 0.039 −0.102
#QAskIns 0.005 −0.091 −0.178 −0.012 0.060 0.068 −0.073 −0.099 −0.133 −0.098
#AskExt −0.053 −0.129 0.070 0.038 0.126 0.189 0.074 0.063 −0.032 −0.087
RetakeQ 0.246* 0.173 0.108 0.118 0.141 0.207 0.034 0.293* 0.057 0.168
Revisit 0.002 −0.130 −0.040 −0.155 −0.070 −0.041 −0.067 −0.201 0.026 −0.059
SAssCom 0.185 0.133 −0.091 0.073 0.119 0.077 −0.148 −0.143 0.028 −0.164
BQCom 0.238 0.247* 0.250* 0.263* 0.236 0.295* 0.125 0.004 0.219 0.109
Note. See Table 1 for the full meaning of the OSLQ scale abbreviations and the variable codes. Pr and Po indicate pre- and post-course survey results respectively. Numbers in bold indicate statistically significant correlations.
* p < .05. ** p < .001.
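One common way to judge whether a Pearson coefficient like those in Table 5 differs from zero is the t statistic below. This is a worked sketch under assumptions, not the authors' documented procedure: it takes n = 65 participants and n − 2 degrees of freedom, whereas the text reports the coefficients as r(64), so the software used may have followed a slightly different convention. The example value r = −0.387 is the LecCom and SEPo entry.

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}},
\qquad
t = \frac{-0.387\,\sqrt{63}}{\sqrt{1-0.387^{2}}} \approx -3.33
```

With 63 degrees of freedom the two-tailed critical value at the .05 level is about 2.00, so a magnitude of roughly 3.33 is consistent with the asterisk on that entry.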
resources (r(64) = −0.269, p < .05). Secondly, students' self-reported self-evaluation scale (post-course) is negatively correlated with the average topics visited per module (r(64) = −0.250, p < .05), total items visited per module (r(64) = −0.348, p < .05), and lecture completion rate (r(64) = −0.387, p < .05). Lastly, students' self-reported task strategies scale (post-course) is negatively correlated with the average time spent on viewing the syllabus (r(64) = −0.250, p < .05) and logins per module (r(64) = −0.255, p < .05).
Overall, the LMS trace data correlate with the self-reported SRL data to some degree. The LMS learning behavior data correlated well with the goal-setting scale in the OSLQ but did not correlate well with the task strategies and time management scales. The weakest alignment was with the help-seeking and self-evaluation scales, because at least two negative correlations existed between these two scales and the trace data. The results also show that most of the time management learning behavior variables are positively correlated with the goal-setting scale of the OSLQ.

6.2. RQ2: what is the relationship between performance and the trace data and self-reported SRL data?

6.2.1. The relationship between performance and the trace data
Multiple linear regression was conducted to predict final grades based on all the learning behavioral data (Table 6). A significant regression equation was found [F(24, 40) = 4.454, p < .001], with an R² of .728. Because the learning behavior data include 25 variables, three feature selection methods were used to eliminate unnecessary predictors: stepwise, backward, and forward. The stepwise and forward methods ended with the same model [F(5, 59) = 15.484, p < .001], with an R² of .568, while the backward method ended with a different model [F(9, 55) = 13.451, p < .001], with an R² of .688. Both models include the same five predictors: (1) the average number of late submissions, (2) the average number of discussion replies created, (3) the average course logins per module, (4) the average number of discussion posts read, and (5) the average number of discussion threads created. Besides the five key predictors used in the first model, four additional variables were included in the second model as predictors: (1) self-assessment completion rate, (2) average time spent on each module, (3) average module completion rate before the deadline, and (4) average topics visited per module.

Table 6
Multiple linear regression results of three feature selection methods.
Method | Variables | R² | F | Sig.
Stepwise/Forward | LatSubAv, #DisRep, LoginM, #PostRea, #DisThr | 0.568 | F(5, 59) = 15.484 | p < 0.001
Backward | #PostRea, LoginM, #DisThr, SAssCom, TimeM, LatSubAv, #DisRep, MComAv, TopVisM | 0.688 | F(9, 55) = 13.451 | p < 0.001
Note. See Table 1 for the full meaning of the variable codes.

6.2.2. The relationship between performance and the self-reported SRL data
Multiple linear regression was used to predict final grades using all the self-reported SRL data, including both the pre- and post-course survey data. No significant regression equation was found. Three feature selection methods (stepwise, backward, and forward) were used to eliminate unnecessary predictors and they all ended with the same model [F(1, 63) = 7.625, p < .05], with an R² of .108. The model includes only one predictor: the mean rating of goal setting in the post-course survey.
The results show that the trace data could explain around 73% of the variance of students' final grades, while the self-reported SRL data could explain only around 11% of that variance. In summary, it appears that students' trace data collected from the LMS are much more powerful than students' self-reported SRL data in predicting students' final grades.

6.3. RQ3: what are the patterns of learning behaviors based on the trace data and self-reported SRL data?

Using standardized trace data, the K-means cluster analysis was conducted based on two assumptions: (1) students with high or low self-regulatory ability, or (2) students with high, moderate, or low self-regulatory ability. These assumptions are based on existing SRL models (Pintrich, 2000; Zimmerman, 2013) and literature reported by other researchers (Kim et al., 2018).
Fig. 2 shows the final cluster centers of two clusters based on
students' trace data collected from the LMS: these students can be categorized into low self-regulated learners (cluster 1: 76% of the behavioral learning variables with negative cluster centers) and high self-regulated learners (cluster 2: 76% of the behavioral learning variables with positive cluster centers). Among the 65 students, 29 were classified as high self-regulated learners, and 36 were classified as low self-regulated learners. Based on the ANOVA results, among the 25 behavioral learning variables, 14 had significantly different cluster centers in the two clusters. The significant results show that students with low self-regulatory ability tend to submit assignments late and ask for an extension more often, while students with high self-regulatory ability tend to access rubrics, the course, topics, and items more frequently, spend more time accessing the course per module, post more discussion threads and replies, have higher lecture and unit completion rates before the deadline, start to work on assignments earlier, and have higher self-assessment and bonus quiz completion rates. The results also show that, although the differences are not significant, students with low self-regulatory ability tend to access the syllabus more frequently and spend more time viewing the syllabus, but students with high self-regulatory ability tend to download the syllabus.

Fig. 2. Results of two cluster assumption using students' trace data. Cluster 1 represents low self-regulated learners (76% of the behavioral learning variables with negative cluster centers) and cluster 2 represents high self-regulated learners (76% of the behavioral learning variables with positive cluster centers).

Using the second cluster assumption, three cluster analysis results indicated that students could be categorized into low (cluster 1: 20% of the behavioral learning variables with positive cluster centers), moderate (cluster 2: 52% of the behavioral learning variables with positive cluster centers), and high (cluster 3: 76% of the behavioral learning variables with positive cluster centers) self-regulated learners (Fig. 3). Among the 65 participants, 20 were classified as high self-regulated learners, 20 students were moderate self-regulated learners, and the remaining 25 students were low self-regulated learners. Based on the ANOVA results, among the 25 behavioral learning variables, 18 had significantly different cluster centers among the three clusters. The significant results show that high self-regulated learners in general accessed the course more often, visited more topics and items, actively participated in online discussions more often, and spent more time on the course. They also tended to work on assignments or modules ahead of time to avoid late submissions, and their average unit completion before the due date is also higher than that of the other two clusters. It is surprising that they tended to ask fewer questions and seldom revisited previous modules.

Fig. 3. Results of three cluster assumption using students' trace data. Cluster 1 represents low self-regulated learners (20% of the behavioral learning variables with positive cluster centers), cluster 2 represents moderate self-regulated learners (52% of the behavioral learning variables with positive cluster centers), and cluster 3 represents high self-regulated learners (76% of the behavioral learning variables with positive cluster centers).

The moderate self-regulated learners had medium cluster center values in most of the behavioral learning variables, but they also had some extreme cluster center values in some of the behavioral learning variables. They tended to visit the syllabus and rubrics the least frequently, but they had the highest cluster center value of syllabus downloading. They also logged into the course least frequently and had the largest late submission value, while they have the highest cluster

remaining 40 were high self-regulated learners. The three-cluster results categorized these students into: (1) low self-regulated learners who have negative cluster centers in all five SRL scales; (2) moderate self-regulated learners who have negative cluster centers in goal setting, task strategies, and time management but positive cluster centers in self-evaluation and help-seeking; (3) high self-regulated learners who have positive cluster centers in all five SRL scales. Among the 65 participants, 29 were classified as high self-regulated learners, another 23 were moderate self-regulated learners, and the remaining 13 students were low self-regulated learners. The ANOVA results indicate that all ten of the SRL scales had significantly different cluster centers in the three clusters.
Based on the cluster analysis of both LMS trace data and self-reported SRL data, similar learning behavior patterns were found among low, moderate, and high self-regulated learners, although the trace data results were not as straightforward as the self-reported SRL data. However, upon further examination of the results of these two types of data analysis, only 53.846% and 30.769% of cases were classified into the same cluster in the two-cluster and three-cluster analyses respectively. This indicates that although the trace data and self-reported SRL data have similarities, they are different. This leads to the next research question.

6.4. RQ4: what are the explanations for any differences between the students' self-reported SRL data and the digital trace data?

To answer this research question, three types of data were used to triangulate (Table 2). Among the 65 participants, 11 voluntarily participated in a semi-structured interview. According to the cluster analysis results of trace data and self-reported SRL data, these 11 students were classified into different clusters. Table 7 shows the cluster results, with these 11 students represented by the letters Sa to Sk. It is apparent that students Sc, Sd, and Si were classified as high self-regulated learners by all four cluster analyses. Although students Se, Sf, and Sg were classified into different groups, the results are close. The big differences exist between the results for students Sb, Sh, Sj, and Sk.
center values of revisiting previous modules and asking the instructor Overall, it could be construed that both trace data and self-reported SRL
questions. In general, the moderate self-regulated learners spent less data were generally able to classify high self-regulated learners more
time studying and were not good at time management, but they tended accurately than moderate or low self-regulated learners in this study.
to improve their performance through strategies such as seeking help The logical next step is to compare high self-regulated learners with
from the instructor, revisiting previous modules, reading the syllabus to moderate or low self-regulated learners to explore what causes the dif
better understand the course requirements, etc. ferences by incorporating the interview data collected.
The low self-regulated learners had the lowest cluster center values Based on the cluster analysis results, students Sc, Sd, and Si can easily
in most of these behavioral learning variables. Based on the statistically be classified as high self-regulated learners. Three out of the four cluster
significant results, they spent least time studying, visiting topics and analysis methods identify students Se and Sf as high self-regulated
items, and watching lectures. They were the least active group in the learners, and the other method identifies them as moderate self-
online discussions. They were the latest group to begin to study a regulated learners (they can roughly be categorized as high self-
module and tended to submit assignments late, but their late submission regulated learners in order to be divided into two groups of similar
happened not as often as the moderate self-regulated learners. It appears numbers and compare them). Therefore, students Sc, Sd, Se, Sf, and Si are
that they tend to get the work done in the least time. They also logged grouped into “higher” self-regulated learners and the rest six students
into the course more frequently than the moderate group. However, they are categorized as “lower” self-regulated learners. Here, “higher” and
had the lowest cluster center values in self-evaluation and help-seeking.
Overall, they are a group of students who spend the least time studying Table 7
and do not utilize many learning strategies. Cluster memberships of the eleven students based on the cluster analysis results.
In summary, based on the cluster analysis results of LMS trace data, Student Trace data 2 Trace data 3 SRL survey 2 SRL survey 3
high self-regulated learners tend to spend the most time and effort in Clu Clu Clu Clu
studying and do not seek help much; moderate self-regulated learners
Sa High Moderate High Moderate
tend to spend moderate time and effort in studying and seek help the Sb High High Low Moderate
most often; low self-regulated learners tend to spend the least time and Sc High High High High
effort in studying and seek help the least often. Sd High High High High
As a comparison, cluster analysis was also conducted based on stu Se High Moderate High High
Sf High Moderate High High
dents' self-reported SRL data in both the pre-course and post-course Sg Low Low Low Moderate
surveys. The two-cluster results categorized these students into: (1) Sh Low Low High High
one group has all the SRL scales with negative cluster centers; (2) one Si High High High High
group has all the SRL scales with positive cluster centers. Among the 65 Sj High Moderate Low High
Sk Low Low High High
students, 25 were classified into low self-regulated learners, and the
D. Ye and S. Pennisi The Internet and Higher Education 54 (2022) 100855
“lower” are used because these students are classified comparably. We first compare these two groups by looking at the correlations between the three types of data: LMS trace data, self-reported SRL data, and interview data. Because only limited quantitative data can be collected through interviews, the comparison mainly focused on students' time management ability. Table 8 compares the correlation values for the higher and lower self-regulated learners, obtained by cross-checking these three types of data. Based on previous analysis, trace data reflect students' self-regulated learning more accurately than the self-reported SRL data, so trace data are compared with both the interview data and the self-reported SRL data. As shown in Table 8, overall, higher self-regulated learners tend to self-report their SRL more accurately while lower self-regulated learners' self-reported data tend to have more negative correlations with trace data. This explains why trace data and self-reported SRL data tend to classify high self-regulated learners accurately. The negative correlations found among lower self-regulated learners indicate that these students tend to self-rate SRL scales higher.

6.4.1. Patterns of higher self-regulated learners
The theme analysis results show that higher self-regulated learners have the following two patterns: (1) completing the assignments early and (2) being able to identify his/her weakness through self-reflection.

6.4.1.1. Pattern one: completing the assignments early. Students Sc, Sd, and Si all tended to get the assignments done early. This feature is a well-recognized and important sign of high self-regulatory ability.
Student Sc said:

“I am the type of person who doesn't like to procrastinate too much. So especially for this course, … so all the course assignments were due on Sunday nights, … well, I would try to have the lectures and the quizzes done by Wednesday. And I would try to have the practical lab assignments and the discussion posts done by Friday so that I could have the weekend to myself...”

Student Sd expressed a similar thought that he/she would usually have all the assignments done by Friday so that he/she did not have any work to do on the weekend. Student Si also indicated that he/she would get all the assignments done by Friday so that he/she could email the professor when he/she was confused about anything.
Although for different reasons, these students all aimed at completing the work at least 2 days before it was due so that there would be some flexibility when they needed it. The trace data show that students Sc, Sd, and Si submitted assignments an average of 58, 91, and 18 h before the deadline, respectively. It is noteworthy that although student Si was taking five courses in the semester, he/she was still able to complete all the assignments 18 h before the deadline on average, which is not easy. Based on the excerpts from students Sc and Sd, they also seem to be able to balance study and personal life well. They tend to get all their school work done by Friday and enjoy a study-free weekend.

6.4.1.2. Pattern two: being able to identify his/her weakness through self-reflection. Being able to identify one's weakness through self-reflection of learning and being honest with oneself is another important sign of high self-regulatory ability.
Student Sc admitted that he/she should work on the graduate school assignment earlier, saying:

I think if I were to do anything differently, it would be the graduate school stuff, because I did that all in the last, like month of classes. And so, if I had maybe looked at that before, it would ease my stress a little bit.

Student Sd expressed a similar thought that he/she should have started the graduate students' module earlier because that module was more challenging for him/her.
Student Si also planned to change, saying:

Um, I would definitely try to get in contact with my classmates more often just so because I know there're other people out there who probably don't understand some things or just wanted to keep like … keep each other accountable for our work…So I definitely love to do that and probably be more in contact with the professor to just like sending frequent emails...

Both students Sc and Sd admitted that they did not start to work on the graduate students' assignments early on in the semester, which was a mistake. Student Si reflected that he/she should interact with classmates or the professor more frequently. The reflections of students Sc, Sd, and Si were very specific and attainable. They admitted their mistakes frankly and did not make excuses.

6.4.2. Patterns of lower self-regulated learners
The theme analysis results show that lower self-regulated learners tend to procrastinate or study just enough to get their desired grade. They also do not plan to change their study approaches or strategies much. Because the regression analysis shows trace data can reflect students' learning more accurately, students Sg, Sh, and Sk are used here to represent lower self-regulated learners.

6.4.2.1. Pattern one: procrastinating or not studying hard. Procrastination or not studying hard is an important sign of low self-regulatory ability.
Student Sg admitted that he/she is dilatory, saying:

I mean, there's a couple of weeks where, you know, I just get busy. And I had to take the quizzes on Sunday or to do the discussions on Sunday, but there was always preparation prior to doing them, even if I did have to wait till the last minute to do those, which didn't happen very often.

Student Sg also admitted that he/she sometimes does not want to study although he/she knows it may harm his/her grade and bring stress:

I mean, there are times like I may not want to study, but it's also, you know, of a thing where it's like for the grade or good. And I have an understanding of that, like if I could just go ahead and knock it out now, tonight or later this week, it won't be as big of a stressor and I'll have a grasp on it and be able to perform well.

Student Sh also admitted that he/she sometimes tends to

Table 8
Correlation comparison between “higher” and “lower” self-regulated (SR) learners.

Correlations                                                                          Higher SR learners  Lower SR learners
Avg. time spent on each module, Trace vs. Interview                                   0.135               0.681
Avg. time spent on each module, Trace vs. Survey time management scale (pre)          0.015               -0.575
Avg. time spent on each module, Trace vs. Survey time management scale (post)         -0.239              -0.623
Avg. days visit per module, Trace vs. Interview                                       -0.295              -0.865
Total # of questions asked to instructors, Trace vs. Survey help-seeking (pre)        0.199               -0.629
Total # of questions asked to instructors, Trace vs. Survey help-seeking (post)       0.410               -0.625
Avg. assignment advance submission, Trace vs. Interview                               0.824               -0.240
Avg. assignment advance submission, Trace vs. Survey time management scale (pre)      0.380               -0.215
Avg. assignment advance submission, Trace vs. Survey time management scale (post)     0.622               0.060

Note. n = 5 for higher SR learners; n = 6 for lower SR learners. Numbers in bold indicate negative correlations.
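
Each entry in Table 8 is a correlation, computed within one group of students, between a trace-derived measure and the matching interview or survey measure. The sketch below illustrates that kind of computation. It assumes Pearson's r, which the text does not confirm, and it uses invented values for five students rather than data from this study; the function name pearson_r and both lists are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cross / sqrt(ss_x * ss_y)


# Invented illustration values (NOT data from the study):
# hours spent per module according to the LMS trace, and a self-rated
# time-management score (1-5) from a survey, for five hypothetical students.
trace_hours = [5.0, 12.0, 20.0, 33.0, 41.0]
self_rating = [4.5, 4.0, 4.2, 3.1, 2.8]

r = pearson_r(trace_hours, self_rating)
print(f"trace vs. survey: r = {r:+.3f}")
```

A negative coefficient, as reported in most of the lower SR rows, means that students with lower trace values tended to give higher self-ratings. With n = 5 or n = 6, a single student can change the size or even the sign of r, so coefficients of this kind are best read as descriptive.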
be used as strategies to improve learners' grades. Therefore, in future self-regulated learning studies, we should be cautious with the use of these two scales. It is also questionable if the self-reported SRL instrument is valid in its ability to reflect students' true self-regulatory ability.
By comparing students' self-reported SRL data with their trace data collected through the course, we found that students' digital trace data from the LMS reflect students' learning more accurately than self-reported SRL, which is consistent with existing research results (Cho & Yoo, 2017; Hadwin et al., 2007; Li et al., 2020). However, we should be aware that digital trace data heavily rely on the context of the course. For example, a student who spends little time and effort on this course may spend a lot of time and effort on a different course due to different perceived demands and interests. Therefore, it is questionable to interpret students' self-regulatory ability mainly based on the digital trace data of a particular course. A combination of digital trace data from multiple courses may be more reliable than that of a single course. We also should be cautious about potential ethical issues when using digital trace data to interpret students' self-regulated learning abilities. Besides the potential bias that exists in using digital trace data from a limited number of courses, some students may prefer to download learning materials to study offline and digital trace data will not account for this. Using digital trace data alone may lead to inaccurate judgments about the participation of these students. Just as Perrotta (2013) pointed out: “the decontextualized analysis of student data and the powerful performativity arguments that underpin them may subvert concerns for social equity and justice” (p. 119). More research should be conducted to explore and establish possible ethical practices and policies in relation to the use of digital trace data. For example, increasing the transparency in data ownership, analysis, and use may be one ethical practice that could be adopted (Pardo & Siemens, 2014).

7.3. Important learning behavior variables

The results of this study indicate that the most important learning behavioral variables that impact students' learning are: study regulation, online discussion interactions, timing of assignment submission (termed “procrastination” in the literature), time investment, and completion. This finding is consistent with existing reports (Cho & Yoo, 2017; Colthorpe et al., 2015; Gelan et al., 2018; Kim et al., 2018; Li et al., 2018; You, 2016). Compared to previous research, the results of the current study show that students' online discussion interactions are the most important learning behavioral variable, which has been largely neglected by other works. However, online discussions are worth about 28% of the final grade in the course, which may be the reason why online discussion interactions have been found to be the most important behavioral learning variable, and this would merit further study.

7.4. Learning behavior patterns based on students' trace data

Existing self-regulated learning theories tend to categorize students into two groups, such as proactive and reactive learners (Zimmerman, 2013) and learners with mastery and performance approach goals (Pintrich, 2000). According to the significant results of the two-cluster analysis, high self-regulated learners align well with learners with mastery approach goals, but there is no clear evidence to support that low self-regulated learners align with performance approach goals in this study. Both the elbow and silhouette methods have been performed to identify the optimal cluster numbers. Although the result of the elbow method appears to support both two and three clusters, the result of the silhouette method shows that the optimal cluster number is three. Moreover, it is more informative to categorize students into three to provide better differentiation. Bain (2004) suggested that most learning approaches fall into three categories: 1) the surface approach, in which students are interested primarily in surviving the course; 2) the strategic approach, in which students are driven by a desire to receive good grades; and 3) the deep approach, in which students are learning for mastery, conceptual understanding, and critical thinking. The results of the three-cluster analysis in this study are consistent with Bain's postulation: high self-regulated learners use the deep approach, so they spend a lot of time and effort in studying; moderate self-regulated learners use the strategic approach, so they tend to spend moderate time and effort in studying and use learning strategies such as self-evaluation and help-seeking more often to improve their grades; and low self-regulated learners use the surface approach, studying just enough to get the grades they wanted.
By comparing the trace data with self-reported SRL data and the interview data, it was found that high self-regulated learners tend to be more consistent when reporting their self-regulatory ability. Through thematic analysis, it was found that high self-regulated learners tend to submit assignments early and are able to self-reflect and see their own shortcomings clearly. In contrast, low self-regulated learners tend to procrastinate and, surprisingly, do not plan to change their behaviors much. It appears that low self-regulated learners are not concerned with their study habits although the digital trace data show that they do not have good study habits (tend to spend the least time and effort in studying). They might have intrinsically lower expectations in learning compared to high self-regulated learners. This may explain why low self-regulated learners tend to self-report their SRL ability relatively high. Another possible explanation is that low self-regulated students may know their own weakness in self-regulation, but they choose not to admit it. Based on Boekaerts' dual processing model (Boekaerts, 2011), low self-regulated learners might choose a well-being pathway to protect their ego from damage. According to Zimmerman (2013), it also could be that high self-regulated learners are proactive learners who can plan their learning strategically in order to see their limitations, while low self-regulated learners are reactive learners who cannot identify their own weaknesses without comparing themselves with others. It is not clear if low self-regulated learners truly are unable to identify their own weaknesses or, instead, they are choosing not to admit these weaknesses exist. How to help low self-regulated learners to identify their own limitations and motivate them to change might be worth further exploration in future studies.
Moderate self-regulated learners may value grades more than learning. It is critical to create a learning environment in which learning and grading work together for the students; for example, Farias, Farias, and Fairfield (2010) pointed out that educators could assign grades with a developmental perspective in which students can respond to feedback. One means of supporting grades-oriented learners is to provide detailed, critical feedback with high standards and assure learners that they are capable of achieving those standards. Through the process, they may become confident with their ability and eventually shift to being learning-oriented. This study also shows that educators should be cautious with the dilemma of learning versus grades when using self-evaluation and help-seeking strategies in online courses. For example, it may not be ideal to give unlimited attempts to a quiz. When answering students' questions, instead of giving answers directly, try to give hints, inspire students, or ask thought-provoking questions to promote students' independent thinking and deep learning.

7.5. Limitations of the study

Although students in this course came from several different disciplines, one of the limitations of this study is that data were collected from a single online course. Further studies in different disciplines and multiple online courses should be conducted to further validate the results of this study. In addition, only some of the students in the class agreed to participate in this study, which may bring bias. Further studies with more complete data should be conducted to yield more convincing results.
Another limitation of this study is that the course final grade was used as the outcome variable in the linear regression. It may be better to use students' GPAs as the outcome variable because students' self-regulatory ability is a relatively stable ability and GPAs are a kind of longitudinal data, which may match better than the final grades of a particular course. In this study, final grades were used because they reflect students' learning very well, as explained in the methods section. In addition, it is difficult to acquire students' GPAs. However, this study involved three types of data resources, and data triangulation was used to ensure the validity of the findings.

8. Conclusion and future research

Data availability

The authors do not have permission to share data.

Acknowledgements

I thank Dr. Lloyd Rieber at the University of Georgia for his guidance and advice throughout this research project and for his extensive feedback and suggestions on earlier drafts of this manuscript.
Pintrich, P. R. (2004). A conceptual framework for assessing motivation and self-regulated learning in college students. Educational Psychology Review, 16(4), 385–407.
Pintrich, P. R. (2000). The role of goal orientation in self-regulated learning. In M. Boekaerts, P. R. Pintrich, & M. Zeidner (Eds.), Handbook of self-regulation (pp. 452–502). San Diego, CA: Academic Press.
Siadaty, M., Gasevic, D., & Hatala, M. (2016). Trace-based microanalytic measurement of self-regulated learning processes. Journal of Learning Analytics, 3(1), 183–214. https://doi.org/10.18608/jla.2016.31.11
Van Rooij, S. W., & Zirkle, K. (2016). Balancing pedagogy, student readiness and accessibility: A case study in collaborative online course development. The Internet and Higher Education, 28, 1–7.
Winne, P. H. (2010). Improving measurements of self-regulated learning. Educational Psychologist, 45(4), 267–276. https://doi.org/10.1080/00461520.2010.517150
Winne, P. H., & Hadwin, A. F. (1998). Studying as self-regulated learning. In D. J. Hacker, J. Dunlosky, & A. C. Graesser (Eds.), Metacognition in educational theory and practice (pp. 279–306). Hillsdale, NJ: Erlbaum.
Winne, P. H., & Perry, N. E. (2000). Measuring self-regulated learning. In M. Boekaerts, P. R. Pintrich, & M. Zeidner (Eds.), Handbook of self-regulation (pp. 531–566). San Diego: Academic Press.
You, J. W. (2016). Identifying significant indicators using LMS data to predict course achievement in online learning. The Internet and Higher Education, 29, 23–30.
Yu, Q., & Zhao, Y. (2015). The value and practice of learning analytics in computer assisted language learning. Studies in Literature and Language, 10(2), 90–96.
Zimmerman, B. J. (2000). Attaining self-regulation: A social cognitive perspective. In M. Boekaerts, P. R. Pintrich, & M. Zeidner (Eds.), Handbook of self-regulation (pp. 531–566). San Diego: Academic Press.
Zimmerman, B. J. (2008). Investigating self-regulation and motivation: Historical background, methodological developments, and future prospects. American Educational Research Journal, 45(1), 166–183.
Zimmerman, B. J. (2013). From cognitive modeling to self-regulation: A social cognitive career path. Educational Psychologist, 48(3), 135–147.