
Computers in Human Behavior 98 (2019) 166–173

Contents lists available at ScienceDirect

Computers in Human Behavior


journal homepage: www.elsevier.com/locate/comphumbeh

Prediction of academic performance associated with internet usage behaviors using machine learning algorithms
Xing Xu (a, b, c, ∗), Jianzhong Wang (a), Hao Peng (d), Ruilin Wu (a, ∗∗)

a School of Humanities and Social Sciences, Beihang University, Beijing 100191, China
b Informatization Office, Beihang University, Beijing 100191, China
c School of Information Technology and Engineering, University of Ottawa, Ottawa K1N 6N5, Canada
d School of Computer Science and Engineering, Beihang University, Beijing 100191, China

ARTICLE INFO

Keywords: Higher education; Academic performance; Internet usage behaviors; Behavior discipline; Self-control; Machine learning

ABSTRACT

College students have increasingly convenient access to the Internet, which has a civilizing influence on students' learning and living. This study attempts to reveal the association between Internet usage behaviors and academic performance, and to predict undergraduates' academic performance from the usage data by machine learning. A set of features, including online duration, Internet traffic volume, and connection frequency, were extracted, calculated and normalized from the real Internet usage data of 4000 students. Three common machine learning algorithms, decision tree, neural network and support vector machine, were used to predict academic performance from these features. The results indicate that behavior discipline plays a vital role in academic success. Internet connection frequency features are positively correlated with academic performance, whereas Internet traffic volume features are negatively associated with academic performance. From the perspective of the online time features, Internet time consumed results in unexpected performance between different datasets. Furthermore, as the number of features increases, prediction accuracy generally improves for all the methods. The results show that Internet usage data are capable of differentiating and predicting students' academic performance.

1. Introduction

In recent decades, universities have provided students with increasingly convenient and easy access to the campus network. It is simple for students to connect to the Internet almost everywhere and anytime with computers and smart devices on the campus. Nowadays, the effect of Internet use on students' academic performance remains controversial. The application of the Internet beneficially influences students' engagement in self-directed learning (Rashid & Asghar, 2016). The knowledge or new skills found on the Internet boost the academic performance of students with relatively low self-efficacy (Zhu, Chen, Chen, & Chern, 2011). A study has demonstrated that the Internet has positive effects on educational attainment (Suhail & Bargees, 2006). In addition, with the appearance of MOOC and e-learning platforms, there are abundant learning resources that play a positive role in promoting students' academic achievement (Jung & Lee, 2018). However, the Internet can also negatively affect students' academic progress. The pervasiveness of the Internet has activated a self-control dilemma of problematic Internet usage (Dunbar, Proeve, & Roberts, 2018). Internet usage behaviors have been shown to be associated with academic performance. Higher rates (Ravizza, Hambrick, & Fenn, 2014) and excessive use (Cao, Masood, Luqman & Ali, 2018; Muusses, Finkenauer, Kerkhof, & Billedo, 2014) of the Internet were reported to be associated with poorer academic achievement. The study of Cao, Gao, Lian, Rong, Shi, Wang et al. (2018) reported that Internet addiction is the foremost reason for the failure of university study in China. Moreover, with the increasing popularity of smartphones, students spend considerable time on mobile communication, learning, entertainment, information seeking, social networking and gaming, all of which are based on the Internet. Recent research has shown that university students with a high risk of smartphone addiction are less likely to achieve high academic performance (Giunchiglia, Zeni, Gobbi, Bignotti, & Bison, 2018; Hawi & Samaha, 2016).

There are many residential universities and colleges in the world (e.g. most Chinese universities, and residential colleges at universities such as Harvard, Yale, Oxford and Cambridge). These


∗ Corresponding author. Informatization Office, Beihang University, Beijing 100191, China.
∗∗ Corresponding author.
E-mail addresses: xuxing66@buaa.edu.cn (X. Xu), wjz@buaa.edu.cn (J. Wang), penghao@act.buaa.edu.cn (H. Peng), wuruilin@buaa.edu.cn (R. Wu).

https://doi.org/10.1016/j.chb.2019.04.015
Received 30 January 2019; Received in revised form 7 April 2019; Accepted 19 April 2019
Available online 20 April 2019
0747-5632/ © 2019 Elsevier Ltd. All rights reserved.

campuses adopt centralized management, and the campus-Internet basically covers the entire area. Students have individual accounts which enable 24-h access to the campus-Internet for study, entertainment, or social contact etc. Self-control as a protective factor for Internet use can help improve academic competence and ensure problems get resolved (Wills, Pokhrel, Morehouse, & Fenster, 2011). Students' academic engagement (Zhang, Qin, & Ren, 2018) and perceived value of learning as a focal goal behavior (Dunbar et al., 2018) would not be undermined by the temptations of non-academic Internet use, since good behavioral self-control involves attributes such as the tendency to plan ahead, consider alternatives before acting, and link behaviors and consequences over time (Wills et al., 2011). On the contrary, problematic Internet usage attributed to an individual's weak self-control leads to negative educational outcomes (Kim, Namkoong, Ku, & Kim, 2008). In other words, Internet usage data that contain a number of behavioral records are able to reflect students' learning perceptions, study state, physical activity, and future job prospects etc. (Gialamas, Nikolopoulou, & Koutromanos, 2013; Zhang, Tran, Huong, Hinh, Nguyen, Tho et al., 2017). Analysis of these data has discovered various interesting relationships between online behavioral variables and academic performance. For example, Internet use for general or educational purposes can promote students' academic self-efficacy and positively correlates with academic performance (Chen, Hsiao, Chern, & Chen, 2014). Behaviors in a learning management system can be utilized as effective predictors of students' academic success (Xing, Chen, Stein, & Marcinkowski, 2016). The smartphone and Internet technologies collectively result in more time spent on Internet surfing than on studying, which negatively affects academic achievement (Giunchiglia et al., 2018). Additionally, our previous study (Xu, Wang, & Wang, 2016) revealed that the test failure rate can be predicted from undergraduates' online usage behaviors. Hence, we assert that Internet usage data provide researchers with an unobtrusive way to investigate the relationship between students' online usage behaviors and educational activities.

Studies on analyzing and predicting academic performance from students' online behavioral data, which allow for intervening in advance to provide guidance in students' learning, are not scarce. Some researchers focused on MOOC platform or learning management system log data, which record students' system operations, to analyze and predict the final grade in a specific course (Helal et al., 2018; Jiang, Williams, Schenke, Warschauer, & O'dowd, 2014; Romero, López, Luna, & Ventura, 2013; Xing et al., 2016). However, corresponding data are difficult to collect because these kinds of systems are not available for each course at every university. Moreover, these data only target a specific course, and can hardly be used to represent students' engagement in all courses. As an effective tool, the questionnaire survey is commonly used to collect Internet usage data to reveal the association between online behaviors and academic achievement (Apuke & Iyendo, 2018; Chen et al., 2014; Junco, 2012). Nonetheless, an acknowledged disadvantage of these studies is the lack of assurance, because behavioral information may be erroneously reported by students (Giunchiglia et al., 2018; Lee, Ahn, Nguyen, Choi, & Kim, 2017). In addition, for revealing the relationship between Internet usage and academic performance, some studies used special smartphone applications to track students' activities (Giunchiglia et al., 2018; Wang, Harari, Hao, Zhou, & Campbell, 2015). However, this is restricted to online behaviors on a single device. By contrast, full-time students' Internet usage, as a kind of behavioral data on campus, is widely available and can be easily collected by educators. To the best of our knowledge, the application of full-time Internet usage to academic performance analysis and prediction has not been reported so far.

To predict students' academic performance, it is essential to find helpful variables of Internet usage that could contribute to effective prediction. There is an extensive literature (Adelantado-Renau, Diez-Fernandez, Beltran-Valls, Soriano-Maldonado, & Moliner-Urdiales, 2018; Cao et al., 2018; Choi, Cho, Lee, Kim, & Park, 2018; Davis, 2001; Feng, Wong, Wong, & Hossain, 2019; Malhi, Bharti, & Sidhu, 2016; Muusses et al., 2014; Servidio, 2014; Zhang et al., 2018) that examined excessive Internet use and disclosed the connection between Internet usage and academic performance. Generally, researchers have focused on online time (Cao et al., 2018; Malhi et al., 2016; Muusses et al., 2014) and Internet connection frequency (Feng et al., 2019; Servidio, 2014) as the important Internet usage behavioral variables. Besides, several studies (Dunbar et al., 2018; Lau, 2017; Samaha & Hawi, 2016) focused on specific aspects of usage and pointed out that Internet-based activities involving human-machine interaction and entertainment-based Peer-to-Peer (P2P) applications, such as online game playing, video watching, photo and video sharing, and video and music file transfer, are attractive and carry an increased risk of Internet addiction. Usually, these online activities account for a high percentage of Internet traffic volume (Novak, 2008). That is, in addition to online time and Internet connection frequency, Internet traffic consumption might be a novel and meaningful type of variable to be emphasized. Furthermore, many studies examined the effect of online behavior on academic performance from the temporal perspective. For example, the study of Tang, Xing, and Pei (2018) indicated that the first temporal segment of the online learning experience was recognized as the most critical moment. Notably, Internet usage in the different time slots of each day provides important attributes that reflect students' behaviors and habits. Our previous study (Xu et al., 2016) took a careful look at the daily 24-h Internet usage behaviors and revealed that terminating the Internet connection at 12 a.m. would be helpful for students' academic performance. A recent study further showed that surfing online for entertainment during the early morning (12 a.m.–6 a.m.) was associated with Internet addiction and with negative academic performance (Choi et al., 2018). In sum, combined with the real Internet usage data, we concluded that online time, Internet connection frequency, Internet traffic volume and online time in divided time slots, which comprise the basic Internet usage behaviors, are potential features for predicting students' academic achievements.

In the current work, we aim to use students' real Internet usage data as a novel measure to reveal the potential relationship between Internet behaviors and academic performance. To be specific, Internet usage behaviors comprise online time, Internet connection frequency, Internet traffic volume and online time in 48 divided time slots. We will identify the ways in which students differ significantly between performance groups. We then apply commonly used machine learning algorithms to verify the effectiveness of predicting academic performance from students' Internet usage data. The research objectives addressed are:

1) Investigate significant differences between performance groups for the online usage behaviors.
2) Identify Internet usage features that correlate with students' academic performance.
3) Predict students' performance from the Internet usage data with supervised learning models.

2. Material and methodology

2.1. Dataset

This study focuses on the final grades of undergraduate students' compulsory courses as their academic achievement. The compulsory courses, also called required courses or core courses, are deemed essential for an academic degree.

The undergraduate students' grades of compulsory courses were obtained from a university's academic affairs management system. Accordingly, the students were divided into different subgroups. Two different ways of sub-grouping were adopted in this study: 1) a subgroup of passed students and a subgroup of failed students, which could


Table 1
A fragment of the raw data.

| UID | Online time (Second) | Offline time (Second) | Download volume (Bytes) | Upload volume (Bytes) | Terminal device |
|-----|----------------------|-----------------------|-------------------------|-----------------------|-----------------|
| 1 | 1,475,856,004 | 1,475,857,219 | 321,525 | 219,947 | iPhone |
| 2 | 1,475,856,029 | 1,475,861,245 | 24,901,898 | 712,326 | Android |
| 3 | 1,475,856,034 | 1,475,873,602 | 89,014,930 | 2,333,833 | Windows 10 |
| 4 | 1,475,856,065 | 1,475,856,078 | 1,065,582,274 | 16,695,871 | Windows 7 |
| 5 | 1,475,856,198 | 1,475,873,885 | 29,944,161 | 2,168,729 | Linux |
| … | … | … | … | … | … |
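As a minimal sketch of how such raw records could be turned into per-session quantities, the snippet below takes the five rows of Table 1 and derives the session length (offline time minus online time) and the total traffic (download plus upload). The field names and the helper `session_summary` are hypothetical and chosen only for illustration; the paper does not describe its preprocessing code.

```python
# Rows copied from Table 1:
# (UID, online timestamp, offline timestamp, download bytes, upload bytes)
ROWS = [
    (1, 1475856004, 1475857219, 321525, 219947),
    (2, 1475856029, 1475861245, 24901898, 712326),
    (3, 1475856034, 1475873602, 89014930, 2333833),
    (4, 1475856065, 1475856078, 1065582274, 16695871),
    (5, 1475856198, 1475873885, 29944161, 2168729),
]


def session_summary(row):
    """Derive session duration (seconds) and total traffic (bytes) for one record."""
    uid, online_ts, offline_ts, download, upload = row
    return {
        "uid": uid,
        "duration_s": offline_ts - online_ts,  # logout minus login
        "total_bytes": download + upload,      # corresponds to the "total volume" feature
    }


summaries = [session_summary(row) for row in ROWS]
for s in summaries:
    print(s)
```

For example, the first record lasts 1215 s (about 20 min), whereas the fourth lasts only 13 s yet transfers more than 1 GB, which illustrates why duration and volume are treated as separate feature categories.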

help to investigate the Internet behaviors of students who are at risk of test failure; 2) a subgroup of high-score students and one of non-high-score students, which could help to identify behavior features of high-score students. Here, we assign a student a passed label if all required courses were passed, and a failed label if the student failed at least one exam. A student who passed the tests and whose average grade across all required courses was no less than 90 (out of a full mark of 100) was labeled as high-score. Non-high-score, on the other hand, denotes an average grade of less than 90.

Internet usage data were collected from a university's Internet access system. Students' Internet usage, for example each access to Internet resources on the campus with their accounts, was recorded. The dataset contains online time, offline time, Internet download volume, Internet upload volume, terminal device type, etc. Online time and offline time, which are generated in timestamp format, represent the exact times of login and logout; Internet download volume represents data from the Internet to the user terminal; Internet upload volume denotes data from the user terminal to the Internet. These volume columns are calculated from network packets by the Internet access system and are expressed in bytes. Table 1 shows a fragment of the raw data.

2.2. Population and sampling

According to the final tests of the autumn semester of 2016, 14,926 undergraduates had taken the final examinations of the compulsory courses. Of these, 20.21% failed at least one test, and 10.34% belonged to the high-score group.

As mentioned above, we adopted two ways of sub-grouping, i.e. a passed group and a failed group in the passed & failed dataset, and a high-score group and a non-high-score group in the high-score & non-high-score dataset. In order to balance the sample size, we randomly chose 1000 students from each group in the two sub-datasets. Then, we obtained the Internet usage data of the sample students. With the National Day holiday removed from the dataset, there was a long period of 80 days without holidays, from 2016-10-8 until 2016-12-26. Thus, most students were supposed to live on campus and kept continuous Internet usage records. Altogether, we obtained more than 20 million records from the 4000 sample students of the two datasets.

Among these Chinese participants, the majority are male students (n = 3085, 77.13%), while the remainder are female (n = 915, 22.87%). The student cohort consists of sophomores (n = 1143, 28.58%), junior students (n = 1102, 27.55%), senior students (n = 886, 22.15%) and freshmen (n = 869, 21.72%). The age distribution is: 19 years old (n = 1028, 25.7%), followed by 20 years old (n = 970, 24.25%), 21 years old (n = 783, 19.58%), 18 years old (n = 692, 17.3%), 22 years old (n = 303, 7.57%), then those aged 23 years and above (n = 126, 3.15%), and those aged 17 and below (n = 98, 2.45%).

The overview of the sample datasets is presented in Fig. 1 for passed & failed students and in Fig. 2 for high-score & non-high-score students, respectively. In Figs. 1 and 2, the distributions of the average daily online duration, the average daily download volume, the average daily upload volume, and the average daily number of Internet connections, as raw Internet usage variables, are shown separately. The Y-axis represents the number of students, and the X-axis shows the values of the corresponding variable. Combined with the descriptive analysis, it can be seen from Fig. 1-a and Fig. 2-a that more than 75% of the students in both datasets spent half a day on the Internet, and some students from each group were connected to the Internet almost all day. A high proportion of students were online for 900–1300 min per day. As shown in Fig. 1-b, more than 50% of the students from the passed & failed dataset accessed the Internet less than 9 times a day. Meanwhile, over 50% of the high-score & non-high-score students made more than 10 Internet connections per day (Fig. 2-b). The peak frequency of Internet connections occurred between 3 and 6 times per day in both datasets. From Fig. 1-c and Fig. 2-c, more than 50% of the students consumed more than 1 GB of download volume per day. But most students had less than 1 GB of upload volume per day, as shown in Fig. 1-d and Fig. 2-d.

3. Methods

Because the raw Internet usage data cannot be directly employed for calculation, they were transformed into computable variables for further use. Students typically have different tasks at different times of day. For example, 6:00–8:00, 12:00–13:00 and 18:00–20:00 are usually meal times; 8:00–12:00 and 14:00–18:00 are usually the time to take courses or self-study; and midnight to dawn is sleeping time. In order to obtain more informative data reflecting Internet behaviors, the 24-h online period is divided into half-hour time slots, i.e. 0:00–0:29, 0:30–0:59, …, 23:00–23:29, 23:30–23:59, which are denoted Time1 to Time48. The average online duration within every half-hour time slot and the average daily total online duration of each student are then calculated.

In addition, the preference for mobile or PC devices can be extracted from the raw data, which also indicates the Internet connection frequency with different terminals per day. Besides the features of online duration and connection frequency, the average volumes of Internet data consumed per day are also calculated. After data preprocessing, the features from the Internet usage dataset are divided into three categories, which are listed in Table 2.

To reveal the differences in usage features between performance levels, the Mann-Whitney U test is performed to test the significance of the differences between groups, because of the features' skewed distributions. Then, Spearman's correlation coefficient is applied to quantify the correlation between academic performance and the students' behavioral features above. Finally, to examine the substantive validity of prediction from Internet usage data, decision tree (DT), neural network (NN) and support vector machine (SVM), three common machine learning algorithms for classification, are used to predict students' study performance; using common algorithms is intended to give the novel features and predictive framework generalized value.

4. Results

4.1. Difference between performance groups for the internet behaviors

In order to investigate whether these groups of students (passed vs. failed, high-score vs. non-high-score) differed in their Internet behaviors, the non-parametric Mann-Whitney U test was used. Comparisons of


Fig. 1. The distribution of the passed & failed dataset: the daily online duration (a), the daily number of connections (b), the daily download data (c), the daily upload data (d).

the features of daily online duration, Internet traffic volume and connection frequency between different subgroups are listed in Table 3.

From Table 3, the results reveal that features of Internet volume and connection frequency differ significantly between passed and failed students. Most features also show significant differences between high-score and non-high-score students, except for mobile connection frequency and total connection frequency. Failed students score significantly higher on download volume, upload volume and total volume compared to passed students. But in terms of connection frequency, passed students score significantly higher. They also spend more time on the Internet per day on average than their counterparts, although this difference is not significant. Meanwhile, high-score students score significantly

Fig. 2. The distribution of the high-score & non-high-score dataset: the daily online duration (a), the daily number of connections (b), the daily download data (c), the daily upload data (d).


Table 2
The feature set from the Internet usage dataset.

| Categories | Online duration | Internet volume | Connection frequency |
|------------|-----------------|-----------------|----------------------|
| Features | Time1, Time2, …, Time47, Time48, Daily online duration | Download volume, Upload volume, Total volume | Mobile connection, PC connection, Total connection |

lower on daily online duration, download volume, upload volume and total volume compared to their counterparts. On the contrary, they have a significantly higher PC connection frequency than the students from the low-score group.

Figs. 3 and 4 depict the mean values of the 48 daily time slots for the different groups. On average, passed students in Fig. 3 spent much more time on the Internet within every time slot, with the exception of early morning (Time5 - Time16). And it can be seen from Fig. 4 that non-high-score students were more likely to use the Internet in almost every time slot. For group comparison the Mann-Whitney U test was used. The test results between passed and failed students showed that all the time slots differed significantly, excluding Time4, Time5, Time15, Time16 and Time17 in Fig. 3. On the other hand, there were significant differences on 39 time slots between non-high-score and high-score undergraduates, the exceptions being Time1, Time19 - Time23 and Time46 - Time48 in Fig. 4.

4.2. Correlating internet usage features with academic performance

The well-known Spearman's nonparametric correlation analysis was performed to quantify the correlation strength between academic performance and features. Table 4 lists the test results, which exclude the 48 daily time slots.

As shown in Table 4, students' academic performance in the passed & failed dataset displayed a statistically significant association with all the connection frequency features. Study performance also showed a significant negative relationship with download volume (r=-0.161, p < 0.001) and total volume (r=-0.146, p < 0.05). In the high-score & non-high-score dataset, students' performance showed a significant correlation with daily online duration (r=-0.101, p < 0.001), download volume (r=-0.237, p < 0.001), upload volume (r=-0.118, p < 0.001), total volume (r=-0.227, p < 0.001), and PC connection frequency (r=0.110, p < 0.001).

As for the correlation between performance and the 48 daily time slots, we also performed Spearman's nonparametric correlation tests. Students' academic performance in the passed & failed dataset was significantly positively related to Time1 (r=0.106, p < 0.001), Time19 (r=0.103, p < 0.001), Time20 (r=0.111, p < 0.001), Time21 (r=0.108, p < 0.001), Time22 (r=0.103, p < 0.001), Time23 (r=0.101, p < 0.001), Time46 (r=0.111, p < 0.001), Time47 (r=0.117, p < 0.001), and Time48 (r=0.112, p < 0.001). Meanwhile, in the high-score & non-high-score dataset, students' study performance was significantly negatively correlated with Time5 (r=-0.119, p < 0.001), Time6 (r=-0.143, p < 0.001), Time7 (r=-0.152, p < 0.001), Time8 (r=-0.148, p < 0.001), Time9 (r=-0.141, p < 0.001), Time10 (r=-0.136, p < 0.001), Time11 (r=-0.134, p < 0.001), Time12 (r=-0.132, p < 0.001), Time13 (r=-0.130, p < 0.001), Time14 (r=-0.123, p < 0.001), and Time15 (r=-0.109, p < 0.001).

4.3. Predicting the academic performance

Although some significant differences have been observed, the insight gained from the difference analysis with the Mann-Whitney U test cannot be directly applied to make predictions about academic performance. The significant correlations imply that the Internet behaviors can be considered as feature classes to predict students' academic performance. We applied DT, SVM, and NN as supervised learning techniques. The detailed usage features were normalized to improve the computing efficiency of SVM and NN; decision-tree-based estimators, by contrast, are robust to arbitrary scaling of the data.

In order to figure out the effective features for the model, three feature groups were formed from different combinations of features, as follows: 1) Group_1 comprises the features of the online duration category; 2) Group_2 contains features from the categories of online duration and Internet volume, i.e. the features of Group_1 plus the three Internet traffic features; 3) Group_3 is made up of the features of all three categories, in which mobile connection frequency, PC connection frequency and total connection frequency are added to Group_2.

The datasets with different feature groups were loaded into the predictive models. We then made predictions on the held-out data to obtain the predictive results for passed & failed and for high-score & non-high-score separately. In each dataset the two classes contain the same number of students, corresponding to a baseline accuracy of 50%.

Considering the randomness of the computation, and to confirm the validity of the results, each feature group was evaluated with every machine learning model, and the computation was repeated ten times to obtain the average predictive accuracy. Python programs were used in the study. Fig. 5 shows the prediction results from different features with the three methods.

The findings indicated that the accuracies of all the techniques improved as the number of features increased. In particular, except for the results of the DT approaches, prediction accuracies from Group_3 features were significantly higher than the results from Group_1. The average prediction results of DT ranged 60.95%–62.30% for the passed & failed dataset and ranged 58.15%–60.35% for the features of another

Table 3
Difference between performance groups without 48-time slots.

| Features | Failed students (n = 1000), M(SD) | Passed students (n = 1000), M(SD) | U | Non-high-score students (n = 1000), M(SD) | High-score students (n = 1000), M(SD) | U |
|----------|-----------------------------------|-----------------------------------|---|-------------------------------------------|---------------------------------------|---|
| Daily online duration | 893.04 (382.45) | 944.77 (298.63) | 4.871E5 | 944.96 (326.25) | 906.49 (296.56) | 4.418E5∗∗∗ |
| Download volume | 2.13E9 (1.957E9) | 1.49E9 (1.285E9) | 4.068E5∗∗∗ | 1.79E9 (1.588E9) | 1.20E9 (1.466E9) | 3.633E5∗∗∗ |
| Upload volume | 4.77E8 (1.091E9) | 3.68E8 (7.512E8) | 4.745E5∗ | 4.49E8 (1.146E9) | 2.88E8 (5.934E8) | 4.318E5∗∗∗ |
| Total volume | 2.6E9 (2.614E9) | 1.86E9 (1.723E9) | 4.16E5∗∗∗ | 2.24E9 (2.324E9) | 1.48E9 (1.777E9) | 3.691E5∗∗∗ |
| Mobile connection | 9.17 (10.64) | 10.21 (9.81) | 4.328E5∗∗∗ | 10.43 (10.64) | 10.11 (10.29) | 4.975E5 |
| PC connection | 2.29 (2.46) | 3.00 (3.05) | 3.931E5∗∗∗ | 2.73 (2.62) | 3.10 (2.44) | 4.362E5∗∗∗ |
| Total connection | 11.45 (11.60) | 13.21 (11.07) | 4.226E5∗∗∗ | 13.16 (11.77) | 13.21 (11.30) | 4.838E5 |

∗∗∗ p < 0.001.
∗∗ p < 0.01.
∗ p < 0.05.
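The two rank-based statistics used in this section, the Mann-Whitney U statistic (Table 3) and Spearman's rho (Section 4.2), can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions rather than the authors' code: it assumes that the reported U is the smaller of the two possible U values, it assigns average ranks to ties, and it does not compute p-values. The function names are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Smaller of the two U statistics for samples x and y (assumed convention)."""
    ranks = average_ranks(list(x) + list(y))
    n1, n2 = len(x), len(y)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    return min(u1, n1 * n2 - u1)


def spearman_rho(a, b):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    ra, rb = average_ranks(list(a)), average_ranks(list(b))
    mean_a, mean_b = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(ra, rb))
    var_a = sum((p - mean_a) ** 2 for p in ra)
    var_b = sum((q - mean_b) ** 2 for q in rb)
    return cov / math.sqrt(var_a * var_b)
```

With two groups of 1000 students each, the expected U under no group difference is 1000 × 1000 / 2 = 500,000, so the further a value in Table 3 falls below 5E5, the stronger the separation between the groups.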


Fig. 3. The mean values of half-an-hour online duration of passed & failed dataset.

Fig. 4. The mean values of half-an-hour online duration of high-score & non-high-score dataset.
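The half-hour slot features (Time1 to Time48) plotted in Figs. 3 and 4 can be derived from login and logout timestamps roughly as sketched below. This is a hedged illustration, not the authors' procedure: it assumes the timestamps are Unix seconds and that local time is UTC+8, neither of which is stated in the paper, and the helper names are hypothetical.

```python
SLOT_SECONDS = 30 * 60       # one half-hour slot
SLOTS_PER_DAY = 48           # Time1 .. Time48
DAY_SECONDS = 24 * 60 * 60


def local_second_of_day(unix_ts, utc_offset_hours=8):
    """Seconds since local midnight. The UTC+8 default is an assumption."""
    return (unix_ts + utc_offset_hours * 3600) % DAY_SECONDS


def add_session(slots, login_ts, logout_ts, utc_offset_hours=8):
    """Add one session's online seconds to the 48 per-slot totals (index 0 is Time1)."""
    t = local_second_of_day(login_ts, utc_offset_hours)
    end = t + (logout_ts - login_ts)  # may run past midnight
    while t < end:
        next_boundary = (t // SLOT_SECONDS + 1) * SLOT_SECONDS
        chunk_end = min(next_boundary, end)
        slots[(t // SLOT_SECONDS) % SLOTS_PER_DAY] += chunk_end - t
        t = chunk_end
    return slots


# First row of Table 1: a 1215 s session that, under the UTC+8 assumption,
# starts 4 s after local midnight and therefore falls entirely into Time1.
slots = add_session([0] * SLOTS_PER_DAY, 1475856004, 1475857219)
print(slots[0])
```

Dividing each accumulated slot total by the number of observed days (80 in this study) would then give a student's average online duration per slot.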

Table 4
Correlations using Spearman's Rho coefficient.

| Features | Passed & Failed dataset | High-score & Non-high-score dataset |
|----------|-------------------------|-------------------------------------|
| Daily online duration | 0.022 | −0.101∗∗∗ |
| Download volume | −0.161∗∗∗ | −0.237∗∗∗ |
| Upload volume | −0.044 | −0.118∗∗∗ |
| Total volume | −0.146∗ | −0.227∗∗∗ |
| Mobile connection | 0.116∗∗∗ | 0.004 |
| PC connection | 0.185∗∗∗ | 0.110∗∗∗ |
| Total connection | 0.134∗∗∗ | 0.028 |

∗∗∗ p < 0.001.
∗∗ p < 0.01.
∗ p < 0.05.

dataset; the average prediction results of NN ranged 67.75%–70.95% and 63.70%–68.70%; SVM showed 69.55%–72.75% and 65.35%–69.55%. The highest values of average accuracy for the SVM were 72.75% ± 2.14% (Mean ± SD) and 69.55% ± 1.38% (Mean ± SD) from the two experiments, respectively. These values were 9.20–10.45 and 0.85–1.80 percentage points higher than the accuracies of the DT and NN, respectively.

Although the accuracies of DT were significantly lower than the values of NN and SVM with any feature group, its average predictive results exceed the baseline accuracy by about 12.30 percentage points in the passed & failed dataset and about 10.35 percentage points in the high-score & non-high-score dataset. Accordingly, using NN instead of DT results in performance of about 20.95 (passed & failed data) and 18.70 (high-score & non-high-score data) percentage points above baseline, and using SVM results in about 22.75 (passed & failed data) and 19.55 (high-score & non-high-score data) percentage points above baseline. As the results indicate, when the academic performance class is provided, the model has effective predicting power from Internet usage data between passed and failed, and between high-score and non-high-score, respectively.

5. Discussion

The study aimed to discover the differences in Internet usage behaviors between undergraduate students' academic performance levels, as well as the association between the behaviors and academic performance. Further, we explored whether Internet usage behaviors can be used to predict their academic performance. We found that Internet usage data have valid predictive power for the academic performance of individuals.

The results showed that there were significant differences between academic performance groups for the Internet behaviors. Higher Internet traffic volumes were consumed by students from lower performance groups, implying that these students might often use the Internet for leisure with heavy Internet traffic, for example playing video games, using video chat, or watching or downloading videos. This supports the point of Chen and Peng (2008) that non-heavy Internet-using students had better academic grades than heavy users addicted to non-academic web surfing. Considering the significant differences regarding daily online duration and Internet traffic volume between academic performance groups, it could be argued that students with heavy Internet usage might not have the necessary effective skills to use the Internet in a way that contributes to their academic achievements; this result is in line with Uzun and Kilis (2019). The undergraduates from the better groups accessed the Internet more frequently every day than the students of the opposite groups did, which could be explained by the students being inclined to surf the Internet as needed with strong self-control for obtaining study resources and


Fig. 5. The predicting results from different features with the three methods. Predictive results are presented as box plots, in which the box indicates quartile values;
the whiskers indicate the upper and lower values. Means with different letters differ significantly (Tukey's-b post hoc test, α = 0.05; a > b > … > e).
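
As a quick consistency check on the accuracy figures quoted in this section, the following Python sketch recomputes the gains over baseline from the best mean accuracies. The 50% baseline is an assumption inferred from the reported numbers (for example, 72.75% being described as about 22.75 points above baseline); it is not stated explicitly here, and all variable and function names are illustrative.

```python
# Best mean accuracies (%) reported in the text for each classifier and dataset.
best_accuracy = {
    "NN": {"passed_failed": 70.95, "high_score": 68.70},
    "SVM": {"passed_failed": 72.75, "high_score": 69.55},
}

# Assumption: a 50% baseline, inferred from the reported gains
# (e.g. 72.75% is described as about 22.75 points above baseline).
BASELINE = 50.0

# Reported DT gains over baseline (percentage points), from the text.
dt_gain = {"passed_failed": 12.30, "high_score": 10.35}


def gain_over_baseline(accuracy, baseline=BASELINE):
    """Return the accuracy gain over baseline in percentage points."""
    return round(accuracy - baseline, 2)


gains = {
    model: {name: gain_over_baseline(acc) for name, acc in per_dataset.items()}
    for model, per_dataset in best_accuracy.items()
}

# Margin of SVM over DT, derived from the two gains over the same baseline.
svm_over_dt = {
    name: round(gains["SVM"][name] - dt_gain[name], 2) for name in dt_gain
}

for model, per_dataset in gains.items():
    print(model, per_dataset)
print("SVM over DT:", svm_over_dt)
```

Under the assumed baseline, the recomputed gains agree with the quoted 20.95, 18.70, 22.75 and 19.55 points, and the SVM margin over DT falls within the quoted 9.20%–10.45% range.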

experiences to get better achievements. This is consistent with the research of Rashid and Asghar (2016). Nevertheless, this result contradicts the study of Servidio (2014), which found that the frequency of Internet connections per day was strongly related to Internet addiction, bringing about poorer academic achievement. In addition, from Time47 to Time12 of the daily online time slots, the students from better performance groups seemed to disconnect from the network more quickly than the students in the counterpart groups. It could be argued that, in comparison with those from higher performance groups, students from lower performance groups run a higher risk of becoming addicted to the Internet, due to their weaker self-control. After 6:00 a.m., students began to get up and connect to the Internet. As expected, the students from the better groups seem more likely to get up earlier than the others. Combined with the offline-online trends during the early morning, this suggests that the students with better study performance have stronger behavior discipline and live a “regular life”; this result is similar to Cao et al. (2018) and Kassarnig et al. (2018). It is also in line with Adelantado-Renau et al. (2018), who report higher grades with better sleep quality, with Internet use time acting as a mediator. These findings should be taken into consideration by educators, who should concentrate on potential ways to control the use of the Internet during bedtime.
As for the correlation between academic performance and Internet usage behaviors, students’ performance was significantly negatively associated with sub-dimensions of Internet traffic volume and online duration (high-score & non-high-score dataset). That is, as download volume and Internet connection duration increase, academic performance decreases. This is in line with the literature indicating that an increase in technology usage leads to a decrease in academic performance (Uzun & Kilis, 2019). Unexpectedly, the current study found that academic performance is not significantly related to daily online duration and is significantly positively related to the divided time slots in the passed & failed dataset. It could be argued that test failure might not depend on the daily online duration, and that students might be encouraged to spend more time using the Internet for study between 9 a.m. and 12 p.m. and between 10:30 p.m. and 12:30 a.m. to prevent test failure. Considering the positive and significant results regarding connection frequency, it could be argued that, in order to avoid interference from the Internet, better-performing students might access the Internet purposefully and in a planned way, instead of staying connected for long periods. This contradicts a previous study (Servidio, 2014) stating that the frequency of Internet connections per day was significantly and positively related to Internet addiction, resulting in adverse effects on academic performance.
In the predictive experiments, three algorithms were used to predict students' academic performance from grouped Internet usage features. The results showed a classification accuracy of around 23 percentage points above baseline, which is at a similar level to the study of Kassarnig et al. (2018) based on smartphone usage data. The accuracy is also at a similar level to the best prediction sub-model of students passing or failing a course from the online activity data generated by a university learning management system (LMS), in which the accuracy ranges from 71% to 76% (Helal et al., 2018). Meanwhile, the online usage behaviors of this study achieve higher classification accuracy than the students' enrolment data used in other studies (Helal et al., 2018; Kovacic, 2012). Although some studies (Romero et al., 2013; Xing et al., 2016) have obtained higher accuracy than our approaches, they targeted only one course and computed multi-dimensional features around that course. Course-specific predictions are restricted in their applicability, whereas the academic performance predicted in our study might have broader realistic application. Besides predicting whether students pass or not, we also take high score or not as the academic achievement to test the predictive model. Moreover, the prediction accuracies of feature Group_3 were significantly higher than those of Group_1 on NN and SVM. Furthermore, although no significant difference was observed, feature Group_2 resulted in slightly higher prediction accuracies on average compared with Group_1. To summarize, the students' Internet usage data are effective for predicting academic performance, and the richness of features helps to increase prediction accuracy.

6. Conclusions

In summary, Internet usage data are proposed as a new metric to assess students' academic performance. Our findings reveal that there are some significant differences in online behaviors between academic performance groups. Moreover, the features of Internet behaviors are significantly correlated with academic performance. The results indicate that behavior discipline plays a vital role in students' academic success. Internet usage data are effective for predicting academic performance, and increasing the number of Internet behavioral features can remarkably improve the prediction accuracy. Internet usage data can be collected easily and convey rich behavioral information.
Combined with the generality of machine learning techniques, the predictive procedure of this study has strong practical value for improving educational management, especially in residential universities and colleges. We hope that this demonstration will stir interest in further study of the impact of Internet usage on academic performance, as well as the interplay of individual and real Internet behaviors.
Further research can be conducted by adding more behavioral features to provide a more comprehensive reference for educators to develop student learning management. Since Internet resource types (e.g. entertainment, study, socialization, news) together with the access time of each type can comprehensively reflect students' Internet behavior, we recommend that future studies include more data on online behavior. For example, Internet resource types, temporal


distribution of access to different resources, etc. could be considered. Furthermore, the involvement of more demographic data is recommended. For example, birthplace and family details, which can reflect students’ cultural backgrounds, may have an impact on the relationship between online behavior and academic performance.

Acknowledgements

The authors gratefully acknowledge the support of China Scholarship Council (CSC).

References

Adelantado-Renau, M., Diez-Fernandez, A., Beltran-Valls, M. R., Soriano-Maldonado, A., & Moliner-Urdiales, D. (2018). The effect of sleep quality on academic performance is mediated by internet use time: DADOS study. Jornal de Pediatria. https://doi.org/10.1016/j.jped.2018.03.006.
Apuke, O. D., & Iyendo, T. O. (2018). University students' usage of the internet resources for research and learning: Forms of access and perceptions of utility. Heliyon, 4(12), e01052. https://doi.org/10.1016/j.heliyon.2018.e01052.
Cao, X., Masood, A., Luqman, A., & Ali, A. (2018a). Excessive use of mobile social networking sites and poor academic performance: Antecedents and consequences from stressor-strain-outcome perspective. Computers in Human Behavior, 85, 163–174. https://doi.org/10.1016/j.chb.2018.03.023.
Cao, Y., Gao, J., Lian, D., Rong, Z., Shi, J., Wang, Q., et al. (2018b). Orderliness predicts academic performance: Behavioural analysis on campus lifestyle. Journal of The Royal Society Interface, 15(146), 20180210. https://doi.org/10.1098/rsif.2018.0210.
Chen, L.-Y., Hsiao, B., Chern, C.-C., & Chen, H.-G. (2014). Affective mechanisms linking internet use to learning performance in high school students: A moderated mediation study. Computers in Human Behavior, 35, 431–443. https://doi.org/10.1016/j.chb.2014.03.025.
Chen, Y.-F., & Peng, S. S. (2008). University students' internet use and its relationships with academic performance, interpersonal relationships, psychosocial adjustment, and self-evaluation. CyberPsychology and Behavior, 11(4), 467–469. https://doi.org/10.1089/cpb.2007.0128.
Choi, J., Cho, H., Lee, S., Kim, J., & Park, E.-C. (2018). Effect of the online game shutdown policy on internet use, internet addiction, and sleeping hours in Korean adolescents. Journal of Adolescent Health, 62(5), 548–555. https://doi.org/10.1016/j.jadohealth.2017.11.291.
Davis, R. A. (2001). A cognitive-behavioral model of pathological Internet use. Computers in Human Behavior, 17(2), 187–195. https://doi.org/10.1016/S0747-5632(00)00041-8.
Dunbar, D., Proeve, M., & Roberts, R. (2018). Problematic Internet Usage self-control dilemmas: The opposite effects of commitment and progress framing cues on perceived value of internet, academic and social behaviors. Computers in Human Behavior, 82, 16–33. https://doi.org/10.1016/j.chb.2017.12.039.
Feng, S., Wong, Y. K., Wong, L. Y., & Hossain, L. (2019). The internet and facebook usage on academic distraction of college students. Computers & Education, 134, 41–49. https://doi.org/10.1016/j.compedu.2019.02.005.
Gialamas, V., Nikolopoulou, K., & Koutromanos, G. (2013). Student teachers' perceptions about the impact of internet usage on their learning and jobs. Computers & Education, 62, 1–7. https://doi.org/10.1016/j.compedu.2012.10.012.
Giunchiglia, F., Zeni, M., Gobbi, E., Bignotti, E., & Bison, I. (2018). Mobile social media usage and academic performance. Computers in Human Behavior, 82, 177–185. https://doi.org/10.1016/j.chb.2017.12.041.
Hawi, N. S., & Samaha, M. (2016). To excel or not to excel: Strong evidence on the adverse effect of smartphone addiction on academic performance. Computers & Education, 98, 81–89. https://doi.org/10.1016/j.compedu.2016.03.007.
Helal, S., Li, J., Liu, L., Ebrahimie, E., Dawson, S., Murray, D. J., et al. (2018). Predicting academic performance by considering student heterogeneity. Knowledge-Based Systems, 161, 134–146. https://doi.org/10.1016/j.knosys.2018.07.042.
Jiang, S., Williams, A., Schenke, K., Warschauer, M., & O'dowd, D. (2014). Predicting MOOC performance with week 1 behavior. Educational Data Mining 2014.
Junco, R. (2012). Too much face and not enough books: The relationship between multiple indices of Facebook use and academic performance. Computers in Human Behavior, 28(1), 187–198. https://doi.org/10.1016/j.chb.2011.08.026.
Jung, Y., & Lee, J. (2018). Learning engagement and persistence in massive open online courses (MOOCS). Computers & Education, 122, 9–22. https://doi.org/10.1016/j.compedu.2018.02.013.
Kassarnig, V., Mones, E., Bjerre-Nielsen, A., Sapiezynski, P., Dreyer Lassen, D., & Lehmann, S. (2018). Academic performance and behavioral patterns. EPJ Data Science, 7(1). https://doi.org/10.1140/epjds/s13688-018-0138-8.
Kim, E. J., Namkoong, K., Ku, T., & Kim, S. J. (2008). The relationship between online game addiction and aggression, self-control and narcissistic personality traits. European Psychiatry, 23(3), 212–218. https://doi.org/10.1016/j.eurpsy.2007.10.010.
Kovacic, Z. J. (2012). Predicting student success by mining enrolment data. Research in Higher Education, 15.
Lau, W. W. F. (2017). Effects of social media usage and social media multitasking on the academic performance of university students. Computers in Human Behavior, 68, 286–291. https://doi.org/10.1016/j.chb.2016.11.043.
Lee, H., Ahn, H., Nguyen, T. G., Choi, S.-W., & Kim, D. J. (2017). Comparing the self-report and measured smartphone usage of college students: A pilot study. Psychiatry Investig, 14(2), 198–204. https://doi.org/10.4306/pi.2017.14.2.198.
Malhi, P., Bharti, B., & Sidhu, M. (2016). Use of electronic media and its relationship with academic achievement among school going adolescents. Psychological Studies, 61(1), 67–75. https://doi.org/10.1007/s12646-015-0346-2.
Muusses, L. D., Finkenauer, C., Kerkhof, P., & Billedo, C. J. (2014). A longitudinal study of the association between Compulsive Internet use and wellbeing. Computers in Human Behavior, 36, 21–28. https://doi.org/10.1016/j.chb.2014.03.035.
Novak, D. C. (2008). Managing bandwidth allocations between competing recreational and non-recreational traffic on campus networks. Decision Support Systems, 45(2), 338–353. https://doi.org/10.1016/j.dss.2008.01.005.
Rashid, T., & Asghar, H. M. (2016). Technology use, self-directed learning, student engagement and academic performance: Examining the interrelations. Computers in Human Behavior, 63, 604–612. https://doi.org/10.1016/j.chb.2016.05.084.
Ravizza, S. M., Hambrick, D. Z., & Fenn, K. M. (2014). Non-academic internet use in the classroom is negatively related to classroom learning regardless of intellectual ability. Computers & Education, 78, 109–114. https://doi.org/10.1016/j.compedu.2014.05.007.
Romero, C., López, M.-I., Luna, J.-M., & Ventura, S. (2013). Predicting students' final performance from participation in on-line discussion forums. Computers & Education, 68, 458–472. https://doi.org/10.1016/j.compedu.2013.06.009.
Samaha, M., & Hawi, N. S. (2016). Relationships among smartphone addiction, stress, academic performance, and satisfaction with life. Computers in Human Behavior, 57, 321–325. https://doi.org/10.1016/j.chb.2015.12.045.
Servidio, R. (2014). Exploring the effects of demographic factors, Internet usage and personality traits on Internet addiction in a sample of Italian university students. Computers in Human Behavior, 35, 85–92. https://doi.org/10.1016/j.chb.2014.02.024.
Suhail, K., & Bargees, Z. (2006). Effects of excessive internet use on undergraduate students in Pakistan. CyberPsychology and Behavior, 9(3), 297–307. https://doi.org/10.1089/cpb.2006.9.297.
Tang, H., Xing, W., & Pei, B. (2018). Time really matters: Understanding the temporal dimension of online learning using educational data mining. Journal of Educational Computing Research. https://doi.org/10.1177/0735633118784705.
Uzun, A. M., & Kilis, S. (2019). Does persistent involvement in media and technology lead to lower academic performance? Evaluating media and technology use in relation to multitasking, self-regulation and academic performance. Computers in Human Behavior, 90, 196–203. https://doi.org/10.1016/j.chb.2018.08.045.
Wang, R., Harari, G., Hao, P., Zhou, X., & Campbell, A. T. (2015). SmartGPA: How smartphones can assess and predict academic performance of college students. Proceedings of the 2015 ACM international joint conference on pervasive and ubiquitous computing. Osaka, Japan: ACM.
Wills, T. A., Pokhrel, P., Morehouse, E., & Fenster, B. (2011). Behavioral and emotional regulation and adolescent substance use problems: A test of moderation effects in a dual-process model. Psychology of Addictive Behaviors: Journal of the Society of Psychologists in Addictive Behaviors, 25(2), 279–292. https://doi.org/10.1037/a0022870.
Xing, W., Chen, X., Stein, J., & Marcinkowski, M. (2016). Temporal predication of dropouts in MOOCs: Reaching the low hanging fruit through stacking generalization. Computers in Human Behavior, 58, 119–129. https://doi.org/10.1016/j.chb.2015.12.007.
Xu, X., Wang, J., & Wang, H. (2016). How surfing habits affect academic performance: An experimental study. International conference on web-age information management. Nanchang, China: Springer International Publishing.
Zhang, Y., Qin, X., & Ren, P. (2018). Adolescents' academic engagement mediates the association between Internet addiction and academic achievement: The moderating effect of classroom achievement norm. Computers in Human Behavior, 89, 299–307. https://doi.org/10.1016/j.chb.2018.08.018.
Zhang, M. W. B., Tran, B. X., Huong, L. T., Hinh, N. D., Nguyen, H. L. T., Tho, T. D., et al. (2017). Internet addiction and sleep quality among Vietnamese youths. Asian Journal of Psychiatry, 28, 15–20. https://doi.org/10.1016/j.ajp.2017.03.025.
Zhu, Y.-Q., Chen, L.-Y., Chen, H.-G., & Chern, C.-C. (2011). How does internet information seeking help academic performance? – the moderating and mediating roles of academic self-efficacy. Computers & Education, 57(4), 2476–2484. https://doi.org/10.1016/j.compedu.2011.07.006.
