
Market Basket Analysis of Student Attendance Records
Mohammed Hussain
College of Technological Innovation
Zayed University
Dubai, UAE, P.O. Box 19282
mohammed.hussain@zu.ac.ae

Abdullah Hussein
Department of Computer Science
University of Sharjah
Sharjah, UAE, P.O. Box 27272
ahussein@sharjah.ac.ae

Abstract: Many educational institutions enforce attendance policies, where students are expected to keep their absences below a certain percentage in each class. Attendance records are collected to enforce such policies, but they are rarely utilized for anything else. In this paper, we investigate the value of analyzing the records pulled from student attendance systems. We apply a data mining technique, market basket analysis, to student attendance data. The contribution of this analysis is the identification of student groups who share highly similar absence records. Such similarity may indicate that the students are missing classes due to peer pressure, rather than valid excuses. The presented method helps instructors and advisors discover this behavior, which is more efficient than relying on instructors, who may teach many classes. To minimize the number of false alarms, student groups are ranked based on their absence similarity. We tested our method by analyzing student attendance data for over two thousand students for one semester at a public higher education institution. The results were helpful in identifying students who miss classes because their friends miss classes.

Keywords: Educational data mining; Learning analytics; Mining student behavior

This research was funded through Zayed University Research Incentive Fund R71068.

I. INTRODUCTION
Student advisors and counselors strive to ensure the success of their students. Apart from advising students with respect to the courses they need to register for in the next semester, advisors and counselors are expected to encourage students to put more effort into their courses, motivate students to engage in extracurricular activities and help students reflect on their behavior. Instructors may help advisors by providing feedback on the performance of the advisees. The authors' institution uses a system to keep track of student attendance. The institution uses the system to enforce attendance, where students are withdrawn from a course if they exceed a certain number of absences. The system notifies advisors once their students are in danger of being withdrawn from a course. The system falls short of reporting the following behavior. Alice and Bob are two students who are classmates in a few courses. Bob and Alice are also friends. Should Bob skip a class, Alice may feel pressured to skip that class. The same may be true the other way around. This behavior may be repeated a few times in each course, which will amount to many absences. However, their behavior will fly under the attendance system's radar, since they are skipping classes from different courses. Neither the instructor nor the advisor will suspect that one of the students is skipping classes because of peer pressure. Such behavior may result in students failing their courses. This problem remains unsolved, and it is the problem addressed in this paper.

The research question investigated in this paper is the value of analyzing student attendance records, which are typically collected only to enforce attendance policies. Specifically, would mining such data help instructors and advisors identify students who miss their classes due to peer pressure, as described earlier? The importance of this work comes from the fact that the number of students may reach tens of thousands in many educational institutions. Thus, any meaningful insight, even if it applies to a small percentage of students, will have great value.

This paper addresses the stated research question through the field of educational data mining and learning analytics [1-4]. Educational data mining and learning analytics have gained great attention as tools that provide valuable insights to higher educational institutions through the analysis of student data collected from the various IT systems. Numerous papers have used data mining techniques to predict student academic performance, engagement in classes and preferences [5-10]. This work draws on this field by applying a popular data mining technique, specifically market basket analysis, to student data. Market basket analysis is used in e-commerce, by retailers such as Amazon and eBay, to analyze how frequently a group of products is purchased together and to recommend products to customers.

The rest of the paper is organized as follows. Section II discusses the background and related work on educational data mining and learning analytics relevant to this paper. Section III presents our method of analyzing student attendance records, based on a market basket analysis. Section IV illustrates the method through a case study involving the attendance records of over two thousand students throughout a semester, collected from the attendance system at the authors' institution. Section V concludes the paper and presents the current limitations and planned future work.

II. BACKGROUND
Students in higher educational institutions interact with many systems, such as registration and learning management. Mining student data collected from these systems helps discover useful insights with respect to student behavior. Researchers in the field of educational data mining and learning analytics have already demonstrated that such data can be used to predict student academic performance, preferences and engagement [5-9] and [13-14].

A survey on the application of data mining methods to achieve different educational purposes, such as the way the input and output of the educational process affect each other, was presented in [4]. The survey describes the way different research work on determining student failure/success rate in

978-1-5386-9506-7/19/$31.00 ©2019 IEEE 9–11 April, 2019 - American University in Dubai, Dubai, UAE
2019 IEEE Global Engineering Education Conference (EDUCON)
Page 1198
order to help students before they reached risk of failure, and effective resource utilization and cost minimization were also studied. The authors clustered the students based on their class performance and overall attendance (low, medium and high). This helped the authors in predicting the students' graduation performance in the final year at university using only pre-university marks and examination marks of early years at university. The authors of [5] studied the effect of student class attendance on academic scores and reported a significant relation between attendance and the academic score. The study used the t-test to measure the relation between the percentage of attendance and the percentage score of the students. A survey on the application of data mining in learning management systems is presented in [6]. The survey describes the way each step of the data mining process is applied to the field of e-learning, from preprocessing to interpreting results. The survey focused on the open-source Modular Object-Oriented Development Learning Environment (MOODLE) as the source of the educational data.

The authors of [7] extracted knowledge that describes students' performance in the end-of-semester examination to help in identifying dropouts and students who need special attention. A study to identify patterns of interaction between students is presented in [8]. The study related these patterns to student performance. A case study was presented where students were trained on the use of the Internet to accomplish education-related tasks. By monitoring the students' communications while solving the difficulties that arose, the course instructors searched for patterns of interaction and related these patterns to student performance and final course grades. As the prediction of student performance early in the course is vital to student success, the authors of [9] presented an approach to evaluate student data and predict student performance in courses during their early period of study. Students were asked to fill in a questionnaire that included questions related to several personal, socio-economic, psychological, school and college related variables that were expected to affect student performance. The authors then built data mining models based on the data collected from the questionnaire.

Learning analytics is an essential component for leveraging the benefits of big data [10] in educational contexts. The challenges and opportunities of big data for educational institutions are studied in [11-12]. The authors of [13] presented a method to enhance the academic accreditation process through the application of big data. The method analyzes assessment tools and learning outcomes and helps educators in aligning assessment with outcomes. A method to identify student utilization of campus facilities through tracking student access to campus wireless networks was presented in [14].

A common approach among the educational data mining literature described earlier [5-8] is the use of student attendance, performance in assessment and interaction with learning resources to predict student performance and success. Student attendance is modeled as a qualitative value, such as low, average and high attendance. This paper utilizes the actual attendance records. Using the actual records allows us to utilize more data mining techniques, such as the market basket analysis used in this paper. Further, existing literature does not help in finding groups of students who share very similar attendance records across different classes, which may signify missing classes due to peer pressure. The next section describes the presented method for finding patterns within student attendance records.

III. METHOD
We propose a method that utilizes the attendance records for all students. The collected data is then passed to a data mining technique for analysis. Specifically, the method mines the records for association rules, where each rule links the absences of one student to the absences of one or more students. The method makes use of market basket analysis [15], which is a data mining technique that retailers, such as Amazon and eBay, use to find associations between their products. Retailers generate association rules by inspecting their transactions and finding items that frequently appear together. For example, the following rule may be used by a retailer recommender system.

{Gaming_Console, Motion_Detector} ⇒ {Motion_Game}

The rule states that if a customer buys a certain gaming console, as well as that console's motion detector, then the customer is likely to buy a motion-based game, such as a dancing game. The set of the two items, Gaming_Console and Motion_Detector, represents the left side of the association rule, whereas the set of one item, Motion_Game, represents the right side of the rule. Such rules help retailers recommend products to customers. A popular algorithm to generate association rules is the apriori algorithm [16].

In our work, we use the apriori algorithm to find rules associating the absences of one student to the absences of one or more students. The generated rules help instructors and advisors identify students who are frequently absent together. Instructors and advisors may meet with such students to investigate the reason behind this behavior. Such intervention supports student academic success. Please refer to Fig. 1, which illustrates the proposed method. The next subsection defines the necessary terminology for the apriori algorithm. Please refer to [16] for the exact algorithm.

[Fig. 1. The system architecture. Labels recoverable from the figure: Instructor, Attendance System, Learning Analytics, Advisor; flows: class attendance, attendance dataset, absence rules.]

A. The Apriori Algorithm
The apriori algorithm works on transactional datasets where each row represents a transaction consisting of a set of items. Table I shows an example of a dataset of five transactions. Consider the set of available items for purchase: gaming consoles, controllers, virtual reality sets, virtual reality

games, action games and online passes. There are two important values that need to be calculated for each association rule: the rule's support and its confidence. First, let us define the support for a set of items. The support of a set is the number of transactions containing all of the items in that set. The support may also be expressed as the ratio of the number of transactions containing all the items in that set to the number of all transactions in the dataset; that is, we may express the support as a percentage. Please refer to formula (1), where X is a set of items and T is the set of all transactions in the dataset.

$$\mathrm{support}(X) = \frac{|\{\, t \in T : X \subseteq t \,\}|}{|T|} \qquad (1)$$

Similarly, the support of a rule is the number of transactions containing the items of both the left-side and right-side sets of that rule. The confidence of an association rule (left side ⇒ right side) is calculated as the ratio of the support of the union of the left-side and right-side sets to the support of the left-side set, as in formula (2), where X is the left-side set and Y is the right-side set. Rules with higher confidence and support signify that if the customer is showing interest in the items of the left-side set, then there is a high probability that the customer will show interest in the items of the right-side set.

$$\mathrm{confidence}(X \Rightarrow Y) = \frac{\mathrm{support}(X \cup Y)}{\mathrm{support}(X)} \qquad (2)$$

TABLE I. SAMPLE TRANSACTION DATASET

Trans  Items
1      Gaming_Console, VR_Set, VR_Game
2      Gaming_Console, Controller, Action_Game, Online_Pass
3      VR_Set, VR_Game
4      Gaming_Console, Action_Game, Online_Pass
5      VR_Set, VR_Game, Controller

It is clear from Table I that all transactions containing a VR_Set also contain a VR_Game. This observation provides a rule R1: {VR_Set} ⇒ {VR_Game}. Most of the transactions containing a Gaming_Console also contain an Online_Pass. This observation provides us with a second rule R2: {Gaming_Console} ⇒ {Online_Pass}. More rules can also be extracted. The support for R1 is 3 (the number of transactions containing both VR_Set and VR_Game). The support for R2 is 2 (the number of transactions containing both Gaming_Console and Online_Pass). Using formula (2), the confidence for R1 is 3/3 = 1, since all three transactions containing a VR_Set also contain a VR_Game. By a similar calculation, the confidence for R2 is 2/3 ≈ 0.67, since two of the three transactions containing a Gaming_Console also contain an Online_Pass. Both the support and confidence are important values that reflect the usefulness of the rule.

To identify and generate association rules on large datasets, with thousands of items, the apriori algorithm looks for rules that meet a certain support threshold, as well as a certain confidence threshold. The next subsection discusses the application of apriori in mining the attendance records, which is the main contribution of this paper.

B. Mining Attendance Records with Apriori
Products bought together by one customer comprise one transaction. Rather than dealing with transactions and products, this paper applies the apriori algorithm to students' absence records. The rules we generate have a structure similar to the example rule below. The rule states that if StudentA and StudentB are absent, then StudentC will likely be absent too. The assumption here is that the three students are missing the same class on the same date. Students who miss the same class on the same date comprise one transaction.

{StudentA, StudentB} ⇒ {StudentC}

TABLE II. SAMPLE ANONYMIZED STUDENT ABSENCE MATRIX

Transaction    Student1  Student2  Student3  Student4  …  Studentn
CS101_1_Jan7   0         0         0         0         …  1
CS101_1_Jan7   1         0         0         1         …  0
…              …         …         …         …         …  …
PH215_3_May9   1         0         1         0         …  0
PH215_3_May9   0         0         0         1         …  1

The following is a brief description of the steps to run apriori on student absence records and to make meaningful use of the algorithm's output.

Step 1. The first step is to collect the absence records for each student in each class. A list of student ids, Stu_IDs, needs to be extracted. Similarly, a list of classes/sections, Sec_IDs, needs to be extracted; an example section id is CS101_3.

Step 2. For each section, extract the dates on which that section had a session and the instructor of that section recorded the attendance. Append each date to the Sec_ID, creating a list of dated sessions for each section, Sec_ID_Date_IDs. For example, if section CS101_3 had a session on Apr 12th, then CS101_3_Apr12 is created. This is repeated for each section to create a list of lists, Master_Sec_Date_List. An example element of Master_Sec_Date_List is

CS101_3_Apr12, where CS101_3 is the section id and Apr12 is the date of one of the sessions. Each element of Master_Sec_Date_List is therefore a transaction label.

Step 3. Create a matrix where the rows represent the elements of Master_Sec_Date_List and the columns represent the student ids. The cell (i, j) in the created matrix represents whether student j was absent (1) or present (0) during session i (note that i is composed of a section id and a date id). Table II is an illustration of the transaction matrix that we supply to the apriori algorithm. The second row shows that student1 was absent along with student4 on Jan 7th in section CS101_1.

Step 4. Fill the matrix created in Step 3 with student absence values based on the records pulled from the attendance system.

Step 5. Run the apriori algorithm on the matrix computed in Step 4. The support and confidence thresholds need to be specified. The support threshold is based on the number of sessions. If a student takes 5 courses in a semester, where each section holds 30 sessions throughout the semester, then the maximum number of sessions is 150. For two students with identical class schedules, the maximum number of times both can be absent together is 150. However, it is highly unlikely that a student misses all of the sessions in all of the classes. Therefore, one may set the support threshold to a reasonable value. For example, a threshold of 10 helps generate association rules where each rule is supported by 10 incidents in which the two students were absent together. The confidence threshold is based on the required rule strength. Setting the threshold to 0.7 helps generate rules where each rule associates the absence of one student with that of another at a confidence of at least 70%. Setting the support and confidence thresholds also depends on the number of rules the user is willing to go through. The lower the thresholds, the more rules apriori generates.

Step 6. The output of the apriori algorithm is a set of rules that meet the thresholds set in Step 5. If a student appears in a rule, then the advisor of that student may receive a notification with regard to that rule. An example output is displayed in Table III. Table III lists a set of the actual anonymized rules generated based on the attendance records collected from the authors' institution. The experiment is detailed in the next section.

Step 7. The advisor needs to investigate the rules. Note that some rules may be the result of students coincidentally missing classes, without peer pressure. The advisor may check the students' overall performance to set the priority for meeting students. Students who miss classes and are on probation need more urgent attention than students in good standing.

Step 8. Finally, advisors provide feedback on the usefulness of the generated rules to the person responsible for generating them. The feedback helps tweak the support and confidence thresholds for the next rounds.

TABLE III. SAMPLE ANONYMIZED STUDENT ABSENCE RULES

R#  Left ⇒ Right    Support %  Confidence
1   {n} ⇒ {m}       0.01077    1.00
2   {l, k} ⇒ {j}    0.01197    1.00
3   {g} ⇒ {f}       0.01197    0.77
4   {j, l} ⇒ {k}    0.01197    1.00
5   {f} ⇒ {g}       0.01197    0.91
6   {q, s} ⇒ {r}    0.01197    1.00
7   {z, x} ⇒ {y}    0.01077    0.90
8   {r} ⇒ {s}       0.01677    0.90
9   {j} ⇒ {k}       0.01437    0.85
10  {γ} ⇒ {ε}       0.01317    0.85
11  {δ} ⇒ {z}       0.01557    0.86
12  {y} ⇒ {w}       0.01437    0.80
13  {α} ⇒ {β}       0.01677    0.74
14  {j} ⇒ {l}       0.01197    0.71
15  {x} ⇒ {w}       0.01916    0.80
16  {x} ⇒ {y}       0.01677    0.70

IV. EXPERIMENT
To demonstrate the presented method, we analyzed the attendance records of more than 2000 students over the period of one semester at the authors' institution. The apriori algorithm was applied to perform the basket analysis. The authors implemented their approach using the R programming language [17], a popular data mining software environment. The following is a summary of the data collection, analysis and results of the experiment.

A. Data Collection and Preprocessing
The authors collected more than 50,000 absence records, where each record documents the absence of a student with respect to a class. Each record in the collected attendance data consists of the following items: St_ID is the student id, C_ID is the course id, Sec_ID is the section id and A_Date is the date of the absence:

Record_i = (St_ID, C_ID, Sec_ID, A_Date)

The records were then preprocessed to remove the first and last weeks of classes. The rationale is to avoid collecting absence records for days on which many students are absent. In the first week, students are busy with add and drop, while in the last week, many students miss classes to prepare for assessment. The authors also preprocessed the records for students who changed sections. An attendance matrix was generated for each section, where rows represent the dates on which that section had a session and columns represent the students registered in that section, i.e., the class roster.
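
The record layout and the section-and-date transaction labels described above can be sketched in Python. This is only an illustrative sketch under stated assumptions, not the authors' R implementation: the sample records are invented, the function names build_transactions, support_count and pair_rules are hypothetical, the label format C_ID_Sec_ID_A_Date is an assumed reading of the transaction labels, and the brute-force enumeration of single-student rules stands in for the level-wise apriori search of [16].

```python
from collections import defaultdict
from itertools import combinations

# Invented absence records in the (St_ID, C_ID, Sec_ID, A_Date) layout.
records = [
    ("s1", "CS101", "1", "Jan7"),
    ("s4", "CS101", "1", "Jan7"),
    ("s1", "CS101", "1", "Jan14"),
    ("s2", "CS101", "1", "Jan14"),
    ("s4", "CS101", "1", "Jan14"),
    ("s1", "PH215", "3", "May9"),
    ("s4", "PH215", "3", "May9"),
    ("s1", "PH215", "3", "May16"),
]


def build_transactions(records):
    """Group absent students by session label, e.g. 'CS101_1_Jan7'."""
    transactions = defaultdict(set)
    for st_id, c_id, sec_id, a_date in records:
        transactions[f"{c_id}_{sec_id}_{a_date}"].add(st_id)
    return dict(transactions)


def support_count(itemset, transactions):
    """Count the transactions that contain every member of itemset."""
    items = set(itemset)
    return sum(1 for members in transactions.values() if items <= members)


def pair_rules(transactions, min_support, min_confidence):
    """List rules {left} => {right} between two students.

    min_support is an absolute count of shared absences. This is a
    brute-force pass over all pairs, not the apriori candidate search.
    """
    students = sorted(set().union(*transactions.values()))
    rules = []
    for a, b in combinations(students, 2):
        both = support_count({a, b}, transactions)
        if both < min_support:
            continue
        for left, right in ((a, b), (b, a)):
            confidence = both / support_count({left}, transactions)
            if confidence >= min_confidence:
                rules.append((left, right, both, confidence))
    return rules


transactions = build_transactions(records)
rules = pair_rules(transactions, min_support=2, min_confidence=0.7)
for left, right, support, confidence in rules:
    print(f"{{{left}}} => {{{right}}}: support={support}, confidence={confidence:.2f}")
```

With this toy data, only the pair s1 and s4 meets both thresholds. The rule {s4} ⇒ {s1} has confidence 1.0, while {s1} ⇒ {s4} has confidence 0.75 because s1 misses one session without s4. Rules with more than one student on the left side, such as those in Table III, would need the full apriori search.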

Fig. 2. A Graph of the Anonymized Rules – the weight of an edge is the support of the rule
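
A graph such as the one in Fig. 2 can be assembled directly from mined rules. The following Python sketch is an illustration under assumptions rather than the authors' plotting code: it takes a handful of the rules listed in Table III, links every student on the left side of a rule to every student on the right side (an assumed edge convention), keeps the largest support seen for each edge, and reports the connected groups. The names build_graph and student_groups are hypothetical.

```python
from collections import defaultdict

# A few rules copied from Table III as (left side, right side, support %).
rules = [
    (("l", "k"), ("j",), 0.01197),
    (("j",), ("k",), 0.01437),
    (("j",), ("l",), 0.01197),
    (("α",), ("β",), 0.01677),
    (("z", "x"), ("y",), 0.01077),
    (("δ",), ("z",), 0.01557),
    (("y",), ("w",), 0.01437),
    (("x",), ("w",), 0.01916),
    (("x",), ("y",), 0.01677),
]


def build_graph(rules):
    """Undirected weighted graph with students as nodes.

    Assumption: each left-side student is linked to each right-side
    student, and an edge keeps the largest support among its rules.
    """
    graph = defaultdict(dict)
    for left, right, support in rules:
        for a in left:
            for b in right:
                weight = max(support, graph[a].get(b, 0.0))
                graph[a][b] = weight
                graph[b][a] = weight
    return graph


def student_groups(graph):
    """Connected components, largest first, each as a sorted list."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node].keys() - component)
        seen |= component
        groups.append(sorted(component))
    return sorted(groups, key=len, reverse=True)


graph = build_graph(rules)
groups = student_groups(graph)
for group in groups:
    print(group)
```

On these nine rules the sketch yields three groups: the five students w, x, y, z and δ, the three students j, k and l, and the pair α and β. Other edge conventions, for example also linking students who appear together on the left side of a rule, would produce a denser graph.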

B. Analysis and Results
Steps 1 to 8 from Section III.B were followed to mine the collected data for association rules. The selected thresholds were support = 0.0001, around 5 transactions, and confidence = 0.25. We used RStudio [footnote 1], an open-source IDE for R, to perform the analysis.

Footnote 1. RStudio: https://www.rstudio.com

As described earlier, the apriori algorithm generates rules in the following structure: (left ⇒ right: support, confidence), where left refers to the left-hand side of the rule and right is the right-hand side of the rule. Both left and right denote a set of one or more students, where the arrow between them denotes that if left is absent, then right is absent too. The support is the percentage of records where the students on both sides were found to be absent together. The confidence is the proportion of the sessions missed by the students on the left in which the students on the right were absent too. Table III lists a number of rules, where actual student ids were replaced with random letters to ensure anonymity. A support of 0.01078% translates to around 5 absences in our dataset. A confidence of 1 means that the student on the right side of a rule is always absent if the student on the left side of that rule is absent. The results of the analysis helped us identify over 100 pairs of students who share similar absence records. A likely cause is that one student in each pair may be skipping classes because his or her friend is absent.

These observations should help advisors and instructors in identifying such cases of absences as early as possible.

To visualize the results, the authors plotted the rules as a graph, where students are nodes. The edges represent the absences two students share in the same class, weighted by the actual number of absences. Fig. 2 shows part of the generated graph for this experiment. The letters that appear in the graph represent the same students as in Table III.

Interesting examples can be seen in the figure. For example, consider the students l, k and j. They form a group in which, if one is absent, the others are highly likely to be absent too. The students α and β have 14 absences in common, which is highly unlikely to be coincidental. Larger groups are also found, such as the group of five students x, w, y, z and δ. The student y appears at the center of that group. This may signify that y is the leader of that group and that y has major influence on the rest of the students.

V. CONCLUSION
Student attendance systems, such as the one at the authors' institution, keep absence records of each student for each of his or her classes in each semester. The records are used only for enforcing attendance policies, to ensure that students' absences do not exceed a certain threshold. Mining student attendance

records can be useful for identifying students who miss classes due to peer pressure. We investigated the value of analyzing the records pulled from the student attendance system. We presented a method that applies a market basket analysis, using the apriori algorithm, to student attendance data. The method was used to mine over 50,000 attendance records that belong to over 2000 students at the authors' institution. The method was successful in generating more than 100 association rules. Each rule links the absences of one student to the absences of one or more students.

The main limitation of this work is that the method needs a significant number of records to be able to generate the rules. At least half of the semester may pass before enough absences can be recorded by class instructors. Nevertheless, advisors may benefit from this approach to better advise students for the next semester.

REFERENCES
[1] R. S. Baker and P. S. Inventado, “Educational data mining and learning analytics,” Learning Analytics, pp. 61-75, 2014.
[2] A. Dutt, M. A. Ismail and T. Herawan, “A systematic review on educational data mining,” IEEE Access, vol. 5, pp. 15991-16005, 2017.
[3] A. Peña-Ayala, “Educational data mining: A survey and a data mining-based analysis of recent works,” Expert Systems with Applications, vol. 41, no. 4, pp. 1432-1462, 2014.
[4] H. Alagib, A. Hamza and P. Kommers, “A Review of Educational Data Mining Tools & Techniques,” International Journal of Educational Technology and Learning, vol. 3, no. 1, pp. 17-23, 2018.
[5] O. D. Ayodele, “Class attendance and academic performance of second year university students in an organic chemistry course,” African Journal of Chemical Education, vol. 7, no. 1, pp. 63-75, 2017.
[6] C. Romero, S. Ventura and E. García, “Data mining in course management systems: Moodle case study and tutorial,” Computers & Education, vol. 51, no. 1, pp. 368-384, 2008.
[7] B. K. Baradwaj and S. Pal, “Mining educational data to analyze students' performance,” International Journal of Advanced Computer Science and Applications, vol. 2, no. 6, pp. 63-69, 2011.
[8] L. Talavera and E. Gaudioso, “Mining student data to characterize similar behavior groups in unstructured collaboration spaces,” in Proc. of the 16th European Conference on Artificial Intelligence, pp. 17-23, 2004.
[9] V. Ramesh, P. Thenmozhi and K. Ramar, “Study of influencing factors of academic performance of students: A data mining approach,” International Journal of Scientific & Engineering Research, vol. 3, no. 7, pp. 1-5, 2012.
[10] A. Gandomi and M. Haider, “Beyond the hype: Big data concepts, methods, and analytics,” International Journal of Information Management, vol. 35, no. 2, pp. 137-144, 2015.
[11] B. Daniel, “Big Data and analytics in higher education: Opportunities and challenges,” British Journal of Educational Technology, vol. 46, no. 5, pp. 904-920, 2015.
[12] V. Kellen, “Applying Big Data in higher education: A case study,” Cutter Consortium white paper, vol. 13, no. 8, 2013.
[13] M. Hussain, M. Al-Mourad, S. Mathew and A. Hussein, “Mining educational data for academic accreditation: aligning assessment with outcomes,” Global Journal of Flexible Systems Management, vol. 18, no. 1, pp. 51-60, 2017.
[14] M. Hussain, M. B. Al-Mourad, A. Hussein, S. Mathew and E. Morsy, “A novel approach for analyzing student interaction with educational systems,” in Proc. of Global Engineering Education Conference, pp. 1332-1336, 2017.
[15] M. J. Berry and G. Linoff, Data mining techniques: for marketing, sales, and customer support. John Wiley & Sons, Inc., 1997.
[16] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” in Proc. of the 20th International Conference on Very Large Data Bases, pp. 487-499, 1994.
[17] K. Hornik, “R FAQ,” The Comprehensive R Archive Network. 2.1 What is R?, 2015. Retrieved 2019-01-13, from https://cran.r-project.org/doc/FAQ/R-FAQ.html
