
IOP Conference Series: Materials Science and Engineering

PAPER • OPEN ACCESS

A Predictive Analytics Model for Students Grade Prediction by Supervised Machine Learning

To cite this article: Siti Dianah Abdul Bujang et al 2021 IOP Conf. Ser.: Mater. Sci. Eng. 1051 012005


ICATAS-MJJIC 2020 IOP Publishing
IOP Conf. Series: Materials Science and Engineering 1051 (2021) 012005 doi:10.1088/1757-899X/1051/1/012005

A Predictive Analytics Model for Students Grade Prediction by Supervised Machine Learning

Siti Dianah Abdul Bujang 1,2,4*, Ali Selamat 1,4*, Ondrej Krejcar 3*

1 Malaysia-Japan International Institute of Technology, Universiti Teknologi Malaysia
Kuala Lumpur, Jalan Sultan Yahya Petra, Kuala Lumpur, 54100, Malaysia.
2 Media and Games Center of Excellence (MagicX), Universiti Teknologi Malaysia,
Skudai 81310, Johor Bahru, Johor, Malaysia.
3 Faculty of Informatics and Management, University of Hradec Kralove,
Rokitanskeho 62, 50003 Hradec Kralove, Czech Republic.
4 Department of Information and Communication Technology, Polytechnic Sultan Idris
Shah, Sungai Lang, Sungai Air Tawar, Selangor 45100, Malaysia.

*Corresponding author: sdianah84@gmail.com

Abstract. Research on predictive analytics has grown steadily because it can provide valuable
and intuitive feedback that could assist educators in improving student success in higher
education. By leveraging predictive analytics, educators could design effective mechanisms to
improve academic results, prevent student dropout, and assure student retention. Hence, this
paper presents a predictive analytics model using supervised machine learning methods that
predicts a student's final grade (FG) based on their historical academic performance. The work
used a dataset gathered from 489 students of the Information and Communication Technology
Department at a north-western Malaysia Polytechnic over four academic years, from 2016 to
2019. We carried out experiments using Decision Tree (J48), Random Forest (RF), Support
Vector Machines (SVM), and Logistic Regression (LR) to compare the performance of
classification and regression techniques in predicting students' FG. The results show that J48
was the best predictive analytics model, with the highest prediction accuracy rate of 99.6%, and
could contribute to the early detection of student dropout so that educators can sustain
outstanding achievement in higher education.

1. Introduction
One of the crucial tasks of every educational institution is to determine students' academic
performance in a competitive environment and to make the right decisions on further strategy and
actions [1]. In today's world of data science, predictive analytics is a recent frontier in higher
education, as it is in other fields such as banking, marketing, financial services, healthcare, fraud
detection, and population trends. Over the years, predictive analytics has been extensively studied due
to its potential as an early warning system for predicting future academic outcomes by using different
types of student-related data [2,3]. Furthermore, it can go beyond the understanding of how best to
predict what will happen in the future.
The utilization of machine learning in predictive analytics has covered a wide range of areas for
predicting students' performance [4]. Machine learning is a part of artificial intelligence that can learn
from experience without human intervention. It provides various techniques for prediction, which
include supervised and unsupervised learning algorithms such as Logistic Regression (LR), Support
Vector Machine (SVM), Random Forest (RF), Naïve Bayes (NB), Decision Trees (DT) and k-Nearest
Neighbor (kNN) [5]. However, studies on the use of machine learning in predictive analytics for
enhancing students' performance are still lacking in Malaysian higher education [6,7].

Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution
of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Published under licence by IOP Publishing Ltd

As more data on students becomes accessible, higher education institutions need to understand and
use that data to gain insights from the educational environment. In the context of Malaysian
Polytechnics, educators have to review the performance of students assessed through final exams
directly from a large established database at the end of each semester to determine their academic
performance. However, this database lacks the ability to provide analytics, insights, and trends of
student success or failure by grade across different courses. As a result, educators and institutions face
the challenge of monitoring how the level of complexity of a course affects students' grades each
semester. For that reason, it is worthwhile to develop a solution that gives the institution an early grade
prediction, so that progress in a course can be monitored and the students' learning process improved
based on the predicted grades. Therefore, this study aims to develop a predictive analytics model using
supervised machine learning techniques to help higher education institutions predict students' FG
based on their historical academic performance in a course. We applied various techniques (J48, RF,
SVM, and LR) to real data from Malaysia Polytechnic students.
The rest of the paper is organized as follows. Section 2 discusses existing related work on how
machine learning techniques have been applied to student grade prediction. Section 3 describes the
methodology of the proposed predictive model. Section 4 presents the results of the experimental
analysis and a discussion of the identified limitations. Finally, Section 5 concludes the paper and
highlights future directions.

2. Related Works
Predictive analytics is in high demand in higher education institutions as a means of achieving better
academic performance. It can improve the quality of students' academic performance by analyzing
historical data for future improvement. It uses many techniques from data mining, statistics, modeling,
machine learning, and artificial intelligence to analyze past data and address issues such as high
dropout rates and low student retention [8].
Various studies on predictive analytics have used machine learning to predict student academic
performance so that institutions can improve the quality of their decision making [9].
Iqbal et al. [10] applied machine learning techniques to predict students' grades in different courses
using a dataset from the Electrical Engineering Department at Information Technology University (ITU)
in Lahore, Pakistan. Their study indicated that the Restricted Boltzmann Machine (RBM) technique is
suitable for modeling tabular data and showed better results than the other techniques used in
predicting students' performance in a particular course. The investigation in [11] indicated that SVM
performs best on simple data when predicting a student's grade. The efficiency of SVM in training on a
small dataset while producing higher classification accuracy for predicting students' performance is
also supported in [12].
According to [13], the proposed predictive model could prevent student dropout and enhance the
academic performance of Electrical Engineering students based on course grade records at Eastern
Washington University. The authors showed that a model could be trained to predict a student's Grade
Point Average (GPA) at a level of approximately 85% accuracy from the mean by utilizing machine
learning algorithms. More complex research was conducted by Adekitan and Salau [14], who used the
Konstanz Information Miner (KNIME) and regression-based models separately to predict students'
Cumulative Grade Point Average (CGPA) at Covenant University, Nigeria. Their predictive model
indicated that LR achieved a maximum accuracy of 89.15% compared to five other algorithms
(Probabilistic Neural Network (PNN), Decision Tree (DT), RF, Naïve Bayes (NB), and Tree Ensemble),
showing that the final CGPA can be reasonably determined from students' GPA performance in the
first three years of study.


Table 1. Comparison of Related Studies

| Paper | Year | Dataset Size | Data Source | Attribute | Machine Learning Algorithm | Best Model |
|---|---|---|---|---|---|---|
| Iqbal et al. [10] | 2017 | 225 | Undergraduate students of the Electrical Engineering Program from 2013 to 2015 | Grades, GPA | Collaborative Filtering (UBCF), Matrix Factorization (SVD & NMF), RBM | RBM |
| Anderson et al. [11] | 2017 | 683 | Students of Craig School of Business at California State University, Fresno from 2006 to 2015 | Historical grade data from 18 semesters | NB, kNN, SVM | SVM |
| Abu Zohair [12] | 2019 | 50 | Graduated students in master's program | Student ID, Age, B.Sc. Name, B.Sc Grade, Course, Name Course, Grade, Instructors name | MLP-ANN, NB, SVM, kNN, Linear Discriminant Analysis (LDA) | SVM |
| Das and Rodriguez-Marek [13] | 2019 | 227 | Students of Electrical Engineering at Eastern Washington University from 2007 to 2016 | Gender, whether first-generation student, family income, SAT/ACT score, high school GPA, Math I/II/III grades, physics I/II/III grades | kNN, NB, SVM | kNN |
| Adekitan and Salau [14] | 2019 | 1841 | Students of Covenant University in Nigeria from 2002 to 2014 across seven engineering departments | Students GPA for the first three academic years, final CGPA | PNN, RF, DT, NB, Tree Ensemble, LR | LR |
| Abana [15] | 2019 | 133 | Students of Computer Engineering program in 4 years | Research Method (RM) grade, Research Project (RP) grade, gender, backlog, programming proficiency | RT, RepTree and J48 | RT |
| Tsiakmaki et al. [16] | 2018 | 592 | Students of Business Administration Department, Technological Educational Institution Western, Greece from 2013 to 2017 | Final score grade of first-semester course | LR, RF, SVM, DT, M5 Rules, kNN | RF |
| Buenaño-Fernández et al. [17] | 2019 | 335 | Students of Computer Systems Engineering Degree at Ecuador University from semester 2016-1 to 2018-2 | Academic Period, Subject Name, Final Grade, Area, Situation, Semester, Code Subject, Teacher, Section | DT | DT |

Abana [15] developed a classification model using DT algorithms (Random Tree (RT), RepTree, and
J48) to predict students' grades for a research project. The prediction model was evaluated with 133
instances containing five attributes over four years of study. The study concluded that RT is the best
solution, with an accuracy of 75.188%. Nonetheless, the paper suggested using additional samples and
attributes to obtain more accurate predictive results in the future. Tsiakmaki et al. [16] carried out
several experiments using regression tasks and other models (LR, RF, SVM, DT, M5 Rules and kNN)
for predicting students' grades in six courses and two laboratory courses. The results showed that RF
obtained the most satisfactory accuracy, which indicates that early identification of learning
difficulties can trigger proactive actions that could improve the final outcome. In another work, [17]
proposed a methodology using the DT algorithm to monitor and predict students' final grades based on
their historical grades at an Ecuadorian university. However, the authors stated that it is not an easy
task to obtain the best predictive results when students show similar academic patterns. We have
summarized the studies related to student grade prediction in Table 1.

3. Research Methodology
The steps of the proposed predictive analytics model for predicting students' FG are illustrated in
Figure 1. We used the supervised algorithms J48, RF, SVM, and LR to predict the FG of students for a
particular course.

Figure 1. Proposed Predictive Analytics Model of Supervised Machine Learning

3.1 Data Collection


In this study, real data on Diploma in Information Technology (Digital Technology) students were
collected from the Information and Communication Technology Department at a north-western
Malaysia Polytechnic, covering student cohorts from December 2016 to June 2019. The dataset
contains data on 489 students who took the Computer System Architecture (CSA) course. The
attributes in the dataset are described in Table 2.

3.2 Data Pre-processing


We pre-processed the collected data to prepare it for the supervised machine learning algorithms.
First, we removed irrelevant attributes using feature selection in WEKA (Waikato Environment for
Knowledge Analysis, version 3.8.3), the tool used to develop the machine learning models. For feature
selection we used the WrapperSubsetEval technique with the BestFirst search method to pick the
relevant attributes in the dataset. Based on the results, we removed the attributes Class, Cohort,
Gender, CAM and FEM, and kept the other five attributes (Year, TM, FGP, Group and FG) in the final
dataset. To monitor whether students pass or fail a course, we grouped the students' FG into five
classes: 'Extremely Excellent' (A+), 'Excellent' (A), 'Good' (A-, B+, B), 'Pass' (B-, C+, C, C-, D+) and
'Fail' (E, E-, F).
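
The grouping of final grades into five classes can be sketched as a simple lookup. The following is a minimal illustration only: the paper performed this step within its WEKA workflow and gives no code, so the names `GRADE_GROUPS`, `GRADE_TO_GROUP` and `group_for_grade` are hypothetical. Only the grade-to-group assignments are taken from the text above.

```python
# Grade-to-group assignments as listed in Section 3.2 of the paper.
# All identifiers here are hypothetical; the paper gives no code.
GRADE_GROUPS = {
    "Extremely Excellent": ["A+"],
    "Excellent": ["A"],
    "Good": ["A-", "B+", "B"],
    "Pass": ["B-", "C+", "C", "C-", "D+"],
    "Fail": ["E", "E-", "F"],
}

# Invert into a flat lookup table: letter grade -> group name.
GRADE_TO_GROUP = {
    grade: group for group, grades in GRADE_GROUPS.items() for grade in grades
}


def group_for_grade(grade: str) -> str:
    """Return the performance group for a letter grade."""
    key = grade.strip().upper()
    if key not in GRADE_TO_GROUP:
        raise ValueError(f"Unknown grade: {grade!r}")
    return GRADE_TO_GROUP[key]


if __name__ == "__main__":
    for g in ["A+", "B", "C-", "E"]:
        print(g, "->", group_for_grade(g))
```

Keeping the mapping in a single dictionary makes the class boundaries easy to audit against the text and to change if an institution uses a different grading scheme.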


Table 2. Attributes used in the dataset

| Attribute Name | Abbreviation | Description | Type | Possible Values |
|---|---|---|---|---|
| Class | Class | Class assigned for the CSA course | Nominal | DDT1A, DDT1B, DDT1C, DDT1D |
| Year | Year | Year of student intake | Numeric | [2016 - 2019] |
| Cohort | Cohort | Student intake by cohort, which occurs two times a year | Nominal | June, December |
| Gender | Gender | Student's gender | Nominal | Female, Male |
| Continuous Assessment Mark | CAM | Marks obtained from quiz, problem-based task, tutorial and test in class | Numeric | [47 - 96] |
| Final Examination Mark | FEM | Marks obtained by students in the final examination | Numeric | [19 - 90] |
| Total Mark | TM | Marks obtained by the percentage of CAM and FE based on the course curriculum | Numeric | [38 - 91] |
| Final Grade Points | FGP | Grade points obtained from student CGPA | Numeric | [0.67 – 4.00] |
| Group | Group | Group of grade to categorize the student performance | Nominal | Extremely Excellent, Excellent, Good, Pass and Fail |
| Final Grade | FG | Student grade achievement based on students' TM | Nominal | A+, A, A-, B+, B, B-, C+, C, C-, D+, E |

3.3 Design Model and Experimental Process


In this study, four classification and regression models (J48, RF, SVM and LR) were implemented to
predict students' FG. We used WEKA to conduct the experiments with ten-fold cross-validation, in
which the dataset is partitioned in each round into a training set (90%) and a testing set (10%) for
evaluation. The four predictive models were then compared. The accuracy results are presented in
detail in the following section.
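
To make the ten-fold procedure concrete, the sketch below partitions sample indices into ten folds and yields a roughly 90%/10% train/test split per round. It is an illustration in plain Python, not the WEKA implementation used in the study. The stratification by class follows the paper's statement in Section 4 that each fold has the same distribution as the whole dataset. The function names and the class counts in the demo are assumptions made for illustration, with only the total of 489 taken from the paper.

```python
import random
from collections import defaultdict


def stratified_k_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    position = 0  # running counter keeps overall fold sizes within one sample
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for idx in indices:
            folds[position % k].append(idx)
            position += 1
    return folds


def train_test_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) once per fold."""
    folds = stratified_k_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    # Synthetic labels: class counts are invented, only N=489 is from the paper.
    labels = ["Good"] * 300 + ["Pass"] * 150 + ["Fail"] * 39
    for round_no, (train, test) in enumerate(train_test_splits(labels), start=1):
        print(round_no, len(train), len(test))
```

Every sample appears in exactly one test fold, so each student's grade is predicted once by a model that did not see that student during training.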

3.4 Data Visualization


All results of the descriptive analysis and the predictive model are presented using visualization
techniques to show the trends and patterns of students' FG performance. After loading the dataset, the
information indicates that most of the students received grades B+, B and B- in the CSA course from
2016 to 2019. We found that TM is approximately normally distributed, with the majority of the
students scoring between 60 and 80 based on their CAM and FEM performance. Therefore, the average
student's performance in this course is at the level of 'Good' and 'Pass'. Figure 2 summarizes the
overall grade distribution and TM for the entire dataset.


Figure 2. (i) Visualizing the grade distribution for dataset N=489 from 2016 to 2019. (ii) Density
plot and histogram of TM score of CSA course

4. Results and Discussion


In this study, we conducted four experiments using J48, LR, RF and SVM to predict the FG of students
based on their historical academic performance in the CSA course. We compared the predicted FG with
the actual FG to measure the accuracy of our predictions. As shown in Table 3, the J48 model performed
best, with the highest accuracy rate of 99.8% among the algorithms. However, it is interesting to note
that the accuracy rates of LR and RF, at 98.6% and 97.9% respectively, differ only slightly from that of
J48. On the other hand, the SVM accuracy rate is markedly lower, with only 85.9% of the FG predictions
correctly classified against the actual grade. This overall result means that the mistakes made by the
classifiers are not far from the actual grade. Misclassification is more severe for the lower grades
(B-, C+, C, C-, D+, E, E-, F) than for the upper grades (A+, A, A-, B+, B). This implies that students who
received the upper grades, especially A+ and A, are easier to predict than those with the remaining
grades. The accuracy obtained is better than that of the previous studies [11,13,16], which also
predicted students' grades using the same algorithms.

Table 3. Comparison of Algorithms Performance

Model Accuracy (%) MAE RMSE RAE (%)


J48 99.8 0.0009 0.0238 0.6
LR 98.6 0.0026 0.0502 1.7
RF 97.9 0.0040 0.0497 2.7
SVM 85.9 0.1496 0.2649 98.8

We also compared the models using the evaluation metrics Mean Absolute Error (MAE), Root
Mean Squared Error (RMSE) and Relative Absolute Error (RAE). We performed ten-fold
cross-validation in which each fold had the same distribution as the whole dataset. Nine folds were
used for the training process and the remaining fold was used for testing the predictive model. We
have visualized the MAE, RMSE and RAE results and the accuracy rate for each model in Figure 4.
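
For reference, the sketch below shows the standard textbook definitions of the four reported metrics on plain Python lists. It is not the WEKA implementation, and the paper does not detail how WEKA applies the error metrics to nominal class predictions, so the numeric inputs here are an assumption. RAE is taken as the total absolute error relative to that of a baseline that always predicts the mean of the actual values.

```python
import math


def accuracy(actual, predicted):
    """Percentage of predictions that exactly match the actual labels."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return 100.0 * correct / len(actual)


def mae(actual, predicted):
    """Mean Absolute Error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the mean squared difference."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def rae(actual, predicted):
    """Relative Absolute Error (%) against a predict-the-mean baseline."""
    mean_actual = sum(actual) / len(actual)
    baseline = sum(abs(a - mean_actual) for a in actual)
    if baseline == 0:
        raise ValueError("RAE is undefined when all actual values are equal")
    error = sum(abs(a - p) for a, p in zip(actual, predicted))
    return 100.0 * error / baseline


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    actual = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.0, 2.0, 3.0, 5.0]
    print(mae(actual, predicted), rmse(actual, predicted), rae(actual, predicted))
    print(accuracy(["A", "B", "C", "D"], ["A", "B", "C", "E"]))
```

An RAE close to 100% means a model's absolute error is about the same as that of the naive baseline, which helps when reading the SVM row of Table 3 (98.8%) against the other models.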


Figure 4. (i) Visualizing the evaluation of grade prediction models by MAE, RMSE, and RAE score
rate. (ii) Accuracy score rate of grade prediction models.

5. Conclusion and Future Directions


This study developed a predictive analytics model using the supervised machine learning techniques
J48, LR, RF and SVM to predict the FG of students in the CSA course at a Malaysia Polytechnic. The
reported findings show that J48 is a promising technique for predicting student FG, which gives the
institution a sound basis for improving student performance in the course. The model can also
be used by course coordinators and educators as an early warning system to identify students at risk of
failing, so that they can take strategic decisions to improve student performance. Thus, the early
prediction of student performance can prompt educators to track potential dropouts in a particular
course at an early stage. There are limitations and challenges in obtaining student performance
datasets due to data confidentiality, and we selected only one course for the dataset used to train and
test the prediction accuracy. Nevertheless, with these findings, we seek to enhance the potential of
predictive analytics research for higher education in Malaysia and to have a positive impact on the
educational environment. In the future, this study will expand the dataset by adding data from more
courses and will use the predictive results to improve student performance.

Acknowledgements
This work is partially supported by the SPEV project "Smart Solutions in Ubiquitous Computing
Environments", 2020, University of Hradec Kralove, Faculty of Informatics and Management, Czech
Republic (under ID: UHK-FIM-SPEV-2020-2102) and by the Research University Grant Vot-20H04 at
Universiti Teknologi Malaysia (UTM), Malaysia Research University Network (MRUN) Vot 4L876.
We would also like to thank Sebastien Mambou and Ayca Kirimtat, Ph.D. students at FIM UHK, for
their consultation, and Polytechnic Sultan Idris Shah, especially the Information and Communication
Technology Department, for providing the data for this research.


References

[1] Solomon, D. (2018) ‘Predicting Performance and Potential Difficulties of University Student
using Classification: Survey Paper’, International Journal of Pure and Applied Mathematics,
118(18), pp. 2703–2707.
[2] Liz-Domínguez, M. et al. (2019) ‘Systematic literature review of predictive analysis tools in
higher education’, Applied Sciences (Switzerland), 9 (24).
[3] Cui, Y. et al. (2019) ‘Predictive analytic models of student success in higher education: A review
of methodology’, Information and Learning Science, 120(3–4), pp. 208–227.
[4] Altabrawee, H., Ali, O. A. J. and Ajmi, S. Q. (2019) 'Predicting Students' Performance Using
Machine Learning Techniques', J. Univ. Babylon Pure Appl. Sci., 27(1), pp. 194–205.
[5] Mduma, N., Kalegele, K. and Machuve, D. (2019) 'A survey of machine learning approaches and
techniques for student dropout prediction', Data Sci J., 18(1), pp. 1–10.
[6] Shahiri, A. M., Husain, W. and Rashid, N. A. (2015) 'A Review on Predicting Student's
Performance Using Data Mining Techniques', Procedia Comput. Sci., 72, pp. 414–422. doi:
10.1016/j.procs.2015.12.157.
[7] Yunus, M., Basheer, I., Mutalib, S., Hamimah, N. and Hamid, A. (2019) 'Predictive analytics of
university student intake using supervised methods', 8(4), pp. 367–374. doi:
10.11591/ijai.v8.i4.pp367-374.
[8] Mohamad, N., Ahmad, N. B. and Jawawi, D. N. A. (2018) 'Malaysia MOOC: Improving Low
Student Retention with Predictive Analytics', Int. J. Eng. Technol., 7(2.29), p. 398.
[9] Asiah, M. et al. (2019) ‘A Review on Predictive Modeling Technique for Student Academic
Performance Monitoring’, MATEC Web of Conferences, 255, p. 03004.
[10] Iqbal, Z., Qadir, J., Mian, A.N. and Kamiran, F. (2017) ‘Machine Learning Based Student Grade
Prediction: A Case Study’, pp. 1–22.
[11] Anderson, T. and Anderson, R. (2017). ‘Applications of Machine Learning To Student Grade
Prediction in Quantitative Business Courses’, Global Journal of Business Pedagogy, 1(3), pp.
13–22.
[12] Abu Zohair, L. M. (2019) ‘Prediction of Student’s performance by modelling small dataset size’,
International Journal of Educational Technology in Higher Education. 16 (1).
[13] Das, A. K. and Rodriguez-Marek, E. (2019) ‘A predictive analytics system for forecasting
student academic performance: Insights from a pilot project at eastern Washington university’,
2019 Joint 8th International Conference on Informatics, Electronics and Vision, (ICIEV) & 3rd
International Conference on Imaging, Vision and Pattern Recognition, (IVPR), IEEE. pp. 255–
262.
[14] Adekitan, A. I. and Salau, O. (2019) ‘The impact of engineering students’ performance in the
first three years on their graduation result using educational data mining’, Heliyon, 5,
e01250.
[15] Abana, E. C. (2019) ‘A decision tree approach for predicting student grades in Research Project
using Weka’, International Journal of Advanced Computer Science and Applications, 10(7), pp.
285–289.
[16] Tsiakmaki, M. et al. (2019) ‘Predicting university students’ grades based on previous academic
achievements’, 2018 9th International Conference on Information, Intelligence, Systems and
Applications, IISA 2018. IEEE
[17] Buenaño-Fernández, D., Gil, D. and Luján-Mora, S. (2019) ‘Application of machine learning in
predicting performance for computer engineering students: A case study’, Sustainability
(Switzerland), 11(10), pp. 1–18.
