Professional Documents
Culture Documents
net/publication/349028267
CITATION READS
1 171
5 authors, including:
Saifur Rahman
Comilla University
10 PUBLICATIONS 24 CITATIONS
SEE PROFILE
Some of the authors of this publication are also working on these related projects:
All content following this page was uploaded by Partha Chakraborty on 11 February 2021.
Abstract—Machine learning is a sub-field of computer science models teachers can identify low-performance students and
refers to a system’s ability to automatically learn from experience can update their teaching methods to offer additional guidance
and predict new things using the learned knowledge. Different to eligible students. The early prediction may also allow
machine learning techniques can be used to predict the result
of the students in examination using previous data. Machine students to gain a clear understanding of how well or bad
learning models can recognize vulnerable students who are at they will do in a course and then take appropriate steps.
risk and take early action to prevent them from failure. Here, Machine learning’s basic concept is that it can automatically
a model was developed based on the academic performance learn from practical experience. There are several supervised
of the students and their result in the SSC exam. This paper and unsupervised types of approaches to machine learning
also shows a comparative study of different machine learning
techniques for predicting student results. Five different machine that are used to retrieve hidden information and correlations
learning techniques were used to demonstrate the proposed between data, which will ultimately assist decision-makers to
work. They are Naive Bayes, K-nearest Neighbours, Support take proper action in the future. Machine learning techniques
Vector Machine, XG-boost, Multi-layer Perceptron. Data were such as Naive Bayes, SVM, KNN, etc. may be very helpful
preprocessed before fitting into these classifiers. Among the five in predicting the performance of students based on the back-
classifiers, MLP achieved the highest accuracy of 86.25%. Other
classifiers also achieved a satisfactory result as all of them were ground and term exam performances of students.
above 80% accuracy. The results showed the effectiveness of In this work, A dataset was constructed consisting of aca-
machine learning techniques to predict the performance of the demic results of different subjects and their respective GPA in
students. the Secondary School Certificate Examination of 400 students
Index Terms—Machine learning, Result, Prediction from a renowned school1 . Dataset was cleaned and prepro-
cessed before fitting into the models. Five experiments with
I. I NTRODUCTION five different classifiers namely K-Nearest Neighbors, Support
The concept of equal opportunity is a crucial factor that Vector Machine, Naı̈ve Bayes, XG Boost, and Multi-layer
must be taken into consideration when talking about the Perceptron. All of the classifier’s performance was evaluated
development of a nation. In the educational factor, this idea is using different evaluation matrices such as precision, recall,
to guarantee every person has the same options for accessing f1-score, and accuracy.
and completing studies. There are some shortcomings in the
education system in many developed countries like Bangladesh II. R ELATED W ORK
that it is still trying to overcome. Many students participate in A grammar-guided genetic programming algorithm, G3P-
the board exam every year without understanding their overall MI, was implemented by the University of Cordoba to predict
performance in earlier. As a result, student dropout rates create whether the student will fail or pass a certain course [7].
capital wastage for all actors in the education sector and The algorithm has a 74.29% accuracy. A platform that can
also impact the institutions’ assessment processes. Through forecast student performance using machine learning algo-
evaluating student’s previous results of all regular exams using rithms has been created by the Vishwakarma Engineering
machine learning techniques, teachers can anticipate the stu- Research journal [8]. Two criteria were used, attendance of
dents resulting in the board exam. One of the greatest problems students and the related subject marks. A model for predicting
many education institutes face is enhancing the efficiency student performance has been developed by Somiya College
of educational processes in order to increase the success of
students. To meet the expectations using machine learning 1 Feni Model High School, Feni, Bangladesh
5 https://sites.google.com/site/ijcsis/
ISSN 1947-5500
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 19, No. 1, January 2021
TABLE I
F EATURE DESCRIPTION
Mumbai [9]. The team correctly expressed the correlation with XgBoost, and Multi-Layer Perceptron. These classifiers were
a student’s past academic results. With data set growth, neural trained with the training dataset. Finally, each classifier was
network output improvements have been documented. And evaluated with the test dataset by forming some evaluation
their precision hit 70.48 percent. Artificial neural networks matrix.
(ANNs) were used by De Albuquerque et al. [10] to fore-
cast the success of the student in exams. This model used A. Data Description
features such as grades, study periods, and school ratings for The dataset of the student’s performance was collected from
features, and high precision of 85% was obtained. Kotsiantis Feni Model High School in Bangladesh. The performance data
et al. [11] estimated the success of the pupil on final tests contained student’s marks for the different subjects of class
using techniques of machine learning. They used demographic (9-10) school students of the academic year (2013-2014) and
features like sex, age from an e-learning method as inputs. (2016-2017). After eliminating incomplete data, the dataset
They found that the strategy of Naı̈ve Bayes obtained a higher comprised of 400 students in the dataset.
average accuracy (73%) than the alternatives. The Eindhoven The final dataset contained 31 columns. First two columns
University of Technology performed an assessment of the of the dataset are the Student name and his/her respective id.
efficacy of machine learning for dropout student outcome pre- The latter 28 columns are the marks of 7 major and common
diction [12]. The basic approach was to use various machine subjects of science background students. For each student, the
learning approaches such as CART, BayesNet, and Logit, marks of four exams such as half-yearly examination, annual
to construct numerous prediction models. By using the J48 examination, pre-test, and test -examination were included.
classifier, the most effective model was developed. Researchers The final column is each student’s GPA which they achieved
from three separate universities in India undertook a related in the Secondary School Certificate (SSC) Examination.
analysis [13]. A data set of university students was analyzed Student’s GPA is the only Predictive variable of the dataset.
by different algorithms, after which the forecasts’ accuracy Possible values of the Student’s GPA is A+, A, A-, B, C,
and recall values have been compared. The architecture of the D, F follows the SSC Grading System from Intermediate and
ADT decision tree provided the most correct outcomes. This Secondary Education Boards, Bangladesh [17]. Here, A+ is
[14] was done at the University of Minho, Portugal. Using the highest result that can be achieved by a student, and F is
decision trees, random forests, vector support machines, and the lowest. The dataset consisted of 400 data items of students
neural networks it evaluates whether the student had passed the marks in the different subjects throughout the academic year
test in math and Portuguese language subjects were included and their performance in the SSC Exam. In the dataset, there
in the data set. Such techniques were evaluated in terms of were most data from A- category, precisely 93 data items.
accuracy. Another paper [15], predicts student’s success at The lowest was from the D category. 58 students achieved the
the beginning of an academic cycle, based on their academic highest GPA A+ and 28 students failed in the exam.
record. The research was performed on historical data stored
B. Data Preprocessing
within the information system of Masaryk University. The
findings demonstrate that this technique is as successful as To ensure the maximum performance from the models
with machine learning techniques, such as support vector the dataset was preprocessed before fitting into the models.
machines and it came to an accuracy of 85%. Because of the numerical nature of our data, there was not
much to preprocess. Preprocessing was done in two steps.
III. M ETHODOLOGY Normalization and Label Encoding. RobustScaler was used for
The full methodology of the proposed study for the student’s data normalization. Label Encoder was to transform categori-
performance prediction is shown in figure 1. The proposed cal values like A+, A, B, etc into numerical values. Machine
study was designed on supervised machine learing techniques. Learning algorithms work better with numerical values than
The first step was to collect data of students regular exams categorical values.
marks. Then, the dataset was preprocessed by normalizing the
data. After that, the preprocessed dataset was split into train C. Classification Model
and test parts. Five supervised classifiers were used namely, After Preprocessing, the dataset was split into the training
Naive Bayes, K-Nearest Neighbour, Support Vector Machine, dataset and test dataset. The test data were 20% of the dataset.
6 https://sites.google.com/site/ijcsis/
ISSN 1947-5500
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 19, No. 1, January 2021
7 https://sites.google.com/site/ijcsis/
ISSN 1947-5500
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 19, No. 1, January 2021
F1 Score
Class Name NB KNN SVM XGBoost MLP
A+ 0.81 0.87 0.82 0.92 0.83
A 0.78 0.85 0.83 0.86 0.80
A- 0.88 0.84 0.88 0.82 0.89
B 0.76 0.50 0.73 0.76 0.82
C 0.84 0.90 0.83 0.77 0.93
D 0.40 0.67 0.40 0.40 0.67
F 0.93 0.92 0.82 0.89 0.92
8 https://sites.google.com/site/ijcsis/
ISSN 1947-5500
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 19, No. 1, January 2021
TABLE IV
C OMPARISON STUDY WITH PREVIOUS WORK
Fig. 5. ROC curve for SVM Fig. 7. ROC curve for MLP
9 https://sites.google.com/site/ijcsis/
ISSN 1947-5500
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 19, No. 1, January 2021
10 https://sites.google.com/site/ijcsis/
ISSN 1947-5500
View publication stats