

Machine Learning Based Predicting Student Academic Success

Article · October 2020


DOI: 10.1109/ICUMT51630.2020.9222435



2020 12th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT)

Machine Learning Based Predicting Student Academic Success

Khalfan Al Mayahi
Department of Mathematical and Physical Sciences
University of Nizwa
Nizwa, Sultanate of Oman
12839392@uofn.edu.om

Dr. Mahmood Al-Bahri
Department of Mathematical and Physical Sciences
University of Nizwa
Nizwa, Sultanate of Oman
m.albahri@unizwa.edu.om

Abstract— Today, institutions and companies are accelerating the use of AI technologies in their businesses to achieve a clear vision and quality results. Education is one of the sectors where AI can be applied, because it generates large volumes of data. In this work we created a machine learning model to predict a student's educational performance. The model relies on the student's previous data and performance in the last stage of school. The model showed a high accuracy rate, which makes it suitable for adoption.

Keywords— Machine Learning, Education, Algorithm, Predicting, Accuracy.

I. INTRODUCTION

Education is an area in which a large amount of data is produced and accumulated. The traditional educational process involves thousands of hours spent in school and on various tasks at home over the years. This interaction of students with teaching materials generates a lot of information. Education management systems and online educational platforms collect data on student interaction with the online system, for example students' progress and results in completing assignments and exercises, and their involvement in group projects and discussions. Over the years, a university accumulates data about its applicants: their gender, age, and grades upon graduation from school in various subjects. Later, data is collected about the same people as students: their attendance, grades in various subjects, success in scientific activity, which types of assignments they handled better or worse, and which teachers taught each course.

Analyzing this information correctly can help to provide a more complete picture of the learning process. It can also reveal useful and possibly non-obvious connections: how the level of initial training affects academic performance in a particular subject, whether success in mastering a discipline depends on gender, attendance or teacher, and which teachers' students show the best results.

Machine learning techniques can predict the outcome of a situation based on historical data. This contrasts with traditional measures of student performance, such as grades and accumulated points, which only measure the student's final result. Applying machine learning techniques can help educators and researchers gain valuable insights into how to improve and personalize learning, make predictions and recommendations, and drive change in real time when it makes sense and is needed.

The purpose of this work is to show the possibilities of using machine learning in education. It includes a review of existing experience, as well as the development of a model for predicting a student's success in an exam based on his previous academic success.

II. MACHINE LEARNING AND EDUCATION

A. Machine Learning

The term "machine learning" was coined by the pioneer of computer games and artificial intelligence, Arthur Samuel, in 1959. His checkers program, which demonstrated the ability to self-learn based on its previous experience, refuted the judgment that computers can only execute algorithms strictly specified for them. Arthur Samuel defined machine learning as "methods that enable computers to learn without being explicitly programmed" [1, 10]. A more formal definition of machine learning was given by the American computer scientist Tom Mitchell: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E". Thus, machine learning is a subset of artificial intelligence. Machine learning stands at the intersection of disciplines such as mathematics, statistics, probability theory, graph theory, and the study of algorithms that can learn independently from experience [2, 9].

B. Applying machine learning in education

Although the use of machine learning methods is already established and generally accepted in many areas, in education this technology has not yet found wide application. Education is one of the industries in which new data is constantly being generated. Data is accumulated both in traditional offline education institutions, such as schools, colleges and higher education institutions, and in online education systems [5]. Figure 1 shows an automated machine learning application in the education sector, used for identifying student candidates in higher education [11].

978-1-7281-9281-9/20/$31.00 ©2020 IEEE 264

Authorized licensed use limited to: Carleton University. Downloaded on November 03,2020 at 13:56:47 UTC from IEEE Xplore. Restrictions apply.
Fig. 1. AWS Machine Learning Diagram [11]

Machine learning methods are mostly used by educational platforms. These methods make it possible to automate the collection, storage and analysis of data. The first studies in this area were carried out by the professor of mathematics Sotiris Kotsiantis in 2003 [3]. In his article "Use of machine learning techniques for educational proposes: a decision support system for forecasting students' grades", Kotsiantis writes that the use of machine learning in educational practice is a promising and developing area. The author notes that a huge amount of data about a student is accumulated in the distance learning process.

In paper [4], Kotsiantis describes how he used existing regression analysis methods to predict student grades in distance learning. He compares some of the modern regression algorithms. The aim is to find out which algorithm is more appropriate not only for accurately predicting academic performance, but also for use as an analytical tool for decision support for teachers. With information in front of them about current and projected student performance, educators can minimize student failures by supporting students and providing additional learning materials. Five different machine learning approaches were tested in order to build an algorithm that can most accurately predict the future performance of students: decision trees, neural networks, Bayesian networks, logistic regression, and support vector machines. All these models are described in detail by Kotsiantis in his article.

Knewton is one of the first companies to actively apply big data analysis technologies in education. The company sees its mission in personalizing learning around the world [6].

Among the mainstream schools and universities that collect data about their students but hardly use it for analysis, there are separate institutions that build the entire educational process around technology. AltSchool is an example of such an institution [7].

III. MACHINE LEARNING BASED PREDICTION MODEL

In this work a model based on machine learning was developed. The task of this model is to predict whether a student will pass a specific course based on his previous academic success data. Such a prediction should be carried out some time before the exam, for example 2-4 months. The reason is that the prediction can then be acted upon: students have time to put more effort into mastering the discipline, and the teacher has the opportunity to pay more attention to these students. In machine learning terms, this is a supervised learning task, namely a classification task.

C. Data collection

To construct this model, student data from the University of Nizwa, College of Arts and Sciences, Department of Mathematical and Physical Sciences, Computer Science section, were used. The data for university applicants and first-year students were taken from the academic years 2017/2017 and 2018/2019. First, the data of applicants were collected, for which the following characteristics are known: first name, last name and ID of the applicant, and the student's grades upon graduation from school in Mathematics, Arabic, Social Studies and English. After that, data on student performance during the first year were taken: student ratings for the first and second semesters. The University of Nizwa uses a 100-point grading system, with the scale for converting a numerical mark into a letter grade presented in Table I:

TABLE I. TRANSFER OF MARKS

Mark (out of 100)    Grade
90-100               A
77-89                B
67-76                C
60-66                D
0-60                 F

For each student, the rating contained his name, place in the rating, group, exam grade for each subject of the semester on a 100-point scale, average grade, minimum grade, and the presence of unsatisfactory grades.

The data for students of the educational programs "Computer Science" and "Physics" were analyzed together, as:

- the same entrance exams are accepted for both directions (Mathematics, Arabic, Social Studies and English)

- first-year programs in basic subjects are the same (academic disciplines, topics to be mastered, and hours allocated for lecture, seminar and independent work coincide).
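
The mark-to-grade conversion of Table I can be written as a small function. The following is a minimal Python sketch and not code from the study: the name `letter_grade` is hypothetical, and because Table I lists a mark of 60 under both D and F, the sketch assumes that exactly 60 maps to D.

```python
def letter_grade(mark: float) -> str:
    """Map a mark on the 0-100 scale to a letter grade, following Table I.

    Assumption: Table I lists 60 in both the D and F ranges;
    this sketch treats a mark of exactly 60 as D.
    """
    if not 0 <= mark <= 100:
        raise ValueError(f"mark must be between 0 and 100, got {mark}")
    if mark >= 90:
        return "A"
    if mark >= 77:
        return "B"
    if mark >= 67:
        return "C"
    if mark >= 60:
        return "D"
    return "F"


if __name__ == "__main__":
    # Print the grade for a few sample marks.
    for mark in (95, 80, 70, 62, 40):
        print(mark, letter_grade(mark))
```

Checking the thresholds from highest to lowest keeps each branch to a single comparison and makes the assumed boundary at 60 easy to change.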

All data were collected with the help of the Computer Science section staff and students.

D. Primary data processing and analysis

First, the data of applicants and the ratings of students from both programs were combined into single lists. Then the student records were matched to the applicant records by exact match of the full name. That is, for each student in the second-semester rating (before retakes), his applicant data were attached. The matching was done in Microsoft Excel using the VLOOKUP function. In total, the sample included 550 students.

In the next step, the subjects in which students received the most failing grades were identified. In the first semester these were Math, Computer Skills and Physics. Courses with many unsatisfactory grades were chosen for two reasons: these disciplines are the most difficult, and with a high percentage of failures the problem of unbalanced sampling disappears. Fig. 2 shows the percentage distribution of grades in Math, Computer Skills and Physics in the first semester. The highest percentage of F grades in the first semester was in Math, at 15%. The next subject was Physics, with 14%. The share of failures in Computer Skills was 5%.

[Bar chart: DISTRIBUTION OF GRADES, in %, for Math, Computer Skills and Physics over grades A, B, C, D, F]

Fig. 2. Distribution of grades in three courses

The next step was to identify which variables should be used to predict the grade in mathematical analysis. Since applicant data, such as grades upon graduation from school, are available for each student from both study programs and enrollment years, they were included in the sample. Thus, the final dataset contained records of 550 students with the fields presented in Table II:

TABLE II. FINAL DATASET FIELDS

Field                  Description
Math_Sch               Grade upon graduation from school
Arabic_Sch             Grade upon graduation from school
Social Studies_sch     Grade upon graduation from school
English_sch            Grade upon graduation from school
Math_Uni               Grade for Math in 1st semester, 0 to 100
Computer Skills_Uni    Grade for Computer Skills in 1st semester, 0 to 100
Physic_Uni             Grade for Physics in 1st semester, 0 to 100
Average mark_Uni       Average mark in all subjects for the 1st semester
Math_Ana2              Grade for mathematical analysis in the 2nd semester, 0 to 100

E. Building the model

To build a classification model, the resulting dataset was loaded and processed using the pandas module. For machine learning, the dataset must be represented as a matrix in which each row corresponds to the record of one student. Variables X1, X2, …, Xn are the characteristics, and variable Y is the target, that is, the one that needs to be predicted.

In this model, the student's characteristics are his grades upon graduation from school and his university grades. The target variable is the second-semester math grade. Accordingly, the descriptive characteristics were selected into array X and the target variable was written to y, as shown in the following part of the code:

# file is the pandas DataFrame holding the dataset of Table II
X = file[['Math_Sch', 'Math_Uni', 'Computer Skills_Uni', 'Physic_Uni', 'Average mark_Uni']]
y = file['Math_Ana2']

Then the sample was pseudo-randomly divided into two parts: a training sample (70%) and a test sample (30%). For the training set, both the X variables and the target Y variable are known. The algorithm uses the training sample to determine how the input characteristics affect the value of the target variable. In the test sample, the algorithm knows only the characteristics of each student. Having been trained on 70% of the data, the algorithm predicts the value of the target variable. The accuracy of this prediction can then be checked, since the "correct answer" is known for these 30% of the sample. The pseudo-random division means that the division occurs randomly but is fixed, so that when comparing different machine learning algorithms it is possible to track which algorithm shows the best prediction accuracy on the same data.

Predicting a student's grade to within a point cannot be very accurate. If the algorithm predicted that the student will get an 85 in the exam, and he got an 82 or 88, this cannot be called a serious error. In addition, predicting an exact grade does not make much sense. It is not important to know in advance whether the student will receive 80 or 90, 60 or 70; it is important to divide students into risk groups: who is most likely to pass and who is not. For example, there is practically no difference between grades 80 and 90, since in both cases the student passed, and likewise little difference between grades 55 and 62, since in both cases the student did not pass. However, there is an important distinction between grades 65 and 80, as they

separate pass and fail. Therefore, it was decided to introduce three classes as follows:

TABLE III. DISTRIBUTION OF GRADES BY CLASSIFICATION

Predicted score    Classification    Classification description
A, B               2                 The student will pass
C                  1                 Suspicion
F, D               0                 The student will not pass

If the algorithm predicts a student's grade of 67 or higher, he falls into the group of those who are most likely to pass. If a student's predicted grade is below 67, then he is in the group of those who have a high risk of not passing the exam. However, if the student's predicted grade is 4, then he falls into the suspicion group, since an error of one point will cause a classification error. At the same time, it is less harmful to place a student in the risk group when he goes on to pass the exam than to predict that he will most likely pass and be mistaken. Also, the human factor cannot be excluded, since the exam mark largely depends on the student. A student with a low level of initial knowledge could prepare well at the last moment or cheat. A well-prepared student could get anxious in an exam and forget basic things.

A number of machine learning models were tested, including various implementations of linear regression, support vector machines, and a naive Bayes classifier. Two models showed the highest accuracy:

• SVC - Support Vector Classification, based on the support vector machine.

• Elastic Net - a linear model regularized with a combination of L1 and L2 penalties.

The next section presents the accuracy of the models in detail.

IV. MODEL ACCURACY

A. SVC Model Accuracy

For the model built using the support vector machine, the following indicators were obtained. The total sample size is 550 students and the test sample was 30%, which means that predictions were made for 165 students. 24% of students were classified in the uncertainty class, for 65% it was predicted that they would pass the exam, and for 11% that they would not pass.

Thus, out of 165 students, prediction is impossible for 38, 113 fell into the class of those who will pass the exam, and for 14 failure was predicted.

For the classes "pass" and "not pass" the error matrix looks like this:

True Positive = 110, False Positive = 3
False Negative = 12, True Negative = 2

Thus, the metrics of the classifier have the following values:

\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} = 88\% \]

Accuracy is the percentage of correctly identified objects among all predictions. However, this metric is not always informative.

\[ \mathrm{Precision} = \frac{TP}{TP + FP} = 97\% \]

Precision is the proportion of objects that truly belong to the positive class among all objects classified as positive.

\[ \mathrm{Recall} = \frac{TP}{TP + FN} = 90\% \]

Recall shows what share of the objects of the positive class the algorithm found among all objects of the positive class [8].

B. Elastic Net Model Accuracy

In this model, out of the 165 students for whom predictions were made, prediction was impossible for 26 (16%) and failure was predicted for 18; the remaining 121 fell into the class of those who will pass the exam.

For the classes "pass" and "not pass" the error matrix looks like this:

True Positive = 119, False Positive = 2
False Negative = 15, True Negative = 3

Thus, the metrics of the classifier have the following values:

Accuracy = 87%, Precision = 98%, Recall = 88%

Thus, this model turns out to be preferable, since the proportion of students for whom the system could not predict exam success was significantly reduced, while the Accuracy, Precision and Recall metrics remain high.

It was also found that student grades upon graduation from school in Mathematics, Arabic, Social Studies and English do not add accuracy to the predictive model. The target variable is influenced by characteristics such as the student's first-semester grades in mathematics, computer skills and physics, and the average score for the entire first semester. These characteristics were included in the final models.

V. CONCLUSION

The result of this work is the developed machine learning model. This model makes it possible to predict the success of a student at the end of the first year, based on data on his school success and grades for the first semester of university studies. The implemented model predicted whether the student would pass or fail the exam. The model showed an accuracy of 87%.

REFERENCES

[1] Samuel A.L. Some studies in machine learning using the game of checkers // IBM J. Res. Dev. 1959. Vol. 3. No. 3. P. 210–229.

[2] Mitchell T. Machine learning // McGraw-Hill Science/Engineering/Math, 1997. 432 p.
[3] Kotsiantis, Sotiris B. "Use of machine learning techniques for educational proposes: a decision support system for forecasting students' grades." Artificial Intelligence Review 37.4 (2012): 331-344.
[4] Kotsiantis, Sotiris B., C. J. Pierrakeas, and Panayiotis E. Pintelas. "Preventing student dropout in distance learning using machine learning techniques." International conference on knowledge-based and intelligent information and engineering systems. Springer, Berlin, Heidelberg, 2003.
[5] Lai C.-C. An empirical study of three machine learning methods for spam filtering // Knowledge-Based Syst. 2007. Vol. 20. No. 3. P. 249–254.
[6] URL: https://newtonew.com/tech/knewton-adaptivnoe-obuchenie-v-dejstvii.
[7] The Future of Big Data and Analytics in K-12 Education / Education Week. URL: http://www.edweek.org/ew/articles/2016/01/13/the-future-of-big-data-and-analytics.html.
[8] Powers D.M.W. Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation // J. Mach. Learn. Technol. 2011. Vol. 2. No. 1. P. 37–63.
[9] Russell S.J. Artificial Intelligence. A Modern Approach., 1995. P. 106-10.
[10] Dietterich T. Introduction to Machine Learning // Second Edition Adaptive Computation and Machine Learning, 2010.
[11] Identifying Student Candidates in Higher Education using Machine Learning. URL: https://www.clearscale.com/blog/higher-education-machine-learning/
