Article in Computers & Electrical Engineering, February 2021
DOI: 10.1016/j.compeleceng.2020.106903


PREPRINT
Enhancing Prediction of Student Success: Automated Machine Learning Approach

Hassan Zeineddine, PhD*, Udo Braendle, PhD*, Assaad Farah, PhD*

*AUD/IBM Center of Excellence for Smarter Logistics, American University in Dubai, Dubai, PO Box 28282, UAE

Abstract

Students’ success has recently become a primary strategic objective for most institutions of higher education. With

budget cuts and increasing operational costs, academic institutions are paying more attention to sustaining students’

enrollment in their programs without compromising rigor and quality of education. With the scientific advancements

in Big Data Analytics and Machine Learning, universities are increasingly relying on data to predict students’

performance. Many initiatives and research projects have addressed the use of students’ behavioral and academic data to

classify students and predict their future performance using advanced statistics and Machine Learning. To allow for

early intervention, this paper proposes the use of Automated Machine Learning to enhance the accuracy of predicting

student performance using data available prior to the start of the academic program.

Key words: Automated Machine Learning, Prediction Accuracy, Student Performance, Pre-Admission Data,

Ensemble Model, Higher Education.

1. Introduction

Student retention is a pressing issue for academic institutions around the globe, given tight budgets and limited

resources [1]. The average dropout rate in Organization for Economic Co-operation and Development (OECD)

countries is around 45% [2]. Accordingly, higher education establishments are creating and setting intervention

strategies to remedy this problem. Researchers and practitioners agree that such strategies are most effective if applied

in a student’s first year of study. Hence, a lot of focus has been placed on predicting, as early as possible, vulnerable

students who are prone to drop their courses [2, 3].

Recently, predictive analysis has relied on Machine Learning to support business decision-making. Applications

in finance, operations and risk management are good attestations of the relevance of Machine Learning research in

various business functions. Evermann et al., for example, used machine learning to predict business process

performance [4], and Carneiro et al. to spot credit-card fraud [5].

More and more, Machine Learning is used in the field of higher education management. Specifically, there has

been an increased interest in adopting Machine Learning to predict student performance and identify students at risk

based on initial data gathered during their years of study, as surveyed in the work of Miguéis et al. [6]. Less work has addressed the prediction of student performance using data available prior to students starting their academic journey [2, 3].

Given the complexity of choosing an optimal prediction model for a given dataset from a wide pool of predictive

methods and different hyper-parameter values per model, the automation of this process can help increase the

prediction accuracy [7, 8, 9]. In relation to that, Automated Machine Learning (AutoML) is a technique meant to

derive the best classification model and corresponding hyper-parameters for a given decision-making problem. This

technique can add value if used in predicting student performance. Yet, the review of the literature in this area shows

a lack of empirical work using AutoML. Our research work relies on AutoML to help increase the accuracy of

predicting student performance using data available upon entering an academic program.

The rest of the paper is organized as follows. Section 2 will discuss the Theoretical Background. Section 3 presents

the Methodology and Section 4 highlights the results. Section 5 concludes.

2. Theoretical Background

The topic of predicting student performance in academic institutions has attracted the attention of researchers and

academic administrators for the past two decades [10]. The literature is mainly focusing on two fronts: identifying the

most critical attributes for predicting student performance and finding the best prediction method for enhancing the

prediction accuracy [2, 6, 10, 11, 12, 13, 14].

In relation to identifying critical attributes, several factors may affect a student’s performance such as social and

economic standing, psychological elements, demographics, school systems, and social networks [15]. Reviews of the

common attributes used in predicting student performance discussed several factors and categorized them as either

internal or external [16]. Attributes such as assignment marks, quizzes, class tests and attendance are classified as

internal assessment [17]. Several papers have also used cumulative grade point average (CGPA) as their main internal

attributes to assess student performance [16]. In terms of external assessment, one needs to mention student

demographics such as gender, age, family background, special needs, etc. [18]. Other popular external attributes are

socio-demographic characteristics, extra-curricular activities, high school background and social interaction network

[18, 19, 20]. Several researchers have also used psychometric factors such as personal interest, study habits, and family

support [21, 22].

Several machine learning methods have been used in the literature to predict student performance, mainly Logistic

Regression, Decision Tree, Artificial Neural Network, Naive Bayes, K-Nearest Neighbor, Support Vector Machines,

and different Ensemble methods. The next paragraphs discuss the use of these methods in predicting student

performance and focus on the prediction accuracy.

2.1. Logistic Regression

Regression methods for predicting student performance use a finite set of relationships among the dependent and

independent variables, generating a predictive function that models these associations [12, 18, 19, 23]. The logistic

regression method for predicting student performance is normally used to describe the associations between a

number of independent variables that could be categorized as binary, categorical and continuous [2, 13, 21, 24]. The

level of prediction accuracy using the logistic regression is around 70% using variables such as career aspirations,

CGPA, psychological scores, and personal interests.

2.2. Decision Tree

Many researchers have used the Decision Tree prediction method for its clarity and ease in exposing small and

large data sets and forecasting the value [6, 18, 21, 23, 25]. The logic when applying decision tree techniques is

equivalent to a series of IF-THEN statements, which can help in simplifying the understanding of this method. There

are several papers that have used this method to predict student performance using key indicators such as student

grades in specific courses and current CGPA [13, 22, 26]. The accuracy of prediction using this method while relying

on data prior to students starting an academic program is around 70% [2], and reaches 90% when using data gathered

after joining the program [16].

2.3. Artificial Neural Network

An Artificial Neural Network (ANN) can detect all existing interactions among independent variables. It has been

widely used as a method in educational data mining. The ANN’s ability to detect with high confidence complex

associations between independent and dependent variables makes it a powerful tool in predicting student performance

[12, 13, 23, 24, 25, 26]. The most common variables used in forecasting student performance using neural networks are student attitude towards learning, admission data, CGPA, and grades in specific courses. This technique led to up to 98% accuracy in predicting student performance using data gathered after students join an institution, and had an accuracy of around 70% using data prior to students starting their academic journey [2, 16].

2.4. Naive Bayes

Naïve Bayes is another method used to predict student performance. It uses all attributes existing in the data and

makes comparisons among independent variables to show the significance and effect of each of these predictors. The

papers that used this method predominantly considered variables such as grades, scholarships, CGPA, high school

background, demographics, social network data and internal assessments. Research using Naïve Bayes relied mostly

on data gathered after students had started their academic journey [6, 13, 21, 23, 24], with a minimum accuracy of

50% and a maximum of 76% [16].

2.5. K-Nearest Neighbors

The K-Nearest Neighbors is a simple algorithm that classifies a data point based on the prevalent class of its K nearest neighbors. The data in this technique encompasses a number of multivariate attributes that are used for

classification. The K-Nearest Neighbors method is quick in predicting student performance in terms of level of

learning (slow, medium, good and excellent learner) [13, 21, 23]. Its accuracy rate was slightly above 60% when using

psychomotor factors, and reached 83% when using data extracted from internal assessments, CGPA, and extra-curricular activities [16].

2.6. Support Vector Machine

Support Vector Machine (SVM) is a supervised learning method that classifies data points by segregating them

using an N-dimensional hyperplane, where N is the number of attributes characterizing a data point. This method has

helped researchers in predicting student performance when working with small samples [6, 12, 13, 21, 23, 25]. The

SVM also proves to be effective when dealing with overlapped data. Earlier research used CGPA, extra-curricular

activities, psychomotor tests and internal assessments in predicting student performance [19] and reached an accuracy

of around 80% [16].

2.7. Ensemble of Methods

There is a general consensus that combining prediction methods produces more accurate and more robust

prediction results [27]. The collective decision of all methods is the result of a probabilistic averaging or a voting

scheme. To ensure an increase in accuracy over individual methods, the methods in an ensemble should have a fair

level of uncorrelated errors [28]. In other words, each constituent method should yield better accuracy than the other

methods in the set if applied individually on a different segment of the data space. In addition, none of the methods

will be able to yield optimal accuracy if applied on the entire data space. Several papers addressed the topic of

predicting student performance using the Ensemble Method [2, 3, 6, 11, 24]. Specifically, those papers relied on

Random Forest, Boosted Trees, Bagged Trees, and Information Fusion. Delen [11] reported 82% accuracy in predicting students’ performance within their first year of studies using the Information Fusion approach. Miguéis et al. [6] reported

95% accuracy in predicting students’ performance within their first year of studies using Boosted Trees, relying on

earned grades and completed credits. Hoffait and Schyns’ work [2] was distinguished with their use of Random Forest

based on data gathered prior to admission. They extended different ensemble models with a special algorithm to

increase their prediction accuracy. The algorithm aims at identifying a subset of students who are most likely to fail,

out of the general set of students who are predicted to fail. It ensures that the prediction accuracy rate, using the

identified subset, should be equal to a confidence level defined by the decision maker. Applied on Random Forest, the algorithm identified 21.2% of students from the set of those who were facing a high risk of

failure, with a confidence of 91%. However, when considering the entire set of students, the authors reported close to

70% accuracy for predicting Fail, and close to 59% accuracy for predicting Pass.

2.8. Automated Machine Learning

After reviewing the widely used prediction methods, it is important to re-emphasize the value of automation for

choosing an optimal prediction model, given the complexity of such a task. Various AutoML applications have

recently been described in the literature [7, 8, 9]. The study of Tuggener et al. [9] confirms the superiority of auto-generated machine learning models over human-designed models. Luo et al. highlight the cost of building and

generalizing Machine Learning models that often requires hundreds of manual iterations to identify a suitable

prediction model and corresponding hyper-parameters, and encourage medical researchers to adopt AutoML for cost

efficiency. Salvador et al. [7] conducted an experimental analysis examining the search space of 812 billion possible

combinations of methods and categorical hyper-parameters, for 21 publicly available data sets, and 7 data sets from

real chemical production processes. Relying on their results, they encouraged practitioners to use AutoML on a broad

variety of classification problems. Stadelmann et al. reported practical use of AutoML in analyzing house- and client-related data at PricewaterhouseCoopers [8].

In light of the reviewed literature, there is an evident need to use AutoML in an attempt to improve the accuracy

of predicting student performance. Particularly, such a need is prominent when predicting students’ performance based

on data prior to starting their first academic year, where the accuracy level is around 70%. Increasing the accuracy of

prediction, based on data available from day one, is not only of high value for researchers but also for practitioners

focusing on student success and retention.

This paper relies on an automatic search algorithm in machine learning to identify the optimal model to predict

student success at the start of their first year in a university – using data available prior to starting a new program. This

can help in an early intervention approach to mitigate their risk of failure.

3. Methodology

In this study, we rely on AutoML to derive the best classification model and corresponding hyper-parameters.

Amongst the most popular tools that offer AutoML features are Auto-Weka [28] and Auto-sklearn [29]. We chose to

run the Auto-Weka search algorithm with the hyper-parameter optimization option. Figure 1 represents the automated

machine learning process that looped through the list of predictive methods and corresponding hyper-parameter values

to identify the model with the best accuracy. The search algorithm concluded with an Ensemble Model of multiple

methods that yielded the best classification accuracy out of all the auto-tested combinations of prediction methods and

corresponding hyper-parameters. The prediction mechanism of the identified Ensemble Model is based on a voting

scheme that adopts the prediction outcome resulting from the majority of the constituent methods. The constituent

methods of the ensemble are:

▪ Artificial Neural Network

▪ K-Nearest Neighbors

▪ K-Means Clustering

▪ Naïve Bayes

▪ Support Vector Machine

▪ Logistic Regression

▪ Decision Tree
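The voting mechanism of such an ensemble can be sketched in a few lines of Python (an illustrative sketch only; the toy per-method predictions below are placeholders, not outputs of the study’s trained models):

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common label among the constituent methods' predictions."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical per-method predictions for one student (PASS/FAIL labels).
per_method = {
    "ann": "FAIL", "knn": "FAIL", "kmeans": "PASS", "naive_bayes": "FAIL",
    "svm": "PASS", "logistic": "FAIL", "decision_tree": "FAIL",
}
print(majority_vote(list(per_method.values())))  # FAIL (5 of 7 methods agree)
```

With an odd number of seven constituent methods, a two-class vote can never tie.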

Figure 1 – Automated Machine Learning Process. (Flowchart: for each ML method, iterate over all combinations of its hyper-parameter values, train the method on the training dataset and test it on the testing dataset; once all methods and combinations are done, choose the best ML method.)
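The search loop of Figure 1 can be sketched as a minimal grid-search skeleton (the method names, hyper-parameter grids, and placeholder scoring function below are illustrative assumptions, not Auto-Weka’s actual search space or optimization strategy):

```python
import itertools
import random

def evaluate(method, params):
    # Placeholder scorer: a real system would train `method` with `params` on
    # the training dataset and return its accuracy on the testing dataset.
    return random.Random(str((method, params))).random()

# Hypothetical search space: each method with candidate hyper-parameter values.
space = {
    "knn": {"k": [1, 3, 5]},
    "svm": {"degree": [2, 3], "c": [0.1, 1.0]},
}

best = None
for method, grid in space.items():                   # loop over ML methods
    for combo in itertools.product(*grid.values()):  # loop over combinations
        params = tuple(zip(grid.keys(), combo))
        score = evaluate(method, params)
        if best is None or score > best[0]:
            best = (score, method, dict(params))     # keep the best model
print(best[1])  # the winning method name
```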

3.1. Artificial Neural Network

Mimicking the neural connections and interactions in the human brain, an ANN models a brain neuron using a mathematical function F(x) [30]. The ANN simulates the interconnections among neurons by nesting functions based on a network model. The function’s parameter x is a vector of size n, x = [x_1, x_2, …, x_n]. We can represent this function as:

F(x) = S(Σ_{i=1}^{n} ω_i F(x_i))    (1)

If x is a scalar, F(x) = x. The factor ω is a weight that will be learned through training the network on historical data.

S is a transfer function that normalizes the output within a specific range of values. The adopted transfer function in

this study is the Sigmoid function that modulates values between 0 and 1 as follows:

S(x) = 1 / (1 + e^(−x))    (2)

The ANN is a hierarchical model made of multiple layers. Each layer has a number of nodes (neurons) that connect via unidirectional links with all nodes in the downstream layer. There is no connection to upstream or same-layer nodes. Normally, there are three types of layers: the input layer, a set of middle layers, and the output layer as

shown in figure 2. The architecture of the ANN adopted in this study is made of an input layer representing the

different categorical values of the adopted data features, 2 middle layers having 12 and 7 neurons respectively, and an

output layer made of one neuron representing the binary outcome.

Figure 2 – ANN Hierarchy (input layer x_1 … x_n, middle layers, and a single-node output layer)
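A forward pass through middle layers of 12 and 7 sigmoid neurons can be sketched as below (illustrative only: the weights are random rather than trained, and the four encoded input features are hypothetical):

```python
import math
import random

def sigmoid(x):
    # Transfer function S(x) = 1 / (1 + e^(-x)), squashing output into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights):
    # Each neuron: sigmoid of the weighted sum of all upstream activations.
    return [sigmoid(sum(w * a for w, a in zip(ws, inputs))) for ws in weights]

def forward(x, layer_sizes, rng):
    # Untrained random weights; a real network would learn them from data.
    for size in layer_sizes:
        weights = [[rng.uniform(-1, 1) for _ in x] for _ in range(size)]
        x = layer(x, weights)
    return x

rng = random.Random(0)
features = [1.0, 0.0, 1.0, 0.5]           # toy encoded input features
out = forward(features, (12, 7, 1), rng)  # hidden layers of 12 and 7, one output
print(round(out[0], 3))                   # a value in (0, 1): the binary score
```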

3.2. K-Nearest Neighbors

The K-Nearest Neighbors method classifies a data point based on the dominant class of its K-nearest neighboring

points within a training data set. The distance between two data points is measured using a specific function, such as

the Euclidean, Manhattan, and Chebychev functions [30]. In our study, the adopted distance function is the Euclidean, and K was set to 1. The Euclidean distance between two data points x and y, where x and y are vectors of size n, is:

E(x, y) = √(Σ_{i=1}^{n} (x_i − y_i)^2)    (3)
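With K = 1, the classifier reduces to adopting the class of the single closest training point; a minimal sketch (the encoded student records are hypothetical):

```python
import math

def euclidean(x, y):
    # E(x, y) = sqrt(sum_i (x_i - y_i)^2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def nearest_neighbor_class(point, training):
    # With K = 1, the prediction is simply the class of the closest training point.
    _, label = min(training, key=lambda t: euclidean(point, t[0]))
    return label

# Hypothetical encoded student records: (feature vector, outcome).
training = [([1.0, 0.0, 2.0], "PASS"),
            ([0.0, 1.0, 0.0], "FAIL"),
            ([1.0, 1.0, 2.0], "PASS")]
print(nearest_neighbor_class([1.0, 0.0, 1.5], training))  # PASS
```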

3.3. K-Means Clustering

The K-Means Clustering method assigns a data point to one class out of K different classes. Before the

classification, the clustering algorithm arranges the data points of a training set into K different clusters, which eventually represent the classes. In this study, K was equal to 2 since we have two different classes: Pass and Fail.

The assignment of a data point to a cluster is decided based on its distance from the centroid of each cluster. The

centroid is the average data point of all points in the cluster. The adopted distance function for this algorithm is the

Euclidean Distance [30].

In our study, after the training phase to assign historical data points into two different clusters, the K-Means

Clustering classifier is able to predict the outcome of a new data point by assigning it to the cluster that has the closest

centroid.
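The centroid-based assignment can be sketched as follows (the two toy clusters below are hypothetical, not the study’s learned clusters):

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def centroid(points):
    # The centroid is the coordinate-wise average of all points in the cluster.
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def assign(point, clusters):
    # Predict by assigning the new point to the cluster with the closest centroid.
    return min(clusters, key=lambda label: euclidean(point, centroid(clusters[label])))

# Hypothetical clusters learned from historical records (K = 2: Pass / Fail).
clusters = {"PASS": [[2.0, 3.0], [3.0, 3.0]],
            "FAIL": [[0.0, 0.0], [1.0, 0.0]]}
print(assign([2.5, 2.0], clusters))  # PASS
```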

3.4. Naïve Bayes

The Naive Bayes classifier is a simple technique that predicts outcomes based on the Bayesian theorem. The training of the Naïve Bayes classifier is fast compared to other computationally intensive models. It classifies a data point

x based on the conditional probability of being in a class C given the values of its constituent scalars [x1, x2, …, xn],

without relying on any additional parameter. The class that has the highest probability of occurrence given the inputs

will be the predicted class [14].

The probability of being in the class C given x = [x_1, x_2, …, x_n] is as follows:

P(C|x) = (P(C) / P(x)) · Π_{i=1}^{n} P(x_i|C)    (4)
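Since P(x) is the same for every class, a Naïve Bayes prediction only needs to compare P(C) · Π P(x_i|C) across classes; a sketch with hypothetical categorical features and made-up probabilities:

```python
def naive_bayes_predict(x, priors, likelihoods):
    # Score each class by P(C) * prod_i P(x_i | C); the shared denominator P(x)
    # can be dropped when only the most probable class is needed.
    scores = {}
    for c, prior in priors.items():
        score = prior
        for i, value in enumerate(x):
            score *= likelihoods[c][i][value]
        scores[c] = score
    return max(scores, key=scores.get)

# Hypothetical categorical features: (scholarship, admitted on probation).
priors = {"PASS": 0.68, "FAIL": 0.32}
likelihoods = {
    "PASS": [{"FULL": 0.30, "NONE": 0.70}, {"YES": 0.10, "NO": 0.90}],
    "FAIL": [{"FULL": 0.10, "NONE": 0.90}, {"YES": 0.40, "NO": 0.60}],
}
print(naive_bayes_predict(["NONE", "YES"], priors, likelihoods))  # FAIL
```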

3.5. Support Vector Machine

The SVM classifier derives boundaries between data points that belong to different classes. Points within certain

boundaries are normally part of a common class. The ideal scenario is when the data points belonging to different

classes are separable via a linear boundary. However, in most cases this is not possible due to data overlaps as shown

in figure 3. SVM casts the data points to a new higher dimension space in which the data becomes linearly separable

with a hyperplane, using a specific kernel function. This technique is based on Cover’s Theorem, which states that non-linearly separable data points are highly likely to be separable by a hyperplane if projected to a higher-dimensional

space via some non-linear transformation. The boundary hyperplane will be realized by referencing the borderline

data points, which are called the support vectors. The identified support vectors should be away from the boundary by

a given margin. The kernel function not only takes care of casting to a new space but also provides the dot product

between two data points x and y for measuring distances, hence reducing the computational overhead. We relied in

this study on the polynomial kernel function F of degree d as shown below [30]:

F(x, y) = (Σ_{i=1}^{n} x_i · y_i + c)^d    (5)

Figure 3 – Non-linearly separable overlapping data classes
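The polynomial kernel of equation (5) is a one-liner; for example, with c = 1 and d = 2 (the input vectors are arbitrary toy values):

```python
def poly_kernel(x, y, c=1.0, d=2):
    # F(x, y) = (x . y + c)^d: the dot product in the higher-dimensional space,
    # computed without ever constructing that space explicitly.
    return (sum(a * b for a, b in zip(x, y)) + c) ** d

print(poly_kernel([1.0, 2.0], [3.0, 4.0]))  # (3 + 8 + 1)^2 = 144.0
```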

3.6. Logistic Regression

The Logistic Regression classifier transforms the output of a linear regression function f(x) into a value between 0 and 1 using the logistic function L, as described below [30]. It reflects the odds of class occurrence with respect to the given features.

L(f(x)) = 1 / (1 + e^(−f(x)))    (6)
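A sketch of the logistic transform applied to a hypothetical learned linear function (the weights and features below are made up for illustration):

```python
import math

def logistic(z):
    # L(f(x)) = 1 / (1 + e^(-f(x))): maps any real score to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical learned linear function f(x) = w . x + b over encoded features.
w, b = [-1.2, 0.8], 0.3
x = [1.0, 0.0]                 # e.g. two binary-encoded student attributes
p = logistic(sum(wi * xi for wi, xi in zip(w, x)) + b)
print(round(p, 3))             # 0.289: the predicted class probability
```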

3.7. Decision Tree

A Decision Tree classifier learns from a set of historic data points and generates a corresponding tree-like structure.

The features and respective values are analyzed and structured in a hierarchical tree-like topology, which helps in

answering questions by a simple root-to-leaf traversal. The root and all other decision nodes are connected to two or

more downstream nodes (all representing answers to decision questions). A leaf node has no downstream connections

and represents the final answer to the series of questions captured in the path of nodes preceding it up to the root [30].

Figure 4 is a snapshot from a section of the Decision Tree pertaining to this study.

Figure 4 – Section of the Decision Tree (No – CGPA Not Below 2.0; Yes – CGPA Below 2.0)
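The IF-THEN traversal can be sketched as follows (the tree fragment is a hypothetical illustration in the spirit of Figure 4, not the study’s actual tree):

```python
def predict(node, student):
    # Walk the tree: each decision node asks about one feature; a leaf holds
    # the final PASS/FAIL answer (equivalent to nested IF-THEN rules).
    while isinstance(node, dict):
        node = node["branches"][student[node["feature"]]]
    return node

# Hypothetical tree fragment over two categorical features.
tree = {
    "feature": "probation",
    "branches": {
        "YES": "FAIL",
        "NO": {"feature": "course_load",
               "branches": {"HIGH": "PASS", "NORM": "PASS", "MODR": "FAIL"}},
    },
}
print(predict(tree, {"probation": "NO", "course_load": "HIGH"}))  # PASS
```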

3.8. Data Sources

We have collected the data for this study from different sources within academic institutions in the United Arab

Emirates. Specifically, we relied on student records from Admission, Registrar and Student Service offices. Our

sample included records of 1491 students, of whom 1014 were in good academic standing.

We faced three main challenges when building the predictive model based on this sample: data inconsistency,

imbalance and overlap. For students who have spent at least a semester in a university program, several data features

would exist and should help in producing predictive models with high precision. For example, we can rely on several

features to predict students’ success in a particular course or program such as grades in key courses, exams, past terms’

CGPAs, probations, warnings, class participations, and extra-curricular engagements. For new entrants, in the absence

of this data, other variables that are available upon admission are required to build a precise predictive model. These

variables represent common attributes of the admitted students such as age, gender, ethnicity, study program, course

load, on-campus residency, probation, and school education system. We used all of these variables in this study.

Furthermore, to address the differences in the high school systems and the inconsistency in evaluation schemes, we

relied on the students’ placement in developmental English and Math courses that are based on scores from standard

exams such as TOEFL, IELTS, English ACCUPLACER, Math ACCUPLACER, and SAT. We have used 13 data

features in developing this predictive model, as described in Table 1, and transformed their values to categorical

ranges.

The imbalance between the number of passing (1014) and failing (477) students biases the predictive model. We

needed to apply a careful data balancing technique to ensure better precision without compromising the learning value

from the data. We chose the Synthetic Minority Oversampling Technique (SMOTE) [31] to create extra data points

in the training data set in order to balance the data classes. Table 2 shows the percentage of failing

students for each category under each data feature.
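The core SMOTE idea, interpolating a synthetic minority point between a real minority point and a minority-class neighbor, can be sketched as below (a minimal single-nearest-neighbor version; the full technique samples among K nearest neighbors, and the toy points are hypothetical):

```python
import math
import random

def smote_sample(minority, rng):
    # Pick a minority point, find its nearest minority neighbor, and
    # interpolate a synthetic point at a random position between the two.
    base = rng.choice(minority)
    neighbor = min((p for p in minority if p is not base),
                   key=lambda p: math.dist(base, p))
    t = rng.random()
    return [b + t * (n - b) for b, n in zip(base, neighbor)]

rng = random.Random(42)
failing = [[1.0, 0.0], [1.2, 0.1], [0.9, 0.3]]   # toy minority-class points
synthetic = smote_sample(failing, rng)
print([round(v, 2) for v in synthetic])           # lies between two real points
```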

Students having similar data might end up having different outcomes causing confusion to a prediction method.

Due to this data overlap, a method resorts to a particular stochastic guess within certain probabilistic limits to predict

an outcome, leading to reduced prediction accuracy. Our proposed ensemble of multiple predictive methods increases

the prediction accuracy since it relies on voting amongst different methods. In other words, the prediction outcome of

the Ensemble Model is the most recurring classification among the set of methods.

Table 1
Data Features.

Feature: Program
Values: BBA, ENG, BAIS, ARC, ID, VC, BCIS, GEN
Description: Program of study in the university. The considered programs are: Bachelor of Business Administration (BBA), Engineering (ENG), Bachelor of Arts in International Studies (BAIS), Architecture (ARC), Interior Design (ID), Visual Communications (VC), Bachelor of Communication and Information Studies (BCIS), and General (GEN).

Feature: School System
Values: HSD, IB, IGCSE, BAC, OTH
Description: School system from which a student is coming, as per the UAE Ministry of Education: High School Diploma (HSD), International Baccalaureate (IB), International General Certificate of Secondary Education (IGCSE), Baccalaureate (BAC), Other (OTH).

Feature: Ethnicity
Values: NAMR, AUS, ASIA, SAMR, EURO, LEVN, PERS, GCC, AFRC, NAFR, SASA, NASA
Description: The ethnic community to which the student belongs: North American (NAMR), Australian (AUS), Asian (ASIA), South American (SAMR), European (EURO), Levantine (LEVN), Persian (PERS), Arab Gulf (GCC), African (AFRC), North African (NAFR), South Asian (SASA), North Asian (NASA).

Feature: Gender
Values: Male, Female

Feature: Age Group
Values: AGE20+, AGE19-
Description: The age is inferred from the date of birth and grouped under two categories: 19-and-below, and 20-and-above.

Feature: Scholarship
Values: NONE, QUART, HALF, FULL
Description: The scholarship status of the student: no scholarship (NONE), 25% scholarship (QUART), 50% scholarship (HALF), and 100% scholarship (FULL).

Feature: Transfer Status
Values: TRC, TRN, NON
Description: The transfer status of the student: transferred from another university with no credits counted (TRN), transferred from another university with some credits counted (TRC), not transferred from any university, i.e. coming directly from high school, which is the case for the majority of students (NON).

Feature: Admitted on Probation
Values: YES, NO
Description: The admission-on-probation status: student admitted on probation (YES), student admitted with no probation (NO).

Feature: In Dorm
Values: YES, NO
Description: The dorm occupancy: student lives in a campus-based dormitory (YES), student does not live in a campus-based dormitory (NO).

Feature: Course Load
Values: HIGH, MODR, NORM
Description: The course load is inferred from the number of registered courses (credits): high load (HIGH) is 6 or more courses (18+ credits), normal load (NORM) is 5 courses (15 credits), moderate load (MODR) is 4 courses or fewer (12 credits or fewer).

Feature: Math Level
Values: MATH1, MATH2, MATH3, MATH4, MATH5
Description: The level of math skills upon admission, based on the math placement test. The lowest level is MATH1 and the highest can go up to MATH5, depending on the program of study.

Feature: English Level
Values: ENGL1, ENGL2, ENGL3, ENGL4, ENGL5
Description: The level of English skills upon admission, based on the English placement test. The lowest level is ENGL1 and the highest can go up to ENGL5, depending on the program of study.

Feature: Result
Values: PASS, FAIL
Description: The outcome based on the student’s cumulative Grade Point Average (CGPA), ranging from 0 to 4, in their first semester at the university. If the CGPA is below 2, the student is considered to be failing (FAIL); otherwise, the student is considered to be passing (PASS).

Table 2
Descriptive Data (percentage of failing students per category under each feature).

Program: GEN (60%), BBA (42%), ENG (34%), ARC (26%), BAIS (24%), VC (19%), ID (15%), BCIS (13%)
School System: HSD (33%), IGCSE (32%), BAC (26%), IB (19%)
Ethnicity: NASA (42%), GCC (39%), SASA (38%), NAFR (38%), AFRC (35%), PERS (32%), MEST (28%), EURO (21%)
Gender: Female (14%), Male (49%)
Age Group: AGE19- (30%), AGE20+ (49%)
Scholarship: NONE (37%), FULL (19%), QUART (10%), HALF (0%)
Transfer Status: AS (93%), TRN (54%), NHS (29%), TRC (26%)
Admitted on Probation: YES (60%), NO (30%)
In Dorm: YES (31%), NO (32%)
Course Load: LOW (65%), MOD (67%), NORM (35%), HIGH (18%)
Math Level: NONE (15%), MATH1 (48%), MATH2 (36%), all other math levels (average of 25%)
English Level: NONE (23%), ENGL1 (44%), ENGL2 (18%), all other English levels (average of 10%)
Result: 31.9% of the dataset were failing students

4. Results

We used 10-fold cross-validation to test the accuracy of the resulting Ensemble Model. The model is trained on

90% of the points and tested with 10% over 10 different runs. It is important to note that the data points that are

allocated for testing as part of the 10% split are different each time. Figure 5 is a schematic representation of the cross-

validation process adopted in Weka for this study.

Figure 5 – 10-fold Cross-Validation of the Model Accuracy. (Flowchart: split the dataset 90%/10%; apply SMOTE to the 90% training split; train the Ensemble Model on the SMOTEd data; test it on the 10% held-out split; repeat for 10 loops and aggregate the results.)
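The 10-fold scheme can be sketched as index bookkeeping (the SMOTE, training, and testing steps are left as a comment; only the fold construction is shown):

```python
import random

def ten_fold_indices(n, rng):
    # Shuffle once, then deal the indices into 10 disjoint test folds.
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::10] for i in range(10)]

n = 1491                                  # records in the study's sample
folds = ten_fold_indices(n, random.Random(0))
for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(n) if i not in held_out]
    # ...apply SMOTE to the training rows only, train the Ensemble Model,
    # test on the held-out rows, and aggregate results over the 10 loops...
print(len(folds), sum(len(f) for f in folds))  # 10 1491
```

Applying SMOTE inside the loop, to the training split only, keeps synthetic points out of the test folds.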

Table 3 lists the classification methods with their corresponding accuracy rates when applied on our data set. In

addition to the overall accuracy, the table differentiates between the accuracy of predicting Fail and Pass. This

differentiation is important to assess the efficiency of these methods in targeting students at risk.

Table 3
Methods Comparison.

Classification Method | Accuracy Rate of Predicting Fail Students | Accuracy Rate of Predicting Pass Students | Overall Accuracy Rate | Kappa Statistic
Ensemble Model | 83.0% | 72.5% | 75.9% | 0.50
Artificial Neural Network | 73.6% | 69.8% | 71.0% | 0.39
K-Nearest Neighbors | 77.4% | 65.4% | 69.2% | 0.37
K-Means Clustering | 74.2% | 36.4% | 48.5% | 0.08
Naïve Bayes | 76.7% | 69.8% | 72.0% | 0.42
Support Vector Machine | 56.0% | 82.2% | 73.8% | 0.38
Logistic Regression | 73.0% | 69.8% | 70.8% | 0.38
Decision Tree | 76.7% | 65.7% | 69.2% | 0.37

Further, Table 3 highlights the kappa coefficient (κ), which is a statistic representing the level of agreement between two different classifiers. It factors in the possibility of accidental agreements. In our case, the agreement is measured between the modeled classifier and the observed process.

k = (P_o − P_e) / (1 − P_e)    (7)

P_o is the probability of making the right prediction, i.e. the accuracy measure. P_e is the probability of accidental agreement between the classifiers. In a binary system with two predictors, P_e = P_1(a)·P_2(a) + P_1(b)·P_2(b), where P_i(n) is the probability of classifier i predicting class n. A kappa coefficient between 0.4 and 0.75 is considered good according to Fleiss’ scale; a kappa below 0.4 is poor, and above 0.75 is excellent. Our Ensemble Model achieved a kappa of 0.50, which is nearly 20% higher than what is achieved using each prediction model separately (on the same data). This implies that our Ensemble Model, resulting from the automatic search, leaves less chance for accidental guessing.
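Equation (7) in code, with a hypothetical binary example (the probabilities below are made up for illustration, not the study’s figures):

```python
def kappa(p_o, p_e):
    # k = (P_o - P_e) / (1 - P_e): agreement corrected for chance agreement.
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical binary case: both "classifiers" (model and observed process)
# predict class a 60% of the time, and the model is right 80% of the time.
p1_a = p2_a = 0.6
p_e = p1_a * p2_a + (1 - p1_a) * (1 - p2_a)   # 0.36 + 0.16 = 0.52
print(round(kappa(0.8, p_e), 3))              # 0.583
```

When P_o equals P_e, the classifier does no better than chance and κ is 0.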

5. Conclusion

The reported work in this paper contributes to the body of knowledge in the field of predicting student academic

success. Specifically, it relies on AutoML to increase the prediction accuracy of student performance using data

features available prior to the students starting their new academic program, i.e. pre-start data. In effect, the accuracy

of predicting student performance using pre-start data has never exceeded 70%, as found in the current literature [2,

3]. In our study, we achieved 75.9% overall accuracy through the use of AutoML, with a kappa of 0.50. Accordingly,

we encourage researchers in this field to adopt AutoML in their search for an optimal student performance prediction

model, especially when using pre-start data.
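As a brief illustration of why a combined model can outperform its members, the sketch below merges three hypothetical binary classifiers by majority vote; the predictors and toy labels are invented for exposition and are not the paper's actual models:

```python
def majority_vote(predictions):
    """Combine per-model binary predictions (0 = Fail, 1 = Pass) by majority vote."""
    votes = list(zip(*predictions))  # one tuple of votes per student
    return [1 if sum(v) > len(v) / 2 else 0 for v in votes]

truth   = [0, 0, 0, 1, 1, 1]
model_a = [0, 0, 1, 1, 1, 1]   # 5/6 correct
model_b = [0, 1, 0, 1, 1, 0]   # 4/6 correct
model_c = [1, 0, 0, 1, 0, 1]   # 4/6 correct
combined = majority_vote([model_a, model_b, model_c])
accuracy = sum(c == t for c, t in zip(combined, truth)) / len(truth)
print(combined, accuracy)  # → [0, 0, 0, 1, 1, 1] 1.0
```

Because the three models err on different students, the vote recovers every label even though no single member is perfect; AutoML searches automate the discovery of such complementary combinations.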

Besides improving the overall prediction accuracy, it is of paramount importance to improve the accuracy of

predicting the failing students, who need immediate attention and support from specialized units within academic

institutions. The maximum accuracy rate reported in the literature for predicting failure of new-start students is

70%. In our case, the auto-generated Ensemble Model predicts failing students with an accuracy of 83%, after

balancing the data using Synthetic Minority Oversampling Technique. Such a result emphasizes the importance of

balancing data using advanced statistical techniques to achieve better prediction, especially if the minority class is of

interest. The authors acknowledge the overgeneralization limitation of using SMOTE. Yet, since the data set

contains a sizeable minority, the risk of creating synthetic values outside of the minority set, which overlap with the

majority set, is rather minor.
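The interpolation at the heart of SMOTE can be sketched in a few lines. The version below is a simplified illustration of the idea (synthesizing minority samples between nearest neighbours), not the configuration used in the study; production work would typically use the imbalanced-learn library's implementation:

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE sketch: create n_new synthetic minority samples by
    interpolating between each sample and one of its k nearest
    minority-class neighbours (numeric features assumed)."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                 # exclude self-matches
    neighbours = np.argsort(d, axis=1)[:, :k]   # k nearest per sample
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(n)                     # random minority sample
        j = rng.choice(neighbours[i])           # one of its neighbours
        gap = rng.random()                      # interpolation factor in [0, 1]
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.asarray(synthetic)

# Toy minority set: 6 points in a 2-D feature space
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                  [1.0, 1.0], [0.5, 0.5], [0.2, 0.8]])
X_new = smote(X_min, n_new=4, k=3, rng=0)
print(X_new.shape)  # → (4, 2)
```

Because each synthetic point lies on a segment between two minority samples, a sizeable, well-clustered minority keeps the new points inside minority territory, which is the overgeneralization caveat noted above.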

The resulting increase in prediction accuracy of students at risk allows academic institutions to be more efficient

in supporting those students while utilizing the least amount of resources. Future studies may rely on descriptive

statistics to analyze the role of different psychographic variables and their impact on the predictive model. It would

be interesting for upcoming studies to test auto-generated ensemble models in predicting student career success using

academic and psychographic data.

References

[1] M. Tight, Student retention and engagement in higher education, Journal of Further and Higher Education, Mar

2019, DOI: 10.1080/0309877X.2019.1576860.

[2] A.S. Hoffait, M. Schyns, Early detection of university students with potential difficulties, Decision Support

Systems 101 (2017) 1–11.

[3] J.P. Vandamme, N. Meskens, J.F. Superby, Predicting academic performance by data mining methods,

Education Economics 15 (4) (2007) 405–419.

[4] J. Evermann, J.R. Rehse, P. Fettke, Predicting process behaviour using deep learning, Decision Support Systems

100 (2017) 129-140.

[5] N. Carneiro, G. Figueira, M. Costa, A data mining based system for credit-card fraud detection in e-tail, Decision

Support Systems 95 (2017) 91-101.

[6] V.L. Miguéis, Ana Freitas, Paulo J.V. Garcia, André Silva, Early segmentation of students according to their

academic performance: A predictive modelling approach, Decision Support Systems 115 (2018) 36-51.

[7] M. M. Salvador, M. Budka, B. Gabrys, Automatic Composition and Optimization of Multicomponent Predictive

Systems With an Extended Auto-WEKA, IEEE Transactions on Automation Science and Engineering 16 (2) 2019.

[8] T. Stadelmann, M. Amirian, I. Arabaci, M. Arnold, G. F. Duivesteijn, I. Elezi, M. Geiger, S. Lӧrwald, B.B.

Meier, K. Rombach, Deep learning in the wild, IAPR Workshop on Artificial Neural Networks in Pattern

Recognition, Springer, 2018, pp. 17–38.

[9] L. Tuggener, M. Amirian, K. Rombach, S. Lӧrwald, A. Varlet, C. Westermann, T. Stadelmann, Automated

Machine Learning in Practice, State of the Art and Recent Results, Proceedings of the 6th IEEE Swiss Conference

on Data Science (SDS’19), Bern, Switzerland, June 14, 2019.

[10] A. Pena-Ayala, Educational data mining: a survey and a data mining-based analysis of recent works, Expert

Systems with Applications 41 (4) (2014) 1432–1462.

[11] D. Delen, A comparative analysis of machine learning techniques for student retention management, Decision

Support Systems 49 (4) (2010) 498–506.

[12] S. Huang, N. Fang, Predicting student academic performance in an engineering dynamics course: a comparison

of four types of predictive mathematical models, Computers & Education 61 (2013) 133–145.

[13] F. Marbouti, H.A. Diefes-Dux, K. Madhavan, Models for early prediction of at-risk students in a course using

standards-based grading, Computers & Education 103 (2016) 1–15.

[14] C. Márquez-Vera, A. Cano, C. Romero, A.Y.M. Noaman, H. Mousa Fardoun, S. Ventura, Early dropout

prediction using data mining: a case study with high school students, Expert Systems 33 (1) (2016) 107–124.

[15] M. Richardson, C. Abraham, R. Bond, Psychological correlates of university students' academic performance: a

systematic review and meta-analysis, Psychological Bulletin 138 (2) (2012) 353–387.

[16] A.M. Shahiri, H. Wahidah, A.R. Nur’aini, A Review on Predicting Student's Performance Using Data Mining

Techniques, Procedia Computer Science 72 (2015) 414–422.

[17] Z.K. Papamitsiou, V. Terzis, A.A. Economides, Temporal learning analytics for computer based testing,

Proceedings of the Fourth International Conference on Learning Analytics And Knowledge, LAK ’14, ACM, New

York, NY, USA, 2014, pp. 31–35.

[18] S. Natek, M. Zwilling, Student data mining solution-knowledge management system related to higher education

institutions, Expert Systems with Applications 41 (14) (2014) 6400–6407.

[19] M. Mayilvaganan, D. Kalpanadevi, Comparison of classification techniques for predicting the performance of

students academic environment, 2014 International Conference on Communication and Network Technologies

(ICCNT), IEEE, 2014, pp. 113–118.

[20] G. Putnik, E. Costa, C. Alves, H. Castro, L. Varela, V. Shah, Analysing the correlation between social network

analysis measures and performance of students in social network-based engineering education, International Journal

of Technology and Design Education 26 (3) (2016) 413–437.

[21] G. Gray, C. McGuinness, P. Owende, An application of classification models to predict learner progression in

tertiary education, Advance Computing Conference (IACC), 2014 IEEE International, 2014, pp. 549–554.

[22] T. Mishra, D. Kumar, S. Gupta, Mining students' data for prediction performance, 2014 Fourth International

Conference on Advanced Computing Communication Technologies, 2014, pp. 255–262.

[23] P. Strecht, L. Cruz, C. Soares, J. Mendes-Moreira, R. Abreu, A Comparative Study of Classification and

Regression Algorithms for Modelling Students' Academic Performance, International Educational Data Mining

Society, Madrid, 2015.

[24] C. Romero, P.G. Espejo, A. Zafra, J.R. Romero, S. Ventura, Web usage mining for predicting final marks of

students that use Moodle courses, Computer Applications in Engineering Education 21 (1) (2013) 135–146.

[25] E.B. Costa, B. Fonseca, M.A. Santana, F.F. de Araújo, J. Rego, Evaluating the effectiveness of educational data

mining techniques for early prediction of students' academic failure in introductory programming courses,

Computers in Human Behavior 73 (2017) 247–256.

[26] C. Romero, S. Ventura, Data mining in education, Wiley Interdisciplinary Reviews: Data Mining and

Knowledge Discovery 3 (1) (2013) 12–27.

[27] G. Seni, J. Elder, Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions,

Morgan and Claypool, 2010.

[28] L. Kotthoff, C. Thornton, H.H. Hoos, F. Hutter, K. Leyton-Brown, Auto-WEKA 2.0: Automatic model

selection and hyperparameter optimization in WEKA, Journal of Machine Learning Research 18 (2017) 1-5.

[29] M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, and F. Hutter. Efficient and robust automated

machine learning, Advances in Neural Information Processing Systems 28 (2015) 2962–2970.

[30] G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction to Statistical Learning: With Applications in R, Springer, 2013.

[31] G. Douzas, F. Bacaoa, F. Last. Improving imbalanced learning through a heuristic oversampling method based

on k-means and SMOTE. Information Sciences 465 (2018) 1-20.

Bios:

Hassan Zeineddine holds a PhD in computer sciences from the University of Ottawa in Canada. He has 15 years of

industry experience associated with several leading telecommunication companies in North America. Hassan’s

current research interests are in the fields of data analytics, operations research, logistics and supply chains

collaboration. His other research interests include process modeling and simulations.

Assaad Farah holds a PhD in Management from the University of Bath in the United Kingdom. In addition to his

academic responsibilities, he is an executive educator and consultant mainly for the UAE public sector. Prior to that,

he worked in the aeronautical and mobile industry in Canada. His research focus revolves around knowledge

management, strategic HRM and AI.

Udo Braendle has worked in practice and for universities for more than 15 years. His research mainly focuses on

management science, regulation and the social and environmental behavior of firms. He has published widely on

these issues in leading journals, such as the Social Responsibility Journal, Journal of Management and Governance,

and the Business Strategy Review.
