II. LITERATURE REVIEW

In recent years, there has been growing interest in utilizing machine learning and data mining techniques to predict student academic performance. These studies, diverse in nature, employ a range of methodologies and focus on various global contexts. Notably, current empirical studies have yet to explore the application of machine learning techniques for academic prediction within the context of Myanmar.

Albreiki et al. (2021) conducted a comprehensive review of key studies between 2009 and 2021, shedding light on the expanding role of machine learning in educational areas. The results affirm the transformative potential of machine learning: educators who leverage it can gain a clearer understanding of student progress, enabling early interventions for students facing challenges.

A study from Obafemi Awolowo University in Nigeria compared two neural network models, the Multilayer Perceptron and the Generalized Regression Neural Network, to determine the more effective model for predicting students' academic performance using a single performance metric, academic results. The findings highlighted the promising utility of machine learning methodologies for educators in forecasting student performance (Iyanda et al., 2018).

Similarly, Yakubu and Abubakar (2022) applied machine learning to data from a Nigerian university to predict student performance, using early indicators such as age, gender, and previous academic scores. The study highlighted the influential roles of gender, high school examination scores, and region in determining academic achievement, suggesting that these tools can be crucial for higher education institutions when allocating resources and devising intervention strategies.

Chen and Ding (2023) emphasize the significance of predicting academic performance for policymaking. To predict academic performance in Pennsylvania, they employed several machine learning models: a decision tree (48% accuracy), random forest (54%), logistic regression (50%), support vector machine (51%), and neural network (60%). The neural network demonstrated the highest accuracy, highlighting its potential as a valuable tool in shaping educational policies such as funding allocation and teacher selection.

In summary, the current literature underscores the growing enthusiasm and significant advances in harnessing machine learning techniques within the education domain, especially for predictive analytics concerning student performance. However, most existing studies focus on specific institutions or distinct geographic areas. This reveals a gap in the research, suggesting opportunities for broader analyses covering a range of educational settings, including places like Myanmar.

III. DATA

The dataset used for this research was acquired between July 12 and July 25, 2018, from a tertiary educational institution, the University for the Development of the National Races of the Union (UDNR), located in the north-western region of Sagaing, Myanmar. Our sample comprises 735 students from the third, fourth, and fifth academic years. Founded in 1964 as the Academy for Development of National Groups (ADNG), UDNR's primary objective is to nurture an educational workforce dedicated to the socio-economic development of national races, promoting unity and prioritizing the progress of border areas, particularly in education and other social sectors.

The unique composition of the university, with its blend of ethnic majority and minority groups, presents an intriguing backdrop for our research. This diversity stems from UDNR's policy of selecting students based on household income and ethnic affiliation. The institution ensures that a significant segment of its student body comes from low-income families living in border areas. Furthermore, deliberate efforts are in place to achieve a balance across the eight major ethnic groups in each class. Students are randomly assigned to one of six classes (A to F), guaranteeing that the ethnic distribution is independent of other factors that might affect their GPAs.

Our research highlights the underrepresented ethnicities of Kachin, Kayah, Kayin, Chin, Mon, Bamar, Rakhine, and Shan. Information about students' backgrounds was collected using a structured questionnaire administered to third-year (grade 14), fourth-year (grade 15), and fifth-year (grade 16) students enrolled between 2015 and 2017. Of the respondents, 191 were male and 544 were female. Their standardized test scores, represented as GPA and used as the outcome variable in our analysis, were sourced from the administrative records of the respective academic years.

Our aim was to identify the factors affecting the academic performance of these advanced students. The focus on these cohorts lies in the extensive history of their academic data, especially their GPAs from previous years. A "lag" approach was adopted, using prior GPAs as predictors for subsequent academic years. Consequently, first- and second-year students were not considered: first-year students have no GPA history, and the data for second-year students was not extensive enough for a thorough prediction. This approach yielded one observation from each third-year student, two from each fourth-year student, and three from each fifth-year student, accumulating a dataset of 1,333 observations for the 735 students.

The features considered in our analysis include GPA history, total marks of grade 11, weekly study hours, ethnicity, gender, religious affiliation, residential background, and the location of the Basic Education High School (BEHS). Each feature provides distinct insights.
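The "lag" panel construction described above can be sketched with pandas; the column names and toy values below are hypothetical stand-ins, not the actual UDNR records.

```python
import pandas as pd

# One row per student per completed academic year. A fifth-year student
# has GPAs for years 2-5; a third-year student has GPAs for years 2-3.
records = pd.DataFrame({
    "student_id": [1, 1, 1, 1, 2, 2],
    "year":       [2, 3, 4, 5, 2, 3],
    "GPA":        [3.6, 3.8, 4.1, 4.3, 4.4, 4.6],
})

# The prior year's GPA becomes the predictor GPA_lag; the first recorded
# year of each student has no lag and is dropped, which is why first-year
# students cannot enter the sample at all.
records = records.sort_values(["student_id", "year"])
records["GPA_lag"] = records.groupby("student_id")["GPA"].shift(1)
panel = records.dropna(subset=["GPA_lag"])

# The fifth-year student contributes 3 observations, the third-year student 1.
print(len(panel))  # 4
```

Applied to the 735 students, this one-observation-per-lagged-year rule is what yields the 1,333-row dataset.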
For example, GPA_lag and TotalmarksGrade11 reflect the impact of past academic achievements on the current GPA, while Studyhrperweek might indicate students' dedication. Variables related to ethnicity, religion, and gender offer socio-demographic perspectives, whereas residential and BEHS location variables provide contextual insights. Before training our algorithms, we transformed the feature data into suitable formats. For numerical attributes, we applied z-score normalization, which recalibrates the data by subtracting the mean of each feature and dividing the result by its standard deviation. For categorical features, we employed various encoding strategies, such as binary, ordinal, and one-hot encoding, depending on the structure of the data and the targeted algorithmic outcome.

It is important to note that our dataset is complete, with no missing values across all observations and features. This completeness allowed us to conduct our analyses without resorting to data imputation, ensuring the credibility of our findings.

Table 1: Summary Statistics

Variable             Obs.    Mean  Std  Min   Max
GPA                  1,333   4.3   0.6   3.0  5.0
GPA_lag              1,333  -0.1   1    -2.0  1.2
TotalmarksGrade11    1,333  -0.1   1    -2.5  2.5
Studyhrperweek       1,333   1.8   1.7   0.0  6.0
ownethpct            1,333   0.0   1    -1.7  2.8
Gender
  male               1,333   0.3   0.4   0    1
Religion
  Buddhism           1,333   0.8   0.4   0    1
  Christianity       1,333   0.2   0.4   0    1
  Other              1,333   0.0   0.1   0    1
Ethnicity
  Bamar              1,333   0.2   0.4   0    1
  Chin               1,333   0.1   0.3   0    1
  Kachin             1,333   0.1   0.2   0    1
  Kayah              1,333   0.04  0.2   0    1
  Kayin              1,333   0.1   0.3   0    1
  Mon                1,333   0.02  0.2   0    1
  Rakhine            1,333   0.1   0.3   0    1
  Shan               1,333   0.3   0.5   0    1
Residential
  rural              1,333   0.6   0.5   0    1
  suburban           1,333   0.3   0.5   0    1
  urban              1,333   0.1   0.3   0    1
LocationofBEHS
  centeroftown       1,333   0.1   0.3   0    1
  isolatedarea       1,333   0.1   0.3   0    1
  outskirtoftown     1,333   0.4   0.5   0    1
  rural              1,333   0.3   0.5   0    1

IV. METHODOLOGY

The methodology comprises four stages, crafted to ensure the developed models are robust, accurate, and reliable in their predictions.

1. Data Sampling: In this initial stage, the student academic performance data collected from the University for the Development of the National Races of the Union (UDNR) in Myanmar are segmented into five subsets using the K-fold cross-validation technique, with K=5. This ensures that every data point is used for validation exactly once while the remaining data points form the training set.

2. Hyperparameter Tuning: In the second stage, grid search with K-fold cross-validation is employed to find the optimal hyperparameters for each machine learning model. This exhaustive search considers all combinations of the candidate hyperparameters to find the one that minimizes the error, thus improving the predictive performance of the model. Table 2 details the hyperparameters explored for each algorithm within the grid search, along with the selected values used in model training.

Table 2: Hyper-parameters

Algorithm / Parameter               Grid search        Selected
ANN
  No. of nodes in 1st hidden layer  9, 18, 27          9
  No. of nodes in 2nd hidden layer  4, 9, 18           4, 9
SVR
  C value                           1, 5, 10           1
  Gamma value                       scale, auto        auto
RF
  Maximum depth                     None, 5, 10        5
  No. of estimators                 100, 200, 300      100, 200, 300
GBR
  Learning rate                     0.01, 0.05, 0.1    0.01
  No. of estimators                 300, 500, 1000     300, 500
XGBoost
  Maximum depth                     6, 8, 10           6
  No. of estimators                 500, 1000, 1500    500, 1000

Note: For the grid search, two parameters for each algorithm were selected for tuning, while suitable values were assigned to other parameters not included in this table.
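The sampling and tuning stages, together with the preprocessing described in Section III, can be sketched with scikit-learn. The toy data and feature names below are illustrative only; the grid mirrors the SVR entries of Table 2, and the pipeline applies z-score normalization to numeric features and one-hot encoding to categorical ones.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the student data (not the UDNR dataset).
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "GPA_lag": rng.normal(size=40),
    "Studyhrperweek": rng.uniform(0, 6, 40),
    "Gender": rng.choice(["male", "female"], 40),
})
y = 4.0 + 0.4 * X["GPA_lag"] + rng.normal(0, 0.1, 40)

# z-score normalization for numeric features, one-hot for categorical.
pre = ColumnTransformer([
    ("num", StandardScaler(), ["GPA_lag", "Studyhrperweek"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Gender"]),
])
model = Pipeline([("pre", pre), ("svr", SVR())])

# Stages 1-2: 5-fold cross-validation combined with an exhaustive grid
# search over the SVR hyperparameters of Table 2.
grid = GridSearchCV(
    model,
    param_grid={"svr__C": [1, 5, 10], "svr__gamma": ["scale", "auto"]},
    cv=KFold(n_splits=5, shuffle=True, random_state=42),
    scoring="neg_root_mean_squared_error",
)
grid.fit(X, y)
print(grid.best_params_)
```

Because the scaler and encoder sit inside the pipeline, they are refit on each training fold, so no information from the validation fold leaks into the preprocessing.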
3. Model Training: The training procedure is repeated five times, once for each subset, resulting in five models for each algorithm.

4. Model Testing: Finally, the saved models are tested on the held-out fold from each iteration. Performance is evaluated using three metrics: the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE). These metrics provide a comprehensive view of each model's predictive capability and enable a thorough comparison of the five algorithms.

R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}    (1)

\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}}    (2)

\mathrm{MAE} = \frac{\sum_{i=1}^{n} |y_i - \hat{y}_i|}{n}    (3)

where y_i is the observed value, \hat{y}_i the predicted value, \bar{y} the mean of the observed data, and n the total number of observations.

3. Random Forest Regression (RF): RF, an ensemble technique, constructs an array of decision trees and combines their predictions (Breiman, 2001a). By instilling randomness in tree construction and considering only a subset of features at each split, RF encourages diversity among trees, mitigating variance.

4. Gradient Boosting Regressor (GBR): GBR is a potent ensemble learning algorithm that sequentially builds multiple weak learners, predominantly decision trees (Friedman, 2001). Each subsequent tree corrects the errors of its predecessor, incrementally improving the model's precision. By combining the forecasts of these individual trees, GBR yields a robust predictive model that captures intricate relationships among variables.

5. Extreme Gradient Boosting (XGBoost): XGBoost is an evolved variant of gradient boosting that embeds regularization techniques to deter overfitting and improve generalization (Chen and Guestrin, 2016). It refines the model through gradient-based optimization and approximate tree learning, yielding an exceptionally efficient and precise predictor.

C. Feature Importance
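As a quick illustration, the metrics of Eqs. (1)-(3) can be computed directly with NumPy; the observed and predicted GPA vectors below are made-up numbers, not results from this study.

```python
import numpy as np

# Illustrative observed and predicted GPAs.
y = np.array([4.0, 4.2, 3.8, 4.5])
y_hat = np.array([3.9, 4.3, 3.9, 4.4])

# Eq. (1): share of variance around the mean explained by the predictions.
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
# Eq. (2): root of the mean squared residual.
rmse = np.sqrt(np.mean((y - y_hat) ** 2))
# Eq. (3): mean absolute residual.
mae = np.mean(np.abs(y - y_hat))

print(round(r2, 3), round(rmse, 3), round(mae, 3))
```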
On the training data, XGBoost exhibited the most promising performance, with the lowest MAE of 0.16 and RMSE of 0.22; its R² of 0.88 was notably higher than that of the other models, suggesting a superior fit to the training data. On the testing data, the models generally showed closely matched results; however, Support Vector Regression (SVR) stands out with an MAE of 0.22, RMSE of 0.30, and R² of 0.77, making it the best-performing model on the testing dataset. Given SVR's promising results, we further use this model to conduct a feature importance analysis with the Permutation Feature Importance (PFI) method. Figure 1 contrasts, in five plots, the actual versus predicted GPA outcomes on the testing data for each of the five machine learning algorithms.

B. Feature Importance

Feature importance illustrates the extent to which a particular variable aids the model's predictions, defining the relative utility of each feature in the predictive process. Figure 2 shows two PFI plots that delineate the vital elements in our academic performance analysis. The left panel ranks the features by the change in RMSE loss before and after permutation of each feature; the right panel portrays feature importance through the respective RMSE following each feature's permutation.

[Figure 2: Permutation feature importance for the SVR model. Left panel (ratio): GPA_lag 2.703, TotalmarksGrade11 1.075, Gender_male 1.037, Studyhrperweek 1.034, Ethnicity 1.024, Residential 1.023, LocationofBEHS 1.017, ownethpct 1.015, Religion 1.006, full model 1.0. Right panel (raw RMSE): GPA_lag 0.769, TotalmarksGrade11 0.305, Gender_male 0.295, Studyhrperweek 0.293, Ethnicity 0.291, Residential 0.290, LocationofBEHS 0.289, ownethpct 0.288, Religion 0.285, full model 0.284.]
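The PFI procedure behind Figure 2 can be sketched as follows: permute one feature at a time and record how the RMSE changes. The synthetic data and feature names below are stand-ins for the fitted SVR and the real features, so the printed ratios are illustrative only.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # stand-ins for GPA_lag, marks, study hours
y = 0.8 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(0, 0.1, size=200)

model = SVR().fit(X, y)
base_rmse = mean_squared_error(y, model.predict(X)) ** 0.5

# Permute one column at a time; the rise in RMSE measures how much the
# model relied on that feature. The "ratio" form divides by the
# full-model RMSE, so an unimportant feature scores close to 1.
ratios = {}
for j, name in enumerate(["GPA_lag", "TotalmarksGrade11", "Studyhrperweek"]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])  # break the feature-target link
    ratios[name] = (mean_squared_error(y, model.predict(Xp)) ** 0.5) / base_rmse
    print(name, round(ratios[name], 2))
```

In practice one would average over several permutation repetitions (as `sklearn.inspection.permutation_importance` does) and permute the held-out test data rather than the training data.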
The results not only confirmed the strength of the SVR algorithm in anticipating student performance but also showcased the admirable predictive performance of the other four algorithms, whose results were nearly parallel to SVR's, denoting generally robust efficacy across the range of algorithms deployed. This good fit against the testing data illustrates a promising avenue for leveraging machine learning techniques in educational settings, affording a tool that can aid the accurate prediction of student academic performance and inform educational strategies.

For a more detailed understanding of the factors influencing student academic performance, future research could employ advanced interpretable machine learning methods. Techniques such as Accumulated Local Effects (ALE) (Apley and Zhu, 2020) and SHapley Additive exPlanations (SHAP) (Lundberg and Lee, 2017) hold promise for deeper insights, facilitating the discovery of intricate patterns and relationships. Leveraging these techniques could foster a more comprehensive understanding and guide the development of strategies aimed at nurturing academic excellence.

ACKNOWLEDGEMENT

We extend our heartfelt gratitude to the University for the Development of the National Races of the Union (UDNR) for granting permission to conduct the survey and collect essential data.

A special acknowledgement is also due to our co-author, Zin Mar Oo. Beyond her contributions as an author, her expertise and mentorship in machine learning techniques were indispensable, bridging the gap in our initial understanding of the subject.

REFERENCES

Abu-Naser, S. S., Zaqout, I. S., Abu Ghosh, M., Atallah, R. R., and Alajrami, E. (2015). Predicting student performance using artificial neural network: In the faculty of engineering and information technology.

Albreiki, B., Zaki, N., and Alashwal, H. (2021). A systematic literature review of students' performance prediction using machine learning techniques. Education Sciences, 11(9):552.

Apley, D. W. and Zhu, J. (2020). Visualizing the effects of predictor variables in black box supervised learning models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82(4):1059–1086.

Breiman, L. (2001a). Random forests. Machine Learning, 45:5–32.

Breiman, L. (2001b). Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statistical Science, 16(3):199–231.

Chen, S. and Ding, Y. (2023). A machine learning approach to predicting academic performance in Pennsylvania's schools. Social Sciences, 12(3):118.

Chen, T. and Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794.

Cortes, C. and Vapnik, V. (1995). Support-vector networks. Machine Learning, 20:273–297.

Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5):1189–1232.

Iyanda, A. R., Ninan, O. D., Ajayi, A. O., and Anyabolu, O. G. (2018). Predicting student academic performance in computer science courses: A comparison of neural network models. International Journal of Modern Education & Computer Science, 10(6).

Kudari, J. M. (2016). Survey on the factors influences the students' academic performance. International Journal of Emerging Research in Management & Technology, 5(6):30–36.

Lundberg, S. M. and Lee, S.-I. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30.

Riegle-Crumb, C. (2006). The path through math: Course sequences and academic performance at the intersection of race-ethnicity and gender. American Journal of Education, 113(1):101–122.

Volwerk, J. J. and Tindal, G. (2012). Documenting student performance: An alternative to the traditional calculation of grade point averages. Journal of College Admission, 216:16–23.

Yakubu, M. N. and Abubakar, A. M. (2022). Applying machine learning approach to predict students' performance in higher educational institutions. Kybernetes, 51(2):916–934.