
Predicting GAT Examination Result in Addis Ababa University from previous dataset using
Machine Learning

THESIS

SUBMITTED TO ADDIS ABABA UNIVERSITY, IN PARTIAL FULFILLMENT OF THE REQUIREMENT FOR THE
MASTER OF SCIENCE IN INFORMATION SYSTEM

BY

MESERET ABIY
Id No. GSE/2809/13

2023

(DEPARTMENT - INFORMATION SYSTEM)


ADDIS ABABA UNIVERSITY
Background
Education
Education is a major driving force of economic, social, and human development; it
is a valuable tool for gaining learning and wisdom (Sternberg, 2003). Education is both the act of
teaching knowledge to others and the act of receiving knowledge from someone else. It creates a
dynamic environment in which change is not a one-time event: change is continuous, and each
innovation is judged in terms of its contribution to overall economic performance. There are three
main types of education, namely formal, informal, and non-formal. Education also develops the
ability to think critically (Johnson & Majewska, n.d.). Critical
thinking increases creativity and improves how we use and manage our time (Husband, n.d.).
Critical thinking involves logic as well as creativity. It may involve inductive and deductive
reasoning, analysis, and problem-solving, as well as creative, innovative, and complex approaches
to the resolution of issues and challenges.

In general, education is a power for change that greatly affects our world in the areas of science,
technology, economics, and culture. The quality of education is essential for the development of a
country. The ultimate goal of education is to help individuals navigate life and contribute to
society as they grow older. Higher education institutions, particularly universities, are among the
most stable and change-resistant social institutions to have existed during the past 500 years.
These institutions have effectively developed and transmitted the store of knowledge from
one generation to another. The relevance of universities can be measured primarily by their
contribution to economic development within a country. Globalization has quickly entered
discourses about schooling. Government and business groups talk about the necessity of schools
meeting the needs of the global economy. Improving the quality of education would improve
quality of life.

One of the places in which concern about the shaping of education has expressed itself is
curriculum planning for universities. According to The World Bank Economic Review, Volume 15,
Issue 3, October 2001, Pages 367–391, the development impact of education has
varied widely across countries and has fallen short of what was hoped, for three possible reasons:

• The institutional/governance environment could have been sufficiently perverse that the
accumulation of educational capital lowered economic growth.
• Marginal returns to education could have fallen rapidly as the supply of educated labor
expanded while demand remained stagnant.
• Educational quality could have been so low that years of schooling created no human
capital.

The extent and mix of these three phenomena vary from country to country.
Introduction

The amount of data generated within the education domain is increasing at an alarming
rate with the help of admission systems and other teaching and learning systems (Umesh Kumar
Pandey & S. Pal, n.d.). However, especially in developing countries like Ethiopia, these data are
used only for simple decision making or for admission purposes. To make the data useful, applying
machine learning to educational data is a key way of improving learning outcomes by mining
and analyzing the large amounts of data collected by learning institutes. According to Archana &
Elangovan (2014), data mining tasks can be classified by the kind of pattern they find into two
categories, predictive and descriptive; within machine learning, educational data mining
(EDM) is therefore useful for both predictive and descriptive modeling.

Many studies are being done to forecast student success. Alongside regression analysis,
the correlations between students' demographic traits, university entrance requirements, scores
on aptitude tests, first-year course performance, and overall performance have been examined. Based
on characteristics of pre-university and university performance, Kabakchieva has created models
to predict student success.

Recently, online student management systems have become widely available, and they include the
Graduate Admission Test (GAT). This implies that student digital data has grown to big-data size,
which invites researchers to make predictions about students' performance by processing
educational data using machine learning techniques. All kinds of information about a student's
demographic data, learning environment, and types of course, which affect the success or
failure of a student in the graduate admission test, can be used for prediction. In addition, the
predictions can become a valuable input to Addis Ababa University for different teaching and
learning purposes.

Although it differs from country to country, students in many nations are required to sit for a
national examination to join their academic institution. Some institutions administer
well-established, globally recognized examinations. For instance, most US colleges and
universities require that all their applicants take one or more standardized tests such as the SAT
(Scholastic Aptitude Test), GRE (Graduate Record Examination), GMAT (Graduate
Management Admission Test), and TOEFL (Test of English as a Foreign Language), while others
administer locally developed examinations to screen students for pursuing their academic career
(Mengesh Nigus, n.d.). Working on the predictive validity of high-stakes tests is not a new trend.
As long as students are screened into undergraduate or graduate courses through admission tests,
studying the influence of these tests on the future achievement of the students is of utmost
importance. The nature of these tests requires that educational experts discover to what
extent admission tests can predict the future academic success of students.

Although there are more students graduating from various higher education institutions'
undergraduate programs, Addis Ababa University's graduate program still has a limited capacity,
so it is necessary to determine which students are best suited and most likely to succeed in these
institutions' postgraduate programs. It is also thought that enrolling students who are not
qualified wastes resources and that failing to admit the most qualified applicants has long-term
detrimental effects on a discipline.

Students finishing secondary and preparatory programs are required to take national
examinations, such as the University Entrance Examination (UEE), which is administered by the
National Agency for Examinations (NAE), at the conclusion of each year. The results (EGSECE,
entrance examination results) serve as the sole criterion for admission to the nation's various
universities for that specific year. Depending on the capacity of the universities, the cut-off
scores may change from one year to the next. These measurements are employed in post-secondary
education not for their reliability or ability to predict student success, but rather to
control the number of applications to the various universities in the nation. It is also believed.

“GAT is a computer-based Graduate Admission Test designed for the purpose
of screening candidates who wish to pursue their post graduate studies at the Addis Ababa
University. The test measures the cognitive abilities of candidates with emphasis on high
academic abilities and talents such as understanding, analyzing, synthesizing, problem solving,
critical thinking and the ability to evaluate and use information. More specifically, GAT
measures verbal, quantitative and analytical reasoning abilities”
(http://www.aau.edu.et/blog/announcement-for-applicants-to-the-graduate-admission-test-gat).

Machine learning and data science have over the years proven very efficient and decisive in
many sectors, including education. This paper analyzes the performance of
students of the Information Science department at Addis Ababa University, considering different
factors ranging from personal to environmental. Machine learning is one branch of artificial
intelligence (AI). It can help at-risk students and secure their future results by directing good
learning resources to them, and it can also help improve a university's rank. Finding better
systems of teaching and learning is a critical area within the educational sector. In this study,
the author has selected two classification algorithms, and their parameters, from supervised
learning that relate to student result prediction.

Supervised: Supervised learning has been a great success in real-world applications; in machine
learning it is sometimes also called classification or inductive learning. It is done by training
the machine using data that is well labeled. A supervisor acts as a teacher,
which means the training data is already labeled with the correct answer. After that, the machine
is provided with a new set of examples, and the model that the supervised learning
algorithm produced from the labeled training data predicts an outcome for each of them. There are
different types of algorithm in supervised learning; those the writer used in this paper
are Decision tree and Naïve Bayes.
Decision tree: a supervised learning algorithm whose model is a graphical representation that uses
a flow-chart-like branching tree. These trees are constructed beginning with the root
of the tree and proceeding down to its leaves, and they represent all possible outcomes of a
decision based on certain criteria.

As Charbuty & Abdulazeez (2021) state, a decision tree is one in which each path starting
from the root is described by a sequence of data splits until a Boolean result is reached at the
leaf node. It is a hierarchical representation of knowledge relationships that contains nodes and
links. When relations are used for categorization, nodes represent purposes.

There are different families of machine learning techniques, such as supervised and unsupervised
learning. In this study, the prediction task for the above department was performed with
two single supervised educational data mining algorithms, namely Decision tree and Naïve
Bayes. The performance of these two machine learning algorithms was then compared.
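
The comparison of the two classifiers can be illustrated with a small sketch. The labels and predictions below are made-up placeholders, not results from this study, and the helper name `evaluate` is hypothetical; the sketch only shows how accuracy, precision, and recall for the Pass class could be computed on a held-out test set.

```python
from collections import Counter

def evaluate(y_true, y_pred, positive="Pass"):
    """Return accuracy, precision, and recall for the positive class."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    total = sum(counts.values())
    predicted_pos = counts["tp"] + counts["fp"]
    actual_pos = counts["tp"] + counts["fn"]
    return {
        "accuracy": (counts["tp"] + counts["tn"]) / total,
        "precision": counts["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": counts["tp"] / actual_pos if actual_pos else 0.0,
    }

# Placeholder test labels and predictions (illustrative only, not study data).
y_true = ["Pass", "Pass", "Fail", "Fail", "Pass", "Fail", "Pass", "Fail"]
tree_pred = ["Pass", "Fail", "Fail", "Fail", "Pass", "Pass", "Pass", "Fail"]
bayes_pred = ["Pass", "Pass", "Fail", "Pass", "Pass", "Pass", "Fail", "Fail"]

tree_scores = evaluate(y_true, tree_pred)
bayes_scores = evaluate(y_true, bayes_pred)
print("Decision tree:", tree_scores)
print("Naive Bayes:  ", bayes_scores)
```

Reporting precision and recall next to accuracy is a design choice: when Pass and Fail are unevenly represented, accuracy alone can hide a classifier that rarely predicts the minority class.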

The researcher used the Decision tree and Naïve Bayes algorithms. For the Decision tree algorithm,
ID3 and J48 were applied using the WEKA tool to predict which students are likely to have a
better result in the final graduating year.

Naïve Bayes: a supervised learning algorithm. It assumes that the presence of a particular feature
within a class is independent of the presence of any other feature. Naive Bayes is a simple but
powerful prediction algorithm.
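
The independence assumption can be written out as follows. The notation is introduced here only for illustration: $C$ denotes the class (for example Pass or Fail) and $x_1, \dots, x_n$ are the feature values recorded for one student.

```latex
P(C \mid x_1, \dots, x_n) \;\propto\; P(C) \prod_{i=1}^{n} P(x_i \mid C),
\qquad
\hat{C} = \arg\max_{C} \; P(C) \prod_{i=1}^{n} P(x_i \mid C)
```

The predicted class $\hat{C}$ is the one with the largest product of the class prior and the per-feature conditional probabilities.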

Using Machine Learning on Educational Data


Educators are using ML to identify the main factors about students earlier and to take action to
improve success and reduce failure. ML is helping to improve education, mainly by changing
teaching, learning, and research. The main machine learning applications in education are Adaptive
Learning, Increasing Efficiency, Learning Analytics, Predictive Analytics, Personalized
Learning, and Evaluating Assessments.

To generate datasets for prediction, this research used educational data from
the Information Science department at Addis Ababa University.
The university's Integrated Student Information Management System (ISIMS) is the source of
the data for our study.

These applications support educators in taking precautions in the teaching and learning process
and are useful for predicting low-performing students at an early stage. For the purpose of this
work, the researcher used predictive analytics from the applications above. Predictive analytics
supports conclusions about things that might happen in the future. In this study, machine learning
has been used to create a prediction system that goes beyond conventional graphics and descriptive
statistics. In the study, estimates of student success were made using demographic and social
variables like age, gender, city, and educational attainment of the family. The majority of the
study's variables are factors in which neither the students nor the teachers participating in the
study can intervene. The main objective, however, is to inform teachers and students at the start
of the educational process. In other words, the information gathered from the students at the
start of the exam process is intended to support the students' subsequent educational activities
within graduate-level studies. As a result, preventative measures against potential flaws in the
educational process may be possible in advance. With the help of this data, the university can
keep track of its postgraduate students' development and address academic issues around
research areas before they worsen.

Motivation of the study


My own experience at Addis Ababa University as a staff member working on systems at
the Department of Information and Communication Technology served as the basis for
conducting research in this field. As someone who works actively with Addis Ababa
University's Integrated Student Information Management System (ISIMS), I noticed that
students' statuses differ: sometimes someone who completed the undergraduate program with a good
grade does not achieve an appealing result in the GAT, and the opposite is also
true. The idea is to create a model that can predict students' GAT results before they enter the
graduate program.

Moreover, to the best of my knowledge, Ethiopia has almost no empirical data focusing on this
subject. To address this dearth of empirical evidence, this research intends to
present empirical data on why students who perform well in their undergraduate program may perform
poorly on the university's postgraduate entrance exam, and vice versa. The findings of this study
and the evidence it produces are expected to have far-reaching effects, especially when they are
shared with the relevant parties.

Statement of the Problem


In the information-rich 21st century, a sustainable learning approach is required.
Technology is intertwined with an ever-growing variety of information sources.
Future educational events may benefit from analysis of educational procedures and activities.
The production of data has undergone incredible changes as a result of the digital age.
If raw data is not processed and turned into information, it will not have much value for
consumption, adaptation, sharing, or the transformation of resources and services. Raw
data has been compared to crude oil: although it has potential value, it is of little use in its
raw state. It becomes valuable only when it is processed, and machine learning and
data mining techniques are ways of giving it that value.

The first year in university is a vital transition period, because during this period
students establish the groundwork for their future academic achievement and perseverance. The
majority of research on first-year university students' academic retention and success indicates
that most students, irrespective of their academic background, overcome social, emotional, and
other obstacles, navigate the adjustment period, and succeed in school. Both national and
international research has been undertaken on a variety of elements that may influence students'
academic success at the university level. Goni (2015) discovered a statistically significant
positive correlation between gender and students' CGPA ratings, with males performing better.

Educational organizations are among the key tools for change in
developing countries and play a major role in the growth and development of our society. Addis
Ababa University is the first and leading educational institute in Ethiopia, and the government
expects substantial growth from this institution. Efficient and effective management and
accurate decision-making within these institutions are therefore essential. Evaluating and
monitoring key success indicators is not only essential for the management of the graduate program
office of AAU but is also of critical importance for research success. The problem
is that there is no documented evaluation model or tool for evaluating the likely success of MSc,
MA, and PhD candidates before they join the university. In the selection and admission process, we
have to make sure that the criteria we use are valid, that they help us admit those applicants
with the best chance of success, and that they enable us to eliminate those with the poorest
chance of success. Similarly, it is essential to validate the selection criterion (i.e., the
University Entrance Examination result). No such attempt has been made yet. Therefore, there is a
need to design and develop an evaluation model and tool that can be used by the management of the
colleges. The motivation for this study is thus to develop a methodology for evaluating
the performance of students in Master's and PhD programs. The problem addressed in this thesis is
to determine which factors from the students' backgrounds significantly influence their
GAT exam result. For this, machine learning techniques have been used to identify the degree of
significance. This supports the management in identifying and categorizing, at an early stage,
students who will or will not be successful. The researcher has focused on the personal
backgrounds of the students (male or female; the universities where they obtained their first or
second degree; and whether they come from a social science or natural science background). So far,
no attempts have been made to predict the GAT exam result in Addis Ababa University graduate
programs.

This study describes and examines the factors that influence the prediction of test success for
candidates for graduate admission at AAU. It adds to the analysis of admission criteria in
postgraduate programs and, in the future, may show how those criteria relate to student success.
One distinguishing feature of this research is that it examines factors related to educational
background, i.e., the relationship between students' success and whether or not their backgrounds
are suitable for a particular bachelor's degree or graduate degree. Because the setting differs
(e.g., in pedagogy and the order of the activities), the best-performing models are evaluated.
Finally, we also include observations about the generalization of the solution.

Based on the above background and statement of the problem, the main research question for this
research is:

Which predictor variables from the background of the student are most important in constituting a
model to predict the GAT exam result in Addis Ababa University?
The following sub-research questions need to be investigated to answer the main research
question:

1. What are the components needed to select and design models from the personal background and
educational background of the students?

2. What are the most relevant input variables (feature selection) for predicting the Graduate
Admission Test (GAT) result?

3. Which of the predictor variables is most important in explaining the GAT
result?

Objective of the study


Objectives
General Objectives
The general objective of this research is to develop a classifier model that predicts the
GAT result from the students' educational and personal background.
Specific Objectives
To achieve the general objective, the following specific objectives are formulated:
• Prepare the dataset in order to make it suitable for the selected algorithm.
• Identify the variables that have more influence on the predictive models to forecast
success in the admission test.
• Identify relevant attributes to predict the Graduate Admission Test result.
• Determine the appropriate machine learning algorithm to build the predictive model.
• Develop a classifier model to predict the GAT result.

Scope and Limitation of the study


This study centered on identifying the distinct factors that are related to the
Graduate Admission Test (GAT) in Addis Ababa University. It primarily considered students'
demographic data, which contains age, sex, marital status, previous department, and the number of
years the student stayed in school. The students' cumulative grade (CGPA)
in their undergraduate program was also considered in constructing the predictive model.
These factors were chosen based on an understanding of the problem. This research used
data covering 2008 EC up to 2013 EC. In addition, a classification technique
is used to develop predictive models for predicting the graduate entrance exam result. The
model did not take into account contextual factors such as the social environment, national
policies, economic conditions, and employment prospects.
Another basic limitation of this study is that it excludes the candidates' secondary school
leaving exam results, because the Ethiopian Ministry of Education assessed the exams differently
in each year.

In this study, we used only the data obtained from the Addis Ababa University
Integrated Student Information Management System (ISIMS), specifically the GAT exam database,
which does not include data from other private or public educational institutions. Also, this
study focused only on the graduate program admission test within the school.

Organization of the paper


The thesis is organized into six chapters. The first chapter is an introduction, which
contains the background to the research work, the statement of the problem addressed, the
objective of the research, the scope and significance of the research, and the methodologies
adopted for the study.

The second chapter is a literature review on data mining technology,
the methods and techniques used, and its application within the education sector, including a
theoretical review of machine learning, an empirical review of machine learning, a summary of the
theoretical and empirical reviews, and the conceptual framework.

The third chapter is devoted to giving an understanding of the machine learning analysis
tools, techniques, and algorithms that are applied in the paper. In this chapter, issues related
to data analysis algorithms and how the algorithms underlying the method work for the ISIMS
database, particularly the GAT module, are addressed, and the selection and preparation
of data undertaken in the study is discussed.

Research Methodology
Machine learning is the area of research concerned with using different algorithms to enable
computers to learn and then to perform tasks based on what they have learned. It is used to solve
a specific problem by representing human problem-solving skills and giving the
system these tools. Machine learning builds on previously known knowledge and on
computing skills. Using the information at hand, algorithms improve themselves and
then generate predictions for potential novel situations. There are numerous machine
learning techniques, and their degree of success varies with the data. Therefore, it
would be incorrect to claim that a particular method is appropriate for all data.

This research is based on predicting the graduate admission test result using machine
learning. The implementation follows the widely used process of data understanding, data
preparation, modeling, and evaluation, which helps ensure consistent and reliable results.
Python, which offers a wide range of classification methods for data mining, is used as the
machine learning tool for the research implementation.
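
For the modeling and evaluation stages, a model is usually evaluated on records it was not trained on. The following is a minimal sketch of a holdout split, assuming a 75/25 division and a fixed random seed; the text does not state the evaluation protocol actually used, and the function name is illustrative.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split them into training and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(20))  # stand-ins for 20 student records
train, test = train_test_split(rows)
print(len(train), "training records,", len(test), "test records")
```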

First, the data collected from the university student management database is prepared so that it
can be understood and the relevant parts selected, in order to predict which candidates will
successfully join the postgraduate program. The identified problem is transformed into a data
mining task, namely classifying students into two categories (Pass and Fail) by analyzing the
available student data with the selected machine learning classification algorithms.
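
As a minimal sketch of turning the prediction problem into a two-class task, raw GAT scores can be mapped to Pass and Fail labels. The cut-off value of 50 and the record fields below are assumptions made only for illustration; the text does not state the actual pass mark or the database schema.

```python
PASS_MARK = 50  # hypothetical cut-off; the real pass mark is not given in the text

def label_result(score, pass_mark=PASS_MARK):
    """Map a numeric GAT score to the class label used for classification."""
    return "Pass" if score >= pass_mark else "Fail"

# Made-up example records (field names are illustrative, not the ISIMS schema).
records = [
    {"sex": "F", "cgpa": 3.4, "gat_score": 62},
    {"sex": "M", "cgpa": 2.9, "gat_score": 47},
    {"sex": "F", "cgpa": 3.1, "gat_score": 50},
]

labeled = [{**r, "result": label_result(r["gat_score"])} for r in records]
labels = [r["result"] for r in labeled]
print(labels)
```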

This stage of the study includes data selection and pre-processing,
that is, data preparation, data cleaning, and data testing, because it profoundly influences the
quality of the final results.
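
The cleaning step might look like the following sketch, which drops records with missing values, normalizes text fields, and removes exact duplicates. The field names and sample rows are hypothetical and do not reflect the actual ISIMS schema.

```python
REQUIRED = ("age", "sex", "cgpa")  # illustrative field names

def clean_records(rows):
    """Drop incomplete rows, normalize values, and remove exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        # Skip rows where any required field is missing or blank.
        if any(row.get(f) is None or str(row[f]).strip() == "" for f in REQUIRED):
            continue
        record = {
            "age": int(row["age"]),
            "sex": str(row["sex"]).strip().upper(),
            "cgpa": round(float(row["cgpa"]), 2),
        }
        key = tuple(record.items())
        if key in seen:  # same values already kept once
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned

raw = [
    {"age": "24", "sex": " f ", "cgpa": "3.40"},
    {"age": "24", "sex": "F", "cgpa": "3.4"},   # duplicate after normalization
    {"age": "", "sex": "M", "cgpa": "2.9"},     # missing age
    {"age": "27", "sex": "m", "cgpa": "3.05"},
]
cleaned = clean_records(raw)
print(cleaned)
```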

Chapter two
Literature review
2 – I Theoretical Literature Review
The University College of Addis Ababa, the country of Ethiopia's first higher education
institution, was founded in 1950. Little progress was made in the following 50 years despite the
necessity for the nation to increase the size of the higher education sector. There were only two
public universities and sixteen associated and autonomous junior colleges in the nation in 1995,
for instance. In addition to the three higher education institutions that are under different Federal
government entities and the eight teacher training colleges under the Regional Governments,
several more universities were added as a result of the government's decentralization effort to
expand the higher education system in regional states, bringing the total number of universities
to nine (Yizengaw, 2007).

The results of college entrance tests, whether positive or negative, have an indirect impact on
people's lives because they determine how many years of education a person is able to
obtain. Several studies have demonstrated that a person's educational level, or the highest
level of education they have attained, affects their health (Groot & Maassen van den Brink,
2007; Seiglie et al., 2020).

Many studies have been conducted to assess the validity and reliability of college entrance
examinations, and it has been found that sociodemographic factors affect the exams' final
scores. Because local literature indicates that educational attainment has an impact
on Colombians' health levels, the study of that country's specific example is pertinent (Lucumí,
Gomez, Brownson, & Parra, 2015).
The studies reported by Ömer Faruk Akmeşe show that students who come from large cities succeed
more than those who come from smaller towns. More women than men achieve success.
Students whose father works in government are more successful than other students.
Student achievement rises along with family income. In general,
student achievement also tends to rise with the father's educational attainment. The algorithm
with the best predictive capability was random forest.

According to Coady and Dizioli (2018), education increases life expectancy and decreases
inequality. As Laura Melissa Cruz Castro, Felipe Ortiz, and Diego Lemus state, it is
important to look into sociodemographic factors and how they affect Colombian college
entrance exam scores; their investigation finds that socioeconomic characteristics can predict
the test results of Saber 11.

2.1. Overview Machine Learning

The fast development of digitization has given us large volumes of data in every field. Having so
much data is worthwhile only if we know how to use it. Data mining aims to extract
information from data using different machine learning procedures. With data mining, it becomes
possible to establish the relationships within the data and to make accurate predictions at the
end of the analysis. One of the application areas of machine learning is education.
Data mining in education is the field that allows us to make predictions about the future
by analyzing the data obtained so far in the field of education using different
machine learning algorithms.

Data mining can be used in the educational field to improve our understanding of learning and
teaching up to the final results of the students, by extracting and evaluating factors related to
student results, as described by Alaa el-Halees. Data mining in the educational environment is
called Educational Data Mining.

Three main methods of machine learning are classification, clustering, and
association rule mining. The method to use differs according to the field of study and the
nature of the available data. In this study, the major classification algorithms (decision tree
and naive Bayes) were employed on the educational datasets to predict the final results of
students.

According to Yadav & Pal (2012), the power of machine learning is its ability to analyze
significant nonlinear relationships, given that significant input variables are expected.

Technique for Predictive Analysis


It is common to refer to a variety of statistical, modeling, machine learning, and data mining
approaches as "prediction analysis techniques" when they are used to examine historical and
current data in order to forecast upcoming or unknown occurrences.

Unlike descriptive analysis, prediction analysis is concerned with defining specific variables that
need to be predicted, identifying explanatory variables for those variables, and examining the
connections between those variables and the explanatory variables. Statistics and machine
learning are two categories of prediction analytic approaches.

Linear regression analysis, logistic regression analysis, and time series analysis are examples of
statistical techniques, while decision trees, neural networks, genetic algorithms, and Naive Bayes
are examples of machine learning techniques.

Naive Bayes is a simple probabilistic classifier based on Bayes' theorem with the (naive)
independence assumption. Krishnaiah et al. (2013) wrote a research paper on a strategy
to classify educational capability using the Naïve Bayes classification method. It was found
that the Naïve Bayes classification algorithm performs well when the variables are non-numerical.
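
To illustrate how Naive Bayes handles non-numerical variables, the following is a minimal sketch of a categorical Naive Bayes classifier with Laplace (add-one) smoothing. It is not the implementation used in the cited paper or in this study; the class name, feature names, and toy records are invented for illustration.

```python
import math
from collections import Counter, defaultdict

class CategoricalNaiveBayes:
    """Naive Bayes for categorical features with Laplace (add-one) smoothing."""

    def fit(self, X, y):
        self.class_counts = Counter(y)
        n_features = len(X[0])
        # value_counts[label][feature_index][value] -> number of training rows
        self.value_counts = defaultdict(lambda: defaultdict(Counter))
        self.feature_values = [set() for _ in range(n_features)]
        for row, label in zip(X, y):
            for i, value in enumerate(row):
                self.value_counts[label][i][value] += 1
                self.feature_values[i].add(value)
        return self

    def log_posteriors(self, row):
        """Unnormalized log P(class) + sum_i log P(x_i | class)."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / total)
            for i, value in enumerate(row):
                seen = self.value_counts[label][i][value]
                n_values = len(self.feature_values[i])
                score += math.log((seen + 1) / (count + n_values))
            scores[label] = score
        return scores

    def predict(self, row):
        scores = self.log_posteriors(row)
        return max(scores, key=scores.get)

# Toy data: (previous field, CGPA band) -> GAT result. All values are made up.
X = [
    ("natural", "high"), ("natural", "high"), ("natural", "low"),
    ("social", "high"), ("social", "low"), ("social", "low"),
]
y = ["Pass", "Pass", "Pass", "Fail", "Fail", "Fail"]

model = CategoricalNaiveBayes().fit(X, y)
prediction = model.predict(("natural", "high"))
print(prediction)
```

Working with log-probabilities avoids numerical underflow when many features are multiplied, and the smoothing keeps a feature value that never co-occurred with a class from forcing that class's probability to zero.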

According to Joo Kil-hong et al. (25 September 2020), data mining techniques like decision trees
are used to categorize data, and they have the advantage of being directly applicable to
decision-making because the results are expressed in understandable tree structures. Decision
trees are currently used effectively in South Korea in the fields of artificial intelligence, big
data, and scenario design for decision support systems in the context of smart cities. Because
constructing an optimal decision tree is an NP-complete problem, the ID3 decision tree algorithm
offers a practical alternative: it uses entropy and information gain values to
classify data and builds trees greedily, based on the purity of the resulting
nodes.
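
The entropy and information gain computation at the heart of ID3 can be sketched as follows. This shows only the greedy choice of one split, not a full tree builder, and the feature names and records are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Reduction in entropy obtained by splitting on one categorical feature."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[feature]].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_split(rows, labels):
    """Greedy ID3 step: pick the feature with the highest information gain."""
    return max(rows[0].keys(), key=lambda f: information_gain(rows, labels, f))

# Made-up records: background separates the classes perfectly, sex does not.
rows = [
    {"background": "natural", "sex": "F"},
    {"background": "natural", "sex": "M"},
    {"background": "social", "sex": "F"},
    {"background": "social", "sex": "M"},
]
labels = ["Pass", "Pass", "Fail", "Fail"]
root_feature = best_split(rows, labels)
print(root_feature)
```

ID3 would apply `best_split` recursively to each resulting subset until the nodes are pure or no features remain.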

According to the study by Mohammad, F. M., & Aimal, K., et al., two
public databases from different sources were used for a comparison test of machine learning
models. Using their selection and rejection criteria, they chose 20 papers from the period 2012 to
2019, and after careful examination they obtained 11 machine learning models that
researchers had employed in their experiments. Because it was not known which model was best, they
chose two public databases and, after conducting data mining on them, applied the 11 machine
learning models to find out. The results show that "Decision Tree" and "Random Forest" are
the top two machine learning models out of the 11 based on accuracy.

2.2. Application of Machine Learning in education


Various hypotheses have been proposed and investigations conducted to clarify how
student background relates to the prediction of graduate admission test results in different
universities and colleges. The literature covers a wide assortment of such theories
and ideas. These concepts and subtopics give an outline of students' result prediction for the AAU
graduate admission test.

Different authors have worked in the area of educational data mining and machine
learning at national and international level. Some of the important studies are as follows.

In order to forecast graduate admissions, Acharya et al. created four machine learning regression
models: linear regression, support vector regression, decision tree, and random forest. Roa et al.
created a web application that used an applicant's scores and personal information to predict
potential college admissions.

In order to help students select the best university, Ghai, B. created an American Graduate
Admission Prediction model that predicts whether or not they will be admitted to an institution.

In order to anticipate graduate admissions in the USA, Gupta et al. created a machine learning
decision support system that took into consideration standardized test results, GPA, and Institute
Reputation.

Predictive data mining methods were used by Kaur et al. to categorize and identify slow learners
among students. A thorough assessment of the literature helped identify the factors that
influence student achievement. Both parameters were used as input variables. For the datasets of
high school students, four classification algorithms (MLP, Naive Bayes, SMO and J48) were
used. With 75% accuracy, MLP was found to perform better than the other classifiers. The
researchers demonstrated that children who had access to a computer and the internet at home
performed better on assessments.

In another study, using the case of BSMRSTU in Bangladesh, the authors seek to ascertain the
likelihood that a student will be admitted to a university. Three machine learning techniques
that are more efficient than the alternatives are used for this purpose.

Machine learning has also been used in MOOCs to predict performance. One study uses students' work
on assignments, together with their participation in the forums and the percentage of
peer-assessment tasks they completed during the first week, to build two logistic regression
models that forecast whether or not students will succeed and receive completion certificates,
with accuracies of 79.6 percent and 92.6 percent, respectively.

Nguyen Thai Nghe, Paul Janecek, and Peter Haddawy made a number of significant contributions
through their study. First, their findings give insight into how data mining tools are used to
analyze real-world data sets, including techniques for enhancing prediction accuracy. Second,
based on the unmodified implementations provided by the Weka open source data-mining tool, the
results from their case studies demonstrate that the decision tree approach was substantially
more accurate than the Bayesian network technique for predicting student achievement.
2 – II Empirical Literature Review
Lau, Sun and Yang (2019) explore student performance through modeling and analysis of data
gathered from a single Chinese university. The ANN model is presented as a key quality assessment
tool that assesses students' performance across universities, eliminating inequities and
subsequently improving the quality of education. Compared with the other learning models used,
the ANN performed better. Additionally, a back propagation (BP)-based ANN was used to evaluate
the effectiveness of the teaching system, and its performance satisfied the system's standards
for precision and practicality. Low prediction accuracies were found, however, despite the models
correctly classifying the students at various stages of the prediction process according to their
academic grades. Another study effectively used ANN to predict students' mood during a
self-assessment online test, with a prediction accuracy of over 80%. A similar strategy using
BP-based ANN at three separate colleges concentrated on predicting academic performance in
mathematics courses, with a prediction accuracy of 93.02%.

Abeer Badr El Din Ahmed and Ibrahim Sayed Elaraby conducted research using data collected from
the student database of an educational institution, sampled from the department of information
systems for the sessions from 2005 to 2010. The data initially comprised 154 records. The decision
tree method was employed to forecast students' performance from this database. To forecast a
student's final grade, they used a few attributes drawn from the database. This helps to identify
the students who require extra attention in order to lower the failure rate, and to take the
proper action at the appropriate time.

Alaa Khalaf Hamoud, Aqeel Majeed Humadi, Wid Akeel Awadh, and Ali Salah Hashim conducted a
study presenting a model for predicting student success that uses Bayes algorithms and
recommends the most effective algorithm based on performance information. In this model, the
responses to a student questionnaire were used with two Bayes algorithms. The questionnaire
has 62 questions that address the topics most important to student performance, such as
relationships, social interaction, academic achievement, and general health. The survey was
created using Google Forms and open-source software (LimeSurvey), and 161 students responded in
total. The model design was divided into two stages. Overall, the naive Bayes algorithm was
chosen as the best option for predicting students' achievement.
Chapter Three
METHODOLOGY
Case study and data collection
This section explains the procedures we used to compile and examine the data from the
postgraduate applications. We talk about the machine learning algorithms we decided on before
starting the challenging process of getting the data ready for analysis. Then, we demonstrate our
model for predicting the results of graduate entrance tests and how we adjusted the settings of the
prediction algorithms to enhance our preliminary findings.

The study was carried out at Addis Ababa University and concerns the Graduate Admission Test
(GAT), which is prepared and administered by the Institute of Educational Research (IER) of Addis
Ababa University. The entrance exam is composed of three sections: Verbal Reasoning (60
questions), Quantitative Reasoning (40 questions) and Analytical Reasoning (25 questions)
(http://www.aau.edu.et/blog/announcement-_-graduate-admission-test-gat/). The test was introduced
in September 2013 E.C. The first sitting was taken manually on paper, and subsequent sittings
have been computer based. The target users are candidates who want to join a postgraduate
program; they may come from any university and any program of study, provided they hold at least
a first degree. A total of 00000 students accessed the test. Before starting the actual exam,
candidates are required to fill in a brief survey questionnaire prepared by the office. It
contains questions about their personal, educational and family backgrounds.

A variety of machine learning classification techniques were trained on the datasets, including
ANN, Naive Bayes (NB), Support Vector Machines (SVM), K Nearest Neighbor (KNN), and
Linear Discriminant Analysis (LDA). The aim was to assess which method would perform
better in terms of producing a fair prediction accuracy for students' exam results from the
given datasets. Because of their effectiveness in dealing with large datasets and in providing
more accurate results, decision trees and naive Bayes were chosen. These two classification
techniques are described briefly below.

For the analysis of data, two types of sources were used, primary and secondary. The primary
source is the student data from the Integrated Student Information Management System (ISIMS)
database of Addis Ababa University, particularly from the GAT module within the system. The
secondary source consists of the self-reported results of IER staff about the admission test and
some documentation about the exam found in the office. A clear limitation of the study is the
limited number of students who completed the survey correctly according to the given
instructions. The following steps are part of the research process:

I. Data Selection
II. Data Transformation
III. Implementation of Naive Bayes and decision tree algorithms
IV. Selecting the best result from the two algorithms
V. Classification

Classification Methods
Decision Tree

In recent literature, C4.5 decision trees (Quinlan, 1993), an extremely popular classification
technique, have been employed numerous times to predict student retention, including in Yadav,
Bharadwaj, and Pal (2012), Nandeshwar, Menzies, and Nelson (2011), Laura (2012), and Lin
(2012). This approach builds a tree structure by splitting each node on the basis of the
information gain of each attribute of the dataset with respect to the class. At each level, the
attribute with the greatest information gain is selected as the split criterion.
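
The split selection described above can be sketched as follows. This is a minimal illustration using plain information gain, as in the description; C4.5 itself normalizes this quantity into a gain ratio. The records and the attribute names `field`, `employed` and `result` are hypothetical and are not drawn from the GAT dataset.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` after splitting `rows` on `attribute`."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def best_split(rows, attributes, target):
    """Return the attribute with the greatest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy records, for illustration only.
applicants = [
    {"field": "science", "employed": "yes", "result": "pass"},
    {"field": "science", "employed": "no", "result": "pass"},
    {"field": "arts", "employed": "yes", "result": "fail"},
    {"field": "arts", "employed": "no", "result": "fail"},
]

chosen = best_split(applicants, ["field", "employed"], "result")
```

In this toy data `field` separates the two classes perfectly while `employed` carries no information, so `field` is the attribute selected for the first split.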

Decision trees are commonly used to organize data for the purpose of making decisions.
Construction starts at the root node of the tree; from there, each node is split repeatedly in
accordance with the decision tree learning process. The end result is a decision tree in which
each branch represents a possible decision or event and its outcome.

Naive Bayes

According to the study by Alaa Khalaf Hamoud, Aqeel Majeed Humadi, Wid Akeel Awadh, and Ali
Salah Hashim, the simplest type of Bayesian network is the naive Bayes, in which all attributes
are independent of one another given the value of the class variable. This property is called
"conditional independence". The conditional independence assumption is rarely accurate in the
majority of real-world situations. Extending the structure of naive Bayes to explicitly describe
the dependencies between attributes is a straightforward way to get around this problem. An
extended naive Bayes is known as an augmented naive Bayesian network if the class node points
directly to all attribute nodes and there are links between attribute nodes.
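
The conditional independence assumption means the score for a class is the class prior multiplied by one per-attribute probability for each attribute. The sketch below illustrates this for categorical attributes. The add-one (Laplace) smoothing is an assumption added here so that unseen values do not produce zero probabilities, and the toy records are hypothetical rather than taken from the study's data.

```python
from collections import Counter, defaultdict
from math import log


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    values_seen = defaultdict(set)       # attribute index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def predict(model, row):
    """Return the class with the highest log posterior score for `row`."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = log(count / total)  # log prior of the class
        for i, value in enumerate(row):
            # Each attribute contributes independently given the class.
            numerator = value_counts[(i, label)][value] + 1   # add-one smoothing
            denominator = count + len(values_seen[i])
            score += log(numerator / denominator)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy records: (field of study, currently employed).
rows = [("science", "yes"), ("science", "no"), ("arts", "yes"),
        ("arts", "no"), ("science", "yes")]
labels = ["pass", "pass", "fail", "fail", "pass"]

model = train_naive_bayes(rows, labels)
```

Calling `predict(model, ("science", "no"))` on this toy data returns `"pass"`, and `predict(model, ("arts", "yes"))` returns `"fail"`.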

There are numerous distinct prediction algorithms and strategies utilized in knowledge discovery
and data mining. Every technique or method has benefits and drawbacks. In order to confirm and
verify the results with multiple algorithms, this study employs two prediction approaches. On the
basis of accuracy and precision, an optimal result may be chosen. In order to forecast and
classify the entrance exam results for postgraduate candidates, this study will analyze collective
student information from the ISIMS database. By using Bayes algorithms, we also aim to clarify
the many aspects that influence student success and failure rates in relation to other variables in
the data set of applicants. Based on the TP rate, FP rate, accuracy, and recall that the
algorithms produce when applied to the data set, we concentrate on the performance of the two
algorithms (naive Bayes and decision tree) in this study.
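
The evaluation measures named above can be computed from the four confusion-matrix counts, as in the following sketch. It assumes a two-class outcome with a hypothetical positive label `"pass"`, and the example labels are invented for illustration. Note that recall is the same quantity as the TP rate.

```python
def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(actual, predicted, positive="pass"):
    """Compute TP rate (recall), FP rate, accuracy and precision."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1   # predicted positive, actually positive
            else:
                fp += 1   # predicted positive, actually negative
        else:
            if a == positive:
                fn += 1   # predicted negative, actually positive
            else:
                tn += 1   # predicted negative, actually negative
    return {
        "tp_rate": safe_div(tp, tp + fn),    # also called recall
        "fp_rate": safe_div(fp, fp + tn),
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": safe_div(tp, tp + fp),
    }


# Hypothetical labels, for illustration only.
actual = ["pass", "pass", "pass", "fail", "fail", "fail", "pass", "fail"]
predicted = ["pass", "pass", "fail", "fail", "pass", "fail", "pass", "fail"]

metrics = binary_metrics(actual, predicted)
```

Reporting the FP rate alongside accuracy matters when the classes are imbalanced, since a classifier that always predicts the majority class can still reach a high accuracy.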

A Naive Bayesian model, according to Jayaprakash, Balamurugan, and Chandar (2018), is
simple to construct and does not require time-consuming iterative parameter estimation, making
it especially helpful for very large datasets. The Naive Bayesian classifier performs excellently
despite its simplicity and is popular because it frequently outperforms more advanced
classification techniques.
