in a single year without completing high school. Status rate is the proportion of students,
typically between 16 and 24, who have not completed high school and who are not, at a given
point in time, enrolled in a high school program. Event rates typically yield a smaller
statistic than status rates.
In an attempt to identify students at risk for dropout, create targeted interventions, and
implement relevant public policy to improve graduation rates, researchers have sought to
identify factors associated with dropout. While many studies have focused on either student or
environmental variables, we examined both levels concurrently using comparative analysis to
better understand factors associated with dropout.
Comparative analysis of several classification methods (neural networks, decision trees,
logistic regression) was used to develop early models of students who are most likely to drop
out. The dataset of this study comes from Universiti Teknologi Petronas, which is located in
Perak, Malaysia.
1.3 Objectives
The aim of this study is to construct a model of the student dropout pattern at Universiti
Teknologi Petronas (UTP). To ensure that the generated output aligns with the purpose of the
study, three objectives have to be fulfilled:
1. To identify the most significant factors that cause students to drop out.
2. To gain insights into the patterns in these factors between students who drop out and
students who persist.
3. To determine the best model to predict the behavior and preconditions of a student before
dropping out of university.
2.0 LITERATURE REVIEW
2.1 Introduction
Data mining is the process of discovering useful information, hidden patterns, or rules in
large quantities of data. It is also known as knowledge discovery, knowledge extraction,
information discovery, information harvesting, and data analytics. The purpose of this
technique is to discover meaningful and novel knowledge, such as commercially valuable,
exploitable patterns, from data. Once the patterns are found, they can be used to support
decision making (Bharati, n.d.).
In the absence of a comparative framework of learning algorithms, the aim of the study
conducted by Rubén Manrique, Bernardo Pereira Nunes, Olga Marino, Marco Antonio
Casanova, and Terhi Nurmikko-Fuller (2019) was to provide such an analysis, based on a
proposed classification of strategies for predicting dropouts in Higher Education Institutions.
Three different student representations are implemented (namely Global Feature-Based, Local
Feature-Based, and Time Series) in conjunction with the appropriate learning algorithms for
each of them. A description of each approach, as well as its implementation process, is
presented in the paper as a technical contribution.
The experiment is based on a dataset of student information from two degrees, Business
Administration and Architecture, acquired through an automated management system at a
university in Brazil.
The paper’s findings can be summarized as follows: (i) of the three proposed student
representations, the Local Feature-Based one was the most suitable approach for predicting
dropout. In addition to providing high-quality results, Local Feature-Based representations are
simple to build, and the construction of the model is less expensive compared to more
complex ones; (ii) as a conclusion of the results obtained via the Local Feature-Based
representation, dropout can be accurately predicted using the grades of a few core courses, so
there is no need for a complex feature extraction process; (iii) considering the temporal
aspects of the data does not seem to contribute to prediction performance, while it increases
computational costs as the model complexity increases.
to improve the online program retention rate. With student enrollment data and academic
performance information, the researchers were able to build prediction models using the
logistic regression method. In addition, they applied other classifiers, including k-nearest
neighbors (kNN), decision tree, naïve Bayes, support vector machine, and random forest, to
predict dropout for comparison purposes.
2.3.3 Survival Analysis based Framework for Early Prediction of Student Dropouts
Retention of students at colleges and universities has been a concern among educators
for many decades. In the paper by Sattar Ameri, Mahtab J. Fard, Ratna B. Chinnam, and
Chandan K. Reddy (2016), the researchers developed a survival analysis framework for early
prediction of student dropout using the Cox proportional hazards model (Cox). They also
applied the time-dependent Cox model (TD-Cox), which captures time-varying factors and
can leverage that information to provide more accurate predictions of student dropout.
The model in this research utilizes different groups of variables such as demographics,
family background, finances, high school information, college enrollment, and semester-wise
credits.
The proposed framework has the ability to predict not only which students will drop
out but also the semester in which the dropout will occur. This enabled the researchers to
perform proactive interventions in a prioritized manner where limited academic resources are
available. This is critical in the student retention problem because correctly classifying
whether a student is going to drop out is important, but knowing when this is going to happen
is crucial for a focused intervention. The method was evaluated on real student data collected
at Wayne State University. Results show that the proposed Cox-based framework can predict
student dropouts and the semester of dropout with high accuracy and precision compared to
other state-of-the-art methods.
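The Cox model underlying this framework expresses a student's dropout hazard as a baseline hazard scaled by the exponential of a linear combination of covariates. The following is a minimal illustrative sketch of that idea only; the coefficients and covariates are hypothetical and are not the values fitted in the cited paper:

```python
import math

def cox_hazard(baseline_hazard, coefficients, covariates):
    """Cox proportional hazards: h(t | x) = h0(t) * exp(sum(b_i * x_i))."""
    linear_predictor = sum(b * x for b, x in zip(coefficients, covariates))
    return baseline_hazard * math.exp(linear_predictor)

# Hypothetical coefficients for two covariates (e.g., scaled family income,
# semester credits); purely illustrative values.
betas = [-0.4, -0.2]

h_a = cox_hazard(0.05, betas, [1.0, 0.5])   # student A's hazard at some time t
h_b = cox_hazard(0.05, betas, [0.2, 1.5])   # student B's hazard at the same t

# The relative risk between two students cancels the baseline hazard h0(t),
# which is what makes the model "proportional".
relative_risk = h_a / h_b
```

The TD-Cox variant mentioned above additionally lets the covariates change from semester to semester, but the hazard-ratio structure is the same.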
2.4 Data Mining In Education
A study in Malaysia indicated that data mining techniques are widely used in higher
education systems to increase the effectiveness of traditional methods and to provide a
guideline for improving the decision-making process. Data mining techniques were used to
analyze the existing work, identify existing gaps, and plan future works (Beikzadeh &
Delavari, 2004).
In the educational sector, data mining is often defined as the process of converting raw
data into useful information in order to extract crucial insights. Data mining is an analytic
approach that capitalizes on advances in technology and the extreme richness of data in
higher education to improve research and decision making by uncovering hidden trends and
patterns that lead to predictive modelling.
When data mining systems were compared with other education systems, the
comparison emphasized the role of the expert in interpreting the findings obtained from
analyzing the data retrieved from the course. The results show that data mining systems help
improve exercises, course scheduling, and the identification of potential dropouts at an early
phase (Hamalainen, 2004).
2.5 Summary
In this research, we will focus on developing a classification model similar to those
covered in Section 2.3. As a whole, three data mining techniques will be used: decision tree,
logistic regression, and neural network. The decision tree models for our study are separated
into two parts, with maximum branches of 2 and 3, to gain further insights. The benchmark of
the study is to acquire the best predictive model, with the lowest misclassification rate, to
identify the underlying factors of dropout among students.
3.0 METHODOLOGY
3.1 Process Flow
Data mining analysis involves a series of processes, and it is essential to follow a quality
process so that the analysis is conducted in a consistent manner. There are several approaches
to carrying out data mining, such as CRISP-DM (Cross-Industry Standard Process for Data
Mining) and SEMMA. In this study, SEMMA, which stands for Sample, Explore, Modify,
Model, and Assess, is applied to conduct the data mining analysis of student dropout from
Universiti Teknologi Petronas using the data mining software SAS Enterprise Miner
Workstation 15.1. SEMMA is a well-known data mining methodology developed by the SAS
Institute. A pictorial representation of the SEMMA process flow is summarized in Figure 1
below.
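The five SEMMA phases run in sequence, with the output of each phase feeding the next. A minimal structural sketch (the step bodies are placeholders, not the actual SAS Enterprise Miner operations):

```python
def semma_pipeline(raw_data):
    """Sketch of the SEMMA phases as a sequence of steps (placeholders only)."""
    sample = raw_data[:]                          # Sample: select a workable subset
    explored = {"n": len(sample)}                 # Explore: summarize distributions
    modified = [row for row in sample if row is not None]  # Modify: clean / impute
    model = {"kind": "classifier",                # Model: fit candidate models
             "n_train": len(modified)}
    assessed = {"misclassification_rate": None}   # Assess: compare the models
    return model, assessed

model, assessed = semma_pipeline([1, None, 2, 3])
```

The point of the sketch is only the ordering: assessment criteria are chosen last, against models built on the modified (cleaned) sample.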
3.2.1 Sample
Historical data on a total of 7606 students (7606 observations), for the study of student
dropout from Universiti Teknologi Petronas, is collected. Figure 2 and Figure 3 show an
overview of the historical data. Based on Figure 4, the dataset consists of a total of 15
variables: BIL, UMUR, JANTINA, PROGRAM, NEGERI,
KATEGORI_KAWASAN_TINGGAL, PENDAPATAN_KELUARGA,
KUMPULAN_PENDAPATAN_KELUARGA, BIL_TANGGUNGAN, TAJAAN,
KELAS_GRADE_PELAJAR, KELAS_GRADE_SPM, JENIS_SEKOLAH, STATUS, and
STATUS_NEW.
Figure 4. Sample statistics of data imported
These 15 variables can be categorized into three main model roles: input attribute,
output attribute (target), and ID. Of the 15 variables, those categorized as input attributes are
UMUR, JANTINA, PROGRAM, NEGERI, KATEGORI_KAWASAN_TINGGAL,
PENDAPATAN_KELUARGA, KUMPULAN_PENDAPATAN_KELUARGA,
BIL_TANGGUNGAN, TAJAAN, KELAS_GRADE_PELAJAR, KELAS_GRADE_SPM,
JENIS_SEKOLAH, and STATUS. The output attribute, or target, is STATUS_NEW. This
target variable, Status New, refers to whether a student has dropped out of Universiti
Teknologi Petronas.
In this phase, before data exploration, the main idea is to acquire a related and specific
scope from the large dataset by using an appropriate sampling technique to select a suitable
sample size, so that the process of knowledge discovery or data mining analysis can be sped
up. However, in this study, all 7606 observations in the dataset are considered and included in
the data mining analysis, since the dataset is not very large. The variables, along with their
model role, measurement level, and description, are shown in the table below.
Variable Name | Model Role | Measurement Level | Description
BIL | Rejected | Interval | Number of students.
UMUR | Input | Interval | Age of the student, measured in years.
JANTINA | Rejected | Binary | Gender of the student, either Male or Female.
PROGRAM | Input | Nominal | Program the student is taking.
NEGERI | Input | Nominal | State in which the student lives.
KATEGORI_KAWASAN_TINGGAL | Input | Interval | Category of the student's living area.
PENDAPATAN_KELUARGA | Rejected | Interval | Income of the student's family, measured in RM.
KUMPULAN_PENDAPATAN_KELUARGA | Input | Ordinal | Category of the student's family income.
BIL_TANGGUNGAN | Input | Interval | Family burden.
TAJAAN | Input | Nominal | Sponsorship of the student.
KELAS_GRADE_PELAJAR | Input | Ordinal | Student's grade.
KELAS_GRADE_SPM | Input | Ordinal | Student's SPM grade.
JENIS_SEKOLAH | Input | Nominal | Type of school.
STATUS | Rejected | Nominal | Student's status.
STATUS_NEW | Target | Binary | Student's status.
Table 2. The model role, measurement level, and description of the variables.
Figure 5. Bil_Tanggungan
Based on Figure 5, the first bar, which consists of 213 observations, is considered
missing data. The highest frequency, 3197, falls in the range between 3.9 and 5.2. The lowest
frequency, 11, falls in the range between 11.7 and 13.
Figure 6. Jantina
Figure 6 above refers to the gender variable, which consists of male and female. From
the figure, it is obvious that there are more female students than male students: 4242 female
students versus 3364 male students.
Figure 7. Jenis Sekolah
Figure 7 above shows that the school type SK (Sekolah Kebangsaan) has the highest
frequency, with 4551 students, while the lowest are SB (Sekolah Berasrama) and SBK
(Sekolah Bantuan Kebangsaan), which have only one student each. There are also 2 missing
values in the variable Jenis Sekolah.
Figure 8. Kategori_Kawasan_Tinggal
Figure 8 shows that 3707 students stay in the Bandar and Pinggir Bandar areas, 3479
students stay in Luar Bandar areas, and a few students stay in Luar Negara.
Figure 9. Kelas_Grade_Pelajar
The figure above shows that the dataset has 2887 students who obtained good grades.
It also has 786 students who failed based on their class performance.
Figure 11. Kumpulan_Pendapatan_Keluarga
The figure above shows that 5591 students' families have an income below RM40,000.
Figure 13. Pendapatan_Keluarga
Figure 13 shows the family income; most families have an income between RM0 and
RM34355. There are also 240 missing values.
Figure 15. STATUS_NEW
Figure 15 shows whether students continue their studies or drop out. From the chart
above, 2319 students dropped out of the university, while 5287 students continue their studies.
Figure 17. Tajaan
The figure above shows that most of the students, 5901, get their sponsorship from
PTPTN. There are also 1052 students who did not get any sponsorship.
3.2.2 Explore
3.2.2.1 Stat Explore
To further explore the given dataset, the Stat Explore node was executed. The variable
worth plot orders the input variables by their worth in predicting the target variable. The
results obtained show that Kelas_Grade_Pelajar ranks the highest, followed by Tajaan and
Kelas_Grade_SPM. These three variables are the most important factors affecting whether a
student will drop out of university or be retained. According to the output report, none of the
variables has a relatively large standard deviation.
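SAS Enterprise Miner derives variable worth from how much a single decision-tree split on that variable reduces impurity at the root. A simplified sketch of that idea, scoring one categorical input against a binary target with a Gini-impurity reduction (the toy data below is illustrative, not the UTP dataset):

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def worth(values, labels):
    """Impurity reduction from splitting the labels by one categorical input."""
    parent = gini(labels)
    total = len(labels)
    weighted = 0.0
    for level in set(values):
        subset = [lab for val, lab in zip(values, labels) if val == level]
        weighted += len(subset) / total * gini(subset)
    return parent - weighted    # larger value = more predictive input

# Toy example: a grade-like input versus a dropout flag (1 = dropout).
grades = ["A", "A", "B", "B", "F", "F"]
dropout = [0, 0, 0, 1, 1, 1]
score = worth(grades, dropout)
```

Computing such a score per input and sorting gives the same kind of ranking the variable worth plot displays.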
3.2.3 Modify
3.2.3.1 Data Partition
The “Data Partition” node in the Sample tab of SAS Enterprise Miner is used for dataset
allocation. The dataset is allocated into two parts: training data and testing data. The training
data is used for model construction, whereas the testing data is used for model evaluation. To
construct the model for the data mining analysis of student dropout from Universiti Teknologi
Petronas, 70% of the dataset is allocated for training, and the remaining 30% is allocated for
testing to evaluate the generated model.
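The 70/30 allocation can be sketched as a seeded random split. This is a simplified version of what the Data Partition node does (the node also supports stratified sampling on the target, which this sketch omits):

```python
import random

def partition(records, train_fraction=0.7, seed=42):
    """Randomly allocate records into training and testing sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)      # reproducible shuffle
    cut = int(len(shuffled) * train_fraction)  # 70% boundary
    return shuffled[:cut], shuffled[cut:]

students = list(range(7606))                   # stand-in for the 7606 observations
train, test = partition(students)
# len(train) == 5324, len(test) == 2282
```

Fixing the seed makes the split reproducible, so every model in the comparison is trained and evaluated on the same partitions.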
Figure 21. Imputation Summary
According to Figure 21, the impute method used for replacing the 148 missing values in
the interval input variable Bil_Tanggungan is the median method, in which the missing values
are replaced with the 50th percentile: the middle value, or the arithmetic mean of the two
middle values, of the set of numbers arranged in ascending order. The impute method used for
replacing the 2 missing values in the nominal input variable Jenis_Sekolah, the 21 missing
values in the nominal input variable Negeri, and the 68 missing values in the ordinal input
variable Kelas_Grade_Value is the count method, in which a missing value is replaced by the
most frequently occurring value of the variable. In this case, a central statistic such as the
median is appropriate for replacing the missing values of the interval input variable
Bil_Tanggungan, since the variable's values are at least approximately symmetric
(Bil_Tanggungan has no skewed distribution), so the median is close to the mean.
Figure 22. Overview on the variable roles and levels
3.2.4 Model
In this study, three types of predictive modelling methods are applied for the data
mining analysis of student dropout from Universiti Teknologi Petronas: Regression, Decision
Tree, and Neural Network. Firstly, for regression, the suitable regression model is logistic
regression, since the target variable STATUS_NEW is a binary variable that classifies or
predicts whether a student will drop out of Universiti Teknologi Petronas.
Several logistic regression models are developed to investigate and determine which
one results in the best and desired outcome. In this study, the logistic regression models are
initially divided into two categories: logistic regression with imputation and without
imputation. The logistic regression with imputation is then divided into another four
categories: logistic regression with all-inputs selection, backward selection, forward selection,
and stepwise selection. Figure 23 illustrates the choices of model selection available in SAS
Enterprise Miner for regression modelling.
Figure 23. Model selection for regression modelling
Based on Figure 23, the “Backward” model selection method refers to variable
selection that starts with all variables in the model and then systematically eliminates the
variables that are not significantly associated with the target, until every remaining variable in
the model meets the significance level of 0.05 (5%).
The “Forward” model selection method refers to variable selection that starts with no
variables in the model and then systematically adds variables that are significantly associated
with the target, until no further variables meet the significance level of 0.05 (5%).
For the “Stepwise” model selection method, variable selection starts with no variables
in the model and then systematically adds variables that are significantly associated with the
target. After a variable is added to the model, it can be removed again if it is deemed no
longer significantly associated with the target.
The “None” model selection method means that all inputs (input variables) are
selected (default selection) and included in the final model when fitting the regression.
Overall, a total of five logistic regression models are built in this study. The overall conceptual
framework for the logistic regression models is summarized in Figure 24 below.
from Universiti Teknologi Petronas is based on two main splitting rules: the maximum branch
and the nominal target criterion.
Maximum branch refers to the maximum number of branches or subsets into which a
node may be split in a decision tree, whereas the nominal target criterion refers to the method
of searching for and evaluating candidate splitting rules in the presence of a nominal target.
The nominal target criterion can be categorized into three main types: Entropy, Gini, and
Classification Error (ProbChisq).
In this study, the decision tree models are first divided into two categories: decision
trees with 2 branches and with 3 branches. Then, for each maximum number of branches, the
decision tree models are divided into three types of nominal target criterion: decision tree
with Entropy as the target criterion, with Gini as the target criterion, and with Classification
Error (ProbChisq) as the target criterion. The overall conceptual framework for the decision
tree models is summarized in Figure 25 below.
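The Entropy and Gini criteria named above both measure how mixed the target classes are at a tree node; a candidate split is preferred when it reduces the chosen impurity the most. A minimal sketch of the two node-impurity measures:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of the class distribution at a node (bits)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index of the class distribution at a node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

# A pure node scores 0 under both criteria; a 50/50 binary node scores
# the maximum (entropy 1.0 bit, Gini 0.5).
mixed = [0, 1, 0, 1]
pure = [1, 1, 1, 1]
```

The ProbChisq criterion instead ranks candidate splits by the p-value of a chi-square test of association between the split and the target, which this sketch does not cover.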
Figure 26. Conceptual framework for neural network models.
3.2.5 Assess
In this study, the misclassification rate is first assessed to compare how well each of
the models constructed under each type of predictive modelling (Logistic Regression,
Decision Tree, and Neural Network) performs, since the target variable STATUS_NEW is a
binary variable. After that, a second comparison is conducted using the “Model Comparison”
node in the Assess tab of SAS Enterprise Miner to determine which type of predictive
modelling method is the best for accurate prediction of student dropout from Universiti
Teknologi Petronas. Figure 27 below illustrates the overview diagram of all the models
constructed.
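The misclassification rate used for this comparison is simply the fraction of observations whose predicted class differs from the actual class. A short sketch with toy labels (1 = dropout, 0 = continue study; not the UTP data):

```python
def misclassification_rate(actual, predicted):
    """Fraction of observations whose predicted class differs from the actual class."""
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)

actual    = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
predicted = [1, 0, 1, 1, 0, 0, 0, 0, 0, 0]
rate = misclassification_rate(actual, predicted)   # 2 errors out of 10 -> 0.2
```

Comparing this rate on the 30% test partition, rather than on the training data, is what makes the comparison a fair estimate of each model's generalization.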
Figure 27. Overview diagram of all models constructed.
4.0 RESULTS AND DISCUSSION
4.1 Data Mining Technique One: Regression
There are five regression analyses in the data mining analysis of the students'
background from one of the universities in Perak for the prediction of student dropout, built
through the Regression node in the Model tab of SAS Enterprise Miner, as shown in Figure 28
below. In this section, the results of the predictive modelling by regression analysis are
illustrated and explained in terms of the score rankings overlay for the target variable, the fit
statistics, and the analysis of maximum likelihood estimates.
Figure 28. Regression node that can be accessed from the Model tab
Figure 29. Scoring rankings overlay of logistic regression without imputation (default
selection) model at depth = 60%
Figure 30. Scoring rankings overlay of logistic regression without imputation (default
selection) model at depth = 100%
Figure 29 and Figure 30 above show the outputs of the score rankings overlay.
According to the score rankings overlay in Figure 29, at depth = 60%, in which around 4564
of the total 7606 observations are included, the logistic regression model predicts that 50.78%
of those 4564 students drop out of the university. According to the score rankings overlay in
Figure 30, at depth = 100%, in which all 7606 observations are included, the model predicts
that 30.48% of the 7606 students drop out of the university. Compared with the actual data, in
which 2319 out of 7606 students (equivalent to 30.49%) dropped out of the university, the
predicted percentage is close to the actual one, so it is concluded that the model is accurate
and therefore useful for prediction.
Figure 31. Fit statistics table for logistic regression without imputation (default selection)
Figure 32. Analysis of maximum likelihood estimates for logistic regression without
imputation (default selection) 1
Figure 33. Analysis of maximum likelihood estimates for logistic regression without
imputation (default selection) 2
Figure 31 shows the fit statistics table of the model. It is observed that the
misclassification rate is 11.48% for the train error and 9.98% for the test error. Figures 32 and
33 show the output of the analysis of maximum likelihood estimates. Based on Figures 32 and
33, it is observed that several variables have significance values less than the significance
level of 5%, which indicates that these variables are significant for predicting whether
students will drop out of the university. These variables are BIL_TANGGUNGAN and
UMUR. It is noticed that the coefficient value for BIL_TANGGUNGAN is greater than the
coefficient value for UMUR, which indicates that the family burden brings more impact on
the prediction of student dropout than the student's age. Therefore, the equation for this
logistic regression, without imputation and transformation and with the model selection
method set as default, is:
p(dropout) = e^(0.1398·BIL_TANGGUNGAN + 0.0687·UMUR − 2.9657) / (1 + e^(0.1398·BIL_TANGGUNGAN + 0.0687·UMUR − 2.9657))
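The fitted equation above can be evaluated directly to score an individual student. A quick sketch using the reported coefficients (the student values plugged in are illustrative, not from the dataset):

```python
import math

def dropout_probability(bil_tanggungan, umur):
    """Logistic regression score using the coefficients reported above."""
    z = 0.1398 * bil_tanggungan + 0.0687 * umur - 2.9657
    return math.exp(z) / (1.0 + math.exp(z))   # equivalently 1 / (1 + e^-z)

# Illustrative student: 4 dependents, age 20.
p = dropout_probability(4, 20)
```

The output is a probability in (0, 1); classifying a student as a dropout then requires choosing a cutoff (commonly 0.5), which is what the misclassification rate is computed against.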
Figure 34. Scoring rankings overlay of logistic regression with imputation (default selection)
model at depth = 60%
Figure 35. Scoring rankings overlay of logistic regression with imputation (default selection)
model at depth = 100%
Figure 34 and Figure 35 show the outputs of the score rankings overlay. According to
the score rankings overlay in Figure 34, at depth = 60%, in which around 4564 of the total
7606 observations are included, the logistic regression model predicts that 50.78% of those
4564 students drop out of the university. According to the score rankings overlay in Figure 35,
at depth = 100%, in which all 7606 observations are included, the model predicts that 30.48%
of the 7606 students drop out of the university. Compared with the actual data, in which 2319
out of 7606 students (equivalent to 30.49%) dropped out of the university, the predicted
percentage is close to the actual one, so it is concluded that the model is accurate and
therefore useful for prediction.
Figure 36. Fit statistics table for logistic regression with imputation (default selection)
Figure 37. Analysis of maximum likelihood estimates for logistic regression with imputation
(default selection) 1
Figure 38. Analysis of maximum likelihood estimates for logistic regression with imputation
(default selection) 2
Figure 36 shows the fit statistics table of the model. It is observed that the
misclassification rate is 8.98% for the train error and 7.18% for the test error. Figures 37 and
38 show the output of the analysis of maximum likelihood estimates. Based on Figures 37 and
38, it is observed that several variables have significance values less than the significance
level of 5%, which indicates that these variables are significant for predicting student dropout
from the university. These variables are KELAS_GRADE_PELAJAR and UMUR. It is
noticed that the coefficient value for UMUR is greater than the coefficient value for
KELAS_GRADE_PELAJAR, which indicates that the students' age brings more impact on the
prediction of student dropout than the students' performance. Therefore, the equation for this
logistic regression, with imputation and with the model selection method set as default, is:
p(dropout) = e^(0.000575·KELAS_GRADE_PELAJAR + 0.0039·UMUR − 3.1062) / (1 + e^(0.000575·KELAS_GRADE_PELAJAR + 0.0039·UMUR − 3.1062))
Figure 39. Scoring rankings overlay of logistic regression with imputation and
transformation (default selection) model at depth = 60%
Figure 40. Scoring rankings overlay of logistic regression with imputation and
transformation (default selection) model at depth = 100%
Figure 39 and Figure 40 above show the outputs of the score rankings overlay.
According to the score rankings overlay in Figure 39, at depth = 60%, in which around 4564
of the total 7606 observations are included, the logistic regression model predicts that 50.78%
of those 4564 students drop out of the university. According to the score rankings overlay in
Figure 40, at depth = 100%, in which all 7606 observations are included, the model predicts
that 30.48% of the 7606 students drop out of the university. Compared with the actual data, in
which 2319 out of 7606 students (equivalent to 30.49%) dropped out of the university, the
predicted percentage is close to the actual one, so it is concluded that the model is accurate
and therefore useful for prediction.
Figure 41. Fit statistics table for logistic regression with imputation and transformation
(default selection)
Figure 42. Analysis of maximum likelihood estimates for logistic regression with imputation
and transformation (default selection) 1
Figure 43. Analysis of maximum likelihood estimates for logistic regression with imputation
and transformation (default selection) 2
Figure 41 shows the fit statistics table of the model. It is observed that the
misclassification rate is 8.98% for the train error and 7.18% for the test error. Figures 42 and
43 show the output of the analysis of maximum likelihood estimates. Based on Figures 42 and
43, it is observed that several variables have significance values less than the significance
level of 5%, which indicates that these variables are significant for predicting student dropout.
These variables are KELAS_GRADE_PELAJAR and UMUR. It is noticed that the coefficient
value for UMUR is greater than the coefficient value for KELAS_GRADE_PELAJAR, which
indicates that the students' age brings more impact on the prediction of student dropout than
the students' performance. Therefore, the equation for this logistic regression, with imputation
and transformation and with the model selection method set as default, is:
p(dropout) = e^(0.000575·KELAS_GRADE_PELAJAR + 0.0039·UMUR − 3.1062) / (1 + e^(0.000575·KELAS_GRADE_PELAJAR + 0.0039·UMUR − 3.1062))
Figure 44. Scoring rankings overlay of logistic regression with imputation and
transformation (backward selection) model at depth = 60%
Figure 45. Scoring rankings overlay of logistic regression with imputation and
transformation (backward selection) model at depth = 100%
Figure 44 and Figure 45 above show the outputs of the score rankings overlay.
According to the score rankings overlay in Figure 44, at depth = 60%, in which around 4564
of the total 7606 observations are included, the logistic regression model predicts that 50.78%
of those 4564 students drop out of the university. According to the score rankings overlay in
Figure 45, at depth = 100%, in which all 7606 observations are included, the model predicts
that 30.48% of the 7606 students drop out of the university. Compared with the actual data, in
which 2319 out of 7606 students (equivalent to 30.49%) dropped out of the university, the
predicted percentage is close to the actual one, so it is concluded that the model is accurate
and therefore useful for prediction.
Figure 46. Fit statistics table for logistic regression with imputation and transformation
(backward selection)
Figure 47. Summary of backward elimination for logistic regression with imputation and
transformation (backward selection) model
Figure 48. Analysis of maximum likelihood estimates for logistic regression with imputation
and transformation (backward selection) model 1
Figure 49. Analysis of maximum likelihood estimates for logistic regression with imputation
and transformation (backward selection) model 2
Figure 46 shows the fit statistics table of the model. It is observed that the
misclassification rate is 8.94% for the train error and 7.22% for the test error. Figures 48 and
49 show the output of the analysis of maximum likelihood estimates. Based on Figures 48 and
49, it is observed that several variables have significance values less than the significance
level of 5%, which indicates that these variables are significant for predicting student dropout
from the university. These variables are IMP_BIL_TANGGUNGAN and
KELAS_GRADE_PELAJAR. The rest of the input variables, with significance values greater
than 0.05 as shown in Figures 48 and 49, are eliminated. It is noticed that the coefficient value
for KELAS_GRADE_PELAJAR is greater than the coefficient value for
IMP_BIL_TANGGUNGAN, which indicates that the students' performance brings more
impact on the prediction of student dropout than the family burden. Therefore, the equation
for this logistic regression, with imputation and transformation and with the model selection
method set as backward, is:
p(dropout) = e^(0.0001·IMP_BIL_TANGGUNGAN + 0.000625·KELAS_GRADE_PELAJAR − 2.8216) / (1 + e^(0.0001·IMP_BIL_TANGGUNGAN + 0.000625·KELAS_GRADE_PELAJAR − 2.8216))
Figure 50. Scoring rankings overlay of logistic regression with imputation and
transformation (forward selection) model at depth = 60%
Figure 51. Scoring rankings overlay of logistic regression with imputation and
transformation (forward selection) model at depth = 100%
Figures 50 and 51 above show the score rankings overlay outputs. According to Figure 50, at
depth = 60%, around 4564 of the total 7606 observations are included, and the logistic
regression model predicts that 50.78% of these 4564 students drop out from the university.
According to Figure 51, at depth = 100%, all 7606 observations are included, and the model
predicts that 30.48% of the 7606 students drop out. In the actual data, 2319 out of 7606
students (30.49%) dropped out, which is close to the percentage predicted by the model, so we
conclude that the model is accurate and useful for prediction.
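The depth statistic used here can be reproduced from any vector of predicted probabilities: rank observations by predicted risk, keep the top fraction, and measure the dropout rate inside that subset. A small illustration with made-up toy data (the numbers below are hypothetical, not from the study):

```python
def dropout_rate_at_depth(probs, labels, depth):
    """Sort by predicted probability (descending), keep the top `depth`
    fraction of observations, and return the share of actual dropouts in it."""
    ranked = sorted(zip(probs, labels), key=lambda pair: -pair[0])
    k = int(len(ranked) * depth)
    kept = ranked[:k]
    return sum(label for _, label in kept) / len(kept)

# Toy data: 1 = dropout, 0 = continues studying.
probs  = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05, 0.01]
labels = [1,   1,   0,   1,   0,   0,   0,   0,   0,    0]

# At depth = 100% the rate equals the overall dropout rate (3/10 here),
# which is exactly how the 30.48% predicted vs 30.49% actual comparison works.
print(dropout_rate_at_depth(probs, labels, 1.0))   # 0.3
print(dropout_rate_at_depth(probs, labels, 0.6))   # 0.5
```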
Figure 52. Fit statistics table for logistic regression with imputation and transformation
(forward selection)
Figure 53. Summary of forward selection for logistic regression with imputation and
transformation (forward selection) model
Figure 54. Analysis of maximum likelihood estimates for logistic regression with imputation
and transformation (forward selection) model 1
Figure 55. Analysis of maximum likelihood estimates for logistic regression with imputation
and transformation (forward selection) model 2
Figure 52 shows the fit statistics table of the model. The misclassification rate is 8.98% for
the train error and 7.18% for the test error. Figures 54 and 55 show the output of the analysis
of maximum likelihood estimates. Based on Figures 54 and 55, several variables have
significance values below the 5% significance level, indicating that they are significant for
predicting student dropout from the university: IMP_BIL_TANGGUNGAN,
KELAS_GRADE_PELAJAR, and UMUR. The remaining input variables, whose significance values
are greater than 0.05 as shown in Figures 54 and 55, are eliminated. The coefficient for UMUR
is greater than the coefficients for IMP_BIL_TANGGUNGAN and KELAS_GRADE_PELAJAR,
which indicates that the students' age has a greater impact on the prediction of dropout than
family burden and student performance. Therefore, the equation for this logistic regression
with imputation and transformation, with the model selection method set to forward, is:
P(dropout) = e^z / (1 + e^z), where
z = 0.0001(IMP_BIL_TANGGUNGAN) + 0.0006(KELAS_GRADE_PELAJAR) + 0.0034(UMUR) − 3.5469
Figure 56. Scoring rankings overlay of logistic regression with imputation and
transformation (stepwise selection) model at depth = 60%
Figure 57. Scoring rankings overlay of logistic regression with imputation and
transformation (stepwise selection) model at depth = 100%
Figures 56 and 57 above show the score rankings overlay outputs. According to Figure 56, at
depth = 60%, around 4564 of the total 7606 observations are included, and the logistic
regression model predicts that 50.78% of these 4564 students drop out from the university.
According to Figure 57, at depth = 100%, all 7606 observations are included, and the model
predicts that 30.48% of the 7606 students drop out. In the actual data, 2319 out of 7606
students (30.49%) dropped out, which is close to the percentage predicted by the model, so we
conclude that the model is accurate and useful for prediction.
Figure 58. Fit statistics table for logistic regression with imputation and transformation
(stepwise selection)
Figure 59. Summary of stepwise selection for logistic regression with imputation and
transformation (stepwise selection) model
Figure 60. Analysis of maximum likelihood estimates for logistic regression with imputation
and transformation (stepwise selection) model 1
Figure 61. Analysis of maximum likelihood estimates for logistic regression with imputation
and transformation (stepwise selection) model 2
Figure 62. Analysis of maximum likelihood estimates for logistic regression with imputation
and transformation (stepwise selection) model 3
Figure 58 shows the fit statistics table of the model. The misclassification rate is 8.94% for
the train error and 7.22% for the test error. Figures 60, 61, and 62 show the output of the
analysis of maximum likelihood estimates. Based on these figures, several variables have
significance values below the 5% significance level, indicating that they are significant for
predicting student dropout from the university: KELAS_GRADE_PELAJAR and
IMP_BIL_TANGGUNGAN. The remaining input variables, whose significance values are greater
than 0.05 as shown in Figures 60 and 61, are eliminated. The coefficient for
KELAS_GRADE_PELAJAR is greater than the coefficient for IMP_BIL_TANGGUNGAN, which
indicates that student performance has a greater impact on the prediction of dropout than
family burden. Therefore, the equation for this logistic regression with imputation and
transformation, with the model selection method set to stepwise, is:
P(dropout) = e^z / (1 + e^z), where
z = 0.0001(IMP_BIL_TANGGUNGAN) + 0.0006(KELAS_GRADE_PELAJAR) − 2.8216
4.1.7 Logistic Regression Models Comparison
Regression Model                         Model Selection Method   Train (%)   Test (%)
Logistic Regression without Imputation   Default                  11.48       9.98
Logistic Regression with Imputation      Default                  8.98        7.18
Logistic Regression with Imputation      Default                  8.98        7.18
and Transformation                       Backward                 8.98        7.22
                                         Forward                  8.98        7.18
                                         Stepwise                 8.94        7.22
Table 3. Comparison between different types of logistic regression models in terms of
misclassification rate.
Table 3 above compares the different types of logistic regression models in terms of
misclassification rate. According to the table, the lowest test error is 7.18%. Therefore, the
logistic regression without imputation (default selection) model and the logistic regression
with imputation (default selection) model are no longer considered when choosing the best
regression model for predicting which students will drop out from Universiti Teknologi
Petronas. Meanwhile, the logistic regression with imputation and transformation models under
the three selection methods, backward, forward, and stepwise, have very similar
misclassification rates.
Figure 63. Decision tree node that can be accessed from the Model tab.
Figure 64. Scoring rankings overlay of decision tree model at depth= 60%
Figure 65. Scoring rankings overlay of decision tree model at depth= 100%
Figures 64 and 65 above show the score rankings overlay outputs for the decision tree model.
The score rankings overlay output is the same for every decision tree regardless of the
maximum number of branches and the nominal target criterion. According to Figure 64, at
depth = 60%, around 4564 of the total 7606 observations are included, and the decision tree
model predicts that 50.78% of these 4564 students drop out from the university. According to
Figure 65, at depth = 100%, all 7606 observations are included, and the model predicts that
30.48% of the 7606 students drop out. In the actual data, 2319 out of 7606 students (30.49%)
dropped out, which is close to the percentage predicted by the model, so we conclude that the
model is accurate and useful for prediction.
4.2.2 Decision Tree: 2 branches with Chi-Square as target criterion
Figure 66. Decision tree diagram of 2 branches with Chi-Square as target criterion
Figure 66 above shows the overall decision tree diagram of 2 branches with Chi-Square as
target criterion. In this decision tree model, the parent (root) node is the variable Tajaan.
There are 19 leaf nodes, also known as terminal nodes, each of which has exactly one incoming
edge and no outgoing edges. The set of rules represented by this decision model is presented
in Appendix 1.
Figure 67. Fit statistics table for decision tree model of 2 branches with Chi-Square as target
criterion.
Figure 67 above shows the fit statistics table for the decision tree model of 2 branches with
Chi-Square as target criterion. Since the target variable is binary, we examine the
misclassification rate, which indicates how many observations were correctly and incorrectly
classified. A large misclassification rate might indicate that the model does not fit the
data. From the table, the misclassification rate is 0.0748, meaning that the error rate for
the model of 2 branches with Chi-Square as target criterion is only 7.48%.
Figure 68. Variable importance of the input variables for the decision tree model of 2 branches
with Chi-Square as target criterion
Figure 68 illustrates the number of splitting rules and the importance of the input variables.
The variable Kelas_Grade_Pelajar has the highest importance, with an importance value of 1,
indicating that it is the most significant variable in this model; its number of splitting
rules is 8. This means the variable is split by 8 rules in this model, shown in Figure 68:
first, less than berhenti, and greater than or equal to berhenti or missing; second, less than
gagal or missing, and greater than or equal to gagal; third, less than sederhana, and greater
than or equal to sederhana or missing; fourth, less than gagal, and greater than or equal to
gagal or missing; fifth, less than berhenti or missing, and greater than or equal to berhenti;
sixth, less than sederhana or missing, and greater than or equal to sederhana; and the seventh
and eighth, which are the same, less than cemerlang, and greater than or equal to sederhana or
missing.
The second most important variable is Tajaan, with an importance value of 0.4515 and 2
splitting rules. The third is Bil_Tanggungan, with an importance value of 0.1230 and 2
splitting rules. The fourth is Kategori_Kawasan_Tinggal, with an importance value of 0.1056
and 3 splitting rules. Next, Umur is the fifth most important variable, with an importance
value of 0.0795 and 2 splitting rules. Lastly, Kumpulan_Pendapatan_Keluarga has an importance
value of 0.0681 and 1 splitting rule. From the table, we can observe that the variables
Program, Kelas_Grade_SPM, Negeri, and Jenis_Sekolah all have 0 splitting rules and an
importance value of 0, which means these variables are not important to this decision model of
2 branches with Chi-Square as target criterion.
Figure 69. Decision tree diagram of 2 branches with Gini as target criterion
Figure 69 above shows the overall decision tree diagram of 2 branches with Gini as target
criterion. In this decision tree model, the parent (root) node is the variable Tajaan. There
are 19 leaf nodes, also known as terminal nodes, each of which has exactly one incoming edge
and no outgoing edges. The set of rules represented by this decision model is presented in
Appendix 2.
Figure 70. Fit statistics table for decision tree model of 2 branches with Gini as target
criterion
Figure 70 above shows the fit statistics table for the decision tree model of 2 branches with
Gini as target criterion. Since the target variable is binary, we examine the misclassification
rate, which indicates how many observations were correctly and incorrectly classified. A large
misclassification rate might indicate that the model does not fit the data. From the table,
the misclassification rate is 0.0814, meaning that the error rate for the model of 2 branches
with Gini as target criterion is only 8.14%.
Figure 71. Variable importance of the input variables for the decision tree model of 2 branches
with Gini as target criterion
Figure 71 illustrates the number of splitting rules and the importance of the input variables.
The variable Kelas_Grade_Pelajar has the highest importance, with an importance value of 1,
indicating that it is the most significant variable in this model; its number of splitting
rules is 8. This means the variable is split by 8 rules in this model, shown in Figure 71:
first, less than berhenti, and greater than or equal to berhenti or missing; second, less than
gagal or missing, and greater than or equal to gagal; third, less than sederhana, and greater
than or equal to sederhana or missing; fourth, less than gagal, and greater than or equal to
gagal or missing; fifth, less than berhenti or missing, and greater than or equal to berhenti;
sixth, less than sederhana or missing, and greater than or equal to sederhana; and the seventh
and eighth, which are the same, less than cemerlang, and greater than or equal to sederhana or
missing.
The second most important variable is Tajaan, with an importance value of 0.4469 and 1
splitting rule. The third is Bil_Tanggungan, with an importance value of 0.0922 and 2 splitting
rules. The fourth is Program, with an importance value of 0.0906 and 2 splitting rules. Next,
the fifth most important variable is Kumpulan_Pendapatan_Keluarga, with an importance value of
0.0681 and 1 splitting rule. The sixth is Kategori_Kawasan_Tinggal, with an importance value
of 0.0512 and 1 splitting rule. The remaining important variables are Kelas_Grade_SPM, Umur,
and Jenis_Sekolah, with importance values of 0.0489, 0.0468, and 0.0324 respectively, each
with 1 splitting rule. From the table, we can observe that the variable Negeri has 0 splitting
rules and an importance value of 0, which means this variable is not significant to this
decision model of 2 branches with Gini as target criterion.
Figure 72. Decision tree diagram of 2 branches with Entropy as target criterion
Figure 72 above shows the overall decision tree diagram of 2 branches with Entropy as target
criterion. In this decision tree model, the parent (root) node is the variable Tajaan. There
are 21 leaf nodes, also known as terminal nodes, each of which has exactly one incoming edge
and no outgoing edges. The set of rules represented by this decision model is presented in
Appendix 3.
Figure 73. Fit statistics table for decision tree model of 2 branches with Entropy as target
criterion
Figure 73 above shows the fit statistics table for the decision tree model of 2 branches with
Entropy as target criterion. Since the target variable is binary, we examine the
misclassification rate, which indicates how many observations were correctly and incorrectly
classified. A large misclassification rate might indicate that the model does not fit the
data. From the table, the misclassification rate is 0.0858, meaning that the error rate for
the model of 2 branches with Entropy as target criterion is only 8.58%.
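The Gini and Entropy criteria compared across these models both measure node impurity; a split is chosen for how much it reduces impurity in the child nodes. A minimal sketch of the two measures for a binary target (stdlib only; function names are ours):

```python
import math

def gini(p):
    """Gini impurity of a node whose dropout proportion is p (binary target)."""
    return 2 * p * (1 - p)

def entropy(p):
    """Entropy (in bits) of a node whose dropout proportion is p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Both measures are 0 for a pure node and maximal at p = 0.5:
print(gini(0.5), entropy(0.5))   # 0.5 1.0
print(gini(0.3048))              # impurity at the dataset's ~30.48% dropout rate
```

The two criteria rank candidate splits similarly in practice, which is consistent with the close misclassification rates (7.48% vs 8.14% vs 8.58%) reported above.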
Figure 74. Variable important of the input variable for the decision tree model of 2 branches
with Entropy as target criterion
Figure 74 illustrates the number of splitting rules and the importance of the input variables.
The variable Kelas_Grade_Pelajar has the highest importance, with an importance value of 1,
indicating that it is the most significant variable in this model; its number of splitting
rules is 8. This means the variable is split by 8 rules in this model, shown in Figure 74:
first, less than berhenti, and greater than or equal to berhenti or missing; second, less than
gagal or missing, and greater than or equal to gagal; third, less than sederhana, and greater
than or equal to sederhana or missing; fourth, less than gagal, and greater than or equal to
gagal or missing; fifth, less than berhenti or missing, and greater than or equal to berhenti;
sixth, less than sederhana or missing, and greater than or equal to sederhana; and the seventh
and eighth, which are the same, less than cemerlang, and greater than or equal to sederhana or
missing.
The second most important variable is Tajaan, with an importance value of 0.4515 and 2
splitting rules. The third is Bil_Tanggungan, with an importance value of 0.1230 and 2
splitting rules. The fourth is Program, with an importance value of 0.1189 and 3 splitting
rules. Next, the fifth most important variable is Kategori_Kawasan_Tinggal, with an importance
value of 0.0785 and 2 splitting rules. The remaining important variables are
Kumpulan_Pendapatan_Keluarga, Kelas_Grade_SPM, and Umur, with importance values of 0.0681,
0.0489, and 0.0468 respectively, each with 1 splitting rule. From the table, we can observe
that the variables Jenis_Sekolah and Negeri have 0 splitting rules and an importance value of
0, which means these variables are not significant to this decision model of 2 branches with
Entropy as target criterion.
Figure 75. Decision tree diagram of 3 branches with Chi-Square as target criterion
Figure 75 above shows the overall decision tree diagram of 3 branches with Chi-Square as
target criterion. In this decision tree model, the parent (root) node is the variable
Kelas_Grade_Pelajar. There are 19 leaf nodes, also known as terminal nodes, each of which has
exactly one incoming edge and no outgoing edges. The set of rules represented by this decision
model is presented in Appendix 4.
Figure 76. Fit statistics table for decision tree model of 3 branches with Chi-Square
as target criterion.
Figure 76 above shows the fit statistics table for the decision tree model of 3 branches with
Chi-Square as target criterion. Since the target variable is binary, we examine the
misclassification rate, which indicates how many observations were correctly and incorrectly
classified. A large misclassification rate might indicate that the model does not fit the
data. From the table, the misclassification rate is 0.0748, meaning that the error rate for
the model of 3 branches with Chi-Square as target criterion is only 7.48%.
Figure 77. Variable importance of the input variables for the decision tree model of 3 branches
with Chi-Square as target criterion
Figure 77 illustrates the number of splitting rules and the importance of the input variables.
The variable Kelas_Grade_Pelajar has the highest importance, with an importance value of 1,
indicating that it is the most significant variable in this model; its number of splitting
rules is 2. This means the variable is split by 2 rules in this model, shown in Figure 77:
first, less than gagal or missing, and greater than or equal to gagal but less than sederhana;
second, greater than or equal to sederhana.
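The root split just described routes each student down one of three branches based on an ordered grade scale. A sketch of that routing, assuming a hypothetical ordinal encoding of the grade classes (berhenti < gagal < sederhana < cemerlang) and handling missing values as the rules in the text do:

```python
# Hypothetical ordinal encoding of Kelas_Grade_Pelajar; the actual tool
# orders the category levels internally.
GRADE_ORDER = {"berhenti": 0, "gagal": 1, "sederhana": 2, "cemerlang": 3}

def route_root_split(grade):
    """3-way root split from the text: (< gagal, or missing) |
    (>= gagal but < sederhana) | (>= sederhana)."""
    if grade is None or GRADE_ORDER[grade] < GRADE_ORDER["gagal"]:
        return "branch_1"   # below gagal, or missing
    if GRADE_ORDER[grade] < GRADE_ORDER["sederhana"]:
        return "branch_2"   # gagal up to (but not including) sederhana
    return "branch_3"       # sederhana and above

print(route_root_split(None))         # branch_1
print(route_root_split("gagal"))      # branch_2
print(route_root_split("cemerlang"))  # branch_3
```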
The second most important variable is Tajaan, with an importance value of 0.1736 and 3
splitting rules. The third is Bil_Tanggungan, with an importance value of 0.1290 and 2
splitting rules. The fourth is Kategori_Kawasan_Tinggal, with an importance value of 0.0864
and 2 splitting rules. Next, the fifth most important variable is Umur, with an importance
value of 0.0800 and 2 splitting rules. Kumpulan_Pendapatan_Keluarga is the sixth most
important variable, with an importance value of 0.0634 and 1 splitting rule. The variables
Program, Kelas_Grade_SPM, Negeri, and Jenis_Sekolah are not significant to this decision tree
model of 3 branches with Chi-Square as target criterion, because their importance values and
numbers of splitting rules are all 0.
Figure 78. Decision tree diagram of 3 branches with Gini as target criterion
Figure 78 above shows the overall decision tree diagram of 3 branches with Gini as target
criterion. In this decision tree model, the parent (root) node is the variable
Kelas_Grade_Pelajar. There are 19 leaf nodes, also known as terminal nodes, each of which has
exactly one incoming edge and no outgoing edges. The set of rules represented by this decision
model is presented in Appendix 5.
Figure 79. Fit statistics table for decision tree model of 3 branches with Gini as target
criterion
Figure 79 above shows the fit statistics table for the decision tree model of 3 branches with
Gini as target criterion. Since the target variable is binary, we examine the
misclassification rate, which indicates how many observations were correctly and incorrectly
classified. A large misclassification rate might indicate that the model does not fit the
data. From the table, the misclassification rate is 0.0783, meaning that the error rate for
the model of 3 branches with Gini as target criterion is only 7.83%.
Figure 80. Variable important of the input variable for the decision tree model of 3 branches
with Gini as target criterion
Figure 80 illustrates the number of splitting rules and the importance of the input variables.
The variable Kelas_Grade_Pelajar has the highest importance, with an importance value of 1,
indicating that it is the most significant variable in this model; its number of splitting
rules is 2. This means the variable is split by 2 rules in this model, shown in Figure 80:
first, greater than or equal to cemerlang or missing; second, less than gagal or missing, and
greater than or equal to gagal but less than sederhana.
The second most important variable is Tajaan, with an importance value of 0.1737 and 3
splitting rules. The third is Program, with an importance value of 0.1504 and 6 splitting
rules. The fourth is Bil_Tanggungan, with an importance value of 0.1495 and 5 splitting rules.
Next, the fifth most important variable is Umur, with an importance value of 0.0995 and 5
splitting rules. Kumpulan_Pendapatan_Keluarga is the sixth most important variable, with an
importance value of 0.0877 and 3 splitting rules. Kelas_Grade_SPM and Kategori_Kawasan_Tinggal
are the seventh and eighth most important variables, with importance values of 0.0729 and
0.0688 and 4 and 2 splitting rules respectively. Based on the table, the importance values and
numbers of splitting rules for the variables Negeri and Jenis_Sekolah are 0, which means these
two variables are not significant to this decision tree model of 3 branches with Gini as
target criterion.
Figure 81. Decision tree diagram of 3 branches with Entropy as target criterion
Figure 81 above shows the overall decision tree diagram of 3 branches with Entropy as target
criterion. In this decision tree model, the parent (root) node is the variable
Kelas_Grade_Pelajar. There are 19 leaf nodes, also known as terminal nodes, each of which has
exactly one incoming edge and no outgoing edges. The set of rules represented by this decision
model is presented in Appendix 6.
Figure 82. Fit statistics table for decision tree model of 3 branches with Entropy as target
criterion.
Figure 82 above shows the fit statistics table for the decision tree model of 3 branches with
Entropy as target criterion. Since the target variable is binary, we examine the
misclassification rate, which indicates how many observations were correctly and incorrectly
classified. A large misclassification rate might indicate that the model does not fit the
data. From the table, the misclassification rate is 0.0805, meaning that the error rate for
the model of 3 branches with Entropy as target criterion is only 8.05%.
Figure 83. Variable importance of the input variables for the decision tree model of 3 branches
with Entropy as target criterion
Figure 83 illustrates the number of splitting rules and the importance of the input variables.
The variable Kelas_Grade_Pelajar has the highest importance, with an importance value of 1,
indicating that it is the most significant variable in this model; its number of splitting
rules is 2. This means the variable is split by 2 rules in this model, shown in Figure 83:
first, greater than or equal to cemerlang or missing; second, less than gagal or missing, and
greater than or equal to gagal but less than sederhana.
The second most important variable is Tajaan, with an importance value of 0.1737 and 3
splitting rules. The third is Bil_Tanggungan, with an importance value of 0.1616 and 6
splitting rules. The fourth is Program, with an importance value of 0.1335 and 5 splitting
rules. Next, the fifth most important variable is Umur, with an importance value of 0.0992 and
5 splitting rules. Kumpulan_Pendapatan_Keluarga is the sixth most important variable, with an
importance value of 0.0784 and 2 splitting rules. Kelas_Grade_SPM and Kategori_Kawasan_Tinggal
are the seventh and eighth most important variables, with importance values of 0.0676 and
0.0599 and 3 and 1 splitting rules respectively. Based on the table, the importance values and
numbers of splitting rules for the variables Negeri and Jenis_Sekolah are 0, which means these
two variables are not significant to this decision tree model of 3 branches with Entropy as
target criterion.
Comparing the 2-branch decision tree models based only on the percentage of testing error, the
decision tree model of 2 branches with Chi-Square as target criterion has the lowest testing
error, 7.48%. On the other hand, comparing the 3-branch decision tree models based only on the
percentage of testing error, the lowest testing error of 7.48% belongs to the decision tree
model of 3 branches with Chi-Square as target criterion, the same outcome as the comparison
among the 2-branch trees. Based on this comparison of testing errors, the decision tree model
with Chi-Square as target criterion has the lowest testing error of 7.48% whether the tree has
2 or 3 branches, as shown in Table 5 below:
Decision Tree   Maximum Branch    Nominal Target Criterion   Train (%)   Test (%)
1               With 2 branches   Chi-Square                 8.75        7.48
2               With 3 branches   Chi-Square                 8.86        7.48
Table 5. Comparison between the decision tree models
Since the testing errors are the same, we examine the training errors to find the preferred
decision tree. Based on Table 5 above, the training error for 2 branches with Chi-Square as
target criterion is 8.75%, whereas the training error for 3 branches with Chi-Square as target
criterion is 8.86%. Comparing these two decision tree models, the decision tree model of 2
branches with Chi-Square as target criterion is preferred, as shown in Table 6 below.
Decision Tree   Decision Tree Model   Train (%)   Test (%)   Preferred
1               With 2 branches       8.75        7.48       ✓
2               With 3 branches       8.86        7.48
Table 6. Comparison between the chosen decision tree models
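The selection rule used here, lowest test error first with train error as the tie-break, is a lexicographic minimum. A sketch restating Table 5's numbers as tuples (the tuples simply mirror the table):

```python
# (name, train error %, test error %) for the two candidate trees from Table 5.
models = [
    ("2 branches, Chi-Square", 8.75, 7.48),
    ("3 branches, Chi-Square", 8.86, 7.48),
]

# Prefer the lowest test error; break ties with the lowest train error.
best = min(models, key=lambda m: (m[2], m[1]))
print(best[0])   # 2 branches, Chi-Square
```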
the predictive modelling through classification by neural network analysis is examined in
terms of the score rankings overlays, fit statistics, and the multilayer perceptron (MLP).
Figure 86. Score rankings overlay of 2 hidden units at depth = 30%
Figure 88. Multilayer perceptron (MLP) of neural network model with 2 hidden
units.
Figure 88 above shows the neural network architecture, the MLP for the neural network model.
There are three layers: the input layer, the hidden layer, and the output layer. By default,
the number of hidden layers is set to 1, while the number of nodes in the input layer is
determined by the number of input variables in the model. In this model, there are 10 input
nodes: UMUR, PROGRAM, NEGERI, KATEGORI_KAWASAN_TINGGAL,
KUMPULAN_PENDAPATAN_KELUARGA, BIL_TANGGUNGAN, TAJAAN, KELAS_GRADE_PELAJAR,
KELAS_GRADE_SPM, and JENIS_SEKOLAH, while STATUS is the target. The hidden layer,
consisting of node i and node j, is represented by a set of 2 hidden units: hidden unit 1 and
hidden unit 2. Lastly, the output layer consists of one node, node k. Each node has a weighted
connection to the nodes in adjacent layers, as shown in Figure 89 below, where the estimates
represent the weight values. The output values are computed by default.
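The architecture just described (one hidden layer, a single output node) amounts to weighted sums followed by activations. A minimal forward-pass sketch with made-up weights rather than the estimates of Figure 89, using tanh hidden units and a logistic output, a common MLP configuration (the actual activations are whatever the tool's defaults were), and only 3 inputs instead of 10 for brevity:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer of tanh units, then a sigmoid output node that
    returns the predicted dropout probability."""
    hidden = [math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)
              for w, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(wo * h for wo, h in zip(w_out, hidden)) + b_out)

# Hypothetical tiny example: 3 inputs, 2 hidden units, made-up weights.
x = [0.5, -1.0, 2.0]
w_hidden = [[0.063, 0.1, -0.2],    # weights into hidden unit 1 (node i)
            [0.023, -0.1, 0.05]]   # weights into hidden unit 2 (node j)
b_hidden = [0.0, 0.0]
w_out, b_out = [0.4, -0.3], 0.1    # weights from hidden units to node k
p = mlp_forward(x, w_hidden, b_hidden, w_out, b_out)
print(0.0 < p < 1.0)   # True: the output is a probability
```

Each entry of w_hidden plays the role of one "Estimates" row in Figure 89: the weight on the connection between one input node and one hidden unit.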
Figure 89. Optimization Result of Neural Network Model with 2 hidden units
Figure 89 above shows the optimization results of the neural network model with 2 hidden
units. The Estimates column indicates the weight for each pair of connected nodes. For
example, IMP_Bil_Tanggungan_H11 shows that the imputed variable BIL_TANGGUNGAN is connected
to the first hidden unit (node i) with a weight of 0.063, whereas IMP_Bil_Tanggungan_H12
implies that the imputed variable BIL_TANGGUNGAN is connected to the other hidden unit in the
hidden layer (node j) with a weight of about 0.023. Figure 90 below shows the fit statistics
table for the neural network model with 2 hidden units; the misclassification rate is 9.4% for
the train dataset and 7.6% for the test dataset.
Figure 90. Fit statistics table for neural network model with 2 hidden units
Figure 92. Score Rankings Overlay of Neural Network Model with 3 Hidden Units at Depth
= 30%
Figure 93. Score Rankings Overlay of Neural Network Model with 3 Hidden Units at Depth
= 100%
Figures 92 and 93 above show the score rankings overlay outputs of the constructed neural
network model. According to Figure 92, at depth = 30%, around 2282 of the 7606 students are
included, and the model classifies 82.97% of them as dropping out from the university.
According to Figure 93, at depth = 100%, all 7606 observations are included, and the model
classifies 30.48% of the 7606 students as dropping out. In comparison to the actual data, we
can conclude that the model is accurate and hence useful for prediction.
Figure 94. Multilayer perceptron (MLP) of neural network model with 3 hidden units
Figure 94 above shows the neural network architecture, a multilayer perceptron (MLP) with 3 hidden units in its hidden layer. The network has three layers: an input layer, a hidden layer, and an output layer. By default, a single hidden layer is used. The number of nodes in the input layer is determined by the number of input variables in the model; here there are 10 input nodes: UMUR, PROGRAM, NEGERI, KATEGORI_KAWASAN_TINGGAL, KUMPULAN_PENDAPATAN_KELUARGA, BIL_TANGGUNGAN, TAJAAN, KELAS_GRADE_PELAJAR, KELAS_GRADE_SPM, and JENIS_SEKOLAH, while the target variable STATUS corresponds to the output. The hidden layer consists of node h, node i, and node j, representing hidden units 1, 2, and 3. The output layer consists of a single node, node k. Each node has a weighted connection to the nodes in adjacent layers, as shown in Figure 95 below, where the estimates represent the weight values. The output values are computed using the default settings.
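The forward pass of such an MLP can be sketched as below. This is a minimal illustration, assuming tanh hidden units and a logistic output activation (common defaults for a binary target); the weights used here are made up and are not the fitted Estimates from Figure 95.

```python
import math

# Minimal sketch of a 10-input / 3-hidden-unit / 1-output MLP forward pass.
# Weight values are arbitrary placeholders for illustration only.

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """x: 10 inputs; w_hidden: 3 weight vectors (nodes h, i, j);
    w_out: 3 weights into output node k. Returns P(dropout)."""
    hidden = [
        math.tanh(b + sum(wi * xi for wi, xi in zip(w, x)))
        for w, b in zip(w_hidden, b_hidden)
    ]
    z = b_out + sum(wo * h for wo, h in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-z))  # logistic output activation

# Illustrative call with arbitrary weights (not the model's Estimates).
x = [0.5] * 10
w_hidden = [[0.1] * 10, [-0.2] * 10, [0.05] * 10]
p = mlp_forward(x, w_hidden, b_hidden=[0.0, 0.0, 0.0],
                w_out=[0.3, -0.1, 0.2], b_out=0.0)
print(0.0 < p < 1.0)  # True: the output is always a valid probability
```

Each weight reported in the Estimates column corresponds to one `wi` (input to hidden) or `wo` (hidden to output) entry in this computation.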
Figure 95. Optimization Result of Neural Network Model with 3 hidden units
Figure 95 above shows the optimization results of the neural network model with 3 hidden units. The Estimates column gives the weight value for each pair of connected nodes. For example, IMP_Bil_Tanggungan_H11 shows that the imputed variable BIL_TANGGUNGAN is connected to the first hidden unit (node h) with a weight of approximately -0.04, IMP_Bil_Tanggungan_H12 shows that it is connected to the second hidden unit (node i) with a weight of approximately -0.06, and IMP_Bil_Tanggungan_H13 shows that it is connected to the third hidden unit (node j) with a weight of approximately -0.15. Figure 96 below shows the fit statistics table for the neural network model with 3 hidden units; the misclassification rate is 12.17% on the Train dataset and 23.76% on the Test dataset.
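The weight labels above follow a pattern that can be decoded mechanically. The sketch below assumes the suffix convention inferred from the output shown (an `H` followed by the hidden-layer digit and the hidden-unit digit); the function name is hypothetical.

```python
import re

# Sketch: decode a SAS weight label like "IMP_Bil_Tanggungan_H13" into
# (input variable, hidden layer, hidden unit). The H<layer><unit> suffix
# convention is inferred from the optimization output shown above.

def parse_weight_label(label):
    m = re.fullmatch(r"(.+)_H(\d)(\d)", label)
    if m is None:
        raise ValueError(f"unrecognised weight label: {label}")
    return m.group(1), int(m.group(2)), int(m.group(3))

print(parse_weight_label("IMP_Bil_Tanggungan_H13"))
# ('IMP_Bil_Tanggungan', 1, 3): hidden layer 1, third hidden unit (node j)
```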
Figure 96. Fit statistics table for neural network model of 3 hidden units
Overall, the neural network model with 3 hidden units is preferable to the neural network model with 2 hidden units in this particular case study.
5.0 MODEL COMPARISON
Model                                                          Training (%)   Testing (%)
Logistic Regression
  1.  Without Imputation (Default Selection)                       11.48          9.98
  2.  With Imputation (Default Selection)                           8.98          7.18
  3.  With Imputation and Transformation (Default Selection)        8.98          7.18
  4.  With Imputation and Transformation (Backward Selection)       8.98          7.22
  5.  With Imputation and Transformation (Forward Selection)        8.98          7.18
  6.  With Imputation and Transformation (Stepwise Selection)       8.94          7.22
Decision Tree
  7.  Chi-Square, 2 Branches                                        8.75          7.48
  8.  Gini, 2 Branches                                              8.83          8.14
  9.  Entropy, 2 Branches                                           8.81          8.58
  10. Chi-Square, 3 Branches                                        8.86          7.48
  11. Gini, 3 Branches                                              8.06          7.83
  12. Entropy, 3 Branches                                           8.23          8.05
Neural Network
  13. 2 Hidden Units                                                9.43          7.66
  14. 3 Hidden Units                                                9.06          7.53
Table 7. Overview of the misclassification rates for all the models constructed.
Table 7 above gives an overall view of the misclassification rates for all the models constructed with logistic regression, decision trees, and neural networks. Model comparison by misclassification rate assesses the train and test error of each model: the model with the lowest test error is considered the best, and if more than one model shares the lowest test error, the train errors of those models are compared. In this study, the comparison was carried out in two phases to determine the best model for Universiti Teknologi Petronas to predict student dropout more accurately. In the first phase, models were compared within each data mining method (logistic regression, decision tree, and neural network) and the preferred models were selected; in the second phase, the best model among all 14 models was chosen. Here, three models share both the lowest test error (7.18%) and the same train error (8.98%), so there are three best models: Logistic Regression with Imputation (Default Selection), Logistic Regression with Imputation and Transformation (Default Selection), and Logistic Regression with Imputation and Transformation (Forward Selection).
6.0 CONCLUSION
After going through the processes of identification, data processing, trial and error, and discovery, the proposed methodology has been shown to be valid for predicting university student dropout. To carry out this research, three data mining techniques were chosen to examine the behaviour of students who drop out: 1) Logistic Regression, 2) Decision Tree, and 3) Neural Network. Within each technique, several sub-methods were implemented to cover a wider perspective and gain useful insights.
To obtain the best data mining technique for this particular case study, the Model Comparison function was executed. As a result, three models share the lowest misclassification rate of 7.18% on the Test dataset, which qualifies them as the best models for this study: 1) Logistic Regression with Imputation (Default Selection), 2) Logistic Regression with Imputation and Transformation (Default Selection), and 3) Logistic Regression with Imputation and Transformation (Forward Selection).
To put things into perspective, these results show that, in future, Universiti Teknologi Petronas (UTP) could apply any of the best modelling techniques stated above to understand and predict the behaviour of a particular student. By knowing the underlying patterns among its students, UTP staff could recognise the potential for dropout in any student and provide care and attention to resolve the matter before it solidifies.
According to the results of the regression technique, the most significant variable for predicting student dropout from the university is student performance, followed by family burden and age. Likewise, according to the results of the six decision tree models, student performance is the most important factor in identifying whether a student drops out, followed by sponsorship, family burden, living area, age, and family income. We can therefore conclude that student performance is the main factor behind dropout from Universiti Teknologi Petronas.
It is important to realise that identifying students at risk of dropping out using the best method in this study is only the first step in truly addressing the issue of dropout. The next step is to identify the specific needs and problems of each individual student who is in danger of dropping out, and then to implement programmes that provide effective and appropriate dropout-prevention strategies. Stakeholders should therefore attend to students' needs in time to avoid dropout. As the pace of society increases, it is important for everyone to be knowledgeable, not only to keep up but also to be useful in the journey of making this world a better place.
APPENDIX
Appendix 1: Decision Tree Rule of 2 Branches with Chi-Square as Target Criterion
*------------------------------------------------------------*
Node = 8
*------------------------------------------------------------*
if Umur < 18.5 or MISSING
AND Tajaan IS ONE OF: SENDIRI, FELDA
AND Kelas_Grade_Pelajar <= BAIK
then
Tree Node Identifier = 8
Number of Observations = 87
Predicted: STATUS__NEW=1 = 0.17
Predicted: STATUS__NEW=0 = 0.83
*------------------------------------------------------------*
Node = 10
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI, FELDA
AND Kelas_Grade_Pelajar >= BERHENTI AND Kelas_Grade_Pelajar <= BERHENTI
then
Tree Node Identifier = 10
Number of Observations = 292
Predicted: STATUS__NEW=1 = 1.00
Predicted: STATUS__NEW=0 = 0.00
*------------------------------------------------------------*
Node = 14
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN, PTPTN/MAIPK, MAIS or MISSING
AND Kelas_Grade_Pelajar >= GAGAL AND Kelas_Grade_Pelajar <= GAGAL
then
Tree Node Identifier = 14
Number of Observations = 423
Predicted: STATUS__NEW=1 = 1.00
Predicted: STATUS__NEW=0 = 0.00
*------------------------------------------------------------*
Node = 16
*------------------------------------------------------------*
if Umur >= 18.5
AND Tajaan IS ONE OF: SENDIRI, FELDA
AND Kelas_Grade_Pelajar <= BAIK
AND Kategori_Kawasan_Tinggal < 2.5 or MISSING
then
Tree Node Identifier = 16
Number of Observations = 5
Predicted: STATUS__NEW=1 = 0.20
Predicted: STATUS__NEW=0 = 0.80
*------------------------------------------------------------*
Node = 17
*------------------------------------------------------------*
if Umur >= 18.5
AND Tajaan IS ONE OF: SENDIRI, FELDA
AND Kelas_Grade_Pelajar <= BAIK
AND Kategori_Kawasan_Tinggal >= 2.5
then
Tree Node Identifier = 17
Number of Observations = 5
Predicted: STATUS__NEW=1 = 1.00
Predicted: STATUS__NEW=0 = 0.00
*------------------------------------------------------------*
Node = 20
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN, PTPTN/MAIPK, MAIS or MISSING
AND Kelas_Grade_Pelajar <= BAIK or MISSING
then
Tree Node Identifier = 20
Number of Observations = 874
Predicted: STATUS__NEW=1 = 0.08
Predicted: STATUS__NEW=0 = 0.92
*------------------------------------------------------------*
Node = 21
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN, PTPTN/MAIPK, MAIS or MISSING
AND Kelas_Grade_Pelajar >= BERHENTI AND Kelas_Grade_Pelajar <= BERHENTI
then
Tree Node Identifier = 21
Number of Observations = 311
Predicted: STATUS__NEW=1 = 1.00
Predicted: STATUS__NEW=0 = 0.00
*------------------------------------------------------------*
Node = 22
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN, PTPTN/MAIPK, MAIS or MISSING
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
AND Bil_Tanggungan equals All Values
then
Tree Node Identifier = 22
Number of Observations = 1862
Predicted: STATUS__NEW=1 = 0.03
Predicted: STATUS__NEW=0 = 0.97
*------------------------------------------------------------*
Node = 23
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN, PTPTN/MAIPK, MAIS or MISSING
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
AND Bil_Tanggungan equals Missing
then
Tree Node Identifier = 23
Number of Observations = 8
Predicted: STATUS__NEW=1 = 0.75
Predicted: STATUS__NEW=0 = 0.25
*------------------------------------------------------------*
Node = 26
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN, PTPTN/MAIPK, MAIS or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA or MISSING
AND Bil_Tanggungan < 5.5
then
Tree Node Identifier = 26
Number of Observations = 750
Predicted: STATUS__NEW=1 = 0.17
Predicted: STATUS__NEW=0 = 0.83
*------------------------------------------------------------*
Node = 28
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI, FELDA
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG
AND Kategori_Kawasan_Tinggal < 2.5 or MISSING
then
Tree Node Identifier = 28
Number of Observations = 104
Predicted: STATUS__NEW=1 = 0.19
Predicted: STATUS__NEW=0 = 0.81
*------------------------------------------------------------*
Node = 30
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI, FELDA
AND Kelas_Grade_Pelajar >= GAGAL AND Kelas_Grade_Pelajar <= GAGAL or
MISSING
then
Tree Node Identifier = 30
Number of Observations = 101
Predicted: STATUS__NEW=1 = 1.00
Predicted: STATUS__NEW=0 = 0.00
*------------------------------------------------------------*
Node = 36
*------------------------------------------------------------*
if Tajaan IS ONE OF: PTPTN
AND Kelas_Grade_Pelajar >= SEDERHANA or MISSING
AND Bil_Tanggungan >= 5.5 or MISSING
then
Tree Node Identifier = 36
Number of Observations = 277
Predicted: STATUS__NEW=1 = 0.36
Predicted: STATUS__NEW=0 = 0.64
*------------------------------------------------------------*
Node = 38
*------------------------------------------------------------*
if Umur < 18.5 or MISSING
AND Tajaan IS ONE OF: SENDIRI, FELDA
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG
AND Kategori_Kawasan_Tinggal >= 2.5
then
Tree Node Identifier = 38
Number of Observations = 59
Predicted: STATUS__NEW=1 = 0.39
Predicted: STATUS__NEW=0 = 0.61
*------------------------------------------------------------*
Node = 39
*------------------------------------------------------------*
if Umur >= 18.5
AND Tajaan IS ONE OF: SENDIRI, FELDA
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG
AND Kategori_Kawasan_Tinggal >= 2.5
then
Tree Node Identifier = 39
Number of Observations = 10
Predicted: STATUS__NEW=1 = 0.90
Predicted: STATUS__NEW=0 = 0.10
*------------------------------------------------------------*
Node = 40
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI, FELDA
AND Kumpulan_Pendapatan_Keluarga <= M40 or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA
then
Tree Node Identifier = 40
Number of Observations = 69
Predicted: STATUS__NEW=1 = 0.59
Predicted: STATUS__NEW=0 = 0.41
*------------------------------------------------------------*
Node = 41
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI, FELDA
AND Kumpulan_Pendapatan_Keluarga >= T20
AND Kelas_Grade_Pelajar >= SEDERHANA
then
Tree Node Identifier = 41
Number of Observations = 22
Predicted: STATUS__NEW=1 = 0.18
Predicted: STATUS__NEW=0 = 0.82
*------------------------------------------------------------*
Node = 42
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN/MAIPK or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA or MISSING
AND Kategori_Kawasan_Tinggal < 3.5 or MISSING
AND Bil_Tanggungan >= 5.5 or MISSING
then
Tree Node Identifier = 42
Number of Observations = 52
Predicted: STATUS__NEW=1 = 0.06
Predicted: STATUS__NEW=0 = 0.94
*------------------------------------------------------------*
Node = 43
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN/MAIPK or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA or MISSING
AND Kategori_Kawasan_Tinggal >= 3.5
AND Bil_Tanggungan >= 5.5 or MISSING
then
Tree Node Identifier = 43
Number of Observations = 11
Predicted: STATUS__NEW=1 = 0.55
Predicted: STATUS__NEW=0 = 0.45
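Each rule above is a conjunction of conditions on one student record. As an illustration, the Node 8 rule can be translated into a Python predicate; this is a hypothetical translation, assuming the grade classes are compared in the alphabetical order SAS applies to character values, with `None` standing for MISSING.

```python
# Hypothetical translation of the Node 8 rule from Appendix 1.
# Grade classes are assumed to be ordered alphabetically, as SAS
# orders character values; None represents a MISSING value.

GRADE_ORDER = ["BAIK", "BERHENTI", "CEMERLANG", "GAGAL", "SEDERHANA"]

def node_8(umur, tajaan, kelas_grade_pelajar):
    age_ok = umur is None or umur < 18.5          # "Umur < 18.5 or MISSING"
    tajaan_ok = tajaan in ("SENDIRI", "FELDA")    # "Tajaan IS ONE OF: ..."
    grade_ok = (kelas_grade_pelajar is not None and
                GRADE_ORDER.index(kelas_grade_pelajar)
                <= GRADE_ORDER.index("BAIK"))     # "Kelas_Grade_Pelajar <= BAIK"
    return age_ok and tajaan_ok and grade_ok

print(node_8(17, "SENDIRI", "BAIK"))  # True: rule fires (predicted dropout 0.17)
print(node_8(19, "SENDIRI", "BAIK"))  # False: the age condition fails
```

A record lands in exactly one leaf node, so a full scoring function would check the rules in turn and return the predicted probabilities of the matching node.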
Appendix 2: Decision Tree Rule of 2 Branches with Gini as Target Criterion
*------------------------------------------------------------*
Node = 8
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI, FELDA
AND Program IS ONE OF: DIPLOMA PERAKAUNAN, DIPLOMA PERBANKAN
DAN KEWANGAN I, DIPLOMA PENGAJIAN ISLAM, DIPLOMA KAUNSELING
ISLAMI, DIPLOMA PENTADBIRAN PERNIAGAAN, DIPLOMA MULTIMEDIA
DAN DAKWAH
AND Kelas_Grade_Pelajar <= BAIK
then
Tree Node Identifier = 8
Number of Observations = 79
Predicted: STATUS__NEW=1 = 0.15
Predicted: STATUS__NEW=0 = 0.85
*------------------------------------------------------------*
Node = 10
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI, FELDA
AND Kelas_Grade_Pelajar >= BERHENTI AND Kelas_Grade_Pelajar <= BERHENTI
then
Tree Node Identifier = 10
Number of Observations = 292
Predicted: STATUS__NEW=1 = 1.00
Predicted: STATUS__NEW=0 = 0.00
*------------------------------------------------------------*
Node = 14
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN, PTPTN/MAIPK, MAIS or MISSING
AND Kelas_Grade_Pelajar >= GAGAL AND Kelas_Grade_Pelajar <= GAGAL
then
Tree Node Identifier = 14
Number of Observations = 423
Predicted: STATUS__NEW=1 = 1.00
Predicted: STATUS__NEW=0 = 0.00
*------------------------------------------------------------*
Node = 15
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN, PTPTN/MAIPK, MAIS or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA or MISSING
then
Tree Node Identifier = 15
Number of Observations = 1090
Predicted: STATUS__NEW=1 = 0.22
Predicted: STATUS__NEW=0 = 0.78
*------------------------------------------------------------*
Node = 18
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI, FELDA
AND Program IS ONE OF: DIPLOMA TEKNOLOGI MAKLUMAT or MISSING
AND Kelas_Grade_SPM <= BAIK
AND Kelas_Grade_Pelajar <= BAIK
then
Tree Node Identifier = 18
Number of Observations = 7
Predicted: STATUS__NEW=1 = 0.14
Predicted: STATUS__NEW=0 = 0.86
*------------------------------------------------------------*
Node = 19
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI, FELDA
AND Program IS ONE OF: DIPLOMA TEKNOLOGI MAKLUMAT or MISSING
AND Kelas_Grade_SPM >= CEMERLANG or MISSING
AND Kelas_Grade_Pelajar <= BAIK
then
Tree Node Identifier = 19
Number of Observations = 11
Predicted: STATUS__NEW=1 = 0.73
Predicted: STATUS__NEW=0 = 0.27
*------------------------------------------------------------*
Node = 23
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN, PTPTN/MAIPK, MAIS or MISSING
AND Kelas_Grade_Pelajar >= BERHENTI AND Kelas_Grade_Pelajar <= BERHENTI
then
Tree Node Identifier = 23
Number of Observations = 311
Predicted: STATUS__NEW=1 = 1.00
Predicted: STATUS__NEW=0 = 0.00
*------------------------------------------------------------*
Node = 24
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN, PTPTN/MAIPK, MAIS or MISSING
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
AND Bil_Tanggungan equals All Values
then
Tree Node Identifier = 24
Number of Observations = 1862
Predicted: STATUS__NEW=1 = 0.03
Predicted: STATUS__NEW=0 = 0.97
*------------------------------------------------------------*
Node = 25
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN, PTPTN/MAIPK, MAIS or MISSING
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
AND Bil_Tanggungan equals Missing
then
Tree Node Identifier = 25
Number of Observations = 8
Predicted: STATUS__NEW=1 = 0.75
Predicted: STATUS__NEW=0 = 0.25
*------------------------------------------------------------*
Node = 36
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI, FELDA
AND Kelas_Grade_Pelajar >= GAGAL AND Kelas_Grade_Pelajar <= GAGAL or
MISSING
then
Tree Node Identifier = 36
Number of Observations = 101
Predicted: STATUS__NEW=1 = 1.00
Predicted: STATUS__NEW=0 = 0.00
*------------------------------------------------------------*
Node = 38
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN, PTPTN/MAIPK, MAIS or MISSING
AND Kelas_Grade_Pelajar <= BAIK or MISSING
AND Bil_Tanggungan < 9.5
then
Tree Node Identifier = 38
Number of Observations = 860
Predicted: STATUS__NEW=1 = 0.07
Predicted: STATUS__NEW=0 = 0.93
*------------------------------------------------------------*
Node = 52
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI, FELDA
AND Program IS ONE OF: DIPLOMA SYARIAH ISLAMIYYAH, DIPLOMA
TEKNOLOGI MAKLUMAT, DIPLOMA PERAKAUNAN, DIPLOMA PERBANKAN
DAN KEWANGAN I, DIPLOMA PENGURUSAN MUAMALAT, DIPLOMA
PENTADBIRAN PERNIAGAAN, DIPLOMA MULTIMEDIA DAN DAKWAH or
MISSING
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG
AND Kategori_Kawasan_Tinggal < 2.5 or MISSING
then
Tree Node Identifier = 52
Number of Observations = 65
Predicted: STATUS__NEW=1 = 0.29
Predicted: STATUS__NEW=0 = 0.71
*------------------------------------------------------------*
Node = 53
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI, FELDA
AND Program IS ONE OF: DIPLOMA SYARIAH ISLAMIYYAH, DIPLOMA
TEKNOLOGI MAKLUMAT, DIPLOMA PERAKAUNAN, DIPLOMA PERBANKAN
DAN KEWANGAN I, DIPLOMA PENGURUSAN MUAMALAT, DIPLOMA
PENTADBIRAN PERNIAGAAN, DIPLOMA MULTIMEDIA DAN DAKWAH or
MISSING
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG
AND Kategori_Kawasan_Tinggal >= 2.5
then
Tree Node Identifier = 53
Number of Observations = 51
Predicted: STATUS__NEW=1 = 0.53
Predicted: STATUS__NEW=0 = 0.47
*------------------------------------------------------------*
Node = 54
*------------------------------------------------------------*
if Umur < 18.5 or MISSING
AND Tajaan IS ONE OF: SENDIRI, FELDA
AND Program IS ONE OF: DIPLOMA PENGAJIAN ISLAM, DIPLOMA
KAUNSELING ISLAMI
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG
then
Tree Node Identifier = 54
Number of Observations = 52
Predicted: STATUS__NEW=1 = 0.06
Predicted: STATUS__NEW=0 = 0.94
*------------------------------------------------------------*
Node = 55
*------------------------------------------------------------*
if Umur >= 18.5
AND Tajaan IS ONE OF: SENDIRI, FELDA
AND Program IS ONE OF: DIPLOMA PENGAJIAN ISLAM, DIPLOMA
KAUNSELING ISLAMI
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG
then
Tree Node Identifier = 55
Number of Observations = 5
Predicted: STATUS__NEW=1 = 0.60
Predicted: STATUS__NEW=0 = 0.40
*------------------------------------------------------------*
Node = 56
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI, FELDA
AND Kumpulan_Pendapatan_Keluarga <= M40 or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA
then
Tree Node Identifier = 56
Number of Observations = 69
Predicted: STATUS__NEW=1 = 0.59
Predicted: STATUS__NEW=0 = 0.41
*------------------------------------------------------------*
Node = 57
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI, FELDA
AND Kumpulan_Pendapatan_Keluarga >= T20
AND Kelas_Grade_Pelajar >= SEDERHANA
then
Tree Node Identifier = 57
Number of Observations = 22
Predicted: STATUS__NEW=1 = 0.18
Predicted: STATUS__NEW=0 = 0.82
*------------------------------------------------------------*
Node = 60
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN, PTPTN/MAIPK, MAIS or MISSING
AND Kelas_Grade_Pelajar <= BAIK or MISSING
AND Jenis_Sekolah IS ONE OF: SK or MISSING
AND Bil_Tanggungan >= 9.5 or MISSING
then
Tree Node Identifier = 60
Number of Observations = 7
Predicted: STATUS__NEW=1 = 0.57
Predicted: STATUS__NEW=0 = 0.43
*------------------------------------------------------------*
Node = 61
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN, PTPTN/MAIPK, MAIS or MISSING
AND Kelas_Grade_Pelajar <= BAIK or MISSING
AND Jenis_Sekolah IS ONE OF: SA
AND Bil_Tanggungan >= 9.5 or MISSING
then
Tree Node Identifier = 61
Number of Observations = 7
Predicted: STATUS__NEW=1 = 0.14
Predicted: STATUS__NEW=0 = 0.86
Appendix 3: Decision Tree Rule of 2 Branches with Entropy as Target Criterion
*------------------------------------------------------------*
Node = 8
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI, FELDA
AND Program IS ONE OF: DIPLOMA PERAKAUNAN, DIPLOMA PERBANKAN
DAN KEWANGAN I, DIPLOMA PENGAJIAN ISLAM, DIPLOMA KAUNSELING
ISLAMI, DIPLOMA PENTADBIRAN PERNIAGAAN, DIPLOMA MULTIMEDIA
DAN DAKWAH
AND Kelas_Grade_Pelajar <= BAIK
then
Tree Node Identifier = 8
Number of Observations = 79
Predicted: STATUS__NEW=1 = 0.15
Predicted: STATUS__NEW=0 = 0.85
*------------------------------------------------------------*
Node = 10
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI, FELDA
AND Kelas_Grade_Pelajar >= BERHENTI AND Kelas_Grade_Pelajar <= BERHENTI
then
Tree Node Identifier = 10
Number of Observations = 292
Predicted: STATUS__NEW=1 = 1.00
Predicted: STATUS__NEW=0 = 0.00
*------------------------------------------------------------*
Node = 14
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN, PTPTN/MAIPK, MAIS or MISSING
AND Kelas_Grade_Pelajar >= GAGAL AND Kelas_Grade_Pelajar <= GAGAL
then
Tree Node Identifier = 14
Number of Observations = 423
Predicted: STATUS__NEW=1 = 1.00
Predicted: STATUS__NEW=0 = 0.00
*------------------------------------------------------------*
Node = 18
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI, FELDA
AND Program IS ONE OF: DIPLOMA TEKNOLOGI MAKLUMAT or MISSING
AND Kelas_Grade_SPM <= BAIK
AND Kelas_Grade_Pelajar <= BAIK
then
Tree Node Identifier = 18
Number of Observations = 7
Predicted: STATUS__NEW=1 = 0.14
Predicted: STATUS__NEW=0 = 0.86
*------------------------------------------------------------*
Node = 19
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI, FELDA
AND Program IS ONE OF: DIPLOMA TEKNOLOGI MAKLUMAT or MISSING
AND Kelas_Grade_SPM >= CEMERLANG or MISSING
AND Kelas_Grade_Pelajar <= BAIK
then
Tree Node Identifier = 19
Number of Observations = 11
Predicted: STATUS__NEW=1 = 0.73
Predicted: STATUS__NEW=0 = 0.27
*------------------------------------------------------------*
Node = 22
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN, PTPTN/MAIPK, MAIS or MISSING
AND Kelas_Grade_Pelajar <= BAIK or MISSING
then
Tree Node Identifier = 22
Number of Observations = 874
Predicted: STATUS__NEW=1 = 0.08
Predicted: STATUS__NEW=0 = 0.92
*------------------------------------------------------------*
Node = 23
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN, PTPTN/MAIPK, MAIS or MISSING
AND Kelas_Grade_Pelajar >= BERHENTI AND Kelas_Grade_Pelajar <= BERHENTI
then
Tree Node Identifier = 23
Number of Observations = 311
Predicted: STATUS__NEW=1 = 1.00
Predicted: STATUS__NEW=0 = 0.00
*------------------------------------------------------------*
Node = 24
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN, PTPTN/MAIPK, MAIS or MISSING
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
AND Bil_Tanggungan equals All Values
then
Tree Node Identifier = 24
Number of Observations = 1862
Predicted: STATUS__NEW=1 = 0.03
Predicted: STATUS__NEW=0 = 0.97
*------------------------------------------------------------*
Node = 25
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN, PTPTN/MAIPK, MAIS or MISSING
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
AND Bil_Tanggungan equals Missing
then
Tree Node Identifier = 25
Number of Observations = 8
Predicted: STATUS__NEW=1 = 0.75
Predicted: STATUS__NEW=0 = 0.25
*------------------------------------------------------------*
Node = 28
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN, PTPTN/MAIPK, MAIS or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA or MISSING
AND Bil_Tanggungan < 5.5
then
Tree Node Identifier = 28
Number of Observations = 750
Predicted: STATUS__NEW=1 = 0.17
Predicted: STATUS__NEW=0 = 0.83
*------------------------------------------------------------*
Node = 36
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI, FELDA
AND Kelas_Grade_Pelajar >= GAGAL AND Kelas_Grade_Pelajar <= GAGAL or
MISSING
then
Tree Node Identifier = 36
Number of Observations = 101
Predicted: STATUS__NEW=1 = 1.00
Predicted: STATUS__NEW=0 = 0.00
*------------------------------------------------------------*
Node = 52
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI, FELDA
AND Program IS ONE OF: DIPLOMA SYARIAH ISLAMIYYAH, DIPLOMA
TEKNOLOGI MAKLUMAT, DIPLOMA PERAKAUNAN, DIPLOMA PERBANKAN
DAN KEWANGAN I, DIPLOMA PENGURUSAN MUAMALAT, DIPLOMA
PENTADBIRAN PERNIAGAAN, DIPLOMA MULTIMEDIA DAN DAKWAH or
MISSING
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG
AND Kategori_Kawasan_Tinggal < 2.5 or MISSING
then
Tree Node Identifier = 52
Number of Observations = 65
Predicted: STATUS__NEW=1 = 0.29
Predicted: STATUS__NEW=0 = 0.71
*------------------------------------------------------------*
Node = 53
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI, FELDA
AND Program IS ONE OF: DIPLOMA SYARIAH ISLAMIYYAH, DIPLOMA
TEKNOLOGI MAKLUMAT, DIPLOMA PERAKAUNAN, DIPLOMA PERBANKAN
DAN KEWANGAN I, DIPLOMA PENGURUSAN MUAMALAT, DIPLOMA
PENTADBIRAN PERNIAGAAN, DIPLOMA MULTIMEDIA DAN DAKWAH or
MISSING
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG
AND Kategori_Kawasan_Tinggal >= 2.5
then
Tree Node Identifier = 53
Number of Observations = 51
Predicted: STATUS__NEW=1 = 0.53
Predicted: STATUS__NEW=0 = 0.47
*------------------------------------------------------------*
Node = 54
*------------------------------------------------------------*
if Umur < 18.5 or MISSING
AND Tajaan IS ONE OF: SENDIRI, FELDA
AND Program IS ONE OF: DIPLOMA PENGAJIAN ISLAM, DIPLOMA
KAUNSELING ISLAMI
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG
then
Tree Node Identifier = 54
Number of Observations = 52
Predicted: STATUS__NEW=1 = 0.06
Predicted: STATUS__NEW=0 = 0.94
*------------------------------------------------------------*
Node = 55
*------------------------------------------------------------*
if Umur >= 18.5
AND Tajaan IS ONE OF: SENDIRI, FELDA
AND Program IS ONE OF: DIPLOMA PENGAJIAN ISLAM, DIPLOMA
KAUNSELING ISLAMI
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG
then
Tree Node Identifier = 55
Number of Observations = 5
Predicted: STATUS__NEW=1 = 0.60
Predicted: STATUS__NEW=0 = 0.40
*------------------------------------------------------------*
Node = 56
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI, FELDA
AND Kumpulan_Pendapatan_Keluarga <= M40 or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA
then
Tree Node Identifier = 56
Number of Observations = 69
Predicted: STATUS__NEW=1 = 0.59
Predicted: STATUS__NEW=0 = 0.41
*------------------------------------------------------------*
Node = 57
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI, FELDA
AND Kumpulan_Pendapatan_Keluarga >= T20
AND Kelas_Grade_Pelajar >= SEDERHANA
then
Tree Node Identifier = 57
Number of Observations = 22
Predicted: STATUS__NEW=1 = 0.18
Predicted: STATUS__NEW=0 = 0.82
*------------------------------------------------------------*
Node = 70
*------------------------------------------------------------*
if Tajaan IS ONE OF: PTPTN
AND Program IS ONE OF: DIPLOMA SYARIAH ISLAMIYYAH, DIPLOMA
USULUDDIN, DIPLOMA PENGURUSAN MUAMALAT, DIPLOMA KAUNSELING
ISLAMI, DIPLOMA KOMUNIKASI ISLAM
AND Kelas_Grade_Pelajar >= SEDERHANA or MISSING
AND Bil_Tanggungan >= 5.5 or MISSING
then
Tree Node Identifier = 70
Number of Observations = 111
Predicted: STATUS__NEW=1 = 0.50
Predicted: STATUS__NEW=0 = 0.50
*------------------------------------------------------------*
Node = 71
*------------------------------------------------------------*
if Tajaan IS ONE OF: PTPTN
AND Program IS ONE OF: DIPLOMA BAHASA DAN KESUSASTERAAN, DIPLOMA
TEKNOLOGI MAKLUMAT, DIPLOMA PERAKAUNAN, DIPLOMA PERBANKAN
DAN KEWANGAN I, DIPLOMA PENGAJIAN ISLAM, DIPLOMA PENTADBIRAN
PERNIAGAAN, DIPLOMA MULTIMEDIA DAN DAKWAH or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA or MISSING
AND Bil_Tanggungan >= 5.5 or MISSING
then
Tree Node Identifier = 71
Number of Observations = 166
Predicted: STATUS__NEW=1 = 0.27
Predicted: STATUS__NEW=0 = 0.73
*------------------------------------------------------------*
Node = 72
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN/MAIPK or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA or MISSING
AND Kategori_Kawasan_Tinggal < 3.5 or MISSING
AND Bil_Tanggungan >= 5.5 or MISSING
then
Tree Node Identifier = 72
Number of Observations = 52
Predicted: STATUS__NEW=1 = 0.06
Predicted: STATUS__NEW=0 = 0.94
*------------------------------------------------------------*
Node = 73
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN/MAIPK or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA or MISSING
AND Kategori_Kawasan_Tinggal >= 3.5
AND Bil_Tanggungan >= 5.5 or MISSING
then
Tree Node Identifier = 73
Number of Observations = 11
Predicted: STATUS__NEW=1 = 0.55
Predicted: STATUS__NEW=0 = 0.45
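Each exported rule is a conjunction of conditions, where "or MISSING" means a missing value also satisfies that condition. As an illustration only (not part of the SAS Enterprise Miner output), the Node 73 rule above can be sketched as a Python predicate; the ordinal mapping of the grade classes is an assumption inferred from the comparisons in the rules, and `None` models MISSING.

```python
# Assumed ordinal ranking of Kelas_Grade_Pelajar, inferred from the
# >=/<= comparisons in the exported rules (not stated in the output itself).
GRADE_RANK = {"CEMERLANG": 1, "BAIK": 2, "SEDERHANA": 3, "MINIMA": 4}

def matches_node_73(tajaan, kelas_grade_pelajar, kawasan_tinggal, bil_tanggungan):
    """Return True when a record satisfies the Node 73 conditions.

    None stands for a MISSING value and satisfies any condition
    carrying an 'or MISSING' branch in the exported rule.
    """
    cond_tajaan = tajaan is None or tajaan in ("MAIPK", "PTPTN/MAIPK")
    cond_grade = (kelas_grade_pelajar is None
                  or GRADE_RANK[kelas_grade_pelajar] >= GRADE_RANK["SEDERHANA"])
    cond_kawasan = kawasan_tinggal is not None and kawasan_tinggal >= 3.5
    cond_tanggungan = bil_tanggungan is None or bil_tanggungan >= 5.5
    return cond_tajaan and cond_grade and cond_kawasan and cond_tanggungan
```

A record matching this predicate would be routed to Node 73 and scored with its leaf proportions (0.55 for STATUS__NEW=1, 0.45 for STATUS__NEW=0).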
Appendix 4: Decision Tree Rules for 3 Branches with Chi-Square as the Target Criterion
*------------------------------------------------------------*
Node = 3
*------------------------------------------------------------*
if Kelas_Grade_Pelajar >= BERHENTI AND Kelas_Grade_Pelajar <= BERHENTI
then
Tree Node Identifier = 3
Number of Observations = 603
Predicted: STATUS__NEW=1 = 1.00
Predicted: STATUS__NEW=0 = 0.00
*------------------------------------------------------------*
Node = 6
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK
AND Kelas_Grade_Pelajar <= BAIK
then
Tree Node Identifier = 6
Number of Observations = 85
Predicted: STATUS__NEW=1 = 0.13
Predicted: STATUS__NEW=0 = 0.87
*------------------------------------------------------------*
Node = 7
*------------------------------------------------------------*
if Tajaan IS ONE OF: PTPTN, PTPTN/MAIPK
AND Kelas_Grade_Pelajar <= BAIK
then
Tree Node Identifier = 7
Number of Observations = 787
Predicted: STATUS__NEW=1 = 0.07
Predicted: STATUS__NEW=0 = 0.93
*------------------------------------------------------------*
Node = 9
*------------------------------------------------------------*
if Kelas_Grade_Pelajar >= GAGAL AND Kelas_Grade_Pelajar <= GAGAL
then
Tree Node Identifier = 9
Number of Observations = 524
Predicted: STATUS__NEW=1 = 1.00
Predicted: STATUS__NEW=0 = 0.00
*------------------------------------------------------------*
Node = 11
*------------------------------------------------------------*
if Umur < 18.5 or MISSING
AND Tajaan IS ONE OF: SENDIRI or MISSING
AND Kelas_Grade_Pelajar <= BAIK
then
Tree Node Identifier = 11
Number of Observations = 89
Predicted: STATUS__NEW=1 = 0.17
Predicted: STATUS__NEW=0 = 0.83
*------------------------------------------------------------*
Node = 23
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN/MAIPK
AND Kelas_Grade_Pelajar >= SEDERHANA
then
Tree Node Identifier = 23
Number of Observations = 153
Predicted: STATUS__NEW=1 = 0.10
Predicted: STATUS__NEW=0 = 0.90
*------------------------------------------------------------*
Node = 24
*------------------------------------------------------------*
if Umur >= 18.5
AND Tajaan IS ONE OF: SENDIRI or MISSING
AND Kelas_Grade_Pelajar <= BAIK
AND Kategori_Kawasan_Tinggal < 2.5 or MISSING
then
Tree Node Identifier = 24
Number of Observations = 5
Predicted: STATUS__NEW=1 = 0.20
Predicted: STATUS__NEW=0 = 0.80
*------------------------------------------------------------*
Node = 25
*------------------------------------------------------------*
if Umur >= 18.5
AND Tajaan IS ONE OF: SENDIRI or MISSING
AND Kelas_Grade_Pelajar <= BAIK
AND Kategori_Kawasan_Tinggal >= 2.5
then
Tree Node Identifier = 25
Number of Observations = 5
Predicted: STATUS__NEW=1 = 1.00
Predicted: STATUS__NEW=0 = 0.00
*------------------------------------------------------------*
Node = 33
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
AND Kategori_Kawasan_Tinggal < 1.5
then
Tree Node Identifier = 33
Number of Observations = 12
Predicted: STATUS__NEW=1 = 0.42
Predicted: STATUS__NEW=0 = 0.58
*------------------------------------------------------------*
Node = 34
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
AND Kategori_Kawasan_Tinggal < 2.5 AND Kategori_Kawasan_Tinggal >= 1.5 or
MISSING
then
Tree Node Identifier = 34
Number of Observations = 90
Predicted: STATUS__NEW=1 = 0.17
Predicted: STATUS__NEW=0 = 0.83
*------------------------------------------------------------*
Node = 36
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN, PTPTN/MAIPK or MISSING
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
AND Bil_Tanggungan equals All Values
then
Tree Node Identifier = 36
Number of Observations = 1865
Predicted: STATUS__NEW=1 = 0.03
Predicted: STATUS__NEW=0 = 0.97
*------------------------------------------------------------*
Node = 37
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN, PTPTN/MAIPK or MISSING
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
AND Bil_Tanggungan equals Missing
then
Tree Node Identifier = 37
Number of Observations = 8
Predicted: STATUS__NEW=1 = 0.75
Predicted: STATUS__NEW=0 = 0.25
*------------------------------------------------------------*
Node = 38
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI
AND Kumpulan_Pendapatan_Keluarga <= M40 or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA
then
Tree Node Identifier = 38
Number of Observations = 65
Predicted: STATUS__NEW=1 = 0.60
Predicted: STATUS__NEW=0 = 0.40
*------------------------------------------------------------*
Node = 39
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI
AND Kumpulan_Pendapatan_Keluarga >= T20
AND Kelas_Grade_Pelajar >= SEDERHANA
then
Tree Node Identifier = 39
Number of Observations = 22
Predicted: STATUS__NEW=1 = 0.18
Predicted: STATUS__NEW=0 = 0.82
*------------------------------------------------------------*
Node = 40
*------------------------------------------------------------*
if Tajaan IS ONE OF: PTPTN or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan < 4.5
then
Tree Node Identifier = 40
Number of Observations = 423
Predicted: STATUS__NEW=1 = 0.15
Predicted: STATUS__NEW=0 = 0.85
*------------------------------------------------------------*
Node = 41
*------------------------------------------------------------*
if Tajaan IS ONE OF: PTPTN or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan >= 4.5
then
Tree Node Identifier = 41
Number of Observations = 507
Predicted: STATUS__NEW=1 = 0.30
Predicted: STATUS__NEW=0 = 0.70
*------------------------------------------------------------*
Node = 42
*------------------------------------------------------------*
if Tajaan IS ONE OF: PTPTN or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan equals Missing
then
Tree Node Identifier = 42
Number of Observations = 11
Predicted: STATUS__NEW=1 = 0.73
Predicted: STATUS__NEW=0 = 0.27
*------------------------------------------------------------*
Node = 45
*------------------------------------------------------------*
if Umur < 18.5 or MISSING
AND Tajaan IS ONE OF: SENDIRI
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
AND Kategori_Kawasan_Tinggal >= 2.5
then
Tree Node Identifier = 45
Number of Observations = 59
Predicted: STATUS__NEW=1 = 0.39
Predicted: STATUS__NEW=0 = 0.61
*------------------------------------------------------------*
Node = 46
*------------------------------------------------------------*
if Umur >= 18.5
AND Tajaan IS ONE OF: SENDIRI
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
AND Kategori_Kawasan_Tinggal >= 2.5
then
Tree Node Identifier = 46
Number of Observations = 9
Predicted: STATUS__NEW=1 = 1.00
Predicted: STATUS__NEW=0 = 0.00
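Each terminal node above reports class proportions rather than a hard label. A record routed to a leaf is conventionally assigned the majority class, i.e. STATUS__NEW=1 (dropout) when the reported "Predicted: STATUS__NEW=1" proportion reaches 0.5. A minimal sketch of this scoring step (illustrative, assuming the standard 0.5 cutoff):

```python
def leaf_label(p_dropout, threshold=0.5):
    """Convert a leaf's 'Predicted: STATUS__NEW=1' proportion to a hard label.

    Returns 1 (dropout) when the dropout proportion meets the threshold,
    otherwise 0 (retained).
    """
    return 1 if p_dropout >= threshold else 0

# Examples using leaf proportions reported in Appendix 4:
# Node 46 (p=1.00) labels dropout; Node 36 (p=0.03) labels retained.
labels = {node: leaf_label(p) for node, p in {46: 1.00, 36: 0.03, 42: 0.73}.items()}
```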
Appendix 5: Decision Tree Rules for 3 Branches with Gini as the Target Criterion
*------------------------------------------------------------*
Node = 3
*------------------------------------------------------------*
if Kelas_Grade_Pelajar >= BERHENTI AND Kelas_Grade_Pelajar <= BERHENTI
then
Tree Node Identifier = 3
Number of Observations = 603
Predicted: STATUS__NEW=1 = 1.00
Predicted: STATUS__NEW=0 = 0.00
*------------------------------------------------------------*
Node = 7
*------------------------------------------------------------*
if Tajaan IS ONE OF: PTPTN, PTPTN/MAIPK
AND Kelas_Grade_Pelajar <= BAIK
then
Tree Node Identifier = 7
Number of Observations = 787
Predicted: STATUS__NEW=1 = 0.07
Predicted: STATUS__NEW=0 = 0.93
*------------------------------------------------------------*
Node = 9
*------------------------------------------------------------*
if Kelas_Grade_Pelajar >= GAGAL AND Kelas_Grade_Pelajar <= GAGAL
then
Tree Node Identifier = 9
Number of Observations = 524
Predicted: STATUS__NEW=1 = 1.00
Predicted: STATUS__NEW=0 = 0.00
*------------------------------------------------------------*
Node = 11
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI or MISSING
AND Program IS ONE OF: DIPLOMA MULTIMEDIA DAN DAKWAH
AND Kelas_Grade_Pelajar <= BAIK
then
Tree Node Identifier = 11
Number of Observations = 7
Predicted: STATUS__NEW=1 = 0.00
Predicted: STATUS__NEW=0 = 1.00
*------------------------------------------------------------*
Node = 12
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI or MISSING
AND Program IS ONE OF: DIPLOMA PERAKAUNAN, DIPLOMA PERBANKAN
DAN KEWANGAN I, DIPLOMA PENGAJIAN ISLAM, DIPLOMA KAUNSELING
ISLAMI, DIPLOMA PENTADBIRAN PERNIAGAAN
AND Kelas_Grade_Pelajar <= BAIK
then
Tree Node Identifier = 12
Number of Observations = 74
Predicted: STATUS__NEW=1 = 0.16
Predicted: STATUS__NEW=0 = 0.84
*------------------------------------------------------------*
Node = 14
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK
AND Kelas_Grade_Pelajar <= BAIK
AND Bil_Tanggungan < 2.5 or MISSING
then
Tree Node Identifier = 14
Number of Observations = 7
Predicted: STATUS__NEW=1 = 0.43
Predicted: STATUS__NEW=0 = 0.57
*------------------------------------------------------------*
Node = 15
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK
AND Kelas_Grade_Pelajar <= BAIK
AND Bil_Tanggungan < 4.5 AND Bil_Tanggungan >= 2.5
then
Tree Node Identifier = 15
Number of Observations = 25
Predicted: STATUS__NEW=1 = 0.00
Predicted: STATUS__NEW=0 = 1.00
*------------------------------------------------------------*
Node = 22
*------------------------------------------------------------*
if Tajaan IS ONE OF: PTPTN/MAIPK or MISSING
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
then
Tree Node Identifier = 22
Number of Observations = 28
Predicted: STATUS__NEW=1 = 0.00
Predicted: STATUS__NEW=0 = 1.00
*------------------------------------------------------------*
Node = 30
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI or MISSING
AND Program IS ONE OF: DIPLOMA TEKNOLOGI MAKLUMAT or MISSING
AND Kelas_Grade_SPM <= BAIK or MISSING
AND Kelas_Grade_Pelajar <= BAIK
then
Tree Node Identifier = 30
Number of Observations = 7
Predicted: STATUS__NEW=1 = 0.14
Predicted: STATUS__NEW=0 = 0.86
*------------------------------------------------------------*
Node = 31
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI or MISSING
AND Program IS ONE OF: DIPLOMA TEKNOLOGI MAKLUMAT or MISSING
AND Kelas_Grade_SPM <= MINIMA AND Kelas_Grade_SPM >= CEMERLANG
AND Kelas_Grade_Pelajar <= BAIK
then
Tree Node Identifier = 31
Number of Observations = 6
Predicted: STATUS__NEW=1 = 0.67
Predicted: STATUS__NEW=0 = 0.33
*------------------------------------------------------------*
Node = 32
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI or MISSING
AND Program IS ONE OF: DIPLOMA TEKNOLOGI MAKLUMAT or MISSING
AND Kelas_Grade_SPM >= SEDERHANA
AND Kelas_Grade_Pelajar <= BAIK
then
Tree Node Identifier = 32
Number of Observations = 5
Predicted: STATUS__NEW=1 = 0.80
Predicted: STATUS__NEW=0 = 0.20
*------------------------------------------------------------*
Node = 33
*------------------------------------------------------------*
if Umur < 18.5
AND Tajaan IS ONE OF: MAIPK
AND Kelas_Grade_Pelajar <= BAIK
AND Bil_Tanggungan >= 4.5
then
Tree Node Identifier = 33
Number of Observations = 23
Predicted: STATUS__NEW=1 = 0.04
Predicted: STATUS__NEW=0 = 0.96
*------------------------------------------------------------*
Node = 34
*------------------------------------------------------------*
if Umur < 22.5 AND Umur >= 18.5 or MISSING
AND Tajaan IS ONE OF: MAIPK
AND Kelas_Grade_Pelajar <= BAIK
AND Bil_Tanggungan >= 4.5
then
Tree Node Identifier = 34
Number of Observations = 25
Predicted: STATUS__NEW=1 = 0.16
Predicted: STATUS__NEW=0 = 0.84
*------------------------------------------------------------*
Node = 35
*------------------------------------------------------------*
if Umur >= 22.5
AND Tajaan IS ONE OF: MAIPK
AND Kelas_Grade_Pelajar <= BAIK
AND Bil_Tanggungan >= 4.5
then
Tree Node Identifier = 35
Number of Observations = 5
Predicted: STATUS__NEW=1 = 0.60
Predicted: STATUS__NEW=0 = 0.40
*------------------------------------------------------------*
Node = 48
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
AND Bil_Tanggungan < 7.5
then
Tree Node Identifier = 48
Number of Observations = 1695
Predicted: STATUS__NEW=1 = 0.03
Predicted: STATUS__NEW=0 = 0.97
*------------------------------------------------------------*
Node = 49
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
AND Bil_Tanggungan >= 7.5
then
Tree Node Identifier = 49
Number of Observations = 142
Predicted: STATUS__NEW=1 = 0.08
Predicted: STATUS__NEW=0 = 0.92
*------------------------------------------------------------*
Node = 50
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
AND Bil_Tanggungan equals Missing
then
Tree Node Identifier = 50
Number of Observations = 8
Predicted: STATUS__NEW=1 = 0.75
Predicted: STATUS__NEW=0 = 0.25
*------------------------------------------------------------*
Node = 53
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI
AND Program IS ONE OF: DIPLOMA SYARIAH ISLAMIYYAH or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA
then
Tree Node Identifier = 53
Number of Observations = 18
Predicted: STATUS__NEW=1 = 0.83
Predicted: STATUS__NEW=0 = 0.17
*------------------------------------------------------------*
Node = 55
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI
AND Program IS ONE OF: DIPLOMA PERBANKAN DAN KEWANGAN I, DIPLOMA
PENTADBIRAN PERNIAGAAN
AND Kelas_Grade_Pelajar >= SEDERHANA
then
Tree Node Identifier = 55
Number of Observations = 18
Predicted: STATUS__NEW=1 = 0.22
Predicted: STATUS__NEW=0 = 0.78
*------------------------------------------------------------*
Node = 59
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN/MAIPK
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Kategori_Kawasan_Tinggal < 2.5 or MISSING
then
Tree Node Identifier = 59
Number of Observations = 74
Predicted: STATUS__NEW=1 = 0.05
Predicted: STATUS__NEW=0 = 0.95
*------------------------------------------------------------*
Node = 91
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI
AND Program IS ONE OF: DIPLOMA SYARIAH ISLAMIYYAH, DIPLOMA
TEKNOLOGI MAKLUMAT, DIPLOMA PERAKAUNAN or MISSING
AND Kumpulan_Pendapatan_Keluarga >= M40 AND Kumpulan_Pendapatan_Keluarga
<= M40
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
then
Tree Node Identifier = 91
Number of Observations = 9
Predicted: STATUS__NEW=1 = 0.67
Predicted: STATUS__NEW=0 = 0.33
*------------------------------------------------------------*
Node = 92
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI
AND Program IS ONE OF: DIPLOMA SYARIAH ISLAMIYYAH, DIPLOMA
TEKNOLOGI MAKLUMAT, DIPLOMA PERAKAUNAN or MISSING
AND Kumpulan_Pendapatan_Keluarga >= T20
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
then
Tree Node Identifier = 92
Number of Observations = 14
Predicted: STATUS__NEW=1 = 0.21
Predicted: STATUS__NEW=0 = 0.79
*------------------------------------------------------------*
Node = 93
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI
AND Program IS ONE OF: DIPLOMA PERBANKAN DAN KEWANGAN I, DIPLOMA
PENGURUSAN MUAMALAT, DIPLOMA PENTADBIRAN PERNIAGAAN,
DIPLOMA MULTIMEDIA DAN DAKWAH
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
AND Bil_Tanggungan < 4.5
then
Tree Node Identifier = 93
Number of Observations = 31
Predicted: STATUS__NEW=1 = 0.29
Predicted: STATUS__NEW=0 = 0.71
*------------------------------------------------------------*
Node = 95
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI
AND Program IS ONE OF: DIPLOMA PERBANKAN DAN KEWANGAN I, DIPLOMA
PENGURUSAN MUAMALAT, DIPLOMA PENTADBIRAN PERNIAGAAN,
DIPLOMA MULTIMEDIA DAN DAKWAH
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
AND Bil_Tanggungan >= 6.5 or MISSING
then
Tree Node Identifier = 95
Number of Observations = 10
Predicted: STATUS__NEW=1 = 0.20
Predicted: STATUS__NEW=0 = 0.80
*------------------------------------------------------------*
Node = 96
*------------------------------------------------------------*
if Umur < 18.5 or MISSING
AND Tajaan IS ONE OF: SENDIRI
AND Program IS ONE OF: DIPLOMA PENGAJIAN ISLAM, DIPLOMA
KAUNSELING ISLAMI
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
then
Tree Node Identifier = 96
Number of Observations = 52
Predicted: STATUS__NEW=1 = 0.06
Predicted: STATUS__NEW=0 = 0.94
*------------------------------------------------------------*
Node = 97
*------------------------------------------------------------*
if Umur >= 18.5
AND Tajaan IS ONE OF: SENDIRI
AND Program IS ONE OF: DIPLOMA PENGAJIAN ISLAM, DIPLOMA
KAUNSELING ISLAMI
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
then
Tree Node Identifier = 97
Number of Observations = 5
Predicted: STATUS__NEW=1 = 0.60
Predicted: STATUS__NEW=0 = 0.40
*------------------------------------------------------------*
Node = 108
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI
AND Program IS ONE OF: DIPLOMA TEKNOLOGI MAKLUMAT, DIPLOMA
PERAKAUNAN, DIPLOMA PENGAJIAN ISLAM, DIPLOMA KAUNSELING
ISLAMI, DIPLOMA MULTIMEDIA DAN DAKWAH
AND Kumpulan_Pendapatan_Keluarga >= T20
AND Kelas_Grade_Pelajar >= SEDERHANA
then
Tree Node Identifier = 108
Number of Observations = 12
Predicted: STATUS__NEW=1 = 0.08
Predicted: STATUS__NEW=0 = 0.92
*------------------------------------------------------------*
Node = 112
*------------------------------------------------------------*
if Tajaan IS ONE OF: PTPTN or MISSING
AND Program IS ONE OF: DIPLOMA PENGURUSAN MUAMALAT, DIPLOMA
PENGAJIAN ISLAM, DIPLOMA KAUNSELING ISLAMI, DIPLOMA BAHASA
ARAB DENGAN PENDI or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan < 4.5
then
Tree Node Identifier = 112
Number of Observations = 178
Predicted: STATUS__NEW=1 = 0.16
Predicted: STATUS__NEW=0 = 0.84
*------------------------------------------------------------*
Node = 113
*------------------------------------------------------------*
if Tajaan IS ONE OF: PTPTN or MISSING
AND Program IS ONE OF: DIPLOMA PERAKAUNAN, DIPLOMA PERBANKAN
DAN KEWANGAN I, DIPLOMA PENTADBIRAN PERNIAGAAN, DIPLOMA
KOMUNIKASI ISLAM, DIPLOMA SAINS KOMPUTER DAN RANGK, DIPLOMA
MULTIMEDIA DAN DAKWAH
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan < 4.5
then
Tree Node Identifier = 113
Number of Observations = 162
Predicted: STATUS__NEW=1 = 0.07
Predicted: STATUS__NEW=0 = 0.93
*------------------------------------------------------------*
Node = 114
*------------------------------------------------------------*
if Tajaan IS ONE OF: PTPTN or MISSING
AND Program IS ONE OF: DIPLOMA SYARIAH ISLAMIYYAH, DIPLOMA
USULUDDIN, DIPLOMA PENGURUSAN MUAMALAT, DIPLOMA KAUNSELING
ISLAMI, DIPLOMA PENTADBIRAN PERNIAGAAN, DIPLOMA KOMUNIKASI
ISLAM
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan >= 4.5
then
Tree Node Identifier = 114
Number of Observations = 220
Predicted: STATUS__NEW=1 = 0.39
Predicted: STATUS__NEW=0 = 0.61
*------------------------------------------------------------*
Node = 116
*------------------------------------------------------------*
if Tajaan IS ONE OF: PTPTN or MISSING
AND Program IS ONE OF: DIPLOMA SAINS KOMPUTER DAN RANGK, DIPLOMA
MULTIMEDIA DAN DAKWAH
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan >= 4.5
then
Tree Node Identifier = 116
Number of Observations = 31
Predicted: STATUS__NEW=1 = 0.13
Predicted: STATUS__NEW=0 = 0.87
*------------------------------------------------------------*
Node = 117
*------------------------------------------------------------*
if Umur < 18.5 or MISSING
AND Tajaan IS ONE OF: PTPTN or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan equals Missing
then
Tree Node Identifier = 117
Number of Observations = 6
Predicted: STATUS__NEW=1 = 1.00
Predicted: STATUS__NEW=0 = 0.00
*------------------------------------------------------------*
Node = 118
*------------------------------------------------------------*
if Umur >= 18.5
AND Tajaan IS ONE OF: PTPTN or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan equals Missing
then
Tree Node Identifier = 118
Number of Observations = 5
Predicted: STATUS__NEW=1 = 0.40
Predicted: STATUS__NEW=0 = 0.60
*------------------------------------------------------------*
Node = 123
*------------------------------------------------------------*
if Umur < 21.5 AND Umur >= 18.5 or MISSING
AND Tajaan IS ONE OF: MAIPK, PTPTN/MAIPK
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Kategori_Kawasan_Tinggal < 3.5 AND Kategori_Kawasan_Tinggal >= 2.5
then
Tree Node Identifier = 123
Number of Observations = 36
Predicted: STATUS__NEW=1 = 0.00
Predicted: STATUS__NEW=0 = 1.00
*------------------------------------------------------------*
Node = 124
*------------------------------------------------------------*
if Umur >= 21.5
AND Tajaan IS ONE OF: MAIPK, PTPTN/MAIPK
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Kategori_Kawasan_Tinggal < 3.5 AND Kategori_Kawasan_Tinggal >= 2.5
then
Tree Node Identifier = 124
Number of Observations = 6
Predicted: STATUS__NEW=1 = 0.17
Predicted: STATUS__NEW=0 = 0.83
*------------------------------------------------------------*
Node = 125
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN/MAIPK
AND Kelas_Grade_SPM <= CEMERLANG
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Kategori_Kawasan_Tinggal >= 3.5
then
Tree Node Identifier = 125
Number of Observations = 11
Predicted: STATUS__NEW=1 = 0.27
Predicted: STATUS__NEW=0 = 0.73
*------------------------------------------------------------*
Node = 126
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN/MAIPK
AND Kelas_Grade_SPM >= MINIMA or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Kategori_Kawasan_Tinggal >= 3.5
then
Tree Node Identifier = 126
Number of Observations = 6
Predicted: STATUS__NEW=1 = 0.83
Predicted: STATUS__NEW=0 = 0.17
*------------------------------------------------------------*
Node = 157
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI
AND Program IS ONE OF: DIPLOMA SYARIAH ISLAMIYYAH, DIPLOMA
TEKNOLOGI MAKLUMAT, DIPLOMA PERAKAUNAN or MISSING
AND Kumpulan_Pendapatan_Keluarga <= B40 or MISSING
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
AND Bil_Tanggungan < 4.5
then
Tree Node Identifier = 157
Number of Observations = 12
Predicted: STATUS__NEW=1 = 0.42
Predicted: STATUS__NEW=0 = 0.58
*------------------------------------------------------------*
Node = 158
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI
AND Program IS ONE OF: DIPLOMA SYARIAH ISLAMIYYAH, DIPLOMA
TEKNOLOGI MAKLUMAT, DIPLOMA PERAKAUNAN or MISSING
AND Kumpulan_Pendapatan_Keluarga <= B40 or MISSING
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
AND Bil_Tanggungan < 6.5 AND Bil_Tanggungan >= 4.5
then
Tree Node Identifier = 158
Number of Observations = 6
Predicted: STATUS__NEW=1 = 0.33
Predicted: STATUS__NEW=0 = 0.67
*------------------------------------------------------------*
Node = 159
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI
AND Program IS ONE OF: DIPLOMA SYARIAH ISLAMIYYAH, DIPLOMA
TEKNOLOGI MAKLUMAT, DIPLOMA PERAKAUNAN or MISSING
AND Kumpulan_Pendapatan_Keluarga <= B40 or MISSING
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
AND Bil_Tanggungan >= 6.5 or MISSING
then
Tree Node Identifier = 159
Number of Observations = 6
Predicted: STATUS__NEW=1 = 1.00
Predicted: STATUS__NEW=0 = 0.00
*------------------------------------------------------------*
Node = 165
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI
AND Program IS ONE OF: DIPLOMA PERBANKAN DAN KEWANGAN I, DIPLOMA
PENGURUSAN MUAMALAT, DIPLOMA PENTADBIRAN PERNIAGAAN,
DIPLOMA MULTIMEDIA DAN DAKWAH
AND Kumpulan_Pendapatan_Keluarga <= B40
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
AND Bil_Tanggungan < 6.5 AND Bil_Tanggungan >= 4.5
then
Tree Node Identifier = 165
Number of Observations = 11
Predicted: STATUS__NEW=1 = 0.82
Predicted: STATUS__NEW=0 = 0.18
*------------------------------------------------------------*
Node = 166
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI
AND Program IS ONE OF: DIPLOMA PERBANKAN DAN KEWANGAN I, DIPLOMA
PENGURUSAN MUAMALAT, DIPLOMA PENTADBIRAN PERNIAGAAN,
DIPLOMA MULTIMEDIA DAN DAKWAH
AND Kumpulan_Pendapatan_Keluarga >= M40 or MISSING
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
AND Bil_Tanggungan < 6.5 AND Bil_Tanggungan >= 4.5
then
Tree Node Identifier = 166
Number of Observations = 14
Predicted: STATUS__NEW=1 = 0.29
Predicted: STATUS__NEW=0 = 0.71
*------------------------------------------------------------*
Node = 187
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI
AND Program IS ONE OF: DIPLOMA TEKNOLOGI MAKLUMAT, DIPLOMA
PERAKAUNAN, DIPLOMA PENGAJIAN ISLAM, DIPLOMA KAUNSELING
ISLAMI, DIPLOMA MULTIMEDIA DAN DAKWAH
AND Kumpulan_Pendapatan_Keluarga <= B40 or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Kategori_Kawasan_Tinggal < 2.5
then
Tree Node Identifier = 187
Number of Observations = 9
Predicted: STATUS__NEW=1 = 0.33
Predicted: STATUS__NEW=0 = 0.67
*------------------------------------------------------------*
Node = 188
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI
AND Program IS ONE OF: DIPLOMA TEKNOLOGI MAKLUMAT, DIPLOMA
PERAKAUNAN, DIPLOMA PENGAJIAN ISLAM, DIPLOMA KAUNSELING
ISLAMI, DIPLOMA MULTIMEDIA DAN DAKWAH
AND Kumpulan_Pendapatan_Keluarga <= B40 or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Kategori_Kawasan_Tinggal >= 2.5 or MISSING
then
Tree Node Identifier = 188
Number of Observations = 17
Predicted: STATUS__NEW=1 = 0.71
Predicted: STATUS__NEW=0 = 0.29
*------------------------------------------------------------*
Node = 189
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI
AND Program IS ONE OF: DIPLOMA TEKNOLOGI MAKLUMAT, DIPLOMA
PERAKAUNAN, DIPLOMA PENGAJIAN ISLAM, DIPLOMA KAUNSELING
ISLAMI, DIPLOMA MULTIMEDIA DAN DAKWAH
AND Kumpulan_Pendapatan_Keluarga >= M40 AND Kumpulan_Pendapatan_Keluarga
<= M40
AND Kelas_Grade_SPM <= CEMERLANG or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA
then
Tree Node Identifier = 189
Number of Observations = 7
Predicted: STATUS__NEW=1 = 0.43
Predicted: STATUS__NEW=0 = 0.57
*------------------------------------------------------------*
Node = 190
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI
AND Program IS ONE OF: DIPLOMA TEKNOLOGI MAKLUMAT, DIPLOMA
PERAKAUNAN, DIPLOMA PENGAJIAN ISLAM, DIPLOMA KAUNSELING
ISLAMI, DIPLOMA MULTIMEDIA DAN DAKWAH
AND Kumpulan_Pendapatan_Keluarga >= M40 AND Kumpulan_Pendapatan_Keluarga
<= M40
AND Kelas_Grade_SPM >= MINIMA
AND Kelas_Grade_Pelajar >= SEDERHANA
then
Tree Node Identifier = 190
Number of Observations = 6
Predicted: STATUS__NEW=1 = 0.83
Predicted: STATUS__NEW=0 = 0.17
*------------------------------------------------------------*
Node = 195
*------------------------------------------------------------*
if Tajaan IS ONE OF: PTPTN or MISSING
AND Program IS ONE OF: DIPLOMA SYARIAH ISLAMIYYAH, DIPLOMA
USULUDDIN, DIPLOMA TEKNOLOGI MAKLUMAT
AND Kelas_Grade_SPM <= CEMERLANG or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan < 4.5
then
Tree Node Identifier = 195
Number of Observations = 58
Predicted: STATUS__NEW=1 = 0.26
Predicted: STATUS__NEW=0 = 0.74
*------------------------------------------------------------*
Node = 196
*------------------------------------------------------------*
if Tajaan IS ONE OF: PTPTN or MISSING
AND Program IS ONE OF: DIPLOMA SYARIAH ISLAMIYYAH, DIPLOMA
USULUDDIN, DIPLOMA TEKNOLOGI MAKLUMAT
AND Kelas_Grade_SPM >= MINIMA AND Kelas_Grade_SPM <= MINIMA
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan < 4.5
then
Tree Node Identifier = 196
Number of Observations = 5
Predicted: STATUS__NEW=1 = 0.60
Predicted: STATUS__NEW=0 = 0.40
*------------------------------------------------------------*
Node = 197
*------------------------------------------------------------*
if Tajaan IS ONE OF: PTPTN or MISSING
AND Program IS ONE OF: DIPLOMA SYARIAH ISLAMIYYAH, DIPLOMA
USULUDDIN, DIPLOMA TEKNOLOGI MAKLUMAT
AND Kelas_Grade_SPM >= SEDERHANA
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan < 4.5
then
Tree Node Identifier = 197
Number of Observations = 20
Predicted: STATUS__NEW=1 = 0.30
Predicted: STATUS__NEW=0 = 0.70
*------------------------------------------------------------*
Node = 207
*------------------------------------------------------------*
if Umur < 19.5 or MISSING
AND Tajaan IS ONE OF: PTPTN or MISSING
AND Program IS ONE OF: DIPLOMA BAHASA DAN KESUSASTERAAN, DIPLOMA
TEKNOLOGI MAKLUMAT, DIPLOMA PERAKAUNAN, DIPLOMA PERBANKAN
DAN KEWANGAN I, DIPLOMA PENGAJIAN ISLAM or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan >= 4.5
then
Tree Node Identifier = 207
Number of Observations = 231
Predicted: STATUS__NEW=1 = 0.23
Predicted: STATUS__NEW=0 = 0.77
*------------------------------------------------------------*
Node = 208
*------------------------------------------------------------*
if Umur < 20.5 AND Umur >= 19.5
AND Tajaan IS ONE OF: PTPTN or MISSING
AND Program IS ONE OF: DIPLOMA BAHASA DAN KESUSASTERAAN, DIPLOMA
TEKNOLOGI MAKLUMAT, DIPLOMA PERAKAUNAN, DIPLOMA PERBANKAN
DAN KEWANGAN I, DIPLOMA PENGAJIAN ISLAM or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan >= 4.5
then
Tree Node Identifier = 208
Number of Observations = 15
Predicted: STATUS__NEW=1 = 0.27
Predicted: STATUS__NEW=0 = 0.73
*------------------------------------------------------------*
Node = 209
*------------------------------------------------------------*
if Umur >= 20.5
AND Tajaan IS ONE OF: PTPTN or MISSING
AND Program IS ONE OF: DIPLOMA BAHASA DAN KESUSASTERAAN, DIPLOMA
TEKNOLOGI MAKLUMAT, DIPLOMA PERAKAUNAN, DIPLOMA PERBANKAN
DAN KEWANGAN I, DIPLOMA PENGAJIAN ISLAM or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan >= 4.5
then
Tree Node Identifier = 209
Number of Observations = 10
Predicted: STATUS__NEW=1 = 0.80
Predicted: STATUS__NEW=0 = 0.20
*------------------------------------------------------------*
Node = 216
*------------------------------------------------------------*
if Umur < 18.5
AND Tajaan IS ONE OF: MAIPK, PTPTN/MAIPK
AND Program IS ONE OF: DIPLOMA SYARIAH ISLAMIYYAH
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Kategori_Kawasan_Tinggal < 3.5 AND Kategori_Kawasan_Tinggal >= 2.5
then
Tree Node Identifier = 216
Number of Observations = 5
Predicted: STATUS__NEW=1 = 0.60
Predicted: STATUS__NEW=0 = 0.40
*------------------------------------------------------------*
Node = 217
*------------------------------------------------------------*
if Umur < 18.5
AND Tajaan IS ONE OF: MAIPK, PTPTN/MAIPK
AND Program equals Missing
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Kategori_Kawasan_Tinggal < 3.5 AND Kategori_Kawasan_Tinggal >= 2.5
then
Tree Node Identifier = 217
Number of Observations = 15
Predicted: STATUS__NEW=1 = 0.00
Predicted: STATUS__NEW=0 = 1.00
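Each terminal node above reports a pair of class probabilities rather than a hard label. A minimal sketch (not part of the thesis output; the 0.5 threshold is an assumption) of how such probabilities would be turned into a predicted STATUS__NEW label:

```python
# Hypothetical helper (not from the thesis): convert a terminal node's
# reported probabilities into a predicted STATUS__NEW label.
def predict_label(p_dropout, p_stay, threshold=0.5):
    """Return 1 (drop out) if P(STATUS__NEW=1) meets the threshold, else 0."""
    # The two reported probabilities should sum to ~1 (allowing for rounding).
    assert abs(p_dropout + p_stay - 1.0) < 0.02
    return 1 if p_dropout >= threshold else 0

print(predict_label(0.71, 0.29))  # e.g. Node 188 above -> 1 (predicted dropout)
print(predict_label(0.23, 0.77))  # e.g. Node 207 above -> 0 (predicted non-dropout)
```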
Appendix 6: Decision Tree Rules of the 3-Branch Tree with Entropy as Target Criterion
*------------------------------------------------------------*
Node = 3
*------------------------------------------------------------*
if Kelas_Grade_Pelajar >= BERHENTI AND Kelas_Grade_Pelajar <= BERHENTI
then
Tree Node Identifier = 3
Number of Observations = 603
Predicted: STATUS__NEW=1 = 1.00
Predicted: STATUS__NEW=0 = 0.00
*------------------------------------------------------------*
Node = 7
*------------------------------------------------------------*
if Tajaan IS ONE OF: PTPTN, PTPTN/MAIPK
AND Kelas_Grade_Pelajar <= BAIK
then
Tree Node Identifier = 7
Number of Observations = 787
Predicted: STATUS__NEW=1 = 0.07
Predicted: STATUS__NEW=0 = 0.93
*------------------------------------------------------------*
Node = 9
*------------------------------------------------------------*
if Kelas_Grade_Pelajar >= GAGAL AND Kelas_Grade_Pelajar <= GAGAL
then
Tree Node Identifier = 9
Number of Observations = 524
Predicted: STATUS__NEW=1 = 1.00
Predicted: STATUS__NEW=0 = 0.00
*------------------------------------------------------------*
Node = 11
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI or MISSING
AND Program IS ONE OF: DIPLOMA MULTIMEDIA DAN DAKWAH
AND Kelas_Grade_Pelajar <= BAIK
then
Tree Node Identifier = 11
Number of Observations = 7
Predicted: STATUS__NEW=1 = 0.00
Predicted: STATUS__NEW=0 = 1.00
*------------------------------------------------------------*
Node = 12
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI or MISSING
AND Program IS ONE OF: DIPLOMA PERAKAUNAN, DIPLOMA PERBANKAN
DAN KEWANGAN I, DIPLOMA PENGAJIAN ISLAM, DIPLOMA KAUNSELING
ISLAMI, DIPLOMA PENTADBIRAN PERNIAGAAN
AND Kelas_Grade_Pelajar <= BAIK
then
Tree Node Identifier = 12
Number of Observations = 74
Predicted: STATUS__NEW=1 = 0.16
Predicted: STATUS__NEW=0 = 0.84
*------------------------------------------------------------*
Node = 14
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK
AND Kelas_Grade_Pelajar <= BAIK
AND Bil_Tanggungan < 2.5 or MISSING
then
Tree Node Identifier = 14
Number of Observations = 7
Predicted: STATUS__NEW=1 = 0.43
Predicted: STATUS__NEW=0 = 0.57
*------------------------------------------------------------*
Node = 15
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK
AND Kelas_Grade_Pelajar <= BAIK
AND Bil_Tanggungan < 4.5 AND Bil_Tanggungan >= 2.5
then
Tree Node Identifier = 15
Number of Observations = 25
Predicted: STATUS__NEW=1 = 0.00
Predicted: STATUS__NEW=0 = 1.00
*------------------------------------------------------------*
Node = 22
*------------------------------------------------------------*
if Tajaan IS ONE OF: PTPTN/MAIPK or MISSING
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
then
Tree Node Identifier = 22
Number of Observations = 28
Predicted: STATUS__NEW=1 = 0.00
Predicted: STATUS__NEW=0 = 1.00
*------------------------------------------------------------*
Node = 30
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI or MISSING
AND Program IS ONE OF: DIPLOMA TEKNOLOGI MAKLUMAT or MISSING
AND Kelas_Grade_SPM <= BAIK or MISSING
AND Kelas_Grade_Pelajar <= BAIK
then
Tree Node Identifier = 30
Number of Observations = 7
Predicted: STATUS__NEW=1 = 0.14
Predicted: STATUS__NEW=0 = 0.86
*------------------------------------------------------------*
Node = 31
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI or MISSING
AND Program IS ONE OF: DIPLOMA TEKNOLOGI MAKLUMAT or MISSING
AND Kelas_Grade_SPM <= MINIMA AND Kelas_Grade_SPM >= CEMERLANG
AND Kelas_Grade_Pelajar <= BAIK
then
Tree Node Identifier = 31
Number of Observations = 6
Predicted: STATUS__NEW=1 = 0.67
Predicted: STATUS__NEW=0 = 0.33
*------------------------------------------------------------*
Node = 32
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI or MISSING
AND Program IS ONE OF: DIPLOMA TEKNOLOGI MAKLUMAT or MISSING
AND Kelas_Grade_SPM >= SEDERHANA
AND Kelas_Grade_Pelajar <= BAIK
then
Tree Node Identifier = 32
Number of Observations = 5
Predicted: STATUS__NEW=1 = 0.80
Predicted: STATUS__NEW=0 = 0.20
*------------------------------------------------------------*
Node = 33
*------------------------------------------------------------*
if Umur < 19.5 or MISSING
AND Tajaan IS ONE OF: MAIPK
AND Kelas_Grade_Pelajar <= BAIK
AND Bil_Tanggungan >= 4.5
then
Tree Node Identifier = 33
Number of Observations = 39
Predicted: STATUS__NEW=1 = 0.13
Predicted: STATUS__NEW=0 = 0.87
*------------------------------------------------------------*
Node = 34
*------------------------------------------------------------*
if Umur < 22.5 AND Umur >= 19.5
AND Tajaan IS ONE OF: MAIPK
AND Kelas_Grade_Pelajar <= BAIK
AND Bil_Tanggungan >= 4.5
then
Tree Node Identifier = 34
Number of Observations = 9
Predicted: STATUS__NEW=1 = 0.00
Predicted: STATUS__NEW=0 = 1.00
*------------------------------------------------------------*
Node = 35
*------------------------------------------------------------*
if Umur >= 22.5
AND Tajaan IS ONE OF: MAIPK
AND Kelas_Grade_Pelajar <= BAIK
AND Bil_Tanggungan >= 4.5
then
Tree Node Identifier = 35
Number of Observations = 5
Predicted: STATUS__NEW=1 = 0.60
Predicted: STATUS__NEW=0 = 0.40
*------------------------------------------------------------*
Node = 43
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI
AND Program IS ONE OF: DIPLOMA PERBANKAN DAN KEWANGAN I, DIPLOMA
PENGURUSAN MUAMALAT, DIPLOMA PENTADBIRAN PERNIAGAAN,
DIPLOMA MULTIMEDIA DAN DAKWAH
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
then
Tree Node Identifier = 43
Number of Observations = 66
Predicted: STATUS__NEW=1 = 0.36
Predicted: STATUS__NEW=0 = 0.64
*------------------------------------------------------------*
Node = 45
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
AND Bil_Tanggungan < 7.5
then
Tree Node Identifier = 45
Number of Observations = 1695
Predicted: STATUS__NEW=1 = 0.03
Predicted: STATUS__NEW=0 = 0.97
*------------------------------------------------------------*
Node = 46
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
AND Bil_Tanggungan >= 7.5
then
Tree Node Identifier = 46
Number of Observations = 142
Predicted: STATUS__NEW=1 = 0.08
Predicted: STATUS__NEW=0 = 0.92
*------------------------------------------------------------*
Node = 47
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
AND Bil_Tanggungan equals Missing
then
Tree Node Identifier = 47
Number of Observations = 8
Predicted: STATUS__NEW=1 = 0.75
Predicted: STATUS__NEW=0 = 0.25
*------------------------------------------------------------*
Node = 51
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan < 7.5 AND Bil_Tanggungan >= 6.5 or MISSING
then
Tree Node Identifier = 51
Number of Observations = 8
Predicted: STATUS__NEW=1 = 0.13
Predicted: STATUS__NEW=0 = 0.88
*------------------------------------------------------------*
Node = 52
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan >= 7.5
then
Tree Node Identifier = 52
Number of Observations = 7
Predicted: STATUS__NEW=1 = 1.00
Predicted: STATUS__NEW=0 = 0.00
*------------------------------------------------------------*
Node = 56
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN/MAIPK
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Kategori_Kawasan_Tinggal < 2.5 or MISSING
then
Tree Node Identifier = 56
Number of Observations = 74
Predicted: STATUS__NEW=1 = 0.05
Predicted: STATUS__NEW=0 = 0.95
*------------------------------------------------------------*
Node = 75
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI
AND Program IS ONE OF: DIPLOMA SYARIAH ISLAMIYYAH, DIPLOMA
TEKNOLOGI MAKLUMAT, DIPLOMA PERAKAUNAN or MISSING
AND Kumpulan_Pendapatan_Keluarga >= M40 AND Kumpulan_Pendapatan_Keluarga
<= M40
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
then
Tree Node Identifier = 75
Number of Observations = 9
Predicted: STATUS__NEW=1 = 0.67
Predicted: STATUS__NEW=0 = 0.33
*------------------------------------------------------------*
Node = 76
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI
AND Program IS ONE OF: DIPLOMA SYARIAH ISLAMIYYAH, DIPLOMA
TEKNOLOGI MAKLUMAT, DIPLOMA PERAKAUNAN or MISSING
AND Kumpulan_Pendapatan_Keluarga >= T20
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
then
Tree Node Identifier = 76
Number of Observations = 14
Predicted: STATUS__NEW=1 = 0.21
Predicted: STATUS__NEW=0 = 0.79
*------------------------------------------------------------*
Node = 79
*------------------------------------------------------------*
if Umur < 18.5 or MISSING
AND Tajaan IS ONE OF: SENDIRI
AND Program IS ONE OF: DIPLOMA PENGAJIAN ISLAM, DIPLOMA
KAUNSELING ISLAMI
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
then
Tree Node Identifier = 79
Number of Observations = 52
Predicted: STATUS__NEW=1 = 0.06
Predicted: STATUS__NEW=0 = 0.94
*------------------------------------------------------------*
Node = 80
*------------------------------------------------------------*
if Umur >= 18.5
AND Tajaan IS ONE OF: SENDIRI
AND Program IS ONE OF: DIPLOMA PENGAJIAN ISLAM, DIPLOMA
KAUNSELING ISLAMI
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
then
Tree Node Identifier = 80
Number of Observations = 5
Predicted: STATUS__NEW=1 = 0.60
Predicted: STATUS__NEW=0 = 0.40
*------------------------------------------------------------*
Node = 88
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI
AND Kumpulan_Pendapatan_Keluarga >= M40 AND Kumpulan_Pendapatan_Keluarga
<= M40
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan < 6.5
then
Tree Node Identifier = 88
Number of Observations = 14
Predicted: STATUS__NEW=1 = 0.71
Predicted: STATUS__NEW=0 = 0.29
*------------------------------------------------------------*
Node = 89
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI
AND Kumpulan_Pendapatan_Keluarga >= T20
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan < 6.5
then
Tree Node Identifier = 89
Number of Observations = 19
Predicted: STATUS__NEW=1 = 0.16
Predicted: STATUS__NEW=0 = 0.84
*------------------------------------------------------------*
Node = 91
*------------------------------------------------------------*
if Tajaan IS ONE OF: PTPTN or MISSING
AND Program IS ONE OF: DIPLOMA PENGURUSAN MUAMALAT, DIPLOMA
PENGAJIAN ISLAM, DIPLOMA KAUNSELING ISLAMI, DIPLOMA BAHASA
ARAB DENGAN PENDI or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan < 4.5
then
Tree Node Identifier = 91
Number of Observations = 178
Predicted: STATUS__NEW=1 = 0.16
Predicted: STATUS__NEW=0 = 0.84
*------------------------------------------------------------*
Node = 92
*------------------------------------------------------------*
if Tajaan IS ONE OF: PTPTN or MISSING
AND Program IS ONE OF: DIPLOMA PERAKAUNAN, DIPLOMA PERBANKAN
DAN KEWANGAN I, DIPLOMA PENTADBIRAN PERNIAGAAN, DIPLOMA
KOMUNIKASI ISLAM, DIPLOMA SAINS KOMPUTER DAN RANGK, DIPLOMA
MULTIMEDIA DAN DAKWAH
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan < 4.5
then
Tree Node Identifier = 92
Number of Observations = 162
Predicted: STATUS__NEW=1 = 0.07
Predicted: STATUS__NEW=0 = 0.93
*------------------------------------------------------------*
Node = 93
*------------------------------------------------------------*
if Tajaan IS ONE OF: PTPTN or MISSING
AND Program IS ONE OF: DIPLOMA SYARIAH ISLAMIYYAH, DIPLOMA
USULUDDIN, DIPLOMA PENGURUSAN MUAMALAT, DIPLOMA KAUNSELING
ISLAMI, DIPLOMA PENTADBIRAN PERNIAGAAN, DIPLOMA KOMUNIKASI
ISLAM
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan >= 4.5
then
Tree Node Identifier = 93
Number of Observations = 220
Predicted: STATUS__NEW=1 = 0.39
Predicted: STATUS__NEW=0 = 0.61
*------------------------------------------------------------*
Node = 95
*------------------------------------------------------------*
if Tajaan IS ONE OF: PTPTN or MISSING
AND Program IS ONE OF: DIPLOMA SAINS KOMPUTER DAN RANGK, DIPLOMA
MULTIMEDIA DAN DAKWAH
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan >= 4.5
then
Tree Node Identifier = 95
Number of Observations = 31
Predicted: STATUS__NEW=1 = 0.13
Predicted: STATUS__NEW=0 = 0.87
*------------------------------------------------------------*
Node = 96
*------------------------------------------------------------*
if Umur < 18.5 or MISSING
AND Tajaan IS ONE OF: PTPTN or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan equals Missing
then
Tree Node Identifier = 96
Number of Observations = 6
Predicted: STATUS__NEW=1 = 1.00
Predicted: STATUS__NEW=0 = 0.00
*------------------------------------------------------------*
Node = 97
*------------------------------------------------------------*
if Umur >= 18.5
AND Tajaan IS ONE OF: PTPTN or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan equals Missing
then
Tree Node Identifier = 97
Number of Observations = 5
Predicted: STATUS__NEW=1 = 0.40
Predicted: STATUS__NEW=0 = 0.60
*------------------------------------------------------------*
Node = 102
*------------------------------------------------------------*
if Umur < 21.5 AND Umur >= 18.5 or MISSING
AND Tajaan IS ONE OF: MAIPK, PTPTN/MAIPK
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Kategori_Kawasan_Tinggal < 3.5 AND Kategori_Kawasan_Tinggal >= 2.5
then
Tree Node Identifier = 102
Number of Observations = 36
Predicted: STATUS__NEW=1 = 0.00
Predicted: STATUS__NEW=0 = 1.00
*------------------------------------------------------------*
Node = 103
*------------------------------------------------------------*
if Umur >= 21.5
AND Tajaan IS ONE OF: MAIPK, PTPTN/MAIPK
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Kategori_Kawasan_Tinggal < 3.5 AND Kategori_Kawasan_Tinggal >= 2.5
then
Tree Node Identifier = 103
Number of Observations = 6
Predicted: STATUS__NEW=1 = 0.17
Predicted: STATUS__NEW=0 = 0.83
*------------------------------------------------------------*
Node = 104
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN/MAIPK
AND Kelas_Grade_SPM <= CEMERLANG
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Kategori_Kawasan_Tinggal >= 3.5
then
Tree Node Identifier = 104
Number of Observations = 11
Predicted: STATUS__NEW=1 = 0.27
Predicted: STATUS__NEW=0 = 0.73
*------------------------------------------------------------*
Node = 105
*------------------------------------------------------------*
if Tajaan IS ONE OF: MAIPK, PTPTN/MAIPK
AND Kelas_Grade_SPM >= MINIMA or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Kategori_Kawasan_Tinggal >= 3.5
then
Tree Node Identifier = 105
Number of Observations = 6
Predicted: STATUS__NEW=1 = 0.83
Predicted: STATUS__NEW=0 = 0.17
*------------------------------------------------------------*
Node = 131
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI
AND Program IS ONE OF: DIPLOMA SYARIAH ISLAMIYYAH, DIPLOMA
TEKNOLOGI MAKLUMAT, DIPLOMA PERAKAUNAN or MISSING
AND Kumpulan_Pendapatan_Keluarga <= B40 or MISSING
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
AND Bil_Tanggungan < 4.5
then
Tree Node Identifier = 131
Number of Observations = 12
Predicted: STATUS__NEW=1 = 0.42
Predicted: STATUS__NEW=0 = 0.58
*------------------------------------------------------------*
Node = 132
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI
AND Program IS ONE OF: DIPLOMA SYARIAH ISLAMIYYAH, DIPLOMA
TEKNOLOGI MAKLUMAT, DIPLOMA PERAKAUNAN or MISSING
AND Kumpulan_Pendapatan_Keluarga <= B40 or MISSING
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
AND Bil_Tanggungan < 6.5 AND Bil_Tanggungan >= 4.5
then
Tree Node Identifier = 132
Number of Observations = 6
Predicted: STATUS__NEW=1 = 0.33
Predicted: STATUS__NEW=0 = 0.67
*------------------------------------------------------------*
Node = 133
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI
AND Program IS ONE OF: DIPLOMA SYARIAH ISLAMIYYAH, DIPLOMA
TEKNOLOGI MAKLUMAT, DIPLOMA PERAKAUNAN or MISSING
AND Kumpulan_Pendapatan_Keluarga <= B40 or MISSING
AND Kelas_Grade_Pelajar >= CEMERLANG AND Kelas_Grade_Pelajar <=
CEMERLANG or MISSING
AND Bil_Tanggungan >= 6.5 or MISSING
then
Tree Node Identifier = 133
Number of Observations = 6
Predicted: STATUS__NEW=1 = 1.00
Predicted: STATUS__NEW=0 = 0.00
*------------------------------------------------------------*
Node = 155
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI
AND Kumpulan_Pendapatan_Keluarga <= B40 or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan < 2.5
then
Tree Node Identifier = 155
Number of Observations = 7
Predicted: STATUS__NEW=1 = 0.29
Predicted: STATUS__NEW=0 = 0.71
*------------------------------------------------------------*
Node = 156
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI
AND Kumpulan_Pendapatan_Keluarga <= B40 or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan < 3.5 AND Bil_Tanggungan >= 2.5
then
Tree Node Identifier = 156
Number of Observations = 5
Predicted: STATUS__NEW=1 = 0.80
Predicted: STATUS__NEW=0 = 0.20
*------------------------------------------------------------*
Node = 157
*------------------------------------------------------------*
if Tajaan IS ONE OF: SENDIRI
AND Kumpulan_Pendapatan_Keluarga <= B40 or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan < 6.5 AND Bil_Tanggungan >= 3.5 or MISSING
then
Tree Node Identifier = 157
Number of Observations = 27
Predicted: STATUS__NEW=1 = 0.59
Predicted: STATUS__NEW=0 = 0.41
*------------------------------------------------------------*
Node = 162
*------------------------------------------------------------*
if Tajaan IS ONE OF: PTPTN or MISSING
AND Program IS ONE OF: DIPLOMA SYARIAH ISLAMIYYAH, DIPLOMA
USULUDDIN, DIPLOMA TEKNOLOGI MAKLUMAT
AND Kelas_Grade_SPM <= CEMERLANG or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan < 4.5
then
Tree Node Identifier = 162
Number of Observations = 58
Predicted: STATUS__NEW=1 = 0.26
Predicted: STATUS__NEW=0 = 0.74
*------------------------------------------------------------*
Node = 163
*------------------------------------------------------------*
if Tajaan IS ONE OF: PTPTN or MISSING
AND Program IS ONE OF: DIPLOMA SYARIAH ISLAMIYYAH, DIPLOMA
USULUDDIN, DIPLOMA TEKNOLOGI MAKLUMAT
AND Kelas_Grade_SPM >= MINIMA AND Kelas_Grade_SPM <= MINIMA
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan < 4.5
then
Tree Node Identifier = 163
Number of Observations = 5
Predicted: STATUS__NEW=1 = 0.60
Predicted: STATUS__NEW=0 = 0.40
*------------------------------------------------------------*
Node = 164
*------------------------------------------------------------*
if Tajaan IS ONE OF: PTPTN or MISSING
AND Program IS ONE OF: DIPLOMA SYARIAH ISLAMIYYAH, DIPLOMA
USULUDDIN, DIPLOMA TEKNOLOGI MAKLUMAT
AND Kelas_Grade_SPM >= SEDERHANA
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan < 4.5
then
Tree Node Identifier = 164
Number of Observations = 20
Predicted: STATUS__NEW=1 = 0.30
Predicted: STATUS__NEW=0 = 0.70
*------------------------------------------------------------*
Node = 174
*------------------------------------------------------------*
if Umur < 19.5 or MISSING
AND Tajaan IS ONE OF: PTPTN or MISSING
AND Program IS ONE OF: DIPLOMA BAHASA DAN KESUSASTERAAN, DIPLOMA
TEKNOLOGI MAKLUMAT, DIPLOMA PERAKAUNAN, DIPLOMA PERBANKAN
DAN KEWANGAN I, DIPLOMA PENGAJIAN ISLAM or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan >= 4.5
then
Tree Node Identifier = 174
Number of Observations = 231
Predicted: STATUS__NEW=1 = 0.23
Predicted: STATUS__NEW=0 = 0.77
*------------------------------------------------------------*
Node = 175
*------------------------------------------------------------*
if Umur < 20.5 AND Umur >= 19.5
AND Tajaan IS ONE OF: PTPTN or MISSING
AND Program IS ONE OF: DIPLOMA BAHASA DAN KESUSASTERAAN, DIPLOMA
TEKNOLOGI MAKLUMAT, DIPLOMA PERAKAUNAN, DIPLOMA PERBANKAN
DAN KEWANGAN I, DIPLOMA PENGAJIAN ISLAM or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan >= 4.5
then
Tree Node Identifier = 175
Number of Observations = 15
Predicted: STATUS__NEW=1 = 0.27
Predicted: STATUS__NEW=0 = 0.73
*------------------------------------------------------------*
Node = 176
*------------------------------------------------------------*
if Umur >= 20.5
AND Tajaan IS ONE OF: PTPTN or MISSING
AND Program IS ONE OF: DIPLOMA BAHASA DAN KESUSASTERAAN, DIPLOMA
TEKNOLOGI MAKLUMAT, DIPLOMA PERAKAUNAN, DIPLOMA PERBANKAN
DAN KEWANGAN I, DIPLOMA PENGAJIAN ISLAM or MISSING
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Bil_Tanggungan >= 4.5
then
Tree Node Identifier = 176
Number of Observations = 10
Predicted: STATUS__NEW=1 = 0.80
Predicted: STATUS__NEW=0 = 0.20
*------------------------------------------------------------*
Node = 183
*------------------------------------------------------------*
if Umur < 18.5
AND Tajaan IS ONE OF: MAIPK, PTPTN/MAIPK
AND Program IS ONE OF: DIPLOMA SYARIAH ISLAMIYYAH
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Kategori_Kawasan_Tinggal < 3.5 AND Kategori_Kawasan_Tinggal >= 2.5
then
Tree Node Identifier = 183
Number of Observations = 5
Predicted: STATUS__NEW=1 = 0.60
Predicted: STATUS__NEW=0 = 0.40
*------------------------------------------------------------*
Node = 184
*------------------------------------------------------------*
if Umur < 18.5
AND Tajaan IS ONE OF: MAIPK, PTPTN/MAIPK
AND Program equals Missing
AND Kelas_Grade_Pelajar >= SEDERHANA
AND Kategori_Kawasan_Tinggal < 3.5 AND Kategori_Kawasan_Tinggal >= 2.5
then
Tree Node Identifier = 184
Number of Observations = 15
Predicted: STATUS__NEW=1 = 0.00
Predicted: STATUS__NEW=0 = 1.00
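To illustrate how one of the extracted rules above reads as an executable predicate, the following sketch encodes Node 45 (Tajaan in {MAIPK, PTPTN}, Kelas_Grade_Pelajar equal to CEMERLANG or missing, Bil_Tanggungan < 7.5) against a hypothetical student record; this is an illustration only, not code from the thesis, and treating "MISSING" as a None value is an assumption.

```python
# Sketch (not from the thesis): the Node 45 rule above, applied to a student
# record held as a dict. Field names mirror the rule variables; the record
# structure and the handling of MISSING as None are assumptions.
def node_45_rule(rec):
    """Return the reported (P(STATUS__NEW=1), P(STATUS__NEW=0)) pair if the
    record satisfies the Node 45 conditions, else None."""
    tajaan_ok = rec.get("Tajaan") in {"MAIPK", "PTPTN"}
    grade = rec.get("Kelas_Grade_Pelajar")
    grade_ok = grade == "CEMERLANG" or grade is None  # CEMERLANG or MISSING
    tanggungan = rec.get("Bil_Tanggungan")
    tanggungan_ok = tanggungan is not None and tanggungan < 7.5
    if tajaan_ok and grade_ok and tanggungan_ok:
        return (0.03, 0.97)  # probabilities reported for Node 45
    return None

student = {"Tajaan": "PTPTN", "Kelas_Grade_Pelajar": "CEMERLANG",
           "Bil_Tanggungan": 5}
print(node_45_rule(student))  # (0.03, 0.97)
```

In the same way, every rule in these appendices is a conjunction of interval or membership tests on the input variables ending in a fixed probability pair, so the whole tree could be flattened into an ordered list of such predicates.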