
Journal of Informatics Engineering CIT Medicom, xx (x) (20xx) xx-xx

Published by: Institute of Computer Science (IOCS)

CIT Medicom Information Engineering Journal


Journal homepage: www.medikom.iocspublisher.org

Application of the Decision Tree Algorithm to classify the
interest level of students studying at Pelita Bangsa University
Kaleb Suy1, Abdul Halim Anshor, S.Kom., M.Kom2, Heri Anto Simamora3, Mutia Adinda Utami4,
Michael Valentino Laisina5, Chandra Wijaya Pardosi6
1),2),3),4),5),6) Department of Informatics, Pelita Bangsa University, West Java, Indonesia
1)khzputrafsr@gmail.com, 2)abdulhalimanshor@pelitabangsa.ac.id, 3)heriantosimamora6@gmail.com,
4)mutiaadindautami2607@gmail.com, 5)mikelvalentino47@gmail.com, 6)chandrawijayapardosi@gmail.coom,

Article Info

Article history:
Received : Jan' 2024
Revised : Jan' 2024
Accepted : Jan' 2024

Keywords:
Classification, Decision Tree Algorithm, C4.5, Classification of Student Interests, Pelita Bangsa University

Abstract

Improving the quality of higher education services is a main focus
for universities in responding to student needs. This study proposes
the application of the Decision Tree algorithm to classify students'
level of interest in studying at Pelita Bangsa University. Data collection
was carried out through surveys and structured interviews with
students registered in various study programs. This data is then used
to train a Decision Tree model to understand the patterns and factors that
influence student interest.

This model allows the identification of critical variables that contribute
significantly to students' decisions to attend college. The results of the
interest level classification can provide insight to universities to
improve new student admission strategies, develop academic guidance
programs, and develop facilities that suit student preferences. The
successful implementation of the Decision Tree algorithm is expected
to strengthen Pelita Bangsa University's competitiveness in providing
a satisfying and relevant academic experience for students. This study
contributes to the literature on the application of artificial
intelligence in higher education management and provides a
foundation for the development of more effective decision-making
strategies in the future.

Corresponding Author:
Kaleb Suy,
Department of Informatics
Pelita Bangsa University
South Cikarang, Bekasi, West Java, Indonesia, 17530
khzputrafsr@gmail.com
This is an open access article under the CC BY-NC license.

JTI CIT p-ISSN 2337-8646 e-ISSN 2721-561X

Introduction

Higher education plays an important role in shaping individuals and their contribution to
the progress of society. Assessing students' interest in attending a university is important for
understanding their needs and maximizing their learning experience. Decision tree algorithms offer a
practical way to handle this complexity by analyzing and classifying students' interest levels.

Pelita Bangsa University, a progressive educational institution, is committed to providing
high-quality education and to thoroughly understanding the characteristics of the students who attend it.
Universities can use decision tree algorithms to discover patterns that influence student interest,
strengthen student determination, and improve the overall quality of learning.

This research examines the use of the decision tree algorithm to classify the level
of interest of students studying at Pelita Bangsa University. By understanding the factors that influence
student interest, universities are expected to be able to take proactive action in improving learning
strategies, developing more effective mentoring programs, and improving the quality of the academic
experience.

This research also approaches the issue from two perspectives. It is hoped that the results of this
research can provide meaningful insight for Pelita Bangsa University and other higher education
institutions that wish to increase their understanding of student interests.

Through this research, we hope to make a positive contribution to the development of
knowledge in the field of higher education management and provide a basis for more informed
decision making at the university level.

Method

The data used to form a decision tree for analyzing the determination of students studying at Pelita
Bangsa University covers five factors: Quality, Strategic, Cost, Facilities, and Career Demands. The data
is then pre-processed to produce case data that is ready to be formed into a decision tree.
Incomplete data is caused by empty values or incorrect attributes. The same applies to the interest data of
students studying at Pelita Bangsa University: based on student experience, some attributes
are not necessary, so data preprocessing needs to be carried out so that the database
complies with the required provisions.
Data preprocessing is an important part of the data mining process and includes:

1. Data Selection
Data on the interests of students studying at UPB, based on this experience, will become
case data in the operational data mining process. From the existing data, the column taken as
the decision attribute is the result, while the columns taken as the determining attributes in
forming the decision tree are:
a. Student name
b. Residence
c. Student Jobs
d. Campus Excellence


2. Data Preprocessing / Data Cleaning
This cleaning process includes, among other things, removing duplicate data, checking for
inconsistent data, and correcting errors in the data. The 224 records collected are analyzed
to determine whether any are inconsistent or irrelevant and would therefore disrupt the rule
pattern of the algorithm to be formed.

3. Data Transformation
In this process, data is transformed into a form suitable for the data mining process
(Cynthia & Ismanto, 2018).
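
As a rough illustration of the selection, cleaning, and transformation steps described above, the following is a minimal sketch using only the Python standard library. The column names, the sample rows, and the helper functions `preprocess` and `encode` are hypothetical and are not taken from the study's questionnaire or tooling; the sketch only assumes that each record holds the four determining attributes and the result column.

```python
# Hypothetical survey rows; names and values are illustrative only.
raw_rows = [
    {"name": "A", "residence": "Bekasi", "job": "Employed",
     "excellence": "Cost", "result": "Interested"},
    {"name": "A", "residence": "Bekasi", "job": "Employed",
     "excellence": "Cost", "result": "Interested"},      # exact duplicate
    {"name": "B", "residence": "Cikarang", "job": "",
     "excellence": "Quality", "result": "Interested"},   # incomplete row
    {"name": "C", "residence": "cikarang ", "job": "Unemployed",
     "excellence": "Facilities", "result": "Not interested"},
]

# Assumed column layout: four determining attributes plus the decision column.
COLUMNS = ["name", "residence", "job", "excellence", "result"]


def preprocess(rows):
    """Select columns, normalise text, drop incomplete and duplicate rows."""
    seen = set()
    cleaned = []
    for row in rows:
        # Data selection: keep only the chosen columns, with tidy spelling.
        record = {key: row.get(key, "").strip().title() for key in COLUMNS}
        # Data cleaning: skip rows that contain an empty value.
        if any(value == "" for value in record.values()):
            continue
        # Data cleaning: skip rows already seen.
        fingerprint = tuple(record[key] for key in COLUMNS)
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(record)
    return cleaned


def encode(rows, column):
    """Data transformation: map each category of a column to an integer code."""
    categories = sorted({row[column] for row in rows})
    return {value: index for index, value in enumerate(categories)}


cleaned_rows = preprocess(raw_rows)
residence_codes = encode(cleaned_rows, "residence")
print(len(cleaned_rows), residence_codes)
```

In this sketch the duplicate and the incomplete row are dropped, leaving two records. Whether incomplete rows should be dropped or repaired is a design choice the paper does not specify.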

4. Decision Trees
The C4.5 algorithm is used to carry out predictive classification or
segmentation by forming decision trees. Decision trees are a very
powerful and well-known classification and prediction method. They convert
large sets of facts into rules that can be easily understood (Marlina & Bakri, 2021).

According to Kusrini, in general the C4.5 algorithm for building a decision
tree consists of several stages, outlined below.
a. Select the attribute as root
b. Create a branch for each value
c. Divide cases into branches
d. Repeat the process for each branch until all cases on the branch have the same class.

Selecting an attribute as the root node. The root attribute is selected by calculating
the gain value of every attribute; the attribute with the highest gain value is chosen as the
first root. Before the gain value can be determined, the entropy value must first be
calculated using the following equation.

$$Entropy(S) = \sum_{i=1}^{n} -p_i \log_2 p_i$$

where:
S = set of cases
n = number of partitions of S
p_i = proportion of S_i to S
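
The entropy calculation described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard definition Entropy(S) = sum of -p_i * log2(p_i) over the classes in S. The function name `entropy` and the sample labels are hypothetical and are not taken from the study's dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy(S) = sum over classes of -p_i * log2(p_i)."""
    total = len(labels)
    counts = Counter(labels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical class labels, for illustration only.
balanced = ["Interested", "Not interested"] * 2
pure = ["Interested"] * 4

print(entropy(balanced))  # 1.0 (two classes, evenly split)
print(entropy(pure))      # 0.0 (a single class)
```

An entropy of 0 means every case in the set has the same class, which is the stopping condition in stage d of the outline.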

After that, determine the gain value using the following equation.

$$Gain(S, A) = Entropy(S) - \sum_{i=1}^{n} \frac{|S_i|}{|S|} \times Entropy(S_i)$$

where:
S = set of cases
A = attribute
n = number of partitions of attribute A
|S_i| = number of cases in the i-th partition
|S| = number of cases in S
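
To show how the gain value drives the four stages outlined earlier, the following is a minimal sketch of a recursive tree builder. It follows the paper's description and selects attributes by plain information gain; a full C4.5 implementation also uses gain ratio, handles continuous attributes, and prunes the tree, none of which is shown here. The function names, attribute names, and sample cases are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy(S) = sum over classes of -p_i * log2(p_i)."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total)
               for c in Counter(labels).values())


def gain(rows, attribute, target):
    """Gain(S, A) = Entropy(S) - sum(|S_i| / |S| * Entropy(S_i))."""
    base = entropy([row[target] for row in rows])
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    weighted = sum(len(part) / len(rows) * entropy(part)
                   for part in partitions.values())
    return base - weighted


def build_tree(rows, attributes, target):
    """Return a nested dict tree, or a class label at a leaf."""
    labels = [row[target] for row in rows]
    # Stop when the branch is pure or no attributes remain (majority class).
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # (a) Select the attribute with the highest gain as the root.
    best = max(attributes, key=lambda a: gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    # (b) Create a branch for each value and (c) divide the cases.
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        # (d) Repeat the process for each branch.
        tree[best][value] = build_tree(subset, remaining, target)
    return tree


# Hypothetical cases, for illustration only.
cases = [
    {"cost": "Low", "facilities": "Good", "result": "Interested"},
    {"cost": "Low", "facilities": "Poor", "result": "Interested"},
    {"cost": "High", "facilities": "Good", "result": "Not interested"},
    {"cost": "High", "facilities": "Poor", "result": "Not interested"},
]

tree = build_tree(cases, ["cost", "facilities"], "result")
print(tree)
```

In this toy data the cost attribute separates the classes completely (gain of 1.0) while facilities carries no information (gain of 0.0), so cost becomes the root and both branches end in leaves.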


Results and Discussion

1. Business Understanding
The problem addressed in this research is that many students leave partway through their
studies because they feel they have chosen a major that does not suit their talents and
interests. In this research, classification was carried out to determine the variables that most
influence the choice of major in higher education, in order to help students.

2. Data Understanding
The data used in this research was obtained from questionnaires completed by
several students at Pelita Bangsa University.

Table 1. Dataset

3. Data Preparation
After the data collection process, the data is prepared for the testing stage. This stage
includes selecting tables, records and data variables, including the process of cleaning
variables that will not affect the final results.

Table 2. Testing Data


4. Modelling
The modeling process tests the decision tree model. After the data preparation
process, the Set Role process is used to determine the label. Decision tree validation
is then used in the training process, while the testing process uses Apply Model.
The training and testing processes yield the confusion matrix (accuracy level, class
precision, and class recall) and the ROC curve (AUC value).
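
The validation step splits the data into training and testing portions. The following is a minimal sketch of how a k-fold split can be produced in plain Python. It is not the RapidMiner operator itself; the helper `k_fold_indices`, the number of folds, and the random seed are all assumptions, since the paper does not state the validation settings used.

```python
import random


def k_fold_indices(n_rows, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical settings: 10 records and 5 folds.
splits = list(k_fold_indices(n_rows=10, k=5))
for train, test in splits:
    print(len(train), len(test))
```

Each record appears in exactly one test fold, so every record is used for testing once and for training in the remaining rounds.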

Figure 1. Modeling

5. Results and Discussion

5.1. Decision Tree algorithm modeling C4.5
The C4.5 Decision Tree model uses 12 attributes, which are the
attributes of the selected study program, and the class is the final prediction result. The
model was built in RapidMiner and takes the form of a decision tree,
as shown in the figure below:

Figure 2. Decision tree


The decision tree in Figure 2 contains the rules generated by the C4.5 Decision
Tree classification algorithm.

5.2. Evaluation
In this research, after the classification results and rules in the form of a decision tree
were created, the performance of the C4.5 Decision Tree classification model was
tested using cross validation (confusion matrix) and the Area Under the Curve (AUC).
Details of classification errors are obtained from the confusion matrix.

Table 3. Confusion Matrix
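
To make the evaluation measures concrete, the following is a minimal sketch of how accuracy, class precision, and class recall can be derived from a confusion matrix. The function names and the actual and predicted labels are hypothetical and do not reproduce the study's results.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


def accuracy(matrix):
    """Share of all cases whose predicted label equals the actual label."""
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / sum(matrix.values())


def precision(matrix, label):
    """Share of cases predicted as `label` that really are `label`."""
    predicted_as = sum(n for (a, p), n in matrix.items() if p == label)
    return matrix[(label, label)] / predicted_as if predicted_as else 0.0


def recall(matrix, label):
    """Share of cases that really are `label` and were predicted as such."""
    actually = sum(n for (a, p), n in matrix.items() if a == label)
    return matrix[(label, label)] / actually if actually else 0.0


# Hypothetical labels, for illustration only.
actual = ["Yes", "Yes", "Yes", "No", "No", "No", "No", "Yes"]
predicted = ["Yes", "Yes", "No", "No", "No", "Yes", "No", "Yes"]

matrix = confusion_matrix(actual, predicted)
print(accuracy(matrix))          # 0.75
print(precision(matrix, "Yes"))  # 0.75
print(recall(matrix, "Yes"))     # 0.75
```

Here 6 of the 8 hypothetical cases are classified correctly, and 3 of the 4 "Yes" predictions and 3 of the 4 actual "Yes" cases are correct, which gives 0.75 for all three measures.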


Conclusion

The conclusions obtained in this research are as follows:

- This classification system application can provide cross-interest class results through
attributes of 25 items for each alternative. The attributes used are obtained from the
Talent criteria according to the study program.
- The calculation has 2 stages: the training stage, in which 30 student records were
taken for processing, and the testing stage, which involved 26 student records.
At the testing stage an accuracy value of 77.78% was obtained.
- This talent-based study program classification model can be used as an alternative
reference for lecturers to group students based on interests and talents once
the decision tree is formed.

References

[1] Khusnul Khotimah, "Decision Tree Algorithm (C4.5) for Predicting KIP Scholarship Selection at
Muhammadiyah University Kotabumi," October 2021.
[2] Suryadi Hozeng, Sitti Aisa, "Data Mining Application Using the Decision Tree Method for Credit Risk
Prediction," STMIK Dipanegara Makassar, August 2016.
[3] Annas Prasetio, "SIMULATION OF THE APPLICATION OF THE DECISION TREE (C4.5) METHOD IN
DETERMINING THE NUTRITIONAL STATUS OF TODDLER," June 2021.
[4] Yuli Mardi, "DATA MINING MEDICAL RECORDS TO DETERMINE MOST DISEASES USING DECISION
TREE C4.5," Academy of Health Recorders and Information, Jl. Gajah Mada No. 23 Padang, April 2018.
[5] Maryam, Huan Wendy Ariono, "Mobile-Based Cervical Cancer Stage Classifier Expert System Using the
Decision Tree Method," Muhammadiyah University of Surakarta, September 2022.
[6] Hananda Hafizan, Anggita Nadia Putri, "Application of the Decision Tree Classification Method to the
Nutritional Status of Toddlers in Simalungun Regency," STIKOM Tunas Bangsa, Medan, April 2020.
[7] Imam Sutoyo, "IMPLEMENTATION OF THE DECISION TREE ALGORITHM FOR CLASSIFICATION OF
STUDENT DATA," Bina Sarana Informatika University Jakarta, September 2018.
[8] Susi Fitryah Damanik, Anjar Wanto, and Indra Gunawan, "Application of the C4.5 Decision Tree
Algorithm for Classification of Family Welfare Levels in Tiga Dolok Village," STIKOM Tunas Bangsa,
Pematangsiantar, January 2022.
[9] Fida Maisa Hana, "Classification of Diabetes Patients Using the C4.5 Decision Tree Algorithm,"
Muhammadiyah University of Kudus, September 2020.
[10] Asmaul Husnah Nasrullah, "IMPLEMENTATION OF THE DECISION TREE ALGORITHM FOR
CLASSIFICATION OF BEST-SELLING PRODUCTS," Faculty of Computer Science, Ichsan University
Gorontalo, Gorontalo, September 2021.
[11] Ade Yuliana, Duwi Bayu Pratomo, "DECISION TREE ALGORITHM (C4.5) FOR PREDICTING STUDENT
SATISFACTION WITH THE PERFORMANCE OF TEDC BANDUNG POLYTECHNIC LECTURERS,"
Informatics Engineering - TEDC Bandung Polytechnic, February 2017.
[12] Ismasari Nawangsih, Agus Setiawan, "IMPLEMENTATION OF DATA MINING TO PREDICTION
DAMAGED GOODS USING THE C4.5 ALGORITHM AT THE COMPANY PT HOME CENTER
INDONESIA," Informatics Engineering, STT Pelita Bangsa, March 2019.
[13] Niki Ratama, "Analysis and Comparison of Asthma Diagnosis Application Systems Using the Certainty
Factor Algorithm and Android-Based Decision Tree Algorithm," Informatics Engineering, Pamulang
University, South Tangerang, May 2018.
[14] Msy Aulia Hasanah, Sopian Soim, Ade Silvia Handayani, "Implementation of the CRISP-DM Model Using
the Decision Tree Method with the CART Algorithm for Predicting Rainfall with Flood Potential,"
Electrical Engineering, Sriwijaya State Polytechnic, September 2021.
[15] Ari Muzakir, Rika Anisa Wulandari, "Data Mining Model as Prediction of Pregnancy Hypertension using
Decision Tree Technique," Informatics Engineering, Bina Darma University Palembang, May 2016.
[16] Randi Estian Pambudi, Sriyanto, Firmansyah, "Stroke Classification Using the C.45 Decision Tree
Algorithm," Informatics & Business Institute Darmajaya, December 2022.

[17] Yudhi Pratama Tanjung, Steven Sentinuwo, Agustinus Jacobus, "Determining Household Electrical Power
Using the Decision Tree Method," Informatics Engineering, Sam Ratulangi University, 2014.
[18] Eka Pandu Cynthia, Edi Ismanto, "DECISION TREE ALGORITHM C.45 METHOD IN CLASSIFYING
SALES DATA OF FAST FOOD OUTLET BUSINESSES," Journal of Information Systems Research and
Informatics Engineering, July 2018.
[19] Sigit Abdillah, "APPLICATION OF THE C4.5 DECISION TREE ALGORITHM FOR DIAGNOSIS OF
STROKE USING DATA MINING CLASSIFICATION AT SANTA MARIA PEMALANG HOSPITAL,"
Informatics Engineering, Dian Nuswantoro University, 2022.
[20] Saeful Bahri, "IMPLEMENTATION OF DATA MINING TO DETERMINE STUDENT INTERESTS IN
DETERMINING A MAJOR IN HIGHER COLLEGE," Ahmad Dahlan Institute of Technology and Business,
June 2022.
[21] Mugi Raharjo, Ridwan, Jordy Lasmana Putra, Tommi Alfian Armawan Sandi, "Implementation of the Data
Mining Classification Decision Tree Method for Predicting Students' Interest in Robotics Majors,"
Computer Engineering, STMIK Nusa Mandiri, August 2019.
[22] Suherman, Marlia Purnamasari, Fitriani Dwi Hastuti, "CLASSIFICATION OF STUDENTS BASED ON
CROSS INTEREST SUBJECTS USING THE C4.5 DECISION TREE METHOD," Informatics Engineering,
Serang Raya University, September 2021.
[23] Zakarias Situmorang, Sartika Mandasari, Yuni Franciska, Karina Andriyani, Puji Sari Ramadhan, "C45
ALGORITHM IN PREDICTING PROSPECTIVE STUDENT INTERESTS," Journal of Science and Social
Research, February 2022.
[24] Basrie Basrie, "APPLICATION OF THE C 4.5 ALGORITHM FOR DETERMINING FACULTY AT STATE
ISLAMIC UNIVERSITIES FOR PROSPECTIVE STUDENTS BASED ON INTERESTS AND TALENTS,"
Sultan Aji Muhammad Idris State Islamic University Samarinda, 2022.
[25] Bramefio Qibran Husaini, Jemakmun, "Application of the C45 Decision Tree Algorithm for Classification
of Student Majors," Information Systems, Bina Darma University, March 2023.
[26] Chairun Nas, "Data Mining Predicts the Interest of Prospective Students in Choosing Higher Education
Using the C4.5 Algorithm," Management Informatics, Catur Insan Scholar University, Cirebon, October
2021.
