
Review Journal

Title Building Multiclass Classification Model of Logistic Regression and Decision Tree Using the Chi-Square Test for Variable Selection Method
Name of Journal Journal of Hunan University (Natural Sciences)
Volume and Pages Volume 49, No. 4, pp. 172-181
Year 2022
Authors Waego H. Nugroho, Samingun Handoyo, Yusnita J. Akri, Agus D. Sulistyono
Reviewer Fajar Djabar and Febriyani Kadir
Date 12 February 2022

Research Objectives 1. Identify the main factors that affect toddlers' health conditions in Malang, Indonesia.
2. Build multiclass logistic regression and decision tree classification models to predict the health status of toddlers based on the identified factors.
3. Measure the performance of the models using appropriate metrics.
Subject of Research The health status of toddlers in Malang, Indonesia, and the main factors that influence their health conditions.
Research Methods The research methods used in this study include data collection, data preprocessing, variable selection, and model development and evaluation. The authors collected data on toddlers' health status and their parents' demographic and health-related information from a hospital in Malang, Indonesia. They preprocessed the data by handling missing values, outliers, and the imbalanced class distribution. They used the chi-square test to select the most important predictor variables for the multiclass logistic regression and decision tree models. They developed both models using the selected variables and evaluated their performance with several metrics, including accuracy, precision, recall, and F1-score. The authors also compared the performance of the two models and analyzed the importance of the predictor variables in the decision tree model.
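The Methods row lists accuracy, precision, recall, and F1-score but does not say how the per-class scores were combined for the multiclass case. The following is a minimal Python sketch that assumes macro averaging (an unweighted mean over classes). The function name `classification_scores` and the toy labels are hypothetical and are not taken from the paper.

```python
def classification_scores(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1-score (assumed averaging)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        # One-vs-rest counts for the current class.
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    k = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


# Hypothetical labels for three health-status classes.
scores = classification_scores(["A", "A", "B", "C"], ["A", "B", "B", "C"])
print(scores)
```

Macro averaging gives every class equal weight. This matters under the class imbalance noted in the Weaknesses row, because a weighted average would mostly reflect the majority class.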
Operational Definition of Dependent Variables The dependent variable is the health status of toddlers, as recorded at a hospital in Malang, Indonesia. The authors formed it by combining two binary response variables into a single multiclass categorical variable, and the study aims to identify the main factors that affect it.
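Methods and Instrument for Measuring Dependent Variables The chi-square test assesses the dependency between two categorical features:
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
where $O_{ij}$ and $E_{ij}$ are the observed and expected frequencies in cell $(i, j)$ of an $r \times c$ contingency table. $E_{ij}$ is the product of the row $i$ total and the column $j$ total, divided by the total number of observations.
For binary classes, the logistic regression classification model uses the sigmoid function, defined in the machine learning approach as follows:
$$\sigma(a) = \frac{1}{1 + \exp(-a)}$$
where $a$ is a linear combination of the predictor variables.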
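As a numerical check of the chi-square statistic defined above, the sketch below computes $\chi^2$ and its degrees of freedom, $(r-1)(c-1)$, from a contingency table of one categorical predictor against health status. It assumes a significance level of 0.05 and compares the statistic with standard critical values. The review does not state the authors' significance level, and the counts in the example are made up.

```python
# Upper-tail critical values of the chi-square distribution at alpha = 0.05,
# keyed by degrees of freedom (standard table values; alpha is an assumption).
CRITICAL_0_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    r, c = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(c)]
    n = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")

    chi2 = 0.0
    for i in range(r):
        for j in range(c):
            expected = row_totals[i] * col_totals[j] / n  # E_ij
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2, (r - 1) * (c - 1)


def is_dependent(table, critical=CRITICAL_0_05):
    """Keep a predictor if the test rejects independence at alpha = 0.05."""
    chi2, df = chi_square_statistic(table)
    return chi2 > critical[df]


# Made-up counts: rows are predictor categories, columns are health-status classes.
strong = [[30, 10], [10, 30]]
weak = [[20, 20], [20, 20]]
print(chi_square_statistic(strong), is_dependent(strong))  # (20.0, 1) True
print(chi_square_statistic(weak), is_dependent(weak))      # (0.0, 1) False
```

In a variable-selection setting, the same check would be run once per candidate predictor (X1 to X12), and only the predictors that reject independence would be retained.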

Operational Definition of Independent Variables Twelve independent variables (attributes) that may affect toddlers' health conditions were considered:
Maternal blood pressure before pregnancy (X1)
Maternal history of diabetes before pregnancy (X2)
Mother’s psychological condition (X3)
Father’s blood pressure (X4)
Paternal history of diabetes (X5)
Father’s congenital disease (X6)
Family welfare education (X7)
Father’s psychological condition (X8)
Family income (X9)
Drinking water quality (X10)
House floor condition (X11)
House sanitation condition (X12)
Research Steps 1. Data collection: The authors collected data on toddlers' health status and
their parents' demographic and health-related information from a hospital
in Malang, Indonesia.
2. Data analysis: The authors used statistical and machine learning techniques to analyze the data, including the chi-square test as the variable selection method.
3. Feature selection: The authors used a data-centric approach to select the
most important predictor variables for building the classification models.
4. Model development: The authors developed multiclass logistic
regression and decision tree classification models using the selected
predictor variables.
5. Model evaluation: The authors evaluated the models on the test portion of the dataset and compared their performance.
6. Results interpretation: The authors interpreted the results of their study
and discussed the implications of their findings.
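Steps 4 and 5 involve a multiclass logistic regression model, while the Methods row gives only the binary sigmoid. One common multiclass formulation is multinomial (softmax) logistic regression. The review does not say whether the authors used softmax or one-vs-rest sigmoids, so the sketch below is an assumption. It trains weights by batch gradient descent on cross-entropy loss, using hypothetical binary-coded predictors in place of the selected variables.

```python
import math


def sigmoid(a):
    """Binary logistic function: sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))


def softmax(scores):
    """Class probabilities from linear scores; subtracting the max avoids overflow."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_probabilities(W, x):
    xb = [1.0] + list(x)  # prepend 1.0 so W[k][0] acts as the bias of class k
    return softmax([sum(w * v for w, v in zip(row, xb)) for row in W])


def fit_softmax_regression(X, y, n_classes, lr=0.5, epochs=1000):
    """Batch gradient descent on mean cross-entropy; y holds ints in [0, n_classes)."""
    d = len(X[0]) + 1
    W = [[0.0] * d for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * d for _ in range(n_classes)]
        for x, label in zip(X, y):
            xb = [1.0] + list(x)
            probs = class_probabilities(W, x)
            for k in range(n_classes):
                # Gradient of cross-entropy w.r.t. class-k weights: (p_k - y_k) * x.
                err = probs[k] - (1.0 if k == label else 0.0)
                for i in range(d):
                    grad[k][i] += err * xb[i]
        for k in range(n_classes):
            for i in range(d):
                W[k][i] -= lr * grad[k][i] / len(X)
    return W


def predict(W, x):
    probs = class_probabilities(W, x)
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical binary-coded predictors and three health-status classes.
X = [[0, 0], [1, 0], [0, 1]]
y = [0, 1, 2]
W = fit_softmax_regression(X, y, n_classes=3)
print([predict(W, x) for x in X])  # [0, 1, 2]
```

With two classes, softmax over the scores $(0, a)$ reduces to the sigmoid: the probability of the second class is $\sigma(a)$. This is why the binary formula in the Methods row is a special case of the multiclass model.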
Research Results Feature selection identified four main factors affecting the health status of children under five in Malang: the mother's history of diabetes before pregnancy (X2), the father's blood pressure (X4), the father's psychological condition (X8), and drinking water quality (X10).
Research Strengths The main strength of this research lies in the decision tree classification model, which shows the important features clearly. Each split displays the class proportions produced by a feature. Features that yield high class proportions therefore become the separating features, and these important features can then be explored directly.
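The strength described above rests on how a tree picks a splitting feature from class proportions. The review does not name the authors' splitting criterion. The sketch below assumes Gini impurity with a multiway split on each category, which is one common choice, and uses made-up feature names and labels.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_k^2), where p_k are the class proportions in a node."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(rows, labels, feature):
    """Weighted Gini impurity of the child nodes when splitting on every category of `feature`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    n = len(labels)
    return sum(len(g) / n * gini(g) for g in groups.values())


def best_split(rows, labels, features):
    """The feature whose split leaves the purest children (lowest weighted impurity)."""
    return min(features, key=lambda f: split_impurity(rows, labels, f))


# Made-up records: feature_a separates the classes, feature_b does not.
rows = [
    {"feature_a": 0, "feature_b": 0},
    {"feature_a": 0, "feature_b": 1},
    {"feature_a": 1, "feature_b": 0},
    {"feature_a": 1, "feature_b": 1},
]
labels = ["class_0", "class_0", "class_1", "class_1"]
print(best_split(rows, labels, ["feature_a", "feature_b"]))  # feature_a
```

A split that produces high class proportions in each child has low impurity. Such a feature therefore appears near the root of the tree, which is what makes the important features easy to see.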
Weaknesses of the Research Combining two binary response variables into one multiclass categorical variable can cause extreme class imbalance, in which minority-class instances may be treated as outliers. This should be addressed with oversampling, undersampling, bootstrapping, or outlier modeling so that instances of the minority classes can be predicted.
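A minimal sketch of the problem and one of the suggested remedies follows. Two binary responses are encoded as one four-class label, and random oversampling duplicates minority-class rows until every class matches the majority count. The encoding `2*y1 + y2` is a hypothetical choice, and the data are made up.

```python
import random
from collections import Counter


def combine_binary_responses(y1, y2):
    """Encode two 0/1 responses as one label in {0, 1, 2, 3} via 2*y1 + y2 (hypothetical coding)."""
    return [2 * a + b for a, b in zip(y1, y2)]


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each minority class until all classes match the majority count."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(members))
            y_out.append(label)
    return X_out, y_out


# Made-up responses for eight toddlers.
y1 = [0, 0, 0, 0, 0, 0, 1, 1]
y2 = [0, 0, 0, 0, 0, 1, 0, 1]
y = combine_binary_responses(y1, y2)
print(Counter(y))  # three of the four classes have a single instance each
X = [[i] for i in range(len(y))]  # placeholder feature rows
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # every class now has five rows
```

Oversampling should be applied to the training split only. Duplicating rows before the train/test split would let copies of the same record appear in both sets and inflate the test scores.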
Conclusion Feature selection identified four main factors that influence toddlers' health status in Malang: the mother's history of diabetes before pregnancy, the father's blood pressure, the father's psychological condition, and drinking water quality. The decision tree model performs better than the logistic regression model on the performance measures used.
