Abstract—Nowadays, maternal health is one of the most challenging issues all over the world. Many women die each year during pregnancy and after delivery, which is a major cause of infant mortality. In rural areas, pregnant women face various difficulties and challenges, including a shortage of doctors, inadequate knowledge, a lack of public clinics, infrastructure issues, and transportation issues. The mother's pregnancy is the major cause of the infant's poor health, rather than any other factors that may have arisen after childbirth. Significant roles are played by maternal risk factors such as the mother's chronic condition, age, nutrition, and medical assistance during pregnancy. Recent developments in artificial intelligence methods, particularly machine learning models, have made it easier to make predictions in a variety of disciplines. Using machine learning techniques, we can identify the primary maternal risk factors that can lead to newborn and maternal mortality. This paper proposes improved data preprocessing methods that involve feature engineering and data cleaning in order to effectively handle anomalies in the data values. To identify the maternal health risk factors, several machine learning algorithms were used, including Cat Boost, Random Forest, XGB, Decision Tree, and Gradient Boost. Using the preprocessed dataset, the suggested model was developed, trained, and tested. Random Forest was the best machine learning algorithm, with an accuracy of 90%, precision of 90%, recall of 90%, and F1-score of 90%.
Index Terms—Random Forest, Maternal Health, Risk Factors, Machine Learning, Classification

I. INTRODUCTION
Fatal maternal complication risk is a term used to describe a range of problems that occur during pregnancy, birth, or the postnatal period and could have a serious influence on the health of a woman and her children. If some of these symptoms appear, the patient must seek medical help as soon as possible to avoid death. Approximately 585,000 women die every year worldwide during childbirth or the postpartum period [1], and nearly 50 million maternal and infant difficulties are reported each year, with around 300 million women affected by long-term and short-term medical complications associated with pregnancy, delivery, and postpartum [2]. Women in rural locations and impoverished communities have the highest maternal death rates. Pregnancy-related complications and maternal mortality are more common among young teenage girls (10–14 years old). Maternal mortality is far too common. Around 295,000 women lost their lives in 2017 due to complications related to pregnancy or childbirth [3]. A significant proportion of deaths (94%) occurred in lowland areas, and many of them might have been prevented. An estimated 86% of all maternal deaths in 2017 occurred in Southern Asia and Sub-Saharan Africa (254,000) [4]. Every day in 2017, almost 830 women died from pregnancy and childbirth-related conditions that could have been prevented. Nearly all maternal deaths (94% worldwide) take place in low- and middle-income countries.
The rate of maternal mortality in Bangladesh is 245 per 100,000 live births. An estimated 7,660 women per year in Bangladesh die from complications during pregnancy that could have been avoided [5]. The maternal deaths and mortality rates in Cox's Bazar, a rural area in the southeast, are among the highest in the nation. Owing to a lack of healthcare services, 90% of pregnant women in this region give birth without the assistance of a skilled birth attendant or the provision of emergency treatment. Additionally, more than 50 percent of all pregnant women have no access to the medical services that are available to them, and 11% of women consult a doctor in the first 6 weeks following delivery. About 36 percent of all pregnant women receive any kind of prenatal care from a qualified physician (WHO, 2017). Bangladesh also suffers from a scarcity of trained birth attendants, especially in rural regions (DHS 2014) [4]. Maternal deaths, which account for 14% of all deaths among Bangladeshi women aged 15 to 49, are the third leading cause of death [6]. Qualified personnel, strong pregnancy care, and skilled delivery attendance could avert the majority of these deaths. However, because we do not have all of the correct mortality figures, this estimate might be significantly higher. If women and newborns are given competent care before, during, and after birth, they can be saved.
For this reason, new approaches for early detection and analysis of maternal health incidents are required. This paper proposes using machine learning techniques to build a risk classifier for maternal mortality. As a result, we hope to enable early diagnosis of morbidity and to support the decisions of medical experts, allowing for timely patient treatment. This might help reduce the risk of maternal and infant mortality within this period, as well as the social and economic consequences.

II. LITERATURE REVIEW
Prophylaxis lowers the chances of getting sick, and early detection raises the chances of simple and quick treatment. Curable infections and risks are the cause of the deaths of many pregnant women. It is not uncommon for a pregnant woman's disease to go undiagnosed, or for the risks that she faces to go unnoticed. Various machine learning approaches
Authorized licensed use limited to: ANNA UNIVERSITY. Downloaded on March 04,2024 at 07:19:27 UTC from IEEE Xplore. Restrictions apply.
A. Dataset Collection
In this experiment, the maternal health risk dataset [28] was collected from the publicly available UCI Machine Learning Repository. It contains 1014 rows and 7 columns. Low risk is the most common class among the pregnant women in this dataset. Out of 1014 observations, 406 (40%) pregnant women were at low risk, 336 (33.1%) were at medium risk, and 272 (26.8%) were at high risk. The attributes and their descriptions are given below:
Age: The age of the woman, in years, during pregnancy.
Systolic BP: The upper (maximum) blood pressure value, in millimeters of mercury, a crucial element to take into account during pregnancy.
Diastolic BP: The lower blood pressure value, in millimeters of mercury, an additional aspect to consider during pregnancy.
BS: The amount of glucose in the blood, measured in mmol/L.
Heart rate: The normal resting heart rate, in beats per minute.
Risk Level: The predicted risk intensity level during pregnancy, which depends on the preceding attributes.
In this case, the target variable is risk level, while the other features are predictor variables.

Fig. 2. Risk Level Distribution

B. Data Preprocessing
To build a better prediction model, the data must first be prepared for analysis. The model's effectiveness increases when the data are transformed properly. The first step in processing the dataset was to look for any missing values. The dataset does not contain any missing values.

Fig. 3. Heat map of the correlations between the features of the dataset.

C. Feature Engineering and Data Cleaning
In this dataset, "Heart Rate" is the only variable with an anomalous, implausible value. Two observations had a heart rate of 7 beats per minute. The average resting heart rate for individuals over the age of ten, including older individuals, ranges between 60 and 100 beats per minute. It is possible that these values are the result of an input error. It is also possible that the labels of these observations are incorrect, which would confuse the training process and reduce the accuracy of the model. For this reason, we dropped the Heart Rate variable.

D. Dataset Splitting
To apply any machine learning technique, the dataset must be divided into two parts, a training set and a testing set. 80% of the data was used to train the model, while the remaining 20% was used to test it.

E. Machine Learning Model
This section lists some of the most widely used data mining strategies for pattern recognition as well as classification. Several different machine learning algorithms can be applied to predict the risk factors of maternal health. The following five machine learning algorithms were used to make predictions and analyze data:
• Random Forest [29].
• Decision Tree [30].
• Cat Boost [31].
• XGB [32].
• Gradient boosting [33].

F. Performance Measurements
In this section, we compare how well our proposed methods perform. Comparative experiments were carried out on two versions of the maternal health risk dataset: (1) the original dataset and (2) the preprocessed dataset. The five classifiers (Random Forest, Decision Tree, XGB, Cat Boost, and Gradient Boost) were applied to each dataset separately to generate trained models, and an 80%/20% split was used to train and test each machine learning model. The effectiveness of the machine learning models is evaluated using the following indicators:
Confusion Matrix: A table that is frequently used to describe the effectiveness of classification models. Classification accuracy alone can be misleading if the dataset contains more than two classes or if the number of observations in each class is unbalanced. True positives, false positives, false negatives, and true negatives are the four counts used to build the confusion matrix [34].
Accuracy: A statistic used in classification problems to identify what proportion of predictions are correct.

\[ \text{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN} \tag{1} \]

Precision: The percentage of correctly recognized positive samples in relation to the overall number of samples identified as positive.

\[ \text{Precision} = \frac{TP}{TP + FP} \tag{2} \]

Recall: The percentage of true positives out of all actual positive samples, that is, true positives plus false negatives.

\[ \text{Recall} = \frac{TP}{TP + FN} \tag{3} \]

F1 score: A statistic that takes the harmonic mean of a classification system's precision and recall values and combines them into a single value.

\[ F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4} \]

where TP = True Positive, TN = True Negative, FP = False Positive, and FN = False Negative. The false negative rate (FNR) and false positive rate (FPR) follow from the same counts, with TPR the true positive rate (recall) and TNR = TN / (TN + FP) the true negative rate:

\[ \text{FNR} = \frac{FN}{TP + FN} = 1 - \text{TPR} \tag{5} \]

\[ \text{FPR} = \frac{FP}{TN + FP} = 1 - \text{TNR} \tag{6} \]

IV. RESULTS AND DISCUSSION
This study used five different machine learning techniques to predict factors affecting maternal health. The purpose of this study was to compare several machine learning models in order to develop a more effective predictive model. This section examines how well the different machine learning models perform on several metrics. Four evaluation metrics were used to assess the models: recall, precision, accuracy, and F1-score.
We also compare the classification reports for the raw dataset and the preprocessed dataset. Table I shows the performance measures of the classification algorithms before feature engineering and data cleaning. Table II summarizes the performance measures of the same algorithms after feature engineering and data cleaning. We observed an improvement in test accuracy when we used the processed dataset, which suggests that the models generalize more effectively with the processed dataset. The recall, precision, accuracy, and F1-score of each model are given in Table II.

TABLE I
BEFORE FEATURE ENGINEERING AND DATA CLEANING

Algorithm           Accuracy   Precision   Recall   F1 Score
Random Forest       87%        87%         87%      87%
Decision Tree       85%        85%         85%      85%
Cat Boosting        84%        84%         84%      84%
XGB                 83%        83%         83%      83%
Gradient Boosting   80%        80%         80%      80%

TABLE II
AFTER FEATURE ENGINEERING AND DATA CLEANING

Algorithm           Accuracy   Precision   Recall   F1 Score
Random Forest       90%        90%         90%      90%
Decision Tree       87%        87%         87%      87%
Cat Boosting        86%        86%         86%      86%
XGB                 85%        86%         85%      85%
Gradient Boosting   81%        82%         81%      81%

Random Forest was the best machine learning algorithm, with an accuracy of 90%, precision of 90%, recall of 90%, and F1-score of 90%.
With this model, we can readily determine whether the maternal health risk level is high, mid, or low. Five different machine learning algorithms were implemented in this work, with Random Forest providing the highest accuracy for proper classification. The following figure demonstrates the accuracy of correctly identifying the risk level of maternal health.
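Equations (1)–(4) are written for a two-class confusion matrix, while the risk level has three classes. The sketch below shows one common way to apply them in that setting: compute precision, recall, and F1 per class in a one-vs-rest fashion and take the unweighted (macro) average, with overall accuracy taken as the share of correct predictions. The paper does not state which averaging it used, so macro-averaging is an assumption here, and the confusion matrix values and the function names are hypothetical, chosen only for illustration.

```python
from typing import Dict, List


def per_class_counts(cm: List[List[int]], k: int) -> Dict[str, int]:
    """One-vs-rest TP, FP, FN, TN for class k.

    cm[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                        # true class k, predicted as another class
    fp = sum(cm[i][k] for i in range(n)) - tp   # another class, predicted as k
    tn = total - tp - fn - fp
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


def safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of raising when a class has no samples or predictions."""
    return numerator / denominator if denominator else 0.0


def macro_report(cm: List[List[int]]) -> Dict[str, float]:
    """Overall accuracy plus macro-averaged precision, recall, and F1."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    precisions, recalls, f1_scores = [], [], []
    for k in range(n):
        c = per_class_counts(cm, k)
        precision = safe_div(c["TP"], c["TP"] + c["FP"])                # Eq. (2)
        recall = safe_div(c["TP"], c["TP"] + c["FN"])                   # Eq. (3)
        f1 = safe_div(2 * precision * recall, precision + recall)      # Eq. (4)
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)
    return {
        # Multiclass accuracy: correct predictions (the diagonal) over all samples.
        "accuracy": safe_div(sum(cm[k][k] for k in range(n)), total),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1_scores) / n,
    }


# Hypothetical confusion matrix (rows: true low/mid/high, columns: predicted).
# These numbers are NOT results from the paper.
example_cm = [
    [70, 8, 2],
    [10, 55, 3],
    [1, 4, 50],
]
report = macro_report(example_cm)
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

Macro-averaging weights each risk level equally, which matters when the classes are unbalanced; a support-weighted average would instead weight each class by its number of observations.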
Because Random Forest provides the highest accuracy, we selected the Random Forest classifier to obtain the feature importance. The above figure shows that blood sugar is the main factor, as it has a large impact on the classification result, and age is the second most important risk factor.
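One way to see what a feature-importance ranking measures is permutation importance: shuffle one feature column, re-score the model, and record how much the accuracy drops. This is a model-agnostic technique and not necessarily the measure used in the paper, which takes its ranking from the Random Forest classifier. In the minimal sketch below, `toy_predict` is a hypothetical stand-in for a trained classifier, and the two-feature data (age, blood sugar) is invented for illustration only.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def accuracy(predict: Callable[[Row], str], X: Sequence[Row], y: Sequence[str]) -> float:
    """Share of rows for which the prediction equals the label."""
    correct = sum(1 for row, label in zip(X, y) if predict(row) == label)
    return correct / len(y)


def permutation_importance(
    predict: Callable[[Row], str],
    X: Sequence[Row],
    y: Sequence[str],
    n_repeats: int = 20,
    seed: int = 0,
) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only feature j replaced by its shuffled values.
            shuffled = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(predict, shuffled, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_predict(row: Row) -> str:
    """Hypothetical stand-in for a trained model: it looks only at blood sugar."""
    _age, blood_sugar = row
    return "high" if blood_sugar >= 8.0 else "low"


# Hypothetical rows of [age, blood sugar in mmol/L]; not taken from the dataset.
toy_X = [
    [25, 6.1], [35, 9.0], [29, 7.0], [42, 11.0],
    [19, 6.8], [50, 15.0], [23, 7.5], [31, 8.5],
]
toy_y = [toy_predict(row) for row in toy_X]

scores = permutation_importance(toy_predict, toy_X, toy_y)
for name, score in zip(["Age", "BS"], scores):
    print(f"{name}: mean accuracy drop = {score:.3f}")
```

Because the stand-in model ignores age, shuffling the age column never changes its predictions, so that feature's score is zero, while shuffling blood sugar lowers the accuracy. A feature that a real model relies on heavily would show a correspondingly large drop.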
V. CONCLUSION
REFERENCES
[12] “Identifying Causes of Neonatal Mortality from Observational Data: A Bayesian Network Approach - ProQuest,” https://www.proquest.com/docview/1705019812 pqorigsite=gscholarfromopenview=true (accessed Oct. 27, 2022).
[13] H. L. Rowe and R. A. Gordon, “Summary of Unanswered Questions Regarding Infant Mortality During Adult-infant Bedsharing,” SSRN Electronic Journal, 2015, doi: 10.2139/ssrn.3893656.
[14] S. N. Dwivedi, S. Begum, A. K. Dwivedi, and A. Pandey, “Determinants of infant mortality in rural India: A three-level model,” Health, vol. 05, no. 11, pp. 1742–1749, 2013, doi: 10.4236/health.2013.511235.
[15] L. C. Kenny, W. B. Dunn, D. I. Ellis, J. Myers, P. N. Baker, and D. B. Kell, “Novel biomarkers for pre-eclampsia detected using metabolomics and machine learning,” Metabolomics, vol. 1, no. 3, pp. 227–234, Jul. 2005, doi: 10.1007/s11306-005-0003-1.
[16] E. A. Rodríguez, F. E. Estrada, W. C. Torres, and J. C. M. Santos, “Early Prediction of Severe Maternal Morbidity Using Machine Learning Techniques,” Lecture Notes in Computer Science, pp. 259–270, 2016, doi: 10.1007/978-3-319-47955
[17] M. Podda, D. Bacciu, A. Micheli, R. Bellù, G. Placidi, and L. Gagliardi, “A machine learning approach to estimating preterm infants survival: development of the Preterm Infants Survival Assessment (PISA) predictor,” Scientific Reports, vol. 8, no. 1, Sep. 2018, doi: 10.1038/s41598-018-31920-6.
[18] L. C. Y. Poon, N. A. Kametas, N. Maiz, R. Akolekar, and K. H. Nicolaides, “First-Trimester Prediction of Hypertensive Disorders in Pregnancy,” Hypertension, vol. 53, no. 5, pp. 812–818, May 2009, doi: 10.1161/hypertensionaha.108.127977.
[19] B. Farran, A. M. Channanath, K. Behbehani, and T. A. Thanaraj, “Predictive models to assess risk of type 2 diabetes, hypertension and comorbidity: machine-learning algorithms and validation using national health data from Kuwait–a cohort study,” BMJ open, vol. 3, no. 5, May 2013, doi: 10.1136/bmjopen-2012-002457.
[20] A. R. Yarlapati, S. Roy Dey, and S. Saha, “Early Prediction of LBW Cases via Minimum Error Rate Classifier: A Statistical Machine Learning Approach,” 2017 IEEE International Conference on Smart Computing (SMARTCOMP), May 2017, doi: 10.1109/smartcomp.2017.7947002.
[21] S. Yu, K. K. Tan, B. L. Sng, S. Li and A. T. H. Sia, “Feature extraction and classification for ultrasound images of lumbar spine with support vector machine,” 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2014, pp. 4659-4662, doi: 10.1109/EMBC.2014.6944663.
[22] W. Cheng, L. Fang, L. Yang, H. Zhao, P. Wang and J. Yan, “Varying Coefficient Models for Analyzing the Effects of Risk Factors on Pregnant Women’s Blood Pressure,” 2014 13th International Conference on Machine Learning and Applications, 2014, pp. 55-60, doi: 10.1109/ICMLA.2014.14.
[23] L. K. Woolery and J. Grzymala-Busse, “Machine learning for an expert system to predict preterm birth risk.,” Journal of the American Medical Informatics Association, vol. 1, no. 6, p. 439, Nov. 1994, doi: 10.1136/jamia.1994.95153433.
[24] B. N. Lakshmi, T. S. Indumathi and N. Ravi, “A comparative study of classification algorithms for predicting gestational risks in pregnant women,” 2015 International Conference on Computers, Communications, and Systems (ICCCS), 2015, pp. 42-46, doi: 10.1109/CCOMS.2015.7562849.
[25] L. C. Kenny, W. B. Dunn, D. I. Ellis, J. Myers, P. N. Baker, and D. B. Kell, “Novel biomarkers for pre-eclampsia detected using metabolomics and machine learning,” Metabolomics, vol. 1, no. 3, pp. 227–234, Jul. 2005, doi: 10.1007/s11306-005-0003-1.
[26] N. Krupa, M. MA, E. Zahedi, S. Ahmed, and F. M. Hassan, “Antepartum fetal heart rate feature extraction and classification using empirical mode decomposition and support vector machine,” BioMedical Engineering OnLine, vol. 10, no. 1, p. 6, 2011, doi: 10.1186/1475-925x-10-6.
[27] M. Ahmed and M. A. Kashem, “IoT Based Risk Level Prediction Model For Maternal Health Care In The Context Of Bangladesh,” 2020 2nd International Conference on Sustainable Technologies for Industry 4.0 (STI), 2020, pp. 1-6, doi: 10.1109/STI50764.2020.9350320.
[28] “UCI Machine Learning Repository: Maternal Health Risk Data Set Data Set,” Uci.edu, 2020. https://archive.ics.uci.edu/ml/datasets/Maternal+Health+Risk+Data+Set (accessed Oct. 2, 2022).
[29] I. Reis, D. Baron, and S. Shahaf, “Probabilistic Random Forest: A Machine Learning Algorithm for Noisy Data Sets,” The Astronomical Journal, vol. 157, no. 1, p. 16, Dec. 2018, doi: 10.3847/1538-3881/aaf101.
[30] A. Navada, A. N. Ansari, S. Patil and B. A. Sonkamble, “Overview of use of decision tree algorithms in machine learning,” 2011 IEEE Control and System Graduate Research Colloquium, 2011, pp. 37-42, doi: 10.1109/ICSGRC.2011.5991826.
[31] A. A. Ibrahim, R. L., M. M., R. O., and G. A., “Comparison of the CatBoost Classifier with other Machine Learning Methods,” International Journal of Advanced Computer Science and Applications, vol. 11, no. 11, 2020, doi: 10.14569/ijacsa.2020.0111190.
[32] L. Torlay, M. Perrone-Bertolotti, E. Thomas, and M. Baciu, “Machine learning–XGBoost analysis of language networks to classify patients with epilepsy,” Brain Informatics, vol. 4, no. 3, pp. 159–169, Apr. 2017, doi: 10.1007/s40708-017-0065-7.
[33] C. Bentéjac, A. Csörgő, and G. Martínez-Muñoz, “A comparative analysis of gradient boosting algorithms,” Artificial Intelligence Review, Aug. 2020, doi: 10.1007/s10462-020-09896-5.
[34] P. Flach, “Performance Evaluation in Machine Learning: The Good, the Bad, the Ugly, and the Way Forward,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 9808–9814, Jul. 2019, doi: 10.1609/aaai.v33i01.33019808.