
Available online at www.sciencedirect.com
ScienceDirect
Procedia Computer Science 115 (2017) 338–349
www.elsevier.com/locate/procedia

7th International Conference on Advances in Computing & Communications, ICACC-2017, 22-24 August 2017, Cochin, India

Investigation of Nutritional Status of Children based on Machine Learning Techniques using Indian Demographic and Health Survey Data
Sangita Khare1, S Kavyashree1, Deepa Gupta2*, Amalendu Jyotishi3
Department of Computer Science and Engineering1, Department of Mathematics2, Department of Management3,
Amrita School of Engineering, Bengaluru,
Amrita Vishwa Vidyapeetham, Amrita University, India
k_sangita@blr.amrita.edu, kavyashree2411@gmail.com,
g_deepa@blr.amrita.edu, amalendu.jyotishi@gmail.com

Abstract

Malnutrition is one of the leading causes of infant mortality in developing countries, including India. This study designs a prediction model for malnutrition based on a machine learning approach, using the features available in the Indian Demographic and Health Survey (IDHS) dataset, and compares them with the features identified in the literature. Our findings suggest that the machine learning approach identifies some important features not identified in the extant literature. Subsequently, logistic regression was carried out to estimate the association of these features with malnutrition. The paper contributes to exploring the possibilities of using artificial intelligence to identify probable correlates of malnutrition.

© 2017 The Authors. Published by Elsevier B.V.


Peer-review under responsibility of the scientific committee of the 7th International Conference on Advances in Computing &
Communications.

Keywords: Malnutrition; Nutritional status; Machine Learning; Information gain; Logistic regression; IDHS dataset, India.

1. Introduction

Good nutrition is essential for a healthy life. Malnutrition is one of the major global health problems, especially in the area of child survival. In developing countries, malnutrition is a major problem that is directly or indirectly responsible for half of all deaths worldwide among children under the age of five [1, 2]. Malnourished children are more prone to frequent illness and adverse outcomes such as reduced mobility, increased risk of fractures, infections, muscle wasting, weight loss, low energy, longer hospital stays and reduced wound healing. The effects of malnutrition on the health of all age groups have been the subject of extensive research for several decades. Various factors that contribute to malnutrition have been studied; they can be broadly categorized as demographic, access, health, household, physical and socioeconomic factors.

* Corresponding Author. Tel.: +91-991-692-1850; fax: +080-2844-0092.
E-mail address: g_deepa@blr.amrita.edu

1877-0509 © 2017 The Authors. Published by Elsevier B.V.
10.1016/j.procs.2017.09.087
The World Health Organization estimates that India is among the highest-ranking countries in the world in the number of children suffering from malnutrition, nearly double that of Sub-Saharan Africa, with dire consequences for mobility, mortality, productivity and economic growth [3]. Children under five in India suffer from some of the highest levels of stunting, wasting and underweight observed in any country in the world, and 7 out of every 10 children under five are undernourished. According to UNICEF, about 50 per cent of all childhood deaths are attributed to malnutrition. As many as 48% of Indian children under the age of five are stunted, a sign of chronic malnutrition. Girls are more vulnerable than boys to malnutrition and mortality (NFHS-3, 2005-2006).
The extant literature has largely been exploratory in identifying the features that explain malnutrition. In such a setting, artificial intelligence can be used to identify the proximate correlates. Comparing these with the extant literature helps in two ways. First, we can check whether artificial intelligence replicates the literature-identified features. Second, if artificial intelligence identifies new features, we can ask whether there is a plausible explanation for them. We attempt both in this paper.

2. Related Literature

The main focus here is to identify the parameters that define malnutrition and the explanatory factors that affect those defining parameters. A second focus is to find the techniques that have been used in the literature to analyse malnutrition. Health surveys have been conducted in various parts of the world [4]. Most studies use BMI (Body Mass Index) as the main indicator of malnutrition. Other than BMI, the defining parameter seen most frequently is WHZ (Weight-for-Height Z-score, an indicator of wasting). Very few studies have also taken HAZ (Height-for-Age Z-score, an indicator of stunting) and WAZ (Weight-for-Age Z-score, an indicator of underweight) into account. Taking together the explanatory variables referred to in the recent literature on malnutrition, around 170 explanatory variables have been identified globally, including mother's education, wealth index, adequate sanitation, sex of the head of the household, whether the household has a refrigerator, mother's weight, antenatal care, iron supplementation, etc. [5,6].
These factors were chosen mostly on the basis of human intuition, facts and surveys. Several approaches based on statistical techniques have been explored in the past to find the explanatory factors of malnutrition. Logistic and linear regression have been used to analyse the determinants of child nutrition and to find the probability of undernutrition among children under the age of five [7-10]. Other techniques generally used on these datasets to build models of children's nutritional level are the J48 decision tree [11], Naïve Bayes, the PART rule induction classifier with ROC analysis [12], and association rules to identify associations between mother and child parameters [5,13]. Descriptive or bivariate analysis shows that child feeding practices are strongly associated with HAZ [14]. Factor analysis and the least squares method are used in a few papers to find the association between selected features and malnutrition [15]. Ordinary least squares and cumulative distributions are also used. Maternal education is strongly related to the child's nutritional state; binomial and multiple logistic regression have been used to estimate the effect of education on malnutrition in univariable and multivariable models, respectively [16].
Most of the literature uses models based on statistical analysis, probability strategies, nearest-neighbour techniques, logical analysis, comparison techniques, logistic regression, descriptive analysis and association rules. Literature in which data mining techniques have been applied is scanty, and where they are applied, it is to features or attributes selected on the basis of a priori knowledge or logical deduction. It is therefore interesting to understand whether artificial intelligence (or machine learning techniques) can reproduce the a priori features and/or throw up new features that may need the attention of researchers and policy makers.
The India Demographic and Health Survey (IDHS) dataset 2005-2006 is the source data for this research. In the Indian context, the 2005-2006 IDHS is known as the National Family Health Survey (NFHS-3). The IDHS 2005-2006 dataset consists of birth records, children records, couples records, HIV test records, household records, household member records and individual records across all states and Union Territories of India. This study uses the Birth Record (BR) dataset, which consists of 256783 instances (observations). Each instance in the Birth Record corresponds to one child ever born to an interviewed woman. The dataset contains the full birth history given by all interviewed women, including information on prenatal care, pregnancy and postnatal care, along with the immunization and health of children born in the last 5 years. The IDHS 2005-2006 Birth Record dataset has 1175 features, which are divided into six coding groups as shown in Table 1. Each feature is assigned to a coding group according to its meaning; for example, Vxxx covers the women's standard variables, around 489 features such as mother's education. Similarly, the birth history variables, coded Bxx, number 18.

Table 1: Birth record data set of Demographic and Health Survey data set 2005-06

Variable code | Label | No. of features | Sample features
Vxxx | Women standard variables | 489 | Mother educational level, state, type of place of residence, religion, etc.
Bxx | Birth history | 18 | Birth order number, child is twin, date of birth, sex of child, current age, etc.
Mxx | Pregnancy, postnatal, breastfeeding | 146 | Assistance, duration of breastfeeding, delivery by caesarean section, place of delivery, etc.
Hxx | Immunization and health | 166 | Received BCG, received DPT, received POLIO, received MEASLES, received any treatment, etc.
HWxx | Anthropometry for children | 30 | Child's age, height, weight, child anaemia level, child BMI, etc.
Sxxx | Women and house related | 313 | Wealth index, household materials, cooking fuel, irrigation land, native language of respondents, number of household members, etc.
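To make the prefix scheme of Table 1 concrete, the sketch below maps a variable name to its coding group. It is a hypothetical helper (`coding_group` and `CODING_GROUPS` are our own names, not part of the IDHS files or WEKA) and assumes only that a feature's group is given by its leading letters, as in the examples of Table 1; names with any other prefix return `None`.

```python
# Coding groups of the IDHS 2005-06 Birth Record, as listed in Table 1.
# "HW" must be tested before "H" so that, e.g., HW15 is not classed as Hxx.
CODING_GROUPS = [
    ("HW", "HWxx", "Anthropometry for children"),
    ("V", "Vxxx", "Women standard variables"),
    ("B", "Bxx", "Birth history"),
    ("M", "Mxx", "Pregnancy, postnatal, breastfeeding"),
    ("H", "Hxx", "Immunization and health"),
    ("S", "Sxxx", "Women and house related"),
]


def coding_group(variable):
    """Return the Table 1 coding group of a variable name, or None if no prefix matches."""
    name = variable.upper()
    for prefix, group, _label in CODING_GROUPS:
        if name.startswith(prefix):
            return group
    return None


for v in ["V209", "B4", "M17", "H42", "HW15", "S47J"]:
    print(v, coding_group(v))
```

The ordered list (rather than a dictionary keyed on the first letter) is what lets the two-letter prefix take precedence over the one-letter one.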

3. Methodology

According to the WHO definition of malnutrition, the four main factors that define malnutrition are BMI, HAZ, WAZ and WHZ [17]. These are therefore taken as the dependent variables (features) in this study. They are treated as class variables, grouped by the standard-deviation cut-offs of -2 and +2. For BMI, LOW means undernutrition, MID is the normal case and HIGH is overnutrition; these three values LOW, MID and HIGH are the three class labels. Classes are defined in the same way for the other three features, HAZ, WAZ and WHZ: a value below -2 is LOW, a value between -2 and +2 is MID, and a value above +2 is HIGH.
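A minimal sketch of the LOW/MID/HIGH labelling rule just described. It assumes the Z-score is already expressed in standard-deviation units and treats values of exactly -2 or +2 as MID; the function name `nutrition_class` is hypothetical.

```python
from typing import Optional


def nutrition_class(z: Optional[float]) -> Optional[str]:
    """Map a Z-score (in SD units) to the LOW / MID / HIGH class label.

    Missing scores stay missing (None) so they can be dropped later.
    Scores of exactly -2 or +2 are treated as MID (assumed boundary rule).
    """
    if z is None:
        return None
    if z < -2:
        return "LOW"
    if z > 2:
        return "HIGH"
    return "MID"


print([nutrition_class(z) for z in (-2.7, -1.0, 0.4, 2.3, None)])
```

The same function applies unchanged to HAZ, WAZ and WHZ, since all three use the same cut-offs.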
Of the 256783 instances (rows) in the Birth Record data, around 83% have a null class value, i.e. the class value is missing. The data with class values therefore form 17% of the instances, i.e. 43734 instances, which are used in this study. The Birth Record has 1175 features. On inspecting them, we found that some features play no role in, or have no impact on, the child's nutritional level; some are blank (null); others are index values of unique identifiers, such as that of a blood sample; and some are sequential and interdependent, such as {POLIO1, POLIO2, POLIO3}, which are merged into a single feature POLIO. Table 2 lists the features that were removed, merged or excluded from further processing because they are not related to malnutrition.
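The filtering and merging steps above can be sketched as follows. `drop_unlabelled` and `merge_sequential` are hypothetical helpers, and the merge rule (counting the doses coded 1) is an assumption for illustration only; the text does not state how the merged POLIO value is coded. The last line reproduces the 17% share from the reported counts.

```python
def drop_unlabelled(records, class_key):
    """Keep only records whose class value (e.g. the HAZ class) is present."""
    return [r for r in records if r.get(class_key) is not None]


def merge_sequential(record, parts, merged_name):
    """Replace sequential features such as POLIO1..POLIO3 by one merged feature.

    Assumed rule (hypothetical): each part is coded 1 if that dose was
    received, and the merged feature counts the doses received.
    """
    merged = {k: v for k, v in record.items() if k not in parts}
    merged[merged_name] = sum(1 for p in parts if record.get(p) == 1)
    return merged


records = [
    {"POLIO1": 1, "POLIO2": 1, "POLIO3": 0, "HAZ_class": "LOW"},
    {"POLIO1": 1, "POLIO2": 0, "POLIO3": 0, "HAZ_class": None},
]
labelled = drop_unlabelled(records, "HAZ_class")
print([merge_sequential(r, ["POLIO1", "POLIO2", "POLIO3"], "POLIO") for r in labelled])

# Share of Birth Record instances that carry a class value: 43734 of 256783.
print(f"{43734 / 256783:.1%}")
```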
After all merging and reduction, 907 features are obtained from the initial 1175. Table 3 lists the number of features obtained after further cleaning of the reduced dataset: a total of 401 features from the 2005-06 dataset were retained for further processing. In addition, a list of 164 features was compiled from past work in the Indian context. The comparison is made between this reduced set of literature-identified features and our extended set of features, based on explanatory power, to identify the commonalities and differences between literature-identified and machine-identified features. The study, which predicts the nutritional state of the child, is divided into two phases, as shown in the proposed methodology design in Fig. 1 and explained below.

Table 2: Features which are not considered in the further process.

Features removed | No. | Examples | Features
Null | 174 | V004, V021 | Ultimate area unit, primary sampling unit, etc.
Index | 39 | MIDx, IDX | Index to birth history, etc.
Flagged | 32 | Value present or not | Flag for last pregnancy, flag for breastfeeding, flag for abstinence, flag for amenorrhea, etc.
Merged | 23 | POLIO, DPT, colour of TV | Attributes [POLIO 1, POLIO 2, POLIO 3], [DPT 1, DPT 2, DPT 3], [colour TV, black-white TV], etc.
Does not have any impact over child health | 506 | No direct or indirect cause over the child health | Wanted last child, brand of pill used, shown pill package, reason not to use any contraceptive method, source of FP for non-users, known about side effects, how to deal with side effects, entries in maternity table, heard of TB, year of first marriage, date of first marriage, completeness of certain date information, years since first marriage, fertility preference, decision maker for using contraception, reason for not having sex, known about pregnancy complications, reasons didn't deliver at health facility, regret sterilization, child's father present during antenatal visit, etc.
FP: Family Planning; TB: Tuberculosis
Table 3: Features listed from the literature and from the full data set after the cleaning process, in the Indian context

Variable code | Label | No. of features from full data set [401] | No. of features from literature [164]
Vxxx | Women standard variables | 157 | 97
Bxx | Birth history | 12 | 7
Mxx | Pregnancy, postnatal, breastfeeding | 66 | 19
Hxx | Immunization and health | 49 | 13
HWxx | Anthropometry for children | 7 | 4
Sxxx | Women and house related | 110 | 24

3.1. Phase I

In Phase I, the initial step is pre-processing of the dataset, in which the four main factors that explain malnutrition (BMI, HAZ, WAZ and WHZ) are taken as the dependent class variables. The next step is resampling to overcome the skewed, imbalanced dataset. The resampling method used is SMOTE (Synthetic Minority Oversampling Technique) [18], which resamples the dataset using the specified number of nearest neighbours and then creates the specified number of SMOTE instances. It works by creating synthetic samples from the minority class instead of creating copies. For each chosen dependent variable (BMI, HAZ, WAZ and WHZ), the relevant independent features were then extracted with machine learning techniques using the entropy-based Gain Ratio. The open source tool WEKA 3.6 [19] is used for this study. The Gain Ratio feature evaluator measures the degree to which each feature distinguishes between the HIGH, MID and LOW classes of BMI, HAZ, WAZ and WHZ respectively, by calculating the Gain Ratio of every feature with respect to the class variable. Gain Ratio overcomes the bias of information gain towards features with many outcomes.

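$$\mathrm{Info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \mathrm{Info}(D_j) \qquad (1)$$

$$\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D) \qquad (2)$$

$$\mathrm{SplitInfo}_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \log_2\left(\frac{|D_j|}{|D|}\right) \qquad (3)$$

$$\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}_A(D)} \qquad (4)$$

Fig. 1. Proposed Model Design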

Here $\mathrm{Info}_A(D)$ in equation (1) is the information still needed to classify the tuples of dataset D after partitioning D on one feature A, where A can take v different values and $|D_j|/|D|$ acts as the weight of the j-th partition $D_j$. A smaller value of $\mathrm{Info}_A(D)$ means the feature classifies better. $\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2 p_i$ is the entropy of the class distribution in D, where $p_i$ is the proportion of tuples in class i. Gain(A) in equation (2) is the gain in information if feature A is chosen for classification at a particular node. GainRatio(A) in equation (4) is the ratio of Gain(A) to $\mathrm{SplitInfo}_A(D)$, where $\mathrm{SplitInfo}_A(D)$, given in equation (3), is the information generated by splitting D into v partitions corresponding to the v values that A can take. The feature with the maximum Gain Ratio is selected as the splitting feature.
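Equations (1)-(4) translate directly into code. The sketch below computes the gain ratio of one nominal feature from scratch; it illustrates the formulas only, and WEKA's `GainRatioAttributeEval` (whose handling of numeric attributes and missing values is not reproduced here) may differ in detail.

```python
import math
from collections import Counter


def entropy(labels):
    """Info(D) = -sum_i p_i log2 p_i over the class labels in D."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(feature_values, labels):
    """GainRatio(A) of equations (1)-(4) for one nominal feature A."""
    n = len(labels)
    partitions = {}  # value of A -> class labels of the tuples in D_j
    for value, label in zip(feature_values, labels):
        partitions.setdefault(value, []).append(label)
    parts = list(partitions.values())
    weights = [len(p) / n for p in parts]                        # |D_j| / |D|
    info_a = sum(w * entropy(p) for w, p in zip(weights, parts))  # equation (1)
    gain = entropy(labels) - info_a                               # equation (2)
    split_info = -sum(w * math.log2(w) for w in weights)          # equation (3)
    return gain / split_info if split_info > 0 else 0.0           # equation (4)


labels = ["LOW", "LOW", "MID", "MID"]
print(gain_ratio(["a", "a", "b", "b"], labels))  # two-valued feature
print(gain_ratio(["a", "b", "c", "d"], labels))  # ID-like feature
```

Both features separate the classes perfectly (Gain = 1 bit), but the ID-like feature with four distinct values has SplitInfo = 2 and hence a gain ratio of 0.5 instead of 1.0, which is the bias correction described above. A constant feature has SplitInfo = 0 and is given a gain ratio of 0 here.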

Then the decision tree classifier J48 (WEKA's implementation of C4.5) is applied; it builds the tree using the information gain of the features, normalized as the gain ratio. Accuracy is estimated by cross-validation, which reduces the variance of the estimate used to find the best features. A two-pronged approach is used in the feature selection process. The first approach selects the reduced features from all the available and usable features in the dataset. The second approach is based on a meta-analysis of the literature and identifies the features (variables) used in past research with DHS data. This allows the literature-identified (a priori) features to be compared with the features identified automatically by machine learning techniques.
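A minimal NumPy sketch of the SMOTE resampling step of Phase I, for numeric features only. It follows the interpolation idea of [18]; the function name and parameters are our own, and WEKA's SMOTE filter, which also handles nominal attributes, is not reproduced here.

```python
import numpy as np


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic new minority-class rows by SMOTE-style interpolation.

    Each synthetic row lies on the segment between a randomly chosen minority
    row and one of its k nearest minority-class neighbours (Euclidean distance).
    """
    X = np.asarray(minority, dtype=float)
    n = len(X)
    if n < 2:
        raise ValueError("SMOTE needs at least two minority instances")
    k = min(k, n - 1)
    rng = np.random.default_rng(seed)
    # Pairwise distances between minority rows; a row is not its own neighbour.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_synthetic)                  # rows to start from
    chosen = neighbours[base, rng.integers(0, k, size=n_synthetic)]
    gap = rng.random((n_synthetic, 1))                           # uniform in [0, 1)
    return X[base] + gap * (X[chosen] - X[base])


low_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote(low_class, n_synthetic=6, k=2, seed=1)
print(synthetic.shape)
```

Because new rows are interpolated rather than copied, the minority class is enlarged without exact duplicates. The full distance matrix is quadratic in memory, which is acceptable for a sketch but not for very large minority classes.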

3.2. Phase II

In Phase II, a logistic regression model is constructed separately for each of the four dependent variables BMI, HAZ, WAZ and WHZ, based on the reduced set of explanatory features identified in Phase I. Since these dependent variables are nominal, nominal logistic regression is used. Features are then extracted according to their significance level and odds ratio and grouped by their functional characteristics. Nominal logistic regression is fitted with the iteratively reweighted least squares algorithm, which yields maximum likelihood estimates of the parameters. Logistic regression produces a formula that predicts the logit transformation of the probability q that the characteristic of interest is present, as given in equation (5); q itself is given by equation (7). The logit transformation is defined as the log odds, as given in equation (6).

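$$\mathrm{logit}(q) = b_0 + b_1X_1 + b_2X_2 + b_3X_3 + \cdots + b_kX_k \qquad (5)$$

$$\mathrm{odds} = \frac{q}{1-q} = \frac{\text{probability of presence}}{\text{probability of absence}} \quad\text{and}\quad \mathrm{logit}(q) = \ln\left(\frac{q}{1-q}\right) \qquad (6)$$

$$q = \frac{e^{b_0 + b_1X_1 + b_2X_2 + \cdots + b_kX_k}}{1 + e^{b_0 + b_1X_1 + b_2X_2 + \cdots + b_kX_k}} \qquad (7)$$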

The logistic formulae are stated in terms of the probability that Y = 1, referred to as q. The model estimates the expected probability that Y = 1 for given values of X, choosing the parameters that maximize the likelihood of observing the sample data. The odds ratios and significance levels, along with the coefficient values, are analysed to understand how the features affect the nutritional status of the child. Goodness-of-fit tests are carried out to check the robustness of the model.
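For a binary outcome, equations (5)-(7) and the IRLS fitting mentioned above can be sketched in NumPy. This is a minimal illustration under our own names (`fit_logistic_irls` and so on); the paper's nominal (three-class) model and its software are not reproduced. Each IRLS iteration is a Newton step that solves $(X^T W X)\,\delta = X^T(y - q)$ with weights $W = \mathrm{diag}(q(1-q))$.

```python
import numpy as np


def logit(q):
    """Equation (6): log odds ln(q / (1 - q))."""
    return np.log(q / (1 - q))


def logistic(z):
    """Equation (7): inverse of the logit, q = e^z / (1 + e^z)."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_irls(X, y, max_iter=50, tol=1e-10):
    """Binary logistic regression by IRLS; returns [b0, b1, ..., bk] of equation (5)."""
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    b = np.zeros(X.shape[1])
    for _ in range(max_iter):
        q = logistic(X @ b)
        w = q * (1 - q)                       # IRLS weights
        hessian = X.T @ (X * w[:, None])      # X^T W X
        step = np.linalg.solve(hessian, X.T @ (y - q))
        b += step
        if np.max(np.abs(step)) < tol:
            break
    return b


# One binary predictor x: the outcome odds are 1/3 when x = 0 and 3 when x = 1.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 1, 0, 1, 1, 1]
b = fit_logistic_irls(x, y)
print(b, np.exp(b[1]))
```

With a single binary predictor the fitted slope recovers the log of the odds ratio: here $b_1 \approx \ln 9 \approx 2.197$, so $e^{b_1} \approx 9$, the ratio of the two odds.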

4. Results and Analysis

The results were generated for the cleaned and merged dataset of 401 features. We also separately analysed the 164 features identified through the literature survey. The second set is based on a priori information, whereas the first set carries no such assumption. Performance was measured as the classification accuracy for the class variables BMI, HAZ, WAZ and WHZ, using 10-fold cross-validation. Fig. 2 compares the accuracy obtained with the full (automated) feature set and with the literature-identified features; it shows that the top 10 features contribute the most in all cases except WAZ. Table 4a lists the top 10 automated and literature-identified features, and their intersection is given in Table 4b. Features are marked as [E] (explanatory) or [D] (defining); every [D] feature involves height, weight or age, alone or in combination.
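The evaluation behind Fig. 2 (rank the features, keep the top k, and measure 10-fold cross-validated accuracy) can be sketched in plain Python. The classifier is passed in as `fit`/`predict` callables; the majority-class baseline below is a hypothetical placeholder for J48, and these folds are not stratified by class, unlike WEKA's cross-validation for nominal classes. All function names are our own.

```python
import random
from collections import Counter

K_VALUES = [10, 15, 20, 25, 30, 50, 100]  # feature counts on the x axis of Fig. 2


def top_k_features(scores, k):
    """Names of the k highest-scoring features (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


def k_fold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(rows, labels, fit, predict, k=10, seed=0):
    """Pooled k-fold accuracy: every instance is predicted exactly once."""
    correct = 0
    for test in k_fold_indices(len(labels), k, seed):
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        model = fit([rows[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(model, rows[i]) == labels[i] for i in test)
    return correct / len(labels)


def accuracy_curve(rows, labels, scores, fit, predict, k_values=K_VALUES):
    """Cross-validated accuracy using only the top-k ranked features, for each k."""
    curve = {}
    for k in k_values:
        keep = top_k_features(scores, k)
        projected = [{f: row[f] for f in keep if f in row} for row in rows]
        curve[k] = cross_val_accuracy(projected, labels, fit, predict)
    return curve


# Hypothetical placeholder classifier: always predicts the training majority class.
def fit_majority(rows, labels):
    return Counter(labels).most_common(1)[0][0]


def predict_majority(model, row):
    return model


labels = ["MID"] * 80 + ["LOW"] * 20
rows = [{"V209": i % 2, "M3A": i % 3} for i in range(100)]
scores = {"V209": 0.12, "M3A": 0.30}  # illustrative gain-ratio scores
print(accuracy_curve(rows, labels, scores, fit_majority, predict_majority, k_values=[1, 2]))
```

The majority baseline ignores the features, so its accuracy equals the share of the MID class; a real tree learner plugged into `fit`/`predict` is what makes the curve depend on k.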
In Table 4a, the columns list the top automated-selected and literature-identified features for all four classes. "P" in the table indicates the presence of the feature for the particular definition of malnutrition (BMI, HAZ, WAZ or WHZ).
[Figure: four line charts, (a) ACCURACY OF BMI, (b) ACCURACY OF HAZ, (c) ACCURACY OF WAZ and (d) ACCURACY OF WHZ, each comparing the Automated and Literature feature sets over 10, 15, 20, 25, 30, 50 and 100 top-ranked features.]
Fig. 2. x axis indicates the number of top-ranked features and y axis their respective accuracy: (a) Accuracy of BMI, (b) Accuracy of HAZ, (c) Accuracy of WAZ and (d) Accuracy of WHZ

The frequency of occurrence of each feature is then grouped into automated-selected, literature-selected and intersecting features. Table 4b gives the frequency list for the features common to the automated-selected and literature-identified sets: 3 features are common in the case of BMI and WHZ, and 4 in the case of HAZ and WAZ.
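Identifying the common features of Table 4b amounts to a set intersection of the automated and literature top-feature lists for each class variable. The lists below are illustrative subsets of codes that appear in Tables 4a to 6, not the paper's full top-10 lists, and `compare_feature_sets` is a hypothetical helper.

```python
def compare_feature_sets(automated, literature):
    """Split two top-feature lists into common and set-specific features."""
    auto, lit = set(automated), set(literature)
    return {
        "common": sorted(auto & lit),
        "automated_only": sorted(auto - lit),
        "literature_only": sorted(lit - auto),
    }


# Illustrative subsets only.
automated = ["V209", "V384C", "V137", "M3A", "M17"]
literature = ["V209", "V384C", "V137", "V151", "V025"]
print(compare_feature_sets(automated, literature))
```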
Table 4a: Frequency of automated selected features and literature identified features across the occurrence of four different classes of nutrition status
AUTOMATED LITERATURE
Selected features BMI HAZ WAZ WHZ identified features BMI HAZ WAZ WHZ
H42-Taking iron pills,
V151-Sex of household head[E]
sprinkles or syrup [E] P P P P
HW15-Height: lying or V025-Type of place of
standing [D] P P residence[E] P P P
V409-Gave child plain water V714-Respondent currently
[E] P P P working[E] P
M17-Delivery by caesarean
H0-Received POLIO 0[E]
section [E] P P P P P
W124-Has money for her own V384B-Heard family planning on
use [E] P TV last few months[E] P P P P
V160-Toilet facilities shared with


M3A-Assistance: doctor [E]
P P P other households[E] P
V207-Daughters who have
S47J-has a TV[E]
died[E] P P
S564-Ever had blood V122-Household has:
transfusion[E] P refrigerator[E] P P
M55E-First 3 days, given
sugar/salt solution[E] P V155-Literacy[E] P
S47U-Household has water
V238-Births in last three years[E]
pump[E] P P
V153-Has telephone[E] P V404-Currently breastfeeding[E] P
V452A-Under age 18[E] V209-Births in past year[E]
P P P
S47N-Household has V106-Highest educational
computer[E] P level[E] P
V157-Frequency of reading
S569-Drinks alcohol[E]
P newspaper or magazine[E] P
V414M-Gave child liver,
H4-Received POLIO 1[E]
heart, other organs[E] P P P
V411A-Gave child baby
V190-Wealth index[E]
formula[E] P P
M55C-First 3 days, given
H6-Received POLIO 2[E]
sugar/glucose water[E] P P

Table 4b: Frequency of intersecting features for automated features and literature selected features across the occurrence of four different classes
INTERSECTING
Features BMI HAZ WAZ WHZ
V209-Births in past year [E] P P P
V384C-Heard family planning in newspaper last few months [E] P P P P
V137-Number of children 5 and under in household (usual) [E] P
V151-Sex of household head [E] P
S47J-Has a TV [E] P
HW2-Child's weight in kilograms [D] P
V155-Literacy [E] P
M57F-Antenatal care: govt. dispensary [E] P P

5. Model construction

Based on the identified features, and considering the explanatory features [E], we constructed a logistic regression model separately for each of the four dependent variables BMI, HAZ, WAZ and WHZ. Because these dependent variables are nominal, nominal logistic regression is used; it provides the coefficients of the features across the class variables together with their odds ratios. The predictors are reported for Logit 1: low/mid (L) and Logit 2: high/mid (H). L explains the incidence and odds ratio of undernutrition, whereas H explains the incidence and odds ratio of overnutrition or obesity, both relative to the normal situation. The automated features significant at the 1, 5 and 10 percent levels are listed in Table 5, with their odds ratios for Logit 1 (low/mid) and Logit 2 (high/mid). The odds ratio measures the association between an outcome and an exposure and represents the constant effect of a predictor. An odds ratio greater than 1 is associated with higher odds of the outcome, and a value less than 1 with lower odds. Features with higher or lower odds are of specific interest as targets for addressing malnutrition of the higher (obesity) or lower (undernutrition) order relative to the normal (MID) level. Similarly, the literature-selected features significant at the 1, 5 and 10 percent levels are listed in Table 6 for low/mid and high/mid.

• If the coefficient is negative, the odds ratio is less than 1, and a higher incidence or occurrence of that feature makes the outcome (dependent variable) less likely.
• If the coefficient is positive, the odds ratio is greater than 1, and a higher incidence or occurrence of that feature makes the outcome (dependent variable) more likely.
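The two logits of the nominal model and the odds-ratio rule in the bullets above can be checked numerically. The sketch assumes the usual baseline-category form with MID as the reference class, so that $P(\mathrm{LOW}) = e^{\eta_1}/(1 + e^{\eta_1} + e^{\eta_2})$ and similarly for HIGH; `class_probabilities` and `odds_ratio` are hypothetical helper names. Exponentiating the Table 5 coefficients -0.19, -0.35, 0.60 and 0.80 gives odds ratios of about 0.83, 0.70, 1.82 and 2.23, matching the reported values.

```python
import math


def odds_ratio(coefficient):
    """Odds ratio for a one-unit increase in a predictor: exp(coefficient)."""
    return math.exp(coefficient)


def class_probabilities(eta_low, eta_high):
    """Class probabilities of a baseline-category model with MID as reference.

    eta_low  = ln(P(LOW) / P(MID))   (Logit 1, low/mid)
    eta_high = ln(P(HIGH) / P(MID))  (Logit 2, high/mid)
    """
    denom = 1.0 + math.exp(eta_low) + math.exp(eta_high)
    return {"LOW": math.exp(eta_low) / denom,
            "MID": 1.0 / denom,
            "HIGH": math.exp(eta_high) / denom}


# Coefficients reported in Table 5 and their odds ratios.
for coef in (-0.19, -0.35, 0.60, 0.80):
    print(coef, round(odds_ratio(coef), 2))

# A one-unit increase in a predictor with Logit-1 coefficient b multiplies
# the LOW/MID odds by exp(b), whatever the other predictors contribute.
b = -0.19
before = class_probabilities(0.2, -1.0)
after = class_probabilities(0.2 + b, -1.0)
print((after["LOW"] / after["MID"]) / (before["LOW"] / before["MID"]), odds_ratio(b))
```

Since $e^{b} < 1$ exactly when $b < 0$, the sign of a coefficient and the side of 1 on which its odds ratio falls always agree.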

Table 5: Frequency automated features with significance level and odds ratio. * signifies the 1% level of significance, ** signifies the 5% level of
significance and *** signifies the 10% level of significance. L indicates predictor value at Logit1 (low/mid) and H indicates predictor value at
Logit2 (high/mid)
LOW/MID and HIGH/MID (Automated) BMI HAZ WAZ WHZ
coefficient odds ratio coefficient odds ratio coefficient odds ratio coefficient odds ratio
M17: Delivery by caesarean section L[-0.11] 0.89*** L[-0.32] 0.72* L[-0.26] 0.77*
M3A: Assistance: doctor L[-0.19] 0.83* L[-0.35] 0.70* L[-0.29] 0.75* L[-0.22] 0.80*
H[0.18] 1.20***
V207: Daughters who have died L[0.14] 1.15*
V209: Births in past year L[0.60] 1.82* L[-1.26] 0.28* L[-0.29] 0.75*
H[0.20] 1.22* H[0.80] 2.23* H[0.91] 2.47*
V384C: Heard family planning in newspaper last few months L[-0.21] 0.81* L[-0.44] 0.64* L[-0.37] 0.69* L[-0.28] 0.75*
H[0.16] 1.18***
V137: Number of children 5 and under in household L[-0.11] 0.90*
H[-0.12] 0.88*
S47U: Household has water pump L[-0.17] 0.84*
V153: Has telephone L[-0.32] 0.73*
S47J: Has a TV L[-0.46] 0.63*
S47N: Household has computer L[-0.79] 0.45*
V409: Gave child plain water L[-0.57] 1.77* L[0.30] 1.34*
H[-0.94] 0.39* H[-0.60] 1.34*
V155: Literacy L[-0.28] 0.75*
V452A: Under age 18 mother H[-0.92] 0.40* H[-0.60] 0.55**

The farther the odds ratio of a feature is from 1, the greater the influence of that feature on the dependent variable. These features are therefore important in predicting the incidence of malnutrition; if analysed and understood properly, they can be targeted appropriately at the policy level to minimize its incidence. Goodness of fit is assessed with the Pearson chi-squared test for each of the four models. Our results show that the model explains WAZ better than any other dependent variable defining malnutrition.
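The Pearson goodness-of-fit statistic compares the observed class counts with the counts expected under the fitted model, summed over covariate patterns and classes. A minimal sketch with hypothetical counts and probabilities (not the paper's data); a p-value could then be read from the chi-squared distribution, e.g. with `scipy.stats.chi2.sf(statistic, df)`, where the degrees of freedom depend on the number of covariate patterns and fitted parameters.

```python
def pearson_chi_squared(patterns):
    """Pearson X^2 = sum over covariate patterns and classes of (O - E)^2 / E.

    patterns: list of (observed_counts, predicted_probabilities) pairs, one per
    covariate pattern, with the classes in the same order (e.g. LOW, MID, HIGH).
    """
    statistic = 0.0
    for counts, probs in patterns:
        n = sum(counts)
        for observed, p in zip(counts, probs):
            expected = n * p
            statistic += (observed - expected) ** 2 / expected
    return statistic


patterns = [
    ((20, 70, 10), (0.1, 0.8, 0.1)),  # hypothetical pattern of 100 children
    ((5, 40, 5), (0.1, 0.8, 0.1)),    # hypothetical pattern of 50 children, perfect fit
]
print(pearson_chi_squared(patterns))
```

A small statistic relative to its degrees of freedom indicates that the model's predicted class probabilities agree with the observed counts.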

6. Discussion

The top automated features, combined over the four class attributes, can be grouped to give a meaningful description of their contribution to undernutrition. We grouped the important features identified through automated selection (machine learning) as well as through the survey of extant literature. Broadly, these features fall into five categories, each requiring a different level of policy thrust: (1) medical and paramedical assistance, (2) past experience of the mother, (3) awareness on family planning, (4) household assets and facilities, and (5) household social characteristics.

Table 6: Frequency Literature features with significance level and odds ratio. * signifies the 1% level of significance, ** signifies the 5% level of
significance and *** signifies the 10% level of significance. L indicates predictor value at Logit1 (low/mid) and H indicates predictor value at
Logit2 (high/mid).
LOW/MID and HIGH/MID (Literature) BMI HAZ WAZ WHZ
coefficient odds ratio coefficient odds ratio coefficient odds ratio coefficient odds ratio
L_V384B: Heard family planning on TV last few months L[-0.06] 0.88* L[-0.26] 0.88* L[-0.20] 0.81*
L_V160: Toilet facilities shared with other households L[-0.14] 0.87*
L_S47J: Has a TV L[-0.26] 0.77* L[-0.40] 0.67*
H[0.18] 1.20**
V209: Births in past year L[0.62] 1.85* L[-0.63] 0.53* L[-0.56] 0.57* L[-0.24] 0.78*
H[0.24] 1.27* H[0.19] 1.69* H[0.77] 2.15* H[1.02] 2.78*
V137: Number of children 5 and under in household L[-0.09] 0.91*
H[-0.1] 0.91**
L_V025: Type of place of residence H[-0.33] 0.72* H[-0.19] 0.83** L[0.07] 1.07***
H[-0.28] 0.75*
L_V122: Household has: refrigerator L[-0.4] 0.67* L[-0.20] 0.82*
L_V155: Literacy L[-0.24] 0.79*
H[-0.12] 0.89*
L_V404: Currently breastfeeding L[0.24] 1.27*
V384C: Heard family planning in newspaper last few months L[-0.27] 0.77* L[-0.08] 0.93** L[-0.16] 0.85*
L_V238: Births in last three years H[0.19] 1.21*
V151: Sex of household head H[0.20] 1.22**
L_V106: Highest educational level L[-0.2] 0.82*
L_V157: Frequency of reading newspaper L[-0.05] 0.95*
H[0.14] 1.15**
L_H4: Received POLIO 1 L[0.12] 1.13* H[-0.22] 0.80**
H[-0.51] 0.60*
L_V190: Wealth index L[-0.26] 0.77*
H[-0.12] 0.88**
L_B4: Sex of child H[0.21] 1.24*
L_H6: Received POLIO 2 H[-0.22] 0.80*
M57F: Antenatal care: govt. dispensary L[-0.23] 0.79***

Features are extracted according to their significance levels and odds ratios and categorized into five groups by their characteristics, as shown in Table 7.
The results suggest that features related to medical and paramedical assistance and to awareness creation would substantially help in addressing malnutrition. Awareness through newspapers and television appears to have implications for addressing malnutrition. Similarly, awareness of and action on antenatal care in government dispensaries, polio vaccination, breastfeeding and giving plain, good-quality drinking water to the child are important healthcare actions that appear influential in addressing malnutrition.
On the other hand, developing human resource capabilities in households, especially among mothers (literacy, preventing underage marriage and higher educational attainment), would not only address malnutrition directly but also indirectly help households build assets. Toilet facilities and increased household income that helps generate wealth and other assets are medium- and long-term goals. It is also important to understand the incidence of malnutrition in specific geographical contexts (rural vs. urban, across states, slum areas) to identify where it is higher and to design targeted policies for those regions.

Table 7: Frequency of intersecting features for the automated and literature-based feature sets across the four classes. (Underlined features are identified from the literature; dotted-underlined features are common to the automated and literature-based sets.)

Grouping the features

Medical and paramedical assistance
  M17: Delivery by caesarean section; M3A: Assistance: doctor; V409: Gave child plain water; V404: Currently breastfeeding; H4: Received POLIO 1; H6: Received POLIO 2; M57F: Antenatal care: govt. dispensary

Past experience of the mother
  V207: Daughters who have died; V209: Births in past year; V137: Number of children 5 and under in household; V238: Births in last three years

Awareness
  V384C: Heard family planning in newspaper last few months; V384B: Heard family planning on TV last few months; S47J: Has a TV; V157: Frequency of reading newspaper

Assets and facilities
  S47U: Household has water pump; V153: Has telephone; S47N: Household has computer; V122: Household has: refrigerator; V190: Wealth index; V160: Toilet facilities shared with other households; V025: Type of place of residence

Household social characteristics
  V155: Literacy; V452A: Under age 18; V151: Sex of household head; V106: Highest educational level; B4: Sex of child
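The grouping in Table 7 and the idea of features common to the automated and literature-based sets can be expressed with plain set operations. The sketch below encodes the Table 7 groups by DHS variable code, counts how many of four class-specific selections contain each feature, and splits features into common, automated-only and literature-only sets. The per-class selections and the literature set are hypothetical placeholders, as are the helper names; this does not reproduce the paper's actual selections or tooling.

```python
from collections import Counter

# Feature groups as listed in Table 7, keyed by DHS variable code.
GROUPS = {
    "Medical and paramedical assistance": {"M17", "M3A", "V409", "V404", "H4", "H6", "M57F"},
    "Past experience of the mother": {"V207", "V209", "V137", "V238"},
    "Awareness": {"V384C", "V384B", "S47J", "V157"},
    "Assets and facilities": {"S47U", "V153", "S47N", "V122", "V190", "V160", "V025"},
    "Household social characteristics": {"V155", "V452A", "V151", "V106", "B4"},
}


def group_of(code: str) -> str:
    """Return the Table 7 group containing a variable code, or 'Ungrouped'."""
    for group, codes in GROUPS.items():
        if code in codes:
            return group
    return "Ungrouped"


def feature_frequency(per_class_features: list[set[str]]) -> Counter:
    """Count in how many class-specific feature sets each feature occurs."""
    counts: Counter = Counter()
    for features in per_class_features:
        counts.update(features)  # each set contributes at most 1 per feature
    return counts


def compare(automated: set[str], literature: set[str]) -> dict:
    """Split features into common, automated-only and literature-only sets."""
    common = automated & literature
    return {
        "common": sorted(common),
        "automated_only": sorted(automated - literature),
        "literature_only": sorted(literature - automated),
        "common_by_group": Counter(group_of(code) for code in common),
    }


# Hypothetical inputs: four class-specific selections and a literature-derived set.
per_class = [
    {"V404", "V122", "V157"},
    {"V404", "V190"},
    {"V122", "V404", "M57F"},
    {"V155", "V122"},
]
literature = {"V106", "V190", "V404", "V122", "V155"}

freq = feature_frequency(per_class)
result = compare(set(freq), literature)
print(result["common"])           # ['V122', 'V155', 'V190', 'V404']
print(result["common_by_group"])
```

Using sets keeps the comparison independent of feature order, and the frequency count shows which features recur across classes, which is one way to read the "intersecting features" in the caption.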

7. Conclusion

Relying on either artificial or human intelligence alone has both advantages and drawbacks. Human intelligence may have limited cognitive capacity to assimilate multiple features at once, but it is valuable for logical deduction. Artificial intelligence, on the other hand, can process large volumes of information but lacks these deductive abilities. Therefore, combining the two would be useful for selecting important features and informing policy decisions. Artificial intelligence techniques are used in many applications, ranging from the medical domain to robotics [20-22]. Human intelligence remains paramount, but it becomes more useful when combined with features selected through artificial intelligence. Our comparison of literature-based features with machine-learning-selected features brings out these aspects. However, our data are limited to a single period and to the Indian context. Disaggregating the data by geography and using longitudinal data collected at different points in time would bring out relevant features more explicitly. In addition, the newly identified features can be examined at field level to establish their influence on malnutrition. Future research can address these issues.

References

[1] Pelletier DL, Frongillo EA. Changes in child survival are strongly associated with changes in malnutrition in developing countries. J
Nutr 2003; 133:107-19.
[2] El-Ghannam AR. The global problems of child malnutrition and mortality in different world regions. J Health Soc Policy 2003; 16:1-26.
[3] World Bank Report on Malnutrition in India. Source: The World Bank (2009). Retrieved 2017-03-13.
[4] http://www.dhsprogram.com/. Accessed January 2016.
[5] SN Ariyadasa, LK Munasinghe, SHD Senanayake, NAS Fernando. Data Mining Approach to Minimize Child Malnutrition in
Developing Countries, Sri Lankan Context. The International Conference on Advances in ICT for Emerging Regions-ICTer2012,
Colombo, Sri Lanka.
[6] Nguyen Ngoc Hien, Sin Kam. Nutritional Status and the Characteristics Related to Malnutrition in Children under Five Years of Age in Nghean, Vietnam. J Prev Med Public Health 2008; 41(4):232-40.

[7] S Anoop, B Saravanan, A Joseph, A Cherian, K S Jacob. Maternal depression and low maternal intelligence as risk factors for malnutrition in children: a community based case-control study from South India. Arch Dis Child 2004 Apr; 89(4):325-329.
[8] Griffiths PL, Bentley ME. The Nutrition Transition Is Underway in India. J Nutr 2001; 131(10):2692-2700.
[9] R Morales, A María Aguilar, A Calzadilla. Geography and culture matter for malnutrition in Bolivia. Econ Hum Biol 2004; 2(3):373-389.
[10] Suriyakala V, Deepika MG, Amalendu J, Deepa G. Factors Affecting Infant Mortality Rate in India: An Analysis of Indian States.
Intelligent Systems Technologies and Applications. Advances in Intelligent Systems and Computing, vol 530. Springer; pp 707-19.
[11] A Kale, N Auti. Automated Menu Planning Algorithm for Children: Food Recommendation by Dietary Management System using ID3 for Indian Food Database. 2nd International Symposium on Big Data and Cloud Computing (ISBCC'15), Elsevier, 2015.
[12] Z Markos, F Doyore, M Yifiru, J Haidar. Predicting Under Nutrition Status of Under-Five Children Using Data Mining Techniques: The Case of 2011 Ethiopian Demographic and Health Survey. Journal of Health and Medical Informatics 2014; 5(2).
[13] AL Fort, MT Kothari, N Abderrehim. Association between Maternal, Birth and Newborn Characteristics and Neonatal Mortality in Five
Asian Countries. DHS Working Papers No. 55. Calverton, Maryland, USA: Macro International. Available at
http://dhsprogram.com/pubs/pdf/WP55/WP55.pdf.
[14] MT Ruel, P Menon. Child Feeding Practices Are Associated with Child Nutritional Status in Latin America: Innovative Uses of the
Demographic and Health Surveys. J Nutr 2002; 132(6):1180-7.
[15] M Arimond, MT Ruel. Dietary diversity is associated with child nutritional status: evidence from 11 Demographic and Health Surveys. J Nutr 2004; 134(10):2579-85.
[16] BA Abuya, J Ciera, EK Murage. Effect of mother's education on child's nutritional status in the slums of Nairobi. BMC Pediatrics 2012; 12:80. Source: PubMed.
[17] F Arnold, S Parasuraman, P Arokiasamy, M Kothari. Nutrition in India. National Family Health Survey (NFHS-3) India 2005-06.
Mumbai: International Institute for Population Sciences; Calverton, Maryland, USA: ICF Macro.
[18] NV Chawla, KW Bowyer, LO Hall, WP Kegelmeyer. SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research 2002; 16:321-357.
[19] http://www.cs.waikato.ac.nz/ml/index.html
[20] G Deepa, K Sangita, A Ashish. A method to predict diagnostic codes for chronic diseases using machine learning techniques. Proceedings of the IEEE International Conference on Computing, Communication and Automation (ICCCA 2016); 281-287.
[21] K Sangita, G Deepa. Association rule analysis in cardiovascular disease, 2nd International Conference on Cognitive Computing and
Information Processing, CCIP 2016, Mysuru; India. Article number 7802881.
[22] D Vinitha, G Deepa, K Sangita, A Ashish. Investigation of chronic disease correlation using data mining techniques. 2nd International
Conference on Recent Advances in Engineering and Computational Sciences RAECS 2015; Article number 745332.
