PhysioNet 2012 Challenge: Predicting ICU Patient Mortality

Felipe Feijoo, Dongping Du, and Thomas Gebregergis

Abstract—ICU resources are limited because of their high cost, so intensive care needs to be provided to the patients who need it most. A good approach to predicting in-hospital mortality can therefore be very helpful in evaluating the need for intensive care and in making triage decisions. The PhysioNet 2012 challenge collected data from patients during their first 48 hours of ICU stay and called for methods to predict mortality. In this project, we applied machine learning methods to learn from this large ICU dataset. Features such as the mean, variance, median, skewness, and kurtosis were extracted, and non-significant features were removed with methods such as the t-test and forward/backward/stepwise selection. Logistic regression, support vector machine, and neural network models were trained to make predictions. The logistic regression model gave the best performance and correctly classified 42% of the patients in dataset C.

I. INTRODUCTION

The intensive care unit (ICU) is a special hospital department that provides intensive care to severely ill and injured patients. This care cannot be provided to all patients because of the limited resources and high cost of ICUs. Hence, determining who would benefit most from ICU facilities is one of the most important decisions in ICU operation. Different methods have been proposed to estimate the severity of illness and to predict the survival probability of patients. For example, many researchers have proposed simple scores such as APACHE (Acute Physiology and Chronic Health Evaluation) [1], MPM [2], and SAPS [3] to evaluate illness severity. These methods assign a weight ranging from 0 to 4 to each physiological or chemical measurement collected during the first 36 hours of a patient's ICU stay; the sum of these weights scores the severity of the patient's illness. However, in these scoring methods the weight of each measurement is determined by a panel of experts and therefore depends heavily on the expertise of the panel members.

Rapid advances in computational techniques and data-processing capability now enable researchers to exploit the potential of large datasets. Many machine learning methods have been proposed and applied in hospitals to improve patient monitoring systems, especially in ICUs, where patients need special observation [4].

The PhysioNet 2012 challenge collected data from ICU patients during the first 48 hours of their stay and invited participants to develop models for predicting in-hospital mortality. Considering the complexity of the data, we applied several machine learning methods to analyze the patient data and trained different models. Features were extracted from the data, and several methods were applied to remove non-significant features. Logistic Regression (LR), Support Vector Machine (SVM), and Neural Network (NN) models were trained to predict in-hospital mortality risk.

II. MATERIALS AND METHOD

A. Data

This project used the dataset provided by the PhysioNet 2012 challenge. The data were originally collected from the open-access Multiparameter Intelligent Monitoring in Intensive Care (MIMIC) II database [5]. Three datasets were provided, i.e., set A, set B, and set C. Each dataset comprised 4000 patients' information and measurements during their first 48 hours of ICU stay. Time-stamped measurements of 37 distinct variables were recorded, and 5 static variables were collected when the patient first entered the ICU. The in-hospital mortality outcomes of dataset A were given for model development. Databases such as MIMIC II mesh data from various sources and thus provide more opportunities to develop mathematical and statistical models that can predict a patient's outcome and hence support ICU operating decisions.

B. Feature Extraction

We first converted each temporal variable into scalar features, i.e., for each temporal variable the mean, median, variance, maximum, minimum, skewness, and kurtosis were calculated. The equations used to calculate these features are shown in Table 1. In addition, each variable was fitted to a linear regression model $y = \beta_0 + \beta_1 t$, where $t$ (hours) is the time at which the data point was measured and $y$ is the linearly predicted value; we minimized the differences between the real data and the predicted data to obtain the best set of $\beta$. Thus, for each temporal variable, 9 features were extracted. The 5 static variables were used directly as patient features. Note that 'MechVent' is a binary temporal variable for which these scalar features are unnecessary; we instead set a single feature to 1 if mechanical ventilation was used for a patient during the 48 hours, and 0 otherwise. Therefore, each patient has 330 features (36 temporal variables × 9 features + the MechVent indicator + 5 static variables), and the size of the whole dataset A is 4000 by 330.

Table 1. Equations used to calculate features

  Mean:      $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$

  Variance:  $\mathrm{var} = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2$

  Skewness:  $s_1 = \dfrac{\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^3}{\left(\sqrt{\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2}\right)^3}$

  Kurtosis:  $k_1 = \dfrac{\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^4}{\left(\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2\right)^2}$

  $\beta_0, \beta_1$:  $\min_{\beta_0,\beta_1} \sum_{i=1}^{n} \left| X_i - (\beta_0 + \beta_1 t_i) \right|$
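For illustration, this extraction step can be sketched in a few lines of Python. This is a sketch only: the function name extract_features is ours, and np.polyfit performs a least-squares fit of $\beta_0$ and $\beta_1$, whereas Table 1 defines the fit through absolute differences.

import numpy as np
from scipy import stats

def extract_features(t, x):
    """Return the 9 scalar features for one temporal variable.

    t: measurement times in hours since ICU admission
    x: the measured values at those times
    """
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    # Least-squares fit of x = beta0 + beta1 * t (polyfit returns the
    # highest-order coefficient first).
    beta1, beta0 = np.polyfit(t, x, 1)
    return {
        "mean": np.mean(x),
        "median": np.median(x),
        "variance": np.var(x),                     # 1/n form, as in Table 1
        "maximum": np.max(x),
        "minimum": np.min(x),
        "skewness": stats.skew(x),                 # biased form, as in Table 1
        "kurtosis": stats.kurtosis(x, fisher=False),
        "beta0": beta0,
        "beta1": beta1,
    }

Stacking these features for all temporal variables, together with the static variables, yields the 4000-by-330 feature matrix described above.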
C. Feature Selection

T-test. A t-test was performed on all candidate features to check whether they are significant in predicting the mortality of ICU patients, assuming that each feature has independent samples in both outcome groups. The 248 features were tested for a significant difference between the group populations. The result of this hypothesis test indicates the relevancy of each variable to the outcome, which is 0 for patients who survived and 1 for in-hospital death. The hypotheses of this statistical test are:

  Null hypothesis (H0): $\mu_{0i} = \mu_{1i}$
  Alternative hypothesis (H1): $\mu_{0i} \neq \mu_{1i}$

where $\mu_{0i}$ is the population mean of feature $i$ for the group of patients who survived at the end of 48 hours, $\mu_{1i}$ is the population mean of feature $i$ for the group of patients who died within 48 hours, and $i = 1, 2, 3, \ldots, 248$. The test statistic is

  $t = \dfrac{\bar{y}_{1i} - \bar{y}_{0i}}{\sqrt{\dfrac{s_{1i}^2}{n_{1i}} + \dfrac{s_{0i}^2}{n_{0i}}}}$

where $\bar{y}_{0i}$ and $\bar{y}_{1i}$ are the sample means of feature $i$ for the surviving and deceased groups, $s_{0i}^2$ and $s_{1i}^2$ are the corresponding sample variances, and $n_{0i}$ and $n_{1i}$ are the corresponding sample sizes. If $|t| > t_{0.025,\, n_{1i}+n_{0i}-2}$, we reject the null hypothesis, which indicates that the means of the two groups differ and, in turn, that feature $i$ is a qualified predictor. In general, a feature with a p-value of less than 0.05 was considered a significant predictor. Thus, out of the 248 features tested with this statistic, 137 were selected, including the predictor variables Age, Height, and Weight.

Correlation test using Pearson's correlation coefficient. Pearson's correlation test was then performed on the 137 significant features to check for linear relationships among them. The pairwise correlation coefficients of the 137 features were calculated and arranged in a correlation matrix. We were interested in coefficients with absolute value greater than 0.9; 28 pairs of variables fell in this range. To decide which variable of each pair to remove, we examined the p-values from the t-test described above: the variable with the smaller p-value was kept as the representative of the highly correlated pair. In addition, when developing the logistic regression models (discussed later in the paper), feature selection algorithms were utilized (forward selection, backward selection, and stepwise selection).
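Both filtering steps can be condensed into a short Python sketch. The names select_features, alpha, and r_max are ours, and scipy's Welch-type two-sample t-test stands in for the statistic above.

import numpy as np
from scipy import stats

def select_features(X, y, alpha=0.05, r_max=0.9):
    """X: (n_patients, n_features) matrix; y: 0 = survived, 1 = died."""
    # Step 1: keep features whose group means differ significantly.
    _, pvals = stats.ttest_ind(X[y == 0], X[y == 1], axis=0, equal_var=False)
    keep = np.where(pvals < alpha)[0]

    # Step 2: among kept features, drop the less significant member of
    # every pair with |Pearson r| > r_max.
    r = np.corrcoef(X[:, keep], rowvar=False)
    drop = set()
    for i in range(len(keep)):
        for j in range(i + 1, len(keep)):
            if abs(r[i, j]) > r_max:
                drop.add(j if pvals[keep[j]] > pvals[keep[i]] else i)
    return [f for idx, f in enumerate(keep) if idx not in drop]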
D. Outliers and Data Imputation

The deletion or replacement of outlier data points was performed after the features were calculated (the outliers could equally well have been removed first). Depending on the feature, outliers were either deleted or replaced by the median or mean of the column. A chi-squared test for outliers was also used to evaluate whether a data point should be treated as an outlier. Two datasets were created: the first was built by replacing missing values (data imputation) and outliers with the mean value of the column, and the second used the median value of the column (in both datasets the outlier values were excluded when computing these statistics). We only considered in the models those features with less than 50% missing values.

E. Model Description

Neural Network Model. Neural network models are widely used in many fields because of their good performance on complex and nonlinear problems. Most of the data in our case are not normally distributed; hence, a neural network may be an efficient tool for learning the hidden information in the data. Figure 1 shows the overall block diagram of the neural network structure used in this project. After removing non-significant and highly correlated items, 110 features remained. Considering model complexity and computational efficiency, we chose a three-layer network with $S_1 = 50$, $S_2 = 50$, and $S_3 = 1$ neurons in the respective layers. Since we are estimating the survival probability of patients, the log-sigmoid transfer function is applied at each layer. The output of this neural network is

  $a^3 = f^3(W^3 f^2(W^2 f^1(W^1 P + b^1) + b^2) + b^3)$

where

  $W^i = \begin{pmatrix} w^i_{1,1} & \cdots & w^i_{1,n} \\ \vdots & \ddots & \vdots \\ w^i_{S_i,1} & \cdots & w^i_{S_i,n} \end{pmatrix}, \qquad b^i = (b_1, \ldots, b_{S_i})^T,$

$i = 1, 2, 3$, and $n$ = 110, 50, and 50 accordingly. $P$ is the matrix composed of all the features, and $a^3$ is the predicted value. All model parameters were fitted to obtain the optimal fit to the in-hospital outcomes.
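The forward pass of this network follows directly from the equation above. The NumPy sketch below is our code: weights are randomly initialized for illustration, and the training procedure (e.g., backpropagation) is omitted.

import numpy as np

def logsig(z):
    """Log-sigmoid transfer function, applied at every layer."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
shapes = [(50, 110), (50, 50), (1, 50)]              # (S_i, n_i) per layer
W = [rng.normal(scale=0.1, size=s) for s in shapes]  # W^1, W^2, W^3
b = [np.zeros((s[0], 1)) for s in shapes]            # b^1, b^2, b^3

def predict_risk(P):
    """P: (110, n_patients) feature matrix; returns values in (0, 1)."""
    a = P
    for Wi, bi in zip(W, b):
        a = logsig(Wi @ a + bi)   # a^i = f^i(W^i a^{i-1} + b^i)
    return a                      # a^3, the network output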
Figure 1. Neural network structure: inputs $p_1, \ldots, p_n$ feed two layers of 50 log-sigmoid neurons each and a final single-neuron layer that outputs the predicted risk.

Logistic Regression (LR). Logistic regression is a type of regression commonly used to model the relation of covariates to a categorical response variable. The model estimates the probability that an outcome belongs to a category (response variable, 0 or 1) based on the predictors $x_i$. In this work, the probability $p$ represents the probability of death of a single patient. The model is described as follows:

  $\log\left(\dfrac{p}{1-p}\right) = \beta_0 + \sum_i \beta_i x_i$

Therefore, the probability of belonging to a category is calculated as follows:

  $p = \dfrac{1}{1 + e^{-(\beta_0 + \sum_i \beta_i x_i)}}$

This probability is then a function of the predictors and the regression coefficients $\beta_i$.
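In practice such a model can be fitted with any standard package. A minimal scikit-learn sketch (our code; it assumes the selected features are already the columns of X, and the 0.77 threshold is the one reported in the Results) is:

import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_and_classify(X_train, y_train, X_test, threshold=0.77):
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    # p = 1 / (1 + exp(-(beta0 + sum_i beta_i * x_i)))
    p_death = model.predict_proba(X_test)[:, 1]
    return (p_death > threshold).astype(int)  # 1 = predicted in-hospital death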
Support Vector Machine (SVM). The SVM is a classification technique that does not require any distributional assumption. It is popular for its non-parametric nature and its robustness with high-dimensional data. It uses a subset of the training cases, known as support vectors, to represent the decision boundary, and it works by creating a maximal separating hyperplane between the positive and negative cases. For a linearly separable case, the parameters are trained such that

  $w \cdot x_i + b \geq 1$ if $y_i = 1$
  $w \cdot x_i + b \leq -1$ if $y_i = -1$

If the data are not linearly separable, a special treatment is needed that involves transforming the data into a new n-dimensional space. This is computationally challenging, so it is usually handled with a kernel trick. Radial basis, polynomial, quadratic, and linear kernels are the most commonly used kernel functions. In this classification algorithm, the polynomial kernel of order 3 was found to be very compelling.

III. RESULTS

In this section we present the models that we were able to develop.

A. Logistic Regression

The model was obtained using stepwise feature selection. This algorithm selected 105 features as predictors (summarized in Table 2). The threshold to classify patients was set to 0.77 (predicted probability > 0.77 classified as death). The model is based on the dataset imputed with the mean value of each feature. The data were divided into 80% training and 20% testing. The unofficial event 1 score on dataset A is 0.46.

Table 2. Features used in the logistic regression model

  Static variables: Age, ICUType, MechVent
  Mean:     HR, Lactate, GCS, Urine, FiO2, HCO3, pH, Weight, PaO2, Na, NIMAP, MAP, DiasABP, K
  Median:   GCS, Temp, pH, Creatinine, PaCO2, Platelets, NIMAP, NIDiasABP, DiasABP
  Kurtosis: DiasABP, FiO2, HCO3, Glucose, WBC, SysABP, Temp, GCS, NISysABP, MAP, K
  Skewness: HCO3, HR, PaO2, Creatinine, WBC, Urine, Platelets, pH, GCS, Na, NIDiasABP, PaCO2
  Minimum:  BUN, GCS, pH, Mg, Temp, PaO2, HCT, Creatinine, Na, DiasABP, PaCO2, HR
  Maximum:  Temp, NISysABP, FiO2, HCO3, PaO2, Platelets, pH, NIDiasABP, DiasABP, MAP, PaCO2
  Beta0:    Creatinine, Mg, PaCO2, Platelets, MAP, HCT, Temp, K
  Beta1:    GCS, PaO2, HCO3, Platelets, NIDiasABP, Na, FiO2, NIMAP, Creatinine, Glucose, BUN, SysABP
  Variance: Temp, PaCO2, NIDiasABP, HCO3, PaO2, Platelets, GCS, HR, Na

B. Neural Networks

Figure 2. Neural network structure.

First, the irrelevant features were removed with stepwise feature selection, and highly correlated values were deleted. The significant features (see Table 3) were then fed into the neural network model, whose structure is shown in Figure 2. A 0.5 threshold was used to predict mortality, i.e., survival = 1 when risk > 0.5 and survival = 0 otherwise. The model performance was evaluated on the testing dataset, which accounts for 20% of the whole dataset. The sensitivity is 0.5882, and the positive predictive value is 0.6452. The unofficial event 1 score on dataset A is 0.5781, and the unofficial event 2 score on dataset A is 321.4658.

Table 3. Features used in the neural network model

  Static variables: Age, ICUType
  Mean:     Lactate, GCS, FiO2, pH, Weight, PaO2, DiasABP, Glucose, WBC, HCT, Mg, SysABP
  Median:   Temp, pH, Platelets, NIMAP, NIDiasABP, Mg, HR, BUN
  Kurtosis: HCO3, Weight, Platelets, GCS, HR, DiasABP, Temp, Na, NIDiasABP, FiO2, WBC, pH, Creatinine
  Skewness: NISysABP, HCO3, Glucose, PaCO2, Na, Platelets, FiO2, WBC, SysABP, Creatinine, DiasABP
  Minimum:  BUN, GCS, PaO2, Mg, Glucose, Creatinine, NIMAP, NIDiasABP, pH, K, DiasABP, HR, Urine, MAP
  Maximum:  Urine, Temp, Glucose, PaO2, NISysABP, DiasABP, FiO2, BUN, HCT, Platelets
  Beta0:    BUN, Temp, Mg, Creatinine, PaCO2, NIDiasABP, NIMAP, DiasABP, FiO2, HR
  Beta1:    GCS, PaO2, Na, HCO3, Creatinine, NIDiasABP, Weight, NISysABP, SysABP, HR, Platelets
  Variance: Temp, Platelets, Glucose, Creatinine, SysABP, PaO2, Urine, NISysABP, pH, PaCO2, HCT
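The sensitivities, positive predictive values, and event 1 scores reported in this section are tied together by the challenge's definition of the event 1 score as the minimum of sensitivity and positive predictive value (consistent with the Se/PPV pairs quoted in the Summary). A small sketch of this computation (the function name is ours):

import numpy as np

def event1_score(y_true, y_pred):
    """Event 1 score: min(sensitivity, positive predictive value)."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    se = tp / (tp + fn)    # sensitivity
    ppv = tp / (tp + fp)   # positive predictive value
    return min(se, ppv)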
C. Support Vector Machine

Two methods were used to classify patients with respect to in-hospital death. The first SVM algorithm, from the Penalized SVM library, used 109 predictor variables. It uses penalty functions to deal with the high dimensionality of the data, controlled by a tuning parameter lambda in the range [0.01, 1]. The best lambda was 0.5, and this algorithm yielded 47 significant variables (see Table 4). Using the output of the previous model, an SVM was trained again on the 47 significant variables with a polynomial kernel of order 3, since the data were not linearly separable. The best event 1 score for this model was 0.361 on dataset B and 0.33 on dataset C.

Table 4. Features used in the support vector machine

  Static variables: Age, ICUType
  Mean:     Mg, Lactate, Urine, HCO3, FiO2, GCS, NIDiasABP, Weight, HR, PaCO2
  Median:   PaO2, Creatinine, Mg, Urine, HCO3, GCS
  Kurtosis: HCO3, Na
  Skewness: HCO3, Na, pH
  Minimum:  GCS, WBC, DiasABP, HCO3, PaO2, Mg, BUN, MAP
  Maximum:  Weight, Creatinine
  Beta0:    HR, WBC, Creatinine
  Beta1:    Temp, BUN, Creatinine, GCS, PaO2, HCO3
  Variance: FiO2, HCO3
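A rough Python analogue of this two-stage procedure is sketched below. This is an assumption-laden sketch: scikit-learn has no direct port of the Penalized SVM library, so an L1-penalized linear SVM stands in for the penalty-based variable selection, with the parameter C playing a role loosely analogous to 1/lambda.

import numpy as np
from sklearn.svm import LinearSVC, SVC

def two_stage_svm(X_train, y_train, C=2.0):
    # Stage 1: sparse (L1-penalized) linear SVM; variables whose
    # coefficients shrink to zero are discarded.
    lin = LinearSVC(penalty="l1", dual=False, C=C, max_iter=10000)
    lin.fit(X_train, y_train)
    selected = np.flatnonzero(np.abs(lin.coef_.ravel()) > 1e-6)

    # Stage 2: polynomial kernel of order 3 on the surviving variables.
    clf = SVC(kernel="poly", degree=3).fit(X_train[:, selected], y_train)
    return clf, selected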
IV. DISCUSSION

We have trained three different types of models using support vector machine (SVM), neural network (NN), and logistic regression (LR) algorithms. Each model was trained using dataset A and tested on datasets B and C; 80% of set A was used to train the algorithms, and the remaining 20% was used to validate the models.

The logistic regression model achieved the best scores of all, with 0.404 on set B and 0.42 on set C for event 1. The SVM model seems highly overfitted: even though it predicts well enough on the training set (around 0.80), it only reaches a score of 0.361 on set B and 0.33 on set C. The main advantage of the SVM model was its relative simplicity, since it used only 47 predictor variables; 109 predictor variables were used to train the neural network and 105 for the logistic regression.

Table 5. Scores of the models

  Trial    Score on set A   Event 1 (set B)   Event 1 (set C)   Model
  Trial 2  0.87             0.361             0.330             SVM
  Trial 3  0.57             0.364             0.359             NN
  Trial 4  0.46             0.404             0.420             LR

V. SUMMARY AND CONCLUSION

In this work we have addressed the PhysioNet 2012 challenge, whose main goal is to predict survival or death for patients attending the ICU. Data preparation was performed by taking care of outliers and missing values, and features were calculated for each variable. Feature selection was performed by examining the correlation matrix and utilizing algorithms such as forward/backward/stepwise selection. Three main models are presented: a support vector machine, a neural network, and a logistic regression model. Of these three, logistic regression outperformed the other two, obtaining an event 1 score of 0.404 (Se = 0.449573, PPV = 0.403994) on testing set B and 0.42 (Se = 0.482394, PPV = 0.420245) on testing set C. From the scores on sets A, B, and C we conclude that the NN and SVM models were probably overfitted to dataset A (high scores on set A but low scores on sets B and C). This is not the case for logistic regression, whose score decreased much less on sets B and C in comparison with set A. Better performance could be achieved by improving the data processing, missing-value manipulation/imputation, and outlier treatment, and by developing better techniques to estimate the classification thresholds. The knowledge and support of a physician could also help us better understand the relations among the variables and identify the important or significant ones; this knowledge could help us develop better models and improve the scores.

VI. TEAM MEMBERS' CONTRIBUTION

All the members of our team contributed greatly. The design of the study was discussed and agreed by all team members. The progress of the project and the contribution of each member are listed below:

  Data organization & feature extraction:           Dongping Du
  Feature selection (t-test & correlation test):    Thomas Gebregergis
  Feature selection (step-wise feature selection):  Felipe Feijoo
  Models (LR model):                                Felipe Feijoo
  Models (NN model):                                Dongping Du
  Models (SVM model):                               Thomas Gebregergis
  Imputation and missing values treatment:          Felipe Feijoo, Thomas Gebregergis
  Model submission:                                 Dongping Du
  Report (abstract, introduction & method):         Dongping Du
  Report (discussion and method):                   Thomas Gebregergis
  Report (results and method):                      Felipe Feijoo
  Report (final revision and proofreading):         Felipe Feijoo

REFERENCES

[1] Knaus W, Zimmerman J, Wagner D, Draper E. "APACHE II: a severity of disease classification system." Critical Care Medicine 1985;13:818–829.
[2] Lemeshow S, Teres D, Klar J, et al. "Mortality Probability Models (MPM II) based on an international cohort of intensive care unit patients." JAMA 1993;270:2478–2486.
[3] Le Gall JR, Lemeshow S, Saulnier F. "A new Simplified Acute Physiology Score (SAPS II) based on a European/North American multicenter study." JAMA 1993;270:2957–2963.
[4] Harra L, Williams D, Harris S, Martinez D, Fong K. "2012 PhysioNet Challenge: An Artificial Neural Network to Predict Mortality in ICU Patients and Application of Solar Physics Analysis Methods." Computing in Cardiology (CinC) 2012;485–488.
[5] Saeed M, Villarroel M, Reisner A, Clifford G, Lehman L, Moody G, Heldt T, Kyaw T, Moody B, Mark RG. "Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II): a public-access intensive care unit database." Critical Care Medicine 2011;39(5):952–960.
