
5th International Conference on Advances in Civil Engineering (ICACE-2020)

21-23 December 2020


CUET, Chattogram, Bangladesh
www.cuet.ac.bd/icace

PREDICTING HOSPITAL ADMISSIONS IN DHAKA DUE TO CHEST
DISEASES USING MULTIPLE LINEAR REGRESSION AND FEED
FORWARD NEURAL NETWORK
R.A. Rafsan¹*, Z.S. Ishmam², T. Ahammed³
¹Department of Civil Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh, email:
rizvanahmedrafsan@ug.ce.buet.ac.bd
²Department of Civil Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh, email:
zuhayrshahidishmam@ug.ce.buet.ac.bd
³Department of Civil Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh, email:
turazahammed@ug.ce.buet.ac.bd
*Corresponding Author

Abstract

The prevalence of various kinds of chest diseases, including Pulmonary Tuberculosis, Asthma, COPD,
Emphysema, Pneumonia and allied diseases, has been on the rise worldwide. Hospitalization and morbidity of
patients with some chest and pulmonary diseases have shown a distinct correlation with air quality. Specialized
hospitals that treat chest diseases have seen a sharp rise in admissions in Dhaka City, where the air quality
is among the worst in the world. This study aims to find models that predict hospital admissions due to chest
diseases using air quality levels and meteorological parameters as predictors. We developed two prediction
models, Multiple Linear Regression (MLR) and Multi-layer Perceptron (MLP), a type of feedforward
Artificial Neural Network (ANN), to predict hospital admissions due to chest diseases, and compared the
performance of these models. We collected daily hospital admission data from the National Institute of Diseases of
the Chest and Hospital (NIDCH), Dhaka, a specialized hospital for treating patients with different kinds
of chest diseases. Daily average pollutant concentrations were collected from three Continuous Air
Monitoring Stations (CAMS) in Dhaka, and daily average meteorological parameters for Dhaka were
collected from the Bangladesh Meteorological Department. All data were collected for the period 2013 to 2018.
We measured the performance of the models using the Root Mean Squared Error (RMSE) of the predicted number of
hospital admissions, with no scaling of the data. The RMSE of the prediction model derived via MLP is
12.879, which is lower than the RMSE of 12.978 for the model derived via MLR. The results
show that the prediction error of the model derived via MLP is usually lower than that of the model derived via
MLR, which indicates that Artificial Neural Networks can improve the prediction performance for this type of
dataset. This study demonstrates a modeling approach for predicting hospital admissions from air pollution levels,
which could help hospitals prepare for future surges in patients due to severe air pollution.

Keywords: Chest disease; air quality; meteorological parameter; Multi-layer Perceptron (MLP); Multiple Linear
Regression (MLR).

1. Introduction

Dhaka city has seen a staggering increase in air pollutant concentrations in the past decade. With the gradual
worsening of air quality, the number of patients with chest and respiratory diseases has also risen in the hospitals
of Dhaka city. Past studies show a correlation between ambient air pollution and health deterioration [1]. Many
studies also recognize that increases in air pollution levels are associated with respiratory tract and chest diseases
such as Asthma and Chronic Obstructive Pulmonary Disease (COPD) [2, 3]. Along with air pollution, past studies
have found that atmospheric changes such as weather and meteorological conditions can suddenly trigger
emergency hospital visits by patients with respiratory conditions [4, 5].

Studies that use statistical analysis and machine learning based techniques to assess the association of daily
averaged pollutant concentrations, meteorological variables and daily hospitalization counts due to respiratory
diseases include Multiple Linear Regression (MLR) analysis [6] and a linear regression model with variables
unlagged or lagged by 24 hours [7]. Examples of neural network studies in the same field include the use of both
Artificial Neural Network (ANN) and conditional logistic regression [8], an ANN-based classifier using Multi-Layer
Perceptron (MLP) with the backpropagation algorithm that predicts peak events (peak demand days) [9], and seven
days ahead forecasting of childhood hospital admissions using an ANN with meteorological and air pollution data as
input variables [10]. There have been few studies related to air pollution, meteorological influence and health hazards
in Bangladesh [11–13]. To the best of our knowledge, there have been no past studies in Bangladesh that create a
prediction model for patients suffering from chest diseases.

This study seeks to develop forecast models using MLP and MLR and to evaluate the possible association of
meteorological parameters and air pollution with the number of daily hospital admissions due to chest diseases.
The patient data we used is the number of daily patient admissions at the National Institute of Diseases of the
Chest and Hospital (NIDCH), Dhaka. This hospital best represents the acute respiratory distress of
the general population exposed to an alarming pollution level. We also evaluated and compared the performance of
the two models in predicting the number of indoor admissions from air pollutant and
meteorological data. We believe this study can help hospital administrators and policy makers in both environmental
and medical disciplines take suitable decisions to manage sudden surges of patients from a broader perspective.

2. Methodology
2.1 Study Site

Dhaka city, currently the sixth most densely populated city in the world, has an area of 2161 km². Following the
global urbanization trend, the city has become one of the fastest growing cities. However, due to the lack of proper
planning for infrastructure and resource management, and negligence towards environmental policy frameworks,
Dhaka Metropolitan Area (DMA) has witnessed a dreadful degradation in the overall environmental condition,
especially in air quality [14]. Vehicular emissions, brick kilns and construction works have been identified as
the major sources of pollution inside the city for the past few decades [15–17]. The major roads with heavy vehicle
traffic are located in the south and south-eastern part of the city, where the Continuous Air Monitoring Systems
(CAMS) are operating. Most of the brick kiln clusters are in the six districts surrounding DMA [18]. The meteorological
data collection station is located at Agargaon, which is within three kilometers of all three CAMS.

2.2 Medical, Meteorological and Pollutant Data

The National Institute of Diseases of the Chest and Hospital (NIDCH), Mohakhali, Dhaka, maintains a database of its daily
indoor and outdoor patient records. The records include gender and morbidity data for indoor patients, and for
outdoor attendees separate demographic data for children and adults (both male and female) is also available. For
our study, the daily number of patient admissions (indoor patients) was collected for the period 2013 to 2018.
The corresponding daily meteorological data for Dhaka city was obtained from the Bangladesh Meteorological
Department (BMD), which collects and processes meteorological data from different monitoring stations
throughout the country. Of all the meteorological parameters recorded by BMD, we used 4 parameters in this
study: average dry-bulb temperature (°C), average humidity (%), total rainfall (mm) and prevailing
wind speed (knots).
Pollutant data was provided by the Department of Environment (DoE), Dhaka. Their Clean Air and Sustainable
Environment (CASE) project monitors the criteria pollutants (carbon monoxide, nitrogen dioxide, ozone, sulfur
dioxide, PM10 and PM2.5) with 11 Continuous Air Monitoring Systems (CAMS), which together form a
monitoring network across the country. Three of these CAMS (CAMS-1, Sangshad Bhaban, Sher-e-Bangla
Nagar; CAMS-2, Farmgate; CAMS-3, Darussalam) continuously monitor the criteria pollutants inside
Dhaka city. These CAMS also record some meteorological parameters (solar radiation, relative humidity,
ambient temperature and rainfall), of which solar radiation is also used as a meteorological
parameter in this study.

2.3 Data Preprocessing

The daily data provided by DoE for each of the three CAMS was combined with the meteorological
parameters and the number of indoor patient admissions to make a total of three datasets. Each dataset had a total
of 11 predictor variables, comprising the air pollutant and meteorological parameters mentioned before, and the
only response variable was the number of daily indoor patient admissions at NIDCH. The daily CAMS data and the
indoor patient data contained a significant percentage of missing values, whereas the data collected from BMD
contained no missing values. We imputed the missing data with a simple average method: a missing value for a
given date in a year was replaced with the average of the values for the same date in the previous and next years.
Missing values were imputed for every pollutant, but the patient data was kept unchanged as it
was the output variable in our models. Table 1 summarizes the percentage of missing values for
different variables before and after the imputation process. After the imputation process was complete,
some missing values still remained in the datasets. In the data cleaning process, we made the datasets uniform by
ensuring that no sample in the datasets contained any missing value. The overall flow of this study, including the
preprocessing and forecasting steps, is shown in Figure 1.
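
The imputation and cleaning steps described above could be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the example PM2.5 readings, the requirement that both neighbouring years be present, and the handling of 29 February are all assumptions made for the sketch.

```python
from datetime import date
from typing import Dict, Optional


def shift_year(day: date, delta: int) -> Optional[date]:
    """Same calendar date `delta` years away, or None if it does not exist (29 Feb)."""
    try:
        return day.replace(year=day.year + delta)
    except ValueError:
        return None


def impute_adjacent_years(series: Dict[date, Optional[float]]) -> Dict[date, Optional[float]]:
    """Replace a missing value with the mean of the same date in the previous and next year.

    Assumption: a value stays missing (None) unless BOTH neighbouring years are observed.
    Only originally observed values are used, so imputed values never feed other imputations.
    """
    filled = dict(series)
    for day, value in series.items():
        if value is not None:
            continue
        prev_day, next_day = shift_year(day, -1), shift_year(day, +1)
        prev_val = series.get(prev_day) if prev_day is not None else None
        next_val = series.get(next_day) if next_day is not None else None
        if prev_val is not None and next_val is not None:
            filled[day] = (prev_val + next_val) / 2.0
    return filled


# Hypothetical PM2.5 readings for the same date in consecutive years.
pm25 = {
    date(2014, 1, 15): 180.0,
    date(2015, 1, 15): None,   # missing, both neighbouring years observed
    date(2016, 1, 15): 220.0,
    date(2017, 1, 15): None,   # missing, 2018 not available -> stays missing
}
imputed = impute_adjacent_years(pm25)

# Data cleaning: keep only the samples that have no missing value left.
cleaned = {day: value for day, value in imputed.items() if value is not None}
```

In this toy series the 2015 gap is filled with 200.0, while the 2017 gap cannot be filled and is dropped in the cleaning step, which mirrors the residual missing values reported in Table 1.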

Table 1: Percentage of missing values (%) before and after data imputation for the predictor variables
containing missing values

                   Unprocessed Data            After Imputation
Variable           CAMS-1  CAMS-2  CAMS-3      CAMS-1  CAMS-2  CAMS-3
SO2                78.5    21.2    12.1        7.3     4.1     4.5
NO2                84.1    42.0    12.8        8.4     2.6     1.6
CO                 54.7    14.6    24.6        10.2    3.1     0.2
O3                 65.3    11.4    17.5        7.1     3.0     1.5
PM2.5              43.8    35.9    5.0         7.6     7.9     0.4
PM10               52.6    46.6    5.2         5.5     6.8     0.3
Solar Radiation    36.7    34.5    2.6         5.2     2.5     0.2

The medians, quartiles and spread of the different variables in the datasets are shown in Figure 2. The
figure shows that most of the predictor variables and the response variable contain a significant amount
of noise or outliers, and that the predictor variables span different ranges. Therefore, to make the
variables comparable and to speed up the optimization of the models, all the input
variables (features) were scaled. For scaling we used standardization, also known
as z-score normalization. The formulation of this method is:
$$x_{new} = \frac{x - \mu}{\sigma} \qquad (1)$$

Where, $x_{new}$ = feature vector after scaling, $x$ = original feature vector, $\mu$ = mean of the feature vector and $\sigma$ =
standard deviation of the feature vector.
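
As a small worked instance of Equation (1), the sketch below standardizes one feature column using only the Python standard library. The temperature values are made up, and the use of the population standard deviation is an assumption, since the paper does not state which estimator was used.

```python
from statistics import fmean, pstdev
from typing import List, Tuple


def fit_standardizer(column: List[float]) -> Tuple[float, float]:
    """Return (mu, sigma) for one feature column; sigma is the population std (assumption)."""
    return fmean(column), pstdev(column)


def standardize(column: List[float], mu: float, sigma: float) -> List[float]:
    """Apply x_new = (x - mu) / sigma element-wise."""
    return [(x - mu) / sigma for x in column]


# Hypothetical daily average dry-bulb temperatures (°C).
temperature_c = [18.0, 22.0, 26.0, 30.0]
mu, sigma = fit_standardizer(temperature_c)
scaled = standardize(temperature_c, mu, sigma)
```

After scaling, the column has zero mean and unit standard deviation, so features recorded in different units (for example °C and knots) end up on a comparable scale.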

Figure 1: Overall data preprocessing, data cleaning and forecasting process of this study.

2.4 Multiple Linear Regression

The Multiple Linear Regression (MLR) model is one of the most widely used models in statistical analysis and
predictive modeling. For our MLR model, we used Scikit-learn (version 0.23.1), a Python
machine learning library [19]. For training and testing purposes, all three datasets were split with a ratio of 60:40,
and for better prediction accuracy the split was random. The objective of MLR is to use several
explanatory variables to predict one or more response variables through a linear relationship. The model we used can
be expressed as:
$$y = \alpha + \sum_{i=1}^{11} \beta_i x_i \qquad (2)$$
Here, $y$ = value of the response variable (daily number of indoor patients), $\alpha$ = unknown regression intercept (bias), $\beta_i$ = unknown
regression coefficients, and $x_i$ = values of the independent variables. To determine the accuracy of the models, we
computed the Root Mean Squared Error (RMSE) between the observed and predicted numbers of patient admissions.
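
The fit-and-evaluate procedure for Equation (2) can be sketched as below. The paper reports using Scikit-learn; this sketch swaps in a plain NumPy least-squares solve so that it stays self-contained. The data are synthetic and the helper names (`fit_mlr`, `predict_mlr`, `rmse`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Synthetic stand-in data: 500 days x 11 standardized predictors (not the study data).
n_samples, n_features = 500, 11
X = rng.normal(size=(n_samples, n_features))
true_beta = rng.normal(size=n_features)
y = 60.0 + X @ true_beta + rng.normal(scale=2.0, size=n_samples)

# Random 60:40 train/test split.
order = rng.permutation(n_samples)
n_train = (n_samples * 60) // 100
train_idx, test_idx = order[:n_train], order[n_train:]


def fit_mlr(X, y):
    """Ordinary least squares for y = alpha + sum_i beta_i * x_i; returns (alpha, beta)."""
    design = np.column_stack([np.ones(len(X)), X])  # leading column of ones carries alpha
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]


def predict_mlr(X, alpha, beta):
    return alpha + X @ beta


def rmse(y_true, y_pred):
    """Root Mean Squared Error between observed and predicted values."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


alpha, beta = fit_mlr(X[train_idx], y[train_idx])
test_rmse = rmse(y[test_idx], predict_mlr(X[test_idx], alpha, beta))
```

Because RMSE is expressed in the units of the response, a value here reads directly as a typical error in admissions per day.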

Figure 2: Summary of air pollutant, meteorological and medical data showing median, quartiles and range. Whiskers
show the range for all variables.

2.5 Artificial Neural Network

Artificial neural networks (ANNs) are complex multivariate statistical models that can be used to establish a
complex nonlinear relationship between response variables and given explanatory variables. We used a
fully connected feedforward neural network, specifically a Multi-layer Perceptron (MLP), which has previously
been used for patient forecasting using medical and environmental data [9, 20]. This type of network is constructed
from different layers, specifically an input layer, one or more hidden layers and an output layer, each layer
consisting of one or more nodes. The input data enters the network through the input layer nodes and
moves forward through the nodes of the network to the output nodes. The nodes are connected with linear
connections. The value of each node is calculated as the weighted sum of its inputs plus a
bias, and the result is then passed to the nodes of the next layer. To introduce nonlinearity to the network we
applied Rectified Linear Units (ReLU) as the activation function at each of the hidden nodes. The ReLU activation
function is as follows:

$$f(x) = \max(0, x) \qquad (3)$$

Where, $x$ is the input to a node. In the present study, we used a deep feedforward neural network architecture with
a total of 6 hidden layers, determined by trial and error and generalized for all 3 of our datasets. The overall
network architecture, along with the number of nodes used in each layer, is shown in Figure 3. The number of nodes in each
layer was also determined by trial and error.
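
The forward computation described above (weighted sum plus bias at each node, ReLU on hidden layers, linear output) can be sketched in NumPy as follows. The hidden-layer widths and the weight initialization are assumptions for illustration only; the actual node counts are those shown in Figure 3.

```python
import numpy as np


def relu(z):
    """ReLU activation, f(x) = max(0, x), applied element-wise."""
    return np.maximum(0.0, z)


def init_mlp(layer_sizes, rng):
    """One (weights, bias) pair per connection between consecutive layers.

    The scaled-normal initialization is an assumption, not taken from the paper.
    """
    return [
        (rng.normal(scale=np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def forward(params, x):
    """Hidden layers apply ReLU; the output layer is left linear for regression."""
    h = x
    for weights, bias in params[:-1]:
        h = relu(h @ weights + bias)  # weighted sum of inputs plus bias, then ReLU
    weights, bias = params[-1]
    return h @ weights + bias


rng = np.random.default_rng(0)
# 11 inputs, 6 hidden layers, 1 output. The hidden widths are hypothetical.
layer_sizes = [11, 64, 64, 32, 32, 16, 16, 1]
params = init_mlp(layer_sizes, rng)
batch = rng.normal(size=(4, 11))      # 4 synthetic days of standardized features
predictions = forward(params, batch)  # one predicted admission count per day
```

With 6 hidden layers there are 7 weight matrices in total, one for each pair of adjacent layers.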

For all 3 datasets, the data were first split into train and test subsets; we used the same subsets as in the
MLR models. To implement the MLP models we used PyTorch (version 1.6.0), a Torch-based open-source
machine learning framework for Python with good Graphics Processing Unit (GPU) support [21]. The networks
were optimized with the Adam optimizer using the backpropagation algorithm [22]. We trained the networks using 3
different loss functions: Mean Squared Error (MSE), Mean Absolute Error (MAE) and Huber Loss [23]. We used
a specific form of Huber loss known as Smooth L1 Loss, which is a robust loss function for regression problems.
The function can be described as:
$$L_{\delta=1}(y, \hat{y}) = \begin{cases} \frac{1}{2}(y - \hat{y})^2, & \text{for } |y - \hat{y}| \le 1 \\ |y - \hat{y}| - \frac{1}{2}, & \text{otherwise} \end{cases} \qquad (4)$$

Where, $y$ = observed value and $\hat{y}$ = predicted value. For large residuals such as those caused by outliers, the Huber
loss grows linearly and therefore increases less rapidly than quadratic loss functions. Hence, estimation using this
loss function is robust to outliers [24]. For hyperparameter optimization and model selection, we further split the
test subset into a cross-validation set and a final test set with a ratio of 50:50. After determining the optimized
models for all 3 datasets, we computed the RMSE as a measure of the accuracy of the models, as we did for the MLR models.
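
The training and model-selection procedure described in this section could look roughly like the PyTorch sketch below. It is an illustration under stated assumptions rather than the authors' implementation: the data are synthetic, and the hidden widths, learning rate, number of epochs and full-batch updates are all hypothetical choices.

```python
import torch
from torch import nn

torch.manual_seed(0)


def build_mlp(n_inputs=11, hidden=(64, 64, 32, 32, 16, 16)):
    """MLP with 6 hidden layers and ReLU activations; the widths are hypothetical."""
    layers, width = [], n_inputs
    for h in hidden:
        layers += [nn.Linear(width, h), nn.ReLU()]
        width = h
    layers.append(nn.Linear(width, 1))  # linear output node for regression
    return nn.Sequential(*layers)


# The three loss functions named in the text; SmoothL1Loss is the Huber form with delta = 1.
LOSSES = {"mse": nn.MSELoss, "mae": nn.L1Loss, "huber": nn.SmoothL1Loss}


def rmse(model, X, y):
    with torch.no_grad():
        return torch.sqrt(nn.functional.mse_loss(model(X), y)).item()


# Synthetic stand-in data (not the study data): 500 days x 11 standardized features.
X = torch.randn(500, 11)
y = 60.0 + X.sum(dim=1, keepdim=True) + torch.randn(500, 1)

# 60:40 train/test split, then the test part split 50:50 into validation and final test.
perm = torch.randperm(500)
train, val, test = perm[:300], perm[300:400], perm[400:]

models, val_rmse = {}, {}
for name, loss_cls in LOSSES.items():
    model = build_mlp()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr is an assumption
    loss_fn = loss_cls()
    for _ in range(20):  # a few full-batch epochs, only to keep the sketch fast
        optimizer.zero_grad()
        loss = loss_fn(model(X[train]), y[train])
        loss.backward()  # backpropagation
        optimizer.step()
    models[name] = model
    val_rmse[name] = rmse(model, X[val], y[val])

# Model selection on the validation set, final RMSE reported on the held-out test set.
best = min(val_rmse, key=val_rmse.get)
test_rmse = rmse(models[best], X[test], y[test])

# Smooth L1 on residuals 0.5 and 3.0: quadratic below 1, linear above it.
huber_demo = nn.SmoothL1Loss()(torch.tensor([0.5, 3.0]), torch.tensor([0.0, 0.0])).item()
```

The last line evaluates the loss on two residuals: 0.5 falls in the quadratic branch (0.125) and 3.0 in the linear branch (2.5), so the mean is 1.3125, whereas half the squared error for a residual of 3.0 alone would be 4.5.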

Figure 3: Schematic diagram of the feedforward neural network architecture with the number of nodes in each layer. Bias
nodes are not shown in the diagram. Outputs of the nodes are passed through a ReLU activation function before entering
the next layer, except for the output layer.

3. Results and Discussion

We used MLR and an ANN, specifically an MLP, to forecast the number of hospital admissions due to chest-related
diseases. The RMSE between the observed and predicted numbers for each of the datasets and training methods is
summarized in Table 2. The table shows that, for our datasets, the prediction error is lowest (RMSE = 12.524) for the
MLP model trained with CAMS-3 air pollutant parameters and the MSE loss function. The prediction error for MLR is
also lowest for the dataset with CAMS-3 air pollutant parameters. This indicates that, for this type of dataset, the
optimal model for predicting hospital admissions would be an MLP trained with MSE loss rather than MLR.
Figure 4 shows the observed and predicted numbers of indoor admissions in the test set for the best performing
model, smoothed with a 7-data-point moving average. From the figure, we observed that although our model could
recognize the patterns of fluctuation in patient admissions, it could not identify the peak numbers.
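
The 7-data-point moving average used for Figure 4 can be illustrated with the short sketch below. The admission counts are made up, and the choice of a trailing window (rather than a centered one) is an assumption, as the paper does not specify it.

```python
from typing import List


def moving_average(values: List[float], window: int = 7) -> List[float]:
    """Mean over each run of `window` consecutive points (window alignment is an assumption)."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    smoothed = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]  # slide the window one step forward
        smoothed.append(running / window)
    return smoothed


# Hypothetical daily admission counts with a holiday dip (20) followed by a surge (95).
daily = [50, 62, 58, 20, 95, 70, 66, 61, 59, 23]
smoothed = moving_average(daily, window=7)
```

Averaging over 7 points flattens single-day dips and surges in both the observed and the predicted series, so the smoothed curves are mainly useful for comparing overall trends.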

Table 2: Summary of RMSE of predicted number of indoor admissions using models trained with different loss
functions

Station                MLP              MLP              MLP                     MLR
                       (MSE-trained)    (MAE-trained)    (Huber Loss-trained)
Sangshad (CAMS-1)      13.321           13.413           13.447                  13.133
Farmgate (CAMS-2)      13.400           13.548           13.450                  13.380
Darussalam (CAMS-3)    12.524           12.803           12.879                  12.978
It is also worth mentioning that, in our study, feature selection was done considering the correlations between
different air pollutant and meteorological parameters. Nevertheless, the prediction error is higher than in previous
similar studies [10]. One possible reason is that on weekly and government holidays the recorded patient numbers
are usually low, and on the next working day there is a surge of patients. These sudden surge events increase the
overall unpredictability of the dependent variable in our dataset. Another reason could be that the sequentially
recorded variables were not treated as time series: all independent and target variables were taken to be
discrete observations fed into the neural network. This does not take into account the complex seasonality, i.e. the
weekly or monthly seasonality of patient data and the yearly seasonality of pollutant data. We also observed that
patient admissions are much higher in the winter season.

Our data handling process classified the real-time variability and surges in the datasets as anomalies and
outliers, which had a significant impact on the prediction results. It is also quite difficult to explain in an
intelligible form the relative importance of the various input variables, because the associations between
environmental parameters and medical problems are too complex to be expressed analytically. Training and applying
ANN models that take into account more of the factors affecting the phenomenon would probably result in a
significant improvement in their predictive ability. Other neural network architectures, specifically
recurrent neural networks with Long Short-Term Memory (LSTM) cells, may also improve the prediction
performance.

Figure 4: No. of hospital admissions for chest diseases predicted by the optimum model (7 data points moving average).
Prediction is done on the whole test set and compared to the observed values in the same set.

4. Conclusion

Our study showed that an MLP trained with the MSE loss function can produce a better result than MLR for predicting
hospital admissions of patients with chest diseases from air quality and meteorological datasets. Trends in
indoor patient admissions can also be identified with these predictive models. Further study needs to be
conducted on multiple hospitals to obtain a more conclusive result. Time series prediction with ANN could also be
explored in further study. Most of the hospital policies in Bangladesh are based on cause and effect rather than on
factual studies. This study will help hospital administrators adopt the necessary changes to be well prepared before
a sudden influx of patients.

Acknowledgement

We are thankful to the Department of Civil Engineering, Bangladesh University of Engineering and Technology
(BUET) for the necessary guidance and motivation for this study. We would also like to thank the Department of
Environment (DoE) for providing the air quality data, the Bangladesh Meteorological Department (BMD) for the
meteorological data, and the National Institute of Diseases of the Chest and Hospital (NIDCH) for the patient data.

References
1. Prado GF, Zanetta DMT, Arbex MA, Braga AL, Pereira LAA, de Marchi MRR, de Melo Loureiro AP,
Marcourakis T, Sugauara LE, Gattás GJF, Gonçalves FT, Salge JM, Terra-Filho M, de Paula Santos U
(2012) Burnt sugarcane harvesting: Particulate matter exposure and the effects on lung function,
oxidative stress, and urinary 1-hydroxypyrene. Sci Total Environ 437:200–208.
https://doi.org/10.1016/j.scitotenv.2012.07.069
2. Zheng XY, Ding H, Jiang LN, Chen SW, Zheng JP, Qiu M, Zhou YX, Chen Q, Guan WJ (2015)
Association between air pollutants and asthma emergency room visits and hospital admissions in time
series studies: A systematic review and meta-analysis. PLoS One 10.
https://doi.org/10.1371/journal.pone.0138146
3. Moore E, Chatzidiakou L, Kuku MO, Jones RL, Smeeth L, Beevers S, Kelly FJ, Barratt B, Quint JK
(2016) Global associations between air pollutants and chronic obstructive pulmonary disease
hospitalizations: A systematic review. Ann Am Thorac Soc 13:1814–1827.
https://doi.org/10.1513/AnnalsATS.201601-064OC
4. Rossi OVJ, Kinnula VL, Tienari J, Huhti E (1993) Association of severe asthma attacks with weather,
pollen, and air pollutants. Thorax 48:244–248. https://doi.org/10.1136/thx.48.3.244
5. Forsberg B, Stjernberg N, Falk M, Lundback B, Wall S (1993) Air pollution levels, meteorological
conditions and asthma symptoms. Eur Respir J 6:1109–1115
6. Wang K-Y, Chau T-T (2013) An association between air pollution and daily outpatient visits for
respiratory disease in a heavy industry area. PLoS One 8:e75220
7. Samet JM, Speizer FE, Bishop Y, Spengler JD, Ferris BG (1981) The relationship between air pollution
and emergency room visits in an industrial community. J Air Pollut Control Assoc 31:236–240.
https://doi.org/10.1080/00022470.1981.10465214
8. Shakerkhatibi M, Dianat I, Asghari Jafarabadi M, Azak R, Kousha A (2015) Air pollution and hospital
admissions for cardiorespiratory diseases in Iran: artificial neural network versus conditional logistic
regression. Int J Environ Sci Technol 12:3433–3442. https://doi.org/10.1007/s13762-015-0884-0
9. Khatri KL, Tamil LS (2018) Early Detection of Peak Demand Days of Chronic Respiratory Diseases
Emergency Department Visits Using Artificial Neural Networks. IEEE J Biomed Heal Informatics
22:285–290. https://doi.org/10.1109/JBHI.2017.2698418
10. Moustris KP, Douros K, Nastos PT, Larissi IK, Anthracopoulos MB, Paliatsos AG, Priftis KN (2012)
Seven-days-ahead forecasting of childhood asthma admissions using artificial neural networks in
Athens, Greece. Int J Environ Health Res 22:93–104. https://doi.org/10.1080/09603123.2011.605876
11. Mahmood SAI (2011) Air pollution kills 15,000 Bangladeshis each year: The role of public
administration and governments integrity. J Public Adm Policy Res 3:129–140
12. Islam MM, Afrin S, Ahmed T, Ali MA (2015) Meteorological and seasonal influences in ambient air
quality parameters of Dhaka city. J Civ Eng 43:67–77
13. Khalequzzaman M, Kamijima M, Sakai K, Hoque BA, Nakajima T (2010) Indoor air pollution and the
health of children in biomass-and fossil-fuel users of Bangladesh: situation in two different seasons.
Environ Health Prev Med 15:236–243
14. Akash M, Akter J, Tamanna T, Kabir MR (2018) The Urbanization and Environmental Challenges in
Dhaka City. SSRN Electron J 145–157. https://doi.org/10.2139/ssrn.3152116
15. Ahmed F, Ishiga H (2006) Trace metal concentrations in street dusts of Dhaka city, Bangladesh. Atmos
Environ 40:3835–3844. https://doi.org/10.1016/j.atmosenv.2006.03.004
16. Begum BA, Biswas SK, Hopke PK (2011) Key issues in controlling air pollutants in Dhaka,
Bangladesh. Atmos Environ 45:7705–7713. https://doi.org/10.1016/j.atmosenv.2010.10.022
17. Hossain KMA, Easa SM (2012) Pollutant dispersion characteristics in Dhaka city, Bangladesh.
Asia-Pacific J Atmos Sci 48:35–41. https://doi.org/10.1007/s13143-012-0004-8
18. Guttikunda SK, Begum BA, Wadud Z (2013) Particulate pollution from brick kiln clusters in the
Greater Dhaka region, Bangladesh. Air Qual Atmos Heal 6:357–365.
https://doi.org/10.1007/s11869-012-0187-2
19. Varoquaux G, Buitinck L, Louppe G, Grisel O, Pedregosa F, Mueller A (2015) Scikit-learn. GetMobile
Mob Comput Commun 19:29–33. https://doi.org/10.1145/2786984.2786995
20. Thakur S, Dharavath R (2019) Artificial neural network based prediction of malaria abundances using
big data: A knowledge capturing approach. Clin Epidemiol Glob Heal 7:121–126.
https://doi.org/10.1016/j.cegh.2018.03.001
21. Ketkar N (2017) Deep Learning with Python. Deep Learn with Python 195–208.
https://doi.org/10.1007/978-1-4842-2766-4
22. Kingma DP, Ba JL (2015) Adam: A method for stochastic optimization. 3rd Int Conf Learn Represent
ICLR 2015 - Conf Track Proc 1–15
23. Huber PJ (1964) Robust Estimation of a Location Parameter. Ann Math Stat 35:73–101.
https://doi.org/10.1214/aoms/1177703732
24. Li D, Han M, Wang J (2012) Chaotic time series prediction based on a novel robust echo state network.
IEEE Trans Neural Networks Learn Syst 23:787–797. https://doi.org/10.1109/TNNLS.2012.2188414
