
A Real Time Urban Flood Monitoring System for Metro Manila
Felan Carlo C. Garcia1, Alvin E. Retamar2, Joven C. Javier3
Solutions and Services Engineering Division,
Advanced Science and Technology Institute,
ASTI Bldg., CP Garcia Ave.
University of the Philippines Technology Park,
Quezon City, Philippines
felan@asti.dost.gov.ph1, ning@asti.dost.gov.ph2, joven@asti.dost.gov.ph3,

Abstract— A real-time urban flood monitoring system was deployed on two streets (Earnshaw and San Diego Streets) along España Boulevard, Manila. The system consists of a ground-based pressure sensor and a rain gauge connected to a locally designed data logger with telemetry capabilities over a GPRS network. Data from the stations are received by a TCP server and processed to provide visual information and real-time flood updates through mobile and web services. An ahead-of-time flood estimation system was implemented using a Random Forest algorithm in order to provide an early warning advisory to motorists and other users of the system. Results from the test validation show that the resulting prediction model achieves strong predictive performance without relying on a rainfall-runoff model obtained through geological and hydrological surveys.

Keywords—Disaster Response and Risk Management; Urban Flood; Warning Systems; Web Visualization and Information Systems; Machine Learning; Random Forests

I. INTRODUCTION

Floods in Metro Manila continue to grab headlines and draw national and international attention, not only because of the threat to life and property but also because of the nuisance they generate through massive snarls in traffic. By monitoring floods in the streets of Metro Manila in real time, worsening traffic may be avoided through early identification of impassable roads and suggestion of alternative routes to motorists. At the same time, a historical archive of flood data can give researchers the ability to develop risk management measures and infrastructure changes that address this major problem in the area. [1]

To address the problems related to weather events, DOST launched Project NOAH to provide an integrated disaster prevention and mitigation system, especially in areas at high risk of disaster throughout the Philippines. However, that project targets a national scale and lacks an area-based real-time monitoring system. [2]

In view of this, the Department of Science and Technology - Advanced Science and Technology Institute (DOST-ASTI) is implementing a project entitled “Flood sensor Development, Installation, and Monitoring of Urban Flooding in Metro Manila”, which aims to develop and implement 60 urban flood stations along with a software solution for monitoring street-level flooding in real time at predetermined levels. [3]

In our previous paper [3], we discussed the design and prototype development of a remote monitoring station for urban flooding. However, the previous study lacked data gathered during heavy rain conditions and several features of the visualization tool, namely a historical data archive and a mobile application to deliver updates to users on the road. It was also noted that the 3-10 minute sending interval of the data logger is a considerable lag before the next flood reading arrives. Abrupt changes in flood level may occur during these intervals, and the lack of ahead-of-time information can put unsuspecting motorists in unfavorable situations. To address this issue, a flood estimation advisory system was proposed to provide users with an ahead-of-time flood estimate whenever a rain event occurs.

Implementing an ahead-of-time flood estimation with a rainfall-runoff model requires a thorough hydrological and geological survey of the area near each station in order to be accurate. In the context of this project, performing such surveys for each of the proposed 60 stations is outside the scope of works and objectives.

A study by Duncan et al. has shown that a data-driven approach using a machine learning method is capable of producing ahead-of-time flood predictions and is a viable approach when a geological and hydrological model of a specific site is lacking. [4] While their study used an Artificial Neural Network (ANN) to implement a hydrological model for urban flooding, several other algorithms are also available to researchers for machine learning predictions.

One study compared various supervised learning algorithms in terms of performance and accuracy, and its results showed that the top-performing algorithms are Boosted Trees, Random Forests, Support Vector Machines (SVM) and Neural Networks. [5]

In this contribution, an ahead-of-time flood level prediction was developed and incorporated into the system in order to provide a flood estimation advisory for motorists and citizens alike. Random Forest was the method chosen due to its ease of implementation and lower computational

978-1-4799-8641-5/15/$31.00 ©2015 IEEE


requirements compared to ANN while providing consistent prediction performance. [5]

II. THEORETICAL FRAMEWORK

A. Random Forest Algorithm

Random Forest (RF) is an ensemble learning algorithm that uses a multitude of uncorrelated decision tree predictors, each built from random features sampled independently and from the same distribution for all trees in the forest. [6]

Random Forest combines bootstrap aggregating with random feature selection: each decision tree is built from a bootstrapped sample taken from the training set, and during construction of a tree, each node is split on the best feature among a random subset of the features. This randomness increases the bias of the prediction. However, since the final prediction of the random forest is the average of the predictions of all the decision trees, its variance decreases, which compensates for the increase in bias and yields an overall better prediction model. [7]

In random forests, errors are estimated internally during the run. Cross-validation or a separate test sample is not required to get an unbiased estimate of the test set error. Approximately one-third of the cases are left out of each bootstrap sample and used as out-of-bag (OOB) data. This OOB data is used to obtain a running unbiased estimate of the classification error and estimates of variable importance as trees are added to the forest. [8]

III. METHODOLOGY

Figure 1: Urban Flood Monitoring System [2] consisting of various hardware and software components. The green line shows the flow of the data gathered from the hardware monitors to its destination at the servers and web services.

Figure 1 shows the system architecture of the remote monitoring system. The architecture can be subdivided into three major components: Electronic Instrumentation, Server Applications and Web Services.

A. Systems Layout

The Electronic Instrumentation consists of a ground-based pressure sensor and an automatic rain gauge connected to a locally developed solar-powered data logger with GPRS telemetry and data storage capabilities. The electronic instrumentation is placed inside a mesh cage in order to prevent tampering with the hardware. Two stations were deployed at two nearby streets (Earnshaw and San Diego Streets) on España Boulevard, Manila.

The Server Applications consist of the TCP server, which processes incoming data from the hardware stations and inserts them into PostgreSQL and MongoDB databases.

Figure 2: Web visualization of the urban monitoring stations. The green and red markers indicate the various stations available on the website. The color of the markers changes according to the measured water level.

Three web services were developed in order to cater to various users.

1.) A web application was developed as a visualization tool providing real-time updates and 24-hour historical rain and flood data to the users, as shown in Figure 2.

2.) A mobile application was developed to provide users with on-the-go information and monitoring of street flooding in real time, allowing them to adjust their routes and travel schedules accordingly.

3.) A web application programming interface (API) service was also developed in order to share data with various institutions and researchers interested in the issue of flooding in Metro Manila.

B. Prediction Model

Producing a prediction for a given station involves several steps: data extraction, preparation, prediction, and validation.

1.) Data extraction – Data gathered by the monitoring stations is sent to the server and stored in a database. Data from the station covering October 2014 – July 2015 was extracted from the database.

2.) Data preparation – Data extracted from the database contains various information, such as the health of the data logger and solar panels, aside from sensor

readings. Preparation of the training data involves discarding the unnecessary data and keeping only the features and labels needed by the prediction model.

The parameters flood level (cm) and rain amount (mm) from the data are taken as the features of the training set. For the label, the n - 1 time-lagged flood level values are used.

TABLE I
FEATURES AND LABELS EXTRACTED FROM THE DATA AND PREPARED AS THE TRAINING SET

Feature             Feature             Label
Flood level (cm)    Rain amount (mm)    Flood level (n-1 time-lagged component)

3.) Prediction – Once the training set is prepared, it is used to train a Random Forest model. A Python implementation using Scikit-learn's Random Forest with 3000 trees was used in this application. [7]

4.) Validation – The OOB score of the trained Random Forest model can be obtained from Scikit-learn, which provides the OOB score during the training process of the algorithm. [7] The resulting model is tested using two datasets of recorded rain and flood events: June 8, 2015 and August 7, 2015. The coefficient of determination R2 of the prediction is computed to determine its score. The highest possible R2 score is 1.0; lower values indicate worse performance. [7]

Figure 3: Flood levels measured from October 2014 – July 2015

Figure 4: Rain amount measured from October 2014 – July 2015
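
The preparation, training, and validation steps above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the helper `make_training_set`, the synthetic readings, and the train/test split are assumptions made for the example, and the pairing of readings at step n-1 with the flood level at step n is only one possible reading of the time-lagged layout in Table I. The forest uses 200 trees to keep the sketch fast, whereas the paper reports 3000.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score


def make_training_set(flood_cm, rain_mm):
    """Pair readings at step n-1 (features) with the flood level at step n (label).

    Hypothetical helper: one interpretation of the lagged layout in Table I.
    """
    flood_cm = np.asarray(flood_cm, dtype=float)
    rain_mm = np.asarray(rain_mm, dtype=float)
    X = np.column_stack([flood_cm[:-1], rain_mm[:-1]])  # flood level, rain amount
    y = flood_cm[1:]                                    # flood level one step later
    return X, y


# Synthetic stand-in for one station's readings (NOT data from the paper).
rng = np.random.default_rng(0)
rain_mm = rng.gamma(shape=0.5, scale=4.0, size=600)
flood_cm = np.zeros(600)
for t in range(1, 600):
    # Toy dynamics: the level drains by 15% per step and rises with rainfall.
    flood_cm[t] = 0.85 * flood_cm[t - 1] + 2.0 * rain_mm[t - 1]

X, y = make_training_set(flood_cm, rain_mm)
X_train, y_train = X[:450], y[:450]
X_test, y_test = X[450:], y[450:]

# oob_score=True asks scikit-learn to compute the out-of-bag R2 during fitting.
model = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0)
model.fit(X_train, y_train)

print("OOB score:", model.oob_score_)
print("Held-out R2:", r2_score(y_test, model.predict(X_test)))
```

With real station data, `flood_cm` and `rain_mm` would be replaced by the cleaned sensor series, and the held-out portion by the recorded rain events used for validation.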

IV. RESULTS

Data from October 2014 – July 2015 was extracted from the database. Since not all days have rain and flood events, only days that have rain and flood level data are considered for the training set. For this training set, we consider only the station located at San Diego, Manila, as it is the station with the complete setup consisting of a pressure-based water level sensor and a rain gauge.

Figure 5: Rain amount measured on October 6, 2014 (San Diego, Manila)

Figure 6: Flood levels measured on October 6, 2014 (San Diego, Manila)

Figures 3 and 4 show the corresponding flood levels and rain amounts for the months of October 2014 – July 2015. Figures 5 and 6 show a portion of the data from October 2014 shown in Figures 3 and 4. These values are processed and placed into the appropriate features and labels for the training set.

The training set is fed into the Random Forest learning algorithm, and the resulting prediction model has an OOB score of 0.9375, indicating strong predictive performance. The model is tested against two datasets: June 8, 2015 and August 7, 2015.

Figure 7 shows the predicted values closely following the trend of the measured flood levels during the rain event on June 8, 2015. From Table II, the computed coefficient of determination R2 for the prediction is 0.938, which indicates good performance for the prediction model.
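
For reference, the coefficient of determination is conventionally defined as shown here, where $y_i$ are the measured flood levels, $\hat{y}_i$ the predicted levels, and $\bar{y}$ the mean of the measured levels. The symbols are chosen for illustration and do not appear in the paper.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2},
\qquad
\bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i
```

A perfect prediction gives R2 = 1, while a model that always predicts the mean of the measurements gives R2 = 0.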

Figure 7: Prediction of the model for the data of June 8, 2015 (San Diego, Manila)

TABLE II
PREDICTION VS MEASURED FLOOD LEVELS FOR JUNE 8, 2015

Datetime          Predicted (cm)    Measured (cm)
6/8/2015 15:30    12.41337965       16.635
6/8/2015 15:40    28.29556716       31.366
6/8/2015 15:50    35.5754212        59.129
6/8/2015 16:00    66.0253236        64.297
6/8/2015 16:10    67.7663474        66.166
6/8/2015 16:20    65.7641354        67.096
6/8/2015 16:30    61.1911596        57.747
6/8/2015 16:40    44.9490882        45.644
6/8/2015 16:50    41.324948         39.242
6/8/2015 17:00    35.5668246        33.1
6/8/2015 17:10    27.6735284        26.018
6/8/2015 17:20    20.553613         21.87
6/8/2015 17:30    17.97338434       17.236
6/8/2015 17:40    13.05010465       11.819
6/8/2015 17:50    7.8972839         8
6/8/2015 18:00    5.6873718         5.7
6/8/2015 18:10    4.0492496         3.2
6/8/2015 18:20    1.65587424        1.5
6/8/2015 18:30    2.375998728       0.7

Figure 8: Prediction of the model for the data of August 7, 2015 (San Diego, Manila)

TABLE III
PREDICTION VS MEASURED FLOOD LEVELS FOR AUGUST 7, 2015

Datetime          Predicted (cm)    Measured (cm)
8/7/2015 0:00     2.844366733       1.154
8/7/2015 0:10     16.4080174        8.261
8/7/2015 0:20     42.4489574        37.497
8/7/2015 0:30     41.8369032        57.384
8/7/2015 0:40     51.2479304        57.939
8/7/2015 0:50     68.2231516        64.467
8/7/2015 1:00     65.0109754        64.025
8/7/2015 1:10     49.4974892        54.631
8/7/2015 1:20     51.3606842        53.554
8/7/2015 1:30     47.2175836        46.302
8/7/2015 1:40     34.4449046        29.701
8/7/2015 1:50     28.48777502       28.273
8/7/2015 2:00     17.54201948       19.627
8/7/2015 2:10     20.3947056        19.094
8/7/2015 2:20     10.1917846        11.411
8/7/2015 2:30     1.204951757       1.154
8/7/2015 2:50     1.15838487        1.144
8/7/2015 3:00     1.15838487        1.144
8/7/2015 3:10     1.25              0.5
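
As a rough cross-check, the tabulated values in Tables II and III can be used to recompute the coefficient of determination and to see where the largest prediction error falls in each event. The sketch below is illustrative only: the function names `r2` and `largest_error` are hypothetical, and the numbers are transcribed from the two tables as (predicted, measured) pairs. The recomputed scores can be compared with the reported values of 0.938 and 0.958.

```python
# (predicted_cm, measured_cm) pairs transcribed from Table II (June 8, 2015)
june_8 = [
    (12.41337965, 16.635), (28.29556716, 31.366), (35.5754212, 59.129),
    (66.0253236, 64.297), (67.7663474, 66.166), (65.7641354, 67.096),
    (61.1911596, 57.747), (44.9490882, 45.644), (41.324948, 39.242),
    (35.5668246, 33.1), (27.6735284, 26.018), (20.553613, 21.87),
    (17.97338434, 17.236), (13.05010465, 11.819), (7.8972839, 8.0),
    (5.6873718, 5.7), (4.0492496, 3.2), (1.65587424, 1.5),
    (2.375998728, 0.7),
]

# (predicted_cm, measured_cm) pairs transcribed from Table III (August 7, 2015)
august_7 = [
    (2.844366733, 1.154), (16.4080174, 8.261), (42.4489574, 37.497),
    (41.8369032, 57.384), (51.2479304, 57.939), (68.2231516, 64.467),
    (65.0109754, 64.025), (49.4974892, 54.631), (51.3606842, 53.554),
    (47.2175836, 46.302), (34.4449046, 29.701), (28.48777502, 28.273),
    (17.54201948, 19.627), (20.3947056, 19.094), (10.1917846, 11.411),
    (1.204951757, 1.154), (1.15838487, 1.144), (1.15838487, 1.144),
    (1.25, 0.5),
]


def r2(pairs):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    measured = [m for _, m in pairs]
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for p, m in pairs)
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def largest_error(pairs):
    """Return (row index, predicted - measured) for the largest absolute error."""
    errors = [p - m for p, m in pairs]
    idx = max(range(len(errors)), key=lambda i: abs(errors[i]))
    return idx, errors[idx]


for name, pairs in [("June 8, 2015", june_8), ("August 7, 2015", august_7)]:
    idx, err = largest_error(pairs)
    print(name, "R2 =", round(r2(pairs), 3),
          "| largest error (cm) =", round(err, 2), "at row", idx)
```

In both tables the largest error is negative, meaning the prediction falls below the measurement, and it occurs while the water is still rising: at 15:50 on June 8 and at 0:30 on August 7.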

Figure 8 shows the predicted values also closely following the trend of the measured flood levels during the rain event on August 7, 2015. From Table III, the computed coefficient of determination R2 for the prediction is 0.958, which indicates strong performance for the prediction model.

The predictions on both datasets show that Random Forest provides a strongly performing prediction model for ahead-of-time flood estimation. While the computed coefficient of determination for both datasets is greater than 0.90, it is worth noting that the prediction model underestimates the rise of the flood level. In the context of flood estimation as an ahead-of-time warning tool, overestimating the flood levels is the preferred behavior of the prediction model, as it can lead to more conservative decisions for citizens and motorists alike.

Figure 9: Flood Estimate feature added on the web monitor

Figure 9 shows the integration of the ahead-of-time flood estimation into the web application of the project.

V. CONCLUSION

The research was able to develop and incorporate an ahead-of-time flood estimation model into the urban flood monitoring system using a machine learning algorithm. It was also shown that the method can be repeated for the other stations in order to provide an ahead-of-time flood estimation specific to the area without the need to perform additional geological and hydrological surveys.

The coefficient of determination of the predicted values indicated strong predictive performance; however, the model underestimates the rise of the flood level. Further optimization of the algorithm and additional gathering of flood level data in the particular area can be done in order to refine the ahead-of-time flood estimation system.

REFERENCES
[1] Urban Flood Risk Management – A Tool for Integrated Flood Management (2008). Retrieved from: http://www.apfm.info/publications/tools/Tool_06_Urban_Flood_Risk_Management.pdf
[2] A.M.F. Lagmay (2012). Disseminating near real-time hazards information and flood maps in the Philippines through Web-GIS. DOST-Project NOAH Open-File Reports Vol. 1 (201), pp. 28-36. ISSN 2362 7409.
[3] A. Retamar, F.C. Garcia, J.J. Yabut, and J. Javier, “Design and Development of a Remote Station for Real-time Monitoring of Urban Flooding”, Proceedings of the Asia-Pacific Advanced Network 2014, v. 38, pp. 99-114.
[4] A. Duncan, E. Keedwell, S. Djordjevic, D. Savic, "Machine Learning-Based Early Warning System for Urban Flood Management", ICFR 2013: International Conference on Flood Resilience: Experiences in Asia and Europe, University of Exeter, UK, 5-7 September 2013.
[5] R. Caruana, A. Niculescu-Mizil, “An empirical comparison of supervised learning algorithms”, ICML '06 Proceedings of the 23rd International Conference on Machine Learning, pp. 161-168.
[6] L. Breiman, 2001, Random forests. Machine Learning, 45, 5-32.
[7] Pedregosa et al., Scikit-learn: Machine Learning in Python, JMLR 12, pp. 2825-2830, 2011.
[8] L. Breiman (n.d.). Random Forests, Leo Breiman and Adele Cutler. Retrieved August 10, 2015, from http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm
