
Development and Verification of an Online Artificial Intelligence System for


Detection of Bursts and Other Abnormal Flows

Article  in  Journal of Water Resources Planning and Management · May 2010


DOI: 10.1061/(ASCE)WR.1943-5452.0000030


3 authors:

Steve Mounce Joby B Boxall


The University of Sheffield The University of Sheffield

John Machell
The University of Sheffield

All content following this page was uploaded by Steve Mounce on 15 March 2019.



Development and Verification of an Online Artificial
Intelligence System for Detection of Bursts and
Other Abnormal Flows
S. R. Mounce1; J. B. Boxall2; and J. Machell3

Abstract: Water lost through leakage from water distribution networks is often appreciable. As pressure increases on water resources,
Downloaded from ascelibrary.org by University of Sheffield on 03/15/19. Copyright ASCE. For personal use only; all rights reserved.

there is a growing emphasis for water service providers to minimize this loss. The objective of the work presented in this paper was to
assess the online application and resulting benefits of an artificial intelligence system for detection of leaks/bursts at district meter area
(DMA) level. An artificial neural network model, a mixture density network, was trained on a continually updated historic database to
construct a probability density model of the future flow profile. A fuzzy inference system was used for classification; it compared the latest
observed flow values with predicted flows over time windows such that alerts are generated in the event of abnormal flow conditions.
From the probability density functions of predicted flows, the fuzzy inference system provides confidence intervals associated with each
detection; these confidence values provide useful information for filtering and ranking alerts. Additionally, an accurate estimate of
abnormal flow magnitude is produced to further aid in ranking of alerts. A water supply system in the U.K. was used for a case study with
near real-time flow data provided by general packet radio service. The online burst alert system was constructed to operate alongside an
existing flat-line alarm system, and to continuously analyze a set of 144 DMAs every hour. The new system identified a number of events,
and alerts were raised prior to their detection in the control room, either through flat-line alarms or customer contacts. Examples are given
of alert correlation with burst reports and subsequent mains repairs for a 2-month trial period. Forty-four percent of alerts were found to
correspond to bursts confirmed by repair data or customer contacts, 32% of alerts were confirmed as unusual short-term demand from
manual analysis, 9% were related to known industrial events, and only 15% were ghosts. The results indicate that the system is an
effective and viable tool for online burst detection in water distribution systems with the potential to save water and improve customer
service.
DOI: 10.1061/(ASCE)WR.1943-5452.0000030
CE Database subject headings: Water distribution systems; Water management; Leakage; Artificial intelligence; Neural networks.
Author keywords: Water distribution systems; Water management; Leakage; Artificial intelligence; Neural networks.

1 Pennine Water Group, Dept. of Civil and Structural Engineering, Univ. of Sheffield, Sheffield S1 3JD, U.K. (corresponding author). E-mail: S.R.Mounce@sheffield.ac.uk
2 Pennine Water Group, Dept. of Civil and Structural Engineering, Univ. of Sheffield, Sheffield S1 3JD, U.K. E-mail: J.B.Boxall@sheffield.ac.uk
3 Pennine Water Group, Dept. of Civil and Structural Engineering, Univ. of Sheffield, Sheffield S1 3JD, U.K. E-mail: j.machell@sheffield.ac.uk
Note. This manuscript was submitted on September 29, 2008; approved on April 30, 2009; published online on May 15, 2009. Discussion period open until October 1, 2010; separate discussions must be submitted for individual papers. This paper is part of the Journal of Water Resources Planning and Management, Vol. 136, No. 3, May 1, 2010. ©ASCE, ISSN 0733-9496/2010/3-309–318/$25.00.

Introduction

Water supplies are under great duress as a result of factors such as rapid population growth, unsustainable consumption patterns, poor management practices, inadequate investment in infrastructure, and low efficiency in water use. This situation is likely to become more acute in the future as a result of the more extreme events predicted to occur as a consequence of climate change and described in the GEO-4 scenarios [United Nations Environment Programme (UNEP) 2007]. Leakage losses from distribution networks are often high, and it is not unusual for more than one-third of the water to be lost before delivery, even in many Central and Eastern European countries. In the United Kingdom, often seen as a European leader, leakage as a proportion of water supplied has been consistently around 22–23% during the current decade (OFWAT 2006), as this is close to the "economic" level of leakage using current tools, techniques, and technologies. The economic level of leakage (ELL) is defined as "the level at which it costs more to reduce leakage further than to produce that water from an alternative source" (House of Lords 2006). This increased pressure on water resources and a growing emphasis on sustainable business practice in recent years have led water service providers to consider water loss in treated water supply systems as a key "water supply system performance indicator." It is no longer simply an economic issue, and the environmental and social costs must also be factored in. Hence, technological developments to reduce losses in water distribution systems are essential.

An important issue for leaks and bursts is that of timely detection minimizing water losses. Burst duration can be divided conceptually into awareness, location, and repair times [Water Research Centre (WRC) 1994]. Often there will be a gap between a leak commencing and the water company becoming aware of it. One of the primary aims of the described research is to minimize the time prior to awareness, as shown conceptually in Fig. 1. The

JOURNAL OF WATER RESOURCES PLANNING AND MANAGEMENT © ASCE / MAY/JUNE 2010 / 309

J. Water Resour. Plann. Manage., 2010, 136(3): 309-318


system is designed to provide detection of bursts and leaks as they occur but not existing or background leakage.

Fig. 1. Life cycle of an example burst

This paper presents the results of applying an online artificial intelligence (AI) system for automated hourly analysis to a pilot case study consisting of 144 district meter areas (DMAs) and an evaluation of system performance with direct feedback to and from the water company control room. This self-learning AI system is not reliant on any special hardware or network configuration and produces intelligent "smart alarms."

Background

Leakage

Water utilities generally first obtain awareness of bursts through customer contacts via call centers where the details of service problems are collected, such as complaints of low pressure or discoloration and possibly signs of visible surface water. Although large bursts in a water supply system are usually found and fixed quite rapidly due to multiple customer contacts, not all bursts result in visible surface water and can go undetected. Relying solely on customer contacts adds to the poor image of water utility companies, and a more proactive approach is needed in which the control room and field teams are alerted to failure before customers make contact; hence minimizing limitation of supplies, maintaining service to customers, and limiting water losses. Leaks that do not impact on customers are hence undetected and become background leakage. Water companies typically address background leakage when cumulative background leakage becomes apparent in flow data and is above the ELL for a given area. Leakage detection activities may be classified in two groups: reactive and proactive.

Reactive leak location is carried out by water company personnel once it is deemed that a sufficiently serious problem has developed, identified through customer contacts, or via other information. This process commonly entails application of many different techniques ranging from listening rods or sticks through step testing to geophones and noise correlators. Several research and development projects have explored the potential for leak location by searching for deviation in pressure signals across multiple pressure monitoring points. Misiunas et al. (2005) presented such a technique, validated for both laboratory and field conditions and then subsequently demonstrated the potential for detection and location of medium to large bursts within a DMA from continuous monitoring of network inflow and pressure for a case study (Misiunas et al. 2006). Shinozuka and Liang (2005) developed a technique to identify the location and severity of leak damage using a damage detection approach based on a neural network inverse analysis method. Stoianov et al. (2007) reported results of field trials to explore leakage detection using decentralized "intelligent" sensors for a near real-time monitoring system for water transmission pipelines. While all these techniques demonstrate potential, they are not widely used in practice because they need more instruments deployed than is normal industry practice and because of the accuracy of hydraulic models required. Real-time detection of burst and leak incidents direct from analysis of hydraulic data is not common practice in the water industry.

Proactive leakage detection. District metering is a widely adopted practice in the United Kingdom and its use has increased internationally. It is based on the subdivision of distribution systems into discrete zones by the permanent closure of valves and the measurement of the flows into (and in some cases out of) each zone. A DMA will generally comprise 500–3,000 properties. Generally, import and export points of a DMA, water storage inlet/outlets, pumping stations, large industrial users, a low pressure point in the DMA, and important locations on trunk mains are monitored with permanently installed flow meters and pressure transducers [UK Water Industry Research (UKWIR) 2006]. By monitoring average night flows, unusual changes in water volumes can be detected. If the night flow minus the legitimate flow is close to zero, leakage is assumed to be close to zero. In contrast, a positive residual will signify leakage in the absence of any other factors. The identification and quantification of distribution losses in this way relies on the accurate estimation of the expected night flows (McKenzie and Seago 2005). System operators will generally decide, based on experience with the particular water system, if night flow increase is due to a leak or other factors. Although this technique can be effective, night flow analysis is not necessarily conducted routinely or consistently due to reliance on manual interpretation. Without customer contact, a leak may run undetected for several months, or longer. Hence there is scope for automated systems to address the issue of leak/burst identification and reduce water losses.

Artificial Neural Networks

Recent developments in the field of computational intelligence, variously called soft computing, machine learning, or data-driven modeling, are helping to solve various problems in the water resources domain. These techniques are ideal for analyzing engineering problems where solutions need to be developed without solving the microscale interactions that actually occur and may be poorly understood, but where measured data are readily available. Artificial neural networks (ANNs) have been widely applied in different fields of engineering and are a modeling approach based on how biological neural systems are believed to work. It has been proved mathematically that ANNs are universal computing machines capable of arbitrary nonlinear function approximation provided they are given sufficient training data (Hornik et al. 1989). ANNs have been successfully applied to a range of water modeling problems and have displayed particular promise for forecasting applications. They can be evaluated based on physical model outcomes and experimental/field data can be further integrated in order to enhance their performance. An ANN-based model was effectively applied to reservoir inflow prediction and operation (Jain et al. 1999). The ANN was found to model the high flows better, whereas low flows were better predicted through the autoregressive integrated moving average model. ANNs were used for predicting runoff over three medium-sized watersheds in Kansas (Anmala et al. 2000), who concluded that recurrent ANNs with feedback generally resulted in best performance. Bowden et al. (2002) demonstrated two methodologies for optimal division of data for ANN models by means of an example of forecasting salinity in the River Murray at Murray Bridge (South Australia) 14 days in advance. ANNs have been applied to



both short-term demand forecasting for distribution systems, i.e., for 24 h demand forecasts (Greenaway et al. 2006) and for much longer term demand prediction, such as over a 10-year horizon (Chang and Makkeasorn 2006). Maier and Dandy (2000) provide a comprehensive review of 43 papers dealing with the use of ANNs for the prediction and forecasting of water resources variables, as well as a useful protocol for developing such models. Their study found that in all but two papers reviewed, feed-forward networks were used, and that most used the back-propagation training algorithm.

There are many forecasting problems where simple point predictions are not adequate to guide action. In the realm of water resources management it is often important to know the probability of a variable falling within (or outside) different ranges. It could be that a 10% probability of flooding may be enough to place emergency services on alert. Problems such as these have generated an interest in the area of probability density forecasting. The standard solution to regression problems is to apply a sum-of-squares error function to N training data pairs, $(x_1, t_1), \ldots, (x_n, t_n)$, to predict a conditional mean of a new unseen data vector, $\langle t_{n+1} \mid x_{n+1} \rangle$. This form of point prediction is often inadequate in practice because there is no indication of the level of uncertainty in the model's predictions. Uncertainty of predictions can be expressed in terms of probability distributions. A model of the process can hence be created as a sequence of probability density functions. This means that for any given input value, e.g., $x_{n+1}$, the model produces a conditional density function $p(t_{n+1} \mid x_{n+1})$. One mechanism for achieving this is described later in the methodology.

Fuzzy Logic

A time series prediction on its own will not provide a classification. Forecasting can be very useful, but many applications require event detection. ANNs can be used for classification in this way, but often water resource data sets do not have a classification vector required for the various training and test sets. For example, in the case of burst detection the actual start date and time of the burst is unknown (or at least has high uncertainty, except for extreme events). In this case, some approach is required to detect discrepancies between actual and predicted values, over some time period. Even where a classification ANN is possible, the knowledge obtained is not readily accessible to human inspection and is, in any case, virtually incomprehensible (a "black box"). Fuzzy logic is a useful technique for building knowledge based systems that represent the impreciseness associated with human reasoning and also allows a degree of transparency. Fuzzy sets can contain elements with only partial degree of membership. A fuzzy inference system (FIS) can be used to enable a degree of certainty to be placed on a classification or output (propositions can be true to a certain degree, somewhere between 0 and 1). The main components of the FIS are the fuzzy membership functions defined for the set of variables in the system, the set of fuzzy rules, and the algorithms and strategies for defuzzification (obtaining a single valued output). Mamlook and Al-Jayyousi (2003) use a fuzzy logic approach for analysis and identification of factors that affect leakage, based on the weights determined for each of the factors. The study revealed that the major factors that affect leakage are: pipe age, pipe material, operational aspects, and demand patterns. Hybrid systems using the strengths of both ANNs and FIS (or other expert systems) are becoming increasingly popular.

Algorithms

The viability of using ANNs to analyze DMA flow meter data for the identification of burst events from historical data sets was presented in Mounce et al. (2002). An ANN model called the mixture density network (MDN) as presented by Bishop (1994) was used as the time series predictor, to produce a probability density forecast rather than a point prediction as discussed earlier. An MDN is a mixture density model combined with an ANN. It consists of a two-hidden layer network; the first layer with sigmoidal units, and the second with Gaussian units. Empirical investigations indicated that 10–15 hidden units gave marginally beneficial accuracy for time series prediction. Further, one or two Gaussians and 800–1,000 training cycles gave good general performance (Mounce 2005). The target vector is clamped directly to the Gaussian nodes and the network is trained using maximum likelihood (minimizing the negative log of the likelihood). The mechanism for adapting the network parameters is the back-propagation of error as in a standard MLP ANN; the only difference being the error function. Various parameters of the mixture models (mixing coefficients, means, and variances) are governed by the outputs of the neural network which takes $x_t$ as its input. The network is trained with a scaled conjugate gradient optimizer with the error function E being the negative log-likelihood L of the training data

$$E = -\ln L = -\sum_{n=1}^{N} \ln p(x_n) = -\sum_{n=1}^{N} \ln \left\{ \sum_{j=1}^{M} p(x_n \mid j)\, P(j) \right\} \quad \text{where} \quad p(x \mid j) = \frac{1}{(2\pi\sigma_j^2)^{d/2}} \exp\left\{ -\frac{\lVert x - \mu_j \rVert^2}{2\sigma_j^2} \right\} \tag{1}$$

In Eq. (1), $x$ is a data point, $p(x)$ is a mixture distribution with $M$ component density functions $p(x \mid j)$ and $P(j)$ the mixing parameters [the density functions are Gaussian in this case, and having the adjustable parameters $P(j)$, $\mu_j$, and $\sigma_j$] and where $d$ is the dimensionality of the target space $t$ (for univariate time series prediction, this equals 1).

The input is a lag vector of 96 past (flow) values at a 15-min time step and the target is a 1-day ahead prediction. In many applications, the distance into the future is simply the next time step; this is the usual approach for optimal accuracy of prediction a short period into the future. In this system, a one day ahead prediction allows a series of predictions to be built up for the last 24 h. This series of predictions can then be analyzed by the system in conjunction with the actual observed last 24 h data (facilitating at worst daily downloading of data). The classification window used by the FIS is 12 or 24 h for normal event detection and thus the one day ahead prediction enables the detection of discrepancy caused by abnormal flow. A full theoretical justification appears in Mounce et al. (2003). Once trained, the MDN network can predict the conditional probability density function of the target data for any given value of the input vector, rather than just a point prediction, which is the case for most other ANN models used for time series prediction. An 85% confidence interval for the predicted value is computed from the mixture model, which is a linear combination of component densities (general normal distributions) and this is used in conjunction with the observed data to form a residual that is analyzed by an FIS classification module. The main variable is the number of values in the window for which the actual value lies above the upper range of the confidence interval with the membership functions and fuzzy rules as set out in Mounce (2005). The FIS gives a fuzzy value on



the presence of a flow abnormality within a windowed time period (configurable to between 2 and 24 h) at a particular confidence level (since deviation outside the confidence interval for a particular singular time step does not, in practice, indicate a burst). The window of analysis used by the FIS in the online system was 12 h, selected on the basis that a good rule of thumb for abnormal flow persisting before it can be potentially classified as a burst is around this period (other events, such as industrial usage or fire fighting, can have a similar appearance especially at night but tend to last for less time). The system was then further developed to provide burst size estimation (Mounce et al. 2007).

Online System

Significant modifications to the offline AI software previously developed were necessary in order to make it suitable for automated online analysis, including strategies for dealing with data quality, multiple sites, and alarm handling. A version of the system, including a new graphical user interface (GUI), was developed in MATLAB, which makes use of the Database toolbox, the Fuzzy Logic toolbox and NETLAB (Nabney 2001). Fig. 2 provides a scheme of the system integration for the online AI system. The automated analysis system is data-driven starting from the logger units which initiate calls to the telemetry software. The transmitted data are then mirrored to a server and export functionality in the communications software is used to automatically update (every hour) a set of comma-separated values (CSV) files. An open database connectivity (ODBC) text driver is used to interface to an MS Access database storing the time series history; current logger values are appended by a data warehouse application. The online AI system accesses the database via an ODBC driver. The system runs analysis at predefined intervals (e.g., every hour) and any classifications by the FIS result in automated text alerts which can be e-mailed or processed and stored in a database.

Fig. 2. High level system scheme

Data Handling

Data obtained from water distribution systems can, in general, be classed as "dirty" in comparison to other fields. This manifests as large chunks of missing data, data from faulty loggers and the presence of erroneous date stamps. Therefore, a methodology for dealing with these issues was necessary in order to be able to handle a large number of loggers with potentially varying data quality. The system assembles a potential data file and then conducts several tests for amount and quality of data. First, the data obtained via ODBC from the data warehouse is transformed into internal MATLAB time series objects consisting of both time series data and events (the events are loaded from the event database). Events that can be applied to the time series include "reset" (do not use any data for this logger before this date, e.g., in the case of a network change such as rezoning), "alarm on," and "alarm off." For a new site there would not necessarily be any alarm states recorded, but over time these would be automatically populated by the system itself. The data are then subjected to a number of logic checks based on overall amount of valid data with user definable parameters and representing a trade-off between stringency and a satisfactory training set size. For the case study, the rules were as follows when analyzing three months training data (where x is the highest number of consecutive days with all raw data present, that is, data not missing or periods corresponding to an alarm state, and y is the total of all days with all raw data present):

    IF x < 14 days
        INVALID
    ELSEIF 14 < x < 28
        VALID, FLAG = 'Low data period'
    ELSEIF (y/TOTAL DAYS) < 0.15
        INVALID
    ELSEIF (y/TOTAL DAYS) < 0.30
        VALID, FLAG = 'Low data period'
    ELSE
        VALID

The site is then either passed, failed, or tagged as "low data period: treat with caution." The maximum period that the system utilizes to train the ANN is a user definable value on the GUI; the default setting is 3 months (derived empirically as providing a compromise between data quantity and data variability).

Flow sensor data obtained from a water distribution system is usually in the form of time series, that is, a data stream whose value is a function of time. In U.K. industry standard practice, the data stream has been sampled at a regular time interval of 15 min to produce a discrete series $x_t$. A full theoretical basis for modeling time series in this context using the state-space model can be found in Mounce et al. (2003). Time series data of real-world phenomena is inherently nonstationary (for example, in the financial world such as shares and derivatives). If the series is nonstationary, then all the typical results of the classical regression analysis are not valid and results will be spurious. For a stationary signal, the signal properties do not change much over time. However, most interesting signals contain numerous nonstationary or transitory characteristics: drift, seasonal trends, abrupt changes, and beginnings and ends of events (Kendall and Ord 1990). These fluctuations may not be limited only to the mean of the series, but may also affect its overall variance structure. The time domain must be considered when adopting an MDN network for a temporal prediction task. If the generator of the data itself evolves with time, then time must be considered as an additional variable in the joint probability density (Bishop 1995). However, the MDN can be extended to nonstationary problems, provided that the model is treated as continuously adaptive. In other words, for nonstationary data, the model must be reestimated within a relatively short time interval. In the case of flow data for



distribution systems, good quality time series are "almost" stationary in that they exhibit stationarity over extended periods; in effect, a set of quasi-stationary models. Therefore, by retraining the networks with a slightly updated data set for the DMA, current conditions can be continually built into the model.

A second data quality or assurance test is thus applied to assess if training sets are "almost stationary" in order to restrict the use of spurious or bad quality data. The training data period is analyzed to assess how the mean and variance change over time, with the mean $\mu_i$ and variance $\sigma_i^2$ calculated for each week $i$ of data. These values are compared to the average mean $\bar{\mu}$ and variance $\bar{\sigma}^2$ over the entire period. If either measure is an outlier of a range specified by a user defined multiplier, e.g., $\mu_i < \bar{\mu} / \mu_{MULT}$ or $\mu_i > \bar{\mu} \cdot \mu_{MULT}$, then week $i$ is defined as failing the test, and similarly for variance. If a user defined number of failures is exceeded for mean ($\mu_{LIM}$) and/or variance ($\sigma^2_{LIM}$) then the training data are rejected on this pass. For the case study, with 12 weeks of training data and as a consequence of empirical testing, $\mu_{MULT} = 1.5$, $\sigma^2_{MULT} = 1.5$, $\mu_{LIM} = 3$, and $\sigma^2_{LIM} = 6$. In practice, around 25% of the worst quality data sets were excluded at any particular time.

Each data stream/instrument/site is thus in one of four automatically controlled states at any one time: fail stationary test; normal; sufficient but low period training data; insufficient training data. A site can also be manually activated/deactivated using the GUI (for example, in the event of known sensor failure or planned operational work).

Training

Training sets which pass the screening tests are preprocessed [see Mounce et al. (2006)], including a simple linear transformation of the data. Bowden et al. (2003) found that when transforming data for use with ANNs in water resources applications, taking a logarithmic transformation of the data, removing the seasonality of the data and transforming the inputs and outputs to normality were all found to give significantly larger forecasting errors than a simple linear transformation of the data. Missing data are filled in using ARIMA modeling (Vandaele 1983).

An ANN is trained for each valid logger when the system is initialized and thereafter at some predefined time period in order to capture the usual distribution of values. The retraining interval is selectable on the GUI and by default is weekly, found to provide a good compromise between computational expense, system variability, and daily and weekly patterns of water use. If the regime for the supply system being monitored changes drastically (e.g., system alterations such as valve position modifications), then the training set will need to be started anew (reset event).

Testing

Each valid logger is also tested, with the latest data, at some regular interval controlled by a timer (selectable on the GUI and by default hourly, dictated by the data update period). In a similar manner to training sets, test data are checked for availability and quality. If passed, the data are presented to the appropriately trained neural network. If events are detected based on a user definable minimum level of confidence, then an alarm event is
dealt with (e.g., either a burst has been repaired or the alarm related to an unusual demand). An option allows the automatic clearing of alarms after a set period (such as one week).

Harrogate and Dales Case Study

The Harrogate and Dales (H&D) administrative area in North Yorkshire in the U.K. stretches from Harrogate northwards toward Richmond, Darlington, and Middlesbrough. The area consists of nearly 200 DMAs (excluding trunk main and industrial user DMAs) and includes approximately 122,000 properties. Yorkshire Water Services, the water utility company responsible, conducted a network service pilot in the H&D area by installing 450 Cello loggers equipped with a general packet radio service (GPRS) communications infrastructure to dramatically improve data transfer for both flow and pressure data. Data are communicated every 30 minutes and two readings are obtained (15-minute sampled data). In addition to telemetry, the pilot system incorporates high (for flow) and low (for pressure) flat-line alarms, the original alarm values being set according to the high and low average values over 12 months, plus or minus 20% respectively (these levels were then manually adjusted as the pilot progressed). This system (the combination of GPRS and flat-line alarm levels) represents a significant step forward in detecting events and is helping demonstrate a step change in awareness of system performance. Faster detection of bursts/leaks in the system enables Yorkshire Water to react to help minimize the inconvenience of interruption to customers as well as decrease the damaging consequences to infrastructure, in particular for detecting sudden catastrophic bursts. Around 60–80 alerts are received per week on this system, a significant number being ghosts, and many events are still not detected prior to customer contact.

Yorkshire Water Services' pilot provided the case study data (flow only) sources to trial the AI data analysis system for detecting more subtle change from DMA flow data. The online system was operational and generating alerts from August 2007. As a result of promising initial results, the utility company agreed to allocate the additional resources required to perform validation of alerts within an operational environment. Access to all DMAs within the H&D region was provided, resulting in a 2-month trial on 144 sites. It should be emphasized that this trial was on data from a "live" network and not a desktop study using historical data sets. Alerts from the trial were e-mailed automatically both to the research team and to a dedicated e-mail account accessible by network controllers. In many cases, leakage teams were notified about a problem in a DMA following AI alerts and then conducted a leak survey and located a leak. Sources of information for correlating alerts included the following:

• Direct liaison with the control room;
• Daily e-mailed briefings on major events;
• Customer contact database (customer reports of visible leakage);
• Comparison with flat-line alerts (the AI system was used to feed back to the flat-line system and help tune alarm levels); and
• Work management system (WMS) record of repairs database.
added to the event database for the DMA, and an automated Proformas were completed by control room staff to bring to-
e-mail is sent out describing the alert 共SCADA ID, DMA name, gether the information from these sources for each alert and hence
the FIS output level and confidence, the classification period, and it was possible, in many cases, to build up an accurate picture of
an estimate of the leakage flow rate兲. the cause of an alert and the subsequent sequence of events as
The GUI 共see Fig. 3兲 allows an operator to review sites, and an illustrated in the following results. The live nature of the pilot
“alarm” state can be cleared if it is judged that the event has been meant that complete information was not always available.
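
The testing step described in the Testing subsection (raise an alarm only when a detection reaches a user-definable minimum confidence, then e-mail the SCADA ID, DMA name, FIS output, confidence, classification period, and leakage estimate) can be illustrated with a minimal Python sketch. The class, function, and field names and the 75% threshold used in the usage note are hypothetical; the paper does not give the system's internal data structures or its default threshold.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Detection:
    """Hypothetical container for one FIS classification result."""
    scada_id: str
    dma_name: str
    fuzzy_output: float       # FIS output level
    confidence_pct: float     # confidence, as a percentage
    window_start: datetime    # start of the classification period
    window_end: datetime      # end of the classification period
    burst_size_l_s: float     # estimated leakage flow rate in L/s


def format_alert(d: Detection, min_confidence_pct: float) -> Optional[str]:
    """Return the alert e-mail text, or None if below the confidence threshold."""
    if d.confidence_pct < min_confidence_pct:
        return None  # no alarm event is raised for this test
    fmt = "%d-%b-%Y %H:%M:%S"
    return (
        f"SCADA ID: {d.scada_id}\n"
        f"DMA: {d.dma_name}\n"
        f"*ALERT* %CONFIDENCE: {d.confidence_pct:.2f} "
        f"Fuzzy output: {d.fuzzy_output:.2f}\n"
        f"Dates: From {d.window_start.strftime(fmt)} "
        f"To {d.window_end.strftime(fmt)}\n"
        f"Burst size estimate: {d.burst_size_l_s:.1f} L/s"
    )
```

For example, a detection with 80% confidence would produce an e-mail under an assumed 75% threshold and would be suppressed under an assumed 90% threshold, which is the trade-off between ghosts and missed events that the operator controls through the confidence setting.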


Fig. 3. Online AI system GUI

Results

Example events for the automated, online AI system are now provided and summary results are then presented from the two month rigorous evaluation period (January 26–March 29, 2008).

Example Events

Fig. 4 illustrates a situation in which the AI system correctly produced an alert for a burst that the flat-line system did not detect. The alert was received at 03:11 on February 26, with a size estimate of just over 2 L/s. As can be seen from the graph, the flow was abnormal in the 12 h window up to this time (the period of FIS analysis). The DMA has 3,057 domestic and 474 commercial properties.

SCADA ID: 3193_02
*ALERT* %CONFIDENCE: 80.00 Fuzzy output: 0.82
Dates: From 25-Feb-2008 13:45:00 To 26-Feb-2008 01:45:00
Burst size estimate: 2.1 L/s

Fig. 4. Event 1: AI alert for burst (February 22–29, 2008)

Fig. 5. Event 2: alerts, contact, and repair for burst (November 19–December 8, 2007)

No flat-line alert was generated as the increase was below the high alarm level (30 L/s for this DMA). The control room received contacts from customers, the first more than 29 h after the AI alert, at 08:16 on February 27. WMS confirmed that a service pipe repair was conducted on February 27. Note that the burst start time, indicated on Fig. 4, is an estimate based on retrospective visual inspection.
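
The timings quoted for Event 1 can be checked with simple date arithmetic. The short Python sketch below uses only the timestamps given in the text and the alert e-mail (the variable and function names are illustrative) and computes the length of the FIS analysis window, the delay between the end of that window and receipt of the alert, and the lead time over the first customer contact.

```python
from datetime import datetime

# Timestamps quoted in the text and alert e-mail for Event 1 (year 2008).
window_start = datetime(2008, 2, 25, 13, 45)    # start of FIS analysis window
window_end = datetime(2008, 2, 26, 1, 45)       # end of FIS analysis window
alert_received = datetime(2008, 2, 26, 3, 11)   # AI alert e-mail received
first_contact = datetime(2008, 2, 27, 8, 16)    # first customer contact


def hours_between(earlier: datetime, later: datetime) -> float:
    """Elapsed time between two timestamps, in hours."""
    return (later - earlier).total_seconds() / 3600.0


window_hours = hours_between(window_start, window_end)
alert_delay_hours = hours_between(window_end, alert_received)
lead_time_hours = hours_between(alert_received, first_contact)

print(f"FIS window length: {window_hours:.1f} h")
print(f"Window end to alert: {alert_delay_hours:.2f} h")
print(f"Alert to first customer contact: {lead_time_hours:.2f} h")
```

The lead time works out at 29 h 5 min, consistent with the statement that the first customer contact came more than 29 h after the AI alert, and the analysis window is the stated 12 h.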
The next example shows an AI alert for a smaller leak which
precedes a customer contact but for which no WMS information
was available. No flat-line alert was generated as the increase was
below the high alarm level. The DMA has 96 domestic and 7
commercial properties.

SCADA ID: 3278_02
*ALERT* %CONFIDENCE: 85.00 Fuzzy output: 0.763
Dates: 23-Nov-2007 23:00:00 24-Nov-2007 11:00:00
Burst size estimate: 0.3 L/s
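
To put the burst size estimates quoted in these alerts in context, a flow rate in L/s can be converted to a daily volume. The sketch below is plain unit arithmetic (1 L/s is 86.4 m³/day) applied to the 2.1 and 0.3 L/s estimates from Events 1 and 2; the function name is illustrative and is not part of the system described.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 s
LITRES_PER_CUBIC_METRE = 1000.0


def lps_to_m3_per_day(flow_l_s: float) -> float:
    """Convert a steady flow in litres per second to cubic metres per day."""
    return flow_l_s * SECONDS_PER_DAY / LITRES_PER_CUBIC_METRE


for label, flow in [("Event 1", 2.1), ("Event 2", 0.3)]:
    print(f"{label}: {flow} L/s is about {lps_to_m3_per_day(flow):.1f} m3/day")
```

Even the smaller estimate corresponds to roughly 26 m³ of water per day if the flow persists unchanged, which is an idealization since real leak flows vary with pressure.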

As can be seen in Fig. 5, the DMA had raised flow rates for nearly 2 weeks. There is no WMS record, so it is unclear what caused the return to normal conditions. It is initially somewhat surprising that the customer contact appears to be as the flow is returning to normal rather than slightly prior to this as would be expected if a repair had been initiated following customer contact. It is therefore feasible that an investigation and repair was initiated following night line analysis and that the contact was triggered by a presence and activity within the DMA. This example demonstrates how small leaks such as this, which are not customer impacting, are often not identified by current systems in a timely manner and become part of background leakage. However, the automated AI system accurately identifies such small leaks and hence could redefine the ELL or the “sustainable level of leakage” (House of Lords 2006).

The final example illustrates how the system can detect unusual industrial events. Postanalysis revealed that a large industrial user in a DMA was extracting more water than usual because they were having problems with their on-site borehole. This resulted in three AI alerts: for the instrument monitoring the industrial user, for DMA1 (containing the industrial user), and for DMA2 (supplying water to DMA1 in a cascade fashion, rather than DMA1 being directly connected to the trunk main system). Fig. 6 shows the time series plots for these three locations around the period of the event. Alerts of this type provide added value, previously unavailable knowledge and warning concerning system activity of which water utilities would like to be aware. For example, there are suspected incidences of illegal 3rd party use (e.g., tankering which has resulted in discoloration events) and the information could be valuable for defining and checking compliance with contractual/billing arrangements for industrial users.

SCADA ID: 3109_02
*ALERT* %CONFIDENCE: 99.7 Fuzzy output: 0.729
Dates: 06-Nov-2007 17:15:00 07-Nov-2007 05:15:00
Burst size estimate: 5.7 L/s

Fig. 6. Event 3: AI alerts for industrial usage, cascaded meters

Summary of All Alerts

A total of 59 alerts were produced by the AI system for the period of January 26–March 29, 2008. One hundred forty-four sites were in the system but, due to data quality and subsequent failure of data checks, typically 90–100 were under analysis at any one time in the two month period. Of the 59 alerts:
• Twenty-six are leakage or suspected leakage: eighteen have been correlated with WMS (16 unique bursts, 3 detected by the flat-line alarm; a given burst event can affect multiple
flow meters depending on network configuration); and eight have been correlated to customer reports of bursts, and apparent repairs on nightline.
• Five have been correlated with known industrial events (1 detected by the flat-line system);
• Nineteen are “abnormal”: large unusual demands or short-term increases in nightline (0 on flat line); and
• Nine unknown/ghosts (generally result of spurious data).

These data are shown graphically in Fig. 7. The AI system detected many more valid events (confirmed as above) than the flat-line system. The flat-line system generated many more ghosts. While complete details on flat-line alerts were not available, for the 5 events picked up by both systems the AI system had an average 1 h 45 min advantage in detection time, even with the imposed delay due to the data route.

Fig. 7. Summary of alerts for trial period

The abnormal classification covers situations where the AI system produced an alert and subsequent manual data investigation confirmed that an event of some type did occur, but for which there was no correlation with any further information from the utility company. Examples include large unusual demands (unknown industrial activity, for example), and those related to known network maintenance or unexplained but significant short-term increases in nightline. The signature of some of these abnormal events such as large industrial demands, or closure and opening of valves, could not be differentiated from bursts in the current analysis system. Unexpected or unlicensed water use such as the filling of private fire tanks, increased industrial use for new processes, unauthorized filling of street cleaning equipment and bowsers or illegal connections can all generate abnormal high flow and hence alerts from the AI system. These are all activities of which water service providers should be aware. For example, the AI system detected an unauthorized hydrant opening at around 3 L/s in a rural DMA, which was located and closed within a matter of hours. Unexpected events can also be produced by operational activity such as network rezoning or changes to valve arrangements or pump schedules. Their detection by this system provides an additional check for the company of the timing, magnitude, etc., of such activities, particularly important when such activities are frequently outsourced to third party contractors.

Yorkshire Water has won the U.K. “Utility Company of the Year” award an unprecedented 3 years in a row and the latest figures from OFWAT show that the company has met or outperformed agreed leakage targets for 10 years running and overall leakage has been cut by more than 45%. Despite this, the company is committed to reducing leakage even further and improving customer service. A major step in achieving this was the introduction of GPRS technology for communication of hydraulic performance data and the associated flat-line alarm system. This demonstrably brought about a step change in the business awareness and response to system events. The work presented in this report demonstrates how a further step change in awareness of system performance and hence response can be achieved through the application of self-learning, automatic data analysis routines. The system developed provides “sensitive” detection of abnormal flow events as they occur and, due to the FIS, provides a confidence estimate, in the form of a percentage, of how unusual the flow event is together with an estimate of the burst flow rate that can be very effectively used to prioritize events and response.

Discussion

Data from online instruments potentially provides a wealth of information about what is happening in a water distribution network. The AI system presented in this paper adopts a “bottom up” approach to analyzing data as this is most appropriate to the dirty nature of hydraulic time series data. ANNs are particularly suited to dealing with missing or spurious sensor readings and exhibit graceful degradation in such situations (i.e., their performance will only slowly decrease as data quality deteriorates). The approach mirrors the bottom level of the data-information-knowledge-wisdom pyramid used in knowledge management (Ackoff 1989). The AI system converts flow sensor data into usable information (abnormal flow). However, more intelligent algorithms are needed to turn information into knowledge for decision makers. For example, some form of knowledge base is needed to interpret a number of alerts for a set of related DMAs (which may include flow and pressure) and deliver a system wide analysis of events. Future work will explore the development of this type of approach, especially with the expected increase in density of coverage of pressure monitoring locations which will enable more accurate location of incidents by means of a “grid” of pressures. In the longer term, there exists the possibility of combining hydraulic data with other signals (for example, water quality) to allow the preemptive management of events and thus reduce their impact and increase levels of service of the utility company.

It should be noted that the AI system did not always detect catastrophic failures that were repaired very quickly (the company reports that on average the lag time, from awareness to repair, for these is 1.5 h in the day and 3.5 h at night). The AI system does not detect such rapidly repaired incidents because they do not have time to impact on the FIS analysis window. However, in this situation the flat-line alarm system and the AI system can be seen as complementary because the flat-line alarms would quickly identify the higher magnitude bursts. In such a combined scenario the settings of the flat-line alarms could be “relaxed”, reducing the number of false alarms they currently produce. In combination, the two systems would identify sudden large events and slowly developing or long lasting events, thereby covering the majority of possible scenarios.

Conclusions

There is a range of applications for machine learning where simple point prediction is not adequate to guide decision making, and a classification (for event detection) is not provided. The developed hybrid ANN and FIS that utilizes an MDN architecture overcomes this problem for DMA flow data analysis. The AI system was modified and extended from an offline analysis tool to an automated online application which performs data-driven self-learning (with no need for classification examples) and updating
(retraining). Significant modifications were made including being able to deal with multiple data sites; implementing strategies for coping with poor quality or spurious data; adding functionality for alarm handling; and the temporal synchronization and handling of ANN training and testing. The system was used to monitor and analyze flow time series data from DMA flow meters in a live water distribution network. The aim was to detect flow anomalies, especially burst events, and to assess the potential operational and customer service benefits to be gained from its application by a water company (by control room staff aggregating relevant data sources a posteriori).

A trial was conducted in which the system analyzed continually updated (every hour) data originating from 144 DMAs, providing timely alerts via automated e-mails. These alerts have been used by the network controllers and compared off line with flat-line alarms, repair information, customer contacts, and on the ground information. The results of the trial demonstrate that the system is able to detect abnormal flow events in a timely and effective manner. Major findings include:
• Enables early detection and fast awareness (Fig. 1) with the potential to initiate response to events before customers are impacted;
• Provides a confidence estimate, in the form of a percentage, of the abnormality of the flow event and an estimate of the burst size;
• Low number of ghosts versus genuine event detections (depending on confidence threshold selected);
• Ability to detect medium to small events which a flat-line system cannot, i.e., more sensitive to changes in the system;
• Technique proven in a live environment of a real system; and
• Forty-four percent of alerts were correlated to definite bursts, using WMS and customer contact information, with the majority of other alerts likely to be unknown leakage, high abnormal demands (32%) or other notable network or industrial events (9%).

It is concluded that the system is an effective and viable proactive tool for online burst detection in water distribution systems with the potential to provide a significant advantage over current practice.

Acknowledgments

The writers wish to acknowledge the support given for this research by Yorkshire Water Services Ltd., U.K., and the time, energy and enthusiasm of the specific individuals involved.

References

Ackoff, R. L. (1989). “From data to wisdom.” Journal of Applied Systems Analysis, 16, 3–9.
Anmala, J., Zhang, B., and Govindaraju, R. (2000). “Comparison of ANNs and empirical approaches for predicting watershed runoff.” J. Water Resour. Plann. Manage., 126(3), 156–166.
Bishop, C. M. (1994). “Mixture density networks.” Technical Rep. No. NCRG/94/004, Dept. of Computer Science and Applied Mathematics, Aston Univ., Birmingham, U.K.
Bishop, C. M. (1995). Neural networks for pattern recognition, Oxford University Press, New York.
Bowden, G. J., Dandy, G. C., and Maier, H. R. (2003). “Data transformation for neural network models in water resources applications.” J. Hydroinform., 5, 245–258.
Bowden, G. J., Maier, H. R., and Dandy, G. C. (2002). “Optimal division of data for neural network models in water resources applications.” Water Resources Research, 38(2), 2.1–2.11.
Chang, N., and Makkeasorn, A. (2006). “Water demand analysis in urban region by neural network models.” Proc., 8th Water Distribution System Analysis Symp., USEPA/Univ. of Cincinnati, Cincinnati.
Greenaway, G., Guanlao, R., Bayda, N., and Zhang, Q. (2006). “Water distribution systems demand forecasting with pattern recognition.” Proc., 8th Water Distribution System Analysis Symp., USEPA/Univ. of Cincinnati, Cincinnati.
Hornik, K., Stinchcombe, M., and White, H. (1989). “Multilayer feedforward networks are universal approximators.” Neural Networks, 2, 359–366.
House of Lords. (2006). “Water management.” HL Rep. No. 191-I, Vol. 1, House of Lords Science and Technology Committee, London.
Jain, S. K., Das, A., and Srivastava, D. K. (1999). “Application of ANN for reservoir inflow prediction and operation.” J. Water Resour. Plann. Manage., 125(5), 263–271.
Kendall, M., and Ord, J. K. (1990). Time series, 3rd Ed., Hodder and Stoughton Limited, Ky.
Maier, H. R., and Dandy, G. C. (2000). “Neural networks for the prediction and forecasting of water resources variables: A review of modelling issues and applications.” Environ. Modell. Software, 15, 101–124.
Mamlook, R., and Al-Jayyousi, O. (2003). Fuzzy sets analysis for leak detection in infrastructure systems: A proposed methodology, Clean Technology Environmental Policy, Vol. 6, No. 1, Springer Berlin, Heidelberg, 26–31.
McKenzie, R., and Seago, C. (2005). “Assessment of real losses in potable water distribution systems: Some recent developments.” Water Supply, 5(1), 33–40.
Misiunas, D., Vítkovský, J., Olsson, G., Lambert, M., and Simpson, A. (2006). “Failure monitoring in water distribution networks.” Water Sci. Technol., 53(4–5), 503–511.
Misiunas, D., Vítkovský, J., Olsson, G., Simpson, A., and Lambert, M. (2005). “Pipeline burst detection and location using a continuous monitoring of transients.” J. Water Resour. Plann. Manage., 131(4), 316–325.
Mounce, S. R. (2005). “A hybrid neural network fuzzy rule-based system applied to leak detection in water pipeline distribution networks.” Ph.D. thesis, Univ. of Bradford.
Mounce, S. R., Boxall, J. B., and Machell, J. (2007). “An artificial neural network/fuzzy logic system for DMA flow meter data analysis providing burst identification and size estimation.” Water management challenges in global change, B. Ulanicki et al., eds., Taylor and Francis, London, 313–320.
Mounce, S. R., Day, A. J., Wood, A. S., Khan, A., Widdop, P. D., and Machell, J. (2002). “A neural network approach to burst detection.” Water Sci. Technol., 45(4–5), 237–246.
Mounce, S. R., Khan, A., Wood, A. S., Day, A. J., Widdop, P. D., and Machell, J. (2003). “Sensor-fusion of hydraulic data for burst detection and location in a treated water distribution system.” Inf. Fusion, 4(3), 217–229.
Mounce, S. R., Machell, J., and Boxall, J. (2006). “Development of artificial intelligence systems for analysis of water supply system data.” Proc., 8th Water Distribution System Analysis Symp., USEPA/Univ. of Cincinnati, Cincinnati.
Nabney, I. (2001). NETLAB: Algorithms for pattern recognition, Springer, New York.
Office of Water Services (OFWAT). (2006). <www.ofwat.gov.uk> (Sept. 28, 2008).
Shinozuka, M., and Liang, J. (2005). “Use of SCADA for damage detection of water delivery systems.” J. Eng. Mech., 131(3), 225–230.
Stoianov, I., Graham, N., Madden, S., Odenwald, T., and Stein, M. (2007). “WaterSense: Integrating sensor nets with enterprise decision support.” Water management challenges in global change, B. Ulanicki et al., eds., Taylor and Francis, London, 123–126.
UK Water Industry Research (UKWIR). (2006). “Integrated network management roadmap.” Rep. No. 07/WM/18/4, UK Water Industry Research Ltd, London.
United Nations Environment Programme (UNEP). (2007). “Fourth global environment outlook: Environment for development assessment report.” United Nations Environment Programme, Progress Press Ltd, Malta.
Vandaele, W. (1983). Applied time series and Box-Jenkins models, Academic, San Diego.
Water Research Centre (WRc). (1994). “Managing leakage.” Rep. A, U.K. Water Industry Research Ltd./WRc, Wiltshire, England.