Water Resour Manage (2011) 25:1–19

DOI 10.1007/s11269-010-9684-y
The Proportional Hazards Modeling of Water Main
Failure Data Incorporating the Time-dependent Effects
of Covariates
Suwan Park · Hwandon Jun · Newland Agbenowosi ·
Bong Jae Kim · Kiyoung Lim
Received: 5 January 2010 / Accepted: 2 June 2010 /
Published online: 18 June 2010
© Springer Science+Business Media B.V. 2010
Abstract Proportional hazards models (PHMs) for the times between consecutive
pipe breaks were constructed using case study water main break data. The individual 150 mm
cast iron pipes in the case study water distribution system were categorized into seven
groups according to their past break history, and a distinct PHM was constructed for each
group. During the modeling process, the proportional hazards assumption was examined for
each covariate so that the time-dependent effects of covariates on the failure hazard could
be included in the models. Analysis of the baseline hazard rates showed that the hazards of
the third through seventh breaks follow a form similar to a bath-tub.
The estimated regression coefficients and the hazard ratios of the selected covariates
were used to analyze the variations in the factors and their effects, including the time-
dependent effects on the pipe failures. The changes in the relative hazards of the
covariates were also analyzed according to the number of breaks. The constructed
PHMs were verified by analyzing the deviance residuals of each model.
Keywords Hazard rates · Pipe break · Proportional hazards model ·
Consecutive breaks · Time-dependent effects of covariates · Water main
S. Park · K. Lim
Department of Civil and Environmental Engineering,
Pusan National University, Busan, 609-735, Korea
H. Jun (corresponding author)
School of Civil Engineering, Seoul National University of Technology,
Seoul, 139-743, Korea
e-mail: hwjun@snut.ac.kr, hwandonjun@gmail.com
N. Agbenowosi
Booz Allen Hamilton, 8283 Greensboro Drive, McLean, VA 22102, USA
B. J. Kim
Ulsan Regional Office, Korea Water Resources Corporation, Ulsan, 680-721, Korea
2 S. Park et al.
1 Introduction
The methodologies for estimating the condition of water mains may be categorized
into physical and statistical modeling methods. In general, physical modeling meth-
ods are used to estimate the effects of internal and external loads affecting the
structural capacity of a pipe in an attempt to evaluate the condition of water mains.
Conversely, statistical methods are used to evaluate the deterioration of the pipes
using recorded failure data, such as the times and locations of failure, as well as other
operational and environmental data of the pipes.
Physical modeling methods can identify the factors governing pipe failure and estimate
their effects. However, they often require extremely detailed data on a specific mode of
pipe failure. Therefore, the construction of physical models tends to be hindered by the
financial and managerial constraints that utilities usually face in collecting the required
data.
Statistical methods, in contrast, can capture not only the effects of direct factors but
also the combined effects of direct and indirect factors on pipe failure. Although
statistical methods rely mainly on recorded pipe failure data, when sufficient data are
available a statistical model can describe the failure pattern of a pipe in terms of
failure rate, break hazard and the probabilities of pipe life expectancy. Rajani and
Kleiner (2001) and Kleiner and Rajani (2001) discuss in detail the strengths and weaknesses
of physical and statistical modeling methods.
The proportional hazards model (PHM) developed by Cox (1972) is an advanced
statistical modeling approach that can identify the factors for pipe failure and the
relative magnitudes of their effects on pipe failure. The PHM separately models
the time-dependent ageing process of a pipe and the time-independent effects
of the environmental and operational stress factors, and assumes a multiplicative
relationship between them. The PHM is also called a semi-parametric model since no
particular form of probability distribution is assumed for the time-dependent ageing
process.
The PHM’s versatility and robust theoretical basis have made it possible to apply this
modeling approach to water main break patterns. This was first attempted by Marks et al.
(1985), who used the PHM to predict water main breaks based on the computed probability of
the time between consecutive breaks. Andreou et al. (1987), Marks et al. (1987), Eisenbeis
(1994), Brémond (1997), and Lei (1997) later presented their own PHMs, together with
methodologies for implementing them in specific settings. Park et al. (2008) took a
different approach to modeling water main break patterns, applying the log-linear rate of
occurrence of failures (ROCOF) and the power law process to water main failure data and
estimating the economically optimal replacement times of individual pipes.
This paper presents a comprehensive process for constructing PHMs for the time
intervals between consecutive pipe breaks. The illustrated process includes defining
possible covariates, categorizing groups of pipes for modeling purposes, analyzing
the assumption of the proportional hazards for the covariates, estimating the baseline
survival and hazard functions and verifying the constructed models.
Checking the proportional hazards assumption for the chosen covariates is especially
crucial in proportional hazards modeling. If proportionality is violated for a covariate,
one can partition the data into categories and build a separate PHM for each category.
However, if one is interested in evaluating the relative hazards between pipe categories, a
covariate representing the category must be included in a PHM that uses the whole data set
without categorization. In this case, the time-dependency of the covariate's effect on the
hazard must be considered.
Furthermore, the goodness of fit of the constructed models is assessed in this paper to
ensure that the models represent the observed pipe failure phenomena well. Analyses of the
relative effects of the various factors that cause pipe failure in the case study area are
also provided based on the constructed PHMs, and the patterns of the hazard and survival
functions for consecutive pipe failures are analyzed to identify the characteristics of the
failures.
Checking the models and constructing more refined models are expected to lend more
credibility to subsequent applications of the constructed models in scheduling water main
maintenance.
2 Methodologies for Constructing the Proportional Hazards Model
2.1 Overview of the Water Main Break Database
The water main break database of the study area in the U.S.A. contains information
on the pipe material/joint type, installation time, failure time, diameter and length.
The water distribution system was predominantly cast iron (CI), which accounted for 75% of
the pipe inventory; together with ductile iron pipes, CI accounted for over 95% of the pipes
in the system. Other pipe materials (galvanized iron, concrete, asbestos cement and plastic)
accounted for only a minor percentage of the system. By diameter, 79% of the system was
composed of either 150 or 200 mm pipes, and pipes with diameters smaller than 100 mm
accounted for a little more than 2% of the total.
Although PHMs may be constructed using the entire set of case study pipes, the pipes must
be partitioned according to given criteria if one is interested in the failure hazard
characteristics of a particular group of pipes. The failure hazard of the 150 mm cast iron
pipes, which represent the majority of the pipes in the break database, was selected as the
main focus of the modeling in this study.
Since the PHM assumes that the hazard rate of an item is proportional to that of a
reference item with a different level of the same characteristics, it is crucial that each
individual pipe have consistent internal and external characteristics along its length.
The individual pipes were defined using the water main break database and the information
available for the case study area, such as the land development status, internal pressure
and number of customers in each grid of the utility’s proprietary management system.
In addition to basic information on the pipe breaks, the water main break database
contained the locations, measured from the starting point of a pipe, at which the pipe
material/joint type changes. Each grid in the utility’s proprietary management system is
assigned one of four available internal pressure types. The management system also
contained information on the locations at which the land development status changes.
Individual pipes were defined using the locations at which the pipe material/joint type,
the internal pipe pressure or the land development status changes. Since none of the three
criteria may change along a newly defined pipe’s length, the criteria are examined from the
starting point of the originally defined pipe, and a new individual pipe is defined
whenever any criterion changes.
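The segmentation rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each criterion (material/joint type, pressure type, land development status) is assumed to be available as a sorted list of (distance from the pipe's starting point in metres, value) change points beginning at 0, and the names `split_pipe` and `value_at` and this input layout are hypothetical rather than the utility's actual data schema.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: sorted (distance from pipe start in m, value) pairs, first at 0.
ChangePoints = List[Tuple[float, str]]
Segment = Tuple[float, float, Dict[str, str]]


def value_at(points: ChangePoints, x: float) -> str:
    """Value of a criterion at position x: the last change point at or before x."""
    current = points[0][1]
    for d, v in points:
        if d <= x:
            current = v
    return current


def split_pipe(length_m: float, criteria: Dict[str, ChangePoints]) -> List[Segment]:
    """Split one pipe into individual pipes along which no criterion changes."""
    # Every interior location where any criterion changes becomes a segment boundary.
    cuts = sorted({d for pts in criteria.values() for d, _ in pts if 0 < d < length_m})
    bounds = [0.0] + cuts + [length_m]
    segments = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        values = {name: value_at(pts, start) for name, pts in criteria.items()}
        segments.append((start, end, values))
    return segments


pipe = {
    "joint": [(0.0, "SR"), (200.0, "SF")],
    "pressure": [(0.0, "PT1")],
    "land": [(0.0, "urban"), (350.0, "non-urban")],
}
for start, end, values in split_pipe(500.0, pipe):
    print(start, end, values)
```

In this toy case the 500 m pipe becomes three individual pipes, from 0 to 200 m, 200 to 350 m and 350 to 500 m, each with constant criteria.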
2.2 Definition of the Proportional Hazards Models for the Consecutive Pipe Failures
The PHM is used to assess the hazard rate of an item i in the form,
$$h_i(t) = h_0(t)\exp(\boldsymbol{\beta}'\mathbf{x}_i) \tag{1}$$
where $h_0(t)$ is an arbitrary and unspecified baseline hazard function that has a unit
of number of events per interval of time, $\mathbf{x}_i$ a vector of covariates that
influence the death or event of interest for the ith item and $\boldsymbol{\beta}$ a vector
of coefficients of the covariates. The relationship between the PHM and the corresponding
survival function is expressed as:
$$h_i(t) = -\frac{d}{dt}\log S_i(t) \tag{2}$$
Therefore, the corresponding survival function is expressed as,
$$S_i(t) = \exp\left[-\int_0^t h_0(\tau)\exp(\boldsymbol{\beta}'\mathbf{x}_i)\,d\tau\right] = \left[S_0(t)\right]^{\exp(\boldsymbol{\beta}'\mathbf{x}_i)} \tag{3}$$
where $S_0(t)$ is the baseline survival function.
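Eq. 3 can be checked numerically. The sketch below integrates a baseline hazard with the midpoint rule and compares the result with the baseline survival raised to exp(β'x). The Weibull-type baseline, with cumulative hazard H0(t) = e^(−7.187) t^1.272, is borrowed from the STG II estimate in Eq. 8 only for illustration, and the coefficient and covariate values are hypothetical.

```python
import math

# Assumed Weibull-type baseline: H0(t) = lam * t**k, so h0(t) = lam * k * t**(k - 1).
lam, k = math.exp(-7.187), 1.272


def h0(t: float) -> float:
    """Baseline hazard (breaks per month)."""
    return lam * k * t ** (k - 1)


def S0(t: float) -> float:
    """Baseline survival, exp(-H0(t))."""
    return math.exp(-lam * t ** k)


def survival_by_integration(t: float, risk_score: float, n: int = 20000) -> float:
    """Left side of Eq. 3: exp(-integral of h0(tau) * exp(beta'x) over [0, t]), midpoint rule."""
    dt = t / n
    integral = sum(h0((j + 0.5) * dt) for j in range(n)) * dt
    return math.exp(-integral * math.exp(risk_score))


# Hypothetical coefficients and covariates: beta'x = 0.5 * 1 + 0.3 * 2.
beta, x = [0.5, 0.3], [1.0, 2.0]
eta = sum(b * xi for b, xi in zip(beta, x))
t = 120.0  # months
lhs = survival_by_integration(t, eta)
rhs = S0(t) ** math.exp(eta)  # right side of Eq. 3
print(f"{lhs:.6f} {rhs:.6f}")
```

The two printed values agree to within the small discretization error of the midpoint rule, which is the content of the identity in Eq. 3.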
The PHM enables one to estimate the relative effects of the factors that cause pipe
failures by separately modeling the time-dependent ageing mechanism, through the baseline
hazard function, and the effects of the factors on pipe failure, through the exponential
covariate function, which is usually assumed to be time-independent.
Conventional applications of the PHM are limited to cases in which the failure of an
individual item implies its death, i.e., the end of its life. The failure time of such an
item, which occurs only once in its lifetime, is defined as its survival time. However,
when a break or leak occurs on a water main, the pipe is usually repaired and put back into
service rather than the entire stretch of pipe being replaced. Therefore, the definition of
survival time on which the PHM is based must be modified if one is interested in estimating
the hazard and survival probability of each consecutive break of a water main.
For this purpose, a pipe may be assumed to gain a new life after each repair, which allows
an individual pipe to have multiple survival times: the time elapsed from installation to
the first break and the time intervals between consecutive breaks. The individual 150 mm
cast iron pipes are categorized into seven ordered survival time groups (STGs) according to
the total number of recorded breaks, and a distinct PHM is constructed for each STG. Table 1
shows the STGs and the model numbers defined for this modeling approach. In Table 1, “Total
No. of Pipes” is the total number of pipes that belong to the corresponding STG, and “No. of
Failed Pipes” is the number of pipes with an observed (uncensored) survival time for that
STG. Therefore, “No. of Failed Pipes” is the number of uncensored survival time data points
for each STG.
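The construction of survival times and censoring flags for the STGs can be sketched as follows. This is a minimal illustration, assuming that installation, break and last observation times are all expressed in months from a common origin; the function name `stg_records` and its inputs are hypothetical.

```python
from typing import List, Tuple


def stg_records(install: float, breaks: List[float], last_obs: float,
                n_groups: int = 7) -> List[Tuple[int, float, int]]:
    """Return (STG number, survival time, event flag) records for one pipe.

    STG k runs from the (k-1)th break (or installation, for k = 1) to the kth break.
    If the kth break was not observed before last_obs (LOT), the interval is censored
    at last_obs (flag 0) and the pipe enters no higher STG.
    """
    breaks = sorted(breaks)
    records = []
    start = install
    for k in range(1, n_groups + 1):
        if k <= len(breaks):
            records.append((k, breaks[k - 1] - start, 1))  # observed break
            start = breaks[k - 1]  # the repaired pipe "gains a new life"
        else:
            records.append((k, last_obs - start, 0))  # censored at LOT
            break
    return records


print(stg_records(0.0, [100.0, 160.0], 400.0))
print(stg_records(10.0, [], 400.0))
```

A pipe with two breaks therefore contributes an uncensored record to STG I and II and a censored record to STG III, which is why the total number of pipes in each STG equals the number of failed pipes in the previous one in Table 1.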
Table 1 Definition of the STG and the model number
Survival time group   Minimum total      Survival time    Censored         Total no.   No. of
(or model number)     number of breaks                    survival time    of pipes    failed pipes
I                     0                  1st BT–IT        LOT–IT           9,642       4,329
II                    1                  2nd BT–1st BT    LOT–1st BT       4,329       1,149
III                   2                  3rd BT–2nd BT    LOT–2nd BT       1,149       500
IV                    3                  4th BT–3rd BT    LOT–3rd BT       500         268
V                     4                  5th BT–4th BT    LOT–4th BT       268         162
VI                    5                  6th BT–5th BT    LOT–5th BT       162         95
VII                   6                  7th BT–6th BT    LOT–6th BT       95          64
In Table 1, BT represents the break time, IT the installation time and LOT the last
observation time, i.e., the date of the last recorded break data, which was December 1997.
The break hazard for each group is modeled using a PHM; therefore, the covariate parameters
and the baseline hazard function are estimated for each failure interval.
The PHMs are constructed by determining the covariates to be included and by
estimating the regression coefficients for each covariate in the models. The baseline
survival functions and corresponding baseline hazard functions are then constructed
using estimates of the baseline survival probabilities at the recorded failure times.
The covariates to be included in the final PHMs are determined by analyzing the
adjusted survival curves for each covariate, the proportionality assumption on the
hazard rates for different levels of a covariate, the statistical significance of the esti-
mated regression coefficient of each covariate and the estimated partial likelihood of
the model. Figure 1 presents the process of proportional hazards modeling used for
this study.
2.3 Selection of Preliminary Covariates for the Models
Since each individual pipe has consistent characteristics for the chosen criteria along
its defined length, the criteria used in defining individual pipes are employed as
candidate covariates for the PHMs. Therefore, the covariates considered in the modeling
procedures are the degree of land development (DL), the internal pressure type of the pipe
(PT), the length of the pipe (L) and the number of customers in a grid (C). In addition, the
pipe material/joint types are also considered as covariates, for which pit-cast iron, spun
cast iron rigid joint and spun cast iron flexible joint types are coded as Pit-CI, SR and
SF, respectively.
The covariates are modeled as integer, binary or continuous variables so as to maximize the
statistical significance of the variables and models. In other words, different types of
each covariate are examined during modeling, and the type that maximizes the statistical
significance of the covariate is selected. The pipe length (L) and the number of customers
in a grid (C) are treated as continuous variables for Model I, but as binary variables for
Models II to VII. That is, for Models II to VII, L (or C) is set to ‘1’ if the logarithm of
its value exceeds the average of the logarithms of that covariate over the pipes in STG II,
and to ‘0’ otherwise. Table 2 shows the values each covariate takes depending on its
criterion.
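The binarization of L and C can be sketched as follows, assuming natural logarithms; the base does not matter, because every logarithm is a positive multiple of the natural one and the comparison is unchanged. The function names and the reference lengths are hypothetical.

```python
import math
from typing import List


def log_mean_threshold(reference_values: List[float]) -> float:
    """Mean of the natural logs of a covariate over the reference pipes (STG II in the text)."""
    return sum(math.log(v) for v in reference_values) / len(reference_values)


def binarize(value: float, threshold: float) -> int:
    """1 if log(value) exceeds the log-mean threshold, otherwise 0."""
    return 1 if math.log(value) > threshold else 0


# Hypothetical pipe lengths (m) of the reference pipes.
th = log_mean_threshold([50.0, 100.0, 200.0, 400.0])
print(math.exp(th))  # the threshold on the original scale is the geometric mean
print(binarize(150.0, th), binarize(120.0, th))
```

Comparing a log value with the mean of the logs is equivalent to comparing the raw value with the geometric mean of the reference pipes, which limits the influence of a few very long pipes or very crowded grids.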
In Table 2, the covariate TYPE takes a value of ‘0’ for pit-cast iron (Pit-CI), ‘1’
for SR and ‘2’ for SF in Model I. In Model II, the covariate TYPE takes a value of
‘1’ for Pit-CI, ‘0’ for SR and ‘2’ for SF.
[Figure: flowchart. Pipe break times and covariate values → Univariate analysis (significance test of individual covariates for empirical survival probability) → Selection of significant covariates including interaction terms; estimation of regression coefficients → Is the proportionality assumption satisfied? NO: consider modeling the time-dependent effects of covariates on the hazard. YES: estimation of the baseline hazard and survivor functions → Residual analysis]
Fig. 1 Process of the proportional hazards modeling
To select significant covariates for a PHM, forward, backward or stepwise variable
selection procedures, which rely solely on the statistical significance of the covariates,
can be applied. However, it is desirable to consider all possible combinations of
covariates, including interactions between them, to better capture the failure hazards of
a water main.
Therefore, the covariates to be included in a model are preliminarily determined using a
univariate analysis, which provides a preview of the relationship between each covariate
and the survival probability. The covariates are also examined using the log-likelihood
ratio statistic and the Akaike Information Criterion (AIC). As a result, interactions
between covariates were found for Models I and II. The coefficients of the selected
covariates are calculated using the PHREG procedure of the SAS system (SAS 2009), which
maximizes the partial likelihood function of each STG.
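For readers without SAS, the quantity being maximized can be sketched in a few lines. The example below is not the PHREG implementation: it assumes a single covariate, uses the Breslow approximation for tied failure times (one common choice), and finds the maximum with a plain Newton–Raphson search. The data are made up. The last lines show how the log-likelihood ratio statistic and AIC = −2 log L + 2p compare the fitted model with the null model.

```python
import math
from typing import List, Tuple

# Hypothetical records: (survival time in months, event flag, binary covariate).
DATA = [(5, 1, 1), (8, 1, 0), (12, 1, 1), (15, 0, 0),
        (20, 1, 0), (22, 1, 1), (30, 0, 0), (35, 1, 0)]


def partial_loglik(beta: float, data: List[Tuple[float, int, float]]) -> Tuple[float, float, float]:
    """Breslow partial log-likelihood, score and information for a single covariate."""
    ll = score = info = 0.0
    for t_i, d_i, x_i in data:
        if d_i == 0:
            continue  # censored records enter only through the risk sets
        risk = [x for t, _, x in data if t >= t_i]  # pipes still at risk at t_i
        w = [math.exp(beta * x) for x in risk]
        s0 = sum(w)
        s1 = sum(wi * x for wi, x in zip(w, risk))
        s2 = sum(wi * x * x for wi, x in zip(w, risk))
        ll += beta * x_i - math.log(s0)
        score += x_i - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return ll, score, info


def fit_beta(data: List[Tuple[float, int, float]], iterations: int = 50) -> float:
    """Newton-Raphson maximization of the partial log-likelihood."""
    beta = 0.0
    for _ in range(iterations):
        _, score, info = partial_loglik(beta, data)
        beta += score / info
    return beta


beta_hat = fit_beta(DATA)
ll_full = partial_loglik(beta_hat, DATA)[0]
ll_null = partial_loglik(0.0, DATA)[0]
lr_statistic = 2.0 * (ll_full - ll_null)  # compare with a chi-square with 1 df
aic_full = -2.0 * ll_full + 2 * 1  # one estimated coefficient
aic_null = -2.0 * ll_null  # no coefficients
print(f"beta = {beta_hat:.4f}, LR = {lr_statistic:.3f}, AIC: {aic_full:.3f} vs {aic_null:.3f}")
```

Because the partial log-likelihood is concave in β, Newton–Raphson converges quickly here; a production fit would also add step-halving and multiple covariates.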
2.4 Tests on the Assumption of the Proportional Hazards
The most fundamental assumption embedded in the PHM is that the effects of the covariates
on the hazard are proportional. In other words, when the value of a covariate increases or
decreases, the hazard must change by a constant multiplicative factor, and this factor must
remain constant over time. Therefore, if the proportionality assumption is violated for a
covariate, one can either partition the pipes into a number of groups or model the
covariate as a product with time, which is commonly called a “time-dependent covariate” in
the literature.
Table 2 Types and characteristics of the covariates
Covariate  Definition                   Type                  Characteristics
SR         Spun CI rigid joint          Binary                Used for Models III through VII; SR = 1 if a pipe is spun rigid joint, otherwise SR = 0
SF         Spun CI flexible joint       Binary                Used for Models III through VII; SF = 1 if a pipe is spun flexible joint, otherwise SF = 0
TYPE       Pipe material/joint type     Integer               Used for Models I and II
DL         Degree of land development   Binary                Urban = 1, non-urban = 0
L          Pipe length                  Continuous or binary  Continuous variable for Model I, otherwise binary variable
C          Customers in a grid          Continuous or binary  Continuous variable for Model I, otherwise binary variable
PT1        Pressure type 1              Binary                PT1 = 1 if a pipe is PT1, otherwise PT1 = 0
PT2        Pressure type 2              Binary                PT2 = 1 if a pipe is PT2, otherwise PT2 = 0
PT3        Pressure type 3              Binary                PT3 = 1 if a pipe is PT3, otherwise PT3 = 0
Although the term “time-dependent covariate” is widely used in the literature, it is
considered a misnomer, since what is actually time-dependent is not the covariate itself
but its effect on the hazard. Modeling the time-dependent effects of covariates on the
hazard is especially needed when one is interested in evaluating the relative magnitudes of
the hazards for different levels of a covariate, which are estimated using hazard ratios.
For concise terminology, the term “time-dependent covariate” is also used below, with the
understanding that it refers to the time-dependent effect of a covariate on the hazard.
Fig. 2 Standardized score residuals of TYPE of Model I
Fig. 3 Standardized score residuals of DL of Model I
To test the proportional hazards assumption for the covariates, the score process of Klein
and Moeschberger (2003), which is based on the Schoenfeld residuals, is used. Figures 2, 3,
4 and 5 show the score processes of the covariates used for Model I. Over time, the
standardized score residuals are approximated by a Brownian bridge, and the probability
that the maximum absolute value of a Brownian bridge exceeds 1.35 is approximately 0.05.
Therefore, the null hypothesis of proportional hazards for a covariate may be rejected at a
significance level of 0.05 if the absolute maximum of the standardized score residuals of
the covariate exceeds 1.35. As shown in Figs. 2, 4 and 5, the absolute maxima of the
standardized score residuals of TYPE, L and C exceed 1.35 for Model I. Therefore, the
covariates TYPE, L and C may need to be modeled as so-called time-dependent covariates.
Fig. 4 Standardized score residuals of L of Model I
Fig. 5 Standardized score residuals of C of Model I
However, treating L in Model I as a time-dependent covariate resulted in an unrealistically
large regression coefficient for L. The resulting model greatly overestimated the effect of
length on pipe failure, even though including time-dependent L was statistically more
significant than excluding it. The same behavior was found for TYPE and L in Model II, SR in
Model III, and SR and SF in Model V. Therefore, these covariates are not modeled as
time-dependent. Table 3 presents the final covariates selected and the estimated parameters
of the PHMs based on the criterion used in this study for determining time-dependent
covariates.
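The sup-norm decision rule used in this section can be made concrete with a short sketch. It assumes, as in the usual one-covariate result, that the score process is the running sum of the Schoenfeld residuals in failure-time order divided by the square root of the total information; under proportional hazards this behaves approximately like a Brownian bridge, whose maximum absolute value has the Kolmogorov tail probability 2 Σ (−1)^(k−1) exp(−2k²x²). Evaluating that series shows where the 1.35 cut-off comes from: it gives about 0.052 at x = 1.35 and about 0.050 at x = 1.358. The function names and toy inputs are hypothetical, and this is not necessarily the exact procedure of Klein and Moeschberger (2003).

```python
import math
from typing import List


def bridge_sup_tail(x: float, terms: int = 100) -> float:
    """P(max |B(t)| > x) for a standard Brownian bridge (Kolmogorov series)."""
    return 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x) for k in range(1, terms + 1))


def max_standardized_score(schoenfeld: List[float], cond_var: List[float]) -> float:
    """max |U(t)| / sqrt(I), with U(t) the running sum of Schoenfeld residuals.

    cond_var holds the per-failure conditional variances of the covariate, whose sum
    is the total information I at the fitted coefficient.
    """
    scale = math.sqrt(sum(cond_var))
    running, worst = 0.0, 0.0
    for r in schoenfeld:  # residuals ordered by failure time
        running += r
        worst = max(worst, abs(running) / scale)
    return worst


print(round(bridge_sup_tail(1.35), 4))  # about 0.052, slightly above 0.05
print(round(bridge_sup_tail(1.358), 4))  # about 0.050
stat = max_standardized_score([0.5, -0.2, -0.3], [0.25, 0.25, 0.25])
print(stat, stat > 1.35)  # reject proportionality only if the maximum exceeds 1.35
```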
2.5 Estimation of the Baseline Functions
Based on the semi-parametric modeling approach taken in this paper, no particular form of
probability distribution was assumed for the pipe break time data at the onset of the
analyses. Therefore, the baseline hazard function for each STG was estimated by fitting an
appropriate function to the log–log transformed baseline survival probabilities,
$\mathrm{LLS}(t) = \ln[-\ln \hat{S}_0(t)]$. These survival estimates were obtained at each
recorded failure time using the BASELINE statement of the SAS system (SAS 2009). When a
function was fitted to the LLS values, the appropriateness of particular parametric models,
such as the Weibull, was also examined to determine whether the pipe break times followed a
specific probability distribution.
Figure 6 shows the LLS of STG I plotted against the log of time in months, to which the
logistic equation in Eq. 4 is fitted.
$$\mathrm{LLS}(t) = \frac{28.867}{1+\exp(-2.613\ln t + 14.87)} - 23.172 \tag{4}$$
Table 3 Estimated parameters with time-dependent covariates
Model  Covariate   Parameter   P value              Hazard   95% hazard ratio confidence limits
                   estimate    (Pr > Chi-square)    ratio    Lower limit   Upper limit
I      TYPE         1.34242    <0.0001              3.828    3.443         4.257
       DL           0.54472    <0.0001              1.724    1.543         1.926
       L            0.000497   <0.0001              1.000    1.000         1.001
       C            0.01968    <0.0001              1.020    1.019         1.021
       DL·L         0.000259   0.0093               1.000    1.000         1.000
       TYPE·C      −0.01284    <0.0001              0.987    0.987         0.988
       TYPE·time   −0.02302    <0.0001              0.977    0.977         0.978
       C·time      −0.00003    <0.0001              1.000    1.000         1.000
II     TYPE         0.17846    <0.0001              1.195    1.117         1.279
       DL           0.34444    0.0183               1.411    1.060         1.879
       L            1.70421    <0.0001              5.497    4.609         6.556
       C            1.25702    <0.0001              3.515    2.309         5.351
       DL·C        −0.531      0.0153               0.588    0.382         0.903
       C·time      −1.6262     <0.0001              0.197    0.177         0.218
III    SR           0.25651    0.3334               1.292    0.769         2.173
       SF          −0.06004    0.8321               0.942    0.541         1.640
       L            0.85493    <0.0001              2.351    1.934         2.859
       C           −0.16995    0.0679               0.844    0.703         1.013
IV     L            0.72912    <0.0001              2.073    1.612         2.666
V      SR          −0.87835    0.0606               0.415    0.166         1.040
       SF          −1.01847    0.0408               0.361    0.136         0.958
       DL           0.64732    0.0572               1.910    0.980         3.723
       L            0.32466    0.0457               1.384    1.006         1.902
VI     L            0.44147    0.0340               1.555    1.034         2.339
VII    L            0.66856    0.0097               1.951    1.176         3.239
The baseline survival function of STG I is obtained by transforming Eq. 4 into a
survival function, as follows:
$$\hat{S}_0(t) = \exp\left[-\exp\left(\frac{28.867}{1+\exp(-2.613\ln t + 14.87)} - 23.172\right)\right] \tag{5}$$
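The transformation from Eq. 4 to Eq. 5 simply inverts the log–log transform, S0(t) = exp(−exp(LLS(t))). The sketch below evaluates the STG I baseline survival at a few times, assuming t is in months as in Fig. 6; the function names are illustrative.

```python
import math


def lls_stg1(t: float) -> float:
    """Eq. 4: logistic fit of LLS(t) = ln(-ln S0(t)) against ln t (t in months)."""
    return 28.867 / (1.0 + math.exp(-2.613 * math.log(t) + 14.87)) - 23.172


def s0_stg1(t: float) -> float:
    """Eq. 5: baseline survival recovered by inverting the log-log transform."""
    return math.exp(-math.exp(lls_stg1(t)))


for months in (12, 60, 120, 240, 480):
    print(months, round(s0_stg1(months), 3))
```

Because the logistic LLS curve increases with ln t, the resulting baseline survival decreases monotonically, as a survival function must.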
Fig. 6 Plot of ln (month) and
LLS for STG I
The baseline hazard function of STG I was obtained by fitting a LOESS regression model, a
non-parametric regression model, to the estimated baseline hazard rates, which were
calculated as the differences in the cumulative baseline hazard between successive failure
times. The LOESS model for STG I, fitted with local polynomials of degree 2 and a smoothing
parameter of 0.25, is shown in Fig. 7.
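The raw baseline hazard rates to which the LOESS curve is fitted can be sketched as follows, assuming baseline survival estimates at the ordered failure times (such as those from the BASELINE statement) are already available. The cumulative baseline hazard is taken as −ln Ŝ0(t); dividing each increment by the length of its time step, so that the rate is per month, is an assumption of this sketch. The LOESS smoothing itself is not shown, and the function name is hypothetical.

```python
import math
from typing import List, Tuple


def hazard_increments(times: List[float], surv: List[float]) -> List[Tuple[float, float]]:
    """Per-month baseline hazard rates from baseline survival estimates at failure times.

    Returns (midpoint time, rate) pairs with rate = [H0(t_j) - H0(t_{j-1})] / (t_j - t_{j-1}),
    where H0(t) = -ln S0(t) is the cumulative baseline hazard.
    """
    cum = [-math.log(s) for s in surv]
    rates = []
    for j in range(1, len(times)):
        dt = times[j] - times[j - 1]
        rates.append(((times[j] + times[j - 1]) / 2.0, (cum[j] - cum[j - 1]) / dt))
    return rates


# Synthetic check: a constant hazard of 0.01 breaks per month gives S0(t) = exp(-0.01 t).
times = [10.0, 20.0, 40.0]
print(hazard_increments(times, [math.exp(-0.01 * t) for t in times]))
```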
In Fig. 7 the unit of the hazard rate is the number of (the first) breaks per
month. The survival function of STG I is obtained as Eq. 6 using the covariates and
corresponding coefficients shown in Table 3.
$$\hat{S}_i(t) = \left\{\exp\left[-\exp\left(\frac{28.867}{1+\exp(-2.613\ln t + 14.87)} - 23.172\right)\right]\right\}^{\exp(1.342\,TYPE + 0.5447\,DL + 0.0005\,L + 0.0197\,C + 0.00026\,DL\cdot L - 0.0128\,TYPE\cdot C - 0.0230\,TYPE\cdot t - 0.00003\,C\cdot t)} \tag{6}$$
The plot of LLS against the log of time (in months) for STG II shows a linear relationship
(Fig. 8), which is represented by Eq. 7.
$$\mathrm{LLS}(t) = 1.272\ln t - 7.187 \tag{7}$$
Therefore, the corresponding baseline survival function is obtained as Eq. 8, which is a
Weibull-type survival function.
$$\hat{S}_0(t) = \exp\left[-\exp(-7.187)\,t^{1.272}\right] \tag{8}$$
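The step from a linear LLS plot to a Weibull-type baseline survival function can be illustrated with ordinary least squares. In this sketch the (ln t, LLS) points are synthetic, generated from the STG II coefficients of Eq. 7, so the fit simply recovers the slope 1.272 and intercept −7.187; with real data the points would come from the baseline survival estimates.

```python
import math
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Synthetic LLS values on ln(month), generated from Eq. 7.
months = [6, 12, 24, 48, 96, 192]
xs = [math.log(t) for t in months]
ys = [1.272 * x - 7.187 for x in xs]
shape, intercept = fit_line(xs, ys)


def weibull_s0(t: float) -> float:
    # LLS = shape * ln t + intercept  <=>  S0(t) = exp(-exp(intercept) * t**shape)
    return math.exp(-math.exp(intercept) * t ** shape)


print(round(shape, 3), round(intercept, 3), round(weibull_s0(100.0), 3))
```

The slope of the LLS line is the Weibull shape parameter and the exponential of its intercept is the Weibull scale factor, which is why a straight line on this plot signals Weibull-distributed break intervals.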
After the differences in the cumulative baseline hazard between successive failure times
are calculated, the baseline hazard function of STG II is also obtained as a LOESS
regression model. The LOESS model for STG II, fitted with local polynomials of degree 2 and
a smoothing parameter of 0.5, is shown in Fig. 9.
Fig. 7 Graph of the estimated
baseline hazard function for
Model I
Fig. 8 Plot of ln time and LLS for STG II (axes: ln(month), LLS)
LOESS regression models were used for the baseline hazard functions of STG I and II because
they captured the general trends of these baseline hazards better than parametric
regression models did.
In Fig. 9 the unit of the hazard rate is the number of (the second) breaks per
month. The survival function of STG II is obtained as Eq. 9 using the covariates and
corresponding coefficients shown in Table 3.
$$\hat{S}_i(t) = \left\{\exp\left[-\exp(-7.187)\,t^{1.272}\right]\right\}^{\exp(0.603\,SR_i + 1.834\,SF_i + 0.582\,DL_i + 0.963\,L_i + 0.218\,C_i)} \tag{9}$$
Linear functions were also found to be appropriate for the LLS of STG III, IV, V, VI and
VII, which means that the corresponding failure times also follow a Weibull distribution.
However, unlike for STG I and II, the baseline hazard rates for STG III, IV, V, VI and VII
are better represented by parametric curve fitting than by LOESS models, and quadratic
functions are used for this purpose.
Fig. 9 Graph of the estimated
baseline hazard function for
Model II
Equations 10 to 16 represent the estimated baseline survival functions of STG I,
II, III, IV, V, VI and VII, respectively.
$$\hat{S}_0(t) = \exp\left[-\exp\left(\frac{28.867}{1+e^{-2.613\ln t + 14.87}} - 23.172\right)\right] \tag{10}$$
$$\hat{S}_0(t) = \exp\left[-\exp(-7.187)\,t^{1.272}\right] \tag{11}$$
$$\hat{S}_0(t) = \exp\left[-\exp(-3.361)\,t^{0.600}\right] \tag{12}$$
$$\hat{S}_0(t) = \exp\left[-\exp(-2.927)\,t^{0.609}\right] \tag{13}$$
$$\hat{S}_0(t) = \exp\left[-\exp(-2.417)\,t^{0.548}\right] \tag{14}$$
$$\hat{S}_0(t) = \exp\left[-\exp(-2.642)\,t^{0.627}\right] \tag{15}$$
$$\hat{S}_0(t) = \exp\left[-\exp(-2.072)\,t^{0.551}\right] \tag{16}$$
Equations 17 to 21 represent the estimated baseline hazard functions of STG III, IV, V, VI
and VII, respectively.
$$h_0(t) = 3.336\times10^{-7}\,t^2 - 6.990\times10^{-5}\,t + 0.0089 \tag{17}$$
$$h_0(t) = 7.632\times10^{-7}\,t^2 - 9.865\times10^{-5}\,t + 0.0173 \tag{18}$$
$$h_0(t) = 7.099\times10^{-6}\,t^2 - 9.161\times10^{-4}\,t + 0.0359 \tag{19}$$
$$h_0(t) = 5.175\times10^{-6}\,t^2 - 3.546\times10^{-4}\,t + 0.0276 \tag{20}$$
$$h_0(t) = 8.408\times10^{-6}\,t^2 - 4.972\times10^{-4}\,t + 0.0428 \tag{21}$$
A survival or hazard function can be constructed for a specific pipe by combining the
estimated baseline functions with the exponential covariate function evaluated at the
characteristics of the pipe of interest: the baseline hazard function is multiplied by the
exponential covariate function, and the baseline survival function is raised to its power
(Eq. 3).
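As an illustration, the sketch below builds the hazard and survival functions of one hypothetical pipe in STG III from Eq. 12, Eq. 17 and the Model III coefficients in Table 3. The pipe's covariate values (SR = 1, SF = 0, L = 1, C = 0) are made up, and time is in months since the second break. Because Eq. 12 and Eq. 17 are separate fits, they are evaluated here exactly as given rather than derived from one another.

```python
import math
from typing import Dict

# Model III coefficients from Table 3.
COEF_III = {"SR": 0.25651, "SF": -0.06004, "L": 0.85493, "C": -0.16995}


def risk_multiplier(x: Dict[str, int]) -> float:
    """exp(beta'x) for the covariates of one pipe."""
    return math.exp(sum(COEF_III[name] * value for name, value in x.items()))


def h0_stg3(t: float) -> float:
    """Eq. 17: quadratic baseline hazard (third breaks per month)."""
    return 3.336e-7 * t ** 2 - 6.990e-5 * t + 0.0089


def s0_stg3(t: float) -> float:
    """Eq. 12: Weibull-type baseline survival."""
    return math.exp(-math.exp(-3.361) * t ** 0.600)


pipe = {"SR": 1, "SF": 0, "L": 1, "C": 0}  # hypothetical pipe
m = risk_multiplier(pipe)  # exp(0.25651 + 0.85493), about 3.04
for t in (12, 36, 120):
    hazard = h0_stg3(t) * m  # hazard: baseline multiplied by exp(beta'x)
    survival = s0_stg3(t) ** m  # survival: baseline raised to exp(beta'x)
    print(t, round(hazard, 5), round(survival, 3))
```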
Figures 10 and 11 show the baseline survival functions for STG I to VII and
baseline hazard functions for STG III to VII, respectively.
In Fig. 11 the unit of the hazard rate is the number of breaks of the corresponding order
per month for each model. The baseline hazard rates for STG III to VII shown in Fig. 11 are
relatively high during the very early period, decrease immediately afterwards, and begin to
increase after some time. Therefore, the baseline hazards of the third through seventh
breaks were found to follow forms similar to a ‘bath-tub’.
Fig. 10 Graphs of the
estimated baseline survival
functions
The ‘bath-tub’ curve is a commonly used term in the reliability engineering field to
explain the generally high failure rates observed during the early and later periods in
an item’s lifetime.
As shown in Fig. 11, for a given time the baseline hazard rates of pipes with fewer breaks
are generally lower than those of pipes with more breaks, and the baseline hazard rates of
pipes with more breaks begin to rise sooner.
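Because Eqs. 17 to 21 are quadratics of the form a t² − b t + c with a, b > 0, the time at which each baseline hazard stops decreasing and starts to rise is t* = b / (2a). The short calculation below only re-uses the published coefficients; it puts these turning points at roughly 105, 65, 65, 34 and 30 months for STG III to VII, broadly consistent with the observation that the hazards of pipes with more breaks begin to rise sooner.

```python
from typing import Tuple

# (a, b, c) in h0(t) = a*t**2 - b*t + c, from Eqs. 17-21 (t in months).
BASELINE_HAZARD = {
    "III": (3.336e-7, 6.990e-5, 0.0089),
    "IV": (7.632e-7, 9.865e-5, 0.0173),
    "V": (7.099e-6, 9.161e-4, 0.0359),
    "VI": (5.175e-6, 3.546e-4, 0.0276),
    "VII": (8.408e-6, 4.972e-4, 0.0428),
}


def trough(a: float, b: float, c: float) -> Tuple[float, float]:
    """Vertex of the parabola: time (months) and value of the minimum baseline hazard."""
    t_min = b / (2.0 * a)
    return t_min, c - b * b / (4.0 * a)


for stg, coefs in BASELINE_HAZARD.items():
    t_min, h_min = trough(*coefs)
    print(f"STG {stg}: trough at {t_min:.1f} months, h0 = {h_min:.4f} breaks/month")
```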
The decreasing hazard rates immediately after the third and subsequent breaks are
conjectured to result from additional breaks caused by imperfect repairs. Since a break due
to an imperfect repair tends to occur within a relatively short period after the previous
break, the hazard rates would be high during the early period after a break and would then
decrease as fewer such breaks occur. After a period of low hazard rates, the break hazard
increases due to the ageing of the pipes.
The baseline median survival times, corresponding to a survival probability of 0.5, are
estimated to be 505, 212, 147, 66, 42, 38 and 22 months for STG I to VII, respectively.
Therefore, it is conjectured that the time elapsed between successive failures generally
decreases as more breaks occur. This phenomenon is also confirmed by the baseline hazard
graphs in Fig. 11, which show generally increasing hazard rates at a given time as the
number of breaks increases.
Fig. 11 Graphs of the estimated baseline hazard functions
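For the Weibull-type baseline survival functions of Eqs. 11 to 16, the median follows in closed form: setting exp(−e^a t^b) = 0.5 gives t_med = (ln 2 / e^a)^(1/b). The sketch below applies this to STG II to VII; the results (about 213, 147, 67, 42, 38 and 22 months) agree with the medians reported above to within about a month. STG I, whose baseline survival has the logistic form of Eq. 10, is left out of this sketch.

```python
import math

# (a, b) in S0(t) = exp(-exp(a) * t**b), from Eqs. 11-16 (t in months).
WEIBULL_S0 = {
    "II": (-7.187, 1.272),
    "III": (-3.361, 0.600),
    "IV": (-2.927, 0.609),
    "V": (-2.417, 0.548),
    "VI": (-2.642, 0.627),
    "VII": (-2.072, 0.551),
}


def baseline_median(a: float, b: float) -> float:
    """Solve exp(-exp(a) * t**b) = 0.5 for t."""
    return (math.log(2.0) / math.exp(a)) ** (1.0 / b)


for stg, (a, b) in WEIBULL_S0.items():
    print(f"STG {stg}: median = {baseline_median(a, b):.0f} months")
```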
2.6 Analysis of the Model Residuals
To verify the constructed PHM for each STG, the deviance residuals suggested by Therneau et
al. (1990) were estimated for each model. The deviance residuals, estimated using Eq. 22,
represent the difference between the recorded failure times and the expected failure times
predicted by the PHM.
$$r_{Di} = \operatorname{sgn}(r_{Mi})\left[-2\left\{r_{Mi} + \delta_i \ln(\delta_i - r_{Mi})\right\}\right]^{1/2} \tag{22}$$
In Eq. 22, $r_{Mi}$ is the martingale residual of the ith pipe, $\operatorname{sgn}(r_{Mi})$
takes a value of ‘−1’ if $r_{Mi}$ is negative and ‘+1’ if $r_{Mi}$ is positive, and
$\delta_i$ takes the value ‘0’ if the observed survival time is censored and ‘1’ if the
observed survival time is uncensored.
Larger absolute values of the deviance residuals indicate larger differences between the
recorded failure times and the expected failure times predicted by the PHM. The PHM is
considered to have performed poorly in predicting a failure time if the absolute value of
the deviance residual is greater than 3 (Allison 1996). Figures 12 and 13 show the deviance
residuals of the PHMs for STG I and II, respectively.
STG I and II had 35 and 3 recorded failure times, respectively, for which the absolute
values of the deviance residuals were greater than 3. These numbers represent only a small
fraction of the recorded failure times (about 0.8% of the 4,329 in STG I and 0.3% of the
1,149 in STG II). Furthermore, no other STG had recorded failure times with absolute
deviance residuals greater than 3. Therefore, it was concluded that the constructed PHMs
fit the recorded failure times well.
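The deviance residual check described above can be sketched as follows. The martingale residual is computed in its standard form δ − Ĥ0(t) exp(β'x) at the pipe's observed time; the cumulative baseline hazard and risk score passed in are assumed inputs, and the function names are illustrative.

```python
import math


def martingale_residual(event: int, cum_baseline_hazard: float, risk_score: float) -> float:
    """r_M = delta - H0(t) * exp(beta'x), evaluated at the pipe's observed time."""
    return event - cum_baseline_hazard * math.exp(risk_score)


def deviance_residual(r_m: float, event: int) -> float:
    """Eq. 22: r_D = sgn(r_M) * sqrt(-2 * [r_M + delta * ln(delta - r_M)])."""
    log_term = event * math.log(event - r_m) if event else 0.0  # 0 * ln(.) taken as 0
    inner = max(0.0, -2.0 * (r_m + log_term))  # guard against tiny negative round-off
    return math.copysign(math.sqrt(inner), r_m)


r_m = martingale_residual(1, 0.5, 0.0)  # observed break, H0(t) * exp(beta'x) = 0.5
r_d = deviance_residual(r_m, 1)
poorly_predicted = abs(r_d) > 3  # threshold from Allison (1996)
print(round(r_d, 4), poorly_predicted)
print(deviance_residual(-2.0, 0))  # censored pipe with H0(t) * exp(beta'x) = 2
```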
Fig. 12 Deviance residuals of
Model I
Fig. 13 Deviance residuals of
Model II
3 Analysis of the Constructed Models Using the Estimated Regression Coefficients and the Hazard Ratios
As shown in Table 3, time-dependent covariate effects on the hazard were found for Models I
and II. In Model I, the 95% confidence interval for the hazard ratio of TYPE does not
include ‘1’. Considering the effect of TYPE alone in Model I, the hazard of SR was 3.8 times
that of Pit-CI, and the hazard of SF was 3.8 times that of SR. However, the actual relative
hazards of TYPE change over time because the effect of this covariate is time-dependent. As
a result, the hazard of failure for Pit-CI was the lowest during the early period after
installation, but from around 5 years after installation the hazards of failure for SR and
SF became lower than that of Pit-CI, giving the order SF, SR, Pit-CI in terms of increasing
hazard.
One possible explanation for this crossover is that manufacturing defects or poor installation raised the hazards of failure for SR and SF above that of Pit-CI immediately after installation. Once these manufacturing and installation problems had manifested themselves during the first 5 years or so, the hazard of Pit-CI became greater than those of SR and SF.
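The crossover follows from letting the TYPE coefficient vary with time. The sketch below makes two assumptions. First, the time-dependent effect enters the linear predictor as (β + γt)·TYPE, with t in years. Second, TYPE is coded ordinally as Pit-CI = 0, SR = 1, SF = 2; the equal 3.8-fold steps suggest this coding, but the text does not state it for Model I. β = ln 3.8 reproduces the quoted hazard ratio at t = 0. γ is not given in this section, so the value used here is hypothetical, chosen only to place the crossover at about 5 years. The functional form actually reported in Table 3 may differ (for example, ln t instead of t).

```python
import math


def type_hazard_ratio(beta: float, gamma: float, t: float, steps: int = 1) -> float:
    """Hazard ratio for a `steps`-unit increase in TYPE at time t (years),
    assuming the time-dependent effect enters as (beta + gamma * t) * TYPE."""
    return math.exp(steps * (beta + gamma * t))


def crossover_time(beta: float, gamma: float) -> float:
    """Time at which the hazard ratio passes through 1, i.e. beta + gamma * t = 0."""
    return -beta / gamma


beta = math.log(3.8)          # hazard ratio of 3.8 per TYPE step at t = 0 (from the text)
gamma = -math.log(3.8) / 5.0  # HYPOTHETICAL time-interaction coefficient

for t in (0.0, 2.5, 5.0, 10.0):
    print(f"t = {t:4.1f} yr  HR(SR vs Pit-CI) = {type_hazard_ratio(beta, gamma, t):.2f}  "
          f"HR(SF vs Pit-CI) = {type_hazard_ratio(beta, gamma, t, steps=2):.2f}")
```

Under these assumptions, the hazard ratio per TYPE step is about 1.95 at 2.5 years, exactly 1 at the crossover time −β/γ, and about 0.26 at 10 years. This reproduces the reversal of the ordering.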
Similarly, based on the estimated hazard ratio for the degree of land development (DL), the failure hazard for STG I in urban areas was about 1.7 times as high as that in non-urban areas. Considered alone, the number of customers (C) in a grid increased the hazard for STG I about sevenfold for every additional 100 customers in the grid.
Considered alone, pipe length (L) increased the hazard of the first failure about 1.6 times for every 1,000 m increase in length. Therefore, although the coefficient for L was very close to ‘0’, the effect of length should not be ignored for very long pipes. In an urban area, taking into account the interaction between L and DL, the hazard of the first failure would be expected to increase 2.1 times for every 1,000 m increase in pipe length.
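Hazard ratios quoted per 100 customers or per 1,000 m hide how small the underlying coefficients are. The sketch below back-calculates per-unit coefficients from the rounded hazard ratios in the text, using β = ln(HR)/Δ. It assumes that C is measured in customers, L in metres, DL is coded 1 for urban and 0 for non-urban, and the effects are linear in the log-hazard of the PHM. The function names are hypothetical.

```python
import math


def coefficient_per_unit(hazard_ratio: float, delta: float) -> float:
    """Regression coefficient per unit change implied by a hazard ratio
    reported for a change of `delta` units: beta = ln(HR) / delta."""
    return math.log(hazard_ratio) / delta


def hazard_ratio_for_change(beta: float, delta: float) -> float:
    """Hazard ratio for a change of `delta` units: HR = exp(beta * delta)."""
    return math.exp(beta * delta)


# Rounded hazard ratios quoted in the text for STG I
beta_C = coefficient_per_unit(7.0, 100.0)    # per customer in a grid (assumed unit)
beta_L = coefficient_per_unit(1.6, 1000.0)   # per metre of pipe length (assumed unit)
# Urban pipes (DL = 1, an assumed coding): length effect including the L x DL interaction
beta_L_urban = coefficient_per_unit(2.1, 1000.0)
beta_L_DL = beta_L_urban - beta_L            # implied interaction coefficient

print(f"beta_C = {beta_C:.4f}, beta_L = {beta_L:.6f}, beta_LxDL = {beta_L_DL:.6f}")
print(f"HR for +3,000 m (non-urban): {hazard_ratio_for_change(beta_L, 3000.0):.2f}")
```

The per-metre coefficient is below 0.0005, which is why it looks negligible. Yet a 3,000 m difference in length multiplies the hazard by 1.6³ ≈ 4.1.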
Because of the interaction between TYPE and C in STG I, the hazard of failure for a given material type increases 3.6 times for every decrease of 100 customers. For the same number of customers, SF has 0.3 and 0.08 times the hazard of SR and Pit-CI, respectively. Considering the interaction of the number of customers with time, pipes with a larger number of customers have lower hazards of failure at a given time, and the effect of the number of customers was found to decrease over time for the pipes in STG I.
In Model II, TYPE, whose hazard ratio was estimated to be 1.2, is coded as 0, 1 and 2 for SR, Pit-CI and SF, respectively. Therefore, the hazard of failure increases in the order SR, Pit-CI, SF. The hazard ratio of DL was similar to that in Model I, but the hazard ratios of L and C were estimated to be ‘5.5’ and ‘3.5’, respectively. Therefore, for the second break, the hazard of failure was more sensitive to changes in L and C than for the first break. In other words, hazard rates differed more between pipes of different lengths and numbers of customers in STG II than in STG I. The interaction between DL and C diminishes the effect of C on the hazard of failure. As in Model I, the effect of C in Model II was time-dependent, but these time-dependent effects were larger in Model II.
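Because TYPE enters Model II as a single ordinal covariate, one hazard ratio fixes the whole ordering. Each step up the coding multiplies the hazard by 1.2, so SF, two steps above SR, has 1.2² = 1.44 times the hazard of SR. The sketch below applies this rule. It considers the TYPE main effect alone and ignores the other covariates, interactions and time-dependent terms of the model. The function name is hypothetical.

```python
# TYPE coding for Model II as stated in the text
TYPE_CODES_MODEL_II = {"SR": 0, "Pit-CI": 1, "SF": 2}


def relative_hazard(code: int, reference_code: int, hr_per_step: float) -> float:
    """Hazard of one TYPE level relative to another when TYPE enters the PHM as a
    single ordinal covariate: HR ** (code - reference_code)."""
    return hr_per_step ** (code - reference_code)


for material, code in TYPE_CODES_MODEL_II.items():
    rel = relative_hazard(code, TYPE_CODES_MODEL_II["SR"], 1.2)
    print(f"{material:7s} relative to SR: {rel:.2f}")
```

For a particular pipe, the other terms in Model II would modify these relative values.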
As the number of breaks increased, the number of covariates in the models generally decreased, so that only length was statistically significant in explaining the hazards of the sixth and seventh failures. Furthermore, the estimated effect of length generally diminished as more breaks occurred. A possible explanation is that, when the number of breaks is small, the hazard rate is higher for longer pipes because a break can occur over a greater physical extent. As a pipe experiences more breaks, the effects of the time-dependent ageing process may become greater than the effect of length.
The internal pipe pressure (PT) was not selected as an important covariate in any of the models, because the internal pressure data used in the analyses were defined for a grid containing many pipes rather than for individual pipes. Furthermore, these data were not based on accurate measurements or precise simulations but were defined as expected ranges of pressure. More accurate data, such as transient pressure records, would likely be needed to assess the effect of internal pressure on pipe failure precisely. The effect of the number of customers was evident only for STG I and STG II, because the models for the other STGs either do not include this covariate or have a 95% confidence interval for it that includes ‘1’.
4 Summary and Conclusions
In this paper, PHMs for the times between consecutive pipe breaks were constructed using case study water main break data. The individual 150 mm cast iron pipes were categorized into seven ordered STGs according to the total number of breaks, and a distinct PHM was constructed for each STG. The PHMs were constructed by determining the covariates to be included in each model and by estimating the regression coefficient of each covariate. The baseline survival functions and the corresponding baseline hazard functions were then constructed using the estimated baseline survival probabilities at the recorded failure times.
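This section does not state which estimator produced the baseline survival probabilities. The sketch below uses one common choice, the Breslow estimator: $\hat{H}_0(t) = \sum_{t_j \le t} d_j / \sum_{k \in R(t_j)} \exp(\boldsymbol{\beta}^{\mathsf T}\mathbf{x}_k)$ and $\hat{S}_0(t) = \exp\{-\hat{H}_0(t)\}$. Here $d_j$ is the number of breaks at failure time $t_j$, and $R(t_j)$ is the set of pipes still at risk just before $t_j$. The sketch assumes time-fixed risk scores, so the time-dependent terms of Models I and II are not handled. The function name and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def breslow_baseline_survival(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> List[Tuple[float, float]]:
    """Breslow estimate of the baseline survival S0(t) at each distinct failure time.

    times       -- observed time between breaks (failure or censoring) for each pipe
    events      -- 1 if the time ended in a recorded break, 0 if it was censored
    risk_scores -- exp(beta' x_i) for each pipe, taken from the fitted PHM
    """
    failure_times = sorted({t for t, d in zip(times, events) if d == 1})
    cum_hazard = 0.0
    result = []
    for tj in failure_times:
        # Number of breaks recorded at this failure time
        d_j = sum(1 for t, d in zip(times, events) if t == tj and d == 1)
        # Risk set: pipes still under observation just before tj
        risk_sum = sum(r for t, r in zip(times, risk_scores) if t >= tj)
        cum_hazard += d_j / risk_sum
        result.append((tj, math.exp(-cum_hazard)))
    return result


# Made-up data: five pipes, all risk scores set to 1 for simplicity
curve = breslow_baseline_survival([1, 2, 2, 3, 4], [1, 1, 0, 1, 0], [1.0] * 5)
for t, s in curve:
    print(t, round(s, 3))
```

With all risk scores equal to 1, as in the example, the estimator reduces to the Nelson–Aalen estimator of the cumulative hazard.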
While the PHMs were being constructed, the proportional hazards assumption was examined for each modeled covariate using standardized score residuals. As a result, STG I and II were found to have two covariates and one covariate, respectively, with time-dependent effects on the hazard. For STG I, these were the pipe material/joint type and the number of customers. For STG II, only the number of customers was modeled as having a time-dependent effect on the hazard. The constructed PHMs were verified by analyzing the deviance residuals of each model.
Analysis of the baseline hazard rates of the STGs showed that the hazard of a break follows a form similar to a ‘bath-tub’ over the time between consecutive breaks. The baseline hazard rates also suggested that the general condition of the pipes deteriorates further as more breaks occur. The decreasing median survival times of the baseline survival functions of the STGs reflected the same trend.
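Both quantities in this paragraph can be read from an estimated baseline survival curve. The sketch below takes the median survival time as the first time at which the step-function estimate falls to 0.5 or below. The section does not describe how hazard rates were derived from the survival estimates, so the sketch assumes one simple possibility: the average hazard on each interval between estimates, $h_j = [\ln S_0(t_{j-1}) - \ln S_0(t_j)]/(t_j - t_{j-1})$. Plotting these rates against time is one way to see a bath-tub shape. The function names and the curve are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Curve = Sequence[Tuple[float, float]]   # (time, S0(time)) pairs, sorted by time


def median_survival_time(curve: Curve) -> Optional[float]:
    """Smallest time at which the step-function estimate of S0 is 0.5 or below;
    None if the estimate never falls that low."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


def interval_hazard_rates(curve: Curve) -> List[Tuple[float, float]]:
    """Average hazard rate on each interval ending at t_j:
    h_j = [ln S0(t_{j-1}) - ln S0(t_j)] / (t_j - t_{j-1}), starting from S0(0) = 1.
    Assumes strictly increasing positive times and strictly positive S0 values."""
    rates = []
    prev_t, prev_s = 0.0, 1.0
    for t, s in curve:
        rates.append((t, (math.log(prev_s) - math.log(s)) / (t - prev_t)))
        prev_t, prev_s = t, s
    return rates


# Made-up baseline survival estimates, for illustration only
curve = [(1.0, 0.9), (2.0, 0.6), (3.0, 0.4), (4.0, 0.3)]
print(median_survival_time(curve))
print([(t, round(h, 3)) for t, h in interval_hazard_rates(curve)])
```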
The estimated regression coefficients and hazard ratios of the selected covariates for each PHM were used to analyze the variations in the factors and their effects, including their time-dependent effects on pipe failures. The changes in the relative hazards of the covariates as the number of breaks increased were also analyzed.
The PHMs constructed in this paper may be applicable only to the case study water pipes, since each PHM is built from system-specific pipe failure data and information on the operation and environment of the pipes. However, the procedures presented in this paper for constructing and validating the PHMs may be applied to construct PHMs for water mains in other systems.
The PHMs developed in this paper are considered more sophisticated than the models in Park et al. (2008), which can show only the break pattern of a pipe over time. In contrast, the PHMs developed in this paper provide more information on pipe breaks than other models such as the ROCOFs in Park et al. (2008). This information includes the main effects of the failure factors and how these effects change over time, the changes in the general condition of the pipes as more breaks occur, and the survival probabilities of the pipes for each order of break. None of this information is available from the models of Park et al. (2008).
Because the methodology developed in this paper is based on PHMs for consecutive pipe breaks, utilities intending to use it must maintain continuously monitored records of pipe breaks in their systems. Actual implementation of the methods may also depend on data other than the break records themselves, such as soil, traffic and pressure data related to pipe breaks. Once all the relevant data are available, utilities could apply the proposed methods to their systems using the SAS package (SAS 2009) and a high-level programming language such as that of MATLAB.
Managers of water utilities could benefit from PHMs constructed for their systems by identifying the factors affecting pipe breaks and their relative effects. For example, knowing these factors and their relative effects would help managers select more appropriate pipe materials for their systems. If more detailed data on the operation and maintenance of their pipes, such as pressure, traffic volume and soil characteristics, are available, managers may be able to allocate funds more efficiently so that the hazards of pipe failure are reduced.
In addition, the estimated survival functions of a pipe are expected to indicate the general condition of a pipe of interest, assuming that subsequent breaks occur. By analyzing the shapes of these estimated survival functions, one can roughly predict the break number after which breaks become relatively frequent. For example, based on the estimated baseline survival functions shown in Fig. 10, one can roughly assume that the pipes in the case study system start to break frequently from the fourth break onwards.
This assumption is based on the observation that the estimated baseline survival functions of STG IV, V, VI and VII are notably steeper than those of STG I, II and III. A similar analysis can be conducted for an individual pipe using the estimated baseline survival functions and the covariate values of the pipe of interest. Knowing the break number at which a pipe enters a fast-breaking stage could help water utility managers decide whether to keep repairing or to replace the pipe.
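For a pipe of interest, the baseline curve of each STG can be adjusted with the pipe's covariates. With time-fixed covariate effects, the PHM implies $S(t \mid \mathbf{x}) = S_0(t)^{\exp(\boldsymbol{\beta}^{\mathsf T}\mathbf{x})}$. The time-dependent terms in Models I and II would instead require integrating the hazard over time, which this sketch omits. All curves, coefficients and covariate values below are hypothetical and are not estimates from the case study. The function names are also illustrative.

```python
import math
from typing import List, Sequence, Tuple

Curve = List[Tuple[float, float]]   # (time, survival probability), sorted by time


def pipe_survival_curve(baseline: Curve, coefs: Sequence[float], x: Sequence[float]) -> Curve:
    """S(t | x) = S0(t) ** exp(beta' x); valid only for time-fixed covariate effects."""
    risk = math.exp(sum(b * xi for b, xi in zip(coefs, x)))
    return [(t, s ** risk) for t, s in baseline]


def survival_at(curve: Curve, t: float) -> float:
    """Step-function lookup of S at time t (1.0 before the first estimated time)."""
    s = 1.0
    for tj, sj in curve:
        if tj > t:
            break
        s = sj
    return s


# HYPOTHETICAL baseline curves and coefficients for two break orders (illustration only)
models = {
    "STG I": ([(2.0, 0.95), (5.0, 0.85), (10.0, 0.70)], [0.5, 0.0005]),
    "STG IV": ([(1.0, 0.80), (2.0, 0.55), (4.0, 0.30)], [0.2, 0.0002]),
}
pipe_x = [1.0, 800.0]   # hypothetical covariates: urban indicator, length in m

for stg, (baseline, coefs) in models.items():
    curve = pipe_survival_curve(baseline, coefs, pipe_x)
    print(f"{stg}: S(3 yr | x) = {survival_at(curve, 3.0):.2f}")
```

Comparing such pipe-specific survival values across break orders, much as the baseline curves in Fig. 10 are compared, could suggest at which break a given pipe enters a fast-breaking stage.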
Acknowledgement This work was supported for two years by a Pusan National University Research Grant.
References
Allison PD (1996) Survival analysis using SAS: a practical guide. SAS Institute Inc., Cary, NC, pp 173–174
Andreou SA, Marks DH, Clark RM (1987) A new methodology for modeling break failure patterns in deteriorating water distribution systems: applications. Adv Water Resour 10:11–20
Brémond B (1997) Statistical modeling as help in network renewal decision. European Commission Co-operation on Science and Technology (COST), Committee C3—diagnostics of urban infrastructure, Paris, France
Cox DR (1972) Regression models and life tables. J R Stat Soc Ser B 34:187–220
Eisenbeis P (1994) Modélisation statistique de la prévision des défaillances sur les conduites d’eau potable. PhD thesis, University Louis Pasteur of Strasbourg, collection Etudes Cemagref No. 17, France
Klein JP, Moeschberger ML (2003) Survival analysis: techniques for censored and truncated data. Springer, New York, pp 374–381
Kleiner Y, Rajani B (2001) Comprehensive review of structural deterioration of water mains: statistical models. Urban Water 3:131–150
Lei J (1997) Statistical approach for describing lifetimes of water mains—Case Trondheim Municipality. SINTEF Civil and Environmental Engineering, Report No. 22F007.28, Trondheim, Norway
Marks DH et al (1985) Predicting urban water distribution maintenance strategies: a case study of New Haven Connecticut. US Environmental Protection Agency (cooperative agreement R8 1 0558-01–0)
Marks HD, Andreou S, Jeffrey L, Park C, Zaslavski A (1987) Statistical models for water main failures. US Environmental Protection Agency (Cooperative Agreement CR8 1 0558), M.I.T. Office of Sponsored Projects No. 94211, Boston, MA
Park S, Jun H, Kim BJ, Im GC (2008) Modeling of water main failure rates using the log-linear ROCOF and the power law process. Water Resour Manag 22:1311–1324
Rajani B, Kleiner Y (2001) Comprehensive review of structural deterioration of water mains: physically based models. Urban Water 3:151–164
SAS Institute Inc (2009) http://www.sas.com/index.html
Therneau TM, Grambsch PM, Fleming TR (1990) Martingale-based residuals for survival models. Biometrika 77:147–160