METHODOLOGY FOR PIPELINE RELIABILITY ASSESSMENT UNDER
VARYING INCIDENT CRITERIA AND DIFFERENT DATABASES
Tomas Iešmantas, Robertas Alzbutas

Corresponding author information:
Tomas Iešmantas
Phone: +370 37401801
Fax: +370 37351271
Affiliation: Lithuanian Energy Institute
E-mail address: iesmantas@mail.lei.lt

ABSTRACT

Gas transmission pipeline network is of great importance to any country using natural gases
in its various technological processes. However, the usefulness cannot overshadow the threat
posed to people and property by the grid failures. In order to quantify the reliability of the
grid, several widely recognized pipeline incident databases have been established.
Organizations that initiated these databases are: US Department of Transportation Pipeline
and Hazardous Materials Safety Administration Office of Pipeline Safety (further OPS),
European Gas pipeline Incident data Group (EGIG), United Kingdom Onshore Pipeline
Operators' Association (UKOPA), Canadian National Energy Board (NEB). However, each
database contains data about pipelines operated in remote geographical regions with varying
soil types, under different incident data registration, classification or collection criteria. For
longer time period even in single database there is variation of these criteria. Hence, direct
integration of data into one analysis and sample raises suspicions about the validity of
resulting inferences.
We move beyond qualitative comparison of pipeline incident databases and provide a
methodology for the quantitative integration of all available statistical information to improve
the reliability evaluation of gas pipeline networks. We develop a new model, called the
Criteria-Based Poisson model, which takes into account the various incident data collection
criteria, and extend it to the hierarchical case, in which databases with differing incident
registration criteria can be joined in the same analysis. Using real data examples we demonstrate
the usefulness of the method for reliability prediction. The assessment of the Lithuanian pipeline
network failure rate shows the advantages of the hierarchical structuring of the Criteria-Based
Poisson model in small sample problems.

Keywords: Gas pipeline, gas grid, reliability, incident criteria, Bayesian, hierarchical
1. INTRODUCTION
A gas transmission pipeline network is of great importance to any country that uses natural
gas in technological processes or for other purposes. However, this usefulness cannot
overshadow the threat posed to people and property by gas leakage or other incidents in the
system. The importance of recognizing such events has already been expressed in EU Council
Directive 96/82/EC (see [4, 5]). After a gas leak, an ignited jet fire, a flash fire or an
explosion is a possible outcome [1]. In addition to this kind of risk, leakage of a significant
volume could lead to cascading events [2]: loss of pressure and a decrease of the flow rate at the
failed node or along the whole pipeline degrade the reliability of other parts of the network, so
that required or acceptable performance measures are not achieved.
In order to quantify reliability, several widely recognized pipeline incident databases have
been established. The organizations that initiated these databases are the US Department of
Transportation Pipeline and Hazardous Materials Safety Administration Office of Pipeline Safety
(hereafter OPS), the European Gas pipeline Incident data Group (EGIG), the United Kingdom Onshore
Pipeline Operators' Association (UKOPA) and the Canadian National Energy Board (NEB). Even though
these databases are quite extensive (some more than others), their incident data seem never to
have been used together, i.e. the statistical information carried by one database has not been
used to support the information contained in another. Qualitative comparisons such as that of
Papadakis [3] do not fill this gap either. This is understandable, since each database contains
data about pipelines operated in geographically distant regions with varying soil types, under
different incident data registration, classification or collection criteria. Over longer time
periods these criteria vary even within a single database. Hence, direct integration of the data
into one analysis and one sample raises doubts about the validity of the resulting inferences.
The purpose of this paper is to move beyond qualitative comparison of pipeline incident
databases and to provide a methodology, with guidelines, for the quantitative integration of all
available statistical information to improve reliability evaluation. Section 2 reviews four
natural gas transmission pipeline network incident databases. Section 3 develops a new model and
methodology that take into account the various incident data collection criteria, while Section 4
discusses the extension of the model to a hierarchical structure in order to incorporate all
information from the available international databases. Section 5.1 applies the model and
methodology to the OPS database, and Section 5.2 analyses the case of the Lithuanian pipeline
network with the hierarchical extension of the model.
2. A BRIEF REVIEW OF PIPELINE INCIDENT DATABASES
As already explained, several databases reflect the differing experience with gas pipeline
networks in various countries or geographical regions. The most used and cited international
pipeline incident databases are as follows:
1. OPS (data from 1970 to 1990 in [7], from 1991 to 2011 in [6]);
2. EGIG [8];
3. UKOPA [9];
4. NEB.
However, these databases are not identical: they differ in the time periods covered, incident
criteria, geographical location of the pipeline networks, record types, etc. The main differences
and similarities are summarized in Table 1.


Table 1. International pipeline incident databases: similarities and differences

Name (Location) | Incident registration criteria | Database record types
OPS PHMSA (USA) | 1970-1983: damage higher than 5000 $; 1984-2002: damage higher than 50000 $; 2003-2012: damage higher than 50000 $; leaks above 84000 m³. | Incident frequencies and causes; detailed information for each incident as independent study; in all periods fatalities and injuries were recorded, explosions and fires as well.
EGIG (Europe) | All detectable unintentional gas releases. | Number of incidents, causes, distribution by detection methods, pipeline diameter, wall thickness, age, cover type; ignition frequency grouped by hole size and pipe diameter; injuries.
UKOPA (Great Britain) | All detectable unintentional gas releases. | Incidents frequency and causes; leakage volume distribution grouped by detection methods, pipeline diameter, wall thickness, soil type, age, type of cover.
NEB (Canada) | All detectable unintentional gas releases; the death of or serious injury to a person. | Incident frequencies.


As we can see, the data collection criteria differ strongly within the OPS database: the whole
time series is divided into three regions (Figure 1), which cannot be analysed as one sample
without an appropriate model. Incident criteria can be thought of as the basis for screening out
insignificant events or for applying a censoring procedure.


Figure 1 Influence of collection criteria to incidents frequency (OPS database)
It is important to realize that the data in the considered databases do not cover every
country, and using them to analyse gas transmission networks that are not covered by a database
might be questionable. On the other hand, samples from small countries, such as the one from the
Lithuanian gas pipeline network, are not representative enough, and then there is no other choice
but to use the international experience.
For the Lithuanian gas transmission pipeline network, the available data are quite scattered,
because the pipelines are relatively short and because only incidents that resulted in gas
explosions or fires were recorded. The criterion was changed in 2004, and since then all incidents
resulting in gas leakage have been recorded. To obtain a clearer picture and to avoid a
profusion of zero frequencies, we pooled the incident data into 5-year bins (Figure 2).
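
The pooling into 5-year bins can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `pool_into_bins`, the yearly counts and the exposure of 1.8 thousand km per year are hypothetical values chosen only to show the mechanics.

```python
def pool_into_bins(years, counts, exposures, bin_width=5):
    """Pool yearly incident counts and exposures into consecutive bins.

    Returns a list of (first_year, total_count, total_exposure, rate)
    tuples, where rate = total_count / total_exposure.
    """
    if not (len(years) == len(counts) == len(exposures)):
        raise ValueError("years, counts and exposures must have equal length")
    pooled = []
    for start in range(0, len(years), bin_width):
        n = sum(counts[start:start + bin_width])
        e = sum(exposures[start:start + bin_width])
        pooled.append((years[start], n, e, n / e))
    return pooled


# Hypothetical data: ten years with mostly zero counts.
years = list(range(1995, 2005))
counts = [0, 0, 1, 0, 0, 0, 0, 0, 2, 0]
exposures = [1.8] * 10  # assumed exposure per year, in 1000 km * year
bins = pool_into_bins(years, counts, exposures)
```

Pooling keeps the total count and total exposure unchanged, so the binned rates are exposure-weighted averages of the yearly rates; only the time resolution is reduced.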
[Figure 1 plot: failure frequency, 1/(year · 1000 km), versus years (1970-2010), with the changes of collection criteria marked]

Figure 2 Incidents frequency change over time for Lithuanian natural gas transmission network
The current state of the art in pipeline reliability analysis involves either (stochastic or
deterministic) analysis of physical processes (cracking, degradation, etc.) or statistical
investigation of incidents (frequencies, rates, etc.). In the following sections we confine
ourselves to the second option for the following reason: the scientific literature on the
analysis of physical processes is quite well developed, whereas the statistical aspects of
incident data are somewhat underrepresented. The usual approach is to concentrate on a few
characteristics assumed (possibly wrongly) to be the main ones, such as the mean and variance of
incidents, and to conclude from these measures the overall state of pipeline system reliability.
3. NETWORK RELIABILITY ASSESSMENT CONSIDERING VARIATION OF
INCIDENT CRITERIA
We present here a methodology for modelling incident count data when the incident
definition changes over time. Although we consider Poissonian data, the methodology can be
extended to other distributions. Suppose we have a number of incidents \(X\) over some time
period \(\tau\), when the length of the pipeline system is \(L\) and the exposure is
\(E = \tau L\) [time unit · length unit]. To make things clearer, we assume that \(\tau = 1\)
time unit (e.g. one year of incident collection) and drop this variable from the further
notation. We assume that the incident rate at age \(t\) is a function of the age and of the
incident criterion \(C_t\) at that time \(t\), i.e.

\[ \lambda = \lambda(t; C_t), \qquad (1) \]

where \(C_t\) might be a set of different criteria, which we assume to be mutually exclusive, so
that an incident can fall in just one of the categories defined by the set. In addition, let
\(C\) denote the set of all criteria used, so that \(C_t \subseteq C, \ \forall t\). This
assumption is not entirely correct, because in reality an incident could satisfy more than one
criterion, and the model is just an approximation of the real phenomenon. Below we demonstrate
how to construct sets of criteria so as to satisfy the above assumption as closely as possible.
One of the reasons for incorporating data collected under different criteria is to infer the
current state of the network, or to predict its reliability, with the highest certainty possible.
This requires incorporating all available information into the analysis. However, when we
analyse the total data from a database, we cannot observe how many incidents were collected under
one or another criterion: only the pooled sample is available. Consider, for example, the OPS
database: in 1984 one of the criteria was tightened from 5000 $ to 50000 $, but before this change
incidents with damage over 50000 $ were also observed. It is, however, not possible to deduce how
many incidents fell into this category and how many into the category with damage below 50000 $
and above 5000 $. The same holds for the shift of criteria in 2003. Without knowing how many
incidents fell under each category, how can we estimate the overall state of network reliability?
And, even more, how can we predict future incidents?

[Figure 2 plot: failure frequency, 1/(year · 1000 km), versus years (1980-2010), with the change of collection criteria marked]
To answer that, we assume that each data point \(X_t\) is a sum of multinomial random
variables, the numbers of incidents under each criterion out of the overall number of incidents,
which follow a multinomial distribution with parameters \(p\) and \(X_t\), where
\(p = (p_1, \dots, p_K)\) and \(K\) is the cardinality of \(C\) (the set of all criteria). More
formally, if \(X_t^1, \dots, X_t^K\) are the numbers of incidents under each criterion, then the
distribution of the vector of incident counts \((X_t^1, \dots, X_t^K)\), conditional on
\(\sum_{j \in C_t} X_t^j\), is as follows:

\[ (X_t^1, \dots, X_t^K) \,\Big|\, \sum_{j \in C_t} X_t^j = X_t \ \sim\ \mathrm{Multinomial}(p_t, X_t), \qquad (2) \]

where \(p\) is the probability vector of the incident counts. The conditional distribution of
each vector component is then

\[ X_t^j \mid p_j, \lambda(t) \ \sim\ \mathrm{Poisson}\big(p_j E_t \lambda(t)\big); \qquad (3) \]

\(p_j\) is the probability of incidents under the \(j\)-th criterion.
Since the number of incidents \(X_t^j\) that occurred under each criterion is not available to
us, all that can be done is to model the pooled sum \(\sum_{j \in C_t} X_t^j = X_t\) together
with the probability vector \(p\). From the properties of the Poisson distribution (a sum of
independent Poisson variables is again Poisson) we arrive at the Criterion-Dependent-Poisson
(CDP) model

\[ X_t \sim \mathrm{Poisson}\big(E_t \tilde{\lambda}(t)\big), \qquad (4) \]

where \(\tilde{\lambda}(t) = \lambda(t) \sum_{i \in C_t} p_i\) and the summation is over those
probabilities that correspond to the collection of criteria \(C_t\). Hence, when we observe
incidents satisfying all our criteria, the sets \(C_t\) and \(C\) coincide, from which it follows
that \(\tilde{\lambda}(t) = \lambda(t) \sum_{i \in C_t} p_i = \lambda(t)\), while in the case
when we observe only incidents under a subset of \(C\), the incident rate will always be smaller
than \(\lambda(t)\), as expected.
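
A minimal sketch of the CDP rate and log-likelihood is given below, using only the Python standard library. The function names, the three-category probability vector and the counts are hypothetical and serve only to illustrate equation (4); categories are indexed from 0 and assumed mutually exclusive.

```python
import math


def cdp_rate(base_rate, probs, observed):
    """Recorded incident rate: the base rate scaled by the total
    probability of the categories that are being recorded."""
    return base_rate * sum(probs[j] for j in observed)


def poisson_logpmf(count, mean):
    """Log of the Poisson probability mass function (mean must be > 0)."""
    return count * math.log(mean) - mean - math.lgamma(count + 1)


def cdp_loglik(counts, exposures, base_rates, probs, observed_sets):
    """Log-likelihood of pooled yearly counts under the CDP model."""
    total = 0.0
    for x, e, lam, obs in zip(counts, exposures, base_rates, observed_sets):
        total += poisson_logpmf(x, e * cdp_rate(lam, probs, obs))
    return total


# Hypothetical values: three mutually exclusive categories.
probs = [0.6, 0.3, 0.1]
rate_all = cdp_rate(2.0, probs, observed=[0, 1, 2])   # every category recorded
rate_subset = cdp_rate(2.0, probs, observed=[1, 2])   # category 0 screened out
loglik = cdp_loglik([3, 1], [1.0, 1.0], [2.0, 2.0], probs, [[0, 1, 2], [1, 2]])
```

When every category is recorded the scaling factor sums to one and the base rate is recovered; recording only a subset always lowers the rate, which is the behaviour described above.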
We now illustrate the model with two incident criteria. Suppose we have an incident count
time series \(X = (X_1, \dots, X_T)\) and two criteria, i.e. \(C = \{C_1, C_2\}\). For the first
half of the observation time the data were collected according to criterion \(C_1\), while at age
\(T/2 + 1\) the threshold was raised to criterion \(C_2\). In reality these two criteria would
rarely be mutually exclusive. Incidents satisfying \(C_1\) may meet \(C_2\) as well, e.g. when
\(C_1\) is damage higher than 1000 $, while \(C_2\) is damage higher than 1000 $ together with
leakage above 10,000 m³. However, we can instead introduce another criterion
\(C_1' = C_1 \setminus C_2\), which accounts for incidents that satisfy \(C_1\) but not \(C_2\).
This transformation gives a set of mutually exclusive criteria \(C' = \{C_1', C_2\}\), to which
we can assign the probability vector \(p = (p_1, p_2)\).
The probability that an incident in the database falls in category \(C_1 = C_1' \cup C_2\) is
then \(p_1 + p_2 = 1\), while the probability of falling in category \(C_2\) is \(p_2\). Hence,
we have the following CDP model:



\[ X_t \sim \begin{cases} \mathrm{Poisson}\big(E_t \lambda(t)\big), & t = 1, \dots, T/2, \\ \mathrm{Poisson}\big(E_t \lambda(t)\, p_2\big), & t = T/2 + 1, \dots, T. \end{cases} \qquad (5) \]
This model enables the analyst to use all available data in order to evaluate and predict
pipeline system reliability. This leads to more accurate inferences, since there is no need to
discard part of the statistical information. The general framework does not differ much from the
simple case presented above and can be represented schematically as a flow chart (Figure 3).
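
For intuition, the two-criteria model (5) has a closed-form maximum-likelihood fit if one adds the simplifying assumption, not made in the paper, that the base rate is constant over time. The sketch below uses a hypothetical function name and hypothetical counts; the paper itself uses an age-dependent rate and Bayesian estimation.

```python
def fit_two_period_cdp(counts, exposures):
    """Maximum-likelihood fit of the two-period CDP model, assuming a
    constant base rate. Returns (base_rate, p2)."""
    half = len(counts) // 2
    s1, a1 = sum(counts[:half]), sum(exposures[:half])
    s2, a2 = sum(counts[half:]), sum(exposures[half:])
    if s1 == 0:
        raise ValueError("no incidents in the first period; p2 is not estimable")
    rate_full = s1 / a1      # first period: all categories are recorded
    rate_strict = s2 / a2    # second period: only category C_2 is recorded
    p2 = rate_strict / rate_full
    if p2 > 1.0:
        # The constraint p2 <= 1 is active: both periods share one rate.
        return (s1 + s2) / (a1 + a2), 1.0
    return rate_full, p2


# Hypothetical counts: two years under the old criterion, two under the new.
base_rate, p2 = fit_two_period_cdp([10, 10, 4, 6], [1.0, 1.0, 1.0, 1.0])
```

The second-period data inform the base rate only through the product of the rate and \(p_2\), which is why data from both periods are needed to separate the two quantities.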


Figure 3 Methodology of pipeline network reliability model construction under different incident criteria.
4. QUANTIFICATION OF BETWEEN-DATABASE UNCERTAINTY
In this section we tackle the problem of data usage that arises from the differences between
databases. Each database or data sample represents different natural gas pipeline networks,
operated by different operators under different environmental and social conditions. Hence, one
cannot expect the network failure data to be homogeneous. The heterogeneity is especially
pronounced in the failure rates of young pipelines (see Figure 4).

[Figure 3 flowchart, box labels: Form accident criteria set; Set contains mutually exclusive sets? (Yes / No); Form mutually exclusive accident criteria set (redefinition of the set); Transform to the mutually exclusive set; Define vector of probabilities; Form a model]

Figure 4 Dynamics of failure rates as estimated from various databases
This raises a question: would it be appropriate to pool all data into one sample and then
perform the reliability assessment as if we were analysing a "virtual" network that accurately
represents the international situation? That would be one possible (easy) way to go. But without
taking the variability in the data into account we may end up with overly optimistic uncertainty
bounds, not to mention that the estimates would hardly fit the real situation when the data vary
a lot from one source to another.
To deal with this issue, we adopt a hierarchical Bayesian modelling perspective, which can be
presented as in Figure 5.

Figure 5 Graphical representation of hierarchical Bayesian model
The lowest level of the hierarchy represents the data and their stochastic behaviour,
expressed by some parametric distribution. The data might be collected from different pipeline
networks or databases. They are then partially merged at the second level of the hierarchy, where
all samples share part of their information through the within-source parameters
\(\theta_i^1\) and \(\theta_i^2\), called unobservable variables. This information is then used
at the third stage, or hyper-prior stage, to infer the between-source parameters. As shown,
information flows up to the highest level of the hierarchy, and the influence of possible
abnormalities weakens as it moves further from the data.

Our final addition to the methodology of pipeline network reliability analysis is the use of
a hierarchical Bayesian modelling approach for the CDP model (Hierarchical CDP, or HCDP). To do
this, we need to decide which parameters will be treated as hierarchically dependent and which
should be estimated independently of the others. The most general case would be to put
hierarchies on all parameters of the model; however, this might be impossible for the probability
vectors \(p = (p_1, \dots, p_K)\), since different operators or databases may employ totally
different criteria.

[Figure 4 plot: failure rate versus time in years (1970-2010); series: OPS, UKOPA, EGIG, Canada, Lithuania]

The hierarchical construction makes it possible not only to investigate international
pipeline incident databases jointly and to assess the variability of pipeline network reliability
worldwide, but also to investigate gas transmission networks not covered by any of the databases.
If few data samples are available, the hierarchical structure will strengthen the inferences; if
no statistical information is at hand (a newly deployed transmission network), more realistic
uncertainty bounds can be obtained from HCDP than from the information of a single database.
5. APPLICATION TO REAL PIPELINE DATA
5.1. Criterion-Dependent-Poisson model for OPS pipeline network data
In this section we demonstrate the application of the CDP model to the OPS gas transmission
incident database sample. Until 1983, incidents were recorded in OPS if the damage was above
5000 $ or if there were injuries or deaths. The damage criterion was then raised to 50000 $ in
1984, and finally an additional criterion was introduced in 2002: leakage above 84000 m³.
Accidents with deaths and injuries were always recorded, so we can drop them from our set of
criteria, which leaves us with 3 criteria that are not mutually exclusive. Hence, we form the set

\[ C = \{ C_1(>5000\,\$),\ C_2(>50000\,\$),\ C_3(>50000\,\$ \text{ or } >84000\ \mathrm{m}^3) \}. \]

None of the criteria are mutually exclusive, and the following relations hold:

\[ C_2 \subset C_1, \quad C_3 \subset C_1, \quad C_2 \subset C_3; \]

hence a redefinition of \(C\) is needed. However, this is not trivial for \(C_3\), since an
incident in this group might have damage greater than 50000 $ but leakage less than 84000 m³, or
vice versa. It may also happen that an incident falls in both categories. This leads to the
following redefinition of the set of incident criteria:
\[ C' = \left\{ \begin{aligned} & C_1'(>5000\,\$,\ <50000\,\$,\ <84000\ \mathrm{m}^3), \\ & C_2'(>5000\,\$,\ <50000\,\$,\ >84000\ \mathrm{m}^3), \\ & C_3'(>50000\,\$,\ <84000\ \mathrm{m}^3), \\ & C_4'(>50000\,\$,\ >84000\ \mathrm{m}^3) \end{aligned} \right\}, \]

where all criteria are mutually exclusive and the following expressions hold:

\[ \begin{aligned} C_1 &= C_1' \cup C_2' \cup C_3' \cup C_4', \\ C_2 &= C_3' \cup C_4', \\ C_3 &= C_2' \cup C_3' \cup C_4'. \end{aligned} \]


If we denote by \(p' = (p_1', p_2', p_3', p_4')\) the probability vector for \(C'\), then the
probability vector for \(C\) is \(p = (1,\ p_3' + p_4',\ p_2' + p_3' + p_4')\) or, since
\(\sum_k p_k' = 1\), \(p = (1,\ 1 - p_1' - p_2',\ 1 - p_1')\). The last expression of the
probability vector should be used to avoid identification problems. With this, the final
expression of the model is as follows:

\[ X_t \sim \begin{cases} \mathrm{Poisson}\big(E_t \lambda(t)\big), & t = 1, \dots, 14, \\ \mathrm{Poisson}\big(E_t \lambda(t)\,(1 - p_1' - p_2')\big), & t = 15, \dots, 33, \\ \mathrm{Poisson}\big(E_t \lambda(t)\,(1 - p_1')\big), & t = 34, \dots, 42. \end{cases} \]
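
The mapping from the free parameters of the mutually exclusive partition to the three OPS recording regimes can be sketched as follows. The function names are hypothetical; the year index follows the model above (t = 1 corresponds to 1970), and the example values of the two probabilities are arbitrary.

```python
def criteria_probabilities(p1, p2):
    """Map the free parameters (p'_1, p'_2) of the partition C' to the
    probabilities of the original nested criteria (C_1, C_2, C_3)."""
    if p1 < 0 or p2 < 0 or p1 + p2 > 1:
        raise ValueError("p1 and p2 must be non-negative and sum to at most 1")
    return (1.0, 1.0 - p1 - p2, 1.0 - p1)


def ops_rate_multiplier(t, p1, p2):
    """Fraction of the base rate that is recorded in year index t."""
    c1, c2, c3 = criteria_probabilities(p1, p2)
    if 1 <= t <= 14:      # damage above 5000 $
        return c1
    if 15 <= t <= 33:     # damage above 50000 $
        return c2
    if 34 <= t <= 42:     # damage above 50000 $ or leakage above 84000 m^3
        return c3
    raise ValueError("t must be between 1 and 42")


# Arbitrary example values for the two free probabilities.
multipliers = [ops_rate_multiplier(t, 0.5, 0.2) for t in (1, 20, 40)]
```

Using only two free parameters keeps the model identifiable: the pooled counts carry no information that would separate the third and fourth components of the partition.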
To enable Bayesian analysis, we used uniform distributions for the probabilities as an
expression of prior beliefs about the proportion of data in each category. Such a prior can be
interpreted as non-informative, since no value is given priority. Goodness-of-fit checking
procedures indicated that the power law trend \(\lambda(t) = \theta_1 t^{\theta_2}\) gives the
best fit.
We compared replicated values from our model with those from a simple model that accounts
only for incidents that occurred after the last change of criterion in the regulatory documents,
i.e. since 2003. Surprisingly, the 95 % credibility intervals (measures of uncertainty bounds) in
the inference region (see Figure 6) are almost identical in both cases, contrary to the expected
wider intervals for the partial sample. Hence, in this particular OPS case it does not matter
whether all available data (since 1970) are used or just a part of the sample (since 2003) in the
simple model. The differences should become more obvious for even smaller data samples. However,
in the prediction region the advantage of using all available data becomes clear: the smaller
sample leads to rapidly widening credibility bounds, while the CDP model with the full data
sample provides a much less uncertain forecast. This is especially important in the analysis of
long-term reliability.

Figure 6 95 % credibility intervals for replicated predicted data
The probabilities that incidents fall in one category or another (see Figure 7) are a
somewhat redundant result: although they allow efficient inclusion of all available pipeline
reliability data, they could be left without further consideration if no modification of the
incident criterion is planned. However, they might be useful for analysing how a change of
criteria would influence reliability predictions and how it would affect the general risk level
(i.e. the level of the total number of incidents), expressed as the frequency or severity of the
corresponding incidents. It might turn out to be beneficial to go back to some previous criterion.
[Figure 6 plot: number of failures for 1000 km of pipeline versus years (2003-2021), divided into a reliability inference region and a reliability prediction region; series: full data sample, partial sample]

Figure 7 Densities of probabilities of incidents in pipeline network to fall in categories \(C_2\) and \(C_3\)

We conclude this section by stressing that the OPS example supports the validity of the
proposed model and its main advantage: the ability to account for data collected under varying
incident criteria.
5.2. Worldwide pipeline incident experience assessment by hierarchical CDP
We continue by applying the proposed model to the set of available databases or data
samples. As presented at the beginning of the paper, we have samples from OPS, EGIG, UKOPA, NEB
and Lithuania. Hence, there are 5 samples overall, which traditionally would not be analysed
jointly; our methodology resolves this issue. We have already presented the differences between
the data samples: EGIG, UKOPA and NEB can be regarded as very similar data samples, since their
incident criteria are almost identical; OPS used 3 criteria, and the sample from Lithuania is
related to two criteria. Therefore, in order to apply the proposed hierarchical model, we need to
construct a set \(C\) of criteria.
Since 2004, Lithuania has collected information about incidents with gas leakage (denote this
criterion \(C_2\)), so the criterion used before that moment (gas explosion) forms a subset of
\(C_2\). If we denote the first criterion by \(C_1\), we have the relation \(C_1 \subset C_2\).
We transform \(C = (C_1, C_2)\) as follows:

\[ C' = \{ C_1'(\text{Explosion}),\ C_2'(\text{Gas leakage without explosion}) \}. \]

Evidently, the criteria \(C_1'\) and \(C_2'\) are mutually exclusive, and
\(C_2 = C_1' \cup C_2'\), \(C_1 = C_1'\). This leads to the probability vector
\(p = (p_1', 1)\) for \(C\).
It is difficult to relate the probability vectors of different databases through a
hierarchical structure: the number of components differs and the criteria themselves are not
identical. Hence, we apply the hierarchical structure only to the parameters of the trend
\(\lambda(t) = \lambda(t; \theta)\).
The mathematical representation of the hierarchically structured data is as follows.

Level I:

\[ X_t^1 \mid \theta^1 \sim \begin{cases} \mathrm{Poisson}\big(E_t^1 \lambda(t; \theta^1)\big), & t = 1, \dots, 14, \\ \mathrm{Poisson}\big(E_t^1 \lambda(t; \theta^1)\,(1 - p_1' - p_2')\big), & t = 15, \dots, 33, \\ \mathrm{Poisson}\big(E_t^1 \lambda(t; \theta^1)\,(1 - p_1')\big), & t = 34, \dots, 42 \end{cases} \]

for the OPS case;

\[ X_t^2 \mid \theta^2 \sim \begin{cases} \mathrm{Poisson}\big(E_t^2 \lambda(t; \theta^2)\, p_1'\big), & t = 19, \dots, 34, \\ \mathrm{Poisson}\big(E_t^2 \lambda(t; \theta^2)\big), & t = 35, \dots, 44 \end{cases} \]

for the Lithuanian case;

\[ X_t^k \mid \theta^k \sim \mathrm{Poisson}\big(E_t^k \lambda(t; \theta^k)\big), \quad k = 3, \dots, 5 \]

for EGIG, UKOPA and NEB.

Level II:

\[ \theta_1^k \sim \mathrm{LN}(\mu_1, \sigma_1^2), \qquad \theta_2^k \sim \mathrm{N}(\mu_2, \sigma_2^2), \qquad k = 1, \dots, 5. \]

Level III:

\[ \pi(\mu_1, \mu_2, \sigma_1, \sigma_2) = \pi(\mu_1, \sigma_1)\, \pi(\mu_2, \sigma_2). \]

Here we assumed that \(\lambda(t; \theta)\) has two unknown parameters
\(\theta^k = (\theta_1^k, \theta_2^k)\), of which the first is positive and hence has a lognormal
distribution, while the second is normally distributed. Normality and log-normality are
assumptions (they could be replaced by others), and the sensitivity of the results to them has
not been investigated in this paper. We admit this shortcoming, but at this stage, while
presenting the application of our methodology, we do not focus on this issue.
We are particularly interested in how well our hierarchical model covers all observed data
and in how much the inclusion of international experience of incidents in pipeline networks
strengthens the inferences on the Lithuanian gas transmission network.
The first issue, the performance of our model with regard to all data samples, is not easy
to address. Since the number of data samples is small, the parameters \(\theta^k\) form a
so-called unobservable data sample, which is itself small. For this reason the uncertainties in
the estimates of the second-level parameters will be high, resulting in very broad credibility
intervals. Hence, the influence of the hyper-prior (Level III) distribution has to be
investigated. We consider three prior distributions of the following forms:
• \(\pi(\mu, \sigma) \propto 1\) - uniform distribution;
• \(\pi(\mu, \sigma) \propto \sigma^{-2}\) - Jeffreys prior;
• \(\pi(\mu, \sigma) \propto \sigma^{-1}\) - invariant Haar measure.
For the sake of consistency, we consider the failure rate trend \(\lambda(t; \theta)\) alone,
i.e. the case when all incidents are observed. The influence of the prior distributions on the
trend line is negligible: no significant differences were observed, which allows us to conclude
that, even with a small sample of unobservable parameters, the parameter estimates are very
robust. However, the impact is much more pronounced for the credibility intervals, as Figure 8
reveals.
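
The second and third levels of the hierarchy can be sketched as an unnormalised log density, as one would need for a sampler. This is an illustrative sketch only: the function names and all numerical values are hypothetical, the hyper-prior is written as a power of the scale parameter to cover the three forms considered (power 0, 1 or 2), and the Level I Poisson likelihood is deliberately left out.

```python
import math


def log_normal_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(sigma)
            - 0.5 * ((x - mu) / sigma) ** 2)


def log_lognormal_pdf(x, mu, sigma):
    """Log density of LN(mu, sigma^2) at x > 0."""
    return log_normal_pdf(math.log(x), mu, sigma) - math.log(x)


def log_levels_2_and_3(thetas, mu1, sigma1, mu2, sigma2, power=2):
    """Unnormalised log density of Levels II and III.

    thetas: list of (theta1_k, theta2_k), one pair per database.
    power:  exponent of the hyper-prior sigma**(-power);
            0 = uniform, 1 = Haar measure, 2 = Jeffreys prior.
    """
    if sigma1 <= 0 or sigma2 <= 0:
        return -math.inf
    total = -power * (math.log(sigma1) + math.log(sigma2))  # Level III
    for theta1, theta2 in thetas:                           # Level II
        if theta1 <= 0:
            return -math.inf
        total += log_lognormal_pdf(theta1, mu1, sigma1)
        total += log_normal_pdf(theta2, mu2, sigma2)
    return total


# Hypothetical trend parameters for three databases.
thetas = [(0.8, -0.3), (0.1, -0.5), (0.4, -0.2)]
lp_uniform = log_levels_2_and_3(thetas, -1.0, 2.0, -0.3, 2.0, power=0)
lp_jeffreys = log_levels_2_and_3(thetas, -1.0, 2.0, -0.3, 2.0, power=2)
hyperprior_gap = lp_uniform - lp_jeffreys   # equals 2*log(2.0) + 2*log(2.0)
```

The three hyper-priors differ only in how strongly they penalise large scale parameters, which is consistent with their affecting the width of the intervals far more than the trend itself.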

Figure 8 Comparison of the influence of hyper-prior distribution on Bayesian credibility intervals
The hyper-prior is not the only prior that heavily influences the posterior results, though.
Because the sample in the Lithuanian gas network case is very small, the posterior expectation of
the probability that an incident occurs under one of the criteria is highly sensitive to the
prior distribution. To gain insight into the level of sensitivity, we used a Beta distribution as
the prior and varied each of its parameters (a and b) over the range from 0 to 500. The 3D plot
(Figure 9) shows the results.


Figure 9 Analysis of prior distribution influence on probability of criterion application.
The steepest changes occur for parameter values close to zero; beyond that, the posterior
expectation reacts less sharply to the prior distribution. This is because for small values of
the Beta distribution parameters the data, however small the sample, still provide some
information, whereas for large parameter values the prior completely overshadows the information
provided by the statistical sample. This sensitivity analysis shows how important the selection
of the prior distribution is in our case, but it does not answer the question of which parameters
of the Beta distribution should be selected. We recommend the parameters (1,1), in other words
the uniform distribution over the interval [0;1], as the uniform distribution does not give
priority to any value in that interval. The information about the parameter in question is then
evenly spread over the interval, and each value "attracts" the posterior distribution with equal
strength. Hence, no single value exerts a dominating influence.
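
The qualitative behaviour described above can be reproduced with a simplified conjugate analogue. The sketch below is not the paper's model: it assumes a plain binomial likelihood (a hypothetical 1 incident out of 4 falling in the category) so that the posterior mean has a closed form, whereas in the CDP model the probability enters through the Poisson rate.

```python
def beta_posterior_mean(a, b, hits, trials):
    """Posterior mean of a probability with a Beta(a, b) prior after
    observing `hits` successes in `trials` Bernoulli trials."""
    if a < 0 or b < 0 or a + b + trials <= 0:
        raise ValueError("need a, b >= 0 and a + b + trials > 0")
    return (a + hits) / (a + b + trials)


# Hypothetical small sample: 1 of 4 incidents falls in the category.
hits, trials = 1, 4
grid = [0.01, 1, 10, 100, 500]
surface = {(a, b): beta_posterior_mean(a, b, hits, trials)
           for a in grid for b in grid}
```

With a and b near zero the posterior mean stays close to the observed proportion, while for values in the hundreds it is driven almost entirely by the prior ratio a / (a + b), which mirrors the flattening of the surface away from the origin.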
The widest intervals are obtained under the uniform hyper-prior distribution; hence it is the
least informative distribution. However, the bounds in this case are extremely wide (the figure is
in log-scale) and clearly unreasonable: any experienced expert would discard such bounds as
carrying no useful information. Using such subjective knowledge, we therefore discard the uniform
hyper-prior distribution from further analysis and turn to the more informative ones. The upper
bounds for the Haar measure and the Jeffreys hyper-prior are still very high, resulting in 107 and
26 incidents per 1000 km of pipelines, respectively. We again use subjective judgement and select
for our analysis the most informative prior of the three considered, the Jeffreys prior. This
distribution represents the most realistic situation of failure rates in pipeline networks: the
highest observed rate is 1.61 incidents per 1000 km in the OPS database, so 107 would clearly be
too high even for the credibility bounds. In the Jeffreys case we also observed good coverage of
the incident frequencies plotted from all data samples. Hence, the validity of this prior model
is supported as well.

[Figure 8 plot: logarithm of failure rate trend versus years (1970-2010); series: uniform prior, Haar measure, Jeffreys prior]

[Figure 9 plot: posterior expectation sensitivity to prior distribution; axes a and b from 0 to 500, posterior expectation from 0 to 1]
We now turn to the inference for the Lithuanian pipeline network. Due to the small size of
the network and the incident criterion used until 2004, the data are quite scattered, and
inferences based on them alone would be questionable. Until now, however, little else could be
done: at best, those few data could be translated into a failure frequency estimate and compared
qualitatively with international experience, to check that network reliability is not worse or
better than the general level.


Figure 10 Inferred and predicted failure rates for simple CDP and hierarchical model for Lithuanian gas
transmission network

When comparing the estimated trend (see Figure 10, inferred zone), the hierarchical model
generally provides wider credibility bounds than those obtained from the non-hierarchical variant of
the CDP model (considering only the Lithuanian case). This is because the hierarchical structure of the
model incorporates and quantifies an additional level of uncertainty, i.e. variation across
different databases is now accounted for as well. In addition, the two data points that were collected
under the new incident criterion (since 2003) are less underestimated by the hierarchical model.
On the other hand, when we predict the failure rate for the next 40 years (see Figure 10, predicted
zone), uncertainty about the future failure rate increases significantly for CDP, whereas under the
hierarchical model the prediction stays, for a while, affected by the general decreasing trend over
the same period. This is because the additional information from various databases, the
same information that gave rise to wider uncertainty bounds in the inference part, now makes future
predictions more certain (as it involves a clearly revealed decreasing trend). In other words, we are
more informed about the future state of network reliability than we would be if we had used only the
small Lithuanian pipeline network data sample.
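
The borrowing of information described above can be illustrated with a much simpler conjugate Gamma-Poisson calculation. This is only a minimal sketch and not the hierarchical CDP model itself: the counts, exposures and prior parameters are made-up illustrative values, the function names are hypothetical, and the sketch deliberately ignores the extra between-database variance that widens the hierarchical bounds in the inferred zone.

```python
import math


def gamma_posterior(shape, rate, incidents, exposure):
    """Conjugate update of a Gamma(shape, rate) prior with a Poisson count.

    `exposure` is measured in (1000 km x years); all values here are assumed.
    """
    return shape + incidents, rate + exposure


def gamma_mean_sd(shape, rate):
    """Mean and standard deviation of a Gamma(shape, rate) distribution."""
    return shape / rate, math.sqrt(shape) / rate


# Hypothetical small network: 3 incidents over 20 units of exposure.
incidents, exposure = 3, 20.0

# Vague prior (local data only) versus a prior that summarises
# other databases (assumed values, for illustration only).
vague = gamma_posterior(0.5, 0.001, incidents, exposure)
pooled = gamma_posterior(8.0, 40.0, incidents, exposure)

vague_mean, vague_sd = gamma_mean_sd(*vague)
pooled_mean, pooled_sd = gamma_mean_sd(*pooled)

print(f"local only : mean={vague_mean:.3f}, sd={vague_sd:.3f}")
print(f"with pooled: mean={pooled_mean:.3f}, sd={pooled_sd:.3f}")
```

Under these assumed numbers the posterior standard deviation of the failure rate is smaller when external information enters through the prior, which is the qualitative effect the hierarchical structure exploits for prediction.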





Figure 11 Reliability prediction for the next 20 years for 1000 km of pipeline length. Filled areas represent 99 %
credibility bounds under each model: the striped region is for CDP, the solid region for hierarchical CDP

We also predicted the reliability function for the next few years for 1000 km of the Lithuanian
pipeline network (see Figure 11). The differences between the hierarchical and non-hierarchical
variants of CDP are remarkable. Firstly, the 99 % credibility bounds for the hierarchical model are
narrower, while for the non-hierarchical model these bounds are probably too wide to make any
long-term predictions. Secondly, using just a simple Poisson model (without incorporating
information about the criteria) or the regular CDP model to predict reliability for the Lithuanian gas
transmission network would lead to underestimation of the failure rate or, equivalently,
overestimation of the true level of reliability.
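
For a homogeneous Poisson process, the reliability function is the probability of observing no incident up to a given horizon. The sketch below shows how such a curve and its credibility band could be computed from a Gamma-distributed failure rate. It is an illustration under assumptions, not the computation behind Figure 11: the Gamma(11, 60) posterior is an invented example, the failure rate is taken as constant in time (no trend), and all function names are hypothetical.

```python
import math
import random


def reliability(rate, length, years):
    """P(no incident within `years`) for a homogeneous Poisson process.

    rate   -- incidents per 1000 km per year
    length -- pipeline length in units of 1000 km
    """
    return math.exp(-rate * length * years)


def predictive_reliability(shape, rate_param, length, years):
    """E[exp(-lambda * length * years)] for lambda ~ Gamma(shape, rate_param)."""
    return (rate_param / (rate_param + length * years)) ** shape


def reliability_band(shape, rate_param, length, years,
                     level=0.99, draws=20000, seed=0):
    """Monte Carlo credibility band for the reliability at a given horizon."""
    rng = random.Random(seed)
    # random.gammavariate expects (shape, scale); scale = 1 / rate_param.
    sims = sorted(
        reliability(rng.gammavariate(shape, 1.0 / rate_param), length, years)
        for _ in range(draws)
    )
    lower = sims[int((1 - level) / 2 * draws)]
    upper = sims[int((1 + level) / 2 * draws) - 1]
    return lower, upper


# Illustrative (assumed) posterior: Gamma(shape=11, rate=60).
point = predictive_reliability(11.0, 60.0, length=1.0, years=20.0)
band = reliability_band(11.0, 60.0, length=1.0, years=20.0)
```

A more concentrated posterior for the failure rate translates directly into a narrower band for the reliability, which is the mechanism behind the difference between the two regions in Figure 11.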
6. DISCUSSION AND CONCLUSIONS
In this paper we have provided a short review of international natural gas pipeline network
incident databases, their differences, and the difficulties that arise when drawing inferences from
the data they contain.
The most important result of this paper is a new model (as well as its extension to a hierarchical
framework), which we call the Criterion-Dependent Poisson (CDP) model, designed to properly
incorporate all available data regardless of the incident registration criterion. We have
demonstrated that it provides a coherent way to handle changes in the data collection criterion and to
use this information to obtain more correct inferences on the current state of a gas network as well as
more certain predictions.
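
The general idea of combining counts registered under different criteria can be sketched with Poisson thinning: if all incidents occur at rate lambda and a given criterion registers each incident with probability p, the registered counts are Poisson with rate lambda * p. The example below is a simplified illustration and not the CDP likelihood itself. It assumes the registration probabilities are known, whereas the model in this paper treats the probability vectors as uncertain, and the data values and function name are hypothetical.

```python
def posterior_under_criteria(shape, rate, periods):
    """Gamma posterior for the overall incident rate from thinned Poisson counts.

    periods -- iterable of (count, exposure, p_registered), where
               p_registered is the assumed probability that an incident
               meets the registration criterion in force during that period.
    """
    total_count = sum(count for count, _, _ in periods)
    effective_exposure = sum(exposure * p for _, exposure, p in periods)
    return shape + total_count, rate + effective_exposure


# Hypothetical record: a strict old criterion, then a criterion registering everything.
periods = [
    (2, 30.0, 0.25),  # old criterion: only a quarter of incidents registered
    (4, 6.0, 1.00),   # new criterion: every incident registered
]

adjusted = posterior_under_criteria(1.0, 1.0, periods)
# Ignoring the criteria amounts to setting every p_registered to 1.
naive = posterior_under_criteria(1.0, 1.0, [(n, e, 1.0) for n, e, _ in periods])

adjusted_mean = adjusted[0] / adjusted[1]
naive_mean = naive[0] / naive[1]
```

With these assumed numbers, ignoring the criteria gives a lower estimated rate than the criterion-adjusted calculation, in line with the underestimation noted for the simple Poisson model.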
At first sight our model might seem to be of limited use when the number of different criteria is
high compared to the size of the data sample that they represent. However, Bayesian analysis, in
contrast to the frequentist framework, is able to handle such issues and provide reasonable
conclusions. Moreover, a high number of criteria is not expected in any database, whether
international or local (e.g. representing one country), since a change of criteria is a change in
legislation which, as international experience shows, does not occur very often.
In developing our model we said nothing about the dynamics of the probability vectors: we
treated them as time-independent. Theoretically, there is no difficulty in imposing some functional
form on them with dependence on a time covariate. However, it is not clear what forms might be
assumed and whether this is needed at all. Hence, further testing and probably some in-depth
theoretical analysis would be needed to answer this.



In addition to presenting the CDP model, we extended it to a hierarchical framework,
which allows the use of information from various databases. This makes it possible to quantify an
additional level of uncertainty arising from the variation in pipeline operating conditions and
maintenance programs. Through the hierarchical model (HCDP), information from one database is
supported by information contained in other databases. However, such information sharing does not
overshadow the probabilistic evidence already contained in the single database.
We demonstrated our CDP method on the OPS database sample, while the hierarchical extension
HCDP was applied to assess the reliability level of the Lithuanian pipeline network. In the case of
the OPS data sample, CDP made it possible to use all collected information about incidents in the
North American pipeline network. Curiously, this increased amount of information did not result in
more informed inferences (significantly lower uncertainty) about the current state of OPS gas grid
reliability, but when we made a forecast for the next 40 years, uncertainty about the future value
differed significantly. Hence, owing to its ability to incorporate information collected under
various incident criteria, our model predicted future reliability with more certainty. This in turn
provides a way to make more informed decisions, e.g. when planning maintenance.
The hierarchical extension of CDP was used for the Lithuanian pipeline grid because of the small
size of its statistical sample. In addition, as presented at the beginning of this paper, there is one
time point at which the incident criterion was changed. Under the new criterion there are just two
data points, which appear as abnormalities relative to the previous ones, and no valuable inferences
could be made from them alone. Hence, CDP allows incorporating information collected under the
older criterion, while HCDP supports information from the Lithuanian database sample with
information contained in other international databases. We showed that the reliability of the
Lithuanian pipeline grid can now be predicted with more certainty as well. Information borrowed
through the hierarchical model structure enables more informed future predictions than when only
local, country-specific but scarce information and CDP are used. In addition, the hierarchical CDP
model showed a better fit to the data points observed after the change of criteria and hence provides
a more accurate way of predicting the failure rate.
Implementation of the hierarchical CDP model presented some difficulties with regard to prior
distribution selection. We demonstrated how sensitive posterior results can be to the prior
distribution of the between-database variation parameters as well as to the prior distribution of
the probability vectors. However, we gave practical advice on how to proceed in these cases and how
to obtain more accurate posterior results.

7. ACKNOWLEDGEMENTS
This research was partially funded by the grant (No. ATE-04/2012) from the Research Council
of Lithuania.


8. REFERENCES
1. JO, Y.-D.; AHN, B. J. Analysis of hazard areas associated with high-pressure natural-gas
pipelines. Journal of Loss Prevention in the Process Industries, 2002, No. 15, p. 179-188.
2. HAN, Z.Y.; WENG, W.G. An integrated quantitative risk analysis method for natural gas
pipeline network. Journal of Loss Prevention in the Process Industries, 2010, No. 23, p. 428-436.
3. PAPADAKIS, G. Major hazard pipelines: a comparative study of onshore transmission
accidents. Journal of Loss Prevention in the Process Industries, 1999, No. 12, p. 91-107.



4. PAPADAKIS, G.; PORTER, S.; WETTIG, J. EU initiative on the control of major accident
hazards arising from pipelines. Journal of Loss Prevention in the Process Industries, 1999,
No. 12, p. 85-90.
5. PAPADAKIS, G. Assessment of requirements on safety management systems in EU
regulations for the control of major hazard pipelines. Journal of Hazardous Materials, 2000,
No. 78, p. 63-89.
6. PHMSA pipeline incidents database:
http://primis.phmsa.dot.gov/comm/reports/safety/PSI.html (viewed 2013-10-26).
7. GOLUB, E.; GREENFELD, J.; DRESNACK, R.; GRIFFIS, F. H.; PIGNATARO, L. J. Pipeline
accident effects for natural gas transmission pipelines. Final report. New Jersey Institute of
Technology (August, 1996).
8. EGIG Gas Pipeline Incidents, European Gas Pipeline Incident Group. (Secr. Gasunie N., 2008).
9. McCONNELL, R.; HASWELL, D.J.V. UKOPA Pipeline Product Loss Incidents (1962-2010).
(Ambergate, 2011).
10. Focus on Safety and Environment: A Comparative Analysis of Pipeline Performance,
2000-2009. National Energy Board. ISSN 1719-6183. 2011, Canada.