
**METHODOLOGY FOR PIPELINE RELIABILITY ASSESSMENT UNDER VARYING INCIDENT CRITERIA AND DIFFERENT DATABASES**

Tomas Iesmantas, Robertas Alzbutas

Corresponding author information:

Tomas Iešmantas

Phone: +370 37401801

Fax: +370 37351271

Affiliation: Lithuanian Energy Institute

E-mail address: iesmantas@mail.lei.lt

ABSTRACT

Gas transmission pipeline network is of great importance to any country using natural gases in its various technological processes. However, the usefulness cannot overshadow the threat posed to people and property by the grid failures. In order to quantify the reliability of the grid, several widely recognized pipeline incident databases have been established. Organizations that initiated these databases are: US Department of Transportation Pipeline and Hazardous Materials Safety Administration Office of Pipeline Safety (further OPS), European Gas pipeline Incident data Group (EGIG), United Kingdom Onshore Pipeline Operators' Association (UKOPA), Canadian National Energy Board (NEB). However, each database contains data about pipelines operated in remote geographical regions with varying soil types, under different incident data registration, classification or collection criteria. For longer time period even in single database there is variation of these criteria. Hence, direct integration of data into one analysis and sample raises suspicions about the validity of resulting inferences.

The authors move beyond the qualitative comparison of pipeline incident databases and provide a methodology for the quantitative integration of all available statistical information in order to improve the reliability evaluation of gas pipeline networks. We develop a new model, called the Criteria-Based Poisson model, which takes into account varying incident data collection criteria, and extend it to the hierarchical case, in which different databases with differing incident registration criteria can be joined in the same analysis. Using real data examples, we demonstrate the usefulness of the model and the methodology for reliability prediction. The failure rate assessment of the Lithuanian pipeline network shows the advantages of the hierarchical structuring of the Criteria-Based Poisson model in small-sample problems.

Keywords: Gas pipeline, gas grid, reliability, incident criteria, Bayesian, hierarchical

1. INTRODUCTION

A gas transmission pipeline network is of great importance to any country that uses natural gas in technological processes or for other purposes. However, this usefulness cannot overshadow the possible threat posed to people and property by gas leakage or other incidents in the system. The importance of recognizing such events has already been expressed in EU Council Directive 96/82/EC (see [4, 5]). After a gas leak, an ignited jet fire, a flash fire or an explosion are possible hazardous outcomes [1]. In addition to this kind of risk, leakage of a significant volume could lead to cascading events [2]: the loss of pressure and the decrease of the flow rate at the failed node or along the whole pipeline degrade the reliability characteristics of other parts of the network, which then fail to achieve the required or acceptable performance measures.

In order to quantify reliability, several widely recognized pipeline incident databases have been established. The organizations that initiated these databases are the US Department of Transportation Pipeline and Hazardous Materials Safety Administration Office of Pipeline Safety (hereafter OPS), the European Gas pipeline Incident data Group (EGIG), the United Kingdom Onshore Pipeline Operators’ Association (UKOPA) and the Canadian National Energy Board (NEB). Even though these databases are quite extensive (some more than others), their incident data seem never to have been used together, i.e. the statistical information carried by one database has not been used to support the information contained in another. Qualitative comparisons such as that of Papadakis [3] do not fill this gap either. This is understandable, since each database contains data about pipelines operated in distant geographical regions with varying soil types and under different incident data registration, classification or collection criteria. Over longer time periods these criteria vary even within a single database. Hence, direct integration of the data into one analysis and one sample raises doubts about the validity of the resulting inferences.

The purpose of this paper is to move beyond the qualitative comparison of pipeline incident databases and to provide a methodology, with guidelines, for the quantitative integration of all available statistical information in order to improve reliability evaluation. Section 2 reviews four natural gas transmission pipeline network incident databases. Section 3 develops a new model and methodology that take into account varying incident data collection criteria, while Section 4 extends the model to a hierarchical structure in order to incorporate all information from the available international databases. In Section 5 we apply the model and methodology to the OPS database, and in Section 6 the Lithuanian pipeline network is analysed with the hierarchical extension of the model.

2. A BRIEF REVIEW OF PIPELINE INCIDENT DATABASES

As already explained, several databases reflect the differing experience with gas pipeline networks in various countries or geographical regions. The most widely used and cited international pipeline incident databases are as follows:

1. OPS (data from 1970 to 1990 in [7], from 1991 to 2011 in [6]);

2. EGIG [8];

3. UKOPA [9];

4. NEB.

However, these databases are not identical: they differ in the time periods covered, the incident criteria, the geographical location of the pipeline networks, the record types, etc. The main differences and similarities are summarized in the following table.

Table 1. International pipeline incident databases: similarities and differences

| Name (Location) | Incident registration criteria | Database record types |
|---|---|---|
| OPS PHMSA (USA) | 1970-1983: damage higher than 5000 $; 1984-2002: damage higher than 50000 $; 2003-2012: damage higher than 50000 $; leaks above 84000 m³. | Incident frequencies and causes; detailed information for each incident as independent study; in all periods fatalities and injuries were recorded, explosions and fires as well. |
| EGIG (Europe) | All detectable unintentional gas releases. | Number of incidents, causes, distribution by detection methods, pipeline diameter, wall thickness, age, cover type; ignition frequency grouped by hole size and pipe diameter; injuries. |
| UKOPA (Great Britain) | All detectable unintentional gas releases. | Incident frequency and causes; leakage volume distribution grouped by detection methods, pipeline diameter, wall thickness, soil type, age, type of cover. |
| NEB (Canada) | All detectable unintentional gas releases; the death of or serious injury to a person. | Incident frequencies. |

As we can see, the data collection criteria differ most for the OPS database: the whole time series is divided into three periods (Figure 1), which cannot be analysed as one sample without an appropriate model. Incident criteria can be thought of as the basis for screening out insignificant events or for applying a censoring procedure.

Figure 1 Influence of collection criteria to incidents frequency (OPS database)

It is important to realize that the data in the considered databases do not cover every country, and using them to analyse gas transmission networks that are not covered by any database might be questionable. On the other hand, samples from small countries, such as the one from the Lithuanian gas pipeline network, are not representative enough, and then there is no other choice but to use international experience.

Regarding the Lithuanian gas transmission pipeline network, the available data are quite scattered because the pipelines are relatively short and because only incidents that resulted in gas explosions or fires were recorded. The criterion was changed in 2004, and since then all incidents resulting in gas leakage have been recorded. To give a clearer picture and to avoid a profusion of zero frequencies, we pooled the incident data into 5-year bins (Figure 2).
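The pooling of yearly counts into 5-year bins can be sketched as follows. This is a minimal illustration only: the function name `pool_into_bins`, the input layout and the numbers are hypothetical and are not taken from the Lithuanian record. The frequency is expressed per 1000 km per year, the unit used in the figures.

```python
def pool_into_bins(years, counts, lengths_km, bin_width=5):
    """Pool yearly incident counts into fixed-width bins.

    Returns a list of (bin_start_year, frequency) pairs, where the frequency
    is the number of incidents per 1000 km of pipeline per year.
    """
    first = min(years)
    pooled = {}
    for year, count, length in zip(years, counts, lengths_km):
        start = first + ((year - first) // bin_width) * bin_width
        n, exposure = pooled.get(start, (0, 0.0))
        # Each year contributes (pipeline length x 1 year) of exposure.
        pooled[start] = (n + count, exposure + length)
    return [(start, 1000.0 * n / exposure)
            for start, (n, exposure) in sorted(pooled.items())]


# Hypothetical data: ten years, constant network length of 1000 km.
years = list(range(2000, 2010))
counts = [0, 0, 1, 0, 0, 0, 2, 0, 0, 1]
lengths_km = [1000.0] * 10
bins = pool_into_bins(years, counts, lengths_km)
```

Pooling trades time resolution for fewer zero-valued points, which is why it is used here only for presentation.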

[Plot residue: x-axis years, 1970-2010; y-axis failure frequency, 1/(year·1000 km); annotation: change of collection criteria]

Figure 2 Incidents frequency change over time for Lithuanian natural gas transmission network

The current state of the art in pipeline reliability analysis involves either the analysis (stochastic or deterministic) of physical processes (cracking, degradation, etc.) or the statistical investigation of incidents (frequencies, rates, etc.). In the following sections we confine ourselves to the second option, because the scientific literature on the analysis of physical processes is quite well developed, whereas the statistical aspects of incident data are somewhat underrepresented. The usual approach is to concentrate on a few characteristics that are assumed (possibly wrongly) to be the main ones, such as the mean and variance of incidents, and to draw conclusions about the overall state of pipeline system reliability from these measures.

3. NETWORK RELIABILITY ASSESSMENT CONSIDERING VARIATION OF INCIDENT CRITERIA

We present here a methodology for modelling incident count data when the incident definition changes over time. Although we consider Poisson data, the methodology can be extended to other distributions. Suppose we have a number of incidents \(X\) over some time period \(\tau\), when the length of the pipeline system is \(L\) and the exposure is \(E = \tau L\) [time unit · length unit]. To keep things simple, we assume \(\tau = 1\) time unit (e.g. one year of incident collection) and drop this variable from further notation. We assume that the incident rate at age \(t\) is a function of the age and of the incident criterion \(C_t\) at that time \(t\), i.e.

\[ \lambda = \lambda(t; C_t), \qquad (1) \]

where \(C_t\) might be a set of different criteria, which we assume to be mutually exclusive, so that an incident can fall into just one of the categories defined by the set. In addition, let \(C\) denote the set of all criteria used, so that \(C_t \subseteq C, \ \forall t\). However, this assumption is not entirely correct, because in reality an incident could satisfy more than one criterion, and the model is just an approximation of the real phenomenon. Below we demonstrate how to construct the sets of criteria so that the above assumption is satisfied as closely as possible.

One of the reasons for incorporating data collected under different criteria is to infer the current state of the network, or to predict its reliability, with the highest possible level of certainty. This requires incorporating all available information into the analysis. However, when we analyse the total data from the databases, we cannot observe how many incidents were collected under one criterion or another: only the pooled sample is available to us. Consider the OPS database as an example: in 1984 one of the criteria was tightened from 5000 $ to 50000 $, but before this change incidents with damage over 50000 $ were also observed. It is not possible to deduce how many incidents fell into this category and how many into the category with damage below 50000 $ and above 5000 $. The same holds for the shift of criteria in 2003. Without knowing how many incidents fell into each category, how can we estimate the overall state of network reliability? Moreover, how can we predict future incidents?

[Plot residue: x-axis years, 1980-2010; y-axis failure frequency, 1/(year·1000 km); annotation: change of collection criteria]

To answer that, we assume that each data point \(X_t\) is a sum of multinomial random variables, i.e. the numbers of incidents under each criterion out of the overall number of incidents, which follow a multinomial distribution with parameters \(p\) and \(X_t\), where \(p = (p_1, \dots, p_K)\) and \(K\) is the cardinality of \(C\) (the set of all criteria). More formally, if \(X_t^1, \dots, X_t^K\) are the numbers of incidents under each criterion, then the distribution of the vector of incident numbers \((X_t^1, \dots, X_t^K)\) conditional on \(\sum_{j \in C_t} X_t^j\) is as follows:

\[ \left(X_t^1, \dots, X_t^K\right) \,\Big|\, \sum_{j \in C_t} X_t^j = X_t \;\sim\; \mathrm{Multinomial}\left(p, X_t\right), \qquad (2) \]

where \(p\) is the probability vector of incident numbers. The conditional distribution of each vector component is then

\[ X_t^j \mid p_j, \lambda(t) \;\sim\; \mathrm{Poisson}\big(p_j E_t \lambda(t)\big); \qquad (3) \]

\(p_j\) is the probability of an incident falling under the \(j\)-th criterion.
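The link between the multinomial split and the per-criterion Poisson counts can be checked numerically. The sketch below, with hypothetical probabilities and a hypothetical mean, verifies that a Poisson total split multinomially has the same joint probability as independent Poisson counts with means scaled by the category probabilities. All function names are illustrative.

```python
import math


def poisson_pmf(x, mean):
    """Probability of observing x events for a Poisson variable."""
    return math.exp(-mean) * mean ** x / math.factorial(x)


def multinomial_pmf(xs, ps):
    """Probability of the split xs of sum(xs) events over categories ps."""
    coef = math.factorial(sum(xs))
    prob = 1.0
    for x, p in zip(xs, ps):
        coef //= math.factorial(x)  # stays an integer at every step
        prob *= p ** x
    return coef * prob


def joint_via_split(xs, ps, mean):
    """Poisson total with the given mean, then a multinomial split."""
    return poisson_pmf(sum(xs), mean) * multinomial_pmf(xs, ps)


def joint_via_independent(xs, ps, mean):
    """Independent Poisson counts with means p_j * mean."""
    out = 1.0
    for x, p in zip(xs, ps):
        out *= poisson_pmf(x, p * mean)
    return out


# Hypothetical values: three exclusive categories, total mean E_t * lambda(t) = 4.2.
ps = [0.6, 0.3, 0.1]
xs = [2, 1, 0]
lhs = joint_via_split(xs, ps, 4.2)
rhs = joint_via_independent(xs, ps, 4.2)
```

The two values agree up to floating-point rounding, which is the property that allows unobserved categories to be dropped by simply rescaling the rate.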

Since the information on how many incidents \(X_t^j\) occurred under each criterion is not available to us, all that can be done is to model the pooled sum \(\sum_{j \in C_t} X_t^j = X_t\) together with the probability vector \(p\). From the properties of the Poisson distribution (a sum of independent Poisson variables is again Poisson) we readily arrive at the Criterion-Dependent-Poisson (CDP) model

\[ X_t \sim \mathrm{Poisson}\big(E_t \tilde{\lambda}(t)\big), \qquad (4) \]

where \(\tilde{\lambda}(t) = \lambda(t) \sum_{i \in C_t} p_i\) and the summation is over those probabilities that correspond to the collection of criteria \(C_t\). Hence, when we observe incidents satisfying all our criteria, the sets \(C_t\) and \(C\) coincide, from which it follows that \(\tilde{\lambda}(t) = \lambda(t) \sum_{i \in C_t} p_i = \lambda(t)\), while in the case when we observe only incidents under a subset of \(C\), the incident rate will always be smaller than \(\lambda(t)\), as expected.
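As a small illustration of the criterion-dependent rate, the following sketch scales a base rate by the total probability of the categories that are recorded at a given time. The function name, the category labels and the numbers are hypothetical; the categories are assumed to be mutually exclusive with probabilities summing to one.

```python
def effective_rate(base_rate, probs, observed):
    """Rate of recorded incidents when only some categories are collected.

    base_rate: rate of all incidents, lambda(t)
    probs:     dict mapping each exclusive category to its probability
    observed:  iterable of the categories recorded at this time (C_t)
    """
    return base_rate * sum(probs[c] for c in observed)


# Hypothetical two-category example.
probs = {"low": 0.75, "high": 0.25}
rate_all = effective_rate(0.4, probs, ["low", "high"])  # every category recorded
rate_high = effective_rate(0.4, probs, ["high"])        # stricter criterion
```

With all categories recorded the base rate is recovered, and with a stricter criterion the recorded rate is lower, in line with the remark above.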

Next we illustrate the model with two incident criteria. Suppose we have an incident count time series \(X = (X_1, \dots, X_T)\) and two criteria, i.e. \(C = \{C_1, C_2\}\). For the first half of the observation time the data were collected according to criterion \(C_1\), while at age \(T/2 + 1\) the threshold was raised to criterion \(C_2\). In reality these two criteria would rarely be mutually exclusive. It may happen that incidents satisfying \(C_1\) meet \(C_2\) as well, e.g. \(C_1\) is damage higher than 1000 $, while \(C_2\) is damage higher than 1000 $ and leakage above 10,000 m\(^3\). However, we can instead introduce another criterion \(C_1' = C_1 \setminus C_2\), which accounts for incidents that satisfy \(C_1\) but not \(C_2\). This transformation gives us a set of mutually exclusive criteria \(C' = \{C_1', C_2\}\), to which we can assign the probability vector \(p = (p_1, p_2)\). The probability that an incident from the database is in category \(C_1 = C_1' \cup C_2\) is then \(p_1 + p_2 = 1\), while the probability that it is in category \(C_2\) is \(p_2\). Hence, we have the following CDP model:

\[
X_t \sim
\begin{cases}
\mathrm{Poisson}\big(E_t \lambda(t)\big), & t = 1, \dots, T/2, \\
\mathrm{Poisson}\big(E_t \lambda(t)\, p_2\big), & t = T/2 + 1, \dots, T.
\end{cases}
\qquad (5)
\]
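To make the two-criteria model concrete, the sketch below fits it by maximum likelihood under a simplifying assumption that is not made in the paper: the rate is taken to be constant over time. The paper itself uses a Bayesian fit with a time-dependent trend, so this is only an illustration; the function name and the data are hypothetical. Under this assumption the estimates have a closed form: the rate is estimated from the period in which everything is recorded, and \(p_2\) is the ratio of the later observed rate to it, capped at 1.

```python
def fit_two_criteria(counts, exposures, change_index):
    """Maximum-likelihood fit of the two-criteria model with a constant rate.

    counts[i], exposures[i]: incidents and exposure in period i
    change_index: first period collected under the stricter criterion
    Returns (rate, p2).
    """
    s1, a1 = sum(counts[:change_index]), sum(exposures[:change_index])
    s2, a2 = sum(counts[change_index:]), sum(exposures[change_index:])
    if s1 * a2 > s2 * a1:
        # Later observed rate is lower: interior solution.
        rate = s1 / a1
        p2 = (s2 / a2) / rate
    else:
        # Boundary solution p2 = 1: the data show no thinning, pool everything.
        rate = (s1 + s2) / (a1 + a2)
        p2 = 1.0
    return rate, p2


# Hypothetical data: 8 years, exposure of 100 (in 1000 km·years) each year,
# stricter criterion from the fifth year on.
counts = [8, 12, 10, 10, 3, 5, 4, 4]
exposures = [100.0] * 8
rate, p2 = fit_two_criteria(counts, exposures, change_index=4)
```

For these numbers the estimated rate is about 0.1 incidents per unit of exposure and about 40 % of incidents satisfy the stricter criterion.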

This model enables the analyst to use all available data to evaluate and predict pipeline system reliability. This leads to more accurate inferences, since no part of the statistical information has to be discarded. The general framework does not differ much from the simple case presented above and can be represented schematically as a flow chart (Figure 3).

Figure 3 Methodology of pipeline network reliability model construction under different incident criteria.

4. QUANTIFICATION OF BETWEEN-DATABASE UNCERTAINTY

In this section we tackle the problem of data usage that arises from the differences between databases. Each database or data sample represents a different natural gas pipeline network, operated by different operators under different environmental and social conditions. Hence, one cannot expect the network failure data to be homogeneous. The heterogeneity is especially pronounced in the failure rates of young pipelines (see Figure 4).

[Flow chart residue, box labels: form accident criteria set; contains mutually exclusive sets? (yes/no); redefinition of the set; form mutually exclusive accident criteria set; transform to the new set; define vector of probabilities; form a model]

Figure 4 Dynamics of failure rates as estimated from various databases

This raises a question: would it be appropriate to pool all the data into one sample and then perform the reliability assessment as if we were analysing a “virtual” network that accurately represents the international situation? That would be one possible (easy) way to go. But without taking the variability in the data into account we may end up with overly optimistic uncertainty bounds, not to mention that the estimates would hardly fit the real situation when the data vary a lot from one source to another.

In order to deal with this issue, we adopt a hierarchical Bayesian modelling perspective, which can be presented as in Figure 5.

Figure 5 Graphical representation of hierarchical Bayesian model

The lowest level of the hierarchy represents the data and their stochastic behaviour, expressed by some parametric distribution. The data might be collected from different pipeline networks or databases. They are then partially merged at the second level of the hierarchy, where all samples share part of their information through the within-source parameters, called unobservable variables. This information is then used at the third stage, the hyperprior stage, to infer the between-source parameters. As shown, information flows up to the highest level of the hierarchy, and the influence of possible anomalies weakens the further it propagates.

Our final addition to the methodology of pipeline network reliability analysis is to use the hierarchical Bayesian modelling approach for the CDP model (Hierarchical CDP, or HCDP). To do this, we need to decide which parameters will be treated as hierarchically dependent and which should be estimated independently of the others. The most general case would be to put hierarchies on all parameters in the model; however, this might be impossible for the probability vectors \(p\), since different operators or databases may employ entirely different criteria.

[Figure 4 plot residue: x-axis time, years (1970-2010); y-axis failure rate; legend: OPS, UKOPA, EGIG, Canada, Lithuania]

The hierarchical construction makes it possible not only to investigate international pipeline incident databases jointly and to assess the variability of the reliability of the world's pipeline networks, but also to investigate gas transmission networks not covered by any of the databases. If only a few data samples are available, the hierarchical structure strengthens the inferences; if no statistical information is at hand (a newly deployed transmission network), more realistic uncertainty bounds can be obtained from HCDP than from the information of a single database.

5. APPLICATION TO REAL PIPELINE DATA

5.1. Criterion-Dependent-Poisson model for OPS pipeline network data

In this section we demonstrate the application of the CDP model to the OPS gas transmission incident database sample. Until 1983, incidents were recorded by OPS if the damage was above 5000 $ or if there were injuries or deaths. The damage criterion was then increased to 50000 $ in 1984, and finally an additional criterion, leakage above 84000 m\(^3\), was introduced in 2002. Accidents with deaths and injuries were always recorded, so we can drop them from our set of criteria, which leaves us with 3 criteria that are not mutually exclusive. Hence, we form the set

\[ C = \left\{\, C_1(> 5000\,\$),\; C_2(> 50000\,\$),\; C_3(> 50000\,\$ \text{ or } > 84000\,\mathrm{m}^3) \,\right\}. \]

None of the criteria are mutually exclusive, and the following relations hold:

\[ C_2 \subset C_1, \quad C_3 \subset C_1, \quad C_2 \subset C_3; \]

hence we have a situation in which a redefinition of \(C\) is needed. However, this is not so trivial for \(C_3\), since an incident in this group might have damage greater than 50000 $ but leakage less than 84000 m\(^3\), or vice versa. It may also happen that an incident falls into both categories. This leads to the following redefinition of the set of incident criteria:

\[
C' = \left\{
\begin{aligned}
& C_1'(> 5000\,\$,\; < 50000\,\$,\; < 84000\,\mathrm{m}^3), \\
& C_2'(> 5000\,\$,\; < 50000\,\$,\; > 84000\,\mathrm{m}^3), \\
& C_3'(> 50000\,\$,\; < 84000\,\mathrm{m}^3), \\
& C_4'(> 50000\,\$,\; > 84000\,\mathrm{m}^3)
\end{aligned}
\right\},
\]

where all criteria are mutually exclusive and the following expressions hold:

\[
\begin{aligned}
C_1 &= C_1' \cup C_2' \cup C_3' \cup C_4', \\
C_2 &= C_3' \cup C_4', \\
C_3 &= C_2' \cup C_3' \cup C_4'.
\end{aligned}
\]

If we denote by \(p' = (p_1', p_2', p_3', p_4')\) the probability vector for \(C'\), then it is obvious that the probability vector for \(C\) will be \(p = (1,\; p_3' + p_4',\; p_2' + p_3' + p_4')\) or (since \(\sum_k p_k' = 1\)) \(p = (1,\; 1 - p_1' - p_2',\; 1 - p_1')\). It should be mentioned that the last expression of the probability vector should be used to avoid identification problems. With this, the final expression of the model is as follows:

\[
X_t \sim
\begin{cases}
\mathrm{Poisson}\big(E_t \lambda(t)\big), & t = 1, \dots, 14, \\
\mathrm{Poisson}\big(E_t \lambda(t)\,(1 - p_1' - p_2')\big), & t = 15, \dots, 33, \\
\mathrm{Poisson}\big(E_t \lambda(t)\,(1 - p_1')\big), & t = 34, \dots, 42.
\end{cases}
\]
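The redefinition of the OPS criteria into mutually exclusive categories can be sketched as a small classifier. The function names and the sample incidents are hypothetical. Two points are assumptions of this sketch rather than statements of the source: values exactly equal to a threshold are treated as not exceeding it, and the check of the set relations is restricted to incidents with damage above 5000 $, in line with the relation \(C_3 \subset C_1\) assumed in the text.

```python
def exclusive_category(damage_usd, leak_m3):
    """Return k for the exclusive category C'_k (1..4), or None if the
    damage does not exceed 5000 $ (outside the widest criterion C_1)."""
    if damage_usd <= 5000:
        return None
    high_damage = damage_usd > 50000
    large_leak = leak_m3 > 84000
    if high_damage:
        return 4 if large_leak else 3
    return 2 if large_leak else 1


def in_c2(damage_usd, leak_m3):
    """Original criterion C_2: damage above 50000 $."""
    return damage_usd > 50000


def in_c3(damage_usd, leak_m3):
    """Original criterion C_3: damage above 50000 $ or leak above 84000 m^3."""
    return damage_usd > 50000 or leak_m3 > 84000


def relations_hold(incidents):
    """Check C_2 = C'_3 u C'_4 and C_3 = C'_2 u C'_3 u C'_4 on a sample."""
    for damage, leak in incidents:
        k = exclusive_category(damage, leak)
        if in_c2(damage, leak) != (k in {3, 4}):
            return False
        if in_c3(damage, leak) != (k in {2, 3, 4}):
            return False
    return True


# Hypothetical incidents as (damage in $, leak in m^3), one per category.
sample = [(6000, 100), (6000, 90000), (60000, 100), (60000, 90000)]
categories = [exclusive_category(d, l) for d, l in sample]
ok = relations_hold(sample)
```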

To enable Bayesian analysis, we used uniform distributions for the probabilities as an expression of prior beliefs about the proportion of data in each category. Such a prior can be interpreted as non-informative, since no value is given priority. Goodness-of-fit checking procedures showed that the power-law trend \(\lambda(t) = \theta_1 t^{\theta_2}\) provides the best fit.
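Putting the pieces together, the expected number of recorded OPS incidents in year index t (t = 1 for 1970) can be sketched as the exposure times a power-law rate times the period-dependent recording probability. The parameter names `theta1`, `theta2`, `p1`, `p2` and all numerical values below are illustrative assumptions, not estimates from the paper.

```python
def power_law_rate(t, theta1, theta2):
    """Power-law trend of the rate of all incidents at age t."""
    return theta1 * t ** theta2


def ops_expected_count(t, exposure, theta1, theta2, p1, p2):
    """Expected number of recorded incidents in year index t (1..42).

    p1, p2: probabilities of the two exclusive categories with damage
    between 5000 $ and 50000 $ (small leak and large leak, respectively).
    """
    if not 1 <= t <= 42:
        raise ValueError("t must be between 1 and 42")
    if t <= 14:        # all incidents above 5000 $ are recorded
        factor = 1.0
    elif t <= 33:      # only damage above 50000 $
        factor = 1.0 - p1 - p2
    else:              # damage above 50000 $ or large leak
        factor = 1.0 - p1
    return exposure * power_law_rate(t, theta1, theta2) * factor


# Hypothetical parameter values.
first_year = ops_expected_count(1, 100.0, 0.5, -0.2, p1=0.5, p2=0.1)
mid_year = ops_expected_count(20, 100.0, 0.5, -0.2, p1=0.5, p2=0.1)
late_year = ops_expected_count(40, 100.0, 0.5, -0.2, p1=0.5, p2=0.1)
```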

We compared replicated values from our model with those from a simple model that uses only the incidents that occurred after the last change of criterion in the regulatory documents, i.e. since 2003. Surprisingly, the 95 % credibility intervals (measures of the uncertainty bounds) in the inference region (see Figure 6) are almost identical in both cases, contrary to the expected wider intervals for the partial sample. Hence, in this particular OPS case it does not matter for inference whether all available data (since 1970) are used or just a part of the sample (since 2003) in the simple model. The differences should become more obvious for even smaller data samples. However, in the prediction region the superiority of using all available data becomes clear: the smaller sample leads to rapidly widening credibility bounds, while the CDP model under the full data sample provides a much more accurate (less uncertain) forecast. This is especially important in the analysis of long-term reliability.

Figure 6 95 % credibility intervals for replicated predicted data

The probabilities of incidents falling into one category or another (see Figure 7) are a somewhat redundant result: although they allow the efficient inclusion of all available pipeline reliability data, they need no further consideration if no modification of the incident criterion is planned. However, they might be useful for analysing how a change of criteria would influence reliability predictions and how it would affect the general risk level (i.e. the level of the total number of incidents), expressed as the frequency or severity of the corresponding incidents. It might turn out to be beneficial to go back to some previous criterion.

[Figure 6 plot residue: x-axis years (2003-2021); y-axis number of failures for 1000 km of pipeline; regions: reliability inference, reliability prediction; legend: full data sample, partial sample]

Figure 7 Densities of probabilities of incidents in pipeline network to fall in categories \(C_2\) and \(C_3\)

We conclude this section by stressing the validity of the proposed model and its advantage: the ability to account for data collected under varying incident criteria.

5.2. Worldwide pipeline incident experience assessment by hierarchical CDP

We now apply the proposed model to the set of available databases or data samples. As presented at the beginning of the paper, we have samples from OPS, EGIG, UKOPA, NEB and Lithuania. Overall there are thus 5 samples, which traditionally would not be analysed jointly; our methodology resolves this issue. We have already presented the differences between the data samples: EGIG, UKOPA and NEB can be regarded as very similar, since their incident criteria are almost identical, OPS used 3 criteria, and the sample from Lithuania is related to two criteria. Therefore, in order to apply the proposed hierarchical model, we need to construct a set \(C\) of criteria.

Since 2004, Lithuania has collected information about incidents with gas leakage (denote this criterion \(C_2\)), so the criterion used before that moment (gas explosion) forms a subset of \(C_2\). If we denote the first criterion by \(C_1\), we have the relation \(C_1 \subset C_2\). We transform \(C = (C_1, C_2)\) as follows:

\[ C' = \left\{\, C_1'(\text{Explosion}),\; C_2'(\text{Gas leakage without explosion}) \,\right\}. \]

Evidently, the criteria \(C_1'\) and \(C_2'\) are mutually exclusive, and \(C_2 = C_1' \cup C_2'\), \(C_1 = C_1'\). This leads to the probability vector \(p = (p_1', 1)\) for \(C\).

It is difficult to relate the probability vectors of different databases through a hierarchical structure: the number of components differs and the criteria themselves are not identical. Hence, we apply the hierarchical structure only to the parameters of the trend \(\lambda(t) = \lambda(t; \theta)\).

The mathematical representation of the hierarchically structured data as a whole is as follows.

Level I:

\[
X_t^1 \mid \theta^1 \sim
\begin{cases}
\mathrm{Poisson}\big(E_t^1 \lambda(t; \theta^1)\big), & t = 1, \dots, 14, \\
\mathrm{Poisson}\big(E_t^1 \lambda(t; \theta^1)\,(1 - p_1' - p_2')\big), & t = 15, \dots, 33, \\
\mathrm{Poisson}\big(E_t^1 \lambda(t; \theta^1)\,(1 - p_1')\big), & t = 34, \dots, 42,
\end{cases}
\]

for the OPS case;

\[
X_t^2 \mid \theta^2 \sim
\begin{cases}
\mathrm{Poisson}\big(E_t^2 \lambda(t; \theta^2)\, p_1'\big), & t = 19, \dots, 34, \\
\mathrm{Poisson}\big(E_t^2 \lambda(t; \theta^2)\big), & t = 35, \dots, 44,
\end{cases}
\]

for the Lithuanian case;

\[ X_t^k \mid \theta^k \sim \mathrm{Poisson}\big(E_t^k \lambda(t; \theta^k)\big), \quad k = 3, \dots, 5, \]

for EGIG, UKOPA and NEB.

Level II:

\[ \theta_1^k \sim \mathrm{LN}\big(\mu_1, \sigma_1^2\big), \qquad \theta_2^k \sim \mathrm{N}\big(\mu_2, \sigma_2^2\big), \qquad k = 1, \dots, 5. \]

Level III:

\[ \pi(\mu_1, \mu_2, \sigma_1, \sigma_2) = \pi(\mu_1, \sigma_1)\, \pi(\mu_2, \sigma_2). \]

[Figure 7 plot residue: x-axis from 0.15 to 0.30; legend: density of probability to be in C2, density of probability to be in C3]

Here we assumed that \(\lambda(t; \theta)\) has two unknown parameters \(\theta^k = (\theta_1^k, \theta_2^k)\), of which the first is positive and hence has a lognormal distribution, while the second is normally distributed. Normality and log-normality are assumptions (they could be replaced by others), and the sensitivity of the results to them has not been investigated in this paper. We admit this shortcoming, but at this stage, while presenting the application of our methodology, we do not focus on this issue.

We are particularly interested in how well our hierarchical model covers all observed data and how much including the international experience of incidents in pipeline networks strengthens the inferences about the Lithuanian gas transmission network.

The first issue, the performance of our model with regard to all data samples, is not easy to address. Since the number of data samples is small, the parameters \(\theta^k\) form a so-called unobservable data sample, which is itself small. For this reason the uncertainties in the second-level parameter estimates will be high, resulting in very broad credibility intervals. Hence, the influence of the hyper-prior (Level III) distribution has to be investigated. We consider three prior distributions of the following forms:

• \(\pi(\mu, \sigma) \propto 1\): uniform distribution;

• \(\pi(\mu, \sigma) \propto \sigma^{-2}\): Jeffreys prior;

• \(\pi(\mu, \sigma) \propto \sigma^{-1}\): invariant Haar measure.

For the sake of consistency, we consider the failure rate trend \(\lambda(t; \theta)\) alone, i.e. the case when all incidents are observed. The influence of the prior distributions on the trend line is negligible: no significant differences were observed, which allows us to conclude that even with a small sample of unobservable parameters the parameter estimates are very robust. However, the impact is much more pronounced when we turn to the estimation of credibility intervals, as Figure 8 reveals.
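The three-level structure can be written down as an unnormalised log joint density, which is the quantity a sampler would explore. The sketch below is a simplified illustration under stated assumptions: all incidents are taken as observed (no recording probabilities), the trend is assumed to be a power law in the age t, and the function names, argument layout and data are hypothetical. It is not the implementation used in the paper.

```python
import math


def log_poisson(x, mean):
    """Log-probability of x events for a Poisson variable with the given mean."""
    return x * math.log(mean) - mean - math.lgamma(x + 1)


def log_normal_pdf(z, mu, sigma):
    """Log-density of a normal distribution."""
    return (-0.5 * math.log(2 * math.pi) - math.log(sigma)
            - 0.5 * ((z - mu) / sigma) ** 2)


def log_hyper_prior(sigma, kind):
    """Level III: log of the hyper-prior for one (mu, sigma) pair, up to a constant."""
    return {"uniform": 0.0,
            "jeffreys": -2.0 * math.log(sigma),
            "haar": -math.log(sigma)}[kind]


def log_joint(counts, exposures, thetas, mu1, sigma1, mu2, sigma2, kind="jeffreys"):
    """Unnormalised log joint density of data and parameters.

    counts[k][t-1], exposures[k][t-1]: data of source k at age t
    thetas[k] = (theta1, theta2): trend parameters of source k
    """
    total = log_hyper_prior(sigma1, kind) + log_hyper_prior(sigma2, kind)
    for xs, es, (th1, th2) in zip(counts, exposures, thetas):
        # Level II: theta1 is lognormal, theta2 is normal.
        total += log_normal_pdf(math.log(th1), mu1, sigma1) - math.log(th1)
        total += log_normal_pdf(th2, mu2, sigma2)
        # Level I: Poisson counts with an assumed power-law trend.
        for t, (x, e) in enumerate(zip(xs, es), start=1):
            total += log_poisson(x, e * th1 * t ** th2)
    return total


# Hypothetical data for two sources and three years.
counts = [[3, 2, 1], [0, 1, 0]]
exposures = [[10.0] * 3, [5.0] * 3]
thetas = [(0.2, -0.3), (0.1, -0.1)]
args = (counts, exposures, thetas, -2.0, 2.0, 0.0, 3.0)
lj_uniform = log_joint(*args, kind="uniform")
lj_haar = log_joint(*args, kind="haar")
lj_jeffreys = log_joint(*args, kind="jeffreys")
```

The three hyper-priors differ only in the penalty they place on large values of the scale parameters, which is why they mainly affect the width of the credibility intervals.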

Figure 8 Comparison of the influence of hyper-prior distribution on Bayesian credibility intervals

The hyper-prior is not the only modelling choice that heavily influences the posterior results, though. Because of the very small sample in the Lithuanian gas network case, the posterior expectation of the probability that an incident occurs under one of the criteria is highly sensitive to the prior distribution. To gain insight into the level of sensitivity, we used a Beta distribution as the prior and varied each of its parameters (a and b) over the range from 0 to 500. The 3D plot (Figure 9) shows the final results.

Figure 9 Analysis of prior distribution influence on probability of criterion application.

The steepest changes occur for parameter values close to zero; beyond that, the posterior expectation no longer reacts so strongly to the prior distribution. This is because for small Beta distribution parameter values the data, no matter how small the sample, still provide some information, while for large parameter values the prior completely overshadows the information provided by the statistical sample. This sensitivity analysis shows how important the selection of the prior distribution is in our case, but it does not quite answer the question of which Beta distribution parameters should be selected. We recommend using the parameters (1,1), in other words the uniform distribution over the interval [0;1], as the uniform distribution does not give priority to any value in that interval. In fact, the prior information about the parameter in question is spread evenly over the interval, and each value “attracts” the posterior distribution with equal strength. Hence, no value has a dominating influence.
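The qualitative behaviour described above can be reproduced with a deliberately simplified stand-in: a Beta prior combined with a binomial likelihood, for which the posterior mean has a closed form. This is not the likelihood of the paper (there the probability enters through the Poisson rate), and the data below are hypothetical; the sketch only illustrates how large prior parameters swamp a small sample.

```python
def posterior_mean(a, b, successes, trials):
    """Posterior mean of a probability under a Beta(a, b) prior and a
    binomial likelihood (conjugate update)."""
    return (a + successes) / (a + b + trials)


# Hypothetical small sample: 1 incident of the given type out of 4.
grid = [0.5, 1.0, 10.0, 100.0, 500.0]
surface = {(a, b): posterior_mean(a, b, 1, 4) for a in grid for b in grid}

uniform_case = surface[(1.0, 1.0)]      # Beta(1, 1), the uniform prior
strong_prior = surface[(500.0, 500.0)]  # prior dominates the data
```

With the uniform prior the posterior mean stays close to the data, while with a = b = 500 it is pulled almost entirely to the prior mean of 0.5.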

[Figure: logarithm of the failure rate trend by year (1970 to 2010), with bounds under the Uniform prior, Haar measure and Jeffreys prior]

[Figure: Posterior expectation sensitivity to prior distribution; axes a and b (0 to 500), vertical axis 0 to 1]

The widest intervals are obtained under the uniform hyper-prior distribution; hence it is the least informative distribution. However, the bounds in this case are extremely wide (the figure is in log-scale) and clearly unreasonable: any experienced expert would discard such bounds as carrying no useful information. Hence, using such subjective knowledge, we discard the uniform hyper-prior distribution from our further analysis and turn to more informative ones. The upper bounds for the Haar measure and Jeffreys hyper-priors are still very high, reaching 107 and 26 incidents per 1000 km of pipeline, respectively. We again use subjective judgement and select for our analysis the most informative of the three priors considered, the Jeffreys prior. This distribution represents the most realistic situation of failure rates in pipeline networks: the highest observed rate is 1.61 incidents per 1000 km in the OPS database, so 107 would clearly be too high even for the credibility bounds. In the Jeffreys case we also observed good coverage of the incident frequencies plotted from all data samples. Hence, the validity of this prior model is supported as well.
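The effect of prior choice on upper credibility bounds can be illustrated on a much simpler problem than the hierarchical model used here. The sketch below is a toy under stated assumptions, not the hyper-prior analysis of this paper: it takes a single Poisson failure rate, a hypothetical count `k` of incidents over a hypothetical `exposure` (in units of 1000 km-years), and compares a flat prior with the Jeffreys prior for a Poisson rate. Both lead to a Gamma posterior, so the upper bound can be estimated by plain Monte Carlo sampling.

```python
import random


def upper_credible_bound(k, exposure, prior_power, level=0.995, n=50_000, seed=0):
    """Monte Carlo upper bound for a Poisson rate.

    Prior:     p(rate) proportional to rate ** (prior_power - 1)
    Posterior: Gamma(shape = k + prior_power, rate = exposure)
    """
    rng = random.Random(seed)
    shape = k + prior_power
    # random.gammavariate takes (shape, scale), where scale = 1 / rate.
    draws = sorted(rng.gammavariate(shape, 1.0 / exposure) for _ in range(n))
    return draws[int(level * n)]


# prior_power = 1 gives a flat prior; 0.5 gives the Jeffreys prior for a Poisson rate.
PRIOR_POWERS = {"flat": 1.0, "jeffreys": 0.5}

# Hypothetical data: 3 incidents observed over 2 units of exposure (1000 km-years).
bounds = {name: upper_credible_bound(k=3, exposure=2.0, prior_power=a)
          for name, a in PRIOR_POWERS.items()}

for name, value in bounds.items():
    print(f"{name:9s} upper bound: {value:.2f} incidents per 1000 km-year")
```

Under these assumptions the flat prior yields the larger upper bound, which mirrors, in a qualitative way only, the observation that the uniform hyper-prior produces the widest intervals.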

We now turn to inference for the Lithuanian pipeline network. Due to the small size of the network and the incident criterion used until 2004, the data are quite scattered and inferences based on them alone would be questionable. Until now, little could be done about this: at best, those few data could be translated into a failure frequency estimate and qualitatively compared with international experience, to validate that network reliability is no worse or better than the general level.

Figure 10 Inferred and predicted failure rates for simple CDP and hierarchical model for Lithuanian gas

transmission network

When comparing the estimated trend (see Figure 10, inferred zone), the hierarchical model generally provides wider credibility bounds than the non-hierarchical variant of the CDP model (which considers only the Lithuanian case). This is because the hierarchical structure of the model incorporates and quantifies an additional level of uncertainty, i.e. variation across different databases is now accounted for as well. In addition, the two data points that were collected under the new incident criterion (since 2003) are less underestimated by the hierarchical model.

On the other hand, when we predicted the failure rate for the next 40 years (see Figure 10, predicted zone), uncertainty about the future failure rate increased significantly for CDP, whereas under the hierarchical model the prediction stays, for a while, governed by the general decreasing trend. This is because the additional information from the various databases, the same information that gave rise to wider uncertainty bounds in the inference part, now makes future predictions more certain (as it involves a clearly revealed decreasing trend). In other words, we are more informed about the future state of network reliability than we would be if we had used only the small Lithuanian pipeline network data sample.
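The two effects described above, wider bounds in the inferred zone but tighter bounds in the predicted zone, can be reproduced with a toy calculation. The sketch below assumes a log-linear trend, log(rate) = level + slope * t, with independent normal uncertainties on the level and the slope; this functional form and every number in it are illustrative assumptions, not values from the paper. The hypothetical hierarchical setting carries a larger level uncertainty (between-database variation) but a smaller slope uncertainty (the trend is estimated from more data).

```python
import math


def log_rate_sd(horizon_years, level_sd, slope_sd):
    """Std. dev. of log(rate) `horizon_years` after the last observation,
    assuming log(rate) = level + slope * t with independent uncertainties."""
    return math.sqrt(level_sd ** 2 + (horizon_years * slope_sd) ** 2)


# Hypothetical uncertainties (made up for illustration only).
LOCAL = {"level_sd": 0.30, "slope_sd": 0.050}         # one small data sample
HIERARCHICAL = {"level_sd": 0.45, "slope_sd": 0.015}  # extra spread, sharper trend


def crossover_horizon(a, b):
    """Horizon (years) beyond which setting `b` is less uncertain than `a`."""
    return math.sqrt((b["level_sd"] ** 2 - a["level_sd"] ** 2)
                     / (a["slope_sd"] ** 2 - b["slope_sd"] ** 2))


for h in (0, 10, 40):
    print(h,
          round(log_rate_sd(h, **LOCAL), 3),
          round(log_rate_sd(h, **HIERARCHICAL), 3))
print("crossover after", round(crossover_horizon(LOCAL, HIERARCHICAL), 1), "years")
```

With these made-up values the hierarchical setting is more uncertain at the last observed year, becomes the more certain one after roughly seven years, and remains so over a 40-year horizon.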


Figure 11 Reliability prediction for the next 20 years for 1000 km of pipeline length. Filled areas represent 99 % credibility bounds under each model: the striped region is for CDP, while the solid region is for hierarchical CDP

We also predicted the reliability function for the next few years for 1000 km of the Lithuanian pipeline network (see Figure 11). The differences between the hierarchical and non-hierarchical variants of CDP are remarkable. The 99 % credibility bounds for the hierarchical model are narrower, while for the non-hierarchical model these bounds are probably too wide to make any long-term predictions. Using just a simple Poisson model (without incorporating information about the criteria) or the regular CDP model to predict reliability for the Lithuanian gas transmission network would lead to underestimation of the failure rate or, equivalently, overestimation of the true level of reliability.
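Reliability over a horizon t, for a fixed pipeline length, is the probability of observing no incident in that period. As a minimal sketch, assuming a constant failure rate (whereas the model in this paper includes a trend) and a Gamma posterior with hypothetical parameters, the reliability of 1000 km of pipeline is R(t) = exp(-lam * t), and a pointwise credibility band follows from posterior draws of `lam`. The function name `reliability_band` and all numbers are illustrative.

```python
import math
import random


def reliability_band(shape, rate, years, level=0.99, n=20_000, seed=1):
    """Pointwise credibility band for R(t) = exp(-lam * t).

    lam ~ Gamma(shape, rate) is the failure rate per 1000 km per year.
    Returns a list of (t, lower, median, upper) tuples.
    """
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale), where scale = 1 / rate.
    lams = sorted(rng.gammavariate(shape, 1.0 / rate) for _ in range(n))
    tail = (1.0 - level) / 2.0
    lam_lo = lams[int(tail * n)]
    lam_med = lams[n // 2]
    lam_hi = lams[int((1.0 - tail) * n) - 1]
    # R(t) decreases with lam, so the high rate gives the lower reliability bound.
    return [(t,
             math.exp(-lam_hi * t),
             math.exp(-lam_med * t),
             math.exp(-lam_lo * t)) for t in years]


YEARS = [0, 5, 10, 20]
# Two hypothetical posteriors with similar means but different spread.
informed = reliability_band(shape=40.0, rate=100.0, years=YEARS)
vague = reliability_band(shape=2.5, rate=6.0, years=YEARS)

for (t, lo, _, hi), (_, vlo, _, vhi) in zip(informed, vague):
    print(f"t={t:2d}  informed: [{lo:.3f}, {hi:.3f}]  vague: [{vlo:.3f}, {vhi:.3f}]")
```

The vaguer posterior produces a much wider band at every positive horizon, which is the qualitative pattern described for the non-hierarchical model; the actual bands in Figure 11 come from the full models, not from this simplification.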

6. DISCUSSION AND CONCLUSIONS

In this paper we have provided a short review of international natural gas pipeline network incident databases, their differences, and the difficulties that arise when drawing inferences from the data they contain.

The most important result of this paper is a new model (together with its extension to a hierarchical framework), which we called the Criterion-Dependent Poisson (CDP) model. Its purpose is to properly incorporate all available data, regardless of the incident registration criterion. We have demonstrated that it provides a coherent way to handle changes in the data collection criterion and to use this information to obtain more correct inferences on the current state of the gas network as well as more certain predictions.

At first sight our model might seem to be of limited use when the number of different criteria is high compared to the size of the data sample that they represent. However, Bayesian analysis, in contrast to the frequentist framework, is able to handle such issues and provide reasonable conclusions. Moreover, a high number of criteria is not expected in any database, whether international or local (e.g. representing one country), since a change of criteria reflects a change in legislation, which, as international experience shows, does not occur very often.

In our model development we said nothing about the dynamics of the probability vectors; we treated them as time-independent. Theoretically, there is no difficulty in imposing some functional form on them with dependence on a time covariate. However, it is not clear what forms might be assumed and whether this is needed at all. Hence, further testing and probably some in-depth theoretical analysis would be needed to answer this.


In addition to presenting the CDP model, we extended it to a hierarchical framework, which allows information from various databases to be used. This makes it possible to quantify an additional level of uncertainty, arising from variation in pipeline operating conditions and maintenance programs. Through the hierarchical model (HCDP), information from one database is supported by information contained in other databases. However, such information sharing does not overshadow the probabilistic evidence already contained in the single database.

We demonstrated our CDP method on the OPS database sample, while the hierarchical extension HCDP was applied to assess the reliability level of the Lithuanian pipeline network. In the case of the OPS data sample, CDP made it possible to use all collected information about incidents in the North American pipeline network. Curiously, this increased amount of information did not result in more informed inferences (significantly lower uncertainty) about the current state of OPS gas grid reliability, but when we made a forecast for the next 40 years, the uncertainty about the future value differed significantly. Hence, owing to its ability to incorporate information collected under various incident criteria, our model predicted future reliability with more certainty. As a result, it provides a way to make more informed decisions, e.g. when planning maintenance.

The hierarchical extension of CDP was used for the Lithuanian pipeline grid because of the small size of the statistical sample. In addition, as already presented at the beginning of this paper, there is one time point at which the incident criterion was changed. Under the new criterion there are just two data points, which appear as abnormalities relative to the previous ones, and no valuable inferences could be made from them alone. Hence, CDP allows incorporating information collected under the older criterion, while HCDP allows supporting the Lithuanian database sample with information contained in other international databases. We showed that the reliability of the Lithuanian pipeline grid can now be predicted with more certainty as well. Information borrowed through the hierarchical model structure enables more informed future predictions compared to the case when only local, country-specific but sparse information and CDP were used. In addition, the hierarchical CDP model showed a better fit to the data points observed after the change of criteria and hence provides a more accurate way of predicting the failure rate.

Implementation of the hierarchical CDP model presented some difficulties with regard to prior distribution selection. We demonstrated how sensitive posterior results can be to the prior distribution of the between-database variation parameters as well as to the prior distribution of the probability vectors. However, we gave practical advice on how to proceed in these cases and how to obtain more accurate posterior results.

7. ACKNOWLEDGEMENTS

This research was partially funded by the grant (No. ATE-04/2012) from the Research Council of Lithuania.

