
Asset Replacement Strategies in Ageing Grids: Periodic Maintenance vs. Condition Based

Robert Ross
Performance of HV Energy Systems, Delft University of Technology, Delft, Netherlands, rob.ross@tudelft.nl
Reliability & Asset Management Department, IWO – Institute of Science and Development, Ede, Netherlands, r.ross@iwo.nl

ABSTRACT
This paper discusses the background of the transition to modern asset management. The Corrective, Period Based, Condition Based and Risk Based maintenance styles are discussed together with the circumstances in which they are applicable. Redundancy plays an important role in grids to safeguard security of supply. With the tendency to delay replacement and to harvest as much operational life from assets as possible, this redundancy is also pushed to its limits. The paper pays particular attention to the loss of redundancy quality if the service life of assets is prolonged. An important conclusion is that even in the absence of failures an unwanted situation may build up in which redundancy is no longer effective. It is recommended to develop strategies for strengthening redundancy. A measure for redundancy quality is presented.

KEYWORDS: Asset Management, Corrective Maintenance, Period Based Maintenance, Condition Based Maintenance, Risk
Based Maintenance, Redundancy, Repair, Replacement, Health Index, Combined Health Index, Risk Index, Redundancy
quality

1 INTRODUCTION

Utility grids can consist of large numbers of components and many connections that together form the network that supplies energy. In the nineteenth century electric light and power supply started out as separate installations, and in the early 1880s the first public or multi-client grids appeared in various countries almost simultaneously. In the twentieth century the electricity grids in Europe became state owned or controlled and were regarded as a public service to be provided at almost any cost. The grids tended to be overdesigned, over-serviced and often deployed in a national context. By the end of the twentieth century energy was recognized as strategic, but not at any price: energy costs had become a significant part of production costs and had to be better controlled in order to face international competition. Therefore, in addition to reliability, cost-efficiency became a high priority.
Components and connections in the grid must meet the requirements of power supply. As most grid assets wear due to mechanisms driven by temperature, electromagnetic fields, mechanical forces and (other) ambient conditions like salt fog and acidity, preventive and corrective maintenance actions are often necessary. Ultimately, replacement will be necessary. Costs, human resources, materials and planned outages represent efforts and risks for utilities, but not undertaking these efforts also entails risks. Therefore maintenance strategies require due consideration. The present paper discusses several maintenance strategies and the conditions that favor each approach. The paper leans on statistical techniques which are largely discussed in [1], but even if a lack of data makes data analysis impossible, the principles may still be applicable and may be used to motivate methods to repair or replace assets.

2 BACKGROUND OF PRESENT ASSET MANAGEMENT

In Europe the need for cross-border electricity trade grew as a way to secure energy supply and cost-efficiency. This
internationalization was one of the consequences of the drive towards securing energy supply, keeping energy affordable,
protecting the environment, reducing climate change and improving electricity grids. The European Community policy was
summarized in the phrase: sustainable, secure and affordable energy.
This policy had far-reaching consequences. The traditional electricity grids were organized as a chain consisting of power plants connected to a high-voltage transmission network and a distribution network, while the sale of electricity to end-users such as industry and consumers also belonged to the utility service. By the end of the twentieth century this utility chain was broken up, and electricity production and sales were privatized (see Fig. 1).


Figure 1: Breaking up the electricity supply chain. Production and sales (supply) are privatized, whereas transmission and distribution remain monopolies under the supervision of a government-installed Regulator. The transition that ended state monopolies on electricity supply was accompanied by the implementation of modern asset management.


Grids are so capital intensive and have such an impact on spatial planning that neither the investments nor the rights of way were available to build competing parallel infrastructures. As a consequence there is usually just one electric grid, shared by all providers, and the network utilities kept a monopoly on electricity transmission and distribution. In pursuit of cost-efficiency and affordability, many countries installed a Regulator to supervise and set market prices and quality standards. This ensured that the grid owners/operators could not drive up the prices of their transmission and distribution services.
Although the utilities kept a physical monopoly, they were compared with world-wide best practice and faced penalties if underperforming through either poor power quality or excessive expenses. Precise tasks and formulations may differ per country, but typically the responsibilities of Distribution System Operators (DSOs) and Transmission System Operators (TSOs) comprise maintaining an electricity network and providing connections between producers and consumers. Furthermore, TSOs have the task of maintaining the balance between supply and demand of electric power.
In this context the asset management of utilities aims at providing a resilient, secure and cost-efficient infrastructure. This leads to solutions that differ from those of most of the twentieth century, when the technical quality of reliable electrical energy supply as a public service faced far fewer constraints on financial and workforce resources. In addition to Corrective Maintenance (CM) and Period Based Maintenance (PBM), new maintenance styles like Condition Based Maintenance (CBM) and Risk Based Maintenance (RBM) emerged, supported by tools like diagnostics, condition monitoring, the health index and the risk index. CBM focuses on the functionality of grid components. RBM also takes into account the consequences of dysfunctionality, measured in terms of a set of business values like safety, power quality, security of supply etc. However, CBM and RBM are neither always more feasible nor always more cost-efficient than conventional styles like CM and PBM. A brief overview of the styles is given below in Section 3.1. For further reading, a more detailed introduction and discussion of the pros and cons of these styles is given in Section 1.3.5 of [1].


3 ASPECTS OF MAINTENANCE INCLUDING REPLACEMENT

The physical grid components are the tangible assets that together shape the network infrastructure. Asset Management (AM) is the collective term for the structured decision-making and execution of plans to reach an optimized balance between performance, efforts and risk in the utilization of the assets. An AM system is an organized set of systematic and coordinated activities to stay in control. Standards have been developed for common practice in AM, such as PAS 55 [2] and the ISO 55000 standards [3], [4], [5]. An important part of asset management is the selection of maintenance style(s). Maintenance is defined here to comprise the activities of inspection, servicing and replacement.
The longer components can be utilized, the more value is retrieved. However, the more costly the maintenance, the less profitable the delay of investments. Moreover, if the functionality is jeopardized and failures occur with possibly significant consequences, then business values may be violated severely. Liability and mitigation costs as well as reputation damage may rise disproportionately and wipe out all benefits of prolonging the operational life of ageing assets.
How should asset management contribute to sustainable, secure and affordable energy? This paper discusses some of the dilemmas in optimizing maintenance and configuration. First, some basic aspects of maintenance and replacement are discussed to set the context and definitions. Next, the information is applied to diagnosable assets, and finally the role of redundancy is studied.

3.1 MAINTENANCE ACTIVITIES AND STYLES



Maintenance of the grid and components can be defined in various ways. Here maintenance is subdivided into three
classes of activities [1]:
• Inspections: activities or monitoring to assess the functioning and condition of assets
• Servicing: activities to restore or enhance the functioning and condition of assets, usually aiming to prolong the
asset lifetime
• Replacement: removing an asset and installing a (usually new) asset.

The four maintenance styles mentioned above can be applied to each of the three activity classes. Fig. 2 shows how the maintenance styles respond to the so-called hazard rate h, which is a measure of the likelihood that a working asset will fail in the near future.
The four styles can be described as follows:
• Corrective Maintenance (CM): taking action to restore the required functionality after a malfunction has been noted. This usually means repair (i.e. servicing) or replacement. An advantage is that the full lifetime of the asset is consumed; a disadvantage is that failure may come as a surprise and flexibility is required to respond adequately to possibly hazardous failure events. Emergency situations are often more costly to solve than planned actions.
• Period Based Maintenance (PBM): preventive action (inspection, servicing, replacement) that is planned based on a period, which need not be a time period but may also be expressed in running hours, number of switching actions etc. The advantage of PBM is that it allows efficient planning of resources and efforts, and it prevents failures. A disadvantage is that the planning is mainly based on the weakest assets in the batch and is therefore not optimized for cost-efficiency. PBM tends to promote excessive maintenance and replacement costs, but the gain is that the utility remains in control. A method for optimizing the cost-efficiency of PBM is described in [6] and Section 9.1.1 of [1].
• Condition Based Maintenance (CBM): preventive action based on the perceived functional condition of the asset. This requires diagnostic methods to assess the condition and knowledge rules to interpret the diagnostic results in terms of remaining life before breakdown or a possibly required reduction of the load. The advantage of CBM is that maintenance is tuned to the individual asset. The disadvantage is the required flexibility, as the utility has to respond to the events that take place. How much time is left for mitigation depends on the type of the relevant process, on the availability of adequate diagnostics, on the adequacy of the knowledge rules for interpreting diagnostic results and on the moment of evaluation.
• Risk Based Maintenance (RBM): preventive action based firstly on the perceived functional condition of the asset and secondly on the extent to which a failure event or malfunction can violate the corporate business values (such as safety, liability, financial impact, etc.). RBM is not necessarily more cost-efficient than CBM, but aims at a better overall performance on the business values involved, which may or may not be monetized (for further reading, see e.g. Section 9.1.4.2 in [1]).



[Figure 2a: Hazard rate for maintenance style CM. Note: the hazard rate can rise without limit.]
[Figure 2b: Hazard rate for PBM with two periods or CBM with two maximum hazard rate levels, compared to CM.]

Figure 2a and 2b: Comparison of maintenance styles by their hazard rates. The coarser vertical scale for h(t) in 2a shows the ongoing development of the hazard rate for CM. PBM is controlled by time, CBM by the maximum allowable hazard rate. The RBM style resembles CBM. In all cases a Weibull distribution with α = 80 yr and β = 3 was assumed in the present example. Whereas CM leads to high hazard rates and likely failure, the styles PBM and CBM (theoretically) bring the asset back into an as-new state, so that the overall hazard rate can be kept low, which results in a significantly longer expected life. The difference between PBM and CBM is that PBM uses the same maintenance period T for all assets, whereas CBM tunes the maintenance to the needs of the individual asset.
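The hazard-rate curves of Fig. 2 can be sketched in a few lines of Python. This is a minimal illustration, assuming the Weibull parameters of the caption (α = 80 yr, β = 3), assuming that every PBM or CBM action ideally restores the asset to an as-new state, and assuming a single deterministic hazard curve; the function names are hypothetical.

```python
# Sketch of the Fig. 2 hazard rates under idealized (as-new) maintenance.
ALPHA = 80.0  # Weibull scale parameter [yr], from the Fig. 2 caption
BETA = 3.0    # Weibull shape parameter [-], from the Fig. 2 caption


def weibull_hazard(age, alpha=ALPHA, beta=BETA):
    """Hazard rate h [1/yr] at a given age: (beta/alpha) * (age/alpha)**(beta-1).
    Under CM the age is simply the time since commissioning."""
    return (beta / alpha) * (age / alpha) ** (beta - 1)


def pbm_hazard(t, period):
    """Hazard rate under PBM, assuming the asset is restored to as-new
    condition at every multiple of `period`, so its effective age resets."""
    return weibull_hazard(t % period)


def cbm_age(h_max, alpha=ALPHA, beta=BETA):
    """Age at which the hazard rate reaches the allowed maximum h_max.
    For one deterministic hazard curve a CBM threshold h_max acts like a
    PBM period of this length; for real assets it differs per asset."""
    return alpha * (h_max * alpha / beta) ** (1.0 / (beta - 1))


if __name__ == "__main__":
    for t in (10, 30, 50, 70, 90):
        print(f"t = {t:2d} yr   CM: {weibull_hazard(t):.5f}/yr   "
              f"PBM (T = 40 yr): {pbm_hazard(t, 40):.5f}/yr")
    print(f"CBM with h_max = 0.01/yr acts at age {cbm_age(0.01):.1f} yr")
```

Resetting the age with `t % period` mirrors the "as-new (theoretically)" assumption in the caption; imperfect maintenance would require a different model of the effective age.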


Though the required technology and information may differ between the styles, it cannot be claimed that one style is generally better or more efficient than the others. Which style is more adequate depends on the purpose, on the balance between operational expenses and capital expenses (i.e. investment) and on the circumstances. As an example, pressurized oil cables can be periodically checked for oil pressure (PBM inspections) and the oil may be replenished if necessary (CM/CBM servicing). As another example, many polyethylene insulated cables are typically not inspected at all, but are repaired after a failure (CM servicing), and if forensics indicate a need, the cable may be replaced (CBM/RBM). As a third example, utilities may use periodic replacement schemes, e.g. replacing cables after 40 or 50 years (PBM replacement). This sacrifices part of the remaining operational life of the cable, but may save on costs for emergency repairs, and PBM allows much easier planning. So, in practice utilities can have various reasons to apply a mix of these styles [1].

3.2 RELIABILITY, AVAILABILITY, REDUNDANCY AND REPARABILITY



Reliability R is related to the likelihood that a system can fulfill its function. Assets are supposed to work at commissioning, and their operational life ends with failure (or removal for other reasons). With inspections and servicing it may be possible to extend this operational life by employing the maintenance styles PBM or CBM. The reliability R is equal to 1 minus the probability of failure F, i.e.

$$R = 1 - F \qquad (1)$$
However, some systems can be repaired after failure. This may not hold for the individual components, but it usually does for a circuit, connection or grid, which is repaired by replacing the faulted component. Reparable systems alternate between a working and a failed state: there are periods in which the system functions and periods in which it has failed and is under repair. The availability A is defined as the ratio of the uptime Tup (from becoming available until failure) to the total time T, which is the sum of Tup and the downtime Tdown (from failure until the asset is back in operation), i.e.:

$$A = \frac{T_\mathrm{up}}{T_\mathrm{up} + T_\mathrm{down}} \qquad (2)$$
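As a short worked step linking Eq. (2) to the rate-based results below: if the mean up time is 1/λ and the mean repair (down) time is 1/μ, the long-run availability of a single reparable circuit follows from Eq. (2) by substituting these mean times:

```latex
A_{\infty} = \frac{\overline{T}_\mathrm{up}}{\overline{T}_\mathrm{up} + \overline{T}_\mathrm{down}}
           = \frac{1/\lambda}{1/\lambda + 1/\mu}
           = \frac{\mu}{\mu + \lambda}
```

This is the same expression as Eq. (4) for the single circuit.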

One certainty is that any event that can happen will happen someday, however unexpected it may seem. In grids, in addition to the maintenance styles above, this is usually taken into account by two measures:
• Redundancy: installing a connection as a double circuit (or with higher redundancy), as in Table 1, prevents a failure in a single circuit from taking down the connection. Of course, the system is less redundant, or not redundant at all, after a failure. Moreover, a failure in the second circuit before repair of the first takes down the connection.
• Fast repair: if a system is reparable after failure, faster restoration of the functionality (a higher correction or repair rate) reduces the chance of a simultaneous circuit malfunction and increases the availability of the grid.
Combining redundancy and fast repair/replacement is common practice in strategic infrastructures. The effect of this
approach can be calculated with so-called Markov chains. It is beyond the scope of this paper to discuss the mathematics,
but a typical result is summarized here. For further reading in relation to AM and specific conditions, see e.g. Sections 8.6
and 8.7 in [1].
First consider a connection without redundancy, i.e. consisting of a single circuit. Assume that a failure rate λ and a repair rate μ apply. The mean time between failures (MTBF1) of the circuit is:

$$\mathrm{MTBF}_1 = \frac{1}{\lambda} \qquad (3)$$

In the long run, the availability A∞,1 of the circuit is:

$$A_{\infty,1} = \frac{\mu}{\mu + \lambda} \qquad (4)$$

Now consider the case of a connection consisting of a double circuit. Assume that each circuit has a failure rate λ and a repair rate μ (single repair only, i.e. one circuit is repaired at a time), and that each circuit is able to carry the full load. Then the mean time between failures (MTBF2), i.e. between events of a simultaneous outage of both circuits, is (note that usually μ >> λ):

$$\mathrm{MTBF}_2 = \frac{2\lambda + \mu}{2\lambda^2} \qquad (5)$$

In the long run, the availability A∞,2 of this connection is:

$$A_{\infty,2} = \frac{\mu^2 + 2\lambda\mu}{\mu^2 + 2\lambda\mu + 2\lambda^2} \qquad (6)$$
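The Markov-chain results in Eqs. (3) to (6) can be checked with a small birth-death chain model. This is a sketch under the assumptions stated above (identical circuits, constant failure rate λ per circuit, a single repair at a time with rate μ, each circuit able to carry the full load); the generalization to n parallel circuits and all function names are illustrative additions, not taken from [1]. Exact fractions are used so that the closed-form expressions can be compared without rounding.

```python
from fractions import Fraction


def failure_rates(n, lam):
    """Rate of going from k to k+1 circuits down, for k = 0..n-1:
    each of the (n - k) working circuits can fail with rate lam."""
    return [(n - k) * lam for k in range(n)]


def availability(n, lam, mu):
    """Long-run availability of n parallel circuits (connection up while
    at least one circuit is up), with a single repair at a time."""
    weights = [Fraction(1)]  # unnormalized steady-state probabilities of states 0..n
    for rate_up in failure_rates(n, lam):
        # Balance between neighbouring states k and k+1 of a birth-death chain:
        # pi[k] * (n - k) * lam = pi[k + 1] * mu
        weights.append(weights[-1] * rate_up / mu)
    return 1 - weights[-1] / sum(weights)


def mtbf(n, lam, mu):
    """Mean up time between two outages of the connection, i.e. the mean time
    to go from state n-1 (one circuit just repaired) to state n (all down)."""
    rates = failure_rates(n, lam)
    t = 1 / rates[0]  # mean time from state 0 to state 1
    for k in range(1, n):
        # First-step analysis: T_k = 1/rate_k + (mu/rate_k) * T_{k-1}
        t = 1 / rates[k] + mu / rates[k] * t
    return t


lam, mu = Fraction(1, 20), Fraction(73, 2)  # 1/20 per yr and 36.5 per yr
for n in (1, 2, 3):
    print(n, float(mtbf(n, lam, mu)), float(availability(n, lam, mu)))
```

For n = 1 and n = 2 the functions reduce exactly to Eqs. (3) to (6); the n = 3 row only illustrates how a third circuit would extend the MTBF under the same idealized assumptions and is not a case evaluated in the paper.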

To make this more practical and for comparison, assume that the circuit fails once per 20 years (λ = 1/20 yr⁻¹) and that a typical repair brings the asset back into service in 10 days (μ = 0.1 d⁻¹ = 36.5 yr⁻¹). Table 1 compares the MTBF and A∞ of the single circuit and of the redundant connection of two parallel circuits. The results show that the MTBF of the redundant connection is 366 times that of the single circuit. Likewise, the unavailability 1-A∞ of the single circuit is several hundred times larger than that of the redundant connection.
The single circuit provides an average availability of more than 99%. In common language a percentage of more than 99% is often regarded as satisfactory, but for a strategic service such as power supply it is generally regarded as insufficient. In a grid with possibly thousands of connections, each out of service for half a day per year (i.e. the 12 hr in Table 1), many corrective actions would be required and considerable damage would probably be done to society and the economy. It is not uncommon to require MTBFs in the range of thousands of years, e.g. exceeding 5000 yr.
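A minimal sketch that evaluates Eqs. (3) to (6) in plain floating point with the example rates above (λ = 1/20 yr⁻¹, μ = 36.5 yr⁻¹) and converts the unavailability into annual outage hours, assuming 8760 hours per year. The 5000 yr requirement is the example mentioned in the text; the names are hypothetical.

```python
HOURS_PER_YEAR = 8760.0  # assumption: 365-day year
MTBF_TARGET = 5000.0     # example requirement from the text [yr]


def single_circuit(lam, mu):
    """MTBF [yr] and long-run availability of one circuit, Eqs. (3) and (4)."""
    return 1.0 / lam, mu / (mu + lam)


def double_circuit(lam, mu):
    """MTBF [yr] and long-run availability of two parallel circuits with a
    single repair at a time, Eqs. (5) and (6)."""
    mtbf = (2 * lam + mu) / (2 * lam ** 2)
    avail = (mu ** 2 + 2 * lam * mu) / (mu ** 2 + 2 * lam * mu + 2 * lam ** 2)
    return mtbf, avail


lam = 1.0 / 20.0  # failure rate per circuit [1/yr]
mu = 36.5         # repair rate [1/yr]: 0.1 per day, i.e. 10 days mean repair time

for name, (mtbf, avail) in (("Single circuit", single_circuit(lam, mu)),
                            ("Redundant connection", double_circuit(lam, mu))):
    outage_hr = (1.0 - avail) * HOURS_PER_YEAR
    print(f"{name:21s} MTBF = {mtbf:6.0f} yr, A = {100 * avail:.4f} %, "
          f"outage = {outage_hr:.2f} hr/yr, meets target: {mtbf > MTBF_TARGET}")
```

After rounding, the printed values reproduce the entries of Table 1 (20 yr vs. 7320 yr, about 12 hr vs. 0.03 hr of outage per year), and only the redundant connection meets the 5000 yr example requirement.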

Table 1: Comparison of MTBF and A∞ of a single circuit and a redundant connection (two parallel circuits)

Entity \ Case | Single circuit | Redundant connection
Block diagram (see [1], Ch. 7) | [diagram] | [diagram]
States (see [1], Ch. 8) | S0: circuit up, connection up; S1: circuit down, connection down | S0: both circuits up, connection up; S1: 1 circuit down, connection up; S2: 2 circuits down, connection down
State diagram applied in the present case (see [1], Ch. 8) | [diagram] | [diagram]
Mean Time Between Failures MTBF [yr] | 20 | 7320
Availability A∞ [%] | 99.86 | 99.9996
Unavailability 1-A∞ [%] | 0.14 | 0.0004
Annual outage time [hr] | 12 | 0.03

Here: λ = 1/20 yr⁻¹; μ = 36.5 yr⁻¹


To avoid misunderstandings, an MTBF > 4000 yr does not mean that such assets are expected to have an operational lifetime τ > 4000 yr, but that an asset in that condition would on average fail at most once every 4000 yr. In other words, if there are 4000 assets in that condition, then it is expected that no more than 1 of them will fail in 1 year. If the asset ages and has a much higher failure probability after e.g. 40 years, its MTBF shrinks during that wear-out phase.
For utilities it is a challenge to find an optimum in asset management. A high degree of redundancy can be very
effective in building a reliable grid, but it is also costly to double or even triple the investments of a single, non-redundant
solution. Most regulators do require the utilities to apply a certain level of redundancy (particularly for heavier
connections) because of the strategic importance of electricity supply.
Fast repair touches on logistics, standardization of components and techniques, strategic storage of spares, multiple suppliers of services and components, maintenance contracts etc.

3.3 SYSTEM AND/OR CIRCUIT RELIABILITY AND AVAILABILITY



How should the utility performance and its efficiency in replacement be evaluated? What is the ground for
replacement strategies: the reliability and availability of the (usually redundant) system or of each of the circuits?
The primary task of the TSO and DSO is to transmit and distribute electric power. However, as discussed in Section 2, regulators also set boundary conditions such as the EU policy of sustainable, secure and affordable energy. The presence of redundancy allows a failure to occur without interrupting power supply (provided the reserve does not fail during the repair period). This seems to imply that redundancy facilitates the maintenance style CM. In principle it does, but considering that business values other than uninterrupted service also apply, redundancy should not be associated too strictly with CM. For instance, if an asset like a termination or transformer fails, this can be a violent event involving an explosion and/or fire. This may endanger human life and health as well as cause significant damage to other assets and the substation, taking out other assets as collateral damage. Therefore redundancy should not become an excuse for careless operation.
In order to keep track and rank priorities, many utilities employ Health Index methods (see Fig. 3). The Health Index (HI) is a score for the health (i.e. condition) of assets. The HI indicates the perceived hazard rate or remaining life of an asset. With CBM it is used as the basis for decisions on maintenance actions, including replacement.
The HI is generally applied to single assets. It would make sense to extend this methodology to a Combined HI (CHI) at the level of circuits, connections, substations, subgrids etc. in order to prioritize and coordinate maintenance. One way to achieve this is to link the HI to the hazard rate, since the hazard rate can be calculated for various system configurations and repair actions (for further reading, see [1], Section 9.2.2). In this way not only assets but also the reliability and availability of circuits can be evaluated and scored.

Figure 3: The concept of the Health Index [6]. Static and dynamic data are collected, on the basis of which the Health Index is estimated as a condition score. The static data are the identification and characteristic data; the dynamic data are related to the condition. Here the methodology of TenneT TSO is shown. However, the HI is not yet standardized, and various score ranges and color schemes are in use. Nevertheless, scoring assets on their condition is a commonly shared idea.

In an RBM approach the risks to utility business values are leading in prioritizing maintenance actions. The probability of a failure event occurring is multiplied by the impact that such an event may have. This maintenance style provides a strategy to take threats to safety and other hazards into account. For the evaluation of risks, so-called risk matrices (or, in the continuous variant, the risk plane) lead to a Risk Index. The risk that a redundant system goes down is part of the evaluation under the business value security of supply.
In the following, some examples are elaborated to compare the strategies in practice. First, situations are studied in which the condition can be diagnosed (whether economically efficient or not). Secondly, redundancy is studied when little to no condition information is available; in that case the situation is analyzed by means of assumed distributions.

4 DIAGNOSABLE ASSETS

If the condition of an asset can be assessed with sufficient confidence, the three maintenance styles CM, PBM and CBM are all feasible in principle. If the impact of failure is known, RBM is also feasible. The suitability and comparison of CM, PBM and CBM are discussed in [6], while quantitative examples are elaborated in greater detail in Section 9.1 of [1]. When comparing CM and PBM, which of the two maintenance styles is most appropriate depends on the cost and possibilities of preventive maintenance. Both maintenance styles treat all assets the same. CM lets all assets run to failure and takes advantage of the full asset operational life, but the impact of unforeseen failure may be high and flexibility is required to respond adequately to failures. As failures are part of this asset management style, redundancy is commonly employed to warrant continuous power supply. Part of the gain from employing the full asset life may be cancelled by increased investments in redundancy. The need for redundancy may be higher for CM than for PBM, as PBM sacrifices asset life in order to prevent unplanned outages. Emergency situations should occur less often with PBM than with CM.
When comparing CBM and PBM, the largest gain of CBM lies in utilizing information on the individual assets. Fig. 4 (taken from [6]) shows the distribution of failure times of a group of 10 assets. It is assumed that each failure is preceded by a detectable signal that indicates the imminent failure of that particular asset. The failure likelihood per asset is shown by the percentage markers, representing the failure time probability after the first indication of imminent failure. For example, if the green status line in Fig. 4 defines the moment of evaluation, assets i = 1-4 have already signalled that failure is imminent (i = 1, 2 may have failed already), while assets i ≥ 5 have not yet given an alert. This is the essence of CBM, which allows timely planning and mitigation for assets i = 1-4. In contrast, if PBM is applied, all assets would be replaced or serviced well before the first one fails (otherwise the management style becomes CM). PBM might therefore dictate replacement at the moment the cumulative distribution reaches e.g. the 1% level at t = 3 (arbitrary units). CBM would allow operation on average until about t = 11 (arbitrary units), so in this case the average lifetime would be more than three times that achieved with PBM. CBM requires adequate diagnostics and adequate expert rules in order to judge the need for mitigation.
CBM techniques and rules often require more advanced technology than CM or PBM and can be more expensive.
Evaluating whether or not such investments are worthwhile is part of asset management.
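The PBM-versus-CBM argument around Fig. 4 can be illustrated numerically. The failure-time distribution of Fig. 4 is not given in the text, so this sketch assumes, purely hypothetically, the Weibull distribution of the Fig. 2 example (α = 80 yr, β = 3). PBM is taken to act on all assets when the cumulative failure probability reaches 1 %; an idealized CBM that replaces each asset just before its own failure can at best use the mean life (the warning lead time is ignored). The function names are hypothetical.

```python
import math

ALPHA, BETA = 80.0, 3.0  # hypothetical Weibull parameters, borrowed from Fig. 2


def weibull_quantile(p, alpha=ALPHA, beta=BETA):
    """Age t at which F(t) = 1 - exp(-(t/alpha)**beta) equals p."""
    return alpha * (-math.log1p(-p)) ** (1.0 / beta)


def weibull_mean(alpha=ALPHA, beta=BETA):
    """Mean time to failure: alpha * Gamma(1 + 1/beta)."""
    return alpha * math.gamma(1.0 + 1.0 / beta)


t_pbm = weibull_quantile(0.01)  # PBM: act on all assets when F reaches 1 %
t_cbm = weibull_mean()          # idealized CBM: upper bound on the average life used
print(f"PBM replacement age: {t_pbm:.1f} yr")
print(f"Idealized CBM average life: {t_cbm:.1f} yr (ratio {t_cbm / t_pbm:.1f})")
```

With these assumed parameters PBM acts at about 17 yr while the mean life is about 71 yr; the ratio depends strongly on the assumed shape parameter β, so it should not be read as a general result.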

Figure 4: Illustration of CBM [6]. See text.




Not all assets allow sufficient diagnosis, and some assets can hardly be diagnosed at all. The choice is then between CM and PBM. Applying redundancy (see Section 3.2) is a well-established method to prevent the interruption of power supply. With an ageing asset population, fear often grows that CM and CBM may lead to large numbers of assets that simultaneously require attention. The question also arises how effective redundancy is with older assets. This is the subject of the next sections.

5 EVALUATION OF REDUNDANCY IN THE ABSENCE OF FAILURE AND DIAGNOSTIC DATA



In recent years quite a number of assets approached or even passed their planned lifetime. Several times a replacement wave was predicted, which has often not materialized as yet. Many assets have been operated well within their original specifications and may indeed live longer than anticipated at commissioning. However, the replacement of assets like cables and transformers has a long lead time, which hinders adequate and timely mitigation if replacement is urgent. Is there nothing to worry about, or is it possible that a replacement wave emerges despite a lack of failures? What are recommendable strategies?
As a warning, assets do exist that did not reach their planned lifetime. A particular danger is that redundant circuits are usually installed at the same time in a single project. The idea of redundancy is that a good cable can fully take over when a parallel cable fails. However, if both cables are of the same age and make, and wear-out becomes significant, do the circuits still provide sound redundancy? Or may a situation arise in which a deteriorated component is the inadequate spare for the failed one? Can such a situation develop without any faults or warnings? How fast can a grid condition deteriorate unnoticed?
If the condition of an asset cannot be assessed and no failures have taken place, but the operational life of the asset is about to exceed its planned lifetime or is already beyond it, then the question is how effective redundancy still is.
In the following, a case is described in which no failure has taken place as yet and adequate diagnostics are not available. The purpose is to check whether it statistically makes sense to regard a redundant circuit without any recent failures as suspect. The following steps are taken in the case evaluation:
a. Define the case of a cable connection and define a repair rate based on utility practice
b. Set criteria for required mitigation based on asset condition levels in terms of perceived hazard rates
c. Set boundary conditions for the ruling statistical distribution and its failure rate based on international experience
d. Evaluate whether the hazard rate of the redundant circuit indeed indicates a need for replacement


5.1 CASE DESCRIPTION

The example case under consideration is a redundant 150 kV cable connection consisting of two parallel SCOF (Self Contained Oil Filled) cable circuits with a length of 1.25 km each. Each single-phase cable features 2 terminations and 3 joints; therefore each circuit consists of 3 single-phase cables, 6 terminations and 9 joints. At the moment of evaluation, the connection has been in service for 56 yr and shows no signs of degradation, nor have any faults occurred. In fact, parts of the cable circuit may be even older, but such information is not fully traceable; for the case an age of 56 yr is adopted. The circuit is believed to have been operated below its rated load throughout its service life. However, power demand is growing and the connection may be operated at its rated load in the near future. Is the lack of failures sufficient to believe that this connection is sound?
Each circuit can fail due to various incidents: a wearing joint, a wearing termination, wearing cable insulation, or a random process unrelated to wear-out (e.g. lightning strikes or cosmic radiation). The circuit survives only if it survives all failure mechanisms. In terms of system reliability theory this is a series system (Ch. 7 of [1]). As a consequence, the circuit hazard rate is the sum of the hazard rates of the respective processes.
Cigré Study Committee B1 regularly evaluates experience with cables world-wide. In order to adopt an objective reference, the average failure rate over a lifetime of 40 years is taken from the work of Cigré WG B1.10 (page 30 in TB 379 [7]). The average failure rates are listed in Table 2 below. As a remark, Cigré WG B1.57 is currently drafting an update in which the failure rates will most likely differ from those in Table 2, so the outcome of the present analysis is likely to differ with the new numbers as well. However, the method as such remains the same. Here TB 379 is used to define a reasonable hazard rate.


Table 2: Failure rates according to Cigré WG B1.10 [7]
Component | Failure rate definition | Failure rate
Cable, internal | number of failures / 100 cable circuit km·yr | 0.014
Cable, external | number of failures / 100 cable circuit km·yr | 0.095
Cable, total | number of failures / 100 cable circuit km·yr | 0.109
Joint, internal | number of failures / 100 components·yr | 0.002
Joint, external | number of failures / 100 components·yr | 0.002
Joint, total | number of failures / 100 components·yr | 0.004
Termination, internal | number of failures / 100 components·yr | 0.005
Termination, external | number of failures / 100 components·yr | 0.009
Termination, total | number of failures / 100 components·yr | 0.014

The failure rate λav of a single circuit is therefore:

$$\lambda_\mathrm{av} = \lambda_\mathrm{cable,av} + \lambda_\mathrm{joint,av} + \lambda_\mathrm{termination,av} \qquad (7)$$

Taking into account the circuit kilometers and the component numbers, the result is:

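$$\lambda_\mathrm{av} = 1.25 \cdot \frac{0.109}{100\,\mathrm{yr}} + 9 \cdot \frac{0.004}{100\,\mathrm{yr}} + 6 \cdot \frac{0.014}{100\,\mathrm{yr}} = 0.00256\,\mathrm{yr}^{-1} \qquad (8)$$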
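The series-system sum of Eqs. (7) and (8) can be reproduced with a short sketch that uses the "total" rates of Table 2 and the component counts of Section 5.1; the function and constant names are hypothetical.

```python
# "Total" average failure rates from Table 2 (Cigre WG B1.10, TB 379)
CABLE_RATE = 0.109 / 100        # failures per cable circuit km per year
JOINT_RATE = 0.004 / 100        # failures per joint per year
TERMINATION_RATE = 0.014 / 100  # failures per termination per year


def circuit_failure_rate(length_km, n_joints, n_terminations):
    """Average failure rate [1/yr] of one circuit. The circuit is a series
    system (it fails if any part fails), so the part failure rates add up."""
    return (length_km * CABLE_RATE
            + n_joints * JOINT_RATE
            + n_terminations * TERMINATION_RATE)


# Case of Section 5.1: 1.25 km, 9 joints and 6 terminations per circuit
lam_av = circuit_failure_rate(1.25, 9, 6)
print(f"lambda_av = {lam_av:.5f} per yr (about one failure per {1 / lam_av:.0f} yr)")
```

The cable length contributes roughly half of the total (1.25 · 0.109/100 ≈ 0.00136 yr⁻¹), with the terminations and joints making up the rest.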

This is taken as the average failure rate of the cable circuit over 40 yr (which was the design lifetime at the time the cable was produced). As the cable wears, the failure rate will be low at the beginning and increase over the years; a development similar to that in Fig. 2a can be expected. Normally there is a steady hazard rate λcnst that is not related to wear, plus an increasing hazard rate due to wear with an average λwear,av. The total average hazard rate λav is:

$$\lambda_\mathrm{av} = \lambda_\mathrm{cnst} + \lambda_\mathrm{wear,av} \qquad (9)$$

The values of these hazard rates depend on the utility and country. TB 379 suggests that the majority of failures are due to external causes (though not necessarily unrelated to wear). Here it is assumed that λcnst = 1.042·10⁻³ yr⁻¹, so that by Eq. (9) the average hazard rate for wear over 40 yr is λwear,av = λav − λcnst = 1.521·10⁻³ yr⁻¹. Note that the momentary λwear(t) is an increasing function of time and after 40 yr is expected to be (significantly) larger than the average λwear,av (see below in Section 5.3).

5.2 CRITERIA FOR REPLACEMENT

Criteria for action are not standardized. RBM considers risk as the product of an occurrence frequency (i.e. rate of occurrence or failure rate) and a measure of impact. Some utilities translate all impacts into financial damage (monetizing), while other utilities employ an impact severity scale. At this stage the risk scaling is a matter of culture and preference; a further discussion can be found in section 9.1.4.2 of [1]. Generally, for a given impact, the risk index varies with the failure rate. For the present case, i.e. the simultaneous failure of both circuits resulting in a black-out of the 150 kV connection, Table 3 gives the class boundaries. Note that the Risk Index comes with its own color scheme, which is not necessarily the same as that of the Health Index.

Table 3: RI classes in terms of connection MTBF and hazard rate, for the impact of the particular event of simultaneous failure of both circuits. Other risks may or may not give rise to a higher risk index due to a higher occurrence rate and/or impact. The hazard rate h applies to the connection and is defined as the inverse MTBF.
Risk Index color   MTBF θ [yr]          Hazard rate h [yr⁻¹]   Meaning
Green              θ > 10,000           h < 0.0001             negligible to low risk, no mitigation required
Yellow             10,000 ≥ θ > 1,000   0.0001 ≤ h < 0.001     medium risk, mitigation compulsory
Orange             1,000 ≥ θ > 100      0.001 ≤ h < 0.01       high risk, mitigation compulsory
Dark orange        100 ≥ θ > 10         0.01 ≤ h < 0.1         very high risk, mitigation compulsory
Red                10 ≥ θ               0.1 ≤ h                unacceptable risk, highest urgency
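Table 3 amounts to a lookup on the connection hazard rate. The sketch below encodes the class boundaries, with the lower bound of each class inclusive as in the table; the function names `risk_index` and `risk_index_from_mtbf` are hypothetical.

```python
# Risk Index classes of Table 3 as inclusive lower bounds on the connection
# hazard rate h = 1/MTBF [1/yr], checked from the most severe class downward.
RI_CLASSES = [
    (0.1, "Red", "unacceptable risk, highest urgency"),
    (0.01, "Dark orange", "very high risk, mitigation compulsory"),
    (0.001, "Orange", "high risk, mitigation compulsory"),
    (0.0001, "Yellow", "medium risk, mitigation compulsory"),
    (0.0, "Green", "negligible to low risk, no mitigation required"),
]


def risk_index(h_conn: float) -> tuple[str, str]:
    """Return (color, meaning) for a connection hazard rate h_conn [1/yr]."""
    if h_conn < 0:
        raise ValueError("hazard rate must be non-negative")
    for lower_bound, color, meaning in RI_CLASSES:
        if h_conn >= lower_bound:
            return color, meaning
    raise AssertionError("unreachable: the Green class covers every h >= 0")


def risk_index_from_mtbf(theta_conn: float) -> tuple[str, str]:
    """Same classification, starting from the connection MTBF theta [yr]."""
    return risk_index(1.0 / theta_conn)
```

Applied to the 56-yr values of Table 5, θ = 10,189 yr is still (just) green, whereas θ = 5,478 yr falls in the yellow class.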

Table 3 will be used to evaluate the necessity to replace. If the MTBF θ of the redundant connection drops to 10,000 years or below, mitigation is compulsory and replacement is justified. As explained in section 3.2, this MTBF depends not only on the circuit failure rate λ but also on the repair rate μ, as shown in equations (5) and (6). For this type of cable a repair rate of μ = 26 yr⁻¹ is assumed, i.e. a mean repair time of 1/26 yr (two weeks).

5.3 RULING DISTRIBUTION



In the present case no failure data are available, which is the core of the problem. In order to evaluate whether the situation may be suspect, a reasonable worst-case scenario is designed. The boundary conditions are:
• There have been no failures during the past 40 years.
• Random failures are possible with a hazard rate λcnst that is constant over time.
• Wearing is expected to develop, and the corresponding hazard rate λwear(t) may be expected to increase over time.
• The momentary circuit hazard rate hcirc(t) is the sum of the hazard rates λcnst and λwear(t).
• The average hcirc.av of the momentary hazard rate hcirc(t) over the period 0-40 yr is assumed equal to the Cigré average hazard rate λav according to [7], Table 2 and Eq.(9).
Cables usually fail at their weakest spot and are therefore examples of weakest-link-in-a-chain cases, for which Weibull distributions generally apply. The two-parameter Weibull distribution has a scale parameter α and a shape parameter β and is employed in the following. The cumulative failure distribution F(t) is defined as:

$$F(t) = 1 - \exp\left[-\left(\frac{t}{\alpha}\right)^{\beta}\right] \tag{10}$$

The corresponding hazard rate of the Weibull distribution is defined as:

$$h(t) = \frac{1}{1 - F(t)} \cdot \frac{dF}{dt} = \frac{\beta \cdot t^{\beta-1}}{\alpha^{\beta}} \tag{11}$$

If β = 1, the hazard rate becomes a constant equal to α⁻¹. The constant hazard rate λcnst = 1.042·10⁻³ yr⁻¹ mentioned in Section 5.1, Eq.(9), can therefore be described by a Weibull distribution with αcnst = (λcnst)⁻¹ = 960 yr and βcnst = 1:

$$\lambda_{cnst} = \frac{1}{960\,\mathrm{yr}} \tag{12}$$
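Eqs. (10) to (12) can be checked numerically. This is a minimal sketch with hypothetical helper names: it evaluates F(t) and h(t), confirms that h equals (dF/dt)/(1 - F) by a central difference, and shows that β = 1 yields the constant rate 1/α of Eq. (12).

```python
import math


def weibull_cdf(t: float, alpha: float, beta: float) -> float:
    """Eq. (10): F(t) = 1 - exp(-(t/alpha)^beta); -expm1(-x) equals 1 - exp(-x)
    but avoids cancellation when x is small (early in life)."""
    return -math.expm1(-((t / alpha) ** beta))


def weibull_hazard(t: float, alpha: float, beta: float) -> float:
    """Eq. (11): h(t) = beta * t^(beta - 1) / alpha^beta."""
    return beta * t ** (beta - 1.0) / alpha ** beta


def hazard_from_cdf(t: float, alpha: float, beta: float, dt: float = 1e-4) -> float:
    """Definition h = (dF/dt) / (1 - F), with dF/dt from a central difference."""
    dF = (weibull_cdf(t + dt, alpha, beta) - weibull_cdf(t - dt, alpha, beta)) / (2.0 * dt)
    return dF / (1.0 - weibull_cdf(t, alpha, beta))


# With beta = 1 the hazard rate no longer depends on t and equals 1/alpha.
lam_cnst = weibull_hazard(30.0, 960.0, 1.0)
print(f"lambda_cnst = {lam_cnst:.4e} /yr")
```

The closed form of Eq. (11) and the numerical ratio agree to many digits for any (α, β), which is a quick sanity check before the wear-out term is fitted.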

The challenge is now to define a suitable Weibull distribution for the wear-out mechanism that matches the boundary conditions above. The purpose here is to demonstrate how one can study the likelihood that a suspect situation arises unnoticed. For the sake of simplicity, a Weibull-based hazard rate hcirc(t) is designed that does not discriminate between the separate contributions of joints, terminations and cable, but fits a single wear-out process (it is, however, not difficult to design a refined model with multiple parameter sets). The idea is to define a momentary circuit hazard rate hcirc(t) from which the connection MTBF θconn can be estimated. This momentary hazard rate is the sum of the Weibull-based λcnst and λwear(t) mentioned in 5.1:

$$h_{circ}(t) = \lambda_{cnst} + \lambda_{wear}(t) = \frac{1}{960\,\mathrm{yr}} + \frac{\beta \cdot t^{\beta-1}}{\alpha^{\beta}} \tag{13}$$

The scale parameter α and shape parameter β must now be chosen such that the mean of hcirc(t) over the period 0-40 yr equals the average λav = 0.00256 yr⁻¹ mentioned in Section 5.1, Eq.(8). Table 4 shows a selection of parameter sets (α,β) with the resulting hcirc.av over the period 0-40 yr. These sets were chosen by first setting a value for β in a range that is quite common for paper degradation and then fine-tuning α and β such that hcirc.av ≈ λav = 0.00256 yr⁻¹.
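The tuning step can be sketched as follows, assuming that the mean of Eq. (13) over 0-40 yr is the exact time average (1/40)∫hcirc dt. For the Weibull term this integral has the closed form (40/α)^β, so for a fixed β the scale α can be found by bisection. The names `mean_h_circ` and `fit_alpha` are hypothetical, and since the text does not detail how the averages in Table 4 were evaluated, α values obtained this way may differ from the tabulated ones.

```python
LAMBDA_CNST = 1.0 / 960.0  # Eq. (12), random failures [1/yr]
LAMBDA_AV = 0.00256        # Eq. (8), Cigré-based average over 0-40 yr [1/yr]
T_REF = 40.0               # averaging period [yr]


def h_circ(t: float, alpha: float, beta: float) -> float:
    """Eq. (13): constant hazard plus Weibull wear-out hazard [1/yr]."""
    return LAMBDA_CNST + beta * t ** (beta - 1.0) / alpha ** beta


def mean_h_circ(alpha: float, beta: float, period: float = T_REF) -> float:
    """Time average of h_circ over [0, period]; the Weibull hazard integrates
    to (period/alpha)^beta, so no numerical integration is needed."""
    return LAMBDA_CNST + (period / alpha) ** beta / period


def fit_alpha(beta: float, target: float = LAMBDA_AV,
              lo: float = 1.0, hi: float = 1000.0, tol: float = 1e-9) -> float:
    """For a chosen shape beta, find alpha with mean_h_circ(alpha, beta) == target.
    The mean decreases monotonically with alpha, so bisection applies."""
    if not (mean_h_circ(hi, beta) < target < mean_h_circ(lo, beta)):
        raise ValueError("target not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_h_circ(mid, beta) > target:
            lo = mid  # average still too high: the scale must grow
        else:
            hi = mid
    return 0.5 * (lo + hi)


for beta in (3.84, 4.52, 5.4, 6.05):
    print(f"beta = {beta}: alpha = {fit_alpha(beta):.1f} yr")
```

Fixing β first and solving only for α mirrors the procedure described above; a two-dimensional fit is not needed because a single constraint (the 40-yr average) is imposed.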


Table 4: Various parameter sets with the average hazard rate hcirc.av over the period 0-40 yr. MTBFcirc.av is the inverse of hcirc.av and indicates the average time between failures. Note: the moment of evaluation is 56 years (section 5.1).
α [yr]   β      hcirc.av [yr⁻¹]   MTBFcirc.av = 1/hcirc.av [yr]
88       3.84   0.00255           391
78       4.52   0.00257           389
70       5.4    0.00256           390
66       6.05   0.00255           392


Lacking experimental data, it is not possible to assess the actual ruling failure distribution. However, the parameter sets in Table 4 all fulfill the boundary condition that the average hazard rate over the first 40 yr approximately equals 0.00256 yr⁻¹. The following explores whether such sets can lead to a situation that should be mitigated.

5.4 HAZARD RATE AND HI FOR OVERDUE CIRCUITS



First the results for the set (α,β) = (70 yr, 5.4) are presented, after which an overview of the other results is given. Fig. 5 shows how the circuit hazard rate develops over time: the hazard rates for random failure λcnst, for wear-out failure λwear(t), and hcirc(t) as the sum of both. The adopted average hazard rate level λav over the first 40 years is also shown. Already before 40 years have passed, hcirc(t) lifts off, but with an average circuit MTBF of 390 yr it is quite possible that the cable has not failed as yet.
The moment of evaluation is t = 56 yr, at which the circuit hazard rate is hcirc = 0.0358 yr⁻¹. With the circuit hazard rate λ = hcirc and the repair rate μ, the connection MTBF can be calculated with Eq.(5), which yields θ(56 yr) = 10,189 yr. This is very close to the limit θ = 10,000 yr below which mitigation is compulsory according to Table 3. In fact, one year later, at t = 57 yr, the MTBF falls below this limit, as shown in Table 5. This is an example of a case where it may go unnoticed that the connection has degraded to a level where mitigation is compulsory.
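Eq. (5) is not repeated in this section. As a hedged stand-in, the sketch below uses the standard Markov-chain result for two identical repairable circuits in parallel that start with both in service, θconn = (3λ + μ)/(2λ²), which tends to μ/(2λ²) when μ >> λ. Whether this is exactly the form of Eq. (5) is an assumption, and the function name `connection_mtbf` is illustrative; the repair rate μ = 26 yr⁻¹ corresponds to a mean repair time of 1/26 yr.

```python
def connection_mtbf(lam: float, mu: float) -> float:
    """Mean time until both of two parallel circuits are down at the same time,
    for circuit failure rate lam and repair rate mu [1/yr], starting from the
    state with both circuits in service (two-circuit Markov chain)."""
    return (3.0 * lam + mu) / (2.0 * lam ** 2)


MU = 26.0           # repair rate [1/yr]: mean repair time 1/26 yr
H_CIRC_56 = 0.0358  # circuit hazard rate at t = 56 yr for (alpha, beta) = (70 yr, 5.4)

theta_56 = connection_mtbf(H_CIRC_56, MU)
print(f"theta_conn(56 yr) = {theta_56:,.0f} yr; at or below 10,000 yr: {theta_56 <= 10_000}")
```

This gives roughly 10,185 yr, close to the 10,189 yr quoted above; the small difference is consistent with hcirc having been rounded to 0.0358 yr⁻¹.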
Does this indicate that the situation is out of control? No; the risk is medium and therefore higher than allowable, but not yet high. Nevertheless, good asset management practice requires following up on the risk before it grows high.
Quantitatively, if the set (α,β) = (70 yr, 5.4) applies and there are 1/hcirc = 28 circuits in a similar condition, then on average one of those circuits will fail per year. Likewise, if there are 10,189 connections in a similar condition, then on average one of those connections will black out per year. Rather than regarding such a failure as an isolated incident, it should be realized that a much larger group may exist and that the failure reflects a structural development.
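The population argument can be made explicit. Under the added assumption that failures in a group of similar units follow a Poisson process with a hazard rate that is roughly constant over the year considered, the expected number of failures per year is N·h, and the probability of at least one failure is 1 - exp(-N·h). The helper names below are hypothetical.

```python
import math


def expected_failures(n_units: int, hazard: float, years: float = 1.0) -> float:
    """Expected number of failures among n_units with the given momentary
    hazard rate [1/yr] over a short interval of `years`."""
    return n_units * hazard * years


def prob_at_least_one(n_units: int, hazard: float, years: float = 1.0) -> float:
    """Probability of one or more failures, assuming a Poisson process
    with the expected count above."""
    return 1.0 - math.exp(-expected_failures(n_units, hazard, years))


# 28 circuits at h_circ = 0.0358 /yr: about one failure expected per year.
print(expected_failures(28, 0.0358), prob_at_least_one(28, 0.0358))
```

For 28 circuits at 0.0358 yr⁻¹ this gives about 1.0 expected failure per year and a probability of roughly 63 % that at least one occurs; the same holds for 10,189 connections at 1/10,189 yr⁻¹.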
The case description in section 5.1 also mentioned that the cable may see a higher load in the future. This means that the temperature will rise and the thermal ageing of the paper insulation will increase. In addition to the present wear-out effect, a higher load will therefore reduce the α parameter and speed up ageing. The present analysis does not take this into account, but it is recommended to include it in a more comprehensive analysis.

Figure 5: The development of the circuit hazard rate hcirc(t) over time. The average circuit hazard rate over the first 40 yr equals λav, which is based on TB379 [7]. These results are obtained with the parameter set (α,β) = (70 yr, 5.4).


Table 5: Results for the parameter sets (α,β) of Table 4, which all have an average hazard rate hcirc.av ≈ 0.00256 yr⁻¹ over the period 0-40 yr. The third and fourth columns show the circuit hazard rate and its inverse at the moment of evaluation (i.e. t = 56 yr). The fifth and sixth columns show the connection hazard rate for black-out of the connection and its inverse. The seventh and eighth columns show the year in which the connection hazard rate exceeds the limits 0.0001 yr⁻¹ and 0.001 yr⁻¹, respectively. These limits are the borders between green and yellow and between yellow and orange, respectively (cf. Table 3).
α [yr]  β      hcirc(56 yr) [yr⁻¹]  1/hcirc(56 yr) [yr]  hconn(56 yr) [yr⁻¹]  1/hconn(56 yr) [yr]  t(hconn = 0.0001 yr⁻¹) [yr]  t(hconn = 0.001 yr⁻¹) [yr]
88      3.84   0.0158               63.4                 0.000019             52321                >100                         >100
78      4.52   0.0229               43.7                 0.000040             24825                90                           >100
70      5.4    0.0358               28.0                 0.000098             10189                57                           74
66      6.05   0.0488               20.5                 0.000183             5478                 53                           67

Table 5 shows the results for all parameter sets listed in Table 4. The parameter choice has a large impact, and since data are lacking it is not known which set applies; each reasonable assumption can be considered. Data from comparable cable circuits, if available, could help here. The most important message, however, is that even redundant circuits that have not yet shown any failures can still require mitigation. The background lies in the quality of redundancy.
A step further is to investigate the effectiveness of redundancy. It may be noted from Eqs.(3) and (5) that the MTBF of a circuit declines inversely with the circuit hazard rate, whereas (recalling that in general μ >> λ) the MTBF of a connection declines with the inverse square of the circuit hazard rate. Redundancy itself therefore deteriorates faster than the circuit mean times between failures do.
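The different scaling can be checked numerically: doubling λ halves the circuit MTBF but reduces the connection MTBF by almost a factor of four. This sketch assumes the two-circuit Markov expression θconn = (3λ + μ)/(2λ²) as a stand-in for Eq. (5), with μ = 26 yr⁻¹; the function names are illustrative.

```python
def circuit_mtbf(lam: float) -> float:
    """Circuit MTBF is the inverse of the circuit hazard rate [yr]."""
    return 1.0 / lam


def connection_mtbf(lam: float, mu: float) -> float:
    """Assumed two-circuit Markov result (3*lam + mu) / (2*lam**2) [yr]."""
    return (3.0 * lam + mu) / (2.0 * lam ** 2)


MU = 26.0    # repair rate [1/yr]
lam = 0.0358  # circuit hazard rate at t = 56 yr [1/yr]

circ_ratio = circuit_mtbf(lam) / circuit_mtbf(2 * lam)               # exactly 2
conn_ratio = connection_mtbf(lam, MU) / connection_mtbf(2 * lam, MU)  # close to 4
print(circ_ratio, conn_ratio)
```

The connection ratio equals 4(3λ + μ)/(6λ + μ), which approaches 4 as long as μ >> λ.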


Figure 6: The deterioration of MTBF. The connection MTBF collapses faster than the circuit MTBF. Also, the ratio of the circuit MTBF to the connection MTBF starts to increase significantly, indicating the faster shrinkage of the latter. These results are obtained with the parameter set (α,β) = (70 yr, 5.4).
Fig. 6 also shows the ratio of the circuit MTBF to the connection MTBF θ (right-hand vertical axis scale), which underlines the decreasing quality of redundancy. Furthermore, this ratio may be used to define a quality of redundancy Qr as 1 minus the ratio:

$$Q_r(t) = 1 - \frac{\theta_{circ}(t)}{\theta_{conn}(t)} = 1 - \frac{h_{conn}(t)}{h_{circ}(t)} \tag{14}$$
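Eq. (14) can be evaluated directly from the 56-yr values in Table 5. This is a minimal sketch; the dictionary `TABLE5` simply transcribes the table and `redundancy_quality` is a hypothetical name.

```python
# Circuit and connection MTBF [yr] at t = 56 yr from Table 5,
# keyed by the parameter set (alpha [yr], beta).
TABLE5 = {
    (88, 3.84): (63.4, 52321),
    (78, 4.52): (43.7, 24825),
    (70, 5.4): (28.0, 10189),
    (66, 6.05): (20.5, 5478),
}


def redundancy_quality(theta_circ: float, theta_conn: float) -> float:
    """Eq. (14): Q_r = 1 - theta_circ / theta_conn (equivalently 1 - h_conn / h_circ)."""
    return 1.0 - theta_circ / theta_conn


for (alpha, beta), (th_circ, th_conn) in TABLE5.items():
    print(f"alpha = {alpha} yr, beta = {beta}: Q_r = {redundancy_quality(th_circ, th_conn):.4f}")
```

Qr falls from about 0.9988 for (88 yr, 3.84) to about 0.9963 for (66 yr, 6.05). If the two-circuit Markov expression θconn = (3λ + μ)/(2λ²) is assumed for Eq. (5), Qr = 1 - 2λ/(3λ + μ), so the shortfall from 1 grows roughly linearly with the circuit hazard rate.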

At t = 56 yr the circuit hazard rates are already roughly a factor of 10 (about 6 to 19, depending on the parameter set) larger than the average hazard rate over the first 40 years, but it is the decay of the redundancy that is worrying. Table 3 indicates that mitigation is already compulsory or soon will be. One solution is replacement of one or both cables, which sacrifices cable operational life. Another strategy is to install a new, third parallel cable in the connection, which changes the configuration and upgrades the redundancy. The new situation can be analyzed by the Markov chain method (e.g. section 8.6 in [1]). A third circuit requires a bay at the substations on both sides of the connection.
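A Markov chain analysis of this kind can be sketched for n identical circuits in parallel. Under the stated assumptions (identical, independent circuits; each failed circuit repaired independently at rate μ; mean time counted from the state with all circuits in service; common-mode failures ignored), the mean time until all circuits are down follows from a birth-death recursion. For n = 2 it reduces to (3λ + μ)/(2λ²). The function `mttf_parallel` is a hypothetical helper, not necessarily the formulation of section 8.6 in [1].

```python
def mttf_parallel(n: int, lam: float, mu: float) -> float:
    """Mean time until all n parallel circuits are down, starting with all n in
    service. State k = number of circuits in service; each in-service circuit
    fails at rate lam, each failed circuit is repaired at rate mu (assumption:
    independent repairs). With D_k = T_k - T_(k-1) and T_0 = 0, the recursion
    k*lam*D_k = 1 + (n - k)*mu*D_(k+1) is solved from k = n downward."""
    if n < 1:
        raise ValueError("need at least one circuit")
    d_next = 0.0  # D_(n+1) is multiplied by (n - n)*mu = 0, so its value is irrelevant
    total = 0.0
    for k in range(n, 0, -1):
        d_k = (1.0 + (n - k) * mu * d_next) / (k * lam)
        total += d_k
        d_next = d_k
    return total


MU = 26.0     # repair rate [1/yr]
LAM = 0.0358  # circuit hazard rate at t = 56 yr [1/yr]
two = mttf_parallel(2, LAM, MU)
three = mttf_parallel(3, LAM, MU)
print(f"two circuits: {two:,.0f} yr, three circuits: {three:,.0f} yr")
```

In this simplified model a third circuit raises the mean time to black-out by more than two orders of magnitude at the 56-yr hazard rate.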

6 DISCUSSION AND CONCLUSIONS



This paper discusses the background of the transition to modern asset management. The Corrective, Period Based, Condition Based and Risk Based maintenance styles were discussed together with the circumstances in which they are applicable. Particular attention is paid to the importance and role of redundancy. An estimation method is elaborated to evaluate situations even if failure data are not available. It is shown that with prolonged deployment of circuits the possibility exists that the connection must be mitigated; an increase of the load in the future would further accelerate ageing. However, a lack of failure data may mean that there is no apparent reason to study the case, so that the conclusion that mitigation is indeed necessary is never reached. A promising strategy to deal with ageing connections is to upgrade them with a third parallel cable if the substation allows.

ACKNOWLEDGEMENTS
The author gratefully acknowledges the financial support for this research from the Netherlands Ministry of
Economic Affairs (TKI project FIND-GO, ref. TEUE418008).


REFERENCES

[1] R. Ross, Reliability Analysis for Asset Management of Electric Power Grids, Hoboken, NJ: Wiley-IEEE Press, 2019, pp. 4-20.
[2] British Standards Institution, “Asset management, Part 1: Specification for the optimized management of physical assets,” British Standards Institution, London, 2008.
[3] International Organization for Standardization, “ISO 55000: Asset management - Overview, principles and terminology,” International Organization for Standardization, Geneva, CH, 2014.
[4] International Organization for Standardization, “ISO 55001: Asset management - Requirements,” International Organization for Standardization, Geneva, CH, 2014.
[5] International Organization for Standardization, “Asset management - Guidelines on the application,” International Organization for Standardization, Geneva, CH, 2014.
[6] R. Ross, “Health Index methodologies for decision-making on asset maintenance and replacement,” in Cigré 2017
Colloquium of Study Committees A3, B4 & D1, Winnipeg, Canada, 2017.
[7] Cigré WG B1.10, “TB379 Update of Service Experience of HV Underground and Submarine Cable Systems,” Cigré,
Paris, 2009.


Robert Ross is a professor at TU Delft and director of IWO (Institute for Science & Development, Ede). He is also a professor at HAN University of Applied Sciences and AM Research Strategist at TenneT (the TSO in the Netherlands and part of Germany). At KEMA and the Royal Institute for the Navy he worked on reliability and post-failure forensic investigations. His interests concern reliability statistics, electro-technical materials, sustainable technology and superconductivity. He was granted a SenterNovem Annual Award for energy inventions and nominated Best Researcher by the World Technology Network.
