
Capacity and Reliability Analyses with Applications to Power Quality

Mohammad Azam^a, Fang Tu^a, Yuri Shlapak^a, Thiagalingam Kirubarajan^a, Krishna Pattipati^a,* and Rajaiah Karanam^b

^a Dept. of ECE, University of Connecticut, Storrs, CT 06269-2157, USA
^b GE Industrial Products and Systems, 41 Woodford Ave., Plainville, CT 06062, USA

ABSTRACT

The deregulation of energy markets, the ongoing advances in communication networks, the proliferation of intelligent
metering and protective power devices, and the standardization of software/hardware interfaces are creating a dramatic shift
in the way facilities acquire and utilize information about their power usage. The currently available power management
systems gather a vast amount of information in the form of power usage, voltages, currents, and their time-dependent
waveforms from a variety of devices (for example, circuit breakers, transformers, energy and power quality meters,
protective relays, programmable logic controllers, motor control centers). What is lacking is an information processing and
decision support infrastructure to harness this voluminous information into usable operational and management knowledge to
handle the health of their equipment and power quality, minimize downtime and outages, and to optimize operations to
improve productivity.
This paper considers the problem of evaluating the capacity and reliability of power systems with very high
availability requirements (e.g., systems providing energy to data centers and communication networks with desired
availability of up to 0.9999999). The real-time capacity and margin analysis helps operators to plan for additional loads and
to schedule repair/replacement activities. The reliability analysis, based on computationally efficient sum of disjoint products,
enables analysts to decide the optimum levels of redundancy, aids operators in prioritizing the maintenance options for a
given budget and monitoring the system for capacity margin. The resulting analytical and software tool is demonstrated on a
sample data center.
Keywords: Power quality, reliability, capacity margin, sum of disjoint products, maintenance scheduling, residual life.

1. INTRODUCTION
The key factors that play a significant role in reliable, efficient and cost effective power supply to production or service
facilities are as follows:
1. Uninterrupted power supply
2. Power quality
3. Presence of sufficient excess capacity to overcome unforeseen situations
4. Intelligent scheduling of loads and standby power sources (usually generators)

Among the above stated factors, the first three are critical to high-tech industries such as semiconductor device fabrication
facilities, Internet data centers and telecommunication switching centers. This is because these industries are intolerant to the
slightest interruptions of power and are highly sensitive to power quality variations.
When the primary power source (usually a feeder from the power company) fails, emergency/standby sources need to be
online within a specified time window to ensure continued satisfactory operation of a facility. Standby generators and
uninterruptible power supplies (UPS) are the commonly used secondary sources of power. Superconducting
magnetic energy storage (SMES) is also being incorporated into the existing secondary sources of power supply.

* Further author information: (Send correspondence to Krishna Pattipati)
Krishna Pattipati: E-mail: krishna@engr.uconn.edu
Mohammad Azam: E-mail: tinku@engr.uconn.edu
Research supported by GE Industrial Systems, Plainville, CT 06062, USA.

Power quality is the degree to which the utilization and delivery of electrical power affects the performance of electrical
equipment. Any power line disturbance (e.g., sags, spikes, surges, electrical noise, harmonics, brownouts) that affects the
performance of sensitive electronic equipment is related to power quality. As sensitive electronic loads proliferate on
commercial utility grids, the concern over power quality also increases. In modern industrial power systems, devices such as
power line conditioners (PLC), isolation transformers, transient voltage surge suppressors (TVSS), static transfer switches
(STS) and reactors are employed to provide immunity against power quality problems. Due to relatively small power
handling capacity, these devices are deployed only in some specific downstream areas of an industrial power system (e.g.,
servers, semiconductor fabrication equipment).
A system operates in a healthy state when it has sufficient capacity reserve to meet a deterministic criterion such as the
loss of the largest unit. In a marginal state, the system is not in any difficulty, but does not have sufficient margin to meet the
specified criterion. The system load exceeds the available capacity in the at risk state. The information on unused (if any)
capacity of an installation in critical systems is useful in deciding the placement of additional equipment in existing floor
space, while maintaining the system in a healthy state.
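The three operating states described above can be sketched in a few lines of Python. This is a minimal illustration, assuming that the deterministic criterion is the loss of the largest unit and that capacities and load are expressed in the same unit; the function name `classify_state` is hypothetical and not part of the tool described in this paper.

```python
def classify_state(unit_capacities, load):
    """Classify a system as 'healthy', 'marginal' or 'at risk'.

    Assumption: the deterministic criterion is the loss of the largest unit,
    i.e., the reserve must be at least as large as the largest unit capacity.
    """
    reserve = sum(unit_capacities) - load
    if reserve < 0:
        # Load exceeds the available capacity.
        return "at risk"
    if reserve >= max(unit_capacities):
        # The system survives the loss of its largest unit.
        return "healthy"
    # No immediate difficulty, but the criterion is not met.
    return "marginal"


# Three 100 kW units feeding different loads (illustrative numbers only).
print(classify_state([100, 100, 100], 150))
print(classify_state([100, 100, 100], 250))
print(classify_state([100, 100, 100], 350))
```
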
All power users have to pay energy and demand charges. Users, whose consumption patterns are highly fluctuating, need
to incorporate intelligent schemes for load scheduling to avoid the penalty for over-peaking. By scheduling standby
generators, industries can save on demand charges, while continuing normal operation.
Despite the use of the above schemes, industrial power systems do not attain complete immunity against interruptions and
unhealthy operation. The major causes are the simultaneous failure of both primary and secondary power sources, and
deterioration in equipment performance due to disturbances and aging. A prediction tool capable of forecasting plant
reliability, dynamically scheduling maintenance activities and analyzing capacity margin can significantly enhance the
degree of immunity.
In this paper we focus on the first three problems. We developed an efficient software tool for evaluating the reliability
of a power system. The reliability analysis identifies the components that are likely to jeopardize the stringent availability
requirements. Routine maintenance alone is almost always insufficient to strictly ensure the desired level of system
availability. Here, we propose a dynamic maintenance schedule to satisfy the demand at a particular level of availability at
minimum cost, or alternatively maximize the system availability within a given budget. We also focus on the computation of
excess capacity at different points of the system based on real-time monitored data (taking the power quality information into
account). The excess capacity analysis enables an operator to ascertain the bottlenecks in the system; this aids in ensuring
sufficient capacity margins.
The paper is organized as follows. Section 2 describes the basic methodology for reliability and capacity margin
analysis, and maintenance scheduling. Section 3 applies the methodology to the power system of a data center. Section 4
summarizes the work and proposes future plans.

2. METHODOLOGY
The theoretical approach for reliability and capacity margin analysis is discussed in this section.

2.1 Basic Concepts For Reliability Computation


2.1.1 Preliminaries
Component reliability decreases with time due to aging and other derating effects. Here, we define the concepts of minimal
path sets and minimal cut sets that are used to predict system reliability.
Suppose a system is comprised of $n$ components. We assume that each component is either functioning or has failed. We
introduce an indicator variable $x_i$ such that

$$x_i = \begin{cases} 1, & \text{if the $i$th component is functioning} \\ 0, & \text{if the $i$th component has failed} \end{cases} \tag{1}$$


We can introduce a function called the structure function $\phi(\mathbf{x})$ to determine whether the system is functioning or not. It is
defined by

$$\phi(\mathbf{x}) = \begin{cases} 1, & \text{if the system is functioning when the state vector is } \mathbf{x} \\ 0, & \text{if the system has failed when the state vector is } \mathbf{x} \end{cases} \tag{2}$$

where $\mathbf{x} = (x_1, \ldots, x_n)$ is the state vector. We assume that the replacement of a failed component with a
functioning one causes no deterioration in the performance of the system, i.e., the system is monotonic. This implies that
$\phi(\mathbf{x})$ is a monotonically increasing function of $\mathbf{x}$, that is, if $x_i \le y_i$, $i = 1, 2, \ldots, n$, then $\phi(\mathbf{x}) \le \phi(\mathbf{y})$.


A state vector $\mathbf{x}$ is called a path vector if $\phi(\mathbf{x}) = 1$. If, in addition, $\phi(\mathbf{y}) = 0$ for all $\mathbf{y} < \mathbf{x}$, then $\mathbf{x}$ is said to be a
minimal path vector. If $\mathbf{x}$ is a minimal path vector, then the set $A = \{i : x_i = 1\}$ is called a minimal path set. Thus, a minimal
path set is a minimal set of components whose functioning ensures the functioning of the system. Alternately, a state
vector $\mathbf{x}$ is called a cut vector if $\phi(\mathbf{x}) = 0$. If, in addition, $\phi(\mathbf{y}) = 1$ for all $\mathbf{y} > \mathbf{x}$, then $\mathbf{x}$ is said to be a minimal cut vector. If
$\mathbf{x}$ is a minimal cut vector, then the set $C = \{i : x_i = 0\}$ is called a minimal cut set. In other words, a minimal cut set is a minimal set of
components whose failure ensures the failure of the system. Minimal cut sets are more suitable for analyzing a system in the
failure space, while minimal path sets are suitable for analysis in the success space. In the following, we will use the minimal
path set approach [7].
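Let $\{A_i\}_{i=1}^{s}$ denote the minimal path sets of a given system. We define $\alpha_j(\mathbf{x})$, the indicator function for the $j$th minimal path set,
by

$$\alpha_j(\mathbf{x}) = \prod_{i \in A_j} x_i = \begin{cases} 1, & \text{if all components of } A_j \text{ are functioning} \\ 0, & \text{otherwise} \end{cases} \tag{3}$$
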

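The above expression implies that a system will function if and only if all the components of at least one of the minimal path
sets are functioning. Consequently,

$$\phi(\mathbf{x}) = \max_j \alpha_j(\mathbf{x}) = \max_j \prod_{i \in A_j} x_i = \begin{cases} 1, & \text{if } \alpha_j(\mathbf{x}) = 1 \text{ for some } j \\ 0, & \text{if } \alpha_j(\mathbf{x}) = 0 \text{ for all } j \end{cases} \tag{4}$$
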

Further, we assume that the state of the $i$th component, $x_i$, is a random variable such that
$P\{x_i = 1\} = p_i = 1 - P\{x_i = 0\}$. The value $p_i$, which equals the probability that the $i$th component is functioning, is
called the reliability of the $i$th component. We introduce another variable $r$ such that

$$r = P\{\phi(\mathbf{x}) = 1\} = E\{\phi(\mathbf{x})\} \tag{5}$$

Here $r$ stands for the reliability of the system. When the components, i.e., the random variables $\{x_i\}_{i=1}^{n}$, are independent, $r$
can be expressed as a function of the component reliabilities. That is,

$$r = r(\mathbf{p}) \tag{6}$$

where $\mathbf{p} = (p_1, \ldots, p_n)$. The function $r(\mathbf{p})$ is called the reliability function. When the lifetimes of components are
exponentially distributed, the reliability of the $i$th component at time $t$ (given that it was operational at $t = 0$) can be computed
as $e^{-\lambda_i t}$. Here, $\lambda_i = T_i^{-1}$, where $T_i$ is the mean time to failure (MTTF) of the $i$th component.
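The definitions above can be made concrete with a short Python sketch. It evaluates the structure function from a list of minimal path sets and computes the system reliability as the expectation of the structure function by enumerating all component states, which is only practical for small systems. The function names and the three-component example (two parallel units in series with a third) are illustrative assumptions, not part of the software described in this paper.

```python
import math
from itertools import product


def component_reliability(mttf, t):
    """Reliability exp(-t / MTTF) of a component with exponential lifetime."""
    return math.exp(-t / mttf)


def structure_function(state, path_sets):
    """Return 1 if every component of at least one path set is functioning."""
    return int(any(all(state[c] for c in path) for path in path_sets))


def system_reliability(p, path_sets):
    """Expectation of the structure function over all 2^n component states."""
    comps = sorted(p)
    r = 0.0
    for bits in product((0, 1), repeat=len(comps)):
        state = dict(zip(comps, bits))
        if structure_function(state, path_sets):
            prob = 1.0
            for c in comps:
                prob *= p[c] if state[c] else 1.0 - p[c]
            r += prob
    return r


# Hypothetical example: components 1 and 2 in parallel, in series with 3.
EXAMPLE_PATHS = [("1", "3"), ("2", "3")]
EXAMPLE_P = {"1": 0.9, "2": 0.8, "3": 0.95}
print(system_reliability(EXAMPLE_P, EXAMPLE_PATHS))
```

For this small example the result can be checked by hand against $p_3 \, [1 - (1 - p_1)(1 - p_2)]$.
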

We can illustrate the concepts of minimal path sets described above via the following example. Suppose a system
consisting of eight components is arranged as in Fig. 1, where the list of minimal path sets is also shown.

List of minimal path sets:
A1 = BFH
A2 = BGH
A3 = ACEH
A4 = ACFH
A5 = ADFH
A6 = ADGH
A7 = BDFH
A8 = BDGH
A9 = ACEFH

Figure 1. A system with eight independent components (A through H)

For this system,

$$r = E\{\phi(\mathbf{x})\} = E\{\max(x_B x_F x_H,\ x_B x_G x_H,\ x_A x_C x_E x_H,\ x_A x_C x_F x_H,\ x_A x_D x_F x_H,\ x_A x_D x_G x_H,\ x_B x_D x_F x_H,\ x_B x_D x_G x_H,\ x_A x_C x_E x_F x_H)\} \tag{7}$$

$$r = E\{1 - (1 - x_B x_F x_H)(1 - x_B x_G x_H)(1 - x_A x_C x_E x_H)(1 - x_A x_C x_F x_H)(1 - x_A x_D x_F x_H)(1 - x_A x_D x_G x_H)(1 - x_B x_D x_F x_H)(1 - x_B x_D x_G x_H)(1 - x_A x_C x_E x_F x_H)\} \tag{8}$$


To compute the value of $r$, we need to simplify the above expression into a sum of disjoint products (mutually exclusive
events), so that the expectation can be taken term by term. However, this simplification is an NP-hard problem. In the
following, we employ an efficient method for computing the sum of disjoint products (SDP).
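To see why a direct expansion of (8) is unattractive, the sketch below expands the product by inclusion-exclusion (using the fact that $x_i^2 = x_i$ for indicator variables) and counts the resulting terms. For $s$ path sets the expansion has $2^s - 1$ terms, i.e., 511 terms for the nine path sets listed for Fig. 1. The uniform component reliability of 0.9 is an arbitrary assumption for illustration, and the brute-force routine is included only as a cross-check.

```python
from itertools import combinations, product

# Path sets listed for the example of Fig. 1.
PATH_SETS = ["BFH", "BGH", "ACEH", "ACFH", "ADFH",
             "ADGH", "BDFH", "BDGH", "ACEFH"]
# Assumed component reliabilities (illustrative values only).
P = {c: 0.9 for c in "ABCDEFGH"}


def reliability_inclusion_exclusion(paths, p):
    """Expand Eq. (8): one signed term per non-empty subset of path sets."""
    r, n_terms = 0.0, 0
    for k in range(1, len(paths) + 1):
        sign = 1.0 if k % 2 else -1.0
        for combo in combinations(paths, k):
            # All components appearing in the chosen paths must function.
            union = set().union(*combo)
            term = 1.0
            for c in union:
                term *= p[c]
            r += sign * term
            n_terms += 1
    return r, n_terms


def reliability_bruteforce(paths, p):
    """Cross-check: sum the probabilities of all functioning states."""
    comps = sorted(p)
    r = 0.0
    for bits in product((0, 1), repeat=len(comps)):
        state = dict(zip(comps, bits))
        if any(all(state[c] for c in path) for path in paths):
            prob = 1.0
            for c in comps:
                prob *= p[c] if state[c] else 1.0 - p[c]
            r += prob
    return r


r_ie, n_terms = reliability_inclusion_exclusion(PATH_SETS, P)
print(n_terms, r_ie, reliability_bruteforce(PATH_SETS, P))
```
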
2.1.2 Simplification of the reliability expression

There are several algorithms for obtaining the SDP. Among them, Abraham's algorithm [1], the Abraham-Locks-Revised (ALR)
algorithm [2], the Abraham-Locks-Wilson (ALW) algorithm [3] and Klaus Heidtmann's (KDH88) algorithm [4] are efficient for different
types of systems. For example, KDH88 and Abraham's algorithm are very efficient for small-sized networks, ALR works well
with both medium and small-sized networks, while ALW can handle even larger networks (but the complexity of
implementation is higher for ALW). Based on the consideration of complexity of implementation and speed of operation, we
find that the ALR algorithm works most efficiently for our sample problem (the power system of a data center). The algorithm, in a
concise form, is presented below:
1. Find all the minimal paths and order them according to the following rules:
   a) Order by the size of the term; smaller terms precede larger terms.
   b) For each group of terms of the same size, do lexicographic ordering. For example, abc precedes abd, abd
      precedes bce, etc.
   Suppose there are altogether $s$ minimal paths, $\{A_1, \ldots, A_s\}$. Label the first minimal path set to be disjoint. For
   $i = 2, \ldots, s$, determine whether $A_i$ is disjoint with $A_1, \ldots, A_{i-1}$ by using bitwise comparison. If yes, then label
   $A_i$ to be disjoint. Otherwise, go to step 2.



2. Take the first path set $A_j$ which was found to be non-disjoint in step 1.
   i) Form a polynomial, where each term is a set of variables in a prior path that is not also in the incumbent.
   ii) Simplify the polynomial by absorption (we denote this polynomial by $AP_j$).
   iii) Invert the simplified form by an iterative rapid minimized inversion procedure. Order the inverted terms
        according to the rules stated in step 1 (we denote this polynomial by $AmP_j$).
   iv) Convert the minimized inverted form into a disjoint polynomial, $AD_j$, using the steps described below:
       a) Form a polynomial consisting entirely of 0-valued variables, where each term of this polynomial consists
          only of those variables that (A) are in a prior sister term of the inverted form, and
          (B) are not included in the incumbent term of the inverted form.
       b) Simplify this polynomial as in step 2 of the main algorithm.
       c) Invert and minimize (all variables are now 1-valued); we used Shier and Whited's SW 3 method [5] for fast
          inversion.
       d) If the minimized inverted form consists of 1 term, continue; else, if it contains factors of the form
          $x + y + z$, etc., put these factors into disjoint form, i.e., $x + \bar{x}y + \bar{x}\bar{y}z$, etc.
       e) Multiply the factors obtained in step (d) by the incumbent term from the inverted form and augment the
          disjoint subpolynomial. When all terms of the inverted form for the incumbent minimal path have been
          processed, proceed to step (v) of the main algorithm.
   v) Multiply each term of the disjoint polynomial by all of the 1-valued variables of the incumbent path.
   vi) Augment the system polynomial.
   vii) Go to the next path set that is found to be non-disjoint in step 1 and start over from step 2(i).

The ALR algorithm requires that the indicator variables $x_i$ be represented by letters. So, from here onwards, $x_A$ will be denoted as $A$,
$x_B$ will be denoted as $B$, etc. Hence, we rewrite the expression for $r$ in (7) as

$$r = E\{\phi(\mathbf{x})\} = E\{\max(BFH, BGH, ACEH, ACFH, ADFH, ADGH, BDFH, BDGH, ACEFH)\}$$

For the example in Fig. 1, the ALR algorithm works as follows:
Term 1: A1 = BFH
Labeled as disjoint according to step 1 (as this is the first term of the ordered path sets).
Term 2: A2 = BGH
Not disjoint with A1. Step 2(i) gives rise to the polynomial $AP_2 = F$, which is already in its simplest form. After performing
inversion (as stated in step 2(iii)), we obtain $AmP_2 = \bar{F}$. Since no ordering is required here, we perform step 2(iv) next and
determine $AD_2 = \bar{F}$. Step 2(v) results in the disjoint term $B\bar{F}GH$.
Term 3: A3 = ACEH
Not disjoint with (A1, A2). Step 2(i) gives rise to the polynomial $AP_3 = BF + BG$, which absorption leaves unchanged. After
performing inversion (as stated in step 2(iii)), we obtain $AmP_3 = \bar{B} + \bar{F}\bar{G}$. Again, since no ordering is required here,
we perform step 2(iv) next and determine $AD_3 = \bar{B} + B\bar{F}\bar{G}$. Step 2(v) results in the disjoint terms
$A\bar{B}CEH$ and $ABCE\bar{F}\bar{G}H$.
Performing disjoint operations for the rest of the path sets, we finally obtain

( x ) = BFH + B FGH + A BCEH + ABCE F GH + A BC E FH + ABC DFH + ABC D EGH + ABC DE FGH
+ ABDF GH + A BD FGH + A BC DEF GH
from which system reliability r can be computed directly.
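The following Python sketch illustrates the idea behind the procedure on the path sets listed for Fig. 1. It follows step 1 (ordering) and steps 2(i)-(ii) (forming the polynomial of prior-path variables and simplifying it by absorption) as described above, but it is not the ALR algorithm: instead of rapid inversion and explicit disjoint polynomials, it evaluates the probability that no term of $AP_j$ holds by a plain Shannon expansion, which is a substitution made here for brevity. Because the variables of $AP_j$ never appear in the incumbent path, the contribution of path $j$ is $P(A_j)$ times that probability. All function names and the component reliabilities are hypothetical.

```python
import math
from itertools import product


def order_paths(paths):
    """Step 1: order by size, then lexicographically within each size."""
    return sorted((frozenset(p) for p in paths),
                  key=lambda s: (len(s), sorted(s)))


def prior_polynomial(prior_paths, incumbent):
    """Steps 2(i)-(ii): prior-path variables not in the incumbent, absorbed."""
    terms = {prior - incumbent for prior in prior_paths}
    # A term is absorbed when another term is a proper subset of it.
    return [t for t in terms if not any(other < t for other in terms)]


def prob_no_term_holds(terms, p):
    """P(no term has all of its variables functioning), by Shannon expansion.

    Assumption: this replaces the inversion steps of the paper's procedure.
    """
    if not terms:
        return 1.0
    if any(len(t) == 0 for t in terms):
        return 0.0  # an empty term always holds
    v = min(terms[0])  # pivot variable
    v_up = [t - {v} for t in terms]            # v functioning
    v_down = [t for t in terms if v not in t]  # v failed
    return (p[v] * prob_no_term_holds(v_up, p)
            + (1.0 - p[v]) * prob_no_term_holds(v_down, p))


def reliability_sdp(paths, p):
    """Sum over paths of P(path works and no earlier path works)."""
    ordered = order_paths(paths)
    r = 0.0
    for j, path in enumerate(ordered):
        ap_j = prior_polynomial(ordered[:j], path)
        r += math.prod(p[c] for c in path) * prob_no_term_holds(ap_j, p)
    return r


def reliability_bruteforce(paths, p):
    """Cross-check by enumerating all component states."""
    comps = sorted(p)
    r = 0.0
    for bits in product((0, 1), repeat=len(comps)):
        state = dict(zip(comps, bits))
        if any(all(state[c] for c in path) for path in paths):
            r += math.prod(p[c] if state[c] else 1.0 - p[c] for c in comps)
    return r


PATHS = ["BFH", "BGH", "ACEH", "ACFH", "ADFH",
         "ADGH", "BDFH", "BDGH", "ACEFH"]
# Assumed component reliabilities (illustrative values only).
P = dict(zip("ABCDEFGH", (0.9, 0.95, 0.8, 0.85, 0.9, 0.7, 0.75, 0.99)))
print(reliability_sdp(PATHS, P), reliability_bruteforce(PATHS, P))
```

For the second and third ordered paths, `prior_polynomial` returns the terms F and BF + BG, matching $AP_2$ and $AP_3$ in the worked example.
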

Rapid inversion is simultaneous multiplication and inversion. Every term of the product is a Boolean product of a term $T$ of the
multiplicand, which is a Boolean product of the inverses of several variables, by the inverse $V$ of a single variable. When $T$ contains $V$, the
$T$-by-$V$ product is $T$, and the term-by-$V$ products for all the terms of the multiplicand that differ from $T$ in that they contain all of the
variables of $T$ except $V$ are absorbed into $T$. For example, let the multiplicand be
$P = T + \bar{A}\bar{B}\bar{D} + \bar{B}\bar{C}\bar{G} + \bar{B}\bar{D}\bar{E}\bar{F}$, the variable $V = \bar{A}$ and the term $T = \bar{A}\bar{B}\bar{C}$. Then the resultant product is $P \cdot V = T + \bar{A}\bar{B}\bar{D}$.

2.2 Maintenance Scheduling


A dynamic maintenance scheduling method is proposed here based on the need to ensure a very high level of availability. We
consider two related scheduling problems:

1. Minimize maintenance cost, subject to availability $a \ge a_{\min}$.

   First, compute the reliability of the system over a period of time, $t_c$ ($t_c$ can be either user defined or tentative). Suppose
   at $t_{m1} \le t_c$, system availability $a < a_{\min}$, i.e., $a|_{t = t_{m1}} < a_{\min}$. We also assume that the system consists of $n$ components
   $x_1, \ldots, x_n$ and their replacement costs are $b_1, \ldots, b_n$, respectively. The algorithm works as follows:
   i) Replace the $i$th component, $x_i$ ($i = 1, \ldots, n$), and compute system availability. If current $a > a_{\min}$, then insert $x_i$ in
      a successful solution set, $S_{success}$.
   ii) Replace two components, $\{x_j, x_k\}$, such that $x_j \notin S_{success}$ and $x_k|_{k \ne j} \notin S_{success}$ ($j = 1, \ldots, n$; $k = 1, \ldots, n$).
       Compute the system availability. If current $a > a_{\min}$, then include $\{x_j, x_k\}$ in the successful solution set, $S_{success}$.
   iii) Continue to increase the number of components to be replaced and determine all the elements of $S_{success}$.
   iv) Compute the cost to implement each successful solution and find the one with minimum cost.

2. Maximize availability within a specified budget, $B$.

   In this case, proceed as follows:
   i) Determine all possible combinations of the set of components, $x_1, \ldots, x_n$, taken $m$ components at a time
      (where $m = 1, \ldots, n$).
   ii) Find the combinations with replacement cost less than or equal to the specified budget and compute availability with
       replacement of each combination at $t = t_c$. Find the combination that results in maximum availability and label it as
       the best solution.
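The two scheduling problems can be sketched as an exhaustive search over replacement combinations. In the sketch below, `availability` is a user-supplied function that returns the system availability after replacing a given combination of components; the toy system (two units in parallel, in series with a switch), its reliabilities, the replacement costs and the rule of skipping supersets of already successful combinations (a generalization of step ii to more than two components, valid for positive costs) are all assumptions made for illustration.

```python
from itertools import combinations, product


def system_availability(p, path_sets):
    """Availability by enumeration of component states (small systems only)."""
    comps = sorted(p)
    total = 0.0
    for bits in product((0, 1), repeat=len(comps)):
        state = dict(zip(comps, bits))
        if any(all(state[c] for c in path) for path in path_sets):
            prob = 1.0
            for c in comps:
                prob *= p[c] if state[c] else 1.0 - p[c]
            total += prob
    return total


def min_cost_schedule(components, cost, availability, a_min, max_m=2):
    """Problem 1: cheapest replacement set that restores a > a_min."""
    successes = []
    for m in range(1, max_m + 1):
        for combo in combinations(components, m):
            # Skip supersets of sets that already succeed (they cost more).
            if any(set(s) <= set(combo) for s in successes):
                continue
            if availability(combo) > a_min:
                successes.append(combo)
    if not successes:
        return None
    return min(successes, key=lambda s: sum(cost[c] for c in s))


def max_availability_schedule(components, cost, availability, budget, max_m=2):
    """Problem 2: affordable replacement set with maximum availability."""
    best, best_a = None, -1.0
    for m in range(1, max_m + 1):
        for combo in combinations(components, m):
            if sum(cost[c] for c in combo) <= budget:
                a = availability(combo)
                if a > best_a:
                    best, best_a = combo, a
    return best, best_a


# Hypothetical toy system: units u1 and u2 in parallel, in series with sw.
PATH_SETS = [("u1", "sw"), ("u2", "sw")]
P_OLD = {"u1": 0.9, "u2": 0.9, "sw": 0.95}   # aged components
P_NEW = 0.999                                # freshly replaced component
COST = {"u1": 2.0, "u2": 2.0, "sw": 5.0}


def availability_after(replaced):
    p = {c: (P_NEW if c in replaced else v) for c, v in P_OLD.items()}
    return system_availability(p, PATH_SETS)


print(min_cost_schedule(list(P_OLD), COST, availability_after, a_min=0.98))
print(max_availability_schedule(list(P_OLD), COST, availability_after, budget=4.0))
```

The search is exponential in `max_m`, so the sketch keeps it small by default.
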

In practical situations, the budget allocated for dynamic maintenance seldom covers the replacement of more than a few components
at a time. Consequently, we set $m = 2$ as default. However, the software is able to handle higher values of $m$.

2.3 A Software Module for Assessment of Reliability/Availability and Maintenance Scheduling

In order to assess reliability/availability and to schedule maintenance activity, we are developing an integrated software tool
consisting of five components. A brief discussion of their functions is given below.

1. Sensors/Component State Evaluator: This block translates the monitored data into useful information for evaluating
   and estimating the states of components in the system. By using signal processing techniques and making
   comparison with the conditions expected/specified at different levels of performance, useful information concerning
   residual lifetime of a component is extracted here [6].

2. Lifetime Data Modifier: Theoretical evaluation of reliability is solely based on the components' MTTF data. Since
   real operating conditions of a component differ from assumed ones, discrepancies arise between specified and true
   MTTFs. Using the historical data on reduction of lifetime related to the deterioration of performance and comparing
   it to the processed information on component state, better predictions of MTTFs could be made. This block will be
   implemented by making use of neural network techniques [6].

3. Theoretical Reliability Predictor: Based on the system component specifications and residual lifetime predictions,
   this module predicts the theoretical reliability of the system. It also provides information on fault propagation in the
   system.

4. Availability Evaluation & Maintenance Scheduling Block: This block computes the availability of the system
   projected at different instances in the future and also proposes alternative solutions for maintenance scheduling.

5. Comparison and Best Solution Evaluation Block: After generating a series of alternative solutions, this block selects
   one which provides the maximum availability under specified maintenance budget or one which ensures a given
   level of availability with minimum cost.

Figure 2. Block diagram of the reliability/availability assessment module. Blocks and signals shown: component
specifications and system parameters; historical lifetime data; Lifetime Data Modifier; Theoretical Reliability Predictor;
Sensors/Component State Evaluators; monitored data; Availability Evaluation & Maintenance Scheduling; series of
alternate solutions; desired availability level and maintenance budget data; Comparison and Best Solution Evaluation;
Power System; Maintenance & Repair. The legend distinguishes implemented blocks from partially implemented blocks.

After deployment at a particular site, the accuracy of lifetime data modifier block improves with usage. Implementation of
this block should enhance the performance of the reliability predictor.

2.4 Capacity Margin Computation


The residual capacity of individual units and of the overall system can be computed from equipment specifications and the
monitored/trended data. An overview of the algorithm is as follows.
Let $U_i$ denote a unit in a power system with a rated capacity, $C_i$. Let the incoming lines to this unit be
$\{L^i_{in_q}\}_{q=1}^{j_i}$; these lines originate from the units $\{U^i_{in_q}\}_{q=1}^{j_i}$. The outgoing lines from this unit are
$\{L^i_{out_m}\}_{m=1}^{k_i}$ and terminate at the units $\{U^i_{out_m}\}_{m=1}^{k_i}$. Let the corresponding rated capacities of the units
connected to incoming and outgoing lines be $\{C^i_{in_q}\}_{q=1}^{j_i}$ and $\{C^i_{out_m}\}_{m=1}^{k_i}$, respectively. Let the measured
voltages, currents and power factors on the outgoing lines be $\{V^i_{out_m}, I^i_{out_m}, pf^i_{out_m}\}_{m=1}^{k_i}$.

Figure 3. Computation of the excess power at unit $U_i$ (the unit is fed by lines $L^i_{in_1}, \ldots, L^i_{in_{j_i}}$ from units
$U^i_{in_1}, \ldots, U^i_{in_{j_i}}$ and feeds units $U^i_{out_1}, \ldots, U^i_{out_{k_i}}$ through lines $L^i_{out_1}, \ldots, L^i_{out_{k_i}}$)

The power delivered to $U^i_{out_m}$ from $U_i$ is $P^i_{out_m} = V^i_{out_m} \cdot I^i_{out_m} \cdot pf^i_{out_m}$. Hence, the total power output from $U_i$ is

$$P^i_o = \sum_{m=1}^{k_i} P^i_{out_m}.$$

The sum of the capacities of input units is given by

$$C^i_{in} = \sum_{q=1}^{j_i} C^i_{in_q}.$$

The excess capacity of $U_i$ can be evaluated as

$$C_{Excess_i} = \min(C_i - P^i_o,\ C^i_{in} - P^i_o) = \min(C_i,\ C^i_{in}) - P^i_o.$$


In order to improve reliability, redundant components are added to a power system. Typically, one finds four different
types of redundant situations:
1) A unit has no redundancy (represented by O: Operating alone).
2) A unit has redundancy and the redundant units are operational and are sharing load (represented by O/S:
   Operating and Sharing).
3) A unit has redundancy, but the redundant units are not operational and require a specified amount of time to be
   brought into operation (represented by R/C: Redundant Cold Spare).
4) A unit has redundancy and the redundant units can be readily brought into operation (represented by R/H:
   Redundant Hot Spare).
For case 1, the expression for $C_{Excess_i}$ can be readily used. If we encounter the O/S type of redundancy, then we need to run a
load flow (power flow) analysis on the system to determine the exact amount of power being handled by the unit and to
compute $C_{Excess_i}$ for each unit. For cases 3 and 4, the excess capacities of the redundant units are zero. The capacity margin of
the system is evaluated as

$$C_{margin}^{system} = \min_i \left( C_{Excess_i} \right).$$

Capacity margin of a system is a dynamic performance indicator; it
is evaluated from the monitored real-time data and is vital for the well-being analysis of a system.
The above stated procedure gives satisfactory results only under ideal operating conditions, i.e., under the situation
where the power quality is excellent and the devices experience no derating phenomena. In practice, one needs to consider
the effects of harmonics and decrease in capacity of devices due to non-ideal operating conditions. These are currently being
incorporated into our software.
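As a minimal sketch of the computation for a unit operating alone (case 1), the Python fragment below evaluates the output power, the excess capacity and the system margin. The data class, the function names and the numerical values are hypothetical. The power on each line is computed as the product of voltage, current and power factor, following the expression above literally, and derating and harmonics are ignored.

```python
from dataclasses import dataclass


@dataclass
class OutgoingLine:
    voltage: float       # measured voltage, V
    current: float       # measured current, A
    power_factor: float  # measured power factor


def output_power(lines):
    """Total power delivered by a unit: sum of V * I * pf over its lines."""
    return sum(l.voltage * l.current * l.power_factor for l in lines)


def excess_capacity(rated_capacity, input_capacities, lines):
    """Excess capacity of a unit: min(C_i, C_in) - P_o."""
    return min(rated_capacity, sum(input_capacities)) - output_power(lines)


def system_margin(excess_capacities):
    """Capacity margin of the system: the smallest excess capacity."""
    return min(excess_capacities)


# Illustrative numbers only (capacities and powers in watts).
lines = [OutgoingLine(480.0, 100.0, 0.9), OutgoingLine(480.0, 100.0, 0.9)]
unit_excess = excess_capacity(150_000.0, [100_000.0, 40_000.0], lines)
print(unit_excess, system_margin([unit_excess, 75_000.0]))
```
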

2.5 Software Module for Excess Capacity Computation


The excess capacity computation module uses the monitored data, information on derating effects of the devices due to aging
and power quality variations, and the line and bus (nodes in this case) specifications as the primary inputs. Initially, the
monitored data is modified** to compensate for the derating effects by using different parametric models of components to
approximate the loss of capacity of a component due to aging and power quality variations. Next, load flow computation
techniques are employed to determine how power is apportioned among different nodes. Then, by using a recursive
algorithm, the excess capacity at each node/component is computed. This is done dynamically, as the monitored data is
acquired. This module also identifies the device (component) having the minimum capacity margin. If this capacity margin falls
below a specified limit, which is essential for a healthy system, then an alarm is triggered.

Figure 4. Excess capacity computation module. The flowchart proceeds as follows: trend data and monitored power data
are modified and passed to the load flow analysis; $C_i - P^i_o$ is calculated using the specifications and the load flow
results; if node $i$ is an input node (source), its marginal capacity is $C_i - P^i_o$; otherwise, for each preceding node $j$,
$C_j$ is determined, the rated capacity $C_j$ is updated based on current operational conditions, the load flow data is used
to calculate how $C_j$ is apportioned (say $C_{ji}$ is apportioned to node $i$), and the marginal capacity of node $i$ is
$\min(C_i, \sum_j C_{ji}) - P^i_o$.

** We are in the process of implementing this block. At present, monitored data is fed directly to the Load Flow sub-module.
The Load Flow sub-module has been designed with provisions to choose among Newton-Raphson, Gauss-Seidel and Fast Decoupled
computation methods.
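One possible reading of the recursive computation in Fig. 4 is sketched below in Python. It assumes an acyclic network, strictly positive line flows taken from a load flow solution, and that the capacity available at a node is shared among its successors in proportion to those flows; the paper does not specify the apportioning rule, so this proportional rule, the function names and the four-node example are assumptions. Derating of the monitored data is not modeled.

```python
def capacity_margins(rated, output, flows):
    """Marginal capacity of every node.

    rated:  {node: rated capacity C_i}
    output: {node: measured output power P_o}
    flows:  {(j, i): power flowing from node j to node i (load flow result)}
    """
    preds = {n: [] for n in rated}
    out_total = {n: 0.0 for n in rated}
    for (j, i), f in flows.items():
        preds[i].append(j)
        out_total[j] += f

    memo = {}

    def available(i):
        """Capacity available at node i, limited by its upstream nodes."""
        if i not in memo:
            if not preds[i]:
                memo[i] = rated[i]  # input node (source)
            else:
                # C_ji: share of node j's capacity apportioned to node i.
                supplied = sum(available(j) * flows[(j, i)] / out_total[j]
                               for j in preds[i])
                memo[i] = min(rated[i], supplied)
        return memo[i]

    return {i: available(i) - output[i] for i in rated}


def check_alarm(margins, limit):
    """Return the node with the smallest margin and whether it raises an alarm."""
    node = min(margins, key=margins.get)
    return node, margins[node], margins[node] < limit


# Hypothetical four-node network (all values in kW).
RATED = {"feed": 1000.0, "swgr": 800.0, "ups": 500.0, "mcc": 400.0}
OUTPUT = {"feed": 600.0, "swgr": 600.0, "ups": 350.0, "mcc": 250.0}
FLOWS = {("feed", "swgr"): 600.0, ("swgr", "ups"): 350.0, ("swgr", "mcc"): 250.0}

margins = capacity_margins(RATED, OUTPUT, FLOWS)
print(margins)
print(check_alarm(margins, limit=100.0))
```
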

3. APPLICATION OF THE TECHNIQUES ON A REAL WORLD SYSTEM


Data centers are industrial installations where a very high level of availability of power is a requirement. The power system of a
data center usually comprises several feeds, backup generators, UPSs, switchgear, motor control centers (MCCs),
switchboards, etc. In addition, distribution transformers are present in some cases. Different types of servers, computer
networks and air conditioning systems are the primary load components for this system. The system houses a wide variety of
components, from power quality insensitive induction motors to delicate power electronic devices. Consequently, it serves as
a versatile test-bed for the performance evaluation of the reliability/availability, excess power capacity and maintenance
scheduling techniques we have developed.

3.1 Power System of a Data Center


The one-line diagram of a typical power system of a data center is shown in Fig. 5. From the diagram it can be seen that this
system has multiple inputs/sources (although usually only one source functions at a time while others assume redundant
roles) and multiple outputs. Most of the other components of the system have redundant components as well (e.g., the system
A utility distribution switchgear has the redundant system B utility distribution switchgear as backup). The most
important loads for this system are the single corded server and the dual corded server. To ensure the reliability of power
supply to these components, two UPSs are installed as additional redundant sources. Presence of these redundancies increases
the number of alternate paths. For the dual corded server, there exist 80 minimal path sets (as there do for the single corded
server); the number of elements in some of the paths is as high as 9. These numbers make the evaluation of the structure
function in terms of SDP fairly complex. On the other hand, this network is an assortment of different types of devices and
there is a wide diversity in the power handling capacities of these devices due to various derating phenomena. Along with the
monitored data, some deterministic models (e.g., for reduction of power handling capacity with aging, effect of harmonics
on power handling capacity, etc.) are needed to determine the excess capacity of the system.

Figure 5. One-line diagram of a data center with 480 VAC primary feed. Components shown: 480 VAC utility feeds A and B;
480 VAC generators and generator switchgear; system A and system B utility distribution switchgear, computer switchgear,
mechanical switchgear, MCCs, UPS/bypass units with batteries, UPS switchboards and critical switchboards; maintenance
switchgear; load bank; static switch; PDUs; RPPs; dual corded server; single corded server.

3.2 Evaluation of Reliability/Availability and Excess Capacity of Power System of a Data Center
3.2.1 Reliability/Availability evaluation and maintenance scheduling

The software enables the graphical entry of the one line diagram of a system in terms of nodes (corresponding to devices) and
edges (representing signal/power carrying lines) as well as the specifications of components (both nodes and edges) through
multiple dialog boxes. Fig. 6.1 shows the one line diagram of a power system, while Fig. 6.2 shows how a user can define the
parameters for reliability improvement and maintenance scheduling.

Figure 6.1. User interface (one-line diagram of data center power system)

Figure 6.2. Output selection, desired reliability improvement and budget allocation menu

The user can define lifetime distributions of components (the default is exponential, options for normal and Weibull are also
available) and distribution parameters for reliability/availability evaluation. For maintenance scheduling, the user can define
component replacement cost, replacement budget, reliability lower bound, and desired reliability improvement. In addition,
the user is allowed to insert faults in the system to simulate system operation under faulty conditions.
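For reference, the component reliability functions corresponding to the three lifetime distributions mentioned above can be written as follows. This is a sketch using the textbook forms of the survival functions; the parameter names (`mttf`, `scale`, `shape`, `mean`, `std`) are assumptions and do not reflect the parameterization used in the software's dialog boxes.

```python
import math


def reliability_exponential(t, mttf):
    """R(t) = exp(-t / MTTF)."""
    return math.exp(-t / mttf)


def reliability_weibull(t, scale, shape):
    """R(t) = exp(-(t / scale) ** shape); shape = 1 gives the exponential case."""
    return math.exp(-((t / scale) ** shape))


def reliability_normal(t, mean, std):
    """R(t) = 1 - Phi((t - mean) / std), written with the complementary error function."""
    return 0.5 * math.erfc((t - mean) / (std * math.sqrt(2.0)))


# Illustrative comparison at t = 5000 h for components with similar mean life.
for name, value in [
    ("exponential", reliability_exponential(5000.0, 10000.0)),
    ("weibull", reliability_weibull(5000.0, 11000.0, 2.0)),
    ("normal", reliability_normal(5000.0, 10000.0, 2000.0)),
]:
    print(name, value)
```
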

Figure 7.1. Reliability with replacement, according to the specifications of Figure 6.2.

Figure 7.2. Reliability of a non-maintained system

Figures 7.1-7.2 show reliability/availability under maintained and non-maintained conditions
for the specifications given in Fig. 6.2. From Fig. 7.1, it can be seen that components indexed 3 and 10 (i.e., Gen. A and the
maintenance switchgear, respectively) in the block diagram were replaced (i.e., maintenance was performed) when system
reliability/availability had fallen below the specified lower bound. Fig. 7.2 simulates the situation where no maintenance is
scheduled.
3.2.2 Excess capacity computation

Excess capacity computation requires some component specifications from the user and real-time data from different
monitoring devices. The user needs to specify different line and bus parameters (e.g., line resistance and reactance, half line
susceptance, line tap setting, bus type and min/max Mvar for the generation buses). Bus voltage, current, power factor, load
MW and Mvar, generator MW and Mvar and injected Mvar values are initialized with data obtained from the monitoring
system. Node ratings, such as rated voltage, rated current, rated capacity and desired capacity margin, are to be specified by
the user. The computed excess capacity values appear on screen below the corresponding nodes and present the user with an
overall picture of the excess capacity of the system.

4. SUMMARY AND FUTURE WORK


In this paper, we have developed an analysis and software tool for system reliability/availability evaluation, dynamic
maintenance scheduling and excess capacity computation. The software tool was tested on a 32-node industrial power
system. It takes less than 55 seconds to compute reliability and to propose maintenance schedule when the dual corded
server is selected as the output node. The load flow submodule requires less than 20 seconds to converge and the recursive
algorithm for computation of excess capacity requires an additional 2 seconds for execution. These results indicate that our
algorithms should be well suited for practical applications.
Work is under way in developing the lifetime data modifier module, as well as monitored data modifier/component state
evaluator. Once the work is completed, the predictive and deterministic capacities of the tool will be significantly enhanced.
We have performed theoretical studies to introduce a cause-effect relationship module for relating the power quality
problems experienced in an industrial power system to different events. Neural network techniques will be employed in this
module. Our future plan includes the use of Binary Decision Diagrams (BDDs) for reliability/availability analysis.

REFERENCES
[1] J. A. Abraham, "An improved algorithm for network reliability," IEEE Trans. Reliability, vol. R-28, 1979, pp. 58-61.
[2] M. O. Locks, "A minimizing algorithm for sum of disjoint products," IEEE Trans. Reliability, vol. R-36, 1987, pp. 445-453.
[3] J. M. Wilson, "An improved minimizing algorithm for sum of disjoint products," IEEE Trans. Reliability, vol. 39, 1990,
pp. 42-45.
[4] K. D. Heidtmann, "Smaller sums of disjoint products by subproduct inversion," IEEE Trans. Reliability, vol. 38, 1989,
pp. 305-311.
[5] D. R. Shier & D. E. Whited, "Algorithms for generating minimal cut sets by inversion," IEEE Trans. Reliability, vol. R-34, 1985, pp. 314-319.
[6] Fang Tu, "Signal processing & neural network toolbox and its application to failure diagnosis and prognosis," SPIE
Conference on Fault Diagnosis, Prognosis and System Health Management, Orlando, Florida, April 2001.
[7] Sheldon M. Ross, Introduction to probability models, New York: Academic Press, c1980, pp. 477-486.
[8] Hadi Saadat, Power system analysis, WCB/McGraw-Hill Book Company, New York, 1999, ch. 6, pp. 189-240.
[9] Alexander Kusko, Emergency standby power systems, McGraw-Hill Book Company, New York, 1989.
[10] Math H. J. Bollen, Understanding power quality problems, IEEE Press, New Jersey, 2000.
[11] W. Edward Reid, "Power quality issues - standards and guidelines," IEEE Trans. Industry Applications, vol. 32, 1996,
pp. 625-629.
[12] IEEE Recommended practice for the design of reliable industrial and commercial power systems (The Gold Book),
IEEE standard 493, 1990.
[13] A. Arsoy, S. M. Halpin, Y. Liu & P. F. Ribeiro, Modeling and simulation of power system harmonics, IEEE product #
EC 102 (CD ROM), 1999.

The execution times reported in Section 4 were registered on a PC with an Intel Pentium III, 667 MHz processor.