
Distributed Power System State Estimation

Raffi Sevlian1 and Umnouy Ponsukcharoen2


1. Department of Electrical Engineering
Stanford University, Stanford, CA
2. Institute for Computational and Mathematical Engineering, Stanford, CA
{raffisev, umnouyp}@stanford.edu

June 17, 2012


Abstract
The need for real-time power system state estimation to be performed on a large number of smart grid components makes the use of distributed algorithms very attractive. This report explores the current state of the art of distributed power system state estimation algorithms and evaluates their performance on multiple IEEE benchmark transmission and distribution systems as well as randomly generated graphs. Three distributed estimation algorithms are evaluated in terms of convergence, communication cost, computational cost and robustness to communication errors for the IEEE 14 bus network. Analysis is also performed for convergence of one algorithm for a general graph.

Introduction

Power system state estimation (PSSE) refers to obtaining the voltage phasors of all system buses at a given moment. This is generally performed by making many redundant observations of the power flows through the network, then performing an inference to determine the underlying phasor values. In the early days, PSSE was performed in a centralized data processing center which aggregated all the observations and computed a global solution for the entire network [8, 7]. However, there is a move towards expansion of system sensing capabilities as well as a higher rate of estimation, making decentralized estimation very attractive. Also, the new grid structure is open to two-way electricity flow and distributed generation down to the distribution level. Hence, power system state estimation needs to be performed at a large scale where centralized operation might not be as effective. Distributed power system state estimation algorithms allow system operators to deal with large scale problems by dividing the measurements and buses into control areas. Each control area will collect its measurements, perform its own state estimation and exchange information with other control areas.

Recently proposed methods for solving for state estimates come from the sensor networking literature [9, 4]. This report provides an experimental study of the use of these two techniques. It also evaluates the performance of recent work in distributed subgradient based optimization [1]. The study is divided into two parts. In the first part, three algorithms are implemented and their performance is compared with a centralized estimator. Comparisons are made in four performance indicators of interest: (1) convergence behavior, (2) computational complexity, (3) communication cost and (4) local observability requirement. In the second part, we analyze the performance of these techniques on different tree networks and present an empirical model for one of the algorithms.

Power System State Estimation

The centralized state estimation problem is the following: Given an underlying state of the network $X = \{(V_0), (\theta_1, V_1), \ldots, (\theta_N, V_N)\}$, which represents the voltage magnitude and angle at every bus except the reference, the state estimator will use a set of measurements of the form $z_k = h_k(X) + \epsilon_k$ to construct its state estimate $\hat{x}$. Assuming normally distributed errors on each measurement, $\epsilon \sim N(0, R)$ with $R_{ii} = \sigma_i^2$, the general estimator will be

$$\hat{X} = \arg\min_{X} \sum_{k=1}^{M} \left( \frac{z_k - h_k(X)}{\sigma_k} \right)^2 \tag{1}$$

For the sake of simplicity, we only consider power measurements for the entire system. In the fully nonlinear AC power flow, a state estimator will receive the following sets of measurements.

Real and reactive power injection at bus $i$:

$$P_i = V_i \sum_{j \in N(i)} V_j \left( -G_{ij}\cos(\theta_{ij}) + B_{ij}\sin(\theta_{ij}) \right) \tag{2}$$

$$Q_i = V_i \sum_{j \in N(i)} V_j \left( -G_{ij}\sin(\theta_{ij}) + B_{ij}\cos(\theta_{ij}) \right) \tag{3}$$

Real and reactive branch flow from bus $i$ to bus $j$:

$$P_{ij} = V_i^2 g_{ij} + V_i V_j \left( -g_{ij}\cos(\theta_{ij}) + b_{ij}\sin(\theta_{ij}) \right) \tag{4}$$

$$Q_{ij} = V_i^2 b_{ij} + V_i V_j \left( -g_{ij}\sin(\theta_{ij}) - b_{ij}\cos(\theta_{ij}) \right) \tag{5}$$

where the line admittance of branch $(i, j)$ is $y_{ij} = G_{ij} + jB_{ij}$ and $\theta_{ij} = \theta_i - \theta_j$.

In this paper, however, we will solve the state estimation problem based on linearized DC power flow. In linearized DC power flow, we assume first that all bus voltage magnitudes are close to 1.0 per unit, and next that all the transmission lines are lossless and the nodal voltage phase angle differences are small. Applying these assumptions to the nonlinear measurements in eqs. (2) and (4), we have the following equations relating bus angles to real bus injection and branch flow power.

$$P_i = \sum_{j \in N(i)} B_{ij}(\theta_i - \theta_j) \tag{6}$$

$$P_{ij} = b_{ij}(\theta_i - \theta_j) \tag{7}$$

The linearized measurement model allows us to represent the measurement set as well as the central estimator as a simple weighted least squares formulation. From the relations in eqs. (6) and (7), the measurement vector and system state vector are related by $Z = HX + \epsilon$. Here $Z \in \mathbb{R}^{n_{obs}}$ is the column vector containing all bus injection and branch flow measurements ($M = n_{bus} + n_{branch}$), and $H \in \mathbb{R}^{n_{obs} \times n_{bus}}$ is the matrix containing the elements of eqs. (6) and (7). From the DC modeling assumptions the state vector only contains bus angles, $x = \{\theta_1 \ldots \theta_N\}$. From now on, state variable $x$ and bus angle $\theta$ will be used interchangeably. We can therefore simplify eq. (1) to the following centralized weighted linear least squares estimate.

$$\hat{x}_c = (H^T R^{-1} H)^{-1} H^T R^{-1} Z \tag{8}$$

Note that in order to be able to estimate the state vector, $m_{bus}$ must be at least $N - 1$ and $\mathrm{rank}(H^T H) = n_{bus} - 1$. To make the system of equations full rank, we assume that bus 0 is always the slack bus with phase angle zero. We solve for each angle relative to $\theta_0$; this leads to a reduced sensing matrix $H_r \in \mathbb{R}^{n_{obs} \times (n_{bus} - 1)}$, which is $H$ with the first column removed. We can apply eq. (8) with $H_r$, which results in the following.

$$[\theta_1 - \theta_0 \;\ldots\; \theta_N - \theta_0]^T = (H_r^T R^{-1} H_r)^{-1} H_r^T R^{-1} Z \tag{9}$$
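
The estimator in eq. (9) can be illustrated with a minimal sketch. The 3-bus network, its susceptances, the noise level and the function names below are hypothetical and chosen only for illustration; they are not the IEEE 14-bus data used later in this report.

```python
import numpy as np

def build_dc_measurement_matrix(n_bus, branches, injection_buses, flow_branches):
    """Stack DC injection rows (eq. 6) and branch-flow rows (eq. 7) into H."""
    rows = []
    for i in injection_buses:
        row = np.zeros(n_bus)
        for a, b, susceptance in branches:
            if a == i:
                row[a] += susceptance
                row[b] -= susceptance
            elif b == i:
                row[b] += susceptance
                row[a] -= susceptance
        rows.append(row)
    for idx in flow_branches:
        a, b, susceptance = branches[idx]
        row = np.zeros(n_bus)
        row[a] = susceptance
        row[b] = -susceptance
        rows.append(row)
    return np.array(rows)

def centralized_wls(H, z, sigma):
    """Eq. (9): drop the slack column (bus 0) and solve the weighted normal equations."""
    H_r = H[:, 1:]
    R_inv = np.diag(1.0 / sigma**2)
    gain = H_r.T @ R_inv @ H_r
    return np.linalg.solve(gain, H_r.T @ R_inv @ z)

# Hypothetical 3-bus triangle: (from_bus, to_bus, susceptance in p.u.)
branches = [(0, 1, 10.0), (1, 2, 5.0), (0, 2, 8.0)]
H = build_dc_measurement_matrix(3, branches,
                                injection_buses=[0, 1, 2],
                                flow_branches=[0, 1, 2])
theta_true = np.array([0.0, -0.05, 0.02])   # bus 0 is the slack bus
z = H @ theta_true                          # noise-free measurements
sigma = np.full(len(z), 0.01)               # assumed measurement standard deviations
theta_hat = centralized_wls(H, z, sigma)    # estimates of theta_1 and theta_2
```

With noise-free measurements the estimate reproduces the non-slack angles; adding noise to `z` would show the weighting by `sigma` at work.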

Distributed Estimation Algorithms for Power System State Estimation

Distributed algorithms for estimation work by splitting large problems into many locally computable problems. This can be illustrated in the model of the IEEE 14 bus network shown in Figure 1. As opposed to the centralized estimator, which requires each sensor to transmit information to a single location and then perform inference, a decentralized algorithm works by distributing the observations and computation into different domains. Here, each area receives observations of variables local only to itself as well as variables it shares with neighboring areas. For the 14 bus example, there are four domains, where each domain has access to only specific observations. It then uses specifically designed messages from its neighbors to compute estimates of its unknown variables and possibly the unknown variables contained in other areas. Local observability of an area is an important assumption in many distributed techniques. If an area is locally observable, then estimation can occur without the rest of the network. Many early and heuristic methods required local observability; however, this is not required. The algorithms presented here do not require local observability. In the following sections we will introduce the various algorithms and illustrate their behavior using the 14 bus network.

Figure 1: Computational abstraction of the 14 bus IEEE benchmark network, splitting nodes into different computational domains. Block markers on edges represent branch flow measurements. Circle markers on vertices represent bus injection measurements. The network contains 26 measurements and is overdetermined.

3.1 CSE

The algorithm introduced in [9] for use in distributed power system state estimation originates from work in distributed estimation in sensor networks [3]. In CSE, each area builds an estimate of the entire state of the system. In each step, an area will compute a local estimate of the global state, and then transmit the full estimate to all neighbors in its communication network. Note that in CSE, each area's state vector is the full set of unknown variables. Therefore the message sent from area $k$ to area $l$ at time $t$, for example, is the estimate $\hat{x}_k^t \in \mathbb{R}^{n_{bus}}$ if $l$ is in the communication set of $k$, $l \in N_k$. With a given set of messages from its neighbors, the area $k$ performs the following update.

$$\hat{x}_k^{t+1} = \hat{x}_k^t - a \left[ b \sum_{l \in N_k} (\hat{x}_k^t - \hat{x}_l^t) - H_k^T (z_k - H_k \hat{x}_k^t) \right] \tag{10}$$
This method puts no limitations on the communication architecture required. That is, the structure of $N_k$ is independent of the topology of the power system. Also, since the second term of eq. (10) does not involve any matrix inversion, the algorithm does not assume any local observability. Therefore, it can be used in a fully decentralized manner, in that each bus can be a computing area. However, there is no implicit inversion, since the update term is similar to a gradient ascent direction.

Analysis of the algorithm is based mainly on introducing an aggregate state $x(i) = [\hat{x}_1^T(i) \ldots \hat{x}_{n_{area}}^T(i)]^T$. With this, the evolution of the estimate vector becomes

$$x(i+1) = \left( I_{n_{bus}} \otimes I_{n_{area}} - ab\,(L \otimes I_{n_{bus}}) \right) x(i) + a\,D_H (Z - D_H^T x(i)) \tag{11}$$

Here $L$ is the graph Laplacian of the communication topology and $\otimes$ represents a Kronecker product over matrices. $I_M$ represents an identity matrix in $\mathbb{R}^{M \times M}$. Finally, the matrix $D_H$ is

$$D_H = \begin{pmatrix} H_1^T & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & H_M^T \end{pmatrix}$$
0 . . . HM
Note the similarity of the update step with that of recursive least squares filters in standard filter theory [2] as well as gradient descent. In standard recursive filter theory, state updates are of the form

$$x_{k+1} = x_k + A_k (y_k - H x_k) \tag{12}$$

With this, we can show that the error process evolves as $e_{k+1} = (I - A_k H) e_k$. In the case of the CSE algorithm, we can use a similar method to show that the error process evolves as $e_{k+1} = \left( I_{MN} - a\,(b\,L \otimes I_M + D_H D_H^T) \right) e_k$. Like RLS filters, this technique has an associated Riccati equation relating error covariance as a function of iteration.
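
As an illustration of the consensus-plus-innovation update behind eq. (10), the sketch below runs it on a toy problem with three fully connected areas and two unknown angles. It assumes the update has an overall step `a` multiplying a consensus term weighted by `b` and a local innovation term; the gains, the measurement rows and the function name `cse_step` are hypothetical choices for this toy problem, not the tuned values reported in Section 4.1. No area is locally observable on its own, but the three areas together are.

```python
import numpy as np

def cse_step(estimates, neighbors, H, z, a, b):
    """One synchronous update: x_k <- x_k - a*[b*sum_l(x_k - x_l) - H_k^T (z_k - H_k x_k)]."""
    updated = []
    for k, x_k in enumerate(estimates):
        disagreement = sum(x_k - estimates[l] for l in neighbors[k])
        innovation = H[k].T @ (z[k] - H[k] @ x_k)
        updated.append(x_k - a * (b * disagreement - innovation))
    return updated

# Toy data (hypothetical): each area sees a single linear measurement of the state.
theta_true = np.array([-0.05, 0.02])
H = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]), np.array([[1.0, -1.0]])]
z = [H_k @ theta_true for H_k in H]                 # noise-free measurements
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}       # fully connected communication graph

estimates = [np.zeros(2) for _ in H]
for _ in range(2000):
    estimates = cse_step(estimates, neighbors, H, z, a=0.3, b=1.0)

max_error = max(np.max(np.abs(x_k - theta_true)) for x_k in estimates)
```

Every area ends up holding an estimate of the full state vector, which is the defining feature of CSE; the price is that each message carries the whole vector.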

3.2 ADMM

The ADMM algorithm was first introduced to power system state estimation in [4]. It relies on formulating an augmented Lagrangian of the combined estimation problem. Given the global optimization problem that needs to be solved in eq. (1), we can decompose the objective function into $L$ separate objective functions, each dependent on its own set of variables, if we introduce an extra set of constraints.

$$\min_{x} \sum_{k=1}^{L} \| z_k - H_k x_k \|^2 \quad \text{s.t.} \quad x_k[l] = x_{kl}, \; l \in N_k.$$

From this we can formulate the augmented Lagrangian for the system. This is given for the problem as

$$L(\{x_k\}, \{x_{kl}\}, \{v_{k,l}\}) = \sum_{k=1}^{n_{area}} \left[ \| z_k - H_k x_k \|^2 + \sum_{l \in N_k} \left( v_{k,l}^T (x_k[l] - x_{kl}) + c \| x_k[l] - x_{kl} \|^2 \right) \right] = \sum_{k=1}^{n_{area}} L_k(\{x_k\}, \{x_{kl}\}, \{v_{k,l}\}) \tag{13}$$
The alternating direction term comes from the fact that the augmented Lagrangian in eq. (13) is optimized by each area partially minimizing its local Lagrangian $L_k(\{x_k\}, \{x_{kl}\}, \{v_{k,l}\})$ and then exchanging dual variables in a distributed fashion.

$$x_k^{t+1} = \arg\min_{x_k} L_k(\{x_k\}, \{x_{kl}^t\}, \{v_{k,l}^t\}) \tag{14}$$

$$x_{kl}^{t+1} = \arg\min_{x_{kl}} L(\{x_k^{t+1}\}, \{x_{kl}\}, \{v_{k,l}^t\}) \tag{15}$$

$$v_{k,l}^{t+1} = \arg\min_{v_{k,l}} L(\{x_k^{t+1}\}, \{x_{kl}^{t+1}\}, \{v_{k,l}\}) \tag{16}$$

Note that in ADMM, each area's state vector is the local set of unknown variables as well as variables that affect the observation set in the area. For example, the state vector for area 1 is the local unknowns $[\theta_1, \theta_2, \theta_5]$ as well as $[\theta_4, \theta_6]$, since the bus power observed at bus 5 is a function of buses 4 and 6. As shown in [4], this reduces to the following set of recursions.

$$x_k^{t+1} = (H_k^T H_k + c D_k)^{-1} (H_k^T z_k + c D_k p_k^t) \tag{17}$$

$$s_k^{t+1}[i] = \frac{1}{|N_k^i|} \sum_{l \in N_k^i} x_l^{t+1}[i] \tag{18}$$

$$p_k^{t+1}[i] = p_k^t[i] + s_k^{t+1}[i] - \frac{x_k^t[i] + s_k^t[i]}{2} \tag{19-20}$$

Here, $x_l[i]$ denotes the entry of $x_l$ corresponding to $x_k[i]$, defined for all $l \in N_k^i$. $N_k^i$ is defined as the set of all areas which share the $i$-th element of area $k$'s state vector ($x_k[i]$). $D_k$ is a diagonal matrix with $(i, i)$ entry $|N_k^i|$. With an appropriate choice of $c$, for each control area $k$, $x_k^t$ converges to the corresponding subset of the whole system estimate. Combining values from all areas, one can obtain the whole system estimate.

3.3 Distributed Dual Averaging

In distributed dual averaging (DDA), estimates are constructed by each node calculating a local estimate of the global subgradient and then sharing it with neighbors defined by a communication graph. In the case of the power system estimation problem, the communication graph is independent of the topology of the power network, as shown in Figure 1. This makes DDA similar to the CSE algorithm in that regard. Specifically, we have a neighbor set $N(i) = \{ j \in V \mid (i, j) \in E \}$ for each node $i \in V$.

The core algorithm is defined for a general optimization problem as follows. Consider a generic convex objective function of the form

$$\min_{x \in X} \frac{1}{n} \sum_{i=1}^{n} f_i(x)$$

We must assume that $f_i(x)$ is convex but not necessarily smooth. At iteration $t$, each node $k \in V$ computes an element $g_k^t \in \partial f_k(x_k^t)$ of the subdifferential of the local function $f_k$ and receives information about the parameters $w_j^t$, $j \in N(k)$. It updates its estimate of the solution $x_k^t$ based on a combination of the current subdifferential and the messages from its neighbors, via a projection operation $\Pi$. The algorithm is therefore

$$w_k^{t+1} = \sum_{l \in N(k)} p_{kl} w_l^t + g_k^t \tag{21}$$

$$x_k^{t+1} = \Pi_X^{\psi}(w_k^{t+1}, \alpha(t)) \tag{22}$$

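The projection operator $\Pi$ is defined as

$$\Pi_X^{\psi}(z, \alpha) = \arg\min_{x \in X} \left\{ \langle z, x \rangle + \frac{1}{\alpha} \psi(x) \right\} \tag{24}$$

Here $\psi$ is the proximal function, and in this study it is set to the canonical proximal function $\frac{1}{2}\|x\|_2^2$ as stated in [1].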
In the case of distributed linear estimation for power system state estimation, the algorithm reduces to the following. The objective function now is $f_k(x) = (z_k - H_k x)^T R_k^{-1} (z_k - H_k x)$, which gives us $g_k(t) = -2 H_k^T R_k^{-1} (z_k - H_k x)$. Next, $\Pi_X^{\psi}(z, \alpha) = \arg\min_{x \in X} \left\{ z^T x + \frac{1}{2\alpha} x^T x \right\}$. In the unconstrained case, the solution becomes $x_k^{t+1} = -\alpha_{t+1} w_k^{t+1}$. Since $x \in [-\pi/4, \pi/4]$, we need to constrain the update. The final recursion becomes

$$w_i(t+1) = \sum_{j \in N(i)} p_{ij} w_j(t) + g_i(t) \tag{25}$$

$$x_i(t+1) = \min\left\{ \max\left\{ -\alpha_t w_i(t+1), \; -\frac{\pi}{4} \right\}, \; \frac{\pi}{4} \right\} \tag{26}$$
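
The recursion in eqs. (25) and (26) can be sketched on the same kind of toy problem. The sketch assumes $R_k = I$, a uniform doubly stochastic weight matrix `P`, and a step size $\alpha_t = \alpha_0 / \sqrt{t}$; these choices, the data, and the name `dda_run` are illustrative assumptions rather than the configuration used in the experiments.

```python
import numpy as np

def dda_run(H, z, P, alpha0, n_iter, bound=np.pi / 4):
    """Distributed dual averaging for local costs f_k(x) = ||z_k - H_k x||^2 (R_k = I assumed)."""
    n_nodes = len(H)
    n_state = H[0].shape[1]
    w = np.zeros((n_nodes, n_state))   # dual (accumulated subgradient) variables
    x = np.zeros((n_nodes, n_state))   # primal estimates, one row per node
    for t in range(1, n_iter + 1):
        # Local gradients g_k = -2 H_k^T (z_k - H_k x_k)
        g = np.array([-2.0 * H[k].T @ (z[k] - H[k] @ x[k]) for k in range(n_nodes)])
        w = P @ w + g                              # eq. (25): mix neighbors' duals, add gradient
        alpha = alpha0 / np.sqrt(t)                # assumed decreasing step size
        x = np.clip(-alpha * w, -bound, bound)     # eq. (26): projection onto the box
    return x

# Toy data (hypothetical): three nodes, two unknown angles, no node locally observable.
theta_true = np.array([-0.05, 0.02])
H = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]), np.array([[1.0, -1.0]])]
z = [H_k @ theta_true for H_k in H]
P = np.full((3, 3), 1.0 / 3.0)                     # uniform doubly stochastic weights

x_dda = dda_run(H, z, P, alpha0=0.2, n_iter=20000)
dda_error = np.max(np.abs(x_dda - theta_true))
```

Because the step size decays, the error shrinks only slowly with the iteration count, which is in line with the large iteration counts reported for DDA in the experiments.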

Numerical Experiments with IEEE 14 Bus Network

The centralized and three distributed algorithms are numerically tested using MATLAB. The power system used in the test is the IEEE 14-bus system, which is shown in Figure 1. The associated admittance matrix and true underlying power states/measurements are obtained using MATPOWER and verified with the benchmark source. In the IEEE 14-bus grid, measurement sites and types are shown in the figure.

An abstraction of the buses, measurements and control areas is also shown in Figure 1. The four rectangles represent local control areas with a total of 23 measurements in p.u., where boxes on an edge represent branch flow measurements. The redundancy ratio is 23/14 = 1.64; therefore, the system is very likely to have a unique solution. The communication between the areas forms a fully connected graph.

4.1 Preliminary test and calibration

Figure 2: Performance of all three algorithms under no observation noise or communication noise. MSE shown is between the current estimate and the underlying true power state. Title on plots indicates levels of measurement and communication noise in simulation.


In the preliminary test, we draw the measurement values shown in the figure above (IEEE 14-bus system) from the true measurement values without adding simulated errors. Hence, we expect all algorithms to give the true underlying power state as a solution. Here we keep the iterations running until every entry of the estimate differs from the corresponding true underlying value by less than a specified threshold. The results are shown in Figure 2.

Centralized algorithm: using eq. (9), we found that $\hat{x}_c$ agrees with the true underlying state vector obtained from MATPOWER. That means all pieces of the centralized algorithm and related information are consistent.

CSE algorithm: using formula (10), with tuning until $a = 8\mathrm{e}{-10}$ and $b = 1\mathrm{e}{-10}$, we found that the algorithm terminated in 13,626 iterations, and the estimates from all control areas agree with the true underlying state vector obtained from MATPOWER.

Distributed Dual Averaging algorithm: using formula (26), with $\alpha_0 = 2.66\mathrm{e}{-10}$ as computed from the formula discussed above, and with extra tuning until $\alpha_{NEW} = 2000\,\alpha_0$, we found that the algorithm terminated in 495,536 iterations, and the estimates from all control areas agree with the true underlying state vector obtained from MATPOWER.

ADMM-based algorithm: using formula (16), with tuning until $c = 9$, we found that the algorithm terminated in 34 iterations, and the estimates from all control areas agree, after matching, with the true underlying state vector obtained from MATPOWER.
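
For comparison with the iteration counts above, the ADMM recursions of Section 3.2 can be sketched on a toy problem. The sketch assumes that every area keeps a copy of every state entry (so $D_k = n_{area} I$ and $s$ is a plain average) and that the auxiliary variable is updated as $p_k \leftarrow p_k + s^{new} - (x_k + s)/2$ using the previous $x_k$ and $s$. The data, the value of `c` and the name `admm_state_estimation` are hypothetical; this is not the 14-bus configuration.

```python
import numpy as np

def admm_state_estimation(H, z, c, n_iter):
    """Toy ADMM recursion in which all areas share all state entries (assumption)."""
    n_area = len(H)
    n_state = H[0].shape[1]
    D = n_area * np.eye(n_state)          # |N_k^i| = n_area for every entry i
    x = np.zeros((n_area, n_state))       # local estimates x_k
    s = np.zeros(n_state)                 # average of shared entries
    p = np.zeros((n_area, n_state))       # auxiliary variables p_k
    # Pre-operational step: the local inverses are computed once and reused.
    gains = [np.linalg.inv(H[k].T @ H[k] + c * D) for k in range(n_area)]
    for _ in range(n_iter):
        x_new = np.array([gains[k] @ (H[k].T @ z[k] + c * D @ p[k])
                          for k in range(n_area)])
        s_new = x_new.mean(axis=0)
        p = p + s_new - 0.5 * (x + s)     # uses the previous x and s
        x, s = x_new, s_new
    return x

# Toy data (hypothetical): three areas, two unknown angles, no area locally observable.
theta_true = np.array([-0.05, 0.02])
H = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]), np.array([[1.0, -1.0]])]
z = [H_k @ theta_true for H_k in H]

x_admm = admm_state_estimation(H, z, c=1.0, n_iter=2000)
admm_error = np.max(np.abs(x_admm - theta_true))
```

The matrix added to $H_k^T H_k$ makes each local system invertible even when the area alone is unobservable, which is why the sketch runs without any local observability assumption.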

4.2 Convergence under Measurement and Communication Noise

To study the convergence behavior of the three iterative algorithms, we simulate the measurements described above from the IEEE 14-bus system with observation noise as well as communication noise. The convergence behavior is shown by calculating the error with respect to the centralized solution $\hat{x}_c$, defined as $e_{k,c}^r = \frac{1}{N} \| \hat{x}_c - \hat{x}_k^r \|_2$, and the error with respect to the true underlying state $x$, defined analogously as $e_k^r = \frac{1}{N} \| x - \hat{x}_k^r \|_2$. Here $N$ is the size of vector $x$. Note that we do not use stopping criteria here, in order to see the convergence behavior across a large number of iterations. The error curves obtained from the IEEE 14-bus network are shown below.

4.2.1 Measurement Noise Only

Figure 3: Performance of three algorithms under only observation noise. Measurement noise indicates values of $\sigma_{HIGH} = 0.01$ and $\sigma_{LOW} = 0.00001$. Title on plots indicates levels of measurement and communication noise in simulation.


Convergence behavior for the measurement-noise-only experiments is shown in Figure 3. In terms of convergence behavior, the ADMM algorithm is the most effective method. The convergence rate is high and the number of steps to reach the cut-off level is about 35 iterations, while the other two methods require 3 or 4 orders of magnitude more iterations. Moreover, the convergence rate of the ADMM algorithm in terms of $e_{k,c}$ tends to be linear after reaching the cut-off level of MSE. The CSE algorithm tends to be flat after reaching this cut-off level, while the dual averaging algorithm has a very flat convergence rate long before reaching the cut-off level. It would be interesting to see why the convergence behavior of the dual averaging algorithm has a sudden increase and then becomes flat.

4.2.2 Communication Noise Only

Figure 4: Performance of three algorithms under only communication noise. Measurement noise indicates values of $\sigma_{HIGH} = 0.01$ and $\sigma_{LOW} = 0.001$. Title on plots indicates levels of measurement and communication noise in simulation.


We now simulate the three methods using perfect observations and additive Gaussian noise in the messages transmitted between the areas. The results are shown in Figure 4. In terms of convergence behavior, the ADMM algorithm is still the most effective method since, for both low and high communication noise, the MSE reaches its lowest point quickly. It is interesting to note, however, that when the technique is run for the same number of iterations as DDA and CSE, the accuracy tends to decrease. In these experiments we also see the benefit of a decreasing step size in the estimate updates: in the high noise, no measurement noise case, as the iterations tend to infinity, the DDA and CSE algorithms will have higher accuracy since they weight down more recent messages, as opposed to the standard ADMM, which weighs them equally with previous estimates. However, for a practical distributed estimation technique, messaging times are finite; thus these techniques seem of only theoretical interest for this application.

4.2.3 Measurement and Communication Noise

Here we simulate the three methods with noise in the messages as well as the observations. The results are shown in Figure 5, using the same variances as in Sections 4.2.1 and 4.2.2. We only present cases where both are in low and high noise conditions. In the first experiment we see again that, given enough time, the CSE algorithm will outperform ADMM and DDA in terms of forming consensus; however, as in the other situations, ADMM performs many orders of magnitude faster.


Figure 5: Performance of three algorithms under only communication noise. Measurement noise indicates values of $\sigma_{HIGH} = 0.01$ and $\sigma_{LOW} = 0.001$. Title on plots indicates levels of measurement and communication noise in simulation.

Computational and Communication cost

In this section we give an overview of the computational as well as communication costs and times for the three algorithms based on their formulation. To assess computation and communication cost (time), we simulate the measurements described above from the IEEE 14-bus system with no noise, as in the experiment in Section 4.1. Here we will estimate computation cost and communication cost (time) for the experiment and will extend it to general settings.

To measure the computational complexity, we count the number of basic mathematical operations needed in the algorithm until the algorithm converges with the criterion set above. Note that we assume there is no computational complexity for inverting $R$ since we assume $R$ is diagonal. We separate the analysis of computational complexity into pre-operational (fixed) and operational (marginal) computation cost.

To measure the communication cost, we define the maximum cost of sending one number from a local control area to the central control center to be $c_{lc}$, and the maximum cost of sending one number from a local control area to its neighbor local area to be $c_{ll}$. We also assume there is no communication cost to collect local measurements to local control areas. Notice that the communication cost is proportional to the number and size of messages sent during the algorithm.

Similarly, to measure the communication time, we define the maximum time of sending one number from a local control area to the central control center to be $t_{lc}$, and the maximum time of sending one number from a local control area to its neighbor local area to be $t_{ll}$. We also assume there is no communication time to collect local measurements to local control areas. In addition, there is no bandwidth congestion problem within the communication network. Notice that the communication time is not necessarily proportional to the number and size of messages sent during the algorithm if there is no bandwidth congestion problem.

We also define $d$ to be the maximum degree in the communication network graph and

approximate

Centralized algorithm: we use (9) for analysis.

Computational complexity: The pre-operational computational cost is dominated by the matrix-matrix multiplication and matrix inversion, therefore: $O(n_{bus} M^2 + (n_{bus})^3)$. The operational computational complexity for each implementation is dominated by matrix-vector multiplication: $O(n_{bus} M + (n_{bus})^2)$.

Communication cost: The communication cost here is the cost of sending all local measurements to the central control area. Hence, the communication cost is bounded above by $O(m\,c_{lc})$.

Communication time: The communication time is bounded above by $O(t_{lc})$.

CSE algorithm: we use (10) with $a = 8\mathrm{e}{-10}$ and $b = 1\mathrm{e}{-10}$. We found that $n_{iterate} = 14184$ for low observation noise. It would be interesting to find a complexity bound on the number of iterations required for a given $\epsilon$ of accuracy.

Computational complexity: There is no pre-operational computational complexity for this algorithm. The operational complexity for each implementation is dominated by matrix-vector multiplication and the summations with information from neighbors: $O((n_{area})(n_{iterate})(\max_i |z_i|) + (d)(n_{bus}))$.

Communication cost: The communication cost here is the cost of sending the whole vector $\hat{x}_k^r$ to the neighbors. Hence, the communication cost is bounded above by $O((n_{iterate})(d)(n_{area})(n_{bus}))$.

Communication time: The communication time is bounded above by $O(n_{iterate}\,t_{ll})$.

Distributed Dual Averaging algorithm: we use eq. (26), with $\alpha_{new} = 2000\,\alpha$ as we obtained in Section 4.1, and the algorithm terminated after 509334 iterations.

Computational complexity: There is no pre-operational computational complexity for this algorithm. The operational complexity for each implementation is dominated by matrix-vector multiplication and the summations with information from neighbors: $O((n_{area})(n_{iterate})(n_{bus})(\max_i |z_i|) + (d)(n_{bus}))$.

Communication cost: The communication cost here is the cost of sending the whole vector $w_k^r$ to the neighbors. Hence, the communication cost grows as $O((n_{iterate})(d)(n_{bus})(n_{area})(c_{ll}))$.

Communication time: The communication time grows as $O((n_{iterate})(t_{ll}))$.

ADMM-based algorithm: we use formula (16), with $c = 9$ as we obtained in the previous part. We found that now the iterations terminate in $n_{iterate} = 33$.

Computational complexity: The pre-operational computational complexity is dominated by the matrix-matrix multiplication and matrix inversion: $O((n_{area})(\max_i |z_i|)^2 + (\max_i |x_i|)^3)$. The operational complexity for each implementation is dominated by matrix-vector multiplication and the summations with information from neighbors: $O((n_{area})(n_{iterate})(\max_i |x_i|)(\max_i |z_i| + d))$.

Communication cost: The communication cost here is the cost of sending components (not the whole vector) of $x_k^r$ to the neighbors. The communication cost is bounded above by $O((n_{iterate})(d)(n_{area})(\max_i |x_i|)\,c_{lc})$. However, to make every control area know the whole system state, it requires the extra communication cost of sending incomplete state information to all other control areas through local communication channels. Only for this 4 control area system, with careful counting, we found that this extra communication cost is bounded by $O((n_{area})^2 (\max_i |x_i|))$. So the total communication cost is $O((n_{iterate})(d)(n_{area})(\max_i |x_i|)\,c_{lc} + (n_{area})^2 (\max_i |x_i|))$.

Communication time: The communication time is bounded by $O(n_{area}\,t_{ll})$. However, to make every control area know the whole system state, it requires extra time to send incomplete state information to all other control areas through local communication channels. This extra communication time is bounded by $O(n_{area}\,t_{ll})$. The total communication time is bounded by $O(n_{area}\,t_{ll})$.

| Algorithm | Pre-Operational | Operational |
|---|---|---|
| Centralized | $O(n_{bus} M^2 + (n_{bus})^3)$ | $O(n_{bus} M + (n_{bus})^2)$ |
| CSE | 0 | $O((n_{area})(n_{iterate})(\max_i \lvert z_i \rvert) + (d)(n_{bus}))$ |
| DDA | 0 | $O((n_{area})(n_{iterate})(n_{bus})(\max_i \lvert z_i \rvert) + (d)(n_{bus}))$ |
| ADMM | $O((n_{area})(\max_i \lvert z_i \rvert)^2 + (\max_i \lvert x_i \rvert)^3)$ | $O((n_{area})(n_{iterate})(\max_i \lvert x_i \rvert)(\max_i \lvert z_i \rvert + d))$ |

Table 1: Computational complexity of the algorithms, separated into pre-operational complexity and operational complexity.

| Algorithm | Communication Cost | Communication Time |
|---|---|---|
| Centralized | $O(m\,c_{lc})$ | $O(t_{lc})$ |
| CSE | $O((n_{iterate})(d)(n_{area})(n_{bus}))$ | $O(n_{iterate}\,t_{ll})$ |
| DDA | $O((n_{iterate})(d)(n_{bus})(n_{area})(c_{ll}))$ | $O((n_{iterate})(t_{ll}))$ |
| ADMM | $O((n_{iterate})(d)(n_{area})(\max_i \lvert x_i \rvert)\,c_{lc} + (n_{area})^2 (\max_i \lvert x_i \rvert))$ | $O(n_{area}\,t_{ll})$ |

Table 2: Communication time and cost of the algorithms.

The computational complexity and communication time/cost for each algorithm are in Tables 1 and 2. The tables include a numerical result from the IEEE 14-bus system.
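
To give a feel for the orders of magnitude involved, the sketch below plugs the iteration counts quoted in this section into the message-count part of the communication cost expressions, leaving out the unit costs $c_{ll}$ and $c_{lc}$. It takes $n_{area} = 4$, $n_{bus} = 14$ and $d = 3$ (four fully connected areas); the value used for $\max_i |x_i|$ is a hypothetical placeholder, since the report does not state it, and the function names are illustrative.

```python
def full_vector_messages(n_iterate, degree, n_area, n_bus):
    """Numbers sent when every area sends its whole n_bus vector to each neighbor (CSE, DDA)."""
    return n_iterate * degree * n_area * n_bus

def admm_messages(n_iterate, degree, n_area, max_local_state):
    """Numbers sent by ADMM: shared components per iteration plus a final state exchange."""
    per_iteration = degree * n_area * max_local_state
    final_exchange = n_area ** 2 * max_local_state
    return n_iterate * per_iteration + final_exchange

n_area, n_bus, degree = 4, 14, 3
max_local_state = 7   # hypothetical value for max_i |x_i|; not given in the report

message_counts = {
    "CSE": full_vector_messages(14184, degree, n_area, n_bus),
    "DDA": full_vector_messages(509334, degree, n_area, n_bus),
    "ADMM": admm_messages(33, degree, n_area, max_local_state),
}

for name, count in message_counts.items():
    print(f"{name}: about {count:,} numbers exchanged")
```

Under these assumptions the gap between ADMM and the other two methods is driven almost entirely by the iteration count, not by the message size.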

5.1 Discussion

Pre-operational computational complexity: The CSE and dual averaging algorithms are the best among the four algorithms, while the centralized algorithm is the worst. This is because the centralized algorithm requires a big matrix inversion.

Operational computational complexity: The centralized algorithm is the best among the four algorithms, while the dual averaging algorithm is the worst. This is because the dual averaging algorithm requires a high number of iterations.

Overall computational complexity: In real world applications, the operational computational complexity is more critical than the pre-operational computational complexity. Once the pre-operational computation is done, the result can be stored and reused unless there is any update in the system (e.g. a bus connection line is cut). The numerical result here implies that, in terms of overall computational complexity, the centralized algorithm is more advantageous than the distributed algorithms. Among the distributed algorithms, the ADMM algorithm is the best algorithm. In the general setting, especially when the problem is large, the ADMM algorithm may be as good as or better than the centralized algorithm if $n_{iterate}$ is of the same order as the problem size. In [4], for the larger problem, the IEEE 118-bus system, $n_{iterate}$ does not grow as the problem size grows. The 118-bus system reaches $10^{-4}$ accuracy in $e_{k,c}$ after 10 iterations. A theoretical justification is needed to show how $n_{iterate}$ in the ADMM algorithm actually relates to the problem size.

Communication time (cost): In a small communication network, we may assume that $c_{lc} \approx c_{ll}$ and $t_{lc} \approx t_{ll}$. Then, the centralized algorithm is preferable. When the communication network is large, it is possible that $c_{lc} \gg c_{ll}$ and $t_{lc} \gg t_{ll}$. Then, the distributed algorithms might be preferable. According to the numbers we computed, the CSE and dual averaging algorithms have disadvantages due to the large number of iterations and large message sizes. The ADMM algorithm is a good candidate in which the communication time/cost might be comparable to the centralized algorithm. We need a model for the relationship between local-to-local parameters and local-to-central parameters when the problem is large, as well as for how $n_{iterate}$ in the ADMM algorithm relates to the problem size. Roughly speaking, if the local-to-central parameters grow as fast as $n_{iterate}$, the ADMM algorithm is even with the centralized algorithm.

Local Observability

None of the algorithms presented here require local observability. In CSE no explicit inversion is taking place, therefore we do not require $H_k^T H_k$ to be full rank for all $k$. In [9] the authors illustrate numerically that a lack of local observability still leads to the global solution. The CSE algorithm only requires global observability, that is, $\sum_k H_k^T H_k$ needs to be full rank. ADMM does not require local observability, since the matrix inversion step is performed with pseudo measurements from its neighbors. For the dual averaging algorithm, this is a new result for which one may explore a mathematical proof.

Convergence on Tree Networks

Due to the proliferation of sensing and computation capabilities on the distribution grid, exploring the convergence properties of distributed estimation on tree-like graphs seems an interesting direction of study. We simulated the PSSE problem using ADMM only. It would be fruitful to explore the other techniques as well; however, the large convergence times limited the study to faster converging techniques. In the simulations, we tested randomly generated trees of size $N = 2, 5, 10, 20$. The algorithm used for random tree generation is documented in [6]. For these simulations, the ground truth state vectors were randomly generated, as well as the observation matrices. We assumed that each bus made complete power observations of the entire system. That is, each bus had a bus power measurement as well as a branch flow measurement. This was chosen to reduce the variability of the convergence results. Convergence was measured by the number of iterations taken until $e_{c,k}^n < \epsilon$. We chose $\epsilon$ values of 0.01 and 0.001.
Figure 6: (Top Left) Number of iterations required for $e_c^k < 0.001$ to be satisfied vs. tree size. (Top Right) Number of iterations required for $e_c^k < 0.001$ to be satisfied vs. $\lambda_2$. (Bottom Left) Number of iterations required for $e_c^k < 0.01$ to be satisfied vs. tree size. (Bottom Right) Number of iterations required for $e_c^k < 0.01$ to be satisfied vs. $\lambda_2$.
Figure 6 illustrates the convergence properties of ADMM on randomly generated tree
networks. The first apparent result is that there is a large variation in termination
time for a given graph size. In the second set of plots, we see the variation of minimum
iteration counts vs. the second smallest eigenvalue, $\lambda_2$, of the system. On the log-log scale, the
plots show a linear relationship. There is also high variation in the termination time conditioned
on $\lambda_2$. This is because each randomly generated tree had a randomly generated bus matrix
as well as randomly generated observations. The variation of bus matrices and observations might be causing
the additional variation. At this moment, there has been no study of the effect of graph properties
on the ADMM algorithm assuming single-bus areas. However, the experimental results
point towards a polynomial relationship $O(\lambda_2^{\alpha}(L))$. The simulation data is used to fit a
simple regression model. The two experiments give exponent values of $\alpha_{\epsilon=.01} = -0.52143$
and $\alpha_{\epsilon=.001} = -0.51850$. The 95% confidence intervals under linear regression are
$\alpha^{95}_{\epsilon=.001} = [-0.56, -0.48]$ and $\alpha^{95}_{\epsilon=.01} = [-0.56, -0.47]$. It would be of theoretical interest to
derive this bound from first principles; however, this is beyond the scope of this work and is left to the
future.
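The kind of regression described above can be sketched as an ordinary least-squares fit of $\log(\text{iterations})$ against $\log(\lambda_2)$, where the slope estimates the exponent. The sketch below is an assumption about the procedure, not the code used for the reported numbers: the data are synthetic (generated with an exponent of $-0.5$ purely for illustration), the interval uses a normal approximation with the factor 1.96, and the name `fit_power_law` is hypothetical.

```python
import numpy as np

def fit_power_law(lam2, iters):
    """Fit log(iters) = alpha * log(lam2) + c by least squares.

    Returns alpha, c, and an approximate 95% confidence interval for alpha
    (normal approximation; an assumption, not necessarily the report's method).
    """
    x = np.log(np.asarray(lam2, dtype=float))
    y = np.log(np.asarray(iters, dtype=float))
    X = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(x) - 2)      # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)      # covariance of the coefficients
    half_width = 1.96 * np.sqrt(cov[0, 0])
    alpha = float(coef[0])
    return alpha, float(coef[1]), (alpha - half_width, alpha + half_width)

# Synthetic illustration only: iterations ~ 50 * lambda_2^(-0.5) with multiplicative noise.
rng = np.random.default_rng(0)
lam2 = rng.uniform(0.05, 2.0, size=200)
iters = 50.0 * lam2 ** -0.5 * np.exp(0.1 * rng.standard_normal(200))

alpha, intercept, ci = fit_power_law(lam2, iters)
print("estimated exponent:", alpha, "approx. 95% CI:", ci)
```

A straight line on the log-log plot corresponds to a power law in the original variables, which is why a linear fit in log coordinates recovers the exponent.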
Table 3 shows the mean number of iterations required before termination for different
sized trees. The results hint towards a linear relationship; however, there is a very large

Tree Size N            2      5     10     20
$\epsilon = .01$      32    125    542   2198
$\epsilon = .001$     20    116   1083   2931

Table 3: Mean computation iterations required for convergence for trees of size N.
Figure 7: Mean computation time for various sizes of trees (mean iterations vs. tree size, for $\epsilon = 0.01$ and $\epsilon = 0.001$).

variance, as shown in Figure 6. An interesting question is how much of the variation is explained
by the sensing matrix generated (randomly generated for each test), observations (randomly
generated for each test), and

(randomly generated for each test). More study is needed

to understand the fundamental evolution of the ADMM algorithm. One avenue of interest
would be interpreting the entire algorithm as a linear dynamical system. It can be shown
that, using the following state vector, the algorithm takes the form of a linear update:

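$$
\begin{bmatrix}
x_1(t+1) \\ s_1(t) \\ s_1(t+1) \\ p_1(t+1) \\ \vdots \\ x_M(t+1) \\ s_M(t) \\ s_M(t+1) \\ p_M(t+1)
\end{bmatrix}
= A
\begin{bmatrix}
x_1(t) \\ s_1(t-1) \\ s_1(t) \\ p_1(t) \\ \vdots \\ x_M(t) \\ s_M(t-1) \\ s_M(t) \\ p_M(t)
\end{bmatrix}
+ B z
$$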

This system can be treated in the same manner as other stochastic approximation problems. It would be interesting to apply methodologies similar to those used in [5] to explain the
empirically determined results relating convergence and graph spectrum, as well as to introduce
damped update schemes to combat communication noise.
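To make the linear dynamical system view concrete, the following minimal sketch iterates a generic update $y(t+1) = A y(t) + B z$ and compares it with its fixed point. The matrices `A` and `B` and the vector `z` are small hypothetical values chosen for illustration; they are not derived from the ADMM updates. The sketch only shows the standard fact that such an iteration converges when the spectral radius of $A$ is below one, with a rate governed by that radius.

```python
import numpy as np

def spectral_radius(A):
    """Largest eigenvalue magnitude of A."""
    return float(np.max(np.abs(np.linalg.eigvals(A))))

def iterate(A, B, z, y0, steps):
    """Run y(t+1) = A y(t) + B z for a fixed number of steps."""
    y = np.array(y0, dtype=float)
    for _ in range(steps):
        y = A @ y + B @ z
    return y

def fixed_point(A, B, z):
    """Solve (I - A) y = B z, valid when I - A is invertible."""
    return np.linalg.solve(np.eye(A.shape[0]) - A, B @ z)

# Hypothetical stable system (not the ADMM matrices).
A = np.array([[0.5, 0.1],
              [0.0, 0.8]])
B = np.eye(2)
z = np.array([1.0, 2.0])

print("spectral radius:", spectral_radius(A))
print("after 200 steps:", iterate(A, B, z, np.zeros(2), 200))
print("fixed point:    ", fixed_point(A, B, z))
```

Under this view, relating the convergence rate to the graph spectrum amounts to studying how the eigenvalues of $A$ depend on the network.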


Conclusion

In Section 4, we investigated the CSE and ADMM algorithms, which have appeared in the distributed power system estimation literature. In addition, we investigated the distributed
dual averaging method. It turns out that the CSE and dual averaging algorithms suffer from
a large number of iterations and large message sizes. The ADMM algorithm
is experimentally justified as a good algorithm for power system state estimation for
many reasons. It converges fastest among the three algorithms. It has low computational
complexity, communication cost, and time. Local observability is not required for the
ADMM algorithm. The only poor performance indicator for the ADMM algorithm is its
tolerance to communication noise: it is less tolerant to communication noise than the CSE algorithm. Yet the difference
is small for moderate termination steps. In comparison with the centralized algorithm, the
ADMM algorithm may be more efficient in terms of computational complexity, communication time, and cost when the problem size is large.

Whereas [4] expects the number of iterations in power system state estimation to grow
more slowly than the size of the whole network, we found a large increase as the size of the network grew in the
case of trees. Future work should also include experimental results on connected graphs to
correspond with future theoretical findings.

References
[1] J. Duchi, A. Agarwal, and M. Wainwright. Dual averaging for distributed optimization: Convergence analysis and network scaling. Automatic Control, IEEE Transactions on, (99):1-1, 2010.

[2] S. Haykin. Adaptive filter theory (ISE). 2003.

[3] S. Kar, J.M.F. Moura, and K. Ramanan. Distributed parameter estimation in sensor networks: Nonlinear observation models and imperfect communication. Arxiv preprint arXiv:0809.0009, 2008.

[4] V. Kekatos and G.B. Giannakis. Distributed robust power system state estimation. Arxiv preprint arXiv:1204.0991, 2012.

[5] R. Rajagopal and M.J. Wainwright. Network-based consensus averaging with general noisy channels. Signal Processing, IEEE Transactions on, 59(1):373-385, 2011.

[6] A. Rodionov and H. Choo. On generating random network structures: Trees. Computational Science - ICCS 2003, pages 677-677, 2003.

[7] F.C. Schweppe. Power system static-state estimation, part III: Implementation. Power Apparatus and Systems, IEEE Transactions on, (1):130-135, 1970.

[8] F.C. Schweppe and D.B. Rom. Power system static-state estimation, part II: Approximate model. Power Apparatus and Systems, IEEE Transactions on, (1):125-130, 1970.

[9] L. Xie, D.H. Choi, and S. Kar. Cooperative distributed state estimation: Local observability relaxed. In Power and Energy Society General Meeting, 2011 IEEE, pages 1-11. IEEE, 2011.
