Asymptotic Optimality Theory for Decentralized Sequential Hypothesis Testing in Sensor Networks

Yajun Mei

Abstract— The decentralized sequential hypothesis testing problem is studied in sensor networks, where a set of sensors receive independent observations and send summary messages to the fusion center, which makes a final decision. In the scenario where the sensors have full access to their past observations, the first asymptotically Bayes sequential test is developed, and the proposed test has the same asymptotic performance as the optimal centralized test that has access to all sensor observations. Next, in the scenario where the sensors do not have full access to their past observations, a simple but asymptotically Bayes sequential test is developed, in which the sensor message functions are what we call tandem quantizers, where each sensor only uses two different sensor quantizers with at most one switch between these two quantizers. Moreover, a new minimax formulation for finding optimal stationary sensor quantizers is proposed and studied in detail in the case of additive Gaussian sensor noise. Finally, our results show that feedback from the fusion center does not improve asymptotic performance in the scenario with full local memory; however, even a one-shot one-bit feedback can significantly improve asymptotic performance in the scenario with limited local memory.

Index Terms— Asymptotically Bayes, distributed detection, multi-sensor, quantization, sensor networks, sequential detection, tandem quantizer

[Fig. 1. General setting for sensor networks: at time n, sensors S1, ..., SK observe X1,n, ..., XK,n and send messages U1,n, ..., UK,n to the fusion center, which makes the final decision; feedback from the fusion center to the sensors is possible.]

I. INTRODUCTION

Sensor networks were originally motivated by their applications in military surveillance [19], and now they have many other important applications, including mobile and wireless communication, internet or computer network monitoring, and urban disaster prevention and response.

There are various possible configurations for sensor networks. Figure 1 illustrates the general setting of a widely used configuration. In such a network, at time n, each of a set of K sensors receives an observation Xk,n, and then sends a sensor message Uk,n to a central processor, called the fusion center, which makes a final decision when observations are stopped. If necessary, the fusion center can also send feedback to the local sensors.

In many interesting applications, sensors are assumed to be intelligent, and the information produced by the sensors is transmitted and fused in a fashion that provides reliable summary information while using the minimum amount of network resources. A standard mathematical formulation that can meet these requirements is to limit the sensor messages Uk,n to a finite alphabet (perhaps binary). While this approach incurs information loss as compared to the centralized setting of sending all raw data to the fusion center, it often provides a formulation that is better suited for practical applications, due to the need for data compression, a notable decrease in communication bandwidth, a significant reduction in computational complexity at the fusion center, and privacy of raw data.

In sensor networks, three of the most fundamental problems are: (1) how should each sensor send its messages Uk,n to the fusion center; (2) how should the fusion center combine the messages from different sensors to make a final decision; and (3) if necessary, what kind of feedback should be sent from the fusion center to the sensors? It is worth emphasizing that these three problems are closely related to each other; e.g., the "best" decision rule at the fusion center will depend on how the sensors send messages to the fusion center. Ideally one wants to find optimal solutions to all three problems simultaneously so as to achieve the "best" network performance (under some suitable definition).

As in the classical or centralized setting, there are two kinds of decision problems in sensor networks: static and dynamic. In static decision problems, the number of sensor observations (the sample size) is fixed before the data are taken, and the fusion center makes only one decision. Static decision problems in sensor networks have been studied over the past twenty years.

Manuscript received September 1, 2006; revised October 1, 2007. This work was supported in part by NIH Grant R01 AI055343. The material in this article was presented in part at the IEEE International Symposium on Information Theory, Seattle, WA, USA, 2006. The author is with the School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA (email: ymei@isye.gatech.edu).
For a review, see [23] and the references therein for hypothesis testing in sensor networks, and see [12], [27] for estimation problems in sensor networks.

However, there is a large body of problems where the static setting is not useful. In particular, the nature of sensor networks is dynamic, as information is updated over time at both the local sensor and fusion center levels. In the statistical literature, dynamic decision problems are also known as sequential decision problems, which were first introduced by Wald [24]. In the classical or centralized setting, sequential decision problems have been the focus of the mature fields of sequential analysis and change-point problems; see [2], [9] and the references therein. On the other hand, research on sequential decision problems in sensor networks is rather limited; see [3], [21] for a review.

In this article, we study a decentralized sequential hypothesis testing problem, one of the standard sequential decision problems in sensor networks; see [3], [23] for many important applications, including distributed signal detection, recognition, and surveillance. In the decentralized sequential hypothesis testing problem, it is assumed that there are two simple hypotheses H0 and H1 on the distributions of the sensor observations. In order to make a quick but good decision on which of these two hypotheses is true, the optimization needs to be done jointly over sensor and fusion center policies, as well as over time. This makes the optimization highly problematic. In fact, it is well known [22] that dynamic programming or Bayes formulations generally become intractable, except in a special case where it has been an open problem to derive the properties of Bayes solutions.

The goal of this article is to develop asymptotic optimality theory for the Bayes formulation of the decentralized sequential hypothesis testing problem, and to offer two classes of fairly simple decentralized sequential tests which are asymptotically Bayes in their respective scenarios. As a consequence, several open problems raised in [21], [22] for decentralized sequential hypothesis testing problems will be addressed asymptotically.

Throughout this article, we make the following assumptions, which are standard:

(A1) Conditioned on each of the hypotheses, H0 and H1, the sensor observations are independent over time as well as from sensor to sensor.

(A2) For each 1 ≤ k ≤ K, the density of the sensor observations Xk,1, Xk,2, ··· at sensor Sk is fk under the null hypothesis H0, and is gk under the alternative hypothesis H1. For each 1 ≤ k ≤ K, the Kullback-Leibler information numbers (or relative entropies)

    I(gk, fk) = ∫ log( gk(x)/fk(x) ) gk(x) dx

and

    I(fk, gk) = ∫ log( fk(x)/gk(x) ) fk(x) dx

are finite and positive. Moreover,

    ∫ ( log gk(x)/fk(x) )² gk(x) dx < ∞

and

    ∫ ( log fk(x)/gk(x) )² fk(x) dx < ∞.
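As a quick numerical illustration of assumption (A2) (our own toy example, not part of the original text), the information numbers and the second-moment condition can be checked by quadrature for a hypothetical Gaussian pair, for which the closed forms are standard:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Hypothetical sensor densities: f_k = N(0,1) under H0, g_k = N(mu,1) under H1.
mu = 1.0
f = lambda x: norm.pdf(x, 0.0, 1.0)
g = lambda x: norm.pdf(x, mu, 1.0)

# I(g,f) = E_g[log(g/f)] and I(f,g) = E_f[log(f/g)]; both equal mu^2/2 here.
I_gf, _ = quad(lambda x: np.log(g(x) / f(x)) * g(x), -np.inf, np.inf)
I_fg, _ = quad(lambda x: np.log(f(x) / g(x)) * f(x), -np.inf, np.inf)
# Second-moment condition in (A2): E_g[(log g/f)^2] = mu^2 + mu^4/4 < infinity.
V_gf, _ = quad(lambda x: np.log(g(x) / f(x)) ** 2 * g(x), -np.inf, np.inf)
print(I_gf, I_fg, V_gf)  # approximately 0.5, 0.5, 1.25
```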
Now let us briefly review some open problems raised in an influential paper by Veeravalli, Basar and Poor [22], and summarize our main results. In [22], the authors considered two different scenarios of sensor networks, depending on how local information is used to produce the sensor messages Uk,n at the sensors. The first scenario is the system with full local memory, where the sensors have full access to their past observations. This scenario has the following two possible cases, which correspond to Cases B and D in [22].

Case (i) System with no Feedback and Full Local Memory:

    Uk,n = φk,n(Xk,[1,n]),    (1)

where Xk,[1,n] = (Xk,1, Xk,2, ..., Xk,n) are all local sensor observations observed up to time n at sensor Sk.

Case (ii) System with Full Feedback and Full Local Memory:

    Uk,n = φk,n(Xk,[1,n]; En−1),    (2)

where En−1 denotes the past sensor message information given by

    En−1 = {U1,[1,n−1], U2,[1,n−1], ..., UK,[1,n−1]},    (3)

with Uk,[1,n−1] = (Uk,1, Uk,2, ..., Uk,n−1), and E0 is the null set.

It is well known [22] that it is intractable to find Bayes solutions under this scenario via the dynamic programming approach, and it has been an open problem to find any asymptotically optimal sequential tests in the system with full local memory. By using the asymptotic optimality approach, we will offer the first asymptotically Bayes sequential tests in the system with full local memory. In our proposed tests, each sensor sends its local "sensor decisions" Uk,n to the fusion center, which then combines all local sensor decisions by using an "AND" rule to decide which of the hypotheses is true.

Note that a criticism often made of the system with full local memory is that it requires the sensors to have unlimited memory and computational capacities as time goes to ∞. In our proposed tests, however, the local sensor decisions (or sensor messages) are based on local sufficient statistics which can be calculated recursively, and thus at every time n, each sensor only requires a memory of a data set of size 2 and only involves O(1) computations.

The second scenario is the system with limited local memory, where the sensors do not have access to their past observations. The following three possible cases have been considered in [22].

Case (iii) System with Neither Feedback from the Fusion Center nor Local Memory:

    Uk,n = φk,n(Xk,n).    (4)

Case (iv) System with no Feedback and Local Memory Restricted to Past Sensor Messages:

    Uk,n = φk,n(Xk,n; Uk,[1,n−1]).    (5)

Case (v) System with Full Feedback, but Local Memory Restricted to Past Sensor Messages:

    Uk,n = φk,n(Xk,n; En−1),    (6)

where En−1 is the past sensor message information defined in (3).

As illustrated in [22], in the system with limited local memory, the Bayes formulation is tractable only in the case with full feedback specified in (6), but it has been an open problem to derive the properties of the Bayes solution due to its complicated structure. In order to develop simple decentralized sequential tests with good performance, another closely related open problem raised in [22] is whether stationary quantizers, i.e., stationary sensor message functions, can lead to decentralized sequential tests that are asymptotically optimal in the system with limited local memory. That is, can we find asymptotically Bayes sequential tests for the case specified in (4) in which the sensor message functions φk,n ≡ φk do not change over time n?

We will provide asymptotic answers to both of these open problems in the system with limited local memory. For the second open problem, we will show that tests with stationary quantizers are generally asymptotically suboptimal except in some extreme situations. Moreover, to address the optimality properties within the class of stationary quantizers, we also propose a new minimax formulation for finding optimal stationary quantizers, and study it in detail in the case of additive Gaussian sensor noise.

For the first open problem, it is very difficult to study the properties of the Bayes solution directly, although its structure was characterized in [22]. Thus we adopt an indirect approach by establishing sharp upper and lower bounds on the properties of the Bayes solution, where the leading terms in the upper and lower bounds are the same. A key step is to develop simple but asymptotically optimal decentralized sequential tests in the system with limited local memory. To accomplish this, we consider the following new case, where the fusion center quantizes the past sensor messages En−1 in (3) before sending the feedback to the sensors.

Case (vi) System with Quantized Feedback, and Local Memory Restricted to Past Sensor Messages:

    Uk,n = φk,n(Xk,n; Vk,n−1),    (7)

where the quantized feedback

    Vk,n−1 = ψk,n−1(En−1)

is chosen from a finite (possibly binary) list and En−1, defined in (3), is the set of sensor messages available before time n.

Note that this new case can be thought of as a generalization of three cases considered in [22]. Specifically, this new case becomes the cases defined in (4), (5) and (6) when the (quantized) feedback Vk,n−1 is null, the past sensor messages at each sensor, and the full feedback, respectively. From the practical point of view, this new case is more appealing than the case with full feedback specified in (6), since full feedback assumes the fusion center sends all past message information En−1 back to the sensors even for large n, which requires a large communication bandwidth between the sensors and the fusion center. To take into account limited computational capabilities at the sensor level, a more practice-based view of quantized feedback is that the fusion center determines and sends each sensor its own threshold values to be used in its sensor message function.

It turns out that it is sufficient for the fusion center to send a one-shot one-bit feedback in order to construct simple sequential tests which are asymptotically Bayes in the system with limited local memory. Motivated by Abramson [1], and Kiefer and Sacks [6], our proposed tests divide the decision making into two stages. In the first stage, preliminary samples are taken at the sensors until the fusion center makes a preliminary decision on which of H0 and H1 is true. The fusion center then sends the preliminary decision back to all sensors so that the sensors can optimize their respective quantizers according to the preliminary decision of the fusion center. In the second stage, all sensors use the new "optimal" quantizers to send all future sensor messages until the fusion center makes a final decision, which is based only on the sensor messages from the second stage. The number of time steps taken in the first stage (when the cost c is small) is large, but is small relative to that in the second stage.

Another way to look at our proposed tests is from the sensor's point of view. In our proposed tests in the system with limited local memory, the sensor message function at each sensor is what we call a tandem quantizer, where each sensor only uses two different sensor quantizers with at most one switch between these two quantizers. Note that the sensor quantizers in the Bayes solution need to be changed at each time step, whereas one can generally only obtain suboptimal decentralized tests if the sensor quantizers do not change over time, i.e., if the sensor quantizers are stationary. Hence, from the viewpoint of the number of switches, a tandem quantizer is the simplest possible candidate for constructing asymptotically Bayes (or optimal) decentralized tests in the system with limited local memory.

The remainder of this article is organized as follows. In Section II, we provide a formal mathematical formulation of decentralized sequential hypothesis testing problems and present some well-known results on the optimal centralized sequential test to provide a benchmark for the decentralized sequential tests. Section III studies the decentralized sequential detection problem in the system with full local memory, and Section IV considers the system with limited local memory. Each of Sections III and IV develops asymptotic theory and offers asymptotically Bayes tests which are easy to implement. Section V focuses on tests with stationary sensor message functions, and proposes a new minimax formulation for the optimal stationary sensor message functions. In Section VI, we give a concrete example to illustrate our asymptotic results. The proofs of most theorems are included in the Appendix.
II. PROBLEM FORMULATION AND NOTATIONS

Suppose there are K sensors in a sensor network. At time n, an observation Xk,n is made at each sensor Sk. Assume there are two simple hypotheses H0 and H1. Under the hypothesis H0, the observations at sensor Sk, Xk,1, Xk,2, ···, are independent and identically distributed (i.i.d.) with density function fk, and under the hypothesis H1, they have density gk. Denote by P0 and P1 respectively the probability measures under hypotheses H0 and H1, and let E0 and E1 be the corresponding expectations. Note that this notation is slightly different from the conventional notation for conditional probabilities and expectations under the Bayes formulation, but it has advantages in the asymptotic optimality approach.

Based on the information available at Sk at time n, a sensor message Uk,n, specified in (1) or (2) for the system with full local memory, and in (4)-(7) for the system with limited local memory, is chosen from a finite alphabet and sent to the fusion center. Without loss of generality, it is assumed that Uk,n is chosen from the finite set {0, 1, ···, Dk − 1}.

At time n, the fusion center may take one of three possible actions: (1) let all sensors continue taking observations and sending messages to the fusion center; (2) inform all sensors to stop taking observations and decide that the null hypothesis H0 is true; or (3) inform all sensors to stop taking observations and decide that the alternative hypothesis H1 is true.

To define measures of performance for a decentralized sequential test δ, denote by τ the corresponding stopping time when the fusion center decides to stop taking observations. There are four quantities that are useful and appropriate for evaluating decentralized sequential tests: (1) the probability of Type I error, P0(reject H0); (2) the probability of Type II error, P1(accept H0); (3) the expected decision-making time under H0, E0(τ); and (4) the expected decision-making time under H1, E1(τ). The last two performance measures are included since τ, the number of time steps needed to make a final decision, depends on the observations and is thus a random variable. Ideally we want all four quantities to be as small as possible.

Veeravalli, Basar and Poor [22] considered the following Bayes formulation of the decentralized sequential detection problem. Assume the two hypotheses H0 and H1 have known prior probabilities, say, π = P{H0 is true} = 1 − P{H1 is true} for some 0 ≤ π ≤ 1, and each time step taken for decision making costs a positive amount c. Also, for i = 0, 1, let Wi (> 0) be the cost of falsely rejecting Hi. Then the total expected cost, or Bayes risk, of a decentralized sequential test δ with a stopping time τ is

    Rc(δ) = π[ cE0(τ) + W0 P0{reject H0} ] + (1 − π)[ cE1(τ) + W1 P1{reject H1} ],    (8)

where the notation Rc is used to emphasize that the cost of each time step is c. The Bayes formulation of decentralized sequential hypothesis testing problems can then be stated as follows.

Problem (P1): Minimize the Bayes risk Rc(δ) in (8) over all possible choices of sensor message functions and over all possible policies at the fusion center.

Unfortunately, it is nearly impossible to find exactly optimal solutions in sensor networks (for some special cases see [22]), and only "asymptotic optimality" results seem to be attainable. In the asymptotic optimality approach, we first construct an asymptotic lower bound on Rc(δ) in (8) as c goes to 0. Then we show that a given class of decentralized sequential tests attains the lower bound asymptotically. We will establish asymptotic optimality theorems for both scenarios of decentralized decision systems: full local memory (specified in (1) and (2)) and limited local memory (specified in (4)-(7)).

We now introduce some notation. Let D be a positive integer. Consider a random variable X whose density function is either f or g with respect to some σ-finite measure, and assume that the Kullback-Leibler information number I(g, f) is finite. For a (deterministic or random) measurable function φ from the range of X to a finite alphabet of size D, say {0, 1, ..., D − 1}, denote by fφ and gφ respectively the probability mass functions of φ(X) when the density of X is f and g. Let

    Zφ = log[ gφ(φ(X)) / fφ(φ(X)) ],

and define

    ID(g, f) = sup_φ Eg(Zφ)    (9)

and

    VD(g, f) = sup_φ Eg(Zφ²).    (10)

It is well known [20] that ID(g, f) ≤ I(g, f), i.e., the reduction of the data from X to φ(X) cannot increase the information. Tsitsiklis [20] showed that the supremum ID(g, f) is achieved by a Monotone Likelihood Ratio Quantizer (MLRQ) ϕ of the form

    ϕ(X) = d if and only if λd ≤ g(X)/f(X) < λd+1,

where 0 = λ0 ≤ λ1 ≤ ··· ≤ λD−1 ≤ λD = ∞ are constants. Similarly, by switching the roles of f and g, the supremum ID(f, g) is achieved by an MLRQ with another (likely different) set of constants λ. These optimal MLRQs are not easily calculated, but we follow the standard practice in the literature of developing procedures that assume the sensor messages are constructed optimally at the sensor. The definition of VD(g, f) was first proposed in [10], [11]. Some of our theorems assume that VD(g, f) < ∞, and sufficient conditions for the finiteness of V2(g, f) are given in Theorem 2 and Corollary 2 of [11].

Using these notations, define the information numbers

    ID = Σ_{k=1}^{K} I_{Dk}(fk, gk)  and  JD = Σ_{k=1}^{K} I_{Dk}(gk, fk),    (11)

where D = (D1, D2, ..., DK). Moreover, define

    Itot = Σ_{k=1}^{K} I(fk, gk)  and  Jtot = Σ_{k=1}^{K} I(gk, fk).    (12)

These four information numbers are key to our theorems.
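To make (9)-(11) concrete, the sketch below (an illustrative Gaussian pair of our own choosing) evaluates the binary-message information number I2(g, f). Since the likelihood ratio g/f is monotone in x here, the optimal MLRQ reduces to a single threshold, which can simply be grid-searched:

```python
import numpy as np
from scipy.stats import norm

mu = 1.0  # hypothetical example: f = N(0,1), g = N(1,1)

def h(a, b):
    # Relative entropy between Bernoulli(a) and Bernoulli(b); this equals
    # E_g[Z_phi] when a, b are the induced probabilities of message "1".
    return a * np.log(a / b) + (1 - a) * np.log((1 - a) / (1 - b))

lams = np.linspace(-5, 5, 2001)          # candidate MLRQ thresholds
p = norm.sf(lams, 0.0, 1.0)              # f_phi(1) = P_f(X >= lam)
q = norm.sf(lams, mu, 1.0)               # g_phi(1) = P_g(X >= lam)
I2_gf = np.max(h(q, p))                  # I_2(g,f) = sup over binary MLRQs
print(I2_gf, "<=", mu**2 / 2)            # quantization cannot increase information
```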
In the remainder of this section, let us consider the optimal centralized sequential test, i.e., the Bayes solution to Problem (P1) in the centralized setting when the fusion center has access to all sensor observations, or equivalently, when all sensors send their raw data to the fusion center (Uk,n = Xk,n). The optimal centralized sequential test not only provides a benchmark for all decentralized sequential tests, but also sheds light on developing asymptotically optimal decentralized sequential tests. Denote by δ*cen(c) the optimal centralized sequential test. It is well known [24] that δ*cen(c) is of the form of Wald's sequential probability ratio test (SPRT), defined as follows. Choose two constants a > 0 and b > 0, and define the log-likelihood ratio

    Ln = Σ_{i=1}^{n} Σ_{k=1}^{K} log[ gk(Xk,i) / fk(Xk,i) ].    (13)

Define a stopping time N = first n ≥ 1 such that Ln ∉ (−b, a). In other words,

    N = inf{ n ≥ 1 : Ln ∉ (−b, a) }.    (14)

The SPRT stops taking observations at time N and

    rejects H0 if LN ≥ a,
    accepts H0 if LN ≤ −b.

The appropriate values of the thresholds a and b in the optimal centralized test δ*cen(c) are determined by the a priori probability π of H0 and the costs W0, W1 and c in (8). From the asymptotic viewpoint, however, the thresholds a and b in δ*cen(c) satisfy a ≈ b ≈ |log c| as c → 0, as shown in [4], [24]. To see this, note that as c → 0, both a and b are large, and it is well known [24] that the SPRT in the centralized setting satisfies

    P0(reject H0) ≈ e^{−a},  P1(accept H0) ≈ e^{−b},
    E0(N) ≈ b/Itot,  and  E1(N) ≈ a/Jtot,

where Itot and Jtot are defined in (12). Hence, the Bayes risk Rc of an SPRT can be approximated by

    π[ c b/Itot + W0 e^{−a} ] + (1 − π)[ c a/Jtot + W1 e^{−b} ].

Minimizing this approximation to Rc, we have

    a ≈ −log c + log[ Jtot W0 π/(1 − π) ] ≈ |log c|,    (15)
    b ≈ −log c + log[ Itot W1 (1 − π)/π ] ≈ |log c|,

as c → 0. Thus an SPRT with threshold values a = b = |log c| is asymptotically Bayes for Problem (P1) in the centralized setting. Since we are interested in asymptotic results in this article, to simplify our notation, we refer to the optimal centralized sequential test as the SPRT whose stopping time is defined in (14) with thresholds a = b = |log c|. Using the approximations for the SPRT again, it is easy to see that the risk corresponding to the optimal centralized test δ*cen(c) is

    Rc(δ*cen(c)) ≈ c|log c| ( π/Itot + (1 − π)/Jtot ).    (16)

By (15) and (16), from the asymptotic point of view, the optimal centralized sequential test and its Bayes risk depend mainly on c, Itot, Jtot and π, and are insensitive to the costs W0 and W1 of making the wrong decision. Moreover, the Bayes risk of the optimal centralized sequential test δ*cen(c) is mainly the cost of the time steps taken for decision making.

III. SYSTEM WITH FULL LOCAL MEMORY

Let δ*A(c) denote a Bayes solution to the decentralized sequential hypothesis testing problem in the system with full local memory, specified in (1) or (2). The main result in this section is to develop a decentralized test δA(c) for the case specified in (1) such that, under some general conditions, lim_{c→0} Rc(δ*A(c))/Rc(δA(c)) = 1, i.e., {δA(c)} is "asymptotically Bayes" in the system with full local memory as the cost per time step taken for decision making goes to 0. Here Rc, of course, denotes the Bayes risk defined in (8) when c is the cost per time step taken for decision making.

A. The Structure of Our Proposed Tests δA(c)

In the system with full local memory, when Dk ≥ 3 for each k, our proposed test δA(c) is defined as follows (see Remark 5 below for the extension of our proposed test to the case of binary sensor messages).

1) Sensor Policy: Each sensor Sk uses its local sensor observations to conduct an SPRT with appropriate but fixed boundaries, and sends its local decision to the fusion center. Specifically, each sensor Sk calculates its local log-likelihood ratio statistic Lk,n recursively by

    Lk,n = Lk,n−1 + log[ gk(Xk,n) / fk(Xk,n) ]    (17)

for n ≥ 1, with Lk,0 = 0. Then each sensor applies a quantizer to Lk,n to send messages to the fusion center, i.e., the sensor message sent from sensor Sk to the fusion center is defined by

    Uk,n = 0 if Lk,n ≤ −rk |log c|,
    Uk,n = 1 if Lk,n ≥ ρk |log c|,
    Uk,n = NULL otherwise,

where

    rk = I(fk, gk) / Σ_{k=1}^{K} I(fk, gk) = I(fk, gk)/Itot,    (18)

and

    ρk = I(gk, fk) / Σ_{k=1}^{K} I(gk, fk) = I(gk, fk)/Jtot.    (19)

Here rk and ρk can be thought of as the weights of sensor Sk in the overall final decision under the hypotheses H0 and H1, respectively. The message "NULL" is a special sensor symbol indicating that the sensor has not reached a local decision yet. For example, "NULL" could be represented by the situation where the sensor does not send any message to the fusion center, i.e., the sensor is silent.

2) Fusion Center Policy: After receiving the local "sensor decisions" Uk,n from the sensors, the fusion center combines all local sensor decisions by using an "AND" rule. That is, the fusion center will

    stop and accept H1 if Uk,n = 1 for all 1 ≤ k ≤ K,
    stop and accept H0 if Uk,n = 0 for all 1 ≤ k ≤ K,
    continue sampling otherwise.
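As a minimal simulation sketch of δA(c), consider a hypothetical two-sensor Gaussian setting of our own choosing, with fk = N(0,1) and gk = N(µk,1), so that I(fk, gk) = I(gk, fk) = µk²/2 and the weights in (18) and (19) coincide:

```python
import numpy as np

rng = np.random.default_rng(1)
c = 1e-4
logc = abs(np.log(c))                 # |log c|
mus = np.array([1.0, 0.5])            # hypothetical per-sensor means under H1
info = mus**2 / 2                     # I(f_k,g_k) = I(g_k,f_k) for these densities
w = info / info.sum()                 # weights r_k = rho_k from (18)-(19)

def delta_A(h1_true):
    """Local SPRT messages 0/1/NULL at each sensor, 'AND' rule at the fusion center."""
    L = np.zeros(2)                   # local log-likelihood ratios L_{k,n}, as in (17)
    n = 0
    while True:
        n += 1
        x = rng.normal(mus if h1_true else 0.0, 1.0)
        L += mus * x - mus**2 / 2     # log g_k(x)/f_k(x) for unit-variance Gaussians
        if np.all(L >= w * logc):     # every U_{k,n} = 1  ->  stop, accept H1
            return True, n
        if np.all(L <= -w * logc):    # every U_{k,n} = 0  ->  stop, accept H0
            return False, n

print(delta_A(h1_true=True))          # (decision, stopping time)
```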
Note that since the local log-likelihood ratio statistic Lk,n can be calculated recursively, our proposed test δA(c) reduces the local memory requirement at every time n from the full local memory {Xk,1, ···, Xk,n} to a data set of size 2, i.e., {Lk,n−1, Xk,n}.

Furthermore, it is also easy to see that in single-sensor systems, i.e., K = 1, our proposed test δA(c) coincides with the optimal centralized SPRT. However, our proposed test δA(c) requires that each sensor continue sending its local sensor decisions to the fusion center even after the log-likelihood ratio statistic exceeds the local thresholds. This essential feature can be seen from the following heuristic argument, which provides the motivation for δA(c).

Consider the optimal centralized SPRT whose stopping time is defined in (14) with thresholds a = b = |log c|. By (17), the log-likelihood ratio statistic at sensor Sk is

    Lk,n = Σ_{i=1}^{n} log[ gk(Xk,i) / fk(Xk,i) ].

Combining this with (13) yields Ln = Σ_{k=1}^{K} Lk,n for all n. That is, at any given time, the log-likelihood ratio statistic at the fusion center in the centralized version is the sum of the local log-likelihood ratio statistics at all sensors. Thus, if each sensor were able to send its local log-likelihood ratio statistic (a sufficient statistic) to the fusion center, then the fusion center would be able to perform the optimal centralized SPRT. However, under the restriction that the sensor messages belong to a finite alphabet, each sensor may not be able to send its local log-likelihood ratio statistic directly to the fusion center. Fortunately, the idea can be salvaged by the strong law of large numbers (SLLN). Observe that when H0 is true, Ln = Σ_{k=1}^{K} Lk,n goes to −∞ as n goes to ∞. Hence, under the null hypothesis H0, the stopping rule of the optimal centralized SPRT can be approximated by

    { Σ_{k=1}^{K} Lk,n ≤ −|log c| }    (20)

for sufficiently small c. Now the SLLN implies that Lk,n/n → −I(fk, gk) with probability 1 under the null hypothesis H0. Hence, the weight of Lk,n in the sum is roughly I(fk, gk)/Σ_{k=1}^{K} I(fk, gk) = rk, and thus (20) can be further approximated by {Lk,n ≤ −rk |log c| for all 1 ≤ k ≤ K}, which is exactly the stopping rule when our proposed test δA(c) decides that H0 is true. In other words, under the null hypothesis H0, the stopping rule of the optimal centralized SPRT can be approximated by that of our proposed test δA(c). Similarly, the conclusion also holds under the alternative hypothesis H1. Since the optimal centralized SPRT can be approximated by our proposed test δA(c) under both hypotheses H0 and H1, it is not surprising to see in the next subsection that our proposed test δA(c) is asymptotically optimal not only in the system with full local memory, but also in the centralized setting.

B. Asymptotic Optimality

We first establish asymptotic lower bounds on the Bayes risk Rc(δ) in (8) for any decentralized test in the system with full local memory. Later these bounds will be used to prove the asymptotic optimality properties of δA(c). Note that such a lower bound can be established from the optimality of the SPRT in the centralized version, and it turns out that this bound is sufficient for our purpose in the system with full local memory. The following theorem is a slight modification of Theorem 2 in [4]. While this theorem follows at once from the optimality of the centralized SPRT and its asymptotic property established in (16), an alternative proof, due to Chernoff [4], is given in the Appendix so that parts of the argument can be applied conveniently in the proof of Theorem 4.

Theorem 1: For any sequential or fixed-sample-size test δ in the decentralized or centralized setting, if Rc(δ) ≤ M c|log c| for some constant M, then as c → 0,

    Rc(δ) ≥ (1 + o(1)) c|log c| ( π/Itot + (1 − π)/Jtot ),

where o(1) → 0 does not depend on δ, and Itot and Jtot are defined in (12).

Next, we consider the behavior of δA(c), proposed in the previous subsection, for small values of c. For this purpose, we need to estimate the probabilities of making an incorrect decision when δA(c) is used, and the performance of T̂(c), the number of time steps required by δA(c) to make a decision. The following theorem, whose proof is in the Appendix, summarizes the asymptotic properties of our proposed test δA(c) in the system with full local memory.

Theorem 2: For any 0 < c < 1, the probabilities that our proposed test δA(c) makes an incorrect decision satisfy

    P0{reject H0} ≤ c  and  P1{reject H1} ≤ c.    (21)

Moreover, if we let T̂(c) denote the number of time steps required by δA(c) to make a decision, then as c → 0,

    E0(T̂(c)) ≤ (1 + o(1)) |log c|/Itot,    (22)
    E1(T̂(c)) ≤ (1 + o(1)) |log c|/Jtot,

where Itot and Jtot are defined in (12).
Now we are in a position to show that our proposed test δA(c) is asymptotically optimal in the system with full local memory.

Theorem 3: Under assumptions (A1) and (A2) and Dk ≥ 3 for all 1 ≤ k ≤ K, {δA(c)} is asymptotically Bayes in the system with full local memory, i.e., lim_{c→0} Rc(δ*A(c))/Rc(δA(c)) = 1, where δ*A(c) is a Bayes solution in the system with full local memory when c is the cost per time step for decision making.

Proof: From Theorem 2, for our proposed test δA(c), we have

    lim sup_{c→0} (1/(c|log c|)) P0{reject H0} = 0,
    lim sup_{c→0} (1/(c|log c|)) P1{reject H1} = 0,
    lim sup_{c→0} (1/|log c|) E0(T̂(c)) ≤ 1/Itot,
    lim sup_{c→0} (1/|log c|) E1(T̂(c)) ≤ 1/Jtot,

where T̂(c) is the number of time steps required by δA(c) to make a decision. Combining these with the definition of Rc in (8) yields

    lim sup_{c→0} Rc(δA(c)) / (c|log c|) ≤ π/Itot + (1 − π)/Jtot.    (23)

If our proposed test δA(c) were not asymptotically Bayes, we would have

    lim inf_{c→0} Rc(δ*A(c))/Rc(δA(c)) < 1 − ε

for some positive constant ε, which implies, by (23), that there is a sequence {ci} with ci → 0 such that

    Rci(δ*A(ci)) < (1 − ε) ci |log ci| ( π/Itot + (1 − π)/Jtot ).    (24)

Hence δ*A(ci) satisfies the sufficient condition of Theorem 1 with M = (1 − ε)( π/Itot + (1 − π)/Jtot ), and thus Theorem 1 implies that

    Rci(δ*A(ci)) ≥ (1 + o(1)) ci |log ci| ( π/Itot + (1 − π)/Jtot ),

which contradicts (24), as ε > 0 is a constant. So the theorem holds.

Observe that our proposed test δA(c) does not use feedback from the fusion center, yet it is asymptotically optimal in the system with full local memory. This fact proves the following interesting result.

Corollary 1: Under the conditions of Theorem 3, feedback from the fusion center does not improve asymptotic performance in the system with full local memory, specified in (1)-(2).

C. Additional Remarks

Remark 1: Although we do not know the structure of the Bayes solution δ*A(c) in the system with full local memory, its asymptotic properties can easily be established from our results. By Theorems 1-3,

    Rc(δ*A(c)) = (1 + o(1)) c|log c| ( π/Itot + (1 − π)/Jtot )

as c → 0. Moreover, if we denote by T*A(c) the stopping time (the sample size) of the Bayes solution δ*A(c), then the proof of Theorem 1 shows that

    E0(T*A(c)) = (1 + o(1)) |log c|/Itot,
    E1(T*A(c)) = (1 + o(1)) |log c|/Jtot,

as c → 0.

Remark 2: The proof of Theorem 3 actually shows that our proposed test {δA(c)} is also asymptotically Bayes in the centralized setting, i.e., lim_{c→0} Rc(δ*cen(c))/Rc(δA(c)) = 1, where δ*cen(c) is the optimal centralized SPRT. In other words, our proposed test δA(c) has the same asymptotic performance as the optimal centralized test.

Remark 3: Our proposed test δA(c) does not involve π, the prior probability of the null hypothesis, but it is asymptotically Bayes in the system with full local memory regardless of the value of the prior probability π. This is very attractive, as one does not need to worry about how to choose a prior distribution when using our proposed tests.

Remark 4: Since our results are first-order asymptotic, the efficiency of our proposed test δA(c) (with respect to the optimal centralized test) in nonasymptotic scenarios depends on the speed of convergence of its performance function to the corresponding asymptotic values. From the proof of Theorem 2, as c → 0, we have Rc(δA(c))/Rc(δ*cen(c)) = 1 + (D + o(1))/√|log c|, where the constant D > 0 can be derived explicitly under additional reasonable conditions from nonlinear renewal theory ([17], [26]). Similar results (including the constant D) were reported both theoretically and experimentally in [11] in the context of decentralized sequential change-point detection problems. If the number of sensors K in the system is large, the value of D can be very large. In that situation, when the value of c is only moderately small, the value of D/√|log c| can be large, implying that the nonasymptotic performance of δA(c) can be very poor.

Remark 5: In our proposed test δA(c), a special symbol "NULL" is used to denote that a sensor has not yet reached a local decision. When the local sensor decision is "NULL," we do not send anything. From a practical point of view, this is quite desirable, as it actually reduces the transmission rate below 1 bit. From a theoretical point of view, however, our proposed test implicitly assumes that Dk ≥ 3 for each k (i.e., the sensor symbols are 0, 1 and "NULL"). One referee asked what happens in the case of only binary sensor messages (Dk = 2), say, with the symbol pairs being either (0, 1) or (1, "NULL"). It turns out that such asymptotically Bayes tests can still be constructed. To illustrate our idea explicitly, without loss of generality, we assume that the binary sensor symbols are 0 and 1. If necessary, we can treat 0 as the special symbol "NULL" and not send anything if the sensor message is 0. The key idea is to extend our proposed test δA(c) by using "blocks" of length 2. In each block, "00" and "11" indicate that the sensor decides H0 or H1 is true, respectively, whereas "01" or "10" means that a local decision has not been reached yet. Mathematically, our proposed test with binary sensor messages can be defined as follows. At each sensor, for all n ≥ 1, define Uk,2n−1 = I{Lk,2n−1 ≥ 0}, where I{A} is the indicator function of the set A, and

    Uk,2n = 1 if Uk,2n−1 = 1 and Lk,2n ≥ ρk |log c|,
    Uk,2n = 0 if Uk,2n−1 = 0 and Lk,2n ≤ −rk |log c|,
    Uk,2n = 1 − Uk,2n−1 otherwise.

Then the fusion center makes decisions only at times 2n, and will

    stop and accept H1 if Uk,2n−1 = Uk,2n = 1 for all k,
    stop and accept H0 if Uk,2n−1 = Uk,2n = 0 for all k,
    continue sampling otherwise.

By a fairly straightforward though tedious extension of the proof of Theorem 2, it can be shown that this test is asymptotically Bayes in the system with full local memory.
IV. SYSTEM WITH LIMITED LOCAL MEMORY

Let δ*B(c) denote a Bayes solution to the decentralized sequential hypothesis testing problem in the system with limited local memory, specified in (6) or (7). The main result in this section is to develop a decentralized sequential test δB(c) for the case specified in (7) such that, under certain restrictions, lim_{c→0} Rc(δ*B(c))/Rc(δB(c)) = 1, i.e., {δB(c)} is "asymptotically Bayes" in the system with limited local memory as the cost per time step taken for decision making goes to 0; here Rc denotes the Bayes risk defined in (8) when c is the cost per time step taken for decision making.

A. The structure of our proposed test δB(c)

In the system with limited local memory, our proposed test δB(c) splits the decision making into two stages.

1) First Stage: The purpose of this stage is to obtain a preliminary result on which of the two hypotheses H0 and H1 is true. To do so, each sensor Sk uses any fixed (not necessarily optimal) stationary MLRQ φk, i.e., the sensor message function

    Uk,n = φk(Xk,n) = d if and only if λk,d ≤ gk(Xk,n)/fk(Xk,n) < λk,d+1,

where 0 = λk,0 ≤ λk,1 ≤ ··· ≤ λk,Dk−1 ≤ λk,Dk = ∞ are pre-specified constants.

Based on the independent, identically distributed observations Un = (U1,n, ···, UK,n), the fusion center calculates the log-likelihood ratio Lu,n of Un recursively by

    Lu,n = Lu,n−1 + Σ_{k=1}^{K} log[ gφ,k(Uk,n) / fφ,k(Uk,n) ],

for n ≥ 1, with Lu,0 = 0, where fφ,k and gφ,k are the probability mass functions induced on Uk,n when the observations Xk,n are distributed as fk and gk, respectively. The fusion center then decides to stop the first stage at the time

    Mc(1) = first n ≥ 1 such that |Lu,n| ≥ log|log c|,    (25)

and makes a preliminary decision

    V = 0 if Lu,Mc(1) ≤ −log|log c|;
    V = 1 if Lu,Mc(1) ≥ log|log c|.

2) Second Stage: In this stage, each sensor switches to an "optimal" MLRQ with respect to the preliminary decision of the first stage, and the fusion center uses the sensor messages from the second stage to make a final decision on which of the hypotheses H0 and H1 is true.

Specifically, after the fusion center stops the first stage, it sends its preliminary decision V to all sensors as a binary quantized feedback. Upon receiving the quantized feedback V from the fusion center, each sensor Sk adjusts the constant thresholds in its stationary MLRQ to optimal values as follows. If V = 1, then the sensor message function at sensor Sk is switched to the stationary MLRQ ψk which maximizes the Kullback-Leibler information number I(gψ,k, fψ,k), where fψ,k and gψ,k are the probability mass functions induced on Uk,n when the observations Xk,n are distributed as fk and gk, respectively. On the other hand, if V = 0, then each sensor message function is switched to the stationary MLRQ ψk which maximizes I(fψ,k, gψ,k). Abusing notation, the stationary sensor message function at sensor Sk in the second stage is denoted by ψk to emphasize that an optimal MLRQ should be used; however, the actual choice of ψk depends on the quantized feedback V.

In the second stage, the fusion center begins a new cycle of decision making, discarding all previous sensor messages and conclusions and starting afresh on the incoming i.i.d. vector sensor messages Un = (U1,n, ···, UK,n), which are sent from the sensors using the optimal stationary MLRQs ψk. To be more specific, at time n of the second stage, the fusion center calculates the log-likelihood ratio L̂u,n of Un recursively by

    L̂u,n = L̂u,n−1 + Σ_{k=1}^{K} log[ gψ,k(Uk,n) / fψ,k(Uk,n) ],

for n ≥ 1, with L̂u,0 = 0. The fusion center then decides to stop the second stage at the time

    Mc(2) = first n ≥ 1 such that |L̂u,n| ≥ |log c|,    (26)

and makes the following final decision:

    decide H0 is true if L̂u,Mc(2) ≤ −|log c|;
    decide H1 is true if L̂u,Mc(2) ≥ |log c|.

It is important to emphasize that the threshold for the log-likelihood ratio statistic is log|log c| in the first stage and |log c| in the second stage. These choices of thresholds ensure that the number of time steps taken in the first stage (when the cost c is small) is large, but small relative to that in the second stage.

The motivation for our proposed test δB(c) is simple: instead of jointly optimizing the sensor and fusion center policies, which is an extremely difficult problem, we divide the optimization problem into two stages: the first stage is used to optimize the policies at the sensors, and the second stage then uses the "optimal" sensor policies to develop an optimal policy at the fusion center. Similar ideas have been applied in sequential experiment design; see, for example, [1], [6].

In our proposed test δB(c), each sensor uses what we call a tandem quantizer. A tandem quantizer is a sequence of two stationary quantizers with at most one switch from one quantizer to the other. On the one hand, as we will see in the next section, stationary quantizers at the sensors do not lead to asymptotically optimal Bayes solutions in the cases specified in (6) or (7), except in some extreme situations. In other words, sensor quantizers generally need to change over time in order to develop asymptotically Bayes decision rules. On the other hand, in the Bayes solution derived in [22] for the case specified in (6), the sensor quantizers need to be updated at each time step. This precludes the use of the Bayes solution in practice. In fact, it is nontrivial even to carry out numerical simulations of the Bayes solution, due to its high frequency of switches of the sensor quantizers. Hence, from the viewpoint of the number of switches, a tandem quantizer is the simplest possible candidate for constructing asymptotically Bayes rules.
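The two-stage fusion logic of δB(c) can be sketched as follows for a single Gaussian sensor with binary messages. The stage-two quantizer below is a placeholder of our own choosing; the test itself prescribes the MLRQ maximizing I(gψ, fψ) when V = 1 and I(fψ, gψ) when V = 0:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
c, mu = 1e-4, 1.0
logc = abs(np.log(c))

def run_stage(lam, threshold, h1_true):
    """Fusion-center SPRT on i.i.d. binary messages U_n = 1{X_n >= lam}."""
    p, q = norm.sf(lam, 0.0), norm.sf(lam, mu)   # induced pmfs f_phi(1), g_phi(1)
    L, n = 0.0, 0
    while abs(L) < threshold:
        n += 1
        u = rng.normal(mu if h1_true else 0.0) >= lam
        L += np.log(q / p) if u else np.log((1 - q) / (1 - p))
    return L, n

h1_true = True
# Stage 1: any fixed MLRQ (lam = mu/2 here), small threshold log|log c| as in (25).
L1, n1 = run_stage(mu / 2, np.log(logc), h1_true)
V = int(L1 > 0)                                  # one-shot one-bit feedback
# Stage 2: switch quantizers according to V, full threshold |log c| as in (26),
# discarding the stage-1 statistics. The +/-0.1 shift is purely illustrative.
L2, n2 = run_stage(mu / 2 + (0.1 if V else -0.1), logc, h1_true)
print("accept H1" if L2 >= logc else "accept H0", "after", n1 + n2, "steps")
```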
B. Asymptotic Optimality

We first need to establish asymptotic lower bounds on the Bayes risk Rc(δ) in (8) for any decentralized test in the system with limited local memory. Of course, the lower bounds in Theorem 1 still hold in the system with limited local memory. Unfortunately, they are too crude for this system, although the previous section showed that they are sharp in the system with full local memory. The following theorem, whose proof is highly nontrivial and included in the Appendix, is of fundamental importance for proving asymptotic optimality in the system with limited local memory.

Theorem 4: Assume VDk(gk, fk), defined in (10), and VDk(fk, gk), defined similarly by switching the roles of fk and gk, are finite for all 1 ≤ k ≤ K. Then for any test δ in the system with limited local memory, if Rc(δ) ≤ M c|log c| for some constant M, then

    Rc(δ) ≥ (1 + o(1)) c|log c| ( π/ID + (1 − π)/JD ),

where ID and JD are defined in (11).

Next, we study the asymptotic performance of our proposed test δB(c) for small values of c. As in the previous section, we need to estimate the probabilities of making an incorrect decision when δB(c) is used, and the asymptotic performance of the number of time steps required by δB(c) to make a decision. The following theorem, whose proof is in the Appendix, summarizes the asymptotic properties of our proposed test δB(c) in the system with limited local memory.

Theorem 5: For any 0 < c < 1, the probabilities that our proposed test δB(c) makes an incorrect decision satisfy

    P0{reject H0} ≤ c  and  P1{reject H1} ≤ c.    (27)

Moreover, if we let Mc denote the number of time steps required by δB(c) to make a decision, then as c → 0,

    E0(Mc) ≤ (1 + o(1)) |log c|/ID,    (28)
    E1(Mc) ≤ (1 + o(1)) |log c|/JD,

where ID and JD are defined in (11).

Now we are in a position to show that our proposed test δB(c) is asymptotically optimal in the system with limited local memory.

Theorem 6: Under the assumption of finiteness of VDk(gk, fk) and VDk(fk, gk) for all 1 ≤ k ≤ K, {δB(c)} is asymptotically Bayes in the system with limited local memory, i.e., lim_{c→0} Rc(δ*B(c))/Rc(δB(c)) = 1, where δ*B(c) is a Bayes solution in the system with limited local memory when c is the cost per time step for decision making.

Proof: The proof follows the same lines as that of Theorem 3, using Theorems 4 and 5 for the system with limited local memory.

Remark 6: The Bayes test δ*B(c) for the case specified in (6) was actually found in [22]; however, its performance is sensitive to errors in estimating the value of π, the prior probability of H0. By contrast, our proposed test δB(c) uses tandem quantizers, or two stages, to adjust itself adaptively, and does not depend on the prior probability π.

In addition, it has been a long-standing open problem in the literature to study the asymptotic properties of the Bayes test δ*B(c); see [21], [22]. Our theorems provide an asymptotic approximation to its performance. To see this, Theorem 5 implies that

    lim sup_{c→0} Rc(δB(c)) / (c|log c|) ≤ π/ID + (1 − π)/JD.

Then by Theorems 4 and 6, we have

    Rc(δ*B(c)) = (1 + o(1)) Rc(δB(c)) = (1 + o(1)) c|log c| ( π/ID + (1 − π)/JD )

as c → 0. Moreover, if we denote by T*B(c) the stopping time (the number of time steps taken for the decision) of the Bayes solution δ*B(c), then the proof of Theorem 4 shows that

    E0(T*B(c)) = (1 + o(1)) |log c|/ID,
    E1(T*B(c)) = (1 + o(1)) |log c|/JD,

as c → 0.

V. STATIONARY SENSOR MESSAGES

In the system with limited local memory, an interesting open problem raised in [22] is to investigate the asymptotic optimality properties of tests with stationary sensor message functions, i.e., tests whose sensor message functions φk,n ≡ φk do not change over time n. Tests with stationary sensor message functions are attractive in both theory and application due to their simple structure. The main purpose of this section is to provide a negative answer to this open problem and also to develop asymptotic optimality theory within the class of tests with stationary sensor messages.

For tests with stationary sensor message functions, a key observation is that since the sensor message functions are fixed, the sensor messages Un = (U1,n, ···, UK,n) are i.i.d. vectors (conditional on each hypothesis) and the fusion center faces a classical sequential hypothesis testing problem. Hence, for stationary sensor message functions, the optimal decision policy at the fusion center is an SPRT based on the i.i.d. sensor message vectors Un = (U1,n, ···, UK,n). For this reason, to find optimal sequential tests with stationary sensor message functions, it suffices to focus on those that use an SPRT at the fusion center.
Suppose a stationary sensor message function φk is used at sensor Sk for 1 ≤ k ≤ K, and denote by fφ,k and gφ,k the distributions induced on Uk,n = φk(Xk,n) when the distribution of Xk,n is fk and gk, respectively. Abusing notation, define two new information numbers

    I(fφ, gφ) = Σ_{k=1}^{K} I(fφ,k, gφ,k),
    I(gφ, fφ) = Σ_{k=1}^{K} I(gφ,k, fφ,k).    (29)

These two information numbers are nothing but the relative entropies of the sensor message vector Un = (U1,n, ···, UK,n) under stationary sensor message functions.

Given that a stationary sensor message function φk is used at sensor Sk for 1 ≤ k ≤ K, let φ = (φ1, ···, φK) and denote by δφ(c) the sequential test in which an optimal SPRT policy is used at the fusion center so as to minimize the Bayes risk Rc in (8). By the classical results on the SPRT, we have

    Rc(δφ(c)) ≈ c|log c| ( π/I(fφ, gφ) + (1 − π)/I(gφ, fφ) ),    (30)

as c → 0. Note that I(fφ, gφ) ≤ ID and I(gφ, fφ) ≤ JD, and at least one of the inequalities is strict for a stationary sensor message function φ, except when φ is a one-to-one function, i.e., when the sensor observations Xk,n are random variables chosen from a finite alphabet of size Dk for all k. Thus, by Theorems 4-6, we have

Corollary 2: Under the conditions of Theorem 6, for any stationary sensor message function φ that is not one-to-one, δφ(c) is asymptotically suboptimal in the case specified in (6) or (7) in the system with limited local memory, except in the extreme situations where π = 0 or 1, in which no testing is necessary, since we can decide in favor of H1 when π = 0 and in favor of H0 when π = 1, without taking any observations.

Remark 7: Using the facts that ID ≤ Itot and JD ≤ Jtot, it is evident that any test with a stationary message function φ is also asymptotically suboptimal in the system with full local memory, except when φ is one-to-one or when π = 0 or 1.

While stationary sensor message functions generally lead to suboptimal tests in the system with limited local memory, it is reasonable to ask which is the "optimal" stationary sensor message function, i.e., the one that leads to "optimal" tests among all SPRTs with stationary sensor message functions. By (30), for an SPRT δφ(c) with stationary sensor message function φ, minimizing the Bayes risk in (8) is equivalent to minimizing

    π/I(fφ, gφ) + (1 − π)/I(gφ, fφ).

Thus we may use this minimization to define the "optimal" stationary sensor message function φ. Unfortunately, by this definition, the optimal choice of φ depends heavily on the value of π, the prior probability of the null hypothesis. This may be undesirable in practice, since different researchers may have different opinions on the value of π.

To overcome this problem, we use a minimax formulation to define an "optimal" choice of stationary sensor message functions φ. As motivation, for a sequential test δφ(c) with stationary sensor message function φ, one can define its asymptotic efficiency as

    Rc(δ*cen(c)) / Rc(δφ(c)),

where δ*cen(c) is the Bayes solution in the centralized version. Thus, it is reasonable to define the efficiency of stationary sensor message functions φ by

    e(φ) = min_{0≤π≤1} [ lim inf_{c→0} Rc(δ*cen(c)) / Rc(δφ(c)) ].

Note that e(φ) can be thought of as a measure reflecting the performance efficiency of using quantized sensor observations relative to raw sensor observations.

By (16) and (30), we have

    e(φ) = min_{0≤π≤1} [ ( π/Itot + (1 − π)/Jtot ) / ( π/I(fφ, gφ) + (1 − π)/I(gφ, fφ) ) ].

A simple algebraic calculation shows that the efficiency of stationary sensor message functions φ is

    e(φ) = min( I(fφ, gφ)/Itot, I(gφ, fφ)/Jtot ),    (31)

where Itot and Jtot are defined in (12), and I(fφ, gφ) and I(gφ, fφ) are defined in (29).

A minimax formulation for the "optimal" choice of sensor message functions φ is then to seek the φ which maximizes the efficiency e(φ) in (31). While the problem of finding the "optimal" quantizer seems to be of interest in its own right in the information theory literature, to the best of our knowledge, a minimax formulation like ours has not been proposed before.

Note that in the definition of e(φ) in (31), Itot and Jtot can be replaced by ID and JD, respectively. This new definition corresponds to the situation where the benchmark test is the Bayes solution in the system with limited local memory instead of the Bayes solution in the centralized version. Both definitions are useful, and either one can be used in the following arguments.

To further understand the ideas behind our minimax formulation, it is helpful to consider a concrete example. For that purpose, in the remainder of this section we focus on the system with a single sensor and binary sensor messages, i.e., K = 1 and D1 = 2, where the sensor observations are normally distributed. But we want to emphasize that our arguments can easily be extended to a general setting.

In a system with a single sensor, suppose X1, X2, ··· are independent normal random variables with unknown mean µ and variance 1, and we are interested in testing the null hypothesis H0 : µ = −θ against the alternative hypothesis H1 : µ = θ for some θ > 0. Assume we need to quantize the data X because of data compression and limitations of channel bandwidth, i.e., Yn = φλ(Xn) = I(Xn ≥ λ) for some constant λ, where I(A) is the indicator function of the set A. Then, based on the quantized data Y1, Y2, ···, we want to decide which of H0 and H1 is true. The problem is how to choose the best quantizer, i.e., how to choose the best value of λ.

Intuitively, one would like to define Yn = I(Xn > 0), i.e., λ = 0, by a symmetry argument. However, our results imply that the answer may depend on what we mean by "best," or more precisely, on the prior probability π of the null hypothesis.

To simplify our notation in the following discussion, define

    h(a, b) = a log(a/b) + (1 − a) log( (1 − a)/(1 − b) ),    (32)

and

    hθ(λ) = h( Φ(λ + θ), Φ(λ − θ) ),    (33)

where Φ(·) is the distribution function of the standard normal distribution. Now if the prior probability π = 0, then the "best" quantizer is determined by the value of λ which maximizes I(gφλ, fφλ) = h(Φ(−λ + θ), Φ(−λ − θ)) = hθ(−λ). In other words, if one strongly believes that H1 is true, then one will want to maximize the information contained in Y (in the sense of the Kullback-Leibler information number) when the true underlying model is H1. Similarly, if π = 1, then the "best" quantizer is determined by the value of λ which maximizes I(fφλ, gφλ) = h(Φ(−λ − θ), Φ(−λ + θ)) = hθ(λ). Here we used the facts that h(a, b) = h(1 − a, 1 − b) and Φ(u) + Φ(−u) = 1. That is, if one strongly believes that H0 is true, then one will want to maximize the information contained in Y when the true underlying model is H0.

For θ = 1, numerical calculations show that the best values of λ in the binary sensor quantizer U1,n = I(X1,n > λ) are 0.6008 and −0.6008 when π = 0 and 1, respectively. For other values of π ∈ (0, 1), numerical calculations seem to suggest that the optimal choice of λ decreases from 0.6008 to −0.6008 as π increases from 0 to 1. In particular, if π = 1/2, then the corresponding optimal value is λ = 0.
11

and VI. E XAMPLE


hθ (λ) = h(Φ(λ + θ), Φ(λ − θ)), (33) The main goal of this section is to illustrate our theoret-
ical results in previous sections through a specific example.
where Φ(·) is the distribution function of the standard normal Detailed numerical and simulation results will be presented
distribution. Now if the prior probability π = 0, then the “best” elsewhere.
quantizer is determined by the value of λ which maximizes Suppose there are K sensors with each sensor sending
I(gφλ , fφλ ) = h(Φ(−λ + θ), Φ(−λ − θ)) = hθ (−λ). In binary message to the fusion center, i.e., Dk = 2. Assume
other words, if one strongly believes H1 is true, then one that the observations at sensor Sk are i.i.d. normal random
will want to maximize the information contained in Y (in variables with mean 0 and variance σk2 under H0 and with
the sense of Kullback-Leibler information number) when the mean µk and variance σk2 under H1 . An interesting application
true underlying model is H1 . Similarly, if π = 1, then of this model, as mentioned in [18], is to use K geographically
the “best” quantizer is determined by the value of λ which separated sensors to detect a deterministic signal (or target),
maximizes I(fφλ , gφλ ) = h(Φ(−λ − θ), Φ(−λ + θ)) = hθ (λ). which is contaminated by additive white Gaussian noise at
Here we used the facts that h(a, b) = h(1 − a, 1 − b) and each sensor.
Φ(u) + Φ(−u) = 1. That is, if one strongly believes H0 is Denote by ρk = µ2k /(2σk2 ) the signal-to-noise ratio (SNR)
true, then one will want to maximize information contained in at sensor Sk , and by
Y when the true underlying model is H0 .
K
X K
X
For µ = 1, numerical calculations show that the best values
ρ= ρk = µ2k /(2σk2 ) (34)
of λ in binary sensor quantizer Ul,n = I(Xl,n > λ) are
k=1 k=1
0.6008, −0.6008 when π = 0, 1, respectively. For any other
values of π ∈ (0, 1), numerical calculations seem to suggest the signal-to-noise ratio (SNR) at the fusion center in the
that the optimal choice of λ decreases from 0.6008 to −0.6008 centralized version of this example.
as π increases from 0 to 1. In particular, if π = 1/2, then the First, in the system with full local memory specified in (1)
corresponding optimal value is λ = 0. and (2), if we assume each sensor is also allowed to be silent
Our minimax formulation tackles the optimal stationary to indicate that it has not reached any local decision, then
binary quantizer in this example from a different viewpoint our proposed test δA (c) is well-defined and is asymptotically
in the sense that we want Y to contain reasonably rich Bayes. Alternatively, we can use the test defined in Remark 1
information under both H0 and H1 . Note that for a quantizer for binary sensor messages. It is easy to show that Itot =
φλ (X) = I(X > λ), the efficiency e(φλ ) in (31) becomes Jtot = ρ, and by Theorems 1 - 3, the Bayes risk of our
proposed test δA (c) satisfies
min(hθ (λ), hθ (−λ))
e(φλ ) = , ∗
Rc (δA (c)) ∼ Rc (δA ∗
(c)) ∼ Rc (δcen (c)) ∼ c| log c|/ρ (35)
2θ2
where hθ (λ) is defined in (33). By Theorem 7 below, for a as c → 0, where δA ∗ ∗
(c) and δcen (c) are Bayes solution in the
given θ > 0, e(φλ ) is maximized at λ = 0, and thus the system with full local memory and in the centralized version,
best binary sensor quantizer under our minimax formulation respectively. Here and everywhere below xc ∼ yc as c → 0
is Ul,n = I(Xl,n > 0), which is consistent with our intuition. means that limc→0 (xc /yc ) = 1.
It is worth emphasizing that while our minimax formu- Next, in the system with limited local memory, Theorem
lation and the Bayes formulation with π = 1/2 lead to 2 and Corollary 2 in [11] show that both VDk (fk , gk ) and
the same optimal stationary sensor message functions for VDk (gk , fk ) are finite if fk and gk are Gaussian distributions.
normal distributions, they may lead to different answers for Thus the sufficient conditions in Theorems 4 and 6 holds, and
other distributions. For instance, if X1 , X2 , · · · are i.i.d. with our proposed test δB (c) is asymptotically Bayes in the system
exponential density function fθ (x) = θ exp(−θx) for x ≥ 0, with limited local memory. Note that
and we want to test H0 : θ = 1/2 against H1 : θ = 2. Then
K
X
the best choices of λ in stationary quantizer Y = I(X > λ)
ID = JD = sup h(Φ(λ − µk /σk ), Φ(λ)),
are 2.3088 and 1.3949 under our minimax formulation and the λ
k=1
Bayes formulation with π = 1/2, respectively. The difference
mainly arises from the asymmetric properties of Kullback- and h(a, b) is defined in (32). Moreover, by Proposition 2 in
Leibler information numbers, i.e., I(g, f ) 6= I(f, g). From our [11], if the SNR ρk ’s are small at all sensors, then
point of view, our minimax formulation is more reasonable 2 2
than the Bayes formulation (with π = 1/2) as ours takes into ID = JD ≈ ρ≈ ρ.
π 3.14
account the difference in information contained in the raw data
Therefore, in the system with limited local memory in the case
X’s under H0 and H1 .
specified in (6) or (7), if all SNRs at local sensors are small,
To complete this section, we will state the following theo- then the Bayes risks of our proposed test δB (c) and the Bayes
rem, whose proof is in the Appendix. ∗
solution δB (c) satisfy
Theorem 7: For given θ > 0, min(hθ (λ), hθ (−λ)) is max- ∗
Rc (δB (c)) ∼ Rc (δB (c)) ∼ 1.57c| log c|/ρ, (36)
imized at λ = 0, where hθ (λ) is defined in (33).
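Since the discussion above is numerical, it may help to indicate how such values can be reproduced. The following Python sketch is our illustration only (not part of the paper's computations); it assumes the symmetric setup in which X is N(−θ, 1) under H0 and N(θ, 1) under H1, so that for the quantizer Y = I(X > λ) one has I(fφλ, gφλ) = hθ(λ) and I(gφλ, fφλ) = hθ(−λ); the helper names Phi, h, and h_theta are ours.

```python
# Illustrative sketch: reproduce the "best" thresholds for theta = 1
# (the mu = 1 case above) under the pi = 0 Bayes criterion and the
# minimax criterion. Assumed model: N(-theta,1) vs. N(theta,1).
import math

def Phi(x):
    # standard normal distribution function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def h(a, b):
    # Kullback-Leibler number between Bernoulli(a) and Bernoulli(b), as in (32)
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

def h_theta(lam, theta):
    # h_theta(lambda) in (33)
    return h(Phi(lam + theta), Phi(lam - theta))

theta = 1.0
grid = [i / 10000.0 - 3.0 for i in range(60001)]

# pi = 0: maximize I(g_phi, f_phi) = h_theta(-lambda); expect lambda ~ 0.6008
best_pi0 = max(grid, key=lambda t: h_theta(-t, theta))
# minimax: maximize min(h_theta(lambda), h_theta(-lambda)); expect lambda = 0
best_minimax = max(grid, key=lambda t: min(h_theta(t, theta), h_theta(-t, theta)))

print(f"argmax of h_theta(-lambda):          {best_pi0:.4f}")
print(f"argmax of min(h_theta, h_theta(-)):  {best_minimax:.4f}")
```

The same grid search with the exponential densities above illustrates the asymmetry discussed in the text: the minimax threshold equalizes the two normalized information numbers, while the Bayes threshold does not.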
VI. EXAMPLE

The main goal of this section is to illustrate our theoretical results in the previous sections through a specific example. Detailed numerical and simulation results will be presented elsewhere.

Suppose there are K sensors, with each sensor sending a binary message to the fusion center, i.e., Dk = 2. Assume that the observations at sensor Sk are i.i.d. normal random variables with mean 0 and variance σk² under H0 and with mean µk and variance σk² under H1. An interesting application of this model, as mentioned in [18], is to use K geographically separated sensors to detect a deterministic signal (or target) which is contaminated by additive white Gaussian noise at each sensor.

Denote by ρk = µk²/(2σk²) the signal-to-noise ratio (SNR) at sensor Sk, and by

ρ = Σ_{k=1}^{K} ρk = Σ_{k=1}^{K} µk²/(2σk²)   (34)

the signal-to-noise ratio (SNR) at the fusion center in the centralized version of this example.

First, in the system with full local memory specified in (1) and (2), if we assume each sensor is also allowed to be silent to indicate that it has not reached any local decision, then our proposed test δA(c) is well-defined and is asymptotically Bayes. Alternatively, we can use the test defined in Remark 1 for binary sensor messages. It is easy to show that Itot = Jtot = ρ, and by Theorems 1-3, the Bayes risk of our proposed test δA(c) satisfies

Rc(δA(c)) ∼ Rc(δA*(c)) ∼ Rc(δcen*(c)) ∼ c|log c|/ρ   (35)

as c → 0, where δA*(c) and δcen*(c) are the Bayes solutions in the system with full local memory and in the centralized version, respectively. Here and everywhere below, xc ∼ yc as c → 0 means that lim_{c→0}(xc/yc) = 1.

Next, in the system with limited local memory, Theorem 2 and Corollary 2 in [11] show that both VDk(fk, gk) and VDk(gk, fk) are finite if fk and gk are Gaussian distributions. Thus the sufficient conditions in Theorems 4 and 6 hold, and our proposed test δB(c) is asymptotically Bayes in the system with limited local memory. Note that

ID = JD = Σ_{k=1}^{K} sup_λ h(Φ(λ − µk/σk), Φ(λ)),

where h(a, b) is defined in (32). Moreover, by Proposition 2 in [11], if the SNRs ρk are small at all sensors, then

ID = JD ≈ (2/π)ρ ≈ (2/3.14)ρ.

Therefore, in the system with limited local memory, in the case specified in (6) or (7), if all SNRs at the local sensors are small, then the Bayes risks of our proposed test δB(c) and the Bayes solution δB*(c) satisfy

Rc(δB(c)) ∼ Rc(δB*(c)) ∼ 1.57c|log c|/ρ,   (36)

as c → 0.
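To give the constant in (36) a concrete check, the sketch below is our illustration (a hypothetical three-sensor profile, grid search over λ; Phi, h, and sup_h are our helper names); it evaluates ID = JD and compares it with the small-SNR approximation (2/π)ρ.

```python
# Illustrative sketch: compute I_D = J_D = sum over k of
# sup_lambda h(Phi(lambda - mu_k/sigma_k), Phi(lambda)) for a hypothetical
# sensor profile, and compare with (2/pi) * rho.
import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def h(a, b):
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

def sup_h(m):
    # sup over lambda of h(Phi(lambda - m), Phi(lambda)), by grid search
    grid = (i / 1000.0 - 5.0 for i in range(10001))
    return max(h(Phi(lam - m), Phi(lam)) for lam in grid)

mus = [0.10, 0.15, 0.20]      # hypothetical small per-sensor means
sigmas = [1.0, 1.0, 1.0]

rho = sum(mu ** 2 / (2 * s ** 2) for mu, s in zip(mus, sigmas))   # SNR in (34)
ID = sum(sup_h(mu / s) for mu, s in zip(mus, sigmas))

print(f"rho          = {rho:.5f}")
print(f"I_D = J_D    = {ID:.5f}")
print(f"(2/pi) * rho = {2.0 / math.pi * rho:.5f}")   # close to I_D for small SNRs
```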
Finally, at sensor Sk, by a linear transformation and Theorem 7, it is easy to show that the optimal binary stationary sensor message under our minimax formulation is φk(x) = I(x > µk/2), which is consistent with our intuition. When this optimal binary stationary quantizer is used at sensor Sk for all k, the information numbers defined in (29) in this example become

I(fφ, gφ) = I(gφ, fφ) = Σ_{k=1}^{K} h(Φ(−µk/(2σk)), Φ(µk/(2σk))),

where h(a, b) is defined in (32). Using Taylor expansions of Φ(x) and h(a, b), it is straightforward to show that

lim_{max_k ρk → 0} I(fφ, gφ)/ρ = 1/π ≈ 1/3.14.

Therefore, if the SNRs ρk are small at all sensors, then the Bayes risk of the test δφ(c) with the binary stationary quantizer satisfies

Rc(δφ(c)) ∼ 3.14c|log c|/ρ,   (37)

as c → 0.

A comparison of (35)-(37) leads to the following interesting comments. Let us choose the SPRT with the optimal binary stationary quantizer as the baseline decentralized sequential test, and denote by R0 the corresponding Bayes risk. Then the Bayes solutions in the systems with full and limited local memory will asymptotically reduce the Bayes risk to R0/3.14 and R0/2, respectively. Moreover, as demonstrated by our proposed test δB(c) in the system with limited local memory, a one-shot one-bit feedback from the fusion center to the sensors can reduce the asymptotic risk of decentralized sequential tests by half, but more feedback will not improve the asymptotic performance further in the system with limited local memory. Finally, without any feedback from the fusion center, a test in the system with full local memory can achieve the same asymptotic performance as the optimal centralized test.

VII. CONCLUSIONS

We have studied a decentralized extension of sequential hypothesis testing problems in two different scenarios of sensor networks under a Bayes formulation. In the system with full local memory, we have developed the first asymptotically optimal decentralized sequential tests. In fact, our proposed decentralized tests have the same asymptotic first-order performance as the optimal centralized tests, although they may perform poorly in some nonasymptotic settings due to the slow convergence rate. In the system with limited local memory, we have used the idea of tandem quantizers to offer decentralized sequential tests which are simple but asymptotically Bayes, and addressed a long-standing open problem on the asymptotic performance of the Bayes solution. We also clarified issues involving the optimal stationary sensor quantizers, and proposed a minimax formulation under which the optimal stationary quantizer is consistent with our intuition. It is interesting to note that feedback from the fusion center does not improve asymptotic performance in the system with full local memory, but does so in the system with limited local memory, even with one-bit one-shot feedback.

There are a number of interesting problems which have not been addressed here. In practice, one may be interested in testing multiple hypotheses as in [5], [8]. The results developed here are for binary hypothesis testing problems, but they provide benchmarks and ideas for the development of tests in multiple hypothesis testing problems. It is also of interest to study the system where the observations at the different sensors may be dependent. Moreover, while the tests developed here are asymptotically optimal, they may perform poorly in some practical situations, especially in a system with a large number of sensors, because of the slow asymptotic convergence. Thus finding fairly simple decentralized tests which are not only asymptotically optimal but also have good performance for practical values will undoubtedly be of great importance. The ideas of tandem quantizers, or more generally quantized feedback, will provide a powerful tool to tackle these problems. Therefore, this article is just the beginning of further investigation.

ACKNOWLEDGEMENT

The author would like to thank Gary Lorden for his enthusiastic and inspiring guidance of the author's thesis research, of which this work is an outgrowth. Venugopal V. Veeravalli and Alexander G. Tartakovsky are also thanked for helpful discussions, as are two referees, whose comments improved the paper.

APPENDIX
PROOF OF THEOREMS AND LEMMAS

A. Proof of Theorem 1

The main tool is a lower bound on the sample size N of a test with Type I and II error probabilities α and β. By the well-known Wald inequalities [17], [24], for any (sequential or fixed-sample) test with Type I and II error probabilities α and β, its sample size N satisfies

E0(N) ≥ [(1 − α) log((1 − α)/β) + α log(α/(1 − β))]/Itot,
E1(N) ≥ [(1 − β) log((1 − β)/α) + β log(β/(1 − α))]/Jtot,

where Itot and Jtot are defined in (12). As both α and β go to 0, these inequalities become

E0(N) ≥ (1 + o(1))|log β|/Itot,   (38)
E1(N) ≥ (1 + o(1))|log α|/Jtot.

Based on these inequalities, Chernoff [4] presented the following elegant proof of the theorem. Suppose a family of tests {δc} satisfies Rc(δc) ≤ Mc|log c| for some constant M > 0. Denote by αc and βc the Type I and II error probabilities of δc. By the definition of Rc(δc) in (8) and the assumption that Rc(δc) ≤ Mc|log c|, we have αc ≤ Mc|log c|/(πW0) and βc ≤ Mc|log c|/((1 − π)W1). By (38), the stopping time τc of the test δc satisfies

E0(τc) ≥ (1 + o(1))|log βc|/Itot ≥ (1 + o(1))|log((Mc|log c|)/((1 − π)W1))|/Itot,
which is simply (1 + o(1))|log c|/Itot as c → 0, due to the fact that

|log(c|log c|)| = |log c + log|log c|| = (1 + o(1))|log c|.

Similarly,

E1(τc) ≥ (1 + o(1))|log c|/Jtot.

Now using the definition of Rc(δ) in (8) again, for any test, the cost of the time steps taken for the decision is only a portion of the total expected cost or Bayes risk. Thus,

Rc(δc) ≥ c[πE0(τc) + (1 − π)E1(τc)] ≥ (1 + o(1))c|log c|(π/Itot + (1 − π)/Jtot).

Hence the theorem holds.

B. Proof of Theorem 2

Note that the stopping time T̂(c) of our proposed test δA(c) can be written as T̂(c) = min(T0(c), T1(c)), where

T0(c) = inf{n : Lk,n ≤ −rk|log c| for all k},   (39)

and

T1(c) = inf{n : Lk,n ≥ ρk|log c| for all k}.   (40)

For our proposed test δA(c), when it stops taking observations, the fusion center will accept H0 if T̂(c) = T0(c), and will accept H1 if T̂(c) = T1(c). Equivalently, our proposed test δA(c) rejects H0 if and only if T1(c) < T0(c). Hence,

P0{δA(c) rejects H0} = P0(T1(c) < T0(c)) ≤ P0(T1(c) < ∞).

For any stopping time τ, using Wald's likelihood ratio identity [17], [24], we have

P0(τ < ∞) = E1[exp(−Σ_{k=1}^{K} Lk,τ); τ < ∞],

where Lk,n is the log-likelihood ratio at sensor Sk defined in (17). By definition, for the stopping time τ = T1(c), we have Lk,τ ≥ ρk|log c| for k = 1, 2, · · · , K, and Σ_{k=1}^{K} ρk = 1. Thus

P0(T1(c) < ∞) ≤ E1[exp(−Σ_{k=1}^{K} ρk|log c|); T1(c) < ∞] = E1[exp(−|log c|); T1(c) < ∞] ≤ exp(−|log c|) = c,

since 0 < c < 1. This proves the first inequality in (21). The second inequality in (21) can be proved similarly.

Now we will study the properties of the stopping time T̂(c) of our proposed test δA(c). It suffices to prove the second inequality in (22), as the proof of the first inequality in (22) is identical. To accomplish this, for 1 ≤ k ≤ K, let

Nk = inf{n ≥ 1 : Σ_{i=1}^{n} log(gk(Xk,i)/fk(Xk,i)) ≥ ρk|log c|}

and

τk(Nk) = sup{n ≥ 1 : Σ_{i=Nk+1}^{Nk+n} log(gk(Xk,i)/fk(Xk,i)) ≤ 0}.

For simplicity, denote τk = τk(0). It is well known (e.g., Theorem D in [6]) that for any 1 ≤ k ≤ K,

E1(τk) < ∞,   (41)

since log(gk(X)/fk(X)) has positive mean and finite variance under P1 by Assumption (A2).

By the definition of Nk and τk(Nk), the stopping time T̂(c) of our proposed test δA(c) satisfies

T̂(c) ≤ max_{1≤k≤K}(Nk + τk(Nk) + 1) ≤ max_{1≤k≤K} Nk + Σ_{k=1}^{K} τk(Nk) + 1.

Now since Xk,1, Xk,2, . . . are i.i.d. under P1, we have E1(τk(Nk)) = E1(τk), and thus

E1(T̂(c)) ≤ E1(max_{1≤k≤K} Nk) + Σ_{k=1}^{K} E1(τk) + 1.   (42)

By renewal theory and Assumption (A2), under P1,

E1(Nk) = |log c|/Jtot + O(1) and Var1(Nk) = O(|log c|)

as c → 0; see [16] and [17, p. 171]. Hence,

(E1|Nk − |log c|/Jtot|)² ≤ E1(Nk − |log c|/Jtot)² = Var1(Nk) + (E1(Nk) − |log c|/Jtot)² = O(|log c|),

and so

E1|Nk − |log c|/Jtot| = O(√|log c|).

Thus

E1(max_{1≤k≤K} Nk) ≤ |log c|/Jtot + E1(max_{1≤k≤K}(Nk − |log c|/Jtot)) + 1 ≤ |log c|/Jtot + Σ_{k=1}^{K} E1|Nk − |log c|/Jtot| + 1 = |log c|/Jtot + O(√|log c|).

Combining this with relations (41) and (42) yields the first inequality in (22), completing the proof of the theorem.
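The stopping rule T̂(c) = min(T0(c), T1(c)) in (39)-(40) is easy to simulate. The following sketch is our illustration under assumed parameters: unit-variance Gaussian sensors, and boundary weights rk = ρk taken proportional to the local Kullback-Leibler numbers, one natural choice consistent with Σk ρk = 1. It checks that the mean sample size under H1 is close to |log c|/Jtot, in line with the renewal-theory estimates above.

```python
# Illustrative simulation of T_hat(c) = min(T0(c), T1(c)) in (39)-(40);
# assumed model: sensor S_k observes N(0,1) under H0 and N(mu_k,1) under H1.
import math
import random

def simulate_T_hat(c, mus, under_h1, rng):
    K = len(mus)
    info = [m * m / 2.0 for m in mus]        # local KL numbers I(g_k, f_k)
    rho = [x / sum(info) for x in info]      # boundary weights, sum to 1
    b = abs(math.log(c))
    L = [0.0] * K                            # local log-likelihood ratios L_{k,n}
    n = 0
    while True:
        n += 1
        for k, m in enumerate(mus):
            x = rng.gauss(m if under_h1 else 0.0, 1.0)
            L[k] += m * x - m * m / 2.0      # log g_k(x)/f_k(x)
        if all(L[k] >= rho[k] * b for k in range(K)):
            return n, 1                      # T1(c) reached first: accept H1
        if all(L[k] <= -rho[k] * b for k in range(K)):
            return n, 0                      # T0(c) reached first: accept H0

rng = random.Random(0)
c, mus = 1e-3, [0.5, 1.0]
samples = [simulate_T_hat(c, mus, True, rng)[0] for _ in range(200)]
J_tot = sum(m * m / 2.0 for m in mus)
print(f"mean sample size under H1: {sum(samples) / len(samples):.1f}")
print(f"|log c| / J_tot:           {abs(math.log(c)) / J_tot:.1f}")
```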
C. Proof of Theorem 4

The proof follows the same lines as that of Theorem 1. Assume a family of sequential tests {δc} in the system with limited local memory satisfies Rc(δc) ≤ Mc|log c| for some constant M > 0. Then the definition of Rc(δc) in (8) implies that the Type I and II error probabilities of the tests δc satisfy α ≤ Mc|log c|/(πW0) and β ≤ Mc|log c|/((1 − π)W1). Now by a new lower bound established in Lemma 1 (below)
for tests in the system with limited local memory, the stopping times τc of the tests δc satisfy

E1(τc) ≥ (1 + o(1))|log(Mc|log c|/(πW0))|/JD = (1 + o(1))|log c|/JD

as c → 0. Similarly, switching the roles of fk and gk, we have

E0(τc) ≥ (1 + o(1))|log c|/ID.

Then by the definition of Rc(δ) in (8), for any test, the cost of the time steps taken for the decision is only a portion of the total expected cost or Bayes risk. Thus,

Rc(δc) ≥ c[πE0(τc) + (1 − π)E1(τc)] ≥ (1 + o(1))c|log c|(π/ID + (1 − π)/JD),

and the theorem holds.

To complete the proof, we need to prove the following lemma, which establishes asymptotic lower bounds on the time steps taken for decision making by a (sequential or fixed-sample-size) test in the system with limited local memory.

Lemma 1: Assume VDk(gk, fk), defined in (10), is finite for all 1 ≤ k ≤ K. Denote by τ the stopping time (the number of time steps for decision making) of a test δ in the system with limited local memory satisfying

P0(reject H0) ≤ α,  P1(reject H1) ≤ β,   (43)

where 0 < α, β < 1. Then as (α, β) → (0, 0),

E1(τ) ≥ (1 + o(1))|log α|/JD,   (44)

where JD is defined in (11). Moreover, if VDk(fk, gk) is also finite for all 1 ≤ k ≤ K, then

E0(τ) ≥ (1 + o(1))|log β|/ID,   (45)

where ID is defined in (11).

Proof: We only need to prove that (44) holds, since the proof of (45) is identical. For a test δ satisfying (43), if E1(τ) = ∞, then it is obvious that (44) holds. Thus it suffices to show that (44) holds if E1(τ) < ∞.

Given the overwhelming difficulty of directly studying the hypothesis testing problem in the system with limited local memory, we need to prove the lemma by considering the corresponding "open-ended" hypothesis testing problems, developed by Robbins [13] and Robbins and Siegmund [14], [15]. For a decentralized test δ in the system with limited local memory satisfying (43), relation (44) will be proved through studying the properties of an open-ended test constructed from the test δ.

The motivation of the open-ended hypothesis testing problem is as follows. Assume that if H0 is true, sampling costs nothing and the preferred action at the fusion center is just to take observations without stopping. On the other hand, if H1 is true, each time step for decision making costs a fixed amount, and the fusion center should stop taking observations as soon as possible and reject the null hypothesis H0.

Since there is only a terminal decision in an open-ended hypothesis testing problem, the policy at the fusion center is a stopping time N. The null hypothesis H0 is rejected if and only if N < ∞.

Before we continue to prove the lemma, it is useful to mention two different viewpoints of open-ended hypothesis testing problems. In the above-mentioned motivation of open-ended hypothesis testing problems, the cost for each time step under the hypothesis H0 is different from that under the alternative hypothesis H1, whereas these costs are the same (or at least of the same order) in the standard formulation of hypothesis testing problems. An alternative viewpoint is to think of open-ended hypothesis testing problems as standard hypothesis testing problems in which the tests are required to satisfy the constraint in (43) with β = 0. For that reason, an open-ended test is also called a "power-one" test in the literature.

Now let us go back to the proof of relation (44) for a test δ satisfying (43). The key idea is to use the stopping time τ of the test δ to construct an open-ended test in the system with limited local memory.

For that purpose, let a = |log α|; we first need to define a pre-specified open-ended test M(a) in the system with limited local memory as follows. Each sensor uses the optimal MLRQ ϕk:

Uk,n = ϕk(Xk,n) = d if and only if λk,d ≤ gk(Xk,n)/fk(Xk,n) < λk,d+1,

where 0 = λk,0 ≤ λk,1 ≤ · · · ≤ λk,Dk−1 ≤ λk,Dk = ∞ are optimally chosen in the sense that the Kullback-Leibler information number I(gϕ,k, fϕ,k) achieves the supremum IDk(gk, fk). Here fϕ,k and gϕ,k are the probability mass functions induced on Uk,n when the observations Xk,n are distributed as fk and gk, respectively.

Based on the independent, identically distributed observations Un = (U1,n, . . . , UK,n), the fusion center then uses the one-sided SPRT with log-likelihood ratio boundary a, i.e., the fusion center stops taking observations at time

M(a) = first n such that Σ_{i=1}^{n} Σ_{k=1}^{K} log(gϕ,k(Uk,i)/fϕ,k(Uk,i)) ≥ a   (46)

(and M(a) = ∞ if no such n exists), and rejects the null hypothesis H0 whenever it stops taking observations. It is evident that the stopping time M(a) defines an open-ended test in the system with limited local memory. By classical results on the SPRT (see Equations (8.3) and (8.4) of Siegmund [17]), using the fact that a = |log α|, we have

P0(M(a) < ∞) ≤ α, and E1(M(a)) = |log α|/JD + O(1),   (47)

where JD is defined in (11).

Now for an arbitrary test δ in the system with limited local memory satisfying (43), we can construct the corresponding
open-ended test by combining its stopping time τ with our pre-specified stopping time M(a). Specifically, define a new stopping time τ̂δ by

τ̂δ = τ if δ accepts H1,  and  τ̂δ = τ + M1 if δ accepts H0,

where M1 is the stopping time obtained by applying M(a), defined in (46) with a = |log α|, to all sensor observations from time τ on, i.e., Xk,τ+1, Xk,τ+2, . . . for 1 ≤ k ≤ K. Then the new stopping time τ̂δ defines an open-ended test which will declare that the alternative hypothesis H1 is true if and only if τ̂δ is finite.

By definition, the open-ended test τ̂δ is a test in the system with limited local memory, since it is the combination of two tests, δ and M(a), in the system with limited local memory. Note that if H1 is true, both τ and M1 will stop with probability 1, and hence τ̂δ will also stop with probability 1 under H1. In other words, if H1 is true, then τ̂δ will stop and reject the null hypothesis H0 with probability 1. This shows that τ̂δ is indeed an open-ended test in the system with limited local memory.

Now we need to study the properties of our new open-ended test τ̂δ. Because M1 is independent of τ, and, as a copy of M(a), M1 also satisfies (47), we have

P0(τ̂δ < ∞) ≤ P0(δ accepts H1) + P0(M1 < ∞; δ accepts H0) ≤ α + P0(M1 < ∞) ≤ 2α,

and

E1(τ̂δ) = E1(τ; δ accepts H1) + E1(τ + M1; δ accepts H0)
= E1(τ) + E1(M1)P1(δ accepts H0)
≤ E1(τ) + βE1(M1)
= E1(τ) + β(|log α|/JD + O(1)).

Meanwhile, using Lemma 2 below, any open-ended test N in the system with limited local memory satisfying

P0(N < ∞) ≤ α1 (< 1)   (48)

satisfies

E1(N) ≥ (1 + o(1))|log α1|/JD,   (49)

where o(1) → 0 uniformly as α1 → 0. In particular, this is true for N = τ̂δ, the open-ended test constructed from the test δ. Combining the above three inequalities yields

E1(τ) ≥ (1 + o(1))|log(2α)|/JD − β(|log α|/JD + O(1)) = (1 + o(1))|log α|/JD,

where, as (α, β) → (0, 0), the term o(1) → 0 and does not depend on τ. Relation (44) follows at once from the arbitrariness of the test δ and its corresponding stopping time τ, and thus Lemma 1 holds.

To complete the proof of Lemma 1, we need to prove the following result for open-ended tests, i.e., tests defined by a stopping time N that stop taking observations only to reject H0.

Lemma 2: Under the assumption of Lemma 1, for any stopping time N in the system with limited local memory, relation (48) implies (49).

Proof: A key idea in the proof of this lemma is to use conditional log-likelihood ratios to construct a martingale, and the proof follows the same argument as in Theorem 1 of [11]. Note that in the system with limited local memory, we can rewrite

Uk,n = ϕk,n(Xk,n),

where ϕk,n may depend on En−1, the past sensor messages defined in (3). Denote by fϕ,k,n and gϕ,k,n, respectively, the conditional densities induced on Uk,n given En−1 when the density of Xk,n is fk and gk. Denote by Zk,n the conditional log-likelihood ratio function of Uk,n, log(gϕ,k,n(Uk,n)/fϕ,k,n(Uk,n)).

Since X1,n, · · · , XK,n are independent, so are U1,n, · · · , UK,n given En−1. Thus, at the fusion center, the conditional log-likelihood ratio of (U1,n, · · · , UK,n) given En−1 is Zn = Σ_{k=1}^{K} Zk,n. By Theorem 1 (or Theorem 3) in Lai [8] (also see Theorem 1 of Lai [7]), to prove (49), it suffices to show that for any δ > 0,

lim sup_{n→∞} P1{max_{t≤n} Σ_{i=1}^{t} Zi ≥ JD(1 + δ)n} = 0.   (50)

By the definition of IDk(gk, fk),

E1(Zi) = Σ_{k=1}^{K} E1(Zk,i) ≤ Σ_{k=1}^{K} IDk(gk, fk) = JD,

and thus the left-hand side of (50) is less than or equal to

lim sup_{n→∞} P1{max_{t≤n} Σ_{i=1}^{t} Σ_{k=1}^{K} (Zk,i − E1(Zk,i)) ≥ JDδn} ≤ Σ_{k=1}^{K} lim sup_{n→∞} P1{max_{t≤n} Σ_{i=1}^{t} (Zk,i − E1(Zk,i)) ≥ δ1n},

where δ1 = JDδ/K.

Note that Σ_{i=1}^{n}(Zk,i − E1(Zk,i)) is a martingale. Doob's submartingale inequality (Theorem 14.6 of [25]) implies that

P1{max_{t≤n} Σ_{i=1}^{t} (Zk,i − E1(Zk,i)) ≥ δ1n} ≤ Σ_{i=1}^{n} E1(Zk,i²)/(δ1²n²).

Note that for any i, E1(Zk,i²) ≤ VDk(gk, fk) by definition, and hence

P1{max_{t≤n} Σ_{i=1}^{t} (Zk,i − E1(Zk,i)) ≥ δ1n} ≤ VDk(gk, fk)/(δ1²n),

which implies (50) since VDk(gk, fk) is finite. Relation (49) follows.
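To illustrate the construction used in the proof of Lemma 1, the sketch below (ours) simulates the power-one test M(a) of (46) for a single sensor, with the assumed model N(0,1) versus N(µ,1) and the optimal binary MLRQ replaced by a grid-searched threshold; it checks the two properties in (47). The truncation at n_max steps is a stand-in for M(a) = ∞.

```python
# Illustrative simulation of the power-one test M(a) in (46): one sensor,
# assumed model N(0,1) vs. N(mu,1), binary quantizer U = I(X > lambda).
import math
import random

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def h(a, b):
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

mu = 1.0
grid = [i / 1000.0 - 4.0 for i in range(8001)]
lam = max(grid, key=lambda t: h(Phi(mu - t), Phi(-t)))  # maximize I(g_phi, f_phi)
p0, p1 = Phi(-lam), Phi(mu - lam)      # P(U = 1) under H0 and H1
J_D = h(p1, p0)                        # per-step information of the message

def M(a, p, rng, n_max=5000):
    # one-sided SPRT on the binary messages; n_max truncates "M(a) = infinity"
    llr, w1, w0 = 0.0, math.log(p1 / p0), math.log((1 - p1) / (1 - p0))
    for n in range(1, n_max + 1):
        llr += w1 if rng.random() < p else w0
        if llr >= a:
            return n                   # stop and reject H0
    return None

rng, alpha = random.Random(1), 0.01
a = abs(math.log(alpha))
p_reject = sum(M(a, p0, rng) is not None for _ in range(2000)) / 2000.0
mean_h1 = sum(M(a, p1, rng) for _ in range(2000)) / 2000.0
print(f"P0(M(a) < infinity) ~ {p_reject:.4f}  (bound: alpha = {alpha})")
print(f"E1(M(a)) ~ {mean_h1:.1f}  vs  |log alpha|/J_D = {a / J_D:.1f}")
```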
D. Proof of Theorem 5

To prove the theorem, it is useful to think of our proposed test δB(c) in the system with limited local memory as a combination of two SPRTs with stationary MLRQs at the sensors. Denote by δB(1)(c) and δB(2)(c) the SPRT tests at the first and the second stage, respectively. Now at the first stage, δB(1)(c) is an SPRT with log-likelihood ratio threshold values ±log|log c|, and by the classical results on the SPRT, we have

P0(V = 1) ≤ 1/|log c|,  P1(V = 0) ≤ 1/|log c|,

and

E0(Mc(1)) ≤ log|log c|/I1,  E1(Mc(1)) ≤ log|log c|/J1,

where V is the decision of δB(1)(c), i.e., the preliminary decision of our proposed test δB(c), Mc(1) is the number of time steps taken at the first stage, and I1 = Σ_{k=1}^{K} I(fφ,k, gφ,k) and J1 = Σ_{k=1}^{K} I(gφ,k, fφ,k) are the Kullback-Leibler information numbers of the sensor messages corresponding to the stationary MLRQs φk used in the first stage. By the definition in (11), we have I1 ≤ ID and J1 ≤ JD.

At the second stage, note that δB(2)(c) only uses the observations in the second stage to make decisions, and thus, conditional on the preliminary decision V, the SPRTs at the two stages, δB(1)(c) and δB(2)(c), are independent. Note that δB(2)(c) is an SPRT with log-likelihood ratio threshold values ±|log c|; by the classical results for the SPRT, we have

P0(δB(2) rejects H0 | V) ≤ c,  P1(δB(2) rejects H1 | V) ≤ c,

regardless of the decision V at the first stage. However, if we denote by Mc(2) the number of time steps taken at the second stage, i.e., Mc(2) is the stopping time of δB(2)(c), then the properties of Mc(2) will depend on V. Specifically,

E0(Mc(2) | V = 0) ≤ (1 + o(1))|log c|/ID,
E1(Mc(2) | V = 0) ≤ (1 + o(1))|log c|/J2,

where ID and J2 are the Kullback-Leibler information numbers of the sensor messages corresponding to the optimal stationary MLRQs ψk used at the sensors, where the MLRQ ψk maximizes I(fψ,k, gψ,k) for all k. Similarly,

E0(Mc(2) | V = 1) ≤ (1 + o(1))|log c|/I2,
E1(Mc(2) | V = 1) ≤ (1 + o(1))|log c|/JD,

where I2 and JD are the Kullback-Leibler information numbers of the sensor messages corresponding to the optimal stationary MLRQs ψk used at the sensors, where the MLRQ ψk maximizes I(gψ,k, fψ,k) for all k.

Now we are in a position to prove the theorem. Let us first consider the error probabilities of our proposed test δB(c). To prove the first inequality in (27), note that the final decision of δB(c) is just the decision of δB(2)(c). Thus

P0(δB(c) rejects H0) = P0(δB(2)(c) rejects H0) = E0[P0(δB(2)(c) rejects H0 | V)] ≤ E0(c) = c,

where the second equality uses the conditional expectation on V, and the inequality uses the property of δB(2)(c). The second inequality in (27) can be proved similarly.

Next, let us consider the number of time steps Mc taken by our proposed test δB(c). Observe that Mc = Mc(1) + Mc(2), where Mc(1) and Mc(2) are the time steps taken by δB(1)(c) at the first stage and by δB(2)(c) at the second stage, respectively. Hence,

E0(Mc) = E0(Mc(1)) + E0(Mc(2)) = E0(Mc(1)) + E0(Mc(2) | V = 1)P0(V = 1) + E0(Mc(2) | V = 0)P0(V = 0).

Applying the results for δB(1)(c) and δB(2)(c), as c → 0, we have

E0(Mc) ≤ (1 + o(1)) log|log c|/I1 + (1 + o(1)) (|log c|/I2) × (1/|log c|) + (1 + o(1)) (|log c|/ID) × 1 = (1 + o(1))|log c|/ID.

This proves the first inequality in (28). The second inequality in (28) can be proved similarly. Thus the theorem holds.
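The two-stage structure just analyzed can be simulated directly. The sketch below is our single-sensor illustration under assumed parameters (N(0,1) versus N(1,1), binary threshold quantizers, hard-coded rough stage-2 thresholds); the proof above takes the first-stage MLRQs φk as given, and here stage 1 uses the minimax threshold µ/2 purely for concreteness. The one-bit preliminary decision V selects the stationary quantizer for the second stage, i.e., a tandem quantizer with at most one switch.

```python
# Illustrative simulation of the two-stage test delta_B(c) from the proof of
# Theorem 5 (single sensor; hypothetical thresholds).
import math
import random

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sprt(p0, p1, b, p_true, rng):
    # SPRT on i.i.d. binary messages with boundaries -b and +b;
    # returns (steps, decision), where decision 1 means "accept H1"
    llr = 0.0
    w1, w0 = math.log(p1 / p0), math.log((1 - p1) / (1 - p0))
    n = 0
    while -b < llr < b:
        n += 1
        llr += w1 if rng.random() < p_true else w0
    return n, int(llr >= b)

mu, c = 1.0, 1e-4
lam1 = mu / 2.0            # stage-1 threshold (minimax choice, assumed)
lam_f, lam_g = 0.2, 0.8    # rough maximizers of I(f,g) and I(g,f) (hypothetical)

def run(under_h1, rng):
    m = mu if under_h1 else 0.0
    # stage 1: boundaries +/- log|log c|, quantizer I(X > lam1)
    n1, V = sprt(Phi(-lam1), Phi(mu - lam1),
                 math.log(abs(math.log(c))), Phi(m - lam1), rng)
    # stage 2: boundaries +/- |log c|; quantizer selected by the feedback V
    lam2 = lam_g if V == 1 else lam_f
    n2, d = sprt(Phi(-lam2), Phi(mu - lam2),
                 abs(math.log(c)), Phi(m - lam2), rng)
    return n1 + n2, d

rng = random.Random(2)
err = sum(run(False, rng)[1] for _ in range(2000)) / 2000.0
steps = sum(run(True, rng)[0] for _ in range(2000)) / 2000.0
print(f"P0(reject H0) ~ {err:.4f}   (target: c = {c})")
print(f"mean total sample size under H1 ~ {steps:.1f}")
```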
E. Proof of Theorem 7

To prove the theorem, note that by Lemma 7 below, for any given θ ≥ 0, hθ(λ) defined in (33) is a decreasing function of λ ∈ [0, ∞). Thus

min(hθ(λ), hθ(−λ)) ≤ hθ(|λ|) ≤ hθ(0),

with equality holding for λ = 0. This shows that min(hθ(λ), hθ(−λ)) is maximized at λ = 0, and so the theorem is proved.

To complete the proof, we need the following lemmas; throughout, φ(·) denotes the density of the standard normal distribution.

Lemma 3: Define

a(x) = (φ(x))² + xφ(x)Φ(x) − (Φ(x))².

Then a(x) ≤ 0 for all x ∈ R.

Proof: Note that the first-order derivative of a(x) is

a′(x) = 2(−x)φ(x)φ(x) + φ(x)Φ(x) + x(−xφ(x))Φ(x) + xφ(x)φ(x) − 2φ(x)Φ(x) = −x(φ(x))² − (x² + 1)φ(x)Φ(x).
On the one hand, if x ≥ 0, it is easy to see that a′(x) ≤ 0. On the other hand, if x ≤ 0, let u = −x; then φ(u) = φ(−u) and

a′(x) = (φ(u))²(u² + 1)[u/(u² + 1) − Φ(−u)/φ(u)],

which is negative for all u ≥ 0 by the well-known fact (e.g., Proposition 14.8 on page 141 of [25]) that

Φ(−u)/φ(u) ≥ 1/(u + 1/u) = u/(u² + 1).

Thus, a′(x) ≤ 0 for all x ∈ R, and so a(x) is a decreasing function of x over R. Clearly, as x → −∞, both φ(x) and Φ(x) go to 0, and hence a(x) goes to 0. Therefore, a(x) ≤ 0 for all x ∈ R.

Lemma 4: The function b(x) = φ(x)/Φ(x) + x is an increasing function of x ∈ R.

Proof: The lemma follows at once from the fact that

b′(x) = −a(x)/(Φ(x))²,

which is positive for all x ∈ R by Lemma 3.

Lemma 5: For all x ∈ R,

φ(x)/Φ(−x) − φ(x)/Φ(x) ≤ max(0, 2x).   (51)

Proof: If x ≤ 0, then the conclusion is trivial since Φ(−x) ≥ Φ(x). If x ≥ 0, then the left-hand side of (51) is equal to

(b(−x) + x) − (b(x) − x) = b(−x) − b(x) + 2x,

which is less than 2x since Lemma 4 implies b(−x) ≤ b(x) for all x ≥ 0.

Lemma 6: Define

A(x, y) = φ(y)[log(Φ(y)/Φ(x)) − log(Φ(−y)/Φ(−x))] − φ(x)[Φ(y)/Φ(x) − Φ(−y)/Φ(−x)].   (52)

Then A(x, y) ≤ 0 for any y ≥ max(x, −x).

Proof: A simple algebraic calculation shows that

∂A(x, y)/∂y = φ(y)B(x, y),

where

B(x, y) = −y[log(Φ(y)/Φ(x)) − log(Φ(−y)/Φ(−x))] + φ(y)/(Φ(y)Φ(−y)) − φ(x)/(Φ(x)Φ(−x)).

Now it is easy to show that

∂B(x, y)/∂x = [φ(x)/(Φ(x)Φ(−x))]·[y + x + φ(x)/Φ(x) − φ(x)/Φ(−x)],

which is positive for all y ≥ max(x, −x) by Lemma 5. Since B(y, y) = 0, we know that B(x, y) ≤ 0 for all y ≥ max(x, −x). Thus, for each fixed x, A(x, y) is a decreasing function of y if y ≥ max(x, −x). Hence, A(x, y) ≤ A(x, |x|) for all y ≥ max(x, −x). To prove the lemma, it now suffices to show that A(x, |x|) ≤ 0 for all x ∈ R.

If x ≥ 0, then it is easy to see that A(x, |x|) = A(x, x) = 0. On the other hand, if x ≤ 0, then A(x, |x|) = A(x, −x), which can be rewritten as

A(x, −x) = φ(x)[2 log u − u + 1/u],

where

u = Φ(−x)/Φ(x) ≥ 1

since x ≤ 0. Let w(u) = 2 log u − u + 1/u; then

w′(u) = 2/u − 1 − 1/u² = −(1 − 1/u)² ≤ 0.

Thus w(u) is a decreasing function of u, and for all u ≥ 1, w(u) ≤ w(1) = 0. This shows that A(x, −x) ≤ 0 for all x ≤ 0. Therefore, A(x, y) ≤ 0 for all y ≥ max(x, −x).

Lemma 7: For any θ ≥ 0, the function hθ(λ) defined in (33) is a decreasing function of λ ∈ [0, ∞).

Proof: By (32) and (33), it is straightforward to show that

∂hθ(λ)/∂λ = A(λ − θ, λ + θ),

where A(x, y) is defined in (52). Note that for any given θ ≥ 0, λ + θ ≥ max(λ − θ, −(λ − θ)) for all λ ≥ 0. Applying Lemma 6, we have A(λ − θ, λ + θ) ≤ 0 for all λ ≥ 0, i.e., the first-order derivative of hθ(λ) (with respect to λ) is nonpositive for λ ∈ [0, ∞). Hence, the lemma is proved.

REFERENCES

[1] L. R. Abramson, "Asymptotic sequential design of experiments with two random variables," J. R. Stat. Soc. Ser. B Stat. Methodol., vol. 28, no. 1, pp. 73-87, 1966.
[2] M. Basseville and I. Nikiforov, Detection of Abrupt Changes: Theory and Applications. Englewood Cliffs, NJ: Prentice-Hall, 1993.
[3] R. S. Blum, S. A. Kassam, and H. V. Poor, "Distributed detection with multiple sensors: Part II - Advanced topics," Proceedings of the IEEE, vol. 85, no. 1, pp. 64-79, 1997.
[4] H. Chernoff, "Sequential design of experiments," Ann. Math. Statist., vol. 30, pp. 755-770, 1959.
[5] V. P. Dragalin, A. G. Tartakovsky, and V. V. Veeravalli, "Multihypothesis sequential probability ratio tests - Part I: Asymptotic optimality," IEEE Trans. Inform. Theory, vol. 45, pp. 2448-2461, 1999.
[6] J. Kiefer and J. Sacks, "Asymptotically optimal sequential inference and design," Ann. Math. Statist., vol. 34, pp. 705-750, 1963.
[7] T. L. Lai, "Information bounds and quick detection of parameter changes in stochastic systems," IEEE Trans. Inform. Theory, vol. 44, pp. 2917-2929, 1998.
[8] T. L. Lai, "Sequential multiple hypothesis testing and efficient fault detection-isolation in stochastic systems," IEEE Trans. Inform. Theory, vol. 46, pp. 595-608, 2000.
[9] T. L. Lai, "Sequential analysis: Some classical problems and new challenges (with discussion)," Statistica Sinica, vol. 11, pp. 303-408, 2001.
[10] Y. Mei, "Asymptotically optimal methods for sequential change-point detection," Ph.D. thesis, California Institute of Technology, 2003. Available at <http://resolver.caltech.edu/CaltechETD:etd-05292003-133431>.
[11] Y. Mei, "Information bounds and quickest change detection in decentralized decision systems," IEEE Trans. Inform. Theory, vol. 51, pp. 2669-2681, Jul. 2005.
[12] A. G. O. Mutambara, Decentralized Estimation and Control for Multisensor Systems. Boca Raton, FL: CRC Press, 1998.
[13] H. Robbins, "Statistical methods related to the law of the iterated logarithm," Ann. Math. Statist., vol. 41, pp. 1397-1409, 1970.
[14] H. Robbins and D. Siegmund, "Boundary crossing probabilities for the Wiener process and partial sums," Ann. Math. Statist., vol. 41, pp. 1410-1429, 1970.
[15] H. Robbins and D. Siegmund, "Statistical tests of power one and the integral representation of solutions of certain partial differential equations," Bull. Inst. Math., Academia Sinica, vol. 1, pp. 93-120, 1973.
[16] D. Siegmund, "The variance of one-sided stopping rules," Ann. Math. Statist., vol. 40, pp. 1074-1077, 1969.
[17] D. Siegmund, Sequential Analysis: Tests and Confidence Intervals. New York: Springer-Verlag, 1985.
[18] A. G. Tartakovsky and V. V. Veeravalli, "An efficient sequential procedure for detecting changes in multichannel and distributed systems," in Proceedings of the 5th International Conference on Information Fusion, Annapolis, MD, Jul. 2002, vol. 2, pp. 1-8.
[19] R. R. Tenney and N. R. Sandell, Jr., "Detection with distributed sensors," IEEE Trans. Aerosp. Electron. Syst., vol. AES-17, pp. 501-510, 1981.
[20] J. N. Tsitsiklis, "Extremal properties of likelihood ratio quantizers," IEEE Trans. Commun., vol. 41, pp. 550-558, 1993.
[21] V. V. Veeravalli, "Sequential decision fusion: Theory and applications," J. Franklin Inst., vol. 336, pp. 301-322, Feb. 1999.
[22] V. V. Veeravalli, T. Basar, and H. V. Poor, "Decentralized sequential detection with a fusion center performing the sequential test," IEEE Trans. Inform. Theory, vol. 39, pp. 433-442, Mar. 1993.
[23] R. Viswanathan and P. K. Varshney, "Distributed detection with multiple sensors - Part I: Fundamentals," Proceedings of the IEEE, vol. 85, pp. 54-63, Jan. 1997.
[24] A. Wald, Sequential Analysis. New York: Wiley, 1947.
[25] D. Williams, Probability with Martingales. Cambridge, U.K.: Cambridge University Press, 1991.
[26] M. Woodroofe, Nonlinear Renewal Theory in Sequential Analysis. Philadelphia, PA: Society for Industrial and Applied Mathematics, 1982.
[27] J. Xiao and Z. Luo, "Decentralized estimation in an inhomogeneous sensing environment," IEEE Trans. Inform. Theory, vol. 51, pp. 3564-3575, Oct. 2005.
