
930 IEEE INTERNET OF THINGS JOURNAL, VOL. 5, NO. 2, APRIL 2018

Distributed Quickest Detection in Sensor Networks via Two-Layer Large Deviation Analysis

Di Li, Student Member, IEEE, Soummya Kar, Member, IEEE, and Shuguang Cui, Fellow, IEEE

Abstract—We propose a distributed Bayesian quickest detection algorithm for sensor networks, based on a random gossip inter-sensor communication structure. Without a control or fusion center, each sensor executes its local change detection procedure in a parallel and distributed fashion, interacting with its neighboring sensors via random inter-sensor communications to propagate information. By modeling the information propagation dynamics in the network as a Markov process, a two-layer large deviation analysis is presented to analyze the performance of the proposed algorithm. The first-layer analysis shows that the relation between the probability of false alarm and the conditional averaged detection delay satisfies the large deviation principle, where the distributed Kullback–Leibler information number is established as a crucial factor. The second-layer analysis studies the probability that not all observations are available at one sensor, and shows that this probability decays exponentially fast to zero as the averaged number of communication rounds increases; the large deviation upper and lower bounds for the convergence rate are then derived. Finally, we show that the performance of the distributed algorithm converges exponentially fast to that of the centralized optimal one.

Index Terms—Bayesian model, distributed detection, large deviation, quickest detection, sensor networks.

Manuscript received July 15, 2017; revised December 6, 2017 and January 26, 2018; accepted February 12, 2018. Date of publication March 1, 2018; date of current version April 10, 2018. This work was supported in part by NSF under Grant DMS-1622433, Grant AST-1547436, Grant ECCS-1659025, Grant CCF-1513936, and Grant CNS-1343155, in part by DoD under Grant HDTRA1-13-1-0029, and in part by NSFC under Grant NSFC-61629101. A portion of this paper was presented at the Allerton Conference [1]. (Corresponding author: Shuguang Cui.)

D. Li is with the R&D Department, Unicore Communications Technology Corporation, Fremont, CA 94538 USA, and also with the Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843 USA (e-mail: dili@tamu.edu).

S. Kar is with the Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213 USA (e-mail: soummyak@andrew.cmu.edu).

S. Cui is with the Department of Electrical and Computer Engineering, University of California at Davis, Davis, CA 95616 USA (e-mail: sgcui@ucdavis.edu).

Digital Object Identifier 10.1109/JIOT.2018.2810825

2327-4662 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

QUICKEST change detection problems focus on detecting abrupt changes in stochastic processes as quickly as possible, with constraints to limit the detection error. Quickest change detection has wide applications in fields such as signal and image processing [2]–[4], computer network intrusion detection [5]–[7], neuroscience [8], environment and public health surveillance [9], [10], and system failure detection [11], [12]. Specifically, when quickest change detection is implemented in sensor networks [13]–[15], it can detect changes of statistical features, such as the mean and variance, over the observation sequences taken by the sensors. For example, quickest change detection can be implemented in sensor networks for the chemical industry to monitor leakage, or to surveil temperature changes in the field, by detecting the change in statistical patterns.

Signal processing implementations in sensor networks can essentially be divided into two categories: centralized versus distributed algorithms [16]–[26]. In centralized quickest change detection algorithms [27]–[33], a control or fusion center exists to process the data in a centralized way. Specifically, centralized algorithms assume that either the raw observations from all the sensors or certain preprocessed information from the sensors (some authors call this case decentralized sensing) are available to the control or fusion center via certain communication channels; a final centralized detection procedure is then executed at the center. However, centralized algorithms have disadvantages such as heavy communication burden, high computation complexity, low scalability, and poor robustness. On the contrary, distributed implementations do not require a control or fusion center, and the detection procedure is implemented at each sensor in a local and parallel fashion, with interactions among neighboring sensors to exchange information.

While centralized quickest change detection algorithms have been well studied, there is less literature on distributed algorithms for quickest change detection problems [34], [35], which are more desirable in large-scale networks with huge volumes of data, in order to reduce the overall computation complexity and to enhance scalability. In [34], a distributed consensus-based Page's test algorithm, using the cumulative sum of the log-likelihoods of the data, was proposed under the assumption that the change happening time is deterministic but unknown, which is called a non-Bayesian setup. In [35], a distributed change detection algorithm was proposed by combining a global consensus scheme with geometric moving average control charts that generate local statistics. In both [34] and [35], non-Bayesian setups of the change happening time are considered, where the communication stage and the observation stage are interleaved, i.e., they are at the same time scale and each is executed once within one system time slot. Under such an interleaving strategy, the convergence of the test statistic is established as the system time goes to infinity. However, this type of convergence analysis over time does not fit well into quickest change detection problems, which are time-sensitive, with the goal to detect the
change as quickly as possible. This is different from traditional detection problems without much consideration of the timing issue, where the convergence analysis is commonly performed as the system time goes to infinity.

Different from the existing work, in this paper we propose a distributed change detection algorithm based on a Bayesian setup of the change happening time. To the best of our knowledge, this paper is the first work discussing distributed change detection under such a Bayesian setup. Additionally, in our proposed distributed algorithm, there are multiple communication steps between two observation instants, i.e., the communication step has a smaller time scale than that of the observation stage. In the communication steps, a random point-to-point gossip-based algorithm is adopted as in [36] and [37]. We model the information propagation procedure governed by this communication procedure as a Markov process. We then analyze the performance of the proposed distributed change detection algorithm with a two-layer large deviation analysis. Large deviation techniques [38], [39] have been used to analyze the performance of either centralized or distributed estimation and detection algorithms, for example, in [36] and [40]–[42]. However, no existing work has utilized large deviation analysis to study the performance of change detection algorithms, especially distributed ones. The most related work is [42], in which a distributed sequential detection method is proposed to solve the problem of Gaussian binary hypothesis testing. The sequential hypothesis testing problem can be considered as a special case of change detection problems, where the change happens at the initial time point [43].

The motivations of the proposed two-layer large deviation analysis method are twofold. In quickest detection problems, there are two performance metrics: 1) detection delay and 2) probability of false alarm (PFA). Tradeoffs exist between these two metrics, and quantifying such tradeoffs mathematically is valuable. However, no closed-form quantification exists for general cases, while large deviation analysis provides a mathematical tool to understand the system properties in the asymptotic regime. In our proposed two-layer large deviation analysis for distributed quickest detection, the first-layer analysis derives the relation between the detection delay and the PFA, showing that the PFA decays exponentially fast to zero as the averaged detection delay (ADD) increases. The second-layer analysis answers the following questions: in the context of distributed detection schemes, whether or not the distributed scheme can converge to the corresponding centralized one, and if it does, how the distributed scheme behaves when converging to the centralized one and what the convergence speed is. By adopting the second-layer large deviation analysis, we prove that the performance of the proposed distributed quickest detection scheme converges exponentially fast to that of the centralized one as the averaged number of communication rounds increases.

Notation: Denote P as the probability measure and E as the corresponding expectation operator. Denote P_k and E_k as the conditional probability measure and expectation operator when the change occurs at time k. Denote P^π and E^π as the probability measure and expectation operator when considering the prior distribution. Denote Λ_n as the test statistic, τ* as the optimal stopping time, and D(f_1^i, f_0^i) as the Kullback–Leibler information number.

The rest of this paper is organized as follows. Section II sets up the system model and describes the quickest change detection problem. Section III presents the large deviation analysis of the centralized change detection algorithm to set up the background. Section IV introduces the distributed change detection algorithm and develops the corresponding two-layer large deviation analysis. Section V provides simulation results to validate the analytical results from the previous sections. Section VI concludes this paper.

II. SYSTEM SETUP

Fig. 1. Distributed quickest detection system.

As shown in Fig. 1, we consider a wireless sensor network with N nodes monitoring and detecting a change happening in a large-scale field, such as the smart grid or a battlefield. Assume that a change happens at time λ = k; then, conditioned on λ = k, independent and identically distributed (i.i.d.) observations X_1^i, ..., X_{k−1}^i at sensor i follow a distribution with density function f_0^i(x), and observations X_k^i, X_{k+1}^i, ... follow another distribution with density function f_1^i(x). We assume that observations at different sensors are independent of each other and that the various densities are absolutely continuous with respect to the Lebesgue measure. Denote X_n^i = [X_1^i, ..., X_n^i] as the observations up to time n at node i. Let P_k be the probability measure of X_n^i when the change occurs at time k, and E_k the corresponding expectation operator. Our goal is to design a sequential online detection algorithm (with a stopping criterion) over the observation sequence to detect the change.

If prior knowledge about the change happening time is available, the Bayesian setup can be used to leverage this prior knowledge for change detection. In particular, the prior distribution of the change happening time λ is denoted as

\pi_k = P(\lambda = k).

This gives the prior probability that the change will happen at time k. With the observations taken in the sequel, the posterior probability (via Bayesian rules) that the change has happened can be derived, which is used for decision making. As in statistics, quickest detection problems can be basically split into non-Bayesian and Bayesian categories [38], [43], [44].
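As a concrete illustration of the observation model above, the following sketch (ours, not from the paper) draws a change time λ from the geometric prior π_k = ρ(1 − ρ)^{k−1} and generates one sensor's sequence; the Gaussian pre-/post-change densities N(0, 1) and N(μ₁, 1) are assumed examples, since the paper allows general f_0^i, f_1^i.

```python
import random

# Sketch of the Bayesian change-point observation model of Section II.
# Assumed example densities (ours): f0 = N(0,1), f1 = N(mu1,1).

def sample_change_time(rho, rng):
    # Geometric prior: P(lambda = k) = rho * (1 - rho)**(k - 1), k >= 1
    k = 1
    while rng.random() >= rho:
        k += 1
    return k

def generate_observations(n, lam, mu1, rng):
    # X_1, ..., X_{lam-1} ~ f0;  X_lam, X_{lam+1}, ... ~ f1
    return [rng.gauss(0.0, 1.0) if t < lam else rng.gauss(mu1, 1.0)
            for t in range(1, n + 1)]

rng = random.Random(0)
lam = sample_change_time(0.1, rng)          # change happening time
xs = generate_observations(50, lam, 2.0, rng)
```

Under this model, a detector only sees the sequence `xs`; the task of the following sections is to decide, sequentially, whether the (hidden) change time `lam` has already passed.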
The change detection problem can be converted into a hypothesis testing problem with hypotheses "H_0: λ > n" and "H_1: λ ≤ n," i.e., to sequentially decide which hypothesis is true at each time n. If H_0 is decided, it indicates that the change has not happened; if H_1 is decided, it claims that the change has happened.

A. Centralized Scheme

First, we discuss the centralized change detection algorithm, in which the observations from all sensors are available at a control center, where the detection algorithm is performed. Denote X_n = [X_n^1, ..., X_n^N] as the observations up to time n from all sensors, and denote the likelihood ratio for H_1: λ ≤ n versus H_0: λ > n, averaged over the change point (see [44]), as

\Lambda_n = \frac{P(X_n \mid \lambda \le n)\,P(\lambda \le n)}{P(X_n \mid \lambda > n)\,P(\lambda > n)} = \frac{\sum_{k=1}^{n} \pi_k \prod_{i=1}^{N} \prod_{j=k}^{n} f_1^i(X_j^i) \prod_{j=1}^{k-1} f_0^i(X_j^i)}{\sum_{k=n+1}^{\infty} \pi_k \prod_{i=1}^{N} \prod_{j=1}^{n} f_0^i(X_j^i)}.   (1)

Assume the prior distribution is geometric [43], i.e., π_k = ρ(1 − ρ)^{k−1}, with ρ ∈ (0, 1). Then, we have

\Lambda_n = \frac{1}{(1-\rho)^n} \sum_{k=1}^{n} \pi_k \prod_{j=k}^{n} \prod_{i=1}^{N} \frac{f_1^i(X_j^i)}{f_0^i(X_j^i)}.   (2)

We further have the following recursive form:

\Lambda_n = \frac{1}{1-\rho}\,(\Lambda_{n-1} + \rho) \prod_{i=1}^{N} \frac{f_1^i(X_n^i)}{f_0^i(X_n^i)}   (3)

with the initial state Λ_0 = 0. Taking logarithms on both sides, we have

\log \Lambda_n = \log\frac{1}{1-\rho} + \log(\Lambda_{n-1} + \rho) + \sum_{i=1}^{N} \log\frac{f_1^i(X_n^i)}{f_0^i(X_n^i)}.   (4)

Let F_n^X = σ(X_n) be the σ-algebra generated by the observations X_n, and denote

p_n = P(\lambda \le n \mid F_n^X)   (5)

as the posterior probability that the change has occurred before time n. It follows that Λ_n = p_n/(1 − p_n).

We intend to detect the change as soon as possible, subject to a constraint on the detection error. Thus, the change detection problem can be formulated as the following optimization problem over decision rules:

\inf_{\tau \in \Delta(\alpha)} \text{ADD}(\tau) \quad \text{s.t.} \quad \Delta(\alpha) = \{\tau : \text{PFA}(\tau) \le \alpha\}   (6)

where the ADD is

\text{ADD}(\tau) = E^{\pi}(\tau - \lambda \mid \tau \ge \lambda)

and the PFA is

\text{PFA}(\tau) = P^{\pi}(\tau < \lambda) = \sum_{k=1}^{\infty} \pi_k P_k(\tau < k)

with E^π and P^π defined at the beginning of this section, and α the upper limit on the PFA.

The optimal solution to this problem is given by the Shiryaev test (see [45] and [46]), where the detection strategy corresponds to claiming a change when the likelihood ratio Λ_n exceeds a threshold, i.e., the optimal stopping time τ* is

\tau^*(A) = \inf\{n \ge 1 : \Lambda_n \ge A\}   (7)

where A is chosen such that PFA(τ*(A)) = α. It is difficult to set a threshold A exactly matching this condition. We can instead set A = (1 − α)/α, which guarantees PFA(τ*(A)) ≤ α; this is due to the facts that P^π(τ*(A) < λ) = E^π(1 − p_{τ*(A)}) and 1 − p_{τ*(A)} ≤ 1/(1 + A), with p_{τ*(A)} defined in (5), so that PFA(τ*(A)) ≤ 1/(1 + A). Therefore, setting A = (1 − α)/α guarantees PFA(τ*(A)) ≤ α [44].

B. Isolated Scheme

If there is no control center and each sensor implements the local change detection algorithm purely based on its own observations, the log-likelihood ratio for hypotheses H_0: λ > n versus H_1: λ ≤ n at sensor i and time n is derived as

\log \Lambda_n^i = \log\frac{1}{1-\rho} + \log(\Lambda_{n-1}^i + \rho) + \log\frac{f_1^i(X_n^i)}{f_0^i(X_n^i)}   (8)

with the initial state Λ_0^i = 0.

Then, to solve the optimization problem in (6) at sensor i, the Shiryaev test with the test statistic in (8) is the optimal solution [45], [46], with the optimal stopping time τ^i at sensor i given by

\tau^i(A) = \inf\{n \ge 1 : \Lambda_n^i \ge A\}   (9)

where A is chosen such that PFA(τ^i(A)) = α. Since this detection strategy is exclusively based on the local observations at each sensor, it is called the isolated scheme.

Intuitively, the larger the difference between the densities f_1^i(x) and f_0^i(x), the faster the change can be detected. To quantify this difference, the Kullback–Leibler information number is defined as

D(f_1^i, f_0^i) = \int \log\frac{f_1^i(x)}{f_0^i(x)}\, f_1^i(x)\, dx   (10)

which is also called the divergence or KL distance between the densities f_1^i(x) and f_0^i(x). We assume the mild condition that 0 < D(f_1^i, f_0^i) < ∞ and 0 < D(f_0^i, f_1^i) < ∞ for each i. In the sequel, we will show that the Kullback–Leibler information number is a crucial factor in analyzing the performance of change detection algorithms.

III. LARGE DEVIATION ANALYSIS FOR CENTRALIZED AND ISOLATED ALGORITHMS

Large deviation analysis studies the asymptotic behavior of a rare event. Generally, for a rare event satisfying the large deviation
principle, the probability of the rare event occurring decays to zero at an exponentially fast rate, in the asymptotic sense, over a certain quantity.

In this section, we analyze the performance of the centralized algorithm by quantifying the relation between the conditional ADD (CADD) and the PFA via large deviation analysis, showing that the event of a false alarm can be considered a rare event and the corresponding PFA decays to zero exponentially fast as the CADD increases. The results in this section set the background for analyzing the distributed case in the next section.

Since the ADD is difficult to characterize, following [44], we instead analyze the CADD, defined as CADD_k(τ) = E_k(τ − k | τ ≥ k), k = 1, 2, .... The relation between the ADD and the CADD is as follows:

\text{ADD}(\tau) = E^{\pi}(\tau - \lambda \mid \tau \ge \lambda) = \frac{\sum_{k=1}^{\infty} \pi_k P_k(\tau \ge k)\, E_k(\tau - k \mid \tau \ge k)}{P^{\pi}\{\tau \ge \lambda\}} = \frac{\sum_{k=1}^{\infty} \pi_k P_k(\tau \ge k)\, \text{CADD}_k(\tau)}{P^{\pi}\{\tau \ge \lambda\}}.   (11)

According to the optimal stopping rule (7) and the test statistic (3), we find CADD_1(τ*) ≥ CADD_k(τ*) for k ≥ 2, which is explained as follows. For k = 1 (which means that the change happens at time 1), by investigating (3), Λ_1 is updated based on the initial state Λ_0 = 0. For k ≥ 2, by investigating (3), Λ_k is updated based on Λ_{k−1}, where 0 ≤ Λ_{k−1} < A according to the optimal stopping rule (7) and the condition τ* ≥ k. Thus, we have Λ_{k−1} ≥ Λ_0. According to the optimal stopping rule (7), the time spent crossing the threshold after the change happens (the detection delay) in the case k ≥ 2 is, on average, less than that in the case k = 1. Therefore, we have CADD_1(τ*) ≥ CADD_k(τ*). Additionally, the difference between CADD_1(τ*) and CADD_k(τ*) can be treated as a constant for large A, which approximately equals E_∞(log Λ_{k−1}), k ≥ 2 [44]. Therefore, in the sequel we focus on CADD_1(τ*), which can also be considered the worst-case study.

The relation between CADD_1(τ*) and PFA(τ*) for the centralized scheme is presented in the following theorem.

Theorem 1: The probability of false alarm PFA(τ*), with the optimal stopping rule (7), satisfies the large deviation principle in the asymptotic sense with respect to the increasing conditional ADD CADD_1(τ*), i.e.,

\lim_{\text{CADD}_1(\tau^*) \to \infty} \frac{1}{\text{CADD}_1(\tau^*)} \log \text{PFA}(\tau^*) = -\big(D + |\log(1-\rho)|\big)   (12)

where D is the sum of the Kullback–Leibler information numbers across all sensors, i.e., D = \sum_{i=1}^{N} D(f_1^i, f_0^i), and D + |log(1 − ρ)| is the large deviation decay rate, quantifying how fast the PFA decays to zero over the increasing CADD.

The proof is shown in Appendix A.

Remark 1: The above theorem quantifies the tradeoff between the two performance metrics, 1) PFA and 2) CADD_1, in the defined change detection problem: as CADD_1 increases, the PFA decays to zero exponentially fast, and the decay rate is D + |log(1 − ρ)|.

For the isolated scheme, at each node i, the relation between PFA(τ^{i*}) and CADD_1(τ^{i*}) has a form similar to that of the centralized case shown in Theorem 1. We give the following corollary.

Corollary 1: The probability of false alarm PFA(τ^{i*}), with the optimal stopping rule (9), satisfies the large deviation principle in the asymptotic sense with respect to the increasing conditional ADD CADD_1(τ^{i*}), i.e.,

\lim_{\text{CADD}_1(\tau^{i*}) \to \infty} \frac{1}{\text{CADD}_1(\tau^{i*})} \log \text{PFA}(\tau^{i*}) = -\big(D(f_1^i, f_0^i) + |\log(1-\rho)|\big)   (13)

which implies that the large deviation decay rate of the PFA is D(f_1^i, f_0^i) + |log(1 − ρ)|.

Remark 2: Theorem 1 and Corollary 1 imply that the Kullback–Leibler information is a crucial factor determining the performance of change detection algorithms. Specifically, Corollary 1 shows that, for different sensors with different density pairs f_1^i(x) and f_0^i(x), the sensor whose density pair bears a larger Kullback–Leibler information asymptotically achieves a smaller PFA for the same CADD performance. Compared with the isolated scheme, Theorem 1 shows that, in the centralized scheme, the sum D of the Kullback–Leibler information numbers quantifies the relation between the PFA and the CADD.

In the next section, we propose a distributed change detection scheme and analyze its performance. Due to the information propagation among sensors, we show that the distributed scheme outperforms the isolated one, and the improvement is reflected by the averaged partial sums of the individual Kullback–Leibler information numbers.

IV. DISTRIBUTED CHANGE DETECTION AND LARGE DEVIATION ANALYSIS

In this section, a random gossip-based distributed change detection algorithm is first introduced. Then, we model the information propagation in this distributed scheme as a Markov process. Finally, a two-layer large deviation analysis is presented to analyze the performance of the proposed distributed algorithm.

First, we interpret the network as a nondirected graph G = (V, E), where V is the set of nodes with |V| = N and E is the set of edges. If node i is connected to node j, then edge (i, j) ∈ E. The connectivity of graph G is represented by the following N × N symmetric adjacency matrix A with elements

A_{ij} = \begin{cases} 1, & (i, j) \in E \text{ or } i = j \\ 0, & \text{otherwise.} \end{cases}   (14)

We assume that the network is connected, i.e., each node has a path to any other node.

A. Distributed Algorithm

We propose a random gossip-based distributed change detection algorithm, where a random gossip algorithm, as
the inter-sensor communication structure, is used to propagate information among sensors within the neighborhood.

Before proceeding to the algorithm details, we summarize the algorithm structure here. At each time slot, there are three steps.

Step 1: Each sensor swaps its observation with a randomly selected neighbor at each communication round.

Step 2: After the communications are finished, each sensor updates its local test statistic with the old test statistic and the observations received from other sensors.

Step 3: Compare the updated test statistic with the threshold to decide whether or not the change has happened. If not, continue running the algorithm in the next time slot.

Communication among sensors is constrained by factors such as proximity, transmit power, and receiving capabilities. We model the communication structure in terms of the nondirected graph G = (V, E) defined at the beginning of this section. If node i can communicate with node j, there is an edge between i and j, i.e., the edge set E contains (i, j). We assume that the diagonal elements of the adjacency matrix A are identically 1, which indicates that a node can always communicate with itself. The set E is the maximal set of allowable communication links in the network at any time; however, at a particular instant, only a fraction of the allowable links are active, for example, to avoid strong interference among communications. The exact communication protocol is not that important for the theoretical analysis, as long as the connectivity of the network is satisfied. For definiteness, we assume the following generic communication model, which subsumes the widely used gossip protocol for real-time embedded architectures [47] and the graph matching-based communication protocols for Internet architectures [48]. Define the set M of binary symmetric N × N matrices as follows:

M = \{A' \mid \mathbf{1}^T A' = \mathbf{1}^T,\; A'\mathbf{1} = \mathbf{1},\; A' \le A\}   (15)

where A' ≤ A is interpreted component-wise. In other words, M is the set of adjacency matrices in which each node is incident to exactly one edge included in the edge set E. Let D denote a probability distribution on the space M. We define a sequence of time-varying matrices A(m), m = 1, 2, ..., as an i.i.d. sequence in M with distribution D. Define the averaged matrix Ā as

\bar{A} = \int_{M} A\, dD(A).   (16)

According to the definition of M in (15), Ā is a symmetric stochastic matrix. We assume Ā to be irreducible and aperiodic. This assumption depends on the allowable edges E and the distribution D; a distribution D making this assumption valid always exists if the graph (V, E) is connected, e.g., the uniform distribution. In addition, Ā can be interpreted as the transition matrix of a Markov chain, which we will discuss later.

Assume that within each sampling time interval for taking observations there are M rounds of inter-sensor communications, where M is a Poisson random variable with mean γ [47]. At the mth (m ∈ {1, ..., M}) round, a node randomly selects another node from its neighborhood to construct a two-way communication pair and exchange observations. At each sampling time interval, this communication structure is modeled by the sequence of matrices A(m), m = 1, 2, ..., M, i.e., the establishment of a communication link between nodes i and j indicates that nodes i and j are neighbors with respect to the time-varying adjacency matrix A(m). Note that there may exist multiple communication links or pairs simultaneously in the network, but only one communication link is associated with any given node in each round, which is also implied by the mathematical model in (15).

Now, we model the communication link formation process from the perspective of a Markov process. To this end, the communication link process governed by the time-varying adjacency matrix sequence {A(m)} can be represented by N particles traveling on the graph [36]. We denote the state of the ith particle as z_i(m), where z_i(m) indicates the index of the node that the ith particle travels to at time m, with z_i(m) ∈ {1, ..., N}. The evolution of the ith particle is given as follows:

z_i(m) = [z_i(m-1)]^{\to}_{m}, \quad z_i(0) = i   (17)

where the notation [i]^{\to}_{m} denotes the neighbor of node i at time m with respect to the adjacency matrix A(m), i.e., a communication link is established between [i]^{\to}_{m} and i at time m. Thus, the traveling process of the ith particle can be viewed as originating from node i initially and then traveling on the graph according to the link formation process {A(m)} (possibly changing its location at each step). For each i, the process {z_i(m)} is a Markov chain on the state space {1, ..., N} with transition probability matrix Ā [36].

After M rounds of inter-sensor communications, each node has accumulated some observations from other nodes, with which the local test statistic at each node is updated. Denote O_n^i as the set of nodes whose observations are available at node i after the inter-sensor communications at the end of observation time period n. We will describe the accumulation process that yields O_n^i later. The distributed test statistic Λ_{n,D}^i is updated as

\Lambda_{n,D}^i = \frac{1}{1-\rho}\big(\Lambda_{n-1,D}^i + \rho\big) \prod_{j \in O_n^i} \frac{f_1^j(X_n^j)}{f_0^j(X_n^j)}.   (18)

With this test statistic updating rule, at each sensor i, the distributed change detection scheme is executed with the following stopping time τ_D^i:

\tau_D^i(A) = \inf\{n \ge 1 : \Lambda_{n,D}^i \ge A\}   (19)

where A is chosen as A = (1 − α)/α such that PFA(τ_D^i(A)) ≤ α.

Note that the observations are transmitted via the inter-sensor communications, in contrast to the test statistics being transmitted in consensus + innovations-based distributed detection algorithms, which are widely adopted, such
LI et al.: DISTRIBUTED QUICKEST DETECTION IN SENSOR NETWORKS VIA TWO-LAYER LARGE DEVIATION ANALYSIS 935

as in [40] and [41]. It is conjectured that the consensus + following probability as:
innovations-based distributed algorithms are more efficient to

fuse the information, since the test statistic contains more Pr Oin = ν = qin (ν) ν ∈ 0, 1, . . . , 2N − 1 . (22)
information than the raw observations. However, the consensus
+ innovations-based strategy cannot be applied in our dis- B. First-Layer Large Deviation Analysis
tributed change detection problem. In traditional distributed To perform large deviation analysis, we first need to
detection problems, the consensus + innovations-based strat- interpret the stopping time τDi (A) as a form of random walk
egy is inspired by investigating the recursion form of the crossing a threshold plus a nonlinear term [44]. To this end,
test statistic in the centralized algorithm [41], where the test the stopping time τDi (A) could be rewritten as
statistic is updated by linearly combining the previous test
statistic and the current innovation. Then, in the consensus τDi (A) = inf{n ≥ 1 : Wn (ρ) + ln ≥ log(A/ρ)} (23)
+ innovations-based distributed algorithms, at the consensus where Wn (ρ) = Zn + n| log(1 − ρ)| is a random walk with
step, the test statistics are transmitted and linearly combined  
j j
at the receiver node. The linearly combined test statistic is  n  f0 Xk
the counterpart of the previous test statistic in centralized Zn = log   (24)
j j
algorithm. However, in our change detection problem, the k=1 j∈Oi f1 Xk
k
recursion form of the test statistic shown in (4) shows that the
and
update rule of the test statistic log n is not simply linearly ⎧  ⎫
combining the previous test statistic and the current innovation, ⎨ 
n−1 k f j Xsj ⎬
0
due to the factor log(n−1 + ρ). ln = log 1 + (1 − ρ)k   . (25)
⎩ j j ⎭
Now, we describe the observation accumulation process to k=1 s=1 j∈Os 1 Xs
obtain $O_n^i$. Let $s_n^m = [s_n^m(1), \ldots, s_n^m(N)]$, with element $s_n^m(i) \in \{1, \ldots, N\}$ indexing the observation $X_n$ at sensor $i$ just after the $m$th round of communication in the observation time period $n$. The initial state is $s_n^0(i) = i$ at each sensor $i$, which means that at the beginning of the time slot $n$, each sensor $i$ only has its own observation $X_n^i$. When the communication starts, by following the communication model $A(m)$, the observations $\{X_n^i\}_{i \in \{1,\ldots,N\}}$ travel across the network in the following way:

$$s_n^m = A(m)\, s_n^{m-1}, \quad 1 \le m \le M. \tag{20}$$

During these $M$ rounds of inter-sensor communications until the end of the time period $n$, each sensor stores observations exchanged from other sensors. Then, at the end of the time period $n$, observations from other sensors are accumulated at sensor $i$, and the set of sensors whose observations are available at sensor $i$ is denoted by

$$O_n^i = \bigcup_{m=0}^{M} s_n^m(i). \tag{21}$$

This observation accumulation process terminates at the end of the time period $n$. Then, a similar observation accumulation process repeats during the time period $n+1$, which is independent of the previous process. Therefore, the sequence $\{O_n^i\}$, as the set denoting the observation indices available at sensor $i$ at the end of the $n$th period, is an i.i.d. process.

To better describe this paper in the sequel, we introduce some notation here. Let $\Omega$ denote the power set of the node indices $\{1, \ldots, N\}$, where the elements of $\Omega$ are indexed by $\nu$, with $\nu \in \{0, 1, \ldots, 2^N - 1\}$. We use $\Omega_0$ to denote the null set and $\Omega_{2^N-1}$ to denote the whole set of node indices. For technical convenience, we interpret the sensors in the set $\Omega_\nu$ indexed by $\nu$ to be arranged in ascending order, with $j_1$ denoting the first one and $j_{|\Omega_\nu|}$ denoting the last one, i.e., $\Omega_\nu = \{j_1, \ldots, j_{|\Omega_\nu|}\}$. Therefore, the set $O_n^i$, denoting the nodes whose observations are available at node $i$ after the observation accumulation process, is a random variable taking values from $\Omega$. We denote the

Specifically, $W_n(\rho)$ is a random walk with mean

$$\mathbb{E}_1\{W_n(\rho)\} = n \sum_{\nu=1}^{2^N-1} \bar{q}_\gamma^i(\nu) \sum_{j \in \Omega_\nu} D\left(f_1^j, f_0^j\right) + n\,|\log(1-\rho)| \tag{26}$$

where $\bar{q}_\gamma^i(\nu)$ is the probability defined as

$$\bar{q}_\gamma^i(\nu) = \Pr\left\{O_\gamma^i = \Omega_\nu\right\}, \quad \nu \in \{0, 1, \ldots, 2^N - 1\} \tag{27}$$

in which $O_\gamma^i$, a random variable taking values from $\Omega$, denotes the set of nodes whose observations are available at node $i$ after $\gamma$ rounds of communications, and $\gamma$ is the mean value of the number of communication rounds. Then, based on the above random walk interpretation for the stopping time, we have the following theorem regarding the relation between PFA and CADD in the proposed distributed change detection scheme.

Theorem 2: The probability of false alarm ($\mathrm{PFA}(\tau_D^i)$), with the stopping rule (19) in the distributed change detection algorithm with the parameter $\gamma$ as the averaged number of inter-sensor communications, satisfies the large deviation principle in the asymptotic sense with respect to increasing conditional ADD ($\mathrm{CADD}_1(\tau_D^i)$), i.e.,

$$\lim_{\mathrm{CADD}_1(\tau_D^i) \to \infty} \frac{1}{\mathrm{CADD}_1(\tau_D^i)} \log \mathrm{PFA}\left(\tau_D^i\right) = -\left(D_\gamma^i + |\log(1-\rho)|\right) \tag{28}$$

where $D_\gamma^i = \sum_{\nu=1}^{2^N-1} \bar{q}_\gamma^i(\nu) \sum_{j \in \Omega_\nu} D(f_1^j, f_0^j)$, and $D_\gamma^i + |\log(1-\rho)|$ is the large deviation decay rate of PFA. We call $D_\gamma^i$ the distributed Kullback–Leibler information number.

The proof is shown in Appendix B.

Remark 3: Theorem 2 shows that $D_\gamma^i$, whose function is similar to $D$ in the centralized scheme and $D(f_1^i, f_0^i)$ in the isolated scheme, is a crucial factor determining the performance of the distributed change detection algorithm.
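The propagation model (20)–(21) and the probabilities $\bar{q}_\gamma^i(\nu)$ in (27) can be simulated directly. The sketch below (Python) assumes one concrete instance of $A(m)$ — a uniformly random pairwise swap, which is an assumption for illustration, not the paper's $A(m)$ — and, for simplicity, a fixed number of rounds equal to the mean $\gamma$. It estimates the accumulated sets $O_\gamma^i$ by Monte Carlo and forms the weighted sum defining $D_\gamma^i$ in (28), using the Gaussian Kullback–Leibler numbers $\mu^2/2$ of the five-node example from Section V:

```python
import random

def gossip_round(s, rng):
    """One communication round: a uniformly random pair (i, j) swaps its
    current observation indices -- a hypothetical choice of A(m), i.e. a
    permutation acting on the index vector s (cf. s_n^m = A(m) s_n^{m-1})."""
    i, j = rng.sample(range(len(s)), 2)
    s[i], s[j] = s[j], s[i]
    return s

def accumulated_sets(N, M, rng):
    """O_n^i = union over m of s_n^m(i): indices of the observations that
    have visited sensor i within M communication rounds."""
    s = list(range(N))                  # initial state s_n^0(i) = i
    O = [{i} for i in range(N)]
    for _ in range(M):
        s = gossip_round(s, rng)
        for i in range(N):
            O[i].add(s[i])
    return O

# Monte Carlo estimate of q-bar_gamma^i(nu) for sensor i = 0 and of the
# distributed KL number D_gamma^i, using the per-sensor KL numbers
# mu^2/2 for unit-variance mean shifts mu = 0.1*(i+1) (Section V's setup).
rng = random.Random(1)
N, gamma, runs = 5, 6, 4000
kl = [0.5 * (0.1 * (i + 1)) ** 2 for i in range(N)]
counts = {}
for _ in range(runs):
    O0 = frozenset(accumulated_sets(N, gamma, rng)[0])
    counts[O0] = counts.get(O0, 0) + 1
D_dist = sum(c / runs * sum(kl[j] for j in subset) for subset, c in counts.items())
D_central = sum(kl)
# sandwich from Remark 3: D(f_1^i, f_0^i) <= D_gamma^i <= D
assert kl[0] <= D_dist <= D_central
print(f"estimated D_gamma^i = {D_dist:.4f}, centralized D = {D_central:.4f}")
```

Under this assumed swap protocol the estimate sits strictly between the isolated number $D(f_1^i, f_0^i)$ and the centralized $D$, as Remark 3 predicts.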

The physical meaning of $D_\gamma^i$ is explained as follows. Due to the observation propagation process, observations and the corresponding log-likelihood ratios from other sensors are available at each sensor; to some extent, $D_\gamma^i$ can be considered as an accumulated form of this information. In particular, $D_\gamma^i$ is an averaged partial sum of the Kullback–Leibler information numbers $D(f_1^i, f_0^i)$, $i = 1, \ldots, N$, compared to $D$ as the total sum. Also, from the mathematical form of $D_\gamma^i$, we see that $D(f_1^i, f_0^i) \le D_\gamma^i \le D$, where the case of $\bar{q}_\gamma^i(1) = 1$ corresponds to the lower bound $D_\gamma^i = D(f_1^i, f_0^i)$, while the case of $\bar{q}_\gamma^i(2^N - 1) = 1$ corresponds to the upper bound $D_\gamma^i = D$. Since $D(f_1^i, f_0^i) \le D_\gamma^i \le D$ and $D_\gamma^i$ determines the performance of the change detection algorithm, the above analysis proves that the distributed algorithm outperforms the isolated algorithm, but falls behind the centralized one.

C. Second-Layer Large Deviation Analysis

Since $D_\gamma^i$ has been shown to be a crucial factor in the large deviation analysis of the last section, in this section we focus on studying the behavior of $D_\gamma^i$. As this still lies within the scope of large deviation analysis, we call it the second-layer large deviation analysis, while the analysis in the last section is called the first-layer large deviation analysis.

As we cannot obtain a closed form for $D_\gamma^i$ due to the complicated probabilities involved, we discuss its asymptotic behavior when $\gamma \to \infty$. To this end, we first study the behavior of $\bar{q}_\gamma^i(\nu)$, defined below (26), when $\gamma \to \infty$, by employing the concept of hitting times in Markov chains.

For each $\nu \ne 2^N - 1$, without loss of generality, we assume that $\nu$ corresponds to the index of the sensor subset $\{i_1, i_2, \ldots, i_m\}$, with $\{i'_1, i'_2, \ldots, i'_{N-m}\}$ as the complementary subset, where $m \ge 1$ due to the fact that at least its own observation is available at each sensor. Let $T_j$ denote the hitting time, starting from state (index of sensor) $j$, to hit another specific state $i$ in the Markov chain whose transition probability matrix is $\bar{A}$ defined in (16). From [49, Th. 7.26], since the transition probability matrix $\bar{A}$ is irreducible, there exist constants $0 < \alpha < 1$ and $0 < L < \infty$ such that $P(T_j > L) \le \alpha$, $\forall j$, and more generally,

$$P\left(T_j > kL\right) \le \alpha^k, \quad k = 0, 1, 2, \ldots. \tag{29}$$

Also, there exists a constant $0 < \beta < 1$ such that $P(T_j > L) \ge \beta$, $\forall j$, and more generally,

$$P\left(T_j > kL\right) \ge \beta^k, \quad k = 0, 1, 2, \ldots. \tag{30}$$

Based on the above results on hitting times in Markov chains, we first present the following large deviation related theorem on the asymptotic behavior of $\sum_{v=0}^{2^N-2} \bar{q}_\gamma^i(v)$, as $\gamma \to \infty$. Since $\nu \in \{0, 1, \ldots, 2^N - 1\}$ according to (27), we have $\sum_{v=0}^{2^N-2} \bar{q}_\gamma^i(v) = 1 - \bar{q}_\gamma^i(2^N - 1)$, where $\bar{q}_\gamma^i(2^N - 1)$ denotes the probability that the observations from all sensors are available at sensor $i$, i.e., $\sum_{v=0}^{2^N-2} \bar{q}_\gamma^i(v)$ is the probability of the event that not all observations are available at sensor $i$.

Theorem 3: As $\gamma \to \infty$, the probability $\sum_{v=0}^{2^N-2} \bar{q}_\gamma^i(v)$ has the following large deviation upper and lower bounds:

$$\frac{\ln \beta}{L} \le \lim_{\gamma \to \infty} \frac{1}{\gamma} \ln \sum_{v=0}^{2^N-2} \bar{q}_\gamma^i(v) \le \frac{\ln \alpha}{L} \tag{31}$$

where $\alpha$, $\beta$, and $L$ are the parameters in (29) and (30).

The proof is presented in Appendix C.

Remark 4: Since $\sum_{v=0}^{2^N-2} \bar{q}_\gamma^i(v)$ represents the probability of the event that not all observations are available at sensor $i$, Theorem 3 implies that this event is a rare event and that its probability decays exponentially fast to zero as $\gamma \to \infty$.

Based on Theorem 3, we further have the following theorem regarding the behavior of the distributed Kullback–Leibler information number $D_\gamma^i$ defined in Theorem 2.

Theorem 4: As $\gamma \to \infty$, we have the following upper and lower bounds for the value of $D_\gamma^i$:

$$D - \max_{j \in \{1,\ldots,N\} \setminus i} D\left(f_1^j, f_0^j\right) e^{\frac{\ln \alpha}{L}\gamma} \le D_\gamma^i \le D - \min_{j \in \{1,\ldots,N\} \setminus i} D\left(f_1^j, f_0^j\right) e^{\frac{\ln \beta}{L}\gamma} \tag{32}$$

where $D(f_1^j, f_0^j)$ is the Kullback–Leibler information number defined in (10), $D$ is the centralized Kullback–Leibler information number defined in Theorem 1, and $\ln \alpha / L$ and $\ln \beta / L$ are the upper and lower bounds derived in Theorem 3.

The proof is presented in Appendix D.

Remark 5: Theorem 4 implies that $D_\gamma^i$ converges to $D$ exponentially fast as $\gamma \to \infty$. Since $D_\gamma^i$ and $D$ determine the performance of the distributed and centralized algorithms, respectively, this theorem also implies that the performance of the proposed distributed algorithm converges to that of the centralized one at an exponentially fast rate.

V. SIMULATION RESULTS

In this section, we simulate the proposed distributed algorithm with a network of five nodes taking observations. We consider a Bayesian setup and set the prior distribution of the change-point time as a geometric distribution with parameter $\rho = 0.1$. Before the change happens, the observation at each node follows a Gaussian distribution $\mathcal{N}(0, 1)$; after the change happens, the observation at node $i$, $i = 1, \ldots, 5$, follows another Gaussian distribution $\mathcal{N}(0.1 \times i, 1)$. Note that here we consider a setup in which observations at different nodes have different post-change distributions, which mimics the more general situation where different nodes may suffer different levels of impact from the same physical change. For example, a physical event, such as the leakage of a chemical gas or an abrupt increase in temperature, would impact different nodes to different degrees, due to their various locations. The nodes near the origin of the physical event may suffer a more serious influence, which is reflected by a larger mean in the post-change distribution; the nodes far away from the origin may suffer a less serious influence, which is reflected by a smaller mean in the post-change distribution.
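For this Gaussian setup, the per-sensor Kullback–Leibler information numbers have the closed form $D(\mathcal{N}(\mu,1)\,\|\,\mathcal{N}(0,1)) = \mu^2/2$, so the quantities entering Theorems 1–4 can be computed directly. A short sketch (Python; the decay factor `r` below is an assumed stand-in for the $e^{(\ln\alpha/L)\gamma}$ term of Theorem 3, not a value from the paper):

```python
# KL divergence D(N(mu,1) || N(0,1)) = mu^2 / 2: the per-sensor
# Kullback-Leibler information number for a unit-variance mean shift.
mu = [0.1 * i for i in range(1, 6)]      # post-change means of the 5 nodes
kl = [m * m / 2.0 for m in mu]
D_centralized = sum(kl)                  # centralized D of Theorem 1
print("per-sensor KL numbers:", [round(k, 4) for k in kl])
print("centralized D =", round(D_centralized, 4))

# Theorem 4's sandwich for sensor i = 1 (index 0), with an assumed
# rare-event factor r in place of e^{(ln alpha / L) gamma}:
r = 0.05
others = kl[1:]                          # j in {1,...,N} \ i
lower = D_centralized - max(others) * r
upper = D_centralized - min(others) * r
assert lower <= upper
print("Theorem 4 bracket (assumed r = 0.05):",
      round(lower, 4), "<= D_gamma^i <=", round(upper, 4))
```

As the remark after Theorem 4 suggests, shrinking `r` (i.e., increasing $\gamma$) drives both ends of the bracket toward the centralized $D$.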
LI et al.: DISTRIBUTED QUICKEST DETECTION IN SENSOR NETWORKS VIA TWO-LAYER LARGE DEVIATION ANALYSIS 937

Fig. 2. First-layer large deviation analysis: comparison of decay rates in the distributed, centralized, and isolated schemes with simulation (dashed line) versus analytical results (solid line).

In Fig. 2, we show the simulated and analytical decay rates corresponding to the first-layer large deviation analysis, and also compare the performance of the distributed scheme versus the centralized and isolated ones. In the simulation, we set $\gamma$ as 6, recalling that $\gamma$ is the mean value of the number of communication rounds within each sampling time period. In Fig. 2, the dashed curves denote the simulated decay rates, and the solid lines present the analytical decay rates in Theorem 1 for the centralized scheme, Corollary 1 for the isolated scheme, and Theorem 2 for the distributed scheme, respectively. A higher decay rate implies a lower PFA under the same CADD, which means better performance. Therefore, from the simulation results of the decay rates, we see that the distributed scheme outperforms the isolated one, but performs worse than the centralized one, which conforms to the analytical result from Theorem 2. The gap between the simulated curve and the analytical line is due to the fact that the false alarm event becomes very rare as the CADD increases. When the CADD grows even larger, an extremely large number of simulation rounds is required to observe the rare false alarm event, which is very difficult for the Monte-Carlo simulation. In comparison, the analytical result is derived when the CADD goes to infinity.

In Fig. 3, we show the simulation results of the value $(1/\gamma) \ln \sum_{v=0}^{2^N-2} \bar{q}_\gamma^i(v)$, denoting the decay rate of the rare event that not all observations are available at sensor $i$, as the parameter $\gamma$ increases, which is the second-layer large deviation analysis shown in Theorem 3. We also present the large deviation lower and upper bounds in Fig. 3, from which we see that the simulated decay rate lies between the large deviation lower and upper bounds, and that the bounds are relatively tight, which verifies the analytical result in Theorem 3. Here, we also present the lower bound $\ln \beta / L$ and the upper bound $\ln \alpha / L$ of Theorem 3, which are shown in Fig. 3. Recall that $T_j$ denotes the hitting time, starting from state (index of sensor) $j$, to hit another specific state $i$ in the Markov chain with the transition probability matrix $\bar{A}$. Then, we have

$$P\left(T_j > L\right) = \sum_{i_1, \ldots, i_L \ne i} \bar{A}_{j i_1} \bar{A}_{i_1 i_2} \bar{A}_{i_2 i_3} \cdots \bar{A}_{i_{L-1} i_L}. \tag{33}$$

Fig. 3. Second-layer large deviation analysis in Theorem 3: simulated decay rate (dashed line) of the probability that not all observations are available at a sensor, and the corresponding large deviation upper and lower bounds (solid lines).

Fig. 4. Calculation of the lower bound $\ln \beta / L$ and the upper bound $\ln \alpha / L$ in Theorem 3 with varying $L$.

Recall that we intend to find $\alpha$ such that $P(T_j > L) \le \alpha$, $\forall j$. Thus, we can set $\alpha = \max_j P(T_j > L)$. In order to find $\beta$ such that $P(T_j > L) \ge \beta$, $\forall j$, we can set $\beta = \min_j P(T_j > L)$. Then, we are ready to calculate $\ln \alpha / L$ and $\ln \beta / L$. To this end, the selection of $L$ is a critical step, as both $\alpha$ and $\beta$ are calculated based on the selected $L$. Here, we show the calculation of $\ln \alpha / L$ and $\ln \beta / L$ with different $L$ values in Fig. 4. From Fig. 4, we observe an interesting phenomenon: the two bounds appear to converge as $L$ increases, although we do not provide a mathematical proof of this result here. This observation could imply some potential properties of hitting times in Markov chains; further analytical exploration based on this observation is left for our future work. Note that the upper and lower bounds in Fig. 3 are set as the values calculated with $L = 15$.

In Fig. 5, we show the simulation results for the distributed Kullback–Leibler information $D_\gamma^i$, the value of the centralized Kullback–Leibler information $D$, and the calculated upper and lower bounds presented in Theorem 4. From Fig. 5, we see that the upper bound is very tight, while the lower bound is relatively loose. However, the range of the y-axis in this figure is very small, from 0.3765 to 0.3810, so both the lower and upper bounds are tight in this sense.
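The tail probabilities in (33), and hence $\alpha = \max_j P(T_j > L)$, $\beta = \min_j P(T_j > L)$ and the Theorem 3 bounds $\ln\alpha/L$ and $\ln\beta/L$, can be computed without enumerating paths by propagating mass through the sub-chain that avoids state $i$. A sketch (Python; the 5-state lazy-ring transition matrix is an assumed example for illustration, not the paper's $\bar{A}$ from (16)):

```python
import math

N, target = 5, 0
# assumed doubly stochastic "lazy ring": stay w.p. 1/2, else step to a neighbor
A_bar = [[0.5 if r == c else (0.25 if (r - c) % N in (1, N - 1) else 0.0)
          for c in range(N)] for r in range(N)]

states = [s for s in range(N) if s != target]

def tail_prob(j, L):
    """P(T_j > L) per (33): total mass of length-L paths from j that never
    visit the target state, i.e. e_j^T Q^L 1 with Q the taboo sub-matrix."""
    prob = {s: 1.0 if s == j else 0.0 for s in states}
    for _ in range(L):
        prob = {c: sum(prob[r] * A_bar[r][c] for r in states) for c in states}
    return sum(prob.values())

for L in (5, 10, 15):
    tails = [tail_prob(j, L) for j in states]
    alpha, beta = max(tails), min(tails)
    print(f"L={L:2d}: ln(alpha)/L = {math.log(alpha)/L:.4f}, "
          f"ln(beta)/L = {math.log(beta)/L:.4f}")
```

For an irreducible chain both bounds are negative and, as in Fig. 4's observation, they tend to tighten toward each other as $L$ grows.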

Fig. 5. Simulated distributed Kullback–Leibler information (dashed line), centralized Kullback–Leibler information (dashed-dotted line), and the corresponding analytical upper and lower bounds (solid lines) in Theorem 4.

We also see that the distributed Kullback–Leibler information $D_\gamma^i$ converges to the centralized Kullback–Leibler information $D$ as $\gamma$ increases, which implies that the performance of the distributed change detection scheme converges to that of the centralized one, since $D_\gamma^i$ and $D$ determine the performance of the distributed and centralized schemes, respectively.

VI. CONCLUSION

We have proposed a distributed Bayesian quickest detection scheme, with a random gossip-type protocol to realize inter-sensor communications. With this communication structure, we modeled the information propagation procedure in the network as a Markov process. We analyzed the performance of the proposed scheme via a two-layer large deviation analysis. The first-layer analysis shows that the PFA decays to zero at an exponentially fast rate as the CADD increases, and also implies that the distributed Kullback–Leibler information number is a key factor determining the detection performance. The second-layer analysis shows that the probability of the rare event that not all observations are available at a sensor decays exponentially fast to zero as the averaged number of communications increases, where the large deviation upper and lower bounds of this decay rate are also derived. Finally, it is shown that the performance of the distributed algorithm converges exponentially fast to that of the centralized optimal one.

APPENDIX A
PROOF OF THEOREM 1

Recall [44, Th. 5], which establishes the following results:

$$\mathrm{PFA}\left(\tau^*\right) = \frac{\zeta(\rho, D)}{A}(1 + o(1)), \quad \text{as } A \to \infty \tag{34}$$

$$\mathbb{E}_1\left(\tau^*\right) = \frac{1}{D + |\log(1-\rho)|}\left[\log\frac{A}{\rho} - \xi(\rho, D) + o(1)\right], \quad \text{as } A \to \infty \tag{35}$$

where $D = \sum_{i=1}^{N} D(f_1^i, f_0^i)$, and both $\zeta(\rho, D)$ and $\xi(\rho, D)$ are functions of $\rho$ and $D$. Since $\rho$ and $D$ are constants once the system parameters are set, $\zeta(\rho, D)$ and $\xi(\rho, D)$ are also system constants.

Since $\mathrm{CADD}_1(\tau^*) = \mathbb{E}_1(\tau^* - 1) = \mathbb{E}_1(\tau^*) - 1$, by combining (34) and (35), we have

$$\log\frac{\mathrm{PFA}(\tau^*)\rho}{\zeta(\rho, D)(1 + o(1))} = -\mathrm{CADD}_1\left(\tau^*\right)(D + |\log(1-\rho)|) - \xi(\rho, D) + (o(1) - 1)(D + |\log(1-\rho)|). \tag{36}$$

Then, after dividing both sides of (36) by $\mathrm{CADD}_1(\tau^*)$ and taking the limit as $\mathrm{CADD}_1(\tau^*) \to \infty$, we have

$$\lim_{\mathrm{CADD}_1(\tau^*) \to \infty} \frac{1}{\mathrm{CADD}_1(\tau^*)} \log \mathrm{PFA}\left(\tau^*\right) = -(D + |\log(1-\rho)|). \tag{37}$$

APPENDIX B
PROOF OF THEOREM 2

The proof adopts the relevant results from the nonlinear renewal theory in [50]. To complete the proof, we first present two preliminary results regarding the proposed distributed algorithm, as follows:

$$\mathrm{PFA}\left(\tau_D^i\right) = \frac{\zeta\left(\rho, D_\gamma^i\right)}{A}(1 + o(1)), \quad \text{as } A \to \infty \tag{38}$$

$$\mathbb{E}_1\left(\tau_D^i\right) = \frac{1}{D_\gamma^i + |\log(1-\rho)|}\left[\log\frac{A}{\rho} - \xi\left(\rho, D_\gamma^i\right) + o(1)\right], \quad \text{as } A \to \infty \tag{39}$$

where $D_\gamma^i$ is defined below (28), denoting the averaged value of the Kullback–Leibler information number in the distributed algorithm, and $\zeta(\rho, D_\gamma^i)$ and $\xi(\rho, D_\gamma^i)$ are functions of the parameters $\rho$ and $D_\gamma^i$.

Note that the above results for the distributed algorithm are similar to [44, Th. 5], which is related to the performance of the isolated algorithm. The difference is that the averaged partial sum of the Kullback–Leibler numbers is involved in the distributed algorithm, due to the observation accumulation at each node. In the sequel, we provide the proof flow for these two results.

First, we verify (38). By recalling $p_n$ defined in (5) and $\Lambda_n = p_n/(1-p_n)$, we have

$$\mathrm{PFA}\left(\tau_D^i\right) = \mathbb{E}^\pi\left[1 - p_{\tau_D^i}\right] = \mathbb{E}^\pi\left[\left(1 + \Lambda_{\tau_D^i}\right)^{-1}\right] = \mathbb{E}^\pi\left[\left(1 + A\,\frac{\Lambda_{\tau_D^i}}{A}\right)^{-1}\right] = \frac{1}{A}\,\mathbb{E}^\pi\left[e^{-\omega_a}\right](1 + o(1)), \quad A \to \infty \tag{40}$$

where $\omega_a = \log \Lambda_{\tau_D^i} - a$ and $a = \log(A/\rho)$. For $\mathbb{E}^\pi(e^{-\omega_a})$, we have

$$\mathbb{E}^\pi\left[e^{-\omega_a}\right] = \mathbb{E}^\pi\left[e^{-\omega_a} \mid \tau_D^i \ge \lambda\right]\left(1 - \mathrm{PFA}\left(\tau_D^i\right)\right) + \mathbb{E}^\pi\left[e^{-\omega_a} \mid \tau_D^i < \lambda\right]\mathrm{PFA}\left(\tau_D^i\right) = \mathbb{E}^\pi\left[e^{-\omega_a} \mid \tau_D^i \ge \lambda\right] + O\left(A^{-1}\right), \quad A \to \infty \tag{41}$$

which is due to $\mathrm{PFA}(\tau_D^i) \le 1/(1+A) < 1/A$.



Thus, we turn to study $\mathbb{E}^\pi\left[e^{-\omega_a} \mid \tau_D^i \ge \lambda\right]$ as

$$\mathbb{E}^\pi\left[e^{-\omega_a} \mid \tau_D^i \ge \lambda\right] = \sum_{k=1}^{\infty} \mathbb{E}_k\left[e^{-\omega_a} \mid \tau_D^i \ge k\right] P\left(\lambda = k \mid \tau_D^i \ge k\right). \tag{42}$$

For any $1 \le k < \infty$, we have

$$\tau_D^i = \inf\left\{n \ge 1 : W_{n,k}(\rho) + l_{n,k} \ge a\right\} \tag{43}$$

where $W_{n,k}(\rho) = Z_{n,k} + (n-k+1)|\log(1-\rho)|$, $n \ge k$, is a random walk with $\mathbb{E}_k[W_{n,k}(\rho)] = (n-k+1)\left(D_\gamma^i + |\log(1-\rho)|\right)$, and $l_{n,k}$ is a nonlinear term. In $W_{n,k}(\rho)$, we have

$$Z_{n,k} = \sum_{t=k}^{n} \log \prod_{j \in O_t^i} \frac{f_1^j\left(X_t^j\right)}{f_0^j\left(X_t^j\right)}. \tag{44}$$

Then, by applying [50, Th. 4.1], we obtain

$$\lim_{A \to \infty} \mathbb{E}_k\left[e^{-\omega_a} \mid \tau_D^i \ge k\right] = \zeta\left(\rho, D_\gamma^i\right) \tag{45}$$

where $\zeta(\rho, D_\gamma^i)$ is a function of the parameters $\rho$ and $D_\gamma^i$. We also have

$$\lim_{A \to \infty} P\left(\lambda = k \mid \tau_D^i \ge k\right) = \lim_{A \to \infty} \frac{\pi_k P_k\left(\tau_D^i \ge k \mid \lambda = k\right)}{P^\pi\left(\tau_D^i \ge k\right)} = \pi_k. \tag{46}$$

Therefore, by plugging (45) and (46) into (42), we have

$$\lim_{A \to \infty} \mathbb{E}^\pi\left[e^{-\omega_a} \mid \tau_D^i \ge \lambda\right] = \zeta\left(\rho, D_\gamma^i\right). \tag{47}$$

Finally, by combining (40), (41), and (47), we prove (38).

The proof of (39) depends on [50, Th. 4.5]. In order to use this theorem, the validity of the following three conditions needs to be checked:

$$\sum_{n=1}^{\infty} P_1\{l_n \le -\theta n\} < \infty, \quad \text{for some } 0 < \theta < D_D^i$$

$$\max_{0 \le k \le n} |l_{n+k}|,\ n \ge 1,\ \text{are } P_1\text{-uniformly integrable}$$

$$\lim_{A \to \infty} a P_1\left\{\tau_D^i(A) \le \varepsilon a\left(D_D^i + |\log(1-\rho)|\right)^{-1}\right\} = 0$$

for some $0 < \varepsilon < 1$, where $a = \log(A/\rho)$, and where $l_n$ is defined in (25).

It is easy to check that the first condition is valid, as $l_n \ge 0$. For the second condition, we have $\max_{0 \le k \le n}|l_{n+k}| = l_{2n}$, since $l_n$, $n = 1, 2, \ldots$, are nondecreasing. Thus, to check that the second condition is valid, we only need to show that $l_n$, $n = 1, 2, \ldots$, are $P_1$-uniformly integrable. To this end, we have that $l_n$ converges almost surely, as $n \to \infty$, to the following random variable:

$$l = \log\left\{1 + \sum_{k=1}^{\infty} (1-\rho)^k \prod_{s=1}^{k} \prod_{j \in O_s^i} \frac{f_0^j\left(X_s^j\right)}{f_1^j\left(X_s^j\right)}\right\}. \tag{48}$$

By taking the expectation, we have

$$\mathbb{E}_1(l) \le \log\left(1 + \sum_{k=1}^{\infty} (1-\rho)^k\right) = \log\frac{1}{\rho}. \tag{49}$$

Since $l_n$, $n = 1, 2, \ldots$, are nondecreasing, we have $l_n \le l$. Then, we have $\mathbb{E}_1(l_n) < \infty$, implying the uniform integrability. Therefore, the second condition is satisfied.

Now, we intend to show the validity of the third condition. According to [44, Lemma 1], we have

$$P_1\left\{\tau_D^i(A) \le 1 + (1-\epsilon)L_a\right\} \le e^{-\varphi_\epsilon a} + \beta(\epsilon, A) \tag{50}$$

where $L_a = a\left(D_D^i + |\log(1-\rho)|\right)^{-1}$, $\varphi_\epsilon > 0$ for all $0 < \epsilon < 1$, and $\beta(\epsilon, A) = P_1\left\{\max_{1 \le n < K_{\epsilon,A}} Z_n \ge (1+\epsilon) D_D^i K_{\epsilon,A}\right\}$, in which $K_{\epsilon,A} = (1-\epsilon)L_a$ and $Z_n$ is defined in (24). The term $e^{-\varphi_\epsilon a}$ on the right-hand side is $o(1/a)$. Thus, in order to show

$$\lim_{A \to \infty} a P_1\left\{\tau_D^i(A) \le 1 + (1-\epsilon)L_a\right\} = 0 \tag{51}$$

we only need to prove that the other term $\beta(\epsilon, A)$ is also $o(1/a)$, since $a = \log(A/\rho)$. To this end, by applying [51, Th. 1], for $\nu > 0$ and $r \ge 0$, we have

$$\sum_{n=1}^{\infty} P_1\left\{\max_{1 \le k \le n}\left(Z_k - D_D^i k\right) \ge \nu n\right\} \le C_r\left[\mathbb{E}_1\left(\left(Z_1 - D_D^i\right)^{+}\right)^{r+1} + \left(\mathbb{E}_1\left(Z_1 - D_D^i\right)^{2}\right)^{r}\right] \tag{52}$$

where $C_r$ is a constant. When $r = 1$, the finiteness of the right-hand side of the above inequality implies that the left-hand side is also finite. Thus, we obtain $P_1\left\{\max_{1 \le k \le n}\left(Z_k - D_D^i k\right) \ge \nu n\right\} = o(1/n)$.

Then, with the fact that

$$\beta(\epsilon, A) \le P_1\left\{\max_{1 \le n < K_{\epsilon,A}}\left(Z_n - D_D^i n\right) \ge \epsilon D_D^i K_{\epsilon,A}\right\} \tag{53}$$

we have $\beta(\epsilon, A) = o(1/a)$. Therefore,

$$\lim_{A \to \infty} a P_1\left\{\tau_D^i(A) \le 1 + (1-\epsilon)L_a\right\} = 0. \tag{54}$$

By taking $\varepsilon = 1 - \epsilon$, finally, we have

$$\lim_{A \to \infty} a P_1\left\{\tau_D^i(A) \le \varepsilon L_a\right\} \le \lim_{A \to \infty} a P_1\left\{\tau_D^i(A) \le 1 + (1-\epsilon)L_a\right\} = 0. \tag{55}$$

Hence, the third condition is satisfied. Therefore, the conditions of [50, Th. 4.5] are satisfied, and this theorem shows that (39) is valid.

Then, with (38) and (39), by taking the same proof method as in Theorem 1, we have

$$\lim_{\mathrm{CADD}_1\left(\tau_D^i\right) \to \infty} \frac{1}{\mathrm{CADD}_1\left(\tau_D^i\right)} \log \mathrm{PFA}\left(\tau_D^i\right) = -\left(D_\gamma^i + |\log(1-\rho)|\right). \tag{56}$$

APPENDIX C
PROOF OF THEOREM 3

Recall that $\nu$ corresponds to the index of the sensor subset $\{i_1, i_2, \ldots, i_m\}$, with $\{i'_1, i'_2, \ldots, i'_{N-m}\}$ as the complementary subset, and $T_j$ denotes the hitting time, starting from state

(index of sensor) $j$, to hit another specific state $i$ in the Markov chain. Then, the probability $\bar{q}_\gamma^i(\nu)$ can be represented as

$$\bar{q}_\gamma^i(\nu) = \Pr\left\{T_{i'_1} > \gamma, \ldots, T_{i'_{N-m}} > \gamma,\ T_{i_1} \le \gamma, \ldots, T_{i_m} \le \gamma\right\} \le \Pr\left\{T_{i'_1} > \gamma, \ldots, T_{i'_{N-m}} > \gamma\right\} \le \min_{1 \le n \le N-m} \Pr\left\{T_{i'_n} > \gamma\right\}. \tag{57}$$

Thus, we have

$$\lim_{\gamma \to \infty} \frac{1}{\gamma} \ln \bar{q}_\gamma^i(\nu) \le \lim_{\gamma \to \infty} \frac{1}{\gamma} \ln \min_{1 \le n \le N-m} P\left(T_{i'_n} > \gamma\right) \le \lim_{\gamma \to \infty} \frac{1}{\gamma} \ln \alpha^{\lfloor \gamma/L \rfloor} = \frac{\ln \alpha}{L} \tag{58}$$

where the second inequality is due to (29). For $\bar{q}_\gamma^i(\nu)$, we also have

$$\bar{q}_\gamma^i(\nu) = \Pr\left\{T_{i'_1} > \gamma, \ldots, T_{i'_{N-m}} > \gamma,\ T_{i_1} \le \gamma, \ldots, T_{i_m} \le \gamma\right\} \ge \Pr\left\{T_{i'_1} > \gamma\right\} \cdots \Pr\left\{T_{i'_{N-m}} > \gamma\right\} \Pr\left\{T_{i_1} \le \gamma\right\} \cdots \Pr\left\{T_{i_m} \le \gamma\right\}. \tag{59}$$

This leads to

$$\lim_{\gamma \to \infty} \frac{1}{\gamma} \ln \bar{q}_\gamma^i(\nu) \ge \lim_{\gamma \to \infty} \frac{1}{\gamma} \ln\left[\left(\beta^{\lfloor \gamma/L \rfloor}\right)^{N-m}\left(1 - \alpha^{\lfloor \gamma/L \rfloor}\right)^{m}\right] = (N-m)\frac{\ln \beta}{L} \tag{60}$$

where the first inequality is due to (29) and (30), and the last equality is derived with $0 < \alpha < 1$.

By combining (58) and (60), we have

$$(N-m)\frac{\ln \beta}{L} \le \lim_{\gamma \to \infty} \frac{1}{\gamma} \ln \bar{q}_\gamma^i(\nu) \le \frac{\ln \alpha}{L}. \tag{61}$$

Then, we obtain

$$\lim_{\gamma \to \infty} \frac{1}{\gamma} \ln \sum_{v=0}^{2^N-2} \bar{q}_\gamma^i(v) \le \lim_{\gamma \to \infty} \frac{1}{\gamma} \ln\left[\left(2^N - 1\right)\max_v \bar{q}_\gamma^i(v)\right] = \lim_{\gamma \to \infty} \frac{1}{\gamma} \ln \max_v \bar{q}_\gamma^i(v) \le \frac{\ln \alpha}{L} \tag{62}$$

where the last inequality is due to (61). We also have

$$\lim_{\gamma \to \infty} \frac{1}{\gamma} \ln \sum_{v=0}^{2^N-2} \bar{q}_\gamma^i(v) \overset{a}{\ge} \lim_{\gamma \to \infty} \frac{1}{\gamma} \ln \bar{q}_\gamma^i\left(v_p\right) \overset{b}{=} \frac{\ln \beta}{L} \tag{63}$$

where $v_p$ on the right-hand side of inequality $a$ denotes a particular index of the subset of sensors such that $m = N-1$, i.e., $v_p$ is the index of the sensor subset $\{i_1, i_2, \ldots, i_{N-1}\}$, recalling the notation defined at the beginning of this section. Since for $v_p \in \{0, \ldots, 2^N - 2\}$ we have $\sum_{v=0}^{2^N-2} \bar{q}_\gamma^i(v) \ge \bar{q}_\gamma^i(v_p)$, inequality $a$ is established. According to (61), and taking $m = N-1$, we derive the equality $b$ in (63).

By combining (62) and (63), we conclude that

$$\frac{\ln \beta}{L} \le \lim_{\gamma \to \infty} \frac{1}{\gamma} \ln \sum_{v=0}^{2^N-2} \bar{q}_\gamma^i(v) \le \frac{\ln \alpha}{L}. \tag{64}$$

APPENDIX D
PROOF OF THEOREM 4

Recall $D_\gamma^i = \sum_{\nu=1}^{2^N-1} \bar{q}_\gamma^i(\nu) \sum_{j \in \Omega_\nu} D(f_1^j, f_0^j)$ and $D = \sum_{i=1}^{N} D(f_1^i, f_0^i)$. We have

$$D_\gamma^i \overset{a}{=} \bar{q}_\gamma^i\left(2^N - 1\right)D + \sum_{\nu=1}^{2^N-2} \bar{q}_\gamma^i(\nu) \sum_{j \in \Omega_\nu} D\left(f_1^j, f_0^j\right) \overset{b}{=} \left(1 - \sum_{\nu=1}^{2^N-2} \bar{q}_\gamma^i(\nu)\right)D + \sum_{\nu=1}^{2^N-2} \bar{q}_\gamma^i(\nu) \sum_{j \in \Omega_\nu} D\left(f_1^j, f_0^j\right) \tag{65}$$

where equality $a$ is due to the fact that $\Omega_\nu = \{1, \ldots, N\}$ with $\nu = 2^N - 1$, i.e., $\Omega_{2^N-1}$ denotes the set of indices of all sensors, and equality $b$ is based on $\sum_{\nu=1}^{2^N-1} \bar{q}_\gamma^i(\nu) = 1$.

Then, from (65), we have

$$D_\gamma^i \le \left(1 - \sum_{\nu=1}^{2^N-2} \bar{q}_\gamma^i(\nu)\right)D + \sum_{\nu=1}^{2^N-2} \bar{q}_\gamma^i(\nu) \max_{1 \le \nu \le 2^N-2} \sum_{j \in \Omega_\nu} D\left(f_1^j, f_0^j\right) = D - \sum_{\nu=1}^{2^N-2} \bar{q}_\gamma^i(\nu) \min_{j \in \{1,\ldots,N\} \setminus i} D\left(f_1^j, f_0^j\right). \tag{66}$$

We could also obtain

$$D_\gamma^i \ge \left(1 - \sum_{\nu=1}^{2^N-2} \bar{q}_\gamma^i(\nu)\right)D + \sum_{\nu=1}^{2^N-2} \bar{q}_\gamma^i(\nu) \min_{1 \le \nu \le 2^N-2} \sum_{j \in \Omega_\nu} D\left(f_1^j, f_0^j\right) = D - \sum_{\nu=1}^{2^N-2} \bar{q}_\gamma^i(\nu) \max_{j \in \{1,\ldots,N\} \setminus i} D\left(f_1^j, f_0^j\right). \tag{67}$$

According to Theorem 3, as $\gamma \to \infty$, we have

$$e^{\frac{\ln \beta}{L}\gamma} \le \sum_{v=0}^{2^N-2} \bar{q}_\gamma^i(v) \le e^{\frac{\ln \alpha}{L}\gamma}. \tag{68}$$

Then, by combining (66)–(68), as $\gamma \to \infty$, we derive

$$D - \max_{j \in \{1,\ldots,N\} \setminus i} D\left(f_1^j, f_0^j\right) e^{\frac{\ln \alpha}{L}\gamma} \le D_\gamma^i \le D - \min_{j \in \{1,\ldots,N\} \setminus i} D\left(f_1^j, f_0^j\right) e^{\frac{\ln \beta}{L}\gamma}. \tag{69}$$

REFERENCES

[1] D. Li, S. Kar, and S. Cui, "Distributed Bayesian quickest change detection in sensor networks via large deviation analysis," in Proc. 54th Annu. Allerton Conf. Commun. Control Comput. (Allerton), Monticello, IL, USA, Sep. 2016, pp. 1274–1281.
[2] L. Lai, Y. Fan, and H. V. Poor, "Quickest detection in cognitive radio: A sequential change detection framework," in Proc. IEEE Glob. Telecommun. Conf. (GLOBECOM), New Orleans, LA, USA, Nov./Dec. 2008, pp. 1–5.

[3] H. Li, H. Dai, and C. Li, "Collaborative quickest spectrum sensing via random broadcast in cognitive radio systems," IEEE Trans. Wireless Commun., vol. 9, no. 7, pp. 2338–2348, Jul. 2010.
[4] S. Trivedi and R. Chandramouli, "Secret key estimation in sequential steganography," IEEE Trans. Signal Process., vol. 53, no. 2, pp. 746–757, Feb. 2005.
[5] M. Thottan and C. Ji, "Anomaly detection in IP networks," IEEE Trans. Signal Process., vol. 51, no. 8, pp. 2191–2204, Aug. 2003.
[6] A. G. Tartakovsky, B. L. Rozovskii, R. B. Blazek, and H. Kim, "A novel approach to detection of intrusions in computer networks via adaptive sequential and batch-sequential change-point detection methods," IEEE Trans. Signal Process., vol. 54, no. 9, pp. 3372–3382, Sep. 2006.
[7] A. A. Cardenas, S. Radosavac, and J. S. Baras, "Evaluation of detection algorithms for MAC layer misbehavior: Theory and experiments," IEEE/ACM Trans. Netw., vol. 17, no. 2, pp. 605–617, Apr. 2009.
[8] D. Commenges, J. Seal, and F. Pinatel, "Inference about a change point in experimental neurophysiology," Math. Biosci., vol. 80, no. 1, pp. 81–108, Jul. 1986.
[9] M. Frisén, "Optimal sequential surveillance for finance, public health, and other areas," Sequential Anal., vol. 28, no. 3, pp. 310–337, Jul. 2009.
[10] C. Sonesson and D. Bock, "A review and discussion of prospective statistical surveillance in public health," J. Roy. Stat. Soc., vol. 166, no. 1, pp. 5–21, Feb. 2003.
[11] J. A. Rice et al., "Flexible smart sensor framework for autonomous structural health monitoring," Smart Struct. Syst., vol. 6, nos. 5–6, pp. 423–438, May 2010.
[12] A. Mainwaring, D. Culler, J. Polastre, R. Szewczyk, and J. Anderson, "Wireless sensor networks for habitat monitoring," in Proc. 1st ACM Int. Workshop Wireless Sensor Netw. Appl., Atlanta, GA, USA, Sep. 2002, pp. 88–97.
[13] T. Banerjee and V. V. Veeravalli, "Energy-efficient quickest change detection in sensor networks," in Proc. IEEE Stat. Signal Process. Workshop (SSP), Ann Arbor, MI, USA, Aug. 2012, pp. 636–639.
[14] Y. Mei, "Quickest detection in censoring sensor networks," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), St. Petersburg, Russia, Jul. 2011, pp. 2148–2152.
[15] A. G. Tartakovsky and V. V. Veeravalli, "Quickest change detection in distributed sensor systems," in Proc. 6th IEEE Int. Conf. Inf. Fusion, Cairns, QLD, Australia, Jul. 2003, pp. 756–763.
[16] G. Mateos and G. B. Giannakis, "Distributed recursive least-squares: Stability and performance analysis," IEEE Trans. Signal Process., vol. 60, no. 7, pp. 3740–3754, Jul. 2012.
[17] L. Li, J. A. Chambers, C. G. Lopes, and A. H. Sayed, "Distributed estimation over an adaptive incremental network based on the affine projection algorithm," IEEE Trans. Signal Process., vol. 58, no. 1, pp. 151–164, Jan. 2010.
[18] A. H. Sayed, S.-Y. Tu, J. Chen, X. Zhao, and Z. J. Towfic, "Diffusion strategies for adaptation and learning over networks: An examination of distributed strategies and network behavior," IEEE Signal Process. Mag., vol. 30, no. 3, pp. 155–171, May 2013.
[19] P. Braca, S. Marano, V. Matta, and P. Willett, "Asymptotic optimality of running consensus in testing binary hypotheses," IEEE Trans. Signal Process., vol. 58, no. 2, pp. 814–825, Feb. 2010.
[20] F. S. Cattivelli and A. H. Sayed, "Distributed detection over adaptive networks using diffusion adaptation," IEEE Trans. Signal Process., vol. 59, no. 5, pp. 1917–1932, May 2011.
[21] S. Das and J. M. F. Moura, "Distributed Kalman filtering with dynamic observations consensus," IEEE Trans. Signal Process., vol. 63, no. 17, pp. 4458–4473, Sep. 2015.
[22] J. Chen and A. H. Sayed, "Diffusion adaptation strategies for distributed optimization and learning over networks," IEEE Trans. Signal Process., vol. 60, no. 8, pp. 4289–4305, Aug. 2012.
[23] J. Du and Y.-C. Wu, "Distributed clock skew and offset estimation in wireless sensor networks: Asynchronous algorithm and convergence analysis," IEEE Trans. Wireless Commun., vol. 12, no. 11, pp. 5908–5917, Nov. 2013.
[24] A. K. Sahu, S. Kar, J. M. F. Moura, and H. V. Poor, "Distributed constrained recursive nonlinear least-squares estimation: Algorithms and asymptotics," IEEE Trans. Signal Inf. Process. Over Netw., vol. 2, no. 4, pp. 426–441, Dec. 2016.
[25] S. Das and J. M. F. Moura, "Consensus+innovations distributed Kalman filter with optimized gains," IEEE Trans. Signal Process., vol. 65, no. 2, pp. 467–481, Jan. 2017.
[26] A. K. Sahu and S. Kar, "Recursive distributed detection for composite hypothesis testing: Nonlinear observation models in additive Gaussian noise," IEEE Trans. Inf. Theory, vol. 63, no. 8, pp. 4797–4828, Aug. 2017.
[27] A. G. Tartakovsky and V. V. Veeravalli, "Asymptotically optimal quickest change detection in distributed sensor systems," Sequential Anal., vol. 27, no. 4, pp. 441–475, Oct. 2008.
[28] V. V. Veeravalli, "Decentralized quickest change detection," IEEE Trans. Inf. Theory, vol. 47, no. 4, pp. 1657–1665, May 2001.
[29] O. Hadjiliadis, H. Zhang, and H. V. Poor, "One shot schemes for decentralized quickest change detection," IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3346–3359, Jul. 2009.
[30] G. V. Moustakides, "Decentralized CUSUM change detection," in Proc. 9th Int. Conf. Inf. Fusion, Florence, Italy, Jul. 2006, pp. 1–6.
[31] L. Zacharias and R. Sundaresan, "Decentralized sequential change detection using physical layer fusion," IEEE Trans. Wireless Commun., vol. 7, no. 12, pp. 4999–5008, Dec. 2008.
[32] D. Li, L. Lai, and S. Cui, "Quickest change detection and identification across a sensor array," in Proc. IEEE Glob. Conf. Signal Inf. Process. (GlobalSIP), Austin, TX, USA, Dec. 2013, pp. 145–148.
[33] T. Banerjee, V. Sharma, V. Kavitha, and A. JayaPrakasam, "Generalized analysis of a distributed energy efficient algorithm for change detection," IEEE Trans. Wireless Commun., vol. 10, no. 1, pp. 91–101, Jan. 2011.
[34] P. Braca, S. Marano, V. Matta, and P. Willett, "Consensus-based Page's test in sensor networks," Signal Process., vol. 91, no. 4, pp. 919–930, Apr. 2011.
[35] S. S. Stankovic, N. Ilic, M. S. Stankovic, and K. H. Johansson, "Distributed change detection based on a consensus algorithm," IEEE Trans. Signal Process., vol. 59, no. 12, pp. 5686–5697, Dec. 2011.
[36] D. Li, S. Kar, J. M. F. Moura, H. V. Poor, and S. Cui, "Distributed Kalman filtering over massive data sets: Analysis through large deviations of random Riccati equations," IEEE Trans. Inf. Theory, vol. 61, no. 3, pp. 1351–1372, Mar. 2015.
[37] D. Li, S. Kar, F. E. Alsaadi, A. M. Dobaie, and S. Cui, "Distributed Kalman filtering with quantized sensing state," IEEE Trans. Signal Process., vol. 63, no. 19, pp. 5180–5193, Oct. 2015.
[38] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications. New York, NY, USA: Springer, 1998.
[39] J. A. Bucklew, Large Deviation Techniques in Decision, Simulation, and Estimation. New York, NY, USA: Wiley, 1990.
[40] D. Bajovic, D. Jakovetic, J. Xavier, B. Sinopoli, and J. M. F. Moura, "Distributed detection via Gaussian running consensus: Large deviations asymptotic analysis," IEEE Trans. Signal Process., vol. 59, no. 9, pp. 4381–4396, Sep. 2011.
[41] D. Jakovetić, J. M. F. Moura, and J. Xavier, "Distributed detection over noisy networks: Large deviations analysis," IEEE Trans. Signal Process., vol. 60, no. 8, pp. 4306–4320, Aug. 2012.
[42] A. K. Sahu and S. Kar, "Distributed sequential detection for Gaussian shift-in-mean hypothesis testing," IEEE Trans. Signal Process., vol. 64, no. 1, pp. 89–103, Jan. 2016.
[43] H. V. Poor and O. Hadjiliadis, Quickest Detection. Cambridge, U.K.: Cambridge Univ. Press, 2008.
[44] A. G. Tartakovsky and V. V. Veeravalli, "General asymptotic Bayesian theory of quickest change detection," Theory Probab. Appl., vol. 49, no. 3, pp. 458–497, 2005.
[45] A. N. Shiryaev, "On optimum methods in quickest detection problems," Theory Probab. Appl., vol. 8, no. 1, pp. 22–46, 1963.
[46] A. N. Shiryaev, Optimal Stopping Rules. New York, NY, USA: Springer-Verlag, 1978.
[47] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, "Randomized gossip algorithms," IEEE Trans. Inf. Theory, vol. 52, no. 6, pp. 2508–2530, Jun. 2006.
[48] N. McKeown, A. Mekkittikul, V. Anantharam, and J. Walrand, "Achieving 100% throughput in an input-queued switch," IEEE Trans. Commun., vol. 47, no. 8, pp. 1260–1267, Aug. 1999.
[49] B. K. Driver, Introduction to Stochastic Processes II. [Online]. Accessed: Oct. 6, 2016. Available: http://www.math.ucsd.edu/~bdriver/math180C_S2011/Lecture%20Notes/180Lec6b.pdf
[50] M. Woodroofe, Nonlinear Renewal Theory in Sequential Analysis. Philadelphia, PA, USA: SIAM, 1982.
[51] Y. S. Chow and T. L. Lai, "Some one-sided theorems on the tail distribution of sample sums with applications to the last time and largest excess of boundary crossings," Trans. Amer. Math. Soc., vol. 208, pp. 51–72, Jul. 1975.

Di Li (S'13) received the B.Eng. degree in automation engineering and the M.S. degree in information and communication engineering from the Beijing University of Posts and Telecommunications, Beijing, China, in 2008 and 2011, respectively, and the Ph.D. degree in electrical and computer engineering from Texas A&M University, College Station, TX, USA, in 2017.
He is currently a Senior Engineer with the Unicore Communications Technology Corporation, Fremont, CA, USA. His current research interests include signal processing on large-scale systems, distributed estimation and detection, and quickest detection.

Soummya Kar (S'05–M'10) received the B.Tech. degree in electronics and electrical communication engineering from the Indian Institute of Technology Kharagpur, Kharagpur, India, in 2005 and the Ph.D. degree in electrical and computer engineering from Carnegie Mellon University, Pittsburgh, PA, USA, in 2010.
From 2010 to 2011, he was with the Electrical Engineering Department, Princeton University, Princeton, NJ, USA, as a Post-Doctoral Research Associate. He is currently an Associate Professor of electrical and computer engineering with Carnegie Mellon University. His current research interests include decision-making in large-scale networked systems, stochastic systems, multiagent systems and data science, with applications to cyber-physical systems and smart energy systems.

Shuguang Cui (S'99–M'05–SM'12–F'14) received the Ph.D. degree in electrical engineering from Stanford University, Stanford, CA, USA, in 2005.
He has been working as an Assistant, an Associate, and a Full Professor in electrical and computer engineering with the University of Arizona, Tucson, AZ, USA, and Texas A&M University, College Station, TX, USA. He is currently a Child Family Endowed Chair Professor in electrical and computer engineering with the University of California at Davis, Davis, CA, USA. His current research interests include data-driven large-scale system control and resource management, large data set analysis, IoT system design, energy harvesting-based communication system design, and cognitive network optimization.
Dr. Cui was a recipient of the Thomson Reuters Highly Cited Researcher distinction and was listed in the World's Most Influential Scientific Minds by ScienceWatch in 2014, the IEEE Signal Processing Society 2012 Best Paper Award, and the Amazon AWS Machine Learning Award in 2018. He has served as the General Co-Chair and the TPC Co-Chair for many IEEE conferences. He has also been serving as the Area Editor for the IEEE Signal Processing Magazine and an Associate Editor for the IEEE TRANSACTIONS ON BIG DATA, the IEEE TRANSACTIONS ON SIGNAL PROCESSING, the IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS Series on Green Communications and Networking, and the IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS. He has been an elected member of the IEEE Signal Processing Society SPCOM Technical Committee from 2009 to 2014 and the elected Chair of the IEEE ComSoc Wireless Technical Committee from 2017 to 2018. He is a member of the Steering Committee for the IEEE TRANSACTIONS ON BIG DATA and the IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING. He is also a member of the IEEE ComSoc Emerging Technology Committee. He was elected as an IEEE ComSoc Distinguished Lecturer in 2014.