Abstract—We propose a distributed Bayesian quickest detection algorithm for sensor networks, based on a random gossip inter-sensor communication structure. Without a control or fusion center, each sensor executes its local change detection procedure in a parallel and distributed fashion, interacting with its neighboring sensors via random inter-sensor communications to propagate information. By modeling the information propagation dynamics in the network as a Markov process, a two-layer large deviation analysis is presented to analyze the performance of the proposed algorithm. The first-layer analysis shows that the relation between the probability of false alarm and the conditional averaged detection delay satisfies the large deviation principle, where the distributed Kullback–Leibler information number is established as a crucial factor. The second-layer analysis studies the probability that not all observations are available at one sensor, and shows that this probability decays exponentially fast to zero as the averaged number of communication rounds increases. The large deviation upper and lower bounds for the convergence rate are then derived. Finally, we show that the performance of the distributed algorithm converges exponentially fast to that of the centralized optimal one.

Index Terms—Bayesian model, distributed detection, large deviation, quickest detection, sensor networks.

Manuscript received July 15, 2017; revised December 6, 2017 and January 26, 2018; accepted February 12, 2018. Date of publication March 1, 2018; date of current version April 10, 2018. This work was supported in part by NSF under Grant DMS-1622433, Grant AST-1547436, Grant ECCS-1659025, Grant CCF-1513936, and Grant CNS-1343155, in part by DoD under Grant HDTRA1-13-1-0029, and in part by NSFC under Grant NSFC-61629101. A portion of this paper was presented at the Allerton Conference [1]. (Corresponding author: Shuguang Cui.)
D. Li is with the R&D Department, Unicore Communications Technology Corporation, Fremont, CA 94538 USA, and also with the Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843 USA (e-mail: dili@tamu.edu).
S. Kar is with the Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213 USA (e-mail: soummyak@andrew.cmu.edu).
S. Cui is with the Department of Electrical and Computer Engineering, University of California at Davis, Davis, CA 95616 USA (e-mail: sgcui@ucdavis.edu).
Digital Object Identifier 10.1109/JIOT.2018.2810825
2327-4662 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

Quickest change detection problems focus on detecting abrupt changes in stochastic processes as quickly as possible, subject to constraints that limit the detection error. Quickest change detection has wide applications in fields such as signal and image processing [2]–[4], computer network intrusion detection [5]–[7], neuroscience [8], environment and public health surveillance [9], [10], and system failure detection [11], [12]. Specifically, when quickest change detection is implemented in sensor networks [13]–[15], it can detect changes of statistical features, such as the mean and variance, over the observation sequences taken by the sensors. For example, quickest change detection can be implemented in sensor networks for the chemical industry to monitor leakage, or to surveil changes of temperature in the field, by detecting the change in statistical patterns.

Signal processing implementations in sensor networks can essentially be divided into two categories: centralized versus distributed algorithms [16]–[26]. In centralized quickest change detection algorithms [27]–[33], a control or fusion center exists to process the data in a centralized way. Specifically, centralized algorithms assume that either the raw observations from all the sensors or certain preprocessed information from the sensors (a case some authors refer to as decentralized sensing) are available to the control or fusion center via certain communication channels; a final centralized detection procedure is then executed at the center. However, centralized algorithms have some disadvantages, such as heavy communication burden, high computation complexity, low scalability, and poor robustness. On the contrary, distributed implementations do not require a control or fusion center, and the detection procedure is implemented at each sensor in a local and parallel fashion, with interactions among sensors in the neighborhood to exchange information.

While centralized quickest change detection algorithms have been well studied, there is less literature on distributed algorithms for quickest change detection problems [34], [35], which become more desirable in large-scale networks with a huge volume of data, in order to reduce the overall computation complexity and to enhance scalability. In [34], a distributed consensus-based Page's test algorithm, using the cumulative sum of log-likelihoods of the data, was proposed, under the assumption that the change happening time is deterministic but unknown, which is called a non-Bayesian setup. In [35], a distributed change detection algorithm was proposed by combining a global consensus scheme with the geometric moving average control charts that generate local statistics. In both [34] and [35], non-Bayesian setups of the change happening time are considered, where the communication stage and the observation stage are interleaved, i.e., they are at the same time scale and each is executed once within one system time slot. Under such an interleaving strategy, the convergence of the test statistic is established as the system time goes to infinity. However, this type of convergence analysis over time does not fit well into quickest change detection problems, which are time-sensitive, with the goal to detect the
LI et al.: DISTRIBUTED QUICKEST DETECTION IN SENSOR NETWORKS VIA TWO-LAYER LARGE DEVIATION ANALYSIS 931
vides a mathematical tool to understand the system property in the asymptotic case. In our proposed two-layer large deviation analysis method for distributed quickest detection, the first-layer analysis derives the relation between the detection delay and the PFA, showing that the PFA decays exponentially fast to zero as the averaged detection delay (ADD) increases. The second-layer analysis answers the following questions. In the context of distributed detection schemes, one interesting topic is whether or not the distributed scheme converges to the corresponding centralized one, and if it does, how the distributed scheme behaves when converging to the centralized one and what the convergence speed is. By adopting the second-layer large deviation analysis, we prove that the performance of the proposed distributed quickest detection scheme converges exponentially fast to that of the centralized one as the averaged number of communication rounds increases.

Notation: Denote $\mathbb{P}$ as the probability measure and $\mathbb{E}$ as the corresponding expectation operator. Denote $\mathbb{P}_k$ and $\mathbb{E}_k$ as the conditional probability measure and the expectation operator when the change occurs at time $k$. Denote $\mathbb{P}^\pi$ and $\mathbb{E}^\pi$ as the probability measure and the expectation operator when the change happening time follows the prior distribution $\pi$.

We assume that observations at different sensors are independent of each other and that the various densities are absolutely continuous with respect to the Lebesgue measure. Denote $X_n^i = [X_1^i, \ldots, X_n^i]$ as the observations up to time $n$ at node $i$. Let $\mathbb{P}_k$ be the probability measure of $X_n^i$ when the change occurs at time $k$, and $\mathbb{E}_k$ be the corresponding expectation operator. Our goal is to design a sequential online detection algorithm (with a stopping criterion) over the observation sequence to detect the change.

If prior knowledge about the change happening time is available, the Bayesian setup can be used to leverage this prior knowledge for change detection. In particular, the prior distribution for the change happening time $\lambda$ is denoted as $\pi_k = P(\lambda = k)$. This gives the prior probability that the change will happen at time $k$. With the observations taken in the sequel, the posterior probability (via Bayesian rules) that the change has happened can be derived and used for decision making. As in statistics, quickest detection problems can be broadly split into non-Bayesian and Bayesian categories [38], [43], [44].
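The Bayesian setup above can be sketched numerically. The following is a minimal single-sensor illustration (not the paper's full algorithm): it recursively updates the posterior probability that the change has already happened, under a geometric prior with parameter $\rho$ and unit-variance Gaussian pre- and post-change densities. The specific observation sequence, post-change mean, and $\rho = 0.1$ are illustrative assumptions.

```python
import math

def gaussian_pdf(x, mean):
    # Unit-variance Gaussian density.
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def posterior_update(p_prev, x, rho, mu1):
    """One Bayesian update of p_n = P(lambda <= n | X_1, ..., X_n) under the
    geometric prior P(lambda = k) = rho * (1 - rho)**(k - 1)."""
    like1 = gaussian_pdf(x, mu1)              # post-change density f1
    like0 = gaussian_pdf(x, 0.0)              # pre-change density f0
    num = (p_prev + (1.0 - p_prev) * rho) * like1
    den = num + (1.0 - p_prev) * (1.0 - rho) * like0
    return num / den

# Fixed illustrative sequence: the first five samples look pre-change
# (mean 0), the last five look post-change (mean 1).
obs = [0.2, -0.4, 0.1, -0.3, 0.0, 1.1, 0.9, 1.3, 0.7, 1.0]
p, history = 0.0, []
for x in obs:
    p = posterior_update(p, x, rho=0.1, mu1=1.0)
    history.append(p)
```

The posterior stays small over the pre-change-looking samples and rises sharply once the post-change-looking samples arrive, which is exactly the behavior a stopping rule thresholds on.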
932 IEEE INTERNET OF THINGS JOURNAL, VOL. 5, NO. 2, APRIL 2018
principle, the probability of this rare event occurring decays to zero at an exponentially fast rate, in the asymptotic sense, over a certain quantity.

In this section, we analyze the performance of the centralized algorithm by quantifying the relation between the conditional ADD (CADD) and the PFA via large deviation analysis, showing that the event of false alarm can be considered a rare event whose PFA decays to zero exponentially fast as the CADD increases. The results in this section set the background for analyzing the distributed case in the next section.

Since the ADD is difficult to characterize, following [44], we instead analyze the CADD, defined as $\mathrm{CADD}_k(\tau) = \mathbb{E}_k(\tau - k \mid \tau \ge k)$, $k = 1, 2, \ldots$. The relation between ADD and CADD is described as follows:

$$\mathrm{ADD}(\tau) = \mathbb{E}^\pi(\tau - \lambda \mid \tau \ge \lambda) = \frac{\sum_{k=1}^{\infty} \pi_k \mathbb{P}_k(\tau \ge k)\, \mathbb{E}_k(\tau - k \mid \tau \ge k)}{\mathbb{P}^\pi\{\tau \ge \lambda\}} = \frac{\sum_{k=1}^{\infty} \pi_k \mathbb{P}_k(\tau \ge k)\, \mathrm{CADD}_k(\tau)}{\mathbb{P}^\pi\{\tau \ge \lambda\}}. \quad (11)$$

According to the optimal stopping rule (7) and the test statistic (3), we find $\mathrm{CADD}_1(\tau^*) \ge \mathrm{CADD}_k(\tau^*)$ for $k \ge 2$, which is explained as follows. For $k = 1$ (which means that the change happens at time 1), by investigating (3), $\Lambda_1$ is updated based on the initial state $\Lambda_0 = 0$. For $k \ge 2$, by investigating (3), $\Lambda_k$ is updated based on $\Lambda_{k-1}$, where $0 \le \Lambda_{k-1} < A$ according to the optimal stopping rule (7) and the condition $\tau^* \ge k$. Thus, we have $\Lambda_{k-1} \ge \Lambda_0$. According to the optimal stopping rule (7), the time spent crossing the threshold after the change happens (the detection delay) in the case of $k \ge 2$ is, on average, less than that in the case of $k = 1$. Therefore, we have $\mathrm{CADD}_1(\tau^*) \ge \mathrm{CADD}_k(\tau^*)$. Additionally, the difference between $\mathrm{CADD}_1(\tau^*)$ and $\mathrm{CADD}_k(\tau^*)$ can be treated as a constant for large $A$, approximately equal to $\mathbb{E}_\infty(\log \Lambda_{k-1})$, $k \ge 2$ [44]. Therefore, in the sequel, we focus on $\mathrm{CADD}_1(\tau^*)$, which can also be considered the worst-case study.

The relation between $\mathrm{CADD}_1(\tau^*)$ and $\mathrm{PFA}(\tau^*)$ for the centralized scheme is presented in the following theorem.

Theorem 1: The probability of false alarm $\mathrm{PFA}(\tau^*)$, with the optimal stopping rule (7), satisfies the large deviation principle in the asymptotic sense with respect to the increasing conditional ADD $\mathrm{CADD}_1(\tau^*)$, i.e.,

$$\lim_{\mathrm{CADD}_1(\tau^*) \to \infty} \frac{1}{\mathrm{CADD}_1(\tau^*)} \log \mathrm{PFA}(\tau^*) = -(D + |\log(1-\rho)|) \quad (12)$$

where $D$ is the sum of the Kullback–Leibler information numbers across all sensors, i.e., $D = \sum_{i=1}^{N} D(f_1^i, f_0^i)$, and $D + |\log(1-\rho)|$ is the large deviation decay rate, quantifying how fast the PFA decays to zero over the increasing CADD.

The proof is shown in Appendix A.

Remark 1: The above theorem quantifies the tradeoff between two performance metrics, 1) PFA and 2) $\mathrm{CADD}_1$, in the defined change detection problem: as $\mathrm{CADD}_1$ increases, the PFA decays to zero exponentially fast and the decay rate is $D + |\log(1-\rho)|$.

For the isolated scheme, at each node $i$, the relation between $\mathrm{PFA}(\tau_i^*)$ and $\mathrm{CADD}_1(\tau_i^*)$ has a format similar to that of the centralized case in Theorem 1. We give the following corollary.

Corollary 1: The probability of false alarm $\mathrm{PFA}(\tau_i^*)$, with the optimal stopping rule (9), satisfies the large deviation principle in the asymptotic sense with respect to the increasing conditional ADD $\mathrm{CADD}_1(\tau_i^*)$, i.e.,

$$\lim_{\mathrm{CADD}_1(\tau_i^*) \to \infty} \frac{1}{\mathrm{CADD}_1(\tau_i^*)} \log \mathrm{PFA}(\tau_i^*) = -\left(D(f_1^i, f_0^i) + |\log(1-\rho)|\right) \quad (13)$$

which implies that the large deviation decay rate of the PFA is $D(f_1^i, f_0^i) + |\log(1-\rho)|$.

Remark 2: Theorem 1 and Corollary 1 imply that the Kullback–Leibler information is a crucial factor that determines the performance of change detection algorithms. Specifically, Corollary 1 shows that, for different sensors with different density pairs $f_1^i(x)$ and $f_0^i(x)$, the sensor whose density pair bears a larger Kullback–Leibler information asymptotically attains a smaller PFA for the same CADD performance. Compared with the isolated scheme, Theorem 1 shows that, in the centralized scheme, the sum of Kullback–Leibler information numbers $D$ quantifies the relation between PFA and CADD.

In the next section, we propose a distributed change detection scheme and analyze its performance. Due to the information propagation among sensors, we show that the distributed scheme outperforms the isolated one, and the gain is reflected by the averaged partial sum over the individual Kullback–Leibler information numbers.

IV. DISTRIBUTED CHANGE DETECTION AND LARGE DEVIATION ANALYSIS

In this section, a random gossip-based distributed change detection algorithm is first introduced. Then, we model the information propagation in this distributed scheme as a Markov process. Finally, a two-layer large deviation analysis is presented to analyze the performance of the proposed distributed algorithm.

First, we interpret the network as an undirected graph $G = (V, E)$, where $V$ is the set of nodes with $|V| = N$ and $E$ is the set of edges. If node $i$ is connected to node $j$, then edge $(i, j) \in E$. The connectivity of graph $G$ is represented by the following $N \times N$ symmetric adjacency matrix $A$ with elements

$$A_{ij} = \begin{cases} 1, & (i, j) \in E \text{ or } i = j \\ 0, & \text{otherwise.} \end{cases} \quad (14)$$

We assume that the network is connected, i.e., each node has a path to any other node.

A. Distributed Algorithm

We propose a random gossip-based distributed change detection algorithm, where a random gossip algorithm, as
the inter-sensor communication structure, is used to propagate information among sensors within the neighborhood.

Before proceeding to the algorithm details, we summarize the algorithm structure here. At each time slot, there are three steps.

Step 1: Each sensor swaps its observation with a randomly selected neighbor at each communication round.
Step 2: After the communications are finished, each sensor updates its local test statistic with the old test statistic and the observations received from other sensors.
Step 3: The updated test statistic is compared with the threshold to decide whether or not the change has happened. If not, the algorithm continues running for the next time slot.

Communication among sensors is constrained by factors such as proximity, transmit power, and receiving capabilities. We model the communication structure in terms of the undirected graph $G = (V, E)$ defined at the beginning of this section. If node $i$ can communicate with node $j$, an edge exists between $i$ and $j$, i.e., the edge set $E$ contains $(i, j)$. We assume that the diagonal elements of the adjacency matrix $A$ are identically 1, which indicates that a node can always communicate with itself. The set $E$ is the maximal set of allowable communication links in the network at any time; however, at a particular instant, only a fraction of the allowable links are active, for example, to avoid strong interference among communications. The exact communication protocol is not crucial for the theoretical analysis, as long as the connectivity of the network is satisfied. For definiteness, we assume the following generic communication model, which subsumes the widely used gossip protocol for real-time embedded architectures [47] and the graph matching-based communication protocols for Internet architectures [48]. Define the set $\mathcal{M}$ of binary symmetric $N \times N$ matrices as

$$\mathcal{M} = \left\{A' \mid \mathbf{1}^T A' = \mathbf{1}^T,\; A'\mathbf{1} = \mathbf{1},\; A' \le A\right\} \quad (15)$$

where $A' \le A$ is interpreted component-wise. In other words, $\mathcal{M}$ is the set of adjacency matrices in which each node is incident to exactly one edge included in the edge set $E$. Let $\mathcal{D}$ denote a probability distribution on the space $\mathcal{M}$. We define a sequence of time-varying matrices $A(m)$, $m = 1, 2, \ldots$, as an i.i.d. sequence in $\mathcal{M}$ with distribution $\mathcal{D}$. Define the averaged matrix $\bar{A}$ as

$$\bar{A} = \int_{\mathcal{M}} A \, d\mathcal{D}(A). \quad (16)$$

According to the definition of $\mathcal{M}$ in (15), $\bar{A}$ is a symmetric stochastic matrix. We assume $\bar{A}$ to be irreducible and aperiodic. This assumption depends on the allowable edges $E$ and the distribution $\mathcal{D}$. Such a distribution $\mathcal{D}$ making this assumption valid always exists if the graph $(V, E)$ is connected, e.g., the uniform distribution. In addition, $\bar{A}$ can be interpreted as the transition matrix of a Markov chain, which we discuss later.

Assume that the sampling time interval for taking observations is $\Delta$, within which there are $M$ rounds of inter-sensor communications, where $M$ is a Poisson random variable with mean $\gamma$ [47]. At the $m$th ($m \in \{1, \ldots, M\}$) round, a node randomly selects another node from its neighborhood to construct a two-way communication pair and exchange observations. At each sampling time interval, this communication structure is modeled by the sequence of matrices $A(m)$, $m = 1, 2, \ldots, M$, i.e., the establishment of a communication link between nodes $i$ and $j$ indicates that nodes $i$ and $j$ are neighbors with respect to the time-varying adjacency matrix $A(m)$. Note that multiple communication links or pairs may exist simultaneously in the network, but only one communication link is associated with a given node in each round, which is also implied by the mathematical model in (15).

Now, we model the communication link formation process from the perspective of a Markov process. To this end, the communication link process governed by the time-varying adjacency matrix sequence $\{A(m)\}$ can be represented by $N$ particles traveling on the graph [36]. We denote the state of the $i$th particle as $z_i(m)$, where $z_i(m)$ indicates the index of the node that the $i$th particle travels to at time $m$, with $z_i(m) \in \{1, \ldots, N\}$. The evolution of the $i$th particle is given as follows:

$$z_i(m) = [z_i(m-1)]_m^{\rightarrow}, \quad z_i(0) = i \quad (17)$$

where the notation $[i]_m^{\rightarrow}$ denotes the neighbor of node $i$ at time $m$ with respect to the adjacency matrix $A(m)$, i.e., a communication link is established between $[i]_m^{\rightarrow}$ and $i$ at time $m$. Thus, the traveling process of the $i$th particle can be viewed as originating from node $i$ initially and then traveling on the graph according to the link formation process $\{A(m)\}$ (possibly changing its location at each step). For each $i$, the process $\{z_i(m)\}$ is a Markov chain on the state space $\{1, \ldots, N\}$ with the transition probability matrix $\bar{A}$ [36].

After $M$ rounds of inter-sensor communications, each node accumulates some observations from other nodes, with which the local test statistic at each node is updated. Denote $O_n^i$ as the set of nodes whose observations are available at node $i$ after the inter-sensor communications at the end of the observation time period $n$. We describe the accumulation process that yields $O_n^i$ later. The distributed test statistic $\Lambda_{n,D}^i$ is then updated as

$$\Lambda_{n,D}^i = \frac{1}{1-\rho}\left(\Lambda_{n-1,D}^i + \rho\right) \prod_{j \in O_n^i} \frac{f_1^j\left(X_n^j\right)}{f_0^j\left(X_n^j\right)}. \quad (18)$$

With this test statistic updating rule, at each sensor $i$, the distributed change detection scheme is executed with the following stopping time $\tau_D^i$:

$$\tau_D^i(A) = \inf\left\{n \ge 1 : \Lambda_{n,D}^i \ge A\right\} \quad (19)$$

where $A$ is chosen as $A = (1-\alpha)/\alpha$ such that $\mathrm{PFA}(\tau_D^i(A)) \le \alpha$.

Note that the observations are transmitted via the inter-sensor communications, in contrast to the consensus + innovations-based distributed detection algorithms, in which the test statistics are transmitted; such algorithms are widely adopted, such
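The communication model (15)–(17) can be sketched as follows. This is a hypothetical five-node ring (the topology is our assumption, not the paper's): each gossip round samples a matrix from $\mathcal{M}$ by randomly matching nodes with at most one neighbor (unmatched nodes keep self-loops), and the particle states $z_i(m)$ advance along the resulting pairing.

```python
import random

def random_matching(neighbors, n, rng):
    """Sample a matrix from the set M of (15), represented as a partner map:
    each node is matched with at most one neighbor, and unmatched nodes
    keep a self-loop, so the map is a symmetric involution."""
    partner = list(range(n))                  # self-loops by default
    order = list(range(n))
    rng.shuffle(order)
    for i in order:
        if partner[i] != i:                   # already matched this round
            continue
        free = [j for j in neighbors[i] if j != i and partner[j] == j]
        if free:
            j = rng.choice(free)
            partner[i], partner[j] = j, i
    return partner

# Hypothetical 5-node ring topology; neighbor lists include the node itself,
# matching the unit diagonal of the adjacency matrix A in (14).
N = 5
neighbors = {i: [i, (i - 1) % N, (i + 1) % N] for i in range(N)}
rng = random.Random(1)

# Particle evolution (17): z[i] is the node the ith particle currently
# occupies; one gossip round moves each particle to its partner's node.
z = list(range(N))
for m in range(10):
    partner = random_matching(neighbors, N, rng)
    z = [partner[node] for node in z]
```

Averaging many sampled matchings approximates the doubly stochastic matrix $\bar{A}$ of (16), which is the transition matrix of the particle chains $\{z_i(m)\}$.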
as in [40] and [41]. It is conjectured that the consensus + innovations-based distributed algorithms are more efficient at fusing the information, since the test statistic contains more information than the raw observations. However, the consensus + innovations-based strategy cannot be applied to our distributed change detection problem. In traditional distributed detection problems, the consensus + innovations-based strategy is inspired by investigating the recursion form of the test statistic in the centralized algorithm [41], where the test statistic is updated by linearly combining the previous test statistic and the current innovation. Then, in the consensus + innovations-based distributed algorithms, at the consensus step, the test statistics are transmitted and linearly combined at the receiver node. The linearly combined test statistic is the counterpart of the previous test statistic in the centralized algorithm. However, in our change detection problem, the recursion form of the test statistic shown in (4) indicates that the update rule of $\log \Lambda_n$ is not simply a linear combination of the previous test statistic and the current innovation, due to the factor $\log(\Lambda_{n-1} + \rho)$.

Now, we describe the observation accumulation process that yields $O_n^i$. Let $s_n^m = [s_n^m(1), \ldots, s_n^m(N)]$, with element $s_n^m(i) \in \{1, \ldots, N\}$ indexing the observation available at sensor $i$ just after the $m$th round of communication in the observation time period $n$. The initial state is $s_n^0(i) = i$ at each sensor $i$, which means that at the beginning of time slot $n$, each sensor $i$ only has its own observation $X_n^i$. When the communication starts, following the communication model $A(m)$, the observations $\{X_n^i\}_{i \in \{1,\ldots,N\}}$ travel across the network in the following way:

$$s_n^m = A(m)\, s_n^{m-1}, \quad 1 \le m \le M. \quad (20)$$

During these $M$ rounds of inter-sensor communications until the end of the time period $n$, each sensor stores the observations exchanged from other sensors. Then, at the end of the time period $n$, observations from other sensors have accumulated at sensor $i$, and the set of sensors whose observations are available at sensor $i$ is denoted by

$$O_n^i = \bigcup_{m=0}^{M} \left\{s_n^m(i)\right\}. \quad (21)$$

This observation accumulation process terminates at the end of the time period $n$. A similar observation accumulation process then repeats during the time period $n+1$, independent of the previous process. Therefore, the sequence $\{O_n^i\}$, as the set denoting the observation indices available at sensor $i$ at the end of the $n$th period, is an i.i.d. process.

To facilitate the description in the sequel, we introduce some notation here. Let $\Omega$ denote the power set of the node indices $\{1, \ldots, N\}$, with elements $\Omega_\nu$ indexed by $\nu \in \{0, 1, \ldots, 2^N - 1\}$. We use $\Omega_0$ to denote the null set and $\Omega_{2^N-1}$ to denote the whole set of node indices. For technical convenience, the sensors in the set $\Omega_\nu$ are arranged in ascending order, with $j_1$ denoting the first one and $j_{|\Omega_\nu|}$ the last one, i.e., $\Omega_\nu = \{j_1, \ldots, j_{|\Omega_\nu|}\}$. Therefore, the set $O_n^i$, denoting the nodes whose observations are available at node $i$ after the observation accumulation process, is a random variable taking values from $\Omega$. We denote the following probability as

$$\Pr\left\{O_n^i = \Omega_\nu\right\} = q_n^i(\nu), \quad \nu \in \left\{0, 1, \ldots, 2^N - 1\right\}. \quad (22)$$

B. First-Layer Large Deviation Analysis

To perform the large deviation analysis, we first need to interpret the stopping time $\tau_D^i(A)$ as a random walk crossing a threshold plus a nonlinear term [44]. To this end, the stopping time $\tau_D^i(A)$ can be rewritten as

$$\tau_D^i(A) = \inf\{n \ge 1 : W_n(\rho) + l_n \ge \log(A/\rho)\} \quad (23)$$

where $W_n(\rho) = Z_n + n|\log(1-\rho)|$ is a random walk with

$$Z_n = \sum_{k=1}^{n} \sum_{j \in O_k^i} \log \frac{f_1^j\left(X_k^j\right)}{f_0^j\left(X_k^j\right)} \quad (24)$$

and

$$l_n = \log\left\{1 + \sum_{k=1}^{n-1} (1-\rho)^k \prod_{s=1}^{k} \prod_{j \in O_s^i} \frac{f_0^j\left(X_s^j\right)}{f_1^j\left(X_s^j\right)}\right\}. \quad (25)$$

Specifically, $W_n(\rho)$ is a random walk with mean

$$\mathbb{E}_1\{W_n(\rho)\} = n \sum_{\nu=1}^{2^N-1} \bar{q}_\gamma^i(\nu) \sum_{j \in \Omega_\nu} D\left(f_1^j, f_0^j\right) + n|\log(1-\rho)| \quad (26)$$

where $\bar{q}_\gamma^i(\nu)$ is the probability defined as

$$\bar{q}_\gamma^i(\nu) = \Pr\left\{O_\gamma^i = \Omega_\nu\right\}, \quad \nu \in \left\{0, 1, \ldots, 2^N - 1\right\} \quad (27)$$

in which $O_\gamma^i$, a random variable taking values from $\Omega$, denotes the set of nodes whose observations are available at node $i$ after $\gamma$ rounds of communications, and $\gamma$ is the mean value of the number of communication rounds. Then, based on the above random walk interpretation of the stopping time, we have the following theorem on the relation between PFA and CADD in the proposed distributed change detection scheme.

Theorem 2: The probability of false alarm $\mathrm{PFA}(\tau_D^i)$, with the stopping rule (19) in the distributed change detection algorithm with parameter $\gamma$ as the averaged number of inter-sensor communications, satisfies the large deviation principle in the asymptotic sense with respect to the increasing conditional ADD $\mathrm{CADD}_1(\tau_D^i)$, i.e.,

$$\lim_{\mathrm{CADD}_1(\tau_D^i) \to \infty} \frac{1}{\mathrm{CADD}_1(\tau_D^i)} \log \mathrm{PFA}\left(\tau_D^i\right) = -\left(D_\gamma^i + |\log(1-\rho)|\right) \quad (28)$$

where $D_\gamma^i = \sum_{\nu=1}^{2^N-1} \bar{q}_\gamma^i(\nu) \sum_{j \in \Omega_\nu} D(f_1^j, f_0^j)$, and $D_\gamma^i + |\log(1-\rho)|$ is the large deviation decay rate of the PFA. We call $D_\gamma^i$ the distributed Kullback–Leibler information number.

The proof is shown in Appendix B.

Remark 3: Theorem 2 shows that $D_\gamma^i$, whose role is similar to $D$ in the centralized scheme and $D(f_1^i, f_0^i)$ in the isolated scheme, is a crucial factor determining the performance of the distributed change detection algorithm. The physical
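The accumulation dynamics (20)–(21) and the distributed Kullback–Leibler information number $D_\gamma^i$ of Theorem 2 can be estimated by Monte Carlo. A sketch, again assuming a five-node ring topology and the Section V densities; the Poisson sampler and matching routine are illustrative stand-ins for the paper's generic communication model:

```python
import math
import random

def poisson_sample(lam, rng):
    # Knuth's multiplication method; adequate for the small means used here.
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def estimate_distributed_kl(gamma, n_trials, rng):
    """Monte Carlo estimate of D_gamma^i at node i = 0 on a 5-node ring,
    using the swap dynamics s^m = A(m) s^(m-1) of (20) and the accumulated
    set O_n^i of (21); KL numbers follow the Section V Gaussian setup."""
    N = 5
    kl = [0.5 * (0.1 * (j + 1)) ** 2 for j in range(N)]   # D(f1^j, f0^j)
    total = 0.0
    for _ in range(n_trials):
        s = list(range(N))        # s[i]: observation index held at node i
        seen = {0}                # O^0 starts with node 0's own observation
        for _ in range(poisson_sample(gamma, rng)):       # M ~ Poisson(gamma)
            partner = list(range(N))                      # random matching
            order = list(range(N))
            rng.shuffle(order)
            for i in order:
                if partner[i] == i:
                    free = [j for j in ((i - 1) % N, (i + 1) % N)
                            if partner[j] == j]
                    if free:
                        j = rng.choice(free)
                        partner[i], partner[j] = j, i
            s = [s[partner[i]] for i in range(N)]         # one swap round
            seen.add(s[0])
        total += sum(kl[j] for j in seen)
    return total / n_trials

rng = random.Random(7)
d_small = estimate_distributed_kl(1.0, 500, rng)    # few rounds on average
d_large = estimate_distributed_kl(20.0, 500, rng)   # many rounds on average
```

The estimates respect the bound $D(f_1^0, f_0^0) = 0.005 \le D_\gamma^0 \le D = 0.275$ discussed in Remark 3, and the estimate moves toward $D$ as $\gamma$ grows, consistent with Theorem 4.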
meaning of $D_\gamma^i$ is explained as follows. Due to the observation propagation process, observations and the corresponding log-likelihood ratios from other sensors are available at each sensor; to some extent, $D_\gamma^i$ can be considered an accumulated form of this information. In particular, $D_\gamma^i$ is an averaged partial sum of the Kullback–Leibler information numbers $D(f_1^i, f_0^i)$, $i = 1, \ldots, N$, compared to $D$ as the total sum. Also, from the mathematical form of $D_\gamma^i$, we see that $D(f_1^i, f_0^i) \le D_\gamma^i \le D$: the case of $\bar{q}_\gamma^i(1) = 1$ corresponds to the lower bound $D_\gamma^i = D(f_1^i, f_0^i)$, while the case of $\bar{q}_\gamma^i(2^N - 1) = 1$ corresponds to the upper bound $D_\gamma^i = D$. Since $D(f_1^i, f_0^i) \le D_\gamma^i \le D$ and $D_\gamma^i$ determines the performance of the change detection algorithm, the above analysis proves that the distributed algorithm outperforms the isolated algorithm, but falls behind the centralized one.

C. Second-Layer Large Deviation Analysis

Since $D_\gamma^i$ has been shown to be a crucial factor in the large deviation analysis of the last section, in this section we focus on studying its behavior. As we remain within the scope of large deviation analysis, we call this the second-layer large deviation analysis, while the analysis in the last section is called the first-layer large deviation analysis.

As we cannot obtain a closed form for $D_\gamma^i$ due to the complicated probabilities involved, we discuss its asymptotic behavior as $\gamma \to \infty$. To this end, we first study the behavior of $\bar{q}_\gamma^i(\nu)$, defined below (26), as $\gamma \to \infty$, by employing the concept of hitting times in Markov chains.

For each $\nu \ne 2^N - 1$, without loss of generality, we assume that $\Omega_\nu$ corresponds to the sensor subset $\{i_1, i_2, \ldots, i_m\}$, with $\{i'_1, i'_2, \ldots, i'_{N-m}\}$ as the complementary subset, where $m \ge 1$ due to the fact that at least its own observation is available at each sensor. Let $T_j$ denote the hitting time, starting from state (sensor index) $j$, to hit another specific state $i$ in the Markov chain whose transition probability matrix is $\bar{A}$ defined in (16). From [49, Th. 7.26], since the transition probability matrix $\bar{A}$ is irreducible, there exist constants $0 < \alpha < 1$ and $0 < L < \infty$ such that $P(T_j > L) \le \alpha$, $\forall j$, and more generally,

$$P(T_j > kL) \le \alpha^k, \quad k = 0, 1, 2, \ldots. \quad (29)$$

Also, there exists a constant $0 < \beta < 1$ such that $P(T_j > L) \ge \beta$, $\forall j$, and more generally,

$$P(T_j > kL) \ge \beta^k, \quad k = 0, 1, 2, \ldots. \quad (30)$$

Based on the above hitting-time results for Markov chains, we first present the following large deviation theorem on the asymptotic behavior of $\sum_{v=0}^{2^N-2} \bar{q}_\gamma^i(v)$ as $\gamma \to \infty$. Since $\nu \in \{0, 1, \ldots, 2^N - 1\}$ according to (27), we have $\sum_{v=0}^{2^N-2} \bar{q}_\gamma^i(v) = 1 - \bar{q}_\gamma^i(2^N - 1)$, where $\bar{q}_\gamma^i(2^N - 1)$ denotes the probability that the observations from all sensors are available at sensor $i$; i.e., $\sum_{v=0}^{2^N-2} \bar{q}_\gamma^i(v)$ is the probability of the event that not all observations are available at sensor $i$.

Theorem 3: As $\gamma \to \infty$, the probability $\sum_{v=0}^{2^N-2} \bar{q}_\gamma^i(v)$ has the following large deviation upper and lower bounds:

$$\frac{\ln \beta}{L} \le \lim_{\gamma \to \infty} \frac{1}{\gamma} \ln \sum_{v=0}^{2^N-2} \bar{q}_\gamma^i(v) \le \frac{\ln \alpha}{L} \quad (31)$$

where $\alpha$, $\beta$, and $L$ are the parameters in (29) and (30).

The proof is presented in Appendix C.

Remark 4: Since $\sum_{v=0}^{2^N-2} \bar{q}_\gamma^i(v)$ represents the probability of the event that not all observations are available at sensor $i$, Theorem 3 implies that this event is a rare event whose probability decays exponentially fast to zero as $\gamma \to \infty$.

Based on Theorem 3, we further have the following theorem on the behavior of the distributed Kullback–Leibler information number $D_\gamma^i$ defined in Theorem 2.

Theorem 4: As $\gamma \to \infty$, we have the following upper and lower bounds on the value of $D_\gamma^i$:

$$D - \max_{j \in \{1,\ldots,N\} \setminus i} D\left(f_1^j, f_0^j\right) e^{\frac{\ln \alpha}{L}\gamma} \le D_\gamma^i \le D - \min_{j \in \{1,\ldots,N\} \setminus i} D\left(f_1^j, f_0^j\right) e^{\frac{\ln \beta}{L}\gamma} \quad (32)$$

where $D(f_1^j, f_0^j)$ is the Kullback–Leibler information number defined in (10), $D$ is the centralized Kullback–Leibler information number defined in Theorem 1, and $\ln \alpha / L$ and $\ln \beta / L$ are the upper and lower bounds derived in Theorem 3.

The proof is presented in Appendix D.

Remark 5: Theorem 4 implies that $D_\gamma^i$ converges to $D$ exponentially fast as $\gamma \to \infty$. Since $D_\gamma^i$ and $D$ determine the performance of the distributed and centralized algorithms, respectively, this theorem also implies that the performance of the proposed distributed algorithm converges to that of the centralized one at an exponentially fast rate.

V. SIMULATION RESULTS

In this section, we simulate the proposed distributed algorithm with a network of five nodes taking observations. We consider a Bayesian setup and set the prior distribution of the change-point time as a geometric distribution with parameter $\rho = 0.1$. Before the change happens, the observation at each node follows a Gaussian distribution $N(0, 1)$; after the change happens, the observation at node $i$, $i = 1, \ldots, 5$, follows another Gaussian distribution $N(0.1 \times i, 1)$. Note that we consider a setup in which observations at different nodes have different post-change distributions, to mimic the more general situation in which different nodes suffer different levels of impact from the same physical change. For example, a physical event, such as the leakage of chemical gas or an abrupt increase of temperature, would lead to different degrees of impact at different nodes, due to their various locations. Nodes near the origin of the physical event suffer a more serious influence, reflected by a larger mean in the post-change distribution; nodes far away from the origin suffer a less serious influence, reflected by a smaller mean in the post-change distribution.
Fig. 2. First-layer large deviation analysis: comparison of decay rates in distributed, centralized, and isolated schemes; simulation (dashed lines) versus analytical results (solid lines).

Fig. 3. Second-layer large deviation analysis in Theorem 3: simulated decay rate (dashed line) of the probability that not all observations are available at a sensor, and the corresponding large deviation upper and lower bounds (solid lines).
Since \(\mathrm{CADD}_1(\tau^*) = \mathbb{E}_1(\tau^* - 1) = \mathbb{E}_1(\tau^*) - 1\), by combining (34) and (35), we have

\[
\log \frac{\mathrm{PFA}(\tau^*)\,\rho}{\zeta(\rho, D)(1 + o(1))}
= -\,\mathrm{CADD}_1(\tau^*)\big(D + |\log(1-\rho)|\big)
- \xi(\rho, D) + \big(o(1) - 1\big)\big(D + |\log(1-\rho)|\big). \quad (36)
\]

Then, after dividing the left-hand and right-hand sides of (36) by \(\mathrm{CADD}_1(\tau^*)\) and taking the limit as \(\mathrm{CADD}_1(\tau^*) \to \infty\), we have

\[
\lim_{\mathrm{CADD}_1(\tau^*) \to \infty} \frac{1}{\mathrm{CADD}_1(\tau^*)} \log \mathrm{PFA}\big(\tau^*\big)
= -\big(D + |\log(1-\rho)|\big). \quad (37)
\]
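The limit (37) identifies \(D + |\log(1-\rho)|\) as the decay exponent of the false-alarm probability per unit of conditional averaged detection delay. A small helper makes the tradeoff concrete (the numeric values of D and rho below are hypothetical, not taken from the paper's experiments):

```python
import math

def pfa_exponent(D, rho):
    """Asymptotic decay rate D + |log(1 - rho)| of the false-alarm
    probability per unit of detection delay, as in the limit relating
    log PFA and CADD_1. D plays the role of the (distributed)
    Kullback-Leibler information number; rho is the geometric prior
    parameter of the change point."""
    return D + abs(math.log(1.0 - rho))

# Illustrative numbers: with D = 0.5 and rho = 0.1, each extra unit of
# detection delay reduces PFA by roughly a factor exp(-0.6054).
print(pfa_exponent(0.5, 0.1))  # approximately 0.6054
```

A larger information number D or a larger prior parameter rho both steepen the exponent, i.e., the same delay budget buys a smaller false-alarm probability.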
Thus, we turn to study \(\mathbb{E}_\pi[e^{-\omega a} \mid \tau_D^i \ge \lambda]\) as follows:

\[
\mathbb{E}_\pi\big[e^{-\omega a} \,\big|\, \tau_D^i \ge \lambda\big]
= \sum_{k=1}^{\infty} \mathbb{E}_k\big[e^{-\omega a} \,\big|\, \tau_D^i \ge k\big] \, P\big(\lambda = k \,\big|\, \tau_D^i \ge k\big). \quad (42)
\]

For any \(1 \le k < \infty\), we have

\[
\tau_D^i = \inf\big\{ n \ge 1 : W_{n,k}(\rho) + l_{n,k} \ge a \big\} \quad (43)
\]

where \(W_{n,k}(\rho) = Z_{n,k} + (n-k+1)|\log(1-\rho)|\), \(n \ge k\), is a random walk with \(\mathbb{E}_k[W_{n,k}(\rho)] = (n-k+1)\big(D_\gamma^i + |\log(1-\rho)|\big)\), and \(l_{n,k}\) is a nonlinear term. In \(W_{n,k}(\rho)\), we have

\[
Z_{n,k} = \sum_{t=k}^{n} \sum_{j \in O_i^t} \log \frac{f_1^j\big(X_t^j\big)}{f_0^j\big(X_t^j\big)}. \quad (44)
\]

Then, by applying [50, Th. 4.1], we obtain

\[
\lim_{A \to \infty} \mathbb{E}_k\big[e^{-\omega a} \,\big|\, \tau_D^i \ge k\big] = \zeta\big(\rho, D_\gamma^i\big) \quad (45)
\]

where \(\zeta(\rho, D_\gamma^i)\) is a function of the parameters \(\rho\) and \(D_\gamma^i\). We also have

\[
\lim_{A \to \infty} P\big(\lambda = k \,\big|\, \tau_D^i \ge k\big)
= \lim_{A \to \infty} \frac{\pi_k \, P_k\big(\tau_D^i \ge k \,\big|\, \lambda = k\big)}{P_\pi\big(\tau_D^i \ge k\big)} = \pi_k. \quad (46)
\]

Therefore, by plugging (45) and (46) into (42), we have

\[
\lim_{A \to \infty} \mathbb{E}_\pi\big[e^{-\omega a} \,\big|\, \tau_D^i \ge \lambda\big] = \zeta\big(\rho, D_\gamma^i\big). \quad (47)
\]

Finally, by combining (40), (41), and (47), we prove (38).

The proof of (39) depends on [50, Th. 4.5]. In order to use this theorem, the validity of the following three conditions needs to be checked:

\[
\sum_{n=1}^{\infty} P_1\{l_n \le -\theta n\} < \infty \quad \text{for some } 0 < \theta < D_D^i;
\]

\[
\max_{0 \le k \le n} |l_{n+k}|, \ n \ge 1, \ \text{are } P_1\text{-uniformly integrable};
\]

\[
\lim_{A \to \infty} a \, P_1\Big\{\tau_D^i(A) \le \varepsilon a \big(D_D^i + |\log(1-\rho)|\big)^{-1}\Big\} = 0
\]

for some \(0 < \varepsilon < 1\), where \(a = \log(A/\rho)\) and \(l_n\) is defined in (25).

It is easy to check that the first condition is valid, as \(l_n \ge 0\). For the second condition, we have \(\max_{0 \le k \le n} |l_{n+k}| = l_{2n}\), since \(l_n\), \(n = 1, 2, \ldots\), are nondecreasing. Thus, to check that the second condition is valid, we only need to show that \(l_n\), \(n = 1, 2, \ldots\), are \(P_1\)-uniformly integrable. To this end, we have that \(l_n\) converges almost surely, as \(n \to \infty\), to the following random variable:

\[
l = \log\Bigg( 1 + \sum_{k=1}^{\infty} (1-\rho)^k \prod_{s=1}^{k} \prod_{j \in O_i^s} \frac{f_0^j\big(X_s^j\big)}{f_1^j\big(X_s^j\big)} \Bigg). \quad (48)
\]

By taking the expectation, we have

\[
\mathbb{E}_1(l) \le \log\Bigg( 1 + \sum_{k=1}^{\infty} (1-\rho)^k \Bigg) = \log\frac{1}{\rho}. \quad (49)
\]

Since \(l_n\), \(n = 1, 2, \ldots\), are nondecreasing, we have \(l_n \le l\). Then, we have \(\mathbb{E}_1(l_n) < \infty\), implying the uniform integrability. Therefore, the second condition is satisfied.

Now, we intend to show the validity of the third condition. According to [44, Lemma 1], we have

\[
P_1\big\{\tau_D^i(A) \le 1 + (1-\epsilon)L_a\big\} \le e^{-\phi_\epsilon a} + \beta(\epsilon, A) \quad (50)
\]

where \(L_a = a\big(D_D^i + |\log(1-\rho)|\big)^{-1}\), \(\phi_\epsilon > 0\) for all \(0 < \epsilon < 1\), and \(\beta(\epsilon, A) = P_1\big\{\max_{1 \le n < K_{\epsilon,A}} Z_n \ge (1+\epsilon) D_D^i K_{\epsilon,A}\big\}\), in which \(K_{\epsilon,A} = (1-\epsilon)L_a\) and \(Z_n\) is defined in (24). The term \(e^{-\phi_\epsilon a}\) on the right-hand side is \(o(1/a)\). Thus, in order to show

\[
\lim_{A \to \infty} a \, P_1\big\{\tau_D^i(A) \le 1 + (1-\epsilon)L_a\big\} = 0 \quad (51)
\]

we only need to prove that the other term \(\beta(\epsilon, A)\) is also \(o(1/a)\), since \(a = \log(A/\rho)\). To this end, by applying [51, Th. 1], for \(\nu > 0\) and \(r \ge 0\), we have

\[
\sum_{n=1}^{\infty} P_1\Big\{ \max_{1 \le k \le n} \big(Z_k - D_D^i k\big) \ge \nu n \Big\}
\le C_r \Big( \mathbb{E}_1\Big[\big(\big(Z_1 - D_D^i\big)^{+}\big)^{r+1}\Big] + \mathbb{E}_1\Big[\big(Z_1 - D_D^i\big)^{2}\Big] \Big) \quad (52)
\]

where \(C_r\) is a constant. When \(r = 1\), the finiteness of the right-hand side of the above inequality implies that the left-hand side is also finite. Thus, we obtain \(P_1\big\{\max_{1 \le k \le n} \big(Z_k - D_D^i k\big) \ge \nu n\big\} = o(1/n)\).

Then, with the fact that

\[
\beta(\epsilon, A) \le P_1\Big\{ \max_{1 \le n < K_{\epsilon,A}} \big(Z_n - D_D^i n\big) \ge \epsilon D_D^i K_{\epsilon,A} \Big\} \quad (53)
\]

we have \(\beta(\epsilon, A) = o(1/a)\). Therefore,

\[
\lim_{A \to \infty} a \, P_1\big\{\tau_D^i(A) \le 1 + (1-\epsilon)L_a\big\} = 0. \quad (54)
\]

By taking \(\varepsilon = 1 - \epsilon\), finally, we have

\[
\lim_{A \to \infty} a \, P_1\big\{\tau_D^i(A) \le \varepsilon L_a\big\}
\le \lim_{A \to \infty} a \, P_1\big\{\tau_D^i(A) \le 1 + (1-\epsilon)L_a\big\} = 0. \quad (55)
\]

Hence, the third condition is satisfied. Therefore, the conditions of [50, Th. 4.5] are satisfied. This theorem shows that (39) is valid.

Then, with (38) and (39), by the same proof method as in Theorem 1, we have

\[
\lim_{\mathrm{CADD}_1(\tau_D^i) \to \infty} \frac{1}{\mathrm{CADD}_1(\tau_D^i)} \log \mathrm{PFA}\big(\tau_D^i\big)
= -\big(D_\gamma^i + |\log(1-\rho)|\big). \quad (56)
\]

APPENDIX C
PROOF OF THEOREM 3

Recall that \(\nu\) corresponds to the index of the sensor subset \(\{i_1, i_2, \ldots, i_m\}\), with \(\{\tilde{i}_1, \tilde{i}_2, \ldots, \tilde{i}_{N-m}\}\) as the complementary subset, and \(T_j\) denotes the hitting time, starting from state
940 IEEE INTERNET OF THINGS JOURNAL, VOL. 5, NO. 2, APRIL 2018
[3] H. Li, H. Dai, and C. Li, “Collaborative quickest spectrum sensing via random broadcast in cognitive radio systems,” IEEE Trans. Wireless Commun., vol. 9, no. 7, pp. 2338–2348, Jul. 2010.
[4] S. Trivedi and R. Chandramouli, “Secret key estimation in sequential steganography,” IEEE Trans. Signal Process., vol. 53, no. 2, pp. 746–757, Feb. 2005.
[5] M. Thottan and C. Ji, “Anomaly detection in IP networks,” IEEE Trans. Signal Process., vol. 51, no. 8, pp. 2191–2204, Aug. 2003.
[6] A. G. Tartakovsky, B. L. Rozovskii, R. B. Blazek, and H. Kim, “A novel approach to detection of intrusions in computer networks via adaptive sequential and batch-sequential change-point detection methods,” IEEE Trans. Signal Process., vol. 54, no. 9, pp. 3372–3382, Sep. 2006.
[7] A. A. Cardenas, S. Radosavac, and J. S. Baras, “Evaluation of detection algorithms for MAC layer misbehavior: Theory and experiments,” IEEE/ACM Trans. Netw., vol. 17, no. 2, pp. 605–617, Apr. 2009.
[8] D. Commenges, J. Seal, and F. Pinatel, “Inference about a change point in experimental neurophysiology,” Math. Biosci., vol. 80, no. 1, pp. 81–108, Jul. 1986.
[9] M. Frisén, “Optimal sequential surveillance for finance, public health, and other areas,” Sequential Anal., vol. 28, no. 3, pp. 310–337, Jul. 2009.
[10] C. Sonesson and D. Bock, “A review and discussion of prospective statistical surveillance in public health,” J. Roy. Stat. Soc., vol. 166, no. 1, pp. 5–21, Feb. 2003.
[11] J. A. Rice et al., “Flexible smart sensor framework for autonomous structural health monitoring,” Smart Struct. Syst., vol. 6, nos. 5–6, pp. 423–438, May 2010.
[12] A. Mainwaring, D. Culler, J. Polastre, R. Szewczyk, and J. Anderson, “Wireless sensor networks for habitat monitoring,” in Proc. 1st ACM Int. Workshop Wireless Sensor Netw. Appl., Atlanta, GA, USA, Sep. 2002, pp. 88–97.
[13] T. Banerjee and V. V. Veeravalli, “Energy-efficient quickest change detection in sensor networks,” in Proc. IEEE Stat. Signal Process. Workshop (SSP), Ann Arbor, MI, USA, Aug. 2012, pp. 636–639.
[14] Y. Mei, “Quickest detection in censoring sensor networks,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), St. Petersburg, Russia, Jul. 2011, pp. 2148–2152.
[15] A. G. Tartakovsky and V. V. Veeravalli, “Quickest change detection in distributed sensor systems,” in Proc. 6th IEEE Int. Conf. Inf. Fusion, Cairns, QLD, Australia, Jul. 2003, pp. 756–763.
[16] G. Mateos and G. B. Giannakis, “Distributed recursive least-squares: Stability and performance analysis,” IEEE Trans. Signal Process., vol. 60, no. 7, pp. 3740–3754, Jul. 2012.
[17] L. Li, J. A. Chambers, C. G. Lopes, and A. H. Sayed, “Distributed estimation over an adaptive incremental network based on the affine projection algorithm,” IEEE Trans. Signal Process., vol. 58, no. 1, pp. 151–164, Jan. 2010.
[18] A. H. Sayed, S.-Y. Tu, J. Chen, X. Zhao, and Z. J. Towfic, “Diffusion strategies for adaptation and learning over networks: An examination of distributed strategies and network behavior,” IEEE Signal Process. Mag., vol. 30, no. 3, pp. 155–171, May 2013.
[19] P. Braca, S. Marano, V. Matta, and P. Willett, “Asymptotic optimality of running consensus in testing binary hypotheses,” IEEE Trans. Signal Process., vol. 58, no. 2, pp. 814–825, Feb. 2010.
[20] F. S. Cattivelli and A. H. Sayed, “Distributed detection over adaptive networks using diffusion adaptation,” IEEE Trans. Signal Process., vol. 59, no. 5, pp. 1917–1932, May 2011.
[21] S. Das and J. M. F. Moura, “Distributed Kalman filtering with dynamic observations consensus,” IEEE Trans. Signal Process., vol. 63, no. 17, pp. 4458–4473, Sep. 2015.
[22] J. Chen and A. H. Sayed, “Diffusion adaptation strategies for distributed optimization and learning over networks,” IEEE Trans. Signal Process., vol. 60, no. 8, pp. 4289–4305, Aug. 2012.
[23] J. Du and Y.-C. Wu, “Distributed clock skew and offset estimation in wireless sensor networks: Asynchronous algorithm and convergence analysis,” IEEE Trans. Wireless Commun., vol. 12, no. 11, pp. 5908–5917, Nov. 2013.
[24] A. K. Sahu, S. Kar, J. M. F. Moura, and H. V. Poor, “Distributed constrained recursive nonlinear least-squares estimation: Algorithms and asymptotics,” IEEE Trans. Signal Inf. Process. Over Netw., vol. 2, no. 4, pp. 426–441, Dec. 2016.
[25] S. Das and J. M. F. Moura, “Consensus+innovations distributed Kalman filter with optimized gains,” IEEE Trans. Signal Process., vol. 65, no. 2, pp. 467–481, Jan. 2017.
[26] A. K. Sahu and S. Kar, “Recursive distributed detection for composite hypothesis testing: Nonlinear observation models in additive Gaussian noise,” IEEE Trans. Inf. Theory, vol. 63, no. 8, pp. 4797–4828, Aug. 2017.
[27] A. G. Tartakovsky and V. V. Veeravalli, “Asymptotically optimal quickest change detection in distributed sensor systems,” Sequential Anal., vol. 27, no. 4, pp. 441–475, Oct. 2008.
[28] V. V. Veeravalli, “Decentralized quickest change detection,” IEEE Trans. Inf. Theory, vol. 47, no. 4, pp. 1657–1665, May 2001.
[29] O. Hadjiliadis, H. Zhang, and H. V. Poor, “One shot schemes for decentralized quickest change detection,” IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3346–3359, Jul. 2009.
[30] G. V. Moustakides, “Decentralized CUSUM change detection,” in Proc. 9th Int. Conf. Inf. Fusion, Florence, Italy, Jul. 2006, pp. 1–6.
[31] L. Zacharias and R. Sundaresan, “Decentralized sequential change detection using physical layer fusion,” IEEE Trans. Wireless Commun., vol. 7, no. 12, pp. 4999–5008, Dec. 2008.
[32] D. Li, L. Lai, and S. Cui, “Quickest change detection and identification across a sensor array,” in Proc. IEEE Glob. Conf. Signal Inf. Process. (GlobalSIP), Austin, TX, USA, Dec. 2013, pp. 145–148.
[33] T. Banerjee, V. Sharma, V. Kavitha, and A. JayaPrakasam, “Generalized analysis of a distributed energy efficient algorithm for change detection,” IEEE Trans. Wireless Commun., vol. 10, no. 1, pp. 91–101, Jan. 2011.
[34] P. Braca, S. Marano, V. Matta, and P. Willett, “Consensus-based Page’s test in sensor networks,” Signal Process., vol. 91, no. 4, pp. 919–930, Apr. 2011.
[35] S. S. Stankovic, N. Ilic, M. S. Stankovic, and K. H. Johansson, “Distributed change detection based on a consensus algorithm,” IEEE Trans. Signal Process., vol. 59, no. 12, pp. 5686–5697, Dec. 2011.
[36] D. Li, S. Kar, J. M. F. Moura, H. V. Poor, and S. Cui, “Distributed Kalman filtering over massive data sets: Analysis through large deviations of random Riccati equations,” IEEE Trans. Inf. Theory, vol. 61, no. 3, pp. 1351–1372, Mar. 2015.
[37] D. Li, S. Kar, F. E. Alsaadi, A. M. Dobaie, and S. Cui, “Distributed Kalman filtering with quantized sensing state,” IEEE Trans. Signal Process., vol. 63, no. 19, pp. 5180–5193, Oct. 2015.
[38] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications. New York, NY, USA: Springer, 1998.
[39] J. A. Bucklew, Large Deviation Techniques in Decision, Simulation, and Estimation. New York, NY, USA: Wiley, 1990.
[40] D. Bajovic, D. Jakovetic, J. Xavier, B. Sinopoli, and J. M. F. Moura, “Distributed detection via Gaussian running consensus: Large deviations asymptotic analysis,” IEEE Trans. Signal Process., vol. 59, no. 9, pp. 4381–4396, Sep. 2011.
[41] D. Jakovetić, J. M. F. Moura, and J. Xavier, “Distributed detection over noisy networks: Large deviations analysis,” IEEE Trans. Signal Process., vol. 60, no. 8, pp. 4306–4320, Aug. 2012.
[42] A. K. Sahu and S. Kar, “Distributed sequential detection for Gaussian shift-in-mean hypothesis testing,” IEEE Trans. Signal Process., vol. 64, no. 1, pp. 89–103, Jan. 2016.
[43] H. V. Poor and O. Hadjiliadis, Quickest Detection. Cambridge, U.K.: Cambridge Univ. Press, 2008.
[44] A. G. Tartakovsky and V. V. Veeravalli, “General asymptotic Bayesian theory of quickest change detection,” Theory Probability Appl., vol. 49, no. 3, pp. 458–497, 2005.
[45] A. N. Shiryaev, “On optimum methods in quickest detection problems,” Theory Probability Appl., vol. 8, no. 1, pp. 22–46, 1963.
[46] A. N. Shiryaev, Optimal Stopping Rules. New York, NY, USA: Springer-Verlag, 1978.
[47] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, “Randomized gossip algorithms,” IEEE Trans. Inf. Theory, vol. 52, no. 6, pp. 2508–2530, Jun. 2006.
[48] N. McKeown, A. Mekkittikul, V. Anantharam, and J. Walrand, “Achieving 100% throughput in an input-queued switch,” IEEE Trans. Commun., vol. 47, no. 8, pp. 1260–1267, Aug. 1999.
[49] B. K. Driver. Introduction to Stochastic Processes II. Accessed: Oct. 6, 2016. [Online]. Available: http://www.math.ucsd.edu/~bdriver/math180C_S2011/Lecture%20Notes/180Lec6b.pdf
[50] M. Woodroofe, Nonlinear Renewal Theory in Sequential Analysis. Philadelphia, PA, USA: SIAM, 1982.
[51] Y. S. Chow and T. L. Lai, “Some one-sided theorems on the tail distribution of sample sums with applications to the last time and largest excess of boundary crossings,” Trans. Amer. Math. Soc., vol. 208, pp. 51–72, Jul. 1975.
Di Li (S’13) received the B.Eng. degree in automation engineering and the M.S. degree in information and communication engineering from the Beijing University of Posts and Telecommunications, Beijing, China, in 2008 and 2011, respectively, and the Ph.D. degree in electrical and computer engineering from Texas A&M University, College Station, TX, USA, in 2017. He is currently a Senior Engineer with Unicore Communications Technology Corporation, Fremont, CA, USA. His current research interests include signal processing on large-scale systems, distributed estimation and detection, and quickest detection.

Soummya Kar (S’05–M’10) received the B.Tech. degree in electronics and electrical communication engineering from the Indian Institute of Technology Kharagpur, Kharagpur, India, in 2005 and the Ph.D. degree in electrical and computer engineering from Carnegie Mellon University, Pittsburgh, PA, USA, in 2010. From 2010 to 2011, he was with the Electrical Engineering Department, Princeton University, Princeton, NJ, USA, as a Post-Doctoral Research Associate. He is currently an Associate Professor of electrical and computer engineering with Carnegie Mellon University. His current research interests include decision-making in large-scale networked systems, stochastic systems, multiagent systems and data science, with applications to cyber-physical systems and smart energy systems.

Shuguang Cui (S’99–M’05–SM’12–F’14) received the Ph.D. degree in electrical engineering from Stanford University, Stanford, CA, USA, in 2005. He has been working as an Assistant, an Associate, and a Full Professor in electrical and computer engineering with the University of Arizona, Tucson, AZ, USA, and Texas A&M University, College Station, TX, USA. He is currently a Child Family Endowed Chair Professor in electrical and computer engineering with the University of California at Davis, Davis, CA, USA. His current research interests include data driven large-scale system control and resource management, large data set analysis, IoT system design, energy harvesting-based communication system design, and cognitive network optimization.

Dr. Cui was recognized as a Thomson Reuters Highly Cited Researcher and was listed among the World’s Most Influential Scientific Minds by ScienceWatch in 2014. He was a recipient of the IEEE Signal Processing Society 2012 Best Paper Award and the Amazon AWS Machine Learning Award in 2018. He has served as the General Co-Chair and the TPC Co-Chair for many IEEE conferences. He has also been serving as the Area Editor for the IEEE Signal Processing Magazine and an Associate Editor for the IEEE TRANSACTIONS ON BIG DATA, the IEEE TRANSACTIONS ON SIGNAL PROCESSING, the IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS Series on Green Communications and Networking, and the IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS. He has been an elected member of the IEEE Signal Processing Society SPCOM Technical Committee from 2009 to 2014 and the elected Chair of the IEEE ComSoc Wireless Technical Committee from 2017 to 2018. He is a member of the Steering Committee for the IEEE TRANSACTIONS ON BIG DATA and the IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING. He is also a member of the IEEE ComSoc Emerging Technology Committee. He was elected as an IEEE ComSoc Distinguished Lecturer in 2014.