IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 44, NO. 7, NOVEMBER 1998

Information Bounds and Quick Detection of Parameter Changes in Stochastic Systems


Tze Leung Lai
Abstract: By using information-theoretic bounds and sequential hypothesis testing theory, this paper provides a new approach to optimal detection of abrupt changes in stochastic systems. This approach not only generalizes previous work in the literature on optimal detection far beyond the relatively simple models treated there, but also suggests alternative performance criteria which are more tractable and more appropriate for general stochastic systems. In addition, it leads to detection rules which have manageable computational complexity for on-line implementation and yet are nearly optimal under the different performance criteria considered.

Index Terms: Composite moving average schemes, GLR detectors, Kullback-Leibler information, sequential detection.

I. INTRODUCTION

THE problem of quick detection, with low false-alarm rate, of abrupt changes in a stochastic system on the basis of sequential observations from the system has many important applications, including industrial quality control, automated fault detection in controlled dynamical systems, segmentation of signals, and gain updating in adaptive algorithms. The goals of this paper are to provide a general optimality theory for detection problems and to develop detection rules which are asymptotically optimal and yet are not too demanding in computational and memory requirements for on-line implementation. As noted in the recent monograph [2], there is a large literature on detection algorithms in complex stochastic systems but relatively little work on the statistical properties and optimality theory of detection procedures beyond very simple models.

When the observations $X_t$, $t = 1, 2, \ldots$, are independent with a common density function $f$ for $t < \nu$ and with another common density function $g$ for $t \ge \nu$, Shiryayev [15] formulated the problem of optimal sequential detection of the change-time $\nu$ in a Bayesian framework by putting a geometric prior distribution on $\nu$ and assuming a loss of $c$ for each observation taken at or after $\nu$ and a loss of $1$ for a false alarm before $\nu$. He used optimal stopping theory to show that the Bayes rule triggers an alarm as soon as the posterior probability that a change has occurred exceeds some fixed level. Yakir [20] generalized the result to finite-state Markov chains, while Bojdecki [3] considered a somewhat different loss function and
Manuscript received June 8, 1996; revised December 19, 1997. This work was supported by the National Science Foundation under Grant DMS-9403794. The author is with the Department of Statistics, Stanford University, Stanford, CA 94305 USA. Publisher Item Identifier S 0018-9448(98)07361-1.

used optimal stopping theory to find the Bayes rule. For more general prior distributions on $\nu$ or non-Markovian stochastic systems, the optimal stopping problem associated with the Bayes detection rule becomes intractable. Instead of trying to solve the optimal stopping problem directly, our approach is to first develop an asymptotic lower bound for the detection delay subject to a false-alarm probability not exceeding $\alpha$, and then to find an on-line detection procedure that attains this lower bound asymptotically as $\alpha \to 0$. The details are given in Section II.

The false-alarm probability constraint requires a prior distribution on $\nu$ for its formulation. An alternative formulation which is more commonly adopted is the average run length (ARL) constraint that the expected duration to false alarm be at least $\gamma$. Again in the simple setting considered by Shiryayev but without the prior distribution on $\nu$, Lorden [8] showed that subject to this ARL constraint, the CUSUM procedure proposed by Page [11] asymptotically minimizes the worst case detection delay defined in (2) below as $\gamma \to \infty$. Lorden's method is to relate the CUSUM (cumulative sum) procedure to certain one-sided sequential probability ratio tests which are optimal for testing $f$ versus $g$. Instead of studying the optimal detection problem via sequential testing theory, Moustakides [9] was able to formulate the worst case detection delay problem subject to an ARL constraint as an optimal stopping problem and to prove that Page's CUSUM rule is a solution to the optimal stopping problem. Ritov [14] later gave a somewhat simpler proof. However, for general stochastic systems, the corresponding optimal stopping problems are prohibitively difficult.

By using a change-of-measure argument and the law of large numbers for log-likelihood ratio statistics, we develop in Section II an asymptotic lower bound for the worst case detection delay in general stochastic systems subject to an ARL constraint. When the post-change distribution is completely specified, this lower bound can be asymptotically attained by a likelihood-based CUSUM or moving average procedure. When there are unknown parameters in the post-change distribution, we propose in Section III two modifications of the CUSUM procedure that also attain the same asymptotic lower bound as in the case of known parameters. One is a window-limited generalized likelihood ratio procedure, first introduced by Willsky and Jones [19], with a suitably chosen window size. Another modification is to replace the generalized likelihood ratio statistics in the Willsky-Jones scheme by mixture likelihood ratio statistics. The choice of the window size and the threshold in the Willsky-Jones procedure has been a long-


standing problem (cf. [2, p. 287]), and Section III addresses this problem. The use of a suitably chosen window in the generalized likelihood ratio scheme is not only needed to make the procedure computationally feasible but is also important for ensuring a prescribed false-alarm rate (for a given prior distribution of $\nu$) or a prescribed duration to false alarm. We give in Section II an alternative constraint in the form that the probability of a false alarm within a period of length $m_\alpha$ is at most $\alpha$, irrespective of when the period starts. For a wide range of values (depending on $\alpha$) of $m_\alpha$, it is shown that this constraint implies an asymptotic lower bound for the detection delay when the change point occurs at the beginning of the period, and that the window-limited likelihood ratio CUSUM and generalized/mixture likelihood ratio rules with window size and threshold of the order of magnitude $|\log\alpha|$ satisfy this constraint and attain the asymptotic lower bound. This result is shown to imply the asymptotic optimality of these procedures with respect to the worst case detection delay under the ARL constraint and with respect to the Bayesian detection delay under a Bayesian false-alarm probability constraint. It also provides important insights into how the window size in the Willsky-Jones procedure should be chosen. Section IV considers some examples and applications, and reports a simulation study of the performance of these window-limited rules and several other rules in the literature for fault detection in linear dynamic systems.

II. INFORMATION BOUNDS AND OPTIMAL DETECTION THEORY

Let $X_1, X_2, \ldots$ be independent random variables such that $X_1, \ldots, X_{\nu-1}$ have a common density function $f$ and $X_\nu, X_{\nu+1}, \ldots$ have a common density function $g$. We shall use $P_\nu$ to denote such a probability measure (with change time $\nu$) and use $P_\infty$ to denote the case $\nu = \infty$ (no change point). Define the cumulative sum (CUSUM) rule

$$N = \inf\Big\{ n \ge 1 : \max_{1 \le k \le n} \sum_{i=k}^{n} \log\frac{g(X_i)}{f(X_i)} \ge c \Big\}, \qquad (1)$$

where $c$ is so chosen that $E_\infty(N) = \gamma$. Here and in the sequel we define $\inf \emptyset = \infty$. Moustakides [9] and Ritov [14] showed that (1) minimizes
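For readers who prefer pseudocode to formulas, the following is a minimal sketch (ours, not part of the paper) of how the CUSUM rule (1) can be run on-line, using the standard $O(1)$-per-observation recursion $W_n = \max(W_{n-1}, 0) + \log\{g(X_n)/f(X_n)\}$, which is equivalent to taking the maximum over $k$ in (1). The Gaussian densities and all numerical values below are illustrative assumptions.

```python
import numpy as np

def cusum_alarm_time(x, log_lr, c):
    """Return the first n at which the CUSUM statistic crosses c, else None.

    x      : 1-D array of observations
    log_lr : function computing log[g(x)/f(x)] for a single observation
    c      : alarm threshold (in the paper, chosen so that E_inf(N) = gamma)
    """
    w = 0.0
    for n, xn in enumerate(x, start=1):
        # Recursion equivalent to the max over k in (1): restart whenever
        # the running sum drifts below zero.
        w = max(w, 0.0) + log_lr(xn)
        if w >= c:
            return n
    return None

# Illustrative example (assumed, not from the paper): detect a mean shift
# from N(0,1) to N(1,1); then log[g(x)/f(x)] = x - 1/2.
rng = np.random.default_rng(0)
nu = 200  # change time
x = np.concatenate([rng.normal(0.0, 1.0, nu - 1), rng.normal(1.0, 1.0, 300)])
print(cusum_alarm_time(x, lambda v: v - 0.5, c=5.0))
```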

$$\bar{E}_1(N) = \sup_{\nu \ge 1}\ \operatorname{ess\,sup}\ E_\nu\big[(N - \nu + 1)^+ \,\big|\, X_1, \ldots, X_{\nu-1}\big] \qquad (2)$$

over all rules $N$ with $E_\infty(N) \ge \gamma$. Earlier, Lorden [8] proved that this optimality property holds asymptotically as $\gamma \to \infty$ and that

$$\inf\{\bar{E}_1(N) : E_\infty(N) \ge \gamma\} \sim \frac{\log\gamma}{I(g, f)}, \qquad (3)$$

where $I(g, f) = \int \{\log[g(x)/f(x)]\}\, g(x)\, dx$ is the relative entropy (or Kullback-Leibler information number).

In this section we generalize Lorden's asymptotic theory far beyond the above setting of independent and identically distributed (i.i.d.) $X_t$ before, and after, some change-time $\nu$. The approach of Lorden [8] and of the subsequent refinements in [9] and [14] depends heavily on the i.i.d. structure and is difficult to generalize to dependent and nonstationary $X_t$. The extension of Lorden's method and results by Bansal and Papantoni-Kazakos [1] to the case of $X_t$ stationary ergodic before, and after, $\nu$ uses ergodic theory and involves strong assumptions that require independence between $(X_1, \ldots, X_{\nu-1})$ and $(X_\nu, X_{\nu+1}, \ldots)$. We use a different approach which is simpler and more general than that of [8] and [1]. More importantly, our approach, which involves a change-of-measure argument similar to that introduced in [4] for sequential hypothesis testing, provides new insights into the relationship between the constraint $E_\infty(N) \ge \gamma$ and the worst case detection delay (2) that involves the essential supremum over $\nu$ and the random variables $X_1, \ldots, X_{\nu-1}$.

Suppose that under $P_\infty$, the conditional density function of $X_t$ given $X_1, \ldots, X_{t-1}$ is $f_t(\cdot \mid X_1, \ldots, X_{t-1})$ for every $t$, and that under $P_\nu$, the conditional density function is $f_t$ for $t < \nu$ and is $g_t$ for $t \ge \nu$. Let

$$Z_i = \log\frac{g_i(X_i \mid X_1, \ldots, X_{i-1})}{f_i(X_i \mid X_1, \ldots, X_{i-1})}. \qquad (4)$$

A natural generalization of the CUSUM rule (1) is

$$N = \inf\Big\{ n \ge 1 : \max_{1 \le k \le n} \sum_{i=k}^{n} Z_i \ge c \Big\}. \qquad (5)$$

We shall assume that $n^{-1} \sum_{i=\nu}^{\nu+n-1} Z_i$ converges in probability under $P_\nu$ to some positive constant $I$. Noting that this holds in the i.i.d. case with $I = I(g, f)$, we can regard $I$ as the Kullback-Leibler information number for the two joint distributions. The change-of-measure argument below explains why $I$ plays a central role in optimal detection theory.

A. Generalization of Lorden's Asymptotic Theory

To generalize (3) beyond the i.i.d. setting, we need the following assumption on the $Z_i$ defined in (4): for every $\delta > 0$,

$$\lim_{n \to \infty}\ \sup_{\nu \ge 1}\ \operatorname{ess\,sup}\ P_\nu\Big\{ \max_{t \le n} \sum_{i=\nu}^{\nu+t} Z_i \ge I(1+\delta)\, n \ \Big|\ X_1, \ldots, X_{\nu-1} \Big\} = 0. \qquad (6)$$

As in Lorden's definition (2) of $\bar{E}_1(N)$, assumption (6) involves conditioning on $X_1, \ldots, X_{\nu-1}$ and taking the essential supremum (which is the least upper bound, except on an event with probability $0$) of a random variable (which is the conditional probability). It also involves some positive constant $I$, which reduces to $I(g, f)$ in the i.i.d. case. Which $Z_i$ satisfy (6) will be discussed further in Section IV in connection with some examples and applications.
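Since $I(g, f)$ drives all the bounds that follow, the following LaTeX snippet (our worked example, not from the paper) computes it for an assumed Gaussian mean-shift model and spells out the resulting bound (3).

```latex
% Worked example (assumed): i.i.d. Gaussian observations with a mean shift,
% f = N(0, \sigma^2) before the change and g = N(\mu, \sigma^2) after it.
\[
  \log\frac{g(x)}{f(x)} = \frac{\mu x}{\sigma^2} - \frac{\mu^2}{2\sigma^2},
  \qquad
  I(g,f) = \int \log\frac{g(x)}{f(x)}\, g(x)\,dx = \frac{\mu^2}{2\sigma^2}.
\]
% By (3), the minimal worst case detection delay subject to E_inf(N) >= gamma is
\[
  \inf\{\bar{E}_1(N) : E_\infty(N) \ge \gamma\}
  \;\sim\; \frac{\log\gamma}{\mu^2/(2\sigma^2)}
  \;=\; \frac{2\sigma^2 \log\gamma}{\mu^2}.
\]
```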

Theorem 1: Suppose that (6) holds for some positive constant $I$. Then as $\gamma \to \infty$,

$$\inf\{\bar{E}_1(N) : E_\infty(N) \ge \gamma\} \ge \big(I^{-1} + o(1)\big) \log\gamma, \qquad (7)$$

where $\bar{E}_1(N)$ is defined in (2).

Proof: Let $m$ be a positive integer. If $E_\infty(N) \ge \gamma$, then for some $\nu \ge 1$,

$$P_\infty\{N < \nu + m \mid X_1, \ldots, X_{\nu-1}\} \le m/\gamma \quad \text{with positive probability on } \{N \ge \nu\}, \qquad (8)$$

because otherwise $P_\infty\{N < \nu + m \mid X_1, \ldots, X_{\nu-1}\} > m/\gamma$ a.s. on $\{N \ge \nu\}$ for every $\nu$; applying this at $\nu = 1, m+1, 2m+1, \ldots$ in succession gives $P_\infty\{N > km\} < (1 - m/\gamma)^k$, implying that

$$E_\infty(N) \le m \sum_{k=0}^{\infty} P_\infty\{N > km\} < m \sum_{k=0}^{\infty} (1 - m/\gamma)^k = \gamma,$$

a contradiction.

To prove (7), let $0 < \delta < 1$ and let $m$ be the largest integer $\le (1-\delta)(1+\delta)^{-1} I^{-1} \log\gamma$. Suppose $E_\infty(N) \ge \gamma$. Then we can choose $\nu$ (which depends on $N$) satisfying (8). We first show that $p_\nu \to 0$ as $\gamma \to \infty$, where

$$p_\nu = P_\nu\{N < \nu + m \mid X_1, \ldots, X_{\nu-1}\} \qquad (9)$$

for the chosen $\nu$ and every realization of $X_1, \ldots, X_{\nu-1}$ in the event in (8). Let $\mathcal{F}_n$ be the $\sigma$-field generated by $X_1, \ldots, X_n$ and let $P_\nu^{(n)}$ be the restriction of $P_\nu$ to $\mathcal{F}_n$. Then $dP_\nu^{(\nu+m-1)}/dP_\infty^{(\nu+m-1)} = \exp(\sum_{i=\nu}^{\nu+m-1} Z_i)$, noting that the two measures agree on $\mathcal{F}_{\nu-1}$. Because $\{N < \nu + m\} \in \mathcal{F}_{\nu+m-1}$ and $e^{I(1+\delta)m} \le \gamma^{1-\delta}$, it then follows that for all large $\gamma$,

$$p_\nu \le \gamma^{1-\delta}\, P_\infty\{N < \nu + m \mid X_1, \ldots, X_{\nu-1}\} + P_\nu\Big\{ \max_{t < m} \sum_{i=\nu}^{\nu+t} Z_i \ge I(1+\delta)\, m \ \Big|\ X_1, \ldots, X_{\nu-1} \Big\}. \qquad (10)$$

By (8), the first summand in (10) is at most $m \gamma^{-\delta} \to 0$, since $m$ is the largest integer $\le (1-\delta)(1+\delta)^{-1} I^{-1} \log\gamma$. Moreover, since (6) holds, the second summand in (10) also tends to $0$. Combining this with (9) yields $p_\nu \to 0$ and, therefore,

$$E_\nu[(N - \nu + 1)^+ \mid X_1, \ldots, X_{\nu-1}] \ge m\, (1 - p_\nu) = (1 + o(1))\, m \qquad (11)$$

on the event in (8), as $\gamma \to \infty$. Since $\delta$ is arbitrary, it then follows that

$$E_\nu[(N - \nu + 1)^+ \mid X_1, \ldots, X_{\nu-1}] \ge \big(I^{-1} + o(1)\big) \log\gamma \quad \text{on the event in (8).} \qquad (12)$$

Note that the $o(1)$ term in (12) is uniform over all stopping rules $N$ with $E_\infty(N) \ge \gamma$, since the upper bounds in (10) and (11) do not depend on $N$. Since $\bar{E}_1(N) \ge \operatorname{ess\,sup} E_\nu[(N - \nu + 1)^+ \mid X_1, \ldots, X_{\nu-1}]$ and the event in (8) has positive probability, (7) follows from (12).

In Theorem 1 and its proof, the baseline ARL constraint $E_\infty(N) \ge \gamma$ implies the asymptotic lower bound for $E_\nu[(N - \nu + 1)^+ \mid X_1, \ldots, X_{\nu-1}]$ only for some unspecified $\nu$, which is related to the constraint via (8). Because of this, we have to take the supremum over $\nu$, as in the performance criterion $\sup_{\nu \ge 1} E_\nu(N - \nu \mid N \ge \nu)$, which was proposed by Liu and Blostein [7] and earlier by Pollak [13] to quantify detection delay in lieu of Lorden's more conservative performance criterion (2). Instead of conditioning on $\{N \ge \nu\}$, which depends on the detection rule chosen, Lorden's worst case detection delay (2) conditions on the random variables $X_1, \ldots, X_{\nu-1}$ and takes the essential supremum over them and over $\nu$. This essential supremum appears in the conclusion (7) and the assumption (6) of Theorem 1. The asymptotic lower bound for $\sup_{\nu \ge 1} E_\nu(N - \nu \mid N \ge \nu)$ subject to $E_\infty(N) \ge \gamma$ generalizes the results of Pollak [13] for independent $X_t$ and of Yakir [20] for finite-state Markov chains. It will be shown in Theorem 4 that the CUSUM rule (5) with suitably chosen threshold and certain window-limited modifications thereof attain this asymptotic lower bound.

B. Information Bounds Under Alternative Performance Criteria

As pointed out above, Lorden's asymptotic theory and its generalization in Theorem 1 give an asymptotic lower bound for the expected delay $E_\nu[(N - \nu + 1)^+ \mid X_1, \ldots, X_{\nu-1}]$ only at the $\nu$ that maximizes the expected delay. If we want an asymptotic lower bound for the expected delay at any given $\nu$, the proof of Theorem 1 suggests that the baseline ARL constraint should be replaced by the probability constraint $P_\infty\{N < \nu + m \mid N \ge \nu\} \le m/\gamma$ with $m$ the largest integer $\le (1-\delta)(1+\delta)^{-1} I^{-1} \log\gamma$; see (8). Indeed, under this probability constraint, we can use the same arguments as in (9) and (11) to show that

$$E_\nu(N - \nu + 1 \mid N \ge \nu) \ge \big(I^{-1} + o(1)\big) \log\gamma \quad \text{as } \gamma \to \infty. \qquad (13)$$

However, the conditional probability

$$P_\infty\{N < \nu + m \mid N \ge \nu\} = P_\infty\{\nu \le N < \nu + m\} / P_\infty\{N \ge \nu\}$$

is difficult to evaluate when the denominator $P_\infty\{N \ge \nu\}$ is small. Ignoring the denominator leads to a simpler probability constraint of the form $P_\infty\{\nu \le N < \nu + m\} \le \alpha$. We shall require this bound to hold for all $\nu$ and some $m = m_\alpha$ that depends only on $\alpha$, i.e.,

$$\sup_{\nu \ge 1} P_\infty\{\nu \le N < \nu + m_\alpha\} \le \alpha, \quad \text{where } \liminf_{\alpha \to 0} \frac{m_\alpha}{|\log\alpha|} > I^{-1} \text{ but } \log m_\alpha = o(|\log\alpha|) \text{ as } \alpha \to 0. \qquad (14)$$

The next theorem gives a uniform (over $\nu$) asymptotic lower bound for $E_\nu(N - \nu + 1 \mid N \ge \nu)$ under the probability constraint (14) and the following relaxation of assumption (6): for every $\delta > 0$, as $n \to \infty$,

$$\sup_{\nu \ge 1}\ P_\nu\Big\{ \max_{t \le n} \sum_{i=\nu}^{\nu+t} Z_i \ge I(1+\delta)\, n \Big\} \to 0. \qquad (15)$$

It will be shown later that certain window-limited modifications of the CUSUM rule (5) attain the asymptotic lower bound for the detection delay subject to the probability constraint (14).

Theorem 2: Suppose that (14) and (15) hold for some positive constant $I$. Then as $\alpha \to 0$,

$$E_\nu(N - \nu + 1 \mid N \ge \nu) \ge \big(I^{-1} + o(1)\big) |\log\alpha| \quad \text{uniformly in } \nu \ge 1. \qquad (16)$$

Proof: For any $\nu$, define $p_\nu$ by (9) with $m$ replaced by the largest integer $\le (1-\delta)(1+\delta)^{-1} I^{-1} |\log\alpha|$. Then the same change-of-measure argument as in (10) shows that for all sufficiently small $\alpha$,

$$p_\nu \le e^{I(1+\delta) m}\, \alpha + P_\nu\Big\{ \max_{t < m} \sum_{i=\nu}^{\nu+t} Z_i \ge I(1+\delta)\, m \Big\}$$

by (14), since $m \le m_\alpha$ for all small $\alpha$. Moreover, as in (11), (15) implies that $p_\nu \to 0$ as $\alpha \to 0$, uniformly in $\nu$. Hence

$$E_\nu(N - \nu + 1 \mid N \ge \nu) \ge m (1 - p_\nu) = \big(I^{-1} + o(1)\big) |\log\alpha|,$$

where the $o(1)$ term is uniform in $\nu$. Since $\delta$ can be arbitrarily small, (16) follows.

A Bayesian alternative to the ARL constraint $E_\infty(N) \ge \gamma$ is the false-alarm probability constraint

$$\sum_{\nu=1}^{\infty} \pi_\nu\, P_\nu\{N < \nu\} \le \alpha, \qquad (17)$$

where $\pi = (\pi_1, \pi_2, \ldots)$ is a probability measure on the positive integers. Interpreting $\pi$ as the prior distribution of the change time $\nu$, the left-hand side of (17) is the probability $P^\pi\{N < \nu\}$ of a false alarm. The following theorem, whose proof is given in the Appendix, gives an asymptotic lower bound for the detection delay subject to (17), and Theorem 4 below shows that the CUSUM rule (5) with suitably chosen threshold attains the lower bound under certain conditions.

Theorem 3: Suppose that (15) holds for some positive constant $I$. Let $\pi$ be a probability measure on the positive integers such that $\pi\{n : n \ge m\} > 0$ for all $m$ and $\log \pi\{n : n \ge m\} = o(m)$ as $m \to \infty$. Let $N$ be a detection rule satisfying the probability constraint (17). Then as $\alpha \to 0$,

$$E^\pi(N - \nu \mid N \ge \nu) \ge \big(I^{-1} + o(1)\big) |\log\alpha|. \qquad (18)$$
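Before comparing the three false-alarm constraints, we note that the probability in (14) rarely has a closed form. The following small Monte Carlo sketch (our illustration, not from the paper) estimates $\sup_\nu P_\infty\{\nu \le N < \nu + m_\alpha\}$ for the CUSUM rule (1) in the Gaussian mean-shift example used earlier; all numerical settings are assumptions.

```python
import numpy as np

def cusum_time(x, c, mu=1.0):
    """Alarm time of the CUSUM rule (1) for N(0,1) -> N(mu,1)."""
    w = 0.0
    for n, v in enumerate(x, start=1):
        w = max(w, 0.0) + mu * v - mu**2 / 2  # log likelihood ratio increment
        if w >= c:
            return n
    return len(x) + 1  # censored at the simulation horizon

rng = np.random.default_rng(1)
c, m_alpha, horizon, reps = 5.0, 50, 2000, 400
# Simulate under P_inf (no change): all observations are N(0,1).
alarms = np.array([cusum_time(rng.normal(size=horizon), c) for _ in range(reps)])

# Empirical P_inf{nu <= N < nu + m_alpha} over a grid of nu, and its maximum.
probs = [np.mean((alarms >= nu) & (alarms < nu + m_alpha))
         for nu in range(1, horizon - m_alpha, m_alpha)]
print(max(probs))
```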

Among the three false-alarm constraints (14), (17), and $E_\infty(N) \ge \gamma$ considered in Theorems 1-3, (14) can be regarded as the most stringent. Suppose a detection rule $N$ satisfies (14). Then for $k = 1, 2, \ldots$,

$$P_\infty\{N < k m_\alpha\} \le \sum_{j=1}^{k} P_\infty\{(j-1) m_\alpha \le N < j m_\alpha\} \le k\alpha. \qquad (19)$$

Hence $E_\infty(N) \ge m_\alpha \sum_{k \ge 1} P_\infty\{N \ge k m_\alpha\} \ge m_\alpha \sum_{k=1}^{\lfloor 1/(2\alpha) \rfloor} (1 - k\alpha)$ and, therefore, as $\alpha \to 0$,

$$E_\infty(N) \ge \{1 + o(1)\}\, 3 m_\alpha / (8\alpha). \qquad (20)$$

Note that the asymptotic lower bound in Theorem 1 involves $\gamma$ only through $\log\gamma$, and $\log\{3 m_\alpha/(8\alpha)\} \sim |\log\alpha|$ since $\log m_\alpha = o(|\log\alpha|)$. From (19), it also follows that

$$P^\pi\{N < \nu\} = \sum_{\nu=1}^{\infty} \pi_\nu\, P_\infty\{N < \nu\} \le \alpha \sum_{\nu=1}^{\infty} \pi_\nu \lceil \nu/m_\alpha \rceil \le \alpha \{1 + E^\pi(\nu)/m_\alpha\},$$

i.e., (17) holds with $\alpha\{1 + E^\pi(\nu)/m_\alpha\}$ in place of $\alpha$. When $E^\pi(\nu) < \infty$, $\alpha\{1 + E^\pi(\nu)/m_\alpha\} \sim \alpha$ as $\alpha \to 0$, since $m_\alpha \to \infty$, and the asymptotic lower bound (18) involves $\alpha$ only through $|\log\alpha|$. In the sequel we shall therefore focus on the constraint (14), which is the most stringent among the three false-alarm constraints discussed above.

C. Window-Limited CUSUM and Moving Average Rules

Let $\tilde{m}_\alpha$ be positive integers such that

$$\liminf_{\alpha \to 0} \tilde{m}_\alpha / m_\alpha > 1 \quad \text{but} \quad \log \tilde{m}_\alpha = o(|\log\alpha|) \quad \text{as } \alpha \to 0. \qquad (21)$$

Consider the probability constraint (14). To begin with, suppose the $X_t$ are independent with common density function $f$ for $t < \nu$ and common density function $g$ for $t \ge \nu$. For the CUSUM rule (1),

$$P_\infty\{\nu \le N < \nu + m_\alpha\} \le \sum_{n=\nu}^{\nu+m_\alpha-1} P_\infty\Big\{ \max_{1 \le k \le n} \sum_{i=k}^{n} \log\frac{g(X_i)}{f(X_i)} \ge c \Big\} = \sum_{n=\nu}^{\nu+m_\alpha-1} P_\infty\Big\{ \max_{1 \le k \le n} \sum_{i=1}^{k} \log\frac{g(X_i)}{f(X_i)} \ge c \Big\} \le m_\alpha e^{-c}, \qquad (22)$$

where the second relation follows from the fact that $(X_n, \ldots, X_1)$ has the same distribution under $P_\infty$ as $(X_1, \ldots, X_n)$, and the last inequality is a consequence of Doob's submartingale inequality (cf. [17]) applied to the nonnegative martingale $\exp\{\sum_{i=1}^{k} \log[g(X_i)/f(X_i)]\}$, which has mean $1$ under $P_\infty$. Hence (1) satisfies (14) if $c \ge \log(m_\alpha/\alpha)$.

When the $X_t$ are not i.i.d. under $P_\infty$, the time reversal argument in (22) breaks down and the CUSUM rule (5) need not satisfy (14). To circumvent this difficulty, we replace $\max_{1 \le k \le n}$ in (5) by $\max_{n - \tilde{m}_\alpha < k \le n}$, leading to the window-limited CUSUM rule

$$\tilde{N} = \inf\Big\{ n \ge 1 : \max_{n - \tilde{m}_\alpha < k \le n} \sum_{i=k}^{n} Z_i \ge c \Big\}. \qquad (23)$$

The next theorem, whose proof is given in the Appendix, shows that $\tilde{N}$ with $c = \log\{(\tilde{m}_\alpha + m_\alpha)/\alpha\}$ satisfies (14) and that it attains the asymptotic lower bound (16) under the condition: for every $\delta > 0$,

$$\lim_{n \to \infty}\ \sup_{\nu \ge 1}\ \operatorname{ess\,sup}\ P_\nu\Big\{ n^{-1} \sum_{i=\nu}^{\nu+n-1} Z_i \le I - \delta \ \Big|\ X_1, \ldots, X_{\nu-1} \Big\} = 0. \qquad (24)$$

It also shows that under (24) the rules (23) with suitably chosen thresholds attain the asymptotic lower bounds for detection delay in Theorems 1 and 3.

Theorem 4:
i) For the detection rule $\tilde{N}$ defined in (23),

$$\sup_{\nu \ge 1} P_\infty\{\nu \le \tilde{N} < \nu + m_\alpha\} \le (\tilde{m}_\alpha + m_\alpha)\, e^{-c};$$

hence $\tilde{N}$ satisfies (14) if $c = \log\{(\tilde{m}_\alpha + m_\alpha)/\alpha\}$. If, in addition, $\tilde{m}_\alpha$ satisfies (21) and (15) and (24) hold for some positive constant $I$, then as $\alpha \to 0$,

$$E_\nu(\tilde{N} - \nu + 1 \mid \tilde{N} \ge \nu) \le \big(I^{-1} + o(1)\big) |\log\alpha| \quad \text{uniformly in } \nu \ge 1.$$

ii) If (24) holds for some positive constant $I$ and the window size satisfies (21), then with $c$ so chosen that $E_\infty(\tilde{N}) \ge \gamma$ and $c \sim \log\gamma$, $\bar{E}_1(\tilde{N}) \le \big(I^{-1} + o(1)\big) \log\gamma$ as $\gamma \to \infty$.

iii) Let $\pi$ be a probability measure on the positive integers such that $E^\pi(\nu) < \infty$. Then the Bayesian probability constraint (17) holds for $\tilde{N}$ with $\alpha\{1 + E^\pi(\nu)/m_\alpha\}$ in place of $\alpha$ if $c$ is so chosen that $(\tilde{m}_\alpha + m_\alpha)\, e^{-c} \le \alpha$.

If (24) holds for some positive constant $I$ and $\tilde{m}_\alpha$ satisfies (21), then

$$E^\pi(\tilde{N} - \nu \mid \tilde{N} \ge \nu) \le \big(I^{-1} + o(1)\big) |\log\alpha| \quad \text{as } \alpha \to 0.$$

The window-limited CUSUM rule (23) with suitably chosen threshold $c$ is therefore asymptotically optimal under the different performance criteria of Theorems 1-3. Note that (23) can be written as a composite $\tilde{N} = \min_{1 \le k \le \tilde{m}_\alpha} N_{(k)}$ of $\tilde{m}_\alpha$ moving average rules, where

$$N_{(k)} = \inf\Big\{ n \ge k : \sum_{i=n-k+1}^{n} Z_i \ge c \Big\}. \qquad (25)$$

The proof of Theorem 4 i) in the Appendix shows that those $N_{(k)}$ with $k$ of the same order of magnitude as $m_\alpha$ are the most crucial in ensuring the asymptotic optimality of $\tilde{N}$. Moving average rules will be discussed further at the end of Section IV.

In practice, $g_t$ usually involves unknown parameters that make it impossible to determine the $Z_i$ and the optimal window in advance. As will be shown in the next section, replacing the likelihood ratio statistics (4) by mixture or generalized likelihood ratio statistics to handle unknown parameters leads to detection rules that are asymptotically as efficient as the window-limited CUSUM rules which assume knowledge of the unknown parameters.
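To make the window-limited recursion concrete, the following is a minimal sketch (ours, not part of the paper) of the rule (23), using its composite moving average representation; the log-likelihood ratio increments and all settings are illustrative assumptions.

```python
import numpy as np

def window_limited_cusum(z, window, c):
    """Window-limited CUSUM (23): alarm at the first n for which
    max over the last `window` starting points k of sum_{i=k}^n z_i >= c.

    z : 1-D array of log-likelihood ratio statistics Z_i as in (4)
    """
    s = np.concatenate([[0.0], np.cumsum(z)])   # s[n] = Z_1 + ... + Z_n
    for n in range(1, len(z) + 1):
        k_lo = max(0, n - window)               # restrict to n - window < k <= n
        if s[n] - s[k_lo:n].min() >= c:
            return n
    return None

# Illustrative use (assumed model): Z_i = X_i - 1/2 for a N(0,1) -> N(1,1)
# mean shift occurring at nu = 150.
rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(size=149), rng.normal(1.0, 1.0, 200)])
print(window_limited_cusum(x - 0.5, window=60, c=5.0))
```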

III. ASYMPTOTICALLY OPTIMAL COMPOSITE MOVING AVERAGE RULES IN THE PRESENCE OF UNKNOWN PARAMETERS

In practice, the post-change distribution often involves unknown parameters. Although the setting of a completely known post-change distribution considered in Section II seems simplistic, the optimal detection theory developed in that setting provides benchmarks and ideas for the development of detection rules in the presence of unknown parameters. In particular, suitable modifications of the likelihood ratio statistics $\sum_{i=k}^{n} Z_i$ in (25) to handle these unknown parameters will be shown to provide detection rules that attain the above asymptotic lower bounds for detection delay.

A. Rules Based on Mixture Likelihood Ratios

Instead of a known conditional density function $g_t$ for $X_t$, suppose that one has a parametric family of conditional density functions $f_{\theta,t}(\cdot \mid X_1, \ldots, X_{t-1})$, so that the baseline distribution corresponds to the parameter value $\theta_0$ and the conditional distribution after the change time $\nu$ corresponds to some other element $\theta$ of the parameter space $\Theta$. As in Section II, we let $P_\infty$ denote the case $\nu = \infty$. Unlike Section II, the value of $\theta$ is not assumed to be known. We shall use $P_{\nu,\theta}$ (instead of $P_\nu$) to denote the probability measure with change time $\nu$ and changed parameter $\theta$. Let $w$ be a probability distribution on $\Theta$ and define the mixture likelihood ratio statistics

$$\Lambda_{k,n} = \int_\Theta \prod_{i=k}^{n} \frac{f_{\theta,i}(X_i \mid X_1, \ldots, X_{i-1})}{f_{\theta_0,i}(X_i \mid X_1, \ldots, X_{i-1})}\, dw(\theta).$$

Throughout this section we shall let $\tilde{m}_\alpha$ be positive integers such that

$$\tilde{m}_\alpha / |\log\alpha| \to \infty \quad \text{but} \quad \log \tilde{m}_\alpha = o(|\log\alpha|) \quad \text{as } \alpha \to 0. \qquad (26)$$

Define the window-limited mixture likelihood ratio rule

$$N_w = \inf\Big\{ n \ge 1 : \max_{n - \tilde{m}_\alpha < k \le n} \log \Lambda_{k,n} \ge c \Big\} \qquad (27)$$

with suitably chosen $c$. The following lemma shows that $N_w$ with $c = \log\{(\tilde{m}_\alpha + m_\alpha)/\alpha\}$ satisfies the probability constraint (14) and therefore also (17) and the ARL constraint, in view of (20).

Lemma 1: For the rule $N_w$ defined in (27),

$$\sup_{\nu \ge 1} P_\infty\{\nu \le N_w < \nu + m_\alpha\} \le (\tilde{m}_\alpha + m_\alpha)\, e^{-c},$$

so that $N_w$ satisfies (14) if $c = \log\{(\tilde{m}_\alpha + m_\alpha)/\alpha\}$.

Proof: Let $\mathcal{F}_n$ be the $\sigma$-field generated by $X_1, \ldots, X_n$. Since $\{\Lambda_{k,n}, \mathcal{F}_n, n \ge k\}$ is a nonnegative martingale under $P_\infty$ with $E_\infty(\Lambda_{k,n}) = 1$ for every $n \ge k$, it follows from Doob's submartingale inequality (cf. [17]) that $P_\infty\{\Lambda_{k,n} \ge e^c \text{ for some } n \ge k\} \le e^{-c}$. Hence

$$P_\infty\{\nu \le N_w < \nu + m_\alpha\} \le \sum_{k=\nu-\tilde{m}_\alpha+1}^{\nu+m_\alpha-1} P_\infty\{\Lambda_{k,n} \ge e^c \text{ for some } n \ge k\} \le (\tilde{m}_\alpha + m_\alpha)\, e^{-c}.$$
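To make the mixture statistic concrete, here is a small sketch (ours, not from the paper) of $\Lambda_{k,n}$ for an assumed i.i.d. $N(\theta, 1)$ family with baseline $\theta_0 = 0$ and a discrete mixing distribution; the closed form used in the code follows from the Gaussian likelihood ratio, and all numerical values are assumptions.

```python
import numpy as np

def mixture_lr(x, thetas, weights):
    """Mixture likelihood ratios Lambda_{k,n} for every window ending at
    n = len(x), under i.i.d. N(theta, 1) with baseline theta_0 = 0:
    Lambda_{k,n} = sum_j w_j * exp(theta_j * S - t * theta_j**2 / 2),
    where t = n - k + 1 and S = x_k + ... + x_n."""
    n = len(x)
    s = np.cumsum(x[::-1])            # s[t-1] = sum of the last t observations
    t = np.arange(1, n + 1)           # window lengths
    lam = np.zeros(n)
    for th, w in zip(thetas, weights):
        lam += w * np.exp(th * s - t * th**2 / 2.0)
    return lam                        # lam[t-1] = Lambda_{n-t+1, n}

# Illustrative use (assumed values): a three-point mixing distribution.
rng = np.random.default_rng(3)
x = rng.normal(0.7, 1.0, 40)
lam = mixture_lr(x, thetas=[0.5, 1.0, 2.0], weights=[1/3, 1/3, 1/3])
print(lam.max() >= np.exp(4.0))       # alarm check against threshold c = 4
```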

Let

$$Z_i(\theta) = \log\frac{f_{\theta,i}(X_i \mid X_1, \ldots, X_{i-1})}{f_{\theta_0,i}(X_i \mid X_1, \ldots, X_{i-1})}$$

and assume that under $P_{\nu,\theta}$, $n^{-1} \sum_{i=\nu}^{\nu+n-1} Z_i(\theta)$ converges in probability to some positive constant, which we shall denote by $I(\theta)$. The following theorem, whose proof is given in the Appendix, shows that $N_w$ attains the asymptotic lower bounds for detection delay in Theorems 1 and 2 under an assumption analogous to (24). Note that since $\tilde{m}_\alpha / |\log\alpha| \to \infty$ as $\alpha \to 0$, the choice of $c$ in Lemma 1 satisfies $c \sim |\log\alpha|$.

Theorem 5: Suppose that for every $\delta > 0$ and $\theta \in \Theta$ there exist $\epsilon > 0$ and an open neighborhood $B \subset \Theta$ of $\theta$ with $w(B) \ge \epsilon$ such that

$$\lim_{n \to \infty}\ \sup_{\nu \ge 1}\ \operatorname{ess\,sup}\ P_{\nu,\theta}\Big\{ \inf_{\lambda \in B} \sum_{i=\nu}^{\nu+n-1} Z_i(\lambda) \le \big(I(\theta) - \delta\big)\, n \ \Big|\ X_1, \ldots, X_{\nu-1} \Big\} = 0. \qquad (28)$$

Suppose $c = c_\alpha$ in (27) satisfies $c_\alpha \sim |\log\alpha|$ as $\alpha \to 0$. Then

$$E_{\nu,\theta}(N_w - \nu + 1 \mid N_w \ge \nu) \le \big(I^{-1}(\theta) + o(1)\big) |\log\alpha| \qquad (29)$$

uniformly in $\nu \ge 1$, and

$$\sup_{\nu \ge 1}\ \operatorname{ess\,sup}\ E_{\nu,\theta}\big[(N_w - \nu + 1)^+ \mid X_1, \ldots, X_{\nu-1}\big] \le \big(I^{-1}(\theta) + o(1)\big) |\log\alpha|. \qquad (30)$$

B. Window-Limited Generalized Likelihood Ratio Detection Rules

A more commonly used alternative to the mixture likelihood ratio statistic $\Lambda_{k,n}$ for testing $\theta = \theta_0$ versus $\theta \ne \theta_0$ based on $X_k, \ldots, X_n$ is the generalized likelihood ratio (GLR) statistic

$$\hat{\Lambda}_{k,n} = \sup_{\theta \in \Theta}\ \prod_{i=k}^{n} \frac{f_{\theta,i}(X_i \mid X_1, \ldots, X_{i-1})}{f_{\theta_0,i}(X_i \mid X_1, \ldots, X_{i-1})}.$$

Replacing $\Lambda_{k,n}$ by the GLR statistic $\hat{\Lambda}_{k,n}$ in (27) leads to the window-limited GLR rule

$$\hat{N} = \inf\Big\{ n : \max_{n - \tilde{m}_\alpha < k \le n - p} \log \hat{\Lambda}_{k,n} \ge c \Big\}. \qquad (31)$$

Detection rules of this type were first introduced by Willsky and Jones [19]. The minimal delay $p$ is used to avoid difficulties with GLR statistics when $n - k$ is small. For example, in the case of a normal density with unknown mean and variance, we need at least two observations to define uniquely the maximum likelihood estimate of the parameter. Since the window-limited CUSUM rule (23) attains the asymptotic lower bounds in Theorems 1-3, $\hat{N}$ should also attain these asymptotic lower bounds if it satisfies the same false-alarm constraint. We next consider the choice of $c$ so that $\hat{N}$ satisfies the probability constraint (14) which, as pointed out earlier, is the most stringent of the false-alarm constraints in Theorems 1-3.

To analyze the probability in (14) for window-limited GLR rules, suppose that $\Theta$ is a compact $d$-dimensional submanifold of the Euclidean space $\mathbf{R}^{\bar{d}}$ and that the log-likelihood $\ell_{k,n}(\theta) = \sum_{i=k}^{n} \log f_{\theta,i}(X_i \mid X_1, \ldots, X_{i-1})$ is twice continuously differentiable in $\theta$. Let $\nabla$ and $\nabla^2$ denote the gradient vector and Hessian matrix, respectively, and let $\Theta^\circ$ denote the interior of $\Theta$. For $n - k \ge p$, let $\hat{\theta}_{k,n}$ be the maximum likelihood estimate of $\theta$ based on $X_k, \ldots, X_n$. It will be assumed that $\theta_0$ and $\hat{\theta}_{k,n}$ belong to $\Theta^\circ$. If $\hat{\theta}_{k,n} \in \Theta^\circ$, then $\nabla \ell_{k,n}(\hat{\theta}_{k,n}) = 0$ and $\nabla^2 \ell_{k,n}(\hat{\theta}_{k,n})$ is negative semidefinite. This yields the quadratic approximation

$$\ell_{k,n}(\theta) - \ell_{k,n}(\hat{\theta}_{k,n}) \approx \tfrac{1}{2}\, (\theta - \hat{\theta}_{k,n})'\, \nabla^2 \ell_{k,n}(\hat{\theta}_{k,n})\, (\theta - \hat{\theta}_{k,n})$$

when $\theta$ is near $\hat{\theta}_{k,n}$, which is commonly used to derive the limiting chi-square distributions of GLR statistics. Let $\lambda_{\max}(A)$ denote the largest eigenvalue of a symmetric matrix $A$. To ensure that the quadratic approximation is adequate and that $\lambda_{\max}\{-\nabla^2 \ell_{k,n}(\theta)\}$ is not too large in a small neighborhood of $\hat{\theta}_{k,n}$, we modify $\hat{N}$ as follows: when $\hat{N}$ triggers an alarm at time $n$, the alarm is accepted only if these two requirements hold at the maximizing $k$; otherwise, sampling is continued. Denote the modified rule by $\hat{N}^*$. (32)

Lemma 2: Assume that $\Theta$ is a compact $d$-dimensional submanifold of $\mathbf{R}^{\bar{d}}$ and let $L(\Theta)$ denote its Lebesgue measure. Define $\hat{N}^*$ by (32) with $c = c_\alpha = |\log\alpha| + (1 + d/2) \log(\tilde{m}_\alpha + m_\alpha) + \log L(\Theta)$. Then $c_\alpha \sim |\log\alpha|$ as $\alpha \to 0$, and (14) holds for all sufficiently small $\alpha$.

The proof of Lemma 2 is given in the Appendix. Examples and refinements of the window-limited GLR rule, together with simulation studies and recursive algorithms for their implementation, are given in [5] and [6].
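Here is a sketch (ours, not from the paper) of a window-limited GLR rule in the spirit of (31), under an assumed i.i.d. $N(\theta, 1)$ model with baseline $\theta_0 = 0$ and unknown scalar post-change mean $\theta$; the closed form $\log\hat{\Lambda}_{k,n} = S^2/(2t)$ for window sum $S$ and window length $t$ follows by maximizing $\theta S - t\theta^2/2$ over $\theta$. Function names and numerical settings are assumptions.

```python
import numpy as np

def window_limited_glr(x, window, p, c):
    """Window-limited GLR rule in the spirit of (31) for i.i.d. N(theta,1)
    observations with baseline theta_0 = 0 and unknown post-change mean."""
    s = np.concatenate([[0.0], np.cumsum(x)])   # prefix sums of observations
    for n in range(p + 1, len(x) + 1):
        # candidate change points k with n - window < k <= n - p
        for k in range(max(1, n - window + 1), n - p + 1):
            t = n - k + 1                        # window length
            if (s[n] - s[k - 1]) ** 2 / (2.0 * t) >= c:
                return n
    return None

# Illustrative use: the mean shifts from 0 to an unknown 0.8 at nu = 150.
rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(size=149), rng.normal(0.8, 1.0, 300)])
print(window_limited_glr(x, window=80, p=1, c=6.0))
```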

IV. EXAMPLES AND APPLICATIONS

We first discuss the assumptions (6) and (24) in Theorems 1 and 4. Suppose that $\{X_t\}$ is a Markov chain with transition density function $f(\cdot \mid \cdot)$ for $t < \nu$ and $g(\cdot \mid \cdot)$ for $t \ge \nu$, with respect to some $\sigma$-finite measure $\mu$ on the state space. In this case $Z_i = \log\{g(X_i \mid X_{i-1}) / f(X_i \mid X_{i-1})\}$, and (6) and (24) reduce to: for every $\delta > 0$,

$$\lim_{n \to \infty} \sup_{x} P_g\Big\{ \max_{t \le n} \sum_{i=1}^{t} Z_i \ge I(1+\delta)\, n \,\Big|\, X_0 = x \Big\} = 0, \qquad \lim_{n \to \infty} \sup_{x} P_g\Big\{ \sum_{i=1}^{n} Z_i \le (I - \delta)\, n \,\Big|\, X_0 = x \Big\} = 0, \qquad (33)$$

for every initial state $x$. Suppose that the transition density function $g$ is uniformly recurrent in the sense that there exist a probability measure $\varphi$ on the state space and a constant $0 < \beta \le 1$ such that

$$\beta\, \varphi(A) \le \int_A g(y \mid x)\, d\mu(y) \le \beta^{-1} \varphi(A) \qquad (34)$$

for all measurable subsets $A$ and all $x$. Then the Markov chain has a stationary distribution $\pi_g$ under $P_g$, and (33) holds with

$$I = \int\!\!\int \Big\{ \log\frac{g(y \mid x)}{f(y \mid x)} \Big\}\, g(y \mid x)\, d\mu(y)\, d\pi_g(x).$$

In particular, assumptions (6) and (24) are satisfied by finite-state, irreducible chains. Note that assumption (15) in Theorems 2 and 3 is considerably weaker than (6).
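As a small worked instance (ours, not from the paper) of the information number $I$ in (33), the following LaTeX snippet evaluates it for an assumed two-state chain.

```latex
% Illustrative two-state example (assumed): state space {0,1}, each row of the
% pre-change transition matrix equal to (0.9, 0.1) and each row of the
% post-change matrix equal to (0.6, 0.4), so that \pi_g = (0.6, 0.4).
\[
  I \;=\; \sum_{x \in \{0,1\}} \pi_g(x) \sum_{y \in \{0,1\}}
          g(y \mid x)\, \log\frac{g(y \mid x)}{f(y \mid x)}
    \;=\; 0.6 \log\frac{0.6}{0.9} + 0.4 \log\frac{0.4}{0.1}
    \;\approx\; 0.311 .
\]
```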

Suppose that $f_\theta(\cdot \mid \cdot)$ is the transition density function of a Markov chain $\{X_t\}$, where the parameter assumes the value $\theta_0$ before the change time $\nu$ and the value $\theta$ at and after $\nu$. The parameter space $\Theta$ is assumed to be a metric space. Here $Z_i(\theta) = \log\{f_\theta(X_i \mid X_{i-1}) / f_{\theta_0}(X_i \mid X_{i-1})\}$. Suppose that the chain has a stationary distribution $\pi_\theta$ under $P_\theta$ and

$$I(\theta) = \int\!\!\int \Big\{ \log\frac{f_\theta(y \mid x)}{f_{\theta_0}(y \mid x)} \Big\}\, f_\theta(y \mid x)\, d\mu(y)\, d\pi_\theta(x),$$

and that (33) (with $g$ replaced by $f_\theta$) holds. Assume further that

$$\int\!\!\int \Big\{ \inf_{\lambda \in B(\theta, r)} \log\frac{f_\lambda(y \mid x)}{f_{\theta_0}(y \mid x)} \Big\}\, f_\theta(y \mid x)\, d\mu(y)\, d\pi_\theta(x) \to I(\theta) \quad \text{as } r \to 0, \qquad (35)$$

where $B(\theta, r)$ denotes a ball with center $\theta$ and radius $r$. From (35), it follows that the limit of $n^{-1} \inf_{\lambda \in B(\theta,r)} \sum_{i=\nu}^{\nu+n-1} Z_i(\lambda)$ can be made arbitrarily close to $I(\theta)$ by choosing $r$ small. Using this together with Markov's inequality and (33), it follows that for every $\delta > 0$, there exists $r > 0$ such that (28) holds with $B = B(\theta, r)$, noting that in the present Markov case, (28) reduces to an unconditional probability statement given the initial state. To fulfill the assumptions of Theorem 5, assume in addition to (33) and (35) that $w(B) > 0$ for every ball $B$ centered at some $\theta \in \Theta$.

Window-limited GLR rules of the form (31) were introduced by Willsky and Jones [19] in the context of detecting additive changes in linear state-space models. Consider the stochastic system

$$x_{t+1} = F_t x_t + G_t u_t + w_t, \qquad (36a)$$

$$y_t = H_t x_t + \epsilon_t, \qquad (36b)$$

in which the unobservable state vector $x_t$, the input vector $u_t$, and the measurement vector $y_t$ have dimensions $p$, $q$, and $r$, respectively, and $w_t$ and $\epsilon_t$ are independent Gaussian vectors with zero means and $\operatorname{Cov}(w_t) = Q_t$, $\operatorname{Cov}(\epsilon_t) = R_t$. The Kalman filter provides a recursive algorithm to compute the conditional expectation $\hat{x}_{t \mid t-1}$ of the state $x_t$ given the past observations $y_1, \ldots, y_{t-1}$. The innovations

$$e_t = y_t - H_t \hat{x}_{t \mid t-1}$$

are independent zero-mean Gaussian vectors with covariance matrices $V_t = H_t \Sigma_{t \mid t-1} H_t' + R_t$, where $\Sigma_{t \mid t-1}$ is given recursively by the Riccati equations of the Kalman filter. (37)

Suppose that at an unknown time $\nu$ the system undergoes some additive change, in the sense that constant vectors are added to the right-hand side of (36a) and/or (36b) for $t \ge \nu$. Then the innovations are still independent Gaussian vectors with covariance matrices $V_t$, but their means for $t \ge \nu$ are of the form $\rho_t(\nu)\, \theta$ instead of the baseline values $0$, where $\theta$ denotes the vector of additive changes and the $\rho_t(\nu)$ are matrices that can be evaluated recursively when the system matrices are specified up to the unknown parameter $\theta$ (cf. [2, p. 282]). Without assuming prior knowledge of the parameter $\theta$ and the change time $\nu$, the window-limited GLR detector has the form

$$\hat{N} = \inf\Big\{ n : \max_{n - \tilde{m}_\alpha < k \le n} \frac{1}{2} \Big( \sum_{i=k}^{n} \rho_i'(k) V_i^{-1} e_i \Big)' \Big( \sum_{i=k}^{n} \rho_i'(k) V_i^{-1} \rho_i(k) \Big)^{-1} \Big( \sum_{i=k}^{n} \rho_i'(k) V_i^{-1} e_i \Big) \ge c \Big\}, \qquad (38)$$

where the maximum is further restricted to those $k$ for which the matrix whose inverse appears in (38) is nonsingular. Note that the window-limited GLR rule (38) involves $\tilde{m}_\alpha$ parallel recursions, one for each $k$ within a moving window. This can be easily implemented by initializing a new recursion at every stage while deleting the recursion that has been initialized at stage $n - \tilde{m}_\alpha$. Only those recursions initialized at $n - \tilde{m}_\alpha < k \le n$ are used in the GLR detector (38).

We shall assume that $F_t$, $G_t$, $H_t$, $Q_t$, and $R_t$ converge to $F$, $G$, $H$, $Q$, and $R$ exponentially fast and that the Kalman filter is asymptotically stable in the sense that $\Sigma_{t \mid t-1}$ defined in (37) converges exponentially fast to the solution $\Sigma$ of the algebraic Riccati equation. Then $V_t$ converges exponentially fast to $V = H \Sigma H' + R$, and $\rho_t(\nu)$ converges exponentially fast to a limiting matrix $\rho$ as $t - \nu \to \infty$. Under the probability measure $P_{\nu,\theta}$ associated with the change time $\nu$ and the parameter $\theta$, the log-likelihood ratio statistics

$$Z_i(\theta) = \{\rho_i(\nu)\,\theta\}' V_i^{-1} e_i - \tfrac{1}{2}\, \{\rho_i(\nu)\,\theta\}' V_i^{-1} \{\rho_i(\nu)\,\theta\} \qquad (39)$$

are independent normal random variables with means $\frac{1}{2}\, \theta' \rho_i'(\nu) V_i^{-1} \rho_i(\nu)\, \theta$ and variances $\theta' \rho_i'(\nu) V_i^{-1} \rho_i(\nu)\, \theta$ for $i \ge \nu$. Moreover, $E_{\nu,\theta} Z_i(\theta)$ converges exponentially fast to $I(\theta) = \frac{1}{2}\, \theta' \rho' V^{-1} \rho\, \theta$ as $i - \nu \to \infty$, and with $I(\theta)$ thus defined, assumption (15) is satisfied in view of normal tail probability bounds.
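The following compact sketch (ours, not part of the paper) computes the innovations and their covariances underlying (37) and (38) for a time-invariant special case of (36a) and (36b) with no input term; all matrices and settings are illustrative assumptions.

```python
import numpy as np

def kalman_innovations(y, F, H, Q, R, x0, P0):
    """Innovations e_t = y_t - H x_{t|t-1} and covariances V_t for a
    time-invariant, zero-input special case of (36a)-(36b) (a sketch)."""
    x, P = x0, P0                          # one-step-ahead state mean and cov
    innovations, covariances = [], []
    for yt in y:
        V = H @ P @ H.T + R                # innovation covariance
        e = yt - H @ x                     # innovation
        K = P @ H.T @ np.linalg.inv(V)     # Kalman gain
        x = F @ (x + K @ e)                # predict next state mean
        P = F @ (P - K @ H @ P) @ F.T + Q  # Riccati recursion for Sigma_{t+1|t}
        innovations.append(e)
        covariances.append(V)
    return innovations, covariances

# Illustrative 2-dimensional example with assumed matrices.
F = np.array([[0.9, 0.1], [0.0, 0.8]]); H = np.eye(2)
Q = 0.1 * np.eye(2); R = 0.2 * np.eye(2)
rng = np.random.default_rng(5)
y = [rng.normal(size=2) for _ in range(100)]
e, V = kalman_innovations(y, F, H, Q, R, np.zeros(2), np.eye(2))
print(e[-1], V[-1])
```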

TABLE I
AVERAGE RUN LENGTHS OF FOUR DETECTION RULES FOR DIFFERENT WINDOW SIZES

TABLE II
AVERAGE RUN LENGTHS OF THE DETECTION RULES IN TABLE I AT THREE OTHER VALUES OF THE CHANGE VECTOR

Since the $Z_i(\theta)$, $i \ge \nu$, are independent, assumption (6) reduces to (15). Similarly, by independence, assumption (24) reduces in the present setting to

$$P\Big\{ n^{-1} \sum_{i=\nu}^{\nu+n-1} Z_i(\theta) \le I(\theta) - \delta \Big\} \to 0 \quad \text{as } n \to \infty,$$

which is clearly satisfied because of normal tail probability bounds. Therefore, the theory developed in Sections II and III is applicable to the problem of detecting additive changes in linear state-space models. As shown in [6], we can choose $c \sim |\log\alpha|$ so that (14) holds for the GLR rule $\hat{N}$ defined in (38) without modifying it as in Lemma 2 for the general setting. In fact, for linear Gaussian state-space models, [6, Theorem 1] shows that

$$\sup_{\nu \ge 1} P_\infty\{\nu \le \hat{N} < \nu + m_\alpha\} \sim K\, m_\alpha\, c^{d/2-1} e^{-c}$$

as $c \to \infty$ but $\log m_\alpha = o(c)$, where $K$ is a positive constant.

Tables I and II report the results of a simulation study of the performance of the window-limited GLR rule (38) for the problem of detecting additive changes in the state-space model (36a) and (36b), where $x_t$ and $y_t$ are two-dimensional random vectors and $w_t$ and $\epsilon_t$ are independent, zero-mean Gaussian vectors. Here the $\rho_i(k)$ in (38) are matrices that can be computed recursively. We set the minimal delay in (38) so that the matrix whose inverse appears in (38) is invertible for every $k$ in the window, and chose three different window sizes in this study. The tables consider four different values of the vector $\theta$ of additive changes, resulting in four different values of $I(\theta)$. It is assumed that the initial state has the stationary distribution under $P_\infty$. The threshold $c$ is so chosen that $E_\infty(\hat{N})$ equals a prescribed baseline average run length, using Monte Carlo simulations to evaluate the average run lengths.

The tables show that $\hat{N}$ performs well in detecting the changes considered, which is consistent with the asymptotic theory of $\hat{N}$ developed in [6] showing that $\hat{N}$ attains the asymptotic lower bounds for detection delay in Theorems 1 and 2 if $\tilde{m}_\alpha$ satisfies (26) and $c \sim |\log\alpha|$ as $\alpha \to 0$. Note in this connection that in (26) we choose $\tilde{m}_\alpha / |\log\alpha| \to \infty$ so that, for every fixed $\theta$ with $I(\theta) > 0$, the window eventually exceeds $I^{-1}(\theta)|\log\alpha|$. Instead of taking an inordinately large window size $\tilde{m}_\alpha$, which is much larger than $I^{-1}(\theta)|\log\alpha|$ for most $\theta$ and for which the computational complexity of $\hat{N}$ may become unmanageable, [5] and [6] develop a modification of (38) that is not too demanding in computational requirements for on-line implementation and yet is nearly optimal under the performance criteria of Theorems 1 and 2. The basic idea is to generalize the Willsky-Jones window $\{k : n - \tilde{m}_\alpha < k \le n\}$ to a union of windows of increasing lengths $m_1 < m_2 < \cdots$ that grow geometrically, so that a wide range of window sizes is covered by a manageable number of recursions at every stage. Simulation studies and asymptotic properties of this modified version of (38) are given in [6].

Tables I and II also study the performance of the window-limited CUSUM rule (23), which requires specification of the vector $\theta$ whose nominal value is chosen in the tables. In Table I, $\theta$ is correctly specified, and the rule (23) performs well when the window size satisfies (21), which is consistent with Theorem 4 (see condition (21) on the window size). Removing the window restriction in (23) yields the CUSUM rule (5). Although the CUSUM rule (1) has the simple recursive representation

$$W_n = \max(W_{n-1}, 0) + \log\{g(X_n)/f(X_n)\}, \qquad N = \inf\{n : W_n \ge c\}, \qquad (40)$$

the CUSUM rule (5) applied to state-space models cannot be written in such recursive form, because the $Z_i$ in (5) are in fact of the form $Z_i(k)$ (depending on both $i$ and the hypothesized change time $k$) in view of (39), since $e_i$ has mean $\rho_i(k)\,\theta$ for $i \ge k$ under $P_{k,\theta}$. Without recursions like (40), the CUSUM rule (5) at time $n$ involves maximization over $1 \le k \le n$, and the number of computations grows to infinity with $n$. A window-limited modification of (5) is therefore needed for practical implementation. Table II shows that when $\theta$ is misspecified, the window-limited CUSUM rule may perform poorly, and its expected detection delay may even be considerably larger than the baseline average run length. Since $\rho_n(k) \to \rho$ and $V_n \to V$ as $n - k \to \infty$, we can approximate the CUSUM statistics $\sum_{i=k}^{n} Z_i(k)$ for large $n - k$ by

$$\sum_{i=k}^{n} \big\{ \theta' \rho' V^{-1} e_i - \tfrac{1}{2}\, \theta' \rho' V^{-1} \rho\, \theta \big\}. \qquad (41)$$

Replacing $\sum_{i=n-k+1}^{n} Z_i$ by (41) in the moving average rule $N_{(k)}$ defined in (25) yields a rule which is called a moving-window FSS rule in [7], since it applies at every stage a likelihood ratio FSS (fixed sample size) test of the null hypothesis of no change, based on a sample of $k$ observations. In practice, the actual value of $\theta$ is typically unknown, and misspecifying $\theta$ in (41) leads to even longer detection delays than those for the CUSUM rule in Table II. We therefore propose to use, with the same window size $k$, the GLR statistic in lieu of (41), where

$$\text{GLR}_n = \frac{1}{2} \Big( \sum_{i=n-k+1}^{n} \rho' V^{-1} e_i \Big)' \big( k\, \rho' V^{-1} \rho \big)^{-1} \Big( \sum_{i=n-k+1}^{n} \rho' V^{-1} e_i \Big), \qquad (42)$$

leading to the moving average rule

$$N^* = \inf\{ n \ge k : \text{GLR}_n \ge c \}. \qquad (43)$$

Tables I and II also give the performance of (43) and of a somewhat different FSS rule

$$N^{**} = \inf\{ jk : \text{the GLR statistic based on } e_{(j-1)k+1}, \ldots, e_{jk} \text{ is} \ge c,\ j = 1, 2, \ldots \}, \qquad (44)$$

which for simplicity restricts the detection times to integral multiples of $k$, so that nonoverlapping blocks of innovations are used for detection, and which was proposed by Pelkowitz and Schwartz [12] and Nikiforov [10] with likelihood ratio statistics for some prespecified $\theta$ instead of the GLR statistics. In both tables, the average run lengths of (44) have been computed analytically, while those of the other three rules have been computed by Monte Carlo simulations, using 1000 simulations in each case.

V. CONCLUSION

Sections II and III of this paper have extended the optimality theory in sequential change-point detection far beyond the very simple models previously considered in the literature. They also consider new performance criteria and provide a unified approach, via information bounds and window-limited likelihood-based procedures, to develop detection rules with relatively low computational complexity for on-line implementation and to show that they are nevertheless asymptotically optimal under various performance criteria. One such criterion is Bayesian, which has been studied in the literature for certain simple cases by the theory of optimal stopping. Indeed, minimization of the expected detection delay subject to the constraint (17) can be formulated as the optimal stopping problem of choosing the stopping rule $N$ to minimize the expected loss

$$E^\pi\big[(N - \nu + 1)^+ + \lambda\, \mathbf{1}_{\{N < \nu\}}\big], \qquad (45)$$

where $\lambda$ can be regarded as a Lagrange multiplier associated with (17) and $E^\pi$ denotes expectation with respect to the measure under which $\nu$ has distribution $\pi$ and $X_t$ has conditional density $f_t$ if $t < \nu$ and $g_t$ if $t \ge \nu$. This optimal stopping problem, however, is intractable for non-Markovian $X_t$ or complicated prior distributions. Instead of solving the optimal stopping problem directly, Theorem 3 develops an asymptotic lower bound for the detection delay subject to (17), and Theorem 4 shows that the CUSUM rule (5) or its window-limited modification (23) with suitably chosen threshold asymptotically attains this lower bound. This result therefore gives an asymptotic solution to the optimal stopping problem (45), whose exact solution via optimal stopping theory is intractable except in relatively simple cases.

The window-limited GLR rules in Section III can be represented as a composite (25) of moving average rules. Using the representation (25), parallel recursive algorithms are developed in [5] and [6] for on-line implementation of these detection rules in stochastic systems and regression models. Moreover, importance sampling techniques are developed in [6] for efficient Monte Carlo evaluation of the probability in (14) so that the threshold can be suitably chosen to satisfy the probability constraint (14). Furthermore, a refinement of (31) in [5] and [6] using a more flexible range of window sizes enables one to detect efficiently gradual as well as abrupt changes.

APPENDIX

A. Proof of Theorem 3

From (17), it follows that

$$P_\infty\{N < \nu + m\} \le \alpha / \pi\{n : n > \nu + m\} \quad \text{for all } \nu \text{ and } m, \qquad (46)$$

since $\{N < \nu + m\}$ depends only on $X_1, \ldots, X_{\nu+m-1}$ and therefore has the same probability under $P_\infty$ as under $P_n$ for every $n > \nu + m$. Let $0 < \delta < 1$ and let $m$ be the largest integer $\le (1-\delta)(1+\delta)^{-1} I^{-1} |\log\alpha|$. For any $\nu$, the same change-of-measure argument as in (10) shows that for sufficiently small $\alpha$,

$$P_\nu\{\nu \le N < \nu + m\} \le e^{I(1+\delta)m}\, P_\infty\{N < \nu + m\} + P_\nu\Big\{ \max_{t < m} \sum_{i=\nu}^{\nu+t} Z_i \ge I(1+\delta)\, m \Big\} \qquad (47)$$

by Doob's submartingale inequality and the optional sampling theorem (cf. [17]), since $\exp(\sum_{i=\nu}^{n} Z_i)$, $n \ge \nu$, is a nonnegative martingale under $P_\infty$ with mean $1$ (see also (51) below). Given $\epsilon > 0$, (46) and the assumption $\log \pi\{n : n > m\} = o(m)$ imply that for all sufficiently small $\alpha$,

$$e^{I(1+\delta)m}\, P_\infty\{N < \nu + m\} \le \exp\{(1-\delta)|\log\alpha| - |\log\alpha| + \epsilon(\nu + m)\} \le \exp\{-\delta|\log\alpha|/2\}$$

uniformly in $\nu \le \delta|\log\alpha|/(4\epsilon)$. Moreover, from (15), it follows as in (11) that the second summand in (47) tends to $0$ uniformly in $\nu$ as $\alpha \to 0$. Since $\pi$ is a fixed probability measure, $\pi\{\nu > \delta|\log\alpha|/(4\epsilon)\} \to 0$, and $P^\pi\{N \ge \nu\} \ge 1 - \alpha$ by (17). It then follows that

$$E^\pi(N - \nu \mid N \ge \nu) \ge \frac{\sum_\nu \pi_\nu\, m\, \big[ P_\nu\{N \ge \nu\} - P_\nu\{\nu \le N < \nu + m\} \big]}{P^\pi\{N \ge \nu\}} \ge (1 + o(1))\, m.$$

Since $\delta$ and $\epsilon$ can be arbitrarily small, (18) follows.

B. Proof of Theorem 4

We first prove part ii) of the theorem. Let $\mathcal{F}_n$ be the $\sigma$-field generated by $X_1, \ldots, X_n$. Clearly $\tilde{N} \ge N$ for the CUSUM rule $N$ in (5) with the same threshold $c$, and therefore the ARL constraint $E_\infty(\tilde{N}) \ge \gamma$ holds whenever it holds for $N$. To prove that $\bar{E}_1(\tilde{N}) \le (I^{-1} + o(1))\, c$ when (21) and (24) hold, it suffices to show that for any $\delta > 0$ and $n_\delta$ the smallest integer $\ge (1+\delta)\, c / I$ (so that $n_\delta \le \tilde{m}_\alpha$ for all small $\alpha$; see (21)),

$$\sup_{\nu \ge 1}\ \operatorname{ess\,sup}\ P_\nu\{\tilde{N} > \nu + k n_\delta \mid X_1, \ldots, X_{\nu-1}\} \le \epsilon_\alpha^{k}, \quad k = 1, 2, \ldots, \qquad (48)$$

where $\epsilon_\alpha \to 0$ as $\alpha \to 0$. To prove (48), define the stopping times

$$\tau_0 = \nu - 1, \qquad \tau_j = \tau_{j-1} + n_\delta \quad (j \ge 1),$$

and note that on $\{\tilde{N} > \tau_j\}$, the statistic in (23) at time $\tau_j$ is at least $\sum_{i=\tau_{j-1}+1}^{\tau_j} Z_i$, since $n_\delta \le \tilde{m}_\alpha$. By (24),

$$\epsilon_\alpha := \sup_{\nu \ge 1}\ \operatorname{ess\,sup}\ P_\nu\Big\{ \sum_{i=\nu}^{\nu+n_\delta-1} Z_i < c \ \Big|\ X_1, \ldots, X_{\nu-1} \Big\} \to 0, \qquad (49)$$

since $c \le I n_\delta/(1+\delta)$. Then (48) follows for $k = 1, 2, \ldots$ by applying (49) and conditioning on $\mathcal{F}_{\tau_1}, \mathcal{F}_{\tau_2}, \ldots$ in succession (in view of the property that $\mathcal{F}_{\tau_{j-1}}$ is a sub-$\sigma$-field of $\mathcal{F}_{\tau_j}$). Hence for all sufficiently small $\alpha$,

$$\operatorname{ess\,sup}\ E_\nu[(\tilde{N} - \nu + 1)^+ \mid X_1, \ldots, X_{\nu-1}] \le n_\delta \sum_{k=0}^{\infty} \epsilon_\alpha^{k} = (1 + o(1))\, n_\delta, \qquad (50)$$

implying part ii), since $\delta$ is arbitrary and $n_\delta \sim (1+\delta)\, c / I$.

We next prove part i) of the theorem. From (23) it follows that

$$\{\nu \le \tilde{N} < \nu + m_\alpha\} \subset \Big\{ \sum_{i=k}^{n} Z_i \ge c \ \text{for some } \nu - \tilde{m}_\alpha < k \le n \text{ and } \nu \le n < \nu + m_\alpha \Big\}.$$

As in (47), it follows from Doob's submartingale inequality (cf. [17]) that for every $k$,

$$P_\infty\Big\{ \sum_{i=k}^{n} Z_i \ge c \ \text{for some } n \ge k \Big\} \le e^{-c}. \qquad (51)$$

Hence $P_\infty\{\nu \le \tilde{N} < \nu + m_\alpha\} \le (\tilde{m}_\alpha + m_\alpha)\, e^{-c}$ for all $\nu$ and, therefore, (14) holds if $c = \log\{(\tilde{m}_\alpha + m_\alpha)/\alpha\}$. For this choice of $c$, since (14) and (15) hold, (16) holds with $N$ replaced by $\tilde{N}$. Moreover, $c \sim |\log\alpha|$, since $\log \tilde{m}_\alpha = o(|\log\alpha|)$ under (21). Hence, under (24), (48) holds for all sufficiently small $\alpha$, from which it follows that

$$E_\nu(\tilde{N} - \nu + 1 \mid \tilde{N} \ge \nu) \le \big(I^{-1} + o(1)\big) |\log\alpha| \quad \text{uniformly in } \nu \qquad (52)$$

as $\alpha \to 0$, completing the proof of part i).

To prove part iii) of the theorem, first note that, as in (19),

$$P^\pi\{\tilde{N} < \nu\} = \sum_{\nu=1}^{\infty} \pi_\nu\, P_\infty\{\tilde{N} < \nu\} \le \alpha\{1 + E^\pi(\nu)/m_\alpha\},$$

and that (52) yields $E^\pi(\tilde{N} - \nu \mid \tilde{N} \ge \nu) \le (I^{-1} + o(1)) |\log\alpha|$ as $\alpha \to 0$, since the bound in (52) is uniform in $\nu$. Hence part iii) follows.

C. Proof of Theorem 5

The proof of (29) is similar to that of (50), noting that by (28) the mixture likelihood ratio satisfies $\Lambda_{\nu, \nu+n-1} \ge w(B) \exp\{\inf_{\lambda \in B} \sum_{i=\nu}^{\nu+n-1} Z_i(\lambda)\}$, so that the conditional probability of crossing the boundary $c_\alpha \sim |\log\alpha|$ within $(1+\delta)\, I^{-1}(\theta)\, |\log\alpha|$ observations after $\nu$ tends to $1$. Moreover, as in the derivation of (52), (30) follows from (29).

D. Proof of Lemma 2

First note that $\log \hat{\Lambda}_{k,n} = \sup_{\theta \in \Theta}\{\ell_{k,n}(\theta) - \ell_{k,n}(\theta_0)\}$. To analyze $P_\infty\{\nu \le \hat{N}^* < \nu + m_\alpha\}$, we use a change-of-measure argument. Let $P^{(k,\theta)}$ denote the probability measure under which the conditional density of $X_i$ given $X_1, \ldots, X_{i-1}$ is $f_{\theta_0,i}$ for $i < k$ and is $f_{\theta,i}$ for $i \ge k$. Define a measure $Q = \int_\Theta P^{(k,\theta)}\, d\theta$. Since $\Theta$ is compact and therefore has finite Lebesgue measure, $Q$ is a finite measure with total mass $L(\Theta)$. For $n \ge k$, the Radon-Nikodym derivative of the restriction of $Q$ to $\mathcal{F}_n$ relative to the restriction of $P_\infty$ to $\mathcal{F}_n$ is

$$\int_\Theta \exp\{\ell_{k,n}(\theta) - \ell_{k,n}(\theta_0)\}\, d\theta,$$

and Wald's likelihood ratio identity (cf. [16]) can be applied with this derivative. As in (19), we have

$$P_\infty\{\nu \le \hat{N}^* < \nu + m_\alpha\} \le \sum_{k=\nu-\tilde{m}_\alpha+1}^{\nu+m_\alpha-1} P_\infty\{\log \hat{\Lambda}_{k,n} \ge c \text{ and the requirements in (32) hold, for some } n \ge k\}. \qquad (53)$$

For $\lambda \in \Theta$, if $\|\lambda - \hat{\theta}_{k,n}\|$ is sufficiently small, then $\lambda$ and $\hat{\theta}_{k,n}$ belong to $\Theta^\circ$ and, therefore, by Taylor's theorem,

$$\ell_{k,n}(\lambda) - \ell_{k,n}(\hat{\theta}_{k,n}) = \tfrac{1}{2}\, (\lambda - \hat{\theta}_{k,n})'\, \nabla^2 \ell_{k,n}(\theta^*)\, (\lambda - \hat{\theta}_{k,n}),$$

where $\theta^*$ lies on the line segment joining $\lambda$ and $\hat{\theta}_{k,n}$, and, under the requirements in (32), $\lambda_{\max}\{-\nabla^2 \ell_{k,n}(\theta^*)\} \le C(n - k + 1)$ for some $C > 0$. Hence, if $\log \hat{\Lambda}_{k,n} \ge c$ and these requirements hold, then

$$\int_\Theta \exp\{\ell_{k,n}(\theta) - \ell_{k,n}(\theta_0)\}\, d\theta \ge e^{c} \int_{B(\hat{\theta}_{k,n}, r)} \exp\{-\tfrac{1}{2}\, C (n - k + 1) \|\lambda - \hat{\theta}_{k,n}\|^2\}\, d\lambda \ge K\, e^{c} (n - k + 1)^{-d/2}$$

for some $K > 0$, as $n - k \to \infty$. Therefore, by the definition of $\hat{N}^*$, Wald's likelihood ratio identity, and (53), each summand in (53) is at most $K^{-1} (n - k + 1)^{d/2}\, L(\Theta)\, e^{-c}$. Since $n - k + 1 \le \tilde{m}_\alpha + m_\alpha$ and $\log(\tilde{m}_\alpha + m_\alpha) = o(|\log\alpha|)$, (14) holds for all small $\alpha$ if the threshold $c = c_\alpha$ is chosen as in Lemma 2.

REFERENCES

[1] R. K. Bansal and P. Papantoni-Kazakos, "An algorithm for detecting a change in a stochastic process," IEEE Trans. Inform. Theory, vol. IT-32, pp. 227-235, Mar. 1986.
[2] M. Basseville and I. V. Nikiforov, Detection of Abrupt Changes: Theory and Applications. Englewood Cliffs, NJ: Prentice-Hall, 1993.
[3] T. Bojdecki, "Probability maximizing approach to optimal stopping and its application to a disorder problem," Stochastics, vol. 3, pp. 61-71, 1979.
[4] T. L. Lai, "Asymptotic optimality of invariant sequential probability ratio tests," Ann. Statist., vol. 9, pp. 318-333, 1981.
[5] T. L. Lai, "Sequential changepoint detection in quality control and dynamical systems," J. Roy. Statist. Soc. Ser. B, vol. 57, pp. 613-658, 1995.
[6] T. L. Lai and J. Z. Shan, "Efficient recursive algorithms for detection of abrupt changes in signals and control systems," IEEE Trans. Automat. Contr., vol. 44, May 1999, to be published.
[7] Y. Liu and S. D. Blostein, "Quickest detection of an abrupt change in a random sequence with finite change-time," IEEE Trans. Inform. Theory, vol. 40, pp. 1985-1993, Nov. 1994.
[8] G. Lorden, "Procedures for reacting to a change in distribution," Ann. Math. Statist., vol. 42, pp. 1897-1908, 1971.
[9] G. Moustakides, "Optimal procedures for detecting changes in distributions," Ann. Statist., vol. 14, pp. 1379-1387, 1986.
[10] I. V. Nikiforov, "Two strategies in the problem of change detection and isolation," IEEE Trans. Inform. Theory, vol. 43, pp. 770-776, Mar. 1997.
[11] E. S. Page, "Continuous inspection schemes," Biometrika, vol. 41, pp. 100-115, 1954.
[12] L. Pelkowitz and S. C. Schwartz, "Asymptotically optimum sample size for quickest detection," IEEE Trans. Aerosp. Electron. Syst., vol. AES-23, pp. 263-272, Mar. 1987.
[13] M. Pollak, "Optimal detection of a change in distribution," Ann. Statist., vol. 13, pp. 206-227, 1985.
[14] Y. Ritov, "Decision theoretic optimality of the CUSUM procedure," Ann. Statist., vol. 18, pp. 1464-1469, 1990.
[15] A. N. Shiryayev, Optimal Stopping Rules. New York: Springer-Verlag, 1978.
[16] D. Siegmund, Sequential Analysis: Tests and Confidence Intervals. New York: Springer-Verlag, 1985.
[17] D. Williams, Probability with Martingales. Cambridge, U.K.: Cambridge Univ. Press, 1991.
[18] A. S. Willsky, "A survey of design methods for failure detection in dynamic systems," Automatica, vol. 12, pp. 601-611, 1976.
[19] A. S. Willsky and H. L. Jones, "A generalized likelihood ratio approach to detection and estimation of jumps in linear systems," IEEE Trans. Automat. Contr., vol. AC-21, pp. 108-112, Feb. 1976.
[20] B. Yakir, "Optimal detection of a change in distribution when the observations form a Markov chain with a finite state space," in Change-Point Problems, E. Carlstein, H. Müller, and D. Siegmund, Eds. Hayward, CA: Inst. Math. Statist., 1994, pp. 346-358.
