Statistical Signal Processing
Yih-Fang Huang
Department of Electrical Engineering, University of Notre Dame, Notre Dame, Indiana, USA
6.1 Introduction
6.2 Bayesian Estimation
    6.2.1 Minimum Mean-Squared Error Estimation • 6.2.2 Maximum a Posteriori Estimation
6.3 Linear Estimation
6.4 Fisher Statistics
    6.4.1 Likelihood Functions • 6.4.2 Sufficient Statistics • 6.4.3 Information Inequality and Cramér-Rao Lower Bound • 6.4.4 Properties of MLE
6.5 Signal Detection
    6.5.1 Bayesian Detection • 6.5.2 Neyman-Pearson Detection • 6.5.3 Detection of a Known Signal in Gaussian Noise
6.6 Suggested Readings
References
6.1 Introduction
Statistical signal processing is an important subject in signal processing that enjoys a wide range of applications, including communications, control systems, medical signal processing, and seismology. It plays an important role in the design, analysis, and implementation of adaptive filters, such as adaptive equalizers in digital communication systems. It is also the foundation of multiuser detection, which is considered an effective means of mitigating multiple-access interference in spread-spectrum-based wireless communications. The fundamental problems of statistical signal processing are those of signal detection and estimation, which aim to extract information from the received signals and help make decisions. The received signals (especially those in communication systems) are typically modeled as random processes due to the existence of uncertainties. The uncertainties usually arise in the process of signal propagation or due to noise in measurements. Since the signals are considered random, statistical tools are needed, and the techniques used to solve these problems are derived from the principles of mathematical statistics, namely hypothesis testing and estimation.

This chapter presents a brief introduction to some basic concepts critical to statistical signal processing. To begin with, the principles of Bayesian and Fisher statistics relevant to estimation are presented. In the discussion of Bayesian statistics, emphasis is placed on two types of estimators: the minimum mean-squared error (MMSE) estimator and the maximum a posteriori (MAP) estimator. An important class of linear estimators, namely the Wiener filter, is also presented as a linear MMSE estimator. In the discussion of Fisher statistics, emphasis is placed on the maximum likelihood estimator (MLE), the concept of sufficient statistics, and the information inequality, which can be used to measure the quality of an estimator. This chapter also includes a brief discussion of signal detection. The signal detection problem is presented as one of binary hypothesis testing. Both the Bayesian and Neyman-Pearson optimum criteria are presented and shown to be implemented with likelihood ratio tests. The principles of hypothesis testing also find a wide range of applications that involve decision making.
6.2 Bayesian Estimation
Bayesian estimation methods are generally employed to estimate a random variable (or parameter) based on another random variable that is usually observable. Derivations of Bayesian estimation algorithms depend critically on the a posteriori distribution of the underlying signals (or signal parameters) to be estimated. Those a posteriori distributions are obtained by employing Bayes' rule, thus the name Bayesian estimation.

Consider a random variable X that is some function of another random variable S. In practice, X is what is observed, and S is the signal (or signal parameter) that is to be estimated. Denote the estimate as \hat{S}(X) and the error as e \triangleq S - \hat{S}(X). Generally, the cost is defined as a function of the estimation error, which clearly depends on both X and S; thus J(e) = J(S, X). The objective of Bayesian estimation is to minimize the Bayes' risk \mathcal{R}, which is defined as the ensemble average (i.e., the expected value) of the cost. In particular, the Bayes' risk is defined by:

\mathcal{R} \triangleq E\{J(e)\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} J(s, x)\, f_{S,X}(s, x)\, dx\, ds,

where f_{S,X}(s, x) is the joint probability density function (pdf) of the random variables S and X. In practice, the joint pdf is not directly obtainable. However, by Bayes' rule:

f_{S,X}(s, x) = f_{S|X}(s|x)\, f_X(x),   (6.1)

and the a posteriori pdf can be used to facilitate the derivation of Bayesian estimates. With the a posteriori pdf, the Bayes' risk can now be expressed as:

\mathcal{R} = \int_{-\infty}^{\infty} \Big[ \int_{-\infty}^{\infty} J(s, x)\, f_{S|X}(s|x)\, ds \Big] f_X(x)\, dx.   (6.2)

Because the cost function is, in general, nonnegative and so are the pdfs, minimizing the Bayes' risk is equivalent to minimizing:

\int_{-\infty}^{\infty} J(s, x)\, f_{S|X}(s|x)\, ds.   (6.3)

Depending on how the cost function is defined, the Bayesian estimation principle leads to different kinds of estimation algorithms. Two of the most commonly used cost functions are the following:

J_{MS}(e) = |e|^2.   (6.4a)

J_{MAP}(e) = \begin{cases} 0 & \text{if } |e| < \Delta/2 \\ 1 & \text{if } |e| > \Delta/2 \end{cases}, \quad \text{where } \Delta \ll 1.   (6.4b)

These example cost functions result in two popular estimators, namely, the minimum mean-squared error (MMSE) estimator and the maximum a posteriori (MAP) estimator. These two estimators are discussed in more detail below.

6.2.1 Minimum Mean-Squared Error Estimation

When the cost function is defined to be the mean-squared error (MSE), as in equation 6.4a, the Bayesian estimate can be derived by substituting J(e) = |e|^2 = (s - \hat{s}(x))^2 into equation 6.2. Hence the following is true:

\mathcal{R}_{MS} = \int_{-\infty}^{\infty} \Big[ \int_{-\infty}^{\infty} (s - \hat{s}(x))^2 f_{S|X}(s|x)\, ds \Big] f_X(x)\, dx.   (6.5)

Denote the resulting estimate as \hat{s}_{MS}(x), and use the same argument that leads to equation 6.3. A necessary condition for minimizing \mathcal{R}_{MS} results as:

\frac{\partial}{\partial \hat{s}} \int_{-\infty}^{\infty} (s - \hat{s}(x))^2 f_{S|X}(s|x)\, ds = 0.   (6.6)

The differentiation in equation 6.6 is evaluated at \hat{s} = \hat{s}_{MS}(x). Consequently:

\int_{-\infty}^{\infty} (s - \hat{s}_{MS}(x))\, f_{S|X}(s|x)\, ds = 0;

hence:

\hat{s}_{MS}(x) = \int_{-\infty}^{\infty} s\, f_{S|X}(s|x)\, ds = E\{S|x\}.   (6.7)

In essence, the estimate that minimizes the MSE is the conditional-mean estimate. If the a posteriori distribution is Gaussian, then the conditional-mean estimator is a linear function of X, regardless of the functional relation between S and X. The following theorem summarizes a very familiar result.

Theorem
Let X = [X_1, X_2, \ldots, X_K]^T and S be jointly Gaussian with zero means. The MMSE estimate of S based on X is E[S|X], and E[S|X] = \sum_{i=1}^{K} a_i X_i, where the a_i are chosen such that E[(S - \sum_{i=1}^{K} a_i X_i) X_j] = 0 for any j = 1, 2, \ldots, K.

6.2.2 Maximum a Posteriori Estimation

If the cost function is defined as in equation 6.4b, the Bayes' risk is as follows:

\mathcal{R}_{MAP} = \int_{-\infty}^{\infty} f_X(x) \Big[ 1 - \int_{\hat{s}_{MAP} - \Delta/2}^{\hat{s}_{MAP} + \Delta/2} f_{S|X}(s|x)\, ds \Big] dx.

To minimize \mathcal{R}_{MAP}, we maximize \int_{\hat{s}_{MAP} - \Delta/2}^{\hat{s}_{MAP} + \Delta/2} f_{S|X}(s|x)\, ds. When \Delta is extremely small (as is required), this is equivalent to maximizing f_{S|X}(s|x). Thus, \hat{s}_{MAP}(x) is the value of s that maximizes f_{S|X}(s|x). It is often more convenient (for algebraic manipulation) to consider \ln f_{S|X}(s|x), since \ln(\cdot) is a monotone nondecreasing function of its argument. A necessary condition for maximizing \ln f_{S|X}(s|x) is as written here:

\frac{\partial}{\partial s} \ln f_{S|X}(s|x) \Big|_{s = \hat{s}_{MAP}(x)} = 0.   (6.8)

Equation 6.8 is often referred to as the MAP equation. Employing Bayes' rule, the MAP equation can also be written as:

\Big[ \frac{\partial}{\partial s} \ln f_{X|S}(x|s) + \frac{\partial}{\partial s} \ln f_S(s) \Big]_{s = \hat{s}_{MAP}(x)} = 0.   (6.9)

Example (van Trees, 1968)
Let X_i, i = 1, 2, \ldots, K, be a sequence of random variables modeled as follows:

X_i = S + N_i, \quad i = 1, 2, \ldots, K,

where S is a zero-mean Gaussian random variable with variance \sigma_s^2 and \{N_i\} is a sequence of independent and identically distributed (iid) zero-mean Gaussian random variables with variance \sigma_n^2. Denote X = [X_1\; X_2\; \cdots\; X_K]^T. Then:

f_{X|S}(x|s) = \prod_{i=1}^{K} \frac{1}{\sqrt{2\pi\sigma_n^2}} \exp\Big[ -\frac{(x_i - s)^2}{2\sigma_n^2} \Big],

f_S(s) = \frac{1}{\sqrt{2\pi\sigma_s^2}} \exp\Big[ -\frac{s^2}{2\sigma_s^2} \Big],

f_{S|X}(s|x) = \frac{f_{X|S}(x|s)\, f_S(s)}{f_X(x)} = \frac{1}{f_X(x)} \frac{1}{\sqrt{2\pi}\,\sigma_s} \Big[ \prod_{i=1}^{K} \frac{1}{\sqrt{2\pi}\,\sigma_n} \Big] \exp\Big\{ -\Big[ \sum_{i=1}^{K} \frac{(x_i - s)^2}{2\sigma_n^2} + \frac{s^2}{2\sigma_s^2} \Big] \Big\},

which, after completing the square in s, becomes:

f_{S|X}(s|x) = C(x) \exp\Big\{ -\frac{1}{2\sigma_p^2} \Big[ s - \frac{\sigma_s^2}{\sigma_s^2 + \sigma_n^2/K} \Big( \frac{1}{K} \sum_{i=1}^{K} x_i \Big) \Big]^2 \Big\},   (6.10)

where \sigma_p^2 \triangleq \Big( \frac{1}{\sigma_s^2} + \frac{K}{\sigma_n^2} \Big)^{-1} = \frac{\sigma_s^2 \sigma_n^2}{K\sigma_s^2 + \sigma_n^2}.

From equation 6.10, it can be seen clearly that the conditional-mean estimate and the MAP estimate are equivalent. In particular:

\hat{s}_{MS}(x) = \hat{s}_{MAP}(x) = \frac{\sigma_s^2}{\sigma_s^2 + \sigma_n^2/K} \Big( \frac{1}{K} \sum_{i=1}^{K} x_i \Big).

In this example, the MMSE estimate and the MAP estimate are equivalent because the a posteriori distribution is Gaussian. Some useful insights into Bayesian estimation can be gained through this example (van Trees, 1968).

Remarks
1. If \sigma_s^2 \ll \sigma_n^2/K, the a priori knowledge is more useful than the observed data, and the estimate is very close to the a priori mean (i.e., 0). In this case, the observations have almost no effect on the value of the estimate.
2. If \sigma_s^2 \gg \sigma_n^2/K, the estimate is directly related to the observed data, as it is approximately the sample mean, while the a priori knowledge is of little value.
3. The equivalence of \hat{s}_{MAP}(x) to \hat{s}_{MS}(x) is not restricted to the case of a Gaussian a posteriori pdf. In fact, if the cost function is symmetric and nondecreasing, and if the a posteriori pdf is symmetric and unimodal and satisfies \lim_{s \to \infty} J(s, x) f_{S|X}(s|x) = 0, then the resulting Bayesian estimate (e.g., the MAP estimate) is equivalent to \hat{s}_{MS}(x).
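To make the example above concrete, here is a small numerical sketch (added for illustration, not part of the original chapter) that draws data from the model X_i = S + N_i and compares the Bayesian estimate of equation 6.10 with the plain sample mean. The sample size, variances, and random seed are arbitrary assumptions chosen only for the demonstration.

```python
import numpy as np

# Illustrative sketch of the Gaussian example (van Trees, 1968): X_i = S + N_i.
# All numbers below (K, sigma_s, sigma_n, seed) are assumed values, not from the text.
rng = np.random.default_rng(0)
K, sigma_s, sigma_n = 10, 1.0, 2.0

s_true = rng.normal(0.0, sigma_s)                # S ~ N(0, sigma_s^2)
x = s_true + rng.normal(0.0, sigma_n, size=K)    # X_i = S + N_i,  N_i ~ N(0, sigma_n^2)

sample_mean = x.mean()
# Equation 6.10: the MMSE/MAP estimate shrinks the sample mean toward the prior mean (0).
shrinkage = sigma_s**2 / (sigma_s**2 + sigma_n**2 / K)
s_hat = shrinkage * sample_mean

print(f"true S        = {s_true:+.3f}")
print(f"sample mean   = {sample_mean:+.3f}")
print(f"MMSE/MAP est. = {s_hat:+.3f}  (shrinkage factor {shrinkage:.3f})")
```

When \sigma_s^2 is much larger than \sigma_n^2/K the shrinkage factor approaches one and the estimate reduces to the sample mean (Remark 2); when the inequality is reversed, the estimate collapses toward the prior mean (Remark 1).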
6.3 Linear Estimation

Since the Bayesian estimators (MMSE and MAP estimates) presented in the previous section are usually not linear, it may be impractical to implement them. An alternative is to restrict consideration to the class of linear estimators and then find the optimum estimator in that class. As such, the notion of optimality deviates from that of Bayesian estimation, which minimizes the Bayes' risk. One of the most commonly used optimization criteria is the MSE (i.e., minimizing the error variance). This approach leads to the class of Wiener filters, which includes the Kalman filter as a special case. In essence, the Kalman filter is a realizable Wiener filter, as it is derived with a realizable (state-space) model. These linear estimators are much more appealing in practice due to reduced implementation complexity and the relative simplicity of performance analysis.

The problem can be described as follows. Given a set of zero-mean random variables X_1, X_2, \ldots, X_K, it is desired to estimate a random variable S (also zero-mean). The objective here is to find an estimator \hat{S} that is linear in the X_i and that is optimum in some sense, such as MMSE. Clearly, if \hat{S} is constrained to be linear in the X_i, it can be expressed as \hat{S} = \sum_{i=1}^{K} a_i X_i. This expression can be used independently of the model that governs the relation between X and S. One can see that once the coefficients a_i, i = 1, 2, \ldots, K, are determined, \hat{S} is unambiguously (uniquely) specified. As such, the problem of finding an optimum estimator becomes one of finding the optimum set of coefficients, and estimation of a random signal becomes estimation of a set of deterministic parameters.
If the objective is to minimize the MSE, namely:

E\{\|S - \hat{S}\|^2\} = E\Big\{ \Big\| S - \sum_{i=1}^{K} a_i X_i \Big\|^2 \Big\},   (6.11)

a necessary condition is that:

\frac{\partial}{\partial a_i} E\{\|S - \hat{S}\|^2\} = 0 \quad \text{for all } i.   (6.12)

Equation 6.12 is equivalent to:

E\{(S - \hat{S}) X_i\} = 0 \quad \text{for all } i.   (6.13)

In other words, a necessary condition for obtaining the linear MMSE estimate is the uncorrelatedness between the estimation error and the observed random variables. In the context of vector spaces, equation 6.13 is the well-known orthogonality principle. Intuitively, the equation states that the linear MMSE estimate of S is the projection of S onto the subspace spanned by the set of random variables \{X_i\}. In this framework, the norm of the vector space is the mean-square value, while the inner product between two vectors is the correlation between the two random variables. Let the autocorrelation coefficients of the X_i be E\{X_j X_i\} = r_{ji} and the cross-correlation coefficients of X_i and S be E\{S X_i\} = \rho_i. Then equation 6.13 is simply:

\rho_i = \sum_{j=1}^{K} a_j r_{ji} \quad \text{for all } i,   (6.14)

which is essentially the celebrated Wiener-Hopf equation. Assume that r_{ij} and \rho_j are known for all i, j; then the coefficients \{a_i\} can be solved from equation 6.14. In fact, this equation can be stacked up and put in the following matrix form:

\begin{bmatrix} r_{11} & r_{21} & \cdots & r_{K1} \\ r_{12} & r_{22} & \cdots & r_{K2} \\ \vdots & & & \vdots \\ r_{1K} & r_{2K} & \cdots & r_{KK} \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_K \end{bmatrix} = \begin{bmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_K \end{bmatrix},   (6.15)

or simply:

R\,\mathbf{a} = \boldsymbol{\rho}.   (6.16)

Thus, the coefficient vector can be solved by:

\mathbf{a} = R^{-1} \boldsymbol{\rho},   (6.17)

which is sometimes termed the normal equation. The orthogonality principle of equation 6.13 and the normal equation 6.17 are the basis of the Wiener filter, which is one of the most studied and commonly used linear adaptive filters, with many applications (Haykin, 2001, 1991). The matrix inversion in equation 6.17 could present a formidable numerical difficulty, especially when the vector dimension is high. In those cases, computationally efficient algorithms, such as the Levinson recursion, can be employed to mitigate the numerical difficulty.
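As a brief illustration of equations 6.15 through 6.17 (added here, not in the original text), the sketch below builds a small correlation matrix R and cross-correlation vector ρ for a hypothetical stationary process and solves the normal equation numerically; for a Toeplitz R, a Levinson-type solver can replace the general solve. The numerical values of r and ρ are assumptions chosen only for illustration.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

# Hypothetical second-order statistics for a stationary process (illustrative values only):
# r[k] = E{X_i X_{i+k}} and rho[i] = E{S X_i}.
r = np.array([1.00, 0.50, 0.25, 0.125])      # first column of the Toeplitz matrix R
rho = np.array([0.80, 0.40, 0.20, 0.10])     # cross-correlation vector

R = np.array([[r[abs(i - j)] for j in range(4)] for i in range(4)])

a_direct = np.linalg.solve(R, rho)           # equation 6.17: a = R^{-1} rho
a_levinson = solve_toeplitz(r, rho)          # exploits the Toeplitz structure (Levinson-type recursion)

print("a (direct solve):  ", np.round(a_direct, 4))
print("a (Toeplitz solve):", np.round(a_levinson, 4))
```

Either route yields the same Wiener coefficients; the Toeplitz solver merely avoids forming and inverting R explicitly, which matters when K is large.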
6.4 Fisher Statistics

Generally speaking, there are two schools of thought in statistics: Bayesian and Fisherian. Bayesian statistics and estimation were presented in the previous section, where the emphasis was on estimation of random signals and parameters. This section is focused on estimation of a deterministic parameter (or signal). A natural question is whether the Bayesian approach presented in the previous section is applicable here, or whether the estimation of deterministic signals can be treated as a special case of estimating random signals. A closer examination shows that an alternative approach needs to be taken (van Trees, 1968), because the essential issues that govern the performance of the estimators differ significantly. The fundamental concept underlying the Fisher school of statistics is that of the likelihood function. In contrast, Bayesian statistics is derived from conditional distributions, namely, the a posteriori distributions. This section begins with an introduction of the likelihood function and a derivation of the maximum likelihood estimation method. These are followed by the notion of sufficient statistics, which plays an important role in Fisherian statistics. Optimality properties of maximum likelihood estimates are then examined with the definition of Fisher information. The Cramér-Rao lower bound and minimum variance unbiased estimators are then discussed.

6.4.1 Likelihood Functions

Fisher's approach to estimation centers around the concept of the likelihood function (Fisher, 1950). Consider a random variable X that has a probability distribution F_X(x) with probability density function (pdf) f_X(x) parameterized by a parameter \theta. The likelihood function (with respect to the parameter \theta) is defined as:

L(x; \theta) = f_X(x|\theta).   (6.18)

It may appear, at first sight, that the likelihood function is nothing but the pdf. It is important, however, to note that the likelihood function is really a function of the parameter \theta for a fixed value of x, whereas the pdf is a function of the realization x of the random variable for a fixed value of \theta. Therefore, in a likelihood function the variable is \theta, while in a pdf the variable is x. The likelihood function is a quantitative indication of how likely it is that a particular realization (observation) of the random variable would have been produced from a particular distribution. The higher the value of the likelihood function, the more likely it is that the particular value of the parameter would have produced that realization of x. Hence, the cost function in the Fisherian estimation paradigm is the likelihood function, and the objective is to find the value of the parameter that maximizes the likelihood function, resulting in the maximum likelihood estimate (MLE).

The MLE can be derived as follows. Let there be a random variable whose pdf, f_X(x), is parameterized by a parameter \theta. Define the objective function:

L(x; \theta) = f_X(x|\theta),   (6.19)

where f_X(x|\theta) is the pdf of X for a given \theta. Then, the MLE is obtained by:

\hat{\theta}_{MLE}(x) = \arg\max_{\theta} L(x; \theta).   (6.20)

Clearly, a necessary condition for L(x; \theta) to be maximized is that:

\frac{\partial}{\partial \theta} L(x; \theta) \Big|_{\theta = \hat{\theta}_{MLE}} = 0.   (6.21)

This equation is sometimes referred to as the likelihood equation. In general, it is more convenient to consider the log-likelihood function defined as:

l(x; \theta) = \ln L(x; \theta),   (6.22)

and the log-likelihood equation as:

\frac{\partial}{\partial \theta} l(x; \theta) \Big|_{\theta = \hat{\theta}_{MLE}} = 0.   (6.23)

Example
Consider a sequence of random variables X_i, i = 1, 2, \ldots, K, modeled as X_i = \theta + N_i, where N_i is a sequence of iid zero-mean Gaussian random variables with variance \sigma^2. In practice, this formulation can be considered as one of estimating a DC signal embedded in white Gaussian noise. The issue here is to estimate the strength (i.e., the magnitude) of the DC signal based on a set of observations x_i, i = 1, 2, \ldots, K. The log-likelihood function as defined in equation 6.22 is given by:

l(x; \theta) = -K \ln\big(\sqrt{2\pi\sigma^2}\big) - \frac{1}{2\sigma^2} \sum_{i=1}^{K} (x_i - \theta)^2,

where x = [x_1, x_2, \ldots, x_K]^T. Solving the log-likelihood equation, 6.23, yields:

\hat{\theta}_{MLE} = \frac{1}{K} \sum_{i=1}^{K} x_i.

Thus, the MLE for a DC signal embedded in additive zero-mean white Gaussian noise is the sample mean. As it turns out, this sample mean is the sufficient statistic for estimating \theta. The concept of the sufficient statistic is critical to the optimum properties of the MLE and, in general, to Fisherian statistics. Generally speaking, the likelihood function is directly related to sufficient statistics, and the MLE is usually a function of sufficient statistics.

6.4.2 Sufficient Statistics

Sufficient statistics is a concept defined in reference to a particular parameter (or signal) to be estimated. Roughly speaking, a sufficient statistic is a function of the set of observations that contains all the information possibly obtainable for the estimation of a particular parameter. Given a parameter \theta to be estimated, assume that x is the vector consisting of the observed variables. A statistic T(x) is said to be a sufficient statistic if the probability distribution of X given T(x) = t is independent of \theta. In essence, if T(x) is a sufficient statistic, then all the information regarding estimation of \theta that can be extracted from the observation is contained in T(x). The Fisher factorization theorem stated below is sometimes used as a definition of the sufficient statistic.

Fisher Factorization Theorem
A function of the observation set, T(x), is a sufficient statistic if the likelihood function of X can be expressed as:

L(x; \theta) = h(T(x), \theta)\, g(x).   (6.24)

In the example shown in Section 6.4.1, \sum_{i=1}^{K} x_i is a sufficient statistic, and the sample mean \frac{1}{K}\sum_{i=1}^{K} x_i is the MLE for \theta. The fact that \sum_{i=1}^{K} x_i is a sufficient statistic can be easily seen by using the Fisher factorization theorem. In particular:

L(x; \theta) = \Big( \frac{1}{\sqrt{2\pi\sigma^2}} \Big)^K \exp\Big[ -\frac{1}{2\sigma^2} \sum_{i=1}^{K} (x_i - \theta)^2 \Big].   (6.25)

The above equation can be expressed as:

L(x; \theta) = \Big\{ \exp\Big[ \frac{\theta}{\sigma^2} \Big( \sum_{i=1}^{K} x_i \Big) - \frac{K\theta^2}{2\sigma^2} \Big] \Big\} \Big( \frac{1}{\sqrt{2\pi\sigma^2}} \Big)^K \exp\Big[ -\frac{1}{2\sigma^2} \sum_{i=1}^{K} x_i^2 \Big].   (6.26)

Identifying equation 6.26 with 6.24, it can easily be seen that, if the following are defined:

h(T(x), \theta) = \exp\Big[ \frac{\theta}{\sigma^2} T(x) - \frac{K\theta^2}{2\sigma^2} \Big]

and

g(x) = \Big( \frac{1}{\sqrt{2\pi\sigma^2}} \Big)^K \exp\Big[ -\frac{1}{2\sigma^2} \sum_{i=1}^{K} x_i^2 \Big],

then T(x) = \sum_{i=1}^{K} x_i is clearly a sufficient statistic for estimating \theta. It should be noted that the sufficient statistic may not be unique. In fact, it is always subject to a scaling factor. From equation 6.26, it is seen that the pdf can be expressed as:

f_X(x|\theta) = \{ \exp[\, c(\theta) T(x) + d(\theta) + S(x) \,] \}\, I(x),   (6.27)

where c(\theta) = \theta/\sigma^2, T(x) = \sum_{i=1}^{K} x_i, d(\theta) = -K\theta^2/(2\sigma^2), S(x) = -\frac{K}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{K} x_i^2, and I(x) is an indicator function that is valued at one wherever the value of the pdf is nonzero and at zero otherwise. A pdf that can be expressed in the form given in equation 6.27 is said to belong to the exponential family of distributions (EFOD). It can be verified easily that the Gaussian, Laplace (or two-sided exponential), binomial, and Poisson distributions all belong to the EFOD. When a pdf belongs to the EFOD, one can easily identify the sufficient statistic. In fact, the mean and variance of the sufficient statistic can be easily calculated (Bickel and Doksum, 1977). As stated previously, the fact that the MLE is a function of the sufficient statistic helps to ensure its quality as an estimator. This will be discussed in more detail in the subsequent sections.
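The closed form above can be checked numerically. The following sketch (an illustration added here, not from the original chapter) evaluates the log-likelihood of the DC-in-noise example on a grid of candidate θ values and confirms that the maximizer coincides with the sample mean, i.e., with the sufficient statistic scaled by 1/K. The noise variance, sample size, and true level are assumed values.

```python
import numpy as np

# DC level in white Gaussian noise: X_i = theta + N_i. Values below are illustrative assumptions.
rng = np.random.default_rng(1)
K, sigma, theta_true = 50, 1.5, 0.7
x = theta_true + rng.normal(0.0, sigma, size=K)

def log_likelihood(theta):
    """Equation 6.22 for the Gaussian model: -K*ln(sqrt(2*pi)*sigma) - sum((x_i - theta)^2)/(2*sigma^2)."""
    return -K * np.log(np.sqrt(2 * np.pi) * sigma) - np.sum((x - theta) ** 2) / (2 * sigma**2)

grid = np.linspace(-2.0, 2.0, 4001)
theta_grid = grid[np.argmax([log_likelihood(t) for t in grid])]

print(f"grid-search MLE : {theta_grid:.4f}")
print(f"sample mean     : {x.mean():.4f}   (closed-form MLE, a function of T(x) = sum of x_i)")
```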
6.4.3 Information Inequality and Cramér-Rao Lower Bound

There are several criteria that can be used to measure the quality of an estimator. If \hat{\theta}(X) is an estimator for the parameter \theta based on the observation X, three criteria are typically used to evaluate its quality:

1. Bias: If E[\hat{\theta}(X)] = \theta, then \hat{\theta}(X) is said to be an unbiased estimator.
2. Variance: For estimation of deterministic parameters and signals, the variance of the estimator is the same as the variance of the estimation error. It is a commonly used performance measure, for it is practically easy to evaluate.
3. Consistency: This is an asymptotic property that is examined when the sample size (i.e., the dimension of the vector X) approaches infinity. If \hat{\theta}(X) converges with probability one to \theta, it is said to be strongly consistent. If it converges in probability, it is weakly consistent.

In this section, the discussion is focused on the second criterion, namely the variance, which is one of the most commonly used performance measures in statistical signal processing. In the estimation of deterministic parameters, Fisher's information is intimately related to the variance of the estimators. Fisher's information is defined as:

I(\theta) = E\Big\{ \Big[ \frac{\partial}{\partial \theta} l(x; \theta) \Big]^2 \Big\}.   (6.28)

Note that Fisher's information is nonnegative, and it is additive if the set of random variables is independent.

Theorem (Information Inequality)
Let T(X) be a statistic such that its variance V_\theta(T(X)) < \infty for all \theta in the parameter space \Theta. Define \psi(\theta) = E_\theta\{T(X)\}. Assume that the following regularity condition holds:

E\Big\{ \frac{\partial}{\partial \theta} l(x; \theta) \Big\} = 0 \quad \text{for all } \theta \in \Theta,   (6.29)

where l(x; \theta) has been defined in equation 6.22. Assume further that \psi(\theta) is differentiable for all \theta. Then:

V_\theta(T(X)) \geq \frac{[\psi'(\theta)]^2}{I(\theta)}.   (6.30)

Remarks
1. The regularity condition defined in equation 6.29 is a restriction imposed on the likelihood function to guarantee that the order of the expectation operation and the differentiation is interchangeable.
2. If the regularity condition also holds for second-order derivatives, I(\theta) as defined in equation 6.28 can also be evaluated as I(\theta) = -E\{ \frac{\partial^2}{\partial \theta^2} l(x; \theta) \}.
3. The subscript \theta of the expectation (E_\theta) and of the variance (V_\theta) indicates the dependence of the expectation and the variance on \theta.
4. The information inequality gives a lower bound on the variance that any estimate can achieve. It thus reveals the best that an estimator can do as measured by the error variance.
5. If T(X) is an unbiased estimator for \theta (i.e., E_\theta\{T(X)\} = \theta), then the information inequality, equation 6.30, reduces to:

V_\theta(T(X)) \geq \frac{1}{I(\theta)},   (6.31)

which is the well-known Cramér-Rao lower bound (CRLB).
6. In statistical signal processing, closeness to the CRLB is often used as a measure of efficiency of an (unbiased) estimator. In particular, one may define the Fisherian efficiency of an unbiased estimator \hat{\theta}(X) as:

\eta(\hat{\theta}(X)) = \frac{I^{-1}(\theta)}{V_\theta\{\hat{\theta}(X)\}}.   (6.32)

Note that \eta is never greater than one, and the larger \eta is, the more efficient the estimator is. In fact, when \eta = 1, the estimator achieves the CRLB and is said to be an efficient estimator in the Fisherian sense.
7. If an unbiased estimator has a variance that achieves the CRLB for all \theta \in \Theta, it is called a uniformly minimum variance unbiased estimator (UMVUE). It can be shown easily that a UMVUE is a consistent estimator and that the MLE is usually the UMVUE.
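As a worked illustration (added here for concreteness, using the DC-level example of Section 6.4.1 rather than material from the original text), the Fisher information and the CRLB can be written out explicitly for X_i = \theta + N_i with iid N_i \sim \mathcal{N}(0, \sigma^2):

```latex
% Fisher information for K iid Gaussian observations X_i = theta + N_i:
l(x;\theta) = -K\ln\!\big(\sqrt{2\pi}\,\sigma\big) - \frac{1}{2\sigma^2}\sum_{i=1}^{K}(x_i-\theta)^2
\;\Longrightarrow\;
\frac{\partial^2 l}{\partial\theta^2} = -\frac{K}{\sigma^2},
\qquad
I(\theta) = -E\!\left\{\frac{\partial^2 l}{\partial\theta^2}\right\} = \frac{K}{\sigma^2}.

% CRLB (equation 6.31) and the variance of the sample mean:
V_\theta\{\hat\theta\} \;\ge\; \frac{1}{I(\theta)} = \frac{\sigma^2}{K},
\qquad
\operatorname{Var}\!\left\{\frac{1}{K}\sum_{i=1}^{K}X_i\right\} = \frac{\sigma^2}{K}.
```

Because the sample mean is unbiased and its variance equals the bound for every \theta, it attains the CRLB; in the terminology of Remarks 6 and 7, its Fisherian efficiency is \eta = 1, and it is the UMVUE for this model.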
6.4.4 Properties of MLE

The MLE has many interesting properties, and it is the purpose of this section to list some of them.

1. It is easy to see that the MLE may be biased. This is because being unbiased was not part of the objective in seeking the MLE. The MLE, however, is always asymptotically unbiased.
2. As shown in the previous section, the MLE is a function of the sufficient statistic. This can also be seen from the Fisher factorization theorem.
3. The MLE is asymptotically a minimum variance unbiased estimator. In other words, its variance asymptotically achieves the CRLB.
4. The MLE is consistent. In particular, it converges to the parameter with probability one (or in probability).
5. Under the regularity condition of equation 6.29, if there exists an unbiased estimator whose variance attains the CRLB, it is the MLE.
6. Generally speaking, the MLE of a transformation of a parameter (or signal) is the transformation of the MLE of that parameter (or signal). This is referred to as the invariance property of the MLE.

The fact that the MLE is a function of sufficient statistics means that it depends on the relevant information. This does not necessarily mean that it will always be the best estimator (in the sense of, say, minimum variance), for it may not make the best use of the information. However, when the sample size is large, all the information relevant to the unknown parameter is essentially available to the MLE. This explains why the MLE is asymptotically unbiased, is consistent, and has a variance that asymptotically achieves the CRLB.

Example (Kay, 1993)
Consider the problem of estimating the phase of a sinusoidal signal received in additive Gaussian noise. The problem is formulated as:

X_i = A\cos(\omega_0 i + \phi) + N_i, \quad i = 0, 1, \ldots, K-1,

where \{N_i\} is an iid sequence of zero-mean Gaussian random variables with variance \sigma^2. Employing equation 6.23, the MLE can be obtained by minimizing:

J(\phi) = \sum_{i=0}^{K-1} \big( x_i - A\cos(\omega_0 i + \phi) \big)^2.

Differentiating J(\phi) with respect to \phi and setting the derivative equal to zero yields:

\sum_{i=0}^{K-1} x_i \sin(\omega_0 i + \hat{\phi}_{MLE}) = A \sum_{i=0}^{K-1} \sin(\omega_0 i + \hat{\phi}_{MLE}) \cos(\omega_0 i + \hat{\phi}_{MLE}).

Assume that:

\frac{1}{K} \sum_{i=0}^{K-1} \cos(2\omega_0 i + 2\phi) \approx 0 \quad \text{for all } \phi.

Then the MLE for the phase can be approximated as:

\hat{\phi}_{MLE} \approx -\arctan\!\left( \frac{\sum_{i=0}^{K-1} x_i \sin(\omega_0 i)}{\sum_{i=0}^{K-1} x_i \cos(\omega_0 i)} \right).

Perceptive readers may see that implementation of the MLE can easily become complex and numerically difficult, especially when the underlying distribution is non-Gaussian. If the parameter to be estimated is a simple scalar and its admissible values are limited to a finite interval, search algorithms can be employed to guarantee satisfactory results. If this is not the case, more sophisticated numerical optimization algorithms will be needed to render good estimation results. Among others, iterative algorithms such as the Newton-Raphson method and the expectation-maximization method are often employed.
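A small simulation of the approximate phase MLE above is sketched below (an added illustration; the amplitude, frequency, phase, noise level, and sample size are arbitrary assumptions, not values from the text).

```python
import numpy as np

# Phase estimation example (after Kay, 1993): X_i = A*cos(w0*i + phi) + N_i.
# A, w0, phi, sigma, and K below are illustrative assumptions.
rng = np.random.default_rng(2)
K, A, w0, phi_true, sigma = 200, 1.0, 2 * np.pi * 0.05, 0.6, 0.8

i = np.arange(K)
x = A * np.cos(w0 * i + phi_true) + rng.normal(0.0, sigma, size=K)

# Approximate MLE: phi_hat ~ -arctan( sum(x_i*sin(w0*i)) / sum(x_i*cos(w0*i)) ).
num = np.sum(x * np.sin(w0 * i))
den = np.sum(x * np.cos(w0 * i))
phi_hat = -np.arctan2(num, den)   # arctan2 keeps the correct quadrant

print(f"true phase      : {phi_true:.3f} rad")
print(f"approximate MLE : {phi_hat:.3f} rad")
```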
6.5 Signal Detection
The problem of signal detection can be formulated mathematically as one of binary hypothesis testing. Let X be the random variable that represents the observation, and let the observation space be denoted by \Omega (i.e., x \in \Omega). There are basically two hypotheses:

Null hypothesis H_0 : X \sim F_0.   (6.33a)
Alternative H_1 : X \sim F_1.   (6.33b)

Here, F_0 is the probability distribution of X given that H_0 is true, and F_1 is the probability distribution of X given that H_1 is true. In some applications (e.g., a simple radar communication system), it may be desirable to detect a signal of constant amplitude, and then F_1 will simply be F_0 shifted by a mean value equal to the signal amplitude. In general, this formulation also assumes that, with probability one, either the null hypothesis or the alternative is true. Specifically, let \pi_0 and \pi_1 be the prior probabilities of H_0 and H_1 being true, respectively. Then:

\pi_0 + \pi_1 = 1.   (6.34)

The objective here is to decide whether H_0 or H_1 is true based on the observation of X = x. A decision rule, namely a detection scheme, d(x) essentially partitions \Omega into two subspaces \Omega_0 and \Omega_1. The subspace \Omega_0 consists of all observations that lead to the decision that H_0 is true, while \Omega_1 consists of all observations that lead to the decision that H_1 is true. For notational convenience, one may also define d(x) as follows:

d(x) = \begin{cases} 0 & \text{if } x \in \Omega_0 \\ 1 & \text{if } x \in \Omega_1 \end{cases}.   (6.35)

For any decision rule d(x), there are clearly four possible outcomes:
1. Decide H_0 when H_0 is true.
2. Decide H_0 when H_1 is true.
3. Decide H_1 when H_0 is true.
4. Decide H_1 when H_1 is true.

Two types of error can occur:
1. Type-I error (false alarm): Decide H_1 when H_0 is true. The probability of a type-I error is as follows:

\alpha = \int_{\Omega_1} f_0(x)\, dx.   (6.36)

2. Type-II error (miss): Decide H_0 when H_1 is true. The probability of a type-II error is as follows:

P_m = \int_{\Omega_0} f_1(x)\, dx.   (6.37)

Another quantity of concern is the probability of detection, which is often termed the power in the signal detection literature, defined by:

\beta = 1 - P_m = \int_{\Omega_1} f_1(x)\, dx.   (6.38)

Among the various decision criteria, the Bayes and Neyman-Pearson criteria are the most popular. Detection schemes may also be classified into parametric and nonparametric detectors. The discussion here focuses on the Bayes and Neyman-Pearson criteria, which are considered parametric. Before any decision criteria can be derived, the following constraints need to be stated first:

\mathrm{Prob}\{d(x) = 0 \mid H_0\} + \mathrm{Prob}\{d(x) = 1 \mid H_0\} = 1.   (6.39a)
\mathrm{Prob}\{d(x) = 0 \mid H_1\} + \mathrm{Prob}\{d(x) = 1 \mid H_1\} = 1.   (6.39b)

The constraints of equations 6.39a and 6.39b are simply outcomes of the assumptions that \Omega_0 \cup \Omega_1 = \Omega and \Omega_0 \cap \Omega_1 = \emptyset (i.e., for every observation, an unambiguous decision must be made).

6.5.1 Bayesian Detection

The objective of the Bayes criterion is to minimize the so-called Bayes' risk, which is, again, defined as the expected value of the cost. To derive the Bayes' detection rule, the costs of making decisions need to be defined first. Let the costs be denoted by C_{ij}, the cost of choosing H_i when H_j is true. In particular:

C_{01} = cost of choosing H_0 when H_1 is true.
C_{10} = cost of choosing H_1 when H_0 is true.

In addition, assume that the prior probabilities \pi_0 and \pi_1 are known. The Bayes' risk is then evaluated as follows:

\mathcal{R} \triangleq E\{C\} = \pi_0 \big[ \mathrm{Prob}\{d(x) = 0|H_0\} C_{00} + \mathrm{Prob}\{d(x) = 1|H_0\} C_{10} \big] + \pi_1 \big[ \mathrm{Prob}\{d(x) = 0|H_1\} C_{01} + \mathrm{Prob}\{d(x) = 1|H_1\} C_{11} \big].   (6.40)

Substituting equations 6.34, 6.39a, and 6.39b into equation 6.40 yields:

\mathcal{R} = \pi_0 C_{00} \int_{\Omega_0} f_0(x)\, dx + \pi_0 C_{10} \int_{\Omega_1} f_0(x)\, dx + \pi_1 C_{01} \int_{\Omega_0} f_1(x)\, dx + \pi_1 C_{11} \int_{\Omega_1} f_1(x)\, dx
= \pi_0 C_{10} + (1 - \pi_0) C_{11} + \int_{\Omega_0} \big\{ (1 - \pi_0)(C_{01} - C_{11}) f_1(x) - \pi_0 (C_{10} - C_{00}) f_0(x) \big\}\, dx.

Note that the sum of the first two terms is a constant. In general, it is reasonable to assume that:

C_{01} - C_{11} > 0 \quad \text{and} \quad C_{10} - C_{00} > 0.

In other words, the costs of making correct decisions are less than those of making incorrect decisions. Define:

I_1(x) \triangleq (1 - \pi_0)(C_{01} - C_{11}) f_1(x), \qquad I_2(x) \triangleq \pi_0 (C_{10} - C_{00}) f_0(x).

It can easily be seen that I_1(x) > 0 and I_2(x) > 0. Thus, \mathcal{R} can be rewritten as:

\mathcal{R} = \text{constant} + \int_{\Omega_0} [\, I_1(x) - I_2(x)\, ]\, dx.

To minimize \mathcal{R}, the observation space \Omega needs to be partitioned such that x \in \Omega_1 whenever I_1(x) \geq I_2(x). In other words, decide H_1 if:

(1 - \pi_0)(C_{01} - C_{11}) f_1(x) \geq \pi_0 (C_{10} - C_{00}) f_0(x).   (6.41)

So, the Bayes' detection rule essentially evaluates the likelihood ratio defined by:

L(x) \triangleq \frac{f_1(x)}{f_0(x)},   (6.42)

and compares it to the threshold:

\lambda \triangleq \frac{\pi_0 (C_{10} - C_{00})}{(1 - \pi_0)(C_{01} - C_{11})}.   (6.43)

In particular:

L(x) = \frac{f_1(x)}{f_0(x)} \begin{cases} \geq \lambda \Rightarrow H_1 \\ < \lambda \Rightarrow H_0 \end{cases}.   (6.44)

A decision rule characterized by the likelihood ratio and a threshold as in equation 6.44 is referred to as a likelihood ratio test (LRT). A Bayes' detection scheme is always an LRT. Depending on how the a posteriori probabilities of the two hypotheses are defined, the Bayes' detector can be realized in different ways. One typical example is the so-called MAP detector, which renders the minimum probability of error by choosing the hypothesis with the maximum a posteriori probability. Another class of detectors is the minimax detectors, which can be considered an extension of Bayes' detectors. The minimax detector is also an LRT. It assumes no knowledge of the prior probabilities (i.e., \pi_0 and \pi_1) and selects the threshold by choosing the prior probability that renders the maximum Bayes' risk. The minimax detector is a robust detector because its performance does not vary with the prior probabilities.
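As a concrete sketch (added here; the densities, priors, and costs are illustrative assumptions), the Bayes' LRT of equations 6.42 through 6.44 can be written directly for two Gaussian hypotheses with the same variance and different means:

```python
import numpy as np
from scipy.stats import norm

# Two simple hypotheses: H0: X ~ N(0, sigma^2), H1: X ~ N(s, sigma^2). Illustrative values only.
s, sigma = 1.0, 1.0
pi0, C00, C11, C01, C10 = 0.7, 0.0, 0.0, 1.0, 1.0   # assumed priors and costs

# Equation 6.43: Bayes' threshold on the likelihood ratio.
lam = (pi0 * (C10 - C00)) / ((1.0 - pi0) * (C01 - C11))

def bayes_decision(x):
    """Equation 6.44: compare L(x) = f1(x)/f0(x) with the threshold lambda."""
    L = norm.pdf(x, loc=s, scale=sigma) / norm.pdf(x, loc=0.0, scale=sigma)
    return 1 if L >= lam else 0

for x_obs in (-0.5, 0.6, 1.4):
    print(f"x = {x_obs:+.1f} -> decide H{bayes_decision(x_obs)}")
```

With uniform error costs (C_{01} = C_{10} = 1, C_{00} = C_{11} = 0), the threshold becomes the ratio of the priors, \pi_0/\pi_1, and the rule reduces to the MAP detector mentioned above.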
6.5.2 Neyman-Pearson Detection

The principle of the Neyman-Pearson criterion is founded on the Neyman-Pearson lemma stated below.

Neyman-Pearson Lemma
Let d_{\lambda^*}(x) be a likelihood ratio test with a threshold \lambda^*, as defined in equation 6.44. Let \alpha^* and \beta^* be the false-alarm rate and power, respectively, of the test d_{\lambda^*}(x). Let d_\lambda(x) be any other likelihood ratio test with threshold \lambda and with false-alarm rate and power \alpha and \beta, respectively. If \alpha \leq \alpha^*, then \beta \leq \beta^*.

The Neyman-Pearson lemma shows that if one desires to increase the power of an LRT, one must also accept the consequence of an increased false-alarm rate. As such, the Neyman-Pearson detection criterion aims to maximize the power under the constraint that the false-alarm rate be upper bounded by, say, \alpha_0. The Neyman-Pearson detector can be derived by first defining a cost function:

J = (1 - \beta) + \lambda(\alpha - \alpha_0).   (6.45)

It can be shown that:

J = \lambda(1 - \alpha_0) + \int_{\Omega_0} [\, f_1(x) - \lambda f_0(x)\, ]\, dx,   (6.46)

and an LRT will minimize J for any positive \lambda. In particular:

L(x) = \frac{f_1(x)}{f_0(x)} \begin{cases} \geq \lambda \Rightarrow H_1 \\ < \lambda \Rightarrow H_0 \end{cases}.

To satisfy the constraint and to maximize the power, choose \lambda so that \alpha = \alpha_0, namely:

\alpha = \int_{\lambda}^{\infty} f_{L|H_0}(l|H_0)\, dl = \alpha_0,

where f_{L|H_0}(l|H_0) is the pdf of the likelihood ratio under H_0. The threshold is determined by solving the above equation. The Neyman-Pearson detector is known to be the most powerful detector for the problem of detecting a constant signal in noise. One advantage of the Neyman-Pearson detector is that its implementation does not require explicit knowledge of the prior probabilities and costs of decisions. However, as is the case for the Bayes' detector, evaluation of the likelihood ratio still requires exact knowledge of the pdf of X under both hypotheses.

6.5.3 Detection of a Known Signal in Gaussian Noise

Consider a signal detection problem formulated as follows:

H_0 : X_i = N_i.
H_1 : X_i = N_i + S, \quad i = 1, 2, \ldots, K.   (6.47)

Assume that S is a deterministic constant and that the noise samples N_i, i = 1, 2, \ldots, K, are iid zero-mean Gaussian random variables with a known variance \sigma^2. The likelihood ratio is then as written here:

L(x_1, x_2, \ldots, x_K) = \prod_{i=1}^{K} \frac{f_1(x_i)}{f_0(x_i)}.
Taking the logarithm of L(x) yields the log-likelihood ratio:

\ln L(x) = \sum_{i=1}^{K} \frac{2 x_i S - S^2}{2\sigma^2}.   (6.48)

Straightforward algebraic manipulation shows that the LRT is characterized by comparing the test statistic:

T(x) \triangleq \sum_{i=1}^{K} x_i   (6.49)

with a threshold \lambda_0. If T(x) \geq \lambda_0, then H_1 is declared true; otherwise, H_0 is declared true.

It is interesting to note that T(x) = \sum_{i=1}^{K} x_i turns out to be the sufficient statistic for estimating S, as shown in Section 6.4.2. This should not be surprising, as detection of a constant signal in noise is a dual problem of estimating the mean of the observations. Under the iid Gaussian assumption, the test statistic is clearly a Gaussian random variable. Furthermore, E\{T(X)|H_0\} = 0, E\{T(X)|H_1\} = KS, and \mathrm{Var}\{T(X)|H_0\} = \mathrm{Var}\{T(X)|H_1\} = K\sigma^2. The pdf of T(x), plotted in Figure 6.1, can be written as:

f_{T|H_0}(t|H_0) = \frac{1}{\sqrt{2\pi K\sigma^2}} e^{-\frac{t^2}{2K\sigma^2}}, \qquad f_{T|H_1}(t|H_1) = \frac{1}{\sqrt{2\pi K\sigma^2}} e^{-\frac{(t - KS)^2}{2K\sigma^2}}.

FIGURE 6.1 Probability Density Function of the Test Statistic

The false-alarm rate and power are given by:

\alpha = \int_{\lambda_0}^{\infty} f_{T|H_0}(t|H_0)\, dt, \qquad \beta = \int_{\lambda_0}^{\infty} f_{T|H_1}(t|H_1)\, dt.

The above two equations can be further specified as:

\alpha = \int_{\lambda_0}^{\infty} \frac{1}{\sqrt{2\pi K\sigma^2}} e^{-t^2/(2K\sigma^2)}\, dt = 1 - \Phi\Big( \frac{\lambda_0}{\sigma\sqrt{K}} \Big).   (6.50)

\beta = \int_{\lambda_0}^{\infty} \frac{1}{\sqrt{2\pi K\sigma^2}} e^{-(t - KS)^2/(2K\sigma^2)}\, dt = 1 - \Phi\Big( \frac{\lambda_0 - KS}{\sigma\sqrt{K}} \Big).   (6.51)

In equations 6.50 and 6.51:

\Phi(x) \triangleq \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\, dt.

For the Neyman-Pearson detector, the threshold \lambda_0 is determined by the constraint on the false-alarm rate:

\lambda_0 = \sigma\sqrt{K}\, \Phi^{-1}(1 - \alpha).

Combining equations 6.50 and 6.51 yields a relation between the false-alarm rate and the power, namely:

\beta = 1 - \Phi\Big( \Phi^{-1}(1 - \alpha) - \frac{\sqrt{K}\, S}{\sigma} \Big).   (6.52)

Figure 6.1 is a good illustration of the Neyman-Pearson lemma. The shaded area under f_{T|H_1}(t|H_1) is the value of the power, and the shaded area under f_{T|H_0}(t|H_0) is the false-alarm rate. It is seen that if the threshold is moved to the left, both the power and the false-alarm rate increase.

Remarks
1. It can be seen from equation 6.51 that as the sample size K increases, the miss probability decreases and \beta increases. In fact, \lim_{K \to \infty} \beta = 1.
2. Define d^2 \triangleq \frac{K S^2}{\sigma^2}, which can be taken as the signal-to-noise ratio (SNR). From equation 6.52, one can see that \lim_{d \to \infty} \beta = 1.
3. If the test statistic is defined with a scaling factor, namely T(x) = \sum_{i=1}^{K} \frac{S}{\sigma^2} x_i, the detector remains unchanged as long as the threshold is also scaled accordingly. Let h_i \triangleq \frac{S}{\sigma^2}; then T(x) = \sum_{i=1}^{K} h_i x_i. The detector is essentially a discrete-time matched filter. This detector is also known as the correlation detector, as it correlates the signal with the observation. When the output of the detector is of a large numerical value, the correlation between the observation and the signal is high, and H_1 is (likely to be) true.

The Neyman-Pearson detector can be further characterized by the receiver operating characteristic (ROC) curves shown in Figure 6.2, which are plots of equation 6.52 parameterized by d, the SNR. It should be noted that all continuous LRTs have ROC curves that lie above the line \alpha = \beta and are concave downward. In addition, the slope of a ROC curve at a particular point is the value of the threshold \lambda required to achieve the corresponding values of \alpha and \beta.

FIGURE 6.2 An Example of Receiver Operating Characteristic Curves
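The following simulation sketch (added for illustration; S, σ, K, and α₀ are assumed values) implements the correlation detector of equation 6.49 with the threshold λ₀ = σ√K Φ⁻¹(1 − α₀) and checks the Monte Carlo false-alarm rate and power against equations 6.50 through 6.52.

```python
import numpy as np
from scipy.stats import norm

# Known constant signal S in iid zero-mean Gaussian noise (illustrative parameters).
rng = np.random.default_rng(3)
K, S, sigma, alpha0, trials = 25, 0.4, 1.0, 0.05, 200_000

# Neyman-Pearson threshold on T(x) = sum(x_i): lambda0 = sigma*sqrt(K)*Phi^{-1}(1 - alpha0).
lam0 = sigma * np.sqrt(K) * norm.ppf(1.0 - alpha0)

noise = rng.normal(0.0, sigma, size=(trials, K))
T_h0 = noise.sum(axis=1)            # test statistic under H0
T_h1 = (noise + S).sum(axis=1)      # test statistic under H1

alpha_mc = np.mean(T_h0 >= lam0)    # empirical false-alarm rate
beta_mc = np.mean(T_h1 >= lam0)     # empirical power

beta_theory = 1.0 - norm.cdf(norm.ppf(1.0 - alpha0) - np.sqrt(K) * S / sigma)   # equation 6.52
print(f"false-alarm rate: Monte Carlo {alpha_mc:.4f}  vs  design {alpha0:.4f}")
print(f"power           : Monte Carlo {beta_mc:.4f}  vs  equation 6.52 {beta_theory:.4f}")
```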
6.6 Suggested Readings

There is a rich body of literature on the subjects of statistical signal processing and mathematical statistics. A classic textbook on detection and estimation is by van Trees (1968). This book provides a good basic treatment of the subject, and it is easy to read. Since then, many books have been written. The textbooks by Poor (1988) and Kay (1993, 1998) are among the more popular ones. Poor's book provides a fairly complete coverage of the subject of signal detection and estimation. Its presentation is built on the principles of mathematical statistics and includes some brief discussions of nonparametric and robust detection theory. Kay's books are more relevant to signal processing applications, though they also include a good deal of theoretical treatment of statistics. In addition, Kassam (1988) offers a good understanding of the subject of signal detection in non-Gaussian noise, and Weber (1987) offers useful insights into signal design for both coherent and incoherent digital communication systems. Any reader interested in learning more about mathematical statistics will find Bickel and Doksum (1977) a good reference; it contains in-depth coverage of the subject. For quick references to the subject, the book by Silvey (1975) is very useful. It should be noted that studies of statistical signal processing cannot be effective without a proper background in probability theory and random processes. Many reference books on that subject, such as Billingsley (1979), Kendall and Stuart (1977), Papoulis and Pillai (2002), and Stark and Woods (2002), are available.

References

Bickel, P.J., and Doksum, K.A. (1977). Mathematical statistics: Basic ideas and selected topics. San Francisco: Holden-Day.
Billingsley, P. (1979). Probability and measure. New York: John Wiley & Sons.
Fisher, R.A. (1950). On the mathematical foundations of theoretical statistics. In R.A. Fisher, Contributions to mathematical statistics. New York: John Wiley & Sons.
Haykin, S. (2001). Adaptive filter theory. (4th ed.). Englewood Cliffs, NJ: Prentice Hall.
Haykin, S. (Ed.). (1991). Advances in spectrum analysis and array processing. Vols. 1 and 2. Englewood Cliffs, NJ: Prentice Hall.
Kassam, S.A. (1988). Signal detection in non-Gaussian noise. New York: Springer-Verlag.
Kay, S.M. (1993). Fundamentals of statistical signal processing: Estimation theory. Upper Saddle River, NJ: Prentice Hall.
Kay, S.M. (1998). Fundamentals of statistical signal processing: Detection theory. Upper Saddle River, NJ: Prentice Hall.
Kendall, M.G., and Stuart, A. (1977). The advanced theory of statistics, Vol. 2. New York: Macmillan Publishing.
Papoulis, A., and Pillai, S.U. (2002). Probability, random variables, and stochastic processes. (4th ed.). New York: McGraw-Hill.
Poor, H.V. (1988). An introduction to signal detection and estimation. New York: Springer-Verlag.
Silvey, S.D. (1975). Statistical inference. London: Chapman and Hall.
Stark, H., and Woods, J.W. (2002). Probability, random processes, and estimation theory. (3rd ed.). Upper Saddle River, NJ: Prentice Hall.
van Trees, H.L. (1968). Detection, estimation, and modulation theory. New York: John Wiley & Sons.
Weber, C.L. (1987). Elements of detection and signal design. New York: Springer-Verlag.