6

Statistical Signal Processing

Yih-Fang Huang
Department of Electrical Engineering,
University of Notre Dame,
Notre Dame, Indiana, USA

6.1 Introduction
6.2 Bayesian Estimation
    6.2.1 Minimum Mean-Squared Error Estimation • 6.2.2 Maximum a Posteriori Estimation
6.3 Linear Estimation
6.4 Fisher Statistics
    6.4.1 Likelihood Functions • 6.4.2 Sufficient Statistics • 6.4.3 Information Inequality and Cramér-Rao Lower Bound • 6.4.4 Properties of MLE
6.5 Signal Detection
    6.5.1 Bayesian Detection • 6.5.2 Neyman-Pearson Detection • 6.5.3 Detection of a Known Signal in Gaussian Noise
6.6 Suggested Readings
References

Copyright © 2005 by Academic Press. All rights of reproduction in any form reserved.

6.1 Introduction

Statistical signal processing is an important subject in signal processing that enjoys a wide range of applications, including communications, control systems, medical signal processing, and seismology. It plays an important role in the design, analysis, and implementation of adaptive filters, such as adaptive equalizers in digital communication systems. It is also the foundation of multiuser detection, which is considered an effective means of mitigating multiple-access interference in spread-spectrum-based wireless communications.

The fundamental problems of statistical signal processing are those of signal detection and estimation that aim to extract information from the received signals and help make decisions. The received signals (especially those in communication systems) are typically modeled as random processes due to the existence of uncertainties. The uncertainties usually arise in the process of signal propagation or due to noise in measurements. Since signals are considered random, statistical tools are needed. As such, the techniques used to solve those problems are derived from the principles of mathematical statistics, namely, hypothesis testing and estimation.

This chapter presents a brief introduction to some basic concepts critical to statistical signal processing. To begin with, the principles of Bayesian and Fisher statistics relevant to estimation are presented. In the discussion of Bayesian statistics, emphasis is placed on two types of estimators: the minimum mean-squared error (MMSE) estimator and the maximum a posteriori (MAP) estimator. An important class of linear estimators, namely the Wiener filter, is also presented as a linear MMSE estimator. In the discussion of Fisher statistics, emphasis will be placed on the maximum likelihood estimator (MLE), the concept of sufficient statistics, and the information inequality that can be used to measure the quality of an estimator.

This chapter also includes a brief discussion of signal detection. The signal detection problem is presented as one of binary hypothesis testing. Both Bayesian and Neyman-Pearson optimum criteria are presented and shown to be implemented with likelihood ratio tests. The principles of hypothesis testing also find a wide range of applications that involve decision making.

6.2 Bayesian Estimation

Bayesian estimation methods are generally employed to estimate a random variable (or parameter) based on another random variable that is usually observable. Derivations of Bayesian estimation algorithms depend critically on the a posteriori distribution of the underlying signals (or signal parameters) to be estimated. Those a posteriori distributions are obtained by employing Bayes' rule, thus the name Bayesian estimation.

Consider a random variable $X$ that is some function of another random variable $S$. In practice, $X$ is what is observed and $S$ is the signal (or signal parameter) that is to be estimated. Denote the estimate as $\hat{S}(X)$, and the error as $e \triangleq S - \hat{S}(X)$. Generally, the cost is defined as a function of the estimation error that clearly depends on both $X$ and $S$, thus $J(e) = J(S, X)$.

The objective of Bayesian estimation is to minimize the Bayes' risk $\mathcal{R}$, which is defined as the ensemble average (i.e., the expected value) of the cost. In particular, the Bayes' risk is defined by:

$$\mathcal{R} \triangleq E\{J(e)\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} J(s, x)\, f_{S,X}(s, x)\, dx\, ds,$$

where $f_{S,X}(s, x)$ is the joint probability density function (pdf) of the random variables $S$ and $X$. In practice, the joint pdf is not directly obtainable. However, by Bayes' rule:

$$f_{S,X}(s, x) = f_{S|X}(s|x)\, f_X(x), \tag{6.1}$$

the a posteriori pdf can be used to facilitate the derivation of Bayesian estimates. With the a posteriori pdf, the Bayes' risk can now be expressed as:

$$\mathcal{R} = \int_{-\infty}^{\infty}\left\{\int_{-\infty}^{\infty} J(s, x)\, f_{S|X}(s|x)\, ds\right\} f_X(x)\, dx. \tag{6.2}$$

Because the cost function is, in general, non-negative and so are the pdfs, minimizing the Bayes' risk is equivalent to minimizing, for each $x$:

$$\int_{-\infty}^{\infty} J(s, x)\, f_{S|X}(s|x)\, ds. \tag{6.3}$$

Depending on how the cost function is defined, the Bayesian estimation principle leads to different kinds of estimation algorithms. Two of the most commonly used cost functions are the following:

$$J_{MS}(e) = |e|^2. \tag{6.4a}$$

$$J_{MAP}(e) = \begin{cases} 0 & \text{if } |e| < \Delta/2 \\ 1 & \text{if } |e| > \Delta/2 \end{cases} \quad \text{where } \Delta \ll 1. \tag{6.4b}$$

These example cost functions result in two popular estimators, namely, the minimum mean-squared error (MMSE) estimator and the maximum a posteriori (MAP) estimator. These two estimators are discussed in more detail below.

6.2.1 Minimum Mean-Squared Error Estimation

When the cost function is defined to be the mean-squared error (MSE), as in equation 6.4a, the Bayesian estimate can be derived by substituting $J(e) = |e|^2 = (s - \hat{s}(x))^2$ into equation 6.2. Hence the following is true:

$$\mathcal{R}_{MS} = \int_{-\infty}^{\infty}\left\{\int_{-\infty}^{\infty} (s - \hat{s}(x))^2\, f_{S|X}(s|x)\, ds\right\} f_X(x)\, dx. \tag{6.5}$$

Denote the resulting estimate as $\hat{s}_{MS}(x)$, and use the same argument that leads to equation 6.3. A necessary condition for minimizing $\mathcal{R}_{MS}$ results as:

$$\frac{\partial}{\partial \hat{s}} \int_{-\infty}^{\infty} (s - \hat{s}(x))^2\, f_{S|X}(s|x)\, ds = 0. \tag{6.6}$$

The differentiation in equation 6.6 is evaluated at $\hat{s} = \hat{s}_{MS}(x)$. Consequently:

$$\int_{-\infty}^{\infty} (s - \hat{s}_{MS}(x))\, f_{S|X}(s|x)\, ds = 0;$$

and

$$\hat{s}_{MS}(x) = \int_{-\infty}^{\infty} s\, f_{S|X}(s|x)\, ds = E\{s|x\}. \tag{6.7}$$

In essence, the estimate that minimizes the MSE is the conditional-mean estimate. If the a posteriori distribution is Gaussian, then the conditional-mean estimator is a linear function of $X$, regardless of the functional relation between $S$ and $X$. The following theorem summarizes a very familiar result.

Theorem
Let $X = [X_1, X_2, \ldots, X_K]^T$ and $S$ be jointly Gaussian with zero means. The MMSE estimate of $S$ based on $X$ is $E[S|X]$, and $E[S|X] = \sum_{i=1}^{K} a_i x_i$, where $a_i$ is chosen such that $E[(S - \sum_{i=1}^{K} a_i X_i) X_j] = 0$ for any $j = 1, 2, \ldots, K$.

6.2.2 Maximum a Posteriori Estimation

If the cost function is defined as in equation 6.4b, the Bayes' risk is as follows:

$$\mathcal{R}_{MAP} = \int_{-\infty}^{\infty} f_X(x)\left[1 - \int_{\hat{s}_{MAP} - \Delta/2}^{\hat{s}_{MAP} + \Delta/2} f_{S|X}(s|x)\, ds\right] dx.$$

To minimize $\mathcal{R}_{MAP}$, we maximize $\int_{\hat{s}_{MAP} - \Delta/2}^{\hat{s}_{MAP} + \Delta/2} f_{S|X}(s|x)\, ds$. When $\Delta$ is extremely small (as is required), this is equivalent to
maximizing $f_{S|X}(s|x)$. Thus, $\hat{s}_{MAP}(x)$ is the value of $s$ that maximizes $f_{S|X}(s|x)$.

It is often more convenient (for algebraic manipulation) to consider $\ln f_{S|X}(s|x)$, especially since $\ln(x)$ is a monotone nondecreasing function of $x$. A necessary condition for maximizing $\ln f_{S|X}(s|x)$ is as written here:

$$\left.\frac{\partial}{\partial s} \ln f_{S|X}(s|x)\right|_{s = \hat{s}_{MAP}(x)} = 0. \tag{6.8}$$

Equation 6.8 is often referred to as the MAP equation. Employing Bayes' rule, the MAP equation can also be written as:

$$\left.\left[\frac{\partial}{\partial s} \ln f_{X|S}(x|s) + \frac{\partial}{\partial s} \ln f_S(s)\right]\right|_{s = \hat{s}_{MAP}(x)} = 0. \tag{6.9}$$

Example (van Trees, 1968)
Let $X_i$, $i = 1, 2, \ldots, K$ be a sequence of random variables modeled as follows:

$$X_i = S + N_i, \quad i = 1, 2, \ldots, K,$$

where $S$ is a zero-mean Gaussian random variable with variance $\sigma_s^2$ and $\{N_i\}$ is a sequence of independent and identically distributed (iid) zero-mean Gaussian random variables with variance $\sigma_n^2$. Denote $X = [X_1\ X_2\ \ldots\ X_K]^T$.

$$f_{X|S}(x|s) = \prod_{i=1}^{K} \frac{1}{\sqrt{2\pi}\,\sigma_n} \exp\left[-\frac{(x_i - s)^2}{2\sigma_n^2}\right]$$

$$f_S(s) = \frac{1}{\sqrt{2\pi}\,\sigma_s} \exp\left[-\frac{s^2}{2\sigma_s^2}\right]$$

$$f_{S|X}(s|x) = \frac{f_{X|S}(x|s)\, f_S(s)}{f_X(x)} = \frac{1}{f_X(x)} \frac{1}{\sqrt{2\pi}\,\sigma_s} \left[\prod_{i=1}^{K} \frac{1}{\sqrt{2\pi}\,\sigma_n}\right] \exp\left\{-\frac{1}{2}\left[\sum_{i=1}^{K} \frac{(x_i - s)^2}{\sigma_n^2} + \frac{s^2}{\sigma_s^2}\right]\right\}$$

$$f_{S|X}(s|x) = C(x) \exp\left\{-\frac{1}{2\sigma_p^2}\left[s - \frac{\sigma_s^2}{\sigma_s^2 + \sigma_n^2/K}\left(\frac{1}{K}\sum_{i=1}^{K} x_i\right)\right]^2\right\} \tag{6.10}$$

where $\sigma_p^2 \triangleq \left(\frac{1}{\sigma_s^2} + \frac{K}{\sigma_n^2}\right)^{-1} = \frac{\sigma_s^2 \sigma_n^2}{K\sigma_s^2 + \sigma_n^2}$.

From equation 6.10, it can be seen clearly that the conditional mean estimate and the MAP estimate are equivalent. In particular:

$$\hat{s}_{MS}(x) = \hat{s}_{MAP}(x) = \frac{\sigma_s^2}{\sigma_s^2 + \sigma_n^2/K}\left(\frac{1}{K}\sum_{i=1}^{K} x_i\right).$$

In this example, the MMSE estimate and the MAP estimate are equivalent because the a posteriori distribution is Gaussian. Some useful insights into Bayesian estimation can be gained through this example (van Trees, 1968).

Remarks
1. If $\sigma_s^2 \ll \sigma_n^2/K$, the a priori knowledge is more useful than the observed data, and the estimate is very close to the a priori mean (i.e., 0). In this case, the observed data have almost no effect on the value of the estimate.
2. If $\sigma_s^2 \gg \sigma_n^2/K$, the estimate is directly related to the observed data as it is the sample mean, while the a priori knowledge is of little value.
3. The equivalence of $\hat{s}_{MAP}(x)$ to $\hat{s}_{MS}(x)$ is not restricted to the case of Gaussian a posteriori pdf. In fact, if the cost function is symmetric and nondecreasing and if the a posteriori pdf is symmetric and unimodal and satisfies $\lim_{s \to \infty} J(s, x)\, f_{S|X}(s|x) = 0$, then the resulting Bayesian estimate (e.g., the MAP estimate) is equivalent to $\hat{s}_{MS}(x)$.

6.3 Linear Estimation

Since the Bayesian estimators (MMSE and MAP estimates) presented in the previous section are usually not linear, it may be impractical to implement them. An alternative is to restrict the consideration to only the class of linear estimators and then find the optimum estimator in that class. As such, the notion of optimality deviates from that of Bayesian estimation, which minimizes the Bayes' risk.

One of the most commonly used optimization criteria is the MSE (i.e., minimizing the error variance). This approach leads to the class of Wiener filters that includes the Kalman filter as a special case. In essence, the Kalman filter is a realizable Wiener filter as it is derived with a realizable (state-space) model. Those linear estimators are much more appealing in practice due to reduced implementational complexity and relative simplicity of performance analysis.

The problem can be described as follows. Given a set of zero-mean random variables, $X_1, X_2, \ldots, X_K$, it is desired to estimate a random variable $S$ (also zero-mean). The objective here is to find an estimator $\hat{S}$ that is linear in $X_i$ and that is optimum in some sense, like MMSE.

Clearly, if $\hat{S}$ is constrained to be linear in $X_i$, it can be expressed as $\hat{S} = \sum_{i=1}^{K} a_i X_i$. This expression can be used independently of the model that governs the relation between $X$ and $S$. One can see that once the coefficients $a_i$, $i = 1, 2, \ldots, K$ are determined for all $i$, $\hat{S}$ is unambiguously (uniquely) specified. As such, the problem of finding an optimum estimator becomes one of finding the optimum set of coefficients, and estimation of a random signal becomes estimation of a set of deterministic parameters.
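
To make the coefficient-finding view concrete, the following is a minimal Python sketch that is not part of the original text. It assumes the model $X_i = S + N_i$ from the example in Section 6.2.2, simulates it with NumPy, and fits the coefficients $a_i$ by least squares over the samples, which is the sample version of minimizing the MSE. For that model the conditional-mean estimate is itself linear in the observations, so the fitted coefficients should approach $\sigma_s^2/(K\sigma_s^2 + \sigma_n^2)$. The function names, parameter values, and sample size are illustrative assumptions.

```python
import numpy as np


def simulate(K, sigma_s, sigma_n, trials, seed=0):
    """Draw `trials` realizations of S and of X_i = S + N_i, i = 1..K (assumed model)."""
    rng = np.random.default_rng(seed)
    s = rng.normal(0.0, sigma_s, size=trials)           # zero-mean Gaussian signal
    noise = rng.normal(0.0, sigma_n, size=(trials, K))  # iid zero-mean Gaussian noise
    X = s[:, None] + noise                              # each row is one observation vector
    return X, s


def fit_linear_estimator(X, s):
    """Coefficients a minimizing the sample mean-squared error of s_hat = X @ a."""
    a, *_ = np.linalg.lstsq(X, s, rcond=None)
    return a


# Illustrative values, chosen only for this sketch.
K, sigma_s, sigma_n = 4, 2.0, 1.0
X, s = simulate(K, sigma_s, sigma_n, trials=200_000)

a_hat = fit_linear_estimator(X, s)

# Closed form implied by the Gaussian example: the estimate is a scaled sample mean,
# so every coefficient equals sigma_s^2 / (K * sigma_s^2 + sigma_n^2).
a_closed = np.full(K, sigma_s**2 / (K * sigma_s**2 + sigma_n**2))

print("fitted coefficients:     ", a_hat)
print("closed-form coefficients:", a_closed)
```

The fitted coefficients come out nearly equal across $i$, reflecting the symmetry of the assumed model. The same least-squares fit applies unchanged to other models, since the linear form of the estimator does not depend on how $X$ and $S$ are related.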
If the objective is to minimize the MSE, namely:

$$E\{\|S - \hat{S}\|^2\} = E\left\{\left\|S - \sum_{i=1}^{K} a_i X_i\right\|^2\right\}, \tag{6.11}$$

a necessary condition is that:

$$\frac{\partial}{\partial a_i} E\{\|S - \hat{S}\|^2\} = 0 \quad \text{for all } i. \tag{6.12}$$

Equation 6.12 is equivalent to:

$$E\left\{\left(S - \sum_{j=1}^{K} a_j X_j\right) X_i\right\} = 0 \quad \text{for all } i. \tag{6.13}$$

In other words, a necessary condition for obtaining the linear MMSE estimate is the uncorrelatedness between the estimation error and the observed random variables. In the context of vector space, equation 6.13 is the well-known orthogonality principle. Intuitively, the equation states that the linear MMSE estimate of $S$ is the projection of $S$ onto the subspace spanned by the set of random variables $\{X_i\}$. In this framework, the norm of the vector space is the mean-square value while the inner product between two vectors is the correlation between two random variables.

Let the autocorrelation coefficients of $X_i$ be $E\{X_j X_i\} = r_{ji}$ and the crosscorrelation coefficients of $X_i$ and $S$ be $E\{S X_i\} = \rho_i$. Then, equation 6.13 is simply:

$$\rho_i = \sum_{j=1}^{K} a_j r_{ji} \quad \text{for all } i, \tag{6.14}$$

which is essentially the celebrated Wiener-Hopf equation. Assume that $r_{ij}$ and $\rho_j$ are known for all $i$, $j$; then the coefficients $\{a_i\}$ can be solved by equation 6.14. In fact, this equation can be stacked up and put in the following matrix form:

$$\begin{bmatrix} r_{11} & r_{21} & \cdots & r_{K1} \\ r_{12} & r_{22} & \cdots & r_{K2} \\ \vdots & \vdots & \ddots & \vdots \\ r_{1K} & r_{2K} & \cdots & r_{KK} \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_K \end{bmatrix} = \begin{bmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_K \end{bmatrix} \tag{6.15}$$

or simply:

$$R\,a = \rho. \tag{6.16}$$

Thus, the coefficient vector can be solved by:

$$a = R^{-1}\rho, \tag{6.17}$$

which is sometimes termed the normal equation. The orthogonality principle of equation 6.13 and the normal equation 6.17 are the basis of the Wiener filter, which is one of the most studied and commonly used linear adaptive filters with many applications (Haykin, 1996, 1991).

The matrix inversion in equation 6.17 could present a formidable numerical difficulty, especially when the vector dimension is high. In those cases, some computationally efficient algorithms, like Levinson recursion, can be employed to mitigate the impact of numerical difficulty.

6.4 Fisher Statistics

Generally speaking, there are two schools of thought in statistics: Bayes and Fisher. Bayesian statistics and estimation were presented in the previous section, where the emphasis was on estimation of random signals and parameters. This section is focused on estimation of a deterministic parameter (or signal). A natural question that one may ask is whether the Bayesian approach presented in the previous section is applicable here or whether the estimation of deterministic signals can be treated as a special case of estimating random signals. A closer examination shows that an alternative approach needs to be taken (van Trees, 1968) because the essential issues that govern the performance of estimators differ significantly.

The fundamental concept underlying the Fisher school of statistics is that of the likelihood function. In contrast, Bayesian statistics is derived from conditional distributions, namely, the a posteriori distributions. This section begins with an introduction of the likelihood function and a derivation of the maximum likelihood estimation method. These are followed by the notion of sufficient statistics, which plays an important role in Fisherian statistics. Optimality properties of maximum likelihood estimates are then examined with the definition of Fisher information. The Cramér-Rao lower bound and minimum variance unbiased estimators are then discussed.

6.4.1 Likelihood Functions

Fisher's approach to estimation centers around the concept of the likelihood function (Fisher, 1992). Consider a random variable $X$ that has a probability distribution $F_X(x)$ with probability density function (pdf) $f_X(x)$ parameterized by a parameter $\theta$. The likelihood function (with respect to the parameter $\theta$) is defined as:

$$L(x; \theta) = f_X(x|\theta). \tag{6.18}$$

It may appear, at first sight, that the likelihood function is nothing but the pdf. It is important, however, to note that the likelihood function is really a function of the parameter $\theta$ for a fixed value of $x$, whereas the pdf is a function of the realization of the random variable $x$ for a fixed value of $\theta$. Therefore, in a likelihood function, the variable is $\theta$, while in a pdf the variable is $x$. The likelihood function is a quantitative indication of how
likely it is that a particular realization (observation) of the random variable would have been produced from a particular distribution. The higher the value of the likelihood function, the more likely the particular value of the parameter will have produced that realization of $x$. Hence, the cost function in the Fisherian estimation paradigm is the likelihood function, and the objective is to find a value of the parameter that maximizes the likelihood function, resulting in the maximum likelihood estimate (MLE).

The MLE can be derived as follows. Let there be a random variable whose pdf, $f_X(x)$, is parameterized by a parameter $\theta$. Define the objective function:

$$L(x; \theta) = f_X(x|\theta), \tag{6.19}$$

where $f_X(x|\theta)$ is the pdf of $X$ for a given $\theta$. Then, the MLE is obtained by:

$$\hat{\theta}_{MLE}(x) = \arg\max_{\theta} L(x; \theta). \tag{6.20}$$

Clearly, a necessary condition for $L(x; \theta)$ to be maximized is that:

$$\left.\frac{\partial}{\partial \theta} L(x; \theta)\right|_{\theta = \hat{\theta}_{MLE}} = 0. \tag{6.21}$$

This equation is sometimes referred to as the likelihood equation. In general, it is more convenient to consider the log-likelihood function defined as:

$$l(x; \theta) = \ln L(x; \theta) \tag{6.22}$$

and the log-likelihood equation as:

$$\left.\frac{\partial}{\partial \theta} l(x; \theta)\right|_{\theta = \hat{\theta}_{MLE}} = 0. \tag{6.23}$$

Example
Consider a sequence of random variables $X_i$, $i = 1, 2, \ldots, K$ modeled as:

$$X_i = \theta + N_i,$$

where $N_i$ is a sequence of iid zero-mean Gaussian random variables with variance $\sigma^2$. In practice, this formulation can be considered as one of estimating a DC signal embedded in white Gaussian noise. The issue here is to estimate the strength, i.e., the magnitude, of the DC signal based on a set of observations $x_i$, $i = 1, 2, \ldots, K$. The log-likelihood function as defined in equation 6.22 is given by:

$$l(x; \theta) = -K \ln(\sqrt{2\pi}\,\sigma) - \frac{1}{2\sigma^2} \sum_{i=1}^{K} (x_i - \theta)^2,$$

where $x = [x_1, x_2, \ldots, x_K]^T$.

Solving the log-likelihood equation, 6.23, yields:

$$\hat{\theta}_{MLE} = \frac{1}{K} \sum_{i=1}^{K} x_i.$$

Thus, the MLE for a DC signal embedded in additive zero-mean white Gaussian noise is the sample mean. As it turns out, this sample mean is the sufficient statistic for estimating $\theta$. The concept of the sufficient statistic is critical to the optimum properties of MLE and, in general, to Fisherian statistics. Generally speaking, the likelihood function is directly related to sufficient statistics, and the MLE is usually a function of sufficient statistics.

6.4.2 Sufficient Statistics

Sufficient statistics is a concept defined in reference to a particular parameter (or signal) to be estimated. Roughly speaking, a sufficient statistic is a function of the set of observations that contains all the information possibly obtainable for the estimation of a particular parameter. Given a parameter $\theta$ to be estimated, assume that $x$ is the vector consisting of the observed variables. A statistic $T(x)$ is said to be a sufficient statistic if the probability distribution of $X$ given $T(x) = t$ is independent of $\theta$. In essence, if $T(x)$ is a sufficient statistic, then all the information regarding estimation of $\theta$ that can be extracted from the observation is contained in $T(x)$. The Fisher factorization theorem stated below is sometimes used as a definition for the sufficient statistic.

Fisher Factorization Theorem
A function of the observation set $T(X)$ is a sufficient statistic if the likelihood function of $X$ can be expressed as:

$$L(x; \theta) = h(T(x), \theta)\, g(x). \tag{6.24}$$

In the example shown in Section 6.4.1, $\sum_{i=1}^{K} x_i$ is a sufficient statistic, and the sample mean $\frac{1}{K}\sum_{i=1}^{K} x_i$ is the MLE for $\theta$. The fact that $\sum_{i=1}^{K} x_i$ is a sufficient statistic can be easily seen by using the Fisher factorization theorem. In particular:

$$L(x; \theta) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{K} \exp\left[-\frac{1}{2\sigma^2} \sum_{i=1}^{K} (x_i - \theta)^2\right]. \tag{6.25}$$

The above equation can be expressed as:

$$L(x; \theta) = \left\{\exp\left[\frac{\theta}{\sigma^2}\left(\sum_{i=1}^{K} x_i\right) - \frac{K\theta^2}{2\sigma^2}\right]\right\} \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{K} \exp\left[-\frac{1}{2\sigma^2} \sum_{i=1}^{K} x_i^2\right]. \tag{6.26}$$

Identifying equation 6.26 with 6.24, it can be easily seen that, if the following is defined:
$$h(T(x), \theta) = \exp\left[\frac{\theta}{\sigma^2} \sum_{i=1}^{K} x_i - \frac{K\theta^2}{2\sigma^2}\right]$$

and

$$g(x) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{K} \exp\left[-\frac{1}{2\sigma^2} \sum_{i=1}^{K} x_i^2\right],$$

then $T(x) = \sum_{i=1}^{K} x_i$ is clearly a sufficient statistic for estimating $\theta$. It should be noted that the sufficient statistic may not be unique. In fact, it is always subject to a scaling factor.

In equation 6.26, it is seen that the pdf can be expressed as:

$$f_X(x|\theta) = \{\exp[c(\theta)T(x) + d(\theta) + S(x)]\}\, I(x), \tag{6.27}$$

where $c(\theta) = \frac{\theta}{\sigma^2}$, $T(x) = \sum_{i=1}^{K} x_i$, $d(\theta) = -\frac{K\theta^2}{2\sigma^2}$, $S(x) = -\frac{K}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{K} x_i^2$, and $I(x)$ is an indicator function that is valued at 1 wherever the value of the pdf is nonzero and zero otherwise. A pdf that can be expressed in the form given in equation 6.27 is said to belong to the exponential family of distributions (EFOD).

It can be verified easily that Gaussian, Laplace (or two-sided exponential), binomial, and Poisson distributions all belong to the EFOD. When a pdf belongs to the EFOD, one can easily identify the sufficient statistic. In fact, the mean and variance of the sufficient statistic can be easily calculated (Bickel and Doksum, 1977).

As stated previously, the fact that the MLE is a function of the sufficient statistic helps to ensure its quality as an estimator. This will be discussed in more detail in the subsequent sections.

6.4.3 Information Inequality and Cramér-Rao Lower Bound

There are several criteria that can be used to measure the quality of an estimator. If $\hat{\theta}(X)$ is an estimator for the parameter $\theta$ based on the observation $X$, three criteria are typically used to evaluate its quality:

1. Bias: If $E[\hat{\theta}(X)] = \theta$, $\hat{\theta}(X)$ is said to be an unbiased estimator.
2. Variance: For estimation of deterministic parameters and signals, variance of the estimator is the same as variance of the estimation error. It is a commonly used performance measure, for it is practically easy to evaluate.
3. Consistency: This is an asymptotic property that is examined when the sample size (i.e., the dimension of the vector $X$) approaches infinity. If $\hat{\theta}(X)$ converges with probability one to $\theta$, then it is said to be strongly consistent. If it converges in probability, it is weakly consistent.

In this section, the discussion will be focused on the second criterion, namely the variance, which is one of the most commonly used performance measures in statistical signal processing. In the estimation of deterministic parameters, Fisher's information is intimately related to the variance of the estimators. Fisher's information is defined as:

$$I(\theta) = E\left\{\left[\frac{\partial}{\partial \theta} l(x; \theta)\right]^2\right\}. \tag{6.28}$$

Note that Fisher's information is non-negative, and it is additive if the set of random variables is independent.

Theorem (Information Inequality)
Let $T(X)$ be a statistic such that its variance $V_\theta(T(X)) < \infty$, for all $\theta$ in the parameter space $\Theta$. Define $\psi(\theta) = E_\theta\{T(X)\}$. Assume that the following regularity condition holds:

$$E\left\{\frac{\partial}{\partial \theta} l(x; \theta)\right\} = 0 \quad \text{for all } \theta \in \Theta, \tag{6.29}$$

where $l(x; \theta)$ has been defined in equation 6.22. Assume further that $\psi(\theta)$ is differentiable for all $\theta$. Then:

$$V_\theta(T(X)) \geq \frac{[\psi'(\theta)]^2}{I(\theta)}. \tag{6.30}$$

Remarks
1. The regularity condition defined in equation 6.29 is a restriction imposed on the likelihood function to guarantee that the order of expectation operation and differentiation is interchangeable.
2. If the regularity condition holds also for second order derivatives, $I(\theta)$ as defined in equation 6.28 can also be evaluated as:

$$I(\theta) = -E\left\{\frac{\partial^2}{\partial \theta^2} l(x; \theta)\right\}.$$

3. The subscript $\theta$ of the expectation ($E_\theta$) and of the variance ($V_\theta$) indicates the dependence of expectation and variance on $\theta$.
4. The information inequality gives a lower bound on the variance that any estimate can achieve. It thus reveals the best that an estimator can do as measured by the error variance.
5. If $T(X)$ is an unbiased estimator for $\theta$ (i.e., $E_\theta\{T(X)\} = \theta$), then the information inequality, equation 6.30, reduces to:

$$V_\theta(T(X)) \geq \frac{1}{I(\theta)}, \tag{6.31}$$

which is the well-known Cramér-Rao lower bound (CRLB).
6. In statistical signal processing, closeness to the CRLB is often used as a measure of efficiency of an (unbiased) estimator. In particular, one may define the Fisherian efficiency of an unbiased estimator $\hat{\theta}(X)$ as:

$$\eta(\hat{\theta}(X)) = \frac{I^{-1}(\theta)}{V_\theta\{\hat{\theta}(X)\}}. \tag{6.32}$$

Note that $\eta$ never exceeds one, and the larger $\eta$, the more efficient that estimator is. In fact, when $\eta = 1$, the estimator achieves the CRLB and is said to be an efficient estimator in the Fisherian sense.
7. If an unbiased estimator has a variance that achieves the CRLB for all $\theta \in \Theta$, it is called a uniformly minimum variance unbiased estimator (UMVUE). It can be shown easily that UMVUE is a consistent estimator and MLE is usually the UMVUE.

6.4.4 Properties of MLE

The MLE has many interesting properties, and it is the purpose of this section to list some of those properties.

1. It is easy to see that the MLE may be biased. This is because being unbiased was not part of the objective in seeking MLE. MLE, however, is always asymptotically unbiased.
2. As shown in the previous section, MLE is a function of the sufficient statistic. This can also be seen from the Fisher factorization theorem.
3. MLE is asymptotically a minimum variance unbiased estimator. In other words, its variance asymptotically achieves the CRLB.
4. MLE is consistent. In particular, it converges to the parameter with probability one (or in probability).
5. Under the regularity condition of equation 6.29, if there exists an unbiased estimator whose variance attains the CRLB, it is the MLE.
6. Generally speaking, the MLE of a transformation of a parameter (or signal) is the transformation of the MLE of that parameter (or signal). This is referred to as the invariance property of the MLE.

The fact that MLE is a function of sufficient statistics means that it depends on relevant information. This does not necessarily mean that it will always be the best estimator (in the sense of, say, minimum variance), for it may not make the best use of the information. However, when the sample size is large, all the information relevant to the unknown parameter is essentially available to the MLE estimator. This explains why MLE is asymptotically unbiased, it is consistent, and its variance achieves asymptotically the CRLB.

Example (Kay, 1993)
Consider the problem of estimating the phase of a sinusoidal signal received with additive Gaussian noise. The problem is formulated as:

$$X_i = A\cos(\omega_0 i + \phi) + N_i, \quad i = 0, 1, \ldots, K-1,$$

where $\{N_i\}$ is an iid sequence of zero-mean Gaussian random variables with variance $\sigma^2$. Employing equation 6.23, MLE can be obtained by minimizing:

$$J(\phi) = \sum_{i=0}^{K-1} (x_i - A\cos(\omega_0 i + \phi))^2.$$

Differentiating $J(\phi)$ with respect to $\phi$ and setting it equal to zero yields:

$$\sum_{i=0}^{K-1} x_i \sin(\omega_0 i + \hat{\phi}_{MLE}) = A \sum_{i=0}^{K-1} \sin(\omega_0 i + \hat{\phi}_{MLE}) \cos(\omega_0 i + \hat{\phi}_{MLE}).$$

Assume that:

$$\frac{1}{K} \sum_{i=0}^{K-1} \cos(2\omega_0 i + 2\phi) = 0 \quad \text{for all } \phi.$$

Then the MLE for the phase can be approximated as:

$$\hat{\phi}_{MLE} \approx -\arctan \frac{\sum_{i=0}^{K-1} x_i \sin(\omega_0 i)}{\sum_{i=0}^{K-1} x_i \cos(\omega_0 i)}.$$

Perceptive readers may see that implementation of MLE can easily become complex and numerically difficult, especially when the underlying distribution is non-Gaussian. If the parameter to be estimated is a simple scalar, and its admissible values are limited to a finite interval, search algorithms can be employed to guarantee satisfactory results. If this is not the case, more sophisticated numerical optimization algorithms will be needed to render good estimation results. Among others, iterative algorithms such as the Newton-Raphson method and the expectation maximization method are often employed.

6.5 Signal Detection

The problem of signal detection can be formulated mathematically as one of binary hypothesis testing. Let $X$ be the random variable that represents the observation, and let the observation space be denoted by $\Omega$ (i.e., $x \in \Omega$). There are basically two hypotheses:

$$\text{Null hypothesis } H_0: X \sim F_0. \tag{6.33a}$$

$$\text{Alternative } H_1: X \sim F_1. \tag{6.33b}$$

Here $F_0$ is the probability distribution of $X$ given that $H_0$ is true, and $F_1$ is the probability distribution of $X$ given that $H_1$ is true. In some applications (e.g., a simple radar communication
system), it may be desirable to detect a signal of constant amplitude, and then $F_1$ will simply be $F_0$ shifted by a mean value equal to the signal amplitude. In general, this formulation also assumes that, with probability one, either the null hypothesis or the alternative is true. Specifically, let $\pi_0$ and $\pi_1$ be the prior probabilities of $H_0$ and $H_1$ being true, respectively. Then:

$$\pi_0 + \pi_1 = 1. \tag{6.34}$$

The objective here is to decide whether $H_0$ or $H_1$ is true based on the observation of $X = x$. A decision rule, namely a detection scheme, $d(x)$ essentially partitions $\Omega$ into two subspaces $\Omega_0$ and $\Omega_1$. The subspace $\Omega_0$ consists of all observations that lead to the decision that $H_0$ is true, while $\Omega_1$ consists of all observations that lead to the decision that $H_1$ is true. For notational convenience, one may also define $d(x)$ as follows:

$$d(x) = \begin{cases} 0 & \text{if } x \in \Omega_0 \\ 1 & \text{if } x \in \Omega_1 \end{cases}. \tag{6.35}$$

For any decision rule $d(x)$, there are clearly four possible outcomes:

1. Decide $H_0$ when $H_0$ is true
2. Decide $H_0$ when $H_1$ is true
3. Decide $H_1$ when $H_0$ is true
4. Decide $H_1$ when $H_1$ is true

Two types of error can occur:

1. Type-I error (false alarm): Decide $H_1$ when $H_0$ is true. The probability of type-I error is as follows:

$$\alpha = \int_{\Omega_1} f_0(x)\, dx, \tag{6.36}$$

where $f_0(x)$ is the pdf of $X$ under $H_0$.

2. Type-II error (miss): Decide $H_0$ when $H_1$ is true. The probability of type-II error is as follows:

$$P_m = \int_{\Omega_0} f_1(x)\, dx, \tag{6.37}$$

where $f_1(x)$ is the pdf of $X$ under $H_1$.

Another quantity of concern is the probability of detection, which is often termed the power in signal detection literature, defined by:

$$\beta = 1 - P_m = \int_{\Omega_1} f_1(x)\, dx. \tag{6.38}$$

Among the various decision criteria, Bayes and Neyman-Pearson are most popular. Detection schemes may also be classified into parametric and nonparametric detectors. The discussion here focuses on Bayes and Neyman-Pearson criteria that are considered parametric. Before any decision criteria can be derived, the following constraints need be stated first:

$$\text{Prob}\{d(x) = 0 | H_0\} + \text{Prob}\{d(x) = 1 | H_0\} = 1 \tag{6.39a}$$

$$\text{Prob}\{d(x) = 0 | H_1\} + \text{Prob}\{d(x) = 1 | H_1\} = 1. \tag{6.39b}$$

The above constraints of equations 6.39a and 6.39b are simply outcomes of the assumptions that $\Omega_0 \cup \Omega_1 = \Omega$ and $\Omega_0 \cap \Omega_1 = \emptyset$ (i.e., for every observation, an unambiguous decision must be made).

6.5.1 Bayesian Detection

The objective of a Bayes criterion is to minimize the so-called Bayes' risk which is, again, defined as the expected value of the cost. To derive a Bayes' detection rule, the costs of making decisions need to be defined first. Let the costs be denoted by $C_{ij}$, the cost of choosing $H_i$ when $H_j$ is true. In particular:

$C_{01}$ = cost of choosing $H_0$ when $H_1$ is true.

$C_{10}$ = cost of choosing $H_1$ when $H_0$ is true.

In addition, assume that the prior probabilities $\pi_0$ and $\pi_1$ are known. The Bayes' risk is then evaluated as follows:

$$\mathcal{R} \triangleq E\{C\} = \pi_0\{\text{Prob}\{d(x) = 0 | H_0\} C_{00} + \text{Prob}\{d(x) = 1 | H_0\} C_{10}\} + \pi_1\{\text{Prob}\{d(x) = 0 | H_1\} C_{01} + \text{Prob}\{d(x) = 1 | H_1\} C_{11}\}. \tag{6.40}$$

Substituting equations 6.34, 6.39a and 6.39b into equation 6.40 yields:

$$\mathcal{R} = \pi_0 C_{00} \int_{\Omega_0} f_0(x)\, dx + \pi_0 C_{10} \int_{\Omega_1} f_0(x)\, dx + \pi_1 C_{01} \int_{\Omega_0} f_1(x)\, dx + \pi_1 C_{11} \int_{\Omega_1} f_1(x)\, dx$$

$$= \pi_0 C_{10} + (1 - \pi_0) C_{11} + \int_{\Omega_0} \{(1 - \pi_0)(C_{01} - C_{11}) f_1(x) - \pi_0 (C_{10} - C_{00}) f_0(x)\}\, dx.$$

Note that the sum of the first two terms is a constant. In general, it is reasonable to assume that:

$$C_{01} - C_{11} > 0 \quad \text{and} \quad C_{10} - C_{00} > 0.$$

In other words, the costs of making correct decisions are less than those of making incorrect decisions. Define:

$$I_1(x) \triangleq (1 - \pi_0)(C_{01} - C_{11}) f_1(x).$$

$$I_2(x) \triangleq \pi_0 (C_{10} - C_{00}) f_0(x).$$

It can be seen easily that $I_1(x) \geq 0$ and $I_2(x) \geq 0$. Thus, $\mathcal{R}$ can be rewritten as:
6 Statistical Signal Processing 929

The rewritten risk is:

$$\mathcal{R} = \text{constant} + \int_{\Omega_0} [I_1(x) - I_2(x)]\,dx.$$

To minimize $\mathcal{R}$, the observation space $\Omega$ needs to be partitioned such that $x \in \Omega_1$ whenever:

$$I_1(x) \ge I_2(x).$$

In other words, decide $H_1$ if:

$$(1 - \pi_0)(C_{01} - C_{11}) f_1(x) \ge \pi_0 (C_{10} - C_{00}) f_0(x). \tag{6.41}$$

So, the Bayes' detection rule essentially evaluates the likelihood ratio defined by:

$$L(x) \triangleq \frac{f_1(x)}{f_0(x)} \tag{6.42}$$

and compares it to the threshold:

$$\lambda \triangleq \frac{\pi_0 (C_{10} - C_{00})}{(1 - \pi_0)(C_{01} - C_{11})}. \tag{6.43}$$

In particular:

$$L(x) = \frac{f_1(x)}{f_0(x)} \; \begin{cases} \ge \lambda \Rightarrow H_1 \\ < \lambda \Rightarrow H_0 \end{cases}. \tag{6.44}$$

A decision rule characterized by the likelihood ratio and a threshold as in equation 6.44 is referred to as a likelihood ratio test (LRT). A Bayes' detection scheme is always an LRT. Depending on how the a posteriori probabilities of the two hypotheses are defined, the Bayes' detector can be realized in different ways. One typical example is the so-called MAP detector that renders the minimum probability of error by choosing the hypothesis with the maximum a posteriori probability. Another class of detectors is the minimax detectors, which can be considered an extension of Bayes' detectors. The minimax detector is also an LRT. It assumes no knowledge of the prior probabilities (i.e., $\pi_0$ and $\pi_1$) and selects the threshold by choosing the prior probability that renders the maximum Bayes' risk. The minimax detector is a robust detector because its performance does not vary with the prior probabilities.

6.5.2 Neyman-Pearson Detection

The principle of the Neyman-Pearson criterion is founded on the Neyman-Pearson Lemma stated below:

Neyman-Pearson Lemma

Let $d_{\lambda^*}(x)$ be a likelihood ratio test with a threshold $\lambda^*$ as defined in equation 6.44. Let $\alpha^*$ and $\beta^*$ be the false-alarm rate and power, respectively, of the test $d_{\lambda^*}(x)$. Let $d_{\lambda}(x)$ be another arbitrary likelihood ratio test with threshold $\lambda$ and with false-alarm rate and power $\alpha$ and $\beta$, respectively. If $\alpha \le \alpha^*$, then $\beta \le \beta^*$.

The Neyman-Pearson Lemma shows that if one desires to increase the power of an LRT, one must also accept the consequence of an increased false-alarm rate. As such, the Neyman-Pearson detection criterion aims to maximize the power under the constraint that the false-alarm rate be upper bounded by, say, $\alpha_0$. The Neyman-Pearson detector can be derived by first defining a cost function:

$$J = (1 - \beta) + \lambda(\alpha - \alpha_0). \tag{6.45}$$

It can be shown that:

$$J = \lambda(1 - \alpha_0) + \int_{\Omega_0} [f_1(x) - \lambda f_0(x)]\,dx, \tag{6.46}$$

and an LRT will minimize $J$ for any positive $\lambda$. In particular:

$$L(x) = \frac{f_1(x)}{f_0(x)} \; \begin{cases} \ge \lambda \Rightarrow H_1 \\ < \lambda \Rightarrow H_0 \end{cases}.$$

To satisfy the constraint and to maximize the power, choose $\lambda$ so that $\alpha = \alpha_0$, namely:

$$\alpha = \int_{\lambda}^{\infty} f_{L|H_0}(l|H_0)\,dl = \alpha_0,$$

where $f_{L|H_0}(l|H_0)$ is the pdf of the likelihood ratio under $H_0$. The threshold is determined by solving the above equation.

The Neyman-Pearson detector is known to be the most powerful detector for the problem of detecting a constant signal in noise. One advantage of the Neyman-Pearson detector is that its implementation does not require explicit knowledge of the prior probabilities and costs of decisions. However, as is the case for the Bayes' detector, evaluation of the likelihood ratio still requires exact knowledge of the pdf of $X$ under both hypotheses.

6.5.3 Detection of a Known Signal in Gaussian Noise

Consider a signal detection problem formulated as follows:

$$\begin{aligned} H_0 &: X_i = N_i \\ H_1 &: X_i = N_i + S, \end{aligned} \tag{6.47}$$

$i = 1, 2, \ldots, K$. Assume that $S$ is a deterministic constant and that $N_i$, $i = 1, 2, \ldots, K$, are iid zero-mean Gaussian random variables with a known variance $\sigma^2$. The likelihood ratio is then:

$$L(x_1, x_2, \ldots, x_K) \triangleq \frac{\prod_{i=1}^{K} f_1(x_i)}{\prod_{i=1}^{K} f_0(x_i)}.$$
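The Neyman-Pearson threshold condition $\alpha = \alpha_0$ can also be solved empirically when the pdf of the likelihood ratio under $H_0$ is not at hand. The sketch below is an illustration under stated assumptions rather than the chapter's method: it simulates the model of equation 6.47, evaluates the product-form likelihood ratio, and takes the empirical $(1 - \alpha_0)$ quantile of $L$ under $H_0$ as $\lambda$. The values of `K`, `S`, `SIGMA`, `ALPHA0`, the trial count, and the seed are all hypothetical.

```python
import math
import random


def gauss_pdf(x, mean, sigma):
    """Density of N(mean, sigma^2) evaluated at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def likelihood_ratio(xs, s, sigma):
    """Product-form likelihood ratio for iid samples: prod f1(x_i) / prod f0(x_i)."""
    ratio = 1.0
    for x in xs:
        ratio *= gauss_pdf(x, s, sigma) / gauss_pdf(x, 0.0, sigma)
    return ratio


def simulate_lr(rng, k, s, sigma, signal_present, trials):
    """Draw `trials` observation vectors of length k and return their likelihood ratios."""
    offset = s if signal_present else 0.0
    return [
        likelihood_ratio([rng.gauss(0.0, sigma) + offset for _ in range(k)], s, sigma)
        for _ in range(trials)
    ]


def np_threshold(lr_under_h0, alpha0):
    """Empirical (1 - alpha0) quantile of the likelihood ratio under H0."""
    ordered = sorted(lr_under_h0)
    index = min(len(ordered) - 1, int(math.ceil((1.0 - alpha0) * len(ordered))))
    return ordered[index]


def exceed_rate(lrs, lam):
    """Fraction of trials that decide H1, i.e. L(x) >= lam."""
    return sum(1 for v in lrs if v >= lam) / len(lrs)


rng = random.Random(0)                    # fixed seed, for repeatability only
K, S, SIGMA, ALPHA0 = 4, 1.0, 1.0, 0.1    # hypothetical problem parameters
TRIALS = 20000

lam = np_threshold(simulate_lr(rng, K, S, SIGMA, False, TRIALS), ALPHA0)
# Fresh trials are used to estimate the false-alarm rate and the power.
alpha_hat = exceed_rate(simulate_lr(rng, K, S, SIGMA, False, TRIALS), lam)
beta_hat = exceed_rate(simulate_lr(rng, K, S, SIGMA, True, TRIALS), lam)
print(f"lambda={lam:.3f}  alpha_hat={alpha_hat:.3f}  beta_hat={beta_hat:.3f}")
```

The estimated false-alarm rate should land close to `ALPHA0`, with the estimated power well above it. A quantile estimate is only one way to solve the constraint; a closed-form solution is preferable whenever the distribution of the test statistic is known.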

Taking the logarithm of $L(x)$ yields the log-likelihood ratio:

$$\ln L(x) = \sum_{i=1}^{K} \frac{2 x_i S - S^2}{2\sigma^2}. \tag{6.48}$$

Straightforward algebraic manipulations show that the LRT is characterized by comparing a test statistic:

$$T(x) \triangleq \sum_{i=1}^{K} x_i \tag{6.49}$$

with the threshold $\lambda_0$. If $T(x) \ge \lambda_0$, then $H_1$ is said to be true; otherwise, $H_0$ is said to be true.

It is interesting to note that $T(x) = \sum_{i=1}^{K} x_i$ turns out to be the sufficient statistic for estimating $S$, as shown in Section 6.4.2. This should not be surprising, as detection of a constant signal in noise is a dual problem of estimating the mean of the observations.

Under the iid Gaussian assumption, the test statistic is clearly a Gaussian random variable. Furthermore, $E\{T(X)|H_0\} = 0$, $E\{T(X)|H_1\} = KS$, and $\mathrm{Var}\{T(X)|H_0\} = \mathrm{Var}\{T(X)|H_1\} = K\sigma^2$. The pdf of $T(x)$ under each hypothesis, plotted in Figure 6.1, can be written as:

$$f_{T|H_0}(t|H_0) = \frac{1}{\sqrt{2\pi K \sigma^2}}\, e^{-t^2 / (2K\sigma^2)}$$

$$f_{T|H_1}(t|H_1) = \frac{1}{\sqrt{2\pi K \sigma^2}}\, e^{-(t - KS)^2 / (2K\sigma^2)}$$

The false alarm rate and power are given by:

$$\alpha = \int_{\lambda_0}^{\infty} f_{T|H_0}(t|H_0)\,dt.$$

$$\beta = \int_{\lambda_0}^{\infty} f_{T|H_1}(t|H_1)\,dt.$$

The above two equations can be further specified as:

$$\alpha = \int_{\lambda_0}^{\infty} \frac{1}{\sqrt{2\pi K \sigma^2}}\, e^{-t^2 / (2K\sigma^2)}\,dt = 1 - \Phi\!\left(\frac{\lambda_0}{\sigma\sqrt{K}}\right). \tag{6.50}$$

$$\beta = \int_{\lambda_0}^{\infty} \frac{1}{\sqrt{2\pi K \sigma^2}}\, e^{-(t - KS)^2 / (2K\sigma^2)}\,dt = 1 - \Phi\!\left(\frac{\lambda_0 - KS}{\sigma\sqrt{K}}\right). \tag{6.51}$$

In equations 6.50 and 6.51, $\Phi$ denotes the standard Gaussian distribution function:

$$\Phi(x) \triangleq \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt.$$

For the Neyman-Pearson detector, the threshold $\lambda_0$ is determined by the constraint on the false alarm rate:

$$\lambda_0 = \sigma\sqrt{K}\, \Phi^{-1}(1 - \alpha).$$

Combining equations 6.50 and 6.51 yields a relation between the false alarm rate and power, namely:

$$\beta = 1 - \Phi\!\left(\Phi^{-1}(1 - \alpha) - \frac{\sqrt{K}\, S}{\sigma}\right). \tag{6.52}$$

Figure 6.1 is a good illustration of the Neyman-Pearson Lemma. The shaded area under $f_{T|H_1}(t|H_1)$ is the value of power, and the shaded area under $f_{T|H_0}(t|H_0)$ is the false alarm rate. It is seen that if the threshold is moved to the left, both the power and the false alarm rate increase.

Remarks

1. It can be seen from equation 6.51 that, for $S > 0$, as the sample size $K$ increases, the miss probability $P_m = 1 - \beta$ decreases and $\beta$ increases. In fact, $\lim_{K \to \infty} \beta = 1$.
2. Define $d^2 \triangleq K S^2 / \sigma^2$, which can be taken as the signal-to-noise ratio (SNR). From equation 6.52, one can see that $\lim_{d \to \infty} \beta = 1$.
3. If the test statistic is defined with a scaling factor, namely $T(x) = \sum_{i=1}^{K} \frac{S}{\sigma^2} x_i$, the detector remains unchanged as long as the threshold is also scaled accordingly. Let $h_i = S/\sigma^2$, and then $T(x) = \sum_{i=1}^{K} h_i x_i$. The detector is essentially a discrete-time matched filter. This detector is also known as the correlation detector, as it correlates the signal with the observation. When the output of the detector is of a large numerical value, the correlation between the observation and the signal is high, and $H_1$ is (likely to be) true.

The Neyman-Pearson detector can be further characterized by the receiver operating characteristic (ROC) curve shown in Figure 6.2, which is a plot of equation 6.52, parameterized by $d$, where $d^2$ is the SNR. It should be noted that all continuous LRTs have ROCs that lie above the line $\alpha = \beta$ and are concave downward. In addition, the slope of an ROC curve at a particular point is the value of the threshold $\lambda$ required to achieve the prescribed values of $\alpha$ and $\beta$.

6.6 Suggested Readings

There is a rich body of literature on the subjects of statistical signal processing and mathematical statistics. A classic textbook on detection and estimation is by van Trees (1968). This book provides a good basic treatment of the subject, and it is easy to read. Since then, many books have been written. Textbooks written by Poor (1988) and Kay (1993, 1998) are the more popular ones. Poor's book provides a fairly complete coverage of the subject of signal detection and estimation. Its presentation is built on the principles of mathematical statistics and includes some brief discussions of nonparametric and robust detection theory. Kay's books are more relevant to signal processing applications, though they also include a good deal of theoretical treatment in statistics. In addition, Kassam (1988)

offers a good understanding of the subject of signal detection in non-Gaussian noise, and Weber (1987) offers useful insights into signal design for both coherent and incoherent digital communication systems. If any reader is interested in learning more about mathematical statistics, Bickel and Doksum (1977) would be a good reference. It contains in-depth coverage of the subject. For quick references to the subject, however, a book by Silvey (1975) is a very useful one.

It should be noted that studies of statistical signal processing cannot be effective without proper background in probability theory and random processes. Many reference books on that subject, such as Billingsley (1979), Kendall and Stuart (1977), Papoulis and Pillai (2002), and Stark and Woods (2002), are available.

FIGURE 6.1 Probability Density Function of the Test Statistic

FIGURE 6.2 An Example of Receiver Operation Curves

References
Bickel, P.J., and Doksum, K.A. (1977). Mathematical statistics: Basic ideas and selected topics. San Francisco: Holden-Day.
Billingsley, P. (1979). Probability and measure. New York: John Wiley & Sons.
Fisher, R.A. (1950). On the mathematical foundations of theoretical statistics. In R.A. Fisher, Contributions to mathematical statistics. New York: John Wiley & Sons.
Haykin, S. (2001). Adaptive filter theory. (4th ed.). Englewood Cliffs, NJ: Prentice Hall.
Haykin, S. (Ed.). (1991). Advances in spectrum analysis and array processing. Vols. 1 and 2. Englewood Cliffs, NJ: Prentice Hall.
Kassam, S.A. (1988). Signal detection in non-Gaussian noise. New York: Springer-Verlag.
Kay, S.M. (1993). Fundamentals of statistical signal processing: Estimation theory. Upper Saddle River, NJ: Prentice Hall.

Kay, S.M. (1998). Fundamentals of statistical signal processing: Detection theory. Upper Saddle River, NJ: Prentice Hall.
Kendall, M.G., and Stuart, A. (1977). The advanced theory of statistics, Vol. 2. New York: Macmillan Publishing.
Papoulis, A., and Pillai, S.U. (2002). Probability, random variables, and stochastic processes. (4th ed.). New York: McGraw-Hill.
Poor, H.V. (1988). An introduction to signal detection and estimation. New York: Springer-Verlag.
Silvey, S.D. (1975). Statistical inference. London: Chapman and Hall.
Stark, H., and Woods, J.W. (2002). Probability, random processes, and estimation theory. (3rd ed.). Upper Saddle River, NJ: Prentice Hall.
van Trees, H.L. (1968). Detection, estimation, and modulation theory. New York: John Wiley & Sons.
Weber, C.L. (1987). Elements of detection and signal design. New York: Springer-Verlag.