6.1 Introduction

Statistical signal processing is an important subject in signal processing that enjoys a wide range of applications, including communications, control systems, medical signal processing, and seismology. It plays an important role in the design, analysis, and implementation of adaptive filters, such as adaptive equalizers in digital communication systems. It is also the foundation of multiuser detection, which is considered an effective means of mitigating multiple-access interference in spread-spectrum wireless communications.

The fundamental problems of statistical signal processing are those of signal detection and estimation, which aim to extract information from the received signals and help make decisions. The received signals (especially those in communication systems) are typically modeled as random processes due to the existence of uncertainties. The uncertainties usually arise in the process of signal propagation or are due to noise in measurements. Since signals are considered random, statistical tools are needed. As such, techniques for solving those problems are derived from the principles of mathematical statistics, namely, hypothesis testing and estimation.

This chapter presents a brief introduction to some basic concepts critical to statistical signal processing. To begin with, the principles of Bayesian and Fisher statistics relevant to estimation are presented. In the discussion of Bayesian statistics, emphasis is placed on two types of estimators: the minimum mean-squared error (MMSE) estimator and the maximum a posteriori (MAP) estimator. An important class of linear estimators, namely the Wiener filter, is also presented as a linear MMSE estimator. In the discussion of Fisher statistics, emphasis is placed on the maximum likelihood estimator (MLE), the concept of sufficient statistics, and the information inequality that can be used to measure the quality of an estimator.

This chapter also includes a brief discussion of signal detection. The signal detection problem is presented as one of binary hypothesis testing. Both Bayesian and Neyman-Pearson optimum criteria are presented and shown to be implemented with likelihood ratio tests. The principles of hypothesis testing also find a wide range of applications that involve decision making.

6.2 Bayesian Estimation

Bayesian estimation methods are generally employed to estimate a random variable (or parameter) based on another
random variable that is usually observable. Derivations of Bayesian estimation algorithms depend critically on the a posteriori distribution of the underlying signals (or signal parameters) to be estimated. Those a posteriori distributions are obtained by employing Bayes' rule, thus the name Bayesian estimation.

Consider a random variable X that is some function of another random variable S. In practice, X is what is observed and S is the signal (or signal parameter) that is to be estimated. Denote the estimate as $\hat{S}(X)$ and the error as $e \triangleq S - \hat{S}(X)$. Generally, the cost is defined as a function of the estimation error that clearly depends on both X and S, thus $J(e) = J(S, X)$.

The objective of Bayesian estimation is to minimize the Bayes' risk $\mathcal{R}$, which is defined as the ensemble average (i.e., the expected value) of the cost. In particular, the Bayes' risk is defined by:

$$\mathcal{R} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} J(s, x)\, f_{S,X}(s, x)\, dx\, ds, \tag{6.1}$$

where $f_{S,X}(s, x)$ is the joint probability density function (pdf) of the random variables S and X. In practice, the joint pdf is not directly obtainable. However, by Bayes' rule, $f_{S,X}(s, x) = f_{S|X}(s \mid x)\, f_X(x)$, so that:

$$\mathcal{R} = \int_{-\infty}^{\infty} \left\{ \int_{-\infty}^{\infty} J(s, x)\, f_{S|X}(s \mid x)\, ds \right\} f_X(x)\, dx. \tag{6.2}$$

Because the cost function is, in general, non-negative and so are the pdfs, minimizing the Bayes' risk is equivalent to minimizing:

$$\int_{-\infty}^{\infty} J(s, x)\, f_{S|X}(s \mid x)\, ds. \tag{6.3}$$

Depending on how the cost function is defined, the Bayesian estimation principle leads to different kinds of estimation algorithms. Two of the most commonly used cost functions are the following:

$$J_{MS}(e) = |e|^2. \tag{6.4a}$$

$$J_{MAP}(e) = \begin{cases} 0 & \text{if } |e| < \Delta/2 \\ 1 & \text{if } |e| > \Delta/2 \end{cases} \quad \text{where } \Delta \ll 1. \tag{6.4b}$$

These example cost functions result in two popular estimators, namely, the minimum mean-squared error (MMSE) estimator and the maximum a posteriori (MAP) estimator. These two estimators are discussed in more detail below.

6.2.1 Minimum Mean-Squared Error Estimation

When the cost function is defined to be the mean-squared error (MSE), as in equation 6.4a, the Bayesian estimate can be derived by substituting $J(e) = |e|^2 = (s - \hat{s}(x))^2$ into equation 6.2. Hence the following is true:

$$\mathcal{R}_{MS} = \int_{-\infty}^{\infty} \left\{ \int_{-\infty}^{\infty} (s - \hat{s}(x))^2\, f_{S|X}(s \mid x)\, ds \right\} f_X(x)\, dx. \tag{6.5}$$

Denote the resulting estimate as $\hat{s}_{MS}(x)$, and use the same argument that leads to equation 6.3. A necessary condition for minimizing $\mathcal{R}_{MS}$ results as:

$$\frac{\partial}{\partial \hat{s}} \int_{-\infty}^{\infty} (s - \hat{s})^2\, f_{S|X}(s \mid x)\, ds = 0. \tag{6.6}$$

The differentiation in equation 6.6 is evaluated at $\hat{s} = \hat{s}_{MS}(x)$. Consequently:

$$\int_{-\infty}^{\infty} (s - \hat{s}_{MS}(x))\, f_{S|X}(s \mid x)\, ds = 0;$$

that is:

$$\hat{s}_{MS}(x) = \int_{-\infty}^{\infty} s\, f_{S|X}(s \mid x)\, ds = E[S \mid X = x]. \tag{6.7}$$

The MMSE estimate is therefore the conditional-mean estimate. If the a posteriori distribution is Gaussian, then the conditional-mean estimator is a linear function of X, regardless of the functional relation between S and X. The following theorem summarizes a very familiar result.

Theorem
Let $X = [X_1, X_2, \ldots, X_K]^T$ and S be jointly Gaussian with zero means. The MMSE estimate of S based on X is $E[S \mid X]$, and $E[S \mid X] = \sum_{i=1}^{K} a_i x_i$, where the $a_i$ are chosen such that $E[(S - \sum_{i=1}^{K} a_i X_i) X_j] = 0$ for any $j = 1, 2, \ldots, K$.

6.2.2 Maximum a Posteriori Estimation

If the cost function is defined as in equation 6.4b, the Bayes' risk is as follows:

$$\mathcal{R}_{MAP} = \int_{-\infty}^{\infty} f_X(x) \left[ 1 - \int_{\hat{s}_{MAP} - \Delta/2}^{\hat{s}_{MAP} + \Delta/2} f_{S|X}(s \mid x)\, ds \right] dx.$$

To minimize $\mathcal{R}_{MAP}$, we maximize $\int_{\hat{s}_{MAP} - \Delta/2}^{\hat{s}_{MAP} + \Delta/2} f_{S|X}(s \mid x)\, ds$. When $\Delta$ is extremely small (as is required), this is equivalent to
6 Statistical Signal Processing 923
maximizing $f_{S|X}(s \mid x)$. Thus, $\hat{s}_{MAP}(x)$ is the value of s that maximizes $f_{S|X}(s \mid x)$.

It is often more convenient (for algebraic manipulation) to consider $\ln f_{S|X}(s \mid x)$, since $\ln(x)$ is a monotonically increasing function of x. A necessary condition for maximizing $\ln f_{S|X}(s \mid x)$ is as written here:

$$\left. \frac{\partial}{\partial s} \ln f_{S|X}(s \mid x) \right|_{s = \hat{s}_{MAP}(x)} = 0. \tag{6.8}$$

Equation 6.8 is often referred to as the MAP equation. Employing Bayes' rule, the MAP equation can also be written as:

$$\left. \left[ \frac{\partial}{\partial s} \ln f_{X|S}(x \mid s) + \frac{\partial}{\partial s} \ln f_S(s) \right] \right|_{s = \hat{s}_{MAP}(x)} = 0. \tag{6.9}$$

Example (van Trees, 1968)
Let $X_i$, $i = 1, 2, \ldots, K$ be a sequence of random variables modeled as follows:

$$X_i = S + N_i, \quad i = 1, 2, \ldots, K,$$

where S is a zero-mean Gaussian random variable with variance $\sigma_s^2$ and $\{N_i\}$ is a sequence of independent and identically distributed (iid) zero-mean Gaussian random variables with variance $\sigma_n^2$. Denote $X = [X_1\ X_2\ \ldots\ X_K]^T$. Then:

$$f_{X|S}(x \mid s) = \prod_{i=1}^{K} \frac{1}{\sqrt{2\pi}\,\sigma_n} \exp\left[ -\frac{(x_i - s)^2}{2\sigma_n^2} \right]$$

$$f_S(s) = \frac{1}{\sqrt{2\pi}\,\sigma_s} \exp\left[ -\frac{s^2}{2\sigma_s^2} \right]$$

$$f_{S|X}(s \mid x) = \frac{f_{X|S}(x \mid s)\, f_S(s)}{f_X(x)} = \frac{1}{f_X(x)} \frac{1}{\sqrt{2\pi}\,\sigma_s} \left[ \prod_{i=1}^{K} \frac{1}{\sqrt{2\pi}\,\sigma_n} \right] \exp\left[ -\frac{1}{2} \left( \sum_{i=1}^{K} \frac{(x_i - s)^2}{\sigma_n^2} + \frac{s^2}{\sigma_s^2} \right) \right]$$

$$f_{S|X}(s \mid x) = C(x) \exp\left[ -\frac{1}{2\sigma_p^2} \left( s - \frac{\sigma_s^2}{\sigma_s^2 + \sigma_n^2/K} \cdot \frac{1}{K} \sum_{i=1}^{K} x_i \right)^2 \right], \tag{6.10}$$

where $\sigma_p^2 \triangleq \left( \frac{1}{\sigma_s^2} + \frac{K}{\sigma_n^2} \right)^{-1} = \frac{\sigma_s^2 \sigma_n^2}{K\sigma_s^2 + \sigma_n^2}$.

Equation 6.10 shows that the a posteriori pdf is Gaussian with mean $\frac{\sigma_s^2}{\sigma_s^2 + \sigma_n^2/K} \cdot \frac{1}{K} \sum_{i=1}^{K} x_i$, which is therefore both the conditional mean and the maximizer of the a posteriori pdf. In this example, the MMSE estimate and the MAP estimate are equivalent because the a posteriori distribution is Gaussian. Some useful insights into Bayesian estimation can be gained through this example (van Trees, 1968).

Remarks
1. If $\sigma_s^2 \ll \sigma_n^2/K$, the a priori knowledge is more useful than the observed data, and the estimate is very close to the a priori mean (i.e., 0). In this case, the observed data have almost no effect on the value of the estimate.
2. If $\sigma_s^2 \gg \sigma_n^2/K$, the estimate is directly related to the observed data as it is the sample mean, while the a priori knowledge is of little value.
3. The equivalence of $\hat{s}_{MAP}(x)$ to $\hat{s}_{MS}(x)$ is not restricted to the case of a Gaussian a posteriori pdf. In fact, if the cost function is symmetric and nondecreasing and if the a posteriori pdf is symmetric and unimodal and satisfies $\lim_{s \to \infty} J(s, x)\, f_{S|X}(s \mid x) = 0$, then the resulting Bayesian estimate (e.g., the MAP estimate) is equivalent to $\hat{s}_{MS}(x)$.

6.3 Linear Estimation

Since the Bayesian estimators (MMSE and MAP estimates) presented in the previous section are usually not linear, it may be impractical to implement them. An alternative is to restrict the consideration to only the class of linear estimators and then find the optimum estimator in that class. As such, the notion of optimality deviates from that of Bayesian estimation, which minimizes the Bayes' risk.

One of the most commonly used optimization criteria is the MSE (i.e., minimizing the error variance). This approach leads to the class of Wiener filters that includes the Kalman filter as a special case. In essence, the Kalman filter is a realizable Wiener filter as it is derived with a realizable (state-space) model. Those linear estimators are much more appealing in practice due to reduced implementational complexity and relative simplicity of performance analysis.

The problem can be described as follows. Given a set of zero-mean random variables, $X_1, X_2, \ldots, X_K$, it is desired to estimate a random variable S (also zero-mean). The objective here is to find an estimator $\hat{S}$ that is linear in $X_i$ and that is optimum in some sense, like MMSE.

Clearly, if $\hat{S}$ is constrained to be linear in $X_i$, it can be written as a linear combination, $\hat{S} = \sum_{i=1}^{K} a_i X_i$.
If the objective is to minimize the MSE, namely:

$$E\{\|S - \hat{S}\|^2\} = E\left\{ \left\| S - \sum_{i=1}^{K} a_i X_i \right\|^2 \right\}, \tag{6.11}$$

a necessary condition is that:

$$\frac{\partial}{\partial a_i} E\{\|S - \hat{S}\|^2\} = 0 \quad \text{for all } i. \tag{6.12}$$

... 6.17 are the basis of the Wiener filter, which is one of the most studied and commonly used linear adaptive filters with many applications (Haykin, 1996, 1991).

The matrix inversion in equation 6.17 could present a formidable numerical difficulty, especially when the vector dimension is high. In those cases, computationally efficient algorithms, like the Levinson recursion, can be employed to mitigate the impact of numerical difficulty.
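As a minimal sketch of how the linear MMSE coefficients can be computed, the code below takes the condition in equation 6.12 in its orthogonality form, $E[(S - \sum_i a_i X_i) X_j] = 0$ (the form stated in the theorem of Section 6.2.1), which gives the linear system $R a = p$ with $R_{ij} = E[X_i X_j]$ and $p_j = E[S X_j]$. The model $X_i = S + N_i$ is borrowed from the example in Section 6.2.2; the numerical values, the names `R`, `p`, and `a`, and the helper `solve_linear_system` are illustrative assumptions, not part of the original text.

```python
def solve_linear_system(R, p):
    """Solve R a = p by Gaussian elimination with partial pivoting."""
    n = len(p)
    # Augmented matrix [R | p], copied so the inputs are not modified.
    M = [list(R[i]) + [p[i]] for i in range(n)]
    for col in range(n):
        # Move the row with the largest pivot candidate into place.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    a = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * a[j] for j in range(i + 1, n))
        a[i] = (M[i][n] - tail) / M[i][i]
    return a


# Assumed model (from the Section 6.2.2 example): X_i = S + N_i, with S and
# the N_i zero-mean, mutually independent, variances var_s and var_n.
K = 4
var_s, var_n = 2.0, 0.5  # hypothetical values chosen for illustration

# R[i][j] = E[X_i X_j] = var_s + var_n * (i == j);  p[j] = E[S X_j] = var_s
R = [[var_s + (var_n if i == j else 0.0) for j in range(K)] for i in range(K)]
p = [var_s] * K

a = solve_linear_system(R, p)

# Coefficient implied by the posterior mean in equation 6.10:
# (var_s / (var_s + var_n / K)) * (1 / K) = var_s / (K * var_s + var_n)
closed_form = var_s / (K * var_s + var_n)
print(a, closed_form)
```

Under these assumptions every weight equals $\sigma_s^2 / (K\sigma_s^2 + \sigma_n^2)$, so the linear MMSE estimate coincides with the conditional mean of equation 6.10, as expected for jointly Gaussian variables. A dense solver is used only for clarity; when R has Toeplitz structure, as it does here, a Levinson-type recursion is the usual cheaper alternative.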
... likely that a particular realization (observation) of the random variable would have been produced from a particular distribution. The higher the value of the likelihood function, the more likely it is that the particular value of the parameter produced that realization of x. Hence, the cost function in the Fisherian estimation paradigm is the likelihood function, and the objective is to find a value of the parameter that maximizes the likelihood function, resulting in the maximum likelihood estimate (MLE).

The MLE can be derived as follows. Let there be a random variable whose pdf, $f_X(x)$, is parameterized by a parameter $\theta$. Define the objective function:

$$L(x; \theta) = f_X(x \mid \theta), \tag{6.19}$$

... strength, i.e., the magnitude, of the DC signal based on a set of observations $x_i$, $i = 1, 2, \ldots, K$. The log-likelihood function as defined in equation 6.22 is given by:

$$l(x; \theta) = -K \ln(\sqrt{2\pi}\,\sigma) - \frac{1}{2\sigma^2} \sum_{i=1}^{K} (x_i - \theta)^2,$$

where $x = [x_1, x_2, \ldots, x_K]^T$.

Solving the log-likelihood equation, 6.23, yields:

$$\hat{\theta}_{MLE} = \frac{1}{K} \sum_{i=1}^{K} x_i.$$

Thus, the MLE for a DC signal embedded in additive zero-mean white Gaussian noise is the sample mean. As it turns out, this sample mean is the sufficient statistic for estimating $\theta$. The concept of the sufficient statistic is critical to the optimum properties of the MLE and, in general, to Fisherian statistics. Generally speaking, the likelihood function is directly related to sufficient statistics, and the MLE is usually a function of sufficient statistics.

The above equation can be expressed as:

$$f_X(x \mid \theta) = \left\{ \exp\left[ \frac{\theta}{\sigma^2} \sum_{i=1}^{K} x_i - \frac{K\theta^2}{2\sigma^2} \right] \right\} \left\{ \left( \frac{1}{\sqrt{2\pi}\,\sigma} \right)^{K} \exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^{K} x_i^2 \right] \right\}. \tag{6.26}$$

Identifying equation 6.26 with 6.24, it can be easily seen that, if the following is defined:
926 Yih-Fang Huang
... be shown easily that the UMVUE is a consistent estimator and the MLE is usually the UMVUE.

6.4.4 Properties of MLE

The MLE has many interesting properties, and it is the purpose of this section to list some of those properties.

...

$$\sum_{i=0}^{K-1} x_i \sin(\omega_0 i + \hat{\phi}_{MLE}) = A \sum_{i=0}^{K-1} \sin(\omega_0 i + \hat{\phi}_{MLE}) \cos(\omega_0 i + \hat{\phi}_{MLE}).$$

Assume that:

$$\frac{1}{K} \sum_{i=0}^{K-1} \cos(2\omega_0 i + 2\phi) = 0 \quad \text{for all } \phi.$$
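As a worked step, the assumption above can be used to solve the likelihood equation for the phase estimate. This is a sketch that assumes the likelihood equation has the form $\sum_i x_i \sin(\omega_0 i + \hat{\phi}_{MLE}) = A \sum_i \sin(\omega_0 i + \hat{\phi}_{MLE}) \cos(\omega_0 i + \hat{\phi}_{MLE})$ with both sums running over $i = 0, \ldots, K-1$; the summation limits are inferred from the assumption rather than stated explicitly.

```latex
\begin{align*}
% Right-hand side: sin(u)cos(u) = (1/2) sin(2u). Applying the assumption with
% phi replaced by phi - pi/4 turns the cosine sum into a sine sum, so it vanishes.
A \sum_{i=0}^{K-1} \sin(\omega_0 i + \hat{\phi}_{MLE}) \cos(\omega_0 i + \hat{\phi}_{MLE})
  &= \frac{A}{2} \sum_{i=0}^{K-1} \sin(2\omega_0 i + 2\hat{\phi}_{MLE}) = 0, \\
% The likelihood equation therefore reduces to:
\sum_{i=0}^{K-1} x_i \sin(\omega_0 i + \hat{\phi}_{MLE}) &= 0, \\
% Expanding sin(a + b) = sin(a)cos(b) + cos(a)sin(b):
\cos\hat{\phi}_{MLE} \sum_{i=0}^{K-1} x_i \sin(\omega_0 i)
  + \sin\hat{\phi}_{MLE} \sum_{i=0}^{K-1} x_i \cos(\omega_0 i) &= 0, \\
\tan\hat{\phi}_{MLE}
  &= -\frac{\sum_{i=0}^{K-1} x_i \sin(\omega_0 i)}{\sum_{i=0}^{K-1} x_i \cos(\omega_0 i)}.
\end{align*}
```

Under this reading, the estimate depends on the data only through the two correlations $\sum_i x_i \sin(\omega_0 i)$ and $\sum_i x_i \cos(\omega_0 i)$, and the amplitude A drops out.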
... system), it may be desirable to detect a signal of constant amplitude, and then $F_1$ will simply be $F_0$ shifted by a mean value equal to the signal amplitude. In general, this formulation also assumes that, with probability one, either the null hypothesis or the alternative is true. Specifically, let $\pi_0$ and $\pi_1$ be the prior probabilities of $H_0$ and $H_1$ being true, respectively. Then:

$$\pi_0 + \pi_1 = 1. \tag{6.34}$$

For any decision rule $d(x)$, there are clearly four possible outcomes:

...

Among the various decision criteria, Bayes and Neyman-Pearson are most popular. Detection schemes may also be classified into parametric and nonparametric detectors. The discussion here focuses on the Bayes and Neyman-Pearson criteria, which are considered parametric. Before any decision criteria can be derived, the following constraints need to be stated first:

$$\mathrm{Prob}\{d(x) = 0 \mid H_0\} + \mathrm{Prob}\{d(x) = 1 \mid H_0\} = 1 \tag{6.39a}$$

$$\mathrm{Prob}\{d(x) = 0 \mid H_1\} + \mathrm{Prob}\{d(x) = 1 \mid H_1\} = 1. \tag{6.39b}$$

The above constraints of equations 6.39a and 6.39b are simply outcomes of the assumptions that $\Omega_0 \cup \Omega_1 = \Omega$ and $\Omega_0 \cap \Omega_1 = \emptyset$ (i.e., for every observation, an unambiguous decision must be made).

In addition, assume that the prior probabilities $\pi_0$ and $\pi_1$ are known. The Bayes' risk is then evaluated as follows:

...

$$I_1(x) \triangleq (1 - \pi_0)(C_{01} - C_{11})\, f_1(x).$$

$$I_2(x) \triangleq \pi_0 (C_{10} - C_{00})\, f_0(x).$$

It can be seen easily that $I_1(x) > 0$ and $I_2(x) > 0$. Thus, $\mathcal{R}$ can be rewritten as:
$$L(x) = \frac{f_1(x)}{f_0(x)} \begin{cases} > \lambda \Rightarrow H_1 \\ < \lambda \Rightarrow H_0 \end{cases}. \tag{6.44}$$

A decision rule characterized by the likelihood ratio and a threshold as in equation 6.44 is referred to as a likelihood ratio test (LRT). A Bayes' detection scheme is always an LRT. Depending on how the a posteriori probabilities of the two hypotheses are defined, the Bayes' detector can be realized in different ways. One typical example is the so-called MAP detector, which renders the minimum probability of error by choosing the hypothesis with the maximum a posteriori probability.

Another class of detectors is the minimax detectors, which can be considered an extension of Bayes' detectors. The minimax detector is also an LRT. It assumes no knowledge of the prior probabilities (i.e., $\pi_0$ and $\pi_1$) and selects the threshold by choosing the prior probability that renders the maximum Bayes' risk. The minimax detector is a robust detector because its performance does not vary with the prior probabilities.

6.5.2 Neyman-Pearson Detection

The principle of the Neyman-Pearson criterion is founded on the Neyman-Pearson Lemma stated below:

Neyman-Pearson Lemma
Let $d_{\lambda^*}(x)$ be a likelihood ratio test with a threshold $\lambda^*$ as defined in equation 6.44. Let $\alpha^*$ and $\beta^*$ be the false-alarm rate and power, respectively, of the test $d_{\lambda^*}(x)$. Let $d_{\lambda}(x)$ be another ...

...

$$\alpha = \int_{\lambda}^{\infty} f_{L|H_0}(l \mid H_0)\, dl = \alpha_0,$$

where $f_{L|H_0}(l \mid H_0)$ is the pdf of the likelihood ratio. The threshold is determined by solving the above equation.

The Neyman-Pearson detector is known to be the most powerful detector for the problem of detecting a constant signal in noise. One advantage of the Neyman-Pearson detector is that its implementation does not require explicit knowledge of the prior probabilities and costs of decisions. However, as is the case for the Bayes' detector, evaluation of the likelihood ratio still requires exact knowledge of the pdf of X under both hypotheses.

6.5.3 Detection of a Known Signal in Gaussian Noise

Consider a signal detection problem formulated as follows:

$$\begin{aligned} H_0 &: X_i = N_i \\ H_1 &: X_i = N_i + S, \end{aligned} \tag{6.47}$$

$i = 1, 2, \ldots, K$. Assume that S is a deterministic constant and that $N_i$, $i = 1, 2, \ldots, K$ are iid zero-mean Gaussian random variables with a known variance $\sigma^2$. The likelihood ratio is then as written here:

$$L(x_1, x_2, \ldots, x_K) \triangleq \prod_{i=1}^{K} \frac{f_1(x_i)}{f_0(x_i)}.$$
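The likelihood ratio test of equation 6.44 can be sketched numerically for the model in equation 6.47. The code below is only an illustration under stated assumptions: the sample values, the signal level `S`, the noise standard deviation `sigma`, and the threshold `lam` are hypothetical, and the function names are not from the original text. It evaluates the likelihood ratio directly as a product of Gaussian pdf ratios and compares it with the closed form obtained by taking the logarithm term by term.

```python
import math
from statistics import NormalDist


def likelihood_ratio(x, S, sigma):
    """L(x) = prod_i f1(x_i) / f0(x_i) for H0: N(0, sigma^2), H1: N(S, sigma^2)."""
    f0 = NormalDist(mu=0.0, sigma=sigma)
    f1 = NormalDist(mu=S, sigma=sigma)
    L = 1.0
    for xi in x:
        L *= f1.pdf(xi) / f0.pdf(xi)
    return L


def log_likelihood_ratio(x, S, sigma):
    """Closed form of ln L(x): sum_i (2 x_i S - S^2) / (2 sigma^2)."""
    return sum((2.0 * xi * S - S * S) / (2.0 * sigma ** 2) for xi in x)


def lrt_decide(x, S, sigma, lam):
    """Return 1 (decide H1) if L(x) > lam, otherwise 0 (decide H0)."""
    return 1 if likelihood_ratio(x, S, sigma) > lam else 0


# Hypothetical observations and parameters, chosen only for illustration.
S, sigma = 1.0, 1.0
x_signal_like = [0.9, 1.4, -0.2, 1.1]
x_noise_like = [-0.5, 0.1, 0.3, -0.2]

print(math.log(likelihood_ratio(x_signal_like, S, sigma)),
      log_likelihood_ratio(x_signal_like, S, sigma))
print(lrt_decide(x_signal_like, S, sigma, lam=1.0),
      lrt_decide(x_noise_like, S, sigma, lam=1.0))
```

The threshold `lam=1.0` is an arbitrary choice for the sketch; in a Bayes' detector it would be set by the priors and costs, and in a Neyman-Pearson detector by the false-alarm constraint. For long records, working with the log-likelihood ratio avoids the underflow that a direct product of pdf ratios can suffer.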
Taking the logarithm of $L(x)$ yields the log-likelihood ratio:

$$\ln L(x) = \sum_{i=1}^{K} \frac{2 x_i S - S^2}{2\sigma^2}. \tag{6.48}$$

Straightforward algebraic manipulations show that the LRT is characterized by comparing a test statistic:

$$T(x) \triangleq \sum_{i=1}^{K} x_i \tag{6.49}$$

with a threshold $\lambda_0$. Under $H_0$, $T(x)$ is Gaussian with zero mean and variance $K\sigma^2$; under $H_1$, its mean is $KS$. The false alarm rate and the power are therefore:

$$\alpha = \int_{\lambda_0}^{\infty} \frac{1}{\sqrt{2\pi K}\,\sigma}\, e^{-t^2 / 2K\sigma^2}\, dt = 1 - g_0\!\left( \frac{\lambda_0}{\sigma\sqrt{K}} \right), \tag{6.50}$$

$$\beta = \int_{\lambda_0}^{\infty} \frac{1}{\sqrt{2\pi K}\,\sigma}\, e^{-(t - KS)^2 / 2K\sigma^2}\, dt = 1 - g_0\!\left( \frac{\lambda_0 - KS}{\sigma\sqrt{K}} \right). \tag{6.51}$$

In equations 6.50 and 6.51, this is true:

$$g_0(x) \triangleq \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt.$$

For the Neyman-Pearson detector, the threshold $\lambda_0$ is determined by the constraint on the false alarm rate:

$$\lambda_0 = \sigma\sqrt{K}\, g_0^{-1}(1 - \alpha).$$

Combining equations 6.50 and 6.51 yields a relation between the false alarm rate and power, namely:

$$\beta = 1 - g_0\!\left( g_0^{-1}(1 - \alpha) - \frac{\sqrt{K}\, S}{\sigma} \right). \tag{6.52}$$

Figure 6.1 is a good illustration of the Neyman-Pearson Lemma. The shaded area under $f_{T|H_1}(t \mid H_1)$ is the value of power, and the shaded area under $f_{T|H_0}(t \mid H_0)$ is the false alarm rate. It is seen that if the threshold is moved to the left, both the power and the false alarm rate increase.

...

There is a rich body of literature on the subjects of statistical signal processing and mathematical statistics. A classic textbook on detection and estimation is by van Trees (1968). This book provides a good basic treatment of the subject, and it is easy to read. Since then, many books have been written. Textbooks written by Poor (1988) and Kay (1993, 1998) are the more popular ones. Poor's book provides fairly complete coverage of the subject of signal detection and estimation. Its presentation is built on the principles of mathematical statistics and includes some brief discussions of nonparametric and robust detection theory. Kay's books are more relevant to signal processing applications, though they also include a good deal of theoretical treatment in statistics. In addition, Kassam (1988)
offers a good understanding of the subject of signal detection in non-Gaussian noise, and Weber (1987) offers useful insights into signal design for both coherent and incoherent digital communication systems. If any reader is interested in learning more about mathematical statistics, Bickel and Doksum (1977) ...

FIGURE 6.2 An Example of Receiver Operation Curves

References

Bickel, P.J., and Doksum, K.A. (1977). Mathematical statistics: Basic ideas and selected topics. San Francisco: Holden-Day.
Billingsley, P. (1979). Probability and measure. New York: John Wiley & Sons.
Fisher, R.A. (1950). On the mathematical foundations of theoretical statistics. In R.A. Fisher, Contributions to mathematical statistics. New York: John Wiley & Sons.
Haykin, S. (2001). Adaptive filter theory. (4th ed.). Englewood Cliffs, NJ: Prentice Hall.
Haykin, S. (Ed.). (1991). Advances in spectrum analysis and array processing. Vols. 1 and 2. Englewood Cliffs, NJ: Prentice Hall.
Kassam, S.A. (1988). Signal detection in non-Gaussian noise. New York: Springer-Verlag.
Kay, S.M. (1993). Fundamentals of statistical signal processing: Estimation theory. Upper Saddle River, NJ: Prentice Hall.
Kay, S.M. (1998). Fundamentals of statistical signal processing: Detection theory. Upper Saddle River, NJ: Prentice Hall.
Kendall, M.G., and Stuart, A. (1977). The advanced theory of statistics, Vol. 2. New York: Macmillan Publishing.
Papoulis, A., and Pillai, S.U. (2002). Probability, random variables, and stochastic processes. (4th ed.). New York: McGraw-Hill.
Poor, H.V. (1988). An introduction to signal detection and estimation. New York: Springer-Verlag.
Silvey, S.D. (1975). Statistical inference. London: Chapman and Hall.
Stark, H., and Woods, J.W. (2002). Probability, random processes, and estimation theory. (3rd ed.). Upper Saddle River, NJ: Prentice Hall.
van Trees, H.L. (1968). Detection, estimation, and modulation theory. New York: John Wiley & Sons.
Weber, C.L. (1987). Elements of detection and signal design. New York: Springer-Verlag.