
Detection and Estimation Theory

Lecture Notes for ECEn 672, Section 001

Prepared by Wynn Stirling
Winter Semester, 2009

Department of Electrical and Computer Engineering
Brigham Young University · Provo, Utah

Copyright © 2009, Wynn C. Stirling


Contents

1 The Formalism of Statistical Decision Theory   1-1
  1.1 Game Theory and Decision Theory   1-1
  1.2 The Mathematical Structure of Decision Theory   1-4
      1.2.1 The Formalism of Statistical Decision Theory   1-5
      1.2.2 Special Cases   1-9

2 The Multivariate Normal Distribution   2-1
  2.1 The Univariate Normal Distribution   2-1
  2.2 Development of The Multivariate Distribution   2-1
  2.3 Transformation of Variables   2-4
  2.4 The Multivariate Normal Density   2-6

3 Introductory Estimation Theory Concepts   3-1
  3.1 Notational Conventions   3-1
  3.2 Populations and Statistics   3-2
      3.2.1 Sufficient Statistics   3-3
      3.2.2 Complete Sufficient Statistics   3-9
  3.3 Exponential Families   3-13
  3.4 Minimum Variance Unbiased Estimators   3-17

4 Neyman-Pearson Theory   4-1
  4.1 Hypothesis Testing   4-1
  4.2 Simple Hypothesis versus Simple Alternative   4-2
  4.3 The Neyman-Pearson Lemma   4-3
  4.4 The Likelihood Ratio   4-8
  4.5 Receiver Operating Characteristic   4-11
  4.6 Composite Binary Hypotheses   4-18

5 Bayes Decision Theory   5-1
  5.1 The Bayes Principle   5-1
  5.2 Bayes Risk   5-2
  5.3 Bayes Tests of Simple Binary Hypotheses   5-4
  5.4 Bayes Envelope Function   5-10
  5.5 Posterior Distributions   5-12
  5.6 Randomized Decision Rules   5-15
  5.7 Minimax Rules   5-17
  5.8 Summary of Binary Decision Problems   5-18
  5.9 Multiple Decision Problems   5-18
  5.10 An Important Class of M-Ary Problems   5-24

6 Maximum Likelihood Estimation   6-1
  6.1 The Maximum Likelihood Principle   6-1
  6.2 Maximum Likelihood for Continuous Distributions   6-5
  6.3 Comments on Estimation Quality   6-8
  6.4 The Cramér-Rao Bound   6-9
  6.5 Asymptotic Properties of Maximum Likelihood Estimators   6-15
  6.6 The Multivariate Normal Case   6-20
  6.7 Appendix: Matrix Derivatives   6-23

7 Conditioning   7-1
  7.1 Conditional Densities   7-1
  7.2 σ-fields   7-5
  7.3 Conditioning on a σ-field   7-10
  7.4 Conditional Expectations and Least-Squares Estimation   7-13

8 Bayes Estimation Theory   8-1
  8.1 Bayes Risk   8-3
  8.2 MAP Estimates   8-6
  8.3 Conjugate Prior Distributions   8-9
  8.4 Improper Prior Distributions   8-12
  8.5 Sequential Bayes Estimation   8-13

9 Linear Estimation Theory   9-16
  9.1 Introduction   9-16
  9.2 Minimum Mean Square Estimation (MMSE)   9-18
  9.3 Estimation Given a Single Random Variable   9-19
  9.4 Estimation Given Two Random Variables   9-20
  9.5 Estimation Given N Random Variables   9-21
  9.6 Mean Square Estimation for Random Vectors   9-23
  9.7 Hilbert Space of Random Variables   9-24
  9.8 Geometric Interpretation of Mean Square Estimation   9-27
  9.9 Gram-Schmidt Procedure   9-29
  9.10 Estimation Given the Innovations Process   9-33
  9.11 Innovations and Matrix Factorizations   9-36
  9.12 LDU Decomposition   9-37
  9.13 Cholesky Decomposition   9-38
  9.14 White Noise Interpretations   9-40
  9.15 More On Modeling   9-41

10 Estimation of State Space Systems   10-42
  10.1 Innovations for Processes with State Space Models   10-42
  10.2 Innovations Representations   10-48
  10.3 A Recursion for P_{i|i−1}   10-50
  10.4 The Discrete-Time Kalman Filter   10-52
  10.5 Perspective   10-57
  10.6 Kalman Filter Example   10-59
      10.6.1 Model Equations   10-59
  10.7 Interpretation of the Kalman Gain   10-62
  10.8 Smoothing   10-63
      10.8.1 A Word About Notation   10-63
      10.8.2 Fixed-Lag and Fixed-Point Smoothing   10-64
  10.9 Extensions to Nonlinear Systems   10-69
      10.9.1 Linearization   10-69
      10.9.2 The Extended Kalman Filter   10-72


List of Figures

1-1  Loss function (or matrix) for Odd or Even game   1-2
1-2  Risk Matrix for Statistical Odd or Even Game   1-8
1-3  Structure of a Statistical Game   1-9
4-1  Illustration of threshold for Neyman-Pearson test   4-6
4-2  Error probabilities for normal variables with different means and equal variances: (a) P_FA calculation, (b) P_D   4-12
4-3  Receiver operating characteristic: normal variables with unequal means and equal variances   4-13
4-4  Receiver operating characteristic: normal variables with equal means and unequal variances   4-15
4-5  Demonstration of convexity property of the ROC   4-16
5-1  Bayes envelope function   5-11
5-2  Bayes envelope function: normal variables with unequal means and equal variances   5-12
5-3  Bayes envelope function   5-14
5-4  Loss Function   5-16
5-5  Geometrical interpretation of the risk   5-21
5-6  Geometrical interpretation of the minimax   5-22
5-7  Loss Function for Statistical Odd or Even Game   5-22
5-8  Risk set for “odd or even” game   5-23
5-9  Decision space for M = 3   5-28
6-1  Empiric Distribution Function   6-4
7-1  The family of rectangles {X ∈ [x, x + ∆x], Y ∈ [y, y + ∆y]}   7-3
7-2  The family of trapezoids {X ∈ [x, x + ∆x], Y ∈ [y − Xy, y + Xy]}   7-4
9-1  Geometric interpretation of conditional expectation   9-28
9-2  Geometric illustration of the Gram-Schmidt procedure   9-30


1 The Formalism of Statistical Decision Theory

1.1 Game Theory and Decision Theory

This course is primarily focused on the engineering topics of detection and estimation. These topics have their roots in probability theory, and fit in the general area of statistical decision theory. In fact, the component of statistical decision theory that we will be concerned with fits in an even larger mathematical construct, that of game theory. Therefore, to establish these connections and to provide a useful context for future development, we will begin our discussion of this topic with a brief detour into the general area of mathematical games. A two-person, zero-sum mathematical game, which we will refer to from now on simply as a game, consists of three basic components:

1. A nonempty set, Θ1, of possible actions available to Player 1.

2. A nonempty set, Θ2, of possible actions available to Player 2.

3. A loss function, L : Θ1 × Θ2 → ℝ, representing the loss incurred by Player 1 (which, under the zero-sum condition, corresponds to the gain obtained by Player 2).

Any such triple (Θ1, Θ2, L) defines a game. Here is a simple example taken from [3, Page 2]. Example: Odd or Even. Two contestants simultaneously put up either one or two fingers. Player 1 wins if the sum of the digits showing is odd, and Player 2 wins if the sum of the digits showing is even. The winner in all cases receives in dollars the sum of the digits showing, this being paid to him by the loser. To create a triple (Θ1, Θ2, L) for this game we define Θ1 = Θ2 = {1, 2} and define the loss function by

L(1, 1) = 2

L(1, 2) = −3

L(2, 1) = −3

L(2, 2) = 4

                    Θ2
                 1       2
    Θ1    1      2      −3
          2     −3       4

Figure 1-1: Loss function (or matrix) for Odd or Even game
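As a quick sanity check on this matrix, here is a minimal Python sketch that encodes the loss function directly from the rules of the game; the function name is illustrative, and negative losses are winnings for Player 1.

    # Minimal sketch: the Odd or Even loss L(theta1, theta2), i.e. the amount
    # Player 1 pays Player 2 (a negative value means Player 1 collects).
    def odd_or_even_loss(theta1: int, theta2: int) -> int:
        total = theta1 + theta2
        # Player 1 wins (and collects `total`) when the sum is odd, so Player 1's
        # loss is -total; otherwise Player 1 pays `total` to Player 2.
        return -total if total % 2 == 1 else total

    for t1 in (1, 2):
        for t2 in (1, 2):
            print(f"L({t1}, {t2}) = {odd_or_even_loss(t1, t2)}")
    # Prints L(1,1) = 2, L(1,2) = -3, L(2,1) = -3, L(2,2) = 4

Running it reproduces the four entries of Figure 1-1.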

We won’t get into the details of how to develop a strategy for this game and many others similar in structure to it; that is a topic in its own right. For those who may be interested in general game theory, [10] is a reasonable introduction.

Exercise 1-1 Consider the well-known game of Prisoner’s Dilemma. Two agents, denoted X1 and X2, are accused of a crime. They are interrogated separately, but the sentences that are passed are based upon the joint outcome. If they both confess, they are both sentenced to a jail term of three years. If neither confesses, they are both sentenced to a jail term of one year. If one confesses and the other refuses to confess, then the one who confesses is set free and the one who refuses to confess is sentenced to a jail term of five years. This payoff matrix is illustrated in Table 1-1. The first entry in each quadrant of the payoff matrix corresponds to X1’s payoff, and the second entry corresponds to X2’s payoff. This particular game represents a slight extension to our original definition, since it is not a zero-sum game. When playing such a game, a reasonable strategy is for each agent to make a choice such that, once chosen, neither player would have an incentive to depart unilaterally from the outcome. Such a decision pair is called a Nash equilibrium point. In other words, at the Nash equilibrium point, both players can only hurt themselves by departing from their decision. What is the Nash equilibrium point for the Prisoner’s Dilemma game? Explain why this problem is considered a “dilemma.”

Exercise 1-2 In his delightful book, Superior Beings–If They Exist, How Would We Know?, Steven J. Brams introduces a game called the Revelation Game. In this game, there are two agents.


                          X2
                   silent       confesses
    X1  silent      1,1            5,0
        confesses   0,5            3,3

Table 1-1: A typical payoff matrix for the Prisoner’s Dilemma.
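As an aid for Exercise 1-1, the following minimal Python sketch encodes the sentences of Table 1-1 and checks whether a conjectured pair of pure strategies is a Nash equilibrium; the names SENTENCE and is_nash are illustrative, and since the entries are jail terms, "better" here means smaller.

    # Minimal sketch: test a conjectured pure-strategy Nash equilibrium for the
    # Prisoner's Dilemma of Table 1-1.  Entries are jail terms in years.
    ACTIONS = ("silent", "confess")

    # (X1's sentence, X2's sentence) for each joint choice, from Table 1-1.
    SENTENCE = {
        ("silent", "silent"):   (1, 1),
        ("silent", "confess"):  (5, 0),
        ("confess", "silent"):  (0, 5),
        ("confess", "confess"): (3, 3),
    }

    def is_nash(a1: str, a2: str) -> bool:
        """True if neither agent can shorten its own sentence by deviating alone."""
        x1_ok = all(SENTENCE[(a1, a2)][0] <= SENTENCE[(alt, a2)][0] for alt in ACTIONS)
        x2_ok = all(SENTENCE[(a1, a2)][1] <= SENTENCE[(a1, alt)][1] for alt in ACTIONS)
        return x1_ok and x2_ok

    # Example: test a guess with is_nash("silent", "silent"), etc.

The same idea, with the inequalities flipped to larger-is-better, applies to ordinal payoffs such as those of the Revelation Game below.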

                                           P
                           Believe in SB’s existence   Don’t believe in SB’s existence
    SB   Reveal himself    P faithful with             P unfaithful despite
                           evidence (3,4)              evidence (1,1)
         Don’t reveal      P faithful without          P unfaithful without
         himself           evidence (4,2)              evidence (2,3)

Table 1-2: Payoff for Revelation Game: 4 = best, 3 = next best, 2 = next worst, 1 = worst.

Player 1 we will term the superior being (SB), and Player 2 is a person (P). SB has

two strategies:

1. Reveal himself

2. Don’t reveal himself

Agent P also has two strategies:

1. Believe in SB’s existence

2. Don’t believe in SB’s existence

Table 1-2 provides the payoff matrix for this game. What is the Nash equilibrium point for this game?
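Following the same pattern as before, a small self-contained sketch for Exercise 1-2 encodes the ordinal payoffs of Table 1-2 (here larger ranks are better), so a conjectured equilibrium can be checked; the dictionary and function names are again illustrative.

    # Ordinal payoffs from Table 1-2 as (SB's rank, P's rank), 4 = best, 1 = worst.
    PAYOFF = {
        ("reveal", "believe"):           (3, 4),
        ("reveal", "not believe"):       (1, 1),
        ("don't reveal", "believe"):     (4, 2),
        ("don't reveal", "not believe"): (2, 3),
    }
    SB_ACTIONS = ("reveal", "don't reveal")
    P_ACTIONS = ("believe", "not believe")

    def is_nash(sb: str, p: str) -> bool:
        """True if neither SB nor P can improve its own rank by deviating alone."""
        sb_ok = all(PAYOFF[(sb, p)][0] >= PAYOFF[(alt, p)][0] for alt in SB_ACTIONS)
        p_ok = all(PAYOFF[(sb, p)][1] >= PAYOFF[(sb, alt)][1] for alt in P_ACTIONS)
        return sb_ok and p_ok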

We will view decision theory as a game between the decision-maker, or agent, and nature, where nature takes the role of, say, Player 1, and the agent becomes Player 2. The components of this game, which we will denote by (Θ, ∆, L), become

1. A nonempty set, Θ, of possible states of nature, sometimes referred to as the parameter space.

2. A nonempty set, ∆, of possible decisions available to the agent, sometimes called the decision space.

3. A loss function, L : Θ × ∆ → ℝ, representing the loss incurred by nature (which corresponds to the gain obtained by the agent). This function is also sometimes called the cost function.

Let’s take a minute and detail some of the important differences between game theory and decision theory.

In a two-person game, it is usually assumed that the players are simultaneously trying to maximize their winnings (or minimize their losses), whereas with decision theory, nature assumes essentially a neutral role and only the agent is trying to extremize anything. Of course, if you are paranoid, you might want to consider nature your opponent, but most people feel content to think of nature as being neutral. If we do so, we might be willing to be a little more bold in the decision strategies we choose, since we don’t need to be so careful about protecting ourselves.

In a game, we usually assume that each player makes its decision based on exactly the same information (cheating is not allowed), whereas in decision theory, the agent may have available additional information, via observations, that may be used to gain an advantage on nature. This difference is more apparent than real, because there is nothing about game theory that says a game has to be fair. In fact, decision problems can be viewed as simply more complex games. The fact seems to be that decision theory is really a subset of the larger body of game theory, but there are enough special issues and structure involved in the way the agent may use observations to warrant its being a theory on its own, considered apart from game theory proper.

1.2 The Mathematical Structure of Decision Theory

In its most straightforward expression, the agent’s job is to guess the state of nature. A good job means small loss, so the agent is motivated to get the most out of any information available in the form of observations. We suppose that before making a decision the agent is


permitted to look at the observed value of a random variable or vector, X, whose distribution depends upon the true state of nature, θ ∈ Θ.
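To make this setup concrete, here is a toy numerical sketch; the Gaussian observation model, the zero-one loss, and the particular threshold rule are illustrative assumptions, not anything prescribed by the notes.

    import random

    # Toy model: Theta = {0, 1}, observation X ~ Normal(theta, 1),
    # zero-one loss, and an assumed threshold decision rule delta(x).
    def loss(theta: int, decision: int) -> int:
        return 0 if decision == theta else 1       # L(theta, delta)

    def delta(x: float) -> int:
        return 1 if x > 0.5 else 0                 # an illustrative rule, not optimal in general

    def risk(theta: int, n: int = 100_000) -> float:
        """Monte Carlo estimate of the expected loss E_theta[ L(theta, delta(X)) ]."""
        draws = (random.gauss(theta, 1.0) for _ in range(n))
        return sum(loss(theta, delta(x)) for x in draws) / n

    print(risk(0), risk(1))    # roughly 0.31 for each state with this threshold

The quantity estimated here, the expected loss as a function of the true state of nature, is the risk that plays a central role in the developments of this chapter and the next.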

Before presenting the mathematical development, we need a preliminary definition. Let (Θ1, T1) and (Θ2, T2) be two measurable spaces. A transition probability is a mapping P : Θ1 × T2 → [0, 1] such that

1. For every θ1 ∈ Θ1, P(θ1, ·) is a probability on (Θ2, T2).
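The excerpt breaks off in the middle of this definition. For completeness, the standard definition of a transition probability continues with a measurability requirement; the following LaTeX sketch states the usual textbook form in the notation above, and is not a quotation from the notes.

    % P : \Theta_1 \times \mathcal{T}_2 \to [0,1] is a transition probability if
    \begin{enumerate}
      \item for every $\theta_1 \in \Theta_1$, the map $A \mapsto P(\theta_1, A)$
            is a probability measure on $(\Theta_2, \mathcal{T}_2)$; and
      \item for every $A \in \mathcal{T}_2$, the map $\theta_1 \mapsto P(\theta_1, A)$
            is a $\mathcal{T}_1$-measurable function on $\Theta_1$.
    \end{enumerate}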