
LAST WEEK:

4.2 Cramer-Rao Lower Bound


Then, the variance of any unbiased estimator 𝜃̂ must satisfy

\[
\operatorname{var}(\hat{\theta}) \;\ge\; \underbrace{\frac{1}{\underbrace{-E\left[\dfrac{\partial^{2}\ln p(\mathbf{x};\theta)}{\partial\theta^{2}}\right]}_{\text{Fisher information}}}}_{\text{CRLB}}
\]

An unbiased estimator may be found that attains the bound for all 𝜃 if and only if

\[
\frac{\partial \ln p(\mathbf{x};\theta)}{\partial\theta} = I(\theta)\bigl(g(\mathbf{x}) - \theta\bigr)
\]

In that case the MVU estimator is $\hat{\theta} = g(\mathbf{x})$, and the minimum variance is $1/I(\theta)$, where $I(\theta)$ is the Fisher information. The CRLB may also be expressed in a slightly different form:
\[
\underbrace{-E\left[\frac{\partial^{2}\ln p(\mathbf{x};\theta)}{\partial\theta^{2}}\right]}_{I(\theta)} = \underbrace{E\left[\left(\frac{\partial \ln p(\mathbf{x};\theta)}{\partial\theta}\right)^{2}\right]}_{I(\theta)}
\]

i.e.,

\[
\operatorname{var}(\hat{\theta}) \ge \underbrace{\frac{1}{E\left[\left(\dfrac{\partial \ln p(\mathbf{x};\theta)}{\partial\theta}\right)^{2}\right]}}_{\text{CRLB}}
\]

The CRLB is
1. Nonnegative
2. Additive for independent observations → the CRLB for $N$ iid observations is $1/N$ times that for one observation

If

\[
x[n] = s[n;\theta] + w[n], \qquad n = 0, 1, \cdots, N-1
\]

where $w[n]$ is zero-mean AWGN with variance $\sigma^2$, then

\[
\operatorname{var}(\hat{\theta}) \ge \frac{\sigma^{2}}{\sum_{n=0}^{N-1}\left(\dfrac{\partial s[n;\theta]}{\partial\theta}\right)^{2}}
\]

• This form demonstrates the importance of the signal dependence on 𝜃.


• Signals that change rapidly as the unknown parameter changes result in more accurate
estimators.
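
To make the dependence on the signal concrete, here is a minimal numeric sketch (not from the notes; the sinusoidal signal, the parameter values, and the finite-difference derivative are illustrative assumptions) that evaluates the bound above:

```python
# Sketch: evaluate the scalar CRLB  var >= sigma^2 / sum_n (ds[n;theta]/dtheta)^2
# for an assumed signal s[n; f0] = cos(2*pi*f0*n), using a central finite
# difference to approximate the sensitivity ds/d(f0).
import numpy as np

def crlb_awgn(s, theta, sigma2, N, eps=1e-6):
    n = np.arange(N)
    ds = (s(n, theta + eps) - s(n, theta - eps)) / (2 * eps)  # ds/dtheta
    return sigma2 / np.sum(ds ** 2)

s = lambda n, f0: np.cos(2 * np.pi * f0 * n)  # hypothetical example signal

# A longer record makes the signal more sensitive to f0 (the derivative grows
# with n), so the bound drops much faster than 1/N:
for N in (10, 100):
    print(N, crlb_awgn(s, theta=0.1, sigma2=1.0, N=N))
```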
4.4 Transformation of Parameters
• Sometimes we are interested in estimating a function of some parameters.
    o For example, we may be interested in estimating $A^2$ instead of the amplitude $A$.
    o We may be interested in estimating not the phase $\phi$ itself, but $\cos\phi$.
• If we know the CRLB for the parameter $\theta$ and we desire to estimate $\alpha = g(\theta)$, then the CRLB is

\[
\operatorname{var}(\hat{\alpha}) \ge \frac{\left(\dfrac{\partial g(\theta)}{\partial\theta}\right)^{2}}{-E\left[\dfrac{\partial^{2}\ln p(\mathbf{x};\theta)}{\partial\theta^{2}}\right]}
\quad\Rightarrow\quad
\mathrm{CRLB}_{g(\theta)} = \left(\frac{\partial g(\theta)}{\partial\theta}\right)^{2} \times \mathrm{CRLB}_{\theta}
\]

• For example, consider the DC level in WGN; the CRLB for the amplitude is $\operatorname{var}(\hat{A}) \ge \sigma^2/N$. If we are interested in estimating $A^2$, then

\[
\operatorname{var}(\widehat{A^{2}}) \ge \frac{(2A)^{2}}{N/\sigma^{2}} = \frac{4A^{2}\sigma^{2}}{N}
\]
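
As a quick sanity check (my own sketch, not part of the original example), one can estimate $A^2$ by squaring the sample mean and compare the empirical variance against the transformed bound; note that $\bar{x}^2$ is only asymptotically unbiased and efficient, so the agreement improves with $N$:

```python
# Monte Carlo check of the transformed bound var(A^2_hat) >= 4*A^2*sigma^2/N,
# using the (asymptotically efficient) estimator A^2_hat = (sample mean)^2.
import numpy as np

rng = np.random.default_rng(0)
A, sigma2, N, trials = 2.0, 1.0, 1000, 20000  # assumed values
x = A + np.sqrt(sigma2) * rng.standard_normal((trials, N))
a2_hat = x.mean(axis=1) ** 2

print("empirical var :", a2_hat.var())
print("CRLB 4A^2*s2/N:", 4 * A ** 2 * sigma2 / N)
```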

4.5 Extension to a Vector Parameter


SUMMARY

• It’s time we extend the results of the previous sections to the case where we wish to estimate a vector parameter $\boldsymbol{\theta} = [\theta_1, \theta_2, \cdots, \theta_p]^T$.
• We will assume that $\hat{\boldsymbol{\theta}}$ is unbiased.
• For the vector parameter, the CRLB is found as

\[
\operatorname{var}(\hat{\theta}_i) \ge [\mathbf{I}^{-1}(\boldsymbol{\theta})]_{ii}
\]

where 𝐈(𝛉) is the 𝑝 × 𝑝 Fisher information matrix.


• Elements of the Fisher information matrix are defined by

\[
[\mathbf{I}(\boldsymbol{\theta})]_{ij} = -E\left[\frac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_i\,\partial\theta_j}\right]
\]

for $i = 1, 2, \cdots, p$ and $j = 1, 2, \cdots, p$.

• Note: For $p = 1$, the Fisher information matrix reduces to the scalar Fisher information, i.e., $\mathbf{I}(\boldsymbol{\theta}) = I(\theta)$, and we recover the scalar CRLB.

THEOREM 2: CRAMER-RAO LOWER BOUND – VECTOR PARAMETER


It is assumed that the pdf $p(\mathbf{x};\boldsymbol{\theta})$ satisfies the regularity conditions

\[
E\left[\frac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\right] =
\begin{bmatrix}
E\left[\dfrac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_1}\right]\\
\vdots\\
E\left[\dfrac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_p}\right]
\end{bmatrix} = \mathbf{0}
\]

and the covariance matrix $\mathbf{C}_{\hat{\boldsymbol{\theta}}}$ is defined by

\[
\mathbf{C}_{\hat{\boldsymbol{\theta}}} = E\left[(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta})(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta})^{T}\right].
\]

Then, the estimation-error covariance matrix satisfies

\[
\mathbf{C}_{\hat{\boldsymbol{\theta}}} - \mathbf{I}^{-1}(\boldsymbol{\theta}) \succeq 0
\]

where $\succeq 0$ denotes that the matrix is positive semi-definite. Noting that for a positive semi-definite matrix the diagonal elements are always non-negative, and using the above relationship, we obtain

\[
\operatorname{var}(\hat{\theta}_i) = [\mathbf{C}_{\hat{\boldsymbol{\theta}}}]_{ii} \ge [\mathbf{I}^{-1}(\boldsymbol{\theta})]_{ii}.
\]

An unbiased estimator that attains the CRLB, in the sense that $\mathbf{C}_{\hat{\boldsymbol{\theta}}} = \mathbf{I}^{-1}(\boldsymbol{\theta})$, may be found if and only if

\[
\frac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\boldsymbol{\theta}} = \mathbf{I}(\boldsymbol{\theta})\bigl(\mathbf{g}(\mathbf{x}) - \boldsymbol{\theta}\bigr) = \mathbf{I}(\boldsymbol{\theta})\bigl(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}\bigr).
\]

In that case $\hat{\boldsymbol{\theta}} = \mathbf{g}(\mathbf{x})$ is the MVU estimator and has estimation-error covariance $\mathbf{I}^{-1}(\boldsymbol{\theta})$.

Example: DC Level in WGN Revisited

Now we extend the problem of DC in WGN to the case where, in addition to $A$, the noise variance is also unknown. The observations are

\[
x[n] = A + w[n], \qquad n = 0, 1, \cdots, N-1,
\]

the parameter vector is

\[
\boldsymbol{\theta} = [A, \sigma^2]^T,
\]

and therefore $p = 2$. The $2 \times 2$ Fisher information matrix is

\[
\mathbf{I}(\boldsymbol{\theta}) =
\begin{bmatrix}
-E\left[\dfrac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A^{2}}\right] & -E\left[\dfrac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A\,\partial\sigma^{2}}\right]\\[2mm]
-E\left[\dfrac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A\,\partial\sigma^{2}}\right] & -E\left[\dfrac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial(\sigma^{2})^{2}}\right]
\end{bmatrix}
\]
The likelihood function is

\[
p(\mathbf{x};\boldsymbol{\theta}) = \frac{1}{(2\pi\sigma^{2})^{N/2}} \exp\left[-\frac{1}{2\sigma^{2}}\sum_{n=0}^{N-1}(x[n]-A)^{2}\right]
\]

and the log-likelihood function is

\[
\ln p(\mathbf{x};\boldsymbol{\theta}) = -\frac{N}{2}\ln(2\pi) - \frac{N}{2}\ln\sigma^{2} - \frac{1}{2\sigma^{2}}\sum_{n=0}^{N-1}(x[n]-A)^{2}.
\]

The derivatives are easily found as

\[
\frac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A} = \frac{1}{\sigma^{2}}\sum_{n=0}^{N-1}(x[n]-A) \;\Rightarrow\; \frac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A^{2}} = -\frac{N}{\sigma^{2}}
\]

\[
\frac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\sigma^{2}} = -\frac{N}{2\sigma^{2}} + \frac{1}{2\sigma^{4}}\sum_{n=0}^{N-1}(x[n]-A)^{2} \;\Rightarrow\; \frac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial(\sigma^{2})^{2}} = \frac{N}{2\sigma^{4}} - \frac{1}{\sigma^{6}}\sum_{n=0}^{N-1}(x[n]-A)^{2}
\]

\[
\frac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A\,\partial\sigma^{2}} = -\frac{1}{\sigma^{4}}\sum_{n=0}^{N-1}(x[n]-A)
\]

Then we have

\[
\mathbf{I}(\boldsymbol{\theta}) =
\begin{bmatrix}
-E\left[-\dfrac{N}{\sigma^{2}}\right] & -E\left[-\dfrac{1}{\sigma^{4}}\displaystyle\sum_{n=0}^{N-1}(x[n]-A)\right]\\[3mm]
-E\left[-\dfrac{1}{\sigma^{4}}\displaystyle\sum_{n=0}^{N-1}(x[n]-A)\right] & -E\left[\dfrac{N}{2\sigma^{4}} - \dfrac{1}{\sigma^{6}}\displaystyle\sum_{n=0}^{N-1}(x[n]-A)^{2}\right]
\end{bmatrix}
=
\begin{bmatrix}
\dfrac{N}{\sigma^{2}} & 0\\[1mm]
0 & \dfrac{N}{2\sigma^{4}}
\end{bmatrix}
\]

\[
\mathbf{I}^{-1}(\boldsymbol{\theta}) =
\begin{bmatrix}
\dfrac{\sigma^{2}}{N} & 0\\[1mm]
0 & \dfrac{2\sigma^{4}}{N}
\end{bmatrix}
\]
Although not true in general, for this example the Fisher information matrix is diagonal and hence easy to invert:

\[
\operatorname{var}(\hat{A}) \ge \frac{\sigma^{2}}{N}, \qquad \operatorname{var}(\widehat{\sigma^{2}}) \ge \frac{2\sigma^{4}}{N}.
\]
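
A small Monte Carlo sketch (the estimator choices are mine, not from the notes) illustrates both bounds: the sample mean attains $\sigma^2/N$ exactly, while the ML variance estimate only approaches $2\sigma^4/N$ asymptotically:

```python
# Monte Carlo check of the two diagonal CRLB entries for DC-in-WGN with
# unknown A and sigma^2.
import numpy as np

rng = np.random.default_rng(1)
A, sigma2, N, trials = 1.0, 2.0, 500, 50000  # assumed values
x = A + np.sqrt(sigma2) * rng.standard_normal((trials, N))

A_hat = x.mean(axis=1)     # sample mean (efficient: meets the bound exactly)
s2_hat = x.var(axis=1)     # ML variance estimate, divides by N (asymptotic)

print("var(A_hat) :", A_hat.var(), " CRLB:", sigma2 / N)
print("var(s2_hat):", s2_hat.var(), " CRLB:", 2 * sigma2 ** 2 / N)
```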
Example: Line Fitting
Consider the problem of line fitting given the observations

\[
x[n] = A + Bn + w[n], \qquad n = 0, 1, \cdots, N-1
\]

where $w[n]$ is WGN. We wish to estimate the slope $B$ and the intercept $A$. The parameter vector is $\boldsymbol{\theta} = [A \;\, B]^T$. The likelihood function is

\[
p(\mathbf{x};\boldsymbol{\theta}) = \frac{1}{(2\pi\sigma^{2})^{N/2}} \exp\left\{-\frac{1}{2\sigma^{2}}\sum_{n=0}^{N-1}\bigl(x[n] - (\underbrace{A + Bn}_{s[n;\boldsymbol{\theta}]})\bigr)^{2}\right\},
\]

therefore, the log-likelihood function is

\[
\ln p(\mathbf{x};\boldsymbol{\theta}) = -\frac{N}{2}\ln(2\pi\sigma^{2}) - \frac{1}{2\sigma^{2}}\sum_{n=0}^{N-1}(x[n]-A-Bn)^{2}.
\]

Taking the derivatives yields

\[
\frac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A} = \frac{1}{\sigma^{2}}\sum_{n=0}^{N-1}(x[n]-A-Bn) \;\Rightarrow\; \frac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A^{2}} = -\frac{N}{\sigma^{2}}
\]

\[
\frac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial B} = \frac{1}{\sigma^{2}}\sum_{n=0}^{N-1} n(x[n]-A-Bn) \;\Rightarrow\; \frac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial B^{2}} = -\frac{1}{\sigma^{2}}\sum_{n=0}^{N-1} n^{2} = -\frac{1}{\sigma^{2}}\,\frac{N(N-1)(2N-1)}{6}
\]

\[
\frac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A\,\partial B} = -\frac{1}{\sigma^{2}}\sum_{n=0}^{N-1} n = -\frac{1}{\sigma^{2}}\,\frac{N(N-1)}{2}
\]

Therefore, the Fisher information matrix becomes

\[
\mathbf{I}(\boldsymbol{\theta}) =
\begin{bmatrix}
-E\left[\dfrac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A^{2}}\right] & -E\left[\dfrac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A\,\partial B}\right]\\[3mm]
-E\left[\dfrac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A\,\partial B}\right] & -E\left[\dfrac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial B^{2}}\right]
\end{bmatrix}
= \frac{N}{\sigma^{2}}
\begin{bmatrix}
1 & \dfrac{N-1}{2}\\[1mm]
\dfrac{N-1}{2} & \dfrac{(N-1)(2N-1)}{6}
\end{bmatrix}
\]

and, with $\boldsymbol{\theta} = [A \;\, B]^T$,

\[
\mathbf{I}^{-1}(\boldsymbol{\theta}) = \frac{\sigma^{2}}{N}
\begin{bmatrix}
\dfrac{2(2N-1)}{N+1} & -\dfrac{6}{N+1}\\[1mm]
-\dfrac{6}{N+1} & \dfrac{12}{N^{2}-1}
\end{bmatrix},
\]

therefore, the CRLB is

\[
\operatorname{var}(\hat{A}) \ge \frac{2(2N-1)\sigma^{2}}{N(N+1)}, \qquad \operatorname{var}(\hat{B}) \ge \frac{12\sigma^{2}}{N(N^{2}-1)}.
\]

Although not obvious, we can actually find the efficient MVU estimators by writing the score in the attainment form $\mathbf{I}(\boldsymbol{\theta})(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta})$:

\[
\frac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}
= \begin{bmatrix}
\dfrac{1}{\sigma^{2}}\displaystyle\sum_{n=0}^{N-1}(x[n]-A-Bn)\\[3mm]
\dfrac{1}{\sigma^{2}}\displaystyle\sum_{n=0}^{N-1} n(x[n]-A-Bn)
\end{bmatrix}
= \underbrace{\frac{N}{\sigma^{2}}\begin{bmatrix}1 & \dfrac{N-1}{2}\\[1mm] \dfrac{N-1}{2} & \dfrac{(N-1)(2N-1)}{6}\end{bmatrix}}_{\mathbf{I}(\boldsymbol{\theta})}
\underbrace{\begin{bmatrix}\hat{A} - A\\ \hat{B} - B\end{bmatrix}}_{\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}}
\]

Therefore, solving for $\hat{A}$ and $\hat{B}$,

\[
\hat{A} = \frac{2(2N-1)}{N(N+1)}\sum_{n=0}^{N-1} x[n] - \frac{6}{N(N+1)}\sum_{n=0}^{N-1} n\,x[n]
\]

\[
\hat{B} = -\frac{6}{N(N+1)}\sum_{n=0}^{N-1} x[n] + \frac{12}{N(N^{2}-1)}\sum_{n=0}^{N-1} n\,x[n]
\]
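
These closed forms are easy to verify numerically. The sketch below (my own, under the same model; parameter values are arbitrary) generates synthetic lines, applies $\hat{A}$ and $\hat{B}$ above, and compares the empirical variances with the CRLB entries:

```python
# Monte Carlo check that the closed-form line-fit estimators attain the CRLB.
import numpy as np

rng = np.random.default_rng(2)
A, B, sigma2, N, trials = 1.0, 0.5, 1.0, 100, 20000  # assumed values
n = np.arange(N)
x = A + B * n + np.sqrt(sigma2) * rng.standard_normal((trials, N))

s0 = x.sum(axis=1)         # sum of x[n]
s1 = (n * x).sum(axis=1)   # sum of n*x[n]
A_hat = 2 * (2 * N - 1) / (N * (N + 1)) * s0 - 6 / (N * (N + 1)) * s1
B_hat = -6 / (N * (N + 1)) * s0 + 12 / (N * (N ** 2 - 1)) * s1

print("var(A_hat):", A_hat.var(), " CRLB:", 2 * (2 * N - 1) * sigma2 / (N * (N + 1)))
print("var(B_hat):", B_hat.var(), " CRLB:", 12 * sigma2 / (N * (N ** 2 - 1)))
```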

• NOTE: As an alternative, we can use the following identity:

\[
E\left[\frac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_i}\,\frac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_j}\right] = -E\left[\frac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_i\,\partial\theta_j}\right]
\]

4.6 Vector Parameter CRLB for Transformations


Assume that we want to estimate a function of the parameters, $\boldsymbol{\alpha} = \mathbf{g}(\boldsymbol{\theta})$, where $\mathbf{g}$ is an $r$-dimensional function. Then,

\[
\operatorname{var}(\hat{\alpha}_i) \ge \left[\frac{\partial\mathbf{g}(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\,\mathbf{I}^{-1}(\boldsymbol{\theta})\left(\frac{\partial\mathbf{g}(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\right)^{T}\right]_{ii}
\]

where $\dfrac{\partial\mathbf{g}(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}$ is the $r \times p$ Jacobian matrix defined as

\[
\frac{\partial\mathbf{g}(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}} =
\begin{bmatrix}
\dfrac{\partial g_1(\boldsymbol{\theta})}{\partial\theta_1} & \dfrac{\partial g_1(\boldsymbol{\theta})}{\partial\theta_2} & \cdots & \dfrac{\partial g_1(\boldsymbol{\theta})}{\partial\theta_p}\\[3mm]
\dfrac{\partial g_2(\boldsymbol{\theta})}{\partial\theta_1} & \dfrac{\partial g_2(\boldsymbol{\theta})}{\partial\theta_2} & \cdots & \dfrac{\partial g_2(\boldsymbol{\theta})}{\partial\theta_p}\\[2mm]
\vdots & \vdots & \ddots & \vdots\\
\dfrac{\partial g_r(\boldsymbol{\theta})}{\partial\theta_1} & \dfrac{\partial g_r(\boldsymbol{\theta})}{\partial\theta_2} & \cdots & \dfrac{\partial g_r(\boldsymbol{\theta})}{\partial\theta_p}
\end{bmatrix}
\]

Example: Consider a DC level in WGN with $A$ and $\sigma^2$ unknown. We wish to estimate

\[
\alpha = \frac{A^{2}}{\sigma^{2}},
\]

which can be considered to be the SNR for a single sample. Here $\boldsymbol{\theta} = [A \;\, \sigma^2]^T$ and $g(\boldsymbol{\theta}) = \theta_1^2/\theta_2 = A^2/\sigma^2$. Then, as shown before,

\[
\mathbf{I}^{-1}(\boldsymbol{\theta}) =
\begin{bmatrix}
\dfrac{\sigma^{2}}{N} & 0\\[1mm]
0 & \dfrac{2\sigma^{4}}{N}
\end{bmatrix}.
\]
The Jacobian is

\[
\frac{\partial\mathbf{g}(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}} = \left[\frac{\partial g(\boldsymbol{\theta})}{\partial\theta_1} \;\; \frac{\partial g(\boldsymbol{\theta})}{\partial\theta_2}\right] = \left[\frac{\partial g(\boldsymbol{\theta})}{\partial A} \;\; \frac{\partial g(\boldsymbol{\theta})}{\partial\sigma^{2}}\right] = \left[\frac{2A}{\sigma^{2}} \;\; -\frac{A^{2}}{\sigma^{4}}\right],
\]

so

\[
\frac{\partial\mathbf{g}(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\,\mathbf{I}^{-1}(\boldsymbol{\theta})\left(\frac{\partial\mathbf{g}(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\right)^{T}
= \left[\frac{2A}{\sigma^{2}} \;\; -\frac{A^{2}}{\sigma^{4}}\right]
\begin{bmatrix}\dfrac{\sigma^{2}}{N} & 0\\[1mm] 0 & \dfrac{2\sigma^{4}}{N}\end{bmatrix}
\begin{bmatrix}\dfrac{2A}{\sigma^{2}}\\[2mm] -\dfrac{A^{2}}{\sigma^{4}}\end{bmatrix}
= \frac{4A^{2}}{N\sigma^{2}} + \frac{2A^{4}}{N\sigma^{4}} = \frac{4\alpha + 2\alpha^{2}}{N}.
\]
Therefore, the estimator variance satisfies

\[
\operatorname{var}(\hat{\alpha}) \ge \frac{4\alpha + 2\alpha^{2}}{N}.
\]
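
The matrix algebra above is short enough to check directly. A small sketch (mine; the parameter values are arbitrary) forms the Jacobian and $\mathbf{I}^{-1}(\boldsymbol{\theta})$ and confirms the closed-form bound:

```python
# Numeric check of the transformation bound J @ I_inv @ J.T = (4a + 2a^2)/N
# for the single-sample SNR alpha = A^2/sigma^2.
import numpy as np

A, sigma2, N = 2.0, 1.0, 100  # assumed values
alpha = A ** 2 / sigma2

I_inv = np.diag([sigma2 / N, 2 * sigma2 ** 2 / N])       # from the DC example
J = np.array([[2 * A / sigma2, -A ** 2 / sigma2 ** 2]])  # [dg/dA, dg/dsigma^2]

print((J @ I_inv @ J.T).item())          # 4A^2/(N*s2) + 2A^4/(N*s2^2)
print((4 * alpha + 2 * alpha ** 2) / N)  # closed form
```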

4.7 CRLB for the General Gaussian Case


• Assume the general Gaussian case, in which the observations need not have a zero mean:

\[
\mathbf{x} \sim \mathcal{N}\bigl(\boldsymbol{\mu}(\boldsymbol{\theta}), \mathbf{C}(\boldsymbol{\theta})\bigr),
\]

so that both the mean and the covariance may depend on $\boldsymbol{\theta}$. Then, the Fisher information matrix is given by

\[
[\mathbf{I}(\boldsymbol{\theta})]_{ij} = \left[\frac{\partial\boldsymbol{\mu}(\boldsymbol{\theta})}{\partial\theta_i}\right]^{T}\mathbf{C}^{-1}(\boldsymbol{\theta})\left[\frac{\partial\boldsymbol{\mu}(\boldsymbol{\theta})}{\partial\theta_j}\right] + \frac{1}{2}\operatorname{tr}\left[\mathbf{C}^{-1}(\boldsymbol{\theta})\frac{\partial\mathbf{C}(\boldsymbol{\theta})}{\partial\theta_i}\,\mathbf{C}^{-1}(\boldsymbol{\theta})\frac{\partial\mathbf{C}(\boldsymbol{\theta})}{\partial\theta_j}\right]
\]

where

\[
\frac{\partial\boldsymbol{\mu}(\boldsymbol{\theta})}{\partial\theta_i} =
\begin{bmatrix}
\dfrac{\partial[\boldsymbol{\mu}(\boldsymbol{\theta})]_1}{\partial\theta_i}\\[3mm]
\dfrac{\partial[\boldsymbol{\mu}(\boldsymbol{\theta})]_2}{\partial\theta_i}\\[1mm]
\vdots\\
\dfrac{\partial[\boldsymbol{\mu}(\boldsymbol{\theta})]_N}{\partial\theta_i}
\end{bmatrix}
\]

and

\[
\frac{\partial\mathbf{C}(\boldsymbol{\theta})}{\partial\theta_i} =
\begin{bmatrix}
\dfrac{\partial[\mathbf{C}(\boldsymbol{\theta})]_{11}}{\partial\theta_i} & \dfrac{\partial[\mathbf{C}(\boldsymbol{\theta})]_{12}}{\partial\theta_i} & \cdots & \dfrac{\partial[\mathbf{C}(\boldsymbol{\theta})]_{1N}}{\partial\theta_i}\\[3mm]
\dfrac{\partial[\mathbf{C}(\boldsymbol{\theta})]_{21}}{\partial\theta_i} & \dfrac{\partial[\mathbf{C}(\boldsymbol{\theta})]_{22}}{\partial\theta_i} & \cdots & \dfrac{\partial[\mathbf{C}(\boldsymbol{\theta})]_{2N}}{\partial\theta_i}\\[2mm]
\vdots & \vdots & \ddots & \vdots\\
\dfrac{\partial[\mathbf{C}(\boldsymbol{\theta})]_{N1}}{\partial\theta_i} & \dfrac{\partial[\mathbf{C}(\boldsymbol{\theta})]_{N2}}{\partial\theta_i} & \cdots & \dfrac{\partial[\mathbf{C}(\boldsymbol{\theta})]_{NN}}{\partial\theta_i}
\end{bmatrix}.
\]

For the scalar parameter case, in which

\[
\mathbf{x} \sim \mathcal{N}\bigl(\boldsymbol{\mu}(\theta), \mathbf{C}(\theta)\bigr),
\]

this reduces to

\[
I(\theta) = \left[\frac{\partial\boldsymbol{\mu}(\theta)}{\partial\theta}\right]^{T}\mathbf{C}^{-1}(\theta)\left[\frac{\partial\boldsymbol{\mu}(\theta)}{\partial\theta}\right] + \frac{1}{2}\operatorname{tr}\left[\left(\mathbf{C}^{-1}(\theta)\frac{\partial\mathbf{C}(\theta)}{\partial\theta}\right)^{2}\right].
\]
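
For intuition, the scalar formula is straightforward to evaluate numerically. A generic sketch (mine; finite differences stand in for the analytic derivatives) applied to the DC-level model $\boldsymbol{\mu}(A) = A\mathbf{1}$, $\mathbf{C} = \sigma^2\mathbf{I}$ recovers the familiar $I(A) = N/\sigma^2$:

```python
# Generic scalar general-Gaussian Fisher information via finite differences.
import numpy as np

def fisher_scalar(mu, C, theta, eps=1e-6):
    dmu = (mu(theta + eps) - mu(theta - eps)) / (2 * eps)  # d(mu)/d(theta)
    dC = (C(theta + eps) - C(theta - eps)) / (2 * eps)     # dC/d(theta)
    Cinv = np.linalg.inv(C(theta))
    M = Cinv @ dC
    return dmu @ Cinv @ dmu + 0.5 * np.trace(M @ M)

# DC level in WGN with theta = A: the covariance does not depend on A.
N, sigma2 = 10, 2.0
mu = lambda A: A * np.ones(N)
C = lambda A: sigma2 * np.eye(N)
print(fisher_scalar(mu, C, theta=1.0), "vs", N / sigma2)
```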

4.7 Fisher Information Matrices


The Fisher information matrix can be expressed in three ways:

1)

\[
[\mathbf{I}(\boldsymbol{\theta})]_{ij} = -E\left[\frac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_i\,\partial\theta_j}\right]
\]

2)

\[
[\mathbf{I}(\boldsymbol{\theta})]_{ij} = E\left[\frac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_i}\,\frac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_j}\right]
\]

3) If $x[n] = s[n;\boldsymbol{\theta}] + w[n]$, where $w[n]$ is zero-mean WGN with variance $\sigma^2$, then

\[
[\mathbf{I}(\boldsymbol{\theta})]_{ij} = \frac{1}{\sigma^{2}}\sum_{n=0}^{N-1}\frac{\partial s[n;\boldsymbol{\theta}]}{\partial\theta_i}\,\frac{\partial s[n;\boldsymbol{\theta}]}{\partial\theta_j}
\]
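
Form 3 is particularly convenient for signals in WGN. As a sketch (mine; the values of $N$ and $\sigma^2$ are arbitrary), applying it to the line-fit signal $s[n;\boldsymbol{\theta}] = A + Bn$ reproduces the Fisher information matrix derived earlier:

```python
# Form 3 for the line fit: ds/dA = 1, ds/dB = n, so the FIM entries are
# (1/sigma^2) times the sums of 1, n, and n^2.
import numpy as np

N, sigma2 = 100, 1.0  # assumed values
n = np.arange(N)
ds = np.stack([np.ones(N), n])   # row i holds ds[n]/d(theta_i)
I = (ds @ ds.T) / sigma2         # [I]_ij = (1/sigma^2) * sum_n ds_i * ds_j

I_closed = np.array([[N, N * (N - 1) / 2],
                     [N * (N - 1) / 2, N * (N - 1) * (2 * N - 1) / 6]]) / sigma2
print(np.allclose(I, I_closed))  # True
```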

4.8 Asymptotic CRLB for WSS Gaussian Random Processes


At times it is difficult to analytically compute the CRLB due to the need to invert the covariance matrix. Alternatively, if $x[n]$ is a WSS Gaussian random process and the record length is much greater than the correlation time of the process, then we can find the elements of the Fisher information matrix approximately by

\[
[\mathbf{I}(\boldsymbol{\theta})]_{ij} = \frac{N}{2}\int_{-1/2}^{1/2}\frac{\partial \ln P_{xx}(f;\boldsymbol{\theta})}{\partial\theta_i}\,\frac{\partial \ln P_{xx}(f;\boldsymbol{\theta})}{\partial\theta_j}\,df.
\]
1. The approximation is valid only as $N \to \infty$, i.e., asymptotically.
2. It is assumed that the data record length is much greater than the correlation time of the process.
    o Correlation time: the maximum lag $k$ of the autocorrelation function $r_{xx}[k] = E\bigl[x[n]\,x[n-k]\bigr]$ beyond which the autocorrelation function is essentially zero.
3. It is assumed that the mean of $x[n]$ is zero.
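
As a sanity check (my own sketch), the formula can be evaluated for the simplest case of white noise with unknown variance, $P_{xx}(f;\sigma^2) = \sigma^2$, where it should reproduce the exact Fisher information $N/(2\sigma^4)$ found in the earlier example:

```python
# Asymptotic FIM for white noise with unknown variance: Pxx(f) = sigma^2,
# so d(ln Pxx)/d(sigma^2) = 1/sigma^2 for all f, and the integral over
# f in [-1/2, 1/2] gives (N/2) * (1/sigma^4) = N / (2*sigma^4).
import numpy as np

N, sigma2 = 1000, 2.0  # assumed values
f = np.linspace(-0.5, 0.5, 10001)
dlogP = np.full_like(f, 1.0 / sigma2)      # d ln Pxx / d sigma^2
I_asym = (N / 2) * np.mean(dlogP * dlogP)  # mean == integral (unit-length interval)

print(I_asym, "vs exact", N / (2 * sigma2 ** 2))
```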
