
LAST WEEK:

4.2 Cramer-Rao Lower Bound


Then, the variance of any unbiased estimator 𝜃̂ must satisfy

\[
\operatorname{var}(\hat{\theta}) \;\ge\; \underbrace{\frac{1}{\underbrace{-E\left[\dfrac{\partial^{2}\ln p(\mathbf{x};\theta)}{\partial\theta^{2}}\right]}_{\text{Fisher information}}}}_{\text{CRLB}}
\]

An unbiased estimator may be found that attains the bound for all 𝜃 if and only if

\[
\frac{\partial \ln p(\mathbf{x};\theta)}{\partial\theta} = I(\theta)\bigl(g(\mathbf{x}) - \theta\bigr)
\]

In that case the MVU estimator is $\hat{\theta} = g(\mathbf{x})$, and the minimum variance is $1/I(\theta)$, where $I(\theta)$ is the Fisher information. The CRLB may also be expressed in a slightly different form:
\[
\underbrace{-E\left[\frac{\partial^{2}\ln p(\mathbf{x};\theta)}{\partial\theta^{2}}\right]}_{I(\theta)} = \underbrace{E\left[\left(\frac{\partial \ln p(\mathbf{x};\theta)}{\partial\theta}\right)^{2}\right]}_{I(\theta)}
\]

i.e.,

\[
\operatorname{var}(\hat{\theta}) \ge \underbrace{\frac{1}{E\left[\left(\dfrac{\partial \ln p(\mathbf{x};\theta)}{\partial\theta}\right)^{2}\right]}}_{\text{CRLB}}
\]

The CRLB is
1. Nonnegative
2. Additive for independent observations → the CRLB for $N$ iid observations is $1/N$ times that for one observation

If

\[
x[n] = s[n;\theta] + w[n], \qquad n = 0, 1, \cdots, N-1
\]

where $w[n]$ is zero-mean AWGN with variance $\sigma^2$, then

\[
\operatorname{var}(\hat{\theta}) \ge \frac{\sigma^{2}}{\sum_{n=0}^{N-1}\left(\dfrac{\partial s[n;\theta]}{\partial\theta}\right)^{2}}
\]

• This form demonstrates the importance of the signal dependence on 𝜃.


• Signals that change rapidly as the unknown parameter changes result in more accurate
estimators.
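
To make the dependence on the signal concrete, here is a minimal numeric sketch (not from the notes; the sinusoidal signal, the parameter values, and the finite-difference derivative are illustrative assumptions) that evaluates the bound above:

```python
# Sketch: evaluate the scalar CRLB  var >= sigma^2 / sum_n (ds[n;theta]/dtheta)^2
# for an assumed signal s[n; f0] = cos(2*pi*f0*n), using a central finite
# difference to approximate the sensitivity ds/d(f0).
import numpy as np

def crlb_awgn(s, theta, sigma2, N, eps=1e-6):
    n = np.arange(N)
    ds = (s(n, theta + eps) - s(n, theta - eps)) / (2 * eps)  # ds/dtheta
    return sigma2 / np.sum(ds ** 2)

s = lambda n, f0: np.cos(2 * np.pi * f0 * n)  # hypothetical example signal

# A longer record makes the signal more sensitive to f0 (the derivative grows
# with n), so the bound drops much faster than 1/N:
for N in (10, 100):
    print(N, crlb_awgn(s, theta=0.1, sigma2=1.0, N=N))
```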
4.4 Transformation of Parameters
• Sometimes we are interested in estimating a function of some parameters.
    o For example, we may be interested in estimating $A^2$ instead of the amplitude $A$.
    o We may be interested in estimating not the phase $\phi$ itself, but $\cos\phi$.
• If we know the CRLB for the parameter $\theta$ and we desire to estimate $\alpha = g(\theta)$, then the CRLB is

\[
\operatorname{var}(\hat{\alpha}) \ge \frac{\left(\dfrac{\partial g(\theta)}{\partial\theta}\right)^{2}}{-E\left[\dfrac{\partial^{2}\ln p(\mathbf{x};\theta)}{\partial\theta^{2}}\right]}
\quad\Rightarrow\quad
\mathrm{CRLB}_{g(\theta)} = \left(\frac{\partial g(\theta)}{\partial\theta}\right)^{2} \times \mathrm{CRLB}_{\theta}
\]

• For example, consider the DC level in WGN; the CRLB for the amplitude is $\operatorname{var}(\hat{A}) \ge \sigma^2/N$. If we are interested in estimating $A^2$, then

\[
\operatorname{var}(\widehat{A^{2}}) \ge \frac{(2A)^{2}}{N/\sigma^{2}} = \frac{4A^{2}\sigma^{2}}{N}
\]
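
As a quick sanity check (my own sketch, not part of the original example), one can estimate $A^2$ by squaring the sample mean and compare the empirical variance against the transformed bound; note that $\bar{x}^2$ is only asymptotically unbiased and efficient, so the agreement improves with $N$:

```python
# Monte Carlo check of the transformed bound var(A^2_hat) >= 4*A^2*sigma^2/N,
# using the (asymptotically efficient) estimator A^2_hat = (sample mean)^2.
import numpy as np

rng = np.random.default_rng(0)
A, sigma2, N, trials = 2.0, 1.0, 1000, 20000  # assumed values
x = A + np.sqrt(sigma2) * rng.standard_normal((trials, N))
a2_hat = x.mean(axis=1) ** 2

print("empirical var :", a2_hat.var())
print("CRLB 4A^2*s2/N:", 4 * A ** 2 * sigma2 / N)
```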

4.5 Extension to a Vector Parameter


SUMMARY

• It’s time we extend the results of the previous sections to the case where we wish to estimate a vector parameter $\boldsymbol{\theta} = [\theta_1, \theta_2, \cdots, \theta_p]^T$.
• We will assume that $\hat{\boldsymbol{\theta}}$ is unbiased.
• For the vector parameter, the CRLB is found as

\[
\operatorname{var}(\hat{\theta}_i) \ge [\mathbf{I}^{-1}(\boldsymbol{\theta})]_{ii}
\]

where 𝐈(𝛉) is the 𝑝 × 𝑝 Fisher information matrix.


• Elements of the Fisher information matrix are defined by

\[
[\mathbf{I}(\boldsymbol{\theta})]_{ij} = -E\left[\frac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_i\,\partial\theta_j}\right]
\]

for $i = 1, 2, \cdots, p$ and $j = 1, 2, \cdots, p$.

• Note: For $p = 1$, the Fisher information matrix reduces to the scalar Fisher information, i.e., $\mathbf{I}(\boldsymbol{\theta}) = I(\theta)$, and we recover the scalar CRLB.

THEOREM 2: CRAMER-RAO LOWER BOUND – VECTOR PARAMETER


It is assumed that the pdf $p(\mathbf{x};\boldsymbol{\theta})$ satisfies the regularity conditions

\[
E\left[\frac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\right] =
\begin{bmatrix}
E\left[\dfrac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_1}\right]\\
\vdots\\
E\left[\dfrac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_p}\right]
\end{bmatrix} = \mathbf{0}
\]

and the covariance matrix $\mathbf{C}_{\hat{\boldsymbol{\theta}}}$ is defined by

\[
\mathbf{C}_{\hat{\boldsymbol{\theta}}} = E\left[(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta})(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta})^{T}\right].
\]

Then, the estimation-error covariance matrix satisfies

\[
\mathbf{C}_{\hat{\boldsymbol{\theta}}} - \mathbf{I}^{-1}(\boldsymbol{\theta}) \succeq 0
\]

where $\succeq 0$ denotes that the matrix is positive semi-definite. Noting that for a positive semi-definite matrix the diagonal elements are always non-negative, and using the above relationship, we obtain

\[
\operatorname{var}(\hat{\theta}_i) = [\mathbf{C}_{\hat{\boldsymbol{\theta}}}]_{ii} \ge [\mathbf{I}^{-1}(\boldsymbol{\theta})]_{ii}.
\]

An unbiased estimator that attains the CRLB, in the sense that $\mathbf{C}_{\hat{\boldsymbol{\theta}}} = \mathbf{I}^{-1}(\boldsymbol{\theta})$, may be found if and only if

\[
\frac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\boldsymbol{\theta}} = \mathbf{I}(\boldsymbol{\theta})\bigl(\mathbf{g}(\mathbf{x}) - \boldsymbol{\theta}\bigr) = \mathbf{I}(\boldsymbol{\theta})\bigl(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}\bigr).
\]

In that case $\hat{\boldsymbol{\theta}} = \mathbf{g}(\mathbf{x})$ is the MVU estimator and has estimation-error covariance $\mathbf{I}^{-1}(\boldsymbol{\theta})$.

Example: DC Level in WGN Revisited

Now we extend the problem of DC in WGN to the case where, in addition to $A$, the noise variance is also unknown. The observations are

\[
x[n] = A + w[n], \qquad n = 0, 1, \cdots, N-1,
\]

the parameter vector is

\[
\boldsymbol{\theta} = [A, \sigma^2]^T,
\]

and therefore $p = 2$. The $2 \times 2$ Fisher information matrix is

\[
\mathbf{I}(\boldsymbol{\theta}) =
\begin{bmatrix}
-E\left[\dfrac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A^{2}}\right] & -E\left[\dfrac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A\,\partial\sigma^{2}}\right]\\[2mm]
-E\left[\dfrac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A\,\partial\sigma^{2}}\right] & -E\left[\dfrac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial(\sigma^{2})^{2}}\right]
\end{bmatrix}
\]
The likelihood function is

\[
p(\mathbf{x};\boldsymbol{\theta}) = \frac{1}{(2\pi\sigma^{2})^{N/2}} \exp\left[-\frac{1}{2\sigma^{2}}\sum_{n=0}^{N-1}(x[n]-A)^{2}\right]
\]

and the log-likelihood function is

\[
\ln p(\mathbf{x};\boldsymbol{\theta}) = -\frac{N}{2}\ln(2\pi) - \frac{N}{2}\ln\sigma^{2} - \frac{1}{2\sigma^{2}}\sum_{n=0}^{N-1}(x[n]-A)^{2}.
\]

The derivatives are easily found as

\[
\frac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A} = \frac{1}{\sigma^{2}}\sum_{n=0}^{N-1}(x[n]-A) \;\Rightarrow\; \frac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A^{2}} = -\frac{N}{\sigma^{2}}
\]

\[
\frac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\sigma^{2}} = -\frac{N}{2\sigma^{2}} + \frac{1}{2\sigma^{4}}\sum_{n=0}^{N-1}(x[n]-A)^{2} \;\Rightarrow\; \frac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial(\sigma^{2})^{2}} = \frac{N}{2\sigma^{4}} - \frac{1}{\sigma^{6}}\sum_{n=0}^{N-1}(x[n]-A)^{2}
\]

\[
\frac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A\,\partial\sigma^{2}} = -\frac{1}{\sigma^{4}}\sum_{n=0}^{N-1}(x[n]-A)
\]

Then we have

\[
\mathbf{I}(\boldsymbol{\theta}) =
\begin{bmatrix}
-E\left[-\dfrac{N}{\sigma^{2}}\right] & -E\left[-\dfrac{1}{\sigma^{4}}\displaystyle\sum_{n=0}^{N-1}(x[n]-A)\right]\\[3mm]
-E\left[-\dfrac{1}{\sigma^{4}}\displaystyle\sum_{n=0}^{N-1}(x[n]-A)\right] & -E\left[\dfrac{N}{2\sigma^{4}} - \dfrac{1}{\sigma^{6}}\displaystyle\sum_{n=0}^{N-1}(x[n]-A)^{2}\right]
\end{bmatrix}
=
\begin{bmatrix}
\dfrac{N}{\sigma^{2}} & 0\\[1mm]
0 & \dfrac{N}{2\sigma^{4}}
\end{bmatrix}
\]

\[
\mathbf{I}^{-1}(\boldsymbol{\theta}) =
\begin{bmatrix}
\dfrac{\sigma^{2}}{N} & 0\\[1mm]
0 & \dfrac{2\sigma^{4}}{N}
\end{bmatrix}
\]
Although not true in general, for this example the Fisher information matrix is diagonal and hence easy to invert:

\[
\operatorname{var}(\hat{A}) \ge \frac{\sigma^{2}}{N}, \qquad \operatorname{var}(\widehat{\sigma^{2}}) \ge \frac{2\sigma^{4}}{N}.
\]
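
A small Monte Carlo sketch (the estimator choices are mine, not from the notes) illustrates both bounds: the sample mean attains $\sigma^2/N$ exactly, while the ML variance estimate only approaches $2\sigma^4/N$ asymptotically:

```python
# Monte Carlo check of the two diagonal CRLB entries for DC-in-WGN with
# unknown A and sigma^2.
import numpy as np

rng = np.random.default_rng(1)
A, sigma2, N, trials = 1.0, 2.0, 500, 50000  # assumed values
x = A + np.sqrt(sigma2) * rng.standard_normal((trials, N))

A_hat = x.mean(axis=1)     # sample mean (efficient: meets the bound exactly)
s2_hat = x.var(axis=1)     # ML variance estimate, divides by N (asymptotic)

print("var(A_hat) :", A_hat.var(), " CRLB:", sigma2 / N)
print("var(s2_hat):", s2_hat.var(), " CRLB:", 2 * sigma2 ** 2 / N)
```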
Example: Line Fitting
Consider the problem of line fitting given the observations

\[
x[n] = A + Bn + w[n], \qquad n = 0, 1, \cdots, N-1
\]

where $w[n]$ is WGN. We wish to estimate the slope $B$ and the intercept $A$. The parameter vector is $\boldsymbol{\theta} = [A \;\, B]^T$. The likelihood function is

\[
p(\mathbf{x};\boldsymbol{\theta}) = \frac{1}{(2\pi\sigma^{2})^{N/2}} \exp\left\{-\frac{1}{2\sigma^{2}}\sum_{n=0}^{N-1}\bigl(x[n] - (\underbrace{A + Bn}_{s[n;\boldsymbol{\theta}]})\bigr)^{2}\right\},
\]

therefore, the log-likelihood function is

\[
\ln p(\mathbf{x};\boldsymbol{\theta}) = -\frac{N}{2}\ln(2\pi\sigma^{2}) - \frac{1}{2\sigma^{2}}\sum_{n=0}^{N-1}(x[n]-A-Bn)^{2}.
\]

Taking the derivatives yields

\[
\frac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A} = \frac{1}{\sigma^{2}}\sum_{n=0}^{N-1}(x[n]-A-Bn) \;\Rightarrow\; \frac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A^{2}} = -\frac{N}{\sigma^{2}}
\]

\[
\frac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial B} = \frac{1}{\sigma^{2}}\sum_{n=0}^{N-1} n(x[n]-A-Bn) \;\Rightarrow\; \frac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial B^{2}} = -\frac{1}{\sigma^{2}}\sum_{n=0}^{N-1} n^{2} = -\frac{1}{\sigma^{2}}\,\frac{N(N-1)(2N-1)}{6}
\]

\[
\frac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A\,\partial B} = -\frac{1}{\sigma^{2}}\sum_{n=0}^{N-1} n = -\frac{1}{\sigma^{2}}\,\frac{N(N-1)}{2}
\]

Therefore, the Fisher information matrix becomes

\[
\mathbf{I}(\boldsymbol{\theta}) =
\begin{bmatrix}
-E\left[\dfrac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A^{2}}\right] & -E\left[\dfrac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A\,\partial B}\right]\\[3mm]
-E\left[\dfrac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A\,\partial B}\right] & -E\left[\dfrac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial B^{2}}\right]
\end{bmatrix}
= \frac{N}{\sigma^{2}}
\begin{bmatrix}
1 & \dfrac{N-1}{2}\\[1mm]
\dfrac{N-1}{2} & \dfrac{(N-1)(2N-1)}{6}
\end{bmatrix}
\]

and, with $\boldsymbol{\theta} = [A \;\, B]^T$,

\[
\mathbf{I}^{-1}(\boldsymbol{\theta}) = \frac{\sigma^{2}}{N}
\begin{bmatrix}
\dfrac{2(2N-1)}{N+1} & -\dfrac{6}{N+1}\\[1mm]
-\dfrac{6}{N+1} & \dfrac{12}{N^{2}-1}
\end{bmatrix},
\]

therefore, the CRLB is

\[
\operatorname{var}(\hat{A}) \ge \frac{2(2N-1)\sigma^{2}}{N(N+1)}, \qquad \operatorname{var}(\hat{B}) \ge \frac{12\sigma^{2}}{N(N^{2}-1)}.
\]

Although not obvious, we can actually find the efficient MVU estimators by writing the score in the attainment form $\mathbf{I}(\boldsymbol{\theta})(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta})$:

\[
\frac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}
= \begin{bmatrix}
\dfrac{1}{\sigma^{2}}\displaystyle\sum_{n=0}^{N-1}(x[n]-A-Bn)\\[3mm]
\dfrac{1}{\sigma^{2}}\displaystyle\sum_{n=0}^{N-1} n(x[n]-A-Bn)
\end{bmatrix}
= \underbrace{\frac{N}{\sigma^{2}}\begin{bmatrix}1 & \dfrac{N-1}{2}\\[1mm] \dfrac{N-1}{2} & \dfrac{(N-1)(2N-1)}{6}\end{bmatrix}}_{\mathbf{I}(\boldsymbol{\theta})}
\underbrace{\begin{bmatrix}\hat{A} - A\\ \hat{B} - B\end{bmatrix}}_{\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}}
\]

Therefore, solving for $\hat{A}$ and $\hat{B}$,

\[
\hat{A} = \frac{2(2N-1)}{N(N+1)}\sum_{n=0}^{N-1} x[n] - \frac{6}{N(N+1)}\sum_{n=0}^{N-1} n\,x[n]
\]

\[
\hat{B} = -\frac{6}{N(N+1)}\sum_{n=0}^{N-1} x[n] + \frac{12}{N(N^{2}-1)}\sum_{n=0}^{N-1} n\,x[n]
\]
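
These closed forms are easy to verify numerically. The sketch below (my own, under the same model; parameter values are arbitrary) generates synthetic lines, applies $\hat{A}$ and $\hat{B}$ above, and compares the empirical variances with the CRLB entries:

```python
# Monte Carlo check that the closed-form line-fit estimators attain the CRLB.
import numpy as np

rng = np.random.default_rng(2)
A, B, sigma2, N, trials = 1.0, 0.5, 1.0, 100, 20000  # assumed values
n = np.arange(N)
x = A + B * n + np.sqrt(sigma2) * rng.standard_normal((trials, N))

s0 = x.sum(axis=1)         # sum of x[n]
s1 = (n * x).sum(axis=1)   # sum of n*x[n]
A_hat = 2 * (2 * N - 1) / (N * (N + 1)) * s0 - 6 / (N * (N + 1)) * s1
B_hat = -6 / (N * (N + 1)) * s0 + 12 / (N * (N ** 2 - 1)) * s1

print("var(A_hat):", A_hat.var(), " CRLB:", 2 * (2 * N - 1) * sigma2 / (N * (N + 1)))
print("var(B_hat):", B_hat.var(), " CRLB:", 12 * sigma2 / (N * (N ** 2 - 1)))
```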

• NOTE: As an alternative, we can use the following identity:

\[
E\left[\frac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_i}\,\frac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_j}\right] = -E\left[\frac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_i\,\partial\theta_j}\right]
\]

4.6 Vector Parameter CRLB for Transformations


Assume that we want to estimate a function of the parameters, $\boldsymbol{\alpha} = \mathbf{g}(\boldsymbol{\theta})$, where $\mathbf{g}$ is an $r$-dimensional function. Then,

\[
\operatorname{var}(\hat{\alpha}_i) \ge \left[\frac{\partial\mathbf{g}(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\,\mathbf{I}^{-1}(\boldsymbol{\theta})\left(\frac{\partial\mathbf{g}(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\right)^{T}\right]_{ii}
\]

where $\dfrac{\partial\mathbf{g}(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}$ is the $r \times p$ Jacobian matrix defined as

\[
\frac{\partial\mathbf{g}(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}} =
\begin{bmatrix}
\dfrac{\partial g_1(\boldsymbol{\theta})}{\partial\theta_1} & \dfrac{\partial g_1(\boldsymbol{\theta})}{\partial\theta_2} & \cdots & \dfrac{\partial g_1(\boldsymbol{\theta})}{\partial\theta_p}\\[3mm]
\dfrac{\partial g_2(\boldsymbol{\theta})}{\partial\theta_1} & \dfrac{\partial g_2(\boldsymbol{\theta})}{\partial\theta_2} & \cdots & \dfrac{\partial g_2(\boldsymbol{\theta})}{\partial\theta_p}\\[2mm]
\vdots & \vdots & \ddots & \vdots\\
\dfrac{\partial g_r(\boldsymbol{\theta})}{\partial\theta_1} & \dfrac{\partial g_r(\boldsymbol{\theta})}{\partial\theta_2} & \cdots & \dfrac{\partial g_r(\boldsymbol{\theta})}{\partial\theta_p}
\end{bmatrix}
\]

Example: Consider a DC level in WGN with $A$ and $\sigma^2$ unknown. We wish to estimate

\[
\alpha = \frac{A^{2}}{\sigma^{2}},
\]

which can be considered to be the SNR for a single sample. Here $\boldsymbol{\theta} = [A \;\, \sigma^2]^T$ and $g(\boldsymbol{\theta}) = \theta_1^2/\theta_2 = A^2/\sigma^2$. Then, as shown before,

\[
\mathbf{I}^{-1}(\boldsymbol{\theta}) =
\begin{bmatrix}
\dfrac{\sigma^{2}}{N} & 0\\[1mm]
0 & \dfrac{2\sigma^{4}}{N}
\end{bmatrix}.
\]
The Jacobian is

\[
\frac{\partial\mathbf{g}(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}} = \left[\frac{\partial g(\boldsymbol{\theta})}{\partial\theta_1} \;\; \frac{\partial g(\boldsymbol{\theta})}{\partial\theta_2}\right] = \left[\frac{\partial g(\boldsymbol{\theta})}{\partial A} \;\; \frac{\partial g(\boldsymbol{\theta})}{\partial\sigma^{2}}\right] = \left[\frac{2A}{\sigma^{2}} \;\; -\frac{A^{2}}{\sigma^{4}}\right],
\]

so

\[
\frac{\partial\mathbf{g}(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\,\mathbf{I}^{-1}(\boldsymbol{\theta})\left(\frac{\partial\mathbf{g}(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\right)^{T}
= \left[\frac{2A}{\sigma^{2}} \;\; -\frac{A^{2}}{\sigma^{4}}\right]
\begin{bmatrix}\dfrac{\sigma^{2}}{N} & 0\\[1mm] 0 & \dfrac{2\sigma^{4}}{N}\end{bmatrix}
\begin{bmatrix}\dfrac{2A}{\sigma^{2}}\\[2mm] -\dfrac{A^{2}}{\sigma^{4}}\end{bmatrix}
= \frac{4A^{2}}{N\sigma^{2}} + \frac{2A^{4}}{N\sigma^{4}} = \frac{4\alpha + 2\alpha^{2}}{N}.
\]
Therefore, the estimator variance satisfies

\[
\operatorname{var}(\hat{\alpha}) \ge \frac{4\alpha + 2\alpha^{2}}{N}.
\]
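
The matrix algebra above is short enough to check directly. A small sketch (mine; the parameter values are arbitrary) forms the Jacobian and $\mathbf{I}^{-1}(\boldsymbol{\theta})$ and confirms the closed-form bound:

```python
# Numeric check of the transformation bound J @ I_inv @ J.T = (4a + 2a^2)/N
# for the single-sample SNR alpha = A^2/sigma^2.
import numpy as np

A, sigma2, N = 2.0, 1.0, 100  # assumed values
alpha = A ** 2 / sigma2

I_inv = np.diag([sigma2 / N, 2 * sigma2 ** 2 / N])       # from the DC example
J = np.array([[2 * A / sigma2, -A ** 2 / sigma2 ** 2]])  # [dg/dA, dg/dsigma^2]

print((J @ I_inv @ J.T).item())          # 4A^2/(N*s2) + 2A^4/(N*s2^2)
print((4 * alpha + 2 * alpha ** 2) / N)  # closed form
```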

4.7 CRLB for the General Gaussian Case


• Assume the general Gaussian case, in which the observations need not have a zero mean:

\[
\mathbf{x} \sim \mathcal{N}\bigl(\boldsymbol{\mu}(\boldsymbol{\theta}), \mathbf{C}(\boldsymbol{\theta})\bigr),
\]

so that both the mean and the covariance may depend on $\boldsymbol{\theta}$. Then, the Fisher information matrix is given by

\[
[\mathbf{I}(\boldsymbol{\theta})]_{ij} = \left[\frac{\partial\boldsymbol{\mu}(\boldsymbol{\theta})}{\partial\theta_i}\right]^{T}\mathbf{C}^{-1}(\boldsymbol{\theta})\left[\frac{\partial\boldsymbol{\mu}(\boldsymbol{\theta})}{\partial\theta_j}\right] + \frac{1}{2}\operatorname{tr}\left[\mathbf{C}^{-1}(\boldsymbol{\theta})\frac{\partial\mathbf{C}(\boldsymbol{\theta})}{\partial\theta_i}\,\mathbf{C}^{-1}(\boldsymbol{\theta})\frac{\partial\mathbf{C}(\boldsymbol{\theta})}{\partial\theta_j}\right]
\]

where

\[
\frac{\partial\boldsymbol{\mu}(\boldsymbol{\theta})}{\partial\theta_i} =
\begin{bmatrix}
\dfrac{\partial[\boldsymbol{\mu}(\boldsymbol{\theta})]_1}{\partial\theta_i}\\[3mm]
\dfrac{\partial[\boldsymbol{\mu}(\boldsymbol{\theta})]_2}{\partial\theta_i}\\[1mm]
\vdots\\
\dfrac{\partial[\boldsymbol{\mu}(\boldsymbol{\theta})]_N}{\partial\theta_i}
\end{bmatrix}
\]

and

\[
\frac{\partial\mathbf{C}(\boldsymbol{\theta})}{\partial\theta_i} =
\begin{bmatrix}
\dfrac{\partial[\mathbf{C}(\boldsymbol{\theta})]_{11}}{\partial\theta_i} & \dfrac{\partial[\mathbf{C}(\boldsymbol{\theta})]_{12}}{\partial\theta_i} & \cdots & \dfrac{\partial[\mathbf{C}(\boldsymbol{\theta})]_{1N}}{\partial\theta_i}\\[3mm]
\dfrac{\partial[\mathbf{C}(\boldsymbol{\theta})]_{21}}{\partial\theta_i} & \dfrac{\partial[\mathbf{C}(\boldsymbol{\theta})]_{22}}{\partial\theta_i} & \cdots & \dfrac{\partial[\mathbf{C}(\boldsymbol{\theta})]_{2N}}{\partial\theta_i}\\[2mm]
\vdots & \vdots & \ddots & \vdots\\
\dfrac{\partial[\mathbf{C}(\boldsymbol{\theta})]_{N1}}{\partial\theta_i} & \dfrac{\partial[\mathbf{C}(\boldsymbol{\theta})]_{N2}}{\partial\theta_i} & \cdots & \dfrac{\partial[\mathbf{C}(\boldsymbol{\theta})]_{NN}}{\partial\theta_i}
\end{bmatrix}.
\]

For the scalar parameter case, in which

\[
\mathbf{x} \sim \mathcal{N}\bigl(\boldsymbol{\mu}(\theta), \mathbf{C}(\theta)\bigr),
\]

this reduces to

\[
I(\theta) = \left[\frac{\partial\boldsymbol{\mu}(\theta)}{\partial\theta}\right]^{T}\mathbf{C}^{-1}(\theta)\left[\frac{\partial\boldsymbol{\mu}(\theta)}{\partial\theta}\right] + \frac{1}{2}\operatorname{tr}\left[\left(\mathbf{C}^{-1}(\theta)\frac{\partial\mathbf{C}(\theta)}{\partial\theta}\right)^{2}\right].
\]
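
For intuition, the scalar formula is straightforward to evaluate numerically. A generic sketch (mine; finite differences stand in for the analytic derivatives) applied to the DC-level model $\boldsymbol{\mu}(A) = A\mathbf{1}$, $\mathbf{C} = \sigma^2\mathbf{I}$ recovers the familiar $I(A) = N/\sigma^2$:

```python
# Generic scalar general-Gaussian Fisher information via finite differences.
import numpy as np

def fisher_scalar(mu, C, theta, eps=1e-6):
    dmu = (mu(theta + eps) - mu(theta - eps)) / (2 * eps)  # d(mu)/d(theta)
    dC = (C(theta + eps) - C(theta - eps)) / (2 * eps)     # dC/d(theta)
    Cinv = np.linalg.inv(C(theta))
    M = Cinv @ dC
    return dmu @ Cinv @ dmu + 0.5 * np.trace(M @ M)

# DC level in WGN with theta = A: the covariance does not depend on A.
N, sigma2 = 10, 2.0
mu = lambda A: A * np.ones(N)
C = lambda A: sigma2 * np.eye(N)
print(fisher_scalar(mu, C, theta=1.0), "vs", N / sigma2)
```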

4.7 Fisher Information Matrices


The Fisher information matrix can be expressed in three ways:

1)

\[
[\mathbf{I}(\boldsymbol{\theta})]_{ij} = -E\left[\frac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_i\,\partial\theta_j}\right]
\]

2)

\[
[\mathbf{I}(\boldsymbol{\theta})]_{ij} = E\left[\frac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_i}\,\frac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_j}\right]
\]

3) If $x[n] = s[n;\boldsymbol{\theta}] + w[n]$, where $w[n]$ is zero-mean WGN with variance $\sigma^2$, then

\[
[\mathbf{I}(\boldsymbol{\theta})]_{ij} = \frac{1}{\sigma^{2}}\sum_{n=0}^{N-1}\frac{\partial s[n;\boldsymbol{\theta}]}{\partial\theta_i}\,\frac{\partial s[n;\boldsymbol{\theta}]}{\partial\theta_j}
\]
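
Form 3 is particularly convenient for signals in WGN. As a sketch (mine; the values of $N$ and $\sigma^2$ are arbitrary), applying it to the line-fit signal $s[n;\boldsymbol{\theta}] = A + Bn$ reproduces the Fisher information matrix derived earlier:

```python
# Form 3 for the line fit: ds/dA = 1, ds/dB = n, so the FIM entries are
# (1/sigma^2) times the sums of 1, n, and n^2.
import numpy as np

N, sigma2 = 100, 1.0  # assumed values
n = np.arange(N)
ds = np.stack([np.ones(N), n])   # row i holds ds[n]/d(theta_i)
I = (ds @ ds.T) / sigma2         # [I]_ij = (1/sigma^2) * sum_n ds_i * ds_j

I_closed = np.array([[N, N * (N - 1) / 2],
                     [N * (N - 1) / 2, N * (N - 1) * (2 * N - 1) / 6]]) / sigma2
print(np.allclose(I, I_closed))  # True
```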

4.8 Asymptotic CRLB for WSS Gaussian Random Processes


At times it is difficult to analytically compute the CRLB due to the need to invert the covariance matrix. Alternatively, if $x[n]$ is a WSS Gaussian random process and the record length is much greater than the correlation time of the process, then we can find the elements of the Fisher information matrix approximately by

\[
[\mathbf{I}(\boldsymbol{\theta})]_{ij} = \frac{N}{2}\int_{-1/2}^{1/2}\frac{\partial \ln P_{xx}(f;\boldsymbol{\theta})}{\partial\theta_i}\,\frac{\partial \ln P_{xx}(f;\boldsymbol{\theta})}{\partial\theta_j}\,df.
\]
1. The approximation is valid only as $N \to \infty$, i.e., asymptotically.
2. It is assumed that the data record length is much greater than the correlation time of the process.
    o Correlation time: the maximum lag $k$ of the autocorrelation function $r_{xx}[k] = E\bigl[x[n]\,x[n-k]\bigr]$ beyond which the autocorrelation function is essentially zero.
3. It is assumed that the mean of $x[n]$ is zero.
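
As a sanity check (my own sketch), the formula can be evaluated for the simplest case of white noise with unknown variance, $P_{xx}(f;\sigma^2) = \sigma^2$, where it should reproduce the exact Fisher information $N/(2\sigma^4)$ found in the earlier example:

```python
# Asymptotic FIM for white noise with unknown variance: Pxx(f) = sigma^2,
# so d(ln Pxx)/d(sigma^2) = 1/sigma^2 for all f, and the integral over
# f in [-1/2, 1/2] gives (N/2) * (1/sigma^4) = N / (2*sigma^4).
import numpy as np

N, sigma2 = 1000, 2.0  # assumed values
f = np.linspace(-0.5, 0.5, 10001)
dlogP = np.full_like(f, 1.0 / sigma2)      # d ln Pxx / d sigma^2
I_asym = (N / 2) * np.mean(dlogP * dlogP)  # mean == integral (unit-length interval)

print(I_asym, "vs exact", N / (2 * sigma2 ** 2))
```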
