$$\operatorname{var}(\hat{\theta}) \;\ge\; \frac{1}{-E\left[\dfrac{\partial^{2}\ln p(\mathbf{x};\theta)}{\partial\theta^{2}}\right]}$$

where the denominator is the Fisher information $I(\theta)$ and the right-hand side is the Cramér–Rao bound (CRB).
An unbiased estimator may be found that attains the bound for all $\theta$ if and only if

$$\frac{\partial \ln p(\mathbf{x};\theta)}{\partial\theta} = I(\theta)\bigl(g(\mathbf{x}) - \theta\bigr)$$

The MVU estimator is then $\hat{\theta} = g(\mathbf{x})$, and the minimum variance is $1/I(\theta)$, where $I(\theta)$ is the Fisher information. The CRLB may also be expressed in a slightly different form, since

$$-E\left[\frac{\partial^{2}\ln p(\mathbf{x};\theta)}{\partial\theta^{2}}\right] = E\left[\left(\frac{\partial \ln p(\mathbf{x};\theta)}{\partial\theta}\right)^{2}\right] = I(\theta)$$

i.e.,

$$\operatorname{var}(\hat{\theta}) \;\ge\; \frac{1}{E\left[\left(\dfrac{\partial \ln p(\mathbf{x};\theta)}{\partial\theta}\right)^{2}\right]}$$
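The equality of the two expectations above can be checked numerically. The following is a minimal Monte Carlo sketch (parameter values are arbitrary; NumPy assumed) for a single sample $x \sim \mathcal{N}(\theta, \sigma^2)$, where the score is $(x-\theta)/\sigma^2$ and the second derivative of the log-likelihood is the constant $-1/\sigma^2$:

```python
import numpy as np

# Monte Carlo sketch: the two forms of the Fisher information agree.
# For x ~ N(theta, sigma2):
#   d ln p / d theta   = (x - theta) / sigma2
#   d^2 ln p / d theta^2 = -1 / sigma2
# so E[(score)^2] should match -E[second derivative] = 1/sigma2.
rng = np.random.default_rng(1)
theta, sigma2 = 0.5, 2.0                       # arbitrary example values
x = rng.normal(theta, np.sqrt(sigma2), size=200_000)
score = (x - theta) / sigma2
I_form2 = np.mean(score**2)                    # E[(d ln p / d theta)^2]
I_form1 = 1.0 / sigma2                         # -E[d^2 ln p / d theta^2]
```

With enough trials the empirical value `I_form2` converges to `I_form1`.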
The CRLB is
1. Nonnegative.
2. Additive for independent observations → the CRLB for $N$ iid observations is $1/N$ times that for one observation.
If

$$x[n] = s[n;\theta] + w[n], \qquad n = 0, 1, \cdots, N-1$$

where $w[n]$ is zero-mean AWGN with variance $\sigma^2$, then

$$\operatorname{var}(\hat{\theta}) \;\ge\; \frac{\sigma^{2}}{\sum_{n=0}^{N-1}\left(\dfrac{\partial s[n;\theta]}{\partial\theta}\right)^{2}}$$
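The signal-in-AWGN bound can be evaluated numerically for any differentiable signal model. The helper below is a hypothetical sketch (name and parameter values are my own) that approximates $\partial s[n;\theta]/\partial\theta$ by a central difference and applies the formula; for the DC level $s[n;A]=A$ it reproduces the familiar bound $\sigma^2/N$:

```python
import numpy as np

# Sketch: CRLB for a scalar parameter of a known signal in AWGN,
#   var(theta_hat) >= sigma^2 / sum_n (ds[n; theta] / dtheta)^2
# (crlb_awgn is a hypothetical helper, not from the notes).
def crlb_awgn(s, theta, sigma2, N, eps=1e-6):
    n = np.arange(N)
    # central-difference approximation of ds[n; theta] / dtheta
    ds = (s(n, theta + eps) - s(n, theta - eps)) / (2 * eps)
    return sigma2 / np.sum(ds**2)

# DC level: s[n; A] = A, so ds/dA = 1 and the bound is sigma^2 / N
bound = crlb_awgn(lambda n, A: A * np.ones_like(n, dtype=float),
                  theta=1.0, sigma2=2.0, N=100)
```

Here `bound` equals $\sigma^2/N = 2/100$ up to the tiny finite-difference error.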
If instead we wish to estimate a transformed parameter $\alpha = g(\theta)$, the CRLB becomes

$$\operatorname{var}(\hat{\alpha}) \;\ge\; \frac{\left(\dfrac{\partial g(\theta)}{\partial\theta}\right)^{2}}{-E\left[\dfrac{\partial^{2}\ln p(\mathbf{x};\theta)}{\partial\theta^{2}}\right]} \;\Rightarrow\; \mathrm{CRB}_{g(\theta)} = \left(\frac{\partial g(\theta)}{\partial\theta}\right)^{2} \times \mathrm{CRB}_{\theta}$$
• For example, consider the DC level in WGN; the CRLB for the amplitude is $\operatorname{var}(\hat{A}) \ge \sigma^2/N$. If we are interested in estimating $A^2$, then

$$\operatorname{var}(\widehat{A^{2}}) \;\ge\; (2A)^{2}\,\frac{\sigma^{2}}{N} = \frac{4A^{2}\sigma^{2}}{N}$$
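The transformation rule $\mathrm{CRB}_{g(\theta)} = (\partial g/\partial\theta)^2 \,\mathrm{CRB}_\theta$ is easy to apply in code. A minimal sketch (helper name and numbers are illustrative, not from the notes), using $g(A)=A^2$ with the DC-level bound $\mathrm{CRB}_A = \sigma^2/N$:

```python
import numpy as np

# Sketch of the scalar transformation rule
#   CRB_g = (dg/dtheta)^2 * CRB_theta
# (crb_transformed is a hypothetical helper).
def crb_transformed(g, theta, crb_theta, eps=1e-6):
    dg = (g(theta + eps) - g(theta - eps)) / (2 * eps)  # numerical dg/dtheta
    return dg**2 * crb_theta

A, sigma2, N = 3.0, 1.0, 50          # arbitrary example values
crb_A = sigma2 / N                   # CRLB for the amplitude
crb_A2 = crb_transformed(lambda a: a**2, A, crb_A)
# expected: (2A)^2 * sigma^2 / N = 4 * 9 * 1 / 50 = 0.72
```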
• We now extend the results of the previous sections to the case where we wish to estimate a vector parameter $\boldsymbol{\theta} = [\theta_1, \theta_2, \cdots, \theta_p]^T$.
• We will assume that $\hat{\boldsymbol{\theta}}$ is unbiased.
• For the vector parameter, the CRLB is found from the Fisher information matrix

$$[\mathbf{I}(\boldsymbol{\theta})]_{ij} = -E\left[\frac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_i\,\partial\theta_j}\right]$$

• Note: For $p = 1$, the Fisher information matrix reduces to the Fisher information, i.e., $\mathbf{I}(\boldsymbol{\theta}) = I(\theta)$, and we have the scalar CRLB.
The vector CRLB states that

$$\mathbf{C}_{\hat{\boldsymbol{\theta}}} - \mathbf{I}^{-1}(\boldsymbol{\theta}) \succeq \mathbf{0}$$

where $\succeq$ denotes that the matrix is positive semi-definite. Noting that for a positive semi-definite matrix the diagonal elements are always non-negative, and using the above relationship, we obtain

$$\operatorname{var}(\hat{\theta}_i) \ge \left[\mathbf{I}^{-1}(\boldsymbol{\theta})\right]_{ii}$$
An unbiased estimator that attains the CRLB, in the sense that $\mathbf{C}_{\hat{\boldsymbol{\theta}}} = \mathbf{I}^{-1}(\boldsymbol{\theta})$, may be found if and only if

$$\frac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\boldsymbol{\theta}} = \mathbf{I}(\boldsymbol{\theta})\bigl(\mathbf{g}(\mathbf{x}) - \boldsymbol{\theta}\bigr) = \mathbf{I}(\boldsymbol{\theta})\bigl(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}\bigr)$$

where $\hat{\boldsymbol{\theta}} = \mathbf{g}(\mathbf{x})$ is the MVU estimator and has estimation error covariance $\mathbf{I}^{-1}(\boldsymbol{\theta})$.
Now we extend the problem of DC in WGN,

$$x[n] = A + w[n]$$

to the case where, in addition to $A$, the noise variance is also unknown. The parameter vector is $\boldsymbol{\theta} = [A, \sigma^2]^T$, and the Fisher information matrix is

$$\mathbf{I}(\boldsymbol{\theta}) = \begin{bmatrix} -E\left[\dfrac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A^{2}}\right] & -E\left[\dfrac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A\,\partial\sigma^{2}}\right] \\[2ex] -E\left[\dfrac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A\,\partial\sigma^{2}}\right] & -E\left[\dfrac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial (\sigma^{2})^{2}}\right] \end{bmatrix}$$
The likelihood function is

$$p(\mathbf{x};\boldsymbol{\theta}) = \frac{1}{(2\pi\sigma^{2})^{N/2}} \exp\left[-\frac{1}{2\sigma^{2}}\sum_{n=0}^{N-1}\bigl(x[n]-A\bigr)^{2}\right]$$

The log-likelihood function is

$$\ln p(\mathbf{x};\boldsymbol{\theta}) = -\frac{N}{2}\ln(2\pi) - \frac{N}{2}\ln\sigma^{2} - \frac{1}{2\sigma^{2}}\sum_{n=0}^{N-1}\bigl(x[n]-A\bigr)^{2}$$
Differentiating,

$$\frac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A} = \frac{1}{\sigma^{2}}\sum_{n=0}^{N-1}\bigl(x[n]-A\bigr) \;\Rightarrow\; \frac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A^{2}} = -\frac{N}{\sigma^{2}}$$

$$\frac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\sigma^{2}} = -\frac{N}{2\sigma^{2}} + \frac{1}{2\sigma^{4}}\sum_{n=0}^{N-1}\bigl(x[n]-A\bigr)^{2} \;\Rightarrow\; \frac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial (\sigma^{2})^{2}} = \frac{N}{2\sigma^{4}} - \frac{1}{\sigma^{6}}\sum_{n=0}^{N-1}\bigl(x[n]-A\bigr)^{2}$$

$$\frac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A\,\partial\sigma^{2}} = -\frac{1}{\sigma^{4}}\sum_{n=0}^{N-1}\bigl(x[n]-A\bigr)$$
Then we have

$$\mathbf{I}(\boldsymbol{\theta}) = \begin{bmatrix} -E\left[-\dfrac{N}{\sigma^{2}}\right] & -E\left[-\dfrac{1}{\sigma^{4}}\displaystyle\sum_{n=0}^{N-1}(x[n]-A)\right] \\[2ex] -E\left[-\dfrac{1}{\sigma^{4}}\displaystyle\sum_{n=0}^{N-1}(x[n]-A)\right] & -E\left[\dfrac{N}{2\sigma^{4}} - \dfrac{1}{\sigma^{6}}\displaystyle\sum_{n=0}^{N-1}(x[n]-A)^{2}\right] \end{bmatrix} = \begin{bmatrix} \dfrac{N}{\sigma^{2}} & 0 \\[1ex] 0 & \dfrac{N}{2\sigma^{4}} \end{bmatrix}$$

and

$$\mathbf{I}^{-1}(\boldsymbol{\theta}) = \begin{bmatrix} \dfrac{\sigma^{2}}{N} & 0 \\[1ex] 0 & \dfrac{2\sigma^{4}}{N} \end{bmatrix}$$
Although not true in general, for this example the Fisher information matrix is diagonal and hence easy to invert:

$$\operatorname{var}(\hat{A}) \ge \frac{\sigma^{2}}{N}, \qquad \operatorname{var}(\widehat{\sigma^{2}}) \ge \frac{2\sigma^{4}}{N}$$
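The bound $\operatorname{var}(\hat{A}) \ge \sigma^2/N$ can be checked by simulation, since the sample mean is the efficient estimator here. A minimal Monte Carlo sketch (parameter values chosen for illustration; NumPy assumed):

```python
import numpy as np

# Monte Carlo sketch: for DC in WGN, the sample mean attains the CRLB
#   var(A_hat) >= sigma^2 / N  (the [1,1] element of I^{-1}(theta)).
rng = np.random.default_rng(0)
A, sigma2, N, trials = 1.0, 2.0, 100, 20_000   # arbitrary example values
x = A + rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
A_hat = x.mean(axis=1)        # unbiased, efficient estimator of A
emp_var = A_hat.var()         # empirical variance over the trials
crlb = sigma2 / N             # theoretical bound
```

The empirical variance `emp_var` should match `crlb` to within Monte Carlo error.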
Example: Line Fitting
Consider the problem of line fitting given the observations
𝑥[𝑛] = 𝐴 + 𝐵𝑛 + 𝑤[𝑛], 𝑛 = 0,1, ⋯ , 𝑁 − 1
where 𝑤[𝑛] is WGN. We wish to estimate the slope 𝐵 and the intercept 𝐴. The parameter
vector is 𝜽 = [𝐴 𝐵 ]𝑇 . The likelihood function is
$$p(\mathbf{x};\boldsymbol{\theta}) = \frac{1}{(2\pi\sigma^{2})^{N/2}} \exp\left\{-\frac{1}{2\sigma^{2}}\sum_{n=0}^{N-1}\Bigl(x[n] - \underbrace{(A + Bn)}_{s[n;\boldsymbol{\theta}]}\Bigr)^{2}\right\}$$

so that, for example,

$$\frac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A\,\partial B} = -\frac{1}{\sigma^{2}}\sum_{n=0}^{N-1} n = -\frac{1}{\sigma^{2}}\,\frac{N(N-1)}{2}$$
And ($\boldsymbol{\theta} = [A \;\; B]^T$)

$$\mathbf{I}^{-1}(\boldsymbol{\theta}) = \frac{\sigma^{2}}{N}\begin{bmatrix} \dfrac{2(2N-1)}{N+1} & -\dfrac{6}{N+1} \\[1ex] -\dfrac{6}{N+1} & \dfrac{12}{N^{2}-1} \end{bmatrix}$$

therefore, the CRLB is

$$\operatorname{var}(\hat{A}) \ge \frac{2(2N-1)\,\sigma^{2}}{N(N+1)}, \qquad \operatorname{var}(\hat{B}) \ge \frac{12\,\sigma^{2}}{N(N^{2}-1)}$$
Although not obvious, we can actually find the efficient MVU estimators from

$$\frac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\boldsymbol{\theta}} = \underbrace{\frac{1}{\sigma^{2}}\begin{bmatrix} N & \dfrac{N(N-1)}{2} \\[1ex] \dfrac{N(N-1)}{2} & \dfrac{N(N-1)(2N-1)}{6} \end{bmatrix}}_{\mathbf{I}(\boldsymbol{\theta})}\underbrace{\begin{bmatrix} \hat{A} - A \\ \hat{B} - B \end{bmatrix}}_{\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}}$$

where the left-hand side is

$$\frac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\boldsymbol{\theta}} = \begin{bmatrix} \dfrac{1}{\sigma^{2}}\displaystyle\sum_{n=0}^{N-1}\bigl(x[n] - A - Bn\bigr) \\[2ex] \dfrac{1}{\sigma^{2}}\displaystyle\sum_{n=0}^{N-1} n\bigl(x[n] - A - Bn\bigr) \end{bmatrix}$$
Therefore,

$$\hat{A} = \frac{2(2N-1)}{N(N+1)}\sum_{n=0}^{N-1}x[n] - \frac{6}{N(N+1)}\sum_{n=0}^{N-1}n\,x[n]$$

$$\hat{B} = -\frac{6}{N(N+1)}\sum_{n=0}^{N-1}x[n] + \frac{12}{N(N^{2}-1)}\sum_{n=0}^{N-1}n\,x[n]$$
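The closed-form estimators above are easy to implement directly. A minimal sketch (the function name is my own); since both estimators are unbiased linear functions of the data, they recover $A$ and $B$ exactly on noiseless data, which gives a simple check:

```python
import numpy as np

# Closed-form MVU estimators for line fitting x[n] = A + B*n + w[n]
# (line_fit_mvu is a hypothetical helper implementing the formulas above).
def line_fit_mvu(x):
    N = len(x)
    n = np.arange(N)
    A_hat = (2*(2*N - 1) / (N*(N + 1))) * x.sum() \
            - (6 / (N*(N + 1))) * (n * x).sum()
    B_hat = (-6 / (N*(N + 1))) * x.sum() \
            + (12 / (N*(N**2 - 1))) * (n * x).sum()
    return A_hat, B_hat

# sanity check: on noiseless data the unbiased estimators are exact
n = np.arange(20)
A_hat, B_hat = line_fit_mvu(2.0 + 0.5 * n)   # true A = 2.0, B = 0.5
```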
For transformations $\boldsymbol{\alpha} = \mathbf{g}(\boldsymbol{\theta})$ of a vector parameter, the CRLB is

$$\mathbf{C}_{\hat{\boldsymbol{\alpha}}} - \frac{\partial \mathbf{g}(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\,\mathbf{I}^{-1}(\boldsymbol{\theta})\left[\frac{\partial \mathbf{g}(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\right]^{T} \succeq \mathbf{0}$$

where $\dfrac{\partial \mathbf{g}(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}$ is the $r \times p$ Jacobian matrix defined as $\left[\dfrac{\partial \mathbf{g}(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\right]_{ij} = \dfrac{\partial [\mathbf{g}(\boldsymbol{\theta})]_i}{\partial\theta_j}$. For example, for the SNR $\alpha = g(\boldsymbol{\theta}) = A^2/\sigma^2$ of DC in WGN with unknown noise variance, the Jacobian is $[\,2A/\sigma^2 \;\; -A^2/\sigma^4\,]$, which gives

$$\operatorname{var}(\hat{\alpha}) \ge \frac{4\alpha + 2\alpha^{2}}{N}$$
Consider the general Gaussian case $\mathbf{x} \sim \mathcal{N}\bigl(\boldsymbol{\mu}(\boldsymbol{\theta}), \mathbf{C}(\boldsymbol{\theta})\bigr)$, so that both the mean and the covariance may depend on $\boldsymbol{\theta}$. Then, the Fisher information matrix is given by

$$[\mathbf{I}(\boldsymbol{\theta})]_{ij} = \left[\frac{\partial\boldsymbol{\mu}(\boldsymbol{\theta})}{\partial\theta_i}\right]^{T}\mathbf{C}^{-1}(\boldsymbol{\theta})\left[\frac{\partial\boldsymbol{\mu}(\boldsymbol{\theta})}{\partial\theta_j}\right] + \frac{1}{2}\operatorname{tr}\left[\mathbf{C}^{-1}(\boldsymbol{\theta})\frac{\partial\mathbf{C}(\boldsymbol{\theta})}{\partial\theta_i}\,\mathbf{C}^{-1}(\boldsymbol{\theta})\frac{\partial\mathbf{C}(\boldsymbol{\theta})}{\partial\theta_j}\right]$$
where

$$\frac{\partial\boldsymbol{\mu}(\boldsymbol{\theta})}{\partial\theta_i} = \begin{bmatrix} \dfrac{\partial[\boldsymbol{\mu}(\boldsymbol{\theta})]_1}{\partial\theta_i} \\[1ex] \dfrac{\partial[\boldsymbol{\mu}(\boldsymbol{\theta})]_2}{\partial\theta_i} \\ \vdots \\ \dfrac{\partial[\boldsymbol{\mu}(\boldsymbol{\theta})]_N}{\partial\theta_i} \end{bmatrix}$$

and

$$\frac{\partial\mathbf{C}(\boldsymbol{\theta})}{\partial\theta_i} = \begin{bmatrix} \dfrac{\partial[\mathbf{C}(\boldsymbol{\theta})]_{11}}{\partial\theta_i} & \dfrac{\partial[\mathbf{C}(\boldsymbol{\theta})]_{12}}{\partial\theta_i} & \cdots & \dfrac{\partial[\mathbf{C}(\boldsymbol{\theta})]_{1N}}{\partial\theta_i} \\[1ex] \dfrac{\partial[\mathbf{C}(\boldsymbol{\theta})]_{21}}{\partial\theta_i} & \dfrac{\partial[\mathbf{C}(\boldsymbol{\theta})]_{22}}{\partial\theta_i} & \cdots & \dfrac{\partial[\mathbf{C}(\boldsymbol{\theta})]_{2N}}{\partial\theta_i} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial[\mathbf{C}(\boldsymbol{\theta})]_{N1}}{\partial\theta_i} & \dfrac{\partial[\mathbf{C}(\boldsymbol{\theta})]_{N2}}{\partial\theta_i} & \cdots & \dfrac{\partial[\mathbf{C}(\boldsymbol{\theta})]_{NN}}{\partial\theta_i} \end{bmatrix}$$
For a scalar parameter, $\mathbf{x} \sim \mathcal{N}\bigl(\boldsymbol{\mu}(\theta), \mathbf{C}(\theta)\bigr)$, this reduces to

$$I(\theta) = \left[\frac{\partial\boldsymbol{\mu}(\theta)}{\partial\theta}\right]^{T}\mathbf{C}^{-1}(\theta)\left[\frac{\partial\boldsymbol{\mu}(\theta)}{\partial\theta}\right] + \frac{1}{2}\operatorname{tr}\left[\left(\mathbf{C}^{-1}(\theta)\frac{\partial\mathbf{C}(\theta)}{\partial\theta}\right)^{2}\right]$$
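The general Gaussian FIM formula translates directly into code. Below is a minimal sketch (the function name is my own) that evaluates $[\mathbf{I}(\boldsymbol{\theta})]_{ij}$ from the mean and covariance derivatives, checked against the DC-in-WGN example with $\boldsymbol{\theta} = [A, \sigma^2]^T$, where the formula should reproduce $\mathbf{I} = \operatorname{diag}(N/\sigma^2,\; N/(2\sigma^4))$:

```python
import numpy as np

# Sketch of the general Gaussian Fisher information matrix
#   [I]_{ij} = dmu_i^T C^{-1} dmu_j + 0.5 tr(C^{-1} dC_i C^{-1} dC_j)
# (gaussian_fim is a hypothetical helper).
def gaussian_fim(dmu, dC, C):
    Cinv = np.linalg.inv(C)
    p = len(dmu)
    I = np.zeros((p, p))
    for i in range(p):
        for j in range(p):
            I[i, j] = dmu[i] @ Cinv @ dmu[j] \
                      + 0.5 * np.trace(Cinv @ dC[i] @ Cinv @ dC[j])
    return I

# DC in WGN with unknown variance: mu = A*1, C = sigma2*I
N, sigma2 = 10, 2.0
C = sigma2 * np.eye(N)
dmu = [np.ones(N), np.zeros(N)]        # d mu / dA,      d mu / d sigma2
dC  = [np.zeros((N, N)), np.eye(N)]    # d C  / dA,      d C  / d sigma2
I = gaussian_fim(dmu, dC, C)
# expected: I = diag(N/sigma2, N/(2*sigma2^2)) = diag(5.0, 1.25)
```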
In summary, the Fisher information matrix may be computed from either

1)
$$[\mathbf{I}(\boldsymbol{\theta})]_{ij} = -E\left[\frac{\partial^{2}\ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_i\,\partial\theta_j}\right]$$
2)
$$[\mathbf{I}(\boldsymbol{\theta})]_{ij} = E\left[\frac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_i}\,\frac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_j}\right]$$

For a deterministic signal in WGN,

$$[\mathbf{I}(\boldsymbol{\theta})]_{ij} = \frac{1}{\sigma^{2}}\sum_{n=0}^{N-1}\frac{\partial s[n;\boldsymbol{\theta}]}{\partial\theta_i}\,\frac{\partial s[n;\boldsymbol{\theta}]}{\partial\theta_j}$$

For a WSS Gaussian process with PSD $P_{xx}(f;\boldsymbol{\theta})$, asymptotically

$$[\mathbf{I}(\boldsymbol{\theta})]_{ij} = \frac{N}{2}\int_{-1/2}^{1/2}\frac{\partial \ln P_{xx}(f;\boldsymbol{\theta})}{\partial\theta_i}\,\frac{\partial \ln P_{xx}(f;\boldsymbol{\theta})}{\partial\theta_j}\,df\,.$$
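The asymptotic (PSD-based) form can be checked numerically on a case where the exact answer is known. A minimal sketch, assuming white noise with PSD $P_{xx}(f;\theta) = \theta$ (so $\theta$ is the noise power): then $\partial \ln P_{xx}/\partial\theta = 1/\theta$, and the formula gives $I(\theta) = N/(2\theta^2)$, which is the exact Fisher information for the variance of WGN derived earlier:

```python
import numpy as np

# Numerical sketch of the asymptotic FIM formula for a WSS Gaussian process.
# White noise: P_xx(f; theta) = theta  =>  d ln P_xx / d theta = 1/theta,
# so I(theta) = (N/2) * integral_{-1/2}^{1/2} (1/theta)^2 df = N / (2 theta^2).
N, theta = 100, 2.0                         # arbitrary example values
f = np.linspace(-0.5, 0.5, 1001)
dlnP = np.full_like(f, 1.0 / theta)         # d ln P_xx(f; theta) / d theta
# integrand is constant, so averaging over the unit-length band is exact
I = (N / 2) * np.mean(dlnP * dlnP) * 1.0    # 1.0 = width of [-1/2, 1/2]
# expected: N / (2 * theta^2) = 100 / 8 = 12.5
```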
1. The approximation is valid only as $N \to \infty$, i.e., asymptotically.
2. It is assumed that the data record length is much greater than the correlation time of the process.
   o Correlation time: the maximum lag $k$ of the autocorrelation function $r_{xx}[k] = E\bigl[x[n]\,x[n-k]\bigr]$ beyond which the autocorrelation function is essentially zero.
3. It is assumed that the mean of $x[n]$ is zero.