
Solutions to Steven Kay's Statistical Estimation Book
Satish Bysany
Aalto University School of Electrical Engineering
March 1, 2011
1 Introduction
This is a set of notes describing solutions to Steven Kay's book Fundamentals of Statistical Signal Processing: Estimation Theory. A brief review of notation is in order.
1.1 Notation
$I$ is the identity matrix.
$0$ represents a matrix or vector of all zeros.
$e$ is a column vector of all ones.
$J$ is the exchange matrix, with 1s on the anti-diagonal and 0s elsewhere.
$e_j$ is a column vector whose $j$-th element is 1 and whose remaining elements are all 0.
$a \cdot b \doteq a^H b$ is the dot product of $a$ and $b$.
$\frac{\partial}{\partial t} f(t)$, the derivative of a scalar function $f(t)$ depending on an $M \times 1$ real vector parameter $t$, is defined by
\[
\frac{\partial}{\partial t} f(t)
= \left[\frac{\partial f(t)}{\partial t_1},\; \frac{\partial f(t)}{\partial t_2},\; \ldots,\; \frac{\partial f(t)}{\partial t_M}\right]^T.
\]
$\frac{\partial}{\partial t} h(t)$ is the derivative of an $M \times 1$ real vector function $h(t)$ depending on a scalar value $t$:
\[
\frac{\partial}{\partial t} h(t)
= \left[\frac{\partial h_1(t)}{\partial t},\; \frac{\partial h_2(t)}{\partial t},\; \ldots,\; \frac{\partial h_M(t)}{\partial t}\right]^T.
\]
2 Chapter 2
Solutions to Problems in Chapter 2
2.1 Problem 2.1
The data $x = \{x[0], x[1], \ldots, x[N-1]\}$ are observed, where the $x[n]$'s are i.i.d. as $\mathcal{N}(0, \sigma^2)$. We wish to estimate the variance $\sigma^2$ as
\[
\hat{\sigma}^2 = \frac{1}{N} \sum_{n=0}^{N-1} x^2[n]. \qquad (1)
\]
Solution. From the problem definition it follows that, for all $n$,
\[
\mu = E(x[n]) = 0, \qquad
\sigma^2 = E\left[(x[n] - \mu)^2\right] = E\left[x^2[n]\right].
\]
Now apply the $E(\cdot)$ operator to both sides of Eq. (1), using the fact that, for any two random variables $X$ and $Y$, $E(X + Y) = E(X) + E(Y)$:
\[
E\left(\hat{\sigma}^2\right)
= \frac{1}{N} \sum_{n=0}^{N-1} E\left[x^2[n]\right]
= \frac{1}{N} \sum_{n=0}^{N-1} \sigma^2
= \frac{N \sigma^2}{N}
= \sigma^2. \qquad (2)
\]
Hence the estimator (1) is unbiased. Note that this result holds even if the $x[n]$'s are not independent!
Next, apply the variance operator $\operatorname{var}(\cdot)$ to both sides of Eq. (1), using the fact that, for independent random variables $X$ and $Y$, $\operatorname{var}(aX + bY) = a^2 \operatorname{var}(X) + b^2 \operatorname{var}(Y)$:
\[
\operatorname{var}\left(\hat{\sigma}^2\right)
= \frac{1}{N^2} \sum_{n=0}^{N-1} \operatorname{var}\left(x^2[n]\right). \qquad (3)
\]
Let $X \sim \mathcal{N}(0, 1)$ be a normal random variable with zero mean and unit variance. Then, by definition, $Y = X^2 \sim \chi^2_1$ is chi-square distributed with 1 degree of freedom. We know that $\operatorname{mean}\left(\chi^2_n\right) = n$ and $\operatorname{var}\left(\chi^2_n\right) = 2n$, so $\operatorname{var}(Y) = \operatorname{var}(X^2) = 2 \cdot 1 = 2$.
Introducing $Z = \sigma X$ implies that $\operatorname{var}(Z) = \sigma^2 \operatorname{var}(X) = \sigma^2$. Since $E(Z) = \sigma E(X) = 0$, we conclude $Z \sim \mathcal{N}(0, \sigma^2)$.
Now consider $\operatorname{var}(Z^2) = \operatorname{var}(\sigma^2 X^2) = \sigma^4 \operatorname{var}(X^2) = 2\sigma^4$. Since each $x[n] \sim \mathcal{N}(0, \sigma^2)$, we have
\[
\operatorname{var}(x^2[0]) = \operatorname{var}(x^2[1]) = \cdots = \operatorname{var}(x^2[N-1]) = 2\sigma^4.
\]
Hence, Eq. (3) simplifies to
\[
\operatorname{var}\left(\hat{\sigma}^2\right)
= \frac{1}{N^2} \sum_{n=0}^{N-1} \left(2\sigma^4\right)
= \frac{2\sigma^4 N}{N^2}
= \frac{2\sigma^4}{N}. \qquad (4)
\]
As $N \to \infty$, $\operatorname{var}\left(\hat{\sigma}^2\right) \to 0$.
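As a quick sanity check of (2) and (4), the following is a minimal Monte Carlo sketch (the values of $\sigma$, $N$, and the number of trials are arbitrary assumptions, not taken from the problem):

```python
import numpy as np

# Monte Carlo check of Problem 2.1: the estimator (1/N) sum_n x^2[n] should have
# mean close to sigma^2 (unbiased) and variance close to 2 sigma^4 / N.
rng = np.random.default_rng(0)
sigma, N, trials = 2.0, 100, 100_000

x = rng.normal(0.0, sigma, size=(trials, N))
sigma2_hat = np.mean(x**2, axis=1)              # one estimate per trial

print("mean    :", sigma2_hat.mean(), "expected", sigma**2)
print("variance:", sigma2_hat.var(),  "expected", 2 * sigma**4 / N)
```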
2.2 Problem 2.5
Two samples $\{x[0], x[1]\}$ are independently observed from a $\mathcal{N}(0, \sigma^2)$ distribution. The estimator
\[
\hat{\sigma}^2 = \frac{1}{2}\left(x^2[0] + x^2[1]\right) \qquad (5)
\]
is unbiased. Find the PDF of $\hat{\sigma}^2$ to determine if it is symmetric about $\sigma^2$.
Solution. Consider two standard normal random variables $X_0$ and $X_1$, that is, $X_i \sim \mathcal{N}(0, 1)$, $i = 0, 1$. Then, by definition, $X = X_0^2 + X_1^2$ is $\chi^2(n)$-distributed with $n = 2$ degrees of freedom. Its PDF is
\[
f_X(x) = \frac{1}{2} e^{-x/2}, \qquad x > 0.
\]
Let $x[0] = \sigma X_0$ and $x[1] = \sigma X_1$. Then
\[
x^2[0] + x^2[1] = \sigma^2\left(X_0^2 + X_1^2\right) = \sigma^2 X
\quad\Longrightarrow\quad
\hat{\sigma}^2 = \frac{\sigma^2}{2} X \quad \text{from Eq. (5)}.
\]
We know that, for two continuous random variables $X$ and $Y$ related as $Y = aX + b$,
\[
f_Y(y) = \frac{1}{|a|}\, f_X\!\left(\frac{y - b}{a}\right).
\]
Taking $a = \sigma^2/2$, $b = 0$, and writing $\theta = \sigma^2$, the PDF of $\hat{\sigma}^2$ is
\[
f_{\hat{\sigma}^2}(y; \theta)
= \frac{1}{a}\, f_X\!\left(\frac{y}{a}\right)
= \frac{2}{\sigma^2}\left(\frac{1}{2} e^{-y/(2a)}\right)
= \frac{1}{\sigma^2}\, e^{-y/\sigma^2}
= \frac{1}{\theta}\, e^{-y/\theta}, \qquad y > 0.
\]
It is obvious that $f_{\hat{\sigma}^2}(\sigma^2 + u; \theta) \neq f_{\hat{\sigma}^2}(\sigma^2 - u; \theta)$ for $u > 0$, so the PDF is not symmetric about $\theta = \sigma^2$. Note carefully that $E(\hat{\sigma}^2) = \sigma^2$ (the estimator is unbiased), even though its PDF is not symmetric about $\sigma^2$.
3 Chapter 3: CRLB
3.1 Formulas
Let a random variable $X$ depend on some parameter $t$. We write the PDF of $X$ as $f_X(x; t)$; it represents a family of PDFs, one for each value of $t$. When the PDF is viewed as a function of $t$ for a given, fixed value of $x$, it is termed the likelihood function. We define the log-likelihood function as
\[
L(t) \doteq L_X(t \mid x) \doteq \ln f_X(x; t). \qquad (6)
\]
Note that $t$ is a deterministic but unknown parameter. We simply write $L(t)$ when the random variable $X$ is known from context. For notational convenience, we define
\[
\dot{L} = \frac{\partial}{\partial t} L(t) = \frac{\partial}{\partial t} \ln f_X(x; t)
= \frac{1}{f_X(x; t)} \frac{\partial}{\partial t} f_X(x; t), \qquad (7)
\]
\[
\ddot{L} = \frac{\partial^2}{\partial t^2} L(t) = \frac{\partial^2}{\partial t^2} \ln f_X(x; t). \qquad (8)
\]
Taking the expectation with respect to $X$, if the regularity condition
\[
E(\dot{L}) = 0 \qquad (9)
\]
is satisfied, then there exists a lower bound on the variance of an unbiased estimator $\hat{t}$:
\[
\operatorname{var}(\hat{t}\,) \geq \frac{1}{-E(\ddot{L})}. \qquad (10)
\]
Furthermore, equality holds for all $t$,
\[
\operatorname{var}(\hat{t}\,) = \frac{1}{-E(\ddot{L})}
\quad\Longleftrightarrow\quad
\dot{L} = g(t)\left(h(x) - t\right) \ \text{ with } \ \hat{t} = h(x), \qquad (11)
\]
where $g(\cdot)$ and $h(\cdot)$ are some functions. Note that the above applies only to unbiased estimators, so $E(\hat{t}\,) = t = E[h(x)]$. The minimum variance is also given by
\[
\operatorname{var}(\hat{t}\,) = \frac{1}{-E(\ddot{L})} = \frac{1}{g(t)}
\quad\Longrightarrow\quad
g(t) = -E(\ddot{L}). \qquad (12)
\]
Note: $\hat{t}$ is an estimate of $t$. Hence, $\hat{t}$ cannot depend on $t$ itself (if it does, such an estimate is useless!). So the result $\hat{t} = h(x)$ intuitively makes sense, because $\hat{t}$ depends only on the observed, given data $x$ and not at all on $t$. But the mean and variance of $\hat{t}$ generally do depend on $t$, and that is OK! For the MVUE case, the mean $E(\hat{t}\,) = t$ and the variance $\operatorname{var}(\hat{t}\,) = 1/g(t)$ are both purely functions of $t$ alone.
Replacing the scalar random variable $X$ by a vector of random variables $x$, the results still hold.
Facts:
If the regularity condition is satisfied, then the identity
\[
E\left(\dot{L}^2\right) = -E\left(\ddot{L}\right)
\]
holds. The Fisher information $I(t)$ for the data $x$ is defined by
\[
I(t) = -E\left(\ddot{L}\right).
\]
So the minimum variance is the reciprocal of the Fisher information: the more information, the lower the CRLB.
For a deterministic signal $s[n; t]$ with an unknown parameter $t$ in zero-mean AWGN $w[n] \sim \mathcal{N}(0, \sigma^2)$,
\[
x[n] = s[n; t] + w[n], \qquad n = 0, 1, \ldots, N-1,
\]
the minimum variance (the CRLB, if it exists) is given by
\[
\operatorname{var}(\hat{t}\,) \geq \frac{\sigma^2}{\sum_{n=0}^{N-1} \left(\dfrac{\partial s[n; t]}{\partial t}\right)^2}.
\]
For an estimate $\hat{t}$ of $t$, if the CRLB is known, then any transformation $\alpha = g(t)$ for some function $g(\cdot)$ has the new CRLB
\[
\mathrm{CRLB}_{\alpha} = \mathrm{CRLB}_{t}\left(\frac{\partial g(t)}{\partial t}\right)^2.
\]
The CRLB always increases as we estimate more parameters from the same given data.
Let $\theta = [\theta_1, \theta_2, \ldots, \theta_M]^T$ be a vector parameter. Assume that an estimator $\hat{\theta} = [\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_M]^T$ is unbiased, that is,
\[
E(\hat{\theta}) = \theta \quad\Longleftrightarrow\quad E(\hat{\theta}_i) = \theta_i.
\]
The $M \times M$ Fisher information matrix $I(\theta)$ is the matrix whose $(i,j)$-th element is given by
\[
[I(\theta)]_{i,j} = -E\left[\frac{\partial^2 \ln p(x; \theta)}{\partial \theta_i\, \partial \theta_j}\right].
\]
Note that $p(x; \theta)$ is a scalar function depending on the vectors $x$ and $\theta$. For example, if $w[n]$ is i.i.d. $\mathcal{N}(0, \sigma^2)$ and $x[n] = \theta_1 + n\theta_2 + w[n]$ for $n = 0, 1, \ldots, N-1$, then
\[
p(x; \theta) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2} \sum_{n=0}^{N-1} \left(x[n] - \theta_1 - n\theta_2\right)^2\right).
\]
Say $x = [1, 2, 5, 3]^T$, $\theta = [1, 2]^T$, and $\sigma = 2$; then $p(x; \theta) \approx 1.89 \times 10^{-4}$.
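This example value is easy to verify numerically; the following one-off sketch simply evaluates the Gaussian likelihood above for the stated $x$, $\theta$, and $\sigma$:

```python
import numpy as np

# Evaluate p(x; theta) for x = [1, 2, 5, 3], theta = [1, 2], sigma = 2, n = 0..N-1.
x = np.array([1.0, 2.0, 5.0, 3.0])
theta1, theta2, sigma = 1.0, 2.0, 2.0
n = np.arange(len(x))

resid = x - (theta1 + n * theta2)
p = np.exp(-np.sum(resid**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2) ** (len(x) / 2)
print(p)    # approximately 1.89e-4
```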
Note: The Fisher matrix is symmetric because the partial derivatives do not depend on the order of evaluation. If the regularity condition
\[
E\left[\frac{\partial \ln p(x; \theta)}{\partial \theta}\right] = 0
\]
is satisfied (where the expectation is taken w.r.t. $p(x; \theta)$), then the covariance matrix of any unbiased estimator $\hat{\theta}$ satisfies
\[
C_{\hat{\theta}} - I^{-1}(\theta) \succeq 0
\quad\Longrightarrow\quad
\operatorname{var}(\hat{\theta}_i) \geq \left[I^{-1}(\theta)\right]_{i,i}.
\]
Note: $\left[I^{-1}(\theta)\right]_{i,i}$ means that you first compute the whole matrix inverse and then take its $(i,i)$-th element. The covariance matrix of any random vector $y$ is given by
\[
\mu_y = E(y), \qquad
C_y = E\left[(y - \mu_y)(y - \mu_y)^T\right].
\]
Furthermore, an estimator attains the lower bound,
\[
C_{\hat{\theta}} = I^{-1}(\theta),
\]
if and only if
\[
\frac{\partial \ln p(x; \theta)}{\partial \theta} = I(\theta)\left(g(x) - \theta\right)
\]
for some $M$-dimensional function $g$ and some $M \times M$ matrix $I$. That estimator, which is the MVUE, is $\hat{\theta} = g(x)$, and its covariance matrix is $I^{-1}(\theta)$.
3.2 Problem 3.1
If $x[n]$ for $n = 0, 1, \ldots, N-1$ are i.i.d. according to $\mathcal{U}(0, \theta)$, show that the regularity condition does not hold. That is, show that
\[
E\left[\frac{\partial \ln p(x; \theta)}{\partial \theta}\right] \neq 0 \qquad \forall\, \theta > 0.
\]
Solution. By definition of the expectation operator,
\[
E\left[\frac{\partial \ln p(x; \theta)}{\partial \theta}\right]
= \int \left[\frac{\partial \ln p(x; \theta)}{\partial \theta}\right] p(x; \theta)\, dx
= \int \frac{\partial p(x; \theta)}{\partial \theta}\, dx, \qquad (13)
\]
which follows from Eq. (7). Denote the $N$ random variables as $x_i = x[i-1]$ for $i = 1, 2, \ldots, N$. It is given in the problem that their PDFs are identical:
\[
p(x_i; \theta) = \begin{cases} 1/\theta & 0 < x_i \leq \theta \\ 0 & \text{otherwise,} \end{cases}
\qquad\text{and}\qquad
\int_0^{\theta} p(x_i; \theta)\, dx_i = 1.
\]
If we could interchange the order of differentiation and integration in Eq. (13), the multiple integral would factor into a product of one-dimensional integrals,
\[
\int p(x; \theta)\, dx
= \left[\int_0^{\theta} p(x_1; \theta)\, dx_1\right] \cdots \left[\int_0^{\theta} p(x_N; \theta)\, dx_N\right]
= 1,
\]
because the $x_i$'s are independent, and differentiating the right-hand side would give zero. But the limits of the integrals depend on $\theta$, so we cannot interchange the order of differentiation and integration:
\[
\int_0^{\theta} \frac{\partial p(x_i; \theta)}{\partial \theta}\, dx_i
\;\neq\;
\frac{\partial}{\partial \theta} \int_0^{\theta} p(x_i; \theta)\, dx_i.
\]
Hence, the regularity condition fails to hold. In fact, the LHS equals $-1/\theta$, while the RHS equals $0$!
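For contrast, here is a small numerical sketch (with assumed values of $N$ and $\theta$) showing that the regularity condition holds for a Gaussian mean but not for $\mathcal{U}(0, \theta)$:

```python
import numpy as np

# Regularity condition E[d/dtheta ln p(x; theta)] for N i.i.d. samples.
rng = np.random.default_rng(0)
N, trials, theta = 5, 200_000, 1.5

# Gaussian N(theta, 1): the score is sum(x - theta); its Monte Carlo mean is ~0.
x = rng.normal(theta, 1.0, size=(trials, N))
print("Gaussian score mean:", np.mean(np.sum(x - theta, axis=1)))

# Uniform U(0, theta): on the support, ln p(x; theta) = -N ln(theta), so the
# score equals -N/theta for every realization; its expectation is -N/theta != 0.
print("Uniform score mean :", -N / theta)
```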
3.3 Problem 3.3
The data $x[n] = A r^n + w[n]$ for $n = 0, 1, \ldots, N-1$ are observed, where $w[n]$ is WGN with variance $\sigma^2$ and $r > 0$ is known. Find the CRLB of $A$. Show that an efficient estimator exists and find its variance.
Solution. Since the $x[n]$'s are statistically independent, the joint PDF is
\[
p(x; A) = \prod_{n=0}^{N-1} \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left(-\frac{1}{2\sigma^2}\left(x[n] - A r^n\right)^2\right)
= \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2} \sum_{n=0}^{N-1} \left(x[n] - A r^n\right)^2\right)
\]
\[
\Longrightarrow\quad
\ln p(x; A) = \ln\left[(2\pi\sigma^2)^{-N/2}\right] - \frac{1}{2\sigma^2} \sum_{n=0}^{N-1} \left(x[n] - A r^n\right)^2
\]
\[
\Longrightarrow\quad
\frac{\partial}{\partial A} \ln p(x; A) = \frac{1}{\sigma^2} \sum_{n=0}^{N-1} r^n \left(x[n] - A r^n\right).
\]
Since the sum
\[
S = \sum_{n=0}^{N-1} r^{2n}
= \begin{cases} \dfrac{r^{2N} - 1}{r^2 - 1} & r \neq 1 \\[2ex] N & r = 1 \end{cases}
\]
is deterministic and known (because both $r$ and $N$ are known), the above equation simplifies to
\[
\frac{\partial}{\partial A} \ln p(x; A) = \frac{1}{\sigma^2}\left[\sum_{n=0}^{N-1} r^n x[n] - A S\right] \qquad (14)
\]
\[
\dot{L} = \frac{S}{\sigma^2}\left[\sum_{n=0}^{N-1} \frac{r^n}{S} x[n] - A\right] \qquad (15)
\]
\[
\phantom{\dot{L}} = g(A)\left(h(x) - A\right), \qquad (16)
\]
where $g(A) = S/\sigma^2$ is a constant (it does not even depend on $A$!) and
\[
h(x) = \sum_{n=0}^{N-1} \frac{r^n}{S}\, x[n]
\]
depends on $x$ but not on $A$. Hence, from Theorem 3.1, the MVUE estimate $\hat{A}$ is
\[
\hat{A} = h(x) = \frac{1}{S} \sum_{n=0}^{N-1} r^n x[n],
\]
and its variance attains the bound:
\[
\operatorname{var}(\hat{A}) = \mathrm{CRLB} = \frac{1}{g(A)} = \frac{\sigma^2}{S}.
\]
We can also find the second derivative from Eq. (14):
\[
\ddot{L} = \frac{\partial^2}{\partial A^2} \ln p(x; A) = \frac{S}{\sigma^2}(0 - 1) = -\frac{S}{\sigma^2},
\]
and, as required, $\mathrm{CRLB} = -1/E[\ddot{L}] = \sigma^2/S$; in our case $E[\ddot{L}] = \ddot{L}$ because it is constant (does not depend on $x$ or $A$).
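A brief simulation sketch of this result (the values of $A$, $r$, $\sigma$, and $N$ are arbitrary assumptions) confirms that $\hat{A}$ is unbiased with variance close to $\sigma^2/S$:

```python
import numpy as np

# Problem 3.3: simulate x[n] = A r^n + w[n] and apply A_hat = (1/S) sum r^n x[n].
rng = np.random.default_rng(1)
A, r, sigma, N, trials = 2.0, 0.9, 1.0, 50, 20_000

n = np.arange(N)
S = np.sum(r ** (2 * n))                        # S = sum_n r^(2n)
x = A * r**n + rng.normal(0.0, sigma, size=(trials, N))
A_hat = x @ (r**n) / S                          # MVUE, one estimate per trial

print("mean of A_hat:", A_hat.mean())           # ~ A (unbiased)
print("var  of A_hat:", A_hat.var())            # ~ sigma^2 / S (the CRLB)
print("CRLB         :", sigma**2 / S)
```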
3.4 Problem 3.5
If $x[n] = A + w[n]$ for $n = 1, 2, \ldots, N$ are observed, where $w = [w[1], w[2], \ldots, w[N]]^T \sim \mathcal{N}(0, C)$, find the CRLB for $A$. Does an efficient estimator exist, and if so, what is its variance?
Solution. The joint p.d.f. of $x$ is given by
\[
p(x; A) = \frac{1}{\sqrt{\det(2\pi C)}} \exp\left(-\frac{1}{2}(x - Ae)^T C^{-1} (x - Ae)\right)
\]
\[
\Longrightarrow\quad
\ln p(x; A) = \ln \frac{1}{\sqrt{\det(2\pi C)}} - \frac{1}{2}(x - Ae)^T C^{-1} (x - Ae)
\]
\[
\Longrightarrow\quad
\frac{\partial}{\partial A} \ln p(x; A) = -\frac{1}{2} \frac{\partial}{\partial A}\left[(x - Ae)^T C^{-1} (x - Ae)\right].
\]
Using the result that
\[
\frac{\partial}{\partial A}\left(m^T Q m\right) = 2\left(\frac{\partial m^T}{\partial A}\right) Q m
\]
and setting $Q = C^{-1}$ and $m = (x - Ae)$,
\[
\frac{\partial m^T}{\partial A} = \frac{\partial}{\partial A}(x - Ae)^T = \left(0 - \frac{\partial}{\partial A} A e^T\right) = -e^T.
\]
So
\[
\frac{\partial}{\partial A} \ln p(x; A) = e^T C^{-1}(x - Ae) = \left(e^T C^{-1} x - A\, e^T C^{-1} e\right).
\]
The scalar $e^T Q e$ is nothing but the sum of all the elements of $Q$, for any $Q$. Consider, for example,
\[
[1,\ 1,\ 1]
\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}
\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \qquad (17)
\]
\[
= [a + d + g,\ \ b + e + h,\ \ c + f + i]
\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \qquad (18)
\]
\[
= a + d + g + b + e + h + c + f + i. \qquad (19)
\]
So, denoting $\alpha = e^T C^{-1} e$,
\[
\frac{\partial}{\partial A} \ln p(x; A)
= \left(e^T C^{-1} x - A\, e^T C^{-1} e\right)
= \alpha \left(\frac{e^T C^{-1} x}{\alpha} - A\right).
\]
The above expression is clearly of the form
\[
\frac{\partial}{\partial A} \ln p(x; A) = g(A)\left(h(x) - A\right).
\]
Hence, there exists an MVUE (the efficient estimator), given by
\[
\hat{A} = h(x) = \frac{e^T C^{-1} x}{\alpha} = \frac{e^T C^{-1} x}{e^T C^{-1} e},
\]
and its variance is
\[
\operatorname{var}(\hat{A}) = \frac{1}{\alpha} = \frac{1}{\sum_{i=1}^{N} \sum_{j=1}^{N} \left(C^{-1}\right)_{i,j}}.
\]
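The following is a minimal numerical sketch (the covariance matrix $C$ below is an arbitrary assumed example) of the estimator $\hat{A} = e^T C^{-1} x / (e^T C^{-1} e)$ and its variance $1/(e^T C^{-1} e)$:

```python
import numpy as np

# Problem 3.5: efficient estimator of a DC level in correlated Gaussian noise.
rng = np.random.default_rng(2)
N, A, trials = 4, 1.5, 50_000

B = rng.normal(size=(N, N))
C = B @ B.T + N * np.eye(N)                     # an arbitrary valid covariance

e = np.ones(N)
Cinv = np.linalg.inv(C)
alpha = e @ Cinv @ e                            # e^T C^{-1} e

L = np.linalg.cholesky(C)
w = rng.normal(size=(trials, N)) @ L.T          # noise with covariance C
x = A + w                                       # x = A e + w, one row per trial
A_hat = (x @ Cinv @ e) / alpha

print("mean:", A_hat.mean(), " var:", A_hat.var(), " CRLB:", 1.0 / alpha)
```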
3.5 Problem 3.9
We observe two samples of a DC level in correlated Gaussian noise,
\[
x[0] = A + w[0], \qquad x[1] = A + w[1],
\]
where $w = [w[0], w[1]]^T$ is zero mean with covariance matrix
\[
C = \sigma^2 \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}.
\]
The parameter $\rho$ is the cross-correlation coefficient between $w[0]$ and $w[1]$. Compute the CRLB of $A$ and compare it to the case when $\rho = 0$ (WGN). Also explain what happens when $\rho = 1$.

Solution: This is a special case of Problem 3.5 (see above) for $N = 2$. Since
\[
C^{-1} = \frac{1}{\sigma^2\left(1 - \rho^2\right)} \begin{bmatrix} 1 & -\rho \\ -\rho & 1 \end{bmatrix},
\]
the CRLB is
\[
\operatorname{var}(\hat{A}) = \frac{1}{e^T C^{-1} e}
= \frac{\sigma^2\left(1 - \rho^2\right)}{2(1 - \rho)}
= \frac{\sigma^2(1 + \rho)}{2}.
\]
When $\rho = 0$, $\operatorname{var}(\hat{A}) = \sigma^2/2$, as expected. But when $\rho \to 1$, the matrix $C$ becomes singular, so its inverse does not exist; the samples $w[0]$ and $w[1]$ are almost perfectly correlated, and hence the second sample does not carry any additional information (the CRLB approaches $\sigma^2$, the single-sample bound).
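A few lines of code (an illustrative sketch, with $\sigma = 1$ assumed) confirm that $1/(e^T C^{-1} e)$ matches $\sigma^2(1 + \rho)/2$ and grows toward $\sigma^2$ as $\rho \to 1$:

```python
import numpy as np

# Problem 3.9: CRLB of A as a function of the correlation coefficient rho.
sigma = 1.0
e = np.ones(2)
for rho in [0.0, 0.5, 0.9, 0.999]:
    C = sigma**2 * np.array([[1.0, rho], [rho, 1.0]])
    crlb = 1.0 / (e @ np.linalg.inv(C) @ e)
    print(rho, crlb, sigma**2 * (1 + rho) / 2)   # the last two columns agree
```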
3.6 Problem 3.13
Consider polynomial curve fitting,
\[
x[n] = \sum_{k=0}^{p-1} A_k n^k + w[n]
\]
for $n = 0, 1, \ldots, N-1$, where $w[n]$ is i.i.d. WGN with variance $\sigma^2$. It is desired to estimate $\{A_0, A_1, \ldots, A_{p-1}\}$. Find the Fisher information matrix for this problem.
Solution: The joint p.d.f. is
\[
p(x; A) = \prod_{n=0}^{N-1} \frac{1}{\sqrt{2\pi\sigma^2}}
\exp\left(-\frac{1}{2\sigma^2}\left(x[n] - \sum_{k=0}^{p-1} A_k n^k\right)^2\right)
= \frac{1}{(2\pi\sigma^2)^{N/2}}
\exp\left(-\frac{1}{2\sigma^2} \sum_{n=0}^{N-1}\left(x[n] - \sum_{k=0}^{p-1} A_k n^k\right)^2\right)
\]
\[
\Longrightarrow\quad
\ln p(x; A) = \ln \frac{1}{(2\pi\sigma^2)^{N/2}}
- \frac{1}{2\sigma^2} \sum_{n=0}^{N-1}\left(x[n] - \sum_{k=0}^{p-1} A_k n^k\right)^2
\]
\[
\Longrightarrow\quad
\frac{\partial}{\partial A_i} \ln p(x; A)
= 0 - \frac{1}{2\sigma^2} \sum_{n=0}^{N-1} 2\left(x[n] - \sum_{k=0}^{p-1} A_k n^k\right)\left(0 - n^i\right),
\]
because
\[
\frac{\partial}{\partial A_i} \sum_{k=0}^{p-1} A_k n^k
= \frac{\partial}{\partial A_i}\left(A_0 + A_1 n + A_2 n^2 + \cdots + A_i n^i + \cdots + A_{p-1} n^{p-1}\right)
= \left(0 + 0 + \cdots + \frac{\partial}{\partial A_i} A_i n^i + \cdots + 0\right)
= n^i.
\]
Hence, the simplification
\[
\frac{\partial}{\partial A_i} \ln p(x; A)
= \frac{1}{\sigma^2} \sum_{n=0}^{N-1} n^i \left(x[n] - \sum_{k=0}^{p-1} A_k n^k\right)
\]
\[
\Longrightarrow\quad
\frac{\partial^2}{\partial A_j\, \partial A_i} \ln p(x; A)
= \frac{1}{\sigma^2} \sum_{n=0}^{N-1} n^i \left(0 - n^j\right)
= -\frac{1}{\sigma^2} \sum_{n=0}^{N-1} n^{i+j}.
\]
Hence, by definition, the $(i,j)$-th entry of the $p \times p$ Fisher information matrix $I(A)$ is given by
\[
[I(A)]_{i,j} = -E\left[\frac{\partial^2}{\partial A_i\, \partial A_j} \ln p(x; A)\right]
= \frac{1}{\sigma^2} \sum_{n=0}^{N-1} n^{i+j}
\]
for $i, j = 0, 1, \ldots, p-1$. Note that the Fisher information matrix is symmetric, so the order of evaluation of the partial derivatives can be interchanged. See pg. 42, Eq. (3.22) in the textbook for a special case of the above with $p = 2$. Note that for the $(0,0)$-th entry of the matrix, the above expression gives
\[
\sum_{n=0}^{N-1} n^{i+j} = \sum_{n=0}^{N-1} n^{0+0} = \left(0^0 + 1^0 + \cdots + (N-1)^0\right) = N,
\]
where $0^0$ must be taken as 1 (even though some authors disagree).
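A small sketch (hypothetical values of $N$, $p$, and $\sigma^2$) that builds this Fisher information matrix directly from the formula above:

```python
import numpy as np

# Problem 3.13: [I(A)]_{i,j} = (1/sigma^2) * sum_{n=0}^{N-1} n^(i+j).
def fisher_poly(N: int, p: int, sigma2: float) -> np.ndarray:
    n = np.arange(N)                 # NumPy uses the convention 0**0 = 1 here
    I = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            I[i, j] = np.sum(n ** (i + j), dtype=float) / sigma2
    return I

I = fisher_poly(N=10, p=3, sigma2=1.0)
print(I)                             # symmetric; the (0,0) entry equals N/sigma^2
print(np.diag(np.linalg.inv(I)))     # CRLBs for A_0, A_1, A_2
```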
4 Chapter 5
Neyman-Fisher Factorization Theorem. If we can factor the p.d.f. $p(x; \theta)$ as
\[
p(x; \theta) = g(T(x), \theta)\, h(x),
\]
where $g(\cdot)$ is a function depending on $x$ only through $T(x)$ and $h(\cdot)$ is a function depending only on $x$, then $T(x)$ is a sufficient statistic for $\theta$. The converse is also true.
4.1 Problem 5.2
The IID observations $x_n$ for $n = 1, 2, \ldots, N$ have the Rayleigh p.d.f.
\[
p(x_n; \sigma^2) = \begin{cases} \dfrac{x_n}{\sigma^2} \exp\left(-\dfrac{x_n^2}{2\sigma^2}\right) & x_n > 0 \\[1.5ex] 0 & \text{otherwise.} \end{cases}
\]
Find a sufficient statistic for $\sigma^2$.
Solution. Let $u(t)$ be the unit step function. The joint PDF of $x_1, x_2, \ldots, x_N$ is given by (because they are independent)
\[
p(x; \sigma^2) = \prod_{n=1}^{N} p(x_n; \sigma^2)
= \prod_{n=1}^{N} \frac{x_n}{\sigma^2} \exp\left(-\frac{x_n^2}{2\sigma^2}\right) u(x_n)
= \left[\prod_{n=1}^{N} x_n\, u(x_n)\right]
\left[\frac{1}{\sigma^{2N}} \exp\left(-\frac{1}{2\sigma^2} \sum_{n=1}^{N} x_n^2\right)\right]
= h(x)\, g(T(x), \sigma^2),
\]
whence the sufficient statistic for $\sigma^2$ is
\[
T(x) = \sum_{n=1}^{N} x_n^2.
\]
4.2 Problem 5.5
The IID observations $x_n$ for $n = 1, 2, \ldots, N$ are distributed according to $\mathcal{U}[-\theta, \theta]$, where $\theta > 0$. Find a sufficient statistic for $\theta$.
Solution. The individual sample p.d.f. is given by
\[
p(x_n; \theta) = \begin{cases} 1/(2\theta) & -\theta < x_n < \theta \\ 0 & \text{otherwise.} \end{cases}
\]
The joint p.d.f. is given by
\[
p(x; \theta) = \prod_{n=1}^{N} p(x_n; \theta)
= \begin{cases} 1/(2\theta)^N & -\theta < x_n < \theta \ \ \forall\, n \leq N \\ 0 & \text{otherwise.} \end{cases}
\]
Define a function $\operatorname{bool}(S)$ for any mathematical statement $S$ such that
\[
\operatorname{bool}(S) = \begin{cases} 1 & S \text{ is true} \\ 0 & S \text{ is false.} \end{cases}
\]
(This is also called the indicator function; see Wikipedia.) Then
\[
p(x; \theta) = \frac{1}{(2\theta)^N}\, \operatorname{bool}\left(-\theta < x_n < \theta,\ \forall n\right).
\]
But
\[
x_n < \theta\ \ \forall n
\iff \theta > x_1 \text{ and } \theta > x_2 \text{ and } \cdots \text{ and } \theta > x_N
\iff \theta > \max\{x_1, x_2, \ldots, x_N\}.
\]
Similarly,
\[
-\theta < x_n\ \ \forall n
\iff \theta > -x_n\ \ \forall n
\iff \theta > \max\{-x_1, -x_2, \ldots, -x_N\}.
\]
Combining both of the above,
\[
-\theta < x_n < \theta\ \ \forall n
\iff \left(\theta > \max(x)\right) \text{ and } \left(\theta > \max(-x)\right)
\iff \theta > \max\{|x_1|, |x_2|, \ldots, |x_N|\}.
\]
So the joint p.d.f. becomes
\[
p(x; \theta) = \frac{1}{(2\theta)^N}\, \operatorname{bool}\left(\max\{|x_1|, |x_2|, \ldots, |x_N|\} < \theta\right)
= g(T(x), \theta)\, h(x),
\]
where $h(x) = 1$ and
\[
T(x) = \max\{|x_1|, |x_2|, \ldots, |x_N|\}, \qquad
g(T(x), \theta) = \frac{1}{(2\theta)^N}\, \operatorname{bool}(T(x) < \theta).
\]
Hence, by the Neyman-Fisher factorization theorem, $T(x)$, as given above, is the sufficient statistic. Note: The sample mean is not a sufficient statistic for the uniform distribution!
5 Chapter 7: MLE
The MLE for a scalar parameter is defined as the value of the parameter $t$ that maximizes $p(x; t)$ for a given, fixed $x$, i.e., the value that maximizes the likelihood function. The maximization is performed over the allowable range of $t$.
To find the MLE, solve the equation
\[
\frac{\partial}{\partial t} \ln p(x; t) = 0
\]
for $t$. This equation may have multiple solutions, and you should choose the appropriate one.
Theorem. If an efficient estimator (an estimator which attains the CRLB) exists, then the MLE procedure will find it.
The MLE is
asymptotically unbiased, i.e., $E(\hat{t}\,) \to t$ as $N \to \infty$;
asymptotically efficient, i.e., $\operatorname{var}(\hat{t}\,) \to \mathrm{CRLB}$ as $N \to \infty$;
asymptotically optimal, i.e., both of the above are true.
Theorem. If the pdf $p(x; t)$ is twice differentiable and the Fisher information $I(t)$ is nonzero, then the MLE of the unknown parameter $t$ is asymptotically distributed (for large $N$) according to
\[
\hat{t} \sim \mathcal{N}\left(t,\ I^{-1}(t)\right),
\]
i.e., Gaussian distributed with mean equal to the true value $t$ and variance equal to the CRLB (the inverse of the Fisher information).
Theorem. Assume that the MLE $\hat{t}$ of the unknown parameter $t$ is known. Consider a transformation of $t$,
\[
\alpha = f(t),
\]
for any function $f(\cdot)$. Then the MLE of $\alpha$ is nothing but
\[
\hat{\alpha} = f(\hat{t}\,).
\]
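As an illustration of the first theorem above, here is a minimal sketch (model and parameter values assumed for the example) that finds the MLE of a DC level in WGN by numerical maximization and compares it with the closed-form answer, the sample mean:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# MLE of A for x[n] = A + w[n], w[n] ~ N(0, sigma^2), found numerically.
rng = np.random.default_rng(3)
A_true, sigma, N = 1.0, 2.0, 1000
x = A_true + rng.normal(0.0, sigma, N)

def neg_log_lik(A):
    return np.sum((x - A) ** 2) / (2 * sigma**2)   # constants dropped

res = minimize_scalar(neg_log_lik)
print("numerical MLE:", res.x)
print("sample mean  :", x.mean())                  # closed-form MLE
print("CRLB         :", sigma**2 / N)              # asymptotic variance of the MLE
```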