
B4 Estimation and Inference I

1. A continuous random variable has the pdf


\[
p(x) = kx^2(1 - x), \qquad 0 \le x \le 1
\]
Compute the following values: (a) k; (b) its mean; (c) its median (NB only
an equation for the median is required); (d) its mode.
(a) Compute k

Require that \(\int p(x)\,dx = 1\):
\[
\int_0^1 kx^2(1-x)\,dx = k\int_0^1 (x^2 - x^3)\,dx
= k\left[\frac{x^3}{3} - \frac{x^4}{4}\right]_0^1
= k\left(\frac{1}{3} - \frac{1}{4}\right) = \frac{k}{12}
\]
So k = 12.
(b) Compute the mean

\[
\mu = \int x\,p(x)\,dx = 12\int_0^1 (x^3 - x^4)\,dx
= 12\left[\frac{x^4}{4} - \frac{x^5}{5}\right]_0^1
= 12\left(\frac{1}{4} - \frac{1}{5}\right) = \frac{12}{20} = 0.6
\]
(c) Compute the median

Can start from
\[
\int_0^m p(x)\,dx = 12\int_0^m x^2(1-x)\,dx = \frac{1}{2}
\]
or from
\[
12\int_0^m x^2(1-x)\,dx = 12\int_m^1 x^2(1-x)\,dx,
\qquad
\left[\frac{x^3}{3} - \frac{x^4}{4}\right]_0^m = \left[\frac{x^3}{3} - \frac{x^4}{4}\right]_m^1
\]
Either way,
\[
2\,\frac{m^3}{3} - 2\,\frac{m^4}{4} - \frac{1}{12} = 0
\quad\Longrightarrow\quad
6m^4 - 8m^3 + 1 = 0
\]
(d) Compute the mode
\[
\frac{dp(x)}{dx} = 12(2x - 3x^2) = 12x(2 - 3x) = 0
\]
so x = 0 or x = 2/3. The mode is at x = 2/3.

[Figure: plot of the pdf p(x) = 12x^2(1 - x) on 0 ≤ x ≤ 1, peaking at the mode x = 2/3.]
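As a sanity check, here is a minimal numerical sketch of question 1 (an illustrative addition assuming numpy and scipy are available, not part of the original solution) verifying k, the mean, the median equation and the mode:

```python
# Minimal numerical check of question 1 using numpy/scipy.
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

p = lambda x: 12 * x**2 * (1 - x)              # pdf with k = 12

norm, _ = quad(p, 0, 1)                        # (a) normalisation: 1.0
mean, _ = quad(lambda x: x * p(x), 0, 1)       # (b) mean: 0.6

# (c) median: root of 6m^4 - 8m^3 + 1 = 0 in (0, 1)
median = brentq(lambda m: 6*m**4 - 8*m**3 + 1, 0.3, 0.9)

# (d) mode: maximise p on a fine grid (analytically 2/3)
xs = np.linspace(0, 1, 100001)
mode = xs[np.argmax(p(xs))]

print(norm, mean, median, mode)                # 1.0, 0.6, ~0.614, ~0.667
```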

2. Joint probabilities for the discrete random variables X, Z are given by
X=1 X=2 X=3
Z=1 0.20 0.07 0.06
Z=2 0.09 0.10 0.01
Z=3 0.02 0.12 0.09
Z=4 0.06 0.08 0.10
(a) Check the normalisation of the joint distribution
(b) Compute the marginal probability distributions for X, Z.
X=1 X=2 X=3 p(z)
Z=1 0.20 0.07 0.06 0.33
Z=2 0.09 0.10 0.01 0.20
Z=3 0.02 0.12 0.09 0.23
Z=4 0.06 0.08 0.10 0.24
p(x) 0.37 0.37 0.26 1.00
(c) Compute the conditional distribution P (Z|X = 3)
\[
p(z \mid x = 3) = \frac{p(z, x = 3)}{p(x = 3)}
= \left(\frac{0.06}{0.26},\, \frac{0.01}{0.26},\, \frac{0.09}{0.26},\, \frac{0.10}{0.26}\right)
= (0.231, 0.038, 0.346, 0.385)
\]
(d) Use Bayes’ rule to compute P (X = 3|Z = 4) using only information in
the previous 2 parts; then check the answer by direct calculation from
the table above.

Bayes:
\[
p(x = 3 \mid z = 4) = \frac{p(z = 4 \mid x = 3)\,p(x = 3)}{p(z = 4)}
= \frac{\left(\frac{0.10}{0.26}\right)(0.26)}{0.24} = 0.417
\]
Direct:
\[
p(x = 3 \mid z = 4) = \frac{p(x = 3, z = 4)}{p(z = 4)} = \frac{0.10}{0.24} = 0.417
\]
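The whole of question 2 can be checked mechanically; here is a short numpy sketch (an illustrative addition, with the joint table copied from above, rows Z = 1..4 and columns X = 1..3):

```python
# Sketch checking question 2 with numpy.
import numpy as np

P = np.array([[0.20, 0.07, 0.06],
              [0.09, 0.10, 0.01],
              [0.02, 0.12, 0.09],
              [0.06, 0.08, 0.10]])

print(P.sum())                   # (a) normalisation: 1.0
p_x = P.sum(axis=0)              # (b) p(x) = [0.37, 0.37, 0.26]
p_z = P.sum(axis=1)              #     p(z) = [0.33, 0.20, 0.23, 0.24]
print(P[:, 2] / p_x[2])          # (c) p(z|x=3) = [0.231, 0.038, 0.346, 0.385]

# (d) Bayes vs direct: both give p(x=3|z=4) = 0.417
bayes  = (P[3, 2] / p_x[2]) * p_x[2] / p_z[3]
direct = P[3, 2] / p_z[3]
print(bayes, direct)
```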

3. (a) Show that cov(x1, x2) = E[x1 x2] − E[x1] E[x2].
\[
\operatorname{cov}(x_1, x_2) = \int (x_1 - \mu_1)(x_2 - \mu_2)\,p(x_1, x_2)\,dx_1\,dx_2
\]
\[
= \int x_1 x_2\,p(x_1, x_2)\,dx_1\,dx_2
- \mu_1 \int x_2\,p(x_1, x_2)\,dx_1\,dx_2
- \mu_2 \int x_1\,p(x_1, x_2)\,dx_1\,dx_2
+ \mu_1\mu_2 \int p(x_1, x_2)\,dx_1\,dx_2
\]
\[
= \int x_1 x_2\,p(x_1, x_2)\,dx_1\,dx_2
- \mu_1 \int x_2\,p(x_2)\,dx_2
- \mu_2 \int x_1\,p(x_1)\,dx_1
+ \mu_1\mu_2
\]
\[
= E[x_1 x_2] - E[x_1]\,E[x_2]
\]
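A quick Monte Carlo check of this identity (an illustrative sketch; the correlated test variables are an arbitrary assumption):

```python
# Monte Carlo check that cov(x1, x2) = E[x1 x2] - E[x1] E[x2].
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(1.0, 1.0, 1_000_000)
x2 = 0.5 * x1 + rng.normal(-2.0, 1.0, 1_000_000)   # deliberately correlated

lhs = np.cov(x1, x2, bias=True)[0, 1]              # definition
rhs = (x1 * x2).mean() - x1.mean() * x2.mean()     # identity
print(lhs, rhs)                                    # both ~ 0.5
```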


(b) If the independent continuous variables x1, x2 are normally distributed
with means and variances µi, σi², i = 1, 2, derive the distribution of their
sum x1 + x2. Hence show that the variance of the average of N such
variables, each with variance σ², is σ²/N.
Method I: Direct computation
\[
\operatorname{mean}(x_1 + x_2) = \int (x_1 + x_2)\,p(x_1)p(x_2)\,dx_1\,dx_2
= \int x_1 p(x_1)\,dx_1 \int p(x_2)\,dx_2 + \int x_2 p(x_2)\,dx_2 \int p(x_1)\,dx_1
= \mu_1 + \mu_2
\]
\[
\operatorname{var}(x_1 + x_2)
= \int \bigl(x_1 + x_2 - (\mu_1 + \mu_2)\bigr)^2 p(x_1)p(x_2)\,dx_1\,dx_2
= \int \bigl((x_1 - \mu_1) + (x_2 - \mu_2)\bigr)^2 p(x_1)p(x_2)\,dx_1\,dx_2
\]
\[
= \int (x_1 - \mu_1)^2 p(x_1)\,dx_1 + \int (x_2 - \mu_2)^2 p(x_2)\,dx_2
+ 2\int (x_1 - \mu_1)p(x_1)\,dx_1 \int (x_2 - \mu_2)p(x_2)\,dx_2
= \sigma_1^2 + \sigma_2^2
\]
Method II: The distribution of the sum x1 + x2 is the convolution of the
individual distributions of x1 and x2. Convolving Gaussians adds means and variances:
\[
x_1 + x_2 \sim N(\mu_1 + \mu_2,\; \sigma_1^2 + \sigma_2^2)
\]
This can be derived using the Fourier transform (convolution in x becomes
multiplication in k): the FT of \(e^{-x^2/2\sigma^2}\) is proportional to
\(e^{-2\pi^2\sigma^2 k^2}\), so multiplying two such transforms adds the
\(\sigma^2\) terms in the exponent.
Method III: We can write the joint pdf for x1 and x2 as
\[
p(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2}
\exp\left\{ -\frac{1}{2}
\begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix}^{\!\top}
\begin{pmatrix} 1/\sigma_1^2 & 0 \\ 0 & 1/\sigma_2^2 \end{pmatrix}
\begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix}
\right\}
\]
Recall that under an affine transformation y = Ax + t, a normal
distribution N(µx, Σx) is transformed to a normal distribution N(µy, Σy) with
\[
\mu_y = A\mu_x + t, \qquad \Sigma_y = A\Sigma_x A^\top
\]
Think of forming the sum as a projection from 2D to 1D by a matrix A = [1 1]:
\[
\begin{bmatrix} 1 & 1 \end{bmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = x_1 + x_2
\]

Then the mean is given by
\[
A \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} = \mu_1 + \mu_2
\]
and the covariance by
\[
A\Sigma A^\top = \begin{bmatrix} 1 & 1 \end{bmatrix}
\begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}
\begin{pmatrix} 1 \\ 1 \end{pmatrix}
= \sigma_1^2 + \sigma_2^2
\]

Hence show that the variance of the average of N such variables, each
with variance σ², is σ²/N.

From any of the three methods we have established that, for a set of
independent normal variables with means and variances µi, σi², their sum
is a normal variable with mean \(\sum_i \mu_i\) and variance \(\sum_i \sigma_i^2\).
Consequently, if all variables have the same mean and variance, then their
average has mean and variance
\[
\frac{1}{N}\sum_i \mu_i = \mu, \qquad \frac{1}{N^2}\sum_i \sigma_i^2 = \frac{\sigma^2}{N}
\]
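A Monte Carlo sketch of both claims (an illustrative addition; the parameter values are arbitrary assumptions):

```python
# Monte Carlo sketch: sums of independent normals add means/variances,
# and the average of N iid variables has variance sigma^2 / N.
import numpy as np

rng = np.random.default_rng(0)
mu1, s1, mu2, s2 = 1.0, 0.5, -2.0, 1.5
total = rng.normal(mu1, s1, 1_000_000) + rng.normal(mu2, s2, 1_000_000)
print(total.mean(), total.var())   # ~ -1.0 and ~ 2.5 (= 0.25 + 2.25)

N, sigma = 10, 2.0
avg = rng.normal(0.0, sigma, (1_000_000, N)).mean(axis=1)
print(avg.var(), sigma**2 / N)     # both ~ 0.4
```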

(c) Now suppose that those variables are no longer independent but have
correlation coefficient ρ. Find the covariance and information matrices
of the vector variable x = (x1, x2)ᵀ. Write down the joint density
function including the normalisation constant.

x1 and x2 are correlated with coefficient ρ:
\[
\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}
\]
\[
\Sigma^{-1} = \frac{1}{\sigma_1^2\sigma_2^2 - \rho^2\sigma_1^2\sigma_2^2}
\begin{pmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{pmatrix}
= \frac{1}{1 - \rho^2}
\begin{pmatrix} \frac{1}{\sigma_1^2} & -\frac{\rho}{\sigma_1\sigma_2} \\ -\frac{\rho}{\sigma_1\sigma_2} & \frac{1}{\sigma_2^2} \end{pmatrix}
\]
\[
\det\Sigma = |\Sigma| = \sigma_1^2\sigma_2^2(1 - \rho^2), \qquad
\frac{1}{|\Sigma|^{1/2}} = \frac{1}{\sigma_1\sigma_2\sqrt{1 - \rho^2}}
\]
Hence the joint density is
\[
p(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}
\exp\left\{ -\frac{1}{2}\,
(x_1 - \mu_1,\; x_2 - \mu_2)\,\Sigma^{-1}
\begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix}
\right\}
\]
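As a check of the matrices and the normalisation constant, here is a sketch comparing the hand-built density against scipy (an illustrative addition; σ1, σ2, ρ and the evaluation point are arbitrary assumptions):

```python
# Sketch of question 3(c): covariance and information matrices, and the
# joint density, checked against scipy.stats.multivariate_normal.
import numpy as np
from scipy.stats import multivariate_normal

s1, s2, rho = 1.0, 2.0, 0.6
mu = np.array([0.0, 0.0])
Sigma = np.array([[s1**2,     rho*s1*s2],
                  [rho*s1*s2, s2**2    ]])
Info = np.linalg.inv(Sigma)          # information matrix Sigma^{-1}

x = np.array([0.5, -1.0])
d = x - mu
pdf = np.exp(-0.5 * d @ Info @ d) / (2*np.pi*s1*s2*np.sqrt(1 - rho**2))
print(pdf, multivariate_normal(mu, Sigma).pdf(x))   # should agree
```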
(d) Find the conditional density function p(x2|x1) including the
normalisation constant.

\[
p(x_2 \mid x_1) = \frac{p(x_2, x_1)}{p(x_1)}
\]
First consider the marginal p(x1).

Method I: Again, think of this as a projection from 2D to 1D by a
matrix A = [1 0]; then the covariance is given by
\[
A\Sigma A^\top = \begin{bmatrix} 1 & 0 \end{bmatrix}
\begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}
\begin{pmatrix} 1 \\ 0 \end{pmatrix}
= \sigma_1^2
\]

Method II: Integrate out x2:
\[
p(x_1) = \int p(x_1, x_2)\,dx_2
\]
Consider −2× the exponent of p(x1, x2) for ease, and obtain
\[
\frac{1}{1 - \rho^2}\left[
\frac{(x_2 - \mu_2)^2}{\sigma_2^2}
- \frac{2\rho(x_1 - \mu_1)(x_2 - \mu_2)}{\sigma_1\sigma_2}
+ \frac{(x_1 - \mu_1)^2}{\sigma_1^2}
\right]
\]
For clarity set µ1 = µ2 = 0; then completing the square,
\[
\frac{1}{1 - \rho^2}\left[
\frac{x_2^2}{\sigma_2^2} - \frac{2\rho x_1 x_2}{\sigma_1\sigma_2} + \frac{x_1^2}{\sigma_1^2}
\right]
= \frac{1}{1 - \rho^2}\left[
\left(\frac{x_2}{\sigma_2} - \rho\frac{x_1}{\sigma_1}\right)^2
+ (1 - \rho^2)\frac{x_1^2}{\sigma_1^2}
\right]
\]
The second term does not involve x2, so integrating over x2 leaves a
factor \(\exp\{-x_1^2/2\sigma_1^2\}\): the marginal is \(N(\mu_1, \sigma_1^2)\), as in Method I.

Then for p(x2|x1), again consider −2× the exponent, subtracting the
exponent of the marginal p(x1):
\[
\frac{1}{1 - \rho^2}\left[
\frac{(x_2 - \mu_2)^2}{\sigma_2^2}
- \frac{2\rho(x_1 - \mu_1)(x_2 - \mu_2)}{\sigma_1\sigma_2}
+ \frac{(x_1 - \mu_1)^2}{\sigma_1^2}
\right]
- \frac{(x_1 - \mu_1)^2}{\sigma_1^2}
\]
And after some manipulation this gives
\[
\frac{1}{\sigma_2^2(1 - \rho^2)}
\left[ x_2 - \left(\mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x_1 - \mu_1)\right) \right]^2
\]
so \(p(x_2 \mid x_1) = N\!\left(\mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x_1 - \mu_1),\; \sigma_2^2(1 - \rho^2)\right)\),
with normalisation constant \(1/\sqrt{2\pi\sigma_2^2(1 - \rho^2)}\).
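A Monte Carlo sketch of this result (an illustrative addition; all numbers are arbitrary assumptions): conditioning samples on x1 near a chosen value should reproduce the mean µ2 + ρ(σ2/σ1)(x1 − µ1) and variance σ2²(1 − ρ²).

```python
# Monte Carlo sketch of question 3(d): moments of x2 given x1.
import numpy as np

rng = np.random.default_rng(0)
mu1, mu2, s1, s2, rho = 1.0, -1.0, 1.0, 2.0, 0.7
Sigma = [[s1**2, rho*s1*s2], [rho*s1*s2, s2**2]]
x = rng.multivariate_normal([mu1, mu2], Sigma, 2_000_000)

x1_0 = 1.5                                     # condition on x1 near 1.5
sel = x[np.abs(x[:, 0] - x1_0) < 0.01, 1]
print(sel.mean(), mu2 + rho*(s2/s1)*(x1_0 - mu1))   # ~ -0.30
print(sel.var(),  s2**2 * (1 - rho**2))             # ~ 2.04
```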

4. The data of question 2 could be taken as characterising a noisy sensor with
output Z, intended to measure a state X.
(a) Given that a measurement Z = 3 has been taken, what is the MLE for
X?
X = 1 X = 2 X = 3 p(z)
Z = 1 0.20 0.07 0.06 0.33
Z = 2 0.09 0.10 0.01 0.20
Z = 3 0.02 0.12 0.09 0.23
Z = 4 0.06 0.08 0.10 0.24
p(x) 0.37 0.37 0.26 1.00

p(z = 3|x = 1) = 0.02/0.37 = 0.054
p(z = 3|x = 2) = 0.12/0.37 = 0.324
p(z = 3|x = 3) = 0.09/0.26 = 0.346 ∗
The maximum (∗) gives the MLE: X̂ = 3.
(b) Now, if conditional probabilities P (Z = i|X = j) are as in part (a) but
prior probabilities (obtained from elsewhere) are as follows:
P (X = 1) = 0.2, P (X = 2) = 0.5, P (X = 3) = 0.3
find the MAP estimate for X.

p(x|z) ∝ p(z|x)p(x)

p(z = 3|x = 1)p(x = 1) = (0.02/0.37)(0.2) = 0.011
p(z = 3|x = 2)p(x = 2) = (0.12/0.37)(0.5) = 0.162 ∗
p(z = 3|x = 3)p(x = 3) = (0.09/0.26)(0.3) = 0.104
The maximum (∗) gives the MAP estimate: X̂ = 2.
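Both estimates follow mechanically from the table; a short sketch (an illustrative addition reusing the joint table from question 2):

```python
# Sketch of question 4: MLE and MAP for X given Z = 3.
import numpy as np

P = np.array([[0.20, 0.07, 0.06],
              [0.09, 0.10, 0.01],
              [0.02, 0.12, 0.09],
              [0.06, 0.08, 0.10]])
p_x = P.sum(axis=0)

lik = P[2, :] / p_x                  # p(z=3|x) = [0.054, 0.324, 0.346]
print(1 + np.argmax(lik))            # (a) MLE: X = 3

prior = np.array([0.2, 0.5, 0.3])
print(1 + np.argmax(lik * prior))    # (b) MAP: X = 2
```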

5. Independent sensors have outputs zi, i = 1, 2, measuring a state x, which
are unbiased and normally distributed with standard deviations σi, i = 1, 2.
(a) Derive the maximum likelihood estimate of x given single measurements
z1, z2 from each sensor.
\[
\hat{x} = \frac{\sigma_1^{-2} z_1 + \sigma_2^{-2} z_2}{\sigma_1^{-2} + \sigma_2^{-2}}
\]
(b) If also the state itself is known to have a prior distribution that is normal,
N(x̄, σ0²), what is the posterior density for x?
\[
\hat{x} = \frac{\sigma_1^{-2} z_1 + \sigma_2^{-2} z_2 + \sigma_0^{-2}\,\bar{x}}{\sigma_1^{-2} + \sigma_2^{-2} + \sigma_0^{-2}}
\]
The posterior is normal, with this mean and variance \((\sigma_1^{-2} + \sigma_2^{-2} + \sigma_0^{-2})^{-1}\).
(c) If, instead, the prior density is given by the exponential distribution
\[
p(x) = \begin{cases} a\exp(-ax) & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases}
\]
and only a single sensor reading z1 is made, what is the MAP estimate
for x?
\[
p(x \mid z) \propto \exp\left\{-\frac{(z_1 - x)^2}{2\sigma_1^2}\right\} \exp(-ax), \qquad x \ge 0
\]
Consider
\[
\log p(x \mid z) = -\frac{(z_1 - x)^2}{2\sigma_1^2} - ax + \text{const.}
\]
Setting the derivative with respect to x to zero:
\[
\sigma_1^{-2}(z_1 - x) - a = 0
\quad\Longrightarrow\quad
\hat{x} = z_1 - \sigma_1^2 a
\]
Since the prior forces x ≥ 0,
\[
\hat{x} = \begin{cases} z_1 - \sigma_1^2 a & \text{if } z_1 > \sigma_1^2 a \\ 0 & \text{otherwise} \end{cases}
\]

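A numerical sketch of part (c) (an illustrative addition; z1, σ1 and a are arbitrary assumptions): maximising the log-posterior on a grid should agree with the clipped closed form.

```python
# Numerical check of question 5(c): grid-maximise the log-posterior and
# compare with max(z1 - sigma1^2 * a, 0).
import numpy as np

z1, s1, a = 2.0, 0.5, 3.0
xs = np.linspace(0.0, 5.0, 500001)
logpost = -0.5 * (z1 - xs)**2 / s1**2 - a * xs
print(xs[np.argmax(logpost)], max(z1 - s1**2 * a, 0.0))   # both ~ 1.25
```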
(d) Suppose in part (b) that
x̄ = 10, σ0 = 0.3, σ1 = σ2 = 0.4
and that the sensor readings are
z1 = 10.5, z2 = 12.0.
What is the MAP estimate for x, based on these data?

z1 = 10.5, σ1 = 0.4, σ1² = 0.16
z2 = 12.0, σ2 = 0.4, σ2² = 0.16
x̄ = 10.0, σ0 = 0.3, σ0² = 0.09

\[
\hat{x}_{\mathrm{MAP}} =
\frac{\frac{10.5}{0.16} + \frac{12.0}{0.16} + \frac{10.0}{0.09}}
     {\frac{1}{0.16} + \frac{1}{0.16} + \frac{1}{0.09}}
= 10.66
\]
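The same number drops out of the precision-weighted formula from part (b); a one-line check:

```python
# Check of question 5(d): precision-weighted MAP combination.
z1, z2, xbar = 10.5, 12.0, 10.0
v1, v2, v0 = 0.4**2, 0.4**2, 0.3**2        # variances 0.16, 0.16, 0.09
x_map = (z1/v1 + z2/v2 + xbar/v0) / (1/v1 + 1/v2 + 1/v0)
print(x_map)                               # ~ 10.66
```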

