
B4 Estimation and Inference I

1. A continuous random variable has the pdf


\[
p(x) = kx^2(1 - x), \qquad 0 \le x \le 1
\]
Compute the following values: (a) k; (b) its mean; (c) its median (NB only
an equation for the median is required); (d) its mode.
(a) Compute k

Require that \(\int p(x)\,dx = 1\):
\[
\int_0^1 kx^2(1-x)\,dx = k\int_0^1 (x^2 - x^3)\,dx
= k\left[\frac{x^3}{3} - \frac{x^4}{4}\right]_0^1
= k\left(\frac{1}{3} - \frac{1}{4}\right) = \frac{k}{12}
\]
So k = 12.
(b) Compute the mean

\[
\mu = \int x\,p(x)\,dx = 12\int_0^1 (x^3 - x^4)\,dx
= 12\left[\frac{x^4}{4} - \frac{x^5}{5}\right]_0^1
= 12\left(\frac{1}{4} - \frac{1}{5}\right) = \frac{12}{20} = 0.6
\]
(c) Compute the median

Can start from
\[
\int_0^m p(x)\,dx = 12\int_0^m x^2(1-x)\,dx = \frac{1}{2}
\]
or from
\[
12\int_0^m x^2(1-x)\,dx = 12\int_m^1 x^2(1-x)\,dx,
\qquad
\left[\frac{x^3}{3} - \frac{x^4}{4}\right]_0^m = \left[\frac{x^3}{3} - \frac{x^4}{4}\right]_m^1
\]
Either way,
\[
2\,\frac{m^3}{3} - 2\,\frac{m^4}{4} - \frac{1}{12} = 0
\quad\Longrightarrow\quad
6m^4 - 8m^3 + 1 = 0
\]
(d) Compute the mode
\[
\frac{dp(x)}{dx} = 12(2x - 3x^2) = 12x(2 - 3x) = 0
\]
so x = 0 or x = 2/3. The mode is at x = 2/3.

[Figure: plot of the pdf p(x) = 12x^2(1 - x) on 0 ≤ x ≤ 1, peaking at the mode x = 2/3.]
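As a sanity check, here is a minimal numerical sketch of question 1 (an illustrative addition assuming numpy and scipy are available, not part of the original solution) verifying k, the mean, the median equation and the mode:

```python
# Minimal numerical check of question 1 using numpy/scipy.
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

p = lambda x: 12 * x**2 * (1 - x)              # pdf with k = 12

norm, _ = quad(p, 0, 1)                        # (a) normalisation: 1.0
mean, _ = quad(lambda x: x * p(x), 0, 1)       # (b) mean: 0.6

# (c) median: root of 6m^4 - 8m^3 + 1 = 0 in (0, 1)
median = brentq(lambda m: 6*m**4 - 8*m**3 + 1, 0.3, 0.9)

# (d) mode: maximise p on a fine grid (analytically 2/3)
xs = np.linspace(0, 1, 100001)
mode = xs[np.argmax(p(xs))]

print(norm, mean, median, mode)                # 1.0, 0.6, ~0.614, ~0.667
```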

2. Joint probabilities for the discrete random variables X, Z are given by
X=1 X=2 X=3
Z=1 0.20 0.07 0.06
Z=2 0.09 0.10 0.01
Z=3 0.02 0.12 0.09
Z=4 0.06 0.08 0.10
(a) Check the normalisation of the joint distribution
(b) Compute the marginal probability distributions for X, Z.
X=1 X=2 X=3 p(z)
Z=1 0.20 0.07 0.06 0.33
Z=2 0.09 0.10 0.01 0.20
Z=3 0.02 0.12 0.09 0.23
Z=4 0.06 0.08 0.10 0.24
p(x) 0.37 0.37 0.26 1.00
(c) Compute the conditional distribution P (Z|X = 3)
\[
p(z \mid x = 3) = \frac{p(z, x = 3)}{p(x = 3)}
= \left(\frac{0.06}{0.26},\, \frac{0.01}{0.26},\, \frac{0.09}{0.26},\, \frac{0.10}{0.26}\right)
= (0.231, 0.038, 0.346, 0.385)
\]
(d) Use Bayes’ rule to compute P (X = 3|Z = 4) using only information in
the previous 2 parts; then check the answer by direct calculation from
the table above.

Bayes:
\[
p(x = 3 \mid z = 4) = \frac{p(z = 4 \mid x = 3)\,p(x = 3)}{p(z = 4)}
= \frac{\left(\frac{0.10}{0.26}\right)(0.26)}{0.24} = 0.417
\]
Direct:
\[
p(x = 3 \mid z = 4) = \frac{p(x = 3, z = 4)}{p(z = 4)} = \frac{0.10}{0.24} = 0.417
\]
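The whole of question 2 can be checked mechanically; here is a short numpy sketch (an illustrative addition, with the joint table copied from above, rows Z = 1..4 and columns X = 1..3):

```python
# Sketch checking question 2 with numpy.
import numpy as np

P = np.array([[0.20, 0.07, 0.06],
              [0.09, 0.10, 0.01],
              [0.02, 0.12, 0.09],
              [0.06, 0.08, 0.10]])

print(P.sum())                   # (a) normalisation: 1.0
p_x = P.sum(axis=0)              # (b) p(x) = [0.37, 0.37, 0.26]
p_z = P.sum(axis=1)              #     p(z) = [0.33, 0.20, 0.23, 0.24]
print(P[:, 2] / p_x[2])          # (c) p(z|x=3) = [0.231, 0.038, 0.346, 0.385]

# (d) Bayes vs direct: both give p(x=3|z=4) = 0.417
bayes  = (P[3, 2] / p_x[2]) * p_x[2] / p_z[3]
direct = P[3, 2] / p_z[3]
print(bayes, direct)
```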

3. (a) Show that cov(x1, x2) = E[x1 x2] − E[x1] E[x2].
\[
\operatorname{cov}(x_1, x_2) = \int (x_1 - \mu_1)(x_2 - \mu_2)\,p(x_1, x_2)\,dx_1\,dx_2
\]
\[
= \int x_1 x_2\,p(x_1, x_2)\,dx_1\,dx_2
- \mu_1 \int x_2\,p(x_1, x_2)\,dx_1\,dx_2
- \mu_2 \int x_1\,p(x_1, x_2)\,dx_1\,dx_2
+ \mu_1\mu_2 \int p(x_1, x_2)\,dx_1\,dx_2
\]
\[
= \int x_1 x_2\,p(x_1, x_2)\,dx_1\,dx_2
- \mu_1 \int x_2\,p(x_2)\,dx_2
- \mu_2 \int x_1\,p(x_1)\,dx_1
+ \mu_1\mu_2
\]
\[
= E[x_1 x_2] - E[x_1]\,E[x_2]
\]
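A quick Monte Carlo check of this identity (an illustrative sketch; the correlated test variables are an arbitrary assumption):

```python
# Monte Carlo check that cov(x1, x2) = E[x1 x2] - E[x1] E[x2].
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(1.0, 1.0, 1_000_000)
x2 = 0.5 * x1 + rng.normal(-2.0, 1.0, 1_000_000)   # deliberately correlated

lhs = np.cov(x1, x2, bias=True)[0, 1]              # definition
rhs = (x1 * x2).mean() - x1.mean() * x2.mean()     # identity
print(lhs, rhs)                                    # both ~ 0.5
```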


(b) If the independent continuous variables x1, x2 are normally distributed
with means and variances µi, σi², i = 1, 2, derive the distribution of their
sum x1 + x2. Hence show that the variance of the average of N such
variables, each with variance σ², is σ²/N.
Method I: Direct computation
\[
\operatorname{mean}(x_1 + x_2) = \int (x_1 + x_2)\,p(x_1)p(x_2)\,dx_1\,dx_2
= \int x_1 p(x_1)\,dx_1 \int p(x_2)\,dx_2 + \int x_2 p(x_2)\,dx_2 \int p(x_1)\,dx_1
= \mu_1 + \mu_2
\]
\[
\operatorname{var}(x_1 + x_2)
= \int \bigl(x_1 + x_2 - (\mu_1 + \mu_2)\bigr)^2 p(x_1)p(x_2)\,dx_1\,dx_2
= \int \bigl((x_1 - \mu_1) + (x_2 - \mu_2)\bigr)^2 p(x_1)p(x_2)\,dx_1\,dx_2
\]
\[
= \int (x_1 - \mu_1)^2 p(x_1)\,dx_1 + \int (x_2 - \mu_2)^2 p(x_2)\,dx_2
+ 2\int (x_1 - \mu_1)p(x_1)\,dx_1 \int (x_2 - \mu_2)p(x_2)\,dx_2
= \sigma_1^2 + \sigma_2^2
\]
Method II: The distribution of the sum x1 + x2 is the convolution of the
individual distributions of x1 and x2. Convolving Gaussians adds means and variances:
\[
x_1 + x_2 \sim N(\mu_1 + \mu_2,\; \sigma_1^2 + \sigma_2^2)
\]
This can be derived using the Fourier transform (convolution in x becomes
multiplication in k): the FT of \(e^{-x^2/2\sigma^2}\) is proportional to
\(e^{-2\pi^2\sigma^2 k^2}\), so multiplying two such transforms adds the
\(\sigma^2\) terms in the exponent.
Method III: We can write the joint pdf for x1 and x2 as
\[
p(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2}
\exp\left\{ -\frac{1}{2}
\begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix}^{\!\top}
\begin{pmatrix} 1/\sigma_1^2 & 0 \\ 0 & 1/\sigma_2^2 \end{pmatrix}
\begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix}
\right\}
\]
Recall that under an affine transformation y = Ax + t, a normal
distribution N(µx, Σx) is transformed to a normal distribution N(µy, Σy) with
\[
\mu_y = A\mu_x + t, \qquad \Sigma_y = A\Sigma_x A^\top
\]
Think of forming the sum as a projection from 2D to 1D by a matrix A = [1 1]:
\[
\begin{bmatrix} 1 & 1 \end{bmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = x_1 + x_2
\]

Then the mean is given by
\[
A \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} = \mu_1 + \mu_2
\]
and the covariance by
\[
A\Sigma A^\top = \begin{bmatrix} 1 & 1 \end{bmatrix}
\begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}
\begin{pmatrix} 1 \\ 1 \end{pmatrix}
= \sigma_1^2 + \sigma_2^2
\]

Hence show that the variance of the average of N such variables, each
with variance σ², is σ²/N.

From any of the three methods we have established that, for a set of
independent normal variables with means and variances µi, σi², their sum
is a normal variable with mean \(\sum_i \mu_i\) and variance \(\sum_i \sigma_i^2\).
Consequently, if all variables have the same mean and variance, then their
average has mean and variance
\[
\frac{1}{N}\sum_i \mu_i = \mu, \qquad \frac{1}{N^2}\sum_i \sigma_i^2 = \frac{\sigma^2}{N}
\]
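A Monte Carlo sketch of both claims (an illustrative addition; the parameter values are arbitrary assumptions):

```python
# Monte Carlo sketch: sums of independent normals add means/variances,
# and the average of N iid variables has variance sigma^2 / N.
import numpy as np

rng = np.random.default_rng(0)
mu1, s1, mu2, s2 = 1.0, 0.5, -2.0, 1.5
total = rng.normal(mu1, s1, 1_000_000) + rng.normal(mu2, s2, 1_000_000)
print(total.mean(), total.var())   # ~ -1.0 and ~ 2.5 (= 0.25 + 2.25)

N, sigma = 10, 2.0
avg = rng.normal(0.0, sigma, (1_000_000, N)).mean(axis=1)
print(avg.var(), sigma**2 / N)     # both ~ 0.4
```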

(c) Now suppose that those variables are no longer independent but have
correlation coefficient ρ. Find the covariance and information matrices
of the vector variable x = (x1, x2)ᵀ. Write down the joint density
function including the normalisation constant.

x1 and x2 are correlated with coefficient ρ:
\[
\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}
\]
\[
\Sigma^{-1} = \frac{1}{\sigma_1^2\sigma_2^2 - \rho^2\sigma_1^2\sigma_2^2}
\begin{pmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{pmatrix}
= \frac{1}{1 - \rho^2}
\begin{pmatrix} \frac{1}{\sigma_1^2} & -\frac{\rho}{\sigma_1\sigma_2} \\ -\frac{\rho}{\sigma_1\sigma_2} & \frac{1}{\sigma_2^2} \end{pmatrix}
\]
\[
\det\Sigma = |\Sigma| = \sigma_1^2\sigma_2^2(1 - \rho^2), \qquad
\frac{1}{|\Sigma|^{1/2}} = \frac{1}{\sigma_1\sigma_2\sqrt{1 - \rho^2}}
\]
Hence the joint density is
\[
p(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}
\exp\left\{ -\frac{1}{2}\,
(x_1 - \mu_1,\; x_2 - \mu_2)\,\Sigma^{-1}
\begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix}
\right\}
\]
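As a check of the matrices and the normalisation constant, here is a sketch comparing the hand-built density against scipy (an illustrative addition; σ1, σ2, ρ and the evaluation point are arbitrary assumptions):

```python
# Sketch of question 3(c): covariance and information matrices, and the
# joint density, checked against scipy.stats.multivariate_normal.
import numpy as np
from scipy.stats import multivariate_normal

s1, s2, rho = 1.0, 2.0, 0.6
mu = np.array([0.0, 0.0])
Sigma = np.array([[s1**2,     rho*s1*s2],
                  [rho*s1*s2, s2**2    ]])
Info = np.linalg.inv(Sigma)          # information matrix Sigma^{-1}

x = np.array([0.5, -1.0])
d = x - mu
pdf = np.exp(-0.5 * d @ Info @ d) / (2*np.pi*s1*s2*np.sqrt(1 - rho**2))
print(pdf, multivariate_normal(mu, Sigma).pdf(x))   # should agree
```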
(d) Find the conditional density function p(x2|x1) including the
normalisation constant.

\[
p(x_2 \mid x_1) = \frac{p(x_2, x_1)}{p(x_1)}
\]
First consider the marginal p(x1).

Method I: Again, think of this as a projection from 2D to 1D by a
matrix A = [1 0]; then the covariance is given by
\[
A\Sigma A^\top = \begin{bmatrix} 1 & 0 \end{bmatrix}
\begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}
\begin{pmatrix} 1 \\ 0 \end{pmatrix}
= \sigma_1^2
\]

Method II: Integrate out x2:
\[
p(x_1) = \int p(x_1, x_2)\,dx_2
\]
Consider −2× the exponent of p(x1, x2) for ease, and obtain
\[
\frac{1}{1 - \rho^2}\left[
\frac{(x_2 - \mu_2)^2}{\sigma_2^2}
- \frac{2\rho(x_1 - \mu_1)(x_2 - \mu_2)}{\sigma_1\sigma_2}
+ \frac{(x_1 - \mu_1)^2}{\sigma_1^2}
\right]
\]
For clarity set µ1 = µ2 = 0; then completing the square,
\[
\frac{1}{1 - \rho^2}\left[
\frac{x_2^2}{\sigma_2^2} - \frac{2\rho x_1 x_2}{\sigma_1\sigma_2} + \frac{x_1^2}{\sigma_1^2}
\right]
= \frac{1}{1 - \rho^2}\left[
\left(\frac{x_2}{\sigma_2} - \rho\frac{x_1}{\sigma_1}\right)^2
+ (1 - \rho^2)\frac{x_1^2}{\sigma_1^2}
\right]
\]
The second term does not involve x2, so integrating over x2 leaves a
factor \(\exp\{-x_1^2/2\sigma_1^2\}\): the marginal is \(N(\mu_1, \sigma_1^2)\), as in Method I.

Then for p(x2|x1), again consider −2× the exponent, subtracting the
exponent of the marginal p(x1):
\[
\frac{1}{1 - \rho^2}\left[
\frac{(x_2 - \mu_2)^2}{\sigma_2^2}
- \frac{2\rho(x_1 - \mu_1)(x_2 - \mu_2)}{\sigma_1\sigma_2}
+ \frac{(x_1 - \mu_1)^2}{\sigma_1^2}
\right]
- \frac{(x_1 - \mu_1)^2}{\sigma_1^2}
\]
And after some manipulation this gives
\[
\frac{1}{\sigma_2^2(1 - \rho^2)}
\left[ x_2 - \left(\mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x_1 - \mu_1)\right) \right]^2
\]
so \(p(x_2 \mid x_1) = N\!\left(\mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x_1 - \mu_1),\; \sigma_2^2(1 - \rho^2)\right)\),
with normalisation constant \(1/\sqrt{2\pi\sigma_2^2(1 - \rho^2)}\).
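A Monte Carlo sketch of this result (an illustrative addition; all numbers are arbitrary assumptions): conditioning samples on x1 near a chosen value should reproduce the mean µ2 + ρ(σ2/σ1)(x1 − µ1) and variance σ2²(1 − ρ²).

```python
# Monte Carlo sketch of question 3(d): moments of x2 given x1.
import numpy as np

rng = np.random.default_rng(0)
mu1, mu2, s1, s2, rho = 1.0, -1.0, 1.0, 2.0, 0.7
Sigma = [[s1**2, rho*s1*s2], [rho*s1*s2, s2**2]]
x = rng.multivariate_normal([mu1, mu2], Sigma, 2_000_000)

x1_0 = 1.5                                     # condition on x1 near 1.5
sel = x[np.abs(x[:, 0] - x1_0) < 0.01, 1]
print(sel.mean(), mu2 + rho*(s2/s1)*(x1_0 - mu1))   # ~ -0.30
print(sel.var(),  s2**2 * (1 - rho**2))             # ~ 2.04
```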

4. The data of question 2 could be taken as characterising a noisy sensor with
output Z, intended to measure a state X.
(a) Given that a measurement Z = 3 has been taken, what is the MLE for
X?
X = 1 X = 2 X = 3 p(z)
Z = 1 0.20 0.07 0.06 0.33
Z = 2 0.09 0.10 0.01 0.20
Z = 3 0.02 0.12 0.09 0.23
Z = 4 0.06 0.08 0.10 0.24
p(x) 0.37 0.37 0.26 1.00

p(z = 3|x = 1) = 0.02/0.37 = 0.054
p(z = 3|x = 2) = 0.12/0.37 = 0.324
p(z = 3|x = 3) = 0.09/0.26 = 0.346 ∗
The maximum (∗) gives the MLE: X̂ = 3.
(b) Now, if conditional probabilities P (Z = i|X = j) are as in part (a) but
prior probabilities (obtained from elsewhere) are as follows:
P (X = 1) = 0.2, P (X = 2) = 0.5, P (X = 3) = 0.3
find the MAP estimate for X.

p(x|z) ∝ p(z|x)p(x)

p(z = 3|x = 1)p(x = 1) = (0.02/0.37)(0.2) = 0.011
p(z = 3|x = 2)p(x = 2) = (0.12/0.37)(0.5) = 0.162 ∗
p(z = 3|x = 3)p(x = 3) = (0.09/0.26)(0.3) = 0.104
The maximum (∗) gives the MAP estimate: X̂ = 2.
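Both estimates follow mechanically from the table; a short sketch (an illustrative addition reusing the joint table from question 2):

```python
# Sketch of question 4: MLE and MAP for X given Z = 3.
import numpy as np

P = np.array([[0.20, 0.07, 0.06],
              [0.09, 0.10, 0.01],
              [0.02, 0.12, 0.09],
              [0.06, 0.08, 0.10]])
p_x = P.sum(axis=0)

lik = P[2, :] / p_x                  # p(z=3|x) = [0.054, 0.324, 0.346]
print(1 + np.argmax(lik))            # (a) MLE: X = 3

prior = np.array([0.2, 0.5, 0.3])
print(1 + np.argmax(lik * prior))    # (b) MAP: X = 2
```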

5. Independent sensors have outputs zi, i = 1, 2, measuring a state x, which
are unbiased and normally distributed with standard deviations σi, i = 1, 2.
(a) Derive the maximum likelihood estimate of x given single measurements
z1, z2 from each sensor.
\[
\hat{x} = \frac{\sigma_1^{-2} z_1 + \sigma_2^{-2} z_2}{\sigma_1^{-2} + \sigma_2^{-2}}
\]
(b) If also the state itself is known to have a prior distribution that is normal,
N(x̄, σ0²), what is the posterior density for x?
\[
\hat{x} = \frac{\sigma_1^{-2} z_1 + \sigma_2^{-2} z_2 + \sigma_0^{-2}\,\bar{x}}{\sigma_1^{-2} + \sigma_2^{-2} + \sigma_0^{-2}}
\]
The posterior is normal, with this mean and variance \((\sigma_1^{-2} + \sigma_2^{-2} + \sigma_0^{-2})^{-1}\).
(c) If, instead, the prior density is given by the exponential distribution
\[
p(x) = \begin{cases} a\exp(-ax) & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases}
\]
and only a single sensor reading z1 is made, what is the MAP estimate
for x?
\[
p(x \mid z) \propto \exp\left\{-\frac{(z_1 - x)^2}{2\sigma_1^2}\right\} \exp(-ax), \qquad x \ge 0
\]
Consider
\[
\log p(x \mid z) = -\frac{(z_1 - x)^2}{2\sigma_1^2} - ax + \text{const.}
\]
Setting the derivative with respect to x to zero:
\[
\sigma_1^{-2}(z_1 - x) - a = 0
\quad\Longrightarrow\quad
\hat{x} = z_1 - \sigma_1^2 a
\]
Since the prior forces x ≥ 0,
\[
\hat{x} = \begin{cases} z_1 - \sigma_1^2 a & \text{if } z_1 > \sigma_1^2 a \\ 0 & \text{otherwise} \end{cases}
\]

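A numerical sketch of part (c) (an illustrative addition; z1, σ1 and a are arbitrary assumptions): maximising the log-posterior on a grid should agree with the clipped closed form.

```python
# Numerical check of question 5(c): grid-maximise the log-posterior and
# compare with max(z1 - sigma1^2 * a, 0).
import numpy as np

z1, s1, a = 2.0, 0.5, 3.0
xs = np.linspace(0.0, 5.0, 500001)
logpost = -0.5 * (z1 - xs)**2 / s1**2 - a * xs
print(xs[np.argmax(logpost)], max(z1 - s1**2 * a, 0.0))   # both ~ 1.25
```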
(d) Suppose in part (b) that
x̄ = 10, σ0 = 0.3, σ1 = σ2 = 0.4
and that the sensor readings are
z1 = 10.5, z2 = 12.0.
What is the MAP estimate for x, based on these data?

z1 = 10.5, σ1 = 0.4, σ1² = 0.16
z2 = 12.0, σ2 = 0.4, σ2² = 0.16
x̄ = 10.0, σ0 = 0.3, σ0² = 0.09

\[
\hat{x}_{\mathrm{MAP}} =
\frac{\frac{10.5}{0.16} + \frac{12.0}{0.16} + \frac{10.0}{0.09}}
     {\frac{1}{0.16} + \frac{1}{0.16} + \frac{1}{0.09}}
= 10.66
\]
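The same number drops out of the precision-weighted formula from part (b); a one-line check:

```python
# Check of question 5(d): precision-weighted MAP combination.
z1, z2, xbar = 10.5, 12.0, 10.0
v1, v2, v0 = 0.4**2, 0.4**2, 0.3**2        # variances 0.16, 0.16, 0.09
x_map = (z1/v1 + z2/v2 + xbar/v0) / (1/v1 + 1/v2 + 1/v0)
print(x_map)                               # ~ 10.66
```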

