
2 Functions of random variables

There are three main methods to find the distribution of a function of one or more random variables. These are to use the CDF, to transform the pdf directly, or to use moment generating functions. We shall study these in turn and along the way find some results which are useful for statistics.

2.1 Method of distribution functions

I shall give an example before discussing the general method.


Example 2.1. Suppose the random variable $Y$ has a pdf
\[
f_Y(y) = 3y^2, \qquad 0 < y < 1,
\]
and zero otherwise. Suppose we want to find the pdf of $U = 2Y + 3$. The range of $U$ is $3 < U < 5$. Now
\[
F_U(u) = P(U \le u) = P(2Y + 3 \le u) = P\left(Y \le \frac{u-3}{2}\right)
\]
and
\[
P\left(Y \le \frac{u-3}{2}\right) = \int_0^{(u-3)/2} f(y)\,dy = \int_0^{(u-3)/2} 3y^2\,dy = \left(\frac{u-3}{2}\right)^3
\]
so
\[
F_U(u) =
\begin{cases}
0 & u < 3 \\
\left(\dfrac{u-3}{2}\right)^3 & 3 \le u \le 5 \\
1 & u > 5
\end{cases}
\]
and
\[
f_U(u) = \frac{dF}{du} =
\begin{cases}
\frac{3}{8}(u-3)^2 & 3 \le u \le 5 \\
0 & \text{otherwise.}
\end{cases}
\]
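These notes contain no code, but a short simulation gives a useful sanity check. The sketch below, assuming NumPy is available, samples $Y$ by inverting its cdf $F_Y(y) = y^3$ and compares the empirical cdf of $U = 2Y + 3$ with the formula just derived.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample Y with pdf 3y^2 on (0,1) via the inverse CDF: F_Y(y) = y^3, so Y = V^(1/3).
v = rng.uniform(size=100_000)
y = v ** (1 / 3)
u = 2 * y + 3                      # transformed variable U = 2Y + 3

# Compare the empirical CDF of U with F_U(u) = ((u - 3) / 2)^3 at a few points.
for point in (3.5, 4.0, 4.5):
    empirical = np.mean(u <= point)
    theoretical = ((point - 3) / 2) ** 3
    print(f"u = {point}: empirical {empirical:.4f}, theoretical {theoretical:.4f}")
```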

The general method works as follows.

1. $U$ is a function of $n$ random variables $Y_1, Y_2, \ldots, Y_n$.
2. Find the region $U = u$ in $(y_1, y_2, \ldots, y_n)$ space.
3. Find the region $U \le u$.
4. Find $F_U(u) = P(U \le u)$ by integrating the density $f(y_1, y_2, \ldots, y_n)$ over the region $U \le u$.
5. Find the density function $f_U(u)$ by differentiating $F_U(u)$.

The cdf method is useful for dealing with the squares of random variables. Suppose $U = X^2$; then
\[
F_U(u) = P(U \le u) = P(X^2 \le u) = P(-\sqrt{u} \le X \le \sqrt{u})
= \int_{-\sqrt{u}}^{\sqrt{u}} f(x)\,dx = F_X(\sqrt{u}) - F_X(-\sqrt{u}).
\]

So if we differentiate both sides with respect to $u$ we find
\[
f_U(u) = f_X(\sqrt{u})\left(\frac{1}{2\sqrt{u}}\right) + f_X(-\sqrt{u})\left(\frac{1}{2\sqrt{u}}\right)
= \frac{1}{2\sqrt{u}}\left[f_X(\sqrt{u}) + f_X(-\sqrt{u})\right].
\]

So, for example, if $f_X(x) = (x+1)/2$, $-1 \le x \le 1$, and zero otherwise, and $U = X^2$, then
\[
f_U(u) = \frac{1}{2\sqrt{u}}\left[f_X(\sqrt{u}) + f_X(-\sqrt{u})\right]
= \frac{1}{2\sqrt{u}}\left[\frac{\sqrt{u}+1}{2} + \frac{-\sqrt{u}+1}{2}\right]
= \frac{1}{2\sqrt{u}}, \qquad 0 \le u \le 1.
\]
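Again as an optional check (a sketch assuming NumPy), we can sample $X$ by inverting $F_X(x) = (x+1)^2/4$, square the samples, and compare the empirical cdf of $U$ with $\sqrt{u}$, the cdf corresponding to the density $1/(2\sqrt{u})$.

```python
import numpy as np

rng = np.random.default_rng(1)

# F_X(x) = (x + 1)^2 / 4 on (-1, 1), so X = 2*sqrt(V) - 1 for V ~ U(0, 1).
v = rng.uniform(size=100_000)
x = 2 * np.sqrt(v) - 1
u = x ** 2                          # U = X^2

# The claimed density 1 / (2 sqrt(u)) integrates to the CDF sqrt(u) on (0, 1).
for point in (0.1, 0.5, 0.9):
    print(f"u = {point}: empirical {np.mean(u <= point):.4f}, "
          f"theoretical {np.sqrt(point):.4f}")
```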
Example 2.2. As a more important example suppose $Z \sim N(0,1)$ so that
\[
f_Z(z) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{z^2}{2}\right), \qquad -\infty < z < \infty.
\]
Then if $U = Z^2$
\[
f_U(u) = \frac{1}{2\sqrt{u}}\left[f_Z(\sqrt{u}) + f_Z(-\sqrt{u})\right]
= \frac{1}{2\sqrt{u}}\left[\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{u}{2}\right) + \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{u}{2}\right)\right]
= \frac{1}{\sqrt{2\pi u}}\exp\left(-\frac{u}{2}\right).
\]

We don't immediately recognise this pdf. We need a couple of definitions.

Definition 2.1. We say the random variable $Y$ has a Gamma distribution with parameters $\alpha > 0$ and $\beta > 0$, which we shall write as $Y \sim \mathrm{Ga}(\alpha, \beta)$, if
\[
f_Y(y) = \frac{\beta^\alpha y^{\alpha-1}}{\Gamma(\alpha)}\exp(-\beta y), \qquad 0 \le y < \infty,
\]
where
\[
\Gamma(\alpha) = \int_0^\infty y^{\alpha-1}\exp(-y)\,dy
\]
is the Gamma function.

Note that some textbooks define the Gamma distribution with $1/\beta$ in place of $\beta$; it does not really matter which convention is used.
Now we can see that
\[
\Gamma(1) = \int_0^\infty \exp(-y)\,dy = \left[-e^{-y}\right]_0^\infty = 1.
\]
Also, if we integrate $\Gamma(\alpha)$ by parts we see that
\[
\Gamma(\alpha) = \int_0^\infty y^{\alpha-1}\exp(-y)\,dy
= \left[y^{\alpha-1}(-e^{-y})\right]_0^\infty + \int_0^\infty (\alpha-1)y^{\alpha-2}\exp(-y)\,dy
= 0 + (\alpha-1)\int_0^\infty y^{\alpha-2}\exp(-y)\,dy
= (\alpha-1)\Gamma(\alpha-1).
\]

Note that as $\Gamma(1) = 1$ we have $\Gamma(2) = 1 \times \Gamma(1) = 1$, $\Gamma(3) = 2 \times \Gamma(2) = 2$, $\Gamma(4) = 3 \times \Gamma(3) = 6$, and so on, so that if $n$ is an integer $\Gamma(n) = (n-1)!$. It is possible to show, but we are not going to, that $\Gamma(\tfrac{1}{2}) = \sqrt{\pi}$.

Definition 2.2. We say that a random variable $Y$ with a $\mathrm{Ga}(\nu/2, 1/2)$ distribution, where $\nu$ is an integer, has a Chi-Square distribution with $\nu$ degrees of freedom, and we write it as $Y \sim \chi^2_\nu$. ($\nu$ is the Greek letter nu.)
Example 2.3. We showed in Example 2.2 that the square of a standard normal random variable had pdf
\[
f_U(u) = \frac{1}{\sqrt{2\pi u}}\exp\left(-\frac{u}{2}\right).
\]
We can rewrite this, using the results above, as
\[
f_U(u) = \frac{(1/2)^{1/2}\, u^{-1/2}}{\Gamma(1/2)}\exp(-u/2)
\]
and so $U$ has a $\mathrm{Ga}(1/2, 1/2)$ or $\chi^2_1$ distribution.

So we have proved the following theorem.
Theorem 2.1. If the random variable $Z \sim N(0,1)$ then $Z^2 \sim \chi^2_1$.
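As an optional numerical illustration of Theorem 2.1 (not part of the notes; it assumes NumPy and SciPy are available), a Kolmogorov–Smirnov test compares squared standard normal samples with the $\chi^2_1$ cdf.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
z = rng.standard_normal(100_000)

# Kolmogorov-Smirnov test of Z^2 against the chi-square(1) CDF; a large
# p-value is consistent with Theorem 2.1.
result = stats.kstest(z ** 2, stats.chi2(df=1).cdf)
print(result.statistic, result.pvalue)
```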

Example 2.4. Probability Integral Transform.


Suppose $X$ is a random variable with cdf $F_X$. We can ask: what is the distribution of $Y = F_X(X)$? The cdf $F_X$ is non-decreasing and $F_X^{-1}(y)$ can be defined for $0 < y < 1$ as the smallest $x$ satisfying $F_X(x) \ge y$. Therefore
\[
P(Y \le y) = P(F_X(X) \le y) = P(X \le F_X^{-1}(y)) = F_X(F_X^{-1}(y)) = y, \qquad 0 < y < 1,
\]

so that $Y$ has a uniform distribution on the interval $(0,1)$, $Y \sim U(0,1)$.

Conversely, if $U \sim U(0,1)$, so that $P(U \le u) = u$, then if $X = F_X^{-1}(U)$
\[
P(X \le x) = P(F_X^{-1}(U) \le x) = P(U \le F_X(x)) = F_X(x).
\]

This result is useful because it allows us to simulate values from particular distributions, which is very helpful in many modern statistical methods. There are lots of ways of generating $u_i$, a realisation of a $U(0,1)$ rv. So if we want to generate from a distribution with cdf $F_X$ we find $x_i = F_X^{-1}(u_i)$. For example, if $F_X(x) = 1 - e^{-\lambda x}$ for $x > 0$, the cdf of an exponential distribution with mean $\lambda^{-1}$, then $F_X^{-1}(u) = -\lambda^{-1}\ln(1-u)$.
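A minimal sketch of this simulation recipe, assuming NumPy is available and taking $\lambda = 2$ purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
lam = 2.0                               # rate parameter lambda (illustrative choice)

# Probability integral transform: if U ~ U(0,1) then F^{-1}(U) has cdf F.
# For the exponential, F(x) = 1 - exp(-lam*x), so F^{-1}(u) = -log(1 - u) / lam.
u = rng.uniform(size=100_000)
x = -np.log(1 - u) / lam

print("sample mean:", x.mean(), "(theory: 1/lambda =", 1 / lam, ")")
```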

2.2 Method of direct transformation

In MTH4106 you saw how to transform a random variable by a monotone function. The theorem, which I will not prove again, was as follows:
Theorem 2.2. Let $X$ be a continuous random variable with probability density function $f_X$ and support $I$, where $I = [a,b]$. Let $g : I \to \mathbb{R}$ be a continuous monotonic function with inverse function $h : J \to I$, where $J = g(I)$. Let $Y = g(X)$. Then the probability density function $f_Y$ of $Y$ satisfies
\[
f_Y(y) =
\begin{cases}
f_X(h(y))\,|h'(y)| & \text{if } y \in J \\
0 & \text{otherwise.}
\end{cases}
\]

Example 2.5. Suppose $X$ has the density
\[
f_X(x) = \frac{\theta}{x^{\theta+1}}, \qquad x > 1,
\]
and zero otherwise, where $\theta$ is a positive parameter. This is an example of a Pareto distribution. We want to find the density of $Y = \ln X$. As the support of $X$, i.e. the range on which the density is non-zero, is $x > 1$, the support of $Y$ is $y > 0$. The inverse transformation is $x = e^y$ and $\frac{dx}{dy} = e^y$. Therefore
\[
f_Y(y) = \frac{\theta}{(e^y)^{\theta+1}} \times e^y = \theta e^{-\theta y}, \qquad y > 0,
\]

and so Y has an exponential distribution.
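As an optional check (assuming NumPy, with $\theta = 3$ chosen purely for illustration), we can sample from the Pareto density by inverting $F_X(x) = 1 - x^{-\theta}$ and confirm that $\ln X$ behaves like an exponential variable with mean $1/\theta$.

```python
import numpy as np

rng = np.random.default_rng(4)
theta = 3.0                              # illustrative value of the parameter

# Sample from the Pareto density theta / x^(theta+1), x > 1:
# F_X(x) = 1 - x^(-theta), so X = (1 - U)^(-1/theta).
u = rng.uniform(size=100_000)
x = (1 - u) ** (-1 / theta)
y = np.log(x)                            # Y = ln X

# Y should be exponential with rate theta: mean 1/theta, variance 1/theta^2.
print("mean:", y.mean(), "expected:", 1 / theta)
print("variance:", y.var(), "expected:", 1 / theta ** 2)
```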

There is an analogous theorem for transforming two, or indeed $n$, random variables. I am not going to prove this as a theorem (see results in calculus courses), so I will just indicate the method.

Suppose $X_1, X_2$ have joint pdf $f_{X_1,X_2}(x_1,x_2)$ with support $A = \{(x_1,x_2) : f(x_1,x_2) > 0\}$. We are interested in the random variables $Y_1 = g_1(X_1,X_2)$ and $Y_2 = g_2(X_1,X_2)$. The transformation $y_1 = g_1(x_1,x_2)$, $y_2 = g_2(x_1,x_2)$ is a 1-1 transformation of $A$ onto $B$. So there is an inverse transformation $x_1 = g_1^{-1}(y_1,y_2)$, $x_2 = g_2^{-1}(y_1,y_2)$. Let the determinant $J$, given by
\[
J = \begin{vmatrix}
\dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} \\[2ex]
\dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2}
\end{vmatrix},
\]
be the Jacobian of the transformation, where we assume the partial derivatives are continuous and $J \ne 0$ for $(y_1,y_2) \in B$. Then the joint pdf of $Y_1 = g_1(X_1,X_2)$ and $Y_2 = g_2(X_1,X_2)$ is
\[
f_{Y_1,Y_2}(y_1,y_2) = |J|\, f_{X_1,X_2}\bigl(g_1^{-1}(y_1,y_2),\, g_2^{-1}(y_1,y_2)\bigr), \qquad (y_1,y_2) \in B.
\]

We will look at some examples.


Example 2.6. $X_1$ and $X_2$ have joint pdf
\[
f(x_1,x_2) = \exp(-(x_1 + x_2)), \qquad x_1 \ge 0,\; x_2 \ge 0.
\]
Consider the transformation $y_1 = x_1$ and $y_2 = x_1 + x_2$, with inverse $x_1 = y_1$, $x_2 = y_2 - y_1$. The set $B = \{(y_1,y_2) : 0 \le y_1 \le y_2 < \infty\}$. The Jacobian is
\[
J = \begin{vmatrix} 1 & 0 \\ -1 & 1 \end{vmatrix} = 1.
\]
So the joint pdf of $Y_1$ and $Y_2$ is given by
\[
f_{Y_1,Y_2}(y_1,y_2) = 1 \times \exp(-y_2), \qquad 0 \le y_1 \le y_2 < \infty.
\]
If we want the pdf of $Y_2 = X_1 + X_2$ we must find the marginal pdf of $Y_2$ by integrating out $Y_1$:
\[
f_{Y_2}(y_2) = \int_0^{y_2} e^{-y_2}\,dy_1 = y_2 e^{-y_2}, \qquad 0 \le y_2 < \infty.
\]

Note that in this example, as we started with two random variables, we have to transform to two random variables. If we are only interested in one of them we can integrate out the other.
Example 2.7. $X_1$ and $X_2$ have joint pdf
\[
f(x_1,x_2) = 8x_1 x_2, \qquad 0 < x_1 < x_2 < 1.
\]
Suppose we want to find the pdf of $Y_1 = X_1/X_2$. We need another variable; we choose $Y_2 = X_2$ as we can then find the inverse easily. The inverse is $x_1 = y_1 y_2$, $x_2 = y_2$. The Jacobian is
\[
J = \begin{vmatrix} y_2 & y_1 \\ 0 & 1 \end{vmatrix} = y_2.
\]
We have $A = \{0 < x_1 < x_2 < 1\}$, which implies that $B = \{0 < y_1 y_2 < y_2 < 1\}$, which implies that $B = \{0 < y_1 < 1,\; 0 < y_2 < 1\}$. So
\[
f(y_1,y_2) = 8(y_1 y_2) y_2 \times y_2 = 8 y_1 y_2^3, \qquad (y_1,y_2) \in B.
\]
Thus the marginal pdf of $Y_1$ is
\[
f(y_1) = \int_0^1 8 y_1 y_2^3\,dy_2 = 8 y_1 \left[\frac{y_2^4}{4}\right]_0^1 = 2 y_1, \qquad 0 < y_1 < 1.
\]
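As an optional check of this calculation (assuming NumPy), we can simulate $(X_1, X_2)$ and compare the empirical cdf of $Y_1 = X_1/X_2$ with $y_1^2$, the cdf corresponding to the density $2y_1$. The sampler below uses the factorisation of the joint density into the marginal of $X_2$ (which is $4x_2^3$) and the conditional of $X_1$ given $X_2$ (which is $2x_1/x_2^2$); this factorisation is not derived in the notes but follows directly from $f(x_1,x_2) = 8x_1x_2$.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000

# Sample (X1, X2) from f(x1, x2) = 8 x1 x2 on 0 < x1 < x2 < 1:
# marginal of X2 is 4 x2^3 (CDF x2^4); conditional of X1 given X2 = x2 is
# 2 x1 / x2^2 on (0, x2) (CDF x1^2 / x2^2).  Both are inverted directly.
x2 = rng.uniform(size=n) ** 0.25
x1 = x2 * np.sqrt(rng.uniform(size=n))

y1 = x1 / x2                             # Y1 = X1 / X2

# The claimed density 2*y1 on (0, 1) corresponds to the CDF y1^2.
for point in (0.25, 0.5, 0.75):
    print(f"y1 = {point}: empirical {np.mean(y1 <= point):.4f}, "
          f"theoretical {point ** 2:.4f}")
```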

2.3 Method of moment generating functions

The moment generating function of a random variable $X$, written as $M_X(t)$, is defined by
\[
M_X(t) = E[e^{tX}]
\]
and is defined for $t$ in a region about 0, $-h < t < h$ for some $h$. Why is $M_X(t)$ useful?

First note that $M_X(0) = 1$. Differentiating $M_X(t)$ with respect to $t$, assuming $X$ is continuous, we have
\[
M_X'(t) = \frac{d}{dt}\int e^{tx} f(x)\,dx = \int x e^{tx} f(x)\,dx,
\]
so
\[
M_X'(0) = \int x f(x)\,dx = E[X],
\]
where we assume we can take $\frac{d}{dt}$ inside the integral. Similarly
\[
M_X''(t) = \frac{d^2}{dt^2}\int e^{tx} f(x)\,dx = \int x^2 e^{tx} f(x)\,dx,
\]
so
\[
M_X''(0) = \int x^2 f(x)\,dx = E[X^2].
\]
Hence $\mathrm{Var}[X] = M_X''(0) - (M_X'(0))^2$.


If we can calculate the value of the integral (or sum for a discrete rv)
in terms of t then we can find the moments of X by differentiation.
Example 2.8. Suppose $X$ is a discrete binomial random variable with probability mass function
\[
f(x) = \binom{n}{x} p^x (1-p)^{n-x}, \qquad x = 0, 1, \ldots, n.
\]

Then the moment generating function is
\[
M(t) = \sum_{x=0}^n e^{tx}\binom{n}{x} p^x (1-p)^{n-x}
= \sum_{x=0}^n \binom{n}{x} (pe^t)^x (1-p)^{n-x}
= \bigl(pe^t + (1-p)\bigr)^n.
\]
Thus
\[
M'(t) = n\bigl(pe^t + (1-p)\bigr)^{n-1} pe^t,
\]
so $M'(0) = np = E[X]$. Also
\[
M''(t) = n(n-1)\bigl(pe^t + (1-p)\bigr)^{n-2} p^2 e^{2t} + n\bigl(pe^t + (1-p)\bigr)^{n-1} pe^t,
\]
so $M''(0) = n(n-1)p^2 + np$ and $\mathrm{Var}[X] = n(n-1)p^2 + np - (np)^2 = np - np^2 = np(1-p)$.
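If SymPy is available, the differentiation in this example can be checked symbolically; this is an optional sketch, not part of the notes.

```python
import sympy as sp

t, p, n = sp.symbols('t p n', positive=True)
M = (p * sp.exp(t) + 1 - p) ** n         # binomial mgf from Example 2.8

EX = sp.diff(M, t).subs(t, 0)            # M'(0)  -> E[X]
EX2 = sp.diff(M, t, 2).subs(t, 0)        # M''(0) -> E[X^2]
var = EX2 - EX ** 2

# Both differences simplify to 0, confirming E[X] = np and Var[X] = np(1 - p).
print(sp.simplify(EX - n * p))
print(sp.simplify(var - n * p * (1 - p)))
```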
Example 2.9. Consider an exponential distribution with pdf $f(x) = \lambda e^{-\lambda x}$, $x > 0$. Then, for $t < \lambda$,
\[
M(t) = \int_0^\infty e^{tx}\lambda e^{-\lambda x}\,dx
= \int_0^\infty \lambda e^{-x(\lambda - t)}\,dx
= \lambda\left[-\frac{e^{-x(\lambda - t)}}{\lambda - t}\right]_0^\infty
= \frac{\lambda}{\lambda - t}.
\]
So $M'(t) = \lambda(\lambda - t)^{-2}$, $M'(0) = \lambda^{-1} = E[X]$, and $M''(t) = 2\lambda(\lambda - t)^{-3}$, $M''(0) = 2\lambda^{-2}$, and hence $\mathrm{Var}[X] = 2\lambda^{-2} - (\lambda^{-1})^2 = \lambda^{-2}$.

Example 2.10. Suppose $X$ has a Gamma distribution, $\mathrm{Ga}(\alpha, \beta)$. We find the mgf of $X$ as follows.
\[
M_X(t) = \int_0^\infty e^{tx}\frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1}\exp(-\beta x)\,dx
= \int_0^\infty \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1}\exp(-x(\beta - t))\,dx
= \frac{\beta^\alpha}{(\beta - t)^\alpha}\int_0^\infty \frac{(\beta - t)^\alpha}{\Gamma(\alpha)} x^{\alpha-1}\exp(-x(\beta - t))\,dx.
\]
Now the integral is of the pdf of a $\mathrm{Ga}(\alpha, \beta - t)$ random variable and so is equal to 1. Note we have to have $t < \beta$ to make this work. That is ok, so long as we can let $t \to 0$, which we can. Thus the mgf for a $\mathrm{Ga}(\alpha, \beta)$ random variable $X$ is
\[
M_X(t) = \left(\frac{\beta}{\beta - t}\right)^\alpha.
\]
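An optional Monte Carlo check of this formula (assuming NumPy; the parameter values and the choice $t = 0.4$ are purely illustrative, with $t < \beta$):

```python
import numpy as np

rng = np.random.default_rng(6)
alpha, beta = 2.5, 1.5                   # illustrative parameter values
t = 0.4                                  # any t < beta

# Draw Ga(alpha, beta) samples (NumPy parametrises by shape and scale = 1/beta).
x = rng.gamma(shape=alpha, scale=1 / beta, size=200_000)

empirical = np.exp(t * x).mean()         # Monte Carlo estimate of E[exp(tX)]
theoretical = (beta / (beta - t)) ** alpha
print(empirical, theoretical)
```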

We now find the mgf of a normally distributed rv.


Example 2.11. Suppose $X$ has a normal distribution, $N(\mu, \sigma^2)$; find the mgf of $X$.
\[
M_X(t) = \int_{-\infty}^\infty e^{tx}\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) dx
= \int_{-\infty}^\infty \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{1}{2\sigma^2}[x^2 - 2\mu x + \mu^2 - 2\sigma^2 tx]\right) dx
= \int_{-\infty}^\infty \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{1}{2\sigma^2}[x^2 - 2x(\mu + \sigma^2 t) + \mu^2]\right) dx
\]
Now we complete the square:
\[
[x^2 - 2x(\mu + \sigma^2 t) + \mu^2] = [x - (\mu + \sigma^2 t)]^2 + \mu^2 - (\mu + \sigma^2 t)^2
= [x - (\mu + \sigma^2 t)]^2 - (2\mu\sigma^2 t + \sigma^4 t^2)
\]
and so, as the final bracket does not depend on $x$, we can take it outside the integral to give
\[
M_X(t) = \left(\int_{-\infty}^\infty \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{1}{2\sigma^2}[x - (\mu + \sigma^2 t)]^2\right) dx\right)\exp\left(\frac{2\mu\sigma^2 t + \sigma^4 t^2}{2\sigma^2}\right).
\]
Now the function inside the integral is the pdf of a $N(\mu + \sigma^2 t, \sigma^2)$ rv and so the integral is equal to one. Thus
\[
M_X(t) = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right).
\]
Now differentiating the mgf we find
\[
M'(t) = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right)(\mu + \sigma^2 t),
\]
so $M'(0) = \mu = E[X]$, and
\[
M''(t) = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right)(\mu + \sigma^2 t)^2 + \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right)\sigma^2,
\]
so $M''(0) = \mu^2 + \sigma^2$ and hence $\mathrm{Var}[X] = \mu^2 + \sigma^2 - \mu^2 = \sigma^2$.

The following theorem, which we won't prove, tells us why we can use the mgf to find the distributions of transformed variables.

Theorem 2.3. If $X_1$ and $X_2$ are random variables and $M_{X_1}(t) = M_{X_2}(t)$ for all $t$ in an interval around 0, then $X_1$ and $X_2$ have the same distribution.
Example 2.12. Suppose $Z \sim N(0,1)$ and $Y = Z^2$. Find the distribution of $Y$ using the mgf technique.

We have that, for $t < 1/2$,
\[
M_Y(t) = E[e^{tY}] = E[e^{tZ^2}]
= \int_{-\infty}^\infty e^{tz^2}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{z^2}{2}\right) dz
= \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{z^2(1-2t)}{2}\right) dz
= (1-2t)^{-1/2}\int_{-\infty}^\infty \frac{(1-2t)^{1/2}}{\sqrt{2\pi}}\exp\left(-\frac{(1-2t)}{2}z^2\right) dz.
\]
Now the function inside the integral is the pdf of a $N(0, (1-2t)^{-1})$ rv and so the integral equals one. Therefore
\[
M_Y(t) = \left(\frac{1}{1-2t}\right)^{1/2} = \left(\frac{1/2}{1/2 - t}\right)^{1/2},
\]
which is the mgf of a $\mathrm{Ga}(1/2, 1/2)$ rv or, equivalently, of a $\chi^2_1$ rv. Thus the distribution of $Y$ is $\chi^2_1$.
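An optional Monte Carlo check (assuming NumPy): for small $t < 1/2$ the sample average of $\exp(tY)$ over simulated $Y = Z^2$ should be close to $(1-2t)^{-1/2}$.

```python
import numpy as np

rng = np.random.default_rng(7)
z = rng.standard_normal(200_000)
y = z ** 2

# Compare the Monte Carlo estimate of E[exp(tY)] with (1 - 2t)^(-1/2), t < 1/2.
for t in (0.1, 0.2):
    print(f"t = {t}: empirical {np.exp(t * y).mean():.4f}, "
          f"theoretical {(1 - 2 * t) ** -0.5:.4f}")
```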

The moment generating function is also useful for proving other results, for example results about sums of random variables. Suppose $X_1, X_2, \ldots, X_n$ are independent rvs with mgfs $M_{X_i}(t)$, $i = 1, \ldots, n$. Let $Y = \sum_i X_i$; then
\[
M_Y(t) = \prod_{i=1}^n M_{X_i}(t).
\]
This is easily proved:
\[
M_Y(t) = E[e^{tY}] = E[e^{t\sum X_i}] = E[e^{tX_1} e^{tX_2}\cdots e^{tX_n}]
= E[e^{tX_1}]\,E[e^{tX_2}]\cdots E[e^{tX_n}] \quad \text{(by independence)}
= M_{X_1}(t)\,M_{X_2}(t)\cdots M_{X_n}(t).
\]

Example 2.13. If $X_1, X_2, \ldots, X_n$ are independent, each with an exponential distribution with mean $\lambda^{-1}$, show that $Y = \sum X_i$ has a $\mathrm{Ga}(n, \lambda)$ distribution.

We showed in Example 2.9 that for an exponential distribution $M_{X_i}(t) = \lambda/(\lambda - t)$. Thus
\[
M_Y(t) = \prod_{i=1}^n \frac{\lambda}{\lambda - t} = \left(\frac{\lambda}{\lambda - t}\right)^n,
\]
which is the mgf of a $\mathrm{Ga}(n, \lambda)$ rv.
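An optional simulation check (assuming NumPy and SciPy; $n = 5$ and $\lambda = 2$ are illustrative choices): sums of exponential samples are compared with the $\mathrm{Ga}(n, \lambda)$ distribution via a Kolmogorov–Smirnov test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n, lam = 5, 2.0                           # illustrative values

# Sum n independent exponential(rate lam) variables, 100000 times over.
x = rng.exponential(scale=1 / lam, size=(100_000, n))
y = x.sum(axis=1)

# Compare with the Ga(n, lam) distribution (SciPy uses shape a and scale 1/lam).
result = stats.kstest(y, stats.gamma(a=n, scale=1 / lam).cdf)
print(result.statistic, result.pvalue)
```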


Example 2.14. Suppose $Y_1, Y_2, \ldots, Y_n$ are independent and normally distributed with means $E[Y_i] = \mu_i$ and variances $\mathrm{Var}[Y_i] = \sigma_i^2$. Define $U = a_1 Y_1 + a_2 Y_2 + \cdots + a_n Y_n$ where $a_i$, $i = 1, 2, \ldots, n$, are constants. Show that $U$ is normally distributed with mean $E[U] = \sum a_i\mu_i$ and variance $\mathrm{Var}[U] = \sum a_i^2\sigma_i^2$.

The mgf of $Y_i$ is $M_{Y_i}(t) = \exp(\mu_i t + \frac{1}{2}\sigma_i^2 t^2)$, so the mgf of $a_i Y_i$ is $M_{a_i Y_i}(t) = E[e^{a_i Y_i t}] = \exp(\mu_i a_i t + \frac{1}{2}\sigma_i^2 a_i^2 t^2)$. The $Y_i$ are independent, so the $a_i Y_i$ are independent. Hence
\[
M_U(t) = \prod_i M_{a_i Y_i}(t) = \prod_i \exp\left(\mu_i a_i t + \frac{1}{2}\sigma_i^2 a_i^2 t^2\right)
= \exp\left(\Bigl(\sum a_i\mu_i\Bigr) t + \frac{1}{2}\Bigl(\sum a_i^2\sigma_i^2\Bigr) t^2\right).
\]
Comparing this with the mgf of a normal we see that $U$ is normal with mean $\sum a_i\mu_i$ and variance $\sum a_i^2\sigma_i^2$.
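An optional simulation check of the mean and variance (assuming NumPy; the particular $\mu_i$, $\sigma_i$ and $a_i$ below are illustrative, and the check does not verify normality itself):

```python
import numpy as np

rng = np.random.default_rng(9)
mu = np.array([1.0, -2.0, 0.5])          # illustrative means mu_i
sigma = np.array([1.0, 2.0, 0.5])        # illustrative standard deviations sigma_i
a = np.array([2.0, -1.0, 3.0])           # illustrative constants a_i

y = rng.normal(loc=mu, scale=sigma, size=(200_000, 3))
u = y @ a                                # U = a1*Y1 + a2*Y2 + a3*Y3

print("mean:", u.mean(), "expected:", a @ mu)
print("variance:", u.var(), "expected:", (a ** 2) @ (sigma ** 2))
```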

We give the next result, about sums of squares of standard normal rvs, as a theorem as the result is important.

Theorem 2.4. Suppose $Y_1, \ldots, Y_n$ are independent, normally distributed with means $E[Y_i] = \mu_i$ and variances $\mathrm{Var}[Y_i] = \sigma_i^2$. Let $Z_i = (Y_i - \mu_i)/\sigma_i$ so that $Z_1, \ldots, Z_n$ are independent and each has a $N(0,1)$ distribution. Then $\sum_i Z_i^2$ has a $\chi^2_n$ distribution.

Proof. We have seen before that each $Z_i^2$ has a $\chi^2_1$ distribution, so $M_{Z_i^2}(t) = (1-2t)^{-1/2}$. Let $V = \sum_i Z_i^2$. Then
\[
M_V(t) = \prod_i M_{Z_i^2}(t) = \frac{1}{(1-2t)^{n/2}} = \left(\frac{1/2}{1/2 - t}\right)^{n/2},
\]
but this is the mgf of a $\mathrm{Ga}(n/2, 1/2)$ random variable, that is, a $\chi^2_n$ rv.
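An optional simulation check of Theorem 2.4 (assuming NumPy and SciPy; $n = 4$ is an illustrative choice):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
n = 4                                     # illustrative number of terms

z = rng.standard_normal(size=(100_000, n))
v = (z ** 2).sum(axis=1)                  # V = sum of n squared N(0,1) variables

# Compare with the chi-square distribution with n degrees of freedom.
result = stats.kstest(v, stats.chi2(df=n).cdf)
print(result.statistic, result.pvalue)
```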

