
STAT272 Probability

Topic 5
Moments

STAT272 2015 Topic 5 1


Expectation so far
• We have so far considered $E(X)$, the mean or expected value of $X$ (denoted $\mu_X$), and $E\{(X-\mu_X)^2\}$, the variance of $X$ (denoted $\sigma_X^2$).

• We can also calculate the expected values of higher powers of $(X-\mu_X)$, e.g. $E\{(X-\mu_X)^3\}$, $E\{(X-\mu_X)^4\}$, etc.

• These expectations, together with the expectations of powers of $X$ itself, are called the moments of $X$.



Moments
• There are two types of moment: “raw” and “central”.
• The $k$th raw moment of $X$ is defined by
$$\mu'_k = E(X^k).$$

• The $k$th central moment of $X$ is defined by
$$\mu_k = E\{(X-\mu_X)^k\}.$$

• Properties:
$$\mu'_1 = E(X) = \mu_X,$$
$$\mu_1 = E\{(X-\mu_X)^1\} = 0$$



for any distribution which has a finite mean. Also,
$$\mu_2 = \operatorname{var} X = \sigma_X^2,$$
$$\mu'_0 = E(X^0) = E(1) = 1.$$

• Note: we shall drop the $X$ subscript when considering moments, as long as this is unambiguous. If there is any ambiguity, we'll use $\mu_{X,k}$ as the notation for $E\{(X-\mu_X)^k\}$, etc.



Computation of Moments

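$$\mu'_k = E(X^k) = \begin{cases} \int_{-\infty}^{\infty} x^k f_X(x)\,dx, & X \text{ continuous} \\ \sum_x x^k f_X(x), & X \text{ discrete} \end{cases}$$

$$\mu_k = E\{(X-\mu)^k\} = \begin{cases} \int_{-\infty}^{\infty} (x-\mu)^k f_X(x)\,dx, & X \text{ continuous} \\ \sum_x (x-\mu)^k f_X(x), & X \text{ discrete} \end{cases}$$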
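As a minimal sketch of the discrete formulas, the Python below evaluates $\mu'_k$ and $\mu_k$ by direct summation over the support. The pf (values 1, 2, 3 with probabilities 1/2, 1/3, 1/6) and the helper names `raw_moment` and `central_moment` are illustrative assumptions, not part of the notes; `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical example pf (not from the notes): P(X=1)=1/2, P(X=2)=1/3, P(X=3)=1/6
pf = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}


def raw_moment(pf, k):
    """mu'_k = sum over x of x^k f_X(x)."""
    return sum(x**k * p for x, p in pf.items())


def central_moment(pf, k):
    """mu_k = sum over x of (x - mu)^k f_X(x), where mu = mu'_1."""
    mu = raw_moment(pf, 1)
    return sum((x - mu) ** k * p for x, p in pf.items())


for k in range(5):
    print(k, raw_moment(pf, k), central_moment(pf, k))
```

Here $\mu'_1 = 5/3$ and $\mu_2 = 5/9$; for a continuous pdf the sums would be replaced by the corresponding integrals, evaluated analytically or numerically.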



• Central moments may be computed from raw moments by using the binomial expansion:
$$(X-\mu)^k = X^k - \binom{k}{1} X^{k-1}\mu + \binom{k}{2} X^{k-2}\mu^2 - \cdots + (-1)^k \mu^k = \sum_{i=0}^{k} (-1)^i \binom{k}{i} X^{k-i} \mu^i.$$
Hence,
$$\mu_k = E\{(X-\mu)^k\} = \sum_{i=0}^{k} (-1)^i \binom{k}{i} \mu'_{k-i}\,\mu^i,$$
where $\mu = \mu'_1 = E(X)$ and $\mu'_0 = 1$.



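• e.g.
$$\mu_2 = \sigma_X^2 = \sum_{i=0}^{2} (-1)^i \binom{2}{i} \mu'_{2-i}\,\mu^i = \mu'_2 - \binom{2}{1}\mu'_1\mu + \mu^2 = \mu'_2 - 2\mu^2 + \mu^2 = \mu'_2 - \mu^2$$
or
$$\operatorname{var} X = E(X^2) - [E(X)]^2,$$
which is a widely used formula for computing variance.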
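The binomial formula translates directly into code. This sketch (the function name `central_from_raw` is hypothetical) takes a list of raw moments with `raw[0]` $= \mu'_0 = 1$ and, assuming a fair six-sided die as the test distribution, checks the result against the definition $E\{(X-\mu)^k\}$ and against $\operatorname{var} X = E(X^2) - [E(X)]^2$.

```python
from fractions import Fraction
from math import comb


def central_from_raw(raw, k):
    """mu_k = sum_{i=0}^{k} (-1)^i C(k, i) mu'_{k-i} mu^i, with raw[0] = mu'_0 = 1."""
    mu = raw[1]
    return sum((-1) ** i * comb(k, i) * raw[k - i] * mu**i for i in range(k + 1))


# Assumed example: a fair six-sided die, so mu'_j = (1^j + ... + 6^j) / 6
raw = [Fraction(sum(x**j for x in range(1, 7)), 6) for j in range(5)]

# Direct computation of E{(X - mu)^k} from the definition, for comparison
mu = raw[1]
direct = [sum((x - mu) ** k for x in range(1, 7)) / 6 for k in range(5)]

for k in range(5):
    print(k, central_from_raw(raw, k), direct[k])

# Special case k = 2: var X = E(X^2) - [E(X)]^2
print(raw[2] - raw[1] ** 2)
```

Because the die is symmetric about $\mu = 7/2$, the third central moment comes out as exactly 0.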



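Example: Suppose $X$ is a rv with pdf
$$f_X(x) = \begin{cases} \dfrac{3}{x^4}, & 1 < x < \infty \\ 0, & x \le 1. \end{cases}$$

The raw moments are
$$\mu'_1 = \int_1^\infty x\,\frac{3}{x^4}\,dx = 3\left[\frac{x^{-2}}{-2}\right]_1^\infty = -\frac{3}{2}(0 - 1) = 1.5,$$
$$\mu'_2 = \int_1^\infty x^2\,\frac{3}{x^4}\,dx = 3\left[\frac{x^{-1}}{-1}\right]_1^\infty = -3(0 - 1) = 3,$$
$$\mu'_3 = \int_1^\infty x^3\,\frac{3}{x^4}\,dx = 3\left[\log x\right]_1^\infty = \infty,$$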



and for $k \ge 3$, $\mu'_k = \infty$.

The central moments are

$$\mu_1 = 0,$$
$$\mu_2 = \mu'_2 - (\mu'_1)^2 = 3 - (1.5)^2 = 0.75,$$
$$\mu_3 = \mu'_3 - 3\mu'_2\mu + 3\mu'_1\mu^2 - \mu^3 = \mu'_3 - 3\mu'_2\mu + 2\mu^3 = \infty.$$

Also, $\mu_k = \infty$ for $k \ge 3$.
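For this pdf the integrals above generalise to $\mu'_k = \int_1^\infty 3x^{k-4}\,dx = 3/(3-k)$ for $k < 3$, with divergence for $k \ge 3$. The sketch below encodes that closed form and, as an extra sanity check not in the notes, draws a seeded sample by inverting the cdf $F_X(x) = 1 - x^{-3}$ for $x > 1$; the sample mean should be close to $\mu'_1 = 1.5$. The seed and sample size are arbitrary assumptions.

```python
import math
import random


def raw_moment(k):
    """mu'_k = int_1^inf x^k * 3/x^4 dx = 3/(3 - k) for k < 3; infinite for k >= 3."""
    return 3 / (3 - k) if k < 3 else math.inf


mu = raw_moment(1)               # 1.5
var = raw_moment(2) - mu**2      # mu_2 = mu'_2 - (mu'_1)^2

# Inverse-cdf sampling: F(x) = 1 - x^(-3) for x > 1, so X = (1 - U)^(-1/3)
random.seed(272)
sample = [(1 - random.random()) ** (-1 / 3) for _ in range(200_000)]
sample_mean = sum(sample) / len(sample)

print(mu, var, raw_moment(3), sample_mean)
```

Because $\mu'_3 = \infty$, the sample average of $X^3$ from such draws does not settle down as the sample size grows; it tends to drift upward, which is one practical symptom of a moment failing to exist.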



Note:
• Moments need not exist.

• If either a pdf or pf is symmetric around zero, then all odd raw moments are zero (if they exist), i.e. $E(X^k) = 0$ for $k = 1, 3, 5, 7, \ldots$

• If either a pdf or pf is symmetric about its mean µ, then all odd
central moments are zero (if they exist).
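A quick exact check of the symmetry notes, assuming a hypothetical pf on $\{0, 1, 2, 3, 4\}$ that is symmetric about its mean $\mu = 2$: the odd central moments vanish, while the odd raw moments do not, because this pf is not symmetric about zero.

```python
from fractions import Fraction

# Hypothetical pf, symmetric about its mean 2
pf = {
    0: Fraction(1, 4),
    1: Fraction(1, 8),
    2: Fraction(1, 4),
    3: Fraction(1, 8),
    4: Fraction(1, 4),
}

mu = sum(x * p for x, p in pf.items())
odd = (1, 3, 5)
raw = {k: sum(x**k * p for x, p in pf.items()) for k in odd}
central = {k: sum((x - mu) ** k * p for x, p in pf.items()) for k in odd}

print(mu, raw, central)
```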



• Both raw and central moments are scale-dependent i.e. changing
the scale of measurement will change the values of the moments,
even though the “shape” of the distribution will not have
changed.
• If moments are to be used to compare the shapes of the
distributions of random variables, this dependence needs to be
“removed”.
• This can be done by standardising: the first and second moments of the standardised random variable do not depend on $\mu_X$ or $\sigma_X^2$, i.e. they are "unit-less", and the values of the standardised variable represent the number of standard deviations by which $X$ lies from its mean $\mu_X$. (The standard deviation of $X$ is defined to be $\sigma_X = \sqrt{\sigma_X^2}$.)



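• If $X$ is a random variable with mean $\mu_X$ and variance $\sigma_X^2$, then
$$U = \frac{X - \mu_X}{\sigma_X}$$
is standardised. Proof:
$$E(U) = E\left(\frac{X - \mu_X}{\sigma_X}\right) = \frac{1}{\sigma_X}\{E(X) - \mu_X\} = 0$$
and
$$\operatorname{var} U = \operatorname{var}\left(\frac{X - \mu_X}{\sigma_X}\right) = \frac{1}{\sigma_X^2}\operatorname{var} X = 1.$$
• Higher moments of $U$ can be used to measure the "shape" of $U$, and hence of $X$, since $X$ is just an affine (linear) transformation of $U$:
$$X = \mu_X + \sigma_X U.$$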



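The $k$th moment of $U$ is
$$\zeta_k = E(U^k) = E\left\{\left(\frac{X - \mu_X}{\sigma_X}\right)^k\right\} = \frac{\mu_k}{\sigma_X^k},$$
and we have
$$\zeta_1 = 0, \qquad \zeta_2 = 1, \qquad \zeta_3 = \frac{\mu_3}{\sigma_X^3}, \qquad \zeta_4 = \frac{\mu_4}{\sigma_X^4}.$$

$\zeta_3$ is called the skewness and $\zeta_4$ the kurtosis.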
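To see that $\zeta_k$ removes the dependence on location and scale, the sketch below (the helper `standardised_moment` is a hypothetical name) computes $\zeta_1, \ldots, \zeta_4$ for a small discrete pf and for $Y = 10 + 4X$, an assumed change of origin and units. The two columns should agree up to floating-point rounding, even though the raw and central moments of $X$ and $Y$ differ.

```python
import math


def standardised_moment(pf, k):
    """zeta_k = mu_k / sigma^k for a discrete pf given as {value: probability}."""
    mu = sum(x * p for x, p in pf.items())
    var = sum((x - mu) ** 2 * p for x, p in pf.items())
    mu_k = sum((x - mu) ** k * p for x, p in pf.items())
    return mu_k / math.sqrt(var) ** k


# Hypothetical pf with a longer right tail
pf_x = {1: 1 / 2, 2: 1 / 3, 3: 1 / 6}
# Y = 10 + 4X: same shape, different location and scale (assumed a = 10, b = 4 > 0)
pf_y = {10 + 4 * x: p for x, p in pf_x.items()}

for k in (1, 2, 3, 4):
    print(k, standardised_moment(pf_x, k), standardised_moment(pf_y, k))
```

With a negative scale factor ($Y = a + bX$, $b < 0$) the even $\zeta_k$ are unchanged but the odd ones change sign, since the standardised version of $Y$ is then $-U$.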



• If either the pdf or pf is symmetric about its mean and the third
moment exists, then ζ3 = 0, i.e. symmetry implies ζ3 = 0.
• Note that ζ3 = 0 does not imply symmetry.
• Generally, if ζ3 > 0, then the pdf or pf shows a long tail to the
right, i.e. the distribution is right or positively skewed.
• If ζ3 < 0, then the distribution displays a long tail towards the
left, i.e. the distribution is left or negatively skewed.
• ζ4 is a measure of the degree of central flatness of the distribution. This flatness near the centre (or, at the other end of the scale, peakedness near the centre) is usually compared with that of the normal pdf.
• If $Z \sim N(0, 1)$, $\zeta_3 = \mu_3/\sigma^3 = 0$ (by symmetry).



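• Also, since the integrand is even,
$$\zeta_4 = E(Z^4) = \int_{-\infty}^{\infty} z^4 \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}z^2}\,dz = \frac{2}{\sqrt{2\pi}} \int_0^\infty z^4 e^{-\frac{1}{2}z^2}\,dz.$$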



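Put $w = \frac{1}{2}z^2$. Then $z^2 = 2w$, $z = \sqrt{2}\,w^{1/2}$ and $dz = \frac{\sqrt{2}}{2}\,w^{-1/2}\,dw$. Thus
$$\zeta_4 = \frac{2}{\sqrt{2\pi}} \int_0^\infty 4w^2 e^{-w}\,\frac{\sqrt{2}}{2}\,w^{-1/2}\,dw$$
$$= \frac{4}{\sqrt{\pi}} \int_0^\infty w^{3/2} e^{-w}\,dw$$
$$= \frac{4}{\sqrt{\pi}}\,\Gamma\left(\frac{5}{2}\right)$$
$$= \frac{4}{\sqrt{\pi}} \times \frac{3}{2} \times \frac{1}{2} \times \Gamma\left(\frac{1}{2}\right)$$
$$= 3,$$
since $\Gamma(1/2) = \sqrt{\pi}$.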

• If ζ4 > 3, we conclude that the pdf is more peaked than the normal (leptokurtic).



• If ζ4 < 3, it is less peaked than the normal (platykurtic).
• Higher moments for the standardised random variables can be
calculated. However, the values of these moments do not give
any useful information concerning the shape of the distribution.
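The value $\zeta_4 = 3$ for the standard normal can be checked numerically. This sketch evaluates the closed form $(4/\sqrt{\pi})\,\Gamma(5/2)$ from the derivation with `math.gamma`, and separately approximates $\int z^4 \phi(z)\,dz$ with a trapezoidal rule truncated to $[-12, 12]$; the truncation limits and step count are assumptions chosen so that the neglected tails are negligible.

```python
import math

# Closed form from the derivation: zeta_4 = (4 / sqrt(pi)) * Gamma(5/2)
closed_form = 4 / math.sqrt(math.pi) * math.gamma(2.5)


def normal_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def integrand(z):
    return z**4 * normal_pdf(z)


# Trapezoidal rule for E(Z^4) = int z^4 phi(z) dz, truncated to [a, b]
a, b, n = -12.0, 12.0, 24_000
h = (b - a) / n
numeric = h * (0.5 * (integrand(a) + integrand(b))
               + sum(integrand(a + i * h) for i in range(1, n)))

print(closed_form, numeric)
```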

Uses of moments

• Moments measure aspects of the shapes of distributions (location, spread, skewness, kurtosis etc.) and allow comparisons.
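• In general, knowing the full set of moments is not enough to determine a distribution uniquely.
• Under certain conditions (for example, if the moment generating functions of $X$ and $Y$ exist in a neighbourhood of zero), if two rvs $X$ and $Y$ have $\mu'_{X,k} = \mu'_{Y,k}$ for $k = 1, 2, 3, \ldots$, then they must have the same cdf.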



Example (Non-unique moments)

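Let $X_1$ and $X_2$ be two rvs with the pdfs given by
$$f_1(x) = \frac{1}{\sqrt{2\pi}\,x} e^{-(\log x)^2/2}, \quad 0 < x < \infty,$$
$$f_2(x) = f_1(x)[1 + \sin(2\pi \log x)], \quad 0 < x < \infty.$$
It can be shown that $E(X_1^k) = e^{k^2/2}$, $k = 1, 2, \ldots$, and
$$E(X_2^k) = \int_0^\infty x^k f_1(x)[1 + \sin(2\pi \log x)]\,dx$$
$$= E(X_1^k) + \int_0^\infty x^k f_1(x) \sin(2\pi \log x)\,dx$$
$$= E(X_1^k) + 0$$
$$= E(X_1^k), \quad k = 1, 2, \ldots$$
So $X_1$ and $X_2$ have identical moments of every order, yet their distributions differ.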
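The claim $E(X_2^k) = E(X_1^k)$ can be checked numerically. Substituting $y = \log x$ turns $x^k f_1(x)\,dx$ into $e^{ky}\phi(y)\,dy$, where $\phi$ is the standard normal pdf, so both expectations become integrals over the real line. The sketch below approximates them with a trapezoidal rule on $[k - 12, k + 12]$ (the limits and grid size are assumptions); $k = 0$ is included to confirm that $f_2$ also integrates to 1.

```python
import math


def phi(y):
    """Standard normal density."""
    return math.exp(-0.5 * y * y) / math.sqrt(2 * math.pi)


def trapezoid(g, a, b, n=20_000):
    """Composite trapezoidal rule for int_a^b g(y) dy."""
    h = (b - a) / n
    return h * (0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, n)))


def moments(k):
    """Return (E(X1^k), E(X2^k)) using y = log x, so x^k f1(x) dx = e^{ky} phi(y) dy."""
    a, b = k - 12.0, k + 12.0  # e^{ky} phi(y) is a normal shape centred at y = k
    m1 = trapezoid(lambda y: math.exp(k * y) * phi(y), a, b)
    extra = trapezoid(lambda y: math.exp(k * y) * phi(y) * math.sin(2 * math.pi * y), a, b)
    return m1, m1 + extra


for k in (0, 1, 2, 3):
    m1, m2 = moments(k)
    print(k, m1, m2, math.exp(k * k / 2))
```

The sine term vanishes because $e^{ky}\phi(y) = e^{k^2/2}\phi(y - k)$; shifting $y = t + k$ gives $e^{k^2/2}\int \phi(t)\sin(2\pi t + 2\pi k)\,dt = e^{k^2/2}\int \phi(t)\sin(2\pi t)\,dt$ for integer $k$, the integral of an odd function.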



Moments of beta distribution

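• Suppose $X \sim \text{Beta}(2, 3)$. Then
$$f_X(x) = \begin{cases} 12x(1-x)^2, & 0 < x < 1 \\ 0, & \text{elsewhere.} \end{cases}$$

• Raw moments: in general, for $X \sim \text{Beta}(\alpha, \beta)$,
$$E(X^j) = \frac{1}{B(\alpha, \beta)} \int_0^1 x^j x^{\alpha-1}(1-x)^{\beta-1}\,dx$$
$$= \frac{B(\alpha + j, \beta)}{B(\alpha, \beta)}$$
$$= \frac{\Gamma(\alpha + j)\,\Gamma(\beta)}{\Gamma(\alpha + \beta + j)} \cdot \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}$$
$$= \frac{(\alpha + j - 1)\cdots\alpha}{(\alpha + \beta + j - 1)\cdots(\alpha + \beta)}.$$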



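• Thus, since here $\alpha = 2$ and $\beta = 3$,
$$\mu'_1 = E(X) = \frac{\alpha}{\alpha + \beta} = \frac{2}{2 + 3} = \frac{2}{5},$$
$$\mu'_2 = E(X^2) = \frac{(\alpha + 1)\alpha}{(\alpha + \beta + 1)(\alpha + \beta)} = \frac{3 \times 2}{6 \times 5} = \frac{1}{5},$$
$$\mu'_3 = \frac{(\alpha + 2)(\alpha + 1)\alpha}{(\alpha + \beta + 2)(\alpha + \beta + 1)(\alpha + \beta)} = \frac{4 \times 3 \times 2}{7 \times 6 \times 5} = \frac{4}{35},$$
$$\mu'_4 = \frac{\alpha + 3}{\alpha + \beta + 3}\,\mu'_3 = \frac{5}{8} \times \frac{4}{35} = \frac{1}{14}.$$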



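• The central moments are
$$\mu_2 = \sigma^2 = \mu'_2 - (\mu'_1)^2 = 0.2 - (0.4)^2 = 0.04,$$
$$\mu_3 = \mu'_3 - 3\mu'_2\mu + 3\mu'_1\mu^2 - \mu^3 = \frac{4}{35} - 3\left(\frac{1}{5}\right)\left(\frac{2}{5}\right) + 3\left(\frac{2}{5}\right)\left(\frac{2}{5}\right)^2 - \left(\frac{2}{5}\right)^3 = \frac{2}{875},$$
$$\mu_4 = \mu'_4 - 4\mu'_3\mu + 6\mu'_2\mu^2 - 4\mu'_1\mu^3 + \mu^4 = \frac{1}{14} - 4\left(\frac{4}{35}\right)\left(\frac{2}{5}\right) + 6\left(\frac{1}{5}\right)\left(\frac{2}{5}\right)^2 - 3\left(\frac{2}{5}\right)^4 = \frac{33}{8750}.$$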



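• The skewness is therefore
$$\zeta_3 = \frac{\mu_3}{\sigma^3} = \frac{2/875}{(1/5)^3} = \frac{2}{7},$$
so that, since $\zeta_3 > 0$, the distribution is right skewed, or skewed to the right.

• The kurtosis is
$$\zeta_4 = \frac{\mu_4}{\sigma^4} = \frac{33/8750}{(1/5)^4} = \frac{33}{14} \approx 2.36,$$
so that, since $\zeta_4 < 3$, the distribution is less peaked than the normal distribution (platykurtic).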
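The whole Beta(2, 3) calculation can be reproduced with exact rational arithmetic. This sketch uses the product formula for $E(X^j)$ and the binomial formula for central moments; the function names are illustrative, and `fractions.Fraction` avoids rounding so the results can be compared with $2/875$, $33/8750$, $2/7$ and $33/14$ exactly (the skewness needs a square root, so it is computed in floating point).

```python
from fractions import Fraction
from math import comb, sqrt

alpha, beta = 2, 3


def beta_raw_moment(j):
    """E(X^j) = prod_{r=0}^{j-1} (alpha + r) / (alpha + beta + r) for X ~ Beta(alpha, beta)."""
    m = Fraction(1)
    for r in range(j):
        m *= Fraction(alpha + r, alpha + beta + r)
    return m


raw = [beta_raw_moment(j) for j in range(5)]  # raw[0] = mu'_0 = 1
mu = raw[1]


def central(k):
    """Binomial formula: mu_k = sum_i (-1)^i C(k, i) mu'_{k-i} mu^i."""
    return sum((-1) ** i * comb(k, i) * raw[k - i] * mu**i for i in range(k + 1))


var, mu3, mu4 = central(2), central(3), central(4)
skewness = float(mu3) / sqrt(var) ** 3
kurtosis = mu4 / var**2  # exact, since sigma^4 = (sigma^2)^2

print(raw[1:], var, mu3, mu4, skewness, kurtosis)
```

Changing `alpha` and `beta` gives the corresponding quantities for other beta distributions.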

• Often it is difficult to calculate moments directly.

• We'll soon see a technique for calculating moments indirectly, using the moment generating function (mgf), which is closely related to the Laplace transform of the pdf.

• For some distributions, it is computationally easier to find the moments (raw and/or central) directly, while for others it is easier to use the mgf.
