
NICHOLAS N.N. NSOWAH-NUAMAH

ADVANCED TOPICS IN INTRODUCTORY PROBABILITY
A FIRST COURSE IN PROBABILITY THEORY – VOLUME III

Advanced Topics in Introductory Probability: A First Course in Probability Theory – Volume III
2nd edition
© 2018 Nicholas N.N. Nsowah-Nuamah & bookboon.com
ISBN 978-87-403-2238-5

CONTENTS
Part 1 Bivariate Probability Distributions 7

Chapter 1 Probability and Distribution Functions of Bivariate Distributions 8
1.1 Introduction 8
1.2 Concept of Bivariate Random Variables 8
1.3 Joint Probability Distributions 9
1.4 Joint Cumulative Distribution Functions 18
1.5 Marginal Distribution of Bivariate Random Variables 23
1.6 Conditional Distribution of Bivariate Random Variables 30
1.7 Independence of Bivariate Random Variables 35

Chapter 2 Sums, Differences, Products and Quotients of Bivariate Distributions 45
2.1 Introduction 45
2.2 Sums of Bivariate Random Variables 46
2.3 Differences of Random Variables 63


2.4 Products of Bivariate Random Variables 68


2.5 Quotients of Bivariate Random Variables 72

Chapter 3 Expectation and Variance of Bivariate Distributions 80


3.1 Introduction 80
3.2 Expectation of Bivariate Random Variables 80
3.3 Variance of Bivariate Random Variables 104

Chapter 4 Measures of Relationship of Bivariate Distributions 120


4.1 Introduction 120
4.2 Product Moment 121
4.3 Covariance of Random Variables 124
4.4 Correlation Coefficient of Random Variables 130
4.5 Conditional Expectations 135
4.6 Conditional Variances 141
4.7 Regression Curves 143

Part 2 Statistical Inequalities, Limit Laws and Sampling Distributions 154

Chapter 5 Statistical Inequalities and Limit Laws 155


5.1 Introduction 155
5.2 Markov’s Inequality 156
5.3 Chebyshev’s Inequality 160
5.4 Law of Large Numbers 170
5.5 Central Limit Theorem 177

Chapter 6 Sampling Distributions I: Basic Concepts 191


6.1 Introduction 191
6.2 Statistical Inference 192
6.3 Probability Sampling 195
6.4 Sampling With and Without Replacement 200

Chapter 7 Sampling Distributions II: Sampling Distribution of Statistics 202
7.1 Introduction 202
7.2 Sampling Distribution of Means 206
7.3 Sampling Distribution of Proportions 216
7.4 Sampling Distribution of Differences 221
7.5 Sampling Distribution of Variance 225


Chapter 8 Distributions Derived from Normal Distribution 236


8.1 Introduction 236
8.2 χ² Distribution 237
8.3 t Distribution 243
8.4 F Distribution 247

Statistical Tables 254

Answers to Odd-Numbered Exercises 271

Bibliography 273


PART 1
BIVARIATE PROBABILITY DISTRIBUTIONS

I salute the discovery of a single even insignificant truth more highly than
all the argumentation on the highest questions which fails to reach a truth
GALILEO (1564–1642)


Chapter 1

PROBABILITY AND DISTRIBUTION FUNCTIONS
OF BIVARIATE DISTRIBUTIONS

1.1 INTRODUCTION

So far, all discussions in the two volumes of my book on probability (Nsowah-
Nuamah, 2017 and 2018) have been associated with a single random variable
X (that is, a one-dimensional or univariate random variable). Frequently,
we may be concerned with multivariate situations that simultaneously in-
volve two or more random variables. For instance, if we wanted to study
the relationship between weight and height of individual students, we might
consider weight and height to be two random variables X and Y, respec-
tively, whose values are determined by measuring the weights and heights of
the students in the school. Such a study will produce the ordered pair (X, Y).

1.2 CONCEPT OF BIVARIATE RANDOM VARIABLES

1.2.1 Definition of Bivariate Random Variables

Many of the concepts discussed for the one-dimensional random variables
also hold for the higher-dimensional case. Here, in most cases, we shall limit
ourselves to the two-dimensional (bivariate) case; more complex multivariate
situations are straightforward generalisations.

Definition 1.1 BIVARIATE RANDOM VARIABLE


If X = X(s) and Y = Y (s) are two real-valued functions on the
sample space S, then the pair (X, Y ) that assigns a point in the real
(x, y) plane to each point s ∈ S is called a bivariate random variable

Synonyms of a bivariate random variable are a two-dimensional random
variable or random vector. Fig. 1.1 is an illustration of a bivariate random variable.

Fig. 1.1 Bivariate Random Variables

1.2.2 Types of Bivariate Random Variables

Multivariate situations, similar to univariate cases, may involve discrete as
well as continuous random variables.

Definition 1.2 DISCRETE BIVARIATE RANDOM VARIABLE
(X, Y) is a discrete bivariate random variable if each of the random
variables X and Y is discrete.

Definition 1.3 CONTINUOUS BIVARIATE RANDOM VARIABLE
(X, Y) is a continuous bivariate random variable if each of the random
variables X and Y is continuous.

There are cases where one variable is discrete and the other continuous
but this will not be considered here.

1.3 JOINT PROBABILITY DISTRIBUTIONS

A joint distribution is a distribution having two or more random variables,


with each random variable still having its own probability distribution, ex-
pected value and variance. In addition, for ordered pair values of the ran-
dom variables, probabilities will exist and the strength of any relationship
between the two variables can be measured.
In the multivariate case as in the univariate case we often associate a
probability (mass) function with discrete random variables and a probability
density function with continuous random variables. We shall take up the
discrete case first since it is the easier one to understand.

1.3.1 Joint Probability Distribution of Discrete Random Variables

Suppose that X and Y are discrete random variables, and X takes values
i = 0, 1, 2, · · · , n, and Y takes values j = 1, 2, · · · , m. Most often, such a
joint distribution is given in table form. Table 1.1 is an n-by-m array which
displays the number of occurrences of the various combinations of values
of X and Y . We may observe that each row represents values of X and
each column represents values of Y . The row and column totals are called
marginal totals. Such a table is called the joint frequency distribution.

Table 1.1 Joint Frequency Distribution of X and Y

                          Y
 X          y1           y2           ···    ym           Row Totals
 x1         (x1, y1)     (x1, y2)     ···    (x1, ym)     ∑_y (x1, y)
 x2         (x2, y1)     (x2, y2)     ···    (x2, ym)     ∑_y (x2, y)
 ..         ..           ..           ..     ..           ..
 xn         (xn, y1)     (xn, y2)     ···    (xn, ym)     ∑_y (xn, y)
 Column     ∑_x (x, y1)  ∑_x (x, y2)  ···    ∑_x (x, ym)  ∑_x ∑_y (x, y) = N
 Totals

For example, suppose X and Y are discrete random variables, and X takes
values 0, 1, 2, 3, and Y takes values 1, 2, 3. Each of the nm row-column in-
tersections in Table 1.2 represents the frequency that belongs to the ordered
pair (X, Y ).

Table 1.2 Joint Frequency Distribution of X and Y

 Values           Values of Y           Row
 of X         1        2        3       Totals
 0            1        0        0       1
 1            0        2        1       3
 2            0        2        1       3
 3            1        0        0       1
 Column       2        4        2       8
 Totals

Definition 1.4 JOINT PROBABILITY DISTRIBUTION


Let X and Y be discrete random variables with possible values
xi , i = 1, 2, ..., n and yj , j = 1, 2, 3, ..., m, respectively. The joint
(or bivariate) probability distribution for X and Y is given by

p(xi , yj ) = P ({X = xi } ∩ {Y = yj })

defined for all (xi , yj )

The function p(xi , yj ) is sometimes referred to as the joint probability


mass function (p.m.f.) or the joint probability function (p.f.) of X and Y .
This function gives the probability that X will assume a particular value x
while at the same time Y will assume a particular value y.

Note
(a) The notation p(x, y) for all (x, y) is the same as writing p(xi, yj) for
    i = 1, 2, ..., n and j = 1, 2, 3, ..., m. Sometimes when there is no ambiguity
    we shall use simply p(x, y).

(b) The joint probability p(xi , yj ) is sometimes denoted as


P (X = x, Y = y),
where the comma stands for ‘and’ or ‘∩’.

Definition 1.5
If X and Y are discrete random variables with joint probability mass
function p(xi, yj), then

(a) p(xi, yj) ≥ 0, for all i and j

(b) ∑_{i=1}^{n} ∑_{j=1}^{m} p(xi, yj) = 1

Once the joint probability mass function is determined for discrete random
variables X and Y , calculation of joint probabilities involving X and Y is
straightforward.
Let the value that the random variables X and Y jointly take be de-
noted by the ordered pair (xi , yj ). The joint probability p(xi , yj ) is obtained
by counting the number of occurrences of that combination of values X and
Y and dividing the count by the total number of all the sample points. Thus,
    P({X = xi} ∩ {Y = yj}) = #({X = xi} ∩ {Y = yj}) / ∑_{i=1}^{n} ∑_{j=1}^{m} #({X = xi} ∩ {Y = yj})
                           = #(xi, yj) / N

where #(xi, yj) is the number of occurrences in the cell of the ordered pair
(xi, yj), and N = ∑_{i=1}^{n} ∑_{j=1}^{m} #(xi, yj) is the total number of all sample
points (cells) of the ordered pairs (xi, yj).
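
This counting recipe is easy to mechanise. The following minimal Python sketch
(ours, not part of the original text; it uses the frequencies of Table 1.2) turns a
table of counts into a joint probability mass function:

    from fractions import Fraction

    # Frequency table of Table 1.2: keys are (x, y), values are counts #(x, y)
    counts = {
        (0, 1): 1, (0, 2): 0, (0, 3): 0,
        (1, 1): 0, (1, 2): 2, (1, 3): 1,
        (2, 1): 0, (2, 2): 2, (2, 3): 1,
        (3, 1): 1, (3, 2): 0, (3, 3): 0,
    }

    N = sum(counts.values())                       # total number of sample points (8)

    # joint p.m.f.: p(x, y) = #(x, y) / N
    joint_pmf = {xy: Fraction(c, N) for xy, c in counts.items()}

    print(joint_pmf[(1, 2)])                       # 1/4, i.e. 2/8
    print(sum(joint_pmf.values()))                 # 1, as required by Definition 1.5

The same dictionary-of-counts approach works for any finite joint frequency table.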


Joint Probability Distribution of Bivariate Random Variables in Tabular Form

The joint probability distribution may be given in the form of a table of
n rows and m columns (see Table 1.3). The margins of the table indicate
the possible distinct values of X and Y. The numbers in the body
of the table are the probabilities for the joint occurrences of the two events
corresponding to X = xi (1 ≤ i ≤ n) and Y = yj (1 ≤ j ≤ m). The row and
column totals are the probabilities for the individual random variables and
are called marginal probabilities because they appear on the margins of
the table. Such a table is also called the joint relative frequency distribution.

Table 1.3 Joint Probability Distribution of X and Y

                          Y
 X          y1          y2          ···    ym          Row Totals
 x1         p(x1, y1)   p(x1, y2)   ···    p(x1, ym)   p(x1)
 x2         p(x2, y1)   p(x2, y2)   ···    p(x2, ym)   p(x2)
 ..         ..          ..          ..     ..          ..
 xn         p(xn, y1)   p(xn, y2)   ···    p(xn, ym)   p(xn)
 Column     p(y1)       p(y2)       ···    p(ym)       ∑_{i=1}^{n} ∑_{j=1}^{m} p(xi, yj) = 1
 Totals

Note
The marginal probabilities for X are simply the probabilities that X = xi,
obtained by summing over the values of yj, where j assumes a value from 1 to m.
Similarly, the marginal probabilities for Y are the probabilities that Y = yj,
obtained by summing over the values of xi, where i assumes a value from 1 to n.

It is important to note that the distribution satisfies the requirements of a
joint probability function, namely,

(a) p(xi, yj) ≥ 0, for all i = 1, 2, · · · , n; j = 1, 2, · · · , m.

(b) ∑_{i=1}^{n} ∑_{j=1}^{m} p(xi, yj) = 1


Example 1.1
(a) For the data in Table 1.2, calculate the joint probabilities of X and Y.
(b) Does this distribution satisfy the properties of a joint probability function?

Solution
(a) From Table 1.2, the cell ({X = 0} ∩ {Y = 1}) = (0, 1) contains one
    element; the total number of elements in all cells is 8. Hence

    P({X = 0} ∩ {Y = 1}) = p(0, 1)
                         = #({X = 0} ∩ {Y = 1}) / ∑_{i} ∑_{j} #({X = xi} ∩ {Y = yj})
                         = 1/8

    Similarly,

    P({X = 0} ∩ {Y = 2}) = p(0, 2) = 0/8 = 0
    P({X = 0} ∩ {Y = 3}) = p(0, 3) = 0/8 = 0
    P({X = 1} ∩ {Y = 1}) = p(1, 1) = 0/8 = 0
    P({X = 1} ∩ {Y = 2}) = p(1, 2) = 2/8 = 1/4
    P({X = 1} ∩ {Y = 3}) = p(1, 3) = 1/8

    When the probabilities of all possible joint events, P(X = xi, Y = yj),
    have been determined in this fashion, we have a joint probability distri-
    bution of X and Y and these results may be presented in a two-way
    table as shown below:

     X            Y
              1      2      3
     0        1/8    0      0
     1        0      2/8    1/8
     2        0      2/8    1/8
     3        1/8    0      0

(b) From the table above,

    (i) p(xi, yj) ≥ 0, for all i = 0, 1, 2, 3; j = 1, 2, 3.

    (ii) ∑_{i=0}^{3} ∑_{j=1}^{3} p(xi, yj) = 1/8 + 1/8 + 2/8 + 2/8 + 1/8 + 1/8 = 1

    Hence this distribution is a joint probability function.

Joint Probability Distribution of Bivariate Random Variables in Expression Form

Sometimes the joint probability distribution of discrete random variables X
and Y is given by a formula.

Example 1.2
Given the function

    p(x, y) = k(3x + 2y),   x = 0, 1;  y = 0, 1, 2

(a) Find the constant k > 0 such that p(x, y) is a joint probability
    mass function.

(b) Present it in a tabular form for the probabilities associated with the
    sample points (x, y). Obtain the row and column totals.
Solution
(a) (i) p(x, y) ≥ 0, since k > 0 and 3x + 2y ≥ 0 for the given values of x and y.

    (ii) ∑_{x=0}^{1} ∑_{y=0}^{2} k(3x + 2y) = k ∑_{x=0}^{1} [(3x + 0) + (3x + 2) + (3x + 4)]
                                            = k ∑_{x=0}^{1} (9x + 6)
                                            = k[(0 + 6) + (9 + 6)]
                                            = 21k

    For p(x, y) to be a joint probability function we must have 21k = 1, from
    which
        k = 1/21

(b) For the sample point {X = 0, Y = 0} = (0, 0),

        p(0, 0) = (1/21)[3(0) + 2(0)] = 0

    Similarly,

        p(0, 1) = (1/21)[3(0) + 2(1)] = 2/21
        p(0, 2) = (1/21)[3(0) + 2(2)] = 4/21
        p(1, 0) = (1/21)[3(1) + 2(0)] = 3/21
        p(1, 1) = (1/21)[3(1) + 2(1)] = 5/21
        p(1, 2) = (1/21)[3(1) + 2(2)] = 7/21

    These results are presented in the following table. Recollect that the row
    and column totals are the marginal probabilities.

     X            Y                         Row
              0       1       2             Totals
     0        0       2/21    4/21          6/21
     1        3/21    5/21    7/21          15/21
     Column   3/21    7/21    11/21         1
     Totals
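
As a quick check on the algebra, the normalising constant and the marginal
totals can be reproduced with a few lines of Python. This is only a sketch
under our own variable names, not part of the original text:

    from fractions import Fraction

    # p(x, y) = k(3x + 2y) for x = 0, 1 and y = 0, 1, 2  (Example 1.2)
    xs, ys = (0, 1), (0, 1, 2)

    total_weight = sum(3*x + 2*y for x in xs for y in ys)   # 21, so k = 1/21
    k = Fraction(1, total_weight)
    pmf = {(x, y): k * (3*x + 2*y) for x in xs for y in ys}

    print(pmf[(1, 2)])                  # 7/21, printed in lowest terms as 1/3
    print(sum(pmf.values()))            # 1
    for x in xs:                        # row totals 6/21 and 15/21 (printed reduced)
        print(x, sum(pmf[(x, y)] for y in ys))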

1.3.2 Joint Distribution of Continuous Random Variables

Definition 1.6 JOINT PROBABILITY DENSITY FUNCTION
Let X and Y be two continuous random variables. The joint (or
bivariate) probability density function f(x, y) of X and Y is given by

    P({x1 ≤ X ≤ x2} ∩ {y1 ≤ Y ≤ y2}) = ∫_{y1}^{y2} ∫_{x1}^{x2} f(x, y) dx dy

for two pairs (x1, x2) and (y1, y2) with x2 ≥ x1, y2 ≥ y1.

Fig 1.2 depicts the case of the continuous bivariate random variables
(X, Y ) which assume all the values in the rectangle x1 ≤ X ≤ x2 and
y1 ≤ Y ≤ y2 .


Fig. 1.2 Region assumed by (X, Y ) is shaded

Thus, the probability that the event

{x1 ≤ X ≤ x2 } ∩ {y1 ≤ Y ≤ y2 }

will fall in the shaded rectangle is

P ({x1 ≤ X ≤ x2 } ∩ {y1 ≤ Y ≤ y2 })
This probability can be found by subtracting from the probability that the
event will fall in the (semi-infinite) rectangle having the upper-right corner
(x2 , y2 ) the probabilities that it will fall in the semi-infinite rectangle having
the upper-right corner (x1 , y2 ) and (x2 , y1 ) respectively, and then adding
back the probability that it will fall in the semi-infinite rectangle with the
upper-right corner at (x1 , y1 ). That is,

P (x1 ≤ X ≤ x2 , y1 ≤ Y ≤ y2 ) = P (X ≤ x2 , Y ≤ y2 ) − P (X ≤ x2 , Y ≤ y1 )
−P (X ≤ x1 , Y ≤ y2 ) + P (X ≤ x1 , Y ≤ y1 )

For computational purposes, we shall adopt Definition 1.7.

Definition 1.7
Let (X, Y) be a continuous bivariate random variable assuming all
values in the region R. The joint probability density function f is a
function satisfying the following properties:

Property 1   f(x, y) ≥ 0, for all (x, y) ∈ R

Property 2   ∫∫_R f(x, y) dx dy = 1

Property 2 states that the total volume bounded by the surface given by the
equation z = f(x, y) and the region R on the xy-plane is equal to 1.

Example 1.3
Given the following function of a two-dimensional continuous random variable (X, Y):

    f(x, y) = { x² + xy/k,   0 ≤ x ≤ 1, 0 ≤ y ≤ 2
              { 0,           elsewhere

where k is a constant.

(a) Find the value of k > 0 such that f(x, y) is a probability density function.

(b) Find P(0 < X < 1, 1 < Y < 2).

Solution
(a) For f(x, y) to be a p.d.f., it should satisfy the two properties of Definition
    1.7. Obviously,
        f(x, y) ≥ 0
    since x ≥ 0, y ≥ 0, and k > 0.
    We also require
        ∫_{-∞}^{+∞} ∫_{-∞}^{+∞} f(x, y) dx dy = 1

    Now,
        ∫_{y=0}^{2} ∫_{x=0}^{1} (x² + xy/k) dx dy = ∫_{y=0}^{2} [x³/3 + x²y/(2k)]_{x=0}^{1} dy
                                                  = ∫_{y=0}^{2} (1/3 + y/(2k)) dy
                                                  = [y/3 + y²/(4k)]_{0}^{2}
                                                  = 2/3 + 1/k = 1

    giving k = 3.

(b) Using the value of k found in (a) we have

        P(0 < X < 1, 1 < Y < 2) = ∫_{x=0}^{1} ∫_{y=1}^{2} (x² + xy/3) dy dx
                                = ∫_{y=1}^{2} ∫_{x=0}^{1} (x² + xy/3) dx dy
                                = ∫_{y=1}^{2} [x³/3 + x²y/6]_{x=0}^{1} dy
                                = ∫_{y=1}^{2} (1/3 + y/6) dy
                                = [y/3 + y²/12]_{y=1}^{2}
                                = 7/12
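
Readers who like to verify such double integrals symbolically can do so with
sympy; the following sketch (ours, not the author's) reproduces both k = 3 and
the probability 7/12:

    import sympy as sp

    x, y, k = sp.symbols('x y k', positive=True)
    f = x**2 + x*y/k

    # Property 2: the integral over 0 <= x <= 1, 0 <= y <= 2 must equal 1
    total = sp.integrate(f, (x, 0, 1), (y, 0, 2))      # 2/3 + 1/k
    k_val = sp.solve(sp.Eq(total, 1), k)[0]            # 3

    # part (b): P(0 < X < 1, 1 < Y < 2) with k = 3
    prob = sp.integrate(f.subs(k, k_val), (x, 0, 1), (y, 1, 2))
    print(k_val, prob)                                 # 3  7/12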

1.4 JOINT CUMULATIVE DISTRIBUTION FUNCTIONS

1.4.1 Definition of Joint Bivariate Distribution Function

The joint behaviour of two random variables X and Y is determined by the


joint cumulative distribution function, also called the bivariate cumulative
distribution function, or simply the joint or bivariate distribution function
of the two random variables, X and Y . The definition given in Definition
1.8 is applicable whether X and Y are discrete or continuous.

Definition 1.8 JOINT DISTRIBUTION FUNCTION


For any random variables X and Y , the joint (bivariate) cumulative
distribution function F (x, y), is given by

F (x, y) = P ({X ≤ x} ∩ {Y ≤ y}) = P (X ≤ x, Y ≤ y)

1.4.2 Joint Distribution Function of Discrete Random Variables

The joint cumulative distribution function of random variables X and Y
gives the probability that X takes on a value less than or equal to xi,
i = 1, 2, · · · , n, and that Y takes on a value less than or equal to yj,
j = 1, 2, · · · , m.

Definition 1.9 JOINT DISTRIBUTION FUNCTION (Discrete Case)
The joint distribution function of two discrete random variables X
and Y is

    F(x, y) = ∑_{xi ≤ x} ∑_{yj ≤ y} p(xi, yj)

Example 1.4
Refer to the table of Example 1.1. Calculate
(a) the joint probability P(2 ≤ X ≤ 3, 1 ≤ Y ≤ 2);
(b) the joint cumulative probability P(X ≤ 1, Y ≤ 2).

Solution
(a) The joint probability P(2 ≤ X ≤ 3, 1 ≤ Y ≤ 2) is obtained as follows:

    P(2 ≤ X ≤ 3, 1 ≤ Y ≤ 2) = p(2, 1) + p(2, 2) + p(3, 1) + p(3, 2)
                             = 0 + 2/8 + 1/8 + 0 = 3/8

(b) The joint cumulative probability P(X ≤ 1, Y ≤ 2) is as follows:

    P(X ≤ 1, Y ≤ 2) = F(1, 2)
                    = p(0, 1) + p(0, 2) + p(1, 1) + p(1, 2)
                    = 1/8 + 0 + 0 + 2/8 = 3/8
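
Both calculations in Example 1.4 amount to summing entries of the joint table
over a rectangle of values. The Python sketch below (our own helper names,
using the p.m.f. of Example 1.1) does this mechanically:

    from fractions import Fraction

    # joint p.m.f. of Example 1.1 (zero cells omitted)
    pmf = {(0, 1): Fraction(1, 8), (1, 2): Fraction(2, 8), (1, 3): Fraction(1, 8),
           (2, 2): Fraction(2, 8), (2, 3): Fraction(1, 8), (3, 1): Fraction(1, 8)}

    def F(x, y):
        # joint c.d.f.: sum of p(xi, yj) over xi <= x and yj <= y
        return sum(p for (xi, yj), p in pmf.items() if xi <= x and yj <= y)

    def P_box(x1, x2, y1, y2):
        # P(x1 <= X <= x2, y1 <= Y <= y2) by direct summation
        return sum(p for (xi, yj), p in pmf.items()
                   if x1 <= xi <= x2 and y1 <= yj <= y2)

    print(P_box(2, 3, 1, 2))   # 3/8, Example 1.4(a)
    print(F(1, 2))             # 3/8, Example 1.4(b)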
Example 1.5
Refer to Example 1.2. Calculate
(a) the joint probability P(0 ≤ X ≤ 1, 1 ≤ Y ≤ 2);
(b) the joint cumulative distribution function P(X ≤ 1, Y ≤ 1);
(c) the joint cumulative distribution function P(X ≤ 1, Y ≤ 2).

Solution
(a) P(0 ≤ X ≤ 1, 1 ≤ Y ≤ 2) = ∑_{x=0}^{1} ∑_{y=1}^{2} (1/21)(3x + 2y)
                             = (1/21) ∑_{x=0}^{1} [(3x + 2) + (3x + 4)]
                             = (1/21) ∑_{x=0}^{1} (6x + 6)
                             = (1/21)[(0 + 6) + (6 + 6)] = 6/7

(b) P(X ≤ 1, Y ≤ 1) = F(1, 1)
                    = ∑_{x=0}^{1} ∑_{y=0}^{1} (1/21)(3x + 2y)
                    = (1/21) ∑_{x=0}^{1} [(3x + 0) + (3x + 2)]
                    = (1/21) ∑_{x=0}^{1} (6x + 2)
                    = (1/21)[(0 + 2) + (6 + 2)] = 10/21

The reader is asked in Exercise 1.5 to solve part (c) of this example.

1.4.3 Joint Distribution Function of Continuous Random Variables

Definition 1.10 JOINT DISTRIBUTION FUNCTION (Continuous Case)
The cumulative distribution function F of the two-dimensional random
variable (X, Y) is defined as

    F(x, y) = P({X ≤ x} ∩ {Y ≤ y}) = ∫_{-∞}^{y} ∫_{-∞}^{x} f(s, t) ds dt

where f(s, t) is the value of the joint probability density of X and Y at (s, t).

The joint distribution function gives the probability that the point (X, Y)
belongs to a semi-infinite rectangle in the plane, as shown in Fig. 1.3.

Fig. 1.3 Region of {X ≤ x, Y ≤ y} is shaded

Example 1.6
Refer to Example 1.3. Calculate P (X ≤ 1, Y < 1).

Solution

    P(X ≤ 1, Y < 1) = ∫_{y=0}^{1} ∫_{x=0}^{1} (x² + xy/3) dx dy
                    = ∫_{y=0}^{1} [x³/3 + x²y/6]_{x=0}^{1} dy
                    = ∫_{y=0}^{1} (1/3 + y/6) dy
                    = [y/3 + y²/12]_{y=0}^{1}
                    = 5/12

Theorem 1.1
If F is the cumulative distribution function of a two-dimensional
random variable with joint probability density function f(x, y), then

    ∂²F(x, y)/∂x ∂y = f(x, y)

wherever F is differentiable.

Example 1.7
Let
    F(x, y) = (1 − e^{−x})(1 − e^{−y}),   x ≥ 0, y ≥ 0
Find the joint probability density function f(x, y).

Solution

    ∂F(x, y)/∂x = e^{−x}(1 − e^{−y})

    ∂²F(x, y)/∂x∂y = e^{−x} e^{−y}
                   = e^{−(x+y)},   x ≥ 0, y ≥ 0.

Hence
    f(x, y) = e^{−(x+y)},   x ≥ 0, y ≥ 0

Note
    ∂²F(x, y)/∂x∂y = ∂²F(x, y)/∂y∂x
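
Theorem 1.1 is easy to check symbolically for the distribution function of
Example 1.7. The short sympy sketch below (ours, not the author's) recovers
the density and confirms that the mixed partial derivatives agree:

    import sympy as sp

    x, y = sp.symbols('x y', nonnegative=True)
    F = (1 - sp.exp(-x)) * (1 - sp.exp(-y))

    # Theorem 1.1: f(x, y) = d^2 F / dx dy
    f = sp.diff(F, x, y)
    print(sp.simplify(f))                        # exp(-x - y), i.e. e^{-(x+y)}

    # the mixed partials agree, as the Note asserts
    print(sp.simplify(sp.diff(F, y, x) - f))     # 0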

1.4.4 Properties of Bivariate Cumulative Distribution Function

The joint c.d.f. of a bivariate random variable has properties which are
analogous to those of the univariate random variable.

Property 1
The function F(x, y) is a probability, hence

    0 ≤ F(x, y) ≤ 1

Property 2
The bivariate distribution function F(x, y) is monotonic increasing, in a
wider sense, in both variables; that is,

if x1 ≤ x2, then
    F(x1, y) ≤ F(x2, y), for y fixed

if y1 ≤ y2, then
    F(x, y1) ≤ F(x, y2), for x fixed

Property 3
The following relations are also true:
(a) F(−∞, y) = 0
(b) F(x, −∞) = 0
(c) F(+∞, +∞) = 1
(d) F(−∞, −∞) = 0
(e) F(x, +∞) = P(X ≤ x) = F1(x), where F1(x) is the c.d.f. of X
(f) P(a < X < b, Y ≤ y) = F(b, y) − F(a, y)
(g) P(X ≤ x, c < Y < d) = F(x, d) − F(x, c)
(h) P(a < X < b, c < Y < d) = F(b, d) − F(a, d) − F(b, c) + F(a, c)

Property 4
At points of continuity of f(x, y),

    ∂²F(x, y)/∂x∂y = f(x, y)
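
Relation (h) of Property 3 can be verified symbolically for a concrete case,
for instance the distribution function of Example 1.7. The sympy sketch below
(ours) compares the rectangle probability obtained by integrating the density
with the four-term expression in (h):

    import sympy as sp

    x, y, a, b, c, d = sp.symbols('x y a b c d', positive=True)
    F = lambda u, v: (1 - sp.exp(-u)) * (1 - sp.exp(-v))   # c.d.f. of Example 1.7
    f = sp.diff(F(x, y), x, y)                              # its density e^{-(x+y)}

    # property (h): rectangle probability versus the c.d.f. combination
    lhs = sp.integrate(f, (x, a, b), (y, c, d))
    rhs = F(b, d) - F(a, d) - F(b, c) + F(a, c)
    print(sp.simplify(lhs - rhs))   # 0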

1.5 MARGINAL DISTRIBUTION OF BIVARIATE RANDOM VARIABLES

1.5.1 Marginal Distribution of Discrete Bivariate Random Variables

The row totals of Table 1.3 provide us with the probability distribution of
X. Similarly, the column totals provide the probability distribution of Y .
These are typically called marginal probability mass functions because they
are found on the margins of tables.

Definition 1.11 MARGINAL BIVARIATE DISTRIBUTIONS OF X AND Y
Let X and Y be discrete random variables with joint probability
function p(xi, yj). Then the marginal distributions of X and Y respectively
are given by

    g(xi) = P(X = xi) = ∑_{j=1}^{m} p(xi, yj),   i = 1, 2, 3, ..., n

    h(yj) = P(Y = yj) = ∑_{i=1}^{n} p(xi, yj),   j = 1, 2, 3, ..., m
Example 1.8
For the data in Table 1.2, find the marginal probability distribution for
(a) X and (b) Y.

Solution
To calculate the marginal probabilities the joint probabilities are required.
The joint probabilities for this problem have been calculated in Example 1.1.

(a) From the table obtained in Example 1.1 we shall calculate the marginal
    probabilities for each xi by fixing i and summing all the joint probabilities
    across j. Thus:

    P(X = 0) = P(X = 0, Y = 1) + P(X = 0, Y = 2) + P(X = 0, Y = 3)
             = p(0, 1) + p(0, 2) + p(0, 3) = 1/8 + 0 + 0 = 1/8

    P(X = 1) = P(X = 1, Y = 1) + P(X = 1, Y = 2) + P(X = 1, Y = 3)
             = p(1, 1) + p(1, 2) + p(1, 3) = 0 + 2/8 + 1/8 = 3/8

    P(X = 2) = P(X = 2, Y = 1) + P(X = 2, Y = 2) + P(X = 2, Y = 3)
             = p(2, 1) + p(2, 2) + p(2, 3) = 0 + 2/8 + 1/8 = 3/8

    P(X = 3) = P(X = 3, Y = 1) + P(X = 3, Y = 2) + P(X = 3, Y = 3)
             = p(3, 1) + p(3, 2) + p(3, 3) = 1/8 + 0 + 0 = 1/8

    The results are summarised in the table below:

     xi        0      1      2      3
     g(xi)     1/8    3/8    3/8    1/8

(b) Similar to (a), we fix j and sum all the joint probabilities across i.
    Hence,

    P(Y = 1) = P(X = 0, Y = 1) + P(X = 1, Y = 1) + P(X = 2, Y = 1) + P(X = 3, Y = 1)
             = p(0, 1) + p(1, 1) + p(2, 1) + p(3, 1) = 1/8 + 0 + 0 + 1/8 = 2/8

    P(Y = 2) = P(X = 0, Y = 2) + P(X = 1, Y = 2) + P(X = 2, Y = 2) + P(X = 3, Y = 2)
             = p(0, 2) + p(1, 2) + p(2, 2) + p(3, 2) = 0 + 2/8 + 2/8 + 0 = 4/8

    P(Y = 3) = P(X = 0, Y = 3) + P(X = 1, Y = 3) + P(X = 2, Y = 3) + P(X = 3, Y = 3)
             = p(0, 3) + p(1, 3) + p(2, 3) + p(3, 3) = 0 + 1/8 + 1/8 + 0 = 2/8

    The results are summarised in the table below.

     yj        1      2      3
     h(yj)     2/8    4/8    2/8

The results of Examples 1.1 and 1.8 (that is, the joint and marginal proba-
bilities) are usually presented in a single table such as in Table 1.4.

Note
(a) The marginal distributions of X and Y are the ordinary probability
    distribution functions of X and Y, but when derived from the joint
    distribution function the adjective “marginal” is added.

(b) In a marginal distribution, the probability of different values of a ran-
    dom variable in a subset of the random variables is determined without
    reference to any possible values of the other variables.

(c) From the table,

    ∑_{i=1}^{n} ∑_{j=1}^{m} p(xi, yj) = ∑_{i=1}^{n} g(xi) = ∑_{j=1}^{m} h(yj) = 1

    that is, the total probability is 1.

Table 1.4 Joint and Marginal Probabilities for Table 1.2

 X            Y                     Row
          1      2      3           Totals
 0        1/8    0      0           1/8
 1        0      2/8    1/8         3/8
 2        0      2/8    1/8         3/8
 3        1/8    0      0           1/8
 Column   2/8    4/8    2/8         1
 Totals
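
Computationally, the marginal distributions are just the row and column sums
of the joint table. A minimal Python sketch (ours, using the probabilities of
Table 1.4):

    from fractions import Fraction
    from collections import defaultdict

    # joint p.m.f. of Table 1.4 (zero cells omitted)
    pmf = {(0, 1): Fraction(1, 8), (1, 2): Fraction(2, 8), (1, 3): Fraction(1, 8),
           (2, 2): Fraction(2, 8), (2, 3): Fraction(1, 8), (3, 1): Fraction(1, 8)}

    g = defaultdict(Fraction)     # marginal of X (row totals)
    h = defaultdict(Fraction)     # marginal of Y (column totals)
    for (x, y), p in pmf.items():
        g[x] += p
        h[y] += p

    for x in sorted(g):
        print('g(%d) =' % x, g[x])    # 1/8, 3/8, 3/8, 1/8
    for y in sorted(h):
        print('h(%d) =' % y, h[y])    # 1/4, 1/2, 1/4  (i.e. 2/8, 4/8, 2/8)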

Computation of Marginal Probabilities of Discrete Bivariate Random Variables from Formula

We may also compute marginal distributions from a formula.

Example 1.9
Refer to Example 1.2. Find the marginal distributions of
(a) X, (b) Y.

Solution

(a) The marginal distribution of X is given by:

    g(x) = ∑_{y=0}^{2} (1/21)(3x + 2y)
         = (1/21) ∑_{y=0}^{2} (3x + 2y)
         = (1/21)[(3x + 0) + (3x + 2) + (3x + 4)]
         = (1/7)(3x + 2)

(b) The marginal distribution of Y is given by:

    h(y) = ∑_{x=0}^{1} (1/21)(3x + 2y)
         = (1/21) ∑_{x=0}^{1} (3x + 2y)
         = (1/21)[(0 + 2y) + (3 + 2y)]
         = (1/21)(3 + 4y)

1.5.2 Marginal Distribution of Continuous Bivariate Random Variables

Definition 1.12
Let f(x, y) be the joint probability density function of the continuous
two-dimensional random variable (X, Y). We define g(x) and h(y), the
marginal probability density functions of X and Y respectively, by

    g(x) = ∫_{-∞}^{∞} f(x, y) dy

and

    h(y) = ∫_{-∞}^{∞} f(x, y) dx

Example 1.10
Refer to Example 1.3. Find the marginal probability distribution of X and
Y.

Solution
Marginal probability distribution of X:

    g(x) = ∫_{-∞}^{∞} f(x, y) dy
         = ∫_{0}^{2} (x² + xy/3) dy,   0 < x < 1
         = [x²y + xy²/6]_{0}^{2}
         = 2x² + 4x/6

That is,
    g(x) = { 2x² + (2/3)x,   0 < x < 1
           { 0,              elsewhere

Marginal probability distribution of Y:

    h(y) = ∫_{-∞}^{∞} f(x, y) dx
         = ∫_{0}^{1} (x² + xy/3) dx,   0 < y < 2
         = [x³/3 + x²y/6]_{0}^{1}
         = 1/3 + y/6

That is,
    h(y) = { 1/3 + y/6,   0 < y < 2
           { 0,           elsewhere
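
The two marginal densities can be checked with sympy; the sketch below
(ours, not the author's) also confirms that each integrates to 1 over its own
range:

    import sympy as sp

    x, y = sp.symbols('x y')
    f = x**2 + x*y/3                 # joint density of Example 1.3 on 0<=x<=1, 0<=y<=2

    g = sp.integrate(f, (y, 0, 2))   # marginal of X: 2*x**2 + 2*x/3
    h = sp.integrate(f, (x, 0, 1))   # marginal of Y: y/6 + 1/3
    print(g, h)

    # each marginal integrates to 1 over its own range
    print(sp.integrate(g, (x, 0, 1)), sp.integrate(h, (y, 0, 2)))   # 1 1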

Definition 1.13 MARGINAL CUMULATIVE DISTRIBUTION
The marginal probability distribution function of X, denoted by FX(x), is

    FX(x) = P(X ≤ x)

and the marginal probability distribution function of Y, denoted by FY(y), is

    FY(y) = P(Y ≤ y)

Thus,

    FX(x) = P(X ≤ x, Y ≤ ∞)
          = lim_{y→∞} F(x, y)
          = ∫_{-∞}^{x} ∫_{-∞}^{∞} f(u, y) dy du

Similarly,

    FY(y) = lim_{x→∞} F(x, y)
          = ∫_{-∞}^{y} ∫_{-∞}^{∞} f(x, v) dx dv

From this, it follows that the probability density function of X alone, known
as the marginal density of X, is

    g(x) = fX(x) = F′X(x) = ∫_{-∞}^{∞} f(x, y) dy

Similarly,

    h(y) = fY(y) = F′Y(y) = ∫_{-∞}^{∞} f(x, y) dx

Note
The marginal probability density functions g(x) and h(y) can easily be de-
termined from the knowledge of the joint density function f (x, y). However,
the knowledge of the marginal probability density functions does not, in gen-
eral, uniquely determine the joint density function. The exception occurs
when the two random variables are independent.


1.6 CONDITIONAL DISTRIBUTION OF BIVARIATE RANDOM VARIABLES

1.6.1 Conditional Distribution of Discrete Random Variables

The conditional probability distribution of a random variable is analogous


to the concept of conditional probability of events. For two events A and B,
the multiplication law gives the probability of the intersection A ∩ B as

P (A ∩ B) = P (A)P (B|A)

where P (A) is the unconditional probability of A and P (B|A) is the condi-


tional probability of B given that A has occurred.
Now, consider ({X = x}) ∩ ({Y = y}), represented by the bivariate
event (x, y). Then it follows directly from the multiplication law of proba-
bility that the bivariate probability for the intersection (xi , yj ) is

p(xi , yj ) = g(xi )p(yj |xi )

or

p(xi , yj ) = h(yj )p(xi |yj )

The probabilities g(xi ) and h(yj ) are those associated with the marginal
probability distributions for X and Y , respectively. The probability p(x|y)
is the probability that the random variable X takes a specific value x given
that Y takes on the value y, written in full as P(X = x|Y = y). Thus,
P (X = 2|Y = 1) is the conditional probability that X = 2 given that Y =
1. A similar interpretation can be attached to the conditional probability
p(y|x).

Note
Conditional distribution is the opposite of marginal distribution, in which
the probability of a value of a random variable is determined without refer-
ence to the possible values of the other variables.

Definition 1.14 BIVARIATE CONDITIONAL PROBABILITY DISTRIBUTION (Discrete Case)
Suppose X and Y are discrete random variables with joint probability
distribution p(x, y); then the conditional discrete probability function
of X given Y is

    P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y),   P(Y = y) > 0

That is, given the joint probability distribution p(x, y) and marginal prob-
ability functions g(x) and h(y), respectively, the conditional discrete proba-
bility function of X given Y is

    p(x|y) = p(x, y) / h(y),   h(y) > 0

and similarly, the conditional discrete probability function of Y given X is

    p(y|x) = p(x, y) / g(x),   g(x) > 0
This definition shows that if we have the joint probability function
of two random variables and desire the conditional distribution for one of
them when the other is held fixed, it is merely necessary to divide the joint
probability function by the probability function of the fixed variable.
Example 1.11
Refer to the table in Example 1.1. Find
(a) P(Y = 2 | X = 1)
(b) P(X = 1 | Y = 2)
(c) P(X = 1 | Y = 1)


Solution

(a) P(Y = 2 | X = 1) = P(X = 1, Y = 2) / P(X = 1) = p(1, 2)/g(1) = (2/8)/(3/8) = 2/3

(b) P(X = 1 | Y = 2) = P(X = 1, Y = 2) / P(Y = 2) = p(1, 2)/h(2) = (2/8)/(4/8) = 1/2

(c) P(X = 1 | Y = 1) = P(X = 1, Y = 1) / P(Y = 1) = p(1, 1)/h(1) = 0/(2/8) = 0
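
These conditional probabilities are simple ratios of joint-table entries to
marginal totals, which the following Python sketch (ours) computes directly:

    from fractions import Fraction

    pmf = {(0, 1): Fraction(1, 8), (1, 2): Fraction(2, 8), (1, 3): Fraction(1, 8),
           (2, 2): Fraction(2, 8), (2, 3): Fraction(1, 8), (3, 1): Fraction(1, 8)}

    def p(x, y):
        return pmf.get((x, y), Fraction(0))

    def g(x):                     # marginal of X
        return sum(q for (xi, _), q in pmf.items() if xi == x)

    def h(y):                     # marginal of Y
        return sum(q for (_, yj), q in pmf.items() if yj == y)

    print(p(1, 2) / g(1))         # P(Y=2 | X=1) = 2/3
    print(p(1, 2) / h(2))         # P(X=1 | Y=2) = 1/2
    print(p(1, 1) / h(1))         # P(X=1 | Y=1) = 0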

Example 1.12
Refer to Example 1.2. Find the conditional probability function
(a) p(x|y), (b) p(y|x)

Solution
(a) The conditional probability function of X given Y is

        p(x|y) = p(x, y) / h(y)

    From Example 1.2,
        p(x, y) = (1/21)(3x + 2y)
    From Example 1.9,
        h(y) = (1/21)(3 + 4y)
    Hence
        p(x|y) = [(1/21)(3x + 2y)] / [(1/21)(3 + 4y)]
               = (3x + 2y) / (3 + 4y)

(b) The conditional probability function of Y given X is

        p(y|x) = p(x, y) / g(x)

    From Example 1.9,
        g(x) = (1/7)(3x + 2)
    Hence
        p(y|x) = [(1/21)(3x + 2y)] / [(1/7)(3x + 2)]
               = 7(3x + 2y) / [21(3x + 2)]
               = (3x + 2y) / [3(3x + 2)]

1.6.2 Conditional Distribution of Continuous Variables

Definition 1.15
Let (X, Y) be a continuous two-dimensional random variable with
joint probability density function f(x, y). Let g and h be the
marginal probability density functions of X and Y respectively. The
conditional probability density of X for given Y = y is defined by

    f(x|y) = f(x, y) / h(y),   h(y) > 0

and the conditional probability density of Y for given X = x is defined by

    f(y|x) = f(x, y) / g(x),   g(x) > 0

Example 1.13
Refer to Example 1.3. Find (a) f(x|y), (b) f(y|x).

Solution
(a) From Example 1.10,
        h(y) = 1/3 + y/6
    Therefore
        f(x|y) = f(x, y) / h(y)
               = (x² + xy/3) / (1/3 + y/6)
               = (6x² + 2xy) / (2 + y),   0 ≤ x ≤ 1; 0 ≤ y ≤ 2

(b) The conditional probability density of Y for given X = x is
        f(y|x) = f(x, y) / g(x)
    From Example 1.10,
        g(x) = 2x² + (2/3)x
    Therefore
        f(y|x) = (x² + xy/3) / (2x² + (2/3)x)
               = (3x² + xy) / (6x² + 2x)
               = (3x + y) / (6x + 2),   0 ≤ y ≤ 2; 0 ≤ x ≤ 1
Example 1.14
Consider the joint probability density function

    f(x, y) = { 6(1 − y),   0 ≤ x ≤ y ≤ 1
              { 0,          elsewhere

Find P(Y < 1/2 | X < 3/4).

Solution

    P(Y < 1/2 | X < 3/4) = P(Y < 1/2, X < 3/4) / P(X < 3/4)

We first evaluate the numerator. The region of integration is sketched below.

Now,

    P(Y < 1/2, X < 3/4) = ∫_{y=0}^{1/2} ∫_{x=0}^{y} 6(1 − y) dx dy
                        = [3y² − 2y³]_{y=0}^{1/2}
                        = 1/2

The denominator is evaluated as follows. First we find the marginal p.d.f. of X:

    g(x) = ∫_{y=x}^{1} 6(1 − y) dy = 3(1 − x)²,   0 ≤ x ≤ 1

so that

    P(X < 3/4) = ∫_{x=0}^{3/4} 3(1 − x)² dx
               = 63/64

Hence,

    P(Y < 1/2 | X < 3/4) = (1/2) / (63/64) = 32/63
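
The value 32/63 can be confirmed symbolically; the sketch below (ours)
integrates the density over the two regions involved:

    import sympy as sp

    x, y = sp.symbols('x y')
    f = 6 * (1 - y)                 # joint density on the triangle 0 <= x <= y <= 1

    # numerator: P(Y < 1/2, X < 3/4); for y < 1/2 the constraint x < 3/4 is automatic
    num = sp.integrate(f, (x, 0, y), (y, 0, sp.Rational(1, 2)))        # 1/2

    # denominator: P(X < 3/4) from the marginal g(x) = 3(1 - x)^2
    g = sp.integrate(f, (y, x, 1))
    den = sp.integrate(g, (x, 0, sp.Rational(3, 4)))                   # 63/64

    print(num / den)                # 32/63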

1.7 INDEPENDENCE OF BIVARIATE RANDOM VARIABLES

1.7.1 Definition of Independence of Bivariate Random Variables

We recall from the introductory probability course that, in terms of event


probabilities, two events A and B are independent if the realization of A
is not affected by the occurrence of B. That is, two events A and B are
independent if
P (A|B) = P (A)
This condition can be carried over to two random variables. That is, two
random variables are independent if the realization of one does not affect
the probability distribution of the other.
We also recall that the definition of independence of two events is
usually given as
P (A ∩ B) = P (A)P (B)
Similarly, we may carry this condition of independence over to two random
variables as in Definition 1.16.

Definition 1.16 INDEPENDENCE OF BIVARIATE RANDOM VARIABLES
Let X have distribution function F1(x), let Y have distribution function
F2(y), and let X and Y have joint distribution function F(x, y). Then
X and Y are said to be independent if and only if

    F(x, y) = F1(x)F2(y)

for every pair of real numbers (x, y)

Thus, the independence of the random variables X and Y implies that their
joint distribution function factors into the products of their marginal distri-
bution functions. This definition applies whether the random variables are
discrete or continuous.
If X and Y are not independent, they are said to be dependent. It is
usually more convenient to verify independence or otherwise with the help
34 Advanced
of the p.m.f. (in the discrete case) Topics
or p.d.f. in Introductory
(in the Probability
continuous case).

1.7.2 Independence of Bivariate Discrete Random Variables

Definition 1.17
If X and Y are discrete random variables with joint probability func-
tion p(x, y) and marginal probability function g(x) and h(y) respec-
tively, then X and Y are independent if and only if

p(x, y) = g(x)h(y)

for all pairs of real numbers (x, y)

Example 1.15
Refer to the table in Example 1.1, verify whether or not X and Y are
independent.

Solution
Consider the ordered pair (0, 1).
From Example 1.1
1
P (X = 0, Y = 1) =
8
But from the marginal distributions,
1
P (X = 0) =
8
2
P (Y = 1) =
8  

1 2
P (X = 0)P (Y = 1) =
8 8
1
which does not equal . Therefore X and Y are not independent.
8
Example 1.16
Refer to Example 1.2. Are X and Y independent?

Solution 36
From Example 1.2
P (Y = 1) =
8  

1 2
ADVANCED TOPICS IN P (X = 0)P (Y =
INTRODUCTORY 1) =
PROBABILITY: A FIRST COURSE IN 8 8 PROBABILITY AND DISTRIBUTION FUNCTIONS
PROBABILITY THEORY – VOLUME III OF BIVARIATE DISTRIBUTIONS
1
which does not equal . Therefore X and Y are not independent.
8
Example 1.16
Refer to Example 1.2. Are X and Y independent?

Solution
From Example 1.2
Density and Distribution Functions of1 Bivariate Distributions 35
p(x, y) = (3x + 2y)
21
From Example 1.9,
1
g(x) = (3x + 2)
7
1
h(y) = (3 + 4y)
21
Now
1 1
g(x)h(y) = (3x + 2) (3 + 4y)
7 21
1
= (9x + 12xy + 8y + 6)
147
which is not equal to p(x, y); hence X and Y are not independent.

1.7.3 Independence of Continuous Random Variables

Like the discrete case, we usually verify for independence or lack of it of


continuous random variables using the result contained in Definition 1.16.

Definition 1.18
If X and Y are are continuous random variables with a joint density
function of f (x, y) and marginal density functions of g(x) and h(y),
respectively, then X and Y are independent if and only if

f (x, y) = g(x)h(y)

for all pairs of real numbers (x, y)

Example 1.17
Refer to Example 1.3. Verify whether X and Y are independent.

Solution
From Example 1.10,
2
g(x) = 2x2 + x
3
1 y
h(y) = +
3 6

37
1 1
g(x)h(y) = (3x + 2) (3 + 4y)
7 21
ADVANCED TOPICS IN INTRODUCTORY 1
PROBABILITY: A FIRST COURSE =
IN (9x + 12xy + 8y +PROBABILITY
6) AND DISTRIBUTION FUNCTIONS
147
PROBABILITY THEORY – VOLUME III OF BIVARIATE DISTRIBUTIONS
which is not equal to p(x, y); hence X and Y are not independent.

1.7.3 Independence of Continuous Random Variables

Like the discrete case, we usually verify for independence or lack of it of


continuous random variables using the result contained in Definition 1.16.

Definition 1.18
If X and Y are are continuous random variables with a joint density
function of f (x, y) and marginal density functions of g(x) and h(y),
respectively, then X and Y are independent if and only if

f (x, y) = g(x)h(y)

for all pairs of real numbers (x, y)

Example 1.17
Refer to Example 1.3. Verify whether X and Y are independent.

Solution
From Example 1.10,
36
36 Advanced
Advanced Topics
2 in
Topics in Introductory
Introductory Probability
Probability
36 Advanced
g(x) = 2x2Topics
+ x in Introductory Probability
36 3 in Introductory Probability
Advanced Topics
Now 1 y
Now
Now h(y) = +
3 6
Now    
 2 22   1 
11 + yyy 
g(x)h(y)
g(x)h(y) =
= 2x
2x 2+ 2
2+ x
x +
g(x)h(y) =  2x + 33 x  33 + 66 
2 23 132 y6 1
2 22 +1 ++ 11 xy
g(x)h(y) = = 2x2+ 1 2y + 2
x
2
= 2 x
x2 + 1 3xx
x2 yy + 329 x
x
x++6 9 xy
xy
= 3 3 x + 3
3 + 9 9
23 2 13 2 29 19
which is
which is not
is not equal
not equal to
equal to
to = x + x y + x + xy
which 3 32 xy 9 9
ff (x, y) = x 2 + xy
(x, y) = x 2 + xy
which is not equal to f (x, y) = x + 33
3
xy
Hence
Hence X
X and
and Y
Y are
are not
not independent.
(x,
independent.
f y) = x 2
+
Hence X and Y are not independent. 3
Example
Hence
ExampleX 1.18
and Y are not independent.
1.18
Example 1.18
Suppose the
Suppose the joint
the joint p.d.f.
joint p.d.f. of
p.d.f. of X
of X and Y
and Y
X and Y isis given
is given by
given by
by
Suppose
Example 1.18 

Suppose the joint p.d.f. of  and Y 0
X 4xy,
4xy, 0is<given
x< by0 < yy <
1, 1
ff (x, y)
(x, y) =
=  4xy,
y) = 0< <xx<< 1,
1, 00 <
<y< < 11
f (x, 0,
0, elsewhere
elsewhere
0,
4xy, 0elsewhere
< x < 1, 0 < y < 1
Verify whether f (x,
or not y)X=and Y are independent.
Verify whether or not X and 0,
Y are independent.
elsewhere
Verify whether or not X and Y are independent.
Solution
Verify whether or not X and Y are independent.
Solution
Solution
We
We have
We have
have
Solution  
 11
We have g(x) = 1 f (x, y) dy
g(x) =
g(x) =  00 ff (x,
(x, y)
y) dy
dy

 0 1
1
g(x) =
=
 11 f (x, y) dy
4xy dy
=
= 000 4xy
4xy dy
dy

01 
11
=  4xy
y 22 dy
y2 1
=
= 04x y
=  4x
4x 222 10
y2 00
=
= 2x,
4x 00 <
2x, x< 1
= 2x, 2 0 <
<x < 11
x<
0
Similarly
Similarly = 2x, 0 < x < 1
Similarly


Similarly  11 38
h(y) = 1 f (x, y)dx
h(y) =  00 ff (x,
h(y) = (x, y)dx
y)dx
1
= 4xy dy
0
ADVANCED TOPICS IN INTRODUCTORY
 1
PROBABILITY: A FIRST COURSE IN =
y2
4x PROBABILITY AND DISTRIBUTION FUNCTIONS
PROBABILITY THEORY – VOLUME III 2 0 OF BIVARIATE DISTRIBUTIONS
= 2x, 0 < x < 1

Similarly
 1
h(y) = f (x, y)dx
0
= 2y, 0<y<1

Hence,
f (x, y) = g(x)h(y)
Density and Distribution Functions of Bivariate Distributions 37
for all real numbers (x, y), and therefore X and Y are independent.

To conclude this chapter we shall make two important observations.

(a) Multivariate Distributions

Our discussion of the bivariate case can be readily extended to multivariate


distributions. Apart from added computational labour, joint distributions
of three or more variables do not pose new analytical problems. Thus, we
simply state here that the joint probability mass function of k discrete ran-
dom variables p(x1 , x2 , ..., xk ) is specified to be nonnegative and its sum
over the k dimensional space is one. There are marginal density functions of
various dimensions in that case. Suppose X, Y, and Z are jointly continu-
ous random variables with density function f (x, y, z), The one dimensional
marginal distribution of X is
 ∞  ∞
fX (x) = f (x, y, z) dy dz
−∞ −∞

and the two-dimensional marginal distribution of X and Y is


 ∞
fXY (x, y) = f (x, y, z) dz
−∞

(b) Algebra of Random Variables

The four basic operations of algebra, namely, addition, subtraction, multi-


plication and division can be performed on random variables to form new
random variables. That is, if X and Y are random variables and c is a
X
real number, then we can form new variables X + Y, X − Y, XY, , c+
Y
X, cX. We can also form the random variable f (X), where f is any func-
tion on R. That is, if X has range {x1 , x2 , · · · , xn }, then f (X) has range
{f (x1 ), f (x2 ), · · · , f (xn )}. For example, if X is a random variable taking
1 1 1 1
the values {1, 2, 3, 4} with probabilities , , , and , respectively, then
2 8 4 8
1 1 1 1
X takes the values {1, 4, 9, 16} with probabilities , , , and , respec-
2
2 8 4 8
tively.
In general, the probability distributions of cX (that is, adding a random
variable to itself c times) and f (X) (if f is one-to-one) such as multiplying a
38 Advanced Topics in Introductory Probability
random variable by itself, or finding the root or cosine of a random variable
are the same as that of X. However, the determination of the probability
X
distributions of X ± Y , XY , and (Y = 0) is not straightforward and this
Y
is the subject of the next chapter.

EXERCISES

39
1.1 Given the following joint frequency distribution of X and Y
X
distributions of X ± Y , XY , and (Y = 0) is not straightforward and this
Y
is the subject
ADVANCED of the
TOPICS next chapter.X
IN INTRODUCTORY
distributions
PROBABILITY: AofFIRST , XY , and
X ± YCOURSE IN (Y =
 0) is not straightforward and DISTRIBUTION
PROBABILITY AND this FUNCTIONS
PROBABILITY THEORY – VOLUME III
Y OF BIVARIATE DISTRIBUTIONS
is the subject of the next chapter.
EXERCISES

EXERCISES
1.1 Given the following joint frequency distribution of X and Y

1.1 Given the following joint frequency distribution of X and Y


X Y
1 2 3
X1 3 Y5 1
2 12 24 36
13 31 52 10
24 24 41 65
3 1 2 0
4 4 1 5
Calculate the joint probability of X and Y .

Calculate the joint probability of X and Y .


1.2 Refer to Exercise 1.1.

1.2 Refer to Exercise


(a) Find 1.1. probability distribution of
the marginal
(i) X, (ii) Y
(a) Find the marginal probability distribution of
(b) Verify the independence of X and Y for the data of Exercise 1.1.
(i) X, (ii) Y
(b) Verify
1.3 Refer the independence
to Exercise of X and Y for the data of Exercise 1.1.
1.1. Calculate

1.3 Refer
(a) Pto(XExercise
< 2, Y 1.1.
≤ 3),Calculate
(b) P (2 ≤ X < 3, 1 ≤ Y ≤ 2)
(c) P (1 < X < 2, Y < 3), (d) P (X < 2, 0 ≤ Y < 2)
(a) P (X < 2, Y ≤ 3), (b) P (2 ≤ X < 3, 1 ≤ Y ≤ 2)
(c) Pto
1.4 Refer (1 Exercise
< X < 2,1.1.
Y < 3),
Find (d) P (X < 2, 0 ≤ Y < 2)

1.4 Refer
(a) Pto(YExercise
= 3|X = 1.1.
1), Find (b) P (Y = 2|X = 3),
(c) P (X = 2|Y = 2), (d) P (X = 1|Y = 2),
(a) P (Y = 3|X = 1), (b) P (Y = 2|X = 3),
(e) P (X = 3|Y = 1), (f) P (X = 3|Y = 2)
(c) P (X = 2|Y = 2), (d) P (X = 1|Y = 2),
(e) Pto
1.5 Refer (XExample
= 3|Y =1.5.
1), Solve(f)part
P (X = 3|Y = 2)
(c).

1.5 Refer to Example 1.5. Solve part (c).

40
1.3 Refer to Exercise 1.1. Calculate
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN PROBABILITY AND DISTRIBUTION FUNCTIONS
(a) P (X < 2, Y ≤ 3),
PROBABILITY THEORY – VOLUME III
(b) P (2 ≤ X < 3, 1 ≤ Y ≤ 2) OF BIVARIATE DISTRIBUTIONS
(c) P (1 < X < 2, Y < 3), (d) P (X < 2, 0 ≤ Y < 2)

1.4 Refer to Exercise 1.1. Find

(a) P (Y = 3|X = 1), (b) P (Y = 2|X = 3),


(c) P (X = 2|Y = 2), (d) P (X = 1|Y = 2),
(e) P (X = 3|Y = 1), (f) P (X = 3|Y = 2)
Density and Distribution Functions of Bivariate Distributions 39
1.5 Refer to Example 1.5. Solve part (c).

1.6 The joint probability mass function of X and Y is given by

1 1 1 1
p(1, 1) = , p(1, 2) = , p(2, 1) = , p(2, 2) =
8 4 8 2
(a) Compute the conditional probability mass function of X, given
Y = i, i = 1, 2.
(b) Are X and Y independent?

1.7 Consider the bivariate discrete random variables X and Y with prob-
ability function

1
p(x, y) = (2x + y) x = 0, 1, 2; y = 0, 1, 2.
27
(a) Verify that p(x, y) is a legitimate probability mass function.
(b) Find the joint probability

P (1 ≤ X ≤ 2, 0 ≤ Y ≤ 1)

(c) Find the marginal distributions of X and Y .


(d) Find the conditional probability of
(i) Y for given X = x
(ii) X for given Y = y
(e) Verify whether X and Y are independent or not.

1.8 A bivariate discrete random variable (X, Y ) is such that

P (X = x, Y = y) = θx (1 − θ)y ,

where 0 < θ < 1 and x, y = 1, 2, 3, ...

(a) Show that the above probability mass function is legitimate


(b) Find the marginal probability distribution of X and Y
(c) Find the conditional probability of
(i) X given Y = 2, (ii) Y given X = 0
(d) Verify whether X and Y are independent or not.

41
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN PROBABILITY AND DISTRIBUTION FUNCTIONS
PROBABILITY
40 THEORY – VOLUME Advanced
III OF BIVARIATE DISTRIBUTIONS
Topics in Introductory Probability

1.9 A bivariate discrete random variable (X, Y ) has the following proba-
bility mass function.
 
x+y x
p (1 − p)y e−λ λx+y
x
p(x, y) = , x, y = 0, 1, 2, ...
(x + y)!
(a) Show that the marginal probability function of X is

e−λ p (λ p)x
g(x) = ; x = 0, 1, 2, ...
x!
(b) Find the marginal probability mass function of Y .
(c) Examine whether X and Y are independent.

1.10 Consider the following function of two discrete random variables


1 2
f (x, y) = (x − y), f or x = 2, 3; y = 1, 2
20
(a) Find the value of k such that the function is a legitimate proba-
bility mass function.
(b) Construct the joint probability table.
(c) Find the marginal probability distributions of X and Y.
(d) Find the conditional probability of
(i) Y for given X = x
(ii) X for given Y = y
(e) Verify whether or not X and Y are independent.

1.11 Suppose that (X, Y ) is a two-dimensional continuous random variable


with joint probability density function.

k(x + y − 2xy), 0 ≤ x ≤ 1, 0 ≤ y ≤ 1
f (x, y) =
0, elsewhere

where k is a constant.

Density(a)
andFind the value Functions
Distribution of k. of Bivariate Distributions 41
(b) Find the marginal probability distribution of X and Y .
(c) Find the conditional probability
(i) X given Y = y
(ii) Y given X = x
(d) Verify whether X and Y are independent or not.
(e) P (X > Y )

1.12 A bivariate continuous random variable (X, Y ) has joint probability


density function

k(x + 2y), 0 ≤ x ≤ 1, 0 ≤ y ≤ 1
f (x, y) =
0, elsewhere

where k is a constant.

(a) Find the value of k.


(b) Find the marginal probability distribution of X and Y .
42
(c) Find the conditional probability of
(c) Find the conditional probability
(i) X given Y = y
ADVANCED TOPICS IN INTRODUCTORY
(ii) Y given X = x
PROBABILITY: A FIRST COURSE IN PROBABILITY AND DISTRIBUTION FUNCTIONS
(d) Verify
PROBABILITY whether
THEORY X and
– VOLUME III Y are independent or not. OF BIVARIATE DISTRIBUTIONS

(e) P (X > Y )

1.12 A bivariate continuous random variable (X, Y ) has joint probability


density function

k(x + 2y), 0 ≤ x ≤ 1, 0 ≤ y ≤ 1
f (x, y) =
0, elsewhere

where k is a constant.

(a) Find the value of k.


(b) Find the marginal probability distribution of X and Y .
(c) Find the conditional probability of
(i) Y given X = x
(ii) X given Y = y
(d) Verify whether or not X and Y are independent

1.13 Suppose that a bivariate function is given by



f (x, y) = a(x2 + xy), 0 ≤ x ≤ 1, 0 ≤ y ≤ 1,
f (x, y) =
0, elsewhere

Find

(a) the constant a such that f (x, y) is a p.d.f.


(b) P (X > Y );
(c) the marginal p.d.f. of X;
(d) the marginal p.d.f. of Y.

1.14 The joint probability density function of X and Y is given by



42 7 (x
6 2
+ xy
2 ),
Advanced 0<x<
Topics in 1, 0 < y < 2Probability
Introductory
f (x, y) =
0, elsewhere
(a) Verify that this is indeed a joint p.d.f.
(b) Compute the distribution function of X.
(c) Find P (X > Y )
  
1  1
(d) Find P Y <  X > .
2 2
(e) Verify whether X and Y are independent.

1.15 Consider the following joint density function

f (x, y) = λ2 eλy , 0≤x≤y

Find

(a) the marginal p.d.f. of X


(b) the marginal p.d.f. of Y ;
(c) the conditional p.d.f. of Y given X = x;
(d) the conditional p.d.f. of X given Y = y

1.16 The number of people that enter a church in a given hour is a Poisson
random variable with λ = 10. Compute the conditional probability
that at most 2 men entered the church in a given hour, given that 10
women entered in that hour. What assumptions have you made?

1.17 The joint probability density function of 43


X and Y is given by

(a) the marginal p.d.f. of X
ADVANCED TOPICS IN INTRODUCTORY
(b) the marginal p.d.f. of Y ;
PROBABILITY: A FIRST COURSE IN PROBABILITY AND DISTRIBUTION FUNCTIONS
(c) the
PROBABILITY conditional
THEORY p.d.f.
– VOLUME III of Y given X = x; OF BIVARIATE DISTRIBUTIONS

(d) the conditional p.d.f. of X given Y = y

1.16 The number of people that enter a church in a given hour is a Poisson
random variable with λ = 10. Compute the conditional probability
that at most 2 men entered the church in a given hour, given that 10
women entered in that hour. What assumptions have you made?

1.17 The joint probability density function of X and Y is given by



xe−(x+y) , x > 0, y > 0
f (x, y) =
0, elsewhere

(a) Find the marginal p.d.f of (i) X (ii) Y


(b) Find the conditional p.d.f of
(i) X, given Y = y;
(ii) Y given X = x.
(c) Verify whether X and Y are independent.

1.18 The joint probability density function of X and Y is given by



Density and Distribution 2, of
Functions 0< 0<y<1
x < y, Distributions
Bivariate 43
f (x, y) =
0, elsewhere

(a) Find the marginal p.d.f of (i) X, (ii) Y.


(b) Find the conditional p.d.f of
(i) X, given Y = y, (ii) Y given X = x

1.19 The joint probability density function of X and Y is given by

f (x, y) = λ1 λ2 exp{−(λ1 x + λ2 y)}, 0 < x < ∞, 0 < y < ∞

(a) Find the marginal p.d.f. of (i) X (ii) Y .


(b) Verify whether X and Y are independent.

1.20 Show that the conditional densities f (x|y) and f (y|x) are legitimate
density functions.

44
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN SUMS, DIFFERENCES, PRODUCTS AND
PROBABILITY THEORY – VOLUME III QUOTIENTS OF BIVARIATE DISTRIBUTIONS

Chapter 2

SUMS, DIFFERENCES, PRODUCTS AND


QUOTIENTS OF BIVARIATE DISTRIBUTIONS

2.1 INTRODUCTION

In the previous chapter, we discussed the joint probability distributions,


cumulative distributions and marginal distributions of random variables X
and Y . We also discussed the independence of bivariate random variables.
In this chapter, we shall take up the distribution of sums, differences,
products and quotients of X and Y . The numerical characterisation of the
joint distribution of X and Y will be discussed in the next chapter. As
has been the practice in this book, discussions will be done separately for
discrete and continuous cases.

2.2 SUMS OF BIVARIATE RANDOM VARIABLES

2.2.1 Sums of Discrete Bivariate Random Variables

We shall introduce the concept of the sum of discrete random variables with
an example.

Example 2.1
Consider a sequence of independent experiments in each of which a fixed
event is of interest. Suppose X1 , X2 , · · · , Xn are discrete random variables

44 WE WILL TURN YOUR CV


INTO AN OPPORTUNITY
OF A LIFETIME

Do you like cars? Would you like to be a part of a successful brand?


Send us your CV on
As a constructer at ŠKODA AUTO you will put great things in motion. Things that will
www.employerforlife.com
ease everyday lives of people all around Send us your CV. We will give it an entirely
new new dimension.

45
In this chapter, we shall take up the distribution of sums, differences,
products and quotients of X and Y . The numerical characterisation of the
joint distribution
ADVANCED of INTRODUCTORY
TOPICS IN X and Y will be discussed in the next chapter. As
has been the Apractice
PROBABILITY: in thisINbook, discussions will be done SUMS,
FIRST COURSE separately for
DIFFERENCES, PRODUCTS AND
PROBABILITY
discrete and THEORY
continuous– VOLUME
cases. III QUOTIENTS OF BIVARIATE DISTRIBUTIONS

2.2 SUMS OF BIVARIATE RANDOM VARIABLES

2.2.1 Sums of Discrete Bivariate Random Variables

We shall introduce the concept of the sum of discrete random variables with
an example.

Example 2.1
Sums, Differences, Products and Quotients of Bivariate Distributions 45
Consider a sequence of independent experiments in each of which a fixed
event is of interest. Suppose X1 , X2 , · · · , Xn are discrete random variables
defined by
 44
1, if the fixed event happens in the ith experiment;
Xi =
0, if the fixed event does not happen in the ith experiment.

Let
Zn = X1 + X2 + · · · + Xn
The random variable Zn is a sum of discrete random variables Xi , i =
1, 2, ..., n, and denotes the number of times the fixed event happens in the
n trials.
For instance, the President of Regent University receives visitors com-
ing every day from the two surrounding communities. The number of visitors
on a particular day from the communities are X1 and X2 , respectively. Let
Z2 = X1 + X2 . The random variable Z2 is a sum of two random variables
and gives the total number of visitors coming to the President of Regent
University on a particular day.

Distribution of Sums of Discrete Bivariate Random Variables


in Tabular Form

Case 1
Suppose that the joint distribution of X and Y is in the form of the table as
presented in Table 1.3. Such a table can be referred to as the parent joint
probability distribution table. Then the various sums of the discrete
random variables X and Y, denoted as X + Y, and their corresponding
probabilities may be presented as in Table 2.1. This can be referred to as
the derived joint probability distribution table.

Table 2.1 Various Sums (X + Y ) and their Corresponding Probabilities

xi + yj x1 + y1 x1 + y2 ··· x1 + ym x2 + y1 x2 + y2 ···
p(xi , yj ) p(x1 , y1 ) p(x1 , y2 ) ··· p(x1 , ym ) p(x2 , y1 ) p(x2 , y2 ) ···

··· x2 + ym xn + y1 xn + y2 ··· xn + ym
··· p(x2 , ym ) p(xn , y1 ) p(xn , y2 ) ··· p(xn , ym )

To find the distribution of the sum, we apply the principle of the probabilities

46
xi + yj x1 + y1 x1 + y2 ··· x1 + ym x2 + y1 x2 + y2 ···
p(xi , yj ) p(x1 , y1 ) p(x1 , y2 ) ··· p(x1 , ym ) p(x2 , y1 ) p(x2 , y2 ) ···
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN SUMS, DIFFERENCES, PRODUCTS AND
· · · x2 + ym x + y1 xn + y2 · · · xn + ym
PROBABILITY THEORYn– VOLUME III QUOTIENTS OF BIVARIATE DISTRIBUTIONS
· · · p(x2 , ym ) p(xn , y1 ) p(xn , y2 ) · · · p(xn , ym )
46 Advanced Topics in Introductory Probability
To find the distribution of the sum, we apply the principle of the probabilities
of equivalent events.1 By this principle, the probabilities of equivalent events
are equal. That is, if A ⊂ S and B ⊂ RX are equivalent events, then we
define the probability of the event B, P (B), to be equal to P (A).
This principle is illustrated in some examples below after the following
theorem.

46 Theorem 2.1 Advanced Topics in Introductory Probability


The distribution (probability) of the sum of X = xi and Y = yj
being equal to k is the sum of the probabilities that correspond to
of equivalent
all indicesevents.
i and 1j By this
that sumprinciple,
to k the probabilities of equivalent events
are equal. That is, if A ⊂ S and B ⊂ RX are equivalent events, then we
define the probability of the event B, P (B), to be equal to P (A).
This principle is illustrated in some examples below after the following
Example 2.2
theorem.
Given the table below, find the distribution of the sum X+Y.

Theorem 2.1
Y
The distribution (probability) of the sum of X = xi and Y = yj
X −1 0 1 Row Totals
being equal to k is the sum of the probabilities that correspond to
j that sum0.10
all indices i and−1 to k 0.20 0.11 0.41
0 0.08 0.02 0.26 0.36
2 0.03 0.17 0.03 0.23
Column Totals 0.21 0.39 0.40 1.00
Example 2.2
Given the table below, find the distribution of the sum X+Y.
Solution
We shall list the values of x + y in each cell within parenthesis as in the
table Differences,
Sums, below: Products and Quotients Y of Bivariate Distributions 47
X −1 0 1 Row Totals
1
The events A ⊂ S −1 0.10called
and B ⊂ RX are 0.20 0.11 0.41
Y equivalent events if
0 0.08 0.02 0.26 0.36
X −1 0 1 Row Totals
2 0.03 0.17
A = {s ∈ S|X(s) ∈ B} 0.03 0.23
−1 0.10 0.20 0.11 0.41
Column Totals 0.21 0.39 0.40 1.00
where RX is the range space of the(-2)
random(-1)
variable(0)
X.
0 0.08 0.02 0.26 0.36
(-1) (0) (1)
Solution 2 0.03 0.17 0.03 0.23
We shall list the values of x + (1)y in (2)
each cell(3) within parenthesis as in the
table below:Column Totals 0.21 0.39 0.40 1.00

The1values for x+y,


The events A ⊂ Salso
and known
B ⊂ RX as
arethe support
called eventsare
of x+y,
equivalent if {−2, −1, 0, 1, 2, 3}
and the probability of each value P [(X + Y ) = x + y] is the sum of all the
2

cell probabilities where that valueA = {soccurs.


∈ S|X(s)For
∈ B}example, there are two (mu-
tually exclusive) ways that the sum X + Y can equal 0, and that is, if X = 0
and Y R=X 0is or
where the if X =
range −1ofand
space Y = 1,variable
the random so that
X.the probability of the sum
X + Y = 0 is P (X = 0, Y = 0) or P (X = −1, Y = 1). That is,
P [(X + Y ) = 0] = P (X = 0, Y = 0) + P (X = −1 Y = 1)
= 0.11 + 0.02
= 0.13 = p(0) 47
0 0.08
(-1) 0.02(0) 0.030.26
(1) 0.36
2 0.03 0.17 0.23
2 (-1)
0.03 (0) (1)
(1) 0.17 (2) 0.03 (3) 0.23
ADVANCED TOPICS IN 2INTRODUCTORY 0.03 0.17 0.03 0.23
(1) (2) (3)
PROBABILITY: Column Totals IN0.21
A FIRST COURSE (1) 0.39 (2) 0.40 (3) 1.00 SUMS, DIFFERENCES, PRODUCTS AND
Column– Totals
PROBABILITY THEORY VOLUME III0.21 0.39 0.40 1.00QUOTIENTS OF BIVARIATE DISTRIBUTIONS
Column Totals 0.21 0.39 0.40 1.00
The values for x+y, also known as the support of x+y, are {−2, −1, 0, 1, 2, 3}
The values
and the for x+y, also
probability known
of each as2the
value support
P [(X +Y) = + y] are
ofxx+y, {−2,
is the sum−1,of0,all
1, 2,
the3}
The
and values
the for x+y,
probability also
of known
each as2the
value P support
[(X + Y ) ofxx+y,
= + y] are
is {−2,
the sum−1,of0,all
1, 2,
the3}
cell probabilities where that value2 occurs. For example, there are two (mu-
and
cell the probability
probabilities of each
wherethat value
thatthe
value P [(X + Y ) = x + y] is the sum of all the
tually exclusive) ways sumoccurs.
X + Y canFor equal
example,
0, andthere are
that is,two
if X(mu-
=0
cell probabilities
tually where that value occurs. For example, there are two X(mu-
and Y exclusive)
= 0 or if waysX =that−1 the
andsumY =X1,+ Y so can
thatequal 0, and that is,
the probability of if
the =0
sum
tually
and Y exclusive)
= 0 or if ways
X = that
−1 the
and sum
Y = X1,+ Yso can
that equal
the 0, and that is,
probability of if
the =0
X sum
X + Y = 0 is P (X = 0, Y = 0) or P (X = −1, Y = 1). That is,
and Y = 0 or if X = −1 and Y = 1, so that
X + Y = 0 is P (X = 0, Y = 0) or P (X = −1, Y = 1). That is, the probability of the sum
X +Y = 0 is +
P [(X P (X 0, Y== 0)
Y ) = 0] or P
P (X =(X0, Y= =−1,0) Y+ = 1). =
P (X That
−1 Yis,= 1)
P [(X + Y ) = 0] = P (X = 0, Y = 0) + P (X = −1 Y = 1)
P [(X + Y ) = 0] = = P 0.11
(X+=0.020, Y = 0) + P (X = −1 Y = 1)
= 0.11 + 0.02
= 0.13 =
= 0.11 + 0.02 p(0)
= 0.13 = p(0)
Again, the only way X += Y can0.13equal
= p(0)2 is when X = 2 and Y = 0 so that
Again, the only way X + Y can
the probability of the sum X + Y = 2 is equal 2 is when X = 2 and Y = 0 so that
Again, the only way X + Y can
the probability of the sum X + Y = 2 is equal 2 is when X = 2 and Y = 0 so that
the probability of P the
[(Xsum
+ YX) += Y2] ==2 isP (X = 2, Y = 0)
P [(X + Y ) = 2] = P (X = 2, Y = 0)
P [(X + Y ) = 2] = = P 0.17
(X ==p(2)
2, Y = 0)
= 0.17 = p(2)
Thus, we can summarise the table = 0.17 = p(2)
as follows:
Thus, we can summarise the table as follows:
Thus, we can summarise the table as follows:
(x + y) −2 −1 0 1 2 3
(x + y) −2 −1 0
p(x + y) 0.1 0.28 0.13 0.29 0.17 0.03 1 2 3
(x +
p(x +y)y) −20.1 0.28−1 0.13 0 1
0.29 2
0.17 3
0.03
p(x + y) 0.1 0.28 0.13 0.29 0.17 0.03
Aliter
Aliter
We can obtain this distribution directly from the joint distribution table by
Aliter
We can obtain this
this example
distribution
going through stepdirectly
by step.from the joint distribution table by
We can
going obtain this
through this example
distribution
step directly
by step.from the joint distribution table by
going through this example step by step.
2
Take note that
2
Takenote that p(x + y) = P [(X + Y ) = x + y]
2
Take note that p(x + y) = P [(X + Y ) = x + y]
p(x + y) = P [(X + Y ) = x + y]

Develop the tools we need for Life Science


Masters Degree in Bioinformatics

Bioinformatics is the
exciting field where biology,
computer science, and
mathematics meet.

We solve problems from


biology and medicine using
methods and tools from
computer science and
mathematics.

Read more about this and our other international masters degree programmes at www.uu.se/master

48
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN SUMS, DIFFERENCES, PRODUCTS AND
48
PROBABILITY THEORY – VOLUME Advanced
III QUOTIENTS
Topics in Introductory OF BIVARIATE DISTRIBUTIONS
Probability

Step 1
Indicate the various summands with their corresponding joint probabilities

xi + yj −1 + (−1) −1 + 0 −1 + 1 0 + (−1) 0+0 0+1


p(xi , yj ) 0.10 0.20 0.11 0.08 0.02 0.26

2 + (−1) 2+0 2+1


0.03 0.17 0.03

Step 2
48
Sum the various values for X Advanced
and Y Topics in Introductory Probability

x+y −2 −1 0 −1 0 1 1 2 3
Step 1
p(x, y) 0.1 0.2 0.11 0.08 0.02 0.26 0.03 0.17 0.03
Indicate the various summands with their corresponding joint probabilities

By the principle of equality of the probabilities of equivalent events, we shall


x + yj −1 + (−1) −1 + 0 −1 + 1 0 + (−1) 0 + 0 0 + 1
perform ithe operation in Step 3.
p(xi , yj ) 0.10 0.20 0.11 0.08 0.02 0.26
Step 3
2 + (−1) 2 + 0 2 + 1
Sum the various probabilities corresponding to a particular value of the sum
0.03 0.17 0.03
x+y, denoted by p(x+y) (by the addition rule of probability)3 . For example,
the probability of the sum x + y = 1, is given by
Step 2
P [(X +
Sum the various values ) =and
forYX 1] Y= p(1)
= 0.26 + 0.03 = 0.29
x+y −2 −1 0 −1 0 1 1 2 3
Thus, we summarise the table in Step 3 in the table below:
p(x, y) 0.1 0.2 0.11 0.08 0.02 0.26 0.03 0.17 0.03

x+y −2 −1 1 0 2 3
By the principle of equality of the probabilities of equivalent events, we shall
p(x + y) 0.1 0.2 + 0.08 0.26 + 0.03 0.11 + 0.02 0.17 0.03
perform the operation in Step 3.
Step 4
Step 3
Present the final result as in the table below:
Sum the various probabilities corresponding to a particular value of the sum
x+y, denoted by p(x+y) (by the addition rule of probability)3 . For example,
Sums, Differences,(x +Products
y) −2 −1Quotients 0 of1by 2 Distributions
3 49
the probability of the sum x and+ y = 1, is given Bivariate
p(x + y) 0.1 0.28 0.13 0.29 0.17 0.03
P [(X + Y ) = 1] = p(1)
3
We iscan
That if averify thatvalue
particular thisindistribution of +
the sum appears
= 0.26 the sum
more
0.03 = is
than a probability
once,
0.29 distri-
their probabilities
are addedThat
bution. together
is, for the purpose of constructing the probability distribution.
Thus, we summarise the table0 in Step+3y)in≤the
≤ p(x 1 table below:
and x+y −2 −1  1 0 2 3
p(x + y) = 1
p(x + y) 0.1 0.2 + 0.08 0.26 + 0.03 0.11 + 0.02 0.17 0.03

Step 4
Case 2
Present the final result as in the table below:
Suppose that the joint distribution of X and Y is in the form of Table 1.3.
Suppose also that X and Y are independent so that p(xi , yj ) = g(xi )h(yj ).
(x + y) −2 −1 0 1 2 3
Then the various sums X + Y and their corresponding probabilities may be
p(x + y) 0.1 0.28 0.13 0.29 0.17 0.03
presented in the form of the following table.
3
That is if a particular value in the sum appears more than once, their probabilities
are added together for the purpose of constructing the probability distribution.
xi + yj x1 + y1 x2 + y2 ··· xn + ym
p(xi , yj ) p(x1 )p(y1 ) p(x2 )p(y2 ) ··· p(xn )p(ym )

49
Theorem 2.2
Case 2
Suppose that the joint distribution of X and Y is in the form of Table 1.3.
ADVANCED TOPICS IN INTRODUCTORY
Suppose also that X and Y are independent so that p(xi , yj ) = g(xi )h(yj ).
PROBABILITY: A FIRST COURSE IN SUMS, DIFFERENCES, PRODUCTS AND
Then the various sums X + Y and their corresponding probabilities
PROBABILITY THEORY – VOLUME III
may be
QUOTIENTS OF BIVARIATE DISTRIBUTIONS
presented in the form of the following table.

xi + yj x1 + y1 x2 + y2 ··· xn + ym
p(xi , yj ) p(x1 )p(y1 ) p(x2 )p(y2 ) ··· p(xn )p(ym )

Theorem 2.2
Suppose that X and Y are independent, discrete random variables
with marginal probability distributions g(x) and h(y) respectively,
then P (X + Y = k) is the sum of the product of g(x) and h(y)
for the corresponding value k = x + y

Example 2.3
Suppose the joint probabilities of two random variables X and Y are given
in the table below.

Y
X 3 6 Row Total
-2 0.28 0.12 0.4
4 0.42 0.18 0.6
Column Total 0.7 0.3 1.0

50 Advanced Topics in Introductory Probability


(a) Verify that X and Y are independent random variables.
(b) Find the distribution of the sum of X and Y .

Solution
(a) The marginal probability distributions of the X is

x -2 4
g(x) 0.4 0.6

and the marginal probability distributions of the Y is

y 3 6
h(y) 0.7 0.3

so that we have

p(−2, 3) = 0.28 = g(−2)h(3) = 0.4(0.7) = 0.28


p(−2, 6) = 0.12 = g(−2)h(6) = 0.4(0.3) = 0.12
p(4, 3) = 0.42 = g(4)h(3) = 0.6(0.7) = 0.42
p(4, 6) = 0.18 = g(4)h(6) = 0.6(0.3) = 0.18

Hence, X and Y are independent.


(b) It has been shown that X and Y are independent. Hence, we obtain
the following table:

xi + yj −2 + 3 −2 + 6 4+3 4+6
p(xi + yj ) 0.4(0.7) 0.4(0.3) 0.6(0.7) 0.6(0.3)
50
p(−2, 6) = 0.12 = g(−2)h(6) = 0.4(0.3) = 0.12
ADVANCED TOPICSp(4, 3) = 0.42 =
IN INTRODUCTORY g(4)h(3) = 0.6(0.7) = 0.42
PROBABILITY: A FIRST COURSE IN SUMS, DIFFERENCES, PRODUCTS AND
p(4, 6) = 0.18 = g(4)h(6) = 0.6(0.3) = 0.18
PROBABILITY THEORY – VOLUME III QUOTIENTS OF BIVARIATE DISTRIBUTIONS
Hence, X and Y are independent.
(b) It has been shown that X and Y are independent. Hence, we obtain
the following table:

xi + yj −2 + 3 −2 + 6 4+3 4+6
p(xi + yj ) 0.4(0.7) 0.4(0.3) 0.6(0.7) 0.6(0.3)

Note
Even though the values of the random variables of X and Y are added
together, their corresponding probabilities g(xi ) and h(yj ) are multi-
plied, if X and Y are independent.

Sums,By the principle


Differences, of equality
Products of probabilities
and Quotients of equivalent
of Bivariate 51
events, the
Distributions
distribution of the sum of X and Y is presented in the table below:
xi + yj 1 4 7 10
p(xi + yj ) 0.28 0.12 0.42 0.18

Distribution of Sums of Discrete Bivariate Random Variable in


Expression Form

In most cases the distribution of X and Y may not be presented in a table.


Suppose that all that we know are the values of X + Y and their corre-
sponding probabilities. In such a case, to obtain the distribution of X + Y
we shall use Theorem 2.3.

Theorem 2.3
Suppose X and Y are nonnegative discrete random variables with
joint probability function p(x, y). Let Z = X + Y . Then the proba-
bility distribution of Z is
z

P (Z = z) = p(x, z − x)
x=0

or equivalently,
Copenhagen P (Z = z) =
z


y=0
p(z − y, y)

Master of Excellence
Proof
cultural studies
P (Z = z)Master
Copenhagen = P (Xof+Excellence
Y = z) are
z
two-year master= degrees
P (X taught
= x, X +in
Y =English
z)
at one of Europe’s x=0
leading universities religious studies
z

= P (X = x, Y = z − x) (i)
Come to Copenhagenx=0
- and aspire!

fromApply now
which the at follows.
result science
Similarly,
www.come.ku.dk
z

P (Z = z) = p(z − y, y)
y=0

51
In most cases
ADVANCED the IN
TOPICS distribution of X and Y may not be presented in a table.
INTRODUCTORY
Suppose thatAall
PROBABILITY: that
FIRST we know
COURSE IN are the values of X + Y and their
SUMS, corre-
DIFFERENCES, PRODUCTS AND
PROBABILITY THEORY – VOLUME
sponding probabilities. In suchIIIa case, to obtain the distribution
QUOTIENTSof OF
X+ BIVARIATE
Y DISTRIBUTIONS

we shall use Theorem 2.3.

Theorem 2.3
Suppose X and Y are nonnegative discrete random variables with
joint probability function p(x, y). Let Z = X + Y . Then the proba-
bility distribution of Z is
z

P (Z = z) = p(x, z − x)
x=0

or equivalently,
z

P (Z = z) = p(z − y, y)
y=0

Proof

P (Z = z) = P (X + Y = z)
z

= P (X = x, X + Y = z)
x=0
z
= P (X = x, Y = z − x) (i)
x=0

from which the result follows.


Similarly,
z

P (Z = z) = p(z − y, y)
52 y=0
Advanced Topics in Introductory Probability

Note
It is important to note the limits of summation. Beyond these limits, one
of the two component mass functions is zero. When dealing with densities
that are nonzero only on some subset of values, we must always be careful.
In case X and Y are allowed to take negative values as well, the lower index
of summation is changed from 0 to −∞.

Example 2.4
Refer to Example 1.2, find

(a) the distribution of Z = X + Y ;

(i) from the first principle (that is, using the joint distribution of X
and Y directly;)
(ii) using Theorem 2.3.

(b) P (Z = 2),

(i) using the distribution obtained in (a) (i);


(ii) using the distribution obtained in (a) (ii).

Solution
Recall from the solution of Example 1.2 that

1
p(x, y) = (3x + 2y), for x = 0, 1; y = 0, 1, 2
21
The various pairs of X and Y that we require 52
are
(b) P (Z = 2),
ADVANCED TOPICS IN INTRODUCTORY
(i) using
PROBABILITY: the distribution
A FIRST COURSE IN obtained in (a) (i); SUMS, DIFFERENCES, PRODUCTS AND
PROBABILITY THEORY – VOLUME III QUOTIENTS OF BIVARIATE DISTRIBUTIONS
(ii) using the distribution obtained in (a) (ii).

Solution
Recall from the solution of Example 1.2 that

1
p(x, y) = (3x + 2y), for x = 0, 1; y = 0, 1, 2
21
The various pairs of X and Y that we require are

(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)

By the principle of equality of probabilities of equivalent events, the events


{(0, 1)} and {1, 0} are equivalent to the event {Z = 1}. Similarly, the
events {(0, 2)} and {(1, 1)} are equivalent to the event {Z = 2}. Hence,
the possible values of Z are z = 0, 1, 2, 3.

(a) (i) Writing the sum of the joint probability mass function as
1 2
1 
P (X + Y = x + y) = (3x +
Sums, Differences, Products and Quotients21of Bivariate 2y)
Distributions 53
x=0 y=0

we calculate the distribution of Z = X + Y as follows.

P (Z = 0) = p(0, 0)
1
= [3(0) + 2(0)]
21
= 0

P (Z = 1) = p(0, 1) + p(1, 0)
1 1
= [3(0) + 2(1)] + [3(1) + 2(0)]
21 21
5
=
21

P (Z = 2) = p(0, 2) + p(1, 1)
1 1
= [3(0) + 2(2)] + [3(1) + 2(1)]
21 21
9
=
21

P (Z = 3) = p(1, 2)
1
[3(1) + 2(2)]
21
7
=
21

Hence the distribution of Z = X + Y is

x+y 0 1 2 3
p(x + y) 0 5
21
9
21
7
21

(ii) The distrbution of Z = X + Y by Theorem 2.3 is


 z
 


 p(0, z − 0), z = 0, 1, 2, 3
 x=0
P (Z = z) = 53

 z
 

x+y
ADVANCED TOPICS IN INTRODUCTORY 0 1 2 3
PROBABILITY: A FIRST COURSE IN+ y)
p(x 0 5 9 7 SUMS, DIFFERENCES, PRODUCTS AND
PROBABILITY THEORY – VOLUME III 21 21 21 QUOTIENTS OF BIVARIATE DISTRIBUTIONS

(ii) The distrbution of Z = X + Y by Theorem 2.3 is


 z
 


 p(0, z − 0), z = 0, 1, 2, 3
 x=0
P (Z = z) =

54  z
Advanced

 Topics in Introductory Probability
 p(1, z − 1), z = 1, 2, 3
x=1

Aliter
 z
 



p(0, z − 0), z = 0, 1, 2, 3

 y=0





 z
P (Z = z) = p(1, z − 1), z = 1, 2, 3

 y=1





 z

 

 p(2, z − 2), z = 2, 3
y=2

(b) (i) From the table obtained in (a), that is, calculating the probability
from first principle, we have
3
P (Z = 2) =
7
(ii) By Theorem 2.3 (that is, using the results in (a)(ii)) we have
1

P (Z = 2) = p(x, 2 − x)
x=0
1
1 
= [3x + 2(2 − x)]
21 x=0
1
1 
Brain power =
21 x=0
3
(x + 4) By 2020, wind could provide one-tenth of our planet’s
electricity needs. Already today, SKF’s innovative know-
how is crucial to running a large proportion of the
= world’s wind turbines.
7 Up to 25 % of the generating costs relate to mainte-
nance. These can be reduced dramatically thanks to our
systems for on-line condition monitoring and automatic
Aliter lubrication. We help make it more economical to create
Using the alternative formula in Theorem 2.3, cleaner, cheaper energy out of thin air.
By sharing our experience, expertise, and creativity,
2
 industries can boost performance beyond expectations.
P (Z = 2) = p(2 − y, y) Therefore we need the best employees who can
y=0 meet this challenge!
2

= p(2 − y, y), since The=Power
p(2, 0) 0 of Knowledge Engineering
y=1
2
1 
= [3(2 − y) + 2y]
21 y=1

Plug into The Power of Knowledge Engineering.


Visit us at www.skf.com/knowledge

54
y=2

(b) (i) From the table obtained in (a), that is, calculating the probability
ADVANCED TOPICS IN INTRODUCTORY
from first
PROBABILITY: principle,
A FIRST COURSEwe IN
have SUMS, DIFFERENCES, PRODUCTS AND
PROBABILITY THEORY – VOLUME III QUOTIENTS OF BIVARIATE DISTRIBUTIONS
3
P (Z = 2) =
7
(ii) By Theorem 2.3 (that is, using the results in (a)(ii)) we have
1

P (Z = 2) = p(x, 2 − x)
x=0
1
1 
= [3x + 2(2 − x)]
21 x=0
1
1 
= (x + 4)
21 x=0
3
=
7

Aliter
Using the alternative formula in Theorem 2.3,
2

P (Z = 2) = p(2 − y, y)
y=0
2

= p(2 − y, y), since p(2, 0) = 0
y=1
2
Sums, Differences, Products 1  55
= and Quotients of 2y]
[3(2 − y) + Bivariate Distributions
21 y=1
2
1 
= (6 − y)
21 y=1
3
=
7

Theorem 2.4 CONVOLUTION THEOREM


(Discrete Case)
Suppose that X and Y are independent nonnegative discrete ran-
dom variables with marginal probability distributions g(x) and h(y)
respectively. Let Z be the sum of X and Y . Then the convolution
of g and h is the distribution of the sum Z given by
z

P (Z = z) = g(x)h(z − x) z≥0
x=0

or equivalently,
z

P (Z = z) = g(z − y)h(y) z≥0
y=0

called the convolution of g and h

Proof
Since X and Y are independent, the proof follows by writing
P (X = x, Y = z − x)
as
P (X = x)P (Y = z − x)
in Theorem 2.3.
55
Theorem 2.4 suggests a method of obtaining probability mass function when
z

P (Z = z) = g(z − y)h(y) z≥0
ADVANCED TOPICS IN INTRODUCTORY
y=0
PROBABILITY: A FIRST COURSE IN SUMS, DIFFERENCES, PRODUCTS AND
PROBABILITY THEORY – VOLUME III QUOTIENTS OF BIVARIATE DISTRIBUTIONS
called the convolution of g and h

Proof
Since X and Y are independent, the proof follows by writing
P (X = x, Y = z − x)
as
P (X = x)P (Y = z − x)
in Theorem 2.3.

Theorem 2.4 suggests a method of obtaining probability mass function when


X + Y are independent.
56 Advanced Topics in Introductory Probability
When we have two one-dimensional probability mass functions g and
h, the convolution formula
is a one-dimensional probability
z mass function. Thus the probability func-
tion
56 of the sum ofPtwo
(Z independent
= z) = random
g(x)h(z
Advanced variables
− x),
Topics is the
z≥ 0 convolution
in Introductory
4 of
Probability
the individual probability functions.
x=0

is
Notea one-dimensional probability mass function. Thus the probability func-
tion of the sum of two independent random variables is the convolution4 of
(a) Theorem 2.3 also expresses convolution;
the individual probability functions.
(b) The convolution of g and h is denoted as g ∗ h.
Note
Since
(a) Theorem 2.3 also expresses convolution;
X +Y =Y +X
(b) The convolution of g and h is denoted as g ∗ h.
it is easy to verify that
Since z

P (Z = z)
X +=Y = Y g(x)h(z
+ X − x)
x
it is easy to verify that z

= z
g(z − y)h(y)

y
P (Z = z) = g(x)h(z − x)
that is, g ∗ h = h ∗ g. x
z

Example 2.5 = g(z − y)h(y)
y
Suppose that X and Y are independent discrete random variables having
that is, g ∗ hdistributions
probability = h ∗ g.

Example 2.5 e−λ λx e−θ θy


pX (x) = and pY (y) =
Suppose that X and Y are independent
x! discrete random y! variables having
probability distributions
respectively, where x = 0, 1, 2, . . . and y = 0, 1, 2, . . . Find
(a) the distribution of thee−λ λx X +pY.(y) = e θ
−θ y
pX (x) = sum Z = and Y
x! y!
(b) P (X + Y = 4), if λ = 1 and θ = 2.
respectively, where x = 0, 1, 2, . . . and y = 0, 1, 2, . . . Find
Solution
(a) the distribution of the sum Z = X + Y.
The joint probability distribution of X and Y is given by
(b) P (X + Y = 4), if λ = 1 and θ = 2.
e−λ λx e−θ θy
p(x, y) = ·
Solution x! y!
The
4 joint probability distribution of X and Y is given by
Some writers prefer to use the French word Composition or even the German equiva-
lent Faltung
e−λ λx e−θ θy
p(x, y) = ·
x! y!
4
Some writers prefer to use the French word Composition or even the German equiva-
lent Faltung
56
x! y!
respectively, where x = 0, 1, 2, . . . and y = 0, 1, 2, . . . Find
ADVANCED TOPICS IN INTRODUCTORY
(a) the distribution
PROBABILITY: of the sum
A FIRST COURSE IN Z = X + Y. SUMS, DIFFERENCES, PRODUCTS AND
PROBABILITY THEORY – VOLUME III QUOTIENTS OF BIVARIATE DISTRIBUTIONS
(b) P (X + Y = 4), if λ = 1 and θ = 2.

Solution
The joint probability distribution of X and Y is given by
Sums, Differences, Products and Quotients
e−λ λx of
e−θBivariate
θy Distributions 57
p(x, y) = ·
x! y!
(a)
4 Bywriters
Some Theorem 2.4
prefer to use the French word Composition or even the German equiva-
lent Faltung z
 e−λ λx e−θ θz−x
P (Z = z) = ·
x=0
x! (z − x)!

(Note that the summation goes from x = 0 to x = z, since x cannot


exceed z = x + y).
z
 e−(λ+θ) λx θz−x
P (Z = z) =
x=0
x!(z − x)!
z
 λx θz−x
= e−(λ+θ)
x=0
x!(z − x)!
z
e−(λ+θ)  z!
= λx θz−x
z! x=0 x!(z − x)!

Identifying the sum as the binomial expansion of (λ + θ)z we obtain

e−(λ+θ) (λ + θ)z
P (Z = z) = , z = 0, 1, 2, ...
z!
(b) When λ = 1 and θ = 2

e−(1+2) (1 + 2)4
P (Z = 4) =
4!
Trust and responsibility
e−3 (3)4
=
4!
NNE and Pharmaplan have joined forces to create – You have to be proactive and open-minded as a
NNE Pharmaplan, the world’s leading engineering newcomer and make it clear to your colleagues what
and consultancy company focused entirely on the you are able to cope. The pharmaceutical field is new
Example 2.5 biotech
pharma and showsindustries.
that, if X and Y are independent random
to me. But busy asvariables
they are, most of my colleagues
having Poisson distribution with parameters λ and θ firespectively thenme,
nd the time to teach their
and they also trust me.
Inés Aréizaga Esteva (Spain), 25 years old Even though it was a bit hard at first, I can feel over
sum hasEducation:
a Poisson distribution with parameter λ + time
Chemical Engineer
θ. That is, the sum of
that I am beginning to be taken seriously and
the two independent Poisson random variables is again that mya contribution
Poisson random
is appreciated.
variable. The ideas here could be used to prove more generally by induction
that the sum of a finite Poisson random variables also has a Poisson distri-
bution, a result we proved earlier using the moment generating function in
my book “Introductory Probability Theory” (Nsowah-Nuamah, 2017).

Note
In general, if two independent random variables follow the same type of

NNE Pharmaplan is the world’s leading engineering and consultancy company


focused entirely on the pharma and biotech industries. We employ more than
1500 people worldwide and offer global reach and local knowledge along with
our all-encompassing list of services. nnepharmaplan.com

57
x=0

Identifying
ADVANCED TOPICS the sum as the binomial
IN INTRODUCTORY expansion of (λ + θ)z we obtain
PROBABILITY: A FIRST COURSE IN SUMS, DIFFERENCES, PRODUCTS AND
PROBABILITY THEORY – VOLUME III QUOTIENTS OF BIVARIATE DISTRIBUTIONS
−(λ+θ)
e (λ + θ)z
P (Z = z) = , z = 0, 1, 2, ...
z!
(b) When λ = 1 and θ = 2

e−(1+2) (1 + 2)4
P (Z = 4) =
4!

e−3 (3)4
=
4!

Example 2.5 shows that, if X and Y are independent random variables


having Poisson distribution with parameters λ and θ respectively then their
sum has a Poisson distribution with parameter λ + θ. That is, the sum of
the two independent Poisson random variables is again a Poisson random
variable. The ideas here could be used to prove more generally by induction
that the sum of a finite Poisson random variables also has a Poisson distri-
bution, a result we proved earlier using the moment generating function in
my book “Introductory Probability Theory” (Nsowah-Nuamah, 2017).
58 Advanced Topics in Introductory Probability
Note
In general, if two independent random variables follow the same type of
distribution, it is not necessarily true that their sum follows the same type
of distribution (see Example 2.7 in the sequel.)

2.2.2 Sums of Continuous Bivariate Random Variables

The formulas for finding sums of continuous random variables are similar to
those in the discrete case by replacing the sum by the integral. However,
the actual computation is more complex with the continuous case.

Theorem 2.5
Suppose that X and Y are continuous random variables having joint
density function f (x, y). Let Z = X + Y and denote the density
function of Z by s(z). Then
 ∞
s(z) = f (x, z − x) dx
−∞

or equivalently,  ∞
s(z) = f (z − y, y) dy
−∞

Proof
We shall use the cumulative distribution function technique. Let S denote
the c.d.f. of the random variable Z. Then
 
S(z) = P (Z ≤ z) = P (X + Y ≤ z) = f (x, y) dx dy
R

where R is the part of the region over which x + y < z, shown shaded in the
Figure 2.1.
To represent this integral as an iterated integral, we fix x and then
integrate with respect to y from z − x to −∞. Next integrate with respect
to x from −∞ to ∞. Thus
 ∞  z−x 58 
S(z) = f (x, y) dy dx
 
S(z) = P (Z ≤ z) = P (X + Y ≤ z) = f (x, y) dx dy
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN R SUMS, DIFFERENCES, PRODUCTS AND
PROBABILITY THEORY – VOLUME III QUOTIENTS OF BIVARIATE DISTRIBUTIONS
where R is the part of the region over which x + y < z, shown shaded in the
Figure 2.1.
To represent this integral as an iterated integral, we fix x and then
integrate with respect to y from z − x to −∞. Next integrate with respect
to x from −∞ to ∞. Thus
 ∞  z−x 
S(z) = and Quotientsf of
Sums, Differences, Products (x,Bivariate
y) dy dxDistributions 59
x=−∞ y=−∞

Let u = x+y; for fixed x, du = dy. If x is fixed then when y = z−x, u = z.


Thus  ∞  z 
S(z) = f (x, u − x) du dx
x=−∞ u=−∞

We assume that f (x, y) has continuous partial derivative. Therefore we


exchange the order of integration and obtain
 z  ∞ 
S(z) = f (x, u − x) dx du
u=−∞ x=−∞

Fig. 2.1 Region x + y < z is shaded

Differentiating S(z) with respect to z using the fundamental theorem of


calculus will yield the density function of the random variable Z. Hence,
 z  ∞  
d S(z) d
= f (x, u − x) dx du
dz dz u=−∞ x=−∞
 ∞
s(z) = f (x, z − x) dx
−∞

Reversing the roles played by X and Y , we obtain,


 ∞
s(z) = f (z − y, y) dy
−∞

Example 2.6
Refer to Example 1.3.
60 Advanced Topics in Introductory Probability
(a) Find the distribution of Z = X + Y ;
(b) Calculate P (Z ≤ 2) using
(i) the distribution of Z;
(ii) the joint distribution of (X, Y ), that is, from the first principle.

Solution
(a) Z = X + Y =⇒ Y = Z − X
Therefore
59
0 < Y < 2 =⇒ 0 ≤ Z − X ≤ 2 =⇒ X ≤ Z ≤ X + 2
(b) Calculate P (Z ≤ 2) using
ADVANCED TOPICS IN INTRODUCTORY
(i) theA distribution
PROBABILITY: FIRST COURSE ofINZ; SUMS, DIFFERENCES, PRODUCTS AND
PROBABILITY THEORY – VOLUME III QUOTIENTS OF BIVARIATE DISTRIBUTIONS
(ii) the joint distribution of (X, Y ), that is, from the first principle.

Solution
(a) Z = X + Y =⇒ Y = Z − X
Therefore

0 < Y < 2 =⇒ 0 ≤ Z − X ≤ 2 =⇒ X ≤ Z ≤ X + 2

To determine the region of integration we sketch below the preceding


inequality and 0 < X < 1.

It can be seen from the sketch that the region of integration is parti-
tioned into three, namely,

0 ≤ X ≤ Z when 0 < Z ≤ 1
0 ≤ X ≤ 1 when 1 < Z ≤ 2
Z − 2 ≤ X ≤ 1 when 2 < Z ≤ 3

For 0 < Z ≤ 1,
  
z x(z − x) 7 3
s(z) = x2 + dx = z
x=0 3 18
For 1 < Z ≤ 2,

This e-book
  
1 x(z − x) 2 z
s(z) = x +
2
dx = +
x=0 3 9 6

is made with SETASIGN


SetaPDF

PDF components for PHP developers

www.setasign.com

60
tioned into three, namely,

0≤X≤Z
ADVANCED TOPICS IN INTRODUCTORY when 0 < Z ≤ 1
PROBABILITY: A FIRST COURSE IN SUMS, DIFFERENCES, PRODUCTS AND
0≤X≤1
PROBABILITY THEORY – VOLUME III
when 1 < Z ≤ 2QUOTIENTS OF BIVARIATE DISTRIBUTIONS
Z − 2 ≤ X ≤ 1 when 2 < Z ≤ 3

For 0 < Z ≤ 1,
  
z x(z − x) 7 3
s(z) = x +
2
dx = z
x=0 3 18
Sums,For 1 < Z ≤ Products
Differences, 2, and Quotients of Bivariate Distributions 61
Sums, Differences, Products and Quotients of Bivariate

 1and Quotients of Bivariate Distributions Distributions 61
Sums, Differences, Products x(z − x) 2 z 61
s(z) = x +
2
dx = +
Finally, for 2 < Z ≤ x=0 3, 3 9 6
Finally, for 2 < Z ≤ 3,
Finally, for
 1 2 <Z ≤ 3,
x(z − x) 

1  
  2
s(z) =  1
 x2 + x(z − x)  dx = 1 −7z 33 + 36z 22 − 57z + 36
s(z) = x=z−2
1 x + x(z 3− x) dx = 18 1 −7z + 36z − 57z + 36
s(z) = x=z−2 x2 + 3 dx = 18 −7z 3 + 36z 2 − 57z + 36
x=z−2 3 18
Therefore, the distribution of Z is
Therefore, the distribution of Z is
Therefore, the distribution
 7 of Z is
 7 z3,

 0<z≤1


 18

 7 z 33 , 0<z≤1


 1818 z , 0<z≤1

 2 z


 2 + z,

 1<z≤2
s(z) =  
 9
2 + z6 , 1<z≤2
s(z) =  

 9 + 6, 1<z≤2
s(z) =  
 91 6 



 1  −7z 33 + 36z 22 − 57z + 36 , 2 < z ≤ 3

 18
 1 −7z 3 + 36z 2 − 57z + 36 , 2 < z ≤ 3

 18 −7z + 36z − 57z + 36 , otherwise
 0, 2<z≤3



 0,
 18 otherwise
0, otherwise
(b) (i) Using the distribution of Z we calculate the required probability
(b) (i) Using the distribution of Z we calculate the required probability
as follows:
(b) (i) Using the distribution of Z we calculate the required probability
as follows:
as follows:
P (X ≤ 2) = P (0 < z ≤ 1) + P (1 < z < 2)
P (X ≤ 2) =  P (01 <7z ≤ 1) +P 2(1 <  z < 2)
P (X ≤ 2) = P (0 < z ≤ 1) + P (1 < 2z < x2)
=  1 7 x dx +  2 
 
 2 + x  dx
= x=0 1 18
7 x dx + 2
x=1 9
2 x+ 6 dx
= x=0 418 1x dx  +x=1 9 +62 dx
x=07 z 18   1 x=1
2 9z 6
2
=  7 z 4 11 +  1  2 z + z 2 22
= 18
7 z44 + 31 32 z + z42 1
= 18 4 00 + 3 3 z + 4
4118 4 3 3 4 1
= 41 0 1
= 72 41
= 72
72
(ii) Now, using the joint distribution of (X, Y )
(ii) Now, using the joint distribution of (X, Y )
(ii) Now, using the joint distribution of (X, Y )
 
P (Z ≤ 2) =   f (x, y) dy dx
P (Z ≤ 2) =   f (x, y) dy dx
P (Z ≤ 2) = z<2 f (x, y)dy dx 
z<21 2−x xy
=  z<21  2−x x2 + xy  dy dx
x=0   3  dy dx
= 1 2−x x + xy
y=0
2
= x=0 y=0 x2 2+2−x 3 dy dx

1 y=0
x=0 xy 2 2−x3
=  1 x2 y + xy 2−x dx
2
= x=0 1 x y + xy 62 dx
= x=0  x2 y + 6 y=0 dx  
62 
1
x=0 Advanced
 6Topicsy=0
x in Introductory
2  Probability
x 
=  1 x2 (2 − x) + x (2 − x)2 − x2 + x  dx
2 y=0 2
= x=0 1 x2 (2 − x) + x 6 (2 − x)2 − x + x 6 dx
=  x=0 6
1  x (2 − x) + 1 (2 − x) − x +
2 6 dx 
x=0 6 6 2 x
= (2x − x ) + (4x − 4x + x ) − (x + ) dx
2 3 2 3
x=0 6 6
 1  1  1
2x2 4x4 1 4x2 x4 x3 x2
= − + 2x2 − + − +
3 4 0
6 3 4 0
3 12 0
41
=
72

Theorem 2.6 CONVOLUTION THEOREM


61
(Continuous Case)
x=0 6 6
 1  1  1
2x2 4x4 1 4x2 x4 x3 x2
=
ADVANCED TOPICS IN INTRODUCTORY − + 2x2 − + − +
3
PROBABILITY: A FIRST COURSE IN 4 0
6 3 4SUMS,
0
3 12 0 PRODUCTS AND
DIFFERENCES,
41
PROBABILITY THEORY – VOLUME III QUOTIENTS OF BIVARIATE DISTRIBUTIONS
=
72

Theorem 2.6 CONVOLUTION THEOREM


(Continuous Case)

Suppose that X and Y are independent continuous random variables


having joint density function f (x, y) with marginal probability
distributions g(x) and h(y) respectively. Let Z = X + Y and denote
the p.d.f. of Z by s(z). Then
 ∞
s(z) = g(x)h(z − x) dx
−∞

or equivalently,  ∞
s(z) = g(z − y)h(y) dy
−∞

The theorem follows from Theorem 2.5 by noting that since X and Y are
independent:

f (x, z − x) = g(x)h(z − x)
f (z − y, y) = g(z − y)h(y)

The function s is called the convolution of the functions g and h, and the
expression in the Theorem is called the convolution formula.
If X and Y are nonnegative random variables, then the convolution
formula reduces to the following
 z
s(z) = g(x)h(z − x) dx
0

This isDifferences,
Sums, because each of g andand
Products h isQuotients
equal to 0offor negativeDistributions
Bivariate argument. 63

Example 2.7
Suppose that X and Y are independent and identically distributed random
variables having p.d.f.’s

fX (x) = λ e−λ x , x≥0

and
fY (y) = λ e−λ y , y≥0
respectively, find the distribution of the sum Z = X + Y .

Solution
 z
fZ (z) = λ e−λ t λ e−λ(z−t) dt
0
 z
= λ2 e−λ z dt
0
= λ2 z e−λ z , z≥0

Note

(a) See note after Theorem 2.3.

(b) This distribution is a gamma distribution62with parameters 2 and λ.


fZ (z) = λ e−λ t λ e−λ(z−t) dt
0
 z
ADVANCED TOPICS IN INTRODUCTORY
= λ2 e−λ z dt
PROBABILITY: A FIRST COURSE IN 0 SUMS, DIFFERENCES, PRODUCTS AND
PROBABILITY THEORY – VOLUME III 2 −λ z QUOTIENTS OF BIVARIATE DISTRIBUTIONS
= λ ze , z≥0

Note

(a) See note after Theorem 2.3.

(b) This distribution is a gamma distribution with parameters 2 and λ.

2.3 DIFFERENCES OF RANDOM VARIABLES

2.3.1 Differences of Discrete Bivariate Random Variables


The probability distribution of the difference of two independent discrete
random variables X and Y may be considered as the probability distribution
of the sum of X and (−Y ). This is demonstrated in Table 2.2.

Table 2.2 Various Differences (X − Y ) and their Corresponding Proba-


bilities

xi − yj x1 − y1 x1 − y2 · · · x1 − ym x2 − y1 x2 − y2 · · ·
64p(xi , yj ) p(x1 , y1 ) ) · · · p(x
p(x1 , y2Advanced Topics
,
1 my )
in )
Introductory
p(x ,
2 1y 2 , y2 ) · · ·
Probability
p(x

··· x2 − ym xn − y1 xn − y2 ··· xn − ym
··· p(x2 , ym ) p(xn , y1 ) p(xn , y2 ) ··· p(xn , ym )

To find the distribution of the difference, we similarly apply the principle of


the probabilities of equivalent events.
Similar to sums of random variables, we shall present the two cases for
the difference, namely, when X and Y are not independent and when they
are.

ExampleSharp
2.8 Minds - Bright Ideas!
For the data in Example 2.2, find the distribution of the difference of the
Employees at FOSS Analytical A/S are living proof of the company value - First - using The Family owned FOSS group is
random new
variables X and Y .
inventions to make dedicated solutions for our customers. With sharp minds and the world leader as supplier of
cross functional teamwork, we constantly strive to develop new unique products - dedicated, high-tech analytical
SolutionWould you like to join our team? solutions which measure and
control the quality and produc-
IndicateFOSS
theworks
various difference
diligently withand
with innovation their corresponding
development as basis for itsjoint probabilities
growth. It is tion of agricultural, food, phar-
reflected in the fact that more than 200 of the 1200 employees in FOSS work with Re- maceutical and chemical produ-
search & Development in Scandinavia and USA. Engineers at FOSS work in production,
i − yj
xdevelopment −1 − (−1) within
and marketing, −1a−wide 0 range 1 0−
−1 of−different (−1)
fields, 0−0
i.e. Chemistry, 0−1 cts. Main activities are initiated
from Denmark, Sweden and USA
p(x ,
i jy )
Electronics, 0.1
Mechanics, Software, 0.2
Optics, 0.11
Microbiology, 0.08
Chemometrics. 0.08 0.26 with headquarters domiciled in
Hillerød, DK. The products are
We offer marketed globally by 23 sales
2 − (−1) 2 − 0 2 − 1
A challenging job in an international and innovative company that is leading in its field. You will get the companies and an extensive net
0.03 to work
opportunity 0.17 0.03
with the most advanced technology together with highly skilled colleagues. of distributors. In line with
the corevalue to be ‘First’, the
Read more about FOSS at www.foss.dk - or go directly to our student site www.foss.dk/sharpminds where
company intends to expand
you can learn more about your possibilities of working together with us on projects, your thesis etc.
Step 2 its market position.

Subtract the various values for Y from the various values for X:
Dedicated Analytical Solutions
FOSS
−y 0 69−1
xSlangerupgade −2 1 0 −1 3 2 1
p(x,
3400y) 0.1 0.2
Hillerød 0.11 0.08 0.02 0.26 0.03 0.17 0.03
Tel. +45 70103370

Step 3 www.foss.dk
By the principle of equality of the probabilities of equivalent events we shall
obtain the table below:

x−y −2 −1 0 1 2 3
63
p(x + y) 0.11 0.2 + 0.26 0.1 + 0.02 0.08 + 0.03 0.17 0.03
64 Advanced Topics in Introductory Probability
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN SUMS, DIFFERENCES, PRODUCTS AND
· · · x2 − yTHEORY
PROBABILITY m xn– − y1
VOLUME
xIIIn − y2 ··· xn − ym QUOTIENTS OF BIVARIATE DISTRIBUTIONS
· · · p(x2 , ym ) p(xn , y1 ) p(xn , y2 ) ··· p(xn , ym )

To find the distribution of the difference, we similarly apply the principle of


the probabilities of equivalent events.
Similar to sums of random variables, we shall present the two cases for
the difference, namely, when X and Y are not independent and when they
are.

Example 2.8
For the data in Example 2.2, find the distribution of the difference of the
random variables X and Y .

Solution
Indicate the various difference with their corresponding joint probabilities

xi − yj −1 − (−1) −1 − 0 −1 − 1 0 − (−1) 0−0 0−1


p(xi , yj ) 0.1 0.2 0.11 0.08 0.08 0.26

2 − (−1) 2−0 2−1


0.03 0.17 0.03

Step 2
Subtract the various values for Y from the various values for X:

x−y 0 −1 −2 1 0 −1 3 2 1
p(x, y) 0.1 0.2 0.11 0.08 0.02 0.26 0.03 0.17 0.03

Step 3
By the principle of equality of the probabilities of equivalent events we shall
obtain the table below:

x−y −2 −1 0 1 2 3
p(x + y) 0.11 0.2 + 0.26 0.1 + 0.02 0.08 + 0.03 0.17 0.03

Step 4Differences, Products and Quotients of Bivariate Distributions


Sums, 65
Present the final result as in the table below:

x−y −2 −1 0 1 2 3
p(x − y) 0.11 0.46 0.12 0.11 0.17 0.03

which can be presented as follows:

x−y −2 −1 0 1 2 3
p(x − y) 0.11 0.46 0.12 0.11 0.17 0.03

We can verify that this distribution of the sum is a probability distri-


bution. That is,
0 ≤ p(x − y) ≤ 1

and

p(x − y) = 1

64
We can
ADVANCED verifyINthat
TOPICS this distribution
INTRODUCTORY of the sum is a probability distri-
PROBABILITY: A FIRST COURSE IN SUMS, DIFFERENCES, PRODUCTS AND
bution. That is,
PROBABILITY THEORY – VOLUME III QUOTIENTS OF BIVARIATE DISTRIBUTIONS
0 ≤ p(x − y) ≤ 1

and

p(x − y) = 1

Example 2.9
For the data in Example 2.3, find the distribution of the difference of the
random variables X and Y .

Solution
It has been shown in Example 2.3 that X and Y are independent. Therefor,
the various values of the sum {x + (−y)} and their corresponding probabil-
ities are given in the following table:

xi − yj −2 − 3 −2 − 6 4−3 4−6
p(xi , yj ) 0.4(0.7) 0.4(0.3) 0.6(0.7) 0.6(0.3)

By the principle of equality of probabilities of equivalent events, we obtain


the distribution of the difference of X and Y, presented in the table below:

xi − yj −5 −8 1 −2
p(xi − yj ) 0.28 0.12 0.42 0.18
66 Advanced Topics in Introductory Probability

2.3.2 Differences of Continuous Bivariate Random Variables

Theorem 2.7
Suppose that X and Y are continuous random variables having joint
density function f (x, y) and let U = X − Y . Then
 ∞
fu (u) = f (x, x − u) dx
−∞

or equivalently,  ∞
fu (u) = f (u + y, y) dy
−∞

Corollary 2.1
Suppose that X and Y are independent nonnegative continuous random
variables having joint density function f (x, y) with marginal probability
distributions g(x) and h(y) respectively, and let U = X − Y . Then
 ∞
fu (u) = g(x) h(x − u) dx
0

or equivalently,  ∞
fu (u) = g(u + y) h(y) dy
0

Example 2.10
Refer to Example 1.3.
(a) Find the distribution of U = X − Y ;
65
 
1
fu (u) = g(x) h(x − u) dx
0
ADVANCED TOPICS IN INTRODUCTORY
or equivalently,
PROBABILITY: A FIRST COURSE IN
 ∞ SUMS, DIFFERENCES, PRODUCTS AND
fu (u) =III
PROBABILITY THEORY – VOLUME g(u + y) h(y) dy QUOTIENTS OF BIVARIATE DISTRIBUTIONS
0

Example 2.10
Refer to Example 1.3.
(a) Find the distribution of U = X − Y ;
 
1
(b) Using the distribution in (a) calculate (i) P (U ≤ 1) (ii) P U < .
2

Solution
(a) X − Y = U =⇒ X = U + Y .
Therefore
Sums, Differences, Products and Quotients of Bivariate Distributions 67
0 < X < 1 =⇒ 0 ≤ U + Y ≤ 1 =⇒ −Y ≤ U ≤ 1 − Y

The region defined by the preceding inequality and 0 < Y < 2 is


sketched in the diagram below.

It can be seen from the sketch that the region of integration is parti-
tioned into three, namely,

−U ≤ Y ≤ 2 when −2 ≤ U < −1
−U ≤ Y ≤ 1 − U when −1 ≤ U < 0
0≤Y ≤1−U when 0 ≤ U ≤ 1

For −2 ≤ U < −1
  
2 7 4
fu (u) = u + u y + y 3 dy
2
−u 3 3
5 3 14 32
= u + 2 u2 + u+
18 3 9

For −1 ≤ U < 0
 1−u  7 4

fu (u) = u2 + u y + y 2 dy
−u 3 3
4 u
= −
9 6

For 0 ≤ U ≤ 1
 1−u 7 4

fu (u) = u + u y + y2 2
dy
0 3 3
4 u 5 3
= − − u
9 6 18

66
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN SUMS, DIFFERENCES, PRODUCTS AND
PROBABILITY
68 THEORY – VOLUME Advanced
III QUOTIENTS
Topics in Introductory OF BIVARIATE DISTRIBUTIONS
Probability

Therefore

 5 3 14 32


 u + 2 u2 + u + , −2 ≤ u < −1



18 3 9





 4 u

 − , −1 ≤ u < 0
fu (u) = 9 6



 4 u 5 3

 − − u , 0≤u≤1



 9 6 18





0, elsewhere

(b)

(i) P (U ≤ 1) = P (−2 ≤ U ≤ 1)
= P (−2 ≤ U < 1) + P (−1 ≤ U < 0) + P (0 ≤ U ≤ 1)

 −1  
5 14 32
= u + 2u + 3
u+ 2
du
−2 18 3 9
 0    1 
4 u 4 u 5 3
+ − du + − − u du
−1 9 6 0 9 6 18
13 19 21
= + +
72 36 72
= 1
 
1
(ii) P U≤ = P (−2 ≤ U ≤ 1) + P (−1 ≤ U < 0)
2
 
1
+P 0 ≤ U ≤
2
 −1  
5 14 32
= u3 + 2u2 + u+ du
−2 18 3 9
 0    1  
u 4 2 4 u 5
+ du +
− − − u3 du
−1 9 6 0 9 6 18
13 19 227
= + +
72 36 1152
1043
=
1152

67
4 u 4 u 5 3
+ − du + − − u du
−1 9 6 0 9 6 18
ADVANCED TOPICS IN INTRODUCTORY
13 IN19 21
=
PROBABILITY: A FIRST COURSE+ + SUMS, DIFFERENCES, PRODUCTS AND
36
72 III
PROBABILITY THEORY – VOLUME 72 QUOTIENTS OF BIVARIATE DISTRIBUTIONS
= 1
 
1
(ii) P U≤ = P (−2 ≤ U ≤ 1) + P (−1 ≤ U < 0)
2
 
1
+P 0 ≤ U ≤
2
 −1  
5 3 14 32
= u + 2u + u +
2
du
−2 18 3 9
 0    1  
u 4 2 4 u 5
+ − du + − − u3 du
−1 9 6 0 9 6 18
13 19 227
= + +
72 36 1152
1043
=
1152
Sums, Differences, Products and Quotients of Bivariate Distributions 69

2.4 PRODUCTS OF BIVARIATE RANDOM VARIABLES

2.4.1 Product of Discrete Bivariate Random Variables

The probability distribution of the product of two discrete random variables


X and Y whose joint probability functions are in tabular form can be calcu-
lated using the principle of the equality of the probabilities of two equivalent
events. This is demonstrated in Table 2.3.

Table 2.3 Various Products XY and their Corresponding Probabilities

xi yj x1 y1 x1 y2 ··· x1 ym x2 y1 x2 y2 ···
p(xi , yj ) p(x1 , y1 ) p(x1 , y2 ) ··· p(x1 , ym ) p(x2 , y1 ) p(x2 , y2 ) ···

··· x2 ym xn y1 xn y2 ··· xn ym
··· p(x2 , ym ) p(xn , y1 ) p(xn , y2 ) ··· p(xn , ym )

Example 2.11
For the data in Example 2.2, find the distribution of the product of the
random variables X and Y .

Solution
Step 1
Indicate the various products with their corresponding joint probabilities

xi yj −1(−1) −1(0) −1(1) 0(−1) 0(0) 0(1)


p(xi , yj ) 0.1 0.2 0.11 0.08 0.08 0.26

2(−1) 2(0) 2(1)


0.03 0.17 0.03

Step 2
Multiply the various values for X by the various values for Y :

68
p(xi , yj ) 0.1 0.2 0.11 0.08 0.08 0.26
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN SUMS, DIFFERENCES, PRODUCTS AND
2(−1) 2(0) 2(1)
PROBABILITY THEORY – VOLUME III QUOTIENTS OF BIVARIATE DISTRIBUTIONS
0.03 0.17 0.03

70
Step 2 Advanced Topics in Introductory Probability
70 Advanced Topics in Introductory Probability
Multiply the various values for X by the various values for Y :
xy 1 0 −1 0 0 0 −2 0 2
xyy)
p(x, 1
0.1 0
0.2 −1
0.11 0
0.08 0
0.02 0
0.26 −2
0.03 0
0.17 2
0.03
p(x, y) 0.1 0.2 0.11 0.08 0.02 0.26 0.03 0.17 0.03
Step 3
Step
By the3 principle of equality of the probabilities of equivalent events we shall
By the the
obtain principle
table of equality of the probabilities of equivalent events we shall
below:
obtain the table below:
xy −2 −1 0 1 2
xy
p(xy) −2
0.03 −1
0.11 0 + 0.26 + 0.17
0.2 + 0.08 + 0.02 1
0.1 2
0.03
p(xy) 0.03 0.11 0.2 + 0.08 + 0.02 + 0.26 + 0.17 0.1 0.03
Step 4
Step 4 the final result as in the table below:
Present
Present the final result as in the table below:
xy −2 −1 0 1 2
xy
p(xy) −2
0.03 −1
0.11 0
0.73 1
0.1 2
0.03
p(xy) 0.03 0.11 0.73 0.1 0.03
We can verify that this distribution of the sum is a probability distri-
WeThat
bution. can verify
is, that this distribution of the sum is a probability distri-
bution. That is, 0 ≤ p(xy) ≤ 1
0 ≤ p(xy) ≤ 1
and 
and  p(xy) = 1
p(xy) = 1

Example 2.12
Example 2.12in Example 2.3, find the distribution of the product of X and
For the data
For
Y. the data in Example 2.3, find the distribution of the product of X and
Y.
Solution
Solution
It has been shown in Example 2.3 that X and Y are independent. Therefor,
It has
the been shown
various values in
of Example 2.3 that
the product X and
xy and are independent.
Y corresponding
their Therefor,
probabilities
the various values of the product
are given in the following table: xy and their corresponding probabilities
are given in the following table:
xi yj −2(3) −2(6) 4(3) 4(6)
−2(3) 0.4(0.3)
xi y, yj ) 0.4(0.7)
p(x 4(3)
−2(6) 0.6(0.7) 4(6)
0.6(0.3)
i j
p(xi , yj ) 0.4(0.7) 0.4(0.3) 0.6(0.7) 0.6(0.3)
Sums, Differences, Products and Quotients of Bivariate Distributions 71
By the principle of the equality of probabilities of equivalent events we obtain
By 71
the the principle of the equality of probabilities
Sums, Differences, Products and Quotients of of equivalent
Bivariate events we obtain
Distributions
result summarised in the following table.
the result
xi ysummarised
j −6 in the following
−12 12 24table.
x iyyj ) 0.28
p(x
i j −6 0.12 −12 0.42
12 0.1824
p(xi yj ) 0.28 0.12 0.42 0.18

Example 2.13
Suppose there
Example 2.13 are two independent distributions:
Suppose there are two independent distributions:

x −2 4
g(x)
x 0.4
−2 0.6
4
g(x) 0.4 0.6

y −2 4
h(y)
y 0.4
−2 0.6
4 69
x −2 4
ADVANCED
g(x) 0.4 IN0.6
x TOPICS
−2 4INTRODUCTORY
PROBABILITY: A FIRST COURSE IN SUMS, DIFFERENCES, PRODUCTS AND
PROBABILITY 0.4 0.6
g(x) THEORY – VOLUME III QUOTIENTS OF BIVARIATE DISTRIBUTIONS

y −2 4
h(y)
y 0.4
−2 0.6
4
h(y) 0.4 0.6

Find the distribution of (a) X 2 (b) XY.


Find the distribution of (a) X 2 (b) XY.
Solution
Solution

x2 4 16
(a)
x2 ) 0.4 0.6
p(x2
4 16
(a)
p(x2 ) 0.4 0.6

xy −2(−2) −2(4) 4(−2) 4(4)


(b)
p(x,
xy y) 0.4(0.4)
−2(−2) 0.4(0.6)
−2(4) 0.6(0.4)
4(−2) 0.6(0.6)
4(4)
(b)
p(x, y) 0.4(0.4) 0.4(0.6) 0.6(0.4) 0.6(0.6)

xy 8 −8 −8 16
p(x,
xy y) 0.16
8 0.24
−8 0.24
−8 0.36
16
p(x, y) 0.16 0.24 0.24 0.36

xy 4−8 16
72 0.16
0.48 Advanced Topics in Introductory Probability
p(xy)
xy −8 0.36
4 16
p(xy) 0.16 0.48 0.36
2.4.2 Product of Continuous Bivariate Random Variables

Theorem 2.8
Suppose that X and Y are continuous random variables having joint
density function f (x, y) and let V = XY . Then
 ∞    
1 v
fv (v) = f x, dx
−∞ |x| x

or equivalently,
 ∞    
1 v
fv (v) = f ,y dy
−∞ |y| y

Corollary 2.2
Suppose that X and Y are independent, nonnegative continuous random
variables having joint density function f (x, y) with marginal probability
distributions g(x) and h(y) respectively and let V = XY . Then
 ∞   
1 v
fv (v) = h g(x) dx, y>0
0 x x

or equivalently,
 ∞   
1 v
fv (v) = g h(y) dy, x>0
0 y y

70
We can calculate the probability P (XY ≤ t) for the independent, nonnega-
or equivalently,
ADVANCED TOPICS IN INTRODUCTORY
    
PROBABILITY: A FIRST COURSE∞IN 1 v SUMS, DIFFERENCES, PRODUCTS AND
fv (v)
PROBABILITY THEORY =
– VOLUME III g h(y) dy, x > 0QUOTIENTS OF BIVARIATE DISTRIBUTIONS
0 y y

We can calculate the probability P (XY ≤ t) for the independent, nonnega-


tive continuous random variables X and Y thus
 
P (XY ≤ t) = g(x)h(y) dx dy
0<xy≤t
 t  ∞     
1 v
= g h(y) dy dv
0 −∞ y y
 t
Sums, Differences, Products fv (v) dv of Bivariate Distributions
= and Quotients 73
0

The integration is restricted to the first quadrant, since g(x)h(y), the joint
density function of X and Y, are zero elsewhere.

Example 2.14
Refer to Example 1.3.
(a) Find the distribution of V = XY ;
 
1
(b) Using the distribution in (a), calculate P ≤U ≤1 .
2

Solution
V
(a) XY = V =⇒ X = .
Y
Therefore
V
0 < X < 1 =⇒ 0 ≤ ≤ 1 =⇒ 0 ≤ V ≤ Y
Y
The region defined by the preceding inequality and 0 < Y < 2 is
sketched in the diagram below.

It can be seen from the sketch that the region of integration is


V ≤ Y ≤ 2 when 0 < V ≤ 2

Therefore
  
2 v2 1 v
h(u) = + dy
y=v y3 3 y

71
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN SUMS, DIFFERENCES, PRODUCTS AND
PROBABILITY THEORY – VOLUME III QUOTIENTS OF BIVARIATE DISTRIBUTIONS
74 Advanced Topics in Introductory Probability

 
v 1 2
= ln y −
3 2 y2 v
 
v 2 1 v2
= ln + − , 0<v≤2
3 v 2 8

(b) Now,
   1   
1 v 2 1 v2
P ≤V ≤1 = ln + − dv
2 1
2
3 v 2 8
  1
v v3 v2 2
= − + 2 + ln
2 24 6 v 1
2
= 0.3338

2.5 QUOTIENTS OF BIVARIATE RANDOM VARIABLES

2.5.1 Quotient of Discrete Bivariate Random Variables

The probability distribution of the quotient of two discrete random variables


X
X and Y ( ) whose joint probability functions are in tabular form can be
Y
calculated using the principle of the equality of the probabilities of two
equivalent events. This is demonstrated in Table 2.4.

X
Table 2.4 Various Quotient and their Corresponding Probabilities
Y

xi x1 x1 x1 x2 x2
yj The Wake
p(xi , yj )
y1
p(x1 , y1 )
y2
p(x1 , y2 )
···
···
ym
p(x1 , ym )
y1
p(x2 , y1 )
y2
p(x2 , y2 )
···
···
the only emission we want to leave behind
x2 xn xn xn
··· ···
ym y1 y2 ym
··· p(x2 , ym ) p(xn , y1 ) p(xn , y2 ) ··· p(xn , ym )

Suppose X and Y are two independent variables. Then the distribution of

.QYURGGF'PIKPGU/GFKWOURGGF'PIKPGU6WTDQEJCTIGTU2TQRGNNGTU2TQRWNUKQP2CEMCIGU2TKOG5GTX

6JGFGUKIPQHGEQHTKGPFN[OCTKPGRQYGTCPFRTQRWNUKQPUQNWVKQPUKUETWEKCNHQT/#0&KGUGN6WTDQ
2QYGTEQORGVGPEKGUCTGQHHGTGFYKVJVJGYQTNFoUNCTIGUVGPIKPGRTQITCOOGsJCXKPIQWVRWVUURCPPKPI
HTQOVQM9RGTGPIKPG)GVWRHTQPV
(KPFQWVOQTGCVYYYOCPFKGUGNVWTDQEQO

72
The probability distribution of the quotient of two discrete random variables
X
ADVANCED TOPICS IN INTRODUCTORY
X and Y ( A) FIRST
PROBABILITY:
whoseCOURSE
joint probability
IN
functions are in tabular form can be
SUMS, DIFFERENCES, PRODUCTS AND
Y
calculated using
PROBABILITY the– principle
THEORY VOLUME IIIof the
equality of the probabilities of BIVARIATE
QUOTIENTS OF two DISTRIBUTIONS
equivalent events. This is demonstrated in Table 2.4.

X
Table 2.4 Various Quotient and their Corresponding Probabilities
Y

xi x1 x1 x1 x2 x2
··· ···
yj y1 y2 ym y1 y2
p(xi , yj ) p(x1 , y1 ) p(x1 , y2 ) ··· p(x1 , ym ) p(x2 , y1 ) p(x2 , y2 ) ···

x2 xn xn xn
··· ···
ym y1 y2 ym
··· p(x2 , ym ) p(xn , y1 ) p(xn , y2 ) ··· p(xn , ym )

Sums, Differences, Products and Quotients of Bivariate Distributions 75


Suppose X and Y are two independent variables. Then the distribution 75
Sums, Differences, Products and Quotients of Bivariate Distributions of
X
X , Y = 0 can, as before, be calculated using the principle of equality of
Y , Y = 0 can, as before, be calculated using the principle of equality of
probabilities
Y of equivalent events.
probabilities of equivalent events.

Example 2.15
Example
Consider 2.15
the distributions of two independent random variables X and Y .
Consider the distributions of two independent random variables X and Y .

x 2 5
x
g(x) 2
0.2 5
0.8
g(x) 0.2 0.8

y 4 8
y
h(y) 4
0.7 8
0.3
h(y) 0.7 0.3

X Y
Find the distribution of X and Y.
Find the distribution of Y and X.
Y X
Solution
Solution
Proceeding as before:
Proceeding as before:

x/y 2/4 2/8 5/4 5/8


x/yy)
p(x, 2/4
0.2(0.7) 2/8
0.2(0.3) 5/4
0.8(0.7) 5/8
0.8(0.3)
p(x, y) 0.2(0.7) 0.2(0.3) 0.8(0.7) 0.8(0.3)

X
The distribution of X is thus
The distribution of Y is thus
Y

x/y 1/2 1/4 5/4 5/8


x/y
p(x/y) 1/2
0.14 1/4
0.06 5/4
0.56 5/8
0.24
p(x/y) 0.14 0.06 0.56 0.24

Y
The distribution of Y follows similarly and is given in the table below:
The distribution of X follows similarly and is given in the table below:
X

y/x 2 4 4/5 8/5


y/x
p(y/x) 2
0.14 4
0.06 4/5
0.56 8/5
0.24
p(y/x) 0.14 0.06 0.56 0.24

73
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN SUMS, DIFFERENCES, PRODUCTS AND
PROBABILITY THEORY – VOLUME Advanced
76 III QUOTIENTS
Topics in Introductory OF BIVARIATE DISTRIBUTIONS
Probability

2.5.2 Quotient of Continuous Bivariate Random Variables

Theorem 2.9
Suppose that X and Y are continuous random variables having joint
X
density function f (x, y) and let W = . Then
Y
 ∞
fw (w) = |y| f (wy, y) dy
−∞

Corollary 2.3
Suppose that X and Y are independent, nonnegative continuous random
variables having joint density function f (x, y) with marginal probability
X
distributions g(x) and h(y) respectively, and let W = . Then
Y
 ∞
fw (w) = y f (wy) h(y) dy
0

X
We can calculate the cumulative distribution function of for the indepen-
Y
dent, nonnegative random variables X and Y by
   
X
P ≤t = y f (wy)h(y) dy
Y
0 <x
y
≤t
0<y<∞
 t  ∞ 
= y f (wy) h(y) dy dw
0 0
 t
= fw (w) dw
0

Example 2.16
Refer to Example 1.3.
X
(a) Find the distribution of W = ;
Y
 
1 Distributions
Sums, Differences, Products and Quotients of Bivariate 77
(b) Using the distribution in (a), calculate P W ≤ .
4
Solution
X
(a) Distribution of W = :
Y
X
= W =⇒ X = Y W
Y
Therefore
1
0 < X < 1 =⇒ 0 ≤ Y W ≤ 1 =⇒ 0 ≤ W ≤
Y
The region defined by the preceding inequality and 0 < Y < 2 is
sketched in the diagram below.

74
Y
X
ADVANCED TOPICS IN INTRODUCTORY= W =⇒ X = Y W
Y
PROBABILITY: A FIRST COURSE IN SUMS, DIFFERENCES, PRODUCTS AND
Therefore
PROBABILITY THEORY – VOLUME III QUOTIENTS OF BIVARIATE DISTRIBUTIONS
1
0 < X < 1 =⇒ 0 ≤ Y W ≤ 1 =⇒ 0 ≤ W ≤
Y
The region defined by the preceding inequality and 0 < Y < 2 is
sketched in the diagram below.

It can be seen from the sketch that the region of integration is parti-
tioned into two, namely,
1
0 < Y ≤ 2 when 0 < W ≤
2
1 1
0<Y ≤ when ≤W <∞
W 2

 
w 2 1
h(w) = w2 + y 3 dy, 0≤w<
3 0 2
  1
w w 1
= w2 + y 3 dy, ≤w<∞
3 0 2
Therefore  w


 4(w2 + ), 0≤w< 1
 3 2
h(w) =

 1 1 w

 (w + ), 1
≤w<∞
4w3 3 2

Challenge the way we run

EXPERIENCE THE POWER OF


FULL ENGAGEMENT…

RUN FASTER.
RUN LONGER.. READ MORE & PRE-ORDER TODAY
RUN EASIER… WWW.GAITEYE.COM

1349906_A6_4+0.indd 1 22-08-2014 12:56:57

75
  2
ADVANCED TOPICS IN INTRODUCTORY
w 1
h(w) = w2 + y 3 dy, 0≤w<
PROBABILITY: A FIRST COURSE IN 3 0 2
SUMS, DIFFERENCES, PRODUCTS AND
PROBABILITY THEORY – VOLUME III  1
w w 1 QUOTIENTS OF BIVARIATE DISTRIBUTIONS
= 2
w + 3
y dy, ≤w<∞
3 0 2
Therefore  w


 4(w2 + ), 0≤w< 1
 3 2
h(w) =

 1 1 w
78 
Advanced
(w +Topics
), 1in≤Introductory
w<∞ Probability
4 w3 3 2

 
1
(b) Calculation of P W ≤ .
4
   
X 1 1
P ≤ = P W ≤
Y 4 4
 1  
4 w
= 4 w2 + dw
0 3
1
=
16

EXERCISES

2.1 Refer to Exercise 1.1. Find the distribution of


X
(a) X + Y (b) X − Y (c) XY (d)
Y
3X
(e) 2X + 3Y (f ) 5X − Y (g) 3X(4Y ) (h)
7Y
(i)X2 + 5Y (j) 3X (k) 3(X + Y ) (l) Y 3
(m) Y 2 − 2Y + 3X

2.2 Refer to Example 2.14. Find the distribution of

(a) the sum Z = X + Y ;


(b) W = 2X.

2.3 Refer to Exercise 1.6. Find

(a) the distribution of the sum Z = X + Y ;


(b) P (Z = 2) (i) using the distribution of Z, (ii) from the first
principle.

2.4 Refer to Exercise 1.7. Find

(a) the distribution of the sum Z = X + Y ;


(b) P (Z = 2) (i) using the distribution of Z, (ii) from the first
principle.
Sums, Differences, Products and Quotients of Bivariate Distributions 79
2.5 Refer to Exercise 1.9. Find

(a) the distribution of the sum Z = X + Y ;


(b) P (Z = 4) (i) using the distribution of Z, (ii) from the first
principle.
(c) Find P (X = x|X + Y = x + y)

2.6 Refer to Exercise 1.10. Find

(a) the distribution of the sum Z = X + Y ;


(b) P (Z = 4) (i) using the distribution of Z, (ii) from the first
principle. 76
(a) the distribution of the sum Z = X + Y ;
ADVANCED TOPICS IN INTRODUCTORY
(b) P (Z = 4) (i) using the
PROBABILITY: A FIRST COURSE IN
distribution of Z, (ii) SUMS,
from DIFFERENCES,
the first PRODUCTS AND
principle.
PROBABILITY THEORY – VOLUME III QUOTIENTS OF BIVARIATE DISTRIBUTIONS
(c) Find P (X = x|X + Y = x + y)

2.6 Refer to Exercise 1.10. Find

(a) the distribution of the sum Z = X + Y ;


(b) P (Z = 4) (i) using the distribution of Z, (ii) from the first
principle.

2.7 Refer to Exercise 1.11, where k = 2. Find

(a) the distribution of the sum Z = X + Y ;


(b) P (Z ≤ 1) (i) using the distribution of Z, (ii) from the first
principle.
2
2.8 Refer to Exercise 1.12, where k = . Find
3
(a) the distribution of the sum Z = X + Y ;
(b) P (Z =≤ 1) (i) using the distribution of Z, (ii) from the first
principle.

2.9 Refer to Exercise 1.14. Find

(a) the distribution of the sum Z = X + Y ;


(b) P (Z ≤ 1) (i) using the distribution of Z, (ii) from the first
principle.
12
2.10 Refer to Exercise 1.13, where a = . Find
7
(a) the distribution of the sum Z = X + Y ;
(b) P (Z ≤ 1) (i) using the distribution of Z, (ii) from the first
principle.

2.11 If X and Y are independent Binomial random variables and X is b(n, p)


80 and Y is b(m, p), use theAdvanced
convolution theorem
Topics to show that
in Introductory X + Y is
Probability
b(n + m, p).
2.12 If X and Y are independent Geometric random variables each with
parameter p. Use the convolution theorem to show that X + Y is a
negative binomial N B(2, p).

2.13 If X and Y are independent Negative Binomial random variables and


X is b− (k1 , p) and Y is b− (k2 , p), use the convolution theorem to show
that X + Y is a negative binomial b− (k1 + k2 , p).

2.14 If X and Y are independent Gamma random variables and X is


Γ(s1 , λ) and Y is Γ(s2 , λ), use the convolution theorem to show that
X + Y is Γ(s1 + s2 , λ).

2.15 Suppose X and Y are independent Normal random variables and X is


N (µ1 , σ12 ) and Y is N (µ2 , σ22 ), use the convolution theorem to show
that X + Y is a N (µ1 + µ2 , σ12 + σ22 ).

2.16 Refer to Example 1.15. Compute P (X + Y > 2).

2.17 The moment generating function of X is given by


 
MX (t) = exp 2et − 2

and that of Y by
77
 10  10
1 t
2.15 Suppose X and Y are independent Normal random variables and X is
ADVANCED , σ12 ) and
N (µ1TOPICS Y is N (µ2 , σ22 ), use the
IN INTRODUCTORY convolution theorem to show
PROBABILITY:
that X A + FIRST
Y is aCOURSE
N (µ1 +INµ2 , σ12 + σ22 ). SUMS, DIFFERENCES, PRODUCTS AND
PROBABILITY THEORY – VOLUME III QUOTIENTS OF BIVARIATE DISTRIBUTIONS
2.16 Refer to Example 1.15. Compute P (X + Y > 2).

2.17 The moment generating function of X is given by


 
MX (t) = exp 2et − 2

and that of Y by
 10  10
1
MY (t) = 3et + 1
4

If X and Y are independent, find P (X + Y = 2).

2.18 Refer to Exercise 1.10. Find

(a) the distribution of the difference U = X − Y ;


(b) (i) P (U = 0); (ii) P (U = 1); (iii) P (U = 2).

2.19 Refer to Exercise 1.11. Find

(a) the distribution of the difference U = X − Y ;


(b) (i) P (U ≤ 12 ), (ii) P (− 12 ≤ U ≤ 12 ).

2.20 Refer to Exercise 1.10. Find


Sums, Differences, Products and Quotients of Bivariate Distributions 81
(a) the distribution of the product V = XY ;

(b) (i) P (V = 2), (ii) P (V = 3) (iii) P (V = 4), (iv) P (V = 6).

2.21 Refer to Exercise 1.11. Find

(a) the distribution of the product U = XY ;


(b) (i) P (U > 34 ), (ii) P ( 12 < U < 1), using the distribution of U,
Technical training on
2.22 Refer to Exercise 2.17. Find P (XY = 0).

2.23
WHAT you need, WHEN you need it
Refer to Exercise 1.10. Find
At IDC Technologies we can tailor our technical and engineering
(a) the distribution of the quotient
training workshops to suitW
your Y ;
= needs.
X We have extensive OIL & GAS
experience in training technical and X
engineering staff and ENGINEERING
(b) (i) P ( X
Y = 1); (ii) P (W = 2); (iii) P ( = 3).
have trained people in organisationsYsuch as General
ELECTRONICS
Motors,1.11.
2.24 Refer to Exercise Shell,Find
Siemens, BHP and Honeywell to name a few.
Our onsite training is cost effective, convenient and completely AUTOMATION &
(a) the distribution oftothe
customisable the quotient =XY ;
W engineering
technical and areas you want PROCESS CONTROL
covered. Our workshops are all comprehensive hands-on learning
(b) P (W = experiences
5). with ample time given to practical sessions and MECHANICAL
demonstrations. We communicate well to ensure that workshop content ENGINEERING
2.25 Refer to Exercise 1.12, match
and timing = 23 . Findskills, and abilities of the participants.
wherethek knowledge,
INDUSTRIAL
(a) the distribution
We run of P (X
onsite − Y );all year round and hold the workshops on
training DATA COMMS
your premises or a venue of your choice for your convenience.
(b) P (− 2 ≤ X − Y ≤ 1).
1
ELECTRICAL
For a no obligation proposal, contact us today POWER
2.26 Refer to Exercise 1.12, where k = 23 . Find
at training@idc-online.com or visit our website
for more information: www.idc-online.com/onsite/
(a) the distribution of XY ;
(b) P (XY ≤ 45 ). Phone: +61 8 9321 1702
Email: training@idc-online.com
2.27 Refer to Exercise 1.12, where k = 23 . Find
Website: www.idc-online.com
Y );
(a) the distribution of P ( X
(b) P ( X
Y < 10).

2.28 Refer to Exercise 1.13. Find


78
(a) the distribution of P (X − Y );
ADVANCED
Sums, TOPICS IN
Differences, INTRODUCTORY
Products and Quotients of Bivariate Distributions 81
PROBABILITY: A FIRST COURSE IN SUMS, DIFFERENCES, PRODUCTS AND
PROBABILITY THEORY – VOLUME III QUOTIENTS OF BIVARIATE DISTRIBUTIONS
(b) (i) P (V = 2), (ii) P (V = 3) (iii) P (V = 4), (iv) P (V = 6).

2.21 Refer to Exercise 1.11. Find

(a) the distribution of the product U = XY ;


(b) (i) P (U > 34 ), (ii) P ( 21 < U < 1), using the distribution of U,

2.22 Refer to Exercise 2.17. Find P (XY = 0).

2.23 Refer to Exercise 1.10. Find

(a) the distribution of the quotient W = Y ;


X

(b) (i) P ( X
Y = 1); (ii) P (W = 2); (iii) P ( X
Y = 3).

2.24 Refer to Exercise 1.11. Find

(a) the distribution of the quotient W = Y ;


X

(b) P (W = 5).

2.25 Refer to Exercise 1.12, where k = 23 . Find

(a) the distribution of P (X − Y );


(b) P (− 12 ≤ X − Y ≤ 1).

2.26 Refer to Exercise 1.12, where k = 23 . Find

(a) the distribution of XY ;


(b) P (XY ≤ 45 ).

2.27 Refer to Exercise 1.12, where k = 23 . Find

Y );
(a) the distribution of P ( X
(b) P ( X
Y < 10).

2.28 Refer to Exercise 1.13. Find

(a) the distribution of P (X − Y );


(b) P (− 12 ≤ X − Y ≤ 12 ).

2.29 Refer to Exercise 1.14. Find


82 Advanced Topics in Introductory Probability
(a) the distribution of XY ;
(b) P (XY ≤ 1).

2.30 Refer to Exercise 1.14. Find


 
(a) the distribution of P X
Y ;
 
(b) P X
Y ≤2 .

79
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN EXPECTATION AND VARIANCE OF
PROBABILITY THEORY – VOLUME III BIVARIATE DISTRIBUTIONS

Chapter 3

EXPECTATION AND VARIANCE


OF BIVARIATE DISTRIBUTIONS

3.1 INTRODUCTION

In the earlier text5 , we discussed the numerical characterisation of a single


random variable. Specifically, we discussed in Chapter 6 the concepts of
expectation and variance. In this chapter, we shall extend the concept of
expectation and variance to the case of bivariate distribution and discuss a
few more properties. We shall also realise that in this case, there arises the
concept of covariance.

Chapter
3.2
3
EXPECTATION OF BIVARIATE RANDOM VARIABLES

EXPECTATION AND VARIANCE


To refresh our minds, we recapitulate that the expectation of a discrete
random variable X is
OF BIVARIATE DISTRIBUTIONS
n 
E(X) = xi p(xi )
i=1

Example 3.1
3.1 INTRODUCTION
For the data in Example 2.3, find E(X) and E(Y ).

Solution
In the earlier text5 , we discussed the numerical characterisation of a single
For
84 convenience
random variable.weSpecifically,
present theAdvanced
data
we here.
Topics
discussed in in Introductory
Chapter Probability
6 the concepts of
expectation
5
and variance.
Nsowah-Nuamah, 2017 In this chapter, we shall extend the concept of
expectation and variance to thex case of −2bivariate
4 distribution and discuss a
i
83
few more properties. We shall also realise that in this case, there arises the
p(xi ) 0.4 0.6
concept of covariance.

3.2 E(X)
EXPECTATION OF= BIVARIATE
−2(0.4) + 4(0.6) = 1.6 VARIABLES
RANDOM

To refresh our minds, we recapitulate


yj 3 that6 the expectation of a discrete
random variable X is p(yj ) n0.7 0.3

E(X) = xi p(xi )
i=1
E(Y ) = 3(0.7) + 6(0.3) = 3.9
Example 3.1
For the data
3.2.1 in Exampleof2.3,
Expectation find E(X)
Functions ofand E(Y ). Bivariate Random
Discrete
Variables
Solution
For
The convenience
expectationwe
of present the data
any function of here.
discrete bivariate random variables,
H(X,
5
Y ), can be computed
Nsowah-Nuamah, 2017 either the

(a) parent joint probability distribution;


83 or

(b) derived joint probability distribution.


80
Expectation from Parent Joint Probability Distribution
yj )
p(y 3
0.7 6
0.3
j
p(yj ) 0.7 0.3
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN EXPECTATION AND VARIANCE OF
E(Y ) =III3(0.7)
PROBABILITY THEORY – VOLUME + 6(0.3) = 3.9 BIVARIATE DISTRIBUTIONS
E(Y ) = 3(0.7) + 6(0.3) = 3.9
3.2.1 Expectation of Functions of Discrete Bivariate Random
3.2.1 Expectation
Variables of Functions of Discrete Bivariate Random
Variables
The expectation of any function of discrete bivariate random variables,
The expectation
H(X, of any function
Y ), can be computed of discrete bivariate random variables,
either the
H(X, Y ), can be computed either the
(a) parent joint probability distribution; or
(a) parent joint probability distribution; or
(b) derived joint probability distribution.
(b) derived joint probability distribution.
Expectation from Parent Joint Probability Distribution
Expectation from Parent Joint Probability Distribution
The next theorem shows that we can find E(X), E(Y ) and E(XY ) of a
The next distribution
bivariate theorem shows(X, Ythat we canfrom
) directly findthe
E(X), E(Y
parent ) and
joint E(XY ) of
probability a
dis-
bivariate
tribution distribution
without first(X, Y ) directly
obtaining the from thejoint
derived parent joint probability
probability dis-
distribution
tribution without first obtaining
or the marginal distribution. the derived joint probability distribution
or the marginal distribution.

Definition 3.1
Definition 3.1a two-dimensional discrete random variable and let
Let (X, Y ) be
Let (X, Y ) be a two-dimensional discrete random variable and let
Z = H(X, Y ). Then
Z = H(X, Y ). Then
∞ 
 ∞
= Bivariate
E(Z) of
Expectation and Variance ∞  ∞ H(x
i , yj )p(xi , yj )
Distributions 85
Expectation and Variance = Bivariate
E(Z) of i=1 j=1 H(x i , yj )p(xi , yj )
Distributions 85
Expectation and Variance of Bivariate
i=1 j=1 Distributions 85
For the finite case,
For the finite case,
For the finite case, n 
m

E(Z) = n 
n 
m H(xi , yj ) p(xi , yj )
m
E(Z) = 
i=1 j=1 H(xi , yj ) p(xi , yj )
E(Z) = H(xi , yj ) p(xi , yj )
i=1 j=1
i=1 j=1
That is,
That is,
That is,
E[H(X, Y )] = H(x1 , y1 ) p(x1 , y1 ) + H(x1 , y2 ) p(x1 , y2 ) + · · ·
E[H(X, Y )] = H(x
+H(x 1 , y,1y) p(x 1 , y1,)y+)H(x
) p(x + H(x 1 , y2 ), yp(x 1 , y2 ), y+ ···
E[H(X, Y )] = H(x 1 , 1y1 )m p(x 1 , y11 ) +m H(x 1 , y2 ) 1 ) p(x
p(x 1 , y2 ) +1 )· · ·
+H(x1 ,, yym))p(x
+H(x p(x1 , ym ) ))++ H(x
· + H(x2 , y1 ) ,p(x
ym22),,p(xy1 ) , y )
2
1 2 m ) p(x21,,yy2m +· ·H(x 2 , y1 2) p(x y1 )2 m
+H(x
+ , y
· · · +2 ,H(x ) p(x , y ) + · · · + H(x , y ) p(x
p(x22n,,,yyym )
+H(x n , y21,)yp(x
y2 ) p(x 2) + n ,·y·1·)+ +H(xH(x2n, ,yym2)) p(x 2)
2 2 2 2 2 m
m
+ · · · + H(xn ,, yy1 ))p(x p(xnnn,,,yyy11m) )+ H(xn , y2 ) p(xn , y2 )
+ · · · + H(xn m 1 ) p(x ) + H(xn , y2 ) p(xn , y2 )
+ · · · + H(xn , ym ) p(xn , ym )
+ · · · + H(xn , ym ) p(xn , ym )
Example 3.2
Example
For 3.2
the data
Example 3.2 in Example 2.3, find the expectation of the function
For the data in Example 2.3, find the expectation of the function
For the data in Example 2.3, find the expectation of the function
H(X, Y ) = X 2 Y
H(X, Y ) = X 2 Y
H(X, Y ) = X 2 Y
Solution
Solution
Solution 2
E(X Y ) = {(x1 )2 (y1 )} p(x1 , y1 ) + {(x1 )2 (y2 )}p(x1 , y2 )
E(X 22 Y ) = +{(x
{(x )2 (y 2 1 )} p(x1 , y1 ) + {(x1 )2 (y
2
2 2 )}p(x1 , y2 )
E(X Y ) = {(x11 )22)(y(y 1 )}p(x
1 )} p(x1 ,2y, 1y)1 +) +{(x {(x 1 )2 )(y(y 2 )}p(x
2 )}p(x 1 ,2y, 2y)2 )
+{(x 22 )2 (y1 )}p(x2 , y1 ) +
2
(−2) 2(3)(0.28)
= +{(x +2(−2) 2 {(x2 )2 (y2 )}p(x
2
2 , y2 )
) (y1 )}p(x , y1 ) +(6)(0.12)
{(x2 ) (y+ (4)2 (3)(0.42)
2 )}p(x 2 , y2 )
= (−2)
+(4)2 (3)(0.28)
2
2
(6)(0.18) + (−2)2 (6)(0.12) + (4)2 (3)(0.42)
2 2
= (−2) (3)(0.28) + (−2) (6)(0.12) + (4) (3)(0.42)
+(4)22 (6)(0.18)
= 43.68
+(4) (6)(0.18)
= 43.68
= 43.68

Theorem 3.1 81
Theorem
Let (X, Y )3.1
be two-dimensional discrete random variable and let
= (−2)2 (3)(0.28) + (−2)2 (6)(0.12) + (4)2 (3)(0.42)
+(4)2 (6)(0.18)
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN EXPECTATION AND VARIANCE OF
= 43.68
PROBABILITY THEORY – VOLUME III BIVARIATE DISTRIBUTIONS

Theorem 3.1
Let (X, Y ) be two-dimensional discrete random variable and let
H(X, Y ) = XY . Then
n 
 m
E(XY ) = xi yj p(xi , yj )
86 Advanced Topics in Introductory Probability
i=1 j=1
86
86 Advanced
Advanced Topics in
Topics in Introductory
Introductory Probability
Probability
Example 3.3
Example
Refer
Example 3.3
to Example
3.3 2.2. Find E(XY ) from the parent joint probability dis-
Refer
Refer to Example 2.2.
to
tribution.Example 2.2. Find E(XY )) from
Find E(XY from the
the parent
parent joint
joint probability
probability dis-
dis-
tribution.
tribution.
Solution
Solution
The parent distribution of XY is given in the table below.
Solution
The
The parent
parent distribution
distribution of XY is
of XY is given
given in
in the
the table
table below.
below.
Y
X −1 Y
0Y 1 Row Totals
X
X
−1 −1
0.10
−1 00
0.20 11
0.11 Row Totals
Row0.41
Totals
−1
−1 0.10
(1)
0.10 0.20
(0)
0.20 0.11
(-1)
0.11 0.41
0.41
0 (1)
0.08
(1) (0)
0.02
(0) (-1)
0.26
(-1) 0.36
00 0.08
(0)
0.08 0.02
(0)
0.02 0.26
(0)
0.26 0.36
0.36
2 (0)
0.03
(0) (0)
0.17
(0) (0)
0.03
(0) 0.23
22 0.03
(-2)
0.03 0.17
(0)
0.17 0.03
(2)
0.03 0.23
0.23
Column Totals (-2)
(-2)
0.21 (0)
(0)
0.39 (2)
(2)
0.40 1.00
Column
Column Totals
Totals 0.21
0.21 0.39
0.39 0.40
0.40 1.00
1.00
The values in parentheses are for xy. Multiplying these values by their
The
The values
values in
corresponding
in parentheses
probabilities,
parentheses are for
we obtain
are xy.the
for xy. Multiplying these
these values
following table:
Multiplying values by
by their
their �e Graduate Programme
I joined
corresponding
MITAS because
probabilities, we obtain the following table: for Engineers and Geoscientists
corresponding probabilities, we obtain the following table:
I wanted real responsibili� www.discovermitas.com
Maersk.com/Mitas �e G
I joined MITAS
Y because for Engine
X I wanted
−1 0real 1responsibili�
Y
Y Ma
X
X
−1 −1
0.10
−1 00
0.00 11
-0.11
0
−1
−1 0.10
0.00
0.10 0.00 -0.11
0.00
-0.11
200 0.00
-0.06
0.00 0.00 0.00
0.06
0.00
22 -0.06
-0.06 0.00
0.00 0.06
0.06
Hence,
Hence,
Hence,
E(XY ) =  n
n 
n 
 m
m xi yj p(xi , yj )
m
Month 16
E(XY )) =
E(XY
 
p(xii,, yyjj))
= i=1 j=1 xxiiyyjj p(x I was a construction Mo
= 0.10
i=1 + 0.00 + (−0.11) + 0.00 + · · · + 0.00 + 0.06
i=1 j=1
=
= 0.10
j=1
0.10 +
−0.01+ 0.00
0.00 + + (−0.11)
(−0.11) + + 0.00
0.00 +
+ ·· ·· ·· +
+ 0.00
0.00 +
+ 0.06
0.06
supervisor ina const
I was
=
= −0.01
−0.01 the North Sea super
Corollary 3.1
Corollary
advising and the No
Let (X, Y )3.1
Corollary be two-dimensional discrete random variable and let H(X, Y ) =
3.1
Let
Let (X, Y )) be
(X, Y be two-dimensional
two-dimensional discrete
discrete random
random variable
Real work
variable and
and let
let H(X,
H(X, he
helping
Y )) =
Y = foremen advis
International
al opportunities
Internationa
�ree wo
work
or placements ssolve problems
Real work he
helping fo
International
Internationaal opportunities
�ree wo
work
or placements ssolve pr

82
(0) (0) (0)
2 0.03
ADVANCED TOPICS IN INTRODUCTORY 0.17 0.03 0.23
PROBABILITY: A FIRST COURSE IN (-2) (0) (2) EXPECTATION AND VARIANCE OF
PROBABILITY THEORY – VOLUME III BIVARIATE DISTRIBUTIONS
Column Totals 0.21 0.39 0.40 1.00

The values in parentheses are for xy. Multiplying these values by their
corresponding probabilities, we obtain the following table:

Y
X −1 0 1
−1 0.10 0.00 -0.11
0 0.00 0.00 0.00
2 -0.06 0.00 0.06

Hence,
n 
 m
E(XY ) = xi yj p(xi , yj )
i=1 j=1
= 0.10 + 0.00 + (−0.11) + 0.00 + · · · + 0.00 + 0.06
= −0.01

Expectation and Variance of Bivariate Distributions 87


Corollary 3.1
Let (X, Y ) be two-dimensional discrete random variable and let H(X, Y ) =
X. Then n m 
E(X) = xi p(xi , yj )
i=1 j=1

and n 
m

E(Y ) = yj p(xi , yj )
i=1 j=1

Example 3.4
For the data in Example 2.2, find the expectation of the functions

(a) H(X, Y ) = X

(b) H(X, Y ) = Y .

Solution
(a) From the table in Example 3.3, we calculate the values of the cells with
the xi values. Thus, for x1 = −1

x1 p(x1 , y1 ) = −1(0.10) = −0.1


x1 p(x1 , y2 ) = −1(0.2) = −0.2
x1 p(x1 , y3 ) = −1(0.11) = −0.11

so that the sum for that row is:


3

x1 p(x1 , yj ) = −0.1 + (−0.2) + (−0.11) = −0.41
j=1

Similarly, for x2 = 0
3

x2 p(x2 , yj ) = 0 + 0 + 0 = 0
j=1

and for x3 = 2
3

x3 p(x3 , yj ) = 0.06 + 0.34 +830.06 = 0.46
j=1
j=1

Similarly, for x2 = 0
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE
3 IN EXPECTATION AND VARIANCE OF

PROBABILITY THEORY – VOLUME III , y )
x2 p(x =0+0+0=0 BIVARIATE DISTRIBUTIONS
2 j
j=1

and for x3 = 2
3

x3 p(x3 , yj ) = 0.06 + 0.34 + 0.06 = 0.46
j=1
88 Advanced Topics in Introductory Probability
The results are presented in the table below:

Y
X −1 0 1 Row Totals
−1 -0.1 -0.2 -0.11 -0.41
0 0 0 0 0
2 0.06 0.34 0.06 0.46

Hence
3 
 3
E(X) = xi p(xi , yj )
i=1 j=1
= −0.41 + 0 + 0.46
= 0.05

(b) We compute E(Y ) as follows:


For y1 = −1
3

y1 p(xi , y1 ) = −0.1 + (−0.08) + (−0.03) = −0.21
i=1
Similarly, for y2 = 0
3

y2 p(xi , y2 ) = 0 + 0 + 0 = 0
i=1

and for y3 = 1
3

y3 p(xi , y3 ) = 0.11 + 0.26 + 0.03 = 0.40
i=1

The results are presented in the table below:

Y
X −1 0 1
−1 -0.1 0 0.11
0 -0.08 0 0.26
Expectation and Variance of2Bivariate
-0.03Distributions
0 0.03 89
Total -0.21 0 0.40

Hence
3 
 3
E(Y ) = yj p(xi , yj )
j=1 i=1
= −0.21 + 0 + 0.40
= 0.19

Expectation from Derived Joint Probability Distribution


84
The following theorem shows that we can find E(XY ) of a bivariate dis-
E(Y ) = j=1 i=1 yj p(xi , yj )
= −0.21
j=1 i=1
+ 0 + 0.40
= −0.21 + 0 + 0.40
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN= 0.19 EXPECTATION AND VARIANCE OF
PROBABILITY THEORY – VOLUME = III 0.19 BIVARIATE DISTRIBUTIONS

Expectation from Derived Joint Probability Distribution


Expectation from Derived Joint Probability Distribution
The following theorem shows that we can find E(XY ) of a bivariate dis-
The following
tribution (X, Ytheorem
) from theshows thatjoint
derived we can find E(XY
probability ) of a bivariate
distribution without dis-
the
tribution (X, Y ) from the derived joint probability distribution without
knowledge of the parent joint probability distribution or the marginal dis- the
knowledge
tribution inofthe
thecase
parent joint probability
of independence of Xdistribution
and Y . or the marginal dis-
tribution in the case of independence of X and Y .

Theorem 3.2
Theorem
Let (X, Y )3.2
be two-dimensional discrete random variable and let
Let
H(X,(X,
Y )Y=) XY
be .two-dimensional
Then discrete random variable and let
H(X, Y ) = XY . Then

E(XY ) =  xy p(xy)
E(XY ) = xy xy p(xy)
xy

Example 3.5
Example
Refer 3.5
to Example 2.2. Find E(XY ) from the derived joint probability dis-
Refer to Example
tribution. 2.2. Find E(XY ) from the derived joint probability dis-
tribution.
Solution
Solution
From Example 2.11, we have the following:
From Example 2.11, we have the following:
xy −2 −1 0 1 2
xy
p(xy) −2
0.03 −1
0.11 0
0.73 1
0.1 2
0.03
p(xy) 0.03 0.11 0.73 0.1 0.03
xy p(xy) -0.06 -0.11 0 0.1 0.06
xy p(xy) -0.06 -0.11 0 0.1 0.06
Hence
Hence 
E(XY ) =  xy p(xy)
E(XY ) = xy xy p(xy)
xy

www.job.oticon.dk

85
From Example 2.11, we have the following:

ADVANCED TOPICS INxy INTRODUCTORY


−2 −1 0 1 2
PROBABILITY: A FIRST COURSE0.03
p(xy) IN 0.11 0.73 0.1 0.03 EXPECTATION AND VARIANCE OF
PROBABILITY THEORY – VOLUME III BIVARIATE DISTRIBUTIONS
xy p(xy) -0.06 -0.11 0 0.1 0.06

Hence
90 Advanced Topics in Introductory Probability
E(XY ) = xy p(xy)
xy
= −0.06 + (−0.11) + 0 + 0.1 + 0.06
= −0.01

the same result as in Example 3.3.

Expression of Expectation through Marginal Distribution

The following Theorem shows that the expectation of X and Y in a bivariate


distribution can be obtained through the marginal distribution.

Theorem 3.3
Let (X, Y ) be a two-dimensional random variable with probability
mass function p(x, y) and let H(X, Y ) = X.
Then the function
n 
 m
E(X) = xi p(xi , yj )
i=1 j=1

simplifies to
n

E(X) = xi p(xi )
i=1

Proof
n 
 m
E(X) = xi p(xi , yj )
i=1 i=1
n  m
= xi p(xi , yj )
i=1 i=1
(since the x values do not depend on yand can therefore be
taken from the summation of y)
n

= xi p(xi )
i=1
(from the definition of the marginal probability mass function
Expectation and Variance of Bivariate Distributions 91
of X in Definition 1.11)

Similarly, if H(X, Y ) = Y . Then


n 
 m
E(Y ) = yi p(xi , yj )
i=1 j=1

which also reduces to m



E(Y ) = yj p(yj )
j=1

Example 3.6
Refer to Example 2.2. Using the marginal distributions, find
(a) E(X) (b) E(Y ).
86
Solution
E(Y ) = yi p(xi , yj )
i=1 j=1
ADVANCED TOPICS IN INTRODUCTORY
which also reduces
PROBABILITY: toCOURSE IN
A FIRST m EXPECTATION AND VARIANCE OF

PROBABILITY THEORY – VOLUME III BIVARIATE DISTRIBUTIONS
E(Y ) = yj p(yj )
j=1

Example 3.6
Refer to Example 2.2. Using the marginal distributions, find
(a) E(X) (b) E(Y ).

Solution
(a) From Example 2.11, the marginal distributions of X is:

x −1 0 2
p(x) 0.41 0.36 0.23
x p(x) -0.41 0 0.46

Hence
n

E(X) = xi p(xi )
i=1
= −41 + 0 + 0.46
= 0.05

(b) From Example 3.3, the marginal distributions of Y is:

y −1 0 1
p(y) 0.21 0.39 0.4
y p(y) -0.21 0 0.4

Hence
m
92 
Advanced Topics in Introductory Probability
E(Y ) = E(Y ) = yj p(yj )
j=1
= −21 + 0 + 0.4
= 0.19

3.2.2 Expectation of Functions of Continuous Bivariate


Random Variables

Definition 3.2
Let (X, Y ) be a two-dimensional random variable with probability
density function f (x, y) and let Z = H(X, Y ).
Then  ∞  ∞
E(Z) = H(x, y) f (x, y) dx dy
−∞ −∞

Theorem 3.4
Let (X, Y ) be a two-dimensional random variable with probability
density function f (x, y) and let H(X, Y ) = XY . Then expectation
of the product of X and Y is
 ∞  ∞
E(XY ) = xy f (x, y) dx dy
−∞ −∞

87

Example 3.7
Let (X, Y ) be a two-dimensional random variable with probability
density function f (x, y) and let Z = H(X, Y ).
ADVANCED
Then TOPICS IN INTRODUCTORY
  ∞
PROBABILITY: A FIRST COURSE ∞ IN EXPECTATION AND VARIANCE OF
E(Z) =
PROBABILITY THEORY – VOLUME III H(x, y) f (x, y) dx dy BIVARIATE DISTRIBUTIONS
−∞ −∞

Theorem 3.4
Let (X, Y ) be a two-dimensional random variable with probability
density function f (x, y) and let H(X, Y ) = XY . Then expectation
of the product of X and Y is
 ∞  ∞
E(XY ) = xy f (x, y) dx dy
−∞ −∞

Example 3.7
Refer to Example 1.3. Find E(XY ).

Solution
 2  1  
xy
E(XY ) = xy x + 2
dx dy
0 3
 2 0   
1 x2y2
= x3 y + dx dy
0 0 3
 2 
y y2
= + dy
0 4 9
4 8 43
= + =
8 27 54

WHY WAIT FOR


PROGRESS?
DARE TO DISCOVER
Discovery means many different things at
Schlumberger. But it’s the spirit that unites every
single one of us. It doesn’t matter whether they
join our business, engineering or technology teams,
our trainees push boundaries, break new ground
and deliver the exceptional. If that excites you,
then we want to hear from you.

careers.slb.com/recentgraduates

88
ADVANCED TOPICS
Expectation IN INTRODUCTORY
and Variance of Bivariate Distributions 93
PROBABILITY: A FIRST COURSE IN EXPECTATION AND VARIANCE OF
PROBABILITY THEORY – VOLUME III BIVARIATE DISTRIBUTIONS

Definition 3.3
Let (X, Y ) be a two-dimensional continuous random variable with
probability density function f (x, y). Then the expectation of the
marginal distributions of X and Y are given by
 ∞
E(X) = x g(x) dx
−∞
 ∞
E(Y ) = y h(y) dy
−∞

where g(x) and h(y) are the marginal probability density function of
X and Y respectively

Example 3.8
Refer to Example 1.3. Find (a) E(X), (b) E(Y )

Solution
To find the expectation of a random variable from a bivariate distribution
of X and Y we require the marginal probability distributions of X and Y .
The marginal probability distribution of X for Example 1.3 was found
in Example 1.10 to be

 2
2 x2 + x, 0 < x < 1
g(x) = 3
 0, elsewhere

and that of Y to be

 1 y
+ , 0<x<2
h(y) = 3 6
 0, elsewhere

Hence, by Definition 3.1


 ∞
(a) E(X) = x g(x) dx
−∞
 1  
94 2
Advanced Topics2 in Introductory Probability
= x 2x + x dx
0 3
 1  
2
= 2x + x2
3
dx
0 3
13
=
18
 2  
1 y
(b) E(Y ) = y +
dy
0 3 6
 2  
1 y2
= y+ dy
0 3 6
10
=
9

3.2.3 Properties of Expectation of Bivariate Random Variables

Similar to the univariate case, we shall assume in our discussions of the prop-
erties of the expectation of bivariate random variables that both variables
have finite expectations.
89

Property 1 Expectation of Identical Distributions


 2
1 y2
= y+ dy
0 3 6
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN 10 EXPECTATION AND VARIANCE OF
=
PROBABILITY THEORY – VOLUME III 9 BIVARIATE DISTRIBUTIONS

3.2.3 Properties of Expectation of Bivariate Random Variables

Similar to the univariate case, we shall assume in our discussions of the prop-
erties of the expectation of bivariate random variables that both variables
have finite expectations.

Property 1 Expectation of Identical Distributions

Theorem 3.5
If X1 , X2 , ..., Xn have the same distribution, then they possess a com-
mon expectation E(Xi ) = µ

Note
The expression “have the same distribution” is equivalent to the expression
“are identically distributed”. Identical distribution means that the probabil-
ity density function or the probability mass function of the random variable
remains the same from trial to trial.
For example, X and Y are identically distributed if their probability
distribution functions are

g(x) = e−x , x≥0


Expectation and Variance of Bivariate
−y Distributions 95
h(y) = e , y≥0

Property 2 Addition Law of Expectation

Theorem 3.6
Let X and Y be any two random variables with expectation E(X)
and E(Y ), respectively. The expectation of their sum is the sum
of their expectations. That is,

E(X + Y ) = E(X) + E(Y )

Proof
We shall first prove for discrete case.
n 
 m
E(X + Y ) = (xi + yj ) p(xi , yj )
i=1 j=1
n  m n 
 m
= xi p(xi , yj ) + yj p(xi , yj )
i=1 j=1 i=1 j=1
n m m
 n

= xi p(xi , yj ) + yj p(xi , yj )
i=1 j=1 j=1 i=1
n
 m

= xi p(xi ) + yj p(xi )
i=1 j=1
(following from the logic of the proof of Theorem 3.3)
= E(X) + E(Y )
90
= n  m xi p(xi , yj ) +  n  m yj p(xi , yj )
= i=1 j=1 xi p(xi , yj ) + i=1 j=1 yj p(xi , yj )
ADVANCED TOPICS INn INTRODUCTORY
m m j=1 n

i=1 j=1 
i=1
PROBABILITY: A FIRST COURSE IN EXPECTATION AND VARIANCE OF
=  n xi  m p(xi , yj ) + m yj  n p(xi , yj )
PROBABILITY THEORY – VOLUME III BIVARIATE DISTRIBUTIONS
= i=1 xi j=1 p(xi , yj ) + j=1 yj i=1 p(xi , yj )
i=1
n j=1  m j=1 i=1
= n xi p(xi ) +  m yj p(xi )
= i=1 xi p(xi ) + j=1 yj p(xi )
i=1(following from
j=1 the logic of the proof of Theorem 3.3)
(following
= E(X) + E(Y from
) the logic of the proof of Theorem 3.3)
= E(X) + E(Y )

We shall now prove for continuous case.


We shall now prove for
 ∞continuous
 ∞ case.
E(X + Y ) =  ∞  ∞ (x + y)f (x, y) dx dy
E(X + Y ) = −∞ ∞
−∞
 ∞(x + y)f (x, y) dx dy ∞  ∞
= −∞ ∞ x−∞
 ∞ f (x, y) dy dx +  ∞ y  ∞ f (x, y) dx dy
= −∞ ∞ x −∞
f (x, y)dy
∞ dx + −∞
y −∞ f (x, y) dx dy
= −∞ −∞
∞ x fX (x) dx +  ∞ y fY (y) dy
−∞ −∞
−∞ −∞
96 = x fX (x) dx + Topics
Advanced y fY (y) dy
in Introductory Probability
−∞ −∞
By the definition of the marginal probability densities, the result follows.
By the definition of the marginal probability densities, the result follows.
Example 3.9
For the table in Example 2.2, verify that

E(X + Y ) = E(X) + E(Y )

(a) using the parent joint probability distribution table;

(b) using the derived joint probability distribution table.

Solution

(a) The distribution of X+Y from the parent joint probability distribution
table is reproduced from the solution for Example 2.2, taking note that
the values for x + y are those in parenthesis:
PREPARE FOR A
LEADING ROLE.
Y English-taught MSc programmes
X −1 0 1 in engineering: Aeronautical,
Row Totals
−1 0.10 0.20 0.11 0.41 Biomedical, Electronics,
(-2) (-1) (0) Mechanical, Communication
0 0.08 0.02 0.26 0.36 systems and Transport systems.
(-1) (0) (1) No tuition fees.
2 0.03 0.17 0.03 0.23
(1) (2) (3)
Column Totals 0.21 0.39 0.40 1.00

E liu.se/master
To obtain E(X + Y ), we multiply the values for x + y by their corre-
sponding probabilities and obtain the following table:

X −1
Y
0 1

−1 -0.20 -0.20 0.00
0 -0.08 0.00 0.26
2 0.03 0.34 0.09
91
(a) usingTOPICS
ADVANCED the parent joint probability distribution table;
IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN EXPECTATION AND VARIANCE OF
(b) using the
PROBABILITY derived
THEORY joint probability
– VOLUME III distribution table. BIVARIATE DISTRIBUTIONS

Solution

(a) The distribution of X+Y from the parent joint probability distribution
table is reproduced from the solution for Example 2.2, taking note that
the values for x + y are those in parenthesis:

Y
X −1 0 1 Row Totals
−1 0.10 0.20 0.11 0.41
(-2) (-1) (0)
0 0.08 0.02 0.26 0.36
(-1) (0) (1)
2 0.03 0.17 0.03 0.23
(1) (2) (3)
Column Totals 0.21 0.39 0.40 1.00

To obtain E(X + Y ), we multiply the values for x + y by their corre-


sponding probabilities and obtain the following table:

Y
X −1 0 1
−1 -0.20 -0.20 0.00
Expectation and Variance of 0Bivariate
-0.08 Distributions
0.00 0.26 97
2 0.03 0.34 0.09

Hence,
n 
 m
E(X + Y ) = (xi + yj ) p(xi , yj )
i=1 j=1
= −0.20 + (−0.20) + 0.00 + (−0.08) · · · + 0.34 + 0.09
= 0.24

which is the same as

E(X) + E(Y ) = 0.05 + 0.19 = 0.24

from Example 3.4.

(b) The distribution of X + Y from the derived joint probability distribu-


tion table is taken from Example 2.2. Hence, to obtain E(X + Y ), we
have the following table:

(x + y) −2 −1 0 1 2 3
p(x + y) 0.1 0.28 0.13 0.29 0.17 0.03
(x + y) p(x + y) −0.2 −0.28 0 0.29 0.34 0.09

Hence,

E(X + Y ) = (xi + yj ) p(xi + yj )
= −0.2 + (−0.28) + 92
0 + 0.29 + 0.34 + 0.09
= 0.24
have the following table:

(x + y)IN INTRODUCTORY
ADVANCED TOPICS −2 −1 0 1 2 3
PROBABILITY: A FIRST COURSE IN EXPECTATION AND VARIANCE OF
p(x + y) 0.1 0.28 0.13 0.29 0.17 0.03
PROBABILITY THEORY – VOLUME III BIVARIATE DISTRIBUTIONS
(x + y) p(x + y) −0.2 −0.28 0 0.29 0.34 0.09

Hence,

E(X + Y ) = (xi + yj ) p(xi + yj )
= −0.2 + (−0.28) + 0 + 0.29 + 0.34 + 0.09
= 0.24

Corollary 3.2
Let X1 , X2 , ..., Xn be n random variables, where n is finite. Then the ex-
pectation of the sum Sn = X1 + X2 + ... + Xn is the sum of the expectations
of the individual random variables. Thus,

E(Sn ) = E(X1 + X2 + ... + Xn )


= E(X1 ) + E(X2 ) + ... + E(Xn )
n

= E(Xi )
98 Advanced
i=1 Topics in Introductory Probability

Corollary 3.3
If X1 , X2 , ..., Xn have the same distribution with E(Xi ) = µ for all 1 ≤ i ≤ n,
then the expectation of their sum Sn = X1 + X2 + ... + Xn is

E(Sn ) = E(X1 + X2 + ... + Xn ) = nµ

Proof
Since X1 , X2 , ..., Xn have the same distribution, from Theorem 3.3, they
have a common expectation:

E(X1 ) = E(X2 ) = ... = E(Xn ) = E(X) = µ

From Corollary 3.2


n
 n

E(X1 + ... + Xn ) = E(Xi ) = µ = nµ
i=1 i=1

Example 3.10
Refer to Example 2.13. Find E(X + Y ).

Solution

E(X + Y ) = E(X) + E(Y )


= 1.6 + 1.6
= 2(1.6) = 3.2
= nµ

Property 3 Difference Law of Expectation

Theorem 3.7
Let X and Y be any two random variables with expectations E(X)
and E(Y ) respectively. The expectation of their difference is the
difference of their expectations. That is,

E(X − Y ) = E(X) − E(Y )


93
E(X + Y ) = E(X) + E(Y )
ADVANCED TOPICS IN INTRODUCTORY
= 1.6 + 1.6
PROBABILITY: A FIRST COURSE IN = 2(1.6) = 3.2 EXPECTATION AND VARIANCE OF
PROBABILITY THEORY – VOLUME III BIVARIATE DISTRIBUTIONS
= nµ

Property 3 Difference Law of Expectation

Theorem 3.7
Let X and Y be any two random variables with expectations E(X)
and E(Y ) respectively. The expectation of their difference is the
difference of their expectations. That is,

E(X − Y ) = E(X) − E(Y )


Expectation and Variance of Bivariate Distributions 99

Example 3.11
For the table in Example 2.2, verify that

E(X − Y ) = E(X) − E(Y )

Solution
The distribution of X − Y is given in Example 2.8. Hence, to obtain
E(X + Y ), we have the following table:

x−y −2 −1 0 1 2 3
p(x − y) 0.11 0.46 0.12 0.11 0.17 0.03
(x − y) p(x − y) −0.22 −0.46 0 0.11 0.34 0.09

Hence,

E(X − Y ) = −0.22 + (−0.46) + 0 + 0.11 + 0.34 + 0.09


= −0.14

which is the same as Click here


to learn more

TAKE THE
E(X) − E(Y ) = 0.05 − 0.19 = −0.14

RIGHT TRACK
from Example 3.4.

Property 4

Theorem 3.8 Give your career a head start


by studying withrandom
If a and b are constants, then for any us. Experience
variables the advantages
X and Y
of our collaboration with major companies like
ABB, Volvo and Ericsson!
E(aX + bY ) = aE(X) + bE(Y )

Corollary 3.4 Apply by World class


Let X1 , X2 , ...,15XJanuary
n be n random variables. Then www.mdh.se
research
n

E(c1 X1 + c2 X2 + ... + cn Xn ) = ci E(Xi )
i=1

94
E(X + Y ), we have the following table:

ADVANCED
x −TOPICS
y IN INTRODUCTORY
−2 −1 0 1 2 3
PROBABILITY: A FIRST COURSE IN EXPECTATION AND VARIANCE OF
p(x − y) 0.11
PROBABILITY THEORY – VOLUME III
0.46 0.12 0.11 0.17 0.03 BIVARIATE DISTRIBUTIONS
(x − y) p(x − y) −0.22 −0.46 0 0.11 0.34 0.09

Hence,

E(X − Y ) = −0.22 + (−0.46) + 0 + 0.11 + 0.34 + 0.09


= −0.14

which is the same as

E(X) − E(Y ) = 0.05 − 0.19 = −0.14

from Example 3.4.

Property 4

Theorem 3.8
If a and b are constants, then for any random variables X and Y

E(aX + bY ) = aE(X) + bE(Y )

Corollary 3.4
Let X1 , X2 , ..., Xn be n random variables. Then
n

100 E(c1 X1 + c2 X2Advanced
+ ... + cn Topics
Xn ) = in Introductory
ci E(Xi ) Probability
i=1

where ci ’s are constants.

If ci = c for all 1 ≤ i ≤ n, then Corollary 3.4 becomes


n

E(c1 X1 + c2 X2 + ... + cn Xn ) = c E(Xi )
i=1

The proofs of Theorem 3.8 and its corollary are similar to the proofs of
Theorem 3.6 and its corollaries.

Note
Theorems 3.6, 3.7 and 3.8 and their corollaries hold whether or not the
random variables involved are independent.

Example 3.12
For the data of Example 3.1, if a = 3 and b = 4, verify whether Property 4
is valid.

Solution
From Example 3.1,

3E(X) + 4E(Y ) = 3(1.6) + 4(3.9)


= 20.4

Now,

3x + 4y 3(−2) + 4(3) 3(−2) + 4(6) 3(4) + 4(3) 3(4) + 4(6)


p(x, y) 0.4(0.7) 0.4(0.3) 0.6(0.7) 0.6(0.3)

or
3x + 4y 6 18 24 36 95
p(x + y) 0.28 0.12 0.42 0.18
Solution
From Example
ADVANCED 3.1,
TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN EXPECTATION AND VARIANCE OF
PROBABILITY THEORY3E(X) + 4E(Y
– VOLUME III ) = 3(1.6) + 4(3.9) BIVARIATE DISTRIBUTIONS
= 20.4

Now,

3x + 4y 3(−2) + 4(3) 3(−2) + 4(6) 3(4) + 4(3) 3(4) + 4(6)


p(x, y) 0.4(0.7) 0.4(0.3) 0.6(0.7) 0.6(0.3)

or
3x + 4y 6 18 24 36
p(x + y) 0.28 0.12 0.42 0.18

so that

E(3X + 4Y ) = 6(0.28) + 18(0.12) + 24(0.42) + 36(0.18)


Expectation and Variance of Bivariate Distributions 101
Expectation and Variance= of1.68 + 2.16 +
Bivariate 10.08 + 6.48
Distributions 101
Expectation and Variance= of20.4
Bivariate Distributions 101
Property 5 andExpectation
Expectation of Product
Variance of Bivariate of X and Y
Distributions 101
Thus, Property
Property 4 is valid.
5 Expectation of Product of X and Y
Property 5 Expectation of Product of X and Y
Property 5 Expectation of Product of X and Y
Theorem 3.9
Let X and Y be two random variables with expectations E(X) and
E(Y ), respectively. Then the expectation of their product is

E(XY ) = E(X)E(Y ) + E{[X − E(X)][Y − E(Y )]}

The last term appears sufficiently often to deserve a separate name and
treatment. It is called the covariance of X and Y and denoted as Cov(X, Y ).
The concept of covariance is discussed in detail in the next chapter.
Example 3.13
For the table in Example 2.2, obtain E{[X − E(X)][Y − E(Y )]}.

Solution
It has been proved in Theorem 4.1 in the sequel that

E{[X − E(X)][Y − E(Y )]} = E(XY ) − E(X)E(Y )

From Example 3.3,
E(XY ) = −0.01
From Example 3.4,
E(X) = 0.05 and E(Y ) = 0.19
Hence,

E{[X − E(X)][Y − E(Y )]} = E(XY ) − E(X)E(Y )
                         = −0.01 − 0.05(0.19)
                         = −0.0195

Property 6   Expectation of Product of Independent Random Variables

Theorem 3.10
Let X and Y be two independent random variables with expectations
E(X) and E(Y ) respectively. Then the expectation of their product
is equal to the product of their expectations:

E(XY ) = E(X)E(Y )

Proof (for the discrete bivariate case)


The random variable XY can take all values xi yj , where i = 1, 2, ..., n
and j = 1, 2, ..., m with probabilities g(xi )h(yj ), as long as X and Y are
independent random variables. Hence
E(XY ) = Σ_{i=1}^{n} Σ_{j=1}^{m} xi yj g(xi ) h(yj )
       = [ Σ_{i=1}^{n} xi g(xi ) ] [ Σ_{j=1}^{m} yj h(yj ) ]
       = E(X)E(Y )

The proof of the bivariate continuous case is similar.

Note
The converse of Theorem 3.10 does not hold, that is, the random variables
X and Y may satisfy the relation E(XY ) = E(X)E(Y ) without being
independent.

Example 3.14
For the data in Example 2.3, verify whether

E(XY ) = E(X)E(Y )

Solution
The distribution of XY is given in Example 2.12. Hence, to obtain
E(XY ), we have the following table:

xy −6 −12 12 24
p(xy) 0.28 0.12 0.42 0.18
xy p(xy)   −1.68   −1.44   5.04   4.32

Hence,

E(XY ) = −1.68 + (−1.44) + 5.04 + 4.32


= 6.24

From Example 3.1,

E(X) = 1.6 and E(Y ) = 3.9

It has been shown in Example 2.3 that X and Y are independent. Hence,

E(X)E(Y ) = 6.24

which is the same as E(XY ) above.
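The same check can be done in a few lines of code. The sketch below (a minimal Python illustration, again assuming the joint table of Example 2.3) computes E(XY ) directly from the joint distribution and compares it with E(X)E(Y ), as Theorem 3.10 predicts for independent variables.

    px = {-2: 0.4, 4: 0.6}
    py = {3: 0.7, 6: 0.3}
    joint = {(x, y): px[x] * py[y] for x in px for y in py}

    E_XY = sum(x * y * p for (x, y), p in joint.items())
    E_X = sum(x * p for x, p in px.items())
    E_Y = sum(y * p for y, p in py.items())

    print(E_XY, E_X * E_Y)   # both 6.24 (up to rounding), as in Example 3.14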

Corollary 3.5
Let X1 , X2 , · · · , Xn be independent random variables. Then

E(X1 X2 · · · Xn ) = E(X1 )E(X2 )...E(Xn )

This result may be written as

E( ∏_{i=1}^{n} Xi ) = ∏_{i=1}^{n} E(Xi )

where ∏ is the “product of”.

Proof
We shall demonstrate the result for the case of three random variables X, Y and Z.
The product of three random variables X Y Z may be written as (XY )Z.
Using Theorem 3.10, we obtain:

E(XY Z) = E[(XY )Z] = E(XY )E(Z)

Applying again the same theorem to the random variable XY which is a


product of random variables X and Y we obtain

E(XY Z) = E(X)E(Y )E(Z)


This corollary can be proved by mathematical induction; in this way it is easy
to extend Theorem 3.10 to any finite number of independent random variables.

Property 7 Expectation of Quotient of X and Y

Theorem 3.11
Let X and Y be two random variables with expectations E(X) and
E(Y ) respectively. Then the expectation of their quotient is

E(X/Y ) ≈ E(X)/E(Y ) − Cov(X, Y )/[E(Y )]² + E(X) Var(Y )/[E(Y )]³

E(X) ≠ 0, E(Y ) ≠ 0, where Cov(X, Y ) is called the covariance of
X and Y (see next chapter)

Note
The expectation of the quotient X/Y may not exist even though the moments
of X and Y exist.
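The quality of the approximation in Theorem 3.11 can be examined numerically. The sketch below (a rough Python illustration, again assuming the joint table of Example 2.3, for which Cov(X, Y ) = 0) compares the exact value of E(X/Y ), obtained by enumerating the joint distribution, with the three-term approximation.

    px = {-2: 0.4, 4: 0.6}
    py = {3: 0.7, 6: 0.3}
    joint = {(x, y): px[x] * py[y] for x in px for y in py}

    E_X = sum(x * p for x, p in px.items())                    # 1.6
    E_Y = sum(y * p for y, p in py.items())                    # 3.9
    var_Y = sum((y - E_Y) ** 2 * p for y, p in py.items())     # 1.89
    cov_XY = sum((x - E_X) * (y - E_Y) * p
                 for (x, y), p in joint.items())                # 0 by independence

    exact = sum((x / y) * p for (x, y), p in joint.items())
    approx = E_X / E_Y - cov_XY / E_Y ** 2 + E_X * var_Y / E_Y ** 3

    print(exact, approx)   # about 0.453 versus 0.461: close, but only approximate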

Property 8 Monotonicity of Expectation

Theorem 3.12
Suppose X and Y are two random variables with expectations E(X)
and E(Y ) respectively. If X ≤ Y, then

E(X) ≤ E(Y )

Proof
We write
Y = X + (Y − X)
where Y − X ≥ 0. By Theorem 3.5, we have
E(Y ) = E(X) + E(Y − X) ≥ E(X)

Property 9 Hölder’s Inequality of Expectation

Theorem 3.13
Suppose X and Y are bivariate random variables with finite pth and
qth order moments respectively, where p and q are positive numbers
satisfying 1/p + 1/q = 1. Then XY has a finite first moment and

E(|XY |) ≤ [E(|X|^p)]^{1/p} [E(|Y |^q)]^{1/q}

The equality holds if and only if E(|Y |q ) = c E(|X|p ) for some constant c
or X = 0.

Property 10 Cauchy-Schwartz’ Inequality of Expectation

Theorem 3.14
Suppose X and Y are two random variables with second moments
E(X 2 ) and E(Y 2 ) respectively. Then E(XY ) exists and

[E(XY )]2 ≤ E(X 2 )E(Y 2 )

Proof
Case 1
If E(X 2 ) or E(Y 2 ) is infinite, the theorem is valid.

Case 2
If E(X 2 ) = 0 then we must have P (X = 0) = 1 so that P (XY = 0) = 1
and E(XY ) = 0, hence again the theorem is valid. The same argument
applies if E(Y 2 ) = 0, so that we may assume that 0 < E(X 2 ) < ∞ and
0 < E(Y 2 ) < ∞.

Case 3
Suppose we have a random variable
Zt = tX + Y


Let us consider the real valued function
E(Zt²) = E[(tX + Y )²]
For every t we have
E(Zt²) = E[(tX + Y )²] ≥ 0
since
(tX + Y )² ≥ 0
so that
E(t²X² + 2 tXY + Y²) ≥ 0                  (i)
The term on the left hand side of (i) is a quadratic function in t, which is
always nonnegative for all t. Hence, it does not have more than one real
root. Its discriminant must, therefore, be less than or equal to zero. The
discriminant “b2 − 4ac” for the quadratic equation (i) in t is
[2 E(XY )]2 − 4 E(X 2 )E(Y 2 ) ≤ 0
or
4 [E(XY )]2 ≤ 4 E(X 2 )E(Y 2 )
from which the result follows.
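A numerical illustration of Theorem 3.14 is straightforward. The sketch below (a minimal Python check, once more assuming the joint table of Example 2.3) verifies that [E(XY )]² does not exceed E(X²)E(Y ²).

    px = {-2: 0.4, 4: 0.6}
    py = {3: 0.7, 6: 0.3}
    joint = {(x, y): px[x] * py[y] for x in px for y in py}

    E_XY = sum(x * y * p for (x, y), p in joint.items())   # 6.24
    E_X2 = sum(x * x * p for x, p in px.items())           # 11.2
    E_Y2 = sum(y * y * p for y, p in py.items())           # 17.1

    print(E_XY ** 2, E_X2 * E_Y2)       # 38.94 against 191.52
    assert E_XY ** 2 <= E_X2 * E_Y2     # Cauchy-Schwartz inequality holds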

Note
(a) Equality of Theorem 3.14 holds if and only if one of the random vari-
ables equals a constant multiple of the other, say, Y = t X for some
constant t, or at least one of them is zero, say, X = 0;
(b) This inequality is sometimes simply called Schwartz’ inequality;
(c) Cauchy-Schwartz’ inequality is a special case of Hölder’s inequality
when p = q = 2.

Property 11 Minkowski’s Inequality of Expectation

Theorem 3.15
Suppose X and Y are two random variables with finite pth order
moments. Then for p ≥ 1
[E(|X + Y |^p)]^{1/p} ≤ [E(|X|^p)]^{1/p} + [E(|Y |^p)]^{1/p}

Corollary 3.6
Suppose X and Y are two random variables with expectations E(X) and
E(Y ) respectively. Then

E(|X + Y |) ≤ E(|X|) + E(|Y |)

The proof follows immediately from Theorem 3.15 for the case when p = 1.

Property 12 Expectation of Frequency of Success

Theorem 3.16
Suppose the random variable M is the frequency of occurrence of the
event A (number of successes) in n independent and identical trials.
Then
E(M ) = np

Proof
Let X1 , X2 , · · · , Xn be n independent random variables which are identically
distributed. Suppose

1, if event A occurs in the ith trial,
Xi =
0, otherwise

Then Xi are Bernoulli random variables and

E(Xi ) = pi = p

Letting
M = X1 + X2 + · · · + Xn
we have

E(M ) = E(X1 ) + E(X2 ) + · · · + E(Xn )


= np
Example 3.15
A box contains 20 black, 20 red and 10 green balls. From the box, 25 balls
are randomly selected with replacement.
Find the expectation of the frequency of occurrence of the red balls.

Solution
Let A = {the occurrence of red balls}. Then
p = P (A) = 20/50 = 2/5
Hence the expectation of the frequency of occurrence of the red balls is

E(M ) = np
      = 25 (2/5)
      = 10
That is, if we select 25 balls from a box containing 20 black, 20 red
and 10 green balls at random with replacement, we are likely to have the
red ball appearing 10 times.
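Theorem 3.16 can also be illustrated by simulation. The sketch below (a minimal Python illustration, not part of the original text) repeats the experiment of Example 3.15 many times and compares the average observed frequency of red balls with np = 10; it also reports the average relative frequency, anticipating Theorem 3.17.

    import random

    n, p, trials = 25, 2 / 5, 100_000
    total = 0
    for _ in range(trials):
        # one experiment: draw 25 balls with replacement, count the red ones
        m = sum(1 for _ in range(n) if random.random() < p)
        total += m

    print(total / trials)        # close to n*p = 10
    print(total / trials / n)    # close to p = 0.4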

Property 13   Expectation of Relative Frequency of Success

Theorem 3.17
Let the random variable M/n be the relative frequency of the event
A (or proportion of success) among the n independent and identical
trials of experiment E. Then the expectation of the relative frequency
equals the probability of the event:

E(M/n) = p

Proof
Using Property 4 of expectation (Theorem 3.7),

E(M/n) = (1/n) E(M )
       = (1/n)(n p)          (from Theorem 3.16)
       = p

The above theorem says that the expected relative frequency of the event A
is p, where p = P (A). This is intuitively clear and it establishes a connection
between the relative frequency of an event and the probability of that event.

Example 3.16
Refer to Example 3.15. Find the expectation of the relative frequency of the
occurrence of the red balls.

Solution
From Example 3.15,

p = 2/5

so that

q = 1 − p = 3/5

Hence, the expectation of the relative frequency of occurrence of the red
balls is

E(M/n) = p = 2/5

3.3 VARIANCE OF BIVARIATE RANDOM VARIABLES

3.3.1 Variance of Functions of Bivariate Random Variables

The variance of a univariate random variable has been discussed in the


book on “Introductory Probability Theory” (Nsowah-Nuamah, 2017). We
extend the discussion to bivariate distributions in this section. A few more
properties will also be discussed.
The variance of bivariate random variables X and Y can be expressed
through the joint probability functions.

Definition 3.4
Let (X, Y ) be a two-dimensional discrete random variable with prob-
ability mass function p(xi , yj ).
If H(X, Y ) = (X − µX )², then

Var(X) = Σ_{i=1}^{n} Σ_{j=1}^{m} (xi − µX )² p(xi , yj )

If H(X, Y ) = (Y − µY )², then

Var(Y ) = Σ_{i=1}^{n} Σ_{j=1}^{m} (yj − µY )² p(xi , yj )

where µX = E(X) and µY = E(Y )

Example 3.17
For the data in Example 2.3, find the variance of (a) X and (b) Y .

Solution
(a) From Example 3.1, E(X) = 1.6. Hence,

Var(X) = (x1 − µX )² p(x1 , y1 ) + (x1 − µX )² p(x1 , y2 )
         + (x2 − µX )² p(x2 , y1 ) + (x2 − µX )² p(x2 , y2 )
       = (−2 − 1.6)²(0.28) + (−2 − 1.6)²(0.12) + (4 − 1.6)²(0.42)
         + (4 − 1.6)²(0.18)
       = 8.64

(b) From Example 3.1, E(Y ) = 3.9. Hence,

Var(Y ) = (y1 − µY )² p(x1 , y1 ) + (y2 − µY )² p(x1 , y2 )
         + (y1 − µY )² p(x2 , y1 ) + (y2 − µY )² p(x2 , y2 )
       = (3 − 3.9)²(0.28) + (6 − 3.9)²(0.12) + (3 − 3.9)²(0.42)
         + (6 − 3.9)²(0.18)
       = 1.89
Definition 3.5
Let (X, Y ) be a two-dimensional continuous random variable with
probability density function f (x, y).
If H(X, Y ) = (X − µX )², then

Var(X) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − µX )² f (x, y) dy dx

If H(X, Y ) = (Y − µY )², then

Var(Y ) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (y − µY )² f (x, y) dy dx

We may also express the variance of X and Y in a bivariate distribution
through the marginal probability functions.

Definition 3.6
Let (X, Y ) be a two-dimensional discrete random variable with prob-
ability mass function p(x, y). Then the variances of the marginal dis-
tributions of X and Y are given by

Var(X) = Σ_{all x} (x − µX )² p(x)

Var(Y ) = Σ_{all y} (y − µY )² p(y)

where p(x) and p(y) are the marginal probability mass functions of
X and Y , respectively
Example 3.18
Example 3.18
For the data in Example 2.3, find (a) Var(X) (b) Var(Y ).

Solution
112 Var(X) = Advanced
(x Topics in Introductory Probability
− µX )2 p(x)
all x

= (−2 − 1.6)2 (0.4) + (4(−2 − 1.6)2 (0.6)


= 8.64

The reader is asked to calculate (b).

Definition 3.7
Let (X, Y ) be a two-dimensional continuous random variable with
probability density function f (x, y). Then the variances of the
marginal distributions of X and Y are given by

Var(X) = ∫_{−∞}^{∞} (x − µX )² g(x) dx

Var(Y ) = ∫_{−∞}^{∞} (y − µY )² h(y) dy

where g(x) and h(y) are the marginal probability density function of
X and Y respectively

Like in the case of the univariate random variable, for computational
purposes, we use the following formulas:

Var(X) = E(X²) − [E(X)]²
       = Σ_{all x} x² p(x) − [E(X)]²                  (Discrete case)
       = ∫_{−∞}^{∞} x² g(x) dx − [E(X)]²              (Continuous case)

Var(Y ) = E(Y ²) − [E(Y )]²
       = Σ_{all y} y² p(y) − [E(Y )]²                  (Discrete case)
       = ∫_{−∞}^{∞} y² h(y) dy − [E(Y )]²              (Continuous case)

Example 3.19
Refer to Example 1.3. Find (a) Var(X) (b) Var(Y ).

Solution
(a) From Example 3.6, E(X) = 13/18.
Now

E(X²) = ∫_0^1 x² (2x² + 2x/3) dx
      = ∫_0^1 (2x⁴ + 2x³/3) dx
      = [ 2x⁵/5 + 2x⁴/12 ]_0^1
      = 2/5 + 2/12
      = 17/30

Therefore
Var(X) = 17/30 − (13/18)² = 0.04506
(b) From Example 3.5, E(Y ) = 10/9.
Now

E(Y ²) = ∫_0^2 y² (1/3 + y/6) dy
       = ∫_0^2 (y²/3 + y³/6) dy
       = [ y³/9 + y⁴/24 ]_0^2
       = 8/9 + 16/24
       = 14/9

Hence
Var(Y ) = 14/9 − (10/9)² = 0.32099
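Both variances in Example 3.19 can be confirmed numerically. The sketch below (a minimal Python illustration, assuming the marginal densities that appear in the computation above, g(x) = 2x² + 2x/3 on [0, 1] and h(y) = 1/3 + y/6 on [0, 2]) approximates the required integrals with a simple midpoint rule.

    def var_from_density(f, a, b, steps=10_000):
        # midpoint-rule approximations of E(T) and E(T^2) for density f on [a, b]
        h = (b - a) / steps
        xs = [a + (i + 0.5) * h for i in range(steps)]
        m1 = sum(x * f(x) for x in xs) * h
        m2 = sum(x * x * f(x) for x in xs) * h
        return m2 - m1 ** 2

    g = lambda x: 2 * x ** 2 + 2 * x / 3     # marginal density of X on [0, 1]
    hd = lambda y: 1 / 3 + y / 6             # marginal density of Y on [0, 2]

    print(var_from_density(g, 0, 1))    # about 0.04506
    print(var_from_density(hd, 0, 2))   # about 0.32099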

3.3.2 Properties of Variance of Bivariate Random Variables

In discussing the properties of variance, it is assumed that the variances


exist.

Property 1   Variance of Identical and Independent Random Variables

Theorem 3.18
If the random variables X1 , X2 , ..., Xn have the same distribution,
then they all have the same variance σ 2 , that is,

Var(Xi ) = σ 2 , 1≤i≤n

Property 2 Variance of Sum and Difference of X and Y

Theorem 3.19
If (X,Y) are two dimensional random variables, then

Var(X ± Y ) = Var(X) + Var(Y ) ± 2 Cov(X, Y )

where Cov(X, Y ) is the covariance of X and Y

Proof
We shall give the proof for the case X + Y . The proof for the case X − Y
should be tried by the reader (see Exercise 3.13).
Var(X + Y ) = E{[X + Y − E(X + Y )]2 }
= E{[X + Y − E(X) − E(Y )]2 }
= E{([X − E(X)] + [Y − E(Y )])2 }
= E{(X − E(X))2 + (Y − E(Y ))2
+2(X − E(X))(Y − E(Y ))}
= E{X − E(X)}2 + E{Y − E(Y )}2
+2{E(X − E(X))(Y − E(Y ))}
= Var(X) + Var(Y ) + 2 Cov(X, Y )
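The identity just proved is easy to confirm by enumeration. The sketch below (a minimal Python illustration using a small, purely hypothetical joint table, not one of the examples in this book) makes X and Y dependent and checks that Var(X + Y ) equals Var(X) + Var(Y ) + 2 Cov(X, Y ).

    # hypothetical dependent joint pmf on {0,1} x {0,1}
    joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

    def E(fun):
        return sum(fun(x, y) * p for (x, y), p in joint.items())

    EX, EY = E(lambda x, y: x), E(lambda x, y: y)
    var_X = E(lambda x, y: (x - EX) ** 2)
    var_Y = E(lambda x, y: (y - EY) ** 2)
    cov = E(lambda x, y: (x - EX) * (y - EY))
    ES = E(lambda x, y: x + y)
    var_S = E(lambda x, y: (x + y - ES) ** 2)

    print(var_S, var_X + var_Y + 2 * cov)   # both 0.8 for this table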

Example 3.20
If Var(X) = 8, Var(Y ) = 5 and Cov(X, Y ) = 3, find
(a) Var(X + Y ),   (b) Var(X − Y )

Solution

(a) Var(X + Y ) = 8 + 5 + 2(3) = 19

(b) Var(X − Y ) = 8 + 5 − 2(3) = 7

The variance is not additive in general, as the expected value is. However,
with the additional assumption of independence, we obtain Theorem 3.20.

Corollary 3.7
If X1 , X2 , ..., Xn are n random variables, then

Var( Σ_{i=1}^{n} Xi ) = Σ_{i=1}^{n} Var(Xi ) + 2 Σ Σ_{i<j} Cov(Xi , Xj ),

or equivalently,

Var( Σ_{i=1}^{n} Xi ) = Σ_{i=1}^{n} Var(Xi ) + Σ Σ_{i≠j} Cov(Xi , Xj ),

or equivalently,

Var( Σ_{i=1}^{n} Xi ) = Σ_{i=1}^{n} Σ_{j=1}^{n} Cov(Xi , Xj )

Property 3   Sum and Difference of Independent Random Variables

Theorem 3.20
If (X, Y ) are two-dimensional random variables and if X and Y are
independent, then

Var(X ± Y ) = Var(X) + Var(Y )

Proof
The theorem follows from Theorem 3.19 by noting that Cov(X, Y ) = 0 for
the case when X and Y are independent (see Theorem 4.7 in the sequel.)

Corollary 3.8 Bienaymé Equality


Let Xi (i = 1, 2, ..., n) be independent random variables. Then

Var( Σ_{i=1}^{n} Xi ) = Σ_{i=1}^{n} Var(Xi )

This corollary follows from Corollary 3.7.

Note
It is important to observe that

E( Σ_{i=1}^{n} Xi ) = Σ_{i=1}^{n} E(Xi )

whether or not the Xi ’s are independent, but it is generally not the case that

Var( Σ_{i=1}^{n} Xi ) = Σ_{i=1}^{n} Var(Xi )

always.

Example 3.21
For the data in Example 2.3, verify that
(a) Var(X + Y ) = Var(X) + Var(Y );

(b) Var(X − Y ) = Var(X) + Var(Y ).

Solution
It has been shown in Example 2.3 that X and Y are independent, hence:

(a) Var(X + Y ) = E(X + Y )2 − [E(X + Y )]2 .

Now
(xi + yj )2 (−2 + 3)2 (−2 + 6)2 (4 + 3)2 (4 + 6)2
p(xi )p(yj ) 0.4(0.7) 0.4(0.3) 0.6(0.7) 0.6(0.3)


Therefore, the distribution of (X + Y )2 is

(xi + yj )2 1 16 49 100
p(xi )p(yj ) 0.28 0.12 0.42 0.18

  
E[(X + Y )²] = Σ (x + y)² p(x)p(y)
             = 1(0.28) + 16(0.12) + 49(0.42) + 100(0.18)
             = 0.28 + 1.92 + 20.58 + 18
             = 40.78

From Example 3.7, E(X + Y ) = 5.50.


Therefore,

Var(X + Y ) = E[(X + Y )²] − [E(X + Y )]²
            = 40.78 − (5.5)²
            = 10.53

Now
Var(X) = E(X 2 ) − [E(X)]2
From Example 3.1, E(X) = 1.6.
From Example 2.3, we shall find E(X 2 ). Now,

xi²        (−2)²    (4)²
p(xi )      0.4      0.6

E(X²) = Σ x² p(x)
      = (−2)²(0.4) + (4)²(0.6)
      = 1.6 + 9.6 = 11.2

Therefore
Var(X) = 11.2 − (1.6)2 = 8.64
Again
Var(Y ) = E(Y 2 ) − [E(Y )]2

From Example 3.1, E(Y ) = 3.9.

Similarly,

yj²        (3)²    (6)²
p(yj )      0.7     0.3

E(Y ²) = Σ y² h(y)
       = (3)²(0.7) + (6)²(0.3)
       = 6.3 + 10.8 = 17.1

Therefore
Var(Y ) = 17.1 − (3.9)² = 1.89

Hence,
Var(X + Y ) = Var(X) + Var(Y )

(b) Var(X − Y ) = E[(X − Y )²] − [E(X − Y )]²
Now

(xi − yj )²     (−2 − 3)²   (−2 − 6)²   (4 − 3)²   (4 − 6)²
p(xi )p(yj )    0.4(0.7)    0.4(0.3)    0.6(0.7)   0.6(0.3)

(xi − yj )²     25     64     1      4
p(xi )p(yj )    0.28   0.12   0.42   0.18

E[(X − Y )²] = Σ (x − y)² g(x) h(y)
             = 25(0.28) + 64(0.12) + 1(0.42) + 4(0.18)
             = 7 + 7.68 + 0.42 + 0.72
             = 15.82

xi − yj         −2 − 3     −2 − 6     4 − 3     4 − 6
p(xi )p(yj )    0.4(0.7)   0.4(0.3)   0.6(0.7)  0.6(0.3)

xi − yj         −5     −8     1      −2
p(xi )p(yj )    0.28   0.12   0.42   0.18

E(X − Y ) = Σ (x − y) g(x) h(y)
          = −5(0.28) + (−8)(0.12) + 1(0.42) + (−2)(0.18)
          = −1.4 − 0.96 + 0.42 − 0.36
          = −2.3

Therefore,
[E(X − Y )]2 = (−2.30)2 = 5.29

Hence,
Var(X − Y ) = 15.82 − 5.29 = 10.53

But
Var(X) + Var(Y ) = 8.64 + 1.89 = 10.53

Hence,

Var(X − Y ) = Var(X) + Var(Y )
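The whole of Example 3.21 can be compressed into a few lines of code. The sketch below (a minimal Python illustration, once again assuming the independent joint table of Example 2.3) recomputes Var(X + Y ) and Var(X − Y ) by enumeration and compares them with Var(X) + Var(Y ).

    px = {-2: 0.4, 4: 0.6}
    py = {3: 0.7, 6: 0.3}
    joint = {(x, y): px[x] * py[y] for x in px for y in py}

    def var(fun):
        m = sum(fun(x, y) * p for (x, y), p in joint.items())
        return sum((fun(x, y) - m) ** 2 * p for (x, y), p in joint.items())

    print(var(lambda x, y: x + y))                    # 10.53
    print(var(lambda x, y: x - y))                    # 10.53
    print(var(lambda x, y: x) + var(lambda x, y: y))  # 10.53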

The next property is of much importance in both theoretical and applied


statistics.

Property 4

Theorem 3.21
If the random variables Xi (i = 1, 2, ..., n) are independent and have
the same variance σ², then

Var( Σ_{i=1}^{n} Xi ) = n σ²

This theorem follows immediately from Corollary 3.8.

Property 5

Theorem 3.22
If (X, Y ) is a two-dimensional random variable, then

Var(aX ± bY ) = a² Var(X) + b² Var(Y ) ± 2 a b Cov(X, Y )

where a and b are constants

Proof
We shall prove for the sum. The proof for the difference is similar.

Var(aX + bY ) = E{[(aX + bY ) − E(aX + bY )]²}
              = E{(a[X − E(X)] + b[Y − E(Y )])²}
              = a² E[X − E(X)]² + b² E[Y − E(Y )]²
                + 2 a b E{[X − E(X)][Y − E(Y )]}
              = a² Var(X) + b² Var(Y ) + 2 a b Cov(X, Y )

More generally, we have the following results for the variance of a linear
combination of random variables.
Corollary 3.9
If Xi (i = 1, 2, ..., n) are n random variables and ci is a constant associated
with the ith random variable, then

Var( Σ_{i=1}^{n} ci Xi ) = Σ_{i=1}^{n} ci² Var(Xi ) + 2 Σ Σ_{i<j} ci cj Cov(Xi , Xj ),

or equivalently,

Var( Σ_{i=1}^{n} ci Xi ) = Σ_{i=1}^{n} ci² Var(Xi ) + Σ Σ_{i≠j} ci cj Cov(Xi , Xj ),

or equivalently,

Var( Σ_{i=1}^{n} ci Xi ) = Σ_{i=1}^{n} Σ_{j=1}^{n} ci cj Cov(Xi , Xj )
Example 3.22
If Var(X) = 5, Var(Y ) = 3 and Cov(X, Y ) = 2, find
(a) Var(4X + 6Y );
(b) Var(4X − 6Y ).

Solution

(a) Var(4X + 6Y ) = 4² Var(X) + 6² Var(Y ) + 2(4)(6) Cov(X, Y )
                  = 16(5) + 36(3) + 48(2)
                  = 284

(b) Var(4X − 6Y ) = 4² Var(X) + 6² Var(Y ) − 2(4)(6) Cov(X, Y )
                  = 16(5) + 36(3) − 48(2)
                  = 92

Property 6
Property 6
Property 66

Theorem 3.23
Theorem 3.23
Theorem
Let X and 3.23
Y be two
two independent random
random variables. Then
Then
Let X and
Let X and YY be
be two independent
independent random variables.
variables. Then
Var(aX ±
Var(aX bY )) =
± bY =aa22 Var(X) + b22 Var(Y )
2 Var(X) + b 2 Var(Y )
Var(aX ± bY ) = a Var(X) + b Var(Y )
where
where a and bb are constants
where aa and
and b are
are constants
constants
Corollary 3.10
Corollary 3.10
Corollary
Let X (i 3.10
= 1, 2, ..., n) be independent random114variables and cci aa constant
Let
Let X
X i (i
i
(i =
= 1,
1, 2, ...,
2,ith
..., n)
n) be
be independent
independent random
random variables
variables and
and cii a constant
constant
associated
i with random variable,
associated with i th random variable, then
th then
Let X and Y be two independent random variables. Then
ADVANCED TOPICS IN INTRODUCTORY 2
Var(aX ± bY ) = a Var(X)
PROBABILITY: A FIRST COURSE IN
+ b2 Var(Y ) EXPECTATION AND VARIANCE OF
PROBABILITY THEORY – VOLUME III BIVARIATE DISTRIBUTIONS
where a and b are constants

Corollary 3.10
Let Xi (i = 1, 2, ..., n) be independent random variables and ci a constant
associated with ith random variable, then
 n  n
 
Var ci Xi = c2i Var(Xi )
i=1 i=1

From Corollary 3.10, if ci = c then


 n  n
 
Var c Xi = c2 Var(Xi )
i=1 i=1

Example 3.23
122 Advanced Topics in Introductory Probability
If Var(X) = 10 and Var(Y ) = 6, find

(a) Var(3X + 2Y ), (b) Var(2X + 2Y ),


(c) Var(3X − 2Y ), (d) Var(2X − 2Y ).

Solution
(a) Var(3X + 2Y ) = 32 Var(X) + 22 Var(Y )
= 9(10) + 4(6)
= 112
(b) Var(2X + 2Y ) = Var[2(X + Y )]
= 22 Var(X + Y )
= 22 [Var(X) + Var(Y )]
= 4(10 + 6)
= 64
The reader should solve (c) and (d).

Property 7 Variance of Frequency of Success

Theorem 3.24
WE WILL TURN YOUR CV
Let the random variable M be the frequency of success in n inde- INTO AN OPPORTUNITY
pendent trials, then OF A LIFETIME
Var(M ) = npq

Proof
We found in the proof of Theorem 3.16 that E(Xi ) = p. To find the variance,
we need to find also E(Xi2 ):

x2i 0 1
p(xi ) q p

i )to=
be1(p)
a part+
of0(q) = p brand?
2
E(X
Do you like cars? Would you like a successful
Send us your CV on
As a constructer at ŠKODA AUTO you will put great things in motion. Things that will
so that ease everyday lives of people all around Send us your CV. We will give it an entirely www.employerforlife.com
new new dimension.
Var(Xi ) = E(Xi2 ) − [E(Xi )]2
= p − p = pq2

115
Solution
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A(a)
FIRSTVar(3X
COURSE+IN
2Y ) = 32 Var(X) + 22 Var(Y ) EXPECTATION AND VARIANCE OF
PROBABILITY THEORY – VOLUME III BIVARIATE DISTRIBUTIONS
= 9(10) + 4(6)
= 114
(b) Var(2X + 2Y ) = Var[2(X + Y )]
= 22 Var(X + Y )
= 22 [Var(X) + Var(Y )]
= 4(10 + 6)
= 64
The reader should solve (c) and (d).

Property 7 Variance of Frequency of Success

Theorem 3.24
Let the random variable M be the frequency of success in n inde-
pendent trials, then
Var(M ) = npq

Proof
We found in the proof of Theorem 3.16 that E(Xi ) = p. To find the variance,
we need to find also E(Xi2 ):

x2i 0 1
p(xi ) q p

E(Xi2 ) = 1(p) + 0(q) = p


so that
Var(X
Expectation and Variance ofi )Bivariate
= E(XDistributions
i ) − [E(Xi )]
2 2 123
= p − p = pq 2

Hence
Var(M ) = Var(X1 + X2 + ... + Xn )
= Var(X1 ) + Var(X2 ) + · · · + Var(Xn )
= npq

Example 3.24
Refer to example 3.15, find Var(M ).

Solution
2 3
n = 25, p = , q=
5 5
Var(M ) = npq
  
2 3
= 25
5 5
= 6

Property 8 Variance of Relative Frequency of Success

Theorem 3.25
M
Let the random variable be the relative frequency of success in n
n
independent trials, then
 
M pq
Var =
n n 116
5 5
Var(M ) = npq
ADVANCED TOPICS IN INTRODUCTORY   
2 3
PROBABILITY: A FIRST COURSE IN = 25 EXPECTATION AND VARIANCE OF
PROBABILITY THEORY – VOLUME III 5 5 BIVARIATE DISTRIBUTIONS
= 6

Property 8 Variance of Relative Frequency of Success

Theorem 3.25
M
Let the random variable be the relative frequency of success in n
n
independent trials, then
 
M pq
Var =
n n

Proof
 
M 1
Var = Var(M )
n n2
npq
= (by Theorem 3.24)
n2
pq
=
n

Example 3.25  
124 to example 3.15, find Var M
Refer Advanced
. Topics in Introductory Probability
n

Solution
2 3
n = 25, p = , q=
5 5
 
M pq
Var =
n n
  
2 3
5 5
=
25
= 0.0096

In this chapter we have discussed some moments of bivariate random


variables, namely, the expectation and variance. In the next chapter, we
shall continue with the discussion of moments and how that can be used to
measure the relationship between two random variables.

EXERCISES

3.1 Refer to the data of Exercise 1.1, find

(a) E(X + Y ) (b) E(X


 −Y )
3X
(c) E(XY ) (d) E
7Y
(e) E(2X + 3Y ) (f ) E(5X
 −Y )
3X
(g) E[(3X)(4Y )] (h) E
7Y
(i) E(X 2 + 5Y ) (j) E[3(X + Y )]
(k) E(Y 3 ) (l) E(Y 2 − 2Y + 3X)

3.2 Refer to Exercise 1.1. Suppose X and Y are not assumed to be inde-
pendent. Find

(a) Var(X + Y ) (b) Var(X


117
 −Y )
X
7Y 7Y
(e) (e)
E(2X
E(2X+ 3Y+ )3Y (f) ) (f )E(5X
E(5X −Y−)Y )
3X 3X
ADVANCED TOPICS(g) (g)
IN INTRODUCTORY
E[(3X)(4Y
E[(3X)(4Y )] )](h) (h)
E E
PROBABILITY: A FIRST COURSE IN 7Y 7Y EXPECTATION AND VARIANCE OF
(i) E(X
PROBABILITY THEORY (i) 2 + 25Y
E(X
– VOLUME +
III)5Y )(j) (j)
E[3(X + Y+)]Y )]
E[3(X BIVARIATE DISTRIBUTIONS
(k) (k) 3) 3)
E(YE(Y (l) E(Y
(l) E(Y
2 − 22Y
− 2Y
+ 3X)
+ 3X)

3.2 3.2
Refer
Refer
to Exercise
to Exercise
1.1.1.1.
Suppose
Suppose X and
X and Y are
Y are notnot assumed
assumed to inde-
to be be inde-
pendence,
pendence, findfind

(a) (a)
Var(X
Var(X
+ Y+) Y ) (b) (b)
Var(X
Var(X
 −Y−)Y )
X X
(c) (c) Var(XY
Var(XY ) ) (d) (d)
VarVar
Y Y
(e) (e)
Var(2X
Var(2X
+ 3Y
+ )3Y (f
) ) (f
Var(5X
) Var(5X
− Y−) Y )
(g) (g)
Var[(3X)(4Y
Var[(3X)(4Y
)] )]
Expectation and Varianceof3X
  
3X
Bivariate Distributions 125
(h) (h)
VarVar
7Y 7Y
(i) Var(X 2 + 5Y )
1
(j) Var(3X) (k) Var(3X − 2Y )
7
(l) Var(Y 2 ) (m) Var(Y 2 − 2Y + 3X)
(n) Var[3(X + Y )]

3.3 Refer to Exercise 3.2 and rework, assuming that X and Y are inde-
pendent.
3.4 A box contains 10 green balls and 15 red balls. 5 balls are randomly
picked from the box with replacement. Find the expectation and vari-
ance of the frequency and relative frequency of the occurrence of green
balls.
3.5 Given a pair of continuous random variables having the joint density

24 x y, x > 0, y > 0 and x + y ≤ 1
f (x, y) =
0, elsewhere

Find
(a) E(X) (b) E(Y ) (c) E(XY )
(d) Var(X) (e) Var(Y ) (f ) E(X + Y )
(g) E(X − Y ) (h) Var(X + Y ) (i) Var(X − Y )
assuming independence.
3.6 Given the joint density

2, 0 < x < 1, 0 < y < 1, x + y < 1
f (x, y) =
0, elsewhere

find
(a) E(X) (b) E(Y) (c) E(XY)
(d) Var(X) (e) Var(Y) (f ) E(X − Y )
(g) E(X + Y ) (h) Var(X + Y ) (i) Var(X − Y )
if independence is assumed.
3.7 Refer to Exercise 1.7. Find
(a) E(X) (b) E(Y) (c) E(XY)
(d) Var(X) (e) Var(Y) (f ) E(X − Y )
(g) E(X + Y ) (h) Var(X + Y ) (i) Var(X − Y )

118
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN EXPECTATION AND VARIANCE OF
PROBABILITY
126 THEORY – VOLUME III
Advanced BIVARIATE DISTRIBUTIONS
Topics in Introductory Probability

3.8 Refer to Exercise 1.8. Find


(a) E(X) (b) E(Y) (c) E(XY)
(d) Var(X) (e) Var(Y) (f ) E(X − Y )
(g) E(X + Y ) (h) Var(X + Y ) (i) Var(X − Y )

3.9 Refer to Exercise 1.12. Find


(a) E(X) (b) E(Y) (c) E(XY)
(d) Var(X) (e) Var(Y) (f ) E(X − Y )
(g) E(X + Y ) (h) Var(X + Y ) (i) Var(X − Y )

3.10 Refer to Exercise 1.14. Find


(a) E(X) (b) E(Y) (c) E(XY)
(d) Var(X) (e) Var(Y) (f ) E(X − Y )
(g) E(X + Y ) (h) Var(X + Y ) (i) Var(X − Y )

3.11 Refer to Exercise 1.3. Find


(a) E(X) (b) E(Y) (c) E(XY)
(d) Var(X) (e) Var(Y) (f ) E(X − Y )
(g) E(X + Y ) (h) Var(X + Y ) (i) Var(X − Y )

3.12 Refer to Exercise 1.16. Find


(a) E(X) (b) E(Y) (c) E(XY)
(d) Var(X) (e) Var(Y) (f ) E(X − Y )
(g) E(X + Y ) (h) Var(X + Y ) (i) Var(X − Y )

3.13 Prove Theorem 3.19 for the case X − Y .

3.14 Suppose X and Y are independent Normal variables N (µ, σ 2 ) and


N (µ, 2σ 2 ), respectively. Determine µ if

(a) σ = 3 and P (X + 2Y ≤ 10) = 0.3,


(b) σ = 2 and P (2X + Y < 5) = 0.4,
(c) σ = 3 and P (X + Y ≥ 8) = 0.2.

3.15 Refer to Exercise 3.14. Determine µ and σ if


P (|2X − Y | > 10) = 0.05 and P (Y ≤ 10) = 0.9


Chapter 4

MEASURES OF RELATIONSHIP
OF BIVARIATE DISTRIBUTIONS

4.1 INTRODUCTION

We recall that the expectation and variance of a univariate random variable


are special moments about the origin and the mean respectively, the expec-
tation being the first moment about the origin and the variance, the second
moment about the mean. When we have two or more random variables, mo-
ments are similarly calculated. In Chapter 3, we calculated the mean and
variance of bivariate random variables and also discussed their properties.
However, we shall realise in this chapter that in the multivariate case, other
types of moments can be calculated.

4.2 PRODUCT MOMENT

Let g(X1 , ..., Xn ) be any function of the random variables X1 , ..., Xn whose
density function is f (x1 , ..., xn ). Then the k th moment of g(X1 , X2 , ..., Xn )
is defined by

E[g^k(X1 , X2 , ..., Xn )]

= ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} g^k(x1 , x2 , · · · , xn ) f (x1 , ..., xn ) dx1 dx2 ... dxn
In the case of a bivariate distribution a special type of moment that has been
found very useful is the product moment. We shall define product moment
for both discrete and continuous random variables.
In the case of a bivariate distribution a special type of moment that has been
found very useful is the product moment. We shall define product moment
for both discrete and continuous random127 variables.

4.2.1 Definition of Product Moment about Origin

Definition 4.1 PRODUCT MOMENT ABOUT ORIGIN


(Discrete Case)
Suppose X and Y are two discrete random variables. Then the
product moment of orders r and s about the origin of X and Y
respectively is given by
 

Mrs = E (X r Y s ) = xr y s p(x, y)
x=1 y=1

where r and s are nonnegative integers

If r = s = 1 this formula reduces to that of E(XY ) in Theorem 3.1.

Definition 4.2 PRODUCT MOMENT ABOUT ORIGIN


(Continuous Case)
Suppose X and Y are two continuous random variables. Then the
product moment of orders p and q about the origin of X and Y
respectively, is given by
 ∞  ∞

Mpq = E (X p Y q ) = xp y q f (x, y) dy dx
−∞ −∞
where p and q are nonnegative integers

If p = q = 1 this formula reduces to that of E(XY ) in Theorem 3.4.

The moments described in Definitions 4.1 and 4.2 for the case when
both powers are equal to unity are known as the first product moment about
the origin.

121
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN
PROBABILITY
Measures THEORY – VOLUME
of Relationship III
of Bivariate Measures of Relationship of Bivariate
Distributions 129 Distributions

4.2.2 Definition of Product Moment about Mean

Definition 4.3 PRODUCT MOMENT ABOUT MEAN


(Discrete Case)
Suppose X and Y are two discrete random variables with means µX
and µY respectively. Then the product moment of orders r and s
about the mean of X and Y respectively is defined as

Mrs = E[(X − µX )r (Y − µY )s ]

Measures of Relationship (x − µDistributions
= of Bivariate X ) (y − µY ) p(x, y)
r s 129
x y

4.2.2 Definition of Product Moment about Mean

If r = s = 1, the formula in Definition 4.3 reduces to6


Definition 4.3 PRODUCT MOMENT ABOUT MEAN
M11∗
= (Discrete
E[(X − µX Case)
)(Y − µY )] = Cov(X, Y )
Suppose X and Y are two discrete random variables with means µX
and µY respectively. Then the product moment of orders r and s
about the mean of X and Y respectively is defined as
Definition 4.4 PRODUCT MOMENT ABOUT MEAN
Mrs∗
=(Continuous
E[(X − µX )rCase)
(Y − µY )s ]
 
Suppose X and Y are two continuousr random variables. Then the
= (x − µX ) (y − µY )s p(x, y)
product moment of ordersx y
p and q about the mean of X and Y
respectively is defined as

Mpq = E[(X − µX )p (Y − µY )q ]
∞  ∞ 
If r = s = 1, the formula in Definition 4.3 reduces to6
= (x − µX )p (y − µY )q f (x, y) dy dx
∗ −∞ −∞
M11 = E[(X − µX )(Y − µY )] = Cov(X, Y )

If p = q = 1 this formula reduces to


Definition 4.4∗ PRODUCT MOMENT ABOUT MEAN
M11 = E [(X − µX )(Y Case)
(Continuous − µY )] = Cov(X, Y )
Suppose X and described
The moments Y are twoincontinuous
Definitionsrandom
4.3 andvariables. Then
4.4 for the casethe
when
product moment of orders and
130 powers are equal to unityAdvanced
both p q about the
shall conveniently
Topics in bemean of X and
as theY first
referred to Probability
Introductory
respectively
product momentisabout
definedtheasmean or the first central product moment.
Example
6
4.1
See detailed
Mpq ∗discussion of this in Section
= E[(X − µX )p (Y − µY )q ]
4.3.

Refer to Example 3.8. Calculate
∞  ∞ the first product moment about the mean.
= (x − µX )p (y − µY )q f (x, y) dy dx
−∞ −∞
Solution
The first product moment about the mean, (that is, p = q = 1) for which
we are required to calculate is given by
If p = q = 1 this formula reduces to
 1  2
µ11 = E[(X − ∗ )(Y − µ )] =
X = E [(XY− µX )(Y − µ(x
Mµ11 Y )] )(y − µYY )) f (x, y) dy dx
−=µXCov(X,
0 0

FromThe moments
Example 3.5 described in Definitions 4.3 and 4.4 for the case when
both powers are equal to unity shall conveniently be referred to as the first
product moment about the µmean=or E(X) the first 13
X = central product moment.
6
18
See detailed discussion of this in Section 4.3. 10
µY = E(Y ) =
9
Hence
 1  2    
13 10 xy
= x− y− x + 122 dy dx
2
0 0 18 9 3
 1  2 
Solution
The first product moment about the mean, (that is, p = q = 1) for which
ADVANCED TOPICS
we are required toINcalculate
INTRODUCTORY
is given by
PROBABILITY: A FIRST COURSE IN
PROBABILITY THEORY – VOLUME III  1  2 Measures of Relationship of Bivariate Distributions
µ11 = E[(X − µX )(Y − µY )] = (x − µX )(y − µY ) f (x, y) dy dx
0 0

From Example 3.5


13
µX = E(X) =
18
10
µY = E(Y ) =
9
Hence
 1  2    
13 10 xy
= x− y− x2 + dy dx
0 0 18 9 3
  
1 1 2
= (18x − 13) (27x2 y + 9y 2 x − 30x2 − 10xy) dy dx
486 0 0
 1
1
= (18x − 13)(54x2 + 24x − 60x2 − 20x) dx
486 0
 1
1
= (18x − 13)(2x − 3x2 ) dx
243 0
 1
1
= (75x2 − 54x3 − 26x) dx
243 0
= −0.00617
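The value −0.00617 can also be confirmed by numerical integration. The sketch below (a minimal Python illustration, assuming the joint density f (x, y) = x² + xy/3 on 0 ≤ x ≤ 1, 0 ≤ y ≤ 2 used in this example) evaluates the double integral with a crude midpoint rule.

    def f(x, y):
        return x ** 2 + x * y / 3      # joint density of Example 1.3

    mu_X, mu_Y = 13 / 18, 10 / 9
    nx, ny = 400, 400
    hx, hy = 1 / nx, 2 / ny

    total = 0.0
    for i in range(nx):
        x = (i + 0.5) * hx
        for j in range(ny):
            y = (j + 0.5) * hy
            total += (x - mu_X) * (y - mu_Y) * f(x, y) * hx * hy

    print(total)   # about -0.00617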

4.3 COVARIANCE OF RANDOM VARIABLES

4.3.1 Definition of Covariance

The variance of a random variable is a measure of its variability, and the


covariance of two random variables is a measure of their joint variability, or
their degree of association.
Covariance was first mentioned and defined in Chapter 3 during the

discussion of the properties of variance. In this section, we shall introduce
it formally and discuss its properties.
their degree of association.
Covariance was first mentioned and defined in Chapter 3 during the
discussion of the properties of variance. In this section, we shall introduce
it formally and discuss its properties.

Definition 4.5 COVARIANCE OF X and Y


If (X, Y ) is a bivariate random variable with E(X) = µX and E(Y ) =
µY , then the covariance of X and Y , denoted by Cov(X, Y ), is defined
as the expected value of (X − µX )(Y − µY )

Cov(X, Y ) = E(X − µX )(Y − µY )

Note
M11∗ = Cov(X, Y ) and hence the covariance is sometimes referred to as the

first product moment about the mean or simply the product moment.

Theorem 4.1
Let X and Y be random variables with a joint distribution function,
then
Cov(X, Y ) = E(XY ) − E(X)E(Y )

Proof

Cov(X, Y ) = E{[X − E(X)][Y − E(Y )]}


= E[XY − Y E(X) − XE(Y ) + E(X)E(Y )]
= E(XY ) − E(X)E(Y ) − E(X)E(Y ) + E(X)E(Y )
= E(XY ) − E(X)E(Y )

From Theorem 3.1, the covariance may also be defined as


n 
 m
Cov(X, Y ) = xi yj p(xi , yj ) − E(X)E(Y )
i=1 j=1

124
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN
PROBABILITY
132 THEORY – VOLUME Advanced
III Measures
Topics of Relationship
in Introductory of Bivariate Distributions
Probability

Note

(a) Cov(X, X) = Var(X) ≥ 0

(b) The covariance need not be finite, or even exist. However, it is finite
if both random variables have finite variances.

Example 4.2
For the data in Example 2.3, find the covariance between X and Y .

Solution
It has been shown in Example 3.14,

E(XY ) = 6.24
E(X)E(Y ) = 6.24

Hence

Cov(X, Y ) = E(XY ) − E(X)E(Y )


= 6.24 − 6.24
= 0

Example 4.3
Refer to Example 1.3. Find Cov(X, Y ).

Solution
From Examples 3.7 and 3.8,

13 10 43
E(X) = ; E(Y ) = ; E(XY ) = .
18 9 54
Hence

Cov(X, Y ) = E(XY ) − E(X)E(Y )


  
43 13 10
= −
54 18 9
= −0.00617

which is the
Measures same as that of
of Relationship obtained in Example
Bivariate 4.1.
Distributions 133

4.3.2 Properties of Covariance

Property 1 Symmetry

Theorem 4.2
Suppose X and Y are two random variables, then

Cov(X, Y ) = Cov(Y, X)

Property 2

Theorem 4.3
Suppose X and Y are two random variables and a is a constant, then

Cov(a + X, Y ) = Cov(X, Y )

Proof

Cov(a + X, Y ) = E{[a + X − E(a + X)][Y − E(Y )]}
               = E{[X − E(X)][Y − E(Y )]}
               = Cov(X, Y )

Corollary 4.1
Suppose X and Y are two random variables and a and b are constants. Then

Cov(a + X, b + Y ) = Cov(X, Y )

Property 3

Theorem 4.4
Suppose X, Y and Z are random variables, then

Cov(X, Y + Z) = Cov(X, Y ) + Cov(X, Z)


Proof

Cov(X, Y + Z) = E{[X − E(X)][Y + Z − E(Y + Z)]}


= E{[X − E(X)][Y − E(Y )] + [X − E(X)][Z − E(Z)]}
= E{[X − E(X)][Y − E(Y )]} + E{[X − E(X)][Z − E(Z)]}
= Cov(X, Y ) + Cov(X, Z)

Property 4

Theorem 4.5
Suppose X and Y are two random variables and a and b are
constants, then

Cov(aX, bY ) = a b Cov(X, Y )

Proof

Cov(aX, bY ) = E {[aX − aE(X)] [bY − bE(Y )]}


= E {[a(X − E(X))] [b(Y − E(Y ))]}
= a b E {[X − E(X)] [Y − E(Y )]}
= a b Cov(X, Y )

Combining Corollary 4.1 and Theorem 4.5, we have another corollary.

Corollary 4.2
Suppose X and Y are two random variables and a, b, c, and d are constants,
then
Cov(a + cX, b + dY ) = c d Cov(X, Y )

In general, the same kind of argument gives the important bilinearity property of covariance.

Property 5 Bilinear Property of Covariance

Theorem 4.6
 
Let U = a + Σ_{i=1}^{n} c_i X_i and V = b + Σ_{j=1}^{m} d_j Y_j . Then

Cov(U, V ) = Σ_{i=1}^{n} Σ_{j=1}^{m} c_i d_j Cov(X_i , Y_j )

This theorem has many applications. In particular, since

Cov(X + Y, X + Y ) = Var(X + Y )
                   = Var(X) + Var(Y ) + 2 Cov(X, Y )
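A quick simulation sketch (illustrative only; the particular distribution below is arbitrary) of the identity Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y):

```python
# Sketch: sample-moment check of Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y).
# The distribution used here is an arbitrary illustration.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)   # make X and Y correlated

lhs = np.var(x + y)
rhs = np.var(x) + np.var(y) + 2 * np.cov(x, y, bias=True)[0, 1]
print(lhs, rhs)   # agree up to floating-point error
```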

Property 6 Independence of Random Variables

Theorem 4.7
If X and Y are independent random variables then Cov(X, Y ) = 0

Proof
From Theorem 4.1

Cov(X, Y ) = E(XY ) − E(X)E(Y )

But if X and Y are independent, then from Theorem 3.10

E(XY ) = E(X)E(Y )

The desired result follows immediately.

Note
If Cov(X, Y ) = 0, X and Y need not be independent.
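A standard illustration of this note (a hypothetical construction, not one of the book's examples) takes X symmetric about 0 and Y = X²; then Cov(X, Y) = 0 even though Y is completely determined by X:

```python
# Sketch: zero covariance does not imply independence.
# X is symmetric about 0 and Y = X**2 is completely determined by X,
# yet Cov(X, Y) = E(X^3) - E(X)E(X^2) = 0.
import numpy as np

x_vals = np.array([-1.0, 0.0, 1.0])
p      = np.array([0.25, 0.50, 0.25])     # symmetric p.m.f. of X
y_vals = x_vals**2                        # Y = X^2, fully dependent on X

EX  = np.sum(x_vals * p)
EY  = np.sum(y_vals * p)
EXY = np.sum(x_vals * y_vals * p)
print(EXY - EX * EY)                      # 0.0, although X and Y are dependent
```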

Property 7

Theorem 4.8
Suppose X and Y are random variables having second moments then
Cov(X, Y ) is a well-defined finite number and

[Cov(X, Y )]2 ≤ Var(X)Var(Y )

By replacing X by X − E(X) and Y by Y − E(Y ) in Theorem 3.14,


Theorem 4.8 follows immediately.

Note
The equality holds if and only if Y and X have perfect linear relation.

4.3.3 Uses of Covariance

Cov(X, Y ) is often used as a measure of the linear dependence of X and Y ,


and the reason for this is that Cov(X, Y ) is a single number (rather than
a complicated object such as a joint density function) which contains some
useful information about the joint behaviour of X and Y :
(a) Positive values indicate that X increases as Y increases;
(b) Negative values indicate that X decreases as Y increases or vice versa;
(c) A zero value of the covariance would indicate no linear dependence
between X and Y .

4.3.4 Limitations of Covariance

The covariance is a bad measure of dependence because:

(a) It is not ‘scale-invariant’; that is, the numerical value of the covariance is expressed in the product of the units in which the two variables were measured. This makes it difficult to determine at first glance whether a particular covariance is large.

(b) It is unbounded: it can take any value. This means we cannot determine a maximum value of the covariance beyond which we could say whether it is too large or not.

To deal with these problems, we ‘re-scale’ the covariance to obtain the correlation coefficient of the random variables X and Y .

4.4 CORRELATION COEFFICIENT OF RANDOM VARIABLES

4.4.1 Definition of Correlation Coefficient

The problems associated with covariance can be eliminated by standardising


its value by the standard deviations σX and σY of X and Y , respectively.
This gives the simple coefficient of linear correlation or simply the correlation
coefficient, denoted as ρ(X, Y ) or simply as ρ.

Definition 4.6 CORRELATION COEFFICIENT


If the covariance of X and Y is defined then the correlation coefficient
is defined as
ρ = Cov(X, Y ) / √[Var(X) Var(Y )]

In this definition, 0/0 = 0. Correlation coefficient is also referred to as
Pearson’s correlation coefficient or product-moment correlation coefficient.
The product-moment correlation coefficient ρ can also be expressed as

ρ = ( \overline{XY} − \overline{X} · \overline{Y} ) / (σX σY )          (i)

The reader will be asked in Exercise 4.21 to show that the numerator of
expression (i) above is defined by Cov(X, Y ). This expression may also be
used to establish a very important property of the correlation coefficient (see
Theorem 4.10 in the sequel.)
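As a numerical aside (simulated data, purely illustrative), the defining ratio can be compared with the built-in correlation routine in numpy:

```python
# Sketch: correlation coefficient as re-scaled covariance.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=50_000)
y = 2.0 * x + rng.normal(size=50_000)

rho_def   = np.cov(x, y, bias=True)[0, 1] / np.sqrt(np.var(x) * np.var(y))
rho_numpy = np.corrcoef(x, y)[0, 1]
print(rho_def, rho_numpy)   # essentially identical
```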

4.4.2 Properties of Correlation Coefficient

Property 1 Scale-invariance

Theorem 4.9
For any a, b, c, d ∈ R such that ac ≠ 0,

ρ(aX + b, cY + d) = I_XY ρ(X, Y )

where
I_XY = +1, if ac > 0
       −1, if ac < 0

Proof

ρ(aX + b, cY + d) = Cov(aX + b, cY + d) / √[Var(aX + b) Var(cY + d)]

By Corollary 4.2,

Cov(aX + b, cY + d) = a c Cov(X, Y )

and by Theorem 3.22,

Var(aX + b) = a² Var(X)   and   Var(cY + d) = c² Var(Y )

Hence

ρ(aX + b, cY + d) = a c Cov(X, Y ) / √[a² Var(X) c² Var(Y )]
                  = (a c / |a c|) · ρ(X, Y )
from which the result follows.
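A brief numerical check of Theorem 4.9 (the constants a, b, c, d below are arbitrary choices for illustration):

```python
# Sketch: rho(aX + b, cY + d) = sign(ac) * rho(X, Y).
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=50_000)
y = x + rng.normal(size=50_000)

a, b, c, d = 3.0, -5.0, -0.5, 7.0          # here ac < 0, so the sign flips
rho_xy = np.corrcoef(x, y)[0, 1]
rho_ab = np.corrcoef(a * x + b, c * y + d)[0, 1]
print(rho_ab, np.sign(a * c) * rho_xy)     # equal
```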

Property 2 Bounds of Correlation Coefficient

Theorem 4.10
The correlation coefficient lies between −1 and +1, that is,

−1 ≤ ρ ≤ 1

By dividing both sides of the formula in Theorem 4.8 by Var(X) Var(Y )


and finding the square root, Theorem 4.10 follows immediately.

4.4.3 Interpretation of Correlation Coefficient

For the value of ρ to be meaningful, both variables involved, in theory,


must be distributed normally. Positive values of ρ show that Y tends to
increase with increasing X; negative values of ρ indicate that Y tends to
decrease with increasing values of X.
There are specific interpretations of the correlation coefficient in terms
of the joint behaviour of X and Y.

(a) As a measure of linearity

The correlation coefficient measures the degree of linearity between X and


Y.
When ρ = ±1, there is perfect linearity between X and Y.

(i) If ρ(X, Y ) = 1, then Y is an increasing perfect linear function of X.


(ii) If ρ(X, Y ) = −1, Y is a decreasing perfect linear function of X.

When ρ = 0, there is no linear relationship between X and Y (see


the note below). When the value of ρ is near +1 or −1, that is an
indication of a high degree of linearity and when ρ is very small (near 0) it
indicates a lack of linearity.

Note
When ρ = 0 or near 0 it does not indicate the absence of relationship between
X and Y . It only indicates no linear relationship and it does not preclude
the possibility of some nonlinear relationship.

(b) As a measure of independence

When X and Y are independent then ρ = 0. However, the converse is not


true. That is:
(i) If ρ = 0, it does not mean that X and Y are independent.
(ii) If ρ ≠ 0, it does not mean that X and Y are dependent; the association may be spurious.
In general, correlation and dependence are not equivalent. The existence of a statistical association, even if it is very strong, does not establish that an increase in X causes an increase in Y, or that an increase in Y causes an increase in X. A fundamental weakness of observational studies is that they can demonstrate association but not causation.

Example 4.4
For the data in Example 2.3, calculate the correlation coefficient of X and Y and comment on the result.

Solution
From Example 4.2

Cov(X, Y ) = 0

From Example 3.21

σX = √Var(X) = √8.64 = 2.9394
σY = √Var(Y ) = √1.89 = 1.3748

Hence

ρ = 0 / [(1.3748)(2.9394)] = 0

There is no linear correlation between X and Y .

Example 4.5
Refer to Example 1.3. Calculate the correlation coefficient.

Solution
From Examples 3.7 and 3.8,

E(XY ) = 43/54 = 0.79630
E(X) = 13/18 = 0.72222
E(Y ) = 10/9 = 1.11111

From Example 3.19

Var(X) = 0.04506
Var(Y ) = 0.32099

Hence

ρ = [E(XY ) − E(X)E(Y )] / √[Var(X)Var(Y )]
  = [0.79630 − 0.72222(1.11111)] / √[(0.04506)(0.32099)]
  = −0.05127

4.4.4 Variance of Sum of Random Variables with Common Variance and Common Correlation Coefficient

We now establish the variance of a sum of random variables with a common correlation coefficient.

Theorem 4.11
Suppose X1 , X2 , ..., Xn are random variables with common variance σ² and common correlation Corr(Xi , Xj ) = ρ, i ≠ j (i, j = 1, 2, ..., n). Let

Sn = X1 + X2 + ... + Xn

Then

(a) Var(Sn ) = nσ²[1 + (n − 1)ρ]

(b) Var(Sn /n) = (σ²/n)[1 + (n − 1) ρ]

where Sn /n = X̄n = sample mean

Proof
(a) From Corollary 3.9,

Var(Sn ) = Σ_{i=1}^{n} Var(Xi ) + 2 Σ_{i<j} Cov(Xi , Xj )
         = Σ_{i=1}^{n} Var(Xi ) + 2 Σ_{i<j} √[Var(Xi )Var(Xj )] · [ Cov(Xi , Xj ) / √(Var(Xi )Var(Xj )) ]
         = Σ_{i=1}^{n} σ² + 2 Σ_{i<j} √(σ² σ²) ρ
         = nσ² + 2 Σ_{i<j} ρ σ²
         = nσ² + 2 ρ σ² (n² − n)/2
         = nσ² + ρ σ² (n² − n)
         = nσ² + ρ σ² n(n − 1)
         = nσ²[1 + (n − 1) ρ]

(b) Var(Sn /n) = (1/n²) Var(Sn )
             = (σ²/n)[1 + (n − 1) ρ]

When the X’s are independent, ρ = 0 and

Var(Sn /n) = Var(X̄) = σ²/n

as will be seen in Theorem 7.2.
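Theorem 4.11 can be illustrated by simulation. The sketch below (assuming, for convenience only, a multivariate normal distribution with an equicorrelation covariance matrix) compares the sample variance of Sn with nσ²[1 + (n − 1)ρ]:

```python
# Sketch: Var(S_n) = n*sigma^2*[1 + (n - 1)*rho] for equicorrelated variables.
import numpy as np

n, sigma2, rho = 5, 2.0, 0.3
cov = sigma2 * (rho * np.ones((n, n)) + (1 - rho) * np.eye(n))  # common variance and correlation

rng = np.random.default_rng(3)
xs = rng.multivariate_normal(np.zeros(n), cov, size=200_000)
s_n = xs.sum(axis=1)

print(np.var(s_n))                          # simulated Var(S_n)
print(n * sigma2 * (1 + (n - 1) * rho))     # theoretical value: 5*2*(1 + 4*0.3) = 22.0
```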

4.5 CONDITIONAL EXPECTATIONS

We encountered in my book “Introductory Probability Theory” (Nsowah-Nuamah, 2017) the concept of conditional probability. Again, in Chapter 1 of this book, we dealt with conditional distributions. In this section, we discuss conditional expectation⁷, which is simply the expected value of a variable, given that a set of prior conditions has taken place. As we shall observe later, it is obtained simply by summation or integration with respect to conditional distributions. A precise definition will be given and their properties, some of which are analogous to properties obtained in Chapter 3 for (unconditional) expectations, will be discussed.

⁷ It is sometimes referred to as conditional expected value or conditional mean of a random variable.
 
4.5.1 Definition of Conditional Expectation

Consider the random variable (Y |X), which may be interpreted as the random variable Y given the random variable X. Let E(Y |X) be the conditional expectation of Y for values of X. It is important to remember that E(Y |X) itself is a random variable. Let E(Y |X = x), or simply E(Y |x), represent the expectation of Y conditioned on the event {X = x}. Since E(Y |x) is a function of x it is a random variable though, strictly speaking, it is one value of the random variable E(Y |X).

Definition 4.7 CONDITIONAL EXPECTATION (Discrete Case)

If (X, Y ) is a two-dimensional discrete random variable, we define the conditional expectation of Y for given X = x as

E(Y |X = x) = Σ_y y h(y|x)

where h(y|x) = p_{Y |X}(y|x) denotes the conditional p.m.f. of Y given X

Definition 4.8 CONDITIONAL EXPECTATION (Continuous Case)

If (X, Y ) is a two-dimensional continuous random variable, we define the conditional expectation of Y for given X = x as

E(Y |X = x) = ∫_{−∞}^{∞} y h(y|x) dy

where h(y|x) = f_{Y |X}(y|x) denotes the conditional p.d.f. of Y given X


From the foregoing two definitions we notice that the definition of condi-
tional expectation is almost the same as the definition of expectation, except
that instead of a probability (marginal) distribution it uses a conditional
probability (marginal) distribution.

Example 4.6
Refer to Example 1.3. Determine: (a) E(Y |x), (b) E(X|y)

Solution
(a) E(Y |x) = ∫_{−∞}^{∞} y f (y|x) dy

From Example 1.3,

f (y|x) = (3x + y)/(6x + 2),   0 < y < 2, 0 < x < 1

Hence

E(Y |x) = ∫_0^2 y (3x + y)/(6x + 2) dy
        = ∫_0^2 (3xy + y²)/(6x + 2) dy
        = [1/(3x + 1)] (3x + 4/3)
        = (9x + 4)/(3(3x + 1))

(b) E(X|y) = ∫_{−∞}^{∞} x f (x|y) dx

From Example 1.13,

f (x|y) = (6x² + 2xy)/(2 + y),   0 < x < 1, 0 < y < 2

Hence

E(X|y) = ∫_0^1 x (6x² + 2xy)/(2 + y) dx
       = ∫_0^1 (6x³ + 2x²y)/(2 + y) dx
       = [1/(2 + y)] [6x⁴/4 + 2x³y/3]_0^1
       = [1/(2 + y)] (6/4 + 2y/3)
       = (18 + 8y)/(12(2 + y))
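The two integrations in Example 4.6 can be reproduced symbolically. The following sketch (using the sympy library, which is not part of the text) is one way to check the results:

```python
# Sketch: symbolic computation of the conditional expectations in Example 4.6.
import sympy as sp

x, y = sp.symbols('x y', positive=True)

f_y_given_x = (3*x + y) / (6*x + 2)          # conditional p.d.f. of Y given X = x
E_Y_given_x = sp.integrate(y * f_y_given_x, (y, 0, 2))
print(sp.simplify(E_Y_given_x))              # equal to (9x + 4)/(3(3x + 1))

f_x_given_y = (6*x**2 + 2*x*y) / (2 + y)     # conditional p.d.f. of X given Y = y
E_X_given_y = sp.integrate(x * f_x_given_y, (x, 0, 1))
print(sp.simplify(E_X_given_y))              # equal to (18 + 8y)/(12(2 + y))
```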

4.5.2 Properties of Conditional Expectation

Property 1

Theorem 4.12
Suppose X and Y are random variables. If Y has a finite expectation and Y ≥ 0, then

E(Y |X) ≥ 0

Property 2

Theorem 4.13
Suppose X and Y1 , Y2 , ..., Yn are random variables having finite expectations and the ai are constants, 1 ≤ i ≤ n; then

E( Σ_{i=1}^{n} ai Yi | X ) = Σ_{i=1}^{n} ai E(Yi |X)

In particular, if ai = a then

E( Σ_{i=1}^{n} ai Yi | X ) = a Σ_{i=1}^{n} E(Yi |X)


Property 3

Theorem 4.14
Suppose Y1 and Y2 are random variables with finite expectations. If
Y1 ≤ Y2 then
E(Y1 |X) ≤ E(Y2 |X)

Property 4

Theorem 4.15
Suppose X and Y are two independent random variables. If Y has
a finite expectation, then

E(Y |X) = E(Y )

Property 5

Theorem 4.16
Suppose X and Y are two random variables. If Y has a moment of
order r ≥ 1, then
|E(Y |X)|^r ≤ [E(|Y | | X)]^r ≤ E(|Y |^r | X)

As pointed out earlier, E(Y |X) is a random variable and hence we may find its expectation.

Property 6

Theorem 4.17
Suppose that X and Y are independent random variables. Then the
expectation of conditional expectation is given by

E[E(Y |X)] = E(Y )

Proof
We will prove this for the continuous case. The discrete case is proved similarly on replacing integrals by summations.
By definition,

E(Y |x) = ∫_{−∞}^{+∞} y h(y|x) dy
        = ∫_{−∞}^{∞} y [f (x, y)/g(x)] dy

where f (x, y) is the joint probability density function of (X, Y ) and g(x) is the marginal probability density function of X.
Multiply both sides of the equation by g(x):

g(x)E[Y |x] = [ ∫_{−∞}^{∞} y f (x, y)/g(x) dy ] g(x)

Taking the integral of both sides with respect to x gives an expectation of E(Y |x), namely, E[E(Y |X)]. That is,

E[E(Y |X)] = ∫_{−∞}^{∞} E(Y |x) g(x) dx
           = ∫_{−∞}^{∞} [ ∫_{−∞}^{∞} y f (x, y)/g(x) dy ] g(x) dx
If all the expectations exist, it is permissible to write the above iterated
integral with the order of integration reversed. Thus
E[E(Y |X)] = ∫_{−∞}^{∞} y [ ∫_{−∞}^{∞} f (x, y) dx ] dy
           = ∫_{−∞}^{∞} y h(y) dy
           = E(Y )

Theorem 4.17 gives what might be called the law of total expectation: the expectation of a random variable Y can be calculated by weighting the conditional expectations appropriately and summing or integrating.
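A small discrete illustration of the law of total expectation (the joint p.m.f. below is made up for the purpose):

```python
# Sketch: E[E(Y|X)] = E(Y) on a small discrete joint distribution.
import numpy as np

x_vals = np.array([0.0, 1.0])
y_vals = np.array([1.0, 2.0, 3.0])
p = np.array([[0.10, 0.20, 0.10],          # p[i, j] = P(X = x_i, Y = y_j)
              [0.15, 0.25, 0.20]])

p_x = p.sum(axis=1)                            # marginal p.m.f. of X
E_Y_given_x = (p * y_vals).sum(axis=1) / p_x   # E(Y | X = x_i) for each x_i

print(np.sum(E_Y_given_x * p_x))           # E[E(Y|X)]
print(np.sum(p * y_vals))                  # E(Y); the two agree
```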
conditional expectations appropriately and summing or integrating.
Property 7

Theorem 4.18
Let Ŷ = E(Y |x) be the conditional expectation of Y given X. Then

E[(Y − Ŷ )²] ≤ E[(Y − π)²]

where π = π(x) is any other function of X

Proof

E{(Y − π)²} = E[{(Y − Ŷ ) + (Ŷ − π)}²]
            = E[(Y − Ŷ )²] + 2E[(Y − Ŷ )(Ŷ − π)] + E[(Ŷ − π)²]
            = E[(Y − Ŷ )²] + E[(Ŷ − π)²]                    (i)

because we can show that the second term is zero. Thus,

E(Y − Ŷ )(Ŷ − π) = ∫_x ∫_y (y − ŷ)(ŷ − π) f (x, y) ∂y ∂x
                 = ∫_x [ ∫_y (y − ŷ) f (y|x) ∂y ] (ŷ − π) f (x) ∂x

since f (x, y) can be expressed as the product of the conditional density function of Y given X and the marginal density function of X, that is, f (x, y) has been factored as f (x, y) = f (y|x)f (x). And also
function of Y given X and the marginal density function of Y , that is,
f (x, y) has been factored as f (x, y) = f (y|x)f (x). And also
  
∫_y (y − ŷ) f (y|x) ∂y = ∫_y y f (y|x) ∂y − ∫_y ŷ f (y|x) ∂y
                       = E(Y |x) − E(Y |x)
                       = 0

Now, from (i),

E{(Y − π)²} = E[(Y − Ŷ )²] + E[(Ŷ − π)²]
            ≥ E[(Y − Ŷ )²]

from which the result follows.
Theorem 4.19
If the joint distribution of Y and X is a Normal distribution, then the function E(Y |x) may be expressed as

E(Y |x) = α + βx

Theorem 4.19 indicates that the conditional expectation of Y given X is a linear function of X. As we shall discuss in the sequel, the function is described as a linear regression function.

Property 8
Theorem 4.20
Suppose that X and Y are independent random variables. Then the variance of the conditional expectation of Y is given by

Var[E(Y |X)] = Var(Y ) − E[Var(Y |X)]

Proof
Just as E(Y |X), Var(Y |X) is also a random variable and has expectation E[Var(Y |X)]. Now,

E[Var(Y |X)] = E{E(Y²|X) − [E(Y |X)]²}
             = E[E(Y²|X)] − E{[E(Y |X)]²}
             = E[E(Y²|X)] − {E[E(Y |X)]}² + {E[E(Y |X)]}² − E{[E(Y |X)]²}
             = E(Y²) − [E(Y )]² − Var[E(Y |X)]          (from Theorem 4.17)
             = Var(Y ) − Var[E(Y |X)]

from which the result follows.

4.6 CONDITIONAL VARIANCES

In this section we discuss conditional variances and prove a useful formula


relating conditional and unconditional variances.

4.6.1 Definition of Conditional Variance

Definition 4.9 CONDITIONAL VARIANCE


If (X, Y ) is a two-dimensional discrete random variable, we define
the conditional variance of Y given X as

Var(Y |X) = E(Y 2 |X) − [E(Y |X)]2

provided that E(Y |X) exists

Thus for the discrete case, if pX (x) > 0 and if Y has a second moment, then
the conditional variance of Y given X = x is given by

Var(Y |X = x) = Σ_y y² p_{Y |X}(y|x) − [ Σ_y y p_{Y |X}(y|x) ]²

The conditional variance of Y given X = x for the continuous case is defined


similarly by replacing the summations with integrals.
Another way of defining the conditional variance is given in Definition 4.9a.


Definition 4.9a CONDITIONAL VARIANCE


If (X, Y ) is a two-dimensional discrete random variable, then the
conditional variance of Y given X is


Var(Y |X) = E{[Y − E(Y |X)]² | X}

provided that E(Y |X) exists

4.6.2 Properties of Conditional Variance

Property 1

Theorem 4.21
Suppose that X and Y are independent random variables. Then the
expectation of the conditional variance is

E[Var(Y |X)] = E[E(Y 2 |X)] − E{[E(Y |X)]2 }

Property 2

Theorem 4.22
Suppose that X and Y are independent random variables. Then the
expectation of conditional variance is given by

E[Var(Y |X)] = Var(Y ) − Var[E(Y |X)]

Proof
This follows from the proof of Theorem 4.20.

We shall notice in Theorem 4.23 in the sequel, derived from Theorem 4.20, that the unconditional variance Var(Y ) is not just the expected value of the conditional variance but the sum of the expected conditional variance and the variance of the conditional expectation.


Theorem 4.23
Suppose that X and Y are independent random variables. Then

Var(Y ) = Var[E(Y |X)] + E[Var(Y |X)]

Proof
This follows from the proof of Theorem 4.20.

Aliter
Var(Y ) = E[Y − E(Y )]2
= E{[Y − E(Y |X)] + [E(Y |X) − E(Y )]}2
= E[Y − E(Y |X)]2 + E[E(Y |X) − E(Y )]2
since the cross-product term vanishes. Conditioning on X the two terms of
the expression on the right side, we obtain for the first term:
E{E[Y − E(Y |X)]2 |X} = E[Var(Y |X)] (by Definition 4.9a)
and the second term:
E[E(Y |X) − E(Y )]² = Var[E(Y |X)]
Hence
Var(Y ) = E[Var(Y |X)] + Var[E(Y |X)]
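The decomposition Var(Y) = Var[E(Y|X)] + E[Var(Y|X)] can be verified on any small joint table; the sketch below (same kind of made-up p.m.f. as before) does so numerically:

```python
# Sketch: Var(Y) = Var[E(Y|X)] + E[Var(Y|X)] on a discrete joint distribution.
import numpy as np

x_vals = np.array([0.0, 1.0])
y_vals = np.array([1.0, 2.0, 3.0])
p = np.array([[0.10, 0.20, 0.10],
              [0.15, 0.25, 0.20]])

p_x = p.sum(axis=1)
cond = p / p_x[:, None]                          # conditional p.m.f. of Y given X = x_i
E_Y_given_x  = (cond * y_vals).sum(axis=1)
E_Y2_given_x = (cond * y_vals**2).sum(axis=1)
Var_Y_given_x = E_Y2_given_x - E_Y_given_x**2

EY   = np.sum(p * y_vals)
VarY = np.sum(p * y_vals**2) - EY**2

lhs = VarY
rhs = np.sum(p_x * (E_Y_given_x - EY)**2) + np.sum(p_x * Var_Y_given_x)
print(lhs, rhs)                                  # equal
```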

4.7 REGRESSION CURVES

The graph of E(Y |x) as a function of x, is called the regression curve


(of the mean) of Y on X. Similarly, the graph of E(X|y) as a function of y
is known as the regression curve (of the mean) of X on Y .

4.7.1 Definition of Linear Regression

The linear regression function was encountered in Theorem 4.19 as the expected value of Y given X. In this section, we shall discuss in detail its special characteristics.

Definition 4.10 LINEAR REGRESSION


The two-dimensional random variable (X, Y ) is said to have a linear
regression of Y on X if

E(Y |x) = α + βx

where α and β are constants

That is, the regression curve of Y on X is a straight line, called the re-
gression line of Y on X. The constants α and β are the parameters of the
linear regression equation. The constant α is called the intercept, which is
the point where the regression line cuts the y-axis. The constant β is called
the slope of the regression equation or the regression coefficient of Y on X
which measures a change in Y per unit change in X.
143
The constants α and β are unknown parameters and have to be esti-
mated. We discuss here two estimation methods, namely,
ADVANCED TOPICS IN INTRODUCTORY
E(Y |x) = α + βx
PROBABILITY: A FIRST COURSE IN
where α and
PROBABILITY β are
THEORY constants
– VOLUME III Measures of Relationship of Bivariate Distributions

That is, the regression curve of Y on X is a straight line, called the re-
gression line of Y on X. The constants α and β are the parameters of the
linear regression equation. The constant α is called the intercept, which is
the point where the regression line cuts the y-axis. The constant β is called
the slope of the regression equation or the regression coefficient of Y on X
which measures a change in Y per unit change in X.
The constants α and β are unknown parameters and have to be esti-
mated. We discuss here two estimation methods, namely,

(a) method of moments;

(b) method of least squares.

4.7.2 Method of Moments for Estimating Linear Regression


Function

The method of moments for estimating the linear regression function at-
tempts to find expressions for α and β that are in terms of the first-and
second-order moments of the joint distribution, namely, E(X), E(Y ), Var(X)
and Cov(X, Y ).

Theorem 4.24
If (X, Y ) has linear regression of Y on X, then

α = E(Y ) − β E(X)
β = Cov(X, Y ) / Var(X)


Proof
Proof for Estimating α:
From Definition 4.10, the regression function is given by

E(Y |x) = α + βx (i)

If we multiply (i) by f (x) and integrate it with respect to x, that is,


 
∫_x E(Y |x) f (x) dx = ∫_x (α + βx) f (x) dx
we obtain
E(Y ) = α + βE(X) (ii)
so that
α = E(Y ) − βE(X) (iii)

Proof for Estimating β:


Let us substitute (iii) into (i) to obtain

E(Y |x) = E(Y ) + β(X − E(X)) (iv)

Multiplying (i) by x and f (x) and then integrating with respect to x, we


have
 
∫_x x E(Y |x) f (x) dx = ∫_x x(α + βx) f (x) dx
E(XY ) = αE(X) + βE(X 2 ) (v)

Multiplying (ii) by E(X), we get

E(X)E(Y ) = αE(X) + β[E(X)]2 (vi)

Subtracting (vi) from (v) gives

E(XY ) − E(X)E(Y ) = β{E(X²) − [E(X)]²}          (vii)

so that

β = [E(XY ) − E(X)E(Y )] / [E(X²) − [E(X)]²]
  = Cov(X, Y ) / Var(X)          (from Theorem 4.1)

or

β = E{[X − E(X)][Y − E(Y )]} / E{[X − E(X)]²}
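As an illustration (simulated data, not from the text), the moment expressions α = E(Y) − βE(X) and β = Cov(X, Y)/Var(X) can be compared with an ordinary least-squares straight-line fit:

```python
# Sketch: alpha = E(Y) - beta*E(X), beta = Cov(X, Y)/Var(X), compared with a least-squares fit.
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0, 1, size=100_000)
y = 0.9 - 0.14 * x + rng.normal(scale=0.1, size=100_000)   # arbitrary illustrative data

beta  = np.cov(x, y, bias=True)[0, 1] / np.var(x)    # Theorem 4.24
alpha = y.mean() - beta * x.mean()

slope, intercept = np.polyfit(x, y, deg=1)           # least-squares straight line
print(alpha, beta)
print(intercept, slope)                               # essentially the same values
```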


4.7.3 Least Squares Method of Estimating Linear Regression


Function

Let the values (x1 , y1 ), (x2 , y2 ), · · · (xn , yn ), be plotted as points in the x, y


plane. Then the problem of estimating a linear regression function can be
treated as a problem of fitting a straight line to this set of points. The best
known method is the least squares approach.
The least squares method states that the sum of the squares of the
difference between the observed value of Y and the corresponding fitted
curve value of Y, which we denote by Ŷ , must be minimum. The values of
the parameters obtained by this minimisation determine what is known as
the best fitting curve in the sense of least squares.

Theorem 4.25
If (X, Y ) has linear regression of Y on X, then

α = µY − β µX
β = σXY /σX²

where µY = E(Y ), µX = E(X), σX² = Var(X) and σXY = Cov(X, Y )

Proof
Let us find the best function of the form h(x) = α + β x. This merely requires optimising over the two parameters α and β. Now, we can write⁸

Var(Y − α − β X) = E[(Y − α − β X)²] − [E(Y − α − β X)]²

E[(Y − α − β X)²] = Var(Y − α − β X) + [E(Y − α − β X)]²
                  = (σY² + β² σX² − 2β σXY ) + [E(Y − α − β X)]²

The first term of the expression on the right hand side of the equation does not depend on α, so α can be chosen so as to minimize the second term. Recall that to minimize a function we set the derivative of the function to zero. Thus

∂/∂α [E(Y − α − β X)]² = −2[E(Y − α − β X)] = 0

giving

µY − α − β µX = 0
α = µY − β µX

Now to minimize the first term we set the derivative with respect to β equal to zero, to obtain

∂/∂β (σY² + β² σX² − 2 β σXY ) = 0

giving

2 β σX² − 2 σXY = 0
β = σXY /σX²

⁸ Recollect that Var(X) = E(X²) − [E(X)]² (by Theorem 3.22; and also Var(α) = 0, a property of variance).

Corollary 4.3
If β is the coefficient of the linear regression function of Y on X, then

β = ρ σY /σX

Proof
It is sufficient to show that

σXY /σX² = ρ σY /σX

Thus

β = σXY /σX²


  = [σXY /(σX σX )] · (σY /σY )
  = [σXY /(σX σY )] · (σY /σX )
  = ρ σY /σX
Similarly, (X, Y ) is said to have linear regression of X on Y if
E(X|y) = τ + θ y (i)
with τ and θ as constants. That is, the regression curve of X on Y is a
straight line, which is called the regression line of X on Y . The number θ
is called the regression coefficient of X on Y and is defined as
θ = σXY /σY² = Cov(X, Y )/Var(Y )          (ii)

Theorem 4.26
Let us define Var(Ŷ ) = Var(Y − β X) as the mean squared predic-
tion error. Then
Var(Ŷ ) = σY2 (1 − ρ2 )

Proof
Var(Y − β X) = Var(Y ) + Var(β X) − 2 Cov(Y, β X)
             = σY² + β² σX² − 2 β σXY
             = σY² + (σXY²/σX⁴) σX² − 2 (σXY /σX²) σXY
             = σY² − σXY²/σX²
             = σY² − [σXY /(σX σY )]² σY²
             = σY² − ρ² σY²
             = σY² (1 − ρ²)

Example 4.7
Refer to Example 1.3.

(a) Find the linear regression of Y on X;

(b) Calculate the mean squared prediction error.

Solution
(a) From Example 3.19
σX² = Var(X) = 0.04506
σY² = Var(Y ) = 0.32099
σXY = Cov(X, Y ) = −0.00617

Hence

β = σXY /σX² = −0.00617/0.04506 = −0.13693
From Example 3.8,
µX = E(X) = 13/18 = 0.72222
µY = E(Y ) = 10/9 = 1.11111
Hence

α = µY − β µX
= 1.11111 − (−0.13693)(0.72222) = 1.21001

Finally,
E(Y |x) = 1.21001 − 0.13693 x
(b) Calculation of the mean squared prediction error

Var(Ŷ ) = σY²(1 − ρ²)
        = 0.32099[1 − (−0.05127)²] = 0.32015

Since the value for β is negative, it means that as X increases, Y decreases. That is, a unit increase in X corresponds to a decrease of about 0.14 in Y (where the 0.14 is in the same unit of measurement as Y ).

Theorem 4.27
Let (X, Y ) be a two-dimensional random variable with

E(X) = µX , E(Y ) = µY , V (X) = σX² and V (Y ) = σY²

If the regression of Y on X is linear, then


E(Y |x) = µY + ρ (σY /σX )(X − µX )
where ρ is the correlation coefficient between Y and X

Proof
Substituting the expressions for α and β from Theorem 4.25 into Defi-
nition 4.10, we get
E(Y |x) = (µY − β µX ) + (σXY /σX²) x
        = µY − (σXY /σX²) µX + (σXY /σX²) x
        = µY + (σXY /σX²)(X − µX )
        = µY + ρ (σY /σX )(X − µX )          (from Corollary 4.3)
Similarly, it can be proved that if the regression of X on Y is linear, then
E(X|y) = µX + ρ (σX /σY )(Y − µY )
It follows that if a regression is linear and ρ = 0, then E(Y |x) does not
depend on X and E(X|y) does not depend on Y.
149
Example 4.8
Refer to Example 1.3. Write the regression equation of Y on X.

Solution
From Example 3.8
E(X) = 13/18 = 0.72222
E(Y ) = 10/9 = 1.11111 (to 5 decimal places)

From Example 3.19,

σX = √0.04506 = 0.21227
σY = √0.32099 = 0.56656

From Example 4.5,


ρ = −0.05127.
Hence
E(Y |x) = 1.11111 − 0.05127 (0.56656/0.21227)(x − 0.72222)
        = 1.11111 − 0.13684(x − 0.72222)
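A short arithmetic check (plugging in the moments quoted from Examples 3.19 and 4.5) confirms that the slope ρ σY/σX used here agrees with β = Cov(X, Y)/Var(X) from Example 4.7, as Corollary 4.3 requires:

```python
# Sketch: arithmetic check of the regression slope in Examples 4.7 and 4.8.
import math

var_x, var_y = 0.04506, 0.32099
cov_xy = -0.00617
rho = cov_xy / math.sqrt(var_x * var_y)

beta_from_cov = cov_xy / var_x                              # Theorem 4.24
beta_from_rho = rho * math.sqrt(var_y) / math.sqrt(var_x)   # Corollary 4.3

print(rho)            # about -0.0513
print(beta_from_cov)  # about -0.137
print(beta_from_rho)  # the same value, as Corollary 4.3 requires
```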


Theorem 4.28
The product of the regression coefficients β and θ in the linear
regressions E(Y |x) and E(X|y), respectively, is equal to the square
of the correlation coefficient of X and Y :

ρ2 = βθ

Proof
In Corollary 4.3 and (ii) above,
β = ρ σY /σX

and

θ = ρ σX /σY

Hence

βθ = ρ (σY /σX ) · ρ (σX /σY )
from which the result follows.

Note
(a) The of
Measures sign of the regression
Relationship coefficient
of Bivariate 161
is determined by ρ, since σX >
Distributions 0
and σY > 0.
(b) The two linear regression equations, E(Y |x) and E(X|y), have the
same sign of slope.

(c) The regression line passes through the point [E(X), E(Y )],
which is the expected value of the joint distribution.

(d) The point of the intersection of the linear regression curves E(Y |x)
and E(X|y) is [E(X), E(Y )].

This chapter concludes discussions on bivariate distributions which we


started in Chapter 1. By induction we can easily extend most of the concepts
discussed so far to multivariate distributions in general.

EXERCISES

4.1 For the data of Exercise 1.1, find the covariance between X and Y.

4.2 For the data of Exercise 1.1, find the correlation coefficient of X and
Y.

4.3 If Var(W ) = 5, Var(X) = 3, Var(Y ) = 4 and Var(Z) = 1, find

(a) Cov(X, W ), (b) Cov(X, Y ), (c) Cov(Y, W ),



(d) Cov(X, Z), (e) Cov(Y, Z), (f) Cov(W, Z)

given that the correlation coefficient between any pair is 0.6.

4.4 Refer to Exercise 4.3. If a = 4, b = 2, c = 2 and d = 3, find

(a) Cov(c + X, Y ), (b) Cov(c+X, a+Y),


(c) Cov(dX + bY ), (d) Cov(a + dY, b + cZ),
(e) Cov(aX, bZ + cW ), (f ) Cov(aW + bX, cY + dZ),
(g) Cov(c + aW, c + aW ), (h) Cov(bY + cW, X)

4.5 A die is rolled twice. Let X be the sum of the outcomes, and Y , the
first outcome minus the second. Compute Cov(X, Y )
4.6 Prove Theorem 4.5.
4.7 Prove Theorem 4.7.

4.8 Prove Theorem 4.8.

4.9 Refer to Exercise 1.7, find the correlation coefficient of X and Y.

4.10 Refer to Exercise 1.10, find the correlation coefficient of X and Y.

4.11 Suppose the random variable X and Y have the joint p.d.f.

f (x, y) = x + y,   0 < x < 1, 0 < y < 1
           0,       elsewhere

Compute the correlation coefficient of X and Y.

4.12 Suppose the joint p.d.f. of X and Y is given by



f (x, y) = (1/y) e^{−y},   0 < x < y, 0 < y < ∞
           0,              elsewhere

Compute (a) E(X|y), (b) E(X²|y)

4.13 Refer to Exercise 1.7.

(a) Use the least squares method to obtain the linear regression
equation of (i) Y on X; (ii) X on Y.
(b) (i) Find the product of the regression coefficients in a(i) and
a(ii);
(ii) Take the square root of your results in b(i);
(iii) Compare the result in b(ii) with that in Exercise 4.9.

4.14 Refer to Exercise 1.10. Find


(a) Var(Y |X) (b) Var(X|Y ).

4.15 Refer to Exercise 1.7. Find


(a) Var(Y |X) (b) Var(X|Y ).

4.16 Refer to Exercise 1.11. Find


(a) Var(Y |X) (b) Var(X|Y ).
4.17 Verify Theorem 4.16 using Table 1.1.
4.18 Refer to Exercise 4.10.

(a) Find the regression of X on Y .


(b) Calculate the mean squared prediction error.

4.19 Refer to Exercise 4.16.

(a) Find the regression of X on Y.


(b) Calculate the mean squared prediction error.

4.20 Show that the point of intersection of Y = E(Y |x) and X = E(X|y)
is (µX , µY )

4.21 Show that ρ(X, Y ) can be defined in terms of

Cov(X, Y ) = E(XY ) − E(X)E(Y )

Hence show that the correlation coefficient is scale invariant.

4.22 Prove Theorem 4.10 for the discrete case.


PART 2
STATISTICAL INEQUALITIES, LIMIT LAWS
AND SAMPLING DISTRIBUTIONS

Life is the art of drawing sufficient conclusions from insufficient premises


SAMUEL BUTLER

Chapter 5

STATISTICAL INEQUALITIES AND LIMIT LAWS

5.1 INTRODUCTION

5.1.1 Statistical Inequalities

There are two well-known inequalities due to two Russian mathematicians,
Andrey Markov and his teacher, Pafnuty Chebyshev⁹, which play an
important role in statistical work.
If we know the probability distribution of a random variable X, we may
compute E(X) and Var(X), if these exist. However, the converse is not true.
That is, from a knowledge of E(X) and Var(X), we cannot reconstruct the
probability distribution of X. This is because to describe a probability
distribution completely, we need to know the probability function of the
random variable. However, Markov's and Chebyshev's inequalities enable
us to derive lower (or upper) bounds on such probabilities when only the
mean, or both the mean and the variance, of the distribution are known.

⁹ These names appear in other publications as Markoff and Tchebysheff respectively.
In this book we have used the spellings Markov and Chebyshev, the official American
transliterations from Russian, for consistency.

5.1.2 Limit Laws

Perhaps the two most important laws of probability theory and statistics
are the "Law of Large Numbers" and the "Central Limit Theorem". Both
are related to the notion of the "sum of random variables". Recall from
Chapter 3 that if Xi (i = 1, 2, ..., n) are random variables, then their sum
Sn = X1 + X2 + · · · + Xn is also a random variable with expectation E(Sn),
which is the sum of the expectations of the individual random variables if n is
finite (see Corollary 3.3). Furthermore, if the Xi's are also independent, then
the additivity property of variance also holds for the variance of Sn ; that is,
the variance of the sum, Var(Sn ), is the sum of the individual variances (see
Corollary 3.9). Finally, it follows that if the Xi ’s have the same distribution,
then they possess a common mean µ (Theorem 3.3), a common variance σ 2
(Theorem 3.16), and E(Sn ) = nµ (Corollary 3.4). Theorem 3.19 states that
if Xi ’s are independent and identically distributed, then Var(Sn ) = nσ 2 .
Apart from a few exceptional cases, such as the special probability dis-
tributions discussed in the previous chapters, distributions involving sums
may be very complicated, and computing their probabilities by direct evalua-
tion is extremely difficult. Fortunately, under the 'Law of Large Numbers'
and the 'Central Limit Theorem' (see Sections 5.4 and 5.5, respectively),
the evaluation of probabilities involving sums of random variables turns out
to be fairly simple.
In the subsequent sections we shall discuss statistical inequalities and
limit theorems in detail.

5.2 MARKOV’S INEQUALITY

Markov’s inequality relates a probability to an expectation and provides an


upper bound for the probability that a non-negative function of a random
variable is greater than or equal to some positive constant.

Theorem 5.1 MARKOV’S INEQUALITY


Let X be a nonnegative random variable having a finite expectation
E(X). Let ε be any positive number; then the probability that X is
no less than ε is no greater than the expectation of X divided by ε:

P (X ≥ ε) ≤ E(X)/ε
Statistical Inequalities and Limit Laws 169

Proof
This theorem is valid for both continuous and discrete cases. We shall first
prove for the discrete case.

(a) Discrete Case


Suppose that the probability distribution of X has the form in the following
table:

X x1 x2 ··· xk ··· xn
P (X = x) p(x1 ) p(x2 ) ··· p(xk ) ··· p(xn )

Suppose also that the values of the random variable are arranged in ascending
order
0 ≤ x1 < x2 < · · · < xn
Note that E(X) ≥ 0, since X ≥ 0. We shall consider three cases.

Case 1
If X takes only zero values, then E(X) = 0 and for any constant ε > 0

P (X ≥ ε) = 0 ≤ 0/ε

That is, the theorem is valid.

Suppose now that not all xi equal zero. Clearly,

E(X) > 0

Take an arbitrary constant ε > 0.

Case 2
If ε > xn , then {X < ε} is a 'certain event', so that

P (X ≥ ε) = 0 ≤ E(X)/ε

that is, the theorem is again valid in this case.

Case 3
Let ε ≤ xn and let xk , xk+1 , · · · , xn be all the values of X that are greater
than or equal to ε (if in a special case ε ≤ x1 , then k = 1).

By definition,

E(X) = x1 p1 + x2 p2 + · · · + xk pk + · · · + xn pn
     ≥ xk pk + · · · + xn pn
     ≥ ε(pk + · · · + pn )          (i)

By the addition rule of probability

pk + pk+1 + · · · + pn = P (X ≥ ε)

Then from (i)

E(X) ≥ ε P (X ≥ ε)

or

P (X ≥ ε) ≤ E(X)/ε          (ii)

Since

P (X < ε) = 1 − P (X ≥ ε)

then from (ii) it follows that

P (X < ε) ≥ 1 − E(X)/ε          (iii)



(b) Continuous Case

E(X) = ∫₀^∞ x f(x) dx
     = ∫₀^ε x f(x) dx + ∫_ε^∞ x f(x) dx
     ≥ ∫_ε^∞ x f(x) dx          since ∫₀^ε x f(x) dx ≥ 0
     ≥ ∫_ε^∞ ε f(x) dx          since x ≥ ε > 0 on this range
     = ε ∫_ε^∞ f(x) dx
     = ε P (X ≥ ε)

Therefore

P (X ≥ ε) ≤ E(X)/ε,

and hence

P (X < ε) ≥ 1 − E(X)/ε

Aliter
Let
    Y = 0,  if X < ε
      = ε,  if X ≥ ε
Then
    P (Y = 0) = P (X < ε)
and
    P (Y = ε) = P (X ≥ ε)
Hence
    E(Y ) = 0 · P (Y = 0) + ε · P (Y = ε) = ε P (X ≥ ε)
Clearly, X ≥ Y. Hence
    E(X) ≥ E(Y ) = ε P (X ≥ ε)
Therefore
    P (X ≥ ε) ≤ E(X)/ε

Note
(1) The Markov’s inequality may be equivalently stated as in Theorem
5.1(b).

Theorem 5.1(b) MARKOV’S INEQUALITY


Let X be a nonnegative random variable having a finite expectation
E(X). Let ε be any positive number, then

P (X < ε) ≥ 1 − E(X)/ε

Proof
Since {X ≥ ε} and {X < ε} are complementary events, Theorem 5.1(b)
follows.

(2) Markov's inequality appeared earlier in the work of Pafnuty
Chebyshev, and for this reason it is sometimes referred to in other
books as the first Chebyshev inequality. Such books refer to
Chebyshev's inequality, discussed in the sequel, as the second Cheby-
shev inequality.
Example 5.1
A textile factory produces on the average 150 bales of suiting material a
month. Suppose the number of bales of a suiting material produced each
month is a random variable. Find the bounds for the probability that a
particular month’s production will be

(a) at least 160 bales;

(b) less than 200 bales.

Solution
Let X be the number of bales of the suiting material produced in a month.
E(X) = 150

(a) We are required to find P (X ≥ 160).


It is better to use the first formulation of Markov's inequality in
Theorem 5.1 with ε = 160. Hence

P (X ≥ 160) ≤ 150/160 = 15/16

(b) We are required to find P (X < 200). Using the second formulation we
have

P (X < 200) ≥ 1 − 150/200 = 1/4
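
For readers who want to experiment, the bounds can be set against a simulation. The sketch below is illustrative only: the Poisson model is merely one assumed distribution with the stated mean of 150 bales, since the example does not specify a distribution.

    import numpy as np

    rng = np.random.default_rng(1)
    mean_bales = 150
    x = rng.poisson(mean_bales, size=200_000)     # hypothetical monthly production figures

    markov_bound = mean_bales / 160               # P(X >= 160) <= E(X)/160
    print(markov_bound, (x >= 160).mean())        # bound vs. simulated probability

    lower_bound = 1 - mean_bales / 200            # P(X < 200) >= 1 - E(X)/200
    print(lower_bound, (x < 200).mean())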
5.3 CHEBYSHEV'S INEQUALITY

Chebyshev's inequality was first formulated by Irénée-Jules Bienaymé in
1853, but without proof, and it was not until 1867 that Chebyshev provided
the proof. For this reason, the inequality is sometimes called the Bienaymé-
Chebyshev inequality.

5.3.1 Basic Chebyshev’s Inequality

Theorem 5.2(a) CHEBYSHEV’S INEQUALITY


Let X be a random variable with finite expectation E(X) = µ and
finite variance Var(X). Then for any real number ε > 0,

P (|X − µ| ≥ ε) ≤ Var(X)/ε²

Proof
The inequality

|X − µ| ≥ ε

is equivalent to the inequality 160


(X − µ)² ≥ ε²

The random variable (X − µ)² takes only nonnegative values.
Applying Markov's inequality to it, we have:

P (|X − µ| ≥ ε) = P [(X − µ)² ≥ ε²]
              ≤ E[(X − µ)²]/ε²
              = Var(X)/ε²

Note
(1) The Chebyshev’s inequality may be equivalently stated as in Theorem
5.2(b).

Theorem 5.2(b) CHEBYSHEV’S INEQUALITY


Let X be a random variable with finite expectation E(X) = µ and
finite variance Var(X). Then for any real number ε > 0,

P (|X − µ| < ε) ≥ 1 − Var(X)/ε²

Proof
Since {|X − µ| ≥ ε} and {|X − µ| < ε} are complementary events,
Theorem 5.2(b) follows.
(2) As indicated earlier, the Theorem 5.2 is referred to in some books as
the second Chebyshev’s inequality.

Example 5.2
An electric station services an area with 12, 000 bulbs. The probability of
switching on each of these bulbs every evening is 0.9. What are the bounds
for the probability that the number of bulbs switched on in the area in one
particular evening is different from its expected value in absolute terms by
(a) less than 100? (b) at least 120?

Solution
Let

X = the number of bulbs switched on161in the area in that evening


µ = E(X) = np = 12000(0.9) = 10800
Var(X) = npq = 12000(0.9)(0.1) = 1080

(a) We are required to calculate P (|X − µ| < 100).


We shall use the second Chebyshev's inequality

P (|X − µ| < ε) ≥ 1 − Var(X)/ε²

with ε = 100. Then

P (|X − 10800| < 100) ≥ 1 − 1080/(100)² = 0.892

(b) We are required to calculate P (|X − µ| ≥ 120).

We shall have to use the first Chebyshev's inequality:

P (|X − 10800| ≥ 120) ≤ 1080/(120)² = 0.075
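
A quick simulation shows how conservative these bounds are. The following sketch is an illustration only; the binomial model is the one implied by the example, and the number of simulated evenings is an arbitrary choice.

    import numpy as np

    rng = np.random.default_rng(2)
    n, p = 12_000, 0.9
    mu, var = n * p, n * p * (1 - p)              # 10800 and 1080
    x = rng.binomial(n, p, size=200_000)

    print(1 - var / 100**2, (np.abs(x - mu) < 100).mean())    # part (a): bound vs. simulation
    print(var / 120**2, (np.abs(x - mu) >= 120).mean())       # part (b): bound vs. simulation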

Example 5.3
Suppose that a random variable has mean 5 and standard deviation 1.5. Use
the Chebyshev’s inequality to estimate the probability that an outcome lies
between 3 and 7.

Solution
µ = 5,  σ = 1.5,  so Var(X) = σ² = 2.25
Since we wish to estimate the probability of an outcome lying between 3
and 7, we have

P (3 − 5 < X − µ < 7 − 5) = P (−2 < X − µ < 2) = P (|X − µ| < 2)

That is, ε = 2. By Chebyshev's inequality

P (|X − µ| < 2) ≥ 1 − Var(X)/ε² = 1 − 2.25/2² = 0.4375

The desired probability is at least 0.4375. That is, if the experiment is
repeated a large number of times, we expect at least 43.75% of the outcomes
to be between 3 and 7.

The following two theorems are other forms in which Chebyshev's
inequality is expressed.

Theorem 5.3(a)
Let X be a random variable with finite expectation E(X) = µ and
finite variance Var(X). Then for any positive number ε,

P (|X − µ| ≥ cσ) ≤ 1/c²

where ε = cσ and σ = √Var(X)

In words, the above theorem states that the probability that X assumes a
value outside the interval from µ − cσ to µ + cσ is never more than 1/c².

Theorem 5.3(b)
Let X be a random variable with finite expectation E(X) = µ and
finite variance Var(X). Then for any real number ε > 0,

P (|X − µ| < cσ) ≥ 1 − 1/c²

where ε = cσ and σ = √Var(X)

This states that the probability of the event that X takes on a value x
which is within c standard deviations of its expectation is at least 1 − 1/c²,
no matter what c happens to be. That is, the probability that X assumes a
value within the interval from µ − cσ to µ + cσ is never less than 1 − 1/c².

Note
Theorems 5.3(a) and 5.3(b) are used only when c > 1. Now, if c < 1, then
1 − 1/c² < 0 or 1/c² > 1, but we know that the probability of any event ranges
from zero to one. Thus, the Chebyshev inequality of Theorems 5.3(a) and
5.3(b) is trivially true when c < 1.

Example 5.4
Suppose a random variable X has an expectation µ = 4.6 and a variance
σ 2 = 2.25. Find the bounds for the following probabilities:
(a) P (|X − µ| < 2σ)  and  (b) P (|X − µ| < 3σ)

Solution
µ = 4.6,   σ = √2.25 = 1.5

(a) P [|X − 4.6| < 2(1.5)] ≥ 1 − 1/2² = 3/4 = 0.75
    Thus,
    P [|X − 4.6| < 3] ≥ 0.75

(b) P [|X − 4.6| ≤ 3(1.5)] ≥ 1 − 1/3² = 8/9 = 0.8889
    Thus,
    P (|X − 4.6| < 4.5) ≥ 0.8889

From Example 5.4, we can say that the probability that the random variable
X will take on a value within two standard deviations from the mean is at
least 3/4, and the probability that X will take on a value within three standard
deviations from the mean is at least 8/9. It is in this sense that the standard
deviation σ controls the spread or dispersion of the distribution of a random
variable.
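
In the c-standard-deviation form the bound depends only on c. A few lines of Python, given purely for illustration, make the guarantees of Example 5.4 explicit:

    mu, sigma = 4.6, 1.5
    for c in (2, 3):
        low, high = mu - c * sigma, mu + c * sigma
        print(f"P({low:.1f} < X < {high:.1f}) >= {1 - 1 / c**2:.4f}")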

5.3.2 Special Cases of Chebyshev’s Inequality

The following special cases of Chebyshev’s inequality are very helpful in


sampling theory. They can be used to determine the minimum sample sizes
(number of trials).

Theorem 5.4
Let Xn be the sample mean based on a random sample of size n on a
random variable X with expectation µ and finite variance σ 2 . Then
for any real number ε,

P (|X n − µ| ≥ ε) ≤ σ²/(nε²)

or equivalently,

P (|X n − µ| < ε) ≥ 1 − σ²/(nε²)
where n is the sample size


Example 5.5
Let X denote the mean of random variables Xi with expectation µ = 100
and variance σ 2 = 2.5. Find the bound for the probability that in 120 trials
the mean differs from the expected value in absolute terms by less than 0.8.

Solution
ε = 0.8,   n = 120,   σ² = 2.5

P (|X n − µ| < 0.8) ≥ 1 − 2.5/(120(0.8)²) ≈ 0.967

Example 5.6
Referring to the X of Example 5.5, determine the size of n such that

P (|X n − µ| < 0.8) ≥ 0.99

Solution
P (|X n − 100| < 0.8) ≥ 0.99

Comparing this with Theorem 5.4 we have:

1 − σ²/(nε²) = 0.99
⇒ σ²/(nε²) = 0.01

Rearranging we obtain

n ≥ σ²/(0.01 ε²) = 2.5/(0.01(0.8)²) = 390.6

That is, we need at least 391 trials in order that the probability will be at
least 0.99 that the sample mean X n will lie within 0.8 of the expectation.
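
The same calculation is easily scripted. In the sketch below the function name is an arbitrary choice made for illustration; the function simply rearranges the bound of Theorem 5.4.

    import math

    def chebyshev_sample_size(sigma2, eps, prob):
        """Smallest n with 1 - sigma2/(n*eps**2) >= prob (Theorem 5.4)."""
        return math.ceil(sigma2 / ((1 - prob) * eps**2))

    print(chebyshev_sample_size(2.5, 0.8, 0.99))   # 391, as found above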
Going through the solution of Example 5.5, we may write the formula
for determining the sample size n by Chebyshev's inequality as

n = σ²/(qε²)

where q = 1 − p.

Other forms of Chebyshev's inequality which are of great importance
are the Bernoulli forms, which are applications of the De Moivre-Laplace
Integral Theorem.

Theorem 5.5 FIRST BERNOULLI THEOREM


Let E be an experiment and A an event associated with E. Consider
n independent trials of E. Let the random variable Mn be the number
of times event A occurs among the n trials. Then by Chebyshev's
inequality

P (|Mn − np| ≥ ε) ≤ npq/ε²

or equivalently,

P (|Mn − np| < ε) ≥ 1 − npq/ε²


Proof
The Theorem follows from Chebyshev's inequality (Theorem 5.2) by remem-
bering that E(Mn ) = np and Var(Mn ) = npq.

The reader should attempt Exercise 5.26(a). It is a typical example of
Theorem 5.5.

Theorem 5.5 is true for any value we give to ε > 0. In particular,
if we replace ε by nδ, where δ is thought of as "small", then we have the
following theorem.


Theorem 5.6 SECOND BERNOULLI THEOREM


Let E be an experiment and A be an event associated with E. Con-
sider n independent trials of E. Let Mn/n be the relative frequency of
the occurrence of event A in n trials. Then

P (|Mn/n − p| ≥ δ) ≤ pq/(nδ²)

or equivalently,

P (|Mn/n − p| < δ) ≥ 1 − pq/(nδ²)

For any value of δ, the right side of the second part of Theorem 5.6
converges to 1 as n → ∞. What it means is that for any value of δ, no
matter how small, the probability that the proportion of successes in n
trials differs from the theoretical probability p by less than δ tends to 1 as
the number of trials increases without bound. That is, it is guaranteed that
the observed relative frequency of successes will converge to the theoretical
relative frequency (as measured by p) as the number of trials tends to infinity
(see Section 5.4).

Example 5.7
The probability of the occurrence of an event A in each trial of an experiment
is 2/3. Using Chebyshev's inequality, find a lower bound for the probability
that in 10, 000 trials the deviation of the relative frequency of the event A
from the true probability of A will be less than 0.01.

Solution
The relative frequency in n independent trials is a random variable. Its
expectation equals p = 2/3.
We are required to calculate

P (|Mn/10,000 − 2/3| < 0.01)

Then,

P (|Mn/10,000 − 2/3| < 0.01) ≥ 1 − (2/3)(1/3)/[(10,000)(0.01)²]
                             = 1 − 2/9 = 7/9 ≈ 0.778

Thus, with probability not less than 0.778 we may expect that in 10,000
trials the relative frequency of event A will deviate from its probability by
less than 0.01.
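
The bound may be compared with the probability obtained by simulation. The following sketch is illustrative only; the binomial sampling is just one way of modelling the 10,000 trials.

    import numpy as np

    rng = np.random.default_rng(3)
    n, p = 10_000, 2 / 3
    bound = 1 - p * (1 - p) / (n * 0.01**2)            # lower bound, about 0.778
    freq = rng.binomial(n, p, size=100_000) / n        # simulated relative frequencies
    print(bound, (np.abs(freq - p) < 0.01).mean())     # the true probability is near 0.97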

Example 5.8
How many times should a fair die be tossed in order to be at least 95% sure
that the relative frequency of having a four come up is within 0.01 of the
theoretical probability 1/6?
Solution
p = 1/6,   1 − p = 5/6   and   ε = 0.01

Let Mn/n be the relative frequency. Then

P (|Mn/n − p| < 0.01) ≥ 0.95

By the second Bernoulli Theorem,

P (|Mn/n − p| < ε) ≥ 1 − pq/(nε²)

Thus

1 − pq/(nε²) = 0.95
⇒ pq/(nε²) = 0.05
⇒ n = pq/(0.05 ε²)
     = (1/6)(5/6)/[0.05(0.01)²]
     = 27777.778
≈ 27, 778
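
The computation generalises easily. The helper below is an illustrative sketch (the function name is arbitrary) that rearranges the second Bernoulli bound:

    import math

    def bernoulli_trials_needed(p, eps, confidence):
        """Smallest n with 1 - p(1 - p)/(n*eps**2) >= confidence."""
        return math.ceil(p * (1 - p) / ((1 - confidence) * eps**2))

    print(bernoulli_trials_needed(1 / 6, 0.01, 0.95))   # 27778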

The Chebyshev’s inequality provides only crude and general results. This
limitation arises because of its complete universality; it is valid under any
distribution provided both the expectation and the variance are finite. But
this condition will always be satisfied when the number of values of the
random variable X is finite. In the process of achieving such a general
result, however, this inequality is not particularly tight in terms of the bound
achieved on particular distributions. For most distributions that arise in
practice, there are far sharper bounds for P (|X − µ| < ε) than that given by
Chebyshev's inequality. For instance, if the random variable X is normally
distributed, then the following formula derived in Chapter 11 of Volume I is
more exact:

P (|X − µ| < ε) = 2Φ(ε/σ) − 1

Example 5.9
Referring to Example 5.4, use the Normal distribution to find the probabil-
ities and compare the results.

Solution
µ = 4.6,   σ = √2.25 = 1.5

(a) 2σ = 2(1.5) = 3

    P (|X − µ| < 2σ) = P (|X − µ| < 3)
                     = 2Φ(3/1.5) − 1
                     = 2Φ(2) − 1
                     = 2(0.9772) − 1
                     = 0.9544

(b) 3σ = 3(1.5) = 4.5

    P (|X − µ| < 3σ) = P (|X − µ| < 4.5)
                     = 2Φ(4.5/1.5) − 1
                     = 2Φ(3) − 1
                     = 2(0.9987) − 1
                     = 0.9974

In both cases the probabilities are greater than those obtained by Cheby-
shev’s inequality in Example 5.4.
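
The comparison can be reproduced with the standard normal c.d.f. The fragment below is given for illustration; it prints the Chebyshev bound next to the exact normal probability for c = 2 and c = 3.

    from math import erf, sqrt

    def Phi(z):
        """Standard normal c.d.f."""
        return 0.5 * (1 + erf(z / sqrt(2)))

    for c in (2, 3):
        print(c, 1 - 1 / c**2, round(2 * Phi(c) - 1, 4))   # 0.75 vs 0.9545, 0.8889 vs 0.9973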

Note that Chebyshev's inequality provides a nontrivial estimate of the
probability of the event {|X − µ| < ε} only in the case when the variance
of the random variable X is sufficiently small, namely, less than ε².
Clearly, the left side of the inequality in Theorem 5.2, expressing the
probability of an event, is nonnegative. It gives the estimate of the probability
of the event {|X − µ| < ε} if the right side is positive.
Thus,

1 − Var(X)/ε² > 0,   or   Var(X)/ε² < 1,   or   Var(X) < ε²

If, on the other hand, Var(X) > ε², then the right side of the inequality of
Theorem 5.2 will become negative and the Chebyshev inequality will give

P (|X − µ| < ε) ≥ −a

where a > 0.

Thus, Chebyshev's inequality is trivial for the case when Var(X) > ε².
This of course reduces the role of Chebyshev's inequality in its application
to practical problems; however, its theoretical importance is great. It is
the starting point for several theoretical developments. It provides us with a
convenient interpretation of the concept of variance (or standard deviation).
It can also be used to provide a simple proof for the law of large numbers
in the next section.

Note
The inequality that helps us derive bounds for sums of independent random
variables is Kolmogorov's inequality.
Suppose that X1 , X2 , · · · , Xn are independent random variables with
mean zero and finite variances, and let Sk = X1 + X2 + · · · + Xk . Then for
ε > 0

P ( max_{1≤k≤n} |Sk | ≥ ε ) ≤ Var(Sn )/ε²
For the proof see page 248 of Billingsley, P. (1979) listed in the bibliography.

For n = 1, Kolmogorov's inequality reduces to Chebyshev's inequality.
For n > 1, Chebyshev's inequality gives the same bound for the single
relation |Sk | ≥ ε, so that Kolmogorov's inequality is considerably stronger.

5.4 LAW OF LARGE NUMBERS

5.4.1 Definition of Law of Large Numbers

The Law of Large Numbers is an important consequence of Chebyshev’s


inequality. It is commonly believed that if a fair coin is tossed many times
the number of heads is about the same as the number of tails. The law of
large numbers is a mathematical formulation of this belief.

Definition 5.1 LAW OF LARGE NUMBERS


The law of large numbers states that the average of a number of
independent and identically distributed random variables converges
to the expected value of the underlying distribution of X as the
number of random variables increases

Whatever the "law of averages" may be, it is certainly not reasonable in a
hundred tosses of a fair coin to expect exactly 50 tails, or in a million tosses
to expect exactly 500,000 heads. To be more reasonable, perhaps the best
we can get is that usually the proportion of tails in n tosses is close to 1/2.
J.E. Kerrich, while interned in Denmark as a British subject during the
Second World War, tossed a coin 10,000 times and recorded 5067 heads.
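
Kerrich's experiment is easily imitated on a computer. The sketch below is an illustration only (the seed is arbitrary); it tosses a fair coin 10,000 times and prints the running proportion of heads.

    import numpy as np

    rng = np.random.default_rng(4)
    tosses = rng.integers(0, 2, size=10_000)             # 1 = head, 0 = tail
    running = np.cumsum(tosses) / np.arange(1, 10_001)
    for n in (10, 100, 1_000, 10_000):
        print(n, running[n - 1])                         # proportions settling near 0.5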
Before we give the various forms of the law of large numbers we shall
discuss briefly the term ”converge” used in the definition. Suppose we have a
sequence X1 , X2 , · · · of random variables defined on the same sample space.
In what follows we shall distinguish various possible modes in which the se-
quence {Xn } may converge. Convergence generally comes in two categories,
which we call strong and weak.

Theorem 5.7 CONVERGENCE IN PROBABILITY


Suppose {Xn } is a sequence of random variables (n = 1, 2, · · ·).
Then if for every ε > 0

P (|Xn − θ| > ε)

approaches zero as n approaches infinity, then {Xn } is said to


converge in probability to θ

Symbolically, we may write convergence in probability as

lim_{n→∞} P (|Xn − θ| > ε) = 0

or
p
Xn −→ θ
Synonyms for convergence in probability are stochastic convergence, conver-
gence in measure, or weak convergence.

Theorem 5.8 CONVERGENCE WITH PROBABILITY ONE

Suppose {Xn } is a sequence of random variables (n = 1, 2, · · ·). If

P ( lim_{n→∞} Xn = X ) = 1

then {Xn } is said to converge with probability one (w.p. 1) to X

That is, Xn actually converges to X except in a negligible number of cases.
This form of convergence is also called almost sure convergence (a.s.), or
almost everywhere convergence (a.e.), or strong convergence.
Symbolically, we may write convergence with probability one as

P ( lim_{n→∞} Xn = X ) = 1

These two types of convergence¹⁰ are the basis of the two laws of large
numbers, namely, the weak and the strong laws of large numbers. For our
purposes, which are statistical, the weak law of large numbers is the central
concept, and when the "law of large numbers" is referred to without qualifi-
cation, this one is implied. We shall discuss it in much detail later, but now
we state, without proof, the strong law of large numbers.

¹⁰ The other type of convergence that plays an important role in probability theory is
convergence in distribution, also called complete convergence or weak convergence. Suppose
{Xn } is a sequence of random variables (n = 1, 2, · · ·) and let Fn (t) be their c.d.f.'s. If for
every t at which F0 (t) is continuous, lim_{n→∞} Fn (t) = F0 (t), then Xn is said to converge
in distribution to X0 , denoted by Fn (t) →d F0 (t).

5.4.2 Strong Law of Large Numbers

Even though the strong law of large numbers may not be realistic, what
might not be realised in the real-world situation may sometimes be achieved
in a purely theoretical sense. Such a possibility was unravelled by Borel
in 1909, who established the strong law of large numbers in the case of
independent Bernoulli trials.

Theorem 5.9 STRONG LAW OF LARGE NUMBERS
(Bernoulli Form)

Let Mn be a random variable, the number of successes in n
Bernoulli trials, so that Mn/n is the proportion of successes. Then

P ( lim_{n→∞} Mn/n = p ) = 1

where p is the probability of success

Theorem 5.10 STRONG LAW OF LARGE NUMBERS
(Khinchin Form)

Let X1 , X2 , · · · , Xn be a sequence of independent random variables
having a common distribution and let E(X) = µ. Then there is a
limit number µ, such that

P ( lim_{n→∞} X n = µ ) = 1

where X n = (X1 + X2 + · · · + Xn )/n

The proof of this theorem goes beyond this book. To see it refer to page
250 of Billingsley, P. (1979), listed in the bibliography.

The strong law of large numbers makes better sense than the weak law
and it is indispensable for certain theoretical investigations. It is indeed the
foundation of a mathematical theory of probability based on the concept of
relative frequency. It is also known that the sample mean, X n , converges
with probability 1 to the mean µ, provided that the latter exists. The
strong law of large numbers is usually not given much attention in most
statistical textbooks, including this one partly because it is not realistic to
assert this almost sure convergence, for in any experiment we can neither be
100 percent sure nor 100 percent accurate, otherwise the phenomenon will
not be a random one. Secondly, in general, it is easier to prove the weak
law of large numbers than the strong law of large numbers.

5.4.3 Weak Law of Large Numbers

The weak law of large numbers is one of the earliest and most famous of
the limit laws of probability. We shall consider two of its forms, namely,
Bernoulli law (which relates to proportions) and Khinchin law (which relates
to the means).
173

Bernoulli Law of Large Numbers (Discrete Case)


The first formulation of the law of large numbers, known as the Bernoulli
law of large numbers, was given and proved by Jakob Bernoulli and pub-
lished posthumously in his book “Ars Conjectandi” in 1713 as a crowning
achievement. It states that if Mn represents the number of successes in n
identical Bernoulli trials, with probability of success p in each trial, then the
relative frequency Mn/n is very likely to be close to p when n is a sufficiently
large and fixed integer. The law in a sense justifies the use of the frequency
definition of probability discussed in Chapter 3 of Volume I and it brings
the theory of probability into contact with practice.
Mathematically, the Bernoulli law of large numbers may be expressed
in the following theorem.

Theorem 5.11 BERNOULLI LAW OF LARGE NUMBERS


Let X1 , X2 , · · · , Xn , be a sequence of independent Bernoulli random
variables, each taking on the value 1 (with the occurrence of an event
A) or 0 (with non-occurrence of an event A), and let p = P (Xi =
1); i = 1, 2, ..., n. Then for the sum Mn , the number of successes in n
trials, where

Mn = X1 + X2 + · · · + Xn ,

we have

lim_{n→∞} P (|Mn/n − p| < ε) = 1      for all ε > 0

or equivalently,

lim_{n→∞} P (|Mn/n − p| ≥ ε) = 0      for all ε > 0

That is, for n large, Mn/n is very close to p. In other words, the law of large
numbers may be stated as follows: in n trials the probability that the rela-
tive number of successes Mn/n deviates numerically from the true probability
p by not more than ε (ε > 0) approaches 1 as n approaches infinity.
Or equivalently, in n trials the probability that the relative number
of successes Mn/n deviates numerically from the true probability p by more
than ε (ε > 0) approaches 0 as n approaches infinity.


Khinchin Theorem of Large Numbers (Continuous Case)

The modern discussion of a mathematical version of the weak law of large
numbers was first published by the Russian mathematician, Khinchin, in
1929. The Khinchin Theorem states: "If Xi (i = 1, 2, ..., n) are independent
and identically distributed random variables and if E(Xi ) = µ exists, then
when n becomes a very large integer, the probability that the random vari-
able X n will differ from the common expectation µ of the Xi by not more
than any arbitrarily prescribed small difference ε is very close to one." Math-
ematically, this theorem can be expressed in the form of the following
theorem.

Theorem 5.12(a) KHINCHIN LAW OF LARGE NUMBERS I

Let X1 , · · · , Xn be a sequence of independent and identically
distributed continuous random variables with finite common mean
µ and finite common variance σ². Then for any real number ε > 0

lim_{n→∞} P (|X n − µ| < ε) = 1

or equivalently,

lim_{n→∞} P (|Sn/n − µ| < ε) = 1

where Sn = X1 + X2 + · · · + Xn

Proof
By Theorem 5.4,

P (|X n − µ| < ε) ≥ 1 − σ²/(nε²)

so that

1 − σ²/(nε²) ≤ P (|X n − µ| < ε) ≤ 1

lim_{n→∞} [1 − σ²/(nε²)] ≤ lim_{n→∞} P (|X n − µ| < ε) ≤ 1

1 ≤ lim_{n→∞} P (|X n − µ| < ε) ≤ 1

By the Sandwich rule of limits

lim_{n→∞} P (|X n − µ| < ε) = 1

It is in this sense that the arithmetic mean “converges” to E(X) = µ. That


is, by averaging an increasingly large number of observations of the value
of a quantity, we can obtain increasingly more accurate measures of the
expectation of that quantity.
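
This averaging effect can be seen directly by simulation. In the sketch below the exponential parent distribution is an arbitrary choice made for illustration; any distribution with a finite mean would serve.

    import numpy as np

    rng = np.random.default_rng(5)
    mu = 2.0                                   # true expectation of the chosen distribution
    for n in (10, 1_000, 100_000):
        sample = rng.exponential(mu, size=n)   # Exponential with mean mu
        print(n, sample.mean())                # sample means drifting towards 2.0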

Aliter
We first find E(X n ) and Var(X n ):

E(X n ) = (1/n) Σᵢ E(Xi ) = µ          (see Chapter 7).

Since the Xi are independent and identically distributed,

Var(X n ) = (1/n²) Σᵢ Var(Xi ) = σ²/n          (see Chapter 7).

The desired result now follows immediately from Chebyshev's inequality as
follows:

P (|X n − µ| < ε) ≥ 1 − Var(X n )/ε² = 1 − σ²/(nε²) → 1   as n → ∞

Theorem 5.12(b) KHINCHIN LAW OF LARGE NUMBERS II


The law of large numbers may equivalently be stated as

lim_{n→∞} P (|X n − µ| ≥ ε) = 0

or equivalently,

lim_{n→∞} P (|Sn/n − µ| ≥ ε) = 0
where Sn = X1 + X2 + · · · + Xn

The reader is asked in Exercise 5.27 to prove this theorem.

Note
To apply the weak law of large numbers, the random variables do not have
to be identically distributed; they just need to have the same mean and the
same finite variance.
5.5 CENTRAL LIMIT THEOREM

5.5.1 Definition of Central Limit Theorem

Among all probability laws, the Central Limit Theorem (CLT) is consid-
ered the most remarkable theoretical formulation from both practical and
theoretical viewpoints. It provides some reassurance when we are not cer-
tain whether observations are normally distributed. Whatever the nature of
the random variables X1 , X2 , · · · – whether they are discrete or continuous,
Bernoulli or gamma – as n gets larger and larger the distribution of the sum
(after standardisation) always gets closer and closer to that of the Standard
Normal distribution. In fact, it is the most important reason for the central
role of the Normal distribution in Statistics.
Generally, the name "Central Limit Theorem" is used to describe any
convergence theorem in which the Normal distribution appears as the limit.
More particularly, it applies to sums of large numbers of random variables
which, after standardisation, have approximately the Standard Normal dis-
tribution. Though the name "Central Limit Theorem" was given by G. Polya
in 1920, the concept was first introduced by De Moivre early in the eighteenth
century. De Moivre proved special cases in 1733, and Laplace later, in his
book of 1812, gave the first (incomplete) statement of the general form of
the CLT, so the CLT of Theorem 5.16 in the sequel (the Bernoulli form) is
often referred to as the De Moivre-Laplace CLT. The modern CLT (Theo-
rem 5.15 in the sequel) began to emerge from the work of the Russian school
of Probabilists, notably Chebyshev and his students Markov and Lyapunov,
towards the close of the nineteenth century. In fact Lyapunov proved the
first general version of the CLT in about 1901, though the details of his proof
were complicated.
The central limit theorem has been expressed in many forms, differing
in various degrees of abstraction and generality. We shall present some of
the simplest versions, but first we state its basic theorems.

Theorem 5.13 POLYA’S THEOREM


Let X1 , X2 , · · · , Xn be a sequence of independent and identically distributed
random variables with finite mean E(X) = µ and finite variance Var(X) = σ² > 0
and let

        Zn* = √n (X̄n − µ) / σ

Then

        P (Zn* ≤ z*) → Φ(z*)   as n → ∞

where Φ is the c.d.f. of the Standard Normal distribution

We shall observe in Theorem 5.17 in the sequel that Polya’s Theorem is the
central limit theorem of means.

Theorem 5.14 CENTRAL LIMIT THEOREM


(Lindeberg–Lévy Theorem)
Suppose Z1 , Z2 , · · · , Zn are independent and identically distributed
Standard Normal random variables and let

        Sn* = (Z1 + Z2 + · · · + Zn ) / √n

Then
P (Sn∗ ≤ s∗ ) → Φ(s∗ ) as n → ∞
where Φ is the c.d.f. of the Standard Normal distribution

For a proof see page 316 of Dudewicz and Mishra (1988), listed in the
bibliography.

5.5.2 Central Limit Theorem of Sums

The Central Limit Theorem is concerned with the distribution of the “sum
of random variables”. If X1 , X2 , . . . is a sequence of independent random
variables with expectation E(X) and variance Var(X) and if

        Sn = Σ_{i=1}^n Xi

we know from the law of large numbers that Sn /n converges to E(X) in
probability. The Central Limit Theorem is concerned not with the fact
that the ratio Sn /n converges to E(X) but with how it fluctuates around
E(X). To analyse these fluctuations, we standardise the sum

        Tn = (Sn − E(Sn )) / √Var(Sn )          (i)

It is easily verified that Tn has mean 0 and variance 1.
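This is immediate from the rules E(aX + b) = aE(X) + b and Var(aX + b) = a²Var(X); written out (in LaTeX notation for compactness):

    E(T_n) = \frac{E(S_n) - E(S_n)}{\sqrt{\mathrm{Var}(S_n)}} = 0,
    \qquad
    \mathrm{Var}(T_n) = \frac{\mathrm{Var}(S_n)}{\mathrm{Var}(S_n)} = 1.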

As usual we shall discuss two forms of CLT, namely, the Khinchin form
(continuous case) and the Bernoulli form (discrete case).

Continuous Case

Theorem 5.15 CENTRAL LIMIT THEOREM


(Khinchin Form)
Let X1 , · · · , Xn be a sequence of independently and identically distributed
random variables, each having mean µ and variance σ². Let

        Sn = Σ_{i=1}^n Xi .

Then the distribution of

        Tn = (Sn − nµ) / (σ√n)

tends to the Standard Normal distribution when n becomes infinite



Proof
Since the Xi 's are independent and identically distributed, it follows from
Corollary 3.4 and Theorem 3.19 that

E(Sn ) = nµ, and Var(Sn ) = nσ 2


Substituting these expressions in (i) above the result follows immediately.

Note
These theorems will also hold when X1 , · · · , Xn are independent random
variables with the same mean and the same finite variance but not neces-
sarily identically distributed.

Discrete Case
The De Moivre-Laplace Theorem discussed in Chapter 12 of Volume I is
actually a special case of the Central Limit Theorem, namely, the Bernoulli
form of CLT of frequency.
The Bernoulli and other forms of the central limit theorem arise out of
Theorem 5.15. For instance, if the random variables Xi ’s are independent
Bernoulli random variables, their sum
        Mn = Σ_{i=1}^n Xi

is a binomial random variable. Substituting the expressions


E(Mn ) = np and Var(Mn ) = npq
in (i) above, the result follows immediately.

Theorem 5.16 CENTRAL LIMIT THEOREM


(Bernoulli Form)
Let X1 , X2 , · · · , Xn be independent Bernoulli random variables with
parameter p so that E(Xi ) = p and Var(Xi ) = pq. If

        Mn = Σ_{i=1}^n Xi

then

        Yn = (Mn − np) / √(npq)

tends to the Standard Normal distribution as n → ∞
Limit Laws 195

Thus, the central limit theorem may mathematically be stated as

        lim_{n→∞} P (Zn ≤ z) = Φ(z),

where Φ is the cumulative distribution function of the standard Normal
distribution.

Significance of the Central Limit Theorem

The computational significance of the CLT is that for large n, we can express
the cumulative distribution function of Tn in terms of N (0, 1) as follows.
In the case of frequencies,

        P (Yn ≤ y) ≈ Φ((y − np) / √(npq))
where y is a particular value of Yn .
In the case of continuous random variables

        P (Tn ≤ t) ≈ Φ((t − nµ) / (σ√n))
where t is a particular value of Tn .
The logical question that arises at this point is: How large should n be
to enable us to apply the Central Limit Theorem? There is no single
answer, since this depends on the closeness of approximation required and
the actual distribution forms of Xi ’s. If the random variables Xi ’s are nor-
mally distributed, then no matter how small n is, their sum is also normally
distributed and P (Tn ≤ t) provides exact probabilities. If nothing is known
about the distribution patterns of Xi ’s, or if the distribution of Xi ’s differs
greatly from normality then n must be large enough to guarantee approx-
imate normality for Sn . One rule of thumb states that, in most practical
situations, a value of n of at least 30 is satisfactory. In general, the approximation
of the sum of random variables to normality becomes better and better as
the sample size increases.
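This convergence is easy to observe numerically. The following sketch (Python with NumPy; the exponential parent distribution, the sample sizes and the evaluation point z = 1 are purely illustrative choices, not taken from the text) compares the simulated value of P (Tn ≤ 1) with Φ(1) for a markedly skewed parent distribution:

    import numpy as np
    from statistics import NormalDist

    rng = np.random.default_rng(0)
    mu, sigma = 1.0, 1.0              # mean and s.d. of the Exponential(1) parent
    for n in (5, 30, 200):            # illustrative sample sizes
        sums = rng.exponential(scale=1.0, size=(20_000, n)).sum(axis=1)
        tn = (sums - n * mu) / (sigma * np.sqrt(n))          # standardised sums
        print(n, round(np.mean(tn <= 1.0), 3), round(NormalDist().cdf(1.0), 3))

As n grows, the simulated proportion settles near Φ(1) ≈ 0.841, even though the parent distribution is far from Normal.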

Example 5.10
A die is thrown ninety times. At a given throw the expected number of
points is 7/2 and the variance is 32/12. Find

(a) the expectation of the sum of points;

(b) the variance of the sum of points;

(c) the probability that the sum obtained is at most 300.

Solution
n = 90, µ = 7/2, σ² = 32/12

(a) E(Sn ) = nµ = 90 (7/2) = 315

(b) Var(Sn ) = nσ² = 90 (32/12) = 240

(c) s = 300

        P (Sn ≤ s) = P ((Sn − nµ)/(σ√n) ≤ (s − nµ)/(σ√n)) = P (Tn ≤ (s − nµ)/(σ√n))

Thus,   P (Sn ≤ 300) = P (Tn ≤ (300 − 315) / (√(32/12) √90))
                     = P (Tn ≤ −15 / ((1.6330)(9.4868)))
                     = P (Tn ≤ −0.968)
                     = 1 − Φ(0.97)
                     = 1 − 0.8340 = 0.166
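The table-based calculation in part (c) can be reproduced in a few lines of Python (a sketch only: the helper names are ours, and the exact standard Normal c.d.f. is used in place of the rounded table value):

    from math import erf, sqrt

    def phi(z):
        # standard Normal c.d.f. expressed through the error function
        return 0.5 * (1.0 + erf(z / sqrt(2.0)))

    def clt_sum_cdf(s, n, mu, sigma2):
        # CLT approximation: P(Sn <= s) ≈ Φ((s − nµ)/(σ√n))
        return phi((s - n * mu) / (sqrt(sigma2) * sqrt(n)))

    # Example 5.10(c): n = 90, µ = 7/2, σ² = 32/12, s = 300
    print(round(clt_sum_cdf(300, 90, 7 / 2, 32 / 12), 3))    # 0.166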


5.5.3 Central Limit Theorem of Means

Another important version of the Central Limit Theorem is expressed in
terms of the means of independent and identically distributed random
variables.

Theorem 5.17 CENTRAL LIMIT THEOREM


(of Means)
Let X1 , · · · , Xn be n independent and identically distributed random
variables with means equal to µ and variances equal to σ². Let X̄n
be the random variable defined by

        X̄n = (X1 + X2 + · · · + Xn ) / n

The distribution of

        Zn = (X̄n − µ) / (σ/√n)
tends to the Standard Normal distribution as n → ∞

Note
Zn is the standardised mean of the random variables X1 , · · · , Xn and it is
equal to Tn in Theorem 5.15.

Proof
The proof is quite sophisticated. An outline of a proof is given in Hoel
(1971), and Freund and Walpole (1971), listed among the references.

Example 5.11
Suppose that I.Q. is a random variable with mean µ = 100 and standard
deviation σ = 25. What is the probability that in a class of 40 students

(a) the average I.Q. exceeds 110?

(b) a single person’s I.Q. exceeds 110?

Solution
n = 40, µ = 100, σ = 25, σX̄ = 25/√40 = 3.9528

By the Central Limit Theorem, the mean I.Q. of the 40 students is approximately
normally distributed. Hence

(a) The probability that the average I.Q. exceeds 110 is

        P (X̄ ≥ 110) = P (Z ≥ (110 − 100)/3.9528)
                     = P (Z ≥ 2.53)
                     = 1 − Φ(2.53)
                     = 1 − 0.9943
                     = 0.0057

So, approximately, 0.6% of the mean I.Q.’s of a class of 40 students will
exceed 110.

(b) The probability that a single person’s I.Q. exceeds 110 is

        P (X > 110) = P (Z ≥ (110 − 100)/25)
                    = P (Z ≥ 0.4)
                    = 1 − Φ(0.4)
                    = 1 − 0.6554
                    = 0.3446
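Both probabilities can be checked directly with the standard library (a sketch; statistics.NormalDist has been available since Python 3.8):

    from math import sqrt
    from statistics import NormalDist

    mu, sigma, n, threshold = 100, 25, 40, 110
    phi = NormalDist().cdf

    p_mean = 1 - phi((threshold - mu) / (sigma / sqrt(n)))   # average of 40 I.Q.'s exceeds 110
    p_single = 1 - phi((threshold - mu) / sigma)             # a single I.Q. exceeds 110
    print(round(p_mean, 4), round(p_single, 4))              # ≈ 0.0057 and 0.3446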

Theorem 5.18
Suppose that X1 , X2 , · · · , Xn are independent and identically distributed
random variables, each having mean µ and Var(X) = σ². Suppose also that
the average of the measurements, X̄, is used as an estimate of µ. Then

        P (|X̄ − µ| < ε) ≈ Φ(ε√n / σ) − Φ(−ε√n / σ)

Proof
Suppose that we wish to find

        P (|X̄ − µ| < ε)

for some constant ε > 0. To use the Central Limit Theorem to approximate
this probability, we standardise the mean, using E(X̄) = µ and
Var(X̄) = σ²/n:

        P (|X̄ − µ| < ε) = P (−ε < X̄ − µ < ε)
                        = P (−ε/(σ/√n) < (X̄ − µ)/(σ/√n) < ε/(σ/√n))
                        ≈ Φ(ε√n / σ) − Φ(−ε√n / σ)
If we use the Standard Normal Table I in the Appendix, then we may write
this formula as

        P (|X̄ − µ| < ε) ≈ 2Φ(ε√n / σ) − 1
On the other hand

        P (|X̄ − µ| < ε) ≈ 2Ψ(ε√n / σ)
if we use Table II in the Appendix.

The Law of Large Numbers tells us that X converges to µ in


probability; so we can hope that X is close to µ if n is large. Cheby-
shev’s inequality allows us to determine the bound of the probability of an
error of a given size, but the Central Limit Theorem gives a much sharper
approximation to the actual error.

Example 5.12
Suppose that 25 measurements are taken with σ = 1.2. Find the proba-
bility that the sample mean X deviates from µ by less than 0.3.

Solution
n = 25, σ = 1.2, ε = 0.3

        P (|X̄ − µ| < 0.3) ≈ Φ(0.3√25 / 1.2) − Φ(−0.3√25 / 1.2)
                          = 2Φ(0.3√25 / 1.2) − 1
                          = 2Φ(1.25) − 1
                          = 0.7888
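The same probability can be evaluated without the table (a sketch; the function name is ours):

    from math import sqrt
    from statistics import NormalDist

    def prob_mean_within(eps, sigma, n):
        # P(|X̄ − µ| < ε) ≈ 2Φ(ε√n/σ) − 1
        return 2 * NormalDist().cdf(eps * sqrt(n) / sigma) - 1

    print(round(prob_mean_within(0.3, 1.2, 25), 4))   # ≈ 0.7887, in agreement with Example 5.12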


This sort of reasoning can be turned around. That is, given ε and θ, n
can be found such that

        P (|X̄n − µ| < ε) ≥ θ

where θ is the probability that corresponds to the z value in the Standard


Normal Table (see Tables I and II in the Appendix).

With the Central Limit Theorem we can derive a formula for calculating
the value of n such that the probability that the mean deviates from µ by
less than ε is θ:

        n = [ (σ/ε) Φ⁻¹((1 + θ)/2) ]²

if we use the Full Normal Table (Table I in the Appendix), or equivalently,

        n = [ (σ/ε) Ψ⁻¹(θ/2) ]²
using the Half Normal Table (Table II in the Appendix).

Example 5.13
Refer to Example 5.12. Suppose σ = 1.2 and ε = 0.3. Find n such that the
probability that the mean deviates from µ by less than 0.3 is 0.75.

Solution
σ = 1.2    ε = 0.3    θ = 0.75

Therefore

        n = [ (1.2/0.3) Φ⁻¹((1 + 0.75)/2) ]² = [4 Φ⁻¹(0.875)]² = [4(1.15)]² = 21.16 ≈ 22
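The sample-size formula is easy to evaluate with the inverse Normal c.d.f. in the Python standard library (a sketch; rounding up reflects that n must be a whole number of observations):

    from math import ceil
    from statistics import NormalDist

    def sample_size(sigma, eps, theta):
        # n = [ (σ/ε) Φ⁻¹((1 + θ)/2) ]², rounded up
        z = NormalDist().inv_cdf((1 + theta) / 2)
        return ceil((sigma / eps * z) ** 2)

    print(sample_size(1.2, 0.3, 0.75))   # 22, using the values of Example 5.13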


5.5.4 Central Limit Theorem of Proportions

Sometimes we may have to work with proportions of successes in n independent
trials rather than with the actual number of successes. In this case
(Mn − np)/√(npq) becomes

        (Mn /n − p) / √(pq/n)

and this leads us to the following theorem.

Theorem 5.19 CENTRAL LIMIT THEOREM


(Proportions)
Suppose in n Bernoulli trials with Mn successes, Mn /n is the proportion
(relative frequency) of successes and the probability of success in each
trial is p. Then

        Yn = (Mn /n − p) / √(pq/n)
tends to the Standard Normal distribution as n → ∞
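A short simulation illustrates the theorem (a sketch; p = 0.3, n = 400 and the cut-off 0.33 are arbitrary illustrative values, not taken from the text):

    import random
    from math import sqrt
    from statistics import NormalDist

    p, n, trials = 0.3, 400, 10_000
    hits = 0
    for _ in range(trials):
        successes = sum(random.random() < p for _ in range(n))
        if successes / n <= 0.33:
            hits += 1

    z = (0.33 - p) / sqrt(p * (1 - p) / n)
    print(hits / trials, round(NormalDist().cdf(z), 4))   # simulated vs. Normal approximation

The two printed values should agree to about two decimal places.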

We conclude this chapter by summarising the two limit laws as fol-


lows. Generally, if a theorem is concerned with stating conditions under
which variables converge (in some sense) to the expected average, then it is
a law of large numbers. If on the other hand, the theorem is concerned with
determining conditions under which the sum of a large number of random
variables has a probability distribution that is approximately Normal, it is
considered a central limit theorem.

EXERCISES

5.1 Refer to Example 5.1. Find the bounds for the probability that the
month’s production will be

(a) at most 300 bales;


(b) at least 180 bales.

5.2 The mean lifetime of a certain electrical device is 4 years. Find the
lower bounds for the probability that a randomly selected device from
a consignment of such devices will not exceed 20 years.


5.3 The amount of savings in a certain Rural Bank is ten million cedis.
Suppose the probability that an amount of at most two hundred thou-
sand cedis drawn by a customer selected at random is 0.8. What can
we say about the number of customers.

5.4 Refer to Example 5.2. Find the bounds for the probability that the
number of bulbs switched on in the area in that evening is different
from its expected value in absolute terms by

(a) at least 80 bulbs;


(b) at most 200 bulbs;
(c) not more than 150.

5.5 The distribution of a random variable X is given by the following table:

xi 1 2 3 4 5 6
p(xi ) 0.05 0.10 0.25 0.30 0.20 0.10

(a) Find P (|X − E(X)| < 2)


(b) Use Chebyshev’s inequality to estimate the lower bounds of the
probability in (a) above. Hint: E(X) = 3.8 and Var(X) = 1.77.

5.6 Suppose that a random variable has mean 35 and standard deviation
5. Use the Chebyshev’s inequality to estimate the probability that an
outcome will lie between
(a) 24 and 46 (b) 18 and 52 (c) 31 and 39

5.7 A Pharmaceutical Company supplies boxes of paracetamol tablets to a


particular hospital. Owing to the packaging procedure, not every box
contains exactly 150 tablets. It has been found out that on the average
the number of tablets in a box is indeed 150 and the standard deviation
is 4. If the company supplies the hospital 1, 000 boxes, estimate the
number of boxes having between 140 and 160 tablets.

5.8 Suppose that a random variable has mean 80 and standard deviation
6. Use the Chebyshev’s inequality to find the value of ε for which the
probability that the outcome lies between 80 − ε and 80 + ε is at least
5/12.

5.9 Suppose that a random variable has mean 25 and standard deviation
0.67. Use the Chebyshev’s inequality to find the value of ε for which
the probability that the outcome lies between 25 − ε and 25 + ε is at
most 10/13.

5.10 Refer to Example 5.4. Find the bounds for the following probabilities:

(a) P (|X − µ| ≤ σ)
(b) P (|X − µ| ≥ 3.3σ)
(c) P (|X − µ| ≤ 2.5σ)
(d) P (|X − µ| ≥ 1.65σ)
(e) P (|X − µ| ≥ 2.2σ)

5.11 Suppose that the number of hours a certain type of light bulb will burn
before requiring replacement has a mean of 2, 000 hours and standard
deviation of 150 hours. If 1, 000 such bulbs are installed in a new house,
estimate the number that will require replacement between 1, 200 and
2, 800 hours from the time of installation.

5.12 Refer to Example 5.7. Find an upper bound for the probability that
in 5,000 trials the deviation of the relative frequency of the event A
from its probability will not exceed 0.08.

5.13 A manufacturing company produces ground measuring poles with a


mean length of 2 metres and a standard deviation of 1.5 metre per
pole. Assume that the pole lengths, X, are statistically independent.
A container of 100 poles is sent to a customer for distribution. Find
the probability that the total length of all poles in the container is

(a) not more than 195 metres


(b) at least 220 metres
(c) between 180 and 210

5.14 Refer to Exercise 5.13. How many observations should be included in


the sample if we wish X, the mean length of poles, to be within 0.1
metres of µ with probability 0.95.

5.15 The final scores of the students in Diploma class over a period of four
years have a mean 60 and a variance 64. In a particular year there
were 100 students with a mean score of 58. Calculate the probability
that the sample mean, X is at most 58.

5.16 A bolt manufacturer knows that, on average, 5% of his production


is defective. He gives a guarantee on a shipment of 10, 000 bolts, by
promising to refund the money if more than K bolts are defective.
How small can the manufacturer choose K and still be assured that
he needs not give a refund of more than 1% of his shipment?

5.17 X n is to be used as an approximation to µ. If σ is known to be 2,


calculate the sample size n that will be required to ensure that the
error in this approximation will be less than 0.05 with probability of
at least 0.90.

5.18 Suppose that the number of students that enrol in the Basic Statistics
course in the Faculty of Social Studies at the University of Ghana is
a Poisson random variable with mean 500. The Co-ordinator of the
course has decided that if the number enrolling is 350 or more he will
split the group into two, otherwise they will all be in the same class.
What is the probability that the class will be split?

5.19 The mean and standard deviation of the ages of statistics students of
the University of Ghana are 20 years and 5.8 years respectively. What
is the probability that a random sample of 50 students will have a
mean age of between 18 and 23 years?

5.20 The mean age and the standard deviation of Statistics students are
18 years and 1.8 years respectively. What is the probability that a
random sample of 50 students will have a mean of 16 and 20 years?

5.21 Use the Chebyshev’s inequality to determine how large n should be in


order that you are at least 95 per cent certain that the sample mean
X n is within 0.9 σ of the population mean µ?

5.22 Use the Chebyshev’s inequality to determine how large n should be


in order that you are at most 95 per cent certain that Xn is different
from µ by at least 0.2 σ?

5.23 Let X1 , X2 , · · · , Xn be independent, identically distributed random


variables each having p.d.f.
        f (x) = (1/2) e^(−x/2) ,   x > 0
                0 ,                x ≤ 0


Let Sn = (1/n) Σ_{i=1}^n Xi . Use the Chebyshev’s inequality to estimate the
minimum possible value of n, given that

P {|Sn − E(Sn )| ≥ 0.25} ≤ 0.75

For this value of n, calculate the upper bound to the given probability
obtained by applying Chebyshev’s inequality.

5.24 The burning time of a certain type of lamp is an exponential random


variable with mean µ = 20 hours and standard deviation 14 hours.
What is the probability that 80 of these lamps will provide a total of
more than 2000 hours of burning time?

5.25 Twenty numbers are rounded off to the nearest integer and then added.
Assume the individual round-off errors are independent and uniformly
distributed over (−1/2, 1/2). Find the probability that the given sum will
differ from the sum of the original twenty numbers by 3 or more using
(a) Chebyshev’s inequality (b) Central Limit Theorem.

5.26 Suppose a coin is tossed 26 times. Estimate the probability that the
number of heads will deviate from 12 by less than 4, using
(a) Chebyshev’s inequality (b) Central Limit Theorem.

5.27 Prove Theorem 5.11.

5.28 Prove Theorem 5.12(b).

5.29 A consignment of 1,800 items has been produced by a manufacturing


company whose products are 4% defective. Suppose the company’s
production process is an independent trial process with each item pro-
duced having a probability of 0.3 of being defective. Use Chebyshev’s
inequality to give a bound on the probability that in the batch of 1,800
items the number of defective is between 39 and 69.
5.30 Past experience indicates that wire rods purchased from a certain com-
pany have a mean breaking strength of 400 kg and a standard deviation
of 20 kg. How many rods should you select so that you would be cer-
tain with probability 0.95 that your sample mean would not be in error
by more than 3 kg.


Chapter 6

SAMPLING DISTRIBUTIONS I
Basic Concepts

6.1 INTRODUCTION

So far we have gone through the various aspects of probability theory in


my earlier books. In Volume I (Nsowah-Nuamah, 2017), we discussed
the basic theorems in probability, where we emphasised conditional prob-
ability, Bayes’ Theorem and Independence. We considered the Random
Variable and it was there that the concept of probability distribution was
first introduced. We also discussed the Numerical Characteristics of the
random variable and there we encountered the concepts of Mathematical
Expectation, Variance, Moments, and Moment Generating Function. In the
Volume II (Nsowah-Nuamah, 2018) we considered some special probability
distributions. These included the Bernoulli distribution, Binomial distribu-
tion, Geometric distribution, Negative binomial distribution, Poisson dis-
tribution, Hypergeometric distribution, Multinomial distribution, Uniform
distribution, Exponential distribution, Gamma distribution, Beta distribu-
tion and Normal distribution.
In Chapters 1 to 4 of this book, we extended the concept of probability
distributions to bivariate distributions. In Chapter 5 we discussed the basic
Inequalities in Statistics (Markov’s and Chebyshev’s Inequalities) and the
two main Limit Laws (the Central Limit Theorem and the Law of Large
Numbers) which have wide and practical applications in Statistics.
We can now conclude the book with some analytical statistics con-
cerned with inferential procedures. The present chapter and the subse-

quent ones serve as a bridge between the topics discussed in this book and
statistical inference.

6.2 STATISTICAL INFERENCE

Statistical inference involves drawing conclusions on a large data set on the


basis of information obtained from a small portion of the large collection.
Two main concepts that form the basis of statistical inference are population
and sample.

6.2.1 Population

Definition 6.1 POPULATION


A population is a large data set that contains all the members (ele-
ments) of interest

Definition 6.2 FINITE POPULATION


A finite population is a population that has a stated or limited num-
ber of members

Examples of a finite population include all students in the University of


Ghana, all households in Accra, all the mud houses in Kumasi. It is possible
(though not always practical) to count the number of elements in a finite
population.

Definition 6.3 INFINITE POPULATION


An infinite population is composed of a limitless number of members

The elements in an infinite population cannot be counted completely no
matter how long the counting process is carried on. An example of an
infinite population is an experiment of tossing a coin to determine whether
or not the coin is biased. Theoretically, this experiment could be carried
out an infinite number of times.
Philosophically, no truly infinite population of physical objects exists.
After all, given unlimited resources and time, we could enumerate even the
grains of sand, on a particular seashore. As a practical matter, then, we will
use the term infinite population when we are talking about a population
that could not be enumerated in a reasonable period of time.

6.2.2 Sample

6.2.2 Sample

It is obvious that in an infinite population complete enumeration cannot


be secured. Even in a finite but large population it is often costly, time
consuming, or even physically impossible to conduct a census (complete
enumeration). Any inferences on the population should therefore be based
on a part of the population called the sample.

Definition 6.4 SAMPLE


A sample is a part of the population

An example of a sample includes selecting a few pupils from a particular


school in order to study the socio-economic background of pupils in that
school.
If a sample of n elements is drawn, the sample is said to be of size n.

6.2.3 Parameter and Statistic

From Chapter 7 through 12, we discussed a number of special distributions


but did not specify how they were realised in any particular case. Those
probabilities that we were calculating depended on certain parameters de-
scribing such distributions.

Population Parameter

Definition 6.5 POPULATION PARAMETER


A population parameter is a measure computed from observations of
the population

Examples of population parameter (or simply parameter ) are the values of


p in the case of the binomial distribution, λ in the case of the Poisson distri-
bution or µ and σ in the case of the Normal distribution. Other parameters
include the median and the range.
Most often the probability distribution of a population is not known
precisely although we may have some idea of or at least be able to make
some hypothesis concerning its general behaviour. Thus, for example, we
may have some reason to suppose that a particular population is binomially
distributed. In such a case we would not know the value of p and so we might
have to estimate it.
In short, we have no direct knowledge of the probability distribution
and therefore have to approximate it empirically by a frequency distribution.
The number of experiments performed for this purpose, called a sample, is
necessarily finite.

Definition 6.6 SAMPLE STATISTIC


A sample statistic (or simply statistic) is a193
measure calculated from
the sample
have to estimate.
In short, we have no direct knowledge of the probability distribution
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: Ahave
and therefore toCOURSE
FIRST approximate
IN it empirically by a frequency distribution.
The number THEORY
PROBABILITY of experiments
– VOLUMEperformed
III for this purpose, called DISTRIBUTIONS
SAMPLING a sample, is I Basic Concepts
necessarily finite.

Definition 6.6 SAMPLE STATISTIC


A sample statistic (or simply statistic) is a measure calculated from
the sample

Examples of statistic include the sample mean, the sample proportion, the
sample variance, the sample range.

Definition 6.7 SAMPLING


The process of obtaining samples from a population is called sam-
pling

As a rule, small populations, for obvious reasons, are not sampled. Instead,
the entire population is examined. A sample that contains all the members in
the population is called an exhaustive sampling, or a 100 per cent sampling,
which are, of course, only other names for a census.
There are basically two types of sampling: probability sampling and
non-probability sampling. In this book we shall consider only probability
sampling because it is only for probability sampling that there are sound
statistical procedures for drawing conclusions concerning the population of
interest based on the sample drawn.


6.3 PROBABILITY SAMPLING

6.3.1 Definition of Probability Sampling

Probability sampling is the scientific method by which the units of the sam-
ple are chosen based on some definite pre-assigned probability. The various
types of probability sampling include the cases where:

(a) each member of the population has an equal chance of being selected;

(b) the members of the population have different probabilities of being


selected;

(c) the probability of selecting a member is proportional to the sample


size.

Definition 6.8 PROBABILITY SAMPLE


A probability sample is a sample drawn from a population in such a
way that every member of the population has a known and nonzero
probability of being included in the sample

For a probability sample neither the sampler nor the member of the popu-
lation can decide which member will be included in the sample. The selec-
tion is achieved by the operation of chance alone. Synonym for probability
sampling is random sampling. Random sampling methods include simple
random sampling, systematic sampling, and stratified sampling. Samples
obtained by taking every k th name in a list are called systematic samples.
Sometimes a population P is divided into groups, P1 , P2 , · · · , Pr called
strata and random samples are taken from these strata and combined to
get a stratified random sample. This is often done when we want to be sure
that we shall have specified numbers of subjects from each stratum. Other
sampling techniques include cluster sampling and multistage sampling. For
details about sampling techniques see Nuamah (1994).
The simple random sampling is the only one that can be considered a
true random sampling and in this book unless otherwise stated, by random
sampling we mean simple random sampling.

6.3.2 Simple Random Sampling

Before giving the conventional definition of simple random sampling we shall


first consider a mathematical definition.
Let f (x) be the density function of a continuous random variable X
for the population being sampled. We are now interested in the values of
X taken by the elements of the sample. Suppose that k different samples of
size n each are drawn from this population. Let xi^(j) represent the ith
sample value (i = 1, 2, · · · , n) in the jth sample (j = 1, 2, · · · , k).
That is,
        1st sample:   x1^(1) , x2^(1) , · · · , xi^(1) , · · · , xn^(1)
        2nd sample:   x1^(2) , x2^(2) , · · · , xi^(2) , · · · , xn^(2)
           ⋮
        jth sample:   x1^(j) , x2^(j) , · · · , xi^(j) , · · · , xn^(j)
           ⋮
        kth sample:   x1^(k) , x2^(k) , · · · , xi^(k) , · · · , xn^(k)
n

The values in the ith column may be considered the values of a random
variable Xi corresponding to the ith trial of the sample having probability
density function fi (xi ). Thus,

(1) (2) (j) (k)


X1 = x1 , x1 , · · · , x1 , · · · , x1 with density function f1 (x1 )
(1) (2) (j) (k)
X2 = x2 , x2 , ···, x2 , ···, x2 with density function f2 (x2 )
.. ..
. .
(1) (2) (j) (k)
Xi = xi , xi , · · · , xi , · · · , xi with density function fi (xi )
.. ..
. .
j
Xn = x(1) (2) (k)
n , xn , · · · , x(n) , · · · , xn with density function fn (xn )

The probability density function f (xi ) has to fulfill two conditions in order
to describe the process of simple random sampling. These are given in
Definition 6.9.

Definition 6.9 SIMPLE RANDOM SAMPLING


(Mathematical Definition)
Simple random sampling is a method of sampling for which

(a) f (x1 , x2 , ..., xn ) = f1 (x1 )f2 (x2 )...fn (xn ) and

(b) f1 (x1 ) = f2 (x2 ) = ... = fn (xn ) = f (x)

where f (x) is the probability density function of the random variable


X for the population being sampled

Although the random variable X has been treated as continuous variable,


Definition 6.9 applies to both continuous and discrete variables. The definition
means that

(a) the successive selection of the element in the population are indepen-
dent and

(b) the density function of the random variable remains the same from
selection to selection as that of the population.

The independence condition can, to a certain extent, be controlled by com-


paring the empirical distributions for the 1st , 2nd , · · · elements obtained in
a large number of samples. However, it is not at all easy to be sure that
the sample is in fact drawn from a population196 described by the probability
density function f (x).
(a) the successive selection of the element in the population are indepen-
dent TOPICS
ADVANCED and IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN
(b) the density
PROBABILITY THEORYfunction of the
– VOLUME III random variable remains theDISTRIBUTIONS
SAMPLING same from I Basic Concepts
selection to selection as that of the population.

The independence condition can, to a certain extent, be controlled by com-


paring the empirical distributions for the 1st , 2nd , · · · elements obtained in
a large number of samples. However, it is not at all easy to be sure that
the sample is in fact drawn from a population described by the probability
density function f (x).

Definition 6.10 SIMPLE RANDOM SAMPLE


(Mathematical Definition)
Suppose x1 , x2 , ..., xn are values assumed by a corresponding set of
random variables X1 , X2 , ..., Xn which are independent and which
have the same distribution, then the xi ’s are referred to as a simple
random sample

Conventionally, we shall define simple random sampling and simple random


sample as follows.

Definition 6.11 SIMPLE RANDOM SAMPLING


(Conventional Definition)
Simple random sampling is any technique designed to draw sample
members from a population in such a way that each member in the
population has an equal chance of being selected

Definition 6.12 SIMPLE RANDOM SAMPLE


(Conventional Definition)
A simple random sample is a sample drawn in such a way that

(a) every member of the population has the same probability of


being included in the sample,

(b) different members of the population are selected independently


(that is, selection of one member has no effect on selection of
another)

We must emphasise that in actual process of sampling, it is often quite


difficult to maintain randomness and conscious effort must be made to ensure
that conditions (a) and (b) in Definition 6.12 are fulfilled. There is no
general recipe that can be given to ensure that the results from sampling
are reliable besides the use of objective procedures. Suppose that there is a
sampling frame which is the list of all members of the population of interest.
Then probably, the best method of simple random sampling is the use of a
computer generated table of random numbers.

Random Number Table

A random number table is a table of numbers constructed by a process that


ensures that

(a) in any position in the table, each of the numbers 0 through 9 has equal
    probability of 1/10 of occurring.
(b) The occurrence of any number in one part of the table is independent
of the occurrence of any number in any other part of the table (i.e.
knowing the numbers in one part of the table tells us nothing about
the numbers in another part of the table).
Table X of Appendix A presents a table of random numbers.

We shall illustrate in the following example how random numbers are used
in the selection of a sample.

Example 6.1
Select 15 students from a student population of 250 using a table of random
numbers.

Solution

(a) Assign a number to each member in the population

Compile the list of the students and assign each of the 250 students
a number from 001 to 250. Each student number is called a sampling
number.

(b) Select a page

If the table of random numbers is more than one page then select a
page at random (for example, by writing the page numbers on pieces
of paper and picking one at random). Suppose the third page of Table
X is selected.

(c) Pick an arbitrary starting point in the random table


These random numbers are in groups of five digits. This grouping is
arbitrary. Because our population size (N = 250) has three digits, we
need to pick three-digit numbers from the table.
Suppose we decide to take the first three digits of each five-digit num-
ber. To pick the starting point, close your eyes and place your pencil
point on the table so that we do not subconsciously look for a starting
point that includes a member of interest to us. Assume that your
pencil falls in the fourth column of numbers in the eleventh row.

The number is 46201. The first three digits of this number are 462.
This number is thrown out because it is larger than the population
size (N = 250). There is no student whose number is 462.

(d) Determine the numbers

We select numbers for our sample by moving in some (predetermined)


direction away from the starting point. We can go up, down or across.
Suppose we had decided beforehand to move across the row. Reading
the next first three-digit number we get 045 and the student assigned
number 045 is the first member of our sample. The next three-digit
number going across to the right gives us 425 which we throw out.
Moving across we look for the next three digit number which is not
larger than the population size of 250. This is 211 so the second per-
son in the sample is the student numbered 211. The next three-digit
number is 763 which we throw out. This process of picking numbers
continue until 15 students (the sample size n) are selected. The re-
sulting samples are:

045 211 006 194 214 212 029 276 044 165 154 080
076 043 244

Note
If, by chance, the same number occurs more than once, we would
ignore it, after it has appeared for the first time.
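In practice the same selection is usually carried out in software rather than with a printed table. A minimal sketch in Python (the seed is arbitrary and only makes the illustration reproducible):

    import random

    random.seed(2018)                             # arbitrary seed, for reproducibility only
    sample = random.sample(range(1, 251), k=15)   # 15 distinct sampling numbers from 001 to 250
    print([f"{number:03d}" for number in sample])

random.sample draws without replacement, so no sampling number can occur twice.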

6.4 SAMPLING WITH AND WITHOUT REPLACEMENT

If we select a member from a population we have the choice of replacing or


not replacing the member in the population before a second selection. In
the first case a particular member can come up again and again, whereas in
the second, it can come up only once. This leads us to Definitions 6.13 and
6.14. 199
Note
If, byTOPICS
ADVANCED chance, the same number occurs more than once, we would
IN INTRODUCTORY
ignore it, after it
PROBABILITY: A FIRST COURSEhas appeared
IN for the first time.
PROBABILITY THEORY – VOLUME III SAMPLING DISTRIBUTIONS I Basic Concepts

6.4 SAMPLING WITH AND WITHOUT REPLACEMENT

If we select a member from a population we have the choice of replacing or


not replacing the member in the population before a second selection. In
the first case a particular member can come up again and again, whereas in
the second, it can come up only once. This leads us to Definitions 6.13 and
6.14.

Definition 6.13 RANDOM SAMPLING WITH


REPLACEMENT
If n members are selected from a population of N members, sequen-
tially one by one, such that at every stage the probability of selection
of any member is 1/N irrespective of whether this member has been
selected earlier or not, the procedure is known as random sampling
with replacement

In general, when sampling is with replacement from a population of N mem-


bers, the number of possible samples of size n is equal to Nⁿ.
Sampling with replacement is often associated with an infinite popula-
tion. Thus the expressions “sample from an infinite population”, “sampling
with replacement”, “samples are independent” and “simple random sample”
are all equivalent.
A finite population which is sampled with replacement can theoreti-
cally be considered infinite since samples of any size can be drawn without
exhausting the population. For more practical purposes, sampling from a
finite population which is very large can be considered as sampling from an
infinite population.

Definition 6.14 RANDOM SAMPLING WITHOUT


REPLACEMENT
If at each stage of selection of n members from a population of N
members sequentially one by one, a member is selected with equal
probability only from among members not selected earlier, sampling
is said to be random sampling without replacement

Sampling without replacement is often associated with finite population and


the expressions “sample from a finite population”, “sampling without re-
placement” and “samples are not independent” are all equivalent.
When drawing samples of size n from a finite population of size N
without replacement, and ignoring the order in which the sample values are
drawn, the number of possible samples is given by the combination of N
objects taken n at a time. From Chapter 2 this is given by

        C(N, n) = N! / (n! (N − n)!)
The probability for each subset of n of the N objects of the finite
population is
        1 / C(N, n)
We can also determine the joint probability distributions of the random
variables from a random sample of size n from a finite population by
        f (x1 , x2 , · · · , xn ) = 1 / [N (N − 1) · · · (N − n + 1)]
To generalise: when a simple random sample of size n is drawn from
a finite population with replacement, or from an infinite population with
or without replacement, we have n identically distributed random variables
X1 , X2 , · · · , Xn all possessing the same distribution as the parent population.
Also, if the population is finite but large then even when sampling is without
replacement, the sample observations, xi , can in practice be treated as if
they were identical and independent.
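The counts and probabilities in this section are straightforward to compute (a sketch using math.comb and math.perm from Python 3.8+; N = 250 and n = 15 are simply the numbers of Example 6.1, reused for illustration):

    from math import comb, perm

    N, n = 250, 15

    print(N ** n)           # samples of size n with replacement: N^n
    print(comb(N, n))       # samples without replacement, order ignored: C(N, n)
    print(1 / comb(N, n))   # probability of each such subset
    print(1 / perm(N, n))   # joint probability 1/[N(N−1)···(N−n+1)] of an ordered sample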

EXERCISES

6.1 Write short notes on the following:

(a) Simple random sampling


(b) Systematic sampling
(c) Stratified sampling
(d) Cluster sampling
(e) multistage sampling

6.2 The management of an establishment wants to estimate the age of


twenty workers from the workers population of 300. Use random num-
bers to draw this sample.

Chapter 7

SAMPLING DISTRIBUTIONS II
Sampling Distribution of Statistics

7.1 INTRODUCTION

7.1.1 Definition of Statistic

Having determined in Chapter 6 what a random sample is, we can now examine
the distributions of statistics calculated from random samples. Though we have
already defined what “statistic” is, we shall now present its mathematical
definition.
Definition 7.1 STATISTIC

Let X1 , X2 , ..., Xn be a random sample from a random variable and
let x1 , x2 , ..., xn be the values assumed by the sample. Then the
real-valued function

        Y = G(X1 , X2 , ..., Xn )

is a statistic which assumes the value y = G(x1 , x2 , ..., xn )

Examples of a statistic include

(a) the sample mean: X̄ = (1/n) Σ_{i=1}^n Xi

(b) the sample variance: ŝ² = (1/(n − 1)) Σ_{i=1}^n (Xi − X̄)²

(c) Minimum of the sample: K = min(X1 , X2 , ..., Xn )

(d) Maximum of the sample: M = max(X1 , X2 , ..., Xn )

(e) the sample range: R = M − K

That is, a statistic is a formula for obtaining values of the function. However,
sometimes the term statistic is used to refer to the values themselves. Thus
we may speak of the statistic y = G(x1 , x2 , ..., xn ) when we really should
say that y is the value of the statistic Y = G(X1 , X2 , ..., Xn ).

7.1.2 Sample Observations as Random Variables

Earlier we have referred to sampling as a process of selecting a sample from
a population or the range of a random variable. An observation made from
the range of a random variable is, again, a random variable. This is because
an observed value can assume any one of all possible values of a random
variable.
Suppose X1 , X2 , ..., Xn is a simple random sample and x1 , x2 , ..., xn are
7.1.2 Sample Observations as Random Variables
the actual observations of a specific sample that has been drawn. Thus,
the term “random sample” refers to the observations thought of as random
Earlier we have referred to sampling as a process of selecting a sample from
variables but not to a particular set of actual realisations. However, since
a population or the range of a random variable. An observation made from
the Xi are identically distributed, a specific collection of sample observations
the range of a random variable is, again a random variable. This is because
can be thought of as the particular values of a random variable X. In
an observed value can assume any one of all possible values of a random
connection with this, for a random sample of size n, we may treat the xi
variable.
as realisation of X : X = x1 , X = x2 , ..., X = xn . Consequently, if
Suppose X1 , X2 , ..., Xn is a simple random sample and x1 , x2 , ..., xn are
the sample is random then sample observations thought of as independent
the actual observations of a specific sample that has been drawn. Thus,
random variables
Sampling and sample
Distributions observations thought of as a particular set221
II refers of
the term “random sample” to the observations thought of as random
sample values are equivalent and can be used interchangeably.
variables but not to a particular set of actual realisations. However, since
the
7.1.3 Xi areDefinition
identically and
distributed, a specificof
Construction collection
Sampling of sample observations
Distribution of
can be thought
11
The reason for
Statisticsof as the particular values of a random variable
dividing by n − 1 rather than the seemingly more logical choice, In
X.n, will
be explained in
connection Section
with this,7.5.
for a random sample of size n, we may treat the xi
as
Werealisation
can consider of Xall :possible
X = xsamples
1 , X = ofx2size = xncan
, ..., nXwhich . Consequently,
be drawn from if
the sample is random
the population, and fortheneachsample
sampleobservations
we compute thought of of
a statistic as interest.
independentThe
random variables and sample observations thought of as a particular
idea that sample observations are random variables leads us to the conclusion set of
sample
that anyvalues are computed
quantity equivalentfrom and sample
can be observations
used interchangeably.
must also be a random
variable. As a random variable, the computed quantity or statistic has
a 11probability
The reason fordistribution
dividing by nof
− 1its ownthan
rather called probability
the seemingly moredistribution
logical choice,of
n, the
will
statistic
be or insampling
explained distribution of the statistic which may be established
Section 7.5.
directly from the parent population distribution.

Definition 7.2 SAMPLING DISTRIBUTION OF A STATISTIC


203
The distribution of all possible values that can be assumed by some
Statistics
ADVANCED TOPICS IN INTRODUCTORY
We can consider all possible samples of size n which can be drawnSAMPLING
PROBABILITY: A FIRST COURSE IN
from DISTRIBUTIONS II
the population,
PROBABILITY and for
THEORY each sample
– VOLUME III we compute a statisticSAMPLING
of interest. The
DISTRIBUTION OF STATISTICS
idea that sample observations are random variables leads us to the conclusion
that any quantity computed from sample observations must also be a random
variable. As a random variable, the computed quantity or statistic has
a probability distribution of its own called probability distribution of the
statistic or sampling distribution of the statistic which may be established
directly from the parent population distribution.

Definition 7.2 SAMPLING DISTRIBUTION OF A STATISTIC

The distribution of all possible values that can be assumed by some


statistic, computed from samples of the same size randomly drawn
from the same population, is called the sampling distribution of that
statistic

For a given sampling distribution we are interested in knowing three


things, namely, the mean, the variance and the form of the distribution.
Even though in practice we would not at all attempt to empirically
construct the sampling distribution of a statistic, we can conceptualise the
manner in which it can be done when sampling is from a discrete, finite
population of size N . The procedure is as follows:

(a) Randomly select all possible samples of size n from the finite popula-
tion of size N .

(b) Compute the statistic of interest for each sample.

(c) List the different distinct observed values of the statistic together with
the corresponding frequency of occurrence of each distinct observed
value of the statistic.

(d) Compute the relative frequency (probabilities) of the occurrence of the


222 observed statistic. Advanced Topics in Introductory Probability

For infinite or large finite populations, one could approximate sampling dis-
tribution of a statistic by drawing large number of independent simple ran-
dom samples and by proceeding in the manner just described above. In
fact, the actual construction of a sampling distribution according to the
steps given above is a tedious task. Fortunately, there are theorems that
simplify things for us.
In the subsequent sections we shall introduce sampling distributions
for the most frequently encountered statistics: the mean, proportion and
variance. We shall now introduce an example that will be used most often
in this chapter.

Example 7.1
Suppose a population consists of four numbers: 4, 5, 7, 8.
Find

(a) the population mean;

(b) the population variance and population standard deviation.

Solution

(a) The population mean µ is


N 204
i=1 Xi 4+5+7+8 24
µ= = = =6
Find
ADVANCED TOPICS IN INTRODUCTORY
(a) the population
PROBABILITY: mean; IN
A FIRST COURSE SAMPLING DISTRIBUTIONS II
PROBABILITY THEORY – VOLUME III SAMPLING DISTRIBUTION OF STATISTICS
(b) the population variance and population standard deviation.

Solution

(a) The population mean µ is


N
i=1 Xi 4+5+7+8 24
µ= = = =6
N 4 4

(b) The population variance σ 2 is


N
i=1 (Xi − µ)2
σ 2
=
N
(4 − 6)2 + (5 − 6)2 + (7 − 6)2 + (8 − 6)2
=
4
4+1+1+4
=
4
= 2.5

The population standard deviation is



σ = 2.5 = 1.5811

This e-book
is made with SETASIGN
SetaPDF

PDF components for PHP developers

www.setasign.com

205
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN SAMPLING DISTRIBUTIONS II
PROBABILITY THEORY – VOLUME
Sampling Distributions II III SAMPLING DISTRIBUTION
223 OF STATISTICS

7.2 SAMPLING DISTRIBUTION OF MEANS

7.2.1 Definition of Sampling Distribution of Means

Let x1 , x2 , ..., xn be the values assumed by a corresponding set of random


variables X1 , X2 , ..., Xn . We refer to the xi ’s as a random sample of size
n. We are accustomed to thinking of the sample mean X as a specific
number calculated from a specific sample. If we consider only one sample,
this is correct. If we consider more than one sample, we need to think of X
differently. The value of X usually changes as the sample changes. When we
select different samples from a population, we generally get different values
for X. The mean X of such a sample is a value assumed by the random
variable
X1 + X2 + ... + Xn
X=
n
The sample mean X therefore is a random variable because X1 , X2 , ..., Xn
corresponding to n trials of the sample values x1 , x2 , ..., xn are random vari-
ables. After a particular random sample has been taken, X will be a number,
but before it has been drawn it will be a random variable capable of assuming
any value that the original variable X can assume.

Definition 7.3 SAMPLING DISTRIBUTION OF MEANS


The probability distribution of the sample statistic X, is called the
sampling distribution for the sample mean or the sampling distribu-
tion of means

7.2.2 Sampling Distribution of Means from Infinite Population


In the previous section we conceptualised the manner in which the sampling
distribution of a statistic can be empirically constructed. In the following
example we shall illustrate the procedure with the sampling distribution of
means.

Example 7.2
Refer to Example 7.1.
(a) List all possible samples Advanced
224 of size twoTopics
that can be drawn from
in Introductory the pop-
Probability
ulation with replacement.

(b) Find
(i) the mean of the distribution of the means;
(ii) the variance and standard deviation of the distribution of the
means.

Solution
(a) N = 4, n = 2
There are N n = 42 = 16 possible samples of size two that can be drawn
from a population of size four (where sampling is with replacement).
These are
(4, 4) (4, 5) (4, 7) (4, 8) (5, 4) (5, 5) (5, 7) (5, 8)
(7, 4) (7, 5) (7, 7) (7, 8) (8, 4) (8, 5) (8, 7) (8, 8)

Note
206
Here the notation (a, b) is an ordered pair. For example, (4, 5) denotes
means.

Solution TOPICS IN INTRODUCTORY


ADVANCED
PROBABILITY: A FIRST COURSE IN SAMPLING DISTRIBUTIONS II
(a) N = THEORY
PROBABILITY 4, n =– 2VOLUME III SAMPLING DISTRIBUTION OF STATISTICS
There are N n = 42 = 16 possible samples of size two that can be drawn
from a population of size four (where sampling is with replacement).
These are
(4, 4) (4, 5) (4, 7) (4, 8) (5, 4) (5, 5) (5, 7) (5, 8)
(7, 4) (7, 5) (7, 7) (7, 8) (8, 4) (8, 5) (8, 7) (8, 8)

Note
Here the notation (a, b) is an ordered pair. For example, (4, 5) denotes
“first a 4 and then a 5” and is different from (5, 4) which denotes “first
a 5 and then a 4.”
(b) (i) The corresponding sample means X i are
4.0 4.5 5.5 6.0 4.5 5.0 6.0 6.5
5.5 6.0 7.0 7.5 6.0 6.5 7.5 8.0
The sample mean and their associated probabilities (relative fre-
quencies) are as follows:

Possible values of X i Frequency Relative Frequency


(Probability)
4.0 1 0.0625
4.5 2 0.1250
5.0 1 0.0625
5.5 2 0.1250
6.0 4 0.2500
6.5 2 0.1250
7.0 1 0.0625
7.5 2 0.1250
8.0 1 0.0625
Total II
Sampling Distributions 16 1.0000 225

For the distribution of the sample means Xi , the expected value is



µX = E(X i ) = xi p(xi )
= 4(0.0625) + 4.5(0.1250) + 5(0.0625) + 5.5(0.1250)
+6(0.2500) + 6.5(0.1250) + 7(0.0625)0 + 7.5(0.1250)
+8(0.0625)
= 6

which is identical with the population mean obtained in Example 7.1(a).


Thus the mean of the sample means is equal to the population mean.

Note
Without calculating the probabilities we could have obtained the mean of
the sampling distribution of means from the raw data (the sample means
themselves) as:

X 4.0 + 4.5 + 5.0 + · · · + 6.5 + 7.0 + 7.5 + 8.0
µX = =
N2 42
96
=
16
= 6

(ii) The variance of the distribution of sample means is:



(Xi − µX )2
2
σX =
Nn 207
(4 − 6)2 + (4.5 − 6)2 + · · · + (8 − 6)2

X 4.0 + 4.5 + 5.0 + · · · + 6.5 + 7.0 + 7.5 + 8.0
µX = =
ADVANCED TOPICS N IN2INTRODUCTORY 42
96
PROBABILITY: A FIRST COURSE IN SAMPLING DISTRIBUTIONS II
=
PROBABILITY THEORY
16 – VOLUME III SAMPLING DISTRIBUTION OF STATISTICS
= 6

(ii) The variance of the distribution of sample means is:



(Xi − µX )2
2
σX =
Nn
(4 − 6)2 + (4.5 − 6)2 + · · · + (8 − 6)2
=
16
4 + 2.25 + 1 + 0.25 + 0 + 0.25 + 1 + 2.25 + 4
=
16
= 1.25

The standard deviation of the distribution of the means is



σX = 1.25 = 1.118

Note
226 variance of the sampling distribution
The of means
Advanced Topics is not equal to
in Introductory the pop-
Probability
ulation variance.
As has been pointed out earlier, it is burdensome to construct the
sampling distribution of means in the way it has been done. In practice, we
shall employ the following two theorems to obtain the mean and variance of
the sample mean.

Theorem 7.1
Let X be a random variable with expectation E(X) = µ and variance
Var(X) = σ 2 . Let X be the sample mean of a random sample of size
n. Then the mean of the sampling distribution of means is given by

µX = E(X) = µ

Sharp
The theorem aboveMinds - Bright
states that Ideas!
the expected value of the sample mean is the
population mean.
Employees at FOSS Analytical A/S are living proof of the company value - First - using The Family owned FOSS group is
new inventions to make dedicated solutions for our customers. With sharp minds and the world leader as supplier of
Proof cross functional teamwork, we constantly strive to develop new unique products - dedicated, high-tech analytical
Since theWould
sample mean
you like to join is
ourdefined
team? as solutions which measure and
control the quality and produc-
FOSS works diligently withX 1 + X2and
innovation + development
· · · + Xn as basis for its growth. It is tion of agricultural, food, phar-
reflected in theX = more than 200 of the 1200 employees in FOSS work with Re-
fact that maceutical and chemical produ-
search & Development in Scandinavia
n
and USA. Engineers at FOSS work in production,
1   cts. Main activities are initiated
development and marketing, within a wide range of different fields, i.e. Chemistry,
E(X) = E(X
Electronics, Mechanics, Software, 1 ) + E(X 2 ) + · · · + E(X
Optics, Microbiology, Chemometrics. n ) from Denmark, Sweden and USA
n with headquarters domiciled in
1 Hillerød, DK. The products are
We offer
= (nµ) marketed globally by 23 sales
n
A challenging job in an international and innovative company that is leading in its field. You will get the
= µ
opportunity to work with the most advanced technology together with highly skilled colleagues.
companies and an extensive net
of distributors. In line with
the corevalue to be ‘First’, the
Read more about FOSS at www.foss.dk - or go directly to our student site www.foss.dk/sharpminds where
company intends to expand
you can learn more about your possibilities of working together with us on projects, your thesis etc.
Theorem 7.2 its market position.

If a Dedicated
populationAnalytical
is infiniteSolutions
or if sampling is with replacement, then
the variance
FOSS
of the sampling 2 ,
distribution of means, denoted by σX
is given by
Slangerupgade 69
3400 Hillerød σ2
Tel. +45 70103370 Var(X) = σ 2
X
=
n
www.foss.dk

That is, the variance of the sampling distribution is equal to the population
variance divided by the size of the sample used to obtain the sampling
distribution. 208
As has
ADVANCED been INpointed
TOPICS out earlier, it is burdensome to construct the
INTRODUCTORY
PROBABILITY: A FIRST COURSE IN in the way it has been done. In practice,
sampling distribution of means we
SAMPLING DISTRIBUTIONS II
shall employ the following two theorems to obtain the mean and variance of
PROBABILITY THEORY – VOLUME III SAMPLING DISTRIBUTION OF STATISTICS
the sample mean.

Theorem 7.1
Let X be a random variable with expectation E(X) = µ and variance
Var(X) = σ 2 . Let X be the sample mean of a random sample of size
n. Then the mean of the sampling distribution of means is given by

µX = E(X) = µ

The theorem above states that the expected value of the sample mean is the
population mean.

Proof
Since the sample mean is defined as
X1 + X 2 + · · · + X n
X =
n
1 
E(X) = E(X1 ) + E(X2 ) + · · · + E(Xn )
n
1
= (nµ)
n
= µ

Theorem 7.2
If a population is infinite or if sampling is with replacement, then
the variance of the sampling distribution of means, denoted by σX 2 ,

is given by
σ2
Var(X) = σX 2
=
n

That is, the variance of the sampling distribution is equal to the population
variance divided by the size of the sample used to obtain the sampling
Sampling Distributions II 227
distribution.

Proof
 
X 1 X2 Xn
Var(X) = Var + + ... +
n n n
1 
= Var(X 1 ) + Var(X 2 ) + ... + Var(X n )
n2
1 σ 2
= 2
(nσ 2 ) =
n n

Note
Theorems 7.1 and 7.2 do not assume Normality of the “parent” population.

Definition 7.4 STANDARD ERROR OF MEAN


The positive square root of the variance of the mean is referred to as
the standard error of the sample mean

The standard error of the mean measures chance variations of sample mean
from sample to sample. Denoting the standard error by σX , it is given by
209
σ
σ =√
Definition 7.4 STANDARD ERROR OF MEAN
ADVANCED TOPICS IN INTRODUCTORY
The positive
PROBABILITY: square
A FIRST root of
COURSE IN the variance
of the mean is referred to SAMPLING
as DISTRIBUTIONS II
the standard
PROBABILITY error
THEORY of the sample
– VOLUME III mean SAMPLING DISTRIBUTION OF STATISTICS

The standard error of the mean measures chance variations of sample mean
from sample to sample. Denoting the standard error by σX , it is given by
σ
σX = √
n

This formula shows that the standard error of the mean decreases when n,
the sample size, is increased. This means that when n is sufficiently large
and we actually have more information, sample means can be expected to
be closer to µ, the quantity which X is usually supposed to estimate.

Example 7.3
Refer to Example 7.1. Employ Theorems 7.1 and 7.2 to obtain the variance
of the sampling distribution of means if a sample of size two is drawn with
replacement from the population.

Solution
From Example 7.2, N = 4, σ 2 = 2.5.
Also, the sample size n = 2. Hence

σ2 2.5
228 Var(X) = = Topics
Advanced = 1.25
in Introductory Probability
n 2

Although Theorems 7.1 and 7.2 give us some characteristics of the sampling
distribution, it does not permit us to calculate probabilities because we do
not know the form of the sampling distributions. To be able to do this we
need to use the Central Limit Theorem, as it can be seen later.

7.2.3 Sampling Distributions of Means from Finite Population


The discussions so far have been based on the assumption that samples are
either drawn with replacement or are drawn from infinite populations. In
general, we do not sample with replacement, and in most practical situations
it is necessary to sample from a finite population; hence, we need to become
familiar with the behaviour of the sampling distribution of the sample mean
when sampling is drawn without replacement or from finite populations.

Example 7.4
Refer to Example 7.1

(a) List all possible samples of size two that can be drawn from the pop-
ulation without replacement.

(b) Find

(i) the mean of the distribution of the means;


(ii) the variance and standard deviation of the distribution of the
means.

Solution
N = 4, n = 2
 
4
(a) There are = 6 samples of size two which can be drawn without
2
replacement, namely,

(4, 5) (4, 7) (4, 8) (5, 7)210 (5, 8) (7, 8)


(b) Find

(i) the
ADVANCED mean
TOPICS of the distribution of the means;
IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN SAMPLING DISTRIBUTIONS II
(ii) the variance and standard deviation of the distribution of the
PROBABILITY THEORY – VOLUME III SAMPLING DISTRIBUTION OF STATISTICS
means.

Solution
N = 4, n = 2
 
4
(a) There are = 6 samples of size two which can be drawn without
2
replacement, namely,

(4, 5) (4, 7) (4, 8) (5, 7) (5, 8) (7, 8)

Note
Sampling
The Distributions II example, is considered the same as (5, 4).
sample (4, 5) for 229

(b) (i) The means of the corresponding samples, Xi in (a) are


4.5 5.5 6.0 6.0 6.5 7.5
Therefore the mean of the sampling distribution of means is
n
i=1X
µX = 
4
2
4.5 + 5.5 + 6.0 + 6.0 + 6.5 + 7.5
=
6
= 6

Note
Once again the mean of the sampling distribution of means is equal to
the population mean.
(ii) The variance of this sampling distribution is

(Xi − µX )
2
σX =  
N
n
(4.5 − 6)2 + (5.5 − 6)2 + · · · + (7.5 − 6)2
=  
4
2
2.25 + 0.25 + 0 + 0 + 0.25 + 2.25
=
6
5
= = 0.83333
6

5
σX = = 0.9129
6

Note
As in the case of sampling with replacement, the variance of the sam-
pling distribution is not equal to the population variance. Moreover,
it is not equal to the population variance divided by the sample size
(σX2 = 5 = 2.5 ). The formula for obtaining the variance of the sam-
6 4
pling distribution of means in the case of sampling without replace-
ment is given in Theorem 7.3.

211
4
2
ADVANCED TOPICS IN INTRODUCTORY
4.5 + 5.5 + 6.0 + 6.0 + 6.5 + 7.5
= IN
PROBABILITY: A FIRST COURSE SAMPLING DISTRIBUTIONS II
PROBABILITY THEORY – VOLUME III 6 SAMPLING DISTRIBUTION OF STATISTICS
= 6

Note
Once again the mean of the sampling distribution of means is equal to
the population mean.
(ii) The variance of this sampling distribution is

(Xi − µX )
2
σX =  
N
n
(4.5 − 6)2 + (5.5 − 6)2 + · · · + (7.5 − 6)2
=  
4
2
2.25 + 0.25 + 0 + 0 + 0.25 + 2.25
=
6
5
= = 0.83333
6

5
σX = = 0.9129
6

Note
As in the case of sampling with replacement, the variance of the sam-
pling distribution is not equal to the population variance. Moreover,
it is not equal to the population variance divided by the sample size
(σX2 = 5 = 2.5 ). The formula for obtaining the variance of the sam-
6 4
pling distribution of means in the case of sampling without replace-
230 ment is given in Theorem 7.3.
Advanced Topics in Introductory Probability

Theorem 7.3
If X is the mean of a random sample of size n from a finite
population of size N (or if sampling is without replacement), whose
mean is µ and variance is σ 2 , then

E(X) = µ
σ2 N − n
Var(X) = ·
n N −1

The formula for obtaining Var(X) in Theorem 7.2, which applies to values
assumed by independent random variables (or sampling with replacement
from a finite population) and the one in Theorem 7.3, which applies to
sampling without replacement from a finite population, differ by the factor
N −n
. If N , the size of the population, is larger compared to n, the size
N −1
of the sample, the difference between the two formulas becomes negligible.
Indeed, the formula in Theorem 7.2 is frequently used as an approxima-
tion for the variance of the distribution of X for samples obtained without
replacement from sufficiently large finite populations.

Example 7.5
For the data of Example 7.1 if a sample of size two, is drawn without re-
placement, calculate the variance and hence the standard deviation of the
sampling distribution of means.
212
Solution
. If N , the size of the population, is larger compared to n, the size
N −1
of the sample, the difference between the two formulas becomes negligible.
ADVANCED TOPICS IN INTRODUCTORY
Indeed, the formula
PROBABILITY: in Theorem
A FIRST COURSE IN 7.2 is frequently used as an approxima-
SAMPLING DISTRIBUTIONS II
tion for the variance
PROBABILITY THEORY –of the distribution
VOLUME III of X for samples obtained
SAMPLINGwithout
DISTRIBUTION OF STATISTICS
replacement from sufficiently large finite populations.

Example 7.5
For the data of Example 7.1 if a sample of size two, is drawn without re-
placement, calculate the variance and hence the standard deviation of the
sampling distribution of means.

Solution
Since sampling is without replacement, we use the formula in Theorem 7.3
σ 2 = 2.5, N = 4, n=2
σ2 N − n
Var(X) = ·
n N −1
2.5 4 − 2
= ·
2 4−1
= 0.83333
Hence the standard error of the mean is
Sampling Distributions II √ 231
σX = 0.83333 = 0.9129

Note
These results are equal to the ones obtained in Example 7.4

7.2.4 Sampling Distribution of Means from Normal


Distribution

In sampling distribution of means we are assured of at least an approximately


Normal distribution under three conditions. These are when sampling is
from

(i) a normally distributed population;

(ii) a non-normally distributed population but the sample size is large;

(iii) a population whose distribution is unknown but the sample size is


large.

Theorem 7.4
If the population from which random samples are taken is normally
distributed with mean µ and variance σ 2 then the sample mean X
σ2
is normally distributed with mean µ and variance
n

Proof
From Theorem 6.22 of Volume I it follows that
 
t
MX (t) = M 1 (X1 +X2 ...+Xn ) (t) = MX1 +...+Xn
n n

Since the sampling is random (that is, simple random), the variables X1 ,
X2 , .., Xn are independent, and therefore Theorem 6.28 of Volume I may be
applied to give
     
t t t
MX (t) = MX1 MX2 ...MX n (i)
n n n
From Definition 7.2 all the random variables X1 , X2 , ..., Xn have the same
probability density function, namely that of X, and hence the same moment
generating function. Consequently, all the moment
213 generating functions
Since the sampling is random (that is, simple random), the variables X1 ,
X2 , .., Xn are independent, and therefore Theorem 6.28 of Volume I may be
ADVANCED TOPICS IN INTRODUCTORY
applied to give
PROBABILITY: A FIRST COURSE IN SAMPLING DISTRIBUTIONS II
PROBABILITY THEORY – VOLUME III     SAMPLING DISTRIBUTION OF STATISTICS
t t t
MX (t) = MX1 MX2 ...MX n (i)
n n n
From
232 Definition 7.2 all the random variables
Advanced TopicsXin
1, X 2 , ..., Xn have
Introductory the same
Probability
probability density function, namely that of X, and hence the same moment
generating function. Consequently, all the moment generating functions
on the right hand side of (i) are the same function, namely the moment
generating function of the random variable X. Thus
  n
t
MX (t) = MX
n

Since the moment generating function of the Normal distribution N (µ, σ 2 )


is given by (see Theorem 11.7 of Volume I)
1 2 2
MX (t) = eµ t + 2 t σ
,

t
replacing t by in the formula will yield
n
   n
t t2
µ + 12 σ2
MX (t) = e n n2

1 2 σ2
= eµ t+ 2 t n

Therefore, by the Uniqueness Property of m.g.f. (Theorem 6.26 of Volume


σ2
I), X has the Normal distribution N (µ, ).
n
Example 7.6
A random sample of 20 students was taken from a normally distributed
population with mean age of 17 years and variance of 5.

(a) Find the


The Wake
(i) expectation of the sample mean;
thevariance
(ii) only emission we standard
and hence the want todeviation
leave behind
of the sample mean.

(b) What is the probability that the mean of this sample will fall between
16 and 23?

Solution

(a) n = 20, µ = 17, σ2 = 5

(i) E(X) = µ = 17

.QYURGGF'PIKPGU/GFKWOURGGF'PIKPGU6WTDQEJCTIGTU2TQRGNNGTU2TQRWNUKQP2CEMCIGU2TKOG5GTX

6JGFGUKIPQHGEQHTKGPFN[OCTKPGRQYGTCPFRTQRWNUKQPUQNWVKQPUKUETWEKCNHQT/#0&KGUGN6WTDQ
2QYGTEQORGVGPEKGUCTGQHHGTGFYKVJVJGYQTNFoUNCTIGUVGPIKPGRTQITCOOGsJCXKPIQWVRWVUURCPPKPI
HTQOVQM9RGTGPIKPG)GVWRHTQPV
(KPFQWVOQTGCVYYYOCPFKGUGNVWTDQEQO

214
1 2 σ2
= eµ t+ 2 t n

ADVANCED TOPICS IN INTRODUCTORY


Therefore, byAthe
PROBABILITY: Uniqueness
FIRST COURSE INProperty of m.g.f. (Theorem 6.26 of Volume
SAMPLING DISTRIBUTIONS II
PROBABILITY THEORY – VOLUME III σ2 SAMPLING DISTRIBUTION OF STATISTICS
I), X has the Normal distribution N (µ, ).
n
Example 7.6
A random sample of 20 students was taken from a normally distributed
population with mean age of 17 years and variance of 5.

(a) Find the

(i) expectation of the sample mean;


(ii) variance and hence the standard deviation of the sample mean.

(b) What is the probability that the mean of this sample will fall between
16 and 23?

Solution

(a) n = 20, µ = 17, σ 2 = 5


Sampling Distributions II 233
(i) E(X) = µ = 17

σ2 5
(ii) Var(X) = = = 0.25
 n 20
σ2 √
σX = = 0.025 = 0.5
n
(b) Though n = 20 is small, we still apply the Normal distribution because
the population from which the sample is drawn is normally distributed.
 
16 − 17 X − 17 23 − 17
P (16 < X < 23) = P √ < √ < √
0.5 0.5 0.5
= P (−2 < Z < 12)
= Φ(12) − Φ(−2)
= 1 − {1 − Φ(2)}
= Φ(2)
= 0.9872

7.2.5 Sampling Distribution of Means from Non-Normal


Distribution

We are often faced with the problem of sampling from non-normally dis-
tributed populations or from populations whose distributions are not known.
Under these two conditions, we shall have to take large samples since when
the sample size is sufficiently large, by virtue of the Central Limit Theorem
(Theorem 5.16), the means of random samples from any distribution will
σ2
tend to be normally distributed with a mean µ and variance . This per-
n
mits the use of the Normal distribution approximation as though sampling
is from normally distributed populations. Thus inference procedures based
on the sample mean can often use the Normal distribution. But we must be
careful not to imput normality to the original observations.

Example 7.7
A random sample of size 100 is taken from a population with mean 60 and
variance 300.

(a) Find the 215

(i) expectation of the sample mean,


σ
tend to be normally distributed with a mean µ and variance . This per-
n
mits the use of the Normal distribution approximation as though sampling
ADVANCED TOPICS IN INTRODUCTORY
is from normally
PROBABILITY: distributed
A FIRST COURSE INpopulations. Thus inference procedures SAMPLING
based DISTRIBUTIONS II
on the sample
PROBABILITY mean can
THEORY often use
– VOLUME III the Normal distribution.SAMPLING
But we must be
DISTRIBUTION OF STATISTICS
careful not to imput normality to the original observations.

Example 7.7
A random sample of size 100 is taken from a population with mean 60 and
variance 300.

(a) Find the


234 Advanced Topics in Introductory Probability
(i) expectation of the sample mean,
(ii) variance and hence the standard error of the sample mean.

(b) What is the probability that the sample mean will be less than 56?

Solution
n = 100 µ = 60 Var(X) = 300

(a) (i) E(X) = µ = 60


300
(ii) Var(X) = =3
100 √
and hence σ = 3 = 1.732

(b) The sampling distribution of the mean is not known but the sample
size is large so by the Central Limit Theorem we approximate it by
the Normal distribution.
 
X n − 60 56 − 60
P (X n < 56) = P √ < √
3 3
= P (Zn < −2.31)
= Φ(−2.31)
= 1 − Φ(2.31)
= 1 − 0.9896
= 0.0104

7.3 SAMPLING DISTRIBUTION OF PROPORTIONS

7.3.1 Construction of Sampling Distribution of Proportions

In the previous section we discussed the sampling distribution of a statistic


computed from measured variables, namely, the sample mean. In this section
we shall deal with the sampling distribution of statistics that result from
counts or frequency data, namely, a sample proportion.
k
Let a population proportion, p, be defined as p = , where k is the
N
number of members of the population that possess a certain trait and N is
the total number
Sampling of members
Distributions II of the population. Let us also designate 235
the
x
sample proportion by the symbol p̂ (read p hat) and is defined as , where
n
x is the number of elements in the sample that possess a certain trait and n
is the sample size. The sample distribution of the sample proportion could
be constructed experimentally in exactly the same manner as was suggested
in the case of the sample mean. From the population, which we assume to
be finite, we would

(a) take all possible samples of a given size,

(b) for each sample compute the sample proportion p̂,

(c) prepare a frequency distribution of p̂ by 216


listing the different values of
p̂ along with their frequencies of occurrence.
x is the number of elements in the sample that possess a certain trait and n
is the sample size. The sample distribution of the sample proportion could
ADVANCED TOPICS IN INTRODUCTORY
be constructed
PROBABILITY: experimentally
A FIRST COURSE INin exactly the same manner as was suggested
SAMPLING DISTRIBUTIONS II
in the case of the sample
PROBABILITY THEORY – mean.
VOLUME III From the population, which we assume
SAMPLING to
DISTRIBUTION OF STATISTICS
be finite, we would

(a) take all possible samples of a given size,

(b) for each sample compute the sample proportion p̂,

(c) prepare a frequency distribution of p̂ by listing the different values of


p̂ along with their frequencies of occurrence.

This frequency distribution (as well as the corresponding relative frequency


distribution) would constitute the sampling distribution of p̂.

7.3.2 Sampling Distribution of Proportions With Replacement

A sample proportion p̂ may be considered as a proportion of success which


is obtained by dividing the number of success by the sample size n. Hence if
a random sample of size n is obtained with replacement, then the sampling
distribution of p̂ obeys the binomial probability law.

Theorem 7.5
If sampling is with replacement, then

(a) E(p̂) = p
p(1 − p)
(b) Var(p̂) =
n
where p̂ is the sample proportion

Example 7.8
A consignment of bulbs has 20 percent defective. If a random sample of 250
is drawn with replacement from this consignment. What is the

(a) expectation of the sample proportion?

(b) variance and hence the standard error of the sample proportion?
Challenge the way we run

EXPERIENCE THE POWER OF


FULL ENGAGEMENT…

RUN FASTER.
RUN LONGER.. READ MORE & PRE-ORDER TODAY
RUN EASIER… WWW.GAITEYE.COM

1349906_A6_4+0.indd 1 22-08-2014 12:56:57

217
p(1 − p)
(b) Var(p̂)
ADVANCED TOPICS=IN INTRODUCTORY
n
PROBABILITY: A FIRST COURSE IN SAMPLING DISTRIBUTIONS II
where p̂ is
PROBABILITY the sample
THEORY proportion
– VOLUME III SAMPLING DISTRIBUTION OF STATISTICS

Example 7.8
A consignment of bulbs has 20 percent defective. If a random sample of 250
is drawn with replacement from this consignment. What is the

(a) expectation of the sample proportion?


236 Advanced Topics in Introductory Probability
(b) variance and hence the standard error of the sample proportion?

Solution
p = 0.2, n = 250

(a) E(p̂) = p = 0.2

(b)

p(1 − p)
Var(p̂) =
n
0.2(1 − 0.2)
= = 0.00064
√ 250
σp̂ = 0.00064 = 0.0253

7.3.3 Sampling Distribution of Proportions Without


Replacement

If the sampling is without replacement, then the sampling distribution of p̂


obeys the hypergeometric probability law.

Theorem 7.6
If sampling is without replacement, then

(a) E(p̂) = p
p(1 − p) N − n
(b) Var(p̂) = ·
n N −1
where p̂ is the sample proportion

That is, E(p̂) is still identical with the population proportion p but the
N −n
variance is adjusted by a finite population correction factor .
N −1
Example 7.9
In a class of 50 students, twenty are females and thirty are males. A random
sample of twelve students are drawn from this class without replacement.
Determine the variance and hence the standard error of the proportion of
male students in the sample of twelve.

218
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN SAMPLING DISTRIBUTIONS II
PROBABILITY THEORY – VOLUME
Sampling Distributions II III SAMPLING DISTRIBUTION
237 OF STATISTICS

Solution
Let k represent the number of female students in the class. Hence

N = 50, k = 20, n = 12
The population proportion of females in the class is therefore
k 20
p= = = 0.4
N 50
Since this is sampling without replacement, we have
p(1 − p) N − n
σp̂2 = Var(p̂) = ·
n N −1
0.4(1 − 0.4) 50 − 20
= ·
12 50 − 1
= 0.012245

σp̂ = 0.012245 = 0.1107

7.3.4 Sampling Distribution of Proportions from Non-Normal


Population

The distribution of p̂ for a random sample taken with replacement ap-


proaches the Normal distribution when n becomes infinite. For a random
sampling without replacement, when the sample size is large, the distribu-
tion of sample proportions is approximately normally distributed by virtue
of the Central Limit Theorem.

Theorem 7.7
The variable
p̂ − p

σp̂2
approaches the Standard Normal distribution when n becomes
infinite, where p̂ is a sample proportion that occurs in a sample of
size n

The question that now arises is how large does the sample size have
to be for the use of the Normal approximation to be valid? A widely used
238
criterion is that both np and n(1 − p) must
Advanced be greater
Topics than 5. Probability
in Introductory

Example 7.10
Refer to Example 7.8; what is the probability that 22% will be defective?

Solution
n = 250, p = 0.2
A specific value of p̂, denoted by p0 is p0 = 0.22. We are required to calculate
P (p̂ ≤ p0 ).
 
 
 0.22 − 0.2 
P (p̂ ≤ 0.22) = P Z ≤ 
 

 0.2(1 − 0.2) 
250
 
0.02
= P Z≤
0.0253
= P (Z ≤ 0.7906) = Φ(0.7906)
219
= 0.61141
P (p̂ ≤ p0 ).  

 

ADVANCED TOPICS IN INTRODUCTORY  0.22 − 0.2 
P (p̂ ≤ 0.22) = P 

 Z ≤  


PROBABILITY: A FIRST COURSE IN 
 0.22
0.2(1−−0.2
0.2)

 SAMPLING DISTRIBUTIONS II
P (p̂ ≤– 0.22)
PROBABILITY THEORY VOLUME=III P Z ≤ 
  SAMPLING DISTRIBUTION OF STATISTICS

 0.2(1250
− 0.2) 
 
0.02 250
= P Z ≤ 
0.02
0.0253
= P Z≤
= P (Z ≤ 0.7906)
0.0253 = Φ(0.7906)
=
= P (Z ≤
0.611410.7906) = Φ(0.7906)
= 0.61141
The Normal approximation may be improved by the continuity correction
factor,
The a device
Normal that makes an
approximation mayadjustment
be improvedfor the
by fact
the that a discrete
continuity distri-
correction
bution ais device
factor, being that
approximated
makes an by a continuous
adjustment distribution.
for the fact that a The correction
discrete distri-
factor isismore
bution beingimportant if n isby
approximated small. With the distribution.
a continuous continuity correction factor,
The correction
factor is more important if n is small. With the continuity correction factor,
 

 0.5 
 p0 + − p
P (p̂ ≤ p0 ) = P   Z ≤  0.5
n 


 p 0 + − p 

, x < np

P (p̂ ≤ p0 ) = P Z ≤ 
 p(1n − p) 

, x < np
 p(1 n− p) 
 
n

 p0 + 0.5 
 − p
= Φ 0.5
n


 p
0+ − p


= Φ   p(1n− p)  
 
 p(1 n− p) 
or n
or
 

 0.5 
 p0 − − p
P (p̂ ≤ p0 ) = P 
 Z ≤  0.5
n 
 , x > np

 p 0 − − p 


P (p̂ ≤ p0 ) = P Z ≤ 
 p(1n − p) 

, x > np
 p(1 n− p) 
n

Technical training on
WHAT you need, WHEN you need it
At IDC Technologies we can tailor our technical and engineering
training workshops to suit your needs. We have extensive OIL & GAS
experience in training technical and engineering staff and ENGINEERING
have trained people in organisations such as General
ELECTRONICS
Motors, Shell, Siemens, BHP and Honeywell to name a few.
Our onsite training is cost effective, convenient and completely AUTOMATION &
customisable to the technical and engineering areas you want PROCESS CONTROL
covered. Our workshops are all comprehensive hands-on learning
experiences with ample time given to practical sessions and MECHANICAL
demonstrations. We communicate well to ensure that workshop content ENGINEERING
and timing match the knowledge, skills, and abilities of the participants.
INDUSTRIAL
We run onsite training all year round and hold the workshops on DATA COMMS
your premises or a venue of your choice for your convenience.
ELECTRICAL
For a no obligation proposal, contact us today POWER
at training@idc-online.com or visit our website
for more information: www.idc-online.com/onsite/

Phone: +61 8 9321 1702


Email: training@idc-online.com
Website: www.idc-online.com

220
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN SAMPLING DISTRIBUTIONS II
PROBABILITY THEORY – VOLUME III SAMPLING DISTRIBUTION OF STATISTICS
Sampling Distributions II 239

 
 p0 − 0.5 − p 
 n 
= Φ
 


 p(1 − p) 
n

Example 7.11
Refer to Example 7.9. Calculate the probability that out of the selected
twelve students, five or less of them will be females.

Solution
5
n = 12, p = 0.4 p0 = = 0.42
12
   
0.5
 0.42 − − 0.4 
 12 
P (p̂ ≤ 0.42) = P Z ≤
  

 0.4(1 − 0.4) 
12
 
−0.002
= P Z≤
0.01414
= Φ(−0.1414) = 1 − Φ(0.01414)
= 1 − 0.0056
= 0.9944

7.4 SAMPLING DISTRIBUTION OF DIFFERENCES

7.4.1 Construction of Sampling Distribution of Differences

There are many problems in applied statistics where our interest is in two
populations, such as knowing something about the difference between two
population means or the difference between two population proportions. In
one situation we may wish to know if it is reasonable to conclude that the
two population means (or proportions) are different. In another situation we
may wish to know the magnitude of the difference between two population
means (or proportions). A knowledge of the sampling distribution of the
240
difference between two means Advanced Topics isinuseful
(or proportions) Introductory Probability
in investigations of
this type.
To empirically construct the sampling distribution of the difference
between two sample means (or proportions) we would adopt the following
procedure.
Suppose there are two populations, Population 1 and Population 2. We
would draw all possible random
 samples of size n1 from Population 1 of size
N1
N1 . There would be such samples. For each set of sample data, the
n1
sample statistic (mean or proportion) would be computed. From Population
2 of size N2 we 
would draw separately and independently all possible random
N2
samples of size in all. The sample mean (or proportion) of each sample
n2
is computed and the difference between all possible pairs of the means (or
proportions) is taken. The sampling distribution of the difference between
sample means (or proportions) would consist of all such distinct differences,
accompanied by their frequencies or relative frequencies of occurrence.
221
7.4.2 Sampling Distribution of Difference between Means
N1
N1 . There would be such samples. For each set of sample data, the
n1
ADVANCED TOPICS IN INTRODUCTORY
sample statistic (mean or proportion) would be computed. From Population
PROBABILITY: A FIRST COURSE IN SAMPLING DISTRIBUTIONS II
2 of size N2 we
PROBABILITY would

THEORY –draw
VOLUMEseparately
III and independently allSAMPLING
possible random
DISTRIBUTION OF STATISTICS
N2
samples of size in all. The sample mean (or proportion) of each sample
n2
is computed and the difference between all possible pairs of the means (or
proportions) is taken. The sampling distribution of the difference between
sample means (or proportions) would consist of all such distinct differences,
accompanied by their frequencies or relative frequencies of occurrence.

7.4.2 Sampling Distribution of Difference between Means

Theorem 7.8
Let X11 , X12 , ..., X1n1 , X21 , X22 , ...X2n2 be n1 + n2 independent ran-
dom variables, the first n1 , having identical distributions with mean
µ1 and variance σ12 , and the remaining n2 having identical distribu-
tions with mean µ2 and variance σ22 . Then

E(X 1 − X 2 ) = µ1 − µ2

and
σ12 σ22
Var(X 1 − X 2 ) = +
n1 n2
where X 1 and X 2 are the sample means of Populations 1 and 2
respectively

Example 7.12
Two brands
Sampling of tyres areIIbeing compared by a motor firm. Brand A has
Distributions 241a
mean life of 24,000 km and a standard deviation of 1,200 km, while Brand

B has a mean life of 22,000 km and a standard deviation of 1,080 km.


A random sample of 100 and 90 tyres were taken from Brand A and Brand B
respectively. Compute the mean and variance of the sampling distribution
of the difference between the two means.

Solution
n1 = 100 n2 = 90
µ1 = 24, 000 µ2 = 22, 000
σ1 = 1, 200 σ2 = 1, 080

µ1 − µ2 = 24, 000 − 22, 000 = 2, 000


(1200)2 (1080)2
Var(X 1 − X 2 ) = +
100 90
= 14, 400 + 12, 960
= 27, 360

When sampling is from non-normally distributed populations or from


populations whose distribution are not known, we need to take large samples
so that we can apply the Central Limit Theorem, under which the differ-
ence between two samples is approximately normally distributed with mean
σ2 σ2
µ1 − µ2 and variance 1 + 2 . To find probabilities
222
associated with spe-
n n
cific values of the statistic, our procedure would be the same as that given
100 90
= 14, 400 + 12, 960
ADVANCED TOPICS IN INTRODUCTORY= 27, 360
PROBABILITY: A FIRST COURSE IN SAMPLING DISTRIBUTIONS II
PROBABILITY THEORY – VOLUME III SAMPLING DISTRIBUTION OF STATISTICS

When sampling is from non-normally distributed populations or from


populations whose distribution are not known, we need to take large samples
so that we can apply the Central Limit Theorem, under which the differ-
ence between two samples is approximately normally distributed with mean
σ2 σ2
µ1 − µ2 and variance 1 + 2 . To find probabilities associated with spe-
n n
cific values of the statistic, our procedure would be the same as that given
when sampling is from normally distributed populations. For example, the
problem in Example 7.12 is considered to be normally distributed because
n1 and n2 are both far greater than 30.

7.4.3 Sampling Distribution of Difference between Proportions

Suppose two random samples of different sizes are drawn from binomial
populations. If we need to compare the number of successes in the samples
242have to work with their proportions.
we Advanced Topics in Introductory Probability

Theorem 7.9
If independent random samples of different sizes n1 and n2 are drawn
from two binomial populations with proportions p1 and p2 respec-
tively, then the distribution of the difference between the two sample
proportions, p̂1 − p̂2 has mean

E(p̂1 − p̂2 ) = p1 − p2

and variance
p1 (1 − p1 ) p2 (1 − p2 )
Var(p̂1 − p̂2 ) = +
n1 n2

Example 7.13
In a national election, 60 percent of voters in Community A are in favour
of a certain candidate and 50 percent in Community B are in favour of the
candidate. If a sample of 210 voters from Community A and 160 voters from
Community B are drawn, find

(a) the mean of the difference between the sample proportions;

(b) the variance and standard deviation of the sampling distribution of


difference between the two sample proportions.

Solution

p1 = 0.6 p2 = 0.5 n1 = 200 n2 = 160


(a) E(p̂1 − p̂2 ) = p1 − p2 = 0.6 − 0.5 = 0.1

p1 (1 − p1 ) p2 (1 − p2 )
(a) Var(p̂1 − p̂2 ) = +
n1 n2
0.6(1 − 0.6) 0.5(1 − 0.5)
= +
200 160
0.6(0.4) 0.5(0.5)
= +
200 160
= 0.0012 + 223
0.00156
Solution
ADVANCED TOPICS IN INTRODUCTORY
p1 =COURSE
PROBABILITY: A FIRST
0.6 p2IN= 0.5 n1 = 200 n2 = 160 SAMPLING DISTRIBUTIONS II
PROBABILITY THEORY – VOLUME III SAMPLING DISTRIBUTION OF STATISTICS
(a) E(p̂1 − p̂2 ) = p1 − p2 = 0.6 − 0.5 = 0.1

p1 (1 − p1 ) p2 (1 − p2 )
(a) Var(p̂1 − p̂2 ) = +
n1 n2
0.6(1 − 0.6) 0.5(1 − 0.5)
= +
200 160
Sampling Distributions II 0.6(0.4) 0.5(0.5) 243
= +
200 160
= 0.0012 + 0.00156
= 0.00276

σ(p̂1 −p̂2 ) = 0.00276 = 0.0525

If n1 and n2 are sufficiently large then the sampling distribution of


p̂1 − p̂2 is approximately Normal. That is, by the Central Limit Theorem
 
 
 (p01 − p02 ) − (p1 − p2 ) 
P (p̂1 − p̂2 ≤ p01 − p02 ) ≈ Φ  
 

 p1 (1 − p1 ) p2 (1 − p2 ) 
+
n1 n2

Example 7.14
In Example 7.13 if in the sample of voters from Community A and Commu-
nity B there were 30 and 20 voters respectively in favour of the candidate
what is the probability of obtaining this or a smaller difference in the sample
proportions if the belief about the population parameters is correct?

Solution

x1 = 30, x2 = 20
x1 30 �e Graduate Programme
I joined MITAS because
p01 = = = 0.143 for Engineers and Geoscientists
n1 210
I wanted real responsibili� 20
www.discovermitas.com
Maersk.com/Mitas �e G
p02I joined
x2 MITAS because
= = = 0.125 for Engine
I wanted
n 2 real responsibili�
160 Ma
p01 − p02 = 0.143 − 0.125 = 0.018
p1 − p2 = 0.6 − 0.5 = 0.1

The desired probability is


 
 
 (p01 − p02 ) − (p1 − p2 ) 
P (p̂1 − p̂2 ≤ 0.018) ≈ Φ 

 p1 (1 − p1 )


p2 (1 − p2 )  Month 16
+

n1 n2

I was a construction Mo

 0.018 − 0.1


supervisor ina const
I was
= Φ 

 0.6(0.4)


0.5(0.5) 
the North Sea super
+
200 160 advising and the No
Real work he
helping foremen advis
International
al opportunities
Internationa
�ree wo
work
or placements ssolve problems
Real work he
helping fo
International
Internationaal opportunities
�ree wo
work
or placements ssolve pr

224
Example 7.14
In Example 7.13 if in the sample of voters from Community A and Commu-
ADVANCED TOPICS IN INTRODUCTORY
nity B there Awere
PROBABILITY: 30COURSE
FIRST and 20 INvoters respectively in favour of the candidate
SAMPLING DISTRIBUTIONS II
what is the probability
PROBABILITY of obtaining
THEORY – VOLUME III this or a smaller difference in the sample
SAMPLING DISTRIBUTION OF STATISTICS
proportions if the belief about the population parameters is correct?

Solution

x1 = 30, x2 = 20
x1 30
p01 = = = 0.143
n1 210
x2 20
p02 = = = 0.125
n2 160
p01 − p02 = 0.143 − 0.125 = 0.018
p1 − p2 = 0.6 − 0.5 = 0.1

The desired probability is


 
 
 (p01 − p02 ) − (p1 − p2 ) 
P (p̂1 − p̂2 ≤ 0.018) ≈ Φ  
 

 p1 (1 − p1 ) p2 (1 − p2 ) 
+
n1 n2
 
 
 0.018 − 0.1 
= Φ 


244  0.6(0.4)
Advanced 0.5(0.5)  Probability
Topics in Introductory
+
200 160
 
−0.082
= Φ √
0.0012 + 0.00276
 
−0.082
= Φ
0.0629
= Φ(−1.30) = 0.0968

7.5 SAMPLING DISTRIBUTION OF VARIANCE

If X1 , X2 , ..., Xn denote the random variables for a sample j of size n, then


the random variable s2j which denotes the sample variance is defined by

(X1j − X j )2 + (X2j − X j )2 + ... + (Xnj − X j )2


s2j =
n
where X nj , for instance, means the nth value of the j th sample and X j is
the mean of the j th sample. To find the sampling distribution of variance we
need to calculate the mean and variance of all the possible sample values.

7.5.1 Sampling Distribution of Variance with Replacement

Example 7.15
With reference to Examples 7.1, if a sample of size two is drawn with re-
placement from the population, find

(a) the mean of the sampling distribution of variances;

(b) the variance of the sampling distribution of the variances.

Solution
The sample values and means have been obtained in Example 7.2. For the
sake of convenience, we reproduce them here. The sample values i for each
j th sample of size two (Xij ) are:
225
(4, 4) (4, 5) (4, 7) (4, 8) (5, 4) (5, 5) (5, 7) (5, 8)
placement from the population, find
ADVANCED TOPICS IN INTRODUCTORY
(a) the mean of the sampling distribution
PROBABILITY: A FIRST COURSE IN
of variances; SAMPLING DISTRIBUTIONS II
PROBABILITY THEORY – VOLUME III SAMPLING DISTRIBUTION OF STATISTICS
(b) the variance of the sampling distribution of the variances.

Solution
The sample values and means have been obtained in Example 7.2. For the
sake of convenience, we reproduce them here. The sample values i for each
j th sample of size two (Xij ) are:

(4, 4) (4, 5) (4, 7) (4, 8) (5, 4) (5, 5) (5, 7) (5, 8)


(7, 4) (7, 5) II
Sampling Distributions (7, 7) (7, 8) (8, 4) (8, 5) (8, 7) (8, 8) 245

The corresponding sample means for each j th sample (Xj ) are:

4.0 4.5 5.5 6.0 4.5 5.0 6.0 6.5


5.5 6.0 7.0 7.5 6.0 6.5 7.5 8.0

The corresponding sample variances s2j for the 16 samples are:

0 0.25 2.25 4.00 0.25 0 1.00 2.25


2.25 1.00 0 0.25 4.00 2.25 0.25 0

Note that we have defined the sample variance as


n
i=1 (Xij − X j )2
s2j =
n

(a) Mean of sampling distribution of variances


k 2
j=1 sj
E(s2j ) = µsj =
Nn
0 + 0.25 + 2.25 + ... + 0.25 + 0
=
16
= 1.25

(b) The variance of the sampling distribution of variances


k
j=1 (sj − µs2 )2
2
Var(s2j ) =
Nn
(0 − 1.25)2 + (0.25 − 1.25)2 + (2.25 − 1.25)2 + ...
=
16
+(0.25 − 1.25)2 + (0 − 1.25)2
16
1.5625 + 1.0000 + 1.0000 + ... + 1.0000 + 1.5625
=
16
29.5
=
16
= 1.84375

226
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN SAMPLING DISTRIBUTIONS II
PROBABILITY THEORY – VOLUME III SAMPLING DISTRIBUTION OF STATISTICS
246 Advanced Topics in Introductory Probability

Theorem 7.10
Let X1 , ..., Xn be a random sample of size n, from a random variable
X with expectation µ and variance σ 2 . Let s2 be the sample variance
defined as n
1 
s2 = (X − X)2
n i=1
If sampling is from an infinite population or with replacement from
a finite population, then
n−1 2
E(s2 ) = σ
n

Proof

(Xi − X)2 = [(Xi − µ) − (X − µ)]2


Taking the sum of both sides
n
 n

(Xi − X) 2
= [(Xi − µ)2 − 2(X − µ)(Xi − µ) + (X − µ)2 ]
i=1 i=1
n
 n

= [(Xi − µ)2 − 2(X − µ) (Xi − µ)
i=1 i=1
+n(X − µ)2 ] (i)
But
n
 n
 
(Xi − µ) = Xi − µ
i=1 i=1
= nX − n µ
= n(X − µ) (ii)
Substituting (ii) into (i) we obtain
n
 n

(Xi − X)2 = (X − µ)2 − 2n (X − µ)2 + n(X − µ)2
i=1 i=1
n

= (Xi − µ)2 − n(X − µ)2
i=1

www.job.oticon.dk

227
(Xi − X)2 = [(Xi − µ)2 − 2(X − µ)(Xi − µ) + (X − µ)2 ]
i=1 i=1
ADVANCED TOPICS IN INTRODUCTORY
n n

PROBABILITY: A FIRST =
COURSE [(X
INi − µ)2 − 2(X − µ) (Xi − µ) SAMPLING DISTRIBUTIONS II
PROBABILITY THEORY – VOLUME
i=1 III i=1 SAMPLING DISTRIBUTION OF STATISTICS

+n(X − µ)2 ] (i)


But
n
 n
 
(Xi − µ) = Xi − µ
i=1 i=1
= nX − n µ
= n(X − µ) (ii)
Substituting (ii) into (i) we obtain
n
 n

(Xi − X)2 = (X − µ)2 − 2n (X − µ)2 + n(X − µ)2
i=1 i=1
n
Sampling Distributions II  247
Sampling Distributions =
II (Xi − µ)2 − n(X − µ)2 247
i=1

Taking the expectation of both sides, we obtain:


Taking the expectation of both sides, we obtain:

n  
n   
E   (Xi − X)22  = E   (Xi − µ)22  − nE (X − µ)22 
n n
E i=1 (Xi − X) = E i=1 (Xi − µ) − nE (X − µ)
i=1 n i=1

n
=  E(Xi − µ)22 − nE[(X − µ)22 ]
= i=1
E(Xi − µ) − nE[(X − µ) ]
i=1
n
n
=  Var(Xi ) − n Var(X)
= i=1
Var(Xi ) − n Var(X)
i=1  
 σ2 
= nσ 22
− n σ2
= nσ − n n
n
= (n − 1)σ 22 (iii)
= (n − 1)σ (iii)
Now,
Now, n
1 n
= 1
s22  (X − X)2
s = n i=1 (Xii − X)2
n i=1
and
and  
n
1  
n
E(s22 ) = E 1  (Xi − X)2
2
E(s ) = E n i=1 (Xi − X)
n i=1
 n 
1   n

= 1 E (Xi − X)
= n E i=1 (Xi − X)
n
n − 1 i=1
= n − 1 σ 22 (iv)
= n σ (iv)
n
which is very nearly σ 22 for large values of n (say n ≥ 30). Thus, if we use s22
which is very nearly2σ for large values of n (say n ≥ 30). Thus, if we use s
as an estimate of σ 2 we shall be underestimating the population variance.
as an estimate of σ we shall be underestimating the population variance.
To eliminate such a bias we often use what is called the desired unbiased
To eliminate such a bias we often use what is called the desired unbiased
estimator of variance (also called the corrected sample variance) defined by
estimator of variance (also called the corrected sample variance) defined by
n
ŝ22 = n s22
ŝ = n − 1 s
n−1
From this definition
From this definition  
 n 
E(ŝ22 ) = E n s22
E(ŝ ) = E n − 1 s
nn−1 2
= n E(s )
= n − 1 E(s2 )
n−1

228
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN SAMPLING DISTRIBUTIONS II
PROBABILITY THEORY – VOLUME III SAMPLING DISTRIBUTION OF STATISTICS
248 Advanced Topics in Introductory Probability

n n−1 2
= · σ
n−1 n
= σ2

The computational formula for the corrected sample variance is


n 1 
ŝ2 = · (Xi − X)2
n−1 n

(Xi − X)2
=
n−1
Because of this, some statisticians choose to define the sample variance by
simply replacing n by n−1 in the denominator of the definition of s2 given in
Example 7.15. Others continue to define the sample variance by maintaining
the n in the denominator, because by doing this many later results are
simplified. In practice the corrected sample variance is used when n < 30.
When n ≥ 30 the sample variance is used. In fact if n is large there is very
little difference in the two results.

Example 7.16
Refer to Example 7.1. List all possible samples of size two that can be drawn
from the population with replacement and calculate

(a) the variance of the sample values;

(b) the corrected sample variance;

(c) the expectation of the sample variance.

Solution
From Example 7.2, N = 4, σ 2 = 2.5. From Example 7.3, n = 2 and
N n = 16. The mean of the sample means is 6.

(a) To find the variance we subtract the mean of the sample means from
each of the sample means, square it and divide it by the number of
the sample means.
Thus,

(4 −II6)2 + (4.5 − 6)2 + · · · + (7.5 − 6)2 + (8 − 6)2


Sampling Distributions 249
s2 =
16
= 1.25
(b) The corrected sample variance is:
(4 − 6)2 + (4 − 6)2 + · · · + (8 − 6)2 + (8 − 8)2
ŝ2 =
16 − 1
= 2.503

From the results in Example 7.16, we may note the following.


(i) The expectation of the sample variance is
n−1 2 2−1 
E(s2 ) = σ = 2.5 = 1.25
n 2
which is the same as the mean of the sampling distribution of variances
obtained in Example 7.15(a).
(ii) As the number of sample values increases, s2 and ŝ2 tend to be the
same. Using the definitional formula for the corrected sample variance
we obtain
n 2 229
ŝ2 = s2 = (1.25) = 2.5
n−1 2−1
(i) The expectation of the sample variance is
n−1
ADVANCED TOPICS IN INTRODUCTORY 2−1 
E(s2
) = 2.5 = 1.25
σ2 =
PROBABILITY: A FIRST COURSE IN n 2 SAMPLING DISTRIBUTIONS II
PROBABILITY THEORY – VOLUME III SAMPLING DISTRIBUTION OF STATISTICS
which is the same as the mean of the sampling distribution of variances
obtained in Example 7.15(a).
(ii) As the number of sample values increases, s2 and ŝ2 tend to be the
same. Using the definitional formula for the corrected sample variance
we obtain
n 2
ŝ2 = s2 = (1.25) = 2.5
n−1 2−1
which is the same as the value obtained in part (b) of the example
above with a difference of 0.003.
(iii) The expectation of the corrected sample variance is
n 2
E(ŝ2 ) = E(s2 ) = (1.25) = 2.5 = σ 2
n−1 2−1
(see Example 7.1).

7.5.2 Sampling Distribution of Variance Without Replacement

Example 7.17
With reference to Examples 7.1, if a sample of size two is drawn without
replacement, find
(a) the mean, (b) the variance
of the sampling distribution of variances.

Solution
The sample values and means have been obtained in Example 7.5. For the
sake of convenience, we reproduce them here.
(4, 5) (4, 7) (4, 8) (5, 7) (5, 8) (7, 8)

WHY WAIT FOR


PROGRESS?
DARE TO DISCOVER
Discovery means many different things at
Schlumberger. But it’s the spirit that unites every
single one of us. It doesn’t matter whether they
join our business, engineering or technology teams,
our trainees push boundaries, break new ground
and deliver the exceptional. If that excites you,
then we want to hear from you.

careers.slb.com/recentgraduates

230
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN SAMPLING DISTRIBUTIONS II
PROBABILITY
250 THEORY – VOLUME Advanced
III SAMPLING
Topics in Introductory DISTRIBUTION OF STATISTICS
Probability

The corresponding sample means are

(4.5) (5.5) (6.0) (6.0) (6.5) (7.5)

(a) Mean of sampling distribution of variances:


s2
E(s2j ) = µs2 =  
j
N
n
0.25 + 2.25 + 4.00 + 1.00 + 2.25 + 0.25
=
6
10
= = 1.6667
6

(b) The variance of sampling distribution of variances is

k
j=1 (sj − µs2 )
2
Var(s2j ) =  
4
2
(0.25 − 1.6667)2 + (2.25 − 1.6667)2
=
6
(4.00 − 1.6667) + (1 − 1.6667)2
2
+
6
(2.25 − 1.6667) + (0.25 − 1.6667)2
2
+
6
2.0070 + 0.3404 + · · · + 0.3404 + 2.0070
=
6
10.5836
=
6
Sampling Distributions =
II 1.7639 251

Theorem 7.11
Let X1 , X2 , ..., Xn be a random sample of size n, with expectation µ
and variance σ 2 . Let s2 be the sample variance defined as
n
1 
s2 = (Xi − X)2
n i=1

If sampling is without replacement from a finite population of size


N , then the variance of the sampling distribution of variances is
  
N n−1
Var(s ) =
2
σ2
N −1 n

Note
As N → ∞ this result reduces to that in Theorem
231 7.10.
Var(s ) = σ
N −1 n
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN SAMPLING DISTRIBUTIONS II
PROBABILITY THEORY – VOLUME III SAMPLING DISTRIBUTION OF STATISTICS

Note
As N → ∞ this result reduces to that in Theorem 7.10.

Example 7.18
With reference to Examples 7.17 and 7.5, find the variance of the sample
variance s2 if sampling is without replacement.

Solution
N = 4, n = 2, σ 2 = 2.5 (from Example 7.2). Hence,

  
4 2−1
Var(s ) =
2
(2.5)
4−1 2
= 1.6667

which is equal to the results obtained in Example 7.17.

To find the sampling distribution of the sample variance, it is more


252
convenient to do so in terms ofAdvanced Topics
the related in Introductory
random variable. Probability

Theorem 7.12
If random samples of size n are taken from a population having a
Normal distribution, and if

i (Xi
− X)2
s =
2
n−1
then
(n − 1)s2
σ2
has a Chi-square distribution with n − 1 degrees of freedom

Note
Chi-square distribution is discussed in Chapter 8.

Theorem 7.13
If X is normally distributed with mean µ and variance σ 2 and
X1 , X2 ..., Xn is a random sample of size n of X, then the random
variable 
(Xi − µ)2
V = i
σ2
will possess a Chi-square distribution with n degrees of freedom

This chapter concludes discussions on basic theory on sampling and


sampling distributions. The next chapter continues discussions on sampling
distributions which are based on small samples.

EXERCISES

232

7.1 A population consists of five values 2, 4, 6, 8 and 14.


This chapter
ADVANCED TOPICS INconcludes discussions on basic theory on sampling and
INTRODUCTORY
sampling distributions.
PROBABILITY: The next
A FIRST COURSE IN chapter continues discussions on sampling
SAMPLING DISTRIBUTIONS II
PROBABILITY
distributionsTHEORY – VOLUME
which are III small samples.
based on SAMPLING DISTRIBUTION OF STATISTICS

EXERCISES

7.1 A population consists of five values 2, 4, 6, 8 and 14.

(a) Calculate the mean and variance of the population.


Sampling
(b) Distributions 253
II distinct samples of size 2 drawn with replacement
List all possible
from the population and compute the mean of each sample.
(c) Construct the sampling distribution of the sample means and
calculate the mean and variance of the distribution.
(d) Comment on your results.

7.2 Rework Exercise 7.1 for the case when a random sample of size 2 is
drawn without replacement.

7.3 Referring to Exercise 7.1, calculate

(a) the expectation of the sample mean;


(b) the variance and hence the standard deviation of the sample mean
if sampling is made; (i) with replacement; (ii) without replace-
ment.

7.4 A random sample of 25 adults is taken from a Normal population with


mean height of 1.65 metres and standard deviation of 0.8 metres. Find

(a) (i) the expectation of the sample mean;


(ii) the variance and standard deviation of the sample mean;
(b) the probability that the height of an adult selected at random
will be
(i) less than 1.50 metres; PREPARE FOR A
(ii)
LEADING ROLE.
not less than 1.70 metres; (iii) between 1.55 and 1.68.

7.5 The weights of the products of a certain factory have a mean of 140
kilograms and a standard deviation of 20 kilograms. IfEnglish-taught
180 of the MSc programmes
products are selected at random, find in engineering: Aeronautical,
Biomedical, Electronics,
(a) (i) the expectation of the sample mean; Mechanical, Communication
systems
(ii) the variance and standard deviation of the sample mean;and Transport systems.
No weigh
(b) the probability that a product selected at random will tuition fees.
(i) at most 150 kilograms; (ii) at least 120 kilograms;
(iii) between 110 and 155 kilograms.

7.6 The time that a cashier spends in processing each person’s order is an
E liu.se/master
independent random variable with mean of 45 minutes and standard
deviation of 30 minutes. What is the approximate probability that the
orders of 100 persons can be processed in less than 4000 minutes?


233
(a) the expectation of the sample mean;
ADVANCED TOPICS IN INTRODUCTORY
(b) theAvariance
PROBABILITY: and hence
FIRST COURSE IN the standard deviation of the sample SAMPLING
mean DISTRIBUTIONS II
if sampling is made; (i) with replacement; (ii) without replace-
PROBABILITY THEORY – VOLUME III SAMPLING DISTRIBUTION OF STATISTICS
ment.

7.4 A random sample of 25 adults is taken from a Normal population with


mean height of 1.65 metres and standard deviation of 0.8 metres. Find

(a) (i) the expectation of the sample mean;


(ii) the variance and standard deviation of the sample mean;
(b) the probability that the height of an adult selected at random
will be
(i) less than 1.50 metres;
(ii) not less than 1.70 metres; (iii) between 1.55 and 1.68.

7.5 The weights of the products of a certain factory have a mean of 140
kilograms and a standard deviation of 20 kilograms. If 180 of the
products are selected at random, find

(a) (i) the expectation of the sample mean;


(ii) the variance and standard deviation of the sample mean;
(b) the probability that a product selected at random will weigh
(i) at most 150 kilograms; (ii) at least 120 kilograms;
(iii) between 110 and 155 kilograms.

7.6 The time that a cashier spends in processing each person’s order is an
independent random variable with mean of 45 minutes and standard
254 deviation of 30 minutes. Advanced
What is the approximate
Topics probability
in Introductory that the
Probability
orders of 100 persons can be processed in less than 4000 minutes?

7.7 Suppose a population consists of 10 identical balls in a box, four of


which are white and six green. A random sample of five is drawn
without replacement. Find

(a) the expectation of the sample proportion of white balls;


(b) the variance and hence the standard error of the sample propor-
tion.

7.8 Steel wires produced by a certain factory have a mean tensile strength
of 1, 000 kilograms and a variance of 900 kilograms. If a random sample
of 250 wires is drawn from the production line during a certain month
with a total output of 100, 000 units, find

(a) (i) the expectation and (ii) the variance of the sample mean
tensile strength;
(b) the probability that the sample mean will
(i) be more than 1, 010 kilograms;
(ii) be less than 1, 003 kilograms;
(iii) be between 995 and 1, 006 kilograms;
(iv) differ from 1, 000 by 5 kilograms or more;
(v) differ from 1, 000 by at most 7 kilograms.

7.9 Repeat Exercise 7.7 for the case when sample is with replacement.

7.10 A group of children consists of twenty males and twenty-five females.


A random sample of twenty-three children is drawn from this group
with replacement.

(a) Find
(i) the expectation of the proportion of females;
234
(ii) the variance and hence the standard error of the sample pro-
(iii)
(iii) be
be between
between 995
995 and
and 1,
1, 006
006 kilograms;
kilograms;
(iv)
ADVANCED(iv) differ
differ
TOPICS from
from 1, 000 by 5 kilograms or
1, 000
IN INTRODUCTORY by 5 kilograms or more;
more;
(v)
PROBABILITY: A differ
FIRST from 1,
COURSE 000
IN by at most 7 kilograms.
(v) differ from 1, 000 by at most 7 kilograms.
SAMPLING DISTRIBUTIONS II
PROBABILITY THEORY – VOLUME III SAMPLING DISTRIBUTION OF STATISTICS
7.9
7.9 Repeat
Repeat Exercise
Exercise 7.7
7.7 for
for the
the case
case when
when sample
sample is
is with
with replacement.
replacement.
7.10
7.10 A
A group
group of
of children
children consists
consists of
of twenty
twenty males
males and
and twenty-five
twenty-five females.
females.
A random sample of twenty-three children is drawn
A random sample of twenty-three children is drawn fromfrom this
this group
group
with replacement.
with replacement.

(a)
(a) Find
Find
(i)
(i) the
the expectation
expectation of
of the
the proportion
proportion of
of females;
females;
(ii)
(ii) the variance and hence the standard error
the variance and hence the standard error of
of the
the sample
sample pro-
pro-
portion of females.
portion of females.
(b)
(b) What
What isis the
the probability
probability that
that the
the sample
sample proportion
proportion of
of females
females is
is
at least 0.6?
at least 0.6?
7.11
7.11 Rework
Rework Exercise
Exercise 7.10
7.10 if
if sampling
sampling was
was made
made without
without replacement.
replacement.
7.12
7.12 Thirty-six
Sampling percent
percentIIof
Distributions
Thirty-six of University
University staff
staff are
are against
against aa strike
strike action. 255
action. If
If aa
sample of 120 of them are drawn at random with replacement,
sample of 120 of them are drawn at random with replacement, find find
(a) (i) the expectation of the sample proportion,
(ii) the variance and hence the standard error of the sample pro-
portion,
of the University staff who are against a strike action.
(b) What is the probability that the proportion of the University staff
who are against the strike will be between 0.5 and 0.7.
7.13 Rework Exercise 7.12 if sampling is without replacement and if there
are 400 university staff members.
7.14 A box contains 80 black balls and 60 white balls. Two samples of 40
marbles each are randomly selected with replacement from the box
and their colours noted. Suppose that the first and second samples
contained 30 and 25 black balls respectively. Find
(a) the variance of the difference in proportion of black balls in the
two sample;
(b) the probability that the difference between the two samples does
not exceed 10 balls.
7.15 Rework Exercise 7.14 if sampling is without replacement.
7.16 A random sample of size 5 is drawn from a population which is nor-
mally distributed with mean 49 and variance 9. A second random
sample of size 4 is drawn from a different population which is also nor-
mally distributed with mean 39 and variance 4. The two samples are
independent with means X 1 and X 2 respectively. Let W = X 1 − X 2 .
Calculate (a) the mean and variance of W ; (b) P (W > 8.2).
7.17 With reference to Exercise 7.2, find
(a) (i) the mean; (ii) the variance;
of the sampling distribution of variances;
(b) the corrected variance for sampling.
7.18 With reference to Exercise 7.1, find
(a) (i) the mean (ii) the variance
of the sampling distribution of variances;
(b) the corrected variance for sampling.

235
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN
PROBABILITY THEORY – VOLUME III Distributions Derived from Normal Distribution

Chapter 8

DISTRIBUTIONS DERIVED FROM NORMAL


DISTRIBUTION

8.1 INTRODUCTION

So far we have encountered distributions which are related to the Normal


distribution on the basis of the Central Limit Theorem. Recall that by
the Central Limit Theorem (CLT) the probability distribution of a sample
statistic from an independent identical non-Normal distribution is approxi-
mately Normal if the sample size is sufficiently large. When the sample size
is small we cannot apply the CLT or assume that sampling distributions
are Normal. The sampling distribution in this case differs from one case
to another. Besides the binomial and Poisson distributions there are three
probability distributions, which are byproducts of the Normal distribution
such can be described by one of them under certain conditions. These are
the Chi-square (χ2 ) distribution, the Student’s t (briefly t) distribution, and
the Fisher’s variance ratio (briefly F ) distribution.

8.2 χ2 DISTRIBUTION

Perhaps, the Chi-square distribution is, of all statistical distributions, the


most widely used one in statistical applications. We have already encoun-
Click here
tered the Chi-square distribution in Chapter 19 where we considered to sam-
learn more

TAKE THE
pling distribution of the sample variance.

RIGHT TRACK
256

Give your career a head start


by studying with us. Experience the advantages
of our collaboration with major companies like
ABB, Volvo and Ericsson!

Apply by World class


www.mdh.se
15 January research

236
probability distributions, which are byproducts of the Normal distribution
such can be
ADVANCED described
TOPICS by one of them under certain conditions. These are
IN INTRODUCTORY
the Chi-square (χ2 ) distribution, the Student’s t (briefly t) distribution, and
PROBABILITY: A FIRST COURSE IN
the Fisher’s THEORY
PROBABILITY variance– ratio
VOLUME(briefly
III F ) distribution.
Distributions Derived from Normal Distribution

8.2 χ2 DISTRIBUTION

Perhaps, the Chi-square distribution is, of all statistical distributions, the


most widely used one in statistical applications. We have already encoun-
tered the Chi-square distribution in Chapter 19 where we considered sam-
pling distribution
Distributions of the
Derived sample
From variance.
Normal Distribution 257

256
8.2.1 Definition of Chi-square Distribution

Recall from Volume II (Nsowah-Nuamah, 2018) that the Chi-square distri-


bution is a special type of the Gamma distribution with α = v2 and β = 2.

Definition 8.1 CHI-SQUARE DISTRIBUTION


A continuous random variable X is said to have a Chi-square dis-
tribution with parameter v > 0 if its probability density function is
given by
1 v x
f (x) = v  v  x 2 −1 e− 2 , x>0
22 Γ 2

The Chi-square distribution is completely defined by its parameter v


which is called the number of degrees of freedom (df). If a random variable
X follows a Chi-square distribution with v degrees of freedom then we write
X ∼ χ2(v) . Fig. 8.1 depicts the density curves of the Chi-square distributions
for a few selected values of v.

Fig. 8.1 χ2 Distribution with various Degrees of Freedom

237
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN
PROBABILITY
258 THEORY – VOLUME Advanced
III Distributions
Topics Derived
in Introductory from Normal Distribution
Probability

8.2.2 Properties of Chi-square Distribution

Property 1

Theorem 8.1
Suppose X has a χ2 with parameter v distribution. Then the
moment-generating function of X is
v 1
MX (t) = (1 − 2t)− 2 , t<
2

The proof of this theorem is left as an exercise for the reader (see Exercise
8.1)

Example 8.1
Suppose X has a moment generating function (m.g.f)

1
MX (t) = (1 − 2t)−10 , for t <
2

then this is a χ2 distribution. What is its degree of freedom.

Solution
The Chi-square distribution has v degrees of freedom. The exponent of the
m.g.f. equals −10. Hence −10 = − v2 which gives v = 20.

Example 8.2
Suppose X has a Chi-square distribution with 5 degrees of freedom. Find
its m.g.f.

Solution
From Theorem 8.1 with v = 5, the moment generating function of X is

Distributions Derived From Normal Distribution


v 5 259
MX (t) = (1 − 2t)− 2 = (1 − 2t)− 2

Property 2

Theorem 8.2
Suppose X has a χ2 distribution with parameter v. Then

(a) E(X) = v (b) Var(X) = 2 v

Proof
Since the Chi-square distribution is a special case of the Gamma distribution
when α = v2 and β = 2, its expectation and variance is obtained simply by
substituting these values in the formulas of Theorem 10.18. Thus,
Example 8.3
238
Refer to Example 8.2. Find the (a) mean (b) variance.
Suppose X has a χ2 distribution with parameter v. Then

(a) E(X) = v
ADVANCED TOPICS IN INTRODUCTORY (b) Var(X) = 2 v
PROBABILITY: A FIRST COURSE IN
PROBABILITY THEORY – VOLUME III Distributions Derived from Normal Distribution

Proof
Since the Chi-square distribution is a special case of the Gamma distribution
when α = v2 and β = 2, its expectation and variance is obtained simply by
substituting these values in the formulas of Theorem 10.18. Thus,
Example 8.3
Refer to Example 8.2. Find the (a) mean (b) variance.

Solution
(a) E(X) = v = 5 (b) Var(X) = 2 v = 10

Property 3

Theorem 8.3
The cumulative distribution function of the Chi-square distribution
is given by   x
v v v x
22 Γ t 2 −1 e− 2 dt
2 0

Property 4

Theorem 8.4
If X1 , X2 , ..., Xn are independent random variables having Chi-square
distributions with v1 , v2 , ... , vn degrees of freedom respectively,
then the distribution of Y = X1 + X2 + ... + Xn will possess a
Chi-square distribution with v1 + v2 + ... + vn degrees of freedom

How will people travel in the future, and


how will goods be transported? What re-
sources will we use, and how many will
we need? The passenger and freight traf-
fic sector is developing rapidly, and we
provide the impetus for innovation and
movement. We develop components and
systems for internal combustion engines
that operate more cleanly and more ef-
ficiently than ever before. We are also
pushing forward technologies that are
bringing hybrid vehicles and alternative
drives into a new dimension – for private,
corporate, and public use. The challeng-
es are great. We deliver the solutions and
offer challenging jobs.

www.schaeffler.com/careers

239
The cumulative distribution function of the Chi-square distribution
is given by   x
ADVANCED TOPICS IN INTRODUCTORY
v v v x
22 Γ t 2 −1 e− 2 dt
PROBABILITY: A FIRST COURSE IN 2 0
PROBABILITY THEORY – VOLUME III Distributions Derived from Normal Distribution

Property 4

Theorem 8.4
If X1 , X2 , ..., Xn are independent random variables having Chi-square
distributions with v1 , v2 , ... , vn degrees of freedom respectively,
then the distribution of Y = X1 + X2 + ... + Xn will possess a
Chi-square distribution with v1 + v2 + ... + vn degrees of freedom
260 Advanced Topics in Introductory Probability

Corollary 8.1
If X1 , X2 , ..., Xn are independent random variables having Chi-square
distributions with 1 degree of freedom each, then the distribution of
Y = X1 + X2 + ... + Xn will possess a Chi-square distribution with n
degrees of freedom.

Corollary 8.2
If X and Y are independent and X ∼ χ2v and Y ∼ χ2u , then
Z = X + Y ∼ χ2v+u .

Property 5

Theorem 8.5
If X1 , and X2 are independent random variables, X1 has a Chi-
square distribution with v1 degrees of freedom and X1 + X2 has a
Chi-square distribution with v (> v1 ) degrees of freedom, then X2
has a Chi-square distribution with v − v1 degrees of freedom.

The Chi-square distribution is positively skewed (Fig. 8.1). It has been


extensively tabulated. Table I in Appendix A contains the values of χ2α,v
for  2
  χα,v
P χ2 ≤ χ2α,v = f (x) dx = 1 − α
0
where

α = 0.005, 0.01, 0.025, 0.05, 0.95, 0.975, 0.99, 0.995 and v = 1, 2, ..., 30

It is a convention in statistics to use the same symbol χ2 for both the random
variable and a value of that random variable. Thus percentile values of the
Chi-square distribution with v degrees of freedom is denoted by χ2α,v or
simply χ2α if n is understood, where the suffix α is used from now on and
in applications of probability and Statistics, to denote “lower percentile”
(100α% point).
Example 8.4
Find from Table I the following:
(a) χ20.025, 13 (b) χ20.99, 5 (c) χ20.99, 28 (d) χ20,95, 3 (e) χ20.975, 10

240
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN
PROBABILITY THEORY – VOLUME III Distributions Derived from Normal Distribution
Distributions Derived From Normal Distribution 261

Solution

(a) This is a Chi-square for probability 0.025 and degrees of freedom of 13.
Now, referring to Table I, we proceed downward under column labelled
d.f. until we reach entry 12. Then we proceed right to meet column
headed χ20.025 . The value at that meeting point, which is 24.736, is
the required value of χ20.025,12 . Therefore the probability that a χ2 -
distributed random variable with 12 degrees of freedom does not ex-
ceed 24.736 is 0.025.

(b) This is a Chi-square for probability 0.99 and degree of freedom of 5.


Proceeding as in (a), that is, we go to Table I and proceed downward
under column v until we reach the entry 4. Then proceed right to
meet column headed χ20.99 . The value at the meeting point 0.554 is
the required value of χ20.99,5 . Therefore the probability that a χ2 -
distributed random variable with 5 degrees of freedom does not exceed
0.554 is 0.99.

The reader should complete this example.

Example 8.5
Suppose a certain Chi-square distribution value for a probability of 0.975 is
30.2. What is the degree of freedom.

Solution
Refer to Table I and proceed downward under column 0.975 until you locate
the value 30.2 in the body of the table. Then proceed left to meet column
headed v. The value of v = 17 is the required degrees of freedom.

8.2.3 Relationship Between Chi-Square and Normal


Distributions

Chi-square Distribution appears in the theory associated with random vari-


ables which are normally distributed.

Theorem 8.6
Let Z be a Standard Normal variable. Then U = Z 2 is a Chi-square
distribution with 1 degree of freedom
262 Advanced Topics in Introductory Probability

The Chi-square distribution with 1 degree of freedom is denoted by χ2(1) .

Proof
In Theorem 10.28, we proved that

1 u 1 t
P (U ≤ u) = √ t− 2 e− 2 dt (i)
2π 0
If we put v = 1 in the Chi-square cumulative distribution function (Theorem
8.3), that is, if we consider Chi-square
 √ distribution with 1 degree of freedom,
and recalling also that Γ 2 = π, we will obtain expression (i).
1

Note
If X ∼ N (µ, σ), then
 2
X −µ
∼ χ21
σ
241
1 1 t
P (U ≤ u) = √ t− 2 e− 2 dt (i)
2π 0
ADVANCED TOPICS IN INTRODUCTORY
If we put v = 1 in the Chi-square cumulative distribution function (Theorem
PROBABILITY: A FIRST COURSE IN
8.3), that is, THEORY
PROBABILITY if we consider
 Chi-square
– VOLUME  III√ distribution with 1 degree
Distributions of freedom,
Derived from Normal Distribution
and recalling also that Γ 2 = π, we will obtain expression (i).
1

Note
If X ∼ N (µ, σ), then
 2
X −µ
∼ χ21
σ

Theorem 8.7
Let Z1 , Z2 , ... , Zv be n independent Standard Normal random vari-
ables. Then the sum of the squares of these variables is a Chi-square
variable, χ2 , with v degrees of freedom. That is,

χ2(v) = Z12 + Z22 + ... + Zv2


v

= Zi2 where Zi are independent
i=1

Example 8.6
Suppose Z1 , Z2 , ... , Zv is a random sample from the Standard Normal dis-
tribution. Find the number c such that
 22 

P Zi2 > c = 0.10
i=1

Solution
22

By Theorem 8.7, Zi has a χ2 distribution with 22 degrees of freedom.
i=1


678'<)25<2850$67(5©6'(*5((

&KDOPHUV8QLYHUVLW\RI7HFKQRORJ\FRQGXFWVUHVHDUFKDQGHGXFDWLRQLQHQJLQHHU
LQJDQGQDWXUDOVFLHQFHVDUFKLWHFWXUHWHFKQRORJ\UHODWHGPDWKHPDWLFDOVFLHQFHV
DQGQDXWLFDOVFLHQFHV%HKLQGDOOWKDW&KDOPHUVDFFRPSOLVKHVWKHDLPSHUVLVWV
IRUFRQWULEXWLQJWRDVXVWDLQDEOHIXWXUH¤ERWKQDWLRQDOO\DQGJOREDOO\
9LVLWXVRQ&KDOPHUVVHRU1H[W6WRS&KDOPHUVRQIDFHERRN

242
Suppose Z1 , Z2 , ... , Zv is a random sample from the Standard Normal dis-
tribution. Find the number c such that
ADVANCED TOPICS IN INTRODUCTORY
 22 
PROBABILITY: A FIRST COURSE IN
P III Zi2
PROBABILITY THEORY – VOLUME >c = Distributions
0.10 Derived from Normal Distribution
i=1

Solution
22
Distributions Derived From Normal Distribution 263
By Theorem 8.7, Zi has a χ2 distribution with 22 degrees of freedom.
i=1

Reading from Table I in Appendix A the row labelled 22 d.f. and the column
headed by an upper-tail area of 0.10, we get the number 30.813. Therefore
 22 

P Zi2 > 30.813 = 0.10
i=1

which may equivalently be stated as


 22 

P Zi2 ≤ 30.813 = 0.90
i=1

Thus c = 30.813

Theorem 8.8
The Chi-square distribution approaches the Normal distribution
N (v, 2v) as the degrees of freedom v gets large

In practice when v > 30, Chi-square probabilities can be computed by


employing Normal functions in the usual fashion.

8.2.4 Relationship Between Chi-Square and Multinomial


Distributions

Theorem 8.9
Suppose X1 , X2 , ... , Xs has a joint multinomial distribution with
parameters n, p1 , p2 , · · · , ps . Then the sum
s
 (Xi − npi )2
i=1
npi

tends to the Chi-square distribution as n → ∞

Theorem 8.9 is very important in applied statistics. It is used for goodness-


of-fit test and contingency-table tests (tests of independence and homogene-
ity). The details of these tests may be found in Statistics books.
264 Advanced Topics in Introductory Probability

8.3 t DISTRIBUTION

The t-distribution was first introduced by W. Gosset who published his work
under the name of ”Student”.

8.3.1 Definition of t Distribution

Definition 8.2 t DISTRIBUTION


A continuous random variable X is said to have the Student’s t dis-
243
tribution with parameter v if its probability density function is given
8.3 t DISTRIBUTION
8.3 t DISTRIBUTION
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST
The t-distribution COURSE
was IN
first introduced by W. Gosset who published his work
The t-distribution
PROBABILITY THEORYwas first introduced
– VOLUME
under the name of ”Student”. III by W. Distributions
Gosset who published
Derivedhis work
from Normal Distribution
under the name of ”Student”.
8.3.1 Definition of t Distribution
8.3.1 Definition of t Distribution

Definition 8.2 t DISTRIBUTION


Definition
A continuous8.2 t DISTRIBUTION
random variable X is said to have the Student’s t dis-
A continuous random variable
tribution with parameter is said to have
v if itsXprobability thefunction
density Student’s t dis-
is given
tribution
by with parameter v if its probability density function is given
by
 
v + 1   v+1
Γ v+1  2 − v+1
2
2
x
f (x) = Γ 2  1 + x 2 − 2 −∞<x<∞
f (x) = √v π Γ  v  1 + v −∞<x<∞
√ v v
vπΓ 2
2

The parameter v is called the degree of freedom. The t distribution with v


The parameter
degrees of freedomv is is
called the denoted
usually degree ofbyfreedom. The t distribution
tα,v or simply tv . Fig. 8.2 with v
shows
degrees
the curveofoffreedom is usually
the density of thedenoted by tα,vfororvarious
t distribution simplydegrees
tv . Fig.of8.2 shows
freedom.
the curve of the density of the t distribution for various degrees of freedom.

Distributions Derived From Normal Distribution 265


Fig. 8.2 t Distribution with various Degrees of Freedom
Fig. 8.2 t Distribution with various Degrees of Freedom
8.3.2 Properties of t Distribution

Property 1
The t distribution is symmetric and for α = 0.50, t(1−α),v = −tα,v

Property 2

Theorem 8.10
Suppose a random variable X has the Student’s t distribution with
parameter v. Then

(a) E(X) = 0, for v > 1


v
(a) Var(X) = , for v > 2
v−2

Thus a t-distribution possesses no mean when v = 1 and the variance does


not exist for v ≤ 2.

Property 2

244
Theorem 8.11
(a) Var(X) = , for v > 2
v−2
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN
Thus a t-distribution
PROBABILITY THEORY – possesses
VOLUME IIIno mean when v = 1 and the
Distributions variance
Derived does
from Normal Distribution
not exist for v ≤ 2.

Property 2

Theorem 8.11
If X1 , X2 · · · , Xn is a random sample from the Normal population
having the mean µ and variance σ 2 , and let X be the sample mean,
then
X −µ
T = √
ŝ/ n
has the Student’s t distribution with v = n − 1 degrees of freedom
where n
1 
ŝ =
2
(Xi − X)2
n − 1 i=1

The t-distribution has been tabulated. Table II in Appendix A contains


values of t(1−α),v for
266 Advanced Topics in Introductory Probability
α = 0.10, 0.05, 0.025, 0.01, 0.005 and v = 1, 2, 3, 4, ..., 30, 40, 60, 120, ∞,
where tα,v is such that the area to the left of tα,v under the curve of the t
distribution with v degrees of freedom is equal to 1 − α. That is, t1−α,v is
such that

 tα,v
P (t ≤ tα,v ) = f (x) dx = 1 − α
−∞

To say that P (t ≤ tα,v ) = 1 − α is the same as saying that

P (t > t−α,v ) = 1 − P (t > tα; v )

It is therefore uncommon to find texts reporting the α instead of 1 − α at


the top of the t distribution table.

Example 8.7
If t distribution has 25 degrees of freedom, find
(a) t0.975,25 (b) t0.05,25

Solution

(a) From Table II in Appendix A with v = d.f. = 25 and 1 − α = 0.975


we proceed downward under column d.f. until we reach 25. Then we
proceed right to meet column headed 0.975. The value at that meeting
point, which is 2.0595 is the required value.

Therefore the probability that a random variable with t distribution


with 25 degrees of freedom does not exceed 3.345 is 0.975.

(b) The value of t at α = 0.05 can be obtained from the value of t at


1 − α = 0.95 in Table II in Appendix A. Using the same procedure as
in (a), the reader will see that t0.05,25 = 1.708.
245
Therefore the probability that a t- distributed random variable with
proceed right to meet column headed 0.975. The value at that meeting
point, which is 2.0595 is the required value.
ADVANCED TOPICS IN INTRODUCTORY
Therefore
PROBABILITY: the probability
A FIRST COURSE IN that a random variable with t distribution
PROBABILITY THEORY – VOLUME III Distributions Derived from Normal Distribution
with 25 degrees of freedom does not exceed 3.345 is 0.975.

(b) The value of t at α = 0.05 can be obtained from the value of t at


1 − α = 0.95 in Table II in Appendix A. Using the same procedure as
in (a), the reader will see that t0.05,25 = 1.708.

Therefore
Distributions the probability
Derived that Distribution
From Normal a t- distributed random variable with
267
25 degrees of freedom does not exceed 1.708 is 0.90.

8.3.3 Relationship Between t, χ2 and Normal Distributions

Theorem 8.12
If Z is a Standard Normal variable and χ2 is a Chi-square distribution
with v degrees of freedom and Z and χ2 are independently distributed
then the distribution
Z
T =
χ2
v
has a Student’s t distribution with v degrees of freedom

Proof
We simply express the given ratio in a different form as follows:
 
X − µ
 
 σ 

X −µ n
s = 
√ s2
n
σ2
Z
= 
(n − 1)s2
(n − 1)σ 2
Z
= 
χ2n−1
n−1

(n − 1)s2
since = χ2n−1 by Theorem 7.12
σ2
By Definition 8.11, this is a t distribution with n − 1 degrees of freedom.

268
Note Advanced Topics in Introductory Probability
This
268 quantity is usually called Advanced
‘Student’sTopics
t and the corresponding
in Introductory distribu-
Probability
tion is called ‘Student’s t distribution.
Theorem 8.13
The t curve approaches a Normal curve as n approaches infinity
Theorem 8.13
The t curve approaches a Normal curve as n approaches infinity

The t distribution, however, has a greater dispersion than the Standard


Normal distribution. In practice, we can treat tn as N (0, 1) when n > 30.
The t distribution, however, has a greater dispersion than the Standard
Normal distribution. In practice, we can treat tn as N (0, 1) when n > 30.
8.4 F DISTRIBUTION
246
8.4 F DISTRIBUTION
Theorem
ADVANCED 8.13
TOPICS IN INTRODUCTORY
The t curve approaches
PROBABILITY: A FIRST COURSEaIN
Normal curve as n approaches infinity
PROBABILITY THEORY – VOLUME III Distributions Derived from Normal Distribution

The t distribution, however, has a greater dispersion than the Standard


Normal distribution. In practice, we can treat tn as N (0, 1) when n > 30.

8.4 F DISTRIBUTION

The F distribution was named after Sir R.A. Fisher.

8.4.1 Definition of F Distribution

Definition 8.3 F DISTRIBUTION


A continuous random variable X is said to have the distribution with
parameter v if its probability density function is given by
     v1
v1 + v2 v1 2
−1
Γ x
2 v2
f (x) =    
Γ
v1
Γ
v2     v1 + v2
2 2 v1 2
1+ x
v2
where x > 0

The F distribution with v1 and v2 degrees of freedom is denoted by


Fα;v1 ,v2 . Note that the order v1 , v2 is important, that is, in general,

Scholarships
F = F
α;v1 ;v2 α;v2 ;v1

Density curves for a few selected degrees of freedom of an F distribution are

Open your mind to


new opportunities
With 31,000 students, Linnaeus University is
one of the larger universities in Sweden. We
are a modern university, known for our strong
international profile. Every year more than Bachelor programmes in
1,600 international students from all over the Business & Economics | Computer Science/IT |
world choose to enjoy the friendly atmosphere Design | Mathematics
and active student life at Linnaeus University.
Master programmes in
Welcome to join us! Business & Economics | Behavioural Sciences | Computer
Science/IT | Cultural Studies & Social Sciences | Design |
Mathematics | Natural Sciences | Technology & Engineering
Summer Academy courses

247
Γ Γ v1
2 2 1+ x 2
v2
ADVANCED TOPICS IN INTRODUCTORY
where x >A 0FIRST COURSE IN
PROBABILITY:
PROBABILITY THEORY – VOLUME III Distributions Derived from Normal Distribution

The F distribution with v1 and v2 degrees of freedom is denoted by


Fα;v1 ,v2 . Note that the order v1 , v2 is important, that is, in general,

Fα;v1 ;v2 = Fα;v2 ;v1


Distributions Derived From Normal Distribution 269
Density curves for a few selected degrees of freedom of an F distribution are
shown in Fig 8.3.

Fig. 8.3 (a) Typical Curves of F Distribution


(v1 = 2 with various values of v2 )

Fig. 8.3 (b) Typical Curves of F Distribution


(v1 = 20 with various values of v2 )

248
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN
PROBABILITY THEORY – VOLUME III Distributions Derived from Normal Distribution
270 Advanced Topics in Introductory Probability

8.4.2 Properties of F distribution

Property 1
As a ratio of two non-negative values (χ2v ≥ 0), an F random variable ranges
in value from 0 to +∞.

Property 2

Theorem 8.14
Suppose a random variable X has the F distribution with parameters
v1 and v2 . Then
v2
(a) E(X) = , v2 > 2
v2 − 2
2v22 (v1 + v2 − 2)
(b) Var(X) = , v2 > 4
v1 (v2 − 2)2 (v2 − 4)

Theorem 8.14 implies that an F variable has no mean when v2 ≤ 2 and has
no variance when v2 ≤ 4.

Property 3 RECIPROCAL PROPERTY

Theorem 8.15
1
Suppose X has the distribution Fv1 ,v2 then Y = has the distribu-
X
tion Fv2 ,v1 , that is,

1
F(1−α,v1 ,v2 ) =
F(α,v1 ,v2 )

where both α and 1 − α are lower - tail percentage points of an F


distribution
Distributions Derived From Normal Distribution 271

Property 4
F is a positively skewed distribution; however, its skewness decreases with
increasing v1 and v2 .

The F distribution has been tabulated for α = 0.50, 0.90, 0.95, 0.975, 0.99,
0.995, 0.999. Table III in Appendix A gives the percentage points for the
right tail of several F distributions at 1, 5 and 10 percent levels. To say
that P (F > Fv1 ,v2 ) = α is the same as saying that

P (0 ≤ F ≤ Fv1 ,v2 ) = 1 − α

Example 8.8
If the F distribution has degrees of freedom v249
1 = 30 and v2 = 24, find the
table value of
that P (F > Fv1 ,v2 ) = α is the same as saying that
ADVANCED TOPICS IN INTRODUCTORY
P (0 ≤
PROBABILITY: A FIRST COURSE IN F ≤ Fv1 ,v2 ) = 1 − α
PROBABILITY THEORY – VOLUME III Distributions Derived from Normal Distribution

Example 8.8
If the F distribution has degrees of freedom v1 = 30 and v2 = 24, find the
table value of
(a) F0.90;30,24 (b) F0.95;12,8 (c) F0.05;12,8

Solution

(a) Table III in Appendix A gives the percentage point of the distribution
at 90 percent levels. Read the value of the table at the meeting point
of the degrees of freedom of the numerator, v1 = 30 and degrees of
freedom of the denominator, v2 = 24. The value is 1.67. That is, the
probability that an F-distributed random variable with 30 numerator
degrees of freedom and 24 denominator degrees of freedom does not
exceed 1.67 is 0.90.

(b) From Table III in Appendix A, the value is 3.28. Hence

F0.95;12,8 = 3.28

(c) By Theorem 8.15

1 1
F0.05;12,5 = = = 0.305
F0.95;8,12 3.28

250
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN
PROBABILITY THEORY – VOLUME III Distributions Derived from Normal Distribution
272 Advanced Topics in Introductory Probability

8.4.3 Relationship among χ2 , t and F Distributions

Theorem 8.16
Let χ2(v1 ) and χ2(v2 ) be two independent χ2 random variables with
parameters v1 and v2 respectively. Then F -distribution is given by

χ2(v1 ) χ2(v2 )
F(v1 ,v2 ) = /
v1 v2
where v1 is the degrees of freedom of the numerator and v2 is the
degrees of freedom of the denominator

Theorem 8.17
Let χ2α,v be a Chi-square random variable with v degrees of freedom.
Then
χ2α,v
Fα,v,∞ =
v

Theorem 8.18
Let F1,v be an F distribution with degrees of freedom 1 and v. Then

F1−α; 1, v = t21− α , v
2

Proof
From Theorem 8.12
Z2
T2 =
χ2
v
But from Theorem 12.6
Z 2 ∼ χ2 with 1 degree of freedom. Hence,
Z2

Distributions Derived From Normal t2v = 12Distribution


= F1, v 273
χ
v
Example 8.9
Verify that F0.95;1,16 = t2.975,16

Solution
From Table III in Appendix A,

F0.95;1,16 = F1−0.05;1,16 = 4.49


From Table III in Appendix A,

t1− α2 ,16 = t0.975,16 = 2.12


Comparing the F and t values, we observe that
4.49 = (2.12)2

The χ2 , t and F distributions have many important applications.


251
The reader will find them in most aspects of both theoretical and applied
t1− α2 ,16 = t0.975,16 = 2.12
ADVANCED TOPICSthe
Comparing IN INTRODUCTORY
F and t values, we observe that
PROBABILITY: A FIRST COURSE IN
PROBABILITY THEORY – VOLUME III4.49 = (2.12)2 Distributions Derived from Normal Distribution

The χ2 , t and F distributions have many important applications.


The reader will find them in most aspects of both theoretical and applied
Statistics especially under Inferential Statistics (estimation and hypothesis
testing).

EXERCISES

8.1 Prove Theorem 8.1.


8.2 A random variable X has the χ2 distribution. Find the degrees of
freedom if its moment generating function is
(a) MX (t) = (1 − 2t)−18 ,
1
(b) MX (t) = ,
(1 − 2t)6
(c) MX (t) = (1 − 2t)−32 ,
where t < 1
2

8.3 If X has Chi-Square distribution


2 x
f (x) = x5 e− 2
9
274 Advanced Topics in Introductory Probability
How many degrees of freedom has f (x).

8.4 A random variable X has the χ2 distribution. Find the m.g.f. of X if


the distribution has degrees of freedom (a) 4, (b) 10, (c) 28.

8.5 Referring to Exercise 8.3. Find the mean and variance of the χ2 vari-
able.

8.6 Suppose X and Y are independent χ2 random variables with v1 and v2


degrees of freedom, respectively. Show that the m.g.f. of Z = X + Y
is v1 +v2
MX (t) = (1 − 2t)−( 2 )

8.7 If a random variable X has the χ2 distribution. Find from Table I in


Appendix A the value for
(a) χ20.90,15 (b) χ20.99,8 (c) χ20.025,21

8.8 If X has the χ2 distribution with degrees of freedom of 12, find x such
that

(a) P (X ≤ x) = 0.05 (b) P (X ≥ x) = 0.975

8.9 If X has the t distribution, find from Table II in Appendix A the value
for
(a) t0.95,28 (b) t0.975,15 (c) t0.90,22

8.10 Suppose X has t distribution and given


(a) P (X ≥ 1.323); (b) P (X ≤ 3.078)
find (i) the degrees of freedom (ii) α

8.11 If X is Fv1 ,v2 , find from Table III in Appendix A the value for (a)
F0.95,20,30 (b) F0.90,20,24

8.12 Suppose X has the F distribution with v1 = 15 and v2 = 8, find x


such that (a) P (X ≥ x) = 0.10 (b) P (X ≤ x) = 0.10
252
(a) P (X ≤ x) = 0.05 (b) P (X ≥ x) = 0.975
ADVANCED TOPICS IN INTRODUCTORY
8.9 If X hasA the
PROBABILITY: t distribution,
FIRST COURSE IN find from Table II in Appendix A the value
PROBABILITY
for THEORY – VOLUME III Distributions Derived from Normal Distribution

(a) t0.95,28 (b) t0.975,15 (c) t0.90,22

8.10 Suppose X has t distribution and given


(a) P (X ≥ 1.323); (b) P (X ≤ 3.078)
find (i) the degrees of freedom (ii) α

8.11 If X is Fv1 ,v2 , find from Table III in Appendix A the value for (a)
F0.95,20,30 (b) F0.90,20,24

8.12 Suppose X has the F distribution with v1 = 15 and v2 = 8, find x


such that (a) P (X ≥ x) = 0.10 (b) P (X ≤ x) = 0.10

8.13 Suppose X is Fv1 ,v2 . Find the degrees of freedom if


(a) P (X ≥ 2.49) = 0.05 (b) P (X ≤ 3.34) = 0.99

8.14 Verify that (a) F0.95;1,20 = t20.975,20 (b) F0.90;1,7 = t20.95,7

253
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN
PROBABILITY THEORY – VOLUME III Statistical Tables

STATISTICAL TABLES

254
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN
PROBABILITY THEORY – VOLUME III Statistical Tables
Statistical Tables 277

Table I
Standard Normal Distribution
(Full Table)

 z
1 t2
Φ(z) = √ e− 2 dt
2π −∞

255
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN
PROBABILITY THEORY – VOLUME III Statistical Tables
278 Advanced Topics in Introductory Probability

Table II
Standard Normal Distribution
(Half Table)


1 z t2
Φ(z) = √ e− 2 dt
2π 0

256
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN
PROBABILITY THEORY – VOLUME III Statistical Tables
Statistical Tables 279

Table III
Normal Density Function

1 − t2
φ(z) = √ e 2

257
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN
PROBABILITY THEORY – VOLUME III Statistical Tables
280 Advanced Topics in Introductory Probability

Table IV
Percentiles of Chi Square Distribution

258
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN
PROBABILITY THEORY – VOLUME III Statistical Tables
Statistical Tables 281

Table V
Percentiles of t Distribution

259
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN
PROBABILITY THEORY – VOLUME III Statistical Tables
282 Advanced Topics in Introductory Probability

Table VI
Percentiles of F Distribution
F0.90

260
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN
PROBABILITY THEORY – VOLUME III
Statistical Tables 283Statistical Tables

Percentiles of F Distribution (continued )

261
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN
PROBABILITY
284 THEORY – VOLUME IIIAdvanced Topics in Introductory ProbabilityStatistical Tables

Percentiles of F Distribution
F0.95

262
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN
PROBABILITY
StatisticalTHEORY
Tables – VOLUME III Statistical Tables
285

Percentiles of F Distribution (continued )

263
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN
PROBABILITY
286 THEORY – VOLUME III Advanced Statistical Tables
Topics in Introductory Probability

Percentiles of F Distribution
F0.99

264
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN
PROBABILITY THEORY – VOLUME III
Statistical Tables 287Statistical Tables

Percentiles of F Distribution (continued )

265
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN
PROBABILITY THEORY – VOLUME III Statistical Tables
288 Advanced Topics in Introductory Probability

Table VII
Random Numbers

266
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY:
Statistical A FIRST COURSE IN
Tables 289
PROBABILITY THEORY – VOLUME III Statistical Tables

Random Numbers (Continued)

267
ADVANCED TOPICS IN INTRODUCTORY
290
PROBABILITY: A FIRST COURSE INAdvanced Topics in Introductory Probability
PROBABILITY THEORY – VOLUME III Statistical Tables

Random Numbers (Continued)

268
ADVANCED TOPICS IN INTRODUCTORY
Statistical
PROBABILITY:Tables
A FIRST COURSE IN 291
PROBABILITY THEORY – VOLUME III Statistical Tables

Random Numbers (Continued)

269
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN
292
PROBABILITY THEORY – VOLUME Advanced
III Topics in Introductory Probability Statistical Tables

Random Numbers (Continued)

270
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN
PROBABILITY THEORY – VOLUME III Answers to Odd-Numbered Exercises

Answers to Odd-Numbered Exercises

Chapter 1

X Y
1 2 3
3 5 1
1 34 34 34
1.1 2 4 6
2 34 34 34
1 2
3 34 34
0
4 1 5
4 34 34 34
9 7 211 e−λp (λp)x
1.3 (a) (c) 1.5 = 1 1.7 (c)
(2x + 1) 1.9 (a) , x = 0, 1, 2, · · · 1.11 (c) (i)
34 34 219 x!  
2(x + y − 2xy), 0 ≤ x ≤ 1, 0 ≤ y ≤ 1 1.13 12
7
(c) 12
7
x 2+ x , 0≤x
2
≤ 1 1.15 (a) λe−λx , x ≥ 0
(c) λe−λ(y−x) , y ≥ x 1.17 (a) (i)x e−x , y ≥ 0

Chapter 2

X +Y 2 3 4 5 6 7
2.1 (a) 3 7 6 12 1 5
P (X + Y = k) 34 34 34 34 34 34

XY 1 2 3 4 6 8 9 12
(c) 3 7 2 8 8 1 5
P (XY = k) 34 34 34 34 34 34
0 34

2X + 3Y 5 7 8 9 10 11 12 13 14 15 17
(e) 3 2 5 1 4 5 2 6 1 5
P (2X + 3Y = k) 34 34 34 34 34 34 34 34 34
0 34

3(X)4(Y ) = 12XY 12 24 36 48 72 96 108 144


(g) 3 7 2 8 8 1 5
P (12XY = k) 34 34 34 34 34 34
0 34

3X 3 6 9 12
(j) 9 12 3 10
P (3X = k) 34 34 34 34

X +Y 2 3 4
2.3 1 3 1
P (X + Y ) 8 8 2


2 2
λz e−λ 3
z (3 − z), 0≤z≤1
2.5(a) , z = 0, 1, 2, · · · 2.7(a) s(z) = 2 2
z!
3
z (4 − 3z 2 + z 3 ), 1≤z≤2

293

271
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN
PROBABILITY THEORY – VOLUME III Answers to Odd-Numbered Exercises
294 Advanced Topics in Introductory Probability

 5

 z3 , 0<z≤1
 14 3 z 1

2.9(a) s(z) = + , 1<z≤2

 7 2 3 
 3 5 3
z − 4z 2 +
11
z + 36 , 2 < z ≤ 3
7 6 2
2
(u3 + 1), −1 ≤ u < 0
2.17 6.027 × 10−5 2.19 (a) h(u) = 3
2
3
(1 − u3 ), 0≤u≤1
X 3
Y
1 2
2 3
2.21 4[1 − u(1 − ln u)], 0≤u≤1 2.23 X 2 7 3 8
P(Y = k)
 20  20 20 20
u2 2
1 + 23 u − 3
, −1 ≤ u < 0 9
(u + 2), 0≤u<1
2.25 h(u) = u2
2.27 h(u) = 2
1 − 43 u + , 0≤u≤1 9u3
(u + 2), 1≤u<∞
3
3 u2 2
2.29 7
(1 − 4
+ u ln( u )), 0 < u < 2

Chapter 3

3.1(a) 4.47 3.3 (a) 2.01 (c) 10.39 (e) 11.24 (g) 1496.32 (i) 52.80 (k)2.11 (m) 3.21 3.5(a) 52 (c) 2
15
(e) 0.04 (g) 0 (i) 0.08 3.15 6.46

Chapter 4

1
4.1 0.064 4.3 (a) 2.3 (c) 2.68 (e) 1.2 4.5 0 4.9 -0.1832 4.11 − 11 4.13(a)(i) −0.21x + 1.52
2
24x +24x+2
(ii) −0.16y + 1.63 4.15 9(2x+1)2
4.19 0.05y + 2.68

Chapter 5

5.1(a) 12 5.3 n ≤ 250 5.5(a) 0.85 5.7 0.84 5.9 1.3947 5.11 0.9648 5.13(a) 0.3409 (c) 0.8185
5.15 0.0062 5.17 1600 5.19 0.9927 5.21 n ≈ 25 5.23 n ≈ 85,  = 0.43 5.25(a) 0.1852
5.29 0.2328

Chapter 7

7.1(a) 3.66.8 (c) E(X) = 6.8, Var = 8.48 7.3(a) 6.8 7.5(a)(i) 140 (ii) 1.49 0.85 7.7(a)(i) 1,000
(ii) 7.9 s.d = 0.657 7.11(a)(i) 0.56 (ii) 0.4855 7.15(a) 0.004 7.17(a)(ii) 458.8

Chapter 8

8.3 12 8.5(a) E(X) = 12, Var = 24 8.7(a) 22.367 (c) 10.285 8.9(a) 1.701 (c) 1.321 8.11 1.93

272
ADVANCED TOPICS IN INTRODUCTORY
PROBABILITY: A FIRST COURSE IN
PROBABILITY THEORY – VOLUME III Bibliography

Bibliography

Applebaum, D., (1996), Probability and Information: An Integrated Approach , Cambridge


University Press, Cambridge.

Bajpal, A.C., Calus, L.M. and Fairley, J.A., (1978), Statistical Methods for Engineers and
Scientists, John Wiley and Sons, New York.

Barlow, R. (1989), Statistics, A Guide to the Use of Statistical Methods in the Physical
Sciences, John Wiley and Sons, New York.

Bartoszyński, R and Niewiadomska-Bugay, M (1996), Probability and Statistical Inference,


John Wiley and Sons, New York.

Beaumont, G.P. (1986), Probability and Random Variables, Ellis Norwood Ltd, Chichester.

Berger, M.A. (1993) An Introduction to Probability and Stochastic Processes, Springer-


Verlag, New York.

Billinsley, P. (1979) Probability and Measure, John Wiley and Sons, New York.

Birnbaum, Z.W. (1962), Introduction to Probability and Mathematical Statistics, Harper


and Brothers Publishers, New York.

Blake, Ian F.B. (1979), An Introduction to Applied Probability, John Wiley and Sons, New
York.

Bradley, J., (1988), Introduction to Discrete Mathematics, Addison-Wesley Publishing


Company, Reading.

Breiman L., (1972), Society for Industrial and Applied Maths, Siam, Philadelphia.

Brémaud, Pierre (1994), An Introduction to Probabilistic Modeling, Springer - Verlag, New


York.

Chow, Y.S., Teicher, H. (1988), Probability Theory, Springer-Verlag, New York.

David F.N. (1951), Probability Theory for Statistical Methods, Cambridge University Press,
Cambridge.

Drake, A.W. (1967), Fundamentals of Applied Probability Theory, McGraw-Hill, New


York.
296 Advanced Topics in Introductory Probability
Dudewicz, E.J. and Mishra, S.N. (1988), Modern Mathematical Statistics, John Wiley and
Sons, New York.

Feller, W. (1970) An Introduction to Probability Theory and its Applications Vol. I, Second
Edition, John Wiley and Sons, New York.295
Feller, W. (1971) An Introduction to Probability Theory and its Applications Vol. II,
Second Edition, John Wiley and Sons, New York.

Freund, J.E.and Walpole, R.E. (1971), Mathematical Statistics, Prentice - Hall, Inc, En-
glewood Cliffs, New Jersey.

Fristedt, B. Gray, L. (1997), Modern Approach to Probability Theory, Birhhäuser, Boston.

Gnedenko, B.V. and Khinchin, A.Ya. (1962), An Elementary Introduction to the Theory
of Probability, Dover Publications, Inc., New York.

Goldberg, S. (1960), Probability, An Introduction, Prentice-Hall, Inc., Englewood Cliffe,


New Jersey.

Grimmeth, G. and Weslsh, D. (1986), Probability, An Introduction, Clarendon Press, Ox-


ford.

Guttman, I., Wilks, S.S and Hunter, J.S. (1982), Introduction Engineering Statistics, John
Wiley and Sons, New York.

Haln, G.J., and Shapiro, S.S. (1967) Statistical Models in Engineering, John WIley and
Sons.

Hoel, P.G. (1984), Mathematical Statistics, Fifth ed., John Wiley and Sons, New York.

Hoel, P.G., Port, S.C, and Stone, J.S. (1971), Introduction to Probability Theory, Houghton
Mifflin.

Hogg, R.V. and Cragg, A.T. (1978), Introduction to273


Mathematical Statistics, Fourth Edi-
tion, Macmillan, New York.
Grimmeth, G. and Weslsh, D. (1986), Probability, An Introduction, Clarendon Press, Ox-
ford.

ADVANCED TOPICS
Guttman, IN INTRODUCTORY
I., Wilks, S.S and Hunter, J.S. (1982), Introduction Engineering Statistics, John
PROBABILITY:
Wiley andASons,
FIRSTNewCOURSE
York. IN
PROBABILITY THEORY – VOLUME III Bibliography
Haln, G.J., and Shapiro, S.S. (1967) Statistical Models in Engineering, John WIley and
Sons.

Hoel, P.G. (1984), Mathematical Statistics, Fifth ed., John Wiley and Sons, New York.

Hoel, P.G., Port, S.C, and Stone, J.S. (1971), Introduction to Probability Theory, Houghton
Mifflin.

Hogg, R.V. and Cragg, A.T. (1978), Introduction to Mathematical Statistics, Fourth Edi-
tion, Macmillan, New York.

Johnson, N.L. and Kotz, S.(1969) Distributions in Statistics: Discrete Distributions , John
Wiley and Sons, New York.

Johnson, N.L. and Kotz, S.(1970) Distributions in Statistics: Continuous Multivariate


Distributions, John Wiley and Sons, New York.

Johnson, N.L. and Kotz, S.(1970) Distributions in Statistics: Continuous Univariate Dis-
tributions 1 & 2, John Wiley and Sons, New York.

Kendal, M.G. and Stuart, A. (1963) The Advanced Theory of Statistics,, Vol. 1, 2nd ed.,
Charles Griffin & Company Ltd., London.

Kendal, M.G. and Stuart, A. (1976) The Advanced Theory of Statistics,, Vol. 3, 4th ed.,
Charles Griffin & Company Ltd., London.

Kendal, M.G. and Stuart, A. (1979) The Advanced Theory of Statistics,, Vol. 2, 4th ed.,
Charles Griffin & Company Ltd., London.
Bibliography 297
Kolmogorov, A.N. (1956), Foundations of the Theory of Probability, Chelsea Publishing
Company, New York.

Lai, Chung Kai (1975), Elementary Probability: Theory with Stochastic Processes, Springer
- Verlag, New York.

Lamperti, J. (1966), Probability: A Survey of the Mathematical Theory, W.A. Benjamin,


INC., New York.

Lindgren, B.W. (1968) Statistical Theory, McMillan Company, New York.

Lupton R., (1993), Statistics in Theory and Practice, Princeton, Princeton University
Press.

Mendenhall, W. and Scheaffer, R.L. (1990), Mathematical Statistics With Applications,


PWS-KENT Publishing Company, Mass.

Meyer, D. (1970), Probability, W. A. Benjamin, Inc. New York.

Meyer, P.L. (1970), Introductory Probability and Its Applications, 2nd Edition, Addison
Wesley Publishing Company.

Miller, I. and Freund, J.E. (1987), Probability and Statistics for Engineers, Third Edition,
Prentice-Hall, Englewood Cliffs, New Delhi.

Mood, A.M., Graybill, F.A. and Boes, D.C (1974), Introduction to the Theorey of Statis-
tics, Third Edition, McGraw-Hill, New York.

Nsowah-Nuamah, N.N.N., (2017), A First Course in Probability Theory, Vol. I: Introduc-


tory Probability Theory and Random Variables, Ghana Universities Press, Accra, Ghana.

Nsowah-Nuamah, N.N.N., (2018), A First Course in Probability Theory, Vol. II: Theo-
retical Probability Distributions, Ghana Universities Press, Accra, Ghana.

Page, L.B. (1989), Probability for Engineering with Applications to Reliability, Computer
Science Press.

Prohorov, Yu.V. and Rozanov, Yu.A. (1969) Probability Theory, Springer-Verlag, New
York.

Robinson, E.A. (1985), Probability Theory and Applications International Human Re-
sources Dev. Corporation.

Ross, S. (1984), A First Course in Probability, 2nd Edition, Macmillan Publishing Com-
pany.

Spiegel, M.R (1980), Probability and Statistics, McGraw-Hill Book Company, New York.

Stirzaker, D.(1994), Elementaty Probability, Cambridge University Press, Cambridge.

Stoyanov. J, et. al. (1989), Exercise Manual in Probability Theory, Kluwer Academic
Publishers, London.
274
Studies in the History of Statistics and Probability, Vol. 1, edited by Pearson, E.S., and
Kendal, M. (1970), Charles Griffin & Company Ltd., London.
Prohorov, Yu.V. and Rozanov, Yu.A. (1969) Probability Theory, Springer-Verlag, New
York.
ADVANCED TOPICS IN INTRODUCTORY
Robinson, E.A. (1985), Probability Theory and Applications International Human Re-
PROBABILITY: A FIRST
sources Dev. COURSE IN
Corporation.
PROBABILITY THEORY – VOLUME III Bibliography
Ross, S. (1984), A First Course in Probability, 2nd Edition, Macmillan Publishing Com-
pany.

Spiegel, M.R (1980), Probability and Statistics, McGraw-Hill Book Company, New York.

Stirzaker, D.(1994), Elementaty Probability, Cambridge University Press, Cambridge.

Stoyanov. J, et. al. (1989), Exercise Manual in Probability Theory, Kluwer Academic
Publishers, London.

Studies in the History of Statistics and Probability, Vol. 1, edited by Pearson, E.S., and
Kendal, M. (1970), Charles Griffin & Company Ltd., London.
298 Advanced Topics in Introductory Probability
Studies in the History of Statistics and Probability, Vol. 1, edited by Pearson, E.S., and
Kendal, M. (1977), Charles Griffin & Company Ltd., London.

Trivedi, K.S. (1988), Probability and Statistics with Reliability, Queuing, and Computer
Science Applications, Prentice-Hall of India, New Delhi.

Tucker, H.G. (1963), An Introduction to Probability and Mathematical Statistics, Academic


Press, New York.

Wayne, W. D. (1991), Biostatistics: A Foundation for Analysis in the Health Sciences,


Fifth Edition, John Wiley and Sons, Singapore.

Wilks, S.S. (1962), Mathematical Statistics, John Wiley and Sons, New York.

275

You might also like