Probability
Distributions
by
Dr. Saleha Naghmi Habibullah
Topic No. 1
Random Experiment, Outcomes, Sample Space & Events
Random Experiment
Any experiment or process in which the outcome is unpredictable is eligible to be called a random experiment.
Example 1:
Tossing of a coin => a Head or a Tail. However, until the coin lands, we do not know which of these two will turn up.
Example 2:
Survey of the Employees of an
Organization:
Possible outcomes:
single, married, separated, divorced,
widowed
Definition of Random Variable
Concept of a Random Variable
A random variable is a variable whose value is determined by the outcome of a random experiment.
Example:
Let the random experiment be the toss of a fair coin, with possible outcomes Head and Tail, and let the sample space associated with the experiment be C = {H, T}, where H and T represent heads and tails, respectively.
Let X be a function such that
X(T) = 0 and X(H)=1. Thus X is a
real-valued function defined on
sample space C.
Topic No. 3
Example of
Random Variable
Example:
Let a card be selected from an
ordinary deck of playing cards. The
outcome is one of these 52 cards.
Suppose that P assigns a probability
of 1/52 to each outcome c.
Discrete Random
Variable
Definition
(Discrete Random Variable)
We say a random variable is a
discrete random variable if its
space is either finite or
countable.
Example:
Consider a sequence of independent flips of a
coin, each resulting in a head (H) or a tail (T).
Moreover, on each flip, we assume that H and
T are equally likely; that is, P(H)=P(T)=1/2.
For a single flip (X = number of heads):

x     p(x)
0     1/2
1     1/2
Sum:  1

For X = the number of flips needed to obtain the first head:
$$P(X = x) = \left(\frac{1}{2}\right)^{x-1}\frac{1}{2} = \left(\frac{1}{2}\right)^{x}, \quad x = 1, 2, 3, \ldots$$
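As a quick numerical sanity check (a minimal Python sketch added here for illustration, not part of the original lecture), we can confirm that these geometric probabilities sum to 1 and that P(X = 3) matches a simulation:

```python
import random

# pmf of X = number of flips of a fair coin needed to get the first head
def pmf(x):
    return 0.5 ** x  # (1/2)^(x-1) * (1/2) = (1/2)^x

# The probabilities should sum (essentially) to 1.
total = sum(pmf(x) for x in range(1, 200))
print(f"sum of pmf: {total:.10f}")  # ~1.0

# Compare P(X = 3) = 1/8 with a Monte Carlo estimate.
random.seed(0)
def flips_until_head():
    n = 1
    while random.random() >= 0.5:  # treat this branch as "tail"
        n += 1
    return n

trials = 100_000
est = sum(flips_until_head() == 3 for _ in range(trials)) / trials
print(f"P(X=3): exact {pmf(3):.4f}, simulated {est:.4f}")
```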
Topic No. 5
Concept of
Probability Mass
Function
What is Probability Mass
Function?
Consider a random experiment with a sample space that is finite or countably infinite. A function X, which assigns to each element c ∈ C one and only one number X(c) = x, is called a random variable. If we express the probability of each value of X algebraically, the resulting function is the probability mass function of X.
Example:
Toss a fair die, the possible
values are 1,2,3,4,5,6.
Algebraically:
$$P(X = x) = \frac{1}{6}, \quad x = 1, 2, 3, 4, 5, 6.$$
Definition
The probability mass
function (pmf) of X is given
by
$$p_X(x) = P[X = x], \quad \text{for } x \in D. \qquad (1)$$
PMFs satisfy two properties:
1) $0 \le p_X(x) \le 1$, for $x \in D$;
2) $\sum_{x \in D} p_X(x) = 1. \qquad (2)$
CDF of a
Discrete Random
Variable
Example:
Suppose we roll a fair die
with the numbers 1 through
6 on it. Let X be the upface
of the roll. Then the space of
X is {1,2,….,6} and its pmf
is
$$p_X(i) = \frac{1}{6}, \quad \text{for } i = 1, 2, \ldots, 6.$$
$$F(1) = P(X \le 1) = \frac{1}{6}$$
or
$$F(4) = P(X \le 4) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{4}{6}$$

x    p(x)    F(x)
1    1/6     1/6
2    1/6     1/6 + 1/6 = 2/6
3    1/6     2/6 + 1/6 = 3/6
4    1/6     3/6 + 1/6 = 4/6
5    1/6     4/6 + 1/6 = 5/6
6    1/6     5/6 + 1/6 = 6/6 = 1
The height of each ‘Jump’ yields
the probability of that particular
value of X.
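To make the staircase behaviour concrete, here is a small Python sketch (an illustration, not from the lecture) that accumulates the die pmf into its CDF:

```python
from fractions import Fraction

# pmf of a fair die
pmf = {i: Fraction(1, 6) for i in range(1, 7)}

# Build the CDF by accumulating pmf values; each value of X adds a
# "jump" of height p(x) to the staircase.
cdf = {}
running = Fraction(0)
for x in sorted(pmf):
    running += pmf[x]
    cdf[x] = running

for x in sorted(cdf):
    print(f"F({x}) = {cdf[x]}")   # 1/6, 1/3, 1/2, 2/3, 5/6, 1
```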
Topic No. 6
Example of
Probability Mass
Function
Example:
Consider an urn which contains slips of paper, each with one of the numbers 1, 2, ..., 100 on it. Suppose there are i slips with the number i on it, for i = 1, 2, ..., 100. For example, there are 25 slips of paper with the number 25. Assume that the slips are identical except for the numbers. Suppose one slip is drawn at random, and let X be the number on the slip.
$$P(X \le 50) = \frac{1 + 2 + 3 + \cdots + 50}{5050}$$
or, using $1 + 2 + \cdots + n = \frac{n(n+1)}{2}$,
$$P(X \le 50) = \frac{50(51)/2}{5050} = \frac{51}{202}$$
$$P(X \le 50) = 0.2525$$
Topic No. 7
Continuous
Random Variable
Concept of a Continuous
Random Variable
In the continuous case, we cannot define probability at one particular point; instead, probability is defined on an interval.
Example:
Suppose we are measuring the height of a student. The recorded value depends on the refinement of the measuring instrument.
If we say that this person is 5 feet 4 inches tall, or in other words 64 inches tall, this is measurement at a crude level.
• In reality the height is some value between 63.5 and 64.5 inches; with a more refined instrument we might say it is actually 64.2 inches. Even this is not the exact truth: the height is somewhere between 64.15 and 64.25 inches.
• It all depends on how much precision the measuring instrument can achieve.
• Theoretically there can be an infinite number of digits after the decimal point. This is the basic idea behind a variable that can assume any value in an interval.
Cumulative Distribution
Function F(x)
$$F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$$
The function $f_X(t)$ is called a probability density function (pdf) of X. If $f_X(x)$ is also continuous, then the Fundamental Theorem of Calculus implies that
$$\frac{d}{dx}F_X(x) = f_X(x)$$
PDFs satisfy the two properties:
1) $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$ and
2) $f_X(x) \ge 0.$
In the Continuous case, the area
under the curve of that function
gives you the probability.
Example of
Probability
Density Function
Example:
In continuous case, consider the
following simple experiment:
choose a real number at random
from the interval (0,1).
$$P_X[(a, b)] = b - a, \quad \text{for } 0 \le a \le b \le 1.$$
It follows that the pdf of X is
$$f_X(x) = \begin{cases} 1 & 0 < x < 1 \\ 0 & \text{elsewhere.} \end{cases}$$
For example, the probability that X is less than one eighth or greater than seven eighths is
$$P\left(X < \frac{1}{8} \text{ or } X > \frac{7}{8}\right) = \int_{0}^{1/8} dx + \int_{7/8}^{1} dx = \left[x\right]_0^{1/8} + \left[x\right]_{7/8}^{1} = \left(\frac{1}{8} - 0\right) + \left(1 - \frac{7}{8}\right) = \frac{1}{8} + \frac{1}{8} = \frac{2}{8} = \frac{1}{4}.$$
Topic No. 9
$$G(z) = \frac{1}{2} + \frac{z}{2}$$
Example:
For the Poisson distribution, $G(z) = e^{\mu(z-1)}$:
$$G(z) = \sum_x p(x)\,z^x = \sum_{x=0}^{\infty} \frac{e^{-\mu}\mu^x}{x!}\,z^x = e^{-\mu}\left[1 + \frac{\mu z}{1!} + \frac{(\mu z)^2}{2!} + \frac{(\mu z)^3}{3!} + \cdots\right]$$
Using the expansion $e^{\lambda} = 1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots$ with $\lambda = \mu z$,
$$G(z) = e^{-\mu}\,e^{\mu z} = e^{\mu(z-1)}.$$
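As an illustrative check (a Python sketch added here, not part of the lecture), the partial sums of Σ p(x) z^x converge to the closed form e^{µ(z−1)}:

```python
import math

def poisson_pgf_partial(mu, z, terms=100):
    """Partial sum of sum_x p(x) z^x for a Poisson(mu) pmf,
    accumulating each term iteratively to avoid huge factorials."""
    term = math.exp(-mu)     # x = 0 term: e^{-mu} * (mu z)^0 / 0!
    total = term
    for x in range(1, terms):
        term *= mu * z / x   # multiply by (mu z)/x to get the next term
        total += term
    return total

mu, z = 2.5, 0.7
print(f"series:      {poisson_pgf_partial(mu, z):.12f}")
print(f"closed form: {math.exp(mu * (z - 1)):.12f}")
```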
Topic No. 10
$$p(k) = P(X = k) = \frac{G^{(k)}(0)}{k!} \qquad (1)$$
Example:
The probability generating function of a Bernoulli
random variable with parameter p. So the probability
generating function of X, where X represents the
number of heads that we obtain when tossing a fair
coin once is given by
$$G(z) = \frac{1}{2} + \frac{z}{2}$$
With the help of eq. (1), we take derivatives and evaluate them at z = 0:
$$p(0) = P(X = 0) = \frac{G^{(0)}(0)}{0!} = \frac{\left.\left(\frac{1}{2} + \frac{z}{2}\right)\right|_{z=0}}{0!} = \frac{1}{2}$$
$$p(1) = P(X = 1) = \frac{G^{(1)}(0)}{1!} = \frac{\left.\frac{d}{dz}\left(\frac{1}{2} + \frac{z}{2}\right)\right|_{z=0}}{1!} = \frac{1}{2}$$
MEAN
and VARIANCE
through the
Probability generating
function
The expectation of X is given by
$$E(X) = G'(1^-) \qquad (1)$$
where $G'(1^-) = \lim_{z \to 1^-} G'(z)$, the limit of $G'(z)$ as z approaches 1 from below.
Example:
Poisson Distribution (Mean)
$$G(z) = e^{\mu(z-1)}, \qquad G'(z) = \mu\,e^{\mu(z-1)}$$
$$G'(1^-) = \lim_{z \to 1^-} G'(z) = \mu\,e^{\mu(1-1)} = \mu\,e^{0} = \mu$$
$$G'(1^-) = \mu$$
The variance of X is given by
$$\operatorname{Var}(X) = G''(1^-) + G'(1^-) - \left[G'(1^-)\right]^2 \qquad (2)$$
Example:
Poisson Distribution (Variance)
$$G''(z) = \mu\cdot\mu\,e^{\mu(z-1)} = \mu^2 e^{\mu(z-1)}$$
$$G''(1^-) = \mu^2 e^{\mu(1-1)} = \mu^2 e^{0} = \mu^2$$
From eq. (2),
$$V(X) = \mu^2 + \mu - \mu^2 = \mu.$$
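To see these formulas in action, here is a small numerical sketch (an illustration with arbitrarily chosen µ and step size, not from the lecture) that differentiates the Poisson PGF just below z = 1:

```python
import math

mu = 2.5
G = lambda z: math.exp(mu * (z - 1))

# Central finite differences slightly below z = 1 approximate G'(1-) and G''(1-).
h = 1e-5
z = 1 - 10 * h              # stay just below 1
G1 = (G(z + h) - G(z - h)) / (2 * h)
G2 = (G(z + h) - 2 * G(z) + G(z - h)) / h**2

mean = G1
var = G2 + G1 - G1**2
print(f"mean ~ {mean:.4f} (exact {mu})")
print(f"variance ~ {var:.4f} (exact {mu})")
```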
Topic No. 11
Algebraic
Expressions
of Some Well-known
PGFs
For some of the more
common distributions, the
PGFs are as follows:
(i) Constant r.v.: if $p_c = 1$ and $p_k = 0$ for $k \ne c$, then
$$G_X(z) = E(z^X) = z^c$$
(ii) Bernoulli r.v.: if $p_1 = p$, $p_0 = 1 - p = q$, and $p_k = 0$ for $k \ne 0$ or 1, then
$$G_X(z) = E(z^X) = q + pz$$
(iii) Geometric r.v.: if $p_k = pq^{k-1}$, $k = 1, 2, \ldots$, with $q = 1 - p$, then
$$G_X(z) = \frac{pz}{1 - qz} \quad \text{if } |z| < q^{-1}$$
(iv) Binomial r.v.: if $X \sim \operatorname{Bin}(n, p)$, then
$$G_X(z) = (q + pz)^n, \quad (q = 1 - p)$$
(v) Poisson r.v.: if $X \sim \operatorname{Poisson}(\lambda)$, then
$$G_X(z) = \sum_{k=0}^{\infty} \frac{1}{k!}\lambda^k e^{-\lambda} z^k = e^{\lambda(z-1)}$$
(vi) Negative binomial r.v.: if $X \sim \operatorname{NegBin}(n, p)$, then
$$G_X(z) = \sum_{k=n}^{\infty} \binom{k-1}{n-1} p^n q^{k-n} z^k = \left(\frac{pz}{1 - qz}\right)^n \quad \text{if } |z| < q^{-1} \text{ and } p + q = 1$$
Discrete Uniform Distribution
Example:
Random selection of one of the ten digits 0, 1, 2, ..., 9:

x     p(x)     z^x        z^x p(x)
0     1/10     z^0 = 1    1/10
1     1/10     z^1 = z    z/10
2     1/10     z^2        z^2/10
3     1/10     z^3        z^3/10
...   ...      ...        ...
9     1/10     z^9        z^9/10

Therefore,
$$G(z) = \frac{1}{10}\left(1 + z + z^2 + z^3 + \cdots + z^9\right)$$
This is a finite geometric series in which the first term a is 1, the common ratio r = z, and n = 10. So,
$$\text{Sum} = \frac{a(1 - r^n)}{1 - r} = \frac{1\left(1 - z^{10}\right)}{1 - z}$$
$$G(z) = \frac{1}{10}\cdot\frac{1 - z^{10}}{1 - z}$$
Topic No. 12
A Linear Combination
of
PDFs is a PDF
In mathematics, a linear combination is an expression constructed from a set of terms by multiplying each term by a constant and adding the results (e.g., a linear combination of x and y is any expression of the form ax + by, where a and b are constants).
Let:
1. Consider k continuous-type distributions with the following characteristics: pdf $f_i(x)$ and mean $\mu_i$, for $i = 1, 2, \ldots, k$.
If $c_i \ge 0$, $i = 1, 2, \ldots, k$, and
$$c_1 + c_2 + \cdots + c_k = 1,$$
then
$$w(x) = c_1 f_1(x) + c_2 f_2(x) + \cdots + c_k f_k(x) \ge 0.$$
To prove:
$$\int_{-\infty}^{\infty} w(x)\,dx = 1$$
Proof:
$$\text{L.H.S.} = \int_{-\infty}^{\infty}\left[c_1 f_1(x) + c_2 f_2(x) + \cdots + c_k f_k(x)\right]dx = c_1\int_{-\infty}^{\infty} f_1(x)\,dx + c_2\int_{-\infty}^{\infty} f_2(x)\,dx + \cdots + c_k\int_{-\infty}^{\infty} f_k(x)\,dx \qquad (1)$$
But each $f_i(x)$ is a pdf, so $\int_{-\infty}^{\infty} f_i(x)\,dx = 1$. Hence eq. (1) can be rewritten as
$$\text{L.H.S.} = c_1(1) + c_2(1) + \cdots + c_k(1) = c_1 + c_2 + \cdots + c_k.$$
But by our choice $c_1 + c_2 + \cdots + c_k = 1$, so L.H.S. = 1 = R.H.S.
Next, we show that the mean of the distribution having pdf $c_1 f_1(x) + c_2 f_2(x) + \cdots + c_k f_k(x)$ is $\sum_{i=1}^{k} c_i \mu_i$:
$$E(X) = \int_{-\infty}^{\infty} x\left[c_1 f_1(x) + c_2 f_2(x) + \cdots + c_k f_k(x)\right]dx = c_1\int_{-\infty}^{\infty} x f_1(x)\,dx + c_2\int_{-\infty}^{\infty} x f_2(x)\,dx + \cdots + c_k\int_{-\infty}^{\infty} x f_k(x)\,dx = \sum_{i=1}^{k} c_i \mu_i.$$
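A brief numeric sketch (illustrative, with two made-up exponential components) confirming that such a mixture integrates to 1 and has mean Σ cᵢµᵢ:

```python
import math

# Two hypothetical component densities: Exponential with mean 1 and
# Exponential with mean 2, mixed with weights 0.3 and 0.7.
f1 = lambda x: math.exp(-x)
f2 = lambda x: 0.5 * math.exp(-0.5 * x)
c1, c2 = 0.3, 0.7
w = lambda x: c1 * f1(x) + c2 * f2(x)

# Crude Riemann-sum integration on [0, 60].
xs = [i * 0.001 for i in range(60_000)]
area = sum(w(x) * 0.001 for x in xs)
mean = sum(x * w(x) * 0.001 for x in xs)
print(f"integral of w: {area:.4f}")                    # ~1
print(f"mean: {mean:.4f} (c1*1 + c2*2 = {c1 + c2*2})")  # ~1.7
```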
Concept of
Cumulative Distribution
Function
(discrete and continuous)
Definition: (Cumulative Distribution Function)
$$F_X(x) = P\left[\{c \in C : X(c) \le x\}\right],$$
which we abbreviate to $P(X \le x)$.
Also, $F_X(x)$ is often called simply the distribution function (df).
• In the discrete case the graph is like a staircase; such a function is also called a step function.
Sum: 5050/5050 = 1
Show that the cdf of X is $F(x) = \frac{[x]([x]+1)}{10100}$ for $1 \le x \le 100$, where $[x]$ is the greatest integer in x.
Therefore,
$$F(x) = \frac{[x]([x]+1)/2}{5050} = \frac{[x]([x]+1)}{10100}.$$
Obtaining the
PMF from the CDF
For the following CDF F(x), find the pmf p(x):
(a)
$$F(x) = \sum_{j=1}^{[x]} \left(\frac{1}{2}\right)^{j}.$$
Example of the
CDF of a
Continuous Random Variable
For a continuous random variable X ,the CDF is as
follows:
• F(x) = 0 for all x values less than zero;
• F(x) = x for all x values that lie between 0 and 1;
• F(x) = 1 for all x values greater than 1.
PDF of the Exponential Distribution
with mean=0.5, 1, 1.5
• $f(x) = \frac{1}{0.5}\,e^{-x/0.5}, \quad 0 \le x < \infty$ (orange curve)
• $f(x) = e^{-x}, \quad 0 \le x < \infty$ (purple curve)
• $f(x) = \frac{1}{1.5}\,e^{-x/1.5}, \quad 0 \le x < \infty$ (sky-blue curve)

CDF of the Exponential Distribution with mean = 0.5, 1, 1.5:
$$F(x) = 1 - e^{-x/0.5}, \quad 0 \le x < \infty$$
$$F(x) = 1 - e^{-x}, \quad 0 \le x < \infty$$
$$F(x) = 1 - e^{-x/1.5}, \quad 0 \le x < \infty$$
Concept of two
Random Variables being Equal
in Distribution
Let us first consider the situation when two random variables X and Y are not equal in distribution:
$$F_X(x) \ne F_Y(x) \text{ for some } x.$$
We say that X and Y are equal in distribution, written $X \overset{D}{=} Y$, if and only if $F_X(x) = F_Y(x)$ for all x. For instance, let X have the CDF
$$F_X(x) = \begin{cases} 0 & x < 0 \\ x & 0 \le x < 1 \\ 1 & x \ge 1 \end{cases} \qquad (5.6)$$
and define the random variable Y by the transformation Y = 1 − X. Then Y has the same CDF as X, so $Y \overset{D}{=} X$, even though Y ≠ X as random variables.
First property
of Cumulative Distribution
Function and its Proof
Theorem 1: For a < b, F(a) ≤ F(b).
Part (a): Because a < b, the event
$$\{X \le a\} \subset \{X \le b\},$$
i.e., {X ≤ a} is a subset of {X ≤ b}.
Theorem: If C₁ and C₂ are events such that C₁ ⊂ C₂, then P(C₁) ≤ P(C₂).
Second property
of Cumulative Distribution Function
and its Proof
Theorem 1 (continued): Mathematically,
$$F(-\infty) = \lim_{x \to -\infty} F(x) = 0.$$
Third property
of Cumulative Distribution
Function and its Proof
Theorem 1 (continued): Mathematically,
$$F(\infty) = \lim_{x \to \infty} F(x) = 1.$$
Fourth property
of Cumulative Distribution
Function and its Proof
Theorem 1:
Then
Evaluating
Probabilities using CDF
Theorem 1: Then, for a < b,
$$P(a < X \le b) = F_X(b) - F_X(a).$$
Proof: The interval decomposes as
$$\{-\infty < X \le b\} = \{-\infty < X \le a\} \cup \{a < X \le b\}.$$
For instance, for an exponential distribution with mean 1, $P(0.5 < X \le 1) = F(1) - F(0.5)$:
= (1/1.64872) − (1/2.71828)
= 0.6065 − 0.3679 = 0.2386
= 23.86%
Derivative
Of the CDF is the PDF
$$\frac{d}{dx}F_X(x) = f_X(x) \qquad (1)$$
Example:
Consider the CDF of the exponential distribution with mean = 1, i.e., $F(x) = 1 - e^{-x}$. Then the PDF is given by
$$f_X(x) = \frac{d}{dx}F_X(x) = \frac{d}{dx}\left(1 - e^{-x}\right) = 0 - (-1)e^{-x} = e^{-x}, \quad x \ge 0.$$
2: Another Example
If $F_X(x) = \frac{x}{2}$ for $0 \le x \le 2$, then the pdf is
$$f_X(x) = \frac{d}{dx}F_X(x) = \frac{d}{dx}\left(\frac{x}{2}\right) = \frac{1}{2}\frac{d}{dx}(x) = \frac{1}{2}, \quad 0 < x < 2.$$
For a continuous random variable, the probability of any single point is zero; for instance, with pdf $f(x) = 3x^2$,
$$P\left(X = \tfrac{1}{2}\right) = \int_{1/2}^{1/2} 3x^2\,dx = \left[x^3\right]_{x=1/2}^{x=1/2} = \frac{1}{8} - \frac{1}{8} = 0.$$
An implication of the fact that P(X = x) = 0 for all x when X is continuous is that you can be careless about the endpoints of intervals when finding probabilities of continuous random variables. That is:
$$P(a \le X \le b) = P(a \le X < b) = P(a < X \le b) = P(a < X < b).$$
The concept of
Monotonicity
Monotonically Increasing Function
In calculus, a function defined on a subset of the real numbers with real values is called monotonic if and only if it is either entirely non-increasing or entirely non-decreasing ("mono-tone": one tone).
A CDF is always a monotonically non-decreasing function.
Total probability is 1
Total probability is 1
The total probability associated with a random variable X of the discrete type with pmf $p_X(x)$ is
$$\sum_{x \in D} p_X(x) = 1$$
or, in the continuous case,
$$\int_D f_X(x)\,dx = 1,$$
where D is the space of X.
The definition for the pdf of a continuous random variable differs from the definition for the pmf of a discrete random variable simply by changing the summations that appear in the discrete case to integrals in the continuous case.
It is very well known that pmfs and pdfs satisfy these two properties. For example, with $f(x) = 3x^2$ on (0, 1):
$$\int_0^1 3x^2\,dx = 3\left[\frac{x^3}{3}\right]_0^1 = \left[x^3\right]_0^1 = 1^3 - 0^3 = 1.$$
Example showing the determination of an unknown constant c such that p(x) is a pmf
Example:
$$1 = \sum_{x=1}^{\infty} p(x) = \sum_{x=1}^{\infty} c\left(\frac{2}{3}\right)^x = c\left[\frac{2}{3} + \left(\frac{2}{3}\right)^2 + \left(\frac{2}{3}\right)^3 + \cdots\right]$$
What we have inside the bracket is an infinite geometric series, so
$$1 = c\,\frac{2/3}{1 - 2/3} = c\,\frac{2/3}{1/3} = c(2), \quad \text{and hence } c = \frac{1}{2}.$$
As an illustration, suppose we want to find the probability that x = 4. Putting $c = \frac{1}{2}$, we have $p(x) = \frac{1}{2}\left(\frac{2}{3}\right)^x$. Now putting x = 4, we obtain
$$p(4) = \frac{1}{2}\left(\frac{2}{3}\right)^4 = \frac{1}{2}\cdot\frac{16}{81} = \frac{8}{81} = 0.099.$$
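A quick Python sketch (illustrative only) verifying the normalizing constant and p(4):

```python
# p(x) = c * (2/3)^x for x = 1, 2, 3, ...
c = 0.5
pmf_sum = sum(c * (2 / 3) ** x for x in range(1, 1000))
p4 = c * (2 / 3) ** 4
print(f"sum of pmf: {pmf_sum:.10f}")   # ~1, confirming c = 1/2
print(f"p(4) = {p4:.4f}")              # 8/81 ~ 0.099
```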
Example showing the determination of an unknown constant c such that f(x) is a pdf
Example:
Let $f(x) = cx^3$ on $0 < x < 2$, zero elsewhere, for a constant c.
Solution:
$$1 = \int_0^2 cx^3\,dx = c\left[\frac{x^4}{4}\right]_0^2 = c\left(\frac{16}{4} - \frac{0}{4}\right) = c(4) = 4c,$$
and hence c = 1/4. So the pdf will be as follows:
$$f_X(x) = \begin{cases} \dfrac{x^3}{4} & 0 < x < 2 \\ 0 & \text{elsewhere} \end{cases}$$
For illustration of the computation of a probability involving X, we have
$$P\left(\frac{1}{4} \le X \le 1\right) = \int_{1/4}^{1} \frac{x^3}{4}\,dx = \left[\frac{x^4}{16}\right]_{1/4}^{1} = \frac{255}{4096} = 0.0623.$$
Concept of Transformation
of a Discrete variable
(when the transformation is one-to-one)
We have a discrete random variable X and we know its distribution; the space of Y = g(X) is
$$D_y = \{g(x) : x \in D_x\}.$$
Let us assume that g is one-to-one. For example,
$$y = g(x) = x^4$$
maps $D_x = \{x : x = -1, -2, -3, \ldots\}$ onto $D_y = \{y : y = 1, 16, 81, \ldots\}$.
We have
$$y = g(x) = x^4 \iff x = g^{-1}(y) = -\sqrt[4]{y}$$
(taking the negative root, since the support of X consists of negative integers). So, utilizing this equation,
$$P[Y = y] = P\left[X = -\sqrt[4]{y}\,\right] = \left(\frac{1}{2}\right)^{\sqrt[4]{y}}, \quad y = 1, 16, 81, \ldots$$
Concept of Transformation of a
Discrete Variable
(when the transformation is not one-to-one)
Consider a sequence of independent flips of a fair coin, each resulting in a head (H) or a tail (T). Let X be the number of flips needed to obtain the first head, and consider the event that X is odd, i.e., X ∈ {1, 3, 5, ...}.
The probability of this event is
$$P[X \in \{1, 3, 5, \ldots\}] = \sum_{x=1}^{\infty} \left(\frac{1}{2}\right)^{2x-1} = \left(\frac{1}{2}\right)^1 + \left(\frac{1}{2}\right)^3 + \left(\frac{1}{2}\right)^5 + \cdots$$
This is an infinite geometric series with first term $a = \frac{1}{2}$ and common ratio $r = \left(\frac{1}{2}\right)^2 = \frac{1}{4}$, whose sum is $\frac{a}{1-r}$:
$$\text{Required probability} = \sum_{x=1}^{\infty} \left(\frac{1}{2}\right)^{2x-1} = \frac{1/2}{1 - 1/4} = \frac{1/2}{3/4} = \frac{2}{3}.$$
We have thus determined that the probability that X is odd is 2/3. So the probability that player A will win $1 is 1/3, and the probability that he will lose $1 is 2/3.
Topic No. 31
Let X have pdf $f_X(x) = \frac{1}{2}$ for $-1 < x < 1$, zero elsewhere, and let $Y = X^2$. Then
$$F_Y(y) = \begin{cases} 0 & y < 0 \\ \displaystyle\int_{-\sqrt{y}}^{\sqrt{y}} \frac{1}{2}\,dx = \sqrt{y} & 0 \le y < 1 \\ 1 & 1 \le y \end{cases}$$
Hence, the pdf of Y is given by differentiating piecewise:
$$\frac{d}{dy}(0) = 0, \; y < 0; \qquad f(y) = \frac{d}{dy}F(y) = \frac{d}{dy}\sqrt{y} = \frac{1}{2}y^{\frac{1}{2}-1} = \frac{1}{2}y^{-\frac{1}{2}} = \frac{1}{2\sqrt{y}}, \; 0 < y < 1; \qquad \frac{d}{dy}(1) = 0, \; y > 1$$
or
$$f_Y(y) = \begin{cases} \dfrac{1}{2\sqrt{y}} & 0 < y < 1 \\ 0 & \text{elsewhere.} \end{cases}$$
Hence we can see that the shape of the PDF of the transformed variable is very different from the shape of the PDF of the original variable.
Transformation of
a Continuous Variable
using the Jacobian of transformation
Theorem 1: Let X be continuous with pdf $f_X$ and let $Y = g(X)$, where g is strictly increasing. Then
$$F_Y(y) = P[Y \le y] = P[g(X) \le y] = P\left[X \le g^{-1}(y)\right] = F_X\!\left(g^{-1}(y)\right). \qquad (2)$$
Hence, the pdf of Y is
$$f_Y(y) = \frac{d}{dy}F_Y(y) = \frac{d}{dy}F_X\!\left(g^{-1}(y)\right) = f_X\!\left(g^{-1}(y)\right)\left|\frac{dx}{dy}\right|. \qquad (3)$$
Example
of Transformation of
a Continuous Variable
(using the Jacobian of transformation)
Example: Let X have the pdf
$$f(x) = \begin{cases} 1 & 0 < x < 1 \\ 0 & \text{elsewhere} \end{cases}$$
Consider the random variable $Y = -2\log X$. The inverse transformation is $x = e^{-y/2}$, so the Jacobian is
$$J = \frac{d}{dy}e^{-y/2} = -\frac{1}{2}e^{-y/2},$$
and hence
$$f_Y(y) = f_X\!\left(e^{-y/2}\right)|J| = \frac{1}{2}e^{-y/2}, \quad 0 < y < \infty,$$
zero elsewhere (an exponential distribution with mean 2).
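A simulation sketch (illustrative, not from the lecture) comparing Y = −2 log X against the exponential CDF with mean 2:

```python
import math
import random

random.seed(1)
n = 200_000
# 1 - random.random() lies in (0, 1], so log() is always defined.
ys = [-2 * math.log(1.0 - random.random()) for _ in range(n)]

# Empirical P(Y <= y) versus F(y) = 1 - exp(-y/2) at a few points.
for y in (0.5, 1.0, 2.0, 4.0):
    emp = sum(v <= y for v in ys) / n
    print(f"y={y}: empirical {emp:.4f}, exact {1 - math.exp(-y/2):.4f}")
```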
Another example
of Transformation of a Continuous variable
(using the Jacobian of transformation)
Example:
Let X have the uniform pdf
$$f_X(x) = \frac{1}{\pi}, \quad \text{for } -\frac{\pi}{2} < x < \frac{\pi}{2},$$
and let $Y = \tan X$, so that $x = \tan^{-1}(y)$ and $|J| = \frac{1}{1+y^2}$. Then
$$f_Y(y) = f_X\!\left(\tan^{-1}(y)\right)|J| = \frac{1}{\pi}\cdot\frac{1}{1+y^2}, \quad -\infty < y < \infty,$$
zero elsewhere. This is the pdf of a Cauchy distribution, which is one of the well-known distributions.
Mode
of Discrete Random Variable
Definition
A mode of the distribution of a random variable X is a value
of x that maximizes the pdf or pmf.
x     P(x)
1     (1/2)^1
2     (1/2)^2
3     (1/2)^3
4     (1/2)^4
...   ...
Therefore, simply by inspection we say that the mode = 1.
Mode
of a
Continuous Random Variable
Definition
c) Find the mode of the distribution with pdf
$$f(x) = \frac{1}{2}x^2 e^{-x}, \quad 0 < x < \infty,$$
zero elsewhere.
Solution:
Taking the first derivative w.r.t. x (product rule):
$$f'(x) = \frac{d}{dx}\left(\frac{1}{2}x^2 e^{-x}\right) = \frac{1}{2}\left[x^2 e^{-x}(-1) + e^{-x}\,2x\right] = -\frac{1}{2}x^2 e^{-x} + xe^{-x}$$
$$f'(x) = xe^{-x} - \frac{1}{2}x^2 e^{-x}$$
Now, equating it to zero, we get
$$xe^{-x} - \frac{1}{2}x^2 e^{-x} = 0 \implies xe^{-x} = \frac{1}{2}x^2 e^{-x} \implies x = 2.$$
Now, taking the second derivative:
$$f''(x) = \left(xe^{-x}(-1) + e^{-x}(1)\right) - \frac{1}{2}\left(x^2 e^{-x}(-1) + e^{-x}\,2x\right) = -xe^{-x} + e^{-x} + \frac{1}{2}x^2 e^{-x} - xe^{-x}$$
$$f''(x) = e^{-x} - 2xe^{-x} + \frac{1}{2}x^2 e^{-x}$$
Putting x = 2 into the second derivative, we get
$$f''(2) = e^{-2} - 2(2)e^{-2} + \frac{1}{2}(2)^2 e^{-2} = e^{-2} - 4e^{-2} + 2e^{-2} = -e^{-2} = -0.135,$$
which is less than zero, confirming a maximum. Hence the mode is x = 2.
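A numeric sketch (illustrative) locating the same mode by a simple grid search:

```python
import math

f = lambda x: 0.5 * x * x * math.exp(-x)

# Grid search over (0, 10]; the analytic mode is x = 2.
xs = [i * 0.001 for i in range(1, 10_000)]
mode = max(xs, key=f)
print(f"numeric mode ~ {mode:.3f}, f(mode) = {f(mode):.4f}")
```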
Median
Of a
Discrete Random Variable
Definition
The median of a distribution of a random variable X of the
discrete or continuous type is a value of x such that
$$P(X \le x) \ge \frac{1}{2} \quad \text{and} \quad P(X \ge x) \ge \frac{1}{2}$$
Trial and Error (for a distribution with p(0) = 81/256 and p(1) = 108/256):
$$P(X < 2) = \frac{81}{256} + \frac{108}{256} = \frac{189}{256} = 0.74 > 0.5,$$
so $P(X \ge 2) < 0.5$ and 2 cannot be the median.
Try x = 1:
$$P(X < 1) = \frac{81}{256} = 0.32 \le 0.5, \quad \text{so } P(X \ge 1) \ge 0.5.$$
Also,
$$P(X \le 1) = \frac{81}{256} + \frac{108}{256} = \frac{189}{256} = 0.74 \ge 0.5.$$
Both requirements are fulfilled, so the median is x = 1.
Median of a
Continuous Random Variable
Definition
The median of a distribution of a random variable X of the discrete or continuous type is a value of x such that P(X ≤ x) ≥ ½ and P(X ≥ x) ≥ ½. For a continuous random variable, this reduces to the point x with F(x) = ½.
Concept of
(100p)th percentile
(quantile of order p)
of a Continuous Random Variable
Definition
Explanation:
By a quantile of order p, we mean that point on the x-axis to the left of which the area under the curve of the probability density function is equal to p, or in other words equal to 100p%.
Now, we know that the point to the left of which the area is, say, 20% is known as the 20th percentile; the point to the left of which the area is, say, 35% is known as the 35th percentile; and so on. Therefore, the point to the left of which the area under the curve is equal to 100p% is known as the (100p)th percentile.
Explanation of the way in which these two equations are
fulfilled in the case of a continuous variable.
$$P(X \le \xi_p) \ge p \qquad (1)$$
$$P(X < \xi_p) \le p \qquad (2)$$
(1) and (2) need to be fulfilled simultaneously. But for a continuous variable the left-hand sides of (1) and (2) are the same:
$$P(X \le \xi_p) = P(X < \xi_p).$$
Hence both (1) and (2) can hold simultaneously if and only if the inequalities are replaced by the equal sign:
$$P(X \le \xi_p) = P(X < \xi_p) = p,$$
or
$$\int_{-\infty}^{\xi_p} f(x)\,dx = p.$$
Find the 20th percentile of the distribution that has pdf f(x) = 4x³, 0 < x < 1, zero elsewhere.
Hint: With a continuous-type random variable X, P(X < ξp) = P(X ≤ ξp), and hence that common value must equal p.
For p = 0.20:
$$\int_0^{\xi_{0.20}} 4x^3\,dx = 0.20 \implies 4\left[\frac{x^4}{4}\right]_0^{\xi_{0.20}} = \xi_{0.20}^4 = 0.20$$
$$\xi_{0.20} = (0.20)^{1/4} \approx 0.669.$$
Topic No. 40
Real-Life Example
of the computation of the (100p)th percentile
(quantile of order p)
of a Continuous Random Variable
Let X be the number of gallons of ice cream that is requested at a
certain store on a hot summer day.
where X represents the number of gallons requested, i.e., the demand.
The store will be exhausting its supply only if the demand
exceeds the amount in hand.
We want the probability of this circumstance to be only
0.10.
Hence, it is obvious that we are talking about 90th
percentile of the distribution of the random variable X.
The pdf of X is given by
$$f(x) = \frac{2x}{(1+x^2)^2}, \quad 0 < x < \infty$$
(a special case of a more general family, with θ = 1 and β = 2). We need $\xi_{0.90}$ such that
$$\int_0^{\xi_{0.90}} \frac{2x}{(1+x^2)^2}\,dx = \left[-\frac{1}{1+x^2}\right]_0^{\xi_{0.90}} = \frac{1}{1+0^2} - \frac{1}{1+\xi_{0.90}^2} = 1 - \frac{1}{1+\xi_{0.90}^2} = 0.90$$
$$\implies \frac{1}{1+\xi_{0.90}^2} = 0.10 \implies 1+\xi_{0.90}^2 = 10 \implies \xi_{0.90}^2 = 9$$
$$\xi_{0.90} = 3.$$
Thus the area under f(x) to the left of 3 is 0.9 and to the right is 0.1: stocking 3 gallons keeps the probability of running out at 0.10.
Inverse CDF
or
Quantile function
• The quantile function is also called the inverse
cumulative distribution function.
For a given probability distribution,
the quantile function specifies the quantile of order p for all
values of p lying between 0 and 1.
For the exponential distribution,
$$f(x) = \lambda e^{-\lambda x}, \quad 0 \le x < \infty, \; \lambda > 0,$$
with expected value (mean) 1/λ. The cumulative distribution function of the Exponential distribution is
$$F(x) = \begin{cases} 1 - e^{-\lambda x} & x \ge 0 \\ 0 & x < 0 \end{cases}$$
The quantile function for Exponential(λ) is derived by finding the value of x for which
$$p = 1 - e^{-\lambda x} \implies e^{-\lambda x} = 1 - p.$$
Taking logs on both sides (and using ln e = 1),
$$\ln e^{-\lambda x} = \ln(1-p) \implies -\lambda x = \ln(1-p)$$
$$x = -\frac{\ln(1-p)}{\lambda}$$
$$Q(p; \lambda) = -\frac{\ln(1-p)}{\lambda}, \quad \text{for } 0 \le p < 1.$$
The quartiles are therefore:
first quartile (p = 1/4) = ln(4/3)/λ,
median (p = 2/4) = ln(2)/λ,
and third quartile (p = 3/4) = ln(4)/λ.
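A short sketch (illustrative) of the exponential quantile function and its quartiles:

```python
import math

def exp_quantile(p, lam):
    """Inverse CDF of Exponential(lam): Q(p) = -ln(1-p)/lam."""
    return -math.log(1 - p) / lam

lam = 2.0
for p in (0.25, 0.5, 0.75):
    print(f"Q({p}) = {exp_quantile(p, lam):.4f}")

# Round trip: F(Q(p)) should recover p.
p = 0.9
x = exp_quantile(p, lam)
print(f"F(Q(0.9)) = {1 - math.exp(-lam * x):.4f}")
```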
We say that X is stochastically larger than Y if
$$F_X(z) \le F_Y(z)$$
for all real z, with strict inequality, $F_X(z) < F_Y(z)$, holding for at least one z value.
Example:
Let X be a continuous random variable with support (−∞, ∞). Consider the random variable Y = X + Δ, where Δ > 0. Suppose that we want to show that Y is stochastically larger than X.
Solution:
First and foremost, let us re-state the above definition interchanging the roles of X and Y. This requires that the CDFs enjoy the following property:
$$F_Y(z) \le F_X(z).$$
Now, $F_X(z) = P(X \le z)$, whereas
$$F_Y(z) = P(Y \le z) = P(X + \Delta \le z) = P(X \le z - \Delta).$$
Since $z - \Delta < z$,
$$P(X \le z - \Delta) \le P(X \le z). \qquad (1)$$
Eq. (1) can be rewritten as
$$P(X + \Delta \le z) \le P(X \le z)$$
or
$$P(Y \le z) \le P(X \le z), \quad \text{i.e.,} \quad F_Y(z) \le F_X(z).$$
Therefore the random variable Y = X + Δ, where Δ > 0, has been shown to be stochastically larger than X.
Concept of
Mathematical
Expectation
for Discrete and Continuous
Random Variables
Definition 8.1 (Expectation). Let X be a random variable. If X is a continuous random variable with pdf f(x) and
$$\int_{-\infty}^{\infty} |x|\,f(x)\,dx < \infty,$$
then the expectation of X is
$$E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx.$$
If X is a discrete random variable with pmf p(x) and
$$\sum_x |x|\,p(x) < \infty,$$
then the expectation of X is
$$E(X) = \sum_x x\,p(x).$$
Sometimes the expectation E(X) is called the mathematical expectation of X, the expected value of X, or the mean of X. When the mean designation is used, we often denote E(X) by µ, i.e., $\mu = E(X)$.
Topic No. 44
Concept of
Mathematical Expectation
of a Function of a Random Variable X
(for discrete and continuous random variables)
Theorem:
Let X be a continuous random variable with pdf $f_X(x)$ and let $Y = g(X)$. If
$$\int_{-\infty}^{\infty} |g(x)|\,f_X(x)\,dx < \infty,$$
then the expectation of Y exists and
$$E(Y) = \int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx. \qquad (1)$$
If X is discrete with pmf $p_X(x)$, whose support is denoted by $S_X$, and if $\sum_{x \in S_X} |g(x)|\,p_X(x) < \infty$, then
$$E(Y) = \sum_{x \in S_X} g(x)\,p_X(x). \qquad (2)$$
$$E\left[k_1 g_1(X) + k_2 g_2(X)\right] = k_1 E\left[g_1(X)\right] + k_2 E\left[g_2(X)\right]$$
Suppose that we have a discrete random variable X that takes the values −1, 0, 1, with probabilities P(−1) = 1/4, P(0) = 1/2, and P(1) = 1/4.
Example of
computing mathematical Expectation
of a Function of a
Discrete Random Variable ‘X’
Example: Let the pmf p(x) be positive at x = −1, 0, 1 and zero elsewhere, with p(0) = 1/4. Then
$$P(X^2 = 0) = P(X = 0) = \frac{1}{4},$$
and
$$P(X^2 = 1) = P(X = -1) + P(X = 1) = p(-1) + p(1) = 1 - p(0) = \frac{3}{4}.$$
It then follows that
$$E(X^2) = 0\cdot P(X^2 = 0) + 1\cdot P(X^2 = 1) = \frac{3}{4}.$$
Another Example of
computing mathematical Expectation
of a Function of a
Discrete Random Variable ‘X’
Example: Let the pmf p(x) be positive at x = -1, 0, 1 and
zero elsewhere.
$$E(X^2) \ge \left[E(X)\right]^2$$
Now
$$\operatorname{Var}(X) = E\left[X - E(X)\right]^2 = E\left[X^2 - 2X\,E(X) + \left(E(X)\right)^2\right] = E(X^2) - 2E(X)E(X) + \left(E(X)\right)^2 = E(X^2) - \left(E(X)\right)^2.$$
Hence, it is established that
$$E\left[\left(X - E(X)\right)^2\right] = E(X^2) - \left(E(X)\right)^2.$$
But
$$\left(X - E(X)\right)^2 \ge 0,$$
as the square of any quantity has to be nonnegative. Therefore
$$E\left[\left(X - E(X)\right)^2\right] \ge 0,$$
and consequently $E(X^2) \ge \left[E(X)\right]^2$.
$$E(X) = a_1 p(a_1) + a_2 p(a_2) + a_3 p(a_3) + \cdots$$
$$E(X - \mu)^2 = \sum_x (x - \mu)^2 p(x) = (a_1 - \mu)^2 p(a_1) + (a_2 - \mu)^2 p(a_2) + \cdots$$
This sum of products may be interpreted as a "weighted average" of the squares of the deviations of the numbers a₁, a₂, a₃, ... from the mean value µ:
$$\sigma^2 = E(X - \mu)^2$$
Short-cut formula:
$$\sigma^2 = E(X - \mu)^2 = E\left(X^2 - 2\mu X + \mu^2\right);$$
and since E is a linear operator,
$$\sigma^2 = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - 2\mu^2 + \mu^2 = E(X^2) - \mu^2.$$
This frequently affords an easier way of computing the variance of X.
The Concept of
Degenerate
Random Variable
and one of its basic
properties
Definition
If every single student gets 7 marks out of 10, then the mean mark will be 7.
E(X) = c.
Hint: Show that E(X - c) equals zero
$$E\left[\frac{X-\mu}{\sigma}\right] = 0, \qquad E\left[\left(\frac{X-\mu}{\sigma}\right)^2\right] = 1.$$
Using the linear properties of expected value and the definition µ = E[X], we calculate
$$E\left[\frac{X-\mu}{\sigma}\right] = \frac{1}{\sigma}\left(E[X] - \mu\right) = 0,$$
$$E\left[\left(\frac{X-\mu}{\sigma}\right)^2\right] = \frac{E\left[(X-\mu)^2\right]}{\sigma^2} = \frac{\sigma^2}{\sigma^2} = 1.$$
Together these imply that
$$\operatorname{Var}\left(\frac{X-\mu}{\sigma}\right) = 1.$$
Proof: Using $\operatorname{Var}(W) = E(W^2) - \left[E(W)\right]^2$, we have
$$\operatorname{Var}\left(\frac{X-\mu}{\sigma}\right) = E\left[\left(\frac{X-\mu}{\sigma}\right)^2\right] - \left(E\left[\frac{X-\mu}{\sigma}\right]\right)^2 = 1 - 0^2 = 1.$$
The Concept of
Moments
mth Moment about an Arbitrary Origin
Consider the expression $E(X - a)^m$:
$$E(X - a)^m = \int_{-\infty}^{\infty} (x - a)^m f(x)\,dx$$
or
$$E(X - a)^m = \sum_x (x - a)^m p(x),$$
$$\mu'_m(a) = E(X - a)^m.$$
mth Moment about the Mean
In the expression $E(X - a)^m$, if we put a = µ, we obtain
$$E(X - \mu)^m = \int_{-\infty}^{\infty} (x - \mu)^m f(x)\,dx$$
or
$$E(X - \mu)^m = \sum_x (x - \mu)^m p(x),$$
$$\mu_m = E(X - \mu)^m.$$
Moments about the mean are also known as central moments.
mth Moment about the Origin
In the expression $E(X - a)^m$, if we put a = 0, we obtain
$$E(X - 0)^m = E(X^m) = \int_{-\infty}^{\infty} x^m f(x)\,dx$$
or
$$E(X - 0)^m = E(X^m) = \sum_x x^m p(x),$$
$$\mu'_m = E(X^m).$$
A Special Case:
$$\mu'_1 = E(X^1) = E(X) = \sum_x x\,p(x) \quad\text{or}\quad \int_{-\infty}^{\infty} x\,f(x)\,dx,$$
implying that the 1st moment about the origin is the mean of the distribution.
Another Special Case:
In the mth moment about the mean, if we put m = 2, we obtain
$$\mu_2 = E(X - \mu)^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$$
or
$$\mu_2 = E(X - \mu)^2 = \sum_x (x - \mu)^2 p(x),$$
implying that the 2nd moment about the mean is the variance of the distribution.
The Concept
of
Moment Generating
Function
Definition:
Let X be a random variable such that for some h > 0, the expectation of $e^{tX}$ exists for $-h < t < h$. The moment generating function of X is defined to be the function
$$M(t) = E\left(e^{tX}\right), \quad -h < t < h,$$
and it satisfies
$$M^{(m)}(0) = E\left(X^m\right).$$
Since M(t) generates the values of E(Xm); m = 1,2,3,…, it is
called the moment-generating function (mgf).
In the same way, if we take the second derivative of the
first derivative, and putting t = 0, we will get the second
moment about the origin.
Example
Consider the exponential distribution given by
$$f(x; \lambda) = \lambda e^{-\lambda x}, \quad 0 \le x < \infty, \; \lambda > 0.$$
Find the mgf of X and use it to find the mean and variance of the exponential distribution.
Solution
By definition, we have
$$M(t) = E\left(e^{tX}\right) = \int_0^{\infty} e^{tx}\,\lambda e^{-\lambda x}\,dx = \lambda\int_0^{\infty} e^{x(t-\lambda)}\,dx = \lambda\left[\frac{e^{x(t-\lambda)}}{t-\lambda}\right]_0^{\infty}.$$
Now, if $t < \lambda$, then $t - \lambda < 0$ and hence
$$M(t) = \frac{\lambda}{t-\lambda}\left(e^{-\infty} - e^{0}\right) = \frac{\lambda}{t-\lambda}(0 - 1) = \frac{\lambda}{\lambda - t} \quad \text{for } t < \lambda.$$
Now
$$M^{(1)}(t) = M'(t) = \frac{d}{dt}\left(\frac{\lambda}{\lambda-t}\right) = \lambda\frac{d}{dt}(\lambda-t)^{-1} = \lambda\left(-(\lambda-t)^{-2}(0-1)\right) = \lambda(\lambda-t)^{-2} = \frac{\lambda}{(\lambda-t)^2} \quad \text{for } t < \lambda.$$
Evaluating the derivative at t = 0, we obtain
$$M^{(1)}(0) = \frac{\lambda}{(\lambda-0)^2} = \frac{\lambda}{\lambda^2} = \frac{1}{\lambda} = \mu.$$
Differentiating again,
$$M^{(2)}(t) = \lambda\left(2(\lambda-t)^{-3}\right) = \frac{2\lambda}{(\lambda-t)^3} \quad \text{for } t < \lambda.$$
Evaluating the derivative at t = 0, we obtain
$$M^{(2)}(0) = M''(0) = \frac{2\lambda}{(\lambda-0)^3} = \frac{2\lambda}{\lambda^3} = \frac{2}{\lambda^2}.$$
$$\operatorname{Var}(X) = E(X^2) - \left[E(X)\right]^2 = \frac{2}{\lambda^2} - \left(\frac{1}{\lambda}\right)^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2},$$
which is the well-known result.
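A quick numerical sketch (illustrative) recovering the exponential mean and variance from M(t) = λ/(λ − t) by finite differences at t = 0:

```python
lam = 3.0
M = lambda t: lam / (lam - t)

h = 1e-5
M1 = (M(h) - M(-h)) / (2 * h)            # ~ M'(0) = E(X)
M2 = (M(h) - 2 * M(0) + M(-h)) / h**2    # ~ M''(0) = E(X^2)

print(f"mean ~ {M1:.6f} (exact {1/lam:.6f})")
print(f"variance ~ {M2 - M1**2:.6f} (exact {1/lam**2:.6f})")
```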
Algebraic expressions
of Some well-known MGFs
Discrete Distributions

Distribution | Moment Generating Function (MGF)
Bernoulli, P(X=1) = p | $1 - p + pe^t$
Geometric, $(1-p)^{k-1}p$ | $\dfrac{pe^t}{1 - (1-p)e^t}$, for $t < -\ln(1-p)$
Binomial, B(n, p) | $\left(1 - p + pe^t\right)^n$
Poisson, λ | $e^{\lambda\left(e^t - 1\right)}$
Negative binomial, NB(r, p) | $\dfrac{(1-p)^r}{\left(1 - pe^t\right)^r}$
Uniform (discrete), U(a, b) | $\dfrac{e^{at} - e^{(b+1)t}}{(b - a + 1)\left(1 - e^t\right)}$

Continuous Distributions

Distribution | Moment Generating Function (MGF)
Uniform (continuous), U(a, b) | $\dfrac{e^{tb} - e^{ta}}{t(b-a)}$
Normal, N(µ, σ²) | $e^{\mu t + \frac{1}{2}\sigma^2 t^2}$
Chi-squared, $\chi^2_k$ | $(1 - 2t)^{-k/2}$
Gamma, Γ(k, θ) | $(1 - t\theta)^{-k}$, for $t < \frac{1}{\theta}$
Exponential, λ | $\left(1 - t/\lambda\right)^{-1}$, for $t < \lambda$
Topic No. 58
Show that
$$E(X) = \int_0^{\infty}\left(1 - F(x)\right)dx.$$
Now,
$$1 - F(x) = 1 - P(X \le x) = P(X > x) = \int_x^{\infty} f_X(y)\,dy.$$
So
$$\int_0^{\infty}\left(1 - F(x)\right)dx = \int_0^{\infty}\int_x^{\infty} f_X(y)\,dy\,dx.$$
Interchanging the order of integration (the region is 0 < x < y < ∞),
$$= \int_0^{\infty}\int_0^{y} f_X(y)\,dx\,dy = \int_0^{\infty}\left[\int_0^{y} f_X(y)\,dx\right]dy.$$
Now,
$$\int_0^y f_X(y)\,dx = f_X(y)\int_0^y 1\,dx = f_X(y)\left[x\right]_0^y = f_X(y)(y - 0) = y\,f_X(y).$$
Therefore
$$\int_0^{\infty}\left(1 - F(x)\right)dx = \int_0^{\infty} y\,f_X(y)\,dy = E(X).$$
Derivation
of
Mean and Variance of a distribution
in terms of its MGF
(by repeated differentiation of the mgf)
• Since a distribution that has an mgf M(t) is completely
determined by M(t), it would not be surprising if we could
obtain some properties of this distribution directly from M(t).
• For example, the existence of M(t) for –h < t < h implies that
derivatives of M(t) of all orders exist at t = 0.
So here we will apply the method of successive derivatives of the MGF in order to prove that, for any pdf whose MGF exists,
$$\mu = M'(0)$$
and
$$\sigma^2 = M''(0) - \left[M'(0)\right]^2.$$
If X is a continuous random variable, then
$$M'(t) = \frac{dM(t)}{dt} = \frac{d}{dt}\int_{-\infty}^{\infty} e^{tx} f(x)\,dx = \int_{-\infty}^{\infty} \frac{d}{dt}e^{tx}\,f(x)\,dx = \int_{-\infty}^{\infty} x e^{tx} f(x)\,dx.$$
Here we use $\frac{d}{dt}e^{mt} = m\,e^{mt}$; in $e^{tx}$, x acts as the "constant" and t is the variable, so
$$\frac{d}{dt}e^{tx} = \frac{d}{dt}e^{xt} = x\,e^{xt} = x\,e^{tx}.$$
Upon setting t = 0, we have
$$M'(0) = \int_{-\infty}^{\infty} x\,e^{0\cdot x} f(x)\,dx = \int_{-\infty}^{\infty} x\,f(x)\,dx = E(X) = \mu.$$
Now, the second derivative:
$$M''(t) = \frac{d}{dt}\int_{-\infty}^{\infty} x e^{tx} f(x)\,dx = \int_{-\infty}^{\infty} x\,\frac{d}{dt}e^{tx}\,f(x)\,dx = \int_{-\infty}^{\infty} x^2 e^{tx} f(x)\,dx.$$
Upon setting t = 0, we have
$$M''(0) = \int_{-\infty}^{\infty} x^2 e^{0\cdot x} f(x)\,dx = \int_{-\infty}^{\infty} x^2 f(x)\,dx = E(X^2).$$
According to the short-cut formula,
$$\sigma^2 = \operatorname{Var}(X) = E(X^2) - \mu^2.$$
Therefore
$$\sigma^2 = M''(0) - \left[M'(0)\right]^2.$$
The point to be remembered is that sometimes one way is easier than the other.
Derivation
of
mth moment about the Origin
of a distribution from its MGF
(by repeated differentiation of
the MGF)
Theorem
If m is a positive integer and $M^{(m)}(t)$ means the mth derivative of M(t), we have, by repeated differentiation with respect to t,
$$M^{(m)}(0) = E\left(X^m\right),$$
where
$$E\left(X^m\right) = \int_{-\infty}^{\infty} x^m f(x)\,dx \quad\text{or}\quad \sum_x x^m p(x).$$
Proof:
$$M'(t) = \int_{-\infty}^{\infty} x\,e^{tx} f(x)\,dx,$$
$$M''(t) = \int_{-\infty}^{\infty} x^2 e^{tx} f(x)\,dx,$$
$$M'''(t) = \int_{-\infty}^{\infty} x^3 e^{tx} f(x)\,dx,$$
$$\vdots$$
$$M^{(m)}(t) = \int_{-\infty}^{\infty} x^m e^{tx} f(x)\,dx.$$
$$M^{(m)}(0) = \int_{-\infty}^{\infty} x^m e^{0\cdot x} f(x)\,dx = \int_{-\infty}^{\infty} x^m f(x)\,dx = E(X^m) = \text{the mth moment about the origin.}$$
Derivation
of
m th moment about an Arbitrary
origin of a distribution
from its MGF
(by repeated differentiation of the
mgf)
Theorem
Let X be a random variable and let $R(t) = E\left(e^{t(X-a)}\right)$. Then, differentiating with respect to t,
$$R'(t) = \int_{-\infty}^{\infty} (x-a)\,e^{t(x-a)} f(x)\,dx,$$
so
$$R'(0) = \int_{-\infty}^{\infty} (x-a)\,e^{0(x-a)} f(x)\,dx = \int_{-\infty}^{\infty} (x-a)\,e^{0} f(x)\,dx,$$
or
$$R'(0) = \int_{-\infty}^{\infty} (x-a)\,f(x)\,dx = E(X - a).$$
Similarly,
$$R''(0) = \int_{-\infty}^{\infty} (x-a)^2\,e^{0(x-a)} f(x)\,dx = \int_{-\infty}^{\infty} (x-a)^2 f(x)\,dx = E(X - a)^2,$$
and in general $R^{(m)}(0) = E(X-a)^m$, the mth moment about the arbitrary origin a.
$$M(t) = M(0) + \frac{M'(0)}{1!}t + \frac{M''(0)}{2!}t^2 + \cdots + \frac{M^{(m)}(0)}{m!}t^m + \cdots$$
But we know that
$$M^{(m)}(0) = E\left(X^m\right).$$
Also, we know that
$$M(t) = E\left(e^{tX}\right) \implies M(0) = E\left(e^{0\cdot X}\right) = E\left(e^0\right) = E(1) = 1.$$
Therefore
$$M(t) = 1 + \frac{E(X)}{1!}t + \frac{E(X^2)}{2!}t^2 + \cdots + \frac{E(X^m)}{m!}t^m + \cdots$$
or
$$M(t) = 1 + E(X)\frac{t}{1!} + E(X^2)\frac{t^2}{2!} + \cdots + E(X^m)\frac{t^m}{m!} + \cdots$$
Thus the coefficient of $t^m/m!$ in the Maclaurin's series representation of M(t) is $E(X^m)$, i.e., the mth moment about zero.
Derivation of the
Relationship
between the MGF of the Standardized
Variable
and
the MGF of the Original Random
Variable
Theorem:
Let the random variable X have mean µ, standard deviation σ, and mgf $M_X(t)$, $-h < t < h$. Then
$$M_Z(t) = e^{-\frac{\mu t}{\sigma}} M_X\!\left(\frac{t}{\sigma}\right), \quad -h\sigma < t < h\sigma,$$
where $Z = \dfrac{X - \mu}{\sigma}$.
Proof: (via the three general results below), which verifies the equation.
Next: three general results pertaining to the moment generating function:
$$1.\; M_{X+a}(t) = E\left[e^{(X+a)t}\right] = e^{at}\,M_X(t);$$
$$2.\; M_{bX}(t) = E\left[e^{tbX}\right] = E\left[e^{(bt)X}\right] = M_X(bt);$$
$$3.\; M_{\frac{X+a}{b}}(t) = E\left[e^{t\frac{X+a}{b}}\right] = E\left[e^{\frac{at}{b}}\,e^{\frac{t}{b}X}\right] = e^{\frac{at}{b}}\,E\left[e^{\frac{t}{b}X}\right] = e^{\frac{at}{b}}\,M_X\!\left(\frac{t}{b}\right).$$
Note that the third result, i.e.,
$$M_{\frac{X+a}{b}}(t) = e^{\frac{at}{b}}\,M_X\!\left(\frac{t}{b}\right),$$
is of special importance when a = −µ and b = σ, in which case we have
$$M_{\frac{X-\mu}{\sigma}}(t) = e^{-\frac{\mu t}{\sigma}}\,M_X\!\left(\frac{t}{\sigma}\right),$$
or, in other words,
$$M_Z(t) = e^{-\frac{\mu t}{\sigma}}\,M_X\!\left(\frac{t}{\sigma}\right),$$
exactly the same as what we proved.
Let the pdf f(x) of X be symmetric about 0, i.e., f(−x) = f(x). Then M(−t) = M(t).
Proof:
By definition,
$$M(-t) = E\left(e^{(-t)X}\right) = \int_{-\infty}^{\infty} e^{(-t)x} f(x)\,dx = \int_{-\infty}^{\infty} e^{t(-x)} f(x)\,dx.$$
If f(x) is symmetric about 0, then f(x) = f(−x), so
$$M(-t) = \int_{-\infty}^{\infty} e^{t(-x)} f(-x)\,dx.$$
Applying the transformation $u = -x$, so that $du = -dx$, with $x \to -\infty \Rightarrow u \to \infty$ and $x \to \infty \Rightarrow u \to -\infty$:
$$M(-t) = \int_{\infty}^{-\infty} e^{tu} f(u)\,(-du) = \int_{-\infty}^{\infty} e^{tu} f(u)\,du = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx = E\left(e^{tX}\right) = M(t).$$
Cumulants
The First Three Cumulants
The cumulant generating function is denoted by $\psi(t)$. In other words,
$$\psi(t) = \log M(t).$$
Next, let us consider the role of the cumulant generating function in finding the mean and variance of a distribution.
Two simple results:
$$\psi'(0) = \mu \quad\text{and}\quad \psi''(0) = \sigma^2.$$
Proof:
i) Mean:
$$\psi'(t) = \frac{d}{dt}\psi(t) = \frac{d}{dt}\log M(t) = \frac{1}{M(t)}M'(t) = \frac{M'(t)}{M(t)}.$$
Hence
$$\psi'(0) = \left.\frac{M'(t)}{M(t)}\right|_{t=0} = \frac{M'(0)}{M(0)} = \frac{\mu}{M(0)}.$$
However, $M(t) = E\left(e^{tX}\right)$ gives $M(0) = E\left(e^{0\cdot X}\right) = E\left(e^0\right) = E(1) = 1$, so that
$$\psi'(0) = \frac{\mu}{1} = \mu. \quad\text{Hence proved.}$$
Proof:
ii) Variance:
We have $\psi'(t) = \dfrac{M'(t)}{M(t)}$. Hence
$$\psi''(t) = \frac{d}{dt}\frac{M'(t)}{M(t)} = \frac{M''(t)M(t) - \left(M'(t)\right)^2}{\left(M(t)\right)^2},$$
so that
$$\psi''(0) = \frac{M''(0)M(0) - \left(M'(0)\right)^2}{\left(M(0)\right)^2} = \frac{(1)M''(0) - \mu^2}{(1)^2} = M''(0) - \mu^2.$$
$$\psi''(0) = E(X^2) - \mu^2 = \sigma^2 = \operatorname{Var}(X). \quad\text{Hence proved.}$$
Topic No. 66
Additivity Property
of the CGF and the Cumulants
The CGF of the sum
of two independent random variables
equals the sum of their CGFs
so that, for independent X and Y,
$$K_{X+Y}(t) = \log E\left[e^{t(X+Y)}\right] = \log E\left[e^{tX} e^{tY}\right] = \log\left(E\left[e^{tX}\right]E\left[e^{tY}\right]\right) = \log E\left[e^{tX}\right] + \log E\left[e^{tY}\right] = K_X(t) + K_Y(t).$$
2. Additivity Property of the Cumulants:
If X and Y are independent random variables, then the nth cumulant of X + Y is related to the nth cumulant of X and the nth cumulant of Y by the relation
$$\kappa_n(X + Y) = \kappa_n(X) + \kappa_n(Y).$$
• Definition
The central moment generating function is given by
$$C(t) = E\left(e^{t(X-\mu)}\right).$$
When we consider the mgf for moments about 0, we note that $E\left(e^{tX}\right)$ can also be written as $E\left(e^{t(X-0)}\right)$. The central moment generating function satisfies
$$C(t) = E\left[e^{t(X-\mu)}\right] = E\left[e^{tX} e^{-\mu t}\right] = e^{-\mu t}E\left[e^{tX}\right] = e^{-\mu t}M(t).$$
Now, by definition, we have $K(t) = \log M(t)$, so $e^{K(t)} = e^{\log M(t)} = M(t)$ and
$$C(t) = e^{K(t)}\,e^{-\mu t} = e^{K(t) - \mu t}.$$
• To express the central moments as functions of the cumulants, just drop from these polynomials all terms in which κ₁ appears as a factor:
$$\mu_1 = 0$$
$$\mu_2 = \kappa_2$$
$$\mu_3 = \kappa_3$$
$$\mu_4 = \kappa_4 + 3\kappa_2^2$$
$$\mu_5 = \kappa_5 + 10\kappa_3\kappa_2$$
$$\mu_6 = \kappa_6 + 15\kappa_4\kappa_2 + 10\kappa_3^2 + 15\kappa_2^3.$$
To express the cumulants κₙ for n > 1 as functions of the central moments, drop from these polynomials all terms in which µ′₁ appears as a factor:
$$\kappa_2 = \mu_2$$
$$\kappa_3 = \mu_3$$
$$\kappa_4 = \mu_4 - 3\mu_2^2$$
$$\kappa_5 = \mu_5 - 10\mu_3\mu_2$$
$$\kappa_6 = \mu_6 - 15\mu_4\mu_2 - 10\mu_3^2 + 30\mu_2^3.$$
Chebyshev's Inequality and its proof & an alternative form of the inequality
• Chebyshev's inequality guarantees that, for a wide class of probability distributions, no more than a certain fraction of values can be more than a certain distance from the mean. According to Chebyshev's theorem, no more than 1/k² of the distribution's values can be more than k standard deviations away from the mean:
$$P\left(|X - \mu| \ge k\sigma\right) \le \frac{1}{k^2}.$$
Example: let X have pdf $f(x) = \frac{1}{2\sqrt{3}}$ for $-\sqrt{3} < x < \sqrt{3}$, zero elsewhere, so that µ = 0 and σ = 1. For k = 3/2,
$$P\left(|X - \mu| \ge k\sigma\right) = P\left(\left|X - 0\right| \ge \frac{3}{2}\right) = P\left(|X| \ge \frac{3}{2}\right) \qquad (1)$$
$$= 1 - \int_{-3/2}^{3/2} \frac{1}{2\sqrt{3}}\,dx = 1 - \frac{1}{2\sqrt{3}}\left[x\right]_{-3/2}^{3/2} = 1 - \frac{1}{2\sqrt{3}}\left(\frac{3}{2} + \frac{3}{2}\right) = 1 - \frac{3}{2\sqrt{3}} = 1 - \frac{\sqrt{3}}{2} \approx 0.134.$$
• By Chebyshev's inequality, this probability has the upper bound 1/k² = 4/9.
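A simulation sketch (illustrative) comparing the exact probability with the Chebyshev bound for this uniform example:

```python
import math
import random

random.seed(2)
a = math.sqrt(3)          # X ~ Uniform(-sqrt(3), sqrt(3)): mu = 0, sigma = 1
k = 1.5
n = 200_000
hits = sum(abs(random.uniform(-a, a)) >= k for _ in range(n))
print(f"P(|X| >= {k}) ~ {hits / n:.4f}")
print(f"exact: {1 - math.sqrt(3)/2:.4f}, Chebyshev bound: {1/k**2:.4f}")
```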
Another application
Of
Chebyshev's Inequality
Alternative form:
$$P\left(|X - \mu| < m\right) \ge 1 - \frac{\sigma^2}{m^2}.$$
Suppose µ = 3 and E(X²) = 13, so that
$$\sigma^2 = E(X^2) - \mu^2 = 13 - 9 = 4.$$
Now, we note that if we put m = 5, then 3 − m = −2 and 3 + m = 8. Therefore the required probability is
$$P(-2 < X < 8) = P(3 - m < X < m + 3).$$
But, according to Chebyshev's inequality,
$$P(3 - m < X < m + 3) \ge 1 - \frac{4}{m^2}.$$
So
$$P(-2 < X < 8) \ge 1 - \frac{4}{25} = \frac{25 - 4}{25} = \frac{21}{25} = 0.84.$$
Topic No. 71
Let X be uniformly distributed on the numbers $a_1, a_2, \ldots, a_n$, so that $p(a_i) = 1/n$ and
$$E(X) = n^{-1}\sum_{i=1}^{n} a_i.$$
Then the mean of X is the arithmetic mean (AM):
$$E(X) = \sum_x x\,p(x) = \frac{a_1}{n} + \frac{a_2}{n} + \cdots + \frac{a_n}{n} = \frac{1}{n}\left(a_1 + a_2 + \cdots + a_n\right) = \frac{1}{n}\sum_{i=1}^{n} a_i.$$
Theorem (Jensen's Inequality)
If φ is convex on an open interval I and X is a random variable whose support is contained in I and which has finite expectation, then
$$\varphi\left(E(X)\right) \le E\left[\varphi(X)\right].$$
If φ is strictly convex, then the inequality is strict unless X is a constant random variable.
Now let X be uniform on the positive numbers $a_1, \ldots, a_n$. Then, since −log x is a convex function, we have by Jensen's inequality that
$$-\log\left(\frac{1}{n}\sum_{i=1}^{n} a_i\right) \le E(-\log X) = -\frac{1}{n}\sum_{i=1}^{n}\log a_i = -\log\left(a_1 a_2 \cdots a_n\right)^{1/n}$$
or, equivalently,
$$\log\left(\frac{1}{n}\sum_{i=1}^{n} a_i\right) \ge \log\left(a_1 a_2 \cdots a_n\right)^{1/n},$$
and hence
$$\left(a_1 a_2 \cdots a_n\right)^{1/n} \le \frac{1}{n}\sum_{i=1}^{n} a_i. \qquad (1)$$
The quantity on the left side of this inequality is called the geometric mean (GM). So equation (1) is equivalent to saying that GM ≤ AM for any finite set of positive numbers.
Now in eq. (1) replace $a_i$ by $1/a_i$ (which is positive). We then obtain
$$\left(\frac{1}{a_1}\cdot\frac{1}{a_2}\cdots\frac{1}{a_n}\right)^{1/n} \le \frac{1}{n}\sum_{i=1}^{n}\frac{1}{a_i}$$
or, equivalently,
$$\frac{1}{\frac{1}{n}\sum_{i=1}^{n}\frac{1}{a_i}} \le \left(a_1 a_2 \cdots a_n\right)^{1/n}. \qquad (2)$$
The left member of this inequality is called the harmonic mean (HM). Putting the equations together, we have shown the relationship
$$\text{HM} \le \text{GM} \le \text{AM} \qquad (3)$$
for any finite set of positive numbers.
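A tiny sketch (illustrative, with arbitrary positive data) computing the three means and confirming HM ≤ GM ≤ AM:

```python
import math

a = [2.0, 3.0, 7.0, 11.0]
n = len(a)

am = sum(a) / n
gm = math.prod(a) ** (1 / n)
hm = n / sum(1 / x for x in a)

print(f"HM={hm:.4f} <= GM={gm:.4f} <= AM={am:.4f}")
assert hm <= gm <= am
```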
Concept of a
Random Vector
(explained through an example)
Let us begin the discussion of a pair of random variables
with the following example.
Let X₁ denote the number of H's on the first two tosses and X₂ denote the number of H's on all three flips.
Concept of an event
in a case of a two-dimensional space
(i.e. a set of ordered pairs)
• Let Ɗ be the space associated with the random vector
(X1, X2).
Let X₁ denote the number of H's on the first two tosses and X₂ denote the number of H's on all three flips.
Then our interest can be represented by the pair of random variables
(X1, X2).
For example, (X1(HTH), X2(HTH)) represents the outcome (1,2).
Continuing in this way, X1 and X2 are real-valued functions
defined on the sample space C, which take us from the
sample space to the space of ordered number pairs.
the Joint
Cumulative
Distribution Function
We can uniquely define $P_{X_1,X_2}$ in terms of the cumulative distribution function (cdf), which is given by
$$P\left[a_1 < X_1 \le b_1,\; a_2 < X_2 \le b_2\right] = F_{X_1,X_2}(b_1, b_2) - F_{X_1,X_2}(a_1, b_2) - F_{X_1,X_2}(b_1, a_2) + F_{X_1,X_2}(a_1, a_2). \qquad (2)$$
$$p_{X_1,X_2}(x_1, x_2) = P\left[X_1 = x_1,\; X_2 = x_2\right],$$
which satisfies
$$(i)\;\; 0 \le p_{X_1,X_2}(x_1, x_2) \le 1 \quad\text{and}\quad (ii)\;\; \mathop{\sum\sum}_{\mathcal{D}} p_{X_1,X_2}(x_1, x_2) = 1. \qquad (1)$$
For an event $B \in \mathcal{D}$, we have
$$P\left[(X_1, X_2) \in B\right] = \mathop{\sum\sum}_{B} p_{X_1,X_2}(x_1, x_2).$$
Likewise we may extend the pmf $p_{X_1,X_2}(x_1, x_2)$ over a convenient set by using zero elsewhere. Hence, we replace
$$\mathop{\sum\sum}_{\mathcal{D}} p_{X_1,X_2}(x_1, x_2) \;\text{ by }\; \sum_{x_2}\sum_{x_1} p(x_1, x_2).$$
C={TTT,TTH,THT,HTT,THH, HTH,HHT,HHH}.
Let
X1 denote the number of H’s on the first two tosses
and
X2 denotes the number of H’s on all three flips.
Then our interest can be represented by the pair of random variables (X1,
X2).
Now,
X1(TTT)=0 and X2(TTT)=0
X1(TTH)=0 and X2 (TTH)=1
X1(THT)=1 and X2 (THT)=1
X1(HTT)=1 and X2 (HTT)=1
X1(THH)=1 and X2 (THH)=2
X1(HTH)=1 and X2 (HTH)=2
X1(HHT)=2 and X2 (HHT)=2
X1(HHH)=2 and X2 (HHH)=3
Now, our interest can be represented by the pair of random
variables (X1, X2).
So, the eight possible pairs are
(X₁,X₂) = (0,0), (0,1), (1,1), (1,1), (1,2), (1,2), (2,2), (2,3).
So, we write
Ɗ = {(0,0), (0,1), (1,1), (1,2), (2,2), (2,3)},
with probabilities
P[(X₁,X₂)=(0,0)] = 1/8
P[(X₁,X₂)=(0,1)] = 1/8
P[(X₁,X₂)=(1,1)] = 2/8
P[(X₁,X₂)=(1,2)] = 2/8
P[(X₁,X₂)=(2,2)] = 1/8
P[(X₁,X₂)=(2,3)] = 1/8
We can conveniently table the pmf of the random vector (X₁, X₂) as:

                       Support of X₂
                  0      1      2      3
Support   0      1/8    1/8     0      0
of X₁     1       0     2/8    2/8     0
          2       0      0     1/8    1/8
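A short sketch (illustrative) encoding this joint pmf and deriving both marginals by summing over the other variable:

```python
from fractions import Fraction

F = Fraction
# joint[(x1, x2)] = P[(X1, X2) = (x1, x2)] from the table above
joint = {(0, 0): F(1, 8), (0, 1): F(1, 8),
         (1, 1): F(2, 8), (1, 2): F(2, 8),
         (2, 2): F(1, 8), (2, 3): F(1, 8)}

marg_x1, marg_x2 = {}, {}
for (x1, x2), p in joint.items():
    marg_x1[x1] = marg_x1.get(x1, F(0)) + p   # sum over x2
    marg_x2[x2] = marg_x2.get(x2, F(0)) + p   # sum over x1

print("p_X1:", {k: str(v) for k, v in sorted(marg_x1.items())})
# {0: '1/4', 1: '1/2', 2: '1/4'}
print("p_X2:", {k: str(v) for k, v in sorted(marg_x2.items())})
# {0: '1/8', 1: '3/8', 2: '3/8', 3: '1/8'}
```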
$$F_{X_1,X_2}(x_1, x_2) = \int_{-\infty}^{x_1}\int_{-\infty}^{x_2} f_{X_1,X_2}(w_1, w_2)\,dw_2\,dw_1$$
for all $(x_1, x_2) \in R^2$.
Joint Probability Density Function
We call the integrand $f_{X_1,X_2}(w_1, w_2)$ the joint probability density function of $(X_1, X_2)$. It satisfies
$$(i)\;\; f_{X_1,X_2}(x_1, x_2) \ge 0 \quad\text{and}\quad (ii)\;\; \iint_{\mathcal{D}} f_{X_1,X_2}(x_1, x_2)\,dx_1\,dx_2 = 1. \qquad (1)$$
We may extend the definition of the pdf $f_{X_1,X_2}(x_1, x_2)$ over $R^2$ by using zero elsewhere, replacing
$$\iint_{\mathcal{D}} f_{X_1,X_2}(x_1, x_2)\,dx_1\,dx_2 \;\text{ by }\; \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x_1, x_2)\,dx_1\,dx_2.$$
• For an event $A \in \mathcal{D}$, we have $P\left[(X_1, X_2) \in A\right] = \iint_A f_{X_1,X_2}(x_1, x_2)\,dx_1\,dx_2$.
Find
$$P\left(0 < X_1 < \frac{3}{4},\; \frac{1}{3} < X_2 < 2\right).$$
Solution:
Since $(X_1, X_2)$ is a continuous random vector with $f(x_1, x_2) = 6x_1^2 x_2$ on the unit square (zero elsewhere),
$$P\left(0 < X_1 < \frac{3}{4},\; \frac{1}{3} < X_2 < 2\right) = \int_{1/3}^{2}\int_{0}^{3/4} f(x_1, x_2)\,dx_1\,dx_2 = \int_{1/3}^{1}\int_{0}^{3/4} 6x_1^2 x_2\,dx_1\,dx_2 + \int_{1}^{2}\int_{0}^{3/4} 0\,dx_1\,dx_2$$
$$= \int_{1/3}^{1} 6x_2\left[\frac{x_1^3}{3}\right]_0^{3/4} dx_2 + 0 = \int_{1/3}^{1} 6x_2\left(\frac{(3/4)^3}{3} - 0\right)dx_2 = \int_{1/3}^{1} 6x_2\cdot\frac{27}{64}\cdot\frac{1}{3}\,dx_2 = \frac{27}{32}\int_{1/3}^{1} x_2\,dx_2$$
$$= \frac{27}{32}\left[\frac{x_2^2}{2}\right]_{1/3}^{1} = \frac{27}{32}\left(\frac{(1)^2}{2} - \frac{(1/3)^2}{2}\right) = \frac{27}{32}\left(\frac{1}{2} - \frac{1}{18}\right) = \frac{27}{32}\cdot\frac{8}{18} = \frac{3}{8}.$$
Note that this probability is the volume under the surface given by $f(x_1, x_2) = 6x_1^2 x_2$ above the rectangular set
$$\left\{(x_1, x_2) : 0 < x_1 < \frac{3}{4},\; \frac{1}{3} < x_2 < 1\right\} \subset R^2.$$
Concept of the
Support
of a
Continuous
Random Vector
(explained through an example)
• For a continuous random vector (X1, X2), the
support of (X1, X2) contains all points (x1, x2)
for which f(x1, x2) > 0.
Example: for the pdf above, the support is $\{(x_1, x_2) : 0 < x_1 < 1,\; 0 < x_2 < 1\}$.
Topic No. 80
Properties
of the
Joint Cumulative Distribution Function
Definition
• The joint cumulative function of two random
variables X and Y is defined as
$$F_{XY}(x, y) = P(X \le x,\; Y \le y).$$
Properties:
The joint CDF satisfies the following properties:
1. $F_X(x) = F_{XY}(x, \infty)$, for any x (marginal CDF of X);
2. $F_Y(y) = F_{XY}(\infty, y)$, for any y (marginal CDF of Y);
3. $F_{XY}(\infty, \infty) = 1$;
4. $F_{XY}(-\infty, y) = F_{XY}(x, -\infty) = 0$;
5. $P(x_1 < X \le x_2,\; y_1 < Y \le y_2) = F_{XY}(x_2, y_2) - F_{XY}(x_1, y_2) - F_{XY}(x_2, y_1) + F_{XY}(x_1, y_1)$;
6. If X and Y are independent, then $F_{XY}(x, y) = F_X(x)\,F_Y(y)$.
Topic No. 82
Marginal
Probability Mass Functions
(explained through an example)
• Consider a discrete random vector, that is, a vector
whose entries are discrete random variables.
• To find the probability that X₁ equals x₁, keep x₁ fixed and sum the joint pmf over all values of x₂ (the row total gives the marginal of X₁; the column total gives the marginal of X₂). In the continuous case, instead of summing we integrate one variable out over the other.
Marginal
Probability Mass Functions
(explained through an example)
Example:
C={TTT,TTH,THT,HTT,THH, HTH,HHT,HHH}.
Let
X1 denote the number of H’s on the first two tosses
and
X2 denotes the number of H’s on all three flips.
Then the space of the discrete random vector (X1,X2) is given
by
Ɗ = {(0,0), (0,1),(1,1), (1,2),(2,2),(2,3)}.
with probabilities
P[(X1,X2)=(0,0)]= 1/8
P[(X1,X2)=(0,1)]= 1/8
P[(X1,X2)=(1,1)]= 2/8
P[(X1,X2)=(1,2)]= 2/8
P[(X1,X2)=(2,2)]= 1/8
P[(X1,X2)=(2,3)]= 1/8
We can conveniently table the pmf of the random vector (X₁, X₂), together with the marginal totals, as:

                       Support of X₂
                  0      1      2      3    | p₁(x₁)
Support   0      1/8    1/8     0      0    |  2/8
of X₁     1       0     2/8    2/8     0    |  4/8
          2       0      0     1/8    1/8   |  2/8
        p₂(x₂)   1/8    3/8    3/8    1/8   |   1

The row totals give the marginal pmf of X₁ and the column totals give the marginal pmf of X₂.
Topic No. 84
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X_1,X_2}(x_1, x_2)\,dx_1\,dx_2 = 1$$
• In the continuous case, the marginal pdf of X₁ is found by integrating out x₂, i.e.,
$$f_{X_1}(x_1) = \int_{-\infty}^{\infty} f_{X_1,X_2}(x_1, x_2)\,dx_2. \qquad (1)$$
Example:
$$f(x_1, x_2) = \begin{cases} x_1 + x_2 & 0 < x_1 < 1,\; 0 < x_2 < 1 \\ 0 & \text{elsewhere} \end{cases}$$
It is easy to verify that
$$\int_0^1\int_0^1 (x_1 + x_2)\,dx_1\,dx_2 = 1, \quad\text{i.e.,}\quad \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X_1,X_2}(x_1, x_2)\,dx_1\,dx_2 = 1.$$
The marginal pdf of X₁ is
$$f_1(x_1) = \int_0^1 (x_1 + x_2)\,dx_2 = x_1\int_0^1 1\,dx_2 + \int_0^1 x_2\,dx_2 = x_1\left[x_2\right]_0^1 + \left[\frac{x_2^2}{2}\right]_0^1 = x_1(1 - 0) + \left(\frac{1^2}{2} - 0\right)$$
$$f_1(x_1) = x_1 + \frac{1}{2}, \quad 0 < x_1 < 1,$$
zero elsewhere, and the marginal pdf of X₂ is
$$f_2(x_2) = \int_0^1 (x_1 + x_2)\,dx_1 = \left[\frac{x_1^2}{2}\right]_0^1 + x_2\left[x_1\right]_0^1 = \left(\frac{1^2}{2} - 0\right) + x_2(1 - 0)$$
$$f_2(x_2) = \frac{1}{2} + x_2, \quad 0 < x_2 < 1,$$
zero elsewhere.
Topic No. 85
Another example of computation of probabilities that cannot be found through marginal pdfs
Example:
Let
$$f(x_1, x_2) = \begin{cases} 4x_1 x_2 & 0 < x_1 < 1,\; 0 < x_2 < 1 \\ 0 & \text{elsewhere} \end{cases}$$
be the joint pdf of X₁ and X₂. Find
i) $P\left(0 < X_1 < \frac{1}{2},\; \frac{1}{4} < X_2 < 1\right)$,
ii) $P(X_1 = X_2)$,
iii) $P(X_1 < X_2)$, and
iv) $P(X_1 \le X_2)$.
Solution:
i)
$$P\left(0 < X_1 < \frac{1}{2},\; \frac{1}{4} < X_2 < 1\right) = \int_{1/4}^{1}\int_0^{1/2} 4x_1 x_2\,dx_1\,dx_2 = \int_{1/4}^{1} 4x_2\left[\frac{x_1^2}{2}\right]_0^{1/2} dx_2 = \int_{1/4}^{1} 4x_2\left(\frac{(1/2)^2}{2} - 0\right)dx_2$$
$$= \frac{1}{2}\int_{1/4}^{1} x_2\,dx_2 = \frac{1}{2}\left[\frac{x_2^2}{2}\right]_{1/4}^{1} = \frac{1}{2}\left(\frac{(1)^2}{2} - \frac{(1/4)^2}{2}\right) = \frac{1}{2}\left(\frac{1}{2} - \frac{1}{32}\right) = \frac{15}{64}.$$
ii) Now to find P(X₁ = X₂): P(X₁ = X₂) can be rewritten as P(X₁ − X₂ = 0). The event {X₁ = X₂} is a line segment in the plane, which has zero area, so
$$P(X_1 = X_2) = 0.$$
iii) Find P(X₁ < X₂). According to the definition,
$$P(X_1 < X_2) = \int_0^1\int_0^{x_2} f(x_1, x_2)\,dx_1\,dx_2 = \int_0^1\int_0^{x_2} 4x_1 x_2\,dx_1\,dx_2 = \int_0^1 4x_2\left[\frac{x_1^2}{2}\right]_0^{x_2} dx_2 = \int_0^1 4x_2\cdot\frac{x_2^2}{2}\,dx_2$$
$$= \int_0^1 2x_2^3\,dx_2 = 2\left[\frac{x_2^4}{4}\right]_0^1 = 2\left(\frac{1}{4} - 0\right) = \frac{1}{2}.$$
iv) Since P(X₁ = X₂) = 0, $P(X_1 \le X_2) = P(X_1 < X_2) + P(X_1 = X_2) = \frac{1}{2}$.
Expected Value
of a Real-Valued
Function
of a Random Vector
It is a straightforward extension of the concept of the
expected value of a function of a random variable.
Let (X1, X2) be a random vector and let Y = g(X1, X2) where
g : R2→R is some real-valued function of X1 and X2;.
For example:
• Y= X1+X2,
• Y = X 12 − e 2
X
• Then Y is a random variable and we can determine its
expectation by obtaining the distribution of Y.
First and foremost, let us determine the conditions under
which the expectation (or expected value) of Y will exist.
Then, if
x f ( x ) dx ,
−
g ( x ) f ( x ) dx
−
provided that g ( x ) f ( x ) dx .
−
A similar definition is available for a function g(X1, X2) of
two random variables X1 and X2.
g (x ,x ) f
− −
1 2 X1 , X 2 ( x1 , x2 ) dx1dx2 .
Likewise, if the random vector (X1, X2) is discrete and we let
Y = g(X1, X2), then E(Y) exists if
g ( x , x ) p
x1 x2
1 2 X1 , X 2 ( x1 , x2 ) .
and is given by
E (Y ) = g ( x1 , x2 ) p X1 , X 2 ( x1 , x2 ) .
x1 x2
So,
$$P(A_1) = P(A_3) + \int_0^2\int_{-\infty}^{4} f(x, y)\,dy\,dx, \quad\text{i.e.,}\quad \frac{7}{8} = \frac{3}{8} + \int_0^2\int_{-\infty}^{4} f(x, y)\,dy\,dx.$$
$$P(A_4) = \int_{-\infty}^{0}\int_{-\infty}^{1} f(x, y)\,dy\,dx.$$
So,
$$P(A_3) = P(A_4) + \int_{-\infty}^{0}\int_{1}^{4} f(x, y)\,dy\,dx, \quad\text{i.e.,}\quad \frac{3}{8} = \frac{2}{8} + \int_{-\infty}^{0}\int_{1}^{4} f(x, y)\,dy\,dx.$$
So,
$$\int_{-\infty}^{2}\int_{1}^{4} f(x, y)\,dy\,dx = \frac{7}{8} - \frac{4}{8} = \frac{3}{8} = P(A_6),$$
$$\int_{0}^{2}\int_{-\infty}^{4} f(x, y)\,dy\,dx = \frac{7}{8} - \frac{3}{8} = \frac{4}{8} = P(A_7),$$
and
$$\int_{-\infty}^{0}\int_{1}^{4} f(x, y)\,dy\,dx = \frac{3}{8} - \frac{2}{8} = \frac{1}{8} = P(A_8).$$
Hence,
$$P(A_5) = \int_{0}^{2}\int_{1}^{4} f(x, y)\,dy\,dx = \int_{-\infty}^{2}\int_{1}^{4} f(x, y)\,dy\,dx - \int_{-\infty}^{0}\int_{1}^{4} f(x, y)\,dy\,dx = \frac{3}{8} - \frac{1}{8} = \frac{2}{8} = \frac{1}{4}.$$
For the joint pmf $p(x_1, x_2) = \frac{x_1 + x_2}{12}$, $x_1 = 1, 2$, $x_2 = 1, 2$, the marginal of X₁ is $p_1(x_1) = \frac{2x_1 + 3}{12}$, so
$$E(X_1) = \sum_{x_1 = 1, 2} x_1\,\frac{2x_1 + 3}{12} = (1)\frac{2(1)+3}{12} + (2)\frac{2(2)+3}{12} = \frac{5}{12} + \frac{14}{12} = \frac{19}{12},$$
and by symmetry
$$E(X_2) = E(X_1) = \frac{19}{12}.$$
Now
$$E(X_1 X_2) = \sum_{x_1, x_2} x_1 x_2\,\frac{x_1 + x_2}{12} = (1\cdot 1)\frac{1+1}{12} + (1\cdot 2)\frac{1+2}{12} + (2\cdot 1)\frac{2+1}{12} + (2\cdot 2)\frac{2+2}{12} = \frac{2}{12} + \frac{6}{12} + \frac{6}{12} + \frac{16}{12} = \frac{30}{12} = 2.5.$$
On the other hand,
$$E(X_1)E(X_2) = \frac{19}{12}\cdot\frac{19}{12} = 2.5069.$$
Hence
$$E(X_1 X_2) \ne E(X_1)E(X_2).$$
For the joint pdf $f(x_1, x_2) = 4x_1 x_2$ on the unit square, find $E(X_1)$, $E(X_1^2)$, $E(X_2)$, $E(X_2^2)$, and $E(X_1 X_2)$. Is $E(X_1 X_2) = E(X_1)E(X_2)$? Find $E\left(3X_1 - 2X_1^2 + 6X_1 X_2\right)$.
Firstly, E(X₁). As a rule:
$$E(X_1) = \int_{-\infty}^{\infty} x_1 f_{X_1}(x_1)\,dx_1 = \int_{-\infty}^{\infty} x_1\left[\int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2\right]dx_1 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_1 f(x_1, x_2)\,dx_2\,dx_1,$$
and similarly for E(X₂). Thus
$$E(X_1) = \int_0^1\int_0^1 x_1\cdot 4x_1 x_2\,dx_2\,dx_1 = \int_0^1 4x_1^2\left[\frac{x_2^2}{2}\right]_0^1 dx_1 = \int_0^1 2x_1^2\,dx_1 = 2\left[\frac{x_1^3}{3}\right]_0^1 = \frac{2}{3},$$
$$E(X_1) = \frac{2}{3}.$$
Then E(X₂), by the same computation with the roles of x₁ and x₂ interchanged:
$$E(X_2) = \int_0^1\int_0^1 x_2\cdot 4x_1 x_2\,dx_1\,dx_2 = \int_0^1 4x_2^2\left[\frac{x_1^2}{2}\right]_0^1 dx_2 = \int_0^1 2x_2^2\,dx_2 = \frac{2}{3}.$$
Similarly, E(X₁X₂). As a rule:
$$E(X_1 X_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_1 x_2\,f_{X_1,X_2}(x_1, x_2)\,dx_2\,dx_1.$$
Now
$$E(X_1 X_2) = \int_0^1\int_0^1 x_1 x_2\cdot 4x_1 x_2\,dx_1\,dx_2 = \int_0^1 4x_2^2\left[\frac{x_1^3}{3}\right]_0^1 dx_2 = \int_0^1 \frac{4}{3}x_2^2\,dx_2 = \frac{4}{3}\left[\frac{x_2^3}{3}\right]_0^1 = \frac{4}{9}.$$
Hence
$$E(X_1 X_2) = E(X_1)E(X_2) = \frac{2}{3}\cdot\frac{2}{3} = \frac{4}{9}.$$
Determination of
the Expectation of the Ratio of two continuous
random variables
(explained through an example)
Example:
Let X₁ and X₂ have the pdf
$$f(x_1, x_2) = \begin{cases} 8x_1 x_2 & 0 < x_1 < x_2 < 1 \\ 0 & \text{elsewhere} \end{cases}$$
and let $Y = X_1/X_2$. The CDF of Y is, for 0 < y < 1,
$$F_Y(y) = P\left(\frac{X_1}{X_2} \le y\right) = \int_0^1\left[\int_0^{y x_2} 8x_1 x_2\,dx_1\right]dx_2 = \int_0^1 8x_2\left[\frac{x_1^2}{2}\right]_0^{y x_2} dx_2 = \int_0^1 8x_2\cdot\frac{y^2 x_2^2}{2}\,dx_2 = 4y^2\left[\frac{x_2^4}{4}\right]_0^1 = y^2.$$
Hence
$$f_Y(y) = \begin{cases} 2y & 0 < y < 1 \\ 0 & \text{elsewhere} \end{cases}$$
and
$$E(Y) = \int_0^1 y\cdot 2y\,dy = 2\left[\frac{y^3}{3}\right]_0^1 = \frac{2}{3}.$$
Alternatively, computing the expectation of the ratio directly,
$$E(Y) = E\left(\frac{X_1}{X_2}\right) = \int_0^1\int_0^{x_2} \frac{x_1}{x_2}\cdot 8x_1 x_2\,dx_1\,dx_2 = \int_0^1 8\left[\frac{x_1^3}{3}\right]_0^{x_2} dx_2 = \int_0^1 \frac{8x_2^3}{3}\,dx_2 = \frac{8}{3}\left[\frac{x_2^4}{4}\right]_0^1 = \frac{8}{3}\cdot\frac{1}{4} = \frac{2}{3}.$$
How to obtain
product moments and simple moments
from the MGF
of a random vector?
In a simplified notation,
$$E(X) = \frac{\partial M(0,0)}{\partial t_1}, \qquad E(Y) = \frac{\partial M(0,0)}{\partial t_2},$$
$$E(X^2) = \frac{\partial^2 M(0,0)}{\partial t_1^2}, \qquad E(Y^2) = \frac{\partial^2 M(0,0)}{\partial t_2^2},$$
and
$$E(XY) = \frac{\partial^2 M(0,0)}{\partial t_1\,\partial t_2}.$$
Therefore
$$\mu_1 = E(X) = \frac{\partial M(0,0)}{\partial t_1}, \qquad \mu_2 = E(Y) = \frac{\partial M(0,0)}{\partial t_2},$$
and
$$\sigma_1^2 = E(X^2) - \left[E(X)\right]^2 = \frac{\partial^2 M(0,0)}{\partial t_1^2} - \left(\frac{\partial M(0,0)}{\partial t_1}\right)^2,$$
$$\sigma_2^2 = E(Y^2) - \left[E(Y)\right]^2 = \frac{\partial^2 M(0,0)}{\partial t_2^2} - \left(\frac{\partial M(0,0)}{\partial t_2}\right)^2.$$
And, as far as the covariance is concerned, we have
$$E\left[(X - \mu_1)(Y - \mu_2)\right] = E(XY) - E(X)E(Y) = \frac{\partial^2 M(0,0)}{\partial t_1\,\partial t_2} - \frac{\partial M(0,0)}{\partial t_1}\cdot\frac{\partial M(0,0)}{\partial t_2}.$$
$$E(\mathbf{X}) = \begin{pmatrix} E(X_1) \\ E(X_2) \end{pmatrix}$$
For example, suppose X₁ is the number of heads obtained when tossing two fair coins together, so that E(X₁) = 1, and X₂ is the upface obtained when rolling a fair die, so that E(X₂) = 3.5. If we consider the two experiments together, then the expected value of X is given by
$$E(\mathbf{X}) = \begin{pmatrix} E(X_1) \\ E(X_2) \end{pmatrix} = \begin{pmatrix} 1 \\ 3.5 \end{pmatrix}.$$
Linear Combination
of Expected Values
of Real-Valued Functions of a Random Vector
(explained through an example)
Theorem:
Let (X1, X2) be a random vector, and,
let Y1= g1(X1, X2) and Y2= g2(X1, X2) be random variables the
expectations of which exist.
Then, for all real numbers k₁ and k₂,
$$E(k_1 Y_1 + k_2 Y_2) = k_1 E(Y_1) + k_2 E(Y_2).$$
Proof: By the triangle inequality $|x + y| \le |x| + |y|$,
$$\left|k_1 g_1(x_1, x_2) + k_2 g_2(x_1, x_2)\right| f_{X_1,X_2}(x_1, x_2) \le |k_1|\,|g_1(x_1, x_2)|\,f_{X_1,X_2}(x_1, x_2) + |k_2|\,|g_2(x_1, x_2)|\,f_{X_1,X_2}(x_1, x_2). \qquad (1)$$
Now, by the linearity of integration,
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \left|k_1 g_1 + k_2 g_2\right| f_{X_1,X_2}\,dx_1\,dx_2 \le |k_1|\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |g_1|\,f_{X_1,X_2}\,dx_1\,dx_2 + |k_2|\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |g_2|\,f_{X_1,X_2}\,dx_1\,dx_2.$$
Now, according to the statement of the theorem, Y₁ = g₁(X₁, X₂) and Y₂ = g₂(X₁, X₂) are random variables whose expectations exist, i.e.,
$$E(Y_1) \text{ exists} \iff \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |g_1(x_1, x_2)|\,f_{X_1,X_2}(x_1, x_2)\,dx_1\,dx_2 < \infty,$$
$$E(Y_2) \text{ exists} \iff \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |g_2(x_1, x_2)|\,f_{X_1,X_2}(x_1, x_2)\,dx_1\,dx_2 < \infty.$$
Hence the right-hand side above is finite, and therefore
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \left|k_1 g_1(x_1, x_2) + k_2 g_2(x_1, x_2)\right| f_{X_1,X_2}(x_1, x_2)\,dx_1\,dx_2 < \infty,$$
i.e., $E(k_1 Y_1 + k_2 Y_2)$ exists. By once again using the linearity of the integral, we have
$$E(k_1 Y_1 + k_2 Y_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\left[k_1 g_1(x_1, x_2) + k_2 g_2(x_1, x_2)\right]f_{X_1,X_2}(x_1, x_2)\,dx_1\,dx_2$$
$$= k_1\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g_1\,f_{X_1,X_2}\,dx_1\,dx_2 + k_2\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g_2\,f_{X_1,X_2}\,dx_1\,dx_2 = k_1 E(Y_1) + k_2 E(Y_2).$$
where $x_1 = w_1(y_1, y_2)$, $x_2 = w_2(y_1, y_2)$ is the single-valued inverse of $y_1 = u_1(x_1, x_2)$ and $y_2 = u_2(x_1, x_2)$.
In using this change of variable technique, it should be
emphasized that we need two “new” variables to replace the
two “old” variables.
After we have found the joint pmf pY1 ,Y2 ( y1 , y2 ), we may obtain
the marginal pmf of Y1 by summing on y2 or the marginal pmf
Y2 by summing on y1.
$$p_{X_1,X_2}(x_1, x_2) = \begin{cases} \dfrac{\mu_1^{x_1}\mu_2^{x_2}e^{-\mu_1-\mu_2}}{x_1!\,x_2!} & x_1 = 0, 1, 2, 3, \ldots,\; x_2 = 0, 1, 2, 3, \ldots \\ 0 & \text{elsewhere} \end{cases}$$
where µ1 and µ2 are fixed positive real numbers.
where,
y1 = u1 ( x1 , x2 ) = x1 + x2
y2 = u2 ( x1 , x2 ) = x1 − x2 .
• This transformation is one-to-one.
• We first determine the set T in the y1y2-plane that is the mapping of S under this transformation.
• Inverse function
Now y1 = x1 + x2, y2 = x1 − x2 implies that
y1 + y2 = 2x1, so x1 = w1(y1, y2) = (y1 + y2)/2,
and
y1 − y2 = 2x2, so x2 = w2(y1, y2) = (y1 − y2)/2.
Next, the Jacobian is given by
J = | ∂x1/∂y1  ∂x1/∂y2 ; ∂x2/∂y1  ∂x2/∂y2 | = | 1/2  1/2 ; 1/2  −1/2 | = −1/4 − 1/4 = −1/2,
so |J| = 1/2.
Now, using the inequalities 0 < x1 < 1 and 0 < x2 < 1, we can write
0 < (y1 + y2)/2 < 1  and  0 < (y1 − y2)/2 < 1,
and it is easy to see that these are equivalent to
−y1 < y2,  y2 < 2 − y1,  y2 < y1,  y1 − 2 < y2.
Therefore
f_{Y1,Y2}(y1, y2) = f_{X1,X2}( (y1+y2)/2, (y1−y2)/2 ) |J| for (y1, y2) ∈ T, and 0 elsewhere,
or
f_{Y1,Y2}(y1, y2) = 1 · (1/2) = 1/2 for (y1, y2) ∈ T, and 0 elsewhere.
In other words, the joint pdf of (Y1, Y2) is given by
f_{Y1,Y2}(y1, y2) = 1/2 for (y1, y2) ∈ T, and 0 elsewhere.
Virtual University of Pakistan
Probability Distributions
by
E(e^{tX1}) = e^{μ1(e^t − 1)}
E(e^{tX2}) = e^{μ2(e^t − 1)}
Let Y = X1 + X2 and consider
E(e^{tY}) = Σ_{x1=0}^∞ Σ_{x2=0}^∞ e^{t(x1+x2)} p_{X1,X2}(x1, x2)
= [ Σ_{x1=0}^∞ e^{tx1} μ1^{x1} e^{−μ1} / x1! ] [ Σ_{x2=0}^∞ e^{tx2} μ2^{x2} e^{−μ2} / x2! ]
= [ e^{−μ1} Σ_{x1=0}^∞ (μ1 e^t)^{x1} / x1! ] [ e^{−μ2} Σ_{x2=0}^∞ (μ2 e^t)^{x2} / x2! ]
= e^{μ1(e^t − 1)} e^{μ2(e^t − 1)}
= e^{(μ1 + μ2)(e^t − 1)}.
Note that the factors in the brackets in the next-to-last equality are the mgfs of X1 and X2, respectively.
Hence Y has a Poisson distribution with parameter μ1 + μ2:
p_Y(y) = (μ1 + μ2)^y e^{−(μ1 + μ2)} / y!,  y = 0, 1, 2, ...
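As a sanity check, the following Python sketch (with assumed parameters μ1 = 2, μ2 = 3) compares the empirical pmf of X1 + X2 with the Poisson(μ1 + μ2) pmf:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu1, mu2 = 2.0, 3.0
y = rng.poisson(mu1, 200_000) + rng.poisson(mu2, 200_000)

# Empirical pmf of Y versus the Poisson(mu1 + mu2) pmf.
for k in range(8):
    print(k, np.mean(y == k), stats.poisson.pmf(k, mu1 + mu2))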
Virtual University of Pakistan
Probability Distributions
by
Characteristic Function
The Characteristic function of a probability distribution
is denoted by X ( t ) and is defined as the expected value
of eitX i.e.
X ( t ) = E ( eitX )
where t is an arbitrary real number and i is the imaginary
number given by i = −1 .
If the pdf of X is given by f(x), then
φ_X(t) = E(e^{itX}) = ∫_{−∞}^{∞} e^{itx} f(x) dx.
The characteristic function exists for every distribution and it possesses the uniqueness property. In other words, every distribution has a unique characteristic function.
Characteristic functions of some well-known discrete distributions:
Distribution | Characteristic function φ(t)
Binomial B(n, p) | (1 − p + pe^{it})^n
Poisson Pois(λ) | e^{λ(e^{it} − 1)}
Geometric Geo(p) | pe^{it} / (1 − (1 − p)e^{it})
Characteristic functions of some well-known continuous distributions (Exponential, Normal, Chi-squared, Gamma, Cauchy, Laplace); those given here are:
Distribution | Characteristic function φ(t)
Normal N(μ, σ²) | e^{iμt − σ²t²/2}
Chi-squared χ²_k | (1 − 2it)^{−k/2}
Gamma Γ(k, θ) | (1 − itθ)^{−k}
by
iE(X) = φ'(0)  and  i²E(X²) = φ''(0).
Example:
• Use the characteristic function of the exponential distribution in order to find the mean and variance.
Solution:
The CF of the exponential distribution is given by
φ(t) = λ / (λ − it).
Now, taking the first derivative w.r.t. t, we get
φ'(t) = d/dt [ λ(λ − it)^{−1} ] = λ(−1)(λ − it)^{−2} · d/dt(λ − it) = −λ(λ − it)^{−2}(−i) = iλ / (λ − it)².
Now putting t = 0, we have
φ'(0) = iλ / (λ − i(0))² = iλ / λ² = i / λ.
Therefore
φ'(0) = i(1/λ) = iE(X), so E(X) = 1/λ.
Now, taking the second derivative w.r.t. t, we get
φ''(t) = d/dt [ iλ(λ − it)^{−2} ] = iλ(−2)(λ − it)^{−3}(−i) = 2i²λ / (λ − it)³ = −2λ / (λ − it)³  (since i² = −1).
Now putting t = 0, we have
φ''(0) = −2λ / λ³ = −2 / λ².
Therefore
φ''(0) = i²(2/λ²) = i²E(X²), so E(X²) = 2/λ².
As we know that
V(X) = E(X²) − [E(X)]² = 2/λ² − 1/λ² = 1/λ².
Note also the relation φ(t) = M(it).
Virtual University of Pakistan
Probability Distributions
by
P(A | B) = P(A ∩ B) / P(B),
provided P(B) ≠ 0.
Suppose now that A and B are the events X = x and Y = y, so that we can write
P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y) = f(x, y) / h(y).
Determine: (d) P(1/4 < X1 < 1/2).
Solution:
First we will find the marginal pdf of X1:
f_{X1}(x1) = ∫_{x1}^{1} 10 x1 x2² dx2 = 10x1 [ x2³/3 ]_{x1}^{1} = (10/3) x1 (1 − x1³),  0 < x1 < 1.
Now,
P(1/4 < X1 < 1/2) = (10/3) ∫_{1/4}^{1/2} (x1 − x1⁴) dx1 = (10/3) [ x1²/2 − x1⁵/5 ]_{1/4}^{1/2}
= (10/3) [ (1/8 − 1/160) − (1/32 − 1/5120) ]
= (10/3) [ (1/8 − 1/32) − (1/160 − 1/5120) ] = (10/3) [ 3/32 − 31/5120 ]
= (10/3) · (480 − 31)/5120 = (10/3) · 449/5120 = 4490/15360 = 449/1536 ≈ 0.2923.
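A short symbolic sketch confirming the answer 449/1536 for the joint pdf f(x1, x2) = 10·x1·x2² on 0 < x1 < x2 < 1:

import sympy as sp

x1, x2 = sp.symbols('x1 x2', positive=True)
f = 10 * x1 * x2**2
f1 = sp.integrate(f, (x2, x1, 1))            # marginal pdf of X1
prob = sp.integrate(f1, (x1, sp.Rational(1, 4), sp.Rational(1, 2)))
print(prob)  # 449/1536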
Virtual University of Pakistan
Probability Distributions
by
E[u(X2) | x1] = ∫_{−∞}^{∞} u(x2) f_{2|1}(x2 | x1) dx2,
and the conditional variance is
E{ [X2 − E(X2 | x1)]² | x1 }.
Example: f(x1, x2) = 2 for 0 < x1 < x2 < 1, and 0 elsewhere.
Then the marginal probability density functions are, respectively,
f1(x1) = ∫_{x1}^{1} 2 dx2 = 2(1 − x1),  0 < x1 < 1, and 0 elsewhere,
and
f2(x2) = ∫_{0}^{x2} 2 dx1 = 2x2,  0 < x2 < 1, and 0 elsewhere.
The conditional pdf of X1, given X2 = x2, 0 < x2 < 1, is
f_{1|2}(x1 | x2) = 2/(2x2) = 1/x2,  0 < x1 < x2, and 0 elsewhere.
Here the conditional mean and the conditional variance of X1, given X2 = x2, are respectively
E(X1 | x2) = ∫_{−∞}^{∞} x1 f_{1|2}(x1 | x2) dx1 = ∫_{0}^{x2} x1 (1/x2) dx1 = x2/2,  0 < x2 < 1,
and
Var(X1 | x2) = ∫_{0}^{x2} (x1 − x2/2)² (1/x2) dx1 = x2²/12,  0 < x2 < 1.
Virtual University of Pakistan
Probability Distributions
by
Then,
E[E(X2 | X1)] = E(X2).
Proof:
E(X2) = ∫ x2 h(x2) dx2.
But
h(x2) = ∫ f(x1, x2) dx1.
Therefore E(X2) can be written as
E(X2) = ∫∫ x2 f(x1, x2) dx1 dx2 = ∫ [ ∫ x2 f(x1, x2) dx2 ] dx1.
Multiplying and dividing by f1(x1), E(X2) can be written as
E(X2) = ∫ [ ∫ x2 f(x1, x2)/f1(x1) dx2 ] f1(x1) dx1.
But
∫ x2 f(x1, x2)/f1(x1) dx2 = E(X2 | x1).
Therefore
E(X2) = ∫ E(X2 | x1) f1(x1) dx1 = E[E(X2 | x1)].
Hence, we can write
E(X2) = E[E(X2 | X1)].
by
The variances of X and Y, say σ1² and σ2², are obtained by setting the function u(X, Y) equal to (X − μ1)² and (Y − μ2)², respectively.
Covariance:
ρ = E[(X − μ1)(Y − μ2)] / (σ1σ2) = cov(X, Y) / (σ1σ2).
Example: f(x, y) = x + y for 0 < x < 1, 0 < y < 1, and 0 elsewhere.
μ2 = E(Y) = 7/12 and σ2² = E(Y²) − μ2² = 11/144 (and, by symmetry, μ1 = 7/12, σ1² = 11/144).
Therefore, the covariance of X and Y is
E(XY) − μ1μ2 = ∫_0^1 ∫_0^1 xy(x + y) dx dy − (7/12)² = 1/3 − 49/144 = −1/144.
Accordingly, the correlation coefficient of X and Y is
ρ = (−1/144) / √( (11/144)(11/144) ) = −1/11 ≈ −0.09.
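A symbolic sketch reproducing ρ = −1/11 for f(x, y) = x + y on the unit square:

import sympy as sp

x, y = sp.symbols('x y')
f = x + y
E = lambda g: sp.integrate(g * f, (x, 0, 1), (y, 0, 1))  # expectation operator

mx, my = E(x), E(y)                         # both 7/12
vx, vy = E(x**2) - mx**2, E(y**2) - my**2   # both 11/144
cov = E(x * y) - mx * my                    # -1/144
print(sp.simplify(cov / sp.sqrt(vx * vy)))  # -1/11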
by
Properties
of the
Correlation Coefficient
1. The coefficient of correlation lies between −1 and +1:
−1 ≤ ρ ≤ +1, i.e. |ρ| ≤ 1.
‘Upward-going’ scatter-diagram: 0 < ρ ≤ 1
Neither upward nor downward scatter-diagram: ρ =0
‘Downward-going’ scatter-diagram: -1 ≤ ρ < 0
Virtual University of Pakistan
Probability Distributions
by
Properties
of the
Correlation Coefficient
2. The coefficient of correlation is independent of change of origin.
by
Properties
of the
Correlation Coefficient
4. The coefficient of correlation possesses the property of
symmetry
ρXY = ρYX
5. Co-efficient of correlation measures only linear
correlation between X and Y.
6. If two variables X and Y are independent, coefficient of
correlation between them will be zero.
Virtual University of Pakistan
Probability Distributions
by
The support 0 < x < y < 1 can be re-written as 0 < y < 1, 0 < x < y,
and also as 0 < x < 1, x < y < 1.
First and foremost, we need to find each of the two marginal distributions. Because of the nature of the support of the bivariate density function, the limits of integration must be chosen carefully when finding marginal distributions.
Marginal distribution of X:
f_X(x) = ∫_{x}^{1} 2 dy = 2(1 − x),  0 < x < 1.
Marginal distribution of Y:
f_Y(y) = ∫_{0}^{y} 2 dx = 2[x]_0^y = 2y,  0 < y < 1.
Then, the conditional distributions are as follows:
f(x | y) = f(x, y)/f_Y(y) = 2/(2y) = 1/y,  0 < x < y,
and
f(y | x) = f(x, y)/f_X(x) = 2/(2(1 − x)) = 1/(1 − x),  x < y < 1.
Also,
E(X² | y) = ∫_0^y x² (1/y) dx = (1/y)(y³/3) = y²/3,  0 < y < 1.
Therefore,
Var(X | y) = E(X² | y) − [E(X | y)]² = y²/3 − (y/2)² = (4y² − 3y²)/12 = y²/12,  0 < y < 1.
Virtual University of Pakistan
Probability Distributions
by
f_{2|1}(y | x) = f(x, y) / f1(x)
at the points where f1(x) > 0;
and the conditional mean of Y, given X = x, is given by
E(Y | x) = ∫ y f_{2|1}(y | x) dy = ∫ y f(x, y) dy / f1(x).
This conditional mean of Y, given X = x, i.e. E(Y | X = x), is of course a function of x, say u(x).
Why? Have a close look at the expression ∫ y f(x, y) dy / f1(x): once y has been integrated out, only x remains.
In case u(x) is a linear function of x, say u(x) = a + bx, it can be shown that
E[Var(Y | X)] = σ2²(1 − ρ²) ≤ σ2².
The conditional variance can be interpreted as a random variable.
by
The Concept of
Independent Random Variables
(explained through an example)
Let X1 and X2 denote the random variables of the continuous
type which have the joint pdf f(x1,x2) and marginal probability
density functions f1(x1) and f2(x2), respectively.
f(x1, x2) = f1(x1) f_{2|1}(x2 | x1)  ...(1)
Suppose that we have an instance where f_{2|1}(x2 | x1) does not depend upon x1.
Then, for random variables of the continuous type, the marginal pdf of X2 is given by:
f2(x2) = ∫ f(x1, x2) dx1 = ∫ f_{2|1}(x2 | x1) f1(x1) dx1 = f_{2|1}(x2 | x1) ∫ f1(x1) dx1 = f_{2|1}(x2 | x1).
Accordingly,
f2(x2) = f_{2|1}(x2 | x1)  and  f(x1, x2) = f1(x1) f2(x2).
Formal definition of independence in the case of discrete random variables:
p(x1, x2) = p1(x1) p2(x2).
Virtual University of Pakistan
Probability Distributions
by
f(x1, x2) = x1 + x2 for 0 < x1 < 1, 0 < x2 < 1, and 0 elsewhere.
f1(x1) = ∫_0^1 (x1 + x2) dx2 = [ x1 x2 + x2²/2 ]_0^1 = x1 + 1/2,
so that
f1(x1) = x1 + 1/2,  0 < x1 < 1,
and
f2(x2) = ∫_0^1 (x1 + x2) dx1 = [ x1²/2 + x1 x2 ]_0^1 = x2 + 1/2,
so that
f2(x2) = x2 + 1/2,  0 < x2 < 1.
Now
f1(x1) f2(x2) = (x1 + 1/2)(x2 + 1/2) = x1x2 + (1/2)x1 + (1/2)x2 + 1/4,
whereas
f(x1, x2) = x1 + x2.
Since f1(x1) f2(x2) ≠ f(x1, x2), X1 and X2 are dependent.
by
F(x , y) = FX(x)FY(y)
For all (x , y) ϵ R2.
Proof:
If X and Y are independent, then
If X and Y are independent, then
F(x, y) = P(X ≤ x, Y ≤ y) = P(X ≤ x) P(Y ≤ y) = F_X(x) F_Y(y).
Now suppose
F(x, y) = F_X(x) F_Y(y) for all (x, y) ∈ ℝ²,
and let A = (a, b] and B = (c, d]. Then
P(X ∈ A, Y ∈ B) = P(a < X ≤ b, c < Y ≤ d)
= F(b, d) − F(a, d) − F(b, c) + F(a, c)
= F_X(b)F_Y(d) − F_X(a)F_Y(d) − F_X(b)F_Y(c) + F_X(a)F_Y(c)
= [F_X(b) − F_X(a)][F_Y(d) − F_Y(c)]
= P(a < X ≤ b) P(c < Y ≤ d)
= P(X ∈ A) P(Y ∈ B).
It may now be shown that P(X ∈ A, Y ∈ B) = P(X ∈ A)P(Y ∈ B) for any sets A ⊂ ℝ and B ⊂ ℝ. Thus X and Y are independent.
Virtual University of Pakistan
Probability Distributions
by
E[u(X1)v(X2)] = [ ∫ u(x1) f1(x1) dx1 ][ ∫ v(x2) f2(x2) dx2 ] = E[u(X1)] E[v(X2)].
Hence proved.
Upon taking the functions u(·) and v(·) to be the identity functions in the theorem, we note that for independent random variables X1 and X2,
E(X1X2) = E(X1) E(X2).  (1)
Virtual University of Pakistan
Probability Distributions
by
M(t1, t2) = M(t1, 0) M(0, t2);
that is, the joint mgf is identically equal to the product of the marginal mgfs.
Proof:
1. Suppose that X1 and X2 are independent. Then
M(t1, t2) = E(e^{t1X1 + t2X2}) = E(e^{t1X1}) E(e^{t2X2}) = M(t1, 0) M(0, t2).
2. Conversely, X1 has a unique mgf, which, in the continuous case, is given by
M(t1, 0) = ∫ e^{t1x1} f1(x1) dx1.
Thus we have
M(t1, 0) M(0, t2) = [ ∫ e^{t1x1} f1(x1) dx1 ][ ∫ e^{t2x2} f2(x2) dx2 ] = ∫∫ e^{t1x1 + t2x2} f1(x1) f2(x2) dx1 dx2.  (1)
We are given that M(t1, t2) = M(t1, 0) M(0, t2). But, by definition, M(t1, t2) is the joint mgf of X1 and X2, i.e.
M(t1, t2) = E(e^{t1X1 + t2X2}) = ∫∫ e^{t1x1 + t2x2} f(x1, x2) dx1 dx2.  (2)
The uniqueness of the mgf implies that the two distributions of probability described by f1(x1)f2(x2) and f(x1, x2) are the same. Thus
f(x1, x2) = f1(x1) f2(x2).
That is, if M(t1, t2) = M(t1, 0) M(0, t2), then X1 and X2 are independent.
Virtual University of Pakistan
Probability Distributions
by
X p(x)
0 ½
1 ½
Example 2:
Then, we have
X p(x)
1 1/6
2 1/6
3 1/6
4 1/6
5 1/6
6 1/6
SHAPE OF THE DISTRIBUTION
Virtual University of Pakistan
Probability Distributions
by
by
CDF
of the
Discrete Uniform Distribution
Example :
Suppose that we roll one fair die once, and we want the cdf of the distribution in algebraic form, and we also want to draw its graph.
Cumulative Distribution Function:
Step 1: Find the cumulative probabilities:
X p(x)
1 1/6
2 1/6
3 1/6
4 1/6
5 1/6
6 1/6
Cumulative Distribution Function:
Step 1: Find the cumulative probabilities:
X p(x) cumulative
1 1/6 1/6
2 1/6 2/6
3 1/6 3/6
4 1/6 4/6
5 1/6 5/6
6 1/6 6/6
Step 2: Write the cdf as follows:
F(x) = 0 for x < 1; 1/6 for 1 ≤ x < 2; 2/6 for 2 ≤ x < 3; 3/6 for 3 ≤ x < 4; 4/6 for 4 ≤ x < 5; 5/6 for 5 ≤ x < 6; 6/6 = 1 for x ≥ 6.
The entire x-axis (from −∞ to ∞) must be covered.
The graph of the CDF is a step function.
Virtual University of Pakistan
Probability Distributions
by
Derivation of
the Mean and Variance
of the Discrete Uniform Distribution
We derive the formula for the mean of the discrete random variable X ~ U(1, n):
E(X) = Σ_{x=1}^n x p_X(x) = Σ_{x=1}^n x (1/n) = (1/n)(1 + 2 + ... + n) = (1/n) · n(n+1)/2,
or
E(X) = (n + 1)/2.
For the variance, first
E(X²) = (1/n) Σ_{x=1}^n x² = (1/n) · n(n+1)(2n+1)/6 = (n+1)(2n+1)/6.
Then
σ² = E(X²) − [E(X)]² = (n+1)(2n+1)/6 − [(n+1)/2]²
= (n+1)[ 4(2n+1) − 6(n+1) ]/24 = (n+1)(2n − 2)/24 = 2(n+1)(n−1)/24
= (n² − 1)/12.
The square root of the expression (n² − 1)/12 gives the standard deviation of the discrete uniform distribution, i.e.
σ = √( (n² − 1)/12 ).
Example:
Let us consider the rolling of a fair die, where X represents the number of dots on the uppermost face of the die; the values would be 1, 2, 3, 4, 5, 6, therefore n = 6.
We know that E(X) = (n + 1)/2; when n = 6 we get the mean as 3.5.
Absolutely symmetric distribution
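A tiny Python sketch confirming the formulas for the fair-die case n = 6:

import numpy as np

n = 6
x = np.arange(1, n + 1)
mean = x.mean()                  # (n + 1) / 2 = 3.5
var = ((x - mean) ** 2).mean()   # (n**2 - 1) / 12 = 35/12
print(mean, var, (n**2 - 1) / 12)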
Virtual University of Pakistan
Probability Distributions
by
M(t) = E(e^{tX}) = (1/n) Σ_{x=1}^n e^{tx}.
Now Σ_{x=1}^n e^{tx} is a geometric series with first term e^t and common ratio e^t:
Σ_{x=1}^n e^{tx} = e^t (1 − e^{tn}) / (1 − e^t).
So
M(t) = e^t (1 − e^{tn}) / [ n(1 − e^t) ],  t ≠ 0.
by
Binomial distribution
(Binomial experiment
and PMF of the distribution)
Binomial Experiment
p(x) = C(n, x) p^x (1 − p)^{n−x},  x = 0, 1, 2, ..., n,  (1)
and 0 elsewhere,
where p is the probability of success in each trial and n is the number of trials.
Virtual University of Pakistan
Probability Distributions
by
Binomial distribution
(Binomial experiment
and PMF of the distribution)
EXAMPLE
Example:
An event has the probability p = 3/8.
Find the complete binomial distribution for n=5 trials.
q = 1- p = 5/8; and n = 5.
x 0 1 2 3 4 5
P(X=x) 0.0954 0.2861 0.3233 0.2060 0.0618 0.0074
Binomial Theorem
Σ_{x=0}^n C(n, x) q^{n−x} p^x = (q + p)^n = 1.
Virtual University of Pakistan
Probability Distributions
by
Shape
of the
Binomial distribution
• The shape of the binomial probability distribution
depends on the values of the two parameters p and n.
In general:
• As n, the number of trials, increases to ∞,
β1 → 0 and β2 → 3; that is, the distribution approaches the symmetric, mesokurtic shape of the normal distribution.
by
An example pertaining to
the Binomial Distribution
Example:
Find the smallest value of n that yields P(Y ≥1) > 0.80.
Solution:
Let us first see how P(Y = 0) behaves.
Here p = 1/3, so
P(Y = 0) = (1 − 1/3)^n = (2/3)^n.
Now, we are required to find the smallest value of n that yields P(Y ≥ 1) = 1 − (2/3)^n > 0.80.
by
Example
Example
The probability that a planted radish seed germinates is
0.80. A gardener plants nine seeds.
Let X denote the number of radish seeds that successfully
germinate.
(i) What is the average number of seeds the gardener
could expect to germinate?
(ii) What is the standard deviation of X?
(iii) What is the probability that, out of the nine seeds, at
least 8 will germinate ?
Solution:
Assuming that the nine seeds are selected at random from a large
number of seeds, it is easy to see that X can be regarded as a
binomial random variable --- as follows:
1. Either the seed will germinate (success) or not germinate
(failure)
2. The germination/non-germination of each seed is independent of
each other
3. The probability of germination of each seed is the same i.e. 0.80
4. The number of seeds is fixed i.e. 9.
Now, if X is a binomial random variable, then we know that the mean of X is np. Therefore
μ = np = 9(0.80) = 7.2 seeds, and σ = √(npq) = √(9 × 0.80 × 0.20) = √1.44 = 1.2 seeds.
For part (iii):
P(X = 8 or X = 9) = C(9, 8)(0.80)^8(0.20)^1 + C(9, 9)(0.80)^9(0.20)^0
= 9(0.1678)(0.20) + 0.1342 = 0.3020 + 0.1342 = 0.4362 = 43.62%
(a non-trivial amount of probability).
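The example is easily verified with scipy (n = 9, p = 0.8):

from scipy import stats

n, p = 9, 0.8
X = stats.binom(n, p)
print(X.mean())              # 7.2
print(X.std())               # 1.2
print(X.pmf(8) + X.pmf(9))   # ~0.4362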
Virtual University of Pakistan
Probability Distributions
by
Derivation of the
MGF
of a
Binomial distribution
The mgf of a binomial distribution is easily obtained as follows:
M(t) = Σ_x e^{tx} p(x) = Σ_{x=0}^n e^{tx} C(n, x) p^x (1 − p)^{n−x}
= Σ_{x=0}^n C(n, x) (pe^t)^x (1 − p)^{n−x}
= (1 − p + pe^t)^n.
We know that μ = M'(0) and σ² = M''(0) − μ².
Since
M'(t) = n(1 − p + pe^t)^{n−1} pe^t
and
M''(t) = n(1 − p + pe^t)^{n−1} pe^t + n(n−1)(1 − p + pe^t)^{n−2} (pe^t)²,
we have
M'(0) = n(1 − p + p)^{n−1} p = np,
so μ = np.
And
M''(0) = n(1 − p + p)^{n−1} p + n(n−1)(1 − p + p)^{n−2} p² = np + n(n−1)p²,
implying that
σ² = M''(0) − μ² = np + n(n−1)p² − n²p²
= np + n²p² − np² − n²p² = np − np² = np(1 − p) = npq.
Virtual University of Pakistan
Probability Distributions
by
p(x) = C(5, x) (1/3)^x (2/3)^{5−x},  x = 0, 1, 2, ..., 5, and 0 elsewhere,
and
μ = np = 5/3  and  σ² = np(1 − p) = 10/9.
Virtual University of Pakistan
Probability Distributions
by
The sum of m
independent
Binomial random variables
with same value of p
is also Binomial
Theorem:
Let X1, X2,…., Xm be independent random variables
such that Xi has binomial b(ni, p) distribution, for
i = 1,2,…,m.
Let Y = Σ_{i=1}^m X_i.
The mgf of X_i is M_{X_i}(t) = (1 − p + pe^t)^{n_i}.
By independence it follows that
M_Y(t) = Π_{i=1}^m (1 − p + pe^t)^{n_i} = (1 − p + pe^t)^{Σ n_i} = (q + pe^t)^{Σ n_i}.
Hence Y has a binomial b(Σ_{i=1}^m n_i, p) distribution.
P(X = x) = C(x−1, r−1) p^r (1 − p)^{x−r},  x = r, r+1, r+2, ...,
and we say that X has a negative binomial(r, p) distribution.
Example: with r = 3 and p = 1/2,
P(X = x) = C(x−1, 2) (0.5)^x,  x = 3, 4, 5, ...
Calculation of Probabilities
Shape of the distribution
In general,
Shape of the Negative Binomial distribution
It is obvious that the negative binomial distribution is
moderately positively skewed and that, as p tends to zero,
the shape of the distribution tends to normality.
Mean and Variance of the
Negative Binomial Distribution
Example
An oil company conducts a geological study that indicates that an
exploratory oil well should have a 20% chance of striking oil.
The company is wanting success twice i.e. two oil wells from
where they will be able to pull out oil --- and is willing to try upto
five times.
What is the probability that the second success comes on the fifth
well drilled?
Solution:
We know that, in a sequence of independent Bernoulli(p) trials, if the random variable X denotes the number of the trial at which the r-th success occurs, then the probability mass function of X is
P(X = x) = C(x−1, r−1) p^r (1 − p)^{x−r},  x = r, r+1, r+2, ...
Here, we do have a sequence of Bernoulli trials: whenever we dig an exploratory oil well, we either will or will not strike oil.
With r = 2, p = 0.2 and x = 5:
P(X = 5) = C(4, 1)(0.2)²(0.8)³ = 4(0.04)(0.512) ≈ 0.082 = 8.2%.
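The same number via scipy; note that scipy's nbinom counts the number of failures before the r-th success, so the 5th trial giving the 2nd success corresponds to 3 failures:

from scipy import stats

r, p = 2, 0.2
print(stats.nbinom.pmf(3, r, p))  # ~0.0819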
Virtual University of Pakistan
Probability Distributions
by
Geometric Distribution
(PMF and shape of the
distribution)
Geometric Distribution
(PMF and shape of the distribution)
Assume Bernoulli trials — that is,
p_X(x) = C(n, x) p^x (1 − p)^{n−x},  x = 0, 1, 2, 3, ..., n.
When n = 1, there are two possible outcomes, and p, the probability of success, is the only parameter of the Bernoulli distribution, given by
p_X(x) = C(1, x) p^x (1 − p)^{1−x} = p^x (1 − p)^{1−x},  x = 0, 1.
If we let X denote the number of trials until the first success, then
p_X(x) = (1 − p)^{x−1} p,  x = 1, 2, 3, ...
Example:
Suppose that a game consists of tossing a fair coin until we
obtain the first head.
Then p = P(H) = 0.5, so
p_X(x) = (0.5)^{x−1}(0.5) = 1/2^x,  x = 1, 2, 3, ...
Representation in tabular form:
X P(x)
1 1/21=1/2=0.5
2 1/22=1/4=0.25
3 1/23=1/8=0.125
4 1/24=1/16=0.0625
5 1/25=1/32=0.03125
6 1/26=1/64=0.015625
Virtual University of Pakistan
Probability Distributions
by
Application
of the Geometric Distribution
(explained through an example)
Example:
If the probability that a person will believe a rumor
about the retirement of a certain politician is 0.25, what
is the probability that
p_X(x) = (1 − p)^{x−1} p,  x = 1, 2, 3, ...  (1)
Here, let X denote the number of the person who hears the rumor and is the first one to believe it, so
p_X(x) = (0.75)^{x−1}(0.25),  x = 1, 2, 3, ...
Since the sixth person is the first to believe the rumor, i.e. the first success occurs on the sixth trial, we put x = 6.
Hence
P(X = 6) = (0.75)^5(0.25) ≈ 0.0593.
by
x1 + x2 +…+ xk-1 ≤ n.
Suppose that a die is tossed 30 times. Then x1 represents the number of times 1 came up, x2 the number of times 2 came up, and so on up to x6 (so k = 6). Suppose that x1 = 2, x2 = 3, x3 = 4, x4 = 5, x5 = 6.
Adding these numbers we get 2 + 3 + 4 + 5 + 6 = 20.
The total number of tosses was 30; therefore the number of times a 6 can occur is exactly
n − (x1 + ... + x_{k−1}) = 30 − (2 + 3 + 4 + 5 + 6) = 30 − 20 = 10.
• The probability mass function of this multinomial distribution is:
f(x1, ..., xk; n, p1, ..., pk) = [ n! / (x1! x2! ... xk!) ] p1^{x1} p2^{x2} ... pk^{xk}, when Σ_{i=1}^k xi = n, and 0 otherwise,
for non-negative integers x1, ..., xk.
This is the multinomial pmf of k−1 random variables X1, X2, ..., X_{k−1} of the discrete type.
Properties:
The mean of the multinomial distribution: E(Xi) = n pi.
The variance of the multinomial distribution: Var(Xi) = n pi (1 − pi).
The covariance of the multinomial distribution: cov(Xi, Xj) = −n pi pj, i ≠ j (note the negative sign: the counts compete for the same n trials).
Note:
The binomial distribution is a special case of the
multinomial distribution.
Virtual University of Pakistan
Probability Distributions
by
Putting k = 2, we obtain
f(x1, x2; n, p1, p2) = [ n! / (x1! x2!) ] p1^{x1} p2^{x2}, when Σ_{i=1}^2 xi = n, and 0 otherwise.
But Σ_{i=1}^2 xi = n means x1 + x2 = n, i.e. x2 = n − x1,
and Σ_{i=1}^2 pi = 1 means p1 + p2 = 1, i.e. p2 = 1 − p1.
Hence, the above equation can be re-written as
f(x1; n, p1) = [ n! / (x1! (n − x1)!) ] p1^{x1} (1 − p1)^{n − x1},
or, simply,
f(x; n, p) = [ n! / (x! (n − x)!) ] p^x (1 − p)^{n−x}.
Virtual University of Pakistan
Probability Distributions
by
Part (i):
What is the probability that the jury contains seven White,
three Black and two Hispanic members?
Solution:
Here, we are dealing with the Multinomial distribution with k = 3.
The resulting distribution is known as the Trinomial distribution.
To solve this problem, consider the random vector
X = (X1, X2, X3)
where
X1 = number of White members,
X2 = number of Black members and
X3 = number of Hispanic members
Then X has a Trinomial distribution with parameters n = 12 and p1=0.65,
p2 =0.20, p3 =0.15.
Hence, the answer to the first question is:
P(X1 = 7, X2 = 3, X3 = 2) = [ n! / (x1! x2! x3!) ] p1^{x1} p2^{x2} p3^{x3}
= [ 12! / (7! 3! 2!) ] (0.65)^7 (0.20)^3 (0.15)^2 ≈ 0.0699 = 6.99%, or about 7%.
Part (ii):
What is the probability that the jury contains four White
and eight Other members?
Solution:
Here, we are dealing with the Multinomial distribution
with k = 2, as follows:
White Others
(Black & Hispanic
combined)
65% 20%+15%=35%
P(X1 = 4, X2 = 8) = [ n! / (x1! x2!) ] p1^{x1} p2^{x2}
= [ 12! / (4! 8!) ] (0.65)^4 (0.35)^8 ≈ 0.0199 = 1.99%, or about 2%.
Part (iii):
What is the probability that the jury contains at the most
one Black member.
Solution:
Here again, we are dealing with the Multinomial
distribution with k = 2, as follows:
Black Others
(White & Hispanic
combined)
20% 65%+15%=80%
P(X = 0) = [ 12! / (0! 12!) ] (0.20)^0 (0.80)^12 ≈ 0.069
and
P(X = 1) = [ 12! / (1! 11!) ] (0.20)^1 (0.80)^11 ≈ 0.206.
Therefore, the answer is:
P(X = 0 or X = 1) = P(X = 0) + P(X = 1) = 0.069 + 0.206 = 0.275 = 27.5%.
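Part (i) of the jury example, checked with scipy's multinomial pmf:

from scipy import stats

probs = [0.65, 0.20, 0.15]
print(stats.multinomial.pmf([7, 3, 2], n=12, p=probs))  # ~0.0699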
Virtual University of Pakistan
Probability Distributions
by
MGF
of the Multinomial Distribution
Recall: Concept of Multinomial Distribution
Let a random experiment be repeated n independent times.
0
otherwise.
for non-negative integers x1, ..., xk.
For this purpose, we begin with the definition of the
mgf of a multivariate distribution (in general):
Definition of Moment Generating Function
of a Random Vector having k components:
Let X = (X1, X2, ..., Xk) be a random vector.
If E[e^{t1X1 + t2X2 + ... + tkXk}] exists for |t1| < h1, |t2| < h2, ..., |tk| < hk, where h1, h2, ..., hk are positive, then it is denoted by
M_{X1,X2,...,Xk}(t1, t2, ..., tk)
and is called the moment generating function (mgf) of the random vector X.
As far as the moment generating function of the multinomial distribution is concerned, we have:
M_X(t) = M_X(t1, t2, ..., tk) = E[ e^{Σ_{i=1}^k ti Xi} ] = ( Σ_{i=1}^k pi e^{ti} )^n.
The cumulant generating function of the multinomial distribution is given by
K_X(t) = K_X(t1, t2, ..., tk) = log M_X(t1, t2, ..., tk) = n log( Σ_{i=1}^k pi e^{ti} ).
Coming back to the MGF, for the trinomial case with n = 2:
M_X(t) = M_{X1,X2,X3}(t1, t2, t3) = E[ e^{Σ_{i=1}^3 ti Xi} ] = ( p1 e^{t1} + p2 e^{t2} + p3 e^{t3} )².
Now we know that (a + b + c)² = a² + b² + c² + 2ab + 2ac + 2bc. Hence
M_{X1,X2,X3}(t1, t2, t3) = p1²e^{2t1} + p2²e^{2t2} + p3²e^{2t3} + 2p1p2 e^{t1+t2} + 2p1p3 e^{t1+t3} + 2p2p3 e^{t2+t3}.
Then the means of the random variables X1, X2 and X3 are given by:
E(X1) = ∂M(0,0,0)/∂t1,  E(X2) = ∂M(0,0,0)/∂t2,  E(X3) = ∂M(0,0,0)/∂t3.
Hence, taking the partial derivative of the MGF with respect to t1, we have:
∂M/∂t1 = 2p1²e^{2t1} + 0 + 0 + 2p1p2 e^{t1}e^{t2} + 2p1p3 e^{t1}e^{t3} + 0.
As such,
∂M(0,0,0)/∂t1 = 2p1² + 2p1p2 + 2p1p3 = 2p1(p1 + p2 + p3) = 2p1.
Here n = 2, so
E(Xi) = 2pi,
i.e.
E(X1) = 2p1,  E(X2) = 2p2  &  E(X3) = 2p3.
Virtual University of Pakistan
Probability Distributions
by
Hypergeometric Distribution
(PMF and shape of the distribution)
Hypergeometric Distribution
by
Hypergeometric Distribution
(EXAMPLE)
Example:
Then,
by
Dr. Saleha Naghmi Habibullah
Topic No. 139
Derivation of
the Mean and Variance of
the Hypergeometric Distribution
Two important properties of the hypergeometric
probability distribution are given here.
µ = np  and  σ² = npq (N − n)/(N − 1),
where p = k/N and q = (N − k)/N.
Derivation of the Mean:
Let X have the hypergeometric probability distribution given by
h(x; N, n, k) = C(k, x) C(N−k, n−x) / C(N, n),
for x such that 0 ≤ x ≤ n and 0 ≤ x ≤ k.
Then the mean, μ, is given by
μ = E(X) = Σ_{x=0}^n x C(k, x) C(N−k, n−x) / C(N, n) = Σ_{x=1}^n x C(k, x) C(N−k, n−x) / C(N, n),
since the x = 0 term vanishes.
Now
x C(k, x) = x · k! / (x!(k−x)!) = k · (k−1)! / ( (x−1)! ((k−1)−(x−1))! ) = k C(k−1, x−1),
so
E(X) = [ k / C(N, n) ] Σ_{x=1}^n C(k−1, x−1) C(N−k, n−x).
Let y = x − 1, implying x = y + 1; then when x = 1, y = 0, and when x = n, y = n − 1:
E(X) = [ k / C(N, n) ] Σ_{y=0}^{n−1} C(k−1, y) C(N−k, n−1−y).
By the identity Σ_{j=0}^m C(a, j) C(b, m−j) = C(a+b, m), the sum equals C(N−1, n−1). So
E(X) = k C(N−1, n−1) / C(N, n)
= k · [ (N−1)! / ((n−1)! ((N−1)−(n−1))!) ] · [ n! (N−n)! / N! ]
= k n (n−1)! (N−n)! (N−1)! / ( N (N−1)! (n−1)! (N−n)! ) = nk/N = np,  where p = k/N.
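A quick check of E(X) = nk/N with scipy (assumed values N = 50, n = 10 draws, k = 20 success states; note scipy's parameter names M, n, N):

from scipy import stats

N, n, k = 50, 10, 20
print(stats.hypergeom(M=N, n=k, N=n).mean(), n * k / N)  # both 4.0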
Virtual University of Pakistan
Probability Distributions
by
Dr. Saleha Naghmi Habibullah
Topic No. 140
Derivation of
the Mean and Variance of
the Hypergeometric Distribution
The variance of the hypergeometric distribution is as follows:
Variance = n (K/N) ((N − K)/N) ((N − n)/(N − 1)).
Or, in other words, the mean and variance of the hypergeometric probability distribution are
µ = np  and  σ² = npq (N − n)/(N − 1),
where p = k/N and q = (N − k)/N.
Derivation of the Variance:
Now E(X²) = E[X(X−1) + X] = E[X(X−1)] + E(X), so
σ² = E(X²) − [E(X)]² = E[X(X−1)] + E(X) − [E(X)]².
Now
E[X(X−1)] = Σ_{x=0}^n x(x−1) C(k, x) C(N−k, n−x) / C(N, n);
the x = 0 and x = 1 terms vanish, so the sum starts at x = 2. Writing
x(x−1) C(k, x) = x(x−1) · k! / (x!(k−x)!) = k(k−1) · (k−2)! / ( (x−2)! ((k−2)−(x−2))! ) = k(k−1) C(k−2, x−2),
we have
E[X(X−1)] = [ k(k−1) / C(N, n) ] Σ_{x=2}^n C(k−2, x−2) C(N−k, n−x).
Let y = x − 2, implying x = y + 2; then when x = 2, y = 0, and when x = n, y = n − 2:
E[X(X−1)] = [ k(k−1) / C(N, n) ] Σ_{y=0}^{n−2} C(k−2, y) C(N−k, n−2−y) = k(k−1) C(N−2, n−2) / C(N, n),
using Σ_{j=0}^m C(a, j) C(b, m−j) = C(a+b, m). Now
C(N−2, n−2) / C(N, n) = [ (N−2)! / ((n−2)!(N−n)!) ] · [ n!(N−n)! / N! ] = n(n−1) / (N(N−1)),
so
E[X(X−1)] = n(n−1) k(k−1) / ( N(N−1) ).
Therefore
σ² = n(n−1)k(k−1)/(N(N−1)) + nk/N − n²k²/N²
= [ nk / (N²(N−1)) ] [ N(n−1)(k−1) + N(N−1) − nk(N−1) ]
= [ nk / (N²(N−1)) ] ( N² − Nn − Nk + nk )
= nk (N−k)(N−n) / ( N²(N−1) ).
So
Var(X) = n (k/N) ((N−k)/N) ((N−n)/(N−1)) = npq (N−n)/(N−1),
where p = k/N and q = (N−k)/N.
Virtual University of Pakistan
Probability Distributions
by
Poisson Distribution
(PMF and shape of the distribution)
Poisson Distribution
(PMF and shape of the distribution)
p(x) = λ^x e^{−λ} / x!,  x = 0, 1, 2, ...,  where λ > 0.
Since λ > 0, therefore p(x) ≥ 0,
and
Σ_x p(x) = Σ_{x=0}^∞ λ^x e^{−λ} / x! = e^{−λ} Σ_{x=0}^∞ λ^x / x!.
Recall that the series
1 + m + m²/2! + m³/3! + ... = Σ_{x=0}^∞ m^x / x! = e^m.
Hence
Σ_x p(x) = e^{−λ} e^{λ} = 1.
by
Poisson Process
The Poisson Process
The Poisson process is one of the most widely-used
counting processes.
• From the record of the past six months, we know that, on a particular
web server, requests for individual documents occur at the rate of 3 per
hour;
(Other than this information, the timings of the requests seem to be
totally random.)
A process in which events occur randomly either over a time
scale or over a distance scale.
In other words,
by
Application of
Poisson Process
EXAMPLE
Hence λt = 2 × 7 = 14, so that
P(X = x) = e^{−14} 14^x / x!,  x = 0, 1, 2, ..., and 0 elsewhere.  (1)
Now, we require P(X ≤ 2). So
P(X ≤ 2) = P(X = 0 or X = 1 or X = 2)
= e^{−14} 14⁰/0! + e^{−14} 14¹/1! + e^{−14} 14²/2!
= e^{−14} + 14e^{−14} + 98e^{−14} = 113 e^{−14}
≈ 113 / 1202604 ≈ 0.00009396.
We can say that the probability that at most 2 accidents occur at this particular crossing in a week is almost 0, which is "next to impossible!"
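The same probability via scipy (rate 2 per day over a 7-day week, so λt = 14 as above):

from scipy import stats

print(stats.poisson.cdf(2, mu=14))  # ~9.4e-05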
Virtual University of Pakistan
Probability Distributions
by
p(x) = λ^x e^{−λ} / x!,  x = 0, 1, 2, ..., and 0 elsewhere.
It is easy to prove that, for a Poisson distribution, mean = variance = λ. Hence
CV = (σ/μ) × 100% = (√λ/λ) × 100% = (1/√λ) × 100%.
For λ = 1, CV = (1/√1) × 100% = 100%.
Example 1:
Suppose that X has a Poisson distribution with λ = 16, i.e.
p(x) = 16^x e^{−16} / x!,  x = 0, 1, 2, ..., and 0 elsewhere.
The mean of this distribution is 16, and the variance of this distribution is 16. Therefore, the coefficient of variation of this distribution is
CV = (1/√16) × 100% = (1/4) × 100% = 25%.
It appears that, as λ increases, the coefficient of variation of the Poisson distribution decreases.
by
Derivation
of the Mean
of the Poisson distribution
If the random variable X has a Poisson distribution with
parameter µ,
p(x; μ) = e^{−μ} μ^x / x!,
then its mean and variance are given by E(X) = μ and Var(X) = μ.
Proof:
By definition,
Mean = E(X) = Σ_{x=0}^∞ x p(x; μ)
= 0 · e^{−μ}μ⁰/0! + 1 · e^{−μ}μ¹/1! + 2 · e^{−μ}μ²/2! + 3 · e^{−μ}μ³/3! + 4 · e^{−μ}μ⁴/4! + ...
= μ e^{−μ} [ 1 + μ + μ²/2! + μ³/3! + ... ].
But we know that the series 1 + m + m²/2! + m³/3! + ... = Σ_{x=0}^∞ m^x/x! = e^m. Hence
Mean = μ e^{−μ} e^{μ} = μ.
Virtual University of Pakistan
Probability Distributions
by
Dr. Saleha Naghmi Habibullah
Topic No. 146
Derivation
of the
Variance of the Poisson distribution
If the random variable X has a Poisson distribution with
parameter µ, then its mean and variance are given by
E(X)= µ and Var(X)= µ.
Proof:
By definition,
Var(X) = E(X²) − [E(X)]²,  where  E(X²) = E[X(X−1) + X] = E(X) + E[X(X−1)].
Now
E[X(X−1)] = Σ_{x=0}^∞ x(x−1) e^{−μ} μ^x / x! = e^{−μ} Σ_{x=2}^∞ μ^x / (x−2)!
(the sum starts at x = 2, as the first two terms in the summation are zero)
= μ² e^{−μ} Σ_{x=2}^∞ μ^{x−2} / (x−2)!.
Let y = x − 2; then when x = 2, y = 0, and as x → ∞, y → ∞:
E[X(X−1)] = μ² e^{−μ} Σ_{y=0}^∞ μ^y / y! = μ² e^{−μ} e^{μ} = μ².
Hence
Var(X) = E[X(X−1)] + E(X) − [E(X)]² = μ² + μ − μ² = μ.
Virtual University of Pakistan
Probability Distributions
by
By definition, the mgf of a discrete distribution is given by
M(t) = Σ_x e^{tx} p(x).
Hence
M(t) = Σ_{x=0}^∞ e^{tx} m^x e^{−m} / x! = e^{−m} Σ_{x=0}^∞ (me^t)^x / x! = e^{−m} e^{me^t} = e^{m(e^t − 1)}.
Since
M'(t) = me^t e^{m(e^t − 1)},
putting t = 0 we have
μ = M'(0) = m e⁰ e^{m(e⁰−1)} = m.
And
M''(t) = me^t e^{m(e^t − 1)} + (me^t)² e^{m(e^t − 1)};
putting t = 0:
M''(0) = m + m².
Therefore
σ² = M''(0) − μ² = m + m² − m² = m.
That is, a Poisson distribution has μ = σ² = m > 0.
Virtual University of Pakistan
Probability Distributions
by
Derivation of
Poisson Approximation to the Binomial distribution
To derive an approximation formula to the binomial distribution b(x; n, p) when n → ∞, p → 0, and the product np remains constant, we proceed:
The binomial distribution b(x; n, p) may be written as
b(x; n, p) = C(n, x) p^x q^{n−x},  for x = 0, 1, ..., n,
= [ n! / (x!(n−x)!) ] p^x q^{n−x}
= [ n(n−1)(n−2)...(n−x+1) / x! ] p^x q^{n−x}.  ...(1)
Now, let np = μ. Then p = μ/n and q = 1 − p = 1 − μ/n.
Substituting in (1), we get
b(x; n, p) = [ n(n−1)(n−2)...(n−x+1) / x! ] (μ/n)^x (1 − μ/n)^{n−x},
which can be re-written as
b(x; n, p) = (μ^x / x!) · [ n(n−1)(n−2)...(n−x+1) / n^x ] · (1 − μ/n)^n (1 − μ/n)^{−x}
= (μ^x / x!) · [ 1 · (1 − 1/n)(1 − 2/n)...(1 − (x−1)/n) ] · (1 − μ/n)^n (1 − μ/n)^{−x}.
As n → ∞, each factor (1 − j/n) approaches unity, as does (1 − μ/n)^{−x}, while
lim_{n→∞} (1 − μ/n)^n = e^{−μ}.
Thus the limiting value of P(X = x) is given by the expression
lim_{n→∞} b(x; n, p) = (μ^x / x!) · 1 · 1 ... 1 · e^{−μ} = μ^x e^{−μ} / x!,  for x = 0, 1, ...
In other words, if X is a binomial r.v. such that P(X = x) = C(n, x) p^x q^{n−x}, then
lim_{n→∞} P(X = x) = μ^x e^{−μ} / x!,  for x = 0, 1, 2, ...,
where μ = np.
by
An Example of
the Poisson Approximation to
the Binomial distribution
• To derive an approximation formula to the binomial
distribution b(x;n, p) where n ∞ , p 0, and the
product np remains constant, we proceed:
Example:
Suppose 8% of the tires manufactured at a particular plant are defective. To illustrate the use of the Poisson approximation to the binomial, the probability of obtaining exactly one defective tire from a sample of 20 is calculated with μ = np = 20(0.08) = 1.6:
P(X = 1) ≈ e^{−1.6} (1.6)¹ / 1! = 0.3230.
Had the true distribution, the binomial, been used instead of the approximation,
P(X = 1) = C(20, 1) (0.08)¹ (0.92)^19 = 0.3282.
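The comparison is easy to reproduce in Python:

from scipy import stats

n, p = 20, 0.08
print(stats.poisson.pmf(1, n * p))  # ~0.3230 (Poisson approximation)
print(stats.binom.pmf(1, n, p))     # ~0.3282 (exact binomial)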
Virtual University of Pakistan
Probability Distributions
by
F(x) = ∫_a^x f(t) dt = ∫_a^x 1/(b−a) dt = (1/(b−a)) [t]_a^x = (x − a)/(b − a),  a ≤ x ≤ b,
or
F(x) = x/(b−a) − a/(b−a),
which is of the form y = mx + c.
Virtual University of Pakistan
Probability Distributions
by
Derivation of E(X) = (a + b)/2 and Var(X) = (b − a)²/12.
Proof of the expectation:
The probability density function of the continuous uniform distribution is given by
f(x) = 1/(b − a) for a ≤ x ≤ b, and 0 for x < a or x > b.
Therefore
E(X) = ∫_a^b x f(x) dx = (1/(b−a)) ∫_a^b x dx = (1/(b−a)) [x²/2]_a^b = (b² − a²)/(2(b−a)) = (b−a)(b+a)/(2(b−a)) = (a + b)/2.
Proof of the variance:
By definition, Var(X) = E(X²) − [E(X)]². Now,
E(X²) = (1/(b−a)) ∫_a^b x² dx = (1/(b−a)) [x³/3]_a^b = (b³ − a³)/(3(b−a)) = (a² + ab + b²)/3.
Therefore,
Var(X) = (a² + ab + b²)/3 − [(a + b)/2]²
= [ 4(a² + ab + b²) − 3(a² + 2ab + b²) ]/12 = (a² − 2ab + b²)/12 = (b − a)²/12.
Virtual University of Pakistan
Probability Distributions
by
Application
of the Uniform Distribution
(explained through an example)
Example:
Suppose in a quiz there are 30 participants.
Here a = 0 and b = 25, so
F(6) = (6 − a)/(b − a) = (6 − 0)/(25 − 0) = 6/25.
ii) There are 30 participants in the quiz
by
Exponential Distribution
(PDF, CDF and shape of the distribution)
The Probability density function (pdf) of an exponential
distribution is
f(x; λ) = λ e^{−λx} for x ≥ 0, λ > 0, and 0 for x < 0.
The cumulative distribution function (cdf) of an exponential distribution is
F(x; λ) = 1 − e^{−λx} for x ≥ 0, λ > 0, and 0 for x < 0.
Proof:
F(x) = ∫_0^x f(t) dt = ∫_0^x λ e^{−λt} dt = [ −e^{−λt} ]_0^x = e⁰ − e^{−λx} = 1 − e^{−λx},
or
F(x) = 1 − e^{−λx},  x ≥ 0.
Now, let us consider the shape of the CDF:
CDF of the exponential distribution with mean = 0.5, 1, 1.5:
F(x) = 1 − e^{−x/0.5},  0 ≤ x < ∞,
F(x) = 1 − e^{−x},  0 ≤ x < ∞,
F(x) = 1 − e^{−x/1.5},  0 ≤ x < ∞.
Important Note:
• If we are dealing with Poisson process in which events
are occurring randomly over a time scale then the
waiting time between successive events is distributed
according to the exponential distribution.
Virtual University of Pakistan
Probability Distributions
by
Derivation of
the Mean of
the Exponential Distribution
A very interesting property:
f(x) = λ e^{−λx},  x ≥ 0, λ > 0.
Therefore
E(X) = ∫_0^∞ x f(x) dx = ∫_0^∞ x λ e^{−λx} dx.
First of all we apply the formula of integration by parts:
∫ u v' dx = u v − ∫ u' v dx.
Here, we let u = x and v' = λe^{−λx}, so v = −e^{−λx}:
∫ x λ e^{−λx} dx = −x e^{−λx} + ∫ e^{−λx} dx = −x e^{−λx} − (1/λ) e^{−λx}.
Now, in order to find the mean, we apply the limits 0 to ∞ to the expression we have obtained:
E(X) = [ −x e^{−λx} − (1/λ) e^{−λx} ]_0^∞.
In order to apply the upper limit to the first term, we utilize L'Hôpital's rule:
we have x e^{−λx} = x / e^{λx}, and
lim_{x→∞} x / e^{λx} = lim_{x→∞} 1 / (λ e^{λx}) = 0.
On the other hand, at x = 0 the first term is 0 · e⁰ = 0.
Hence,
E(X) = (0 − 0) − (1/λ)(0 − 1) = 1/λ.
Thus, E(X) = 1/λ (or θ, writing θ = 1/λ).
Virtual University of Pakistan
Probability Distributions
by
Derivation of
the Variance of
the Exponential Distribution
A very interesting property:
Here, we let u = x² and v' = λe^{−λx}, so v = −e^{−λx}, and du/dx = d(x²)/dx = 2x.
Hence, we have
∫ x² λe^{−λx} dx = u v − ∫ (du/dx) v dx
= x²(−e^{−λx}) + ∫ 2x e^{−λx} dx = −x² e^{−λx} + 2 ∫ x e^{−λx} dx
= −x² e^{−λx} + (2/λ) ∫ x λe^{−λx} dx,
implying that
E(X²) = [ −x² e^{−λx} ]_0^∞ + (2/λ) E(X).
In order to apply the upper limit to x²/e^{λx}, we utilize L'Hôpital's rule (twice):
lim_{x→∞} x²/e^{λx} = lim_{x→∞} 2x/(λ e^{λx}) = lim_{x→∞} 2/(λ² e^{λx}) = 0.
On the other hand, at x = 0 the term is 0²/e⁰ = 0.
Hence, we have
E(X²) = (0 − 0) + (2/λ) E(X) = (2/λ)(1/λ) = 2/λ².
Hence,
Var(X) = E(X²) − [E(X)]² = 2/λ² − (1/λ)² = 2/λ² − 1/λ² = 1/λ²,
implying that
S.D.(X) = 1/λ.
Hence we have the interesting property that the mean and standard deviation of the exponential distribution are equal.
by
Application of
the Exponential Distribution
(explained through an example)
Example:
The duration of long-distance telephone calls is found to
be exponentially distributed with a mean of 3 minutes.
What is the probability that a call will last
(i) more than 3 minutes? With mean 3, λ = 1/3, so
P(X > 3) = ∫_3^∞ (1/3) e^{−x/3} dx = [ −e^{−x/3} ]_3^∞ = e^{−1} ≈ 0.3679.
And (ii) the probability that a call will last more than 5 minutes is given by
P(X > 5) = ∫_5^∞ (1/3) e^{−x/3} dx = e^{−5/3} ≈ 0.1889.
We see that the probability of a call lasting more than 5 minutes is roughly half the probability of a call lasting more than three minutes.
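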
Virtual University of Pakistan
Probability Distributions
by
MGF of
the Exponential Distribution
We know that the probability density function (pdf) of an
exponential distribution is
f(x; λ) = λ e^{−λx} for x ≥ 0, λ > 0, and 0 for x < 0.
M(t) = E(e^{tX}) = ∫_0^∞ e^{tx} λ e^{−λx} dx = λ ∫_0^∞ e^{−(λ−t)x} dx
= λ [ −e^{−(λ−t)x}/(λ − t) ]_0^∞ = λ (0 + 1/(λ − t)) = λ/(λ − t).
Hence, the MGF of the exponential distribution is given by
M(t) = λ/(λ − t),  for t < λ.
Virtual University of Pakistan
Probability Distributions
by
Gamma distribution
(PDF, CDF and shape of the distribution)
Formal definition:
Gamma distribution
f(x) = [ 1/(Γ(α) β^α) ] x^{α−1} e^{−x/β},  0 < x < ∞, α > 0, β > 0, and 0 elsewhere.
The integral
∫_0^∞ y^{α−1} e^{−y} dy
exists for α > 0, and the value of the integral is a positive number. The integral is called the gamma function, and we write
Γ(α) = ∫_0^∞ y^{α−1} e^{−y} dy.
Properties of the gamma function:
• If α = 1, clearly Γ(1) = ∫_0^∞ e^{−y} dy = 1 (easy to verify through integration).
• Integration by parts gives Γ(α) = (α − 1) Γ(α − 1) for α > 1; for example, Γ(2) = ∫_0^∞ y e^{−y} dy = 1 · Γ(1) = 1.
• Accordingly, if α is a positive integer greater than 1,
Γ(α) = (α − 1)(α − 2)...(3)(2)(1) Γ(1) = (α − 1)!.
Now
[ 1/Γ(α) ] ∫_0^∞ x^{α−1} e^{−x} dx = 1
implies that
f(x) = x^{α−1} e^{−x} / Γ(α),  0 < x < ∞,
is itself a pdf. In general, Γ(y) = ∫_0^∞ t^{y−1} e^{−t} dt.
0
by
Variance: σ_X² = αβ².
Derivation of the Mean: μ_X = E(X) = αβ.
Proof:
E(X) = ∫_0^∞ x · x^{α−1} e^{−x/β} / (Γ(α) β^α) dx = ∫_0^∞ x^α e^{−x/β} / (Γ(α) β^α) dx.
We know that Γ(α + 1) = α Γ(α). Hence,
E(X) = αβ ∫_0^∞ x^{(α+1)−1} e^{−x/β} / (Γ(α+1) β^{α+1}) dx = αβ ∫_0^∞ f(x; α+1, β) dx = αβ · 1 = αβ.
The variance of X is
σ_X² = E(X²) − [E(X)]²,
where, by the same device,
E(X²) = ∫_0^∞ x² · x^{α−1} e^{−x/β} / (Γ(α) β^α) dx = α(α+1)β² ∫_0^∞ f(x; α+2, β) dx = α(α+1)β².
Therefore
σ_X² = α(α+1)β² − (αβ)² = α²β² + αβ² − α²β² = αβ².
Virtual University of Pakistan
Probability Distributions
by
Example of
computation of probabilities for
a gamma-distributed random variable
Example:
Let X have a gamma distribution with pdf
f(x) = (1/4) x e^{−x/2},  0 < x < ∞.
Comparing with
f(x) = x^{α−1} e^{−x/β} / (Γ(α) β^α),  0 < x < ∞, α > 0, β > 0,
and letting α = 2, we have
f(x) = x^{2−1} e^{−x/β} / (Γ(2) β²) = x e^{−x/β} / (1! β²).
Hence we can see that the given density function is the pdf of the gamma distribution having α = 2.
• In general, it is easy to prove that, for α ≥ 1, the mode of the gamma distribution is given by (α − 1)β.
• Here we have been given the information that the mode = 2. Therefore,
Mode = (α − 1)β = (2 − 1)β = 2, so β = 2.
Hence,
f(x) = x e^{−x/2} / (Γ(2) 2²) = (1/4) x e^{−x/2},  0 < x < ∞.
Now to find P(X < 9.46):
P(X < 9.46) = ∫_0^{9.46} (1/4) x e^{−x/2} dx = (1/4) [ −2x e^{−x/2} − 4 e^{−x/2} ]_0^{9.46}
= 1 − e^{−4.73} (1 + 4.73) ≈ 0.95.
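The result is easy to confirm numerically (shape α = 2, scale β = 2):

from scipy import stats

print(stats.gamma.cdf(9.46, a=2, scale=2))  # ~0.949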
by
M(t) = E(e^{tX}) = ∫_0^∞ e^{tx} x^{α−1} e^{−x/β} / (Γ(α) β^α) dx.
PROOF:
• First let's combine the two exponential terms and move the gamma fraction out of the integral:
M(t) = [ 1/(Γ(α) β^α) ] ∫_0^∞ x^{α−1} e^{tx − x/β} dx
= [ 1/(Γ(α) β^α) ] ∫_0^∞ x^{α−1} e^{−x(1 − βt)/β} dx.
Now we're ready to do a substitution in the integral. Let
u = x(1 − βt)/β,  so that  x = βu/(1 − βt).
That means we have du = [(1 − βt)/β] dx, i.e. dx = [β/(1 − βt)] du. Then
M(t) = [ 1/(Γ(α) β^α) ] ∫_0^∞ [ βu/(1 − βt) ]^{α−1} e^{−u} [ β/(1 − βt) ] du
= [ 1/Γ(α) ] (1 − βt)^{−α} ∫_0^∞ u^{α−1} e^{−u} du.
But ∫_0^∞ u^{α−1} e^{−u} du = Γ(α).
Now, cancelling out the Γ(α) terms, we have the required moment-generating function:
M(t) = (1 − βt)^{−α},  for t < 1/β.
Virtual University of Pakistan
Probability Distributions
by
Beta distribution
(PDF, CDF and shape of the distribution)
The probability density function of the β distribution is
f(x; α, β) = x^{α−1} (1 − x)^{β−1} / B(α, β),  0 ≤ x ≤ 1; α, β > 0,
where
B(α, β) = Γ(α)Γ(β)/Γ(α + β) = ∫_0^1 t^{α−1}(1 − t)^{β−1} dt,
and α and β are the shape parameters.
The cdf is
F(x) = I_x(α, β),  0 ≤ x ≤ 1; α, β > 0,
the regularized incomplete beta function, where B is the beta function defined above.
Shape of the CDF: S-shaped on [0, 1].
Some properties:
• Mean = μ = E(X) = α/(α + β).
• Var(X) = αβ / [ (α + β)²(α + β + 1) ].
Mode:
• If α > 1 and β > 1, the peak of the density is in the interior of [0, 1], and the mode of the beta distribution is
mode = (α − 1)/(α + β − 2).
Proposition 1:
For α = β = 1,
x^{α−1}(1 − x)^{β−1} = x^{1−1}(1 − x)^{1−1} = x⁰(1 − x)⁰ = 1,
and, using Γ(n) = (n − 1)!,
B(1, 1) = Γ(1)Γ(1)/Γ(2) = 0!0!/1! = 1.
• Therefore, the probability density function of a beta distribution with parameters α = β = 1 can be written as
f_X(x; 1, 1) = 1 if x ∈ [0, 1], and 0 if x ∉ [0, 1],
which is the probability density function of a uniform distribution of X on the interval [0, 1].
Virtual University of Pakistan
Probability Distributions
by
E(X) = ∫ x f(x) dx, so
E(X) = ∫_0^1 x · x^{α−1}(1 − x)^{β−1} / B(α, β) dx = [ 1/B(α, β) ] ∫_0^1 x^{(α+1)−1}(1 − x)^{β−1} dx.
Now we know that the beta function is given by
B(α, β) = ∫_0^1 t^{α−1}(1 − t)^{β−1} dt.
Hence,
[ 1/B(α, β) ] ∫_0^1 x^{(α+1)−1}(1 − x)^{β−1} dx = B(α + 1, β) / B(α, β).
But we know that B(α, β) = Γ(α)Γ(β)/Γ(α + β). Hence
B(α + 1, β)/B(α, β) = [ Γ(α+1)Γ(β)/Γ(α+β+1) ] · [ Γ(α+β)/(Γ(α)Γ(β)) ]
= [ αΓ(α) Γ(α+β) ] / [ Γ(α) (α+β)Γ(α+β) ] = α/(α + β).
• For the variance, we first need
E(X²) = [ 1/B(α, β) ] ∫_0^1 x^{(α+2)−1}(1 − x)^{β−1} dx = B(α + 2, β)/B(α, β) = α(α + 1) / [ (α + β)(α + β + 1) ].
Therefore,
Var(X) = E(X²) − [E(X)]² = α(α + 1)/[(α + β)(α + β + 1)] − α²/(α + β)² = αβ / [ (α + β)²(α + β + 1) ].
Virtual University of Pakistan
Probability Distributions
by
f(x) = x^{α−1}(1 − x)^{β−1} / B(α, β),  0 < x < 1, α > 0, β > 0.
In this example, α = 2 and β = 5.
Solving the beta function:
B(a, b) = Γ(a)Γ(b)/Γ(a + b).
So,
B(2, 5) = Γ(2)Γ(5)/Γ(7) = (2−1)!(5−1)!/(7−1)! = 1! 4!/6! = 24/720 = 0.03333.
We need to compute
P(20% < X < 30%) = P(0.20 < X < 0.30),
which is given by
P(0.2 < X < 0.3) = ∫_{0.2}^{0.3} x^{2−1}(1 − x)^{5−1} / B(2, 5) dx = (1/0.033333) ∫_{0.2}^{0.3} x(1 − x)^4 dx ≈ 0.2352.
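The probability is easy to reproduce with scipy (α = 2, β = 5):

from scipy import stats

p = stats.beta.cdf(0.3, 2, 5) - stats.beta.cdf(0.2, 2, 5)
print(p)  # ~0.2352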
Interpretation:
by
Normal Distribution
(PDF, CDF and Shape of the distribution)
Definition:
A random variable X is said to have a normal distribution if its
pdf is
f(x) = [ 1/(σ√(2π)) ] e^{−(1/2)((x−μ)/σ)²},  for −∞ < x < ∞.
• µ is the mean or expectation of the distribution (and also
its median and mode),
• σ >0 is the standard deviation, and
• σ2 is the variance.
Shape of the distribution:
The graph points to the following two basic properties of
the normal distribution:
1. The normal curve is
asymptotic to the x-
axis,
2. The maximum ordinate
(i.e. the modal ordinate)
of the normal distribution
at x=µ is equal to
1/(σ√2π)
f(x) = [ 1/(σ√(2π)) ] e^{−(1/2)((x−μ)/σ)²},  for −∞ < x < ∞.
Cumulative Distribution Function:
The formula for the cumulative distribution function of the normal distribution is
F(x) = (1/2) [ 1 + erf( (x − μ)/(σ√2) ) ],  for −∞ < x < ∞,
where erf(z) is the "error function" defined by
erf(z) = (2/√π) ∫_0^z e^{−t²} dt.
The shape of the cumulative distribution function of the normal
distribution is S-shaped:
Virtual University of Pakistan
Probability Distributions
by
the integral g x dx is equal to zero.
Here, it is easy to see that the function
z2 z2
1 z
zf Z z z e 2
e 2
2 2
is an odd function.
Hence, the mean E Z zf z dz 0.
Z
Next: Derivation of the Variance of the Standard
Normal Distribution
Hence
z2
Var( Z ) E Z
2
z e
2 2 2
dz.
2 0
z2
Let w , then 2w=z 2 2w z (2w)1/2 z
2
(2)1/2 ( w)1/2 z 2( w)1/2 z.
Therefore
dz 1 dz 1
2 ( w)(1/2)1 ( w) 1/2
dw 2 dw 2
dz 1 1
dz dw.
dw 2w 2w
And
z2
2 2 1
2w e
w
Var( Z ) z e dz
2 2
dw
2 0 2 0 2w
2(2) 1 2(2) 1
w e w e
w w
dw dw
2 2 0 w 2 2 0 w
2 2 2 2
we w dw w1/2 w
e dw w (3/2) 1 w
e dw
0 0 0
by
1
f x e 2
, x
2
where µ is the mean of the distribution and 0 is the standard deviation.
Example:
Let X be N(2, 25) and suppose that we are interested in computing the probability
P(0 < X < 10) = F_X(10) − F_X(0).
Applying the standardization formula Z = (X − μ)/σ = (X − 2)/5, we obtain
P(0 < X < 10) = Φ( (10 − 2)/5 ) − Φ( (0 − 2)/5 ) = Φ(1.60) − Φ(−0.40) = 0.9452 − 0.3446 = 0.6006.
The standard normal table gives us values of the distribution function of a standard normal variable.
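The same probability with scipy:

from scipy import stats

X = stats.norm(loc=2, scale=5)
print(X.cdf(10) - X.cdf(0))  # ~0.6006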
by
Z = (X − μ)/σ,  so that  X = μ + Zσ,
where μ is the mean and σ is the standard deviation of the variable X,
and Z is the value from the standard normal distribution for the desired percentile.
Example:
The BMI for men aged 60 is normally distributed with
mean equal to 29 with a standard deviation of 6 whereas
the BMI for women aged 60 is normally distributed with
mean 28 and standard deviation 7.
by
f(x) = [ 1/(σ√(2π)) ] e^{−(x−μ)²/(2σ²)},  for −∞ < x < ∞,
where μ is the mean or expectation of the distribution and σ > 0 is the standard deviation.
Area = ∫ f(x) dx = [ 1/(σ√(2π)) ] ∫ e^{−(x−μ)²/(2σ²)} dx.
Let z = (x − μ)/σ, so that x = μ + σz and dx = σ dz. Then
Area = (1/√(2π)) ∫_{−∞}^∞ e^{−z²/2} dz.
The function e^{−z²/2} is an even function of z (the left half, by letting w = −z, equals the right half), so
Area = (2/√(2π)) ∫_0^∞ e^{−z²/2} dz.
Let v = z²/2, so that dv = z dz and dz = dv/√(2v). Then
Area = (2/√(2π)) ∫_0^∞ e^{−v} dv/√(2v) = [ 2/(√(2π)√2) ] ∫_0^∞ v^{1/2−1} e^{−v} dv = (1/√π) Γ(1/2) = (1/√π) √π = 1.
by
f(x) = [ 1/(σ√(2π)) ] e^{−(x−μ)²/(2σ²)},  −∞ < x < ∞.
The mean of the normal distribution is μ.
Proof:
By definition,
μ = E(X) = ∫ x f(x) dx = [ 1/(σ√(2π)) ] ∫ x e^{−(x−μ)²/(2σ²)} dx.
Let z = (x − μ)/σ. Then x = μ + σz and dx = σ dz.
(Limits: when x → −∞, z → −∞; when x → ∞, z → ∞.)
Therefore,
E(X) = (1/√(2π)) ∫ (μ + σz) e^{−z²/2} dz
= μ (1/√(2π)) ∫ e^{−z²/2} dz + σ (1/√(2π)) ∫ z e^{−z²/2} dz = μ · 1 + σ · 0 = μ.
For the variance,
Var(X) = E[(X − μ)²] = ∫ (x − μ)² f(x) dx = [ 1/(σ√(2π)) ] ∫ (x − μ)² e^{−(x−μ)²/(2σ²)} dx
= (σ²/√(2π)) ∫ z² e^{−z²/2} dz,  on putting z = (x − μ)/σ,
= σ²,  by integration by parts (or by the standard-normal result Var(Z) = 1).
(i) Median: since
(1/√(2π)) ∫_{−∞}^0 e^{−z²/2} dz = 1/2,
we have ∫_{−∞}^M f(x) dx = 1/2 when (M − μ)/σ = 0, or M = μ,
i.e. μ is the median of the distribution.
(ii) By definition, for any pdf f(x), the mode, if any, is that value of x for which
f'(x) = 0 and f''(x) < 0.
Now,
f'(x) = [ 1/(σ√(2π)) ] e^{−(x−μ)²/(2σ²)} · [ −(x − μ)/σ² ] = −[ 1/(σ³√(2π)) ] (x − μ) e^{−(x−μ)²/(2σ²)}.
Equating f'(x) = 0, we see that x = μ.
Now,
f''(x) = −[ 1/(σ³√(2π)) ] e^{−(x−μ)²/(2σ²)} [ 1 − (x − μ)²/σ² ].
Substituting x = μ in f''(x), we see that f''(μ) = −1/(σ³√(2π)) < 0.
Thus, x = μ is the mode of the normal distribution.
by
Q = (Q3 − Q1)/2
(the quartile deviation, or semi-interquartile range).
Now, for a symmetric distribution,
Q3 − Q2 = Q2 − Q1,  so Q3 + Q1 = 2Q2 and Q3 − μ = μ − Q1 = Q.
Hence
P(μ < X < μ + Q) = 1/4, i.e.
[ 1/(σ√(2π)) ] ∫_μ^{μ+Q} e^{−(1/2)((x−μ)/σ)²} dx = 1/4.
Putting z = (x − μ)/σ (so dx = σ dz), the change of limits is:
as x → μ, z → 0, and as x → μ + Q, z → Q/σ.
So, substituting the values, we obtain
(1/√(2π)) ∫_0^{Q/σ} e^{−z²/2} dz = 0.2500.
Now, from the area table of the standard normal distribution, we find that
(1/√(2π)) ∫_0^{0.6745} e^{−z²/2} dz = 0.2500.
Hence, we can write
Q/σ = 0.6745,
or
Q = 0.6745 σ.  ...(1)
Therefore, using eq. (1), we obtain the values of the quartiles as follows:
Q1 = μ − 0.6745σ and Q3 = μ + 0.6745σ.
by
Points of Inflection
which are equidistant from the mean
Definition: By a ‘point of inflection’ we mean a point at
which the concavity of a curve changes,
From calculus, we know that such a point is obtained by
solving the equation
f x 0
Taking the first derivative of the PDF of the normal
distribution, we have
d
1 x 1 x
2 2
1 1 d
f x
e 2 e 2
dx 2 2 dx
1 x
2
2
1 2
d 1 x
e
2 dx 2
1 x
2
1
2 1 x 1
.e 2 .2.
2
1 x 1 x
2 2
1 x 1
.e 2
x .e 2
• To find the points of inflection, we take the second
derivative:
d 1
1 x
2
1 d 1 x
2
f x 3 x .e 2 x .e 2
dx 2 3
2 dx
1 x 2 1 x
2
1 2 d d
3 e x x e 2
2 dx dx
or
1 2
1 x
2
1 x
2
1 x
2
2 d
f x 3 e 1 x e
2 dx 2
1 x
2
1 x
2
1 2 1 x d x
x e 2 .2
3 e
2 2 dx
1 x
2
1 x
2
1 2 1 x 1
f x 3 x e 2 .2
e
2
2
1 x
2
1 x
2
1 2
x
x e 2 . 2
3 e
2
1 2 x
1 x
2
2
1 x
2
2
3 e e
2 2
1 x
x 2
2
1
2
e 1
3
2 2
so that, finally, we have:
1 x
x
2
2
1
f x e 2
1
3
2 2
Now, equating the second derivative to zero, we have
x 2
1 x
2
1
2
3 e 1 0
2 2
x 2
2
3
2 1 x
1 0 0
2
2 2
2
1 x
e
x
2
1 x 2
2
2
or x x
or x , x .
At these two points, the value of the function f(x) is
2 2
1 1 1
1
2
1 2 1 2 1
e e e2
2 2 2
1
1 1 1
e 2
.
2 2 e
1
2 e 2
Hence, the two points of inflection of normal curve are
1 1
, and , .
2 e 2 e
In other words, the points of inflection occur on the right
and on the left of the mean at a distance equal to
standard deviation and thus the graph of the normal
curve is bell-shaped.
The ‘bell curve’
Virtual University of Pakistan
Probability Distributions
by
μ_{2n+1} = (σ^{2n+1}/√(2π)) ∫_{−∞}^∞ z^{2n+1} e^{−z²/2} dz = 0,
as z^{2n+1} e^{−z²/2} is an odd function of z.
Thus,
μ1 = μ3 = μ5 = ... = 0.
Virtual University of Pakistan
Probability Distributions
by
by
M(t) = e^{μt + σ²t²/2},  for −∞ < t < ∞.
The first two derivatives of M_X(t) are easily derived as:
M_X'(t) = e^{μt + σ²t²/2} · d/dt (μt + σ²t²/2) = (μ + σ²t) e^{μt + σ²t²/2}.
Putting t = 0, we have
M_X'(0) = (μ + σ²(0)) e^{0+0} = μ · e⁰ = μ.
But we know that M_X'(0) = E(X). Therefore E(X) = μ.
And now taking the second derivative of M_X(t):
M_X''(t) = d/dt [ (μ + σ²t) e^{μt + σ²t²/2} ]
= σ² e^{μt + σ²t²/2} + (μ + σ²t)² e^{μt + σ²t²/2}.
Putting t = 0, we have
M_X''(0) = σ² + μ².
Therefore
Var(X) = E(X²) − [E(X)]² = M_X''(0) − μ² = σ² + μ² − μ² = σ².
NOTE: All the higher moments can be derived in a similar manner.
Virtual University of Pakistan
Probability Distributions
by
K(t) = ln E(e^{tX}).
For the normal distribution with expected value μ and variance σ², the cumulant generating function is
K(t) = μt + σ²t²/2.
• The first derivative of the cumulant generating function is obtained as follows:
K'(t) = d/dt ( μt + σ²t²/2 ) = μ + σ²t.
The second derivative of the cumulant generating function is obtained as follows:
K''(t) = d/dt ( μ + σ²t ) = σ².
The third derivative of the cumulant generating function is given by:
K'''(t) = d/dt ( σ² ) = 0,
implying that all higher derivatives will be zero.
Now, the first cumulant is obtained as follows:
K'(0) = [ μ + σ²t ]_{t=0} = μ,
and the second cumulant is K''(0) = σ².
by
μ_{2n} = E[(X − μ)^{2n}] = ∫ (x − μ)^{2n} [ 1/(σ√(2π)) ] e^{−(1/2)((x−μ)/σ)²} dx.
Substituting z = (x − μ)/σ (so x = μ + σz, dx = σ dz; the limits remain unchanged), this
can be re-written as
μ_{2n} = (σ^{2n}/√(2π)) ∫_{−∞}^∞ z^{2n} e^{−z²/2} dz.
Now, as we know that for any even function g, i.e. a function for which g(−x) = g(x) for all x ∈ ℝ,
the integral ∫_{−∞}^∞ g(x) dx is equal to 2 ∫_0^∞ g(x) dx,
therefore we have
μ_{2n} = (2σ^{2n}/√(2π)) ∫_0^∞ z^{2n} e^{−z²/2} dz.
Let y = z²/2, so 2y = z², z = (2y)^{1/2}.
Also, dy = z dz, so dz = dy/z = dy/√(2y). The limits remain unchanged.
Therefore,
μ_{2n} = (2σ^{2n}/√(2π)) ∫_0^∞ (2y)^n e^{−y} dy/√(2y)
= (2^n σ^{2n}/√π) ∫_0^∞ y^{(n+1/2)−1} e^{−y} dy
= (2^n σ^{2n}/√π) Γ(n + 1/2).
Using Γ(n + 1/2) = [(2n−1)/2][(2n−3)/2] ... (3/2)(1/2) √π, we get
μ_{2n} = σ^{2n} (2n−1)(2n−3) ... 5 · 3 · 1.
Putting n = 1 and 2, we get μ2 = σ² and μ4 = 3σ⁴.
Hence β1 = μ3²/μ2³ = 0, meaning that the skewness is zero,
and
β2 = μ4/μ2² = 3σ⁴/σ⁴ = 3,
by
An example of
the
Normal Approximation to the Poisson distribution
Normal Approximation to the Poisson distribution
At x = 26.5, z = (26.5 − 25)/5 = 0.3, so
P(22.5 < X < 26.5) = P(−0.5 < Z < 0.3)
= P(0 < Z < 0.5) + P(0 < Z < 0.3) = 0.1915 + 0.1179
= 0.3094 = 30.94%.
(b) P(X >30) becomes on continuous scale P(X >29.5)
P(X ≥ 29.5) = P( (X − 25)/5 ≥ (29.5 − 25)/5 )
= P(Z ≥ 4.5/5) = P(Z ≥ 0.9) = 1 − P(Z < 0.9)
= 1 − 0.8159 = 0.1841 = 18.41%.
I would like to encourage you to compute the exact
probabilities by using the pmf of the Poisson distribution
by
Explanation
Conceive it:
as if we have a ‘bell-type’
that is placed on the
FLOOR.
The concept of
Contours
By the term ‘contour’, we mean the set of points
on the X1X2 floor corresponding to which the
ordinates are of equal height.
Explanation: Consider the bivariate normal distribution for which ρ = 0 and σ1 = σ2.
As soon as we
‘elongate’ one of the
diameters, we get a
major axis as well as
a minor axis.
Bivariate Normal Distributions with ρ = 0.87 (to the left)
and ρ = - 0.67 (to the right)
A few real-life examples
by
f(x1, x2) = [ 1/(2πσ1σ2√(1−ρ²)) ] exp{ −[1/(2(1−ρ²))] [ ((x1−μ1)/σ1)² − 2ρ((x1−μ1)/σ1)((x2−μ2)/σ2) + ((x2−μ2)/σ2)² ] }
for −∞ < x1 < ∞ and −∞ < x2 < ∞,
where −∞ < μ1, μ2 < ∞, σ1 > 0, σ2 > 0, and −1 < ρ < 1.
PDF of the bivariate normal distribution for the case ρ = 0:
with ρ = 0 the cross term vanishes and √(1 − 0²) = 1, so
f(x1, x2) = [ 1/(2πσ1σ2) ] exp{ −(1/2) [ ((x1−μ1)/σ1)² + ((x2−μ2)/σ2)² ] }.
Now, since f(x1, x2) represents the ordinate of the distribution against the point (x1, x2) on the X1X2-floor, for any particular contour we can write:
[ 1/(2πσ1σ2) ] e^{ −(1/2)[((x1−μ1)/σ1)² + ((x2−μ2)/σ2)²] } = constant = c.
Then
e^{ −(1/2)[((x1−μ1)/σ1)² + ((x2−μ2)/σ2)²] } = 2πσ1σ2 · c = a new constant,
so
−(1/2)[ ((x1−μ1)/σ1)² + ((x2−μ2)/σ2)² ] = ln(2πσ1σ2 c) = another new constant,
i.e.
((x1−μ1)/σ1)² + ((x2−μ2)/σ2)² = k²,  where k² = −2 ln(2πσ1σ2 c).
Hence, we have
(x1 − μ1)²/(k²σ1²) + (x2 − μ2)²/(k²σ2²) = 1.  ...(2)
RECALL the mathematical equation of an ellipse:
(x1 − c1)²/a² + (x2 − c2)²/b² = 1.  ...(1)
Next: the shape of the distribution when ρ = 0 AND σ1 = σ2.
In the special case a = b, equation (1) reduces to
(x1 − c1)²/a² + (x2 − c2)²/a² = 1,  i.e.  (x1 − c1)² + (x2 − c2)² = a²,
and we already know that (x1 − c1)² + (x2 − c2)² = a² is a circle.
Similarly, with σ1 = σ2 = σ, equation (2) becomes
(x1 − μ1)² + (x2 − μ2)² = k²σ²,  ...(3)
so the contours are circles centred at (μ1, μ2).
by
Conditional distributions
in the case of
Bivariate Normal Distribution
(Graphical Interpretation)
Graphical interpretation
of the concept of
conditional
distributions
associated with
the
bivariate normal distribution.
So, let us start with the pdf of the bivariate
normal distribution:
We know that the joint pdf of the bivariate normal distribution of (X, Y) is
f_{XY}(x, y) = [ 1/(2πσxσy√(1−ρ²)) ] exp{ −[1/(2(1−ρ²))] [ ((x−μx)/σx)² − 2ρ((x−μx)/σx)((y−μy)/σy) + ((y−μy)/σy)² ] }.
The conditional pdf of Y given X = x works out to
f(y | X = x) = [ 1/( √(2π) σy√(1−ρ²) ) ] exp{ −[ y − μy − ρ(σy/σx)(x − μx) ]² / ( 2σy²(1−ρ²) ) }.
That is, Y | X = x is normal with
conditional mean μ_{Y|X} = μy + ρ(σy/σx)(x − μx)
and conditional variance σ_{Y|X}² = σy²(1 − ρ²).
by
f_{XY}(x, y) = f_X(x) f_Y(y)  ...(1)
i.e.
if the joint density f_{XY}(x, y) is equal to the product of the marginal density f_X(x) and the marginal density f_Y(y).
Note that, due to equation (1), if X and Y are independent, then the conditional distribution of Y given X = x, i.e. f_{Y|X}(y | X = x), can be written as
f_{Y|X}(y | X = x) = f_{XY}(x, y)/f_X(x) = f_X(x)f_Y(y)/f_X(x) = f_Y(y).
So, in the case of independent random variables X and Y, conditioning on X = x does not change the distribution of Y.
(i.e. the conditional distribution of Y is the SAME as the unconditional distribution of Y)
The proof of the second part of the statement "if and only if", i.e.:
in the case of a bivariate normal distribution N(μx, μy, σx², σy², ρ), if ρ = 0, then X and Y are independent.
Proof: In order to prove that if X and Y have the bivariate normal distribution with zero correlation, then X and Y are independent, we need to show that the bivariate normal density function
f_{XY}(x, y) = [ 1/(2πσxσy√(1−ρ²)) ] exp{ −[1/(2(1−ρ²))] [ ((x−μx)/σx)² − 2ρ((x−μx)/σx)((y−μy)/σy) + ((y−μy)/σy)² ] }
with ρ = 0 reduces to (since √(1 − 0²) = 1 and the cross term vanishes)
f_{XY}(x, y) = [ 1/(2πσxσy) ] e^{ −(1/2)[ ((x−μx)/σx)² + ((y−μy)/σy)² ] }.
Further simplifying, we have:
f_{XY}(x, y) = [ (1/(σx√(2π))) e^{−(1/2)((x−μx)/σx)²} ] · [ (1/(σy√(2π))) e^{−(1/2)((y−μy)/σy)²} ]
= f_X(x) f_Y(y).
Virtual University of Pakistan
Probability Distributions
by
by
Computation of Probabilities
in the case of
Bivariate Normal Distribution
explained through an
EXAMPLE
Example:
A statistics class takes two exams, Exam 1 (Midterm Exam)
and Exam 2 (Final Exam), and let us suppose that the marks of
the two exams (to be called X and Y respectively) follow a
bivariate normal distribution with parameters:
µx = 70 and µy = 60 (which can be called the marginal means)
σx = 10 and σy = 15 (which can be called the marginal standard
deviations)
and ρ = 0.6 (which is the correlation coefficient).
Suppose that we select a student at random.
1) What is the probability that the student scores over 75
on Exam 1 (Midterm Exam)?
2) What is the probability that the student scores over 85
on Exam 2 (Final Exam) given that he/she scored 75 in
Exam 1 (Midterm Exam)?
Solution of Part (1):

$$P(X > 75) = P\!\left(Z > \frac{x-\mu_x}{\sigma_x}\right) = P\!\left(Z > \frac{75-70}{10}\right) = P(Z > 0.5)$$
$$= 1 - P(Z \le 0.5) = 1 - \Phi(0.5) = 1 - 0.6915 = 0.3085 = 30.85\% \text{ or } \approx 31\%$$

--- a little less than one-third.
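For readers who wish to verify the table look-up Φ(0.5) = 0.6915, here is a one-line cross-check with scipy (an editor's sketch, not part of the lecture):

```python
# P(X > 75) for X ~ N(70, 10^2).
from scipy.stats import norm

p = 1 - norm.cdf(75, loc=70, scale=10)   # = P(Z > 0.5)
print(round(p, 4))                        # 0.3085
```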
Next,
Part (2) of the question is as follows.
Substituting the given parameter values into the conditional p.d.f., we have:

$$f_{Y|X}(y \mid X = x) = \frac{1}{\sqrt{2\pi}\,(15)\sqrt{1-0.36}}\; e^{-\frac{\left[y - 60 - 0.6(1.5)(x-70)\right]^2}{2(15)^2(1-0.36)}}$$

or

$$f_{Y|X}(y \mid X = x) = \frac{1}{\sqrt{2\pi}\,(15)\sqrt{0.64}}\; e^{-\frac{\left[y - 60 - 0.9(x-70)\right]^2}{2(225)(0.64)}}
= \frac{1}{\sqrt{2\pi}\,(15)(0.8)}\; e^{-\frac{\left[y - 60 - 0.9(x-70)\right]^2}{2(144)}}
= \frac{1}{12\sqrt{2\pi}}\; e^{-\frac{\left[y - 60 - 0.9(x-70)\right]^2}{288}}$$
Now, since the given value of X is x = 75, therefore we have

$$f_{Y|X}(y \mid X = 75) = \frac{1}{12\sqrt{2\pi}}\; e^{-\frac{\left[y - 60 - 0.9(5)\right]^2}{2(144)}} = \frac{1}{12\sqrt{2\pi}}\; e^{-\frac{(y-64.5)^2}{2(12)^2}}$$

which is the pdf of the univariate normal distribution with mean
$\mu_y^{*} = 64.5$ and standard deviation $\sigma_y^{*} = 12$.
Hence, the probability that a randomly selected student scores
over 85 on Exam 2 (Final Exam) given that he/she scored 75
in Exam 1 (Midterm Exam) is given by

$$P(Y > 85 \mid X = 75) = P\!\left(Z > \frac{85 - 64.5}{12}\right) = P\!\left(Z > \frac{20.5}{12}\right) = P(Z > 1.71)$$
$$= 1 - P(Z \le 1.71) = 1 - \Phi(1.71) = 1 - 0.9564 = 0.0436 = 4.36\%$$

--- a little less than 5%.
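Part (2) can likewise be cross-checked with scipy (again an editor's sketch; the variable names are the editor's). Note that the exact tail probability is approximately 0.0438; the lecture's 0.0436 results from rounding z to 1.71 before consulting the table.

```python
# Y | X = 75 is normal with mean mu_y + rho*(sd_y/sd_x)*(x - mu_x)
# and standard deviation sd_y * sqrt(1 - rho^2).
from math import sqrt
from scipy.stats import norm

mu_x, mu_y, sd_x, sd_y, rho, x = 70, 60, 10, 15, 0.6, 75
cond_mean = mu_y + rho * (sd_y / sd_x) * (x - mu_x)   # 64.5
cond_sd = sd_y * sqrt(1 - rho**2)                     # 12.0
p = 1 - norm.cdf(85, loc=cond_mean, scale=cond_sd)
print(cond_mean, cond_sd, round(p, 4))   # 64.5 12.0 0.0438 (0.0436 with z = 1.71)
```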
Let us now derive the marginal distributions associated with the
bivariate normal distribution. The marginal p.d.f. of X is obtained
by integrating the joint p.d.f. over y:

$$f_X(x) = \int_{-\infty}^{\infty} \frac{1}{2\pi\,\sigma_x\sigma_y\sqrt{1-\rho^2}}\; e^{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}\right]}\, dy$$
Let us now do some algebraic manipulation
in the expression of the
exponent:
The exponent is given by:

$$-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}\right]$$

Dividing and multiplying the above expression by $\sigma_x^2\sigma_y^2$, we have

$$-\frac{1}{2(1-\rho^2)\sigma_x^2\sigma_y^2}\left[\sigma_y^2(x-\mu_x)^2 + \sigma_x^2(y-\mu_y)^2 - 2\rho\,\sigma_x\sigma_y(x-\mu_x)(y-\mu_y)\right]$$

Completing the square in the exponent by adding and subtracting $(x-\mu_x)^2\sigma_y^2\rho^2$:

$$-\frac{1}{2(1-\rho^2)\sigma_x^2\sigma_y^2}\left[\sigma_y^2(x-\mu_x)^2 + \sigma_x^2(y-\mu_y)^2 - 2\rho\,\sigma_x\sigma_y(x-\mu_x)(y-\mu_y) + (x-\mu_x)^2\sigma_y^2\rho^2 - (x-\mu_x)^2\sigma_y^2\rho^2\right]$$

Combining (i) the second, third and fourth terms, and (ii) the first and fifth terms, we have

$$-\frac{1}{2(1-\rho^2)\sigma_x^2\sigma_y^2}\left\{\left[\sigma_x(y-\mu_y) - \rho\,\sigma_y(x-\mu_x)\right]^2 + \sigma_y^2(x-\mu_x)^2\left[1-\rho^2\right]\right\}$$

which can be re-written as

$$-\frac{1}{2(1-\rho^2)\sigma_x^2\sigma_y^2}\left[\sigma_x(y-\mu_y) - \rho\,\sigma_y(x-\mu_x)\right]^2 - \frac{(x-\mu_x)^2}{2\sigma_x^2}$$

$$= -\frac{1}{2(1-\rho^2)}\left[\frac{y-\mu_y}{\sigma_y} - \rho\,\frac{x-\mu_x}{\sigma_x}\right]^2 - \frac{1}{2}\left(\frac{x-\mu_x}{\sigma_x}\right)^2$$
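Since the completing-the-square step is easy to get wrong, here is an optional symbolic check (an editor's addition using sympy, not part of the lecture) that the final form of the exponent equals the original one:

```python
# Symbolically verify the completed-square identity for the exponent.
import sympy as sp

x, y, mx, my, r = sp.symbols('x y mu_x mu_y rho', real=True)
sx, sy = sp.symbols('sigma_x sigma_y', positive=True)

original = -(((x - mx)/sx)**2 + ((y - my)/sy)**2
             - 2*r*(x - mx)*(y - my)/(sx*sy)) / (2*(1 - r**2))
completed = (-((y - my)/sy - r*(x - mx)/sx)**2 / (2*(1 - r**2))
             - ((x - mx)/sx)**2 / 2)
print(sp.simplify(original - completed))   # prints 0: the two forms agree
```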
Coming back to the overall expression of the
marginal pdf:

$$f_X(x) = \int_{-\infty}^{\infty} \frac{1}{2\pi\,\sigma_x\sigma_y\sqrt{1-\rho^2}}\; e^{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}\right]}\, dy$$

Substituting the newly obtained expression of the exponent
in the above pdf, we have

$$f_X(x) = \int_{-\infty}^{\infty} \frac{1}{2\pi\,\sigma_x\sigma_y\sqrt{1-\rho^2}}\; e^{-\frac{1}{2(1-\rho^2)}\left[\frac{y-\mu_y}{\sigma_y} - \rho\frac{x-\mu_x}{\sigma_x}\right]^2}\, e^{-\frac{1}{2}\left(\frac{x-\mu_x}{\sigma_x}\right)^2}\, dy$$
Now, let us shift our attention to the first part of the
expression inside the integral sign (the one that does NOT
involve the exponent):

$$\frac{1}{2\pi\,\sigma_x\sigma_y\sqrt{1-\rho^2}} = \frac{1}{\sqrt{2\pi}\,\sigma_x} \cdot \frac{1}{\sqrt{2\pi}\,\sigma_y\sqrt{1-\rho^2}}$$

so that

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_x}\, e^{-\frac{1}{2}\left(\frac{x-\mu_x}{\sigma_x}\right)^2} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma_y\sqrt{1-\rho^2}}\; e^{-\frac{1}{2(1-\rho^2)}\left[\frac{y-\mu_y}{\sigma_y} - \rho\frac{x-\mu_x}{\sigma_x}\right]^2} dy \quad \ldots (1)$$
Multiplying and dividing the exponent inside the integral sign by $\sigma_y^2$,
we have

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_x}\, e^{-\frac{1}{2}\left(\frac{x-\mu_x}{\sigma_x}\right)^2} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma_y\sqrt{1-\rho^2}}\; e^{-\frac{\sigma_y^2}{2\sigma_y^2(1-\rho^2)}\left[\frac{y-\mu_y}{\sigma_y} - \rho\frac{x-\mu_x}{\sigma_x}\right]^2} dy$$

$$= \frac{1}{\sqrt{2\pi}\,\sigma_x}\, e^{-\frac{1}{2}\left(\frac{x-\mu_x}{\sigma_x}\right)^2} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma_y\sqrt{1-\rho^2}}\; e^{-\frac{1}{2\sigma_y^2(1-\rho^2)}\left[(y-\mu_y) - \rho\frac{\sigma_y}{\sigma_x}(x-\mu_x)\right]^2} dy$$
Now, let $\sigma_y^{*2} = \sigma_y^2\left(1-\rho^2\right)$, i.e. $\sigma_y^{*} = \sigma_y\sqrt{1-\rho^2}$;
therefore

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_x}\, e^{-\frac{1}{2}\left(\frac{x-\mu_x}{\sigma_x}\right)^2} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma_y^{*}}\; e^{-\frac{1}{2\sigma_y^{*2}}\left[(y-\mu_y) - \rho\frac{\sigma_y}{\sigma_x}(x-\mu_x)\right]^2} dy$$
Now, letting $\mu_y^{*} = \mu_y + \rho\,\frac{\sigma_y}{\sigma_x}(x-\mu_x)$, this
can be re-written as

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_x}\, e^{-\frac{(x-\mu_x)^2}{2\sigma_x^2}} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma_y^{*}}\; e^{-\frac{1}{2}\left(\frac{y-\mu_y^{*}}{\sigma_y^{*}}\right)^2} dy$$
Hence, the integrand is the pdf of a normal density of the variable Y with:

$$E(Y) = \mu_y^{*} = \mu_y + \rho\,\frac{\sigma_y}{\sigma_x}(x-\mu_x), \qquad Var(Y) = \sigma_y^{*2} = \sigma_y^2\left(1-\rho^2\right)$$

Now, it is obvious that, for any density function,
the total area under the curve is unity.
Hence,

$$\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma_y^{*}}\; e^{-\frac{1}{2}\left(\frac{y-\mu_y^{*}}{\sigma_y^{*}}\right)^2} dy = 1$$

so that

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_x}\, e^{-\frac{(x-\mu_x)^2}{2\sigma_x^2}} \cdot 1 = \frac{1}{\sqrt{2\pi}\,\sigma_x}\, e^{-\frac{(x-\mu_x)^2}{2\sigma_x^2}}$$
Therefore, the marginal distribution of X is

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_x}\, e^{-\frac{1}{2}\left(\frac{x-\mu_x}{\sigma_x}\right)^2}, \qquad -\infty < x < \infty,$$

i.e. the normal distribution with mean $\mu_x$ and variance $\sigma_x^2$:

$$X \sim N\!\left(\mu_x, \sigma_x^2\right) \text{ with } E(X) = \mu_x,\; Var(X) = \sigma_x^2.$$

Similarly,

$$Y \sim N\!\left(\mu_y, \sigma_y^2\right) \text{ with } E(Y) = \mu_y,\; Var(Y) = \sigma_y^2.$$
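As a numerical sanity check of the result just derived (an editor's sketch; the parameter values are arbitrary), integrating the joint density over y should reproduce the N(μx, σx²) density at each x:

```python
# Numerically integrate the bivariate normal pdf over y and compare
# with the claimed N(mu_x, sd_x^2) marginal.
import numpy as np
from scipy.integrate import quad
from scipy.stats import multivariate_normal, norm

mu_x, mu_y, sd_x, sd_y, rho = 1.0, -2.0, 2.0, 1.5, 0.8
cov = [[sd_x**2, rho * sd_x * sd_y], [rho * sd_x * sd_y, sd_y**2]]
joint = multivariate_normal(mean=[mu_x, mu_y], cov=cov)

for x in (-1.0, 1.0, 3.0):
    marginal, _ = quad(lambda y: joint.pdf([x, y]), -np.inf, np.inf)
    assert np.isclose(marginal, norm.pdf(x, mu_x, sd_x))
print("integral over y of f_XY(x, y) matches the N(mu_x, sd_x^2) pdf")
```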
Important Note:
It is possible to have a joint p.d.f. whose marginal p.d.f.s
are Normal and yet which is NOT bivariate normal.
Conditional Distributions
in the case of
Bivariate Normal Distribution
In this topic, I will discuss the Conditional Distributions
related to the Bivariate Normal Distribution
and, through DERIVATION, I will show that
they are themselves Normal.
So, let us start with the pdf of the bivariate
normal distribution:
We know that the joint p.d.f. of the Bivariate Normal distribution
of (X, Y) is

$$f_{XY}(x,y) = \frac{1}{2\pi\,\sigma_x\sigma_y\sqrt{1-\rho^2}}\; e^{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_x}{\sigma_x}\right)^2 - 2\rho\left(\frac{x-\mu_x}{\sigma_x}\right)\left(\frac{y-\mu_y}{\sigma_y}\right) + \left(\frac{y-\mu_y}{\sigma_y}\right)^2\right]}$$
where −∞ < x, y < ∞ and the parameters are such that
−∞ < µx, µy < ∞;
σx , σy > 0;
−1 < ρ < 1.
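For concreteness, this formula can be transcribed directly into code (an editor's sketch; scipy's multivariate_normal serves only as an independent reference, and the test point is arbitrary):

```python
# Direct transcription of the bivariate normal pdf formula above.
import numpy as np
from scipy.stats import multivariate_normal

def bvn_pdf(x, y, mu_x, mu_y, sd_x, sd_y, rho):
    """Bivariate normal pdf, written exactly as in the displayed formula."""
    zx = (x - mu_x) / sd_x
    zy = (y - mu_y) / sd_y
    q = (zx**2 - 2 * rho * zx * zy + zy**2) / (2 * (1 - rho**2))
    return np.exp(-q) / (2 * np.pi * sd_x * sd_y * np.sqrt(1 - rho**2))

ref = multivariate_normal(mean=[0, 0], cov=[[1, 0.5], [0.5, 1]])
print(bvn_pdf(0.7, -0.2, 0, 0, 1, 1, 0.5), ref.pdf([0.7, -0.2]))  # equal values
```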
Let us now attempt to derive the pdf of the
conditional distributions
associated with the bivariate normal
distribution:
In general, we know that, in the case of any bivariate
distribution f(x,y), the conditional distribution of Y
given X = x is expressed as follows:

$$f_{Y|X}(y \mid X = x) = \frac{f_{XY}(x,y)}{f_X(x)}$$

where $f_{XY}(x,y)$ is the joint pdf of X and Y
and $f_X(x)$ is the marginal pdf of X.
Now, we know that, in the case of the bivariate normal
distribution, the marginal distribution of X is given by:

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_x}\, e^{-\frac{1}{2}\left(\frac{x-\mu_x}{\sigma_x}\right)^2}, \qquad -\infty < x < \infty,$$

i.e. the normal distribution with mean $\mu_x$ and variance $\sigma_x^2$.
Therefore, in the case of the bivariate normal distribution,
we will have:

$$f_{Y|X}(y \mid X = x) = \frac{f_{XY}(x,y)}{f_X(x)} = \frac{\dfrac{1}{2\pi\,\sigma_x\sigma_y\sqrt{1-\rho^2}}\; e^{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_x}{\sigma_x}\right)^2 - 2\rho\left(\frac{x-\mu_x}{\sigma_x}\right)\left(\frac{y-\mu_y}{\sigma_y}\right) + \left(\frac{y-\mu_y}{\sigma_y}\right)^2\right]}}{\dfrac{1}{\sqrt{2\pi}\,\sigma_x}\, e^{-\frac{1}{2}\left(\frac{x-\mu_x}{\sigma_x}\right)^2}}$$
Let us now do some algebraic manipulation
in the expression of the
exponent
in the
numerator:
The exponent is given by:

$$-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}\right]$$

Dividing and multiplying the above expression by $\sigma_x^2\sigma_y^2$, we have

$$-\frac{1}{2(1-\rho^2)\sigma_x^2\sigma_y^2}\left[\sigma_y^2(x-\mu_x)^2 + \sigma_x^2(y-\mu_y)^2 - 2\rho\,\sigma_x\sigma_y(x-\mu_x)(y-\mu_y)\right]$$

Completing the square in the exponent by adding and subtracting $(x-\mu_x)^2\sigma_y^2\rho^2$
in the square brackets:

$$-\frac{1}{2(1-\rho^2)\sigma_x^2\sigma_y^2}\left[\sigma_y^2(x-\mu_x)^2 + \sigma_x^2(y-\mu_y)^2 - 2\rho\,\sigma_x\sigma_y(x-\mu_x)(y-\mu_y) + (x-\mu_x)^2\sigma_y^2\rho^2 - (x-\mu_x)^2\sigma_y^2\rho^2\right]$$

Combining (i) the second, third and fourth terms, and (ii) the first and fifth terms, we have

$$-\frac{1}{2(1-\rho^2)\sigma_x^2\sigma_y^2}\left\{\left[\sigma_x(y-\mu_y) - \rho\,\sigma_y(x-\mu_x)\right]^2 + \sigma_y^2(x-\mu_x)^2\left[1-\rho^2\right]\right\}$$

which can be re-written as

$$-\frac{1}{2(1-\rho^2)}\left[\frac{y-\mu_y}{\sigma_y} - \rho\,\frac{x-\mu_x}{\sigma_x}\right]^2 - \frac{1}{2}\left(\frac{x-\mu_x}{\sigma_x}\right)^2$$

Multiplying and dividing the first part of the exponent by $\sigma_y^2$,
we have

$$-\frac{1}{2\sigma_y^2(1-\rho^2)}\left[(y-\mu_y) - \rho\,\frac{\sigma_y}{\sigma_x}(x-\mu_x)\right]^2 - \frac{1}{2}\left(\frac{x-\mu_x}{\sigma_x}\right)^2$$
Now, let us revert back to the expression of the conditional
distribution:

$$f_{Y|X}(y \mid X = x) = \frac{\dfrac{1}{2\pi\,\sigma_x\sigma_y\sqrt{1-\rho^2}}\; e^{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_x}{\sigma_x}\right)^2 - 2\rho\left(\frac{x-\mu_x}{\sigma_x}\right)\left(\frac{y-\mu_y}{\sigma_y}\right) + \left(\frac{y-\mu_y}{\sigma_y}\right)^2\right]}}{\dfrac{1}{\sqrt{2\pi}\,\sigma_x}\, e^{-\frac{1}{2}\left(\frac{x-\mu_x}{\sigma_x}\right)^2}}$$

This can be re-written as:

$$f_{Y|X}(y \mid X = x) = \frac{\dfrac{1}{2\pi\,\sigma_x\sigma_y\sqrt{1-\rho^2}}\; e^{-\frac{1}{2\sigma_y^2(1-\rho^2)}\left[(y-\mu_y) - \rho\frac{\sigma_y}{\sigma_x}(x-\mu_x)\right]^2}\, e^{-\frac{1}{2}\left(\frac{x-\mu_x}{\sigma_x}\right)^2}}{\dfrac{1}{\sqrt{2\pi}\,\sigma_x}\, e^{-\frac{1}{2}\left(\frac{x-\mu_x}{\sigma_x}\right)^2}}$$

Cancelling out the term $e^{-\frac{1}{2}\left(\frac{x-\mu_x}{\sigma_x}\right)^2}$ in the numerator
with the same term in the denominator, we have

$$f_{Y|X}(y \mid X = x) = \frac{\sqrt{2\pi}\,\sigma_x}{2\pi\,\sigma_x\sigma_y\sqrt{1-\rho^2}}\; e^{-\frac{1}{2\sigma_y^2(1-\rho^2)}\left[(y-\mu_y) - \rho\frac{\sigma_y}{\sigma_x}(x-\mu_x)\right]^2}$$
Now,

$$f_{Y|X}(y \mid X = x) = \frac{1}{\sqrt{2\pi}\,\sigma_y\sqrt{1-\rho^2}}\; e^{-\frac{1}{2\sigma_y^2(1-\rho^2)}\left[y - \mu_y - \rho\frac{\sigma_y}{\sigma_x}(x-\mu_x)\right]^2} \quad \ldots (1)$$
Now, let $\sigma_y^{*2} = \sigma_y^2\left(1-\rho^2\right)$, i.e. $\sigma_y^{*} = \sigma_y\sqrt{1-\rho^2}$,
and let $\mu_y^{*} = \mu_y + \rho\,\dfrac{\sigma_y}{\sigma_x}(x-\mu_x)$.
Then, equation (1) can be re-written as

$$f_{Y|X}(y \mid X = x) = \frac{1}{\sqrt{2\pi}\,\sigma_y^{*}}\; e^{-\frac{1}{2}\left(\frac{y-\mu_y^{*}}{\sigma_y^{*}}\right)^2}$$
Now,

$$f_{Y|X}(y \mid X = x) = \frac{1}{\sqrt{2\pi}\,\sigma_y^{*}}\; e^{-\frac{1}{2}\left(\frac{y-\mu_y^{*}}{\sigma_y^{*}}\right)^2}$$

is the pdf of a normal distribution of the variable Y with:

$$E(Y) = \mu_y^{*} = \mu_y + \rho\,\frac{\sigma_y}{\sigma_x}(x-\mu_x), \qquad Var(Y) = \sigma_y^{*2} = \sigma_y^2\left(1-\rho^2\right).$$
The conditional distribution of Y given x can be expressed
as follows:

$$Y \mid X = x \;\sim\; N\!\left(\mu_y + \rho\,\frac{\sigma_y}{\sigma_x}(x-\mu_x),\; \sigma_y^2\left(1-\rho^2\right)\right) = N\!\left(\mu_y^{*},\, \sigma_y^{*2}\right)$$
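Finally, the derived conditional distribution can be verified numerically (an editor's sketch with arbitrary parameters, not part of the lecture): the ratio f_XY(x, y)/f_X(x) should coincide with the N(μy*, σy*²) density at every y.

```python
# Check f_XY(x, y) / f_X(x) against the derived normal conditional pdf.
import numpy as np
from scipy.stats import multivariate_normal, norm

mu_x, mu_y, sd_x, sd_y, rho = 0.0, 5.0, 1.0, 2.0, -0.4
cov = [[sd_x**2, rho * sd_x * sd_y], [rho * sd_x * sd_y, sd_y**2]]
joint = multivariate_normal(mean=[mu_x, mu_y], cov=cov)

x = 1.3                                              # conditioning value
m_star = mu_y + rho * (sd_y / sd_x) * (x - mu_x)     # conditional mean mu_y*
s_star = sd_y * np.sqrt(1 - rho**2)                  # conditional sd sigma_y*
for y in (2.0, 5.0, 8.0):
    lhs = joint.pdf([x, y]) / norm.pdf(x, mu_x, sd_x)
    rhs = norm.pdf(y, m_star, s_star)
    assert np.isclose(lhs, rhs)
print("f_XY(x, y) / f_X(x) equals the derived normal conditional pdf")
```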
All friends and well-wishers, please remember me in your prayers.
Seeking your prayers: Mahar Afaq Safdar Muhammadi
Wa.me/+923494873115