
Statistical Theory of Distribution

Course Outline

1. Review of Probability Theory (3 lecture hours)


1.1 Probability concepts
1.2 Distribution Functions
1.3 Expectations
1.4 Moments
1.5 Moment Generating Functions
2. Common Univariate Distributions (13 lecture hours)
2.1 Discrete Distributions: Binomial, Negative Binomial, Geometric,
Hypergeometric, Poisson and Pascal
2.2 Continuous Distributions: Uniform, Normal, Gamma, Exponential, Weibull, Chi-
square, Beta and Cauchy
2.3 Truncated Distributions
3. Common Multivariate Distributions (13 lecture hours)
3.1 Multi-dimensional random variables
3.2 Trinomial and Multinomial distributions
3.3 Bi-variate and Multivariate Normal distributions
3.4 Moments and Moment Generating Functions
3.5 Marginal and Conditional Distributions
3.6 Independent random variables
3.7 Distribution Functions of Functions of Random Variables
3.7.1 Cumulative distribution function Technique
3.7.2 Moment Generating Function technique
3.7.3 Transformations
4 Sampling Distributions (13 lecture hours)
4.1 The normal distribution
4.2 The Chi-Square Distribution
4.3 The t-Distribution
4.4 The F distribution
4.5 Order Statistics
4.5.1 Definition of order statistics
4.5.2 Derivation of distribution function and density for the ith order statistics
4.5.3 Derivation of the joint pdf of the ith and jth order statistics
4.5.4 Distributions of the sample range and sample median
4.6 The sample cumulative distribution function (empirical distribution)
4.6.1 Definition
4.6.2 Distribution of the empirical distribution
5 Introduction to Limit Theorems (6 lecture hours)
5.1 Sequence of random variables
5.2 Convergence in probability and mean square
5.3 Weak law of large numbers
5.4 Central limit theorems

Textbook

1
Statistical Theory of Distribution

Mood, A.M., Graybill, F.A. and Boes, D.C. (1974). Introduction to the Theory of Statistics
(3rd Edition). McGraw-Hill.

References
1. Balakrishnan, N. and Nevzorov, V.B. (2003). A Primer on Statistical Distributions. Wiley-Interscience.
2. Evans, M., Hastings, N. and Peacock, B. (2000). Statistical Distributions (3rd Edition). Wiley-Interscience.
3. Friedlander, F.G. and Joshi, M. (2008). Introduction to the Theory of Distributions (2nd Edition). Cambridge University Press.
4. Hogg, R.V. and Tanis, E. (2009). Probability and Statistical Inference (8th Edition). Prentice Hall.
5. Johnson, N.L., Kemp, A.W. and Kotz, S. (2005). Univariate Discrete Distributions (3rd Edition). Wiley Series in Probability and Statistics. Wiley-Interscience.
6. Johnson, N.L., Kotz, S. and Balakrishnan, N. (1994). Continuous Univariate Distributions, Volume 1. Wiley Series in Probability and Statistics. Wiley-Interscience.
7. Krishnamoorthy, K. (2006). Handbook of Statistical Distributions with Applications. Chapman and Hall/CRC.
8. Pathak, R.S. (2000). A Course in Distribution Theory and Applications. Narosa.
9. Ross, S. (2006). Introduction to Probability Models (9th Edition). Academic Press.
10. Severini, T.A. (2005). Elements of Distribution Theory. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press.
11. Stuart, A. and Ord, K. (2009). Kendall's Advanced Theory of Statistics, Volume 1: Distribution Theory (6th Edition). Wiley.
12. Walpole, R.E., Myers, R.H., Myers, S.L. and Ye, K. (2006). Probability and Statistics for Engineers and Scientists (8th Edition). Prentice-Hall.


Statistical Theory of Distributions

(Stat 471)

Chapter one: Review of Probability Theory

1.1 Probability concepts

What is probability?

The ideal situation in life would be to know with certainty what is going to happen next. This is almost never the case, because the world is full of both deterministic and non-deterministic situations. The numerical measure assigned to the occurrence of a non-deterministic event is called its probability.

Random Experiment: An experiment whose outcomes are determined

only by chance factors is called a random experiment.

Sample Space: The set of all possible outcomes of a random experiment

is called a sample space. A sample space may be finite or infinite.

A sample space is said to be a discrete sample space if it has finitely many or a

countable infinity of elements. If the elements (points) of a sample space

constitute a continuum (for example, all the points on a line, or all the points in a plane), the sample space is said to be a continuous sample space.

Event: The collection of none, one, or more than one outcomes from a

sample space is called an event.

Random Variable: A variable whose numerical values are determined by

chance factors is called a random variable. Formally, it is a function from the

sample space to a set of real numbers.


Discrete Random Variable: If the set of all possible values of a random

variable X is countable, then X is called a discrete random variable.

Continuous Random Variable: If the set of all possible values of X is an

interval or union of two or more non overlapping intervals, then X is called a

continuous random variable

Probability of an Event: If all the outcomes of a random experiment are

equally likely, then the probability of an event A is given by

P(A) = (Number of outcomes in the event A) / (Total number of outcomes in the sample space)

The axioms of probability theory are:

A1. 0 ≤ P(A) ≤ 1 for every event A in the sample space S.

A2. P(S) = 1.

A3. If the events A_i are mutually exclusive, then P(∪ A_i) = ∑ P(A_i); this axiom is called the special addition rule. The general addition rule for any two events A and B in S is P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

Conditional probability: if A and B are any two events in S and P(B) ≠ 0, the conditional probability of A given B is

P(A|B) = [P(A ∩ B)/P(S)] / [P(B)/P(S)] = P(A ∩ B)/P(B).

1.2 Distribution functions

Probability Mass Function (pmf): Let R be the set of all possible values of a discrete r.v X, and let p(k) = P(X = k) for each k in R. Then p(k) is called the probability mass function of X. It satisfies

0 ≤ p(k) ≤ 1  and  ∑_{k ∈ R} p(k) = 1.


Probability Density Function (pdf): Any real valued function f(x) that

satisfies the following requirements is called a probability density function:


f(x) ≥ 0 for all x,   ∫_{−∞}^{∞} f(x) dx = 1,   and   P(a ≤ X ≤ b) = ∫_a^b f(x) dx.

For a continuous random variable, the probability of any single value is zero. For example, when x = a, P(X = a) = ∫_a^a f(x) dx = 0.

Exercise: Let f(x) = x for 0 ≤ x ≤ 1, f(x) = 2 − x for 1 ≤ x ≤ 2, and f(x) = 0 elsewhere. Show that f(x) is a pdf of the random variable X.

Cumulative Distribution Function (cdf): The cdf of a random variable X is defined by F(x) = P(X ≤ x) for all x.

For a continuous random variable X with pdf f(x),

F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt for all x.

For a discrete random variable X, the cdf is defined by

F(k) = P(X ≤ k) = ∑_{i ≤ k} P(X = i).

Joint probability distributions

If X and Y are discrete, f(x, y) is the joint pmf if
1. f(x, y) ≥ 0 for all x, y
2. ∑_x ∑_y f(x, y) = 1

If X and Y are continuous, f(x, y) is the joint pdf if
1. f(x, y) ≥ 0 for all x, y
2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1

Marginal probability distributions:

f(x) = ∑_y f(x, y) for discrete random variables, or f(x) = ∫_{−∞}^{∞} f(x, y) dy for continuous random variables.

f(y) = ∑_x f(x, y) for discrete random variables, or f(y) = ∫_{−∞}^{∞} f(x, y) dx for continuous random variables.

Exercise: Let f(x, y) = 𝑒 −𝑥−𝑦 for x> 0, 𝑦 > 0, then

i. Verify that the definition of joint pdf is satisfied.

ii. Find the marginal pdfs of X and Y.

1.3 Expectations

If X is a continuous random variable with pdf f(x), then the expectation of g(X), where g is a real-valued function, is defined by

E(g(X)) = ∫_{−∞}^{∞} g(x) f(x) dx.

If X is a discrete r.v, then

E(g(X)) = ∑_k g(k) P(X = k),

where the sum is over all possible values k of X.

1.4 Moments

The moments are a set of constants that represent some important

properties of the distributions. The most commonly used such constants are

measures of central tendency (mean, median, and mode), and measures of

dispersion (variance and mean deviation). Two other important measures are

the coefficient of skewness and the coefficient of kurtosis.

Moments about the Origin (Raw Moments): The moments about the origin are obtained by taking the expected value of the r.v raised to the power k, k = 1, 2, …. That is,

μ'_k = E(X^k) = ∫_{−∞}^{∞} x^k f(x) dx

is called the kth moment about the origin.


Moments about the Mean (Central Moments): When the r.v is observed in

terms of deviations from its mean, its expectation yields moments about the

mean or central moments. The first central moment is zero, and the second

central moment is the variance. The third central moment measures the degree

of skewness of the distribution, and the fourth central moment measures the

degree of flatness.

The kth moment about the mean, or the kth central moment, of a random variable X is defined by

μ_k = E[(X − μ)^k],  k = 1, 2, …,

where μ = E(X) is the mean of X.

1.5 Moment and Probability Generating Functions

The moment generating function of a random variable X is defined by

M_X(t) = E(e^{tX}),

provided the expectation exists for −h ≤ t ≤ h for some h > 0.

The moment generating function is useful in deriving the moments of X. Specifically,

E(X^k) = d^k M_X(t)/dt^k |_{t=0},  k = 1, 2, …

Example: let X be a r.v with pdf

f(x) = e^{−x} for x ≥ 0, and f(x) = 0 for x < 0.

i. Find the moment generating function of X.

ii. Calculate the first and second moments of X about the origin.

Solution

i. M_X(t) = E(e^{tX}) = ∫_0^∞ e^{tx} e^{−x} dx = ∫_0^∞ e^{(t−1)x} dx = [e^{(t−1)x}/(t − 1)]_0^∞ = 1/(1 − t), valid for t < 1.

ii. Using the quotient rule (if h(t) = f(t)/g(t), then h'(t) = [f'(t)g(t) − g'(t)f(t)]/(g(t))^2), or simply writing M_X(t) = (1 − t)^{−1}:

M'_X(t) = (1 − t)^{−2}, so the first raw moment is E(X) = M'_X(0) = 1.

M''_X(t) = 2(1 − t)^{−3}, so the second raw moment is E(X^2) = M''_X(0) = 2.

Probability Generating Function: The probability generating function of a non-

negative, integer valued r.v X is defined by

P(t) = E(t^X) = ∑_{i=0}^{∞} t^i P(X = i),

so that

P(X = k) = (1/k!) d^k P(t)/dt^k |_{t=0},  k = 0, 1, 2, …

Furthermore, P(0) = P(X = 0) and dP(t)/dt |_{t=1} = E(X).
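As an illustrative check (an addition to the notes, assuming Python with the sympy package is available), the worked example above can be verified by differentiating the MGF M_X(t) = 1/(1 − t) derived there and comparing with the direct integrals:

import sympy as sp

t = sp.Symbol('t')
x = sp.Symbol('x', positive=True)

# MGF obtained in the worked example for f(x) = e^(-x), x >= 0
M = 1 / (1 - t)

# Raw moments as derivatives of the MGF at t = 0
m1 = sp.diff(M, t, 1).subs(t, 0)   # E(X)   -> 1
m2 = sp.diff(M, t, 2).subs(t, 0)   # E(X^2) -> 2

# Cross-check against the definition E(X^k) = integral of x^k e^(-x) over (0, oo)
d1 = sp.integrate(x * sp.exp(-x), (x, 0, sp.oo))       # -> 1
d2 = sp.integrate(x**2 * sp.exp(-x), (x, 0, sp.oo))    # -> 2
print(m1, m2, d1, d2)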


Chapter Two

Common Univariate Distributions

2.1 Discrete Distributions

2.1.1 Bernoulli distribution

The Bernoulli distribution is used to model an experiment with only two

possible outcomes, often referred to as “success” and “failure”, usually encoded

as 1 and 0.

Definition: A discrete r.v X has a Bernoulli distribution with parameter p,

where 0 ≤ p ≤ 1, if its pmf is given by

p(1) = P(X = 1) = p and p(0) = P(X = 0) = 1− p.

We denote this distribution by Ber(p).

If X ~ Ber(p), then

μ = E(X) = ∑_{i=0}^{1} i P(X = i) = 0 × P(X = 0) + 1 × P(X = 1) = 0 + p = p,

σ^2 = Var(X) = E(X − μ)^2 = ∑_{i=0}^{1} (i − μ)^2 P(X = i)
   = (0 − p)^2 (1 − p) + (1 − p)^2 p
   = p(1 − p)(p + 1 − p)
   = p(1 − p).
2.1.2 Binomial Distribution

Let X1, …., Xn be mutually independent Bernoulli trials, each with success

probability p. Then

Y = ∑𝑛𝑖=1 𝑥𝑖 is a binomial r.v, denoted Y ~ Binomial (n, p)


Definition: A discrete r.v Y has a binomial distribution with parameters n and

p, where n = 1, 2,… and 0 ≤ p ≤ 1, if its pmf is given by

P(k) = P(Y = k) = C(n, k) p^k q^{n−k} for k = 0, 1, 2, …, n, where q = 1 − p.

Assumption of Binomial distribution

 There are only two possible outcomes for each trial success and failure.

 The probability of success is the same for each trial.

 The outcomes from different trials are independent.

 The number of trials (n) is fixed.

The mean and variance of Y are

E(Y) = np and V(Y) = np(1 − p) = npq.

Proof:

E(Y) = E(∑_{i=1}^{n} X_i) = ∑_{i=1}^{n} E(X_i) = ∑_{i=1}^{n} p = np, since each X_i is a Bernoulli(p) random variable.

V(Y) = V(∑_{i=1}^{n} X_i) = ∑_{i=1}^{n} V(X_i), because the X_i are independent,

   = ∑_{i=1}^{n} p(1 − p) = np(1 − p).

Theorem: If a r.v X has a binomial distribution, then its moment and probability generating functions are

M_X(t) = (pe^t + q)^n and P_X(t) = (pt + q)^n.

Proof:

M_X(t) = E(e^{tX}) = ∑_{x=0}^{n} e^{tx} C(n, x) p^x q^{n−x} = ∑_{x=0}^{n} C(n, x) (pe^t)^x q^{n−x} = (pe^t + q)^n,

because for any a > 0, b > 0 and non-negative integer n, ∑_{x=0}^{n} C(n, x) a^x b^{n−x} = (a + b)^n (the binomial theorem).

Now M'_X(t) = npe^t (pe^t + q)^{n−1}, so E(X) = M'_X(0) = npe^0 (pe^0 + q)^{n−1} = np.

M''_X(t) = n(n − 1)p^2 e^{2t} (pe^t + q)^{n−2} + npe^t (pe^t + q)^{n−1},

so M''_X(0) = n(n − 1)p^2 + np, and

V(X) = E(X^2) − (E(X))^2 = M''_X(0) − (M'_X(0))^2 = n(n − 1)p^2 + np − n^2 p^2 = npq.

Similarly, P_X(t) = E(t^X) = ∑_{x=0}^{n} t^x C(n, x) p^x q^{n−x} = ∑_{x=0}^{n} C(n, x) (pt)^x q^{n−x} = (pt + q)^n.

Exercise: Using the pgf of the binomial distribution, calculate

i. the probability of X at x = 1 and x = 2;

ii. the mean of X.


Properties

Let X1, X2, …, Xm be independent random variables with X_i ~ binomial(n_i, p), i = 1, 2, …, m. Then ∑_{i=1}^{m} X_i ~ binomial(∑ n_i, p).

Remark: The probability of X = x for Bin(n, p) can be obtained from a cumulative binomial table as

f(x; n, p) = F(x; n, p) − F(x − 1; n, p), where F(−1) = 0,

because the tables give cumulative probabilities rather than the values of f(x; n, p).
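A quick illustration of this remark (an added sketch, assuming Python with scipy is available; the values n = 10, p = 0.3 and x = 4 are arbitrary choices):

from scipy.stats import binom

n, p, x = 10, 0.3, 4
# f(x; n, p) recovered from cumulative probabilities, as in the remark
from_cdf = binom.cdf(x, n, p) - binom.cdf(x - 1, n, p)
direct = binom.pmf(x, n, p)
print(from_cdf, direct)   # both approx. 0.2001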

2.1.3 Discrete Uniform Distribution

The probability mass function of a discrete uniform random variable X is given by

f(x; N) = 1/N for x = 1, 2, …, N,  or equivalently  f(x; N) = (1/N) I_{{1,2,…,N}}(x).

This distribution is used to model experimental outcomes that are equally likely.

Theorem 2: If X has a discrete uniform distribution, then

E(X) = (N + 1)/2,  V(X) = (N^2 − 1)/12,  and  m_X(t) = (1/N) ∑_{x=1}^{N} e^{xt} = e^t (1 − e^{Nt})/[N(1 − e^t)].

Proof:

Remark: ∑_{i=1}^{k} i = k(k + 1)/2 and ∑_{i=1}^{k} i^2 = k(k + 1)(2k + 1)/6.

E(X) = ∑_{x=1}^{N} x f(x) = ∑_{x=1}^{N} x (1/N) = (1/N) ∑_{x=1}^{N} x = (1/N) · N(N + 1)/2 = (N + 1)/2.

V(X) = E(X^2) − (E(X))^2 = (1/N) ∑_{x=1}^{N} x^2 − ((N + 1)/2)^2 = (1/N) · N(N + 1)(2N + 1)/6 − ((N + 1)/2)^2 = (N + 1)(N − 1)/12.

2.1.4 Geometric Distributions


Consider a sequence of independent Bernoulli trials with success
probability p. The distribution of the random variable that represents the
number of failures until the first success is called geometric distribution.
Let us consider the number of menstrual cycles it takes for a woman to become pregnant, measured from the moment she decided to become pregnant. We

model the number of cycles up to pregnancy by a random variable X. Assume

that the probability that a woman becomes pregnant during a particular cycle is

equal to p, for 0 < p ≤ 1, independent of the previous cycles.

P(X = k) = P(no pregnancy in the first k − 1 cycles, pregnancy in the kth) = (1 − p)^{k−1} p for k = 1, 2, …

Equivalently, if X counts the number of failures before the first success, P(X = k) = (1 − p)^k p for k = 0, 1, 2, …

This r.v X is an example of a r.v with a geometric distribution with parameter p.

Assumption of geometric distribution

 There are only two possible outcomes for each trial success and failure.

 The probability of success is the same for each trial.

 The outcomes from different trials are independent.

 The number of trials is not fixed.

A random variable X that has a geometric distribution is often referred to as

a discrete waiting-time random variable. It represents how long (in terms of

the number of failures) one has to wait for a success.
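The waiting-time interpretation is easy to see by simulation. The sketch below is an addition (it assumes Python with numpy; p = 0.25 is an arbitrary choice). Note that numpy's geometric sampler counts the number of trials up to and including the first success, so subtracting one gives the failure-counting form used in the theorem that follows.

import numpy as np

rng = np.random.default_rng(0)
p, q = 0.25, 0.75
# Number of trials up to and including the first success: P = q^(k-1) p
trials = rng.geometric(p, size=200_000)
failures = trials - 1              # failure-counting form used in these notes
print(failures.mean(), q / p)      # sample mean vs. E(X) = q/p = 3
print(failures.var(), q / p**2)    # sample variance vs. V(X) = q/p^2 = 12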


Theorem: If the random variable X has a geometric distribution (counting failures, with q = 1 − p), then

E(X) = q/p,   V(X) = q/p^2,   m_X(t) = p/(1 − qe^t) for qe^t < 1.

Proof: E(X) = ∑_{x=0}^{∞} x p q^x = pq ∑_{x=1}^{∞} x q^{x−1} = pq d/dq (∑_{x=0}^{∞} q^x) = pq d/dq (1/(1 − q)) = pq/(1 − q)^2 = q/p.

E(X^2) = ∑_{x=0}^{∞} x^2 p q^x = pq^2 ∑_{x=0}^{∞} x(x − 1) q^{x−2} + pq ∑_{x=0}^{∞} x q^{x−1}
   = pq^2 d^2/dq^2 (1/(1 − q)) + q/p = pq^2 · 2/(1 − q)^3 + q/p = 2q^2/p^2 + q/p = (2q^2 + pq)/p^2.

V(X) = E(X^2) − (E(X))^2 = (2q^2 + pq)/p^2 − q^2/p^2 = (q^2 + pq)/p^2 = q/p^2.

m_X(t) = ∑_{x=0}^{∞} e^{tx} p q^x = p ∑_{x=0}^{∞} (qe^t)^x = p/(1 − qe^t).

Theorem: If the random variable X has a geometric distribution with parameter p, then

P[X ≥ s + t | X ≥ t] = P[X ≥ s],  s, t = 0, 1, 2, …

Proof: P[X ≥ s + t | X ≥ t] = P[{X ≥ s + t} ∩ {X ≥ t}]/P[X ≥ t] = P(X ≥ s + t)/P(X ≥ t) = (∑_{x=s+t}^{∞} pq^x)/(∑_{x=t}^{∞} pq^x) = q^{s+t}/q^t = q^s = P[X ≥ s].


2. 1. 5 Negative Binomial Distribution

The distribution of the random variable that represents the number of

failures until the first success is called geometric distribution. Now let us

consider the r.v X that denotes the number of failures until the rth success.

If the r.v X has a negative binomial or pascal distribution with parameter r and

p then its pmf is defined as:

P(X = k | r, p) = P(observing k failures in the first k + r − 1 trials) × P(observing a success at the (k + r)th trial)

   = C(k + r − 1, r − 1) p^{r−1} q^k × p.

Thus,

P(X = k | r, p) = C(k + r − 1, r − 1) p^r q^k = C(k + r − 1, k) p^r q^k = C(−r, k) p^r (−q)^k for k = 0, 1, 2, …

A r.v X having a negative binomial distribution is often referred to as a discrete

waiting-time r.v. It represents how long (in terms of the number of failures) one

waits for the rth success.

Theorem: If the random variable X has a negative binomial distribution, then

E(X) = rq/p,   V(X) = rq/p^2,   m_X(t) = [p/(1 − qe^t)]^r.

Proof: m_X(t) = E(e^{tX}) = ∑_{x=0}^{∞} e^{tx} C(−r, x) p^r (−q)^x = ∑_{x=0}^{∞} C(−r, x) p^r (−qe^t)^x = [p/(1 − qe^t)]^r.


2.1.6 Poisson Distribution

Suppose that events that occur over a period of time or space satisfy the

following:

1. The numbers of events occurring in disjoint intervals of time are independent.

2. The probability that exactly one event occurs in a small interval of time of length Δ is λΔ, where λ > 0.

3. It is very unlikely that two or more events occur in a sufficiently small interval of time.

4. The probability of observing a certain number of events in a time interval of length Δ depends only on the length Δ and not on the beginning of the interval.

The probability mass function of a Poisson distribution with mean λ is given by

f(x; λ) = λ^x e^{−λ}/x!,  x = 0, 1, 2, … and λ > 0.

Theorem: If X is a random variable with a Poisson distribution, then

E(X) = λ,  V(X) = λ,  m_X(t) = e^{λ(e^t − 1)},  P_X(t) = e^{λ(t − 1)}.

Proof: m_X(t) = E(e^{tX}) = ∑_{x=0}^{∞} e^{tx} e^{−λ} λ^x/x! = e^{−λ} ∑_{x=0}^{∞} (λe^t)^x/x!.

The infinite series ∑_{x=0}^{∞} u^x/x! is the Maclaurin series of e^u, so

m_X(t) = e^{−λ} e^{λe^t} = e^{λ(e^t − 1)}.

m'_X(t) = λe^t e^{λ(e^t − 1)}, so E(X) = m'_X(0) = λ.

m''_X(t) = λe^t e^{λ(e^t − 1)} (λe^t + 1), so m''_X(0) = λ(λ + 1), and V(X) = m''_X(0) − (m'_X(0))^2 = λ.

P_X(t) = E(t^X) = ∑_{x=0}^{∞} t^x e^{−λ} λ^x/x! = e^{−λ} ∑_{x=0}^{∞} (λt)^x/x! = e^{−λ} e^{λt} = e^{λ(t − 1)}.

The Poisson distribution can also be obtained as a limiting distribution of the binomial, in which n → ∞ and p → 0 so that np remains constant. In other words, for large n and small p, the binomial distribution can be approximated by the Poisson distribution with mean λ = np.

To verify that f(x; λ) is a proper pmf, note that

∑_{x=0}^{∞} f(x) = ∑_{x=0}^{∞} e^{−λ} λ^x/x! = e^{−λ} ∑_{x=0}^{∞} λ^x/x! = e^{−λ} e^{λ} = 1.

Let us now show that when n → ∞ and p → 0, while np = λ remains constant, the limiting form of the binomial distribution is λ^x e^{−λ}/x! for x = 0, 1, 2, …

First substitute λ/n for p in the binomial pmf and simplify:

f(x; n, p) = [n!/(x!(n − x)!)] (λ/n)^x (1 − λ/n)^{n−x}
   = [n(n − 1)⋯(n − x + 1)/n^x] (λ^x/x!) (1 − λ/n)^{n−x}
   = (1 − 1/n)(1 − 2/n)⋯(1 − (x − 1)/n) (λ^x/x!) (1 − λ/n)^{n−x}.

If we let n → ∞, then (1 − 1/n)(1 − 2/n)⋯(1 − (x − 1)/n) → 1 and

(1 − λ/n)^{n−x} = [(1 − λ/n)^{n/λ}]^{λ} (1 − λ/n)^{−x} → e^{−λ}.

Hence the binomial pmf f(x; n, p) approaches λ^x e^{−λ}/x! for x = 0, 1, 2, …

We obtain Poisson probabilities from a cumulative table by f(x; λ) = F(x; λ) − F(x − 1; λ).

Remark

1. Let X1, X2, …, Xn be independent observations from Poisson populations with E(X_i) = λ_i, i = 1, 2, …, n. Then ∑_{i=1}^{n} X_i ~ Poisson(∑_{i=1}^{n} λ_i); for constant λ_i = λ, ∑_{i=1}^{n} X_i ~ Poisson(nλ).

2. Recurrence relations:

P(X = k | λ) = (λ/k) P(X = k − 1),  k = 1, 2, …
P(X = k + 1 | λ) = (λ/(k + 1)) P(X = k),  k = 0, 1, 2, …
P(X = k − 1 | λ) = (k/λ) P(X = k),  k = 1, 2, 3, …

3. Relation to other distributions:

i. Binomial: Let X1 and X2 be independent Poisson r.vs with means λ1 and λ2, respectively. Then, conditionally, X1 | (X1 + X2 = n) ~ binomial(n, λ1/(λ1 + λ2)).

ii. Multinomial: If X1, X2, …, Xm are independent Poisson(λ) random variables, then the conditional distribution of (X1, X2, …, Xm) given X1 + X2 + ⋯ + Xm = n is multinomial with n trials and cell probabilities p1 = p2 = ⋯ = pm = 1/m.

iii. Gamma: Let X be a Poisson(λ) random variable. Then P(X ≤ k) = P(Y ≥ λ), where Y is a gamma(k + 1, 1) random variable.

4. Approximation:

Normal: P(X ≤ k | λ) ≅ P(Z ≤ (k − λ + 0.5)/√λ) and P(X ≥ k | λ) ≅ P(Z ≥ (k − λ − 0.5)/√λ), where X is a Poisson(λ) r.v and Z is the standard normal random variable.
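The binomial-to-Poisson limit in particular is easy to see numerically. The sketch below is an addition (it assumes Python with scipy; n = 1000 and p = 0.003 are arbitrary choices, giving λ = np = 3):

from scipy.stats import binom, poisson

n, p = 1000, 0.003      # large n, small p
lam = n * p             # lambda = np = 3
for k in range(6):
    # binomial pmf vs. its Poisson approximation
    print(k, binom.pmf(k, n, p), poisson.pmf(k, lam))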

2.1.7 Hyper-geometric Distribution

Suppose a population contains M successes and N − M failures. The probability of exactly x successes in a random sample of size n is

f(x; n, M, N) = C(M, x) C(N − M, n − x)/C(N, n) for x = 0, 1, 2, …, n,

where N is a positive integer, M is a non-negative integer with M ≤ N, and n is a positive integer with n ≤ N.

Hence f(x; n, M, N) defines the hypergeometric probability mass function.

In practice, a hypergeometric distribution occurs as follows. Let an urn contain N balls, of which M are white and N − M are black. Of these, n balls are chosen without replacement, so each of the C(N, n) possible samples of n balls has probability 1/C(N, n). Let X be a r.v that denotes the number of white balls drawn. The event {X = x} contains C(M, x) C(N − M, n − x) sample points, so


P(X = x) = C(M, x) C(N − M, n − x)/C(N, n) for x = 0, 1, 2, …, n.

We have hyper-geometric distribution, under the following conditions.

i. A single trial results in one of the two possible outcomes, say A and A’

ii. P(A) and hence P(A’) vary in repeated trials.

iii. Repeated trials are dependent.

iv. A fixed number of trials (n) are to be performed.

Note that the binomial distribution would apply if we do sampling with

replacement.

Example: A shipment of 20 digital voice recorders contains 5 that are defective.

If 10 of them are randomly chosen for inspection, what is the probability that 2

of the 10 will be defective?

i. When the sample is drawn without replacement

ii. When the sample is drawn with replacement


Solution: i) f(2; 10, 5, 20) = C(5, 2) C(15, 8)/C(20, 10) = 0.348

ii) If the sample is drawn with replacement, each selected recorder is defective with the same probability p = 5/20 = 0.25.

Thus, f(2; 10, 0.25) = C(10, 2) (0.25)^2 (0.75)^8 ≈ 0.282
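Both parts of this example can be checked with scipy (an added sketch; it assumes scipy is available and uses its hypergeom argument order of total size 20, number of successes 5, sample size 10):

from scipy.stats import hypergeom, binom

# i) sampling without replacement
print(hypergeom.pmf(2, 20, 5, 10))   # approx. 0.348
# ii) sampling with replacement, p = 5/20
print(binom.pmf(2, 10, 0.25))        # approx. 0.282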

Theorem: If X has a hypergeometric distribution, then

E(X) = n M/N  and  V(X) = n (M/N)(1 − M/N)(N − n)/(N − 1).


Proof:

E(X) = ∑_{x=0}^{n} x C(M, x) C(N − M, n − x)/C(N, n)
   = n(M/N) ∑_{x=1}^{n} C(M − 1, x − 1) C(N − M, n − x)/C(N − 1, n − 1)
   = n M/N,

using the identity ∑_{i=0}^{m} C(a, i) C(b, m − i) = C(a + b, m).

E(X(X − 1)) = ∑_{x=0}^{n} x(x − 1) C(M, x) C(N − M, n − x)/C(N, n)
   = [n(n − 1)M(M − 1)/(N(N − 1))] ∑_{x=2}^{n} C(M − 2, x − 2) C(N − M, n − x)/C(N − 2, n − 2)
   = n(n − 1) M(M − 1)/(N(N − 1)).

V(X) = E(X^2) − (E(X))^2 = E(X(X − 1)) + E(X) − (E(X))^2
   = n(n − 1) M(M − 1)/(N(N − 1)) + n M/N − (n M/N)^2
   = n (M/N)(N − M)(N − n)/(N(N − 1))
   = n (M/N)(1 − M/N)(N − n)/(N − 1).

Remark: Let X and Y be independent binomial random variables with common success probability p and numbers of trials m and n, respectively. Then

P(X = k | X + Y = s) = P(X = k) P(Y = s − k)/P(X + Y = s),

which simplifies to

P(X = k | X + Y = s) = C(m, k) C(n, s − k)/C(m + n, s).

Thus, the conditional distribution of X given X + Y = s is hypergeometric(s, m, m + n).

Approximation:

i. Let p = M/N. Then, for large N and M,

P(X = k) ≅ C(n, k) p^k q^{n−k}.

ii. Let M/N be small and n large such that n(M/N) = λ. Then

P(X = k) ≅ e^{−λ} λ^k/k!.

2.1.8 The Multinomial Distribution

An immediate generalization of the binomial distribution arises when each trial

can have more than two possible outcomes. This happens, for example, when a

manufactured product is classified as superior, average, or poor, when a

student’s performance is graded as A, B,C, D and F. To treat this kind of

problem in general, let us consider the case where there are n independent

trials, with each trial permitting k mutually exclusive outcomes whose

respective probabilities are 𝑝1 , 𝑝2 , 𝑝3 , … , 𝑝𝑘 (𝑤𝑖𝑡ℎ ∑ 𝑝𝑖 = 1).

Referring to the outcomes as being of the first kind, the second kind, …, and

the k-th kind, we shall be interested in the probability 𝑓(𝑥1 , 𝑥2 , … , 𝑥𝑘 ) of getting x1

outcomes of the first kind, x2 outcomes of the second kind, …, and xk outcomes

of the kth kind, with ∑𝑘𝑖=1 𝑥𝑖 = 𝑛.

The desired probability is given by

f(x1, x2, …, xk) = [n!/(x1! x2! ⋯ xk!)] p1^{x1} p2^{x2} ⋯ pk^{xk} for xi = 0, 1, …, n,

with the xi subject to the restriction ∑_{i=1}^{k} xi = n. The joint probability distribution whose values are given by these probabilities is called the multinomial distribution.

Example: The probabilities that a light bulb of a certain kind of slide projector will last fewer than 40 hours of continuous use, anywhere from 40 to 80 hours of continuous use, or more than 80 hours of continuous use are 0.3, 0.5 and 0.2, respectively. Find the probability that, among eight such bulbs, 2 will last fewer than 40 hours, 5 will last anywhere from 40 to 80 hours, and 1 will last more than 80 hours.

Solution: f(2, 5, 1) = [8!/(2! 5! 1!)] (0.3)^2 (0.5)^5 (0.2) = 0.0945
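A quick numerical check of this example (an added sketch, assuming Python with scipy is available):

from scipy.stats import multinomial

# Cell probabilities for (< 40 h, 40-80 h, > 80 h) and counts (2, 5, 1) among n = 8 bulbs
print(multinomial.pmf([2, 5, 1], n=8, p=[0.3, 0.5, 0.2]))   # approx. 0.0945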

2.2. Important Univariate Continuous Distributions

2.2.1 Uniform or Rectangular Distribution

Definition: A random variable X is said to have a uniform distribution on the

interval [a, b] −∞ < 𝑎 < 𝑏 < ∞ if its pdf is given by


f_X(x) = 1/(b − a) for a < x < b, and 0 otherwise.

If X ~ U[a, b], then the distribution function is

F(x) = 0 for x < a,  F(x) = (x − a)/(b − a) for a ≤ x < b,  and  F(x) = 1 for x ≥ b.

Theorem: If the random variable X has a uniform distribution on [a, b], then

E(X) = (a + b)/2,  Var(X) = (b − a)^2/12,  M_X(t) = (e^{bt} − e^{at})/((b − a)t) for t ≠ 0.

Proof: E(X) = ∫_a^b x/(b − a) dx = (b^2 − a^2)/(2(b − a)) = (a + b)/2.

E(X^2) = ∫_a^b x^2/(b − a) dx = (b^3 − a^3)/(3(b − a)) = (a^2 + ab + b^2)/3.

V(X) = E(X^2) − (E(X))^2 = (a^2 + ab + b^2)/3 − ((a + b)/2)^2 = (b − a)^2/12.

M_X(t) = E(e^{tX}) = ∫_a^b e^{tx}/(b − a) dx = (e^{bt} − e^{at})/((b − a)t).
//


2.2.2 Normal Distribution

The probability density function of a normal random variable X with mean 𝜇

and standard deviation 𝜎 is given by

f(x; μ, σ) = (1/(σ√(2π))) exp(−(x − μ)^2/(2σ^2)) for −∞ < x < ∞, −∞ < μ < ∞ and σ > 0.

This distribution is commonly denoted by N(μ, σ^2). The mean μ is the location parameter, and the standard deviation σ is the scale parameter.

The normal random variable with mean μ = 0 and standard deviation σ = 1 is called the standard normal random variable, with pdf

f(z) = (1/√(2π)) exp(−z^2/2) for −∞ < z < ∞,

and its cdf is denoted by Φ(z).

If X is a normal random variable with mean μ and standard deviation σ, then

P(X ≤ x) = P(Z ≤ (x − μ)/σ) = (1/√(2π)) ∫_{−∞}^{(x−μ)/σ} exp(−t^2/2) dt = Φ((x − μ)/σ).

Theorem: If the random variable X has a normal distribution, then

E(X) = μ,  Var(X) = σ^2,  M_X(t) = e^{μt + σ^2 t^2/2}.

Proof:

m_X(t) = E(e^{tX}) = e^{tμ} E(e^{t(X−μ)}) = e^{tμ} ∫_{−∞}^{∞} (1/(σ√(2π))) e^{t(x−μ)} e^{−(1/2)((x−μ)/σ)^2} dx

   = e^{tμ} (1/(σ√(2π))) ∫_{−∞}^{∞} e^{−(1/(2σ^2))[(x−μ)^2 − 2σ^2 t(x−μ)]} dx.

If we complete the square inside the bracket, it becomes

(x − μ)^2 − 2σ^2 t(x − μ) = (x − μ)^2 − 2σ^2 t(x − μ) + σ^4 t^2 − σ^4 t^2 = (x − μ − σ^2 t)^2 − σ^4 t^2


and we have

m_X(t) = e^{tμ} e^{σ^2 t^2/2} (1/(σ√(2π))) ∫_{−∞}^{∞} e^{−(1/(2σ^2))(x − μ − σ^2 t)^2} dx.

The integral together with the factor 1/(σ√(2π)) is necessarily 1, since it is the total area under a normal density with mean μ + σ^2 t and variance σ^2.

Hence m_X(t) = e^{tμ} e^{σ^2 t^2/2} = e^{μt + σ^2 t^2/2}, and

E(X) = m'_X(0) = μ,  V(X) = E(X^2) − (E(X))^2 = m''_X(0) − μ^2 = σ^2.
Example: Suppose that an instructor assumes that a student's final score is the value of a normally distributed random variable. The instructor decides to award a grade of A to those students whose score exceeds μ + σ, a B to those whose score falls between μ and μ + σ, a C if the score falls between μ − σ and μ, a D if the score falls between μ − 2σ and μ − σ, and an F if the score falls below μ − 2σ. The proportion of each grade given can then be calculated. For example, since

P[X > μ + σ] = 1 − P[X ≤ μ + σ] = 1 − Φ((μ + σ − μ)/σ) = 1 − Φ(1) ≈ 0.1587,

one would expect 15.87 percent of the students to receive A's.
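The remaining grade proportions follow from the same standardization. The sketch below is an addition (it assumes Python with scipy) and computes all five proportions from the standard normal cdf Φ:

from scipy.stats import norm

print(1 - norm.cdf(1))               # A: P(X > mu + sigma)                  approx. 0.1587
print(norm.cdf(1) - norm.cdf(0))     # B: P(mu < X < mu + sigma)             approx. 0.3413
print(norm.cdf(0) - norm.cdf(-1))    # C: P(mu - sigma < X < mu)             approx. 0.3413
print(norm.cdf(-1) - norm.cdf(-2))   # D: P(mu - 2*sigma < X < mu - sigma)   approx. 0.1359
print(norm.cdf(-2))                  # F: P(X < mu - 2*sigma)                approx. 0.0228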

2.2.3 Exponential Distributions

A classical situation in which an exponential distribution arises is as follows:

Consider a Poisson process with rate λ, where we count the events occurring

in a given interval of time or space. Let X denote the waiting time until the first

event to occur. Then, for a given x > 0,


P (X > x) = P (no event in (0,x))

= exp(-x𝜆)

and hence

P(X≤ 𝑥) = 1 – exp(-x𝜆) (*)

The distribution in (*) is called the exponential distribution with mean waiting

time b = 1/𝜆. The probability density function is given by

f(x | b) = (1/b) exp(−x/b), x > 0, b > 0,  or equivalently  f(x | λ) = λ exp(−λx), x > 0, λ > 0.

Remark: Before we calculate the mean and variance of the exponential and gamma distributions, recall:

Γ(t) = ∫_0^∞ x^{t−1} e^{−x} dx   (**)

Γ(t) = (t − 1) Γ(t − 1)   (***)

so when t is a positive integer, Γ(t) = (t − 1)!.

Also ∫_0^∞ x^{t−1} e^{−λx} dx = Γ(t)/λ^t   (****)

and Γ(1/2) = ∫_0^∞ x^{−1/2} e^{−x} dx = √π.   (*****)

Thus, the mean and variance of the exponential distribution are:

E(X) = λ ∫_0^∞ x e^{−λx} dx = λ Γ(2)/λ^2 = 1/λ,

E(X^2) = λ ∫_0^∞ x^2 e^{−λx} dx = λ Γ(3)/λ^3 = 2/λ^2,

V(X) = E(X^2) − (E(X))^2 = 2/λ^2 − 1/λ^2 = 1/λ^2.

Theorem: If a random variable X has an exponential distribution, then

M_X(t) = 1/(1 − bt) = λ/(λ − t) for t < λ.

Proof: M_X(t) = E(e^{tX}) = ∫_0^∞ e^{tx} λe^{−λx} dx = λ ∫_0^∞ e^{−x(λ−t)} dx = λ Γ(1)/(λ − t) = λ/(λ − t).

Properties:

1. Memoryless property: For given t > 0 and s > 0, P(X > s + t | X > s) = P(X > t), where X is an exponential random variable:

P(X > s + t | X > s) = P(X > s + t)/P(X > s) = (λ ∫_{s+t}^∞ e^{−λx} dx)/(λ ∫_s^∞ e^{−λx} dx) = e^{−λ(s+t)}/e^{−λs} = e^{−λt} = P(X > t).

2. Let X1, X2, …, Xn be independent exponential random variables, each with mean b. Then

∑_{i=1}^{n} X_i has a gamma distribution with shape parameter n and rate λ = 1/b (scale b).
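Property 2 can be illustrated by simulation. The sketch below is an addition (it assumes Python with numpy; λ = 2 and n = 5 are arbitrary choices): the simulated sums should match the gamma mean r/λ and variance r/λ^2.

import numpy as np

rng = np.random.default_rng(1)
lam, n = 2.0, 5
# Sum of n independent exponential waiting times with mean 1/lambda
s = rng.exponential(scale=1/lam, size=(200_000, n)).sum(axis=1)
print(s.mean(), n / lam)        # gamma mean     r/lambda   = 2.5
print(s.var(), n / lam**2)      # gamma variance r/lambda^2 = 1.25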

2.2.4 Gamma Distributions

The gamma distribution can be viewed as a generalization of the exponential

distribution with mean 1/λ, λ > 0. An exponential random variable with mean

1/𝜆 represents the waiting time until the first event to occur, while the gamma

random variable X represents the waiting time until the 𝑟th event to occur.

Therefore,

X = ∑𝑟𝑖=1 𝑌𝑖

Where Y1, Y2, …, Y𝑟 are independent exponential random variables with mean

1/𝜆

The probability density function of X is given by

f(x) = λ^r e^{−λx} x^{r−1}/Γ(r),  x > 0, r > 0, λ > 0.


The distribution defined above is called the gamma distribution with shape

parameter 𝑟 and the scale parameter𝜆.

The gamma probability density plots in Figure 1 indicate that the degree of

asymmetry of the gamma distribution diminishes as r increases. For large r, (λX − r)/√r is approximately distributed as a standard normal random variable.

Figure 1. Graphs of some gamma pdfs (λ = 1) for r = 1, 2 and 4.

Theorem: If the random variable X has a gamma distribution with parameters r and λ, then

E(X) = r/λ,  V(X) = r/λ^2,  m_X(t) = (λ/(λ − t))^r for t < λ.

Proof:

i. E(X) = (λ^r/Γ(r)) ∫_0^∞ x · x^{r−1} e^{−λx} dx = λ^r Γ(r + 1)/(Γ(r) λ^{r+1}) = r/λ.

E(X^2) = (λ^r/Γ(r)) ∫_0^∞ x^2 x^{r−1} e^{−λx} dx = λ^r Γ(r + 2)/(Γ(r) λ^{r+2}) = r(r + 1)/λ^2.

⟹ Var(X) = E(X^2) − (E(X))^2 = r(r + 1)/λ^2 − (r/λ)^2 = r/λ^2.

ii. m_X(t) = E(e^{tX}) = ∫_0^∞ e^{tx} (λ^r/Γ(r)) x^{r−1} e^{−λx} dx = (λ/(λ − t))^r ∫_0^∞ ((λ − t)^r/Γ(r)) x^{r−1} e^{−(λ−t)x} dx = (λ/(λ − t))^r.

m'_X(t) = rλ^r (λ − t)^{−r−1} and m''_X(t) = r(r + 1)λ^r (λ − t)^{−r−2};

hence E(X) = m'_X(0) = r/λ and V(X) = E(X^2) − (E(X))^2 = m''_X(0) − (r/λ)^2 = r/λ^2.

2.2.5 Beta Distributions

The probability density function of a beta random variable with shape

parameters a and b is given by

f(x; a, b) = (1/B(a, b)) x^{a−1} (1 − x)^{b−1} I_{(0,1)}(x), where a > 0 and b > 0,

and the beta function is B(a, b) = Γ(a)Γ(b)/Γ(a + b) = ∫_0^1 x^{a−1} (1 − x)^{b−1} dx.

A situation where the beta distribution arises is given below.

Consider a Poisson process with arrival rate of 𝜆 events per unit time. Let Wk denote the waiting

time until the kth arrival of an event and Ws denote the waiting time until the sth arrival, s > k.

Then Wk and Ws − Wk are independent gamma random variables with shape parameters k and s − k, respectively (both with rate λ).

The proportion of the time taken by the first k arrivals in the time needed for the first s arrivals is

Wk/Ws = Wk/(Wk + (Ws − Wk)) ~ beta(k, s − k).
Remark: The beta distribution reduces to the uniform distribution over (0, 1) if a=b=1.
 Let x1, x2, …, xn be a sample from a beta distribution with shape parameters a and b, and let

x̄ = (1/n) ∑_{i=1}^{n} X_i and s^2 = (1/(n − 1)) ∑_{i=1}^{n} (X_i − x̄)^2.

Then the method-of-moments estimates are

â = x̄ [x̄(1 − x̄)/s^2 − 1]  and  b̂ = (1 − x̄) [x̄(1 − x̄)/s^2 − 1] = â(1 − x̄)/x̄.
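These moment estimators are easy to try out by simulation. The sketch below is an addition (it assumes Python with numpy; the true values a = 2 and b = 5 are arbitrary choices):

import numpy as np

rng = np.random.default_rng(2)
a_true, b_true = 2.0, 5.0
x = rng.beta(a_true, b_true, size=50_000)

xbar, s2 = x.mean(), x.var(ddof=1)
common = xbar * (1 - xbar) / s2 - 1     # moment estimate of a + b
a_hat = xbar * common
b_hat = (1 - xbar) * common
print(a_hat, b_hat)                     # approx. 2 and 5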

Theorem: If the random variable X has a beta distribution with parameters a and b, then

E(X) = a/(a + b)  and  V(X) = ab/((a + b + 1)(a + b)^2).

Proof: E[X^k] = (1/B(a, b)) ∫_0^1 x^{k+a−1} (1 − x)^{b−1} dx = B(k + a, b)/B(a, b) = [Γ(k + a)Γ(b)/Γ(k + a + b)] · [Γ(a + b)/(Γ(a)Γ(b))] = Γ(k + a)Γ(a + b)/(Γ(a)Γ(k + a + b)).

Thus, E(X) = Γ(1 + a)Γ(a + b)/(Γ(a)Γ(1 + a + b)) = a/(a + b),

E(X^2) = Γ(2 + a)Γ(a + b)/(Γ(a)Γ(2 + a + b)) = (a + 1)a/((a + b + 1)(a + b)),

Var(X) = (a + 1)a/((a + b + 1)(a + b)) − (a/(a + b))^2 = ab/((a + b + 1)(a + b)^2).

2.2.6. Weibull Distribution

Let Y be a standard exponential random variable with probability density function f(y) = e^{−y}, y > 0. Define X = bY^{1/c} + m, with b > 0 and c > 0.


The distribution of X is known as the Weibull distribution with shape parameter c, scale

parameter b, and the location parameter m. Its probability density is given by

f(x | b, c, m) = (c/b)((x − m)/b)^{c−1} exp{−[(x − m)/b]^c},  x > m, b > 0, c > 0,

or, with m = 0,

f(x | b, c) = (c/b)(x/b)^{c−1} exp{−(x/b)^c},  x > 0, b > 0, c > 0.

Applications:

The Weibull distribution is one of the important distributions in reliability theory. It is widely

used to analyze the cumulative loss of performance of a complex system in systems engineering.

In general, it can be used to describe the data on waiting time until an event occurs. In this

manner, it is applied in risk analysis, actuarial science and engineering. Furthermore, the Weibull

distribution has applications in medical, biological, and earth sciences.

Theorem: If the random variable X has a Weibull distribution (with m = 0), then

E(X) = b Γ((c + 1)/c)  and  Var(X) = b^2 {Γ((c + 2)/c) − [Γ((c + 1)/c)]^2}.

Proof:

E(X) = ∫_0^∞ x (c/b)(x/b)^{c−1} exp{−(x/b)^c} dx. Let u = (x/b)^c, so du = (c/b^c) x^{c−1} dx and x = b u^{1/c}.

Thus, E(X) = ∫_0^∞ b u^{1/c} e^{−u} du = b ∫_0^∞ u^{((c+1)/c) − 1} e^{−u} du = b Γ((c + 1)/c).


Remark: Let X be a Weibull(b, c, m) random variable. Then ((X − m)/b)^c ~ exp(1), the exponential distribution with mean 1.
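The moment formulas of the theorem can be checked against scipy's Weibull implementation (an added sketch; it assumes scipy, and b = 2, c = 1.5, m = 0 are arbitrary choices):

from math import gamma
from scipy.stats import weibull_min

b, c, m = 2.0, 1.5, 0.0
X = weibull_min(c, loc=m, scale=b)
print(X.mean(), b * gamma((c + 1) / c))                                 # E(X)
print(X.var(), b**2 * (gamma((c + 2) / c) - gamma((c + 1) / c)**2))     # Var(X)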

2.2.7. Cauchy Distribution

The probability density function of a Cauchy distribution with the location parameter a and the

scale parameter b is given by


f(x | a, b) = 1/(πb[1 + ((x − a)/b)^2]),  −∞ < x < ∞, −∞ < a < ∞, b > 0.

Note that:

If the r.v X follows Cauchy distribution its mean and variance do not exist.

It can be postulated as a model for describing data that arise as n realizations of the ratio

of two normal random variables

If X and Y are independent standard normal random variables, then U = X/Y follows the

Cauchy distribution with probability density function


f(u) = 1/(π[1 + u^2]),  −∞ < u < ∞, that is, the standard Cauchy distribution with b = 1 and a = 0.

CHAPTER THREE

Common Multivariate Distributions

3.1. Multidimensional Random Variables

What is a random variable?


Random variable(r.v) is simply a function defined on a sample space S and taking values in real

line R = (−∞, ∞). In the study of many random experiments, there are, or can be, more than one

r.v of interest; hence we are compelled to extend our definitions of the distribution and density

function of one r.v to those of several r.vs. For example, during the “health awareness week” we

may consider a population of university students and record a randomly selected student's height (X1), weight (X2), age (X3) and blood pressure (X4). Each individual r.v may be assumed to follow some appropriate probability distribution in the population, but it may also be quite reasonable to assume that the four variables together follow a certain joint probability distribution.

3.2. Trinomial and Multinomial Distributions

Let X1, X2, …, Xk be k random variables all defined on the same probability space

i. The joint probability mass function (pmf) of x = (x1, x2, …, xk) is given by

P(x) = P(X1 = x1, X2 = x2, …, Xk = xk) for all xi, i = 1, 2, …, k,

such that

a. P(x) ≥ 0 for all x = (x1, x2, …, xk);
b. ∑_{all x} P(x) = 1.

ii. The joint probability density function (pdf) of x = (x1, x2, …, xk) is a function f(x) = f(x1, x2, …, xk) such that

a. f(x) ≥ 0 for all x = (x1, x2, …, xk);
b. ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} f(x) dx1 ⋯ dxk = 1.

iii. The joint cumulative distribution function of X1, X2, …, Xk, denoted by F_{X1,…,Xk}(x1, x2, …, xk), is defined as

P(X1 ≤ x1, X2 ≤ x2, …, Xk ≤ xk) = ∫_{−∞}^{x1} ⋯ ∫_{−∞}^{xk} f_{X1,…,Xk}(u1, u2, …, uk) du1 ⋯ duk for all (x1, x2, …, xk),

where f(x1, x2, …, xk) is the joint pdf.


iv. Marginal Cumulative distribution functions: If FX1, …, Xk(x1, x2, …, xk) is the joint

cumulative distribution function of X1, X2, …, Xk , then the cumulative distribution functions

F(X1), F(X2), …, F(Xk ) are called marginal cumulative distribution functions.

For example, let (X, Y) be two discrete or continuous random variables assuming all values in some region (range space). The joint probability function f satisfies the following conditions.

If X and Y are discrete, f(x, y) is the joint pmf if
 f(x, y) ≥ 0 for all x, y
 ∑_x ∑_y f(x, y) = 1

If X and Y are continuous, f(x, y) is the joint pdf if
 f(x, y) ≥ 0 for all x, y
 ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1

Marginal probability distributions:

f(x) = ∑_y f(x, y) for discrete random variables, or f(x) = ∫_{−∞}^{∞} f(x, y) dy for continuous random variables.

f(y) = ∑_x f(x, y) for discrete random variables, or f(y) = ∫_{−∞}^{∞} f(x, y) dx for continuous random variables.

Conditional Probability density function

Let X an Y be two random variables with joint probability f(x, y), then the conditional

probability of

𝑓(𝑥,𝑦)
a. Y given X = x is given by f(Y/ X = x) = 𝑓(𝑥)

𝑓(𝑥,𝑦)
b. X given Y = y is given by f(X/ Y = y) = 𝑓(𝑦)


Let X1, X2, …, Xk be k random variables, discrete or continuous with joint probability

function f(x1, x2, …, xk) and marginal probability function f1(x1), …, fk(xk) respectively. The

random variables X1, X2, …, Xk are said to be mutually independent if and only if f(x1, x2, …, xk) = f1(x1) · f2(x2) ⋯ fk(xk) for all (x1, x2, …, xk) within their range.

Example 1: Let X and Y be two random variables whose joint probability mass function is given in the table.

            y = -2    y = -1    y = 1     y = 2   |  p(x)
 x = -1     1/16      1/8       1/8       1/16    |  3/8
 x = 0      1/16      1/16      1/16      1/16    |  1/4
 x = 1      1/16      1/8       1/8       1/16    |  3/8
 p(y)       3/16      5/16      5/16      3/16    |

Find:

a. the marginal pmfs of X and Y;

b. F(x = 0, y = -1) and F(0, 1);

c. P(X | Y = 2) and P(Y | X = 0);

d. Are they independent r.vs? How? (A numerical check of (a) and (d) is sketched below.)
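The sketch below is an addition (it assumes Python with numpy): marginals are the row and column sums of the joint table, and independence would require p(x, y) = p(x)p(y) in every cell.

import numpy as np

# Joint pmf; rows: x = -1, 0, 1; columns: y = -2, -1, 1, 2
p = np.array([[1/16, 1/8, 1/8, 1/16],
              [1/16, 1/16, 1/16, 1/16],
              [1/16, 1/8, 1/8, 1/16]])

px = p.sum(axis=1)    # marginal pmf of X: 3/8, 1/4, 3/8
py = p.sum(axis=0)    # marginal pmf of Y: 3/16, 5/16, 5/16, 3/16
print(px, py)
# Independence check: p(x, y) = p(x) p(y) for every cell?
print(np.allclose(p, np.outer(px, py)))   # False, so X and Y are not independent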

Example 2: Let (X, Y) have the distribution defined by the density function

f(x, y) = e^{−x−y} for x > 0, y > 0, and 0 elsewhere.

Determine the following:

a. Marginal probability density function of x and Y

b. Joint distribution function

c. P(x + y ≤ 4)

d. Are X and Y independent variables?


e. P(X/Y=y) and P(Y/ X = x)

Solutions: a) f(x) = e^{−x} for x > 0, f(y) = e^{−y} for y > 0.

b) F_{X,Y}(x, y) = (1 − e^{−x})(1 − e^{−y}), x, y > 0.

c) 1 − 5e^{−4}

Example 3: The probability density function of a two dimensional random variables (X, Y) is

given by

f(x, y) = x + y for 0 < x + y < 1, and 0 otherwise.

i. Marginal pdf of X and Y

ii. Evaluate P(X < ½, y > ¼ )

Solution:

ii) (Sketch: the region of integration is bounded by the lines x = 1/2, y = 1/4 and x + y = 1.)

P(X < 1/2, Y > 1/4) = ∫_0^{1/2} ∫_{1/4}^{1−x} (x + y) dy dx = ∫_0^{1/2} [xy + y^2/2]_{y=1/4}^{y=1−x} dx

   = ∫_0^{1/2} [x(1 − x) + (1 − x)^2/2 − x/4 − 1/32] dx

   = ∫_0^{1/2} [15/32 − x^2/2 − x/4] dx = 35/192 //
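The double integral can be verified symbolically (an added sketch, assuming Python with sympy; it reproduces the integral exactly as set up in the solution):

import sympy as sp

x, y = sp.symbols('x y', positive=True)
prob = sp.integrate(sp.integrate(x + y, (y, sp.Rational(1, 4), 1 - x)),
                    (x, 0, sp.Rational(1, 2)))
print(prob)   # 35/192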

Conditional Expectation:

If (X, Y) is a two-dimensional random variable, we define the conditional expectations as follows:

E(X | Y) = ∫_{−∞}^{∞} x g(x | y) dx and E(Y | X) = ∫_{−∞}^{∞} y g(y | x) dy, where X and Y are continuous r.vs;

E(X | Y) = ∑_i x_i P(x_i | y_j) and E(Y | X) = ∑_j y_j P(y_j | x_i), where X and Y are discrete r.vs.

Example: Suppose the joint density function of X and Y is given by

f(x, y) = (e^{−x/y} e^{−y})/y for 0 < x < ∞, 0 < y < ∞, and 0 elsewhere.

Compute E(X | Y).

Solution: E(X | Y = y) = ∫_{−∞}^{∞} x f(x | y) dx, where f(x | y) = f(x, y)/f(y) and f(y) = ∫_{−∞}^{∞} f(x, y) dx.

f(y) = ∫_0^∞ (e^{−x/y} e^{−y})/y dx = (e^{−y}/y) ∫_0^∞ e^{−x/y} dx. Let u = x/y, so du = dx/y and dx = y du. Then

f(y) = (e^{−y}/y) ∫_0^∞ e^{−u} y du = e^{−y} (−e^{−u})|_0^∞ = e^{−y}(0 − (−e^0)) = e^{−y}.

f(x | y) = f(x, y)/f(y) = [(e^{−x/y} e^{−y})/y]/e^{−y} = e^{−x/y}/y for 0 < x < ∞, 0 < y < ∞, and 0 otherwise.

Thus, E(X | Y = y) = ∫_0^∞ x (e^{−x/y}/y) dx. Let z = x/y, so dz = dx/y and dx = y dz. Then

E(X | Y = y) = y ∫_0^∞ z e^{−z} dz = y Γ(2),

since Γ(a) = ∫_0^∞ x^{a−1} e^{−x} dx.

→ E(X | Y = y) = y(1) = y //

3.3. Bi-variate and Multivariate Normal Distributions


Chapter Four

Sampling Distribution

4.1. Introduction

In statistics, a collection of objects whose elements are examined in view of their individual characteristics is called a population, whereas a sample is a part of the population from which conclusions about the population are drawn. In the continuous case the population values are usually values of identically distributed random variables, whose distribution we refer to as the population distribution.

Definition: If X1, X2, …, Xn are independently and identically distributed random variables,

we say that they constitute a random sample from the infinite population given by their

common distribution.

If f(x1, x2, …, xn ) is the value of the joint distribution of such a set of random variables at

(X1, X2, …, Xn) we can write f(x1, x2, …, xn ) = ∏𝑛𝑖=1 𝑓(𝑥𝑖 ) where f(xi) is the value of

population distribution at xi.

Definition: If X1, X2, …, Xn constitute a random sample, then

∑ 𝑥𝑖 ∑(𝑥𝑖−𝑥̅ )2
𝑥̅ = is called the sample mean and 𝑠 2 = is called the sample variance.
𝑛 𝑛−1

The Distribution of the Mean: Since statistics are random variables whose values vary from sample to sample, it is customary to refer to their distribution as a sampling distribution.

Example: If x1, x2, …, xn constitute a random sample from an infinite population with mean μ and variance σ^2, then E(x̄) = μ and var(x̄) = σ^2/n.

Proof: Let Y = x̄ = ∑ x_i/n. Then E(Y) = E(∑ x_i/n) = ∑ E(x_i)/n = (1/n) ∑ μ = (1/n) nμ = μ.

Var(Y) = var(∑ x_i/n) = (1/n^2) var(∑ x_i) = (1/n^2) ∑ var(x_i) + (2/n^2) ∑∑_{i<j} cov(x_i, x_j).

Since the x_i are independent, cov(x_i, x_j) = 0 for i ≠ j, so

var(x̄) = (1/n^2) ∑ var(x_i) = (1/n^2) ∑ σ^2 = nσ^2/n^2 = σ^2/n.

Note: The standard deviation of the sample means is called standard error of the mean.
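This result is easy to see by simulation. The sketch below is an addition (it assumes Python with numpy; μ = 10, σ = 2 and n = 25 are arbitrary choices): the simulated means cluster around μ with standard deviation σ/√n.

import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n = 10.0, 2.0, 25
# Many samples of size n; one sample mean per row
xbar = rng.normal(mu, sigma, size=(50_000, n)).mean(axis=1)
print(xbar.mean(), mu)                    # E(x-bar) = mu
print(xbar.std(), sigma / np.sqrt(n))     # standard error sigma/sqrt(n) = 0.4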

There are basically two ways of making inferences (generalization) in science. These are

- Deductive inference

- Inductive Inference


Deductive inference is from the general to the particular

Inductive inference is from the particular to the general (population).

There are generally two closely related concepts used to make inferences about a population parameter, namely:

- estimation and hypothesis-testing techniques.

An estimator is a formula (a function of the sample observations) used to estimate an unknown population parameter.

Example: μ̂ = x̄ = ∑ x_i/n is an estimator of the population mean μ.

One function of statistics is the provision of techniques for making inductive inferences and for measuring the degree of uncertainty of such inferences. That uncertainty is measured in terms of probability.

To make inferences about the population parameter, we select the sample observations (n) from a total of N, either without replacement or with replacement.

If sampling is done without replacement there are C(N, n) different samples, and if the sample is selected with replacement there are N^n different samples of size n.

Distribution of Sample

Let x1, x2, …, xn denote a sample of size n. The distribution (probability) of the sample x1, x2,

…, xn is defined to be the joint distribution of x1, x2, …, xn.

Example: Let x1, x2, …, xn be a random sample from a Bernoulli distribution, i.e. x_i ~ Bernoulli(p), where P(X = x_i) = p^{x_i}(1 − p)^{1−x_i} for x_i = 0, 1.

Determine the distribution of the random sample.


Solution: f_{x1,…,xn}(x1, x2, …, xn) = f(x1) f(x2) ⋯ f(xn) = ∏_{i=1}^{n} p(x_i) = ∏_{i=1}^{n} p^{x_i}(1 − p)^{1−x_i} = p^{∑ x_i}(1 − p)^{∑(1 − x_i)} = p^{∑ x_i}(1 − p)^{n − ∑ x_i}.

Example 2: Let x1, x2, …, xn be a random sample from the Poisson distribution. Find the joint distribution of the n sample observations.

Solution: Given x_i ~ Poisson(λ), i = 1, 2, …, n, with f(x_i) = e^{−λ} λ^{x_i}/x_i!, x_i = 0, 1, 2, …,

f_{x1,…,xn}(x1, x2, …, xn) = ∏_{i=1}^{n} e^{−λ} λ^{x_i}/x_i! = e^{−nλ} λ^{∑ x_i}/∏_{i=1}^{n} x_i!.

4.2. The Chi-square Distributions

Let X1, X2, …, Xn be independent standard normal random variables. The distribution of X = ∑_{i=1}^{n} X_i^2 is called the chi-square distribution with n degrees of freedom (df), and its probability density function is given by

f(x | n) = (1/(2^{n/2} Γ(n/2))) e^{−x/2} x^{n/2 − 1},  x > 0, n > 0.

The chi-square random variable with df = n is denoted by X^2_n.

Applications:

The chi-square distribution is also called the variance distribution, because the (scaled) sample variance of a random sample from a normal distribution follows a chi-square distribution. Specifically, if X1, …, Xn is a random sample from a normal distribution with mean μ and variance σ^2, then

∑_{i=1}^{n} (x_i − x̄)^2/σ^2 = (n − 1)S^2/σ^2 ~ X^2_{n−1}.

This distributional result is useful for making inferences about σ^2.


 In categorical data analysis with an r×c contingency table, the usual test statistic is

T = ∑_{i=1}^{r} ∑_{j=1}^{c} (O_{ij} − E_{ij})^2/E_{ij} ~ X^2_{(c−1)(r−1)},

where O_{ij} and E_{ij} denote, respectively, the observed and expected cell frequencies. The null hypothesis of independent attributes is rejected at significance level α if the observed value of T is greater than the (1 − α)th quantile of a chi-square distribution with df = (r − 1)(c − 1).

The chi-square statistic T can also be used to test whether a frequency distribution fits a specific model.
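As an illustration of the contingency-table statistic T (an added sketch, assuming Python with scipy; the 2×3 table of observed counts is hypothetical):

import numpy as np
from scipy.stats import chi2_contingency

O = np.array([[20, 30, 25],     # hypothetical observed frequencies
              [30, 20, 25]])
T, pval, dof, E = chi2_contingency(O, correction=False)
print(T, pval, dof)             # dof = (r - 1)(c - 1) = 2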

Properties:

i. If the random variables X_i, i = 1, 2, …, k, are normally and independently distributed with means μ_i and variances σ_i^2, then Z_i^2 = (X_i − μ_i)^2/σ_i^2 ~ X^2_1 and U = ∑ Z_i^2 = ∑_{i=1}^{k} (X_i − μ_i)^2/σ_i^2 has a chi-square distribution with k degrees of freedom.

Proof: Write Z_i = (X_i − μ_i)/σ_i. Then Z_i has a standard normal distribution. Now

M_U(t) = E(e^{tU}) = E(e^{t ∑ Z_i^2}) = E(∏_{i=1}^{k} e^{tZ_i^2}) = ∏_{i=1}^{k} E(e^{tZ_i^2}), since the Z_i are independent.

But E(e^{tZ_i^2}) = ∫_{−∞}^{∞} e^{tz^2} (1/√(2π)) e^{−z^2/2} dz = ∫_{−∞}^{∞} (1/√(2π)) e^{−(1/2)(1−2t)z^2} dz

   = (1/√(1 − 2t)) [∫_{−∞}^{∞} (√(1 − 2t)/√(2π)) e^{−(1/2)(1−2t)z^2} dz] = 1/√(1 − 2t) for t < 1/2,

since the expression in brackets is the integral of a normal density with mean 0 and variance 1/(1 − 2t).

Hence E(e^{tZ_i^2}) = (1/(1 − 2t))^{1/2}, which shows Z_i^2 ~ X^2(1), and

M_U(t) = ∏_{i=1}^{k} (1/(1 − 2t))^{1/2} = (1/(1 − 2t))^{k/2} for t < 1/2,

which is the moment generating function of a chi-square distribution with k degrees of freedom.

ii. If z1, z2, …, zn are a random sample from a standard normal distribution, then

a) z̄ has a normal distribution with mean 0 and variance 1/n;

b) z̄ and ∑_{i=1}^{n} (z_i − z̄)^2 are independent;

c) ∑_{i=1}^{n} (z_i − z̄)^2 has a chi-square distribution with n − 1 degrees of freedom.

Proof: (see the textbook, page 244)

 Here z̄ = ∑_{i=1}^{n} Z_i/n with Z_i = (X_i − μ)/σ, so z̄ = (x̄ − μ)/σ.

→ E(z̄) = 0 and var(z̄) = var((x̄ − μ)/σ) = (σ^2/n)/σ^2 = 1/n.

 For part (b), consider first the case n = 2.

iii. If x1, x2, …, xn is a random sample from N(μ, σ^2), then (n − 1)s^2/σ^2 ~ X^2(n − 1).

Proof (sketch):

∑_{i=1}^{n} (x_i − μ)^2 = ∑_{i=1}^{n} (x_i − x̄ + x̄ − μ)^2 = ∑_{i=1}^{n} (x_i − x̄)^2 + n(x̄ − μ)^2,

since the cross term vanishes.

iv. If X and Y are independent chi-square random variables with k and m degrees of freedom, then X + Y has a chi-square distribution with k + m degrees of freedom.

Proof: X ~ X^2(k) and Y ~ X^2(m), so

M_X(t) = (1 − 2t)^{−k/2} and M_Y(t) = (1 − 2t)^{−m/2}.

M_{X+Y}(t) = E(e^{t(X+Y)}) = E(e^{tX} e^{tY}) = E(e^{tX}) E(e^{tY}), since X and Y are independent, so

M_{X+Y}(t) = (1 − 2t)^{−k/2} (1 − 2t)^{−m/2} = (1 − 2t)^{−(k+m)/2},

which is the moment generating function of a chi-square distribution with k + m degrees of freedom.

→ If X1, X2, …, Xk are independent chi-square random variables with degrees of freedom n1, …, nk, respectively, then

∑_{i=1}^{k} X_i ~ X^2_m, where m = ∑ n_i.

4. The gamma distribution with shape parameter a and rate parameter b specializes to the chi-square distribution with df = n when a = n/2 and b = 1/2. That is, gamma(n/2, 1/2) ~ X^2_n.

Theorem: If the random variable X has a chi-square distribution with n degrees of freedom, then

E(X) = n,  Var(X) = 2n  and  M_X(t) = (1 − 2t)^{−n/2}.

Proof: From the gamma distribution with r = n/2 and λ = 1/2,

E(X) = r/λ = (n/2)/(1/2) = n  and  Var(X) = r/λ^2 = (n/2)/(1/4) = (n/2) × 4 = 2n //

Example: Suppose the velocity V of an object has the N(0, 1) distribution. Let K = mV^2/2 be the kinetic energy of the object.

a. Find the pdf of Y = V^2.

b. Find the pdf of K = mV^2/2.

Solution: a. Y = V^2 ⟹ V = ±√y = h^{−1}(y), and f_V(v) = (1/√(2π)) e^{−v^2/2}, since V ~ N(0, 1).

→ g(y) = {f(√y) + f(−√y)} |dh^{−1}(y)/dy| = {f(√y) + f(−√y)} |1/(2√y)|.

Thus, g(y) = (1/(2√y)) {(1/√(2π)) e^{−(√y)^2/2} + (1/√(2π)) e^{−(−√y)^2/2}}

   = (1/√(2π)) e^{−y/2} y^{−1/2} = e^{−y/2} y^{−1/2}/(2^{1/2} √π),

and since Γ(1/2) = √π,

g(y) = e^{−y/2} y^{−1/2}/(2^{1/2} Γ(1/2)) ~ X^2(1),

because f(x) = e^{−x/2} x^{n/2 − 1}/(2^{n/2} Γ(n/2)) ~ X^2(n).
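A simulation check of part (a) is sketched below (an addition, assuming Python with numpy): squared standard normal draws should have the χ^2(1) mean 1 and variance 2.

import numpy as np

rng = np.random.default_rng(4)
v = rng.standard_normal(500_000)   # V ~ N(0, 1)
y = v**2                           # Y = V^2
print(y.mean(), y.var())           # approx. 1 and 2, the chi-square(1) mean and variance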

b. Find the pdf of K = mV^2/2.

Solution:

f_V(v) = (1/√(2π)) e^{−v^2/2}, −∞ < v < ∞, and h^{−1}(k) = v = ±√(2k/m).

g(k) = {f(√(2k/m)) + f(−√(2k/m))} |dh^{−1}(k)/dk|,

where dh^{−1}(k)/dk = d(√(2k/m))/dk = √(2/m) (1/(2√k)) = 1/√(2mk).

g(k) = {(1/√(2π)) e^{−(1/2)(√(2k/m))^2} + (1/√(2π)) e^{−(1/2)(√(2k/m))^2}} (1/√(2mk))

   = (2/√(2π)) e^{−k/m} (1/√(2mk)) = (1/m)^{1/2} k^{−1/2} e^{−k/m}/√π

→ g(k) = (1/m)^{1/2} k^{−1/2} e^{−k/m}/Γ(1/2) ~ gamma(1/2, 1/m), i.e. r = 1/2 and λ = 1/m.

4.3. The t-Distribution
