
Statistical Theory of Distribution

Course Outline

1. Review of Probability Theory (3 lecture hours)


1.1 Probability concepts
1.2 Distribution Functions
1.3 Expectations
1.4 Moments
1.5 Moment Generating Functions
2. Common Univariate Distributions (13 lecture hours)
2.1 Discrete Distributions: Binomial, Negative Binomial, Geometric,
Hypergeometric, Poisson and Pascal
2.2 Continuous Distributions: Uniform, Normal, Gamma, Exponential, Weibull, Chi-
square, Beta and Cauchy
2.3 Truncated Distributions
3. Common Multivariate Distributions (13 lecture hours)
3.1 Multi-dimensional random variables
3.2 Trinomial and Multinomial distributions
3.3 Bi-variate and Multivariate Normal distributions
3.4 Moments and Moment Generating Functions
3.5 Marginal and Conditional Distributions
3.6 Independent random variables
3.7 Distribution Functions of Functions of Random Variables
3.7.1 Cumulative distribution function Technique
3.7.2 Moment Generating Function technique
3.7.3 Transformations
4 Sampling Distributions (13 lecture hours)
4.1 The normal distribution
4.2 The Chi-Square Distribution
4.3 The t-Distribution
4.4 The F distribution
4.5 Order Statistics
4.5.1 Definition of order statistics
4.5.2 Derivation of distribution function and density for the ith order statistics
4.5.3 Derivation of the joint pdf of the ith and jth order statistics
4.5.4 Distributions of the sample range and sample median
4.6 The sample cumulative distribution function (empirical distribution)
4.6.1 Definition
4.6.2 Distribution of the empirical distribution
5 Introduction to Limit Theorems (6 lecture hours)
5.1 Sequence of random variables
5.2 Convergence in probability and mean square
5.3 Weak law of large numbers
5.4 Central limit theorems

Textbook

1
Statistical Theory of Distribution

Mood, A.M., Graybill, F.A. and Boes, D.C. (1974). Introduction to the Theory of Statistics
(3rd Edition). McGraw-Hill.

References
1. Balakrishnan, N. and Nevzorov, V.B. (2003). A Primer on Statistical Distributions. Wiley-Interscience.
2. Evans, M., Hastings, N. and Peacock, B. (2000). Statistical Distributions (3rd Edition). Wiley-Interscience.
3. Friedlander, F.G. and Joshi, M. (2008). Introduction to the Theory of Distributions (2nd Edition). Cambridge University Press.
4. Hogg, R.V. and Tanis, E. (2009). Probability and Statistical Inference (8th Edition). Prentice Hall.
5. Johnson, N.L., Kemp, A.W. and Kotz, S. (2005). Univariate Discrete Distributions (3rd Edition). Wiley Series in Probability and Statistics. Wiley-Interscience.
6. Johnson, N.L., Kotz, S. and Balakrishnan, N. (1994). Continuous Univariate Distributions, Volume 1. Wiley Series in Probability and Statistics. Wiley-Interscience.
7. Krishnamoorthy, K. (2006). Handbook of Statistical Distributions with Applications. Chapman and Hall/CRC.
8. Pathak, R.S. (2000). A Course in Distribution Theory and Applications. Narosa.
9. Ross, S. (2006). Introduction to Probability Models (9th Edition). Academic Press.
10. Severini, T.A. (2005). Elements of Distribution Theory. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press.
11. Stuart, A. and Ord, K. (2009). Kendall's Advanced Theory of Statistics, Volume 1: Distribution Theory (6th Edition). Wiley.
12. Walpole, R.E., Myers, R.H., Myers, S.L. and Ye, K. (2006). Probability and Statistics for Engineers and Scientists (8th Edition). Prentice-Hall.


Statistical Theory of Distributions

(Stat 471)

Chapter one: Review of Probability Theory

1.1 Probability concepts

What is probability?

The ideal situation in life would be to know with certainty what is going to happen next. This is almost never the case, because the world is full of both deterministic and non-deterministic situations. The numerical measure assigned to the occurrence of a non-deterministic event is called its probability.

Random Experiment: An experiment whose outcomes are determined

only by chance factors is called a random experiment.

Sample Space: The set of all possible outcomes of a random experiment

is called a sample space. A sample space may be finite or infinite.

A sample space is said to be a discrete sample space if it has finitely many or a

countable infinity of elements. If the elements (points) of a sample space

constitute a continuum (for example, all the points on a line, or all the points in a plane), the sample space is said to be a continuous sample space.

Event: The collection of none, one, or more than one outcomes from a

sample space is called an event.

Random Variable: A variable whose numerical values are determined by

chance factors is called a random variable. Formally, it is a function from the

sample space to a set of real numbers.


Discrete Random Variable: If the set of all possible values of a random

variable X is countable, then X is called a discrete random variable.

Continuous Random Variable: If the set of all possible values of X is an

interval or union of two or more non overlapping intervals, then X is called a

continuous random variable

Probability of an Event: If all the outcomes of a random experiment are

equally likely, then the probability of an event A is given by

P(A) = (Number of outcomes in the event A) / (Total number of outcomes in the sample space)

The axioms of probability theory are:

A1. 0 ≤ P(A) ≤ 1 for every event A in the sample space S.

A2. P(S) = 1.

A3. If the events A_i are mutually exclusive, then P(∪ A_i) = ∑ P(A_i); this axiom is called the special addition rule. The general addition rule for any two events A and B in S is P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

Conditional probability: if A and B are any two events in S and P(B) ≠ 0, the conditional probability of A given B is

P(A|B) = [P(A ∩ B)/P(S)] / [P(B)/P(S)] = P(A ∩ B)/P(B).

1.2 Distribution functions

Probability Mass Function (pmf): Let R be the set of all possible values of a discrete r.v X, and let p(k) = P(X = k) for each k in R. Then p(k) is called the probability mass function of X. It satisfies

0 ≤ p(k) ≤ 1  and  ∑_{k ∈ R} p(k) = 1.


Probability Density Function (pdf): Any real valued function f(x) that

satisfies the following requirements is called a probability density function:


f(x) ≥ 0 for all x,   ∫_{−∞}^{∞} f(x) dx = 1,   and   P(a ≤ X ≤ b) = ∫_a^b f(x) dx.

For a continuous random variable, the probability of any single value is zero. For example, when x = a, P(X = a) = ∫_a^a f(x) dx = 0.

Exercise: Let f(x) = x for 0 ≤ x ≤ 1, f(x) = 2 − x for 1 ≤ x ≤ 2, and f(x) = 0 elsewhere. Show that f(x) is a pdf of the random variable X.

Cumulative Distribution Function (cdf): The cdf of a random variable X is defined by F(x) = P(X ≤ x) for all x.

For a continuous random variable X with pdf f(x),

F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt for all x.

For a discrete random variable X, the cdf is defined by

F(k) = P(X ≤ k) = ∑_{i ≤ k} P(X = i).

Joint probability distributions

If X and Y are discrete, f(x, y) is the joint pmf if
1. f(x, y) ≥ 0 for all x, y
2. ∑_x ∑_y f(x, y) = 1

If X and Y are continuous, f(x, y) is the joint pdf if
1. f(x, y) ≥ 0 for all x, y
2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1

Marginal probability distributions:

f(x) = ∑_y f(x, y) for discrete random variables, or f(x) = ∫_{−∞}^{∞} f(x, y) dy for continuous random variables.

f(y) = ∑_x f(x, y) for discrete random variables, or f(y) = ∫_{−∞}^{∞} f(x, y) dx for continuous random variables.

Exercise: Let f(x, y) = 𝑒 −𝑥−𝑦 for x> 0, 𝑦 > 0, then

i. Verify that the definition of joint pdf is satisfied.

ii. Find the marginal pdfs of X and Y.

1.3 Expectations

If X is a continuous random variable with pdf f(x), then the expectation of g(X), where g is a real-valued function, is defined by

E(g(X)) = ∫_{−∞}^{∞} g(x) f(x) dx.

If X is a discrete r.v, then

E(g(X)) = ∑_k g(k) P(X = k),

where the sum is over all possible values k of X.

1.4 Moments

The moments are a set of constants that represent some important

properties of the distributions. The most commonly used such constants are

measures of central tendency (mean, median, and mode), and measures of

dispersion (variance and mean deviation). Two other important measures are

the coefficient of skewness and the coefficient of kurtosis.

Moments about the Origin (Raw Moments): The moments about the origin are obtained by taking the expected value of the r.v raised to the power k, k = 1, 2, …. That is,

μ'_k = E(X^k) = ∫_{−∞}^{∞} x^k f(x) dx

is called the kth moment about the origin.


Moments about the Mean (Central Moments): When the r.v is observed in

terms of deviations from its mean, its expectation yields moments about the

mean or central moments. The first central moment is zero, and the second

central moment is the variance. The third central moment measures the degree

of skewness of the distribution, and the fourth central moment measures the

degree of flatness.

The kth moment about the mean, or the kth central moment, of a random variable X is defined by

μ_k = E[(X − μ)^k],  k = 1, 2, …,

where μ = E(X) is the mean of X.

1.5 Moment and Probability Generating Functions

The moment generating function of a random variable X is defined by

M_X(t) = E(e^{tX}),

provided the expectation exists for −h ≤ t ≤ h for some h > 0.

The moment generating function is useful in deriving the moments of X. Specifically,

E(X^k) = d^k M_X(t)/dt^k |_{t=0},  k = 1, 2, …

Example: let X be a r.v with pdf

f(x) = e^{−x} for x ≥ 0, and f(x) = 0 for x < 0.

i. Find the moment generating function of X.

ii. Calculate the first and second moments of X about the origin.

Solution

i. M_X(t) = E(e^{tX}) = ∫_0^∞ e^{tx} e^{−x} dx = ∫_0^∞ e^{(t−1)x} dx = [e^{(t−1)x}/(t − 1)]_0^∞ = 1/(1 − t), valid for t < 1.

ii. Using the quotient rule (if h(t) = f(t)/g(t), then h'(t) = [f'(t)g(t) − g'(t)f(t)]/(g(t))^2), or simply writing M_X(t) = (1 − t)^{−1}:

M'_X(t) = (1 − t)^{−2}, so the first raw moment is E(X) = M'_X(0) = 1.

M''_X(t) = 2(1 − t)^{−3}, so the second raw moment is E(X^2) = M''_X(0) = 2.

Probability Generating Function: The probability generating function of a non-

negative, integer valued r.v X is defined by

P(t) = E(t^X) = ∑_{i=0}^{∞} t^i P(X = i),

so that

P(X = k) = (1/k!) d^k P(t)/dt^k |_{t=0},  k = 0, 1, 2, …

Furthermore, P(0) = P(X = 0) and dP(t)/dt |_{t=1} = E(X).
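As an illustrative check (an addition to the notes, assuming Python with the sympy package is available), the worked example above can be verified by differentiating the MGF M_X(t) = 1/(1 − t) derived there and comparing with the direct integrals:

import sympy as sp

t = sp.Symbol('t')
x = sp.Symbol('x', positive=True)

# MGF obtained in the worked example for f(x) = e^(-x), x >= 0
M = 1 / (1 - t)

# Raw moments as derivatives of the MGF at t = 0
m1 = sp.diff(M, t, 1).subs(t, 0)   # E(X)   -> 1
m2 = sp.diff(M, t, 2).subs(t, 0)   # E(X^2) -> 2

# Cross-check against the definition E(X^k) = integral of x^k e^(-x) over (0, oo)
d1 = sp.integrate(x * sp.exp(-x), (x, 0, sp.oo))       # -> 1
d2 = sp.integrate(x**2 * sp.exp(-x), (x, 0, sp.oo))    # -> 2
print(m1, m2, d1, d2)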


Chapter Two

Common Univariate Distributions

2.1 Discrete Distributions

2.1.1 Bernoulli distribution

The Bernoulli distribution is used to model an experiment with only two

possible outcomes, often referred to as “success” and “failure”, usually encoded

as 1 and 0.

Definition: A discrete r.v X has a Bernoulli distribution with parameter p,

where 0 ≤ p ≤ 1, if its pmf is given by

p(1) = P(X = 1) = p and p(0) = P(X = 0) = 1− p.

We denote this distribution by Ber(p).

If X ~ Ber(p), then

μ = E(X) = ∑_{i=0}^{1} i P(X = i) = 0 × P(X = 0) + 1 × P(X = 1) = 0 + p = p,

σ^2 = Var(X) = E(X − μ)^2 = ∑_{i=0}^{1} (i − μ)^2 P(X = i)
   = (0 − p)^2 (1 − p) + (1 − p)^2 p
   = p(1 − p)(p + 1 − p)
   = p(1 − p).
2.1.2 Binomial Distribution

Let X1, …., Xn be mutually independent Bernoulli trials, each with success

probability p. Then

Y = ∑𝑛𝑖=1 𝑥𝑖 is a binomial r.v, denoted Y ~ Binomial (n, p)


Definition: A discrete r.v Y has a binomial distribution with parameters n and

p, where n = 1, 2,… and 0 ≤ p ≤ 1, if its pmf is given by

P(k) = P(Y = k) = C(n, k) p^k q^{n−k} for k = 0, 1, 2, …, n, where q = 1 − p.

Assumption of Binomial distribution

 There are only two possible outcomes for each trial success and failure.

 The probability of success is the same for each trial.

 The outcomes from different trials are independent.

 The number of trials (n) is fixed.

The mean and variance of Y are

E(Y) = np and V(Y) = np(1 − p) = npq.

Proof:

E(Y) = E(∑_{i=1}^{n} X_i) = ∑_{i=1}^{n} E(X_i) = ∑_{i=1}^{n} p = np, since each X_i is a Bernoulli(p) random variable.

V(Y) = V(∑_{i=1}^{n} X_i) = ∑_{i=1}^{n} V(X_i), because the X_i are independent,

   = ∑_{i=1}^{n} p(1 − p) = np(1 − p).

Theorem: If a r.v X has a binomial distribution, then its moment and probability generating functions are

M_X(t) = (pe^t + q)^n and P_X(t) = (pt + q)^n.

Proof:

M_X(t) = E(e^{tX}) = ∑_{x=0}^{n} e^{tx} C(n, x) p^x q^{n−x} = ∑_{x=0}^{n} C(n, x) (pe^t)^x q^{n−x} = (pe^t + q)^n,

because for any a > 0, b > 0 and non-negative integer n, ∑_{x=0}^{n} C(n, x) a^x b^{n−x} = (a + b)^n (the binomial theorem).

Now M'_X(t) = npe^t (pe^t + q)^{n−1}, so E(X) = M'_X(0) = npe^0 (pe^0 + q)^{n−1} = np.

M''_X(t) = n(n − 1)p^2 e^{2t} (pe^t + q)^{n−2} + npe^t (pe^t + q)^{n−1},

so M''_X(0) = n(n − 1)p^2 + np, and

V(X) = E(X^2) − (E(X))^2 = M''_X(0) − (M'_X(0))^2 = n(n − 1)p^2 + np − n^2 p^2 = npq.

Similarly, P_X(t) = E(t^X) = ∑_{x=0}^{n} t^x C(n, x) p^x q^{n−x} = ∑_{x=0}^{n} C(n, x) (pt)^x q^{n−x} = (pt + q)^n.

Exercise: Using the pgf of the binomial distribution, calculate

i. the probability of X at x = 1 and x = 2;

ii. the mean of X.


Properties

Let X1, X2, …, Xm be independent random variables with X_i ~ binomial(n_i, p), i = 1, 2, …, m. Then ∑_{i=1}^{m} X_i ~ binomial(∑ n_i, p).

Remark: The probability of X = x for Bin(n, p) can be obtained from a cumulative binomial table as

f(x; n, p) = F(x; n, p) − F(x − 1; n, p), where F(−1) = 0,

because the tables give cumulative probabilities rather than the values of f(x; n, p).
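A quick illustration of this remark (an added sketch, assuming Python with scipy is available; the values n = 10, p = 0.3 and x = 4 are arbitrary choices):

from scipy.stats import binom

n, p, x = 10, 0.3, 4
# f(x; n, p) recovered from cumulative probabilities, as in the remark
from_cdf = binom.cdf(x, n, p) - binom.cdf(x - 1, n, p)
direct = binom.pmf(x, n, p)
print(from_cdf, direct)   # both approx. 0.2001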

2.1.3 Discrete Uniform Distribution

The probability mass function of a discrete uniform random variable X is given by

f(x; N) = 1/N for x = 1, 2, …, N,  or equivalently  f(x; N) = (1/N) I_{{1,2,…,N}}(x).

This distribution is used to model experimental outcomes that are equally likely.

Theorem 2: If X has a discrete uniform distribution, then

E(X) = (N + 1)/2,  V(X) = (N^2 − 1)/12,  and  m_X(t) = (1/N) ∑_{x=1}^{N} e^{xt} = e^t (1 − e^{Nt})/[N(1 − e^t)].

Proof:

Remark: ∑_{i=1}^{k} i = k(k + 1)/2 and ∑_{i=1}^{k} i^2 = k(k + 1)(2k + 1)/6.

E(X) = ∑_{x=1}^{N} x f(x) = ∑_{x=1}^{N} x (1/N) = (1/N) ∑_{x=1}^{N} x = (1/N) · N(N + 1)/2 = (N + 1)/2.

V(X) = E(X^2) − (E(X))^2 = (1/N) ∑_{x=1}^{N} x^2 − ((N + 1)/2)^2 = (1/N) · N(N + 1)(2N + 1)/6 − ((N + 1)/2)^2 = (N + 1)(N − 1)/12.

2.1.4 Geometric Distributions


Consider a sequence of independent Bernoulli trials with success
probability p. The distribution of the random variable that represents the
number of failures until the first success is called geometric distribution.
Let us consider the number of menstrual cycles it takes for a woman to become pregnant, measured from the moment she decided to become pregnant. We

model the number of cycles up to pregnancy by a random variable X. Assume

that the probability that a woman becomes pregnant during a particular cycle is

equal to p, for 0 < p ≤ 1, independent of the previous cycles.

P(X = k) = P(no pregnancy in the first k − 1 cycles, pregnancy in the kth) = (1 − p)^{k−1} p for k = 1, 2, …

Equivalently, if X counts the number of failures before the first success, P(X = k) = (1 − p)^k p for k = 0, 1, 2, …

This r.v X is an example of a r.v with a geometric distribution with parameter p.

Assumption of geometric distribution

 There are only two possible outcomes for each trial success and failure.

 The probability of success is the same for each trial.

 The outcomes from different trials are independent.

 The number of trials is not fixed.

A random variable X that has a geometric distribution is often referred to as

a discrete waiting-time random variable. It represents how long (in terms of

the number of failures) one has to wait for a success.
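The waiting-time interpretation is easy to see by simulation. The sketch below is an addition (it assumes Python with numpy; p = 0.25 is an arbitrary choice). Note that numpy's geometric sampler counts the number of trials up to and including the first success, so subtracting one gives the failure-counting form used in the theorem that follows.

import numpy as np

rng = np.random.default_rng(0)
p, q = 0.25, 0.75
# Number of trials up to and including the first success: P = q^(k-1) p
trials = rng.geometric(p, size=200_000)
failures = trials - 1              # failure-counting form used in these notes
print(failures.mean(), q / p)      # sample mean vs. E(X) = q/p = 3
print(failures.var(), q / p**2)    # sample variance vs. V(X) = q/p^2 = 12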


Theorem: If the random variable X has a geometric distribution (counting failures, with q = 1 − p), then

E(X) = q/p,   V(X) = q/p^2,   m_X(t) = p/(1 − qe^t) for qe^t < 1.

Proof: E(X) = ∑_{x=0}^{∞} x p q^x = pq ∑_{x=1}^{∞} x q^{x−1} = pq d/dq (∑_{x=0}^{∞} q^x) = pq d/dq (1/(1 − q)) = pq/(1 − q)^2 = q/p.

E(X^2) = ∑_{x=0}^{∞} x^2 p q^x = pq^2 ∑_{x=0}^{∞} x(x − 1) q^{x−2} + pq ∑_{x=0}^{∞} x q^{x−1}
   = pq^2 d^2/dq^2 (1/(1 − q)) + q/p = pq^2 · 2/(1 − q)^3 + q/p = 2q^2/p^2 + q/p = (2q^2 + pq)/p^2.

V(X) = E(X^2) − (E(X))^2 = (2q^2 + pq)/p^2 − q^2/p^2 = (q^2 + pq)/p^2 = q/p^2.

m_X(t) = ∑_{x=0}^{∞} e^{tx} p q^x = p ∑_{x=0}^{∞} (qe^t)^x = p/(1 − qe^t).

Theorem: If the random variable X has a geometric distribution with parameter p, then

P[X ≥ s + t | X ≥ t] = P[X ≥ s],  s, t = 0, 1, 2, …

Proof: P[X ≥ s + t | X ≥ t] = P[{X ≥ s + t} ∩ {X ≥ t}]/P[X ≥ t] = P(X ≥ s + t)/P(X ≥ t) = (∑_{x=s+t}^{∞} pq^x)/(∑_{x=t}^{∞} pq^x) = q^{s+t}/q^t = q^s = P[X ≥ s].


2. 1. 5 Negative Binomial Distribution

The distribution of the random variable that represents the number of

failures until the first success is called geometric distribution. Now let us

consider the r.v X that denotes the number of failures until the rth success.

If the r.v X has a negative binomial or pascal distribution with parameter r and

p then its pmf is defined as:

P(X = k | r, p) = P(observing k failures in the first k + r − 1 trials) × P(observing a success at the (k + r)th trial)

   = C(k + r − 1, r − 1) p^{r−1} q^k × p.

Thus,

P(X = k | r, p) = C(k + r − 1, r − 1) p^r q^k = C(k + r − 1, k) p^r q^k = C(−r, k) p^r (−q)^k for k = 0, 1, 2, …

A r.v X having a negative binomial distribution is often referred to as a discrete

waiting-time r.v. It represents how long (in terms of the number of failures) one

waits for the rth success.

Theorem: If the random variable X has a negative binomial distribution, then

E(X) = rq/p,   V(X) = rq/p^2,   m_X(t) = [p/(1 − qe^t)]^r.

Proof: m_X(t) = E(e^{tX}) = ∑_{x=0}^{∞} e^{tx} C(−r, x) p^r (−q)^x = ∑_{x=0}^{∞} C(−r, x) p^r (−qe^t)^x = [p/(1 − qe^t)]^r.


2.1.6 Poisson Distribution

Suppose that events that occur over a period of time or space satisfy the

following:

1. The numbers of events occurring in disjoint intervals of time are independent.

2. The probability that exactly one event occurs in a small interval of time of length Δ is λΔ, where λ > 0.

3. It is very unlikely that two or more events occur in a sufficiently small interval of time.

4. The probability of observing a certain number of events in a time interval of length Δ depends only on the length Δ and not on the beginning of the interval.

The probability mass function of a Poisson distribution with mean λ is given by

f(x; λ) = λ^x e^{−λ}/x!,  x = 0, 1, 2, … and λ > 0.

Theorem: If X is a random variable with a Poisson distribution, then

E(X) = λ,  V(X) = λ,  m_X(t) = e^{λ(e^t − 1)},  P_X(t) = e^{λ(t − 1)}.

Proof: m_X(t) = E(e^{tX}) = ∑_{x=0}^{∞} e^{tx} e^{−λ} λ^x/x! = e^{−λ} ∑_{x=0}^{∞} (λe^t)^x/x!.

The infinite series ∑_{x=0}^{∞} u^x/x! is the Maclaurin series of e^u, so

m_X(t) = e^{−λ} e^{λe^t} = e^{λ(e^t − 1)}.

m'_X(t) = λe^t e^{λ(e^t − 1)}, so E(X) = m'_X(0) = λ.

m''_X(t) = λe^t e^{λ(e^t − 1)} (λe^t + 1), so m''_X(0) = λ(λ + 1), and V(X) = m''_X(0) − (m'_X(0))^2 = λ.

P_X(t) = E(t^X) = ∑_{x=0}^{∞} t^x e^{−λ} λ^x/x! = e^{−λ} ∑_{x=0}^{∞} (λt)^x/x! = e^{−λ} e^{λt} = e^{λ(t − 1)}.

The Poisson distribution can also be obtained as a limiting distribution of the binomial, in which n → ∞ and p → 0 so that np remains constant. In other words, for large n and small p, the binomial distribution can be approximated by the Poisson distribution with mean λ = np.

To verify that f(x; λ) is a proper pmf, note that

∑_{x=0}^{∞} f(x) = ∑_{x=0}^{∞} e^{−λ} λ^x/x! = e^{−λ} ∑_{x=0}^{∞} λ^x/x! = e^{−λ} e^{λ} = 1.

Let us now show that when n → ∞ and p → 0, while np = λ remains constant, the limiting form of the binomial distribution is λ^x e^{−λ}/x! for x = 0, 1, 2, …

First substitute λ/n for p in the binomial pmf and simplify:

f(x; n, p) = [n!/(x!(n − x)!)] (λ/n)^x (1 − λ/n)^{n−x}
   = [n(n − 1)⋯(n − x + 1)/n^x] (λ^x/x!) (1 − λ/n)^{n−x}
   = (1 − 1/n)(1 − 2/n)⋯(1 − (x − 1)/n) (λ^x/x!) (1 − λ/n)^{n−x}.

If we let n → ∞, then (1 − 1/n)(1 − 2/n)⋯(1 − (x − 1)/n) → 1 and

(1 − λ/n)^{n−x} = [(1 − λ/n)^{n/λ}]^{λ} (1 − λ/n)^{−x} → e^{−λ}.

Hence the binomial pmf f(x; n, p) approaches λ^x e^{−λ}/x! for x = 0, 1, 2, …

We obtain Poisson probabilities from a cumulative table by f(x; λ) = F(x; λ) − F(x − 1; λ).

Remark

1. Let X1, X2, …, Xn be independent observations from Poisson populations with E(X_i) = λ_i, i = 1, 2, …, n. Then ∑_{i=1}^{n} X_i ~ Poisson(∑_{i=1}^{n} λ_i); for constant λ_i = λ, ∑_{i=1}^{n} X_i ~ Poisson(nλ).

2. Recurrence relations:

P(X = k | λ) = (λ/k) P(X = k − 1),  k = 1, 2, …
P(X = k + 1 | λ) = (λ/(k + 1)) P(X = k),  k = 0, 1, 2, …
P(X = k − 1 | λ) = (k/λ) P(X = k),  k = 1, 2, 3, …

3. Relation to other distributions:

i. Binomial: Let X1 and X2 be independent Poisson r.vs with means λ1 and λ2, respectively. Then, conditionally, X1 | (X1 + X2 = n) ~ binomial(n, λ1/(λ1 + λ2)).

ii. Multinomial: If X1, X2, …, Xm are independent Poisson(λ) random variables, then the conditional distribution of (X1, X2, …, Xm) given X1 + X2 + ⋯ + Xm = n is multinomial with n trials and cell probabilities p1 = p2 = ⋯ = pm = 1/m.

iii. Gamma: Let X be a Poisson(λ) random variable. Then P(X ≤ k) = P(Y ≥ λ), where Y is a gamma(k + 1, 1) random variable.

4. Approximation:

Normal: P(X ≤ k | λ) ≅ P(Z ≤ (k − λ + 0.5)/√λ) and P(X ≥ k | λ) ≅ P(Z ≥ (k − λ − 0.5)/√λ), where X is a Poisson(λ) r.v and Z is the standard normal random variable.
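The binomial-to-Poisson limit in particular is easy to see numerically. The sketch below is an addition (it assumes Python with scipy; n = 1000 and p = 0.003 are arbitrary choices, giving λ = np = 3):

from scipy.stats import binom, poisson

n, p = 1000, 0.003      # large n, small p
lam = n * p             # lambda = np = 3
for k in range(6):
    # binomial pmf vs. its Poisson approximation
    print(k, binom.pmf(k, n, p), poisson.pmf(k, lam))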

2.1.7 Hyper-geometric Distribution

Suppose a population contains M successes and N − M failures. The probability of exactly x successes in a random sample of size n is

f(x; n, M, N) = C(M, x) C(N − M, n − x)/C(N, n) for x = 0, 1, 2, …, n,

where N is a positive integer, M is a non-negative integer with M ≤ N, and n is a positive integer with n ≤ N.

Hence f(x; n, M, N) defines the hypergeometric probability mass function.

In practice, a hypergeometric distribution occurs as follows. Let an urn contain N balls, of which M are white and N − M are black. Of these, n balls are chosen without replacement, so each of the C(N, n) possible samples of n balls has probability 1/C(N, n). Let X be a r.v that denotes the number of white balls drawn. The event {X = x} contains C(M, x) C(N − M, n − x) sample points, so


P(X = x) = C(M, x) C(N − M, n − x)/C(N, n) for x = 0, 1, 2, …, n.

We have hyper-geometric distribution, under the following conditions.

i. A single trial results in one of the two possible outcomes, say A and A’

ii. P(A) and hence P(A’) vary in repeated trials.

iii. Repeated trials are dependent.

iv. A fixed number of trials (n) are to be performed.

Note that the binomial distribution would apply if we do sampling with

replacement.

Example: A shipment of 20 digital voice recorders contains 5 that are defective.

If 10 of them are randomly chosen for inspection, what is the probability that 2

of the 10 will be defective?

i. When the sample is drawn without replacement

ii. When the sample is drawn with replacement


Solution: i) f(2; 10, 5, 20) = C(5, 2) C(15, 8)/C(20, 10) = 0.348

ii) If the sample is drawn with replacement, each selected recorder is defective with the same probability p = 5/20 = 0.25.

Thus, f(2; 10, 0.25) = C(10, 2) (0.25)^2 (0.75)^8 ≈ 0.282
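Both parts of this example can be checked with scipy (an added sketch; it assumes scipy is available and uses its hypergeom argument order of total size 20, number of successes 5, sample size 10):

from scipy.stats import hypergeom, binom

# i) sampling without replacement
print(hypergeom.pmf(2, 20, 5, 10))   # approx. 0.348
# ii) sampling with replacement, p = 5/20
print(binom.pmf(2, 10, 0.25))        # approx. 0.282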

Theorem: If X has a hypergeometric distribution, then

E(X) = n M/N  and  V(X) = n (M/N)(1 − M/N)(N − n)/(N − 1).


Proof:

E(X) = ∑_{x=0}^{n} x C(M, x) C(N − M, n − x)/C(N, n)
   = n(M/N) ∑_{x=1}^{n} C(M − 1, x − 1) C(N − M, n − x)/C(N − 1, n − 1)
   = n M/N,

using the identity ∑_{i=0}^{m} C(a, i) C(b, m − i) = C(a + b, m).

E(X(X − 1)) = ∑_{x=0}^{n} x(x − 1) C(M, x) C(N − M, n − x)/C(N, n)
   = [n(n − 1)M(M − 1)/(N(N − 1))] ∑_{x=2}^{n} C(M − 2, x − 2) C(N − M, n − x)/C(N − 2, n − 2)
   = n(n − 1) M(M − 1)/(N(N − 1)).

V(X) = E(X^2) − (E(X))^2 = E(X(X − 1)) + E(X) − (E(X))^2
   = n(n − 1) M(M − 1)/(N(N − 1)) + n M/N − (n M/N)^2
   = n (M/N)(N − M)(N − n)/(N(N − 1))
   = n (M/N)(1 − M/N)(N − n)/(N − 1).

Remark: Let X and Y be independent binomial random variables with common success probability p and numbers of trials m and n, respectively. Then

P(X = k | X + Y = s) = P(X = k) P(Y = s − k)/P(X + Y = s),

which simplifies to

P(X = k | X + Y = s) = C(m, k) C(n, s − k)/C(m + n, s).

Thus, the conditional distribution of X given X + Y = s is hypergeometric(s, m, m + n).

Approximation:

i. Let p = M/N. Then, for large N and M,

P(X = k) ≅ C(n, k) p^k q^{n−k}.

ii. Let M/N be small and n large such that n(M/N) = λ. Then

P(X = k) ≅ e^{−λ} λ^k/k!.

2.1.8 The Multinomial Distribution

An immediate generalization of the binomial distribution arises when each trial

can have more than two possible outcomes. This happens, for example, when a

manufactured product is classified as superior, average, or poor, when a

student’s performance is graded as A, B,C, D and F. To treat this kind of

problem in general, let us consider the case where there are n independent

trials, with each trial permitting k mutually exclusive outcomes whose

respective probabilities are 𝑝1 , 𝑝2 , 𝑝3 , … , 𝑝𝑘 (𝑤𝑖𝑡ℎ ∑ 𝑝𝑖 = 1).

Referring to the outcomes as being of the first kind, the second kind, …, and

the k-th kind, we shall be interested in the probability 𝑓(𝑥1 , 𝑥2 , … , 𝑥𝑘 ) of getting x1

outcomes of the first kind, x2 outcomes of the second kind, …, and xk outcomes

of the kth kind, with ∑𝑘𝑖=1 𝑥𝑖 = 𝑛.

The desired probability is given by

f(x1, x2, …, xk) = [n!/(x1! x2! ⋯ xk!)] p1^{x1} p2^{x2} ⋯ pk^{xk} for xi = 0, 1, …, n,

with the xi subject to the restriction ∑_{i=1}^{k} xi = n. The joint probability distribution whose values are given by these probabilities is called the multinomial distribution.

Example: The probabilities that a light bulb of a certain kind of slide projector will last fewer than 40 hours of continuous use, anywhere from 40 to 80 hours of continuous use, or more than 80 hours of continuous use are 0.3, 0.5 and 0.2, respectively. Find the probability that, among eight such bulbs, 2 will last fewer than 40 hours, 5 will last anywhere from 40 to 80 hours, and 1 will last more than 80 hours.

Solution: f(2, 5, 1) = [8!/(2! 5! 1!)] (0.3)^2 (0.5)^5 (0.2) = 0.0945
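A quick numerical check of this example (an added sketch, assuming Python with scipy is available):

from scipy.stats import multinomial

# Cell probabilities for (< 40 h, 40-80 h, > 80 h) and counts (2, 5, 1) among n = 8 bulbs
print(multinomial.pmf([2, 5, 1], n=8, p=[0.3, 0.5, 0.2]))   # approx. 0.0945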

2.2. Important Univariate Continuous Distributions

2.2.1 Uniform or Rectangular Distribution

Definition: A random variable X is said to have a uniform distribution on the

interval [a, b] −∞ < 𝑎 < 𝑏 < ∞ if its pdf is given by


f_X(x) = 1/(b − a) for a < x < b, and 0 otherwise.

If X ~ U[a, b], then the distribution function is

F(x) = 0 for x < a,  F(x) = (x − a)/(b − a) for a ≤ x < b,  and  F(x) = 1 for x ≥ b.

Theorem: If the random variable X has a uniform distribution on [a, b], then

E(X) = (a + b)/2,  Var(X) = (b − a)^2/12,  M_X(t) = (e^{bt} − e^{at})/((b − a)t) for t ≠ 0.

Proof: E(X) = ∫_a^b x/(b − a) dx = (b^2 − a^2)/(2(b − a)) = (a + b)/2.

E(X^2) = ∫_a^b x^2/(b − a) dx = (b^3 − a^3)/(3(b − a)) = (a^2 + ab + b^2)/3.

V(X) = E(X^2) − (E(X))^2 = (a^2 + ab + b^2)/3 − ((a + b)/2)^2 = (b − a)^2/12.

M_X(t) = E(e^{tX}) = ∫_a^b e^{tx}/(b − a) dx = (e^{bt} − e^{at})/((b − a)t).
//


2.2.2 Normal Distribution

The probability density function of a normal random variable X with mean 𝜇

and standard deviation 𝜎 is given by

f(x; μ, σ) = (1/(σ√(2π))) exp(−(x − μ)^2/(2σ^2)) for −∞ < x < ∞, −∞ < μ < ∞ and σ > 0.

This distribution is commonly denoted by N(μ, σ^2). The mean μ is the location parameter, and the standard deviation σ is the scale parameter.

The normal random variable with mean μ = 0 and standard deviation σ = 1 is called the standard normal random variable, with pdf

f(z) = (1/√(2π)) exp(−z^2/2) for −∞ < z < ∞,

and its cdf is denoted by Φ(z).

If X is a normal random variable with mean μ and standard deviation σ, then

P(X ≤ x) = P(Z ≤ (x − μ)/σ) = (1/√(2π)) ∫_{−∞}^{(x−μ)/σ} exp(−t^2/2) dt = Φ((x − μ)/σ).

Theorem: If the random variable X has a normal distribution, then

E(X) = μ,  Var(X) = σ^2,  M_X(t) = e^{μt + σ^2 t^2/2}.

Proof:

m_X(t) = E(e^{tX}) = e^{tμ} E(e^{t(X−μ)}) = e^{tμ} ∫_{−∞}^{∞} (1/(σ√(2π))) e^{t(x−μ)} e^{−(1/2)((x−μ)/σ)^2} dx

   = e^{tμ} (1/(σ√(2π))) ∫_{−∞}^{∞} e^{−(1/(2σ^2))[(x−μ)^2 − 2σ^2 t(x−μ)]} dx.

If we complete the square inside the bracket, it becomes

(x − μ)^2 − 2σ^2 t(x − μ) = (x − μ)^2 − 2σ^2 t(x − μ) + σ^4 t^2 − σ^4 t^2 = (x − μ − σ^2 t)^2 − σ^4 t^2


and we have

m_X(t) = e^{tμ} e^{σ^2 t^2/2} (1/(σ√(2π))) ∫_{−∞}^{∞} e^{−(1/(2σ^2))(x − μ − σ^2 t)^2} dx.

The integral together with the factor 1/(σ√(2π)) is necessarily 1, since it is the total area under a normal density with mean μ + σ^2 t and variance σ^2.

Hence m_X(t) = e^{tμ} e^{σ^2 t^2/2} = e^{μt + σ^2 t^2/2}, and

E(X) = m'_X(0) = μ,  V(X) = E(X^2) − (E(X))^2 = m''_X(0) − μ^2 = σ^2.
Example: Suppose that an instructor assumes that a student's final score is the value of a normally distributed random variable. The instructor decides to award a grade of A to those students whose score exceeds μ + σ, a B to those whose score falls between μ and μ + σ, a C if the score falls between μ − σ and μ, a D if the score falls between μ − 2σ and μ − σ, and an F if the score falls below μ − 2σ. The proportion of each grade given can then be calculated. For example, since

P[X > μ + σ] = 1 − P[X ≤ μ + σ] = 1 − Φ((μ + σ − μ)/σ) = 1 − Φ(1) ≈ 0.1587,

one would expect 15.87 percent of the students to receive A's.
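The remaining grade proportions follow from the same standardization. The sketch below is an addition (it assumes Python with scipy) and computes all five proportions from the standard normal cdf Φ:

from scipy.stats import norm

print(1 - norm.cdf(1))               # A: P(X > mu + sigma)                  approx. 0.1587
print(norm.cdf(1) - norm.cdf(0))     # B: P(mu < X < mu + sigma)             approx. 0.3413
print(norm.cdf(0) - norm.cdf(-1))    # C: P(mu - sigma < X < mu)             approx. 0.3413
print(norm.cdf(-1) - norm.cdf(-2))   # D: P(mu - 2*sigma < X < mu - sigma)   approx. 0.1359
print(norm.cdf(-2))                  # F: P(X < mu - 2*sigma)                approx. 0.0228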

2.2.3 Exponential Distributions

A classical situation in which an exponential distribution arises is as follows:

Consider a Poisson process with rate λ, where we count the events occurring

in a given interval of time or space. Let X denote the waiting time until the first

event to occur. Then, for a given x > 0,


P (X > x) = P (no event in (0,x))

= exp(-x𝜆)

and hence

P(X≤ 𝑥) = 1 – exp(-x𝜆) (*)

The distribution in (*) is called the exponential distribution with mean waiting

time b = 1/𝜆. The probability density function is given by

f(x | b) = (1/b) exp(−x/b), x > 0, b > 0,  or equivalently  f(x | λ) = λ exp(−λx), x > 0, λ > 0.

Remark: Before we calculate the mean and variance of the exponential and gamma distributions, recall:

Γ(t) = ∫_0^∞ x^{t−1} e^{−x} dx   (**)

Γ(t) = (t − 1) Γ(t − 1)   (***)

so when t is a positive integer, Γ(t) = (t − 1)!.

Also ∫_0^∞ x^{t−1} e^{−λx} dx = Γ(t)/λ^t   (****)

and Γ(1/2) = ∫_0^∞ x^{−1/2} e^{−x} dx = √π.   (*****)

Thus, the mean and variance of the exponential distribution are:

E(X) = λ ∫_0^∞ x e^{−λx} dx = λ Γ(2)/λ^2 = 1/λ,

E(X^2) = λ ∫_0^∞ x^2 e^{−λx} dx = λ Γ(3)/λ^3 = 2/λ^2,

V(X) = E(X^2) − (E(X))^2 = 2/λ^2 − 1/λ^2 = 1/λ^2.

Theorem: If a random variable X has an exponential distribution, then

M_X(t) = 1/(1 − bt) = λ/(λ − t) for t < λ.

Proof: M_X(t) = E(e^{tX}) = ∫_0^∞ e^{tx} λe^{−λx} dx = λ ∫_0^∞ e^{−x(λ−t)} dx = λ Γ(1)/(λ − t) = λ/(λ − t).

Properties:

1. Memoryless property: For given t > 0 and s > 0, P(X > s + t | X > s) = P(X > t), where X is an exponential random variable:

P(X > s + t | X > s) = P(X > s + t)/P(X > s) = (λ ∫_{s+t}^∞ e^{−λx} dx)/(λ ∫_s^∞ e^{−λx} dx) = e^{−λ(s+t)}/e^{−λs} = e^{−λt} = P(X > t).

2. Let X1, X2, …, Xn be independent exponential random variables, each with mean b. Then

∑_{i=1}^{n} X_i has a gamma distribution with shape parameter n and rate λ = 1/b (scale b).
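Property 2 can be illustrated by simulation. The sketch below is an addition (it assumes Python with numpy; λ = 2 and n = 5 are arbitrary choices): the simulated sums should match the gamma mean r/λ and variance r/λ^2.

import numpy as np

rng = np.random.default_rng(1)
lam, n = 2.0, 5
# Sum of n independent exponential waiting times with mean 1/lambda
s = rng.exponential(scale=1/lam, size=(200_000, n)).sum(axis=1)
print(s.mean(), n / lam)        # gamma mean     r/lambda   = 2.5
print(s.var(), n / lam**2)      # gamma variance r/lambda^2 = 1.25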

2.2.4 Gamma Distributions

The gamma distribution can be viewed as a generalization of the exponential

distribution with mean 1/λ, λ > 0. An exponential random variable with mean

1/𝜆 represents the waiting time until the first event to occur, while the gamma

random variable X represents the waiting time until the 𝑟th event to occur.

Therefore,

X = ∑𝑟𝑖=1 𝑌𝑖

Where Y1, Y2, …, Y𝑟 are independent exponential random variables with mean

1/𝜆

The probability density function of X is given by

f(x) = λ^r e^{−λx} x^{r−1}/Γ(r),  x > 0, r > 0, λ > 0.


The distribution defined above is called the gamma distribution with shape

parameter 𝑟 and the scale parameter𝜆.

The gamma probability density plots in Figure 1 indicate that the degree of

asymmetry of the gamma distribution diminishes as r increases. For large r, (λX − r)/√r is approximately distributed as a standard normal random variable.

Figure 1. Graphs of some gamma pdfs (λ = 1) for r = 1, 2 and 4.

Theorem: If the random variable X has a gamma distribution with parameters r and λ, then

E(X) = r/λ,  V(X) = r/λ^2,  m_X(t) = (λ/(λ − t))^r for t < λ.

Proof:

i. E(X) = (λ^r/Γ(r)) ∫_0^∞ x · x^{r−1} e^{−λx} dx = λ^r Γ(r + 1)/(Γ(r) λ^{r+1}) = r/λ.

E(X^2) = (λ^r/Γ(r)) ∫_0^∞ x^2 x^{r−1} e^{−λx} dx = λ^r Γ(r + 2)/(Γ(r) λ^{r+2}) = r(r + 1)/λ^2.

⟹ Var(X) = E(X^2) − (E(X))^2 = r(r + 1)/λ^2 − (r/λ)^2 = r/λ^2.

ii. m_X(t) = E(e^{tX}) = ∫_0^∞ e^{tx} (λ^r/Γ(r)) x^{r−1} e^{−λx} dx = (λ/(λ − t))^r ∫_0^∞ ((λ − t)^r/Γ(r)) x^{r−1} e^{−(λ−t)x} dx = (λ/(λ − t))^r.

m'_X(t) = rλ^r (λ − t)^{−r−1} and m''_X(t) = r(r + 1)λ^r (λ − t)^{−r−2};

hence E(X) = m'_X(0) = r/λ and V(X) = E(X^2) − (E(X))^2 = m''_X(0) − (r/λ)^2 = r/λ^2.

2.2.5 Beta Distributions

The probability density function of a beta random variable with shape

parameters a and b is given by

f(x; a, b) = (1/B(a, b)) x^{a−1} (1 − x)^{b−1} I_{(0,1)}(x), where a > 0 and b > 0,

and the beta function is B(a, b) = Γ(a)Γ(b)/Γ(a + b) = ∫_0^1 x^{a−1} (1 − x)^{b−1} dx.

A situation where the beta distribution arises is given below.

Consider a Poisson process with arrival rate of 𝜆 events per unit time. Let Wk denote the waiting

time until the kth arrival of an event and Ws denote the waiting time until the sth arrival, s > k.

Then Wk and Ws − Wk are independent gamma random variables with shape parameters k and s − k, respectively (both with rate λ).

The proportion of the time taken by the first k arrivals in the time needed for the first s arrivals is

Wk/Ws = Wk/(Wk + (Ws − Wk)) ~ beta(k, s − k).
Remark: The beta distribution reduces to the uniform distribution over (0, 1) if a=b=1.
 Let x1, x2, …, xn be a sample from a beta distribution with shape parameters a and b, and let

x̄ = (1/n) ∑_{i=1}^{n} X_i and s^2 = (1/(n − 1)) ∑_{i=1}^{n} (X_i − x̄)^2.

Then the method-of-moments estimates are

â = x̄ [x̄(1 − x̄)/s^2 − 1]  and  b̂ = (1 − x̄) [x̄(1 − x̄)/s^2 − 1] = â(1 − x̄)/x̄.
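These moment estimators are easy to try out by simulation. The sketch below is an addition (it assumes Python with numpy; the true values a = 2 and b = 5 are arbitrary choices):

import numpy as np

rng = np.random.default_rng(2)
a_true, b_true = 2.0, 5.0
x = rng.beta(a_true, b_true, size=50_000)

xbar, s2 = x.mean(), x.var(ddof=1)
common = xbar * (1 - xbar) / s2 - 1     # moment estimate of a + b
a_hat = xbar * common
b_hat = (1 - xbar) * common
print(a_hat, b_hat)                     # approx. 2 and 5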

Theorem: If the random variable X has a beta distribution with parameters a and b, then

E(X) = a/(a + b)  and  V(X) = ab/((a + b + 1)(a + b)^2).

Proof: E[X^k] = (1/B(a, b)) ∫_0^1 x^{k+a−1} (1 − x)^{b−1} dx = B(k + a, b)/B(a, b) = [Γ(k + a)Γ(b)/Γ(k + a + b)] · [Γ(a + b)/(Γ(a)Γ(b))] = Γ(k + a)Γ(a + b)/(Γ(a)Γ(k + a + b)).

Thus, E(X) = Γ(1 + a)Γ(a + b)/(Γ(a)Γ(1 + a + b)) = a/(a + b),

E(X^2) = Γ(2 + a)Γ(a + b)/(Γ(a)Γ(2 + a + b)) = (a + 1)a/((a + b + 1)(a + b)),

Var(X) = (a + 1)a/((a + b + 1)(a + b)) − (a/(a + b))^2 = ab/((a + b + 1)(a + b)^2).

2.2.6. Weibull Distribution

Let Y be a standard exponential random variable with probability density function f(y) = e^{−y}, y > 0. Define X = bY^{1/c} + m, with b > 0 and c > 0.


The distribution of X is known as the Weibull distribution with shape parameter c, scale

parameter b, and the location parameter m. Its probability density is given by

f(x | b, c, m) = (c/b)((x − m)/b)^{c−1} exp{−[(x − m)/b]^c},  x > m, b > 0, c > 0,

or, with m = 0,

f(x | b, c) = (c/b)(x/b)^{c−1} exp{−(x/b)^c},  x > 0, b > 0, c > 0.

Applications:

The Weibull distribution is one of the important distributions in reliability theory. It is widely

used to analyze the cumulative loss of performance of a complex system in systems engineering.

In general, it can be used to describe the data on waiting time until an event occurs. In this

manner, it is applied in risk analysis, actuarial science and engineering. Furthermore, the Weibull

distribution has applications in medical, biological, and earth sciences.

Theorem: If the random variable X has a Weibull distribution (with m = 0), then

E(X) = b Γ((c + 1)/c)  and  Var(X) = b^2 {Γ((c + 2)/c) − [Γ((c + 1)/c)]^2}.

Proof:

E(X) = ∫_0^∞ x (c/b)(x/b)^{c−1} exp{−(x/b)^c} dx. Let u = (x/b)^c, so du = (c/b^c) x^{c−1} dx and x = b u^{1/c}.

Thus, E(X) = ∫_0^∞ b u^{1/c} e^{−u} du = b ∫_0^∞ u^{((c+1)/c) − 1} e^{−u} du = b Γ((c + 1)/c).


Remark: Let X be a Weibull(b, c, m) random variable. Then ((X − m)/b)^c ~ exp(1), the exponential distribution with mean 1.
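The moment formulas of the theorem can be checked against scipy's Weibull implementation (an added sketch; it assumes scipy, and b = 2, c = 1.5, m = 0 are arbitrary choices):

from math import gamma
from scipy.stats import weibull_min

b, c, m = 2.0, 1.5, 0.0
X = weibull_min(c, loc=m, scale=b)
print(X.mean(), b * gamma((c + 1) / c))                                 # E(X)
print(X.var(), b**2 * (gamma((c + 2) / c) - gamma((c + 1) / c)**2))     # Var(X)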

2.2.7. Cauchy Distribution

The probability density function of a Cauchy distribution with the location parameter a and the

scale parameter b is given by


f(x | a, b) = 1/(πb[1 + ((x − a)/b)^2]),  −∞ < x < ∞, −∞ < a < ∞, b > 0.

Note that:

If the r.v X follows Cauchy distribution its mean and variance do not exist.

It can be postulated as a model for describing data that arise as n realizations of the ratio

of two normal random variables

If X and Y are independent standard normal random variables, then U = X/Y follows the

Cauchy distribution with probability density function


f(u) = 1/(π[1 + u^2]),  −∞ < u < ∞, that is, the standard Cauchy distribution with b = 1 and a = 0.

CHAPTER THREE

Common Multivariate Distributions

3.1. Multidimensional Random Variables

What is a random variable?


Random variable(r.v) is simply a function defined on a sample space S and taking values in real

line R = (−∞, ∞). In the study of many random experiments, there are, or can be, more than one

r.v of interest; hence we are compelled to extend our definitions of the distribution and density

function of one r.v to those of several r.vs. For example, during the “health awareness week” we

may consider a population of university students and record a randomly selected student's height (X1), weight (X2), age (X3) and blood pressure (X4). Each individual r.v may be assumed to follow some appropriate probability distribution in the population, but it may also be quite reasonable to assume that the four variables together follow a certain joint probability distribution.

3.2. Trinomial and Multinomial Distributions

Let X1, X2, …, Xk be k random variables all defined on the same probability space

i. The joint probability mass function (pmf) of x = (x1, x2, …, xk) is given by

P(x) = P(X1 = x1, X2 = x2, …, Xk = xk) for all xi, i = 1, 2, …, k,

such that

a. P(x) ≥ 0 for all x = (x1, x2, …, xk);
b. ∑_{all x} P(x) = 1.

ii. The joint probability density function (pdf) of x = (x1, x2, …, xk) is a function f(x) = f(x1, x2, …, xk) such that

a. f(x) ≥ 0 for all x = (x1, x2, …, xk);
b. ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} f(x) dx1 ⋯ dxk = 1.

iii. The joint cumulative distribution function of X1, X2, …, Xk, denoted by F_{X1,…,Xk}(x1, x2, …, xk), is defined as

P(X1 ≤ x1, X2 ≤ x2, …, Xk ≤ xk) = ∫_{−∞}^{x1} ⋯ ∫_{−∞}^{xk} f_{X1,…,Xk}(u1, u2, …, uk) du1 ⋯ duk for all (x1, x2, …, xk),

where f(x1, x2, …, xk) is the joint pdf.


iv. Marginal Cumulative distribution functions: If FX1, …, Xk(x1, x2, …, xk) is the joint

cumulative distribution function of X1, X2, …, Xk , then the cumulative distribution functions

F(X1), F(X2), …, F(Xk ) are called marginal cumulative distribution functions.

For example, let (X, Y) be two discrete or continuous random variables assuming all values in some region (range space). The joint probability function f satisfies the following conditions.

If X and Y are discrete, f(x, y) is the joint pmf if
 f(x, y) ≥ 0 for all x, y
 ∑_x ∑_y f(x, y) = 1

If X and Y are continuous, f(x, y) is the joint pdf if
 f(x, y) ≥ 0 for all x, y
 ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1

Marginal probability distributions:

f(x) = ∑_y f(x, y) for discrete random variables, or f(x) = ∫_{−∞}^{∞} f(x, y) dy for continuous random variables.

f(y) = ∑_x f(x, y) for discrete random variables, or f(y) = ∫_{−∞}^{∞} f(x, y) dx for continuous random variables.

Conditional Probability density function

Let X an Y be two random variables with joint probability f(x, y), then the conditional

probability of

𝑓(𝑥,𝑦)
a. Y given X = x is given by f(Y/ X = x) = 𝑓(𝑥)

𝑓(𝑥,𝑦)
b. X given Y = y is given by f(X/ Y = y) = 𝑓(𝑦)


Let X1, X2, …, Xk be k random variables, discrete or continuous with joint probability

function f(x1, x2, …, xk) and marginal probability function f1(x1), …, fk(xk) respectively. The

random variables X1, X2, …, Xk are said to be mutually independent if and only if f(x1, x2, …, xk) = f1(x1) · f2(x2) ⋯ fk(xk) for all (x1, x2, …, xk) within their range.

Example 1: Let X and Y be two random variables whose joint probability mass function is given in the table.

            y = -2    y = -1    y = 1     y = 2   |  p(x)
 x = -1     1/16      1/8       1/8       1/16    |  3/8
 x = 0      1/16      1/16      1/16      1/16    |  1/4
 x = 1      1/16      1/8       1/8       1/16    |  3/8
 p(y)       3/16      5/16      5/16      3/16    |

Find:

a. the marginal pmfs of X and Y;

b. F(x = 0, y = -1) and F(0, 1);

c. P(X | Y = 2) and P(Y | X = 0);

d. Are they independent r.vs? How? (A numerical check of (a) and (d) is sketched below.)
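The sketch below is an addition (it assumes Python with numpy): marginals are the row and column sums of the joint table, and independence would require p(x, y) = p(x)p(y) in every cell.

import numpy as np

# Joint pmf; rows: x = -1, 0, 1; columns: y = -2, -1, 1, 2
p = np.array([[1/16, 1/8, 1/8, 1/16],
              [1/16, 1/16, 1/16, 1/16],
              [1/16, 1/8, 1/8, 1/16]])

px = p.sum(axis=1)    # marginal pmf of X: 3/8, 1/4, 3/8
py = p.sum(axis=0)    # marginal pmf of Y: 3/16, 5/16, 5/16, 3/16
print(px, py)
# Independence check: p(x, y) = p(x) p(y) for every cell?
print(np.allclose(p, np.outer(px, py)))   # False, so X and Y are not independent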

Example 2: Let (X, Y) have the distribution defined by the density function

f(x, y) = e^{−x−y} for x > 0, y > 0, and 0 elsewhere.

Determine the following:

a. Marginal probability density function of x and Y

b. Joint distribution function

c. P(x + y ≤ 4)

d. Are X and Y independent variables?


e. P(X/Y=y) and P(Y/ X = x)

Solutions: a) f(x) = e^{−x} for x > 0, f(y) = e^{−y} for y > 0.

b) F_{X,Y}(x, y) = (1 − e^{−x})(1 − e^{−y}), x, y > 0.

c) 1 − 5e^{−4}

Example 3: The probability density function of a two dimensional random variables (X, Y) is

given by

f(x, y) = x + y for 0 < x + y < 1, and 0 otherwise.

i. Marginal pdf of X and Y

ii. Evaluate P(X < ½, y > ¼ )

Solution:

ii) (Sketch: the region of integration is bounded by the lines x = 1/2, y = 1/4 and x + y = 1.)

P(X < 1/2, Y > 1/4) = ∫_0^{1/2} ∫_{1/4}^{1−x} (x + y) dy dx = ∫_0^{1/2} [xy + y^2/2]_{y=1/4}^{y=1−x} dx

   = ∫_0^{1/2} [x(1 − x) + (1 − x)^2/2 − x/4 − 1/32] dx

   = ∫_0^{1/2} [15/32 − x^2/2 − x/4] dx = 35/192 //
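The double integral can be verified symbolically (an added sketch, assuming Python with sympy; it reproduces the integral exactly as set up in the solution):

import sympy as sp

x, y = sp.symbols('x y', positive=True)
prob = sp.integrate(sp.integrate(x + y, (y, sp.Rational(1, 4), 1 - x)),
                    (x, 0, sp.Rational(1, 2)))
print(prob)   # 35/192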

Conditional Expectation:

If (X, Y) is a two-dimensional random variable, we define the conditional expectations as follows:

E(X | Y) = ∫_{−∞}^{∞} x g(x | y) dx and E(Y | X) = ∫_{−∞}^{∞} y g(y | x) dy, where X and Y are continuous r.vs;

E(X | Y) = ∑_i x_i P(x_i | y_j) and E(Y | X) = ∑_j y_j P(y_j | x_i), where X and Y are discrete r.vs.

Example: Suppose the joint density function of X and Y is given by

f(x, y) = (e^{−x/y} e^{−y})/y for 0 < x < ∞, 0 < y < ∞, and 0 elsewhere.

Compute E(X | Y).

Solution: E(X | Y = y) = ∫_{−∞}^{∞} x f(x | y) dx, where f(x | y) = f(x, y)/f(y) and f(y) = ∫_{−∞}^{∞} f(x, y) dx.

f(y) = ∫_0^∞ (e^{−x/y} e^{−y})/y dx = (e^{−y}/y) ∫_0^∞ e^{−x/y} dx. Let u = x/y, so du = dx/y and dx = y du. Then

f(y) = (e^{−y}/y) ∫_0^∞ e^{−u} y du = e^{−y} (−e^{−u})|_0^∞ = e^{−y}(0 − (−e^0)) = e^{−y}.

f(x | y) = f(x, y)/f(y) = [(e^{−x/y} e^{−y})/y]/e^{−y} = e^{−x/y}/y for 0 < x < ∞, 0 < y < ∞, and 0 otherwise.

Thus, E(X | Y = y) = ∫_0^∞ x (e^{−x/y}/y) dx. Let z = x/y, so dz = dx/y and dx = y dz. Then

E(X | Y = y) = y ∫_0^∞ z e^{−z} dz = y Γ(2),

since Γ(a) = ∫_0^∞ x^{a−1} e^{−x} dx.

→ E(X | Y = y) = y(1) = y //

3.3. Bi-variate and Multivariate Normal Distributions


Chapter Four

Sampling Distribution

4.1. Introduction

In statistics, a collection of objects whose elements are examined in view of their individual characteristics is called a population, whereas a sample is a part of the population from which conclusions about the population are drawn. In the continuous case the population values are usually values of identically distributed random variables, whose distribution we refer to as the population distribution.

Definition: If X1, X2, …, Xn are independently and identically distributed random variables,

we say that they constitute a random sample from the infinite population given by their

common distribution.

If f(x1, x2, …, xn ) is the value of the joint distribution of such a set of random variables at

(X1, X2, …, Xn) we can write f(x1, x2, …, xn ) = ∏𝑛𝑖=1 𝑓(𝑥𝑖 ) where f(xi) is the value of

population distribution at xi.

Definition: If X1, X2, …, Xn constitute a random sample, then

∑ 𝑥𝑖 ∑(𝑥𝑖−𝑥̅ )2
𝑥̅ = is called the sample mean and 𝑠 2 = is called the sample variance.
𝑛 𝑛−1

The Distribution of the Mean: Since statistics are random variables whose values vary from sample to sample, it is customary to refer to their distribution as a sampling distribution.

Example: If x1, x2, …, xn constitute a random sample from an infinite population with mean μ and variance σ^2, then E(x̄) = μ and var(x̄) = σ^2/n.

Proof: Let Y = x̄ = ∑ x_i/n. Then E(Y) = E(∑ x_i/n) = ∑ E(x_i)/n = (1/n) ∑ μ = (1/n) nμ = μ.

Var(Y) = var(∑ x_i/n) = (1/n^2) var(∑ x_i) = (1/n^2) ∑ var(x_i) + (2/n^2) ∑∑_{i<j} cov(x_i, x_j).

Since the x_i are independent, cov(x_i, x_j) = 0 for i ≠ j, so

var(x̄) = (1/n^2) ∑ var(x_i) = (1/n^2) ∑ σ^2 = nσ^2/n^2 = σ^2/n.

Note: The standard deviation of the sample means is called standard error of the mean.
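This result is easy to see by simulation. The sketch below is an addition (it assumes Python with numpy; μ = 10, σ = 2 and n = 25 are arbitrary choices): the simulated means cluster around μ with standard deviation σ/√n.

import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n = 10.0, 2.0, 25
# Many samples of size n; one sample mean per row
xbar = rng.normal(mu, sigma, size=(50_000, n)).mean(axis=1)
print(xbar.mean(), mu)                    # E(x-bar) = mu
print(xbar.std(), sigma / np.sqrt(n))     # standard error sigma/sqrt(n) = 0.4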

There are basically two ways of making inferences (generalization) in science. These are

- Deductive inference

- Inductive Inference


Deductive inference is from the general to the particular

Inductive inference is from the particular to the general (population).

There are generally two closely related concepts used to make inferences about a population parameter, namely:

- estimation and hypothesis-testing techniques.

An estimator is a formula (a function of the sample observations) used to estimate an unknown population parameter.

Example: μ̂ = x̄ = ∑ x_i/n is an estimator of the population mean μ.

One function of statistics is the provision of techniques for making inductive inferences and for measuring the degree of uncertainty of such inferences. That uncertainty is measured in terms of probability.

To make inferences about the population parameter, we select the sample observations (n) from a total of N, either without replacement or with replacement.

If sampling is done without replacement there are C(N, n) different samples, and if the sample is selected with replacement there are N^n different samples of size n.

Distribution of Sample

Let x1, x2, …, xn denote a sample of size n. The distribution (probability) of the sample x1, x2,

…, xn is defined to be the joint distribution of x1, x2, …, xn.

Example: Let x1, x2, …, xn be a random sample from a Bernoulli distribution, i.e. x_i ~ Bernoulli(p), where P(X = x_i) = p^{x_i}(1 − p)^{1−x_i} for x_i = 0, 1.

Determine the distribution of the random sample.


Solution: f_{x1,…,xn}(x1, x2, …, xn) = f(x1) f(x2) ⋯ f(xn) = ∏_{i=1}^{n} p(x_i) = ∏_{i=1}^{n} p^{x_i}(1 − p)^{1−x_i} = p^{∑ x_i}(1 − p)^{∑(1 − x_i)} = p^{∑ x_i}(1 − p)^{n − ∑ x_i}.

Example 2: Let x1, x2, …, xn be a random sample from the Poisson distribution. Find the joint distribution of the n sample observations.

Solution: Given x_i ~ Poisson(λ), i = 1, 2, …, n, with f(x_i) = e^{−λ} λ^{x_i}/x_i!, x_i = 0, 1, 2, …,

f_{x1,…,xn}(x1, x2, …, xn) = ∏_{i=1}^{n} e^{−λ} λ^{x_i}/x_i! = e^{−nλ} λ^{∑ x_i}/∏_{i=1}^{n} x_i!.

4.2. The Chi-square Distributions

Let X1, X2, …, Xn be independent standard normal random variables. The distribution of X = ∑_{i=1}^{n} X_i^2 is called the chi-square distribution with n degrees of freedom (df), and its probability density function is given by

f(x | n) = (1/(2^{n/2} Γ(n/2))) e^{−x/2} x^{n/2 − 1},  x > 0, n > 0.

The chi-square random variable with df = n is denoted by X^2_n.

Applications:

The chi-square distribution is also called the variance distribution, because the (scaled) sample variance of a random sample from a normal distribution follows a chi-square distribution. Specifically, if X1, …, Xn is a random sample from a normal distribution with mean μ and variance σ^2, then

∑_{i=1}^{n} (x_i − x̄)^2/σ^2 = (n − 1)S^2/σ^2 ~ X^2_{n−1}.

This distributional result is useful for making inferences about σ^2.


 In categorical data analysis with an r×c contingency table, the usual test statistic is

T = ∑_{i=1}^{r} ∑_{j=1}^{c} (O_{ij} − E_{ij})^2/E_{ij} ~ X^2_{(c−1)(r−1)},

where O_{ij} and E_{ij} denote, respectively, the observed and expected cell frequencies. The null hypothesis of independent attributes is rejected at significance level α if the observed value of T is greater than the (1 − α)th quantile of a chi-square distribution with df = (r − 1)(c − 1).

The chi-square statistic T can also be used to test whether a frequency distribution fits a specific model.
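As an illustration of the contingency-table statistic T (an added sketch, assuming Python with scipy; the 2×3 table of observed counts is hypothetical):

import numpy as np
from scipy.stats import chi2_contingency

O = np.array([[20, 30, 25],     # hypothetical observed frequencies
              [30, 20, 25]])
T, pval, dof, E = chi2_contingency(O, correction=False)
print(T, pval, dof)             # dof = (r - 1)(c - 1) = 2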

Properties:

i. If the random variables X_i, i = 1, 2, …, k, are normally and independently distributed with means μ_i and variances σ_i^2, then Z_i^2 = (X_i − μ_i)^2/σ_i^2 ~ X^2_1 and U = ∑ Z_i^2 = ∑_{i=1}^{k} (X_i − μ_i)^2/σ_i^2 has a chi-square distribution with k degrees of freedom.

Proof: Write Z_i = (X_i − μ_i)/σ_i. Then Z_i has a standard normal distribution. Now

M_U(t) = E(e^{tU}) = E(e^{t ∑ Z_i^2}) = E(∏_{i=1}^{k} e^{tZ_i^2}) = ∏_{i=1}^{k} E(e^{tZ_i^2}), since the Z_i are independent.

But E(e^{tZ_i^2}) = ∫_{−∞}^{∞} e^{tz^2} (1/√(2π)) e^{−z^2/2} dz = ∫_{−∞}^{∞} (1/√(2π)) e^{−(1/2)(1−2t)z^2} dz

   = (1/√(1 − 2t)) [∫_{−∞}^{∞} (√(1 − 2t)/√(2π)) e^{−(1/2)(1−2t)z^2} dz] = 1/√(1 − 2t) for t < 1/2,

since the expression in brackets is the integral of a normal density with mean 0 and variance 1/(1 − 2t).

Hence E(e^{tZ_i^2}) = (1/(1 − 2t))^{1/2}, which shows Z_i^2 ~ X^2(1), and

M_U(t) = ∏_{i=1}^{k} (1/(1 − 2t))^{1/2} = (1/(1 − 2t))^{k/2} for t < 1/2,

which is the moment generating function of a chi-square distribution with k degrees of freedom.

ii. If z1, z2, …, zn are a random sample from a standard normal distribution, then

a) z̄ has a normal distribution with mean 0 and variance 1/n;

b) z̄ and ∑_{i=1}^{n} (z_i − z̄)^2 are independent;

c) ∑_{i=1}^{n} (z_i − z̄)^2 has a chi-square distribution with n − 1 degrees of freedom.

Proof: (see the textbook, page 244)

 Here z̄ = ∑_{i=1}^{n} Z_i/n with Z_i = (X_i − μ)/σ, so z̄ = (x̄ − μ)/σ.

→ E(z̄) = 0 and var(z̄) = var((x̄ − μ)/σ) = (σ^2/n)/σ^2 = 1/n.

 For part (b), consider first the case n = 2.

iii. If x1, x2, …, xn is a random sample from N(μ, σ^2), then (n − 1)s^2/σ^2 ~ X^2(n − 1).

Proof (sketch):

∑_{i=1}^{n} (x_i − μ)^2 = ∑_{i=1}^{n} (x_i − x̄ + x̄ − μ)^2 = ∑_{i=1}^{n} (x_i − x̄)^2 + n(x̄ − μ)^2,

since the cross term vanishes.

iv. If X and Y are independent chi-square random variables with k and m degrees of freedom, then X + Y has a chi-square distribution with k + m degrees of freedom.

Proof: X ~ X^2(k) and Y ~ X^2(m), so

M_X(t) = (1 − 2t)^{−k/2} and M_Y(t) = (1 − 2t)^{−m/2}.

M_{X+Y}(t) = E(e^{t(X+Y)}) = E(e^{tX} e^{tY}) = E(e^{tX}) E(e^{tY}), since X and Y are independent, so

M_{X+Y}(t) = (1 − 2t)^{−k/2} (1 − 2t)^{−m/2} = (1 − 2t)^{−(k+m)/2},

which is the moment generating function of a chi-square distribution with k + m degrees of freedom.

→ If X1, X2, …, Xk are independent chi-square random variables with degrees of freedom n1, …, nk, respectively, then

∑_{i=1}^{k} X_i ~ X^2_m, where m = ∑ n_i.

4. The gamma distribution with shape parameter a and rate parameter b specializes to the chi-square distribution with df = n when a = n/2 and b = 1/2. That is, gamma(n/2, 1/2) ~ X^2_n.

Theorem: If the random variable X has a chi-square distribution with n degrees of freedom, then

E(X) = n,  Var(X) = 2n  and  M_X(t) = (1 − 2t)^{−n/2}.

Proof: From the gamma distribution with r = n/2 and λ = 1/2,

E(X) = r/λ = (n/2)/(1/2) = n  and  Var(X) = r/λ^2 = (n/2)/(1/4) = (n/2) × 4 = 2n //

Example: Suppose the velocity V of an object has the N(0, 1) distribution. Let K = mV^2/2 be the kinetic energy of the object.

a. Find the pdf of Y = V^2.

b. Find the pdf of K = mV^2/2.

Solution: a. Y = V^2 ⟹ V = ±√y = h^{−1}(y), and f_V(v) = (1/√(2π)) e^{−v^2/2}, since V ~ N(0, 1).

→ g(y) = {f(√y) + f(−√y)} |dh^{−1}(y)/dy| = {f(√y) + f(−√y)} |1/(2√y)|.

Thus, g(y) = (1/(2√y)) {(1/√(2π)) e^{−(√y)^2/2} + (1/√(2π)) e^{−(−√y)^2/2}}

   = (1/√(2π)) e^{−y/2} y^{−1/2} = e^{−y/2} y^{−1/2}/(2^{1/2} √π),

and since Γ(1/2) = √π,

g(y) = e^{−y/2} y^{−1/2}/(2^{1/2} Γ(1/2)) ~ X^2(1),

because f(x) = e^{−x/2} x^{n/2 − 1}/(2^{n/2} Γ(n/2)) ~ X^2(n).
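A simulation check of part (a) is sketched below (an addition, assuming Python with numpy): squared standard normal draws should have the χ^2(1) mean 1 and variance 2.

import numpy as np

rng = np.random.default_rng(4)
v = rng.standard_normal(500_000)   # V ~ N(0, 1)
y = v**2                           # Y = V^2
print(y.mean(), y.var())           # approx. 1 and 2, the chi-square(1) mean and variance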

b. Find the pdf of K = mV^2/2.

Solution:

f_V(v) = (1/√(2π)) e^{−v^2/2}, −∞ < v < ∞, and h^{−1}(k) = v = ±√(2k/m).

g(k) = {f(√(2k/m)) + f(−√(2k/m))} |dh^{−1}(k)/dk|,

where dh^{−1}(k)/dk = d(√(2k/m))/dk = √(2/m) (1/(2√k)) = 1/√(2mk).

g(k) = {(1/√(2π)) e^{−(1/2)(√(2k/m))^2} + (1/√(2π)) e^{−(1/2)(√(2k/m))^2}} (1/√(2mk))

   = (2/√(2π)) e^{−k/m} (1/√(2mk)) = (1/m)^{1/2} k^{−1/2} e^{−k/m}/√π

→ g(k) = (1/m)^{1/2} k^{−1/2} e^{−k/m}/Γ(1/2) ~ gamma(1/2, 1/m), i.e. r = 1/2 and λ = 1/m.

4.3. The t-Distribution
