Statistical Theory of Distribution: Course Outline

Textbook
Mood, A.M., Graybill, F.A. and Boes, D.C. (1974). Introduction to the Theory of Statistics (3rd Edition). McGraw-Hill.
References
1. Balakrishnan, N. and Nevzorov, V.B. (2003). A Primer on Statistical Distributions. Wiley-Interscience.
2. Evans, M., Hastings, N. and Peacock, B. (2000). Statistical Distributions (3rd Edition). Wiley-Interscience.
3. Friedlander, F.G. and Joshi, M. (2008). Introduction to the Theory of Distributions (2nd Edition). Cambridge University Press.
4. Hogg, R.V. and Tanis, E. (2009). Probability and Statistical Inference (8th Edition). Prentice Hall.
5. Johnson, N.L., Kemp, A.W. and Kotz, S. (2005). Univariate Discrete Distributions (3rd Edition). Wiley Series in Probability and Statistics. Wiley-Interscience.
6. Johnson, N.L., Kotz, S. and Balakrishnan, N. (1994). Continuous Univariate Distributions, Volume 1. Wiley Series in Probability and Statistics. Wiley-Interscience.
7. Krishnamoorthy, K. (2006). Handbook of Statistical Distributions with Applications. Chapman and Hall/CRC.
8. Pathak, R.S. (2000). A Course in Distribution Theory and Applications. Narosa.
9. Ross, S. (2006). Introduction to Probability Models (9th Edition). Academic Press.
10. Severini, T.A. (2005). Elements of Distribution Theory. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press.
11. Stuart, A. and Ord, K. (2009). Kendall's Advanced Theory of Statistics, Volume 1: Distribution Theory (6th Edition). Wiley.
12. Walpole, R.E., Myers, R.H., Myers, S.L. and Ye, K. (2006). Probability and Statistics for Engineers and Scientists (8th Edition). Prentice Hall.
Statistical Theory of Distribution
(Stat 471)
What is probability?
The ideal situation in life would be to know with certainty what is going to happen next. This is almost never the case, because the world is full of uncertainty, and probability gives us a way of quantifying that uncertainty.
Sample Space: The set of all possible outcomes of a random experiment is called a sample space, usually denoted by S. A sample space may be finite or infinite; the outcomes of an infinite sample space may be countable, or they may constitute a continuum, for example, all the points on a line or all the points in a plane.
Event: The collection of none, one, or more than one outcomes from a sample space is called an event.
Axioms of Probability: A probability function P assigns to every event A a number P(A) such that
A1. P(A) ≥ 0 for every event A.
A2. P(S) = 1.
A3. If the events Ai are mutually exclusive, then P(∪ Ai) = ∑ P(Ai); we call this axiom the special addition rule. The general addition rule for the probability of any two events A and B is P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Probability Mass Function (pmf): Let R be the set of all possible values of a discrete r.v X, and let p(k) = P(X = k) for each k in R. Then p(k) is called the probability mass function of X, and it satisfies
0 ≤ p(k) ≤ 1 and ∑_{k ∈ R} p(k) = 1.
Probability Density Function (pdf): Any real-valued function f(x) that satisfies f(x) ≥ 0 for all x and ∫_{−∞}^{∞} f(x) dx = 1 is called a probability density function.
Exercise: Let
f(x) = x for 0 ≤ x ≤ 1, f(x) = 2 − x for 1 ≤ x ≤ 2, and f(x) = 0 elsewhere.
Show that f(x) is a pdf of the random variable X.
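The exercise can also be checked numerically. The sketch below is not part of the original notes; it assumes Python with scipy is available, and the helper name tri_pdf is ours.

```python
import numpy as np
from scipy.integrate import quad

def tri_pdf(x):
    """Piecewise density from the exercise: x on [0, 1], 2 - x on [1, 2], 0 elsewhere."""
    if 0 <= x <= 1:
        return x
    if 1 <= x <= 2:
        return 2 - x
    return 0.0

# non-negativity is clear from the definition; check that the total area is 1
area, _ = quad(tri_pdf, 0, 2, points=[1])
print(area)   # ~1.0, so f is a valid pdf
```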
X, Y discrete:
1. f(x, y) ≥ 0 for all x, y
2. ∑_x ∑_y f(x, y) = 1
X, Y continuous:
1. f(x, y) ≥ 0 for all x, y
2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1
1.3 Expectations
If X is a continuous random variable with pdf f(x), then the expectation of a function g(X) is
E(g(X)) = ∫_{−∞}^{∞} g(x) f(x) dx,
and if X is discrete, E(g(X)) = ∑_k g(k) P(X = k).
1.4 Moments
Moments are constants that summarize important properties of distributions. The most commonly used such constants are measures of central tendency (the mean) and of dispersion (variance and mean deviation). Two other important measures are skewness and kurtosis.
Moments about the Origin (Raw Moments): The moments about the origin are obtained by finding the expected value of the r.v raised to the power k, k = 1, 2, …; that is, the kth raw moment is μ'_k = E(X^k).
Moments about the Mean (Central Moments): When the r.v is observed in
terms of deviations from its mean, its expectation yields moments about the
mean or central moments. The first central moment is zero, and the second
central moment is the variance. The third central moment measures the degree
of skewness of the distribution, and the fourth central moment measures the
degree of flatness.
The kth moment about the mean or the kth central moment of a random variable
X is defined by
μk = E[(X − μ)^k], k = 1, 2, …
Moment Generating Function (mgf): The moment generating function of X is
M_X(t) = E(e^{tX}),
provided the expectation exists for t in a neighborhood of 0. The raw moments can be obtained from the mgf by differentiation. Specifically,
E(X^k) = [d^k M_X(t)/dt^k] at t = 0, k = 1, 2, …
Example: Let X be a random variable with pdf
f(x) = e^{−x} for x ≥ 0 and f(x) = 0 for x < 0.
i. Find the moment generating function of X.
ii. Calculate the first and second moments of X about the origin.
Solution
i. M_X(t) = E(e^{tX}) = ∫_0^{∞} e^{tx} e^{−x} dx = ∫_0^{∞} e^{−(1−t)x} dx = [−e^{−(1−t)x}/(1 − t)] from 0 to ∞ = 1/(1 − t), for t < 1.
ii. M'_X(t) = 1/(1 − t)^2, so E(X) = M'_X(0) = 1.
M''_X(t) = 2/(1 − t)^3, so E(X^2) = M''_X(0) = 2. //
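This is easy to confirm symbolically; the following sketch (ours, assuming the sympy package is available) differentiates the mgf found in part (i) and evaluates at t = 0.

```python
import sympy as sp

t = sp.symbols('t')
M = 1 / (1 - t)                       # mgf of f(x) = e^(-x), x >= 0, valid for t < 1

print(sp.diff(M, t, 1).subs(t, 0))    # 1  -> E(X)
print(sp.diff(M, t, 2).subs(t, 0))    # 2  -> E(X^2)
```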
Probability Generating Function (pgf): For a discrete random variable X taking values 0, 1, 2, …, the probability generating function is
P(t) = ∑_{i=0}^{∞} t^i P(X = i),
so that
P(X = k) = (1/k!) [d^k P(t)/dt^k] at t = 0, k = 1, 2, …
Furthermore, P(0) = P(X = 0) and [dP(t)/dt] at t = 1 equals E(X).
Chapter Two
A Bernoulli trial is an experiment with only two possible outcomes, success and failure, usually coded as 1 and 0. Let X1, …, Xn be mutually independent Bernoulli trials, each with success probability p. Then Y = ∑_{i=1}^{n} Xi has a binomial distribution with parameters n and p, with pmf
f(x; n, p) = C(n, x) p^x q^{n−x}, x = 0, 1, …, n, where q = 1 − p.
A binomial experiment has the following characteristics: there are only two possible outcomes for each trial (success and failure), the n trials are independent, and the probability of success p is the same for each trial.
Theorem: If Y = ∑_{i=1}^{n} Xi ~ binomial(n, p), then E(Y) = np and V(Y) = np(1 − p).
Proof:
E(Y) = E(∑_{i=1}^{n} Xi) = ∑_{i=1}^{n} E(Xi) = ∑_{i=1}^{n} p = np
V(Y) = V(∑_{i=1}^{n} Xi) = ∑_{i=1}^{n} V(Xi) = ∑_{i=1}^{n} p(1 − p) = np(1 − p)
Theorem: If X ~ binomial(n, p), then m_X(t) = (pe^t + q)^n, where q = 1 − p.
Proof:
m_X(t) = E(e^{tX}) = ∑_{x=0}^{n} e^{tx} C(n, x) p^x q^{n−x} = ∑_{x=0}^{n} C(n, x) (pe^t)^x q^{n−x} = (pe^t + q)^n,
because for any a > 0, b > 0 and positive integer n, ∑_{x=0}^{n} C(n, x) a^x b^{n−x} = (a + b)^n (the binomial theorem).
Now m'_X(t) = npe^t (pe^t + q)^{n−1}, so E(X) = m'_X(0) = np(p + q)^{n−1} = np.
m''_X(t) = n(n − 1)p^2 e^{2t} (pe^t + q)^{n−2} + npe^t (pe^t + q)^{n−1}, so m''_X(0) = n(n − 1)p^2 + np.
V(X) = E(X^2) − (E(X))^2 = m''_X(0) − (m'_X(0))^2 = n(n − 1)p^2 + np − n^2 p^2 = np(1 − p) = npq.
Similarly, the probability generating function is
P_X(t) = E(t^X) = ∑_{x=0}^{n} t^x C(n, x) p^x q^{n−x} = ∑_{x=0}^{n} C(n, x) (pt)^x q^{n−x} = (pt + q)^n.
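The same conclusions can be verified symbolically; the sketch below (ours, assuming sympy is available) differentiates the binomial mgf and recovers np and npq.

```python
import sympy as sp

t, n, p = sp.symbols('t n p', positive=True)
q = 1 - p
M = (p * sp.exp(t) + q) ** n                    # binomial mgf derived above

EX = sp.simplify(sp.diff(M, t, 1).subs(t, 0))   # n*p
EX2 = sp.diff(M, t, 2).subs(t, 0)
var = sp.simplify(EX2 - EX**2)                  # should reduce to n*p*(1 - p) = n*p*q
print(EX, var)
```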
i. the probability of X at x = 1 and 2
Properties
Let X1, X2, …, Xm be independent random variables with Xi ~ binomial(ni, p), i = 1, 2, …, m. Then ∑_{i=1}^{m} Xi ~ binomial(∑ ni, p).
Discrete Uniform Distribution: A random variable X has a discrete uniform distribution over {1, 2, …, N} if each of the N values is equally likely, that is, f(x; N) = 1/N for x = 1, 2, …, N. Then
E(X) = (N + 1)/2, V(X) = (N^2 − 1)/12, and m_X(t) = (1/N) ∑_{x=1}^{N} e^{xt} = e^t (1 − e^{Nt}) / (N(1 − e^t)).
Proof:
Remark: ∑_{i=1}^{k} i = k(k + 1)/2 and ∑_{i=1}^{k} i^2 = k(k + 1)(2k + 1)/6.
E(X) = ∑_{x=1}^{N} x f(x) = ∑_{x=1}^{N} x (1/N) = (1/N) ∑_{x=1}^{N} x = (1/N) · N(N + 1)/2 = (N + 1)/2
E(X^2) = (1/N) ∑_{x=1}^{N} x^2 = (1/N) · N(N + 1)(2N + 1)/6 = (N + 1)(2N + 1)/6
V(X) = E(X^2) − (E(X))^2 = (N + 1)(2N + 1)/6 − ((N + 1)/2)^2 = (N + 1)(N − 1)/12 = (N^2 − 1)/12
Geometric Distribution: Consider, for example, the number of cycles it takes for a woman to become pregnant, measured from the moment she had decided to become pregnant. We assume that the probability that a woman becomes pregnant during a particular cycle is p, the same for every cycle and independent of the other cycles. If X denotes the number of cycles required, then
P(X = k) = (1 − p)^{k−1} p for k = 1, 2, …
The geometric setting shares the binomial conditions: there are only two possible outcomes for each trial (success and failure), the trials are independent, and the success probability p is constant; the quantity of interest is the waiting time until the first success.
Equivalently, if X counts the number of failures before the first success, so that P(X = x) = pq^x for x = 0, 1, 2, … with q = 1 − p, then
E(X) = q/p, V(X) = q/p^2, m_X(t) = p/(1 − qe^t).
Proof:
E(X) = ∑_{x=0}^{∞} x pq^x = pq ∑_{x=0}^{∞} x q^{x−1} = pq ∑_{x=0}^{∞} d(q^x)/dq = pq (d/dq) ∑_{x=0}^{∞} q^x = pq (d/dq)(1/(1 − q)) = pq/(1 − q)^2 = q/p
E(X^2) = ∑_{x=0}^{∞} x^2 pq^x = pq^2 ∑_{x=0}^{∞} x(x − 1) q^{x−2} + pq ∑_{x=0}^{∞} x q^{x−1}
= pq^2 ∑_{x=0}^{∞} d(x q^{x−1})/dq + pq ∑_{x=0}^{∞} x q^{x−1}
= pq^2 (d/dq)(1/(1 − q)^2) + q/p = pq^2 · 2/(1 − q)^3 + q/p = 2q^2/p^2 + q/p = (2q^2 + pq)/p^2
V(X) = E(X^2) − (E(X))^2 = (2q^2 + pq)/p^2 − q^2/p^2 = (q^2 + pq)/p^2 = q(q + p)/p^2 = q/p^2
m_X(t) = ∑_{x=0}^{∞} e^{tx} pq^x = p ∑_{x=0}^{∞} (qe^t)^x = p/(1 − qe^t), for qe^t < 1.
Memoryless property: If X has a geometric distribution with parameter p, then for nonnegative integers s and t,
P[X ≥ s + t | X ≥ s] = P[X ≥ t].
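A quick numerical illustration of these results (ours, assuming scipy is available; scipy's nbinom(1, p) counts failures before the first success, matching the pmf pq^x used above):

```python
from scipy import stats

p, q = 0.3, 0.7
X = stats.nbinom(1, p)            # P(X = x) = p * q**x, x = 0, 1, 2, ...

print(X.mean(), q / p)            # ~2.333 vs q/p
print(X.var(),  q / p**2)         # ~7.778 vs q/p^2

# memoryless property: P(X >= s + t | X >= s) = P(X >= t) = q**t
s, t = 4, 3
lhs = X.sf(s + t - 1) / X.sf(s - 1)   # sf(k) = P(X > k), so P(X >= k) = sf(k - 1)
rhs = X.sf(t - 1)
print(lhs, rhs, q**t)                 # all equal 0.343
```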
Negative Binomial (Pascal) Distribution: The number of failures until the first success has the geometric distribution. Now let us consider the r.v X that denotes the number of failures until the rth success.
If the r.v X has a negative binomial or Pascal distribution with parameters r and p, its pmf follows by noting that the last trial must be a success, preceded by k failures and r − 1 successes in some order:
P(X = k | r, p) = C(k + r − 1, r − 1) p^{r−1} q^k × p.
Thus,
P(X = k | r, p) = C(k + r − 1, r − 1) p^r q^k = C(k + r − 1, k) p^r q^k = C(−r, k) p^r (−q)^k for k = 0, 1, 2, …
Like the geometric, the negative binomial is a waiting-time r.v. It represents how long (in terms of the number of failures) one must wait for the rth success.
E(X) = rq/p, V(X) = rq/p^2, m_X(t) = [p/(1 − qe^t)]^r
Proof: m_X(t) = E(e^{tX}) = ∑_{x=0}^{∞} e^{tx} C(−r, x) p^r (−q)^x = ∑_{x=0}^{∞} C(−r, x) p^r (−qe^t)^x = p^r (1 − qe^t)^{−r} = [p/(1 − qe^t)]^r.
Poisson Distribution: Suppose that events occurring over a period of time or region of space satisfy the following: the numbers of events in non-overlapping intervals are independent, the probability of exactly one event in a sufficiently short interval is proportional to the length of the interval, and the probability of two or more events in such an interval is negligible. Then the number of events X in a fixed interval has the Poisson distribution with pmf
f(x; λ) = λ^x e^{−λ}/x!, x = 0, 1, 2, …, λ > 0.
Theorem: If X ~ Poisson(λ), then m_X(t) = e^{λ(e^t − 1)}, E(X) = λ and V(X) = λ.
Proof: m_X(t) = E(e^{tX}) = ∑_{x=0}^{∞} e^{tx} e^{−λ} λ^x/x! = e^{−λ} ∑_{x=0}^{∞} (λe^t)^x/x!. The infinite series ∑_{x=0}^{∞} z^x/x! is the exponential series, equal to e^z. Therefore
m_X(t) = e^{−λ} ∑_{x=0}^{∞} (λe^t)^x/x! = e^{−λ} e^{λe^t} = e^{λ(e^t − 1)}.
m'_X(t) = λe^t e^{λ(e^t − 1)}, so E(X) = m'_X(0) = λ.
m''_X(t) = λe^t e^{λ(e^t − 1)} (λe^t + 1), so m''_X(0) = λ(λ + 1), and V(X) = m''_X(0) − (m'_X(0))^2 = λ(λ + 1) − λ^2 = λ.
The probability generating function is
P_X(t) = E(t^X) = ∑_{x=0}^{∞} t^x e^{−λ} λ^x/x! = e^{−λ} ∑_{x=0}^{∞} (λt)^x/x! = e^{−λ} e^{λt} = e^{λ(t − 1)}.
For large n and small p, the binomial distribution can be approximated by a Poisson distribution with λ = np.
To verify that the Poisson pmf is a proper probability function (P(S) = 1), note that
∑_{x=0}^{∞} f(x) = ∑_{x=0}^{∞} e^{−λ} λ^x/x! = e^{−λ} ∑_{x=0}^{∞} λ^x/x! = e^{−λ} e^{λ} = 1.
Poisson as a limit of the binomial: If n → ∞ and p → 0 while np = λ remains constant, the limiting form of the binomial distribution is λ^x e^{−λ}/x! for x = 0, 1, 2, …
First let us substitute λ/n for p in the formula for the binomial distribution:
f(x; n, λ/n) = C(n, x) (λ/n)^x (1 − λ/n)^{n−x} = [n(n − 1)(n − 2)⋯(n − x + 1)/x!] (λ/n)^x (1 − λ/n)^{n−x}
= (λ^x/x!) (1 − 1/n)(1 − 2/n)⋯(1 − (x − 1)/n) (1 − λ/n)^{n−x}
If we let n → ∞, we find that (1 − 1/n)(1 − 2/n)⋯(1 − (x − 1)/n) → 1 and that
(1 − λ/n)^{n−x} = [(1 − λ/n)^{n/λ}]^{λ} (1 − λ/n)^{−x} → e^{−λ}.
Hence, the binomial distribution f(x; n, p) approaches λ^x e^{−λ}/x! for x = 0, 1, 2, …
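A quick numerical comparison (ours, assuming scipy is available) shows how close the two pmfs are for large n and small p:

```python
import numpy as np
from scipy import stats

n, p = 500, 0.01                # large n, small p; lambda = n*p = 5
k = np.arange(0, 16)

binom_pmf = stats.binom.pmf(k, n, p)
pois_pmf = stats.poisson.pmf(k, n * p)
print(np.max(np.abs(binom_pmf - pois_pmf)))   # maximum difference is very small
```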
Remark
1. Recurrence relations:
P(X = k | λ) = (λ/k) P(X = k − 1), k = 1, 2, …
P(X = k + 1 | λ) = (λ/(k + 1)) P(X = k), k = 0, 1, 2, …
P(X = k − 1 | λ) = (k/λ) P(X = k), k = 1, 2, 3, …
2. If X ~ Poisson(λ1) and Y ~ Poisson(λ2) are independent, then the conditional distribution of X given X + Y = n is binomial(n, λ1/(λ1 + λ2)). More generally, for m independent Poisson random variables with a common mean, the conditional distribution of the individual counts given their total is multinomial with p1 = … = pm = 1/m.
3. Normal approximation:
P(X ≤ k | λ) ≈ P(Z ≤ (k − λ + 0.5)/√λ)
P(X ≥ k | λ) ≈ P(Z ≥ (k − λ − 0.5)/√λ),
where X is a Poisson(λ) r.v and Z is a standard normal r.v.
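The recurrence relation and the normal approximation are easy to try out; the sketch below (ours, assuming numpy and scipy are available) builds the Poisson pmf iteratively and checks one approximate tail probability.

```python
import numpy as np
from scipy import stats

lam, kmax = 9.0, 30
pmf = np.zeros(kmax + 1)
pmf[0] = np.exp(-lam)                     # P(X = 0) = e^(-lambda)
for k in range(1, kmax + 1):
    pmf[k] = (lam / k) * pmf[k - 1]       # P(X = k) = (lambda/k) P(X = k - 1)
print(np.allclose(pmf, stats.poisson.pmf(np.arange(kmax + 1), lam)))   # True

# normal approximation with continuity correction, e.g. P(X <= 12)
k = 12
print(stats.poisson.cdf(k, lam),
      stats.norm.cdf((k - lam + 0.5) / np.sqrt(lam)))   # close for moderate lambda
```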
Hypergeometric Distribution: A random variable X has a hypergeometric distribution if its pmf is
f(x; n, M, N) = C(M, x) C(N − M, n − x) / C(N, n) for x = 0, 1, 2, …, n,
where N is a positive integer, M is a nonnegative integer with M ≤ N, and n ≤ N. This is called the hypergeometric probability function.
The distribution arises as follows. Consider an urn containing N balls, of which M are white and (N − M) are black. Of these, n balls are chosen at random without replacement, so that the probability of selecting any particular set of n balls is 1/C(N, n). Let X be a r.v that denotes the number of white balls drawn. Then
P(X = x) = C(M, x) C(N − M, n − x) / C(N, n) for x = 0, 1, 2, …, n.
A hypergeometric experiment has the following characteristics:
i. A single trial results in one of the two possible outcomes, say A and A′.
ii. The trials are not independent, because sampling is done without replacement from a finite population.
If 10 of them are randomly chosen for inspection, what is the probability that 2
E(X) = n(M/N), V(X) = n (M/N) ((N − M)/N) ((N − n)/(N − 1))
Proof: Using the identity ∑_{i=0}^{m} C(a, i) C(b, m − i) = C(a + b, m), one obtains E(X) = nM/N and
E(X(X − 1)) = n(n − 1) M(M − 1)/(N(N − 1)).
V(X) = E(X^2) − (E(X))^2 = E(X(X − 1)) + E(X) − (E(X))^2
= n(n − 1) M(M − 1)/(N(N − 1)) + n(M/N) − (nM/N)^2 = n (M/N) (N − M)(N − n)/(N(N − 1))
= n (M/N) (1 − M/N) (N − n)/(N − 1).
Property: Let X ~ binomial(m, p) and Y ~ binomial(n, p) be independent. Then
P(X = k | X + Y = s) = P(X = k) P(Y = s − k) / P(X + Y = s), which simplifies to
P(X = k | X + Y = s) = C(m, k) C(n, s − k) / C(m + n, s),
i.e. the conditional distribution is hypergeometric with parameters (s, m, m + n).
Approximation:
i. When the sampling fraction n/N is small, the hypergeometric probabilities are close to binomial probabilities with p = M/N:
P(X = k) ≈ C(n, k) p^k q^{n−k}.
ii. Let M/N be small and n be large such that n(M/N) = λ. Then
P(X = k) ≈ e^{−λ} λ^k/k!.
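The two approximations can be inspected numerically; the sketch below (ours, assuming scipy is available; scipy's hypergeom takes the arguments in the order total, number of white balls, sample size) compares the three pmfs.

```python
import numpy as np
from scipy import stats

N, M, n = 1000, 50, 20          # population size, white balls, sample size
k = np.arange(0, n + 1)

hyper = stats.hypergeom.pmf(k, N, M, n)
binom = stats.binom.pmf(k, n, M / N)          # binomial approximation, p = M/N
pois  = stats.poisson.pmf(k, n * M / N)       # Poisson approximation, lambda = n*M/N

print(np.max(np.abs(hyper - binom)), np.max(np.abs(hyper - pois)))
```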
Multinomial Distribution: Many experiments can have more than two possible outcomes. This happens, for example, when each trial is classified into one of several categories. To treat the problem in general, let us consider the case where there are n independent trials, each permitting k mutually exclusive outcomes with respective probabilities p1, p2, …, pk (∑ pi = 1). Referring to the outcomes as being of the first kind, the second kind, …, and the kth kind, the probability of getting x1 outcomes of the first kind, x2 outcomes of the second kind, …, and xk outcomes of the kth kind, with ∑ xi = n, is
f(x1, x2, …, xk) = [n!/(x1! x2! ⋯ xk!)] p1^{x1} p2^{x2} ⋯ pk^{xk} for xi = 0, 1, …, n.
This joint distribution is called the multinomial distribution.
Example: The probabilities that the light bulb of a certain kind of slide projector will last fewer than 40 hours of continuous use, anywhere from 40 to 80 hours of continuous use, or more than 80 hours of continuous use are 0.3, 0.5 and 0.2, respectively. Find the probability that among eight such bulbs 2 will last fewer than 40 hours, 5 will last anywhere from 40 to 80 hours, and 1 will last more than 80 hours.
Solution: f(2, 5, 1) = [8!/(2! 5! 1!)] (0.3)^2 (0.5)^5 (0.2)^1 = 0.0945
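The hand calculation can be reproduced with scipy's multinomial distribution (a minimal sketch, assuming scipy is available):

```python
from scipy import stats

# eight bulbs, category probabilities 0.3, 0.5, 0.2 as in the example
print(stats.multinomial.pmf([2, 5, 1], n=8, p=[0.3, 0.5, 0.2]))   # ~0.0945
```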
Continuous Uniform Distribution: X has a uniform distribution on [a, b] if f(x) = 1/(b − a) for a ≤ x ≤ b.
Theorem: If the random variable X has a uniform distribution on [a, b], then
E(X) = (a + b)/2, Var(X) = (b − a)^2/12, M_X(t) = (e^{bt} − e^{at})/((b − a)t), t ≠ 0.
Proof: E(X) = ∫_a^b x/(b − a) dx = (b^2 − a^2)/(2(b − a)) = (a + b)/2 //
E(X^2) = ∫_a^b x^2/(b − a) dx = (b^3 − a^3)/(3(b − a)) = (a^2 + ab + b^2)/3
Var(X) = E(X^2) − (E(X))^2 = (a^2 + ab + b^2)/3 − ((a + b)/2)^2 = (b − a)^2/12 //
M_X(t) = E(e^{tX}) = ∫_a^b e^{tx}/(b − a) dx = (e^{bt} − e^{at})/((b − a)t) //
Normal Distribution: A random variable X has a normal distribution with parameters μ and σ if its pdf is
f(x; μ, σ) = (1/(σ√(2π))) exp(−(x − μ)^2/(2σ^2)) for −∞ < x < ∞, −∞ < μ < ∞ and σ > 0.
The mean μ is the location parameter, and the standard deviation σ is the scale parameter.
The standard normal distribution (μ = 0, σ = 1) has pdf
f(z) = (1/√(2π)) exp(−z^2/2) for −∞ < z < ∞,
and
P(X ≤ x) = P(Z ≤ (x − μ)/σ) = (1/√(2π)) ∫_{−∞}^{(x−μ)/σ} exp(−t^2/2) dt = Φ((x − μ)/σ).
E(X) = μ, Var(X) = σ^2, M_X(t) = e^{μt + σ^2 t^2/2}
Proof:
m_X(t) = E(e^{tX}) = e^{tμ} E(e^{t(X−μ)}) = e^{tμ} ∫_{−∞}^{∞} e^{t(x−μ)} (1/(σ√(2π))) e^{−(1/2)((x−μ)/σ)^2} dx
= e^{tμ} (1/(σ√(2π))) ∫_{−∞}^{∞} e^{−(1/(2σ^2))[(x−μ)^2 − 2σ^2 t(x−μ)]} dx
Completing the square in the exponent, (x − μ)^2 − 2σ^2 t(x − μ) = (x − μ − σ^2 t)^2 − σ^4 t^2, and we have
m_X(t) = e^{tμ} e^{σ^2 t^2/2} (1/(σ√(2π))) ∫_{−∞}^{∞} e^{−(1/(2σ^2))(x − μ − σ^2 t)^2} dx
The integral together with the factor 1/(σ√(2π)) is necessarily 1, since it is the area under a normal density with mean μ + σ^2 t and variance σ^2.
Hence m_X(t) = e^{tμ} e^{σ^2 t^2/2} = e^{μt + σ^2 t^2/2}, and
E(X) = m'_X(0) = μ, V(X) = E(X^2) − (E(X))^2 = m''_X(0) − μ^2 = σ^2.
Example: Suppose that an instructor assumes that a student's final score is the value of a normally distributed random variable with mean μ and standard deviation σ, and that letter grades are assigned according to how far a score falls above or below the mean, the lowest grade going to scores below μ − 2σ. Then the proportions of each grade given can be calculated. For example, since
P[X > μ + σ] = 1 − P[X ≤ μ + σ] = 1 − Φ((μ + σ − μ)/σ) = 1 − Φ(1) ≈ 0.1587,
about 15.87% of the scores fall more than one standard deviation above the mean.
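Such proportions come straight from the standard normal cdf; a minimal sketch (ours, assuming scipy is available):

```python
from scipy import stats

Z = stats.norm()                 # standard normal

print(Z.sf(1))                   # P(Z > 1) = 1 - Phi(1) ~ 0.1587
print(Z.cdf(1) - Z.cdf(0))       # P(0 < Z <= 1) ~ 0.3413
print(Z.cdf(-2))                 # P(Z < -2) ~ 0.0228, scores below mu - 2*sigma
```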
Exponential Distribution: Consider a Poisson process with mean rate λ, where we count the events occurring in a given interval of time or space. Let X denote the waiting time until the first event. Then, for x > 0,
P(X > x) = P(no events occur in (0, x]) = exp(−λx),
and hence
F(x) = 1 − e^{−λx} and f(x) = λe^{−λx}, x > 0. (*)
The distribution in (*) is called the exponential distribution with mean waiting time b = 1/λ; in terms of b,
f(x | b) = (1/b) exp(−x/b), x > 0, b > 0.
Remark: Before we calculate the mean and variance of the exponential and gamma distributions, recall the gamma function:
Γ(t) = ∫_0^{∞} x^{t−1} e^{−x} dx, (**)
which satisfies Γ(t) = (t − 1)Γ(t − 1) and Γ(n) = (n − 1)! for a positive integer n. (***)
Also ∫_0^{∞} x^{t−1} e^{−λx} dx = Γ(t)/λ^t (****)
and Γ(1/2) = ∫_0^{∞} x^{−1/2} e^{−x} dx = √π. (*****)
For the exponential distribution with rate λ:
E(X) = λ ∫_0^{∞} x e^{−λx} dx = λΓ(2)/λ^2 = 1/λ
E(X^2) = λ ∫_0^{∞} x^2 e^{−λx} dx = λΓ(3)/λ^3 = 2/λ^2
V(X) = E(X^2) − (E(X))^2 = 2/λ^2 − 1/λ^2 = 1/λ^2
M_X(t) = 1/(1 − bt) = λ/(λ − t), for t < λ.
Proof: M_X(t) = E(e^{tX}) = ∫_0^{∞} e^{tx} λe^{−λx} dx = λ ∫_0^{∞} e^{−x(λ−t)} dx = λΓ(1)/(λ − t) = λ/(λ − t).
Properties:
1. Memoryless property: For given t > 0 and s > 0,
P(X > t + s | X > s) = P(X > t + s)/P(X > s) = e^{−λ(t+s)}/e^{−λs} = e^{−λt} = P(X > t).
2. Relation to the gamma distribution: Let Y1, Y2, …, Yr be independent random variables, each having an exponential distribution with mean 1/λ, λ > 0. An exponential random variable with mean 1/λ represents the waiting time until the first event to occur, while the gamma random variable X represents the waiting time until the rth event to occur. Therefore,
X = ∑_{i=1}^{r} Yi,
where Y1, Y2, …, Yr are independent exponential random variables with mean 1/λ, and the pdf of X is
f(x) = λ^r e^{−λx} x^{r−1}/Γ(r), x > 0, r > 0, λ > 0.
The distribution defined above is called the gamma distribution with shape parameter r and rate parameter λ. The gamma probability density plots in Figure 1 indicate that the degree of skewness decreases as the shape parameter r increases.
[Figure 1: gamma probability density functions for r = 1, 2 and 4, plotted against x.]
If the random variable X has a gamma distribution with parameters r and λ, then
E(X) = r/λ, V(X) = r/λ^2, m_X(t) = (λ/(λ − t))^r for t < λ.
Proof:
i. E(X) = (λ^r/Γ(r)) ∫_0^{∞} x · x^{r−1} e^{−λx} dx = (λ^r/Γ(r)) Γ(r + 1)/λ^{r+1} = r/λ
E(X^2) = (λ^r/Γ(r)) ∫_0^{∞} x^2 x^{r−1} e^{−λx} dx = (λ^r/Γ(r)) Γ(r + 2)/λ^{r+2} = r(r + 1)/λ^2
V(X) = E(X^2) − (E(X))^2 = r(r + 1)/λ^2 − (r/λ)^2 = r/λ^2 //
ii. m_X(t) = E(e^{tX}) = ∫_0^{∞} (λ^r/Γ(r)) e^{tx} x^{r−1} e^{−λx} dx = (λ^r/Γ(r)) ∫_0^{∞} x^{r−1} e^{−(λ−t)x} dx = (λ^r/Γ(r)) Γ(r)/(λ − t)^r = (λ/(λ − t))^r, for t < λ.
m'_X(t) = rλ^r (λ − t)^{−r−1} and m''_X(t) = r(r + 1)λ^r (λ − t)^{−r−2};
hence E(X) = m'_X(0) = r/λ and V(X) = E(X^2) − (E(X))^2 = m''_X(0) − (r/λ)^2 = r/λ^2.
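The waiting-time interpretation is easy to check by simulation; the sketch below (ours, assuming numpy is available) sums r independent exponential waiting times and compares the sample mean and variance with r/λ and r/λ^2.

```python
import numpy as np

rng = np.random.default_rng(0)
r, lam, reps = 4, 2.0, 200_000

# waiting time until the r-th event = sum of r independent exp(mean 1/lambda) times
x = rng.exponential(scale=1 / lam, size=(reps, r)).sum(axis=1)

print(x.mean(), r / lam)          # ~2.0  vs r/lambda
print(x.var(),  r / lam**2)       # ~1.0  vs r/lambda^2
```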
Beta Distribution: A random variable X has a beta distribution with parameters a and b if its pdf is
f(x; a, b) = (1/B(a, b)) x^{a−1} (1 − x)^{b−1} I_{(0,1)}(x), where a > 0 and b > 0,
and the beta function B(a, b) = Γ(a)Γ(b)/Γ(a + b) = ∫_0^1 x^{a−1} (1 − x)^{b−1} dx.
Consider a Poisson process with arrival rate of λ events per unit time. Let Wk denote the waiting time until the kth arrival of an event and Ws denote the waiting time until the sth arrival, s > k. The proportion of the time taken by the first k arrivals within the time needed for the first s arrivals is
Wk/Ws = Wk/(Wk + (Ws − Wk)) ~ beta(k, s − k).
Remark: The beta distribution reduces to the uniform distribution over (0, 1) if a=b=1.
Let x1, x2, …, xn be a sample from a beta distribution with shape parameters a and b. Let
x̄ = (1/n) ∑_{i=1}^{n} Xi and s^2 = (1/(n − 1)) ∑_{i=1}^{n} (Xi − x̄)^2.
Theorem: If the random variable X has a beta distribution with parameters a and b, then
E(X) = a/(a + b), V(X) = ab/((a + b + 1)(a + b)^2).
Proof: The kth raw moment is
E(X^k) = B(k + a, b)/B(a, b) = Γ(k + a)Γ(a + b)/(Γ(a)Γ(k + a + b)).
Thus, E(X) = Γ(1 + a)Γ(a + b)/(Γ(a)Γ(1 + a + b)) = (a/(a + b)) Γ(a)Γ(a + b)/(Γ(a)Γ(a + b)) = a/(a + b),
E(X^2) = Γ(2 + a)Γ(a + b)/(Γ(a)Γ(2 + a + b)) = (a + 1)a/((a + b + 1)(a + b)), and
Var(X) = (a + 1)a/((a + b + 1)(a + b)) − (a/(a + b))^2 = ab/((a + b + 1)(a + b)^2).
Weibull Distribution: Let Y be a random variable with pdf f(y) = e^{−y}, y > 0 (that is, Y ~ exp(1)), and put X = m + bY^{1/c} with b > 0 and c > 0.
The distribution of X is known as the Weibull distribution with shape parameter c, scale parameter b and location parameter m; its pdf is
f(x | b, c, m) = (c/b) ((x − m)/b)^{c−1} exp{−[(x − m)/b]^c}, x > m,
or, for m = 0,
f(x | b, c, m = 0) = (c/b) (x/b)^{c−1} exp{−[x/b]^c}, x > 0, b > 0, c > 0.
Applications:
The Weibull distribution is one of the important distributions in reliability theory. It is widely used to analyze the cumulative loss of performance of a complex system in systems engineering. In general, it can be used to describe data on the waiting time until an event occurs. In this manner, it is applied in risk analysis, actuarial science and engineering. Furthermore, the Weibull distribution contains the exponential distribution as the special case c = 1.
Mean of the Weibull (m = 0): E(X) = bΓ(1 + 1/c).
Proof:
E(X) = ∫_0^{∞} x (c/b)(x/b)^{c−1} exp{−[x/b]^c} dx. Let u = (x/b)^c, so that du = (c/b^c) x^{c−1} dx and x = bu^{1/c}.
Thus, E(X) = ∫_0^{∞} bu^{1/c} exp{−u} du = b ∫_0^{∞} u^{(c+1)/c − 1} exp{−u} du = bΓ((c + 1)/c) = bΓ(1 + 1/c).
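This matches scipy's Weibull implementation; a minimal check (ours, assuming scipy is available; scipy's weibull_min uses the same shape c and scale b with location 0):

```python
from math import gamma
from scipy import stats

b, c = 2.0, 1.5
W = stats.weibull_min(c, scale=b)          # Weibull with shape c, scale b, m = 0

print(W.mean(), b * gamma(1 + 1 / c))      # both ~1.805
```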
Remark: Let X be a Weibull(b, c, m) random variable. Then ((X − m)/b)^c ~ exp(1), the standard exponential distribution.
Cauchy Distribution: The probability density function of a Cauchy distribution with the location parameter a and the scale parameter b is
f(x; a, b) = 1/(πb[1 + ((x − a)/b)^2]), −∞ < x < ∞, b > 0.
Note that:
If the r.v X follows a Cauchy distribution, its mean and variance do not exist.
It can be postulated as a model for describing data that arise as n realizations of the ratio of two normal random variables. If X and Y are independent standard normal random variables, then U = X/Y follows the Cauchy distribution with b = 1 and a = 0.
CHAPTER THREE
Random variable(r.v) is simply a function defined on a sample space S and taking values in real
line R = (−∞, ∞). In the study of many random experiments, there are, or can be, more than one
r.v of interest; hence we are compelled to extend our definitions of the distribution and density
function of one r.v to those of several r.vs. For example, during the “health awareness week” we
may consider a population of university students and record randomly selected student’s
height(X1), weight (X2), age (X3) and blood pressure (X4). Each individual r.v may be assumed
to follow some appropriate probability distribution in the population, but it may also be quite reasonable to assume that the four variables together follow a certain joint probability distribution.
Let X1, X2, …, Xk be k random variables all defined on the same probability space.
i. The joint probability mass function (pmf) of x = (x1, x2, …, xk) is given by
p(x) = p(x1, x2, …, xk) = P(X1 = x1, X2 = x2, …, Xk = xk),
such that p(x) ≥ 0 and ∑_x p(x) = 1.
ii. The joint probability density function (pdf) of x = (x1, x2, …, xk) is given by
f(x) = f(x1, x2, …, xk) for x = (x1, x2, …, xk), such that f(x) ≥ 0 and ∫⋯∫ f(x1, …, xk) dx1 ⋯ dxk = 1.
iii. The joint cumulative distribution function of X1, X2, …, Xk, denoted by F_{X1,…,Xk}(x1, x2, …, xk), is
F_{X1,…,Xk}(x1, x2, …, xk) = P(X1 ≤ x1, …, Xk ≤ xk) = ∫_{−∞}^{x1} ⋯ ∫_{−∞}^{xk} f_{X1,…,Xk}(u1, u2, …, uk) du1 ⋯ duk for all (x1, x2, …, xk), in the continuous case.
iv. Marginal cumulative distribution functions: If F_{X1,…,Xk}(x1, x2, …, xk) is the joint cumulative distribution function of X1, X2, …, Xk, then the cumulative distribution function of any one of the variables (its marginal cdf) is obtained by letting the remaining arguments tend to infinity, for example F_{X1}(x1) = F_{X1,…,Xk}(x1, ∞, …, ∞).
For example, let (X, Y) be two discrete or continuous random variables assuming all values in some region (range space). The joint probability density function f is a function satisfying the following conditions.
X, Y discrete: f(x, y) ≥ 0 for all x, y and ∑_x ∑_y f(x, y) = 1
X, Y continuous: f(x, y) ≥ 0 for all x, y and ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1
Conditional distributions: Let X and Y be two random variables with joint probability function f(x, y). Then the conditional probability function of
a. Y given X = x is f(y | X = x) = f(x, y)/f_X(x), and
b. X given Y = y is f(x | Y = y) = f(x, y)/f_Y(y).
Let X1, X2, …, Xk be k random variables, discrete or continuous, with joint probability function f(x1, x2, …, xk) and marginal probability functions f1(x1), …, fk(xk) respectively. The random variables X1, X2, …, Xk are said to be mutually independent if and only if f(x1, x2, …, xk) = f1(x1) · f2(x2) ⋯ fk(xk) for all (x1, x2, …, xk) within their range.
Example 1: Let X and Y be two random variables whose joint probability mass function is given in a table, with Y taking the values −2, −1, 1 and 2, together with the marginal column P(x), the cumulative column P(X ≤ x) and the cumulative row P(Y ≤ y). Among the quantities to be computed:
c. P(X | Y = 2) and P(Y | X = 0)
Example 2: Let (X, Y) have the distribution defined by the density function
c. P(X + Y ≤ 4)
Answer: c. 1 − 5e^{−4}
Example 3: The probability density function of a two-dimensional random variable (X, Y) is given by
f(x, y) = x + y for 0 < x + y < 1, and f(x, y) = 0 otherwise.
Find P(X < 1/2, Y > 1/4).
Solution:
[Sketch: the region of integration is bounded by the lines x = 1/2, y = 1/4 and x + y = 1.]
P(X < 1/2, Y > 1/4) = ∫_0^{1/2} ∫_{1/4}^{1−x} (x + y) dy dx = ∫_0^{1/2} [xy + y^2/2] from y = 1/4 to y = 1 − x dx
= ∫_0^{1/2} [x(1 − x) + (1 − x)^2/2 − x/4 − 1/32] dx
= ∫_0^{1/2} [15/32 − x^2/2 − x/4] dx = 35/192 //
Conditional Expectation: Let X and Y be jointly distributed random variables with conditional densities g(x | y) and g(y | x). The conditional expectations are defined as follows:
E(X | Y) = ∫_{−∞}^{∞} x g(x | y) dx and E(Y | X) = ∫_{−∞}^{∞} y g(y | x) dy, where X and Y are continuous r.v.s (sums replace integrals in the discrete case).
Example: Let X and Y have the joint density
f(x, y) = e^{−x/y} e^{−y}/y for 0 < x < ∞, 0 < y < ∞, and f(x, y) = 0 elsewhere.
Compute E(X | Y).
Solution: E(X | Y) = ∫_{−∞}^{∞} x f(x | y) dx, where f(x | y) = f(x, y)/f(y) and f(y) = ∫_{−∞}^{∞} f(x, y) dx.
f(y) = ∫_0^{∞} (e^{−x/y} e^{−y}/y) dx = (e^{−y}/y) ∫_0^{∞} e^{−x/y} dx. Let u = x/y, so du = dx/y and dx = y du. Then
f(y) = (e^{−y}/y) ∫_0^{∞} e^{−u} y du = e^{−y} (−e^{−u}) from 0 to ∞ = e^{−y}(0 − (−e^0)) = e^{−y}, y > 0.
f(x | y) = f(x, y)/f(y) = (e^{−x/y} e^{−y}/y)/e^{−y} = e^{−x/y}/y for 0 < x < ∞, 0 < y < ∞, and 0 otherwise.
Thus, E(X | Y = y) = ∫_0^{∞} x f(x | y) dx = ∫_0^{∞} (x/y) e^{−x/y} dx. Let z = x/y, so dz = dx/y and dx = y dz. Then
E(X | Y = y) = ∫_0^{∞} z e^{−z} y dz = y ∫_0^{∞} z e^{−z} dz = yΓ(2) = y,
since Γ(a) = ∫_0^{∞} x^{a−1} e^{−x} dx and Γ(2) = 1! = 1.
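The result E(X | Y = y) = y can be confirmed by numerical integration; a small sketch (ours, assuming scipy is available):

```python
import numpy as np
from scipy.integrate import quad

def cond_density(x, y):
    """Conditional density f(x | y) = exp(-x/y)/y derived in the example."""
    return np.exp(-x / y) / y

y = 2.5
EX_given_y, _ = quad(lambda x: x * cond_density(x, y), 0, np.inf)
print(EX_given_y)      # ~2.5, i.e. E(X | Y = y) = y
```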
Chapter Four
Sampling Distribution
4.1 Introduction
In statistics, a population is a collection of objects whose elements are examined in view of their individual characteristics; a sample is a part of the population that is actually observed, and a conclusion is drawn about the population based on its results. In the continuous case population values are usually values of identically distributed random variables, whose distribution we refer to as the population distribution.
Definition: If X1, X2, …, Xn are independently and identically distributed random variables,
we say that they constitute a random sample from the infinite population given by their
common distribution.
If f(x1, x2, …, xn) is the value of the joint distribution of such a set of random variables at (x1, x2, …, xn), we can write f(x1, x2, …, xn) = ∏_{i=1}^{n} f(xi), where f(xi) is the value of their common distribution at xi.
x̄ = ∑ xi/n is called the sample mean and s^2 = ∑(xi − x̄)^2/(n − 1) is called the sample variance.
The Distribution of the Mean: Since statistics are random variables, their values will vary from sample to sample, and their probability distributions are referred to as sampling distributions.
Example: If x1, x2, …, xn constitute a random sample from an infinite population with mean μ and variance σ^2, then E(x̄) = μ and var(x̄) = σ^2/n.
Proof: Let Y = x̄ = ∑ xi/n. Then E(Y) = E(∑ xi/n) = ∑ E(xi)/n = (1/n) ∑ μ = (1/n) nμ = μ.
Var(Y) = var(∑ xi/n) = (1/n^2) var(∑ xi) = (1/n^2) ∑ var(Xi) + (2/n^2) ∑∑_{i<j} cov(xi, xj).
Since the observations are independent, the covariance terms are zero, so var(x̄) = (1/n^2) ∑ σ^2 = nσ^2/n^2 = σ^2/n.
Note: The standard deviation of the sample mean is called the standard error of the mean.
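A short simulation (ours, assuming numpy is available) illustrates the sampling distribution of the mean and its standard error:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 10.0, 3.0, 25, 100_000

# draw many samples of size n and record each sample mean
xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

print(xbar.mean(), mu)                    # E(xbar) = mu
print(xbar.std(),  sigma / np.sqrt(n))    # standard error = sigma / sqrt(n)
```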
There are basically two ways of making inferences (generalization) in science. These are
- Deductive inference
- Inductive Inference
There are generally two closely related concepts used to make inferences about a population: estimation of parameters and testing of hypotheses. An estimator is a statistic, computed from the sample, that is used to estimate a population parameter.
Example: μ̂ = x̄ = ∑ xi/n is an estimator of the population mean μ.
One function of statistics is the provision of techniques for making inductive inferences and for measuring the degree of uncertainty of such inferences. That uncertainty is measured in terms of probability.
To make inferences about a population parameter, we should select the sample observations at random. If the sampling is performed without replacement, there are C(N, n) different samples; and if the sample is selected with replacement, there are N^n different samples, each of size n.
Distribution of the Sample
Let x1, x2, …, xn denote a sample of size n. The distribution (probability) of the sample (x1, x2, …, xn) is the joint distribution f_{X1,…,Xn}(x1, x2, …, xn).
Example 1: Let x1, x2, …, xn be a random sample from a Bernoulli distribution, i.e. xi ~ Bernoulli(p). Find the distribution of the sample.
Solution: f_{X1,…,Xn}(x1, x2, …, xn) = f_{X1}(x1) f_{X2}(x2) ⋯ f_{Xn}(xn) = ∏_{i=1}^{n} p^{xi}(1 − p)^{1−xi} = p^{∑xi}(1 − p)^{∑(1−xi)} = p^{∑xi}(1 − p)^{n−∑xi}
Example 2: Let x1, x2, …, xn be a random sample from the Poisson distribution. Find the distribution of the sample f_{X1,…,Xn}(x1, x2, …, xn).
Here f(xi) = e^{−λ} λ^{xi}/xi!, xi = 0, 1, 2, …
Thus, f_{X1,…,Xn}(x1, x2, …, xn) = ∏_{i=1}^{n} e^{−λ} λ^{xi}/xi! = e^{−nλ} λ^{∑xi}/∏_{i=1}^{n} xi!
Chi-Square Distribution
Let X1, X2, …, Xn be independent standard normal random variables. The distribution of X = ∑_{i=1}^{n} Xi^2 is called the chi-square distribution with n degrees of freedom (df), and its pdf is
f(x | n) = (1/(2^{n/2} Γ(n/2))) e^{−x/2} x^{n/2 − 1}, x > 0, n > 0.
Applications:
The chi-square distribution is also called the variance distribution, because the variance of a random sample from a normal distribution follows a chi-square distribution. Specifically, if X1, …, Xn is a random sample from a normal distribution with mean μ and variance σ^2, then
∑_{i=1}^{n} (xi − x̄)^2/σ^2 = (n − 1)S^2/σ^2 ~ χ^2_{n−1}.
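The claim is easy to check by simulation; the sketch below (ours, assuming numpy and scipy are available) compares (n − 1)S^2/σ^2 computed from many normal samples with the χ^2 distribution with n − 1 df.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
mu, sigma, n, reps = 5.0, 2.0, 10, 50_000

samples = rng.normal(mu, sigma, size=(reps, n))
stat = (n - 1) * samples.var(axis=1, ddof=1) / sigma**2   # (n-1)S^2 / sigma^2

print(stat.mean(), n - 1)          # chi-square(n-1) mean is n - 1
print(stat.var(), 2 * (n - 1))     # chi-square(n-1) variance is 2(n - 1)
print(stats.kstest(stat, stats.chi2(n - 1).cdf).pvalue)   # should not be tiny
```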
In categorical data analysis, when the data consist of an r × c contingency table, the usual test statistic is
T = ∑_{i=1}^{r} ∑_{j=1}^{c} (Oij − Eij)^2/Eij ~ χ^2_{(r−1)(c−1)},
where Oij and Eij denote, respectively, the observed and expected cell frequencies. The null hypothesis of independent attributes will be rejected for large values of T.
The chi-square statistic T can also be used to test whether a frequency distribution fits a specific model.
Properties:
i. If the random variables Xi, i = 1, 2, …, k are normally and independently distributed with means μi and variances σi^2, then U = ∑_{i=1}^{k} Zi^2 ~ χ^2_k, where Zi = (Xi − μi)/σi.
Proof: Write Zi = (Xi − μi)/σi. Then Zi has a standard normal distribution. Now
M_U(t) = E(e^{tU}) = E(e^{t ∑ Zi^2}) = E(∏_{i=1}^{k} e^{tZi^2}) = ∏_{i=1}^{k} E(e^{tZi^2}), since the Zi's are independent.
But E(e^{tZi^2}) = ∫_{−∞}^{∞} e^{tz^2} (1/√(2π)) e^{−z^2/2} dz = ∫_{−∞}^{∞} (1/√(2π)) e^{−(1/2)(1−2t)z^2} dz
= (1/√(1 − 2t)) [∫_{−∞}^{∞} (√(1 − 2t)/√(2π)) e^{−(1/2)(1−2t)z^2} dz] = 1/√(1 − 2t), for t < 1/2,
since the expression in the brackets is the integral of a normal density with mean 0 and variance 1/(1 − 2t), and therefore equals 1.
Hence, E(e^{tZi^2}) = (1/(1 − 2t))^{1/2}, which shows that Zi^2 ~ χ^2(1), and
∏_{i=1}^{k} E(e^{tZi^2}) = (1/(1 − 2t))^{k/2} for t < 1/2, which is the mgf of the χ^2_k distribution.
ii. If z1, z2, …, zn are a random sample from a standard normal distribution, then
a) z̄ has a normal distribution with mean 0 and variance 1/n.
Here z̄ = ∑_{i=1}^{n} Zi/n = ∑_{i=1}^{n} (Xi − μ)/(nσ) = (x̄ − μ)/σ,
so E(z̄) = 0 and var(z̄) = var((x̄ − μ)/σ) = var(x̄)/σ^2 = (σ^2/n)/σ^2 = 1/n.
Consider the case n = 2.
iii. If x1, x2, …, xn is a random sample from N(μ, σ^2), then (n − 1)s^2/σ^2 ~ χ^2(n − 1).
Proof:
iv. If X and Y are independent chi-square random variables with k and m degrees of freedom, then X + Y ~ χ^2_{k+m}.
Proof:
M_{X+Y}(t) = E(e^{t(X+Y)}) = E(e^{tX + tY}) = E(e^{tX}) E(e^{tY}), since X and Y are independent.
M_{X+Y}(t) = (1 − 2t)^{−k/2} (1 − 2t)^{−m/2} = (1 − 2t)^{−(k+m)/2}, which is the moment generating function of a chi-square distribution with k + m degrees of freedom.
More generally, if X1, X2, …, Xk are independent chi-square random variables with degrees of freedom n1, n2, …, nk, then
∑_{i=1}^{k} Xi ~ χ^2_m, where m = ∑ ni.
v. The gamma distribution with shape parameter r and rate parameter λ specializes to the chi-square distribution with df = n when r = n/2 and λ = 1/2. That is, gamma(n/2, 1/2) = χ^2_n, so that
E(X) = r/λ = (n/2)/(1/2) = n and Var(X) = r/λ^2 = (n/2) ÷ (1/2)^2 = (n/2) × 4 = 2n //
2 𝜆2 2 2
𝑚𝑣 2
Example: Suppose the velocity (V) of an object has N (0, 1). Let k = be the kinetic energy
2
of an object.
𝑚𝑣 2
b. Find the pdf of k = 2
1 2
1
𝑒 −2𝑣 𝑠𝑖𝑛𝑐𝑒 𝑉~ 𝑁(0, 1)
√2𝜋
𝑑ℎ−1 (𝑦)
→ g(y) = {𝑓(ℎ−1 (𝑦)) + 𝑓(ℎ−1 (𝑦))} | |
𝑑𝑦
1
→ g(y) = {𝑓(√𝑦) + 𝑓(−√𝑦)} |2 𝑦|
√
1 2 1 2
1 1 1
Thus, g(y) = 2 { 𝑒 −2(√𝑦) + 𝑒 −2(−√𝑦) }
√ 𝑦 √2𝜋 √2𝜋
1 1
1 − 𝑦 −
1 2 − (√𝑦)2 𝑒 2 𝑦 2 1
g(y) = { 𝑒 2 } = where Γ (2) = √Π
2√𝑦 √2𝜋 √2𝜋√𝜋
44
Statistical Theory of Distribution
1 1
− 𝑦 −
𝑒 2 𝑦 2
→ 𝑔(𝑦) = 1 ~ 𝑋 2 (1) 𝑜𝑟 𝑋 2 (𝑛 = 1)
√2𝜋Γ(2)
𝑛 −1/2
𝑒
𝑥2
Because f(x) = 𝑛 1 ~ 𝑋 2 (𝑛)
2 2 22
𝑚𝑣 2
b. Find the pdf of K = mV^2/2.
Solution:
f_V(v) = (1/√(2π)) e^{−v^2/2}, −∞ < v < ∞.
For k > 0, v = h^{−1}(k) = ±√(2k/m), and
|dh^{−1}(k)/dk| = d(√(2k/m))/dk = √(2/m) (1/(2√k)) = 1/√(2mk).
g(k) = {f(√(2k/m)) + f(−√(2k/m))} |dh^{−1}(k)/dk| = {(1/√(2π)) e^{−(1/2)(2k/m)} + (1/√(2π)) e^{−(1/2)(2k/m)}} (1/√(2mk))
= (2/√(2π)) e^{−k/m} (1/√(2mk)) = (2 e^{−k/m})/(2√(πmk)) = (1/m)^{1/2} k^{−1/2} e^{−k/m}/√π
→ g(k) = (1/m)^{1/2} k^{1/2 − 1} e^{−k/m}/Γ(1/2) ~ gamma(1/2, 1/m), that is, r = 1/2 and λ = 1/m.
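The derived distribution can be verified by simulation; the sketch below (ours, assuming numpy and scipy are available; scipy's gamma uses shape a and scale 1/λ) compares simulated kinetic energies with gamma(1/2, 1/m).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
m = 4.0
v = rng.standard_normal(100_000)
k = m * v**2 / 2                          # kinetic energy

# derived result: K ~ gamma(r = 1/2, lambda = 1/m), i.e. shape 1/2 and scale m
print(stats.kstest(k, stats.gamma(a=0.5, scale=m).cdf).pvalue)   # should not be tiny
print(k.mean(), m / 2)                    # gamma mean r/lambda = m/2
```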
4.3.