
Chapter 2. Elements of Probability and Statistics



2.1. Introduction
Probability and Statistics are important for:
• input modeling
• generation of random samples
• validation of simulation models
• performance of statistical analyses of outputs
• design of experiments

2.2. Probability
• Experiment:
Any activity with an uncertain outcome.
• Sample Space:
The set of all possible outcomes of an experiment.
• Discrete Sample Space:
It has finitely many or a countable infinity of elements.
• Continuous Sample Space:
Its elements (points) constitute a continuum.
• Event:
Any subset of a sample space is called an event.

• Impossible Event:
The empty set ∅ is called the impossible event.
• Mutually Exclusive Events:
Two events A1 and A2 are said to be mutually exclusive, if and
only if,
A1 ∩ A2 = ∅.
Events A1, A2, A3, . . . are mutually exclusive, if and only if,
Ai ∩ Aj = ∅ for i ≠ j.

• Probability of an Event:
The probability of an event (or outcome) is the proportion of times
the event occurs in a long run of repeated experiments.

• Axioms and Basic Properties of Probabilities:


◦ Axiom 1: If S is the sample space, then P (S) = 1.
◦ Axiom 2: If A is an event in S, then 0 ≤ P (A) ≤ 1.
◦ Axiom 3: If A and B are mutually exclusive, then
P (A ∪ B) = P (A) + P (B).

◦ If A and B are any events in S, then


P (A ∪ B) = P (A) + P (B) − P (A ∩ B).

◦ If A and Ā are complementary events in S, then


P (Ā) = 1 − P (A).

◦ P (∅) = 0 for any sample space S.



◦ If A is an event in a finite sample space S, then P (A) equals
the sum of the probabilities of the individual outcomes
comprising A.
• Conditional Probability:
If A and B are any events in a sample space S and P (B) ≠ 0, the
conditional probability of A given B is
P (A|B) = P (A ∩ B) / P (B).
• Independent Events:
Two events A and B are independent, if and only if,
P (A ∩ B) = P (A) P (B),
or, equivalently,
P (A|B) = P (A).

• Rule of Total Probability:
If B1, B2, . . . , Bn are mutually exclusive events of which one must
occur, then for any event A in S,
P (A) = ∑_{i=1}^{n} P (Bi) P (A|Bi).

• Bayes’ Theorem:
If B1, B2, . . . , Bn are mutually exclusive events of which one must
occur, then for any event A in S such that P (A) ≠ 0,
P (Br |A) = P (A ∩ Br ) / P (A) = P (Br ) P (A|Br ) / ∑_{i=1}^{n} P (Bi) P (A|Bi), for r = 1, . . . , n.
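
A quick numerical illustration (added here; the probabilities below are made up and not part of the original notes), applying the two results above in Python:

# Hypothetical prior and conditional probabilities, for illustration only.
prior = [0.5, 0.3, 0.2]          # P(B1), P(B2), P(B3): mutually exclusive and exhaustive
likelihood = [0.10, 0.20, 0.40]  # P(A|B1), P(A|B2), P(A|B3)

# Rule of total probability: P(A) = sum over i of P(Bi) P(A|Bi)
p_a = sum(p * l for p, l in zip(prior, likelihood))

# Bayes' theorem: P(Br|A) = P(Br) P(A|Br) / P(A)
posterior = [p * l / p_a for p, l in zip(prior, likelihood)]
print(p_a, posterior)            # the posterior probabilities sum to 1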

2.3. Random Variables


• Random Variables:
A random variable is a real-valued function of the outcome of an
experiment, defined on the sample space.
• Discrete versus Continuous Random Variables:
◦ Discrete — can assume at most a finite or a countably infinite
number of possible values.
◦ Continuous — can take on any real value in some range.

2.4. PDFs and CDFs


2.4.1. Discrete Case
• Let X be a discrete random variable.
• For X with possible values x1, . . . , xn, a probability distribution
function (PDF) (also called probability mass function (PMF)) is a
function such that
◦ f (xi) = P (X = xi) ≥ 0
◦ ∑_{i=1}^{n} f (xi) = 1.

• The cumulative distribution function (CDF) (or just distribution
function) of X is defined by
F (x) = P (X ≤ x) = ∑_{xi ≤ x} f (xi).

• Properties:
◦ F (−∞) = 0
◦ F (∞) = 1
◦ If a < b, then F (a) ≤ F (b) for any real numbers a and b

2.4.2. Continuous Case


• Let X be a continuous random variable.
• A function with values f (x), defined over the set of all real numbers,
is called a probability density function (PDF) of X, if and only if,
◦ f (x) ≥ 0, for −∞ < x < ∞
◦ ∫_{−∞}^{∞} f (x) dx = 1
◦ P (a ≤ X ≤ b) = ∫_{a}^{b} f (x) dx, for a ≤ b.
• In general,
◦ P (X = x) = 0, for −∞ < x < ∞
◦ f (x) ≠ P (X = x).

• The cumulative density function (CDF) (also called cumulative
distribution function or just distribution function) of X:
F (x) = P (X ≤ x) = ∫_{−∞}^{x} f (t) dt, for −∞ < x < ∞.

• If f (x) and F (x) are, respectively, values of the PDF and the CDF
of X at x, then
P (a ≤ X ≤ b) = F (b) − F (a)
for any real constants a and b with a ≤ b, and
f (x) = dF (x)/dx
wherever the derivative exists.
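
As a numerical illustration of the PDF/CDF relationship (a sketch added here, not from the slides; the exponential distribution and its parameter are arbitrary choices), the following checks that P (a ≤ X ≤ b) = F (b) − F (a) = ∫_{a}^{b} f (x) dx using scipy.stats:

from scipy import stats
from scipy.integrate import quad

X = stats.expon(scale=2.0)            # exponential distribution with mean 2 (illustrative)
a, b = 0.5, 3.0

prob_from_cdf = X.cdf(b) - X.cdf(a)   # F(b) - F(a)
prob_from_pdf, _ = quad(X.pdf, a, b)  # numerical integral of the PDF over [a, b]
print(prob_from_cdf, prob_from_pdf)   # the two values agree up to numerical error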

2.5. Expectations, Moments, and Variances
• The expected value of X:
µ ≡ E(X) = ∑_x x f (x)   (discrete case)
µ ≡ E(X) = ∫_{−∞}^{∞} x f (x) dx   (continuous case)

• Informally speaking, the expected value of X can be thought of as


the “central value” of X or the long-run theoretical average value
of X.

• The expected value of the random variable g(X):
E[g(X)] = ∑_x g(x) f (x)   (discrete case)
E[g(X)] = ∫_{−∞}^{∞} g(x) f (x) dx   (continuous case)

• If a and b are constants, then


E(aX + b) = aE(X) + b.

• If c1, c2, . . . , cn are constants, then
E[∑_{i=1}^{n} ci gi(X)] = ∑_{i=1}^{n} ci E[gi(X)].

• The r-th moment about the origin of X:
µ′_r = E(X^r), for r = 0, 1, 2, . . .
• The r-th moment about the mean of X:
µ_r = E[(X − µ)^r], for r = 0, 1, 2, . . .
• µ_2 = E[(X − µ)²] is called the variance of X, and it is denoted
by σ² or Var(X). The positive square root of the variance is called
the standard deviation of X.
• Variance measures the variability or dispersion of the random
variable on the real line.
• Var(X) = E(X²) − [E(X)]²
• If a and b are constants, then
Var(aX + b) = a² Var(X).
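
A small numerical sketch (added for illustration; the PMF below is arbitrary) of the expectation and variance identities above:

import numpy as np

x = np.array([0, 1, 2, 3])                 # possible values (hypothetical)
f = np.array([0.1, 0.4, 0.3, 0.2])         # PMF values, summing to 1

mean = np.sum(x * f)                       # E(X) = sum of x f(x)
var = np.sum((x - mean) ** 2 * f)          # E[(X - mu)^2]
var_alt = np.sum(x ** 2 * f) - mean ** 2   # E(X^2) - [E(X)]^2
print(mean, var, var_alt)                  # the two variance computations agree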

2.6. Joint PDFs, Joint CDFs, and Marginal PDFs
2.6.1. Discrete Case
• Let X and Y be discrete random variables.
• The joint probability distribution function (joint PDF) of X and Y :
fX,Y (x, y) = P (X = x, Y = y)
for each pair of values (x, y) within the range of X and Y .
• Properties:
◦ fX,Y (x, y) ≥ 0 for each pair of values (x, y) within its domain
◦ ∑_x ∑_y fX,Y (x, y) = 1.

• The joint cumulative distribution function (joint CDF) of X and Y :
FX,Y (x, y) = P (X ≤ x, Y ≤ y) = ∑_{s ≤ x} ∑_{t ≤ y} fX,Y (s, t)
for −∞ < x < ∞ and −∞ < y < ∞.


• The marginal probability distribution function (marginal PDF) of X:
fX (x) = P (X = x) = ∑_y fX,Y (x, y)
for each x within the range of X.

2.6.2. Continuous Case


• Let X and Y be continuous random variables.
• A function with values fX,Y (x, y), defined over the xy-plane, is
called the joint probability density function (joint PDF) of X and Y ,
if and only if,
P [(X, Y ) ∈ A] = ∬_A fX,Y (x, y) dx dy
for any region A in the xy-plane.
• Properties:
◦ fX,Y (x, y) ≥ 0, for −∞ < x < ∞ and −∞ < y < ∞;
◦ ∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y (x, y) dx dy = 1.

• The joint cumulative density function (joint CDF) of X and Y :
FX,Y (x, y) = P (X ≤ x, Y ≤ y) = ∫_{−∞}^{y} ∫_{−∞}^{x} fX,Y (s, t) ds dt
for −∞ < x < ∞ and −∞ < y < ∞.



• fX,Y (x, y) = ∂² FX,Y (x, y) / ∂x ∂y
wherever these partial derivatives exist.
• The marginal probability density function (marginal PDF) of X:
fX (x) = ∫_{−∞}^{∞} fX,Y (x, y) dy, for −∞ < x < ∞.

2.7. Expectations, Product Moments, Covariances, and Correlations
• The expected value of the random variable g(X, Y ):
E[g(X, Y )] = ∑_x ∑_y g(x, y) fX,Y (x, y)   (disc. case)
E[g(X, Y )] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) fX,Y (x, y) dx dy   (cont. case)

• If a and b are constants, then


E(aX + bY ) = a E(X) + b E(Y )
• If c1, . . . , ck are constants, then
E[∑_{i=1}^{k} ci gi(X, Y )] = ∑_{i=1}^{k} ci E[gi(X, Y )]

• The r-th and s-th product moment about the origin of X and Y :
µ′_{r,s} = E(X^r Y^s),
for r = 0, 1, 2, . . . and s = 0, 1, 2, . . .
• µ′_{1,0} = E(X) = µX, and µ′_{0,1} = E(Y ) = µY .
• The r-th and s-th product moment about the respective means of
X and Y :
µ_{r,s} = E[(X − µX )^r (Y − µY )^s],
for r = 0, 1, 2, . . . and s = 0, 1, 2, . . .
• If µX and µY exist, then µ_{2,0} = σ²_{X} and µ_{0,2} = σ²_{Y}.
• µ_{1,1} = E[(X − µX )(Y − µY )] is called the covariance between X
and Y , and it is denoted by σX,Y or Cov(X, Y ).

• Covariance is a measure of linear relationship between X and Y .


◦ If large(small) X tends to go with large(small) Y , then σX,Y > 0
◦ If large(small) X tends to go with small(large) Y , then σX,Y < 0
◦ If there is no tendency for X and Y to occur jointly in agreement
or disagreement over being big or small, then σX,Y = 0
• Cov(X, Y ) = E(XY ) − E(X) E(Y ).
• If a and b are constants, then
Var(aX + bY ) = a² Var(X) + b² Var(Y ) + 2ab Cov(X, Y )

• Interpreting the value of a covariance is difficult since it depends on
the units of measurement.

• The correlation between X and Y :
ρX,Y = σX,Y / (σX σY ).
◦ Has same sign as covariance
◦ Always between −1 and +1
◦ Numerical value does not depend on units of measurement
◦ Dimensionless — universal interpretation
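
A numerical sketch (added here; the joint PMF is arbitrary) of the covariance and correlation formulas in this section:

import numpy as np

# Hypothetical joint PMF f_{X,Y}(x, y) on x in {0, 1} and y in {0, 1, 2}; entries sum to 1.
x_vals = np.array([0.0, 1.0])
y_vals = np.array([0.0, 1.0, 2.0])
f_xy = np.array([[0.10, 0.20, 0.10],
                 [0.15, 0.25, 0.20]])

f_x = f_xy.sum(axis=1)                           # marginal PMF of X
f_y = f_xy.sum(axis=0)                           # marginal PMF of Y
mu_x = np.sum(x_vals * f_x)
mu_y = np.sum(y_vals * f_y)

e_xy = np.sum(np.outer(x_vals, y_vals) * f_xy)   # E(XY) = sum over x, y of x y f(x, y)
cov = e_xy - mu_x * mu_y                         # Cov(X, Y) = E(XY) - E(X) E(Y)
sd_x = np.sqrt(np.sum((x_vals - mu_x) ** 2 * f_x))
sd_y = np.sqrt(np.sum((y_vals - mu_y) ** 2 * f_y))
rho = cov / (sd_x * sd_y)                        # correlation, always between -1 and +1
print(cov, rho)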

2.8. Independence
• Let X and Y be continuous/discrete random variables.
• X and Y are independent, if and only if,
fX,Y (x, y) = fX (x) fY (y)
for all (x, y) within their range.
• Independent ⇒ σX,Y = 0
Independent ⇍ σX,Y = 0
• Independent ⇒ E(XY ) = E(X) E(Y )
Independent ⇍ E(XY ) = E(X) E(Y )

2.9. Sampling Distributions


2.9.1. Basic
• Population:
A population consists of the totality of the observations with which
we are concerned.
• Sample:
When the population is too large to study in its entirety, or the
techniques used in the study are destructive in nature, in either case
we must depend on a subset or “sample” of observations from
the population to help us make inferences concerning that same
population.

• Independent and Identically Distributed (IID)


Random Variables:
All random variables have the same PDF and all are mutually
independent.
• Random Sample:
If X1, X2, . . . , Xn are IID random variables, we say that they
constitute a random sample of size n from the infinite population
given by their common PDF f (x).
• Statistic:
A statistic is a random variable which is a function of a random
sample X1, X2, . . . , Xn.
• Sampling Distributions:
Distributions of statistics are referred to as sampling distributions.

2.9.2. Sample Mean


• Let X1, X2, . . . , Xn constitute a random sample of size n from an
infinite population with the mean µ and the variance σ 2.
• Sample Mean:
X̄ = (1/n) ∑_{j=1}^{n} Xj

• µX̄ ≡ E(X̄) = µ

• σ²_{X̄} ≡ Var(X̄) = σ²/n
• σ_{X̄} (the positive square root of σ²_{X̄}) is called the standard error of
the mean.

• If Xj ∼ N (µ, σ²), then
X̄ ∼ N (µ, σ²/n)
regardless of the size of n.
• Central Limit Theorem:
The statistic (called the standardized sample mean)
Z = (X̄ − µ) / (σ/√n)
is a random variable whose PDF approaches that of the standard
normal distribution as n → ∞.
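
A simulation sketch (added for illustration; the exponential population and the sample size are arbitrary choices) of the Central Limit Theorem:

import numpy as np

rng = np.random.default_rng(0)
beta = 2.0                                # exponential population: mean = beta, std = beta
n, reps = 30, 10_000                      # sample size and number of replications

samples = rng.exponential(scale=beta, size=(reps, n))
xbar = samples.mean(axis=1)
z = (xbar - beta) / (beta / np.sqrt(n))   # standardized sample means
print(z.mean(), z.std())                  # close to 0 and 1, as the CLT suggests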

2.9.3. Sample Variance


• Let X1, X2, . . . , Xn constitute a random sample of size n from an
infinite population with the mean µ and the variance σ 2. Let X̄ be
the sample mean.
• The statistic
S² = [1/(n − 1)] ∑_{j=1}^{n} (Xj − X̄)²
is called the sample variance (or the variance of the random sample).
• The statistic S (the positive square root of S²) is called the
sample standard deviation (or the standard deviation of the random
sample).
• E(S²) = σ²

• If the population is normal, the statistic
χ² = (n − 1)S² / σ²
has the chi-square distribution with (n − 1) degrees of freedom.
• If the population is normal, the statistic
T = (X̄ − µ) / (S/√n)
has the t distribution with (n − 1) degrees of freedom.
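
A quick simulation sketch (illustrative; the normal population and sample size are arbitrary) of the unbiasedness property E(S²) = σ²:

import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 5.0, 2.0, 10, 20_000   # hypothetical population and sample size

samples = rng.normal(mu, sigma, size=(reps, n))
s2 = samples.var(axis=1, ddof=1)            # sample variance with the 1/(n - 1) divisor
print(s2.mean(), sigma ** 2)                # the average of S^2 is close to sigma^2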

2.9.4. Sample Proportion


• If X has a binomial distribution with parameters (n, p), then the
sample proportion
p̂ = X/n,
namely, the proportion of the time that the event actually occurs
in n trials, is an unbiased estimator of the parameter p.
• E(p̂) = p

2.10. Confidence Intervals


2.10.1. Basic
• An interval estimate for a population parameter is called a
confidence interval.
• An interval estimate for an unknown parameter θ is an interval of
the form
l < θ < u,
where the end-points l and u are computed from the sample data.
• Because different samples will produce different values of l and u,
these end-points are values of the random variables L and U ,
respectively.

• Let Θ̂ be a point estimator for θ. Based on the sampling distribution


of Θ̂, we can choose l and u such that the following probability
statement is true:
P (L < θ < U ) = 1 − α
where 0 < α < 1.
• Such an interval l < θ < u, computed for a particular sample, is
called a 100(1 − α)% confidence interval, the fraction (1 − α) is
called the confidence coefficient or the degree of confidence, and
the end-points l and u are called the lower and upper confidence
limits.

2.10.2. Confidence Intervals/Bounds for Means


• Let x̄ and s be the values of the mean and standard deviation,
respectively, of a random sample of size n from a population with
unknown mean µ.
• If the population is normal with known variance σ², then a
100(1 − α)%
◦ two-sided confidence interval for µ is
x̄ − z_{α/2} · σ/√n < µ < x̄ + z_{α/2} · σ/√n
◦ lower- and upper-confidence bounds for µ, respectively, are
x̄ − z_α · σ/√n < µ and µ < x̄ + z_α · σ/√n
where z_α is such that P (Z > z_α) = α.
• If the population is non-normal with known variance σ² and n is
large (≥ 30 in practice), then an approximate 100(1 − α)%
◦ two-sided confidence interval for µ is
x̄ − z_{α/2} · σ/√n < µ < x̄ + z_{α/2} · σ/√n
◦ lower- and upper-confidence bounds for µ, respectively, are
x̄ − z_α · σ/√n < µ and µ < x̄ + z_α · σ/√n
where z_α is such that P (Z > z_α) = α.

• If the population is normal with unknown variance σ², then a
100(1 − α)%
◦ two-sided confidence interval for µ is
x̄ − t_{α/2,n−1} · s/√n < µ < x̄ + t_{α/2,n−1} · s/√n
◦ lower- and upper-confidence bounds for µ, respectively, are
x̄ − t_{α,n−1} · s/√n < µ and µ < x̄ + t_{α,n−1} · s/√n
where t_{α,ν} is such that P (T > t_{α,ν}) = α.
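
A sketch (added here; the data are hypothetical) of the t-based two-sided interval above, using scipy.stats:

import numpy as np
from scipy import stats

x = np.array([10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3])  # hypothetical sample
n, alpha = len(x), 0.05

xbar, s = x.mean(), x.std(ddof=1)
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)   # t_{alpha/2, n-1}
half_width = t_crit * s / np.sqrt(n)
print(xbar - half_width, xbar + half_width)     # 95% confidence interval for mu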


2.10.3. Confidence Intervals/Bounds for Variances


• Let s be the value of the standard deviation of a random sample
of size n from a normal population with unknown standard
deviation σ. Then a 100(1 − α)%
◦ confidence interval for σ² is
(n − 1)s² / χ²_{α/2,n−1} < σ² < (n − 1)s² / χ²_{1−α/2,n−1}
◦ lower- and upper-confidence bounds for σ², respectively, are
(n − 1)s² / χ²_{α,n−1} < σ² and σ² < (n − 1)s² / χ²_{1−α,n−1}
where χ²_{α,ν} is such that P (χ² > χ²_{α,ν}) = α.
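
A matching sketch (same hypothetical sample as in the previous example) for the variance interval; since the slides define χ²_{α,ν} by its upper-tail area, the larger quantile appears in the lower limit:

import numpy as np
from scipy import stats

x = np.array([10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3])  # hypothetical sample
n, alpha = len(x), 0.05
s2 = x.var(ddof=1)

chi2_upper = stats.chi2.ppf(1 - alpha / 2, df=n - 1)   # chi^2_{alpha/2, n-1}
chi2_lower = stats.chi2.ppf(alpha / 2, df=n - 1)       # chi^2_{1-alpha/2, n-1}
print((n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower)   # 95% CI for sigma^2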


2.10.4. Confidence Intervals/Bounds for Proportions
• Let p̂ = x/n be the value of the sample proportion of a
random sample of size n from a binomial population with unknown
parameter p. If n is large, then an approximate 100(1 − α)%
◦ confidence interval for p is
p̂ − z_{α/2} · √[p̂(1 − p̂)/n] < p < p̂ + z_{α/2} · √[p̂(1 − p̂)/n]
◦ lower- and upper-confidence bounds for p, respectively, are
p̂ − z_α · √[p̂(1 − p̂)/n] < p and p < p̂ + z_α · √[p̂(1 − p̂)/n]
where z_α is such that P (Z > z_α) = α.
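
A sketch (with hypothetical counts) of the large-sample proportion interval:

import numpy as np
from scipy import stats

x, n, alpha = 62, 200, 0.05           # hypothetical: 62 successes in 200 trials
p_hat = x / n
z = stats.norm.ppf(1 - alpha / 2)     # z_{alpha/2}

half_width = z * np.sqrt(p_hat * (1 - p_hat) / n)
print(p_hat - half_width, p_hat + half_width)   # approximate 95% CI for p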

2.11. Tests of Hypotheses


• When testing a hypothesis concerning the value of some parameter
θ, the statement of equality will always be included in H0, that is
H0 : θ = θ0.
Here, θ0 is called the null value. In this way H0 pinpoints a specific
numerical value that could be the actual value of θ.
• The second hypothesis is called the alternative hypothesis or the
motivated hypothesis (denoted by H1 ). It usually takes one of the
three forms: H1 : θ ≠ θ0, H1 : θ < θ0 or H1 : θ > θ0.
• It is hoped that the evidence leads us to reject the null hypothesis
H0 and thereby to accept the alternative hypothesis H1.

• On the other hand, failing to reject H0 implies that we have not


found sufficient evidence to reject H0 . It does not mean that there
is a high probability that H0 is true. It may simply mean that
more data are required to reach a strong conclusion. Therefore,
rather than saying we “accept H0,” we use the terminology “fail to
reject H0 .”
• In general, the test at significance level α of the hypotheses
H0 : θ = θ0
H1 : θ ≠ θ0 (or θ < θ0, or θ > θ0)
will lead to rejection of H0, if and only if, θ0 is not in the 100(1 − α)%
confidence interval (lower-confidence bound) (upper-confidence
bound) for θ.
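
A sketch (reusing the hypothetical sample from the earlier confidence-interval example) of this duality between the interval and a two-sided test of H0 : µ = µ0:

import numpy as np
from scipy import stats

x = np.array([10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3])  # hypothetical sample
mu0, alpha = 10.0, 0.05
n = len(x)

t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
half_width = t_crit * x.std(ddof=1) / np.sqrt(n)
ci = (x.mean() - half_width, x.mean() + half_width)

reject = not (ci[0] < mu0 < ci[1])    # reject H0 if and only if mu0 lies outside the CI
print(ci, reject)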

2.12. Common Continuous Distributions


1. Uniform Distribution
2. Triangular Distribution
3. Normal Distribution
4. Log-Normal Distribution
5. Exponential Distribution
6. Erlang Distribution
7. Gamma Distribution
8. Beta Distribution
9. Weibull Distribution

10. Chi-Square Distribution
11. F Distribution
12. Student’s t Distribution

2.12.1. Uniform Distribution

Common Notations:
uniform(a, b) or U (a, b)

Expression in Arena:
UNIForm(a, b)

Parameters:
• a is the minimum
• b is the maximum, where a < b

[Figure: PDF of U (a, b), constant at height 1/(b − a) on [a, b]]

Probability Density Function:
f (x) = 1/(b − a)   for a ≤ x ≤ b
f (x) = 0   otherwise

Cumulative Density Function:
F (x) = 0   for x ≤ a
F (x) = (x − a)/(b − a)   for a ≤ x ≤ b
F (x) = 1   for b ≤ x

Mean and Variance:
E(X) = (a + b)/2 and Var(X) = (b − a)²/12
Applications:
• Used when all values over a finite range are considered to be equally
likely
• Used when no information other than the range is available
• Used to transform to other random variates
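
A sketch (added for illustration) of the last point: uniform(0, 1) variates can be transformed into variates from another distribution by inverting its CDF; here F^{−1}(u) = −β ln(1 − u) yields exponential(β) variates.

import numpy as np

rng = np.random.default_rng(2)
beta = 2.0                                 # target: exponential(beta), an illustrative choice

u = rng.uniform(0.0, 1.0, size=100_000)    # U(0, 1) samples
x = -beta * np.log(1.0 - u)                # inverse CDF of the exponential distribution
print(x.mean(), x.var())                   # close to beta and beta^2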

2.12.2. Triangular Distribution

Common Notation:
triangular(a, b, m)

Expression in Arena:
TRIAngular(a, m, b)

Parameters:
• a is the minimum
• m is the mode, where a < m < b
• b is the maximum

[Figure: PDF of a triangular distribution, peaking at height 2/(b − a) at x = m]

Probability Density Function:
f (x) = 2(x − a) / [(b − a)(m − a)]   for a ≤ x ≤ m
f (x) = 2(b − x) / [(b − a)(b − m)]   for m ≤ x ≤ b
f (x) = 0   otherwise

Cumulative Density Function:
F (x) = 0   for x ≤ a
F (x) = (x − a)² / [(b − a)(m − a)]   for a ≤ x ≤ m
F (x) = 1 − (b − x)² / [(b − a)(b − m)]   for m ≤ x ≤ b
F (x) = 1   for b ≤ x

Mean and Variance:
E(X) = (a + m + b)/3
Var(X) = (a² + m² + b² − am − ab − mb)/18
Applications:
• Used when the exact form of the distribution is not known, but
estimates for the minimum, maximum, and most likely values are
available

2.12.3. Normal Distribution

Common Notations:
normal(µ, σ²) or N (µ, σ²)

Expression in Arena:
NORMal(µ, σ)

Parameters:
• µ ∈ (−∞, ∞) is the location parameter
• σ > 0 is the scale parameter

[Figure: PDFs of N (µ, σ²) for several (µ, σ)]

Probability Density Function:
f (x) = [1/(σ√(2π))] exp{−(1/2)[(x − µ)/σ]²}   for x ∈ R

Cumulative Density Function:
No closed form

Mean and Variance:
E(X) = µ and Var(X) = σ²

Applications:
• Used in situations in which the Central Limit Theorem applies
• Used when the distribution appears symmetric
• Used to model equity returns, asset returns, and errors in observations
of real phenomena
• Used to model the changes in many financial quantities

2.12.4. Log-Normal Distribution

Common Notation:
lognormal(µ, σ²) or LN (µ, σ²)

Expression in Arena:
LOGNormal(µl , σl )

Parameters:
• µl = exp(µ + σ²/2) > 0
• σl = √{exp(2µ + σ²)[exp(σ²) − 1]} > 0

[Figure: PDFs of LN (µ, σ²) for several (µ, σ)]

Probability Density Function:
f (x) = [1/(xσ√(2π))] exp{−(1/2)[(ln(x) − µ)/σ]²}   for x > 0
f (x) = 0   otherwise

Cumulative Density Function:
No closed form

Mean and Variance:
E(X) = µl and Var(X) = σl²

Applications:
• Used when the quantity is the product of a large number of random
quantities
• Used to represent task times that have a distribution skewed to the
right
• Used as a rough model in the absence of data
• Used to model equity prices

Note:
X ∼ LN (µ, σ²) ⇐⇒ ln(X) ∼ N (µ, σ²)

2.12.5. Exponential Distribution

Common Notations:
exponential(β) or expo(β)

Expression in Arena:
EXPOnential(β)

Parameter:
β > 0 is the scale parameter

[Figure: PDFs of exponential(β) for β = 1, 2, 3]

Probability Density Function:
f (x) = (1/β) exp(−x/β)   for x > 0
f (x) = 0   otherwise

Cumulative Density Function:
F (x) = 1 − exp(−x/β)   for x > 0
F (x) = 0   otherwise

Mean and Variance:
E(X) = β and Var(X) = β²

Applications:
• Used to model inter-event times in random arrival processes
• Used to model the interval of time between calls

Notes:
• The exponential distribution is the only continuous distribution
with the memoryless property, i.e.,
P (X > s + t | X ≥ s) = P (X > t) for all s ≥ 0 and t ≥ 0

• If X1, . . . , Xk are IID random variables with Xi ∼ exponential(β),


then X = X1 + · · · + Xk ∼ Erlang(β, k).
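
A simulation sketch (with illustrative parameters) of the memoryless property and of the exponential-to-Erlang note above:

import numpy as np

rng = np.random.default_rng(3)
beta, s, t = 2.0, 1.0, 1.5
x = rng.exponential(scale=beta, size=1_000_000)

p_cond = np.mean(x[x >= s] > s + t)        # P(X > s + t | X >= s)
p_plain = np.mean(x > t)                   # P(X > t)
print(p_cond, p_plain)                     # approximately equal (memoryless)

k = 4
erlang = rng.exponential(scale=beta, size=(100_000, k)).sum(axis=1)
print(erlang.mean(), erlang.var())         # close to k * beta and k * beta^2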

2.12.6. Erlang Distribution

Common Notations:
k-Erlang(β) or Erlang-k(β)

Expression in Arena:
ERLAng(β, k)

Parameters:
• β > 0 is the scale parameter
• k (positive integer) is the shape parameter

[Figure: PDFs of Erlang(β, k) for several (β, k)]

Probability Density Function:
f (x) = β^{−k} x^{k−1} exp(−x/β) / (k − 1)!   for x > 0
f (x) = 0   otherwise

Cumulative Density Function:
F (x) = 1 − exp(−x/β) ∑_{j=0}^{k−1} (x/β)^j / j!   for x > 0
F (x) = 0   otherwise

Mean and Variance:
E(X) = kβ and Var(X) = kβ²

Applications:
• Used in simulation in which an activity occurs in successive phases
and each phase has an exponential distribution
• Used to represent the time required to complete a task

Note:
• If X1, . . . , Xk are IID random variables with Xi ∼ exponential(β),
then X = X1 + · · · + Xk ∼ Erlang(β, k).

2.12.7. Gamma Distribution

Common Notation:
gamma(α, β)

Expression in Arena:
GAMMa(β, α)

Parameters:
• α > 0 is the shape parameter
• β > 0 is the scale parameter

[Figure: PDFs of gamma(α, β) for several (α, β)]

Probability Density Function:
f (x) = β^{−α} x^{α−1} e^{−x/β} / Γ(α)   for x > 0
f (x) = 0   otherwise
where
Γ(α) = ∫_{0}^{∞} y^{α−1} e^{−y} dy   for α > 0
Note that
• Γ(y + 1) = y Γ(y) for y > 0
• Γ(y + 1) = y! for any nonnegative integer y
• Γ(y + 1/2) = √π · 1 · 3 · 5 · · · (2y − 1)/2^y for any positive integer y
• Γ(1/2) = √π

Cumulative Density Function:
• If α is a positive integer, then
F (x) = 1 − exp(−x/β) ∑_{j=0}^{α−1} (x/β)^j / j!   for x > 0
F (x) = 0   otherwise
• If α is not an integer, there is no closed form.

Mean and Variance:
E(X) = αβ and Var(X) = αβ²

Applications:
• Used to represent the time required to complete some task
• Used to model financial losses

Notes:
• For a positive integer α, the gamma(α, β) distribution is called the
α-Erlang(β) distribution.
• If X1, . . . , Xm are independent random variables with Xi ∼
gamma(αi, β), then X = X1 +· · ·+Xm ∼ gamma(α1 +· · ·+αm , β).

2.12.8. Beta Distribution


Common Notation:
beta(α1, α2 )

Expression in Arena:
BETA(β, α), where β = α1 and α = α2

Parameters:
α1 > 0 and α2 > 0 are shape parameters.

Probability Density Function:
f (x) = x^{α1−1} (1 − x)^{α2−1} / B(α1, α2)   for 0 < x < 1
f (x) = 0   otherwise
where
B(α1, α2) = ∫_{0}^{1} t^{α1−1} (1 − t)^{α2−1} dt   for α1 > 0 and α2 > 0
Note that
• B(α1, α2) = B(α2, α1)
• B(α1, α2) = Γ(α1) Γ(α2) / Γ(α1 + α2)

Cumulative Density Function:
No closed form, in general

Mean and Variance:
E(X) = α1/(α1 + α2) and Var(X) = α1 α2 / [(α1 + α2)² (α1 + α2 + 1)]

Applications:
• Used as a rough model in the absence of data
• Used to represent random proportions
• Used to represent many input quantities that can be assumed to
have a range bounded on both ends
• Used to model random recovery rates when assessing credit risks

Notes:
• If X1 and X2 are independent random variables with Xi ∼
gamma(αi, β), then
X1/(X1 + X2) ∼ beta(α1, α2).

[Figure: PDFs of beta(α1, α2) for various combinations of the shape parameters α1 and α2]

2.12.9. Weibull Distribution

Common Notation:
Weibull(α, β)

Expression in Arena:
WEIBull(β, α)

Parameters:
• α > 0 is the shape parameter
• β > 0 is the scale parameter

[Figure: PDFs of Weibull(α, β) for several (α, β)]

Probability Density Function:
f (x) = αβ^{−α} x^{α−1} exp[−(x/β)^α]   for x > 0
f (x) = 0   otherwise

Cumulative Density Function:
F (x) = 1 − exp[−(x/β)^α]   for x > 0
F (x) = 0   otherwise

Mean and Variance:
E(X) = (β/α) Γ(1/α)
Var(X) = (β²/α) {2Γ(2/α) − (1/α)[Γ(1/α)]²}

Applications:
• Used in reliability models to represent the lifetime of a device
• Used to represent nonnegative task times
• Used as a rough model in the absence of data
• Used to model extreme values from a bounded distribution

2.12.10. Chi-Square Distribution

Common Notation:
χ²(ν)

Expression in Arena:
Nil

Parameter:
ν (positive integer) is the degrees of freedom

[Figure: PDFs of χ²(ν) for ν = 1, 2, 3, 4]

Probability Density Function:
f (x) = x^{ν/2−1} e^{−x/2} / [2^{ν/2} Γ(ν/2)]   for x > 0
f (x) = 0   otherwise

Cumulative Density Function:
No closed form

Mean and Variance:
E(X) = ν and Var(X) = 2ν

Applications:
• Used in inferences
• The chi-square distribution with one degree of freedom is the
distribution of the hedging error from an option that is hedged
only discretely.

Notes:
• If Z ∼ N (0, 1), then Z 2 ∼ χ2(1).
• If X1, . . . , Xm are independent random variables with Xi ∼ χ2(νi ),
then X = X1 + · · · + Xm ∼ χ2(ν1 + · · · + νm ).

2.12.11. F Distribution

Common Notation:
F (ν1, ν2)

Expression in Arena:
Nil

Parameters:
• ν1 (positive integer) is the degrees of freedom for the numerator
• ν2 (positive integer) is the degrees of freedom for the denominator

[Figure: PDFs of F (ν1, ν2) for several (ν1, ν2)]

Probability Density Function:
f (x) = Γ[(ν1 + ν2)/2] (ν1/ν2)^{ν1/2} x^{ν1/2−1} / { Γ(ν1/2) Γ(ν2/2) [(ν1/ν2)x + 1]^{(ν1+ν2)/2} }   for x > 0
f (x) = 0   otherwise

Cumulative Density Function:
No closed form

Mean and Variance:
E(X) = ν2/(ν2 − 2)   for ν2 > 2
Var(X) = 2ν2²(ν1 + ν2 − 2) / [ν1(ν2 − 2)²(ν2 − 4)]   for ν2 > 4

Applications:
• Used in inferences
• Used to evaluate the extent of potential losses

2.12.12. Student’s t Distribution

Common Notation:
t(ν)

Expression in Arena:
Nil

Parameter:
ν (positive integer) is the degrees of freedom

[Figure: PDFs of t(ν) for ν = 1, 2, 5, ∞]

Probability Density Function:
f (x) = { Γ[(ν + 1)/2] / [√(πν) Γ(ν/2)] } · (1 + x²/ν)^{−(ν+1)/2}   for x ∈ R

Cumulative Density Function:
No closed form

Mean and Variance:
E(X) = 0 and Var(X) = ν/(ν − 2)   for ν > 2

Applications:
• Used in inferences
• Used to model equity/asset returns to capture heavy tails
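
Most of the distributions in this section are available in scipy.stats; the sketch below (parameter values are illustrative, and scipy's own parameterizations have to be matched to the ones above, e.g. via the scale argument) evaluates PDFs and CDFs and draws random variates:

from scipy import stats

# Parameter choices are illustrative only; comments map them to the slides' notation.
dists = {
    "uniform(2, 5)":   stats.uniform(loc=2, scale=3),      # U(a, b) with a = 2, b = 5
    "normal(0, 1)":    stats.norm(loc=0, scale=1),         # N(mu, sigma^2)
    "expo(2)":         stats.expon(scale=2),               # exponential with beta = 2
    "gamma(3, 2)":     stats.gamma(a=3, scale=2),          # alpha = 3, beta = 2
    "Weibull(1.5, 2)": stats.weibull_min(c=1.5, scale=2),  # alpha = 1.5, beta = 2
}

for name, d in dists.items():
    print(name, d.pdf(1.0), d.cdf(1.0), d.mean(), d.var())

print(stats.gamma(a=3, scale=2).rvs(size=5, random_state=0))   # random variates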

2.13. Common Discrete Distributions


1. Poisson Distribution
2. Bernoulli Distribution
3. Binomial Distribution
4. Geometric Distribution
5. Negative Binomial Distribution
6. Hypergeometric Distribution
7. Discrete Uniform Distribution

2.13.1. Poisson Distribution

Common Notation:
Poisson(λ) or Po(λ)

Parameter:
λ > 0

Expression in Arena:
POISson(λ)

[Figure: PMFs of Poisson(λ) for λ = 1, 4, 7]

Probability Distribution/Mass Function:
f (x) = e^{−λ} λ^x / x!   for x ∈ {0, 1, 2, . . . }
f (x) = 0   otherwise

Cumulative Distribution Function:
F (x) = 0   for x < 0
F (x) = e^{−λ} ∑_{i=0}^{⌊x⌋} λ^i / i!   for x ≥ 0

Mean and Variance:
E(X) = λ and Var(X) = λ

Applications:
• Used to model the number of random events occurring in a fixed
interval of time
• Used to model random batch size
• Used to model the number of claims in insurance
• Used to model financial shocks

Note:
• If X1, . . . , Xm are independent random variables with Xi ∼
Poisson(λi), then X = X1 + · · · + Xm ∼ Poisson(λ1 + · · · + λm).
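
A simulation sketch (with illustrative rates) of the closure-under-addition note above:

import numpy as np

rng = np.random.default_rng(4)
rates = [1.0, 4.0, 7.0]                                    # hypothetical rates

x = sum(rng.poisson(lam, size=200_000) for lam in rates)   # X1 + X2 + X3
print(x.mean(), x.var())                                   # both close to 1 + 4 + 7 = 12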

2.13.2. Bernoulli Distribution

Common Notation:
Bernoulli(p)

Parameter:
p ∈ [0, 1] is the probability of success

Expression in Arena:
DISCrete(1 − p, 0, 1, 1)

[Figure: PMF of Bernoulli(0.7)]

Probability Distribution/Mass Function:
f (x) = 1 − p   if x = 0
f (x) = p   if x = 1
f (x) = 0   otherwise

Mean and Variance:
E(X) = p and Var(X) = p(1 − p)

Applications:
• Used to model a random occurrence with two possible outcomes
• Used to generate other discrete random variates

Note:
• If X1, . . . , Xn are IID random variables with Xi ∼ Bernoulli(p),
then X = X1 + · · · + Xn ∼ binomial(n, p).

2.13.3. Binomial Distribution

Common Notations:
binomial(n, p) or bin(n, p) or B(n, p)

Parameters:
• n is the number of independent Bernoulli trials
• p ∈ [0, 1] is the common success probability

[Figure: PMFs of B(16, 0.3) and B(16, 0.5)]

Expression in Arena:
DISCrete(F (0), 0, F (1), 1, . . . , F (n − 1), n − 1, 1, n)

Probability Distribution/Mass Function:
f (x) = (n choose x) p^x (1 − p)^{n−x}   for x ∈ {0, 1, . . . , n}
f (x) = 0   otherwise

Mean and Variance:
E(X) = np and Var(X) = np(1 − p)

Applications:
• The probability of getting x successes in n independent Bernoulli
trials with probability p of success on each trial
• Used to model the number of defective items in a batch of size n
• Used in option pricing and option evaluations

Note:
• If X1, . . . , Xm are independent random variables and Xi ∼
binomial(ni, p), then
X = X1 + · · · + Xm ∼ binomial(n1 + · · · + nm , p).

2.13.4. Geometric Distribution

Common Notations:
geometric(p) or geom(p)

Parameter:
p ∈ [0, 1] is the probability of success

Expression in Arena:
DISCrete(F (x1), x1, . . . , F (xn), xn)

[Figure: PMFs of geometric(p) for p = 0.1, 0.2, 0.3]

Probability Distribution/Mass Function:
f (x) = p(1 − p)^{x−1}   for x ∈ {1, 2, 3, . . . }
f (x) = 0   otherwise

Mean and Variance:
E(X) = 1/p and Var(X) = (1 − p)/p²

Applications:
• The probability of getting the first success on the x-th trial
• Used to model the number of good product units between
consecutive bad ones
• Used in financial portfolio modeling
• Used to model foreign currency exchange rates and financial shocks

Notes:
• The geometric distribution is the only discrete distribution with the
memoryless property, i.e.,
P (X > s + t | X > s) = P (X > t) for all integers s ≥ 0 and t ≥ 0.

• Some books refer to the probability of getting x failures before the
first success as the geometric distribution:
f (x) = p(1 − p)^x   for x ∈ {0, 1, 2, . . . }
f (x) = 0   otherwise

2.13.5. Negative Binomial Distribution

Common Notations:
NegativeBinomial(k, p) or negbin(k, p) or NB(k, p)

Parameters:
• k is a positive integer
• p is the probability of success

Expression in Arena:
DISCrete(F (x1), x1, . . . , F (xn), xn)

[Figure: PMFs of NB(k, p) for k = 3 and p = 0.1, 0.2]

Probability Distribution/Mass Function:
f (x) = (x − 1 choose k − 1) p^k (1 − p)^{x−k}   for x ∈ {k, k + 1, k + 2, . . . }
f (x) = 0   otherwise

Mean and Variance:
E(X) = k/p and Var(X) = k(1 − p)/p²

Applications:
• The probability of getting the k-th success on the x-th trial

Note:
• Some books refer to the probability of getting x failures before the
k-th success in a sequence of independent Bernoulli trials with
probability p of success on each trial as the negative binomial
distribution:
f (x) = (k + x − 1 choose x) p^k (1 − p)^x   for x ∈ {0, 1, 2, . . . }
f (x) = 0   otherwise

2.13.6. Hypergeometric Distribution

Common Notation:
Nil

Expression in Arena:
DISCrete(F (x1), x1, . . . , F (xn), xn)

[Figure: PMFs of the hypergeometric distribution for N = 100, n = 20, and m = 10, 30, 50]

Parameters:
• N (positive integer) is the population size
• n ∈ {1, . . . , N } is the number of draws
• m ∈ {0, 1, . . . , N } is the initial number of items with the desired
characteristics in the population

Probability Distribution/Mass Function:
f (x) = (m choose x)(N − m choose n − x) / (N choose n)   for x ∈ {max(0, n + m − N ), . . . , min(m, n)}
f (x) = 0   otherwise

Mean and Variance:
E(X) = n(m/N ) and Var(X) = n(m/N )[(N − m)/N ][(N − n)/(N − 1)]

Applications:
• The probability of getting x successes in n draws (without
replacement) from a finite population of size N in which m of them
are looked upon as successes
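
A sketch (with illustrative parameters) using scipy.stats, whose hypergeom object takes the population size, the number of items of the desired kind, and the number of draws, corresponding to N, m, and n above (scipy's argument names differ from the slides' symbols):

from scipy import stats

N, m, n = 100, 30, 20                 # population size, desired items, draws (hypothetical)
X = stats.hypergeom(M=N, n=m, N=n)    # scipy's M, n, N correspond to the slides' N, m, n

print(X.pmf(6))                       # P(X = 6)
print(X.mean(), n * m / N)            # both equal n m / N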

2.13.7. Discrete Uniform Distribution

Common Notations:
DiscreteUniform(a, b) or DU (a, b)

Parameters:
• a (integer) is the minimum
• b (≥ a, integer) is the maximum

Expression in Arena:
DISCrete(1/(b − a + 1), a, 2/(b − a + 1), a + 1, . . . , 1, b)

[Figure: PMF of DU (7, 16)]

Probability Distribution/Mass Function:
f (x) = 1/(b − a + 1)   for x ∈ {a, a + 1, . . . , b}
f (x) = 0   otherwise

Cumulative Distribution Function:
F (x) = 0   for x < a
F (x) = (⌊x⌋ − a + 1)/(b − a + 1)   for a ≤ x ≤ b
F (x) = 1   for b < x

Mean and Variance:
E(X) = (a + b)/2 and Var(X) = [(b − a + 1)² − 1]/12
Applications:
• Used when all values over a finite range are considered to be equally
likely
• Used when no information other than the range is available
