
Appendix D. Important Probability Distributions

Development of stochastic models is facilitated by identifying a few probability distributions that seem to correspond to a variety of data-generating processes, and then studying the properties of these distributions. In the following tables, I list some of the more useful distributions, both discrete distributions and continuous ones. The names listed are the most common names, although some distributions go by different names, especially for specific values of the parameters. In each table entry, following the name of the distribution, the parameter space is specified.

There are two very special continuous distributions, for which I use special symbols: the uniform over the interval [a, b], designated U(a, b), and the normal (or Gaussian), denoted by N(µ, σ²). Notice that the second parameter in the notation for the normal is the variance. Sometimes, such as in the functions in R, the second parameter of the normal distribution is the standard deviation instead of the variance. A normal distribution with µ = 0 and σ² = 1 is called the standard normal. I also often use the notation φ(x) for the PDF of a standard normal and Φ(x) for the CDF of a standard normal, and these are generalized in the obvious way as φ(x|µ, σ²) and Φ(x|µ, σ²).

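The R functions dnorm, pnorm, qnorm, and rnorm, for example, take the standard deviation as the argument sd. A minimal R illustration (the values of µ, σ², and x here are arbitrary):

    mu     <- 1
    sigma2 <- 4                              # the variance, sigma^2
    x      <- 0.5
    dnorm(x, mean = mu, sd = sqrt(sigma2))   # phi(x | mu, sigma^2)
    pnorm(x, mean = mu, sd = sqrt(sigma2))   # Phi(x | mu, sigma^2)
    dnorm(0)                                 # phi(0) = 1/sqrt(2*pi), about 0.3989
    pnorm(0)                                 # Phi(0) = 0.5

Passing the variance itself as sd is a common error, which the N(µ, σ²) notation invites.
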
Except for the uniform and the normal, I designate distributions by a name followed by symbols for the parameters, for example, binomial(π, n) or gamma(α, β). Some families of distributions are subfamilies of larger families. For example, the usual gamma family of distributions is the two-parameter subfamily of the three-parameter gamma.

There are other general families of probability distributions that are defined in terms of a differential equation or of a form for the CDF. These include the Pearson, Johnson, Burr, and Tukey’s lambda distributions.

Most of the common distributions fall naturally into one of two classes. They have either a countable support with positive probability at each point in the support, or a continuous (dense, uncountable) support with zero probability for any subset with zero Lebesgue measure. The distributions listed in the following tables are divided into these two natural classes.


There are situations for which these two distinct classes are not appropriate. For many such situations, however, a mixture distribution provides an appropriate model. We can express a PDF of a mixture distribution as

    p_M(y) = Σ_{j=1}^m ω_j p_j(y | θ_j),

where the m distributions with PDFs p_j can be either discrete or continuous. A simple example is a probability model for the amount of rainfall in a given period, say a day. It is likely that a nonzero probability should be associated with zero rainfall, but with no other single amount of rainfall. In the model above, m is 2, ω_1 is the probability of no rain, p_1 is a degenerate PDF with a value of 1 at 0, ω_2 = 1 − ω_1, and p_2 is some continuous PDF over IR+, possibly similar to a distribution in the exponential family.

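A sketch of this rainfall model in R (the weight ω_1 and the exponential form chosen for p_2 are assumed purely for illustration):

    omega1 <- 0.6                  # assumed probability of no rain
    theta  <- 5                    # assumed mean of the continuous part
    # mixture PDF wrt its dominating measure (point mass at 0 plus Lebesgue)
    p_rain <- function(y)
      ifelse(y == 0, omega1, (1 - omega1) * dexp(y, rate = 1/theta))
    # simulate n days of rainfall from the mixture
    n    <- 10000
    rain <- ifelse(runif(n) < omega1, 0, rexp(n, rate = 1/theta))
    mean(rain == 0)                # close to omega1
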
A mixture family that is useful in robustness studies is the ε-mixture distribution family, which is characterized by a given family with CDF P that is referred to as the reference distribution, together with a point x_c and a weight ε. The CDF of an ε-mixture distribution family is

    P_{x_c,ε}(x) = (1 − ε)P(x) + ε I_{[x_c,∞[}(x),

where 0 ≤ ε ≤ 1.

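A sketch in R, taking the standard normal as the reference distribution (the values of ε and x_c are assumed for illustration):

    eps <- 0.05
    xc  <- 3
    P_eps <- function(x)
      (1 - eps) * pnorm(x) + eps * (x >= xc)   # (x >= xc) is I_[xc,inf[(x)
    P_eps(0)      # 0.475, slightly below Phi(0) = 0.5
    P_eps(3.5)    # nearly 1; the contamination mass at xc is included

A sample from such a distribution looks like a sample from P except that a proportion ε of the values fall at the outlying point x_c, which is what makes the family useful for studying the robustness of statistical procedures.
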
Another example of a mixture distribution is a binomial with constant parameter n, but with a nonconstant parameter π. In many applications, if an identical binomial distribution is assumed (that is, a constant π), it is often the case that “over-dispersion” will be observed; that is, the sample variance exceeds what would be expected given an estimate of some other parameter that determines the population variance. This situation can occur in a model, such as the binomial, in which a single parameter determines both the first and second moments. The mixture model above in which each p_j is a binomial PDF with parameters n and π_j may be a better model.

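A small simulation in R makes the over-dispersion visible (the uniform distribution assumed here for the π_j is arbitrary):

    set.seed(42)
    n    <- 20
    pi_j <- runif(10000, 0.2, 0.8)      # nonconstant probability parameter
    y    <- rbinom(10000, size = n, prob = pi_j)
    pi_hat <- mean(y)/n                 # pooled estimate of pi
    var(y)                              # roughly 16: over-dispersed
    n * pi_hat * (1 - pi_hat)           # roughly 5: the constant-pi variance
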
Of course, we can extend this kind of mixing even further. Instead of Σ_{j=1}^m ω_j p_j(y | θ_j) with ω_j ≥ 0 and Σ_{j=1}^m ω_j = 1, we can take ω(θ)p(y | θ) with ω(θ) ≥ 0 and ∫ ω(θ) dθ = 1, from which we recognize that ω(θ) is a PDF and θ can be considered to be the realization of a random variable.

Extending the example of the mixture of binomial distributions, we may choose some reasonable PDF ω(π). An obvious choice is a beta PDF. This yields the beta-binomial distribution, with PDF

    p_{X,Π}(x, π) = C(n, x) (Γ(α + β)/(Γ(α)Γ(β))) π^{x+α−1} (1 − π)^{n−x+β−1} I_{{0,1,...,n}×]0,1[}(x, π),

where C(n, x) = n!/(x!(n − x)!).

This is a standard distribution but I did not include it in the tables below. This distribution may be useful in situations in which a binomial model is appropriate, but the probability parameter is changing more-or-less continuously.

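A sketch in R of the marginal PMF of X, obtained by integrating π out of the joint PDF above; the expression involves the complete beta function, and the parameter values are assumed for illustration:

    dbetabinom <- function(x, n, a, b)
      choose(n, x) * beta(x + a, n - x + b) / beta(a, b)

    n <- 10; a <- 2; b <- 3
    # check against simulation from the mixture: Pi ~ beta, X | pi ~ binomial
    pi_sim <- rbeta(100000, a, b)
    x_sim  <- rbinom(100000, size = n, prob = pi_sim)
    rbind(exact     = dbetabinom(0:n, n, a, b),
          simulated = tabulate(x_sim + 1, nbins = n + 1)/100000)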


We recognize a basic property of any mixture distribution: It is a joint distribution factored as a marginal (prior) for a random variable, which is often not observable, and a conditional distribution for another random variable, which is usually the observable variable of interest.

In Bayesian analyses, the first two assumptions (a prior distribution for the parameters and a conditional distribution for the observable) lead immediately to a mixture distribution. The beta-binomial above arises in a canonical example of Bayesian analysis.

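A sketch of that canonical example in R (the prior parameters and the data are assumed for illustration): with a beta(α, β) prior for π and x successes observed in n binomial trials, the joint PDF above factors so that the posterior for π is beta(x + α, n − x + β).

    a <- 2; b <- 3                  # assumed prior parameters
    n <- 10; x <- 7                 # assumed data
    a/(a + b)                       # prior mean of pi: 0.4
    (x + a)/(n + a + b)             # posterior mean: 0.6, pulled toward x/n = 0.7
    grid      <- seq(0, 1, length.out = 101)
    posterior <- dbeta(grid, x + a, n - x + b)   # posterior density of pi
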
Some distributions are recognized because of their use as conjugate priors
and their relationship to sampling distributions. These include the inverted
chi-square and the inverted Wishart.

General References

Evans et al. (2000) give general descriptions of 40 probability distributions. Balakrishnan and Nevzorov (2003) provide an overview of the important characteristics that distinguish different distributions and then describe the important characteristics of many common distributions. Leemis and McQueston (2008) present an interesting compact graph of the relationships among a large number of probability distributions.
Currently, the most readily accessible summary of common probability distributions is Wikipedia (http://wikipedia.org/); search under the name of the distribution.


Table D.1. Discrete Distributions (PDFs are wrt counting measure)


discrete uniform; a_1, ..., a_m ∈ IR
    PDF       1/m,  y = a_1, ..., a_m
    mean      Σ a_i / m
    variance  Σ (a_i − ā)² / m,  where ā = Σ a_i / m

Bernoulli; π ∈ ]0, 1[
    PDF       π^y (1 − π)^{1−y},  y = 0, 1
    mean      π
    variance  π(1 − π)

binomial (n Bernoullis); n = 1, 2, ...; π ∈ ]0, 1[
    PDF       C(n, y) π^y (1 − π)^{n−y},  y = 0, 1, ..., n,
              where C(n, y) = n!/(y!(n − y)!)
    CF        (1 − π + π e^{it})^n
    mean      nπ
    variance  nπ(1 − π)

geometric; π ∈ ]0, 1[
    PDF       π(1 − π)^y,  y = 0, 1, 2, ...
    mean      (1 − π)/π
    variance  (1 − π)/π²

negative binomial (n geometrics); n = 1, 2, ...; π ∈ ]0, 1[
    PDF       C(y + n − 1, n − 1) π^n (1 − π)^y,  y = 0, 1, 2, ...
    CF        (π/(1 − (1 − π)e^{it}))^n
    mean      n(1 − π)/π
    variance  n(1 − π)/π²

multinomial; n = 1, 2, ...; π_i ∈ ]0, 1[ with Σ π_i = 1, for i = 1, ..., d
    PDF       (n!/Π_i y_i!) Π_{i=1}^d π_i^{y_i},  y_i = 0, 1, ..., n,  Σ y_i = n
    CF        (Σ_{i=1}^d π_i e^{it_i})^n
    means     nπ_i
    variances nπ_i(1 − π_i)
    covariances  −nπ_i π_j

hypergeometric; N = 2, 3, ...; M = 1, ..., N; n = 1, ..., N
    PDF       C(M, y) C(N − M, n − y) / C(N, n),
              y = max(0, n − N + M), ..., min(n, M)
    mean      nM/N
    variance  (nM/N)(1 − M/N)(N − n)/(N − 1)


Poisson; θ ∈ IR+
    PDF       θ^y e^{−θ} / y!,  y = 0, 1, 2, ...
    CF        e^{θ(e^{it} − 1)}
    mean      θ
    variance  θ

power series; θ ∈ IR+; {h_y} positive constants; c(θ) = Σ_y h_y θ^y
    PDF       (h_y / c(θ)) θ^y,  y = 0, 1, 2, ...
    CF        Σ_y h_y (θ e^{it})^y / c(θ)
    mean      θ (d/dθ) log(c(θ))
    variance  θ (d/dθ) log(c(θ)) + θ² (d²/dθ²) log(c(θ))

logarithmic; π ∈ ]0, 1[
    PDF       −π^y / (y log(1 − π)),  y = 1, 2, 3, ...
    mean      −π/((1 − π) log(1 − π))
    variance  −π(π + log(1 − π))/((1 − π)² (log(1 − π))²)

Benford’s; b integer ≥ 3
    PDF       log_b(y + 1) − log_b(y),  y = 1, ..., b − 1
    mean      b − 1 − log_b((b − 1)!)


Table D.2. The Normal Distributions

normal, N(µ, σ²); µ ∈ IR, σ ∈ IR+
    PDF       φ(y|µ, σ²) = (1/(√(2π) σ)) e^{−(y−µ)²/(2σ²)}
    CF        e^{iµt − σ²t²/2}
    mean      µ
    variance  σ²

multivariate normal, N_d(µ, Σ); µ ∈ IR^d, Σ ≻ 0 ∈ IR^{d×d}
    PDF       (2π)^{−d/2} |Σ|^{−1/2} e^{−(y−µ)^T Σ^{−1} (y−µ)/2}
    CF        e^{iµ^T t − t^T Σ t/2}
    mean      µ
    covariance  Σ

matrix normal; M ∈ IR^{n×m}, Ψ ≻ 0 ∈ IR^{m×m}, Σ ≻ 0 ∈ IR^{n×n}
    PDF       (2π)^{−nm/2} |Ψ|^{−n/2} |Σ|^{−m/2} e^{−tr(Ψ^{−1}(Y−M)^T Σ^{−1}(Y−M))/2}
    mean      M
    covariance  Ψ ⊗ Σ

complex multivariate normal; µ ∈ ℂ^d, Σ ≻ 0 ∈ ℂ^{d×d}
    PDF       (2π)^{−d/2} |Σ|^{−1/2} e^{−(z−µ)^H Σ^{−1} (z−µ)/2}
    mean      µ
    covariance  Σ


Table D.3. Sampling Distributions from the Normal Distribution

chi-squared, χ²_ν; ν ∈ IR+ (if ν ∈ ZZ+, ν is called the degrees of freedom)
    PDF       (1/(Γ(ν/2) 2^{ν/2})) y^{ν/2−1} e^{−y/2} I_{IR+}(y)
    mean      ν
    variance  2ν

t; ν ∈ IR+
    PDF       (Γ((ν + 1)/2)/(Γ(ν/2) √(νπ))) (1 + y²/ν)^{−(ν+1)/2}
    mean      0
    variance  ν/(ν − 2),  for ν > 2

F; ν_1, ν_2 ∈ IR+
    PDF       (ν_1^{ν_1/2} ν_2^{ν_2/2} Γ((ν_1 + ν_2)/2) / (Γ(ν_1/2) Γ(ν_2/2))) ·
              y^{ν_1/2−1} / (ν_2 + ν_1 y)^{(ν_1+ν_2)/2} I_{IR+}(y)
    mean      ν_2/(ν_2 − 2),  for ν_2 > 2
    variance  2ν_2²(ν_1 + ν_2 − 2)/(ν_1(ν_2 − 2)²(ν_2 − 4)),  for ν_2 > 4

Wishart; d = 1, 2, ...; ν > d − 1, ν ∈ IR; Σ ≻ 0 ∈ IR^{d×d}
    PDF       (|W|^{(ν−d−1)/2} / (2^{νd/2} |Σ|^{ν/2} Γ_d(ν/2))) exp(−trace(Σ^{−1}W)/2) I_{{M | M ≻ 0 ∈ IR^{d×d}}}(W)
    mean      νΣ
    covariance  Cov(W_{ij}, W_{kl}) = ν(σ_{ik}σ_{jl} + σ_{il}σ_{jk}),  where Σ = (σ_{ij})

noncentral chi-squared; ν, λ ∈ IR+
    PDF       (e^{−λ/2} / 2^{ν/2}) y^{ν/2−1} e^{−y/2} Σ_{k=0}^∞ ((λ/2)^k / (k! Γ(ν/2 + k) 2^k)) y^k I_{IR+}(y)
    mean      ν + λ
    variance  2(ν + 2λ)

noncentral t; ν ∈ IR+, λ ∈ IR
    PDF       (ν^{ν/2} e^{−λ²/2} / (Γ(ν/2) π^{1/2})) (ν + y²)^{−(ν+1)/2} ·
              Σ_{k=0}^∞ Γ((ν + k + 1)/2) ((λy)^k / k!) (2/(ν + y²))^{k/2}
    mean      λ(ν/2)^{1/2} Γ((ν − 1)/2)/Γ(ν/2),  for ν > 1
    variance  (ν/(ν − 2))(1 + λ²) − λ²(ν/2)(Γ((ν − 1)/2)/Γ(ν/2))²,  for ν > 2

noncentral F; ν_1, ν_2, λ ∈ IR+
    PDF       e^{−λ/2} (ν_1/ν_2)^{ν_1/2} (ν_2/(ν_2 + ν_1 y))^{(ν_1+ν_2)/2} y^{ν_1/2−1} ·
              Σ_{k=0}^∞ ((λ/2)^k Γ((ν_2 + ν_1)/2 + k) / (Γ(ν_2/2) Γ(ν_1/2 + k) k!)) (ν_1/ν_2)^k (ν_2/(ν_2 + ν_1 y))^k y^k I_{IR+}(y)
    mean      ν_2(ν_1 + λ)/(ν_1(ν_2 − 2)),  for ν_2 > 2
    variance  2(ν_2/ν_1)² ((ν_1 + λ)² + (ν_1 + 2λ)(ν_2 − 2)) / ((ν_2 − 2)²(ν_2 − 4)),  for ν_2 > 4


Table D.4. Distributions Useful as Priors for the Normal Parameters


inverted gamma; α, β ∈ IR+
    PDF       (1/(Γ(α) β^α)) (1/y)^{α+1} e^{−1/(βy)} I_{IR+}(y)
    mean      1/(β(α − 1)),  for α > 1
    variance  1/(β²(α − 1)²(α − 2)),  for α > 2

inverted chi-squared; ν ∈ IR+
    PDF       (1/(Γ(ν/2) 2^{ν/2})) (1/y)^{ν/2+1} e^{−1/(2y)} I_{IR+}(y)
    mean      1/(ν − 2),  for ν > 2
    variance  2/((ν − 2)²(ν − 4)),  for ν > 4

Table D.5. Distributions Derived from the Univariate Normal


lognormal; µ ∈ IR, σ ∈ IR+
    PDF       (1/(√(2π) σ)) y^{−1} e^{−(log(y) − µ)²/(2σ²)} I_{IR+}(y)
    mean      e^{µ + σ²/2}
    variance  e^{2µ + σ²} (e^{σ²} − 1)

inverse Gaussian; µ, λ ∈ IR+
    PDF       √(λ/(2πy³)) e^{−λ(y − µ)²/(2µ²y)} I_{IR+}(y)
    mean      µ
    variance  µ³/λ

skew normal; µ, λ ∈ IR, σ ∈ IR+
    PDF       (1/(πσ)) e^{−(y−µ)²/(2σ²)} ∫_{−∞}^{λ(y−µ)/σ} e^{−t²/2} dt
    mean      µ + σλ √(2/(π(1 + λ²)))
    variance  σ²(1 − 2λ²/(π(1 + λ²)))


Table D.6. Other Continuous Distributions (PDFs are wrt Lebesgue measure)

beta; α, β ∈ IR+
    PDF       (Γ(α + β)/(Γ(α)Γ(β))) y^{α−1} (1 − y)^{β−1} I_{[0,1]}(y)
    mean      α/(α + β)
    variance  αβ/((α + β)²(α + β + 1))

Dirichlet; α ∈ IR_+^{d+1}
    PDF       (Γ(Σ_{i=1}^{d+1} α_i)/Π_{i=1}^{d+1} Γ(α_i)) Π_{i=1}^d y_i^{α_i−1} (1 − Σ_{i=1}^d y_i)^{α_{d+1}−1} I_{[0,1]^d}(y)
    mean      α/‖α‖₁  (α_{d+1}/‖α‖₁ is the “mean of Y_{d+1}”)
    variance  α(‖α‖₁ − α)/(‖α‖₁²(‖α‖₁ + 1))

uniform, U(θ_1, θ_2); θ_1 < θ_2 ∈ IR
    PDF       (1/(θ_2 − θ_1)) I_{[θ_1,θ_2]}(y)
    mean      (θ_2 + θ_1)/2
    variance  (θ_2 − θ_1)²/12

Cauchy; γ ∈ IR, β ∈ IR+
    PDF       1/(πβ(1 + ((y − γ)/β)²))
    mean      does not exist
    variance  does not exist

logistic; µ ∈ IR, β ∈ IR+
    PDF       e^{−(y−µ)/β}/(β(1 + e^{−(y−µ)/β})²)
    mean      µ
    variance  β²π²/3

Pareto; α, γ ∈ IR+
    PDF       (αγ^α/y^{α+1}) I_{[γ,∞[}(y)
    mean      αγ/(α − 1),  for α > 1
    variance  αγ²/((α − 1)²(α − 2)),  for α > 2

power function; α, β ∈ IR+
    PDF       (α y^{α−1}/β^α) I_{[0,β[}(y)
    mean      αβ/(α + 1)
    variance  αβ²/((α + 2)(α + 1)²)

von Mises; µ ∈ IR, κ ∈ IR+
    PDF       (1/(2π I_0(κ))) e^{κ cos(y−µ)} I_{[µ−π,µ+π]}(y)
    mean      µ
    variance  1 − (I_1(κ)/I_0(κ))²


gamma; α, β ∈ IR+
    PDF       (1/(Γ(α) β^α)) y^{α−1} e^{−y/β} I_{IR+}(y)
    mean      αβ
    variance  αβ²

three-parameter gamma; α, β ∈ IR+, γ ∈ IR
    PDF       (1/(Γ(α) β^α)) (y − γ)^{α−1} e^{−(y−γ)/β} I_{]γ,∞[}(y)
    mean      αβ + γ
    variance  αβ²

exponential; θ ∈ IR+
    PDF       θ^{−1} e^{−y/θ} I_{IR+}(y)
    mean      θ
    variance  θ²

double exponential (folded exponential); µ ∈ IR, θ ∈ IR+
    PDF       (1/(2θ)) e^{−|y−µ|/θ}
    mean      µ
    variance  2θ²

Weibull; α, β ∈ IR+
    PDF       (α/β) y^{α−1} e^{−y^α/β} I_{IR+}(y)
    mean      β^{1/α} Γ(α^{−1} + 1)
    variance  β^{2/α} (Γ(2α^{−1} + 1) − (Γ(α^{−1} + 1))²)

extreme value (Type I); α ∈ IR, β ∈ IR+
    PDF       (1/β) e^{−(y−α)/β} exp(−e^{−(y−α)/β})
    mean      α − βΓ′(1)
    variance  β²π²/6

