
Probability distribution

In probability and statistics, a probability distribution assigns a probability to each measurable subset of the possible outcomes of a random experiment, survey, or procedure of statistical inference. Examples are found in experiments whose sample space is non-numerical, where the distribution would be a categorical distribution; experiments whose sample space is encoded by discrete random variables, where the distribution can be specified by a probability mass function; and experiments with sample spaces encoded by continuous random variables, where the distribution can be specified by a probability density function. More complex experiments, such as those involving stochastic processes defined in continuous time, may demand the use of more general probability measures.

[Figure: The probability mass function (pmf) p(S) specifies the probability distribution for the sum S of counts from two dice (bar chart of p(S) for S = 2, ..., 12). For example, the figure shows that p(11) = 1/18. The pmf allows the computation of probabilities of events such as P(S > 9) = 1/12 + 1/18 + 1/36 = 1/6, and all other probabilities in the distribution.]

In applied probability, a probability distribution can be specified in a number of different ways, often chosen for mathematical convenience:

• by supplying a valid probability mass function or probability density function
• by supplying a valid cumulative distribution function or survival function
• by supplying a valid hazard function
• by supplying a valid characteristic function
• by supplying a rule for constructing a new random variable from other random variables whose joint probability distribution is known.

A probability distribution can either be univariate or multivariate. A univariate distribution gives the probabilities of a single random variable taking on various alternative values; a multivariate distribution (a joint probability distribution) gives the probabilities of a random vector (a set of two or more random variables) taking on various combinations of values. Important and commonly encountered univariate probability distributions include the binomial distribution, the hypergeometric distribution, and the normal distribution. The multivariate normal distribution is a commonly encountered multivariate distribution.

[Figure: The probability density function (pdf) of the normal distribution, also called Gaussian or bell curve, the most important continuous random distribution. As notated on the figure, the probabilities of intervals of values correspond to the area under the curve; the areas marked are 0.1%, 2.1%, 13.6%, 34.1%, 34.1%, 13.6%, 2.1% and 0.1%.]

1 Introduction

To define probability distributions for the simplest cases, one needs to distinguish between discrete and continuous random variables. In the discrete case, one can easily assign a probability to each possible value: for example, when throwing a fair die, each of the six values 1 to 6 has the probability 1/6. In contrast, when a random variable takes values from a continuum then, typically, probabilities can be nonzero only if they refer to intervals: in quality control one might demand that the probability of a 500 g package containing between 490 g and 510 g should be no less than 98%.
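The two-dice pmf from the figure caption in the lead can be reproduced exactly. The Python sketch below assumes two independent fair six-sided dice. It builds p(S) as the convolution of two single-die pmfs, the property listed under "Some properties", and then sums the pmf over an event. The helper names `convolve` and `prob` are illustrative only, not part of any library. Exact `Fraction` arithmetic keeps the results identical to the fractions in the caption.

```python
from collections import defaultdict
from fractions import Fraction


def convolve(pmf_x, pmf_y):
    """pmf of X + Y for independent discrete X, Y: p(s) = sum over x of p_X(x) * p_Y(s - x)."""
    out = defaultdict(Fraction)  # Fraction() == 0, so missing sums start at zero
    for x, px in pmf_x.items():
        for y, py in pmf_y.items():
            out[x + y] += px * py
    return dict(sorted(out.items()))


def prob(pmf, event):
    """Probability of an event: sum of the pmf over the outcomes s for which event(s) holds."""
    return sum((p for s, p in pmf.items() if event(s)), Fraction(0))


die = {face: Fraction(1, 6) for face in range(1, 7)}  # fair die: each value 1..6 has probability 1/6
pmf_S = convolve(die, die)                             # sum S of two independent dice

print(pmf_S[11])                     # 1/18
print(prob(pmf_S, lambda s: s > 9))  # 1/6
print(sum(pmf_S.values()))           # 1
```

The same `prob` helper gives any other event probability from the pmf, which is the sense in which the pmf determines "all other probabilities in the distribution".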


If a total order is defined for its possible values, the cumulative distribution function (CDF) gives the probability that the random variable is no larger than a given value; in the real-valued case, the CDF is the integral of the probability density function (pdf), provided that this function exists.
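The statement that the CDF is the integral of the pdf can be checked numerically. The Python sketch below compares the closed-form standard normal CDF, written with `math.erf`, against a trapezoidal-rule integral of the density. The cutoff `lower=-10.0`, which stands in for minus infinity, and the step count `n` are assumptions that suit N(0, 1) and are not general-purpose defaults. The last lines recompute the percentages marked on the normal-distribution figure. These turn out to be the areas between successive multiples of the standard deviation.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Closed-form CDF: F(x) = (1 + erf((x - mu) / (sigma * sqrt(2)))) / 2."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def cdf_by_integration(pdf, x, lower=-10.0, n=20_000):
    """Trapezoidal approximation of the integral of pdf from `lower` to x.
    `lower` replaces minus infinity (an assumption that is adequate for N(0, 1))."""
    h = (x - lower) / n
    total = 0.5 * (pdf(lower) + pdf(x))
    for i in range(1, n):
        total += pdf(lower + i * h)
    return total * h


for x in (-1.0, 0.0, 1.5):
    print(x, normal_cdf(x), cdf_by_integration(normal_pdf, x))

# Area under the standard normal pdf between k and k + 1 standard deviations
bands = [normal_cdf(k + 1) - normal_cdf(k) for k in range(3)]
print([f"{100 * b:.1f}%" for b in bands])   # ['34.1%', '13.6%', '2.1%']
print(f"{100 * (1 - normal_cdf(3)):.1f}%")  # 0.1%
```

Because the normal pdf is symmetric about its mean, the same four areas appear on both sides of the figure.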

2 Terminology

As probability theory is used in quite diverse applications, terminology is not uniform and sometimes confusing. The following terms are used for non-cumulative probability distribution functions:

• Probability mass, Probability mass function, p.m.f.: for discrete random variables.
• Categorical distribution: for discrete random variables with a finite set of values.
• Probability density, Probability density function, p.d.f.: most often reserved for continuous random variables.

2.1 Basic terms

• Mode: for a discrete random variable, the value with highest probability (the location at which the probability mass function has its peak); for a continuous random variable, the location at which the probability density function has its peak.
• Support: the smallest closed set whose complement has probability zero.
• Head: the range of values where the pmf or pdf is relatively high.
• Tail: the complement of the head within the support; the large set of values where the pmf or pdf is relatively low.
• Expected value or mean: the weighted average of the possible values, using their probabilities as their weights; or the continuous analog thereof.
• Median: the value m such that the set of values less than m and the set of values greater than m each have probability at most one-half.
• Variance: the second moment of the pmf or pdf about the mean; an important measure of the dispersion of the distribution.
• Standard deviation: the square root of the variance, and hence another measure of dispersion.
• Symmetry: a property of some distributions in which the portion of the distribution to the left of a specific value is a mirror image of the portion to its right.
• Skewness: a measure of the extent to which a pmf or pdf leans to one side of its mean.

The following terms are somewhat ambiguous as they can refer to non-cumulative or cumulative distributions, depending on authors' preferences:

• Probability distribution function: continuous or discrete, non-cumulative or cumulative.
• Probability function: even more ambiguous, can mean any of the above or other things.

Finally,

• Probability distribution: sometimes the same as probability distribution function, but usually refers to the more complete assignment of probabilities to all measurable subsets of outcomes, not just to specific outcomes or ranges of outcomes.

3 Cumulative distribution function

Because a probability distribution Pr on the real line is determined by the probability of a scalar random variable X being in a half-open interval $(-\infty, x]$, the probability distribution is completely characterized by its cumulative distribution function:

$$F(x) = \Pr[X \le x] \quad \text{for all } x \in \mathbb{R}.$$

4 Discrete probability distribution

See also: Probability mass function and Categorical distribution

A discrete probability distribution is a probability distribution characterized by a probability mass function.

[Figure: The probability mass function of a discrete probability distribution. The probabilities of the singletons {1}, {3}, and {7} are respectively 0.2, 0.5, 0.3. A set not containing any of these points has probability zero.]
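The basic terms of Section 2.1 can be made concrete with the pmf in the figure above, which places probabilities 0.2, 0.5 and 0.3 on the points 1, 3 and 7. The Python sketch below computes the support, mode, mean, median, variance, standard deviation and skewness exactly with `Fraction`. It assumes the usual moment formulas, with skewness taken as the third central moment divided by the cube of the standard deviation. For the median it takes the smallest m with Pr(X ≤ m) ≥ 1/2, which is one convention among several when the median is not unique.

```python
import math
from fractions import Fraction

# pmf from the figure: Pr(X = 1) = 0.2, Pr(X = 3) = 0.5, Pr(X = 7) = 0.3, as exact fractions
pmf = {1: Fraction(1, 5), 3: Fraction(1, 2), 7: Fraction(3, 10)}
assert sum(pmf.values()) == 1  # a valid pmf sums to 1

support = sorted(x for x, p in pmf.items() if p > 0)  # finite case: the points with positive probability
mode = max(pmf, key=pmf.get)                           # value with the highest probability
mean = sum(x * p for x, p in pmf.items())              # probability-weighted average
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())  # second central moment
std_dev = math.sqrt(variance)
# third central moment divided by std_dev**3 (moment coefficient of skewness)
skewness = float(sum((x - mean) ** 3 * p for x, p in pmf.items())) / std_dev ** 3


def median(pmf):
    """Smallest m with Pr(X <= m) >= 1/2; then Pr(X < m) < 1/2 and Pr(X > m) <= 1/2."""
    cumulative = Fraction(0)
    for x in sorted(pmf):
        cumulative += pmf[x]
        if cumulative >= Fraction(1, 2):
            return x


print(support, mode, mean, median(pmf))  # [1, 3, 7] 3 19/5 3
print(variance, std_dev, skewness)
```

The skewness comes out positive, about 0.47, because the isolated mass at 7 pulls the distribution's right side further from the mean 3.8.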

Thus, the distribution of a random variable X is discrete, and X is called a discrete random variable, if

$$\sum_u \Pr(X = u) = 1 \tag{1}$$

as u runs through the set of all possible values of X. Hence, a random variable can assume only a finite or countably infinite number of values; the random variable is a discrete variable. For the number of potential values to be countably infinite, even though their probabilities sum to 1, the probabilities have to decline to zero fast enough. For example, if $\Pr(X = n) = \frac{1}{2^n}$ for n = 1, 2, ..., we have the sum of probabilities 1/2 + 1/4 + 1/8 + ... = 1.

Well-known discrete probability distributions used in statistical modeling include the Poisson distribution, the Bernoulli distribution, the binomial distribution, the geometric distribution, and the negative binomial distribution. Additionally, the discrete uniform distribution is commonly used in computer programs that make equal-probability random selections between a number of choices.

4.1 Measure theoretic formulation

A measurable function $X : A \to B$ between a probability space $(A, \mathcal{A}, P)$ and a measurable space $(B, \mathcal{B})$ is called a discrete random variable provided its image is a countable set and the pre-images of singleton sets are measurable, i.e., $X^{-1}(b) \in \mathcal{A}$ for all $b \in B$. The latter requirement induces a probability mass function $f_X : X(A) \to \mathbb{R}$ via $f_X(b) := P(X^{-1}(b))$. Since the pre-images of disjoint sets are disjoint,

$$\sum_{b \in X(A)} f_X(b) = \sum_{b \in X(A)} P\big(X^{-1}(b)\big) = P\Big(\bigcup_{b \in X(A)} X^{-1}(b)\Big) = P(A) = 1.$$

4.2 Cumulative density

[Figure: The cdf of a discrete probability distribution, ...]

[Figure: ... of a distribution which has both a continuous part and a discrete part.]

Equivalently to the above, a discrete random variable can be defined as a random variable whose cumulative distribution function (cdf) increases only by jump discontinuities; that is, its cdf increases only where it jumps to a higher value, and is constant between those jumps. The points where jumps occur are precisely the values which the random variable may take.

4.3 Delta-function representation

Consequently, a discrete probability distribution is often represented as a generalized probability density function involving Dirac delta functions, which substantially unifies the treatment of continuous and discrete distributions. This is especially useful when dealing with probability distributions involving both a continuous and a discrete part.

4.4 Indicator-function representation

For a discrete random variable X, let $u_0, u_1, \dots$ be the values it can take with non-zero probability. Denote

$$\Omega_i = X^{-1}(u_i) = \{\omega : X(\omega) = u_i\}, \qquad i = 0, 1, 2, \dots$$

These are disjoint sets, and by formula (1)

$$\Pr\Big(\bigcup_i \Omega_i\Big) = \sum_i \Pr(\Omega_i) = \sum_i \Pr(X = u_i) = 1.$$

It follows that the probability that X takes any value except for $u_0, u_1, \dots$ is zero, and thus one can write X as

$$X = \sum_i u_i 1_{\Omega_i}$$

except on a set of probability zero, where $1_A$ is the indicator function of A. This may serve as an alternative definition of discrete random variables.

5 Continuous probability distribution

A continuous probability distribution is a probability distribution that has a probability density function. Mathematicians also call such a distribution absolutely continuous, since its cumulative distribution function is absolutely continuous with respect to the Lebesgue measure λ. If the distribution of X is continuous, then X is called a continuous random variable. There are many examples of continuous probability distributions: normal, uniform, chi-squared, and others.

Intuitively, a continuous random variable is the one which can take a continuous range of values, as opposed to a discrete distribution, where the set of possible values for the random variable is at most countable. While for a discrete distribution an event with probability zero is impossible (e.g., rolling 3½ on a standard die is impossible, and has probability zero), this is not so in the case of a continuous random variable. For example, if one measures the width of an oak leaf, the result of 3½ cm is possible; however, it has probability zero because uncountably many other potential values exist even between 3 cm and 4 cm. Each of these individual outcomes has probability zero, yet the probability that the outcome will fall into the interval (3 cm, 4 cm) is nonzero. This apparent paradox is resolved by the fact that the probability that X attains some value within an infinite set, such as an interval, cannot be found by naively adding the probabilities for individual values. Formally, each value has an infinitesimally small probability, which statistically is equivalent to zero.

Formally, if X is a continuous random variable, then it has a probability density function f(x), and therefore its probability of falling into a given interval, say [a, b], is given by the integral

$$\Pr[a \le X \le b] = \int_a^b f(x)\,dx.$$

In particular, the probability for X to take any single value a (that is, $a \le X \le a$) is zero, because an integral with coinciding upper and lower limits is always equal to zero.

The definition states that a continuous probability distribution must possess a density, or equivalently, its cumulative distribution function be absolutely continuous. This requirement is stronger than simple continuity of the cumulative distribution function, and there is a special class of distributions, singular distributions, which are neither continuous nor discrete nor a mixture of those. An example is given by the Cantor distribution. Such singular distributions however are never encountered in practice.

Note on terminology: some authors use the term "continuous distribution" to denote a distribution with a continuous cumulative distribution function. Thus, their definition includes both the (absolutely) continuous and singular distributions.

By one convention, a probability distribution μ is called continuous if its cumulative distribution function $F(x) = \mu(-\infty, x]$ is continuous and, therefore, the probability measure of singletons $\mu\{x\} = 0$ for all x.

Another convention reserves the term continuous probability distribution for absolutely continuous distributions. These distributions can be characterized by a probability density function: a non-negative Lebesgue integrable function f defined on the real numbers such that

$$F(x) = \mu(-\infty, x] = \int_{-\infty}^{x} f(t)\,dt.$$

Discrete distributions and some continuous distributions (like the Cantor distribution) do not admit such a density.

6 Some properties

• The probability distribution of the sum of two independent random variables is the convolution of each of their distributions.
• Probability distributions are not a vector space (they are not closed under linear combinations, as these do not preserve non-negativity or total integral 1), but they are closed under convex combination, thus forming a convex subset of the space of functions (or measures).

7 Kolmogorov definition

Main articles: Probability space and Probability measure

In the measure-theoretic formalization of probability theory, a random variable is defined as a measurable function X from a probability space $(\Omega, \mathcal{F}, P)$ to a measurable space $(\mathcal{X}, \mathcal{A})$. A probability distribution of X is the pushforward measure $X_*P$ of X, which is a probability measure on $(\mathcal{X}, \mathcal{A})$ satisfying $X_*P = P \circ X^{-1}$.

8 Random number generation

Main article: Pseudo-random number sampling

A frequent problem in statistical simulations (the Monte Carlo method) is the generation of pseudo-random numbers that are distributed in a given way. Most algorithms are based on a pseudorandom number generator that produces numbers X that are uniformly distributed in the interval [0,1). These random variates X are then transformed via some algorithm to create a new random variate having the required probability distribution.

9 Applications

The concept of the probability distribution and the random variables which they describe underlies the mathematical discipline of probability theory, and the science of statistics. There is spread or variability in almost any value that can be measured in a population (e.g. height of people, durability of a metal, sales growth, traffic flow, etc.); almost all measurements are made with some intrinsic error; in physics many processes are described probabilistically, from the kinetic properties of gases to the quantum mechanical description of fundamental particles. For these and many other reasons, simple numbers are often inadequate for describing a quantity, while probability distributions are often more appropriate.

As a more specific example of an application, the cache language models and other statistical language models used in natural language processing to assign probabilities to the occurrence of particular words and word sequences do so by means of probability distributions.

10 Common probability distributions

Main article: List of probability distributions

The following is a list of some of the most common probability distributions, grouped by the type of process that they are related to. For a more complete list, see list of probability distributions, which groups by the nature of the outcome being considered (discrete, continuous, multivariate, etc.)

Note also that all of the univariate distributions below are singly peaked; that is, it is assumed that the values cluster around a single point. In practice, actually observed quantities may cluster around multiple values. Such quantities can be modeled using a mixture distribution.

10.1 Related to real-valued quantities that grow linearly (e.g. errors, offsets)

• Normal distribution (Gaussian distribution), for a single such quantity; the most common continuous distribution

10.2 Related to positive real-valued quantities that grow exponentially (e.g. prices, incomes, populations)

• Log-normal distribution, for a single such quantity whose log is normally distributed
• Pareto distribution, for a single such quantity whose log is exponentially distributed; the prototypical power law distribution

10.3 Related to real-valued quantities that are assumed to be uniformly distributed over a (possibly unknown) region

• Discrete uniform distribution, for a finite set of values (e.g. the outcome of a fair die)
• Continuous uniform distribution, for continuously distributed values

10.4 Related to Bernoulli trials (yes/no events, with a given probability)

• Basic distributions:
  ◦ Bernoulli distribution, for the outcome of a single Bernoulli trial (e.g. success/failure, yes/no)
  ◦ Binomial distribution, for the number of positive occurrences (e.g. successes, yes votes, etc.) given a fixed total number of independent occurrences
  ◦ Negative binomial distribution, for binomial-type observations but where the quantity of interest is the number of failures before a given number of successes occurs
  ◦ Geometric distribution, for binomial-type observations but where the quantity of interest is the number of failures before the first success; a special case of the negative binomial distribution
• Related to sampling schemes over a finite population:
  ◦ Hypergeometric distribution, for the number of positive occurrences (e.g. successes, yes votes, etc.) given a fixed number of total occurrences, using sampling without replacement
  ◦ Beta-binomial distribution, for the number of positive occurrences (e.g. successes, yes votes, etc.) given a fixed number of total occurrences, sampling using a Polya urn scheme (in some sense, the opposite of sampling without replacement)
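Section 8 says only that uniform numbers on [0,1) are transformed "via some algorithm". One common choice, substituted here as an assumption rather than as the article's method, is inverse transform sampling. The Python sketch below applies it to two distributions from the Bernoulli-trial list. A Bernoulli variate is obtained by thresholding U at p. A geometric variate (failures before the first success) is obtained by inverting the survival function Pr(X ≥ k) = (1 − p)^k. The function names are hypothetical, and `random.Random` is used only because its `random()` method returns floats in [0.0, 1.0), matching the interval in Section 8.

```python
import math
import random


def bernoulli(p, rng):
    """1 with probability p, else 0: threshold one uniform draw U in [0, 1)."""
    return 1 if rng.random() < p else 0


def geometric_failures(p, rng):
    """Failures before the first success (support 0, 1, 2, ...), for 0 < p < 1.
    Since Pr(floor(log U / log(1 - p)) >= k) = Pr(U <= (1 - p)**k) = (1 - p)**k,
    this inverts the survival function of the geometric distribution."""
    u = 1.0 - rng.random()  # U in (0, 1], so log(u) is finite
    return math.floor(math.log(u) / math.log(1.0 - p))


rng = random.Random(12345)  # seeded so the run is reproducible
p, n = 0.3, 100_000
bern = [bernoulli(p, rng) for _ in range(n)]
geom = [geometric_failures(p, rng) for _ in range(n)]

print(sum(bern) / n)      # close to p = 0.3
print(sum(geom) / n)      # close to (1 - p) / p, about 2.333
print(geom.count(0) / n)  # close to Pr(X = 0) = p
```

Each variate consumes exactly one uniform draw, which is the main appeal of inversion whenever the CDF or survival function can be inverted in closed form.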

10.5 Related to categorical outcomes (events with K possible outcomes, with a given probability for each outcome)

• Categorical distribution, for a single categorical outcome (e.g. yes/no/maybe in a survey); a generalization of the Bernoulli distribution
• Multinomial distribution, for the number of each type of categorical outcome, given a fixed number of total outcomes; a generalization of the binomial distribution
• Multivariate hypergeometric distribution, similar to the multinomial distribution, but using sampling without replacement; a generalization of the hypergeometric distribution

10.6 Related to events in a Poisson process (events that occur independently with a given rate)

• Poisson distribution, for the number of occurrences of a Poisson-type event in a given period of time
• Exponential distribution, for the time before the next Poisson-type event occurs
• Gamma distribution, for the time before the next k Poisson-type events occur

10.7 Related to the absolute values of vectors with normally distributed components

• Rayleigh distribution, for the distribution of vector magnitudes with Gaussian distributed orthogonal components. Rayleigh distributions are found in RF signals with Gaussian real and imaginary components.
• Rice distribution, a generalization of the Rayleigh distribution for cases where there is a stationary background signal component. Found in Rician fading of radio signals due to multipath propagation and in MR images with noise corruption on non-zero NMR signals.

10.8 Related to normally distributed quantities operated with sum of squares (for hypothesis testing)

• Chi-squared distribution, the distribution of a sum of squared standard normal variables; useful e.g. for inference regarding the sample variance of normally distributed samples (see chi-squared test)
• Student's t distribution, the distribution of the ratio of a standard normal variable and the square root of a scaled chi squared variable; useful for inference regarding the mean of normally distributed samples with unknown variance (see Student's t-test)
• F-distribution, the distribution of the ratio of two scaled chi squared variables; useful e.g. for inferences that involve comparing variances or involving R-squared (the squared correlation coefficient)

10.9 Useful as conjugate prior distributions in Bayesian inference

Main article: Conjugate prior

• Beta distribution, for a single probability (real number between 0 and 1); conjugate to the Bernoulli distribution and binomial distribution
• Gamma distribution, for a non-negative scaling parameter; conjugate to the rate parameter of a Poisson distribution or exponential distribution, the precision (inverse variance) of a normal distribution, etc.
• Dirichlet distribution, for a vector of probabilities that must sum to 1; conjugate to the categorical distribution and multinomial distribution; generalization of the beta distribution
• Wishart distribution, for a symmetric non-negative definite matrix; conjugate to the inverse of the covariance matrix of a multivariate normal distribution; generalization of the gamma distribution
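The Beta entry above can be illustrated concretely. "Conjugate to the Bernoulli distribution" means that a Beta(a, b) prior on the success probability, after s successes and f failures are observed, yields a Beta(a + s, b + f) posterior. The Python sketch below encodes just that update. The class name `Beta`, its methods, and the uniform Beta(1, 1) prior are illustrative assumptions, not a library API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beta:
    a: float  # prior pseudo-count of successes
    b: float  # prior pseudo-count of failures

    def mean(self) -> float:
        """Mean of Beta(a, b): a / (a + b)."""
        return self.a / (self.a + self.b)

    def update(self, outcomes):
        """Posterior after Bernoulli outcomes (1 = success, 0 = failure):
        conjugacy means the posterior is again a Beta distribution."""
        successes = sum(outcomes)
        failures = len(outcomes) - successes
        return Beta(self.a + successes, self.b + failures)


prior = Beta(1, 1)  # uniform prior on the success probability
posterior = prior.update([1, 0, 1, 1, 0, 1, 1])
print(posterior)         # Beta(a=6, b=3)
print(posterior.mean())  # 0.6666666666666666
```

Because the update only adds counts, processing the data in several batches gives the same posterior as processing it all at once.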

11 See also

• Copula (statistics)
• Empirical probability
• Histogram
• Joint probability distribution
• Likelihood function
• List of statistical topics
• Kirkwood approximation
• Moment-generating function
• Quasiprobability distribution
• Riemann–Stieltjes integral application to probability theory


14 Text and image sources, contributors, and licenses

14.1 Text

Probability distribution Source: https://en.wikipedia.org/wiki/Probability_distribution?oldid=673249902 Contributors: Damian Yerrick, AxelBoldt, Bryan Derksen, Zundark, The Anome, Ap, Fnielsen, Miguel~enwiki, Patrick, Michael Hardy, Dcljr, Tomi, TakuyaMurata, Jipumarino, Den fjättrade ankan~enwiki, Poor Yorick, Caramdir~enwiki, A5, Dino, Dysprosia, Jitse Niesen, Populus, Phys,
Shizhao, Robbot, Benwing, Seglea, Gandalf61, Rursus, (:Julien:), Ojigiri~enwiki, Giftlite, Pgan002, Chirlu, Knutux, LiDaobing, OverlordQ, MarkSweep, Fintor, Tordek ar, Rich Farmbrough, Paul August, Bender235, MisterSheik, Marner, O18, Whosyourjudas, 3mta3,
Runner1928, Tsirel, Alansohn, Eric Kvaalen, Sjschen, PAR, Avenue, Velella, Cburnett, TheCoffee, Morton.lin, Bkkbrad, OdedSchramm,
Btyner, Graham87, BD2412, Jclemens, TexasDawg, Salix alba, FlaBot, DVdm, Whosasking, YurikBot, Wavelength, RobotE, Jtkiefer,
Gaius Cornelius, Ponnu, Samois98, Schmock, Rwalker, Emijrp, Johndburger, Closedmouth, Teply, Jsnx, SmackBot, Unyoyega, Delldot,
Eskimbot, Chris the speller, Shawn M. O'Hare, Jprg1966, Lollerskates, Silly rabbit, Nbarth, Iwaterpolo, Eliezg, Gala.martin, ShaunES,
G716, Jon Awbrey, Mental Blank, OverInsured, Bjankuloski06, Dicklyon, Hu12, Courcelles, JRSpriggs, AbsolutDan, Thamelry, Davhorn,
Thijs!bot, Epbr123, Pabristow, Salgueiro~enwiki, Roger.simmons, VictorAnyakin, JAnDbot, Martinkunev, GoodDamon, SiobhanHansa,
Magioladitis, A.M.R., SHCarter, Riceplaytexas, Sandym, Baccyak4H, JJ Harrison, David Eppstein, Pax:Vobiscum, Uvainio, R'n'B, Lilac
Soul, MGriebe, J.delanoy, Hughperkins, It Is Me Here, Tarotcards, JayJasper, Richard D. LeCour, Policron, Domminico, Serguei S.
Dukachev, DavidCBryant, Larryisgood, Classical geographer, TXiKiBoT, Rponamgi, Anonymous Dissident, X-Bert, Jsd115, Adrokin,
Abhinav316, Markhebner, Quietbritishjim, TNARasslin, Ttony21, Dreginald, BotMultichill, Statlearn, Jojalozzo, Oxymoron83, Techman224, OKBot, Josuechan, Megaloxantha, Melcombe, Digisus, Rinconsoleao, ClueBot, Rumping, Justin W Smith, The Thing That
Should Not Be, Napzilla, ImperfectlyInformed, Niceguyedc, UKoch, WestwoodMatt, Lartoven, NuclearWarfare, Qwfp, Darkicebot, Gjnaasaa, XLinkBot, Alexius08, MystBot, Tayste, Addbot, Applepiein, Topology Expert, Jncraton, MrOllie, Bguras puppy, OlEnglish,
Jan eissfeldt, Loupeter, Zorrobot, Tedtoal, Luckas-bot, AnomieBOT, JackieBot, Csigabi, DirlBot, Xqbot, Bdmy, TinucherianBot II, Ptrf,
Rainald62, Hamamelis, AllCluesKey, Sky Attacker, I dream of horses, Gate2quality, Stpasha, Firelog, Lotje, Duoduoduo, Bhoola Pakistani,
Marie Poise, Kastchei, Jowa fan, David Vose, EmausBot, Branny 96, , JaeDyWolf, TuHan-Bot, Spamdingel, SporkBot, ClueBot
NG, Tillander, CocuBot, Frietjes, Braincricket, Amircrypto, Helpful Pixie Bot, ServiceAT, AvocatoBot, Xuhuyang, ChrisGualtieri, Illia
Connell, Dexbot, A3141592653589, Ginsuloft, Airwoz, Loraof, Ljyljy1379, KasparBot and Anonymous: 219

14.2 Images

• File:Commons-logo.svg Source: https://upload.wikimedia.org/wikipedia/en/4/4a/Commons-logo.svg License: ? Contributors: ? Original artist: ?
• License: Public domain Contributors: Own work using Inkscape and Open Office Draw software. Original artist: Tim Stellmach
• License: Public domain Contributors: Own work Original artist: Oleg Alexandrov
• File:Discrete_probability_distribution.svg Source: distribution.svg License: Public domain Contributors: Own work Original artist: Incnis Mrsi (talk)
• sepalwidth.svg License: CC BY-SA 3.0 Contributors: en:Image:Fisher iris versicolor sepalwidth.png Original artist: en:User:Qwfp (original); Pbroks13 (talk) (redraw)
• File:Folder_Hexagonal_Icon.svg Source: https://upload.wikimedia.org/wikipedia/en/4/48/Folder_Hexagonal_Icon.svg License: Cc-by-sa-3.0 Contributors: ? Original artist: ?
• File:Loglogisticpdf_no-labels.svg Source: License: CC BY-SA 3.0 Contributors: Loglogisticpdf.svg Original artist: Loglogisticpdf.svg: Qwfp (talk)
• File:Mixed_probability_distribution.svg Source: distribution.svg License: Public domain Contributors: Own work Original artist: Incnis Mrsi (talk)
• File:Normal_probability_distribution.svg Source: distribution.svg License: Public domain Contributors: Own work Original artist: Incnis Mrsi (talk)
• File:People_icon.svg Source: https://upload.wikimedia.org/wikipedia/commons/3/37/People_icon.svg License: CC0 Contributors: OpenClipart Original artist: OpenClipart
• File:Portal-puzzle.svg Source: https://upload.wikimedia.org/wikipedia/en/f/fd/Portal-puzzle.svg License: Public domain Contributors: ? Original artist: ?
• Contributors: Created from scratch in Adobe Illustrator. Based on Image:Question book.png created by User:Equazcion Original artist: Tkgd2007
• svg License: CC BY 2.5 Contributors: Own work, based (in concept) on figure by Jeremy Kemp, on 2005-02-09 Original artist: Mwtoews