
Probability mass function

From Wikipedia, the free encyclopedia



Figure: The graph of a probability mass function. All the values of this function must be non-negative and sum up to 1.
In probability and statistics, a probability mass function is a function that gives
the probability that a discrete random variable is exactly equal to some value.[1]
Sometimes it is also known as the discrete density function. The probability mass
function is often the primary means of defining a discrete probability
distribution, and such functions exist for either scalar or multivariate random
variables whose domain is discrete.

A probability mass function differs from a probability density function (PDF) in that the latter is associated with continuous rather than discrete random variables. A PDF must be integrated over an interval to yield a probability.[2]

The value of the random variable having the largest probability mass is called the
mode.

Contents
1 Formal definition
2 Measure theoretic formulation
3 Examples
3.1 Finite
3.2 Infinite
4 Multivariate case
5 References
6 Further reading
Formal definition
A probability mass function is the probability distribution of a discrete random variable, and provides the possible values and their associated probabilities. It is the function $p \colon \mathbb{R} \to [0,1]$ defined by

$p_X(x) = P(X = x)$

for $-\infty < x < \infty$,[2] where $P$ is a probability measure. $p_X(x)$ can also be simplified as $p(x)$.[3]

The probabilities associated with all (hypothetical) values must be non-negative and sum up to 1,

$\sum_{x} p_X(x) = 1 \quad$ and $\quad p_X(x) \geq 0.$

Thinking of probability as mass helps to avoid mistakes, since the physical mass is conserved, as is the total probability for all hypothetical outcomes $x$.
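As a concrete illustration, a finite PMF can be represented as a mapping from values to probabilities and checked against these two conditions. The following is a minimal sketch in Python; the dictionary of probabilities and the helper names are hypothetical, not part of any standard library:

    import math

    # A hypothetical PMF for a loaded four-sided die, stored as value -> probability.
    pmf = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}

    def is_valid_pmf(p):
        """Check non-negativity and that the probabilities sum to 1 (up to rounding)."""
        return all(v >= 0 for v in p.values()) and math.isclose(sum(p.values()), 1.0)

    def pmf_value(p, x):
        """p_X(x): zero for any value outside the support."""
        return p.get(x, 0.0)

    mode = max(pmf, key=pmf.get)   # the value carrying the largest probability mass

    print(is_valid_pmf(pmf))   # True
    print(pmf_value(pmf, 5))   # 0.0
    print(mode)                # 4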

Measure theoretic formulation


A probability mass function of a discrete random variable $X$ can be seen as a special case of two more general measure theoretic constructions: the distribution of $X$ and the probability density function of $X$ with respect to the counting measure. We make this more precise below.

Suppose that $(A, \mathcal{A}, P)$ is a probability space and that $(B, \mathcal{B})$ is a measurable space whose underlying σ-algebra is discrete, so in particular contains singleton sets of $B$. In this setting, a random variable $X \colon A \to B$ is discrete provided its image is countable. The pushforward measure $X_*(P)$, called the distribution of $X$ in this context, is a probability measure on $B$ whose restriction to singleton sets induces the probability mass function (as mentioned in the previous section) $f_X \colon B \to \mathbb{R}$, since $f_X(b) = P(X^{-1}(b)) = P(X = b)$ for each $b \in B$.

Now suppose that $(B, \mathcal{B}, \mu)$ is a measure space equipped with the counting measure $\mu$. The probability density function $f$ of $X$ with respect to the counting measure, if it exists, is the Radon–Nikodym derivative of the pushforward measure of $X$ (with respect to the counting measure), so $f = dX_*P/d\mu$, and $f$ is a function from $B$ to the non-negative reals. As a consequence, for any $b \in B$ we have

$P(X = b) = P(X^{-1}(b)) = (X_*P)(\{b\}) = \int_{\{b\}} f \, d\mu = f(b),$

demonstrating that $f$ is in fact a probability mass function.
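As an informal sketch of this construction, assuming a finite sample space in which every outcome is equally likely, the pushforward of the probability measure under $X$ can be tabulated by grouping outcomes that map to the same value. The names below (sample_space, X) are illustrative only:

    from collections import Counter
    from fractions import Fraction

    # Hypothetical finite probability space: the 36 equally likely outcomes of two dice.
    sample_space = [(i, j) for i in range(1, 7) for j in range(1, 7)]

    def X(outcome):
        """A discrete random variable: the sum of the two dice."""
        return outcome[0] + outcome[1]

    # Pushforward measure X_*(P): group outcomes by their image and add up their probability.
    counts = Counter(X(w) for w in sample_space)
    pmf = {b: Fraction(c, len(sample_space)) for b, c in counts.items()}

    # f_X(b) = P(X^{-1}(b)) = P(X = b); for example, P(X = 7) = 6/36 = 1/6.
    print(pmf[7])             # 1/6
    print(sum(pmf.values()))  # 1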

When there is a natural order among the potential outcomes $x$, it may be convenient to assign numerical values to them (or n-tuples in case of a discrete multivariate random variable) and to consider also values not in the image of $X$. That is, $f_X$ may be defined for all real numbers, with $f_X(x) = 0$ for all $x \notin X(S)$, as shown in the figure.

The image of $X$ has a countable subset on which the probability mass function $f_X(x)$ is one. Consequently, the probability mass function is zero for all but a countable number of values of $x$.

The discontinuity of probability mass functions is related to the fact that the cumulative distribution function of a discrete random variable is also discontinuous. If $X$ is a discrete random variable, then $P(X = x) = 1$ means that the event $(X = x)$ is certain (it occurs in 100% of cases); on the contrary, $P(X = x) = 0$ means that the event $(X = x)$ is impossible. This statement does not hold for a continuous random variable $X$, for which $P(X = x) = 0$ for any possible $x$: in fact, by definition, a continuous random variable has an uncountably infinite set of possible values, and thus the probability that it takes any single particular value $x$ is, informally, $\frac{1}{\infty} = 0$. Discretization is the process of converting a continuous random variable into a discrete one.
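As a rough illustration of discretization, one can bin samples from a continuous distribution and estimate a PMF over the resulting bins. A minimal sketch with NumPy follows; the cut points and sample size are arbitrary choices for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    samples = rng.normal(loc=0.0, scale=1.0, size=10_000)  # a continuous random variable

    # Discretize: assign each sample to one of four intervals split at -1, 0 and 1.
    cuts = np.array([-1.0, 0.0, 1.0])
    labels = np.digitize(samples, cuts)          # 0, 1, 2 or 3 for each sample

    # Empirical PMF of the resulting discrete (binned) random variable.
    values, counts = np.unique(labels, return_counts=True)
    pmf = dict(zip(values.tolist(), (counts / counts.sum()).tolist()))
    print(pmf, sum(pmf.values()))                # four probabilities summing to 1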

Examples
Main articles: Bernoulli distribution, Binomial distribution, and Geometric
distribution
Finite
Three major distributions are associated with finite discrete random variables: the Bernoulli distribution, the binomial distribution and the geometric distribution.

The Bernoulli distribution, Ber(p), is used to model an experiment with only two possible outcomes. The two outcomes are often encoded as 1 and 0.

$p_X(x) = \begin{cases} p, & \text{if } x = 1 \\ 1 - p, & \text{if } x = 0 \end{cases}$
An example of the Bernoulli distribution is tossing a coin. Suppose that $S$ is the sample space of all outcomes of a single toss of a fair coin, and $X$ is the random variable defined on $S$ assigning 0 to the category "tails" and 1 to the category "heads". Since the coin is fair, the probability mass function is

$p_X(x) = \begin{cases} \frac{1}{2}, & x \in \{0, 1\}, \\ 0, & x \notin \{0, 1\}. \end{cases}$
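A minimal sketch of this fair-coin PMF in Python (the function name bernoulli_pmf is illustrative, not a library routine):

    def bernoulli_pmf(x, p=0.5):
        """p_X(x) for a Bernoulli(p) variable; zero outside {0, 1}."""
        if x == 1:
            return p
        if x == 0:
            return 1 - p
        return 0.0

    print(bernoulli_pmf(1))    # 0.5 for a fair coin
    print(bernoulli_pmf(2))    # 0.0, since 2 is not a possible outcome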
The binomial distribution models the number of successes when someone draws $n$ times with replacement. Each draw or experiment is independent, with two possible outcomes. The associated probability mass function is $\binom{n}{k} p^{k} (1-p)^{n-k}$.

Figure: The probability mass function of a fair die. All the numbers on the die have an equal chance of appearing on top when the die stops rolling.
An example of the binomial distribution is the probability of getting exactly one 6
when someone rolls a fair die three times.
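That probability can be computed directly from the formula above with $n = 3$, $k = 1$ and $p = 1/6$; a short sketch:

    from math import comb

    n, k, p = 3, 1, 1 / 6
    prob = comb(n, k) * p**k * (1 - p)**(n - k)
    print(prob)  # 3 * (1/6) * (5/6)^2 ≈ 0.3472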
The geometric distribution describes the number of trials needed to get one success. Its probability mass function is $p_X(k) = (1-p)^{k-1} p$.
An example is tossing a coin until the first head appears.
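A brief sketch of this PMF for a fair coin, where $p = 1/2$ and $k$ counts the toss on which the first head appears (the function name is illustrative):

    def geometric_pmf(k, p=0.5):
        """Probability that the first success occurs on trial k (k = 1, 2, 3, ...)."""
        if k < 1:
            return 0.0
        return (1 - p) ** (k - 1) * p

    print(geometric_pmf(1))  # 0.5   : head on the first toss
    print(geometric_pmf(3))  # 0.125 : tails, tails, then a head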
Other distributions that can be modeled using a probability mass function are the categorical distribution (also known as the generalized Bernoulli distribution) and the multinomial distribution.
If the discrete distribution has two or more categories, exactly one of which may occur, then, whether or not these categories have a natural ordering, the distribution over a single trial (draw) is a categorical distribution.
An example of a multivariate discrete distribution, and of its probability mass
function, is provided by the multinomial distribution. Here the multiple random
variables are the numbers of successes in each of the categories after a given
number of trials, and each non-zero probability mass gives the probability of a
certain combination of numbers of successes in the various categories.
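As an illustration of such a multivariate probability mass, the sketch below computes the probability of one particular combination of category counts from the standard multinomial formula $\frac{n!}{x_1! \cdots x_k!} p_1^{x_1} \cdots p_k^{x_k}$; the specific counts and probabilities are arbitrary examples:

    from math import factorial, prod

    def multinomial_pmf(counts, probs):
        """Joint PMF of category counts after n = sum(counts) independent trials."""
        n = sum(counts)
        coeff = factorial(n) // prod(factorial(c) for c in counts)
        return coeff * prod(p**c for p, c in zip(probs, counts))

    # Probability of seeing exactly two 1s, one 2 and one 6 in four rolls of a fair die.
    print(multinomial_pmf([2, 1, 0, 0, 0, 1], [1/6] * 6))  # ≈ 0.00926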
Infinite
The following exponentially declining distribution is an example of a distribution with an infinite number of possible outcomes (all the positive integers):

$\Pr(X = i) = \frac{1}{2^{i}} \qquad \text{for } i = 1, 2, 3, \dots$

Despite the infinite number of possible outcomes, the total probability mass is $1/2 + 1/4 + 1/8 + \cdots = 1$, satisfying the unit total probability requirement for a probability distribution.
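A quick numerical check of this convergence, looking at a few partial sums of the series:

    # Partial sums of 1/2 + 1/4 + 1/8 + ... approach 1.
    for n in (1, 2, 3, 10, 50):
        print(n, sum(1 / 2**i for i in range(1, n + 1)))
    # 1  0.5
    # 2  0.75
    # 3  0.875
    # 10 0.9990234375
    # 50 ≈ 1 - 2**-50, i.e. very close to 1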
Multivariate case
Main article: Joint probability distribution
Two or more discrete random variables have a joint probability mass function, which
gives the probability of each possible combination of realizations for the random
variables.
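For example, the joint PMF of two discrete random variables can be tabulated as a mapping from pairs of values to probabilities, and marginal PMFs are recovered by summing over the other variable. A small sketch for two independent fair coins (all names are illustrative):

    from itertools import product

    # Joint PMF of two independent fair coin tosses (X, Y), each 0 or 1.
    joint = {(x, y): 0.25 for x, y in product((0, 1), repeat=2)}

    # Marginal PMF of X, obtained by summing the joint PMF over all values of Y.
    marginal_x = {x: sum(p for (xv, _), p in joint.items() if xv == x) for x in (0, 1)}

    print(joint[(1, 0)])  # 0.25
    print(marginal_x)     # {0: 0.5, 1: 0.5}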

References
1. Stewart, William J. (2011). Probability, Markov Chains, Queues, and Simulation: The Mathematical Basis of Performance Modeling. Princeton University Press. p. 105. ISBN 978-1-4008-3281-1.
2. Dekking, Michel (2005). A Modern Introduction to Probability and Statistics: Understanding Why and How. London: Springer. ISBN 978-1-85233-896-1. OCLC 262680588.
3. Rao, Singiresu S. (1996). Engineering Optimization: Theory and Practice (3rd ed.). New York: Wiley. ISBN 0-471-55034-5. OCLC 62080932.
Further reading
Johnson, N. L.; Kotz, S.; Kemp, A. (1993). Univariate Discrete Distributions (2nd
ed.). Wiley. p. 36. ISBN 0-471-54897-9.