
Wallenius' noncentral hypergeometric distribution
In probability theory and statistics, Wallenius'
noncentral hypergeometric distribution (named
after Kenneth Ted Wallenius) is a generalization of the
hypergeometric distribution where items are sampled
with bias.

This distribution can be illustrated as an urn model with bias. Assume, for example, that an urn contains m1 red balls and m2 white balls, totalling N = m1 + m2 balls. Each red ball has the weight ω1 and each white ball has the weight ω2. We will say that the odds ratio is ω = ω1 / ω2. Now we are taking n balls, one by one, in such a way that the probability of taking a particular ball at a particular draw is equal to its proportion of the total weight of all balls that lie in the urn at that moment. The number of red balls x1 that we get in this experiment is a random variable with Wallenius' noncentral hypergeometric distribution.

[Figure: Probability mass function for Wallenius' noncentral hypergeometric distribution for different values of the odds ratio ω. m1 = 80, m2 = 60, n = 100, ω = 0.1 ... 20]
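
The urn experiment described above can be simulated directly. The following is a minimal Python sketch, not a reference implementation. The function name `draw_wallenius` and its parameters are hypothetical, chosen for illustration. Only the odds ratio ω = ω1/ω2 is used, so each white ball is given weight 1 and each red ball weight ω.

```python
import random


def draw_wallenius(n, m1, m2, omega, rng=random):
    """Simulate one biased urn experiment and return the number of red balls.

    n      -- number of balls taken, one by one, without replacement
    m1, m2 -- initial numbers of red and white balls
    omega  -- odds ratio: weight of a red ball relative to a white ball
    rng    -- any object with a random() method (illustrative parameter)
    """
    if not 0 <= n <= m1 + m2:
        raise ValueError("n must lie between 0 and m1 + m2")
    red, white = m1, m2
    x = 0
    for _ in range(n):
        red_weight = red * omega          # total weight of the red balls left
        total_weight = red_weight + white  # white balls have weight 1
        # A red ball is taken with probability red_weight / total_weight.
        if rng.random() * total_weight < red_weight:
            red -= 1
            x += 1
        else:
            white -= 1
    return x


if __name__ == "__main__":
    rng = random.Random(12345)
    samples = [draw_wallenius(100, 80, 60, 2.0, rng) for _ in range(10000)]
    print("sample mean of x1:", sum(samples) / len(samples))
```

Averaging many simulated draws gives a rough empirical check on any calculated mean or probability.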

The matter is complicated by the fact that there is more than one noncentral hypergeometric distribution.
Wallenius' noncentral hypergeometric distribution is obtained if balls are sampled one by one in such a way
that there is competition between the balls. Fisher's noncentral hypergeometric distribution is obtained if the
balls are sampled simultaneously or independently of each other. Unfortunately, both distributions are
known in the literature as "the" noncentral hypergeometric distribution. It is important to be specific about
which distribution is meant when using this name.

The two distributions are both equal to the (central) hypergeometric distribution when the odds ratio is 1.

The difference between these two probability distributions is subtle. See the Wikipedia entry on noncentral
hypergeometric distributions for a more detailed explanation.

Univariate distribution
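Univariate Wallenius' noncentral hypergeometric distribution
Parameters: $m_1, m_2 \in \mathbb{N}$, $N = m_1 + m_2$, $n \in [0, N)$, $\omega \in \mathbb{R}_+$
Support: $x \in [x_{\min}, x_{\max}]$, where $x_{\min} = \max(0, n - m_2)$ and $x_{\max} = \min(n, m_1)$
PMF: $\binom{m_1}{x}\binom{m_2}{n-x}\int_0^1 \left(1 - t^{\omega/D}\right)^{x}\left(1 - t^{1/D}\right)^{n-x}\,dt$, where $D = \omega(m_1 - x) + (m_2 - (n - x))$
Mean: approximated by the solution $\mu$ to $\frac{\mu}{m_1} + \left(1 - \frac{n - \mu}{m_2}\right)^{\omega} = 1$
Variance: $\approx \frac{N a b}{(N - 1)(m_1 b + m_2 a)}$, where $a = \mu(m_1 - \mu)$ and $b = (n - \mu)(\mu + m_2 - n)$

Wallenius' distribution is particularly complicated because each ball has a probability of being taken that depends not only on its weight, but also on the total weight of its competitors. And the weight of the competing balls depends on the outcomes of all preceding draws.

This recursive dependency gives rise to a difference equation with a solution that is given in open form by the integral in the expression of the probability mass function in the table above.

Closed form expressions for the probability mass function exist (Lyons, 1980), but they are not very useful for practical calculations because of extreme numerical instability, except in degenerate cases.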

Several other calculation methods are used, including recursion, Taylor expansion and numerical integration (Fog, 2007, 2008).
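
As an illustration of the numerical integration approach, the sketch below evaluates the integral form of the probability mass function, C(m1, x) C(m2, n-x) ∫₀¹ (1 - t^(ω/D))^x (1 - t^(1/D))^(n-x) dt with D = ω(m1 - x) + (m2 - (n - x)), using a plain midpoint rule. It assumes the substitution t = s^D, which turns the integrand into D s^(D-1) (1 - s^ω)^x (1 - s)^(n-x). The function name and the `steps` parameter are hypothetical, and this simple rule is only a stand-in for the more careful integration schemes described in the literature. It should not be trusted for large or extreme parameter values.

```python
from math import comb


def wallenius_pmf_integral(x, n, m1, m2, omega, steps=20000):
    """Approximate the univariate Wallenius PMF by midpoint-rule integration.

    Assumes 0 <= n < m1 + m2 and omega > 0. Illustrative sketch only.
    """
    if x < max(0, n - m2) or x > min(n, m1):
        return 0.0  # outside the support
    d = omega * (m1 - x) + (m2 - (n - x))
    if d <= 0:
        raise ValueError("requires n < m1 + m2 and omega > 0")
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * h  # midpoint of the k-th subinterval; never 0 or 1
        # Integrand after substituting t = s**d, so that dt = d * s**(d-1) ds.
        total += d * s ** (d - 1.0) * (1.0 - s ** omega) ** x * (1.0 - s) ** (n - x)
    return comb(m1, x) * comb(m2, n - x) * total * h


if __name__ == "__main__":
    for x in range(3):
        print(x, wallenius_pmf_integral(x, 2, 2, 2, 2.0))
```

For m1 = m2 = 2, n = 2 and ω = 2 the three probabilities can be checked by hand against the two possible drawing orders, which makes this a convenient small test case.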

The most reliable calculation method is recursive calculation of f(x,n) from f(x,n-1) and f(x-1,n-1) using the recursion formula given below under properties. The probabilities of all (x,n) combinations on all possible trajectories leading to the desired point are calculated, starting with f(0,0) = 1 as shown in the figure. The total number of probabilities to calculate is n(x+1) - x². Other calculation methods must be used when n and x are so big that this method is too inefficient.

[Figure: Recursive calculation of probability f(x,n) in Wallenius' distribution. The light grey fields are possible points on the way to the final point. The arrows indicate an arbitrary trajectory.]

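The recursive calculation described above can be sketched in Python as a forward pass over the number of draws. Each state (k red balls after i draws) passes its probability on to (k+1, i+1) if a red ball is taken next and to (k, i+1) if a white ball is taken, which is equivalent to computing f(x,n) from f(x-1,n-1) and f(x,n-1). The function name is hypothetical, and for simplicity the sketch keeps every state with at most x red balls rather than only the n(x+1) - x² states that can still reach the final point.

```python
def wallenius_pmf_recursive(x, n, m1, m2, omega):
    """Probability of x red balls in n draws, by forward recursion.

    Assumes 0 <= n <= m1 + m2 and omega > 0. Illustrative sketch only.
    """
    if not 0 <= n <= m1 + m2:
        raise ValueError("n must lie between 0 and m1 + m2")
    if x < max(0, n - m2) or x > min(n, m1):
        return 0.0  # outside the support
    probs = [1.0]  # probs[k] = f(k, i); start with f(0, 0) = 1
    for i in range(n):  # i balls have been taken so far
        nxt = [0.0] * (len(probs) + 1)
        for k, p in enumerate(probs):
            if p == 0.0:
                continue  # unreachable state
            red = m1 - k          # red balls still in the urn
            white = m2 - (i - k)  # white balls still in the urn
            total = red * omega + white
            nxt[k + 1] += p * red * omega / total  # next ball is red
            nxt[k] += p * white / total            # next ball is white
        # States with more than x red balls can never lead to the target.
        probs = nxt[: x + 1]
    return probs[x]


if __name__ == "__main__":
    print(wallenius_pmf_recursive(60, 100, 80, 60, 2.0))
```

Because every term is a product of probabilities between 0 and 1, the sketch avoids the large binomial coefficients that make closed-form expressions numerically unstable.
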
The probability that all balls have the same color is
easier to calculate. See the formula below under
multivariate distribution.

No exact formula for the mean is known (short of complete enumeration of all probabilities). The equation
given above is reasonably accurate. This equation can be solved for μ by Newton-Raphson iteration. The
same equation can be used for estimating the odds from an experimentally obtained value of the mean.
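
A minimal sketch of the Newton-Raphson solution is given below. It assumes the approximate mean equation μ/m1 + (1 - (n - μ)/m2)^ω = 1 and adds a bisection safeguard, which is an implementation choice of this sketch rather than part of the method as described. The function name, tolerance and iteration limit are hypothetical.

```python
def wallenius_mean(n, m1, m2, omega, tol=1e-12, max_iter=100):
    """Approximate mean of the univariate Wallenius distribution.

    Solves mu/m1 + (1 - (n - mu)/m2)**omega = 1 for mu by Newton-Raphson
    iteration, falling back to bisection if a step leaves the bracket.
    """
    if not 0 <= n <= m1 + m2:
        raise ValueError("n must lie between 0 and m1 + m2")
    lo = float(max(0, n - m2))  # smallest possible number of red balls
    hi = float(min(n, m1))      # largest possible number of red balls
    if lo == hi:
        return lo  # degenerate case: the outcome is certain

    def base(mu):
        # Guard against a zero base caused by rounding at the lower bound.
        return max(1.0 - (n - mu) / m2, 1e-300)

    def g(mu):
        return mu / m1 + base(mu) ** omega - 1.0

    def dg(mu):
        return 1.0 / m1 + (omega / m2) * base(mu) ** (omega - 1.0)

    mu = 0.5 * (lo + hi)
    for _ in range(max_iter):
        value = g(mu)
        if value > 0.0:
            hi = mu  # g is increasing, so the root lies below mu
        elif value < 0.0:
            lo = mu
        else:
            return mu
        step = mu - value / dg(mu)  # Newton-Raphson update
        if not (lo <= step <= hi):
            step = 0.5 * (lo + hi)  # safeguard: bisect the bracket
        if abs(step - mu) <= tol:
            return step
        mu = step
    return mu


if __name__ == "__main__":
    print(wallenius_mean(100, 80, 60, 2.0))
```

With ω = 1 the equation becomes linear and the result reduces to the mean of the central hypergeometric distribution, n·m1/N, which gives a simple sanity check.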

Properties of the univariate distribution

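In the formulas below, $\operatorname{wnchypg}(x; n, m_1, m_2, \omega)$ denotes the probability mass function of the univariate distribution.

Wallenius' distribution has fewer symmetry relations than Fisher's noncentral hypergeometric distribution has. The only symmetry relates to the swapping of colors:

$$\operatorname{wnchypg}(x; n, m_1, m_2, \omega) = \operatorname{wnchypg}(n - x; n, m_2, m_1, 1/\omega)$$

Unlike Fisher's distribution, Wallenius' distribution has no symmetry relating to the number of balls not taken.

The following recursion formula is useful for calculating probabilities:

$$\operatorname{wnchypg}(x; n, m_1, m_2, \omega) = \operatorname{wnchypg}(x - 1; n - 1, m_1, m_2, \omega)\,\frac{(m_1 - x + 1)\,\omega}{(m_1 - x + 1)\,\omega + m_2 + x - n} + \operatorname{wnchypg}(x; n - 1, m_1, m_2, \omega)\,\frac{m_2 + x - n + 1}{(m_1 - x)\,\omega + m_2 + x - n + 1}$$

Another recursion formula is also known:

$$\operatorname{wnchypg}(x; n, m_1, m_2, \omega) = \operatorname{wnchypg}(x - 1; n - 1, m_1 - 1, m_2, \omega)\,\frac{m_1\,\omega}{m_1\,\omega + m_2} + \operatorname{wnchypg}(x; n - 1, m_1, m_2 - 1, \omega)\,\frac{m_2}{m_1\,\omega + m_2}$$

The probability is limited by

$$f_1(x) \le \operatorname{wnchypg}(x; n, m_1, m_2, \omega) \le f_2(x) \quad \text{for } \omega < 1,$$
$$f_1(x) \ge \operatorname{wnchypg}(x; n, m_1, m_2, \omega) \ge f_2(x) \quad \text{for } \omega > 1,$$

where

$$f_1(x) = \binom{n}{x}\frac{m_1^{\underline{x}}\, m_2^{\underline{n-x}}}{\left(m_1 + m_2/\omega\right)^{\underline{x}}\,\left(m_2 + \omega(m_1 - x)\right)^{\underline{n-x}}}$$

$$f_2(x) = \binom{n}{x}\frac{m_1^{\underline{x}}\, m_2^{\underline{n-x}}}{\left(m_1 + (m_2 - n + x)/\omega\right)^{\underline{x}}\,\left(m_2 + \omega m_1\right)^{\underline{n-x}}}$$

and the underlined superscript indicates the falling factorial, $a^{\underline{b}} = a(a-1)\cdots(a-b+1)$.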
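
The limits on the probability can be evaluated with a few lines of Python. The sketch below assumes that the two limits are the probabilities of the two extreme drawing orders (all red balls first, or all white balls first), each multiplied by the number of possible orders C(n, x), and writes them with falling factorials of real-valued bases. The function names are hypothetical.

```python
from math import comb


def falling(a, k):
    """Falling factorial a(a-1)...(a-k+1) for real a and integer k >= 0."""
    result = 1.0
    for j in range(k):
        result *= a - j
    return result


def wallenius_bounds(x, n, m1, m2, omega):
    """Return (f1, f2), the two limits on the Wallenius probability of x.

    Assumes x lies in the support, 0 <= n < m1 + m2 and omega > 0.
    f1 corresponds to taking the x red balls first, f2 to taking the
    n - x white balls first.
    """
    common = comb(n, x) * falling(m1, x) * falling(m2, n - x)
    f1 = common / (
        falling(m1 + m2 / omega, x) * falling(m2 + omega * (m1 - x), n - x)
    )
    f2 = common / (
        falling(m1 + (m2 - n + x) / omega, x) * falling(m2 + omega * m1, n - x)
    )
    return f1, f2


if __name__ == "__main__":
    print(wallenius_bounds(60, 100, 80, 60, 2.0))
```

For ω < 1 the first value is the lower limit and the second the upper limit; for ω > 1 the roles are reversed, and for ω = 1 both coincide with the central hypergeometric probability.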

Multivariate distribution
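The distribution can be expanded to any number of colors c of balls in the urn. The multivariate distribution is used when there are more than two colors.

Multivariate Wallenius' noncentral hypergeometric distribution
Parameters: $c \in \mathbb{N}$, $\mathbf{m} = (m_1, \ldots, m_c) \in \mathbb{N}^c$, $N = \sum_{i=1}^{c} m_i$, $n \in [0, N)$, $\boldsymbol{\omega} = (\omega_1, \ldots, \omega_c) \in \mathbb{R}_+^c$
Support: $S = \left\{\mathbf{x} \in \mathbb{Z}_{0+}^c : \sum_{i=1}^{c} x_i = n\right\}$
PMF: $\left(\prod_{i=1}^{c}\binom{m_i}{x_i}\right)\int_0^1 \prod_{i=1}^{c}\left(1 - t^{\omega_i/D}\right)^{x_i}\,dt$, where $D = \boldsymbol{\omega}\cdot(\mathbf{m} - \mathbf{x}) = \sum_{i=1}^{c}\omega_i(m_i - x_i)$
Mean: approximated by the solution $\mu_1, \ldots, \mu_c$ to $\left(1 - \frac{\mu_1}{m_1}\right)^{1/\omega_1} = \left(1 - \frac{\mu_2}{m_2}\right)^{1/\omega_2} = \cdots = \left(1 - \frac{\mu_c}{m_c}\right)^{1/\omega_c}$ with $\sum_{i=1}^{c}\mu_i = n$ and $0 \le \mu_i \le m_i$ for all $i$
Variance: approximated by the variance of Fisher's noncentral hypergeometric distribution with the same mean.

The probability mass function can be calculated by various Taylor expansion methods or by numerical integration (Fog, 2008).

The probability that all balls have the same color, j, can be calculated as:

$$\operatorname{mwnchypg}\big((0, \ldots, 0, x_j, 0, \ldots); n, \mathbf{m}, \boldsymbol{\omega}\big) = \frac{m_j^{\underline{n}}}{\left(\frac{1}{\omega_j}\sum_{i=1}^{c} m_i\omega_i\right)^{\underline{n}}}$$

for xj = n ≤ mj, where the underlined superscript denotes the falling factorial.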
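
The probability that all n balls have the same color follows directly from the urn model: at draw k+1 the remaining balls of color j carry weight (mj - k)ωj out of a total weight Σ mi ωi - k ωj. The short Python sketch below multiplies these factors, which is the same as the ratio of falling factorials mj(mj-1)...(mj-n+1) over W(W-1)...(W-n+1) with W = (Σ mi ωi)/ωj. The function name and argument order are hypothetical.

```python
def prob_all_same_color(j, n, m, omega):
    """Probability that all n balls taken have color j (0-based index).

    m     -- sequence with the number of balls of each color
    omega -- sequence with the (positive) weight of each color
    """
    if n > m[j]:
        return 0.0  # there are not enough balls of color j
    total_weight = sum(mi * wi for mi, wi in zip(m, omega))
    base = total_weight / omega[j]  # total weight in units of omega[j]
    p = 1.0
    for k in range(n):
        # Ratio of the k-th factors of the two falling factorials.
        p *= (m[j] - k) / (base - k)
    return p


if __name__ == "__main__":
    print(prob_all_same_color(0, 3, [10, 20, 30], [4.0, 2.0, 1.0]))
```

Multiplying factor by factor keeps every intermediate value between 0 and 1, so the calculation stays stable even when n is large.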

A reasonably good approximation to the mean can be calculated using the equation given above. The equation can be solved by defining θ so that

$$\mu_i = m_i\left(1 - e^{\omega_i\theta}\right)$$

and solving

$$\sum_{i=1}^{c}\mu_i = n$$

for θ by Newton-Raphson iteration.
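
A minimal Python sketch of this iteration follows. It assumes the parametrization μi = mi(1 - exp(ωi θ)) with θ ≤ 0 and solves Σ μi = n for θ, starting from θ = 0. The function name, tolerance and iteration limit are hypothetical, and all weights are assumed to be strictly positive.

```python
from math import exp


def multivariate_wallenius_mean(n, m, omega, tol=1e-12, max_iter=200):
    """Approximate mean vector of the multivariate Wallenius distribution.

    Solves sum_i m[i] * (1 - exp(omega[i] * theta)) = n for theta by
    Newton-Raphson iteration and returns the corresponding mu values.
    """
    total = sum(m)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the total number of balls")
    if n == total:
        return [float(mi) for mi in m]  # every ball is taken
    theta = 0.0  # theta = 0 corresponds to taking no balls
    for _ in range(max_iter):
        h = sum(mi * (1.0 - exp(wi * theta)) for mi, wi in zip(m, omega)) - n
        dh = -sum(mi * wi * exp(wi * theta) for mi, wi in zip(m, omega))
        step = h / dh
        theta -= step  # Newton-Raphson update
        if abs(step) < tol:
            break
    return [mi * (1.0 - exp(wi * theta)) for mi, wi in zip(m, omega)]


if __name__ == "__main__":
    print(multivariate_wallenius_mean(100, [80, 60, 40], [4.0, 2.0, 1.0]))
```

The function being solved is decreasing and concave in θ, so starting from θ = 0 the iterates move steadily towards the root. Convergence can still be slow when n is very close to N, because θ then tends to minus infinity.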

The equation for the mean is also useful for estimating the odds from experimentally obtained values for the
mean.

No good way of calculating the variance is known. The best known method is to approximate the
multivariate Wallenius distribution by a multivariate Fisher's noncentral hypergeometric distribution with
the same mean, and insert the mean as calculated above in the approximate formula for the variance of the
latter distribution.

Properties of the multivariate distribution

The order of the colors is arbitrary so that any colors can be swapped.

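The weights can be arbitrarily scaled:

$$\operatorname{mwnchypg}(\mathbf{x}; n, \mathbf{m}, \boldsymbol{\omega}) = \operatorname{mwnchypg}(\mathbf{x}; n, \mathbf{m}, r\boldsymbol{\omega})$$

for all $r \in \mathbb{R}_+$.

Colors with zero number (mi = 0) or zero weight (ωi = 0) can be omitted from the equations.

Colors with the same weight can be joined:

$$\operatorname{mwnchypg}\big(\mathbf{x}; n, \mathbf{m}, (\omega_1, \ldots, \omega_{c-1}, \omega_{c-1})\big) = \operatorname{mwnchypg}\big((x_1, \ldots, x_{c-1} + x_c); n, (m_1, \ldots, m_{c-1} + m_c), (\omega_1, \ldots, \omega_{c-1})\big) \cdot \operatorname{hypg}(x_c; x_{c-1} + x_c, m_c, m_{c-1} + m_c)$$

where $\operatorname{hypg}(x; n, m, N)$ is the (univariate, central) hypergeometric distribution probability.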

Complementary Wallenius' noncentral hypergeometric distribution
The balls that are not taken in the urn experiment have
a distribution that is different from Wallenius'
noncentral hypergeometric distribution, due to a lack of
symmetry. The distribution of the balls not taken can
be called the complementary Wallenius' noncentral
hypergeometric distribution.

Probabilities in the complementary distribution are calculated from Wallenius' distribution by replacing n with N-n, xi with mi - xi, and ωi with 1/ωi.

[Figure: Probability mass function for the complementary Wallenius' noncentral hypergeometric distribution for different values of the odds ratio ω. m1 = 80, m2 = 60, n = 40, ω = 0.05 ... 10]

Software available
WalleniusHypergeometricDistribution (http://reference.wolfram.com/mathematica/ref/WalleniusHypergeometricDistribution.html) in Mathematica.
An implementation for the R programming language is available as the package named BiasedUrn (http://cran.stat.ucla.edu/web/packages/BiasedUrn/index.html). Includes univariate and multivariate probability mass functions, distribution functions, quantiles, random variable generating functions, mean and variance.
Implementation in C++ is available from www.agner.org (http://www.agner.org/random/).

See also
Noncentral hypergeometric distributions
Fisher's noncentral hypergeometric distribution
Biased sample
Bias
Population genetics
Fisher's exact test

References
Chesson, J. (1976). "A non-central multivariate hypergeometric distribution arising from biased sampling with application to selective predation". Journal of Applied Probability. Vol. 13, no. 4. Applied Probability Trust. pp. 795–797. doi:10.2307/3212535 (https://doi.org/10.2307%2F3212535). JSTOR 3212535 (https://www.jstor.org/stable/3212535).
Fog, A. (2007). "Random number theory" (http://www.agner.org/random/theory/).
Fog, A. (2008). "Calculation Methods for Wallenius' Noncentral Hypergeometric Distribution". Communications in Statistics - Simulation and Computation. Vol. 37, no. 2. pp. 258–273. doi:10.1080/03610910701790269 (https://doi.org/10.1080%2F03610910701790269). S2CID 9040568 (https://api.semanticscholar.org/CorpusID:9040568).
Johnson, N. L.; Kemp, A. W.; Kotz, S. (2005). Univariate Discrete Distributions. Hoboken, New Jersey: Wiley and Sons.
Lyons, N. I. (1980). "Closed Expressions for Noncentral Hypergeometric Probabilities". Communications in Statistics - Simulation and Computation. Vol. 9, no. 3. pp. 313–314. doi:10.1080/03610918008812156 (https://doi.org/10.1080%2F03610918008812156).
Manly, B. F. J. (1974). "A Model for Certain Types of Selection Experiments". Biometrics. Vol. 30, no. 2. International Biometric Society. pp. 281–294. doi:10.2307/2529649 (https://doi.org/10.2307%2F2529649). JSTOR 2529649 (https://www.jstor.org/stable/2529649).
Wallenius, K. T. (1963). Biased Sampling: The Non-central Hypergeometric Probability Distribution (Ph.D. thesis). Stanford University, Department of Statistics.

Retrieved from "https://en.wikipedia.org/w/index.php?title=Wallenius%27_noncentral_hypergeometric_distribution&oldid=1163959688"
