Professional Documents
Culture Documents
Pearson Distribution
Pearson Distribution
History
The Pearson system was originally devised in an effort
to model visibly skewed observations. It was well
known at the time how to adjust a theoretical model to
fit the first two cumulants or moments of observed
data: Any probability distribution can be extended
straightforwardly to form a location-scale family.
Except in pathological cases, a location-scale family
can be made to fit the observed mean (first cumulant)
and variance (second cumulant) arbitrarily well.
However, it was not known how to construct
probability distributions in which the skewness
(standardized third cumulant) and kurtosis
(standardized fourth cumulant) could be adjusted
equally freely. This need became apparent when trying
to fit known theoretical models to observed data that
exhibited skewness. Pearson's examples include
survival data, which are usually asymmetric. Diagram of the Pearson system, showing
distributions of types I, III, VI, V, and IV in terms
In his original paper, Pearson (1895, p. 360) identified of β1 (squared skewness) and β2 (traditional
four types of distributions (numbered I through IV) in kurtosis)
addition to the normal distribution (which was
originally known as type V). The classification
depended on whether the distributions were supported on a bounded interval, on a half-line, or on the
whole real line; and whether they were potentially skewed or necessarily symmetric. A second paper
(Pearson 1901) fixed two omissions: it redefined the type V distribution (originally just the normal
distribution, but now the inverse-gamma distribution) and introduced the type VI distribution. Together the
first two papers cover the five main types of the Pearson system (I, III, IV, V, and VI). In a third paper,
Pearson (1916) introduced further special cases and subtypes (VII through XII).
Rhind (1909, pp. 430–432) devised a simple way of visualizing the parameter space of the Pearson system,
which was subsequently adopted by Pearson (1916, plate 1 and pp. 430ff., 448ff.). The Pearson types are
characterized by two quantities, commonly referred to as β1 and β2 . The first is the square of the skewness:
where γ1 is the skewness, or third standardized moment. The second is the traditional kurtosis, or
fourth standardized moment: β2 = γ2 + 3. (Modern treatments define kurtosis γ2 in terms of cumulants
instead of moments, so that for a normal distribution we have γ2 = 0 and β2 = 3. Here we follow the
historical precedent and use β2 .) The diagram on the right shows which Pearson type a given concrete
distribution (identified by a point (β1 , β2 )) belongs to.
Many of the skewed and/or non-mesokurtic distributions familiar to us today were still unknown in the
early 1890s. What is now known as the beta distribution had been used by Thomas Bayes as a posterior
distribution of the parameter of a Bernoulli distribution in his 1763 work on inverse probability. The Beta
distribution gained prominence due to its membership in Pearson's system and was known until the 1940s
as the Pearson type I distribution.[1] (Pearson's type II distribution is a special case of type I, but is usually
no longer singled out.) The gamma distribution originated from Pearson's work (Pearson 1893, p. 331;
Pearson 1895, pp. 357, 360, 373–376) and was known as the Pearson type III distribution, before acquiring
its modern name in the 1930s and 1940s.[2] Pearson's 1895 paper introduced the type IV distribution,
which contains Student's t-distribution as a special case, predating William Sealy Gosset's subsequent use
by several years. His 1901 paper introduced the inverse-gamma distribution (type V) and the beta prime
distribution (type VI).
Definition
A Pearson density p is defined to be any valid solution to the differential equation (cf. Pearson 1895,
p. 381)
with:
According to Ord,[3] Pearson devised the underlying form of Equation (1) on the basis of, firstly, the
formula for the derivative of the logarithm of the density function of the normal distribution (which gives a
linear function) and, secondly, from a recurrence relation for values in the probability mass function of the
hypergeometric distribution (which yields the linear-divided-by-quadratic structure).
In Equation (1), the parameter a determines a stationary point, and hence under some conditions a mode of
the distribution, since
Since we are confronted with a first-order linear differential equation with variable coefficients, its solution
is straightforward:
The integral in this solution simplifies considerably when certain special cases of the integrand are
considered. Pearson (1895, p. 367) distinguished two main cases, determined by the sign of the
discriminant (and hence the number of real roots) of the quadratic function
Particular types of distribution
If the discriminant of the quadratic function (2) is negative ( ), it has no real roots. Then
define
The absence of real roots is obvious from this formulation, because α2 is necessarily positive.
Pearson (1895, p. 362) called this the "trigonometrical case", because the integral
Finally, let
Applying these substitutions, we obtain the parametric function:
This unnormalized density has support on the entire real line. It depends on a scale parameter α > 0 and
shape parameters m > 1/2 and ν. One parameter was lost when we chose to find the solution to the
differential equation (1) as a function of y rather than x. We therefore reintroduce a fourth parameter,
namely the location parameter λ. We have thus derived the density of the Pearson type IV distribution:
The normalizing constant involves the complex Gamma function (Γ) and the Beta function (B). Notice that
the location parameter λ here is not the same as the original location parameter introduced in the general
formulation, but is related via
An alternative parameterization (and slight specialization) of the type VII distribution is obtained by letting
which requires m > 3/2. This entails a minor loss of generality but ensures that the variance of the
distribution exists and is equal to σ2 . Now the parameter m only controls the kurtosis of the distribution. If
m approaches infinity as λ and σ are held constant, the normal distribution arises as a special case:
This is the density of a normal distribution with mean λ and standard deviation σ.
This is another specialization, and it guarantees that the first four moments of the distribution exist. More
specifically, the Pearson type VII distribution parameterized in terms of (λ, σ, γ2 ) has a mean of λ, standard
deviation of σ, skewness of zero, and positive excess kurtosis of γ2 .
Student's t-distribution
The Pearson type VII distribution is equivalent to the non-standardized Student's t-distribution with
parameters ν > 0, μ, σ2 by applying the following substitutions to its original parameterization:
This implies that the Pearson type VII distribution subsumes the standard Student's t-distribution and also
the standard Cauchy distribution. In particular, the standard Student's t-distribution arises as a subcase,
when μ = 0 and σ2 = 1, equivalent to the following substitutions:
The density of this restricted one-parameter family is a standard Student's t:
If the quadratic function (2) has a non-negative discriminant ( ), it has real roots a1 and a2
(not necessarily distinct):
In the presence of real roots the quadratic function (2) can be written as
Pearson (1895, p. 362) called this the "logarithmic case", because the integral
involves only the logarithm function and not the arctan function as in the previous case.
The Pearson type I distribution (a generalization of the beta distribution) arises when the roots of the
quadratic equation (2) are of opposite sign, that is, . Then the solution p is supported on the
interval . Apply the substitution
where , which yields a solution in terms of y that is supported on the interval (0, 1):
The Pearson type II distribution is a special case of the Pearson type I family restricted to symmetric
distributions.
where
The ordinate, y, is the frequency of . The Pearson Type II Curve is used in computing the table of
significant correlation coefficients for Spearman's rank correlation coefficient when the number of items in
a series is less than 100 (or 30, depending on some sources). After that, the distribution mimics a standard
Student's t-distribution. For the table of values, certain values are used as the constants in the previous
equation:
Defining
gamma distribution.
Defining
follows a . The Pearson type VI distribution is a beta prime
distribution or F-distribution.
Alternatives to the Pearson system of distributions for the purpose of fitting distributions to data are the
quantile-parameterized distributions (QPDs) and the metalog distributions. QPDs and metalogs can provide
greater shape and bounds flexibility than the Pearson system. Instead of fitting moments, QPDs are
typically fit to empirical CDF or other data with linear least squares.
Applications
These models are used in financial markets, given their ability to be parametrized in a way that has intuitive
meaning for market traders. A number of models are in current use that capture the stochastic nature of the
volatility of rates, stocks, etc., and this family of distributions may prove to be one of the more important.
In the United States, the Log-Pearson III is the default distribution for flood frequency analysis.[5]
Recently, there have been alternatives developed to the Pearson distributions that are more flexible and
easier to fit to data. See the metalog distributions.
Notes
1. Miller, Jeff; et al. (2006-07-09). "Beta distribution" (http://jeff560.tripod.com/b.html). Earliest
Known Uses of Some of the Words of Mathematics. Retrieved 2006-12-09.
2. Miller, Jeff; et al. (2006-12-07). "Gamma distribution" (http://jeff560.tripod.com/g.html).
Earliest Known Uses of Some of the Words of Mathematics. Retrieved 2006-12-09.
3. Ord J.K. (1972) p. 2
4. Ramsey, Philip H. (1989-09-01). "Critical Values for Spearman's Rank Order Correlation".
Journal of Educational Statistics. 14 (3): 245–253. JSTOR 1165017 (https://www.jstor.org/sta
ble/1165017).
5. "Guidelines for Determine Flood Flow Frequency" (https://water.usgs.gov/osw/bulletin17b/dl
_flow.pdf) (PDF). USGS Water. March 1982. Retrieved 2019-06-14.
Sources
Primary sources
Pearson, Karl (1893). "Contributions to the mathematical theory of evolution [abstract]" (http
s://doi.org/10.1098%2Frspl.1893.0079). Proceedings of the Royal Society. 54 (326–330):
329–333. doi:10.1098/rspl.1893.0079 (https://doi.org/10.1098%2Frspl.1893.0079).
JSTOR 115538 (https://www.jstor.org/stable/115538).
Pearson, Karl (1895). "Contributions to the mathematical theory of evolution, II: Skew
variation in homogeneous material" (https://zenodo.org/record/1432104/files/article.pdf)
(PDF). Philosophical Transactions of the Royal Society. 186: 343–414.
Bibcode:1895RSPTA.186..343P (https://ui.adsabs.harvard.edu/abs/1895RSPTA.186..343
P). doi:10.1098/rsta.1895.0010 (https://doi.org/10.1098%2Frsta.1895.0010). JSTOR 90649
(https://www.jstor.org/stable/90649).
Pearson, Karl (1901). "Mathematical contributions to the theory of evolution, X: Supplement
to a memoir on skew variation" (https://doi.org/10.1098%2Frsta.1901.0023). Philosophical
Transactions of the Royal Society A. 197 (287–299): 443–459.
Bibcode:1901RSPTA.197..443P (https://ui.adsabs.harvard.edu/abs/1901RSPTA.197..443
P). doi:10.1098/rsta.1901.0023 (https://doi.org/10.1098%2Frsta.1901.0023). JSTOR 90841
(https://www.jstor.org/stable/90841).
Pearson, Karl (1916). "Mathematical contributions to the theory of evolution, XIX: Second
supplement to a memoir on skew variation" (https://doi.org/10.1098%2Frsta.1916.0009).
Philosophical Transactions of the Royal Society A. 216 (538–548): 429–457.
Bibcode:1916RSPTA.216..429P (https://ui.adsabs.harvard.edu/abs/1916RSPTA.216..429
P). doi:10.1098/rsta.1916.0009 (https://doi.org/10.1098%2Frsta.1916.0009). JSTOR 91092
(https://www.jstor.org/stable/91092).
Rhind, A. (July–October 1909). "Tables to facilitate the computation of the probable errors of
the chief constants of skew frequency distributions" (https://zenodo.org/record/1431607).
Biometrika. 7 (1/2): 127–147. doi:10.1093/biomet/7.1-2.127 (https://doi.org/10.1093%2Fbiom
et%2F7.1-2.127). JSTOR 2345367 (https://www.jstor.org/stable/2345367).
Secondary sources
Milton Abramowitz and Irene A. Stegun (1964). Handbook of Mathematical Functions with
Formulas, Graphs, and Mathematical Tables. National Bureau of Standards.
Eric W. Weisstein et al. Pearson Type III Distribution (http://mathworld.wolfram.com/Pearson
TypeIIIDistribution.html). From MathWorld.
References
Elderton, Sir W.P, Johnson, N.L. (1969) Systems of Frequency Curves. Cambridge
University Press.
Ord J.K. (1972) Families of Frequency Distributions. Griffin, London.