distributions
Ian Jolliffe
University of Aberdeen
[Figure: plot with vertical axis f(t), ticks from 0.0 to 0.4]
Families of probability distributions
The number of different probability distributions
is unlimited. However, certain families of
distributions give good approximations to the
distributions of many random variables.
Important families of discrete distributions include
binomial, multinomial, Poisson, hypergeometric,
negative binomial …
Important families of continuous distributions
include normal (Gaussian), exponential, gamma,
lognormal, Weibull, extreme value …
Families of discrete distributions
We consider only two, binomial and
Poisson. There are many more.
Do not use a particular distribution unless
you are satisfied that the assumptions which
underlie it are (at least approximately)
satisfied.
Binomial distributions
1. The data arise from a sequence of n independent trials.
2. At each trial there are only two possible outcomes,
conventionally called success and failure.
3. The probability of success, p, is the same in each trial.
4. The random variable of interest is the number of
successes, X, in the n trials.
The assumptions of independence (1) and constant p (3)
are important. If they are invalid, so is
the binomial distribution.
Binomial distributions - examples
Example 2 on Slide 17 is an example of a binomial
distribution with 10 trials and probability of success 0.2.
It is unlikely that the binomial distribution would be
appropriate for the number of wet days in a period of 10
consecutive days, because of non-independence of rain on
consecutive days.
It might be appropriate for the number of frost-free
Januarys, or the number of crop failures, in a 10-year
period, if we can assume no inter-annual dependence and
no trend in p, the frost-free probability, or crop failure
probability.
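The binomial example with 10 trials and success probability 0.2 can be tabulated directly. The sketch below is illustrative only: the slides do not state the probability formula, so the standard binomial formula is assumed, and the function name binomial_pmf is a made-up helper.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Return P(X = k) when X counts successes in n independent trials,
    each with the same success probability p (assumptions 1 to 4)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.2  # the values quoted for Example 2
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, prob in enumerate(pmf):
    print(f"P(X={k:2d}) = {prob:.4f}")

# The probabilities over k = 0..n sum to 1, and the mean is n * p.
total = sum(pmf)
mean = sum(k * prob for k, prob in enumerate(pmf))
print(f"total = {total:.4f}, mean = {mean:.4f}")
```

If independence or constant p fails (as with wet days on consecutive days), these numbers are not a valid description of the data, however easy they are to compute.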
Poisson distributions
Poisson distributions are often used to describe the number
of occurrences of a ‘rare’ event. For example:
The number of tropical cyclones in a season.
The number of occasions in a season when river levels
exceed a certain value.
The main assumptions are that events occur:
at random (the occurrence of an event doesn’t change the
probability of it happening again);
at a constant rate.
Poisson distributions also arise as approximations to
binomials when n is large and p is small.
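The approximation of a binomial by a Poisson can be checked numerically. In this sketch the values n = 1000 and p = 0.003 are hypothetical, chosen only so that n is large, p is small and the matching Poisson mean n * p equals 3; both helper functions are illustrative names, not from the slides.

```python
from math import comb, exp, factorial


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a binomial with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, mu: float) -> float:
    """P(X = k) for a Poisson distribution with mean mu."""
    return exp(-mu) * mu**k / factorial(k)


n, p = 1000, 0.003  # hypothetical values: large n, small p
mu = n * p          # mean of the approximating Poisson distribution

for k in range(7):
    b = binomial_pmf(k, n, p)
    q = poisson_pmf(k, mu)
    print(f"k={k}: binomial={b:.4f}  Poisson={q:.4f}")

# Largest pointwise gap between the two sets of probabilities for k = 0..20.
max_diff = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, mu)) for k in range(21))
print(f"largest difference: {max_diff:.5f}")
```

Repeating the comparison with a smaller n or a larger p (for example n = 10, p = 0.3) shows the two sets of probabilities drifting apart.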
Poisson distributions – an example
Suppose that we can assume that the number of
cyclones, X, in a particular area in a season has a
Poisson distribution with a mean (average) of 3.
Then P(X=0) = 0.05, P(X=1) = 0.15, P(X=2) =
0.22, P(X=3) = 0.22, P(X=4) = 0.17, P(X=5) =
0.10, … Note:
There is no upper limit to X, unlike the binomial where
the upper limit is n.
Assuming a constant rate of occurrence, the number of
cyclones in 2 seasons would also have a Poisson
distribution, but with mean 6.
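The probabilities quoted for the cyclone example can be reproduced as follows. This is a minimal sketch using the standard Poisson probability formula, which the slides do not state. The cross-check for two seasons additionally assumes that the counts in the two seasons are independent.

```python
from math import exp, factorial


def poisson_pmf(k: int, mu: float) -> float:
    """P(X = k) for a Poisson distribution with mean mu."""
    return exp(-mu) * mu**k / factorial(k)


# One season, mean 3: probabilities of 0, 1, ..., 5 cyclones.
one_season = [poisson_pmf(k, 3.0) for k in range(6)]
rounded = [round(q, 2) for q in one_season]
print(rounded)

# Two seasons at the same constant rate: Poisson with mean 6.
two_seasons = [poisson_pmf(k, 6.0) for k in range(6)]

# Cross-check (assumes the two seasons are independent): add two
# Poisson(3) counts by convolving their probabilities.
convolved = [
    sum(poisson_pmf(j, 3.0) * poisson_pmf(k - j, 3.0) for j in range(k + 1))
    for k in range(6)
]
for k in range(6):
    print(f"k={k}: Poisson(6)={two_seasons[k]:.4f}  convolution={convolved[k]:.4f}")
```

The rounded one-season values agree with those listed above, and the two columns for two seasons coincide.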
Normal (Gaussian) distributions
Normal (also known as Gaussian) distributions are
by far the most commonly used family of
continuous distributions.
They are ‘bell-shaped’ (see Slide 20) and are
indexed by two parameters:
The mean – the distribution is symmetric about this
value
The standard deviation – this determines the spread of
the distribution. Roughly 2/3 of the distribution lies
within 1 standard deviation of the mean, and 95% within
2 standard deviations.
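The "roughly 2/3" and "95%" figures can be checked with Python's standard library. A small sketch; the helper name coverage is made up for illustration, and because the result does not depend on the mean or standard deviation, the standard normal is used.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1


def coverage(k: float) -> float:
    """Probability of lying within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)


within_1 = coverage(1.0)
within_2 = coverage(2.0)
print(f"within 1 sd: {within_1:.4f}")
print(f"within 2 sd: {within_2:.4f}")
```

The first value is a little over 2/3 and the second a little over 95%.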
Normal distributions - examples
The example on Slide 20 is a normal distribution
with mean 27 and standard deviation 3. The
probability of a temperature below 20 °C is 0.01; the
probability of a temperature between 25 °C and 30 °C
is 0.59; the probability of exceeding 32 °C is 0.05.
Normal distributions are relevant when variables
have roughly symmetric bell-shaped distributions.
They tend to be more appropriate for variables
which are sums or averages over several days or
months than for individual measurements.
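The three temperature probabilities in the Slide 20 example follow from the normal distribution function with mean 27 and standard deviation 3. A minimal sketch, assuming Python 3.8 or later for statistics.NormalDist; the variable names are illustrative.

```python
from statistics import NormalDist

temp = NormalDist(mu=27, sigma=3)  # temperature model from the example

p_below_20 = temp.cdf(20)                # P(T < 20)
p_25_to_30 = temp.cdf(30) - temp.cdf(25)  # P(25 < T < 30)
p_above_32 = 1 - temp.cdf(32)             # P(T > 32)

print(f"P(T < 20)      = {p_below_20:.2f}")
print(f"P(25 < T < 30) = {p_25_to_30:.2f}")
print(f"P(T > 32)      = {p_above_32:.2f}")
```

To two decimal places these match the quoted values of 0.01, 0.59 and 0.05.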
Deviations from normality - skewness
[Figure: skewed density f(x) plotted against x from 0 to 10; vertical axis from 0.0 to 0.3]
Families of skewed distributions
There are several families of skewed distributions,
including Weibull, gamma and lognormal. Each
family has 2 or more parameters which can be
varied to fit a variety of shapes.
One particular family (strictly 3 families) consists
of so-called extreme value distributions. As the
name suggests, these can be used to model
extremes over a period, for example, maximum
wind speed, minimum temperature, greatest 24-hour
rainfall, highest flood …
Other probability distributions
We have sketched a few of the main probability
distributions, but there are many others. Examples
which don’t fit standard patterns include:
Proportion of sky covered by cloud, which may have large
probability values near 0 and 1, with lower probabilities
in between: U-shaped rather than bell-shaped.
Daily rainfall, which is neither (purely) discrete nor
continuous. Positive values are continuous, but there is
also a non-zero (discrete) probability of taking the
value zero.
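The mixed nature of daily rainfall can be illustrated by simulation. Everything specific in this sketch is an assumption made for illustration and does not come from the slides: the dry-day probability of 0.6, the choice of a gamma distribution for wet-day amounts, its shape and scale, and the function name simulate_daily_rainfall.

```python
import random


def simulate_daily_rainfall(n_days, p_dry, shape, scale, seed=0):
    """Simulate daily rainfall as a mixture: exactly zero with
    probability p_dry, otherwise a positive gamma-distributed amount.
    The gamma choice and all parameter values are illustrative only."""
    rng = random.Random(seed)
    return [
        0.0 if rng.random() < p_dry else rng.gammavariate(shape, scale)
        for _ in range(n_days)
    ]


rain = simulate_daily_rainfall(10_000, p_dry=0.6, shape=2.0, scale=4.0)

dry_fraction = sum(1 for r in rain if r == 0.0) / len(rain)
wet = [r for r in rain if r > 0.0]
wet_mean = sum(wet) / len(wet)

print(f"fraction of dry days: {dry_fraction:.3f}")  # discrete part, near p_dry
print(f"mean on wet days:     {wet_mean:.2f}")       # continuous part, near shape * scale
```

A histogram of the simulated values would show a spike at zero alongside a skewed continuous spread of positive amounts, which no single purely discrete or purely continuous family reproduces.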