
Probability and probability distributions

Ian Jolliffe
University of Aberdeen

CLIPS module 3.4


What is meant by the probability of an event?
• Both ‘event’ and ‘probability’ are intuitively understood by most people, but we need to establish certain rules.
• ‘Rain tomorrow’, ‘3 or fewer cyclones next year’, ‘The analysed surface pressure at a grid-point has an error of more than 2 hPa’ and ‘Crop yield will exceed a given threshold’ are all examples of ‘events’ whose ‘probabilities’ might be of interest.
What is meant by the probability of an event?
• Probabilities lie between 0 and 1.
• Zero probability implies that something is impossible.
• A probability of 1 means that something is certain.
• What does an intermediate probability imply, for example if we say that the probability of rain tomorrow is 0.25?
Notation and terminology
• Let A denote an event. The probability of that event is usually written P(A) or Pr(A).
• The complement of an event A, written A^c, is everything not in that event. The complement of ‘rain tomorrow’ is ‘no rain tomorrow’; the complement of ‘3 or fewer cyclones’ is ‘4 or more cyclones’.
• P(A^c) = 1 − P(A).
What is meant by the probability of an event?
• A probability of 0.25 (also expressed as 1/4, or as 25%) implies that we think it is 3 times as likely not to rain as it is to rain. This is because
• P(no rain) = 1 − P(rain) = 0.75
• 0.75/0.25 = 3.
• A probability can often be thought of as a long-term proportion of times an event will occur.
Probability - long-term proportions or subjective
• In our rain/no rain example we might know that for our station of interest it rains on 25% of days at this time of year. Hence P(rain) = 0.25.
• However, sometimes events are unique - it is of interest to ask what is the probability that a particular tropical storm will make landfall on a particular stretch of coastline. There are no long-term data on which to base the probability. Subjectivity comes in.
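As a minimal illustration of the long-run-proportion idea, here is a short Python sketch assuming a daily rain probability of 0.25 (an illustrative value, as in the example above):

import numpy as np

# Simulate many days with an assumed P(rain) = 0.25 and watch the
# observed proportion of rainy days approach that probability.
rng = np.random.default_rng(seed=1)
p_rain = 0.25

for n_days in (10, 100, 10_000, 1_000_000):
    rainy = rng.random(n_days) < p_rain   # True on simulated rainy days
    print(n_days, rainy.mean())           # proportion tends towards 0.25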
Unions and intersections
• The union of two events A and B, written A∪B, consists of everything included in A or B or both. Let
• A = {rain tomorrow}
• B = {rain the day after tomorrow}
• C = {3 or fewer cyclones}
• D = {4 or 5 cyclones}
• Then
• A∪B = {rain in the next 2 days}
• C∪D = {5 or fewer cyclones}
• P{C∪D} = P{C} + P{D}, because C and D are mutually exclusive (they don’t overlap).
Unions and intersections
• P{A∪B} ≠ P{A} + P{B}, because A and B do overlap.
• P{A∪B} = P{A} + P{B} − P{A∩B}.
• A∩B is the intersection of A and B; it includes everything that is in both A and B, and is counted twice if we add P{A} and P{B}.
• In our example
• A∩B = {rain tomorrow and the day after tomorrow}.
• C∩D is empty - it is impossible for C and D to occur simultaneously, so P{C∩D} = 0.
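A small numeric sketch of the addition rule, using hypothetical probabilities for the two rain events (the values below are illustrative, not from the module):

# Hypothetical probabilities, for illustration only.
p_a = 0.25          # P(rain tomorrow)
p_b = 0.25          # P(rain the day after tomorrow)
p_a_and_b = 0.10    # P(rain on both days) - assumed overlap

# Inclusion-exclusion: P(A or B) = P(A) + P(B) - P(A and B).
p_a_or_b = p_a + p_b - p_a_and_b
print(p_a_or_b)     # 0.4, not 0.5: the overlap is counted only once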
Conditional probability and independence
• If we know that one event has occurred it may change our view of the probability of another event. Let
• A = {rain today}, B = {rain tomorrow}, C = {rain in 90 days’ time}
• It is likely that knowledge that A has occurred will change your view of the probability that B will occur, but not of the probability that C will occur.
• We write P(B|A) ≠ P(B), P(C|A) = P(C). P(B|A) denotes the conditional probability of B, given A.
• We say that A and C are independent, but A and B are not.
• Note that for independent events P(A∩C) = P(A)P(C).
Conditional probability - tornado forecasting
• Consider the classic data set on the next Slide, consisting of forecasts and observations of tornados (Finley, 1884).
• Let
• F = {Tornado forecast}
• T = {Tornado observed}
• Use the frequencies in the table to estimate probabilities - it’s a large sample, so the estimates should not be too bad.
Forecasts of tornados

                      Tornado forecast   No tornado forecast   Total
Tornado observed             28                   23              51
No tornado observed          72                 2680            2752
Total                       100                 2703            2803
Conditional probability - tornado forecasting
• P(T) = 51/2803 = 0.0182
• P(T|F) = 28/100 = 0.2800
• P(T|F^c) = 23/2703 = 0.0085
• Knowledge of the forecast changes P(T). F and T are not independent.
• P(F|T) = 28/51 = 0.5490
• P(T|F) and P(F|T) are often confused, but they are different quantities and can take very different values.
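These estimates can be checked directly from the contingency-table counts; a minimal sketch:

# Finley (1884) tornado contingency-table counts.
tf, tn = 28, 23      # tornado observed: forecast / not forecast
nf, nn = 72, 2680    # no tornado observed: forecast / not forecast
total = tf + tn + nf + nn            # 2803

p_t = (tf + tn) / total              # P(T)      = 0.0182
p_t_given_f = tf / (tf + nf)         # P(T|F)    = 0.2800
p_t_given_not_f = tn / (tn + nn)     # P(T|F^c)  = 0.0085
p_f_given_t = tf / (tf + tn)         # P(F|T)    = 0.5490
print(p_t, p_t_given_f, p_t_given_not_f, p_f_given_t)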
Conditional probability - tornado forecasting
• P(T∩F) = 28/2803 = P(T)P(F|T) = P(F)P(T|F) ≠ P(F)P(T).
• The two formulae for the probability of an intersection always hold.
• If A and B are independent, then P(A|B) = P(A) and P(B|A) = P(B), so P(A∩B) = P(A)P(B).
• P(B|A) = P(B)P(A|B)/P(A)
• This is Bayes’ Theorem, though in the usual statement of the theorem P(A) is expanded in a more complicated-looking fashion.
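A quick check of Bayes’ Theorem on the tornado numbers, continuing the sketch above:

# Bayes' Theorem: P(T|F) = P(T) * P(F|T) / P(F), using the Finley counts.
p_t = 51 / 2803        # P(T)
p_f = 100 / 2803       # P(F)
p_f_given_t = 28 / 51  # P(F|T)

p_t_given_f = p_t * p_f_given_t / p_f
print(p_t_given_f)     # 0.28, agreeing with the direct estimate 28/100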
Random variables
• Often we take measurements which have different values on different occasions. Furthermore, the values are subject to random or stochastic variation - they are not completely predictable, and so are not deterministic. They are random variables.
• Examples are crop yield, maximum temperature, number of cyclones in a season, rain/no rain.
Continuous and discrete random variables
• A continuous random variable is one which can (in theory) take any value in some range, for example crop yield or maximum temperature.
• A discrete variable has a countable set of values. They may be
• counts, such as numbers of cyclones
• categories, such as much above average, above average, near average, below average, much below average
• binary variables, such as rain/no rain
Probability distributions
• If we measure a random variable many times, we can build up a distribution of the values it can take.
• Imagine an underlying distribution of values which we would get if it were possible to take more and more measurements under the same conditions.
• This gives the probability distribution for the variable.
Discrete probability distributions
• A discrete probability distribution associates a probability with each value of a discrete random variable.
• Example 1. The random variable has two values, Rain/No Rain. P(Rain) = 0.2, P(No Rain) = 0.8 gives a probability distribution.
• Example 2. Let X = number of wet days in a 10-day period. P(X=0) = 0.1074, P(X=1) = 0.2684, P(X=2) = 0.3020, …, P(X=6) = 0.0055, … (see Slide 24 for more on this example).
• Note that P(Rain) + P(No Rain) = 1, and P(X=0) + P(X=1) + P(X=2) + … + P(X=6) + … + P(X=10) = 1.
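The probabilities in Example 2 come from a binomial distribution with n = 10 and p = 0.2 (as discussed on Slide 24); a minimal check in Python, assuming SciPy is available:

from scipy.stats import binom

# X = number of wet days in 10 days, modelled as binomial(10, 0.2).
n, p = 10, 0.2
for k in (0, 1, 2, 6):
    print(k, binom.pmf(k, n, p))   # 0.1074, 0.2684, 0.3020, 0.0055

print(sum(binom.pmf(k, n, p) for k in range(n + 1)))  # probabilities sum to 1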
Continuous probability distributions
• Because continuous random variables can take all values in a range, it is not possible to assign probabilities to individual values.
• Instead we have a continuous curve, called a probability density function, which allows us to calculate the probability of a value falling within any interval.
• This probability is calculated as the area under the curve between the values of interest. The total area under the curve must equal 1.
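A minimal sketch of ‘probability = area under the density’, using a normal density as the example curve (an assumption here, anticipating the maximum-temperature example that follows):

from scipy.integrate import quad
from scipy.stats import norm

# Area under a normal(27, 3) density between 25 and 30 ...
area, _ = quad(lambda t: norm.pdf(t, loc=27, scale=3), 25, 30)

# ... equals the probability that the variable falls in (25, 30).
print(area)                                        # ~0.589
print(norm.cdf(30, 27, 3) - norm.cdf(25, 27, 3))   # same value via the CDF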
Example: probability distribution for maximum temperature
• The next Slide shows an idealized probability density for maximum daily temperature at a station in a particular month.
• The total area under the curve is 1.
• The area under the curve to the left of 20 is the probability of a max. temperature less than 20°C.
• The area between 25 and 30 is the probability of a max. temp. between 25°C and 30°C.
• The area to the right of 32 is the prob. of the max. temp. exceeding 32°C.
Example: theoretical probability density for maximum temperature

[Figure: the density f(t) plotted against maximum temperature t, for t from 20 to 40; f(t) ranges from 0.0 to about 0.4.]
Families of probability distributions
• The number of different probability distributions is unlimited. However, certain families of distributions give good approximations to the distributions of many random variables.
• Important families of discrete distributions include binomial, multinomial, Poisson, hypergeometric, negative binomial …
• Important families of continuous distributions include normal (Gaussian), exponential, gamma, lognormal, Weibull, extreme value …
Families of discrete distributions
• We consider only two, binomial and Poisson. There are many more.
• Do not use a particular distribution unless you are satisfied that the assumptions which underlie it are (at least approximately) satisfied.
Binomial distributions
1. The data arise from a sequence of n independent trials.
2. At each trial there are only two possible outcomes, conventionally called success and failure.
3. The probability of success, p, is the same in each trial.
4. The random variable of interest is the number of successes, X, in the n trials.
The assumptions of independence and constant p in 1 and 3 are important. If they are invalid, so is the binomial distribution.
Binomial distributions - examples
• Example 2 on Slide 17 is an example of a binomial distribution with 10 trials and probability of success 0.2.
• It is unlikely that the binomial distribution would be appropriate for the number of wet days in a period of 10 consecutive days, because of non-independence of rain on consecutive days.
• It might be appropriate for the number of frost-free Januarys, or the number of crop failures, in a 10-year period, if we can assume no inter-annual dependence and no trend in p, the frost-free probability or crop-failure probability.
Poisson distributions
• Poisson distributions are often used to describe the number of occurrences of a ‘rare’ event. For example
• the number of tropical cyclones in a season
• the number of occasions in a season when river levels exceed a certain value
• The main assumptions are that events occur
• at random (the occurrence of an event doesn’t change the probability of it happening again)
• at a constant rate
• Poisson distributions also arise as approximations to binomials when n is large and p is small.
Poisson distributions – an example
• Suppose that we can assume that the number of cyclones, X, in a particular area in a season has a Poisson distribution with a mean (average) of 3. Then P(X=0) = 0.05, P(X=1) = 0.15, P(X=2) = 0.22, P(X=3) = 0.22, P(X=4) = 0.17, P(X=5) = 0.10, … Note:
• There is no upper limit to X, unlike the binomial, where the upper limit is n.
• Assuming a constant rate of occurrence, the number of cyclones in 2 seasons would also have a Poisson distribution, but with mean 6.
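A minimal check of these probabilities with SciPy:

from scipy.stats import poisson

# X = number of cyclones in a season, modelled as Poisson with mean 3.
mean_cyclones = 3
for k in range(6):
    print(k, poisson.pmf(k, mean_cyclones))  # 0.05, 0.15, 0.22, 0.22, 0.17, 0.10

# Over 2 seasons at the same rate, the mean doubles to 6.
print(poisson.pmf(6, 2 * mean_cyclones))     # P(X=6) ~ 0.161 with mean 6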
Normal (Gaussian) distributions
• Normal (also known as Gaussian) distributions are by far the most commonly used family of continuous distributions.
• They are ‘bell-shaped’ - see Slide 20 - and are indexed by two parameters:
• the mean μ - the distribution is symmetric about this value
• the standard deviation σ - this determines the spread of the distribution. Roughly 2/3 of the distribution lies within 1 standard deviation of the mean, and 95% within 2 standard deviations.
Normal distributions - examples
• The example on Slide 20 is a normal distribution with mean 27 and standard deviation 3. The probability of a temperature below 20°C is 0.01; the probability of a temperature between 25°C and 30°C is 0.59; the probability of exceeding 32°C is 0.05.
• Normal distributions are relevant when variables have roughly symmetric bell-shaped distributions. They tend to be more appropriate for variables which are sums or averages over several days or months than for individual measurements.
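These three probabilities follow from the normal CDF; a minimal check:

from scipy.stats import norm

# Maximum temperature modelled as normal with mean 27 and sd 3 (Slide 20 example).
mu, sigma = 27, 3

print(norm.cdf(20, mu, sigma))                            # P(T < 20)    ~ 0.01
print(norm.cdf(30, mu, sigma) - norm.cdf(25, mu, sigma))  # P(25 < T < 30) ~ 0.59
print(1 - norm.cdf(32, mu, sigma))                        # P(T > 32)    ~ 0.05

# Rule of thumb: ~2/3 within 1 sd and ~95% within 2 sd of the mean.
print(norm.cdf(1) - norm.cdf(-1))   # 0.683
print(norm.cdf(2) - norm.cdf(-2))   # 0.954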
Deviations from normality - skewness
• Some variables deviate from normality because their distributions are symmetric but ‘too flat’ or ‘too long-tailed’.
• A more common type of deviation is skewness, where one tail of the distribution is much longer than the other.
• Positive skewness, as illustrated in the next Slide, is most common - it occurs for windspeeds and for rainfall amounts.
• Negatively-skewed distributions, with longer tails to the left, sometimes occur, for example for surface pressure.
A positively-skewed Weibull distribution

[Figure: a positively-skewed Weibull density f(x), plotted for x from 0 to 10; f(x) ranges from 0.0 to about 0.3, with the longer tail to the right.]
Families of skewed distributions
• There are several families of skewed distributions, including Weibull, gamma and lognormal. Each family has 2 or more parameters which can be varied to fit a variety of shapes.
• One particular family (strictly 3 families) consists of so-called extreme value distributions. As the name suggests, these can be used to model extremes over a period, for example maximum windspeed, minimum temperature, greatest 24-hr rainfall, highest flood …
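As a small sketch of how varying the parameters changes the shape, here are a few values of SciPy’s Weibull density for two illustrative (assumed, not from the module) shape parameters:

from scipy.stats import weibull_min

# Evaluate a Weibull density at a few points for two shape parameters;
# smaller shapes give a more sharply skewed density.
for shape in (1.5, 3.0):
    dens = [weibull_min.pdf(x, shape, scale=4.0) for x in (1, 4, 8)]
    print(shape, [round(d, 3) for d in dens])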
Other probability distributions
• We have sketched a few of the main probability distributions, but there are many others. Examples which don’t fit standard patterns include
• proportion of sky covered by cloud, which may have large probability values near 0 and 1, with lower probabilities in between - U-shaped rather than bell-shaped
• daily rainfall, which is neither (purely) discrete nor continuous. Positive values are continuous, but there is also a non-zero (discrete) probability of taking the value zero.
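A minimal sketch of such a mixed distribution for daily rainfall, with an assumed dry-day probability of 0.7 and an assumed gamma distribution for wet-day amounts (both choices are illustrative):

import numpy as np

# Mixed discrete/continuous model: a point mass at zero (dry days)
# plus a continuous gamma distribution for wet-day amounts.
rng = np.random.default_rng(seed=1)
p_dry = 0.7                                   # assumed P(rainfall = 0)

n_days = 100_000
wet = rng.random(n_days) >= p_dry             # which days are wet
rain = np.where(wet, rng.gamma(shape=0.8, scale=8.0, size=n_days), 0.0)

print((rain == 0).mean())     # ~0.7: the discrete spike at zero
print(rain[rain > 0].mean())  # mean wet-day amount (the continuous part)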
