
Part 3

Probability Distributions
CHAPTER 6

Probability Distributions and Random Variables

1. Probability Distributions

In business it is often necessary to make statements along the lines of “the probability of observing a
given event is p.” We want to quantify the likelihood of certain events occurring. For example, how
likely is it that a given function of the random outcome takes a certain value?
Probability distributions list or describe probabilities for all possible outcomes of a random process.
They can be extended to answer questions about the probability that a random variable (a numerical
function of the random outcome) takes a certain value or lies in a certain range of values.
Think of a probability distribution as laying out how probability is spread among all the outcomes.

1.1. Analysing uncertainty: the problem with single-point approaches. Making “single-point”
assumptions about uncertain outcomes (that is, considering only the mean) leaves uncertainty and risk
hidden.
• We may end up:
– improperly valuing the consequences of a potential course of action;
– exposing ourselves to unacceptable risk;
– overlooking opportunities for reducing risk.
• A best-case/worst-case (scenario-based) approach also tends to be incomplete.
• A single-point approach may sometimes suffice in rough analyses, but it:
– doesn’t address the likelihood of each case;
– overlooks intermediate outcomes.
A well-considered decision will take into account the probability distributions of key uncertainties, rather
than making single-point assumptions or naive estimates.

1.2. Probability distributions. The set of probabilities associated with all events or outcomes of
an uncertain quantity is called its probability distribution: this is a means of quantifying and representing
the uncertainty associated with a sample space and its events.
Thus, if S is the sample space (set of all outcomes), a probability distribution is just an assignment of
a probability to each event or outcome1 in S.
There are two types of probability distribution:
• Discrete distributions (covered in detail in the next chapter);
• Continuous distributions (covered in detail in the chapter after next).
A discrete probability distribution applies when the sample space of possible outcomes is discrete, such
as rolling dice or tossing a coin. It can be described by a discrete list of the probabilities of the outcomes,
known as a probability function or probability mass function.
A continuous probability distribution applies when the outcomes can take on values in a continuous
range (real numbers), such as the height of a person. It is usually described by a probability density
function where the probability of the outcome lying within a certain range is given by the area under
the density function within that range (and the probability of any single outcome is zero).
1We will see later that for a continuous probability distribution, the probability of any individual outcome is technically
zero: we only get nonzero probabilities for “big enough” events (subsets of outcomes).

77
78 MIS10090

Figure 6.1 shows the distinction between the two types of distribution. In a discrete distribution, prob-
abilities are given at points with distinct intervals between them; whereas in a continuous distribution,
the probabilities can be defined at any point across a range.

Figure 6.1. Discrete versus continuous random variables

There are several standard probability distributions that apply in many — though not all — cases. Based
on empirical observations and theory, we may decide that one of these is most appropriate for a given
situation, e.g., Poisson distribution for number of arrivals per unit time in a queue; normal distribution
for heights of people. Some of these are discussed in subsequent chapters.
There are certain summary parameters for probability distributions such as the mean (also called ex-
pected value), standard deviation, etc.; but these are summaries: they do not tell the whole story that
the distribution does. We can use these if the outcomes are numerical, or if we can convert the outcomes
to numerical values via a random variable.

1.3. Visualising probability distributions. In Chapter 1, we used histograms to plot data sets:
they allow us to visualise where the measures of central tendency occur, how variable (spread out) the data
are, and what kind of pattern or distribution the data exhibit. For many large symmetrical distributions,
there is an empirical rule that provides an estimate of the percentage of observations that are contained
within a number of units of standard deviation about the mean. This particularly applies to the normal
distribution that you will meet later, but the normal is often a good approximation to other symmetrical
distributions, under certain conditions. In the case of the normal distribution:
• Approximately 68% of the observations are contained within a symmetrically distributed band
of width one unit of standard deviation either side of the mean. We can interpret this as
saying that 68% of the population lie within one step of standard deviation from the mean. See
Figure 6.2. Or, if we randomly selected one member of the population, the probability that its
value is within this range is 0.68 = 68%.

Figure 6.2. Proportions of normal observations within certain distances from the mean
(approximately 68% within µ ± 1σ, 95% within µ ± 2σ, and 99.7% within µ ± 3σ)


• Approximately 95% of the observations are contained within two units of standard deviation
either side of the mean.
• Almost all (99.7%) the observations are contained within three units of standard deviation
either side of the mean.
Data Analysis for Decision Makers 79

You will meet this idea again when you discuss “6 sigma” in quality control during your business studies.
Six sigma means 99.9997% of observations are contained within six units of standard deviation either
side of the mean. In the case of quality control, it means that 99.9997% of manufactured products meet
quality standards, so only three out of every million fail.
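These percentages can be checked empirically by simulating draws from a normal distribution and counting how many fall within one, two, and three standard deviations of the mean. A minimal sketch in Python, assuming illustrative values for the mean and standard deviation (they are not from the text):

```python
import random

random.seed(42)  # fixed seed so the simulation is repeatable

# Illustrative parameters (not from the text): mean 100, standard deviation 15.
mu, sigma, n = 100.0, 15.0, 100_000
sample = [random.gauss(mu, sigma) for _ in range(n)]

def fraction_within(k):
    """Fraction of the sample lying within k standard deviations of the mean."""
    return sum(abs(x - mu) <= k * sigma for x in sample) / n

print(round(fraction_within(1), 2))  # close to 0.68
print(round(fraction_within(2), 2))  # close to 0.95
print(round(fraction_within(3), 3))  # close to 0.997
```

With 100,000 draws the observed fractions sit very close to the 68%, 95%, and 99.7% figures above, though a simulation will never reproduce them exactly.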

1.4. Discrete probability distributions. Discrete probability distributions are defined on a dis-
crete sample space, that is, a set of outcomes separated by clear gaps.2 An important example for us is
the case of a finite number of possible values (i.e., there is a finite number of possible outcomes: a finite
sample space).
For a discrete distribution, we have a probability function or probability mass function: the function that
assigns a weight (i.e., a probability) to each individual outcome.
Example 6.1. You book a group skiing trip, hoping to get up to five other people from the class to
come along. You write X for the number that might come and assess the chances of 0, 1, 2, 3, 4 or 5
people coming:
P (X = 0) = 0.03 P (X = 3) = 0.34
P (X = 1) = 0.10 P (X = 4) = 0.25
P (X = 2) = 0.18 P (X = 5) = 0.10

Notice that the total probability in a distribution adds up to 1: this follows from our basic rules of probability
(since the distribution describes the probabilities of all outcomes in the sample space). }
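The probability function from Example 6.1 can be stored and checked directly. A minimal Python sketch (the variable names are our own):

```python
# Probability function for X = number of classmates joining the ski trip
# (probabilities taken from Example 6.1).
ski_pmf = {0: 0.03, 1: 0.10, 2: 0.18, 3: 0.34, 4: 0.25, 5: 0.10}

# The probabilities across the whole sample space must sum to 1.
total = sum(ski_pmf.values())
print(round(total, 6))  # 1.0

# Probability that at least three people come: P(X >= 3).
p_at_least_3 = sum(p for x, p in ski_pmf.items() if x >= 3)
print(round(p_at_least_3, 2))  # 0.69
```

Summing over a subset of outcomes, as in the last step, is how a discrete distribution answers range questions such as P(X ≥ 3).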

A barchart (somewhat like a discrete version of a histogram) can be used to display a discrete probability
distribution in graphical form: there is a bar for each possible measured numerical value and the height
of the bar is the probability of seeing that value: see Figure 6.3.

Figure 6.3. Ski trip example probability function

1.5. Continuous probability distributions. A continuous probability distribution is used where
the uncertain variable may fall anywhere within a range (no gaps between possible outcomes). Examples
include:
• Height of a randomly chosen person (range 0m to 3m)
• Sales level or demand
• Market share of a newly launched product
• Damages awarded in court
2Technically, a discrete probability distribution is a probability distribution that can take on at most a countable
number of values. If the word “countable” doesn’t mean anything to you, don’t worry: it just means that if you had time,
you could count all of the outcomes 1, 2, 3, 4, . . . though there may possibly be infinitely many of them.

• Salary after completing MBA

In such cases, we can express probability by talking about the probability of the variable falling within
a certain range. (Technically, such a range is called an interval; e.g., [1, 2.5] is the set of all real numbers
that are ≥ 1 and ≤ 2.5.)
For example: “There is a 0.5 probability of sales falling between 65,000 and 85,000 units”.
Example 6.2. Suppose Bling is a new celebrity gossip magazine, launching in September; denote its
circulation by C.
Question: What circulation will it have by the end of the year?
The outcome C is an uncertain numerical value: thus C is a random variable. The continuous probability
distribution for the random variable C is constructed on the basis of intervals:
P (C  5, 000) = 0
P (5, 000 < C  15, 000) = 0.20
P (15, 000 < C  25, 000) = 0.45
P (25, 000 < C  33, 000) = 0.25
P (33, 000 < C  40, 000) = 0.10
When we plot a smooth curve so that the area under the curve between 5,000 and 15,000 is 0.20, the
area under the curve between 15,000 and 25,000 is 0.45, etc., we get something like Figure 6.4. This

Figure 6.4. Bling density function: P (a ≤ C ≤ b) = area under curve between a and b

is the density function for this distribution: the probability of the random variable C taking a value
between two given numbers a and b is the area under this density function curve between a and b. }
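One way to work with interval-based probabilities like these is to store each interval with its probability and sum over intervals. A sketch in Python (the data structure is our own choice):

```python
# Interval probabilities for circulation C, from Example 6.2.
# Each key is an interval (low, high]; each value is its probability.
bling = {
    (5_000, 15_000): 0.20,
    (15_000, 25_000): 0.45,
    (25_000, 33_000): 0.25,
    (33_000, 40_000): 0.10,
}

# Total probability across the whole range must be 1.
print(round(sum(bling.values()), 6))  # 1.0

# P(C <= 25,000): add up the intervals lying entirely at or below 25,000.
p_low = sum(p for (lo, hi), p in bling.items() if hi <= 25_000)
print(round(p_low, 2))  # 0.65
```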

2. Random variables

We have hinted at the use of random variables but let us now treat them more fully. You will be familiar
with the idea of using a variable to represent some unknown value. In Chapter 3, we introduced the
idea of indexed notation where we have different values of a variable, perhaps representing data values
gathered as part of a survey or experiment, or data values extracted from a company’s IT system.
It is useful to be able to refer to variables derived from an outcome of an experiment. Let’s take the
example of a customer in a supermarket. We may wish to know the number of items bought by a
customer (a discrete value) or the amount of money spent by the customer (a continuous value). We
can determine these values after the customer has finished shopping. Note that these values vary from
customer to customer and cannot be determined before the customer completes their shopping, so they
are uncertain. In fact the customer’s shopping basket is the random outcome and we are interested in
numbers that are derived from this random outcome.

Definition 6.3. A random variable (also called stochastic variable) is a numerical variable X whose
value depends on chance (i.e., randomness). The value X takes can be an integer or possibly any real
number.3

Thus, a random variable is simply a numerical variable whose value is derived from the outcome of a
particular experiment. It represents a possible numerical value arising from an uncertain event. Focussing
on numerical quantities means we can carry out operations on them like addition, multiplication, etc.
It is a standard convention to use a capital letter for a random variable and a small letter for the value
it takes. For example, X might be a random variable, and it might take the particular value x.
We might want to write the mathematical expression that gives the sum of the number of items bought
by four different customers. To do this, we use indexed notation: for each i = 1, 2, 3, 4, let Xi represent
the number of items bought by customer i. We write the expression ∑_{i=1}^{4} Xi = X1 + X2 + X3 + X4 for
the sum of the numbers of items bought by the four customers. Each Xi is a random variable, and they
can be combined arithmetically.
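The indexed sum above can be illustrated with made-up observed values (a Python sketch; the item counts are hypothetical):

```python
# Hypothetical observed values x1..x4 of the random variables X1..X4:
# the number of items bought by each of four customers.
items = [12, 3, 25, 7]

# The sum X1 + X2 + X3 + X4, written with indexed notation in the text.
total_items = sum(items)
print(total_items)  # 47
```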
Other examples of random variables include the number of siblings a person has, the number of people
at a bus stop at a given time, or the height of a randomly-chosen person.
More generally, a random variable’s possible values might be derived from the possible outcomes of a
measurement or experiment or some other “objectively random” process (e.g., our dice example).
Note: even though X is generally a numerical-valued function, it is not a probability. The probability
measure P on S is what gives probabilities of events (of course, an event might be a single outcome).
Instead, X describes some numerical property that outcomes in S might have.
We mostly use distributions when finding the probability that a certain random variable takes a certain
value or lies in a certain range of values.

2.1. Discrete and continuous random variables. Random variables can be either discrete or
continuous:

• if the set of values a random variable X can take are separated by gaps (e.g., X can only take
integer values), we say X is a discrete random variable;
• if the set of values X can take has no gaps between its elements (there is a continuous spread
of values), we say X is a continuous random variable.

We have already been quietly using random variables: in the skiing example earlier, X was a discrete
random variable; while in the Bling example, the circulation C was a continuous random variable.
Example 6.4 (Discrete random variable). As another example, our sample space S might be the set
of outcomes of rolling two fair dice. We might define a random variable X to be the sum of the two
numbers shown on the dice. If the numbers were 3 and 4, then the value of X for that outcome would
be 3 + 4 = 7. This is a discrete random variable, with X(a, b) = a + b. }
Example 6.5 (Continuous random variable). Yet another example: the sample space S might be the
people in this class, and the random variable could be the weight of a person chosen at random. This is
a continuous random variable. }

We will often use the abbreviation DRV for a discrete random variable, and CRV for a continuous random
variable.
We will devote the two chapters following this one to particular examples of (a) discrete and (b) contin-
uous random variables that arise commonly and are particularly important. For the rest of this chapter,
we will focus on topics common to all random variables.
3More precisely, a random variable is actually a function X defined on a sample space S, X : S → R, with the outputs of
X being numerical values. Thus, strictly speaking, the term random variable is one of the most misleading in mathematics:
it is neither random nor a variable; it is a deterministic function from a sample space to the real numbers R, which associates
with each outcome a number.

2.2. Probability of a random variable taking a given value. Each value X can take has an
associated probability. The notation P (X = x) is read as “the probability that the random variable X
has the particular value x”. You’ll sometimes see P (X = x) written as PX (x) or just P (x) for short. It
is best to use P (X = x) to avoid ambiguity.
Example 6.6. Continuing our dice Example 6.4, there are 36 possible outcomes for two six-sided dice,
each equally likely since the dice are fair. Six of these outcomes, (1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1),
give X the value of 7. So P (X = 7) = 6/36 = 1/6. }
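This count can be double-checked by enumerating all 36 equally likely outcomes (a Python sketch; exact fractions avoid any rounding):

```python
from fractions import Fraction

# All 36 equally likely outcomes of rolling two fair six-sided dice.
outcomes = [(a, b) for a in range(1, 7) for b in range(1, 7)]

# Build the probability function of X = a + b: each outcome contributes 1/36.
pmf = {}
for a, b in outcomes:
    x = a + b
    pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, 36)

print(pmf[7])             # 1/6
print(sum(pmf.values()))  # 1
```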

3. Summary measures of a random variable

3.1. Mean or Expected Value of a random variable. Because random variables are numerical,
we can carry out mathematical operations on their values and thus talk about the mean, variance, etc.,
of the random variable. This mainly makes sense when carried out over all values the random variable
can take, that is, over the whole sample space.
The mean (or expected value) of a random variable is the probability-weighted average of its possible
values.
It can be denoted by µX or just µ (mu), or E(X) for a random variable X.
The expected value is the weighted average of all the possible values that the discrete random variable
can take on. We calculate the expected value E(X) as the weighted sum of the values of X, where the
weights we use are the probabilities that X will take on that value.
We will do this for discrete random variables. It can also be done for continuous random variables; but
this requires integral calculus, a tool which is beyond our scope; so we will not cover it.
Suppose we have a discrete probability distribution describing a random variable X which can take
possible values x1 , . . . , xn with associated probabilities p1 , . . . , pn ; that is, for each i, pi = P (X = xi ).
The mean is defined as the sum
    E(X) = µ = ∑_x x P(X = x) = ∑_{i=1}^{n} xi P(X = xi)
         = x1 P(X = x1) + x2 P(X = x2) + · · · + xn P(X = xn)
         = x1 p1 + x2 p2 + · · · + xn pn.
That is, sum up each possible value of X weighted by the probability of that value.
Note that, if all values of X are equally likely, we would give the same weight P (X = xi ) = 1/n to each
xi .
Example 6.7. People joining your group skiing trip:
P (X = 0) = 0.03 P (X = 3) = 0.34
P (X = 1) = 0.10 P (X = 4) = 0.25
P (X = 2) = 0.18 P (X = 5) = 0.10
The mean is
0 × 0.03 + 1 × 0.10 + 2 × 0.18 + 3 × 0.34 + 4 × 0.25 + 5 × 0.10 = 2.98.
Note: even though X can only take integer (whole number) values, its mean need not be an integer. }
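The same probability-weighted sum is easy to compute in code. A sketch in Python using the probabilities from Example 6.7:

```python
# Probability function for X = number of people joining the ski trip.
ski_pmf = {0: 0.03, 1: 0.10, 2: 0.18, 3: 0.34, 4: 0.25, 5: 0.10}

# E(X) = sum over x of x * P(X = x).
mean = sum(x * p for x, p in ski_pmf.items())
print(round(mean, 2))  # 2.98
```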

As we mentioned before when discussing Descriptive Statistics, if we have a data set D with no probability
distribution on it, we have no reason to assume one outcome is more likely than another, so we assign
them equal probability. For example, if the population (sample space) has size N , and we make this
“equally likely” assumption, each outcome will thus have probability (weight) 1/N (so total probability
across the sample space is N × (1/N) = 1, as required).
Thus, the concepts we saw before about (arithmetic) mean of a numerical attribute of data are just a
special case of what we are discussing now, where all outcomes have the same probability, e.g., given

a finite population D, the population mean µ of a numerical attribute is the arithmetic mean of this
attribute, taken over all members of the population:
    µ = (1/N) ∑_{i=1}^{N} xi = (x1 + x2 + · · · + xN) / N.

4. Variance of a random variable

The variance of a random variable X is an important measure of variation: it is the average squared
deviation from the mean. It is denoted by Var(X) or σX² (sigma squared).

To compute it: calculate the difference between each possible value xi of X and the mean (expected
value) µ = E(X) and square that difference; then find the probability-weighted mean of these squared
differences.

Thus, if X is a discrete random variable with values x1, . . . , xn and associated probabilities p1, . . . , pn,
then
    Var(X) = ∑_{i=1}^{n} pi (xi − µ)² = E((X − µ)²),
where µ = ∑_{i=1}^{n} pi xi is the expected value of X. It can be shown that Var(X) = E(X²) − (E(X))².
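As a quick check that the definition and the shortcut formula agree, here is a sketch using a fair six-sided die as the distribution (our own illustrative example, not one from the text); exact fractions keep the arithmetic tidy:

```python
from fractions import Fraction

# A fair six-sided die: each value 1..6 has probability 1/6 (illustrative example).
die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mu = sum(x * p for x, p in die_pmf.items())  # E(X)

# Variance by the definition: E((X - mu)^2).
var_def = sum(p * (x - mu) ** 2 for x, p in die_pmf.items())

# Variance by the shortcut: E(X^2) - (E(X))^2.
var_short = sum(p * x ** 2 for x, p in die_pmf.items()) - mu ** 2

print(mu)                    # 7/2
print(var_def)               # 35/12
print(var_def == var_short)  # True
```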

Once the variance Var(X) of X has been found, taking its square root gives the standard deviation
σX = √Var(X) of X.

As with the mean, we will only cover the variance for discrete random variables. It can also be done for
continuous random variables, but again requires integral calculus, which is beyond our scope.

Special case, seen before: if all values are equally likely (e.g., if we have no prior knowledge of probabil-
ities), then the population variance can be written as
    σ² = Var(X) = (1/N) ∑_{i=1}^{N} (xi − µ)²   where population mean µ = (1/N) ∑_{i=1}^{N} xi.

4.1. Standard Deviation of a random variable. The standard deviation of X, denoted by σX
or just σ, is the square root of the variance:
    σX = √Var(X) = √( ∑_{i=1}^{n} pi (xi − µ)² ).

As mentioned before:
• it is the most important measure of variation;
• it measures how far any given value in the probability distribution might be expected to lie
from µ = E(X), the mean;
• it has the same units as the original data;
• the population standard deviation in the absence of probability information is
    σ = √Var(X) = √( (1/N) ∑_{i=1}^{N} (xi − µ)² )   where pop. mean µ = (1/N) ∑_{i=1}^{N} xi.

Figure 6.5. Flipping two coins: outcomes (left); probability distribution table (top
right); and probability distribution barchart (bottom right)

4.2. Examples of finding summary measures of a random variable.


Example 6.8 (Discrete random variable: flipping two coins). Let X be a count of the number of heads
when two coins are flipped. The values that X can take are: {0, 1, 2}.
We can count the four elementary outcomes (Figure 6.5, left) and calculate the probability that X takes
on a specific value. We can present the probability distribution as a table (Figure 6.5, top right) or as a
graph (Figure 6.5, bottom right):
This is an example of a probability distribution for a discrete random variable.
Using the formulas from above, we calculate the expected value,
E(X) = 0 × 0.25 + 1 × 0.5 + 2 × 0.25 = 1
and the standard deviation,
    σ = σX = √( (0 − 1)²(0.25) + (1 − 1)²(0.50) + (2 − 1)²(0.25) ) = √0.50 = 0.707.
Of course, the variance is just σ² = 0.5. }
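These calculations can be reproduced directly (a Python sketch using the probabilities from Example 6.8):

```python
import math

# Probability function for X = number of heads in two fair coin flips.
coin_pmf = {0: 0.25, 1: 0.50, 2: 0.25}

mean = sum(x * p for x, p in coin_pmf.items())
variance = sum(p * (x - mean) ** 2 for x, p in coin_pmf.items())
sd = math.sqrt(variance)

print(mean)          # 1.0
print(variance)      # 0.5
print(round(sd, 3))  # 0.707
```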
Example 6.9 (Discrete random variable: text messages). Let X represent how many texts you send per
day. Then X is an example of a discrete random variable (DRV). You monitor your usage pattern over
30 days and summarise it in Table 6.1. Here, the frequency of a given volume of texts means the number
of days you sent that volume of texts. On each of the 30 days, the volume of texts sent was between 0
and 8. That is, X ∈ {0, 1, 2, 3, 4, 5, 6, 7, 8}: in this sample, you never sent 9 or more texts.

Volume of texts sent:                        0  1  2  3  4  5  6  7  8
Frequency of sending that volume of texts:   0  1  4  6  4  7  4  3  1
Table 6.1. Volume of texts sent together with frequency with which that volume was sent

Next we convert the frequency distribution to a probability distribution. We can work out the probabil-
ities for all the values that X may have.
For example, there were no days out of the 30 day trial when you sent zero texts, so the likelihood of
sending zero texts, X = 0, is 0/30. We say P (X = 0) = 0.

Similarly, from your sample you can calculate the probability you send 2 texts on any day: it is 4/30 or
0.133. We say P (X = 2) = 0.133.
We can represent the probability distribution as a barchart graphic: see Figure 6.6.

Figure 6.6. Probability distribution of volumes of texts as a barchart

In deciding which is the most suitable mobile phone package for you, you need to know how many texts
you send on average per day; in this case:
E(X) = 0 × P (X = 0) + 1 × P (X = 1) + 2 × P (X = 2) + · · · + 8 × P (X = 8) = 4.37.
On average, you expect to send 4.37 texts per day. }
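The whole pipeline (frequencies to probabilities to expected value) can be sketched in Python using the frequencies from Table 6.1:

```python
# Frequencies of daily text volumes over the 30-day trial (Table 6.1).
freq = {0: 0, 1: 1, 2: 4, 3: 6, 4: 4, 5: 7, 6: 4, 7: 3, 8: 1}

n_days = sum(freq.values())
print(n_days)  # 30

# Convert frequencies to probabilities: P(X = x) = frequency / 30.
pmf = {x: f / n_days for x, f in freq.items()}
print(round(pmf[2], 3))  # 0.133

# Expected number of texts per day.
mean = sum(x * p for x, p in pmf.items())
print(round(mean, 2))  # 4.37
```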
Exercise 6.10. Find the variance and standard deviation of the random variable X = volume of texts
sent, from Example 6.9.
Exercise 6.11. Find the variance and standard deviation of the random variable X = number of people
coming on the skiing trip with you, from Example 6.7.
