
Chapter 1

Properties of Random
Variables

Chapter topics:
1. Random variables
2. Probability distributions (especially the normal distribution)
3. Measures of location and dispersion

Random variables are encountered in every discipline in science. In this chapter we discuss how we may describe the properties of random variables, in particular by using probability distributions, as well as defining the mean and the standard deviation of random variables. Since random variables are encountered throughout chemistry and the other natural sciences, this chapter is rather broad in scope. We do, however, introduce one particular type of random variable, the normally-distributed random variable. One of the most important skills you will need to obtain in this chapter is the ability to use tables of the standard normal cumulative probabilities to solve problems involving normally-distributed variables. A number of numerical examples in the chapter will illustrate how to do so.

1.1 A First Look at Random Variables


In studying statistics, we are concerned with experiments in which the outcome is subject to some element of chance; these are statistical experiments. Classical statistical experiments include coin flipping or drawing cards at random from a deck. Let's consider a specific experiment: we throw two dice and add the numbers displayed. The list of possible outcomes of this would be {2, 3, . . . , 12}. This list comprises the domain of the experiment. The domain of a variable is defined simply as the list of all possible values that the variable may assume.

The domain contains all the possible values of a variable.
The domain will depend on exactly how a variable is defined. For exam-
ple, in our dice experiment, we might be interested in whether or not the
sum is odd or even; the domain will then be {even, odd}. Let’s consider a
different experiment: tossing a coin two times. We can think of the domain
as {HH, HT, TH, TT}, where H = heads and T = tails. Alternatively, we can
focus on the total number of heads after the two tosses, in which case the
domain is {0, 1, 2}. In all experiments, the domain associated with the
experiment will contain all the outcomes that are possible, no matter how
unlikely.
Chemical measurements using some instrument are also statistical mea-
surements, with an associated domain. The domain of these measurements
will be all the possible values that can be assumed by the measurement de-
vice.


Figure 1.1: Difference between discrete and continuous random variables. A dis-
crete variable can only assume certain values (e.g., the hash-marks on the number
line), whereas a continuous variable can assume any value within the range of all
possible values.

The outcome of an experiment will vary: in other words, the outcome is a variable. Furthermore, the outcome of a vast majority of the experiments in science will contain a random component, so that the outcome is not completely predictable. These types of variables are called random variables. Since random variables cannot be predicted exactly, they must be described in the language of probability. For example, we cannot say for certain that the result of a single coin flip will be ‘heads,’ but we can say that the probability is 0.5. Every outcome in the domain will have a probability associated with it.

Random variables are variables that cannot be predicted with complete certainty.
It is an advantage to be able to express experimental outcomes as num-
bers; such variables are quantitative random variables (as opposed to an
outcome such as ‘heads,’ which is a “qualitative” random variable). We
will be concerned exclusively with quantitative random variables, of which
there are two types: discrete and continuous variables.
The distinction between these variables is most easily understood by
using a few examples. If our experiment consists of rolling dice or survey-
ing the number of children in households, then the random variable will
always be a whole number; these are discrete variables. A discrete variable
can only assume certain values within the range contained in the domain.
Unlike a discrete variable, a continuous variable is theoretically able to as-
sume any value in an interval contained within its domain. If we wanted
to measure the height or weight of a group of people, then the resulting
values would be continuous variables.
If we think in terms of a number line, a discrete variable can only as-
sume certain values on the line (for example, the values associated with
whole numbers) while continuous variables may assume any value on the
line. Figure 1.1 demonstrates this concept. The number line in the fig-
ure represents the entire domain for a variable. A discrete variable would
be constrained to assume only certain values within the interval, while a
continuous variable can assume any value on the number line. Within its
domain there are always an infinite number of possible values for a con-
tinuous variable. The number of discrete variables can be either finite or
infinite.
One final note: although the distinction between continuous and dis-
crete variables is important in how we use probability to describe the pos-
sible outcomes, as a practical matter there is probably no such thing as a
truly continuous random variable in measurement science. This is because
any measuring device will limit the number of possible outcomes. For ex-
ample, consider a digital analytical balance that has a range of 0–100 g and

displays the mass to the nearest 0.1 mg. There are 10^6 possible values in
this range — a large value, to be sure, but not infinitely large. For most pur-
poses, however, we may treat this measurement as a continuous variable.

1.2 Probability Distributions of Discrete Variables


1.2.1 Introduction
Let’s briefly summarize what we have so far:

• a statistical experiment is one in which there is some element of


chance in the outcome;

• the outcome of the experiment is a random variable;

• the domain is a list (possibly infinite) of all possible outcomes of a


statistical experiment.

Now, although the domain gives us the possible outcomes of an exper-


iment, we haven’t said anything about which of these are the most proba-
ble outcomes of the experiment. For example, if we wish to measure the
heights of all the students at the University of Richmond using a 30 ft. tape
measure, then the domain will consist of all the possible readings from
the tape, 0–30 ft. However, even though measurements of 6 in or 20 ft are
contained within the domain, the probability of observing these values is
vanishingly small.
A probability distribution describes the likelihood of the outcomes of an experiment. Probability distributions are used to describe both discrete and continuous random variables. Discrete distributions are a little easier to understand, and so we will discuss them first.

Since random variables are inherently unpredictable, probability distributions must be used to describe their properties.
Let’s consider a simple experiment: tossing a coin twice. Our random
variable will be the number of heads that are observed after two tosses.
Thus, the domain is {0, 1, 2}; no other outcomes are possible. Now, let’s
assign probabilities to each of these possible outcomes. The following table
lists the four possible outcomes along with the value of x associated with
each outcome.

Outcome Random Variable (x)


TT 0
TH 1
HT 1
HH 2

If we assume that heads or tails is equally probable (probability of 0.5


for both), then each of the four outcomes is equally probable, with a prob-
ability of 0.25 each. It seems intuitive, then, that

P(x = 0) = 0.25
P(x = 1) = 0.5
P(x = 2) = 0.25

where P(x = x0 ) is the probability that the random variable x is equal to


the value x0 .
There! We have described the probability distribution of each possible
outcome of our experiment. The set of ordered pairs, [x, P(x = x0 )], where

x is a random variable and P(x = x0 ) is the probability that x assumes any


one of the values in its domain, describes the probability distribution of the random variable x for this experiment. Note that the sum of the probabilities of all the outcomes in the domain equals one; this is a requirement for all
discrete probability distributions.

1.2.2 Examples of Discrete Distributions


The Binomial Distribution

Coin-tossing experiments are an example of a general type of experiment


called a Bernoulli, or binomial, experiment. For example, a biologist may be
testing the effectiveness of a new drug in treating a disease. After infecting,
say, 30 rats, the scientist may then inject the drug into each rat and record
whether the drug is successful or not on a rat by rat basis. Each rat is a
“coin toss,” and the result is an either-or affair, just like heads-tails. Many
other experiments in all areas of science can be described in similar terms.
A Bernoulli experiment consists of a series of identical trials, each of which has two possible outcomes.

To generalize, a Bernoulli experiment has the following properties:

1. Each experiment consists of a number of identical trials (“coin flips”). The random variable, x, is the total number of “successful” trials observed after all the trials are completed.

2. Each trial has only two possible results, “success” and “failure.” The probability of success is p and the probability of failure is q. Obviously, p + q = 1.

3. The probabilities p and q remain constant for all the trials in the
experiment.

In our simple coin-tossing example, we could deduce the probability


distribution of the experiment by simple inspection, but there is a more
general method. The probability distribution for any Bernoulli experiment
is given by the following function, p(x),
p(x) = [n! / (x!(n − x)!)] p^x q^(n−x)    (1.1)

where n is the number of trials in the experiment, and n! is the factorial of n. If we want to find the probability of a particular outcome P(x = x0), then we must evaluate the binomial distribution function at that value x0.

The binomial distribution function describes the outcome of Bernoulli experiments.
Let’s imagine that, in our hypothetical drug-testing experiment (with
n = 30 rats), the probability of a successful drug treatment is p = 0.15.
Figure 1.2 shows the probability distribution of this experiment.¹
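As a rough illustration of eqn. 1.1, the short Python sketch below computes p(x) for the hypothetical drug-testing experiment with n = 30 and p = 0.15, the distribution plotted in Figure 1.2. The language choice and the use of math.comb for the factorial ratio are our own assumptions, not part of the original text.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n Bernoulli trials (eqn. 1.1)."""
    q = 1.0 - p
    return comb(n, x) * p**x * q**(n - x)

n, p = 30, 0.15          # hypothetical drug test: 30 rats, 15% chance of success per rat
for x in range(0, 11):
    print(f"P(x = {x:2d}) = {binomial_pmf(x, n, p):.4f}")

# The probabilities over the entire domain (x = 0..30) sum to one:
print(sum(binomial_pmf(x, n, p) for x in range(n + 1)))
```

The largest printed value falls at x = 4, near np = 4.5, which matches the peak of the bar plot in Figure 1.2.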
There are three common methods used to represent the probability dis-
tribution of a random variable:

1. As a table of values, where the probability of every possible outcome


in the domain is given. Obviously, this is only practical when the
number of possible outcomes is fairly small.

2. As a mathematical function. This is the most general format, but it


can be difficult to visualize. In some cases it may not be possible to
represent the probability distribution as a mathematical function.
¹ Note that the binomial distribution becomes more difficult to calculate as the number of trials, n, increases (due to the factorial terms involving n). There are some other distribution functions that can give reasonable approximations to the binomial function in such cases.

[Figure 1.2 appears here: a bar plot of probability P(x) versus number of successes, x, for x = 0 to 16.]

Figure 1.2: Binomial Probability Distribution. A graphical depiction of the binomial


probability distribution as calculated from eqn. 1.1 with n = 30 and p = 0.15.

3. As a graphical plot. This is a common method to examine probability


distributions.

Figure 1.3 on page 6 describes the outcome of an experiment using a plot


and a table, both of which were constructed using equation 1.2.

The Poisson Distribution

Besides binomial experiments, counting experiments are also quite common


in science. Most often, we are interested in counting the number of occurrences of some event within some time interval or region in space. For example, we might want to characterize a photon detection rate by counting the number of photons detected in a certain time interval, or we might want to characterize the density of trees in a plot of land by counting the number of trees that occur in a given acre. The random variable in any counting experiment is a positive whole number, the number of “counts.” It is often true that this discrete random variable follows a Poisson distribution.

A counting experiment is an experiment in which events or objects are enumerated in a given unit of time or space.
Let’s say that we are counting alpha particles emitted by a sample of a
radioactive isotope at a rate of 2.5 counts/second. Our experimental mea-
surement is thus “counts detected in one second” and the domain consists
of all positive whole numbers (and zero).
The Poisson probability distribution for this experiment can be deter-
mined from the following general formula:
p(x) = e^(−λt) (λt)^x / x!    (1.2)

where λ is the average rate of occurrence of events, and t is the interval of observation. Thus, for our experiment, the product λt = 2.5 events/second × 1 second = 2.5 events.

The Poisson distribution function describes the outcome of many counting experiments.

Let’s use this formula to calculate the probability
that we will measure 5 counts during one observation period:

P(x = 5) = e^(−2.5) (2.5)^5 / 5! = 0.0668

Figure 1.3 shows the probabilities of measuring zero through 10 counts during one measurement period for this experiment.

[Figure 1.3 appears here: a table and bar plot of probability versus measured alpha counts.]

Figure 1.3: The Poisson probability distribution, shown here as both a table and a plot, describes the probability of observing alpha particle counts, as calculated from eqn. 1.2 with λ = 2.5 counts/second and t = 1 second.
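The same arithmetic is easy to reproduce in a few lines of Python. The sketch below is our own illustration of eqn. 1.2 for the α-particle example; it recovers the value P(x = 5) ≈ 0.0668 worked out above and the probabilities tabulated in Figure 1.3.

```python
from math import exp, factorial

def poisson_pmf(x, lam, t):
    """Probability of observing x counts in an interval t, at mean rate lam (eqn. 1.2)."""
    mu = lam * t
    return exp(-mu) * mu**x / factorial(x)

lam, t = 2.5, 1.0                 # 2.5 counts/second, 1-second observation window
print(poisson_pmf(5, lam, t))     # ~0.0668

for x in range(0, 11):            # the values plotted in Figure 1.3
    print(x, round(poisson_pmf(x, lam, t), 4))
```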
Just like the binomial distribution, the Poisson distribution of discrete
variables has two important properties:

• The probability is never negative: P(x = x0 ) ≥ 0


• The sum of all probabilities is unity: Σ (i = 0 to ∞) P(x = xi) = 1

These properties are shared by all discrete probability distributions.

Advanced Topic: The Boltzmann Distribution

Probability distributions are necessary in order to characterize the out-


come of many experiments in science due to the presence of measurement
error, which introduces a random component to experimental measure-
ments. However, probability distributions are also vital in understanding
the nature of matter on a more fundamental level. This is because many
properties of a system, when viewed at the atomic and molecular scale,
are actually random variables. There is an inherent “uncertainty” of matter
and energy that is apparent at small scales; this nature of the universe is
predicted by quantum mechanics. What this means is that we must again
resort to the language of probability (and probability distributions) in order
to describe such systems.
Let us consider the energy of a molecule, which is commonly consid-
ered to be partitioned as electronic, vibrational, and rotational energy. As
you should know from introductory chemistry, a molecule’s energy is quan-
tized. In other words, the energy of a molecule is actually a discrete random variable. The random nature of the energy is an innate property of matter and not due to random error in any measuring process.

[Figure 1.4 appears here: probability versus vibrational quantum number (0–10) for I2 at 298 K and 400 K.]

Figure 1.4: Probability distribution among vibrational energy levels of the I2 molecule at two different temperatures. The actual energy levels are given by Eν = (ν + 1/2) · 214.6 cm⁻¹, where ν is the vibrational quantum number. The probability distribution assumes evenly spaced vibrational levels (i.e., the harmonic oscillator assumption). Notice that at higher temperatures, there is a greater probability that a molecule will be in a higher energy level.
Since molecular energy is a random variable, it must be described by
a probability distribution. If a molecule is in thermal equilibrium with its
environment, the probability that the molecule has a particular energy at
any given time is described by the Boltzmann distribution:
p(x) = e^(−βx) / Σ e^(−βx)    (1.3)

where β = (kT)^(−1), T is the temperature in K, and the denominator is a summation over all the possible energy states of the molecule.

The Boltzmann probability distribution function describes the energy of a molecule in thermal equilibrium with its surroundings.

If the different states of a molecule have evenly spaced energy levels and no degeneracy, then the Boltzmann distribution function can be simplified to

p(x) = e^(−βx) (1 − e^(−βΔE))

where ∆E is the separation between energy levels. Figure 1.4 shows the
probability distribution for the vibrational energy of an I2 molecule at two
different temperatures.
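The populations plotted in Figure 1.4 can be checked numerically. The sketch below is a rough Python illustration of the simplified expression above, applied to the I2 vibrational levels; the value of Boltzmann's constant in wavenumber units is a standard physical constant and is an assumption of ours, not a number given in the text.

```python
from math import exp

K_B = 0.695035     # Boltzmann constant in cm^-1 per K (standard value, not from the text)
DELTA_E = 214.6    # I2 vibrational level spacing in cm^-1 (harmonic approximation)

def vib_population(v, temperature):
    """Boltzmann probability of vibrational level v for evenly spaced levels."""
    beta_dE = DELTA_E / (K_B * temperature)
    # p(v) = e^(-v*beta*dE) * (1 - e^(-beta*dE)); the zero-point energy cancels out
    return exp(-v * beta_dE) * (1.0 - exp(-beta_dE))

for T in (298.0, 400.0):
    probs = [vib_population(v, T) for v in range(11)]
    print(T, [round(p, 3) for p in probs])
# At 298 K this gives roughly 0.645 for v = 0 and 0.229 for v = 1.
```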
We can interpret the Boltzmann distribution in two ways, both of which
are useful:

• The Boltzmann distribution gives the probability distribution of the


energy of a single molecule at any given time. For example, if we mea-
sure the vibrational energy of an I2 molecule at 298 K, then according

to the Boltzmann distribution there is a 64.5% probability that the


molecule is in the ground vibrational level (ν = 0). If we wait for a
time (say 10 seconds), and then measure again, then there is a 22.9%
chance that the molecule has absorbed some heat and is now in the
first excited vibrational energy level (ν = 1). Of course, there is still a
64.5% chance that the molecule is in the ground state.

• The Boltzmann distribution gives the fractional distribution of molec-


ular energy states in a chemical sample. Let’s imagine that we have a
sample of one million I2 molecules at 298K (which is not very many;
remember that one mole is about 1023 molecules). The Boltzmann dis-
tribution tells us that at any given time, about 645,000 molecules will
be in the ground vibrational energy level and about 229,000 molecules
will be in the first excited vibrational level. Molecules may be con-
stantly gaining and losing vibrational energy, through collisions and
by absorbing/emitting infrared light, but since there are so many
molecules, the total number of molecules at each energy level will
remain fairly constant. For this reason, the energy probability density
function is sometimes called the Boltzmann distribution of states.

1.3 Important Characteristics of Random Variables
We have discussed the idea of probability distributions, in particular the distributions of discrete variables. We will proceed to continuous variables momentarily, but first we will discuss two important properties by which we may characterize random variables, regardless of the probability distribution: location and dispersion.

Two important properties of random variables are location (‘central tendency’) and dispersion (‘variability’).

Let’s take stock of the situation thus far: for any random variable, the domain gives all possible values of the variable and the probability distribution gives the likelihood of those values. Together these two pieces of information provide a complete description of the properties of the random variable. Two important properties of a variable contained in this description are:

• Location: the central tendency of the variable, which describes a value


around which the variable tends to cluster, and

• Dispersion: the typical range of values that might be expected to be


observed in experiments. This gives some idea of the spread in values
that might result from our experiment(s).

The probability distribution contains all the information needed to de-


termine these characteristics, as well as still more esoteric descriptors of
the properties of random variables. Since we have discussed the distribu-
tions of discrete variables, we will tend to use these in our discussions and
examples; however, the same concepts apply, with very little modification,
to continuous variables.

1.3.1 Central Tendency of a Random Variable


The central tendency, or location, of a variable can be indicated by any (or
all) of the following: the mode, the median, or the mean. Although most

people are familiar with means, the other two properties are actually easier
to understand.

Mode

The mode is the most probable value of a discrete variable. More generally, it is the maximum of the probability distribution function: the value of xmode such that

p(xmode) = Pmax

Multi-modal probability distributions have more than one mode — distributions with two modes are bimodal, and so on. Although multi-modal distributions may have several local maxima, there is usually a single global maximum that is the single most probable value of the random variable. In the example with the alpha particle measurements (see fig. 1.3), the value of the mode of the distribution — xmode = 2 — can be determined by glancing at the bar graph of the Poisson distribution.

Median

The median is only a little more complicated than the mode: it is the value
Q2 such that
P(x < Q2 ) = P(x > Q2 )
In other words, there is an equal probability of observing a value greater
than or less than the median.
The median is also the second quartile — hence the origin of the symbol Q2. Any distribution can be divided into four equal “pieces,” such that:

P(x < Q1) = P(Q1 < x < Q2) = P(Q2 < x < Q3) = P(x > Q3)

A distribution is sometimes split up ten ways, into deciles. The median is the fifth decile, D5.

The boundaries Q1 , Q2 (i.e., the median), and Q3 are the quartiles of the
probability distribution.

Mean

Before defining the mean, it is helpful to discuss a mathematical opera-


tion called the weighted sum. Most everybody performs weighted sums —
especially students calculating test averages or grade point averages! For
example, let’s imagine that a student has taken two tests and a final, scoring
85 and 80 points on the tests, and 75 points on the final. An “unweighted
average” of these three numbers is 80 points; however, the final is worth
(i.e., weighted) more than the tests. Suppose that the instructor feels that
the final is worth 60% of the overall grade, while the other two tests are worth
20% each. The weighted sum would be calculated as follows.

Let w1 = w2 = 0.2, and w3 = 0.6

weighted score = Σ wi · score_i = w1 · 85 + w2 · 80 + w3 · 75 = 78
The final score, 78, is a weighted sum. Since the final is weighted more
than the test, the weighted sum is closer to the final score (75) than is the
unweighted average (80). Grade point averages are calculated on a similar
principle, where the weights for each grade are determined by the course
credit hours. To choose an example from chemistry, the atomic weights

listed in the periodic tables are weighted averages of isotope masses; the
weights are determined by the relative abundance of the isotopes.
In general, a weighted sum is represented by the expression

weighted sum = Σ wi xi    (1.4)

where xi are the individual values, and wi are the corresponding weights. When the sum of all the individual weights is one (Σ wi = 1), then the weighted sum is often referred to as a weighted average.
The mean of a discrete random variable is simply a weighted average,
using the probabilities as the weights. In this way, the most probable values
have the most “influence” in determining the mean; this is why the mean is
a good indicator of central tendency. The mean, or expected value, E(x),
of a random variable is defined as follows: for a discrete variable, it is

E(x) = µx = Σ xi p(xi)    (1.5a)

while for a continuous variable, it is

E(x) = µx = ∫−∞^+∞ x p(x) dx    (1.5b)

where p(x) is a mathematical function that defines the probability distri-


bution of the random variable x.
The means of binomial and Poisson distributions are given by the fol-
lowing general formulas:
µx = n · p
for the binomial distribution, where n is the number of trials and p is the
probability of success for each trial. For a variable described by the Poisson
distribution, the mean can be calculated as

µx = λ · t

where λ is the mean “rate” and t is the measurement interval (usually a


time or distance).
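To make the weighted-average character of eqn. 1.5a concrete, here is a small Python sketch of our own. It applies the formula to the two-coin-toss distribution from section 1.2.1 and to the Poisson distribution of the α-particle example; the latter should come out close to µx = λt = 2.5.

```python
from math import exp, factorial

def mean_discrete(values, probs):
    """Mean of a discrete random variable as a probability-weighted sum (eqn. 1.5a)."""
    return sum(x * p for x, p in zip(values, probs))

# Two coin tosses: domain {0, 1, 2} with probabilities 0.25, 0.5, 0.25
print(mean_discrete([0, 1, 2], [0.25, 0.5, 0.25]))   # -> 1.0

# Poisson counts with lambda*t = 2.5: truncate the infinite sum at a large x
mu = 2.5
xs = range(0, 50)
ps = [exp(-mu) * mu**x / factorial(x) for x in xs]
print(mean_discrete(xs, ps))                          # -> approximately 2.5
```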

Comparison of Location Measures

We have defined three different indicators of the location of a random vari-


able: the mean, median and mode. Each of these has a slightly different
meaning.
Imagine that you are betting on the outcome of a particular experiment:

• If you choose the mode, you are essentially betting on the single most
likely outcome of the experiment.

• If you choose the median, you are equally likely to be larger or smaller
than the outcome.

• If you choose the mean, you have the best chance of being closest to
the outcome.

A random variable cannot be predicted exactly, but each of the three


indicators gives a sense of what value the random variable is likely to
be near. In most applications, the mean gives the best single description of the location of the variable.

The mean is the most common descriptor of the location of a random variable.

[Figure 1.5 appears here: two skewed probability distributions, (a) and (b), with the positions of the mean, median, and mode marked.]

Figure 1.5: Comparison of values of mean, median and mode for (a) positively and (b) negatively skewed probability distributions. For symmetrical distributions (so-called ‘bell-shaped’ curves) the three values are identical.

Just how different are the values of the mean, median and mode? It
turns out that the three values are different only for asymmetric distribu-
tions, such as the two shown in figure 1.5. If a distribution is skewed to the
right (or positively skewed; fig. 1.5(a)) then

µx > Q2 > xmode

while for distributions skewed to the left (negatively skewed; fig. 1.5(b))

µx < Q2 < xmode

For symmetrical (“bell-shaped”) distributions, the mean, median and mode


all have exactly the same value.

1.3.2 Dispersion of a Random Variable


Some variables are more “variable,” more uncertain, than others. Of course,
theoretically speaking, a variable may assume any one of the range of val-
ues in the domain. However, when speaking of the variability of a random
variable, we generally mean the range of values that would commonly (i.e.,
most probably) be observed in an experiment. This property is called the dispersion of the random variable. Dispersion refers to the range of values that are commonly assumed by the variable.

Statisticians sometimes use the term scale instead of dispersion.
Experiments that produce outcomes that are highly variable will be more
likely to give values that are farther from the mean than similar experi-
ments that are not as variable. In other words, probability distributions
tend to be broader as the variability increases. Figure 1.6 compares the
probability functions (actually called “probability density functions”) of two
continuous variables.
As with the mean, it is convenient to describe variability with a single
value. Three common ways to do so are:

1. The interquartile range and the semi-interquartile range

2. The mean absolute deviation


3. The variance and the standard deviation.

These will now be described.

[Figure 1.6 appears here: two probability density curves plotted against value, one broader than the other.]

Figure 1.6: Comparing the variability of two random variables. The variable described by the broader probability distribution (dotted line) is more likely to be farther from the mean than the other variable.

(Semi-)Interquartile Range

The interquartile range, IQR, is the difference between the first and third
quartiles (see figure 1.7):
IQR = Q3 − Q1

This is a measure of dispersion because the “wider” a distribution gets, the


greater the difference between the quartiles.
The semi-interquartile range, QR, is probably the more commonly used measure of dispersion; it is simply half the interquartile range.

QR = (Q3 − Q1) / 2    (1.6)

Mean Absolute Deviation

The mean absolute deviation is the expected value of |x − µx|.

Since the dispersion describes the spread of a random variable about its mean, it makes sense to have a quantitative descriptor of this quantity. The mean absolute deviation, MD, is exactly what it sounds like: the expected value (i.e., the mean) of the absolute deviation of a variable from its mean value, µx.

MD ≡ E(|x − µx|)

The concept behind the mean absolute deviation is quite simple: it indicates the mean (‘typical’) distance of a variable from its own mean, µx. For a discrete variable,

MD = Σ |xi − µx| p(xi)    (1.7a)

while for a continuous variable,

MD = ∫−∞^+∞ |x − µx| p(x) dx    (1.7b)

[Figure 1.7 appears here: a probability density curve with the quartiles Q1, Q2, and Q3 marked and the interquartile range indicated.]

Figure 1.7: The interquartile range is a measure of the dispersion of a random variable. It is the difference between the first and third quartiles of a distribution, Q3 − Q1, where the quartiles divide the distribution into four equal parts (see page 9). The semi-interquartile range is also a common measure of dispersion; it is half the interquartile range.

Variance and Standard Deviation

Like the mean absolute deviation, the variance and standard deviation measure the dispersion of a random variable about its mean µx. The variance of a random variable x, σx², is the expected value of (x − µx)², which is the squared deviation of x from its mean value:

σx² ≡ E[(x − µx)²]

The variance is the expected value of (x − µx)², and the standard deviation is the positive root of the variance.

As you can see, the concept of the variance is very similar to that of the mean absolute deviation. In fact, the variance is sometimes called the mean squared deviation. The variance for discrete and continuous variables is given by

σx² = Σ_i (xi − µx)² p(xi)    (1.8a)

σx² = ∫−∞^∞ (x − µx)² p(x) dx    (1.8b)

Look at the discrete variable (eqn. 1.8a): we have another weighted sum!
The values being summed, (xi − µx )2 , are the squared deviations of the
variable from the mean. The squared deviations indicate how far the value
xi is from the mean value µx , and the weights in the sum, as in eqn. 1.5, are
the probabilities of xi . Thus, “broader” probability distributions will tend
to have larger weights for values of x that have larger squared deviations
(x − µx )2 (and hence are more distant from the mean). Such distributions
will give larger values for the variance, σx2 . Higher variance signifies greater
variability of a random variable.
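Continuing the earlier coin-toss illustration, the following Python sketch (our own, not part of the text) evaluates eqn. 1.8a directly as a probability-weighted sum of squared deviations.

```python
def variance_discrete(values, probs):
    """Variance of a discrete random variable (eqn. 1.8a)."""
    mu = sum(x * p for x, p in zip(values, probs))
    return sum((x - mu) ** 2 * p for x, p in zip(values, probs))

# Number of heads in two coin tosses: domain {0, 1, 2}, probabilities 0.25, 0.5, 0.25
var = variance_discrete([0, 1, 2], [0.25, 0.5, 0.25])
print(var)            # -> 0.5
print(var ** 0.5)     # standard deviation, about 0.71 heads
```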
One problem with using the variance to describe the dispersion of a
random variable is that the units of variance are the squared units of the
original variable. For example, if x is a length measured in m, then σx² has units of m². The standard deviation, σx, has the same units as x, and so is a little more convenient at times. The standard deviation is simply the positive square root of the variance.

The standard deviation is calculated from the variance: σx = +√(σx²)

Sometimes the variability of a random variable is specified by the relative standard deviation, RSD:

RSD = σx / µx    or    RSD = σx / x

The RSD is an alternate way to present the standard deviation.

Both of these expressions are commonly used to calculate RSD; which one is used is usually obvious from the context. The RSD can be expressed as a fraction or as a percentage. The RSD is sometimes called the coefficient of variation (CV).

Comparison of Dispersion Measures

The standard deviation, σx, is the most common measure of dispersion.

We have described three common ways to measure a random variable’s dispersion: the semi-interquartile range, QR, the mean absolute deviation, MD, and the standard deviation, σ. These measures are all related to each other, so, in a sense, it makes no difference which we use. In fact, for distributions that are only moderately skewed, MD ≈ 0.8σ and QR ≈ 0.67σ. For a variety of reasons (which are beyond the scope of this text), the variance and standard deviation are the best measures of dispersion of a random variable.

Aside: Quantum Variability


As stated earlier, quantum mechanics asserts that many of the properties
of matter at the atomic/molecular scale are inherently unpredictable (i.e.,
random). The magnitude of the variability of these properties only becomes
apparent on a sufficiently small scale. Hence, these variables must be de-
scribed by a probability distribution with a certain mean and standard de-
viation. This ability to interpret system properties such as energy and posi-
tion as random variables is an example of the broad scope of the concepts
contained in the study of probability and statistics.
One of the most important relationships in quantum mechanics is Heisenberg’s Uncertainty Principle. The Uncertainty Principle states that the product of the standard deviations of certain random variables, called complementary variables, or complementary properties, has a lower limit. For example, the linear momentum p and the position q of a particle are complementary properties; the Heisenberg Uncertainty Principle states that

σp · σq ≥ h/(4π)

The term ‘uncertainty’ in Heisenberg’s principle refers to the standard deviation of values used to describe properties at the atomic/molecular level.

As stated in the Uncertainty Principle, the standard deviations of complementary variables are inversely related to one another. In other words, if a
particle such as an electron is constrained to remain confined to a certain
area, then the uncertainty in the linear momentum is great: i.e., if σq is
small (e.g., for a confined electron) then σp is large.

1.4 Probability Distributions of Continuous Variables
1.4.1 Introduction
Properties such as mass or voltage are typically free to assume any value;
hence, they are continuous variables. There is one fundamental distinction
between discrete and continuous variables: the probability of a continuous
random variable, x, exactly assuming one of the values, x0 , in the domain
is zero! In other words,
P(x = x0 ) = 0
How, then, do we specify the probabilities of continuous random vari-
ables? Instead of calculating the probability of a specific value, we determine the probability that the outcome falls within a given range of values. In order to find the probability that the random
variable x will be between two values x1 and x2 , we can use a function p(x)
such that

P(x1 ≤ x ≤ x2) = ∫x1^x2 p(x) dx    (1.9)

The function p(x) is called the probability density function of the continuous random variable x. Figure 1.8 demonstrates the general idea.

The probability density function, sometimes called simply the distribution function, is used to determine probabilities of continuous random variables.

Just as the probability of a discrete variable must sum to one over the entire domain, the area under the probability density function within the range of possible values for x must be one. For example, if the domain ranges from −∞ to ∞, then

∫−∞^∞ p(x) dx = 1

As in the discrete case, the value of the function p(x) must be posi-
tive over its entire range. The probability density function allows us to
construct a probability distribution for continuous variables; indeed, some-
times it is called simply a “distribution function,” as with discrete variables.
However, evaluation of the probability density function for a particular
value x0 does not yield the probability that x = x0 — that probability is
zero, after all — as it would for a discrete distribution function.
Probability distributions are thus a little more complicated for continu-
ous variables than for discrete variables. The main difference is that prob-
abilities of continuous variables are defined in terms of ranges of values,
rather than single values. The probability density function, p(x) (if one
exists) can be used to determine these probabilities.
[Figure 1.8 appears here: a probability density curve with the area between x1 and x2 shaded.]

Figure 1.8: Probability characteristics of continuous variables. The curve is the probability density function, and the shaded area is the probability that the random variable will be between x1 and x2. The area under the entire curve is one.

1.4.2 Normal (Gaussian) Probability Distributions


By far, the most common probability distribution in science is the Gaussian
distribution. In very many situations, it is assumed that continuous random
variables follow this distribution; in fact, it is so common that it is simply
referred to as the normal probability distribution. The probability density
function of this distribution is given by the following equation:
N(x : µ, σ) = [1 / (σ√(2π))] exp[−(x − µ)² / (2σ²)]    (1.10)

where the expression N(x : µ, σ) conveys the information that x is a normally-distributed variable with mean µ and standard deviation σ. Figure 1.9 shows a normal probability density function with µx = 50 and σx = 10. Note that it is a symmetric distribution with the well-known “bell-curve” shape.

The normal probability distribution describes the characteristics of many continuous random variables encountered in measurement science.
Calculating probabilities of continuous variables using the probability density function is a little more complicated than with discrete variables, as shown in example 1.1.

Example 1.1
Johnny Driver is a conscientious driver; on a freeway with a posted speed
limit of 65 mph, he tries to maintain a constant speed of 60 mph. How-
ever, the car speed fluctuates during moments of inattention. Assuming
that car speed follows a normal distribution with a mean µx = 60 mph
and standard deviation σx = 3 mph, what is the probability that Johnny
is exceeding the speed limit at any time?

Figure 1.10 shows a sketch of the situation. The car speed is a random
variable that is normally distributed with µx = 60 mph and σx = 3 mph.
[Figure 1.9 appears here: a bell-shaped probability density curve plotted against measurement value, 0–100.]

Figure 1.9: Plot of the Gaussian (“normal”) probability distribution with µ = 50 and σ = 10. Note that most of the area under the curve is within 3σ of the mean.

We need to determine the probability that x is greater than 65 mph, which


is the shaded area under the curve in the figure:
P(x > 65) = ∫65^∞ [1 / (σx√(2π))] exp[−(x − µx)² / (2σx²)] dx

where µx and σx are given the appropriate values. When this integral is
evaluated, a value of 0.0478 is obtained. Thus, there is a 4.78% probability
that Johnny is speeding.
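The integral in example 1.1 is easy to evaluate numerically. The following sketch uses scipy.stats, a tool of our own choosing rather than one the text prescribes, and reproduces the 0.0478 result.

```python
from scipy.stats import norm

mu, sigma = 60.0, 3.0                               # mean and standard deviation of the car speed, mph
p_speeding = norm.sf(65.0, loc=mu, scale=sigma)     # sf = survival function = 1 - cdf = right-tail area
print(round(p_speeding, 4))                         # -> 0.0478
```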

1.4.3 The Standard Normal Distribution


In calculating probabilities of continuous variables, it is usually necessary
to evaluate integrals, which can be inconvenient (a computer program is re-
quired in the case of normally-distributed variables) and tedious. It would
be preferable if there were tables of integration values available for ref-
erence; there are, in fact, many tables available for just this purpose. Of
course, an integration table will only be valid for a specific probability dis-
tribution. However, the normal distribution is not a single distribution, but
is actually a family of distributions: changing the mean or variance of the
variable will give a different distribution. It is not practical to formulate
integration tables for all possible values of µ and σ 2 ; fortunately, this is
not necessary, as we will see now.
A special case of the normal distribution (eqn. 1.10) occurs when the
mean is zero (µx = 0) and the variance is unity (σx2 = σx = 1); this particular
normal probability distribution is called the standard normal distribution,
N(z).

N(z) = [1 / √(2π)] exp(−z²/2)    (1.11)

The standard normal distribution is a special version of the normal distribution. It is useful in solving problems like example 1.1.
[Figure 1.10 appears here: the probability density of car speed (50–70 mph) with the area beyond x = 65 mph shaded.]

Figure 1.10: Sketch of distribution of the random variable in example 1.1. The area under the curve is the value we want: P(x > 65) = 0.0478.

Other than giving a simplified form of the normal distribution function,


the standard normal distribution is important because integration tables
of this function exist that can be used to calculate probability values for
normally-distributed variables. In order to use these tables, it is necessary
to transform a normal variable, with arbitrary values of µ and σ 2 , to the
standard normal distribution. This transformation is accomplished with
z-transformation, which is usually called standardization.
Taking a variable x, we define z such that

z = (x − µx) / σx    (1.12)
The transformed value z is the z-score of the value x. The z-score of a
value is the deviation of the value from its mean µx in units of the standard
deviation, σx, as illustrated in example 1.2.

Example 1.2
Let’s say we set up an experiment such that the outcome is described by
a normal distribution with µx = 25.0 and σx = 2.0. A single measurement
yields x0 = 26.4; what is the z-score of this measurement?

The value is calculated directly from eqn. 1.12:

z0 = (x0 − µx) / σx = (26.4 − 25.0) / 2.0 = 0.7

Thus, the measurement is +0.7σ from the mean.
The process of standardization of a random variable x yields another
variable z; if x is normally distributed with mean µx and standard deviation
σx , then z is also normally distributed with µ = 0 and σ = 1. This illus-
trates an important concept: any value calculated using one or more ran-
dom variables is also a random variable. In other words, the calculations
associated with standardization did not rid x of its innate “randomness.”

Although there are no integration tables for a normally-distributed variable x with arbitrary mean and standard deviation, we can apply the z-transformation and use the integration tables of the standard normal distribution. Tables of the standard normal distribution usually give cumulative probabilities, which correspond to the areas in one of the “tails” under the density function. The ‘left tail’ is given by

P(z < z0) = ∫−∞^z0 N(z) dz    (1.13)

while the ‘right tail’ area is calculated from

P(z > z0) = ∫z0^+∞ N(z) dz    (1.14)

[Margin sketches illustrate the left-tail area P(z < z0) and the right-tail area P(z > z0) under the standard normal curve.]

The next example will show how we can use the z-tables to calculate probabilities of normally-distributed variables.

Example 1.3
In example 1.1 we determined by integration the probability that a car
of variable speed was exceeding the speed limit (65 mph); the mean and
standard deviation of the car speed were 60 mph and 3 mph, respec-
tively. Now solve this problem using z-tables.

The problem can be re-stated as follows: determine the probability

P(x > x0 ) =?

where x is a normally-distributed variable with µx = 60 mph, σx = 3 mph,


and x0 = 65 mph. Fundamentally, the only way to solve this problem is by integration, but we can use the z-tables instead if we first standardize the variables. The z-transformed
problem reduces to

P(x > x0) = P[(x − µx)/σx > (x0 − µx)/σx] = P(z > z0)

where z is described by the standard normal distribution, and z0 is the


appropriate z-score:

z0 = (x0 − µx) / σx = (65 − 60) / 3 ≈ 1.67
Now we can use the z-table to find the area in the ‘right tail’ of the
z-distribution. From the z-table, we see that

P(x > 65) ≈ P(z > 1.67) = 0.0475

This answer agrees (more or less) with our previous value, 0.0478 (see
example 1.1). The slight difference is due to the fact that 5/3 does not exactly equal 1.67.
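The small discrepancy can be seen directly with a quick numerical check (again a scipy-based sketch of our own):

```python
from scipy.stats import norm

print(round(norm.sf(1.67), 4))    # right-tail area for the rounded z-score -> 0.0475
print(round(norm.sf(5 / 3), 4))   # right-tail area for the exact z-score   -> 0.0478
```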

Important Relationships for Standard Normal Distributions

You should become very familiar with the concepts presented in this section.

The Appendix presents a number of useful statistical tables, including one for the standard normal distribution (i.e., a z-table). Since the normal distribution is symmetric, there is no need to list the areas corresponding to both negative and positive z-score values, so most tables only present half of the information. The z-table given in this book lists right-tail areas associated with positive z-scores. In order to calculate the areas corresponding to various ranges of normally-distributed variables, using only right-tail areas, a few important relationships should be learned.

⇒ Calculating left-tail areas: P(z < −z0 )


Since the normal distribution is symmetric, the following relationship is
true:

P(z < −z0 ) = P(z > z0 ) (1.15)

This expression allows one to calculate left-tail areas from right-tail areas,
and vice versa. This symmetric nature of the normal probability distribu-
tion is illustrated here:

[Sketch: the left-tail area below −z0 equals the right-tail area above z0.]

⇒ Calculating probabilities greater than 0.5: P(z > −z0 )


As mentioned previously, most tables (including the one in this book) only
list the areas for half the normal curve. That is because areas correspond-
ing to the other half — i.e., probabilities larger than 0.5 — can easily be
calculated. Our table only lists the right-tail areas for positive z-scores;
thus, we need a way to calculate right-tail areas for negative z-scores. We
would use the following equation:

P(z > −z0 ) = 1 − P(z > z0 ) (1.16)

A pictorial representation of this equation is:


[Sketch: the right-tail area above −z0 equals the whole area (one) minus the right-tail area above z0.]

⇒ Calculating ‘middle’ Probabilities: P(z1 < z < z2 )


Instead of “tail” areas (i.e., P(z > z0 ) or P(z < −z0 )), it is often necessary to
calculate the area under the curve between two z-scores. The most general
expression for this situation is

P(z1 < z < z2 ) = P(z > z1 ) − P(z > z2 ) (1.17)

Again, in picture form:


[Sketch: the area between z1 and z2 equals the right-tail area above z1 minus the right-tail area above z2.]

It is important to become adept at using z-tables to calculate proba-


bilities of normally-distributed variables. The following two examples illus-
trate some of the problems you might encounter.

Example 1.4
A soft-drink machine is regulated so that it discharges an average vol-
ume of 200. mL per cup. If the volume of drink discharged is normally
distributed with a standard deviation of 15 mL,

(a) what fraction of the cups will contain more than 224 mL of soft
drink?

(b) what is the probability that a 175 mL cup will overflow?

(c) what is the probability that a cup contains between 191 and 209 mL?

(d) below what volume do we get the smallest 25% of the drinks?

In answering these types of questions, it is always helpful to draw a quick


sketch of the desired area, as we do here (in the margins).

(a) This problem is similar to previous ones: we must find a right-tail area P(x > x0), where x0 = 224 mL. To do so, we can use the z-tables if we first calculate z0, the z-score of x0.

[Margin sketch: the drink-volume distribution (150–250 mL) with the area above 224 mL shaded, P(x > 224 mL).]

P(x > x0) = P[(x − µx)/σx > (x0 − µx)/σx]
          = P[z > (224 − 200)/15] = P(z > 1.6)
          = 0.0548

Looking in the z-tables yields the answer. There is a 5.48% probability that a 224 mL cup will overflow.

(b) In this case, the z-score is negative, so that we must use eqn. 1.16 to find
the probability using the z-tables in the Appendix.

P(x > 175 mL) = P[z > (175 − 200)/15]
              = P(z > −1.67) = 1 − P(z > 1.67)
              = 1 − 0.0475 = 0.9525

[Margin sketch: the drink-volume distribution with the area above 175 mL shaded, P(x > 175 mL).]


There is a 95.25% probability that the 175 mL cup will overflow. A common
mistake in this type of problem is to calculate P(z > z0 ) (0.0475) instead
of 1 − P(z > z0 ) (0.9525); referring to a sketch helps to catch this problem,
since it is obvious from the sketch that the probability should be greater
than 50%.
(c) We must find P(x1 < x < x2), where x1 = 191 mL and x2 = 209 mL. To do so using the z-tables, we must find the z-scores for both x1 and x2, and then use eqn. 1.17 to calculate the probability.

[Margin sketch: the drink-volume distribution with the area between 191 mL and 209 mL shaded, P(191 mL < x < 209 mL).]


P(191 mL < x < 209 mL) = P[(191 − 200)/15 < z < (209 − 200)/15]
                       = P(−0.6 < z < +0.6)
                       = 1 − P(z < −0.6) − P(z > +0.6)
                       = 1 − 2 · P(z > +0.6) = 1 − 2 · 0.2743
                       = 0.4514
So there is a 45.14% probability that a cup contains 191–209 mL.
(d) This question is a little different than the others. We must find a value x0 such that P(x < x0) = 0.25. In all of the previous examples, we began with a value (or a range of values) and then calculated a probability; now we are doing the reverse — we must calculate the value associated with a stated probability. In both cases we use the z-tables, but in slightly different ways.

[Margin sketch: the drink-volume distribution with the lowest 25% of the area shaded, P(x < x0) = 0.25.]

To begin, from the z-tables we must find a value z0 such that P(z < z0) = 0.25. Looking in the z-tables, we see that P(z > 0.67) = 0.2514 and P(z > 0.68) = 0.2483; thus, it appears that a value of 0.675 will give a right-tailed area of approximately 0.25. Since we are looking for a left-tailed area, we can state that

P(z < −0.675) ≈ 0.25
Our next task is to translate this z-score into a volume; in other words, we
want to “de-standardize” the value z0 = −0.675 to obtain x0 , the volume
that corresponds to this z-score. From eqn. 1.12 on page 18, we may write
x0 = µx + z0 · σx
= 200 + (−0.675)(15) mL
= 189.9 mL

Thus, we have determined that the drink volume will be less than 189.9 mL
with probability 25%.
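All four parts of example 1.4 can be checked with the same kind of numerical sketch; part (d) uses the inverse of the cumulative distribution (the ppf in scipy, an assumed detail of ours, not something the text specifies).

```python
from scipy.stats import norm

mu, sigma = 200.0, 15.0   # mean and standard deviation of the dispensed volume, mL

print(norm.sf(224, mu, sigma))                               # (a) P(x > 224 mL), about 0.055
print(norm.sf(175, mu, sigma))                               # (b) P(x > 175 mL), about 0.952
print(norm.cdf(209, mu, sigma) - norm.cdf(191, mu, sigma))   # (c) P(191 < x < 209), about 0.451
print(norm.ppf(0.25, mu, sigma))                             # (d) volume with P(x < x0) = 0.25, about 189.9 mL
```

The small differences from the worked values (for instance 0.9522 here versus 0.9525 from the table) come from rounding the z-scores to two decimal places before using the table.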

Example 1.5
The mean inside diameter of washers produced by a machine is 0.502 in,
and the standard deviation is 0.005 in. The purpose for which these
washers are intended allows a maximum tolerance in the diameter of
0.496–0.508 in; otherwise, the washers are considered defective. Deter-
mine the percentage of defective washers produced by the machine, as-
suming that the diameters are normally distributed.
[Margin sketch: the washer-diameter distribution with both tails shaded, below 0.496 in and above 0.508 in; P(x < 0.496) + P(x > 0.508).]

We are looking for the probability that the washer diameter is either less than 0.496 in or greater than 0.508 in. In other words, we want to calculate the sum P(x < 0.496 in) + P(x > 0.508 in).

First we must calculate the z-scores of the two values x1 and x2, where x1 = 0.496 in and x2 = 0.508 in. Then we can use the z-table to determine the desired probability.

z1 = (x1 − µx)/σx = (0.496 − 0.502)/0.005 = −1.2

z2 = (x2 − µx)/σx = (0.508 − 0.502)/0.005 = 1.2

We can see that z1 = −z2 ; in other words, the two tails have the same area.
Thus,

P(x < x1) + P(x > x2) = P(z < z1) + P(z > z2)
                      = 2 · P(z > 1.2) = 2 · 0.1151
                      = 0.2302

(Remember: x1 = 0.496 in, x2 = 0.508 in.)

Thus, 23.02% of the washers produced by this machine are defective.

Aside: Excel Tip


Modern spreadsheet programs, such as MS Excel, contain a number of sta-
tistical functions, including functions that will integrate the normal prob-
ability distribution. In Excel, the functions NORMDIST and NORMSDIST will perform these integrations for normal and standard normal distributions, respectively. These can be especially useful in determining integration values that are not in z-tables, or when the tables are not readily available. View the on-line help documentation in Excel for more information on how to use these functions. Note that NORMSDIST was used in generating the z-table in the Appendix. In fact, all of the statistical tables were generated in Excel — other useful spreadsheet functions will be highlighted throughout this book.

Note that both functions return left-tail areas rather than the right-tail areas we use in this book.

Further Characteristics of Normally-Distributed Variables

Before leaving this section, consider the following characteristics of all ran-
dom variables that follow a normal distribution.
• Approximately two-thirds of the time the random variable will be
within one standard deviation of the mean value; to be exact,

P(µx − σx < x < µx + σx ) = 0.6827

• There is approximately a 95% probability that the variable will be


within two standard deviations of the mean:

P(µx − 2σx < x < µx + 2σx ) = 0.9545

You should be able to use the z-tables to obtain these probabilities;


you might want to verify these numbers as an exercise.
These characteristics, which are shown in figure 1.11, are useful rules of
thumb to keep in mind when dealing with normally-distributed variables.
For example, a measurement that is five standard deviations above the
mean is not very likely, unless there is something wrong with the mea-
suring device (or there is some other source of error).
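The two rules of thumb above are easy to verify numerically; the brief sketch below (scipy again, as an assumed tool) does the suggested exercise.

```python
from scipy.stats import norm

print(norm.cdf(1) - norm.cdf(-1))   # within one standard deviation  -> 0.6827
print(norm.cdf(2) - norm.cdf(-2))   # within two standard deviations -> 0.9545
```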

[Figure 1.11 appears here: two standard normal curves plotted against z-score, with the shaded regions covering (a) about 68% of the area within one standard deviation and (b) about 95% within two standard deviations.]

Figure 1.11: Characteristic of normally-distributed variables: the shaded area represents the probability that a normally-distributed variable will assume a value within (a) one or (b) two standard deviations of the mean.

1.4.4 Advanced Topic: Other Continuous Probability Distributions

At this point, we have described several important probability distributions, along with the types of experiments that might result in these distributions.

• Bernoulli experiments (“coin tossing experiments”) are common and


their outcomes are described by the binomial probability distribution,
which is a discrete probability distribution.

• Counting experiments are also common, and these often result in vari-
ables described by a Poisson distribution, which is also a discrete dis-
tribution.

• Many continuous variables are adequately described by the Gaussian


(‘normal’) probability distribution.

Still, there are some situations that result in continuous variables that
cannot be described by a normal distribution. We will describe two other
continuous probability functions, but there are many more.

The Exponential Distribution

Let’s go back to counting experiments (see page 5). In this type of exper-
iment, we are interested in counting the number of events that occur in a
unit of time or space. However, let’s say we change things around, as in the
following examples.

• We may count the number of photons detected per unit time (a dis-
crete variable) or we may measure the time between detected photons
(a continuous variable).

• We may count the number of cells in a solution volume (a discrete


variable) or we may measure the distance between cells (a continuous
variable).

• We may count the number of molecules that react per unit time (i.e.,
the reaction rate, a discrete variable) or we may be interested in the
time between reactions (a continuous variable).

• We may count the number of cars present on a busy street (a discrete


variable) or we may measure the distance between the cars (a contin-
uous variable).
[Figure 1.12 appears here: an exponential probability density curve plotted against time, 0–2.5 s.]

Figure 1.12: Exponential probability distribution with µx = σx = 0.4 s. This distribution describes the time interval between α-particles emitted by a radioisotope; see page 5 for more details.

And so on. We are essentially “flipping” the variable from events in a given unit of time (or space) to the time (or space) between events. If the discrete variable — the number of events — in these examples is described well by a Poisson distribution, then the continuous variable is described by the exponential probability density function,

p(x) = k e^(−kx)    (1.18)

where k is a characteristic of the experiment. In fact, for an exponentially distributed variable the mean, median, and standard deviation are given by

µx = k^(−1)
Q2 = ln(2) · k^(−1)
σx = k^(−1)

If the number of counts follows a Poisson distribution, then the interval between counts follows an exponential distribution. In certain applications, the mean of the exponential distribution is called the lifetime, τ, and the median is the half-life, t1/2.

The mean (and standard deviation) of the exponential distribution is


the inverse of the mean of the corresponding Poisson distribution. For ex-
ample, we described an experiment on page 5 in which we were counting
α-particles emitted by a sample of a radioactive isotope at a mean rate of
2.5 counts/second. It only stands to reason that the mean time between detected α-particles would be 1/2.5 = 0.4 seconds. The corresponding exponential distribution is shown in figure 1.12.
Statistical tables for the exponential distribution are not often given
because the integral of the exponential probability density function is easy
to evaluate: the probability that x is between x1 and x2 is given by

P(x1 < x < x2) = e^(−x1/µx) − e^(−x2/µx)    (1.19)
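For the α-particle example (µx = 0.4 s), eqn. 1.19 can be evaluated directly. The short sketch below is our own Python illustration; it gives the probability that the wait between detected particles falls in a few sample intervals.

```python
from math import exp

def exp_interval_prob(x1, x2, mean):
    """P(x1 < x < x2) for an exponentially distributed variable with the given mean (eqn. 1.19)."""
    return exp(-x1 / mean) - exp(-x2 / mean)

mean_wait = 0.4   # seconds between alpha-particle detections (rate 2.5 counts/s)
print(exp_interval_prob(0.0, 0.4, mean_wait))            # wait shorter than the mean: ~0.63
print(exp_interval_prob(0.4, 1.0, mean_wait))            # wait between 0.4 s and 1.0 s: ~0.29
print(exp_interval_prob(1.0, float("inf"), mean_wait))   # wait longer than 1 s: ~0.08
```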

Exponential probability distributions are common in chemistry, but you


may not be used to thinking of them as probability distributions. Any-
time you come across a process that experiences an “exponential decay,”

chances are that you can think of the process in terms of a counting exper-
iment. Examples of exponential decays are:

• the decrease in concentration in a chemical reaction (first-order rate


law);

• the decrease in light intensity as photons travel through an absorbing


medium (Beer’s Law);

• the decrease in the population in an excited energy state of an atom


or molecule (lifetime measurements).

All of these processes can be observed in a counting experiment, with char-


acteristic Poisson and exponential distributions.

Atomic and Molecular Orbitals

Atomic and molecular orbitals are probability density functions for the position of an electron in an atom or molecule. Such orbitals are sometimes called electron density functions. They allow us to determine the probability that the electron will be found in a given position relative to the nucleus. The different orbitals (e.g., 2s or 3px atomic orbitals) correspond to different probability density functions.

Atomic and molecular orbitals are simply probability distributions describing the position of electrons in atoms and molecules, respectively.
The electron density functions actually contain three random variables,
since they give the probability that an electron is at any given point in
space. As such they are really examples of joint probability distributions of
the three random variables corresponding to the coordinate axes (e.g., x, y
and z in a Cartesian coordinate system). For spherically-symmetric orbitals,
it is convenient to rewrite the joint probability distribution in terms of a
single variable r , which is the distance of the electron from the nucleus. For
the 1s orbital, this probability density function (called a radial distribution
function) has the following form:
p(r) = k r² e^(−3r/µr)

where µr is the mean electron-nucleus distance and k is a normalization constant that ensures that the integrated area of the function is one. The values of µr and k will depend on the identity of the atom. The radial density function for the hydrogen 1s atomic orbital is shown in figure 1.13.

Radial probability distribution function for atomic 1s orbitals.
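As a numerical check on the quoted values (mode at a0 = 52.9 pm, mean at 79.4 pm), the radial distribution can be normalized and integrated on a grid. The sketch below is our own illustration; the normalization constant k is determined numerically rather than taken from the text.

```python
from math import exp

MU_R = 79.4   # mean electron-nucleus distance for hydrogen 1s, in pm (from the text)

def radial_density(r, k=1.0):
    """Unnormalized 1s radial distribution, p(r) = k * r^2 * exp(-3r/mu_r)."""
    return k * r**2 * exp(-3.0 * r / MU_R)

# crude rectangle-rule integration on a fine grid out to 1000 pm
dr = 0.01
rs = [i * dr for i in range(int(1000 / dr))]
norm_k = 1.0 / (sum(radial_density(r) for r in rs) * dr)       # choose k so the total area is one
mean_r = sum(r * radial_density(r, norm_k) for r in rs) * dr   # expected value of r (eqn. 1.5b)
mode_r = max(rs, key=radial_density)                           # location of the maximum

print(round(mean_r, 1))   # -> about 79.4 pm
print(round(mode_r, 1))   # -> about 52.9 pm, the Bohr radius
```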
On page 6 we observed that the energy of a molecule is a random vari-
able that can be described by the Boltzmann probability distribution; now
we have encountered yet another property at the atomic/molecular scale
that must be considered a random variable. Electron position can also only
be described in terms of a probability distribution. Understanding the na-
ture and properties of random variables and their probability distributions
thus has applications beyond statistical data analysis.

1.5 Summary and Skills


The single most important skill developed in this chapter is the ability to use z-tables to do probability calculations involving normally-distributed variables.

A random variable is a variable that cannot be predicted with absolute certainty, and must be described using a probability distribution. The location of the probability distribution is well described by the mean, µx, of the random variable, and the inherent uncertainty in the variable is usually described by its standard deviation, σx.
[Figure 1.13 appears here: the radial probability density plotted against radial distance (0–250 pm), with a dotted line at 52.9 pm.]

Figure 1.13: Radial distribution function of the hydrogen 1s orbital. The dotted line at the mode indicates the Bohr radius, a0, of the orbital, where a0 = 52.9 pm. The mean radial distance µr for this orbital is 79.4 pm.

There are two general types of quantitative random variables: discrete


variables, which can only assume certain values (e.g., integers) and con-
tinuous variables. Examples of important discrete probability distributions
include the binomial and Poisson distributions — these functions allow one
to predict the outcome of Bernoulli and counting experiments, respectively.
Both of these types of experiments are quite common in science.
For continuous variables, the probability density function can be used
to find the probability that the variable is within a certain range of values,
P(x1 < x < x2 ). The most important family of probability density func-
tions is the Gaussian, or normal, probability distribution. The standard
normal distribution specifically describes a normally-distributed variable
with µx = 0 and σx = 1; integration tables of the cumulative standard nor-
mal distribution (i.e., z-tables) can be used to calculate probabilities for any
normally-distributed variable.
Another important probability density function is the exponential dis-
tribution, which describes the interval between successive Poisson events
in a counting experiment.
Finally, properties of matter at the atomic/molecular scale must often
be described using probability distributions. In particular, molecular en-
ergy is a discrete random variable that may be described by the Boltzmann
probability distribution, and electron position is a continuous random vari-
able whose probability density function is called an atomic (or molecular)
orbital. The Heisenberg Uncertainty Principle describes the relationship be-
tween the standard deviations of certain sets of random variables called
complementary variables.
