Math103 ProbStats Raaz 2010 S1 PDF
Contents
1 Sets, Experiments and Probability
1.1 Rudiments of Set Theory
1.2 Experiments
1.3 Probability
1.4 Conditional Probability
2 Random Variables
2.1 Discrete Random Variables and their Distributions
2.1.1 Discrete uniform random variables with finitely many possibilities
2.1.2 Discrete non-uniform random variables with finitely many possibilities
2.1.3 Discrete non-uniform random variables with infinitely many possibilities
2.2 Continuous Random Variables and Distributions
3 Expectations
List of Tables
1 f (x) and F (x) for the sum of two independent tosses of a fair die RV X.
2 DF Table for the Standard Normal Distribution.
3 Quantile Table for the Standard Normal Distribution.
List of Figures
1 f (x) = P (x) = 1/6 and F (x) of the fair die toss RV X of Example 2.4
2 f (x) and F (x) of an astragali toss RV X of Example 2.6
3 f (x) and F (x) of RV X for the sum of two independent tosses of a fair die.
4 Probability density function of the volume of rain in cubic inches over the lecture theatre tomorrow.
5 PDF and DF of Normal(µ, σ²) RV for different values of µ and σ².
1 Sets, Experiments and Probability
1.1 Rudiments of Set Theory
1. A set is a collection of distinct objects or elements and we enclose the elements by
curly braces. For example, the collection of the two letters H and T is a set and we
denote it by {H, T}. But the collection {H, T, T} is not a set (do you see why? think
distinct!). Also, recognise that there is no order to the elements in a set, i.e. {H, T} is
the same as {T, H}.
2. We give convenient names to sets. For example, we can call the set {H, T} by A and
write A = {H, T} to mean it.
6. We say that a set A is not a subset of a set B if at least one element of A is not an
element of B and write A ⊄ B. For example, {1, 2} is not a subset of {1, 3, 4} since
2 ∈ {1, 2} but 2 ∉ {1, 3, 4}, and we write {1, 2} ⊄ {1, 3, 4} to mean this.
10. The empty set contains no elements and it is the collection of nothing. It is denoted
by ∅ = {}.
11. Given some universal set, say Ω, the Greek letter Omega, the Complement of a set
A denoted by Ac is the set of all elements in Ω that are not in A. For example, if
Ω = {H, T} and A = {H} then Ac = {T}. Note that for any set A ⊆ Ω:
Ac ∩ A = ∅, A ∪ Ac = Ω, Ωc = ∅, ∅c = Ω .
12. When we have more than two sets, we can define unions and intersections similarly.
The union of m sets
⋃_{j=1}^{m} Aj = A1 ∪ A2 ∪ · · · ∪ Am
consists of elements that are in at least one of the m sets A1 , A2 , . . . , Am , and the
union of infinitely many sets
⋃_{j=1}^{∞} Aj = A1 ∪ A2 ∪ · · ·
consists of elements that are in at least one of them. Similarly, the intersection
⋂_{j=1}^{m} Aj = A1 ∩ A2 ∩ · · · ∩ Am
of m sets consists of elements that are in each of the m sets, and the intersection of
infinitely many sets is
⋂_{j=1}^{∞} Aj = A1 ∩ A2 ∩ · · ·
Exercise 1.1 Let Ω = {1, 2, 3, 4, 5, 6}, A = {1, 3, 5} and B = {2, 4, 6}. By using the
definitions of sets and set operations find the following sets:
Ac = Bc = Ωc = ∅c =
{1}c = { } A∪B = A∩B = A∪Ω=
A∩Ω= B∩Ω= B∪Ω= A ∪ Ac =
B ∪ Bc = etc.
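These set operations can also be checked mechanically. Below is a minimal sketch using Python's built-in set type with the sets of Exercise 1.1; the helper name `complement` is my own, not from the notes:

```python
# Sets from Exercise 1.1: Omega is the universal set, A the odd and B the even outcomes.
Omega = {1, 2, 3, 4, 5, 6}
A = {1, 3, 5}
B = {2, 4, 6}

def complement(S, universe=Omega):
    """Complement of S relative to the universal set (set difference)."""
    return universe - S

print(complement(A))      # the elements of Omega not in A
print(A | B)              # union A ∪ B
print(A & B)              # intersection A ∩ B (empty here)
print(A | complement(A))  # A ∪ Ac, which recovers Omega
```

Working through the exercise by hand first and then comparing against such output is a useful self-check.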
Example 1.1 For three sets A, B and C, the Venn diagrams for A∪B, A∩B and A∩B ∩C
are:
[Three Venn diagrams, shading A ∪ B, A ∩ B and A ∩ B ∩ C respectively, each drawn
inside the universal set Ω.]
Exercise 1.2 Let A = {1, 3, 5, 7, 9, 11}, B = {1, 2, 3, 5, 8, 13} and C = {1, 2, 4, 8, 16, 32}
denote three sets. Let us use a Venn diagram to visualise these three sets and their
intersections. Can you mark which sets correspond to A, B and C in the figure below?
1.2 Experiments
Definition 1.1 An experiment is an activity or procedure that produces distinct, well-
defined possibilities called outcomes. The set of all outcomes is called the sample space
and is denoted by Ω, the upper-case Greek letter Omega. We denote a typical outcome in Ω
by ω, the lower-case Greek letter omega, and a typical sequence of possibly distinct outcomes
by ω1 , ω2 , ω3 , . . ..
Example 1.3 Ω = {Heads, Tails} if our experiment is to note the outcome of a coin toss.
In Examples 1.2 and 1.3, Ω only has two outcomes and we can refer to the sample space of
such two-outcome experiments generically as Ω = {ω1 , ω2 }. For instance, the two outcomes
of Example 1.2 are ω1 = Defective and ω2 = Non-defective while those of Example 1.3 are
ω1 = Heads and ω2 = Tails.
Example 1.4 If our experiment is to roll a die whose faces are marked with the six numerical
symbols or numbers 1, 2, 3, 4, 5, 6 then there are six outcomes corresponding to the number
that shows on the top. Thus, the sample space Ω for this experiment is {1, 2, 3, 4, 5, 6}.
Exercise 1.3 Suppose our experiment is to observe whether it will rain or shine tomorrow.
What is the sample space for this experiment? Answer: Ω = { }.
The subsets of Ω are called events. The outcomes ω1 , ω2 , . . ., when seen as subsets of Ω,
such as, {ω1 }, {ω2 }, . . ., are simple events.
Example 1.5 In our roll a die experiment of Example 1.4 with Ω = {1, 2, 3, 4, 5, 6}, the set
of odd numbered outcomes A = {1, 3, 5} or the set of even numbered outcomes B = {2, 4, 6}
are examples of events. The simple events are {1}, {2}, {3}, {4}, {5}, and {6}.
An experimenter often performs more than one trial. Repeated trials of an experiment form
the basis of science and engineering as the experimenter learns about the phenomenon by
repeatedly performing the same mother experiment with possibly different outcomes. This
repetition of trials in fact provides the very motivation for the definition of probability in
§ 1.3.
Example 1.9 Suppose we toss a coin twice by performing two trials of the coin toss ex-
periment of Example 1.3 and use the short-hand H and T to denote the outcome of Heads
and Tails, respectively. Then our sample space Ω = {HH, HT, TH, TT}. Note that this is the
2-product experiment of the coin toss mother experiment.
Exercise 1.4 What is the event that at least one Heads occurs in the 2-product experiment
of Example 1.9, i.e., tossing a fair coin twice?
Exercise 1.5 What is the sample space of the 3-product experiment of the coin toss exper-
iment, i.e., tossing a fair coin thrice?
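Product sample spaces like these can be enumerated programmatically, which is handy for checking answers to the two exercises above. A small Python sketch using `itertools.product`; the function name `product_sample_space` is my own:

```python
# Enumerate the sample space of the n-product coin toss experiment of Example 1.9.
from itertools import product

Omega = ('H', 'T')  # sample space of the mother experiment

def product_sample_space(n):
    """All outcomes of the n-product experiment, as strings like 'HHT'."""
    return [''.join(ws) for ws in product(Omega, repeat=n)]

two = product_sample_space(2)    # the four outcomes of tossing twice
three = product_sample_space(3)  # the eight outcomes of tossing thrice

# The event 'at least one Heads' in the 2-product experiment:
at_least_one_H = [w for w in two if 'H' in w]
print(two, at_least_one_H)
```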
Remark 1.5 Loosely speaking, a set that can be enumerated or tagged uniquely by natural
numbers N = {1, 2, 3, . . .} is said to be countably infinite or contain countably many
elements. Some examples of such sets include any finite set, the set of natural numbers
N = {1, 2, 3, . . .}, the set of non-negative integers {0, 1, 2, 3, . . .}, the set of all integers Z =
{. . . , −3, −2, −1, 0, 1, 2, 3, . . .}, the set of all rational numbers Q = {p/q : p, q ∈ Z, q ≠ 0},
but the set of real numbers R = (−∞, ∞) is uncountably infinite.
Example 1.10 The sample space Ω of the ∞-product experiment of tossing a coin infinitely
many times has uncountably infinitely many elements and is in bijection with all binary
numbers in the unit interval [0, 1] — just replace H with 1 and T with 0. We cannot
enumerate all outcomes in Ω but can show some outcomes:
1.3 Probability
Definition 1.6 Probability is a function P that assigns real numbers to events, which
satisfies the following four Axioms:
Axiom (1): for any event A, 0 ≤ P (A) ≤ 1.
Axiom (2): P (Ω) = 1.
Axiom (3): if A and B are disjoint or mutually exclusive events, i.e., A ∩ B = ∅, then
P (A ∪ B) = P (A) + P (B)
Axiom (4): if A1 , A2 , . . . is an infinite sequence of pairwise disjoint or mutually
exclusive events, i.e., Ai ∩ Aj = ∅ whenever i ≠ j, then
P (⋃_{i=1}^{∞} Ai ) = Σ_{i=1}^{∞} P (Ai )
These axioms are merely assumptions that are justified and motivated by the frequency
interpretation of probability in n-product experiments as n tends to infinity,
which states that if we repeat an experiment a large number of times then the fraction of
times the event A occurs will be close to P (A). To be precise, if we let N (A, n) be the
number of times A occurs in the first n trials, then
P (A) = lim_{n→∞} N (A, n)/n .
Given this, Axiom (1) simply affirms that the fraction of times a given
event A occurs must be between 0 and 1. If Ω has been defined properly to be the set of
ALL possible outcomes, then Axiom (2) simply affirms that the fraction of times something
in Ω happens is 1. To explain Axiom (3), note that if A and B are disjoint then
N (A ∪ B, n) = N (A, n) + N (B, n)
since A ∪ B occurs if either A or B occurs but it is impossible for both to occur. Dividing
both sides of the previous equality by n and letting n → ∞, we arrive at Axiom (3).
Axiom (3) implies that Axiom (4) holds for a finite number of sets. In many cases the sample
space is finite so Axiom (4) is not relevant or necessary. Axiom (4) is a new assumption for
infinitely many sets as it does not simply follow from Axiom (3) any longer. Axiom (4) is
more difficult to motivate but without it the theory of probability becomes more difficult
and less useful, so we will impose this assumption on utilitarian grounds.
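The frequency interpretation above is easy to illustrate by simulation. A hedged Python sketch, assuming a fair coin so that P (A) = 1/2 for the event A = {Heads}:

```python
# Simulate n independent fair coin tosses and compute the relative
# frequency N(A, n)/n of the event A = {Heads}, which should be close
# to P(A) = 1/2 for large n.
import random

random.seed(1)  # fixed seed so the run is reproducible

n = 100_000
N_A = sum(1 for _ in range(n) if random.random() < 0.5)  # count Heads
relative_frequency = N_A / n
print(relative_frequency)  # close to 0.5
```

Rerunning with larger n makes the relative frequency hug 1/2 ever more tightly, which is exactly what the limit statement asserts.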
Theorem 1.7 Complementation Rule. The probability of an event A and its comple-
ment Ac in a sample space Ω satisfy:
P (A) + P (Ac ) = 1 , i.e., P (Ac ) = 1 − P (A) . (1)
Example 1.11 Recall the coin toss experiment of Example 1.3 with Ω = {Heads, Tails}.
Suppose that our coin happens to be fair with P (Heads) = 1/2. Since, {Tails}c = {Heads},
we can apply the complementation rule to find the probability of observing a Tails from
P (Heads) as follows:
P (Tails) = 1 − P (Heads) = 1/2 .
Theorem 1.8 Addition Rule for Mutually Exclusive Events. For mutually exclusive
or pair-wise disjoint events A1 , . . . , Am in a sample space Ω,
P (A1 ∪ A2 ∪ · · · ∪ Am ) = P (A1 ) + P (A2 ) + · · · + P (Am ) . (2)
Example 1.12 Let us observe the number on the first ball that pops out in a New Zealand
Lotto trial. There are forty balls labelled 1 through 40 for this experiment and so the sample
space Ω = {1, 2, 3, . . . , 39, 40}. Because the balls are vigorously whirled around inside the
Lotto machine before the first one pops out, we can model each ball to pop out first with
the same probability. So, we assign each outcome ω ∈ Ω the same probability of 1/40, i.e.,
our probability model for this experiment is:
P (ω) = 1/40 , for each ω ∈ Ω = {1, 2, 3, . . . , 39, 40} .
NOTE: we sometimes abuse notation and write P (ω) instead of the more accurate but
cumbersome P ({ω}) when writing down probabilities of simple events.
Now, let’s check if Axiom (1) is satisfied for simple events in our model for this Lotto
experiment,
0 ≤ P (1) = P (2) = · · · = P (40) = 1/40 ≤ 1
Is Axiom (3) satisfied?
For example, disjoint simple events {1} and {2}
P ({1, 2}) = P ({1} ∪ {2}) = P ({1}) + P ({2}) = 1/40 + 1/40 = 2/40
Is Axiom (2) satisfied?
Yes, by Equation (2) of the addition rule for mutually exclusive events (Theorem 1.8):
P (Ω) = P ({1, 2, . . . , 40}) = P (⋃_{i=1}^{40} {i}) = Σ_{i=1}^{40} P (i) = 1/40 + 1/40 + · · · + 1/40 = 1
(a) 1114 NZ Lotto draw frequency from 1987 to 2008. (b) 1114 NZ Lotto draw relative frequency from 1987 to 2008.
Recommended Activity 1.1 Explore the following web sites to learn more about NZ and
British Lotto. The second link has animations of the British equivalent of NZ Lotto.
http://lotto.nzpages.co.nz/
http://understandinguncertainty.org/node/39
Theorem 1.9 Addition Rule for Two Arbitrary Events. For events A and B in a
sample space,
P (A ∪ B) = P (A) + P (B) − P (A ∩ B) . (3)
Proof:
P (A ∪ B) = P (A ∪ (B ∩ Ac ))
= P (A) + P (B ∩ Ac ) by Axiom (3) and disjointness
= P (A) + P (B) − P (A ∩ B)
The last equality P (B ∩ Ac ) = P (B) − P (A ∩ B) is due to Axiom (3) and the disjoint union
of B = (B ∩ Ac ) ∪ (A ∩ B) giving P (B) = P (B ∩ Ac ) + P (A ∩ B). It is easy to see this with
a Venn diagram.
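The addition rule for two arbitrary events can also be verified by direct enumeration on a small sample space. A Python sketch on the fair die, with overlapping events A and B chosen here purely for illustration:

```python
# Verify Equation (3), P(A ∪ B) = P(A) + P(B) − P(A ∩ B), on the fair die
# sample space where every simple event has probability 1/6.
from fractions import Fraction

Omega = {1, 2, 3, 4, 5, 6}
P = lambda E: Fraction(len(E), len(Omega))  # equally likely outcomes

A = {1, 3, 5}  # odd outcomes
B = {1, 2, 3}  # outcomes at most 3 (overlaps A, so Theorem 1.8 does not apply)

lhs = P(A | B)
rhs = P(A) + P(B) - P(A & B)
assert lhs == rhs  # the addition rule for two arbitrary events
print(lhs)
```

Using `Fraction` keeps the arithmetic exact, so the check is an equality rather than a floating-point approximation.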
Exercise 1.6 In English language text, the twenty six letters in the alphabet occur with
the following frequencies:
E 13% R 7.7% A 7.3% H 3.5% F 2.8% M 2.5% W 1.6% X 0.5% J 0.2%
T 9.3% O 7.4% S 6.3% L 3.5% P 2.7% Y 1.9% V 1.3% K 0.3% Z 0.1%
N 7.8% I 7.4% D 4.4% C 3% U 2.7% G 1.6% B 0.9% Q 0.3%
Suppose you pick one letter at random from a randomly chosen English book from our
central library with Ω = {A, B, C, . . . , Z} – ignoring upper/lower cases, then what is the
probability of the following events?
(a) P ({Z}) =
(d) P ({E, Z}) = — by Axiom (3)
(e) P (‘picking a vowel’) =
by Equation (2) of addition rule for mutually exclusive events (Theorem 1.8).
(f) P (‘picking any letter in the word WAZZZUP’) = by Equa-
tion (2) of addition rule for mutually exclusive events (Theorem 1.8).
(g) P (‘picking any letter in the word WAZZZUP or a vowel’) =
= 42.2%
by Equation (3) of addition rule for two arbitrary events (Theorem 1.9).
Definition 1.10 The probability of an event B under the condition that an event A occurs
is called the conditional probability of B given A and is denoted by P (B|A). In this case
A serves as a new (reduced) sample space, and that probability is the fraction of P (A) which
corresponds to A ∩ B. Thus,
P (B|A) = P (A ∩ B) / P (A) , if P (A) ≠ 0 . (4)
Similarly, the conditional probability of A given B is
P (A|B) = P (A ∩ B) / P (B) , if P (B) ≠ 0 . (5)
Conditional Probability is a probability and therefore all four Axioms of probability
also hold for conditional probability of events given the conditioning event A has P (A) > 0.
Axiom (1): For any event B, 0 ≤ P (B|A) ≤ 1.
Axiom (2): P (Ω|A) = 1.
Axiom (3): For any two disjoint events B1 and B2 , P (B1 ∪B2 |A) = P (B1 |A)+P (B2 |A).
Axiom (4): For mutually exclusive or pairwise-disjoint events, B1 , B2 , . . .,
P (⋃_{i=1}^{∞} Bi |A) = Σ_{i=1}^{∞} P (Bi |A) .
Note that the complementation and addition rules also follow for conditional probability.
1. complementation rule for conditional probability:
2. addition rule for two arbitrary events B1 and B2 :
Theorem 1.11 Multiplication Rule. If A and B are events and P (A) ≠ 0, P (B) ≠ 0,
then
P (A ∩ B) = P (A)P (B|A) = P (B)P (A|B) . (8)
Proof: Solving for P (A ∩ B) in the Definitions (4) and (5) of conditional probability, we
obtain Equation (8) of the above theorem.
Example 1.13 Suppose the NZ All Blacks team is playing in a four-team rugby tournament. In
the first round they have a tough opponent that they will beat 40% of the time but if they
win that game they will play against an easy opponent where their probability of success is
0.8. What is the probability that they will win the tournament?
If A and B are the events of victory in the first and second games, respectively, then P (A) =
0.4 and P (B|A) = 0.8, so by multiplication rule, the probability that they will win the
tournament is:
P (A ∩ B) = P (A)P (B|A) = 0.4 × 0.8 = 0.32 .
Exercise 1.7 In Example 1.13, what is the probability that the All Blacks will win the first
game but lose the second?
If two events A and B satisfy
P (A ∩ B) = P (A)P (B),
they are called independent events. Assuming P (A) ≠ 0, P (B) ≠ 0, we have P (A|B) =
P (A), and P (B|A) = P (B). This means that the probability of A does not depend on the
occurrence or nonoccurence of B, and conversely. This justifies the term “independent”.
Example 1.14 Suppose you toss a fair coin twice such that the first toss is independent of
the second. Then,
P (HT) = P (Heads on the first toss ∩ Tails on the second toss) = P (H)P (T) = 1/2 × 1/2 = 1/4 .
Similarly, P (HH) = P (TH) = P (TT) = 1/2 × 1/2 = 1/4. Thus, P (ω) = 1/4 for every ω in the
sample space Ω = {HT, HH, TH, TT}.
Similarly, three events A, B and C are called (mutually) independent if:
P (A ∩ B) = P (A)P (B),
P (B ∩ C) = P (B)P (C),
P (C ∩ A) = P (C)P (A),
P (A ∩ B ∩ C) = P (A)P (B)P (C).
Example 1.15 Suppose you independently toss a fair die thrice. What is the probability
of getting an even outcome in all three trials?
Let Ei be the event that the outcome is an even number on the i-th trial. Then, the
probability of getting an even number in all three trials is:
Example 1.16 Suppose you toss a fair coin independently m times. Then each of the 2^m
possible outcomes in the sample space Ω has equal probability of 1/2^m due to independence.
P (A) = Σ_{i=1}^{n} P (A ∩ Bi ) = Σ_{i=1}^{n} P (A|Bi )P (Bi ) . (9)
Proof: The first equality is due to addition rule for mutually exclusive events, A ∩ B1 , A ∩
B2 , . . . , A ∩ Bn and the second equality is due to multiplication rule.
Exercise 1.8 A well-mixed urn contains five red and ten black balls. We draw two balls
from the urn without replacement. What is the probability that the second ball drawn is
black?
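One way to check your answer to Exercise 1.8 is to enumerate all equally likely ordered draws of two distinct balls. A Python sketch; the ball labels are my own bookkeeping, not from the notes:

```python
# Enumerate all ordered draws of two distinct balls from the urn and count
# those whose second ball is black. Balls 0-4 are red, balls 5-14 are black;
# since the urn is well mixed, all ordered pairs are equally likely.
from fractions import Fraction
from itertools import permutations

balls = ['red'] * 5 + ['black'] * 10
draws = list(permutations(range(len(balls)), 2))  # ordered, without replacement
favourable = [d for d in draws if balls[d[1]] == 'black']
prob_second_black = Fraction(len(favourable), len(draws))
print(prob_second_black)
```

Compare this exact enumeration against the value you obtain from Equation (9).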
2 Random Variables
We are used to traditional variables such as x as an “unknown” in the equation:
x+3=7 ,
and variables that depend on other variables, such as y in the equation of a line:
y = 3x − 2 ,
where the variable y for the y-axis is determined by the value taken by the variable x, as x
varies over the real line R = (−∞, ∞). The variables we have used to represent sequences
such as:
{a_n}_{n=1}^{∞} = a1 , a2 , a3 , . . . ,
Question: What is common to all these variables above, such as, x, y, a1 , a2 , a3 , . . . , f (x)?
Answer: They are instances of deterministic variables, that is, these traditional variables
take a fixed or deterministic value when we can solve for them.
We need a new kind of variable to deal with real-world situations where the same variable
may take different values in a non-deterministic manner. Random variables do this job for
us. Random variables, unlike traditional deterministic variables can take a bunch of different
values!
In fact, random variables are actually functions! They take you from the “world of random
processes and phenomena” to the world of real numbers. In other words, a random variable
is a numerical value determined by the outcome of the experiment.
Definition 2.1 A Random variable or RV is a function from the sample space Ω to the
set of real numbers R:
X(ω) : Ω → R ,
such that, for every real number x, the corresponding set {ω ∈ Ω : X(ω) ≤ x}, i.e. the set
of outcomes whose numerical value is less than or equal to x, is an event. The probability
of such events is given by the function F (x) : R → [0, 1] called the distribution function
or DF of the random variable X:
F (x) := P (X ≤ x) = P ({ω ∈ Ω : X(ω) ≤ x}) . (11)
Example 2.1 Recall the rain or shine experiment of Exercise 1.3 with sample space Ω =
{rain, shine}. We can associate a random variable X with this experiment as follows:
X(ω) = { 1, if ω = rain
       { 0, if ω = shine
Thus, X is 1 if it rains tomorrow and 0 otherwise. Note that another equally valid
discrete random variable, say Y , for this experiment is:
Y (ω) = { π, if ω = rain
        { √2, if ω = shine
A random variable can be chosen to assign each outcome ω ∈ Ω to any real number as the
experimenter desires.
Recall the experiments of Example 1.6 that involved smelling, tasting, touching, hearing,
or seeing to discern between outcomes. It becomes very difficult to communicate, process
and make decisions based on outcomes of experiments that are discerned in this manner and
even more difficult to record them unambiguously. This is where real numbers can give us a
helping hand.
Data are typically random variables that act as numerical placeholders for out-
comes of an experiment about some real-world random process or phenomenon. We said
that the random variable can take one of many values, but we cannot be certain of which
value it will take. However, we can make probabilistic statements about the value x
the random variable X will take.
Theorem 2.2 Probability that the RV X takes a value x in the half-open interval (a, b],
i.e., a < x ≤ b, is:
P (a < X ≤ b) = F (b) − F (a) . (12)
Proof: Since the events (X ≤ a) = {ω : X(ω) ≤ a} and (a < X ≤ b) = {ω : a < X(ω) ≤ b}
are mutually exclusive or disjoint events whose union is the event (X ≤ b) = {ω : X(ω) ≤ b},
by Axiom (3) of Definition 1.6 of probability and by Equation (11) in Definition 2.1 of DF,
F (b) = P (X ≤ b) = P (X ≤ a) + P (a < X ≤ b) = F (a) + P (a < X ≤ b) .
Subtraction of F (a) from both sides of the above equation yields Equation (12).
Example 2.2 Recall the fair coin toss experiment of Example 1.11 with Ω = {H, T} and
P (H) = P (T) = 1/2. We can associate a random variable X with this experiment as follows:
X(ω) = { 1, if ω = H
       { 0, if ω = T
Note that this choice of values for X equates to counting the number of H in one trial of the
fair coin toss experiment. The DF for X is:
F (x) = P (X ≤ x) = P ({ω : X(ω) ≤ x}) = { P (∅) = 0,              if −∞ < x < 0
                                         { P ({T}) = 1/2,          if 0 ≤ x < 1
                                         { P ({H, T}) = P (Ω) = 1, if 1 ≤ x < ∞
All we are really saying above in detail to show the underlying definitions just amounts to:
P (X = x) = { 1/2 if x = 0
            { 1/2 if x = 1
            { 0 otherwise
Example 2.3 Now let us define a discrete random variable that can take one of six
possible values from {1, 2, 3, 4, 5, 6} in the toss a fair die experiment. This X gives the
number that shows up on the top face as we roll a fair six-faced die whose faces are labelled by
numerical symbols 1, 2, 3, 4, 5, 6. Note that here Ω is the set of numerical symbols that label
each face while each of these symbols is associated with the real number x ∈ {1, 2, 3, 4, 5, 6}.
Thus,
X(ω) = { 1, if ω is the outcome that the die lands with the face labelled by 1 on top
       { 2, if ω is the outcome that the die lands with the face labelled by 2 on top
       { 3, if ω is the outcome that the die lands with the face labelled by 3 on top
       { 4, if ω is the outcome that the die lands with the face labelled by 4 on top
       { 5, if ω is the outcome that the die lands with the face labelled by 5 on top
       { 6, if ω is the outcome that the die lands with the face labelled by 6 on top
Example 2.4 Consider the random variable X of the toss a fair die experiment of Ex-
ample 2.3 with P (X = x) = P ({ω : X(ω) = x}) = 1/6 for each x ∈ {1, 2, 3, 4, 5, 6} and 0
otherwise. The probability that X ≤ 3 can be obtained by
F (3) = P (X ≤ 3) = P ({ω : X(ω) ≤ 3}) = P ({1, 2, 3}) = P ({1}) + P ({2}) + P ({3}) = 3/6
Exercise 2.1 Similarly, can you complete the following probability statement about the
value x the random variable X of Example 2.4 will take?
P (X = 1) = P (X = 2) = P (X = 3) = P (X = 4) = P (X = 5) = P (X = 6) = .
A discrete RV X takes on at most countably many values in R. The rain or shine random
variables of Example 2.1 and the fair coin toss RV of Example 2.2 can only take two possible
values while the toss a fair die RV of Example 2.4 can only take six possible values. Thus,
they are examples of discrete random variables. We can study discrete random variables in
a general setting.
From this we get the values of the Distribution Function F (x) by simply taking sums,
F (x) = Σ_{xi ≤ x} f (xi ) = Σ_{xi ≤ x} pi , (14)
where for any given x, we sum all the probabilities pi for which xi is smaller than or equal
to x. Thus, the DF F (x) of a discrete random variable is a step function with upward
jumps of size pi at the possible values xi of X and constant in between.
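Equation (14) translates directly into code: the DF of a discrete RV is a running sum of its PMF values. A Python sketch for the fair die RV; the names `support`, `pmf` and `F` are my own:

```python
# DF of a discrete RV as the running sum of PMF values, Equation (14),
# illustrated on the fair die RV with f(x) = 1/6 on {1, ..., 6}.
from fractions import Fraction

support = [1, 2, 3, 4, 5, 6]
pmf = {x: Fraction(1, 6) for x in support}

def F(x):
    """Distribution function F(x) = sum of f(xi) over all xi <= x."""
    return sum((p for xi, p in pmf.items() if xi <= x), Fraction(0))

print(F(3))    # jumps have accumulated at 1, 2 and 3
print(F(0.5))  # below the support, so 0
print(F(6))    # the whole probability mass, 1
```

Evaluating `F` between support points (say at 3.5) returns the same value as at 3, showing the "constant in between" step behaviour.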
Out of this class of discrete random variables we will define specific kinds as they arise often
in applications. We classify discrete random variables into three types for convenience as
follows:
• Discrete uniform random variables with finitely many possibilities
• Discrete non-uniform random variables with finitely many possibilities
• Discrete non-uniform random variables with (countably) infinitely many possibilities
2.1.1 Discrete uniform random variables with finitely many possibilities
Definition 2.5 Discrete Uniform Random Variable. We say that a discrete RV X is
uniformly distributed over k possible values {x1 , x2 , . . . , xk } if its PMF is:
f (xi ) = { pi = 1/k if xi ∈ {x1 , x2 , . . . , xk } ,    (15)
          { 0 otherwise .
Example 2.5 The fair die toss RV X of Example 2.4 is a discrete uniform RV with possible
values {x1 , x2 , x3 , x4 , x5 , x6 } = {1, 2, 3, 4, 5, 6}. Its PMF and DF are given by substituting
k = 6 in Equations 15 and 16, respectively. These functions are depicted in Figure 1. Pay
attention to the ◦ and • in the plot to relate them to the Equations 15 and 16. The ◦, • and
the dotted lines are used to depict how the value of f (x) and F (x) jump as x varies.
Figure 1: f (x) = P (x) = 1/6 and F (x) of the fair die toss RV X of Example 2.4
Exercise 2.2 Plot the PMF and DF in detail along with ◦, • and the dotted lines for the
fair coin toss RV X of Example 2.2 and convince yourself that it is also a discrete uniform
RV.
Exercise 2.3 Recall the first ball that pops out in a New Zealand Lotto trial of Example 1.12.
First associate a RV X with this experiment that turns the integer-symbolised ball labels
into real numbers in the set of possible values {1, 2, 3, . . . , 39, 40}. Then, give the PMF and
DF for X and sketch how their plots should look.
Two useful formulae for discrete distributions are readily obtained as follows. For the prob-
ability corresponding to intervals we have
P (a < X ≤ b) = F (b) − F (a) = Σ_{a < xi ≤ b} pi . (17)
This is the sum of all probabilities pi for which xi satisfies a < xi ≤ b. From this and
P (Ω) = 1 we obtain the following formula that the sum of all probabilities is 1.
Σ_i pi = 1 . (18)
Exercise 2.4 Suppose we toss a possibly biased or unfair coin with a given probability
0 ≤ p ≤ 1 of H, i.e., P (H) = p and P (T) = 1 − p = q. Associate the RV X to this
experiment to report the number of Heads in one trial. Compute P (X = 1), P (X =
2), P (X = 3), P (X = 0). Sketch the PMF and DF of X when p takes each of the following
five values 0, 1/2, 1/3, 2/3, 1. Note that p is really a parameter of this RV X. We will see
more parametrised random variables in the sequel.
Example 2.7 Let the random variable X denote the sum of two independent tosses of
a fair die. This discrete RV has possible values in {2, 3, 4, . . . , 12}. There are a total of
6 × 6 = 36 equally likely outcomes (ω1 , ω2 ) ∈ Ω = {(1, 1), (1, 2), . . . , (6, 6)}, where ω1 is
the outcome of the first toss and ω2 is that of the second independent toss. Each such
outcome (ω1 , ω2 ) has probability 1/36. Now X = 2 occurs in the case of the outcome (1, 1);
X = 3 in the case of the two outcomes (1, 2) and (2, 1); X = 4 in the case of the three
outcomes (1, 3), (2, 2), (3, 1); and so on as shown by the mapping in Figure 3. Hence,
f (x) = P (X = x) and F (x) = P (X ≤ x) have the values shown in Table 1. Figure 3
shows the plots of f (x) and F (x).
Figure 2: f (x) and F (x) of an astragali toss RV X of Example 2.6
(a) X : Ω → {2, 3, 4, . . . , 11, 12}, P (ω) = 1/36 for any ω ∈ Ω (b) PMF f (x) and DF F (x)
Figure 3: f (x) and F (x) of RV X for the sum of two independent tosses of a fair die.
x 2 3 4 5 6 7 8 9 10 11 12
f (x) 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36
F (x) 1/36 3/36 6/36 10/36 15/36 21/36 26/36 30/36 33/36 35/36 36/36
Table 1: f (x) and F (x) for the sum of two independent tosses of a fair die RV X.
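Table 1 can be re-derived by enumerating the 36 equally likely outcomes, exactly as Example 2.7 does by hand. A Python sketch:

```python
# Build the PMF f and DF F of the sum of two independent fair die tosses
# by enumerating all 36 equally likely outcome pairs (w1, w2).
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # all (w1, w2) pairs

f = {s: Fraction(0) for s in range(2, 13)}
for w1, w2 in outcomes:
    f[w1 + w2] += Fraction(1, 36)  # each outcome contributes 1/36

# DF as the running sum of PMF values, as in Equation (14).
F = {s: sum((f[t] for t in range(2, s + 1)), Fraction(0)) for s in range(2, 13)}
print(f[7], F[7])  # compare against the x = 7 column of Table 1
```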
Example 2.8 Compute the probability of a sum of at least 4 and at most 8 from the
probability Table 1 in Example 2.7. From Equation 17 we get:
P (3 < X ≤ 8) = F (8) − F (3) = 26/36 − 3/36 = 23/36
Recommended Activity 2.1 You can get a nice treatment of the sum of two independent
tosses of a fair die in ten minutes and seven seconds by watching the following YouTube video:
http://www.youtube.com/v/2XToWi9j0Tk&hl=en_US&fs=1&rel=0&border=1
Next we see one of the most basic parametric models of discrete random variables.
Definition 2.6 Bernoulli(θ) Random Variable. Given a parameter θ ∈ (0, 1), the prob-
ability mass function (PMF) for the Bernoulli(θ) RV X is:
f (x; θ) = { θ if x = 1 ,
           { 1 − θ if x = 0 ,
           { 0 otherwise .
Example 2.9 Let the random variable X return 1 if we observe a H and 0 otherwise when
we toss a possibly biased coin with parameter θ ∈ (0, 1). Then, P ({H}) = P (X = 1) = θ
and X is a Bernoulli(θ) RV.
Example 2.10 Waiting For the First Heads. Suppose our experiment is to toss a fair
coin independently and identically (that is the same coin is tossed in essentially the same
manner independent of the other tosses in each trial) as often as necessary until we have a
heads denoted by H. Let the RV X denote the number of trials until the first H appears.
Then, clearly the possible values X can take are {1, 2, 3, . . .}. Let us compute the PMF of X
by independence of events:
f (1) = P (X = 1) = P (H) = 1/2 ,
f (2) = P (X = 2) = P (TH) = 1/2 · 1/2 = (1/2)^2 ,
f (3) = P (X = 3) = P (TTH) = 1/2 · 1/2 · 1/2 = (1/2)^3 , etc.
and in general
f (x) = P (X = x) = (1/2)^x , x = 1, 2, . . . .
Example 2.11 Recall the experiment in Exercise 2.4 of tossing a possibly biased coin with a
fixed parameter θ = P (H), where 0 < θ < 1. Now suppose you use such a coin in the waiting
for the first Heads experiment with RV X in Example 2.10. Confirm that the probabilities
indeed sum to 1 by the fact that the x-th partial sums Sx = a(1 − r^x)/(1 − r) of the geometric
series Σ_{x=0}^{∞} a r^x = a + ar + ar^2 + ar^3 + · · · converge to S = a/(1 − r) if −1 < r < 1.
Let us compute the θ-specific PMF of X by independence of events:
f (1; θ) = P (X = 1) = P (H) = (1 − θ)^0 θ = θ ,
f (2; θ) = P (X = 2) = P (TH) = (1 − θ)^1 θ ,
f (3; θ) = P (X = 3) = P (TTH) = (1 − θ)^2 θ , etc.
and in general
f (x; θ) = P (X = x) = (1 − θ)^(x−1) θ, x = 1, 2, . . . .
And, we already saw that this series converges if 0 < (1 − θ) < 1:
lim_{x→∞} F (x; θ) = f (1; θ) + f (2; θ) + f (3; θ) + · · · = θ / (1 − (1 − θ)) = θ/θ = 1 .
We have just derived the PMF of a θ-parametric family of discrete random variable that can
take countably infinitely many values in {1, 2, 3, . . .}. We also showed that the PMF sums
to 1 as it should. This is called the geometric distribution with “success probability”
parameter θ for obvious reasons.
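The geometric series argument can be checked numerically: partial sums of the PMF approach 1 for any valid θ. A Python sketch; the function name is my own:

```python
# Geometric PMF f(x; θ) = (1 − θ)^(x−1) θ from Example 2.11; its partial
# sums over x = 1, 2, ... should approach 1 for any 0 < θ < 1.
def geometric_pmf(x, theta):
    return (1 - theta) ** (x - 1) * theta

for theta in (0.5, 0.2):
    partial_sum = sum(geometric_pmf(x, theta) for x in range(1, 200))
    print(theta, partial_sum)  # very close to 1 in each case
```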
Definition 2.7 Binomial(n, θ) Random Variable. Let the RV X = Σ_{i=1}^{n} Xi be the sum
of n independent and identically distributed Bernoulli(θ) RVs, i.e.:
X = Σ_{i=1}^{n} Xi , X1 , X2 , . . . , Xn ~ IID Bernoulli(θ) .
Then X is a Binomial(n, θ) RV with PMF:
f (x; n, θ) = (n choose x) θ^x (1 − θ)^(n−x) , x = 0, 1, . . . , n ,
where the binomial coefficient (n choose x) is:
(n choose x) = n(n − 1)(n − 2) . . . (n − x + 1) / (x(x − 1)(x − 2) · · · (2)(1)) = n!/(x!(n − x)!) .
It is read as “n choose x” — the number of ways of choosing x objects from n of them.
Example 2.12 Find the probability that seven of ten persons will recover from a tropical
disease if we can assume independence and the probability is identically 0.80 that any one
of them will recover from the disease.
Substituting x = 7, n = 10, and θ = 0.8 into the formula for the binomial distribution, we
get:
f (7; 10, 0.8) = (10 choose 7) × (0.8)^7 × (1 − 0.8)^(10−7) = 10!/((10 − 7)! 7!) × (0.8)^7 × (0.2)^3 ≈ 0.2013 .
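The substitution in Example 2.12 is easy to reproduce with Python's `math.comb`. A sketch; the function name `binomial_pmf` is my own:

```python
# Binomial PMF f(x; n, θ) = C(n, x) θ^x (1 − θ)^(n − x), applied to the
# tropical disease recovery numbers of Example 2.12.
from math import comb

def binomial_pmf(x, n, theta):
    return comb(n, x) * theta ** x * (1 - theta) ** (n - x)

p = binomial_pmf(7, 10, 0.8)  # seven recoveries out of ten patients
print(round(p, 4))
```

Summing `binomial_pmf(x, 10, 0.8)` over x = 0, . . . , 10 gives 1, which is a quick sanity check that the PMF is properly normalised.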
Exercise 2.5 Compute the probability of obtaining at least two 6’s in rolling a fair die in-
dependently and identically four times.
= .
Definition 2.8 Poisson(λ) Random Variable. Given a parameter λ > 0, the PMF of
the Poisson(λ) RV X is:
f (x; λ) = exp(−λ) λ^x / x! , where x = 0, 1, . . . (19)
For a fixed λ > 0, it can be proved that this distribution is obtained as a limiting case of
the Binomial(n, θ) RV if we let θ → 0 and n → ∞ while the product nθ = λ is kept
constant. Thus, the Poisson(λ) RV is really a limit of the Binomial(n, θ) RV as n → ∞,
θ → 0 and nθ = λ.
Example 2.13 If the probability of producing a defective screw is 0.01, what is the proba-
bility that a lot of 100 screws will contain more than 2 defectives?
Let the complementary event that there are no more than two defectives be Ac . For its
probability we use the Binomial(n = 100, θ = 0.01) RV with nθ = 100 × 0.01 = 1. Then,
P (Ac ) = (100 choose 0) × 0.99^100 + (100 choose 1) × 0.01 × 0.99^99 + (100 choose 2) × 0.01^2 × 0.99^98 = 92.06% .
Since θ is very small, we can approximate this by the much more convenient Poisson(λ) RV
with λ = nθ = 100 × 0.01 = 1, obtaining
P (Ac ) ≈ e^−1 (1^0/0! + 1^1/1! + 1^2/2!) = e^−1 (1 + 1 + 1/2) = 91.97% .
Thus P (A) = 1 − P (Ac ) = 1 − 91.97% = 8.03% under the Poisson(λ = 1) approximation.
Since the binomial distribution gives P (A) = 1 − P (Ac ) = 1 − 92.06% = 7.94%, the Poisson
approximation is quite good.
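The comparison in Example 2.13 between the exact binomial probability and its Poisson approximation can be reproduced as follows; the helper names are mine:

```python
# Exact Binomial(100, 0.01) probability of at most two defectives versus
# its Poisson(λ = 1) approximation, as in Example 2.13.
from math import comb, exp, factorial

def binomial_pmf(x, n, theta):
    return comb(n, x) * theta ** x * (1 - theta) ** (n - x)

def poisson_pmf(x, lam):
    return exp(-lam) * lam ** x / factorial(x)

exact = sum(binomial_pmf(x, 100, 0.01) for x in range(3))   # P(Ac), binomial
approx = sum(poisson_pmf(x, 1.0) for x in range(3))          # P(Ac), Poisson
print(round(exact, 4), round(approx, 4))  # ≈ 0.9206 vs ≈ 0.9197
```

The two values agree to about three decimal places, illustrating why the Poisson limit is such a convenient stand-in when n is large and θ is small.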
Example 2.14 If on the average, 2 cars enter a certain parking lot per minute, what is the
probability that during any given minute 4 or more cars will enter the lot?
To understand that the Poisson distribution is a model of the situation, we imagine the
minute to be divided into very many short time intervals, let θ be the (constant) probability
that a car will enter the lot during any such short interval, and assume independence of the
events that happen during those intervals. Then we are dealing with a binomial distribution
with very large n and very small θ, which we can approximate by the Poisson distribution
with
λ = nθ = 2 ,
because 2 cars enter on the average. The complementary event of the event 'four cars or
more enter during a given minute' is 'three cars or fewer enter the lot', which has probability
f(0; 2) + f(1; 2) + f(2; 2) + f(3; 2) = e^{−2} (2^0/0! + 2^1/1! + 2^2/2! + 2^3/3!) = 0.857 .
Therefore, the probability of interest is 1 − 0.857 = 0.143 = 14.3%.
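The same tail probability can be computed directly from the Poisson PMF; a minimal sketch (the function name `poisson_pmf` is mine):

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    return exp(-lam) * lam ** x / factorial(x)

# P(4 or more cars) = 1 - P(at most 3 cars), with lambda = 2
p = 1 - sum(poisson_pmf(x, 2.0) for x in range(4))
print(round(p, 3))  # 0.143
```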
Example 2.15 The random variable X is the exact amount of rain in inches over the roof
of this lecture theatre tomorrow.
X : Ω → [0, ∞) , X(ω) = x .
Figure 4: Probability density function of the volume of rain in cubic inches over the lecture
theatre tomorrow.
Exercise 2.6 For the continuous random variable X of Example 2.15 we may try to make
probability statements such as P(X = x) about the actual amount of rain x that will fall
on the lecture theatre tomorrow. Can you see why the following statements are true? In
fact, for this continuous random variable P(X = x) = 0 for any real number x. Stop to
understand this!
After you have understood that P(X = x) = 0 for any x ∈ R, it should not be surprising
that P(1.1 ≤ X ≤ 1.10000001) can nevertheless be greater than 0.
Recommended Activity 2.2 You can get a nice informal treatment of the contents of
§ 2.1 and § 2.2 in ten minutes and two seconds by watching the following YouTube video:
http://www.youtube.com/v/Fvi9A_tEmXQ&hl=en_US&fs=1&rel=0&border=1
Definition 2.9 Continuous random variables take values on a continuous scale. A random
variable X is continuous, if its Distribution function F (x) can be given by an integral
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(v) dv   (20)
(we write v because x is needed as the upper limit of the integral) whose integrand f (x),
called the probability density function (PDF) of the distribution, is non-negative, and
is continuous, perhaps except for finitely many x-values. Differentiation gives the relation of
f to F as
f (x) = F 0 (x) (21)
for every x at which f (x) is continuous.
Then we obtain the very important formula for the probability corresponding to an interval:
P(a < X ≤ b) = F(b) − F(a) = ∫_{a}^{b} f(v) dv .   (22)
From (20) and P(Ω) = 1 we also have the analogue of (18):

∫_{−∞}^{∞} f(v) dv = 1 .
Continuous random variables are simpler than discrete ones with respect to intervals. Indeed,
in the continuous case the four probabilities corresponding to a < X ≤ b, a < X < b,
a ≤ X < b, and a ≤ X ≤ b with any fixed a and b (> a) are all the same.
The inverse of the DF is the quantile function F^{−1}(p) = inf{x ∈ R : F(x) ≥ p},
where inf is called the infimum, which is effectively the smallest value x that the random
variable X can take in order to satisfy the probability condition F(x) ≥ p. F^{−1}(p) is also
known as the p-th quantile or percentile.
The next example illustrates notations and typical applications of our present formulae.
Example 2.16 Let X have the density function f(x) = e^{−x} if x ≥ 0 and zero otherwise.
Find the distribution function. Find the probabilities P(1/4 ≤ X ≤ 2) and P(−1/2 ≤ X ≤ 1/2),
and determine x such that P(X ≤ x) = 95%.
1. For x ≥ 0,

F(x) = ∫_{0}^{x} e^{−v} dv = [−e^{−v}]_{0}^{x} = −e^{−x} + 1 = 1 − e^{−x} .

Therefore,

F(x) = 1 − e^{−x} if x ≥ 0 , and F(x) = 0 otherwise .
2. P(1/4 ≤ X ≤ 2) = ∫_{1/4}^{2} e^{−v} dv = F(2) − F(1/4) = 64.35% .
3. P(−1/2 ≤ X ≤ 1/2) = ∫_{−1/2}^{0} f(v) dv + ∫_{0}^{1/2} e^{−v} dv = 0 + F(1/2) = 39.35% .
4. P(X ≤ x) = F(x) = 1 − e^{−x} = 0.95 .

Therefore,

x = −log(1 − 0.95) = 2.9957 .
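The four parts of the example can be verified with a few lines; a minimal sketch where `F` (my name) is the DF just derived:

```python
from math import exp, log

def F(x):
    # DF of the density f(x) = e^{-x} for x >= 0, zero otherwise
    return 1 - exp(-x) if x >= 0 else 0.0

print(round(F(2) - F(0.25), 4))    # P(1/4 <= X <= 2)
print(round(F(0.5) - F(-0.5), 4))  # P(-1/2 <= X <= 1/2)
print(round(-log(1 - 0.95), 4))    # x with F(x) = 0.95
```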
The previous example is a special case of the following parametric family of random variables.
Definition 2.11 Exponential(λ) Random Variable. Given a rate parameter λ > 0, the
Exponential(λ) random variable X has probability density function given by:
f(x; λ) = λ exp(−λx) if x > 0 , and f(x; λ) = 0 otherwise ,
and distribution function given by:

F(x; λ) = 1 − exp(−λx) if x ≥ 0 , and F(x; λ) = 0 otherwise .
The Exponential(λ) RV gives the waiting times between successive events of a Poisson(λ)
RV that is counting the number of events in unit time.
Example 2.17 At a certain location on highway, the number of cars exceeding the speed
limit by more than 10 kilometers per hour in half an hour is a Poisson(λ = 8.4) random
variable. What is the probability of a waiting time of less than 5 minutes between cars
exceeding the speed limit by more than 10 kilometers per hour?
Using half an hour as the unit of time, we have Poisson(λ = 8.4) giving the number of
arrivals in unit time. Therefore, the waiting time is a random variable having an exponential
distribution with λ = 8.4, and since 5 minutes is 1/6 of the unit of time, we find that the
desired probability is
∫_{0}^{1/6} 8.4 e^{−8.4x} dx = [−e^{−8.4x}]_{0}^{1/6} = 1 − e^{−1.4} ≈ 0.7534 .
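The waiting-time probability is one line of arithmetic; a minimal sketch (variable names are mine):

```python
from math import exp

lam = 8.4    # speeders per half hour
t = 5 / 30   # five minutes as a fraction of the half-hour time unit
p = 1 - exp(-lam * t)  # Exponential(8.4) DF evaluated at t
print(round(p, 4))  # 0.7534
```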
Definition 2.12 Uniform(a, b) Random Variable. The distribution with the density
f(x) = 1/(b − a) if a < x < b
and f = 0 otherwise is called the uniform distribution on the interval a < x < b. The
cumulative distribution function of uniform RV is
F(x) = 0 if x < a ,
F(x) = (x − a)/(b − a) if a ≤ x < b ,
F(x) = 1 if x ≥ b .
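The piecewise DF above translates directly into code; a minimal sketch (the function name `uniform_df` is mine):

```python
def uniform_df(x, a, b):
    # piecewise DF of the Uniform(a, b) RV
    if x < a:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return 1.0

print(uniform_df(9.0, 8.0, 10.0))  # 0.5
```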
Exercise 2.8 Find a probability density function for the random variable whose distribution
function is given by
F(x) = 0 if x < 0 ,
F(x) = x if 0 ≤ x < 1 ,
F(x) = 1 if x ≥ 1 .
Example 2.18 A machine pumps cleanser into a process at a rate which has a uniform
distribution in the interval 8.00 to 10.00 litres per minute. What is the pump rate which the
machine can be expected to exceed 61% of the time?
The density function of this distribution is
f(x) = 0.5 if 8 ≤ x ≤ 10 , and f(x) = 0 otherwise .
Therefore, we need the rate c with P(X > c) = 1 − F(c) = 0.61, that is F(c) = 0.39, so

c = 8 + 0.39/0.5 = 8 + 0.78 = 8.78 .

The pump rate 8.78 litres per minute is expected to be exceeded 61% of the time.
Exercise 2.9 The actual amount of coffee (in grams) in a 230-gram jar filled by a certain
machine is a random variable whose probability density is given by

f(x) = 1/5 if 227.5 < x < 232.5 , and f(x) = 0 otherwise .

Find the probabilities that a 230-gram jar filled by this machine will contain
Next we discuss the normal distribution. This is the most important continuous distribution
because in applications many random variables are normal random variables (that is, they
have a normal distribution) or they are approximately normal or can be transformed into
normal random variables in a relatively simple fashion. Furthermore, the normal distribution
is a useful approximation of more complicated distributions, and it also occurs in the proofs
of various statistical tests.
Definition 2.13 Given a location parameter µ ∈ (−∞, +∞) and a scale parameter σ 2 > 0,
the normal(µ, σ 2 ) or Gauss(µ, σ 2 ) random variable has probability density function:
" 2 #
1 1 x − µ
f (x; µ, σ 2 ) = √ exp − (σ > 0) (23)
σ 2π 2 σ
This is simpler than it may at first look. f (x; µ, σ 2 ) has these features.
1. µ is the expected value or mean parameter and σ 2 is the variance parameter.
2. 1/(σ √(2π)) is a constant factor that makes the area under the curve of f(x) from −∞
to ∞ equal to 1, as it must be.
3. The curve of f (x) is symmetric with respect to x = µ because the exponent is quadratic.
Hence for µ = 0 it is symmetric with respect to the y-axis x = 0.
4. The exponential function decays to zero very fast; the smaller the standard deviation
σ, the faster the decay.
The normal distribution has the distribution function
F(x; µ, σ²) = (1/(σ √(2π))) ∫_{−∞}^{x} exp( −(1/2) ((v − µ)/σ)² ) dv .   (24)
Here we needed x as the upper limit of integration and wrote v in the integrand.
For the corresponding standardised normal distribution with mean 0 and variance 1 we
denote F(z; 0, 1) by Φ(z). Then we simply have

Φ(z) = (1/√(2π)) ∫_{−∞}^{z} e^{−u²/2} du .
This integral cannot be evaluated in closed form by the methods of elementary calculus. But
this is no serious problem because its values can be obtained numerically and tabulated.
These values are needed in working with the normal distribution. The curve of Φ(z) is
S-shaped: it increases monotonically from 0 to 1 and intersects the vertical axis at 1/2.
Theorem 2.14 The distribution function F (x; µ, σ 2 ) of the Normal(µ, σ 2 ) RV with any µ
and σ 2 is related to standardised Normal(0, 1) RV with DF Φ(z):
F(x) = Φ( (x − µ)/σ ) .
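In practice Φ(z) is computed numerically rather than by hand; a minimal sketch using the standard-library error function `math.erf`, with `Phi` and `normal_df` as my names for the DFs:

```python
from math import erf, sqrt

def Phi(z):
    # standard normal DF via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_df(x, mu, sigma):
    # Theorem 2.14: F(x; mu, sigma^2) = Phi((x - mu) / sigma)
    return Phi((x - mu) / sigma)

print(round(Phi(0.0), 4))              # 0.5
print(round(Phi(1.96), 4))             # 0.975
print(round(normal_df(83, 80, 3), 4))  # same value as Phi(1.0)
```

The identity Phi(z) = (1 + erf(z/√2))/2 follows from the substitution u = v√2 in the integral above.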
Example 2.19 Let X be normal with mean 5 and standard deviation 0.2. Find c or k
corresponding to the given probability:

P(X ≤ c) = 95% : Φ((c − 5)/0.2) = 95% , (c − 5)/0.2 = 1.645 , c = 5.329 .

P(5 − k ≤ X ≤ 5 + k) = 90% : 2Φ(k/0.2) − 1 = 90% , k/0.2 = 1.645 , k = 0.329 , 5 + k = 5.329 .

P(X ≥ c) = 1% , thus P(X ≤ c) = 99% : (c − 5)/0.2 = 2.326 , c = 5.465 .
Example 2.20 Suppose that the amount of cosmic radiation to which a person is exposed
when flying by jet across the United States is a random variable having a normal distribution
with a mean of 4.35 mrem and a standard deviation of 0.59 mrem. What is the probability
that a person will be exposed to more than 5.20 mrem of cosmic radiation on such a flight?
Looking up the entry corresponding to z = (5.20 − 4.35)/0.59 = 1.44 and subtracting it
from 1, we get 1 − 0.9251 = 0.0749.
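The table lookup can be replaced by a direct computation, which agrees with the tabled answer to about three decimals; a sketch with `Phi` as my name for the standard normal DF:

```python
from math import erf, sqrt

def Phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

# P(X > 5.20) for X ~ Normal(mu = 4.35, sigma = 0.59)
p = 1 - Phi((5.20 - 4.35) / 0.59)
print(round(p, 4))
```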
Exercise 2.10 Find the probabilities that a random variable having the standard normal
distribution will take on a value

(d) between −0.25 and 0.45.
3 Expectations
We now study expectations of random variables as a way to summarise them.
Definition 3.1 Expectation of a function g : X → R of a random variable X is:

E(g(X)) = Σ_x g(x) f(x) if X is a discrete RV ,
E(g(X)) = ∫_{−∞}^{∞} g(x) f(x) dx if X is a continuous RV .
Often, population variance is denoted by σ². Note that σ² ≥ 0, with σ² = 0 only for a point
mass random variable, that is, a discrete random variable which can take only one possible value.
The following formula for variance is very useful: V(X) = E(X²) − (E(X))². We can prove
the formula by expanding the square as follows:

V(X) = E((X − E(X))²) = E(X² − 2X E(X) + (E(X))²) = E(X²) − 2(E(X))² + (E(X))² = E(X²) − (E(X))² .
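The shortcut formula can be checked against the defining formula on any PMF; a sketch using a small PMF chosen purely for illustration:

```python
# an illustrative PMF (my choice, not from the notes): f(1)=0.2, f(2)=0.5, f(3)=0.3
pmf = {1: 0.2, 2: 0.5, 3: 0.3}

mean = sum(x * p for x, p in pmf.items())
var_direct = sum((x - mean) ** 2 * p for x, p in pmf.items())       # E((X - E(X))^2)
var_shortcut = sum(x * x * p for x, p in pmf.items()) - mean ** 2   # E(X^2) - (E(X))^2

print(round(var_direct, 4), round(var_shortcut, 4))  # both 0.49
```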
Exercise 3.1 What are the population mean and variance of a biased coin with
P(X = 1) = 9/10?
Suppose the random variable X is uniformly distributed on the interval [a, b]. From the
definition of the expectation of a continuous random variable, we find that
E(X) = ∫_{−∞}^{∞} x f(x) dx = ∫_{a}^{b} x f(x) dx = ∫_{a}^{b} x · (1/(b − a)) dx = (1/(b − a)) ∫_{a}^{b} x dx
     = (1/(b − a)) [x²/2]_{x=a}^{x=b} = (b² − a²)/(2(b − a)) = (b + a)(b − a)/(2(b − a)) = (a + b)/2 ,
Similarly,

E(X²) = ∫_{a}^{b} x² f(x) dx = (1/(b − a)) ∫_{a}^{b} x² dx = (1/(b − a)) [x³/3]_{x=a}^{x=b}
      = (b³ − a³)/(3(b − a)) = (b − a)(b² + ab + a²)/(3(b − a)) = (b² + ab + a²)/3 .
Therefore the variance is

V(X) = E(X²) − (E(X))² = (b² + ab + a²)/3 − (a + b)²/4 = (b² + ab + a²)/3 − (a² + 2ab + b²)/4
     = (4b² + 4ab + 4a² − 3a² − 6ab − 3b²)/12 = (a² − 2ab + b²)/12 = (b − a)²/12 .
Example 3.3 Mean and variance of the discrete uniform random variable X with outcomes
1, 2, . . . , k, say for the fair k-faced die, based on Faulhaber's formula for Σ_{i=1}^{k} i^m
with m ∈ {1, 2}, are:

E(X) = (1/k)(1 + 2 + · · · + k) = (1/k) · k(k + 1)/2 = (k + 1)/2 ,

E(X²) = (1/k)(1² + 2² + · · · + k²) = (1/k) · k(k + 1)(2k + 1)/6 = (2k² + 3k + 1)/6 ,

V(X) = E(X²) − (E(X))² = (2k² + 3k + 1)/6 − ((k + 1)/2)² = (2k² + 3k + 1)/6 − (k² + 2k + 1)/4
     = (8k² + 12k + 4 − 6k² − 12k − 6)/24 = (2k² − 2)/24 = (k² − 1)/12 .
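For the fair six-faced die, the closed forms give E(X) = 7/2 and V(X) = 35/12, which a direct summation confirms; a sketch with `die_stats` as my name:

```python
def die_stats(k):
    # mean and variance of the discrete uniform RV on {1, ..., k}
    xs = range(1, k + 1)
    mean = sum(xs) / k
    var = sum(x * x for x in xs) / k - mean ** 2
    return mean, var

m, v = die_stats(6)
print(m, round(v, 4))  # 3.5 2.9167
```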
Exercise 3.2 Find the mean and variance of the discrete uniform random variable X with
40 equi-probable outcomes 1, 2, . . . 40. Think of X as the probability model of the first ball
label in one NZ Lotto trial.
Example 3.4 Mean and Variance of the Poisson random variable X with parameter λ. Recall
the Taylor series of e^λ:

e^λ = 1 + λ + λ²/2! + λ³/3! + λ⁴/4! + . . . = Σ_{x=0}^{∞} λ^x/x! .

E(X) = Σ_{x=0}^{∞} x f(x; λ) = Σ_{x=0}^{∞} x e^{−λ} λ^x/x! = e^{−λ} Σ_{x=1}^{∞} λ λ^{x−1}/(x − 1)! = e^{−λ} λ e^λ = λ .
Similarly,

E(X²) = Σ_{x=0}^{∞} x² e^{−λ} λ^x/x! = λ e^{−λ} Σ_{x=1}^{∞} x λ^{x−1}/(x − 1)! = λ e^{−λ} (1 + 2λ/1! + 3λ²/2! + 4λ³/3! + . . .)
      = λ e^{−λ} ( (1 + λ + λ²/2! + λ³/3! + . . .) + (λ + 2λ²/2! + 3λ³/3! + . . .) )
      = λ e^{−λ} ( e^λ + λ (1 + λ + λ²/2! + . . .) ) = λ e^{−λ} ( e^λ + λ e^λ ) = λ(1 + λ)
      = λ + λ² .

Therefore, V(X) = E(X²) − (E(X))² = λ + λ² − λ² = λ.
Exercise 3.3 Show that the expectation and variance of the Exponentially distributed non-
negative random variable X with rate parameter λ and density f (x) = λ exp(−λx) is:
E(X) = 1/λ ,  V(X) = 1/λ² .
Exercise 3.4 What is the mean life of a light bulb whose life X [hours] has the density
f(x) = 0.001 e^{−0.001x} (x ≥ 0)?
Suppose the random variable X has finite E(X²). Then for any constant c > 0 we have

P(|X| ≥ c) ≤ E(X²)/c² .
Proof: We will carry out the proof for a countably valued X and leave the analogous proof
for the density case as an exercise. The idea of the proof is the same for a general random
variable. Suppose that X takes the values xi with probabilities pi . Then we have
E(X²) = Σ_i p_i x_i² .
If we consider only those values xi satisfying the inequality |xi | ≥ c and denote by A the
corresponding set of indices i, namely A = {i : |xi | ≥ c}, then of course x2i ≥ c2 for i ∈ A,
whereas

P(|X| ≥ c) = Σ_{i∈A} p_i .
Then if we sum the index i only over the partial set A, we have
E(X²) ≥ Σ_{i∈A} p_i x_i² ≥ Σ_{i∈A} p_i c² = c² Σ_{i∈A} p_i = c² P(|X| ≥ c) .
Example 3.5 Let X be any continuous random variable with E(X) = µ and V(X) = σ².
Then, taking c = kσ, that is, k standard deviations for some integer k, we obtain

P(|X − µ| ≥ kσ) ≤ σ²/(k² σ²) = 1/k² ,

just as in the discrete case.
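The bound 1/k² holds for every distribution with finite variance, and is usually far from tight; a sketch comparing it with the exact tail of a normal RV (`Phi` is my name for the standard normal DF):

```python
from math import erf, sqrt

def Phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

for k in (2, 3, 4):
    bound = 1 / k ** 2              # Chebyshev: P(|X - mu| >= k sigma) <= 1/k^2
    normal_tail = 2 * (1 - Phi(k))  # exact value when X is Normal(mu, sigma^2)
    print(k, round(normal_tail, 4), round(bound, 4))
```

For k = 2 the exact normal tail is about 0.0455 against the Chebyshev bound of 0.25.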
4 Tutorial for Week 1
4.1 Preparation Problems (Homework)
Exercise 4.1 [§ 1.1] Venn Diagrams.
1. Show, using Venn diagrams, that A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).
2. Show, using Venn diagrams, that A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).
3. Show that, by the definition of complement, for any subset A of a sample space Ω,
(Ac )c = A, Ωc = ∅, ∅c = Ω, A ∪ Ac = Ω, A ∩ Ac = ∅.
Exercise 4.2 [§ 1.2] Find the sample space for the experiment:
1. Tossing 2 coins whose faces are sprayed with black paint, denoted by B, and white
paint, denoted by W.
2. Drawing 4 screws from a lot of left-handed and right-handed screws denoted by L and
R, respectively.
Exercise 4.3 [§ 1.3] Suppose we pick a letter at random from the word WAIMAKARIRI.
What is the sample space Ω and what probabilities should be assigned to the outcomes?
Exercise 4.4 [§ 1.3] In the 'toss an unfair die' experiment with Ω = {1, 2, 3, 4, 5, 6}, the
probability of the event A = {1, 3, 5} is P(A) = 1/3. What is the probability of the event
B = {2, 4, 6}?
1. What gives the greater probability of hitting some target at least once: (a) hitting in
a shot with probability 1/2 and firing 1 shot, or (b) hitting in a shot with probability
1/4 and firing 2 shots? First guess. Then calculate.
2. In rolling two fair dice, what is the probability of obtaining a sum greater than 4 but
not exceeding 7?
3. A local country club has a membership of 600 and operates facilities that include an
18-hole championship golf course and 12 tennis courts. Before deciding whether to
accept new members, the club president would like to know how many members regu-
larly use each facility. A survey of the membership indicates that 70% regularly use the
golf course, 50% regularly use the tennis courts, and 5% use neither of these facilities
regularly. Given that a randomly selected member uses the tennis courts regularly,
find the probability that they also use the golf course regularly.
4. Let X be the number of years before a particular type of machine will need replacement.
Assume that X has the probability function f (1) = 0.1, f (2) = 0.2, f (3) = 0.2,
f (4) = 0.2, f (5) = 0.3. Graph f and F . Find the probability that the machine needs
no replacement during the first 3 years.
5. A box contains 4 right-handed and 6 left-handed screws. Two screws are drawn at
random without replacement. Let X be the number of left-handed screws drawn.
Find the probabilities P (X = 0), P (X = 1), P (X = 2), P (1 < X < 2), P (X ≤ 1),
P (X ≥ 1), P (X > 1), and P (0.5 < X < 10).
6. One number in the following table for the probability function of a random variable X
is incorrect. Which is it, and what should the correct value be?
x 1 2 3 4 5
P (X = x) 0.07 0.10 1.10 0.32 0.40
1. First, the sample space is:
5. Using the addition rule for mutually exclusive events check that Axiom (2) is satisfied
for the simple events.
6. Consider the following events: C = {B, I, G} and D = {G, I, N}. Using the addition
rule for two arbitrary events compute P (C ∪ D).
Exercise 4.9 Associate a RV X with the BINGO experiment of Exercise 4.8. Note that
you can choose any RV for this job. Find the PMF and DF of X. Now define another RV Y
for this same experiment that counts the number of balls labelled by a vowel in the outcome
of one BINGO Trial. Is Y a discrete uniform RV? What is P (Y = 1)?
Exercise 4.10 Durrett (The Monty Hall problem). The problem is named for the host
of the television show Let’s Make A Deal in which contestants were often placed in situations
like the following: Three curtains are numbered 1, 2, and 3. Behind one curtain is a car;
behind the other two curtains are donkeys. You pick a curtain, say #1. To build some
suspense the host opens up one of the two remaining curtains, say #3, to reveal a donkey.
What is the probability you will win given that there is a donkey behind #3? Should you
switch curtains and pick #2 if you are given the chance?
http://www.math.canterbury.ac.nz/SOCR/SOCR Experiments.html
Choose Monty Hall experiment.
http://www.math.canterbury.ac.nz/SOCR/SOCR Games.html
Choose Monty Hall game.
Exercise 4.11 Based on past experience, 70% of students in a certain course pass the
midterm exam. The final exam is passed by 80% of those who passed the midterm, but only
by 40% of those who fail the midterm. What fraction of students pass the final:
Exercise 4.12 A small brewery has two bottling machines. Machine 1 produces 75% of
the bottles and machine 2 produces 25%. One out of every 20 bottles filled by machine 1 is
rejected for some reason, while one out of every 30 bottles filled by machine 2 is rejected.
What is the probability that a randomly selected bottle comes from machine 1 given that it
is accepted?
Exercise 4.13 A process producing microchips produces 5% defectives, at random. Each
microchip is tested; the test will correctly detect a defective one 4/5 of the time, and if a
good microchip is tested, the test will declare it defective with probability 1/10.
(a) If a microchip is chosen at random, and tested to be good, what was the probability
that it was defective anyway?
(b) If a microchip is chosen at random, and tested to be defective, what was the probability
that it was good anyway?
(c) If 2 microchips are tested and determined to be good, what is the probability that at
least one is in fact defective?
Exercise 4.14 A gale is of force 1, force 2, or force 3, with probabilities 2/3, 1/4, 1/12
respectively.
Force 1 gales cause damage with probability 1/4;
force 2 gales cause damage with probability 2/3;
force 3 gales cause damage with probability 5/6.
(a) A gale is reported; what is the probability of it causing damage?
(b) If the gale DID cause damage, what are the probabilities that it was force 1; force 2;
force 3?
(c) If the gale DID NOT cause damage, what are the probabilities that it was force 1;
force 2; force 3?
Exercise 4.15 Of 200 adults, 176 own one TV set, 22 own two TV sets, and 2 own three
TV sets. A person is chosen at random. What is the probability function of X the number
of TV sets owned by that person?
Exercise 4.16 Suppose a discrete random variable X has probability function given by
x 3 4 5 6 7 8 9 10 11 12 13
P (X = x) .07 .01 .09 .01 .16 .25 .20 .03 .02 .11 .05
(b) X ≤ 5 ,
(c) X > 9 ,
(d) X ≥ 9 ,
(e) X < 12 ,
(f) 5 ≤ X ≤ 9 ,
(h) P (X = 14) ,
(i) P (X < 3) .
5 Tutorial for Week 2
5.1 Preparation Problems (Homework)
Exercise 5.1 Four fair coins are tossed simultaneously. Find the probability function of the
random variable X = Number of heads and compute the probabilities of obtaining no heads,
precisely 1 head, at least 1 head, not more than 3 heads.
Exercise 5.2 If the probability of hitting a target in a single shot is 10% and 10 shots are
fired independently, what is the probability that the target will be hit at least once?
Exercise 5.3 If X has the probability function f(x) = k/2^x (x = 0, 1, 2, . . . ), what are k
and P(X ≥ 4)?
Exercise 5.4 Let p = 1% be the probability that a certain type of light bulb will fail in a
240hr test. Find the probability that a sign consisting of 10 such bulbs will burn 24 hours
with no bulb failures.
Exercise 5.5 Given a density f(x) = k if −4 ≤ x ≤ 4 and 0 elsewhere, what is the value of
k? Graph f and F.
Exercise 5.6 If the diameter X of axles has the density f (x) = k if 119.9 ≤ x ≤ 120.1 and
0 otherwise, how many defectives will a lot of 500 axles approximately contain if defectives
are axles slimmer than 119.92 or thicker than 120.08?
Therefore, the probability that one axle is defective is P(defective) = 0.1 + 0.1 = 0.2, and it
is expected that there are 0.2 × 500 = 100 defective axles among the 500 axles.
Exercise 5.9 Suppose that a certain type of magnetic tape contains, on the average, 2 de-
fects per 100 meters. What is the probability that a roll of tape 300 meters long will contain
(a) x defects, (b) no defects?
Exercise 5.10 Find the probability that none of the three bulbs in a traffic signal must
be replaced during the first 1200 hours of operation if the probability that a bulb must be
replaced is a random variable X with density f (x) = 6[0.25 − (x − 1.5)2 ] when 1 ≤ x ≤ 2
and f (x) = 0 otherwise, where x is time measured in multiples of 1000 hours.
Exercise 5.11 Suppose that certain bolts have length L = 200 + X mm, where X is a
random variable with density f(x) = (3/4)(1 − x²) if −1 ≤ x ≤ 1 and 0 otherwise. Determine c
so that with a probability of 95% a bolt will have a length between 200 − c and 200 + c.
Exercise 5.12 Let the random variable X with density f (x) = ke−x if 0 ≤ x ≤ 2 and
0 otherwise be the time after which certain ball bearings are worn out. Find k and the
probability that a bearing will last at least 1 year.
From ∫_{0}^{2} k e^{−x} dx = k(1 − e^{−2}) = 1 we get k = 1/(1 − e^{−2}) ≈ 1.1565. Then

P(X ≥ 1) = 1 − P(X < 1) = 1 − ∫_{0}^{1} k e^{−x} dx = 1 + k [e^{−x}]_{0}^{1} = 1 + (e^{−1} − 1)/(1 − e^{−2}) = 0.2689414 .
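The normalising constant and the resulting probability can be confirmed in two lines (variable names are mine):

```python
from math import exp

k = 1 / (1 - exp(-2))      # normalisation: integral of k e^{-x} over [0, 2] equals 1
p = 1 - k * (1 - exp(-1))  # P(X >= 1) = 1 - int_0^1 k e^{-x} dx
print(round(k, 4), round(p, 4))  # 1.1565 0.2689
```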
Exercise 5.13 Assume that a new light bulb will burn out after t hours, where t is chosen
from [0, ∞) with an exponential density
f (t) = λe−λt .
(a) Assume that λ = 0.01, and find the probability that the bulb will not burn out before
T hours. This probability is often called the reliability of the bulb.
Exercise 5.14 Choose a number B at random from the interval [0, 1] with uniform density.
Find the probability that
(a) 1/3 < B < 2/3
(b) |B − 1/2| ≤ 1/4
(c) B < 1/4 or 1 − B < 1/4
(d) 3B 2 < B
Exercise 5.15 IQ scores for school children are standardised so that they are approximately
Normally distributed with a mean of 100 and a standard deviation of 15. What is approxi-
mately the probability that a randomly selected child has an IQ
(a) less than 80?
Exercise 5.16 We return to IQ scores that are approximately Normally distributed with a
mean of 100 and a standard deviation of 15.
(a) What is the 80th percentile of IQ scores?
(c) Below what score do only the bottom 30% of children fall?
Exercise 5.17 What is the expected daily profit if a store sells X air conditioners per day
with probability f (10) = 0.1, f (11) = 0.3, f (12) = 0.4, f (13) = 0.2 and the profit per
conditioner is $55?
Exercise 5.18 If the mileage (in multiples of 1000 mi) after which a tire must be replaced
is given by the random variable X with density f (x) = λe−λx (x > 0), what mileage can
you expect to get on one of these tires? Let λ = 0.04 and find the probability that a tire
will last at least 40000 mi.
Exercise 5.19 A small filling station is supplied with gasoline every Saturday afternoon.
Assume that its volume X of sales in ten thousands of gallons has the probability density
f (x) = 6x(1 − x) if 0 ≤ x ≤ 1 and 0 otherwise. Determine the mean, the variance.
Exercise 5.20 Let X be normal with mean 80 and variance 9. Find P (X > 83), P (X < 81),
P (X < 80), and P (78 < X < 82).
Exercise 5.21 If the lifetime X of a certain kind of automobile battery is Normally dis-
tributed with a mean of 4 yr and a standard deviation of 1 yr, and the manufacturer wishes
to guarantee that battery for 3 yr, what percentage of the batteries will he or she have to
replace under the guarantee?
Exercise 5.22 If the mathematics scores of the SAT college entrance exams for undergrad-
uate admission in the U.S. are Normally distributed with mean 480 and standard deviation
100 and if some college sets 500 as the minimum score for new students, what percent of
students will not reach that score?
Exercise 5.23 A manufacturer produces airmail envelopes whose weight is Normally dis-
tributed with mean µ = 1.95 grams and standard deviation σ = 0.025 grams. The envelopes
are sold in lots of 1000. How many envelopes in a lot will be heavier than 2 grams?
Exercise 5.24 Find the mean and the variance of the random variable X
3. f (x) = 2e−2x (x ≥ 0)
For any given value z, its cumulative probability Φ(z) was generated by Excel formula NORMSDIST, as NORMSDIST(z).
0.06 0.5239 0.56 0.7123 1.06 0.8554 1.56 0.9406 2.06 0.9803 2.56 0.9948
0.07 0.5279 0.57 0.7157 1.07 0.8577 1.57 0.9418 2.07 0.9808 2.57 0.9949
0.08 0.5319 0.58 0.7190 1.08 0.8599 1.58 0.9429 2.08 0.9812 2.58 0.9951
0.09 0.5359 0.59 0.7224 1.09 0.8621 1.59 0.9441 2.09 0.9817 2.59 0.9952
0.10 0.5398 0.60 0.7257 1.10 0.8643 1.60 0.9452 2.10 0.9821 2.60 0.9953
0.11 0.5438 0.61 0.7291 1.11 0.8665 1.61 0.9463 2.11 0.9826 2.61 0.9955
0.12 0.5478 0.62 0.7324 1.12 0.8686 1.62 0.9474 2.12 0.9830 2.62 0.9956
0.13 0.5517 0.63 0.7357 1.13 0.8708 1.63 0.9484 2.13 0.9834 2.63 0.9957
0.14 0.5557 0.64 0.7389 1.14 0.8729 1.64 0.9495 2.14 0.9838 2.64 0.9959
0.15 0.5596 0.65 0.7422 1.15 0.8749 1.65 0.9505 2.15 0.9842 2.65 0.9960
0.16 0.5636 0.66 0.7454 1.16 0.8770 1.66 0.9515 2.16 0.9846 2.66 0.9961
0.17 0.5675 0.67 0.7486 1.17 0.8790 1.67 0.9525 2.17 0.9850 2.67 0.9962
0.18 0.5714 0.68 0.7517 1.18 0.8810 1.68 0.9535 2.18 0.9854 2.68 0.9963
0.19 0.5753 0.69 0.7549 1.19 0.8830 1.69 0.9545 2.19 0.9857 2.69 0.9964
0.20 0.5793 0.70 0.7580 1.20 0.8849 1.70 0.9554 2.20 0.9861 2.70 0.9965
0.21 0.5832 0.71 0.7611 1.21 0.8869 1.71 0.9564 2.21 0.9864 2.71 0.9966
0.22 0.5871 0.72 0.7642 1.22 0.8888 1.72 0.9573 2.22 0.9868 2.72 0.9967
0.23 0.5910 0.73 0.7673 1.23 0.8907 1.73 0.9582 2.23 0.9871 2.73 0.9968
0.24 0.5948 0.74 0.7704 1.24 0.8925 1.74 0.9591 2.24 0.9875 2.74 0.9969
0.25 0.5987 0.75 0.7734 1.25 0.8944 1.75 0.9599 2.25 0.9878 2.75 0.9970
0.26 0.6026 0.76 0.7764 1.26 0.8962 1.76 0.9608 2.26 0.9881 2.76 0.9971
0.27 0.6064 0.77 0.7794 1.27 0.8980 1.77 0.9616 2.27 0.9884 2.77 0.9972
0.28 0.6103 0.78 0.7823 1.28 0.8997 1.78 0.9625 2.28 0.9887 2.78 0.9973
0.29 0.6141 0.79 0.7852 1.29 0.9015 1.79 0.9633 2.29 0.9890 2.79 0.9974
0.30 0.6179 0.80 0.7881 1.30 0.9032 1.80 0.9641 2.30 0.9893 2.80 0.9974
0.31 0.6217 0.81 0.7910 1.31 0.9049 1.81 0.9649 2.31 0.9896 2.81 0.9975
0.32 0.6255 0.82 0.7939 1.32 0.9066 1.82 0.9656 2.32 0.9898 2.82 0.9976
0.33 0.6293 0.83 0.7967 1.33 0.9082 1.83 0.9664 2.33 0.9901 2.83 0.9977
0.34 0.6331 0.84 0.7995 1.34 0.9099 1.84 0.9671 2.34 0.9904 2.84 0.9977
0.35 0.6368 0.85 0.8023 1.35 0.9115 1.85 0.9678 2.35 0.9906 2.85 0.9978
0.36 0.6406 0.86 0.8051 1.36 0.9131 1.86 0.9686 2.36 0.9909 2.86 0.9979
0.37 0.6443 0.87 0.8078 1.37 0.9147 1.87 0.9693 2.37 0.9911 2.87 0.9979
0.38 0.6480 0.88 0.8106 1.38 0.9162 1.88 0.9699 2.38 0.9913 2.88 0.9980
0.39 0.6517 0.89 0.8133 1.39 0.9177 1.89 0.9706 2.39 0.9916 2.89 0.9981
0.40 0.6554 0.90 0.8159 1.40 0.9192 1.90 0.9713 2.40 0.9918 2.90 0.9981
0.41 0.6591 0.91 0.8186 1.41 0.9207 1.91 0.9719 2.41 0.9920 2.91 0.9982
0.42 0.6628 0.92 0.8212 1.42 0.9222 1.92 0.9726 2.42 0.9922 2.92 0.9982
0.43 0.6664 0.93 0.8238 1.43 0.9236 1.93 0.9732 2.43 0.9925 2.93 0.9983
0.44 0.6700 0.94 0.8264 1.44 0.9251 1.94 0.9738 2.44 0.9927 2.94 0.9984
0.45 0.6736 0.95 0.8289 1.45 0.9265 1.95 0.9744 2.45 0.9929 2.95 0.9984
0.46 0.6772 0.96 0.8315 1.46 0.9279 1.96 0.9750 2.46 0.9931 2.96 0.9985
0.47 0.6808 0.97 0.8340 1.47 0.9292 1.97 0.9756 2.47 0.9932 2.97 0.9985
0.48 0.6844 0.98 0.8365 1.48 0.9306 1.98 0.9761 2.48 0.9934 2.98 0.9986
0.49 0.6879 0.99 0.8389 1.49 0.9319 1.99 0.9767 2.49 0.9936 2.99 0.9986
0.50 0.6915 1.00 0.8413 1.50 0.9332 2.00 0.9772 2.50 0.9938 3.00 0.9987