Math103 ProbStats Raaz 2010 S1 PDF
Contents
1 Sets, Experiments and Probability
1.1 Rudiments of Set Theory
1.2 Experiments
1.3 Probability
1.4 Conditional Probability
2 Random Variables
2.1 Discrete Random Variables and their Distributions
2.1.1 Discrete uniform random variables with finitely many possibilities
2.1.2 Discrete non-uniform random variables with finitely many possibilities
2.1.3 Discrete non-uniform random variables with infinitely many possibilities
2.2 Continuous Random Variables and Distributions
3 Expectations
List of Tables
1 f (x) and F (x) for the sum of two independent tosses of a fair die RV X.
2 DF Table for the Standard Normal Distribution.
3 Quantile Table for the Standard Normal Distribution.
List of Figures
1 f (x) = P (x) = 1/6 and F (x) of the fair die toss RV X of Example 2.4
2 f (x) and F (x) of an astragali toss RV X of Example 2.6
3 f (x) and F (x) of RV X for the sum of two independent tosses of a fair die.
4 Probability density function of the volume of rain in cubic inches over the lecture theatre tomorrow.
5 PDF and DF of Normal(µ, σ²) RV for different values of µ and σ².
1 Sets, Experiments and Probability
1.1 Rudiments of Set Theory
1. A set is a collection of distinct objects or elements and we enclose the elements by
curly braces. For example, the collection of the two letters H and T is a set and we
denote it by {H, T}. But the collection {H, T, T} is not a set (do you see why? think
distinct!). Also, recognise that there is no order to the elements in a set, i.e. {H, T} is
the same as {T, H}.
2. We give convenient names to sets. For example, we can call the set {H, T} by A and
write A = {H, T} to mean it.
6. We say that a set A is not a subset of a set B if at least one element of A is not an
element of B and write A ⊄ B. For example, {1, 2} is not a subset of {1, 3, 4} since
2 ∈ {1, 2} but 2 ∉ {1, 3, 4}, and we write {1, 2} ⊄ {1, 3, 4} to mean this.
10. The empty set contains no elements and it is the collection of nothing. It is denoted
by ∅ = {}.
11. Given some universal set, say Ω, the Greek letter Omega, the Complement of a set
A denoted by Ac is the set of all elements in Ω that are not in A. For example, if
Ω = {H, T} and A = {H} then Ac = {T}. Note that for any set A ⊆ Ω:
Ac ∩ A = ∅, A ∪ Ac = Ω, Ωc = ∅, ∅c = Ω .
12. When we have more than two sets, we can define unions and intersections similarly.
The union of m sets
⋃_{j=1}^{m} Aj = A1 ∪ A2 ∪ · · · ∪ Am
consists of elements that are in at least one of the m sets A1 , A2 , . . . , Am , and the
union of infinitely many sets
⋃_{j=1}^{∞} Aj = A1 ∪ A2 ∪ · · ·
consists of elements that are in at least one of them. Similarly, the intersection
⋂_{j=1}^{m} Aj = A1 ∩ A2 ∩ · · · ∩ Am
of m sets consists of elements that are in each of the m sets, and the intersection of
infinitely many sets is
⋂_{j=1}^{∞} Aj = A1 ∩ A2 ∩ · · ·
Exercise 1.1 Let Ω = {1, 2, 3, 4, 5, 6}, A = {1, 3, 5} and B = {2, 4, 6}. By using the
definitions of sets and set operations find the following sets:
Ac = Bc = Ωc = ∅c =
{1}c = { } A∪B = A∩B = A∪Ω=
A∩Ω= B∩Ω= B∪Ω= A ∪ Ac =
B ∪ Bc = etc.
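These set operations can also be checked mechanically. Below is a minimal sketch using Python's built-in set type with the sets of Exercise 1.1; the helper name `complement` is my own, not from the notes:

```python
# Sets from Exercise 1.1: Omega is the universal set, A the odd and B the even outcomes.
Omega = {1, 2, 3, 4, 5, 6}
A = {1, 3, 5}
B = {2, 4, 6}

def complement(S, universe=Omega):
    """Complement of S relative to the universal set (set difference)."""
    return universe - S

print(complement(A))      # the elements of Omega not in A
print(A | B)              # union A ∪ B
print(A & B)              # intersection A ∩ B (empty here)
print(A | complement(A))  # A ∪ Ac, which recovers Omega
```

Working through the exercise by hand first and then comparing against such output is a useful self-check.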
Example 1.1 For three sets A, B and C, the Venn diagrams for A∪B, A∩B and A∩B ∩C
are:
[Three Venn diagrams, shading A ∪ B, A ∩ B and A ∩ B ∩ C respectively, each drawn
inside the universal set Ω.]
Exercise 1.2 Let A = {1, 3, 5, 7, 9, 11}, B = {1, 2, 3, 5, 8, 13} and C = {1, 2, 4, 8, 16, 32}
denote three sets. Let us use a Venn diagram to visualise these three sets and their
intersections. Can you mark which sets correspond to A, B and C in the figure below?
1.2 Experiments
Definition 1.1 An experiment is an activity or procedure that produces distinct, well-
defined possibilities called outcomes. The set of all outcomes is called the sample space
and is denoted by Ω, the upper-case Greek letter Omega. We denote a typical outcome in Ω
by ω, the lower-case Greek letter omega, and a typical sequence of possibly distinct outcomes
by ω1 , ω2 , ω3 , . . ..
Example 1.3 Ω = {Heads, Tails} if our experiment is to note the outcome of a coin toss.
In Examples 1.2 and 1.3, Ω only has two outcomes and we can refer to the sample space of
such two-outcome experiments generically as Ω = {ω1 , ω2 }. For instance, the two outcomes
of Example 1.2 are ω1 = Defective and ω2 = Non-defective while those of Example 1.3 are
ω1 = Heads and ω2 = Tails.
Example 1.4 If our experiment is to roll a die whose faces are marked with the six numerical
symbols or numbers 1, 2, 3, 4, 5, 6 then there are six outcomes corresponding to the number
that shows on the top. Thus, the sample space Ω for this experiment is {1, 2, 3, 4, 5, 6}.
Exercise 1.3 Suppose our experiment is to observe whether it will rain or shine tomorrow.
What is the sample space for this experiment? Answer: Ω = { }.
The subsets of Ω are called events. The outcomes ω1 , ω2 , . . ., when seen as subsets of Ω,
such as, {ω1 }, {ω2 }, . . ., are simple events.
Example 1.5 In our roll a die experiment of Example 1.4 with Ω = {1, 2, 3, 4, 5, 6}, the set
of odd numbered outcomes A = {1, 3, 5} or the set of even numbered outcomes B = {2, 4, 6}
are examples of events. The simple events are {1}, {2}, {3}, {4}, {5}, and {6}.
An experimenter often performs more than one trial. Repeated trials of an experiment form
the basis of science and engineering as the experimenter learns about the phenomenon by
repeatedly performing the same mother experiment with possibly different outcomes. This
repetition of trials in fact provides the very motivation for the definition of probability in
§ 1.3.
Example 1.9 Suppose we toss a coin twice by performing two trials of the coin toss ex-
periment of Example 1.3 and use the short-hand H and T to denote the outcome of Heads
and Tails, respectively. Then our sample space Ω = {HH, HT, TH, TT}. Note that this is the
2-product experiment of the coin toss mother experiment.
Exercise 1.4 What is the event that at least one Heads occurs in the 2-product experiment
of Example 1.9, i.e., tossing a fair coin twice?
Exercise 1.5 What is the sample space of the 3-product experiment of the coin toss exper-
iment, i.e., tossing a fair coin thrice?
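Product sample spaces like these can be enumerated programmatically, which is handy for checking answers to the two exercises above. A small Python sketch using `itertools.product`; the function name `product_sample_space` is my own:

```python
# Enumerate the sample space of the n-product coin toss experiment of Example 1.9.
from itertools import product

Omega = ('H', 'T')  # sample space of the mother experiment

def product_sample_space(n):
    """All outcomes of the n-product experiment, as strings like 'HHT'."""
    return [''.join(ws) for ws in product(Omega, repeat=n)]

two = product_sample_space(2)    # the four outcomes of tossing twice
three = product_sample_space(3)  # the eight outcomes of tossing thrice

# The event 'at least one Heads' in the 2-product experiment:
at_least_one_H = [w for w in two if 'H' in w]
print(two, at_least_one_H)
```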
Remark 1.5 Loosely speaking, a set that can be enumerated or tagged uniquely by natural
numbers N = {1, 2, 3, . . .} is said to be countably infinite or contain countably many
elements. Some examples of such sets include any finite set, the set of natural numbers
N = {1, 2, 3, . . .}, the set of non-negative integers {0, 1, 2, 3, . . .}, the set of all integers Z =
{. . . , −3, −2, −1, 0, 1, 2, 3, . . .}, the set of all rational numbers Q = {p/q : p, q ∈ Z, q ≠ 0},
but the set of real numbers R = (−∞, ∞) is uncountably infinite.
Example 1.10 The sample space Ω of the ∞-product experiment of tossing a coin infinitely
many times has uncountably infinitely many elements and is in bijection with all binary
numbers in the unit interval [0, 1] — just replace H with 1 and T with 0. We cannot
enumerate all outcomes in Ω but can show some outcomes:
1.3 Probability
Definition 1.6 Probability is a function P that assigns real numbers to events, which
satisfies the following four Axioms:
Axiom (1): for any event A, 0 ≤ P (A) ≤ 1.
Axiom (2): P (Ω) = 1.
Axiom (3): if A and B are disjoint or mutually exclusive events, i.e., A ∩ B = ∅, then
P (A ∪ B) = P (A) + P (B)
Axiom (4): if A1 , A2 , . . . is an infinite sequence of pairwise disjoint or mutually
exclusive events, i.e., Ai ∩ Aj = ∅ whenever i ≠ j, then
P (⋃_{i=1}^{∞} Ai ) = Σ_{i=1}^{∞} P (Ai )
These axioms are merely assumptions that are justified and motivated by the frequency
interpretation of probability in n-product experiments as n tends to infinity,
which states that if we repeat an experiment a large number of times then the fraction of
times the event A occurs will be close to P (A). To be precise, if we let N (A, n) be the
number of times A occurs in the first n trials, then
P (A) = lim_{n→∞} N (A, n)/n .
Given this, Axiom (1) simply affirms that the fraction of times a given
event A occurs must be between 0 and 1. If Ω has been defined properly to be the set of
ALL possible outcomes, then Axiom (2) simply affirms that the fraction of times something
in Ω happens is 1. To explain Axiom (3), note that if A and B are disjoint then
N (A ∪ B, n) = N (A, n) + N (B, n)
since A ∪ B occurs if either A or B occurs but it is impossible for both to occur. Dividing
both sides of the previous equality by n and letting n → ∞, we arrive at Axiom (3).
Axiom (3) implies that Axiom (4) holds for a finite number of sets. In many cases the sample
space is finite so Axiom (4) is not relevant or necessary. Axiom (4) is a new assumption for
infinitely many sets as it does not simply follow from Axiom (3) any longer. Axiom (4) is
more difficult to motivate but without it the theory of probability becomes more difficult
and less useful, so we will impose this assumption on utilitarian grounds.
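The frequency interpretation above is easy to illustrate by simulation. A hedged Python sketch, assuming a fair coin so that P (A) = 1/2 for the event A = {Heads}:

```python
# Simulate n independent fair coin tosses and compute the relative
# frequency N(A, n)/n of the event A = {Heads}, which should be close
# to P(A) = 1/2 for large n.
import random

random.seed(1)  # fixed seed so the run is reproducible

n = 100_000
N_A = sum(1 for _ in range(n) if random.random() < 0.5)  # count Heads
relative_frequency = N_A / n
print(relative_frequency)  # close to 0.5
```

Rerunning with larger n makes the relative frequency hug 1/2 ever more tightly, which is exactly what the limit statement asserts.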
Theorem 1.7 Complementation Rule. The probability of an event A and its comple-
ment Ac in a sample space Ω satisfy:
P (A) + P (Ac ) = 1 , i.e., P (Ac ) = 1 − P (A) . (1)
Example 1.11 Recall the coin toss experiment of Example 1.3 with Ω = {Heads, Tails}.
Suppose that our coin happens to be fair with P (Heads) = 1/2. Since, {Tails}c = {Heads},
we can apply the complementation rule to find the probability of observing a Tails from
P (Heads) as follows:
P (Tails) = 1 − P (Heads) = 1/2 .
Theorem 1.8 Addition Rule for Mutually Exclusive Events. For mutually exclusive
or pair-wise disjoint events A1 , . . . , Am in a sample space Ω,
P (A1 ∪ A2 ∪ · · · ∪ Am ) = P (A1 ) + P (A2 ) + · · · + P (Am ) . (2)
Example 1.12 Let us observe the number on the first ball that pops out in a New Zealand
Lotto trial. There are forty balls labelled 1 through 40 for this experiment and so the sample
space Ω = {1, 2, 3, . . . , 39, 40}. Because the balls are vigorously whirled around inside the
Lotto machine before the first one pops out, we can model each ball to pop out first with
the same probability. So, we assign each outcome ω ∈ Ω the same probability of 1/40, i.e.,
our probability model for this experiment is:
P (ω) = 1/40 , for each ω ∈ Ω = {1, 2, 3, . . . , 39, 40} .
NOTE: we sometimes abuse notation and write P (ω) instead of the more accurate but
cumbersome P ({ω}) when writing down probabilities of simple events.
Now, let’s check if Axiom (1) is satisfied for simple events in our model for this Lotto
experiment,
0 ≤ P (1) = P (2) = · · · = P (40) = 1/40 ≤ 1
Is Axiom (3) satisfied?
For example, disjoint simple events {1} and {2}
P ({1, 2}) = P ({1} ∪ {2}) = P ({1}) + P ({2}) = 1/40 + 1/40 = 2/40
Is Axiom (2) satisfied?
Yes, by Equation (2) of the addition rule for mutually exclusive events (Theorem 1.8):
P (Ω) = P ({1, 2, . . . , 40}) = P (⋃_{i=1}^{40} {i}) = Σ_{i=1}^{40} P (i) = 1/40 + 1/40 + · · · + 1/40 = 1
(a) 1114 NZ Lotto draw frequency from 1987 to 2008. (b) 1114 NZ Lotto draw relative frequency from 1987 to 2008.
Recommended Activity 1.1 Explore the following web sites to learn more about NZ and
British Lotto. The second link has animations of the British equivalent of NZ Lotto.
http://lotto.nzpages.co.nz/
http://understandinguncertainty.org/node/39
Theorem 1.9 Addition Rule for Two Arbitrary Events. For events A and B in a
sample space,
P (A ∪ B) = P (A) + P (B) − P (A ∩ B) . (3)
Proof:
P (A ∪ B) = P (A ∪ (B ∩ Ac ))
= P (A) + P (B ∩ Ac ) by Axiom (3) and disjointness
= P (A) + P (B) − P (A ∩ B)
The last equality P (B ∩ Ac ) = P (B) − P (A ∩ B) is due to Axiom (3) and the disjoint union
of B = (B ∩ Ac ) ∪ (A ∩ B) giving P (B) = P (B ∩ Ac ) + P (A ∩ B). It is easy to see this with
a Venn diagram.
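The addition rule for two arbitrary events can also be verified by direct enumeration on a small sample space. A Python sketch on the fair die, with overlapping events A and B chosen here purely for illustration:

```python
# Verify Equation (3), P(A ∪ B) = P(A) + P(B) − P(A ∩ B), on the fair die
# sample space where every simple event has probability 1/6.
from fractions import Fraction

Omega = {1, 2, 3, 4, 5, 6}
P = lambda E: Fraction(len(E), len(Omega))  # equally likely outcomes

A = {1, 3, 5}  # odd outcomes
B = {1, 2, 3}  # outcomes at most 3 (overlaps A, so Theorem 1.8 does not apply)

lhs = P(A | B)
rhs = P(A) + P(B) - P(A & B)
assert lhs == rhs  # the addition rule for two arbitrary events
print(lhs)
```

Using `Fraction` keeps the arithmetic exact, so the check is an equality rather than a floating-point approximation.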
Exercise 1.6 In English language text, the twenty six letters in the alphabet occur with
the following frequencies:
E 13% R 7.7% A 7.3% H 3.5% F 2.8% M 2.5% W 1.6% X 0.5% J 0.2%
T 9.3% O 7.4% S 6.3% L 3.5% P 2.7% Y 1.9% V 1.3% K 0.3% Z 0.1%
N 7.8% I 7.4% D 4.4% C 3% U 2.7% G 1.6% B 0.9% Q 0.3%
Suppose you pick one letter at random from a randomly chosen English book from our
central library with Ω = {A, B, C, . . . , Z} – ignoring upper/lower cases, then what is the
probability of the following events?
(a) P ({Z}) =
(d) P ({E, Z}) = — by Axiom (3)
(e) P (‘picking a vowel’) =
by Equation (2) of addition rule for mutually exclusive events (Theorem 1.8).
(f) P (‘picking any letter in the word WAZZZUP’) = by Equa-
tion (2) of addition rule for mutually exclusive events (Theorem 1.8).
(g) P (‘picking any letter in the word WAZZZUP or a vowel’) =
= 42.2%
by Equation (3) of addition rule for two arbitrary events (Theorem 1.9).
Definition 1.10 The probability of an event B under the condition that an event A occurs
is called the conditional probability of B given A and is denoted by P (B|A). In this case
A serves as a new (reduced) sample space, and that probability is the fraction of P (A) which
corresponds to A ∩ B. Thus,
P (B|A) = P (A ∩ B) / P (A) , if P (A) ≠ 0 . (4)
Similarly, the conditional probability of A given B is
P (A|B) = P (A ∩ B) / P (B) , if P (B) ≠ 0 . (5)
Conditional Probability is a probability and therefore all four Axioms of probability
also hold for conditional probability of events given the conditioning event A has P (A) > 0.
Axiom (1): For any event B, 0 ≤ P (B|A) ≤ 1.
Axiom (2): P (Ω|A) = 1.
Axiom (3): For any two disjoint events B1 and B2 , P (B1 ∪B2 |A) = P (B1 |A)+P (B2 |A).
Axiom (4): For mutually exclusive or pairwise-disjoint events, B1 , B2 , . . .,
P (⋃_{i=1}^{∞} Bi |A) = Σ_{i=1}^{∞} P (Bi |A) .
Note that the complementation and addition rules also follow for conditional probability.
1. complementation rule for conditional probability:
2. addition rule for two arbitrary events B1 and B2 :
Theorem 1.11 Multiplication Rule. If A and B are events and P (A) ≠ 0, P (B) ≠ 0,
then
P (A ∩ B) = P (A)P (B|A) = P (B)P (A|B) . (8)
Proof: Solving for P (A ∩ B) in the Definitions (4) and (5) of conditional probability, we
obtain Equation (8) of the above theorem.
Example 1.13 Suppose the NZ All Blacks team is playing in a four-team rugby tournament. In
the first round they have a tough opponent that they will beat 40% of the time but if they
win that game they will play against an easy opponent where their probability of success is
0.8. What is the probability that they will win the tournament?
If A and B are the events of victory in the first and second games, respectively, then P (A) =
0.4 and P (B|A) = 0.8, so by multiplication rule, the probability that they will win the
tournament is:
P (A ∩ B) = P (A)P (B|A) = 0.4 × 0.8 = 0.32 .
Exercise 1.7 In Example 1.13, what is the probability that the All Blacks will win the first
game but lose the second?
If two events A and B satisfy
P (A ∩ B) = P (A)P (B),
they are called independent events. Assuming P (A) ≠ 0, P (B) ≠ 0, we have P (A|B) =
P (A), and P (B|A) = P (B). This means that the probability of A does not depend on the
occurrence or nonoccurence of B, and conversely. This justifies the term “independent”.
Example 1.14 Suppose you toss a fair coin twice such that the first toss is independent of
the second. Then,
P (HT) = P (Heads on the first toss ∩ Tails on the second toss) = P (H)P (T) = 1/2 × 1/2 = 1/4 .
Similarly, P (HH) = P (TH) = P (TT) = 1/2 × 1/2 = 1/4. Thus, P (ω) = 1/4 for every ω in the
sample space Ω = {HT, HH, TH, TT}.
Similarly, three events A, B and C are called (mutually) independent if:
P (A ∩ B) = P (A)P (B),
P (B ∩ C) = P (B)P (C),
P (C ∩ A) = P (C)P (A),
P (A ∩ B ∩ C) = P (A)P (B)P (C).
Example 1.15 Suppose you independently toss a fair die thrice. What is the probability
of getting an even outcome in all three trials?
Let Ei be the event that the outcome is an even number on the i-th trial. Then, the
probability of getting an even number in all three trials is:
Example 1.16 Suppose you toss a fair coin independently m times. Then each of the 2^m
possible outcomes in the sample space Ω has equal probability of 1/2^m due to independence.
P (A) = Σ_{i=1}^{n} P (A ∩ Bi ) = Σ_{i=1}^{n} P (A|Bi )P (Bi ) . (9)
Proof: The first equality is due to addition rule for mutually exclusive events, A ∩ B1 , A ∩
B2 , . . . , A ∩ Bn and the second equality is due to multiplication rule.
Exercise 1.8 A well-mixed urn contains five red and ten black balls. We draw two balls
from the urn without replacement. What is the probability that the second ball drawn is
black?
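One way to check your answer to Exercise 1.8 is to enumerate all equally likely ordered draws of two distinct balls. A Python sketch; the ball labels are my own bookkeeping, not from the notes:

```python
# Enumerate all ordered draws of two distinct balls from the urn and count
# those whose second ball is black. Balls 0-4 are red, balls 5-14 are black;
# since the urn is well mixed, all ordered pairs are equally likely.
from fractions import Fraction
from itertools import permutations

balls = ['red'] * 5 + ['black'] * 10
draws = list(permutations(range(len(balls)), 2))  # ordered, without replacement
favourable = [d for d in draws if balls[d[1]] == 'black']
prob_second_black = Fraction(len(favourable), len(draws))
print(prob_second_black)
```

Compare this exact enumeration against the value you obtain from Equation (9).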
2 Random Variables
We are used to traditional variables such as x as an “unknown” in the equation:
x+3=7 ,
and variables that depend on other variables, such as y in the equation of a line:
y = 3x − 2 ,
where the variable y for the y-axis is determined by the value taken by the variable x, as x
varies over the real line R = (−∞, ∞). The variables we have used to represent sequences
such as:
{a_n}_{n=1}^{∞} = a1 , a2 , a3 , . . . ,
Question: What is common to all these variables above, such as, x, y, a1 , a2 , a3 , . . . , f (x)?
Answer: They are instances of deterministic variables, that is, these traditional variables
take a fixed or deterministic value when we can solve for them.
We need a new kind of variable to deal with real-world situations where the same variable
may take different values in a non-deterministic manner. Random variables do this job for
us. Random variables, unlike traditional deterministic variables can take a bunch of different
values!
In fact, random variables are actually functions! They take you from the “world of random
processes and phenomena” to the world of real numbers. In other words, a random variable
is a numerical value determined by the outcome of the experiment.
Definition 2.1 A Random variable or RV is a function from the sample space Ω to the
set of real numbers R:
X(ω) : Ω → R ,
such that, for every real number x, the corresponding set {ω ∈ Ω : X(ω) ≤ x}, i.e. the set
of outcomes whose numerical value is less than or equal to x, is an event. The probability
of such events is given by the function F (x) : R → [0, 1] called the distribution function
or DF of the random variable X:
F (x) := P (X ≤ x) = P ({ω ∈ Ω : X(ω) ≤ x}) . (11)
Example 2.1 Recall the rain or shine experiment of Exercise 1.3 with sample space Ω =
{rain, shine}. We can associate a random variable X with this experiment as follows:
X(ω) = { 1, if ω = rain
       { 0, if ω = shine
Thus, X is 1 if it rains tomorrow and 0 otherwise. Note that another equally valid
discrete random variable, say Y , for this experiment is:
Y (ω) = { π, if ω = rain
        { √2, if ω = shine
A random variable can be chosen to assign each outcome ω ∈ Ω to any real number as the
experimenter desires.
Recall the experiments of Example 1.6 that involved smelling, tasting, touching, hearing,
or seeing to discern between outcomes. It becomes very difficult to communicate, process
and make decisions based on outcomes of experiments that are discerned in this manner and
even more difficult to record them unambiguously. This is where real numbers can give us a
helping hand.
Data are typically random variables that act as numerical placeholders for out-
comes of an experiment about some real-world random process or phenomenon. We said
that the random variable can take one of many values, but we cannot be certain of which
value it will take. However, we can make probabilistic statements about the value x
the random variable X will take.
Theorem 2.2 Probability that the RV X takes a value x in the half-open interval (a, b],
i.e., a < x ≤ b, is:
P (a < X ≤ b) = F (b) − F (a) . (12)
Proof: Since the events (X ≤ a) = {ω : X(ω) ≤ a} and (a < X ≤ b) = {ω : a < X(ω) ≤ b}
are mutually exclusive or disjoint events whose union is the event (X ≤ b) = {ω : X(ω) ≤ b},
by Axiom (3) of Definition 1.6 of probability and by Equation (11) in Definition 2.1 of DF,
F (b) = P (X ≤ b) = P (X ≤ a) + P (a < X ≤ b) = F (a) + P (a < X ≤ b) .
Subtraction of F (a) from both sides of the above equation yields Equation (12).
Example 2.2 Recall the fair coin toss experiment of Example 1.11 with Ω = {H, T} and
P (H) = P (T) = 1/2. We can associate a random variable X with this experiment as follows:
X(ω) = { 1, if ω = H
       { 0, if ω = T
Note that this choice of values for X equates to counting the number of H in one trial of the
fair coin toss experiment. The DF for X is:
F (x) = P (X ≤ x) = P ({ω : X(ω) ≤ x}) = { P (∅) = 0,              if −∞ < x < 0
                                         { P ({T}) = 1/2,          if 0 ≤ x < 1
                                         { P ({H, T}) = P (Ω) = 1, if 1 ≤ x < ∞
All we are really saying above in detail to show the underlying definitions just amounts to:
P (X = x) = { 1/2 if x = 0
            { 1/2 if x = 1
            { 0 otherwise
Example 2.3 Now let us define a discrete random variable that can take one of six
possible values from {1, 2, 3, 4, 5, 6} in the toss a fair die experiment. This X gives the
number that shows up on the top face as we roll a fair six-faced die whose faces are labelled by
numerical symbols 1, 2, 3, 4, 5, 6. Note that here Ω is the set of numerical symbols that label
each face while each of these symbols is associated with the real number x ∈ {1, 2, 3, 4, 5, 6}.
Thus,
X(ω) = { 1, if ω is the outcome that the die lands with the face labelled by 1 on top
       { 2, if ω is the outcome that the die lands with the face labelled by 2 on top
       { 3, if ω is the outcome that the die lands with the face labelled by 3 on top
       { 4, if ω is the outcome that the die lands with the face labelled by 4 on top
       { 5, if ω is the outcome that the die lands with the face labelled by 5 on top
       { 6, if ω is the outcome that the die lands with the face labelled by 6 on top
Example 2.4 Consider the random variable X of the toss a fair die experiment of Ex-
ample 2.3 with P (X = x) = P ({ω : X(ω) = x}) = 1/6 for each x ∈ {1, 2, 3, 4, 5, 6} and 0
otherwise. The probability that X ≤ 3 can be obtained by
F (3) = P (X ≤ 3) = P ({ω : X(ω) ≤ 3}) = P ({1, 2, 3}) = P ({1}) + P ({2}) + P ({3}) = 3/6
Exercise 2.1 Similarly, can you complete the following probability statement about the
value x the random variable X of Example 2.4 will take?
P (X = 1) = P (X = 2) = P (X = 3) = P (X = 4) = P (X = 5) = P (X = 6) = .
A discrete RV X takes on at most countably many values in R. The rain or shine random
variables of Example 2.1 and the fair coin toss RV of Example 2.2 can only take two possible
values while the toss a fair die RV of Example 2.4 can only take six possible values. Thus,
they are examples of discrete random variables. We can study discrete random variables in
a general setting.
From this we get the values of the Distribution Function F (x) by simply taking sums,
F (x) = Σ_{xi ≤ x} f (xi ) = Σ_{xi ≤ x} pi , (14)
where for any given x, we sum all the probabilities pi for which xi is smaller than or equal
to x. Thus, the DF F (x) of a discrete random variable is a step function with upward
jumps of size pi at the possible values xi of X and constant in between.
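Equation (14) translates directly into code: the DF of a discrete RV is a running sum of its PMF values. A Python sketch for the fair die RV; the names `support`, `pmf` and `F` are my own:

```python
# DF of a discrete RV as the running sum of PMF values, Equation (14),
# illustrated on the fair die RV with f(x) = 1/6 on {1, ..., 6}.
from fractions import Fraction

support = [1, 2, 3, 4, 5, 6]
pmf = {x: Fraction(1, 6) for x in support}

def F(x):
    """Distribution function F(x) = sum of f(xi) over all xi <= x."""
    return sum((p for xi, p in pmf.items() if xi <= x), Fraction(0))

print(F(3))    # jumps have accumulated at 1, 2 and 3
print(F(0.5))  # below the support, so 0
print(F(6))    # the whole probability mass, 1
```

Evaluating `F` between support points (say at 3.5) returns the same value as at 3, showing the "constant in between" step behaviour.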
Out of this class of discrete random variables we will define specific kinds as they arise often
in applications. We classify discrete random variables into three types for convenience as
follows:
• Discrete uniform random variables with finitely many possibilities
• Discrete non-uniform random variables with finitely many possibilities
• Discrete non-uniform random variables with (countably) infinitely many possibilities
2.1.1 Discrete uniform random variables with finitely many possibilities
Definition 2.5 Discrete Uniform Random Variable. We say that a discrete RV X is
uniformly distributed over k possible values {x1 , x2 , . . . , xk } if its PMF is:
f (xi ) = { pi = 1/k if xi ∈ {x1 , x2 , . . . , xk } ,    (15)
          { 0 otherwise .
Example 2.5 The fair die toss RV X of Example 2.4 is a discrete uniform RV with possible
values {x1 , x2 , x3 , x4 , x5 , x6 } = {1, 2, 3, 4, 5, 6}. Its PMF and DF are given by substituting
k = 6 in Equations 15 and 16, respectively. These functions are depicted in Figure 1. Pay
attention to the ◦ and • in the plot to relate them to the Equations 15 and 16. The ◦, • and
the dotted lines are used to depict how the value of f (x) and F (x) jump as x varies.
Figure 1: f (x) = P (x) = 1/6 and F (x) of the fair die toss RV X of Example 2.4
Exercise 2.2 Plot the PMF and DF in detail along with ◦, • and the dotted lines for the
fair coin toss RV X of Example 2.2 and convince yourself that it is also a discrete uniform
RV.
Exercise 2.3 Recall the first ball that pops out in a New Zealand Lotto trial of Example 1.12.
First associate a RV X with this experiment that turns the integer-symbolised ball labels
into real numbers in the set of possible values {1, 2, 3, . . . , 39, 40}. Then, give the PMF and
DF for X and sketch how their plots should look.
Two useful formulae for discrete distributions are readily obtained as follows. For the prob-
ability corresponding to intervals we have
P (a < X ≤ b) = F (b) − F (a) = Σ_{a < xi ≤ b} pi . (17)
This is the sum of all probabilities pi for which xi satisfies a < xi ≤ b. From this and
P (Ω) = 1 we obtain the following formula that the sum of all probabilities is 1.
Σ_i pi = 1 . (18)
Exercise 2.4 Suppose we toss a possibly biased or unfair coin with a given probability
0 ≤ p ≤ 1 of H, i.e., P (H) = p and P (T) = 1 − p = q. Associate the RV X to this
experiment to report the number of Heads in one trial. Compute P (X = 1), P (X =
2), P (X = 3), P (X = 0). Sketch the PMF and DF of X when p takes each of the following
five values 0, 1/2, 1/3, 2/3, 1. Note that p is really a parameter of this RV X. We will see
more parametrised random variables in the sequel.
Example 2.7 Let the random variable X denote the sum of two independent tosses of
a fair die. This discrete RV has possible values in {2, 3, 4, . . . , 12}. There are a total of
6 × 6 = 36 equally likely outcomes (ω1 , ω2 ) ∈ Ω = {(1, 1), (1, 2), . . . , (6, 6)}, where ω1 is
the outcome of the first toss and ω2 is that of the second independent toss. Each such
outcome (ω1 , ω2 ) has probability 1/36. Now X = 2 occurs in the case of the outcome (1, 1);
X = 3 in the case of the two outcomes (1, 2) and (2, 1); X = 4 in the case of the three
outcomes (1, 3), (2, 2), (3, 1); and so on as shown by the mapping in Figure 3. Hence,
f (x) = P (X = x) and F (x) = P (X ≤ x) have the values shown in Table 1. Figure 3
shows the plots of f (x) and F (x).
Figure 2: f (x) and F (x) of an astragali toss RV X of Example 2.6
(a) X : Ω → {2, 3, 4, . . . , 11, 12}, P (ω) = 1/36 for any ω ∈ Ω (b) PMF f (x) and DF F (x)
Figure 3: f (x) and F (x) of RV X for the sum of two independent tosses of a fair die.
x 2 3 4 5 6 7 8 9 10 11 12
f (x) 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36
F (x) 1/36 3/36 6/36 10/36 15/36 21/36 26/36 30/36 33/36 35/36 36/36
Table 1: f (x) and F (x) for the sum of two independent tosses of a fair die RV X.
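Table 1 can be re-derived by enumerating the 36 equally likely outcomes, exactly as Example 2.7 does by hand. A Python sketch:

```python
# Build the PMF f and DF F of the sum of two independent fair die tosses
# by enumerating all 36 equally likely outcome pairs (w1, w2).
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # all (w1, w2) pairs

f = {s: Fraction(0) for s in range(2, 13)}
for w1, w2 in outcomes:
    f[w1 + w2] += Fraction(1, 36)  # each outcome contributes 1/36

# DF as the running sum of PMF values, as in Equation (14).
F = {s: sum((f[t] for t in range(2, s + 1)), Fraction(0)) for s in range(2, 13)}
print(f[7], F[7])  # compare against the x = 7 column of Table 1
```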
Example 2.8 Compute the probability of a sum of at least 4 and at most 8 from the
probability Table 1 in Example 2.7. From Equation 17 we get:
P (3 < X ≤ 8) = F (8) − F (3) = 26/36 − 3/36 = 23/36
Recommended Activity 2.1 You can get a nice treatment of the sum of two independent
tosses of a fair die in ten minutes and seven seconds by watching the following YouTube video:
http://www.youtube.com/v/2XToWi9j0Tk&hl=en_US&fs=1&rel=0&border=1
Next we see one of the most basic parametric models of discrete random variables.
Definition 2.6 Bernoulli(θ) Random Variable. Given a parameter θ ∈ (0, 1), the prob-
ability mass function (PMF) for the Bernoulli(θ) RV X is:
f (x; θ) = { θ if x = 1 ,
           { 1 − θ if x = 0 ,
           { 0 otherwise .
Example 2.9 Let the random variable X return 1 if we observe a H and 0 otherwise when
we toss a possibly biased coin with parameter θ ∈ (0, 1). Then, P ({H}) = P (X = 1) = θ
and X is a Bernoulli(θ) RV.
Example 2.10 Waiting For the First Heads. Suppose our experiment is to toss a fair
coin independently and identically (that is the same coin is tossed in essentially the same
manner independent of the other tosses in each trial) as often as necessary until we have a
heads denoted by H. Let the RV X denote the number of trials until the first H appears.
Then, clearly the possible values X can take are {1, 2, 3, . . .}. Let us compute the PMF of X
by independence of events:
f (1) = P (X = 1) = P (H) = 1/2 ,
f (2) = P (X = 2) = P (TH) = 1/2 · 1/2 = (1/2)^2 ,
f (3) = P (X = 3) = P (TTH) = 1/2 · 1/2 · 1/2 = (1/2)^3 , etc.
and in general
f (x) = P (X = x) = (1/2)^x , x = 1, 2, . . . .
Example 2.11 Recall the experiment in Exercise 2.4 of tossing a possibly biased coin with a
fixed parameter θ = P (H), where 0 < θ < 1. Now suppose you use such a coin in the waiting
for the first Heads experiment with RV X in Example 2.10. Confirm that the probabilities
indeed sum to 1 by the fact that the x-th partial sums Sx = a(1 − r^x)/(1 − r) of the geometric
series Σ_{x=0}^{∞} a r^x = a + ar + ar^2 + ar^3 + · · · converge to S = a/(1 − r) if −1 < r < 1.
Let us compute the θ-specific PMF of X by independence of events:
f (1; θ) = P (X = 1) = P (H) = (1 − θ)^0 θ = θ ,
f (2; θ) = P (X = 2) = P (TH) = (1 − θ)^1 θ ,
f (3; θ) = P (X = 3) = P (TTH) = (1 − θ)^2 θ , etc.
and in general
f (x; θ) = P (X = x) = (1 − θ)^(x−1) θ, x = 1, 2, . . . .
And, we already saw that this series converges if 0 < (1 − θ) < 1:
lim_{x→∞} F (x; θ) = f (1; θ) + f (2; θ) + f (3; θ) + · · · = θ / (1 − (1 − θ)) = θ/θ = 1 .
We have just derived the PMF of a θ-parametric family of discrete random variable that can
take countably infinitely many values in {1, 2, 3, . . .}. We also showed that the PMF sums
to 1 as it should. This is called the geometric distribution with “success probability”
parameter θ for obvious reasons.
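The geometric series argument can be checked numerically: partial sums of the PMF approach 1 for any valid θ. A Python sketch; the function name is my own:

```python
# Geometric PMF f(x; θ) = (1 − θ)^(x−1) θ from Example 2.11; its partial
# sums over x = 1, 2, ... should approach 1 for any 0 < θ < 1.
def geometric_pmf(x, theta):
    return (1 - theta) ** (x - 1) * theta

for theta in (0.5, 0.2):
    partial_sum = sum(geometric_pmf(x, theta) for x in range(1, 200))
    print(theta, partial_sum)  # very close to 1 in each case
```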
Definition 2.7 Binomial(n, θ) Random Variable. Let the RV X = Σ_{i=1}^{n} Xi be the sum
of n independent and identically distributed Bernoulli(θ) RVs, i.e.:
X = Σ_{i=1}^{n} Xi , X1 , X2 , . . . , Xn ~ IID Bernoulli(θ) .
Then X is a Binomial(n, θ) RV with PMF:
f (x; n, θ) = (n choose x) θ^x (1 − θ)^(n−x) , x = 0, 1, . . . , n ,
where the binomial coefficient (n choose x) is:
(n choose x) = n(n − 1)(n − 2) . . . (n − x + 1) / (x(x − 1)(x − 2) · · · (2)(1)) = n!/(x!(n − x)!) .
It is read as “n choose x” — the number of ways of choosing x objects from n of them.
Example 2.12 Find the probability that seven of ten persons will recover from a tropical
disease if we can assume independence and the probability is identically 0.80 that any one
of them will recover from the disease.
Substituting x = 7, n = 10, and θ = 0.8 into the formula for the binomial distribution, we
get:
f (7; 10, 0.8) = (10 choose 7) × (0.8)^7 × (1 − 0.8)^(10−7) = 10!/((10 − 7)! 7!) × (0.8)^7 × (0.2)^3 ≈ 0.2013 .
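The substitution in Example 2.12 is easy to reproduce with Python's `math.comb`. A sketch; the function name `binomial_pmf` is my own:

```python
# Binomial PMF f(x; n, θ) = C(n, x) θ^x (1 − θ)^(n − x), applied to the
# tropical disease recovery numbers of Example 2.12.
from math import comb

def binomial_pmf(x, n, theta):
    return comb(n, x) * theta ** x * (1 - theta) ** (n - x)

p = binomial_pmf(7, 10, 0.8)  # seven recoveries out of ten patients
print(round(p, 4))
```

Summing `binomial_pmf(x, 10, 0.8)` over x = 0, . . . , 10 gives 1, which is a quick sanity check that the PMF is properly normalised.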
Exercise 2.5 Compute the probability of obtaining at least two 6’s in rolling a fair die in-
dependently and identically four times.
= .
Definition 2.8 Poisson(λ) Random Variable. Given a parameter λ > 0, the PMF of
the Poisson(λ) RV X is:
f (x; λ) = exp(−λ) λ^x / x! , where x = 0, 1, . . . (19)
For a fixed λ > 0, it can be proved that this distribution is obtained as a limiting case of
the Binomial(n, θ) RV if we let θ → 0 and n → ∞ while the product nθ = λ is kept
constant. Thus, the Poisson(λ) RV is really a limit of the Binomial(n, θ) RV as n → ∞,
θ → 0 and nθ = λ.
Example 2.13 If the probability of producing a defective screw is 0.01, what is the proba-
bility that a lot of 100 screws will contain more than 2 defectives?
Let the complementary event that there are no more than two defectives be Ac . For its
probability we use the Binomial(n = 100, θ = 0.01) RV with nθ = 100 × 0.01 = 1. Then,
P (Ac ) = (100 choose 0) × 0.99^100 + (100 choose 1) × 0.01 × 0.99^99 + (100 choose 2) × 0.01^2 × 0.99^98 = 92.06% .
Since θ is very small, we can approximate this by the much more convenient Poisson(λ) RV
with λ = nθ = 100 × 0.01 = 1, obtaining
P (Ac ) ≈ e^−1 (1^0/0! + 1^1/1! + 1^2/2!) = e^−1 (1 + 1 + 1/2) = 91.97% .
Thus P (A) = 1 − P (Ac ) = 1 − 91.97% = 8.03% under the Poisson(λ = 1) approximation.
Since the binomial distribution gives P (A) = 1 − P (Ac ) = 1 − 92.06% = 7.94%, the Poisson
approximation is quite good.
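The comparison in Example 2.13 between the exact binomial probability and its Poisson approximation can be reproduced as follows; the helper names are mine:

```python
# Exact Binomial(100, 0.01) probability of at most two defectives versus
# its Poisson(λ = 1) approximation, as in Example 2.13.
from math import comb, exp, factorial

def binomial_pmf(x, n, theta):
    return comb(n, x) * theta ** x * (1 - theta) ** (n - x)

def poisson_pmf(x, lam):
    return exp(-lam) * lam ** x / factorial(x)

exact = sum(binomial_pmf(x, 100, 0.01) for x in range(3))   # P(Ac), binomial
approx = sum(poisson_pmf(x, 1.0) for x in range(3))          # P(Ac), Poisson
print(round(exact, 4), round(approx, 4))  # ≈ 0.9206 vs ≈ 0.9197
```

The two values agree to about three decimal places, illustrating why the Poisson limit is such a convenient stand-in when n is large and θ is small.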
Example 2.14 If on the average, 2 cars enter a certain parking lot per minute, what is the
probability that during any given minute 4 or more cars will enter the lot?
To understand that the Poisson distribution is a model of the situation, we imagine the
minute to be divided into very many short time intervals, let θ be the (constant) probability
that a car will enter the lot during any such short interval, and assume independence of the
events that happen during those intervals. Then we are dealing with a binomial distribution
with very large n and very small θ, which we can approximate by the Poisson distribution
with
λ = nθ = 2 ,
because 2 cars enter on the average. The complementary event of the event 'four cars or
more enter during a given minute' is 'three cars or fewer enter the lot', which has probability
f(0; 2) + f(1; 2) + f(2; 2) + f(3; 2) = e^{−2} (2^0/0! + 2^1/1! + 2^2/2! + 2^3/3!) = 0.857 .
Therefore, the probability of interest is 1 − 0.857 = 0.143 = 14.3%.
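The same tail probability can be computed directly from the Poisson PMF; a minimal sketch (the function name `poisson_pmf` is mine):

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    return exp(-lam) * lam ** x / factorial(x)

# P(4 or more cars) = 1 - P(at most 3 cars), with lambda = 2
p = 1 - sum(poisson_pmf(x, 2.0) for x in range(4))
print(round(p, 3))  # 0.143
```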
Example 2.15 The random variable X is the exact amount of rain in inches over the roof
of this lecture theatre tomorrow.
X : Ω → [0, ∞) , X(ω) = x .
Figure 4: Probability density function of the volume of rain in cubic inches over the lecture
theatre tomorrow.
Exercise 2.6 For the continuous random variable X of Example 2.15 we may try to make
probability statements such as P(X = x) about the actual amount of rain x that will fall
on the lecture theatre tomorrow. Can you see why the following statements are true? In
fact, for this continuous random variable P(X = x) = 0 for any real number x. Stop to
understand this!
After you have understood that P(X = x) = 0 for any x ∈ R, it should not be surprising
that P(1.1 ≤ X ≤ 1.10000001) can nevertheless be greater than 0.
Recommended Activity 2.2 You can get a nice informal treatment of the contents of
§ 2.1 and § 2.2 in ten minutes and two seconds by watching the following YouTube video:
http://www.youtube.com/v/Fvi9A_tEmXQ&hl=en_US&fs=1&rel=0&border=1
Definition 2.9 Continuous random variables take values on a continuous scale. A random
variable X is continuous, if its Distribution function F (x) can be given by an integral
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(v) dv   (20)
(we write v because x is needed as the upper limit of the integral) whose integrand f (x),
called the probability density function (PDF) of the distribution, is non-negative, and
is continuous, perhaps except for finitely many x-values. Differentiation gives the relation of
f to F as
f (x) = F 0 (x) (21)
for every x at which f (x) is continuous.
Then we obtain the very important formula for the probability corresponding to an interval:
P(a < X ≤ b) = F(b) − F(a) = ∫_{a}^{b} f(v) dv .   (22)
From (20) and P(Ω) = 1 we also have the analogue of (18):

∫_{−∞}^{∞} f(v) dv = 1 .
Continuous random variables are simpler than discrete ones with respect to intervals. Indeed,
in the continuous case the four probabilities corresponding to a < X ≤ b, a < X < b,
a ≤ X < b, and a ≤ X ≤ b with any fixed a and b (> a) are all the same.
The inverse of the DF is the quantile function F^{−1}(p) = inf{x ∈ R : F(x) ≥ p},
where inf is called the infimum, which is effectively the smallest value x that the random
variable X can take in order to satisfy the probability condition F(x) ≥ p. F^{−1}(p) is also
known as the p-th quantile or percentile.
The next example illustrates notations and typical applications of our present formulae.
Example 2.16 Let X have the density function f(x) = e^{−x} if x ≥ 0 and zero otherwise.
Find the distribution function. Find the probabilities P(1/4 ≤ X ≤ 2) and P(−1/2 ≤ X ≤ 1/2),
and determine x such that P(X ≤ x) = 95%.
1. For x ≥ 0,

F(x) = ∫_{0}^{x} e^{−v} dv = [−e^{−v}]_{0}^{x} = −e^{−x} + 1 = 1 − e^{−x} .

Therefore,

F(x) = 1 − e^{−x} if x ≥ 0 , and F(x) = 0 otherwise .
2. P(1/4 ≤ X ≤ 2) = ∫_{1/4}^{2} e^{−v} dv = F(2) − F(1/4) = 64.35% .
3. P(−1/2 ≤ X ≤ 1/2) = ∫_{−1/2}^{0} f(v) dv + ∫_{0}^{1/2} e^{−v} dv = 0 + F(1/2) = 39.35% .
4. P(X ≤ x) = F(x) = 1 − e^{−x} = 0.95 .

Therefore,

x = −log(1 − 0.95) = 2.9957 .
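The four parts of the example can be verified with a few lines; a minimal sketch where `F` (my name) is the DF just derived:

```python
from math import exp, log

def F(x):
    # DF of the density f(x) = e^{-x} for x >= 0, zero otherwise
    return 1 - exp(-x) if x >= 0 else 0.0

print(round(F(2) - F(0.25), 4))    # P(1/4 <= X <= 2)
print(round(F(0.5) - F(-0.5), 4))  # P(-1/2 <= X <= 1/2)
print(round(-log(1 - 0.95), 4))    # x with F(x) = 0.95
```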
The previous example is a special case of the following parametric family of random variables.
Definition 2.11 Exponential(λ) Random Variable. Given a rate parameter λ > 0, the
Exponential(λ) random variable X has probability density function given by:
f(x; λ) = λ exp(−λx) if x > 0 , and f(x; λ) = 0 otherwise ,
and distribution function given by:

F(x; λ) = 1 − exp(−λx) if x ≥ 0 , and F(x; λ) = 0 otherwise .
The Exponential(λ) RV gives the waiting times between successive events of a Poisson(λ)
RV that is counting the number of events in unit time.
Example 2.17 At a certain location on highway, the number of cars exceeding the speed
limit by more than 10 kilometers per hour in half an hour is a Poisson(λ = 8.4) random
variable. What is the probability of a waiting time of less than 5 minutes between cars
exceeding the speed limit by more than 10 kilometers per hour?
Using half an hour as the unit of time, we have Poisson(λ = 8.4) giving the number of
arrivals in unit time. Therefore, the waiting time is a random variable having an exponential
distribution with λ = 8.4, and since 5 minutes is 1/6 of the unit of time, we find that the
desired probability is
∫_{0}^{1/6} 8.4 e^{−8.4x} dx = [−e^{−8.4x}]_{0}^{1/6} = 1 − e^{−1.4} ≈ 0.7534 .
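The waiting-time probability is one line of arithmetic; a minimal sketch (variable names are mine):

```python
from math import exp

lam = 8.4    # speeders per half hour
t = 5 / 30   # five minutes as a fraction of the half-hour time unit
p = 1 - exp(-lam * t)  # Exponential(8.4) DF evaluated at t
print(round(p, 4))  # 0.7534
```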
Definition 2.12 Uniform(a, b) Random Variable. The distribution with the density
f(x) = 1/(b − a) if a < x < b
and f = 0 otherwise is called the uniform distribution on the interval a < x < b. The
cumulative distribution function of uniform RV is
F(x) = 0 if x < a ,
F(x) = (x − a)/(b − a) if a ≤ x < b ,
F(x) = 1 if x ≥ b .
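The piecewise DF above translates directly into code; a minimal sketch (the function name `uniform_df` is mine):

```python
def uniform_df(x, a, b):
    # piecewise DF of the Uniform(a, b) RV
    if x < a:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return 1.0

print(uniform_df(9.0, 8.0, 10.0))  # 0.5
```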
Exercise 2.8 Find a probability density function for the random variable whose distribution
function is given by
F(x) = 0 if x < 0 ,
F(x) = x if 0 ≤ x < 1 ,
F(x) = 1 if x ≥ 1 .
Example 2.18 A machine pumps cleanser into a process at a rate which has a uniform
distribution in the interval 8.00 to 10.00 litres per minute. What is the pump rate which the
machine can be expected to exceed 61% of the time?
The density function of this distribution is
f(x) = 0.5 if 8 ≤ x ≤ 10 , and f(x) = 0 otherwise .
Therefore, we need the rate c with P(X > c) = 1 − F(c) = 0.61, that is F(c) = 0.39, so

c = 8 + 0.39/0.5 = 8 + 0.78 = 8.78 .

The pump rate 8.78 litres per minute is expected to be exceeded 61% of the time.
Exercise 2.9 The actual amount of coffee (in grams) in a 230-gram jar filled by a certain
machine is a random variable whose probability density is given by

f(x) = 1/5 if 227.5 < x < 232.5 , and f(x) = 0 otherwise .

Find the probabilities that a 230-gram jar filled by this machine will contain
Next we discuss the normal distribution. This is the most important continuous distribution
because in applications many random variables are normal random variables (that is, they
have a normal distribution) or they are approximately normal or can be transformed into
normal random variables in a relatively simple fashion. Furthermore, the normal distribution
is a useful approximation of more complicated distributions, and it also occurs in the proofs
of various statistical tests.
Definition 2.13 Given a location parameter µ ∈ (−∞, +∞) and a scale parameter σ 2 > 0,
the normal(µ, σ 2 ) or Gauss(µ, σ 2 ) random variable has probability density function:
" 2 #
1 1 x − µ
f (x; µ, σ 2 ) = √ exp − (σ > 0) (23)
σ 2π 2 σ
This is simpler than it may at first look. f (x; µ, σ 2 ) has these features.
1. µ is the expected value or mean parameter and σ 2 is the variance parameter.
2. 1/(σ √(2π)) is a constant factor that makes the area under the curve of f(x) from −∞
to ∞ equal to 1, as it must be.
3. The curve of f (x) is symmetric with respect to x = µ because the exponent is quadratic.
Hence for µ = 0 it is symmetric with respect to the y-axis x = 0.
4. The exponential function decays to zero very fast; the smaller the standard deviation
σ, the faster the decay.
The normal distribution has the distribution function
F(x; µ, σ²) = (1/(σ √(2π))) ∫_{−∞}^{x} exp( −(1/2) ((v − µ)/σ)² ) dv .   (24)
Here we needed x as the upper limit of integration and wrote v in the integrand.
For the corresponding standardised normal distribution with mean 0 and variance 1 we
denote F(z; 0, 1) by Φ(z). Then we simply have

Φ(z) = (1/√(2π)) ∫_{−∞}^{z} e^{−u²/2} du .
This integral cannot be evaluated in closed form by the methods of elementary calculus. But
this is no serious problem because its values can be obtained numerically and tabulated.
These values are needed in working with the normal distribution. The curve of Φ(z) is
S-shaped: it increases monotonically from 0 to 1 and intersects the vertical axis at 1/2.
Theorem 2.14 The distribution function F (x; µ, σ 2 ) of the Normal(µ, σ 2 ) RV with any µ
and σ 2 is related to standardised Normal(0, 1) RV with DF Φ(z):
F(x) = Φ( (x − µ)/σ ) .
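In practice Φ(z) is computed numerically rather than by hand; a minimal sketch using the standard-library error function `math.erf`, with `Phi` and `normal_df` as my names for the DFs:

```python
from math import erf, sqrt

def Phi(z):
    # standard normal DF via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_df(x, mu, sigma):
    # Theorem 2.14: F(x; mu, sigma^2) = Phi((x - mu) / sigma)
    return Phi((x - mu) / sigma)

print(round(Phi(0.0), 4))              # 0.5
print(round(Phi(1.96), 4))             # 0.975
print(round(normal_df(83, 80, 3), 4))  # same value as Phi(1.0)
```

The identity Phi(z) = (1 + erf(z/√2))/2 follows from the substitution u = v√2 in the integral above.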
Example 2.19 Let X be normal with mean 5 and standard deviation 0.2. Find c or k
corresponding to the given probability:

P(X ≤ c) = 95% : Φ((c − 5)/0.2) = 95% , (c − 5)/0.2 = 1.645 , c = 5.329 .

P(5 − k ≤ X ≤ 5 + k) = 90% : 2Φ(k/0.2) − 1 = 90% , k/0.2 = 1.645 , k = 0.329 , 5 + k = 5.329 .

P(X ≥ c) = 1% , thus P(X ≤ c) = 99% : (c − 5)/0.2 = 2.326 , c = 5.465 .
Example 2.20 Suppose that the amount of cosmic radiation to which a person is exposed
when flying by jet across the United States is a random variable having a normal distribution
with a mean of 4.35 mrem and a standard deviation of 0.59 mrem. What is the probability
that a person will be exposed to more than 5.20 mrem of cosmic radiation on such a flight?
Looking up the entry corresponding to z = (5.20 − 4.35)/0.59 = 1.44 and subtracting it
from 1, we get 1 − 0.9251 = 0.0749.
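The table lookup can be replaced by a direct computation, which agrees with the tabled answer to about three decimals; a sketch with `Phi` as my name for the standard normal DF:

```python
from math import erf, sqrt

def Phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

# P(X > 5.20) for X ~ Normal(mu = 4.35, sigma = 0.59)
p = 1 - Phi((5.20 - 4.35) / 0.59)
print(round(p, 4))
```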
Exercise 2.10 Find the probabilities that a random variable having the standard normal
distribution will take on a value

(d) between −0.25 and 0.45.
3 Expectations
We now study expectations of random variables as a way to summarise them.
Definition 3.1 Expectation of a function g : X → R of a random variable X is:

E(g(X)) = Σ_x g(x) f(x) if X is a discrete RV ,
E(g(X)) = ∫_{−∞}^{∞} g(x) f(x) dx if X is a continuous RV .
Often, population variance is denoted by σ². Note that σ² ≥ 0, with σ² = 0 only for a point
mass random variable, that is, a discrete random variable which can take only one possible value.
The following formula for variance is very useful: V(X) = E(X²) − (E(X))². We can prove
the formula by expanding the square as follows:

V(X) = E((X − E(X))²) = E(X² − 2X E(X) + (E(X))²) = E(X²) − 2(E(X))² + (E(X))² = E(X²) − (E(X))² .
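The shortcut formula can be checked against the defining formula on any PMF; a sketch using a small PMF chosen purely for illustration:

```python
# an illustrative PMF (my choice, not from the notes): f(1)=0.2, f(2)=0.5, f(3)=0.3
pmf = {1: 0.2, 2: 0.5, 3: 0.3}

mean = sum(x * p for x, p in pmf.items())
var_direct = sum((x - mean) ** 2 * p for x, p in pmf.items())       # E((X - E(X))^2)
var_shortcut = sum(x * x * p for x, p in pmf.items()) - mean ** 2   # E(X^2) - (E(X))^2

print(round(var_direct, 4), round(var_shortcut, 4))  # both 0.49
```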
Exercise 3.1 What are the population mean and variance of a biased coin with
P(X = 1) = 9/10?
Suppose the random variable X is uniformly distributed on the interval [a, b]. From the
definition of the expectation of a continuous random variable, we find that
E(X) = ∫_{−∞}^{∞} x f(x) dx = ∫_{a}^{b} x f(x) dx = ∫_{a}^{b} x · (1/(b − a)) dx = (1/(b − a)) ∫_{a}^{b} x dx
     = (1/(b − a)) [x²/2]_{x=a}^{x=b} = (b² − a²)/(2(b − a)) = (b + a)(b − a)/(2(b − a)) = (a + b)/2 ,
Similarly,

E(X²) = ∫_{a}^{b} x² f(x) dx = (1/(b − a)) ∫_{a}^{b} x² dx = (1/(b − a)) [x³/3]_{x=a}^{x=b}
      = (b³ − a³)/(3(b − a)) = (b − a)(b² + ab + a²)/(3(b − a)) = (b² + ab + a²)/3 .
Therefore the variance is

V(X) = E(X²) − (E(X))² = (b² + ab + a²)/3 − (a + b)²/4 = (b² + ab + a²)/3 − (a² + 2ab + b²)/4
     = (4b² + 4ab + 4a² − 3a² − 6ab − 3b²)/12 = (a² − 2ab + b²)/12 = (b − a)²/12 .
Example 3.3 Mean and variance of the discrete uniform random variable X with outcomes
1, 2, . . . , k, say for the fair k-faced die, based on Faulhaber's formula for Σ_{i=1}^{k} i^m
with m ∈ {1, 2}, are:

E(X) = (1/k)(1 + 2 + · · · + k) = (1/k) · k(k + 1)/2 = (k + 1)/2 ,

E(X²) = (1/k)(1² + 2² + · · · + k²) = (1/k) · k(k + 1)(2k + 1)/6 = (2k² + 3k + 1)/6 ,

V(X) = E(X²) − (E(X))² = (2k² + 3k + 1)/6 − ((k + 1)/2)² = (2k² + 3k + 1)/6 − (k² + 2k + 1)/4
     = (8k² + 12k + 4 − 6k² − 12k − 6)/24 = (2k² − 2)/24 = (k² − 1)/12 .
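For the fair six-faced die, the closed forms give E(X) = 7/2 and V(X) = 35/12, which a direct summation confirms; a sketch with `die_stats` as my name:

```python
def die_stats(k):
    # mean and variance of the discrete uniform RV on {1, ..., k}
    xs = range(1, k + 1)
    mean = sum(xs) / k
    var = sum(x * x for x in xs) / k - mean ** 2
    return mean, var

m, v = die_stats(6)
print(m, round(v, 4))  # 3.5 2.9167
```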
Exercise 3.2 Find the mean and variance of the discrete uniform random variable X with
40 equi-probable outcomes 1, 2, . . . 40. Think of X as the probability model of the first ball
label in one NZ Lotto trial.
Example 3.4 Mean and Variance of the Poisson random variable X with parameter λ. Recall
the Taylor series of e^λ:

e^λ = 1 + λ + λ²/2! + λ³/3! + λ⁴/4! + . . . = Σ_{x=0}^{∞} λ^x/x! .

E(X) = Σ_{x=0}^{∞} x f(x; λ) = Σ_{x=0}^{∞} x e^{−λ} λ^x/x! = e^{−λ} Σ_{x=1}^{∞} λ λ^{x−1}/(x − 1)! = e^{−λ} λ e^λ = λ .
Similarly,

E(X²) = Σ_{x=0}^{∞} x² e^{−λ} λ^x/x! = λ e^{−λ} Σ_{x=1}^{∞} x λ^{x−1}/(x − 1)! = λ e^{−λ} (1 + 2λ/1! + 3λ²/2! + 4λ³/3! + . . .)
      = λ e^{−λ} ( (1 + λ + λ²/2! + λ³/3! + . . .) + (λ + 2λ²/2! + 3λ³/3! + . . .) )
      = λ e^{−λ} ( e^λ + λ (1 + λ + λ²/2! + . . .) ) = λ e^{−λ} ( e^λ + λ e^λ ) = λ(1 + λ)
      = λ + λ² .

Therefore, V(X) = E(X²) − (E(X))² = λ + λ² − λ² = λ.
Exercise 3.3 Show that the expectation and variance of the Exponentially distributed non-
negative random variable X with rate parameter λ and density f (x) = λ exp(−λx) is:
E(X) = 1/λ ,  V(X) = 1/λ² .
Exercise 3.4 What is the mean life of a light bulb whose life X [hours] has the density
f(x) = 0.001 e^{−0.001x} (x ≥ 0)?
Suppose the random variable X has finite E(X²). Then for any constant c > 0 we have

P(|X| ≥ c) ≤ E(X²)/c² .
Proof: We will carry out the proof for a countably valued X and leave the analogous proof
for the density case as an exercise. The idea of the proof is the same for a general random
variable. Suppose that X takes the values xi with probabilities pi . Then we have
E(X²) = Σ_i p_i x_i² .
If we consider only those values xi satisfying the inequality |xi | ≥ c and denote by A the
corresponding set of indices i, namely A = {i : |xi | ≥ c}, then of course x2i ≥ c2 for i ∈ A,
whereas

P(|X| ≥ c) = Σ_{i∈A} p_i .
Then if we sum the index i only over the partial set A, we have
E(X²) ≥ Σ_{i∈A} p_i x_i² ≥ Σ_{i∈A} p_i c² = c² Σ_{i∈A} p_i = c² P(|X| ≥ c) .
Example 3.5 Let X be any continuous random variable with E(X) = µ and V(X) = σ².
Then, taking c = kσ, that is, k standard deviations for some integer k, we obtain

P(|X − µ| ≥ kσ) ≤ σ²/(k² σ²) = 1/k² ,

just as in the discrete case.
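The bound 1/k² holds for every distribution with finite variance, and is usually far from tight; a sketch comparing it with the exact tail of a normal RV (`Phi` is my name for the standard normal DF):

```python
from math import erf, sqrt

def Phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

for k in (2, 3, 4):
    bound = 1 / k ** 2              # Chebyshev: P(|X - mu| >= k sigma) <= 1/k^2
    normal_tail = 2 * (1 - Phi(k))  # exact value when X is Normal(mu, sigma^2)
    print(k, round(normal_tail, 4), round(bound, 4))
```

For k = 2 the exact normal tail is about 0.0455 against the Chebyshev bound of 0.25.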
4 Tutorial for Week 1
4.1 Preparation Problems (Homework)
Exercise 4.1 [§ 1.1] Venn Diagrams.
1. Show, using Venn diagrams, that A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).
2. Show, using Venn diagrams, that A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).
3. Show that, by the definition of complement, for any subset A of a sample space Ω,
(Ac )c = A, Ωc = ∅, ∅c = Ω, A ∪ Ac = Ω, A ∩ Ac = ∅.
Exercise 4.2 [§ 1.2] Find the sample space for the experiment:
1. Tossing 2 coins whose faces are sprayed with black paint, denoted by B, and white
paint, denoted by W.
2. Drawing 4 screws from a lot of left-handed and right-handed screws denoted by L and
R, respectively.
Exercise 4.3 [§ 1.3] Suppose we pick a letter at random from the word WAIMAKARIRI.
What is the sample space Ω and what probabilities should be assigned to the outcomes?
Exercise 4.4 [§ 1.3] In the 'toss an unfair die' experiment with Ω = {1, 2, 3, 4, 5, 6}, the
probability of the event A = {1, 3, 5} is P(A) = 1/3. What is the probability of the event
B = {2, 4, 6}?
1. What gives the greater probability of hitting some target at least once: (a) hitting in
a shot with probability 1/2 and firing 1 shot, or (b) hitting in a shot with probability
1/4 and firing 2 shots? First guess. Then calculate.
2. In rolling two fair dice, what is the probability of obtaining a sum greater than 4 but
not exceeding 7?
3. A local country club has a membership of 600 and operates facilities that include an
18-hole championship golf course and 12 tennis courts. Before deciding whether to
accept new members, the club president would like to know how many members regu-
larly use each facility. A survey of the membership indicates that 70% regularly use the
golf course, 50% regularly use the tennis courts, and 5% use neither of these facilities
regularly. Given that a randomly selected member uses the tennis courts regularly,
find the probability that they also use the golf course regularly.
4. Let X be the number of years before a particular type of machine will need replacement.
Assume that X has the probability function f (1) = 0.1, f (2) = 0.2, f (3) = 0.2,
f (4) = 0.2, f (5) = 0.3. Graph f and F . Find the probability that the machine needs
no replacement during the first 3 years.
5. A box contains 4 right-handed and 6 left-handed screws. Two screws are drawn at
random without replacement. Let X be the number of left-handed screws drawn.
Find the probabilities P (X = 0), P (X = 1), P (X = 2), P (1 < X < 2), P (X ≤ 1),
P (X ≥ 1), P (X > 1), and P (0.5 < X < 10).
6. One number in the following table for the probability function of a random variable X
is incorrect. Which is it, and what should the correct value be?
x 1 2 3 4 5
P (X = x) 0.07 0.10 1.10 0.32 0.40
1. First, the sample space is:
5. Using the addition rule for mutually exclusive events check that Axiom (2) is satisfied
for the simple events.
6. Consider the following events: C = {B, I, G} and D = {G, I, N}. Using the addition
rule for two arbitrary events compute P (C ∪ D).
Exercise 4.9 Associate a RV X with the BINGO experiment of Exercise 4.8. Note that
you can choose any RV for this job. Find the PMF and DF of X. Now define another RV Y
for this same experiment that counts the number of balls labelled by a vowel in the outcome
of one BINGO Trial. Is Y a discrete uniform RV? What is P (Y = 1)?
Exercise 4.10 Durrett (The Monty Hall problem). The problem is named for the host
of the television show Let’s Make A Deal in which contestants were often placed in situations
like the following: Three curtains are numbered 1, 2, and 3. Behind one curtain is a car;
behind the other two curtains are donkeys. You pick a curtain, say #1. To build some
suspense the host opens up one of the two remaining curtains, say #3, to reveal a donkey.
What is the probability you will win given that there is a donkey behind #3? Should you
switch curtains and pick #2 if you are given the chance?
http://www.math.canterbury.ac.nz/SOCR/SOCR Experiments.html
Choose Monty Hall experiment.
http://www.math.canterbury.ac.nz/SOCR/SOCR Games.html
Choose Monty Hall game.
Exercise 4.11 Based on past experience, 70% of students in a certain course pass the
midterm exam. The final exam is passed by 80% of those who passed the midterm, but only
by 40% of those who fail the midterm. What fraction of students pass the final:
Exercise 4.12 A small brewery has two bottling machines. Machine 1 produces 75% of
the bottles and machine 2 produces 25%. One out of every 20 bottles filled by machine 1 is
rejected for some reason, while one out of every 30 bottles filled by machine 2 is rejected.
What is the probability that a randomly selected bottle comes from machine 1 given that it
is accepted?
Exercise 4.13 A process producing microchips produces 5% defectives, at random. Each
microchip is tested; the test will correctly detect a defective one 4/5 of the time, and if a
good microchip is tested, the test will declare it defective with probability 1/10.
(a) If a microchip is chosen at random, and tested to be good, what was the probability
that it was defective anyway?
(b) If a microchip is chosen at random, and tested to be defective, what was the probability
that it was good anyway?
(c) If 2 microchips are tested and determined to be good, what is the probability that at
least one is in fact defective?
Exercise 4.14 A gale is of force 1, force 2, or force 3, with probabilities 2/3, 1/4, 1/12
respectively.
Force 1 gales cause damage with probability 1/4;
force 2 gales cause damage with probability 2/3;
force 3 gales cause damage with probability 5/6.
(a) A gale is reported; what is the probability of it causing damage?
(b) If the gale DID cause damage, what are the probabilities that it was force 1; force 2;
force 3?
(c) If the gale DID NOT cause damage, what are the probabilities that it was force 1;
force 2; force 3?
Exercise 4.15 Of 200 adults, 176 own one TV set, 22 own two TV sets, and 2 own three
TV sets. A person is chosen at random. What is the probability function of X the number
of TV sets owned by that person?
Exercise 4.16 Suppose a discrete random variable X has probability function given by
x 3 4 5 6 7 8 9 10 11 12 13
P (X = x) .07 .01 .09 .01 .16 .25 .20 .03 .02 .11 .05
(b) X ≤ 5 ,
(c) X > 9 ,
(d) X ≥ 9 ,
(e) X < 12 ,
(f) 5 ≤ X ≤ 9 ,
(h) P (X = 14) ,
(i) P (X < 3) .
5 Tutorial for Week 2
5.1 Preparation Problems (Homework)
Exercise 5.1 Four fair coins are tossed simultaneously. Find the probability function of the
random variable X = Number of heads and compute the probabilities of obtaining no heads,
precisely 1 head, at least 1 head, not more than 3 heads.
Exercise 5.2 If the probability of hitting a target in a single shot is 10% and 10 shots are
fired independently, what is the probability that the target will be hit at least once?
Exercise 5.3 If X has the probability function f(x) = k/2^x (x = 0, 1, 2, . . . ), what are k
and P(X ≥ 4)?
Exercise 5.4 Let p = 1% be the probability that a certain type of light bulb will fail in a
240hr test. Find the probability that a sign consisting of 10 such bulbs will burn 24 hours
with no bulb failures.
Exercise 5.5 Given a density f(x) = k if −4 ≤ x ≤ 4 and 0 elsewhere, what is the value of
k? Graph f and F.
Exercise 5.6 If the diameter X of axles has the density f (x) = k if 119.9 ≤ x ≤ 120.1 and
0 otherwise, how many defectives will a lot of 500 axles approximately contain if defectives
are axles slimmer than 119.92 or thicker than 120.08?
Therefore, the probability that one axle is defective is P(defective) = 0.1 + 0.1 = 0.2, and it
is expected that there are 0.2 × 500 = 100 defective axles among the 500 axles.
Exercise 5.9 Suppose that a certain type of magnetic tape contains, on the average, 2 de-
fects per 100 meters. What is the probability that a roll of tape 300 meters long will contain
(a) x defects, (b) no defects?
Exercise 5.10 Find the probability that none of the three bulbs in a traffic signal must
be replaced during the first 1200 hours of operation if the probability that a bulb must be
replaced is a random variable X with density f (x) = 6[0.25 − (x − 1.5)2 ] when 1 ≤ x ≤ 2
and f (x) = 0 otherwise, where x is time measured in multiples of 1000 hours.
Exercise 5.11 Suppose that certain bolts have length L = 200 + X mm, where X is a
random variable with density f(x) = (3/4)(1 − x²) if −1 ≤ x ≤ 1 and 0 otherwise. Determine c
so that with a probability of 95% a bolt will have a length between 200 − c and 200 + c.
Exercise 5.12 Let the random variable X with density f (x) = ke−x if 0 ≤ x ≤ 2 and
0 otherwise be the time after which certain ball bearings are worn out. Find k and the
probability that a bearing will last at least 1 year.
From ∫_{0}^{2} k e^{−x} dx = k(1 − e^{−2}) = 1 we get k = 1/(1 − e^{−2}) ≈ 1.1565. Then

P(X ≥ 1) = 1 − P(X < 1) = 1 − ∫_{0}^{1} k e^{−x} dx = 1 + k [e^{−x}]_{0}^{1} = 1 + (e^{−1} − 1)/(1 − e^{−2}) = 0.2689414 .
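The normalising constant and the resulting probability can be confirmed in two lines (variable names are mine):

```python
from math import exp

k = 1 / (1 - exp(-2))      # normalisation: integral of k e^{-x} over [0, 2] equals 1
p = 1 - k * (1 - exp(-1))  # P(X >= 1) = 1 - int_0^1 k e^{-x} dx
print(round(k, 4), round(p, 4))  # 1.1565 0.2689
```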
Exercise 5.13 Assume that a new light bulb will burn out after t hours, where t is chosen
from [0, ∞) with an exponential density
f (t) = λe−λt .
(a) Assume that λ = 0.01, and find the probability that the bulb will not burn out before
T hours. This probability is often called the reliability of the bulb.
Exercise 5.14 Choose a number B at random from the interval [0, 1] with uniform density.
Find the probability that
(a) 1/3 < B < 2/3
(b) |B − 1/2| ≤ 1/4
(c) B < 1/4 or 1 − B < 1/4
(d) 3B 2 < B
Exercise 5.15 IQ scores for school children are standardised so that they are approximately
Normally distributed with a mean of 100 and a standard deviation of 15. What is approxi-
mately the probability that a randomly selected child has an IQ
(a) less than 80?
Exercise 5.16 We return to IQ scores that are approximately Normally distributed with a
mean of 100 and a standard deviation of 15.
(a) What is the 80th percentile of IQ scores?
(c) Below what score do only the bottom 30% of children fall?
Exercise 5.17 What is the expected daily profit if a store sells X air conditioners per day
with probability f (10) = 0.1, f (11) = 0.3, f (12) = 0.4, f (13) = 0.2 and the profit per
conditioner is $55?
Exercise 5.18 If the mileage (in multiples of 1000 mi) after which a tire must be replaced
is given by the random variable X with density f (x) = λe−λx (x > 0), what mileage can
you expect to get on one of these tires? Let λ = 0.04 and find the probability that a tire
will last at least 40000 mi.
Exercise 5.19 A small filling station is supplied with gasoline every Saturday afternoon.
Assume that its volume X of sales in ten thousands of gallons has the probability density
f (x) = 6x(1 − x) if 0 ≤ x ≤ 1 and 0 otherwise. Determine the mean, the variance.
Exercise 5.20 Let X be normal with mean 80 and variance 9. Find P (X > 83), P (X < 81),
P (X < 80), and P (78 < X < 82).
Exercise 5.21 If the lifetime X of a certain kind of automobile battery is Normally dis-
tributed with a mean of 4 yr and a standard deviation of 1 yr, and the manufacturer wishes
to guarantee that battery for 3 yr, what percentage of the batteries will he or she have to
replace under the guarantee?
Exercise 5.22 If the mathematics scores of the SAT college entrance exams for undergrad-
uate admission in the U.S. are Normally distributed with mean 480 and standard deviation
100 and if some college sets 500 as the minimum score for new students, what percent of
students will not reach that score?
Exercise 5.23 A manufacturer produces airmail envelopes whose weight is Normally dis-
tributed with mean µ = 1.95 grams and standard deviation σ = 0.025 grams. The envelopes
are sold in lots of 1000. How many envelopes in a lot will be heavier than 2 grams?
Exercise 5.24 Find the mean and the variance of the random variable X
3. f (x) = 2e−2x (x ≥ 0)
For any given value z, its cumulative probability Φ(z) was generated by Excel formula NORMSDIST, as NORMSDIST(z).
0.06 0.5239 0.56 0.7123 1.06 0.8554 1.56 0.9406 2.06 0.9803 2.56 0.9948
0.07 0.5279 0.57 0.7157 1.07 0.8577 1.57 0.9418 2.07 0.9808 2.57 0.9949
0.08 0.5319 0.58 0.7190 1.08 0.8599 1.58 0.9429 2.08 0.9812 2.58 0.9951
0.09 0.5359 0.59 0.7224 1.09 0.8621 1.59 0.9441 2.09 0.9817 2.59 0.9952
0.10 0.5398 0.60 0.7257 1.10 0.8643 1.60 0.9452 2.10 0.9821 2.60 0.9953
0.11 0.5438 0.61 0.7291 1.11 0.8665 1.61 0.9463 2.11 0.9826 2.61 0.9955
0.12 0.5478 0.62 0.7324 1.12 0.8686 1.62 0.9474 2.12 0.9830 2.62 0.9956
0.13 0.5517 0.63 0.7357 1.13 0.8708 1.63 0.9484 2.13 0.9834 2.63 0.9957
0.14 0.5557 0.64 0.7389 1.14 0.8729 1.64 0.9495 2.14 0.9838 2.64 0.9959
0.15 0.5596 0.65 0.7422 1.15 0.8749 1.65 0.9505 2.15 0.9842 2.65 0.9960
0.16 0.5636 0.66 0.7454 1.16 0.8770 1.66 0.9515 2.16 0.9846 2.66 0.9961
0.17 0.5675 0.67 0.7486 1.17 0.8790 1.67 0.9525 2.17 0.9850 2.67 0.9962
0.18 0.5714 0.68 0.7517 1.18 0.8810 1.68 0.9535 2.18 0.9854 2.68 0.9963
0.19 0.5753 0.69 0.7549 1.19 0.8830 1.69 0.9545 2.19 0.9857 2.69 0.9964
0.20 0.5793 0.70 0.7580 1.20 0.8849 1.70 0.9554 2.20 0.9861 2.70 0.9965
0.21 0.5832 0.71 0.7611 1.21 0.8869 1.71 0.9564 2.21 0.9864 2.71 0.9966
0.22 0.5871 0.72 0.7642 1.22 0.8888 1.72 0.9573 2.22 0.9868 2.72 0.9967
0.23 0.5910 0.73 0.7673 1.23 0.8907 1.73 0.9582 2.23 0.9871 2.73 0.9968
0.24 0.5948 0.74 0.7704 1.24 0.8925 1.74 0.9591 2.24 0.9875 2.74 0.9969
0.25 0.5987 0.75 0.7734 1.25 0.8944 1.75 0.9599 2.25 0.9878 2.75 0.9970
0.26 0.6026 0.76 0.7764 1.26 0.8962 1.76 0.9608 2.26 0.9881 2.76 0.9971
0.27 0.6064 0.77 0.7794 1.27 0.8980 1.77 0.9616 2.27 0.9884 2.77 0.9972
0.28 0.6103 0.78 0.7823 1.28 0.8997 1.78 0.9625 2.28 0.9887 2.78 0.9973
0.29 0.6141 0.79 0.7852 1.29 0.9015 1.79 0.9633 2.29 0.9890 2.79 0.9974
0.30 0.6179 0.80 0.7881 1.30 0.9032 1.80 0.9641 2.30 0.9893 2.80 0.9974
0.31 0.6217 0.81 0.7910 1.31 0.9049 1.81 0.9649 2.31 0.9896 2.81 0.9975
0.32 0.6255 0.82 0.7939 1.32 0.9066 1.82 0.9656 2.32 0.9898 2.82 0.9976
0.33 0.6293 0.83 0.7967 1.33 0.9082 1.83 0.9664 2.33 0.9901 2.83 0.9977
0.34 0.6331 0.84 0.7995 1.34 0.9099 1.84 0.9671 2.34 0.9904 2.84 0.9977
0.35 0.6368 0.85 0.8023 1.35 0.9115 1.85 0.9678 2.35 0.9906 2.85 0.9978
0.36 0.6406 0.86 0.8051 1.36 0.9131 1.86 0.9686 2.36 0.9909 2.86 0.9979
0.37 0.6443 0.87 0.8078 1.37 0.9147 1.87 0.9693 2.37 0.9911 2.87 0.9979
0.38 0.6480 0.88 0.8106 1.38 0.9162 1.88 0.9699 2.38 0.9913 2.88 0.9980
0.39 0.6517 0.89 0.8133 1.39 0.9177 1.89 0.9706 2.39 0.9916 2.89 0.9981
0.40 0.6554 0.90 0.8159 1.40 0.9192 1.90 0.9713 2.40 0.9918 2.90 0.9981
0.41 0.6591 0.91 0.8186 1.41 0.9207 1.91 0.9719 2.41 0.9920 2.91 0.9982
0.42 0.6628 0.92 0.8212 1.42 0.9222 1.92 0.9726 2.42 0.9922 2.92 0.9982
0.43 0.6664 0.93 0.8238 1.43 0.9236 1.93 0.9732 2.43 0.9925 2.93 0.9983
0.44 0.6700 0.94 0.8264 1.44 0.9251 1.94 0.9738 2.44 0.9927 2.94 0.9984
0.45 0.6736 0.95 0.8289 1.45 0.9265 1.95 0.9744 2.45 0.9929 2.95 0.9984
0.46 0.6772 0.96 0.8315 1.46 0.9279 1.96 0.9750 2.46 0.9931 2.96 0.9985
0.47 0.6808 0.97 0.8340 1.47 0.9292 1.97 0.9756 2.47 0.9932 2.97 0.9985
0.48 0.6844 0.98 0.8365 1.48 0.9306 1.98 0.9761 2.48 0.9934 2.98 0.9986
0.49 0.6879 0.99 0.8389 1.49 0.9319 1.99 0.9767 2.49 0.9936 2.99 0.9986
0.50 0.6915 1.00 0.8413 1.50 0.9332 2.00 0.9772 2.50 0.9938 3.00 0.9987